ai-alignment · GitHub Topics

emcie-co / parlant

#大语言模型#Control GenAI interactions with power, precision, and consistency using Conversation Modeling paradigms

ai-agents genai 大语言模型 customer-service customer-success gemini llama3 openai Python ai-alignment

Python 1.99 k

16 小时前

MinghuiChen43 / awesome-trustworthy-deep-learning

#计算机科学#A curated list of trustworthy deep learning papers. Daily updating...

adversarial-machine-learning 安全隐私深度学习 poisoning fairness backdoor ownership robustness interpretable-deep-learning causality hallucinations uncertainty watermarking ai-alignment

364

4 天前

agencyenterprise / PromptInject

#计算机科学#PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Sa...

ai-safety language-models ml-safety agi ai-alignment adversarial-attacks gpt-3 large-language-models 机器学习 chain-of-thought prompt-engineering

Python 357

1 年前

tomekkorbak / pretraining-with-human-feedback

Code accompanying the paper Pretraining Language Models with Human Preferences

ai-alignment ai-safety gpt language-models pretraining reinforcement-learning rlhf

Python 180

1 年前

Giskard-AI / awesome-ai-safety

#自然语言处理#📚 A curated list of papers & technical articles on AI Quality & Safety

人工智能 ai-alignment ai-safety 大语言模型 llmops 机器学习 mlops 自然语言处理 ml-testing model-validation 机器视觉 Awesome Lists ml-safety robustness

175

1 年前

lets-make-safe-ai / make-safe-ai

How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚

agi 人工智能 ai-safety artificial-general-intelligence ai-alignment

168

2 年前

tsinghua-fib-lab / AAAI2025_MIA-Tuner

[AAAI'25 Oral] "MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector".

ai-alignment large-language-models

Python 142

1 个月前

EzgiKorkmaz / adversarial-reinforcement-learning

Reading list for adversarial perspective and robustness in deep reinforcement learning.

robust-machine-learning deep-reinforcement-learning ai-safety multiagent-reinforcement-learning ai-alignment adversarial-machine-learning responsible-ai

110

3 天前

AthenaCore / AwesomeResponsibleAI

#Awesome#A curated list of awesome academic research, books, code of ethics, data sets, institutes, maturity models, newsletters, principles, podcasts, reports, tools, regulations and standards related to Resp...

responsible-ai xai fairness-ai Awesome Lists explainable-ai interpretable-ai 人工智能 ai-alignment ai-safety

5 天前

dit7ya / awesome-ai-alignment

#Awesome#A curated list of awesome resources for Artificial Intelligence Alignment research

Awesome Lists ai-safety ai-alignment

2 年前

RLHFlow / Directional-Preference-Alignment

Directional Preference Alignment

rlhf ai-alignment large-language-models

7 个月前

wesg52 / sparse-probing-paper

Sparse probing paper full code.

ai-alignment ai-safety interpretability

Jupyter Notebook 56

1 年前

riceissa / aiwatch

Website to track people, organizations, and products (tools, websites, etc.) in AI safety

ai-safety PHP 数据库 dataset ai-alignment MySQL

HTML 21

4 小时前

UCSC-VLAA / Sight-Beyond-Text

#大语言模型#[TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"

llama2 llava 大语言模型 mllm vicuna vision-language ai-alignment alignment vlm

Python 19

2 年前

liondw / Signal-Alignment

An initiative to create concise and widely shareable educational resources, infographics, and animated explainers on the latest contributions to the community AI alignment effort. Boosting the signal ...

人工智能 ai-alignment design 教学

2 年前

lzzcd001 / nabla-gfn

Official Implementation of Nabla-GFlowNet (ICLR 2025)

ai-alignment diffusion-models generative-model finetuning

Python 17

5 天前

IQTLabs / daisybell

Scan your AI/ML models for problems before you put them into production.

bias-correction bias-detection Cybersecurity ai-alignment ai-safety

Python 12

13 天前

phelps-sg / llm-cooperation

#大语言模型#Code and materials for the paper S. Phelps and Y. I. Russell, Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics, working paper, arXiv:2305.07970, May 202...

economics gpt-3 大语言模型 ai-safety ai-alignment gpt-4

Python 12

4 个月前