#LLM#Control GenAI interactions with power, precision, and consistency using Conversation Modeling paradigms
#Computer Science#A curated list of trustworthy deep learning papers. Updated daily.
#Computer Science#PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Sa...
Code accompanying the paper Pretraining Language Models with Human Preferences
#NLP#📚 A curated list of papers & technical articles on AI Quality & Safety
How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚
[AAAI'25 Oral] "MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector".
Reading list for adversarial perspective and robustness in deep reinforcement learning.
#Awesome#A curated list of awesome academic research, books, code of ethics, data sets, institutes, maturity models, newsletters, principles, podcasts, reports, tools, regulations and standards related to Resp...
#Awesome#A curated list of awesome resources for Artificial Intelligence Alignment research
Directional Preference Alignment
Website to track people, organizations, and products (tools, websites, etc.) in AI safety
#LLM#[TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
An initiative to create concise, widely shareable educational resources, infographics, and animated explainers on the latest contributions to the community AI alignment effort. Boosting the signal ...
Official Implementation of Nabla-GFlowNet (ICLR 2025)
Scan your AI/ML models for problems before you put them into production.
#LLM#Code and materials for the paper: S. Phelps and Y. I. Russell, "Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics", working paper, arXiv:2305.07970, May 202...
#Awesome#Community list of awesome projects, apps, tools and more related to Polis.