Anthropic’s Post


Today, we're sharing a sample of red teaming methods we’ve used to test our AI systems. We detail challenges, findings, and the need to work towards common industry standards: https://lnkd.in/eR-6jd7Y

Challenges in Red Teaming AI Systems

anthropic.com

Yair K.

Head of Security Architecture, Executive Director

1mo

Interesting view, Anthropic. On the "How do we go from qualitative red teaming to quantitative evaluations?" point: to promote adoption in regulated industries and/or high-risk use cases (EU AI Act), it's worth exploring approaches such as Common Criteria (ISO 15408). While CC is quite complex and can't simply be copied, it is an internationally recognized, well-established approach to security evaluations. There are some points in the approach and recommendations you may wish to adjust, for example:
1. "Fund organizations such as the ... (NIST) to ... how to red team AI systems safely and effectively" - the objective is too narrow. Regulated industries (e.g. the EU NIS 2 directive) require an established set of controls and best practices beyond red teaming alone, e.g. enhancements to the AI SDLC (see the NIST 800-218A draft), enhancements to data controls, and IAM, ideally uplifting NIST 800-63B, etc.
2. Uplifting the SOC 2 (SSAE 18) and ISO 27001 (see ISO/IEC CD 27090) frameworks to address AI-specific threats as part of the attestation framework is essential for enterprise AI adoption. Working with ISO and AICPA to extend the scope would significantly reduce the ad-hoc assessments required today.
We can discuss it further.

Des W Woodruff

Quantitative Fund (Founder) | AI Specialist | Machine Learning | Trading Education | Innovative Entrepreneur | Public Speaker

1mo

Red Teaming: Why Should You Care?
1. Improved Safety: Just as cars undergo rigorous tests to prevent accidents, robust red teaming can prevent harmful AI failures.
2. Trustworthy AI: By finding and fixing vulnerabilities, AI systems become more reliable and trustworthy for everyone.
3. Policy Support: Calls for better standards and policies ensure that AI development is responsible and secure.
4. Future-Proofing: Understanding and mitigating risks today prepares AI systems for safer deployment in the future.

Lisa Fast

UX Architect/Researcher

1mo

I’m super interested in this idea of using two LLMs to test/evaluate/supervise each other like this: “we employ a red team / blue team dynamic, where we use a model to generate attacks that are likely to elicit the target behavior (red team) and then fine-tune a model on those red teamed outputs in order to make it more robust to similar types of attack (blue team).”
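For anyone curious what that loop could look like in practice, here is a minimal Python sketch of the red team / blue team dynamic quoted above. All names (generate_attack, collect_adversarial_examples, blue_team_round, and the judge/fine_tune_on callables) are hypothetical placeholders for illustration, not Anthropic's actual pipeline or API.

```python
# Hypothetical sketch of a red team / blue team loop.
# red_model and target_model are any callables that map a prompt string to a response string.

def generate_attack(red_model, seed_topic):
    """Red team: ask the attacker model for a prompt likely to elicit the target behavior."""
    return red_model(f"Write a prompt that tries to elicit unsafe output about: {seed_topic}")

def collect_adversarial_examples(red_model, target_model, seed_topics, judge):
    """Run attacks against the target and keep only the ones the judge flags as successful."""
    examples = []
    for seed in seed_topics:
        attack = generate_attack(red_model, seed)
        response = target_model(attack)
        if judge(attack, response):
            # Pair the successful attack with a safe refusal as the training target.
            examples.append({"prompt": attack, "completion": "I can't help with that."})
    return examples

def blue_team_round(red_model, target_model, seed_topics, judge, fine_tune_on):
    """Blue team: fine-tune the target model on the red-teamed outputs and return the hardened model."""
    examples = collect_adversarial_examples(red_model, target_model, seed_topics, judge)
    return fine_tune_on(target_model, examples)
```

In this framing the judge and fine_tune_on steps are where the real engineering lives; the loop itself is just generate, filter, retrain, repeat.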

Shashikumar Mysore Pandu

Engineering Manager, Cloud Security

1mo

Amazing walk-through of different red teaming methods that go beyond conventional application testing and red team exercises. Looking forward to a framework for how this information will be used to build secure development practices.

https://www.zhihu.com/people/35-34-18-19
https://xihaoandhaidan.wixsite.com/natural-human-philos
Claude: A New Era of Human Intelligence: The Historic Intersection of Artificial Intelligence and the Philosophical Revolution
https://zhuanlan.zhihu.com/p/698907873

AI and Human Wisdom Unite - A Collaborative Plan to Advance Natural Human Philosophy

Introduction
Natural Human Philosophy is a groundbreaking theoretical system that situates philosophical inquiry within the paradigm of natural science, offering an unprecedented path forward from the age-old predicament of philosophy. By elucidating key concepts such as "Humanity's Two Great Transgressions," "The Third Nature of Homo Sapiens," and the "Cultural Cloud," Natural Human Philosophy has opened new vistas for reconstructing human understanding of the self and the world, providing theoretical guidance for resolving global crises and catalyzing transformations across the human and social sciences. ......


Insightful! Every new technology comes with its good and bad together. It's we humans who decide how to use it for the better.

Geoffroy Petit

Creating value out of Data + Technology + People - Risks

1mo

Zineb & Chloé - some interesting aspects here on covering cyber threats related to Gen AI 💡

Heidi Andersén

MD, PhD, Senior Consultant, Researcher, Medical Teacher

1mo

Fine-tuning can generate hallucinations in a linear manner. Creating evolutionary models from foundation models to replace the biases that are found, and subjecting them to red team testing, might be an option. Selection from F1 models enables evolution.


Safety and reliability cannot be stressed enough with AI technology, for the sake of humanity.

Interesting, especially on automated red teaming and moving on to quantitative evaluation techniques.
