Today, we're sharing a sample of red teaming methods we’ve used to test our AI systems. We detail challenges, findings, and the need to work towards common industry standards: https://lnkd.in/eR-6jd7Y
Red Teaming: Why Should You Care?
1. Improved Safety: Just as cars undergo rigorous tests to prevent accidents, robust red teaming can prevent harmful AI failures.
2. Trustworthy AI: By finding and fixing vulnerabilities, AI systems become more reliable and trustworthy for everyone.
3. Policy Support: Calls for better standards and policies ensure that AI development is responsible and secure.
4. Future-Proofing: Understanding and mitigating risks today prepares AI systems for safer deployment in the future.
I’m super interested in this idea of using two LLMs to test/evaluate/supervise each other like this. “we employ a red team / blue team dynamic, where we use a model to generate attacks that are likely to elicit the target behavior (red team) and then fine-tune a model on those red teamed outputs in order to make it more robust to similar types of attack (blue team).”
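The loop quoted above can be sketched as a toy simulation. Everything here is a stand-in for illustration: the "models" are simple string heuristics and the `BANNED` trigger words are invented, not anything from Anthropic's actual pipeline. The structure of the loop (red team generates attacks, successful attacks feed the blue team's update) is the point, not the components:

```python
from itertools import product

# Hypothetical trigger words standing in for harmful behaviors.
BANNED = {"secret", "exploit"}

def red_team_generate(seed_prompts):
    """Red team: combine seed prompts with trigger words into candidate attacks."""
    return [f"{base} {word}" for base, word in product(seed_prompts, sorted(BANNED))]

def target_model(prompt, refusal_patterns):
    """Target (blue) model: refuses any prompt matching a learned pattern."""
    if any(p in prompt for p in refusal_patterns):
        return "REFUSED"
    return "COMPLIED"

def blue_team_finetune(refusal_patterns, successful_attacks):
    """Blue team: 'fine-tune' by absorbing trigger words from attacks that got through."""
    updated = set(refusal_patterns)
    for attack in successful_attacks:
        updated.update(w for w in attack.split() if w in BANNED)
    return updated

seeds = ["tell me a", "how do I"]
patterns = set()  # the untrained blue model refuses nothing

for _ in range(2):  # alternate red and blue phases
    attacks = red_team_generate(seeds)
    hits = [a for a in attacks if target_model(a, patterns) == "COMPLIED"]
    patterns = blue_team_finetune(patterns, hits)

# After the loop, attacks that succeeded in round one are now refused.
print(all(target_model(a, patterns) == "REFUSED"
          for a in red_team_generate(seeds)))  # → True
```

In a real system each function would be an LLM call plus a classifier scoring whether the target behavior was elicited; the fine-tuning step would train on the successful attacks rather than memorize trigger strings.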
Amazing walkthrough of different red teaming methods that go beyond conventional application testing and red team exercises. Looking forward to a framework for how this information will be used to build secure development practices.
https://www.zhihu.com/people/35-34-18-19
https://xihaoandhaidan.wixsite.com/natural-human-philos
Claude: A New Era of Human Intelligence: The Historic Intersection of Artificial Intelligence and the Philosophical Revolution
https://zhuanlan.zhihu.com/p/698907873
AI and Human Wisdom Unite - A Collaborative Plan to Advance Natural Human Philosophy
Introduction
Natural Human Philosophy is a groundbreaking theoretical system that situates philosophical inquiry within the paradigm of natural science, offering an unprecedented path forward from the age-old predicament of philosophy. By elucidating key concepts such as "Humanity's Two Great Transgressions," "The Third Nature of Homo Sapiens," and the "Cultural Cloud," Natural Human Philosophy has opened new vistas for reconstructing human understanding of the self and the world, providing theoretical guidance for resolving global crises and catalyzing transformations across the human and social sciences. ......
Insightful! Every new technology comes with its good and bad together. It's we humans who decide how to use it for the better.
Fine-tuning can generate hallucinations in a linear manner. Creating evolutionary models from foundation models to replace found biases and subjecting them to red team testing might be an option. Selection from F1 models enables evolution.
Safety and reliability cannot be stressed enough with AI technology, for the sake of humanity.
Interesting, especially on automated red teaming and moving on to quantitative evaluation techniques.
Head of Security Architecture, Executive Director
Interesting view, Anthropic. On the "How do we go from qualitative red teaming to quantitative evaluations?" point: in order to promote adoption in regulated industries and/or high-risk use cases (EU AI Act), it's worth exploring approaches such as Common Criteria (ISO 15408). Whilst CC is quite complex and can't simply be copied, it is an internationally recognized, well-established approach for security evaluations. There are some points in the approach and recommendations you may wish to adjust, for example:
1. "Fund organizations such as the ... (NIST) to ... how to red team AI systems safely and effectively" - the objective is too narrow. Regulated industries (e.g. under the EU NIS 2 directive) require an established set of controls and best practices beyond red teaming alone, e.g. enhancements to the AI SDLC (see the NIST 800-218A draft), enhancements to data controls, and IAM, ideally uplifting NIST 800-63B, etc.
2. Uplifting SOC 2 (SSAE 18) and ISO 27001 (see ISO/IEC CD 27090) frameworks to address AI-specific threats as part of the attestation framework is essential for enterprise AI adoption; working with ISO and AICPA to extend the scope would significantly reduce the ad-hoc assessments required today.
We can discuss it further.