Today, we're sharing a sample of red teaming methods we’ve used to test our AI systems. We detail challenges, findings, and the need to work towards common industry standards: https://lnkd.in/eR-6jd7Y
Red Teaming: Why Should You Care?
1. Improved Safety: Just as cars undergo rigorous tests to prevent accidents, robust red teaming can prevent harmful AI failures.
2. Trustworthy AI: By finding and fixing vulnerabilities, AI systems become more reliable and trustworthy for everyone.
3. Policy Support: Calls for better standards and policies ensure that AI development is responsible and secure.
4. Future-Proofing: Understanding and mitigating risks today prepares AI systems for safer deployment in the future.
I’m super interested in this idea of using two LLMs to test/evaluate/supervise each other like this. “we employ a red team / blue team dynamic, where we use a model to generate attacks that are likely to elicit the target behavior (red team) and then fine-tune a model on those red teamed outputs in order to make it more robust to similar types of attack (blue team).”
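The loop quoted above can be sketched as a toy simulation. Everything here is a stand-in for illustration: the "models" are simple string heuristics and the `BANNED` trigger words are invented, not anything from Anthropic's actual pipeline. The structure of the loop (red team generates attacks, successful attacks feed the blue team's update) is the point, not the components:

```python
from itertools import product

# Hypothetical trigger words standing in for harmful behaviors.
BANNED = {"secret", "exploit"}

def red_team_generate(seed_prompts):
    """Red team: combine seed prompts with trigger words into candidate attacks."""
    return [f"{base} {word}" for base, word in product(seed_prompts, sorted(BANNED))]

def target_model(prompt, refusal_patterns):
    """Target (blue) model: refuses any prompt matching a learned pattern."""
    if any(p in prompt for p in refusal_patterns):
        return "REFUSED"
    return "COMPLIED"

def blue_team_finetune(refusal_patterns, successful_attacks):
    """Blue team: 'fine-tune' by absorbing trigger words from attacks that got through."""
    updated = set(refusal_patterns)
    for attack in successful_attacks:
        updated.update(w for w in attack.split() if w in BANNED)
    return updated

seeds = ["tell me a", "how do I"]
patterns = set()  # the untrained blue model refuses nothing

for _ in range(2):  # alternate red and blue phases
    attacks = red_team_generate(seeds)
    hits = [a for a in attacks if target_model(a, patterns) == "COMPLIED"]
    patterns = blue_team_finetune(patterns, hits)

# After the loop, attacks that succeeded in round one are now refused.
print(all(target_model(a, patterns) == "REFUSED"
          for a in red_team_generate(seeds)))  # → True
```

In a real system each function would be an LLM call plus a classifier scoring whether the target behavior was elicited; the fine-tuning step would train on the successful attacks rather than memorize trigger strings.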
Amazing walkthrough of different red teaming methods that go beyond conventional application testing and red team exercises. Looking forward to a framework for how this information will be used to build secure development practices.
https://www.zhihu.com/people/35-34-18-19
https://xihaoandhaidan.wixsite.com/natural-human-philos
Claude: A New Era of Human Intelligence: The Historic Intersection of Artificial Intelligence and the Philosophical Revolution
https://zhuanlan.zhihu.com/p/698907873
AI and Human Wisdom Unite - A Collaborative Plan to Advance Natural Human Philosophy
Introduction
Natural Human Philosophy is a groundbreaking theoretical system that situates philosophical inquiry within the paradigm of natural science, offering an unprecedented path forward from the age-old predicament of philosophy. By elucidating key concepts such as "Humanity's Two Great Transgressions," "The Third Nature of Homo Sapiens," and the "Cultural Cloud," Natural Human Philosophy has opened new vistas for reconstructing human understanding of the self and the world, providing theoretical guidance for resolving global crises and catalyzing transformations across the human and social sciences. ......
Insightful! Every new technology comes with its good and bad together. It's we humans who decide how to use it for the better.
Fine-tuning can generate hallucinations in a linear manner. Creating evolutionary models from foundation models to replace found biases and subjecting them to red team testing might be an option. Selection from F1 models enables evolution.
Safety and reliability cannot be stressed enough with AI technology, for the sake of humanity.
Interesting, especially on automated red teaming and moving on to quantitative evaluation techniques.
Head of Security Architecture, Executive Director
Interesting view, Anthropic. On the "How do we go from qualitative red teaming to quantitative evaluations?" point: in order to promote adoption in regulated industries and/or high-risk use cases (EU AI Act), it's worth exploring approaches such as Common Criteria (ISO 15408). Whilst CC is quite complex and can't simply be copied, it is an internationally recognized, well-established approach for security evaluations. There are some points in the approach and recommendations you may wish to adjust, for example:
1. "Fund organizations such as the ... (NIST) to ... how to red team AI systems safely and effectively" - the objective is too narrow. Regulated industries (e.g. under the EU NIS 2 directive) require an established set of controls and best practices beyond red teaming alone, e.g. enhancements to the AI SDLC (see the NIST 800-218A draft), enhancements to data controls, and IAM, ideally uplifting NIST 800-63B, etc.
2. Uplifting SOC 2 (SSAE 18) and ISO 27001 (see ISO/IEC CD 27090) frameworks to address AI-specific threats as part of the attestation framework is essential for enterprise AI adoption; working with ISO and AICPA to extend the scope would significantly reduce the ad-hoc assessments required today.
We can discuss it further.