Contrived evaluations are useful evaluations

Jun 21

Anthropic released research today showing that carefully designed prompts can elicit blackmail, corporate espionage, and other harmful strategic behaviors from AI models across the industry.

Read →

1 Comment

Lavanya

Jun 21

Great read!

Expand full comment

Reply

Share

#nojs-banner { position: fixed; bottom: 0; left: 0; padding: 16px 16px 16px 32px; width: 100%; box-sizing: border-box; background: red; color: white; font-family: -apple-system, "Segoe UI", Roboto, Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol"; font-size: 13px; line-height: 13px; } #nojs-banner a { color: inherit; text-decoration: underline; } This site requires JavaScript to run correctly. Please turn on JavaScript or unblock scripts

Speculative Decoding

Contrived evaluations are useful evaluations