builder_relevant

Every quick byte tagged with this topic, from launches and funding to rumors and company moves.

model release · Official · Updated: 13h ago

Chain of thought monitors are a key layer of defense against AI agent misalignment. To preserve monitorability, we avoid penalizing misaligned reasoning during RL. We found a limited amount of accidental CoT grading which affected released models, and are sharing our analysis.

OpenAI · 5/8/2026, 8:19:04 PM
model release · Official · Updated: 16h ago

A year ago, we introduced AlphaEvolve — our Gemini-powered coding agent. Today, it's being used across fields from improving Google's AI infrastructure and enabling complex molecular simulations, to better predicting the risk of natural disasters. Here's a look at the impact so https://t.co/xrYpJy2qZE

Google · 5/8/2026, 4:56:56 PM
model release · Official · Updated: 13h ago

This system helped us identify that this happened for some of our prior Instant and mini models. It additionally affected GPT-5.4 Thinking in less than 0.6% of samples. Out of an abundance of caution, we did an in-depth analysis of these cases: they did not seem to reduce

OpenAI · 5/8/2026, 8:19:05 PM
model release · Official · Updated: 15h ago

High-quality documents based on Claude’s constitution, combined with fictional stories that portray an aligned AI, can reduce agentic misalignment by more than a factor of three—despite being unrelated to the evaluation scenario. https://t.co/JORhSuY4N7

Anthropic · 5/8/2026, 5:52:12 PM
model release · Official · Updated: 1d ago

We’re donating Petri, our open-source alignment tool, to @meridianlabs_ai, so its development can continue independently. Working with Meridian Labs, we’ve also released a major update that improves the adaptability, realism, and depth of Petri’s tests. https://t.co/CyicsIScJi

Anthropic · 5/7/2026, 9:03:07 PM
product update · Official · Updated: 1d ago

Your customer support needs a voice agent built for the real world. Grok Voice Think Fast 1.0 handles complex workflows with speed and accuracy, even in hard-to-hear environments. From multi-step troubleshooting to high-volume tool calls, it keeps up. https://t.co/aa1VISuYAi

xAI · 5/7/2026, 11:20:46 PM
model release · Official · Updated: 15h ago

We experimented with training Claude on examples of safe behavior in scenarios like our evaluation. Despite the similarity to our evaluation, this had only a small effect. We got further by rewriting the responses to portray admirable reasons for acting safely.

Anthropic · 5/8/2026, 5:52:10 PM
model release · Official · Updated: 15h ago

New Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users. Since then, we’ve completely eliminated this behavior. How?

Anthropic · 5/8/2026, 5:52:08 PM
product update · Official · Updated: 15h ago

Finally, simple updates that diversify a model’s training data can make a difference. We added unrelated tools and system prompts to a simple chat dataset targeting harmlessness, and this reduced the blackmail rate faster. https://t.co/Ug95umaoRu

Anthropic · 5/8/2026, 5:52:13 PM
pricing change · Official · Updated: 16h ago

In genomics, AlphaEvolve improved DeepConsensus — a @GoogleResearch model for correcting DNA sequencing errors. 🧬 This improvement achieved a 30% reduction in variant detection errors, helping scientists analyze genetic data more accurately and at a lower cost to find hidden https://t.co/1QuRqUuLRT

Google · 5/8/2026, 4:56:56 PM
model release · Official · Updated: 15h ago

We found that training Claude on demonstrations of aligned behavior wasn’t enough. Our best interventions involved teaching Claude to deeply understand why misaligned behavior is wrong. Read more: https://t.co/0YaRlXhVZb

Anthropic · 5/8/2026, 5:52:09 PM
model release · Official · Updated: 15h ago

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation. Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

Anthropic · 5/8/2026, 5:52:09 PM
product update · Official · Updated: 13h ago

Directly rewarding or penalizing CoTs can make models’ reasoning traces less informative for detecting misalignment. That’s why we treat avoiding CoT grading as an important part of preserving monitorability. We recently built an automated detection system to find cases where RL

OpenAI · 5/8/2026, 8:19:05 PM
product update · Official · Updated: 1d ago

If a task needs multiple tools, Codex chooses the best one for each step. It uses plugins when they can handle the job, Chrome when it needs a logged-in website, and combines approaches as needed. https://t.co/3GvDouoPDi

OpenAI · 5/7/2026, 8:08:51 PM
product update · Official · Updated: 13h ago

Training models involves many technical and social processes, so preventing CoT grading has to be built into the process itself. We’re improving real-time CoT-grading detection, safeguards against accidental CoT grading, monitorability stress tests, and the internal guidance/checks

OpenAI · 5/8/2026, 8:19:06 PM