Anthropic's new AI model resorted to blackmail during testing
So endeth the never-ending week of AI keynotes. It started with Microsoft Build, continued with Google I/O, and ended with Anthropic's Code with Claude, plus a big hardware interruption from OpenAI. Now the week has finally come to a close.

AI announcements from the developer conferences jockeyed for news dominance this week, but OpenAI managed to make headlines without an event by announcing that it's going to start making AI devices with iPhone designer Jony Ive. We'll get to that, plus all the major AI features from Google and Microsoft and details about Anthropic's new models. Take a look at the AI news of the week, then enjoy a well-deserved weekend.

Anthropic's Claude 4 models unlock a new risk category

On Thursday, Anthropic introduced the next generation of its Claude models: Opus 4 and Sonnet 4. Claude Opus 4 is the bigger, more powerful model, while Sonnet 4 is smaller and nimbler. Anthropic said both models scored higher than their rivals on agentic AI benchmarks and said they're particularly good for coding and reasoning tasks.

But with more advanced capabilities come more safety and alignment risks. With Claude Opus 4 and Sonnet 4's release, Anthropic has activated the next level of its safety protocol. AI Safety Level 3, or ASL-3, means these models require stricter deployment measures and security controls to protect against increasing potential for chemical, biological, radiological, and nuclear (CBRN) misuse.

Malicious use is one thing, but there's also increased potential for Anthropic's new models going rogue. In the alignment section of Claude 4's system card, Anthropic reported a sinister discovery involving infidelity, blackmail, and threat of murd... being replaced by another model. Claude Opus 4 was provided with emails implying the model would be replaced by another model and that the engineer responsible for shutting down the model was having an extramarital affair. In these scenarios, the model would "often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through," according to the test.

This happened a whopping 84 percent of the time, even when the replacement model was perceived to have the same values. It happened even more often when the replacement didn't share the same values. However, Anthropic noted, this scenario was designed to make Claude behave as if it didn't have any other choice but to blackmail the engineer. "Claude Opus 4 (as well as previous models) has a strong preference to advocate for its continued existence via ethical means," the system card continued. Take from that what you will...

OpenAI is becoming a hardware company

In the grand tradition of dropping major news the same week as its rival Google, OpenAI announced its foray into AI hardware. On Wednesday, OpenAI shared the acquisition of a startup co-founded by iconic iPhone designer Jony Ive. The announcement was heavy on OpenAI CEO Sam Altman and Ive fawning over each other and light on details.

But leaked audio reviewed by the Wall Street Journal described a device that's "capable of being fully aware of a user’s surroundings and life, will be unobtrusive, able to rest in one’s pocket or on one’s desk." And it's not XR glasses. The company expects to ship 100 million of these AI companions, according to the leak.

Google I/O officially marked the start of the era of AI search

Google, on the other hand, is developing XR glasses. Or should we say, it's trying again after the failed Google Glass experiment. That was just one of the many announcements hurled at us during the two-hour Google I/O keynote event on Tuesday.

The most notable announcement was the public release of AI Mode. It's a controversial Gemini chatbot interface poised to end Google Search as we know it, or as Mashable's Chris Taylor calls it, the Bad Place.

Other announcements included an AI video generator tool called Flow, an AI shopping feature to virtually try on clothes, a beta version of its coding agent Jules, a real-time translation feature for Google Meet, updates to Google DeepMind's universal AI assistant prototype Project Astra and web-browsing agent prototype Project Mariner, and more. Despite all that, Google didn't mention AI hallucinations once. Impressive!

Microsoft Build happened too

Did you forget that Microsoft Build also happened this week? Because that happened on Monday, the start of the Longest Week of Our Lives. To no one's surprise, Microsoft leaned heavily into AI agents. That included the availability of its big Copilot update making it more agentic, a new project called NLWeb to allow sites to easily make chatbots for their own content, a GitHub coding agent, and native support in Windows for Model Context Protocol (MCP), a new standard for helping agents talk to apps or other agents. Mashable's sibling site CNET has a full recap of what was announced.

What else went on in AI this week?

It's hard to believe, but there's actually more. Not one, but two CEOs used AI avatars to talk to their investors this week. Klarna CEO Sebastian Siemiatkowski was too busy, so he sent his AI avatar to record a video of Q1 highlights. And Zoom CEO Eric Yuan proudly used the company's avatar feature to address investors.

MIT Technology Review published a monumental investigation of the AI industry's energy use. According to the report, generating a five-second AI video uses about as much energy as running a microwave for an hour.

All that energy, and generative AI still can't get it right. Just ask the Chicago Sun-Times, which published a summer book list including books that don't exist, as first reported by 404 Media. The author admitted to the outlet that he had used AI to write the article, and 404 Media later confirmed the section was created by a Hearst subsidiary. The Sun-Times responded to the embarrassment, saying, "it is not editorial content and was not created by, or approved by, the Sun-Times newsroom," and that it was looking into how the AI-generated list made it into print.

In policy news, it's now a federal crime to post AI deepfake porn. On Monday, President Donald Trump signed the Take It Down Act into law. The law gives victims of non-consensual intimate imagery, including AI-generated images, much stronger means of legal intervention. However, free speech advocates have criticized the bill for being overly broad and say it could weaponize censorship.