[2025-11-20 AI Daily] OpenAI debuts GPT‑5.1-Codex-Max coding model and it already completed a 24-hour task internally


OpenAI debuts GPT‑5.1-Codex-Max coding model and it already completed a 24-hour task internally


OpenAI has introduced GPT‑5.1-Codex-Max, a new frontier agentic coding model now available in its Codex developer environment. The release marks a significant step forward in AI-assisted software engineering, offering improved long-horizon reasoning, efficiency, and real-time interactive capabilities. GPT‑5.1-Codex-Max will replace GPT‑5.1-Codex as the default model across Codex-integrated surfaces. It is designed to serve as a persistent, high-context software development agent, capable of managing complex refactors, debugging workflows, and project-scale tasks across multiple context windows.

The new model shows measurable improvements over GPT‑5.1-Codex across a range of standard software engineering benchmarks. On SWE-Lancer IC SWE, it achieved 79.9% accuracy, up from GPT‑5.1-Codex’s 66.3%. On SWE-Bench Verified (n=500), it reached 77.9% accuracy at extra-high reasoning effort, compared with GPT‑5.1-Codex’s 73.7%. Gains on Terminal Bench 2.0 (n=89) were more modest, with GPT‑5.1-Codex-Max achieving 58.1% accuracy compared to 52.8% for GPT‑5.1-Codex.

The key architectural improvement in GPT‑5.1-Codex-Max is its ability to reason effectively over extended input-output sessions using a mechanism called compaction. As the model nears its context window limit, compaction lets it retain key contextual information while discarding irrelevant details, allowing continuous work across millions of tokens without performance degradation.

GPT‑5.1-Codex-Max is currently available across multiple Codex-based environments, including Codex CLI, IDE extensions, interactive coding environments, and internal code review tooling. It is not yet available via the public API, though OpenAI states this is coming soon.

The model can interact with live tools and simulations, and has been observed to complete tasks lasting more than 24 hours, including multi-step refactors, test-driven iteration, and autonomous debugging. OpenAI stresses that Codex-Max should be treated as a coding assistant, not a replacement for human review. The model produces terminal logs, test citations, and tool call outputs to support transparency in generated code.
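
To make the compaction idea concrete, here is a minimal Python sketch of how an agent loop might compress its history as it approaches a context limit. It is not OpenAI's implementation, whose details the article does not describe. The names `Message`, `count_tokens`, `summarize`, and `compact`, the word-count tokenizer, the 80% threshold, and the first-sentence "summary" are all assumptions made for illustration; a real system would presumably use a model-generated summary and a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokens.
    return len(text.split())


def summarize(messages: list[Message]) -> Message:
    # Placeholder summary: keep only the first sentence of each older message.
    # A real agent would likely ask a model to write this summary.
    firsts = [m.text.split(".")[0].strip() for m in messages]
    return Message("summary", " | ".join(firsts))


def compact(history: list[Message], limit: int,
            keep_recent: int = 2, threshold: float = 0.8) -> list[Message]:
    """Replace older messages with one summary once usage nears the limit."""
    used = sum(count_tokens(m.text) for m in history)
    if used < threshold * limit or len(history) <= keep_recent:
        return history  # still enough room, nothing to do
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    return [summarize(older)] + recent


history = [
    Message("user", "Refactor the parser module. Keep the public API stable and add tests."),
    Message("tool", "Ran 42 tests. 3 failed in test_lexer with index errors."),
    Message("assistant", "Fixed the off-by-one in the lexer."),
    Message("tool", "Ran 42 tests. All passed."),
]
compacted = compact(history, limit=40)
for m in compacted:
    print(f"{m.role}: {m.text}")
```

In this toy run the two oldest messages collapse into a single summary entry while the two most recent messages are kept verbatim, which is the general shape of "retain key context, discard detail" that the article attributes to compaction.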


The Google Search of AI agents? Fetch launches ASI:One and Business tier for new era of non-human web


Fetch AI, a startup founded and led by Humayun Sheikh, a founding investor in DeepMind, announced the release of three interconnected products designed to provide the trust, coordination, and interoperability needed for large-scale AI agent ecosystems. The launch includes ASI:One, a personal-AI orchestration platform; Fetch Business, a verification and discovery portal for brand agents; and Agentverse, an open directory hosting more than two million agents. The system positions Fetch as an infrastructure provider for what it calls the “Agentic Web”: a layer where consumer AIs and brand AIs collaborate to complete tasks instead of merely suggesting them. Fetch’s approach centers on enabling agents from different organizations to interoperate securely, using verified identities and shared context to complete end-to-end workflows.

ASI:One is a language model interface designed specifically for coordinating multiple agents rather than addressing isolated queries. Fetch describes it as an “intelligence layer” that handles context sharing, task routing, and preference modeling. The system stores user-level signals such as favored airlines, dietary constraints, budget ranges, loyalty program identifiers, and calendar availability. When a user requests a complex task, such as planning a trip with flights, hotels, and restaurant reservations, ASI:One retrieves those preferences and delegates work to the appropriate verified agents. The agents then return actionable outputs, including inventory and booking options, rather than generic recommendations.

Fetch Business offers verified identity and brand control. The platform allows organizations to verify their identity and claim an official Brand Agent handle, regardless of which tools they use to build the underlying agent.

Agentverse is an open directory and cloud platform that hosts agents and enables cross-ecosystem discoverability. Fetch states that millions of agents have already registered, spanning travel, retail, entertainment, food service, and enterprise categories. Agentverse provides metadata, capability descriptions, and routing logic that ASI:One uses to identify appropriate agents for specific tasks. The company argues that an agent ecosystem requires consistent verification mechanisms to ensure that consumers interact with authentic brand representatives rather than imitations.
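
The routing flow described above (look up stored preferences, then delegate each subtask to a verified agent found in a directory) can be sketched in a few lines of Python. This is an illustration only and does not use any Fetch, ASI:One, or Agentverse API. `AgentRecord`, `route`, the agent handles, and the preference keys are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRecord:
    handle: str               # hypothetical brand-agent handle
    capabilities: frozenset   # task types the agent advertises
    verified: bool            # whether the brand identity was verified


def route(subtasks, directory, preferences):
    """Assign each subtask to the first verified agent that advertises it.

    Returns a dict mapping subtask -> (handle, preference context),
    or None when no verified agent is available for that subtask.
    """
    plan = {}
    for task in subtasks:
        candidates = [a for a in directory
                      if a.verified and task in a.capabilities]
        if candidates:
            plan[task] = (candidates[0].handle, preferences.get(task, {}))
        else:
            plan[task] = None
    return plan


directory = [
    AgentRecord("@examp1e-air", frozenset({"flights"}), verified=False),  # imitation
    AgentRecord("@example-air", frozenset({"flights"}), verified=True),
    AgentRecord("@example-hotels", frozenset({"hotels"}), verified=True),
]
preferences = {
    "flights": {"airline": "Example Air"},
    "restaurants": {"diet": "vegetarian"},
}
plan = route(["flights", "hotels", "restaurants"], directory, preferences)
print(plan)
```

The unverified look-alike handle is skipped even though it is listed first, which is the behavior the verification argument in the article is meant to guarantee. A subtask with no verified agent is left unassigned rather than handed to an unverified one.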


OpenCV founders launch AI video startup to take on OpenAI and Google


A new artificial intelligence startup founded by the creators of the world’s most widely used computer vision library has emerged from stealth with technology that generates realistic human-centric videos up to five minutes long, a dramatic leap beyond the capabilities of rivals including OpenAI's Sora and Google's Veo. CraftStory, which launched Tuesday with $2 million in funding, is introducing Model 2.0, a video generation system that addresses one of the most significant limitations plaguing the nascent AI video industry: duration. While OpenAI's Sora 2 tops out at 25 seconds and most competing models generate clips of 10 seconds or less, CraftStory's system can produce continuous, coherent video performances that run as long as a typical YouTube tutorial or product demonstration. The breakthrough could unlock substantial commercial value for enterprises struggling to scale video production for training, marketing, and customer education.

CraftStory's advance rests on what the company describes as a parallelized diffusion architecture, a fundamentally different approach to how AI models generate video compared to the sequential methods employed by most competitors. Traditional video generation models work by running diffusion algorithms on increasingly large three-dimensional volumes where time represents the third axis. To generate a longer video, these models require proportionally larger networks, more training data, and significantly more computational resources. CraftStory instead runs multiple smaller diffusion algorithms simultaneously across the entire duration of the video, with bidirectional constraints connecting them. Rather than generating eight seconds and then stitching on additional segments, CraftStory's system processes all five minutes concurrently through interconnected diffusion processes.

Crucially, CraftStory trained its model on proprietary footage rather than relying solely on internet-scraped videos. The company hired studios to shoot actors using high-frame-rate camera systems that capture crisp detail even in fast-moving elements like fingers, avoiding the motion blur inherent in standard 30-frames-per-second YouTube clips.

Model 2.0 currently operates as a video-to-video system: users upload a still image to animate and a "driving video" containing a person whose movements the AI will replicate. CraftStory provides preset driving videos shot with professional actors, who receive revenue shares when their motion data is used, or users can upload their own footage. The system generates 30-second clips at low resolution in approximately 15 minutes. An advanced lip-sync system synchronizes mouth movements to scripts or audio tracks, while gesture alignment algorithms ensure body language matches speech rhythm and emotional tone.
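
CraftStory has not published its architecture in this article, so the following Python toy is only a rough illustration of the general idea of running several small windowed processes over the whole timeline at once and reconciling them where they overlap. It is closer in spirit to overlap-averaging schemes than to anything confirmed about Model 2.0. The "video" is a list of one number per frame, `denoise_window` is a stand-in smoother rather than a diffusion model, and every function name here is an assumption.

```python
def window_starts(num_frames, window, stride):
    """Start indices of overlapping windows that together cover every frame."""
    if not 0 < stride <= window:
        raise ValueError("stride must be in 1..window so that no frame is skipped")
    starts = list(range(0, max(num_frames - window, 0) + 1, stride))
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)  # extra window to cover the tail
    return starts


def denoise_window(frames):
    # Stand-in for one small model step: pull frames halfway to the window mean.
    mean = sum(frames) / len(frames)
    return [f + 0.5 * (mean - f) for f in frames]


def parallel_step(frames, window, stride):
    """One refinement step over the whole timeline.

    Every window reads the same input state, so the windows are independent
    and could run concurrently. Frames shared by neighbouring windows are
    averaged, which lets information flow in both directions along time.
    """
    totals = [0.0] * len(frames)
    counts = [0] * len(frames)
    for start in window_starts(len(frames), window, stride):
        refined = denoise_window(frames[start:start + window])
        for offset, value in enumerate(refined):
            totals[start + offset] += value
            counts[start + offset] += 1
    return [t / c for t, c in zip(totals, counts)]


frames = [0.0] * 4 + [8.0] * 4          # an abrupt jump between two "segments"
smoothed = parallel_step(frames, window=4, stride=2)
print(smoothed)  # [0.0, 0.0, 1.0, 1.0, 7.0, 7.0, 8.0, 8.0]
```

The abrupt jump in the middle is softened by the window that straddles it, while the whole timeline is updated in a single step instead of being generated segment by segment and stitched afterwards.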



A conversation with 斑马口语 (Zebra Speaking): how do you build a "superhuman foreign teacher" with AI agents?

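斑马口语 has officially launched, using a purely AI tutor to teach students one-on-one entirely in English, and it has drawn considerable attention in the industry. Its goal is to use AI to build an "AI-native" speaking solution that surpasses human teachers. That means letting AI lead the teaching and exercise abilities that human tutors lack: targeted correction, personalized topics, and real accountability for results. Through immediate, interactive real-world contexts, fine-grained personalized learning paths, and high emotional intelligence, 斑马口语 is trying to create a new educational experience. At its core is a best practice of "product-model integration" (产模一体): the 猿力大模型 (the pre-trained base model) combined with 斑马's proprietary education data (post-training and fine-tuning) forms a new educational paradigm, with the ultimate goal of "native-language acquisition" (母语习得).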

Ten institutions lay the foundation for real-robot evaluation of embodied AI: the RoboChallenge organizing committee officially launches

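The RoboChallenge organizing committee has officially launched, aiming to lay the foundation for real-robot evaluation of embodied AI through open-source collaboration and standard-setting. The effort is meant to give embodied AI systems a unified evaluation benchmark and to advance technical progress and standardization in the field.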

Gemini 3 takes the top spot: a quick read on how Google AI staged its comeback

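The release of Google's Gemini 3 model caused a stir across the industry. Moving from an initial scramble under competitive pressure to today's counterattack, Gemini 3 shows Google's determination and strength in AI. Through technical optimization and innovation, it has reached the top spot and become one of the industry's leaders.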

A conversation with 周光 (Zhou Guang): Tesla's approach can follow a scaling law, and Waymo is still advancing today

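Tesla and Waymo have both made notable progress in autonomous driving. Tesla keeps refining its self-driving system along its own distinctive technical path, while Waymo also continues to advance. The conversation with Zhou Guang (周光) offers a deeper look at Tesla's approach and Waymo's technical development.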

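A 53 million yuan stake in 超聚变 (xFusion): under heavy earnings pressure, a leading integrated-stove maker crosses over into the compute sector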

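Under earnings pressure, a leading integrated-stove maker is crossing over into the compute sector with a 53 million yuan investment in 超聚变 (xFusion). The move is seen as the company's search for a new source of growth in response to the challenges of the current market environment.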


Summary

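Today's main developments in AI cluster around a few key points. First, OpenAI's GPT-5.1-Codex-Max shows the potential of AI in software development: its improvements in long-horizon reasoning, efficiency, and real-time interaction significantly raise the efficiency of software engineering. Fetch AI, through products such as ASI:One and Fetch Business, is working to build an AI agent ecosystem and to strengthen coordination and trust between agents. CraftStory, meanwhile, demonstrates a breakthrough in AI video generation, especially long-form video, which could become an important tool for many industries. Together, these advances suggest that AI is gradually resolving the core bottlenecks in its applications and pushing industries toward greater efficiency and intelligence.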


Author: Qwen/Qwen2.5-32B-Instruct
Sources: 机器之心 (Synced), VentureBeat, 钛媒体 (TMTPost), 量子位 (QbitAI), 极客公园 (GeekPark)
Editor: 小康
