[2025-12-12 AI Daily] Nous Research just released Nomos 1, an open-source AI that ranks second on the notoriously brutal Putnam math exam

Today's News · 2025-12-11

Length: about 4,600 characters; estimated reading time: 15 minutes

Nous Research just released Nomos 1, an open-source AI that ranks second on the notoriously brutal Putnam math exam

Nous Research, a San Francisco-based artificial intelligence startup, has released Nomos 1, an open-source mathematical reasoning system that achieved near-elite human performance on this year's William Lowell Putnam Mathematical Competition, a prestigious and notoriously difficult undergraduate math contest. The exam is scored out of 120 points. In the 2024 competition, the top score was 90 and the median just 2; Nomos 1 scored 87, a result that would have ranked second out of 3,988 participants that year. The release marks a significant step in the rapidly accelerating race to build AI systems capable of sophisticated mathematical reasoning.

Unlike the massive, compute-intensive models deployed by major technology companies, Nomos 1 achieves these results with a relatively compact mixture-of-experts (MoE) architecture based on Alibaba's Qwen3: 30 billion parameters in total, with roughly 3 billion active at any given time. The company announced on social media that the score would rank 2nd of 3,988 in 2024 and called it its first step, together with Hillclimb AI, toward building a state-of-the-art (SOTA) AI mathematician. Without Nous Research's specialized training, the same base model scored just 24 points. That gap underscores how much post-training optimization and specialized reasoning techniques matter relative to raw model scale.
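To illustrate why only about 3 billion of Nomos 1's 30 billion parameters (roughly 3/30 = 10%) do work for any given token, here is a minimal sketch of generic top-k mixture-of-experts routing. It is a textbook version written for illustration, not Nomos 1's or Qwen3's actual implementation; the function names, the toy experts, and the value of k are hypothetical.

```python
import math
from typing import Callable, List, Sequence

# Generic top-k MoE routing sketch; NOT Qwen3's or Nomos 1's actual code.
Expert = Callable[[List[float]], List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: List[float], router_logits: Sequence[float],
                experts: Sequence[Expert], k: int) -> List[float]:
    """Send x to the k highest-scoring experts and mix their outputs.

    Only the k selected experts are evaluated, which is why an MoE model's
    'active' parameter count is much smaller than its total parameter count.
    """
    if len(router_logits) != len(experts):
        raise ValueError("need one router logit per expert")
    if not 1 <= k <= len(experts):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest router logits.
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Renormalize gate weights over the selected experts only.
    gates = softmax([router_logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)  # unselected experts never run
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out
```

With four toy experts and k = 2, each input touches only two expert functions; real MoE layers apply the same idea per token and per layer.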

The results were verified through blind grading by a human expert who had previously finished in the top 200 on the Putnam. The company provided anonymized submissions to the grader, then published the full set of de-anonymized files and the runbooks used to generate them on GitHub. This transparency and verification process lend credibility to the achievements of Nomos 1.
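The blind-grading protocol described above, in which the grader sees only anonymized submissions and the mapping back to their sources is revealed afterward, can be sketched as follows. This is a hypothetical illustration, not Nous Research's actual runbook; the function names and the use of random hex IDs are assumptions.

```python
import secrets
from typing import Dict, Tuple


def anonymize(submissions: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Hide submission labels behind opaque random IDs (hypothetical helper).

    Returns (blinded, key): `blinded` maps anon ID -> submission text and goes
    to the grader; `key` maps anon ID -> original label and is withheld until
    grading is complete.
    """
    items = list(submissions.items())
    secrets.SystemRandom().shuffle(items)  # so submission order is not revealed
    blinded: Dict[str, str] = {}
    key: Dict[str, str] = {}
    for label, text in items:
        anon_id = secrets.token_hex(8)
        while anon_id in key:  # regenerate on the (very unlikely) collision
            anon_id = secrets.token_hex(8)
        blinded[anon_id] = text
        key[anon_id] = label
    return blinded, key


def deanonymize(grades: Dict[str, int], key: Dict[str, str]) -> Dict[str, int]:
    """Map grades keyed by anon ID back to the original labels."""
    return {key[anon_id]: score for anon_id, score in grades.items()}
```

Publishing the `key` together with the graded files after the fact is what lets outsiders check that the grades were assigned blind.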

OpenAI's GPT-5.2 is here: what enterprises need to know

OpenAI has announced its new frontier large language model (LLM) family, GPT-5.2. The company describes it as its "most capable model series yet for professional knowledge work," aiming to reclaim the performance crown with significant gains in reasoning, coding, and agentic workflows. GPT-5.2 has a 400,000-token context window, large enough to ingest hundreds of documents or an entire code repository at once, and a maximum output of 128,000 tokens, enough for long reports or complete applications in a single response. Its knowledge cutoff is August 31, 2025, so it covers more recent world events and technical documentation.
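A quick way to reason about the reported limits (a 400,000-token context window and a 128,000-token output cap) is a budget check before sending a request. This sketch assumes that output tokens count against the same context window; the class, constant, and function names are hypothetical and are not part of the OpenAI SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    context_window: int     # total tokens per request (input + output, assumed)
    max_output_tokens: int  # cap on tokens generated in one response


# Hypothetical constant built from the figures reported above.
GPT_5_2 = ModelLimits(context_window=400_000, max_output_tokens=128_000)


def max_prompt_tokens(limits: ModelLimits, reserved_output: int) -> int:
    """Largest prompt that still leaves `reserved_output` tokens for the reply."""
    if not 0 <= reserved_output <= limits.max_output_tokens:
        raise ValueError("reserved_output must be between 0 and max_output_tokens")
    return limits.context_window - reserved_output


def fits(limits: ModelLimits, prompt_tokens: int, reserved_output: int) -> bool:
    """True if the prompt plus the reserved output fit in the context window."""
    return prompt_tokens <= max_prompt_tokens(limits, reserved_output)
```

Under that assumption, reserving the full 128,000-token output budget leaves 272,000 tokens for the prompt.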

OpenAI has segmented the GPT-5.2 release into three distinct tiers within ChatGPT: GPT-5.2 Instant, GPT-5.2 Thinking, and GPT-5.2 Pro. The Instant version is optimized for speed and daily tasks, the Thinking version is designed for complex, structured work, and the Pro version is the "smartest and most trustworthy option," delivering the highest accuracy for difficult questions where quality outweighs latency. The new models are available immediately in the API.

OpenAI reports leading benchmark results across most domains, particularly in the "professional knowledge work" areas where competitors have recently gained ground. GPT-5.2 Thinking sets a new state-of-the-art score of 55.6% on SWE-bench Pro, a rigorous evaluation of real-world software engineering.

The performance comes at a premium: per-token prices are higher. OpenAI argues, however, that the model's greater token efficiency and its ability to solve tasks in fewer turns make it economically viable for high-value enterprise workflows.
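OpenAI's argument comes down to simple arithmetic: cost per task is price per token times total tokens used, so a model that costs more per token can still be cheaper per task if it needs fewer tokens and turns. The sketch below uses purely hypothetical prices and token counts (not OpenAI's published pricing) and a single blended price for input and output tokens, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRun:
    """One way of completing a task. All numbers used with it are hypothetical."""
    price_per_million_tokens: float  # USD, blended input/output price (simplification)
    tokens_per_turn: int
    turns: int

    def total_tokens(self) -> int:
        return self.tokens_per_turn * self.turns

    def cost(self) -> float:
        return self.total_tokens() * self.price_per_million_tokens / 1_000_000


# Hypothetical: a cheaper model that needs more turns vs. a pricier one that needs fewer.
cheaper_model = TaskRun(price_per_million_tokens=1.00, tokens_per_turn=50_000, turns=6)
premium_model = TaskRun(price_per_million_tokens=1.50, tokens_per_turn=40_000, turns=3)

if __name__ == "__main__":
    print(f"cheaper model: ${cheaper_model.cost():.2f}")  # prints $0.30 (300,000 tokens)
    print(f"premium model: ${premium_model.cost():.2f}")  # prints $0.18 (120,000 tokens)
```

In this made-up case the premium model charges 50% more per token but uses 60% fewer tokens, so it ends up cheaper per task.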

Refly.AI's Huang Wei: n8n and Coze (扣子) Are Too Hard to Use; Vibe Workflow Is the More Mainstream Solution

Refly.AI, a startup that aims to make workflow automation more accessible, has raised millions in seed funding from investors including Sequoia Capital China and Classin. The company positions itself as a "Vibe Workflow" solution that is more user-friendly than existing tools such as n8n and Coze (扣子). Its approach lets users create a workflow with just a few natural-language commands, drastically lowering the learning curve.

The core of Refly.AI's solution is to combine the dynamic nature of Agents with the controlled environment of traditional Workflows. Each node in their system is an Agent capable of executing tasks with the help of 2-3 tools, allowing for a more streamlined and stable workflow process. The company believes this approach effectively utilizes the capabilities of AI models while keeping the process user-friendly and accessible.
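The node design described above, a fixed workflow whose individual steps are small agents limited to two or three tools each, can be sketched as follows. This is a hypothetical illustration, not Refly.AI's code: the class names, the `plan` callable standing in for a model's tool-selection step, and the three-tool cap are assumptions drawn from the description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Tool = Callable[[str], str]

# Hypothetical cap mirroring the "2-3 tools per node" description.
MAX_TOOLS_PER_NODE = 3


@dataclass
class AgentNode:
    name: str
    tools: Dict[str, Tool]
    plan: Callable[[str], List[str]]  # stand-in for the model choosing tools

    def __post_init__(self) -> None:
        if len(self.tools) > MAX_TOOLS_PER_NODE:
            raise ValueError(f"{self.name}: at most {MAX_TOOLS_PER_NODE} tools per node")

    def run(self, data: str) -> str:
        # Agent-like flexibility: the node decides which of its tools to call.
        for tool_name in self.plan(data):
            if tool_name not in self.tools:
                raise ValueError(f"{self.name}: unknown tool {tool_name!r}")
            data = self.tools[tool_name](data)
        return data


def run_workflow(nodes: List[AgentNode], data: str) -> str:
    # Workflow-like control: nodes always execute in a fixed order.
    for node in nodes:
        data = node.run(data)
    return data
```

Keeping each node's toolset small limits what any single agent can do wrong, while the fixed node order keeps the overall process predictable.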

Refly.AI's vision is to create an environment where users can not only build workflows but also collect valuable action and behavior data that can be used to further refine the system. This data-driven approach aims to predict user needs and actions, leading to a more personalized and efficient workflow experience over time. The company's focus is on making workflows simple enough for a wider audience, including non-technical users, to leverage AI for complex tasks.


IDC MarketScape: SenseTime Named a Leader in China's AI Consulting Services Market

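IDC MarketScape's latest report places SenseTime in the Leaders category of China's AI consulting services market. Building on its SenseCore AI infrastructure, SenseTime can deliver end-to-end AI expert services spanning technology, business, management, and strategy, backed by mature delivery and algorithm-customization capabilities and a large portfolio of completed projects. Its three-in-one strategy of "SenseCore, large models, and applications" not only strengthens the company's overall competitiveness in AI but also brings enterprises higher-quality, lower-cost products and services.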

Financial AI Agents Enter Large-Scale Deployment; Ant Digital Technologies Named Overall Leader

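iResearch's "China Financial AI Agent Development Research and Vendor Evaluation Report (2025)" places Ant Digital Technologies in the Overall Leaders quadrant, recognizing its technological leadership and real-world deployment capability in financial AI agents. Ant Digital's "finance-native" foundation gives it differentiated strengths in core scenarios such as AI-native apps, wealth management, credit risk control, and intelligent marketing. Its proprietary "four-workshop" (四车间) engineering architecture addresses the "black-box decision-making" problem that traditional AI applications face in financial settings. Ant Digital has also shown foresight in business-model innovation: it is one of the few vendors able to explore the RaaS model at scale in core financial scenarios, helping financial institutions reduce upfront investment and the sunk costs of technology iteration while sharing both risks and returns.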

Hinton Says AI Already Has Consciousness

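AI pioneer Geoffrey Hinton recently stated that current AI systems already possess some form of consciousness. The claim has sparked wide discussion and controversy among academics and the public, highlighting the complexity of the ethical and philosophical questions surrounding AI.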

OpenAI Unveils GPT-5.2 to Counter Google's Gemini 3 Dominance, Claims Strongest Agent Coding Capabilities

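OpenAI has released GPT-5.2 to counter the market dominance of Google's Gemini 3, claiming that GPT-5.2 has the strongest agentic coding capabilities available. The release marks OpenAI's latest progress in AI, particularly its improved capabilities for professional knowledge work. CEO Sam Altman said that although Gemini 3 affected OpenAI less than expected, the company expects to exit its "Code Red" mode in January in a strong position and regain market leadership.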

Qualcomm's Wan Weixing: Hybrid AI and Distributed Collaboration Are the Future | MEET2026

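Speaking at MEET2026, Qualcomm's Wan Weixing said the industry is in a window of transition from generative AI to agentic AI. He believes hybrid AI and distributed collaboration represent the future direction of development and will have far-reaching effects on how AI technology evolves and is applied.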

Summary

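Today's AI news spans several areas. In mathematics, Nous Research's Nomos 1 performed at a near-elite human level on the Putnam competition, showing AI's strong potential for solving complex mathematical problems. In enterprise applications, OpenAI released GPT-5.2, a notable step forward for AI in professional knowledge work. In addition, the recognition of Ant Digital Technologies for financial AI agents and SenseTime for AI consulting services shows how broadly AI is being applied in finance and consulting. Together, these developments improve AI performance in specific domains and point to new directions for how AI will be developed and applied.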


Author: Qwen/Qwen2.5-32B-Instruct
Sources: GeekPark (极客公园), TMTPost (钛媒体), QbitAI (量子位), Leiphone (雷锋网), VentureBeat
Editor: 小康 (Xiaokang)

Theme Jasmine by Kent Liao