From shiny object to sober reality: The vector database story, two years later
When I first wrote “Vector databases: Shiny object syndrome and the case of a missing unicorn” in March 2024, the industry was awash in hype. Vector databases were positioned as the next big thing — a must-have infrastructure layer for the gen AI era. Billions of venture dollars flowed, developers rushed to integrate embeddings into their pipelines and analysts breathlessly tracked funding rounds for Pinecone, Weaviate, Chroma, Milvus and a dozen others.
The promise was intoxicating: Finally, a way to search by meaning rather than by brittle keywords. Just dump your enterprise knowledge into a vector store, connect an LLM and watch magic happen. Except the magic never fully materialized.
Two years on, the reality check has arrived: 95% of organizations invested in gen AI initiatives are seeing zero measurable returns. And many of the warnings I raised back then — about the limits of vectors, the crowded vendor landscape and the risks of treating vector databases as silver bullets — have played out almost exactly as predicted.
Prediction 1: The missing unicorn
Back then, I questioned whether Pinecone — the poster child of the category — would achieve unicorn status or whether it would become the “missing unicorn” of the database world. Today, that question has been answered in the most telling way possible: Pinecone is reportedly exploring a sale, struggling to break out amid fierce competition and customer churn.
Yes, Pinecone raised big rounds and signed marquee logos. But in practice, differentiation was thin. Open-source players like Milvus, Qdrant and Chroma undercut them on cost. Incumbents like Postgres (with pgVector) and Elasticsearch simply added vector support as a feature. And customers increasingly asked: “Why introduce a whole new database when my existing stack already does vectors well enough?”
The result: Pinecone, once valued near a billion dollars, is now looking for a home. The missing unicorn indeed.
In September 2025, Pinecone appointed Ash Ashutosh as CEO, with founder Edo Liberty moving to a chief scientist role. The timing is telling: The leadership change comes amid increasing pressure and questions over its long-term independence.
Prediction 2: Vectors alone won’t cut it
I also argued that vector databases by themselves were not an end solution. If your use case required exactness — like searching for “Error 221” in a manual — a pure vector search would gleefully serve up “Error 222” as “close enough.” Cute in a demo, catastrophic in production.
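To make that failure mode concrete, here is a toy sketch. Character-trigram count vectors stand in for learned embeddings, and the two manual pages are invented; the point is that to a similarity metric, "Error 221" and "Error 222" look nearly identical, while an exact lexical match separates them cleanly:

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n=3):
    """Character-trigram counts: a crude stand-in for an embedding."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm

docs = {
    "doc_a": "Error 221: printer offline, check the USB cable",
    "doc_b": "Error 222: printer out of paper, refill tray 1",
}
query = "Error 221"
q = char_ngrams(query)

# Similarity view: both pages look almost equally "close" to the query.
sims = {d: cosine(q, char_ngrams(t)) for d, t in docs.items()}

# Lexical view: only doc_a actually contains the exact error code.
exact = {d: query in t for d, t in docs.items()}
```

A real embedding model would behave the same way in spirit: the two error codes differ by one character, so their vectors land almost on top of each other.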
That tension between similarity and relevance has proven fatal to the myth of vector databases as all-purpose engines. Enterprises discovered the hard way that semantic ≠ correct.
Developers who gleefully swapped out lexical search for vectors quickly reintroduced… lexical search in conjunction with vectors. Teams that expected vectors to “just work” ended up bolting on metadata filtering, rerankers and hand-tuned rules.
By 2025, the consensus is clear: Vectors are powerful, but only as part of a hybrid stack.
Prediction 3: A crowded field becomes commoditized
The explosion of vector database startups was never sustainable. Weaviate, Milvus (via Zilliz), Chroma, Vespa, Qdrant — each claimed subtle differentiators, but to most buyers they all did the same thing: store vectors and retrieve nearest neighbors.
Today, very few of these players are breaking out. The market has fragmented, commoditized and in many ways been swallowed by incumbents. Vector search is now a checkbox feature in cloud data platforms, not a standalone moat.
Just as I wrote then: Distinguishing one vector DB from another will pose an increasing challenge. That challenge has only grown harder.
Vald, Marqo, LanceDB, PostgreSQL, MySQL HeatWave, Oracle 23c, Azure SQL, Cassandra, Redis, Neo4j, SingleStore, Elasticsearch, OpenSearch, Apache Solr… the list goes on.
The new reality: Hybrid and GraphRAG
But this isn’t just a story of decline — it’s a story of evolution. Out of the ashes of vector hype, new paradigms are emerging that combine the best of multiple approaches.
Hybrid Search: Keyword + vector is now the default for serious applications. Companies learned that you need both precision and fuzziness, exactness and semantics. Tools like Apache Solr, Elasticsearch, pgVector and Pinecone’s own “cascading retrieval” embrace this.
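A minimal sketch of one common way to combine the two result lists, Reciprocal Rank Fusion (RRF). The document ids and ranked lists below are invented; in a real system they would come from a BM25 query and an ANN index:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists of document ids."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]  # e.g. a BM25 result list (invented)
vector_hits = ["doc1", "doc5", "doc3"]   # e.g. an ANN result list (invented)
fused = rrf([keyword_hits, vector_hits])
```

Documents that rank well in both lists float to the top, which is why RRF-style fusion rewards results that are both lexically exact and semantically relevant.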
GraphRAG: The hottest buzzword of late 2024/2025 is GraphRAG — graph-enhanced retrieval augmented generation. By marrying vectors with knowledge graphs, GraphRAG encodes the relationships between entities that embeddings alone flatten away. The payoff is dramatic.
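A toy sketch of the core GraphRAG idea: a vector search picks seed entities, then a graph walk pulls in related facts that a flat embedding would miss. The entities, relations and the `expand` helper are all invented for illustration:

```python
# Toy knowledge graph: entity -> list of (relation, neighbor) edges. All invented.
graph = {
    "AcmeCorp": [("acquired", "BetaLabs"), ("ceo", "J. Doe")],
    "BetaLabs": [("develops", "VectorDB-X")],
    "J. Doe": [("founded", "GammaFund")],
}

def expand(seeds, hops=1):
    """Collect (head, relation, tail) facts reachable from seeds within `hops` hops."""
    facts, frontier = [], set(seeds)
    for _ in range(hops):
        next_frontier = set()
        for entity in frontier:
            for relation, neighbor in graph.get(entity, []):
                facts.append((entity, relation, neighbor))
                next_frontier.add(neighbor)
        frontier = next_frontier
    return facts

# Suppose vector search over the question surfaced "AcmeCorp" as the seed entity.
context = expand(["AcmeCorp"], hops=2)
```

The multi-hop fact "AcmeCorp acquired BetaLabs, which develops VectorDB-X" is exactly the kind of relationship a bag of embeddings flattens away but a graph traversal recovers.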
Benchmarks and evidence
Amazon’s AI blog cites benchmarks from Lettria, where hybrid GraphRAG boosted answer correctness from ~50% to 80%-plus in test datasets across finance, healthcare, industry, and law.
The GraphRAG-Bench benchmark (released May 2025) provides a rigorous evaluation of GraphRAG vs. vanilla RAG across reasoning tasks, multi-hop queries and domain challenges.
An OpenReview evaluation of RAG vs. GraphRAG found that each approach has strengths depending on task — but hybrid combinations often perform best.
FalkorDB’s blog reports that when schema precision matters (structured domains), GraphRAG can outperform vector retrieval by a factor of ~3.4x on certain benchmarks.
The rise of GraphRAG underscores the larger point: Retrieval is not about any single shiny object. It’s about building retrieval systems — layered, hybrid, context-aware pipelines that give LLMs the right information, with the right precision, at the right time.
What this means going forward
The verdict is in: Vector databases were never the miracle. They were a step — an important one — in the evolution of search and retrieval. But they are not, and never were, the endgame.
The winners in this space won’t be those who sell vectors as a standalone database. They will be the ones who embed vector search into broader ecosystems — integrating graphs, metadata, rules and context engineering into cohesive platforms.
In other words: The unicorn isn’t the vector database. The unicorn is the retrieval stack.
Looking ahead: What’s next
Unified data platforms will subsume vector + graph: Expect major DB and cloud vendors to offer integrated retrieval stacks (vector + graph + full-text) as built-in capabilities.
“Retrieval engineering” will emerge as a distinct discipline: Just as MLOps matured, so too will practices around embedding tuning, hybrid ranking and graph construction.
Meta-models learning to query better: Future LLMs may learn to orchestrate which retrieval method to use per query, dynamically adjusting weighting.
Temporal and multimodal GraphRAG: Already, researchers are extending GraphRAG to be time-aware (T-GRAG) and multimodally unified (e.g. connecting images, text, video).
Open benchmarks and abstraction layers: Tools like BenchmarkQED (for RAG benchmarking) and GraphRAG-Bench will push the community toward fairer, comparably measured systems.
From shiny objects to essential infrastructure
The arc of the vector database story has followed a classic path: A pervasive hype cycle, followed by introspection, correction and maturation. In 2025, vector search is no longer the shiny object everyone pursues blindly — it’s now a critical building block within a more sophisticated, multi-pronged retrieval architecture.
The original warnings were right. Pure vector-based hopes often crash on the shoals of precision, relational complexity and enterprise constraints. Yet the technology was never wasted: It forced the industry to rethink retrieval, blending semantic, lexical and relational strategies.
If I were to write a sequel in 2027, I suspect it would frame vector databases not as unicorns, but as legacy infrastructure — foundational, but eclipsed by smarter orchestration layers, adaptive retrieval controllers and AI systems that dynamically choose which retrieval tool fits the query.
As of now, the real battle is not vector vs keyword — it’s the indirection, blending and discipline in building retrieval pipelines that reliably ground gen AI in facts and domain knowledge. That’s the unicorn we should be chasing now.
Amit Verma is head of engineering and AI Labs at Neuron7.
Arm China launches an NPU aiming for a 10x boost in AIGC compute
As large AI models move down from the cloud to edge and on-device deployment, an arms race over on-device AI compute has begun. From smartphones and AI PCs to smart vehicles, consumer demand for running AIGC locally on the device is growing explosively. Yet running models with billions of parameters efficiently on devices with strict power, thermal and cost constraints confronts the entire industry with serious challenges: limited compute, stringent energy-efficiency requirements and bandwidth bottlenecks.
Against this backdrop, on November 13 Arm China (安谋科技) officially launched the "Zhouyi" (周易) X3 NPU IP in Shanghai. It is the first flagship product since Arm China committed to an "All in AI" product strategy, and is seen as a key step in executing its "AI Arm CHINA" strategy. Arm China makes no secret of its goal: to tackle head-on the challenges of running large AI models on-device and set a new benchmark for compute efficiency.
Built for Transformers and floating-point compute
A consensus in the semiconductor IP industry is that product development must plan five years ahead. Liu Hao, Arm China's VP of product R&D, stressed this point at the launch event, saying the company will keep increasing its investment, integrate top R&D resources with a forward-looking view and, in a spirit of open collaboration, provide partners with end-to-end solutions spanning hardware and software.
The "Zhouyi" X3 is a product of that forward planning. Dr. Shu Hao, head and chief architect of Arm China's NPU product line, attributes the X3's strengths to its "general-purpose, flexible, efficient, hardware-software co-designed system architecture."
That foresight shows first in the architecture. The "Zhouyi" X3 adopts a new DSP+DSA architecture built specifically for large models. From the outset, its design reflects how AI models have evolved: a wholesale shift from traditional CNNs (convolutional neural networks) to the Transformer, the architecture underlying today's large models.
The X3 therefore uses a general-purpose design that accommodates both CNNs and Transformers, letting it handle traditional AI workloads efficiently while also meeting the on-device demands of generative AI, agentic AI and physical AI over the next several years.
Another key shift the new architecture brings is strong support for floating-point arithmetic. Traditional AI workloads (such as security cameras) mostly rely on fixed-point compute, whereas large-model inference depends heavily on floating-point (FP) operations. The X3 comprehensively strengthens its floating-point (FLOPS) capability, supporting the pivotal move from fixed-point to floating-point compute and laying the technical foundation for hosting large models.
Decoding the 10x AIGC compute gain
If the architecture is the blueprint, the performance numbers are the most tangible result. Compared with the previous generation, the "Zhouyi" X3 delivers up to a 10x gain in AIGC large-model capability. This leap is not the work of any single factor: it is driven jointly by 16x the FP16 TFLOPS (trillions of half-precision floating-point operations per second), 4x the compute-core bandwidth and more than 10x faster Softmax and LayerNorm (both key large-model operators).
In concrete terms, a single "Zhouyi" X3 cluster supports up to four cores, delivers a configurable 8 to 80 FP8 TFLOPS (trillions of 8-bit floating-point operations per second) and offers per-core bandwidth of up to 256 GB/s. Even on traditional CNN models, performance improves 30% to 50% over the X2.
For large models, however, peak compute (TFLOPS) is only the ticket to entry. The real challenge is compute utilization: how much of that compute is actually put to work during inference.
The "Zhouyi" X3 NPU IP launch event | Image source: Arm China
Arm China shared measured results on the Llama2 7B (7-billion-parameter) model: in the prefill stage (processing the prompt), the "Zhouyi" X3 reaches 72% compute utilization. That figure is well above the industry average, meaning the NPU is not idling while it processes user input.
The decode (token-generation) numbers are even more striking. Arm China claims that, aided by its in-house decompression hardware WDC, the X3 achieves "over 100% effective bandwidth utilization in the decode stage."
"Effective bandwidth over 100%" sounds paradoxical, but behind it is Arm China's signature weapon against the on-device bandwidth bottleneck. The in-house decompression hardware, called WDC, allows large-model weights to be stored in a losslessly compressed form produced in software; when the NPU needs those weights for computation, WDC decompresses them in real time. The process is transparent to software yet delivers a 15% to 20% gain in effective bandwidth. In other words, it lets limited physical bandwidth carry more data than its physical limit would suggest, satisfying the decode stage's hunger for throughput.
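The WDC hardware itself is proprietary, but the underlying principle (store weights losslessly compressed, decompress on the fly, gain effective bandwidth equal to the compression ratio) can be sketched with a stock codec. The zlib stand-in and the toy weight pattern below are assumptions for illustration, not Arm China's actual scheme:

```python
import struct
import zlib

# Invented stand-in for a weight tensor: quantized weights are often
# low-entropy, which is exactly what makes lossless compression pay off.
weights = [0, 1, 0, 2, 1, 0, 0, 3] * 4096
raw = struct.pack(f"{len(weights)}b", *weights)  # 1 byte per weight

compressed = zlib.compress(raw, level=9)
ratio = len(raw) / len(compressed)

# If memory delivers bw bytes/s of compressed weights, the compute units
# effectively receive bw * ratio bytes/s after on-the-fly decompression.
assert zlib.decompress(compressed) == raw  # lossless round trip
```

Real model weights compress far less than this repetitive toy pattern, which is why the claimed gain is a modest 15% to 20% rather than a large multiple.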
To help cloud-scale large models migrate efficiently to the edge, the "Zhouyi" X3 also integrates several key architectural innovations. It adds W4A8/W4A16 (4-bit weight, 8/16-bit activation) accelerated compute modes; this low-bit quantization sharply reduces a model's bandwidth consumption. It also provides exceptionally broad mixed-precision compute support, covering int4, int8, int16, int32, fp4, fp8, fp16, bf16, fp32 and virtually every other mainstream data type, so it can flexibly balance performance and energy efficiency across workloads from traditional CNNs to frontier large models.
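As a rough illustration of what "4-bit weights" means, here is a minimal symmetric quantizer sketch in plain Python. The scaling rule and sample values are invented, not the X3's actual W4A8 implementation:

```python
def quantize_w4(weights):
    """Symmetric 4-bit quantization: floats -> ints in [-7, 7] plus one fp scale."""
    scale = max(abs(w) for w in weights) / 7.0  # assumes not all weights are zero
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.12, -0.54, 0.33, 0.91, -0.07, 0.48]  # invented fp32 weights
q, scale = quantize_w4(w)
w_hat = dequantize(q, scale)

# Rounding error is bounded by half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

Each weight now needs 4 bits instead of 32, an 8x reduction in the bytes that must cross the memory bus, which is the bandwidth saving the article describes.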
In addition, the X3 integrates a dedicated AI hardware engine, AIFF (AI Fixed-Function), and a hardened scheduler. In scenarios such as smart cockpits and ADAS, which demand multi-task parallelism and fast high-priority responses, this design is critical: it cuts the CPU load of AI task scheduling to 0.5%, freeing precious CPU cycles for other system tasks while ensuring that high-priority AI tasks (such as collision warnings) respond immediately.
Taking AI development from "usable" to "well used"
The "Zhouyi" X3 is more than a high-performance hardware IP; it ships with a software platform called "Compass AI." Bao Minqi, Arm China's product director, said the X3 follows the principle of "hardware-software co-design, full-lifecycle service and customer success," aiming to provide end-to-end support from hardware and software through after-sales service.
In real-world AI deployment, software development has long been a pain point: adaptation is hard, cycles are long and the barrier to entry is high. The goal of the "Compass AI" platform is to use hardware-software co-design to take developers from "usable" to "well used."
The "Compass AI" software platform | Image source: Arm China
At the platform's core is the NN Compiler (neural network compiler). It supports mainstream AI frameworks including TensorFlow, ONNX and PyTorch, and is compatible with more than 160 operators and 270 models.
For today's booming large-model ecosystem, the "Compass AI" platform offers an especially attractive capability: through its AIPULLM toolchain, it directly supports models in Hugging Face format for one-stop conversion and deployment. Hugging Face is the world's largest hub for AI models, so developers can bring cutting-edge community models onto the "Zhouyi" X3 with a very low barrier to entry.
The platform also has advanced model-inference optimization capabilities, including industry-leading dynamic-shape support for large models (efficiently handling input sequences of arbitrary length), and supports
Sources: VentureBeat, QbitAI (量子位), TMTPost (钛媒体), GeekPark (极客公园)
Editor: Xiaokang