Nvidia's agentic AI push; Snowflake cuts inference costs
Welcome to Runtime! Today on Product Saturday: Nvidia and Snowflake try to get more enterprises on the AI train by focusing on safety and costs, and the quote of the week.
(Was this email forwarded to you? Sign up here to get Runtime each week.)
Ship it
NIMble AI: After two years of hype, 2025 is shaping up as a year where enterprise tech vendors are going to pull out all the stops in hopes of convincing businesses to adopt generative AI. Nvidia has as much riding on the continued expansion of the AI boom as anybody, and this week it introduced three new microservices it believes will help enterprises get over the hallucination problem.
The new NIM microservices are part of Nvidia's NeMo Guardrails software, and aim to help users improve how their AI applications handle content safety, topic control, and jailbreak attempts, according to a press release. "By applying multiple lightweight, specialized models as guardrails, developers can cover gaps that may occur when only more general global policies and protections exist — as a one-size-fits-all approach doesn’t properly secure and control complex agentic AI workflows," Nvidia argued.
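That layered-guardrail pattern can be sketched in plain Python. The checks below are toy keyword heuristics standing in for the specialized models Nvidia ships as microservices; the function names and marker lists are our own illustration, not Nvidia's API:

```python
# Toy sketch of layered guardrails: each check is a small, specialized
# filter run before a prompt reaches the model. Keyword matching here
# is a placeholder for the lightweight models Nvidia describes.

JAILBREAK_MARKERS = ("ignore previous instructions", "pretend you have no rules")
ALLOWED_TOPICS = ("billing", "claims", "coverage")

def content_safety(prompt: str) -> bool:
    """Block obviously unsafe content (placeholder keyword check)."""
    return "how to build a weapon" not in prompt.lower()

def topic_control(prompt: str) -> bool:
    """Keep the conversation on approved topics."""
    return any(topic in prompt.lower() for topic in ALLOWED_TOPICS)

def jailbreak_detection(prompt: str) -> bool:
    """Flag common prompt-injection phrasings."""
    return not any(marker in prompt.lower() for marker in JAILBREAK_MARKERS)

GUARDRAILS = (content_safety, topic_control, jailbreak_detection)

def check_prompt(prompt: str) -> tuple[bool, list[str]]:
    """Run every guardrail; return the overall verdict plus any failures."""
    failures = [guard.__name__ for guard in GUARDRAILS if not guard(prompt)]
    return (not failures, failures)
```

The point of running several narrow checks rather than one broad policy is visible in the return value: each specialized filter reports its own failure, so gaps in one layer don't hide behind another.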
Generative AI applications have introduced a new problem for security professionals; the application logic in traditional applications doesn't retain data, but that's not the case with GenAI. "The model retains and transforms the data, creating an entirely new layer in the stack that requires careful consideration and protection," Cisco's Tom Gillis told CSO.
Memory loss (positive): After companies decide that their generative AI applications are safe and secure, they've got to pay the bill. AI inference costs will have to come down if GenAI is going to be widely adopted, AWS's Swami Sivasubramanian told Runtime last month, and this week Snowflake open-sourced a tool that promises to do just that by cutting down the number of times GenAI apps need to reload information.
SwiftKV creates a new cache for storing inputs to enterprise GenAI apps by reusing "the hidden states of earlier transformer layers to generate a KV (key-value) cache for later layers," which Snowflake said it can do without compromising accuracy. Omdia analyst Bradley Shimmin isn't sure it will be that easy; as InfoWorld reported, "despite Snowflake’s claims of minimal accuracy loss of SwiftKV-optimized LLMs, Shimmin warned that there could be tradeoffs in terms of how complex they are to implement, how much they degrade capability, and how compatible they are with the underlying inferencing architecture."
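The core trick — skipping later layers during prefill and deriving their KV cache from an earlier layer's hidden state — can be illustrated with a toy model. Everything here (the layer math, the cutoff, the projection) is a stand-in for the real transformer computation, not Snowflake's code:

```python
# Toy illustration of the SwiftKV idea: during prefill, layers past a
# cutoff skip their full forward pass and instead derive their KV cache
# from the last computed hidden state via a cheap projection.

NUM_LAYERS = 8
SWIFTKV_CUTOFF = 4  # layers >= cutoff reuse the cutoff layer's hidden state

def forward_layer(hidden: float) -> float:
    """Full transformer layer -- the expensive computation being skipped."""
    return hidden * 1.1 + 0.1

def kv_from(hidden: float) -> tuple[float, float]:
    """Produce a (key, value) pair from a hidden state."""
    return (hidden * 2.0, hidden + 1.0)

def project(hidden: float) -> float:
    """Cheap learned projection standing in for SwiftKV's KV mapping."""
    return hidden * 0.5

def prefill(x: float, swiftkv: bool) -> tuple[list, int]:
    """Build a KV cache for every layer; count full-layer computations."""
    kv_cache, full_layer_calls, hidden = [], 0, x
    for layer in range(NUM_LAYERS):
        if swiftkv and layer >= SWIFTKV_CUTOFF:
            # Reuse the earlier hidden state instead of running this layer.
            kv_cache.append(kv_from(project(hidden)))
        else:
            hidden = forward_layer(hidden)
            full_layer_calls += 1
            kv_cache.append(kv_from(hidden))
    return kv_cache, full_layer_calls
```

Every layer still gets a KV cache entry, but only half the layers run their full forward pass — which is where the claimed inference savings come from, and why the accuracy of the cheap projection is exactly the tradeoff Shimmin flags.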
Faster and cheaper: Data lakehouse vendor Onehouse doesn't get as much attention as Snowflake and Databricks, but it has played a key role in the evolution of the category through its stewardship of projects like Hudi. On Thursday the company introduced the Onehouse Compute Runtime (!), which it said would deliver 30X faster queries and 10X faster writes.
OCR is a compute engine that was designed to read and write across open formats like Hudi, Iceberg, and Delta Lake, which are quickly gaining traction as enterprises seek greater control of their data and lower storage costs. "There has been an ongoing gap in the industry, where many vendors have simply adapted their existing engines to read and write from open table formats, which is a great start, but we believe we can go deeper," Onehouse founder and CEO Vinoth Chandar told VentureBeat.
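The concept underlying all three table formats — immutable data files tracked by a commit log that any engine can read the same way — can be sketched with nothing but the standard library. Real formats use Parquet data files and far richer metadata; JSON keeps the sketch readable:

```python
# Toy sketch of the open-table-format idea behind Hudi, Iceberg, and
# Delta Lake: each commit writes an immutable data file and a new
# snapshot listing every live file, so any engine can find the table's
# current state without coordinating with the writer.
import json
import os

def init_table(table_dir: str) -> None:
    os.makedirs(os.path.join(table_dir, "data"))
    os.makedirs(os.path.join(table_dir, "log"))

def read_manifest(table_dir: str, version: int) -> list[str]:
    with open(os.path.join(table_dir, "log", f"v{version}.json")) as f:
        return json.load(f)["files"]

def commit(table_dir: str, rows: list[dict]) -> int:
    """Write an immutable data file and a new snapshot of the file list."""
    log_dir = os.path.join(table_dir, "log")
    version = len(os.listdir(log_dir))
    data_file = os.path.join("data", f"part-{version}.json")
    with open(os.path.join(table_dir, data_file), "w") as f:
        json.dump(rows, f)
    files = (read_manifest(table_dir, version - 1) if version else []) + [data_file]
    with open(os.path.join(log_dir, f"v{version}.json"), "w") as f:
        json.dump({"files": files}, f)
    return version

def scan(table_dir: str) -> list[dict]:
    """Any engine reads the latest snapshot the same way."""
    latest = len(os.listdir(os.path.join(table_dir, "log"))) - 1
    rows = []
    for data_file in read_manifest(table_dir, latest):
        with open(os.path.join(table_dir, data_file)) as f:
            rows.extend(json.load(f))
    return rows
```

Because the data files and the log are just files in shared storage, the compute engine is interchangeable — which is the gap Chandar says most vendors have only partially filled.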
RAG agents? RAG agents: It's hard to think of two AI-related terms more widely discussed in recent months than "RAG" — or retrieval-augmented generation, which helps reduce AI model hallucinations — and "agents," which we've all talked about ad nauseam. Contextual AI, founded by one of the creators of the RAG technique, Douwe Kiela, thinks 2025 is the year those terms converge.
This week the startup announced the general availability of its Contextual AI Platform, which "enables AI teams to rapidly launch specialized RAG agents that are capable of active retrieval and test-time reasoning over massive volumes of structured and unstructured enterprise data," it said in a blog post. Constellation Research's Andy Thurai told SiliconAngle that “Contextual AI’s niche is building domain-specific, highly specialized RAG agents that will perform high-value tasks for enterprises, but doing so will be a challenge” because of concerns about accuracy.
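Stripped to its essentials, RAG is a two-step loop: retrieve the documents most relevant to a query, then ground the model's prompt in them. The sketch below uses word overlap in place of vector search and invented documents; Contextual AI's platform layers retrieval strategies and test-time reasoning on top of this basic shape:

```python
# Minimal retrieval-augmented generation loop (toy version: keyword
# overlap stands in for vector search, and the documents are made up).

DOCS = {
    "policy": "Claims must be filed within 30 days of the incident.",
    "billing": "Premiums are due on the first of each month.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query, keep the top k."""
    query_words = set(query.lower().split())
    ranked = sorted(
        DOCS.values(),
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Ground the model's answer in the retrieved context."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The accuracy concern Thurai raises lives mostly in the first step: if retrieval surfaces the wrong documents, the grounded prompt grounds the model in the wrong facts.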
"Any time when you have a shift, there will be winners who emerge. There will also be a period of consolidation, and you can have this period where there's a lot of hype and there's a lot of trying to figure out which parts are going to be real and which parts are going to stick." — Igor Ostrovsky, co-founder of Augment Code, discussing changes in the market for code editors, which has been dominated by Microsoft's Visual Studio Code for nearly a decade but is getting a fresh look thanks to generative AI.
The Runtime roundup
ServiceNow acquired Cuein, which was working on AI tools for customer-service applications, for an undisclosed amount.
While it might not matter come this time next week, the FTC released a report Friday suggesting that Microsoft and AWS's deals with OpenAI and Anthropic, respectively, "can create lock-in, deprive startups of key AI inputs, and reveal sensitive information that can undermine fair competition."
Tom Krazit has covered the technology industry for over 20 years, focused on enterprise technology during the rise of cloud computing over the last ten years at Gigaom, Structure and Protocol.