Today: Vercel unveils a new serverless computing architecture that's better equipped to manage idle resources, nobody knows what Elon Musk's minions are doing to the federal government's servers, and the latest funding rounds in enterprise tech.
Why Vercel overhauled its serverless infrastructure for the AI era
Vercel's serverless infrastructure was designed at a time when speed was the most important goal. AI apps are a little different, and Fluid Compute is an effort to rebuild that infrastructure for the AI era.
As companies struggle to deploy apps built around large language models, they're also exposing inefficiencies in cloud tools that were designed to run older workloads. After rewriting the infrastructure beneath its serverless computing service last year, Vercel is ready to shift its customers onto a new platform that will make it cheaper to run AI apps.
Fluid Compute is a new architecture for Vercel Functions designed to eliminate the idle period when an AI app is waiting for a model to answer a question, a wait that can stretch to seconds or even minutes on computing infrastructure accustomed to operating in milliseconds, and that costs real money. In an exclusive interview with Runtime, Vercel co-founder and CEO Guillermo Rauch described Fluid Compute as the natural evolution of serverless computing.
"Fluid Compute sets out to fix serverless for the AI era," Rauch said. It's an acknowledgement that tried-and-true computing infrastructure strategies can change very quickly when something like generative AI comes along, a broader topic that I'll be discussing with Rauch at the HumanX conference in March alongside fellow panelists Andrew Feldman of Cerebras, Robert Nishihara of Anyscale, and Sharon Zhou of Lamini AI.
Vercel, which according to Crunchbase has raised $563 million in funding, is primarily known for its web application development platform. Developers use Vercel's open-source Next.js framework and managed infrastructure services to quickly launch and run cloud apps without having to provision and configure their own hardware.
However, Vercel originally designed the infrastructure that powers its managed computing services to run traditional web apps. Fluid Compute is an effort to rebuild that infrastructure to handle AI apps efficiently without changing anything about the way non-AI apps run.
AWS introduced the principles behind serverless computing back in 2014 with the launch of Lambda. Apps built around Lambda and other serverless development platforms use functions that execute distinct tasks in response to external triggers, which allows computing resources to spin up and shut down very quickly.
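For readers who haven't worked with these platforms, the shape is roughly this (a generic sketch in the Lambda style, not any vendor's exact API; the handler name and event fields are illustrative):

```typescript
// Generic Lambda-style handler: the platform invokes it once per trigger
// (an HTTP request, a queue message, a file upload) and tears the compute
// down when it returns. The event shape here is illustrative.
export async function handler(event: { body?: string }) {
  return {
    statusCode: 200,
    body: JSON.stringify({ received: event.body ?? null }),
  };
}
```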
At that time developers were obsessed with speed, having realized that their users and customers wouldn't tolerate sites and apps that ran even 100 milliseconds or so slower than what they expected, and "we optimized the world's compute for that [problem]," Rauch said. Vercel's managed infrastructure runs on AWS and the company works closely with its Lambda team.
But as Vercel's customers started using the serverless platform to build AI apps, they realized they were wasting computing resources while awaiting a response from the model. Traditional servers understand how to manage idle resources, but in serverless platforms like Vercel's "the problem is that you have that computer just waiting for a very long time and while you're claiming that space of memory, the customer is indeed paying," Rauch said.
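In code, the waste looks something like this (a sketch with a made-up model endpoint, not Vercel's API): the function holds its reserved memory for the full duration of the await even though the CPU is idle.

```typescript
// Illustrative only: the instance is blocked (and billed) for the entire
// await, even though it does no computation while the model thinks.
// The endpoint and payload are hypothetical.
export async function handler(event: { prompt: string }) {
  const res = await fetch("https://llm.example.com/v1/generate", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ prompt: event.prompt }),
  });
  // One instance, one request: seconds or minutes of waiting are billed
  // against this single invocation's reserved memory.
  return { statusCode: 200, body: await res.text() };
}
```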
Fluid Compute gets around this problem by introducing what the company is calling "in-function concurrency," which "allows a single instance to handle multiple invocations by utilizing idle time spent waiting for backend responses," Vercel said in a blog post last October announcing a beta version of the technology. "Basically, you're treating it more like a server when you need it," Rauch said.
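Conceptually, that resembles an ordinary async Node server, where one process interleaves many in-flight requests instead of dedicating an instance to each. Here's a rough sketch of the idea, not Vercel's implementation; the model endpoint is again hypothetical:

```typescript
import { createServer } from "node:http";

// One long-lived instance, many concurrent invocations: while one request
// awaits the model, the event loop is free to accept and serve others,
// soaking up time that would otherwise be billed as idle.
createServer(async (req, res) => {
  const upstream = await fetch("https://llm.example.com/v1/generate", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ prompt: req.url }),
  });
  res.end(await upstream.text());
}).listen(3000);
```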
Suno was one of Fluid Compute's beta testers, and saw "upwards of 40% cost savings on function workloads," Rauch said. Depending on the app, other customers could see even greater savings without having to change their app's configuration, he said.
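To see where savings on that scale can come from, consider an invented example (these numbers are mine, not Suno's): if an invocation spends most of its wall-clock time waiting, sharing the instance across several invocations divides the billed idle time among them.

```typescript
// Back-of-the-envelope math with hypothetical numbers.
const wallSeconds = 10;  // total invocation time, mostly waiting on the model
const memoryGb = 1;      // memory reserved by the instance
const concurrency = 8;   // invocations sharing one instance

// Classic serverless: each invocation is billed for the whole wait.
const perInvocationClassic = wallSeconds * memoryGb;                 // 10 GB-s
// Shared instance: the same GB-seconds are split across invocations.
const perInvocationShared = (wallSeconds * memoryGb) / concurrency;  // 1.25 GB-s

console.log({ perInvocationClassic, perInvocationShared });
```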
Back is the new front
Fluid Compute was designed to work with Node.js and Python applications, which are two of the most widely used frameworks and programming languages (respectively) among professional developers surveyed by Stack Overflow. Cloudflare Workers is a rival serverless computing platform that uses a similar technique to deal with idle requests more efficiently, but it is based on a different runtime, and Node.js developers have to implement a few workarounds to get their apps to run.
Rauch is hopeful that Fluid Compute will reduce the number of customers shocked by the size of their Vercel bills after their apps went viral or saw an unexpected surge in demand. That experience has been even more painful for AI app developers, who found they were paying more than they expected to serve their users with a slow app.
"Fluid addresses a huge percentage of those cases," Rauch said. "Developers felt like they weren't in control of that back end becoming slower, and Fluid brings into a world of predictability where you're concerned about what you do control, which is your code and the things that you ship."
The new platform could also make Vercel a more interesting option for larger enterprises that like the principles of serverless computing but need to make sure they're operating as efficiently as possible.
"Typically, Vercel has been seen by many as for front-end workloads." Rauch said. "With Fluid, you can run any kind of back-end workload as long as it's in those runtimes like Node and Python. It's not just the ability to run it, but to do so efficiently."
Tom Krazit has covered the technology industry for over 20 years, and has focused on enterprise technology during the rise of cloud computing over the last ten years at Gigaom, Structure and Protocol.