Swami Sivasubramanian: Why AWS thinks enterprise GenAI is finally ready to take off

Most enterprises are still struggling to turn their generative AI experiments into actual production applications. They've done almost 99% of the work, but the last 1% has proven much harder than anticipated, according to AWS's Swami Sivasubramanian.

AWS vice president of AI and data Swami Sivasubramanian speaks at re:Invent 2024. (Credit: AWS)

Former AWS CEO Adam Selipsky liked to compare his company's position at the onset of the enterprise generative AI frenzy to a runner taking their first steps in a long race. A year ago the analogy was a little hard to accept at face value given that it appeared Microsoft was several steps into the race while AWS and Google Cloud were still tying their shoes, but as it turned out, everybody was actually still warming up.

As 2024 winds down, most enterprises are still struggling to turn their generative AI experiments into actual production applications that can deliver an acceptable return on investment while scaling to meet their needs. They've done almost 99% of the work needed to accomplish that goal, but the last 1% has proven much harder than anticipated, said Swami Sivasubramanian, vice president for AI and data services at AWS.

Swami (as he is universally known) was one of the most prominent faces of AWS's AI strategy long before anyone was talking about generative AI. Over the last two years he has led the charge to develop the tools and platforms that AWS thinks its customers need to capitalize on the technology that many still believe will create a seismic shift in the history of enterprise software.

In an interview at AWS re:Invent last week, the afternoon after his keynote address, Swami discussed the obstacles that have prevented enterprise generative AI applications from really making an impact, why Amazon decided it needed to develop world-class general-purpose AI models, and the rise of industry-specific AI models.

This interview has been edited and condensed for clarity.

This was a year where enterprises were using generative AI in proof of concepts and starting to experiment with deploying in production, but most of the folks I talked to were not quite ready to take that final step. What is the biggest obstacle in front of companies that want to do generative AI but just can't get through the last-mile problem?

You're starting to see a trend of customers moving from prototypes; I called 2023 the year of proof of concepts, and 2024 is them getting to production and realizing, "oh, there are actually a lot of obstacles that we need to solve."

Some of them are like, how do I run inference at scale? How do I customize my data and improve the relevancy of RAG [retrieval-augmented generation]? Then all my data is not actually ready for RAG today, and then, of course, my guardrails don't work very well for multimodal content. These are some of the biggest roadblocks, especially as we started working with many of these customers.

Hallucination is a real big fear. They actually have something almost 99% there, and then they used to tell me the last one percent turns out to be the longest, because I can't afford to get this wrong.

The interesting thing is everybody talks about RAG as a universal panacea, but the reality is it did not work with structured data, and it did not work with multimodal data. Now all your data lakes and data warehouses are ready for RAG, and it just suddenly changes the game for them.

And then final one is, especially in regulated industries, hallucination is a real big fear. They actually have something almost 99% there, and then they used to tell me the last one percent turns out to be the longest, because I can't afford to get this wrong.

I have a team of amazing automated reasoning scientists; these are people who love to prove math theorems. They live in a world that is binary, [but] GenAI is all about probability. So we said, why not get these two to talk to each other and marry them to create an automated reasoning check guardrail? That, I think, is a game changer; these are examples of what it takes to really move to production in a very, very solid way.
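The idea of pairing probabilistic generation with binary checks can be sketched very roughly in code. The following is an illustrative toy, not AWS's Automated Reasoning checks: the rule set, field names, and refund policy are invented for the example. The model's generated answer is treated as structured data and validated against hard rules before it reaches the user.

```python
# Toy sketch of a deterministic guardrail over probabilistic model output.
# The rules and answer fields below are hypothetical, for illustration only.

RULES = [
    # Each rule pairs a description with a binary predicate over the answer.
    ("refund must not exceed purchase price",
     lambda a: a["refund"] <= a["purchase_price"]),
    ("refund window is 30 days",
     lambda a: a["days_since_purchase"] <= 30 or a["refund"] == 0),
]

def check_answer(answer: dict) -> list[str]:
    """Return the rules a generated answer violates (empty list = passes)."""
    return [desc for desc, pred in RULES if not pred(answer)]

# A hallucinated answer that promises too large a refund gets caught:
violations = check_answer(
    {"refund": 120.0, "purchase_price": 100.0, "days_since_purchase": 10}
)
print(violations)  # ['refund must not exceed purchase price']
```

The point of the pattern is that the check is binary and exhaustive over its rules: a probabilistic model can hallucinate, but the validation layer cannot.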

Why is it important that Amazon and AWS have their own version of these AI models, when there are companies that are entirely dedicated to working on frontier models?

We actually have so much internal knowledge on the demands and the pain points it takes to build these GenAI applications at scale, between having built so many of these internally within Amazon, from the shopping assistant and the customer-service assistant to Seller Central to Amazon Ads, let alone what we are doing in AWS with Amazon Q.

Swami speaks at re:Invent 2023. (Credit: AWS)

We do think it's important in the same way … I built [Amazon] RDS as well. While we actually supported all major databases, we thought it was important that we take the learnings of customers and then reinvent databases for the cloud with Aurora and Dynamo. Our strategy here is the same, because especially as we work with more and more customers, we think it's important that we actually continue to double down on making sure we meet customers on their pain points and actually make sure these models work.

And this is an area where, again, we don't think that one model is going to rule the world, just like RDS has Aurora but we also have other [database] engines doing exceptionally well. The same is going to be true here. We think Nova is going to be extremely popular.

This space is simply too early. When we launched SageMaker, I used to get [questions] saying like, "why are you actually launching SageMaker? Everybody needs only a TensorFlow service." And at that time, I used to tell them that space is super early, and [using] deep-learning frameworks in the long run is not the big question: It is about the ability to build models faster.

Last year at re:Invent you and I were talking about model consolidation, and about how most of us agree at this point that there's not going to be one model, but there's not going to be infinite models either, right? And at the time last year, you were thinking there would be consolidation in this space over the next couple of years. Has that timeline accelerated? Has it moved out more to the future?

I think I would put them in two categories. One is the world of generalized models, which actually are capable of handling a wide variety of use cases. I do think that just like in the database world, you're going to see like a dozen or more — no more probably — actively working on it in the fullness of time. But I don't know when that fullness of time is, because GenAI time is different from traditional enterprise time (laughs).

But the interesting fact compared to what was happening last year is I actually think there is an evolving world of highly specialized and emerging models. Like EvolutionaryScale as an example: The tens of thousands, let alone hundreds of thousands of customers who use our AI/ML services may not be interested in EvolutionaryScale, except for people who are in the healthcare or the sustainability space. For them, EvolutionaryScale's models are very, very interesting. So we wanted to make sure those customers can leverage the ESM3 model, but for those customers, they still love all the rest of Bedrock's capabilities. That's why we made Bedrock Marketplace, as an example.

I do think you're going to start seeing very industry-specific frontier models and horizontal use-case specific models, like what IBM Granite does with time series. These [models] are less about broad-scale general intelligence and more about, "Hey, I care about this area, this vertical or that horizontal." That's why we built Bedrock Marketplace.

I was talking to Dave Brown [AWS vice president of compute and networking services] yesterday about how there's been a longer tail for some of the older AI chips than maybe folks would have thought. There are always going to be some companies that want the biggest, most powerful, most capable thing, but is something similar evolving with the model space, where folks are going to want to use older model generations, which obviously would be cheaper to operate?

Absolutely, but the short answer is, right now, it actually requires a huge amount of engineering work.

Right now, if you see what a typical development project in the GenAI world is like, product managers and developers end up profiling, let's say, 200 sample tasks or use cases, along with the prompts. Then they find out which kinds of prompts need to go to the biggest model, which ones go to the medium one, and which go to the small one. Then they actually do some kind of simple rule engine to route this.

I noticed this trend where people loved when they built the demo on the big model, and then they didn't like the bill.

This model selection ends up taking something like two to three months before they could actually get something, just to pick the model and then the routing engine. You're right that there is a really good set of use cases where you don't need the intelligence of the big model; I noticed this trend where people loved when they built the demo on the big model, and then they didn't like the bill. Then they actually ended up saying, okay, for these use cases, I can actually go and use the smaller model, and then they actually built this routing engine.
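The hand-built routing engine described above can be sketched in a few lines. This is a hypothetical illustration, not any customer's or AWS's actual logic; the tier names, keywords, and length thresholds are assumptions standing in for the profiling work teams do on their own prompts.

```python
# Minimal sketch of a rule-based prompt router: send each prompt to the
# cheapest model tier that can plausibly handle it. All heuristics and
# thresholds here are hypothetical, hand-tuned for illustration.

def route_prompt(prompt: str) -> str:
    """Pick a model tier for a prompt using simple hand-tuned rules."""
    words = prompt.split()
    text = prompt.lower()
    # Long prompts or multi-step reasoning requests go to the largest model.
    if len(words) > 200 or any(k in text for k in ("analyze", "step by step", "compare")):
        return "large"
    # Mid-length prompts, or ones asking for summaries, go to the mid tier.
    if len(words) > 50 or "summarize" in text:
        return "medium"
    # Everything else (short lookups, classifications) fits the small model.
    return "small"

# Example routing decisions:
print(route_prompt("What is the capital of France?"))            # small
print(route_prompt("Summarize this meeting transcript"))         # medium
print(route_prompt("Compare these two contracts step by step"))  # large
```

The weakness Swami points to is visible even in this toy: every keyword and threshold has to be profiled and maintained by hand, which is the months of work that a managed router is meant to absorb.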

That's why we built Intelligent Prompt Routing, where we automatically figure out the optimal model to use for a given use case. In our own experiments and working with customers, we found this to actually save up to 30% in cost without compromising on accuracy. And I do think this will only get better.

You can't always pick the biggest model to route all the responses to, but you can't afford to only send everything to the small model [just] because of the cost. The ideal thing you want to do is balance these between the cost and the accuracy and do it in an optimal manner. This is the art that everybody is learning how to do, [but] I don't think they should do that, because it's undifferentiated heavy lifting in our mind and we can help.

This interview was updated to clarify how far most companies have gotten into their generative AI experiments.
