The rise of OpenTelemetry coupled with the separation of application data from observability tools could shake up this fast-growing market right as more companies realize they need help managing their cloud apps.
It says a lot about the complexity of modern tech that one of the hottest sectors in enterprise infrastructure helps technology leaders understand what's really happening inside their applications and across their networks.
A drumbeat has been building behind observability technology for several years now, but several factors are converging in early 2024 to suggest it is about to become a huge market. Spending on cloud infrastructure and platform services surged 320% from the first quarter of 2018 to the same period last year, according to Synergy Research, and the hangover from that party is starting to set in.
If that five-year period had a mantra, it was "build now, worry later." Those days are over, and the companies that poured all that money into those cloud services are facing pressure from their business leaders to cut costs and improve productivity while maintaining uptime and performance.
Observability tools promise better visibility into application performance and faster resolution of the inevitable problems that arise when software is deployed at scale, and they are gaining converts because the average enterprise can no longer get away with average software. During his annual keynote at AWS re:Invent last November, Amazon CTO Werner Vogels urged attendees to adopt observability tools to help them rein in cloud costs and focus on their core business priorities.
"Every company in the world is now a software company," said Tom Wilkie, CTO at Grafana Labs, echoing Marc Andreessen's famous line. "Your competitiveness as a business is directly related to your ability for your software engineering team to ship changes."
Yet at the same time, cost is one of the biggest issues with the current generation of observability tools. These tools feed on application data, and as those applications generate more and more data every month, a lot of customers are shocked by skyrocketing bills.
As those customers look for ways to get more value out of tools they otherwise like, some new entrants and new techniques are poised to change the way companies measure application performance, and there will be winners and losers as the new version of this sector evolves.
"We're at a little bit of a point of inflection where that data layer is becoming commoditized in a way where we will be able to build observability tools that capture more data than than not," said Danel Dayan, principal at Battery Ventures.
Observability grew out of the realization that tools used to monitor the health of on-premises data centers were not designed for cloud environments with ephemeral infrastructure, container runtimes, and all the other bells and whistles that make up the "cloud native" movement.
"We borrowed the term from control theory," said Charity Majors, co-founder and CTO of Honeycomb, which was an early pioneer of this concept. "Observability is a mathematical dual of controllability and it means how well you can understand the inner system state just by observing its outputs."
There are three data types that traditionally define observability tools: metrics, logs, and traces. Metrics provide a picture of the current state of play, logs record events as they happen for later examination, and traces help troubleshoot how something innocuous in one part of an application can cause big problems in another.
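For the code-minded, here is a minimal sketch of what those three signals look like as plain data; the field names and the slow-query scenario are illustrative, not any vendor's schema.

```python
# Illustrative only: the three observability signals as plain Python data.
from dataclasses import dataclass, field
import time
import uuid

@dataclass
class Metric:       # point-in-time measurement: what is happening right now?
    name: str
    value: float
    labels: dict = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)

@dataclass
class LogRecord:    # discrete event, written down for later examination
    message: str
    level: str = "INFO"
    timestamp: float = field(default_factory=time.time)

@dataclass
class Span:         # one hop in a trace, linking cause in one service to effect in another
    name: str
    trace_id: str   # shared by every span in the same request
    parent_id: str | None   # the hop that called this one, if any
    start: float
    end: float

# The same slow database call shows up three different ways:
metric = Metric("db.query.duration_ms", 845.0, labels={"table": "orders"})
log = LogRecord("query exceeded its 500ms budget", level="WARN")
span = Span("SELECT orders", trace_id=uuid.uuid4().hex, parent_id=None,
            start=time.time() - 0.845, end=time.time())
```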
Taken together, those three signals provide incredibly valuable information for understanding the health of a distributed computing system, but early adopters found it difficult to correlate data from one category with data from another. Most companies used different vendors to gather data across the different legs of the observability stool, and had to do their own work to integrate the unstructured data formats those vendors produced if they wanted to see the full picture.
Looking back at the history of enterprise tech, open-source software has generally helped bridge gaps across vendors and allowed end users to standardize on a common framework. In 2018 — right as cloud adoption was set to quadruple — there were two competing open-source standards for measuring observability data, OpenTracing and OpenCensus, and the people working on those projects started to acknowledge that they were working at cross-purposes, said Ben Sigelman, co-founder of Lightstep, which was acquired by ServiceNow in 2021.
"People who are close to open-source software will acknowledge that most of the problems in open source are people problems, not technology problems. And it was a rare instance where I think we really solved or found a nice solution to a people problem," he said, explaining how OpenTelemetry came to be.
OpenTelemetry is an open-source collection of software development tools that allows IT organizations to integrate a standard approach to collecting metrics, logs, and traces into their software, a process known as instrumentation. Under the auspices of the Cloud Native Computing Foundation, the project has been widely adopted by a number of observability vendors that otherwise compete tooth and nail for business, and it is supported by all the major cloud providers.
"If I can get that data from OpenTelemetry, and I don't have to worry about framework instrumentation, that just means I can put a much larger percentage of my R&D investment back into those analytics" and spend the money saved extracting signal from the data, said Steve Tack, senior vice president of product management at Dynatrace.
Several experts interviewed for this article agreed that OpenTelemetry will be the foundation for the next evolution of observability tools, similar in some fashion to how Hadoop became the foundational layer for a generation of Big Data companies a decade ago.
"It makes vendors compete on stuff that actually matters," said Sigelman, who is now general manager of cloud observability at ServiceNow.
What actually matters in 2024 — after years of near-zero interest rates that allowed tech buyers to spend on infrastructure tooling like Swifties at Ticketmaster — is cutting costs. Businesses of all kinds are under pressure to justify their tech spending, which was an inevitable consequence of the headlong rush to "digital transformation" but is a surprisingly new concept to a lot of tech vendors raised in a bull market.
The good thing for observability companies is that in a market where "you're either one of the priorities, or you're not," as Redpoint's Scott Raney put it last November, they are one of the priorities. The bean counters are slowly regaining control of tech spending after allowing their employees to shop for the groceries they needed to cook their meals, but they understand that observability (theoretically) allows companies to be more productive with their current staffing levels and can play a big role in reducing application downtime.
In the boom times, "it was faster to hire people than it was to increase efficiency and productivity," Brex co-founder Henrique Dubugras told The Information last week. "And then the market turned, and efficiency matters a lot more."
The bad thing is observability vendors have quickly earned a reputation for draining tech budgets. "People are beginning to correctly intuit that the value they get out of their tooling has become radically decoupled from the price they are paying," Majors wrote in a blog post last month.
When money was cheap, companies were happy to throw all the data they had at observability tools hoping to discover as much as possible about how their apps actually run. This worked out really well for companies that charged by the byte, but customers started to notice that their returns diminished after a certain point.
Datadog is the vendor most buyers complain about when it comes to the skyrocketing cost of observability tools. To a certain extent, it reflects well on the product: people find it quite useful.
But Yrieix Garnier, vice president of products at Datadog, acknowledged that customers are asking about cost control more and more. That company has responded by trying to give customers more visibility into how they're spending that money, which is leading to some interesting changes in the way those customers think about data.
"If you don't want to retain some of the data, which you feel (is) not as valuable, you can drop (it) to reduce your overall cost," he said. Customers are starting to understand they probably only have a few applications that require a full data feed, while other lower-priority applications can get away with a lower tier, he said.
Perhaps in a reflection of how that cost-cutting mentality is setting in, Datadog reported fourth-quarter revenue that exceeded Wall Street expectations on Tuesday, but investors sent its stock down nearly four percent after it issued weaker-than-expected revenue guidance for 2024.
The data economy exploded in part because the industry figured out how to separate compute from storage, which allowed data storage to scale independently from computing resources and therefore operate more efficiently. The next generation of observability thinkers is working on ways to separate application data from the measuring sticks, which would change the way observability companies monetize that data.
Corey Harrison, co-founder and CEO of Flip.AI, is trying to straddle two enterprise tech booms by marrying generative AI technology to observability technology. His company built a large language model that sifts through observability data stored in basic cloud storage services like AWS's S3, rather than making customers import their data onto its servers.
"Some of our customers view these larger, more traditional observability tools as data stores — very expensive data stores," he said. Flip trained an LLM on data produced by what Harrison called the "chaos gym," a nod to the chaos engineering practice in which faults are deliberately injected into distributed computing systems in order to study how they fail, with the goal of preventing real-world failures from similar incidents.
"Troubleshooting application incidents and improving application performance is still largely a manual process," he said. Flip's goal is to take application performance data stored across any number of different internal tools and storage services and apply some generative AI magic on top while allowing customers to retain that data in-house, under their control.
Other companies are taking advantage of how the database market has changed amid the rise of cloud-native data warehouses like Snowflake and Databricks, and building on top of those services, Dayan said.
"The other thing that I think is a big driving force in the market that we're seeing with earlier-stage companies is the growth of adoption of ClickHouse, another open-source, very low cost but high performance analytical database system," he said, endorsing a startup that Battery actually didn't back. "It's just becoming incredibly easy and frictionless and a lot easier to manage."
While the current generation of observability tools and startups has mostly focused on application and infrastructure performance, several companies in the market see a natural area for expansion: cybersecurity.
Monitoring and logging tools have always been an important part of cybersecurity efforts, but the push to advance those tools and blend them with other types of application data could allow companies to use observability data to preserve both uptime and security.
"When you have a zero-day event, that's something where you need to know what's your risk and your exposure, and that's something that you can get from an observability solution" because of its ability to provide a real-time feed of application performance, said Dyantrace's Tack.
Splunk's Patrick Lin, senior vice president and general manager for observability, agreed but suggested that observability data flows designed for cybersecurity will need to be a little different than those designed for developers and operations specialists.
"A lot of information that is brought in by observability tools that is brought in for the specific needs of a particular audience; software developers, SREs, that kind of stuff. There's a question about how you take that data and do something with it — either on the way in or on the way out — to make it better suited for what the security practitioners need," he said.