Welcome to Runtime! Today: why observability isn't necessarily a magic bullet for reliability, Red Hat fires back at critics of its new plans for CentOS Stream, and this week in enterprise startup funding.
Leave a trace
Over the last two decades, enterprise tech has learned countless lessons about building reliable infrastructure for web applications. Many of those lessons could only be learned by going to instant replay through the use of monitoring, and later, observability tools that reported on what went right and what went wrong.
But as the complexity involved in operating and maintaining applications has skyrocketed, relying on those tools to diagnose problems isn't enough. That was one of the main themes of Monitorama this year, a three-day conference in Portland dedicated to helping the engineers who have to keep the modern world up and running understand how to think about solving problems.
Businesses that want to take full advantage of monitoring and observability tools need to do their homework first, according to Adriana Villela, a developer advocate at Lightstep and member of the OpenTelemetry project.
There are three key signals involved in observability — metrics, logs, and traces — but traces are the most important leg of that stool when trying to get a full picture of system health.
"Traces are key because they tell us the thing that is happening overall from start to finish of your request," she said, and businesses should increase the weight they put on traces compared to metrics and logs when conducting analysis.
Villela also urged attendees to instrument their code, which involves writing new code specifically to track application performance rather than relying on logs for an after-action report.
Developers might balk at the extra work required, but companies need to make code instrumentation a default part of the development process if they want to get the most out of observability tools, she said.
But companies also need to be careful when evaluating data produced by those tools, said Jack Neely, observability architect at Palo Alto Networks.
It's easy to fall back on measuring application performance using Google's famous "four golden signals," but there is actually a fifth signal, he said: customer experience. None of the first four really matters unless that experience is satisfactory.
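To make that point concrete, here is a small Python sketch. The four golden signals (latency, traffic, errors, and saturation) come from Google's site reliability engineering guidance. Everything else is an assumption made for illustration: the `Request` record, the function names, and especially the way the fifth check is measured, which is not Neely's definition.

```python
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    ok: bool


def golden_signals(requests, window_s, cpu_utilization):
    """Summarize latency, traffic, errors, and saturation for one time window."""
    if not requests:
        raise ValueError("need at least one request")
    latencies = sorted(r.latency_ms for r in requests)
    # Nearest-rank 95th percentile, computed with integer math.
    rank = max(1, (95 * len(latencies) + 99) // 100)
    return {
        "latency_p95_ms": latencies[rank - 1],
        "traffic_rps": len(requests) / window_s,
        "error_rate": sum(not r.ok for r in requests) / len(requests),
        "saturation": cpu_utilization,  # assumed to be supplied by the host
    }


def customer_experience_ok(requests, max_latency_ms=500.0, target=0.99):
    """Hypothetical fifth check: were enough requests both successful and fast?"""
    good = sum(r.ok and r.latency_ms <= max_latency_ms for r in requests)
    return good / len(requests) >= target


# 19 fast requests and one very slow one, all technically successful.
sample = [Request(100.0, True)] * 19 + [Request(900.0, True)]
signals = golden_signals(sample, window_s=10, cpu_utilization=0.4)
experience_ok = customer_experience_ok(sample)
```

With this sample, the error rate is zero and the 95th-percentile latency looks healthy, yet the experience check fails because one customer in 20 waited far too long.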
And, as one attendee pointed out, relying on observability tools requires you to trust the output those tools are generating. They are written by flawed human beings and capable of misfiring at any point, like any piece of software, and the raw data might actually lead to a different conclusion.
"Who will monitor the monitors?" Neely joked in response, acknowledging the problem.
He advised setting up separate infrastructure running the open-source Prometheus tool to monitor the performance of those tools, in order to really understand what's happening.
And, in an ironic twist, having tools that help solve easy infrastructure problems could be making the problems that do sneak through much worse, said Dylan Ratcliffe, founder and CEO of Overmind.
"By improving the understanding of our systems, we're making outages more complex," he said.
In other words, the low-hanging fruit has been picked and what remains are the "unknown unknowns," he said, or the looming infrastructure problems you'll never be able to foresee.
Understanding the root cause of an outage is of course important. But corporations being what they are, postmortem reports produced after outages often lead to new internal processes for deploying software. Those processes slow down deployment, which makes each deployment larger, which in turn increases the severity and complexity of any mistake.
But don't be fooled by vendors rushing to attach themselves to the observability movement by promising a magic fix, said Paige Cruz, senior developer advocate at Chronosphere.
One of the most overused ideas in enterprise software over the last few years is the notion that any one tool can provide "a single pane of glass," sometimes also expressed as "a single source of truth," which sounds really attractive to managers overwhelmed by "tool sprawl," she said.
But purpose-built tools are actually useful to different people working in different job functions, as long as employees feel they're getting value out of those tools, she said.
"The ability to go from a services point of view to a system point of view down into a specific service interaction, that ability to zoom in and out of the system, that is what you need your suite of monitoring and observability tools to provide for you," she said.
A MESSAGE FROM HASHICORP
Operational cloud maturity is the key to helping enterprises get the most from multi-cloud, slash costs, and maximize ROI with respect to speed, risk, and efficiency. Highly mature organizations are less likely to waste money on avoidable cloud spending, have an easier time dealing with cloud security issues, and better cope with the ongoing shortage of cloud skills. See the third annual State of Cloud Strategy Survey, commissioned by HashiCorp and conducted by Forrester Consulting.
Seeing red
Red Hat's Mike McGrath came out firing Monday in response to criticism of the company's decision to limit the ability of other companies and organizations to redistribute clones of Red Hat Enterprise Linux.
After repositioning CentOS in 2020 as essentially a beta version of upcoming RHEL releases, the company continued to publish the RHEL source code previously used to create versions of CentOS on a public site, where anyone could take it and build their own operating system that would be compatible with RHEL. Several free clones were created around that code and widely adopted by enterprises, but Red Hat announced last week that it would only release that code to current customers, who don't appear to be allowed to redistribute it.
"Simply rebuilding code, without adding value or changing it in any way, represents a real threat to open source companies everywhere. This is a real threat to open source, and one that has the potential to revert open source back into a hobbyist- and hackers-only activity," McGrath wrote Monday, acknowledging the torrent of criticism that followed the decision from some open-source developers and from companies that rely on the clones.
Google Cloud's Kelsey Hightower announced his retirement, saying "I hope to spend the rest of my life learning how to live." Check out my 2020 profile of Hightower to learn about the remarkable life he's already had.
New Relic laid off around 10% of the company, a move that follows earlier cuts as the company struggles to transition into the observability era.
Tom Krazit has covered the technology industry for over 20 years, focused on enterprise technology during the rise of cloud computing over the last ten years at Gigaom, Structure and Protocol.