Meeting the Challenges of Managing Microservices-based Applications

It used to be so much easier to do application performance management (APM). Remember? Monolithic, old-school apps didn’t have that many dependencies. Fewer moving parts meant less to monitor. It also meant greater visibility of those parts. Less data collected. Fewer issues (but by no means few) to troubleshoot.
Things are different now. We’re in the era of microservices, of distributed app architectures that are constantly chattering away at each other, making for a staggeringly complex infrastructure. Visibility alone is a huge challenge in this type of environment, before you even get to monitoring and management. Then there are the gobs of data being generated. How do you find what you need in all that? How do you know what’s causing the slowdown in your data center, or where the potential bottleneck lies, when the pieces are everywhere?
There’s no doubt that microservices are the future, which is why companies are moving away from monolithic apps, either building cloud-native apps that are distributed from the start, or breaking down existing apps into microservices. With that new magnitude of distribution come new problems of scale, and therefore new challenges for the APM ecosystem. Consider, for example, that a simple customer request for a Lyft ride involves roughly 60 microservices, and you get an idea of the scale we’re talking about.
The challenges of scale are nothing new for the founders of LightStep. Their company, which came out of stealth in 2017, was created by former Google engineers who cut their teeth developing distributed tracing there. Fittingly, the LightStep product, called [x]PM, is aimed at large companies that have started on their microservices journey, even if it’s only in the discussion and planning phase.

We had a great presentation from LightStep at Cloud Field Day 4 recently, and the company laid out its vision for why its product is different. According to LightStep, its solution observes all of an app’s transactions, in real time, without overhead. If that sounds like a Herculean task, you’ve got the right picture. By contrast, APM products are often turned on only when there’s a problem, in order to keep network traffic, logging, and so on to a minimum.
LightStep says it’s different. The product works both on-premises and via a Software-as-a-Service offering, so it can handle a company’s APM wherever the need exists. Its secret sauce is what the company calls its “Satellite Architecture,” which distributes data collection and statistical analysis: the satellites watch all the traffic and communicate with the LightStep Engine, which is the SaaS portion of [x]PM. The Engine then provides actionable information that saves the time, money and valuable man-hours otherwise spent tracking down the source of app performance problems.
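To make the idea of distributed collection and analysis a little more concrete, here is a minimal Python sketch. It is not LightStep's implementation or API; the `Satellite` and `Engine` classes, their methods, and the service names are all hypothetical, and the statistics are deliberately simple. The point is only to show the general pattern the presentation described: collectors near the app observe every transaction and reduce it to compact statistics, and a central service merges those summaries.

```python
from collections import defaultdict
from statistics import mean


class Satellite:
    """Hypothetical local collector that buffers latencies close to the app."""

    def __init__(self):
        self._latencies_ms = defaultdict(list)

    def record(self, service, latency_ms):
        # Every transaction is recorded locally; nothing is sampled away here.
        self._latencies_ms[service].append(latency_ms)

    def summarize(self):
        # Reduce the raw observations to small per-service statistics,
        # then reset the buffer for the next reporting interval.
        summary = {
            service: {
                "count": len(values),
                "mean_ms": mean(values),
                "max_ms": max(values),
            }
            for service, values in self._latencies_ms.items()
        }
        self._latencies_ms.clear()
        return summary


class Engine:
    """Hypothetical central service that merges summaries from satellites."""

    def __init__(self):
        self.totals = defaultdict(
            lambda: {"count": 0, "sum_ms": 0.0, "max_ms": 0.0}
        )

    def ingest(self, summary):
        for service, stats in summary.items():
            total = self.totals[service]
            total["count"] += stats["count"]
            total["sum_ms"] += stats["mean_ms"] * stats["count"]
            total["max_ms"] = max(total["max_ms"], stats["max_ms"])

    def slowest_service(self):
        # Rank services by overall mean latency across all satellites.
        return max(
            self.totals,
            key=lambda s: self.totals[s]["sum_ms"] / self.totals[s]["count"],
        )
```

In this sketch only the small summaries travel from the satellites to the engine, which is one plausible way to watch all traffic without shipping every raw record to a central service.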
As with all APM products, the goal is to help admins understand the performance and reliability characteristics of their apps, along with the root cause of problems. LightStep does this, but at a scale that makes it stand out in the market. Consider some of their reference clients: Lyft, Twilio, DigitalOcean, GitHub, Under Armour, Medium – the list goes on. LightStep says that Twilio, to take one example, improved their mean time to resolution – i.e., how long it takes to fix a problem – by 92% after implementing LightStep for their huge, highly distributed environment.
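To put that percentage in perspective, here is a small arithmetic sketch. The baseline figures are purely hypothetical, since only the 92% figure was given, and the sketch assumes "improved by 92%" means the resolution time dropped by 92%.

```python
def mttr_after_improvement(baseline_minutes, improvement_pct):
    """Return the new mean time to resolution after a percentage reduction."""
    # Work with the remaining percentage to avoid floating-point drift.
    return baseline_minutes * (100 - improvement_pct) / 100


# Hypothetical baseline: an incident that used to take 100 minutes to resolve.
print(mttr_after_improvement(100, 92))  # 8.0
```

Under that assumption, a problem that once took 100 minutes to resolve would take about 8.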
One thing this means is that if you’re a small or medium business, and your apps don’t need massive scale, including potentially thousands of microservices, you’re likely not a good fit for LightStep. But if you’re a hyperscaler or large enterprise and you have, or are moving to, a distributed, microservices-based architecture, a call to LightStep should probably be in your future.
You can see the first of the LightStep presentation videos at Cloud Field Day 4 here. There are three others in this series.