There are quite a few exciting projects in development within the Open Source APM (application performance management) space. The emergence of the widely known Jaeger (which came out of Uber) and its slightly lesser-known sibling, Apache Skywalking, has opened up new capabilities for companies looking for Open Source APM solutions. Both follow the OpenTracing initiative and support spans in the CNCF-specified format.
APM tools monitor application performance in real time, provide visual topologies, generate alarms based on configured SLAs and, most importantly, provide deep trace information showing which calls generated warnings and where the problematic call lies.
Please note that both projects were reviewed while still in beta: Jaeger at v1.8 and Skywalking at v6.
Visuals
Both platforms handle OpenTracing-format spans, and both have React-based interfaces (Apache Skywalking's is built with DVA, a React and Redux based framework).
Jaeger presents trace information more clearly: drilling into traces is more user-friendly and the design is cleaner.
Apache Skywalking's topology map, on the other hand, uses meaningful icons for its nodes, whereas Jaeger's is a flat layout.
Architectural differences
Instrumentation
Apache Skywalking is quite enterprise-focused, i.e. it mainly supports traditional languages such as Java, .NET and Node.js. Its entire platform is built in Java, and it has strong Java agent-based integrations with commonly used applications and frameworks. These provide detailed spans from common frameworks and libraries such as:
- Spring
- HTTP clients
- RabbitMQ
- Kafka
The Java agent-based approach lets microservices be instrumented without any changes to application code. This is a key feature, as tracing can be switched on for existing apps with minimal effort.
Jaeger relies more on explicitly creating spans within the application code, which adds overhead when retrofitting tracing to an existing solution.
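To make "creating spans within the code" concrete, here is a minimal sketch of manual instrumentation. `ToyTracer`, `start_span` and `fetch_order` are hypothetical names invented for illustration; this is not the Jaeger or OpenTracing client API, only the general shape of the work a developer takes on when every traced operation has to be wrapped by hand. An agent-based approach aims to produce equivalent spans without these edits.

```python
import time
from contextlib import contextmanager


# Hypothetical in-process tracer used only to illustrate manual instrumentation.
# It is NOT the Jaeger or OpenTracing client API.
class ToyTracer:
    def __init__(self):
        self.finished = []  # completed spans, in the order they finished
        self._stack = []    # currently active spans, innermost last

    @contextmanager
    def start_span(self, operation_name):
        # The innermost active span (if any) becomes this span's parent.
        parent = self._stack[-1]["operation"] if self._stack else None
        span = {"operation": operation_name, "parent": parent,
                "start": time.perf_counter()}
        self._stack.append(span)
        try:
            yield span
        finally:
            # Record the duration and finish the span even if the body raised.
            span["duration_s"] = time.perf_counter() - span["start"]
            self._stack.pop()
            self.finished.append(span)


tracer = ToyTracer()


def fetch_order(order_id):
    # Every traced operation needs an explicit span in the application code.
    with tracer.start_span("fetch_order") as span:
        span["order_id"] = order_id
        with tracer.start_span("db_query"):
            time.sleep(0.01)  # stand-in for a database call
        return {"id": order_id}


fetch_order(42)
print([(s["operation"], s["parent"]) for s in tracer.finished])
# prints [('db_query', 'fetch_order'), ('fetch_order', None)]
```

Multiply those `with` blocks across every handler, client call and queue consumer in an existing codebase and the retrofitting overhead described above becomes clear.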
Trace Information
Trace information is consistent across Jaeger and Skywalking and is stored in similar indexes and buckets. Skywalking offers more performance data, such as p99 latency values calculated by its collectors, along with alarm configuration.
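As a reminder of what a p99 value represents, the sketch below computes a nearest-rank percentile over a window of request latencies. This is a generic textbook definition chosen for illustration; it is an assumption and not necessarily how Skywalking's collectors aggregate percentiles internally. The sample latencies are made up.

```python
import math


def percentile(latencies_ms, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies_ms)
    # 1-based rank; computing p * n first keeps the arithmetic exact for integer inputs.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up latencies (ms) for ten requests, with a single slow outlier.
samples = [12, 15, 11, 300, 14, 13, 16, 12, 18, 17]
print(percentile(samples, 50), percentile(samples, 99))  # prints 14 300
```

The median (14 ms) looks healthy and the mean is only 42.8 ms, while the p99 surfaces the 300 ms outlier, which is why tail percentiles are a useful basis for alarms.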
Deployment/Clustering
There are some major differences in the approaches to deployment and clustering. Apache Skywalking has stateful collectors (called OAP servers in v6.0), which can be clustered using Apache Zookeeper or Kubernetes labels. The clustering exists to support metric aggregation.
It's worth noting that this approach forces your deployment to expose all collector endpoints to your applications, as a call to one collector may result in a forwarding call to the correct collector in the cluster.
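The forwarding behaviour described above can be sketched as follows. The assumption here, purely for illustration, is that each metric key (service plus endpoint) is owned by exactly one collector chosen by a stable hash; Skywalking's real partitioning scheme may differ, and `Collector`, `owner_of` and `receive` are hypothetical names.

```python
import zlib
from collections import defaultdict


class Collector:
    """Hypothetical stateful collector that aggregates call counts per metric key."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster          # shared, ordered list of every collector
        self.counts = defaultdict(int)  # aggregated state owned by this node
        self.forwarded = 0              # reports this node passed on to another node

    def owner_of(self, key):
        # A stable hash, so every node agrees on the owner
        # (Python's built-in hash() for str is salted per process).
        return self.cluster[zlib.crc32(key.encode()) % len(self.cluster)]

    def receive(self, key):
        owner = self.owner_of(key)
        if owner is not self:
            # Extra network hop in a real cluster: hand the report to the owner.
            self.forwarded += 1
            owner.receive(key)
            return
        self.counts[key] += 1


cluster = []
for i in range(3):
    cluster.append(Collector(f"oap-{i}", cluster))

# The application reports everything to oap-0, yet each key ends up
# aggregated on exactly one node.
for _ in range(5):
    cluster[0].receive("orders-service:/checkout")
    cluster[0].receive("users-service:/login")

for node in cluster:
    print(node.name, dict(node.counts), "forwarded:", node.forwarded)
```

Keeping each key's state on a single node is what makes aggregates such as per-endpoint counts or percentiles correct, at the cost of the extra forwarding hop whenever a report lands on a node that does not own the key.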
Jaeger has a standalone agent which sits alongside the microservice. This abstracts the routing to and discovery of collectors away from the client.
This approach makes Jaeger agents well suited to sidecar deployments in Kubernetes clusters, but potentially trickier to retrofit onto non-Kubernetes architectures, as it requires rolling out agents across your infrastructure.
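The sketch below illustrates the idea of a local agent hiding collector routing from the application. The batching and round-robin choice of collector are assumptions made for illustration, not a description of the Jaeger agent's actual internals, and `ToyAgent`, `report` and `flush` are hypothetical names. Collectors are modelled as plain lists that receive batches.

```python
import itertools


class ToyAgent:
    """Hypothetical per-host agent: the application only ever talks to this object."""

    def __init__(self, collectors, batch_size=3):
        # Collector discovery and routing live in the agent, not in the application.
        self._next_collector = itertools.cycle(collectors)
        self.batch_size = batch_size
        self._buffer = []

    def report(self, span):
        # Called by the application; it never sees a collector address.
        self._buffer.append(span)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self._buffer:
            collector = next(self._next_collector)  # round-robin across collectors
            collector.append(list(self._buffer))    # ship one batch to one collector
            self._buffer.clear()


collector_a, collector_b = [], []
agent = ToyAgent([collector_a, collector_b])
for i in range(7):
    agent.report(f"span-{i}")
agent.flush()  # send whatever is left in the buffer

print(collector_a)  # prints [['span-0', 'span-1', 'span-2'], ['span-6']]
print(collector_b)  # prints [['span-3', 'span-4', 'span-5']]
```

Because only the agent knows the collector list, collectors can be added or moved by reconfiguring agents rather than applications; the trade-off, as noted above, is that an agent must run next to every service.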
Adaptive sampling
Adaptive sampling is the ability to adjust your sampling rate based on the traffic observed in the periods preceding the current sampling period. For example, a heavily used endpoint should be sampled less often, both because of its sheer volume of traffic and to prevent resource starvation.
Neither project currently supports this, but Jaeger has clearly indicated it on its roadmap.
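The idea can be sketched with a simple rule: aim for a roughly fixed number of traces per endpoint per period, and derive each endpoint's sampling probability from the traffic it saw in the previous period. This is a generic illustration under that assumption, not Jaeger's planned algorithm; `next_sampling_rates`, its parameters and the traffic figures are hypothetical.

```python
def next_sampling_rates(observed_counts, target_per_endpoint, floor=0.001):
    """Compute per-endpoint sampling probabilities for the next period.

    observed_counts: requests seen per endpoint in the previous period.
    target_per_endpoint: roughly how many traces to keep per endpoint per period.
    floor: minimum probability, so even very busy endpoints are still traced occasionally.
    """
    rates = {}
    for endpoint, count in observed_counts.items():
        if count == 0:
            rates[endpoint] = 1.0  # no traffic seen yet: sample everything
        else:
            rates[endpoint] = max(floor, min(1.0, target_per_endpoint / count))
    return rates


# Hypothetical request counts from the previous period.
observed = {"/health": 50_000, "/checkout": 2_000, "/admin/report": 4}
print(next_sampling_rates(observed, target_per_endpoint=100))
# prints {'/health': 0.002, '/checkout': 0.05, '/admin/report': 1.0}
```

With these rates, /health and /checkout each yield about 100 traces per period if traffic stays steady, while the rarely used /admin/report keeps every trace, so heavy endpoints no longer dominate storage and collector resources.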
Smaller features
- Jaeger supports a variety of storage mechanisms, most notably Kafka and Cassandra.
- Both support Elasticsearch backends, which we expect to be the most common choice in deployments.
- Apache Skywalking supports basic authentication. Jaeger does not currently support any form of authentication.
- Jaeger uses TChannel, Thrift and gRPC, while Apache Skywalking uses HTTP and gRPC.
- Jaeger is written in Go and Apache Skywalking in Java.
Summary
For high-volume services where the application code can be changed to support tracing, we'd recommend Jaeger. Its sidecar agents and scalable collectors make it well suited to Kubernetes deployments.
For Java/.NET/Node stacks with lower scalability requirements, we think Apache Skywalking is a better fit. Thanks to its light-touch integration with microservices and ease of deployment, it can easily fulfil your APM needs. It also comes with alerting capabilities out of the box.