SpectatorandServoareNetflix's metrics collection libraries. Atlas is a Netflix metrics backend to manage dimensional time series data.
Servo served Netflix for several years and is still usable, but is gradually being phased out in favor of Spectator, which is only designed to work with Java 8. Spring Cloud Netflix provides support for both, but Java 8 based applications are encouraged to use Spectator.
=== Dimensional vs. Hierarchical Metrics
Spring Boot Actuator metrics are hierarchical and metrics are separated only by name. These names often follow a naming convention that embeds key/value attribute pairs (dimensions) into the name separated by periods. Consider the following metrics for two endpoints, root and star-star:
[source,json]
----
{
"counter.status.200.root": 20,
"counter.status.400.root": 3,
"counter.status.200.star-star": 5,
}
----
The first metric gives us a normalized count of successful requests against the root endpoint per unit of time. But what if the system had 20 endpoints and you want to get a count of successful requests against all the endpoints? Some hierarchical metrics backends would allow you to specify a wild card such as `counter.status.200.*` that would read all 20 metrics and aggregate the results. Alternatively, you could provide a `HandlerInterceptorAdapter` that intercepts and records a metric like `counter.status.200.all` for all successful requests irrespective of the endpoint, but now you must write 20+1 different metrics. Similarly if you want to know the total number of successful requests for all endpoints in the service, you could specify a wild card such as `counter.status.2*.*`.
Even in the presence of wildcarding support on a hierarchical metrics backend, naming consistency can be difficult. Specifically the position of these tags in the name string can slip with time, breaking queries. For example, suppose we add an additional dimension to the hierarchical metrics above for HTTP method. Then `counter.status.200.root` becomes `counter.status.200.method.get.root`, etc. Our `counter.status.200.*` suddenly no longer has the same semantic meaning. Furthermore, if the new dimension is not applied uniformly across the codebase, certain queries may become impossible. This can quickly get out of hand.
Netflix metrics are tagged (a.k.a. dimensional). Each metric has a name, but this single named metric can contain multiple statistics and 'tag' key/value pairs that allows more querying flexibility. In fact, the statistics themselves are recorded in a special tag.
Recorded with Netflix Servo or Spectator, a timer for the root endpoint described above contains 4 statistics per status code, where the count statistic is identical to Spring Boot Actuator'scounter.IntheeventthatwehaveencounteredanHTTP200and400thusfar,therewillbe8availabledatapoints:
Thenormaluseofagaugeinvolvesregisteringthegaugeonceininitializationwithanid,areferencetotheobjecttobesampled,andafunctiontogetorcomputeanumericvaluebasedontheobject.ThereferencetotheobjectispassedinseparatelyandtheSpectatorregistrywillkeepaweakreferencetotheobject.Iftheobjectisgarbagecollected,thenSpectatorwillautomaticallydroptheregistration.Seelink:https://github.com/Netflix/spectator/wiki/Gauge-Usage#using-lambda[thenote]inSpectator's documentation about potential memory leaks if this API is misused.
[source,java]
----
// the registry will automatically sample this gauge periodically
registry.gauge("gaugeName", pool, Pool::numberOfRunningThreads);
// manually sample a value in code at periodic intervals -- last resort!
A distribution summary is used to track the distribution of events. It is similar to a timer, but more general in that the size does not have to be a period of time. For example, a distribution summary could be used to measure the payload sizes of requests hitting a server.
[source,java]
----
// the registry will automatically sample this gauge periodically
WARNING: If your code is compiled on Java 8, please use Spectator instead of Servo as Spectator is destined to replace Servo entirely in the long term.
In Servo parlance, a monitor is a named, typed, and tagged configuration and a metric represents the value of a given monitor at a point in time. Servo monitors are logically equivalent to Spectator meters. Servo monitors are created and controlled by a `MonitorRegistry`. In spite of the above warning, Servo does have a link:https://github.com/Netflix/servo/wiki/Getting-Started[wider array] of monitor options than Spectator has meters.
Spring Cloud integration configures an injectable `com.netflix.servo.MonitorRegistry` instance for you. Once you have created the appropriate `Monitor` type in Servo, the process of recording data is wholly similar to Spectator.
==== Creating Servo Monitors
If you are using the Servo `MonitorRegistry` instance provided by Spring Cloud (specifically, an instance of `DefaultMonitorRegistry`), Servo provides convenience classes for retrieving link:https://github.com/Netflix/spectator/wiki/Servo-Comparison#dynamiccounter[counters] and link:https://github.com/Netflix/spectator/wiki/Servo-Comparison#dynamictimer[timers]. These convenience classes ensure that only one `Monitor` is registered for each unique combination of name and tags.
To manually create a Monitor type in Servo, especially for the more exotic monitor types for which convenience methods are not provided, instantiate the appropriate type by providing a `MonitorConfig` instance:
// somewhere we should cache this Monitor by MonitorConfig
Timer timer = new BasicTimer(config);
monitorRegistry.register(timer);
----
=== Metrics Backend: Atlas
Atlas was developed by Netflix to manage dimensional time series data for near real-time operational insight. Atlas features in-memory data storage, allowing it to gather and report very large numbers of metrics, very quickly.
Atlas captures operational intelligence. Whereas business intelligence is data gathered for analyzing trends over time, operational intelligence provides a picture of what is currently happening within a system.
No additional dependencies are necessary to send Spring Boot Actuator, Servo, and Spectator metrics to Atlas. Just annotate your Spring Boot application with `@EnableAtlas` and provide a location for your running Atlas server with the `netflix.atlas.uri` property.
==== Global tags
Spring Cloud enables you to add tags to every metric sent to the Atlas backend. Global tags can be used to separate metrics by application name, environment, region, etc.
Each bean implementing `AtlasTagProvider` will contribute to the global tag list:
TIP: An Atlas standalone node running on an r3.2xlarge (61GB RAM) can handle roughly 2 million metrics per minute for a given 6 hour window.
Once running and you have collected a handful of metrics, verify that your setup is correct by listing tags on the Atlas server:
[source,bash]
----
$ curl http://ATLAS/api/v1/tags
----
TIP: After executing several requests against your service, you can gather some very basic information on the request latency of every request by pasting the following url in your browser: `http://ATLAS/api/v1/graph?q=name,rest,:eq,:avg`
The Atlas wiki contains a link:https://github.com/Netflix/atlas/wiki/Single-Line[compilation of sample queries] for various scenarios.
Make sure to check out the link:https://github.com/Netflix/atlas/wiki/Alerting-Philosophy[alerting philosophy] and docs on using link:https://github.com/Netflix/atlas/wiki/DES[double exponential smoothing] to generate dynamic alert thresholds.