Nebula Profiling Analytics

Build A Holistic View Of Your Software Using Nebula

Shawn Cao
4 min read · Jun 10, 2020
Icicle view of CPU profile for Spark Cluster (Screenshot by Author)

In the previous post, I briefly introduced what Nebula is and what it can do. You can revisit it here: Introducing Nebula.

Nebula is a super-fast, block-based real-time analytics platform implemented in modern C++. There is a lot of potential to leverage its capabilities for both your big data needs and your engineering productivity needs.

Today, we demonstrate one use case for the “engineering productivity” part. The flame graph (by Brendan Gregg) is widely known and used to analyze code execution paths and understand where your resources (CPU/memory/IO) are spent. The original toolset, implemented in Perl, is quite convenient and useful when you want to do ad-hoc, one-off analysis on a single process dump.

However, if your software is distributed and runs in parallel on hundreds of nodes, how do you get a view of your code execution? Furthermore, where do you go to continuously monitor it over time? And how can you easily verify whether your optimization efforts change the landscape at all?

With Nebula’s flame view, you can easily answer all of the questions above.

First, let’s look at the simple design. In this diagram, “Kafka” is just one of the supported message buses; you can use many other alternatives, including blob storage like S3, as long as they can temporarily store and convey messages.

Simple Design

You may notice that this architecture works for all services running in your company, regardless of their programming language or scale. Every service runs alongside a daemon which periodically profiles the target process for “call stacks” and outputs each “call stack” as a JSON blob message to your message bus, like this:

{
  "service": "test",
  "host": "dev-shawncao",
  "tag": "local",
  "lang": "java",
  "stack": "root \n frame-1 \n frame-2 \n frame-3"
}

Very simple. Of course, all of these fields are customizable if you have more dimensions you want to slice and dice your perf data by later.
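To make the daemon side concrete, here is a minimal sketch in Python (using kafka-python) of what such a profiling daemon could look like. It simply samples the threads of its own process; in a real deployment you would swap that part for a language-appropriate profiler (async-profiler for Java, py-spy for Python, perf for native code), and the broker/topic names below are placeholders.

# Minimal, illustrative sketch of a profiling daemon.
# It samples the call stacks of its own process and publishes each one as a
# JSON message in the shape described above. A real daemon would replace the
# sampling logic with a language-appropriate profiler.
import json
import socket
import sys
import time
import traceback

from kafka import KafkaProducer  # pip install kafka-python

BROKERS = "localhost:9092"  # placeholder broker list
TOPIC = "code-profile"      # placeholder topic name

producer = KafkaProducer(bootstrap_servers=BROKERS)

def sample_stacks():
    """Return one newline-joined frame string per running thread."""
    stacks = []
    for frame in sys._current_frames().values():
        names = [f.name for f in traceback.extract_stack(frame)]
        stacks.append(" \n ".join(["root"] + names))
    return stacks

while True:
    for stack in sample_stacks():
        message = {
            "service": "test",
            "host": socket.gethostname(),
            "tag": "local",
            "lang": "python",
            "stack": stack,
        }
        producer.send(TOPIC, json.dumps(message).encode("utf-8"))
    producer.flush()
    time.sleep(1)  # sampling interval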

Now, you can get Nebula up and running (3 steps to run it on a single box) and update its cluster.yml to connect it to this message bus (replace the topic/brokers, in the context of Kafka):

Configure Nebula To Connect The Message Bus (Screenshot by Author)
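If you cannot read the screenshot, the table definition looks roughly like the sketch below. The key names here are illustrative placeholders rather than Nebula’s exact schema, so follow the cluster.yml template that ships with Nebula; the point is simply that the topic/broker values and the five message fields are what get wired up.

# illustrative sketch only: key names are placeholders, not Nebula's exact schema
tables:
  code-profile:
    data: kafka                                # read from the Kafka message bus
    source: "<broker-1:9092,broker-2:9092>"    # your broker list
    topic: "code-profile"                      # your topic name
    format: json
    columns:
      service: string
      host: string
      tag: string
      lang: string
      stack: string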

Now, you’re ready to get all these great views of your service(s)!

Steps:

  • On the Nebula web UI, choose the table name (in this example, it is “code-profile”), then enter “treemerge(stack)” in the fields box.
  • Apply filters such as time range, service, or appid to see the profiling result as an icicle/flame graph at any granularity.
  • You can add “Dimensions” if you want to split the view into several views (group by key), then click “SOAR”. Yes, that really is the button name, because it brings the results to you at lightning speed.
    (Here, I use a tag to differentiate my environments and see their different stacks.)
    (For privacy, each frame in the call stack is replaced with alphabetic letters.)
Flame Graph Generated For Given Query

The interactive icicle/flame graph is implemented with HTML5 canvas; the source code of this module can be found here.

As with normal analytics, you can use filters to inspect the timeline of method invocations matching a specific pattern, for example, the count of method calls with “zstd” in the name over the last 10 hours.

counting specific method in specified time window in the cluster (Screenshot by Author)
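Nebula computes this at query time; purely to make the semantics of that query concrete, the sketch below derives the same count directly from the raw JSON stack messages. It is not how Nebula executes the query, and the "ts" (epoch-seconds) field is a hypothetical addition, since the message shown earlier carries no explicit timestamp.

# Illustrative only: the same "zstd" count computed client-side from raw
# call-stack messages, to show what the query means. The "ts" field is a
# hypothetical addition; the message shown earlier has no explicit timestamp.
import json
import time

def count_zstd_calls(raw_messages, window_hours=10):
    cutoff = time.time() - window_hours * 3600
    count = 0
    for raw in raw_messages:
        msg = json.loads(raw)
        if msg.get("ts", time.time()) < cutoff:
            continue  # outside the time window
        # frames within a stack are separated by newlines
        count += sum("zstd" in frame for frame in msg["stack"].split("\n"))
    return count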

Thank you for reading this far. I hope Nebula helps you make yourself and your company better.
