Fixing Android Animation Jank with Perfetto & Macrobenchmark

2025-10-26 by Nikolay Vlasov

When the UI starts to lag, you can't find the cause with a simple glance—you need powerful tools. In this article, we'll show you how to use the professional arsenal of an Android developer—Macrobenchmark and Perfetto—to hunt down "janky frames" in a view animation. You'll go through the entire process from measuring the problem to solving it and see how data analysis transforms a lagging interface into a perfectly smooth one.

Hello! In my previous article, we integrated LeakCanary to catch memory leaks, using a ShimmerView as our test subject. This component did its job perfectly, but on some devices, I started noticing animation stutters—the very "janky frames" that ruin the user experience.

Staring at a laggy UI won't fix it. As engineers, we should approach the problem systematically: measure, analyze, find the cause, fix it, and then measure again to confirm the fix.

In this article, I will show the entire process of investigating the performance of ShimmerView:

Writing a Macrobenchmark to get objective and reproducible performance metrics.
Analyzing traces in the Perfetto UI using SQL queries to find the bottleneck.
Formulating and testing a hypothesis about the cause of the FPS drops.
Implementing an optimization based on the data obtained.
Verifying the result and summarizing the findings.

Chapter 1: Measuring the Problem with Macrobenchmark

To understand how bad things are, we need concrete numbers. The ideal tool for this is Android Macrobenchmark. It allows us to run UI scenarios on real devices and collect performance metrics, such as FrameTimingMetric, which tracks the render time of each frame.

In a perfect world, to get a complete performance picture, tests should be run on a wide range of devices: from budget models to flagships. This helps to understand how the application behaves with different processor powers, RAM sizes, and screen resolutions.

However, for an initial assessment and to identify the most obvious problems, one representative device is enough. I chose a Huawei Honor 10i running Android 9. Released in 2019, it has a Kirin 710 processor and 4 GB of RAM, modest specifications by today's standards. If an animation lags on such a device, the problem likely exists on more powerful models too, just less visibly. Testing on a deliberately weaker device therefore makes performance bottlenecks easier to spot.

Here is my test for the basic, unoptimized version of ShimmerView:

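import androidx.benchmark.macro.FrameTimingMetric
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class AnimationBenchmark {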

    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun shimmerAnimationBasic() {
        measureAnimation(
            testName = "shimmer_basic",
            stringResName = "basic_shimmer",
            viewToWaitForId = "basic_shimmer_view"
        )
    }

    // ...

    private fun measureAnimation(
        testName: String,
        stringResName: String,
        viewToWaitForId: String
    ) {
        benchmarkRule.measureRepeated(
            packageName = "com.ndev.android.ui.sample",
            metrics = listOf(FrameTimingMetric()),
            iterations = 3,
            startupMode = StartupMode.WARM, // Simulate launching from memory
            setupBlock = {
                // Preparation: start Activity, navigate to the right screen
                startActivityAndWait()
                navigateToScreen(stringResName, viewToWaitForId)
                ensureAnimationStopped()
            }
        ) {
            // Measurement: start the animation and wait
            startAnimation()
            Thread.sleep(35_000L) // ANIMATION_DURATION_MS
            stopAnimation()
        }
    }
    // ... rest of the test code
}

Why does the animation last 35 seconds?

Many performance issues are cumulative. A short test (1-2 seconds) might not reveal problems related to CPU throttling due to heat, pressure on the Garbage Collector, or other effects that manifest over time. A long measurement gives us a more complete and honest picture.

After running shimmerAnimationBasic, we get the following results:

frameCount           min 2,179.0,   median 2,184.0,   max 2,187.0
frameDurationCpuMs   P50     12.2,   P90     27.1,   P95     28.6,   P99     29.2
Traces: Iteration 0 1 2

Analysis of the results:

P50 (median) at 12.2 ms looks good. It's below the 16.6 ms budget required for 60 FPS.
But P90 (90% of frames render at least this fast, so the slowest 10% take longer), P95, and P99 are a disaster. 29.2 ms at the 99th percentile means that in the worst frames our rate drops to the equivalent of ~34 FPS (1000 / 29.2). This is the jank we're talking about.
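
The conversion between frame time and frame rate is easy to check by hand. Here is a minimal sketch in plain Kotlin with no Android dependencies; the helper names are my own and are not part of Macrobenchmark.

```kotlin
// Illustrative helpers; the names are hypothetical and not part of any Android API.

/** Time available per frame at a given refresh rate, in milliseconds. */
fun frameBudgetMs(refreshRateHz: Double): Double = 1000.0 / refreshRateHz

/** Frame rate that a single frame duration corresponds to. */
fun equivalentFps(frameDurationMs: Double): Double = 1000.0 / frameDurationMs

fun main() {
    val budget = frameBudgetMs(60.0)
    // Percentiles reported by the basic benchmark run.
    val percentiles = linkedMapOf("P50" to 12.2, "P90" to 27.1, "P95" to 28.6, "P99" to 29.2)
    for ((name, ms) in percentiles) {
        val verdict = if (ms <= budget) "within budget" else "over budget"
        println("$name: $ms ms, about ${equivalentFps(ms).toInt()} FPS, $verdict")
    }
}
```

Only P50 fits the 60 Hz budget here; every other percentile lands well above it.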

Now we have not just a feeling, but concrete numbers and a trace for deep analysis.

Chapter 2: Diving into the Trace with Perfetto

Let's open one of the traces generated by the benchmark in the Perfetto UI. We are faced with a huge amount of data. To make sense of it, we need to ask the right questions.

Investigation Navigator: Wall Time vs. CPU Time

The first and most important question is: is the thread busy with work or is it waiting for something?

Wall Time (or Wall Duration) is the total time that has passed from the beginning to the end of a function's execution. "The time on a wall clock."
CPU Time (or CPU Duration) is the time the processor actually spent executing the instructions of that thread.

If Wall Time ≈ CPU Time, it means the processor was fully loaded with computations. The problem is in our code—it's too "heavy." We need to find which specific methods (onMeasure, onDraw, inflate) are taking a long time and optimize them.

If Wall Time >> CPU Time, it means the thread was idle for most of the time, waiting. It could be waiting for a response from the disk (I/O), the network, another process (via Binder), or, as is often the case in UI, from the GPU.
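
To make the distinction concrete, here is a small sketch of how CPU time relates to wall time for one operation. The Interval type and the sample numbers are made up for illustration; Perfetto exposes this data through SQL tables, not through classes like these.

```kotlin
// Hypothetical type for illustration; not a Perfetto or Android API.
data class Interval(val startNs: Long, val endNs: Long) {
    val durationNs: Long get() = endNs - startNs
}

/** CPU time inside [window]: the parts of each on-CPU interval that fall within it. */
fun cpuTimeNs(window: Interval, onCpu: List<Interval>): Long =
    onCpu.sumOf { run ->
        val start = maxOf(run.startNs, window.startNs)
        val end = minOf(run.endNs, window.endNs)
        maxOf(0L, end - start)
    }

/** Fraction of the wall time during which the thread was actually running. */
fun cpuShare(window: Interval, onCpu: List<Interval>): Double =
    cpuTimeNs(window, onCpu).toDouble() / window.durationNs

fun main() {
    // A made-up 20 ms operation during which the thread ran for 2 ms and then 1 ms.
    val operation = Interval(0, 20_000_000)
    val runs = listOf(Interval(0, 2_000_000), Interval(15_000_000, 16_000_000))
    println("wall = ${operation.durationNs / 1e6} ms, cpu = ${cpuTimeNs(operation, runs) / 1e6} ms")
    println("cpu share = ${cpuShare(operation, runs)}")
}
```

A CPU share close to 1 points at heavy code; a share close to 0 points at waiting.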

To check this, we'll use an SQL query. First, let's find all the "problematic" frames (those that lasted longer than 16.6 ms), and then see what the main thread was doing during that time.

-- Step 1: Find all "janky" frames in the UI thread of our application
WITH janky_frames AS (
  SELECT
    ts, -- start timestamp
    dur -- duration
  FROM slice
  JOIN thread_track ON slice.track_id = thread_track.id
  JOIN thread ON thread_track.utid = thread.utid
  JOIN process ON thread.upid = process.upid
  WHERE
    process.name = 'com.ndev.android.ui.sample'
    AND slice.name = 'Choreographer#doFrame'
    AND slice.dur > 16666666 -- Duration > 16.6 ms (in nanoseconds)
)
-- Step 2: Aggregate all operations on the main thread that occurred INSIDE these janky frames
SELECT
  slice.name AS operation_name,
  SUM(slice.dur) / 1000000 AS total_wall_duration_ms,
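  -- CPU time: total duration of scheduling slices that START inside the operation.
  -- This is an approximation: a scheduling slice that begins before the operation
  -- is skipped, and one that runs past its end is counted in full.
  SUM(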
    (SELECT SUM(sched.dur)
     FROM sched
     WHERE
       sched.utid = thread.utid
       AND sched.ts >= slice.ts AND sched.ts < slice.ts + slice.dur
    )
  ) / 1000000 AS total_cpu_duration_ms,
  COUNT(*) AS frequency
FROM slice
JOIN thread_track ON slice.track_id = thread_track.id
JOIN thread ON thread_track.utid = thread.utid
JOIN janky_frames ON
  slice.ts >= janky_frames.ts AND slice.ts < janky_frames.ts + janky_frames.dur
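WHERE
  -- Note: this matches the main thread of every process in the trace, not only
  -- our app's, so system slices that overlap a janky frame are included too
  thread.is_main_thread = 1
  AND slice.dur > 1000000 -- Ignore operations shorter than 1 ms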
GROUP BY
  slice.name
ORDER BY
  total_wall_duration_ms DESC
LIMIT 25;

The result of this query is very telling:

operation_name	total_wall_duration_ms	total_cpu_duration_ms	frequency
Choreographer#doFrame	619	67	37
traversal	591	65	32
draw	575	51	32
onMessageReceived	545	189	113
handleMessageRefresh	251	87	51
...	...	...	...

Look at the first row: Choreographer#doFrame, the root operation for drawing a frame. Its Wall Time (619 ms) is roughly nine times its CPU Time (67 ms)!
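
The same comparison can be run for the first three rows of the result. This is a small sketch; the Row type is hypothetical and simply mirrors the columns of the table above.

```kotlin
// Hypothetical row type mirroring the columns of the query result.
data class Row(val name: String, val wallMs: Long, val cpuMs: Long)

/** Share of wall time the thread spent not running, as a percentage. */
fun offCpuPercent(row: Row): Double = 100.0 * (row.wallMs - row.cpuMs) / row.wallMs

fun main() {
    val rows = listOf(
        Row("Choreographer#doFrame", 619, 67),
        Row("traversal", 591, 65),
        Row("draw", 575, 51),
    )
    for (row in rows) {
        val ratio = row.wallMs.toDouble() / row.cpuMs
        println("%-24s wall/cpu = %.1f, off-CPU = %.0f%%".format(row.name, ratio, offCpuPercent(row)))
    }
}
```

For all three rows, roughly nine tenths of the wall time was spent off the CPU.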

Our hypothesis: The UI thread is not busy with work; it's waiting. Given that ShimmerView is a purely visual component that is constantly being redrawn, the most likely culprit is the GPU. The UI thread is sending draw commands too frequently and is waiting for the RenderThread and the GPU to handle them.

Chapter 3: Verifying the GPU Load Hypothesis

Let's test this hypothesis. We'll write a query that looks at what the RenderThread was busy with at the moments when the UI thread was experiencing jank.

WITH janky_frames AS (
  -- (same as in the previous query)
  SELECT ts, dur FROM slice
  JOIN thread_track ON slice.track_id = thread_track.id
  JOIN thread ON thread_track.utid = thread.utid
  JOIN process ON thread.upid = process.upid
  WHERE
    process.name = 'com.ndev.android.ui.sample'
    AND slice.name = 'Choreographer#doFrame'
    AND slice.dur > 16666666
),
render_thread AS (
  -- Find the RenderThread of our application
  SELECT utid
  FROM thread
  WHERE name = 'RenderThread' AND upid = (
    SELECT upid FROM process WHERE name = 'com.ndev.android.ui.sample' LIMIT 1
  )
)
SELECT
  slice.name,
  SUM(slice.dur) / 1000000 AS total_duration_ms,
  COUNT(*) as frequency
FROM slice
JOIN janky_frames ON
  -- Look for operations that OVERLAP in time with the janky frames
  slice.ts < janky_frames.ts + janky_frames.dur AND slice.ts + slice.dur > janky_frames.ts
WHERE
  -- We are only interested in slices from the RenderThread
  -- (IN instead of = so that more than one matching track or thread is handled)
  slice.track_id IN (SELECT id FROM thread_track WHERE utid IN (SELECT utid FROM render_thread))
GROUP BY
  slice.name
ORDER BY
  total_duration_ms DESC
LIMIT 20;

The results confirm our theory:

name	total_duration_ms	frequency
DrawFrame	1021	58
binder transaction	409	188
dequeueBuffer	382	57
eglSwapBuffersWithDamageKHR	226	29
queueBuffer	97	29
...	...	...

Here we see the full set of "heavy" graphics-related operations: DrawFrame, dequeueBuffer, eglSwapBuffersWithDamageKHR. This is consistent with the RenderThread being busy pushing our frames through the graphics pipeline, and the time spent in dequeueBuffer suggests it is often waiting for a free buffer.
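
One detail of the query is worth spelling out: the join keeps any RenderThread slice that overlaps a janky frame, and it sums the slice's full duration rather than only the overlapping part. That is likely why the DrawFrame total can exceed the total wall time of the janky frames themselves. The sketch below shows the same overlap condition in Kotlin; the Slice type and the timestamps are made up for illustration.

```kotlin
// Hypothetical type for illustration; in the trace these are rows of the slice table.
data class Slice(val name: String, val ts: Long, val dur: Long) {
    val end: Long get() = ts + dur
}

/** Same condition as the SQL join: a starts before b ends, and a ends after b starts. */
fun overlaps(a: Slice, b: Slice): Boolean = a.ts < b.end && a.end > b.ts

/** Time that the two slices actually share. */
fun overlapNs(a: Slice, b: Slice): Long = maxOf(0L, minOf(a.end, b.end) - maxOf(a.ts, b.ts))

fun main() {
    val jankyFrame = Slice("Choreographer#doFrame", ts = 100, dur = 30)
    val drawFrame = Slice("DrawFrame", ts = 120, dur = 40) // starts inside the frame, ends after it
    val earlier = Slice("DrawFrame", ts = 50, dur = 50)    // ends exactly where the frame starts

    println(overlaps(drawFrame, jankyFrame))  // true: counted by the query with its full dur of 40
    println(overlapNs(drawFrame, jankyFrame)) // 10: the part that really coincides with the frame
    println(overlaps(earlier, jankyFrame))    // false: touching edges do not count as overlap
}
```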

Conclusion: The problem isn't the complexity of the ShimmerView itself (CPU Time was low), but the frequency of its redraws. Our ValueAnimator calls postInvalidateOnAnimation() on every value update, forcing the rendering system to work to exhaustion.

Chapter 4: The Solution — Adaptive Draw Frequency Control

Since the problem is excessive redraws, the solution is to reduce them to a reasonable limit. We don't need to update the frame more often than the display allows (usually 60 Hz, or 16.6 ms per frame).

We will implement an adaptive frame-skipping mechanism. The logic is as follows:

1. We still use a ValueAnimator to calculate the gradient's position.
2. But we don't always call postInvalidateOnAnimation(). We only call it if enough time has passed since the last redraw (e.g., > 16 ms).
3. To be more flexible, we will use an Exponential Moving Average (EMA) to calculate the actual time between frames. If the system is under load and frames are rendering slowly, we will adapt and not try to "push" extra updates, which would only worsen the situation.
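
For readers who have not used an exponential moving average before, here is a minimal sketch of a single EMA step and how it reacts when frames suddenly slow down. The smoothing factor of 0.1 and the sample values are assumptions chosen for the example, not values taken from ShimmerView.

```kotlin
/** One EMA step: move the current average toward the new sample by a factor of alpha. */
fun emaUpdate(average: Double, sample: Double, alpha: Double): Double =
    average * (1.0 - alpha) + sample * alpha

fun main() {
    val alpha = 0.1 // assumed smoothing factor
    var averageMs = 16.6
    // Three normal frames followed by three slow ones.
    for (sampleMs in listOf(16.6, 16.6, 16.6, 30.0, 30.0, 30.0)) {
        averageMs = emaUpdate(averageMs, sampleMs, alpha)
        println("sample = $sampleMs ms, ema = ${"%.2f".format(averageMs)} ms")
    }
}
```

A small alpha makes the average react slowly, so one slow frame barely moves it, while a sustained slowdown gradually raises it.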

Here is the key fragment of the optimized code in ValueAnimator.addUpdateListener:

addUpdateListener { animation ->
    shimmerTranslate = animation.animatedValue as Float

    val now = System.nanoTime()
    // Update EMA to calculate the average frame time
    if (lastAnimatorUpdateNs != 0L) {
        val delta = (now - lastAnimatorUpdateNs).toDouble()
        emaFrameNs = emaFrameNs * (1.0 - emaAlpha) + delta * emaAlpha
    }
    lastAnimatorUpdateNs = now

    // Determine how long to wait until the next frame
    // Either our adaptive value or the target (16.6 ms)
    val adaptiveFrameNs = if (useAdaptiveThrottling) {
        emaFrameNs.coerceAtLeast(targetFrameNs.toDouble()).toLong()
    } else {
        targetFrameNs
    }
    // Never let the interval between redraws drop below the hard minimum
    val allowedFrameNs = adaptiveFrameNs.coerceAtLeast(minFrameNsHardLimit)

    // Call invalidate only if enough time has passed
    if (now - lastInvalidateNs >= allowedFrameNs) {
        lastInvalidateNs = now
        postInvalidateOnAnimation()
    }
}
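
To experiment with this logic outside of a View, the decision can be pulled into a small class that receives the time as a parameter. This is only a sketch: the FrameThrottler class, the countInvalidations helper, and all default values are my assumptions for the example, not the actual ShimmerView code.

```kotlin
/**
 * Standalone version of the throttling decision, with time passed in explicitly.
 * Default values are assumptions for this sketch.
 */
class FrameThrottler(
    private val targetFrameNs: Long = 16_666_667L,
    private val minFrameNsHardLimit: Long = 8_000_000L,
    private val emaAlpha: Double = 0.1,
    private val useAdaptiveThrottling: Boolean = true,
) {
    private var emaFrameNs = targetFrameNs.toDouble()
    private var lastUpdateNs = 0L
    private var lastInvalidateNs = 0L

    /** Returns true if the caller should request a redraw for this animator update. */
    fun shouldInvalidate(nowNs: Long): Boolean {
        if (lastUpdateNs != 0L) {
            val delta = (nowNs - lastUpdateNs).toDouble()
            emaFrameNs = emaFrameNs * (1.0 - emaAlpha) + delta * emaAlpha
        }
        lastUpdateNs = nowNs

        val adaptiveNs = if (useAdaptiveThrottling) {
            emaFrameNs.coerceAtLeast(targetFrameNs.toDouble()).toLong()
        } else {
            targetFrameNs
        }
        val allowedNs = adaptiveNs.coerceAtLeast(minFrameNsHardLimit)

        if (nowNs - lastInvalidateNs < allowedNs) return false
        lastInvalidateNs = nowNs
        return true
    }
}

/** Feeds evenly spaced updates into a fresh throttler and counts how many pass. */
fun countInvalidations(updateIntervalNs: Long, updates: Int): Int {
    val throttler = FrameThrottler()
    return (1..updates).count { i -> throttler.shouldInvalidate(i * updateIntervalNs) }
}

fun main() {
    // Updates every 8 ms: 8 ms and 16 ms are both below the 16.67 ms target,
    // so only every third update (24 ms apart) passes.
    println(countInvalidations(updateIntervalNs = 8_000_000L, updates = 120))
    // Updates every 20 ms are already slower than the target, so every update passes.
    println(countInvalidations(updateIntervalNs = 20_000_000L, updates = 120))
}
```

Note the quantization effect in the first case: an update that arrives slightly before the allowed interval is skipped entirely, so the real redraw interval is the next update after the threshold, not the threshold itself.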

Chapter 5: Verifying the Results

With the new code, we run the shimmerAnimationOpt benchmark. The results speak for themselves:

AnimationBenchmark_shimmerAnimationOpt
frameCount           min 1,446.0,   median 1,447.0,   max 1,449.0
frameDurationCpuMs   P50     11.0,   P90     12.2,   P95     13.3,   P99     15.4

Let's compare "before" and "after":

Metric	shimmerAnimationBasic (Before)	shimmerAnimationOpt (After)	Change
frameCount (median)	2,184	1,447	-33.7%
frameDurationCpuMs P95	28.6 ms	13.3 ms	-53.5%
frameDurationCpuMs P99	29.2 ms	15.4 ms	-47.3%

We significantly reduced the number of rendered frames (frameCount), which reduces the work sent to the RenderThread and the GPU. Most importantly, even the 99th percentile of frame time is now 15.4 ms, which fits within the 16.6 ms budget for 60 FPS. The animation has become smooth.
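
The percentages in the table, and the average frame rate they imply, can be reproduced with a few lines of Kotlin. This is a sketch with hypothetical helper names; the frame-rate figures assume that the whole 35-second sleep in the benchmark counts as the measurement window.

```kotlin
/** Relative change from before to after, in percent (negative means a reduction). */
fun percentChange(before: Double, after: Double): Double = (after - before) / before * 100.0

/** Average frame rate, assuming the frames were produced over the given window. */
fun averageFps(frameCount: Int, windowSeconds: Double): Double = frameCount / windowSeconds

fun main() {
    println("frameCount: %.1f%%".format(percentChange(2184.0, 1447.0)))
    println("P95:        %.1f%%".format(percentChange(28.6, 13.3)))
    println("P99:        %.1f%%".format(percentChange(29.2, 15.4)))
    // Assumption: frames are spread over the 35-second measurement block.
    println("before: about %.0f FPS".format(averageFps(2184, 35.0)))
    println("after:  about %.0f FPS".format(averageFps(1447, 35.0)))
}
```

Under that assumption the throttled version averages about 41 frames per second instead of about 62, so the steadier frame times are bought with fewer redraws.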

Conclusions

Don't trust your feelings; measure. Android Macrobenchmark is a powerful tool for obtaining objective data about UI performance.
Wall Time vs. CPU Time is your first step in trace analysis. This comparison quickly shows whether the root of the problem is "heavy" code or time spent waiting.
Perfetto SQL is a superpower. Precise queries against trace data let you test hypotheses quickly instead of guessing.
Frequency can matter more than complexity. Sometimes a performance problem is not about what you draw, but how often you draw it. Throttling redraws intelligently can bring a substantial improvement in frame times.

A systematic, data-driven approach to optimization not only solves specific problems but also provides a deeper understanding of how the rendering system in Android works. This knowledge will pay off in future projects.