Fixing Android Animation Jank with Perfetto & Macrobenchmark
2025-10-26 by Nikolay Vlasov
When the UI starts to lag, you can't find the cause with a simple glance—you need powerful tools. In this article, we'll show you how to use the professional arsenal of an Android developer—Macrobenchmark and Perfetto—to hunt down "janky frames" in a view animation. You'll go through the entire process from measuring the problem to solving it and see how data analysis transforms a lagging interface into a perfectly smooth one.
Hello! In my previous article, we integrated LeakCanary to catch memory leaks, using a ShimmerView as our test subject. This component did its job perfectly, but on some devices, I started noticing animation stutters—the very "janky frames" that ruin the user experience.
Just looking at a laggy UI is not our way. As engineers, we must approach the problem systematically: measure, analyze, find the cause, and fix it, then measure again to confirm our success.
In this article, I will show the entire process of investigating the performance of ShimmerView:
- Writing a Macrobenchmark to get objective and reproducible performance metrics.
- Analyzing traces in the Perfetto UI using SQL queries to find the bottleneck.
- Formulating and testing a hypothesis about the cause of the FPS drops.
- Implementing an optimization based on the data obtained.
- Verifying the result and summarizing the findings.
Chapter 1: Measuring the Problem with Macrobenchmark
To understand how bad things are, we need concrete numbers. The ideal tool for this is Android Macrobenchmark. It allows us to run UI scenarios on real devices and collect performance metrics, such as FrameTimingMetric, which tracks the render time of each frame.
In a perfect world, to get a complete performance picture, tests should be run on a wide range of devices: from budget models to flagships. This helps to understand how the application behaves with different processor powers, RAM sizes, and screen resolutions.
However, for an initial assessment and to identify the most obvious problems, it's enough to use one representative device. In my case, the choice fell on a Huawei Honor 10i running Android 9. This smartphone, released in 2019, is equipped with a Kirin 710 processor and 4 GB of RAM, which can be considered modest specifications today. If an animation "lags" on such a device, the problem will likely be noticeable on more powerful models as well, albeit to a lesser extent. Thus, testing on a deliberately weaker device allows for the effective identification of performance bottlenecks.
Here is my test for the basic, unoptimized version of ShimmerView:
import androidx.benchmark.macro.FrameTimingMetric
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class AnimationBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun shimmerAnimationBasic() {
        measureAnimation(
            testName = "shimmer_basic",
            stringResName = "basic_shimmer",
            viewToWaitForId = "basic_shimmer_view"
        )
    }
    // ...

    private fun measureAnimation(
        testName: String,
        stringResName: String,
        viewToWaitForId: String
    ) {
        benchmarkRule.measureRepeated(
            packageName = "com.ndev.android.ui.sample",
            metrics = listOf(FrameTimingMetric()),
            iterations = 3,
            startupMode = StartupMode.WARM, // Simulate launching from memory
            setupBlock = {
                // Preparation: start Activity, navigate to the right screen
                startActivityAndWait()
                navigateToScreen(stringResName, viewToWaitForId)
                ensureAnimationStopped()
            }
        ) {
            // Measurement: start the animation and let it run for the full duration
            startAnimation()
            Thread.sleep(35_000L) // ANIMATION_DURATION_MS
            stopAnimation()
        }
    }
    // ... rest of the test code
}
Why does the animation last 35 seconds?
Many performance issues are cumulative. A short test (1-2 seconds) might not reveal problems related to CPU throttling due to heat, pressure on the Garbage Collector, or other effects that manifest over time. A long measurement gives us a more complete and honest picture.
After running shimmerAnimationBasic, we get the following results:
frameCount min 2,179.0, median 2,184.0, max 2,187.0
frameDurationCpuMs P50 12.2, P90 27.1, P95 28.6, P99 29.2
Traces: Iteration 0 1 2
Analysis of the results:
- P50 (the median) at 12.2 ms looks good. It's less than the 16.6 ms required for 60 FPS.
- But P90 (90% of frames render faster than this value, so 10% are slower), P95, and P99 are a disaster. 29.2 ms at the 99th percentile means that for the slowest 1% of frames, the effective frame rate drops to ~34 FPS (1000 / 29.2). These are the "janks" we're talking about.
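To make the percentile arithmetic concrete, here is a minimal sketch in plain Kotlin. It is not part of the benchmark library: the helper names `percentile`, `effectiveFps`, and `jankRatio` are hypothetical, and the nearest-rank percentile method is an assumption (Macrobenchmark may compute its percentiles differently).

```kotlin
import kotlin.math.ceil

/** Nearest-rank percentile of frame durations in milliseconds (assumed method). */
fun percentile(durationsMs: List<Double>, p: Double): Double {
    require(durationsMs.isNotEmpty()) { "need at least one frame" }
    require(p > 0.0 && p <= 100.0) { "p must be in (0, 100]" }
    val sorted = durationsMs.sorted()
    // Rank of the smallest value that is >= p percent of all samples
    val rank = ceil(p * sorted.size / 100.0).toInt().coerceIn(1, sorted.size)
    return sorted[rank - 1]
}

/** Frame rate you would get if every frame took frameMs milliseconds. */
fun effectiveFps(frameMs: Double): Double = 1000.0 / frameMs

/** Share of frames that exceed the budget (16.6 ms for 60 FPS, as in the article). */
fun jankRatio(durationsMs: List<Double>, budgetMs: Double = 16.6): Double =
    durationsMs.count { it > budgetMs }.toDouble() / durationsMs.size
```

For example, `effectiveFps(29.2)` gives roughly 34, which is the figure quoted for the 99th percentile.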
Now we have not just a feeling, but concrete numbers and a trace for deep analysis.
Chapter 2: Diving into the Trace with Perfetto
Let's open one of the traces generated by the benchmark in the Perfetto UI. We are faced with a huge amount of data. To make sense of it, we need to ask the right questions.
Investigation Navigator: Wall Time vs. CPU Time
The first and most important question is: is the thread busy with work or is it waiting for something?
- Wall Time (or Wall Duration) is the total time that has passed from the beginning to the end of a function's execution. "The time on a wall clock."
- CPU Time (or CPU Duration) is the time the processor actually spent executing the instructions of that thread.
If Wall Time ≈ CPU Time, it means the processor was fully loaded with computations. The problem is in our code—it's too "heavy." We need to find which specific methods (onMeasure, onDraw, inflate) are taking a long time and optimize them.
If Wall Time >> CPU Time, it means the thread was idle for most of the time, waiting. It could be waiting for a response from the disk (I/O), the network, another process (via Binder), or, as is often the case in UI, from the GPU.
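The rule of thumb above can be sketched as a small classifier. This is only an illustration: `SliceTiming`, `Bottleneck`, `classify`, and the 0.8 threshold are hypothetical choices, not something Perfetto provides.

```kotlin
enum class Bottleneck { CPU_BOUND, WAITING }

/** Aggregated timing of one trace slice; the values would come from a trace query. */
data class SliceTiming(val name: String, val wallMs: Double, val cpuMs: Double)

/**
 * Classifies a slice by the share of wall time the thread actually spent on a CPU.
 * The 0.8 threshold is an arbitrary assumption for this sketch.
 */
fun classify(slice: SliceTiming, cpuBoundThreshold: Double = 0.8): Bottleneck {
    require(slice.wallMs > 0.0) { "wall time must be positive" }
    val cpuShare = slice.cpuMs / slice.wallMs
    return if (cpuShare >= cpuBoundThreshold) Bottleneck.CPU_BOUND else Bottleneck.WAITING
}
```

A slice with 10 ms of wall time and 9.5 ms of CPU time is classified as `CPU_BOUND`; one with 100 ms of wall time and 10 ms of CPU time is classified as `WAITING`.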
To check this, we'll use an SQL query. First, let's find all the "problematic" frames (those that lasted longer than 16.6 ms), and then see what the main thread was doing during that time.
-- Step 1: Find all "janky" frames in the UI thread of our application
WITH janky_frames AS (
  SELECT
    ts,  -- start timestamp
    dur  -- duration
  FROM slice
  JOIN thread_track ON slice.track_id = thread_track.id
  JOIN thread ON thread_track.utid = thread.utid
  JOIN process ON thread.upid = process.upid
  WHERE
    process.name = 'com.ndev.android.ui.sample'
    AND slice.name = 'Choreographer#doFrame'
    AND slice.dur > 16666666  -- Duration > 16.6 ms (in nanoseconds)
)
-- Step 2: Aggregate all main-thread operations that STARTED inside these janky frames
-- Note: there is no process filter in this step, so main-thread slices of other
-- processes that start inside a janky frame are counted as well
SELECT
  slice.name AS operation_name,
  SUM(slice.dur) / 1000000 AS total_wall_duration_ms,
  SUM(
    -- CPU time: scheduling slices of the same thread that start inside the slice
    (SELECT SUM(sched.dur)
     FROM sched
     WHERE
       sched.utid = thread.utid
       AND sched.ts >= slice.ts AND sched.ts < slice.ts + slice.dur
    )
  ) / 1000000 AS total_cpu_duration_ms,
  COUNT(*) AS frequency
FROM slice
JOIN thread_track ON slice.track_id = thread_track.id
JOIN thread ON thread_track.utid = thread.utid
JOIN janky_frames ON
  slice.ts >= janky_frames.ts AND slice.ts < janky_frames.ts + janky_frames.dur
WHERE
  thread.is_main_thread = 1
  AND slice.dur > 1000000  -- Ignore operations shorter than 1 ms
GROUP BY
  slice.name
ORDER BY
  total_wall_duration_ms DESC
LIMIT 25;
The result of this query is very telling:
| operation_name | total_wall_duration_ms | total_cpu_duration_ms | frequency |
|---|---|---|---|
| Choreographer#doFrame | 619 | 67 | 37 |
| traversal | 591 | 65 | 32 |
| draw | 575 | 51 | 32 |
| onMessageReceived | 545 | 189 | 113 |
| handleMessageRefresh | 251 | 87 | 51 |
| ... | ... | ... | ... |
Look at the first row: Choreographer#doFrame, the root operation for drawing a frame. Its Wall Time (619 ms) is roughly nine times its CPU Time (67 ms)!
Our hypothesis: The UI thread is not busy with work; it's waiting. Given that ShimmerView is a purely visual component that is constantly being redrawn, the most likely culprit is the GPU. The UI thread is sending draw commands too frequently and is waiting for the RenderThread and the GPU to handle them.
Chapter 3: Verifying the GPU Load Hypothesis
Let's test this hypothesis. We'll write a query that looks at what the RenderThread was busy with at the moments when the UI thread was experiencing jank.
WITH janky_frames AS (
  -- (same as in the previous query)
  SELECT ts, dur FROM slice
  JOIN thread_track ON slice.track_id = thread_track.id
  JOIN thread ON thread_track.utid = thread.utid
  JOIN process ON thread.upid = process.upid
  WHERE
    process.name = 'com.ndev.android.ui.sample'
    AND slice.name = 'Choreographer#doFrame'
    AND slice.dur > 16666666
),
render_thread AS (
  -- Find the RenderThread of our application
  SELECT utid
  FROM thread
  WHERE name = 'RenderThread' AND upid = (
    SELECT upid FROM process WHERE name = 'com.ndev.android.ui.sample' LIMIT 1
  )
)
SELECT
  slice.name,
  SUM(slice.dur) / 1000000 AS total_duration_ms,
  COUNT(*) AS frequency
FROM slice
JOIN janky_frames ON
  -- Look for operations that OVERLAP in time with the janky frames
  slice.ts < janky_frames.ts + janky_frames.dur AND slice.ts + slice.dur > janky_frames.ts
WHERE
  -- We are only interested in slices from the RenderThread
  slice.track_id = (SELECT id FROM thread_track WHERE utid = (SELECT utid FROM render_thread))
GROUP BY
  slice.name
ORDER BY
  total_duration_ms DESC
LIMIT 20;
The results confirm our theory:
| name | total_duration_ms | frequency |
|---|---|---|
| DrawFrame | 1021 | 58 |
| binder transaction | 409 | 188 |
| dequeueBuffer | 382 | 57 |
| eglSwapBuffersWithDamageKHR | 226 | 29 |
| queueBuffer | 97 | 29 |
| ... | ... | ... |
Here we see the full set of "heavy" graphics-related operations: DrawFrame, dequeueBuffer, eglSwapBuffers. This is direct evidence that the RenderThread is actively working with the GPU to render our frames.
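One detail worth keeping in mind when reading these numbers: the two queries match slices to janky frames with different time predicates. The first keeps slices that start inside a janky frame; the second keeps any slice that overlaps one, and it sums the full slice duration rather than only the overlapping part. A minimal sketch of both predicates in plain Kotlin (the `Interval` type and function names are hypothetical):

```kotlin
/** A trace interval: start timestamp and duration, both in nanoseconds. */
data class Interval(val ts: Long, val dur: Long) {
    val end: Long get() = ts + dur
}

/** Predicate of the first query: the slice starts inside the frame. */
fun startsInside(slice: Interval, frame: Interval): Boolean =
    slice.ts >= frame.ts && slice.ts < frame.end

/** Predicate of the second query: the slice and the frame share any time at all. */
fun overlaps(slice: Interval, frame: Interval): Boolean =
    slice.ts < frame.end && slice.end > frame.ts
```

A RenderThread slice that began shortly before a janky frame and ran into it satisfies `overlaps` but not `startsInside`, which is why the second query uses the looser predicate.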
Conclusion: The problem isn't the complexity of the ShimmerView itself (CPU Time was low), but the frequency of its redraws. Our ValueAnimator calls postInvalidateOnAnimation() on every value update, forcing the rendering system to work to exhaustion.
Chapter 4: The Solution — Adaptive Draw Frequency Control
Since the problem is excessive redraws, the solution is to reduce them to a reasonable limit. We don't need to update the frame more often than the display allows (usually 60 Hz, or 16.6 ms per frame).
We will implement an adaptive frame-skipping mechanism. The logic is as follows:
1. We still use a ValueAnimator to calculate the gradient's position.
2. But we don't always call postInvalidateOnAnimation(). We only call it if enough time has passed since the last redraw (e.g., > 16 ms).
3. To be more flexible, we will use an Exponential Moving Average (EMA) to calculate the actual time between frames. If the system is under load and frames are rendering slowly, we will adapt and not try to "push" extra updates, which would only worsen the situation.
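For readers unfamiliar with the EMA from step 3, here is a minimal sketch of the update rule on its own. The function names `emaStep` and `smooth` are hypothetical, and the sample values in the note below are made up for illustration.

```kotlin
/** One EMA update: alpha close to 1 reacts quickly, alpha close to 0 smooths heavily. */
fun emaStep(previous: Double, sample: Double, alpha: Double): Double =
    previous * (1.0 - alpha) + sample * alpha

/** Runs the EMA over a series of samples and returns the smoothed value after each one. */
fun smooth(samples: List<Double>, alpha: Double, initial: Double): List<Double> {
    val result = ArrayList<Double>(samples.size)
    var current = initial
    for (sample in samples) {
        current = emaStep(current, sample, alpha)
        result.add(current)
    }
    return result
}
```

With `alpha = 0.5` and an initial value of 16.0, the samples 16, 16, 32, 16 produce 16, 16, 24, 20: a single slow frame raises the average, and the average then decays back rather than snapping back.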
Here is the key fragment of the optimized code in ValueAnimator.addUpdateListener:
addUpdateListener { animation ->
    shimmerTranslate = animation.animatedValue as Float
    val now = System.nanoTime()

    // Update the EMA of the interval between animator updates
    if (lastAnimatorUpdateNs != 0L) {
        val delta = (now - lastAnimatorUpdateNs).toDouble()
        emaFrameNs = emaFrameNs * (1.0 - emaAlpha) + delta * emaAlpha
    }
    lastAnimatorUpdateNs = now

    // Minimum interval between redraws: either the adaptive (EMA-based) value,
    // never shorter than the target (16.6 ms), or the target itself
    val adaptiveFrameNs = if (useAdaptiveThrottling) {
        emaFrameNs.coerceAtLeast(targetFrameNs.toDouble()).toLong()
    } else {
        targetFrameNs
    }

    // Enforce a hard lower bound on the interval between redraws
    val allowedFrameNs = adaptiveFrameNs.coerceAtLeast(minFrameNsHardLimit)

    // Call invalidate only if enough time has passed since the last redraw
    if (now - lastInvalidateNs >= allowedFrameNs) {
        lastInvalidateNs = now
        postInvalidateOnAnimation()
    }
}
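One way to make this logic testable without a device is to move it into a plain class with no Android dependencies. The sketch below mirrors the listener under stated assumptions: the class name `FrameThrottler`, the method `shouldInvalidate`, and all default values (including the 8.3 ms hard limit and `emaAlpha = 0.1`) are hypothetical and not taken from the ShimmerView source.

```kotlin
/** Decides whether an animator update should trigger a redraw (hypothetical helper). */
class FrameThrottler(
    private val targetFrameNs: Long = 16_666_667L,      // assumed 60 Hz target
    private val minFrameNsHardLimit: Long = 8_333_333L, // assumed lower bound
    private val emaAlpha: Double = 0.1,                 // assumed smoothing factor
    private val useAdaptiveThrottling: Boolean = true,
) {
    private var lastUpdateNs = 0L
    private var lastInvalidateNs = 0L
    private var emaFrameNs = targetFrameNs.toDouble()

    /** Returns true when the caller should request a redraw at time nowNs. */
    fun shouldInvalidate(nowNs: Long): Boolean {
        // Track the smoothed interval between updates (0 means "no update seen yet")
        if (lastUpdateNs != 0L) {
            val delta = (nowNs - lastUpdateNs).toDouble()
            emaFrameNs = emaFrameNs * (1.0 - emaAlpha) + delta * emaAlpha
        }
        lastUpdateNs = nowNs

        val adaptiveFrameNs = if (useAdaptiveThrottling) {
            emaFrameNs.coerceAtLeast(targetFrameNs.toDouble()).toLong()
        } else {
            targetFrameNs
        }
        val allowedFrameNs = adaptiveFrameNs.coerceAtLeast(minFrameNsHardLimit)

        if (nowNs - lastInvalidateNs >= allowedFrameNs) {
            lastInvalidateNs = nowNs
            return true
        }
        return false
    }
}
```

In the view, the listener would then reduce to updating `shimmerTranslate` and calling `postInvalidateOnAnimation()` only when `shouldInvalidate(System.nanoTime())` returns true. With these defaults, updates arriving every 8 ms result in a redraw on every third update.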
Chapter 5: Verifying the Results
With the new code, we run the shimmerAnimationOpt benchmark. The results speak for themselves:
AnimationBenchmark_shimmerAnimationOpt
frameCount min 1,446.0, median 1,447.0, max 1,449.0
frameDurationCpuMs P50 11.0, P90 12.2, P95 13.3, P99 15.4
Let's compare "before" and "after":
| Metric | shimmerAnimationBasic (Before) | shimmerAnimationOpt (After) | Change |
|---|---|---|---|
| frameCount (median) | 2,184 | 1,447 | -33.7% |
| frameDurationCpuMs P95 | 28.6 ms | 13.3 ms | -53.5% |
| frameDurationCpuMs P99 | 29.2 ms | 15.4 ms | -47.3% |
We significantly reduced the number of rendered frames (frameCount), which directly lowered the load on the GPU. Most importantly, even the 99th percentile of frame time is now 15.4 ms, which fits completely within the 16.6 ms budget for 60 FPS. The animation has become smooth.
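The "Change" figures in the comparison are plain relative differences and can be reproduced in a couple of lines. The helper name `percentChange` is hypothetical.

```kotlin
/** Relative change from before to after, in percent (negative means a reduction). */
fun percentChange(before: Double, after: Double): Double =
    (after - before) / before * 100.0
```

For instance, `percentChange(2184.0, 1447.0)` is about -33.7 and `percentChange(29.2, 15.4)` is about -47.3, matching the frameCount and P99 values reported by the two benchmark runs.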
Conclusions
- Don't trust your feelings, measure. Android Macrobenchmark is a powerful tool for obtaining objective data about UI performance.
- Wall Time vs CPU Time is your first step in trace analysis. This approach instantly shows whether the root of the problem is "heavy" code or waiting times.
- Perfetto SQL is a superpower. The ability to make precise queries to trace data allows for rapid hypothesis testing and avoids guesswork.
- Frequency is more important than complexity. Sometimes, a performance problem is not about what you draw, but how often you draw it. Intelligent frame rate management can provide a colossal performance boost.
A systematic, data-driven approach to optimization not only solves specific problems but also provides a deeper understanding of how the rendering system in Android works. This knowledge will pay off in future projects.