Step 1: Configure Pipeline Trace Capture
Perfetto must be configured to record both the app process and SurfaceFlinger's frame timeline. The configuration below uses a circular buffer strategy optimized for scroll-heavy interactions, which are the most common jank triggers.

    object PerfettoTraceConfig {
        fun generateScrollTraceConfig(durationSeconds: Long = 15): String {
            return """
                buffers: { size_kb: 131072 }
                duration_ms: ${durationSeconds * 1000}
                data_sources: {
                    config {
                        name: "android.surfaceflinger.frametimeline"
                        target_buffer: 0
                    }
                }
                data_sources: {
                    config {
                        name: "android.choreographer"
                        target_buffer: 0
                    }
                }
                data_sources: {
                    config {
                        name: "android.hwui"
                        target_buffer: 0
                        hwui_config {
                            record_frames: true
                            record_layers: true
                        }
                    }
                }
            """.trimIndent()
        }
    }

Architecture Rationale: We allocate 128MB (131072 KB) to prevent buffer overflow during rapid fling gestures. The android.hwui source is explicitly enabled with record_frames: true to capture RenderThread stages. Without this, you will only see main-thread work and miss the 20-80ms shader compilation stalls that frequently break the 16.6ms budget on first launch.
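The numbers in this rationale are simple arithmetic: the 16.6ms budget is one refresh period at 60Hz, and 131072 KB is 128MB expressed in the unit the config expects. The sketch below spells that out. The function names are illustrative assumptions, not part of Perfetto or the Android SDK.

```kotlin
// Illustrative helpers (hypothetical names, not an Android or Perfetto API).

/** Frame budget in milliseconds for a given display refresh rate. */
fun frameBudgetMs(refreshRateHz: Double): Double {
    require(refreshRateHz > 0.0) { "refresh rate must be positive" }
    return 1000.0 / refreshRateHz
}

/** Converts a buffer size in MB to the size_kb value used in the trace config. */
fun bufferSizeKb(megabytes: Int): Int = megabytes * 1024

/** Number of complete refresh periods covered by a stall of the given length. */
fun vsyncPeriodsSpanned(stallMs: Double, refreshRateHz: Double): Int =
    (stallMs / frameBudgetMs(refreshRateHz)).toInt()
```

For instance, bufferSizeKb(128) yields the 131072 used in the config, and an 80ms shader compilation stall at 60Hz covers four complete refresh periods.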
Step 2: Extract and Correlate Frame Tokens
Once a trace is captured, the token acts as the primary key. You can parse the Perfetto protobuf output or use the Perfetto UI query engine to join SurfaceFlinger deadlines with Choreographer callbacks.

    data class FrameDiagnostic(
        val token: Long,
        val expectedPresentNs: Long,
        val actualPresentNs: Long,
        val jankClassification: JankType,
        val appProcessDurationNs: Long,
        val renderThreadDurationNs: Long
    )

    enum class JankType {
        APP_DEADLINE_MISSED,
        COMPOSITOR_DELAY,
        PREDICTION_ERROR,
        CLEAN
    }

    // TraceSnapshot is the parsed-trace wrapper used throughout this guide; it is not defined here.
    class FrameCorrelator {
        fun analyze(traceData: TraceSnapshot): List<FrameDiagnostic> {
            return traceData.frames
                // frame.jankType is the raw string from the trace, so map it before comparing.
                .filter { mapSurfaceFlingerType(it.jankType) != JankType.CLEAN }
                .map { frame ->
                    // Look up the app-side and RenderThread slices that share this frame token.
                    val appWork = traceData.findChoreographerSlice(frame.token)
                    val hwuiWork = traceData.findRenderThreadSlice(frame.token)
                    FrameDiagnostic(
                        token = frame.token,
                        expectedPresentNs = frame.expectedPresentTime,
                        actualPresentNs = frame.actualPresentTime,
                        jankClassification = mapSurfaceFlingerType(frame.jankType),
                        appProcessDurationNs = appWork?.duration ?: 0L,
                        renderThreadDurationNs = hwuiWork?.duration ?: 0L
                    )
                }
                // Worst presentation delay first.
                .sortedByDescending { it.actualPresentNs - it.expectedPresentNs }
        }

        private fun mapSurfaceFlingerType(raw: String): JankType = when (raw) {
            "AppDeadlineMissed" -> JankType.APP_DEADLINE_MISSED
            "SurfaceFlinger" -> JankType.COMPOSITOR_DELAY
            "PredictionError" -> JankType.PREDICTION_ERROR
            else -> JankType.CLEAN
        }
    }

Why this structure: Separating appProcessDurationNs and renderThreadDurationNs forces explicit analysis of where the budget was consumed. A frame can have a 4ms main thread duration but a 14ms RenderThread duration, resulting in a missed deadline. The correlator surfaces this immediately, preventing the common mistake of optimizing the wrong thread.
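To make the "which thread consumed the budget" question concrete, here is a minimal sketch that labels a frame by its dominant stage. It assumes, as a simplification, that main-thread and RenderThread work for one frame run back to back, and it uses 16.6ms as the default budget. The enum and function names are hypothetical and not part of the correlator above.

```kotlin
// Hypothetical helper: names and the 16.6 ms default budget are assumptions for illustration.
enum class Bottleneck { MAIN_THREAD, RENDER_THREAD, WITHIN_BUDGET }

fun classifyBottleneck(
    mainThreadNs: Long,
    renderThreadNs: Long,
    budgetNs: Long = 16_600_000L
): Bottleneck = when {
    // Simplification: treat the two stages as sequential and compare their sum to the budget.
    mainThreadNs + renderThreadNs <= budgetNs -> Bottleneck.WITHIN_BUDGET
    // Over budget: blame whichever stage took longer (ties go to RenderThread).
    renderThreadNs >= mainThreadNs -> Bottleneck.RENDER_THREAD
    else -> Bottleneck.MAIN_THREAD
}
```

With the figures from the paragraph above, classifyBottleneck(4_000_000, 14_000_000) reports RENDER_THREAD, since the combined 18ms exceeds the budget and the RenderThread share dominates.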
Step 3: Automate Regression Detection
Manual trace analysis is unsustainable for CI. The Macrobenchmark library natively consumes FrameTimeline data through FrameTimingMetric. The following test pattern isolates scroll performance and enforces percentile gates.
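
    import androidx.benchmark.macro.FrameTimingMetric
    import androidx.benchmark.macro.StartupMode
    import androidx.benchmark.macro.junit4.MacrobenchmarkRule
    import androidx.test.ext.junit.runners.AndroidJUnit4
    import androidx.test.uiautomator.By
    import androidx.test.uiautomator.Direction
    import org.junit.Rule
    import org.junit.Test
    import org.junit.runner.RunWith

    @RunWith(AndroidJUnit4::class)
    class ScrollFrameBenchmark {
        @get:Rule
        val uiPerformanceRule = MacrobenchmarkRule()

        @Test
        fun validateScrollFrameBudget() {
            uiPerformanceRule.measureRepeated(
                packageName = "com.example.production.app",
                metrics = listOf(FrameTimingMetric()),
                iterations = 7,
                startupMode = StartupMode.COLD,
                setupBlock = {
                    device.pressHome()
                    // startActivityAndWait() belongs to MacrobenchmarkScope, not UiDevice;
                    // with no arguments it launches the package's default activity.
                    startActivityAndWait()
                    device.waitForIdle()
                }
            ) {
                val scrollContainer = device.findObject(By.res(packageName, "main_feed_recycler"))
                // Keep the gesture away from the screen edges so it does not trigger system navigation.
                scrollContainer.setGestureMargin(device.displayWidth / 6)
                scrollContainer.fling(Direction.DOWN)
            }
        }
    }
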
The test runner outputs frameDurationCpuMs and frameOverrunMs at P50, P90, P95, and P99. CI pipelines should parse these JSON outputs and fail builds when P99 overrun exceeds 8ms or P90 CPU duration exceeds 14ms. This creates a deterministic quality gate that aligns with human perception thresholds.
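The gate itself is a pair of comparisons. The sketch below assumes the percentile values have already been extracted from the benchmark JSON by your CI parser. The data classes and function name are hypothetical, and the defaults mirror the 8ms and 14ms limits stated above.

```kotlin
// Hypothetical CI gate: assumes percentiles were already parsed out of the benchmark JSON.
data class FramePercentiles(val p50: Double, val p90: Double, val p95: Double, val p99: Double)

data class GateResult(val passed: Boolean, val failures: List<String>)

fun evaluateFrameGate(
    frameOverrunMs: FramePercentiles,
    frameDurationCpuMs: FramePercentiles,
    maxOverrunP99Ms: Double = 8.0,
    maxCpuP90Ms: Double = 14.0
): GateResult {
    val failures = mutableListOf<String>()
    // Gate 1: tail overrun past the frame deadline.
    if (frameOverrunMs.p99 > maxOverrunP99Ms) {
        failures += "frameOverrunMs P99 ${frameOverrunMs.p99} exceeds $maxOverrunP99Ms"
    }
    // Gate 2: typical CPU cost per frame.
    if (frameDurationCpuMs.p90 > maxCpuP90Ms) {
        failures += "frameDurationCpuMs P90 ${frameDurationCpuMs.p90} exceeds $maxCpuP90Ms"
    }
    return GateResult(failures.isEmpty(), failures)
}
```

A CI step would call evaluateFrameGate once per benchmark and fail the build when passed is false, printing the failures list as the reason.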
Pitfall Guide
1. The Average Fallacy
Explanation: Teams often set CI thresholds on mean frame duration. A mean of 12ms can hide 15% of frames exceeding 25ms. Users perceive the worst frames, not the average.
Fix: Always gate on P95 or P99 percentiles. Configure your CI parser to extract frameOverrunMs.p99 and reject PRs where it exceeds 8ms.
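A small worked example shows how the mean hides the tail. The sketch uses a nearest-rank percentile, which is one common definition and may differ slightly from the interpolation your benchmark tooling applies. The function name and sample data are illustrative.

```kotlin
// Illustrative nearest-rank percentile; benchmark tooling may interpolate differently.
fun percentile(samples: List<Double>, p: Int): Double {
    require(samples.isNotEmpty()) { "samples must not be empty" }
    require(p in 1..100) { "p must be between 1 and 100" }
    val sorted = samples.sorted()
    // Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    val rank = (p * sorted.size + 99) / 100
    return sorted[rank - 1]
}

/** Builds a sample where 85% of frames take 10 ms and 15% take 26 ms. */
fun skewedFrameSample(): List<Double> = List(85) { 10.0 } + List(15) { 26.0 }
```

For skewedFrameSample() the mean is 12.4ms, which looks healthy against a 16.6ms budget, while both P95 and P99 are 26ms.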
2. Main Thread Myopia
Explanation: Developers assume a fast Choreographer.doFrame guarantees smooth rendering. The HWUI RenderThread operates independently and can stall on syncFrameState, texture upload, or shader compile operations.
Fix: Enable android.hwui data sources in Perfetto. Explicitly monitor RenderThread slice durations. Offload heavy bitmap decoding to background threads and use remember with stable keys to prevent unnecessary recomposition.
3. Misinterpreting PredictionError
Explanation: PredictionError jankType indicates SurfaceFlinger's VSYNC prediction was inaccurate, not that your app ran slowly. This is common on devices with variable refresh rates or aggressive power saving.
Fix: Filter out PredictionError when calculating app-specific regression metrics. Focus CI gates on AppDeadlineMissed and SurfaceFlinger classifications.
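One way to apply this filter is to drop prediction errors from both the numerator and the denominator before computing a jank rate. The sketch below redeclares an enum that mirrors the JankType from Step 2 so it stands alone. The function name is hypothetical, and excluding the frames from the denominator is a design choice rather than a requirement.

```kotlin
// Mirrors the JankType enum from Step 2 so this sketch is self-contained.
enum class JankType { APP_DEADLINE_MISSED, COMPOSITOR_DELAY, PREDICTION_ERROR, CLEAN }

/**
 * Fraction of frames classified as AppDeadlineMissed or SurfaceFlinger jank,
 * ignoring PredictionError frames entirely (hypothetical metric for CI gating).
 */
fun gatedJankRate(frames: List<JankType>): Double {
    val counted = frames.filter { it != JankType.PREDICTION_ERROR }
    if (counted.isEmpty()) return 0.0
    val janky = counted.count {
        it == JankType.APP_DEADLINE_MISSED || it == JankType.COMPOSITOR_DELAY
    }
    return janky.toDouble() / counted.size
}
```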
4. Over-Configuring Trace Buffers
Explanation: Allocating excessive buffer sizes (size_kb > 262144) can cause the tracing daemon to drop events or increase overhead, artificially inflating frame durations.
Fix: Use 64MB to 128MB for scroll tests. For static screen measurements, 32MB is sufficient. Always validate trace completeness by checking for missing Choreographer callbacks.
5. Ignoring Compose Stability
Explanation: Unstable parameters in @Composable functions trigger unnecessary recompositions. The FrameTimeline will show compose:recomposition slices consuming 10-15ms, but the root cause is parameter instability, not algorithmic complexity.
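Fix: Apply @Stable and @Immutable annotations to data classes whose stability the compiler cannot infer. Use the Compose compiler metrics reports to audit which classes and composables are unstable, and Layout Inspector to track recomposition counts. Replace derivedStateOf with explicit state hoisting where possible.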
6. Hardcoding CI Thresholds Without Baselines
Explanation: Setting absolute thresholds (e.g., P99 < 5ms) without establishing a device-specific baseline leads to false positives on mid-range hardware.
Fix: Run baseline measurements on your target device matrix. Set thresholds relative to baseline P99 + 2ms tolerance. Use device classes (low/mid/high) in your CI matrix.
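The baseline-relative rule can be sketched as a lookup keyed by device class. The enum, function names, and the example baseline values in the usage note are assumptions for illustration; only the 2ms tolerance comes from the fix above.

```kotlin
// Hypothetical device-class thresholds derived from measured baselines.
enum class DeviceClass { LOW, MID, HIGH }

/** Threshold per device class = baseline P99 + tolerance (2 ms by default). */
fun relativeThresholds(
    baselineP99Ms: Map<DeviceClass, Double>,
    toleranceMs: Double = 2.0
): Map<DeviceClass, Double> =
    baselineP99Ms.mapValues { (_, p99) -> p99 + toleranceMs }

/** True when a measured P99 is above the threshold for its device class. */
fun exceedsThreshold(
    deviceClass: DeviceClass,
    measuredP99Ms: Double,
    thresholds: Map<DeviceClass, Double>
): Boolean {
    val limit = thresholds[deviceClass]
        ?: error("no baseline recorded for $deviceClass")
    return measuredP99Ms > limit
}
```

If a mid-range baseline P99 were 6.0ms, the gate for that class would sit at 8.0ms, while a low-end baseline of 11.5ms would be gated at 13.5ms instead of failing against a single absolute number.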
7. Tracing Without VSYNC Alignment
Explanation: Perfetto traces captured without explicit VSYNC synchronization can show misaligned timelines, making token correlation appear broken.
Fix: Have your test harness call device.waitForIdle() before initiating gestures. Temporarily run adb shell setprop debug.hwui.profile visual_bars to check frame pacing with the on-screen profiling bars before capturing Perfetto data.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Pre-launch performance validation | Macrobenchmark + P99 gates | Deterministic, repeatable, CI-native | Low (automated) |
| Deep dive into first-launch stutter | Perfetto + HWUI RenderThread tracing | Exposes shader compilation & texture upload stalls | Medium (manual analysis) |
| Real-time production monitoring | Firebase Performance + custom frame logging | Captures field data, but lacks token correlation | High (infrastructure) |
| Compose recomposition optimization | Compiler metrics plugin + stability audit | Directly targets the most common app-side jank source | Low (developer time) |
Configuration Template

    # perfetto-config-scroll.yaml
    buffers:
      - size_kb: 131072
        # RING_BUFFER matches the circular buffer strategy described in Step 1.
        fill_policy: RING_BUFFER
    duration_ms: 12000
    data_sources:
      - config:
          name: "android.surfaceflinger.frametimeline"
          target_buffer: 0
      - config:
          name: "android.choreographer"
          target_buffer: 0
      - config:
          name: "android.hwui"
          target_buffer: 0
          hwui_config:
            record_frames: true
            record_layers: true
            record_display_lists: true

    # Macrobenchmark CI Gate Configuration (JSON output parser)
    ci_thresholds:
      frame_overrun_ms_p99: 8.0
      frame_duration_cpu_ms_p90: 14.0
      min_iterations: 5
      device_class: mid_range

Quick Start Guide
- Prepare the device: Ensure your test device or emulator runs Android 12 (API 31) or higher. Enable developer options and USB debugging.
- Capture the trace: Run adb shell perfetto -o /data/misc/perfetto-traces/ui_trace.pb -c perfetto-config-scroll.yaml while interacting with your scrollable UI.
- Open in Perfetto UI: Navigate to ui.perfetto.dev, load the .pb file, and locate the Expected Timeline vs Actual Timeline lanes. Identify red slices indicating deadline misses.
- Extract the token: Click a red slice to retrieve the frame token. Cross-reference it with the Choreographer#doFrame track in your app process to identify the exact recomposition or layout pass that exceeded the budget.
- Automate: Add the FrameTimingMetric test to your project, configure CI to parse P99 overrun values, and enforce the threshold gate in your pull request workflow.