
Users expect apps that perform well: they should load fast and interact smoothly. These qualities are not just a nice-to-have anymore. Adding performance validation to your release flow addresses one of these challenges: ensuring smooth interactions.
Google recently released updates to its Jetpack benchmarking libraries. One notable addition is the Macrobenchmark library, which lets you test your app’s performance in areas like startup and scroll jank. In this tutorial you will get started with the Jetpack benchmarking libraries and learn how to integrate them into a CI/CD flow.
Prerequisites
You will need a few things to get the most from this tutorial:
- Experience with Android development and testing, including instrumentation testing
- Experience with Gradle
- A free CircleCI account
- A Firebase and Google Cloud platform account
- Android Studio Narwhal Feature Drop or higher
About the project
The project is based on an earlier testing sample I created for a blog post on testing Android apps in a CI/CD pipeline.
I have expanded the sample project to include a benchmark job as part of a CI/CD build. This new job runs app startup benchmarks using the Jetpack Macrobenchmark library.
Jetpack benchmarking
Android Jetpack offers two types of benchmarking: Microbenchmark and Macrobenchmark. Microbenchmark, which has been around since 2019, lets you measure the performance of specific pieces of application code (think caching or similar operations that take a while to run).
Macrobenchmark is the newer addition to Jetpack. It lets you measure your app’s performance as a whole in easy-to-notice areas like app startup and scrolling. The sample application uses macrobenchmark tests to measure app startup.
Both benchmarking libraries work with the familiar Android Instrumentation framework and run on connected devices and emulators.
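For contrast, here is a minimal Microbenchmark sketch. It is not part of the sample project: the LruCache workload is purely illustrative, and it assumes the module has the `androidx.benchmark:benchmark-junit4` dependency. Only the body of the `measureRepeated` block is timed.

```kotlin
import android.util.LruCache
import androidx.benchmark.junit4.BenchmarkRule
import androidx.benchmark.junit4.measureRepeated
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class CacheLookupBenchmark {

    @get:Rule
    val benchmarkRule = BenchmarkRule()

    @Test
    fun cacheLookup() {
        // Illustrative workload: an in-memory cache populated once, outside the timed block.
        val cache = LruCache<String, Int>(100)
        repeat(100) { cache.put("key$it", it) }

        // Only the lambda body is measured; the library handles warmup and looping.
        benchmarkRule.measureRepeated {
            cache.get("key50")
        }
    }
}
```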
Set up the library
The library setup is documented well on the official Jetpack site - Macrobenchmark Setup.
Note: This tutorial will not cover all steps in fine detail because they might change in future preview releases.
Here is an overview of the procedure:
- Create the Macrobenchmark module
- Configure the target app
- Set up signing
1. Create the Macrobenchmark module:
   - Create a new test module named `macrobenchmark`. In Android Studio, create an Android library module and change its `build.gradle.kts` to instead use `libs.plugins.test` as the plugin. The new module requires a minimum SDK of API 29 (Android 10).
   - Make several modifications to the new module’s `build.gradle.kts` file: change the test dependencies to `implementation`, point the module to the app you want to test, and make the release build type `debuggable`. A minimal sketch of the resulting file follows this list.
2. Configure the target app:
   - Add the `<profileable>` tag to your app’s `AndroidManifest.xml` to enable detailed trace information capture.
   - Create a `benchmark` build type that mimics your release configuration but uses debug signing for local testing.
   - Ensure your app includes ProfileInstaller 1.3 or higher for profile capture functionality.
3. Set up signing:
   - Specify the local release signing config. You can use the existing `debug` config for the benchmark build type.
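To make the module setup more concrete, here is a minimal sketch of what the macrobenchmark module’s `build.gradle.kts` could look like. It is not the sample project’s actual file: the plugin aliases, namespace, SDK levels, and dependency versions are assumptions that you should adapt to your own version catalog and modules.

```kotlin
// Hypothetical macrobenchmark/build.gradle.kts sketch; adjust names and versions to your project.
plugins {
    alias(libs.plugins.test)            // com.android.test, as referenced above
    alias(libs.plugins.kotlin.android)  // assumed catalog alias for the Kotlin Android plugin
}

android {
    namespace = "com.example.macrobenchmark"
    compileSdk = 35

    defaultConfig {
        minSdk = 29 // Macrobenchmark requires API 29 (Android 10) or higher
        testInstrumentationRunner = "androidx.test.runner.AndroidJUnitRunner"
    }

    buildTypes {
        // Benchmark against a release-like build: debuggable so the test harness can run it,
        // and signed with the debug key to keep local and CI runs simple.
        getByName("release") {
            isDebuggable = true
            signingConfig = signingConfigs.getByName("debug")
        }
    }

    // A com.android.test module declares which module it exercises.
    targetProjectPath = ":app"
}

dependencies {
    // In a test module, use implementation rather than androidTestImplementation.
    implementation("androidx.benchmark:benchmark-macro-junit4:1.3.3") // check for the latest version
    implementation("androidx.test.ext:junit:1.2.1")
    implementation("androidx.test.uiautomator:uiautomator:2.3.0")
}
```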
Please refer to the guide in the official Macrobenchmark library documentation for complete step-by-step instructions.
Writing and executing Macrobenchmark tests
The Macrobenchmark library introduces a few new JUnit rules and metrics.
Here’s a simple startup benchmark test taken from the performance-samples project on GitHub:
package com.example.macrobenchmark.benchmark.startup

import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.filters.LargeTest
import com.example.macrobenchmark.benchmark.util.DEFAULT_ITERATIONS
import com.example.macrobenchmark.benchmark.util.TARGET_PACKAGE
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@LargeTest
@RunWith(AndroidJUnit4::class)
class SampleStartupBenchmark {

    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun startup() = benchmarkRule.measureRepeated(
        packageName = TARGET_PACKAGE,
        metrics = listOf(StartupTimingMetric()),
        iterations = DEFAULT_ITERATIONS,
        setupBlock = {
            // Press home button before each run to ensure the starting activity isn't visible.
            pressHome()
        }
    ) {
        // starts default launch activity
        startActivityAndWait()
    }
}
This startup benchmark measures how long it takes for your app to launch and become interactive using the `StartupTimingMetric()`. The test runs multiple iterations (defined by `DEFAULT_ITERATIONS`) to get consistent measurements, and provides concrete performance data you can use to track regressions over time.
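Startup time is only one of the things Macrobenchmark can measure. As a hypothetical example (not part of the sample project), a similar test can track scroll jank with `FrameTimingMetric` and UI Automator; the `recycler` resource id below is an assumption about the target app’s layout, and the package and constants mirror the startup test above.

```kotlin
package com.example.macrobenchmark.benchmark.scroll

import androidx.benchmark.macro.FrameTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.filters.LargeTest
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Direction
import com.example.macrobenchmark.benchmark.util.DEFAULT_ITERATIONS
import com.example.macrobenchmark.benchmark.util.TARGET_PACKAGE
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@LargeTest
@RunWith(AndroidJUnit4::class)
class SampleScrollBenchmark {

    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun scrollList() = benchmarkRule.measureRepeated(
        packageName = TARGET_PACKAGE,
        metrics = listOf(FrameTimingMetric()),
        iterations = DEFAULT_ITERATIONS,
        setupBlock = {
            // Launch the app fresh before each measured scroll.
            pressHome()
            startActivityAndWait()
        }
    ) {
        // "recycler" is a hypothetical resource id for a scrollable list in the target app.
        val list = device.findObject(By.res(packageName, "recycler"))
        // Keep the gesture away from screen edges so it doesn't trigger system navigation.
        list.setGestureMargin(device.displayWidth / 5)
        list.fling(Direction.DOWN)
        device.waitForIdle()
    }
}
```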
For generating Baseline Profiles alongside your benchmarks, you can also include a profile generator:
package com.example.macrobenchmark.baselineprofile

import androidx.benchmark.macro.junit4.BaselineProfileRule
import androidx.test.internal.runner.junit4.AndroidJUnit4ClassRunner
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4ClassRunner::class)
class StartupProfileGenerator {

    @get:Rule
    val rule = BaselineProfileRule()

    @Test
    fun profileGenerator() {
        rule.collect(
            packageName = TARGET_PACKAGE,
            maxIterations = 15,
            stableIterations = 3,
            includeInStartupProfile = true
        ) {
            startActivityAndWait()
        }
    }
}
This baseline profile generator creates Baseline Profiles - a powerful Android optimization technique that can improve app performance by about 30% from the first launch. It guides the Android Runtime (ART) to perform Ahead-of-Time (AOT) compilation on critical code paths, and the `includeInStartupProfile = true` parameter also generates Startup Profiles for DEX file layout optimization.
When you ship an app with Baseline Profiles, Google Play processes them and delivers optimized code to users immediately after installation.
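To ship the generated profile with your app, the app module consumes it at build time. Below is a hedged sketch of the relevant `app/build.gradle.kts` additions, assuming you use the `androidx.baselineprofile` Gradle plugin and that the `:macrobenchmark` module hosts the generator; the module name and versions are assumptions.

```kotlin
// Hypothetical additions to app/build.gradle.kts for consuming Baseline Profiles.
plugins {
    id("com.android.application")
    id("org.jetbrains.kotlin.android")
    // Copies the generated profile into the app and wires profile generation into the build.
    id("androidx.baselineprofile")
}

dependencies {
    // ProfileInstaller enables profile capture and installs shipped profiles on device.
    implementation("androidx.profileinstaller:profileinstaller:1.4.1")
    // Points the plugin at the module containing the profile generator test.
    baselineProfile(project(":macrobenchmark"))
}
```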
Evaluating benchmark results
Once your benchmarks complete, the results are exported to a JSON file containing detailed timing metrics for each test. However, the benchmark run won’t tell you whether those timings are acceptable - that analysis is up to you.
I wrote a Node.js script that evaluates benchmark results and fails the build if any measurements exceed expected thresholds:
const benchmarkData = require('/home/circleci/benchmarks/com.example.macrobenchmark-benchmarkData.json')

const STARTUP_MEDIAN_THRESHOLD_MILIS = 400

// Handle different possible metric names
function getStartupMetrics(benchmark) {
    // Check for different possible startup metric names
    if (benchmark.metrics.startupMs) {
        return benchmark.metrics.startupMs;
    } else if (benchmark.metrics.timeToInitialDisplayMs) {
        return benchmark.metrics.timeToInitialDisplayMs;
    } else if (benchmark.metrics.timeToFullDisplayMs) {
        return benchmark.metrics.timeToFullDisplayMs;
    }
    return null;
}

let err = 0;

// Process all startup-related benchmarks
benchmarkData.benchmarks.forEach((benchmark, index) => {
    const metrics = getStartupMetrics(benchmark);

    if (!metrics) {
        console.log(`⚠️ Benchmark ${index + 1} (${benchmark.name || 'unnamed'}) - No startup metrics found, skipping`);
        return;
    }

    const benchmarkName = benchmark.name || `benchmark_${index + 1}`;
    const className = benchmark.className || 'unknown';
    const medianTime = metrics.median;

    let msg = `Startup benchmark (${benchmarkName}) - ${medianTime.toFixed(2)}ms`;

    if (medianTime > STARTUP_MEDIAN_THRESHOLD_MILIS) {
        err = 1;
        console.error(`${msg} ❌ - OVER THRESHOLD ${STARTUP_MEDIAN_THRESHOLD_MILIS}ms`);
        console.error(` Class: ${className}`);
        console.error(` Runs: [${metrics.runs.map(r => r.toFixed(2)).join(', ')}]`);
        console.error(` Min: ${metrics.minimum.toFixed(2)}ms, Max: ${metrics.maximum.toFixed(2)}ms`);
    } else {
        console.log(`${msg} ✅`);
        console.log(` Class: ${className}`);
        console.log(` Runs: [${metrics.runs.map(r => r.toFixed(2)).join(', ')}]`);
    }
});

// If no benchmarks were processed, that's an error
if (benchmarkData.benchmarks.length === 0) {
    console.error('❌ No benchmarks found in the data file');
    err = 1;
}

process.exit(err)
This script handles different metric types that benchmarks may export, provides detailed output for debugging, and uses proper exit codes to integrate with CI systems. It will fail the build if any startup time exceeds the 400ms threshold.
To establish your threshold, run benchmarks on a known-good version of your app and use those results as your baseline.
Running benchmarks in a CI/CD pipeline
You can run these samples in Android Studio, which gives you a nice printout of your app’s performance metrics, but that alone won’t ensure your app stays performant. To do that, you need to integrate benchmarks into your CI/CD process.
The steps required are:
- Build the release variants of app and Macrobenchmark modules
- Run the tests on Firebase Test Lab (FTL) or similar tool
- Download the benchmark results
- Store benchmarks as artifacts
- Process the benchmark results to get timings data
- Pass or fail the build based on the results
I created a new `benchmarks-ftl` job to run my tests:
version: 2.1

orbs:
  android: circleci/android@3.1.0
  gcp-cli: circleci/gcp-cli@3.3.2

jobs:
  benchmarks-ftl:
    executor:
      name: android/android_docker
      tag: 2025.03.1-node
    steps:
      - checkout
      - android/restore_gradle_cache
      - run:
          name: Build app and test app
          command: ./gradlew app:assembleBenchmark macrobenchmark:assembleBenchmark
      - gcp-cli/setup:
          version: 404.0.0
      - run:
          name: run on FTL
          command: |
            gcloud firebase test android run \
              --type instrumentation \
              --app app/build/outputs/apk/benchmarkRelease/app-benchmarkRelease.apk \
              --test macrobenchmark/build/outputs/apk/benchmarkRelease/macrobenchmark-benchmarkRelease.apk \
              --device model=tokay,version=34,locale=en,orientation=portrait \
              --timeout 45m \
              --directories-to-pull /sdcard/Download \
              --results-bucket gs://android-sample-benchmarks \
              --results-dir macrobenchmark \
              --environment-variables clearPackageData=true,additionalTestOutputDir=/sdcard/Download,no-isolated-storage=true
      - run:
          name: Download benchmark data
          command: |
            mkdir ~/benchmarks
            gsutil cp -r 'gs://android-sample-benchmarks/macrobenchmark/**/artifacts/sdcard/Download/*' ~/benchmarks
            gsutil rm -r gs://android-sample-benchmarks/macrobenchmark
      - store_artifacts:
          path: ~/benchmarks
      - run:
          name: Evaluate benchmark results
          command: node scripts/eval_startup_benchmark_output.js

workflows:
  test-and-build:
    jobs:
      - benchmarks-ftl:
          filters:
            branches:
              only: main # Run benchmarks only on main branch
This job runs the macrobenchmark tests on a real device in Firebase Test Lab. We use the Android Docker executor with Node.js for our evaluation script.
Key aspects of this configuration:
- Benchmark build type: Builds the `benchmark` variant for proper profiling capabilities.
- Real device testing: Uses Firebase Test Lab with a Pixel 8a (Android 14) for reliable performance measurements. Running benchmarks on emulators is often flaky due to virtualization overhead and inconsistent performance, so real devices are strongly recommended for accurate results.
- Device selection: You can explore the list of available Firebase devices if you want to test against a different device.
- Extended timeout: Includes a 45-minute timeout to accommodate longer benchmark runs. Note that 45 minutes is the maximum timeout for physical devices on Firebase Test Lab.
- Branch filtering: Runs benchmarks only on the main branch to control resource usage
After building the benchmark APKs, we initialize the Google Cloud CLI and run tests in Firebase Test Lab. Once tests complete, we download the benchmark data from Cloud Storage, store it as build artifacts, and run our evaluation script to pass or fail the build based on performance thresholds.
CircleCI workflow for building, testing, macrobenchmarking, and releasing
We will create a workflow that combines all of these testing strategies. Unit tests run on every commit to catch basic logic errors quickly, while UI tests run on multiple Android versions but only on feature branches to save resources. Instrumentation tests run on the main branch across multiple API levels for thorough compatibility testing, and benchmarks run only on main branch commits, after unit tests pass, to validate performance. Finally, release builds happen only after all other tests pass, guaranteeing quality releases. The workflow uses job dependencies and branch filtering to create an efficient pipeline that balances thorough testing with resource management.
Here’s how the complete workflow looks with unit tests, instrumentation tests, benchmarks, and release builds all working together:
version: 2.1

orbs:
  android: circleci/android@3.1.0
  gcp-cli: circleci/gcp-cli@3.3.2

jobs:
  unit-test:
    executor:
      name: android/android_docker
      tag: 2025.03.1
    steps:
      - checkout
      - android/restore_gradle_cache
      - android/run_tests:
          test_command: ./gradlew testDebug
      - android/save_gradle_cache
      - run:
          name: Save test results
          command: |
            mkdir -p ~/test-results/junit/
            find . -type f -regex ".*/build/test-results/.*xml" -exec cp {} ~/test-results/junit/ \;
          when: always
      - store_test_results:
          path: ~/test-results
      - store_artifacts:
          path: ~/test-results/junit

  android-test:
    parameters:
      system-image:
        type: string
        default: system-images;android-35;google_apis;x86_64
    executor:
      name: android/android_machine
      resource_class: large
      tag: 2024.11.1
    steps:
      - checkout
      - android/start_emulator_and_run_tests:
          test_command: ./gradlew :app:connectedDebugAndroidTest
          system_image: << parameters.system-image >>
      - run:
          name: Save test results
          command: |
            mkdir -p ~/test-results/junit/
            find . -type f -regex ".*/build/outputs/androidTest-results/.*xml" -exec cp {} ~/test-results/junit/ \;
          when: always
      - store_test_results:
          path: ~/test-results
      - store_artifacts:
          path: ~/test-results/junit

  benchmarks-ftl:
    # ... (same as shown above)

  release-build:
    executor:
      name: android/android_docker
      tag: 2025.03.1
    steps:
      - checkout
      - android/restore_gradle_cache
      - run:
          name: Assemble release build
          command: |
            ./gradlew assembleRelease
      - store_artifacts:
          path: app/build/outputs/apk/release/app-release-unsigned.apk

workflows:
  test-and-build:
    jobs:
      - unit-test
      - android/run_ui_tests:
          executor:
            name: android/android_machine
            resource_class: large
            tag: 2024.11.1
          filters:
            branches:
              ignore: main # regular commits
      - android-test:
          matrix:
            alias: android-test-all
            parameters:
              system-image:
                - system-images;android-35;google_apis;x86_64
                - system-images;android-33;google_apis;x86_64
                - system-images;android-32;google_apis;x86_64
                - system-images;android-30;google_apis;x86
                - system-images;android-29;google_apis;x86
          name: android-test-<<matrix.system-image>>
          filters:
            branches:
              only: main # Commits to main branch
      - benchmarks-ftl:
          requires:
            - unit-test
          filters:
            branches:
              only: main # Run benchmarks only on main branch
      - release-build:
          requires:
            - unit-test
            - android-test-all
            - benchmarks-ftl
          filters:
            branches:
              only: main # Commits to main branch
Running the workflow on CircleCI
In this section, you will learn how to automate the workflow using CircleCI.
Setting up the project on CircleCI
On the CircleCI dashboard, go to the Projects tab, search for the GitHub repo name and click Set Up Project for your project.
You will be prompted to add a new configuration file manually or use an existing one. Because you have already pushed the required configuration file to the codebase, select the Fastest option. Enter the name of the branch hosting your configuration file and click Set Up Project to continue.
Completing the setup will trigger the pipeline. The first build might fail because the environment variables are not yet set.
Create a Google Cloud service account
Before configuring environment variables, you’ll need to create a Google Cloud service account with the appropriate permissions for Firebase Test Lab. The service account needs the following IAM roles:
- Firebase Test Lab Admin - To run tests on Firebase Test Lab
- Storage Object Admin - To store and retrieve test results from Cloud Storage
- Cloud Storage Admin - To manage Cloud Storage buckets for benchmark data
You can create a service account and download its JSON key file by following the official Google Cloud documentation. Make sure to assign the required permissions during the service account creation process.
Once you have the JSON key file downloaded, you’ll need to base64 encode it before adding it to CircleCI as an environment variable.
Set environment variables
Our benchmarking workflow requires Google Cloud Platform access for Firebase Test Lab. On the project page, click Project settings and head over to the Environment variables tab. On the screen that appears, click the Add environment variable button and add the following environment variables:
- `GCLOUD_SERVICE_KEY`: Your Google Cloud service account key in JSON format (base64 encoded).
- `GOOGLE_PROJECT_ID`: Your Google Cloud project ID where Firebase Test Lab is enabled.
Note: To get your service account key, create a service account in the Google Cloud Console with Firebase Test Lab Admin permissions, download the JSON key file, and base64 encode it before adding to CircleCI.
Once you add the environment variables, they should appear on the dashboard with their keys visible but values hidden for security.
Now that the environment variables are configured, trigger the pipeline again. This time the build should succeed and your benchmarks will run on Firebase Test Lab.
You can also go to the Firebase console to view more details about the test execution.
Conclusion
In this article, I covered how to include Android app performance benchmarking in a CI/CD pipeline alongside your other tests. This prevents performance regressions from reaching your users as you add new features and other improvements.
We used the new Android Jetpack Macrobenchmark library and showed how to integrate it with Firebase Test Lab to run benchmarks on real devices. We also showed how to analyze the results, failing the build if the application startup time exceeds our allowed threshold.