At Lightspeed, we maintain multiple large iOS projects as well as their modularized dependencies. The last year of acquisitions brought together many different approaches to CI/CD at our company. I recently led the initiative to bring these projects and practices into alignment.

In this post, I will explain the goals we had for our continuous integration pipeline and the implementations we used to achieve them.

What we needed from CI

Speed

The team made it clear from the beginning that total run time is a huge deal for them. Fast feedback enables them to remain focused on the same task.

Dependability

Existing solutions were mainly in-house machines running Jenkins. Builds were fast, but maintaining the machines was a drain on the developers. They regularly needed restarts, which hurt productivity for a globally distributed team during a pandemic. It was also hard to keep environments clean and setups uniform.

Flexibility

While we were investigating other solutions, flexibility of use became an issue. Each of our projects has many dependencies, such as cross-platform Appium tests or tools that require Homebrew.

Standardization

This falls in line with flexibility, but having one standard of CI across every project is beneficial because the knowledge can be shared between teams and developers. CircleCI had already been picked up by other teams and projects independently, which helped us feel more confident in selecting it as the solution we needed.

Most of these requirements were met by using CircleCI. Circle’s uptime is >99%, and since other teams were already using it, standardizing was quicker. The only aspect of our pipeline that required work was the speed of execution. When we first spiked out the most basic iteration of a CircleCI config file, we had run times of over an hour (1.5 hours). This was a drastic increase over our self-hosted machines, which were running at less than half that time (30 minutes).

Top three changes we made to improve run time

Parallelism

Parallelism is a huge part of improving any CI workflow. Splitting tests up into concurrent flows means the run time is reduced to the length of the longest-running job. In addition, splitting using CircleCI jobs gives us the ability to re-run a single job, drastically reducing the re-run time.
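As a rough illustration of that arithmetic, the sketch below compares running a set of test jobs one after another with running them concurrently. The job names and durations are made-up numbers rather than measurements from our pipeline, and the comparison ignores the shared build job and any queueing time.

```ruby
# Hypothetical job durations in minutes (illustrative values only).
JOB_MINUTES = {
  "unit_tests" => 12,
  "snapshot_tests" => 8,
  "ui_tests_a" => 25,
  "ui_tests_b" => 22
}.freeze

# Run one after another, the jobs take the sum of their durations.
def serial_minutes(jobs)
  jobs.values.sum
end

# Run concurrently, the test phase finishes when the slowest job finishes.
def parallel_minutes(jobs)
  jobs.values.max
end

puts "Serial:   #{serial_minutes(JOB_MINUTES)} min"
puts "Parallel: #{parallel_minutes(JOB_MINUTES)} min"
```

With these example numbers the serial run takes 67 minutes and the parallel run takes 25, the length of the slowest job.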

Testing iOS apps is more complex than testing a web app because we need to pass the built binary around so that tests have something to run against. To achieve this we created two jobs in the config.yml.

Build

  • Check out the code
  • Restore cache
  • Bundle install
  • Install CocoaPods
  • Install from Brewfile
  • Install SPM-based tools
  • Resolve Swift package dependencies
  • Build derived data
  • Save new cache
  • Persist to workspace

Run

  • Attach to workspace
  • Bundle install
  • Run Fastlane lane
  • Prepare test results for storage
  • Store test results
  • Store artifacts

The key to iOS parallelism in CircleCI is persist_to_workspace, which will save the paths you define so that they can be accessed by another job. As every job is a clean instance, this is how we maintain state between them.

Persisting the entire project, including derived data from the build, was nearly 8GB for our main project. This in itself required 10 minutes to upload and download again. We saw much better performance by only saving the required files and folders:

- persist_to_workspace:
    root: /Users/distiller # Default path for a Circle job
    paths:
      - project/DerivedData/Build/Products # This is the built app that tests can then be run against
      - project/fastlane # So we can run Fastlane on the parallel jobs
      - project/Gemfile # So we can run Fastlane on the parallel jobs
      - project/Gemfile.lock # So we can run Fastlane on the parallel jobs
      - project/SnapshotTests # Contains the source snapshot images
      - 'project/UnitTests/ViewControllerTests/__Snapshots__' # Contains the source snapshot images
      - project/commit_author # Used for the Slack message integration in Fastlane
      - .gem # So we can run Fastlane on the parallel jobs

We maintain a shared library of Fastlane lanes, utilized across each project. Having a mediator between the CI service and our underlying business logic makes future changes easier. Both xcodebuild and Fastlane can build for testing once and then run the tests later without rebuilding.
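To make the build-once, test-many split concrete, here is a minimal sketch of the two xcodebuild invocations involved, written as plain Ruby helpers that only assemble the command strings. The `build-for-testing` and `test-without-building` actions and the `-derivedDataPath` and `-testPlan` flags are standard xcodebuild options; the helper names, the `App` scheme, and the `UnitTests` test plan are hypothetical, and a real run would also need a `-destination`. This is not our shared lane library, just an illustration of the idea.

```ruby
require "shellwords"

# Command for the build job: compile the app and test bundles once.
def build_for_testing_command(workspace:, scheme:, derived_data_path:)
  [
    "xcodebuild", "build-for-testing",
    "-workspace", workspace,
    "-scheme", scheme,
    "-derivedDataPath", derived_data_path
  ].shelljoin
end

# Command for a run job: execute one test plan against the existing build products.
def test_without_building_command(workspace:, scheme:, derived_data_path:, test_plan:)
  [
    "xcodebuild", "test-without-building",
    "-workspace", workspace,
    "-scheme", scheme,
    "-derivedDataPath", derived_data_path,
    "-testPlan", test_plan
  ].shelljoin
end

# "App" and "UnitTests" are placeholder names for illustration.
puts build_for_testing_command(
  workspace: "Project.xcworkspace", scheme: "App", derived_data_path: "DerivedData"
)
puts test_without_building_command(
  workspace: "Project.xcworkspace", scheme: "App",
  derived_data_path: "DerivedData", test_plan: "UnitTests"
)
```

Because both commands point at the same derived data path, a run job that has the persisted `Build/Products` folder has everything the second command needs.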

The run job can simply retrieve all this stored state with:

- attach_workspace: # Gives this job access to the results stored by the build job, so that many parallel jobs can be used from the result
    at: /Users/distiller

We split tests up into Test Plans. We had Snapshot, Unit, and UI tests. The UI tests are by far the slowest, and we had multiple jobs to split them out.

The goal is for each test job to have roughly the same execution time. Collecting tests under a similar use case and giving each plan a descriptive name helps the developer quickly understand what has failed.
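One simple way to get roughly even job times is a greedy "longest first" split, sketched below. The suite names and timings are invented for illustration, and this only balances duration; in practice you would still adjust the result so that each plan groups a sensible use case.

```ruby
# Hypothetical per-suite durations in seconds (illustrative values only).
SUITE_SECONDS = {
  "CheckoutUITests" => 540,
  "InventoryUITests" => 420,
  "LoginUITests" => 300,
  "ReportsUITests" => 280,
  "SettingsUITests" => 120
}.freeze

# Greedy "longest first" split: take suites from slowest to fastest and
# always add the next one to the bucket with the smallest total so far.
def balance(suites, bucket_count)
  buckets = Array.new(bucket_count) { { suites: [], total: 0 } }
  suites.sort_by { |_name, seconds| -seconds }.each do |name, seconds|
    lightest = buckets.min_by { |bucket| bucket[:total] }
    lightest[:suites] << name
    lightest[:total] += seconds
  end
  buckets
end

balance(SUITE_SECONDS, 2).each_with_index do |bucket, index|
  puts "Job #{index + 1}: #{bucket[:total]}s #{bucket[:suites].join(', ')}"
end
```

With the sample data, the two jobs come out at 820 and 840 seconds, so neither one dominates the run time.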

Regression testing

The goal of testing a pull request (PR) before it can be merged is to make sure that existing code is not affected in some detrimental way. This does not require every single test to run; generally, a subset can be used to verify the changes as long as the full set is run regularly.

By using more Test Plans, we separated out regression and PR tests. The regression tests had an optional hold step on PRs so they could be run if needed. Regressions were then scheduled throughout the work week based on each team’s time zone.

scheduled_regression:
  triggers:
    - schedule:
        cron: "0 05,18,23 * * 1-5" # 18:00 NZDT, 18:00 GMT, 18:00 EST, Monday - Friday (Circle uses UTC)
        filters:
          branches:
            only:
              - develop

Tweaking the contents of regression and PR tests can give you a faster PR run time while still maintaining good coverage. Developers also have the control to easily run all tests if they feel their feature needs it.

Figure: A completed regression test workflow

Caching

In this project, there is 8GB of data to download for each build through Swift Package Manager (SPM). It added significant time to the build, so we utilized caching to speed it up.

Within the build step, we restore an existing cache, resolve the packages, and save the cache for the next job. Resolving is a required step as it tells Xcode to refresh SPM based on the new local packages.

Restore:

- restore_cache: # To speed up builds we cache the SPM packages and use the resolved file as a hash
    key: spm-cache-v4-{{ checksum "Project.xcworkspace/xcshareddata/swiftpm/Package.resolved" }}

Resolve:

lane :resolve_cached_spm_dependencies do |options|
  if options[:xcode_scheme].nil? || options[:derived_data_path].nil?
    UI.user_error!("Resolving dependencies was invoked, but `xcode_scheme` or `derived_data_path` was not provided.")
  end

  # Prefer the workspace when one is given, otherwise fall back to the project
  if !options[:workspace_name].nil?
    container = "-workspace ../#{options[:workspace_name]}.xcworkspace"
  elsif !options[:project_name].nil?
    container = "-project ../#{options[:project_name]}.xcodeproj"
  else
    UI.user_error!("Resolving dependencies was invoked, but neither `workspace_name` nor `project_name` was provided.")
  end

  # Resolve dependencies and point to the cache directory
  sh("xcodebuild -resolvePackageDependencies #{container} -scheme #{options[:xcode_scheme]} -clonedSourcePackagesDirPath #{options[:derived_data_path]}/SourcePackages")
end

Save:

- save_cache:
    paths:
      - ./DerivedData/SourcePackages
    key: spm-cache-v4-{{ checksum "Project.xcworkspace/xcshareddata/swiftpm/Package.resolved" }}

We noticed significant improvements in build times using the cache. It only needs updating when the contents of the resolved file change, such as when a package is updated or added.
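The idea behind a checksum-based key can be sketched in a few lines of Ruby. The `cache_key` helper is hypothetical, and SHA-256 in hex is an assumption made for illustration; the exact encoding CircleCI uses for `{{ checksum }}` may differ. The file contents are also placeholders rather than the real `Package.resolved` format.

```ruby
require "digest"
require "tmpdir"

# Derives a cache key from the contents of a lockfile.
# SHA-256 hex is an assumption for illustration; CircleCI's own
# {{ checksum }} encoding may differ.
def cache_key(prefix, lockfile_path)
  "#{prefix}-#{Digest::SHA256.file(lockfile_path).hexdigest}"
end

Dir.mktmpdir do |dir|
  lockfile = File.join(dir, "Package.resolved")

  # Placeholder contents, not the real Package.resolved schema.
  File.write(lockfile, '{"pins": ["PackageA 1.0.0"]}')
  original_key = cache_key("spm-cache-v4", lockfile)

  # Same contents produce the same key, so the cache is reused.
  puts original_key == cache_key("spm-cache-v4", lockfile)

  # Updating a package changes the file, which changes the key.
  File.write(lockfile, '{"pins": ["PackageA 1.1.0"]}')
  puts original_key == cache_key("spm-cache-v4", lockfile)
end
```

The first comparison prints `true` and the second prints `false`. Changing the prefix also produces a new key, which is a handy way to force a fresh cache.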

Conclusion

With these changes to our CircleCI configuration, we saw a significant reduction in our test run time. Overall, the run time of PR tests was reduced from 1.5 hours to 30 minutes.

Our pipeline could still be faster. But when we weigh that against the benefits:

  • Reduced flakiness
  • Zero hardware maintenance and downtime
  • No security requirements such as VPNs
  • No on-site requirement to restart hardware

we find it to be an acceptable solution going forward, and we will continue to work towards faster and smarter CI.


Jonathan is a Staff iOS Developer at Lightspeed and has been a mobile developer for nine years. He is interested in clean, intuitive processes and making lives easier through code. He always has a side project on the go and lives with his family beside the sea in Northern Ireland, the tech capital of the world.