Leading open source ML advancements
PyTorch relies on CircleCI to manage 900+ open source contributors
The Challenge
The PyTorch team has nearly 100 core members (both internal to Facebook and external), in addition to its 900+ open source contributors and six general maintainers.
Their library enables hundreds of downstream academic and commercial projects. “The things we touch are all still areas of active research, such as quantization and low precision neural networks,” said Joe Spisak, a Facebook product manager who works on PyTorch. “We take these for granted because we run literally trillions of predictions at low precision today for things like translation and computer vision. But this is an area where people still publish a lot of papers; they’re finding new ways to do things.” The team’s work on PyTorch contributes vital tooling to those conducting the latest research.
Prioritizing open source
Managing the needs of the open source community within a for-profit company might be a challenge for most, but for PyTorch, it’s core to who they are. “Open source has been in our DNA from the beginning,” Spisak says. “When Facebook came out in 2004, it was built on an open source stack, including Linux, Apache, MySQL and PHP, and as a result, open source has always been part of the core culture within Facebook’s engineering department.” Facebook’s open source efforts have grown alongside the company, enabling the PyTorch team and others within the AI department to provide vital tooling to researchers all over the world and build an enormous interconnected network of projects.
“We work to open source almost everything we do,” Spisak continues. “When we feel projects are stable, and we think they can benefit the community or research, it’s open sourced. We maintain them, and we create communities out of a lot of projects like PyTorch.”
Running CI/CD for OSS at scale
Over 900 PyTorch contributors rely on CircleCI to manage their software delivery workflow. The team’s stack includes CircleCI, CDN services like Netlify, and GitHub. This gives them a consistent process on any commit or pull request that comes in, no matter how many contributors the project has. “That type of system scales from small to big projects,” says Joel Marcey, developer advocate for Facebook’s OSS program. “Every person that contributes to the project knows they’re going to have a consistent experience, and can see if their PR is passing or failing.”
“CircleCI has helped us do things in a more flexible manner,” says Eric Nakagawa, Head of Open Source at Facebook. “One of the biggest challenges that our team has worked on directly was just building the tutorials on the website. That process previously took four to six hours, which when you’re trying to make small tweaks here and there can be quite frustrating.” With CircleCI, Eric’s team was able to run their builds in parallel on multiple instance types, which allowed them to reduce their build time by 75%. More importantly, by freeing themselves from the daily concerns of keeping machines humming along, they were able to focus on the beating heart of the project: their community.
“By allowing us not to focus on infrastructure, we’re able to aim [our efforts] where we think we can provide more value, and that’s one of the great takeaways I’ve had from working with CircleCI.”
Eric Nakagawa | Head of Open Source at Facebook
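The parallel-build approach Eric describes can be sketched in CircleCI configuration. This is an illustrative example only: the job names, executor image, and commands below are hypothetical, not taken from PyTorch’s actual config.

```yaml
# Illustrative sketch: splitting a slow tutorials build across
# parallel CircleCI containers. Names and commands are hypothetical.
version: 2.1

jobs:
  build-tutorials:
    docker:
      - image: cimg/python:3.10
    parallelism: 4          # fan the tutorial set out over 4 containers
    steps:
      - checkout
      - run:
          name: Build this container's shard of the tutorials
          command: |
            # `circleci tests split` assigns each container its own shard
            ls tutorials/*.py | circleci tests split > shard.txt
            xargs -a shard.txt -n 1 python build_tutorial.py

workflows:
  docs:
    jobs:
      - build-tutorials
```

The `parallelism` key is what turns one long sequential build into several shorter concurrent ones; with four containers, a roughly even split yields the kind of 75% reduction described above.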
About PyTorch
PyTorch is an open source deep learning platform created by Facebook’s AI research group. Like NumPy, PyTorch is a library for tensor operations, but it adds support for GPUs and other hardware acceleration, along with efficient tools that let AI researchers explore different domains. While it started out as a Python-based successor to the Lua Torch framework, PyTorch has expanded in scope to be not just a research platform but also a deployment platform. Organizations such as Stanford and Salesforce rely on PyTorch to fuel their machine learning research and offerings. And PyTorch relies on CircleCI to create a smooth and reliable PR process for their many open source contributors, and to ship code more quickly and with less overhead than their previous solution.
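The NumPy-like API with optional hardware acceleration can be seen in a few lines. A minimal sketch; the device-selection pattern is a common idiom, and the example falls back to CPU when no GPU is present.

```python
import torch

# Tensors behave much like NumPy arrays: elementwise ops, broadcasting, etc.
x = torch.ones(2, 3)
y = x * 2 + 1          # every element becomes 3.0

# ...but a tensor can be moved to an accelerator with a single call.
device = "cuda" if torch.cuda.is_available() else "cpu"
y = y.to(device)

total = y.sum().item()  # 3.0 per element * 6 elements = 18.0
```

The same code runs unchanged on CPU or GPU, which is part of what makes the library usable both for research experiments and for deployment.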