Performance👆, complexity👇: Killer updates from Shopify engineering
January 10, 2024
By Farhan Thawar, VP & Head of Engineering at Shopify
At Shopify, we’re obsessed with technical excellence. Always have been. We spend a lot of time on our infrastructure—even when the results aren’t immediately obvious.
Often, this infrastructure work involves simplifying our systems. Continually doing this is a requirement for innovation. Why? Not all fast software is great, but all great software is fast. Every millisecond matters for our merchants. That means our software needs to be simple to scale, not bogged down by overly complex architecture.
As our CEO and founder Tobi Lütke put it, some of the best long-term returns on investment in software come from these simplifications. Here are some of the killer updates we shipped in 2023 to remove complexity and improve performance.
Performance wins
In commerce, performance is king. Performance leads to delightful and seamless user experiences. Performance leads to conversion, which adds dollars to the bottom line of our merchants. And so performance wins were a key focus for us in 2023 (and every year). We:
- Optimized the slowest (tail) load times of the Shopify Admin customer and order pages for certain merchants: an 80K-item list went from ~20 seconds to ~400 milliseconds (ms) to load.
- Dramatically improved the speed of duties calculations for cross-border orders. Our p99 latency, the time within which 99% of requests complete, is down from ~550ms to ~80ms. TL;DR: Checkout is critical, commerce is global, and we sweat the details.
- Shopify Admin search results now show up more than 7x faster. This is just one of many ways we’re increasing efficiency and giving time back to our merchants.
- For products with multiple variants, it’s now 200x faster to serialize them (meaning, assign a unique ID to each). A product with 2,000 variants now takes one second to serialize. This win came when we realized there was a cache slowing this down.
- GraphQL Storefront API queries are now rendered 3x faster. We extracted these queries from Shopify Core into our Storefront Renderer. And good thing, too, because Storefront API traffic has significantly increased in the past year. This is one of many GraphQL improvements we made in 2023.
- We 5x’d the query performance of Observe, the home base for our dashboards tracking Shopify’s performance. We created a new query engine that distributes requests across multiple locations, cutting query times by 80%. Loading dashboards is now smoother and snappier. 🫰
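The p99 figure above is a percentile of request latency. As a minimal illustration, here is a Python sketch assuming the common nearest-rank definition; many monitoring tools interpolate between samples instead, and this is not a description of how Shopify computes it. The `percentile` helper and the sample latencies are hypothetical.

```python
import math


def percentile(latencies_ms, p):
    """Nearest-rank percentile: smallest sample such that at least p% of samples are <= it."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies_ms)
    # 1-based rank; p * n is computed first so integer inputs stay exact.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical sample: 98 fast requests and 2 slow outliers (milliseconds).
samples = [20.0] * 98 + [500.0, 600.0]

print(percentile(samples, 50))  # 20.0
print(percentile(samples, 99))  # 500.0
```

Here the median is 20 ms while p99 is 500 ms: a tail percentile exposes the slow requests that an average hides, which is why improvements like the duties speedup are reported at p99.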
Cleanup and subtraction
We love addition by subtraction.
Clutter slows things down and complicates things unnecessarily for our merchants. So we got rid of a ton of it.
Here are just a few complexity-extraction highlights from 2023:
- Removed nearly 3 million lines of code. This is a rough tally from our internal board where we share our “cleanup wins,” and I’m willing to bet the real number was far more.
- Sped up Shopify Admin’s developer feedback loop by 20x, including 50% faster continuous integration (CI) using 35% less compute.
- Archived ~6,800 unused and unnecessary GitHub repositories, vastly more than in past years.
- Merged 702 machine-generated pull requests (PRs) to delete dead code.
- Reduced memory usage of a background process on online-store-web from ~3 gigabytes (GB) to 400 megabytes (MB). This is a substantial improvement to our developer experience, which matters to us because we are committed to making Shopify the best place to build.
- Majorly improved Storefront Renderer’s use of Ruby’s garbage collector (GC), its memory management system. Our tweaks led to a 56% decrease in average GC time and an 80% decrease in p99 GC time: a clear example of how we’re focused on making the whole stack work well for commerce.
AI in engineering
Leaning on AI allows us to ship more, faster for our merchants, so we’re constantly finding new ways to incorporate AI into our workflows. Our engineering team works in close partnership with AI to make us more efficient in our work.
One of the tools we use to do this is GitHub Copilot; we were their first customer, in January 2022. Here are some of the ways Copilot has impacted our work:
- About 70% of Shopify’s engineering team uses Copilot regularly
- An average suggestion acceptance rate of 21-34%, depending on the programming language
- We estimate we’re accepting over 20,000 lines of code every weekday
- ~675K suggestions accepted total
- ~975K lines of code accepted total
These kinds of productivity gains help us reach more than 25,000 commits per week across Shopify, and about 1,300-1,400 PRs per day.
We’ve also been leaning on an internal tool we built called VaultBot, our AI-powered chatbot where Shopifolk can ask Shopify-related questions. VaultBot is currently answering around 32% of all engineering questions.
BFCM performance
The Black Friday-Cyber Monday shopping weekend is the biggest event of the year for many of our merchants. And because of all the traffic it generates, it’s the ultimate stress test for our platform. We build all year to handle the traffic spikes of BFCM — and then that level of traffic tends to become our new normal the following year.
We’re so proud of this year’s BFCM performance because it allowed each of our millions of merchants to show up to their customers with a fast, smooth shopping experience that led to many cha-chings from their Shopify app.
Here are some of the truly wild numbers reflecting Shopify’s performance over BFCM 2023, starting with our CEO Tobi Lütke’s fave stats:
Nerd BFCM stats:
Shopify’s egress processed 145 billion requests on Friday. App servers handled peak of ~60 million requests per minute. Increase of 38%. Total GMV was $4.1b, up by 22% from last year.
But Rails doesn't scale so what are we even doing 🤷‍♂️
tobi lutke (@tobi) November 25, 2023
- We achieved 99.999+% uptime while serving 29.7 petabytes (PB) of data from across our infrastructure, or over 5 terabytes (TB) per minute.
- At peak, our core application server handled 967K requests per second, equivalent to 58 million requests per minute.
- During the BFCM rush, our MySQL fleet (a combo of MySQL 5.7 and MySQL 8) handled over 19 million queries per second (QPS).
- At peak, we were indexing 22 GB/sec of logs and 51.4 GB/sec of metrics data! On top of that, we ingested 9 million spans of tracing data per second. We constantly monitor second-to-second data on how production systems are performing.
- Our Apache Kafka streaming infrastructure served 29 million messages per second at peak. That’s 45% growth from 2022—and we did it more efficiently this year with 14% fewer brokers.
All of this performance enabled our merchants to reach $9.3B in global sales throughout BFCM, with a peak of $4.2 million per minute on Black Friday.
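Several BFCM figures above are the same quantity expressed in different units. A quick Python check of two of them, assuming decimal (SI) units (1 PB = 1,000 TB) and, since the post does not state the measurement window, a hypothetical 4-day window from Black Friday through Cyber Monday. The helper names are illustrative only.

```python
SECONDS_PER_MINUTE = 60
MINUTES_PER_DAY = 24 * 60


def per_second_to_per_minute(rate_per_second):
    """Convert a per-second rate to a per-minute rate."""
    return rate_per_second * SECONDS_PER_MINUTE


def petabytes_to_terabytes_per_minute(petabytes, days):
    """Average TB/minute for a total volume spread evenly over `days`.

    Assumes decimal (SI) units, so 1 PB = 1,000 TB.
    """
    terabytes = petabytes * 1_000
    return terabytes / (days * MINUTES_PER_DAY)


peak_rpm = per_second_to_per_minute(967_000)
# ASSUMPTION: the 29.7 PB total covers a 4-day window (Friday through Monday).
tb_per_minute = petabytes_to_terabytes_per_minute(29.7, days=4)

print(f"{peak_rpm:,} requests/minute")   # 58,020,000 requests/minute
print(f"{tb_per_minute:.2f} TB/minute")  # 5.16 TB/minute
```

Under that assumption, 29.7 PB averages out to about 5.16 TB per minute, consistent with the "over 5 terabytes per minute" figure, and 967K requests per second is about 58 million per minute. Peak minutes would run well above the average.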
________________________________
“Fast software is a cultural phenomenon, driven by curiosity not ego.” - Ian Ker-Seymer (a production engineer at Shopify)
To keep building that culture, we’ll keep celebrating and sharing these wins. We are only as strong as our foundation, and our culture helps keep that foundation strong, agile, and delightful.
Why is all of this so important? Because it allows us to show up for our merchants in the best way possible, the way they need us to, so they can keep building their businesses and showing the world the real power of entrepreneurship—backed by a performant AF commerce platform.