The Core Web Vitals hype train

Earlier this week, Google announced that they will officially start using a subset of web performance metrics known as Core Web Vitals as part of their search ranking calculations starting in May 2021. This is huge, positive news in the perf & SEO worlds. Google's inclusion of Core Web Vitals will act as a much-needed forcing function to encourage sites to improve their performance.

However, my excitement for Core Web Vitals is tempered with a healthy skepticism. I'm not yet convinced that Largest Contentful Paint (LCP), First Input Delay (FID), and Cumulative Layout Shift (CLS) are the right metrics that all sites should be measuring themselves against. I worry that the outsized emphasis placed on Core Web Vitals by including them in SEO scoring will result in developers focusing solely on those three numbers without truly understanding what they mean and, more importantly, what they don't mean.

A brief history of web perf metrics

In order to understand the importance of Core Web Vitals, it's helpful to understand the historical context of web performance metrics and how we got to this point. For many years, the only way that we were able to track and measure performance on the web was via a set of network- and browser internals-based metrics known collectively as Navigation Timing. The Navigation Timing spec is widely supported across browsers (even going back to IE9!). These metrics answer questions like:

When does the server respond to a request with the first byte of HTML (time to first byte/TTFB)?
When does the browser finish loading and parsing the DOM (DOM content loaded/DCL)?
When have all the assets included on the page finished downloading (page load)?
And many others...
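To make those milestones concrete, here's a minimal sketch that derives them from a Navigation Timing Level 2 entry. The field names follow the `PerformanceNavigationTiming` interface; the `NavTimingLike` type and the `navMilestones` helper are hypothetical names used only for illustration. All values are milliseconds relative to the start of the navigation.

```typescript
// Subset of the PerformanceNavigationTiming fields used below.
// Timestamps are in ms relative to the navigation's startTime (normally 0).
interface NavTimingLike {
  startTime: number;
  responseStart: number;
  domContentLoadedEventStart: number;
  loadEventStart: number;
}

interface NavMilestones {
  ttfb: number;             // first byte of the HTML response arrived
  domContentLoaded: number; // DOM parsed, DOMContentLoaded fired
  load: number;             // all subresources done, load event fired
}

// Note: loadEventStart is 0 until the load event fires, so call this
// after `load` if you want a meaningful page-load value.
function navMilestones(entry: NavTimingLike): NavMilestones {
  return {
    ttfb: entry.responseStart - entry.startTime,
    domContentLoaded: entry.domContentLoadedEventStart - entry.startTime,
    load: entry.loadEventStart - entry.startTime,
  };
}

// Browser usage (Navigation Timing Level 2):
// const [nav] = performance.getEntriesByType("navigation");
// console.log(navMilestones(nav as PerformanceNavigationTiming));
```

Every one of these values is a browser milestone; none of them says when the user actually saw or could use anything, which is the problem described next.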

For years, Navigation Timing was the standard that folks used to measure their performance, but over time it became clear that there were significant issues with this approach. Firstly, navigation-centered metrics are highly variable based on your tech choices - the size of your assets combined with where, when, and how you include them in your markup can dramatically change the navigation timing values of your page. If your site (like many) switches from a fully server-rendered page to a client-rendered SPA, you'll see vastly different values reported for navigation metrics, but that doesn't necessarily mean that your site's performance got better or worse.

This is because Navigation Timing metrics measure milestones which are meaningful to a web browser, but often completely invisible to a user. If you ask a human being how fast a page loaded on their phone, they aren't going to respond with "when the DOM was fully constructed" or "when all the site's assets were downloaded and parsed". They'll answer based on when the content appears on screen, when the main image or header text was fully rendered, or when they were able to click/tap/scroll to interact with the page.

To bring us closer to a world where we can measure performance in a way that matters to the people that use the web, the industry has more recently shifted away from navigation timings towards new, user-centered metrics that better reflect the experience of users.

Enter Core Web Vitals

Google's answer to the question of which metrics best represent the experience of users is a set of three metrics that they are calling Core Web Vitals:
Largest Contentful Paint (LCP), which measures the amount of time (in ms) that it takes for the largest (and probably most important) image or block of text in the viewport to render,
First Input Delay (FID), which measures the delay (in ms) between a user's first interaction with the page, such as a tap, click, or key press, and the moment the browser is able to start responding to it (scrolling and zooming don't count),
Cumulative Layout Shift (CLS), which is a unitless score representing how much the visible content of the page shifts around unexpectedly (e.g. that annoying thing where you go to tap a link but the page has shifted the location of the link out from under you).
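For readers who want to see what the two less intuitive metrics boil down to, here's a small sketch that computes FID and CLS from the entry shapes a `PerformanceObserver` delivers for the `first-input` and `layout-shift` entry types, and rates a value against Google's published "good" and "poor" boundaries (Google assesses each metric at the 75th percentile of page loads). The interfaces and function names are hypothetical. CLS here is the sum-of-all-shifts definition used when these metrics were announced; Google has since revised CLS to look at the worst burst of shifts instead.

```typescript
// Minimal shapes of the observer entries (field names from the Event
// Timing and Layout Instability specs).
interface FirstInputLike { startTime: number; processingStart: number; }
interface LayoutShiftLike { value: number; hadRecentInput: boolean; }

type Rating = "good" | "needs-improvement" | "poor";

// FID: time between the user's input and the moment the browser could
// begin running event handlers for it.
function firstInputDelay(entry: FirstInputLike): number {
  return entry.processingStart - entry.startTime;
}

// CLS (original definition): sum of all layout-shift scores, ignoring
// shifts that happened right after user input, since those are expected.
function cumulativeLayoutShift(entries: LayoutShiftLike[]): number {
  return entries
    .filter((e) => !e.hadRecentInput)
    .reduce((sum, e) => sum + e.value, 0);
}

// [upper bound of "good", upper bound of "needs improvement"]
const THRESHOLDS = {
  LCP: [2500, 4000], // ms
  FID: [100, 300],   // ms
  CLS: [0.1, 0.25],  // unitless score
} as const;

function rate(metric: keyof typeof THRESHOLDS, value: number): Rating {
  const [good, poor] = THRESHOLDS[metric];
  if (value <= good) return "good";
  if (value <= poor) return "needs-improvement";
  return "poor";
}
```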

Now, don't get me wrong, I am pro-Core Web Vitals! These metrics, what they measure, and the fact that Google is holding sites accountable for their performance is a huge step forward for the web. Developers who have struggled to get buy-in from leadership now have a rock-solid business case for investing in performance improvements. The folks at Google who worked on Core Web Vitals are lovely people who have the best of intentions and truly believe in the importance of performance to move the web forward.

I'm just not quite ready to jump on the Core Web Vitals hype train. Google has chosen to include metrics in Core Web Vitals that meet Google's needs, and I'm not convinced that Google's goals and our goals as web performance practitioners are exactly the same thing.

Vital for whom?

The Core Web Vital metrics - LCP, FID, and CLS - measure common annoyances that would cause a user to get frustrated and bounce early after clicking a link, especially on their first visit to a site or page. And, because these metrics are all tech-stack agnostic, they can be used to compare sites against each other, unlike traditional Navigation Timing metrics. It doesn't matter if you serve static HTML with no JavaScript, or a framework-powered SPA that renders completely on the client, or any of the myriad possible frontend technology choices in between - Core Web Vitals measurements are comparable across every site on the internet.

So, from a business perspective, the metrics that were included in Core Web Vitals make total sense for Google. Their search business is predicated on returning a list of relevant links - links that users click on and stay at (one could argue that because of advertising revenue Google would want people to constantly go back to the results list, but if users don't find Google's results relevant, then they won't keep using the service). And, Google needs to be able to compare millions of sites with different tech stacks against each other in order to provide the best, least bounce-inducing results.

For the rest of us, however, I don't think LCP, FID, and CLS alone give us a complete picture of our sites' performance. There are significant gaps in what Core Web Vitals measure, so I think it's important to temper our enthusiasm with a frank discussion of the shortcomings of these metrics.

The browser problem

One of the main problems I see with Core Web Vitals is the lack of cross-browser support. Two of the three (LCP and CLS) are currently only available in Chrome and other Chromium-based browsers. The third, FID, is supported natively in Chrome via the PerformanceEventTiming API; elsewhere it's only available via a polyfill that site owners need to add to their own markup in order to track FID.
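One practical consequence is that RUM code should feature-detect each metric rather than assume it exists. Here's a minimal sketch, assuming you read the browser's `PerformanceObserver.supportedEntryTypes` list; the `measurableVitals` helper is a hypothetical name. Recording which vitals were measurable alongside the values makes the browser gap visible in your data instead of silently skewing it toward Chromium users.

```typescript
// Entry types behind each Core Web Vital in the browser's Performance APIs.
const VITAL_ENTRY_TYPES = {
  LCP: "largest-contentful-paint",
  FID: "first-input",
  CLS: "layout-shift",
} as const;

type Vital = keyof typeof VITAL_ENTRY_TYPES;

// Given the entry types a browser reports as supported, return the
// vitals that can be observed natively in that browser.
function measurableVitals(supported: readonly string[]): Vital[] {
  return (Object.keys(VITAL_ENTRY_TYPES) as Vital[]).filter((v) =>
    supported.includes(VITAL_ENTRY_TYPES[v]),
  );
}

// Browser usage (supportedEntryTypes is undefined in older browsers):
// const vitals = measurableVitals(PerformanceObserver.supportedEntryTypes ?? []);
// if (vitals.includes("LCP")) {
//   new PerformanceObserver((list) => { /* report entries */ })
//     .observe({ type: "largest-contentful-paint", buffered: true });
// }
```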

This is unfortunately a pervasive and frustrating issue in the performance community - many important and useful new metrics are only supported in Chrome. I’m happy that folks at Google are working hard to push the state of performance monitoring forward with these awesome and useful new metrics, but if we focus on them alone (Vitals or otherwise), we exclude a significant portion of traffic from our performance monitoring.

There's a tendency to dismiss this concern because "all modern browsers are pretty much the same, so Chrome is representative of all of our traffic", but that is far from the reality I've seen from analyzing millions of data points, which show pretty clear differences in performance between browsers. Modern browsers may support similar features, but under the hood they handle loading, rendering, and JavaScript execution in different ways, and it feels short-sighted and injudicious to ignore the experiences of huge swaths of users simply because these new metrics can't capture them.

The solution to this problem, of course, is that other browser vendors need to start providing these super important metrics to developers! Until that beautiful day comes, I don't think we should dismiss other non-Vitals timings wholesale, or stop searching for creative cross-browser ways to measure real user performance. At Etsy, for example, we implemented a custom user timing that logs when our SVG logo renders. This measures approximately the same value as First Contentful Paint, or FCP (within a 50ms standard deviation), and is available in all browsers, even back to IE10!
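The post doesn't describe Etsy's exact implementation, so here's a hypothetical sketch of the general technique using the User Timing API (`performance.mark` and `performance.getEntriesByName`): call a mark from an inline script placed immediately after the logo markup, then read it back later to beacon it. The mark name and helper names are assumptions. Note that the mark records when the parser reaches that script, which is a proxy for, not the same as, the moment the logo paints.

```typescript
// The subset of the User Timing API this sketch relies on. In a browser,
// pass the global `performance` object.
interface UserTimingLike {
  mark(name: string): unknown;
  getEntriesByName(name: string, type?: string): ArrayLike<{ startTime: number }>;
}

// Hypothetical mark name; pick whatever fits your RUM naming scheme.
const LOGO_MARK = "logo-rendered";

// Call from an inline <script> placed right after the logo's <svg>, so it
// runs as soon as the parser has reached the logo.
function markLogoRendered(perf: UserTimingLike): void {
  perf.mark(LOGO_MARK);
}

// Later (e.g. on load), read the mark back so it can be sent to your RUM
// endpoint with the other timings. Returns undefined if it was never set.
function logoRenderTime(perf: UserTimingLike): number | undefined {
  const entries = perf.getEntriesByName(LOGO_MARK, "mark");
  return entries.length > 0 ? entries[0].startTime : undefined;
}
```

Because User Timing is old and widely supported, the same few lines work in browsers that expose none of the Core Web Vitals.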

The business outcome problem

Another issue with the metrics selected for Core Web Vitals is that it's been hard to correlate changes in FID and CLS with business outcomes, and even harder to do causal analysis when we see regressions. Note that we don't yet have enough historical LCP data to know how useful it will be, but based on FCP analysis I am expecting LCP to be highly correlated with business outcomes, unlike the other two Core Web Vitals.

FID, in particular, has proven to be a highly subjective measurement that isn't always useful when viewed in isolation. FID is ostensibly supposed to be measuring the effect of thread-blocking JavaScript execution, but our FID RUM values are most often logged when a user clicks on a navigation link that takes them to another page.

FID data is also difficult to analyze, because in reality it can change either because you are blocking the thread for a longer period, or because you changed the user interface in a way that changes user behavior. Disentangling experimental results that have an ambiguous cause is frustrating to say the least.

To give an example of how FID is an unreliable tool for measuring user outcomes: Let's say we load images in a row of search results faster. This causes users to click on search results earlier than before, but now this interaction happens while JavaScript that fired at DCL is still executing, so FID increases. Should we undo that change and go back to loading the images slower? If we only looked at Core Web Vitals, then the answer would be "CLS and LCP stayed the same, but FID degraded, so we should probably undo the change". However, the real, nuanced answer is more like "Long tasks didn't increase alongside FID, so it seems like we changed user behavior. Let's continue to load the image earlier, and see if we can split up or reduce the page's JS execution to give the main thread a chance to respond faster".
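The reasoning in that example can be sketched as a simple check over RUM samples: compare the 75th percentile of FID between control and variant, and only blame the main thread if long-task time went up too. This is a hypothetical heuristic, not a method from the post; the sample shape, names, and the 5% noise tolerance are assumptions, `longTaskMs` stands for the summed duration of Long Tasks API (`longtask`) entries on a page view, and a real analysis would use proper significance testing.

```typescript
// Per-page-view RUM sample (hypothetical shape).
interface PageViewSample { fid: number; longTaskMs: number; }

// Nearest-rank percentile (p in 0..100) of a non-empty list.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

type Verdict = "no-fid-regression" | "main-thread-regression" | "likely-behavior-change";

// An FID regression without more long-task time most likely means users
// interacted earlier, not that the page blocked the thread for longer.
function explainFidChange(
  control: PageViewSample[],
  variant: PageViewSample[],
  tolerance = 0.05, // relative change treated as noise (assumption)
): Verdict {
  const p75 = (xs: PageViewSample[], key: keyof PageViewSample) =>
    percentile(xs.map((s) => s[key]), 75);
  const fidUp = p75(variant, "fid") > p75(control, "fid") * (1 + tolerance);
  const longTasksUp =
    p75(variant, "longTaskMs") > p75(control, "longTaskMs") * (1 + tolerance);
  if (!fidUp) return "no-fid-regression";
  return longTasksUp ? "main-thread-regression" : "likely-behavior-change";
}
```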

This is why it's so important for web performance practitioners to continue to monitor multiple metrics. Core Web Vitals are super important, especially now because they will affect SEO, but they are not necessarily the most important metrics to track, or the only important ones. Do your own analysis to determine which metrics matter most to you and your users and have the biggest impact on your business outcomes.

Where do we go from here?

Core Web Vitals metrics are data points that are certainly important, but on their own they don't give us a complete picture of our users' experiences. So what does give us that picture? Welllllllllll, we're not quite there yet.

To gain better, more holistic insight into a user’s loading experience we can combine Core Web Vitals with other metrics like Time to First Byte (TTFB), First Contentful Paint (FCP), Time to Interactive (TTI), and Total Blocking Time (TBT). We can layer on APIs like Long Tasks, Resource Timing, and Element Timing that expose more data from the browser's inner workings to measure parts of the user's experience interacting with our pages. But, if we use a SPA Framework, PWA App Shell, or other AJAX methodology to change out content without a full-page refresh, then we are basically on our own.
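Of those supporting metrics, Total Blocking Time is the easiest to misread, so here's a sketch of how it's commonly defined: for each long task between FCP and TTI, count only the part of its duration beyond 50 ms as "blocking", then sum those parts (a 250 ms task contributes 200 ms). The names are hypothetical, and this sketch simply counts tasks that start inside the window; lab tools may treat tasks that straddle FCP or TTI differently. TTI has to come from a lab tool or your own estimate, since browsers don't report it directly.

```typescript
// A main-thread task, e.g. from a "longtask" PerformanceObserver entry.
interface TaskLike { startTime: number; duration: number; } // ms

const BLOCKING_THRESHOLD_MS = 50;

// Sum the portion of each task beyond 50 ms, for tasks that start
// between FCP (inclusive) and TTI (exclusive).
function totalBlockingTime(tasks: TaskLike[], fcp: number, tti: number): number {
  return tasks
    .filter((t) => t.startTime >= fcp && t.startTime < tti)
    .reduce((sum, t) => sum + Math.max(0, t.duration - BLOCKING_THRESHOLD_MS), 0);
}
```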

The story for measuring the full lifecycle of a user's experience loading, interacting with, and coming back to our sites is still woefully inadequate, particularly when it comes to cross-browser compatibility. To extend the hype train metaphor a bit: there’s a whole lot of track that we still have to build in front of us!

Getting off the hype train

My fear is that Google putting a stake in the ground with Web Vitals, and all of the breathless hype surrounding it, will make these three metrics web performance dogma for the foreseeable future. They'll eventually turn into a "don't use tables for layout" situation, where the initial impetus behind the good advice gets lost and mutated into bad advice (e.g. "don't use tables for anything, including tabular data"), and this horrible advice somehow manages to persist even a decade later.

I also foresee us falling into the trap of “the tyranny of metrics” - this is a phenomenon documented by social scientists that is best summed up by Gergely Nemeth:

“[W]hen a given metric becomes the target, some participants will start gaming for their gains, so the metric stops being a good indicator of what it was initially intended to measure.”

If developers start to focus solely on Core Web Vitals because it is important for SEO, then some folks will undoubtedly try to game the system. Bad intentions aside, it’s so very easy to make a performance change that improves one number while degrading another. Sometimes this tradeoff is desired or acceptable because it’s what’s best for our users, but other times those tradeoffs actually make the problem worse. Distinguishing between the two scenarios is complicated and nuanced, and requires that we look at a wide range of primary and supporting metrics. Not just the three that Google tells us to look at.

When talking about Core Web Vitals, we need to make sure that we keep in mind both the benefits and the tradeoffs inherent in narrowing our scope to three metrics out of many. Performance cannot be measured by a single number, or even three numbers, no matter how much easier our lives would be if it were. Core Web Vitals are an engine that can help pull us forward, not a fully decked-out passenger train with sleeping cars and a cafe that will whisk us wherever we need to go.

Many thanks to Will Gallego and Jeremy Wagner for sharing their thoughts on this post!