HTTP/2 Push : The details

HTTP/2 (h2) is here and it tastes good! One of the most interesting new features is h2 push, which allows the server to send data to the browser without having to wait for the browser to explicitly request it first.

This is useful because normally we need to wait for the browser to parse the .html file and discover the needed links before these resources can be sent. This typically leads to 2 Round-Trip-Times (RTTs) before data arrives at the browser (1 to fetch the .html, then 1 to fetch a resource). So with h2 push, we can eliminate this extra RTT and make our site faster! In this it is conceptually similar to inlining critical CSS/JS in the .html, but should give better cache utilization. Push also helps to ensure better bandwidth utilization at the start of the connection. Plenty of posts talk about the basics, how to use it on different platforms and how to debug it.

Theoretical performance of HTTP/2 push

Figure 1: How push after index.html works theoretically

In an ideal world with unlimited bandwidth, we could theoretically just push all our resources with the .html, eliminating waiting for the browser completely (see Figure 1, right side)! Sadly, in practice, even pushing just a tad too much data can actually lead to significant slow downs in site loads.

This post aims to look at some of the low-level issues that determine how useful push can be and how this impacts practical usage. The text should be comprehensible for relative newcomers to the topic, while more advanced readers can find in-depth information in the many reference links and sources and reflect on the ideas in chapter 2.

TL;DR

1. Underlying basic principles and limits

The performance of h2 push is heavily dependent on the underlying networking protocols and other aspects of the h2 protocol itself. Here we introduce these principles on a comprehensible level to later discuss them more practically in chapter 2.

1.1 Bandwidth and TCP slow start

On the internet, every connection has a limited amount of bandwidth. If we try to send too much data at once, the network will start discarding the excess to keep the link from getting flooded/congested (packet loss). For this reason, the reliable TCP protocol uses a mechanism called slow start which basically means we start by sending just a little bit of data at first and only increase our send rate if the network can handle it (no packet loss occurs). In practice, this initial congestion window (cwnd) is about 14kB on most linux servers. Only when the browser confirms it has successfully received those 14kB (by sending ACK message(s)) will the cwnd double in size to 28kB and we can send that much data. After the next AKCs arrive we can grow to 56kB etc.

note: in practice cwnd can grow in other ways too, but the behaviour is similar to the process described above, so we will use the ACK-based model as it's easy to reason about

Impact of TCP slow start on HTTP/2 push

Figure 2: Impact of TCP slow start (compare to theoretical performance in Figure 1)

This means that on a cold connection, we can only send 14kB of data to the browser in the first RTT anyway: if we push more, it is buffered at the server until the browser ACKs these first 14kB. In that case, push can have no additional benefits: if the browser just issues a new request along with the ACKs, it would have a similar effect (2 RTTs needed to download the full resource, see figure 2). Of course, on a warm/reused connection, where the cwnd has already grown large, you can push more data in 1 RTT. A more in-depth discussion of this can be found in chapter 1 of this excellent document.

Note that this is primarily a problem because HTTP/2 uses just 1 connection vs 6 parallel connections in most HTTP/1.1 implementations. Interestingly, there are also reports that increasing the initial cwnd doesn't help much for general HTTP/2, but it's unclear if this holds true with h2 push as well.

1.2 Priorities

HTTP/2 only uses a single TCP connection on which it multiplexes data from different requests. To decide which data should be sent first if multiple resources are pending transmission, h2 employs priorities. Basically, each resource is given a certain order in which it has to be sent: for example .html is the most important, so it has a priority of 1, .css and .js get 2 and 3, while images get 4.

If we have pending transmissions for multiple resources with the same priority, their data can be interleaved: the server sends a chunk of each in turn, see figure 3. This interleaving can lead to both resources being delayed, as it takes longer for them to fully download. This can work well for progressively streamed/parsed resources, such as progressive jpgs or .html, but possibly less so for resources that need to be fully downloaded to be used (e.g. .css, .js and fonts).

HTTP/2 priorities

Figure 3: HTTP/2 priorities and interleaving

HTTP/2 prioritization is much more complex than these examples, sorting priorities into trees and also allowing interdependencies and weights for subtrees to help decide how bandwidth should be distributed (and how interleaving should be done). Many interesting posts have been written on this.

1.3 Buffers

Networks use a lot of buffering on all levels (routers, cell towers, OS Kernel, etc.) to handle bursts of incoming data which cannot be immediately transmitted. In most cases, once data is in one of these buffers, it's impossible to get it out or rearrange the data in the buffer. This can make it impossible to correctly apply the HTTP/2 priorities. As a contrived example, if we push 3 large images right after our .html, they can fill up the send buffer. The browser then parses the .html, discovers an important .css file and requests it. The server wants to send this before the rest of the images (because it blocks the page render), but can't because it is unable to access data in the buffers. Its only choice is to send the .css after some image data, effectively delaying it and the page load, see figure 4.

HTTP/2 buffering delays

Figure 4: Large buffers can cause critical data to be delayed. The server can only re-prioritize the userspace buffer.

One possible solution for this is limiting the use of the kernel buffer as much as possible and only give data to the kernel if it can be sent immediately, as explained in detail by Kazuho Oku. This is only partly a solution however, since there can also be significant bufferbloat in the network itself. If the network allows the server to send at a high rate, only to then stall the transmitted data in large internal buffers along the network path, we will see the same detrimental effect. This becomes a greater problem for warm connections, that can have more data in-flight at the same time. This issue is discussed further by google engineers in chapter 2 of this excellent document. Interestingly, they argue that h2 push can also help with this issue, by pushing (only) critical resources in the correct order. This means knowing this exact order is important to get the most out of h2 push (and that pushing images directly after .html is probably a bad idea ;).

1.4 Caching

Modern browsers (and network intermediaries like CDNs) make heavy use of caching to quickly load previously downloaded resources without having to issue a new request to the origin server. This of course means we don't want to push already cached resources, as this would waste bandwidth and might delay other resources that aren't cached yet. The problem is knowing which resources are cached; ideally the browser would send along a summary of its cache when requesting index.html, but this concept is not covered by the official HTTP/2 standard and not implemented in any browser at this moment. Instead, the official method is that the browser can signal the server to cancel a push with a so-called RST_STREAM reply to a PUSH_PROMISE. However, in practice, (some) browsers don't do this either, and will happily accept pushes for resources they have already cached. Even if they would use RST_STREAM, we can question how effective it would be: much (or all) of the data of the push can already be en-route/buffered before the RST_STREAM arrives at the server, possibly rendering it largely void.

HTTP/2 RST_STREAM

Figure 5: RST_STREAM can be used to cancel h2 push (source)

To work around these shortcomings while we wait for an official browser implementation, several options have been proposed, see chapter 3 of this excellent document. The canonical concept is to have the server set a cookie detailing which resources it has pushed and check this cookie for each request to see what push targets remain. This seems to work well in practice, though it can fail if resources are removed from the client's cache in the meantime.

2. Practical implications

Now that we have a good understanding of the underlying principles at work, we can look at how these affect using h2 push in practice.

2.1 When to push?

"When to push?" is difficult to answer and depends on your goals. I can see roughly 4 major possibilities (see Figure 6), each with their own downsides :

  1. After-Html: Directly after index.html (benefit limited to: cwnd - size(index.html))
  2. Before-Html: Before/while waiting for index.html (can slow down actual .html if wrongly prioritized/buffered)
  3. With-Resource: Alongside another resource (can be non-optimal but easier to get order/priorities right)
  4. During-Interactive: After page fully loaded (not for improving key pageload metrics)

When to push?

Figure 6: Various times when push can be used

Initial page load

It seems that After-Html is what most people think about when talking about push, while it's arguably the least useful in the list, especially on cold connections (see 1.1). Conversely, Before-Html is a lot more interesting: we can actually increase the congestion window up-front, so that even large .html responses can be sent in a single RTT when they become available. Especially in the case of CDNs/edge servers this can work well if the origin is far away/slow: because of the small RTT between client and edge, the cwnd can grow quite quickly. It is difficult to fine-tune though: if we push too much we might fill up the buffers (1.3) and might not be able to prioritize the .html correctly. Colin Bendell created shouldipush.com to help you assess the gains your site can make if pushing before index.html (before "TTFB: time to first byte") and after index.html (before "TTFR: time to first resource")(see Figure 7).

shouldipush.com bandwidth opportunities

Figure 7: Opportunities to increase bandwidth utilization (source)

With-Resource could be a little less optimal for faster first load, but this approach might make it (much) easier to manage what to push and limit the risks of delaying other resources (see next in 2.2) as resources are closely tied to other related data. For example in Figure 6, we know style.css will reference font.woff2, so we push it along whenever style.css is requested. This naive scheme means we don't have to worry about how style.css and font.woff2 fit into the larger "dependency graph", which makes the setup easier. A lot can go wrong here though and this is probably the least useful option.

After load and across pages

Finally we have During-Interactive, which is often ignored and considered controversial by some because there seems to be a direct, more attractive competitor: "Resource Hints" (which you may know as <link rel="preload/prefetch">). Using Resource Hints can trigger the browser to fetch a resource before it needs it, both for the current page load and the next. It's often seen as superior to push because among others it works cross origin, doesn't suffer from the (current) caching problems and the browser can better decide how to fit it into the download schedule. So after the page is fully loaded and we have some javascript running, we might as well fetch any resource we need via Resource Hints instead of pushing them. It is worth mentioning though that Resource Hints can suffer from the same limitations as push, especially in dealing with h2 priorities and excessive buffering (chapter 5).

Seeing Resource Hints and push as complementary options, I feel there is a lot of room for experimentation here and cases where push can have its place. For example, in academic literature, many papers have investigated push to speed-up video segment delivery. The MPEG-DASH standard streams videos in small segments (e.g. 2s) and instead of waiting for the browser to request the next segment (in time before the playback of the current one finishes), we can use push to instead send available (live) segments immediately. Another interesting paper is MetaPush: the researchers push so-called "meta files" which contain information about resources used in future pages and then use Resource Hints to download them. This should combine the best of both worlds and prevent excessive bandwidth usage.

In the field, Facebook has been using push in their native app to quickly load images/video, as an app has no critical js/css to load first. While Facebook's image use case looks more like After-Html, push could also be employed in a (Single Page) app context to send along resources with data updates, e.g. when the app fetches the latest content.json which references some images, they can be pushed along.

At Velocity Amsterdam 2016, Amazon's Cynthia Mai talked about how Amazon aggressively prefetches resources to speed up next-page load, but that <link rel="prefetch"> was too unpredictable and didn't scale to large amounts of resources (as it seems to only fetch during browser idle time). A smart push scheme could be a more reliable option in this case, as you have more control over what is actually sent. In addition, though I haven't found many good examples of this, combining push with service workers can be a very powerful concept that can also help with the current caching issues. Finally, I've seen it mentioned several times that an origin server might also push updated resources to the CDN edge, which could make for a lower-overhead API between the two.

2.2 What to push?

As discussed in 1.2 and 1.3, push can slow down the initial page load if we push too much or in the wrong order, as data can get stuck in buffers and (re-)prioritization can fail. To get it right, we need a very detailed overview of the order in which resources are loaded.

Dependency graphs

This load order is related to the "dependency graph", which specifies resource interdependencies. While this is quite simple in concept (simply build a tree by looking at the child resources each parent resource includes), in practice these graphs can be quite complex (See Figure 8). The very interesting Polaris paper from MIT looks into how you can distill a correct dependency graph and shows that just by loading resources in the correct order/with correct priorities, page load times could be improved by 34% at the median (even without using server push or dedicated browser support!).

Polaris: complex dependency graph
Figure 8: Real dependency graphs can be very complex and lead to unexpected optimal resource priorities and load orders (source)

Manually creating a correct dependency graph can be difficult, even if we're just looking at the critical resources. Some basic tools already exist to partly help you with this, popular frameworks like webpack also keep some dependency information and Yoav Weiss is reportedly looking into exposing dependency info via the Resource Timing API (as dependencies and priorities can be different between browsers).

To then use your constructed graph to decide what to push and what to load normally/via Resource Hints is even more difficult. The most extensive work I've seen on this comes from Akamai. They use collected Resource Timing data from RUM (Real User Monitoring) to extract the dependency graph and then statistically decide which resources should be pushed. The integration into their CDN means they can also watch out for (large) regressions from push and adapt accordingly. Their approach shows how difficult it can be to properly leverage push and to what lengths you need to go to optimize its operation. Alternatively, CloudFlare discussed pros and cons of other ways a CDN can provide push.

While waiting for advanced supporting tools, we are stuck with mostly the manual methods, and many server/framework implementations primarily look at critical resources at initial page load as the main use-case for push (probably since they are easiest to manually fine-tune), see 2.3. It is here that With-Resource can make it easier (if used conservatively): if we just push direct child dependencies along with a resource, we don't need to keep the overview of the full dependency graph. This is especially interesting for sites where individual teams work on small feature "pagelets" that combine into a larger site.

Note that these dependency graphs are also needed to optimally profit from Resource Hints!

Warm connections and caching

Getting the push order right is particularly important for cold connections and first page loads (nothing cached). For warm connections or in case many critical resources have already been cached, we suddenly get a lot more options because we can now push non-critical things: do we first push ad-related resources? our hero image or main content image(s)? Do we start pre-loading video segments? social integrations/comments section? I think it depends on the type of site you run and what experience you want to prioritize for your users. I feel that it is in these situations push will really start to shine and can provide significant benefits.

I haven't seen much material that looks into how push behaves on warm connections/cross pages in practice (except, of course, for this excellent document, chapter 1), which is probably because of the caching issues and the fact that it's more difficult to test with existing tools. Because of the many possible permutations and trickiness of push, I predict it will be some time before we see this being used properly. Akamai's RUM-based system also doesn't include this yet because they are focusing on the other use cases first and, in the cross-page case, because:

The goal is a) predicting the next page correctly b) not burning up a cellular user's data plan and c) not overcharging our customer for data that an end user doesn't request (getting the push wrong). This was the failure of the 'prerender' Resource Hint. We can do a) really well and can predict the next page with high confidence. But since it's not 100% guaranteed, the blowback from b) and c) are of great concern. - Colin Bendell

Pushing warm connections with caching

Figure 9: Pushing can look very different over warm connections and/or with cached critical resources

This "wealth of options" becomes even larger if we start prefetching assets for future page loads, as more will be cached and we need to go further and further down the dependency graph for knowing what to push. To make optimal use of this prefetching scheme, we do need to employ some good prediction algorithms. For some sites this will be trivial (Amazon's product list will probably lead to a detail view somewhere down the line), but other sites might need to use a statistical/machine learning system to predict how users will traverse through their pages. This can of course have deep integrations with the already discussed RUM-controlled push scheme.

Fine-grained pushing/streaming

Up to this moment, we have primarily considered resources as single files which need to be downloaded fully to be used. While this is true for some resource types (.css, .js, fonts) others already allow streaming (.html, progressive images), where the browser starts using/processing the data incrementally/asap. For streamable resources, we might actually use resource interleaving (see 1.2) to our benefit to get some version of the content displayed early on.

For example, the h2o server interrupts sending .html data when requests for .css come in. Akamai did impressively in-depth research on progressive images and the Shimmercat server uses a very nice implementation that allows you to prioritize parts of (progressive) images. And of course, every browser progressively scans incoming .html for new links to request.

If we can figure out how to make .js and .css streamable or make tools to split larger files, we can move towards incredibly fine-grained pushing and get more use out of the low cwnds on cold connections (1.1). Advances in the Streaming API/service workers might mean we don't even have to wait for browser vendors to start experimenting with this.

2.3 How to push (2016)?

Up until now, we've been discussing the problems and opportunities of push without worrying (much) about what is actually possible in the current browser/server implementations. I don't think it's useful to try to give a full overview at this point, since things may look very different a few months from now. Instead I will give a non-exhaustive list of current problems/quirks/gotcha's to make the point that it might be a little too early to really use push in production.

note: some of these were overheard during conference questions and talks with others, so not everything has a hard reference.

3. Personal conclusions

After this wall of text, I still have the feeling many (important) details on h2 push remain undiscussed. Especially given that h2 push seems to only grant limited performance gains (especially for cold connections on slow networks) and can sometimes even slow down your site, is it worth all this effort? If Akamai has to use a complex RUM-based data-mining approach to make optimal use of it and nginx doesn't consider it a priority, are we not doing exactly what Ian Malcolm warned about in Jurassic Park? Shouldn't we just use Resource Hints and drop push?

Could vs should. Source: https://img.pandawhale.com/162716-dr-ian-malcolm-meme-your-scien-wFWh.jpeg

While I would agree HTTP/2 push isn't ready for production yet (and, in my opinion, the same could be said to a lesser extent for HTTP/2 in general), I feel we're just getting to know these new technologies and how to best use them. It is said the best laid plans of mice and men fail at the first contact with the enemy, and I feel the same applies for most new standards: even though many people have spent years thinking about them, things start to unravel fast when used in practice and at scale. This doesn't mean they don't have merit or potential, just that we need more time to figure out best practices (how long have we been optimizing for HTTP/1.1?).

Many of the biggest current issues (e.g. cache-digest, early hints) will be solved soon and implementations will mature. Other underlying concepts such as dependency graphs, explicit prioritization, user behaviour prediction (for prefetch) and more fine-grained/interleaved streaming are useful for many more techniques than just push and will rise in the years to come. QUIC, the next big protocolâ„¢, is heralded to help with the buffering issues, alongside offering many other interesting features. Interesting times lie ahead for webperf and I think h2 push will find its place in the end.

For me personally, I hope to do some research/work on the following items in the months to come:

I hope you've learned something from this post (I certainly have from the research!) and that you will start/continue to experiment with HTTP/2 push and join the webperf revolution!


Thanks to Colin Bendell and Maarten Wijnants for their help and feedback. Thanks to all the people who have written and talked about this topic, allowing me to piece together this document.

Custom figures were made with https://mscgen.js.org/, https://draw.io and https://cloudconvert.com/svg-to-png.

Last update: 30/11/2016
Live version: https://rmarx.github.io/h2push-thedetails/