Don't Panic: Instruct your Java application to facilitate the browser cache

Suppose your Java application serves content that you would normally serve as static content. Are you sure that content makes use of standard HTTP caching functionality? If not, your application might have to serve a lot more full downloads than necessary. The demo for this article is based on JOSS, a Java client for OpenStack Object Storage. However, the principles described here can be applied beyond the scope of JOSS. Here is how.

If-Modified-Since - natural behavior

In any decent browser, you get the If-Modified-Since behavior out-of-the-box. You may not know it, but it happens all the time. Whenever you refresh a page with images, your browser will do a quick check on the server to see if the resources have been modified. If they have not been modified, your browser will serve the cached content. Like so:

What happens under the hood is that the image is served with a response header called “Last-Modified”. Every time the browser needs to fetch that image again, it sends a request header called “If-Modified-Since” with that exact same date. The server, presumably Apache, uses “If-Modified-Since” to verify whether the content has changed since that date. If it has not, the server does not return the content, but instead responds with an HTTP status code 304, aka “Not Modified”. The browser, in turn, uses this code to serve the cached content.
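The decision the server makes can be sketched in plain Java. This is only an illustration of the rule, not how Apache implements it; the names `ConditionalGet` and `statusFor` are made up for this example, and the sketch assumes the header arrives in the usual RFC 1123 date format.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper: decides between 200 and 304 for a conditional GET. */
final class ConditionalGet {
    static final int OK = 200;
    static final int NOT_MODIFIED = 304;

    /**
     * @param lastModified    modification time of the resource on the server
     * @param ifModifiedSince raw If-Modified-Since header value, or null if absent
     */
    static int statusFor(Instant lastModified, String ifModifiedSince) {
        if (ifModifiedSince == null) {
            return OK; // first visit: the browser has nothing cached yet
        }
        try {
            Instant since = ZonedDateTime
                    .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            // HTTP dates carry whole seconds, so drop any sub-second part first.
            Instant modified = lastModified.truncatedTo(ChronoUnit.SECONDS);
            return modified.isAfter(since) ? OK : NOT_MODIFIED;
        } catch (DateTimeParseException e) {
            return OK; // unreadable header: fall back to sending the content
        }
    }
}
```

For a resource last modified at 1994-11-06T08:49:37Z, calling `statusFor` with the header value `Sun, 06 Nov 1994 08:49:37 GMT` yields 304, while a missing header yields 200.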

Simple, no? Standard stuff that makes the web a bit of a better place by preventing the same content from being downloaded over and over.

Through the application layer

Here’s the catch, though: if you, for whatever reason, choose to serve your images through an application layer, this chain is broken. The application layer does not automatically relay the caching headers between the browser and the underlying storage. Therefore, images served through an application layer are downloaded in full over and over again. Not very efficient:

You might be wondering why you would want to introduce your application as an extra layer. That is a very good question, because without a compelling reason, you should always use the public URL for images. It’s quicker and gives you out-of-the-box HTTP functionality. That said, there are some reasons that justify the extra stop through the application layer. One of them is that the content is private and a user’s credentials need to be verified before the content may be accessed. Another reason may be that you require the content to be visibly or invisibly watermarked before serving it. Perhaps yet another is that the content is tailored to the consumer, e.g. images for retina devices. It is your call. Just remember that the extra stop has a price. See here for some of the architecture reasons for introducing the application as an extra stop.

Connect browser and Object Storage

The question then is how to get the 304 behavior back if the application layer is introduced as an extra stop. This is actually quite simple and consists of the following steps:

read If-Modified-Since; read the value of the If-Modified-Since header from the request

pass If-Modified-Since; pass the If-Modified-Since to the destination (OpenStack Object Storage in our example). The destination server can now decide whether the content has been modified and it will return a 304 if it is unchanged.

content is unchanged - set 304; if the storage node reports back a 304, simply set this status on the response and send no content

content is changed - pass content and Last-Modified; ask for the object’s modification date and set that value in the Last-Modified response header. This will trigger the browser to use If-Modified-Since on its next call to the server. Serve the content in the response.

Be sure to check out the part of the code which demonstrates these steps.
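The four steps above can be sketched as follows. This is not the code from the demo and it does not use the JOSS API: `Storage`, `Reply` and `CachingProxy` are hypothetical stand-ins so that the example runs with nothing but the JDK. It also assumes header names arrive in exactly the casing shown, whereas real HTTP header names are case-insensitive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a conditional GET against the object store. */
interface Storage {
    Reply get(String objectName, String ifModifiedSince);
}

/** Minimal status/headers/body holder, used for upstream and downstream replies. */
final class Reply {
    final int status;
    final byte[] body;
    final Map<String, String> headers = new LinkedHashMap<>();

    Reply(int status, byte[] body) {
        this.status = status;
        this.body = body;
    }
}

/** Hypothetical application layer that relays the caching headers. */
final class CachingProxy {
    private final Storage storage;

    CachingProxy(Storage storage) {
        this.storage = storage;
    }

    Reply handle(String objectName, Map<String, String> requestHeaders) {
        // Step 1: read If-Modified-Since from the browser's request (may be null).
        String since = requestHeaders.get("If-Modified-Since");

        // Step 2: pass it on, so the storage decides whether the content changed.
        Reply upstream = storage.get(objectName, since);

        // Step 3: unchanged, so answer 304 without a body.
        if (upstream.status == 304) {
            return new Reply(304, new byte[0]);
        }

        // Step 4: changed, so serve the content and expose Last-Modified,
        // which makes the browser send If-Modified-Since next time.
        Reply out = new Reply(200, upstream.body);
        String lastModified = upstream.headers.get("Last-Modified");
        if (lastModified != null) {
            out.headers.put("Last-Modified", lastModified);
        }
        return out;
    }
}
```

In a servlet-based application the same four steps would map onto reading the request header, making the storage call, and setting the status and Last-Modified header on the response.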

We now get this situation, which is exactly what we want:

To demonstrate that it actually works, here is a screenshot of an image which is shown twice - once directly from a public URL and once through the application layer. The top half shows the first load (i.e., HTTP status code 200). The bottom half shows the reload, which gives back a 304 in both cases, including the one that goes through the application layer.

Note that the call that passes through the application layer takes longer than the direct call. This is the price you pay for mediation.

Some things you might want to ponder:

Seconds and milliseconds; if you are implementing your own If-Modified-Since algorithm, be aware that the server-side modification date/time often has millisecond precision, whereas the dates in HTTP headers, and therefore the ones the browser sends, have second precision. Be sure to align the two, for example by truncating the server-side value to whole seconds before comparing.

Spring support; if you are considering implementing this method yourself, you might want to check out the built-in Spring support, particularly WebRequest.checkNotModified.

ETag and If-None-Match; If-None-Match is similar to If-Modified-Since, except that the check is made on the ETag, which is probably a hash of the content, but not necessarily. At least, it’s unlikely to be the modification date. If you want to use this instead, be sure to send the hash in the ETag header of the response, and read the If-None-Match header of the request.
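For the ETag variant, a minimal sketch could look like this. The choice of SHA-256 and the names `EtagCheck`, `etagOf` and `matches` are assumptions made for this example; any stable identifier of the content works as an ETag, and your storage may already supply one. The comma split is a simplification that assumes the tags themselves contain no commas.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for ETag / If-None-Match handling. */
final class EtagCheck {

    /** Builds an ETag as the quoted, hex-encoded SHA-256 hash of the content. */
    static String etagOf(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder tag = new StringBuilder("\"");
            for (byte b : digest) {
                tag.append(String.format("%02x", b));
            }
            return tag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the If-None-Match header lists the current ETag, or "*". */
    static boolean matches(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null) {
            return false; // no validator sent: serve the content
        }
        for (String candidate : ifNoneMatch.split(",")) {
            String tag = candidate.trim();
            if (tag.startsWith("W/")) {
                tag = tag.substring(2); // If-None-Match compares weakly
            }
            if (tag.equals("*") || tag.equals(currentEtag)) {
                return true; // answer 304 Not Modified
            }
        }
        return false;
    }
}
```

When `matches` returns true, the application answers with a 304; otherwise it serves the content together with the current value in the ETag response header.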

To see the source code for the above in action with JOSS, check out the If-Modified-Since demo on GitHub. If you have no OpenStack Object Storage account, be sure to uncomment the ClientMock lines in nl.tweeenveertig.openstack.tutorial.StorageProvider and comment out the ClientImpl lines above them. Note that you will not see the public URL image, because there is no public URL in mock mode.

account = new ClientImpl().authenticate(
    tenant, username, password, auth_url);
// account = new ClientMock().allowEveryone().authenticate(
//    tenant, username, password, auth_url);

If you want to know more about JOSS, be sure to read the high-level description and the tutorial as well.

11 October 2012

