
Use Server Cache Control to Improve Performance

Summary: Configure your Apache server for more efficient caching to save bandwidth and improve web site performance. A web cache reduces latency and improves web site response times.

Caching is the temporary storage of frequently accessed data in higher speed media (typically SRAM or RAM) for more efficient retrieval. Web caching stores frequently used objects closer to the client through browser, proxy, or server caches. By storing "fresh" objects closer to your users, you avoid round trips to the origin server, reducing bandwidth consumption, server load, and most importantly, latency. This article shows how to configure your Apache server for more efficient caching to save bandwidth and improve performance.

Caching is not just for static sites; even dynamic sites can benefit from caching. Graphics and multimedia typically don't change as frequently as (X)HTML files. Graphics that seldom change, like logos, headers, and navigation, can be given longer expiration times, while resources that change more frequently, like XHTML and XML files, can be given shorter expiration times. By designing your site with caching in mind, you can target different classes of resources to give them different expiration times with only a few lines of code.

Three Ways to Cache In

There are three ways to set cache control rules for your web site.

  1. Via <meta> tags (<meta http-equiv="Expires"...>)
  2. Programmatically by setting HTTP headers (CGI scripts etc.)
  3. Through web server configuration files (httpd.conf)

This article addresses the third method: cache control through server configuration files. The first method works with browsers, but most intermediate proxy servers don't parse HTML files; they look for HTTP headers to set caching policy. The second method, programmatically setting cache control headers (Expires and Cache-Control, for example), is useful for CGI scripts that output dynamic data.
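To illustrate the second method, a CGI script simply prints its caching headers before the body. The Python sketch below is hypothetical: the function name `cache_headers` and the 300-second lifetime are illustrative choices, not part of the original setup. It emits both a relative lifetime (Cache-Control max-age) and an absolute date (Expires).

```python
import time
from email.utils import formatdate


def cache_headers(max_age, now=None):
    """Return CGI header lines marking the response fresh for max_age seconds."""
    if now is None:
        now = time.time()
    return [
        "Content-Type: text/html",
        # Relative lifetime in seconds, understood by HTTP/1.1 caches
        f"Cache-Control: max-age={max_age}",
        # Absolute expiry date in GMT, for HTTP/1.0 caches
        "Expires: " + formatdate(now + max_age, usegmt=True),
    ]


if __name__ == "__main__":
    # A CGI response is the header lines, a blank line, then the body.
    print("\n".join(cache_headers(300)))
    print()
    print("<p>Generated page</p>")
```

Sending both headers mirrors what mod_expires emits, as the l.gif response headers later in this article show.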

Cache Freshness Guaranteed

In order to cache web objects, browsers and proxy servers upstream from origin servers must be able to calculate "freshness lifetimes": how long after a previous access or modification an object can still be served from the cache. HTTP does this digital melon squeezing primarily through brief HTTP header conversations between client, proxy, and origin servers that determine whether it is OK to reuse a cached object or reload the resource to get a fresh one. Here's an example request/response sequence for our logo image, l.gif. The browser sends these request headers:

Host: www.websiteoptimization.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1
Accept: image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.websiteoptimization.com/

Our server responds as follows:

HTTP/1.1 200 OK
Date: Mon, 25 Oct 2004 11:55:45 GMT
Server: Apache/1.3.31
Cache-Control: max-age=2592000
Expires: Wed, 24 Nov 2004 11:55:45 GMT
Last-Modified: Sat, 19 Jun 2004 15:25:10 GMT
ETag: "7b80d9-891-40d45ad6"
Accept-Ranges: bytes
Content-Length: 2193
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Content-Type: image/gif

This image was last modified on June 19 and, according to its Cache-Control and Expires headers, stays fresh for 30 days from this access. The server is signaling that this object does not change frequently and can safely be cached for up to a month. While a cached copy is still fresh, a browser or proxy can serve it without contacting the origin server. Once it goes stale, the cache must revalidate it with the origin server, which either confirms the cached copy with a 304 Not Modified response (using the Last-Modified or ETag validators) or sends a fresh copy.
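The freshness decision can be sketched in a few lines of Python. This is a simplified model of the HTTP/1.1 rule (a Cache-Control max-age value takes precedence; otherwise the lifetime is Expires minus Date), and it ignores the Age header, s-maxage, and heuristic freshness. The function names are hypothetical, and headers are assumed to be a plain dict with canonical capitalization.

```python
import re
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return the freshness lifetime in seconds, or None if none is given.

    Simplified rule: max-age wins over Expires; otherwise Expires minus Date.
    Assumes headers is a dict keyed by canonically capitalized names.
    """
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match:
        return int(match.group(1))
    if "Expires" in headers and "Date" in headers:
        delta = (parsedate_to_datetime(headers["Expires"])
                 - parsedate_to_datetime(headers["Date"]))
        return max(0, int(delta.total_seconds()))
    return None


def is_fresh(headers, seconds_in_cache):
    """A cached copy may be reused without asking the server while it is fresh."""
    lifetime = freshness_lifetime(headers)
    return lifetime is not None and seconds_in_cache < lifetime


# Response headers for l.gif from the example above
logo = {
    "Date": "Mon, 25 Oct 2004 11:55:45 GMT",
    "Cache-Control": "max-age=2592000",
    "Expires": "Wed, 24 Nov 2004 11:55:45 GMT",
}
print(freshness_lifetime(logo) // 86400)  # 30
print(is_fresh(logo, 10 * 86400))         # True
print(is_fresh(logo, 31 * 86400))         # False
```

Both headers agree here: the Expires date is exactly 2592000 seconds after the Date header, so a cache that understands only one of them reaches the same 30-day answer.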

Cache Control with mod_expires and mod_headers

For Apache, mod_expires and mod_headers handle cache control through HTTP headers sent from the server. Because they are not always compiled in or enabled by default, you may need to have your server administrator install them for you. For Apache/1.3x, enable the expires and headers modules by adding the following lines to your httpd.conf configuration file:

LoadModule expires_module     libexec/mod_expires.so
LoadModule headers_module     libexec/mod_headers.so

AddModule mod_expires.c
AddModule mod_headers.c
...
AddModule mod_gzip.c

Note that load order matters in Apache/1.3x: mod_gzip must load last, after all other modules.

For Apache/2.0, enable the modules in your httpd.conf file like this.

LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so
LoadModule deflate_module modules/mod_deflate.so

mod_deflate is the native compression module in Apache/2.0 (although mod_gzip does a better job of handling wayward browsers). In this case, the load order does not matter, as Apache/2.0 handles this for you.

Target Files by Extension for Caching

One quick way to enable cache control headers for existing sites is to target files by extension. Although this method has some disadvantages (notably the requirement of file extensions), it has the virtue of simplicity. To turn on mod_expires, set ExpiresActive to On.

ExpiresActive On

Next, target your website's root HTML directory to enable caching for your site in one fell swoop.

<Directory "/home/website/public_html">
    Options FollowSymLinks MultiViews
    AllowOverride All
    Order allow,deny
    Allow from all
    ExpiresDefault A300
    <FilesMatch "\.html$">
        # HTML pages: 1 day after access
        ExpiresDefault A86400
    </FilesMatch>
    <FilesMatch "\.(gif|jpg|png|js|css)$">
        # Images, JavaScript and CSS: 30 days after access
        ExpiresDefault A2592000
    </FilesMatch>
</Directory>

ExpiresDefault A300 sets the default expiry time to 300 seconds after access (A). Using M300 instead would set the expiry time to 300 seconds after the file's last modification. The first FilesMatch section sets the expiry time for all .html files to 86400 seconds (1 day), and the second sets it for all images, external JavaScript, and CSS files to 2592000 seconds (30 days). mod_expires sends each expiry time both as an Expires header and as a Cache-Control max-age value.
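The arithmetic behind the A and M codes can be modeled in Python. This is a hypothetical sketch of the calculation only, not mod_expires' own implementation; `expires_for` is an invented name, and the access and modification times are taken from the l.gif response shown earlier.

```python
from email.utils import formatdate, parsedate_to_datetime


def expires_for(code, access_time, modified_time):
    """Model of a mod_expires "A<seconds>" or "M<seconds>" code.

    A counts from the client's access (request) time, M from the file's
    last-modification time. Times are Unix timestamps; returns an HTTP date.
    """
    base, seconds = code[0].upper(), int(code[1:])
    if base == "A":
        start = access_time
    elif base == "M":
        start = modified_time
    else:
        raise ValueError("code must start with A or M: " + code)
    return formatdate(start + seconds, usegmt=True)


# Times from the l.gif response headers shown earlier
accessed = parsedate_to_datetime("Mon, 25 Oct 2004 11:55:45 GMT").timestamp()
modified = parsedate_to_datetime("Sat, 19 Jun 2004 15:25:10 GMT").timestamp()

print(expires_for("A2592000", accessed, modified))  # Wed, 24 Nov 2004 11:55:45 GMT
print(expires_for("M2592000", accessed, modified))  # Mon, 19 Jul 2004 15:25:10 GMT
```

The A result matches the Expires header the server actually sent. With the M base, the June modification date puts the expiry in July, before this October request, so caches would treat the logo as already stale.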

Note that you can target your files with more granularity using multiple directory sections, like this:

<Directory "/home/website/public_html/images/logos/">
    # ExpiresDefault / ExpiresByType rules for this directory go here
</Directory>

For truly dynamic content, you can prevent caching by setting a maximum age of zero seconds and telling caches not to store the resource anywhere (this uses mod_headers):

<Directory "/home/website/cgi-bin/">
    Header Set Cache-Control "max-age=0, no-store"
</Directory>

Target Files by MIME Type

The disadvantage of the above method is its reliance on file extensions. In some cases webmasters elect to use extensionless URLs for portability and performance (see Rewrite URLs with Content Negotiation). A better method is to use the ExpiresByType directive of the mod_expires module. As the name implies, ExpiresByType targets resources for caching by MIME type, like this:

ExpiresActive On
ExpiresDefault "access plus 300 seconds"

<Directory "/home/website/public_html">
    Options FollowSymLinks MultiViews
    AllowOverride All
    Order allow,deny
    Allow from all
    ExpiresByType text/html "access plus 1 day"
    ExpiresByType text/css "access plus 1 day"
    # Match the MIME type your server actually sends for .js files
    # (older mime.types files map .js to application/x-javascript)
    ExpiresByType text/javascript "access plus 1 day"
    ExpiresByType image/gif "access plus 1 month"
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType image/png "access plus 1 month"
    ExpiresByType application/x-shockwave-flash "access plus 1 day"
</Directory>

This httpd.conf code sets similar parameters in a more flexible and readable way, although here CSS and JavaScript files expire one day after access rather than 30. For expiry directives you can use access or modification as the base time, depending on whether you want to start counting from the client's request or from the file's last modification. For WebSiteOptimization.com, I chose short access offsets for text files likely to change, and longer access offsets for infrequently changing images.
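The "access plus 1 month" syntax is shorthand for the same base-plus-seconds codes used earlier. The Python sketch below converts one form into the other; `to_short_code` is a hypothetical helper, and it assumes mod_expires counts a month as 30 days and a year as 365 days (consistent with the max-age=2592000 that "access plus 1 month" produces in the headers later in this article).

```python
# Seconds per unit (assumed: 30-day month, 365-day year).
UNITS = {
    "year": 365 * 86400,
    "month": 30 * 86400,
    "week": 7 * 86400,
    "day": 86400,
    "hour": 3600,
    "minute": 60,
    "second": 1,
}

# "now" is a synonym for "access"; "modification" counts from the file's mtime.
BASES = {"access": "A", "now": "A", "modification": "M"}


def to_short_code(spec):
    """Convert e.g. "access plus 1 month" into the equivalent code "A2592000"."""
    words = spec.lower().split()
    base = BASES[words[0]]
    rest = words[1:]
    if rest and rest[0] == "plus":  # the "plus" keyword is optional
        rest = rest[1:]
    if not rest or len(rest) % 2:
        raise ValueError("expected <number> <unit> pairs: " + spec)
    total = 0
    for num, unit in zip(rest[::2], rest[1::2]):
        total += int(num) * UNITS[unit.rstrip("s")]  # "days" -> "day"
    return f"{base}{total}"


print(to_short_code("access plus 1 month"))              # A2592000
print(to_short_code("access plus 300 seconds"))          # A300
print(to_short_code("modification plus 1 week 2 days"))  # M777600
```

Several number/unit pairs may be combined in one interval, which is why the loop sums them.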

Note the AllowOverride All directive. It lets webmasters override these settings with .htaccess files, for example for directory-based authentication and redirection. However, enabling overrides costs some performance, because on every request Apache must check each directory along the path to the requested file for an .htaccess file.
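To make that cost concrete, the following Python sketch lists the .htaccess files Apache would look for when serving one file. It is a simplified model under a stated assumption: overrides are enabled only at the document root and below (as with AllowOverride All in the Directory block above, with the parent directories set to AllowOverride None). The function name `htaccess_candidates` is hypothetical.

```python
import posixpath


def htaccess_candidates(file_path, override_root):
    """List the .htaccess paths checked when serving file_path.

    Simplified model (assumption): overrides apply from override_root down,
    so each directory from override_root to the file's directory is checked.
    """
    root = posixpath.normpath(override_root)
    directory = posixpath.dirname(posixpath.normpath(file_path))
    if directory != root and not directory.startswith(root + "/"):
        return []  # file lies outside the overridable tree
    dirs = [root]
    for part in posixpath.relpath(directory, root).split("/"):
        if part != ".":  # relpath returns "." when directory == root
            dirs.append(posixpath.join(dirs[-1], part))
    return [posixpath.join(d, ".htaccess") for d in dirs]


for path in htaccess_candidates(
        "/home/website/public_html/images/logos/l.gif",
        "/home/website/public_html"):
    print(path)
```

A logo two levels below the document root costs three filesystem lookups per request under this model, even if none of the .htaccess files exist.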

HTTP Header Results

For our Apache/1.3x server, the httpd.conf file comes with cache-control disabled. Let's look at the headers for the WebSiteOptimization.com home page and embedded logo (l.gif) before we update the httpd.conf configuration file.

HTTP/1.1 200 OK
Date: Sat, 23 Oct 2004 23:15:38 GMT
Server: Apache/1.3.31 (Unix) mod_gzip/1.3.26.1a mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.9 FrontPage/5.0.2.2634a mod_ssl/2.8.20 OpenSSL/0.9.7a
Connection: close
Content-Type: text/html
Content-Encoding: gzip
Content-Length: 4326

HTTP/1.1 200 OK
Date: Sat, 23 Oct 2004 23:14:13 GMT
Server: Apache/1.3.31 (Unix) mod_gzip/1.3.26.1a mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.9 FrontPage/5.0.2.2634a mod_ssl/2.8.20 OpenSSL/0.9.7a
Last-Modified: Sat, 19 Jun 2004 15:25:21 GMT
ETag: "7b80da-4f2-40d45ae1"
Accept-Ranges: bytes
Content-Length: 1266
Connection: close
Content-Type: image/gif

After updating the httpd.conf file with the above MIME-based code, we restart the HTTP daemon using this command:

service httpd restart

The headers for our home page and logo now look like this.

HTTP/1.1 200 OK
Date: Sat, 23 Oct 2004 23:17:52 GMT
Server: Apache/1.3.31
Cache-Control: max-age=86400
Expires: Sun, 24 Oct 2004 23:17:52 GMT
Connection: close
Content-Type: text/html
Content-Encoding: gzip
Content-Length: 4326

HTTP/1.1 200 OK
Date: Sat, 23 Oct 2004 23:18:54 GMT
Server: Apache/1.3.31
Cache-Control: max-age=2592000
Expires: Mon, 22 Nov 2004 23:18:54 GMT
Last-Modified: Sat, 19 Jun 2004 15:25:21 GMT
ETag: "7b80da-4f2-40d45ae1"
Accept-Ranges: bytes
Content-Length: 1266
Connection: close
Content-Type: image/gif

Both resources now have cache-control headers. Note that the Server field has also been stripped down, using the ServerTokens directive:

ServerTokens Min

This shortens the Server response header from:

Server: Apache/1.3.31 (Unix) mod_gzip/1.3.26.1a mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.8 FrontPage/5.0.2.2634a mod_ssl/2.8.19 OpenSSL/0.9.7a

to

Server: Apache/1.3.31

Our images are now cacheable for 30 days. However, the HTML file has no Last-Modified header, because we use conditional server-side includes (SSI) to merge in different CSS for different browsers, saving an HTTP request. We'll address the cacheability of SSI pages in a future tweak.
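To spot pages like this, a quick audit can report which freshness headers (Cache-Control, Expires) and validators (Last-Modified, ETag) a response carries. The Python sketch below is hypothetical tooling, not something used in the original setup; `parse_headers` and `caching_report` are invented names, and the input is the home page's post-update header dump from above.

```python
def parse_headers(raw):
    """Parse a raw HTTP response header block into (status line, dict)."""
    lines = [ln for ln in raw.strip().splitlines() if ln.strip()]
    status, headers = lines[0], {}
    for line in lines[1:]:
        # Split on the first colon only; values like dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def caching_report(raw):
    """Report which freshness and validator headers a response carries."""
    _, h = parse_headers(raw)
    return {
        "freshness": [n for n in ("cache-control", "expires") if n in h],
        "validators": [n for n in ("last-modified", "etag") if n in h],
    }


home_page = """HTTP/1.1 200 OK
Date: Sat, 23 Oct 2004 23:17:52 GMT
Server: Apache/1.3.31
Cache-Control: max-age=86400
Expires: Sun, 24 Oct 2004 23:17:52 GMT
Connection: close
Content-Type: text/html
Content-Encoding: gzip
Content-Length: 4326"""

print(caching_report(home_page))
# {'freshness': ['cache-control', 'expires'], 'validators': []}
```

An empty validators list means that once the page expires, a browser cannot revalidate it with a conditional request and must download the whole page again.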

Warning: Pragma no-cache Deprecated

According to Stephen Pierzchala of Gomez, you should avoid sending the deprecated Pragma: no-cache header in server responses. A directive like the following is therefore invalid:

Header Set Pragma "no-cache"

As Pierzchala explains, "I see this a lot in server responses. In the HTTP specs, the Pragma header is a deprecated, client-side, HTTP/1.0 request header."

Conclusion

Server cache control can improve your site's performance while reducing bandwidth bills. By caching objects that change infrequently for longer periods, and caching frequently updated content for shorter periods (or not at all), you can speed up perceived load times while keeping content fresh.

About the Author

Andy King is the founder of five developer-related sites, and the author of Speed Up Your Site: Web Site Optimization (http://www.speedupyoursite.com) from New Riders Publishing. He publishes the monthly Bandwidth Report, the weekly Optimization Week, and the weekly Speed Tweak of the Week.

Further Reading

CacheRight
A Microsoft IIS ISAPI filter that provides similar functionality to mod_expires, with similar commands. From Port80 Software.
Caching Tutorial for Web Authors and Webmasters
An introduction to web caching by Mark Nottingham.
Effective Website Acceleration
How-to article on speeding up web sites includes section on cache control, by Thomas Powell and Joe Lima.
HTTP/1.1: Header Field Definitions
Field definitions from the official HTTP/1.1 specification (RFC 2616), hosted by the W3C.
Microsoft® IIS Leads the Corporate Web Server Market and the First Survey of Cache Control Among Fortune 1000 Web Sites
In a cache control usage survey, Port80 Software found that 21.1% of the Fortune 1000 use explicit cache control policies on their web sites, saving an estimated 28.1% of cacheable requests for repeat visitors. Nov. 23, 2004
mod_expires module documentation
From Apache.
Rewrite URLs with Content Negotiation
Content negotiation can make your URLs shorter and more abstract. By rewriting URLs without file extensions to the right resources you can save bytes and migration headaches. From Speed Tweak of the Week.
WebCaching.org
A collection of web caching resources from Stephen Pierzchala, including Caching for Performance (PDF).
Web Caching
The definitive book on Web caching. Duane Wessels, Nathan Torkington (ed.). O'Reilly and Associates, 2001.

By website optimization on 23 Oct 2004 AM

Copyright © 2002-2013 Website Optimization, LLC. All Rights Reserved
Last modified: August 26, 2013.
