Caching is the temporary storage of frequently accessed data in higher-speed media (typically SRAM or RAM) for more efficient retrieval. Web caching stores frequently used objects closer to the client through browser, proxy, or server caches. By storing “fresh” objects closer to your users, you avoid round trips to the origin server, reducing bandwidth consumption, server load, and most importantly, latency. This article shows how to configure your Apache server for more efficient caching to save bandwidth and improve performance.
Caching is not just for static sites; even dynamic sites can benefit from caching. Graphics and multimedia typically don’t change as frequently as (X)HTML files. Graphics that seldom change, like logos, headers, and navigation, can be given longer expiration times, while resources that change more frequently, like XHTML and XML files, can be given shorter expiration times. By designing your site with caching in mind, you can target different classes of resources to give them different expiration times with only a few lines of code.
Three Ways to Cache In
There are three ways to set cache control rules for your web site.
- Through HTML meta tags (http-equiv)
- Programmatically by setting HTTP headers (CGI scripts etc.)
- Through web server configuration files (httpd.conf)
This article addresses the third method of cache control: server configuration files. The first method works with browsers, but most intermediate proxy servers don’t parse HTML files; they look for HTTP headers to set caching policy. The second method, programmatically setting cache control headers (Cache-Control, for example), is useful for CGI scripts that output dynamic data.
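As a sketch of the second method, here is a minimal CGI script in Python that sets its own Cache-Control and Expires headers. The cache_headers helper and the 300-second lifetime are illustrative assumptions, not taken from this site’s configuration.

```python
#!/usr/bin/env python3
import time
from email.utils import formatdate


def cache_headers(max_age, now=None):
    """Header lines asking browsers and proxies to cache for max_age seconds."""
    if now is None:
        now = time.time()
    return [
        f"Cache-Control: max-age={max_age}",
        # Expires is the HTTP/1.0 equivalent, expressed as an absolute GMT date.
        "Expires: " + formatdate(now + max_age, usegmt=True),
    ]


def main():
    # A CGI script prints its headers, a blank line, then the body.
    for line in cache_headers(300):
        print(line)
    print("Content-Type: text/html")
    print()
    print("<p>Generated at", time.strftime("%H:%M:%S"), "</p>")


if __name__ == "__main__":
    main()
```

Sending both headers covers HTTP/1.0 caches, which only understand Expires, as well as HTTP/1.1 caches, which give max-age precedence.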
Cache Freshness Guaranteed
To cache web objects, browsers and proxy servers downstream from origin servers must be able to calculate “freshness lifetimes”: how long after an object was last accessed or modified it is still OK to serve it from the cache. HTTP does this digital melon squeezing primarily through brief HTTP header conversations between client, proxy, and origin servers to determine whether it is OK to reuse a cached object or reload the resource to get a fresh one. Here’s an example request/response sequence for our logo image. The browser sends these request headers:
Host: www.websiteoptimization.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1
Accept: image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: https://www.websiteoptimization.com/
Our server responds as follows:
HTTP/1.1 200 OK
Date: Mon, 25 Oct 2004 11:55:45 GMT
Server: Apache/1.3.31
Cache-Control: max-age=2592000
Expires: Wed, 24 Nov 2004 11:55:45 GMT
Last-Modified: Sat, 19 Jun 2004 15:25:10 GMT
ETag: "7b80d9-891-40d45ad6"
Accept-Ranges: bytes
Content-Length: 2193
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Content-Type: image/gif
This image was last modified on June 19 and is fresh for 30 days from the last access. It is clear from these response headers that this object does not change frequently and can be safely cached for up to a month. After a client queries a proxy or origin server for a specific object, if that object is validated as still fresh, it is returned from the cache. If not, the object is reloaded from the origin server to grab a fresh copy.
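The freshness decision can be sketched in a few lines of Python. This simplified version follows the HTTP/1.1 rule that max-age takes precedence over Expires, falls back to Expires minus Date, and treats a response as fresh while its age is below that lifetime. It ignores heuristic freshness, the Age header, and s-maxage. The header values are the ones from the logo response above; the function names are hypothetical.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Seconds a cached response stays fresh (simplified HTTP/1.1 rules)."""
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)  # max-age takes precedence over Expires
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return 0  # no explicit lifetime; real caches may apply heuristics instead


def is_fresh(headers, age_seconds):
    """A stored response may be reused while its age is below its lifetime."""
    return freshness_lifetime(headers) > age_seconds


logo = {
    "Date": "Mon, 25 Oct 2004 11:55:45 GMT",
    "Cache-Control": "max-age=2592000",
    "Expires": "Wed, 24 Nov 2004 11:55:45 GMT",
    "Last-Modified": "Sat, 19 Jun 2004 15:25:10 GMT",
}

print(freshness_lifetime(logo) // 86400)  # 30 days
print(is_fresh(logo, age_seconds=86400))  # one day old: still fresh
```

Because the server sends both headers, a cache that ignores max-age and uses Expires minus Date arrives at the same 30-day lifetime.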
Cache Control with mod_expires and mod_headers
For Apache, mod_expires and mod_headers handle cache control through HTTP headers sent from the server. Since they are not installed by default, have your server administrator install them for you. For Apache/1.3x, enable the expires and headers modules by adding the following lines to your httpd.conf configuration file.
LoadModule expires_module libexec/mod_expires.so
LoadModule headers_module libexec/mod_headers.so
AddModule mod_expires.c
AddModule mod_headers.c
...
AddModule mod_gzip.c
Note that load order matters in Apache/1.3x: mod_gzip must load last, after all other modules.
For Apache/2.0, enable the modules in your httpd.conf file like this.
LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so
LoadModule deflate_module modules/mod_deflate.so
mod_deflate is the native compression module in Apache/2.0 (although mod_gzip does a better job of handling wayward browsers). In this case, the load order does not matter, as Apache/2.0 handles this for you.
Target Files by Extension for Caching
One quick way to enable cache control headers for existing sites is to target files by extension. Although this method has some disadvantages (notably that it requires file extensions), it has the virtue of simplicity. To turn on mod_expires, set ExpiresActive to On.
Next, target your website’s root HTML directory to enable caching for your whole site in one fell swoop.
<Directory "/home/website/public_html">
    Options FollowSymLinks MultiViews
    AllowOverride All
    Order allow,deny
    Allow from all
    ExpiresActive On
    ExpiresDefault A300
    <FilesMatch "\.html$">
        ExpiresDefault A86400
    </FilesMatch>
    <FilesMatch "\.(gif|jpg|png|js|css)$">
        ExpiresDefault A2592000
    </FilesMatch>
</Directory>
ExpiresDefault A300 sets the default expiry time to 300 seconds after access (A). Using M300 would set the expiry time to 300 seconds after file modification. The first FilesMatch section sets the cache-control header for all .html files to 86400 seconds (1 day). The second sets .gif, .jpg, .png, .js, and .css files to 2592000 seconds (30 days).
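The rules above can be modeled in a few lines of Python to show which expiry code applies to a given file and how the A and M bases differ. This is an illustrative sketch, not how mod_expires is implemented. The patterns and codes come from the Directory example, the dates come from the logo response earlier, and the last-match-wins behavior is assumed to mirror Apache’s merging of FilesMatch sections in the order they appear.

```python
import re
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Patterns and expiry codes from the <Directory> example. In this sketch the
# last matching rule wins, mirroring how Apache merges <FilesMatch> sections
# in the order they appear in httpd.conf.
RULES = [
    (re.compile(r"\.html$"), "A86400"),
    (re.compile(r"\.(gif|jpg|png|js|css)$"), "A2592000"),
]
DEFAULT_SPEC = "A300"  # ExpiresDefault A300


def expiry_spec(filename):
    """Return the expiry code that would apply to filename."""
    spec = DEFAULT_SPEC
    for pattern, rule in RULES:
        if pattern.search(filename):
            spec = rule
    return spec


def expires_at(spec, access_time, modified_time):
    """Apply an A (access) or M (modification) code: base time plus N seconds."""
    base = {"A": access_time, "M": modified_time}[spec[0]]
    return base + timedelta(seconds=int(spec[1:]))


# Times from the logo example earlier in the article.
access = parsedate_to_datetime("Mon, 25 Oct 2004 11:55:45 GMT")
modified = parsedate_to_datetime("Sat, 19 Jun 2004 15:25:10 GMT")

print(expiry_spec("l.gif"))                  # A2592000
print(expires_at("A300", access, modified))  # five minutes after this access
print(expires_at("M300", access, modified))  # already in the past
```

An extensionless name falls through to the 300-second default. M300 on a file last modified in June produces an expiry time that has already passed, which is why short modification-based offsets suit only files that change often.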
Note that you can target your files with more granularity using multiple Directory sections. For example, for truly dynamic content you can force resources not to be cached by setting a maximum age of zero seconds and telling caches not to store the resource anywhere:
<Directory "/home/website/cgi-bin/">
    Header Set Cache-Control "max-age=0, no-store"
</Directory>
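To see what a cache makes of that header, here is a simplified Python parser for Cache-Control values. It is an illustrative sketch with hypothetical function names, and it does not handle commas inside quoted arguments. A no-store directive means a cache may not keep any copy, and max-age=0 means any copy would be stale immediately.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument or None}.

    Simplified: commas inside quoted arguments are not handled.
    """
    directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def may_store(value):
    """A cache must not keep any copy of a response marked no-store."""
    return "no-store" not in parse_cache_control(value)


def max_age_seconds(value):
    """Return max-age as an int, or None if absent or malformed."""
    arg = parse_cache_control(value).get("max-age")
    return int(arg) if arg is not None and arg.isdigit() else None


cgi = "max-age=0, no-store"
print(parse_cache_control(cgi))  # {'max-age': '0', 'no-store': None}
print(may_store(cgi), max_age_seconds(cgi))
```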
Target Files by MIME Type
The disadvantage of the above method is its reliance on file extensions. In some cases webmasters elect to use extensionless URLs for portability and performance (see Rewrite URLs with Content Negotiation). A better method is to use the ExpiresByType directive of the mod_expires module. As the name implies, ExpiresByType targets resources for caching by MIME type. For example, ExpiresByType text/html A86400 gives HTML pages a one-day lifetime, and ExpiresByType image/gif A2592000 gives GIF images a 30-day lifetime.
Applied to each MIME type your site serves, ExpiresByType sets the same parameters as the extension-based example, only in a more flexible and readable way. For expiry codes you can use either access (A) or modification (M) as the base time, depending on whether you want to start counting from the last time the file was accessed or the last time it was modified. In our case for WebSiteOptimization.com, I chose to use short access offsets for text files likely to change, and longer access offsets for infrequently changing images.
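As a rough model of how an ExpiresByType lookup picks a lifetime, the Python sketch below looks up a response’s MIME type in a table and falls back to a default. The table is hypothetical. It reuses this article’s lifetimes (one day for HTML, 30 days for images, CSS, and JavaScript, and 300 seconds otherwise), and the JavaScript MIME type in particular is an assumption. Stripping charset parameters and clamping stale lifetimes at zero are choices made for this sketch, not documented mod_expires behavior.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Hypothetical ExpiresByType table using this article's lifetimes. The
# JavaScript MIME type is an assumption; servers differ on which they send.
EXPIRES_BY_TYPE = {
    "text/html": "A86400",
    "image/gif": "A2592000",
    "image/jpeg": "A2592000",
    "image/png": "A2592000",
    "text/css": "A2592000",
    "application/x-javascript": "A2592000",
}
EXPIRES_DEFAULT = "A300"


def expiry_headers(content_type, access_time, modified_time):
    """Compute Cache-Control and Expires from a MIME-type lookup."""
    # Drop parameters such as "; charset=iso-8859-1" before the lookup.
    mime = content_type.split(";")[0].strip().lower()
    spec = EXPIRES_BY_TYPE.get(mime, EXPIRES_DEFAULT)
    base = access_time if spec[0] == "A" else modified_time
    expires = base + timedelta(seconds=int(spec[1:]))
    # This sketch clamps stale modification-based lifetimes at zero.
    max_age = max(0, int((expires - access_time).total_seconds()))
    return {
        "Cache-Control": f"max-age={max_age}",
        "Expires": format_datetime(expires, usegmt=True),
    }


page_time = datetime(2004, 10, 23, 23, 17, 52, tzinfo=timezone.utc)
print(expiry_headers("text/html", page_time, page_time))
```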
Note the AllowOverride All directive in the Directory section above. It lets webmasters override these settings with .htaccess files, for example for directory-based authentication and redirection. However, allowing overrides costs performance, because Apache must look for .htaccess files in each directory along the path of every request.
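To see why overrides cost performance, the Python sketch below lists the .htaccess files Apache looks for when serving a single file. It assumes, as is common, that AllowOverride is None above the document root and enabled from the document root down. The a/b/c request path is hypothetical. Apache repeats these checks on every request rather than caching the results.

```python
from pathlib import PurePosixPath


def htaccess_lookups(override_root, requested_file):
    """List the .htaccess files checked for one request.

    Assumes AllowOverride is enabled at override_root and None above it.
    """
    root = PurePosixPath(override_root)
    # Raises ValueError if the file lies outside override_root.
    relative = PurePosixPath(requested_file).parent.relative_to(root)
    checked = [root / ".htaccess"]
    current = root
    for part in relative.parts:
        current = current / part
        checked.append(current / ".htaccess")
    return [str(path) for path in checked]


# Hypothetical request three directories below the document root.
for path in htaccess_lookups("/home/website/public_html",
                             "/home/website/public_html/a/b/c/page.html"):
    print(path)
```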
HTTP Header Results
For our Apache/1.3x server, the httpd.conf file comes with cache-control disabled. Let’s look at the headers for the WebSiteOptimization.com home page and embedded logo (l.gif) before we update the httpd.conf configuration file.
HTTP/1.1 200 OK
Date: Sat, 23 Oct 2004 23:15:38 GMT
Server: Apache/1.3.31 (Unix) mod_gzip/1.3.26.1a mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.9 FrontPage/5.0.2.2634a mod_ssl/2.8.20 OpenSSL/0.9.7a
Connection: close
Content-Type: text/html
Content-Encoding: gzip
Content-Length: 4326

HTTP/1.1 200 OK
Date: Sat, 23 Oct 2004 23:14:13 GMT
Server: Apache/1.3.31 (Unix) mod_gzip/1.3.26.1a mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.9 FrontPage/5.0.2.2634a mod_ssl/2.8.20 OpenSSL/0.9.7a
Last-Modified: Sat, 19 Jun 2004 15:25:21 GMT
ETag: "7b80da-4f2-40d45ae1"
Accept-Ranges: bytes
Content-Length: 1266
Connection: close
Content-Type: image/gif
After updating the httpd.conf file with the above MIME-based code, we restart the HTTP daemon using this command:
service httpd restart
The headers for our home page and logo now look like this.
HTTP/1.1 200 OK
Date: Sat, 23 Oct 2004 23:17:52 GMT
Server: Apache/1.3.31
Cache-Control: max-age=86400
Expires: Sun, 24 Oct 2004 23:17:52 GMT
Connection: close
Content-Type: text/html
Content-Encoding: gzip
Content-Length: 4326

HTTP/1.1 200 OK
Date: Sat, 23 Oct 2004 23:18:54 GMT
Server: Apache/1.3.31
Cache-Control: max-age=2592000
Expires: Mon, 22 Nov 2004 23:18:54 GMT
Last-Modified: Sat, 19 Jun 2004 15:25:21 GMT
ETag: "7b80da-4f2-40d45ae1"
Accept-Ranges: bytes
Content-Length: 1266
Connection: close
Content-Type: image/gif
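One way to check results like these is to confirm that each Expires date lies exactly max-age seconds after the Date header. The small Python sketch below parses the home page response shown above, supplied as a string so that no network access is assumed, and compares the two values. The helper names are hypothetical.

```python
from email.utils import parsedate_to_datetime

HOME_PAGE = """\
HTTP/1.1 200 OK
Date: Sat, 23 Oct 2004 23:17:52 GMT
Server: Apache/1.3.31
Cache-Control: max-age=86400
Expires: Sun, 24 Oct 2004 23:17:52 GMT
Connection: close
Content-Type: text/html
Content-Encoding: gzip
Content-Length: 4326
"""


def parse_response(raw):
    """Split a raw response head into its status line and a header dict."""
    status, *lines = raw.strip().splitlines()
    headers = {}
    for line in lines:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip()] = value.strip()
    return status, headers


def max_age(headers):
    """Return the Cache-Control max-age in seconds, or None."""
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            return int(value)
    return None


def expires_offset(headers):
    """Seconds between the Date and Expires headers."""
    delta = parsedate_to_datetime(headers["Expires"]) - parsedate_to_datetime(headers["Date"])
    return int(delta.total_seconds())


status, headers = parse_response(HOME_PAGE)
print(status, max_age(headers), expires_offset(headers))
```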
Both resources now have cache-control headers. Note also that the Server field has been stripped down. This can be done with Apache’s ServerTokens directive; ServerTokens Minimal produces exactly the short form shown above. This minimizes the response header from:
Server: Apache/1.3.31 (Unix) mod_gzip/1.3.26.1a mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.8 FrontPage/5.0.2.2634a mod_ssl/2.8.19 OpenSSL/0.9.7a
to:
Server: Apache/1.3.31
Our images are now cacheable for 30 days. However, the HTML file does not have a Last-Modified header. This is because we use conditional server-side includes (SSI) to merge in different CSS for different browsers to save an HTTP request. We’ll address the cacheability of SSI pages in a future tweak.
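The missing Last-Modified header matters once a cached copy expires. A cache revalidates by sending the response’s validators back in If-Modified-Since or If-None-Match, and the server can then answer with a short 304 Not Modified. The hypothetical helper below, a Python sketch using the headers shown above, builds those conditional headers. For the home page, which carries neither Last-Modified nor ETag, it returns nothing, so an expired copy has to be fetched in full.

```python
def conditional_headers(cached):
    """Build revalidation headers from a stored response's validators (hypothetical helper)."""
    conditional = {}
    if "ETag" in cached:
        conditional["If-None-Match"] = cached["ETag"]
    if "Last-Modified" in cached:
        conditional["If-Modified-Since"] = cached["Last-Modified"]
    return conditional


logo = {
    "Cache-Control": "max-age=2592000",
    "Last-Modified": "Sat, 19 Jun 2004 15:25:21 GMT",
    "ETag": '"7b80da-4f2-40d45ae1"',
}
home_page = {"Cache-Control": "max-age=86400", "Content-Type": "text/html"}

print(conditional_headers(logo))
print(conditional_headers(home_page) or "no validators: refetch the full page")
```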
Warning: Pragma no-cache Deprecated
According to Stephen Pierzchala of Gomez, you should avoid using the deprecated Pragma no-cache header. The following directive produces an INVALID server response:
Header Set Pragma "no-cache"
“I see this a lot in server responses. In the HTTP specs, the Pragma header is a deprecated, client-side, HTTP/1.0 request header.”
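A quick way to catch this mistake is to scan response headers for Pragma. The Python function below is a hypothetical lint helper, not part of any tool mentioned here. Its suggested replacement reuses the Cache-Control value from the cgi-bin example earlier in this article.

```python
def pragma_warnings(response_headers):
    """Flag Pragma headers in a server response (hypothetical lint helper)."""
    warnings = []
    for name, value in response_headers.items():
        if name.strip().lower() == "pragma":
            warnings.append(
                f"response sends 'Pragma: {value}'; Pragma is an HTTP/1.0 request "
                "header, so use Cache-Control (for example 'max-age=0, no-store') instead"
            )
    return warnings


print(pragma_warnings({"Pragma": "no-cache", "Content-Type": "text/html"}))
print(pragma_warnings({"Cache-Control": "max-age=0, no-store"}))
```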
Server cache control can improve your site’s performance while reducing bandwidth bills. By caching objects that change infrequently for longer periods, and caching frequently updated content for shorter periods (or not at all), you can speed up perceived load times while keeping content fresh.
About the Author
Andy King is the founder of five developer-related sites, and the author of Speed Up Your Site: Web Site Optimization (http://www.speedupyoursite.com) from New Riders Publishing. He publishes the monthly Bandwidth Report, the weekly Optimization Week, and the weekly Speed Tweak of the Week.
Further Reading
- A Microsoft IIS ISAPI filter that provides similar functionality to mod_expires, with similar commands. From Port80 Software.
- Caching Tutorial for Web Authors and Webmasters. An introduction to web caching by Mark Nottingham.
- Effective Website Acceleration. How-to article on speeding up web sites that includes a section on cache control, by Thomas Powell and Joe Lima.
- HTTP/1.1: Header Field Definitions. Field definitions from the official HTTP/1.1 specification, from the W3C.
- Microsoft® IIS Leads the Corporate Web Server Market and the First Survey of Cache Control Among Fortune 1000 Web Sites. In a cache control usage survey, Port80 Software found that 21.1% of the Fortune 1000 use explicit cache control policies on their web sites, saving an estimated 28.1% of cacheable requests for repeat visitors. Nov. 23, 2004.
- mod_expires module documentation. From Apache.
- Rewrite URLs with Content Negotiation. Content negotiation can make your URLs shorter and more abstract. By rewriting URLs without file extensions to the right resources, you can save bytes and migration headaches. From Speed Tweak of the Week.
- A collection of web caching resources from Stephen Pierzchala, including Caching for Performance (PDF).
- Web Caching. The definitive book on Web caching. Duane Wessels, Nathan Torkington (ed.). O’Reilly and Associates, 2001.