Abbreviate URLs with mod_rewrite – url abbreviation uri mod rewrite

Summary:

We look behind the scenes at Yahoo and WebReference to see how short, abbreviated URLs and mod_rewrite can save space for maximum speed.

Called the “Swiss Army knife” of Apache modules, mod_rewrite can be used for everything from URL rewriting to load balancing. Where mod_rewrite and its ilk shine is in abbreviating and rewriting URLs.

One of the most effective optimization techniques available to web developers, URL abbreviation substitutes short URLs like “r/pg” for longer ones like “/programming” to save space. Apache, IIS, Manila, and Zope all support this technique. Yahoo.com, WebReference.com, and other popular sites use URL abbreviation to shave anywhere from 20% to 30% off of HTML file size. The more links you have, the more effective this technique is.

How mod_rewrite Works

As its name implies, mod_rewrite rewrites URLs using regular expression pattern matching. If a URL matches a pattern that you specify, mod_rewrite rewrites it according to the rule conditions that you set. mod_rewrite essentially works as a smart abbreviation expander. Let’s take our example above from WebReference.com. To expand “r/pg” into “/programming” Apache requires two directives: one turns on the rewriting engine (RewriteEngine On) and the other specifies the pattern-matching rewrite rule (RewriteRule). The RewriteRule syntax looks like this:

RewriteRule <pattern> <rewrite as>

Becomes:

RewriteEngine	On
RewriteRule ^/r/pg(.*)  /programming$1

This regular expression matches a URL that begins with /r/ (we chose this sequence to signify a redirect to expand) with “pg” following immediately afterward. The pattern (.*) matches zero or more characters after the “pg.” So when a request comes in for the URL <a href="/r/pg/perl/">Programming Perl</a>, the rewrite rule expands this abbreviated URI into <a href="/programming/perl/">Programming Perl</a>.
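To see what the pattern does without touching a server, the rule can be mimicked in a few lines of Python. This is only a sketch of the matching logic, not of Apache itself; the function name expand is made up for the illustration.

```python
import re

# Python stand-in for: RewriteRule ^/r/pg(.*) /programming$1
# (illustrative only; Apache's own rewrite engine is not involved)
PATTERN = re.compile(r"^/r/pg(.*)")

def expand(path):
    """Return the expanded path, or the path unchanged if the rule does not match."""
    match = PATTERN.match(path)
    if match is None:
        return path
    # $1 in the RewriteRule corresponds to the first captured group here.
    return "/programming" + match.group(1)

print(expand("/r/pg/perl/"))  # /programming/perl/
print(expand("/r/pg"))        # /programming
print(expand("/about/"))      # /about/
```

Note that the pattern does not require a slash after “pg,” so a request for /r/pgx would also match and expand to /programmingx.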

RewriteMap for Multiple Abbreviations

That’ll work well for a few abbreviations, but what if you have lots of links? That’s where the RewriteMap directive comes in. RewriteMaps group multiple lookup keys (abbreviations) and their corresponding expanded values into one tab-delimited file. Here’s an example map file snippet from WebReference.com.

d      dhtml/
dc     dhtml/column
pg     programming
h      html/
ht     html/tools/

The MapName file maps keys to values for a rewrite rule using the following syntax:

${ MapName : LookupKey | DefaultValue }

MapNames require a generalized RewriteRule using regular expressions. The RewriteRule references the MapName instead of a hard-coded value. If there is a key match, the mapping function substitutes the expanded value into the regular expression. If there’s no match, the rule substitutes a default value or a blank string.

To use this MapName we need a RewriteMap directive to show where the mapping file is, and a generalized regular expression for our RewriteRule.

RewriteEngine 	On
RewriteMap	abbr	txt:/www/misc/redir/abbr_webref.txt
RewriteRule	^/r/([^/]*)/?(.*)	${abbr:$1}$2	[redirect=permanent,last]

The new RewriteMap directive points the rewrite module to the text version of our map file. The revamped RewriteRule looks up the value for matching keys in the map file. The redirect=permanent flag sends a 301 response instead of the default 302, and the last flag boosts performance by stopping rule processing once the matching abbreviation is found in the map file.
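The lookup can be sketched in Python to show how the key ($1), the rest of the path ($2), and the default value fit together. This is a rough model that assumes the map is plain whitespace-delimited text; load_map and rewrite are hypothetical helper names, and the sketch returns only the substitution string rather than issuing a redirect.

```python
import re

# Entries copied from the sample map file above (key, whitespace, value).
MAP_TEXT = """\
d      dhtml/
dc     dhtml/column
pg     programming
h      html/
ht     html/tools/
"""

def load_map(text):
    """Parse key/value lines, skipping blank lines and # comments."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split(None, 1)
        table[key] = value
    return table

# Same pattern as the RewriteRule: group 1 is the key, group 2 the rest of the path.
RULE = re.compile(r"^/r/([^/]*)/?(.*)")

def rewrite(path, table, default=""):
    """Mimic ${abbr:$1|default}$2; return None when the rule does not match."""
    match = RULE.match(path)
    if match is None:
        return None
    key, rest = match.groups()
    return table.get(key, default) + rest

abbr = load_map(MAP_TEXT)
print(rewrite("/r/ht/validator.html", abbr))  # html/tools/validator.html
print(rewrite("/r/pg/perl/", abbr))           # programmingperl/
print(rewrite("/r/zz/page", abbr))            # page
```

The second call shows why the values matter: the rule joins the looked-up value and the remainder directly, so a value such as “programming” needs a trailing slash in the map for /r/pg/perl/ to resolve to programming/perl/.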

Binary Hash RewriteMaps

For maximum speed you should convert your text map file into a binary *DBM hash file, which is optimized for fast lookups. Then the above RewriteMap line would look like this:

RewriteMap	abbr	dbm:/www/misc/redir/abbr_webref

Automating URL Abbreviation

The above URL abbreviation technique works well for URLs that don’t change very often. But what about news or blog sites where URLs change every hour or every minute? You can create a shell script that automatically scans and abbreviates incoming URLs or use the free open source script available at WebReference.com (http://www.webreference.com/scripts/) that does just that. That’s the abbreviated version of URL abbreviation.
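The abbreviating half of the job can be sketched in Python. This is not the WebReference script, just a minimal illustration under stated assumptions: the ABBREVIATIONS table, the /r/ prefix convention, and the function names are hypothetical, and the regular expression only handles double-quoted href attributes.

```python
import re

# Hypothetical table: expanded path prefix -> abbreviation key.
ABBREVIATIONS = {
    "/dhtml/": "d",
    "/html/": "h",
    "/html/tools/": "ht",
}

# Matches double-quoted href values only (an assumption of this sketch).
HREF = re.compile(r'href="([^"]*)"')

def abbreviate_url(url):
    """Replace the longest matching prefix with its /r/<key>/ form."""
    # Longest prefixes first, so /html/tools/ wins over /html/.
    for prefix in sorted(ABBREVIATIONS, key=len, reverse=True):
        if url.startswith(prefix):
            return "/r/" + ABBREVIATIONS[prefix] + "/" + url[len(prefix):]
    return url  # no abbreviation applies

def abbreviate_html(html):
    """Abbreviate every double-quoted href in an HTML string."""
    return HREF.sub(lambda m: 'href="' + abbreviate_url(m.group(1)) + '"', html)

page = '<a href="/html/tools/validator.html">Validator</a> <a href="/about/">About</a>'
print(abbreviate_html(page))
# <a href="/r/ht/validator.html">Validator</a> <a href="/about/">About</a>
```

Run as a publishing step, a script along these lines keeps the source HTML readable while the served pages carry the short links that the server-side map expands.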

Abbreviating Yahoo.com

Yahoo! uses a similar technique to squeeze nearly 30% off of their home page. Because they manage the busiest page on the Web, Yahoo! takes abbreviation to the extreme. So this expanded URL:

http://dir.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/

Becomes this minuscule abbreviation:

r/ww

Yahoo’s webmaster created a mapping file that looks something like this:

r/bu    http://dir.yahoo.com/Business_and_Economy/
r/bb    http://dir.yahoo.com/Business_and_Economy/Business_to_Business/
r/fi    http://dir.yahoo.com/Business_and_Economy/Finance_and_Investment/
r/bs    http://dir.yahoo.com/Business_and_Economy/Shopping_and_Services/
r/jo    http://dir.yahoo.com/Business_and_Economy/Employment_and_Work/
r/ci    http://dir.yahoo.com/Computers_and_Internet/
r/in    http://dir.yahoo.com/Computers_and_Internet/Internet/
r/ww    http://dir.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/
r/sf    http://dir.yahoo.com/Computers_and_Internet/Software/
r/ga    http://dir.yahoo.com/Recreation/Games/Video_Games/
...

So this expanded version:

<font size=-1><b><a href=http://dir.yahoo.com/Business_and_Economy/>Business & Economy</a></b></font><br><font size=-2><a href=http://dir.yahoo.com/Business_and_Economy/Business_to_Business/>B2B</a>,
<a href=http://dir.yahoo.com/Business_and_Economy/Finance_and_Investment/>Finance</a>, <a href=http://dir.yahoo.com/Business_and_Economy/Shopping_and_Services/>Shopping</a>, <a href=http://dir.yahoo.com/Business_and_Economy/Employment_and_Work/>Jobs</a>...</font> <br><br><font size=-1><b><a href=http://dir.yahoo.com/Computers_and_Internet/>Computers & Internet</a></b></font><br>
<font size=-2><a href=http://dir.yahoo.com/Computers_and_Internet/Internet/>Internet</a>, <a href=http://dir.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/>WWW</a>, <a href=http://dir.yahoo.com/Computers_and_Internet/Software/>Software</a>, <a href=http://dir.yahoo.com/Recreation/Games/Video_Games/>Games</a>...</font>

Becomes this abbreviated version:

<font size=-1><b><a href=r/bu>Business & Economy</a></b></font><br>
<font size=-2><a href=r/bb>B2B</a>, <a href=r/fi>Finance</a>, <a href=r/bs>Shopping</a>, <a href=r/jo>Jobs</a>...</font><br><br>
<font size=-1><b><a href=r/ci>Computers & Internet</a></b></font><br>
<font size=-2><a href=r/in>Internet</a>, <a href=r/ww>WWW</a>, <a href=r/sf>Software</a>, <a href=r/ga>Games</a>...</font>

Note that Yahoo! does not quote URLs, which is invalid but works in most browsers. Yahoo saves nearly 30% off of their home page HTML with this technique. Yahoo also uses subdomains, which further redistributes the load.
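To check the savings on your own markup, a quick byte count is enough. The following Python sketch compares one expanded link from the example above with its abbreviated form; percent_saved is a hypothetical helper name.

```python
def percent_saved(expanded, abbreviated):
    """Percentage of bytes removed when going from expanded to abbreviated."""
    before = len(expanded.encode("utf-8"))
    after = len(abbreviated.encode("utf-8"))
    return 100.0 * (before - after) / before

# One link taken from the Yahoo! example above.
expanded = "<a href=http://dir.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/>WWW</a>"
abbreviated = "<a href=r/ww>WWW</a>"

print(f"{percent_saved(expanded, abbreviated):.0f}% smaller")
```

A single link shrinks far more than the page as a whole, because text, images, and other markup are unaffected; run the same comparison on complete before-and-after HTML files to get a page-level figure.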

Conclusion

Abbreviating URLs with mod_rewrite is one of the most effective techniques available to optimize HTML files. File size savings can range from 20% to 30%, depending on the number of links in your HTML page. You can combine this technique with URL Rewriting with Content Negotiation for maximum savings. Best used on high-traffic pages like home pages, automated URL abbreviation can squeeze more bytes out of critical pages for server-savvy developers.

About the Author

Andy King is the founder of five developer-related sites, and the author of Speed Up Your Site: Web Site Optimization (http://www.speedupyoursite.com) from New Riders Publishing. He publishes the monthly Bandwidth Report, the weekly Optimization Week, the weekly Speed Tweak of the Week, and the semiweekly WebReference Update.

Further Reading

Apache URL Rewriting Guide
Ralf Engelschall shows how to use mod_rewrite.
Case Studies: Yahoo.com and WebReference.com
Chapter 19 summary of Speed Up Your Site shows how Yahoo and WebReference abbreviate their URLs with mod_rewrite.
ISAPI_Rewrite
URI rewriting ISAPI filter for Microsoft’s IIS server, from Helicon Tech.
modrewrite.com
Resources on this versatile module.
mod_rewrite
Documentation from Apache.
URLS! URLS! URLS!
Documents URL rewriting in Apache with mod_rewrite. By Bill Humphries for A List Apart.
Rewrite URLs with Content Negotiation
Content negotiation can make your URLs shorter and more abstract. By rewriting URLs without file extensions to the right resources you can save bytes and migration headaches.
Server-Side Techniques
Chapter 17 summary of Speed Up Your Site shows how to shunt work to the server to shrink XHTML code. Details URL abbreviation with mod_rewrite, browser sniffing, mod_include for SSI, and form and CGI script optimization.