PDF optimization is often overlooked when creating PDF files for the Web. While PDFs have become quite popular on the Web, many PDFs used on web sites are designed for high-quality print output and are not optimized for the Web. Even PDFs designed for Web use can have a wait problem, weighed down with excess fonts, change histories, and unoptimized images and forms. Optimizing PDF files for the Web can significantly shrink their size and boost display speed, saving bandwidth and user frustration.
In this article we’ll give you tips and tools to optimize PDFs for minimum file size while still maintaining accessibility and search engine visibility. We review Adobe’s PDF Optimizer in Acrobat 8 Professional (pre-release) and Apago’s PDF Enhancer 3.1.
What Is a PDF?
Portable Document Format (PDF) is the de facto file format for presenting device-independent documents on and off the Web. PDFs are an efficient way to accurately describe documents, from simple to intricate, for screen or print output. A PDF document is a self-contained series of bytes holding a collection of objects plus structural information. Like PostScript, PDF is a page description language, but it is simplified and restricted (it drops PostScript's programming constructs) to make it more lightweight. PDFs use the following compression algorithms to reduce file size:
- LZW (Lempel-Ziv-Welch, see Minimize Bit Depth) and FLATE (ZIP, added in PDF 1.2) for lossless compression of text and images
- JPEG and JPEG2000 (added in PDF 1.5) for lossy (or, with JPEG2000, lossless) compression of color and grayscale images
- CCITT (the facsimile standard, Group 3 or 4), run-length, and JBIG2 (added in PDF 1.4) compression of monochrome images
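Of the algorithms listed above, run-length encoding is simple enough to show in full. The sketch below implements the byte layout of PDF's RunLengthDecode filter as we understand it from the PDF Reference: a length byte of 0 to 127 introduces that many plus one literal bytes, a length byte of 129 to 255 repeats the next byte 257 minus length times, and 128 marks the end of data. The function names are our own, and a production encoder might choose its runs differently.

```python
def _flush_literal(out: bytearray, literal: bytearray) -> None:
    """Emit pending literal bytes as chunks of at most 128 bytes each."""
    for start in range(0, len(literal), 128):
        chunk = literal[start:start + 128]
        out.append(len(chunk) - 1)  # length byte 0..127: copy the next length+1 bytes
        out.extend(chunk)
    literal.clear()


def rle_encode(data: bytes) -> bytes:
    """Encode bytes in the PDF RunLengthDecode format."""
    out = bytearray()
    literal = bytearray()
    i = 0
    while i < len(data):
        # Measure how many times data[i] repeats (a single run holds at most 128 bytes).
        run = 1
        while i + run < len(data) and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            _flush_literal(out, literal)
            out.append(257 - run)  # length byte 129..255: repeat next byte 257-length times
            out.append(data[i])
            i += run
        else:
            literal.append(data[i])
            i += 1
    _flush_literal(out, literal)
    out.append(128)  # end-of-data marker
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    """Decode PDF RunLengthDecode data (used here to check the encoder)."""
    out = bytearray()
    i = 0
    while i < len(data) and data[i] != 128:
        length = data[i]
        if length < 128:
            out += data[i + 1:i + 2 + length]
            i += 2 + length
        else:
            out += bytes([data[i + 1]]) * (257 - length)
            i += 2
    return bytes(out)


row = b"\xff" * 100 + b"\x00\x0f" + b"\xff" * 50  # one scanline, mostly white
encoded = rle_encode(row)
print(len(row), "->", len(encoded))  # 152 -> 8
```

Long runs of identical bytes, typical of white space in monochrome scans, collapse to two bytes each, while varied data costs one extra length byte per 128 literal bytes.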
Ultimately, the size of your PDF file is determined by how well you use these compression techniques, how efficiently the data is described (including image resolution), and the complexity of the document (read: the number of fonts, forms, images, and multimedia elements).
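To make the idea of a PDF as a self-contained collection of objects concrete, here is a minimal sketch that assembles a one-page PDF 1.4 file by hand: five numbered objects (catalog, page tree, page, content stream, font), a cross-reference table of byte offsets, and a trailer. The function name and page text are hypothetical. The content stream is left uncompressed for readability, and Helvetica is one of the standard fonts a reader supplies, so no font is embedded.

```python
def build_minimal_pdf(text: str) -> bytes:
    """Assemble a one-page PDF 1.4 file from numbered objects plus an xref table.

    `text` must not contain unbalanced parentheses or backslashes (no escaping is done).
    """
    stream = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length " + str(len(stream)).encode() + b" >>\nstream\n" + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode()
    out += f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n".encode()
    out += f"startxref\n{xref_start}\n%%EOF\n".encode()
    return bytes(out)


pdf = build_minimal_pdf("Hello, Web")
print(pdf.decode("latin-1"))
```

Writing the returned bytes to a file should give a document a viewer can open; zlib-compressing the stream and adding `/Filter /FlateDecode` to its dictionary would apply the FLATE compression listed above.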
Creating Small PDFs
The main factors in creating small PDFs are image resolution, image type (bitmap or vector), the number of fonts used and how they are embedded, the PDF version, and the level of compression. In general, the higher the PDF version number, the smaller the file. Acrobat 5 (PDF version 1.4) added JBIG2 compression, which is superior to the CCITT or Zip algorithms for compressing scanned monochrome copy (see Table 1). JBIG2 (Joint Bi-level Image Experts Group) compresses monochrome (1 bit per pixel) image data at ratios of 20:1 to 50:1 for pages full of text. Like other dictionary-based algorithms (LZW, ZIP), JBIG2 builds a table of unique symbols; when a later symbol matches one in the table, it substitutes a token pointing to that table entry. JBIG2 also compresses the table itself.
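The symbol-table idea can be sketched in a few lines. This is not JBIG2 itself: real JBIG2 encoders can also match symbols that are merely similar, and they code the dictionary with arithmetic or Huffman coding. The sketch below only replaces exact repeats with table indexes and uses zlib as a stand-in for compressing the table. Glyphs are modeled as hypothetical 8-pixel-wide bitmaps, one integer per row.

```python
from __future__ import annotations

import zlib


def build_symbol_table(glyphs: list[tuple[int, ...]]) -> tuple[list[tuple[int, ...]], list[int]]:
    """Replace each glyph bitmap with an index into a table of unique symbols."""
    table: list[tuple[int, ...]] = []
    index_of: dict[tuple[int, ...], int] = {}
    tokens: list[int] = []
    for glyph in glyphs:
        if glyph not in index_of:       # first occurrence: add the symbol to the table
            index_of[glyph] = len(table)
            table.append(glyph)
        tokens.append(index_of[glyph])  # every occurrence becomes a table index
    return table, tokens


def pack_table(table: list[tuple[int, ...]]) -> bytes:
    """Stand-in for JBIG2's own dictionary coding: serialize the rows, then zlib them."""
    return zlib.compress(bytes(row for glyph in table for row in glyph))


A = (0b00111100, 0b01000010, 0b01111110, 0b01000010)  # hypothetical "A" bitmap
E = (0b01111110, 0b01000000, 0b01111100, 0b01111110)  # hypothetical "E" bitmap
page = [A, E, A, A, E, A]
table, tokens = build_symbol_table(page)
print(len(table), tokens)  # 2 [0, 1, 0, 0, 1, 0]
```

On a real page of text, a few dozen distinct letter shapes repeat thousands of times, which is why this approach reaches the high ratios quoted above.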
Acrobat 6 (PDF version 1.5) added the ability to compress the entire file (in the Clean Up Settings dialog). However, because over 90% of Acrobat users have version 5.0 or later, but not necessarily 6.0, PDF 1.4 is the safer choice. Older versions of Acrobat will usually open a file saved in a newer PDF version (with a warning), but content encoded with newer compression schemes will cause errors.
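You can check which version a PDF declares before posting it. This sketch reads the `%PDF-x.y` header near the start of the file and compares it with PDF 1.4, the Acrobat 5 level recommended above. From PDF 1.4 on, a `/Version` entry in the document catalog can override the header, which this simple check ignores; the function names are our own.

```python
from __future__ import annotations

import re

_HEADER = re.compile(rb"%PDF-(\d+)\.(\d+)")


def header_version(data: bytes) -> tuple[int, int]:
    """Return (major, minor) from the %PDF-x.y header near the start of the file."""
    match = _HEADER.search(data[:1024])
    if match is None:
        raise ValueError("no %PDF-x.y header in the first 1024 bytes")
    return int(match.group(1)), int(match.group(2))


def newer_than(data: bytes, target: tuple[int, int] = (1, 4)) -> bool:
    """True if the header declares a PDF version above `target` (PDF 1.4 = Acrobat 5)."""
    return header_version(data) > target


# Usage with a real file (hypothetical name):
#   with open("report.pdf", "rb") as f:
#       print(newer_than(f.read(1024)))
```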
| Acrobat Version | Year Introduced | PDF Version | Major Features Added |
|---|---|---|---|
| 3.0 | 1996 | 1.2 | Interactions, movies/sounds, forms, CJK (Chinese, Japanese, Korean) support, web features (hyperlinks, URLs, etc.), linearization for fast page display |
| 5.0 | 2001 | 1.4 | JBIG2 compression, transparency imaging model, tagged PDF files (for standardized extraction of objects) |
| 6.0 | 2003 | 1.5 | Compress entire file, XFA forms (XML) |
| 7.0 | 2005 | 1.6 | Improved PDF Optimizer interface, path smoothing, chopped image repair, image enhancement (despeckle, etc.), OpenType font embedding |
| 8.0 | 2006 | 1.7 | PDF Optimizer interface improved again (“Discard User Data” broken out into its own pane), flatten form fields, combine and optimize files; some improvements in 3D, advanced commenting, and security. Note: defaults to saving in PDF 1.6 format. |
To create the smallest possible PDFs for the Web, minimize the number of fonts and bitmapped images, substituting vector-based graphics where you can. Keep forms few and simple, flatten form fields, and avoid multimedia.
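A rough way to see which of these weights a PDF carries is to scan its bytes for the PDF names that mark images, embedded font files, form fields, and multimedia. This is a heuristic sketch, not a parser: it only sees uncompressed object dictionaries, so objects packed into the compressed object streams of PDF 1.5 and later will be missed. The pattern list and function name are our own assumptions.

```python
from __future__ import annotations

import re

# Heuristic patterns for uncompressed PDF object syntax (assumed, not exhaustive).
PATTERNS = {
    "images": rb"/Subtype\s*/Image\b",
    "embedded_fonts": rb"/FontFile[23]?\b",
    "form_fields": rb"/FT\s*/(?:Tx|Btn|Ch|Sig)\b",
    "multimedia": rb"/Subtype\s*/(?:Movie|Sound|Screen)\b",
}


def audit_pdf_bytes(data: bytes) -> dict[str, int]:
    """Count occurrences of each pattern in the raw bytes of a PDF."""
    return {name: len(re.findall(pattern, data)) for name, pattern in PATTERNS.items()}


# Usage with a real file (hypothetical name):
#   with open("brochure.pdf", "rb") as f:
#       print(audit_pdf_bytes(f.read()))
```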
There are several ways to create PDFs, including outputting to PostScript and Distilling, printing through GDI (the Windows Graphics Device Interface), one-click “Direct to PDF,” and dynamic generation on the server side. However you create a PDF, the techniques and tools listed below can help you enhance and optimize it for the Web.
Avoid Refried Graphics
For graphics that must be inserted as bitmaps, prepare them for maximum compressibility and minimum dimensions. Use the best-quality images you can at the output resolution of the PDF. Inserting compressed JPEGs into PDFs and Distilling them may recompress the JPEGs, which can create noticeable artifacts. Where possible, use black-and-white images and text instead of color images so that the newer JBIG2 standard, which excels at monochrome compression, can be used. Be sure to turn off thumbnails when saving PDFs for the Web.
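The payoff from sizing images to the output resolution is easy to estimate from raw (uncompressed) pixel counts. The sketch below assumes a hypothetical 4 x 3 inch photo and compares 300 ppi print resolution with 96 ppi, a common screen resolution that we use here as an assumption. Actual sizes after JPEG or Flate compression will be smaller.

```python
def raw_image_bytes(width_in, height_in, ppi, channels=3, bits_per_channel=8) -> int:
    """Uncompressed size of an image placed at width_in x height_in inches and `ppi` pixels per inch."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * channels * bits_per_channel // 8


print_size = raw_image_bytes(4, 3, 300)  # 1200 x 900 pixels, RGB
screen_size = raw_image_bytes(4, 3, 96)  # 384 x 288 pixels, RGB
print(print_size, screen_size, round(print_size / screen_size, 1))  # 3240000 331776 9.8
```

The ratio is simply (300 / 96) squared, so halving the resolution cuts raw image data to a quarter. With `channels=1, bits_per_channel=1`, a scanned 8.5 x 11 inch monochrome page at 300 ppi comes to 1,051,875 bytes before JBIG2 or CCITT compression.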
Use Vector Graphics
Use vector-based graphics wherever possible for images that would normally be made into GIFs. Vector images scale perfectly, look marvelous, and their mathematical descriptions usually take up less space than bitmapped graphics, which describe every pixel (although in some cases bitmap graphics are actually smaller than vector graphics). You can also compress vector image data using ZIP compression, which is built into the PDF format. Acrobat Reader versions 5 and 6 also support the SVG standard.
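PDF page content is itself a stream of vector drawing operators, and the Flate (ZIP) filter can compress it. This sketch emits a hypothetical grid of lines using the `m` (moveto), `l` (lineto), and `S` (stroke) operators, then wraps the zlib-compressed bytes in a stream object marked `/FlateDecode`. Python's `zlib.compress` produces the zlib format that FlateDecode expects; the grid and function names are illustrative only.

```python
import zlib


def grid_content_stream(cells: int, size: float = 10.0) -> bytes:
    """Draw a cells x cells grid as PDF path operators (m = moveto, l = lineto, S = stroke)."""
    extent = cells * size
    ops = []
    for k in range(cells + 1):
        pos = k * size
        ops.append(f"{pos:g} 0 m {pos:g} {extent:g} l")  # vertical line
        ops.append(f"0 {pos:g} m {extent:g} {pos:g} l")  # horizontal line
    ops.append("S")  # stroke all subpaths at once
    return "\n".join(ops).encode("ascii")


def flate_stream_object(content: bytes) -> bytes:
    """Wrap zlib-compressed content in a PDF stream object using the FlateDecode filter."""
    compressed = zlib.compress(content)
    header = f"<< /Length {len(compressed)} /Filter /FlateDecode >>\nstream\n".encode()
    return header + compressed + b"\nendstream"


content = grid_content_stream(20)
obj = flate_stream_object(content)
print(len(content), "bytes of path operators ->", len(obj), "bytes as a compressed stream object")
```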
How you use fonts, especially in smaller PDFs, can have a significant impact on file size, so keep the number of fonts in your documents to a minimum. Each additional fully embedded font can easily add 40 KB to the file, which is why most authors embed “subsetted” fonts that include only the glyphs actually used.
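The bookkeeping behind subsetting starts with collecting the distinct characters the document actually draws. The size estimate below is a deliberately crude, hypothetical model that assumes every glyph costs an equal share of a 40 KB, 256-glyph font; real fonts carry shared tables that do not shrink proportionally. For actual subsetting, tools such as the fontTools `subset` module (pyftsubset) do the work.

```python
from __future__ import annotations


def glyphs_to_embed(text: str) -> set[str]:
    """Distinct characters a subset font must cover (non-printing characters such as newlines are skipped)."""
    return {ch for ch in text if ch.isprintable()}


def estimated_subset_kb(text: str, full_font_kb: float = 40.0, glyphs_in_font: int = 256) -> float:
    # HYPOTHETICAL model: each glyph costs an equal share of the full font's size.
    used = min(len(glyphs_to_embed(text)), glyphs_in_font)
    return full_font_kb * used / glyphs_in_font


heading = "Optimizing PDFs for the Web"
print(sorted(glyphs_to_embed(heading)))
print(estimated_subset_kb(heading))  # 3.125
```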
Flatten Fat Forms
Acrobat forms can take up a lot of space in your PDFs. New in Acrobat 8 Pro, you can flatten form fields in the Advanced -> PDF Optimizer -> Discard Objects dialog. Flattening merges the form fields and their data into the page content, so the fields can no longer be filled in. You can also use Apago's PDF Enhancer to reduce forms by 50% by removing information that is present in the file but never actually used. Finally, you can combine a refried PDF with the original form pages to create a hybrid PDF in Acrobat (see “Refried PDF” section below).
Use the RGB Color Space Instead of CMYK
For web-only PDFs, if you have a choice, use the RGB color space rather than CMYK. RGB has one fewer color channel than CMYK (three versus four), so uncompressed image data is 25% smaller. Also, Microsoft applications all work in RGB, even when importing CMYK images.
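The channel arithmetic behind this advice is straightforward: for the same pixel dimensions, raw 8-bit image data grows with the number of color components. The sketch uses PDF's device color space names and a hypothetical 1000 x 800 pixel image; compression will change the absolute numbers.

```python
COMPONENTS = {"DeviceGray": 1, "DeviceRGB": 3, "DeviceCMYK": 4}


def raw_bytes(width_px: int, height_px: int, color_space: str, bits: int = 8) -> int:
    """Uncompressed image size: pixels x color components x bits per component."""
    return width_px * height_px * COMPONENTS[color_space] * bits // 8


for space in ("DeviceCMYK", "DeviceRGB", "DeviceGray"):
    print(space, raw_bytes(1000, 800, space))
# DeviceCMYK 3200000, DeviceRGB 2400000, DeviceGray 800000
```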
Convert to Grayscale
If color is not required, you can convert your PDF to grayscale. In Acrobat 8, choose Advanced -> Print Production -> Convert Colors. Under Document Colors select “Device Gray,” and under Destination Space choose “Gray Gamma 1.8” or “Gray Gamma 2.2.” In one test, converting a color print ad to grayscale (and saving with Save As) reduced its file size by 54%.
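Acrobat's Convert Colors performs a color-managed (ICC profile) conversion. As a rough, non-color-managed illustration of what grayscale conversion does to image data, this sketch maps each 8-bit RGB pixel to one gray value using the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), cutting raw sample data to one third. It works directly on gamma-encoded values and is not how Acrobat computes Gray Gamma 1.8 or 2.2 output.

```python
def rgb_to_gray(r: int, g: int, b: int) -> int:
    # ITU-R BT.601 luma weights; an approximation, not Acrobat's ICC-based conversion
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def convert_pixels(rgb: bytes) -> bytes:
    """Convert packed 8-bit RGB samples to 8-bit gray samples (one third the size)."""
    if len(rgb) % 3:
        raise ValueError("RGB data length must be a multiple of 3")
    return bytes(rgb_to_gray(*rgb[i:i + 3]) for i in range(0, len(rgb), 3))


pixels = bytes([255, 255, 255, 0, 0, 0, 255, 0, 0])  # white, black, pure red
print(list(convert_pixels(pixels)))  # [255, 0, 76]
```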
Optimizing Existing PDFs
In many cases you won’t have access to the original document, only the resulting PDF file. Many PDFs we’ve seen are not fully optimized for the Web; they use conservative settings better suited to high-resolution printers. For PDFs viewed on computer monitors, you don’t need high-resolution images or exact reproduction of font faces; you just want to convey your information efficiently. Using the techniques outlined below, you can shrink your PDFs while preserving the text for search engines and reasonable quality for print output. Some webmasters offer two versions of their PDFs, one for fast web display and one for printing.
Once you’re done making changes to your PDF document, choose File -> Save As and overwrite your existing PDF file (see Figure 2). By default, Save As discards the changes that the Save command appends to the end of a PDF, linearizes the file for fast web viewing, and removes unused objects.
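The changes that Save appends are incremental updates: each one adds new objects, a new cross-reference section, and a trailer ending in `%%EOF` at the end of the file. Counting those markers gives a rough sense of how many revisions a file carries. This is a heuristic sketch with hypothetical names: a linearized file can contain an extra `%%EOF` after its first-page trailer, and the marker could in principle appear by chance inside compressed data.

```python
def count_eof_markers(data: bytes) -> int:
    """Count %%EOF markers; an unmodified, non-linearized file normally has one."""
    return data.count(b"%%EOF")


def looks_incrementally_saved(data: bytes, linearized: bool = False) -> bool:
    # Allow one extra marker for the first-page trailer of a linearized file (assumption).
    expected = 2 if linearized else 1
    return count_eof_markers(data) > expected
```

A file that reports extra revisions is a good candidate for Save As.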
The result is a compact, linearized PDF that displays the first page (or an arbitrary page) quickly while the rest of the file downloads in the background. Linearized PDFs are slightly larger, but they improve perceived speed. Note that optimizing a signed document will invalidate its signature.
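You can check whether a file kept its linearization. A linearized PDF begins with a linearization parameter dictionary whose `/L` entry records the total file length. If the file is later changed (for example, by an incremental save), the length no longer matches and readers should treat it as an ordinary PDF. This sketch looks for that dictionary in the first 1,024 bytes and compares `/L` with the actual length. It assumes `/L` follows the `/Linearized` key, as is typical, and it does not validate hint tables; the function name is our own.

```python
import re


def linearization_state(data: bytes) -> str:
    """Classify a PDF as linearized, linearized-but-modified, or not linearized."""
    head = data[:1024]
    marker = re.search(rb"/Linearized\s+[\d.]+", head)
    if marker is None:
        return "not linearized"
    # Assumes /L (total file length) appears after /Linearized in the same dictionary.
    length = re.search(rb"/L\s+(\d+)", head[marker.end():])
    if length is not None and int(length.group(1)) == len(data):
        return "linearized"
    return "linearization invalid (file length changed)"
```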