Javascript Madness: KISS Javascript Compression

Jan Wolter
Feb 23, 2009

When writing, debugging and maintaining a Javascript program, you typically want to your code to be split over many different files, with lots of white space and reams of useful comments.

When users are uploading your Javascript programs, they'll get the fastest response if there are as few separate files as possible, and those files as small as possible, with minimum white space and no comments.

I think it is important to Javascript programmers to overcome this dichotomy. It's very bad if, everytime you write a comment, you find yourself thinking "is this comment important enough to justify increasing my application's upload time for all the hundreds of thousands of users who will be uploading it?" That's a bad thing to be thinking. It leads to horrible, unmaintainable code.

So this is a problem every coder of serious Javascript applications has to overcome, but it's not particularly a Javascript problem. You have the same problems with CSS style sheets. So this topic is kind of a bad fit to these "Javascript Madness" pages.

But who cares?

Gzip Data Compression

All vaguely modern browsers are able to accept files that have been compressed by the gzip algorithm, which is pretty good for general data compression. The browsers signal their willingness to accept such data in every HTTP request that they make by sending an "Accept-Encoding: gzip" header. Compressing Javascript with gzip will reduce the amount of data transfered to about 30% of normal.

Note that though all modern browsers are capable of handling compressed data, there seems to be a small fraction of users for whom even the newest browser releases don't send an "Accept-Encoding: gzip" header, and can't handle compressed data if it is sent to them. I've seen this both for IE and Firefox users. I don't know why it is. Perhaps a missing library file on their computers. I've only seen it for Windows users, but that may just be because the vast majority of users are Windows users. In any case, you cannot just send compressed data to everyone. You must heed the "Accept-Encoding" headers in the request.

Apache has mod_deflate which can automatically compress anything sent to the browser, including the output of CGI programs. But I found this unattractive, because it runs the compression algorithm every time a request is made. I just wanted to compress the files once, store them on disk, and have the webserver serve them up to any browser that can take them. Of course, the browser still has to uncompress the file, but it's fast and I'm willing to assume that user's computers have plenty of extra CPU.

There is also mod_gzip, which I believe can be configured to serve up precompressed files. It is certainly much more well-tested than what I did, but I didn't find any good documentation for it and it's not clear to me whether versions exist for Apache 2.2.

So here's the setup I am using. It requires the Apache http server, mod_rewrite and mod_header.

I keep my javascript source files in a directory under the application's document root named "js".
I created another directory under the application's document root called "jsz" where I will keep the concatinated and compressed files.
I have a simple program (see below) that generates the contents of the "jsz" directory from the files in the "js" directory. If there is a page that needs 6 files from the "js" directory, then the program concatinates them into a single file stored in the jsz directory. It then compresses the file, leaving the uncompressed file there too, so we always have both "jsz/whatever.js" and "jsz/whatever.js.gz" in the directory. Obviously I have to rerun my program everytime I change the source Javascript and want to publish the changes.
Production versions of application pages have <script> tags that request "jsz/whatever.js". Development versions include all the separate files from the "js" directory to make debugging easier. Nothing ever directly requests "whatever.js.gz".
The "jsz" directory contains the following .htaccess file:
```
<FilesMatch "\.js$">
RewriteEngine On
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule (.*)\.js$ $1\.js.gz [L]
ForceType text/javascript
</FilesMatch>

<FilesMatch "\.js\.gz$">
ForceType text/javascript
Header set Content-Encoding gzip
Header set Vary Accept-Encoding
</FilesMatch>
```
The first block says that if "whatever.js" is requested, and the request says it will accept a gziped file, and the file "whatever.js.gz" exists, then change the request to ask for that instead.
The second block says that if a ".js.gz" file is requested (either directly or as a result of an internal redirect triggered by the first block), then we should set a couple headers saying it is compressed Javascript, and that the response to such requests varies depending on the "Accept-Encoding" header in the request. The latter is needed for proxies. We don't want a proxy to serve a cached copy of the compressed file to a browser that didn't say it can handle compressed files.
The rare browser that cannot handle gzipped files just gets the uncompressed file. There are instances of browsers that say they can accept gzipped content, but actually can't, but they are pretty old, like Netscape 4. It would be possible to write exceptions for each of them (and add "User-Agent" to the "Vary" header) but I haven't felt it was worth it. None of those browsers would be able to run the all that javascript anyway, so I don't care if they can't load it.
These commands can alternately be put in an appropriate <Directory> block in the apache config file.
In some cases you may need an "Options +FollowSymLinks command before the first block, because rewrites are the moral equivalent to symbolic links.
The version above assumes that the directory is under the server's document root and is unaliased. If the directory is accessed via an alias (including ~username sorts of things), then things get a bit more complex since it won't automatically reconstruct the alias in the redirect URL. Normally the "RewriteBase" command would be used, but I don't think it works inside "<FilesMatch>" directives. Instead you have to change the "RewriteRule" to something like:
```
RewriteRule /home/mylogin/public_html/mydir/(.*).js$ /~mylogin/mydir/$1\.js.gz [L]
```
Remember that you are rewriting the pathname of the requested file into the URL to redirect to, so your rule has to convert apples to oranges in situations where they don't happen to be the same.
I'm not sure that the same thing couldn't be achieved more cleanly using Multiviews and Apache's internal content negotiation mechanism, but I haven't been able to manage that.

In my early tests, I just watched the file sizes on the http_access log to see if the the compressed or uncompressed file was being delivered to the browser. But to do really thorough testing, I needed to be able to control the "Accept-Encoding" headers being sent in requests, and see all headers in responses, so I wrote a little Perl program to send requests and display responses using libwww.

Javascript Code Compression

Gzip compression helps a lot, but each comment still adds to the size of the code. So it makes sense to strip out the comments and otherwise compress the Javascript code before we gzip it.

A lot of very clever people have addressed this problem, so there are plenty of applications around that will scrunch up Javascript files. I've tested just a couple of these.

JSMin does the obvious stuff. It deletes safely deletable whitespace and all comments. You need to be a wee bit careful programming. The author says it converts "a + ++b" into "a+++b" which the Javascript parser interprets as "a++ + b", so you need to be careful with parentheses. The tool was originally written in C and exists in many other languages.

Dojo Shrinksafe and the YUI Compressor are somewhat smarter Java tools. They actually use the parser from Rhino, a Javascript interpretor written by Mozilla, to analyze the code. This allows them to do things like renaming local variables with shorter names without mangling the public API of the function.

Another widely used tool is Dean Edward's Packer tool. It isn't richly documented and I haven't figured out exactly what it's strategy is, but it seems somewhat similar to JSMin. I did my tests using the perl version of the packer2 tool.

So this is all very clever, but it all seemed over-complex to me. What if we stuck more to the KISS (Keep It Simple, Stupid) philosophy? What if you just strip comments, leading and trailing spaces, and blank lines. That's 100% safe, and very easy to write.

I fed one of my Javascript apps through that KISS algorithm and each of the four compressors mentioned above. Here's what I saw:

Size Gzipped Size

Original File 115,547 bytes 100% 34,478 bytes 30%

KISS 79,812 bytes 69% 22,911 bytes 20%

JSMin 72,462 bytes 63% 21,835 bytes 19%

Packer2 71,434 bytes 62% 21,670 bytes 19%

Dojo ShrinkSafe 74,486 bytes 64% 22,000 bytes 19%

YUI Compressor 66,415 bytes 57% 20,958 bytes 18%

	Size	Gzipped Size
Original File	115,547 bytes	100%	34,478 bytes	30%
KISS	79,812 bytes	69%	22,911 bytes	20%
JSMin	72,462 bytes	63%	21,835 bytes	19%
Packer2	71,434 bytes	62%	21,670 bytes	19%
Dojo ShrinkSafe	74,486 bytes	64%	22,000 bytes	19%
YUI Compressor	66,415 bytes	57%	20,958 bytes	18%

Note that your results are likely to be significantly different, depending on your coding style.

We see that the smarter algorithms do gain a decent amount of extra compression, though the trivial steps of discarding comments and leading and trailing white space really account for most of their compression. The YUI Compressor seemed to work a fair bit better for me than the others.

However, once the files are all gzipped, the differences between them become quite a bit less. The better the Javascript compressor does, the less well gzip will be able to do. (This is a general law of diminishing returns for file compression: the more the data has already been compressed, the less you'll be able to compress it.)

In the end, after gzipping, the fancy compressors only gain you 2% more compression than the KISS approach. They are much more complex, they make the production code much harder to debug, and there is some slight risk that they might introduce bugs. 2% might be worth that to you, but not to me. I decided to stick with the KISS approach.

Of course, they do a heck of a lot more to obfuscate the code, if you value that. I mostly find that annoying, because, on occasion, I have to debug based on the production code, and because I don't believe that obfuscation is an effective way to protect your code. In the unlikely event that your code is valuable enough to be worth stealing, people who want it will read through any obfuscation.

More significantly, compressed code may also run a bit faster. All browsers these days "compile" the Javascript after it is loaded, and this compile time will typically be smaller if your source code is smaller, and in this stage the file has already been decompressed so we are looking at a 12% difference in size between KISS and YUI, not a 2% difference. Also, the shorter variable names that appear in a program compressed by YUI may result in slightly faster run times in some browsers. I suspect that these differences are usually small compared to the download time differences, but if you are really pushing for every last bit of Javascript performance, then you should probably use one of the fancier compressors.

I ran into one small surprise when testing these: I got syntax errors from the YUI compressor and Shrinksafe, because Rhino thinks "goto" is a reserved word and I was using it as a variable name. Technically Rhino is right. There is no goto command in Javascript, but it is reserved anyway in case they ever want to add one (why would they?). But no browser ever complained to me about that. I actually had to debug my code to make it compress. Code with syntax errors cannot be compressed. I dunno if that is a good thing or bad thing, but it's a thing.

KISS Compressor

So here's my whole compression program, written in Perl because this kind of thing is what Perl is for.

All configuration information is embedded in the program. There are no command line arguments.

The first few lines give the full path name of the "js" and "jsz" directories. The next few lines define which sets of files in the source directory are to be concatinated together, and what they are to be called in the destination directory. So, for instance, we create jsz/front.js and jsz/front.js.gz from js/browser.js, js/button.js, js/http.js, js/draw_html.js and js/play.js,

#!/usr/bin/perl

# Source and Destination Directories
$SRC= '/home/webpbn/public_html/js';
$DST= '/home/webpbn/public_html/jsz';

# Target files to build, and source files to put into them
@group=
(
    {
        name => 'front',
        file => ['browser',
                 'button',
                 'http',
                 'draw_html',
                 'play'],
    },
    {
        name => 'play',
        file => ['browser',
                 'args',
                 'session',
                 'button',
                 'http',
                 'panel',
                 'play',
                 'tip'],
    },
);

# Loop through files to generate
for ($i= 0; $i < @group; $i++)
{
    print $group[$i]{name}, "\n";
    $fn= $DST.'/'.$group[$i]{name}.'.js';
    open OUT, ">$fn" or die("Cannot open $fn");

    # Loop through files to concatinate and compress
    $f= $group[$i]{file};
    for ($j= 0; $j < @{$f}; $j++)
    {
        print " ",${$f}[$j],"\n";
        WriteJSFile(${$f}[$j],\*OUT)
    }
    close OUT;

    # Greate gzipped copy of the file
    system "gzip -c $fn > $fn.gz"
}

# WriteJSFile($filename, $outh)
#   Write a JavaScript file to the filehandle $outh with some compressions.

sub WriteJSFile
{
    my ($file, $outh)= @_;
    open JS,"$SRC/$file.js" or
       die("No file $SRC/$file found");
    while (<JS>) {
        s/(^| |\t)\/\/.*$//;        # delete // comments
        s/^[ \t]+//;                # delete leading white space
        s/[ \t]+$//;                # delete trailing white space
        next if /^$/;               # delete blank lines
        print $outh $_;
    }
}

I keep this program in the "js" directory and run it any time I'm ready to publish my changes to the Javascript source.

Note that I leave /*...*/ type comments in the code. I use these for copyright messages and // comments for everything else.

Not counting configuration data, it's about 20 lines of code. Pretty simple, and a very safe bet that it isn't going to introduce any errors into the code.

Another speedup that can be done is to set the expiration headers of the big files far into the future, so that browsers will just use them from cache without even asking the server if they have changed. This can cause problems when you actually do change the files, but those can be worked around without too much difficulty. My problem was that I couldn't actually convince myself that any of the major browsers were actually heeding the expiration date I set in my tests. They still seemed to be making conditional GET requests for the unexpired files most of the time. I need to research this more.