Jim Lawless' Blog


Cheating the LZW

Originally published on: Fri, 24 Apr 2009 00:10:27 +0000

In the early days of the web, many web pages sported hit-counters that would generate a graphical meter indicating the number of visitors who had viewed the page.

I thought that it would be a nice thing to have on my own site, so I set out to build my own.

I had decided to try using Perl to build my counter as it had become my language of choice when developing other CGI's.

Some of the popular hit-counters would accept libraries of JPEG's or GIF's representing various kinds of digits and would then use the GD graphics library to fuse the appropriate digit images together into a single image.

I opted to take a different route.

I had decided to use the GIF format for the output image. I wanted a lossless format that could represent a small, monochrome bitmap containing six eight-by-sixteen-pixel characters.

If one has a binary character-set available, building a raw bitmap of a stream of digits is relatively painless. A monochrome bitmap eight by sixteen by two (colors) by 6 (characters) can be represented in 768 bits ( 96 bytes ) uncompressed. That's pretty tiny!

Unfortunately, the GIF format did not permit a flag to indicate that the bitmap image would be an uncompressed bitmap.

So, I was going to have to employ the LZW compression algorithm on this tiny blob of data. ...or WAS I ?

The LZW algorithm uses a protocol of sorts that must be observed by both the compressor and the decompressor. An initial code-size must be agreed upon by both prior to compression or decompression. This code-size value indicates the bit-width for each code in the I/O stream. ( Like many compression algorithms, the LZW leverages variable-length bit I/O. )

The code-size must allow for two codes; clear and end-of-input.

I decided on a code-size of three bits and allowed the LZW special characters to push the code-size up to four bits ... which is easier to handle in bit-shift operations. The clear-code would become eight and the EOI code would be nine as three bits can represent the numbers zero through seven.

For each bit in the generated bitmap, I wrote both a nybble with a value representing a single pixlel, ...a one or a zero followed by a nybble representing the clear-code.

The bit-image was emitted without any true compression. In fact, using the LZW as a protocol, the stream of raw data grew in size ( four bits to represent a monochrome pixel instead of one plus an extra four bits for each clear code. ) This process uses a byte to represent a single pixel when both nybbles are coupled.

At the end of stream, I needed to emit the EOI character, so I simple wrote a byte with the value 9*16, shifting the nine four pixels to the left so that the decompressor would stop processing the bitmap.

The Perl code below illustrates this technique by creating a GIF file named number.gif with the digits "123456" represented as a bitmapped image.

Sample digits

You might change the value of $count to something other than "123456", but it must be six characters in length and must be comprised of all digits.

The generated image is always 805 bytes in size. 768 bytes for the bitmap, one for the EOI code, and thirty-six bytes for the GIF header and footer.


# License: MIT / X11
# Copyright (c) 2009 by James K. Lawless
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
# files (the "Software"), to deal in the Software without
# restriction, including without limitation the rights to use,
# copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
# OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.

 
@charset=(0,0,0,0,0,0,0,0,0,0,
   0,0,0,0,0,0,0,0,0,0,
   56,24,124,124,12,254,56,254,124,124,
   108,56,198,198,28,192,96,198,198,198,
   198,120,6,6,60,192,192,6,198,198,
   198,24,12,6,108,192,192,6,198,198,
   214,24,24,60,204,252,252,12,124,126,
   214,24,48,6,254,6,198,24,198,6,
   198,24,96,6,12,6,198,48,198,6,
   198,24,192,6,12,6,198,48,198,6,
   108,24,198,198,12,198,198,48,198,12,
   56,126,254,124,30,124,124,48,124,120,
   0,0,0,0,0,0,0,0,0,0,
   0,0,0,0,0,0,0,0,0,0,
   0,0,0,0,0,0,0,0,0,0,
   0,0,0,0,0,0,0,0,0,0);

   # "clear" code
$clr=8;

   # end-of-input code in left-nybble of byte
$eoi=9*16;

   # must be 6-digits
$count="123456";

open(FIL,">number.gif");
binmode(FIL);
   # GIF 87 header
print FIL "GIF87a" ;
print FIL pack("CC",48,0); # Raster width 48
print FIL pack("CC",16,0); # Raster height 16

   # Colormap, numbr bits color res -1, 0, bits per pixel - 1
   # 1 000 0 000
print FIL pack("C",128);

   # Background color index
print FIL pack("C",0);
   # dummy byte
print FIL pack("C",0);

   # Color Map
   # Black
print FIL pack("CCC",0,0,0);

   # White
print FIL pack("CCC",255,255,255);

   # Image descriptor
   # Separator
print FIL ',' ;

   # Left
print FIL pack("CC",0,0);

   # Top
print FIL pack("CC",0,0);

   # Width
print FIL pack("CC",48,0);

   # Height
print FIL pack("CC",16,0);

   # Use global color map=0, sequential image = 0, 000, pixel -1 bpp
   # 0 0 000 000
print FIL pack("C",0);

   # Raster data
   # Code size 3 ( LZW will use 4), this makes clear code 8!, EOI 9
print FIL pack("C",3);

   # here's where we'll need to send the data out in blocks. It will be

$blkcount=0; # total blocks so far
$blkinprog=0; # block in progress

for($j=0;$j<16;$j++) {
   for($i=0;$i<6;$i++) {
      for($bits=128;$bits>=1;$bits=$bits/2) {
         # Determine bit settings
         $b=substr($count,$i,1)-'0';
         if( ($charset[$j*10+$b] & $bits) == $bits ) {
            $bitout = 16;
         }
         else {
            $bitout = 0;
         }
         if( $blkinprog == 0 ) {
            if($blkcount<3) {
               $blkinprog=240;
            }
            else {
               $blkinprog=49;
            }
            print FIL pack("C",$blkinprog);
            $blkcount++;
         }
         print FIL pack("C",$bitout+$clr);
         $blkinprog--;
      }
   }
}
print FIL pack("C",$eoi);
   
   # Zero length for block
print FIL pack("C",0);
   # Terminator
print FIL ";";

close(FIL);

Unless otherwise noted, all code and text entries are Copyright ©2009 by James K. Lawless



Views expressed in this blog are those of the author and do not necessary reflect those of the author's employer. Views expressed in the comments are those of the responding individual.

stumbleupon Save to StumbleUpon
digg Digg it
reddit Save to Reddit
facebook Share on Facebook
twitter Share on Twitter
aolfav More bookmarks


Previous post: E-mail cleansing
Next post:WSH2EXE part 1


About Jim ...


Click **here**
to try out MailWrench;
a command-line SMTP /
SMTPS (Google Gmail)
mailer for Windows.


Follow me on Twitter

http://twitter.com/lawlessGuy


Recent Posts

A JavaScript REPL for Android Devices

MailSend is Free

My Blog Engine

The October 10th Bug

A Review of Kevin Mitnick's Book Ghost in the Wires

Spellbound by Web Programming

Backlinks to my Blog Posts

Play MP3 Files with Python on Windows


Random Posts

An Interview with Brad Templeton

Yet Another Enhanced Echo Command

Flirting with Forth

An Embedded Mini-Interpreter

Invoking the Default Windows Screen-Saver

A Command-Line CD Tray Opener

A TCP Command Line Interface in Rhino JavaScript

Computers I Have Known

Book Review : Using Google App Engine

Backlinks to my Blog Posts


Full List of Posts

http://www.mailsend-online.com/bloglist.htm


Recent Posts from my Other Blog

Remembering Dr. San Guinary

Why Some Web Sites will go Dark on Jan 18th

SNL Superhero Skit

More Ruby Games

My Ruby Game Challenge Entry

Steal this Bookmarklet

Nerd Toys

Learn New Jargon, You Must

Spot the Wiebe

Tech Magazine Glory Days

Book Review : Paull Allen - Idea Man

A 90's Experiment in Online Systems - The U.S. West CommunityLink Service