where to buy misoprostol online how to buy valtrex
Python WSGI Middleware for automatic Gzipping | Evan Fosmark

Python WSGI Middleware for automatic Gzipping

I’ve just started learning Python WSGI (PEP-333) and thought the best way to learn would be to write some WSGI tools myself. Most recently, I chose to write a middleware application that converts all output into valid gzipped data. In this article, I will be demonstrating how my middleware gzipper works and how to implement it.

Gzipping arbitrary data

The first obstacle was that Python has no simple way of converting a normal string into a gzipped string. To do this, let’s build off of the gzip.GzipFile object, which isn’t really designed for this sort of task.

#!/usr/bin/python2.5
from gzip import GzipFile
import StringIO
 
def gzip_string(string, compression_level):
    """ The `gzip` module didn't provide a way to gzip just a string.
        Had to hack together this. I know, it isn't pretty.
    """
    fake_file = StringIO.StringIO()
    gz_file = GzipFile(None, 'wb', compression_level, fileobj=fake_file)
    gz_file.write(string)
    gz_file.close()
    return fake_file.getvalue()

The above function accomplishes this nicely. By using a `StringIO` instance for the `fileobj` in the `GzipFile`, it allows us to make the function think that it is writing to a normal file. Instead, we’re grabbing the content through our fake file.

Does the client want gzipped output?

This is a pretty important task. What if the client didn’t even ask for gzipped output? Or worse, what if they can’t even support it? Cases like these must be taken into consideration. We’ll use the following code to check.

def parse_encoding_header(header):
    """ Break up the `HTTP_ACCEPT_ENCODING` header into a dict of
        the form, {'encoding-name':qvalue}.
    """
    encodings = {'identity':1.0}
 
    for encoding in header.split(","):
        if(encoding.find(";") > -1):
            encoding, qvalue = encoding.split(";")
            encoding = encoding.strip()
            qvalue = qvalue.split('=', 1)[1]
            if(qvalue != ""):
                encodings[encoding] = float(qvalue)
            else:
                encodings[encoding] = 1
        else:
            encodings[encoding] = 1
    return encodings
 
def client_wants_gzip(accept_encoding_header):
    """ Check to see if the client can accept gzipped output, and whether
        or not it is even the preferred method. If `identity` is higher, then
        no gzipping should occur.
    """
    encodings = parse_encoding_header(accept_encoding_header)
 
    # Do the actual comparisons
    if('gzip' in encodings):
        return encodings['gzip'] >= encodings['identity']
 
    elif('*' in encodings):
        return encodings['*'] >= encodings['identity']
 
    else:
        return False

What this does is parse the `HTTP_ACCEPT_ENCODING` response header to make sure that it says that gzipping is alright and preferred over no compression. It does this by looking at the qvalue weights for each type of encoding. For more information on this, I suggest you read RFC 2616, section 14 which covers the “Accept-Encoding” header.

Creating the WSGI middleware

Alright, now it’s time to use that function an create a middleware application.

class Gzipper(object):
    """ WSGI middleware to wrap around and gzip all output.
        This automatically adds the content-encoding header.
    """
    def __init__(self, app, compresslevel=6):
        self.app = app
        self.compresslevel = compresslevel
 
    def __call__(self, environ, start_response):
        """ Do the actual work. If the host doesn't support gzip as a proper encoding,
            then simply pass over to the next app on the wsgi stack.
        """
        accept_encoding_header = environ.get("HTTP_ACCEPT_ENCODING", "")
        if(not client_wants_gzip(accept_encoding_header)):
                return self.app(environ, start_response)
 
        def _start_response(status, headers, *args, **kwargs):
            """ Wrapper around the original `start_response` function.
                The sole purpose being to add the proper headers automatically.
            """
            headers.append(("Content-Encoding", "gzip"))
            headers.append(("Vary", "Accept-Encoding"))
            return start_response(status, headers, *args, **kwargs)
 
        # Unfortunately, we can't gzip data chunk-by-chunk, so we need to join up whatever
        # is being sent out first. (Remember, WSGI apps return items which can be iterated.)
        data = "".join(self.app(environ, _start_response))
        return [gzip_string(data, self.compresslevel)]

Before doing any encoding, this checks to make sure the client can even support gzipped output by looking at the “HTTP_ACCEPT_ENCODING” header value. What’s really cool about how this works is that it adds a content-encoding header for you. This, however, shows a weakness in WSGI as the middleware is required to write a new start_response function that wraps around the old one. WSGI 2.0 clears this up, but this version isn’t in use yet.

Using the middleware

Now, let’s see how we would implement this with a simple “hello world” WSGI application.

#!/usr/bin/python2.5
from gzipper import Gzipper
from wsgiref.simple_server import WSGIServer, WSGIRequestHandler
 
#
# Let's create a simple WSGI application to test out the Gzipper.
#
def test_app(environ, start_response):
    status = "200 OK"
    headers = [("content-type", "text/html")]
    start_response(status, headers)
 
    return ["Hello world!"]
 
# Set up the WSGI server and add the middleware that wraps around
# the actual application.
httpd = WSGIServer(('', 8080), WSGIRequestHandler)
httpd.set_app(Gzipper(test_app, compresslevel=8))
httpd.serve_forever()

Final thoughts

Middleware can be incredibly useful. I, however, have been seeing numerous people approaching it incorrectly by modifying the environ and having the application relying on it. I believe that PJ Eby said it best: “If your application requires that API to be present, then it’s not middleware any more!” It would be better to just add that code to a library utilized by the application.

To view all of the code in one block, go to the next page.

Pages: 1 2

 

 

6 Comments

  1. japherwocky wrote,

    I think that’s exactly the proper usage of StringIO. Not a hack at all. :)

  2. Evan wrote,

    Heh, yeah I guess you’re right. It just looks messy to me. :-P

  3. Jim wrote,

    Two bugs:

    > if(environ.get(“HTTP_ACCEPT_ENCODING”, “”).find(“gzip”) < 0)

    This fails for cases like Accept-Encoding: identity, compress;q=0.5, gzip;q=0

    You need to actually split up the header value properly in order to handle it correctly, a simple search fails for corner cases.

    You also need to transmit a Vary header.

  4. Evan wrote,

    Jim, thank you for your input. I have updated the code accordingly.

  5. Sergey wrote,

    In the first for loop, you could do

    encoding, sep, qvalue = encoding.partition(';')
    qvalue = qvalue.partition('=')[2]
    qvalue = 1 if not qvalue else float(qvalue)
    encodings[encoding] = qvalue

    The advantage here is that the code is linear. The disadvantage is that a later Python version is required.

  6. gthomas wrote,

    def iterstream(stream, chunksize=65536):
        '''stream iterator'''
        read = True
        if not hasattr(stream, 'read'):
            if hasattr(stream, 'next') and not isinstance(stream, basestring):
                read = False
                for chunk in stream:
                    yield chunk
            else:
                from cSringIO import StringIO
                stream = StringIO(stream)
        while read:
            buf = stream.read(chunksize)
            if not buf: break
            yield buf
     
    import zlib, struct
    def _zip_stream(input, compress_level=6, zlib_format=True):
        '''compress chunked input with zlib'''
        #
        size = 0
        if zlib_format:
            crc = zlib.crc32('')
            _crc32 = zlib.crc32
            # magic header, compression method, no flags
            header = '372131000'
            # timestamp
            header += struct.pack('&lt;L', 0)
            # uh.. stuff
            header += '02377'
            yield header
     
        compress = zlib.compressobj(compress_level, zlib.DEFLATED,
            -zlib.MAX_WBITS if zlib_format else zlib.MAX_WBITS, zlib.DEF_MEM_LEVEL, 0)
        _compress = compress.compress
     
        #print '$ input: %s (%s) %s' % (repr(input), type(input), dir(input))
        #
        for buf in iterstream(input):
            #print '$ %d' % len(buf)
            if len(buf) != 0:
                if zlib_format:
                    crc = _crc32(buf, crc)
                size += len(buf)
                yield _compress(buf)
     
        yield compress.flush()
        if zlib_format:
            yield struct.pack('&lt;LL', crc &amp; 0xFFFFFFFFL, size &amp; 0xFFFFFFFFL)
     
    def gzip_stream(input, compress_level=6):
        return _zip_stream(input, compress_level=compress_level, zlib_format=True)
     
    def deflate_stream(input, compress_level=6):
        # NOTE: this produces RFC-conformant but some-browser-incompatible output.
        # The RFC says that you're supposed to output zlib-format data, but many
        # browsers expect raw deflate output. Luckily all those browsers support
        # gzip, also, so they won't even see deflate output.
        return _zip_stream(input, compress_level=compress_level, zlib_format=False)

Leave a comment

Buy Buy Neurontin Pill Online
Where To Buy Cheap Levaquin No Perscription
Wholesale Neurontin Cheap
Side Effects On Neurontin
Purchase Levaquin In Internet Priority Mail China
Get Nizoral Mail Order
Lipitor online Canada no prescription
Purchase Online No Prescription Neurontin
Buy Cheapest Online Neurontin
Buy Generic Lexapro in US
Find Diflucan online purchase
Purchase No Prescription Lisinopril
Nizoral In Canada
Buy Inderal Overnight Free Delivery
Online Order Levlen Without Prescription
Buy Levlen
Order Nizoral Medication
Buy Tablets Lasix Online
Take Lexapro Without Prescription
Take Lipitor Legally