Programming | Evan Fosmark - Part 2

Archive for the ‘Programming’ Category.

Sexy Lexing with Python

Lexical analysis, a daunting task, right? Wrong! In the following document we’ll walk through different methods of lexical scanning in Python. First, we’ll look at a pre-built solution found in the standard library, and then at a custom-built solution.

Using re.Scanner

In the re module there is a class called Scanner that can do lexical scanning. It is completely undocumented apart from a small example code block found on the Python-Dev mailing list, but it is well worth mentioning. It works by feeding in a list of regular expressions and callback functions linked to them. When it matches a token, it first runs its value through the appropriate callback and then appends it to the token list being returned. If the scanner reaches a spot where a token match cannot be found, it returns whatever matches it did have (if any) along with the rest of the document that couldn’t be matched. Here is an example:

import re
 
def identifier(scanner, token): return "IDENT", token
def operator(scanner, token):   return "OPERATOR", token
def digit(scanner, token):      return "DIGIT", token
def end_stmnt(scanner, token):  return "END_STATEMENT"
 
scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", identifier),
    (r"\+|\-|\\|\*|\=", operator),
    (r"[0-9]+(\.[0-9]+)?", digit),
    (r";", end_stmnt),
    (r"\s+", None),
    ])
 
tokens, remainder = scanner.scan("foo = 5 * 30; bar = bar - 60;")
for token in tokens:
    print token

Which provides the output:

('IDENT', 'foo')
('OPERATOR', '=')
('DIGIT', '5')
('OPERATOR', '*')
('DIGIT', '30')
END_STATEMENT
('IDENT', 'bar')
('OPERATOR', '=')
('IDENT', 'bar')
('OPERATOR', '-')
('DIGIT', '60')
END_STATEMENT

Truly easy, fast, and relatively simple to understand.
Using this is perfect for small projects, but it has some downsides, such as not allowing simple error handling and not implicitly handling whitespace. Additionally, it has to tokenize the whole document before it can return anything, and that can get costly with larger documents.
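If you do need error reporting with re.Scanner, one workaround is to inspect the remainder yourself. A minimal standalone sketch (inline lambdas instead of the named callbacks above; the input string is made up for illustration):

```python
import re

# Each rule pairs a pattern with a callback that tags the token.
scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", lambda s, t: ("IDENT", t)),
    (r"[0-9]+",       lambda s, t: ("DIGIT", t)),
    (r"\s+", None),                 # skip whitespace
])

text = "foo 42 $bogus"
tokens, remainder = scanner.scan(text)
if remainder:
    # scan() stops at the first character no rule matches; the offset of
    # the failure is recoverable from the length of the remainder.
    position = len(text) - len(remainder)
    print("Lexing failed at position %d: %r" % (position, remainder))
```

It isn't pretty, but it at least lets you report where the bad input starts.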

Custom-Built Lexer

I decided to build a custom lexer as a means to break away from re.Scanner. Here is the code for the actual lexer. It is broken into three classes: UnknownTokenError, which gets thrown when a non-recognized token is found; Lexer, which holds the settings for scanning; and _InputScanner, which is in charge of scanning a specific input, as the name implies. A few benefits built into the Lexer include automatic whitespace handling (if desired) and the ability to easily make the scan case-insensitive. Additionally, you can optionally provide a callback to run the token through before it is returned, by making the rule a tuple of the rule and callback.

import re
 
 
class UnknownTokenError(Exception):
    """ This exception is raised when an unknown token is
        encountered in the token stream. It holds the line number and the
        offending token.
    """
    def __init__(self, token, lineno):
        self.token = token
        self.lineno = lineno
 
    def __str__(self):
        return "Line #%s, Found token: %s" % (self.lineno, self.token)
 
 
class _InputScanner(object):
    """ This class manages the scanning of a specific input. An instance of it
        is returned when scan() is called. It is designed to be iterated over,
        and is meant to be used by the Lexer rather than directly.
    """
 
    def __init__(self, lexer, input):
        """ Put the lexer into this instance so the callbacks can reference it 
            if needed.
        """
        self._position = 0
        self.lexer = lexer
        self.input = input
 
    def __iter__(self):
        """ All of the code for iteration is controlled by the class itself.
            This and next() (or __next__() in Python 3.0) exist so that syntax
            like `for token in lexer.scan(...):` is valid and works.
        """
        return self
 
    def next(self):
        """ Used for iteration. It returns token after token until there
            are no more tokens. (change this to __next__(self) if using Py3.0)
        """
        if not self.done_scanning():
            return self.scan_next()
        raise StopIteration
 
    def done_scanning(self):
        """ A simple boolean function that returns true if scanning is
            complete and false if it isn't.
        """
        return self._position >= len(self.input)
 
    def scan_next(self):
        """ Retrieve the next token from the input. If the
            flag `omit_whitespace` is set to True, it will
            skip over any whitespace characters present.
        """
        if self.done_scanning():
            return None
        if self.lexer.omit_whitespace:
            match = self.lexer.ws_regexc.match(self.input, self._position)
            if match:
                self._position = match.end()
        match = self.lexer.regexc.match(self.input, self._position)
        if match is None:
            lineno = self.input[:self._position].count("\n") + 1
            raise UnknownTokenError(self.input[self._position], lineno)
        self._position = match.end()
        value = match.group(match.lastgroup)
        if match.lastgroup in self.lexer._callbacks:
            value = self.lexer._callbacks[match.lastgroup](self, value)
        return match.lastgroup, value
 
 
class Lexer(object):
    """ A lexical scanner. It takes in an input and a set of rules based
        on regular expressions. It then scans the input and returns the
        tokens one-by-one. It is meant to be used through iteration.
    """
 
    def __init__(self, rules, case_sensitive=True, omit_whitespace=True):
        """ Set up the lexical scanner. Build and compile the regular expression
            and prepare the whitespace searcher.
        """
        self._callbacks = {}
        self.omit_whitespace = omit_whitespace
        self.case_sensitive = case_sensitive
        parts = []
        for name, rule in rules:
            if not isinstance(rule, str):
                rule, callback = rule
                self._callbacks[name] = callback
            parts.append("(?P<%s>%s)" % (name, rule))
        if self.case_sensitive:
            flags = re.M
        else:
            flags = re.M|re.I
        self.regexc = re.compile("|".join(parts), flags)
        self.ws_regexc = re.compile(r"\s*", re.MULTILINE)
 
    def scan(self, input):
        """ Return a scanner built for matching through the `input` field.
            The scanner it returns is designed for iteration.
        """
        return _InputScanner(self, input)

This version does on-the-fly scanning by building the scanner class as an iterator. So, you can work with a token the moment it gets scanned, before any other tokens get scanned. This helps reduce overhead when you have a large document and may need to exit prematurely. And, of course, when you write your own lexer, it is much easier to modify it to your needs. Now let’s test the above code and see what sort of token stream we end up with.

def stmnt_callback(scanner, token):
    """ This is just an example of providing a function to run the
        token through.
    """
    return ""
 
rules = [
    ("IDENTIFIER", r"[a-zA-Z_]\w*"),
    ("OPERATOR",   r"\+|\-|\\|\*|\="),
    ("DIGIT",      r"[0-9]+(\.[0-9]+)?"),
    ("END_STMNT",  (";", stmnt_callback)), 
    ]
 
lex = Lexer(rules, case_sensitive=True)
for token in lex.scan("foo = 5 * 30; bar = bar - 60;"):
    print token

Outputs:

('IDENTIFIER', 'foo')
('OPERATOR', '=')
('DIGIT', '5')
('OPERATOR', '*')
('DIGIT', '30')
('END_STMNT', '')
('IDENTIFIER', 'bar')
('OPERATOR', '=')
('IDENTIFIER', 'bar')
('OPERATOR', '-')
('DIGIT', '60')
('END_STMNT', '')

Pretty easy to understand, right? A great thing about the `Lexer` is that it is easy to subclass. For instance, in a complex template parser I’m working on, I added the ability to only scan inside specific tags while treating non-tag data as its own type of token. Maybe I’ll cover that in more detail in a future post.
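As an aside, the early-exit benefit of the iterator design is easy to see in isolation. Here is a tiny generator-based sketch of the same lazy-scanning idea (not the Lexer class itself; the rules and input are made up for illustration):

```python
import re

TOKEN_RE = re.compile(r"(?P<WORD>[a-zA-Z_]\w*)|(?P<NUM>[0-9]+)|(?P<WS>\s+)")

def tokens(text):
    """ Yield (type, value) pairs lazily, one match at a time. """
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("bad token at position %d" % pos)
        pos = m.end()
        if m.lastgroup != "WS":        # implicit whitespace handling
            yield m.lastgroup, m.group()

# Stop as soon as we see the first number -- the rest is never scanned.
for kind, value in tokens("alpha beta 42 gamma 99"):
    if kind == "NUM":
        print("first number:", value)  # first number: 42
        break
```

The `break` is the whole point: since nothing is tokenized ahead of time, bailing out early costs nothing.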

Update: The custom lexer has been updated to accept a list of tuples as the rules instead of a dict. This way, an order can be imposed on the rules.
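The ordering matters because re tries the joined alternatives left to right. A quick standalone illustration with plain re named groups (hypothetical KEYWORD/IDENTIFIER rules, not part of the lexer above):

```python
import re

# With "|", re picks the leftmost alternative that matches at the current
# position -- so whichever rule comes first wins on overlapping input.
good = re.compile(r"(?P<KEYWORD>if|else)\b|(?P<IDENTIFIER>[a-zA-Z_]\w*)")
bad  = re.compile(r"(?P<IDENTIFIER>[a-zA-Z_]\w*)|(?P<KEYWORD>if|else)")

print(good.match("if").lastgroup)    # KEYWORD
print(bad.match("if").lastgroup)     # IDENTIFIER -- keyword rule never fires
print(good.match("iffy").lastgroup)  # IDENTIFIER -- \b keeps "iffy" safe
```

With a dict of rules there was no way to guarantee that the KEYWORD pattern came before IDENTIFIER; with a list of tuples there is.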

Random password generator in Python and Tkinter

This is always a fun project. The task? To create a random password of random length. The reason for a password generator is obvious: you suck at choosing a password. Let’s start with how to create the actual generator, and then we’ll focus on the presentation.

from random import *
import string
 
# The characters to make up the random password
chars = string.ascii_letters + string.digits
 
def random_password():
    """ Create a password of random length between 8 and 16
        characters long, made up of numbers and letters.
    """
    return "".join(choice(chars) for x in range(randint(8, 16)))

The above function is pretty easy to follow. It builds a generator expression that yields between 8 and 16 random characters, then joins them into a string. You can do something as simple as print random_password() and it’ll display a password in the terminal. This, of course, isn’t really the best way of achieving it. After all, you don’t want to have to navigate the terminal each and every time. So, let’s add in a graphical user interface using the Tkinter window builder:

from Tkinter import *
from random import *
import string
 
# The characters to make up the random password
chars = string.ascii_letters + string.digits
 
def random_password():
    """ Create a password of random length between 8 and 16
        characters long, made up of numbers and letters.
    """
    return "".join(choice(chars) for x in range(randint(8, 16)))
 
#
# BEGIN GUI CODE
#
 
root = Tk()
root.title("Password Generator")
root.resizable(0,0)
root.minsize(300,0)
 
frame = Frame(root)
frame.pack(pady=10, padx=5)
 
content = StringVar()
updater = lambda:content.set(random_password())
 
gen_btn = Button(frame, text="Generate", command=updater)
gen_btn.config(font=("sans-serif", 14),  bg="#92CC92")
gen_btn.pack(side=LEFT, padx=5)
 
field = Entry(frame, textvariable=content)
field.config(fg='blue', font=('courier',  16, "bold"), justify='center')
field.pack(fill=BOTH, side=RIGHT, padx=5)
 
root.mainloop()

The above should be pretty simple to follow. As you can see, pressing the gen_btn activates the updater lambda function which populates the entry field. Here is a sample output:

[Screenshot: the password generator window]

Simple, clean, and easy. This is just a quick project I made for myself and decided to share it.
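One caveat worth noting: the default random generator is not designed for secrets. If you want the same generator backed by the OS entropy pool instead, random.SystemRandom is a near drop-in; here is a sketch (strong_random_password is my name, not part of the original script):

```python
import string
from random import SystemRandom

# Same character set as before
chars = string.ascii_letters + string.digits

# SystemRandom draws from os.urandom() rather than the Mersenne Twister,
# so its output is suitable for security-sensitive uses.
_sysrand = SystemRandom()

def strong_random_password():
    """ Same shape as random_password(), but with OS-level entropy. """
    return "".join(_sysrand.choice(chars) for _ in range(_sysrand.randint(8, 16)))

pw = strong_random_password()
```

The GUI code would not need to change at all beyond swapping which function the updater calls.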

UNIX Time Clock, in honor of “1234567890 Day”

On this Friday the 13th, we are to witness a unique moment in history: the day when UNIX time rolls over to 1234567890. Here’s a simple Python/Tk app that’ll show the timestamp.

import time
from Tkinter import *
 
class TimestampClock(Frame):
    def __init__(self, root):
        Frame.__init__(self, root)
        self.pack()
        self.time = Label(self, text=int(time.time()))
        self.time.config(fg='red', font=('Monospace', 20, 'bold'))
        self.time.pack(padx=10, pady=10)
        self.tick()
 
    def tick(self):
        """ Refresh the label and reschedule each second. Named tick()
            rather than update() to avoid shadowing Tkinter's own
            Frame.update() method.
        """
        self.time.config(text=int(time.time()))
        self.after(1000, self.tick)
 
if __name__ == "__main__":
    root = Tk()
    root.title("Timestamp-Clock")
    root.wm_attributes("-topmost", 1)
    root.resizable(0,0)
    root.minsize(300,50)
    TimestampClock(root).mainloop()

I hope you did something special for the occasion. Me? I was sitting in a computer lab at Chemeketa waiting for the clock to hit that special number. Here is my screen grab of this great event:

[Screenshot: the clock showing 1234567890]

Module wsgiref doesn’t work in Python 3.0 – How to fix it

A very large oddity regarding the newest version of Python is that the wsgiref module is completely broken. Go ahead and try to run the following:

from wsgiref.simple_server import make_server, demo_app
httpd = make_server('', 8000, demo_app)
httpd.handle_request()

You should see a nice little "ValueError: need more than 1 value to unpack" message when you try to open it in your web browser. The main ticket for this bug can be found here, and it comes with a patch! If you’re running Linux, the fix is easy. Once you’ve downloaded the patch, simply run the following from the command line in the same folder to apply it:

sudo patch < wsgiref.patch

It will prompt you for each of the files in wsgiref that need patching. Simply specify their locations and you’re done!

Update: Sources have told me that this has been fixed in Python 3.0.1, so here’s hoping.

Cross-platform file locking support in Python

On occasion, one needs to lock a file. This is relatively easy if you’re targeting a specific platform, because there is often a function in the library to do it for you. But what if you want to target a larger set of platforms? The following is a solution I wrote up today. Its lockfile creation is an atomic operation and thus doesn’t suffer from any race conditions. It should work in both Windows and Unix environments.

# Copyright (c) 2009, Evan Fosmark
# All rights reserved.
# 
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met: 
# 
# 1. Redistributions of source code must retain the above copyright notice, this
#    list of conditions and the following disclaimer. 
# 2. Redistributions in binary form must reproduce the above copyright notice,
#    this list of conditions and the following disclaimer in the documentation
#    and/or other materials provided with the distribution. 
# 
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# 
# The views and conclusions contained in the software and documentation are those
# of the authors and should not be interpreted as representing official policies, 
# either expressed or implied, of the FreeBSD Project.
 
import os
import time
import errno
 
class FileLockException(Exception):
    pass
 
class FileLock(object):
    """ A file locking mechanism that has context-manager support so
        you can use it in a with statement. This should be relatively
        cross-platform, as it doesn't rely on msvcrt or fcntl for the locking.
    """
 
    def __init__(self, file_name, timeout=10, delay=.05):
        """ Prepare the file locker. Specify the file to lock and optionally
            the maximum timeout and the delay between each attempt to lock.
        """
        self.is_locked = False
        self.lockfile = os.path.join(os.getcwd(), "%s.lock" % file_name)
        self.file_name = file_name
        self.timeout = timeout
        self.delay = delay
 
 
    def acquire(self):
        """ Acquire the lock, if possible. If the lock is in use, it checks
            again every `delay` seconds. It does this until it either gets the
            lock or exceeds `timeout` seconds, in which case it throws
            an exception.
        """
        start_time = time.time()
        while True:
            try:
                self.fd = os.open(self.lockfile, os.O_CREAT|os.O_EXCL|os.O_RDWR)
                break
            except OSError as e:
                if e.errno != errno.EEXIST:
                    raise 
                if (time.time() - start_time) >= self.timeout:
                    raise FileLockException("Timeout occurred.")
                time.sleep(self.delay)
        self.is_locked = True
 
 
    def release(self):
        """ Get rid of the lock by deleting the lockfile. 
            When working in a `with` statement, this gets automatically 
            called at the end.
        """
        if self.is_locked:
            os.close(self.fd)
            os.unlink(self.lockfile)
            self.is_locked = False
 
 
    def __enter__(self):
        """ Activated when used in the with statement. 
            Should automatically acquire a lock to be used in the with block.
        """
        if not self.is_locked:
            self.acquire()
        return self
 
 
    def __exit__(self, exc_type, exc_value, traceback):
        """ Activated at the end of the with statement.
            It automatically releases the lock if it is still held.
        """
        if self.is_locked:
            self.release()
 
 
    def __del__(self):
        """ Make sure that the FileLock instance doesn't leave a lockfile
            lying around.
        """
        self.release()

The above class is best used in a context manager fashion through the with statement like in the example below:

with FileLock("test.txt", timeout=2) as lock:
    print("Lock acquired.")
    # Do something with the locked file

The largest downside of this is that the directory the file is located in must be writable. I hope this code helps you. Of course, if you have a better recipe, please share it in the comments. ;)
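For the curious, the atomicity claim boils down to the os.O_CREAT|os.O_EXCL pair: the OS guarantees that only one caller can create the file, and everyone else gets EEXIST. A stripped-down demonstration of just that behavior (the lockfile path is hypothetical, placed in the temp dir for the demo):

```python
import errno
import os
import tempfile

# Hypothetical lockfile path for this demonstration only.
lockfile = os.path.join(tempfile.gettempdir(), "filelock_demo.lock")
if os.path.exists(lockfile):
    os.unlink(lockfile)  # clean up any stale file from a previous run

# O_CREAT|O_EXCL is atomic: creation and the existence check happen as
# one operation, so there is no check-then-create race window.
fd = os.open(lockfile, os.O_CREAT | os.O_EXCL | os.O_RDWR)
try:
    try:
        os.open(lockfile, os.O_CREAT | os.O_EXCL | os.O_RDWR)
        second_attempt_failed = False
    except OSError as e:
        second_attempt_failed = (e.errno == errno.EEXIST)
    print(second_attempt_failed)  # True while the lock is held
finally:
    os.close(fd)
    os.unlink(lockfile)
```

This is the same core that FileLock.acquire() loops over, just without the retry and timeout logic.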

rot13 in Python 3.x

As of Python 3.0, rot13 is no longer accessible via the str.encode("rot13") call. If needed, here is an implementation I pieced together:

from string import ascii_uppercase, ascii_lowercase
 
def rot13(data):
    """ A simple rot-13 encoder since `str.encode('rot13')` was removed from
        Python as of version 3.0.  It rotates both uppercase and lowercase letters individually.
    """
    total = []
    for char in data:
        if char in ascii_uppercase:
            index = (ascii_uppercase.find(char) + 13) % 26
            total.append(ascii_uppercase[index])
        elif char in ascii_lowercase:
            index = (ascii_lowercase.find(char) + 13) % 26
            total.append(ascii_lowercase[index])
        else:
            total.append(char)
    return "".join(total)

Pretty simple, right? Knowing how modulus (%) works helped greatly in finding the proper index. I hope this helps.

Update: There’s a simpler solution in the comments below. You should probably use it instead.
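I can’t reproduce the comment here, but for reference, one common shorter approach in Python 3 builds a translation table once with str.maketrans and lets str.translate do the work (a sketch, not necessarily the commenter’s version):

```python
from string import ascii_lowercase as lc, ascii_uppercase as uc

# Build the substitution table once; translate() then processes the whole
# string in one call, leaving non-letter characters untouched.
_ROT13 = str.maketrans(lc + uc, lc[13:] + lc[:13] + uc[13:] + uc[:13])

def rot13(data):
    return data.translate(_ROT13)

print(rot13("Hello, World!"))  # Uryyb, Jbeyq!
```

Like the loop version, applying it twice gives back the original string.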

Python WSGI Middleware for automatic Gzipping

I’ve just started learning Python WSGI (PEP-333) and thought the best way to learn would be to write some WSGI tools myself. Most recently, I chose to write a middleware application that converts all output into valid gzipped data. In this article, I will be demonstrating how my middleware gzipper works and how to implement it.

Continue reading ‘Python WSGI Middleware for automatic Gzipping’ »