A cache module for python CGI scripts

October 14th, 2008

My first attempt at writing a cache module for Python CGI scripts failed. I suspect that the pickle module somehow messed it up, leaving me with no choice but to disable the cache and burden the server more than needed. A not-so-nice email from one of my web hosts, in which they mentioned that they have no plans to enable Apache’s mod_cache module, made me try again.

The building blocks for a flat-file cache module are a unique mapping from URL to filename and a place to store the files. An md5 hash creates a sufficiently unique mapping and a directory is a nice place to store files. We also need a time limit (in seconds) so the web pages are not stored forever.

The complete module


"""A module that writes a webpage to a file so it can be restored at a later time
Interface:
filecache.write(...)
filecache.read(...)
"""

import time
import os
import md5

def key(url):
    k = md5.new()
    k.update(url)
    return k.hexdigest()

def filename(basedir, url):
    return "%s/%s.txt" % (basedir, key(url))

def write(url, basedir, content):
    """Write content to cache file in basedir for url"""
    fh = file(filename(basedir, url), mode="w")
    fh.write(content)
    fh.close()

def read(url, basedir, timeout):
    """Read cached content for url in basedir if it is fresher than timeout (in seconds)"""
    fname = filename(basedir, url)
    content = ""
    if os.path.exists(fname) and (os.stat(fname).st_mtime > time.time() - timeout):
        fh = open(fname, "r")
        content = fh.read()
        fh.close()
    return content
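
The module above targets Python 2: the md5 module and the file() builtin no longer exist in Python 3. A rough Python 3 equivalent might look like the sketch below. It keeps the same function names and assumes hashlib for the hash and UTF-8 text content; neither assumption comes from the original module.

```python
"""Python 3 sketch of the flat-file cache (same interface as filecache)."""
import hashlib
import os
import time


def key(url):
    # md5 is only used to derive a file name here, not for security
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def filename(basedir, url):
    return os.path.join(basedir, key(url) + ".txt")


def write(url, basedir, content):
    """Write content to the cache file in basedir for url."""
    with open(filename(basedir, url), "w", encoding="utf-8") as fh:
        fh.write(content)


def read(url, basedir, timeout):
    """Return cached content for url if fresher than timeout seconds, else ""."""
    fname = filename(basedir, url)
    try:
        fresh = os.stat(fname).st_mtime > time.time() - timeout
    except OSError:
        # No cache file (or it is unreadable): treat as a cache miss
        return ""
    if not fresh:
        return ""
    with open(fname, "r", encoding="utf-8") as fh:
        return fh.read()
```

Catching OSError around os.stat replaces the separate os.path.exists check, so a file that disappears between the two calls is simply a cache miss.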

A minimal example, including time measurement

Rather than explain each function, I hope they are fairly self-explanatory and that a usage example is enough to show how the module works. As a bonus, the example includes timing so you can see how long it takes to build your pages from scratch as opposed to reading from cache.


import time
startTime = time.clock()

import sys
import os
import filecache

cache_timeout = 10
cache_basedir = "cache"

cache = filecache.read(os.environ.get("REQUEST_URI", ""), cache_basedir, cache_timeout)
if cache:
    print cache
    # elapsed time (format assumed: seconds with three decimals, in an HTML comment)
    print "<!-- %.3f -->" % (time.clock() - startTime)
    sys.exit()

# generate output
output = "stuff"

# Write output to cache
filecache.write(os.environ.get("REQUEST_URI", ""), cache_basedir, output)
print output
print "<!-- %.3f -->" % (time.clock() - startTime)

Store the example as example.py, the cache module as filecache.py and create a directory named cache. Run as


python example.py

Note that the timeout is set very low, at 10 seconds. This is fine for testing but not much more.

While this very minimal example is slower when the output is fetched from cache, I can assure you that this is not the case with more realistic web pages. In my case, I have experienced speedups from around 0.7 seconds to hardly measurable time (0.00 to 0.01 seconds). This does not include the time needed to start the Python interpreter and import the time module, so a very popular site might still get you in trouble with your web host. I think the mod_cache module for Apache would take care of that too, but that wasn’t available in my case.

There is no way to remove the cache other than a rm * or similar in the cache directory. It works for me but probably not for a very dynamic site.
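
Stale files could also be cleaned up from a cron job instead of by hand. The following is a minimal sketch, assuming the cache directory holds only the .txt files written by the module. Here purge is a hypothetical helper name and is not part of the module above.

```python
import os
import time


def purge(basedir, timeout):
    """Delete cache files in basedir older than timeout seconds; return the count."""
    limit = time.time() - timeout
    removed = 0
    for name in os.listdir(basedir):
        path = os.path.join(basedir, name)
        # Only touch regular files that look like cache entries
        if not name.endswith(".txt") or not os.path.isfile(path):
            continue
        if os.stat(path).st_mtime <= limit:
            os.remove(path)
            removed += 1
    return removed
```

Calling purge("cache", 3600) would remove every cache file that is at least an hour old and leave fresher ones alone.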

This cache module is used in production at Good Web Hosting Info with a timeout of one hour. The time measurement is shown at the bottom of the source code. It’s not a very busy site so it’s quite likely to get a page built from scratch if you look beyond the front page. The time precision is 1/100 second so cached pages normally have 0.000 or 0.010 seconds.

The python files are also available from here.

Using SBCL for Common Lisp shell scripting

October 5th, 2008

I have previously developed some Common Lisp shell scripts with Emacs/Slime/SBCL and used Clisp for running the scripts. Then I ran into a compatibility problem between SBCL and Clisp while developing a script that maintains an automatic mirror of my music collection, where flac files are converted to much smaller ogg files. So I decided I might as well do what has to be done to run the script with SBCL directly.

Prepping SBCL

As was mentioned in a comment in my previous post about using Common Lisp for shell scripting, the SBCL manual outlines a piece of code that must be added to an initialization file (I have added it to my $HOME/.sbclrc file).

After adding that to .sbclrc, the next step is to add a shebang line to my script


#!/usr/bin/sbcl --noinform

Adapting for development

The problem now is that the code that is needed for running the script will also be executed while compiling the file inside Slime. I found that this can be fixed by inspecting the *posix-argv* variable. Inside Slime this list has one entry ("/usr/bin/sbcl"); when the script is executed from the command line it has at least two (the path to the SBCL interpreter and the path to your script, followed by any command line arguments to the script). So if *posix-argv* is longer than 1, we can execute the script. One caveat: if you are inside a package, *posix-argv* is not directly available, so we must use sb-ext:*posix-argv* instead. This leads us to the following file structure:


#!/usr/bin/sbcl --noinform

;; Lisp code (defuns, defclasses, whatever)

;; If run from command line, the following test will succeed
(when (> (length sb-ext:*posix-argv*) 1)
  ;; Code that is executed from the command line only goes here
  )

Now, the script can be developed as usual in Slime and be executed from the command line without changing anything. This also means that the command line arguments to the script are found from the third position onwards in sb-ext:*posix-argv*.

Script details

The complete script.

This loops through my music directory and creates a parallel directory structure where flac files are converted to ogg files and other file types are just hard linked to the original file. Soft links do not work, because when you try to copy the files to your portable music player, you get a soft link/permission error instead of getting your actual music file.
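
The script itself is Common Lisp. To make the mirroring logic easier to follow, here is a rough Python sketch of the same idea. It only plans the work and returns a list of actions; it does not run sox or create any links. The names plan_mirror and the action tuples are hypothetical and not taken from the script.

```python
import os


def plan_mirror(basedir, targetdir):
    """Return a list of (action, source, destination) tuples for a mirror.

    flac files are marked for conversion to ogg; every other file is
    marked for hard linking so it takes no extra disk space.
    """
    actions = []
    for dirpath, _dirnames, filenames in os.walk(basedir):
        # Path of the current directory relative to the music root
        rel = os.path.relpath(dirpath, basedir)
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            stem, ext = os.path.splitext(name)
            if ext.lower() == ".flac":
                dst = os.path.join(targetdir, rel, stem + ".ogg")
                actions.append(("convert", src, os.path.normpath(dst)))
            else:
                dst = os.path.join(targetdir, rel, name)
                actions.append(("link", src, os.path.normpath(dst)))
    return actions
```

A "link" action would then be carried out with os.link(src, dst), and a "convert" action by handing the pair of paths to sox.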

The flac to ogg conversion is done with sox, so you need to install sox in addition to sbcl and cl-asdf to run this script.

Script usage

General:
./converter.lisp basedir targetdir quality
Example:
./converter.lisp /media/sda4/musikk/ /media/sda4/ogg-musikk/ 5
Note: The trailing / in the directory paths is required; the script does not work without it.

Portability hints for Clisp

The main problem I ran into with Clisp is different behaviour in the directory function. This can probably be fixed by using the complete pathname library from PCL chapter 15. The shebang line must be changed to, for example,
#!/usr/bin/env clisp
and command line arguments are available in ext:*args*.

More hints for porting shell scripts to other CL implementations are available in the Common Lisp Cookbook.

How to fix CSS problems in Internet Explorer

February 24th, 2008

Internet Explorer is a major source of frustration for web developers because
of its incomplete and buggy CSS support. Luckily, there are quite simple fixes
for many of the problems.

The solutions

By adding css rules that logically should not have any effect (and will have no effect in sane web
browsers) you can make IE behave like it should. The following rules are my common tricks for doing this.


line-height: 1.25;

zoom: 1;

position: relative;

In addition, there are 2 more tricks that might help:

  • Invisible bottom border for bleeding background images.
  • Putting absolutely positioned elements inside relatively positioned elements at the bottom of the relatively positioned element.

Examples and explanations

The line-height trick is used when the content in a floated element does not appear until you scroll it off screen and back. Which value you specify for line-height does not matter; 1.25 is just what I think is closest to browser default values. This is a typical error in IE6 and does not happen that often in IE7.

For things like disappearing background images and incorrectly (absolutely) positioned elements, the position: relative and zoom: 1 combo often solves the problem.
You don’t always need both of them. Try them one by one and then in combination if you don’t see any effect. Note: zoom is a proprietary property that is implemented in IE only so it will invalidate your stylesheet.

Sometimes, a vertically repeated background image bleeds out of its element. This can be stopped by adding a bottom border on the element. You can use the background color of the containing element as border color so the extra border won’t be visible. This problem might also be solved with the zoom trick.

A more complicated problem sometimes happens when you have absolutely positioned elements inside a relatively positioned element. The absolutely positioned elements will then not appear no matter which of the above-mentioned tricks you apply. You must then edit your html code and place your absolutely positioned elements at the bottom of your relatively positioned element (just before the closing tag). I don’t remember having this problem on anything but absolutely positioned images but you might have other experiences.

Using Opera’s web search capabilities for easy validation

July 31st, 2007

Opera has a quite nice feature that allows you to add arbitrary search engines to its built-in selection of search engines, available from a dedicated search field or from the address field with special keywords. By using this feature you can also add easy access to online validators for HTML, CSS and RSS/Atom feeds.

Adding a search engine

All you have to do to add a new search engine is to right click in that engine’s search field and select Create search from the context menu. You can now modify the name for that search and add a keyword you can use to access the new search from the address field.

By default, by typing g foo in the address field, you use Google to search for foo. y foo does the same, but with Yahoo instead of Google.

Adding validators

Since Opera just stores data for the search form and submits an appropriate GET/POST request this is not really limited to searching. You can use the same technique to submit any form.

Just right click in the url field for a validator of some sort, for example the Feed Validator, and create a search. If you enter feed in the keyword field, you can type or paste, for example, feed http://www.xhbml.com/feed/ into the address field in Opera and you will instantly validate the RSS feed for this website.

Opera will also store other data for the form so you can store a customized validation for W3’s html validator that might suit you better than the one that is available from Opera’s context menu. By doing the same for W3’s CSS validator you can have RSS/Atom, HTML and CSS validation instantly available from the address field.

Examples

You can type/paste any of the following (without the list marker) into your address field to try it out after completing the instructions above.

  • html http://www.xhbml.com
  • css http://www.xhbml.com
  • css http://www.xhbml.com/wp-content/themes/default/style.css
  • feed http://www.xhbml.com/feed/

Learning Common Lisp by using it for shell scripting

May 13th, 2007

For quite a while now, Lisp, or more specifically, Common Lisp has been on my list of languages to learn.
Lack of time and suitable projects for learning has put it off for quite a while. And, while Lisp can be a neat
language for almost everything, a significant effort is required for getting up to sufficient speed for programming a typical web application.

Getting the idea

A post I found on comp.lang.lisp raised the subject of using Common Lisp for shell scripting. So, I thought that shell scripting would be perfect for learning Common Lisp. Small programs that are done quickly and are usable are a lot more motivating
than dabbling with a small part of a normal application and not getting anywhere near a finished project.

So far, it’s been pretty rewarding. I have written a small backup utility (script source) that copies a file and adds a timestamp to the filename and a script for reminding me every time one of my domain names
closes in on its expiration date (script source). Variable binding with let and multiple-value-bind, the
format directive and date handling are the most important lessons I have learned from these scripts.

Shell scripting quirks

Compared to a normal Common Lisp program, there are two things to notice. First, the shebang line
that is required for use as a shell script (#!/usr/bin/env clisp). It causes a syntax error in Lisp so I
comment it out for development. Second, the whole script is fired with a main function that takes no
arguments. The main call is also commented out for development. I have basically followed the model
from Lars Rune Nøstdal’s example script from the mentioned CLL thread
and I think the idea behind this approach is that all functions can be developed and tested
independently. When going from development to executable shell script, all that is needed is to uncomment the shebang line and the call to the main function.

You can probably use any Common Lisp implementation for this. Clisp is quite small so it works well for shell scripting and is what I have used for my scripts.

Script usage

Usage of the backup script (after making the script executable with chmod +x clbackup.lisp):

harald@semmentjern:~/prog/lisp$ mkdir test
harald@semmentjern:~/prog/lisp$ touch test/somefile.txt
harald@semmentjern:~/prog/lisp$ ls test
somefile.txt
harald@semmentjern:~/prog/lisp$ ./clbackup.lisp test/somefile.txt
Copying file test/somefile.txt to test/somefile.txt.20070513154857
harald@semmentjern:~/prog/lisp$ ls test
somefile.txt  somefile.txt.20070513154857
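
The backup script is written in Common Lisp. The naming scheme seen in the output above (the original name plus a year-to-second timestamp) can be sketched in Python as follows. The names backup_name and backup are hypothetical, and the timestamp layout is inferred from the example output rather than taken from the script source.

```python
import shutil
from datetime import datetime


def backup_name(path, now=None):
    """Return path with a YYYYMMDDHHMMSS timestamp appended."""
    now = now or datetime.now()
    return "%s.%s" % (path, now.strftime("%Y%m%d%H%M%S"))


def backup(path):
    """Copy path to a timestamped sibling file and return the new name."""
    target = backup_name(path)
    shutil.copy2(path, target)
    print("Copying file %s to %s" % (path, target))
    return target
```

Passing the time in as an argument keeps backup_name easy to test, since the result no longer depends on the clock.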

Usage of the domain alert script (with a 150 days limit):

harald@semmentjern:~/prog/lisp$ ./domainalert.lisp 150
goodwebhosting.info expires in 144 days
flaks.net expires in 150 days