Archive for the 'Python' Category

Using **kwargs with CherryPy and WTForms

Sunday, June 23rd, 2013

I just ran into a problem when using **kwargs as a catch-all parameter in CherryPy and passing the data to a form built with WTForms. Like this:


def index(self, **kwargs):
    sf = forms.SearchForm(kwargs)

This leads to an error:

TypeError: formdata should be a multidict-type wrapper that supports the 'getlist' method

The reason is that WTForms expects something that supports the ‘getlist’ method, while kwargs is a plain dictionary and has no ‘getlist’ method. The solution I found was to subclass dict and just add the getlist method. As far as I know, getlist refers to a method in the official cgi module, and the cgi module documentation says that

This method always returns a list of values associated with form field name. The method returns an empty list if no such form field or value exists for name. It returns a list consisting of one item if only one such value exists.

So, a simple implementation of a subclass of dict that just adds the getlist method looks like this:


class InputDict(dict):
    def getlist(self, arg):
        # always return a list, also when the dict is empty
        if arg not in self:
            return []
        value = self[arg]
        if isinstance(value, list):
            return value
        return [value]

And we can use it to fill our form:


def index(self, **kwargs):
    sf = forms.SearchForm(InputDict(kwargs))
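
To check that the subclass behaves the way the cgi documentation describes, here is a small self-contained sketch. It repeats a version of InputDict so that it runs on its own, without CherryPy or WTForms; the field names q and tags are made up for the example.

```python
class InputDict(dict):
    """A dict with the getlist method that WTForms asks for."""

    def getlist(self, arg):
        if arg not in self:
            return []              # no such field
        value = self[arg]
        if isinstance(value, list):
            return value           # several values stored under one name
        return [value]             # wrap a single value in a list


# 'q' and 'tags' are hypothetical field names
data = InputDict({"q": "trout", "tags": ["fly", "lake"]})
print(data.getlist("q"))
print(data.getlist("tags"))
print(data.getlist("missing"))
```

All three calls return a list, which is the only thing the form needs to rely on.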

A cache module for Python CGI scripts

Tuesday, October 14th, 2008

After an earlier failed attempt at writing a cache module for Python CGI scripts, a not-so-nice email from one of my web hosts made me try again. They also mentioned that they have no plans for enabling Apache’s mod_cache module. I suspect that the pickle module somehow messed up my previous attempt, leaving me with no other choice than to disable the cache and burden the server more than needed.

The building blocks for a flat-file cache module are a unique mapping from url to filename and a place to store the files. An md5 hash creates a sufficiently unique mapping and a directory is a nice place to store files. We also need a time limit (in seconds) so the web pages are not stored forever.

The complete module


"""A module that writes a webpage to a file so it can be restored at a later time
Interface:
filecache.write(...)
filecache.read(...)
"""

import time
import os
import md5

def key(url):
    k = md5.new()
    k.update(url)
    return k.hexdigest()

def filename(basedir, url):
    return "%s/%s.txt"%(basedir, key(url))

def write(url, basedir, content):
    """ Write content to cache file in basedir for url"""
    fh = file(filename(basedir, url), mode="w")
    fh.write(content)
    fh.close()

def read(url, basedir, timeout):
    """Read cached content for url in basedir if it is fresher than timeout (in seconds)"""
    fname = filename(basedir, url)
    content = ""
    if os.path.exists(fname) and (os.stat(fname).st_mtime > time.time() - timeout):
        fh = open(fname, "r")
        content = fh.read()
        fh.close()
    return content
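
The module above is written for Python 2. The md5 module and the file() built-in are gone in Python 3, so here is a rough sketch of the same idea with hashlib. The function name cache_path and the choice of UTF-8 are my assumptions for the sketch, not part of the original module.

```python
import hashlib
import os
import time


def cache_path(basedir, url):
    """Map a url to a file name inside basedir via its md5 hex digest."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return os.path.join(basedir, digest + ".txt")


def write(url, basedir, content):
    """Write content to the cache file for url."""
    with open(cache_path(basedir, url), "w", encoding="utf-8") as fh:
        fh.write(content)


def read(url, basedir, timeout):
    """Return cached content for url if it is fresher than timeout seconds, else ""."""
    path = cache_path(basedir, url)
    try:
        fresh = os.stat(path).st_mtime > time.time() - timeout
    except OSError:
        return ""                  # no cache file for this url
    if not fresh:
        return ""                  # cache file is too old
    with open(path, "r", encoding="utf-8") as fh:
        return fh.read()
```

The interface is the same as before: write(url, basedir, content) and read(url, basedir, timeout), with an empty string meaning "not cached".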

A minimal example, including time measurement

Instead of explaining what the functions are doing, I hope they are fairly understandable and that a usage example is sufficient for understanding how it works. As a bonus, the example includes timing so you can see how long it takes to build your pages from scratch as opposed to reading from cache.


import time
startTime = time.clock()

import sys
import os
import filecache

cache_timeout = 10
cache_basedir = "cache"

cache = filecache.read(os.environ.get("REQUEST_URI", ""), cache_basedir, cache_timeout)
if cache:
    print cache
    # elapsed time as an HTML comment at the bottom of the page source
    print "<!-- %.3f seconds -->" % (time.clock() - startTime)
    sys.exit()

# generate output
output = "stuff"

# write output to cache
filecache.write(os.environ.get("REQUEST_URI", ""), cache_basedir, output)
print output
print "<!-- %.3f seconds -->" % (time.clock() - startTime)

Store the example as example.py, the cache module as filecache.py and create a directory named cache. Run as


python example.py

Note that the timeout is set very low, at 10 seconds. This is fine for testing but not much more.

While this very minimal example is slower when the output is fetched from cache, I can assure you that this is not the case with more realistic web pages. In my case, I have experienced speedups from around 0.7 seconds to hardly measurable time (0.00 to 0.01 seconds). This does not include the time needed to start the Python interpreter and import the time module, so a very popular site might still get you in trouble with your web host. I think the mod_cache module for Apache would take care of that too, but that wasn’t available in my case.

There is no way to remove the cache other than a rm * or similar in the cache directory. It works for me but probably not for a very dynamic site.
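
If you want something more selective than rm *, a small clean-up function is one option. This is only a sketch: purge is a hypothetical helper that is not part of filecache.py, and it assumes the cache files end in .txt as in the module above.

```python
import os
import time


def purge(basedir, timeout):
    """Delete cache files in basedir older than timeout seconds; return how many."""
    cutoff = time.time() - timeout
    removed = 0
    for name in os.listdir(basedir):
        path = os.path.join(basedir, name)
        # only touch regular .txt files, leave everything else alone
        if not (name.endswith(".txt") and os.path.isfile(path)):
            continue
        if os.stat(path).st_mtime <= cutoff:
            os.remove(path)
            removed += 1
    return removed
```

Calling purge("cache", 3600) from a cron job would remove everything older than an hour; purge("cache", 0) should empty the cache completely.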

This cache module is used in production at Good Web Hosting Info with a timeout of one hour. The time measurement is shown at the bottom of the source code. It’s not a very busy site so it’s quite likely to get a page built from scratch if you look beyond the front page. The time precision is 1/100 second so cached pages normally have 0.000 or 0.010 seconds.

The python files are also available from here.

Search engine friendly urls with CherryPy’s default method

Thursday, December 29th, 2005

My fishing directory hasn’t been a success at the search engines and I discovered that most of the pages were listed as Supplemental Results in Google. One reason for that can be urls with question marks and multiple parameters. So I tried to figure out how to implement search engine friendly urls. Apache’s mod_rewrite is a common method for this but it’s just very cryptic. I implemented a search engine friendly url system for an earlier version of CherryPy (0.x) with a filter, but filters are bound to be more complicated than the solution I found.

CherryPy has a default method that will be called in case of a partial match. It is explained in the CherryPy tutorial (scroll down to “Partial matches and the default method”). So, just define a method named default with a suitable number of parameters and expose it. With a few lines of Python code I could transform urls like DOMAIN/?node1=10&node2=20 into DOMAIN/10/20 and DOMAIN/website?wid=123 into DOMAIN/website/123:


  def default(self, node1=0, node2=0, pagenr=1, perpage=10):
    if node1 == "website":
      return self.website(node2)
    else: #index method
      if node1 == 0 or node1== "0":
        node1 = ""
      if node2 == 0 or node2 == "0":
        node2 = ""
      return self.index(node1, node2, pagenr, perpage)
  default.exposed = True

The zero-checking is there so I can use DOMAIN/0/10 etc. for urls where node1 isn’t specified and still keep compatibility with the current index method.
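
To see the mapping without running CherryPy, here is a self-contained sketch. The Root class with its index and website methods and the call_default helper are stand-ins I made up; call_default only imitates the part of the framework that hands the url path segments to default as positional arguments.

```python
class Root:
    def index(self, node1="", node2="", pagenr=1, perpage=10):
        return "index(%r, %r, page %s, %s per page)" % (node1, node2, pagenr, perpage)

    def website(self, wid):
        return "website(%r)" % wid

    def default(self, node1=0, node2=0, pagenr=1, perpage=10):
        if node1 == "website":
            return self.website(node2)
        # treat 0 as "not specified", like the zero-checking above
        if node1 in (0, "0"):
            node1 = ""
        if node2 in (0, "0"):
            node2 = ""
        return self.index(node1, node2, pagenr, perpage)


def call_default(root, path, **query):
    """Hypothetical stand-in for the framework: path segments become positional args."""
    segments = [s for s in path.split("/") if s]
    return root.default(*segments, **query)


print(call_default(Root(), "/10/20"))
print(call_default(Root(), "/0/10", pagenr="2"))
print(call_default(Root(), "/website/123"))
```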

I chose to keep perpage and pagenr as normal query string parameters to keep it simple and not overwhelm search engines with a massive amount of pages with friendly urls. The parameter list is adapted to my index method so it is a little hackish to use it for the website method, but it’s simple and it works. Now I just have to change all the output methods and ideally add something for 301 redirecting the old-style urls to the new-style urls to get the old urls out of the search engines as fast as possible.
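
For the redirect part, the first thing needed is a function that turns an old-style url into its new-style equivalent. A possible sketch, using only the standard library; friendly_url is a hypothetical name, and sending the actual 301 response is left to the framework.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def friendly_url(old_url):
    """Translate an old-style url (query parameters) into the new path style."""
    parts = urlsplit(old_url)
    # keep the first value of each query parameter
    params = {k: v[0] for k, v in parse_qs(parts.query).items()}
    if parts.path == "/website":
        path = "/website/%s" % params.pop("wid")
    else:
        # 0 stands for "not specified", matching the default method
        node1 = params.pop("node1", "0")
        node2 = params.pop("node2", "0")
        path = "/%s/%s" % (node1, node2)
    # pagenr and perpage stay as normal query string parameters
    rest = urlencode(sorted(params.items()))
    return path + ("?" + rest if rest else "")


print(friendly_url("/?node1=10&node2=20"))
print(friendly_url("/website?wid=123"))
print(friendly_url("/?node2=10&pagenr=2"))
```

The sketch returns only the path and query string, and it raises KeyError for a /website url without a wid parameter, which a real handler would have to deal with.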

Adding columns to Sqlite tables behind SQLObject

Thursday, December 29th, 2005

So I finally got some time on the internet (stupid ISP, bad luck etc..) and one of the tasks to do was to add a feed column to the website table in my fishing directory. Since I am using Sqlite 2.8 (?) no ALTER TABLE construct is available (added with Sqlite 3) and SQLObject doesn’t make it any easier.

I did the same change while changing the rest of the code for this (so it should be a quick task once I got online and had access to the updated database) but it was a little complicated. It involved running two simultaneous SQLObject databases with one containing the extra column for feed urls and writing a special function for copying between the databases. It was in fact so complicated that I didn’t remember exactly how I did it and SQLObject started complaining about classes that were already in the class registry. And that class was part of a module I wasn’t importing this time. (Insert extremely frustrated smiley here).

This led me to explore how this could be done with Sqlite. The recipe is quite simple. The sqlite database is in the olddatabase.db file and I will create a file named newdatabase.db for the new database.


C:\>sqlite olddatabase.db
sqlite> .output sqlitedump.sql
sqlite> .dump
sqlite> .quit

Now I had a dump of the freshest database, without the feed column, in sqlitedump.sql. Adding the feed column was as easy as opening sqlitedump.sql and inserting a new column into the SQL table definition. The second step was to search-and-replace ); with ,''); inside the SQL for inserting the websites, to account for the extra column, and then load the edited dump into the new database:


C:\>sqlite newdatabase.db < sqlitedump.sql

The only things left were to add the feed column to the SQLObjectified Website class definition and point the database connection to the new database. Very simple compared to the SQLObject approach I had used before.
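
With Sqlite 3, where ALTER TABLE is available, the column can be added in place instead of going through a dump. This is a different technique from the recipe above, shown here as a minimal sketch with Python's sqlite3 module and an in-memory database. Only the website table and the feed column come from this post; the url column and the example row are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# a stand-in for the old table, without the feed column (url column is assumed)
conn.execute("CREATE TABLE website (id INTEGER PRIMARY KEY, url TEXT)")
conn.execute("INSERT INTO website (url) VALUES ('http://example.com/')")

# add the new column; existing rows get the default value
conn.execute("ALTER TABLE website ADD COLUMN feed TEXT DEFAULT ''")

rows = conn.execute("SELECT url, feed FROM website").fetchall()
print(rows)
conn.close()
```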

Case study: A Python-based CMS in a low-cost hosting environment

Friday, May 13th, 2005

I recently replaced the simple (PHP-based) backend for my Good Web Hosting Info site with something written in Python that at least deserves the CMS tag.

I decided to use CGI for this because it’s easiest to find web hosting for that. And if I want to start more websites I can just plug the CMS into a budget hosting account. If I later want to use a better framework (like CherryPy) I can easily port it because it is in fact structured the way I have learned to structure CherryPy applications. The major difference is that I needed to write my own url-to-function code.

What it does

From the web interface I can add, edit and delete categories and articles, adjust the ordering of the categories and set a few configuration options, including page caching. The CMS also produces RSS feeds for all categories and writes them to static files.

How I built it

I started with a template manager and a view class that I have developed for CherryPy applications but they can just as well be used with any other framework. These two classes take all the stress out of building the interface and all that is left is to place the templates in the correct files. They are adapted to Cheetah but they can easily be modified to use most other template modules. I got a page up with PXTL within a few hours. Not bad considering that multiple templates are used for one page and this was my first time using PXTL.

With the view component out of the way I could dive straight into the database module. I started with the SnakeSQL pure-Python database and it worked well for what I told it to do even if it is not of production quality yet. What stopped me from progressing with SnakeSQL is that it can’t store strings longer than 256 characters.

Since hosting availability was an important factor, the obvious database choice was MySQL. It doesn’t have all the features of Postgres and Firebird, but this is a simple CMS with only two database tables so hosting availability is a far more important factor than stored procedures etc.

I have written my own module for form handling. It’s called hbform and represents each form with a Python class. It helps with validating user input, filling and manipulating forms, and generates xhtml code for the form controls. It can generate code for the whole form, or work together with a templating solution like Cheetah. I plan to release it some day…

CGI and efficiency

Besides the fact that the Python interpreter needs to be started for every request there are other factors that slow down CGI-based programs. First, each and every module needs to be reloaded. And Cheetah templates are compiled to Python classes so it’s smart to cache the compiled templates in a way. This turned out to be very hard with CGI.

With a persistent application my template manager stores the templates in an in-memory dictionary. The obvious CGI alternative was to pickle the dictionary with the templates but that didn’t work. I don’t remember the exact error but it was something with a function type used by Cheetah that is not picklable. Bummer!

Generating a page from scratch takes around 0.5 seconds and a little inaccurate profiling revealed that about 90% of this was used for template parsing. So I thought caching the templates was the key to speed. After struggling with this I managed to write the generated Python classes for the templates to the file structure and import them almost like regular Python modules. This resulted in a small speed increase (~0.1 second). I also tried to use non-compiled templates in the form of PXTL but that was even slower.

I was disappointed by this so I did more profiling and it turned out that ~0.1 second was originally used for template parsing and most of this was eliminated by template caching. But loading the Cheetah module took ~0.15 seconds. Strange, but I realized that caching the generated pages was a better approach. A few lines of code for pickling pages and the time to deliver a cached page is ~0.01 second. Problem solved :-)

Hosting environment compatibility

Good Web Hosting Info is hosted by Site5 and they have Python 2.2.2 and the MySQLdb module installed. I run 2.4.1 at home so a few compatibility problems were to be expected. I experienced three, involving the datetime module that was introduced with 2.3, MySQLdb’s executemany function and a warning about different versions of the C interface used for the NameMapper module in Cheetah. Could have been worse.

I did a user-level install of Cheetah. Maybe Site5 would have installed it for me but I didn’t ask. I just copied the src directory of the Cheetah download and placed it in the base directory for my CMS. I removed namemapper.pyd (C code) to avoid warnings in the error log about different versions of the C interface. There is a fallback for the C component written in Python so this only leads to a little less speed. This installation procedure is not described in the Cheetah documentation but it has worked so far.

Conclusion

With the help of some decent modules, developing my own simplistic, efficient and usable CMS in Python, suitable for a low-cost hosting environment, wasn’t too hard. I get exactly the functionality I want and if I need something else it’s easy to add since I know the code inside out. It’s also pleasing to run my own website on my own CMS.

Even if the Python support in budget hosting accounts normally is far from ideal, it’s often sufficient for typical web applications. In my case, Python 2.2.2 and MySQLdb were all that was supplied by the web host but that is just what I needed. If you develop with a fresh Python version (2.4.1) the incompatibilities don’t have to be a problem.

Ps!

If you just want better Python support in a shared hosting environment, try a more specialized hosting provider like Python-hosting.com or GrokThis.net.