The random rantings of a concerned programmer.

PHP Unserialization in Python, and the Mysterious str.partition

July 08th, 2009 | Category: Random

At work we’re migrating our content to a Drupal install (it’s better than the ColdFusion piece of shit currently running). Since the interface is a piece of shit and the language is a piece of shit, I find myself doing a lot of things with Python, directly manipulating the database. Which means I effectively have ended up writing my own ORM layer for it.

Drupal, like many shit PHP “applications” stores serialized blobs of data in the database. To marshal the data back and forth they use PHP’s built-in functions which use an obtuse plaintext format (rather than say, JSON, though arguably it’s for historical reasons). Anyway so I found myself needing to unserialize this shit in Python and didn’t want to code my own. So I did what any time-starved person would do and Googled for existing code.

The results were laughably bad. This one goes as far as using a fucking regex-based parser. They’re all horribly long and convoluted with absolutely massive functions which completely mask any scent of structure.

I think the grossest thing though, is that each of them (including the one on PyPI I found later, which is based off the first link above) goes through each individual character in the string looking for an instance of a specific character. They’re effectively implementing strchr, and in a very shitty way.

Python doesn’t have a strchr, but it has str.partition which is an even more useful. I can’t fatholm why none of the fucking implementations used it when it makes the code much cleaner, faster (presumably, as a built-in it’s implemented with strchr in CPython) and more concise.

Anyway tl;dr internet is filled with shit code so I wrote my own (edit: relink) which, although completely undocumented isn’t a massive fucking bloatfest.




January 16th, 2008 | Category: Random

lol, so for the first programming languages assignment we basically have to write a topological sort in 5 languages: Python, Ruby, O’Caml, C and Cool.

So I started off writing the Python implementation because I figured it’d be the easiest. Somewhere along the way though, I decided to be a retard and abuse map/reduce/lambda and ended up with some serious eye rape. It works though, so I’ll probably just turn it in:

# prepare for major eye-rapage
from sys import stdin, stdout

def get_input( Work = [] ):
    '''Builds a list of ( task, dependancy ) tuples from stdin.'''
    task = stdin.readline().strip()
    depd = stdin.readline().strip()

    if ( not task or not depd ):
        return Work
    return get_input( Work + [ ( task, depd ) ] )

def collapse_dependancies( TaskList ):
    '''Takes the tasklist and reduces it to a list where each task
       is represented exactly once in the form (Task, [Dependancies...]).'''
    # may guido forgive me
    return reduce( lambda Work, Elem: ( Work.update( { Elem[0]: Work.get( Elem[0], [] ) + [ Elem[1] ], Elem[1]: Work.get( Elem[1], [] ) } ), Work )[1], TaskList, {} )

def grab_doable_tasks( TaskList ):
    '''Takes a collapsed tasklist and returns a list of tasks which have no
    # Elem is a task key into TaskList, which is a { task : [ dependancies... ] }
    return reduce( lambda Work, Elem: ( Work + ( [], [ Elem ] )[len(TaskList[Elem]) == 0] ), TaskList, [] )

def remove_dependancy( TaskList, Dependancy ):
    '''Goes through the collapsed TaskList and removes the specified Dependancy
       from the dependancy lists of every entry.'''
    NewTaskList = {}
    for Task in TaskList.keys():
        # Could have avoided the filter/lambda with a list comprehension but <3 lambda
        NewTaskList[Task] = filter( lambda Elem: Elem != Dependancy, TaskList[Task] )
    return NewTaskList

def generate_ordering( TaskList, Work = [] ):
    '''Takes the collapsed tasklist, gets the doable tasks, sorts 'em, then
       chooses the first task alphabetically. Then removes the task
       and dependancies from the list, and loops. If it reaches a point where
       there are tasks to do but none are doable then it's a cycle, and returns
       None. Otherwise returns a list with the desired task ordering.'''
    if ( len( TaskList ) == 0 ):
        return Work
    DoableTasks = grab_doable_tasks( TaskList )
    DoableTasks.sort() # ffh, in-place sorting >_<
    if ( len( DoableTasks ) == 0 ):
        return None
    return generate_ordering( remove_dependancy( dict( [ ( Key, TaskList[Key] ) for Key in TaskList if Key != DoableTasks[0] ] ), DoableTasks[0] ), Work + [ DoableTasks[0] ] )
if ( __name__ == "__main__" ):
    l = generate_ordering( collapse_dependancies( get_input() ) )
    if ( l == None ):
        print "Cycle"
        for i in l:
            print "%s" % i

Lots of fun one-liners, which, honestly kind of violate some of rules of lambda (destructive operations, etc) but it runs so meh.

At the bottom of the assignment page, it says that they were going to add in a LOLCODE bonus implementation, but that they couldn't find a suitable interpreter/compiler. I'm kind of tempted to, for the C implementation, write a LOLCODE interpreter which reads in the topological sort from a file and executes it.

Granted, this means I haven't been working on my other projects, haha.

Comments are off for this post

Foray Embedding Pythong

December 29th, 2007 | Category: Random

I was bored today and decided to poke around with Boost.Python, because I’ve never used it before. I didn’t want to write a trivial “hello world” application, I so instead decided to write a cellular automata framework which reads in a Python script for the atomic cell update stuff.

Turns out that it’s actually really nice (once it’s working, lawds), to expose a function there’s some sickening hacks that allow this beautiful syntax -

int getCell( int x, int y ) {
	return *getCell_( x, y );

void setCell( int x, int y, int v ) {
	*getCell_( x, y ) = v;

	def( "getCell", getCell );
	def( "setCell", setCell );

Loading the python script from file wasn’t too hard. The tutorial code I was using was a little bit off – the tutorial code needs to pass in the namespace for both the global and local variable dictionaries otherwise the interpreter shits all over itself (at least it did for me, running Python2.5 etc). Anyway, the code to actually create the main namespace for the interpreter and load in the script from a file looks like this -

object main_module, main_namespace;
try {
	main_module = import( "__main__" );
	main_namespace = main_module.attr( "__dict__" );
	exec_file( argv[4], main_namespace, main_namespace );
catch( error_already_set const& ) {
	running = false;

Which gets used to (in the debugger) load the following Python script:

import CA
import random

g_neighbors = []
g_width = 0
g_height = 0

def init( width, height ):
	globals()["g_width"] = width
	globals()["g_height"] = height
	for y in xrange( 0, height ):
		for x in xrange( 0, width ):
			CA.setCell( x, y, random.randint( 0, 1 ) )
			globals()["g_neighbors"].append( 0 )

def precalc():
	for y in xrange( 0, g_height ):
		for x in xrange( 0, g_width ):
			g_neighbors[ y * g_width + x ] = 0
	for y in xrange( 0, g_height ):
		for x in xrange( 0, g_width ):
			if CA.getCell( x, y ):
				for iy in xrange( -1, 2 ):
					for ix in xrange( -1, 2 ):
						if ( ix or iy ):
							g_neighbors[ ( ( y + iy ) % g_height ) * g_width + ( ( x + ix ) % g_width )] += 1

def update( x, y ):
	#count = 0
	#for iy in xrange( -1, 2 ):
	#	for ix in xrange( -1, 2 ):
	#		if ( ix or iy ):
	#			count += CA.getCell( x + ix, y + iy )
	count = g_neighbors[ y * g_width + x ]

	if CA.getCell( x, y ):
		if count < 2:
			return 0
		elif count > 3:
			return 0
			return 1
		if count == 3:
			return 1
			return 0

Calling the functions from within the C++ application is fairly straightforward, because the generic PyObject wrapper that Boost.Python provides defines an overloaded operator(), which makes calling the grabbed functions really easy:

object init = main_namespace["init"];
init( screen_width, screen_height );

Despite the ease with which I was able to embed Python into my application with Boost.Python, there was a significant performance overhead for doing so. The Python implementation of Conway’s Life performed approximately 4x slower than the C++ implementation. While for many applications the advantages of writing the logic in Python may far outweigh performance losses, for cellular automata it just isn’t feasible, since the logic is the primary bottleneck for the simulation.

Fun to tinker with, nonetheless. The source and binary can be obtained here (165KB).