PHP Unserialization in Python, and the Mysterious str.partition
At work we’re migrating our content to a Drupal install (it’s better than the ColdFusion piece of shit currently running). Since the interface is a piece of shit and the language is a piece of shit, I find myself doing a lot of things with Python, directly manipulating the database. Which means I effectively have ended up writing my own ORM layer for it.
Drupal, like many shit PHP “applications” stores serialized blobs of data in the database. To marshal the data back and forth they use PHP’s built-in functions which use an obtuse plaintext format (rather than say, JSON, though arguably it’s for historical reasons). Anyway so I found myself needing to unserialize this shit in Python and didn’t want to code my own. So I did what any time-starved person would do and Googled for existing code.
The results were laughably bad. This one goes as far as using a fucking regex-based parser. They’re all horribly long and convoluted with absolutely massive functions which completely mask any scent of structure.
I think the grossest thing though, is that each of them (including the one on PyPI I found later, which is based off the first link above) goes through each individual character in the string looking for an instance of a specific character. They’re effectively implementing strchr, and in a very shitty way.
Python doesn’t have a strchr, but it has str.partition which is an even more useful. I can’t fatholm why none of the fucking implementations used it when it makes the code much cleaner, faster (presumably, as a built-in it’s implemented with strchr in CPython) and more concise.
Anyway tl;dr internet is filled with shit code so I wrote my own (edit: relink) which, although completely undocumented isn’t a massive fucking bloatfest.
Fuck.
14 comments(Untitled)
lol, so for the first programming languages assignment we basically have to write a topological sort in 5 languages: Python, Ruby, O’Caml, C and Cool.
So I started off writing the Python implementation because I figured it’d be the easiest. Somewhere along the way though, I decided to be a retard and abuse map/reduce/lambda and ended up with some serious eye rape. It works though, so I’ll probably just turn it in:
#!/bin/FORCED_INDENTATION_OF_THE_CODE
# prepare for major eye-rapage
from sys import stdin, stdout
def get_input( Work = [] ):
'''Builds a list of ( task, dependancy ) tuples from stdin.'''
task = stdin.readline().strip()
depd = stdin.readline().strip()
if ( not task or not depd ):
return Work
return get_input( Work + [ ( task, depd ) ] )
def collapse_dependancies( TaskList ):
'''Takes the tasklist and reduces it to a list where each task
is represented exactly once in the form (Task, [Dependancies...]).'''
# may guido forgive me
return reduce( lambda Work, Elem: ( Work.update( { Elem[0]: Work.get( Elem[0], [] ) + [ Elem[1] ], Elem[1]: Work.get( Elem[1], [] ) } ), Work )[1], TaskList, {} )
def grab_doable_tasks( TaskList ):
'''Takes a collapsed tasklist and returns a list of tasks which have no
dependancies.'''
# Elem is a task key into TaskList, which is a { task : [ dependancies... ] }
return reduce( lambda Work, Elem: ( Work + ( [], [ Elem ] )[len(TaskList[Elem]) == 0] ), TaskList, [] )
def remove_dependancy( TaskList, Dependancy ):
'''Goes through the collapsed TaskList and removes the specified Dependancy
from the dependancy lists of every entry.'''
NewTaskList = {}
for Task in TaskList.keys():
# Could have avoided the filter/lambda with a list comprehension but <3 lambda
NewTaskList[Task] = filter( lambda Elem: Elem != Dependancy, TaskList[Task] )
return NewTaskList
def generate_ordering( TaskList, Work = [] ):
'''Takes the collapsed tasklist, gets the doable tasks, sorts 'em, then
chooses the first task alphabetically. Then removes the task
and dependancies from the list, and loops. If it reaches a point where
there are tasks to do but none are doable then it's a cycle, and returns
None. Otherwise returns a list with the desired task ordering.'''
if ( len( TaskList ) == 0 ):
return Work
DoableTasks = grab_doable_tasks( TaskList )
DoableTasks.sort() # ffh, in-place sorting >_<
if ( len( DoableTasks ) == 0 ):
return None
return generate_ordering( remove_dependancy( dict( [ ( Key, TaskList[Key] ) for Key in TaskList if Key != DoableTasks[0] ] ), DoableTasks[0] ), Work + [ DoableTasks[0] ] )
if ( __name__ == "__main__" ):
l = generate_ordering( collapse_dependancies( get_input() ) )
if ( l == None ):
print "Cycle"
else:
for i in l:
print "%s" % i
Lots of fun one-liners, which, honestly kind of violate some of rules of lambda (destructive operations, etc) but it runs so meh.
At the bottom of the assignment page, it says that they were going to add in a LOLCODE bonus implementation, but that they couldn't find a suitable interpreter/compiler. I'm kind of tempted to, for the C implementation, write a LOLCODE interpreter which reads in the topological sort from a file and executes it.
Granted, this means I haven't been working on my other projects, haha.
No commentsForay Embedding Pythong
I was bored today and decided to poke around with Boost.Python, because I’ve never used it before. I didn’t want to write a trivial “hello world” application, I so instead decided to write a cellular automata framework which reads in a Python script for the atomic cell update stuff.
Turns out that it’s actually really nice (once it’s working, lawds), to expose a function there’s some sickening hacks that allow this beautiful syntax -
int getCell( int x, int y ) {
return *getCell_( x, y );
}
void setCell( int x, int y, int v ) {
*getCell_( x, y ) = v;
}
BOOST_PYTHON_MODULE( CA ) {
def( "getCell", getCell );
def( "setCell", setCell );
}
Loading the python script from file wasn’t too hard. The tutorial code I was using was a little bit off – the tutorial code needs to pass in the namespace for both the global and local variable dictionaries otherwise the interpreter shits all over itself (at least it did for me, running Python2.5 etc). Anyway, the code to actually create the main namespace for the interpreter and load in the script from a file looks like this -
object main_module, main_namespace;
try {
main_module = import( "__main__" );
main_namespace = main_module.attr( "__dict__" );
exec_file( argv[4], main_namespace, main_namespace );
}
catch( error_already_set const& ) {
PyErr_Print();
running = false;
}
Which gets used to (in the debugger) load the following Python script:
import CA import random g_neighbors = [] g_width = 0 g_height = 0 def init( width, height ): globals()["g_width"] = width globals()["g_height"] = height for y in xrange( 0, height ): for x in xrange( 0, width ): CA.setCell( x, y, random.randint( 0, 1 ) ) globals()["g_neighbors"].append( 0 ) def precalc(): for y in xrange( 0, g_height ): for x in xrange( 0, g_width ): g_neighbors[ y * g_width + x ] = 0 for y in xrange( 0, g_height ): for x in xrange( 0, g_width ): if CA.getCell( x, y ): for iy in xrange( -1, 2 ): for ix in xrange( -1, 2 ): if ( ix or iy ): g_neighbors[ ( ( y + iy ) % g_height ) * g_width + ( ( x + ix ) % g_width )] += 1 def update( x, y ): #count = 0 #for iy in xrange( -1, 2 ): # for ix in xrange( -1, 2 ): # if ( ix or iy ): # count += CA.getCell( x + ix, y + iy ) count = g_neighbors[ y * g_width + x ] if CA.getCell( x, y ): if count < 2: return 0 elif count > 3: return 0 else: return 1 else: if count == 3: return 1 else: return 0
Calling the functions from within the C++ application is fairly straightforward, because the generic PyObject wrapper that Boost.Python provides defines an overloaded operator(), which makes calling the grabbed functions really easy:
object init = main_namespace["init"]; init( screen_width, screen_height );
Despite the ease with which I was able to embed Python into my application with Boost.Python, there was a significant performance overhead for doing so. The Python implementation of Conway’s Life performed approximately 4x slower than the C++ implementation. While for many applications the advantages of writing the logic in Python may far outweigh performance losses, for cellular automata it just isn’t feasible, since the logic is the primary bottleneck for the simulation.
Fun to tinker with, nonetheless. The source and binary can be obtained here (165KB).
2 comments
