The random rantings of a concerned programmer.


March 24th, 2009 | Category: Random


God I was fucked up last night (again). I remember trying to compile devel/cross-gcc for some reason (and laughing when I remembered I had to apply patches to get the arm-eabi version to build). I should submit those patches upstream one of these days.

You know what’s always bothered me a bit? Why don’t NDS emulators use dynamic recompilation to make them not slow as balls? Naturally, due to Nintendo’s API model (i.e., “nothing” except memory-mapped registers) there are a couple of tricks. Not only do those registers have to be emulated (by mmap’ing those regions and reading them with a helper thread in the same process space), but you’d also have to simulate interrupts.

On the DS, interrupts aren’t that bad. There are a couple of IRQ tables sitting off in memory where you put pointers to your interrupt handlers. Simulating an interrupt is basically just a matter of pthread_suspend_np’ing the application code, jumping to its interrupt function, then resuming the application (I think).

And then there’s the actual recompilation of the ARM binaries to native assembly (honestly, it’d be better to recompile to LLVM IR or GCC’s intermediate representation, then push that through the compiler backend to generate the native assembly), which I imagine is a massive pain in the ass.
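To make the interpreter-versus-recompiler distinction concrete, here is a toy sketch. It is nowhere near ARM: the three-register machine, the instruction tuples, and the names `interpret` and `recompile` are all made up for illustration, and the "native code" is just generated Python. The point is only that the interpreter decodes every instruction on every run, while the recompiler translates a block once and caches the result.

```python
# Toy three-register machine, NOT ARM. The instruction format is hypothetical.
BLOCK = [("mov", 0, 5), ("mov", 1, 7), ("add", 2, 0, 1), ("add", 2, 2, 2)]


def interpret(block, regs):
    """Decode and dispatch every instruction each time the block runs."""
    for op, dst, *src in block:
        if op == "mov":
            regs[dst] = src[0]
        elif op == "add":
            regs[dst] = regs[src[0]] + regs[src[1]]
    return regs


_cache = {}  # block id -> translated function


def recompile(block_id, block):
    """Translate a block to Python source once, then reuse the compiled function."""
    if block_id not in _cache:
        lines = ["def run(regs):"]
        for op, dst, *src in block:
            if op == "mov":
                lines.append(f"    regs[{dst}] = {src[0]}")
            elif op == "add":
                lines.append(f"    regs[{dst}] = regs[{src[0]}] + regs[{src[1]}]")
        lines.append("    return regs")
        namespace = {}
        exec("\n".join(lines), namespace)  # stands in for emitting native code
        _cache[block_id] = namespace["run"]
    return _cache[block_id]
```

Calling `recompile(0, BLOCK)` a second time returns the cached function without translating anything, which is where the speedup would come from on hot code.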

Not nearly as much a pain in the ass as going through and figuring out what all the hundreds of little weird places in memory are supposed to do. It’d be a lot of digging through reverse-engineered documentation and diving into the code for other emulators and stuff.

But I think this is basically the only way to get any reasonable performance. I think emulating the hardware with an ARM interpreter is a needlessly costly way to implement the functionality (though it works, for the most part). Blah.


4scrape — further ideas

March 10th, 2009 | Category: Random

I know I’ve been ranting a lot about Meimei, but hopefully I’ll have an early version ready to deploy in a couple days (want to have a couple instances deployed on some friends’ hosting first so I can make sure everything works). Once that’s taken care of, there are plenty more sub-projects to tackle:

User-Aided Autonomous Metadata Creation

A user requests an image (landing page). The server checks the cookies and sees that no other images are in the cookies. No data is modified server-side, but the client now has that image’s ID in its cookies. It requests another image. The server checks the cookies and sees that there is one image ID in the cookies. It modifies the “correlation files” for both the image requested and the one in the cookies to reflect the correlation (that is, it increments a count in each file for the other’s ID, creating an entry for the image if there wasn’t one). The client’s cookies now contain both images’ IDs. The client requests a third image. The server checks the cookies and sees two image IDs. It modifies the files for both of those images to reflect the correlation with the current one and the file for the current one to reflect the correlation with the other two. The client’s cookies now contain three image IDs.

And so on.

Meanwhile, any time anyone requests an image, the server checks the correlation file for that image. The top ten or so images with the highest counts in that file are displayed alongside it as suggestions.

Basically, the idea is that people tend to cluster into groups which have similar image tastes. By tracking what images are viewed in a single session, you can cluster images by preference group. Those clusterings can then be used as a smart image suggestion system.
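The bookkeeping described above is small enough to sketch. This is a minimal in-memory version, assuming a dict of counters in place of the per-image “correlation files” and a plain list in place of the cookie; `record_view` and `suggestions` are hypothetical names, not anything in 4scrape.

```python
from collections import Counter, defaultdict

# In-memory stand-in for the per-image "correlation files" (hypothetical layout).
correlations = defaultdict(Counter)


def record_view(image_id, session_ids):
    """Record one image request and return the session's updated ID list.

    session_ids plays the role of the image IDs stored in the client's cookies.
    """
    for seen_id in session_ids:
        if seen_id == image_id:
            continue  # re-viewing an image should not correlate it with itself
        correlations[image_id][seen_id] += 1
        correlations[seen_id][image_id] += 1
    if image_id not in session_ids:
        session_ids = session_ids + [image_id]
    return session_ids


def suggestions(image_id, limit=10):
    """Return the IDs most often seen in the same session as image_id."""
    return [other for other, _ in correlations[image_id].most_common(limit)]
```

Each request costs one update per image already in the session, so a real version would probably want to cap how many IDs the cookie holds.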

A similar system could be implemented, in which the images clicked during a search are auto-tagged with the search terms used. This would be kind of useful, but ultimately wouldn’t really do anything with the current search system (since there are no relevancy rankings — everything’s sorted by date).

Image Analysis — Common Colors

Matt wrote a nice Python script like 6 months ago that (as I remember) would take an image, quantize it, then return the N most recurring colors. If you ran each image through this, a color index could be created, such that you select a color (or two) with a JavaScript color picker and the search would be able to limit images by that color.
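I don’t have Matt’s script handy, so the following is only a rough guess at the general shape, not his code. It assumes the image has already been decoded into a list of (R, G, B) tuples, and it uses the crudest possible quantization (snapping each channel to a fixed bucket); `quantize` and `common_colors` are names made up here.

```python
from collections import Counter


def quantize(pixel, levels=4):
    """Snap each 0-255 channel to the midpoint of one of `levels` equal buckets.

    Assumes `levels` divides 256 evenly (2, 4, 8, ...).
    """
    step = 256 // levels
    return tuple((channel // step) * step + step // 2 for channel in pixel)


def common_colors(pixels, n=3, levels=4):
    """Return the n most frequent quantized colors in an iterable of RGB tuples."""
    counts = Counter(quantize(p, levels) for p in pixels)
    return [color for color, _ in counts.most_common(n)]
```

Indexing the quantized colors rather than the raw ones is what would make the color picker search workable, since the picked color can be quantized the same way before the lookup.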

Tag Scrape/MD5 Lookup Service

There’s a bunch of sites out there which index basically the same image data — just do an image search on IQDB and you’ll get the same image back from multiple sites. I think all the sites IQDB indexes (except 4scrape) are Danbooru instances, which means they have delicious tags which can be scraped by MD5.

It’s kind of what Anisearch already does — aggregate tags from a bunch of Danbooru instances and throw them into a search index. Taking it a step further though, the service could not only aggregate tags, but provide an API to query against the image MD5 (to get a list of tags) and to adjust the tag weights.

Once there was a service, it would be cool to integrate with other software like pImgDB (and have a Danbooru plugin to facilitate a push model rather than aggressive scraping) and have a massive image MD5 <-> tags thing.
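As a sketch of what the core of such a service might look like, here is an in-memory MD5-to-tags index with the three operations mentioned: merging tags from a source, looking tags up by MD5, and adjusting a tag’s weight. The store and the function names are assumptions for illustration; a real service would sit on a database and an HTTP API.

```python
import hashlib
from collections import Counter, defaultdict

# MD5 hex digest -> Counter of tag weights (hypothetical in-memory store).
tag_index = defaultdict(Counter)


def md5_of(data):
    """Hex MD5 of raw image bytes, used as the lookup key."""
    return hashlib.md5(data).hexdigest()


def add_tags(md5, tags, weight=1):
    """Merge tags reported by one source; repeated reports raise a tag's weight."""
    for tag in tags:
        tag_index[md5][tag] += weight


def adjust_weight(md5, tag, delta):
    """Nudge a single tag's weight up or down."""
    tag_index[md5][tag] += delta


def lookup(md5):
    """Return (tag, weight) pairs with positive weight, heaviest first."""
    return [(t, w) for t, w in tag_index[md5].most_common() if w > 0]
```

With this shape, a tag that several Danbooru instances agree on naturally floats to the top, and a bad tag can be voted down to zero and drop out of the results.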

Better Fulltext Indexing

Right now 4scrape uses PostgreSQL’s fulltext search engine. While it works and all, it’s kind of gross. At the very least, it needs a natural language wrapper which parses the queries and formats the search query properly (right now it just AND’s all terms together).
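A wrapper like that doesn’t have to be fancy to beat AND-ing everything. Here is a minimal sketch that turns a user query into a string meant to be passed (as a bound parameter) to PostgreSQL’s `to_tsquery`. The mini-syntax is an assumption of mine: plain words are AND-ed, `OR` between two words groups them, and a leading `-` negates a word. The function name is hypothetical.

```python
import re


def to_tsquery_string(user_query):
    """Translate e.g. 'cat OR dog -blurry' into a to_tsquery-style string."""
    groups = []        # AND of groups; each group is an OR of terms
    join_next = False  # True right after an OR keyword
    for token in user_query.split():
        if token.upper() == "OR":
            join_next = True
            continue
        negated = token.startswith("-")
        word = re.sub(r"\W+", "", token).lower()  # drop operators/punctuation
        if not word:
            join_next = False
            continue  # nothing searchable left in this token
        term = ("!" if negated else "") + word
        if join_next and groups:
            groups[-1].append(term)
        else:
            groups.append([term])
        join_next = False
    parts = [g[0] if len(g) == 1 else "(" + " | ".join(g) + ")" for g in groups]
    return " & ".join(parts)
```

Stripping everything that isn’t a word character also keeps stray `&`, `|` and parentheses typed by users from breaking the query syntax.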

Still, it would be kind of cool to re-implement an existing fulltext index like Whoosh (a Python one) in Haskell. Some of the projects at work use Solr, which is a gross REST-based webservice (with support for some other stuff) built on top of Lucene. And it works really well except that it’s slow as fuck. There are a couple of other fulltext search solutions (Oracle, etc) but they’re ugh.

It’d be nice to have a fancy pants fulltext search index written in Haskell.

I don’t have nearly enough free time :(

PS: If anyone wants to implement any of this (since they can all mostly be implemented as services external to 4scrape), let me know and we can work something out :3


Erlang MUD I

November 28th, 2007 | Category: Random

I’ve been tinkering with the idea of writing a MUD in Erlang, a functional language which is geared toward concurrent, fault-tolerant applications and supports distributed systems out of the box. It’s a pretty nifty language. After using it for a couple of small random things, I wouldn’t really think about using anything else for small projects involving networks or threads.

One thing that takes some getting used to: you can’t change a variable once it’s been bound.

Program structure in Erlang is a bit different from imperative languages in that you’re not really thinking about code in a flowing manner, but rather thinking in terms of processes and their interactions. The Erlang VM implements its own lightweight process system. Processes interact through messaging rather than shared memory, which is really nice.

Erlang also has a nice built-in database back-end, Mnesia. It’s nice because it seems pretty simple to use, and can be spread over multiple physical nodes (since it’s written in Erlang) for added redundancy. Should be fun.

The way I think I’m going to structure my MUD server is this: each client has two separate processes. One of these processes is a message producer; it just sits on the socket and waits for data to come in, passing it to the second process. The second process also has access to the client socket, and is responsible for sending stuff. So when it receives the data from the first process, it can either send the client an “invalid message” string, or it can send another message to a higher structure, or alter the player’s state, or whatever.
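To show the shape of that two-process split without committing to any Erlang yet, here is a rough analogue in Python, with threads and a queue standing in for Erlang processes and message passing. Everything named here (`reader`, `handler`, `serve_client`, the two known commands) is made up for the sketch.

```python
import queue
import socket
import threading


def reader(sock, inbox):
    """Producer: block on the socket and hand each received line to the handler."""
    with sock.makefile("r", encoding="utf-8") as lines:
        for line in lines:
            inbox.put(line.strip())
    inbox.put(None)  # peer closed the connection: tell the handler to stop


def handler(sock, inbox):
    """Consumer: the only side that writes to the client socket."""
    known = {"look": "You see a small room.", "north": "You walk north."}
    while True:
        command = inbox.get()
        if command is None:
            break
        reply = known.get(command, "invalid message")
        sock.sendall((reply + "\n").encode("utf-8"))


def serve_client(sock):
    """Start the reader/handler pair for one connected client."""
    inbox = queue.Queue()
    threads = [
        threading.Thread(target=reader, args=(sock, inbox)),
        threading.Thread(target=handler, args=(sock, inbox)),
    ]
    for thread in threads:
        thread.start()
    return threads
```

Keeping all writes in the handler means nothing else ever has to coordinate access to the socket, which is the same reason the message-passing version is appealing in Erlang.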

That’s just the basics really. I need to sit down and actually work out the design of the MUD before I commit to any program structures, because MUDs are one of the types of games in which a lot can be done (in terms of gameplay). MUDs are so different from modern games that I don’t really want to take concepts from graphical games and just slap them around until they work in a text-environment. I want to develop my core mechanics with the environmental limitations in mind.

lol, so I’m basically still at square 1, “CHOOSE TECHNOLOGY”. Which arguably isn’t even a very good square to start with in some situations, but probably is just fine for this one.