The random rantings of a concerned programmer.


March 14th, 2009 | Category: Random

hahaha, I fucking hate PHP.

But I finally got a Meimei instance running on someone else’s machine. God, there are so many fucking differences between PHP installs. It’s bloody ridiculous. The damn thing worked fine on my Dreamhost account last weekend, but they’re running Apache with mod_php (I think?). It looks like the guy who put this instance up is running Lighttpd (and possibly a different minor version of PHP).

And it doesn’t help that the language is such a fucking mess. There were a couple of really stupid fucking mistakes (res instead of ret) that would have been caught at compile time by a real language rather than this toy. Maybe I’ve just been pampered too much?

On the work front, we’ve got 5 days to put together a mobile version of our main web site (in PHP, of course — how the fuck did I turn into a fucking PHP web developer). God that shit’s pretty fucking bad; at least it’s easy “lol let’s generate sum html codez” rather than Meimei which is batshit insane from any perspective.

Goddamn I need another drink.



March 06th, 2009 | Category: Random

This blog gets way too much spam (but Akismet is damn accurate). I should just stop thumbing through it, it’s a waste of time.

So I figured I’d jot down some comments about Meimei so I don’t forget (since I’m waiting for someone to finish something so I can demo some changes they requested on one of the ColdFusion applications). Specifically, I’m concerned about mirror validation.

I want to have it so anyone can drop Meimei on their webhost and just run a partial mirror of 4scrape. When you first access the script, it looks at your system to see if it’ll run fine. A couple of things it checks:

  • That the hostname is a DNS-resolvable string.
  • That the DNS actually resolves to the server’s IP address.
  • That it can successfully download an image from Suigintou.
  • That Suigintou can download data from it.
  • How large the pipes are (both up and down).
  • That the PHP openssl extension is installed (or a usable openssl binary is in the search path)
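A rough sketch of the purely local checks from that list, in Python rather than Meimei’s PHP just to keep it short. The function names (`resolves_to`, `has_openssl`, `preflight`) are made up for illustration, and the download, upload and pipe-size checks are left out because they need the real network.

```python
import shutil
import socket


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname):
    """True if hostname resolves and the answer matches expected_ip."""
    try:
        return resolver(hostname) == expected_ip
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def has_openssl(which=shutil.which):
    """True if an openssl binary is somewhere on the search path."""
    return which("openssl") is not None


def preflight(hostname, server_ip, resolver=socket.gethostbyname, which=shutil.which):
    """Run the local checks and return a dict of check name -> passed."""
    return {
        "dns_matches_ip": resolves_to(hostname, server_ip, resolver),
        "openssl_available": has_openssl(which),
    }
```

The `resolver` and `which` parameters only exist so the checks can be exercised without touching DNS or the filesystem.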

Pretty straightforward checks. If everything looks like it’ll work, Meimei prompts the user for some configuration details — how much disk and bandwidth to allocate, whether or not to serve NSFW-flagged images, etc. Then the script rewrites itself to serve images and sends a message to Suigintou notifying that it exists, after which it’ll start getting requests.

There are two things that need to happen after this:

  1. The script needs to be able to update itself.
  2. Suigintou needs to make sure the script hasn’t been tinkered with.

I think I’m going to roll these into one function: basically put a known backdoor into Meimei such that it will run arbitrary, cryptographically signed code (hence the need for openssl) from Suigintou’s IP address. I dunno if people will bitch at this or not since they’re already effectively running mysterious code, and I tell them beforehand that it’s in there.

The idea is that Suigintou would be able to send a signed 3-tuple of (timestamp, hostname of Meimei instance, code), and the signature would be verified with a public key embedded in the Meimei code. The remote IP address would be checked to make sure it was from Suigintou, then the timestamp would be checked against the current time to make sure it wasn’t older than a specific threshold and that it hadn’t been seen before. That, combined with the hostname, should prevent replay attacks. Once those are validated, the code block is eval’d and the output returned.
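The IP, hostname and timestamp checks could look something like the sketch below (Python, not the actual PHP). It assumes the crypto step has already succeeded and only covers the replay protection. `ReplayGuard`, `accept` and the 300-second threshold are hypothetical, since no threshold is picked above.

```python
import time

MAX_AGE_SECONDS = 300  # assumed threshold, not a value from the post


class ReplayGuard:
    """Rejects messages that are stale, misaddressed, or already seen."""

    def __init__(self, my_hostname, trusted_ip, max_age=MAX_AGE_SECONDS):
        self.my_hostname = my_hostname
        self.trusted_ip = trusted_ip
        self.max_age = max_age
        self.seen = set()

    def accept(self, remote_ip, timestamp, hostname, now=None):
        now = time.time() if now is None else now
        if remote_ip != self.trusted_ip:
            return False  # not from the origin server
        if hostname != self.my_hostname:
            return False  # message was meant for another mirror
        if abs(now - timestamp) > self.max_age:
            return False  # too old (or too far in the future)
        if timestamp in self.seen:
            return False  # replay of a message we already ran
        self.seen.add(timestamp)
        # Timestamps outside the window get rejected by the age check
        # anyway, so there is no need to remember them.
        self.seen = {t for t in self.seen if abs(now - t) <= self.max_age}
        return True
```

Only once `accept` returns true would the code part of the tuple be run.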

Since we’re running arbitrary code, you could send something like “send me the MD5 of yourself, in addition to this arbitrary string”. Or “try to fetch this image from yourself then tell me the MD5”. With arbitrary access the nodes can be pretty accurately verified to work, but I don’t know if people would be willing to install that. I’ll probably include an option to disable it (ie, it doesn’t get included at all when the script rewrites itself), then just not trust those nodes very much (ie, disable them when something is detected wrong).
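The “MD5 of yourself plus an arbitrary string” challenge is small enough to sketch. This is Python with made-up names (`self_md5_challenge`, `verify_challenge`), and hashing the script bytes followed by the string is just one assumed way to combine the two. The arbitrary string is what stops a mirror from answering with a canned hash.

```python
import hashlib


def self_md5_challenge(script_bytes, nonce):
    """What the mirror computes: MD5 over its own source plus the nonce."""
    return hashlib.md5(script_bytes + nonce).hexdigest()


def verify_challenge(known_good_script, nonce, answer):
    """What the origin computes: the same hash over the pristine script."""
    return self_md5_challenge(known_good_script, nonce) == answer
```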

The “something wrong” is basically one of two things: abuse of privileged download rights (ie, throttle lifted) and serving bad data. I don’t really give a shit about the former, but someone serving goatse for every request is certainly something I want to avoid. The ideal way to accomplish this is to have the client (ie, end-user) verify that the images are valid. They can hypothetically do this, because the API gives them both the image link and MD5; however, it means they’d be doing it in JavaScript. And while I’m sure FrozenVoid can implement an MD5’er (or, more realistically, a CRC32 summer) that’ll run faster than light in JavaScript, I’m not convinced I can. That would be ideal, though.
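For reference, the check itself is tiny. Here it is in Python rather than the browser JavaScript it would really need to be, with hypothetical helper names, just to show what the client would compare.

```python
import hashlib
import zlib


def image_matches_md5(image_bytes, expected_md5_hex):
    """Compare the downloaded bytes against the MD5 the API handed out."""
    return hashlib.md5(image_bytes).hexdigest() == expected_md5_hex.lower()


def crc32_hex(image_bytes):
    """The cheaper alternative: CRC32 as 8 lowercase hex digits."""
    return format(zlib.crc32(image_bytes) & 0xFFFFFFFF, "08x")
```

CRC32 is much cheaper to compute but only catches accidental corruption and lazy swaps. Anyone who bothers can craft a bad image with a matching CRC32.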

Assuming it gets implemented, when the client detected a bad mirror, it would send a request to Suigintou notifying of the checksum failure. Suigintou would reply with the locally-stored URL so the client could display the image, but then we’d have to validate that the mirror is bad. This is where we’d just dump the mirrors with remote-code-execution disabled (after a couple reports, at least) and run some code to validate on the ones that have it enabled.
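The report handling on the Suigintou side might be sketched like this (Python, hypothetical names). The threshold of 3 is a guess at what “a couple reports” means, and the returned strings are placeholders for whatever the server would actually do.

```python
from collections import Counter

REPORT_THRESHOLD = 3  # assumed value for "a couple reports"


class MirrorReports:
    """Counts checksum-failure reports per mirror and decides what to do."""

    def __init__(self, threshold=REPORT_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()

    def report(self, mirror, remote_exec_enabled):
        """Record one failure report and return the next action."""
        self.counts[mirror] += 1
        if remote_exec_enabled:
            return "validate"  # run verification code on the mirror
        if self.counts[mirror] >= self.threshold:
            return "drop"      # can't verify it, so stop using it
        return "watch"
```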

I dunno. It really has to be a non-trivial system, unfortunately, because I’m not using trusted mirrors. I guess I should just implement it, set up an IRC channel to hang out with the initial people who are kind enough to run my sketchy PHP code and see what their take on it is.



March 01st, 2009 | Category: Random

So I came up with a brilliant solution to bandwidth/disk rape — write a small PHP script (Meimei) which caches images from 4scrape (Suigintou), give it out to people, and just randomly direct people to those mirrors with javascript. Validation issues aside (“oh hey let’s just serve lemonparty for every image”) — because those can be “fixed” by only allowing trusted mirrors — I don’t think it’s going to work.

I threw up a test version of the script on my DreamHost account and it completely shit itself. It worked for a while, then it started spewing 503 errors everywhere. I don’t know if this is an issue with DreamHost or my shitty code (or the fact that by putting it up to serve 100% of the requests, it was bombarded with almost 1000 hits in the few minutes that it was up), but it definitely needs a lot more tweaking before it’s usable. It’s one of those “ugh PHP” things and it’s just a bitch.

The version I have running right now is really stupid — it basically just acts as a caching HTTP proxy. To have the thing fully functional, it needs to do a lot more than just that (and the server stuff on Suigintou needs to be a bit more sophisticated too). Each Meimei instance needs to track how much bandwidth it’s used (it already tracks disk consumption); it needs a means to communicate back to Suigintou that it’s all filled up or out of bandwidth or whatever.
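The bookkeeping part could be as simple as the sketch below (Python, not the PHP it would really be). `MirrorBudget` and the status strings are invented for illustration, and byte counts are assumed as the unit.

```python
class MirrorBudget:
    """Tracks bytes cached to disk and bytes served against fixed limits."""

    def __init__(self, disk_limit, bandwidth_limit):
        self.disk_limit = disk_limit
        self.bandwidth_limit = bandwidth_limit
        self.disk_used = 0
        self.bandwidth_used = 0

    def can_cache(self, nbytes):
        """True if another nbytes would still fit on disk."""
        return self.disk_used + nbytes <= self.disk_limit

    def record_cached(self, nbytes):
        self.disk_used += nbytes

    def record_served(self, nbytes):
        self.bandwidth_used += nbytes

    def status(self):
        """What the mirror would report back to the origin server."""
        if self.bandwidth_used >= self.bandwidth_limit:
            return "out_of_bandwidth"
        if self.disk_used >= self.disk_limit:
            return "disk_full"
        return "ok"
```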

There also needs to be something which gives some notion of cache locality — ie, a Meimei might say “I only want images in the range 30000-40000 that are SFW and from /wg/”. That way it’ll consistently get handed those requests and hopefully hit the cache more often (though eventually, assuming unlimited disk, each Meimei instance will be a full mirror — this isn’t a sound assumption).
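A minimal sketch of that kind of routing, in Python with hypothetical names (`MirrorPrefs`, `pick_mirror`). It assumes each mirror declares an ID range, a set of boards and an NSFW flag, and that the same image should always go to the same mirror so the cache actually gets hit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorPrefs:
    url: str
    id_min: int
    id_max: int
    boards: frozenset
    allow_nsfw: bool

    def wants(self, image_id, board, nsfw):
        """True if this mirror has asked for images like this one."""
        return (self.id_min <= image_id <= self.id_max
                and board in self.boards
                and (self.allow_nsfw or not nsfw))


def pick_mirror(mirrors, image_id, board, nsfw, fallback):
    """Pick a mirror deterministically, or fall back to the origin."""
    candidates = sorted(m.url for m in mirrors if m.wants(image_id, board, nsfw))
    if not candidates:
        return fallback
    # Same image ID always maps to the same mirror for a fixed mirror set.
    return candidates[image_id % len(candidates)]
```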

And then you hit mirror validation. I’d want to do it in a distributed manner with JavaScript, ie, check the MD5 of a subset of the image data against a known value from the authoritative Suigintou, then report back to Suigintou if the mirror is giving bad data and not display the image. I don’t even know if that’s going to be possible in a realistic manner, or how evadable my checks would be. I think if there aren’t more than 5-7 Meimei instances they’ll either get overloaded with requests, not be clustered enough to achieve decent locality, or just be otherwise ineffective at staving off bandwidth consumption.

Would be nice to have it all working and shit but ugh it’s almost too much work. Bleh.