This blog gets way too much spam (but Akismet is damn accurate). I should just stop thumbing through it; it’s a waste of time.
So I figured I’d jot down some comments about Meimei so I don’t forget (since I’m waiting for someone to finish something so I can demo some changes they requested on one of the ColdFusion applications). Specifically, I’m concerned about mirror validation.
I want to have it so anyone can drop Meimei on their webhost and just run a partial mirror of 4scrape. When you first access the script, it looks at your system to see if it’ll run fine. A couple of things it checks:
- That the hostname is a DNS-resolvable string.
- That the DNS actually resolves to the server’s IP address.
- That it can successfully download an image from Suigintou.
- That Suigintou can download data from it.
- How large the pipes are (both up and down).
- That the PHP openssl extension is installed (or a usable openssl binary is in the search path).
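For the first two checks, something like this is roughly what I have in mind. It’s a sketch in Python rather than the PHP Meimei would actually be written in, and `hostname_points_at` and the injectable `resolve` argument are made-up names for illustration. It only looks at IPv4 addresses.

```python
import socket


def hostname_points_at(hostname, server_ip, resolve=socket.gethostbyname_ex):
    """Return True if hostname resolves and one of its IPv4 addresses is server_ip.

    `resolve` is injectable (an assumption of this sketch) so the check can be
    exercised without touching real DNS.
    """
    try:
        _name, _aliases, addresses = resolve(hostname)
    except (OSError, UnicodeError):
        # socket.gaierror and socket.herror are OSError subclasses;
        # UnicodeError covers hostnames that cannot be encoded at all.
        return False
    return server_ip in addresses
```

The other checks (fetching an image from Suigintou, having Suigintou fetch from the mirror, measuring the pipes) need a live server on both ends, so they’re left out here.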
Pretty straightforward checks. If everything looks like it’ll work, Meimei prompts the user for some configuration details — how much disk and bandwidth to allocate, whether or not to serve NSFW-flagged images, etc. Then the script rewrites itself to serve images and sends a message to Suigintou notifying that it exists, after which it’ll start getting requests.
There are two things that need to happen after this -
- The script needs to be able to update itself.
- Suigintou needs to make sure the script hasn’t been tinkered with.
I think I’m going to roll these into one function: basically put a known backdoor into Meimei such that it will run arbitrary, cryptographically signed code (hence the need for openssl) from Suigintou’s IP address. I dunno if people will bitch at this or not since they’re already effectively running mysterious code, and I tell them beforehand that it’s in there.
The idea is that Suigintou would send a 3-tuple of (timestamp, hostname of the Meimei instance, code), encrypted with its private key (in effect, signed), which would then be decrypted with the matching public key embedded in the Meimei code. The remote IP address would be checked to make sure the request came from Suigintou, then the timestamp would be checked against the current time to make sure it wasn’t older than a specific threshold and that it hadn’t been seen before. That, combined with the hostname, should prevent replay attacks. Once those are validated, the code block is eval’d and the output returned.
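The checks before the eval could look something like this. Again a Python sketch, not the real thing: `CommandChecker`, the 300-second threshold, and the `timestamp|hostname|code` message layout are all assumptions of mine, and the actual signature check is passed in as a `verify_signature` callable because the openssl part isn’t shown here. It only decides whether to accept a command; it doesn’t run anything.

```python
import time

MAX_AGE_SECONDS = 300  # assumed threshold; pick whatever fits the clock skew


class CommandChecker:
    def __init__(self, my_hostname, suigintou_ip, verify_signature, now=time.time):
        self.my_hostname = my_hostname
        self.suigintou_ip = suigintou_ip
        self.verify_signature = verify_signature  # callable(message, signature) -> bool
        self.now = now
        self.seen = set()  # timestamps already accepted

    def accept(self, remote_ip, timestamp, hostname, code, signature):
        """Return True only if the command passes every check; never runs code."""
        if remote_ip != self.suigintou_ip:
            return False
        message = f"{timestamp}|{hostname}|".encode() + code
        if not self.verify_signature(message, signature):
            return False
        if hostname != self.my_hostname:
            return False  # signed for some other mirror
        current = self.now()
        if abs(current - timestamp) > MAX_AGE_SECONDS:
            return False  # too old (or too far in the future)
        if timestamp in self.seen:
            return False  # replay
        # Forget timestamps that would be rejected as stale anyway.
        self.seen = {t for t in self.seen if abs(current - t) <= MAX_AGE_SECONDS}
        self.seen.add(timestamp)
        return True
```

Since the hostname is part of the signed message, a command captured from one mirror can’t be replayed against another, and the seen-set only has to remember timestamps inside the threshold window.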
Since we’re running arbitrary code, you could send something like “send me the MD5 of yourself, in addition to this arbitrary string”. Or “try to fetch this image from yourself then tell me the MD5”. With arbitrary access the nodes can be pretty accurately verified to work, but I don’t know if people would be willing to install that. I’ll probably include an option to disable it (ie, it doesn’t get included at all when the script rewrites itself), then just not trust those nodes very much (ie, disable them when something is detected wrong).
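The “MD5 of yourself plus an arbitrary string” challenge is easy to sketch. This is Python with made-up function names, and it assumes Suigintou keeps a canonical copy of the script it shipped so it can compute the expected answer itself.

```python
import hashlib
import hmac
import secrets


def make_nonce():
    """The arbitrary string: 16 random bytes, fresh for every challenge."""
    return secrets.token_bytes(16)


def answer_challenge(script_bytes, nonce):
    """What an untampered node would send back: MD5 over its own source plus the nonce."""
    return hashlib.md5(script_bytes + nonce).hexdigest()


def node_is_untampered(canonical_script, nonce, reported_digest):
    """Suigintou's side: recompute the answer from its own copy and compare."""
    expected = answer_challenge(canonical_script, nonce)
    return hmac.compare_digest(expected, reported_digest)
```

The fresh nonce keeps a node from caching one good answer, but a node that keeps an untouched copy of the script around just for hashing would still pass, which is why the “fetch this image from yourself” style of check is worth having too.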
The “something wrong” is basically one of two things: abuse of privileged download rights (ie, throttle lifted) and serving bad data. I don’t really give a shit about the former, but someone serving goatse for every request is certainly something I want to avoid. The ideal way to accomplish this is to have the client (ie, end-user) verify that the images are valid. They can hypothetically do this, because the API gives them both the image link and MD5; however, it means they’d be doing it in JavaScript. And while I’m sure FrozenVoid can implement an MD5’er (or, more realistically, a CRC32 summer) that’ll run faster than light in JavaScript, I’m not convinced I can. That would be ideal, though.
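The check itself is tiny; the hard part is doing it fast in a browser. Here is the logic in Python (not the JavaScript the client would really need), with made-up function names. The CRC32 variant assumes the API would also hand out a CRC32, which it doesn’t today.

```python
import hashlib
import zlib


def image_matches_md5(image_bytes, expected_md5_hex):
    """Compare the downloaded bytes against the MD5 the API reported."""
    return hashlib.md5(image_bytes).hexdigest() == expected_md5_hex.lower()


def image_crc32(image_bytes):
    """Cheaper alternative checksum, as an unsigned 32-bit integer."""
    return zlib.crc32(image_bytes) & 0xFFFFFFFF
```

CRC32 would catch a mirror serving the wrong file, but it is trivial to forge on purpose, so it only helps against broken mirrors and lazy trolls, not determined ones.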
Assuming it gets implemented, when the client detected a bad mirror, it would send a request to Suigintou notifying of the checksum failure. Suigintou would reply with the locally-stored URL so the client could display the image, but then we’d have to validate that the mirror is bad. This is where we’d just dump the mirrors with remote-code-execution disabled (after a couple reports, at least) and run some code to validate on the ones that have it enabled.
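Suigintou’s side of that could be sketched like so. Python again, with a hypothetical `MirrorRegistry` class; the threshold of 2 is my reading of “a couple reports”, and `validate` stands in for whatever remote code gets run against mirrors that allow it.

```python
from collections import Counter


class MirrorRegistry:
    def __init__(self, validate, threshold=2):
        self.validate = validate    # callable(host) -> bool, True if the mirror checks out
        self.threshold = threshold  # reports needed to drop a mirror we cannot inspect
        self.reports = Counter()
        self.mirrors = {}           # host -> whether remote code execution is enabled

    def add(self, host, remote_exec_enabled):
        self.mirrors[host] = remote_exec_enabled

    def report_failure(self, host):
        """Handle one client checksum report; returns 'kept', 'dropped' or 'unknown'."""
        if host not in self.mirrors:
            return "unknown"
        if self.mirrors[host]:
            # We can look for ourselves, so a single report is enough to trigger a check.
            if self.validate(host):
                return "kept"
            del self.mirrors[host]
            return "dropped"
        # No way to inspect this mirror: count reports and dump it past the threshold.
        self.reports[host] += 1
        if self.reports[host] >= self.threshold:
            del self.mirrors[host]
            return "dropped"
        return "kept"
```

A real version would probably want to count distinct reporters rather than raw reports, otherwise one annoyed client could get any uninspectable mirror dumped.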
I dunno. It really has to be a non-trivial system, unfortunately, because I’m not using trusted mirrors. I guess I should just implement it, set up an IRC channel to hang out with the initial people who are kind enough to run my sketchy PHP code and see what their take on it is.