This blog gets way too much spam (but Akismet is damn accurate). I should just stop thumbing through it, it’s a waste of time.
So I figured I’d jot down some comments about Meimei so I don’t forget (since I’m waiting for someone to finish something so I can demo some changes they requested on one of the ColdFusion applications). Specifically, I’m concerned about mirror validation.
I want to have it so anyone can drop Meimei on their webhost and just run a partial mirror of 4scrape. When you first access the script, it looks at your system to see if it’ll run fine. A couple of things it checks:
- That the hostname is a DNS-resolvable string.
- That the DNS actually resolves to the server’s IP address.
- That it can successfully download an image from Suigintou.
- That Suigintou can download data from it.
- How large the pipes are (both up and down).
- That the PHP openssl extension is installed (or a usable openssl binary is in the search path)
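For what it's worth, here is a rough sketch of the first two checks (the hostname resolves, and it resolves to this server). It's in Python rather than PHP purely for illustration, it only looks at IPv4, and the `preflight` name and the injectable `resolver` parameter are made up for the example, not anything Meimei actually has.

```python
import socket


def resolve_addresses(hostname, resolver=socket.gethostbyname_ex):
    """Return the IPv4 addresses for hostname, or [] if it does not resolve."""
    try:
        _name, _aliases, addresses = resolver(hostname)
    except OSError:  # socket.gaierror and socket.herror are both OSError subclasses
        return []
    return addresses


def preflight(hostname, server_ip, resolver=socket.gethostbyname_ex):
    """Run the two DNS checks and report each result separately."""
    addresses = resolve_addresses(hostname, resolver)
    return {
        "hostname_resolves": bool(addresses),
        "resolves_to_server": server_ip in addresses,
    }
```

Passing the resolver in as a parameter is just so the checks can be exercised without touching real DNS; the download and bandwidth checks would need actual network calls and are left out.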
Pretty straightforward checks. If everything looks like it’ll work, Meimei prompts the user for some configuration details — how much disk and bandwidth to allocate, whether or not to serve NSFW-flagged images, etc. Then the script rewrites itself to serve images and sends a message to Suigintou notifying that it exists, after which it’ll start getting requests.
There are two things that need to happen after this -
- The script needs to be able to update itself.
- Suigintou needs to make sure the script hasn’t been tinkered with.
I think I’m going to roll these into one function: basically put a known backdoor into Meimei such that it will run arbitrary, cryptographically signed code (hence the need for openssl) from Suigintou’s IP address. I dunno if people will bitch at this or not since they’re already effectively running mysterious code, and I tell them beforehand that it’s in there.
The idea is that Suigintou would be able to send an encrypted 3-tuple of (timestamp, hostname of meimei instance, code) which would then be decrypted with a public key embedded in the Meimei code. The remote IP address would be checked to make sure it was from Suigintou, then the timestamp would be checked against the current time to make sure it wasn’t older than a specific threshold and that it hadn’t been seen before. That, combined with the hostname, should prevent replay attacks. Once those are validated, the code block is eval’d and the output returned.
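A minimal sketch of that validation order, in Python for illustration. Big caveat: Python's standard library has no public-key primitives, so an HMAC with a shared key stands in for the real public-key step here, which is not the same thing (a mirror holding the shared key could forge messages, which is exactly what the public key is supposed to prevent). The IP, hostname, threshold and key are all placeholder values, and the function hands the code back instead of eval'ing it.

```python
import hashlib
import hmac
import json
import time

SUIGINTOU_IP = "192.0.2.10"         # placeholder address, not the real one
MY_HOSTNAME = "mirror.example.org"  # placeholder for this Meimei instance
MAX_AGE_SECONDS = 300               # assumed threshold
SHARED_KEY = b"demo-only"           # stand-in for the embedded public key

seen_timestamps = set()             # a real mirror would persist and prune this


def sign(payload, key=SHARED_KEY):
    """HMAC-SHA256 stand-in for Suigintou's signature."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def accept_command(remote_ip, payload, signature, now=None):
    """Return the code string if every check passes, otherwise None."""
    now = time.time() if now is None else now
    if remote_ip != SUIGINTOU_IP:
        return None                 # not from Suigintou
    if not hmac.compare_digest(sign(payload), signature):
        return None                 # tampered with or forged
    timestamp, hostname, code = json.loads(payload)
    if hostname != MY_HOSTNAME:
        return None                 # meant for a different mirror
    if now - timestamp > MAX_AGE_SECONDS:
        return None                 # older than the threshold
    if timestamp in seen_timestamps:
        return None                 # already seen: replay
    seen_timestamps.add(timestamp)
    return code
```

The signature is checked before the payload is parsed, so nothing from an unauthenticated sender gets interpreted at all.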
Since we’re running arbitrary code, you could send something like “send me the MD5 of yourself, in addition to this arbitrary string”. Or “try to fetch this image from yourself then tell me the MD5”. With arbitrary access the nodes can be pretty accurately verified to work, but I don’t know if people would be willing to install that. I’ll probably include an option to disable it (ie, it doesn’t get included at all when the script rewrites itself), then just not trust those nodes very much (ie, disable them when something is detected wrong).
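The "MD5 of yourself plus an arbitrary string" challenge could look something like this. It's a sketch in Python, and the function names are made up; the point is just that the arbitrary string changes every time, so a tampered mirror can't cache an old answer and play it back.

```python
import hashlib


def challenge_response(script_bytes, nonce):
    """What the mirror returns: MD5 over its own source plus the arbitrary string."""
    return hashlib.md5(script_bytes + nonce).hexdigest()


def mirror_is_untouched(known_good_script, nonce, response):
    """Suigintou's side: recompute from the known-good copy and compare."""
    return challenge_response(known_good_script, nonce) == response
```

This only catches a mirror whose code changed; a mirror that keeps a pristine copy around just to answer challenges would still pass.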
Assuming it gets implemented, when the client detected a bad mirror, it would send a request to Suigintou notifying of the checksum failure. Suigintou would reply with the locally-stored URL so the client could display the image, but then we’d have to validate that the mirror is bad. This is where we’d just dump the mirrors with remote-code-execution disabled (after a couple reports, at least) and run some code to validate on the ones that have it enabled.
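Roughly, the report handling might go like the sketch below (Python, for illustration). The dictionary fields, the threshold of two reports, and the `validate` callback are all assumptions; "a couple reports" isn't pinned down to a number yet.

```python
REPORT_THRESHOLD = 2  # assumed value for "a couple reports"


def handle_bad_mirror_report(mirror, validate):
    """Record a checksum-failure report and return whether the mirror stays enabled."""
    mirror["reports"] += 1
    if mirror["remote_exec"]:
        # Remote code execution available: actually check the mirror.
        if not validate(mirror):
            mirror["enabled"] = False
    elif mirror["reports"] >= REPORT_THRESHOLD:
        # No way to verify, so just dump it after enough reports.
        mirror["enabled"] = False
    return mirror["enabled"]
```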
I dunno. It really has to be a non-trivial system, unfortunately, because I’m not using trusted mirrors. I guess I should just implement it, set up an IRC channel to hang out with the initial people who are kind enough to run my sketchy PHP code and see what their take on it is.
So I just got back from a “watch girls wrestle half-naked in a pool filled with jello” party, and it was fairly entertaining. Got to see tits, grabbing, etc while drinking free (as in free beer) beer and enjoying the company.
Anyway, now that I got that bit of bragging off my shoulders. As I was walking home I was thinking about Dennou Coil (since I’m modelling my Erlang MUD’s world around it, a bit). The best parts of Dennou Coil revolve around fight scenes, where the characters battle each other (and the constructs in the system) with programs manifested within objects. These programs are forged from items called “metabugs”, which aren’t really explained well, but are rare and sought-after glitches which occur in the digital space.
The thing is, the characters need metabugs to produce the programs they use (really, they barter with other people who can use the metabugs and refine them into things). This was always a bit curious to me, since a program is just data. And data can easily be replicated, so why can’t they just make a couple copies of that powerful metabug they only have 2 more of?
Well, durr, security. They’re interacting with a large system, and any such system is going to have restrictions on what kind of code it’s going to execute, so some random user can’t just execute any arbitrary code. I mean, stuff like this is already done today, shadow memory and shit where you pad all the data in the machine with extra attribute bits to mark executability or taint or whatever the hell you want.
And then you introduce the Imago and encodings and it fucks up my entire hypothesis, lol.
But I was thinking, if I were to build this MUD, I’d have to have a fairly good idea of how everything “works” in the make-believe system, so that it all makes sense and shit. I think you’d be able to collect metabugs, but each bug actually is represented as a block of data. The actual data is essentially random (garbage data), and, being garbage data, it’s affected by the environment which spawned it.
The characteristics of the metabug would be gleaned from the value of the garbage data; things like color, shape, size, images and sounds perceived within, etc, in addition to what you could turn it into would all be dependent on this data, such that you can just supply any random string of garbage data and it would always describe a unique metabug.
Essentially procedural metabugs.
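A quick sketch of what that could mean, in Python for illustration (the MUD itself would be Erlang). The trait tables and the choice of SHA-256 to spread the garbage bytes around are assumptions made up for the example. The same garbage always gives the same metabug, and any string of bytes gives some valid one.

```python
import hashlib

COLORS = ["red", "green", "blue", "violet", "amber"]  # made-up trait table
SHAPES = ["shard", "sphere", "cube", "spiral"]        # made-up trait table


def describe_metabug(garbage):
    """Derive a metabug's traits deterministically from its garbage data."""
    digest = hashlib.sha256(garbage).digest()
    return {
        "color": COLORS[digest[0] % len(COLORS)],
        "shape": SHAPES[digest[1] % len(SHAPES)],
        "size_mm": 1 + digest[2] % 50,
        "id": digest.hex()[:12],  # short fingerprint of the underlying data
    }
```

With trait tables this small, lots of different bugs will end up looking alike even though their data differs, so anything that has to be unique would need to hang off the data itself rather than the visible traits.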
This, of course, has all the drawbacks of procedural generation: there’s a good chance it could generate dull, boring and expansive possibilities. Which isn’t good.
I dunno. Need to think moar I guess.