Archive for July, 2009
- Looks like someone tried to break down the door to my house (and they almost succeeded — the screws to the door handle were almost ripped out). Gonna have to start deadbolting the motherfucker because shit. THEY COULD HAVE STOLEN MY ROOMBA AND EEEPC THE HORROR.
- Tried to buy an xbox 360 today. Would have succeeded if they didn’t fucking slip me a Kung-Fu-Squirrel/Lego-Indiana-Nazis bundle instead of the fucking Halo 3/Fable 2 they have advertised throughout the store. Sneaky assholes (though they’ll let me swap it tomorrow).
- I have shit taste in women.
- I think I’m going to obtain (after all, I would never admit to purchasing a copy of an operating system) a copy of Windows so I can dick around with F#. Possibly related to #2.
Also, I think I’m pregnant.
I no longer have the time or energy to continue maintaining the 4scrape codebase. There’s currently a crapton of issues with it –
- Cache for searches (potentially just post searches) is broken.
- Threads need to be cached as a whole unit — assembling them from a 500,000-row table is too slow.
- General consistency errors — there’s a bunch of images missing (???)
- The scraper likes to shit itself to keep things lively.
- The backend would occasionally crash/spinlock (???)
I took the entire server offline for now, but I’m going to re-post the shit code here along with a complete dump of the entire database (over a year’s worth of posts, lawds). I don’t really know how to offload the image data (~150GB) — I’ll talk to Shuugo about setting up a torrent (or just turn the web server back on and let people hammer the extra 150mbps).
It was a fun run; I had a good time. Sorry for crapping out on you :(
EDIT: 4scrape source code [178KB]. I should have the full database dump uploaded some time tomorrow (it’s xbox hueg, a little over 500MB compressed comprising almost a million posts and metadata for over 1/3 million images). I’m not quite sure how I’m going to offload the image data yet but it’ll either be over bittorrent, or providing unthrottled HTTP access to the image data (but not the API/front-end, so you’ll need the database dump (or just 1.jpg 2.jpg 3.jpg etc, though you’ll miss *.png etc) to accidentally the whole thing).
EDIT2: And the SQL dump [505MB]. I was running PostgreSQL 8.3; you can probably massage the dump to fit into something else but it uses some Postgres-only features (e.g., for the fulltext indexes) so it isn’t going to fit into anything else (e.g., MySQL/MSSQL/whatever) as-is. Additionally, I turned back on the image access (for what’s still there) and turned off the access restrictions. Feel free to hammer the balls out of the server while I figure out a better way to get them off.
EDIT3: Go to 4walled.org instead.
I spent a good hour today at work writing a scraper to pull all my old highschool postings from a gamedev.net blog and pushed them into WordPress here. Not that there’s any actual good content in there; it’s just kind of nice to have everything in one place.
I was thumbing through some of the entries and — holy shit, I was a fucking retard a couple years ago. Admittedly I’m still a retard, but I was even worse back then. And I like to think that I’m less of an emo faggot now (bawww I’m socially inept), but I’m probably just less vocal about my insecurities. OR SOME STUPID SHIT LIKE THAT.
Just spent a couple of hours wiping through the last bit of main storyline on Disgaea 3. I want to try to spend the rest of the weekend working on my Erlang shit and not wasting it drinking or vidya’ing though. Especially since I broke something and drinking two beers makes me really nauseous (yet not drunk, the worst possible combination) these days. Maybe some Pepto-Bismol will help…
So I’ve spent some more time today dicking around with Comet, specifically Erlycomet (which is an implementation of the Bayeux protocol). The Bayeux protocol itself is pretty interesting (though the documentation is shit), but I have some issues with it.
Effectively, an implementation of the protocol has a single endpoint through which all requests are made. Each request is associated with a “channel” and specifies an operation to perform on that channel. The operations are –
- subscribe to the channel — server will send any updates
- unsubscribe — server will stop sending updates
- publish — server will push the raw data to anyone subscribed to the channel
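To make the three operations concrete, here is a toy in-memory model in Python. It is only a sketch of the semantics described above, not the Bayeux wire format and not Erlycomet's code; `ToyHub`, `handle`, and `inbox` are made-up names, and the "push" is faked by appending to a per-client list.

```python
from collections import defaultdict


class ToyHub:
    """In-memory stand-in for the single endpoint (hypothetical, not Bayeux itself)."""

    def __init__(self):
        # channel name -> set of client ids currently subscribed
        self.subscribers = defaultdict(set)
        # client id -> list of (channel, data) deliveries, standing in for the push
        self.inbox = defaultdict(list)

    def handle(self, client_id, channel, operation, data=None):
        """Every request names a channel and an operation on that channel."""
        if operation == "subscribe":
            self.subscribers[channel].add(client_id)
        elif operation == "unsubscribe":
            self.subscribers[channel].discard(client_id)
        elif operation == "publish":
            # Raw push: everyone subscribed to the channel gets the data as-is.
            for subscriber in sorted(self.subscribers[channel]):
                self.inbox[subscriber].append((channel, data))
        else:
            raise ValueError(f"unknown operation: {operation!r}")


hub = ToyHub()
hub.handle("alice", "/chat", "subscribe")
hub.handle("bob", "/chat", "subscribe")
hub.handle("bob", "/chat", "publish", {"text": "hi"})
hub.handle("alice", "/chat", "unsubscribe")
hub.handle("bob", "/chat", "publish", {"text": "anyone?"})
```

After the unsubscribe, only bob's inbox keeps growing; alice stops at the first message.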
The first two are fairly straightforward, but the third is kind of hazy and probably varies widely by implementation. Obviously, there’s a couple of ways that the publish event can be handled –
- Push the data out to all subscribers without any processing.
- Have the server review/transform/store the data before broadcasting it.
- Don’t broadcast stuff — just send the response back to the publishing client.
I should note here that 3 is actually an option in the Bayeux specification: any channel matching /service/* is a special channel which doesn’t permit subscribers and just pushes responses back to the client. And Erlycomet implements that option.
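The three ways of handling a publish fit in one small routing function. This is a rough Python sketch under my own assumptions: `route_publish` and the `transform` hook are invented for illustration, and the /service/ branch simply echoes the data back where a real service would compute a response.

```python
def route_publish(channel, sender, data, subscribers, transform=None):
    """Return a list of (recipient, payload) pairs for one publish.

    subscribers: dict mapping channel name -> list of client ids.
    transform:   optional hypothetical hook that reviews/rewrites the data,
                 returning None to drop it.
    """
    if channel.startswith("/service/"):
        # Option 3: service channels have no subscribers; answer the sender only.
        return [(sender, data)]
    if transform is not None:
        # Option 2: let the server look at the data before anyone else sees it.
        data = transform(data)
        if data is None:
            return []
    # Option 1 (or the tail end of option 2): broadcast to every subscriber.
    return [(client, data) for client in subscribers.get(channel, [])]


subs = {"/chat": ["alice", "bob"]}
raw = route_publish("/chat", "bob", "hi", subs)
shouted = route_publish("/chat", "bob", "hi", subs, transform=str.upper)
private = route_publish("/service/echo", "bob", "hi", subs)
```

Here `raw` goes to both subscribers untouched, `shouted` goes to both after the transform, and `private` only goes back to bob.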
The protocol doesn’t really provide a means to get to #2. From what I can glean from it, they expected the clients to listen on one channel and publish to another, then have “server clients” read and publish or god I don’t fucking know it’s retarded. Anyway, the guy who wrote Erlycomet probably thought it was retarded too because he layered his own protocol on top of the Bayeux specification specifically to make RPC requests.
Basically, if you send Erlycomet a message whose JSON payload has the following fields –
- id — opaque identifier which gets passed back to the client in the response
- method — name of the Erlang function in the RPC module to call
- params — list of parameters to pass to the RPC function
then it tries to call the function specified. If the call works (i.e., doesn’t raise an exception due to invalid arguments or missing function name or whatever) it broadcasts the result of the RPC call. If it doesn’t, it just reverts back to #1 and broadcasts the raw message to anyone subscribed.
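In Python terms, that try-the-RPC-then-fall-back behaviour looks roughly like this. It is a sketch of the logic as described, not a port of Erlycomet: `RPC_FUNCTIONS` stands in for the Erlang RPC module, `add` is a made-up function, and the `result` key in the response is my assumption (the only field the description pins down is `id`).

```python
import json

# Hypothetical stand-in for the Erlang RPC module: method name -> function.
RPC_FUNCTIONS = {
    "add": lambda a, b: a + b,
}


def handle_message(raw_json):
    """Return the payload that would be broadcast to subscribers."""
    message = json.loads(raw_json)
    try:
        call_id = message["id"]  # opaque to the server, only echoed back
        function = RPC_FUNCTIONS[message["method"]]
        result = function(*message["params"])
    except (KeyError, TypeError):
        # Missing fields, unknown function, or wrong arguments:
        # revert to the raw behaviour and broadcast the message untouched.
        return message
    return {"id": call_id, "result": result}


ok = handle_message('{"id": "7", "method": "add", "params": [1, 2]}')
unknown = handle_message('{"id": "8", "method": "nope", "params": []}')
plain = handle_message('{"text": "hello"}')
```

Only the first message is treated as an RPC call; the other two come back out exactly as they went in.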
In my application, I don’t want people sending random unvalidated shit around, so I ripped that “if the RPC call fails send the raw message” bullshit out. Every message is an RPC call which then pushes the validated message to any subscribed clients (and stores the state persistently and shit).
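A minimal Python sketch of that stricter setup, with plenty of assumptions: `post_comment`, `Rejected`, and `handle_strict` are invented names, a plain list stands in for persistent storage, and sending an error back to the publisher on failure is my guess at a sensible behaviour rather than something stated above.

```python
class Rejected(Exception):
    """Raised by an RPC function when the message fails validation."""


def post_comment(store, author, text):
    """Hypothetical RPC function: validate, persist, return what to broadcast."""
    if not isinstance(text, str) or not text.strip():
        raise Rejected("text must be a non-empty string")
    comment = {"author": author, "text": text.strip()}
    store.append(comment)  # stand-in for persistent storage
    return comment


RPC_FUNCTIONS = {"post_comment": post_comment}


def handle_strict(message, store):
    """Return (broadcast, reply); exactly one of the two is None."""
    try:
        function = RPC_FUNCTIONS[message["method"]]
        validated = function(store, *message["params"])
    except (KeyError, TypeError, Rejected) as error:
        # No raw fallback: nothing reaches subscribers, the sender gets an error.
        return None, {"id": message.get("id"), "error": str(error)}
    return validated, None


store = []
good = {"id": "1", "method": "post_comment", "params": ["anon", "  hi  "]}
junk = {"id": "2", "text": "random unvalidated stuff"}
broadcast, reply = handle_strict(good, store)
dropped, error_reply = handle_strict(junk, store)
```

The junk message never makes it to subscribers or into the store; only the validated comment does.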