The random rantings of a concerned programmer.

Archive for September, 2008

(Untitled)

September 19th, 2008 | Category: Random

It’s a bit past midnight and I’m kind of drunk (and trying to work on LickPens) so I guess I’m going to try and talk my way through the design of the fetcher thing. There are two types of data which need to be fetched and parsed: the gallery.xml indexes and the image data referenced within them (both thumbnails and the full images).

Since the fetch order (gallery.xml -> thumbs -> full images) runs from least memory intensive to most, I think a tiered system would be the most efficient way to organize things:

                               
               |     Images    |    
          |       Thumbnails        |    
     |       URLs of unfetched images    |    
|        URLs of unfetched gallery.xmls       |

I’m not really sure about keeping the bottom tier around, since I want to use shit like 4scrape’s random feed where if you fetched a gallery.xml again the contents would be different. So there wouldn’t be a point of saving the URL after freeing the data. Other than that, this setup is fairly straightforward on paper.

I just can’t figure out how the fuck to implement it. I mean, the simplistic way I figure it, the images only need to be fetched when a person wants to zoom up on a thumbnail (so it can be fetched on-demand and freed with the thumbnail). The thumbnails can be kept in a deque and just popped off as you scroll away from them (and then their URL would have to be popped onto the list of unfetched images).

It’s basically a zipper where the zipper bit is a deque, i.e.:

[ [unfetched1], [thumbnails], [unfetched2] ]

When we’re scrolling to the right, the fetcher pops URLs off the front of unfetched2 and fetches the data, pushing the fetched thumbnails onto thumbnails, then popping off the other end (freeing the image data) and pushing the old URLs onto unfetched1. As more gallery.xml files are fetched, the image URLs are pushed onto one of the unfetched deques and entries are popped from the other unfetched deque.
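The scrolling scheme above can be sketched in C roughly like this. It is a minimal sketch, not the LickPens code: the `deque` and `zipper` types, the fixed `DEQUE_CAP`, the `window` limit and the `scroll_right`/`scroll_left` names are all made up for illustration, and the actual fetching and freeing of image data is reduced to comments.

```c
#include <assert.h>
#include <stddef.h>

#define DEQUE_CAP 64 /* hypothetical fixed capacity, keeps the sketch short */

/* Ring-buffer deque of URL strings (the strings are not owned here). */
typedef struct {
	const char *items[DEQUE_CAP];
	size_t head;  /* index of the front element */
	size_t count; /* number of stored elements */
} deque;

void push_back( deque *d, const char *s ) {
	assert( d->count < DEQUE_CAP );
	d->items[( d->head + d->count ) % DEQUE_CAP] = s;
	d->count++;
}

void push_front( deque *d, const char *s ) {
	assert( d->count < DEQUE_CAP );
	d->head = ( d->head + DEQUE_CAP - 1 ) % DEQUE_CAP;
	d->items[d->head] = s;
	d->count++;
}

const char *pop_front( deque *d ) {
	assert( d->count > 0 );
	const char *s = d->items[d->head];
	d->head = ( d->head + 1 ) % DEQUE_CAP;
	d->count--;
	return s;
}

const char *pop_back( deque *d ) {
	assert( d->count > 0 );
	d->count--;
	return d->items[( d->head + d->count ) % DEQUE_CAP];
}

/* The zipper: [ unfetched1 | thumbnails | unfetched2 ] */
typedef struct {
	deque unfetched1; /* URLs to the left of the visible window */
	deque thumbnails; /* URLs whose thumbnails are currently loaded */
	deque unfetched2; /* URLs to the right of the visible window */
	size_t window;    /* most thumbnails to keep loaded at once */
} zipper;

/* Scroll one step right. Returns 1 if the window moved, 0 if unfetched2 was empty. */
int scroll_right( zipper *z ) {
	if ( z->unfetched2.count == 0 )
		return 0;
	/* real code would fetch the thumbnail for this URL here */
	push_back( &z->thumbnails, pop_front( &z->unfetched2 ) );
	if ( z->thumbnails.count > z->window ) {
		/* real code would free the evicted thumbnail's image data here */
		push_back( &z->unfetched1, pop_front( &z->thumbnails ) );
	}
	return 1;
}

/* Mirror image of scroll_right. */
int scroll_left( zipper *z ) {
	if ( z->unfetched1.count == 0 )
		return 0;
	push_front( &z->thumbnails, pop_back( &z->unfetched1 ) );
	if ( z->thumbnails.count > z->window )
		push_front( &z->unfetched2, pop_back( &z->thumbnails ) );
	return 1;
}
```

Hiding the three deques behind two scroll functions is one possible shape for the interface; a zero-initialized `zipper` with `window` set and URLs pushed onto `unfetched2` is enough to start scrolling.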

I guess the thing I’m dancing around is that I don’t want to have to maintain three separate deques (but will probably end up doing it anyway) — there’s got to be some sane interface which can handle this kind of data structure. I guess I should just use three deques for now and refactor later or something. Or just pass the fuck out already, lawds :<


So libcurl and openssl walk into a bar…

September 17th, 2008 | Category: Random

After spending all of last night embarking on a RuneQuest (3rd ed.) campaign in which the GM had to bring in this gigantic motherfucker with a fucking massive maul to finish off our unarmed characters (who were surrounded by the bodies of those that they had slain) to further the plot, I finally got a chance to spend some time working on that PicLens clone for *nix.

Little did I know there were so many fucking cans of worms in this fucking project.

I wrote a skeleton renderer with OpenGL and shit so I’ve got pretty quads flying around the screen, so I figured the next step was to throw libcurl into the mix and pull some images or something. Let me start this rant by saying libcurl is fucking awesome. It has all the functionality you typically need, is stable as fuck, and has bindings in most reasonable languages. Despite that, after I got it “working” I was still left with a mysterious segfault whenever I tried to fetch a page over SSL.

$ man libcurl-tutorial
libcurl is completely thread safe, except for two issues:  signals  and
SSL/TLS handlers. Signals are used timeouting name resolves (during DNS
lookup) - when built without c-ares support and not on Windows..

If you are accessing HTTPS or FTPS URLs in a multi-threaded manner, you
are  then of course using the underlying SSL library multi-threaded and
those libs might have their own requirements on this issue.  Basically,
you  need to provide one or two functions to allow it to function prop-
erly. For all details, see this:

    OpenSSL
    http://www.openssl.org/docs/crypto/threads.html#DESCRIPTION

It took me a while to realize where the fuck the segfault was coming from (because it was smashing the fuck out of the stack, so the backtrace was little help). And once I did figure out what the problem was, it took a while to figure out how to fix the fucker. Why didn’t those OpenSSL fucks just use the POSIX locking shit? It’s either for performance or portability, and I’d argue that their method of working around it is a nasty fucking hack.

tl;dr documentation: OpenSSL is not thread safe unless you pass it a couple of callbacks which

  • Return a unique identifier for the current thread of execution

    /* The cast assumes pthread_t is an integer or pointer type, which POSIX doesn't promise */
    unsigned long openssl_thread_id( void ) {
    	return (unsigned long) pthread_self();
    }

  • Provide a function which can both lock and unlock a series of mutexes

    /* OpenSSL passes the index of the lock plus the caller's file and line (unused here) */
    void openssl_thread_lock( int mode, int lock_id, const char* file, int line ) {
    	(void) file;
    	(void) line;
    	if ( mode & CRYPTO_LOCK )
    		pthread_mutex_lock( &thread_locks[lock_id] );
    	else
    		pthread_mutex_unlock( &thread_locks[lock_id] );
    }

Then you’ve got to hook those fuckers into the slots OpenSSL has for them (and hope none of the shared libraries you’re using is trying to do the same thing — which is why libcurl doesn’t try to fix that shit)

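#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <openssl/crypto.h>

/* Mutex table used by openssl_thread_lock(); declare it above the callbacks */
static pthread_mutex_t *thread_locks;
static int num_thread_locks;

int openssl_init( void ) {
	int i;

	/* If some other library already set either of these, don't dick around */
	if ( CRYPTO_get_locking_callback() || CRYPTO_get_id_callback() ) {
		fprintf( stderr, "OpenSSL locking callbacks are already in place?\n" );
		fflush( stderr );
		return 1;
	}

	num_thread_locks = CRYPTO_num_locks();
	/* calloc takes ( count, size ) in that order, and it can fail */
	thread_locks = calloc( num_thread_locks, sizeof( pthread_mutex_t ) );
	if ( thread_locks == NULL ) {
		fprintf( stderr, "Unable to allocate the mutex table.\n" );
		fflush( stderr );
		return 1;
	}

	for ( i = 0; i < num_thread_locks; ++i ) {
		if ( pthread_mutex_init( &thread_locks[i], NULL ) ) {
			fprintf( stderr, "Unable to initialize a mutex.\n" );
			fflush( stderr );
			return 1;
		}
	}

	CRYPTO_set_locking_callback( &openssl_thread_lock );
	CRYPTO_set_id_callback( &openssl_thread_id );

	return 0;
}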

And with that, libcurl no longer shits itself when you attempt to asynchronously fetch pages over SSL (which is good, because the page I'm going to be using this the most on is 4scrape, and I'm not moving that over to raw HTTP (SSL rox)). The one catch is that I'm still doing something or another wrong and the damn thing doesn't close nicely (libcurl segfaults in curl_multi_cleanup):

#0  0x2866a43a in free () from /lib/libc.so.7
#1  0x2827c67a in CRYPTO_free () from /lib/libcrypto.so.5
#2  0x2825effc in ASN1_OBJECT_free () from /lib/libcrypto.so.5
#3  0x2825eb5f in ASN1_primitive_free () from /lib/libcrypto.so.5
#4  0x2825ee3a in ASN1_primitive_free () from /lib/libcrypto.so.5
#5  0x2825eecb in ASN1_template_free () from /lib/libcrypto.so.5
#6  0x2825edc0 in ASN1_primitive_free () from /lib/libcrypto.so.5
#7  0x2825eecb in ASN1_template_free () from /lib/libcrypto.so.5
#8  0x2825edc0 in ASN1_primitive_free () from /lib/libcrypto.so.5
#9  0x2825ef13 in ASN1_item_free () from /lib/libcrypto.so.5
#10 0x2822efc7 in X509_free () from /lib/libcrypto.so.5
#11 0x281d4bd0 in X509_STORE_CTX_get1_issuer () from /lib/libcrypto.so.5
#12 0x282153f3 in sk_pop_free () from /lib/libcrypto.so.5
#13 0x281d5092 in X509_STORE_free () from /lib/libcrypto.so.5
#14 0x2814a503 in SSL_CTX_free () from /usr/lib/libssl.so.5
#15 0x281051f1 in Curl_ossl_close () from /usr/local/lib/libcurl.so.4
#16 0x28116944 in Curl_ssl_close () from /usr/local/lib/libcurl.so.4
#17 0x280ff257 in conn_free () from /usr/local/lib/libcurl.so.4
#18 0x28101e43 in Curl_rm_connc () from /usr/local/lib/libcurl.so.4
#19 0x281121d9 in curl_multi_cleanup () from /usr/local/lib/libcurl.so.4
#20 0x0804aacf in main () at main.c:397

At this point though, I'm willing to say "FUCK IT" and just not call that cleanup function. I'll figure it out later or something; in the meanwhile I need to start parsing some XML and loading some image data.
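Whenever the cleanup does get sorted out, the mutex table will want a teardown to match its setup. The following is a minimal sketch of that lifecycle written against pthreads only, so it compiles without OpenSSL: `DEMO_LOCK` is a made-up stand-in for the `CRYPTO_LOCK` bit, and the `demo_*` names are hypothetical. It does not explain or fix the segfault above. It assumes the teardown only runs after every thread that touches OpenSSL has finished.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define DEMO_LOCK 1 /* stand-in for OpenSSL's CRYPTO_LOCK bit; the value is made up */

static pthread_mutex_t *demo_locks = NULL;
static int demo_num_locks = 0;

/* Allocate and initialize n mutexes. Returns 0 on success, 1 on failure. */
int demo_locks_init( int n ) {
	assert( n > 0 && demo_locks == NULL );
	demo_locks = calloc( (size_t) n, sizeof *demo_locks );
	if ( demo_locks == NULL )
		return 1;
	for ( int i = 0; i < n; ++i ) {
		if ( pthread_mutex_init( &demo_locks[i], NULL ) != 0 ) {
			/* roll back the mutexes that did get initialized */
			while ( i-- > 0 )
				pthread_mutex_destroy( &demo_locks[i] );
			free( demo_locks );
			demo_locks = NULL;
			return 1;
		}
	}
	demo_num_locks = n;
	return 0;
}

/* Same dispatch as the locking callback: lock if the bit is set, else unlock.
 * Returns the pthread return code (0 on success). */
int demo_lock_apply( int mode, int lock_id ) {
	assert( lock_id >= 0 && lock_id < demo_num_locks );
	if ( mode & DEMO_LOCK )
		return pthread_mutex_lock( &demo_locks[lock_id] );
	return pthread_mutex_unlock( &demo_locks[lock_id] );
}

/* Destroy every mutex and free the table. Returns 0 if all were destroyed cleanly. */
int demo_locks_destroy( void ) {
	int failed = 0;
	/* In the real program the callbacks would be unhooked first, e.g.
	 *   CRYPTO_set_locking_callback( NULL );
	 *   CRYPTO_set_id_callback( NULL );
	 * so OpenSSL can't call into a table that no longer exists. */
	for ( int i = 0; i < demo_num_locks; ++i )
		if ( pthread_mutex_destroy( &demo_locks[i] ) != 0 )
			failed = 1;
	free( demo_locks );
	demo_locks = NULL;
	demo_num_locks = 0;
	return failed;
}
```

The ordering is the point: unhook the callbacks, then destroy the mutexes, then free the table.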

Come back next week when I try to stick my dick into Firefox's XPCOM extension bullshit only to find out my dick is no match for its spinning razors! Will I have to repaint my blood-soaked walls to alleviate the horror of being permanently disfigured? Only time will tell!


(Untitled)

September 14th, 2008 | Category: Random

I’m going to try to spare you more rants about work shit for a while since I’ve been rambling on about that shit a lot. Part of that is because I haven’t been spending time on personal projects lately; it’s all been “go to work, come home and work on Apache modules, go to work, etc”. That’s not due to a lack of projects on my part; it’s mostly because writing Apache modules in C is really fun (flashy dynamic languages have too much wank) and I honestly don’t have any game ideas (or artistic talent) to use for a Haskell PC/DS game (going to write a hentai game eventually!).

I think I am going to start on a personal project soon though — a fucking PicLens clone for *nix. The concept is fucking golden, but the lack of support for anything except terrible!!! operating systems has driven me nuts. Browsing 4scrape through the normal interface is fun and all, but using PicLens on the random viewer is my favorite fucking timekiller. And my Windows machine went tits up, so I need an escape.

The real PicLens has some other “problems” I want to address in my version, as well. I’m not quite sure how they’re managing memory, but I don’t think they’re freeing as proactively as they should. PicLens isn’t designed for a near-infinite stream of input, which is exactly what 4scrape’s random function does. So after 10 minutes of zooming around on my 5GB Windows machine, everything would start to get a bit sluggish.

PicLens also has an embedded UI which lets you search the big-name sites for images (Google, Youtube, etc), but it isn’t extensible. You can’t change any of the entries there (well, without binary hacking, at least).

There are some features that PicLens has that there’s no way in hell I can implement; namely, the embedded Flash player for watching Youtube. I imagine this is the primary reason they’re not going to release a *nix version anytime soon, since Flash support is absolute shit (hooray binary blobs!!! Thanks, Adobe!!!). Hopefully Gnash will stop sucking eventually, but I doubt I’ll bother incorporating it (even if I could) simply because fuck it. Really depends on how easy it is to embed Gnash in an existing application (i.e., how to tie it into the renderer). Also it’s written in Sepples which opens up another can of worms but whatever.

tl;dr Taro can’t sleep and likes pretty pictures.


(Untitled)

September 13th, 2008 | Category: Random

Goddammit, I have to give a presentation/demo of Muradora on Tuesday (which means I first have to dick around and try to get it installed on one of our machines which is a pain in the dick for any Tomcat-deployable application; one of the many reasons I’m writing my own webservice deployment framework as an Apache module).

Anyway, the IT infrastructure here has a couple of brand new 8-core 24GB machines running VMWare ESX Server which they use to hand out virtual machines for development (or, they would, except it was decided by the higher ups that they shouldn’t). I had a VM from this pool sitting around, but it was a fucking mess (we’d been using it to test all kinds of random fucking installs) so on Thursday I asked the guys in charge of that system to wipe it and give me a fresh Fedora Core 9 image, noting that if it couldn’t be done before the weekend to just forget about it.

Late Thursday afternoon I was called up to their office so I could watch how they did it (because VMWare ESX is a pretty neat concept[1] and I was interested in how it was managed). Unfortunately, when they attempted to clone a new VM it bitched about multiplexing the NIC (or something stupid) so they told me they’d do it on Friday.

Today is Saturday and I still haven’t heard anything from them (despite sending a couple more emails), and my VM instance is still offline. Which means they’re probably not going to “get around to it”, which means I have to once again use one of my personal dev machines (probably Barasuishou, since I don’t want to break Suigintou). Thanks, fuckheads.


[1] I’m not sure if it’s the way that VMWare handles things or if it’s just the way they have it configured, but the ESX client is basically a fucking remote desktop multiplexer. Each of the fucking VM instances is running an X server and has a full blown desktop environment running for the goddamn ESX client. What the fucking hell, talk about a serious waste of resources. If I ever get my fucking VM back I’m uninstalling X, just to be an asshole.


(Untitled)

September 10th, 2008 | Category: Random

So far ads on 4scrape have generated a whopping $2.66, which I think is pretty damn good, especially considering the ad providers are out to rip you the fuck off. For example, the 200×300 rectangle banner used on 4scrape serves both banner ads and text ads (and AdBrite doesn’t let you disable the text ads). While this seems like a trivial thing, it really isn’t.

First, I want to point out that no one clicks on ads. Maybe they did back in the 90s, but people either have them blocked altogether or are trained to ignore them, save for a side glance. Obviously this doesn’t necessarily apply to people who “INTERNET? HOW DID I GET HERE?”, but the userbase of 4scrape seems reasonably savvy. And of course, statistics don’t lie: in the past 10 days, 98,736 ads have been viewed and there’s only been one click.

Now, this is the trick when it comes to how ads pay out: they’re either Cost-Per-Click (CPC) or Cost-Per-Mille (CPM, i.e. per thousand impressions). CPC pays out per click and CPM pays out per thousand views. While CPC ads tend to have a considerably larger payout per event (100-1000x larger), ZERO TIMES ZERO IS STILL FUCKING ZERO. So any time someone gets a CPC ad on their screen and doesn’t click it, no money is made.

The clincher is that, on AdBrite, all the text ads are CPC and all of the banner ads are CPM, but there’s no way to tell them “I just want banner ads”. And if you don’t tell them, they’ll give you mostly text ads (off of which no money is made). Over the past 10 days, a whopping 80% of the ads displayed have been those fucking text ads, essentially a financial no-op.

So what did I do? I disabled their “automatically approve network ads for your site!” and went through the list of current ads and rejected all of the text ones. Now I can make sure that only ads I’m actually getting paid to display are displayed, which will increase my un-profit margin fivefold (turning that $2 into $10).

Hooray!
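The arithmetic behind that estimate can be sketched in a few lines of C. It rests on two assumptions that the post does not confirm: that the whole $2.66 came from banner (CPM) impressions, and that every former text-ad slot would be filled by a banner paying the same rate. The function names are made up for illustration.

```c
#include <assert.h>

/* Revenue per thousand paying (banner) impressions.
 * banner_share is the fraction of all impressions that were banners. */
double effective_cpm( double revenue, double impressions, double banner_share ) {
	assert( impressions > 0.0 && banner_share > 0.0 && banner_share <= 1.0 );
	return revenue / ( impressions * banner_share ) * 1000.0;
}

/* Revenue if every one of the impressions paid at the given CPM rate. */
double projected_revenue( double cpm, double impressions ) {
	return cpm * impressions / 1000.0;
}
```

With the numbers above, `effective_cpm( 2.66, 98736.0, 0.20 )` comes out to about $0.13 per thousand views, and `projected_revenue` over the same 98,736 impressions to about $13.30. That is five times the original take, or a 400% increase.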

