The random rantings of a concerned programmer.

May 12

So, I lied.

Category: Random

4scrape is alive again. Completely rewritten from scratch in Go, featuring even more stupid bullshit on the frontend. And it’s using a real fulltext indexer this time around. I’m in it for the long haul.



Aug 16

bullshit rant, part 1

Category: Random

If you haven’t watched Josh Berkus’s presentation, “Scale Fail”, consider doing so now.

I’m a real sucker for using Reddit/prog/it to choose what programming language/framework/libraries to use for my upcoming projects. Sometimes it’s worse than usual, and I find myself trying to figure out which project to shoehorn one thing or another into.

Naturally, right now I’m running high on the Node.js wave and tweaking along the Go rave. Unsurprisingly, I’ve had wildly different experiences with each.

I started using Node.js back when it was version 0.2.0, back before npm was widely used, when Express was first brewing, even before hipsters decided how to be cool. To be honest, since then it’s had a fairly stable API, and the ecosystem gets better every day. In my mind though, it still has two massive warts:

Continuation passing can make code pig disgusting

Quite a few people have bitched about continuation passing in Node.js being a pain for a variety of reasons.

It makes the code look like piss

This is not that big of a deal — if your code looks like piss because of the continuation passing style, then you’re not doing it right. Your code needs to either be restructured to accurately reflect what you’re trying to do, or you should use one of several existing libraries for manipulating control flow in a CPS-friendly manner.

Lexical scoping can make refactoring code “fun”

Almost every Node.js application is going to have code that looks like

function handler(foo, bar) {
  foo.do(function(x) {
    bar.do(function(y) {
      /* snip */
    });
  });
}

Assume that you want to lift the callback passed to foo.do into a separate, named function:

function lifted(x) {
  bar.do(function(y) {
    /* snip */
  });
}

The problem is that we’re referencing bar which was previously closed on from the parent environment. By lifting the function out of the lexical scope (presumably to use the same functionality elsewhere), we completely sacrifice all the automatic loveliness of closures. Instead, we have to explicitly go through the function to lift (which may be fairly long and complex, mind you), annotate exactly what’s used from the parent scope, then manually bind it into a closure:

function lifted(bar) {
  return function(x) {
    bar.do(function(y) {
      /* snip */
    });
  };
}

function handler(foo, bar) {
  foo.do(lifted(bar));
}

To add insult to injury, the compiler will accept whatever you give it, and fault you at runtime for any mistakes.

If you’ve got some Javascript curry magic, please let me know :(

When spaghetti is what’s on the menu

The biggest issue I have with continuation passing is that (since I’m dealing mostly with web applications) it’s exceedingly difficult to trace a failure back to a specific request. Error propagation in Node.js manifests itself in one of three forms:

  1. A thrown exception
  2. An 'error' event is emitted from an EventEmitter
  3. An error parameter is passed to a callback

Thrown Exceptions

Exceptions are used solely to communicate errors in synchronous calls, which naturally are few and far between. The handler is almost always in the same lexical scope as the calling code (since letting the exception propagate further may trigger an 'uncaughtException' event on process, which should almost always kill the application as cleanly as possible), so it’s not that big of a deal.

If you’re balking at the aforementioned “'uncaughtException' should kill the application as cleanly as possible”: the logic here is that you really have no idea where the hell the exception came from, and even if you do, you have no access to the scope from which it was thrown. You can’t guarantee that your application is in a consistent state, so your only recourse is to SHUT. DOWN. EVERYTHING. (If this is not the case and you can guarantee a consistent state, somehow, please let me know. That would be amazing.)

Error Events

'error' events are fun little things: whenever you emit an 'error' event on any EventEmitter and there are no listeners for that event, the emitter throws the error instead, and if nothing catches it the process exits.

This actually works out really well, in most cases. Whenever you’re binding listeners for something, you should bind the error listener too — you’ve got all the stuff you need to identify from whence the error originated within lexical scope.

The one massive snafu is that when you’re abstracting an abstraction that uses EventEmitter internally, you MUST remember to handle and forward all the error events. You might be reading this and say “oh but that’s easy to remember it’ll never happen”. It actually happens more often than you think — the built-in http.Client functionality didn’t properly catch and forward errors from the internal net.Socket for a long time. You had to manually get the undocumented socket member and attach a listener manually. I think they fixed this in the new http.request interface.

Errors in callbacks — lexical scope strikes again

I think the most common form of error passing is by just returning an error code in a callback parameter. This works really well, until you start refactoring stuff and realize how much shit you’re stuffing in a closure.

Actually I don’t remember where I was going on this one — it seems that it’s trivially solved by just using the “pass it forward” error code C-ism.

Writing native (C++) extension is a bitchface

I uhh, this is getting kind of long. I’m gonna break this into a multipart/post and write up section 2 of Node.js bitchings and then eventually write up a section on Go bitchings. Hooray!



Jul 17

Invoking mount(2) in FreeBSD 8.x

Category: Random

So I’m still writing Go bindings for a lot of common FreeBSD functionality. Yesterday I implemented a means to list all mounted filesystems, so today I’m writing the bindings to mount(2) to mount/umount them.

If you look at the man page for mount, you’ll see that the function signature looks like this:

    int mount(const char *type, const char *dir, int flags, void *data);

The void* should scare you.

I haven’t been able to dig up any information about what the fuck should be passed to it (granted, I haven’t looked very hard because, judging from the contents of src/sbin/mount_*/*.c in the FreeBSD sources, it’s been entirely superseded by nmount):

    int nmount(struct iovec *iov, u_int niov, int flags);

Poking around, struct iovec (eventually included from sys/uio.h) is defined as this:

struct iovec {
    void *iov_base;
    size_t iov_len;
};

Effectively, nmount takes an array of these structs, which form a flattened vector of (key, value) tuples. As far as I can tell, iov_base is always a NUL-terminated char*, and iov_len should be strlen(iov_base) + 1 (for the NUL terminator).

Unfortunately, the only hints that man 2 nmount gives us are:

The following options are required by all file
     systems:
	   fstype     file system type name (e.g., ``procfs'')
	   fspath     mount point pathname (e.g., ``/proc'')

     Depending on the file system type, other options may be recognized or
     required; for example, most disk-based file systems require a ``from''
     option containing the pathname of a special device in addition to the
     options listed above.
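To make the flattened (key, value) layout concrete, here is a minimal pure-Go sketch. It does not call nmount at all; the option type and the flatten helper are hypothetical names invented for illustration, and the procfs values are just the ones from the man page excerpt above. It only shows how each string picks up a trailing NUL and a length of strlen + 1.

```go
package main

import "fmt"

// option is a hypothetical stand-in for one struct iovec entry:
// Base plays the role of iov_base, Len the role of iov_len.
type option struct {
	Base []byte
	Len  int
}

// flatten turns key/value pairs into the flat [key, value, key, value, ...]
// layout. Every entry gets a trailing NUL byte, and Len counts that
// terminator, mirroring strlen(s) + 1.
func flatten(pairs [][2]string) []option {
	opts := make([]option, 0, 2*len(pairs))
	for _, kv := range pairs {
		for _, s := range kv {
			b := append([]byte(s), 0)
			opts = append(opts, option{Base: b, Len: len(b)})
		}
	}
	return opts
}

func main() {
	opts := flatten([][2]string{
		{"fstype", "procfs"},
		{"fspath", "/proc"},
	})
	for _, o := range opts {
		fmt.Printf("%q len=%d\n", o.Base, o.Len)
	}
}
```

A real binding would still have to hand these buffers to C (and keep them alive for the duration of the call), which this sketch deliberately leaves out.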

So far, the only way I’ve been able to find the actual options is to dig through mount_* sources and see what they use, but it’s pretty gross. Take, for example, the following two filesystems:

  • nullfs simply layers one vnode on top of another, effectively grafting one directory over another.
  • unionfs (roughly) does the same thing, but still lets you access the grafted-over directory in read-only mode (and can be configured to do cool shit like copy-on-write).

They’re pretty close, but let’s look at the arguments that each of them take:

nullfs
  • fstype: “nullfs”
  • fspath: Path to the directory to graft over.
  • target: Path of the directory that’s being grafted onto another.

IMHO, "target" should be "from", bikesheds, et al.

unionfs
  • fstype: “unionfs”
  • fspath: Path to the directory where the unionfs will be mounted.
  • from: Same as “target”, above.
  • below: Makes “fspath” writable, “from” read-only (swaps default behavior)
  • errmsg: …I have no fucking idea, a char[255] which presumably is used as a buffer instead of errno?
  • …anything else passed as -oyour=mom passed to mount_unionfs?!

Maybe this is more a gripe that unionfs seems to be very shitty. And maybe I just haven’t found a nice magical table of options that every filesystem takes. But FFFFFF SERIOUSLY >:(



Jul 16

getmntinfo(2) from Go — a foray into cgo

Category: Random

Go is a fun esoteric language that strives for system-level usage. Currently in all real operating systems, C is the dominant systems language and as such, all the functionality for interfacing with core features is exposed as raw C APIs. Go provides a C FFI layer called cgo, which handles all the preprocessing and linking magic in the background. Unfortunately, there’s little-to-no documentation available for cgo, just a couple of toy examples in Go’s misc/cgo directory (there’s actually a shitton of production examples in the Go package sources though; fucking everything uses cgo).

So, what I want to do is expose getmntinfo, which simply lists the metadata for all mounted filesystems. In C, this is pretty trivial:

#include <sys/param.h>
#include <sys/ucred.h>
#include <sys/mount.h>

#include <stdio.h>

int main() {
	struct statfs *bufs;
	int i = getmntinfo(&bufs, 0);
	int j = 0;

	for (j = 0; j < i; ++j) {
		struct statfs fs = bufs[j];
		printf("[%s] %s -> %s\n", fs.f_fstypename,
			fs.f_mntfromname, fs.f_mntonname);
	}

	return 0;
}

This, however, presents a variety of problems for the Go implementation –

  1. We don’t really know how many struct statfs we’re getting back.
  2. The memory allocated is actually allocated statically; we just get an opaque pointer back to an in-library address.
  3. The fields of struct statfs are char[N]s rather than char*s.

Thankfully, calling getmntinfo is pretty trivial –

func GetMntInfo() []MntInfo {
	var tmp *C.struct_statfs
	i := int(C.getmntinfo(&tmp, 0))

It’s pretty close to the C version: we just declare a pointer and pass its address in. getmntinfo sets the pointer to an internal array of struct statfs entries and lets us go along our merry way. Naturally, we want to marshal it to the appropriate Go types.


	info := make([]MntInfo, i)
	for j := range info {

So we create a slice to marshal values into and begin to iterate through it.

This is where it gets nasty. All we have right now is an opaque pointer to a struct statfs — in C we’d just use pointer arithmetic to get the other entries in the array. Go, fortunately, explicitly disallows pointer arithmetic. I’m not sure what the appropriate method to get values out of it is. First, I tried something like


foo := (*[]MntInfo)(unsafe.Pointer(tmp))
item := (*foo)[j]

But that seems to cause a panic (no idea why). I got tired of dicking with it and threw in the cards, simply exposing the following C function in the cgo header –


struct statfs* offset(struct statfs *v, int i) {
	return v + i;
}

With that, there’s no need to dick with much of anything, so we can get the current struct statfs of the iteration pass via


		s := C.offset(tmp, C.int(j))
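As for the earlier panic, the most likely cause is that a Go slice is not a bare pointer: it is a small header (data pointer, length, capacity). Casting the statfs pointer to *[]MntInfo makes Go read that header out of the struct's own bytes (and MntInfo is the wrong element type anyway). The pure-Go sketch below shows one way to view existing memory as a slice, assuming a Go version that has unsafe.Slice (1.17 or later, so well after this post). The record type is a hypothetical stand-in for C.struct_statfs so the example runs without cgo.

```go
package main

import (
	"fmt"
	"unsafe"
)

// record is a hypothetical stand-in for C.struct_statfs.
type record struct {
	ID int32
}

// viewAsSlice builds a slice over n records starting at first, without
// copying. The caller must guarantee the memory really holds n elements.
func viewAsSlice(first *record, n int) []record {
	return unsafe.Slice(first, n)
}

func main() {
	// backing plays the role of the array that getmntinfo hands back.
	backing := [3]record{{ID: 1}, {ID: 2}, {ID: 3}}
	view := viewAsSlice(&backing[0], len(backing))
	for i, r := range view {
		fmt.Println(i, r.ID)
	}
}
```

With cgo the same idea would apply to the pointer and count returned by getmntinfo, which would make the C offset helper unnecessary; treat that as an untested assumption rather than what the code in this post does.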

Finally, the char[N] values need to be marshaled out. Unfortunately, the C.GoString marshaling function only takes a char* and it’s too damn stubborn to take an implicitly-convertible type (noting that X* != X[]). The other beef is that cgo’s type system processes a char[] strangely as a []_C_char_type, so we can index it perfectly fine (but not implicitly coerce it into a pointer).

So we juggle some types and shit all over unsafe.Pointer and make it do what we want –

		// Break after the opening paren so Go does not insert a semicolon mid-expression.
		info[j].FsType = C.GoString(
			(*C.char)(unsafe.Pointer(&s.f_fstypename[0])))
		info[j].MntFrom = C.GoString(
			(*C.char)(unsafe.Pointer(&s.f_mntfromname[0])))
		info[j].MntOn = C.GoString(
			(*C.char)(unsafe.Pointer(&s.f_mntonname[0])))
	}

	return info
}
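Since the char[N] fields can be indexed just fine, another option is to skip unsafe.Pointer and copy the bytes out by hand. The sketch below is pure Go so it runs without cgo: rawStatfs is a hypothetical stand-in for the C struct (array sizes are arbitrary, and int8 is assumed for C's char), and the MntInfo layout is guessed from the three fields assigned above.

```go
package main

import "fmt"

// MntInfo mirrors the three fields filled in above; its real definition
// is not shown in the post, so this layout is an assumption.
type MntInfo struct {
	FsType, MntFrom, MntOn string
}

// rawStatfs is a hypothetical stand-in for the char[N] members of
// struct statfs.
type rawStatfs struct {
	fstypename  [16]int8
	mntfromname [32]int8
	mntonname   [32]int8
}

// fixedToString copies bytes up to the first NUL (or the end of the
// field if no NUL is present) and returns them as a Go string.
func fixedToString(field []int8) string {
	buf := make([]byte, 0, len(field))
	for _, c := range field {
		if c == 0 {
			break
		}
		buf = append(buf, byte(c))
	}
	return string(buf)
}

// toMntInfo converts one raw entry into the Go-side struct.
func toMntInfo(s *rawStatfs) MntInfo {
	return MntInfo{
		FsType:  fixedToString(s.fstypename[:]),
		MntFrom: fixedToString(s.mntfromname[:]),
		MntOn:   fixedToString(s.mntonname[:]),
	}
}

// fill writes src into dst, always leaving room for a trailing NUL.
// It exists only to build test data for this sketch.
func fill(dst []int8, src string) {
	for i := 0; i < len(src) && i < len(dst)-1; i++ {
		dst[i] = int8(src[i])
	}
}

func main() {
	var raw rawStatfs
	fill(raw.fstypename[:], "ufs")
	fill(raw.mntfromname[:], "/dev/ad0s1a")
	fill(raw.mntonname[:], "/")
	fmt.Printf("%+v\n", toMntInfo(&raw))
}
```

The byte-by-byte copy is slower than C.GoString, but it never reads past the end of the array if a field happens to be missing its terminator.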

And, after several hours of not finding any fucking documentation and screaming at the fucking monitor, the damn thing finally works. I’m completely glossing over the terrible shitty build system they’ve got set up (it basically only provides functionality to install built cgo packages; I haven’t found a way to actually build and link them otherwise). I will probably have to read through all the fucking makefiles that do evil shit.

At some point just doing everything in C is easier, I suspect :|


will post full code listing in a sec



Jul 11

Calling a templated member function of a typedef’d template class

Category: Random

C++ is insane.

Assume you have a templated Object:

template <typename T>
struct Object {
	template <typename U> void func(){};
};

And you want to wrap up the instance in a Proxy object:

template <typename T>
struct Proxy {
	typedef Object<T> WrappedType;
	WrappedType obj;

	static void Func() {
		Proxy *self = new Proxy;
		self->obj.func<Foo>();
	}
};

Pretty straightforward, but when you actually try to invoke Proxy::Func on an arbitrary T using g++

struct Foo {};

int main() {
	Proxy<Foo>::Func();
	return 0;
}

g++ shits itself completely:

$ g++ test1.cpp
test1.cpp: In static member function ‘static void Proxy::Func()’:
test1.cpp:13: error: ‘Foo’ was not declared in this scope
test1.cpp:13: error: expected primary-expression before ‘)’ token
$ g++ --version
i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)

Fucking fantastic.

Some tinkering reveals that the compiler is getting confused as to what the fuck obj.func is somewhere. The following implementation of Func works fine (but defeats the point of using templates) --

	static void Func() {
		Proxy *self = new Proxy;
		Object<Foo> bar = self->obj;
		bar.func<Foo>();
	}

I searched for awhile and turned up jack diddly squat, then a co-worker informed me the fix is to use the following:

	static void Func() {
		Proxy *self = new Proxy;
		self->obj.template func<Foo>();
	}

I don't know what the fuck this instance.template function<..>() bullshit is, but apparently MSVC implicitly puts it in there for you. I've certainly never seen it before and it's completely orthogonal to any fix I would have assumed.

tl;dr C++ is a clusterfuck.


EDIT: There's a Stack Overflow post whose answers reference the C++03 standard (14.2/4). fml.

