The random rantings of a concerned programmer.

A centralized system for sharing sensitive content

June 20th, 2011 | Category: Random

I have too many stupid ideas which I’ll never have enough time to implement. Despite that, some of them I’d really like to see implemented because I bloody need them. So please someone steal this and implement it, even though it’s a stupid piece of trash ;_;

Overview

The goal is to combine the easy-to-use native interfaces of DropBox (http://www.dropbox.com/) with the paranoid, strong cryptography of Tarsnap (http://www.tarsnap.com/) to create a cloud-based, shareable storage system where you can share content with yourself and other people, but not even the server operators can see the content being shared.

Deficits in Existing Systems

DropBox

DropBox is a service for easily storing and sharing content in the cloud — after registering an account, it effectively presents itself as a file share on your local machine (Windows, OSX, Linux, etc). Any changes to the data on the file share are automatically and seamlessly propagated to the central server, and from there to any other clients looking at those files. Effectively, it’s a USB drive that’s stored on the internet.

Their shell integration is critical to their success — a naive user can simply run the software and interact with it in the same manner as a USB thumb drive. Because it exposes itself as a logical volume, applications can interface with it out-of-the-box.

Despite the amazing ease-of-use, DropBox is completely insecure and unsuitable for use in a sensitive environment:

  • It relies on password authentication
  • The server software they use is buggy; numerous critical security holes are constantly found
    • Password reset doesn’t deauth clients, http://forums.dropbox.com/topic.php?id=12645
    • Able to reset any password, http://pastebin.com/yBKwDY6T
  • Data is not encrypted client-side; the hosting provider (or anyone who can get access to it) can read all your data

Tarsnap

Tarsnap is an online backup system “for the truly paranoid”. After registering, you provide Tarsnap with a public key that authenticates all data requests. There are two methods of operation — put data and get data. All data is automatically encrypted by the client software with your public key, then signed with your private key, then sent to the server. As soon as the data leaves your system, no one can ever access it again without your private key.

Despite the extreme caution it takes with data security, Tarsnap is completely unusable for the majority of DropBox’s use cases:

  • All core functionality is exposed in command-line tools rather than shell integration
  • Designed around loading large, static files; no support for inter-file metadata (directories, etc)
  • Everything is done with a single key pair — you cannot share data with other users without giving them your private key

Solution Criteria

We need something that combines the ease of use of DropBox’s data sharing with the data paranoia of Tarsnap. In particular, it should fulfill the following criteria:

  • No data sent over the network or stored on the server should be unencrypted
  • The server should not be able to decrypt any of the data it contains
  • Private keys must never be shared
  • It must be possible for one user to share a single binary file with multiple users without duplicating the binary content
  • The system must present itself to the end-user as if it were a USB drive (e.g., seamless shell integration)

Proposed Solution

Transport-Level Details

Data is represented as an encrypted unit which will henceforth be termed a “blob”. A blob consists of the following data segments:

  1. The binary payload itself, encrypted with a single-use symmetric key, X₀
  2. A list of Pₙ, where each Pₙ is the known public key of a friend the user authorizes to view the data
  3. A list of Xₙ, where each Xₙ is X₀ encrypted with the corresponding Pₙ

Each blob is identified by the SHA256 (or equivalent) hash of its contents (henceforth referred to as the blob ID).
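
To make that concrete, here’s a rough sketch in Python, purely illustrative: the Blob class and the length-prefixed framing are made up for this sketch, not part of any spec.

    import hashlib
    from dataclasses import dataclass

    @dataclass
    class Blob:
        """One encrypted unit of storage, as described above."""
        payload: bytes        # file contents encrypted with the single-use symmetric key X0
        recipients: list      # the public keys P1..Pn authorized to read the payload
        wrapped_keys: list    # X0 encrypted under each corresponding Pn

        def serialize(self) -> bytes:
            # A real implementation would want a proper framing format; length-prefixed
            # concatenation is enough to make the hash well defined for this sketch.
            parts = [self.payload] + list(self.recipients) + list(self.wrapped_keys)
            return b"".join(len(p).to_bytes(4, "big") + p for p in parts)

        def blob_id(self) -> str:
            # The blob ID: SHA-256 of the blob's serialized contents.
            return hashlib.sha256(self.serialize()).hexdigest()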

Like Tarsnap, the transport provides two operations – putting content on the server, and getting content from the server.

Sending Content

To put content on the server, one blob is created for each logical file, signed with the user’s private key, then uploaded to the server. The server can then verify that the payload was sent by the user and is what the user intended to send. Furthermore, it can see whom the user has authorized to view the data (so it can quickly send access-denied responses to people who don’t have access to the content).
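
A hedged sketch of that put path, building on the Blob sketch above and using Python’s cryptography package: Fernet for the single-use symmetric key, RSA-OAEP to wrap it for each recipient, and RSA-PSS for the uploader’s signature. Those algorithm choices are mine, purely for illustration; the design doesn’t mandate any of them.

    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    def build_blob(plaintext: bytes, recipient_public_keys) -> Blob:
        # X0: a single-use symmetric key, generated fresh for this blob only.
        x0 = Fernet.generate_key()
        payload = Fernet(x0).encrypt(plaintext)

        recipients, wrapped_keys = [], []
        for pub in recipient_public_keys:
            recipients.append(pub.public_bytes(
                serialization.Encoding.PEM,
                serialization.PublicFormat.SubjectPublicKeyInfo))
            # Encrypt X0 under each authorized friend's public key Pn.
            wrapped_keys.append(pub.encrypt(x0, padding.OAEP(
                mgf=padding.MGF1(hashes.SHA256()),
                algorithm=hashes.SHA256(), label=None)))
        return Blob(payload, recipients, wrapped_keys)

    def sign_blob(blob: Blob, sender_private_key) -> bytes:
        # The server verifies this signature to confirm who uploaded the blob.
        return sender_private_key.sign(
            blob.serialize(),
            padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                        salt_length=padding.PSS.MAX_LENGTH),
            hashes.SHA256())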

Receiving Content

Likewise, a client can receive content by sending a request for a specific blob ID. The request is signed with the user’s key for authentication purposes. If the signature checks out and the requester is among the blob’s authorized recipients, the server then transmits the blob.

The client then thumbs through the blob and finds the copy of the single-use symmetric key that was encrypted with their public key. They decrypt it with their private key, then use the recovered key to decrypt the payload of the blob.
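
Continuing the same sketch, the receive side looks like this: find the wrapped key that matches your own public key, unwrap it with your private key, and decrypt the payload.

    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    def open_blob(blob: Blob, my_private_key) -> bytes:
        my_pub = my_private_key.public_key().public_bytes(
            serialization.Encoding.PEM,
            serialization.PublicFormat.SubjectPublicKeyInfo)
        # Find the copy of X0 that was encrypted for us.
        idx = blob.recipients.index(my_pub)
        x0 = my_private_key.decrypt(blob.wrapped_keys[idx], padding.OAEP(
            mgf=padding.MGF1(hashes.SHA256()),
            algorithm=hashes.SHA256(), label=None))
        # Use the recovered single-use key to decrypt the payload.
        return Fernet(x0).decrypt(blob.payload)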

Listing/Removing Content

Since the server knows effectively nothing about the content, these are pretty easy use cases: the client simply sends a signed request to the server. For listing, the server sends a list of blob IDs back to the client (in addition to possible metadata, like file size, for billing purposes). For removal, the client simply sends a blob ID (or list of IDs) to the server and the server removes them.
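
Both of those boil down to “send a signed request”; a minimal sketch (the request encoding here is invented just for illustration):

    import json
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding

    def signed_request(private_key, action: str, blob_ids=None) -> dict:
        # e.g. signed_request(key, "list") or signed_request(key, "remove", ["<blob id>"])
        body = json.dumps({"action": action, "blob_ids": blob_ids or []}).encode()
        signature = private_key.sign(
            body,
            padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                        salt_length=padding.PSS.MAX_LENGTH),
            hashes.SHA256())
        return {"body": body, "signature": signature}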

Providing a Seamless User Experience

What’s been described thus far is effectively Tarsnap with a form of content sharing built in. As such, it is only suitable for programmatic clients, not end users. In addition to transmitting, storing and receiving binary blobs, the user must be able to attach metadata to those blobs. Some likely forms of metadata include:

  • Symbolic name of the content (e.g., a filename)
  • Hierarchical organization of the content (e.g., file directory structure)
  • Other tidbits filesystems are normally expected to provide (atime/mtime, etc.)

Support for metadata is built entirely on top of the existing transport infrastructure — metadata for all files belonging to a user is encoded as a single, separate blob which contains a hierarchy of metadata objects, each of which contains the blob IDs of the data it references.

In addition to the actual metadata, as listed above, each metadata object also contains one of the following:

  1. A blob ID which references the blob containing the content of the file, OR
  2. A set of “child” metadata objects (i.e., the object is a directory), OR
  3. A blob ID which references another metadata blob (e.g., a shared directory)

The “shared directory” is an abstraction on top of the transport-level permission details that serves two purposes: it provides a way to share metadata with other users at a finer granularity than all-or-nothing, and it provides an intuitive way to do directory-level sharing (e.g., having a “Shared with Alice and Bob” directory — though the client would have to make sure every blob referenced in that tree was appended with the appropriate encrypted keys).
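
Pulling the metadata ideas together, a metadata object could look something like this (field names are made up; only the three variants above actually matter):

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class MetadataObject:
        """One node in the metadata hierarchy. Exactly one of content_blob_id,
        children, or shared_metadata_blob_id should be populated."""
        name: str                                      # symbolic name, e.g. a filename
        mtime: float = 0.0                             # plus whatever other filesystem tidbits you want
        content_blob_id: Optional[str] = None          # (1) blob ID of this file's encrypted contents
        children: List["MetadataObject"] = field(default_factory=list)  # (2) this object is a directory
        shared_metadata_blob_id: Optional[str] = None  # (3) blob ID of another metadata blob (a shared directory)

The whole tree then gets serialized and stored as one ordinary blob, so metadata automatically inherits the same encryption and sharing machinery as file contents.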

At this point, we’ve effectively built, from the ground-up, a centralized file-sharing system with no shared secrets.

Good luck making it financially viable ;_;

EDIT: Apparently it already exists. lol.

6 comments

(Untitled)

April 24th, 2009 | Category: Random

I just found this article on Kotaku which lightly discusses the use of the Nintendo DS as an educational tool — having the students play Brain Age or write stories about their Nintendogs and stupid shit like that.

Don’t get me wrong — I think using a Nintendo DS (or other handheld technology) as an educative device is fucking great. I think trying to hack a curriculum around existing games is fucking stupid.

The technology should be incorporated into the existing curriculum. I remember learning multiplication tables — every day we’d have a 10-question quiz where you had 30 seconds to complete a bunch of simple multiplication problems. The teacher would then go through them, mark each sheet, and record all the results.

It’s a really dead-simple use-case: write an NDS application which allows the student to authenticate and take the test on the hardware during the test period instead of taking notes on a piece of paper. This has a shitload of advantages over the low-tech version –

  • Math questions can be randomly generated based on a set of rules to prevent cheating (everyone gets a slightly different set of questions); see the sketch after this list.
  • Tests can be immediately and automatically scored to save time and improve turnaround and accuracy.
  • Questions, answers and results can be stored in a centralized database for statistics and analysis on a large-scale.
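
The generate-and-score core really is dead-easy; something like this sketch, give or take the UI, the question rules, and the database:

    import random

    def make_quiz(num_questions=10, max_factor=12, seed=None):
        """Generate a multiplication quiz from a simple rule (factors up to max_factor).
        A per-student seed gives everyone a slightly different set of questions."""
        rng = random.Random(seed)
        return [(rng.randint(2, max_factor), rng.randint(2, max_factor))
                for _ in range(num_questions)]

    def score_quiz(questions, answers):
        """Automatically score the submitted answers."""
        return sum(1 for (a, b), ans in zip(questions, answers) if ans == a * b)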

Maybe it’s just because I’m a programmer and something like this would be dead-easy to do (but incredibly costly for an educational institution), but my god. Seriously.

No comments

(Untitled)

April 18th, 2009 | Category: Random

FFFFFFFUUUUUUUUUUUUUUUUUUUUUUUUUUUU

I just discovered a really fucking massive huge flaw in my shitty ORM shit. The ORM layer is supposed to lazily resolve foreign references, right? So take a schema where the post table contains a self-reference (a post_parent column that points back at post). The ORM layer will generate lazy foreign-key accessors for that table, post_parent included.

The own-table foreign key, post_parent, is a killer, because it causes parseSql' to lazily build an infinite structure. On its own that isn’t bad (laziness means it’s never forced), but when you throw that DbRecord at the automatically-generated JSON thingy, it tries chomping through the entire thing and gets tangled up.

There’s no good way to fix this, aside from doing a topological sort of all the inter-table relations and having the ORM layer go “oh hey” when it starts to loop. That kind of functionality could be added to either the parseSql' function or to the toAscListJSON function, which converts the DbRecord into a JSON-ingestible form (and should be replaced by using Data.Data instead…)
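
My code is Haskell, but the loop-detection idea is language-agnostic; here’s a rough sketch in Python of cutting the cycle while flattening a record for JSON (the record shape here, .table/.primary_key/.fields, is made up for illustration):

    def to_json_safe(record, seen=None):
        """Flatten an ORM record to plain dicts, cutting off when a record references
        itself (directly, or through a chain of foreign keys)."""
        seen = set() if seen is None else seen
        key = (record.table, record.primary_key)
        if key in seen:
            return {"ref": list(key)}   # break the cycle instead of recursing forever
        seen.add(key)
        out = {}
        for name, value in record.fields.items():
            # Assumed shape: foreign keys resolve to nested records, everything else is a scalar.
            out[name] = to_json_safe(value, seen) if hasattr(value, "fields") else value
        return out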

Realistically, I should throw this shit out and do it some other way, especially since the code generation bit has become pig disgusting. Fuck! (Eventually I’ll rewrite it using Template Haskell or something).

No comments

(Untitled)

March 30th, 2009 | Category: Random

So, I’ve been working on my Haskell Web Framework shit (oh god). My current project is rewriting the abortion of a codebase, LulzBB, in Haskell. LulzBB is a PhpBB clone written in PHP. It was born after I got so sick and tired of maintaining shittons of modifications to the PhpBB codebase (which is an even bigger failed abortion, if that’s possible).

One of the “FUCK THAT WAS A TERRIBLE IDEA”s in the original version of LulzBB was keeping the same ill-conceived database schema as PhpBB. Instead, I’m going to use one that isn’t absolute shit (and I’m going to use Postgres instead of MySQL, though that’s more of a deployment difference since HDBC is awesome).

Anyway it’s still a shitty web forum and it’s gonna have the same shitty features as every other fucking forum software except it’ll be in Haskell and isn’t that great. Also I’m toying around with git, so if you want to see how shitty my code actually is (spoiler: it’s really shitty) you can visually molest your eyes at github.

7 comments

(Untitled)

March 01st, 2009 | Category: Random

So I came up with a brilliant solution to bandwidth/disk rape — write a small PHP script (Meimei) which caches images from 4scrape (Suigintou), give it out to people, and just randomly direct people to those mirrors with javascript. Validation issues aside (“oh hey let’s just serve lemonparty for every image”) — because those can be “fixed” by only allowing trusted mirrors — I don’t think it’s going to work.

I threw up a test version of the script on my DreamHost account and it completely shit itself. It worked for a while, then it started spewing 503 errors everywhere. I don’t know if this is an issue with DreamHost or my shitty code (or the fact that by putting it up to serve 100% of the requests, it was bombarded with almost 1000 hits in the few minutes that it was up), but it definitely needs a lot more tweaking before it’s usable. It’s one of those “ugh PHP” things and it’s just a bitch.

The version I have running right now is really stupid — it basically just acts as a caching HTTP proxy. To have the thing fully functional, it needs to do a lot more than just that (and the server stuff on Suigintou needs to be a bit more sophisticated too). Each Meimei instance needs to track how much bandwidth it’s used (it already tracks disk consumption); it needs a means to communicate back to Suigintou that it’s all filled up or out of bandwidth or whatever.

There also needs to be something which gives some notion of cache locality — i.e., a Meimei might say “I only want images in the range 30000-40000 that are SFW and from /wg/”. That way it’ll consistently get handed those requests and hopefully hit the cache more often (though eventually, assuming unlimited disk, each Meimei instance would become a full mirror — and unlimited disk isn’t a sound assumption).
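
On the Suigintou side that routing could look roughly like this (the preference fields are just guesses at what a Meimei might advertise):

    import random

    # Each Meimei instance advertises which slice of the archive it wants to serve.
    mirrors = [
        {"url": "http://mirror-a.example/", "id_range": (30000, 40000), "sfw_only": True,  "boards": {"wg"}},
        {"url": "http://mirror-b.example/", "id_range": (0, 30000),     "sfw_only": False, "boards": {"w", "wg"}},
    ]

    def pick_mirror(image_id, board, sfw):
        """Prefer mirrors whose declared locality matches this image, so the same
        requests keep landing on the same caches."""
        candidates = [m for m in mirrors
                      if m["id_range"][0] <= image_id < m["id_range"][1]
                      and board in m["boards"]
                      and (sfw or not m["sfw_only"])]
        return random.choice(candidates)["url"] if candidates else None  # fall back to serving it ourselves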

And then you hit mirror validation. I’d want to do it in a distributed manner with JavaScript, i.e., check the MD5 of a subset of the image data against a known value from the authoritative Suigintou, then report back to Suigintou if the mirror is giving bad data and not display the image. I don’t even know if that’s going to be possible in a realistic manner, or how evadable my checks would be. I think if there aren’t more than 5-7 Meimei instances they’ll either get overloaded with requests, not be clustered enough to achieve decent locality, or just be otherwise ineffective at staving off bandwidth consumption.
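
The check itself is trivial; something like this (Python just to show the idea, since the real thing would run client-side in JavaScript as described):

    import hashlib
    import urllib.request

    def mirror_serves_good_data(mirror_url, known_md5, offset=0, length=65536):
        """Fetch a byte range of the image from the mirror and compare its MD5
        against the value the authoritative Suigintou says it should have."""
        req = urllib.request.Request(
            mirror_url,
            headers={"Range": "bytes=%d-%d" % (offset, offset + length - 1)})
        with urllib.request.urlopen(req) as resp:
            return hashlib.md5(resp.read()).hexdigest() == known_md5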

Would be nice to have it all working and shit but ugh it’s almost too much work. Bleh.

6 comments
