The random rantings of a concerned programmer.

Archive for June, 2011

A centralized system for sharing sensitive content

June 20th, 2011 | Category: Random

I have too many stupid ideas which I’ll never have enough time to implement. Despite that, some of them I’d really like to see implemented because I bloody need them. So please someone steal this and implement it, even though it’s a stupid piece of trash ;_;

Overview

The goal is to combine the easy-to-use native interfaces of DropBox (http://www.dropbox.com/) with the paranoid strong cryptography of Tarsnap (http://www.tarsnap.com/) to create a cloud-based sharable storage system where you can share content with yourself and other people, but not even the server providers can see the content being shared.

Deficits in Existing Systems

DropBox

DropBox is a service for easily storing and sharing content in the cloud — after registering an account, it effectively presents itself as a file share on your local machine (Windows, OSX, Linux, etc). Any changes to the data on the file share are automatically and seamlessly propagated to the central server, and from there to any other clients looking at those files. Effectively, it’s a USB drive that’s stored on the internet.

Their shell integration is critical to their success — a naive user can simply run the software and interact with it in the same manner as a USB thumb drive. Because it exposes itself as a logical volume, applications can interface with it out-of-the-box.

Despite the amazing ease-of-use, DropBox is completely insecure and unsuitable for use in a sensitive environment:

  • It relies on password authentication
  • The server software they use is buggy; numerous critical security holes are constantly found
    • Password reset doesn’t deauth clients, http://forums.dropbox.com/topic.php?id=12645
    • Able to reset any password, http://pastebin.com/yBKwDY6T
  • Data is not encrypted; hosting providers (or anyone who can get access) have all your data

Tarsnap

Tarsnap is an online backup system “for the truly paranoid”. After registering, you provide Tarsnap with a public key to authenticate all data requests. There are two methods of operation: put data and get data. All data is automatically encrypted by the client software with your public key, then signed with your private key, then sent to the server. As soon as the data leaves your system, no one can access it ever again without your private key.

Despite the extreme caution it takes with data security, Tarsnap is completely unusable for the majority of DropBox’s use cases:

  • All core functionality is exposed in command-line tools rather than shell integration
  • Designed around loading large, static files; no support for inter-file metadata (directories, etc)
  • Everything is done with a single key pair; you cannot share data with other users without giving them your private key

Solution Criteria

We need something that combines the ease-of-use of DropBox’s data-sharing characteristics with the data paranoia of Tarsnap. In particular, it should fulfill the following criteria:

  • No data sent over the network or stored on the server should be unencrypted
  • The server should not be able to decrypt any of the data it contains
  • Private keys must never be shared
  • It must be possible for one user to share a single binary file with multiple users without duplicating the binary content
  • The system must present itself to the end-user as if it were a USB drive (e.g., seamless shell integration)

Proposed Solution

Transport-Level Details

Data is represented in an encrypted unit which will be henceforth termed a “blob”. A blob consists of the following data segments:

  1. The binary payload itself, encrypted with a single-use symmetric key, X⁰
  2. A list of Pⁿ, where each Pⁿ is the known public key of a friend the user authorizes to view the data
  3. A list of Xⁿ, where each Xⁿ is a copy of X⁰ encrypted with the corresponding Pⁿ

Each blob is identified by the SHA256 (or equivalent) hash of its contents (henceforth referred to as the blob ID).
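As a rough sketch of the blob layout and its ID, something like the following would do. The class name, the field names and the length-prefixed wire format are all made up for illustration, and it assumes the blob ID is the hash of the whole serialized blob rather than of the payload alone. The encryption itself is out of scope here; the fields are assumed to already hold ciphertext.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Blob:
    """One encrypted unit, mirroring the three segments listed above."""

    payload: bytes             # file content, already encrypted with X0
    recipients: list[bytes]    # P1..Pn: public keys allowed to read the blob
    wrapped_keys: list[bytes]  # X1..Xn: X0 encrypted with each Pn, same order

    def serialize(self) -> bytes:
        # Hypothetical wire format: a recipient count, then every field
        # length-prefixed so field boundaries are unambiguous when hashing.
        out = len(self.recipients).to_bytes(4, "big")
        for part in (self.payload, *self.recipients, *self.wrapped_keys):
            out += len(part).to_bytes(4, "big") + part
        return out

    def blob_id(self) -> str:
        # Assumption: the ID covers the whole serialized blob.
        return hashlib.sha256(self.serialize()).hexdigest()
```

One consequence of hashing the whole blob is that adding a recipient changes the blob ID, so anything that references the blob would have to be updated too.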

Like Tarsnap, the transport provides two operations – putting content on the server, and getting content from the server.

Sending Content

To put content on the server, one blob for each logical file is created, signed with the user’s private key, then uploaded to the server. The server can then verify that the payload was sent by the user and is what the user intended to send. Furthermore, it can see who the user has authorized to view the data (so it can quickly send access denied messages to people who don’t have access to the content).

Receiving Content

Likewise, a client can receive content by sending a request for a specific blob ID. The request is signed with the user’s key for authentication purposes. If the client is authenticated, the server then transmits the blob.

The client then thumbs through the blob and finds the copy of the single-use symmetric key encrypted with their public key. They decrypt it with their private key, then use the decrypted key to decrypt the payload of the blob.
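The client-side lookup could be sketched roughly like this. The function name and parameters are hypothetical, and the actual crypto is passed in as callables (unwrap_key would do the private-key decryption, decrypt_payload the symmetric one), since the post doesn't pick any particular algorithms.

```python
from typing import Callable, Sequence


def open_blob(
    payload: bytes,
    recipients: Sequence[bytes],
    wrapped_keys: Sequence[bytes],
    my_public_key: bytes,
    unwrap_key: Callable[[bytes], bytes],
    decrypt_payload: Callable[[bytes, bytes], bytes],
) -> bytes:
    """Recover the plaintext of a blob addressed to my_public_key."""
    try:
        # Assumption: wrapped_keys[i] is X0 encrypted for recipients[i].
        index = list(recipients).index(my_public_key)
    except ValueError:
        raise PermissionError("blob was not shared with this key") from None
    # Private-key step: recover the single-use symmetric key X0.
    session_key = unwrap_key(wrapped_keys[index])
    # Symmetric step: decrypt the actual content with X0.
    return decrypt_payload(session_key, payload)
```

The server can run the same recipient lookup to reject requests early, but only a key holder can get past the unwrap step.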

Listing/Removing Content

Since the server knows effectively nothing about the content, these are pretty easy use-cases: the client simply sends a signed request to the server. In the former, the server sends a list of blob IDs back to the client (in addition to possible metadata, like file size, for billing purposes). In the latter, the client simply sends a blob ID (or list of IDs) to the server and the server removes them.

Providing a Seamless User Experience

What’s been described thus far is effectively Tarsnap with a form of content sharing built-in. As such, it is only suitable for client consumption, not end-user consumption. In addition to transmitting, storing and receiving binary blobs, the user must be able to append metadata to that blob. Some likely forms of metadata include

  • Symbolic name of the content (e.g., a filename)
  • Hierarchical organization of the content (e.g., file directory structure)
  • Other tidbits normally expected of filesystems to provide (atime/mtime, etc)

Support for metadata is built entirely on top of the existing transport infrastructure: metadata for all files belonging to a user is encoded as a single, separate blob which contains a hierarchy of metadata objects, each of which contains the blob IDs of the data it references.

In addition to the actual metadata, as listed above, each metadata object also contains one of the following:

  1. A blob ID which references the blob containing the content of the file, OR
  2. A set of “child” metadata objects (e.g., this one is a directory) OR
  3. A blob ID which references another metadata blob (e.g., a shared directory)

The “shared directory” is an abstraction on top of the transport-level permission details that serves two purposes: it provides a way to share metadata with other users that goes beyond all-or-nothing, and it provides an intuitive way to do directory-level sharing (e.g., having a “Shared with Alice and Bob” directory, though the client would have to make sure every blob referenced in that tree carried the appropriate encrypted keys).
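A minimal sketch of such a metadata object, with the three mutually exclusive variants, might look like the following. All names here are invented for illustration. The helper walks a tree and lists the blob IDs it references, which is roughly what a client would need before re-keying a shared directory. It does not follow links into other metadata blobs; that would need a fetch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class MetadataObject:
    name: str                                        # e.g. a filename
    mtime: float = 0.0                               # other filesystem tidbits
    content_blob_id: Optional[str] = None            # variant 1: a file
    children: Optional[list[MetadataObject]] = None  # variant 2: a directory
    shared_blob_id: Optional[str] = None             # variant 3: shared directory

    def __post_init__(self) -> None:
        variants = (self.content_blob_id, self.children, self.shared_blob_id)
        if sum(v is not None for v in variants) != 1:
            raise ValueError("exactly one variant must be set")


def referenced_blob_ids(node: MetadataObject) -> Iterator[str]:
    """Yield every blob ID mentioned in this tree, depth-first."""
    if node.content_blob_id is not None:
        yield node.content_blob_id
    elif node.shared_blob_id is not None:
        yield node.shared_blob_id
    else:
        for child in node.children or []:
            yield from referenced_blob_ids(child)
```

For example, a root directory holding one file and one shared directory yields exactly those two blob IDs, in order.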

At this point, we’ve effectively built, from the ground-up, a centralized file-sharing system with no shared secrets.

Good luck making it financially viable ;_;

EDIT: Apparently it already exists. lol.


Why Nagios Sucks

June 07th, 2011 | Category: Random

fml, dependency hell

The first thing you’ll notice when deploying Nagios is that it’s a bloody mess of languages. While I don’t normally mind when some software does this, when I have to install and configure a long list of software packages that all need their own dependencies configured separately, I start to groan.

Back in the day (was it the Nagios 2.x era?) Nagios was simply a cluster of C, some embedded Perl, and a bit of C++ in the event broker subsystem. That was kind of a mess since the CGI scripts (also written in C) had to be served in a special way. But it was one special way.

It seems like the new distribution of Nagios now ships with PHP wrappers around the CGI interfaces. It may not sound like much more, but sweet pineapples, really? At this point you’re running some kind of webserver (nginx) that’s configured to serve static files, serve out requests to the FCGI-wrapped (with yet another piece of software) Nagios CGI scripts, and serve out requests to a PHP daemon for trivial nonsense. For a single bloody monitoring tool you’ve now got 4 heavy daemons running (Nagios, Nagios CGI Scripts, Nginx, PHP). This is too much for me.

clusterfuck configuration DSL

Once you get past that part (and I’m skipping configuration of the Nagios daemon itself — not covered in the object files you need to write), you have to configure all your services and hosts and contacts and stuff. The configuration language isn’t absolutely terrible (but perhaps I’ve just gotten used to it over the years?) but it is exceptionally verbose. Arguably, it supports a lot of functionality, but rarely in a non-enterprise setup are you going to need to specify that a service alert should be escalated to a different group of people on holidays.

The sample configuration files are over 1000 lines long. A non-trivial deployment can expect to need around 2000-10000 lines of their configuration DSL, depending on how much you can consolidate common functionality in templates.

the Nagios plugin API is a shitty kludge

A “Nagios plugin” is basically an external executable on the machine you’re probing. The input/output format is quite simple — your executable takes any command-line arguments you want (you have to configure them in the Nagios configuration files anyway), and returns output via stdout and the POSIX return value. It’s not too shabby, and it’s kind of nice to use whatever language you want to write plugins.
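For illustration, a minimal plugin might look something like this. The TCP reachability check, the script name and the wording of the messages are all made up for the example. The only parts taken from the plugin API are the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the text-then-pipe-then-perfdata line on stdout.

```python
from __future__ import annotations

import socket
import sys
import time

# Exit codes defined by the Nagios plugin API.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host: str, port: int, timeout: float = 5.0) -> tuple[int, str]:
    """Return (exit_code, output_line) for a TCP reachability check."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
    except OSError as exc:
        return CRITICAL, f"TCP CRITICAL - {host}:{port} unreachable ({exc})"
    # Text before the pipe is for humans; what follows it is perfdata.
    return OK, f"TCP OK - {host}:{port} answered | time={elapsed:.3f}s"


def main(argv: list[str]) -> int:
    if len(argv) != 3 or not argv[2].isdigit():
        print("TCP UNKNOWN - usage: check_tcp.py HOST PORT")
        return UNKNOWN
    code, line = check_tcp(argv[1], int(argv[2]))
    print(line)
    return code

# When installed as a script, finish with: sys.exit(main(sys.argv))
```

Nagios reads the state from the exit code; the text is only for display and graphing.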

The kludge, however, is that they’re returning three distinct strings via stdout. I’ll let the documentation explain:

At a minimum, plugins should return at least one line of text output. Beginning with Nagios 3, plugins can optionally return multiple lines of output. Plugins may also return optional performance data that can be processed by external applications. The basic format for plugin output is shown below:

TEXT OUTPUT | OPTIONAL PERFDATA
LONG TEXT LINE 1
LONG TEXT LINE 2
...
LONG TEXT LINE N  | PERFDATA LINE 2
PERFDATA LINE 3
...
PERFDATA LINE N
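To make the three strings concrete, here is a rough sketch of a parser for that layout. The function name is made up, and it is a simplified reading of the format (everything after the first pipe on a later line is treated as perfdata), not what Nagios itself does internally.

```python
from __future__ import annotations


def parse_plugin_output(output: str) -> tuple[str, list[str], list[str]]:
    """Split plugin stdout into (text, long_text_lines, perfdata_lines)."""
    lines = output.splitlines()
    if not lines:
        return "", [], []

    # First line: TEXT OUTPUT | OPTIONAL PERFDATA
    text, sep, perf = lines[0].partition("|")
    perfdata = [perf.strip()] if sep and perf.strip() else []

    long_text: list[str] = []
    in_perfdata = False
    for line in lines[1:]:
        if in_perfdata:
            # After the second pipe, every remaining line is perfdata.
            perfdata.append(line.strip())
            continue
        before, sep, after = line.partition("|")
        long_text.append(before.rstrip())
        if sep:
            in_perfdata = True
            if after.strip():
                perfdata.append(after.strip())
    return text.strip(), long_text, perfdata
```

Note that the result is still just strings; any structure inside the text or perfdata is up to each plugin.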

Getting your output data into that format isn’t too bad, but…

oh, you wanted to USE that data?

Once you’ve got your pretty install working with a bunch of blinking green lights, you might get the bright idea to use some of the data to make graphs and such. There are plenty of “scripts” to do this already, but they’re all clunky as shit. Why?

Nagios is designed to throw data away.

It doesn’t store data persistently. The only state it stores is the current state of all services (and even that gets thrown away when you kill the Nagios daemon).

To ameliorate this, Nagios has a component called the event broker. I haven’t mucked with it in a couple of years, but IIRC it’s a tacked-on piece written in C++ (whereas the rest of Nagios is in C). The idea is that it provides a hook for an external service to listen and collect information and package it off somewhere. So you’ve got tools like rrdtool (and I remember using one that dumped to a PostgreSQL database too, but can’t find it) that grab and warehouse the log data.

The problem is that you’ve got all this log data which is just a bunch of unstructured strings. To add insult to injury, when I was hacking on it a couple years ago, the event broker didn’t bother broking the long text line (I think it still did the perfdata though). Hacking that fix in was a nightmare, mostly because the event broker code is a clusterfuck of indirection. It necessarily has to be, because it’s loading in arbitrary shared object files which contain the plugin code.

sound like a piece of shit yet?

No? Here’s my current use-case: I have some hosts connected via low-reliability SSH tunnels, and want a graph of how often they’re online. To use Nagios, I need to

  • Configure 5 daemons (Nginx, PHP, Nagios, Nagios-CGI, an event broker plugin), and probably an RDBMS for the plugin on top of that.
  • Write a custom check plugin that serializes and deserializes input through an insane format.
  • Write a web interface which gets the data out of the RDBMS and generates a fancy graph.

…at some point, it becomes more work than just hacking a brand new piece of software.

Arguably, my use-case is a pretty atypical one for Nagios. Maybe I should be writing a custom piece of software.

But fuck it I’m writing one anyway so I might as well bitch about stuff.


Searching for n180211? It’s actually nl80211.

June 01st, 2011 | Category: Random

Since there were no relevant results when I googled this a month ago, the appropriate driver for the Linux hostapd for cards supported by the shiny new nl80211 driver is actually “nl80211”. With an L. So if you’re getting something like

Configuration file: /etc/hostapd/hostapd.conf
Line 3: invalid/unknown driver 'n180211'

That’s why. Mind your 1’s and L’s.

For additional Linux lolz,

I should really switch to a distro that isn’t absolute shite when I have to choose a non-FreeBSD for deployment (due to hardware support constraints).
