lol, this is a short report I wrote for my E-Commerce class. Figured I’d post it since I have nothing better to rant about today. Kind of sucks that ALL MY GOOD MACHINES ARE DOWN and won’t be back up until the end of break, because it would have been kinda fun to be able to fuck around some more with lighttpd and Squid and stuff. oh well.
When a potential customer comes into a datacenter, the first two questions he should ask are, “What kind of uptimes do you guarantee?” and “How much can I push through the pipes?” Availability and extensibility are possibly the two most important factors on the infrastructure side of E-Commerce – if your server goes down, whether the cause was a hardware failure or too many simultaneous requests, your E-Business is shot until the problem is repaired. Every second that your business is down is money lost. The goal of this short report is to broadly discuss and evaluate several technologies for guaranteeing redundancy and ensuring expandability, so that your E-Business stays on solid ground.
For me, this all started when looking for a lightweight web server to tinker with. I had toyed with Apache in the past, and while Apache provides possibly every feature you can dream up, it just seemed too heavyweight for the simple FastCGI chores I was using it for.
While browsing the www-ports listing on one of my FreeBSD boxes, I noticed an entry called lighttpd, and checked the package description:
“lighttpd a secure, fast, compliant and very flexible web-server which has been optimized for high-performance environments. It has a very low memory footprint compared to other webservers and takes care of cpu-load. Its advanced feature-set (FastCGI, CGI, Auth, Output-Compression, URL-Rewriting and many more) make lighttpd the perfect webserver-software for every server that is suffering load problems.”
Browsing through the lighttpd documentation, it seems there are quite a few neat features available. The first, and probably the most interesting for me, is the ability to specify a list of remote FastCGI-running servers to offload requests to. The node running lighttpd, in this case, simply acts as a pipe connecting the user to the nodes which are running a dynamic scripting language within a FastCGI module.
The idea here is that, when running dynamic scripts, the first bottleneck often encountered is the CPU. Since most scripting languages don’t cache by default, they end up crunching a lot of the same data for each request, and this can quickly eat up CPU cycles. There are lots of ways to cache both the compiled script (when using a language which compiles to bytecode before being passed to a VM) and the script output, but in many situations this may not be ideal.
Because all the heavy CPU computations are offloaded to independent FastCGI modules running on several machines, expanding such a setup is fairly trivial – just bring up more machines and add them to lighttpd’s list of FastCGI nodes. This is effective until either the network bottlenecks (which can be solved by upgrading the hardware, or through some channel bonding tricks), or shared resources like databases and disks become overtasked (which Google solves with its own BigTable distributed database and the Google File System).
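To make that concrete, here’s a minimal lighttpd.conf sketch of what such a setup might look like – the hostnames, ports, and the choice of PHP are all made up for illustration. lighttpd balances requests across whatever backends are listed, so scaling out is literally just another line:

server.modules += ( "mod_fastcgi" )

# Hypothetical backend nodes running PHP under FastCGI.
# To expand capacity, bring up another box and append another
# ( "host" => ..., "port" => ... ) entry here.
fastcgi.server = ( ".php" =>
  ( ( "host" => "10.0.0.11", "port" => 1026 ),
    ( "host" => "10.0.0.12", "port" => 1026 ) )
)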
On lighttpd’s homepage, there’s an impressive list of prominent sites which claim to use lighttpd, among which are Wikipedia and meebo. Not wanting to take this at face value, I decided to check out some of the headers myself. Originally I was using Wireshark, a network protocol analyzer, to get the headers, but Wireshark gets confused when the response is broken up over multiple packets. Instead, I’m using wget with the -S switch to grab the server response headers and dump the rest of the file.
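For reference, the invocation is just something like this (point it at whatever URL you want to poke at):

# -S prints the server's response headers; -O /dev/null discards the body
wget -S -O /dev/null http://en.wikipedia.org/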
So, the first page I tried was Wikipedia; the headers returned were:
HTTP/1.0 200 OK
Date: Wed, 19 Dec 2007 22:23:28 GMT
Server: Apache
X-Powered-By: PHP/5.1.2
Content-Language: en
Vary: Accept-Encoding,Cookie
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Last-Modified: Wed, 19 Dec 2007 22:17:35 GMT
Content-Length: 51279
Content-Type: text/html; charset=utf-8
X-Cache: HIT from sq27.wikimedia.org
X-Cache-Lookup: HIT from sq27.wikimedia.org:3128
Age: 7
X-Cache: HIT from sq30.wikimedia.org
X-Cache-Lookup: HIT from sq30.wikimedia.org:80
Via: 1.0 sq27.wikimedia.org:3128 (squid/2.6.STABLE16), 1.0 sq30.wikimedia.org:80 (squid/2.6.STABLE16)
Connection: keep-alive
The first thing I noticed was, hey, they’re serving the page with Apache, not lighttpd! Looking further down though, you can see from the Via header that they’re forwarding the returned page through Squid, a caching proxy server. This in itself adds a layer of redundancy – there are several tiers of servers all running the same scripts. Should one of the Apache servers fail, Squid will simply request the page from a different working one. And, should one of the Squid servers fail, there’s at least a two-level hierarchy of them to take the rest of the load.
In this case, we’re actually hitting the page caches of the Squid layer, so our request probably didn’t even go down to Apache. We probably got served the same copy of a page someone else requested a while back.
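Just to sketch how that front tier might be wired up – this is my guess at a minimal config, not Wikimedia’s actual one, and the IPs are invented – a Squid 2.6 reverse-proxy setup looks roughly like:

# squid.conf fragment: act as an accelerator on port 80 and
# round-robin requests across two Apache origin servers.
# Cache hits get answered here and never touch Apache at all.
http_port 80 accel vhost
cache_peer 10.0.0.21 parent 80 0 no-query originserver round-robin
cache_peer 10.0.0.22 parent 80 0 no-query originserver round-robin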
In the end though, there isn’t any lighttpd in this transaction! So I decided to take a closer look at their claims; checking out their PoweredBy page, next to the entry about Wikipedia it states that lighttpd is used for upload.wikimedia.org. Whoo, false advertising much?
Just to verify that, here are a couple of the headers returned by wget for upload.wikimedia.org’s index page:
HTTP/1.0 200 OK
Server: lighttpd/1.4.18
X-Cache: HIT from sq10.wikimedia.org
X-Cache-Lookup: HIT from sq10.wikimedia.org:3128
X-Cache: MISS from sq46.wikimedia.org
X-Cache-Lookup: MISS from sq46.wikimedia.org:80
Via: 1.0 sq10.wikimedia.org:3128 (squid/2.6.STABLE16), 1.0 sq46.wikimedia.org:80 (squid/2.6.STABLE16)
So they are using it for something, just not for the main page loads. upload.wikimedia.org essentially serves all the static images for Wikipedia. Looking at the list of sites powered by lighttpd, it seems almost all of them use the server exclusively to serve static data, and rely on an Apache+Squid combination for load distribution.
There are many other web servers written in a variety of languages; the only other one I’d consider looking at is Yaws, a web server written entirely in Erlang. Erlang is a functional language developed by Ericsson with an emphasis on high availability. Most of Erlang’s feature set is geared toward massively concurrent network applications. Internally, Erlang uses a lightweight process system to manage thousands of processes with ease.
The benefit of Yaws over Apache lies in these lightweight processes – because Apache relies on the OS for threading support, its threads are inherently very heavyweight. In both applications, each incoming request is serviced within its own thread (or process). In situations where there are many incoming requests (such as a DDoS attack), Apache will consume system resources much faster than Yaws. One published comparison showed Yaws handling more than twenty times the number of parallel sessions Apache could.
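To get a feel for how cheap Erlang processes are, here’s a tiny self-contained sketch – the module name and the count of 10,000 are my own choices – that spawns ten thousand processes and waits for each one to report back. Try doing that with ten thousand OS threads:

-module(swarm).
-export([run/0]).

%% Spawn 10,000 lightweight processes; each one just sends a
%% message back to its parent and exits.
run() ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {done, self()} end)
            || _ <- lists:seq(1, 10000)],
    %% Wait for every process to check in before returning.
    [receive {done, Pid} -> ok end || Pid <- Pids],
    io:format("~p processes spawned and reaped~n", [length(Pids)]).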
High availability and maintainable performance under load, in addition to easy expansion, are key for a fledgling E-Commerce startup. If the underlying infrastructure is not there, the entire operation is doomed. And while Apache may currently be the reigning champion of web servers, there are a variety of other solutions which are just as extensible, though much less widely used.