The random rantings of a concerned programmer.


Taro bitches about Drupal to his boss’s boss

March 11th, 2010 | Category: Random

Recently one of my superiors sent me an email pointing out Selenium, a web application testing framework, and asking if I thought it would be beneficial to deploy for our software. Because I’m a stupid bum I accidentally a whole essay about why it would be a pain in the ass to run and wouldn’t have much benefit. And then I rebutted parts of my own essay, presumably because I really didn’t want to do any real work (I am a bad employee).

Anyway someone asked for a rant, so here’s a rant.

On Tue, Mar 2, 2010 at 11:27 AM, Taro wrote:
> The grief I have with TDD/BDD on our site is that most of the
> functionality that is going to break (that we’d want to test) involves
> manipulating content (making posts, changing nodes, etc). This means
> we’d have to write our test suite to either poke around on our
> production site (yuck) or configure it as a pre-commit hook on the
> staging site (which won’t catch intermittent failures).

The counter-argument to this might be “why would your staging site
ever be de-synced from production? You should always make changes on
staging and merge them into production after testing.”

Modern CMSes, due to their database-centric nature (which is
somewhat necessitated by their GUI-does-everything philosophy), make
merging changes from staging into production effectively impossible.

Consider this real case: we want to create a new content type (a
“Resource” content type for TBET), have the TBET team populate the
data, etc, then dump it into production when it’s ready. All of these
changes are done via the Drupal admin interface. Creating a new
content type entails creating (potentially) several new tables in the
database. Adding the data for the resource node entails inserting rows
into not only the new tables, but the master node tables.

Because of how Drupal structures its database (and, to a lesser
degree, how RDBMSes assign primary key values in general), the next
row inserted into the master node table on each copy of the site will
get the same ID. Since changes are made to both production and staging
asymmetrically, you end up with rows in the main node tables that
share an ID but hold different data.
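To make the collision concrete, here’s a toy sketch of two auto-incrementing node tables drifting apart. This is an illustration, not real Drupal schema — the class and column names are made up:

```python
# Toy stand-in for Drupal's master node table: an auto-increment counter
# plus a nid -> title mapping. Purely illustrative.
class NodeTable:
    def __init__(self, rows=None):
        self.rows = dict(rows or {})                  # nid -> title
        self.next_nid = max(self.rows, default=0) + 1

    def insert(self, title):
        nid = self.next_nid
        self.rows[nid] = title
        self.next_nid += 1
        return nid

# Staging is cloned from production, so both start with the same next ID.
production = NodeTable({1: "About us"})
staging = NodeTable(production.rows)

# Editors keep posting to production while devs add content on staging...
prod_nid = production.insert("Press release")   # gets nid 2
stage_nid = staging.insert("TBET resource")     # also gets nid 2

# Same nid, different data -- a naive merge would clobber one of them.
assert prod_nid == stage_nid == 2
assert production.rows[2] != staging.rows[2]
```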

This introduces several problems:

1) How do you reproduce the ‘CREATE TABLE’ generated by CCK on the
production machine?

This is pretty trivial: you can either export just the tables, or go
through the same series of clicks on production that you did on
staging. Doing it manually introduces room for error, however. Since
creating CCK node types is, for the most part, a fairly safe
operation, I tend to create them on production and then export them
back to staging (that’s safer than porting a subset of tables from
staging -> production).

2) How do you re-sync the staging and production databases after
making changes on staging (ie, how do you pull updates from production
-> staging during the development process without overwriting
changes)?

3) How do you merge changes back from staging -> production once
everything’s done?

This is the primary key problem: there will be different nodes on
production and staging with the same ID. The data for these nodes is
spread across at least 5-7 tables if they have CCK fields, so you’d
have to obtain a new ID for each node/revision pair, then update all
the referring IDs in the old data before importing it into the new
data.
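What that re-keying would look like, roughly: assign every staging-only node a fresh ID past production’s highest, then patch every table that refers to the old ID. The table and column names below are illustrative, not the actual Drupal 6 schema:

```python
# Hedged sketch of re-keying staging nodes before importing them into
# production. "referring_tables" stands in for node_revisions, the CCK
# content tables, etc. -- names are made up for illustration.
def remap_staging_nodes(prod_max_nid, staging_nodes, referring_tables):
    """staging_nodes:    {old_nid: node_data}
    referring_tables: {table_name: [rows as dicts with a 'nid' key]}
    Returns the re-keyed nodes and the patched referring tables."""
    mapping = {}
    next_nid = prod_max_nid + 1
    for old_nid in sorted(staging_nodes):
        mapping[old_nid] = next_nid
        next_nid += 1

    remapped = {mapping[nid]: data for nid, data in staging_nodes.items()}
    for rows in referring_tables.values():
        for row in rows:
            row["nid"] = mapping[row["nid"]]   # fix every referring ID
    return remapped, referring_tables

nodes, tables = remap_staging_nodes(
    prod_max_nid=40,
    staging_nodes={2: "TBET resource", 3: "Another resource"},
    referring_tables={
        "node_revisions": [{"nid": 2}, {"nid": 3}],
        "content_type_resource": [{"nid": 2}],
    },
)
assert sorted(nodes) == [41, 42]
assert tables["content_type_resource"][0]["nid"] == 41
```

And that’s the easy case — it assumes you even know every table that holds a node reference, which with contrib modules you generally don’t.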

The bigger problem is when you combine 2 & 3: if you merge
production into staging, you can’t change the production IDs (because
when you later merge staging back into production, you don’t want to
duplicate everything). You basically have to do what `git rebase`
does: somehow automagically save all the changes you’ve made on
staging since the last sync, sync the databases, then re-apply all
the changes you’ve made to staging (making sure to fix conflicting IDs
and everything that refers to them).
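Boiled down to toy form, the rebase-style workflow would be something like this — record what staging added since the last sync, resync from production, then replay the staged work with fresh, conflict-free IDs (everything here is a made-up simplification; real Drupal tables are far messier):

```python
# Hypothetical "rebase" for a node table: resync staging from production,
# then replay the nodes created on staging with fresh IDs.
def rebase_staging(prod_nodes, staged_changes):
    """prod_nodes: {nid: data} snapshot of production.
    staged_changes: node payloads created on staging since the last sync."""
    merged = dict(prod_nodes)                # step 1: resync from production
    next_nid = max(merged, default=0) + 1
    for payload in staged_changes:           # step 2: replay staged work
        merged[next_nid] = payload           # conflict-free fresh nid
        next_nid += 1
    return merged

prod = {1: "About us", 2: "Press release"}
merged = rebase_staging(prod, ["TBET resource"])
assert merged == {1: "About us", 2: "Press release", 3: "TBET resource"}
```

Even this trivial version sidesteps the hard part: replaying *edits* to existing nodes, not just inserts, and doing it across every referring table at once.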

This, simply put, isn’t feasible.

Anyway, tl;dr: Drupal (and I’d venture to say most CMSes in general)
isn’t designed to have separate staging/production environments, so
you can’t reliably “test” anything that relies on database
manipulation without taking excessive measures (or testing it directly
on production).

If someone has ideas on how to merge between Drupal production/development databases in a safe way (both data and CCK content types — last I checked content_copy was broken in 6.x), please let me know ;_;

UP NEXT: Why WordPress’s development cycle gives me a raging erection.
