A trip down memory lane

Quite some years ago, when I was working on the Reusable Application Platform for Internet Devices (RAPID) at St. Bernard Software, I encountered a conundrum that has bugged me ever since. This was the second time I'd used PostgreSQL to store configuration information for an internet appliance, and I just knew there had to be an elegant way to monitor for changes in a database. We didn't find the solution; we were under a lot of pressure to get this new version of RAPID done so I could start working on the actual application. But because it was the second time I'd punted on this question, it really bugged me.

Guess what? I got to solve it this week. I'd done something like this in Oracle; the feature was present all the way back in Oracle 6 in some form, which I was using in 1989-1990. Yeah, the dark ages of RDBMS, for sure.

What we want is to be able to store configuration information for 'the system', or for 'a subsystem', in the database, and to have 'the system' or 'a subsystem' automagically reconfigure itself when the configuration database is updated.

The solution in modern PostgreSQL turns out to be relatively simple, and fairly well documented. I found all of the information needed to complete this task in a few other blog posts, and I'll link the most helpful one at the bottom. For purposes of this post, I'll keep the database really simple. Let's assume the device we're configuring is a modem, and what we are storing is the modem's S registers. For those of you too young to know what a modem is, let alone an S register, don't worry. The key is to know that the database holds name/value pairs that can be updated or inserted, and that the modem driver software needs to know when these values change, so they can be stuffed into the modem.

So the first trick is, how do we know when something has changed? In PostgreSQL, as with all competent RDBMS, the answer is a TRIGGER. A trigger tells the database to call a function that you supply whenever a table gets changed. So obviously we're going to need a function too.

When it comes to functions, PostgreSQL provides an embarrassment of riches, in terms of languages you can use. First, you can write functions in C and load them into the database engine, if you really like pain. To make things less painful, they provide their own database-oriented language, plpgsql, which is patterned after Oracle's plsql. Then they provide a plugin methodology that allows programmers to plug their own interpreted languages into the database system, so you could literally stuff any interpreter in there. Then, based on this lovely interface, they provide pltcl, plperl, and plpython. Others have provided yet more plugins. Based on my friend Jason Crawford's love of Lua, I just checked and yes indeed there is a pllua plugin as well.
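
Just to give a flavor of what that looks like, here's a trivial, purely illustrative plperl function (the name is made up); depending on your PostgreSQL version you'd enable the language first with CREATE LANGUAGE plperl or CREATE EXTENSION plperl:

CREATE FUNCTION hello_from_perl() RETURNS text AS
$$
    # Plain Perl inside the database: return a string to the caller
    return "hello from plperl";
$$
LANGUAGE plperl;

SELECT hello_from_perl();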

OK, so we have a trigger mechanism and a way to write a function. So what? The things you can accomplish in a trigger function are somewhat limited. You can't, for instance, open a file in a function. (OK, you can, but only with an untrusted function language, and who wants something untrusted in their database engine?) This is the piece that had eluded me last time. I flopped around for a couple of hours, and then got smart and asked on the PostgreSQL irc channel, which is where the really smart PostgreSQL guys and gals hang out. Sure enough, within moments of asking a relatively dumb question, someone said 'You want to look at LISTEN/NOTIFY.'

I did a quickie search for 'postgresql listen notify' and started smiling. Then I thanked my irc-angel and got to work.

The NOTIFY command allows you to basically shout a word into a notification channel. Any connected clients that have previously LISTENed for that word will get an asynchronous notification from the database engine when you NOTIFY it. The real key is that you can NOTIFY in a trigger function.
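
Here's the whole mechanism in its simplest form, in a bare psql session (the channel name is just the one we'll use later, and the process ID is whatever your backend happens to be):

config=# LISTEN modem;
LISTEN
config=# NOTIFY modem;
NOTIFY
Asynchronous notification "modem" received from server process with PID 4242.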

Great, so you can only say the equivalent of a single word. How do we know what was changed? Well, it is a database, and we are writing a trigger function after all. We can queue the changes, using another table!

So, here is the implementation. First we'll start with the configuration table, which is trivial:

CREATE TABLE modemConfig (
    register VARCHAR(4) NOT NULL,
    setting  VARCHAR(3) NOT NULL
);

S-Registers are only one of the kinds of settings modems have. They are named S0 through S255, and the settings are 0-255. Since they're transmitted to the modem as ASCII strings, we'll just store them in the database as strings, and not worry about somebody getting stupid and trying to do math on them.
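
Just so the later examples have something to chew on, a couple of seed rows might look like this (the values here are made up for illustration, not real modem defaults; S26 is the register we'll poke at later):

INSERT INTO modemConfig (register, setting) VALUES ('S0', '0');
INSERT INTO modemConfig (register, setting) VALUES ('S26', '1');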

Next, we need our queue of changes. For reasons that will become apparent later, we want to tag each of the changes with a unique id. The PostgreSQL way to do this is to assign the value from a sequence, so we'll create that first:

CREATE SEQUENCE modemChangeId;

We'll let PostgreSQL provide the default values for our sequence, because they're all fine. Now we can create our queue:

CREATE TABLE modemChanges (
    id       INTEGER    NOT NULL DEFAULT nextval('modemChangeId'),
    register VARCHAR(4) NOT NULL,
    setting  VARCHAR(3) NOT NULL
);

Looks quite a lot like the modemConfig table, except for that id field, right? There's a good reason for that. Now we can work on our trigger function. Remember, we want the trigger to do two things: first, we want it to queue the change, and second, we want it to notify waiting client(s).

CREATE OR REPLACE FUNCTION modemConfigChange() RETURNS TRIGGER AS 
$$
    BEGIN
        INSERT INTO modemChanges (register, setting) VALUES (NEW.register, NEW.setting);
        NOTIFY modem;
        RETURN NULL;
    END;
$$ 
LANGUAGE plpgsql;

So this is what a function in plpgsql looks like. The stuff between the $$ markers is the body of the function. In plpgsql, for UPDATE or INSERT operations, the value of the NEW record (after the update or insert) is passed to the function; this is where NEW.register and NEW.setting come from. The rest of the function should be mostly self-explanatory. The return value of an AFTER trigger is ignored, so RETURN NULL is just the conventional way to satisfy the rule that a trigger function must return something. So now we have to tie this to the modemConfig table:

CREATE TRIGGER modemChange
AFTER INSERT OR UPDATE ON modemConfig
FOR EACH ROW EXECUTE PROCEDURE modemConfigChange();

This tells the database engine to call our function once for each row that gets inserted or updated, AFTER the change has been made. That way we pick up each record, one at a time, even if somebody does a giant transaction and, for instance, plops all 256 S Registers into the database in one gulp.

As we insert each record into the change queue, a unique id will be assigned to the change. We'll use that on the receiving side. That's it for the database work. Now we'll hop out to the system and write a little Perl program to watch for changes and just write them to the terminal. We're looking for a demonstration here, right? Our Perl program needs to be able to connect to the database and wait for asynchronous events, so it's going to need the DBI and IO::Select modules. Here's the script:

#! /usr/bin/perl -w

use strict;
use warnings;
use DBI;
use IO::Select;

my $dbcon = "dbi:Pg:dbname=config;host=127.0.0.1";
my $dbuser = "wpeters";
my $dbpass = "nopassword";
my $dbattr = {RaiseError => 1, AutoCommit => 1};

my $dbh = DBI->connect($dbcon, $dbuser, $dbpass, $dbattr);

my $select_req = $dbh->prepare("SELECT id, register, setting FROM modemChanges");
my $delete_req = $dbh->prepare("DELETE FROM modemChanges WHERE id = ?");

$dbh->do("LISTEN modem");

my $fd = $dbh->func("getfd");
my $selector = IO::Select->new($fd);

while (1) {
    print "waiting...\n";
    # Block until the database connection's socket is readable
    $selector->can_read;
    # Fetch the pending notification, if there is one
    my $notify = $dbh->func("pg_notifies");
    if ($notify) {
        # Drain the change queue
        $select_req->execute();
        while (my $hr = $select_req->fetchrow_hashref()) {
            my ($id, $reg, $val) = ($hr->{id}, $hr->{register}, $hr->{setting});
            print "Update $reg to $val -- $id\n";
            # Remove the change we just handled from the queue
            $delete_req->execute($id);
        }
    }
}

The first few lines of code set up the connection to the database. The two prepare calls compile, in the database engine, the two queries we're going to execute over and over. The LISTEN command tells the database engine to send us an asynchronous notification whenever somebody NOTIFYs the word 'modem'. The next two lines are Perl and IO::Select speak for setting up an I/O selector that waits on the connection to the database server.

Inside our while loop, we wait until the IO selector says it "can_read." When it can, we call "pg_notifies" to collect the pending notification. Note that this simple script never looks at which word was notified; if you had listened for more than one word, you'd want to inspect what "pg_notifies" hands back. Once we've been notified, we call our prepared SELECT query to read the changes from the queue, one by one, print them to the output (aka do something useful with them), and then call our prepared DELETE query to delete the specific change we just processed. This is why we added the unique id to each of the changes in the queue: so we could delete each change as we process it.

So that's it for this simple example. Here's what it looks like running:

$ ./modemConfig.pl
waiting...

In the database connection:

config=# update modemConfig set setting='10' where register='S26';
UPDATE 1

Back at the client:

Update S26 to 10 -- 21
waiting...

The trigger function is called once per row, because that is what the trigger specified, but as we'll see, the notification behaves a little differently. On the database server:

config=# BEGIN TRANSACTION READ WRITE;
BEGIN
config=# INSERT INTO modemConfig VALUES ('S40', '40');
INSERT 0 1
config=# INSERT INTO modemConfig VALUES ('S41', '41');
INSERT 0 1
config=# INSERT INTO modemConfig VALUES ('S42', '42');
INSERT 0 1
config=# INSERT INTO modemConfig VALUES ('S43', '43');
INSERT 0 1
config=# INSERT INTO modemConfig VALUES ('S44', '44');
INSERT 0 1
config=# INSERT INTO modemConfig VALUES ('S45', '45');
INSERT 0 1
config=# INSERT INTO modemConfig VALUES ('S46', '46');
INSERT 0 1
config=# INSERT INTO modemConfig VALUES ('S47', '47');
INSERT 0 1
config=# INSERT INTO modemConfig VALUES ('S48', '48');
INSERT 0 1
config=# INSERT INTO modemConfig VALUES ('S49', '49');
INSERT 0 1
config=# COMMIT TRANSACTION;
COMMIT

The database engine delays all NOTIFY calls until the end of the transaction, and is smart enough to coalesce all 10 calls into a single notification. The client sees:

Update S40 to 40 -- 22
Update S41 to 41 -- 23
Update S42 to 42 -- 24
Update S43 to 43 -- 25
Update S44 to 44 -- 26
Update S45 to 45 -- 27
Update S46 to 46 -- 28
Update S47 to 47 -- 29
Update S48 to 48 -- 30
Update S49 to 49 -- 31
waiting...
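
As an aside, PostgreSQL 9.0 and later let a notification carry a short payload string, which a trigger can send via the pg_notify() function. This is a purely illustrative variant, not what I'm running; note that notifications with distinct payloads are not coalesced, and a payload is simply lost if no client is listening at the time, which is exactly why the change queue table is still worth having:

CREATE OR REPLACE FUNCTION modemConfigChangePayload() RETURNS TRIGGER AS
$$
    BEGIN
        -- Send the changed register and value along with the notification itself
        PERFORM pg_notify('modem', NEW.register || '=' || NEW.setting);
        RETURN NULL;
    END;
$$
LANGUAGE plpgsql;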

Needless to say, I'm quite pleased with this solution. It uses only published, working parts of the PostgreSQL engine: no messy custom-developed add-ons, no customization at all. The next step is to do some testing to see just how much overhead this imposes, but that has to be measured against the overhead of accomplishing the same thing by other means, not just against the database writes without the triggers.

I'm searching for a similar mechanism in MySQL, and there I think I'm going to have to go the route of a trigger written in C, publishing to something like a pub/sub queue. A lot more work, and work that has to run inside the database server itself. PostgreSQL once again proves that it is the most advanced open source database engine out there.

Comments

  1. I forgot to link to the helpful blog I found. Here it is: http://www.bzerk.org/documents/postgresperl/
