PGSQL Phriday #007: The Art of the Trigger

It's triggers this time! I've said it before and I'll say it again: if you need to compute, do it as close to your data as you can get away with. But programmed databases, and especially programmed databases that use triggers to encode automatic behaviors and responses, are infamously hard to understand, and the more programmed the more difficult. Why is this, and what can we do about it?

Trigger utility is limited first by the limits of database procedural languages. The other PLs, like Python or JavaScript, can't touch anything PL/pgSQL can't (it bears mentioning here: there's more than OLD and NEW! TG_OP, TG_TABLE_NAME, and TG_ARGV in particular); they're useful because they can express complex and specific manipulations in algorithmic rather than relational-calculus terms. Higher-level abstractions aren't available to database functions at all unless they're built for the purpose, in those same procedural languages, which is when I start feeling compelled to apologize to code reviewers in advance.
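
To make those trigger variables concrete: a single PL/pgSQL function can back triggers on several tables by reading the TG_* values. The log_change function and widgets table below are hypothetical, a minimal sketch rather than anything prescriptive:

create or replace function log_change() returns trigger
  language plpgsql as $$
begin
  -- TG_OP, TG_TABLE_NAME, and TG_ARGV identify what fired and how the trigger was configured
  raise notice '% on %, trigger args %', tg_op, tg_table_name, tg_argv;
  -- OLD only exists for UPDATE and DELETE; NEW only for INSERT and UPDATE
  if tg_op = 'DELETE' then
    return old;
  end if;
  return new;
end$$;

create trigger log_widget_changes
  after insert or update or delete on widgets
  for each row execute function log_change('any', 'arguments', 'land in TG_ARGV');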

The real limits, though, aren't purely technological. All things are possible with a Turing complete language and sufficient patience. But let's say we're adequately funded with all the time in the world, have a trusted and capable DBA at the helm, and they've judged that encoding the processes under consideration into the database will save our organization money and simplify our infrastructure. Someone in the room is going to be nervous, and it's not infrequently the DBA: why?

Any successful automation, mechanical or virtual, changes the structure and politics (but I repeat myself) of an organization, absorbing money, risks, responsibilities, jobs, entire professions, and reorganizing them into new, more efficient or more specialized forms; these projects only fail insofar as they do not take over operational territory. That's reason enough for nerves right there. Database automation in particular, though, is notably arcane and access to it is strictly controlled for very good reasons.

Other virtual automations are invisible compared to the mechanical sort, but they at least tend to have names: the such-and-such datafeed ETL, the new-member flow, the delivery queue. In a healthy organization, those names are backed up by teams or at least by relatively well-defined responsibilities. They have a recognizable surface area which can be examined or interacted with. People know when a given ETL job has crashed, they can often see exactly why (whether or not they can use that information), and they usually know whom to call.

The names of database-internal programs, by contrast, are invisible to the uninitiated. Experts can locate and analyze them, but from outside they inhabit The Database, an undifferentiated and undifferentiable space bordering every other territory on the organization's operational map. Responsibility for database programs is often more diffuse but is also harder to identify in the first place. Effects are visible, their causes are not. After The Database takes over a new operational area, both those previously responsible and others across the organization can no longer see what's going on. If any other department worked this way it'd be a sign of major dysfunction, but again: very good reasons.

And triggers are the acme of database programming. When the new-member flow becomes an after insert trigger and a series of database functions, this is in a very real sense the database encroaching on other operational demesnes. For the good of all, naturally: if much of the initial processing of new members can be made to happen in the database, with perhaps the necessary external data sources connected through foreign data wrappers, everyone's happier! Signups are much faster for members. The team currently responsible for setting the latest introductory rate every so often can devolve that to the database team, or even help design a self-service rate lever for the business people, and move on permanently. Ops can even take a node or two off the infrastructure-that-needs-watching graph.
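
What that looks like in miniature, with hypothetical members and member_onboarding tables standing in for the real signup machinery:

create or replace function process_new_member() returns trigger
  language plpgsql as $$
begin
  -- rate lookups, welcome emails, and the rest would hang off this row
  -- or off further triggers downstream
  insert into member_onboarding (member_id, status)
  values (new.id, 'pending');
  return new;
end$$;

create trigger member_signup
  after insert on members
  for each row execute function process_new_member();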

But it also makes the signup process more opaque to everyone else. Downstream dependents are less able to reason about what is happening or has happened, and while the subsumption of the process into the database hopefully gives those dependents less cause to wonder than they used to have, it can't eliminate that need completely. "What happens during signup" is less knowable, less memorable, and less perceivable to the rest of the organization. That's also cause for concern: is encoding our institutional knowledge into this self-governing black box worth what we gain from computing close to the data? Will we be going all the way back to the drawing board if an acquisition or regulation or sheer signup volume forces us to store and process new members differently? Will we become uncertain about the results and ramifications of the encoded processes as they're performed internally? Will we be able to implement changes or respond to problems with appropriate efficiency?

Only experience can tell us whether our programmed-database strategy will be worth the sacrifices we make for speed and simplicity. Each automation project is unique, but there are common workflow adjustments and technical solutions which help improve the odds of success. Our goals on this tactical level are to speed up development and test feedback loops, keep implementors' options open in the face of unforeseen obstacles, and demystify database automation for everyone else who works with it.

priorities

Databases change more slowly than do their client programs. New or external processes moving into the database should be as completely defined as possible to avoid flurries of updates as requirements continue to evolve, edge cases turn up, and bugs get squashed. It's usually better to give young processes time to stabilize before incorporating them, just like it's less work in aggregate to refine queries embedded in client code before turning them into views.

fast iteration

Databases change more slowly than client programs, but during active development the latter change on the scale of seconds. Development databases need to follow as closely behind as possible. It should be fast to stand up a clean schema from scratch, and faster still to reapply changes as implementation progresses.

When I'm writing triggers and functions, I'll often revise them directly in psql, making heavy use of conveniences like \ef. Once I'm happy with the result I'll "canonize" the final code in the schema migration I'm working on. This works best with very focused changes; if the work spreads out to more than one table-trigger-function combination, it's too easy to lose track of individual elements.

Migration frameworks that encourage idempotence, like graphile-migrate, also save a step compared to frameworks with an apply/revert model. In my day job we do a lot with create or replace this, if not exists that, and attempted changes in do blocks ignoring known exceptions:

do $maybe_create$
begin
  create domain checked_text as text ....
  -- there's no `create domain if not exists`, so trap the exception if it already does
exception when duplicate_object then null;
end$maybe_create$;
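
Triggers themselves fit the same mold, assuming the hypothetical member_signup trigger from above: create or replace handles the function, and since there's no create trigger if not exists either, dropping and recreating (or create or replace trigger on Postgres 14 and later) keeps the migration safe to reapply:

-- safe to run repeatedly: the function is replaced in place,
-- the trigger dropped and recreated
create or replace function process_new_member() returns trigger
  language plpgsql as $$
begin
  insert into member_onboarding (member_id, status) values (new.id, 'pending');
  return new;
end$$;

drop trigger if exists member_signup on members;
create trigger member_signup
  after insert on members
  for each row execute function process_new_member();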

debug

I have never used pldebugger and in fact didn't know it existed until this week. I'm not going to be able to install it on every server I need to debug, although I'm certainly going to try it where I can. Where I can't, raise warning will always have my back (notice is too polite: the default client_min_messages prints it, but the default log_min_messages is stricter). Want to see variable values? raise warning. Not sure which execution path it's heading down? raise warning. Is my complicated when predicate even satisfied? raise warning first thing into the function and find out.
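
In the hypothetical signup function, that habit looks something like this (the referral_code column is made up for the example):

create or replace function process_new_member() returns trigger
  language plpgsql as $$
begin
  -- first thing: prove the trigger fired at all, and show what it received
  raise warning 'process_new_member: % on %, new row %', tg_op, tg_table_name, new;

  if new.referral_code is not null then
    -- which execution path is it heading down?
    raise warning 'taking the referral path for %', new.referral_code;
  end if;

  insert into member_onboarding (member_id, status) values (new.id, 'pending');
  return new;
end$$;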

Sometimes, if there's more data in play than I want to dig through in psql or the logs, I'll create a temporary (sensu lato) table and have my trigger function write interesting things to it, whereupon I can sort, filter, and the rest. This only works, though, as long as no fatal error rolls back the transaction, debug writes and all.
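
A sketch of that kind of scratch table, with to_jsonb saving me the trouble of mirroring column lists (names are, as ever, hypothetical):

create table if not exists trigger_debug (
  at timestamptz not null default now(),
  source text,
  payload jsonb
);

-- then, inside the trigger function under scrutiny:
insert into trigger_debug (source, payload)
values (tg_table_name || '/' || tg_op, to_jsonb(new));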

And speaking of transactions: they're great for testing triggers faster, fully operational or not. Fire off your DML statement, inspect the outcome, and roll back, ready to do the exact same thing all over again without having to worry about unique constraint collisions or other consequences of the new database state. I often try to get into loops like this in a dedicated testing psql session, modifying the function separately:

rollback; begin;⏎
↑↑⏎
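
Spelled out with the hypothetical members table, one pass through the loop runs something like:

begin;
insert into members (email) values ('trigger-test@example.com');
-- did the signup trigger queue onboarding as expected?
select * from member_onboarding order by member_id desc limit 1;
rollback;
-- then ↑↑⏎ as many times as it takes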

test

↑↑⏎ in a REPL is almost an automated test already -- all it's missing is a way to assert and report things about the outcome without human intervention. Trigger development is easier with the ability to evaluate assertions about everything in the database at your fingertips, but more importantly, true automated tests are legible to others as well. Anyone can look at a sufficiently descriptive test output with "success" or "failure" printed next to it and understand instantly what it means without having to know SQL.

For this reason alone, pgTAP may be the best thing since TOAST.

It's important to do two things with pgTAP tests: first, make sure they describe themselves adequately in their real context. Many checks are completely self-explanatory already, especially the "schema things" like has_table and policy_roles_are. Others, such as lives_ok and results_eq, usually want a note detailing exactly what just happened or why the comparison matters.
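
Still leaning on the hypothetical signup trigger, a descriptive pgTAP test might read:

begin;
select plan(3);

-- schema checks mostly explain themselves
select has_table('member_onboarding');

select lives_ok(
  $$insert into members (email) values ('pgtap@example.com')$$,
  'inserting a member fires the signup trigger without error'
);

select results_eq(
  $$select mo.status::text
      from member_onboarding mo
      join members m on m.id = mo.member_id
     where m.email = 'pgtap@example.com'$$,
  array['pending'],
  'the signup trigger queues onboarding in pending status'
);

select * from finish();
rollback;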

Second, they need to be organized. The default TAP output is a list of files with status or error count, with the errors themselves included. The errors will mean little to outside viewers, but it should at least be clear which major functional groups are being exercised and how they're doing. Splitting up test files also helps with state management. It's all too easy for tests to become implicitly dependent on writes made by previous tests, and innocently introducing a new one in between or reorganizing them can wreak havoc.

pgTAP does represent an extra logistical commitment! Integration tests (in that loose quasi-Bechdelian sense of "at least two programs talking to each other, and writing state to disk") or even well-honed manual test loops usually come first, depending on the purpose the database serves. Testing the whole system can tell you enough about the functioning of the database to get by initially. As the database becomes more extensively programmed, the debugging needs of external statements start to be outweighed by those of procedures and triggers, and there are enough of the latter as well that internal dependencies start to form and changes here can cause failures there. Any sufficiently internally complex subsystem benefits from testing in isolation, and the database is no exception.