After Massive

MassiveJS version 7 went places.

    db.libraries
      .join(db.holdings) // implicit join on foreign key holdings.library_id
      .join(db.books)    // implicit join on foreign key holdings.book_id
      .join(db.authors, db.$join.left, {[db.authors.$id]: db.books.$author_id})
      .filter({
        [db.libraries.$postcode]: '12345',
        [`${db.authors.$name} ilike`]: 'Lauren%Ipsum'
      })
      .project({
        $key: db.libraries.$id,
        $columns: [...db.libraries],
        authors: [{
          $key: db.authors.$id,
          $columns: [
            db.authors.$name,
            `extract(year from age(coalesce(${db.authors.$death}, now()), ${db.authors.$birth}))`
          ],
          // notice `books` is a collection on authors, even though we join authors to books!
          books: [{
            $key: db.books.$id,
            $columns: [...db.books]
          }]
        }]
      });

It'd be stretching an ecological metaphor to say that the middle tier is being eaten, but GraphQL and the "app logic on the client" tendency in web development make a powerful combination. Together, they constitute a big, important, and immediately useful local maximum on the software fitness landscape.

Of course, fitness in one direction comes at a cost in others, and like any species of software system, GraphQL backends are histories of decisions: about what to make possible or impossible, simple or detailed; about how to balance the correlated complexities of model and interface; fast, good, or cheap and all that. The more important decisions may or may not be intentional, but they have in common that they exclude or foreclose ways of interacting with, here, your database and its contents. In very roughly chronological order:

Classic object/relational mappers, including Hibernate and its kin but also and especially the ActiveRecord pattern, represent a choice to treat the database as a perfect, crystalline extrusion of the object graph into time, plus a series of decisions on how best to patch over the resulting impedance mismatch. They also often hide or try to replace SQL, and tend to target "lowest common denominator" database vendor compatibility.

Other data mappers and query builders, from MyBatis to Knex, identified a structure better corresponding to programmatic objects in the SQL statement itself, transforming those objects into parameters and back out of results, and made their own decisions about whether and how to generate, store, or construct statements.

There's an identifiable "query runner" tendency in projects like pg-promise, slonik, yesql, and aiosql, which offer more affordances than the plain database driver but ultimately decide that the important thing is helping you write exactly the SQL you need. Everything before and after getting that hand-written SQL to the driver is best left up to you, even if that means writing your own boilerplate -- at least it's yours.
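A minimal sketch of that philosophy (generic, not any one of those libraries' actual APIs): the SQL stays hand-written, and the tool's whole job is binding it to a callable.

```javascript
// A query runner reduced to its essence: you supply exactly the SQL you want,
// it hands back a function that sends that SQL and its parameters to the driver.
const makeQuery = (sql, execute) => (params = []) => execute(sql, params);

// A stand-in for a real driver, recording what it's asked to run:
const calls = [];
const driver = (sql, params) => {
  calls.push({sql, params});
  return Promise.resolve([]);
};

const librariesByPostcode = makeQuery(
  'select * from libraries where postcode = $1',
  driver
);

librariesByPostcode(['12345']); // runs the hand-written SQL as-is
```

Everything else -- caching the functions, organizing the SQL files, shaping the results -- is your boilerplate to write.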

Finally-so-far, GraphQL backends like Postgraphile go all in on being an HTTP API for independent clients interacting statelessly, and minus a few caveats basically nail atomic create-retrieve-update-delete from that distance. Between database functions and custom resolvers, they can cover even quite complex data models and server-side logic as well, within the bounds of request and response.

The first category isn't dead by any means but its innate internal contradictions are well recognized; many examples of the second are a reaction to them, Massive included. What still unites the two tendencies is their competition on the territory of the web service, which must wane as that of the independent client application has waxed. Between GraphQL serving that use case so effectively, and query runners sufficing for cases that don't involve extensive manipulation of complex object graphs, the space for mappers of any stripe at least has not been getting much bigger, relatively speaking, in the past decade. A data access library of the older school therefore will have to do a lot more than CRUD to compete, or even to differentiate itself, on its traditional terrain. If it can be useful elsewhere too, so much the better.

Massive isn't, and can't be, that library.

"Make working with your data and your database as easy and intuitive as possible, then get out of your way" was and is a great mission statement, but the fact is Massive was largely built for simple CRUD. There's more to it, of course: full-text search, array and JSON field support, runtime document table generation, keyset pagination, sequence and matview management, but these are extras on a design rooted in intentionally chosen simplifications. Finding all records matching a criteria object goes a really long way!

Many of these extra ideas and tools Massive adds on top of that foundation, original and inherited alike, still point a useful way forward: abandoning compatibility to support Postgres in detail, using introspection to facilitate reasoning about and manipulating database objects directly, record schemata inferred from joins or declared as needed without the maintenance and synchronization burden of model classes, collapsing the distinctions between script files and database functions, and more. But Massive also includes a lot of decisions made for and in the very different context of a decade ago, and for very different approaches to writing JavaScript as well (it antedates the Promise API!). Some of those decisions can't be grown past in a way that remains recognizably Massive.

For example:

  • An API surface of do-it-all functions like readable.find has a fairly low complexity ceiling: it covers most common scenarios, but it can't keep up with plenty of still fairly routine data access tasks that would benefit from dynamic construction in JavaScript.
  • Because a single function call has to convey everything from sort order to streaming to decomposition and beyond, all manner of functional and organizational purposes get crammed into options objects with little rhyme or reason. Some options are mutually exclusive; others contain arbitrarily complex nested objects and arrays.
  • Transaction clones are extremely heavyweight since they copy and substitute the dedicated connection across the entire database object tree.
  • CommonJS has become a dead end. I don't feel particularly strongly about the relative merits of CJS versus ESM, but I do think it's better to pick one, and Node's CJS is the odd one out.
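To make the second point concrete, here's roughly the shape a loaded Massive find call can take (option names follow Massive's find API from memory and the decompose schema is abbreviated, so treat the details as illustrative):

```javascript
// One criteria object, then one options bag mixing projection, ordering,
// paging, execution mode, and result restructuring:
const criteria = {'published >': '2015-01-01'};

const options = {
  fields: ['id', 'title', 'author_id'],             // projection
  order: [{field: 'published', direction: 'desc'}], // sorting
  offset: 50,                                       // paging
  limit: 25,
  stream: false,                                    // execution mode
  decompose: {                                      // nested result shaping
    pk: 'id',
    columns: ['id', 'title'],
    author: {pk: 'author_id', columns: ['author_id', 'name']}
  }
};

// db.books.find(criteria, options) would receive all of the above at once.
const concerns = Object.keys(options).length;
```

Six keys, at least four distinct organizational purposes, and nothing in the shape of the bag itself to tell you which combinations make sense together.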

I started monstrous a few months ago, while working on my fourth or fifth really substantial project with Massive. I'd been finding its limitations harder and harder to ignore, and the many other options available didn't serve my goals either.

I do web stuff but I've no intention of trying to keep up with the Modern Frontend Stack. I support a Postgraphile API at my day job, and have only good things to say about it, but my day job is data architecture and Postgres wrangling on behalf of people who aren't me or even on the same team. GraphQL's a sensible choice there given the coordination and communication requirements in play, but my other projects don't have those pressures and constraints.

And I'm never going to write another model class again if I can help it, so that rules out almost everything in the first two categories. It's true Knex has always been around and doesn't force you to recapitulate your schema in classes, but if Knex organized my data model to the extent and in the direction I wanted, I'd already have been using it.

That leaves query runners, and if I'm going to use a query runner and maintain my own boilerplate -- well, that's kind of what this is, no?

I'd seen Penkala some time ago, and that in turn pointed to alf/bmg. If you're looking for something in Clojure or Ruby respectively you should check them out! The latter two implement a full relational algebra and translate it to the relational calculus of SQL, while Penkala extracts the core principle of composability from that approach -- something SQL has never done well. Other tools try to supply that missing piece, most commonly by supporting technically-separable subqueries, but few go as far as these two. However, I'm already locked in to writing JavaScript for my charmingly retro coupled frontends, so I default to writing it on the server as well.

monstrous takes after those two in emphasizing composability. Everything done to a relation is a contained transformation step: join specifies relation, type, and condition; filter, criteria; project, an output record shape. Each transformation yields a new joined or filtered or projected relation. You can attach any such derived relation to the database just as if it were an original table or view, and reference it in other joins or filters as a subquery.
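The flavor of this is easy to show in a deliberately simplified sketch -- a toy, not monstrous's actual implementation -- where each transformation returns a new frozen value and nothing is executed until SQL is rendered:

```javascript
// A toy composable relation: every transformation step yields a new relation,
// so derived relations can be stored, shared, and extended independently.
const relation = (name, state = {joins: [], criteria: [], columns: ['*']}) =>
  Object.freeze({
    join: (other, on) =>
      relation(name, {...state, joins: [...state.joins, `join ${other} on ${on}`]}),
    filter: condition =>
      relation(name, {...state, criteria: [...state.criteria, condition]}),
    project: (...columns) => relation(name, {...state, columns}),
    sql: () => [
      `select ${state.columns.join(', ')} from ${name}`,
      ...state.joins,
      state.criteria.length ? `where ${state.criteria.join(' and ')}` : ''
    ].filter(Boolean).join(' ')
  });

const libraries = relation('libraries');

// Each derivation is a value; `libraries` itself is untouched.
const inPostcode = libraries.filter("postcode = '12345'");
const stocked = inPostcode.join('holdings', 'holdings.library_id = libraries.id');

stocked.project('libraries.name', 'holdings.book_id').sql();
```

The real thing tracks types, conditions, and output record shapes rather than string fragments, but the principle is the same: transformations compose, and any intermediate result is a legitimate relation in its own right.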

Moreover, you can use the same relations in reads and writes. Possibly monstrous' most fundamental departure from Massive is the inversion of subject and verb, separating statement construction from execution. With Massive, you could pass a criteria object from a find into an update, although there aren't many reasons to. With monstrous, you can much more usefully select an attached relation here and update it there.

In short: still no models, but if a certain complex product is a common motif in your project, you can define it once and reuse it without repeating the same transformations every time it appears. Attached relations are akin to writable views that respect the object graphs you're working with in client code.
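In toy form (not monstrous's actual API), the construction-execution split looks like this: one filtered relation value, two different statements.

```javascript
// A minimal relation value that defers execution: the same criteria render
// into either a read or a write.
const table = (name, where = []) => Object.freeze({
  filter: condition => table(name, [...where, condition]),
  select: (...columns) =>
    `select ${columns.join(', ')} from ${name}` +
    (where.length ? ` where ${where.join(' and ')}` : ''),
  update: changes => {
    const sets = Object.entries(changes)
      .map(([column, value]) => `${column} = ${value}`)
      .join(', ');
    return `update ${name} set ${sets}` +
      (where.length ? ` where ${where.join(' and ')}` : '');
  }
});

const overdue = table('holdings').filter('due_date < now()');

overdue.select('id', 'book_id');
// ...and later, the very same relation in a write:
overdue.update({flagged: 'true'});
```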

The construction-execution split also means that tasks and transactions, which in Massive deep clone the entire database structure to swap a dedicated connection into each attached relation, instead use a cheap, lightweight class comprising a dozen or so functions and practically no extra state.
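That difference is easy to see in miniature (illustrative only, not monstrous's actual classes): a task or transaction handle needs nothing but the dedicated connection and a handful of forwarding methods, while the relations themselves stay shared and untouched.

```javascript
// Sketch: rather than deep-cloning the database tree to pin a connection to
// every relation, defer the choice of connection to execution time and pass a
// small handle whose only state is that one connection.
class TaskHandle {
  constructor(connection) {
    this.connection = connection;
  }

  query(sql, params = []) {
    return this.connection.query(sql, params);
  }

  // commit, rollback, and friends would forward to the connection the same way.
}

// A fake connection standing in for a real driver's, recording what it runs:
const executed = [];
const fakeConnection = {
  query: (sql, params) => {
    executed.push(sql);
    return Promise.resolve([]);
  }
};

const tx = new TaskHandle(fakeConnection);
tx.query('update holdings set flagged = true where due_date < now()');
```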

For more, check out the readme and the tests!

As for Massive: it still exists, is still moderately popular going by weekly downloads, and even sees the odd issue or merge request. I'll continue to keep an eye on it into the near future, but I think it's developed about as much as it's going to; certainly I've developed it about as much as I'm going to. If there's interest from any extant contributors or users (email address is up top!) I'll see about spinning it out into its own group/organization and adding maintainers.