A Unified Multi-Tenant User Cache with PostgreSQL
I've been working on a multitenant Node.js product which recently moved its authentication into a Single
- Foreign data wrappers (FDWs)
- Table inheritance
- Materialized views
Foreign Data Wrappers
When we began redesigning the application's user infrastructure we wanted to avoid maintaining a copy of user data independent from the chosen SSO system, Keycloak. We knew we could represent data from other sources through a foreign data wrapper. This is (so far as I know) a unique feature to Postgres, which lets you represent data in other sources as tables by implementing a standard connecting API.
The bad news: Postgres is written in C, and while I could probably brush up on pointers and make it work eventually, higher-level languages have spoiled me. Fortunately, there's a project which enables FDW development in Python: Multicorn. With my coworker's efforts on foreign-keycloak-wrapper, that got us as far as being able to create a table representing a particular "realm" or user organization in Keycloak (we kept our organizations
table in order to have referential integrity in our data ownership) and retrieve its users through the Keycloak API.
CREATE SERVER "myrealm_server"
FOREIGN DATA WRAPPER multicorn
OPTIONS (
wrapper 'keycloak.Keycloak',
url 'url to the Keycloak instance',
username 'a realm admin username',
password 'a realm admin password',
realm 'myrealm',
client_id 'a realm client id',
grant_type 'password',
client_secret 'a realm client secret',
organization_id 'id of an entry in the organizations table'
);
CREATE FOREIGN TABLE "myrealm" (
id uuid,
username text,
"firstName" text,
"lastName" text,
email text,
organization_id uuid
) SERVER "myrealm_server";
SELECT * FROM myrealm;
We'd still have to create a foreign server and table for each realm, since the Keycloak API only retrieves users per realm by design. In a multitenant system, we want bootstrapping new organizations to be easy and automated on the backend. Here Keycloak has the ability to export realm connection information as JSON, which lets us access the information required to CREATE FOREIGN SERVER
s and CREATE FOREIGN TABLE
s on the fly. So, while it's possible to pull user information from new realms after creation, it will always be separated by realm. We didn't want to have to figure out which table to pull from in the JavaScript API -- best to keep that as straightforward as possible and manage data complexity in the database. It's what it's there for.
Table Inheritance
Table inheritance is another feature unique to Postgres among the four major RDBMSs. Setting up a base users
table and declaring that the myrealm
table INHERITS (users)
accomplishes two things:
First, myrealm
builds on top of users
' column list. This mostly makes the CREATE FOREIGN TABLE
statement shorter (it's also optional), since we have no new columns to add as long as the base users
schema conforms to the Keycloak API contract.
Second, myrealm
's data can be accessed through users
with a simple SELECT
. In fact, this is the default behavior, and a SELECT
must specify FROM ONLY users
in order to omit rows from descendant tables.
CREATE TABLE "users" (
id uuid,
username text,
"firstName" text,
"lastName" text,
email text,
organization_id uuid
);
CREATE FOREIGN TABLE "myrealm" (
) INHERITS "users" SERVER "myrealm_server";
SELECT * FROM users;
The SELECT
now combines information from every active realm, so for higher-level APIs the only question is one of ensuring the requesting user is authorized to see the information retrieved. This is exactly the way we had it with the local users table, so we've already got that authorization infrastructure in place and overall impact to the rest of the application is minimal.
Materialized Views
The Keycloak server being separate from the application database server means longer roundtrip time in any query involving user records. There are some advantages to having the data stored locally, after all! However, the real problem isn't in having the data but in ensuring it stays current: what we need is a cache. A materialized view is exactly that.
Materialized views are found in Postgres, SQL Server, and Oracle. If you use MySQL, you're out of luck (but then, if you're following this whole thing, it's Postgres or bust anyway). It's defined just like a regular view, with the MATERIALIZED
between CREATE
and VIEW
the important difference indicating that the results of the view query are to be stored until refreshed. The stored results can be indexed just like tables, too.
CREATE MATERIALIZED VIEW cached_users AS
SELECT * FROM users;
If we add a new realm and its foreign table, or if information inside an existing realm changes (such as if we see a previously-unknown user try to login), we can REFRESH MATERIALIZED VIEW CONCURRENTLY cached_users;
to.... refresh the cache. The CONCURRENTLY
means it happens in the background, so SELECT
s happening while the data is being retrieved see the old version. It's not staying as close to realtime as possible; we could do that with cron or a systemd timer if we really wanted to, but for our purposes refreshing on new organizations being created or unknown users authenticating suffices.
Wrap-up
Overall this has added some complexity to our database setup. We're no longer running stock Postgres since both Multicorn and foreign-keycloak-wrapper must be installed. Discrepancies between Python versions bundled with Postgres on various operating systems have also caused some issues -- universally resolved with a careful inspection of the Postgres configuration and the install logs, but annoying. Docker's taken some of the pain out of that, since we can ship an image with everything ready to go and use volumes to persist data.
Lastly, realms have to be created in Keycloak before we can do anything with them, so there are more moving parts to keep track of. Oh well; we have our unified user cache so the application logic stays simple, and that's all we wanted out of it. SSO is supposed to make life easier for users, not necessarily for architects!