Fixing Slow Row-Level Security Policies
At my day job, we use row-level security extensively. Several different roles interact with Postgres through the same GraphQL API; each role has its own grants and policies on tables; whether a role can see record X in table Y can depend on its access to record A in table B, so these policies aren't merely a function of the contents of the candidate row itself. There's more complexity than that, even, but no need to get into it.
Two tables, then.
set jit = off; -- just-in-time compilation mostly serves to muddy the waters here
create table tag (
id int generated always as identity primary key,
name text
);
insert into tag (name)
select * from unnest(array[
'alpha', 'beta', 'gamma', 'delta', 'epsilon', 'zeta', 'eta', 'iota', 'kappa', 'lambda', 'mu',
'nu', 'xi', 'omicron', 'pi', 'rho', 'sigma', 'tau', 'upsilon', 'phi', 'chi', 'psi', 'omega'
]);
create table item (
id int generated always as identity primary key,
value text,
tags int[]
);
insert into item (value, tags)
select
md5(random()::text),
array_sample((select array_agg(id) from tag), trunc(random() * 4)::int + 1)
from generate_series(1, 1000000);
create index on item using gin (tags);
alter table tag enable row level security;
alter table item enable row level security;
We'll set up two roles to compare performance. item_admin will have a simple policy allowing it to view all items, while item_reader's access will be governed by session settings that the user must configure before attempting to query these tables.
create role item_admin;
grant select on item to item_admin;
grant select on tag to item_admin;
create policy item_admin_tag_policy on tag
for select to item_admin
using (true);
create policy item_admin_item_policy on item
for select to item_admin
using (true);
create role item_reader;
grant select on item to item_reader;
grant select on tag to item_reader;
-- `set item_reader.allowed_tags = '{alpha,beta}'` and see items tagged
-- alpha or beta
create policy item_reader_tag_policy on tag
for select to item_reader
using (
current_setting('item_reader.allowed_tags') is not null and
current_setting('item_reader.allowed_tags')::text[] @> array[name]
);
create policy item_reader_item_policy on item
for select to item_reader
using (
exists (
select 1 from tag
where item.tags @> array[tag.id]
)
);
Before we proceed: this post includes a lot of explain plans. These can be a bit intimidating to read at first (although reading them is a skill very much worth developing if you have any stake in making Postgres databases fast!). There are many explain helpers that you can copy-paste the plans into for a more intuitively structured view; some popular offerings include depesz's, pgExplain, and Dalibo.
Baseline Performance
Okay! Let's look at an example query, first as item_admin. This retrieves items tagged alpha and having a value beginning with the letter A.
set role item_admin;
explain (analyze, verbose, costs, timing) select * from item where value ilike 'a%' and tags && array[1];
QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Bitmap Heap Scan on public.item (cost=748.75..14704.25 rows=109556 width=68) (actual time=7.041..50.822 rows=6791 loops=1)
Output: id, value, tags
Recheck Cond: (item.tags && '{1}'::integer[])
Filter: (item.value ~~* 'a%'::text)
Rows Removed by Filter: 101875
Heap Blocks: exact=12311
-> Bitmap Index Scan on item_tags_idx (cost=0.00..721.36 rows=109567 width=0) (actual time=5.762..5.762 rows=108666 loops=1)
Index Cond: (item.tags && '{1}'::integer[])
Query Identifier: 1548793227074419886
Planning Time: 0.123 ms
Execution Time: 51.001 ms
So far, so good: we're using the GIN index, the bitmap is small enough to stay lossless (Heap Blocks: exact=12311) so nothing needs to be rechecked, the filter is filtering, 51ms.
Now, as item_reader:
set role item_reader;
set item_reader.allowed_tags = '{alpha,beta}';
explain (analyze, verbose, costs, timing) select * from item where value ilike 'a%' and tags && array[1];
QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Seq Scan on public.item (cost=0.00..41777312.00 rows=54778 width=68) (actual time=0.476..8100.051 rows=6791 loops=1)
Output: item.id, item.value, item.tags
Filter: (EXISTS(SubPlan 1) AND (item.value ~~* 'a%'::text) AND (item.tags && '{1}'::integer[]))
Rows Removed by Filter: 993209
SubPlan 1
-> Seq Scan on public.tag (cost=0.00..41.75 rows=1 width=0) (actual time=0.008..0.008 rows=0 loops=1000000)
Filter: ((current_setting('item_reader.allowed_tags'::text) IS NOT NULL) AND ((current_setting('item_reader.allowed_tags'::text))::text[] @> ARRAY[tag.name]) AND (item.tags @> ARRAY[tag.id]))
Rows Removed by Filter: 18
Query Identifier: 1548793227074419886
Planning Time: 0.135 ms
Execution Time: 8100.319 ms
That's roughly 150 times slower. What's the difference?
The item_reader plan is very different, in fact. Instead of using the GIN index on item.tags, we're sequentially scanning tag in the subplan. Worse, we're sequentially scanning tag in the subplan a million times, once per item record (loops=1000000); at roughly 0.008ms per loop, that alone accounts for essentially the whole eight seconds. That's probably the issue.
How can we fix it?
Use More Efficient Operations
Our policy on item tests array containment, but that's kind of overkill; the any operation is better optimized for what we need here.
set role postgres;
drop policy if exists item_reader_item_policy on item;
create policy item_reader_item_policy on item
for select to item_reader
using (
exists (
select 1 from tag
where tag.id = any(item.tags)
)
);
set role item_reader;
explain (analyze, verbose, costs, timing) select * from item where value ilike 'a%' and tags && array[1];
QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Seq Scan on public.item (cost=0.00..15035202.44 rows=53295 width=68) (actual time=0.531..2441.101 rows=6854 loops=1)
Output: item.id, item.value, item.tags
Filter: (EXISTS(SubPlan 1) AND (item.value ~~* 'a%'::text) AND (item.tags && '{1}'::integer[]))
Rows Removed by Filter: 993146
SubPlan 1
-> Bitmap Heap Scan on public.tag (cost=4.23..15.01 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1000000)
Recheck Cond: (tag.id = ANY (item.tags))
Filter: ((current_setting('item_reader.allowed_tags'::text) IS NOT NULL) AND ((current_setting('item_reader.allowed_tags'::text))::text[] @> ARRAY[tag.name]))
Rows Removed by Filter: 2
Heap Blocks: exact=1000000
-> Bitmap Index Scan on tag_pkey (cost=0.00..4.23 rows=10 width=0) (actual time=0.001..0.001 rows=3 loops=1000000)
Index Cond: (tag.id = ANY (item.tags))
Query Identifier: -1492990194093681799
Planning Time: 0.102 ms
Execution Time: 2441.345 ms
Quite a bit better, although still very slow compared with the admin query. We're still performing a million loops, but the any test is able to take advantage of tag's primary key index each time instead of sequentially scanning the whole table a million times.
Functions can be Execution Boundaries
The tags an item_reader is allowed to see won't change in the middle of the query; even if there's an access-control table being updated concurrently, we're in our own transaction here and won't see or care about changes. There's no need to get the list anew for each candidate item record. Let's try caching that using a function.
set role postgres;
create or replace function item_reader_allowed_tags()
returns int[]
language sql
begin atomic;
select array_agg(id)
from tag
where name = any(current_setting('item_reader.allowed_tags')::text[]);
end;
drop policy if exists item_reader_item_policy on item;
create policy item_reader_item_policy on item
for select to item_reader
using (
item_reader_allowed_tags() && item.tags
);
Et voilà:
set role item_reader;
explain (analyze, verbose, costs, timing) select * from item where value ilike 'a%' and tags && array[1];
QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Seq Scan on public.item (cost=0.00..279812.00 rows=1096 width=68) (actual time=0.911..13425.743 rows=6791 loops=1)
Output: id, value, tags
Filter: ((item_reader_allowed_tags() && item.tags) AND (item.value ~~* 'a%'::text) AND (item.tags && '{1}'::integer[]))
Rows Removed by Filter: 993209
Query Identifier: 1548793227074419886
Planning Time: 0.102 ms
Execution Time: 13426.034 ms
Hold up a minute. That's worse. Worse even than the baseline, and not by a little.
Functions Need Configuring
Functions are extremely powerful, which means Postgres has to assume that any function could get up to absolutely anything in the database, and the query engine has to treat it like it's a screwdriver-slip away from criticality. However, we can pinky promise that our function doesn't do anything untoward.
set role postgres;
alter function item_reader_allowed_tags stable;
set role item_reader;
explain (analyze, verbose, costs, timing) select * from item where value ilike 'a%' and tags && array[1];
QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Bitmap Heap Scan on public.item (cost=1353.87..68540.82 rows=22474 width=68) (actual time=12.503..2880.043 rows=6791 loops=1)
Output: id, value, tags
Filter: ((item_reader_allowed_tags() && item.tags) AND (item.value ~~* 'a%'::text) AND (item.tags && '{1}'::integer[]))
Rows Removed by Filter: 200730
Heap Blocks: exact=12312
-> Bitmap Index Scan on item_tags_idx (cost=0.00..1348.25 rows=205140 width=0) (actual time=11.115..11.115 rows=207521 loops=1)
Index Cond: (item.tags && item_reader_allowed_tags())
Query Identifier: 1548793227074419886
Planning Time: 0.280 ms
Execution Time: 2880.319 ms
Finally, item_reader gets to use the GIN index! With the default volatility setting of volatile, Postgres assumes both that the function could do anything and that "anything" could change at any time, including between successive invocations in the same statement. If we say instead that it will always return the same output for the same input within a statement, the planner takes us at our word and lets us perform an index scan.
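If you lose track of which of these promises are in effect, a quick look at pg_proc shows what the planner currently believes about the function:
select provolatile, proparallel, proleakproof, procost -- 'v' = volatile, 's' = stable, 'i' = immutable
from pg_proc
where proname = 'item_reader_allowed_tags';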
This is much better than our eight-second baseline, but still actually a bit worse than the simple inline any policy. We've definitely got a ways to go before we approach the 50ms times those lucky admins get. What if we parallelize?
set role postgres;
alter function item_reader_allowed_tags parallel safe;
set role item_reader;
explain (analyze, verbose, costs, timing) select * from item where value ilike 'a%' and tags && array[1];
QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Gather (cost=2353.87..39777.84 rows=22474 width=68) (actual time=15.013..1008.731 rows=6791 loops=1)
Output: id, value, tags
Workers Planned: 2
Workers Launched: 2
-> Parallel Bitmap Heap Scan on public.item (cost=1353.87..36530.44 rows=9364 width=68) (actual time=9.112..997.387 rows=2264 loops=3)
Output: id, value, tags
Filter: ((item_reader_allowed_tags() && item.tags) AND (item.value ~~* 'a%'::text) AND (item.tags && '{1}'::integer[]))
Rows Removed by Filter: 66910
Heap Blocks: exact=4128
Worker 0: actual time=5.971..994.315 rows=2253 loops=1
Worker 1: actual time=6.901..994.412 rows=2256 loops=1
-> Bitmap Index Scan on item_tags_idx (cost=0.00..1348.25 rows=205140 width=0) (actual time=13.099..13.099 rows=207521 loops=1)
Index Cond: (item.tags && item_reader_allowed_tags())
Query Identifier: 1548793227074419886
Planning Time: 0.336 ms
Execution Time: 1008.934 ms
Well, we parallelize. That sounds about right. It's somewhat better than halved from the stable plan; the leader scanning alongside the two workers (hence loops=3) and a warm cache can take the credit for that.
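The two workers, for what it's worth, come from max_parallel_workers_per_gather, which defaults to 2; if the hardware has cores to spare, there's headroom to experiment with:
show max_parallel_workers_per_gather; -- 2 by default, hence "Workers Planned: 2"
-- set max_parallel_workers_per_gather = 4; -- per-session; worth benchmarking if you have the cores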
There are two more function settings that sound like they could be useful.
set role postgres;
alter function item_reader_allowed_tags cost 100000;
You can find some recommendations online to set the function's estimated execution cost to a very high value to "encourage" the planner to execute it as few times as possible. In our example here, upcosting shaves off a few milliseconds, but we're in rounding-error territory. I think there are likely some scenarios in which this does make a significant difference, but I'm not sure what those might be; this is not one. And in any case, it's a kludge.
set role postgres;
alter function item_reader_allowed_tags cost 100;
alter function item_reader_allowed_tags leakproof;
Leakproofness asserts that the function cannot reveal information about its arguments other than through its return value (no side channels, such as error messages). Quals that might leak can't be pushed down past a security barrier, so the barrier's own conditions have to be evaluated first. However, alter function item_reader_allowed_tags leakproof doesn't seem to change the plan noticeably. Stick a pin in that.
Is This as Good as it Gets?
It sure seems like we're stuck. There are still some differences from the admin query plan: as admin, we don't care about item_reader_allowed_tags, and our index scan performs the test for tag id 1. As reader, our index scan checks for allowed tags, that apparently being more cost-optimal during planning, but then in a post-scan filter we run item_reader_allowed_tags a second time along with the ilike and tag-id tests from the where clause.
This is the point where I started source-diving. My initial assumption was that, because RLS was active, the policy predicates needed to be checked separately from and before those in the where clause. However, item_reader_allowed_tags is a simple select statement at heart, and, especially if it's leakproof, it might be inlineable. If so, then all the predicates ("quals" in internalsese) get dumped into one big bucket for the planner to pick through as it likes, which would make the post-scan filtering rather odd. I just needed to find out whether that was happening, and this information isn't available in the explain plan.
Internal Representations
The explain plans I've been posting throughout are more-or-less human-readable representations of what the query planner intends to do or has done. They are rather lossy summaries, it turns out. If you're brave, foolish, desperate, or an internals developer, though, you can turn on a couple of client settings that dump the planner's internal representation out.
set debug_print_plan = on;
set client_min_messages = log;
It's a bad idea to use tab-completion after setting these, for the record. Disable by setting them back to off and notice, respectively.
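That is, when you're done:
set debug_print_plan = off;
set client_min_messages = notice;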
I'm not going to copy the internal representation of the simple test query we've been using here because it's 2500 lines long and the first thing I want to point out is an absence anyway.
After I proved out my performance fix on one role-policy combination, the next set I'm tackling produces an internal representation of more than 2 million lines, well past my terminal's scrollback. psql doesn't send these debug messages to stdout along with the query output, so I wound up having to run psql -c "set debug_print_plan=on; set client_min_messages=log; explain analyze ...." 2>&1 | tee out.plan to capture it.
To test my assumption, I was looking for the :hasRowSecurity flag. If it appeared, that would mean item_reader_allowed_tags is not inlined, and constitutes a security barrier forcing quals to be evaluated in multiple stages. The where quals would only be evaluated for a candidate record after the quals in the function; there'd be an obvious explanation for the post-index-scan filtering step; we wouldn't be happy, but we'd understand it.
I didn't see :hasRowSecurity anywhere.
What I did see was not one, but seven plans produced for my test query. Try it!
How Does One Query Turn into Seven Plans?
Or, functions can be execution boundaries, oh no! In order, the plans are:
1. Aggregation of tag.id based on an array-containment (OPEXPR :opno 2751 :opfuncid 2748) between current_setting (FUNCEXPR :funcid 2077) and an array instantiation. This is clearly item_reader_allowed_tags, although there's some extra funny business like a null test.
2. The same plan again.
3. Gathering a bitmap heap scan on item, with three quals involved: testing item_reader_allowed_tags overlap with the tags column; a text ilike comparison; and a second array overlap with a constant. Only the first qual makes it down into the plan tree. This is the GIN index scan.
4. Guess what, it's plan 1 again!
5. And again!
6. And yet again!
7. And one for the road!
That's right: six identical plans for RLS, one for the actual query. The null test is from the policy on tag that ensures the item_reader.allowed_tags setting is non-null, so at least it's kind of inlining. The next question is how we get to six.
In the course of doing this analysis for real, I'd been searching up and reading anything I could. What put me on the right track was a discussion in PostgREST's docs issue tracker about the performance implications of current_setting in RLS. current_setting, like many other functions, is not leakproof. We may have declared item_reader_allowed_tags leakproof so it could qualify for inlining, but current_setting remains a barrier -- it just doesn't have any quals, so :hasRowSecurity doesn't need to be set.
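The catalog bears this out; leakproofness is rare among built-ins, and current_setting doesn't have it:
select proname, proleakproof
from pg_proc
where proname in ('current_setting', 'item_reader_allowed_tags');
-- current_setting: f; item_reader_allowed_tags: t, but only because we declared it so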
Between the policy on tag and item_reader_allowed_tags we have three invocations of current_setting. item_reader_allowed_tags is parallel safe, and we spin up two workers. That's six. It's easy enough to verify: mark the function parallel unsafe, and we're down to four redundant plans. One inside item_reader_allowed_tags, two in the tag policy; I'm not sure where the fourth is coming from.
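If you want to reproduce that count, toggling the flag is all it takes; just remember to put it back:
set role postgres;
alter function item_reader_allowed_tags parallel unsafe;
-- re-run the explain as item_reader with debug_print_plan on, count the plans, then:
alter function item_reader_allowed_tags parallel safe;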
The PostgREST discussion contributors did work out how best to optimize the policies. If a plan has no dependencies, it can be pulled up into an InitPlan, which is guaranteed to run only once and is then referenced by other plans. All it takes is keeping the current_setting invocations in their own selects:
set role postgres;
drop policy if exists item_reader_tag_policy on tag;
create policy item_reader_tag_policy on tag
for select to item_reader
using (
(select current_setting('item_reader.allowed_tags')::text[]) is not null and
(select current_setting('item_reader.allowed_tags')::text[]) @> array[name]
);
create or replace function item_reader_allowed_tags(allowed_tags text[])
returns int[]
language sql
stable
leakproof
parallel safe
begin atomic;
select array_agg(id)
from tag
where name = any(allowed_tags);
end;
drop policy if exists item_reader_item_policy on item;
create policy item_reader_item_policy on item
for select to item_reader
using (
item_reader_allowed_tags(
(select current_setting('item_reader.allowed_tags')::text[])
) && item.tags
);
With that, we've knocked another third or so off our query time:
set role item_reader;
explain (analyze, verbose, costs, timing) select * from item where value ilike 'a%' and tags && array[1];
QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Gather (cost=1125.07..14581.22 rows=1096 width=68) (actual time=13.479..627.433 rows=6791 loops=1)
Output: item.id, item.value, item.tags
Workers Planned: 2
Workers Launched: 2
InitPlan 1
-> Result (cost=0.00..0.02 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=1)
Output: (current_setting('item_reader.allowed_tags'::text))::text[]
-> Parallel Bitmap Heap Scan on public.item (cost=125.05..13471.60 rows=457 width=68) (actual time=7.711..617.697 rows=2264 loops=3)
Output: item.id, item.value, item.tags
Filter: ((item_reader_allowed_tags((InitPlan 1).col1) && item.tags) AND (item.value ~~* 'a%'::text) AND (item.tags && '{1}'::integer[]))
Rows Removed by Filter: 66910
Heap Blocks: exact=4130
Worker 0: actual time=4.878..614.809 rows=2285 loops=1
Worker 1: actual time=5.181..614.866 rows=2278 loops=1
-> Bitmap Index Scan on item_tags_idx (cost=0.00..124.78 rows=10000 width=0) (actual time=11.706..11.706 rows=207521 loops=1)
Index Cond: (item.tags && item_reader_allowed_tags((InitPlan 1).col1))
Query Identifier: 1548793227074419886
Planning Time: 0.261 ms
Execution Time: 627.645 ms
The internal representation still shows a surprising number of plans: 5, with 4 still being redundant item_reader_allowed_tags invocations, this time using a Param node in place of the FuncExpr. And while this is better than anything else we've gotten by a not-inconsiderable margin, it still takes more than half a second, compared to the admin's 50ms.
This, and especially the presence of multiple item_reader_allowed_tags plans, suggests that we simply aren't InitPlanning hard enough. We know it's not just current_setting that's stable within a single query; item_reader_allowed_tags itself won't change either, but something's preventing the current_setting invocations it triggers in the tag policy from using the InitPlan we generate in the item policy. So let's stuff that whole invocation in an InitPlan! We'll keep the select-wrapped current_setting invocation in its arguments, since we might as well eke out every bit of reusability we can, but ensuring we only call item_reader_allowed_tags once will probably have a far bigger effect than that.
set role postgres;
drop policy if exists item_reader_item_policy on item;
create policy item_reader_item_policy on item
for select to item_reader
using (
(select item_reader_allowed_tags(
(select current_setting('item_reader.allowed_tags')::text[])
)) && item.tags
);
set role item_reader;
explain (analyze, verbose, costs, timing) select * from item where value ilike 'a%' and tags && array[1];
QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Bitmap Heap Scan on public.item (cost=125.08..12531.39 rows=1084 width=68) (actual time=12.057..93.546 rows=6726 loops=1)
Output: item.id, item.value, item.tags
Recheck Cond: ((InitPlan 2).col1 && item.tags)
Filter: ((item.value ~~* 'a%'::text) AND (item.tags && '{1}'::integer[]))
Rows Removed by Filter: 200646
Heap Blocks: exact=12311
InitPlan 2
-> Result (cost=0.02..0.28 rows=1 width=32) (actual time=0.076..0.076 rows=1 loops=1)
Output: item_reader_allowed_tags((InitPlan 1).col1)
InitPlan 1
-> Result (cost=0.00..0.02 rows=1 width=32) (actual time=0.003..0.003 rows=1 loops=1)
Output: (current_setting('item_reader.allowed_tags'::text))::text[]
-> Bitmap Index Scan on item_tags_idx (cost=0.00..124.53 rows=10000 width=0) (actual time=10.942..10.942 rows=207372 loops=1)
Index Cond: (item.tags && (InitPlan 2).col1)
Query Identifier: 2531873819528607867
Planning Time: 0.088 ms
Execution Time: 93.713 ms
It did have a far bigger effect! We're not beating the admin timing, but considering we have to perform double the tests on item.tags, not-quite-double the duration is probably about as good as we can hope for, until Postgres' planner gets smart enough to combine the two tests in the index scan.
The internal plan representation is also finally down to two separate plans: the bitmap heap scan, and a single instance of item_reader_allowed_tags.