The Ultimate Postgres vs MySQL Blog Post
I should probably say up front that I love working with Postgres and could die happy without ever seeing a mysql>
prompt again. This is not an unbiased comparison -- but those are no fun anyway.
The scenario: two applications, using Massive.js to store and retrieve data. Massive is closely coupled to Postgres by design. Specializing lets it take advantage of features which only exist in some or no other relational databases to streamline data access in a lighter, more "JavaScripty" way than a more traditional object-relational mapper. It's great for getting things done, since the basics are easy and for the complicated stuff where you'd be writing SQL anyway.... you write SQL, you store it in one central place for reuse, and the API makes running it simple.
Where Massive is less useful is if you have to support another RDBMS. This is, ideally, something you know about up front. Anyway: things happen, and sometimes you find yourself having to answer the question "what's it going to look like if we need to run these applications with light but tightly coupled data layers on MySQL?"
Not good, was the obvious answer, but less immediately obvious was how not good. I knew there were some things Postgres did that MySQL didn't, but I also knew there were a ton of things I'd just never tried in the latter. So as I got to work on this, I started keeping notes. Here's everything I found.
Schema Layout
Now that we're all basically over the collective hallucination of a "schemaless" future, arguably the most important aspect of data storage is how information is modeled in a database. Postgres and MySQL are both relational databases, grouping records in strictly-defined tables. But there's a lot of room for variation within that theme.
Multiple Schemas
First things first: "schema" doesn't always mean the same thing. To MySQL, "schema" is synonymous with "database". For Postgres, a "schema" is a namespace within a database, which allows you to group tables, views, and functions together without having to break them apart into different databases.
MySQL's simplicity in this respect is ameliorated by its offering cross-database queries:
SELECT *
FROM db1.table1 t1
JOIN db2.table2 t2 ON t2.t1_id = t1.id;
With Postgres, you can work across schemas, but if you need to query information in a different database, that's a job for...
Foreign Data Wrappers
Foreign data wrappers let Postgres talk to practically anything that represents information as discrete records. You can create a "foreign table" in a Postgres database and SELECT
or JOIN
it like any other table -- only under the hood, it's actually reading a CSV, talking to another DBMS, or even querying a REST API. It's a powerful enough feature that NoSQL stalwart MongoDB sneakily built their BI Connector on top of Postgres with foreign data wrappers. You don't even need to know C to write a new FDW when Multicorn lets you do it in Python!
Oracle and SQL Server both have some functionality for registering external data sources, but Postgres' offering is the most extensible I'm aware of. MySQL, besides the inter-database query support mentioned above, has nothing.
Table Inheritance
Inheritance is more commonly thought of as an attribute of object-oriented programming languages rather than databases, but Postgres is technically an ORDBMS or object-relational database management system. So you can have a table cities
with columns name
and population
, and a table capitals
which inherits the definition of cities
but adds an of_country
column only relevant, of course, for capital cities. If you SELECT
from cities
, you get rows from capitals
-- they're cities too! You can of course SELECT name FROM ONLY cities
to exclude the capitals. This is something of a niche feature, but when you have the right use case it really shines.
MySQL, being a traditional RDBMS, doesn't do this.
Materialized Views
Materialized views are like regular views, except the results of the specifying query are physically stored ('materialized') and must be explicitly refreshed. This allows database developers to cache the results of slower queries when the results don't have to be realtime.
Oracle has materialized views, and SQL Server's indexed views are similar, but MySQL has no materialized view support.
Check Constraints
Constraints in general ensure that invalid data is not stored. The most common constraint is NOT NULL
, which prevents records without a value for the non-nullable column from being inserted or updated. Foreign key constraints do likewise when a reference to a record in another table is invalid. Check constraints are the most flexible, and allow validation of any predicate you could put in a WHERE
clause -- for example, asserting that prices have to be positive numbers, or that US zip codes have to be five digits.
Per the MySQL docs: the CHECK
clause is parsed but ignored by all storage engines.
JSONB and Indexing
Postgres and MySQL both have a JSON
column type (MySQL replacement MariaDB does too, but it's currently just an alias for LONGTEXT
) and functions for building, processing, and querying JSON fields. Postgres actually goes a step further by offering a JSONB
type which processes input data into a binary format. This means it's a little bit slower to write, but much faster to query.
It also means you can index the binary data. A GIN or Generalized INverted index allows queries checking for the existence of specific keys or key-value pairs to avoid scanning every single record for matches. This is huge if you run queries which dig into JSON fields in the WHERE
clause.
Default Values Defined by Functions
DEFAULT
is a useful specification for columns in a CREATE TABLE
statement. At the simplest level, this could be used to baseline a boolean field to true
or false
if the INSERT
statement doesn't give an explicit value. But you can do more than set a scalar value: a timestamp can default to now()
, a UUID to any of a variety of UUID-generating functions, any other field to the value returned by whatever function you care to write -- the sky's the limit!
Unless you're using MySQL, in which case the only function you can reference in a DEFAULT
clause is now()
.
Type Differences
Layout's only part of the story, though. Equally important is the difference in type support. The benefit of a robust type system is in enabling database architects to represent information with the greatest accuracy possible. If a value is difficult or impossible to represent with
UUIDs
Postgres has a UUID type. MySQL does not. If you want to store a UUID in MySQL, your options are CHAR, if you want values to be as human-readable as UUIDs ever are, or BINARY, if you want it to be faster but more difficult to work with manually. Postgres also generates more types of UUIDs.
Booleans
Boolean seems like a pretty basic type to have! However, MySQL's boolean is actualy an alias for TINYINT(1). This is why query results show 0 or 1 instead of true
or false
. It's also why you can set the value of an ostensibly boolean field to 2. Try it!
Postgres: has proper booleans.
Varlena and Lengths
MySQL isn't alone in aliasing standard types in strange ways, however. CHAR, VARCHAR, and TEXT types in Postgres are all aliased representations of the same structure -- the only distinction is that length constraints will be enforced if specified. The documentation notes that this is actually slower, and recommends that unbounded text simply be defined as the TEXT type instead of given an arbitrary maximum length.
What's happening here is that Postgres uses a data structure called a varlena, or VAriable LENgth Array, to store the information. A varlena's first four bytes store the length of the value, making it easy for the database to pick the whole thing out of storage. TEXT is only one of the types that uses this structure, but it's easily the most commonly encountered.
If a varlena is longer than would fit inline, the database uses a system called TOAST ("The Oversized Attribute Storage Technique") to offload it to extended storage transparently. Queries with predicates involving a TOASTable field might not be all that performant with large tables unless designed and indexed carefully, but when the database is returning records it's easy enough to follow the TOAST pointer that the overhead is barely noticeable for most cases.
The upshot of all this, as far as most people are concerned, is this: with Postgres, you only have to worry about establishing a length constraint on fields that have a reason for a length constraint. If there's no clear requirement to limit how much information can go into a field, you don't have to pick an arbitrary number and try to match it up with your page size.
Arrays
Non-scalar values in records! Madness! Dogs and cats living together! Anyone who's worked with JSON, XML, YAML, or even HTML understands that information isn't always flat. Relational architectures have traditionally mandated breaking out any vectors, let alone even more complex values, into new tables. Sometimes that's useful, but often enough it adds complexity to no real purpose. Inlining arrays makes many tasks -- such as tagging records -- much easier.
Postgres has arrays, as does Oracle; MySQL and SQL Server don't.
Customizing Types
If the CHECK
clause to ensure that values consist of five digits, optionally followed by a dash and four more digits.
If you're using Postgres, that is. Oracle and SQL Server both offer some custom type functionality, but MySQL has nothing. You can't even use table-level CHECK
constraints because the engine simply ignores them.
Enums
Enumerations don't get enough love. If I had a dollar for every INT -- or worse, VARCHAR -- field I've seen representing one of a fixed set of potential values, I probably still couldn't retire but I could at least have a pretty nice evening out. There are drawbacks to using enums, to be sure: adding new values requires DDL, and you can't remove values at all. But appropriate use cases for them are still reasonably common.
MySQL and Postgres both offer enums. The critical distinction is that Postgres' enums are proper reusable types. MySQL's enums are more like the otherwise-ignored CHECK
constraints and specify a valid value list for a single column in a single table. Possible improvement on allowing a boolean column to contain -100?
Querying Data
So that's data modeling covered. There's an entire other half to go: actually working with the information being stored. SQL itself is divided in two parts, the "data definition language" which defines the structure of a database and the "data manipulation language". This latter comprises the SELECT
, INSERT
, and other statements most people think of when they hear the name "SQL". And just as with modeling, there are substantial differences between Postgres and MySQL in querying.
RETURNING
Autogenerating primary keys takes a huge headache out of storing data. But there's one catch: when you insert a new record into a table, you don't know what its primary key value got set to. Most relational databases will tell you what the last autogenerated key was if you call a special function; some, like SQL Server, even let you filter down to the single table you're interested in.
Postgres goes above and beyond with the RETURNING
clause. Any write statement -- INSERT
, UPDATE
, DELETE
-- can end with a RETURNING [column-list]
, which acts as a SELECT
on the affected records. RETURNING *
gives you the entire recordset from whatever you just did, or you can restrict what you're interested in to certain columns.
That means you can do this:
INSERT INTO foos (name)
VALUES ('alpha'), ('beta')
RETURNING *;
id │ name
────┼───────
1 │ alpha
2 │ beta
(2 rows)
With MySQL, you're stuck with calling LAST_INSERT_ID()
after you add a new record. If you added multiple, LAST_INSERT_ID
only gives you the earliest new id, leaving you to work out the rest yourself. And of course, this is only good for integer primary keys.
MySQL also has no counterpart to this functionality for UPDATE
s and DELETE
s. Competitor MariaDB supports RETURNING
on DELETE
, but not on any other kind of statement.
Common Table Expressions
Common Table Expressions or CTEs allow complex queries to be broken up and assembled from self-contained parts. You might write this:
WITH page_visits AS (
SELECT p.id, p.site_id, p.title, COUNT(*) AS visits
FROM pages AS p
JOIN page_visitors AS v ON v.page_id = p.id
GROUP BY p.id, p.site_id, p.title
), max_visits AS (
SELECT DISTINCT ON (site_id)
site_id, title, visits
FROM page_visits
ORDER BY site_id, visits DESC
)
SELECT s.id, s.name,
max_visits.title AS most_popular_page,
SUM(page_visits.visits) AS total_visits
FROM sites AS s
JOIN page_visits ON page_visits.site_id = s.id
JOIN max_visits ON max_visits.site_id = s.id
GROUP BY s.id, s.name, max_visits.title
ORDER BY total_visits DESC;
In the first query, we aggregate visit counts; in the second, we use DISTINCT ON
on the results of the first to filter out all but the most popular pages; finally, we join both of our intermediary results to provide the output we're looking for. CTEs are a really readable way to factor query logic out, and they let you do some things in one statement that you can't otherwise.
MySQL does have CTEs! However: thanks to the RETURNING
clause, Postgres can write records in a CTE and operate on the results. This is huge for application logic. This next query writes a record in a CTE, then adds a corresponding entry to a junction table -- all in the same transaction.
WITH wine AS (
INSERT INTO wines (name, year)
VALUES ('Herrenreben', 2015)
RETURNING id
), reviewer AS (
SELECT id
FROM reviewers
WHERE name = 'Wine Enthusiast'
)
INSERT INTO wine_ratings (wine_id, reviewer_id, score)
SELECT wine.id, reviewer.id, 92
FROM wine
JOIN reviewer ON TRUE;
Casting
Sometimes a query needs to treat a value as if it has a different type, whether to store it or to operate on it somehow. Postgres even lets you define additional conversions between types with CREATE CAST
.
MySQL supports casting values to binary, char/nchar, date/datetime/time, decimal, JSON, and signed and unsigned integers. Absent from this list: tinyints, which, since booleans are actually tinyints, means you're stuck with conditionals when you need to coerce a value to true or false for storage in a "boolean" column.
Lateral Joins
A lateral join is fundamentally similar to a correlated subquery, in that it executes for each row of the current result set. However, a correlated subquery is limited to returning a single value for a SELECT
list or WHERE
clause; subqueries in the FROM
clause run in isolation. A lateral join can refer back to information in the rest of the result set:
CREATE TABLE docs (id serial, body jsonb);
INSERT INTO docs (body) VALUES ('{"a": "one", "b": "two"}'), ('{"c": "three"}');
SELECT docs.id, keys.*
FROM docs
JOIN LATERAL jsonb_each(docs.body) AS keys ON TRUE;
id │ key │ value
────┼─────┼─────────
1 │ a │ "one"
1 │ b │ "two"
2 │ c │ "three"
(3 rows)
It can also invoke table functions like unnest
which return multiple rows and columns:
CREATE TABLE multiple_arrays(arr1 int[], arr2 int[]);
INSERT INTO multiple_arrays (arr1, arr2)
VALUES
('{1,2,3}', '{4,5}'),
('{6,7}', '{8,9,10}');
SELECT raw.*
FROM multiple_arrays
JOIN LATERAL unnest(arr1, arr2) AS raw ON TRUE;
unnest │ unnest
────────┼────────
1 │ 4
2 │ 5
3 │ (null)
6 │ 8
7 │ 9
(null) │ 10
(6 rows)
Oracle and SQL Server offer similar functionality with the LATERAL
keyword in the former, and CROSS APPLY
/OUTER APPLY
. MySQL does not.
Variadic Function Arguments
Functions! Procedures, if you believe in making that distinction! They're great! You can declare variadic arguments -- "varargs" or "rest parameters" in other languages -- to pull an arbitrary number of arguments into a single collection named for the final argument.
In Postgres.
Predicate Operations
A handful of useful operations which allow more expressive WHERE
clauses with Postgres:
IS DISTINCT FROM
and its counterpartIS NOT DISTINCT FROM
offer a null-sensitive equality test. Null isn't ordinarily comparable since it represents the absence of a value, so the predicateWHERE field <> 1
will not return records wherefield
is null.WHERE field IS DISTINCT FROM 1
returns all records wherefield
is other-than-1, including where it's null.ILIKE
is a case-insensitiveLIKE
operation. MySQL does have the capability for case-insensitive pattern matching, but it depends on your collation and can't be toggled on a per-query basis (the default collation is case-insensitive, to be completely fair).~
,~*
,!~
, and!~*
form a set of POSIX regular expression tests: match, case-insensitive match, no match, and no case-insensitive match respectively. MySQL does haveREGEXP
andNOT REGEXP
; however, Postgres' implementation has lookahead and lookbehind.
General Database Work
That's it for the architecture and query language feature gaps I discovered. I did run into a couple other things that bear mentioning, however:
Dependencies
MySQL doesn't care about dependencies among database objects. You can tell it to drop a table a view or proc depends on and it will go right ahead and drop it. You'll have no idea something's gone wrong until the next time you try to invoke the view or proc. Postgres saves you from yourself, unless you're really sure and drop your dependents too with CASCADE
.
Triggers and Table Writes
Just the mention of triggers is probably putting some people off their lunch. They're not that bad, honest (well, they can be, but it's not like it's their fault). Anyway, point is: sometimes you want to write a trigger that modifies other rows in the table it's being activated from.
Well, you can't in MySQL.
The End?
This may have exhausted me, but I'm pretty sure it's still not an exhaustive list of the feature gaps between Postgres and MySQL. I did cop to my preference up front, but having spent six weeks putting the effort into converting the comparison is pretty damning. I think there could still be reasons to pick MySQL -- but I'm not sure they could be technical.