Databases
Short version: Files are bad1. Use SQLite. If you can’t just use SQLite then use PostgreSQL. If PostgreSQL isn’t enough try reducing load with caching via Redis.
PostgreSQL Link to heading
- PostgreSQL had issues saving files in certain cases because of a misunderstanding of how
fsync
works- Also the recommendation to fix this is to switch to
iouring
which is taking a while and could have other unknown implications.
- Also the recommendation to fix this is to switch to
- Migrating Heroku Postgres to Aurora [2020]
- fly.io has support for creating PG clusters
- fly.io has made this documentation more clear. I still think it’s a bit risky to do, but it’s best to try it and see if it works for you. I feel like running PG inside fly.io will still have way better latency than talking to RDS since it’s not like you’re inside an AWS AZ as you’d be inside an EC2 instance.
- post explaining how to run postgres on a per project basis
- SUPER interesting
- postgres data in the project in
postgres/
folder + the unix socket is in the project root. - simple shell script to just run postgres in current dir
- using
direnv
andnix
to make it possible to do this automatic whencd
ing into the project dir.
SQLite Link to heading
SQLite is an open source project known for its reliability. It’s written in C, but with an extremely thorough test suite covering all kinds of conditions. Instead of being its own server it runs as part of the application so there is no network or socket communication needed.
In many tests only SQLite gets writing to disk correctly and is often recommended as a better way to write files in general even if you don’t need the full SQL features2.
It’s best to always run SQLite in WAL mode as that’s the safest version and also allows for better multi-threading/process support.
Interesting note: because SQLite is a library inside the application some query patterns that would be horrible for MySQL/PostgreSQL are great for SQLite. Namely making many small queries is faster than trying to do everything within a single query as you might with Postgres.
- http://sqlite.org
- LiteStream is a version of SQLite with features for replicating changes to things like S3 to handle failure of a server with the only copy of the data (or live migrating etc)
- One process notes by David Crawshaw
PlanetScale Link to heading
- website: https://planetscale.com/
Nice serverless database based off Vitess with added DX features around it including advanced migrations built-in to the database.
The free plan is generous.
ElasticSearch Link to heading
ElasticSearch is a powerful tool for storing documents which can be indexed and then searched any number of ways.
It’s used for Log analysis although it wasn’t originally intended for that.
Generally you want to keep your real data in something like PostgreSQL and then write something to sync changes to ElasticSearch as well as provide a way to recreate the “index” in a zero down time way.
Elastic search has “indexes” which are akin to what you’d call a database in PostgreSQL. You can create aliases for indexes so “my-index” actually points to “my-index-2021-11-27-0” which lets you create a new index “my-index-2021-11-28-0” without affecting the first index. Once it’s created and replicated you can point the original alias to the new index and delete the old index.
Elastic search clustering is very nice because by default a cluster will distribute the data in Shards. You can configure how many shards are made per index as well as how many times the shards should be replicated. The ES instances can determine a master node via election which coordinates changes in the cluster. However, a cool thing, you can talk to any of the instances to do your query and all the relevant nodes with shards will receive a request to do part of the work and return it. This means ElasticSearch does not need any kind of load balancing in front of it because the load balancing takes place inside the cluster itself.
File Consistency - Dan Luu, Files are fraught with peril - Dan Luu, Talk at DeconstructConf 2019 ↩︎
Most databases got it wrong. Even PostgreSQL got it wrong and it revealed to the wider community that
sync()
in Linux when it fails that callingsync()
again does not retry. Instead it clears the error flag and drops the write. SQLite got this correct in WAL mode which is not the default but is the recommended mode. ↩︎