This page is part of my digital garden.

This page might be unfinished or have typos. These pages are meant as part of a public living notebook to be edited over time. For more, visit the page explaining the concept of a digital garden.

Databases

Short version: Files are bad1. Use SQLite. If you can’t just use SQLite then use PostgreSQL. If PostgreSQL isn’t enough try reducing load with caching via Redis.

PostgreSQL Link to heading

SQLite Link to heading

SQLite is an open source project known for its reliability. It’s written in C, but with an extremely thorough test suite covering all kinds of conditions. Instead of being its own server it runs as part of the application so there is no network or socket communication needed.

In many tests only SQLite gets writing to disk correctly and is often recommended as a better way to write files in general even if you don’t need the full SQL features2.

It’s best to always run SQLite in WAL mode as that’s the safest version and also allows for better multi-threading/process support.

Interesting note: because SQLite is a library inside the application some query patterns that would be horrible for MySQL/PostgreSQL are great for SQLite. Namely making many small queries is faster than trying to do everything within a single query as you might with Postgres.

PlanetScale Link to heading

Nice serverless database based off Vitess with added DX features around it including advanced migrations built-in to the database.

The free plan is generous.

ElasticSearch Link to heading

ElasticSearch is a powerful tool for storing documents which can be indexed and then searched any number of ways.

It’s used for Log analysis although it wasn’t originally intended for that.

Generally you want to keep your real data in something like PostgreSQL and then write something to sync changes to ElasticSearch as well as provide a way to recreate the “index” in a zero down time way.

Elastic search has “indexes” which are akin to what you’d call a database in PostgreSQL. You can create aliases for indexes so “my-index” actually points to “my-index-2021-11-27-0” which lets you create a new index “my-index-2021-11-28-0” without affecting the first index. Once it’s created and replicated you can point the original alias to the new index and delete the old index.

Elastic search clustering is very nice because by default a cluster will distribute the data in Shards. You can configure how many shards are made per index as well as how many times the shards should be replicated. The ES instances can determine a master node via election which coordinates changes in the cluster. However, a cool thing, you can talk to any of the instances to do your query and all the relevant nodes with shards will receive a request to do part of the work and return it. This means ElasticSearch does not need any kind of load balancing in front of it because the load balancing takes place inside the cluster itself.


  1. File Consistency - Dan Luu, Files are fraught with peril - Dan Luu, Talk at DeconstructConf 2019 ↩︎

  2. Most databases got it wrong. Even PostgreSQL got it wrong and it revealed to the wider community that sync() in Linux when it fails that calling sync() again does not retry. Instead it clears the error flag and drops the write. SQLite got this correct in WAL mode which is not the default but is the recommended mode. ↩︎

Last updated on