datasette
This repository contains everything you need to deploy SQLite databases to dokku via https://datasette.io/.
- All files matching `dbs/*.db` are exposed in the interface. If all you want is to make a new database accessible through the web, simply add it to that directory.
- You can edit `metadata.yml` to add descriptions, link to sources and add predefined queries (see the sketch after this list).
- `requirements.txt` contains additional packages such as plugins. Add them manually and without a version string, because Python package management is messy.
- The `CHECKS` file contains a URL and a string that is expected in that URL's HTTP response. It serves as a post-deployment sanity check and you shouldn't need to change it.
- Optional: If anything should happen after deployment, edit the `bin/post_compile` script. You can use this, for example, to fetch data from other sources.
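For illustration, a minimal `metadata.yml` entry for the `earth` database could look like the following sketch; the description, source URL and canned query are hypothetical placeholders, not what this repository actually ships:

title: Datasette
databases:
  earth:
    description: Datasets about planet earth        # hypothetical
    source_url: https://ourworldindata.org/         # hypothetical
    queries:
      world_population:                             # hypothetical canned query
        sql: select year, population from population where entity = 'World'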
The databases are assumed to be immutable or read-only. This allows efficient caching: nginx is configured as a caching reverse proxy that serves responses from a static, file-system cache. Effectively, a query only has to be run the first time after a database has changed; afterwards the result is served from the cache.
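You can see the cache at work with two identical requests, assuming the app is reachable at the hypothetical host https://example.org (`/earth.json` is datasette's JSON view of the database):

time curl -s 'https://example.org/earth.json' > /dev/null   # first request runs against datasette
time curl -s 'https://example.org/earth.json' > /dev/null   # repeat should come from nginx's cache, noticeably faster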
Database Setup
This section aims to contain all information needed to convert the data from its respective source files into SQLite databases. This should make the updating process easier when the sources change.
Use snake_case for all file names
cd sources
# Rename kebab-case to snake_case, keeping each file in its directory.
for file in $(fd --type f); do
    mv "$file" "$(dirname "$file")/$(basename "$file" | sed 's/-/_/g')"
done
Earth
Cumulative CO2 emissions
Source: https://ourworldindata.org/grapher/cumulative-co-emissions
csvs-to-sqlite \
--shape 'Entity:entity(text),Code:code(text),Year:year(integer),Cumulative CO2 emissions:cumulative_co2_emissions(real)' \
--index entity \
--index code \
--index year \
--replace-tables \
sources/cumulative_co2_emissions.csv dbs/earth.db
Loaded 1 dataframes
Created dbs/earth.db from 1 CSV file
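To sanity-check the import, query the new table directly; `cumulative_co2_emissions` is the table name csvs-to-sqlite derives from the CSV file name by default:

sqlite3 dbs/earth.db 'select count(*) from cumulative_co2_emissions;'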
Population, Historic and Projected
The population column in the source CSV has an unwieldy header, so rename it to `Population` first; the `--shape` mapping below then matches:
sed -i 's/"Population by country and region, historic and projections (Gapminder, HYDE & UN)"/Population/' sources/population.csv
csvs-to-sqlite \
--shape 'Entity:entity(text),Code:code(text),Year:year,Population:population(integer)' \
--index entity \
--index code \
--index year \
--replace-tables \
sources/population.csv dbs/earth.db
Loaded 1 dataframes
Added 1 CSV file to dbs/earth.db
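Since both tables now live in `dbs/earth.db`, they can be joined. A hypothetical example, again relying on the filename-derived table names, computes cumulative emissions per capita:

sqlite3 dbs/earth.db "
  select c.entity, c.year,
         c.cumulative_co2_emissions / p.population as co2_per_capita
  from cumulative_co2_emissions c
  join population p on p.entity = c.entity and p.year = c.year
  where c.entity = 'World'
  limit 5;"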
Pitfalls
My queries are slow, what do I do?
Make sure SQLite uses the correct indices. You can debug this by writing
EXPLAIN QUERY PLAN SELECT …
and then continuing your `SELECT` query like you normally would.
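For example, to check a hypothetical query against the `population` table:

sqlite3 dbs/earth.db "explain query plan
  select year, population from population where entity = 'World';"

A plan containing `SEARCH ... USING INDEX` means an index is used; `SCAN` means SQLite falls back to a full table scan, which is usually what makes a query slow.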