Stack

The infrastructure layer. Where things run, how they deploy, where secrets live.

For the data model that runs on this stack, see Data model. For the architecture this stack implements, see Overview.


At a glance

| Layer | Technology | Notes |
|---|---|---|
| VM | Azure Linux (Ubuntu 24.04) | esg-screening-01 in RG rg-esg-screening |
| Runtime | Node.js + PM2 | Single VM, single process |
| Database | SQLite at data/esg.db | better-sqlite3 driver |
| Scheduler | node-cron in-process | Sunday 02:00 UTC weekly |
| External APIs | googleapis | Was for the Sheets writer; likely to be deprecated per ADR-0002 |
| Tunnel | Cloudflare Tunnel | Connector token in 1Password |
| Auth | Cloudflare Access | OTP, 5-address allow list |
| DNS | Cloudflare | esg-screen.org zone in account mcmillangrubb |
| Repo | GitHub | McMillanGrubb/esg-screening |
| CI/CD | (none yet) | Manual git pull + pm2 restart on the VM |

VM

Hostname: esg-screening-01

OS: Ubuntu 24.04

Login: azureuser (default Azure VM convention)

Worktree: /home/azureuser/esg-screening (the repo working directory)

Access paths:

  • Cloudflare tunnel SSH (preferred) — ssh esg-screening-01 via the SSH config that routes through the tunnel
  • Public port 22 on the Azure NSG — still open as of last check, but redundant since the tunnel works. Worth closing.

Daily operations happen via CC-on-VM (see Session protocol).


Node + PM2

The Node application runs under PM2 for process management and auto-restart.

| Process name | What it is | Status |
|---|---|---|
| esg-web | Express server for the read interface (per ADR-0002) | Pending, not yet running |
| esg-scheduler | node-cron scheduler for weekly scrape + score | Enabled when the ENABLE_SCHEDULER=1 env var is set |
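
A minimal sketch of how the ENABLE_SCHEDULER gate and the weekly Sunday 02:00 UTC slot could fit together, assuming node-cron's schedule(expression, task, options) call. The names WEEKLY_CRON, schedulerEnabled, startScheduler and runWeeklyJob are hypothetical and not taken from the repo. The cron module is passed in as an argument so the gate can be exercised without the dependency installed.

```javascript
// Sunday 02:00, written as: minute hour day-of-month month day-of-week.
const WEEKLY_CRON = '0 2 * * 0';

// The scheduler is opt-in: only the exact value "1" enables it.
function schedulerEnabled(env = process.env) {
  return env.ENABLE_SCHEDULER === '1';
}

// `cron` is the node-cron module; `runWeeklyJob` is a hypothetical function
// that runs the scrape and then the score.
function startScheduler(cron, runWeeklyJob, env = process.env) {
  if (!schedulerEnabled(env)) return null;
  // Pin the timezone so 02:00 means UTC regardless of the VM's local time.
  return cron.schedule(WEEKLY_CRON, runWeeklyJob, { timezone: 'UTC' });
}
```

At process start this would be called as startScheduler(require('node-cron'), runWeeklyJob). It returns null when the env var is unset, so the same entry point can run with or without the scheduler.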

Manual triggers during development:

npm run scrape         # Trigger a scrape run on demand
npm run score [run_id] # Re-score against a specific or latest run

PM2 commands:

pm2 list
pm2 logs esg-web
pm2 restart esg-web
pm2 show esg-scheduler

SQLite

The DB file lives at data/esg.db in the repo worktree. It is NOT checked into Git — .gitignore excludes it. Schema is rebuilt from src/db/migrations/*.sql on a fresh checkout.

Driver: better-sqlite3 (synchronous, in-process). Singleton client at src/db/client.js.

Backups: not automated. The DB is small enough that a manual dump on demand (sqlite3 data/esg.db .dump > backup.sql, run from the worktree) is fine for now. Worth automating once the DB grows. Always take one before a destructive migration.

Migrations: applied via the migration runner. The runner wraps each file in its own transaction, so do not include BEGIN TRANSACTION / COMMIT inside migration files. SQLite does not support nested transactions, and a BEGIN issued inside an open transaction fails with an error.
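
To illustrate the one-transaction-per-file rule, here is a sketch of what such a runner might look like. It is not the repo's actual runner. It assumes a better-sqlite3 Database handle (its exec, prepare and transaction methods), and the function names and the schema_migrations column names (name, applied_at) are hypothetical. The BEGIN/COMMIT check is a line-based heuristic only.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Return the .sql files not yet recorded as applied, in filename order.
function pendingMigrations(allFiles, appliedNames) {
  const applied = new Set(appliedNames);
  return allFiles.filter((f) => f.endsWith('.sql') && !applied.has(f)).sort();
}

// Heuristic: flag a standalone "BEGIN;", "BEGIN TRANSACTION;" or "COMMIT;".
// A trigger body ("BEGIN ... END;") is not flagged because no ";" follows BEGIN.
function hasTransactionControl(sql) {
  return /^\s*(BEGIN(\s+TRANSACTION)?|COMMIT)\s*;/im.test(sql);
}

// `db` is assumed to be a better-sqlite3 Database instance.
function applyMigrations(db, dir) {
  db.exec(`CREATE TABLE IF NOT EXISTS schema_migrations (
    name TEXT PRIMARY KEY,
    applied_at TEXT NOT NULL
  )`);
  const applied = db.prepare('SELECT name FROM schema_migrations').all().map((r) => r.name);
  const record = db.prepare(
    "INSERT INTO schema_migrations (name, applied_at) VALUES (?, datetime('now'))"
  );
  for (const file of pendingMigrations(fs.readdirSync(dir), applied)) {
    const sql = fs.readFileSync(path.join(dir, file), 'utf8');
    if (hasTransactionControl(sql)) {
      throw new Error(`${file}: remove BEGIN/COMMIT, the runner adds its own transaction`);
    }
    // The file and its tracking row commit together, or neither does.
    db.transaction(() => {
      db.exec(sql);
      record.run(file);
    })();
  }
}
```

Because applied files are skipped, running the function twice is a no-op the second time. That is the sense in which a tracking-table runner is idempotent.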


Cloudflare

Account: mcmillangrubb

Zone: esg-screen.org

Workers / Pages: TBD. The ops site (the site you're reading) will deploy via Cloudflare Pages.

Cloudflare Tunnel: named esg-screening. Connector token lives in 1Password under the ESG Screening vault (it is NOT a User API Token; it's a Tunnel connector token, surfaced under Zero Trust → Networks → Tunnels). The distinction matters when you go looking for it: the wrong dashboard page means a fruitless search.

Cloudflare Access: gates esg-screen.org (the product) and ops.esg-screen.org (this site) under a single Access app. OTP flow, 5-address email allow list. Adding the ops site to the existing app rather than a new one keeps the allow list in one place.


DNS

Zone: esg-screen.org in the mcmillangrubb Cloudflare account.

Records:

| Type | Name | Target | Purpose |
|---|---|---|---|
| CNAME | esg-screen.org | Tunnel | Product UI |
| CNAME | ops.esg-screen.org | Cloudflare Pages | This ops site |

Old DNS: esg.mcmillangrubb.com — superseded by esg-screen.org. Do not use the old name.


Secrets

Where each secret lives:

| Secret | Location | Notes |
|---|---|---|
| GitHub auth (for the VM to push) | gh CLI on the VM, token in ~/.config/gh/hosts.yml | Set up via gh auth login device-code flow on 12 May |
| Cloudflare Tunnel token | 1Password ESG Screening vault | Connector token |
| GLEIF API | None (public, unauthenticated) | |
| SBTi data download | None (public Excel) | |
| UK Companies House | None for the read endpoints used | |
| Slack | Bot token in environment (managed by claude.ai integration, not on the VM) | |

Anti-patterns to avoid:

  • PAT (personal access token) at rest on the VM filesystem (e.g. /root/.esg-gh-token) — gh auth login is the preferred pattern
  • Secrets in committed files (.env is gitignored; never commit one)
  • Secrets in scraper_config_json — that column is for non-secret config (alternate names, Companies House numbers, etc.)

Repository

Org: McMillanGrubb

Main repo: esg-screening

Ops site repo: esg-screen-ops (pending; to be created when ADR-0006 lands)

Branch convention: main only. No feature branches in single-operator mode. If/when a second contributor lands, switch to a PR-based workflow.

Commit convention: Conventional Commits-ish prefixes:

  • feat(scope): ... for new functionality
  • feat(schema): ... for migrations
  • fix(scope): ... for fixes
  • docs: ... for documentation
  • chore: ... for housekeeping
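
The prefixes above are simple enough to check mechanically. The sketch below shows one way to do it. The pattern and the function name isConventionalSubject are hypothetical, and allowing the optional (scope) on every prefix is an assumption, since the list only shows it on feat and fix.

```javascript
// Prefixes from the list above, an optional lowercase "(scope)", then ": "
// followed by at least one non-space character of description.
const COMMIT_PATTERN = /^(feat|fix|docs|chore)(\([a-z0-9-]+\))?: \S/;

// Check the first line (subject) of a commit message against the convention.
function isConventionalSubject(subject) {
  return COMMIT_PATTERN.test(subject);
}
```

This could be wired into a commit-msg hook later. In single-operator mode it is mainly useful as a precise statement of what the convention accepts.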

Push permissions: managed by gh on the VM. See "Secrets" above.


Local development (on the VM)

The VM is also the dev environment. There is no separate "dev" instance of the stack. Changes are made in the worktree, tested in place against the live DB, then committed and pushed when stable.

This is fine at single-operator scale. Pattern note: when adding destructive migrations, take a DB backup first (sqlite3 data/esg.db .dump > data/backups/esg-pre-NNN.sql).
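
The pre-migration backup above could be scripted along the following lines. This is a sketch under assumptions: the sqlite3 CLI is on the PATH, NNN is the migration number zero-padded to three digits, and the function names backupPathFor and dumpDatabase are hypothetical. The dump is streamed to the file through a file descriptor rather than buffered in memory.

```javascript
const fs = require('node:fs');
const path = require('node:path');
const { spawnSync } = require('node:child_process');

// Build data/backups/esg-pre-NNN.sql for a given migration number.
function backupPathFor(migrationNumber, backupDir = path.join('data', 'backups')) {
  const nnn = String(migrationNumber).padStart(3, '0');
  return path.join(backupDir, `esg-pre-${nnn}.sql`);
}

// Run `sqlite3 <dbPath> .dump` with its stdout pointed at outPath.
function dumpDatabase(dbPath, outPath) {
  fs.mkdirSync(path.dirname(outPath), { recursive: true });
  const fd = fs.openSync(outPath, 'w');
  try {
    const result = spawnSync('sqlite3', [dbPath, '.dump'], {
      stdio: ['ignore', fd, 'inherit'],
    });
    if (result.error) throw result.error; // e.g. sqlite3 not installed
    if (result.status !== 0) {
      throw new Error(`sqlite3 exited with status ${result.status}`);
    }
  } finally {
    fs.closeSync(fd);
  }
  return outPath;
}
```

From the worktree, dumpDatabase('data/esg.db', backupPathFor(7)) would write data/backups/esg-pre-007.sql before migration 7 is applied.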


Monitoring

Cron health: PM2 logs for esg-scheduler. No external monitoring yet.

Scraper health: per ADR-0003, the /methodology page in the product shows last successful run + last error per source. That page becomes the operator-visible status surface for each source.

Stack health: PM2 status, /health endpoint (TBD on the read interface).

Alerting: none. Single-operator scale, weekly cadence — Rob notices.


Deployment

Code deploys:

# On the VM
cd /home/azureuser/esg-screening
git pull
npm install   # If package.json changed
pm2 restart esg-web

Schema migrations: applied via the migration runner, idempotent (uses schema_migrations tracking table). CC-on-VM applies migrations during the cycle that introduces them.

Cloudflare changes (tunnels, Access policies, DNS): manual via the Cloudflare dashboard. Document significant changes in Slack handoffs.

Ops site (this site): deploys via GitHub Actions → Cloudflare Pages from the esg-screen-ops repo. Push to main = deploy. (See ADR-0006 for the architecture.)