docs: infrastructure and backend design #157
Draft design for spec #10 (Infrastructure) and #11 (Backend)
@@ -15,0 +164,4 @@
- **owner**: Created the org. Full access. Cannot be removed. One per org.
- **admin**: Manage users, entitlements, and all data. Multiple per org.
- **member**: Standard data access (read/write records, entities). No user or entitlement management.

Ideas:
- `sys-admin` or something similar.
- `manager`? Not sure; need to look at industry norms.

In my organization we use the `client` role, for example. It might not be standard, but it is pretty clear.

Hmm, I'm not sure that is intuitive to me as a user role.

I'll go with `manager` for now; it's a pretty common designation.

@@ -15,0 +182,4 @@
}
`method` is `"email"` or `"sms"`. If `"sms"`, `contact` is a phone number (E.164 format).

How about starting with only email but leaving the design open for SMS in the future? Email is cheaper for me at this point.
@@ -15,0 +147,4 @@
}
**Refresh token** (long-lived, 30 days):

Would it be bad to consider increasing this to 180 days? That should be long enough that a user only has to sign in once or twice a year.
If that is considered bad practice, let's consider 90 or 60 days.

Standard practice seems to be a maximum of about 90 days. To replicate the "long enough that a user only has to sign in once or twice a year" behavior, we could rotate refresh tokens: every access-token request returns a new refresh token that replaces the old one before the 90-day window closes. That gives refresh tokens a sliding window; if the user does not sign in within those 90 days, the refresh token is not renewed and they have to log in again.
That sounds great 👍👍
@@ -36,0 +45,4 @@
- **Pros**: Simple, cheap, full control, great for SQLite (local disk I/O)
- **Cons**: Manual scaling, single point of failure unless you add redundancy

#### Option B: Container Platform (Fly.io, Railway, etc.)

My current preference would be something that is easy to coordinate with Cloudflare.
Maybe this: https://developers.cloudflare.com/containers/, but it is in beta, so it might be unstable.
@@ -36,0 +54,4 @@
- **Pros**: Easier deploys, built-in health checks, can scale to multiple regions
- **Cons**: Persistent volume adds complexity with SQLite, vendor lock-in, higher cost at scale

#### Option C: Self-Hosted

I have a local datacenter I could ask for pricing. Not sure that we need that this year.
@@ -36,0 +68,4 @@
- `karriba.com`: Landing page (Cloudflare Workers, already deployed)
- `demo.karriba.com`: Web demo (Cloudflare Workers, already deployed)
- `api.karriba.com`: Backend API (Go server)
- `admin.karriba.com`: Admin portal (spec #13)

We need to add a line here for `<org-id>.karriba.com` to show that each org will get a dedicated subdomain for the web app. `demo.karriba.com` is just one example of this.

We need to look into wildcard subdomains.
https://developers.cloudflare.com/dns/manage-dns-records/reference/wildcard-dns-records#specific-dns-records-take-precedence-over-wildcard-records
@julian if we could use wildcard records like this, we might not even need a reverse proxy!
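If a single Go server ended up answering for every `<org-id>.karriba.com` host behind a wildcard record, it would need to map the `Host` header back to an org. A minimal sketch, assuming the org ID is used directly as the subdomain label and that `api` and `admin` are the only reserved names (both assumptions; `orgFromHost` is a hypothetical helper). Since specific DNS records take precedence over the wildcard, the reserved check is mostly a safety net.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// baseDomain comes from the spec; reservedLabels lists the non-org subdomains
// it names (assumption: the real list may grow, e.g. "www").
const baseDomain = "karriba.com"

var reservedLabels = map[string]bool{"api": true, "admin": true}

// orgFromHost extracts the org label from a Host header value such as
// "acme.karriba.com" or "acme.karriba.com:443".
func orgFromHost(host string) (string, bool) {
	if h, _, err := net.SplitHostPort(host); err == nil {
		host = h // drop the optional port
	}
	host = strings.ToLower(strings.TrimSuffix(host, ".")) // normalize case and trailing dot
	label, ok := strings.CutSuffix(host, "."+baseDomain)
	// Reject the apex, nested subdomains, and reserved names.
	if !ok || label == "" || strings.Contains(label, ".") || reservedLabels[label] {
		return "", false
	}
	return label, true
}

func main() {
	for _, h := range []string{"demo.karriba.com", "Acme.Karriba.com:443", "api.karriba.com", "karriba.com"} {
		org, ok := orgFromHost(h)
		fmt.Printf("%-22s -> %q %v\n", h, org, ok)
	}
}
```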
@@ -44,0 +130,4 @@
- [ ] Choose hosting option (A, B, or C)
- [ ] Provision server / container / hardware
- [ ] Set up reverse proxy with TLS (Caddy or platform-managed)

Definitely prefer `caddy` if we decide to run on a VM directly.

Changed title from `WIP: chore(design): infrastructure and backend design` to `chore(design): infrastructure and backend design`

@@ -15,0 +267,4 @@
#### `POST /v1/orgs`
Create a new organization. The authenticated user becomes the sys-admin. Provisions a new tenant DB.

Hmmmm, this is not quite right. The system admins should not be assigned to any organization. They are responsible for overall system administration.
Maybe we should rethink our organization-to-user association system?
I did push up one commit with a few revisions. See `d73eb30`.

@@ -15,0 +384,4 @@
#### `PATCH /v1/orgs/:orgId/users/:userId`
Update a user's role. Sys-admin cannot be demoted. Only `sys-admin` can promote to `manager`.

This isn't quite right. A manager should be allowed to promote any user within their organization.
The sys-admin should have no direct access to organization users. They are only responsible for creating the organization and adding the initial manager user account.
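A minimal sketch of the permission check as described in this comment, assuming the auth layer hands the handler the caller's org and role. `Actor`, `User`, and `canPromoteToManager` are hypothetical names; demotion rules are left out because the comment only covers promotion.

```go
package main

import "fmt"

type Role string

const (
	RoleSysAdmin Role = "sys-admin" // platform-level, outside any org
	RoleManager  Role = "manager"
	RoleMember   Role = "member"
)

// Actor is the authenticated caller. OrgID is empty for sys-admins.
type Actor struct {
	UserID string
	OrgID  string
	Role   Role
}

// User is the target of the role change.
type User struct {
	ID    string
	OrgID string
	Role  Role
}

// canPromoteToManager: a manager may promote any user in their own org;
// a sys-admin has no direct access to org users, so it is always refused.
func canPromoteToManager(actor Actor, target User) bool {
	if actor.Role != RoleManager || actor.OrgID == "" {
		return false
	}
	return target.OrgID == actor.OrgID
}

func main() {
	mgr := Actor{UserID: "u1", OrgID: "org-a", Role: RoleManager}
	sys := Actor{UserID: "s1", Role: RoleSysAdmin}
	member := User{ID: "u2", OrgID: "org-a", Role: RoleMember}
	outsider := User{ID: "u3", OrgID: "org-b", Role: RoleMember}

	fmt.Println(canPromoteToManager(mgr, member))   // true
	fmt.Println(canPromoteToManager(mgr, outsider)) // false
	fmt.Println(canPromoteToManager(sys, member))   // false
}
```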
@@ -15,0 +449,4 @@
**Side effect:** Updated entitlements are synced to the tenant DB on next sync cycle (or immediately if the device is online).
**Error codes:** `INVALID_MODULE`, `ENHANCEMENT_REQUIRES_STANDALONE`

We should make it clearer here which situations would trigger `ENHANCEMENT_REQUIRES_STANDALONE`.

@@ -15,0 +484,4 @@
- `id` TEXT PRIMARY KEY (UUIDv7)
- `email` TEXT (nullable, unique)
- `phone` TEXT (nullable, unique)
- `org_id` TEXT REFERENCES organizations(id)

Should we have `org_id` here, since the sys-admin will have no organization? Not sure yet 🤔

@@ -15,0 +531,4 @@
### Sys-Admin Role
Separate from manager to prevent lockout. Every org has exactly one sys-admin (the creator). Sys-admin transfer can be added in the future, but is out of scope for now.

This needs to be revised. System admins are outside the organization system.
@@ -15,0 +472,4 @@
### Tables
**organizations**

I would prefer singular names for DB tables. That would also be consistent with the tables in our app.
@@ -15,0 +129,4 @@
### OTP Flow
1. Client sends email address
2. Server generates a 6-digit OTP, stores it with expiry (5 minutes), sends via email

5 minutes seems very short. What do you think of 15?
I have a few concerns, especially about the sys-admin system. Feel free to leave replies, or let me know if you want to schedule a quick meeting to discuss.
I think I see what you mean about the sys-admin system; I've made changes based on what you said. The revised spec should now decouple that role from the organization level and make it a platform-level administrator without access to organization-specific information.
I've also documented some security decisions explicitly, to make them clearer and to guard against future security concerns.
Changed title from `chore(design): infrastructure and backend design` to `docs: infrastructure and backend design`

Great! I will review more thoroughly as soon as I can! It might not be until next week 😢
@addison wrote in #157 (comment):
Don't worry about it, I have a pretty busy week ahead so it's okay ☺️
@@ -15,0 +20,4 @@
- **Language**: Go
- **Router**: `net/http` with `chi` (lightweight, stdlib-compatible)
- **Database driver**: `github.com/mattn/go-sqlite3` (CGo) or `modernc.org/sqlite` (pure Go)

Let's firmly pick one.
It seems that `go-sqlite3` is more popular; we just have to have a C compiler available for builds.

@@ -15,0 +163,4 @@
The `jti` claim is the primary key of the corresponding `refresh_token` row. It guarantees uniqueness across tokens issued in the same second and gives the server an O(1) lookup path during rotation and reuse detection.

Refresh tokens are stored in the Admin DB and can be revoked. Each time a refresh token is used to obtain a new access token, a **new refresh token** is also issued and the old one is revoked. This creates a sliding window: as long as the user is active within 90 days, their session continues indefinitely. If inactive for 90 days, the token expires and they must re-authenticate via OTP.

Is it typical to have this sort of sliding window with refresh tokens?
@@ -15,0 +748,4 @@
### Threat: JWT signing-key compromise
**Mitigation**: the access-token signing secret is loaded from environment configuration and rotated by deploying a new key. To support rotation without invalidating active sessions, the JWT carries a `kid` header and the server holds a small key set (current + previous). On rotation, the previous key remains valid for one access-token TTL (15 min) before being removed. Refresh tokens carry their `kid` too. This is a Phase 3 hardening task; the initial implementation can ship with a single static key and add rotation later.

Is this accurate? I didn't see `kid` referenced anywhere else.

@@ -15,0 +678,4 @@
Orgs are archived (soft-deleted via `archived_at`) rather than permanently deleted. Archiving disables access but preserves all data and the tenant DB. This avoids destructive operations and allows data recovery. Hard delete may be added later with a grace period and explicit confirmation flow.

**Enforcement**: the auth middleware treats members of an archived org as if they had no membership. Their access token still authenticates them, but `org_id` and `role` are not populated, so every org-scoped endpoint returns `403 FORBIDDEN`. Sys-admins continue to have full read access to archived orgs (e.g., to inspect or unarchive them). The `GET /v1/orgs/:orgId` response includes `archived_at` so clients can surface the disabled state to users.

If a member of an archived org is treated as having no membership, they will get 403 from `GET /v1/orgs/:orgId`, correct? That seems contradictory: how will the client be able to determine that the org is archived and display an error message to the user?

@@ -15,0 +175,4 @@
- Creates organizations and assigns the initial manager
- Manages organization entitlement assignments
- Manages organization archival
- Has **no access** to organization data or users

This should probably say something like "internal organization data". We need to make it clear that sys-admins do have access to organization metadata. Maybe a separate bullet point for that would be good.
@@ -44,0 +143,4 @@
### Phase 2: Database Setup
- [ ] Choose multi-tenant isolation model (Option 1 or 2)

We definitely want option 1 for tenant isolation.
I think you can update this file to prefer the multi-tenant DB system. Otherwise this file looks good. I'm not sure about the hosting option yet, but it's good to have some options to think about 👍
I might hold off on the final hosting decision until I have real customers lined up to actually use the servers. No need to pay for servers we aren't using.
@julian This is looking really good! In the future, you don't have to put quite so much detail in the specs 😆
I left a few comments/questions, then I think we can approve the backend spec and merge this one.