docs: infrastructure and backend design #157

Open
julian wants to merge 7 commits from infrastructure-backend-design into main
Collaborator

Draft design for spec #10 (Infrastructure) and #11 (Backend)

specs/backend.md Outdated
@ -15,0 +164,4 @@
- **owner** — Created the org. Full access. Cannot be removed. One per org.
- **admin** — Manage users, entitlements, and all data. Multiple per org.
- **member** — Standard data access (read/write records, entities). No user or entitlement management.
Owner

Ideas:

  • rename owner to `sys-admin` or something similar.
    • keep access focused on managing orgs and entitlements
  • rename admin to `manager`? not sure. need to look at industry norms
    • can view entitlements but not edit them
  • consider adding a basic MD table to describe write/read/none for each of the roles and the main features
    • main features:
      • org management
        • add
        • update
        • archive (NO HARD DELETE)
      • org entitlements
      • org data
Author
Collaborator

> rename admin to `manager`? not sure. need to look at industry norms

In my organization we use the `client` role, for example. It might not be standard, but it is pretty clear.
Owner

Hmm, I'm not sure that is intuitive to me as a user role.
Author
Collaborator

I'll go with `manager` for now; it's a pretty common designation.
addison marked this conversation as resolved
specs/backend.md Outdated
@ -15,0 +182,4 @@
}
```
`method` is `"email"` or `"sms"`. If `"sms"`, `contact` is a phone number (E.164 format).
Owner

How about starting with only email but leaving the design open for SMS in the future? Email is cheaper for me at this point.
addison marked this conversation as resolved
specs/backend.md Outdated
@ -15,0 +147,4 @@
}
```
**Refresh token** (long-lived, 30 days):
Owner

Would it be bad to consider increasing this to 180 days? That should be long enough that a user only has to sign in once or twice a year.

If that is considered bad practice, let's consider 90 or 60 days.

Author
Collaborator

The standard seems to be a maximum of about 90 days. To replicate the "long enough that a user only has to sign in once or twice a year" behavior, we could rotate refresh tokens: every access token request also returns a new refresh token, which replaces the old one before the 90-day window closes. That gives refresh tokens a sliding window. If the user does not use the app within those 90 days, the refresh token is not renewed and they have to log in again.

Owner

That sounds great 👍👍

addison marked this conversation as resolved
@ -36,0 +45,4 @@
- **Pros**: Simple, cheap, full control, great for SQLite (local disk I/O)
- **Cons**: Manual scaling, single point of failure unless you add redundancy
#### Option B: Container Platform (Fly.io, Railway, etc.)
Owner

My current preference would be something that is easy to coordinate with Cloudflare.

Maybe this: https://developers.cloudflare.com/containers/, but it is in beta so might be unstable.

addison marked this conversation as resolved
@ -36,0 +54,4 @@
- **Pros**: Easier deploys, built-in health checks, can scale to multiple regions
- **Cons**: Persistent volume adds complexity with SQLite, vendor lock-in, higher cost at scale
#### Option C: Self-Hosted
Owner

I have a local datacenter I could ask for pricing. Not sure that we need that for this year.

addison marked this conversation as resolved
@ -36,0 +68,4 @@
- `karriba.com` — Landing page (Cloudflare Workers, already deployed)
- `demo.karriba.com` — Web demo (Cloudflare Workers, already deployed)
- `api.karriba.com` — Backend API (Go server)
- `admin.karriba.com` — Admin portal (spec #13)
Owner

We need to add a line here for `<org-id>.karriba.com` to show that each org will get a dedicated subdomain for the web app. `demo.karriba.com` is just one example of this.

We need to look into wildcard subdomains.

https://developers.cloudflare.com/dns/manage-dns-records/reference/wildcard-dns-records#specific-dns-records-take-precedence-over-wildcard-records
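
If a wildcard record does route every `<org-id>.karriba.com` host to the same server, the server would have to work out the org from the `Host` header. A minimal Go sketch of that lookup follows. The `orgFromHost` name and the `reserved` set are assumptions for illustration (`api` and `admin` come from the list above, `www` is a guess), not something the spec defines.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

const baseDomain = "karriba.com"

// reserved lists subdomains that are expected to have their own explicit
// DNS records and therefore must never be treated as an org ID (assumption).
var reserved = map[string]bool{"api": true, "admin": true, "www": true}

// orgFromHost returns the org identifier encoded in a Host header value.
// It reports false for the bare domain, reserved names, nested subdomains,
// and hosts outside baseDomain.
func orgFromHost(host string) (string, bool) {
	// Drop an optional ":port" suffix; hosts without a port are left as-is.
	if h, _, err := net.SplitHostPort(host); err == nil {
		host = h
	}
	host = strings.ToLower(strings.TrimSuffix(host, "."))

	sub, ok := strings.CutSuffix(host, "."+baseDomain)
	if !ok || sub == "" || strings.Contains(sub, ".") || reserved[sub] {
		return "", false
	}
	return sub, true
}

func main() {
	for _, h := range []string{"demo.karriba.com", "api.karriba.com", "karriba.com"} {
		org, ok := orgFromHost(h)
		fmt.Printf("%s -> %q %v\n", h, org, ok)
	}
}
```

The reserved check mirrors the DNS behavior described in the linked page, where specific records take precedence over the wildcard, so `api` and `admin` never fall through to the per-org web app.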

Owner

@julian if we could use wildcard records like this, we might not even need a reverse proxy!

addison marked this conversation as resolved
@ -44,0 +130,4 @@
- [ ] Choose hosting option (A, B, or C)
- [ ] Provision server / container / hardware
- [ ] Set up reverse proxy with TLS (Caddy or platform-managed)
Owner

Definitely prefer `caddy` if we decide to run on a VM directly.
addison marked this conversation as resolved
julian changed title from WIP: chore(design): infrastructure and backend design to chore(design): infrastructure and backend design 2026-04-05 10:14:45 -04:00
julian self-assigned this 2026-04-05 10:16:45 -04:00
julian requested review from addison 2026-04-05 10:16:54 -04:00
specs/backend.md Outdated
@ -15,0 +267,4 @@
#### `POST /v1/orgs`
Create a new organization. The authenticated user becomes the sys-admin. Provisions a new tenant DB.
Owner

Hmmmm, this is not quite right. The system admins should not be assigned to any organization. They are responsible for overall system administration.

Maybe we should rethink our organization-to-user association system?

I did push up one commit with a few revisions. See d73eb30
addison marked this conversation as resolved
specs/backend.md Outdated
@ -15,0 +384,4 @@
#### `PATCH /v1/orgs/:orgId/users/:userId`
Update a user's role. Sys-admin cannot be demoted. Only `sys-admin` can promote to `manager`.
Owner

This isn't quite right. **manager** should be allowed to promote any user within their organization.

**sys-admin** should have no direct access to organization users. They are just responsible for creating the organization and adding the initial **manager** user account.
addison marked this conversation as resolved
@ -15,0 +449,4 @@
**Side effect:** Updated entitlements are synced to the tenant DB on next sync cycle (or immediately if the device is online).
**Error codes:** `INVALID_MODULE`, `ENHANCEMENT_REQUIRES_STANDALONE`
Owner

We should make it clearer here which situations would trigger `ENHANCEMENT_REQUIRES_STANDALONE`.
addison marked this conversation as resolved
specs/backend.md Outdated
@ -15,0 +484,4 @@
- `id` TEXT PRIMARY KEY (UUIDv7)
- `email` TEXT (nullable, unique)
- `phone` TEXT (nullable, unique)
- `org_id` TEXT REFERENCES organizations(id)
Owner

Should we have `org_id` here since **sys-admin** will have no organization? Not sure yet 🤔
addison marked this conversation as resolved
specs/backend.md Outdated
@ -15,0 +531,4 @@
### Sys-Admin Role
Separate from manager to prevent lockout. Every org has exactly one sys-admin (the creator). Sys-admin transfer can be added in the future, but is out of scope for now.
Owner

This needs to be revised. System admins are outside the organization system.
addison marked this conversation as resolved
specs/backend.md Outdated
@ -15,0 +472,4 @@
### Tables
**organizations**
Owner

I would prefer singular names for DB tables. That would also be consistent with the tables in our app.

addison marked this conversation as resolved
@ -15,0 +129,4 @@
### OTP Flow
1. Client sends email address
2. Server generates a 6-digit OTP, stores it with expiry (5 minutes), sends via email
Owner

5 minutes seems very short. What do you think of 15?

addison marked this conversation as resolved
addison left a comment
Owner

I have a few concerns, especially about the sys admin system. Feel free to leave replies or let me know if you want to schedule a quick meeting to discuss.

Author
Collaborator

I think I see what you mean about the sys-admin system, and I've made the changes based on what you said. The revised spec should now decouple that role from the organization level and treat it as a platform-level administrator without access to organization-specific information.

I've also documented some security decisions explicitly, to make them clearer and to guard against future cybersecurity concerns.
addison changed title from chore(design): infrastructure and backend design to docs: infrastructure and backend design 2026-04-29 17:52:42 -04:00
Owner

Great! I will review more thoroughly as soon as I can! It might not be until next week 😢
Author
Collaborator

@addison wrote in #157 (comment):

> Great! I will review more thoroughly as soon as I can! It might not be until next week 😢

Don't worry about it, I have a pretty busy week ahead so it's okay ☺️
@ -15,0 +20,4 @@
- **Language**: Go
- **Router**: `net/http` with `chi` (lightweight, stdlib-compatible)
- **Database driver**: `github.com/mattn/go-sqlite3` (CGo) or `modernc.org/sqlite` (pure Go)
Owner

Let's firmly pick one.

Owner

It seems that `go-sqlite3` is more popular; we just need to have a C compiler available for builds.
specs/backend.md Outdated
@ -15,0 +163,4 @@
The `jti` claim is the primary key of the corresponding `refresh_token` row. It guarantees uniqueness across tokens issued in the same second and gives the server an O(1) lookup path during rotation and reuse detection.
Refresh tokens are stored in the Admin DB and can be revoked. Each time a refresh token is used to obtain a new access token, a **new refresh token** is also issued and the old one is revoked. This creates a sliding window — as long as the user is active within 90 days, their session continues indefinitely. If inactive for 90 days, the token expires and they must re-authenticate via OTP.
Owner

Is it typical to have this sort of sliding window with refresh tokens?

@ -15,0 +748,4 @@
### Threat: JWT signing-key compromise
**Mitigation**: the access-token signing secret is loaded from environment configuration and rotated by deploying a new key. To support rotation without invalidating active sessions, the JWT carries a `kid` header and the server holds a small key set (current + previous). On rotation, the previous key remains valid for one access-token TTL (15 min) before being removed. Refresh tokens carry their `kid` too. This is a Phase 3 hardening task; the initial implementation can ship with a single static key and add rotation later.
Owner

Is this accurate? I didn't see `kid` referenced anywhere else.

@ -15,0 +678,4 @@
Orgs are archived (soft-deleted via `archived_at`) rather than permanently deleted. Archiving disables access but preserves all data and the tenant DB. This avoids destructive operations and allows data recovery. Hard delete may be added later with a grace period and explicit confirmation flow.
**Enforcement**: the auth middleware treats members of an archived org as if they had no membership. Their access token still authenticates them, but `org_id` and `role` are not populated, so every org-scoped endpoint returns `403 FORBIDDEN`. Sys-admins continue to have full read access to archived orgs (e.g., to inspect or unarchive them). The `GET /v1/orgs/:orgId` response includes `archived_at` so clients can surface the disabled state to users.
Owner

If a member of an archived org is treated as having no membership, they will get a 403 from `GET /v1/orgs/:orgId`, correct? There seems to be a contradiction there: how will the client be able to determine that the org is archived and display an error message to the user?

specs/backend.md Outdated
@ -15,0 +175,4 @@
- Creates organizations and assigns the initial manager
- Manages organization entitlement assignments
- Manages organization archival
- Has **no access** to organization data or users
Owner

This should probably say "internal organization data" or something similar. We need to make it clear that sys-admins **do** have access to organization metadata. Maybe a separate bullet point for that would be good.
@ -44,0 +143,4 @@
### Phase 2: Database Setup
- [ ] Choose multi-tenant isolation model (Option 1 or 2)
Owner

We definitely want option 1 for tenant isolation.

Owner

I think you can update this file to prefer the multi-tenant DB system. Otherwise this file looks good. I'm not sure about the hosting option yet, but it's good to have some options to think about 👍

I might hold off on the final hosting decision until I have real customers lined up to actually **use** the servers. No need to pay for servers we aren't using.
Owner

@julian This is looking really good! In the future, you don't have to put quite so much detail in the specs 😆

I left a few comments/questions, then I think we can approve the backend spec and merge this one.
