01 — Prerequisites

Scope: Per-client. Sit with the client and collect everything needed before any technical setup begins.

This phase produces every value needed to configure the deployment pipeline in Step 02. Go through each section with the client.


Checklist

Required from Client

  • AWS Account ID — 12-digit account identifier (AWS Console > top-right > Account ID)
  • Custom domain name — e.g., ai.university.edu
  • DNS access — ability to create a CNAME record pointing to our subdomain or ALB
  • SSL preference — CampusCore-managed SSL (recommended, we handle everything) or self-managed SSL (client handles SSL via Cloudflare, nginx, etc.)
  • OpenAI API key — for embeddings and LLM (platform.openai.com/api-keys)
  • Gemini API key — for document OCR processing (ai.google.dev)
  • Cohere API key — for search reranking (cohere.com)
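The collected values can be sanity-checked before moving on. The sketch below is a hypothetical helper, not part of the CampusCore tooling: the dictionary keys, the `ssl_preference` labels, and the loose hostname pattern are assumptions made for illustration. The only rule taken directly from the checklist is that the AWS Account ID has 12 digits.

```python
import re

# Hypothetical intake check. Key names and SSL labels are illustrative only.
ACCOUNT_ID_RE = re.compile(r"\d{12}")  # the checklist states a 12-digit identifier
# Loose hostname pattern (assumption): dot-separated labels, alphabetic TLD.
HOSTNAME_RE = re.compile(
    r"([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}", re.IGNORECASE
)
SSL_CHOICES = ("campuscore-managed", "self-managed")
API_KEY_FIELDS = ("openai_api_key", "gemini_api_key", "cohere_api_key")


def check_intake(values: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Keep the account ID as a string so leading zeros are not lost.
    if not ACCOUNT_ID_RE.fullmatch(str(values.get("aws_account_id", ""))):
        problems.append("aws_account_id must be exactly 12 digits")
    if not HOSTNAME_RE.fullmatch(str(values.get("custom_domain", ""))):
        problems.append("custom_domain does not look like a hostname")
    if values.get("ssl_preference") not in SSL_CHOICES:
        problems.append("ssl_preference must be one of: " + ", ".join(SSL_CHOICES))
    for field in API_KEY_FIELDS:
        if not str(values.get(field, "")).strip():
            problems.append(f"{field} is missing")
    return problems


example = {
    "aws_account_id": "123456789012",
    "custom_domain": "ai.university.edu",
    "ssl_preference": "campuscore-managed",
    "openai_api_key": "placeholder",
    "gemini_api_key": "placeholder",
    "cohere_api_key": "placeholder",
}
print(check_intake(example))
```

The check only confirms that each value is present and plausibly shaped. It does not verify DNS access or that the API keys are valid with their providers.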

Required from Client (Branding)

These are used in the post-deployment setup wizard (Step 04).

  • University name — full name (e.g., "Howard University")
  • University abbreviation — short form (e.g., "HU")
  • Logo URL — publicly accessible URL to the university logo
  • Brand colors — primary and secondary hex colors
  • Assistant name — what to call the AI assistant (e.g., "Bison", "Beacon")

Internal Decisions (CampusCore Engineering)

These don't involve the client but should be settled before Step 04 so the deploy carries the right config from the first push.

  • Sentry project — one Sentry project per tenant. Create now or note "deferred." See Sentry Setup.
  • Slack workflow_runs channel — decide on the channel name (convention: {client}_workflow_runs, e.g., vsu_pilot_workflow_runs) and whether it is public or private, then capture the channel ID once the channel is created. If the channel is private, the bot (@campuscore-platform) must be invited. See Slack Setup: the per-client section walks through channel creation, the bot invite, capturing the ID, and setting it as a GitHub variable (not a secret, which is a common mistake).
  • Auto-rebuild schedule — decide whether to enable ENABLE_INDEX_MAINTENANCE_SCHEDULE for this tenant now (production traffic) or leave off until ~2 weeks of IndexMaintenanceLog data has accumulated (pilot/staging). Defaults to off. See Vector Index Observability.
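The channel naming convention above can be applied mechanically. This is a minimal sketch with a hypothetical helper name (`workflow_runs_channel`). The character and length limits reflect Slack's general channel-name rules as we understand them (lowercase letters, digits, hyphens, underscores, at most 80 characters) and should be treated as an assumption rather than a CampusCore requirement.

```python
import re

MAX_CHANNEL_NAME_LENGTH = 80  # assumption: Slack's channel-name length limit


def workflow_runs_channel(client: str) -> str:
    """Build a channel name following the {client}_workflow_runs convention."""
    # Lowercase the client label and collapse anything else into underscores.
    slug = re.sub(r"[^a-z0-9]+", "_", client.strip().lower()).strip("_")
    if not slug:
        raise ValueError("client label produced an empty slug")
    name = f"{slug}_workflow_runs"
    if len(name) > MAX_CHANNEL_NAME_LENGTH:
        raise ValueError(f"channel name exceeds {MAX_CHANNEL_NAME_LENGTH} characters")
    return name


print(workflow_runs_channel("VSU Pilot"))  # vsu_pilot_workflow_runs
```

The channel ID itself cannot be derived from the name. It still has to be copied from Slack after the channel exists.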

What Gets Created in the Client's AWS Account

For transparency, share this with the client so they know what the deploy role provisions:

Resource | Purpose
Dedicated VPC (2 AZs) | Public + private subnets; app, RDS, and cache run in private subnets
NAT Gateway (single) | Outbound internet for private-subnet tasks (AI provider APIs)
VPC Interface Endpoints (5) | Private links to ECR (api + dkr), CloudWatch Logs, SQS, SSM; free S3 gateway endpoint
ECS Cluster + Services | Web app (1 vCPU/2 GB) + document worker (1 vCPU/4 GB) on Fargate
ECR Repositories | Docker image storage (web + worker)
RDS PostgreSQL 17 | Database with pgvector (db.t3.large, 20 GB); optional read replica off by default
ElastiCache Serverless (Valkey) | Shared in-memory cache (storage capped at 1 GB)
S3 Buckets | App storage + user file uploads
SQS Queue | Document processing job queue + dead-letter queue
ALB | Load balancer (HTTP, 180s idle timeout for streaming)
AWS WAF | WAFv2 web ACL on the ALB (4 AWS managed rule sets + per-IP rate limit)
CloudWatch Logs | Application logging (14-day retention)
IAM Roles | ECS task execution and task roles

Encryption by default: All storage resources are created with encryption enabled — RDS uses AWS-managed KMS encryption, S3 buckets use AES-256 server-side encryption, and SQS queues use AWS-managed SSE. No additional configuration is needed from the client. See Security & SOC 2 Compliance for details.

Not in their account: Terraform state is stored in our admin account's S3 bucket.

Estimated AWS Costs

Baseline monthly cost for a single-instance deployment (us-east-1, desired_count 1, light traffic), at current on-demand pricing:

Service | Configuration | Approximate Cost (USD/month)
RDS PostgreSQL | db.t3.large + 20 GB | ~$108
ECS Fargate (web + worker) | 1 vCPU/2 GB + 1 vCPU/4 GB | ~$79
VPC interface endpoints | 5 services × 2 AZs @ $0.01/hr | ~$73
NAT gateway | single @ $0.045/hr + data | ~$33
ALB | $0.0252/hr + LCUs | ~$20
ElastiCache Serverless (Valkey) | $0.084/GB-hr, capped 1 GB | ~$10
AWS WAF | web ACL + 5 rules | ~$10
S3 + SQS + CloudWatch + transfer | usage-based | ~$15
Total | | ~$350/month

Costs scale with usage: under bulk ingestion (the largest variable) the worker auto-scales from 1 to 10 tasks and the web service from 1 to 5, and NAT/endpoint data processing, egress, and WAF requests grow with traffic. AI provider usage (OpenAI/Gemini/Cohere) is billed to the client's own accounts, separate from AWS. Enabling the read replica roughly doubles the RDS line.
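If the client asks how the baseline was derived, the arithmetic can be reproduced from the table. The sketch below assumes a 730-hour month (a common billing approximation, not stated in this document) and uses only the hourly rates and line items listed above. Usage-based parts such as NAT data processing and ALB LCUs are not modeled.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def fixed_monthly(rate_per_hour: float, count: int = 1) -> float:
    """Fixed monthly charge for `count` always-on resources at an hourly rate."""
    return rate_per_hour * count * HOURS_PER_MONTH


# Hourly rates as stated in the cost table.
vpc_endpoints = fixed_monthly(0.01, count=5 * 2)  # 5 services x 2 AZs, about $73
nat_gateway = fixed_monthly(0.045)                # about $33 before data charges
alb = fixed_monthly(0.0252)                       # about $18 before LCUs

# Line items copied from the table (USD per month, rounded).
line_items = {
    "RDS PostgreSQL": 108,
    "ECS Fargate (web + worker)": 79,
    "VPC interface endpoints": 73,
    "NAT gateway": 33,
    "ALB": 20,
    "ElastiCache Serverless (Valkey)": 10,
    "AWS WAF": 10,
    "S3 + SQS + CloudWatch + transfer": 15,
}
total = sum(line_items.values())

print(f"VPC endpoints: ${vpc_endpoints:.2f}")
print(f"NAT gateway (fixed): ${nat_gateway:.2f}")
print(f"ALB (fixed): ${alb:.2f}")
print(f"Sum of line items: ${total}")  # 348, quoted as ~$350/month
```

The line items sum to $348, which the table rounds to ~$350/month.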


Output of This Phase

By the end of this step, you should have values for every secret and variable listed below. These map directly to the GitHub Environment configuration in Step 02.

Collected Value | Maps to GitHub Secret/Variable
AWS Account ID | Used to derive AWS_ROLE_ARN (after client deploys role in Step 03)
Custom domain | CUSTOM_DOMAIN_WITH_PROTOCOL
SSL preference | ENABLE_CUSTOM_DOMAIN_WITH_SSL
OpenAI API key | OPENAI_API_KEY
Gemini API key | GEMINI_API_KEY
Cohere API key | COHERE_API_KEY
Slack channel ID (per-client) | SLACK_CHANNEL_WORKFLOW_RUNS (variable, not secret)

See GitHub Environment Variables Reference for the complete specification — including the Slack and index-maintenance variables added by recent work, and the SLACK_BOT_TOKEN repo-level secret shared across all tenants.
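As a rough illustration of how the collected values turn into the names listed above, the sketch below builds the mapping in one place. Several details are assumptions and are not confirmed by this document: the IAM role name (`role_name` is a placeholder parameter), the `https://` prefix on the domain, and the `"true"`/`"false"` encoding of the SSL flag. Check the GitHub Environment Variables Reference for the real formats.

```python
def github_environment(
    account_id: str,
    role_name: str,  # hypothetical: the actual role name comes from Step 03
    domain: str,
    campuscore_managed_ssl: bool,
    api_keys: dict,
    slack_channel_id: str,
) -> dict:
    """Map Step 01 values to the GitHub Environment names from the table."""
    return {
        # Standard IAM role ARN layout: arn:aws:iam::<account-id>:role/<name>
        "AWS_ROLE_ARN": f"arn:aws:iam::{account_id}:role/{role_name}",
        # Assumption: the variable expects an https:// prefix.
        "CUSTOM_DOMAIN_WITH_PROTOCOL": f"https://{domain}",
        # Assumption: "true" means CampusCore-managed SSL.
        "ENABLE_CUSTOM_DOMAIN_WITH_SSL": "true" if campuscore_managed_ssl else "false",
        "OPENAI_API_KEY": api_keys["openai"],
        "GEMINI_API_KEY": api_keys["gemini"],
        "COHERE_API_KEY": api_keys["cohere"],
        # Stored as a GitHub variable, not a secret.
        "SLACK_CHANNEL_WORKFLOW_RUNS": slack_channel_id,
    }


env = github_environment(
    account_id="123456789012",
    role_name="example-deploy-role",
    domain="ai.university.edu",
    campuscore_managed_ssl=True,
    api_keys={"openai": "placeholder", "gemini": "placeholder", "cohere": "placeholder"},
    slack_channel_id="C0000000000",
)
for name in sorted(env):
    print(name)
```

Keeping the mapping in a single function makes it easy to spot a value that was never collected, because a missing key raises an error before anything is pushed to GitHub.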


Next: 02 — Pipeline Setup