01 — Prerequisites

Scope: Per-client. Sit with the client and collect everything needed before any technical setup begins.

The output of this phase is all the values needed to configure the deployment pipeline in Step 02. Go through each section with the client.

Checklist

Required from Client

AWS Account ID — 12-digit account identifier (AWS Console > top-right > Account ID)
Custom domain name — e.g., ai.university.edu
DNS access — ability to create a CNAME record pointing to our subdomain or ALB
SSL preference — CampusCore-managed SSL (recommended, we handle everything) or self-managed SSL (client handles SSL via Cloudflare, nginx, etc.)
OpenAI API key — for embeddings and LLM (platform.openai.com/api-keys)
Gemini API key — for document OCR processing (ai.google.dev)
Cohere API key — for search reranking (cohere.com)
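
The first two items are easy to mistype while collecting them. The sketch below shows one way to sanity-check them before moving on. It is not part of the CampusCore tooling: the function names and the domain pattern are assumptions made for illustration, and the hyphen handling only covers an account ID that was copied with separators.

```python
import re

# Assumption: a simplified hostname pattern (dot-separated labels plus a TLD).
# It is illustrative only and not a complete DNS validator.
DOMAIN_RE = re.compile(
    r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}"
)


def normalize_account_id(raw: str) -> str:
    """Strip whitespace and hyphens, then require exactly 12 ASCII digits."""
    digits = re.sub(r"[\s-]", "", raw)
    if re.fullmatch(r"[0-9]{12}", digits) is None:
        raise ValueError(f"AWS Account ID must be 12 digits, got {raw!r}")
    return digits


def is_valid_domain(value: str) -> bool:
    """Accept a bare hostname only: no scheme, path, or port."""
    return DOMAIN_RE.fullmatch(value.strip().lower()) is not None


if __name__ == "__main__":
    print(normalize_account_id("1234-5678-9012"))  # 123456789012
    print(is_valid_domain("ai.university.edu"))    # True
```

Rejecting a scheme here is deliberate: the checklist asks for the domain name itself, so a value such as https://ai.university.edu is flagged for a follow-up rather than silently accepted.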

Required from Client (Branding)

These are used in the post-deployment setup wizard (Step 04).

University name — full name (e.g., "Howard University")
University abbreviation — short form (e.g., "HU")
Logo URL — publicly accessible URL to the university logo
Brand colors — primary and secondary hex colors
Assistant name — what to call the AI assistant (e.g., "Bison", "Beacon")
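
Brand colors often arrive in mixed formats (with or without the leading #, three or six digits). As a minimal sketch, the hypothetical helper below normalizes a value to lowercase six-digit hex. Whether the setup wizard accepts other formats is not stated here, so treat the target format as an assumption.

```python
import re

# Three or six hex digits after a leading "#".
HEX_COLOR_RE = re.compile(r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})")


def normalize_hex_color(value: str) -> str:
    """Return the color as lowercase '#rrggbb', or raise ValueError."""
    value = value.strip()
    if not value.startswith("#"):
        value = "#" + value
    if HEX_COLOR_RE.fullmatch(value) is None:
        raise ValueError(f"not a hex color: {value!r}")
    digits = value[1:]
    if len(digits) == 3:
        # Expand shorthand such as 'fff' to 'ffffff'.
        digits = "".join(ch * 2 for ch in digits)
    return "#" + digits.lower()


if __name__ == "__main__":
    print(normalize_hex_color("#FFF"))    # #ffffff
    print(normalize_hex_color("003A63"))  # #003a63
```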

Internal Decisions (CampusCore Engineering)

These don't involve the client but should be settled before Step 04 so the deploy carries the right config from the first push.

Sentry project — one Sentry project per tenant. Create now or note "deferred." See Sentry Setup.
Slack workflow_runs channel — decide on a channel name (convention: {client}_workflow_runs, e.g., vsu_pilot_workflow_runs) and whether the channel is public or private, then capture the channel ID once it is created. If the channel is private, the bot (@campuscore-platform) must be invited. See Slack Setup: the per-client section covers channel creation, the bot invite, capturing the ID, and setting it as a GitHub variable (not a secret, which is a common mistake).
Auto-rebuild schedule — decide whether to enable ENABLE_INDEX_MAINTENANCE_SCHEDULE for this tenant now (production traffic) or leave off until ~2 weeks of IndexMaintenanceLog data has accumulated (pilot/staging). Defaults to off. See Vector Index Observability.
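
The channel naming convention above can be sketched as a small helper. The function name and the slug rules are assumptions made for illustration. The character and length check reflects Slack's general channel-name limits (lowercase letters, digits, hyphens, and underscores, up to 80 characters).

```python
import re

# Slack channel names: lowercase letters, digits, hyphens, underscores, max 80 chars.
SLACK_NAME_RE = re.compile(r"[a-z0-9_-]{1,80}")


def workflow_channel_name(client: str) -> str:
    """Build '{client}_workflow_runs' from a client label (hypothetical helper)."""
    # Lowercase the label and collapse any run of disallowed characters to "_".
    slug = re.sub(r"[^a-z0-9_-]+", "_", client.strip().lower()).strip("_")
    name = f"{slug}_workflow_runs"
    if not slug or SLACK_NAME_RE.fullmatch(name) is None:
        raise ValueError(f"cannot build a channel name from {client!r}")
    return name


if __name__ == "__main__":
    print(workflow_channel_name("vsu_pilot"))  # vsu_pilot_workflow_runs
    print(workflow_channel_name("VSU Pilot"))  # vsu_pilot_workflow_runs
```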

What Gets Created in the Client's AWS Account

For transparency, share this with the client so they know what the deploy role provisions:

Resource	Purpose
Dedicated VPC (2 AZs)	Public + private subnets; app, RDS, and cache run in private subnets
NAT Gateway (single)	Outbound internet for private-subnet tasks (AI provider APIs)
VPC Interface Endpoints (5)	Private links to ECR (api + dkr), CloudWatch Logs, SQS, SSM; free S3 gateway endpoint
ECS Cluster + Services	Web app (1 vCPU/2 GB) + document worker (1 vCPU/4 GB) on Fargate
ECR Repositories	Docker image storage (web + worker)
RDS PostgreSQL 17	Database with pgvector (db.t3.large, 20 GB); optional read replica off by default
ElastiCache Serverless (Valkey)	Shared in-memory cache (storage capped at 1 GB)
S3 Buckets	App storage + user file uploads
SQS Queue	Document processing job queue + dead-letter queue
ALB	Load balancer (HTTP, 180s idle timeout for streaming)
AWS WAF	WAFv2 web ACL on the ALB (4 AWS managed rule sets + per-IP rate limit)
CloudWatch Logs	Application logging (14-day retention)
IAM Roles	ECS task execution and task roles

Encryption by default: All storage resources are created with encryption enabled — RDS uses AWS-managed KMS encryption, S3 buckets use AES-256 server-side encryption, and SQS queues use AWS-managed SSE. No additional configuration is needed from the client. See Security & SOC 2 Compliance for details.

Not in their account: Terraform state is stored in our admin account's S3 bucket.

Estimated AWS Costs

Baseline monthly cost for a single-instance deployment (us-east-1, desired_count 1, light traffic), at current on-demand pricing:

Service	Configuration	Approximate Cost (USD/month)
RDS PostgreSQL	db.t3.large + 20 GB	~$108
ECS Fargate (web + worker)	1 vCPU/2 GB + 1 vCPU/4 GB	~$79
VPC interface endpoints	5 services × 2 AZs @ $0.01/hr	~$73
NAT gateway	single @ $0.045/hr + data	~$33
ALB	$0.0225/hr + LCUs	~$20
ElastiCache Serverless (Valkey)	$0.084/GB-hr, capped 1 GB	~$10
AWS WAF	web ACL + 5 rules	~$10
S3 + SQS + CloudWatch + transfer	usage-based	~$15
Total		~$350/month
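
The arithmetic behind the table can be reproduced in a few lines. This is a sketch that assumes a 730-hour month. It recomputes the two purely hourly lines from the rates shown and sums the rounded line items as listed.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month

# Rounded monthly figures copied from the table above (USD).
line_items = {
    "RDS PostgreSQL": 108,
    "ECS Fargate (web + worker)": 79,
    "VPC interface endpoints": 73,
    "NAT gateway": 33,
    "ALB": 20,
    "ElastiCache Serverless (Valkey)": 10,
    "AWS WAF": 10,
    "S3 + SQS + CloudWatch + transfer": 15,
}

# 5 endpoint services x 2 AZs, each billed at $0.01 per hour.
endpoint_cost = 5 * 2 * 0.01 * HOURS_PER_MONTH

# Fixed hourly charge for the single NAT gateway, before data processing.
nat_fixed_cost = 0.045 * HOURS_PER_MONTH

total = sum(line_items.values())

if __name__ == "__main__":
    print(f"VPC interface endpoints: ~${endpoint_cost:.0f}")
    print(f"NAT gateway (fixed):     ~${nat_fixed_cost:.0f}")
    print(f"Sum of line items:       ${total}")  # 348, quoted as ~$350/month
```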

Costs scale with usage. Under bulk ingestion (the largest variable), the worker auto-scales from 1 to 10 tasks and the web service from 1 to 5. NAT and endpoint data processing, egress, and WAF request charges grow with traffic. AI provider usage (OpenAI/Gemini/Cohere) is billed to the client's own accounts, separate from AWS. Enabling the read replica roughly doubles the RDS line.

Output of This Phase

By the end of this step, you should have values for every secret and variable listed below. These map directly to the GitHub Environment configuration in Step 02.

Collected Value	Maps to GitHub Secret/Variable
AWS Account ID	Used to derive `AWS_ROLE_ARN` (after client deploys role in Step 03)
Custom domain	`CUSTOM_DOMAIN_WITH_PROTOCOL`
SSL preference	`ENABLE_CUSTOM_DOMAIN_WITH_SSL`
OpenAI API key	`OPENAI_API_KEY`
Gemini API key	`GEMINI_API_KEY`
Cohere API key	`COHERE_API_KEY`
Slack channel ID (per-client)	`SLACK_CHANNEL_WORKFLOW_RUNS` (variable, not secret)
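
As a sketch of how the first row works, an IAM role ARN has the form arn:aws:iam::<account-id>:role/<role-name>, so the account ID collected here is enough to build `AWS_ROLE_ARN` once the role name is known. The role name below is a placeholder, not the actual name of the deploy role, and both helpers are hypothetical. The second helper reports which of the names in the table still have no value.

```python
# Names taken from the mapping table above.
REQUIRED_NAMES = [
    "AWS_ROLE_ARN",
    "CUSTOM_DOMAIN_WITH_PROTOCOL",
    "ENABLE_CUSTOM_DOMAIN_WITH_SSL",
    "OPENAI_API_KEY",
    "GEMINI_API_KEY",
    "COHERE_API_KEY",
    "SLACK_CHANNEL_WORKFLOW_RUNS",
]


def role_arn(account_id: str, role_name: str) -> str:
    """Build an IAM role ARN from a 12-digit account ID and a role name."""
    if not (len(account_id) == 12 and account_id.isascii() and account_id.isdigit()):
        raise ValueError(f"expected a 12-digit account ID, got {account_id!r}")
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def missing_values(collected: dict[str, str]) -> list[str]:
    """Return the required names that are absent or blank in `collected`."""
    return [name for name in REQUIRED_NAMES if not collected.get(name, "").strip()]


if __name__ == "__main__":
    # "example-deploy-role" is a placeholder role name used only for illustration.
    arn = role_arn("123456789012", "example-deploy-role")
    print(arn)  # arn:aws:iam::123456789012:role/example-deploy-role
    print(missing_values({"AWS_ROLE_ARN": arn, "OPENAI_API_KEY": ""}))
```

Running the completeness check at the end of the client session makes it clear which items still need a follow-up before Step 02.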

See GitHub Environment Variables Reference for the complete specification — including the Slack and index-maintenance variables added by recent work, and the SLACK_BOT_TOKEN repo-level secret shared across all tenants.

Next: 02 — Pipeline Setup