Slack Setup¶

Operator playbook for getting Slack notifications wired for a CampusCore tenant. End state: the per-client workflow_runs channel receives start/complete/fail messages from the index health check, HNSW rebuild, and scrape pipelines — so engineers can spot a stuck workflow without opening AWS.

For the architecture (what posts to Slack, when, and how slack_utils is wired into call sites), see Observability Stack and the per-workflow doc Vector Index Observability.

The Slack app already exists. The CampusCorePlatform app is installed in the CampusCore Slack workspace with the right scopes. This playbook assumes that's a given — there's no per-tenant or per-deploy step for "create the app." If you're starting a workspace from scratch, see the App Setup Reference at the bottom.

1. Structure: one workspace, one bot, per-client channels¶

One Slack app installed in the CampusCore Slack workspace. The app is CampusCorePlatform; the bot user shows up in Slack as @campuscoreplatform. It's the same bot identity regardless of which tenant fires a notification. We don't run one bot per tenant — that would multiply admin work without buying isolation we actually want.

One channel per tenant. Each GitHub Environment (vsu-troy-pilot, howard-prod, …) posts to its own dedicated channel. That keeps cross-tenant traffic from cluttering any one client's feed, and lets you point per-tenant alert rules at different Slack rooms later if needed.

Slack workspace: CampusCore
  ├── App: CampusCorePlatform (single bot, one token at the repo level)
  │   └── bot user: @campuscoreplatform
  ├── #vsu_pilot_workflow_runs       ◄── vsu-troy-pilot GitHub Environment
  ├── #howard_pilot_workflow_runs    ◄── howard-pilot GitHub Environment
  └── #cc-errors                     ◄── Sentry-driven, separate concern

So: a new tenant = create one new channel, invite the bot, set one variable. The bot token is set once at the repo level and never per-tenant unless you need credential isolation (see Step 3 for that override).

2. Bot token storage (already done — referenced for verification)¶

The bot token (the xoxb-... value from CampusCorePlatform → OAuth & Permissions → Bot User OAuth Token) is stored as a GitHub repository-level secret named SLACK_BOT_TOKEN. This is a one-time setup that's already in place.

Verify it's present (lists names only — secret values are never readable):

gh secret list --repo CampusCoreAI/campuscore | grep SLACK_BOT_TOKEN

That one secret is available to every deploy job for every environment. The workflow line that reads it (echo "TF_VAR_slack_bot_token=${{ secrets.SLACK_BOT_TOKEN }}" in .github/workflows/deploy-aws.yml) resolves to the same value for every tenant.

If you ever need to rotate the token (compromised, app reinstalled, …):

In api.slack.com → CampusCorePlatform → OAuth & Permissions, click Reinstall to Workspace to get a fresh token.
Store the new value in the repo-level secret:

gh secret set SLACK_BOT_TOKEN \
  --repo CampusCoreAI/campuscore \
  --body 'xoxb-<new-token>'

Re-deploy each tenant. The old token stops working as soon as Slack issues the new one; re-deploying just re-injects the new value into ECS task definitions.

3. (Optional) override the bot token per environment¶

If a client requires their own isolated bot identity for compliance — separate audit trail, ability to revoke without affecting other tenants — set an environment-level secret of the same name. Env-level wins over repo-level.

gh secret set SLACK_BOT_TOKEN \
  --env <tenant-env> \
  --repo CampusCoreAI/campuscore \
  --body 'xoxb-<tenant-specific>'

This is rare. Most deployments share the one repo-level bot.

4. Per-client setup¶

These five steps (4a through 4e) run once per GitHub Environment and together form the checklist for onboarding a new tenant:

4a. Create the channel¶

In Slack:

Channel name: {client}_workflow_runs (e.g., vsu_pilot_workflow_runs, howard_workflow_runs)
Visibility:
Private if only the engineering team reads it (default — keeps workflow noise out of client-facing channels)
Public if the client team also wants visibility into deploys and index health
Description: "CampusCore workflow notifications for {client}: index health checks, HNSW rebuilds, scrape runs."
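If you script tenant onboarding, the naming convention is easy to enforce up front. A minimal sketch, assuming Slack's channel-name rules (lowercase letters, digits, hyphens and underscores, at most 80 characters); the helper `workflow_runs_channel_name` is hypothetical and not part of the CampusCore codebase.

```python
import re

# Slack accepts lowercase letters, digits, hyphens and underscores, up to 80 chars.
_VALID_CHANNEL_NAME = re.compile(r"[a-z0-9_-]{1,80}")


def workflow_runs_channel_name(client: str) -> str:
    """Build the {client}_workflow_runs channel name for a tenant slug.

    Hypothetical helper: lowercases the slug, turns spaces/dots into '_',
    and refuses anything Slack would reject as a channel name.
    """
    slug = re.sub(r"[\s.]+", "_", client.strip().lower())
    if not slug:
        raise ValueError("client slug must not be empty")
    name = f"{slug}_workflow_runs"
    if not _VALID_CHANNEL_NAME.fullmatch(name):
        raise ValueError(f"not a valid Slack channel name: {name!r}")
    return name


print(workflow_runs_channel_name("vsu_pilot"))  # vsu_pilot_workflow_runs
print(workflow_runs_channel_name("Howard"))     # howard_workflow_runs
```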

4b. Invite the bot¶

In the channel:

/invite @campuscoreplatform

Required for private channels. Public channels technically work without an explicit invite thanks to the chat:write.public scope, but inviting anyway makes channel membership auditable and matches what Slack admins expect to see.

If you skip this step on a private channel, the deploy succeeds, the env vars land in the ECS task, and the running task treats Slack as configured; yet every chat.postMessage call returns channel_not_found and the notification silently no-ops.

4c. Copy the channel ID¶

The Slack API requires the channel ID (C0XXXXXXXX), not the #name. Two ways to grab it:

Slack desktop: click the channel name → "About" pane → bottom shows Channel ID: C0XXXXXXXX. Copy.
Slack URL: open the channel, look at slack.com/.../archives/C0XXXXXXXX. The trailing C0XXX is the ID.

⚠ Do not paste the #channel-name. The variable resolves to text, the SDK calls chat.postMessage(channel="#workflow-runs"), and Slack returns channel_not_found — even when a channel with that exact name exists. Use the ID.
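A small guard against pasting the wrong thing into the variable: accept either a raw ID or an `.../archives/<ID>` URL and reject `#names`. This is a sketch; the ID pattern (a `C`, or `G` for some older private channels, followed by uppercase letters and digits) is an assumption based on the `C0XXXXXXXX` shape above, not a format Slack guarantees, and `channel_id_from` is a hypothetical name.

```python
import re
from urllib.parse import urlparse

# Assumed channel-ID shape: 'C' (or legacy 'G') + 8 or more uppercase letters/digits.
_CHANNEL_ID = re.compile(r"[CG][A-Z0-9]{8,}")


def channel_id_from(value: str) -> str:
    """Return a Slack channel ID from a raw ID or an .../archives/<ID> URL.

    Raises ValueError for '#channel-name' input, which chat.postMessage
    would answer with channel_not_found.
    """
    value = value.strip()
    if value.startswith("#"):
        raise ValueError(f"{value!r} is a channel name; copy the C0... ID instead")
    if "://" in value:
        # e.g. https://<workspace>.slack.com/archives/C0XXXXXXXX
        parts = [p for p in urlparse(value).path.split("/") if p]
        if "archives" in parts and parts.index("archives") + 1 < len(parts):
            value = parts[parts.index("archives") + 1]
    if not _CHANNEL_ID.fullmatch(value):
        raise ValueError(f"{value!r} does not look like a Slack channel ID")
    return value
```

The workspace host in any example URL is a placeholder; only the `/archives/<ID>` path segment matters.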

4d. Set the channel ID as a GitHub VARIABLE¶

The channel ID is not sensitive (it's an internal workspace identifier — knowing the ID alone gets you nothing without the bot token). Store it as a variable, not a secret.

gh variable set SLACK_CHANNEL_WORKFLOW_RUNS \
  --env <tenant-env> \
  --repo CampusCoreAI/campuscore \
  --body 'C0XXXXXXXX'

⚠ Common mistake: running gh secret set SLACK_CHANNEL_WORKFLOW_RUNS … instead. The deploy workflow reads ${{ vars.SLACK_CHANNEL_WORKFLOW_RUNS }}, so a value stored in the secrets namespace resolves to empty even though it exists. The two namespaces don't fall back to each other.

If you accidentally set it as a secret:

gh secret delete SLACK_CHANNEL_WORKFLOW_RUNS --env <tenant-env> --repo CampusCoreAI/campuscore
gh variable set SLACK_CHANNEL_WORKFLOW_RUNS --env <tenant-env> --repo CampusCoreAI/campuscore --body 'C0XXXXXXXX'
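Why the secret/variable mix-up fails silently, together with the environment-over-repository precedence from Step 3, fits in a toy model. This is not GitHub's implementation: it ignores organization-level values, and `Store` and `resolve` are names made up for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Values set at one level (repository or environment) in GitHub."""
    secrets: dict[str, str] = field(default_factory=dict)
    variables: dict[str, str] = field(default_factory=dict)


def resolve(expr_context: str, name: str, repo: Store, env: Store) -> str:
    """Toy model of how ${{ secrets.NAME }} / ${{ vars.NAME }} resolve.

    - Each context reads only its own namespace: no secrets <-> vars fallback.
    - An environment-level value overrides the repository-level one.
    - A name that is missing everywhere resolves to an empty string.
    """
    attr = {"secrets": "secrets", "vars": "variables"}[expr_context]
    for level in (env, repo):  # environment level is checked first, so it wins
        value = getattr(level, attr).get(name)
        if value is not None:
            return value
    return ""


repo = Store(secrets={"SLACK_BOT_TOKEN": "xoxb-shared"})
env = Store(secrets={"SLACK_CHANNEL_WORKFLOW_RUNS": "C0XXXXXXXX"})  # the mistake
print(repr(resolve("vars", "SLACK_CHANNEL_WORKFLOW_RUNS", repo, env)))  # ''
```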

4e. Trigger a redeploy¶

Push to the deploy branch, or run the workflow manually. The next ECS task definition revision will have SLACK_BOT_TOKEN and SLACK_CHANNEL_WORKFLOW_RUNS populated. Within ~3 minutes the running tasks roll over to the new revision and Slack posts start firing.
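Rather than waiting a fixed ~3 minutes, you can check whether the service has actually rolled onto the new revision. A sketch, assuming you fetch the service description yourself, e.g. with boto3's `describe_services(cluster=..., services=[...])["services"][0]`; the cluster and service names for a tenant aren't given in this playbook. From the shell, `aws ecs wait services-stable` blocks until the service settles.

```python
def rollout_complete(service: dict) -> bool:
    """True once an ECS service has fully rolled onto its newest task definition.

    `service` is one element of the `services` list returned by
    `aws ecs describe-services` / boto3 `describe_services`.
    """
    deployments = service.get("deployments", [])
    # While a rollout is in progress ECS typically lists the old deployment
    # (status ACTIVE) next to the new PRIMARY one; when it finishes, only
    # the PRIMARY deployment remains.
    if len(deployments) != 1 or deployments[0].get("status") != "PRIMARY":
        return False
    primary = deployments[0]
    # rolloutState may be absent for some deployment setups; this sketch
    # treats a missing value as "nothing pending".
    if primary.get("rolloutState", "COMPLETED") != "COMPLETED":
        return False
    return primary.get("runningCount") == primary.get("desiredCount")
```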

5. Verify¶

After the redeploy completes:

5a. Confirm the deploy actually carried the values¶

In the workflow log for the deploy-app job, find the "Export Terraform variables" step. Expected:

echo "TF_VAR_slack_bot_token=***"                       ◄── non-empty (masked)
echo "TF_VAR_slack_channel_workflow_runs=C0XXXXXXXX"    ◄── non-empty (channel IDs are not sensitive)

If the bot-token line is =*** and the channel-id line is blank, you missed step 4d. If both are blank, the repo-level SLACK_BOT_TOKEN secret is also missing — verify it exists with gh secret list --repo CampusCoreAI/campuscore.
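The reading rules above are mechanical enough to script if you pull the job log with your own tooling. A sketch; it assumes the log contains the two `TF_VAR_...=` lines exactly as shown, and `diagnose_export_step` is a hypothetical helper.

```python
import re

# Matches the two lines shown above, e.g. TF_VAR_slack_bot_token=***
_EXPORT = re.compile(r'TF_VAR_(slack_bot_token|slack_channel_workflow_runs)=([^"\s]*)')


def diagnose_export_step(log_text: str) -> str:
    """Apply the step-5a reading rules to the 'Export Terraform variables' log."""
    found = dict(_EXPORT.findall(log_text))
    token = found.get("slack_bot_token", "")  # '***' when set (masked by Actions)
    channel = found.get("slack_channel_workflow_runs", "")
    if token and channel:
        return "ok"
    if token:
        return "channel ID blank: step 4d was missed (gh variable set, not gh secret set)"
    if channel:
        return "bot token blank: check gh secret list --repo CampusCoreAI/campuscore"
    return "both blank: repo-level SLACK_BOT_TOKEN secret missing, and step 4d not done"
```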

5b. Confirm the running container sees the env vars¶

aws ecs describe-task-definition \
  --task-definition campuscore-<tenant-env> \
  --query 'taskDefinition.containerDefinitions[0].environment[?name==`SLACK_BOT_TOKEN` || name==`SLACK_CHANNEL_WORKFLOW_RUNS`].{name: name, has_value: length(value) > `0`}' \
  --output table \
  --profile <tenant>

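Both rows should show has_value: True. The values themselves stay hidden; we only ever check lengths. Two caveats: a variable that is missing from the task definition produces no row at all (the filter only matches names that exist), and this command describes the latest ACTIVE revision of the family, which the running tasks only use once they have rolled over.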
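The same check in Python, for scripts that already use boto3. A sketch: the input is assumed to be the `taskDefinition` object from `boto3.client("ecs").describe_task_definition(taskDefinition="campuscore-<tenant-env>")`, and `slack_env_presence` is a hypothetical name. Unlike the JMESPath filter, it reports a missing variable as False instead of dropping the row.

```python
def slack_env_presence(task_definition: dict, container_index: int = 0) -> dict[str, bool]:
    """Report whether each Slack env var is non-empty, never returning values."""
    wanted = ("SLACK_BOT_TOKEN", "SLACK_CHANNEL_WORKFLOW_RUNS")
    container = task_definition["containerDefinitions"][container_index]
    env = {e["name"]: e.get("value", "") for e in container.get("environment", [])}
    # Only lengths are inspected, mirroring the has_value column above.
    return {name: len(env.get(name, "")) > 0 for name in wanted}
```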

5c. Smoke-test from the dashboard¶

Open https://<tenant-domain>/admin/observability/vector/
Switch to the Maintenance tab
Click Run check now

Within ~10 seconds, #{client}_workflow_runs should show:

▶ Index health check starting
✓ Index health check — all metrics ok
    (or ⚠ / ✗ depending on the current state)

If both messages appear, Slack is wired end-to-end.

Troubleshooting¶

Log line `Slack not configured (SLACK_BOT_TOKEN empty); skipping notification: …` in CloudWatch¶

The diagnostic logging in campus_core/shared_utils/slack_utils.py emits a more detailed line right before this one:

Slack token resolution failed: settings.SLACK_BOT_TOKEN attribute <STATE>,
  settings value length <N>, os.environ['SLACK_BOT_TOKEN'] length <N>

Decode:

Diagnostic	Meaning	Fix
`attribute MISSING`	The deployed image is older than commit `6448fd7` (the one that added the settings.py line). Stale image.	Push a fresh deploy.
`attribute PRESENT, settings 0, os.environ 0`	ECS env var actually isn't set on the running container. Either the deploy didn't carry the secret (workflow-side problem) or the running task is on an older task-def revision than you think.	Step 5a + 5b above; if both pass, the running tasks haven't rolled yet — wait or force a new deployment.
`attribute PRESENT, settings 0, os.environ N`	The OS has it but Django settings lost it. Most likely cause: `.env` file got loaded with `overwrite=True` somewhere, blanking the OS value.	Inspect `campus_core/settings.py` around the `env.read_env(...)` call.
`attribute PRESENT, settings N, …`	This path shouldn't fire — token is non-empty. If you still see it, you're reading logs from before the redeploy.	Check log timestamps; trigger a fresh request and re-read.
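If you grep CloudWatch for these lines often, the decode table can be applied automatically. A sketch that assumes the diagnostic keeps exactly the wording shown above (possibly wrapped across lines); `decode_token_diagnostic` is a hypothetical helper, not part of slack_utils.

```python
import re

_DIAG = re.compile(
    r"attribute (?P<state>MISSING|PRESENT)"
    r"(?:,\s*settings value length (?P<settings>\d+)"
    r",\s*os\.environ\['SLACK_BOT_TOKEN'\] length (?P<environ>\d+))?"
)


def decode_token_diagnostic(line: str) -> str:
    """Map a 'Slack token resolution failed: ...' line to a row of the table above."""
    m = _DIAG.search(line)
    if m is None:
        return "unrecognised diagnostic line"
    if m["state"] == "MISSING":
        return "stale image: push a fresh deploy"
    settings_len = int(m["settings"] or 0)
    environ_len = int(m["environ"] or 0)
    if settings_len > 0:
        return "token is set: these logs predate the redeploy, re-read fresh logs"
    if environ_len == 0:
        return "env var not on the running container: run steps 5a and 5b, then wait or force a new deployment"
    return "Django settings lost the OS value: inspect env.read_env(...) in campus_core/settings.py"
```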

Log line `Channel key 'workflow_runs' not mapped in NOTIFICATION_CHANNELS; skipping notification: …`¶

The bot token is fine, but settings.NOTIFICATION_CHANNELS["workflow_runs"] is empty. That means step 4d went wrong in one of two ways:

The channel ID was set as a secret instead of a variable (the most common mistake). Verify with gh variable list --env <tenant-env>.
A different variable name was used. The exact name the workflow expects is SLACK_CHANNEL_WORKFLOW_RUNS.
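The two skip messages come from two separate early exits. The sketch below shows that shape so the log lines are easier to map back to a cause; it is hypothetical (the function name, the order of the checks and the logger name are assumptions), and the real logic in campus_core/shared_utils/slack_utils.py may differ. Because settings uses env(..., default=''), an unset variable shows up here as an empty string rather than a missing key.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("slack_notify_sketch")


def resolve_target(
    channel_key: str, bot_token: str, channels: Mapping[str, str], message: str
) -> Optional[str]:
    """Return the channel ID to post to, or None after logging why it's skipped."""
    if not bot_token:
        logger.warning(
            "Slack not configured (SLACK_BOT_TOKEN empty); skipping notification: %s",
            message,
        )
        return None
    channel_id = channels.get(channel_key, "")
    if not channel_id:
        logger.warning(
            "Channel key %r not mapped in NOTIFICATION_CHANNELS; skipping notification: %s",
            channel_key,
            message,
        )
        return None
    return channel_id
```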

Deploy succeeds but posts never reach the channel¶

The channel ID is correct, but the channel is private and the bot isn't a member; chat:write.public doesn't apply to private channels. Slack returns channel_not_found because, from the bot's perspective, the private channel doesn't exist.

Fix: run /invite @campuscoreplatform in the channel.
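To separate Slack-side problems from CampusCore-side ones, post once directly to chat.postMessage with the same token and channel ID. A standard-library sketch: the mapping from Slack error codes to fixes is this playbook's reading, not Slack's wording, and `post_test_message` needs network access and a real bot token (the helper names are hypothetical).

```python
import json
import urllib.request

# Slack error codes from chat.postMessage and what they usually mean for this setup.
_FIXES = {
    "channel_not_found": "bot not invited to this private channel, or value is a #name rather than an ID",
    "not_in_channel": "bot is not a member: /invite @campuscoreplatform",
    "invalid_auth": "bot token is wrong: check SLACK_BOT_TOKEN",
    "not_authed": "no token was sent: SLACK_BOT_TOKEN is empty",
    "token_revoked": "token was revoked: rotate it (Step 2)",
}


def explain_post_result(payload: dict) -> str:
    """Turn a chat.postMessage JSON response into a one-line diagnosis."""
    if payload.get("ok"):
        return "ok"
    error = payload.get("error", "unknown_error")
    return f"{error}: {_FIXES.get(error, 'see the chat.postMessage error list')}"


def post_test_message(token: str, channel_id: str, text: str = "Slack wiring test") -> str:
    """Call chat.postMessage directly (requires network access)."""
    req = urllib.request.Request(
        "https://slack.com/api/chat.postMessage",
        data=json.dumps({"channel": channel_id, "text": text}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return explain_post_result(json.load(resp))
```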

Posts succeed but show as a bare username instead of the bot's display name¶

The CampusCorePlatform app already has a display name configured, so this shouldn't happen for our workspace. If you do see it (e.g., after reinstalling the app from scratch): api.slack.com/apps → CampusCorePlatform → App Home → Edit display info — give the bot a name and avatar. Cosmetic only; doesn't affect functionality.

`users_conversations` returns an empty list even though the bot was invited¶

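users_conversations (Slack's users.conversations method) lists only the conversations the calling token can see, and its types argument defaults to public_channel, so a private channel the bot was invited to is omitted unless you pass types="public_channel,private_channel" (which also requires the groups:read scope). Separately, if the workspace owner has restricted the bot's discovery, the bot can still post to a channel that this API doesn't list. Neither is a bug, just an artifact of the API's scoping; posts still work.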
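To see which channels the bot actually belongs to, ask for both channel types and follow the cursor pagination. A sketch assuming `client` is a slack_sdk `WebClient` created with the bot token (for example `WebClient(token=os.environ["SLACK_BOT_TOKEN"])`); `bot_channel_ids` is a hypothetical name.

```python
def bot_channel_ids(client, types: str = "public_channel,private_channel") -> list[str]:
    """List IDs of channels the bot belongs to, following cursor pagination."""
    ids: list[str] = []
    cursor = None
    while True:
        kwargs = {"types": types, "exclude_archived": True, "limit": 200}
        if cursor:
            kwargs["cursor"] = cursor
        resp = client.users_conversations(**kwargs)
        ids.extend(ch["id"] for ch in resp["channels"])
        # An empty or missing next_cursor means this was the last page.
        cursor = (resp.get("response_metadata") or {}).get("next_cursor")
        if not cursor:
            return ids
```

If the tenant's channel ID shows up here, membership is fine and any remaining problem is on the CampusCore side.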

Slack channel ID changed (e.g., deleted and recreated)¶

Channel IDs are stable for the lifetime of a channel — archive + unarchive keeps the same ID, but delete + recreate produces a new one. If you ever recreate a channel:

gh variable set SLACK_CHANNEL_WORKFLOW_RUNS --env <tenant-env> --body 'C0NEW...'

Redeploy. The old ID stops resolving.

Common mistakes¶

Sharing one Slack channel across tenants. Tempting for the first one or two universities, but it makes routing per-tenant alerts impossible later: you can't filter the channel feed by tenant since the message body is the only discriminator. One channel per GitHub Environment is the model.
Setting SLACK_CHANNEL_WORKFLOW_RUNS as a secret. Channel IDs aren't credentials, and the workflow reads vars.SLACK_CHANNEL_WORKFLOW_RUNS, so a value stored in the secrets namespace resolves to empty. Use variables.
Using #channel-name instead of C0XXXXXXXX. The Slack API requires the ID. Names look stable but they aren't (channels can be renamed); IDs are.
Creating a separate Slack app per tenant. One workspace, one CampusCorePlatform app, one token. Per-tenant isolation comes from the channel boundary, not the bot identity.
Embedding the bot token in a committed .env. Even though .gitignore excludes .env, a single git add -f .env (or a .gitignore edit that drops the rule) is enough to leak it. Always inject via the GitHub Environment, never via the local .env_sample shape.
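A cheap guard against that last mistake is scanning staged files for token-shaped strings before each commit. A hedged sketch: it only looks for the `xox<letter>-` prefix Slack tokens use (the character set after the prefix is an assumption), and `find_slack_tokens` / `main` are hypothetical names. Wiring it into a hook is up to you, e.g. a pre-commit hook that runs `sys.exit(main(sys.argv[1:]))` with the output of `git diff --cached --name-only --diff-filter=ACM`.

```python
import re
import sys
from pathlib import Path

# Slack tokens start with 'xox' + a type letter (b = bot, p = user, ...) and a dash.
_SLACK_TOKEN = re.compile(r"\bxox[a-z]-[A-Za-z0-9-]{10,}")


def find_slack_tokens(text: str) -> list[str]:
    """Return every substring that looks like a Slack token."""
    return _SLACK_TOKEN.findall(text)


def main(paths: list[str]) -> int:
    """Return 1 if any given file contains something shaped like a Slack token."""
    status = 0
    for p in paths:
        for token in find_slack_tokens(Path(p).read_text(errors="ignore")):
            # Print only a prefix so the check itself doesn't leak the token.
            print(f"{p}: possible Slack token {token[:9]}...", file=sys.stderr)
            status = 1
    return status
```

Placeholders such as `xoxb-<new-token>` don't match, so the examples in this playbook won't trip it.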

Adding more notification channels later¶

Today the only channel we use is workflow_runs. If you later add (say) index_alerts for trigger-only notifications:

Add to settings.NOTIFICATION_CHANNELS (one line):

NOTIFICATION_CHANNELS = {
    'workflow_runs': env('SLACK_CHANNEL_WORKFLOW_RUNS', default=''),
    'index_alerts':  env('SLACK_CHANNEL_INDEX_ALERTS',  default=''),
}

Add the corresponding TF_VAR_* echo to the deploy-app job in .github/workflows/deploy-aws.yml
Add a variable "slack_channel_index_alerts" declaration in infrastructure/app/variables.tf
Add the new env var to shared_env in infrastructure/app/ecs.tf
Per tenant: create the channel, invite the bot, copy the ID, gh variable set SLACK_CHANNEL_INDEX_ALERTS --env <tenant-env> --body 'C0YYY'
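Every step in the checklist above uses a name derived from the channel key, so deriving them in one place avoids typos. A sketch that assumes the workflow_runs convention carries over unchanged (SLACK_CHANNEL_<KEY> for the GitHub variable and container env var, slack_channel_<key> for the Terraform variable); `names_for` is hypothetical.

```python
def names_for(channel_key: str) -> dict[str, str]:
    """Derive every name the new-channel checklist touches for one channel key."""
    if not channel_key.isidentifier() or channel_key != channel_key.lower():
        raise ValueError(f"channel key should be lower_snake_case: {channel_key!r}")
    upper = f"SLACK_CHANNEL_{channel_key.upper()}"
    lower = f"slack_channel_{channel_key}"
    return {
        "settings_key": channel_key,       # key in settings.NOTIFICATION_CHANNELS
        "github_variable": upper,          # gh variable set <this> --env <tenant-env>
        "ecs_env_var": upper,              # shared_env entry in ecs.tf
        "terraform_variable": lower,       # variable "<this>" in variables.tf
        "tf_var_env": f"TF_VAR_{lower}",   # echoed in the deploy-app job
    }
```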

The settings dict is the single source of truth — any call site that wants to post somewhere new just calls notify_workflow_event(channel_key="index_alerts", …) and resolution flows through settings.NOTIFICATION_CHANNELS.

Appendix: App setup reference¶

The CampusCorePlatform Slack app is already installed and shouldn't need to be recreated. This appendix exists only as a reference for "if we ever start a fresh workspace" — read in case of total workspace rebuild, never as a routine setup step.

In api.slack.com/apps:

Create New App → From scratch
App name: CampusCorePlatform (the display name shown in channel members)
Workspace: CampusCore
OAuth & Permissions → Scopes → Bot Token Scopes, add:
chat:write — required, post messages to channels the bot is a member of
chat:write.public — post to public channels without being invited (convenience; we still recommend explicit invites for audit)
channels:read — list public channels (used by diagnostic scripts)
groups:read — list private channels the bot is a member of (used by diagnostic scripts)
App Home → Edit the bot display name and (optionally) avatar
Install to Workspace → Authorize
Copy the Bot User OAuth Token from OAuth & Permissions. Format: xoxb-<numbers>-<numbers>-<random>. Store as the repo-level SLACK_BOT_TOKEN secret per Step 2 above.

No user-token scopes are needed. The app never acts as a real user.