Connector Philosophy

This document captures the why behind how we build connectors and surface their actions to the agent. It is the source of truth for human contributors and for the connector assistant skill. The technical reference (file structure, model definitions, decorator usage) lives in connectors.md; this doc is about the judgment calls.

When in doubt, this doc wins over hunches and convention. Updates to it require explicit review — it's not a casual edit target.

1. Why we hand-roll connectors as the default

Most of our connectors are direct REST clients written in Python with @action-decorated methods. We do not reach for vendor SDKs, community libraries, or community MCP servers as a default. The reasons:

Authentication refresh stays in our control. Authlib's OAuth2Session plus our token_updater closure (token_updater.py) handles OAuth refresh correctly. Most community SDKs and MCP servers assume a long-lived API token and either don't handle refresh at all or handle it in ways that fight our per-tenant Connection model.
Action surface curation. A community Canvas SDK exposes 200+ endpoints. We expose 5. The 200+ surface is noise — it bloats agent context, dilutes tool selection accuracy, and surfaces operations that aren't relevant to student-facing chat. Curation is the value, not coverage.
Tailored return shapes optimized for LLM consumption. Vendor APIs return raw JSON envelopes designed for machine integration. We reshape them into formats that read well in an LLM's working memory (see §6).
Per-tenant customization is cheap. When a specific university asks for a custom Banner action that joins enrollment + financial aid in one call, we ship it in a half-hour task with the connector assistant skill. With vendor SDKs or MCP servers, that's a feature request to the vendor.
No runtime dependency on external MCP server SLAs. A community MCP server going down or going unmaintained breaks our customers. A direct REST client we wrote ourselves only fails if the upstream API fails.
Custom code is no longer expensive. Agentic generation has collapsed the cost of writing typed action wrappers from "one engineer-week" to "an afternoon with review." The historical "always prefer libraries" advice assumed library code was free and our code was expensive. That asymmetry is gone for well-bounded translation work like REST docs → typed methods.
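
As a rough illustration of the first point above, a token updater can be a closure that captures one tenant's connection and persists whatever token the OAuth client hands back after a refresh. This sketch is not the contents of token_updater.py: `FakeConnection`, `make_token_updater`, and the `encrypt` callable are hypothetical stand-ins, and the callback signature is assumed to follow the `update_token(token, refresh_token=None, access_token=None)` shape that Authlib documents for its OAuth2 clients.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FakeConnection:
    """Hypothetical stand-in for a per-tenant Connection row."""
    tenant_id: str
    credentials: bytes = b""


def make_token_updater(
    connection: FakeConnection,
    encrypt: Callable[[bytes], bytes],
) -> Callable[..., None]:
    """Return a callback that stores refreshed tokens on one connection."""

    def update_token(
        token: dict,
        refresh_token: Optional[str] = None,
        access_token: Optional[str] = None,
    ) -> None:
        # The closure captures `connection`, so a refresh for tenant A
        # cannot be written onto tenant B's row.
        connection.credentials = encrypt(json.dumps(token).encode())

    return update_token


# Usage with a no-op "encryption" function, for illustration only.
conn = FakeConnection(tenant_id="uni-a")
updater = make_token_updater(conn, encrypt=lambda raw: raw)
updater({"access_token": "new", "refresh_token": "r1"}, refresh_token="r0")
```

In a real client, a callback like this would presumably be passed as the session's `update_token` argument, with Fernet encryption in place of the no-op lambda.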

This is not a knee-jerk "build everything ourselves" stance. It's a deliberate position grounded in the specific economics of our deployment model and the current state of agentic tooling.

2. When we use vendor MCP (the exception)

There is exactly one test for reaching for a vendor's MCP server instead of hand-rolling:

Does the vendor's MCP server bring runtime intelligence we cannot cheaply replicate?

"Runtime intelligence" means deeply integrated permission systems, audit trails, governance layers, or per-tenant configuration that lives inside the vendor's product and is materially harder to re-implement than to consume.

Currently only ServiceNow meets this bar. ServiceNow's native MCP server (Zurich+ release) is integrated with the Now Platform's ACL model, audit log, and Now Assist governance. Re-implementing those access controls via direct Table API calls would require reimplementing parts of ServiceNow's permissions model per customer instance. That's not "tedious translation work." That's "live integration with a runtime authorization system that varies per customer." Agentic tooling does not collapse the cost of that work.

Microsoft OneDrive has a first-party MCP server but does NOT qualify. The auth and ACL story is not deeper than what we already get from direct Microsoft Graph calls. The MCP server is a thin wrapper around the same API surface.

Canvas IgniteAgent (announced for late 2026) is TBD. Re-evaluate when it ships. If it brings deep integration with Canvas's role/permission model, it may qualify. Until then, hand-roll.

3. What we explicitly do NOT do

These are deliberate exclusions:

Don't use community MCP servers. Quality varies wildly, from weekend projects to mature open-source. The auth model usually assumes a long-lived token and does not fit our OAuth refresh flow. The maintenance commitment is uncertain. Community MCP servers are fine for personal Claude Desktop setups; they are not fit for production CampusCore.
Don't ship CLIs as connectors. CLIs work when the agent runs alongside the developer's machine (Claude Code style). CampusCore runs in ECS Fargate. CLIs would be a deployment burden, not a feature.
Don't auto-regenerate connectors from upstream specs. Versioned APIs are stable within a version (see §7). Regenerating "to keep up" burns review time, risks introducing AI hallucinations into working code, and provides zero value because the API didn't actually break. We regenerate only on explicit deprecation events.
Don't expose every endpoint a system has. Curate based on actual student/staff workflows. If we can't answer "what real question does this action serve?" then the action does not get added.
Don't drift-detect against upstream specs in CI. The conformance harness is a static check on our own code (§11). We do not poll vendor specs, diff schemas, or alert on upstream changes. Drift within a version is handled at runtime via observability, not at CI time.

4. The agentic generation principle

Engineering time for translation work — taking REST docs and producing typed @action methods — is no longer the scarce resource. The scarce resource has shifted to product judgment:

Which actions to expose
What return shapes the LLM consumes well
What permission level each write should require
Which fields to strip and which to preserve
How to aggregate related operations into agent-friendly units

These are taste questions. They benefit from the connector assistant skill as a collaborator but cannot be delegated to it. Every decision goes through human review.

The connector assistant skill (SKILL.md) is a collaborator, not autopilot. It reads this philosophy doc, follows the conventions, produces drafts, and surfaces them for human review. It never auto-merges. It never silently regenerates. It never reaches into live systems.

The conformance harness (test_connector_contract.py) is the safety net that catches AI-generated slop (missing docstrings, missing type hints, unset api_version, malformed aliases) before it ships.

5. Action surface discipline

Every action must answer: what real student/staff question or workflow does this serve?

If the answer is "it's part of the API, so we wrapped it" — the action does not get added. We are not building an API explorer for the agent. We are building a curated capability surface tied to actual product use cases.

Anti-patterns to avoid:

Raw CRUD wrappers. get_resource_by_id(resource_id: str), update_field(resource_id: str, field: str, value: Any). The agent can't use these meaningfully because it doesn't know which IDs or fields matter for the user's question.
Direct API mirrors. If the vendor's API has 30 endpoints under /students/, we don't expose 30 actions called get_student, list_students, get_student_courses, etc. We expose 3-5 actions tied to specific scenarios.
Pure plumbing actions. list_pages, get_metadata, count_records. The agent rarely needs these directly. Build them as private helpers if needed; don't expose them as actions.

Patterns to favor:

Question-driven actions. get_my_holds(), get_upcoming_assignments(), get_advisor_contact(). Each one maps to a question a real student asks.
Aggregated actions. get_student_overview() that joins enrollment + holds + financial aid in one call is better than three separate actions, because the LLM doesn't have to chain calls.
Read-mostly bias. Writes are the exception, not the rule. Most connectors should be 80%+ reads.
Permission-gated writes. Writes must declare side_effect="write" and permission="user_confirm" (or "admin_approve" for high-stakes operations). See §10.
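
The favored patterns above might look roughly like the following. This is a sketch under assumptions: the `action` decorator here is a minimal stand-in (the real decorator's signature is documented in connectors.md and may differ), and `RegistrarConnector`, `Hold`, and `book_advising_appointment` are hypothetical names with hard-coded data instead of upstream calls.

```python
from dataclasses import dataclass
from typing import Callable


def action(side_effect: str = "read", permission: str = "auto") -> Callable:
    """Hypothetical stand-in for the real @action decorator."""
    def wrap(fn: Callable) -> Callable:
        fn.action_meta = {"side_effect": side_effect, "permission": permission}
        return fn
    return wrap


@dataclass
class Hold:
    reason: str
    office: str


class RegistrarConnector:
    """Illustrative connector; the data is hard-coded, not fetched."""

    @action()  # read: runs without confirmation
    def get_my_holds(self) -> list[Hold]:
        """Answer "why can't I register?" for the current student."""
        return [Hold(reason="Unpaid library fine", office="Bursar")]

    @action(side_effect="write", permission="user_confirm")
    def book_advising_appointment(self, slot_iso: str) -> str:
        """Book an advising slot; the user must confirm before it runs."""
        return f"Requested advising slot {slot_iso}"


holds = RegistrarConnector().get_my_holds()
```

Note that neither method takes a raw resource ID or field name: the question being answered is fixed by the method itself, which is what separates it from a CRUD wrapper.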

6. Return shape philosophy

Optimize for how an LLM reads JSON, not how a machine integration consumes it.

Strip noise. Vendor responses include internal IDs, audit metadata, deprecation hints, debug fields, and reserved fields. None of these help the LLM. Drop them unless the agent specifically needs them.

Preserve human-readable field names. Use student_name not stdnt_nm. Use due_date not dt_due_dttm. If the vendor uses cryptic abbreviations, rename in our wrapper.

Always declare typed return shapes. Never -> dict or -> Any. Use Pydantic models or precise type hints (list[CourseInfo], StudentOverview). The conformance harness enforces this.

Returns must be JSON-serializable via json.dumps(default=str) so the agent bridge can stringify them for the LLM. Datetimes, UUIDs, Decimals — convert to strings or floats in the action method, not at the bridge.
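
A small standard-library example of the difference between relying on the `default=str` fallback and converting inside the action. The field names and values are invented for illustration.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal
from uuid import UUID

raw = {
    "due_at": datetime(2026, 4, 13, 17, 0, tzinfo=timezone.utc),
    "assignment_id": UUID("12345678-1234-5678-1234-567812345678"),
    "balance": Decimal("120.50"),
}

# Safety net: default=str stringifies anything json cannot encode natively.
fallback = json.dumps(raw, default=str)

# Preferred: the action chooses each representation explicitly.
shaped = {
    "due_at": raw["due_at"].isoformat(),
    "assignment_id": str(raw["assignment_id"]),
    "balance": float(raw["balance"]),
}
explicit = json.dumps(shaped)
```

With the fallback, the datetime is rendered via `str()` (space-separated rather than ISO 8601 with a `T`) and the Decimal becomes a quoted string; the explicit version yields an ISO timestamp and a JSON number. Converting in the action keeps that choice deliberate.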

Flatten deeply nested envelopes. If the vendor returns:

{"data": {"attributes": {"student": {"profile": {"name": "..."}}}}}

our action should return:

{"name": "...", ...}

Not because the original is wrong, but because the LLM wastes tokens parsing the envelope and is more likely to extract the wrong field.
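
One way the flattening could be written, using a dataclass as the typed return shape (a Pydantic model would serve the same role). `StudentProfile`, `flatten_student`, and the `email` and `internal_id` fields are hypothetical additions to the envelope shown above.

```python
from dataclasses import asdict, dataclass


@dataclass
class StudentProfile:
    name: str
    email: str


def flatten_student(envelope: dict) -> StudentProfile:
    """Pull the useful fields out of the nested vendor envelope."""
    profile = envelope["data"]["attributes"]["student"]["profile"]
    return StudentProfile(name=profile["name"], email=profile["email"])


vendor_response = {
    "data": {
        "attributes": {
            "student": {
                "profile": {
                    "name": "Alice Example",
                    "email": "alice@example.edu",
                    "internal_id": 98231,  # noise: not copied to the result
                }
            }
        }
    }
}

flat = asdict(flatten_student(vendor_response))
```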

Aggregate where it helps. If the agent will almost always need fields A, B, and C together, return them together. Don't force three sequential tool calls.

7. API version pinning is set-and-forget

Every system we integrate has a versioned REST API. The version style varies (URL path, header content negotiation, query param), but the contract within a version is the same: vendors do not break it.

Pinning rules:

Pin api_version at the connector level when the connector is first created. Set BaseConnector.api_version and api_version_style explicitly. Never leave them empty (the conformance harness enforces this).
Trust the vendor's contract within a version. Additive changes (new fields, new optional params, new endpoints) are expected and safe. Breaking changes don't happen within a version.
Migrate only on explicit deprecation announcements. Not on speculation, not on "the latest is now v2," not on automated drift detection.
Document the version in the connector class docstring so future maintainers see it without having to read the code:

class CanvasConnector(BaseConnector):
    """
    Canvas LMS OAuth2 connector.

    API: v1
    API style: URL path versioning
    Deprecation status: None announced. v1 has been stable since 2014.
    Last reviewed: 2026-04-13
    """
    slug = "canvas"
    api_version = "v1"
    api_version_style = "url_path"

The Last reviewed date is institutional memory: "we checked the deprecation status at this date." When someone touches the connector in the future, they decide whether to bump the date based on whether they re-verified.

Action contracts vs API contracts. ActionMeta.version is a separate concept — it's our internal contract version for an action's signature. Bump it when WE change the params or return shape. Use ActionMeta.aliases to keep old names resolvable for any rename. These are independent of the upstream API version.

Per-resource versioning (Banner Ethos): some systems version individual resources independently. Use ActionMeta.api_version_override per action to express this. The default falls back to the connector's api_version.
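
The two ActionMeta ideas above (aliases for renames, and a per-action API version override that falls back to the connector's version) can be sketched as follows. The `ActionMeta` class here is a trimmed, hypothetical stand-in with only the fields named in this section; `build_lookup`, `effective_api_version`, and all version numbers are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ActionMeta:
    """Hypothetical, trimmed-down stand-in for the real ActionMeta."""
    name: str
    version: int = 1
    aliases: list[str] = field(default_factory=list)
    api_version_override: Optional[str] = None


def build_lookup(actions: list[ActionMeta]) -> dict[str, ActionMeta]:
    """Map every primary name and alias to its action, rejecting collisions."""
    lookup: dict[str, ActionMeta] = {}
    for meta in actions:
        for key in [meta.name, *meta.aliases]:
            if key in lookup:
                raise ValueError(f"name collision on {key!r}")
            lookup[key] = meta
    return lookup


def effective_api_version(connector_api_version: str, meta: ActionMeta) -> str:
    """Use the action's override when set, else the connector's pinned version."""
    return meta.api_version_override or connector_api_version


# get_holds was renamed to get_my_holds, so our action version was bumped.
registry = build_lookup([
    ActionMeta(name="get_my_holds", version=2, aliases=["get_holds"]),
    ActionMeta(name="get_my_profile", api_version_override="v12"),
])
```

The action `version` and the upstream API version never interact in this sketch, which mirrors the point that they are independent contracts.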

8. State separation

Connections carry two kinds of data:

Authentication and credentials → Connection.credentials. Fernet-encrypted opaque blob. Owned by the auth flow. Touched only by get_client and the token updater.
Everything else (sync cursors, cache entries, webhook subscription IDs, rate-limit counters, per-connection feature flags) → ConnectionState sidecar via connection.state_get(key) / connection.state_set(key, value).

Never mix the two. Sync cursors do not belong in the credentials blob. OAuth tokens do not belong in ConnectionState. The separation makes auth refresh logic safe to write and operational state easy to inspect.

ConnectionState is the right home for anything that:

Needs to be read or written by middleware (rate limiting, caching)
Needs to survive across action calls but is not authentication
Has a clear key name and a JSON-serializable value
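
A minimal in-memory sketch of the separation, assuming the `state_get(key)` / `state_set(key, value)` interface named above. `InMemoryConnection` and the `default` parameter are hypothetical; the real sidecar is persisted rather than held in a dict.

```python
import json
from typing import Any


class InMemoryConnection:
    """Hypothetical connection with credentials and state kept apart."""

    def __init__(self, credentials: bytes) -> None:
        self.credentials = credentials      # opaque, encrypted blob
        self._state: dict[str, str] = {}    # sidecar: JSON text per key

    def state_get(self, key: str, default: Any = None) -> Any:
        raw = self._state.get(key)
        return default if raw is None else json.loads(raw)

    def state_set(self, key: str, value: Any) -> None:
        # json.dumps raises TypeError for non-serializable values,
        # which enforces the "JSON-serializable value" rule at write time.
        self._state[key] = json.dumps(value)


conn = InMemoryConnection(credentials=b"<encrypted>")
conn.state_set("sync_cursor", {"page": 3, "updated_since": "2026-04-01"})
cursor = conn.state_get("sync_cursor")
```

Nothing in `state_set` can reach `credentials`, so operational writes cannot corrupt the auth blob.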

9. Progressive disclosure at the agent layer

The agent does not see all connector actions on every turn. It always sees two bootstrap tools (list_my_connections, load_connector_actions); action tools appear in its toolset only after the agent explicitly loads a connector.

Why: Frontloading every action across every connector into the system prompt:

Bloats context (a single connector with 50 actions can consume 50k+ tokens)
Defeats provider-side schema validation (forcing the model to emit JSON inside JSON)
Degrades reliability as the catalog grows (more tools = lower selection accuracy, well-documented by Anthropic)

The pattern we use mirrors Anthropic's official Tool Search / Agent Skills design (3 levels: metadata → instructions → resources). The agent discovers what's available, loads what it needs, calls typed tools directly. This is also how coding agents work — list, read, grep instead of pre-indexing every file.

The implementation is in connector_tool_factory.py and the bootstrap tools alongside it. The agent loop (agent_core.py) rebuilds the tool list on each iteration based on loaded_connectors run state.
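
The per-iteration rebuild can be pictured with a toy version. This is not the code in connector_tool_factory.py or agent_core.py: `CATALOG`, `build_tool_list`, and the `slug.action` naming scheme are assumptions made for the sketch.

```python
BOOTSTRAP_TOOLS = ["list_my_connections", "load_connector_actions"]

# Hypothetical catalog: connector slug -> action tool names.
CATALOG = {
    "canvas": ["get_upcoming_assignments", "get_my_courses"],
    "banner": ["get_my_holds", "get_student_overview"],
}


def build_tool_list(loaded_connectors: set[str]) -> list[str]:
    """Bootstrap tools always; action tools only for loaded connectors."""
    tools = list(BOOTSTRAP_TOOLS)
    for slug in sorted(loaded_connectors):
        tools.extend(f"{slug}.{name}" for name in CATALOG[slug])
    return tools


turn_1 = build_tool_list(set())        # before anything is loaded
turn_2 = build_tool_list({"canvas"})   # after loading the Canvas connector
```

On the first turn the agent sees only the two bootstrap tools; Banner's actions never enter the context unless that connector is loaded.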

Anti-pattern to avoid: the previous design had a single connector_action tool whose params: str argument carried a JSON string inside the tool call. This defeated schema validation and grew unboundedly. We deleted it. Don't bring it back.

10. Permission and confirmation philosophy

Reads are auto. Writes are gated.

`permission`	Behavior	Use for
`"auto"`	Executes immediately, no confirmation	All read actions
`"user_confirm"`	Agent surfaces a confirmation event the user must approve	Most writes (create ticket, send email, book appointment)
`"admin_approve"`	Requires admin approval, not user approval	High-stakes operations (grade changes, enrollment changes, financial actions)

Rules:

The agent never silently mutates external state. Every write surfaces a confirmation event that must be approved before it runs: by the user for "user_confirm", by an admin for "admin_approve".
Writes must declare both side_effect="write" AND permission (the latter is not implied).
High-stakes writes (anything involving grades, money, enrollment, or compliance-sensitive data) use permission="admin_approve". When in doubt, admin-approve.
ActionMeta.idempotency_key should be set on writes so future RetryMiddleware can safely retry on transient failures without creating duplicates.
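
The declaration rules above lend themselves to a mechanical check. The sketch below is one possible formulation, not a description of what the conformance harness currently does; `ActionMeta` is again a trimmed hypothetical stand-in and `check_permission_rules` is an invented helper.

```python
from dataclasses import dataclass
from typing import Optional

VALID_PERMISSIONS = {"auto", "user_confirm", "admin_approve"}


@dataclass
class ActionMeta:
    """Hypothetical stand-in with only the fields this check needs."""
    name: str
    side_effect: str = "read"
    permission: Optional[str] = None


def check_permission_rules(meta: ActionMeta) -> list[str]:
    """Return rule violations for one action (an empty list means OK)."""
    problems = []
    if meta.permission is not None and meta.permission not in VALID_PERMISSIONS:
        problems.append(f"{meta.name}: unknown permission {meta.permission!r}")
    if meta.side_effect == "write":
        if meta.permission is None:
            # permission is never implied by side_effect="write"
            problems.append(f"{meta.name}: write must declare permission")
        elif meta.permission == "auto":
            problems.append(f"{meta.name}: write cannot be 'auto'")
    return problems


ok = check_permission_rules(ActionMeta("create_ticket", "write", "user_confirm"))
missing = check_permission_rules(ActionMeta("send_email", "write"))
silent = check_permission_rules(ActionMeta("change_grade", "write", "auto"))
```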

The confirmation flow lands when the first write connector ships. Until then, the permission field is recorded but unused at runtime — that's fine, it's already in the schema for when we need it.

11. The conformance harness is a self-check, not drift detection

test_connector_contract.py is the contract test that runs in CI on every PR. It validates our code's internal consistency:

Every connector declares slug, kind, display_name
Every connector declares api_version and api_version_style
Every action has a non-empty docstring
Every action parameter has a real type annotation (not Any, except for explicit params/kwargs)
Every action's aliases don't collide with primary names
Every connector loads without errors

It does NOT:

Make live API calls to vendor systems
Diff our schemas against upstream OpenAPI specs
Detect upstream API drift
Validate response shapes against real responses
Poll vendor docs for changes

When the harness fails, it's because we shipped slop — not because the vendor changed something. This narrow scope is intentional: we want a fast, deterministic, network-free CI check. Drift within a version is rare and handled at runtime via observability (Sentry alerts on 4xx/5xx spikes, log entries on unexpected response shapes). It is not a CI concern.

The harness is also the safety net for AI-generated code. When the connector assistant skill produces a draft, the human reviewing the diff also runs the harness. Hallucinations get caught mechanically before merge.
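
To give a feel for how cheap such static checks are, here is a sketch of the docstring and type-annotation checks using only `inspect`. It is not the content of test_connector_contract.py; `check_action` and the two sample functions are hypothetical.

```python
import inspect
from typing import Any, Callable


def check_action(fn: Callable) -> list[str]:
    """Return contract violations for one action method."""
    problems = []
    if not (fn.__doc__ or "").strip():
        problems.append(f"{fn.__name__}: missing docstring")
    sig = inspect.signature(fn)
    for name, param in sig.parameters.items():
        if name == "self":
            continue
        if param.annotation is inspect.Parameter.empty or param.annotation is Any:
            problems.append(f"{fn.__name__}: parameter {name!r} needs a real type")
    if sig.return_annotation in (inspect.Signature.empty, dict, Any):
        problems.append(f"{fn.__name__}: return type must be specific")
    return problems


def good(self, course_id: str) -> list[str]:
    """List assignment titles for one course."""
    return []


def sloppy(self, thing, opts: Any) -> dict:
    return {}


good_report = check_action(good)
sloppy_report = check_action(sloppy)
```

Everything here runs against our own function objects, with no network access, which is what keeps a check like this fast and deterministic.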

12. When in doubt, hand-roll

The default for every new system is custom Python with @action-decorated methods. Reach for the connector assistant skill (SKILL.md) to make this fast.

Reserve kind=mcp for systems that meet the §2 test (currently: ServiceNow only).

Reserve community SDKs and community MCP servers for last resort, never for production paths.

When you are about to make an exception to any of the rules in this doc, write down the reason in the connector's class docstring so the next person can see why. Exceptions are allowed. Silent exceptions are not.

13. Authorization scoping: upstream vs local

Connectors fall into two authorization models, and the model determines how much enforcement work CampusCore has to do itself. This is a platform-level distinction that shapes identity mapping, middleware, and the entire security posture of a connector — it must be decided before writing code, not discovered later.

Upstream-scoped (the preferred model)

The vendor exposes per-user OAuth2 (or equivalent delegated-permission flow):

The workspace admin registers CampusCore as an authorized application with the vendor (Canvas Developer Keys, Google Cloud Console, Azure AD app registration, etc.) and pastes CLIENT_ID / CLIENT_SECRET into workspace config via /settings/workspace-connectors/. This is the "permission to initiate OAuth flows" layer.
Each individual user then clicks "Connect", hits the vendor's consent screen, and grants CampusCore permission to access their data. CampusCore stores that user's access token per-user on Connection.credentials (Fernet-encrypted).
When the agent makes a call, it uses that specific user's token, and the vendor enforces "Alice's token can only see Alice's courses."

Canvas, Google Drive, and OneDrive all work this way. For these connectors, CampusCore does essentially zero authorization enforcement — the vendor does it for us. This is why those connectors' action methods can safely take bare parameters like per_page without worrying about cross-user data leakage.

Locally-scoped (service-account connectors)

The vendor only issues an institution-level credential — one API key or service account that can see everything the tenant can see. There is no per-user consent flow, no per-user token, no upstream scoping.

Banner/Ethos is the canonical example. Ellucian mints a single ETHOS_API_KEY that reads any student's data for the whole institution. When a CampusCore user asks the agent "show me student X's holds," the upstream cannot distinguish between a student asking about themselves and a student snooping on a peer — it sees only "app CampusCore, with institution-level key, asking about student X."

This means CampusCore itself must enforce authorization before the action runs. Three pieces are mandatory for any locally-scoped connector:

Identity mapping. Each CampusCore user needs a stable link to their upstream identity (e.g., banner_id), populated at SSO time from SAML/OIDC attributes or by admin fallback. Without this the connector cannot tell self-queries from peer-queries.
A connector-specific authorization middleware on <Connector>.middleware (the hook is defined in base.py and is an empty list by default, overridable per connector). The middleware inspects ctx.user, their RBAC role, and the action parameters, then raises PermissionDenied before the upstream call if the rules don't check out. The rules are always domain-specific (a Banner faculty member can see their own roster; a PeopleSoft HR connector might have different rules entirely), so the middleware lives in the connector's own file, not in the generic platform.
A loud failure mode when identity mapping is missing. If a user's linked identity is not set, self-scoped actions must refuse with a clear error message, not silently return empty results. An empty-result failure mode is worse than a PermissionDenied — it looks like "no data" instead of "you aren't linked yet."
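
The three pieces above can be combined into a small authorization check. This is a sketch only: the real middleware signature in base.py is not shown in this document, so `authorize_self_scoped` is written as a plain function, and the `User` class, the `advisor` role, and the `student_id` parameter name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


class PermissionDenied(Exception):
    """Raised before the upstream call when a request is not allowed."""


@dataclass
class User:
    username: str
    role: str
    banner_id: Optional[str] = None  # linked upstream identity, if any


def authorize_self_scoped(user: User, params: dict) -> None:
    """Advisors may query any student; students only their own record."""
    if user.role == "advisor":
        return
    if user.banner_id is None:
        # Fail loudly: an empty result would look like "no data".
        raise PermissionDenied(
            f"{user.username} has no linked Banner identity yet"
        )
    if params.get("student_id") != user.banner_id:
        raise PermissionDenied("students may only query their own record")


alice = User("alice", "student", banner_id="B001")
authorize_self_scoped(alice, {"student_id": "B001"})  # allowed, no exception
```

The unlinked-identity case raises rather than returning an empty list, so the user sees "you aren't linked yet" instead of a misleading "no holds found."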

How to decide which model a new connector needs

Read the vendor's auth docs before writing any code:

Phrases that indicate upstream-scoped: "per-user OAuth consent," "authorization code flow," "delegated permissions," "user scopes," "on-behalf-of flow."
Phrases that indicate locally-scoped: "API key," "client credentials grant," "integration user," "service account," "application permissions."

State the model explicitly to the human reviewer before generating code. Never silently default.

When in doubt, upstream-scoped is cheaper. If the vendor offers both (some do — Salesforce, Microsoft Graph with delegated vs. application permissions), pick upstream-scoped unless there's a specific reason not to. Locally-scoped connectors compound CampusCore's audit and compliance surface because every authorization decision becomes our decision, not the vendor's.

Relationship to §10

§10 ("Permission and confirmation philosophy") gates writes behind user confirmation. §13 (this section) gates reads and writes behind identity scoping. They are orthogonal:

An upstream-scoped connector still needs §10 confirmation for writes, but §13 is free (the vendor enforces scoping).
A locally-scoped connector needs both: §13 authorization middleware on every self-scoped action and §10 confirmation on every write.

Both must be satisfied. Neither is optional for the case it covers.