Audit an existing Sim knowledge base connector against the service API docs and repository conventions, then report and fix issues in auth, config fields, pagination, document mapping, tags, and registry entries. Use when validating or repairing code in `apps/sim/connectors/{service}/`.
You are an expert auditor for Sim knowledge base connectors. Your job is to thoroughly validate that an existing connector is correct, complete, and follows all conventions.
When the user asks you to validate a connector:
Read every file for the connector — do not skip any:
- `apps/sim/connectors/{service}/{service}.ts`: connector implementation
- `apps/sim/connectors/{service}/index.ts`: barrel export
- `apps/sim/connectors/registry.ts`: connector registry entry
- `apps/sim/connectors/types.ts`: `ConnectorConfig` interface, `ExternalDocument`, etc.
- `apps/sim/connectors/utils.ts`: shared utilities (`computeContentHash`, `htmlToPlainText`, etc.)
- `apps/sim/lib/oauth/oauth.ts`: `OAUTH_PROVIDERS`, the single source of truth for scopes
- `apps/sim/lib/oauth/utils.ts`: `getCanonicalScopesForProvider`, `getScopesForService`, `SCOPE_DESCRIPTIONS`
- `apps/sim/lib/oauth/types.ts`: `OAuthService` union type
- `apps/sim/components/icons.tsx`: icon definition for the service
If the connector uses selectors, also read:
- `apps/sim/hooks/selectors/registry.ts`: selector key definitions
- `apps/sim/hooks/selectors/types.ts`: `SelectorKey` union type
- `apps/sim/lib/workflows/subblocks/context.ts`: `SELECTOR_CONTEXT_FIELDS`
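When the connector has selectors, the canonical-pair rules checked in the config-field step (matching `canonicalParamId`, `basic`/`advanced` modes, identical `required`, registered `selectorKey`, `dependsOn` pointing at field ids) are mechanical enough to script. The TypeScript below is a minimal sketch, assuming a simplified field shape (`ConfigFieldSketch`) and a hypothetical helper name `findSelectorPairIssues`; the real field type in `apps/sim/connectors/types.ts` may differ, and the set of known selector keys would be collected from `hooks/selectors/registry.ts`.

```typescript
// Simplified, hypothetical view of a connector config field. The real type lives
// in apps/sim/connectors/types.ts and may carry more properties.
interface ConfigFieldSketch {
  id: string
  title: string
  type: string // e.g. 'selector', 'short-input', 'dropdown'
  required?: boolean
  selectorKey?: string
  canonicalParamId?: string
  mode?: 'basic' | 'advanced'
  dependsOn?: string[]
}

// Returns human-readable issues for the selector / config-field checks.
function findSelectorPairIssues(
  fields: ConfigFieldSketch[],
  knownSelectorKeys: ReadonlySet<string>,
): string[] {
  const issues: string[] = []
  const byCanonical = new Map<string, ConfigFieldSketch[]>()

  for (const field of fields) {
    if (field.required === undefined) {
      issues.push(`${field.id}: required is not set explicitly`)
    }
    if (field.canonicalParamId) {
      const group = byCanonical.get(field.canonicalParamId) ?? []
      group.push(field)
      byCanonical.set(field.canonicalParamId, group)
    }
  }

  // Each canonicalParamId should be backed by one basic selector and one advanced manual input.
  byCanonical.forEach((group, canonicalId) => {
    const selector = group.find((f) => f.type === 'selector')
    const manual = group.find((f) => f.type === 'short-input')
    if (!selector || !manual) {
      issues.push(`${canonicalId}: expected a selector + short-input pair`)
      return
    }
    if (selector.mode !== 'basic') issues.push(`${selector.id}: selector should use mode 'basic'`)
    if (manual.mode !== 'advanced') issues.push(`${manual.id}: short-input should use mode 'advanced'`)
    if (selector.required !== manual.required) {
      issues.push(`${canonicalId}: required differs within the pair`)
    }
    if (!selector.selectorKey || !knownSelectorKeys.has(selector.selectorKey)) {
      issues.push(`${selector.id}: selectorKey "${selector.selectorKey}" is not registered`)
    }
  })

  // dependsOn must point at field ids, not at canonicalParamId values.
  const ids = new Set(fields.map((f) => f.id))
  for (const field of fields) {
    for (const dep of field.dependsOn ?? []) {
      if (!ids.has(dep)) {
        const hint = byCanonical.has(dep) ? ' (this is a canonicalParamId)' : ''
        issues.push(`${field.id}: dependsOn "${dep}" is not a field id${hint}`)
      }
    }
  }
  return issues
}
```

A script like this only catches structural slips; reading the files is still required to judge whether each selector actually returns the right data.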
Fetch the official API docs for the service. This is the source of truth for:
Use Context7 (resolve-library-id → query-docs) or WebFetch to retrieve documentation. If both fail, note which claims are based on training knowledge vs verified docs.
If the service docs do not clearly show document list responses, document fetch responses, metadata fields, or pagination shapes, you MUST tell the user instead of guessing.
If a schema is unknown, validation must explicitly recommend:
For every API call in the connector (listDocuments, getDocument, validateConfig, and any helper functions), verify against the API docs:
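- The auth header matches what the API expects (e.g., `Authorization: Bearer ${accessToken}`)
- `Content-Type` is set for POST/PUT/PATCH requests
- Service-specific headers are present where required (e.g., `Notion-Version`, `Dropbox-API-Arg`)
- Optional parameters are omitted when unset (not sent as `null` or empty unless the API expects that)
- OData `$filter`: single quotes escaped with `''` (e.g., `externalId.replace(/'/g, "''")`)
- SOQL: single quotes escaped with `\'`
- Path segments and query values have `encodeURIComponent()` applied
- Domain and URL config fields (e.g., `siteUrl`, `instanceUrl`) are normalized:
  - Strip the `https://` / `http://` prefix if the API expects bare domains
  - Strip a trailing `/`
  - Apply `.trim()` before validation
- Response data is read from the correct field (e.g., `data.results` vs `data.items` vs `data`)
- Optional response fields are handled with `?? null` or `|| undefined`

Scopes must be correctly declared and sufficient for all API calls the connector makes.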
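Both scope requirements reduce to set differences: required scopes missing from the provider config, and scopes an endpoint needs (per the API docs) that `requiredScopes` does not list. The sketch below assumes a simplified `OAUTH_PROVIDERS` shape and uses hypothetical helper names; the per-endpoint scope map is something the auditor assembles by hand from the service docs, not a structure that exists in the repository.

```typescript
// Hypothetical, simplified shape of OAUTH_PROVIDERS in lib/oauth/oauth.ts:
// providerGroup -> services -> serviceId -> scopes.
type ProviderScopesSketch = Record<string, { services: Record<string, { scopes: string[] }> }>

// Scopes the connector requires that the OAuth provider config never requests.
function scopesMissingFromProvider(
  requiredScopes: readonly string[],
  providers: ProviderScopesSketch,
  providerGroup: string,
  serviceId: string,
): string[] {
  const declared = new Set(providers[providerGroup]?.services[serviceId]?.scopes ?? [])
  return requiredScopes.filter((scope) => !declared.has(scope))
}

// Per-endpoint scopes (taken from the API docs) that requiredScopes does not cover.
function endpointScopesNotRequired(
  endpointScopes: Record<string, readonly string[]>,
  requiredScopes: readonly string[],
): Array<{ endpoint: string; scope: string }> {
  const required = new Set(requiredScopes)
  const gaps: Array<{ endpoint: string; scope: string }> = []
  for (const [endpoint, scopes] of Object.entries(endpointScopes)) {
    for (const scope of scopes) {
      if (!required.has(scope)) gaps.push({ endpoint, scope })
    }
  }
  return gaps
}
```

An empty result from both functions corresponds to `requiredScopes` ⊆ provider scopes and every documented endpoint scope being declared.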
- `requiredScopes` in the connector's auth config lists all scopes needed by the connector
- Each scope in `requiredScopes` is a real, valid scope recognized by the service's API
- Every scope in `requiredScopes` exists in the OAuth provider's `scopes` array in `lib/oauth/oauth.ts`
- Look up the provider's scopes at `OAUTH_PROVIDERS[providerGroup].services[serviceId].scopes`
- Verify `requiredScopes` ⊆ `OAUTH_PROVIDERS` scopes (every required scope is present in the provider config)

For each API endpoint the connector calls:
- Verify that the scopes the API docs require for it are included in `requiredScopes`
- If the documented scopes do not match `requiredScopes`, flag as warning

Token refresh:
- Check the `getOAuthTokenRefreshConfig` function in `lib/oauth/oauth.ts` for this provider
- `useBasicAuth` matches the service's token exchange requirements
- `supportsRefreshTokenRotation` matches whether the service issues rotating refresh tokens

Pagination:
- The cursor or page token is read from the correct response field (e.g., `next_cursor`, `nextPageToken`, `@odata.nextLink`, `offset`)
- `hasMore` is correctly determined from the response
- `nextCursor` is correctly passed back for the next page
- The `maxItems` / `maxRecords` cap is correctly applied across pages using `syncContext.totalDocsFetched`
- If a `maxItems` cap exists, the final page request uses `Math.min(PAGE_SIZE, remaining)` to avoid fetching more records than needed
- `syncContext` is used to cache state across pages (user names, field maps, instance URLs, portal IDs, etc.)
- `syncContext` is correctly initialized on the first page and reused on subsequent pages

Connectors that require per-document API calls to fetch content (file download, export, blocks fetch) MUST use `contentDeferred: true`. This is the standard pattern for reliability: without it, content downloads during listing can exhaust the sync task's time budget before any documents are saved.
- If the connector downloads content per document during `listDocuments`, it MUST use `contentDeferred: true` instead
- `listDocuments` returns lightweight stubs with `content: ''` and `contentDeferred: true`
- `getDocument` fetches the actual content and returns the full document with `contentDeferred: false`
- A shared stub function (e.g., `fileToStub`) is used by both `listDocuments` and `getDocument` to guarantee `contentHash` consistency
- `contentHash` is metadata-based (e.g., `service:{id}:{modifiedTime}`), NOT content-based; it must be derivable from list metadata alone
- `contentHash` is identical whether produced by `listDocuments` or `getDocument`

Connectors where the list API already returns content inline (e.g., Slack messages, Reddit posts) do NOT need `contentDeferred`.
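The deferred-content pattern can be sketched as follows. The types, `SERVICE_PREFIX`, and `completeDeferredDocument` are hypothetical stand-ins; a real `getDocument` receives an `externalId`, fetches fresh metadata, and downloads through `fetchWithRetry`, which this sketch abstracts as an injected `downloadText` callback.

```typescript
// Hypothetical, simplified types. The real ExternalDocument lives in
// apps/sim/connectors/types.ts; the file metadata shape depends on the service.
interface FileMetaSketch {
  id: string
  name: string
  modifiedTime: string
  webUrl: string
}

interface ExternalDocumentSketch {
  externalId: string
  title: string
  content: string
  mimeType: 'text/plain'
  contentHash: string
  sourceUrl: string
  contentDeferred: boolean
  metadata: Record<string, unknown>
}

const SERVICE_PREFIX = 'example' // hypothetical; use the connector's own prefix

// Shared by listDocuments and getDocument so both derive contentHash the same way.
function fileToStub(file: FileMetaSketch): ExternalDocumentSketch {
  return {
    externalId: file.id,
    title: file.name || 'Untitled',
    content: '',
    mimeType: 'text/plain',
    // Metadata-based hash: computable from list metadata alone, no download needed.
    contentHash: `${SERVICE_PREFIX}:${file.id}:${file.modifiedTime}`,
    sourceUrl: file.webUrl,
    contentDeferred: true,
    metadata: { modifiedTime: file.modifiedTime },
  }
}

// getDocument side: start from the same stub, then attach the downloaded text.
async function completeDeferredDocument(
  file: FileMetaSketch,
  downloadText: (fileId: string) => Promise<string>,
): Promise<ExternalDocumentSketch> {
  const stub = fileToStub(file)
  const content = await downloadText(file.id)
  return { ...stub, content, contentDeferred: false }
}
```

Because the hash depends only on `id` and `modifiedTime`, the listing stub and the `getDocument` result agree as long as the file has not changed in between; a real change produces a new hash, which is the intended re-sync trigger.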
Document mapping:
- `externalId` is a stable, unique identifier from the source API
- `title` is extracted from the correct field and has a sensible fallback (e.g., `'Untitled'`)
- `content` is plain text; HTML content is stripped using `htmlToPlainText` from `@/connectors/utils`
- `mimeType` is `'text/plain'`
- `contentHash` uses a metadata-based format (e.g., `service:{id}:{modifiedTime}`) for connectors with `contentDeferred: true`, or `computeContentHash` from `@/connectors/utils` for inline-content connectors
- `sourceUrl` is a valid, complete URL back to the original resource (not relative)
- `metadata` contains all fields referenced by `mapTags` and `tagDefinitions`
- Size checks use `Buffer.byteLength(text, 'utf8')`, not `text.length`, when comparing against byte-based limits (e.g., `MAX_FILE_SIZE` in bytes)

Tags:
- Each `tagDefinition` has an `id`, `displayName`, and `fieldType`
- `fieldType` matches the actual data type: `'text'` for strings, `'number'` for numbers, `'date'` for dates, `'boolean'` for booleans
- Every `id` in `tagDefinitions` is returned by `mapTags`
- No `tagDefinition` references a field that `mapTags` never produces
- Keys returned by `mapTags` match the `tagDefinition` `id` values exactly
- Dates are parsed with `parseTagDate` from `@/connectors/utils`
- Arrays are joined with `joinTagArray` from `@/connectors/utils`
- Numeric values are validated (no `NaN`)
- Metadata field names read in `mapTags` match what `listDocuments`/`getDocument` store in `metadata`

Config fields:
- Every field has `id`, `title`, `type`
- `required` is set explicitly (not omitted)
- Dropdown fields have `options` with `label` and `id` for each option
- Selector fields use the canonical pair pattern:
  - A `type: 'selector'` field with `selectorKey`, `canonicalParamId`, `mode: 'basic'`
  - A `type: 'short-input'` field with the same `canonicalParamId`, `mode: 'advanced'`
  - `required` is identical on both fields in the pair
- `selectorKey` values exist in the selector registry
- `dependsOn` references selector field `id` values, not `canonicalParamId`

`validateConfig`:
- Numeric config values are validated (`Number.isNaN`, positive values)
- Uses `VALIDATE_RETRY_OPTIONS` for its retry budget
- Returns `{ valid: true }` on success
- Returns `{ valid: false, error: 'descriptive message' }` on failure

`getDocument`:
- Fetches a single document by `externalId`
- Returns `null` for 404 / not found (does not throw)
- Returns the same `ExternalDocument` shape as `listDocuments`
- If `listDocuments` uses `contentDeferred: true`, `getDocument` MUST fetch the actual content and return `contentDeferred: false`
- If `listDocuments` uses `contentDeferred: true`, `getDocument` MUST use the same stub function to ensure `contentHash` is identical
- Handles every content type `listDocuments` can produce (e.g., if `listDocuments` returns both pages and blogposts, `getDocument` must handle both, not hardcode one endpoint)
- Forwards `syncContext` if it needs cached state (user names, field maps, etc.)

Error handling and retries:
- Uses `fetchWithRetry` from `@/lib/knowledge/documents/utils`
- No raw `fetch()` calls to external APIs
- `VALIDATE_RETRY_OPTIONS` used in `validateConfig`
- If `validateConfig` calls a shared helper (e.g., `linearGraphQL`, `resolveId`), that helper must accept and forward `retryOptions` to `fetchWithRetry`
- Default retry options used in `listDocuments`/`getDocument`

Performance:
- API field selection (`$select`, `sysparm_fields`, `fields`) requests only the fields the connector needs, in both `listDocuments` AND `getDocument`
- Per-document fetches are batched with `Promise.all` and a concurrency limit of 3-5
- No unbounded `Promise.all` over large arrays

Logging:
- Uses `createLogger` from `@sim/logger` (not `console.log`)
- Sync progress logged at `info` level
- Errors logged at `warn` or `error` level with context

Registry:
- Exported from `connectors/{service}/index.ts`
- Registered in `connectors/registry.ts`
- Registry key matches the connector's `id` field

Group findings by severity:
Critical (will cause runtime errors, data loss, or auth failures):
- `requiredScopes` not a subset of OAuth provider scopes
- Values interpolated into `$filter`, SOQL, or query strings without escaping
- Per-document content downloads in `listDocuments` without `contentDeferred: true`, which causes sync timeouts for large document sets
- `contentHash` mismatch between the `listDocuments` stub and the `getDocument` return, which causes unnecessary re-processing on every sync

Warning (incorrect behavior, data quality issues, or convention violations):
- HTML content not stripped via `htmlToPlainText`
- `getDocument` not forwarding `syncContext`
- `getDocument` hardcoded to one content type when `listDocuments` returns multiple (e.g., only pages but not blogposts)
- Missing `tagDefinition` for metadata fields returned by `mapTags`
- Incorrect `useBasicAuth` or `supportsRefreshTokenRotation` in token refresh config
- Size checks using `text.length` (character count) instead of `Buffer.byteLength` (byte count) for byte-based limits
- `VALIDATE_RETRY_OPTIONS` not threaded through helper functions called by `validateConfig`

Suggestion (minor improvements):
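- Missing `orderBy` for deterministic pagination
- Repeated lookups that could be cached in `syncContext`
- Sequential per-document fetches that could be batched with `Promise.all` (concurrency 3-5)
- Over-fetching when the API supports field selection (`$select`, `sysparm_fields`, `fields`)
- `getDocument` re-fetches data already included in the initial API response (e.g., comments returned with the post)
- The last page requests a full `PAGE_SIZE` when fewer records remain (use `Math.min(PAGE_SIZE, remaining)`)

After reporting, fix every critical and warning issue. Apply suggestions where they don't add unnecessary complexity.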
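Two fixes that come up often are escaping values interpolated into query strings and normalizing domain-style config fields. The sketch below assumes OData string literals (a single quote is doubled) and SOQL string literals (backslash escapes); `buildFilteredUrl` and its `/items` endpoint are hypothetical and only show where `encodeURIComponent` fits. Services that expect a full URL rather than a bare host should keep or add the scheme instead of stripping it.

```typescript
// OData string literal (e.g. inside $filter): a single quote is escaped by doubling it.
function escapeODataString(value: string): string {
  return value.replace(/'/g, "''")
}

// SOQL string literal: escape backslashes first, then single quotes, both with a backslash.
function escapeSoqlString(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/'/g, "\\'")
}

// For APIs that expect a bare host: trim, drop the scheme, drop trailing slashes.
function normalizeBareDomain(input: string): string {
  return input.trim().replace(/^https?:\/\//i, '').replace(/\/+$/, '')
}

// Hypothetical endpoint, for illustration only: escape the literal, then percent-encode the whole filter.
function buildFilteredUrl(siteUrl: string, externalId: string): string {
  const filter = `externalId eq '${escapeODataString(externalId)}'`
  return `https://${normalizeBareDomain(siteUrl)}/items?$filter=${encodeURIComponent(filter)}`
}
```

Escaping and percent-encoding solve different problems: the first keeps the value inside the query language's string literal, the second keeps the whole expression intact inside the URL, so both are needed.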
After fixing, confirm:
- `bun run lint` passes
- `requiredScopes` ⊆ OAuth provider scopes in `oauth.ts`
- Token refresh config is correct (`useBasicAuth`, `supportsRefreshTokenRotation`)
- `contentDeferred: true` is used when per-document content fetches are required, and the metadata-based `contentHash` is consistent between the stub and `getDocument`

Run `bun run lint` after fixes.