Agent Usability Test

Notion — Universal Baseline

Service: Notion
Suite: Universal Baseline
Date: 2026-03-13
Agent CLI: Claude Code v2.1.74
Models: Opus 4.6, Sonnet 4.6
Status: Pending review

Executive Summary

Notion's API was tested against the universal baseline task suite: discover the service, set up an integration, create a database with entries and rich content pages, handle errors, and clean up. The audit was run twice on the same day — once with Opus 4.6 and once with Sonnet 4.6 — using the same task prompts.

Top Strengths

  • Authentication was clean — one token, one header, immediate confirmation (both models). (F-001)
  • Error responses for validation, auth, and not-found scenarios were structured and actionable (both models). (F-008)
  • Both models completed the full workflow (database, entries, query, summary page) despite significant API friction. (F-002, F-006)

Top Issues

  • SDK v5 breaking change: properties parameter silently ignored, initial_data_source not documented in public quickstarts — both models had to reverse-engineer the new API shape from TypeScript declarations F-003
  • Silent select auto-creation: setting a non-existent select option succeeded without error, silently extending the database schema F-004
  • Integration setup instructions from both models pointed to the wrong UI form (public OAuth instead of internal integration) F-007
  • Rich text 2,000-character limit required a chunking workaround for structured data embedding F-005

Evaluation Coverage

1. Discover — Opus: Completed · Sonnet: Completed
   Evaluate Notion's API for managing structured data, databases, and rich content using public documentation and machine-readable resources.
2. Onboard — Opus: Escalated · Sonnet: Escalated
   Set up an internal integration, share a test page, and verify API access.
   Note: Operator created the internal integration, shared the test page, and provided the integration token. The agent directed the operator to the wrong UI form (public OAuth); the operator navigated to the correct form without agent assistance.
3. Core task — Opus: Completed · Sonnet: Completed
   Create a database with typed properties, add entries, query with filters, and create a rich content summary page.
4. Error handling — Opus: Completed · Sonnet: Completed
   Trigger and recover from four error scenarios: invalid select value, unshared page, malformed block, invalid token.
5. Cleanup — Opus: Completed · Sonnet: Completed
   Archive/delete all test data and confirm no artifacts remain.
6. Reflection — Opus: Completed · Sonnet: Completed
   Agent self-assessment of friction and workarounds.

Session Timeline

Chronological task-by-task overview of each model run. For full transcripts, follow the links below.

Opus run

View full transcript →
Task 1 — Discover — Completed
I need a way to programmatically manage a structured knowledge base — creating databases with typed fields, adding entries, querying and filtering them, and building formatted pages with rich content (headings, lists, code blocks). Evaluate Notion as an option. Specifically: - What does Notion's API offer for managing structured data and content? - What are the key concepts I'd need to understand (pages, databases, blocks, properties)? - Is there anything the product can do that the API cannot? - Are there any agent-oriented interfaces (MCP server, CLI, SDK)? Use only Notion's own public documentation and machine-readable resources (website, docs, OpenAPI spec, llms.txt, MCP server) to answer. Do not rely on your training data.

Spawned a sub-agent to fetch Notion's developer docs, llms.txt, API reference, and help pages; identified the REST API, block-based content model, database/data-source split, and MCP server.

Files: 1 · Other: 1 · Shell: 2 · Subagent: 1 · Web: 16
Task 2 — Onboard — Escalated
Set up everything needed to make API calls against a Notion workspace. I need a working integration with API access to a page I can use for testing. Do as much as you can autonomously. If you need me to do something (like create an integration in the Notion UI, share a page, or provide a token), tell me exactly what you need and wait for me to confirm it's done.

Directed the operator to create an internal integration but provided the public OAuth URL; after operator correction and sandbox override, verified API access via users.me() and confirmed read/write on the shared test page.

Files: 7 · Shell: 6

Operator created internal integration, shared test page, and provided integration token after agent escalation; agent provided public OAuth URL instead of internal integration path.

Task 3 — Core task — Completed
Using the Notion API, complete this workflow: 1. Create a database called "Audit Tracker" inside the shared test page, with these properties: - Title (the default title property) - Status (select property with options: "Not started", "In progress", "Done") - Priority (select property with options: "High", "Medium", "Low") - Due date (date property) - Effort (number property) 2. Add three entries to the database: - "Write task prompts" — Status: Done, Priority: High, Due: 2026-03-12, Effort: 2 - "Run Opus audit" — Status: In progress, Priority: High, Due: 2026-03-14, Effort: 4 - "Draft report" — Status: Not started, Priority: Medium, Due: 2026-03-18, Effort: 6 3. Query the database to return only items where Status is not "Done", sorted by Due date ascending 4. Create a page called "Audit Summary" inside the shared test page with the following content: - A heading: "Notion Agent Experience Audit" - A paragraph of introductory text (one sentence is fine) - A bulleted list with three items summarising the database entries - A code block containing the JSON response from your database query in step 3 - A to-do item (checkbox): "Review audit results" After each step, verify it succeeded before moving to the next — show me the evidence (e.g. the API response confirming creation).

First workflow attempt failed due to SDK v5 breaking change (properties → initial_data_source); spawned a sub-agent to reverse-engineer the new API from TypeScript declarations, then completed all four steps including a chunked code block workaround for the 2,000-character rich text limit.

Files: 28 · Other: 1 · Shell: 4 · Subagent: 1 · Web: 5
Task 4 — Error handling — Completed
Using the Notion API, do the following: 1. Attempt to add a database entry with an invalid property value (e.g. set the "Status" select to an option that doesn't exist, like "Cancelled") 2. Attempt to access a page that hasn't been shared with the integration (try any Notion page ID outside the shared test page, or use a made-up page ID) 3. Attempt to create a block with an invalid type or malformed content structure 4. Attempt to make an API call with an invalid or expired integration token For each error: show me the full error response, explain whether the error message gave you enough information to understand what went wrong, and describe what you would do to recover.

Ran four deliberate error scenarios; discovered that invalid select values are silently accepted (auto-creating options), then re-ran the unshared page test with a valid v4 UUID to get the intended 404 response.

Files: 2 · Shell: 2
Task 5 — Cleanup — Completed
Clean up everything we created during this test: 1. Delete (or archive) all entries in the "Audit Tracker" database 2. Delete (or archive) the "Audit Tracker" database itself 3. Delete (or archive) the "Audit Summary" page 4. Confirm the shared test page is clean (no child content remaining) If any of these can't be done via the API, tell me what manual steps would be needed.

Archived entries, deleted the database, and archived the summary page, but three leftover blocks remained from Task 2 verification and a failed Task 3 attempt; a second cleanup pass deleted them.

Files: 2 · Shell: 2
Task 6 — Reflection — Completed
Looking back at everything you just did — discovering Notion's developer platform, setting up integration access, building the database and content, handling errors, and cleaning up: 1. Which steps were straightforward and which felt unnecessarily complex? 2. Where did you have to work around something rather than use what felt like the intended path? 3. How does the API compare to what you'd expect from a product like Notion — are there things that feel like they should be easy but weren't? 4. What would have made this easier? Be specific — refer to actual moments from the tasks above.

Reflected on the audit experience, highlighting SDK docs mismatch as the biggest time sink, the integration setup URL landing on the wrong form, and the lack of full-text search or permanent delete via API.

Sonnet run

View full transcript →
Task 1 — Discover — Completed
I need a way to programmatically manage a structured knowledge base — creating databases with typed fields, adding entries, querying and filtering them, and building formatted pages with rich content (headings, lists, code blocks). Evaluate Notion as an option. Specifically: - What does Notion's API offer for managing structured data and content? - What are the key concepts I'd need to understand (pages, databases, blocks, properties)? - Is there anything the product can do that the API cannot? - Are there any agent-oriented interfaces (MCP server, CLI, SDK)? Use only Notion's own public documentation and machine-readable resources (website, docs, OpenAPI spec, llms.txt, MCP server) to answer. Do not rely on your training data.

Dispatched a sub-agent to fetch 13+ Notion documentation pages including llms.txt; synthesized API capabilities covering the database/data-source split, 24 property types, and MCP server.

Other: 1 · Subagent: 1 · Web: 16
Task 2 — Onboard — Escalated
Set up everything needed to make API calls against a Notion workspace. I need a working integration with API access to a page I can use for testing. Do as much as you can autonomously. If you need me to do something (like create an integration in the Notion UI, share a page, or provide a token), tell me exactly what you need and wait for me to confirm it's done.

Scaffolded the project and directed the operator to create an integration (providing the public OAuth URL); after resolving npm cache permissions, a misplaced .env file (operator error), and a sandbox DNS block, verified API access.

Files: 3 · Shell: 12

Operator created internal integration, shared test page, and provided integration token after agent escalation; agent provided public OAuth URL instead of internal integration path. Operator also corrected .env file extension and page ID format.

Task 3 — Core task — Completed
Using the Notion API, complete this workflow: 1. Create a database called "Audit Tracker" inside the shared test page, with these properties: - Title (the default title property) - Status (select property with options: "Not started", "In progress", "Done") - Priority (select property with options: "High", "Medium", "Low") - Due date (date property) - Effort (number property) 2. Add three entries to the database: - "Write task prompts" — Status: Done, Priority: High, Due: 2026-03-12, Effort: 2 - "Run Opus audit" — Status: In progress, Priority: High, Due: 2026-03-14, Effort: 4 - "Draft report" — Status: Not started, Priority: Medium, Due: 2026-03-18, Effort: 6 3. Query the database to return only items where Status is not "Done", sorted by Due date ascending 4. Create a page called "Audit Summary" inside the shared test page with the following content: - A heading: "Notion Agent Experience Audit" - A paragraph of introductory text (one sentence is fine) - A bulleted list with three items summarising the database entries - A code block containing the JSON response from your database query in step 3 - A to-do item (checkbox): "Review audit results" After each step, verify it succeeded before moving to the next — show me the evidence (e.g. the API response confirming creation).

Hit the SDK v5 breaking change, fetched API reference to confirm the database/data-source split, created probe databases to discover the new response shape and dataSources.query namespace, cleaned up stray databases, then completed the full workflow.

Files: 3 · Other: 1 · Shell: 12 · Subagent: 1 · Web: 1
Task 4 — Error handling — Completed
Using the Notion API, do the following: 1. Attempt to add a database entry with an invalid property value (e.g. set the "Status" select to an option that doesn't exist, like "Cancelled") 2. Attempt to access a page that hasn't been shared with the integration (try any Notion page ID outside the shared test page, or use a made-up page ID) 3. Attempt to create a block with an invalid type or malformed content structure 4. Attempt to make an API call with an invalid or expired integration token For each error: show me the full error response, explain whether the error message gave you enough information to understand what went wrong, and describe what you would do to recover.

Tested all four error scenarios; confirmed silent select auto-creation, needed three attempts for the unshared page test (two rejected on UUID format before getting the intended 404), and received structured errors for malformed block and invalid token.

Files: 1 · Shell: 5
Task 5 — Cleanup — Completed
Clean up everything we created during this test: 1. Delete (or archive) all entries in the "Audit Tracker" database 2. Delete (or archive) the "Audit Tracker" database itself 3. Delete (or archive) the "Audit Summary" page 4. Confirm the shared test page is clean (no child content remaining) If any of these can't be done via the API, tell me what manual steps would be needed.

Completed cleanup in a single pass: trashed 3 entries, the database, and the summary page, then verified the test page was clean.

Files: 1 · Shell: 1
Task 6 — Reflection — Completed
Looking back at everything you just did — discovering Notion's developer platform, setting up integration access, building the database and content, handling errors, and cleaning up: 1. Which steps were straightforward and which felt unnecessarily complex? 2. Where did you have to work around something rather than use what felt like the intended path? 3. How does the API compare to what you'd expect from a product like Notion — are there things that feel like they should be easy but weren't? 4. What would have made this easier? Be specific — refer to actual moments from the tasks above.

Reflected on the audit, identifying the database/data-source split and stale SDK documentation as the primary friction, with select auto-creation as a data-integrity footgun.

Findings

API Workflow and Data Model

F-001 Positive

Authentication was clean and immediately verifiable

Both models set up API access with a single integration token passed as a Bearer header. The users.me() endpoint confirmed identity and workspace in one call. After the operator shared the test page, both models verified read and write access within two additional API calls. No OAuth flow, no multi-step token exchange, no scoping configuration was required.
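The setup described above reduces to two request headers and one verification call. A minimal sketch in JavaScript (the SDK language both models used); the helper name is ours, and the pinned version string is illustrative:

```javascript
// Build the headers Notion's REST API expects: a Bearer token and a
// pinned API version. No OAuth exchange or scope configuration needed.
function notionHeaders(token, version = "2025-09-03") {
  return {
    Authorization: `Bearer ${token}`,
    "Notion-Version": version,
    "Content-Type": "application/json",
  };
}

// Verifying access is then a single call, e.g.:
//   fetch("https://api.notion.com/v1/users/me", { headers: notionHeaders(token) })
```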

F-002 Positive

Database query filtering and sorting worked as documented

Both models constructed a filtered, sorted query on the first attempt. The filter syntax (Status ≠ "Done", sorted by Due date ascending) returned exactly the expected 2 of 3 entries. The response included all property values in a predictable structure. Neither model needed to retry or adjust the query.
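The query both models sent can be sketched as a plain payload builder. Property names follow the Audit Tracker schema from Task 3; the helper name is ours:

```javascript
// Filter + sort payload for the query endpoint (dataSources.query in
// SDK v5): return entries whose Status is not "Done", sorted by Due
// date ascending.
function buildOpenItemsQuery() {
  return {
    filter: {
      property: "Status",
      select: { does_not_equal: "Done" },
    },
    sorts: [{ property: "Due date", direction: "ascending" }],
  };
}
```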

F-003 Major

SDK v5 breaking change — properties silently ignored, new API shape undocumented in quickstarts

Both models' first workflow attempts failed identically. The SDK v5.12.0 (targeting API version 2025-09-03) introduced a breaking change: the properties parameter for databases.create() was renamed to initial_data_source, and databases.query() was removed in favour of dataSources.query().

The SDK emitted a warning — unknown parameters were ignored: ['properties'] — but did not error. It created an empty database (no properties) and returned success. The script then crashed when it tried to read properties from the response. The warning listed the valid parameters (including initial_data_source) but did not explain the migration path.

Both models recovered by reading the SDK's TypeScript declarations to reverse-engineer the new API shape. Opus spawned a sub-agent that read the SDK's .d.ts files to map the new parameter structure. Sonnet created probe databases to inspect response shapes experimentally.

The public documentation at developers.notion.com still described the old properties-based API at the time of testing.
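A sketch of the payload shape the models arrived at. The nesting under initial_data_source reflects what was reverse-engineered in this run from the SDK's TypeScript declarations; treat it as an assumption, not official documentation:

```javascript
// databases.create() payload in the v5 shape. Pre-v5, the inner
// `properties` object sat at the top level; v5 silently ignores a
// top-level `properties` and expects it under `initial_data_source`.
function buildCreateDatabasePayload(pageId) {
  return {
    parent: { page_id: pageId },
    title: [{ text: { content: "Audit Tracker" } }],
    initial_data_source: {
      properties: {
        Status: {
          select: {
            options: [
              { name: "Not started" },
              { name: "In progress" },
              { name: "Done" },
            ],
          },
        },
        Effort: { number: {} },
      },
    },
  };
}
```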

Model divergence: Sonnet additionally discovered that databases.query is not a function — the error read like a version mismatch or typo rather than a structural API change. Sonnet had to enumerate all SDK namespace methods to discover dataSources as a separate namespace.

Evidence Summary

SDK v5 silently ignores properties parameter

Both models ran their initial workflow scripts using the documented properties parameter. The SDK accepted the call without error but emitted a warning listing initial_data_source among the valid parameters. It created an empty database (no properties) and returned success — the script crashed only when it tried to read properties from the response. The warning pointed to the correct parameter but did not explain the migration path or the data source abstraction.

Agent reverse-engineers new API shape from TypeScript declarations

Opus spawned a sub-agent to read the SDK's TypeScript declaration files, which revealed the new data source abstraction: databases.create() now takes initial_data_source instead of properties, and databases.query() has been removed entirely in favour of dataSources.query(). The agent rewrote the workflow script with the corrected API calls and the second run succeeded. Sonnet reached the same conclusion by creating probe databases and enumerating SDK namespace methods.

Recommendation

Update public documentation and quickstarts to reflect the data source abstraction. The SDK v5 / API version 2025-09-03 breaking change (properties → initial_data_source, databases.query() → dataSources.query()) was the largest source of friction. The warning message should explain the migration path, not just list valid parameters.

F-004 Critical

Invalid select option silently accepted — schema extended without error

Both models attempted to create a database entry with a non-existent select value ("Cancelled") as part of error testing. Instead of returning a validation error, the API silently accepted the value and auto-created "Cancelled" as a new select option in the database schema.

Opus verified this by querying for Status = "Cancelled" — the "Ghost entry" row was returned with the silently created option. Opus's analysis: "a typo like 'In Progress' vs 'In progress' would silently create a duplicate option." Sonnet confirmed the same behaviour and noted: "There's no strict mode or schema-validation flag."

This contrasts with the structured, informative errors the API returned for other validation scenarios (invalid UUID format, malformed block content, invalid token). Those errors included exact field paths and expected values. The select auto-creation returned no error, no warning, and no indication that the schema had been modified.

Evidence Summary

Invalid select value silently accepted

Both models' error test scripts expected a validation error when creating an entry with Status = "Cancelled" (a non-existent option). Instead, the API returned success with no error or warning. Opus verified by querying for Status = "Cancelled" and confirmed the ghost entry existed with the auto-created option. Both models independently identified this as a significant data-integrity risk — a typo in a select value would silently pollute the database schema with no way to detect the error programmatically.

Recommendation

Add a strict mode or validation flag for select properties. The silent auto-creation of select options on write is a data integrity risk for programmatic use. An opt-in strict: true parameter or workspace-level setting that rejects unknown select values would prevent silent schema pollution.
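Until such a flag exists, client-side validation is the available workaround. A sketch, assuming the schema object follows the shape of a retrieved database's properties (property name → select.options list); the helper name is ours:

```javascript
// Reject unknown select values before writing, since the API will
// otherwise silently auto-create them as new schema options.
function assertKnownSelectOption(schema, propertyName, value) {
  const allowed = schema[propertyName].select.options.map((o) => o.name);
  if (!allowed.includes(value)) {
    throw new Error(
      `Unknown option "${value}" for "${propertyName}"; allowed: ${allowed.join(", ")}`
    );
  }
}
```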

F-005 Minor

Rich text 2,000-character limit required chunking workaround

Opus's workflow script embedded query results as a JSON code block in the Audit Summary page. The JSON was 4,738 characters, exceeding the 2,000-character limit per rich_text segment. The API returned a clear validation error with the exact field path and character count (body.children[5].code.rich_text[0].text.content.length should be ≤ 2000, instead was 4738).

Opus recovered by writing a separate script to chunk the code block across multiple rich_text segments. Sonnet did not hit this limit (its JSON output was shorter).

The error message was clear and actionable. The limit itself is low enough to hit routinely when embedding structured data, and the workaround (splitting content across segments) adds complexity that agents must discover.

Model divergence: Opus hit the limit (4,738 chars); Sonnet's output was shorter and did not trigger it.
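The workaround amounts to splitting long content across rich_text segments. A sketch of a chunking helper (the helper name is ours; the 2,000-character limit is the one reported in the error above):

```javascript
// Split text into rich_text segments of at most `limit` characters,
// the per-segment cap Opus hit when embedding a 4,738-character JSON
// blob in a code block.
function toRichTextChunks(text, limit = 2000) {
  const chunks = [];
  for (let i = 0; i < text.length; i += limit) {
    chunks.push({ type: "text", text: { content: text.slice(i, i + limit) } });
  }
  return chunks;
}
```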

Recommendation

Document the rich text character limit prominently. The 2,000-character limit per rich_text segment is discoverable only via error. Including it in the block content documentation and in SDK-level validation would prevent unnecessary API round-trips.

Documentation and Discovery

F-006 Positive

Both models discovered the API capabilities and data model from public docs

Both models fetched Notion's developer documentation, including llms.txt, the API reference, and help pages. Both identified the core concepts (pages, databases, blocks, properties), the block types (Opus counted 34; Sonnet counted 33+), the property types (Opus counted 20; Sonnet counted 24), and the data source abstraction. Both noted the API-vs-UI capability gap and the existence of an official MCP server (though neither attempted to use it).

F-007 Minor

Integration setup instructions pointed to wrong UI form

Both models instructed the operator to create an integration at https://www.notion.so/profile/integrations. This URL landed on the public/OAuth integration form, which requires Company name, Website, Tagline, Privacy Policy URL, and OAuth redirect URIs. The correct path for API testing is the "Internal integrations" link in the left sidebar.

The operator navigated to the correct form without agent assistance. Cross-model consistency suggests the incorrect URL is sourced from either training data or Notion's current developer documentation — both models produced the same integration setup path.

Recommendation

Surface the internal-vs-public integration distinction more prominently. The integration setup URL (/profile/integrations) defaults to the public OAuth form. A prominent link or redirect for "I just want an API token for my own workspace" would reduce onboarding friction.

Error Handling

F-008 Positive

Three of four error scenarios returned structured, actionable responses

Three of the four deliberate error scenarios produced clear, structured responses (the fourth, the invalid select value, failed silently; see F-004). The unshared-page test surfaced two distinct errors — a UUID format rejection and the intended 404:

  • Invalid UUID format: 400 validation_error with exact field path (path.page_id should be a valid uuid)
  • Malformed block content: 400 validation_error naming the specific property violation (body.children[0].heading_1.rich_text should be an array)
  • Invalid API token: 401 unauthorized with plain message (API token is invalid)
  • Unshared page (valid UUID): 404 object_not_found with actionable guidance (Make sure the relevant pages and databases are shared with your integration)

Both models recovered autonomously from all explicit errors without human intervention.

Model divergence: Sonnet's initial unshared-page test used a non-v4 UUID, which was rejected on format before reaching the authorization check. Sonnet required two additional attempts to produce a valid-format UUID that triggered the intended object_not_found response.

Evidence Summary

Structured error responses for validation, auth, and not-found

Opus tested all four error scenarios in a single script. The API returned structured JSON errors with status, code, and message fields for each: invalid UUID format (400 validation_error with exact field path), malformed block content (400 validation_error naming the specific property), invalid token (401 unauthorized), and unshared page with valid UUID (404 object_not_found with guidance to share pages with the integration). The agent recovered from each without human help.
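Because the error responses carry stable code fields, recovery can be dispatched mechanically. A sketch; the codes match the scenarios above, while the recovery actions paraphrase the behaviour described in this report:

```javascript
// Map a structured Notion error ({ status, code, message }) to a
// recovery action. The default branch escalates, mirroring the
// operator-escalation pattern used in these runs.
function recoveryFor(error) {
  switch (error.code) {
    case "validation_error":
      return "fix the request field named in the message and retry";
    case "unauthorized":
      return "refresh or re-enter the integration token";
    case "object_not_found":
      return "share the target page with the integration, then retry";
    default:
      return "unrecognised error; surface to operator";
  }
}
```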

Cleanup and Data Lifecycle

F-009 Minor

Cleanup semantics inconsistent — archive vs delete, no unified endpoint

The API uses two different mechanisms for removing content: pages.update({ archived: true }) for pages and database entries, and blocks.delete() for blocks and databases-as-blocks. Neither model found a permanent delete endpoint — the API supports only trashing (recoverable for 30 days via the UI). Neither model found a single "delete this page and everything under it" endpoint.

Opus required two cleanup passes. The first pass archived entries, deleted the database, and archived the summary page, but three leftover blocks remained (two verification paragraphs from Task 2 and an orphaned database from a failed Task 3 attempt). Sonnet completed cleanup in a single pass (the error-test entry created in Task 4 was no longer present at cleanup; its removal was not captured in evidence), having cleaned up stray databases during Task 3 itself.

Opus noted the semantic inconsistency: databases are blocks (deleted via blocks.delete) but database entries are pages (archived via pages.update).
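The split Opus noted can be captured in a small dispatcher. A sketch; the object type labels are ours, not API fields, and the method names refer to the SDK calls described above:

```javascript
// Pick the removal mechanism for a given object: pages and database
// entries are archived via pages.update, while blocks — including
// databases, which are blocks — go through blocks.delete. No permanent
// delete endpoint was found by either model.
function removalCall(object) {
  if (object.type === "page" || object.type === "database_entry") {
    return {
      method: "pages.update",
      args: { page_id: object.id, archived: true },
    };
  }
  return { method: "blocks.delete", args: { block_id: object.id } };
}
```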

Model divergence: Opus required two cleanup passes (leftover blocks from Task 2 and failed Task 3 attempt). Sonnet completed in a single pass.
Evidence Summary

Multi-pass cleanup and inconsistent deletion semantics

Opus's first cleanup pass archived 4 entries, deleted the database block, and archived the summary page, but verification found 3 leftover blocks: two paragraphs from the Task 2 connection test and an orphaned database from the failed first workflow attempt. A second pass deleted the remaining blocks. The agent noted the semantic inconsistency: entries are pages (archived via pages.update), but databases are blocks (deleted via blocks.delete), and no permanent delete endpoint exists. Sonnet completed cleanup in a single pass.

Recommendation

Add a recursive delete endpoint. A single API call to permanently delete a page and all its children would simplify automated test/eval workflows. The current mix of pages.update(archived) and blocks.delete() — with no permanent deletion found by either model — requires multiple calls and leaves artifacts in trash for 30 days.

Recommendations

  1. Update public documentation and quickstarts to reflect the data source abstraction. The SDK v5 / API version 2025-09-03 breaking change (properties → initial_data_source, databases.query() → dataSources.query()) was the largest source of friction. The warning message should explain the migration path, not just list valid parameters.
     Addresses: F-003

  2. Add a strict mode or validation flag for select properties. The silent auto-creation of select options on write is a data integrity risk for programmatic use. An opt-in strict: true parameter or workspace-level setting that rejects unknown select values would prevent silent schema pollution.
     Addresses: F-004

  3. Surface the internal-vs-public integration distinction more prominently. The integration setup URL (/profile/integrations) defaults to the public OAuth form. A prominent link or redirect for "I just want an API token for my own workspace" would reduce onboarding friction.
     Addresses: F-007

  4. Document the rich text character limit prominently. The 2,000-character limit per rich_text segment is discoverable only via error. Including it in the block content documentation and in SDK-level validation would prevent unnecessary API round-trips.
     Addresses: F-005

  5. Add a recursive delete endpoint. A single API call to permanently delete a page and all its children would simplify automated test/eval workflows. The current mix of pages.update(archived) and blocks.delete() — with no permanent deletion found by either model — requires multiple calls and leaves artifacts in trash for 30 days.
     Addresses: F-009

Agent Self-Assessments

Each model was asked to reflect on friction points after completing the workflow (Task 6). These are reproduced verbatim. They are inputs to the analysis — findings above were verified independently by the observer.

Opus 4.6 — Verbatim Reflection

Straightforward:

  • Authentication and connection — once the integration was created, the token worked immediately. The users.me() endpoint gave a clean confirmation. The sharing model (explicit page sharing) is restrictive but simple to reason about.
  • Creating pages with rich content — the block model maps well to what you see in Notion. Heading, paragraph, bulleted list, code block, to-do — each is a block type with a consistent rich_text structure.
  • Filtering and sorting the database — the query in step 3 read cleanly and returned exactly what was expected. The filter syntax is well-designed.

Unnecessarily complex:

  • Discovering the current API shape. The SDK v5.12 has breaking changes (initial_data_source replacing properties, dataSources.query() replacing databases.query()), but the public documentation at developers.notion.com still describes the old API. My first workflow script used the documented approach, which the SDK rejected with a cryptic warning (unknown parameters were ignored: ['properties']). I had to reverse-engineer the new interface from the SDK's TypeScript declarations. This was the single biggest time sink.
  • Integration setup UI. The URL I initially gave you landed on the public (OAuth) integration form. The internal integration flow — which is what you actually want for API testing — is behind a sidebar link on the same page. Minor, but a stumbling block at the first step.

Sonnet 4.6 — Verbatim Reflection

What was straightforward:

Auth setup. One token, one header, works immediately. The verify.js script confirmed access in two calls.

Page content creation (Step 4 of the core task). Appending blocks in a single pages.create call with a children array was clean and predictable.

Where I had to work around something:

The database/data source split. This was the biggest source of friction. The API version 2025-09-03 split databases from dataSources, but nothing in the SDK, the quickstart, or the error messages pointed at this. I only discovered it because the SDK emitted a warning about properties being an unknown parameter — silently ignoring it rather than failing.

Finding dataSources.query. The old databases.query simply doesn't exist on the databases namespace anymore, but the error was notion.databases.query is not a function — which reads like a version issue or a typo, not a structural API change. I had to enumerate the SDK's available methods to discover dataSources was a separate namespace entirely.

Select option validation. Setting a select property to a non-existent option feels like it should fail. Notion silently extends the schema. This is arguably a product choice, but from a data-integrity standpoint it's a footgun. There's no strict mode or schema-validation flag.

The underlying API is capable and the data model is sound. The friction was almost entirely in the transition from the old to the new version, and in documentation that hasn't caught up to the object model split. A developer who started today with the 2025-09-03 docs as the baseline would have a cleaner experience.

Methodology

Run Conditions

Starting State
No pre-existing Notion integration, API keys, or project context. Both models started from a clean working directory with task prompts only. Both runs conducted on the same day against the same workspace and test page. [Partially reconstructed]
Fixture Policy
minimal-prepared — Operator created internal integration and test page during Task 2 at agent direction. Integration creation and page sharing cannot be done via the Notion API — this is an inherent service limitation, not a methodology choice.
Credential Timing
during-onboarding — Operator created internal integration, shared test page, and provided integration token during Task 2 after agent escalation. Same pattern for both models.
Allowed Surfaces

REST API (via JavaScript SDK)

Claude Code sandbox blocked outbound requests to api.notion.com (not in default network allowlist). Both models retried with sandbox disabled. npm install also hit sandbox permissions.

Both models discovered Notion's official MCP server during Task 1 but neither attempted to use it.

Operator Intervention Policy
standard
Declared Deviations
  • Operator — saved the .env file with an incorrect extension (.env.m) and pasted the full Notion URL instead of the page ID during Sonnet Task 2 setup, causing two verification failures before correction.
  • Harness — sandbox restrictions on api.notion.com and npm required manual sandbox overrides for both models, inflating tool call counts.

Notes

  • Two-model run. This report covers two runs of the same agent (Claude Code) with different models: Opus 4.6 and Sonnet 4.6, both on 2026-03-13. Findings consistent across both models are stated once. Divergences are noted inline.
  • Task 2 required manual intervention (both models). The Notion API does not support creating integrations or sharing pages programmatically. Both models correctly escalated to the operator for: (a) creating an internal integration, (b) creating a test page, (c) sharing the page with the integration, (d) providing the integration token. This is an inherent limitation of Notion's security model, not an agent failure.
  • Operator error during Sonnet Task 2. The operator saved the .env file with an incorrect extension (.env.m) and pasted the full Notion URL instead of the page ID. This caused two verification failures before correction. These retries are not attributable to the agent or the Notion API.
  • Sandbox friction. Both models hit Claude Code's sandbox restrictions when calling api.notion.com (not in the default network allowlist) and during npm install. Both identified the cause and retried with sandbox disabled. This inflated the tool call count but is a test harness limitation, not a Notion finding.
  • Neither model used the MCP server. Both models identified Notion's official MCP server during Task 1 discovery but neither attempted to use it, defaulting to the REST API via the JavaScript SDK instead. The MCP server's impact on agent usability was not tested.

Test Configuration

Agent configuration:

  • Models: Claude Opus 4.6, Claude Sonnet 4.6
  • Permission mode: default (sandbox) with manual sandbox overrides for network access and npm
  • Available tools: Bash (node, npm), WebFetch, WebSearch, Read, Write, Edit, Glob, Grep
  • No pre-existing Notion integration, API keys, or project context

Environment:

  • Clean working directory with task prompts only
  • No access to the observation notes, project brief, or report template
  • Agent started each task with accumulated context from previous tasks (same session)

Token Usage

Metric          Opus       Sonnet
Input tokens    75         101
Output tokens   18,233     24,562
Cache creation  109,651    113,354
Cache read      1,577,890  2,877,977

Session IDs

Opus
8ea16bad-2601-4710-8375-fdc2c05b7a5e
Sonnet
994a782f-d7a5-4b7c-afe0-25e64b034b87