Data Classification Administration

OpenClaw Enterprise enforces a four-level data classification system. Classification travels with data at all times -- summaries, derivatives, cached results, and agent-to-agent exchanges all carry the classification of their source material. This is a constitutional requirement of the system.

Classification Levels

Level	Label	Description	Example Data
1	`public`	Information intended for public access. No restrictions on model routing or sharing.	Public GitHub issues, open-source documentation, published blog posts
2	`internal`	General business information. Default level for most connectors.	Internal emails, calendar events, Jira tickets, Google Drive documents
3	`confidential`	Sensitive business information requiring access controls and restricted model routing.	Financial reports, HR documents, customer PII, security configurations
4	`restricted`	Highest sensitivity. Strictest controls on access, model routing, and sharing.	Credentials, encryption keys, legal holds, M&A documents, medical records

Key Rule: Automated processes can only upgrade a classification (move it to a higher level); they can never downgrade it. Only an admin override with a logged reason can change a classification in either direction.

Three-Layer Assignment Pipeline

Data classification is assigned through a three-layer pipeline. Each layer can only upgrade (increase) the classification set by the previous layer, with one exception: admin overrides can set any level.

Layer 1: Connector Default
    │
    ▼
Layer 2: AI Reclassification (upgrade only)
    │
    ▼
Layer 3: Admin Override (any direction, reason required)
    │
    ▼
Final Classification
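
The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not OpenClaw's implementation: the `Classification` enum and the `final_classification` function are hypothetical names chosen for this example, and the only behavior modeled is what this page states (Layer 2 can only raise the level, Layer 3 can set any level but needs a reason).

```python
from enum import IntEnum
from typing import Optional


class Classification(IntEnum):
    """The four levels, ordered so that a higher value is more sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


def final_classification(
    connector_default: Classification,
    ai_suggestion: Optional[Classification] = None,
    admin_override: Optional[Classification] = None,
    override_reason: str = "",
) -> Classification:
    """Hypothetical helper that combines the three layers into one level."""
    # Layer 1: start from the connector default.
    level = connector_default
    # Layer 2: the AI classifier may only raise the level; a lower
    # suggestion is ignored.
    if ai_suggestion is not None:
        level = max(level, ai_suggestion)
    # Layer 3: an admin override may set any level, but only with a
    # non-empty reason.
    if admin_override is not None:
        if not override_reason.strip():
            raise ValueError("admin override requires a non-empty reason")
        level = admin_override
    return level
```

For example, `final_classification(Classification.INTERNAL, Classification.PUBLIC)` stays at `INTERNAL`, because an AI suggestion below the current level has no effect.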

Layer 1: Connector Defaults

Each connector assigns a default classification to all data it ingests:

Connector	Default Classification	Rationale
Gmail	`internal`	Email content is business-internal by default
Google Calendar	`internal`	Calendar events contain meeting details and attendees
Jira	`internal`	Tickets contain project details and business logic
GitHub (public repos)	`public`	Public repository data is publicly accessible
GitHub (private repos)	`internal`	Private repository data is business-internal
Google Drive	`internal`	Documents are business-internal by default

These defaults are configurable via the data policy domain. See Policy Engine for details.

Layer 2: AI Reclassification

After connector ingestion, an AI classifier analyzes the content and may upgrade the classification. The AI classifier:

Can only upgrade the classification (e.g., internal to confidential). It can never downgrade.
Scans for patterns indicating higher sensitivity: PII (names, emails, SSNs, phone numbers), financial data, credentials, API keys, legal language, medical information.
Logs every reclassification decision to the audit trail with the reason for upgrade.
Can be disabled per-policy by setting allow_ai_reclassification: false in the data policy domain.

Example reclassification:

An email ingested by the Gmail connector is initially classified as internal (connector default). The AI classifier detects a Social Security number in the email body and upgrades the classification to confidential.

Gmail email → internal (connector default) → confidential (AI detected PII)
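
To make the upgrade-only rule concrete, here is a simplified stand-in for the classifier. The real Layer 2 classifier is AI-based and its internals are not described on this page; the sketch below substitutes two regular expressions purely for illustration. The pattern table, the reason strings, and the `reclassify` function are all assumptions.

```python
import re
from typing import List, Tuple

LEVELS = ["public", "internal", "confidential", "restricted"]

# Hypothetical pattern table: (regex, minimum level, reason for the audit entry).
SENSITIVE_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "confidential", "possible SSN (PII)"),
    (re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"), "restricted", "private key material"),
]


def reclassify(text: str, current: str) -> Tuple[str, List[str]]:
    """Return the (possibly upgraded) level and the reasons for each upgrade."""
    level = current
    reasons: List[str] = []
    for pattern, minimum, reason in SENSITIVE_PATTERNS:
        # Upgrade only: a match never lowers the level.
        if pattern.search(text) and LEVELS.index(minimum) > LEVELS.index(level):
            level = minimum
            reasons.append(reason)
    return level, reasons
```

Calling `reclassify("My SSN is 123-45-6789", "internal")` yields `"confidential"` with one reason, while the same text on an item already marked `restricted` is left unchanged.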

Layer 3: Admin Override

Administrators can override the classification of any data item to any level. This is the only mechanism that can downgrade a classification.

Requirements for admin override:

Caller must have org_admin or enterprise_admin role.
A reason field is required and cannot be empty.
The override, including the administrator identity, previous classification, new classification, and reason, is logged to the immutable audit trail.

curl -X PUT https://openclaw.example.com/api/v1/data/item_abc123/classification \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "classification": "public",
    "reason": "Document has been approved for public release by legal team (LEGAL-2026-0142)"
  }'

Response:

{
  "item_id": "item_abc123",
  "previous_classification": "internal",
  "new_classification": "public",
  "override_by": "admin@example.com",
  "override_at": "2026-03-13T14:30:00Z",
  "reason": "Document has been approved for public release by legal team (LEGAL-2026-0142)"
}
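
The same request can be prepared from Python with the standard library. This is a sketch under assumptions: `build_override_request` is a hypothetical helper, and its client-side checks only mirror the requirements listed above so that obvious mistakes fail early. The server remains the authority on roles and validation.

```python
import json
import urllib.request

ADMIN_ROLES = {"org_admin", "enterprise_admin"}
LEVELS = {"public", "internal", "confidential", "restricted"}


def build_override_request(base_url, item_id, level, reason, token, caller_role):
    """Build (but do not send) the PUT request for a classification override."""
    if caller_role not in ADMIN_ROLES:
        raise PermissionError("override requires org_admin or enterprise_admin")
    if level not in LEVELS:
        raise ValueError(f"unknown classification: {level!r}")
    if not reason.strip():
        raise ValueError("reason is required and cannot be empty")
    body = json.dumps({"classification": level, "reason": reason}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/data/{item_id}/classification",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it; keeping construction separate from sending makes the validation easy to test without a network call.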

Classification Propagation

When data is processed, transformed, or summarized, the resulting output inherits the classification of its source material. This is enforced automatically and cannot be bypassed.

Operation	Propagation Rule	Example
Summarization	Output inherits source classification	Summary of a `confidential` document is `confidential`
Multi-source aggregation	Output inherits the highest source classification	Summary of `internal` email + `confidential` doc = `confidential` summary
Cached/derived data	Inherits source classification	Cached search results retain original classification
Agent-to-agent exchange	Classification travels with the data envelope	OCIP message carries source classification
Copy/export	Retains original classification	Exported data retains its classification label

Important: There is no concept of "declassification by transformation." Summarizing a confidential document does not make the summary internal. The summary is confidential.
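
The propagation rules in the table reduce to "take the highest source level." A minimal sketch, where `propagate` is a hypothetical name used only for this example:

```python
LEVELS = ["public", "internal", "confidential", "restricted"]


def propagate(source_levels):
    """Return the classification a derived output inherits from its sources."""
    if not source_levels:
        raise ValueError("a derived item needs at least one source")
    # LEVELS.index raises ValueError for an unknown label, so a typo in a
    # label fails loudly instead of being treated as low sensitivity.
    return max(source_levels, key=LEVELS.index)
```

A single-source summary keeps its source's level, and a multi-source aggregate such as `propagate(["internal", "confidential"])` resolves to `"confidential"`.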

Model Routing

Data classification directly controls which AI models can process the data. This is enforced by the policy engine at the point of model invocation.

Classification	Default Model Routing	Override Available
`public`	Any allowed model (external or self-hosted)	N/A
`internal`	Any allowed model (external or self-hosted)	N/A
`confidential`	Self-hosted models only	Enterprise policy can explicitly allow specific external models
`restricted`	Self-hosted models only	Enterprise policy can explicitly allow specific external models

Default Behavior

By default, confidential and restricted data are sent only to self-hosted models. This prevents sensitive data from being transmitted to third-party model providers.

The models policy domain controls which models are approved for classified data:

{
  "domain": "models",
  "rules": {
    "allowed_models": ["gpt-4", "claude-3-opus", "llama-3-70b"],
    "confidential_data_models": ["llama-3-70b"],
    "restricted_data_models": ["llama-3-70b"]
  }
}

In this example, only the self-hosted llama-3-70b model can process confidential or restricted data. Public and internal data can use any of the three allowed models.
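
The routing decision for this policy can be sketched as a lookup. The function names below are hypothetical, and returning an empty list when a `*_data_models` key is missing (fail closed) is an assumption of this sketch rather than documented behavior.

```python
RULES = {
    "allowed_models": ["gpt-4", "claude-3-opus", "llama-3-70b"],
    "confidential_data_models": ["llama-3-70b"],
    "restricted_data_models": ["llama-3-70b"],
}


def allowed_models_for(classification, rules):
    """Return the models that may process data at the given classification."""
    if classification in ("public", "internal"):
        return list(rules.get("allowed_models", []))
    # Assumption: a missing list means no model is approved for that level.
    return list(rules.get(f"{classification}_data_models", []))


def can_route(model, classification, rules):
    """True if the policy permits sending data at this level to the model."""
    return model in allowed_models_for(classification, rules)
```

With the rules above, `can_route("gpt-4", "confidential", RULES)` is `False`, while `can_route("gpt-4", "internal", RULES)` is `True`.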

Overriding Model Routing

An enterprise_admin can explicitly allow external models for classified data by adding them to the confidential_data_models or restricted_data_models arrays. This should only be done when the external model provider has been vetted and contractual data protection agreements are in place.

{
  "domain": "models",
  "rules": {
    "confidential_data_models": ["llama-3-70b", "claude-3-opus"],
    "restricted_data_models": ["llama-3-70b"]
  },
  "change_reason": "Allow Claude for confidential data per DPA signed 2026-03-01 (CONTRACT-2026-0089)"
}

Agent-to-Agent Exchange Enforcement

When OpenClaw instances communicate via the OCIP protocol, data classification is enforced at the sender side:

The sender constructs an OCIP envelope containing data items with their classifications.
The policy engine looks up the maximum classification the sender may transmit (max_classification_outbound in the sender's agent_to_agent policy domain). This value acts as the receiver's clearance.
Any data item classified above that maximum is filtered out of the envelope before sending.
The filtering action is logged to the audit trail.

Example:

Sender policy: max_classification_outbound = "internal"

OCIP envelope contains:
  - Item A: public     → INCLUDED
  - Item B: internal   → INCLUDED
  - Item C: confidential → FILTERED OUT
  - Item D: restricted   → FILTERED OUT

The receiver never sees Items C and D. The sender's audit log records that these items were filtered due to classification policy.
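
The filtering step in this example can be sketched as follows. The item layout (a dict with `id` and `classification` keys) and the `filter_envelope` function are assumptions made for illustration; the actual OCIP envelope format is not described here.

```python
LEVELS = ["public", "internal", "confidential", "restricted"]


def filter_envelope(items, max_classification_outbound):
    """Split items into those that may be sent and those filtered out."""
    ceiling = LEVELS.index(max_classification_outbound)
    included, filtered = [], []
    for item in items:
        if LEVELS.index(item["classification"]) <= ceiling:
            included.append(item)
        else:
            # In a real sender, this is where the audit entry would be written.
            filtered.append(item)
    return included, filtered


items = [
    {"id": "A", "classification": "public"},
    {"id": "B", "classification": "internal"},
    {"id": "C", "classification": "confidential"},
    {"id": "D", "classification": "restricted"},
]
included, filtered = filter_envelope(items, "internal")
```

Here `included` holds Items A and B and `filtered` holds Items C and D, matching the example above.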

Configuration via Policy

Data classification behavior is controlled through the data policy domain:

{
  "domain": "data",
  "scope": { "level": "enterprise" },
  "status": "active",
  "rules": {
    "default_classification": "internal",
    "allow_ai_reclassification": true,
    "ai_reclassification_direction": "upgrade_only",
    "retention_days": 365,
    "ephemeral_data_ttl_hours": 24,
    "allow_data_export": true,
    "export_max_classification": "internal"
  },
  "change_reason": "Enterprise baseline data classification policy"
}

Field	Description	Default
`default_classification`	Classification assigned when no connector default or AI classification applies	`internal`
`allow_ai_reclassification`	Whether the AI classifier is enabled	`true`
`ai_reclassification_direction`	Direction AI can reclassify (`upgrade_only` is the only supported value)	`upgrade_only`
`retention_days`	Minimum data retention period in days	`365`
`ephemeral_data_ttl_hours`	Time-to-live for ephemeral/cached data in hours	`24`
`allow_data_export`	Whether data export is permitted	`true`
`export_max_classification`	Maximum classification level that can be exported via the export API	`internal`
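
Before submitting a data policy, it can help to check the rules against the constraints in the table. The validator below is a hypothetical pre-flight check, not part of OpenClaw. It covers only the constraints stated on this page and falls back to the documented defaults for missing fields.

```python
LEVELS = {"public", "internal", "confidential", "restricted"}


def validate_data_rules(rules):
    """Return a list of problems; an empty list means no issues were found."""
    problems = []
    # Both classification fields default to "internal" per the table.
    for field in ("default_classification", "export_max_classification"):
        if rules.get(field, "internal") not in LEVELS:
            problems.append(f"{field} must be one of {sorted(LEVELS)}")
    # "upgrade_only" is the only supported direction.
    if rules.get("ai_reclassification_direction", "upgrade_only") != "upgrade_only":
        problems.append("ai_reclassification_direction must be 'upgrade_only'")
    # Both toggles default to true per the table.
    for field in ("allow_ai_reclassification", "allow_data_export"):
        if not isinstance(rules.get(field, True), bool):
            problems.append(f"{field} must be a boolean")
    return problems
```

Passing the `rules` object from the policy above returns an empty list; changing `ai_reclassification_direction` to any other value returns one problem.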

Troubleshooting

Data Classified Higher Than Expected

Check the connector default for the data source (see table above).
Check the audit log for AI reclassification events on the data item.
The AI classifier may have detected sensitive patterns. Review the reclassification reason in the audit entry.
If the classification is incorrect, use an admin override to set the correct level (requires a reason).

Confidential Data Sent to Wrong Model

Verify the confidential_data_models list in the active models policy.
Check for org-level or team-level policies that may have modified the model list (hierarchy merge uses intersection -- a lower-level policy cannot add models not in the parent).
Review the audit log for the model invocation to see which policy was applied.
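
The intersection behavior mentioned above can be pictured with a short sketch. The `merge_model_lists` function is hypothetical, and treating an unset child list as "inherit the parent" is an assumption; see the Policy Engine documentation for the actual merge semantics.

```python
def merge_model_lists(parent, child=None):
    """Return the effective model list after merging a lower-level policy."""
    if child is None:
        # Assumption: a child that sets nothing inherits the parent's list.
        return list(parent)
    # Intersection: the child can narrow the parent's list but never extend it.
    return [model for model in parent if model in child]
```

For example, if the enterprise policy lists only `llama-3-70b` and an org-level policy lists `llama-3-70b` and `claude-3-opus`, the effective list is still just `llama-3-70b`. An unexpected routing decision may therefore trace back to a parent policy rather than the one being edited.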

Agent Exchange Missing Data

If a receiving OpenClaw instance reports missing data from an OCIP exchange:

Check the sender's max_classification_outbound in the agent_to_agent policy.
Review the sender's audit log for classification filtering events.
The missing data items likely had a classification above the outbound maximum and were filtered.