Testing Guide

OpenClaw Enterprise uses Vitest for all TypeScript plugin tests and the OPA test framework for Rego policy tests. This guide covers test configuration, patterns, mocking strategies, and coverage requirements.

Vitest Configuration

The root vitest.config.ts configures test discovery and coverage for all plugins:

import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    globals: true,
    environment: 'node',
    include: ['plugins/*/tests/**/*.test.ts'],
    coverage: {
      provider: 'v8',
      reporter: ['text', 'json', 'html'],
      include: ['plugins/*/src/**/*.ts'],
      exclude: ['plugins/*/src/**/*.d.ts', 'plugins/shared/**'],
      thresholds: {
        statements: 80,
        branches: 80,
        functions: 80,
        lines: 80,
      },
    },
    testTimeout: 10000,
  },
});

Key points:

Test globals are enabled (describe, it, expect do not need explicit imports, though importing them from vitest is recommended for clarity).
Test discovery scans plugins/*/tests/**/*.test.ts.
Coverage is collected from plugins/*/src/**/*.ts, excluding declaration files and the shared library.
Thresholds enforce an 80% minimum for statements, branches, functions, and lines. As configured, they apply to the aggregate coverage of all included files, not to each plugin separately.
Test timeout is 10 seconds.
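
The threshold check amounts to comparing each coverage percentage against its configured minimum. The sketch below illustrates only that comparison. `CoverageSummary` and `failedThresholds` are hypothetical names introduced for illustration, not part of Vitest's API; Vitest performs the real check internally after collecting coverage.

```typescript
// Hypothetical types for illustration; Vitest performs the real check internally.
type Metric = 'statements' | 'branches' | 'functions' | 'lines';
type CoverageSummary = Record<Metric, number>; // percentages, 0 to 100

// Same minimums as the thresholds block in vitest.config.ts.
const THRESHOLDS: CoverageSummary = {
  statements: 80,
  branches: 80,
  functions: 80,
  lines: 80,
};

// Returns the metrics whose coverage falls below the configured minimum.
function failedThresholds(
  summary: CoverageSummary,
  thresholds: CoverageSummary = THRESHOLDS,
): Metric[] {
  const metrics = Object.keys(thresholds) as Metric[];
  return metrics.filter((metric) => summary[metric] < thresholds[metric]);
}

// Example: 72% branch coverage is below the minimum; the other metrics pass.
const failing = failedThresholds({ statements: 91, branches: 72, functions: 85, lines: 90 });
console.log(failing);
```

A metric that sits exactly at 80% passes, because the thresholds are minimums.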

Running Tests

# Run all tests once
pnpm test

# Run tests in watch mode (re-runs on file change)
pnpm test:watch

# Run tests for a specific plugin
pnpm test -- plugins/connector-gmail

# Run a specific test file
pnpm test -- plugins/connector-gmail/tests/gmail.test.ts

# Run with coverage report
pnpm test -- --coverage

Test File Locations

Each plugin keeps its tests in its own tests/ directory:

plugins/
  connector-gmail/
    tests/
      gmail.test.ts
  policy-engine/
    tests/
      hierarchy.test.ts
      evaluate.test.ts
      classify.test.ts
  audit-enterprise/
    tests/
      writer.test.ts
  work-tracking/
    tests/
      work-tracking.test.ts

Mocking the Gateway

The most common mock in enterprise plugin tests is the GatewayMethods object. Every connector and feature plugin depends on policy.evaluate, policy.classify, and audit.log.

Standard Mock Gateway

import { vi } from 'vitest';
import type { GatewayMethods } from '@openclaw-enterprise/shared/connector-base.js';

function createMockGateway(overrides?: Partial<GatewayMethods>): GatewayMethods {
  return {
    'policy.evaluate': vi.fn().mockResolvedValue({
      decision: 'allow',
      policyApplied: 'test-policy',
      reason: 'Allowed by test',
      constraints: {},
    }),
    'policy.classify': vi.fn().mockResolvedValue({
      classification: 'internal',
      assignedBy: 'connector_default',
      originalLevel: null,
      confidence: 1.0,
    }),
    'audit.log': vi.fn().mockResolvedValue({ id: 'audit-001' }),
    ...overrides,
  };
}

This mock allows all actions by default. Override specific methods to test denial, approval, and classification scenarios.
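
Because overrides tend to repeat the full decision object, a small factory can keep them short. The following is a minimal sketch, not part of the shared library: `PolicyDecision` is a shape inferred from the mock above, `makeDecision` is a hypothetical helper, and only the two decision values shown in this guide are modeled.

```typescript
// Assumed shape, inferred from the mock gateway; the real type lives in the shared library.
interface PolicyDecision {
  decision: 'allow' | 'deny';
  policyApplied: string;
  reason: string;
  constraints: Record<string, unknown>;
}

// Hypothetical helper: starts from the mock gateway's "allow" defaults
// and applies only the fields a test cares about.
function makeDecision(overrides: Partial<PolicyDecision> = {}): PolicyDecision {
  return {
    decision: 'allow',
    policyApplied: 'test-policy',
    reason: 'Allowed by test',
    constraints: {},
    ...overrides,
  };
}

// Example: a denial that changes only the decision and the reason.
const denied = makeDecision({
  decision: 'deny',
  reason: 'User not authorized for email access',
});
console.log(denied.decision, denied.policyApplied);
```

In a test, the result could be passed to `vi.fn().mockResolvedValue(denied)` as the `'policy.evaluate'` override.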

Overriding for Policy Denial

const gateway = createMockGateway({
  'policy.evaluate': vi.fn().mockResolvedValue({
    decision: 'deny',
    policyApplied: 'restrict-email',
    reason: 'User not authorized for email access',
    constraints: {},
  }),
});

Overriding for Classification

const gateway = createMockGateway({
  'policy.classify': vi.fn().mockResolvedValue({
    classification: 'confidential',
    assignedBy: 'ai_reclassification',
    originalLevel: 'internal',
    confidence: 0.95,
  }),
});

Mocking fetch for Connector Tests

Connector plugins use global.fetch to call external APIs. Mock it with vi.fn():

import { vi, beforeEach } from 'vitest';

beforeEach(() => {
  vi.restoreAllMocks();
});

it('fetches data from the API', async () => {
  global.fetch = vi.fn().mockResolvedValue({
    ok: true,
    json: async () => ({
      messages: [{ id: 'msg-1', subject: 'Test' }],
    }),
  }) as unknown as typeof fetch;

  const tools = new GmailReadTools(gateway, 'tenant-1', 'user-1', 'token');
  const result = await tools.emailRead({ messageId: 'msg-1' });

  expect(result.connectorStatus).toBe('ok');
});

Simulating API Errors

// OAuth revocation (401)
global.fetch = vi.fn().mockResolvedValue({
  ok: false,
  status: 401,
  statusText: 'Unauthorized',
}) as unknown as typeof fetch;

// API unavailability (503): fetch resolves with ok: false for HTTP error statuses
global.fetch = vi.fn().mockResolvedValue({
  ok: false,
  status: 503,
  statusText: 'Service Unavailable',
}) as unknown as typeof fetch;

// Network error
global.fetch = vi.fn().mockRejectedValue(
  new Error('ECONNREFUSED'),
) as unknown as typeof fetch;
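
The response objects in these mocks can be built by one helper so that `ok` always agrees with `status`. This is a sketch under assumptions: `FakeResponse` and `fakeResponse` are hypothetical names, and the object covers only the fields used in this guide (`ok`, `status`, `statusText`, `json`), not the full `Response` interface.

```typescript
// Hypothetical helper: the minimal Response-like object used by this guide's fetch mocks.
interface FakeResponse {
  ok: boolean;
  status: number;
  statusText: string;
  json: () => Promise<unknown>;
}

function fakeResponse(status: number, statusText: string, body: unknown = {}): FakeResponse {
  return {
    // fetch sets Response.ok to true only for statuses in the 200-299 range
    ok: status >= 200 && status <= 299,
    status,
    statusText,
    json: async () => body,
  };
}

// Examples: an OAuth revocation, an unavailable API, and a successful read.
const unauthorized = fakeResponse(401, 'Unauthorized');
const unavailable = fakeResponse(503, 'Service Unavailable');
const success = fakeResponse(200, 'OK', { messages: [{ id: 'msg-1', subject: 'Test' }] });
console.log(unauthorized.ok, unavailable.ok, success.ok);
```

A test could then write `global.fetch = vi.fn().mockResolvedValue(fakeResponse(401, 'Unauthorized')) as unknown as typeof fetch;`. Network failures are still simulated with `mockRejectedValue`, because in that case fetch rejects instead of returning a response.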

Policy Denial Tests

Every plugin must test fail-closed behavior. These tests verify that when the policy engine denies an action, the plugin:

Does NOT perform the action (no external API call).
Returns an appropriate error or empty result.
Logs the denial to the audit trail.
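
The three requirements above can be sketched as a single guard around the external call. Every name in this block (`Decision`, `GuardedResult`, `guardedList`) is hypothetical and exists only to illustrate the expected control flow; real connectors go through the shared connector base, whose implementation is not shown in this guide.

```typescript
// All names are hypothetical; this only illustrates the control flow under test.
interface Decision {
  decision: 'allow' | 'deny';
  reason: string;
}

interface GuardedResult<T> {
  items: T[];
  connectorStatus: 'ok' | 'error';
  errorDetail?: string;
}

async function guardedList<T>(
  evaluate: () => Promise<Decision>,
  audit: (entry: { policyResult: string; outcome: string }) => Promise<void>,
  callApi: () => Promise<T[]>,
): Promise<GuardedResult<T>> {
  // If evaluate() rejects (policy engine unreachable), the error propagates
  // and callApi() is never reached, so the call fails closed.
  const decision = await evaluate();

  if (decision.decision !== 'allow') {
    // Log the denial, then return an empty result without calling the API.
    await audit({ policyResult: decision.decision, outcome: 'denied' });
    return {
      items: [],
      connectorStatus: 'error',
      errorDetail: `Denied by policy: ${decision.reason}`,
    };
  }

  const items = await callApi();
  await audit({ policyResult: decision.decision, outcome: 'success' });
  return { items, connectorStatus: 'ok' };
}

// Example: a denied call yields an error-status result and never touches the API.
guardedList(
  async () => ({ decision: 'deny' as const, reason: 'Not authorized' }),
  async () => {},
  async () => ['task-1'],
).then((result) => console.log(result.connectorStatus, result.errorDetail));
```

The ordering is the point of the sketch: the policy decision is awaited before anything else happens, so both a denial and an unreachable policy engine stop execution ahead of the external call.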

Example: Verifying Fail-Closed Behavior

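describe('policy denial', () => {
  beforeEach(() => {
    // Install a fetch spy so the tests can assert that no API call was made
    global.fetch = vi.fn() as unknown as typeof fetch;
  });
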
  it('returns denied result when policy denies', async () => {
    const gateway = createMockGateway({
      'policy.evaluate': vi.fn().mockResolvedValue({
        decision: 'deny',
        policyApplied: 'restrict-acme',
        reason: 'Not authorized',
        constraints: {},
      }),
    });

    const tools = new AcmeReadTools(gateway, 'tenant-1', 'user-1', 'token');
    const result = await tools.listTasks({ projectId: 'proj-1' });

    // Action was not performed
    expect(global.fetch).not.toHaveBeenCalled();

    // Result indicates denial
    expect(result.items).toHaveLength(0);
    expect(result.connectorStatus).toBe('error');
    expect(result.errorDetail).toContain('Denied by policy');
  });

  it('fails closed when policy engine is unreachable', async () => {
    const gateway = createMockGateway({
      'policy.evaluate': vi.fn().mockRejectedValue(
        new Error('ECONNREFUSED'),
      ),
    });

    const tools = new AcmeReadTools(gateway, 'tenant-1', 'user-1', 'token');

    // Should throw PolicyEngineUnreachableError, not proceed with the action
    await expect(tools.listTasks({ projectId: 'proj-1' })).rejects.toThrow();
    expect(global.fetch).not.toHaveBeenCalled();
  });
});

Verifying Audit Logging on Denial

it('logs audit entry for denied write operations', async () => {
  const gateway = createMockGateway({
    'policy.evaluate': vi.fn().mockResolvedValue({
      decision: 'deny',
      policyApplied: 'restrict-write',
      reason: 'Write access not authorized',
      constraints: {},
    }),
  });

  const tools = new AcmeWriteTools(gateway, 'tenant-1', 'user-1', 'token');
  const result = await tools.createTask({ title: 'Test' });

  expect(result.success).toBe(false);
  expect(gateway['audit.log']).toHaveBeenCalledWith(
    expect.objectContaining({
      policyResult: 'deny',
      outcome: 'denied',
    }),
  );
});

OPA Test Framework for Rego Policies

Rego policies in plugins/policy-engine/rego/ are tested using OPA's built-in test framework.

Running Rego Tests

# Install OPA CLI
brew install opa

# Run all Rego tests
opa test plugins/policy-engine/rego/ -v

# Run tests for a specific policy
opa test plugins/policy-engine/rego/models.rego plugins/policy-engine/rego/models_test.rego -v

Writing Rego Tests

Rego test files use the _test.rego suffix and define rules prefixed with test_:

# plugins/policy-engine/rego/models_test.rego
package openclaw.enterprise.models_test

import rego.v1
import data.openclaw.enterprise.models

# Test: public data allowed with default policy
test_allow_public_data if {
    models.allow with input as {
        "data_classification": "public",
        "additional": { "provider": "openai" }
    }
}

# Test: confidential data blocked for external models
test_deny_confidential_external if {
    not models.allow with input as {
        "data_classification": "confidential",
        "additional": { "provider": "openai" }
    }
}

# Test: confidential data allowed for self-hosted models
test_allow_confidential_self_hosted if {
    models.allow with input as {
        "data_classification": "confidential",
        "additional": { "provider": "self-hosted" }
    }
    with data.policy as {
        "allowed_classifications": ["public", "internal", "confidential"],
        "max_classification": "confidential"
    }
}

Full Test Example

This is a complete test file from the Gmail connector, demonstrating all key patterns:

import { describe, it, expect, vi, beforeEach } from 'vitest';
import { GmailReadTools } from '../src/tools/read.js';
import type { GatewayMethods } from '@openclaw-enterprise/shared/connector-base.js';
import { OAuthRevocationError } from '@openclaw-enterprise/shared/errors.js';

function createMockGateway(overrides?: Partial<GatewayMethods>): GatewayMethods {
  return {
    'policy.evaluate': vi.fn().mockResolvedValue({
      decision: 'allow',
      policyApplied: 'test-policy',
      reason: 'Allowed by test',
      constraints: {},
    }),
    'policy.classify': vi.fn().mockResolvedValue({
      classification: 'internal',
      assignedBy: 'connector_default',
      originalLevel: null,
      confidence: 1.0,
    }),
    'audit.log': vi.fn().mockResolvedValue({ id: 'audit-001' }),
    ...overrides,
  };
}

describe('GmailReadTools', () => {
  let gateway: GatewayMethods;

  beforeEach(() => {
    // Restore first so the freshly created gateway mocks are not reset
    vi.restoreAllMocks();
    gateway = createMockGateway();
  });

  it('evaluates policy before reading email', async () => {
    global.fetch = vi.fn().mockResolvedValue({
      ok: true,
      json: async () => ({ id: 'msg-1', payload: { headers: [] } }),
    }) as unknown as typeof fetch;

    const tools = new GmailReadTools(gateway, 'tenant-1', 'user-1', 'token');
    await tools.emailRead({ messageId: 'msg-1' });

    expect(gateway['policy.evaluate']).toHaveBeenCalledWith(
      expect.objectContaining({
        tenantId: 'tenant-1',
        userId: 'user-1',
        action: 'email_read',
      }),
    );
  });

  it('returns denied result when policy denies', async () => {
    gateway = createMockGateway({
      'policy.evaluate': vi.fn().mockResolvedValue({
        decision: 'deny',
        policyApplied: 'restrict-email',
        reason: 'User not authorized',
        constraints: {},
      }),
    });

    const tools = new GmailReadTools(gateway, 'tenant-1', 'user-1', 'token');
    const result = await tools.emailRead({ messageId: 'msg-1' });

    expect(result.items).toHaveLength(0);
    expect(result.connectorStatus).toBe('error');
  });

  it('logs audit entry after successful read', async () => {
    global.fetch = vi.fn().mockResolvedValue({
      ok: true,
      json: async () => ({ id: 'msg-1', payload: { headers: [] } }),
    }) as unknown as typeof fetch;

    const tools = new GmailReadTools(gateway, 'tenant-1', 'user-1', 'token');
    await tools.emailRead({ messageId: 'msg-1' });

    expect(gateway['audit.log']).toHaveBeenCalledWith(
      expect.objectContaining({
        actionType: 'data_access',
        outcome: 'success',
      }),
    );
  });

  it('throws OAuthRevocationError on 401', async () => {
    global.fetch = vi.fn().mockResolvedValue({
      ok: false,
      status: 401,
      statusText: 'Unauthorized',
    }) as unknown as typeof fetch;

    const tools = new GmailReadTools(gateway, 'tenant-1', 'user-1', 'bad-token');
    await expect(tools.emailRead({ messageId: 'msg-1' })).rejects.toThrow(OAuthRevocationError);
  });
});

Coverage Targets

The project enforces 80% minimum coverage for statements, branches, functions, and lines across all plugins. The shared library is excluded from coverage measurement (it is tested indirectly through plugin tests).

To check coverage:

pnpm test -- --coverage

The coverage report is generated in three formats:

- text: printed to the terminal
- json: machine-readable for CI
- html: browsable report in coverage/

Test Categories

Category	What to Test	Where
Policy evaluation	Verify policy is called before actions; verify denial behavior	Every plugin
Classification	Verify items are classified via `policy.classify`	Connector plugins
Audit logging	Verify audit entries for success and denial	Every plugin
OAuth revocation	Verify connector disables on 401/invalid_grant	Connector plugins
API unavailability	Verify graceful degradation on 503/timeout	Connector plugins
Hierarchy	Verify child scopes cannot expand beyond parent	policy-engine
Hot-reload	Verify policy changes are detected and applied	policy-engine
OCIP envelope	Verify OCIP metadata injection and parsing	ocip-protocol
Loop prevention	Verify round limits and human escalation	ocip-protocol
Approval queue	Verify pending items, approve, reject flows	auto-response