ADR-004: Multi-Signal Task Correlation with Confidence Scoring¶

Property	Value
Status	Accepted
Date	2026-03-13
Decision Makers	Project Team
Source	`docs/adr/004-multi-signal-task-correlation.md`

Context¶

When tasks arrive from multiple sources (Jira tickets, GitHub PRs, email threads, calendar events, Google Drive documents), duplicates and related items must be identified and correlated. Naive approaches (e.g., exact title matching) miss many duplicates, while overly aggressive merging creates false positives that confuse users. A balanced approach is needed that surfaces likely duplicates without incorrectly merging unrelated tasks.

Decision¶

Use multi-signal deduplication with confidence scoring for task correlation. The system evaluates multiple signals and produces a weighted confidence score that determines the action taken.

Correlation Signals¶

Signal	Description	Weight
Title similarity	Fuzzy text matching (Jaccard similarity) between task titles	Variable
Entity references	Shared ticket IDs, PR numbers, issue links across systems	High
Temporal proximity	Tasks created close together in time	Low-medium
Participant overlap	Same people involved across tasks from different systems	Medium
Cross-system ID	Explicit cross-references (e.g., Jira ticket mentioned in PR description)	High

Confidence Thresholds¶

Score Range	Action
>= 0.8	Auto-merge the tasks as duplicates
0.5 -- 0.8	Flag as "possibly related" for human review
< 0.5	Treat as separate tasks

Rationale¶

This approach prevents false merges while surfacing likely duplicates for human review. The multi-signal strategy is more robust than any single signal because:

Title similarity alone misses tasks described differently but referring to the same work.
Entity references alone misses tasks that have not been formally linked yet.
Combining signals with weighted scoring produces a more accurate confidence measure.
The three-tier threshold (auto-merge, review, separate) provides appropriate automation without sacrificing accuracy.

Alternatives Considered¶

Exact matching only¶

Simple but misses the majority of duplicates that use different wording or identifiers.

Aggressive auto-merge on any single signal¶

Leads to false merges, requiring manual cleanup and eroding user trust.

Fully manual deduplication¶

Accurate but does not scale and creates toil for operators.

Consequences¶

What becomes easier¶

Reducing duplicate task noise for users.
Keeping task lists clean across multiple source systems.
Surfacing relationships between work items from different sources.
Prioritizing tasks accurately by consolidating signals.

What becomes more difficult¶

Tuning confidence thresholds to balance false positives and false negatives.
The system requires ongoing calibration as usage patterns evolve.
Multiple signal sources must be maintained and kept in sync.

Implementation¶

Correlator: plugins/task-intelligence/src/correlation/correlator.ts
Scorer: plugins/task-intelligence/src/scoring/scorer.ts
Thresholds: defined in plugins/shared/src/constants.ts
CORRELATION_AUTO_MERGE_THRESHOLD = 0.8
CORRELATION_POSSIBLY_RELATED_THRESHOLD = 0.5