
Feature Request: Adaptive Sender Importance Scoring (feedback loop for AI rule matching) #1981

@SvetZitrka

Description


Problem

Inbox Zero's AI rule matching currently treats every email independently — rules are static and don't improve based on user behavior. This is the #1 gap compared to tools like SaneBox, which learns from user interactions to progressively improve sorting accuracy.

In practice, static rules decay fast. Services get cancelled, new senders appear, newsletter subscriptions change — and rules that were perfect 6 months ago now misfire or do nothing. Users end up with two painful choices: constantly hand-edit rules, or accept growing inaccuracy over time.

There are zero existing issues, PRs, or community discussions requesting adaptive learning or behavioral feedback loops. I believe this is the single highest-impact feature that could differentiate Inbox Zero from SaneBox and closed-source alternatives.

Real-world scenarios this solves

These are generalized patterns common to small publishers, agencies, and businesses managing 30+ emails/day:

1. Sender relationships change, rules don't. A user sets up auto-forward rules for PR agencies. But some agency contacts become direct business partners over time — the user starts replying to them, exchanging contracts, negotiating deals. The system should notice the shift from "auto-forward" to "needs personal attention" without the user rewriting rules.

2. Newsletter volume explodes unpredictably. AI newsletters alone can go from ~10% to ~30% of inbox volume in under a year. A scoring system that tracks engagement (does the user ever reply? or just archive?) would naturally deprioritize newsletters the user ignores — without manual unsubscribe.

3. "Fix once, then it knows." User corrects an AI decision (moves an auto-archived email back to inbox, or rejects a draft). Today, that correction is lost. Tomorrow, the same mistake repeats. A feedback loop would capture the correction and adjust the sender's importance score, so the same mistake doesn't happen twice.

4. Financial emails buried under noise. Invoice reminders, domain expiration warnings, contracts awaiting signature (DocuSign) — these are low-volume but high-consequence. A sender that the user always acts on quickly should automatically score high, even if they only email once a month.

5. Services come and go — rules don't clean up. When users cancel a SaaS tool or end a partnership, that sender's emails drop to zero. But the rule stays. Over 1–2 years, rule lists accumulate dead entries. A confidence score that decays with inactivity would naturally surface stale rules for cleanup.
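The decay idea in scenario 5 can be as simple as an exponential half-life. A minimal sketch (the function name and the 30-day half-life are my assumptions for illustration, not anything from the codebase):

```typescript
// Hypothetical decay: a sender's score halves for every `halfLifeDays`
// of inactivity, so dead senders drift toward 0 and surface for cleanup.
function decayedScore(
  baseScore: number,
  daysSinceLastInteraction: number,
  halfLifeDays: number = 30
): number {
  return baseScore * Math.pow(0.5, daysSinceLastInteraction / halfLifeDays);
}

// A sender scored 80 who goes quiet for two months drops to 20.
decayedScore(80, 60); // → 20
```

Any sender whose decayed score falls below a threshold could be flagged in the UI as a stale-rule candidate rather than deleted automatically, preserving the human-in-the-loop principle.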

Proposed Solution: Sender Importance Score (SIS)

A per-sender numerical score (0–100) computed from behavioral signals, injected as context into the choose-rule AI prompt. No new ML models, no new dependencies, no new services.

Available signals (verified against Gmail API capabilities)

Signal | Source | Weight rationale
-- | -- | --
Reply ratio (replies sent / emails received from sender) | Gmail thread structure | Strongest signal — replying = importance (confirmed by Google Priority Inbox paper and SaneBox)
Bidirectional communication (has user ever sent TO this sender?) | Sent messages lookup / existing `hasSentEmail` in Cold Email Blocker | Binary but powerful — eliminates newsletters and cold email in one check
Recency (days since last interaction) | Timestamp from DB | Exponential decay — contacts go stale
Email frequency (normalized volume from sender) | Tinybird analytics (already collected!) | Moderate — high frequency alone ≠ important (spammers are frequent too)
Explicit user feedback (approve/reject AI actions, manual label changes) | `ExecutedRule` logs + UI approve/reject flow | Direct signal — user explicitly correcting the system

Total: ~600 lines of new code, zero new npm dependencies.
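To make the shape of the computation concrete, here is one way the five signals could combine into a 0–100 score via a fixed weighted sum. Everything here is illustrative: the interface, field names, weights, and half-life are my assumptions, not the proposed final design.

```typescript
// Illustrative signal bundle per sender; names are hypothetical.
interface SenderSignals {
  replyRatio: number;          // replies sent / emails received, 0..1
  bidirectional: boolean;      // has the user ever emailed this sender?
  daysSinceLast: number;       // days since last interaction
  normalizedFrequency: number; // volume from analytics, normalized to 0..1
  feedbackAdjustment: number;  // explicit approve/reject signal, -1..1
}

// Weighted sum with clamping; weights are placeholder values to show
// relative importance (reply ratio strongest, per the table above).
function senderImportanceScore(s: SenderSignals): number {
  const recency = Math.pow(0.5, s.daysSinceLast / 30); // 30-day half-life
  const raw =
    40 * s.replyRatio +
    20 * (s.bidirectional ? 1 : 0) +
    20 * recency +
    10 * s.normalizedFrequency +
    10 * s.feedbackAdjustment;
  return Math.max(0, Math.min(100, raw)); // clamp to 0–100
}
```

A deterministic formula like this keeps the feature cheap and debuggable: no model training, and every score can be explained to the user as a breakdown of its terms.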

Privacy and safety

  • Opt-in — feature flag OFF by default, users explicitly enable
  • Header/metadata only — never reads email body (same approach as SaneBox)
  • Per-user, per-instance — scores never leave the user's database
  • Manual override — users can pin a sender's score or exclude them from scoring
  • Human-in-the-loop preserved — scoring provides context, doesn't override rules

Implementation offer

I'm prepared to implement this via PR, split into 2–3 incremental ones (data layer → scoring engine + pipeline → UI). I've reviewed ARCHITECTURE.md, the choose-rule pipeline, the Prisma schema, the CLA, and existing .cursor/rules.

Before writing code, I want to validate the approach:

  1. Does this align with the project's direction? Especially given the Agent PR (Agent #1475) — would scoring complement or conflict?
  2. Prompt injection vs. hard-coded logic — do you prefer score context in the LLM prompt, or score-based rule conditions (e.g., "if sender score > 70")?
  3. Any concerns about the SenderScore Prisma model?
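To make question 2 concrete, this is one shape the prompt-context option could take. The function name, thresholds, and wording are all hypothetical; the point is that the score is advisory context for the LLM, not a hard condition:

```typescript
// Hypothetical helper: render a sender's score as advisory context
// appended to the choose-rule prompt. Bucket thresholds are illustrative.
function scoreContext(senderEmail: string, score: number): string {
  const bucket = score >= 70 ? "high" : score >= 30 ? "medium" : "low";
  return (
    `Sender ${senderEmail} has an importance score of ${score}/100 (${bucket}). ` +
    `Weigh this when choosing between rules, but do not let it override explicit rule conditions.`
  );
}
```

The hard-coded alternative ("if sender score > 70, force rule X") is cheaper per call but reintroduces the brittleness this feature is meant to remove, which is why I lean toward prompt context; happy to go either way.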

Happy to discuss on Discord. I have a detailed technical spec covering algorithm design (informed by Google's Priority Inbox paper and SaneBox's documented approach), integration points, and risk analysis.
