[Feature Request] Re-embed existing memories when embedding model/provider changes #1333

@z6166

Pre-submission checklist

  • I have searched existing issues and this hasn't been mentioned before
  • I have read the project documentation and confirmed this issue doesn't already exist
  • This issue is specific to MemOS and not a general software issue

Problem

When a user changes their embedding model configuration (e.g., switching from text-embedding-3-small to text-embedding-3-large, or from OpenAI to a self-hosted model), existing memories become incompatible with vector search.

The embeddings table stores vectors with dimensions but does not store which embedding model/provider generated them:

CREATE TABLE embeddings (
  chunk_id   TEXT PRIMARY KEY,
  vector     BLOB NOT NULL,
  dimensions INTEGER NOT NULL,
  updated_at INTEGER NOT NULL
);
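One possible way to close this gap, sketched in Python with the standard `sqlite3` module, is to add provenance columns so each row records which model produced it. The column names `provider` and `model` and both helper names are assumptions for illustration; they are not part of the current MemOS schema.

```python
import sqlite3

# Schema as quoted in this issue.
SCHEMA = """
CREATE TABLE embeddings (
    chunk_id   TEXT PRIMARY KEY,
    vector     BLOB NOT NULL,
    dimensions INTEGER NOT NULL,
    updated_at INTEGER NOT NULL
);
"""

# Hypothetical provenance columns (assumption, not the plugin's real schema).
PROVENANCE_COLUMNS = {"provider": "TEXT", "model": "TEXT"}


def add_provenance_columns(conn):
    """Add the provenance columns if they are missing; safe to call repeatedly."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(embeddings)")}
    for name, sql_type in PROVENANCE_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE embeddings ADD COLUMN {name} {sql_type}")
    conn.commit()


def count_stale(conn, provider, model):
    """Count rows not produced by the given provider/model.

    Legacy rows have NULL provenance; `IS NOT` is NULL-safe in SQLite,
    so those rows are counted as stale too.
    """
    row = conn.execute(
        "SELECT COUNT(*) FROM embeddings WHERE provider IS NOT ? OR model IS NOT ?",
        (provider, model),
    ).fetchone()
    return row[0]
```

With columns like these, a non-zero `count_stale(...)` result for the currently configured model would be a direct signal that some stored vectors need re-embedding.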

After changing the embedding model:

  • New queries are embedded with the new model
  • Existing stored vectors were generated by the old model
  • Cosine similarity between vectors from different embedding spaces yields meaningless scores
  • Vector search effectively breaks: it returns meaningless rankings when the dimensions happen to match, or 0 scores when the dimensions mismatch
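The failure mode above can be illustrated with a minimal cosine-similarity sketch. The toy vectors stand in for real embeddings, and `cosine_lenient` is a hypothetical stand-in for a search path that swallows the mismatch and reports 0; it is not taken from the plugin's code.

```python
import math


def cosine(a, b):
    """Cosine similarity; only defined for vectors of equal length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cosine_lenient(a, b):
    """Hypothetical lenient variant: hides the mismatch behind a 0 score."""
    try:
        return cosine(a, b)
    except ValueError:
        return 0.0


query_new_model = [1.0, 0.0, 0.0]   # e.g. produced by the newly configured model
stored_old_model = [1.0, 0.0]       # e.g. produced by the previous model

print(cosine_lenient(query_new_model, stored_old_model))
```

When two models share a dimensionality, `cosine` raises nothing and returns a number, which is the harder case to notice: the score is computed but carries no semantic meaning.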

The system falls back to FTS-only search, but the degradation is silent: users have no indication that their vector search is broken.

Expected Behavior

A supported way to re-embed all existing memories when the embedding model changes:

  1. Detection: When embedding config changes (model, provider, or endpoint), warn users that vector search may be degraded until re-embedding is complete
  2. Re-embedding tool: Provide a CLI command or API to regenerate embeddings for all existing chunks using the new model
  3. Incremental support: Allow re-embedding only new chunks (those without embeddings)
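The three points above could be sketched roughly as follows, in Python against SQLite. Everything beyond the `embeddings` table is an assumption made for illustration: the `chunks(id, content)` table, the `embedding_meta` table, the float32 little-endian BLOB encoding, millisecond timestamps, and the `embed` callable that maps a list of texts to a list of vectors.

```python
import hashlib
import json
import sqlite3
import struct
import time


def config_fingerprint(provider, model, endpoint):
    """Stable hash of the settings that determine the embedding space."""
    payload = json.dumps(
        {"provider": provider, "model": model, "endpoint": endpoint}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def _ensure_meta(conn):
    # Hypothetical key/value table for remembering the last-used config.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embedding_meta (key TEXT PRIMARY KEY, value TEXT)"
    )


def config_changed(conn, fingerprint):
    """True if a fingerprint is stored and differs from the current one."""
    _ensure_meta(conn)
    row = conn.execute(
        "SELECT value FROM embedding_meta WHERE key = 'fingerprint'"
    ).fetchone()
    return row is not None and row[0] != fingerprint


def record_fingerprint(conn, fingerprint):
    """Remember the config that the stored vectors now correspond to."""
    _ensure_meta(conn)
    conn.execute(
        "INSERT OR REPLACE INTO embedding_meta (key, value) VALUES ('fingerprint', ?)",
        (fingerprint,),
    )
    conn.commit()


def reembed(conn, embed, only_missing=False, batch_size=64):
    """Regenerate embeddings for all chunks, or only for chunks without one."""
    query = "SELECT c.id, c.content FROM chunks c"
    if only_missing:
        query += " LEFT JOIN embeddings e ON e.chunk_id = c.id WHERE e.chunk_id IS NULL"
    rows = conn.execute(query).fetchall()
    now = int(time.time() * 1000)  # assumption: updated_at is in milliseconds
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        vectors = embed([content for _, content in batch])
        for (chunk_id, _), vec in zip(batch, vectors):
            blob = struct.pack(f"<{len(vec)}f", *vec)  # assumed float32 encoding
            conn.execute(
                "INSERT OR REPLACE INTO embeddings "
                "(chunk_id, vector, dimensions, updated_at) VALUES (?, ?, ?, ?)",
                (chunk_id, blob, len(vec), now),
            )
        conn.commit()  # commit per batch so an interrupted run keeps its progress
    return len(rows)
```

A caller would compare `config_fingerprint(...)` against the stored value at startup, warn when `config_changed` returns true, and call `record_fingerprint` only after a full `reembed` completes. Note that a database with no stored fingerprint is treated as unchanged here, so legacy installs would need a one-time full re-embed or an explicit check.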

Environment

  • MemOS Local Plugin: memos-local-openclaw-plugin
  • Storage: SQLite with embeddings table
  • Search: Hybrid (FTS5 + vector via RRF fusion)
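For context on why the fallback goes unnoticed, here is a generic sketch of reciprocal rank fusion (RRF) in Python. It is the textbook formulation, not the plugin's implementation; the constant `k=60` is the conventional default and the plugin's actual value is not stated in this issue.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of ids: each list contributes 1 / (k + rank) per id."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


fts_hits = ["c1", "c2", "c3"]
vector_hits = ["c3", "c1"]

print(rrf_fuse([fts_hits, vector_hits]))  # both signals contribute
print(rrf_fuse([fts_hits, []]))           # vector side empty: FTS order only
```

When the vector side contributes nothing, the fused result is simply the FTS ranking, so results still come back and nothing signals that one of the two signals is missing.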

Happy to contribute if pointed in the right direction.
