khubaib-ctrl/rag-core

⚡ rag-core

Minimal, elegant RAG framework for TypeScript

Zero dependencies · Type-safe · Production-grade · Shield included



Why rag-core?

Most RAG frameworks are heavyweight, opinionated, and leave security as an afterthought. rag-core is different:

| Principle | What it means |
| --- | --- |
| 🪶 Zero Dependencies | Pure TypeScript. Uses native fetch. No bloated dependency tree. |
| 🔒 Shield Built-in | Prompt injection detection out of the box. Production-grade from day one. |
| 🧩 Modular & Swappable | Every component (embedder, store, ranker) implements a clean interface. Swap OpenAI for Cohere in one line. |
| 🎯 Deep Type Safety | Generic metadata flows through the entire pipeline. Your IDE knows the shape of your data everywhere. |
| ⚡ 10-Line Quickstart | From install to working RAG pipeline in under 10 lines of code. |

Quick Start

    npm install rag-core

    import { RagCore, OpenAIEmbedder, MemoryStore } from 'rag-core';

    const rag = new RagCore({
      embedder: new OpenAIEmbedder({ apiKey: process.env.OPENAI_API_KEY! }),
      vectorStore: new MemoryStore(),
      shield: { detectInjection: true },
    });

    // Ingest a document: chainable, readable, typed
    await rag
      .ingest({ content: 'TypeScript is a typed superset of JavaScript...' })
      .split({ chunkSize: 500 })
      .store();

    // Query with automatic shield + vector search
    const results = await rag.query('What is TypeScript?', { topK: 5 });

That's it. 7 lines to a working RAG pipeline with prompt injection protection.


Architecture

    Document → [ Ingestor ] → Chunks → [ Embedder ] → Vectors → [ Store ]
                                                                    ↓
                    Query → [ Shield ] → [ Embedder ] → [ Store.search ] → [ Ranker ] → Results

The 5 Pillars

| Module | Class | Purpose |
| --- | --- | --- |
| Ingestor | RecursiveCharacterSplitter | Smart text chunking with recursive separator hierarchy |
| Embedder | OpenAIEmbedder, CohereEmbedder | Map text to vectors via any embedding API |
| Store | MemoryStore | Vector storage with cosine similarity search |
| Ranker | CohereRanker | Re-rank results with cross-encoder models |
| Shield | InjectionDetector | Detect prompt injection attacks before they reach your LLM |

API Reference

RagCore<TMeta>

The main orchestrator. All operations flow through this class.

    const rag = new RagCore<MyMetadata>({
      embedder: new OpenAIEmbedder({ apiKey: '...' }),
      vectorStore: new MemoryStore<MyMetadata>(),
      ranker: new CohereRanker({ apiKey: '...' }),        // optional
      shield: { detectInjection: true, threshold: 0.7 },  // optional
    });

.ingest(document) → Pipeline

Start a chainable ingestion pipeline:

    await rag
      .ingest({ id: 'doc-1', content: '...', metadata: { source: 'web' } })
      .split({ chunkSize: 500, chunkOverlap: 50 })
      .store();

.ingestMany(documents, options?) → Promise<EmbeddedChunk[]>

Batch-ingest multiple documents:

    await rag.ingestMany([doc1, doc2, doc3], { chunkSize: 500 });

.query(question, options?) → Promise<SearchResult[]>

Query the pipeline with automatic shield → embed → search → rerank:

    const results = await rag.query('How does X work?', {
      topK: 5,       // number of results (default: 5)
      rerank: true,  // enable re-ranking (default: false)
      shield: true,  // enable injection check (default: true)
    });

.shield(input) → ShieldResult

Manually check any text for prompt injection:

    const check = rag.shield(userInput);
    if (!check.safe) {
      console.warn(`Blocked! Threats: ${check.threats.join(', ')}`);
    }

Embedders

OpenAIEmbedder

    new OpenAIEmbedder({
      apiKey: 'sk-...',
      model: 'text-embedding-3-small',       // default
      baseUrl: 'https://api.openai.com/v1',  // default
    });

CohereEmbedder

    new CohereEmbedder({
      apiKey: '...',
      model: 'embed-english-v3.0', // default
    });

Custom Embedder

Implement the Embedder interface to use any provider:

    import type { Embedder } from 'rag-core';

    class MyEmbedder implements Embedder {
      async embed(texts: string[]): Promise<number[][]> { /* ... */ }
      async embedQuery(text: string): Promise<number[]> { /* ... */ }
    }
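As an illustration of the interface above, here is a minimal sketch of an offline embedder that is handy for tests. The `Embedder` shape is copied locally from the README rather than imported, and `HashEmbedder` and `hashVectorize` are hypothetical names that are not part of rag-core. The vectors come from hashing character trigrams, so they capture surface overlap only, not meaning.

```typescript
// Local copy of the interface shape shown in the README (assumption: the real
// rag-core interface has exactly these two methods).
interface Embedder {
  embed(texts: string[]): Promise<number[][]>;
  embedQuery(text: string): Promise<number[]>;
}

// FNV-1a style 32-bit hash of a string, returned as an unsigned integer.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hypothetical helper: count character trigrams into `dimensions` buckets,
// then L2-normalize so cosine similarity behaves sensibly.
function hashVectorize(text: string, dimensions: number): number[] {
  const vector = new Array<number>(dimensions).fill(0);
  const normalized = text.toLowerCase();
  for (let i = 0; i + 3 <= normalized.length; i++) {
    vector[fnv1a(normalized.slice(i, i + 3)) % dimensions] += 1;
  }
  const norm = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vector : vector.map((v) => v / norm);
}

// Hypothetical deterministic embedder: no network calls, no API key.
class HashEmbedder implements Embedder {
  private readonly dimensions: number;

  constructor(dimensions: number = 64) {
    this.dimensions = dimensions;
  }

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => hashVectorize(text, this.dimensions));
  }

  async embedQuery(text: string): Promise<number[]> {
    return hashVectorize(text, this.dimensions);
  }
}
```

If the real interface matches this shape, an instance could be passed as `embedder` in place of `OpenAIEmbedder` when running tests without network access.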

Vector Stores

MemoryStore<TMeta>

In-memory store with pure cosine similarity. Great for prototyping and small datasets.

    const store = new MemoryStore<MyMeta>();
    store.size;     // number of stored chunks
    store.clear();  // remove all
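For readers unfamiliar with cosine similarity search, the sketch below shows the general technique: score every stored vector against the query and keep the best `topK`. It is not MemoryStore's actual code; `cosineSimilarity`, `topKBySimilarity`, and the entry shape are hypothetical names chosen for illustration.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vector dimensions must match');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Hypothetical entry and result shapes for this sketch.
interface Entry<T> {
  vector: number[];
  item: T;
}

interface Scored<T> {
  item: T;
  score: number;
}

// Brute-force search: O(n * d) per query, which suits small datasets.
function topKBySimilarity<T>(
  query: number[],
  entries: Entry<T>[],
  topK: number,
): Scored<T>[] {
  return entries
    .map((entry) => ({ item: entry.item, score: cosineSimilarity(query, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A linear scan like this compares the query against every stored vector, which is why an in-memory approach fits prototyping and small datasets better than large corpora.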

Custom Store

Implement the VectorStore interface for Pinecone, Weaviate, Qdrant, etc.:

    import type { VectorStore, EmbeddedChunk, SearchResult } from 'rag-core';

    class PineconeStore<TMeta> implements VectorStore<TMeta> {
      async upsert(chunks: EmbeddedChunk<TMeta>[]): Promise<void> { /* ... */ }
      async search(query: number[], topK: number): Promise<SearchResult<TMeta>[]> { /* ... */ }
    }

Shield Layer

The shield layer is what sets rag-core apart: prompt security is built in, where many RAG frameworks leave it entirely to the application.

    import { InjectionDetector, sanitize } from 'rag-core';

    const detector = new InjectionDetector(0.7); // threshold
    const result = detector.analyze('Ignore all previous instructions...');
    // { safe: false, score: 0.9, threats: ['role-override:ignore-previous'] }

    const clean = sanitize(rawInput); // strips control chars, normalizes unicode
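To make "strips control chars, normalizes unicode" concrete, here is one plausible way such a step could look. This is a sketch under assumptions, not rag-core's `sanitize`: the function name `sanitizeSketch`, the choice of NFKC normalization, and the decision to keep tabs and newlines are all illustrative.

```typescript
// Hypothetical sanitizer: NFKC-normalize, then drop ASCII control characters
// except tab (\t), line feed (\n), and carriage return (\r).
function sanitizeSketch(input: string): string {
  return input
    .normalize('NFKC')
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, '');
}
```

NFKC folds compatibility forms such as fullwidth letters into their plain equivalents, which makes later pattern matching harder to evade with lookalike characters.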

Detected threat categories:

  • 🛡️ Role overrides (ignore previous, you are now, pretend to be)
  • 🔓 Delimiter injection (<|im_start|>, [INST], <<SYS>>)
  • 📤 Data exfiltration (show me your prompt, repeat the context)
  • 🎭 Jailbreaks (DAN mode, developer mode, god mode)
  • 🔀 Obfuscation (base64, eval(), encoding tricks)
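The categories above lend themselves to pattern-based scoring. The sketch below shows one way a detector of this kind might work. It is an assumption-laden illustration rather than `InjectionDetector`'s implementation: the rule list, the weights, and every threat id except `role-override:ignore-previous` (which appears in the example output above) are hypothetical.

```typescript
// Result shape mirrors the example output shown in the README.
interface ShieldResult {
  safe: boolean;
  score: number;
  threats: string[];
}

// Hypothetical rule format: a pattern plus a severity weight in [0, 1].
interface Rule {
  id: string;
  pattern: RegExp;
  weight: number;
}

// Illustrative rules only; a real detector would need far broader coverage.
const RULES: Rule[] = [
  {
    id: 'role-override:ignore-previous',
    pattern: /ignore\s+(all\s+)?(previous|prior)\s+instructions/i,
    weight: 0.9,
  },
  {
    id: 'delimiter:chat-template',
    pattern: /<\|im_start\|>|\[INST\]|<<SYS>>/i,
    weight: 0.8,
  },
  {
    id: 'exfiltration:show-prompt',
    pattern: /(show|reveal|repeat)\s+(me\s+)?(your|the)\s+(system\s+)?(prompt|context)/i,
    weight: 0.8,
  },
  {
    id: 'jailbreak:mode',
    pattern: /\b(dan|developer|god)\s+mode\b/i,
    weight: 0.7,
  },
];

// Score = highest weight among matched rules; input is unsafe once the score
// reaches the threshold (assumed semantics for this sketch).
function analyzeSketch(input: string, threshold: number = 0.7): ShieldResult {
  const matched = RULES.filter((rule) => rule.pattern.test(input));
  const score = matched.reduce((max, rule) => Math.max(max, rule.weight), 0);
  return {
    safe: score < threshold,
    score,
    threats: matched.map((rule) => rule.id),
  };
}
```

Taking the maximum weight rather than a sum keeps the score bounded at 1 and makes the threshold easy to reason about. A detector built on fixed patterns will miss paraphrased attacks, so it works best as one layer among several.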

Splitter

    import { RecursiveCharacterSplitter } from 'rag-core';

    const splitter = new RecursiveCharacterSplitter({
      chunkSize: 500,     // max chars per chunk (default: 500)
      chunkOverlap: 50,   // overlap between chunks (default: 50)
      separators: ['\n\n', '\n', '. ', ' ', ''],  // custom hierarchy
    });

    const chunks = splitter.split({ id: 'doc-1', content: longText });
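The idea behind a recursive separator hierarchy can be sketched as follows: split on the coarsest separator first, pack the pieces greedily up to `chunkSize`, and recurse with finer separators only for pieces that are still too long. This is a simplified illustration with a hypothetical `recursiveSplit` function. It omits `chunkOverlap` and is not the library's implementation.

```typescript
// Simplified recursive character splitting (no overlap handling).
function recursiveSplit(
  text: string,
  chunkSize: number,
  separators: string[] = ['\n\n', '\n', '. ', ' ', ''],
): string[] {
  if (text.length <= chunkSize) {
    return text.length > 0 ? [text] : [];
  }

  const separator = separators[0];
  const finerSeparators = separators.slice(1);

  // No separators left, or the empty separator: hard-cut into fixed-size pieces.
  if (separators.length === 0 || separator === '') {
    const pieces: string[] = [];
    for (let i = 0; i < text.length; i += chunkSize) {
      pieces.push(text.slice(i, i + chunkSize));
    }
    return pieces;
  }

  const chunks: string[] = [];
  let current = '';
  for (const part of text.split(separator)) {
    // Greedily merge neighbouring parts while the result still fits.
    const candidate = current === '' ? part : current + separator + part;
    if (candidate.length <= chunkSize) {
      current = candidate;
      continue;
    }
    if (current !== '') {
      chunks.push(current);
    }
    if (part.length <= chunkSize) {
      current = part;
    } else {
      // This part alone is too long: retry it with the finer separators.
      chunks.push(...recursiveSplit(part, chunkSize, finerSeparators));
      current = '';
    }
  }
  if (current !== '') {
    chunks.push(current);
  }
  return chunks;
}
```

For example, `recursiveSplit('aaa bbb ccc', 7)` falls through to the space separator and yields `['aaa bbb', 'ccc']`, while text with no separators at all is cut at fixed offsets by the final empty-string separator.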

Type Safety

rag-core uses TypeScript generics so your metadata type flows through the entire pipeline:

    interface MyMeta {
      source: string;
      page: number;
      confidential: boolean;
    }

    const rag = new RagCore<MyMeta>({
      embedder: new OpenAIEmbedder({ apiKey: '...' }),
      vectorStore: new MemoryStore<MyMeta>(),
    });

    // Metadata is typed everywhere
    await rag.ingest({
      content: '...',
      metadata: { source: 'report.pdf', page: 42, confidential: true },
    }).split().store();

    const results = await rag.query('...');
    results[0].chunk.metadata?.source;  // ✅ TypeScript knows this is `string`
    results[0].chunk.metadata?.page;    // ✅ TypeScript knows this is `number`

License

MIT © rag-core contributors
