LiquidCache understands both your data and your query.
- It transcodes storage data into an optimized, cache-only format, so you can keep using your favorite formats without worrying about performance.
- It keeps the data that matters in memory and uses modern SSDs efficiently. For example, if your query groups by
year, LiquidCache stores only the year in memory and keeps the full timestamp on disk.
LiquidCache is a research project funded by InfluxData, SpiralDB, and Bauplan.
You may want to consider Foyer if you're looking for a black-box cache: it is easier to set up, but not as "smart" as LiquidCache.
This quick start uses the core cache API from `src/core`. Add these dependencies to your project: `liquid-cache`, `arrow`, and `datafusion`. The example below shows insert, get, get with selection, and get with predicate pushdown.
```rust
use arrow::array::{BooleanArray, UInt64Array};
use arrow::buffer::BooleanBuffer;
use datafusion::logical_expr::Operator;
use datafusion::physical_plan::PhysicalExpr;
use datafusion::physical_plan::expressions::{BinaryExpr, Column, Literal};
use datafusion::scalar::ScalarValue;
use liquid_cache::cache::{EntryID, LiquidCacheBuilder};
use std::sync::Arc;

tokio_test::block_on(async {
    let cache = LiquidCacheBuilder::new().build().await;
    let entry_id = EntryID::from(1);
    let values = Arc::new(UInt64Array::from(vec![10, 11, 12, 13, 14, 15]));

    // 1) insert
    cache.insert(entry_id, values.clone()).await;

    // 2) get
    let all_rows = cache.get(&entry_id).await.expect("entry should exist");

    // 3) get filtered (selection pushdown): keep rows 0, 2, 4
    let selection = BooleanBuffer::from(vec![true, false, true, false, true, false]);
    let selected_rows = cache
        .get(&entry_id)
        .with_selection(&selection)
        .await
        .expect("entry should exist");

    // 4) get with predicate pushdown: col > 12
    let predicate: Arc<dyn PhysicalExpr> = Arc::new(BinaryExpr::new(
        Arc::new(Column::new("col", 0)),
        Operator::Gt,
        Arc::new(Literal::new(ScalarValue::UInt64(Some(12)))),
    ));
    let predicate_mask = cache
        .eval_predicate(&entry_id, &predicate)
        .await
        .expect("entry should exist")
        .expect("predicate should be evaluated in cache");

    // Conceptual expectations:
    assert_eq!(all_rows.as_ref(), values.as_ref()); // [10, 11, 12, 13, 14, 15]
    assert_eq!(selected_rows.as_ref(), &UInt64Array::from(vec![10, 12, 14]));
    assert_eq!(
        predicate_mask,
        BooleanArray::from(vec![
            Some(false),
            Some(false),
            Some(false),
            Some(true),
            Some(true),
            Some(true),
        ]),
    );
});
```

See dev/README.md for more.
LiquidCache requires a few non-default DataFusion configurations:
ListingTable:
```rust
let (ctx, _) = LiquidCacheLocalBuilder::new().build(config).await?;
let listing_options = ParquetReadOptions::default()
    .to_listing_options(&ctx.copied_config(), ctx.copied_table_options());
ctx.register_listing_table("default", &table_path, listing_options, None, None)
    .await?;
```

Or register Parquet directly:

```rust
let (ctx, _) = LiquidCacheLocalBuilder::new().build(config).await?;
ctx.register_parquet("default", "examples/nano_hits.parquet", Default::default())
    .await?;
```

For performance testing, disable background transcoding:

```rust
let (ctx, _) = LiquidCacheLocalBuilder::new()
    .with_squeeze_policy(Box::new(squeeze_policies::Evict))
    .build(config)
    .await?;
```

LiquidCache is optimized for x86-64 with specific instructions. On ARM (e.g., Apple Silicon), fallback implementations are used. Contributions are welcome.
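As a rough illustration of how such fallbacks can work (a generic sketch, not LiquidCache's actual dispatch code), Rust's runtime feature detection lets one function choose a SIMD path on x86-64 and a portable loop everywhere else:

```rust
// Generic sketch of arch-specific dispatch; the AVX2 branch here just
// stands in for a real vectorized kernel.
fn sum(values: &[u64]) -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // A real implementation would call an AVX2 kernel here.
            return values.iter().sum();
        }
    }
    // Portable fallback, used on ARM (e.g., Apple Silicon).
    values.iter().sum()
}

fn main() {
    assert_eq!(sum(&[10, 11, 12]), 33);
    println!("sum = {}", sum(&[10, 11, 12])); // prints "sum = 33"
}
```

The `cfg` gate keeps the x86-only macro out of ARM builds entirely, so the same source compiles on both targets.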
Not yet. Production readiness is our goal, but we are still implementing features and polishing the system. LiquidCache began as a research project exploring new approaches to cost-effective caching. Like most research projects, it takes time to mature—we welcome your help.
See our paper for details. We are also working on a technical blog to introduce LiquidCache in a more accessible way.
We are always looking for contributors. Feedback and improvements are welcome—explore the issue list and contribute where you can. If you want to get involved in the research side, reach out.
LiquidCache is a research project funded by:
- SpiralDB
- InfluxData
- Bauplan
- Taxpayers of the state of Wisconsin and the federal government.
LiquidCache is and will remain open source and free to use.
Your support for science is greatly appreciated!
