dataframe: A fast, safe, and intuitive DataFrame library.

[ data, library, mit, program ]

A fast, safe, and intuitive DataFrame library for exploratory data analysis.

Modules

DataFrame

Downloads

dataframe-0.3.3.0.tar.gz [browse] (Cabal source package)
Package description (as included in the package)

Maintainer's Corner

Package maintainers

mchav

Versions [RSS]	0.1.0.0, 0.1.0.1, 0.1.0.2, 0.1.0.3, 0.2.0.0, 0.2.0.1, 0.2.0.2, 0.3.0.0, 0.3.0.1, 0.3.0.2, 0.3.0.3, 0.3.0.4, 0.3.1.1, 0.3.1.2, 0.3.2.0, 0.3.3.0, 0.3.3.1, 0.3.3.2, 0.3.3.3, 0.3.3.4, 0.3.3.5, 0.3.3.6, 0.3.3.7, 0.3.3.8, 0.3.3.9, 0.3.4.0, 0.3.4.1, 0.3.5.0, 0.4.0.0, 0.4.0.2, 0.4.0.3, 0.4.0.4, 0.4.0.5, 0.4.0.6, 0.4.0.7, 0.4.0.8, 0.4.0.9, 0.4.0.10, 0.4.1.0, 0.5.0.0, 0.5.0.1, 0.6.0.0, 0.7.0.0, 1.0.0.0, 1.0.0.1
Change log	CHANGELOG.md
Dependencies	array (>=0.5 && <0.6), attoparsec (>=0.12 && <0.15), base (>=4 && <5), bytestring (>=0.11 && <0.13), bytestring-lexing (>=0.5 && <0.6), containers (>=0.6.7 && <0.9), dataframe (>=0.3 && <0.4), directory (>=1.3.0.0 && <2), granite (>=0.3 && <0.4), hashable (>=1.2 && <2), process (>=1.6 && <1.7), random (>=1 && <2), snappy-hs (>=0.1 && <0.2), template-haskell (>=2.0 && <3), text (>=2.0 && <3), time (>=1.12 && <2), vector (>=0.13 && <0.14), vector-algorithms (>=0.9 && <0.10), zstd (>=0.1.2.0 && <0.2) [details]
Tested with	ghc ==9.4.8 || ==9.6.7 || ==9.8.4 || ==9.10.3 || ==9.12.2
License	GPL-3.0-or-later
Copyright	(c) 2024-2025 Michael Chavinda
Author	Michael Chavinda
Maintainer	mschavinda@gmail.com
Uploaded	by mchav at 2025-10-08T00:50:17Z
Category	Data
Bug tracker	https://github.com/mchav/dataframe/issues
Source repo	head: git clone https://github.com/mchav/dataframe
Reverse Dependencies	4 direct, 0 indirect [details]
Executables	dataframe
Downloads	840 total (199 in the last 30 days)
Rating	(no votes yet) [estimated by Bayesian average]
Status	Docs available [build log] Last success reported on 2025-10-08 [all 1 reports]

Readme for dataframe-0.3.3.0

User guide | Discord

DataFrame

A fast, safe, and intuitive DataFrame library.

Why use this DataFrame library?

Encourages concise, declarative, and composable data pipelines.
Static typing makes code easier to reason about and catches many bugs at compile time—before your code ever runs.
Delivers high performance thanks to Haskell’s optimizing compiler and efficient memory model.
Designed for interactivity: expressive syntax, helpful error messages, and sensible defaults.
Works seamlessly in both command-line and notebook environments—great for exploration and scripting alike.

Example usage

Interactive environment

Screencast of usage in GHCi

Key features in example:

Intuitive, SQL-like API to get from data to insights.
Create typed, completion-ready references to columns in a dataframe using :exposeColumns.
Type-safe column transformations for faster and safer exploration.
Fluid, chaining API that makes code easy to reason about.
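
To make the chaining style concrete, here is a minimal sketch that uses only base. It assumes the library's `|>` behaves like left-to-right function application (the same idea as `Data.Function.(&)`); the local definition, its fixity, and the names `quantities` and `topThree` are illustrative stand-ins, not the library's code.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down (..))

-- Assumption: a stand-in for the library's chaining operator.
-- It feeds the value on the left into the function on the right.
infixl 1 |>
(|>) :: a -> (a -> b) -> b
x |> f = f x

-- Made-up values for illustration only.
quantities :: [Int]
quantities = [3, 1, 4, 1, 5, 9, 2, 6]

-- Keep values above 1, sort them in descending order, keep the first three.
topThree :: [Int] -> [Int]
topThree xs =
  xs
    |> filter (> 1)
    |> sortBy (comparing Down)
    |> take 3

main :: IO ()
main = print (topThree quantities)
```

Each step reads top to bottom in the order it runs, which is the property the dataframe pipelines rely on.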

Standalone script example

-- Useful Haskell extensions.
{-# LANGUAGE OverloadedStrings #-} -- Allow string literals to be interpreted as any other string type.
{-# LANGUAGE TypeApplications #-} -- Convenience syntax for specifying the type: `sum a b :: Int` vs `sum @Int a b`.

import qualified DataFrame as D -- Import for general functionality.
import qualified DataFrame.Functions as F -- Import for column expressions.

import DataFrame ((|>)) -- Import the chaining operator unqualified.

main :: IO ()
main = do
    df <- D.readTsv "./data/chipotle.tsv"
    let quantity = F.col "quantity" :: D.Expr Int -- A typed reference to a column.
    print (df
      |> D.select ["item_name", "quantity"]
      |> D.groupBy ["item_name"]
      |> D.aggregate [ (F.sum quantity)     `F.as` "sum_quantity"
                     , (F.mean quantity)    `F.as` "mean_quantity"
                     , (F.maximum quantity) `F.as` "maximum_quantity"
                     ]
      |> D.sortBy D.Descending ["sum_quantity"]
      |> D.take 10)

Output:

-------------------------------------------------------------------------------------------
index | item_name                    | sum_quantity | mean_quantity      | maximum_quantity
------|------------------------------|--------------|--------------------|-----------------
 Int  | Text                         | Int          | Double             | Int
------|------------------------------|--------------|--------------------|-----------------
0     | Chicken Bowl                 | 761          | 1.0482093663911847 | 3
1     | Chicken Burrito              | 591          | 1.0687160940325497 | 4
2     | Chips and Guacamole          | 506          | 1.0563674321503131 | 4
3     | Steak Burrito                | 386          | 1.048913043478261  | 3
4     | Canned Soft Drink            | 351          | 1.1661129568106312 | 4
5     | Chips                        | 230          | 1.0900473933649288 | 3
6     | Steak Bowl                   | 221          | 1.04739336492891   | 3
7     | Bottled Water                | 211          | 1.3024691358024691 | 10
8     | Chips and Fresh Tomato Salsa | 130          | 1.1818181818181819 | 15
9     | Canned Soda                  | 126          | 1.2115384615384615 | 4

A fuller example that uses many of the constructs in the API is in the ./examples folder.
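
For readers new to group-by pipelines, the sketch below shows what the select, group, aggregate, sort, and take steps compute, using only base and containers (which ships with GHC). It is not how dataframe is implemented. The `Row` and `Summary` types, the helper names, and the sample rows are all hypothetical and are not taken from chipotle.tsv.

```haskell
import qualified Data.Map.Strict as M
import Data.List (sortBy)
import Data.Ord (comparing, Down (..))

-- Hypothetical row type: (item_name, quantity).
type Row = (String, Int)

-- Hypothetical per-item summary: sum, mean, and maximum of quantity.
data Summary = Summary
  { sumQuantity :: Int
  , meanQuantity :: Double
  , maximumQuantity :: Int
  } deriving (Show, Eq)

-- Group quantities by item name, then reduce each (non-empty) group.
summarise :: [Row] -> [(String, Summary)]
summarise rows = M.toList (M.map reduce groups)
  where
    groups = M.fromListWith (++) [(name, [q]) | (name, q) <- rows]
    reduce qs =
      Summary
        { sumQuantity = sum qs
        , meanQuantity = fromIntegral (sum qs) / fromIntegral (length qs)
        , maximumQuantity = maximum qs
        }

-- Sort by total quantity, largest first, and keep the first n groups.
topBy :: Int -> [(String, Summary)] -> [(String, Summary)]
topBy n = take n . sortBy (comparing (Down . sumQuantity . snd))

-- Made-up rows for illustration only.
sample :: [Row]
sample =
  [ ("Chicken Bowl", 1), ("Chips", 2), ("Chicken Bowl", 3)
  , ("Steak Burrito", 1), ("Chips", 1)
  ]

main :: IO ()
main = mapM_ print (topBy 2 (summarise sample))
```

With the sample rows, "Chicken Bowl" (total 4) ranks ahead of "Chips" (total 3), mirroring the descending sort on sum_quantity in the script.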

Installing

Jupyter notebook

We have a hosted version of the Jupyter notebook on Azure. It runs on Azure's free tier, so it can only support 3 or 4 kernels at a time.
To get started quickly, use the Dockerfile in ihaskell-dataframe to build and run an image with dataframe integration.
For a preview check out the California Housing notebook.

CLI

Run the installation script: curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/mchav/dataframe/refs/heads/main/scripts/install.sh | sh
Download the run script with: curl --output dataframe "https://raw.githubusercontent.com/mchav/dataframe/refs/heads/main/scripts/dataframe.sh"
Make the script executable: chmod +x dataframe
Add the directory containing the script to your path: export PATH="$PATH:$PWD"
Run the script with: dataframe

What is exploratory data analysis?

We provide a primer here and show how to do some common analyses.

Coming from other dataframe libraries

Familiar with another dataframe library? Get started:

Supported input formats

CSV
Apache Parquet

Supported output formats

Future work

Apache Arrow compatibility
Integration with more data formats (SQLite, Postgres, JSON Lines, XLSX).
Host the whole library + Jupyter lab on Azure with auth and isolation.