dataframe: A fast, safe, and intuitive DataFrame library.

[ data, library, mit, program ]
Versions [RSS] 0.1.0.0, 0.1.0.1, 0.1.0.2, 0.1.0.3, 0.2.0.0, 0.2.0.1, 0.2.0.2, 0.3.0.0, 0.3.0.1, 0.3.0.2, 0.3.0.3, 0.3.0.4, 0.3.1.1, 0.3.1.2, 0.3.2.0, 0.3.3.0, 0.3.3.1, 0.3.3.2, 0.3.3.3, 0.3.3.4, 0.3.3.5, 0.3.3.6, 0.3.3.7, 0.3.3.8, 0.3.3.9, 0.3.4.0, 0.3.4.1, 0.3.5.0, 0.4.0.0, 0.4.0.2, 0.4.0.3, 0.4.0.4, 0.4.0.5, 0.4.0.6, 0.4.0.7, 0.4.0.8, 0.4.0.9, 0.4.0.10, 0.4.1.0, 0.5.0.0, 0.5.0.1, 0.6.0.0, 0.7.0.0, 1.0.0.0, 1.0.0.1
Change log CHANGELOG.md
Dependencies array (>=0.5 && <0.6), attoparsec (>=0.12 && <0.15), base (>=4 && <5), bytestring (>=0.11 && <0.13), bytestring-lexing (>=0.5 && <0.6), containers (>=0.6.7 && <0.9), dataframe (>=0.3 && <0.4), directory (>=1.3.0.0 && <2), granite (>=0.3 && <0.4), hashable (>=1.2 && <2), process (>=1.6 && <1.7), random (>=1 && <2), snappy-hs (>=0.1 && <0.2), template-haskell (>=2.0 && <3), text (>=2.0 && <3), time (>=1.12 && <2), vector (>=0.13 && <0.14), vector-algorithms (>=0.9 && <0.10), zstd (>=0.1.2.0 && <0.2) [details]
Tested with ghc ==9.4.8 || ==9.6.7 || ==9.8.4 || ==9.10.3 || ==9.12.2
License GPL-3.0-or-later
Copyright (c) 2024-2025 Michael Chavinda
Author Michael Chavinda
Maintainer mschavinda@gmail.com
Uploaded by mchav at 2025-10-08T00:50:17Z
Category Data
Bug tracker https://github.com/mchav/dataframe/issues
Source repo head: git clone https://github.com/mchav/dataframe
Reverse Dependencies 4 direct, 0 indirect [details]
Executables dataframe
Downloads 840 total (199 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Status Docs available [build log]
Last success reported on 2025-10-08 [all 1 reports]

Readme for dataframe-0.3.3.0

User guide | Discord

DataFrame

A fast, safe, and intuitive DataFrame library.

Why use this DataFrame library?

  • Encourages concise, declarative, and composable data pipelines.
  • Static typing makes code easier to reason about and catches many bugs at compile time—before your code ever runs.
  • Delivers high performance thanks to Haskell’s optimizing compiler and efficient memory model.
  • Designed for interactivity: expressive syntax, helpful error messages, and sensible defaults.
  • Works seamlessly in both command-line and notebook environments—great for exploration and scripting alike.

Example usage

Interactive environment

Screencast of usage in GHCi

Key features in example:

  • Intuitive, SQL-like API to get from data to insights.
  • Create typed, completion-ready references to columns in a dataframe using :exposeColumns.
  • Type-safe column transformations for faster and safer exploration.
  • Fluid, chaining API that makes code easy to reason about.
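
To illustrate the chaining style named in the last bullet, here is a minimal sketch in plain Haskell (base only). The `|>` operator is defined locally as reverse function application. That definition is an assumption about how the library's operator behaves, not a copy of its source, and `topPositive` is a hypothetical helper used only for illustration.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

-- ASSUMPTION: a local stand-in for the library's chaining operator,
-- defined here as reverse function application.
infixl 1 |>
(|>) :: a -> (a -> b) -> b
x |> f = f x

-- Hypothetical helper: keep positive values, sort them in descending
-- order, and keep the first n.
topPositive :: Int -> [Int] -> [Int]
topPositive n xs =
  xs
    |> filter (> 0)
    |> sortOn Down
    |> take n

main :: IO ()
main = print (topPositive 3 [4, -1, 10, 2, 7]) -- prints [10,7,4]
```

Each step reads top to bottom in the order it runs, which is the same shape as a pipeline of dataframe operations.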

Standalone script example

    -- Useful Haskell extensions.
    {-# LANGUAGE OverloadedStrings #-} -- Allow string literals to be interpreted as any other string type.
    {-# LANGUAGE TypeApplications #-} -- Convenience syntax for specifying the type: `sum a b :: Int` vs `sum @Int a b`.

    import qualified DataFrame as D -- Import for general functionality.
    import qualified DataFrame.Functions as F -- Import for column expressions.

    import DataFrame ((|>)) -- Import the chaining operator unqualified.

    main :: IO ()
    main = do
        df <- D.readTsv "./data/chipotle.tsv"
        let quantity = F.col "quantity" :: D.Expr Int -- A typed reference to a column.
        print (df
          |> D.select ["item_name", "quantity"]
          |> D.groupBy ["item_name"]
          |> D.aggregate [ (F.sum quantity) `F.as` "sum_quantity"
                         , (F.mean quantity) `F.as` "mean_quantity"
                         , (F.maximum quantity) `F.as` "maximum_quantity"
                         ]
          |> D.sortBy D.Descending ["sum_quantity"]
          |> D.take 10)

Output:

    -------------------------------------------------------------------------------------------
    index | item_name                    | sum_quantity | mean_quantity      | maximum_quantity
    ------|------------------------------|--------------|--------------------|-----------------
    Int   | Text                         | Int          | Double             | Int
    ------|------------------------------|--------------|--------------------|-----------------
    0     | Chicken Bowl                 | 761          | 1.0482093663911847 | 3
    1     | Chicken Burrito              | 591          | 1.0687160940325497 | 4
    2     | Chips and Guacamole          | 506          | 1.0563674321503131 | 4
    3     | Steak Burrito                | 386          | 1.048913043478261  | 3
    4     | Canned Soft Drink            | 351          | 1.1661129568106312 | 4
    5     | Chips                        | 230          | 1.0900473933649288 | 3
    6     | Steak Bowl                   | 221          | 1.04739336492891   | 3
    7     | Bottled Water                | 211          | 1.3024691358024691 | 10
    8     | Chips and Fresh Tomato Salsa | 130          | 1.1818181818181819 | 15
    9     | Canned Soda                  | 126          | 1.2115384615384615 | 4

A full example that uses many of the constructs in the API is in the ./examples folder.
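
To make concrete what the group, aggregate, sort, and take steps of the script compute, here is a rough sketch of the same idea in plain Haskell using `Data.Map.Strict` from the `containers` package. It does not use the DataFrame library and says nothing about how the library is implemented. The `Summary` type, the `summarise` and `topBySum` helpers, and the sample rows are all hypothetical.

```haskell
import Data.List (sortOn)
import qualified Data.Map.Strict as M
import Data.Ord (Down (..))

-- Hypothetical per-group summary: sum, mean, and maximum of the quantities.
data Summary = Summary
  { sumQuantity :: Int
  , meanQuantity :: Double
  , maximumQuantity :: Int
  } deriving (Show, Eq)

-- Group (item, quantity) rows by item, then summarise each group.
-- Every group holds at least one quantity, so `maximum` is safe here.
summarise :: [(String, Int)] -> M.Map String Summary
summarise rows = M.map toSummary groups
  where
    groups = M.fromListWith (++) [(item, [q]) | (item, q) <- rows]
    toSummary qs =
      Summary
        { sumQuantity = sum qs
        , meanQuantity = fromIntegral (sum qs) / fromIntegral (length qs)
        , maximumQuantity = maximum qs
        }

-- Sort groups by total quantity, largest first, and keep the first n.
topBySum :: Int -> M.Map String Summary -> [(String, Summary)]
topBySum n = take n . sortOn (Down . sumQuantity . snd) . M.toList

main :: IO ()
main = do
  -- Hypothetical rows for illustration; not taken from chipotle.tsv.
  let rows =
        [ ("Chips", 1)
        , ("Chicken Bowl", 2)
        , ("Chips", 3)
        , ("Chicken Bowl", 1)
        , ("Bottled Water", 1)
        ]
  mapM_ print (topBySum 2 (summarise rows))
```

With these sample rows, the two groups kept are "Chips" (sum 4, mean 2.0, maximum 3) followed by "Chicken Bowl" (sum 3, mean 1.5, maximum 2).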

Installing

Jupyter notebook

CLI

  • Run the installation script: curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/mchav/dataframe/refs/heads/main/scripts/install.sh | sh
  • Download the run script with: curl --output dataframe "https://raw.githubusercontent.com/mchav/dataframe/refs/heads/main/scripts/dataframe.sh"
  • Make the script executable: chmod +x dataframe
  • Add the script's directory to your path: export PATH="$PATH:$(pwd)"
  • Run the script with: dataframe

What is exploratory data analysis?

We provide a primer here and show how to do some common analyses.

Coming from other dataframe libraries

Familiar with another dataframe library? Get started:

Supported input formats

  • CSV
  • Apache Parquet

Supported output formats

  • CSV

Future work

  • Apache Arrow compatibility
  • Integration with more data formats (SQLite, Postgres, JSON Lines, xlsx).
  • Host the whole library + Jupyter lab on Azure with auth and isolation.