dataframe: An intuitive, dynamically-typed DataFrame library.

[ data, library, mit, program ]

An intuitive, dynamically-typed DataFrame library for exploratory data analysis.


Versions 0.1.0.0, 0.1.0.1, 0.1.0.2, 0.1.0.3, 0.2.0.0, 0.2.0.1, 0.2.0.2, 0.3.0.0, 0.3.0.1, 0.3.0.2, 0.3.0.3, 0.3.0.4, 0.3.1.1, 0.3.1.2, 0.3.2.0, 0.3.3.0, 0.3.3.1, 0.3.3.2, 0.3.3.3, 0.3.3.4, 0.3.3.5, 0.3.3.6, 0.3.3.7, 0.3.3.8, 0.3.3.9, 0.3.4.0, 0.3.4.1, 0.3.5.0, 0.4.0.0, 0.4.0.2, 0.4.0.3, 0.4.0.4, 0.4.0.5, 0.4.0.6, 0.4.0.7, 0.4.0.8, 0.4.0.9, 0.4.0.10, 0.4.1.0, 0.5.0.0, 0.5.0.1, 0.6.0.0, 0.7.0.0, 1.0.0.0, 1.0.0.1
Change log CHANGELOG.md
Dependencies array (>=0.5 && <0.6), attoparsec (>=0.12 && <=0.14.4), base (>=4.17.2.0 && <4.21), bytestring (>=0.11 && <=0.12.2.0), containers (>=0.6.7 && <0.8), directory (>=1.3.0.0 && <=1.3.9.0), hashable (>=1.2 && <=1.5.0.0), statistics (>=0.16.2.1 && <=0.16.3.0), text (>=2.0 && <=2.1.2), time (>=1.12 && <=1.14), vector (>=0.13 && <0.14), vector-algorithms (>=0.9 && <0.10)
Tested with ghc ==9.8.3 || ==9.6.6 || ==9.4.8
License GPL-3.0-or-later
Copyright (c) 2024-2024 Michael Chavinda
Author Michael Chavinda
Maintainer mschavinda@gmail.com
Uploaded by mchav at 2025-06-15T06:50:17Z
Category Data
Bug tracker https://github.com/mchav/dataframe/issues
Source repo head: git clone https://github.com/mchav/dataframe
Reverse Dependencies 4 direct, 0 indirect
Executables dataframe
Downloads 840 total (199 in the last 30 days)
Status Docs available
Last success reported on 2025-06-15 [all 1 reports]

Readme for dataframe-0.2.0.0


DataFrame

An intuitive, dynamically-typed DataFrame library.

A tool for exploratory data analysis.

Installing

CLI

  • Install Haskell (ghc + cabal) via ghcup, selecting all the default options.
  • To install dataframe, run cabal update && cabal install dataframe
  • Open a Haskell REPL with dataframe loaded by running cabal repl --build-depends dataframe.
  • Follow along with any of the tutorials below.

Jupyter notebook

What is exploratory data analysis?

We provide a primer here and show how to do some common analyses.

Coming from other dataframe libraries

Familiar with another dataframe library? Get started:

Example usage

Code example

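```haskell
-- Lets string literals stand in for non-String types such as Text.
{-# LANGUAGE OverloadedStrings #-}

import qualified DataFrame as D
import DataFrame ((|>))

main :: IO ()
main = do
  df <- D.readTsv "./data/chipotle.tsv"
  print $
    df
      |> D.select ["item_name", "quantity"] -- keep only the two columns of interest
      |> D.groupBy ["item_name"] -- one group per menu item
      |> D.aggregate (zip (repeat "quantity") [D.Maximum, D.Mean, D.Sum]) -- max, mean and sum of quantity per group
      |> D.sortBy D.Descending ["Sum_quantity"] -- order by the summed column
```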

Output:

```
----------------------------------------------------------------------------------------------------
index | item_name                             | Sum_quantity | Mean_quantity      | Maximum_quantity
------|---------------------------------------|--------------|--------------------|-----------------
Int   | Text                                  | Int          | Double             | Int
------|---------------------------------------|--------------|--------------------|-----------------
0     | Chips and Fresh Tomato Salsa          | 130          | 1.1818181818181819 | 15
1     | Izze                                  | 22           | 1.1                | 3
2     | Nantucket Nectar                      | 31           | 1.1481481481481481 | 3
3     | Chips and Tomatillo-Green Chili Salsa | 35           | 1.1290322580645162 | 3
4     | Chicken Bowl                          | 761          | 1.0482093663911847 | 3
5     | Side of Chips                         | 110          | 1.0891089108910892 | 8
6     | Steak Burrito                         | 386          | 1.048913043478261  | 3
7     | Steak Soft Tacos                      | 56           | 1.018181818181818  | 2
8     | Chips and Guacamole                   | 506          | 1.0563674321503131 | 4
9     | Chicken Crispy Tacos                  | 50           | 1.0638297872340425 | 2
```
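To make the semantics of the pipeline in the code example concrete, here is a self-contained sketch that performs the same group, aggregate and sort steps with plain lists and Data.Map instead of the dataframe library. It is an illustration under assumptions, not the library's implementation: the Row type, the Stats record, the helper names (groupByItem, aggregateStats, sortBySumDesc) and the sample rows are all hypothetical, the select step is implicit because the rows only carry two columns, and (|>) is assumed to be ordinary forward application, like (&) from Data.Function.

```haskell
-- Hypothetical sketch, not the dataframe library's implementation.
import Data.List (sortBy)
import qualified Data.Map.Strict as Map
import Data.Ord (Down (..), comparing)

-- Forward application, assumed to match DataFrame's (|>); same as Data.Function's (&).
infixl 1 |>
(|>) :: a -> (a -> b) -> b
x |> f = f x

-- Hypothetical row holding only the two selected columns: (item_name, quantity).
type Row = (String, Int)

-- Per-group results, mirroring Sum_quantity, Mean_quantity and Maximum_quantity.
data Stats = Stats
  { sumQ :: Int
  , meanQ :: Double
  , maxQ :: Int
  }
  deriving (Show, Eq)

-- groupBy ["item_name"]: collect every quantity seen for each item name.
groupByItem :: [Row] -> Map.Map String [Int]
groupByItem rows = Map.fromListWith (++) [(name, [q]) | (name, q) <- rows]

-- aggregate: reduce each (non-empty) group to its sum, mean and maximum.
aggregateStats :: Map.Map String [Int] -> Map.Map String Stats
aggregateStats = Map.map toStats
  where
    toStats qs =
      let total = sum qs
       in Stats total (fromIntegral total / fromIntegral (length qs)) (maximum qs)

-- sortBy Descending ["Sum_quantity"]: largest total first.
sortBySumDesc :: Map.Map String Stats -> [(String, Stats)]
sortBySumDesc = sortBy (comparing (Down . sumQ . snd)) . Map.toList

main :: IO ()
main = do
  -- Made-up sample rows, for illustration only.
  let rows :: [Row]
      rows = [("Izze", 1), ("Chicken Bowl", 2), ("Izze", 3), ("Chicken Bowl", 1), ("Side of Chips", 8)]
  mapM_ print (rows |> groupByItem |> aggregateStats |> sortBySumDesc)
```

Because every group is created from at least one row, maximum is always applied to a non-empty list. The library's own grouping and aggregation strategy may well differ from this Map-based approach.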

A fuller example that uses many of the constructs in the API is in the ./app folder.

Visual example

Screencast of usage in GHCi

Future work

  • Apache Arrow and Parquet compatibility
  • Integration with more common data formats (currently only CSV is supported)
  • Windowed plotting (currently only ASCII plots are supported)
  • A lazy API that builds an execution graph instead of running eagerly, so that computations can run on files larger than RAM
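The lazy API in the last item is only planned, so the following is a hypothetical sketch of the general idea rather than anything the library provides. Operations are recorded in a small plan tree instead of being run immediately; the tree can be described or rewritten, and rows are produced only when it is executed. The Plan type and the explain, optimize and execute functions are invented names for illustration, and Scan simply wraps an in-memory list where a real out-of-core engine would stream from a file.

```haskell
-- Hypothetical sketch of a lazy, plan-based API; none of these names exist in dataframe.
import qualified Data.Map.Strict as Map

-- A row maps column names to integer values (simplified for illustration).
type Row = Map.Map String Int

-- Each constructor records an operation instead of running it.
data Plan
  = Scan [Row] -- data source; a real engine would read from a file here
  | Select [String] Plan -- keep only the named columns
  | Filter String (Int -> Bool) Plan -- keep rows whose column satisfies a predicate
  | Limit Int Plan -- keep the first n rows

-- Describe the plan without evaluating any rows.
explain :: Plan -> [String]
explain (Scan _) = ["Scan"]
explain (Select cs p) = ("Select " ++ show cs) : indent (explain p)
explain (Filter c _ p) = ("Filter on " ++ c) : indent (explain p)
explain (Limit n p) = ("Limit " ++ show n) : indent (explain p)

indent :: [String] -> [String]
indent = map ("  " ++)

-- The plan is plain data, so it can be rewritten before execution:
-- two consecutive Selects collapse into one that keeps only their common columns.
optimize :: Plan -> Plan
optimize (Select cs (Select ds p)) = optimize (Select (filter (`elem` ds) cs) p)
optimize (Select cs p) = Select cs (optimize p)
optimize (Filter c f p) = Filter c f (optimize p)
optimize (Limit n p) = Limit n (optimize p)
optimize s@(Scan _) = s

-- Rows are only produced here, when the plan is finally executed.
execute :: Plan -> [Row]
execute (Scan rows) = rows
execute (Select cs p) = map (Map.filterWithKey (\k _ -> k `elem` cs)) (execute p)
execute (Filter c f p) = filter (maybe False f . Map.lookup c) (execute p)
execute (Limit n p) = take n (execute p)

main :: IO ()
main = do
  let rows :: [Row]
      rows = [Map.fromList [("id", i), ("qty", i * 2), ("price", 10)] | i <- [1 .. 5]]
      plan = Limit 2 (Filter "qty" (> 4) (Select ["id", "qty"] (Select ["id", "qty", "price"] (Scan rows))))
  mapM_ putStrLn (explain (optimize plan))
  mapM_ print (execute (optimize plan))
```

Printing the result of explain before calling execute shows the collapsed plan (a single Select under the Filter), which is the kind of rewrite an execution graph allows before any data is touched.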

Contributing

  • Please submit an issue first so we can discuss the change there.