dataframe: An intuitive, dynamically-typed DataFrame library.

[ data, gpl, library, program ]

An intuitive, dynamically-typed DataFrame library for exploratory data analysis.


Versions [RSS] 0.1.0.0, 0.1.0.1
Change log CHANGELOG.md
Dependencies array (>=0.5 && <0.6), attoparsec (>=0.12 && <=0.14.4), base (>=4.17.2.0 && <4.21), bytestring (>=0.11 && <=0.12.2.0), containers (>=0.6.7 && <0.8), directory (>=1.3.0.0 && <=1.3.9.0), hashable (>=1.2 && <=1.5.0.0), statistics (==0.16.3.0), text (>=2.0 && <=2.1.2), time (>=1.12 && <=1.14), vector (>=0.13 && <0.14), vector-algorithms (>=0.9 && <0.10) [details]
Tested with ghc ==9.8.3 || ==9.6.6 || ==9.4.8
License GPL-3.0-or-later
Copyright (c) 2024-2024 Michael Chavinda
Author Michael Chavinda
Maintainer mschavinda@gmail.com
Category Data
Bug tracker https://github.com/mchav/dataframe/issues
Source repo head: git clone https://github.com/mchav/dataframe
Uploaded by mchav at 2025-04-16T14:12:06Z
Distributions Stackage:0.1.0.1
Executables dataframe
Downloads 7 total (7 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Status Docs available [build log]
Last success reported on 2025-04-16 [all 1 reports]

Readme for dataframe-0.1.0.1


DataFrame

An intuitive, dynamically-typed DataFrame library.

A tool for exploratory data analysis.

Installing

  • Install Haskell (GHC + cabal) via ghcup, selecting all the default options.
  • To install dataframe, run cabal update && cabal install dataframe.
  • Open a Haskell REPL with dataframe loaded by running cabal repl --build-depends dataframe.
  • Follow along with any of the tutorials below.
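
Condensed into shell commands, the steps above look like this (assuming ghcup has already installed GHC and cabal):

```shell
# Refresh the Hackage package index and install the library
cabal update && cabal install dataframe

# Start a REPL with the dataframe package in scope
cabal repl --build-depends dataframe
```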

What is exploratory data analysis?

We provide a primer here and show how to do some common analyses.

Coming from other dataframe libraries

Familiar with another dataframe library? Get started:

Example usage

Code example

import qualified Data.DataFrame as D

import Data.DataFrame ((|>))

main :: IO ()
main = do
    -- Read the tab-separated Chipotle orders dataset
    df <- D.readTsv "./data/chipotle.tsv"
    -- For each item, compute the max, mean, and total quantity ordered,
    -- then sort by total quantity in descending order
    print $ df
      |> D.select ["item_name", "quantity"]
      |> D.groupBy ["item_name"]
      |> D.aggregate (zip (repeat "quantity") [D.Maximum, D.Mean, D.Sum])
      |> D.sortBy D.Descending ["Sum_quantity"]

Output:

----------------------------------------------------------------------------------------------------
index |               item_name               | Sum_quantity |   Mean_quantity    | Maximum_quantity
------|---------------------------------------|--------------|--------------------|-----------------
 Int  |                 Text                  |     Int      |       Double       |       Int       
------|---------------------------------------|--------------|--------------------|-----------------
0     | Chips and Fresh Tomato Salsa          | 130          | 1.1818181818181819 | 15              
1     | Izze                                  | 22           | 1.1                | 3               
2     | Nantucket Nectar                      | 31           | 1.1481481481481481 | 3               
3     | Chips and Tomatillo-Green Chili Salsa | 35           | 1.1290322580645162 | 3               
4     | Chicken Bowl                          | 761          | 1.0482093663911847 | 3               
5     | Side of Chips                         | 110          | 1.0891089108910892 | 8               
6     | Steak Burrito                         | 386          | 1.048913043478261  | 3               
7     | Steak Soft Tacos                      | 56           | 1.018181818181818  | 2               
8     | Chips and Guacamole                   | 506          | 1.0563674321503131 | 4               
9     | Chicken Crispy Tacos                  | 50           | 1.0638297872340425 | 2

A full example using many of the constructs in the API lives in the ./app folder.
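
A smaller sketch using only functions that appear in the example above; whether D.sortBy behaves the same on an ungrouped frame as it does after D.aggregate is an assumption here:

```haskell
import qualified Data.DataFrame as D
import Data.DataFrame ((|>))

main :: IO ()
main = do
  -- Load the same TSV dataset
  df <- D.readTsv "./data/chipotle.tsv"
  -- Keep two columns and sort the raw rows by quantity, largest first
  print $ df
    |> D.select ["item_name", "quantity"]
    |> D.sortBy D.Descending ["quantity"]
```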

Visual example

Screencast of usage in GHCI

Future work

  • Apache Arrow and Parquet compatibility
  • Integration with common data formats (currently only supports CSV)
  • Support windowed plotting (currently only supports ASCII plots)
  • Create a lazy API that builds an execution graph instead of running eagerly (will be used to compute on files larger than RAM)

Contributing

  • Please open an issue first so we can discuss the proposed change there.