dataframe-0.7.0.0: A fast, safe, and intuitive DataFrame library.
Safe Haskell: None
Language: Haskell2010

DataFrame.Lazy.IO.Binary

Description

Simple column-oriented binary spill format (DFBN).

Layout (all integers little-endian):

[magic:       4  bytes] DFBN
[num_columns: 4  bytes] Word32
  per column:
    [name_len:  2  bytes] Word16  (byte length of UTF-8 name)
    [name:     name_len bytes]
    [type_tag:  1  byte]  Word8
[num_rows:    8  bytes] Word64
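The header layout can be sketched with `Data.ByteString.Builder` from the bytestring package. `ColType`, `encodeHeader` and `headerBytes` are hypothetical names for this illustration, not part of the dataframe API. Tag values follow the type_tag numbering listed for the column data blocks. Note that `name_len` counts UTF-8 bytes, not characters.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- Hypothetical schema type for this sketch; constructor order matches type_tag 0..5.
data ColType = TInt | TDouble | TText | TMaybeInt | TMaybeDouble | TMaybeText
  deriving (Show, Eq, Enum, Bounded)

-- type_tag byte for a column type.
typeTag :: ColType -> Word8
typeTag = fromIntegral . fromEnum

-- UTF-8 encode a column name.
utf8 :: String -> BS.ByteString
utf8 = BL.toStrict . BB.toLazyByteString . BB.stringUtf8

-- magic, num_columns, per-column (name_len, name, type_tag), then num_rows.
encodeHeader :: [(String, ColType)] -> Int -> BB.Builder
encodeHeader schema numRows =
  BB.string7 "DFBN"
    <> BB.word32LE (fromIntegral (length schema))
    <> foldMap column schema
    <> BB.word64LE (fromIntegral numRows)
  where
    column (name, ty) =
      let bytes = utf8 name
       in BB.word16LE (fromIntegral (BS.length bytes)) -- byte length, not char count
            <> BB.byteString bytes
            <> BB.word8 (typeTag ty)

-- Materialise the header as a strict ByteString.
headerBytes :: [(String, ColType)] -> Int -> BS.ByteString
headerBytes schema n = BL.toStrict (BB.toLazyByteString (encodeHeader schema n))
```

For one `Int` column named `id` and three rows, this produces 21 bytes: 4 for the magic, 4 for the column count, 2 + 2 + 1 for the column entry, and 8 for the row count.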

per column data block (order matches schema):
  type_tag 0 (Int):            num_rows × Int64 LE
  type_tag 1 (Double):         num_rows × Double LE (IEEE 754)
  type_tag 2 (Text):           (num_rows+1) × Word32 offsets  ++  payload bytes (UTF-8)
  type_tag 3 (Maybe Int):      ceil(num_rows/8)-byte null bitmap  ++  num_rows × Int64 LE
  type_tag 4 (Maybe Double):   ceil(num_rows/8)-byte null bitmap  ++  num_rows × Double LE
  type_tag 5 (Maybe Text):     ceil(num_rows/8)-byte null bitmap
                                ++  (num_rows+1) × Word32 offsets  ++  payload bytes
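The non-null data blocks can be sketched in the same way. The function names here are hypothetical. The text encoding assumes that offsets are byte positions into the payload, starting at 0, so that row i occupies `payload[offsets[i] .. offsets[i+1])`. The layout implies this by its `num_rows + 1` offsets, but it does not state it outright.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)
import Data.Word (Word32)

-- type_tag 0: num_rows little-endian Int64 values, back to back.
intBlock :: [Int64] -> BB.Builder
intBlock = foldMap BB.int64LE

-- type_tag 1: num_rows IEEE 754 doubles, little-endian.
doubleBlock :: [Double] -> BB.Builder
doubleBlock = foldMap BB.doubleLE

-- type_tag 2: (num_rows + 1) Word32 offsets, then the concatenated UTF-8 payload.
-- Assumption: offsets start at 0 and are cumulative byte lengths.
textBlock :: [String] -> BB.Builder
textBlock rows = foldMap BB.word32LE offsets <> foldMap BB.byteString encoded
  where
    encoded = map (BL.toStrict . BB.toLazyByteString . BB.stringUtf8) rows
    -- scanl yields length rows + 1 entries: 0, len r0, len r0 + len r1, ...
    offsets = scanl (\acc bs -> acc + fromIntegral (BS.length bs)) (0 :: Word32) encoded

-- Run a Builder into a strict ByteString.
build :: BB.Builder -> BS.ByteString
build = BL.toStrict . BB.toLazyByteString
```

Storing offsets instead of per-row lengths lets a reader slice any row directly without scanning the rows before it.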

Null bitmap: bit (i mod 8) of byte (i div 8) is 1 when row i is non-null.
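The bitmap rule can be sketched as follows. It assumes LSB-first bit order, meaning bit 0 is the least significant bit of each byte, which the layout does not spell out. `maybeIntBlock` shows that a `Maybe Int` block still reserves one Int64 slot per row. Writing 0 into the slots of null rows is an assumption of this sketch. All function names are hypothetical.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL
import Data.Bits (setBit, testBit)
import Data.Int (Int64)
import Data.Maybe (fromMaybe, isJust)
import Data.Word (Word8)

-- Pack validity flags (True = non-null) into ceil(n/8) bytes.
-- Assumption: LSB-first, i.e. row i maps to bit (i mod 8) counted from the low end.
packBitmap :: [Bool] -> BS.ByteString
packBitmap = BS.pack . map toByte . chunksOf8
  where
    chunksOf8 [] = []
    chunksOf8 xs = let (a, rest) = splitAt 8 xs in a : chunksOf8 rest
    toByte chunk = foldl setIf (0 :: Word8) (zip [0 ..] chunk)
    setIf w (j, valid) = if valid then setBit w j else w

-- Row i is non-null when bit (i mod 8) of byte (i div 8) is set.
isNonNull :: BS.ByteString -> Int -> Bool
isNonNull bitmap i = testBit (BS.index bitmap (i `div` 8)) (i `mod` 8)

-- type_tag 3 (Maybe Int): bitmap, then one Int64 slot for every row.
-- Assumption: null rows get the placeholder value 0.
maybeIntBlock :: [Maybe Int64] -> BS.ByteString
maybeIntBlock xs =
  BL.toStrict . BB.toLazyByteString $
    BB.byteString (packBitmap (map isJust xs))
      <> foldMap (BB.int64LE . fromMaybe 0) xs
```

For example, `packBitmap [True, False, True]` sets bits 0 and 2 and yields the single byte 5.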

Documentation

spillToDisk :: FilePath -> DataFrame -> IO ()

Serialise a DataFrame to a DFBN binary file.

readSpilled :: FilePath -> IO DataFrame

Deserialise a DFBN binary file into a DataFrame.
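As an illustration of the reading side, the header alone can be parsed with `Data.Binary.Get` from the binary package. This is a sketch only. `Header`, `getHeader` and `decodeHeader` are hypothetical names, and the actual `readSpilled` may be implemented differently. Column names are kept as raw UTF-8 bytes and type tags as `Word8`.

```haskell
import Control.Monad (replicateM, unless)
import Data.Binary.Get
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word64, Word8)

-- Parsed header: (UTF-8 column name, type_tag) pairs and the row count.
data Header = Header
  { hColumns :: [(BS.ByteString, Word8)]
  , hNumRows :: Word64
  }
  deriving (Show, Eq)

-- magic, num_columns, per-column (name_len, name, type_tag), then num_rows.
getHeader :: Get Header
getHeader = do
  magic <- getByteString 4
  unless (magic == BS8.pack "DFBN") (fail "bad magic: not a DFBN file")
  nCols <- getWord32le
  cols <- replicateM (fromIntegral nCols) $ do
    nameLen <- getWord16le
    name <- getByteString (fromIntegral nameLen)
    tag <- getWord8
    pure (name, tag)
  Header cols <$> getWord64le

-- Truncated input or a wrong magic number yields Left instead of throwing.
decodeHeader :: BL.ByteString -> Either String Header
decodeHeader input = case runGetOrFail getHeader input of
  Left (_, _, err) -> Left err
  Right (_, _, hdr) -> Right hdr
```

Once the header is known, the reader can compute the size of every column data block from `num_rows` and each type_tag. Text columns are the exception: their payload length is given by the last offset.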

withSpilled :: DataFrame -> (FilePath -> IO a) -> IO a

Spill a DataFrame to a temporary file, run an action with the path, then delete the file even if the action throws.
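The "delete even if the action throws" guarantee is the standard `bracket` pattern. The sketch uses a hypothetical `withTempBytes` that spills an already-encoded `ByteString` instead of a `DataFrame`. It assumes nothing about how `withSpilled` is actually written.

```haskell
import Control.Exception
import Control.Monad (when)
import qualified Data.ByteString as BS
import System.Directory (doesFileExist, getTemporaryDirectory, removeFile)
import System.IO (hClose, openBinaryTempFile)

-- Hypothetical helper, not part of the dataframe API: write a payload to a
-- fresh temporary file, run the action with its path, then delete the file
-- whether the action returns normally or throws.
withTempBytes :: BS.ByteString -> (FilePath -> IO a) -> IO a
withTempBytes payload action = do
  dir <- getTemporaryDirectory
  bracket (openBinaryTempFile dir "spill.dfbn") cleanup $ \(path, h) -> do
    BS.hPut h payload
    hClose h -- close before handing over the path so the action can reopen it
    action path
  where
    -- hClose is a no-op on an already closed handle; skip removeFile if the
    -- action has already deleted or moved the file.
    cleanup (path, h) = do
      hClose h
      exists <- doesFileExist path
      when exists (removeFile path)
```

Acquiring the handle inside `bracket` means the file is also removed if writing the payload fails, not only if the user's action fails.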