dataframe-0.7.0.0: A fast, safe, and intuitive DataFrame library.
Safe HaskellNone
LanguageHaskell2010

DataFrame.Internal.Column

Synopsis

Documentation

data Column where Source #

Our representation of a column is a GADT that can store data based on the underlying data.

This allows us to pattern match on data kinds and limit some operations to only some kinds of vectors. E.g. operations for missing data only happen in an OptionalColumn.

Constructors

BoxedColumn :: forall a. Columnable a => Vector a -> Column 
UnboxedColumn :: forall a. (Columnable a, Unbox a) => Vector a -> Column 
OptionalColumn :: forall a. Columnable a => Vector (Maybe a) -> Column 

Instances

Instances details
Show Column Source # 
Instance details

Defined in DataFrame.Internal.Column

NFData Column Source # 
Instance details

Defined in DataFrame.Internal.Column

Methods

rnf :: Column -> () #

Eq Column Source # 
Instance details

Defined in DataFrame.Internal.Column

Methods

(==) :: Column -> Column -> Bool #

(/=) :: Column -> Column -> Bool #

Ord Column Source # 
Instance details

Defined in DataFrame.Internal.Column

data MutableColumn where Source #

Constructors

MBoxedColumn :: forall a. Columnable a => IOVector a -> MutableColumn 
MUnboxedColumn :: forall a. (Columnable a, Unbox a) => IOVector a -> MutableColumn 
MOptionalColumn :: forall a. Columnable a => IOVector (Maybe a) -> MutableColumn 

data TypedColumn a where Source #

A TypedColumn is a wrapper around our type-erased column. It is used to type check expressions on columns.

Constructors

TColumn :: forall a. Columnable a => Column -> TypedColumn a 

Instances

Instances details
Show a => Show (TypedColumn a) Source # 
Instance details

Defined in DataFrame.Internal.Column

Eq a => Eq (TypedColumn a) Source # 
Instance details

Defined in DataFrame.Internal.Column

Ord a => Ord (TypedColumn a) Source # 
Instance details

Defined in DataFrame.Internal.Column

unwrapTypedColumn :: TypedColumn a -> Column Source #

Gets the underlying value from a TypedColumn.

vectorFromTypedColumn :: TypedColumn a -> Vector a Source #

Gets the underlying vector from a TypedColumn.

hasMissing :: Column -> Bool Source #

Checks if a column contains missing values.

allMissing :: Column -> Bool Source #

Checks if a column contains only missing values.

isNumeric :: Column -> Bool Source #

Checks if a column contains numeric values.

hasElemType :: Columnable a => Column -> Bool Source #

Checks if a column is of a given type values.

columnVersionString :: Column -> String Source #

An internal/debugging function to get the column type of a column.

columnTypeString :: Column -> String Source #

An internal/debugging function to get the type stored in the outermost vector of a column.

generalLEQ :: (Typeable a, Typeable b, Ord a, Ord b) => a -> b -> Bool Source #

class ColumnifyRep (r :: Rep) a where Source #

A class for converting a vector to a column of the appropriate type. Given each Rep we tell the toColumnRep function which Column type to pick.

Methods

toColumnRep :: Vector a -> Column Source #

Instances

Instances details
Columnable a => ColumnifyRep 'RBoxed a Source # 
Instance details

Defined in DataFrame.Internal.Column

Methods

toColumnRep :: Vector a -> Column Source #

(Columnable a, Unbox a) => ColumnifyRep 'RUnboxed a Source # 
Instance details

Defined in DataFrame.Internal.Column

Methods

toColumnRep :: Vector a -> Column Source #

Columnable a => ColumnifyRep 'ROptional (Maybe a) Source # 
Instance details

Defined in DataFrame.Internal.Column

Methods

toColumnRep :: Vector (Maybe a) -> Column Source #

type Columnable a = (Columnable' a, ColumnifyRep (KindOf a) a, UnboxIf a, IntegralIf a, FloatingIf a, SBoolI (Unboxable a), SBoolI (Numeric a), SBoolI (IntegralTypes a), SBoolI (FloatingTypes a)) Source #

Constraint synonym for what we can put into columns.

fromVector :: (Columnable a, ColumnifyRep (KindOf a) a) => Vector a -> Column Source #

O(n) Convert a vector to a column. Automatically picks the best representation of a vector to store the underlying data in.

Examples:

> import qualified Data.Vector as V
> fromVector (VB.fromList [(1 :: Int), 2, 3, 4])
[1,2,3,4]

fromUnboxedVector :: (Columnable a, Unbox a) => Vector a -> Column Source #

O(n) Convert an unboxed vector to a column. This avoids the extra conversion if you already have the data in an unboxed vector.

Examples:

> import qualified Data.Vector.Unboxed as V
> fromUnboxedVector (VB.fromList [(1 :: Int), 2, 3, 4])
[1,2,3,4]

fromList :: (Columnable a, ColumnifyRep (KindOf a) a) => [a] -> Column Source #

O(n) Convert a list to a column. Automatically picks the best representation of a vector to store the underlying data in.

Examples:

> fromList [(1 :: Int), 2, 3, 4]
[1,2,3,4]

mapColumn :: (Columnable b, Columnable c) => (b -> c) -> Column -> Either DataFrameException Column Source #

An internal function to map a function over the values of a column.

imapColumn :: (Columnable b, Columnable c) => (Int -> b -> c) -> Column -> Either DataFrameException Column Source #

Applies a function that returns an unboxed result to an unboxed vector, storing the result in a column.

columnLength :: Column -> Int Source #

O(1) Gets the number of elements in the column.

numElements :: Column -> Int Source #

O(n) Gets the number of elements in the column.

takeColumn :: Int -> Column -> Column Source #

O(n) Takes the first n values of a column.

takeLastColumn :: Int -> Column -> Column Source #

O(n) Takes the last n values of a column.

sliceColumn :: Int -> Int -> Column -> Column Source #

O(n) Takes n values after a given column index.

atIndices :: Set Int -> Column -> Column Source #

O(n) Selects the elements at a given set of indices. May change the order.

atIndicesStable :: Vector Int -> Column -> Column Source #

O(n) Selects the elements at a given set of indices. Does not change the order.

gatherWithSentinel :: Vector Int -> Column -> Column Source #

Like atIndicesStable but treats negative indices as null, producing an OptionalColumn. Keeps the index vector fully unboxed (no VB.Vector (Maybe Int)).

getIndices :: Vector Int -> Vector a -> Vector a Source #

Internal helper to get indices in a boxed vector.

getIndicesUnboxed :: Unbox a => Vector Int -> Vector a -> Vector a Source #

Internal helper to get indices in an unboxed vector.

sortedIndexes :: Bool -> Column -> Vector Int Source #

An internal function that returns a vector of how indexes change after a column is sorted.

ifoldrColumn :: (Columnable a, Columnable b) => (Int -> a -> b -> b) -> b -> Column -> Either DataFrameException b Source #

Fold (right) column with index.

foldlColumn :: (Columnable a, Columnable b) => (b -> a -> b) -> b -> Column -> Either DataFrameException b Source #

foldDirectGroups :: Columnable b => (acc -> b -> acc) -> acc -> Column -> Vector Int -> Vector Int -> Either DataFrameException (Vector acc) Source #

O(n) Fold a column over groups without materialising a sorted copy. Instead of backpermuting the column (expensive: O(n) allocation + random reads), this iterates valueIndices sequentially and accesses the column at each index. Avoids one 10M-element allocation per column per aggregation expression.

foldl1DirectGroups :: Columnable a => (a -> a -> a) -> Column -> Vector Int -> Vector Int -> Either DataFrameException (Vector a) Source #

O(n) Seedless fold over groups using the first element of each group as seed. Like foldDirectGroups but for the case where no initial accumulator is available.

foldLinearGroups :: (Columnable b, Columnable acc) => (acc -> b -> acc) -> acc -> Column -> Vector Int -> Int -> Either DataFrameException Column Source #

O(n) fold over groups by scanning the column LINEARLY. rowToGroup[i] = group index for row i. Avoids random column reads; random writes go to the accumulator array which is small (nGroups entries) and typically cache-resident. When acc is unboxable, uses an unboxed mutable vector for the accumulator array, eliminating pointer indirection on every read/write.

zipColumns :: Column -> Column -> Column Source #

An internal, column version of zip.

mergeColumns :: Column -> Column -> Column Source #

Merge two columns using These.

zipWithColumns :: (Columnable a, Columnable b, Columnable c) => (a -> b -> c) -> Column -> Column -> Either DataFrameException Column Source #

An internal, column version of zipWith.

expandColumn :: Int -> Column -> Column Source #

Fills the end of a column, up to n, with Nothing. Does nothing if column has length greater than n.

leftExpandColumn :: Int -> Column -> Column Source #

Fills the beginning of a column, up to n, with Nothing. Does nothing if column has length greater than n.

concatColumns :: Column -> Column -> Either DataFrameException Column Source #

Concatenates two columns. Returns Nothing if the columns are of different types.

concatManyColumns :: [Column] -> Column Source #

Concatenates two columns.

Works similar to concatColumns, but unlike that function, it will also combine columns of different types by wrapping the values in an Either.

E.g. combining Column containing [1,2] with Column containing ["a","b"] will result in a Column containing [Left 1, Left 2, Right "a", Right "b"].

O(n) Concatenate a list of same-type columns in a single allocation. All columns must have the same constructor and element type (as they will within a single Parquet column). Calls error on mismatch.

newMutableColumn :: Int -> Column -> IO MutableColumn Source #

Allocate a mutable column of size n matching the constructor/type of the given column.

copyIntoMutableColumn :: MutableColumn -> Int -> Column -> IO () Source #

Copy a column chunk into a mutable column starting at offset off.

freezeMutableColumn :: MutableColumn -> IO Column Source #

Freeze a mutable column into an immutable column.

toList :: Columnable a => Column -> [a] Source #

O(n) Converts a column to a list. Throws an exception if the wrong type is specified.

Examples:

> column = fromList [(1 :: Int), 2, 3, 4]
> toList Int column
[1,2,3,4]
> toList Double column
exception: ...

toVector :: forall a v. (Vector v a, Columnable a) => Column -> Either DataFrameException (v a) Source #

Converts a column to a vector of a specific type.

This is a type-safe conversion that requires the column's element type to exactly match the requested type. You must specify the desired type via type applications.

Type Parameters

Expand
a
The element type to convert to
v
The vector type (e.g., Vector, Vector)

Examples

Expand
>>> toVector @Int @VU.Vector column
Right (unboxed vector of Ints)
>>> toVector @Text @VB.Vector column
Right (boxed vector of Text)

Returns

Expand

See also

Expand

For numeric conversions with automatic type coercion, see toDoubleVector, toFloatVector, and toIntVector.

toDoubleVector :: Column -> Either DataFrameException (Vector Double) Source #

Converts a column to an unboxed vector of Double values.

This function performs intelligent type coercion for numeric types:

  • If the column is already Double, returns it directly
  • If the column contains other floating-point types, converts via realToFrac
  • If the column contains integral types, converts via fromIntegral (beware of overflow if the type is Integer).

Optional column handling

Expand

For OptionalColumn types, Nothing values are converted to NaN (Not a Number). This allows optional numeric data to be represented in the resulting vector.

Returns

Expand

toFloatVector :: Column -> Either DataFrameException (Vector Float) Source #

Converts a column to an unboxed vector of Float values.

This function performs intelligent type coercion for numeric types:

  • If the column is already Float, returns it directly
  • If the column contains other floating-point types, converts via realToFrac
  • If the column contains integral types, converts via fromIntegral
  • If the column is boxed Integer, converts via fromIntegral (beware of overflow for 64-bit integers and Integer)

Optional column handling

Expand

For OptionalColumn types, Nothing values are converted to NaN (Not a Number). This allows optional numeric data to be represented in the resulting vector.

Returns

Expand

Precision warning

Expand

Converting from Double to Float may result in loss of precision.

toIntVector :: Column -> Either DataFrameException (Vector Int) Source #

Converts a column to an unboxed vector of Int values.

This function performs intelligent type coercion for numeric types:

  • If the column is already Int, returns it directly
  • If the column contains floating-point types, rounds via round and converts
  • If the column contains other integral types, converts via fromIntegral
  • If the column is boxed Integer, converts via fromIntegral

Returns

Expand

Note

Expand

Unlike toDoubleVector and toFloatVector, this function does NOT support OptionalColumn. Optional columns must be handled separately.

Rounding behavior

Expand

Floating-point values are rounded to the nearest integer using round. For example: 2.5 rounds to 2, 3.5 rounds to 4 (banker's rounding).