dataframe-0.7.0.0: A fast, safe, and intuitive DataFrame library.
Safe Haskell: None
Language: Haskell2010

DataFrame.Operations.Statistics

Documentation

frequencies :: Columnable a => Expr a -> DataFrame -> DataFrame

Shows a frequency table for a categorical feature.

Examples:

ghci> df <- D.readCsv "./data/housing.csv"

ghci> D.frequencies "ocean_proximity" df

---------------------------------------------------------------------
   Statistic    | <1H OCEAN | INLAND | ISLAND | NEAR BAY | NEAR OCEAN
----------------|-----------|--------|--------|----------|-----------
      Text      |    Any    |  Any   |  Any   |   Any    |    Any
----------------|-----------|--------|--------|----------|-----------
 Count          | 9136      | 6551   | 5      | 2290     | 2658
 Percentage (%) | 44.26%    | 31.74% | 0.02%  | 11.09%   | 12.88%
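
The Count and Percentage rows above can be reproduced by hand: each percentage is the category count divided by the total number of rows (here 9136 of 20640 is about 44.26%). The following is a minimal sketch of that calculation on a plain list, using only `Data.List`. The name `frequencyTable` is hypothetical and is not part of the dataframe API; it only illustrates the arithmetic, not how the library implements `frequencies`.

```haskell
import Data.List (group, sort)

-- | Hypothetical helper (not part of dataframe): for each distinct value,
-- return the value, its count, and its share of all rows as a percentage.
frequencyTable :: Ord a => [a] -> [(a, Int, Double)]
frequencyTable xs =
  [ (v, n, 100 * fromIntegral n / total)
  | g@(v : _) <- group (sort xs)   -- one group per distinct value
  , let n = length g
  ]
  where
    total = fromIntegral (length xs) :: Double
```

For example, `frequencyTable "aabc"` yields one entry per distinct character, with `'a'` accounting for 2 of 4 elements.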

mean :: (Columnable a, Real a, Unbox a) => Expr a -> DataFrame -> Double

Calculates the mean of a given column as a standalone value.

median :: (Columnable a, Real a, Unbox a) => Expr a -> DataFrame -> Double

Calculates the median of a given column as a standalone value.

medianMaybe :: (Columnable a, Real a) => Expr (Maybe a) -> DataFrame -> Double

Calculates the median of a given column (containing optional values) as a standalone value.

percentile :: (Columnable a, Real a, Unbox a) => Int -> Expr a -> DataFrame -> Double

Calculates the nth percentile of a given column as a standalone value.

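genericPercentile :: (Columnable a, Ord a) => Int -> Expr a -> DataFrame -> a

Calculates the nth percentile of a given column as a standalone value. Unlike percentile, it works for any orderable element type and returns a value of that type rather than a Double.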
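
The source does not state which percentile definition the library uses. As an illustration only, here is a sketch of the nearest-rank method on a plain list. It always returns an element of the input, so it needs only an `Ord` constraint, which matches the shape of the `genericPercentile` signature. The name `nearestRank` is hypothetical, and the library's results may differ if it interpolates between elements.

```haskell
import Data.List (sort)

-- | Hypothetical helper (not part of dataframe): nearest-rank percentile.
-- Returns the smallest element such that at least p percent of the data
-- is less than or equal to it. Nothing for empty input or p outside 0..100.
nearestRank :: Ord a => Int -> [a] -> Maybe a
nearestRank p xs
  | null xs || p < 0 || p > 100 = Nothing
  | otherwise = Just (sorted !! (rank - 1))
  where
    sorted = sort xs
    n = length xs
    -- Integer form of ceiling (p / 100 * n), clamped so that p = 0 maps
    -- to the first element.
    rank = max 1 ((p * n + 99) `div` 100)
```

With this definition, `nearestRank 50 [5, 1, 4, 2, 3]` selects the middle element of the sorted list.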

standardDeviation :: (Columnable a, Real a, Unbox a) => Expr a -> DataFrame -> Double

Calculates the standard deviation of a given column as a standalone value.

skewness :: (Columnable a, Real a, Unbox a) => Expr a -> DataFrame -> Double

Calculates the skewness of a given column as a standalone value.

variance :: (Columnable a, Real a, Unbox a) => Expr a -> DataFrame -> Double

Calculates the variance of a given column as a standalone value.
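
To make the relationship between mean, variance, and standard deviation concrete, here is a small sketch on a plain list of Doubles. All three names are hypothetical and not part of the dataframe API. The sketch assumes the sample variance (dividing by n - 1); the source does not say whether the library divides by n or n - 1, so its results may differ by that factor.

```haskell
-- | Hypothetical helper: arithmetic mean of a non-empty list.
meanOf :: [Double] -> Double
meanOf xs = sum xs / fromIntegral (length xs)

-- | Hypothetical helper: sample variance, i.e. the sum of squared
-- deviations from the mean divided by (n - 1). Assumes at least two values.
sampleVariance :: [Double] -> Double
sampleVariance xs =
  sum [d * d | x <- xs, let d = x - m] / fromIntegral (length xs - 1)
  where
    m = meanOf xs

-- | Hypothetical helper: standard deviation as the square root of the variance.
sampleStdDev :: [Double] -> Double
sampleStdDev = sqrt . sampleVariance
```

For `[1, 2, 3, 4, 5]` the mean is 3, the squared deviations sum to 10, and the sample variance is 10 / 4 = 2.5.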

interQuartileRange :: (Columnable a, Real a, Unbox a) => Expr a -> DataFrame -> Double

Calculates the inter-quartile range of a given column as a standalone value.

correlation :: Text -> Text -> DataFrame -> Maybe Double

Calculates Pearson's correlation coefficient between two given columns as a standalone value.
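
For reference, Pearson's coefficient is the sum of the products of the paired deviations from the means, divided by the square root of the product of the two sums of squared deviations. A minimal sketch on plain lists follows. The name `pearson` is hypothetical, and the conditions under which this sketch returns `Nothing` are an assumption of the sketch; the source does not document when `correlation` itself returns `Nothing`.

```haskell
-- | Hypothetical helper (not part of dataframe): Pearson's r for two
-- equally long lists. Returns Nothing when the lengths differ, when there
-- are fewer than two points, or when either list has zero variance.
pearson :: [Double] -> [Double] -> Maybe Double
pearson xs ys
  | n < 2 || length ys /= n = Nothing
  | denom == 0 = Nothing
  | otherwise = Just (cov / denom)
  where
    n = length xs
    mx = sum xs / fromIntegral n
    my = sum ys / fromIntegral n
    dxs = map (subtract mx) xs          -- deviations from the mean of xs
    dys = map (subtract my) ys          -- deviations from the mean of ys
    cov = sum (zipWith (*) dxs dys)
    denom = sqrt (sum (map (\d -> d * d) dxs) * sum (map (\d -> d * d) dys))
```

Perfectly linear inputs such as `[1, 2, 3]` and `[2, 4, 6]` give a coefficient of 1, and reversing one of them gives -1.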

sum :: (Columnable a, Num a) => Expr a -> DataFrame -> a

Calculates the sum of a given column as a standalone value.

imputeWith :: Columnable b => (Expr b -> Expr b) -> Expr (Maybe b) -> DataFrame -> DataFrame

O(n) Impute missing values in a column using a derived scalar.

Given

  • an expression f :: Expr b -> Expr b that, when interpreted over a non-nullable column, produces the same value in every row (for example a mean, median, or other aggregate), and
  • a nullable column Expr (Maybe b)

this function:

  1. Drops all Nothing values from the target column.
  2. Interprets f on the remaining non-null values.
  3. Checks that the resulting column contains a single repeated value.
  4. Uses that value to impute all Nothings in the original column.
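
The four steps above can be sketched on plain lists, with a list standing in for a column and a list function standing in for the expression f. This is an analogue for illustration only: `imputeList` and `broadcastMean` are hypothetical names, and the sketch reports failure with `Either` where the library function throws.

```haskell
import Data.Maybe (catMaybes, fromMaybe)

-- | Hypothetical list-based analogue of imputeWith (not the library code).
imputeList :: Eq b => ([b] -> [b]) -> [Maybe b] -> Either String [b]
imputeList f col =
  case f (catMaybes col) of                  -- steps 1 and 2: drop Nothings, apply f
    (v : rest)
      | all (== v) rest ->                   -- step 3: a single repeated value
          Right (map (fromMaybe v) col)      -- step 4: fill every Nothing with it
    _ -> Left "expression did not produce a single repeated value"

-- | Hypothetical aggregate: replace every row with the mean of the column.
broadcastMean :: [Double] -> [Double]
broadcastMean xs = map (const m) xs
  where
    m = sum xs / fromIntegral (length xs)
```

Here `imputeList broadcastMean [Just 10, Nothing, Just 20]` fills the gap with 15, while passing `id` as the function fails the step 3 check because the remaining values differ.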

Throws

Example

>>> :set -XOverloadedStrings
>>> import qualified DataFrame as D
>>> import qualified DataFrame.Functions as F
>>> let df =
...       D.fromNamedColumns
...         [ ("age", D.fromList [Just 10, Nothing, Just 20 :: Maybe Int]) ]
>>>
>>> -- Impute missing ages with the mean of the observed ages
>>> D.imputeWith F.mean "age" df
-- age
-- ----
-- 10
-- 15
-- 20

summarize :: DataFrame -> DataFrame

Computes descriptive statistics for the numeric columns.

roundTo :: Int -> Double -> Double

Rounds a Double to the specified precision.
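
One common way to implement this kind of rounding is to scale by a power of ten, round to an integer, and scale back. The sketch below assumes that the Int argument is the number of decimal places, which the source does not state explicitly. The name `roundToSketch` is hypothetical, and the library's `roundTo` may handle ties differently.

```haskell
-- | Hypothetical sketch (not the library code): round x to n decimal places.
-- Note that Haskell's 'round' sends exact halves to the nearest even integer.
roundToSketch :: Int -> Double -> Double
roundToSketch n x = fromIntegral (round (x * factor) :: Integer) / factor
  where
    factor = 10 ^^ n   -- (^^) also accepts negative n, e.g. rounding to tens
```

For instance, `roundToSketch 2 3.14159` keeps two decimal places, and `roundToSketch 0 2.5` gives 2.0 rather than 3.0 because of the round-half-even rule.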