hanalyze: A general-purpose statistical analysis, optimization and visualization toolkit

[ bsd3, library, machine-learning, math, numeric, program, statistics ]

hanalyze is a self-contained Haskell toolkit for classical regression (LM, GLM, GLMM, splines, kernels, GP, RFF), Bayesian modeling (HBM DSL with MH, HMC, NUTS, Gibbs, ADVI), design of experiments (full/fractional factorial, RSM, D-optimal, orthogonal arrays, Taguchi), optimization (Nelder-Mead, L-BFGS, DE, CMA-ES, NSGA-II, Bayesian optimization, augmented Lagrangian), and Vega-Lite-based visualization with HTML/PNG/SVG output.

All algorithms are implemented natively in Haskell, with no R/Stan/Python bridges. Data interchange uses the dataframe package as a first-class citizen.

A unified hanalyze command-line interface exposes the most common workflows (regress, info, hist, doe, taguchi, ridge, kernel, spline, multireg, clean, melt, regrid, ...).


Versions [RSS] 0.1.0.0, 0.1.0.1
Change log CHANGELOG.md
Dependencies ad (>=4.4 && <4.6), aeson (>=2.0 && <2.3), async (>=2.2 && <2.3), base (>=4.14 && <5), bytestring (>=0.11 && <0.13), cassava (>=0.5 && <0.6), containers (>=0.6 && <0.8), dataframe (>=0.4 && <2), deepseq (>=1.4 && <1.6), directory (>=1.3 && <1.4), filepath (>=1.4 && <1.6), hanalyze, hmatrix (>=0.20 && <0.21), hvega (>=0.12 && <0.13), massiv (>=1.0 && <1.1), mwc-random (>=0.15 && <0.16), parallel (>=3.2 && <3.3), process (>=1.6 && <1.8), random (>=1.2 && <1.4), statistics (>=0.16 && <0.17), tasty (>=1.4 && <1.6), tasty-bench (>=0.3 && <0.5), temporary (>=1.3 && <1.4), text (>=1.2 && <2.2), time (>=1.11 && <1.13), unordered-containers (>=0.2 && <0.3), vector (>=0.13 && <0.14), vector-algorithms (>=0.9 && <0.10) [details]
Tested with ghc ==9.6.7
License BSD-3-Clause
Copyright 2026 Toshiaki Honda
Author Toshiaki Honda
Maintainer frenzieddoll@gmail.com
Uploaded by frenzieddoll at 2026-05-19T10:44:03Z
Category Math, Statistics, Numeric, Machine Learning
Home page https://github.com/frenzieddoll/hanalyze
Bug tracker https://github.com/frenzieddoll/hanalyze/issues
Source repo head: git clone https://github.com/frenzieddoll/hanalyze.git
Distributions
Executables bench-tasty, bench-profile, bench-regression, bench-ts-extras, bench-optim-plus, bench-stat-util, bench-multi-output, bench-regrid, bench-mcmc-extras, bench-mcmc-b7, bench-mcmc-diag, bench-survts, bench-ml, bench-kernel, bench-mem-mcmc, bench-mem-bo, bench-mem-nsga2, bench-mem-aggregate, bench-mem-vi, bench-rff-oom, bench-optim, bench-bootstrap-isolate, bench-beta-isolate, bench-mo, bench-massiv, bench-bo, bench-data-gen, hbm-regression, simpson-paradox, hbm-random-slope, new-distrib-demo, discrete-obs-demo, ppc-demo, forest-compare, single-opt-bench-demo, potential-multikr-demo, potential-multiout-demo, potential-demo, mixture-demo, trunc-censor-demo, cdf-test, mvnormal-demo, energy-demo, pymc-status-demo, summary-demo, deterministic-demo, noncentered-demo, dirichlet-demo, setdata-demo, mvnormal-latent-demo, negbinom-demo, multinomial-demo, zeroinflated-demo, lkj-demo, lkj3d-demo, ar1-demo, slice-demo, integrated-demo, spline-demo, kernel-demo, doe-demo, rsm-demo, nsga-smoke, nsga-demo, multilm-demo, multivariate-demo, multirsm-demo, bayesopt-demo, materials-moo-demo, pareto-smoke, optimaldoe-demo, regularized-demo, newdistribs-demo, gibbs-hbm-demo, rff-demo, robust-gp-demo, new-sections-demo, analysis-compare-demo, external-io-demo, dirty-data-demo, preprocess-demo, gp-demo, clinical-trial, bar-demo, regrid-bench-demo, potential-gen, gibbs-demo, vi-demo, bench-mcmc, test-hmc-nuts, hbm-example, glmm-demo, hanalyze
Downloads 2 total (2 in the last 30 days)
Status Docs uploaded by user
Build status unknown [no reports yet]

Readme for hanalyze-0.1.0.0


hanalyze

🌐 English | ζ—₯本θͺž


hanalyze is a Haskell-native statistical engineering toolkit: regression, GLMM, Bayesian inference (HMC/NUTS/Gibbs/ADVI), Gaussian processes, design of experiments, multi-objective optimisation, and HTML reporting integrated under one API. Core modelling and optimisation logic is implemented in Haskell, with numerical linear algebra delegated to hmatrix/BLAS/LAPACK. No R/Stan/Python bridge required. Benchmarks (see below) show competitive accuracy with Python/R references in the tested cases. Performance varies by domain: optimisation and small-to-medium MCMC workloads are often faster in these benchmarks, while large-scale ML/GLM workloads are currently slower than sklearn.


Highlights

  • Haskell-native: types catch many dtype/API mismatches; shape checks happen at runtime where needed
  • Algorithms in Haskell, BLAS for numerics: hmatrix/BLAS/LAPACK powers linear algebra; no R/Stan/Python bridge
  • HTML reporting: MathJax/Mermaid + Vega-Lite visualisations in one call; PNG/SVG export available for supported plots
  • Dirty-data defence: 8 warning codes + auto-sniff (delim/header/encoding) + cleaning DSL
  • Hackage dataframe: Polars-like DataFrame used directly; CSV native, Parquet/JSON support through dataframe

Capabilities

Features grouped by category. Each capability links to a usage doc and (where relevant) a theory doc.

Statistical inference (Hanalyze.Stat.*)

| Feature | Module | Usage | Theory |
| --- | --- | --- | --- |
| 12 hypothesis tests (t/χ²/ANOVA/Wilcoxon/KS/Shapiro/Levene/Bartlett/...) | Hanalyze.Stat.Test | stat/01-test.md | - |
| Multiple-testing correction (Bonferroni/Holm/BH/BY) | Hanalyze.Stat.MultipleTesting | stat/06-multipletesting.md | - |
| Bootstrap CI / permutation tests | Hanalyze.Stat.Bootstrap | stat/07-bootstrap.md | - |
| Effect size + power analysis (Cohen's d/η²/Cramér's V/n estimation) | Hanalyze.Stat.Effect | stat/09-effect.md | - |
| Cross-validation (k-fold/stratified/LOO) + Grid search | Hanalyze.Stat.CV | stat/04-cv.md | - |
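
To make the multiple-testing row concrete, here is a minimal sketch of Bonferroni and Benjamini-Hochberg adjustment written in plain Haskell. It does not call Hanalyze.Stat.MultipleTesting (whose signatures are not shown in this README); the names bonferroni and benjaminiHochberg are hypothetical.

```haskell
import Data.List (sortOn)

-- | Bonferroni: multiply each p-value by the number of tests, capped at 1.
bonferroni :: [Double] -> [Double]
bonferroni ps = map (\p -> min 1 (p * m)) ps
  where
    m = fromIntegral (length ps)

-- | Benjamini-Hochberg adjusted p-values, returned in the input order.
benjaminiHochberg :: [Double] -> [Double]
benjaminiHochberg ps = map snd (sortOn fst (zip idxs adjusted))
  where
    m = fromIntegral (length ps) :: Double
    -- Pair each p-value with its original position, then sort ascending.
    (idxs, sorted) = unzip (sortOn snd (zip [0 :: Int ..] ps))
    -- Raw BH value for the p-value of rank r is p * m / r.
    raw = zipWith (\rank p -> p * m / rank) [1 ..] sorted
    -- Enforce monotonicity by taking running minima from the largest rank down.
    adjusted = scanr1 min (map (min 1) raw)

main :: IO ()
main = do
  let ps = [0.01, 0.04, 0.03, 0.20]
  print (bonferroni ps)
  print (benjaminiHochberg ps)
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate, so its adjusted values are never larger than the Bonferroni ones.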

Regression (Hanalyze.Model.*)

| Feature | Module | Usage | Theory |
| --- | --- | --- | --- |
| Linear regression (LM) + inference stats (SE/t/p, F, AIC/BIC, leverage, Cook's) | Hanalyze.Model.LM / Hanalyze.Model.LM.Diagnostics | regression/01-lm.md | principles/lm.md |
| GLM (Binomial / Poisson / Gaussian) | Hanalyze.Model.GLM | regression/02-glm.md | principles/glm.md |
| GLMM / mixed-effects model (LME) | Hanalyze.Model.GLMM | regression/03-glmm.md | principles/glmm.md |
| Spline regression (B-spline / NaturalCubic) | Hanalyze.Model.Spline | regression/04-spline.md | regression/theory-regression-extensions.md |
| Kernel regression (NW / Kernel Ridge) + multi-D inputs | Hanalyze.Model.Kernel | regression/04-kernel.md | regression/theory-regression-extensions.md |
| Regularised (Ridge / Lasso / ElasticNet) | Hanalyze.Model.Regularized | regression/04-regularized.md | regression/theory-regression-extensions.md |
| Gaussian process (RBF / Matérn / Periodic + ARD + multi-input) | Hanalyze.Model.GP | regression/04-gp.md | principles/gp.md |
| Random Fourier Features (large-scale GP approximation) | Hanalyze.Model.RFF | regression/04-rff.md | regression/theory-regression-extensions.md |
| Multivariate regression / Multi-output GP | Hanalyze.Model.{Multivariate,MultiGP,MultiOutput} | regression/05-multivariate.md | regression/theory-multivariate.md |
| Quantile regression | Hanalyze.Model.Quantile | regression/06-quantile.md | regression/theory-regression-extensions.md |
| Generalized additive model (GAM) | Hanalyze.Model.GAM | regression/06-gam.md | regression/theory-regression-extensions.md |
| Random forest (regression) | Hanalyze.Model.RandomForest | regression/06-randomforest.md | regression/theory-regression-extensions.md |
| Multi-output regression + interactive HTML | Hanalyze.Model.MultiOutput | regression/07-multireg.md | regression/theory-multivariate.md |
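
As a reminder of what the LM row computes in the one-predictor case, the following sketch fits y = b0 + b1·x by ordinary least squares in closed form and reports R². It is plain Haskell, independent of Hanalyze.Model.LM; the function name fitLine is hypothetical.

```haskell
-- | Fit y = b0 + b1 * x by ordinary least squares.
-- Returns (intercept, slope, R^2).
fitLine :: [Double] -> [Double] -> (Double, Double, Double)
fitLine xs ys = (b0, b1, r2)
  where
    n   = fromIntegral (length xs)
    mx  = sum xs / n
    my  = sum ys / n
    -- Centred sums of squares and cross-products.
    sxx = sum [(x - mx) ^ (2 :: Int) | x <- xs]
    sxy = sum [(x - mx) * (y - my) | (x, y) <- zip xs ys]
    b1  = sxy / sxx
    b0  = my - b1 * mx
    -- R^2 = 1 - residual sum of squares / total sum of squares.
    sse = sum [(y - (b0 + b1 * x)) ^ (2 :: Int) | (x, y) <- zip xs ys]
    sst = sum [(y - my) ^ (2 :: Int) | y <- ys]
    r2  = 1 - sse / sst

main :: IO ()
main = print (fitLine [1, 2, 3, 4] [3.1, 4.9, 7.2, 8.8])
```

With several predictors the same idea becomes a linear-algebra problem, which is the part this README says is delegated to hmatrix/BLAS/LAPACK.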

Machine learning (Hanalyze.Model.* / Hanalyze.Stat.*)

| Feature | Module | Usage | Theory |
| --- | --- | --- | --- |
| PCA + cumulative variance + standardisation | Hanalyze.Model.PCA | stat/02-pca.md | - |
| Clustering (K-means + k-means++ + silhouette) | Hanalyze.Model.Cluster | stat/05-cluster.md | - |
| Decision tree (CART classifier) | Hanalyze.Model.DecisionTree | regression/08-decisiontree.md | - |
| Time series (ARIMA / Holt-Winters / STL / ACF / PACF) | Hanalyze.Model.TimeSeries | regression/09-timeseries.md | - |
| Survival analysis (Kaplan-Meier / Nelson-Aalen / Log-rank / Cox PH) | Hanalyze.Model.Survival | regression/10-survival.md | - |
| Classification metrics (Confusion / AUC / F1 / MCC / log-loss / Brier) | Hanalyze.Stat.ClassMetrics | stat/03-classmetrics.md | - |
| Model interpretation (Permutation imp / PDP / ICE) | Hanalyze.Stat.Interpret | stat/13-interpret.md | - |
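
For readers unfamiliar with the classification metrics listed above, this sketch derives F1 and the Matthews correlation coefficient (MCC) from a binary confusion matrix. It is a standalone illustration; the Confusion type and function names are hypothetical and are not the Hanalyze.Stat.ClassMetrics API.

```haskell
-- | Counts of true positives, false positives, false negatives, true negatives.
data Confusion = Confusion { tp, fp, fn, tn :: Int } deriving (Show, Eq)

-- | Tally a confusion matrix from ground truth and predictions.
confusion :: [Bool] -> [Bool] -> Confusion
confusion truth predicted = foldl step (Confusion 0 0 0 0) (zip truth predicted)
  where
    step c (True,  True)  = c { tp = tp c + 1 }
    step c (False, True)  = c { fp = fp c + 1 }
    step c (True,  False) = c { fn = fn c + 1 }
    step c (False, False) = c { tn = tn c + 1 }

-- | F1 = 2 TP / (2 TP + FP + FN).
f1 :: Confusion -> Double
f1 c = 2 * t / (2 * t + fromIntegral (fp c) + fromIntegral (fn c))
  where
    t = fromIntegral (tp c)

-- | MCC = (TP*TN - FP*FN) / sqrt ((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
mcc :: Confusion -> Double
mcc c = (a * d - b * e) / sqrt ((a + b) * (a + e) * (d + b) * (d + e))
  where
    a = fromIntegral (tp c)
    b = fromIntegral (fp c)
    e = fromIntegral (fn c)
    d = fromIntegral (tn c)

main :: IO ()
main = do
  let truth     = [True, True, True, False, False, False, True, False]
      predicted = [True, True, False, False, False, True, True, False]
      c         = confusion truth predicted
  print c
  print (f1 c, mcc c)
```

MCC uses all four cells of the matrix, so it stays informative on imbalanced classes where F1 alone can look optimistic.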

Bayesian (Hanalyze.MCMC.* / Hanalyze.Stat.* / Hanalyze.Model.HBM)

| Feature | Module | Usage | Theory |
| --- | --- | --- | --- |
| 27 probability distributions (Truncated/Censored/MvNormal/LKJ/Multinomial/...) | Hanalyze.Stat.Distribution | bayesian/01-distributions.md | bayesian/theory-distributions.md |
| Probabilistic model DSL (HBM polymorphic free monad, incl. deterministic / dataNamed) | Hanalyze.Model.HBM | bayesian/02-probabilistic-model.md | principles/hbm.md |
| MCMC samplers (MH / HMC / NUTS / Slice) | Hanalyze.MCMC.{MH,HMC,NUTS,Slice} | bayesian/03-mcmc-samplers.md | bayesian/theory-mcmc.md / theory-hmc-nuts.md |
| Gibbs sampling (auto-conjugate detection + hybrid) | Hanalyze.MCMC.Gibbs | bayesian/04-gibbs.md | bayesian/theory-mcmc.md |
| Variational inference (ADVI mean-field Adam) | Hanalyze.Stat.VI | bayesian/05-vi.md | bayesian/theory-advanced.md |
| Model comparison (WAIC / PSIS-LOO / Pseudo-BMA) | Hanalyze.Stat.ModelSelect | bayesian/06-model-comparison.md | bayesian/theory-bayesian-basics.md |
| Posterior predictive checks; selected PyMC-style modelling features | Hanalyze.Stat.PosteriorPredictive | 02-pymc-comparison.md | - |
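
The Gibbs row mentions conjugate detection. How hanalyze detects conjugacy is not described in this README, but the payoff of a conjugate pair is easy to show: the conditional posterior has a closed form, so a Gibbs step can sample it directly. The sketch below shows the Beta-Binomial case in plain Haskell; the Beta type and the function names are hypothetical.

```haskell
-- | Parameters of a Beta(alpha, beta) distribution.
data Beta = Beta { alpha :: Double, beta :: Double } deriving (Show, Eq)

-- | Conjugate update: a Beta(a, b) prior combined with k successes in
-- n Bernoulli trials gives a Beta(a + k, b + n - k) posterior.
posterior :: Beta -> Int -> Int -> Beta
posterior (Beta a b) successes trials = Beta (a + k) (b + n - k)
  where
    k = fromIntegral successes
    n = fromIntegral trials

-- | Mean of a Beta distribution, a / (a + b).
betaMean :: Beta -> Double
betaMean (Beta a b) = a / (a + b)

main :: IO ()
main = do
  let post = posterior (Beta 1 1) 7 10   -- uniform prior, 7 successes in 10 trials
  print post
  print (betaMean post)
```

When no such closed form exists for a parameter, a sampler has to fall back to a generic update for it, which is presumably what "hybrid" refers to in the table.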

Optimisation (Hanalyze.Optim.*)

| Feature | Module | Usage | Theory |
| --- | --- | --- | --- |
| Single-obj (local, gradient-based or derivative-free): NM / L-BFGS / Brent | Hanalyze.Optim.NelderMead, Hanalyze.Optim.LBFGS, Hanalyze.Optim.LineSearch | optim/01-singleobj.md | optim/theory-singleobj.md |
| Single-obj (evolutionary): DE / CMA-ES / SA / PSO | Hanalyze.Optim.DifferentialEvolution, Hanalyze.Optim.CMAES, Hanalyze.Optim.SimulatedAnnealing, Hanalyze.Optim.ParticleSwarm | optim/01-singleobj.md | optim/theory-singleobj.md |
| Multi-objective (NSGA-II + Pareto) | Hanalyze.Optim.{NSGA,Pareto} | optim/02-multi-objective.md | optim/theory-pareto-moo.md |
| Acquisition functions (EHVI / ParEGO / EI / LCB / PI) | Hanalyze.Optim.Acquisition | optim/02-multi-objective.md | optim/theory-bayesopt.md |
| Bayesian optimisation (BO + GP-Hedge + analytic gradient) | Hanalyze.Optim.BayesOpt | optim/01-singleobj.md | optim/theory-bayesopt.md |
| Algorithm selection guide | - | optim/03-algorithm-guide.md | - |
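
The multi-objective rows rest on the notion of Pareto dominance. As a minimal sketch (plain Haskell, assuming every objective is minimised; dominates and paretoFront are hypothetical names, not the Hanalyze.Optim.Pareto API), the non-dominated set can be filtered like this:

```haskell
-- | a dominates b when a is no worse in every objective and strictly
-- better in at least one (all objectives minimised).
dominates :: [Double] -> [Double] -> Bool
dominates a b = and (zipWith (<=) a b) && or (zipWith (<) a b)

-- | Keep the points that no other point dominates.
-- This is the simple O(n^2) filter, not NSGA-II's fast non-dominated sort.
paretoFront :: [[Double]] -> [[Double]]
paretoFront pts = [p | p <- pts, not (any (`dominates` p) pts)]

main :: IO ()
main = print (paretoFront [[1, 5], [2, 3], [3, 4], [4, 1], [5, 5]])
```

Here [3, 4] is dropped because [2, 3] beats it on both objectives, and [5, 5] is dropped because [1, 5] ties on the second objective and wins on the first.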

Design of experiments (Hanalyze.Design.*)

| Feature | Module | Usage | Theory |
| --- | --- | --- | --- |
| DoE (Factorial / Block / Mixed / RSM / Optimal / Power / Quality) | Hanalyze.Design.{Factorial,Block,Mixed,RSM,Optimal,Power,Quality,MultiRSM,Anova} | doe/01-doe.md | doe/theory-doe.md |
| Orthogonal arrays (L4/L8/L9/L12/L16/L18) + Taguchi (S/N + inner/outer) + process capability (Cp/Cpk) | Hanalyze.Design.{Orthogonal,Taguchi,Quality} | doe/02-orthogonal-taguchi.md | doe/theory-doe.md |
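
The quantities named in these rows have short textbook definitions. The sketch below, in plain Haskell with hypothetical function names (it does not use Hanalyze.Design), enumerates a full factorial design and computes the Taguchi smaller-the-better and larger-the-better S/N ratios and the Cp/Cpk capability indices.

```haskell
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- | Sample standard deviation (n - 1 denominator).
stdDev :: [Double] -> Double
stdDev xs = sqrt (sum [(x - m) ^ (2 :: Int) | x <- xs] / fromIntegral (length xs - 1))
  where
    m = mean xs

-- | Full factorial design: every combination of the given factor levels.
fullFactorial :: [[a]] -> [[a]]
fullFactorial = sequence

-- | Taguchi signal-to-noise ratios in dB.
snSmallerBetter, snLargerBetter :: [Double] -> Double
snSmallerBetter ys = -10 * logBase 10 (mean [y * y | y <- ys])
snLargerBetter  ys = -10 * logBase 10 (mean [1 / (y * y) | y <- ys])

-- | Process capability for lower and upper specification limits.
cp, cpk :: Double -> Double -> [Double] -> Double
cp  lsl usl xs = (usl - lsl) / (6 * stdDev xs)
cpk lsl usl xs = min (usl - m) (m - lsl) / (3 * stdDev xs)
  where
    m = mean xs

main :: IO ()
main = do
  print (fullFactorial [[-1, 1], [-1, 1], [-1, 1 :: Int]])  -- 2^3 = 8 runs
  print (snLargerBetter [10, 10])
  print (cp 7 16 [9, 10, 11], cpk 7 16 [9, 10, 11])
```

Cp only compares the specification width with the process spread, while Cpk also penalises an off-centre mean, so Cpk is never larger than Cp.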

Visualisation (Hanalyze.Viz.*)

| Feature | Module | Usage |
| --- | --- | --- |
| Scatter / bar / histograms / MCMC diagnostics / GP plot / Pareto plot | Hanalyze.Viz.{Scatter,Bar,Histogram,MCMC,GP,Pareto,ModelGraph,Taguchi} | visualization/01-visualization.md |
| Integrated HTML report (MathJax + Mermaid + interactive) | Hanalyze.Viz.ReportBuilder | visualization/02-report-builder.md |

Data I/O (Hanalyze.DataIO.*)

| Feature | Module | Usage |
| --- | --- | --- |
| CSV/TSV/SSV (cassava) + Parquet/JSON (Hackage dataframe) | Hanalyze.DataIO.{CSV,External,Convert} | io/01-dirty-data.md |
| Dirty-data defence (W001-W008 warnings + auto-sniff + clean DSL) | Hanalyze.DataIO.{Health,Sniff,Clean,Log} | io/01-dirty-data.md |
| Reshape (pivot_wider / one-hot / lag-lead / rolling window) | Hanalyze.DataIO.Reshape | io/02-reshape.md |
| Preprocessing (impute / groupBy / derived columns / melt) | Hanalyze.DataIO.Preprocess | io/01-dirty-data.md |
| Long-form regrid (regridLong) | Hanalyze.DataIO.Preprocess + Hanalyze.Stat.Interpolate | io/03-regrid.md |
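
"Melt" and "long-form" in the rows above refer to the usual wide-to-long reshape: one output row per (identifier, variable, value) triple. A minimal sketch on plain lists follows; the Wide and Long types are hypothetical stand-ins, since the real functions operate on dataframe values whose API is not shown in this README.

```haskell
-- | One wide row: an identifier plus named measurement columns.
data Wide = Wide { ident :: String, measures :: [(String, Double)] }

-- | One long row: identifier, variable name, value.
type Long = (String, String, Double)

-- | Wide-to-long reshape: emit one long row per measurement cell.
melt :: [Wide] -> [Long]
melt rows = [(ident r, name, v) | r <- rows, (name, v) <- measures r]

main :: IO ()
main = mapM_ print (melt wide)
  where
    wide =
      [ Wide "sensorA" [("t0", 1.0), ("t1", 1.5)]
      , Wide "sensorB" [("t0", 2.0)]              -- jagged: no t1 reading
      ]
```

Long form copes naturally with jagged data such as the missing sensorB reading, which is the kind of input a regridding step then aligns onto a common grid.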

Quick start

30 seconds via CLI

git clone https://github.com/frenzieddoll/hanalyze
cd hanalyze
cabal build all

# Regress sales on price + promo, write an HTML report.
# (cabal run is used here because cabal build does not put hanalyze on PATH.)
cabal run hanalyze -- regress data/readme/sales.csv "price promo" sales --report sales.html
# β₀=185.05  β(price)=-4.37  β(promo)=+32.29  R²=0.995

data/readme/sales.csv is a 20-row demo CSV shipped with the repository (price, promo, sales). The generated sales.html includes coefficients, fit diagnostics, and an interactive prediction widget, all from one command.

30 seconds via Haskell API

import qualified Hanalyze.Stat.Test as ST
import qualified Numeric.LinearAlgebra as LA

main :: IO ()
main = do
  let xs = LA.fromList [12, 14, 13, 15, 17, 11]
      ys = LA.fromList [18, 22, 20, 19, 25, 17]
      result = ST.tTestWelch xs ys ST.TwoSided
  print (ST.trPValue result, ST.trEffect result)
  -- (0.012, Just ("Cohen's d", -1.85))

See docs/01-quickstart.md for a fuller introduction.
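
For intuition about what a Welch test computes, the statistic and its degrees of freedom can be reproduced from first principles. The sketch below is plain Haskell and does not call hanalyze; welch is a hypothetical name, and the p-value is left out because it needs a Student-t CDF.

```haskell
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- | Unbiased sample variance (divides by n - 1).
sampleVar :: [Double] -> Double
sampleVar xs = sum [(x - m) ^ (2 :: Int) | x <- xs] / fromIntegral (length xs - 1)
  where
    m = mean xs

-- | Returns (t statistic, Welch-Satterthwaite degrees of freedom).
welch :: [Double] -> [Double] -> (Double, Double)
welch xs ys = (t, df)
  where
    nx = fromIntegral (length xs)
    ny = fromIntegral (length ys)
    vx = sampleVar xs / nx   -- squared standard error of the mean of xs
    vy = sampleVar ys / ny   -- squared standard error of the mean of ys
    t  = (mean xs - mean ys) / sqrt (vx + vy)
    df = (vx + vy) ^ (2 :: Int)
           / (vx ^ (2 :: Int) / (nx - 1) + vy ^ (2 :: Int) / (ny - 1))

main :: IO ()
main = print (welch [12, 14, 13, 15, 17, 11] [18, 22, 20, 19, 25, 17])
```

For the two samples used in the quick start this gives t ≈ -4.38 on about 9.2 degrees of freedom. Unlike Student's pooled test, the two variances are never averaged, which is why the degrees of freedom come out fractional.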


CLI

hanalyze help                     list subcommands
hanalyze regress <file> <x> <y>   LM/GLM/GP/HBM regression + HTML report
hanalyze info <file>              per-column type/statistics
hanalyze hist <file> <col>        histogram with theoretical PDF overlay
hanalyze ridge <file> ...         regularised regression (Ridge/Lasso/EN)
hanalyze kernel <file> ...        kernel regression (NW/KR/RFF), multi-D inputs
hanalyze spline <file> ...        spline regression
hanalyze multireg <file> ...      multi-output regression + interactive HTML
hanalyze melt <file> ...          long-form transform
hanalyze regrid <file> ...        time-axis grid alignment
hanalyze doe ortho <NAME> -f ...  orthogonal-array generation
hanalyze taguchi sn / analyze     Taguchi method
hanalyze clean <file> --rule ...  dirty-data cleaning

For per-command flags, run hanalyze <cmd> --help or see docs/01-quickstart.md.


Examples / demos

demo/ contains many demos (60+ as of this release). Highlights:

| Demo | Summary |
| --- | --- |
| demo/regression/HBMRegressionDemo.hs | HBM Bayesian linear regression with NUTS + HTML |
| demo/regression/RFFDemo.hs | Large-scale GP via Random Fourier Features |
| demo/regression/RobustGPDemo.hs | Robust GP with Student-t observation likelihood |
| demo/doe-optim/NSGADemo.hs | NSGA-II + Pareto on the ZDT suite |
| demo/doe-optim/BayesOptDemo.hs | BO on Branin / Hartmann6 |
| demo/bayesian/HBMComparisonDemo.hs | Compare HBMs with WAIC / LOO |
| demo/bayesian/SimpsonParadoxDemo.hs | Disentangle Simpson's paradox via hierarchical model |
| demo/io/DirtyDataDemo.hs | Auto-defend against 19 dirty CSV variants |

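Run a demo with cabal run <demo-name>, or invoke the built binary directly: dist-newstyle/build/x86_64-linux/ghc-9.6.7/hanalyze-0.1.0.0/x/<demo-name>/build/<demo-name>/<demo-name> (the path varies with platform, GHC version, and package version).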


Where hanalyze fits

Rather than a complete Python/R replacement, hanalyze targets specific workflows where Haskell integration, single-binary CLI, and tight reporting add value.

Strong fit

  • Haskell-native pipelines that need stats/Bayes/optim without calling out to Python
  • Single-binary CLI distribution (one hanalyze binary, no Python venv)
  • Dirty-CSV defence + cleaning + analysis in one workflow
  • DoE / Taguchi / orthogonal arrays for manufacturing and process tuning
  • HTML reports straight from the analysis (no separate templating step)
  • Type-safe analysis pipelines that catch dtype/API mismatches early

Not a goal: keep using existing tools for

  • Large-scale DataFrame work (pandas / polars / data.table)
  • GPU deep learning (PyTorch / JAX)
  • The full breadth of scikit-learn's mature model zoo
  • The full Stan / PyMC MCMC diagnostics ecosystem
  • The full expressive range of ggplot2

Comparison vs Python

R is included in the feature map only β€” no numerical bench against R has been run.

Numbers below come from bench/results/{haskell,python}/*.csv; see bench/results/SUMMARY.md for the full table and benchmark conditions (OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1, single-thread, deterministic seeds).

| Domain | Result in these benchmarks |
| --- | --- |
| Single-objective optim (DE/CMAES/L-BFGS/NM) | Often faster than scipy in tested cases (Rosenbrock_2D/DE 134×, Ackley/CMAES 49×, Griewank/CMAES 54×). On Sphere_30D/L-BFGS the reported objective value is 8.1e-40 vs scipy 2.6e-11 in this run. |
| Multi-objective optim (NSGA-II) | Comparable or favourable in the ZDT/DTLZ suite (DTLZ2_3 1.43× faster, ZDT1/2/3 within ±5% of pymoo). HV/IGD figures match or slightly improve on pymoo in these runs. |
| Bayesian optim (BO) | Comparable on Branin (1.15×); on Hartmann6 the best objective in this run was -3.07 vs skopt -2.77. |
| Simulated annealing (Tsallis SA) | Comparable; Rastrigin_10D reaches 0.0 in this run (scipy dual_annealing reports 7.8e-14). |
| Classical regression (LM/Ridge/Lasso/GLMM) | Comparable in tested cases; LME 30× faster than statsmodels in our LME run. |
| Large-scale GLM/Lasso (n ≥ 10k) | Currently slower than sklearn (3-5× in tested cases); sklearn's Cython inner loops dominate. |
| Kernel/GP | Currently slower than sklearn (2.5-4.7× in tested cases). |
| Bayesian MCMC (NUTS/HMC) | NUTS with ESS comparable to blackjax (mu: 839 vs 810) on the 8-schools benchmark; 7.4× faster than PyMC; 2.8× slower than blackjax (JAX-JIT advantage). |
| HBM (probabilistic programming) | Polymorphic DSL with selected PyMC-style modelling features and selected distributions (Truncated/Censored/MvNormal/LKJ/...). |
| VI / WAIC / LOO | ADVI 3.0× faster than numpyro SVI on a small logistic posterior; LOO 2.9× faster than arviz on (S=1000, N=200) log-lik matrix. |
| Hypothesis tests / bootstrap / k-fold | Welch t-test 39× faster, KS 11×, k-fold split 2.2× faster than scipy/sklearn in tested cases. |
| Time series / Spline / GAM | ARIMA 128× faster than statsmodels; Spline PCHIP comparable to scipy; GAM ~1.6× slower than pygam in tested cases. |
| Survival analysis (KM/Cox PH) | Comparable to lifelines in tested cases (KM/CoxPH). |
| Multi-output regression / Regrid | MultiLM 2.3× faster than sklearn; regridLong 20× faster than a hand-written pandas+scipy synthesis. |
| Visualisation | Vega-Lite specs via hvega (grammar-of-graphics-style); HTML reports built-in. |

See docs/comparison/python-r.md for the feature map, and bench/results/SUMMARY.md for numbers.


Benchmark highlights

Selected results from bench/results/SUMMARY.md. Each entry is a single benchmark configuration; absolute objective values depend on iteration counts, seeds, and tolerances (see the SUMMARY for full conditions).

  • NUTS 8-schools (warmup 500, samples 1000): hanalyze 1492 ms with ESS(mu) 839 vs blackjax 530 ms / ESS 810 in this run
  • Holt-Winters seasonal n=500 p=12: hanalyze 0.19 ms vs statsmodels MLE 96 ms in this run (note: hanalyze uses fixed α=0.3 closed-form; statsmodels does MLE)
  • Sphere_30D/DE: hanalyze 1.0e-26 vs scipy 2.8e-5 on this benchmark
  • Sphere_30D/L-BFGS: hanalyze 8.1e-40 vs scipy 2.6e-11 on this benchmark
  • Rastrigin_10D/SA: hanalyze 0.0 vs scipy dual_annealing 7.8e-14 in this run
  • Hartmann6/BO: hanalyze -3.07 vs skopt -2.77 in this run
  • DTLZ2_3/NSGA-II: hanalyze 528 ms vs pymoo 758 ms (1.43× faster in this run)
  • DE Rosenbrock_2D: hanalyze 1.2 ms vs scipy 164 ms (134× faster in this run)
  • Constrained Quad2D (eq): hanalyze 0.062 ms vs scipy SLSQP 0.69 ms in this run
  • regridLong on jagged long-form: hanalyze 0.99 ms vs pandas+scipy synthesis 19.4 ms in this run
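
The speed-up factors quoted in this README are reference time divided by hanalyze time. As a worked check, the sketch below recomputes them from the timings in the list above (values copied as printed; the helper names are hypothetical). Small differences from the quoted factors, such as roughly 137× here against the 134× quoted for DE Rosenbrock_2D, are expected because the listed timings are rounded.

```haskell
import Numeric (showFFloat)

-- | (label, hanalyze time in ms, reference time in ms), as listed above.
timings :: [(String, Double, Double)]
timings =
  [ ("NUTS 8-schools vs blackjax",        1492,  530)
  , ("Holt-Winters vs statsmodels MLE",   0.19,  96)
  , ("DTLZ2_3/NSGA-II vs pymoo",          528,   758)
  , ("DE Rosenbrock_2D vs scipy",         1.2,   164)
  , ("Constrained Quad2D vs scipy SLSQP", 0.062, 0.69)
  , ("regridLong vs pandas+scipy",        0.99,  19.4)
  ]

-- | Speed-up factor: above 1 means hanalyze was faster, below 1 slower.
speedup :: (String, Double, Double) -> Double
speedup (_, hanalyzeMs, referenceMs) = referenceMs / hanalyzeMs

main :: IO ()
main = mapM_ report timings
  where
    report row@(label, _, _) =
      putStrLn (label ++ ": " ++ showFFloat (Just 2) (speedup row) "" ++ "x")
```

A factor below 1, as in the blackjax row (about 0.36, that is 2.8× slower), marks a case where the reference implementation was faster.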

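Reproduce: with OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 set, invoke cabal run once per target (cabal run accepts a single target per invocation) for each of bench-regression, bench-kernel, bench-optim, bench-mo, bench-bo, bench-mcmc-b7, bench-mcmc-extras, bench-ts-extras, bench-optim-plus, bench-stat-util, bench-multi-output, and bench-regrid, then run the matching bench/python/bench_*.py scripts (see bench/README.md).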


Architecture

graph TD
  IO[DataIO.* CSV/Parquet/JSON]
  IO --> DF[Hackage dataframe]
  DF --> Models[Model.* regression/ML/Bayesian/TS/Survival]
  DF --> Stat[Stat.* tests/CV/effect/interpret]
  Models --> Optim[Optim.* optimisation]
  Models --> MCMC[MCMC.* samplers]
  Models --> Viz[Viz.* HTML/PNG/SVG]
  Stat --> Viz
  MCMC --> Viz
  Optim --> Design[Design.* DoE/Taguchi]

All modules talk to Hackage dataframe directly. The internal DataFrame.Core was retired.


Roadmap & API stability

  • Stable (API expected to remain backward-compatible within minor versions): Hanalyze.DataIO.*, Hanalyze.Stat.{Test, Bootstrap, MultipleTesting, ClassMetrics, CV, Effect, Distribution}, Hanalyze.Model.{LM, GLM, Spline, Regularized, RandomForest, DecisionTree, TimeSeries, Survival, GAM}, Hanalyze.Optim.{NelderMead, LBFGS, DifferentialEvolution, CMAES, NSGA, BayesOpt, SimulatedAnnealing, ParticleSwarm}, Hanalyze.Design.*, Hanalyze.Viz.{Scatter, Bar, Histogram}.
  • Experimental (API may evolve): Hanalyze.Model.HBM DSL, Hanalyze.MCMC.NUTS (mass-matrix adaptation is opt-in), Hanalyze.Stat.VI (ADVI), Hanalyze.Model.{GP, RFF, GPRobust, GLMM}, Hanalyze.Viz.ReportBuilder. Behaviour is benchmarked but type signatures may shift.
  • Future direction: a unified top-level Hanalyze.* re-export layer, a Pipeline-style Unfitted → Fitted API, and a backend-abstraction typeclass for swapping hmatrix/Massiv/Accelerate are under consideration but not on a fixed schedule.
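
To illustrate what a Pipeline-style Unfitted → Fitted API could look like, here is one possible shape using a phantom type parameter, so that predicting with an unfitted model is a compile-time error. This is purely a hypothetical sketch of the idea, not a planned or existing hanalyze interface; every name in it is invented for illustration.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}

-- | Lifecycle stage, used only at the type level.
data Stage = Unfitted | Fitted

-- | A one-parameter model y = slope * x; the stage exists only in the type.
newtype Model (s :: Stage) = Model Double

-- | A fresh model that has not seen any data.
unfitted :: Model 'Unfitted
unfitted = Model 0

-- | Least squares through the origin: slope = sum (x*y) / sum (x*x).
fit :: [(Double, Double)] -> Model 'Unfitted -> Model 'Fitted
fit obs _ = Model (sum [x * y | (x, y) <- obs] / sum [x * x | (x, _) <- obs])

-- | Only fitted models can predict; `predict unfitted 3` does not type-check.
predict :: Model 'Fitted -> Double -> Double
predict (Model slope) x = slope * x

main :: IO ()
main = print (predict (fit [(1, 2), (2, 4)] unfitted) 3)
```

The stage costs nothing at runtime because the newtype erases to a plain Double; the trade-off is that every function in a pipeline has to state which stage it accepts.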

Module layout

src/
  DataIO/      - CSV/JSON/Parquet IO + health checks + sniff + clean DSL + reshape (9 mods)
  Stat/        - tests/distributions/interpolation/effect/CV/bootstrap/interpret etc. (21 mods)
  Model/       - LM/GLM/GLMM/Spline/Kernel/GP/RFF/HBM/PCA/Cluster/Tree/TS/Survival (23 mods)
  Optim/       - single-obj (NM/LBFGS/DE/CMAES/SA/PSO) + multi-obj (NSGA/BO/Pareto) (18 mods)
  Design/      - Factorial/Block/RSM/Optimal/Orthogonal/Taguchi (11 mods)
  Viz/         - Vega-Lite-based visualisation + ReportBuilder (15 mods)
  MCMC/        - MH/HMC/NUTS/Gibbs/Slice (6 mods)

As of this release: 103 modules, 238 tests.


Build

cabal build all                  # library + all executables (60+ demos)
cabal test                       # hspec test suite
cabal repl                       # interactive REPL

Major dependencies: hmatrix (BLAS/LAPACK), hvega (Vega-Lite), statistics, mwc-random, dataframe (Hackage Polars-like), massiv (parallel arrays), ad (auto-diff), async.

Tested on GHC 9.6.7 + cabal 3.14.2.


Running benchmarks

# 1. Generate shared test data (fixed-seed, deterministic)
cabal run bench-data-gen

# 2. Haskell side
# cabal run takes one target per invocation, so loop over the benchmarks.
for b in regression kernel optim mo bo; do
  OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 cabal run "bench-$b"
done

# 3. Python side (need bench/venv from bench/requirements.txt)
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 \
  bench/venv/bin/python bench/python/bench_regression.py
# (similarly for kernel, optim, mo, bo)

# 4. Aggregate (Markdown table)
bench/venv/bin/python bench/aggregate.py > bench/results/SUMMARY.md

Development

  • Issues / PRs: github.com/frenzieddoll/hanalyze
  • Adding tests: append hspec specs in test/Spec.hs
  • Adding benchmarks: place bench/haskell/Bench*.hs and matching Python script
  • Coding rules: see CONTRIBUTING.md (no list-passing on hot paths, minimise unsafe*, ...)

License

BSD-3-Clause License; see LICENSE.

Author

Toshiaki Honda <frenzieddoll@gmail.com>