futhark-0.25.31: An optimising compiler for a functional, array-oriented language.
Safe Haskell: None
Language: GHC2021

Futhark.CodeGen.ImpGen.GPU.SegHist

Description

Our compilation strategy for SegHist is based around avoiding bin conflicts. We do this by splitting the input into chunks, and for each chunk computing a single subhistogram. Then we combine the subhistograms using an ordinary segmented reduction (SegRed).
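The strategy above can be illustrated on the CPU with plain lists. This is only a minimal sketch and not the code that Futhark generates: the names `chunksOf`, `subhistogram`, and `histogram` are hypothetical, the histogram operator is assumed to be addition (counting), out-of-range bin indices are assumed to be ignored, and an elementwise sum stands in for the segmented reduction that combines the subhistograms.

```haskell
import Data.List (foldl')

-- Split the input into chunks of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0 = error "chunksOf: chunk size must be positive"
  | null xs = []
  | otherwise = let (c, rest) = splitAt n xs in c : chunksOf n rest

-- Compute one subhistogram with the given number of bins for a single
-- chunk of bin indices. Out-of-range indices are ignored (an assumption
-- of this sketch).
subhistogram :: Int -> [Int] -> [Int]
subhistogram bins = foldl' bump (replicate bins 0)
  where
    bump hist b
      | b < 0 || b >= bins = hist
      | otherwise = [if i == b then c + 1 else c | (i, c) <- zip [0 ..] hist]

-- Compute one subhistogram per chunk, then combine them elementwise.
-- The combination step plays the role of the segmented reduction.
histogram :: Int -> Int -> [Int] -> [Int]
histogram bins chunkSize =
  foldl' (zipWith (+)) (replicate bins 0)
    . map (subhistogram bins)
    . chunksOf chunkSize
```

Because each chunk updates only its own subhistogram, no two chunks ever write to the same bin, which is the point of the approach. For a commutative and associative operator such as addition, the result does not depend on the chunk size.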

Some branches handle the case where we use only a single subhistogram (because the histogram is large), so that we respect the asymptotics and do not copy the destination array.

We also use a heuristic strategy for computing subhistograms in shared memory when possible. Given:

H: total size of histograms in bytes, including any lock arrays.

G: block size

T: number of bytes of shared memory each thread can be given without impacting occupancy (determined experimentally, e.g. 32).

LMAX: maximum amount of shared memory per threadblock (hard limit).

We wish to compute:

COOP: cooperation level (number of threads per subhistogram)

LH: number of shared memory subhistograms

We do this as:

    COOP = ceil(H / T)
    LH = ceil((G*T)/H)
    if COOP <= G && H <= LMAX then
      use shared memory
    else
      use global memory
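As a worked illustration, the heuristic can be written as a small pure function. This is a sketch under stated assumptions rather than the module's actual code: the types `HistMemory` and `HistPlan` and the functions `ceilDiv` and `planHist` are hypothetical names introduced here, and all four inputs are assumed to be positive integers.

```haskell
-- Where the subhistograms are placed (hypothetical type for this sketch).
data HistMemory = SharedMemory | GlobalMemory
  deriving (Eq, Show)

-- Outcome of the heuristic (hypothetical type for this sketch).
data HistPlan = HistPlan
  { planCoop :: Int -- COOP: threads cooperating on one subhistogram
  , planLH :: Int -- LH: number of shared memory subhistograms
  , planMemory :: HistMemory
  }
  deriving (Eq, Show)

-- Integer division rounding up; assumes a positive divisor.
ceilDiv :: Int -> Int -> Int
ceilDiv x y = (x + y - 1) `div` y

-- Arguments, in order: H (histogram bytes), G (block size),
-- T (shared memory bytes per thread), LMAX (shared memory limit per block).
planHist :: Int -> Int -> Int -> Int -> HistPlan
planHist h g t lmax = HistPlan coop lh mem
  where
    coop = h `ceilDiv` t
    lh = (g * t) `ceilDiv` h
    mem
      | coop <= g && h <= lmax = SharedMemory
      | otherwise = GlobalMemory
```

For example, with H = 4096, G = 256, T = 32 and an assumed LMAX of 49152, this gives COOP = 128 and LH = 2, so shared memory is used. With H = 16384 and the other values unchanged, COOP = 512 exceeds G, so the sketch falls back to global memory.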

Documentation

compileSegHist :: Pat LetDecMem -> SegLevel -> SegSpace -> [HistOp GPUMem] -> KernelBody GPUMem -> CallKernelGen ()

Generate code for a segmented histogram called from the host.