| Copyright | (c) 2010 - 2011 Simon Meier | 
|---|---|
| License | BSD3-style (see LICENSE) | 
| Maintainer | Simon Meier <iridcode@gmail.com> | 
| Stability | unstable, private | 
| Portability | GHC | 
| Safe Haskell | Unsafe | 
| Language | Haskell2010 | 
Data.ByteString.Builder.Internal
Description
- Warning:* this module is internal. If you find that you need it then please contact the maintainers and explain what you are trying to do and discuss what you would need in the public API. It is important that you do this as the module may not be exposed at all in future releases.
Core types and functions for the Builder monoid and its generalization,
 the Put monad.
The design of the Builder monoid is optimized such that
- buffers of arbitrary size can be filled as efficiently as possible and
- sequencing of Builders is as cheap as possible.
We achieve (1) by completely handing over control over writing to the buffer
 to the BuildStep implementing the Builder. This BuildStep is just told
 the start and the end of the buffer (represented as a BufferRange). Then,
 the BuildStep can write to as big a prefix of this BufferRange in any
 way it desires. If the BuildStep is done, the BufferRange is full, or a
 long sequence of bytes should be inserted directly, then the BuildStep
 signals this to its caller using a BuildSignal.
We achieve (2) by requiring that every Builder is implemented by a
 BuildStep that takes a continuation BuildStep, which it calls with the
 updated BufferRange after it is done. Therefore, only two pointers have
 to be passed in a function call to implement concatenation of Builders.
 Moreover, many Builders are completely inlined, which enables the compiler
 to sequence them without a function call and with no boxing at all.
This design gives the implementation of a Builder full access to the IO
 monad. Therefore, utmost care has to be taken to not overwrite anything
 outside the given BufferRanges. Moreover, further care has to be taken to
 ensure that Builders and Puts are referentially transparent. See the
 comments of the builder and put functions for further information.
 Note that there are no safety belts at all, when implementing a Builder
 using an IO action: you are writing code that might enable the next
 buffer-overflow attack on a Haskell server!
Synopsis
- data Buffer = Buffer !(ForeignPtr Word8) !BufferRange
- data BufferRange = BufferRange !(Ptr Word8) !(Ptr Word8)
- newBuffer :: Int -> IO Buffer
- bufferSize :: Buffer -> Int
- byteStringFromBuffer :: Buffer -> ByteString
- data ChunkIOStream a- = Finished Buffer a
- | Yield1 ByteString (IO (ChunkIOStream a))
 
- buildStepToCIOS :: AllocationStrategy -> BuildStep a -> IO (ChunkIOStream a)
- ciosUnitToLazyByteString :: AllocationStrategy -> ByteString -> ChunkIOStream () -> ByteString
- ciosToLazyByteString :: AllocationStrategy -> (a -> (b, ByteString)) -> ChunkIOStream a -> (b, ByteString)
- data BuildSignal a
- type BuildStep a = BufferRange -> IO (BuildSignal a)
- finalBuildStep :: BuildStep ()
- done :: Ptr Word8 -> a -> BuildSignal a
- bufferFull :: Int -> Ptr Word8 -> BuildStep a -> BuildSignal a
- insertChunk :: Ptr Word8 -> ByteString -> BuildStep a -> BuildSignal a
- fillWithBuildStep :: BuildStep a -> (Ptr Word8 -> a -> IO b) -> (Ptr Word8 -> Int -> BuildStep a -> IO b) -> (Ptr Word8 -> ByteString -> BuildStep a -> IO b) -> BufferRange -> IO b
- data Builder
- builder :: (forall r. BuildStep r -> BuildStep r) -> Builder
- runBuilder :: Builder -> BuildStep ()
- runBuilderWith :: Builder -> BuildStep a -> BuildStep a
- empty :: Builder
- append :: Builder -> Builder -> Builder
- flush :: Builder
- ensureFree :: Int -> Builder
- byteStringCopy :: ByteString -> Builder
- byteStringInsert :: ByteString -> Builder
- byteStringThreshold :: Int -> ByteString -> Builder
- lazyByteStringCopy :: ByteString -> Builder
- lazyByteStringInsert :: ByteString -> Builder
- lazyByteStringThreshold :: Int -> ByteString -> Builder
- shortByteString :: ShortByteString -> Builder
- maximalCopySize :: Int
- byteString :: ByteString -> Builder
- lazyByteString :: ByteString -> Builder
- toLazyByteStringWith :: AllocationStrategy -> ByteString -> Builder -> ByteString
- data AllocationStrategy
- safeStrategy :: Int -> Int -> AllocationStrategy
- untrimmedStrategy :: Int -> Int -> AllocationStrategy
- customStrategy :: (Maybe (Buffer, Int) -> IO Buffer) -> Int -> (Int -> Int -> Bool) -> AllocationStrategy
- smallChunkSize :: Int
- defaultChunkSize :: Int
- chunkOverhead :: Int
- data Put a
- put :: (forall r. (a -> BuildStep r) -> BuildStep r) -> Put a
- runPut :: Put a -> BuildStep a
- putToLazyByteString :: Put a -> (a, ByteString)
- putToLazyByteStringWith :: AllocationStrategy -> (a -> (b, ByteString)) -> Put a -> (b, ByteString)
- hPut :: forall a. Handle -> Put a -> IO a
- putBuilder :: Builder -> Put ()
- fromPut :: Put () -> Builder
Buffer management
A Buffer together with the BufferRange of free bytes. The filled
 space starts at offset 0 and ends at the first free byte.
Constructors
| Buffer !(ForeignPtr Word8) !BufferRange | 
data BufferRange Source #
A range of bytes in a buffer represented by the pointer to the first byte of the range and the pointer to the first byte after the range.
Constructors
| BufferRange !(Ptr Word8) !(Ptr Word8) | 
bufferSize :: Buffer -> Int Source #
Combined size of the filled and free space in the buffer.
byteStringFromBuffer :: Buffer -> ByteString Source #
Convert the filled part of a Buffer to a strict ByteString.
data ChunkIOStream a Source #
A stream of chunks that are constructed in the IO monad.
This datatype serves as the common interface for the buffer-by-buffer
 execution of a BuildStep by buildStepToCIOS. Typical users of this
 interface are ciosToLazyByteString or iteratee-style libraries like
 enumerator.
Constructors
| Finished Buffer a | The partially filled last buffer together with the result. | 
| Yield1 ByteString (IO (ChunkIOStream a)) | Yield a non-empty strict  | 
Arguments
| :: AllocationStrategy | Buffer allocation strategy to use | 
| -> BuildStep a | 
 | 
| -> IO (ChunkIOStream a) | 
Convert a BuildStep to a ChunkIOStream stream by executing it on
 Buffers allocated according to the given AllocationStrategy.
ciosUnitToLazyByteString :: AllocationStrategy -> ByteString -> ChunkIOStream () -> ByteString Source #
Convert a ChunkIOStream ()ByteString using
 unsafeDupablePerformIO.
ciosToLazyByteString :: AllocationStrategy -> (a -> (b, ByteString)) -> ChunkIOStream a -> (b, ByteString) Source #
Convert a ChunkIOStream to a lazy tuple of the result and the written
 ByteString using unsafeDupablePerformIO.
Build signals and steps
data BuildSignal a Source #
BuildSignals abstract signals to the caller of a BuildStep. There are
 three signals: done, bufferFull, or 'insertChunks signals
type BuildStep a = BufferRange -> IO (BuildSignal a) Source #
BuildSteps may be called *multiple times* and they must not rise an
 async. exception.
finalBuildStep :: BuildStep () Source #
The final build step that returns the done signal.
Arguments
| :: Ptr Word8 | Next free byte in current  | 
| -> a | Computed value | 
| -> BuildSignal a | 
Signal that the current BuildStep is done and has computed a value.
Arguments
| :: Int | Minimal size of next  | 
| -> Ptr Word8 | Next free byte in current  | 
| -> BuildStep a | 
 | 
| -> BuildSignal a | 
Signal that the current buffer is full.
Arguments
| :: Ptr Word8 | Next free byte in current  | 
| -> ByteString | Chunk to insert. | 
| -> BuildStep a | 
 | 
| -> BuildSignal a | 
Signal that a ByteString chunk should be inserted directly.
Arguments
| :: BuildStep a | Build step to use for filling the  | 
| -> (Ptr Word8 -> a -> IO b) | Handling the  | 
| -> (Ptr Word8 -> Int -> BuildStep a -> IO b) | Handling the  | 
| -> (Ptr Word8 -> ByteString -> BuildStep a -> IO b) | Handling the  | 
| -> BufferRange | Buffer range to fill. | 
| -> IO b | Value computed while filling this  | 
Fill a BufferRange using a BuildStep.
The Builder monoid
Builders denote sequences of bytes.
 They are Monoids where
   mempty is the zero-length sequence and
   mappend is concatenation, which runs in O(1).
Arguments
| :: (forall r. BuildStep r -> BuildStep r) | A function that fills a  This function must be referentially transparent; i.e., calling it
 multiple times with equally sized  | 
| -> Builder | 
Arguments
| :: Builder | 
 | 
| -> BuildStep () | 
 | 
Run a Builder with the finalBuildStep.
Run a Builder.
Primitive combinators
ensureFree :: Int -> Builder Source #
ensureFree nn free bytes
 for the following Builder.
byteStringCopy :: ByteString -> Builder Source #
Construct a Builder that copies the strict ByteString.
Use this function to create Builders from smallish (<= 4kb)
 ByteStrings or if you need to guarantee that the ByteString is not
 shared with the chunks generated by the Builder.
byteStringInsert :: ByteString -> Builder Source #
Construct a Builder that always inserts the strict ByteString
 directly as a chunk.
This implies flushing the output buffer, even if it contains just
 a single byte. You should therefore use byteStringInsert only for large
 (> 8kb) ByteStrings. Otherwise, the generated chunks are too
 fragmented to be processed efficiently afterwards.
byteStringThreshold :: Int -> ByteString -> Builder Source #
Construct a Builder that copies the strict ByteStrings, if it is
 smaller than the treshold, and inserts it directly otherwise.
For example, byteStringThreshold 1024 copies strict ByteStrings whose size
 is less or equal to 1kb, and inserts them directly otherwise. This implies
 that the average chunk-size of the generated lazy ByteString may be as
 low as 513 bytes, as there could always be just a single byte between the
 directly inserted 1025 byte, strict ByteStrings.
lazyByteStringCopy :: ByteString -> Builder Source #
Construct a Builder that copies the lazy ByteString.
lazyByteStringInsert :: ByteString -> Builder Source #
Construct a Builder that inserts all chunks of the lazy ByteString
 directly.
lazyByteStringThreshold :: Int -> ByteString -> Builder Source #
Construct a Builder that uses the thresholding strategy of byteStringThreshold
 for each chunk of the lazy ByteString.
shortByteString :: ShortByteString -> Builder Source #
Construct a Builder that copies the ShortByteString.
maximalCopySize :: Int Source #
The maximal size of a ByteString that is copied.
 2 *  to guarantee that on average a chunk is of
 smallChunkSizesmallChunkSize.
byteString :: ByteString -> Builder Source #
Create a Builder denoting the same sequence of bytes as a strict
 ByteString.
 The Builder inserts large ByteStrings directly, but copies small ones
 to ensure that the generated chunks are large on average.
lazyByteString :: ByteString -> Builder Source #
Create a Builder denoting the same sequence of bytes as a lazy
 ByteString.
 The Builder inserts large chunks of the lazy ByteString directly,
 but copies small ones to ensure that the generated chunks are large on
 average.
Execution
Arguments
| :: AllocationStrategy | Buffer allocation strategy to use | 
| -> ByteString | Lazy  | 
| -> Builder | 
 | 
| -> ByteString | Resulting lazy  | 
Heavy inlining. Execute a Builder with custom execution parameters.
This function is inlined despite its heavy code-size to allow fusing with
 the allocation strategy. For example, the default Builder execution
 function toLazyByteString is defined as follows.
{-# NOINLINE toLazyByteString #-}
toLazyByteString =
  toLazyByteStringWith (safeStrategy smallChunkSize defaultChunkSize) L.empty
where L.empty is the zero-length lazy ByteString.
In most cases, the parameters used by toLazyByteString give good
 performance. A sub-performing case of toLazyByteString is executing short
 (<128 bytes) Builders. In this case, the allocation overhead for the first
 4kb buffer and the trimming cost dominate the cost of executing the
 Builder. You can avoid this problem using
toLazyByteStringWith (safeStrategy 128 smallChunkSize) L.empty
This reduces the allocation and trimming overhead, as all generated
 ByteStrings fit into the first buffer and there is no trimming
 required, if more than 64 bytes and less than 128 bytes are written.
data AllocationStrategy Source #
A buffer allocation strategy for executing Builders.
Arguments
| :: Int | Size of first buffer | 
| -> Int | Size of successive buffers | 
| -> AllocationStrategy | An allocation strategy that guarantees that at least half of the allocated memory is used for live data | 
Use this strategy for generating lazy ByteStrings whose chunks are
 likely to survive one garbage collection. This strategy trims buffers
 that are filled less than half in order to avoid spilling too much memory.
Arguments
| :: Int | Size of the first buffer | 
| -> Int | Size of successive buffers | 
| -> AllocationStrategy | An allocation strategy that does not trim any of the filled buffers before converting it to a chunk | 
Use this strategy for generating lazy ByteStrings whose chunks are
 discarded right after they are generated. For example, if you just generate
 them to write them to a network socket.
Arguments
| :: (Maybe (Buffer, Int) -> IO Buffer) | Buffer allocation function. If  | 
| -> Int | Default buffer size. | 
| -> (Int -> Int -> Bool) | A predicate  | 
| -> AllocationStrategy | 
Create a custom allocation strategy. See the code for safeStrategy and
 untrimmedStrategy for examples.
smallChunkSize :: Int Source #
The recommended chunk size. Currently set to 4k, less the memory management overhead
defaultChunkSize :: Int Source #
The chunk size used for I/O. Currently set to 32k, less the memory management overhead
chunkOverhead :: Int Source #
The memory management overhead. Currently this is tuned for GHC only.
The Put monad
A Put action denotes a computation of a value that writes a stream of
 bytes as a side-effect. Puts are strict in their side-effect; i.e., the
 stream of bytes will always be written before the computed value is
 returned.
Puts are a generalization of Builders. The typical use case is the
 implementation of an encoding that might fail (e.g., an interface to the
 zlib
 compression library or the conversion from Base64 encoded data to
 8-bit data). For a Builder, the only way to handle and report such a
 failure is ignore it or call error.  In contrast, Put actions are
 expressive enough to allow reporting and handling such a failure in a pure
 fashion.
Put ()Builders. The functions putBuilder
 and fromPut convert between these two types. Where possible, you should
 use Builders, as sequencing them is slightly cheaper than sequencing
 Puts because they do not carry around a computed value.
Arguments
| :: (forall r. (a -> BuildStep r) -> BuildStep r) | A function that fills a  This function must be referentially transparent; i.e., calling it
 multiple times with equally sized  | 
| -> Put a | 
Arguments
| :: Put a | Put to run | 
| -> BuildStep a | 
 | 
Run a Put.
Execution
Arguments
| :: Put a | 
 | 
| -> (a, ByteString) | Result and lazy  | 
Execute a Put and return the computed result and the bytes
 written during the computation as a lazy ByteString.
This function is strict in the computed result and lazy in the writing of the bytes. For example, given
infinitePut = sequence_ (repeat (putBuilder (word8 1))) >> return 0
evaluating the expression
fst $ putToLazyByteString infinitePut
does not terminate, while evaluating the expression
L.head $ snd $ putToLazyByteString infinitePut
does terminate and yields the value 1 :: Word8.
An illustrative example for these strictness properties is the implementation of Base64 decoding (http://en.wikipedia.org/wiki/Base64).
type DecodingState = ... decodeBase64 ::ByteString-> DecodingState ->Put(Maybe DecodingState) decodeBase64 = ...
The above function takes a strict ByteString supposed to represent
 Base64 encoded data and the current decoding state.
 It writes the decoded bytes as the side-effect of the Put and returns the
 new decoding state, if the decoding of all data in the ByteString was
 successful. The checking if the strict ByteString represents Base64
 encoded data and the actual decoding are fused. This makes the common case,
 where all data represents Base64 encoded data, more efficient. It also
 implies that all data must be decoded before the final decoding
 state can be returned. Puts are intended for implementing such fused
 checking and decoding/encoding, which is reflected in their strictness
 properties.
putToLazyByteStringWith Source #
Arguments
| :: AllocationStrategy | Buffer allocation strategy to use | 
| -> (a -> (b, ByteString)) | Continuation to use for computing the final result and the tail of its side-effect (the written bytes). | 
| -> Put a | 
 | 
| -> (b, ByteString) | Resulting lazy  | 
Execute a Put with a buffer-allocation strategy and a continuation. For
 example, putToLazyByteString is implemented as follows.
putToLazyByteString =putToLazyByteStringWith(safeStrategysmallChunkSizedefaultChunkSize) (x -> (x, L.empty))