--- layout: post title: "Text Bug" date: 2014-03-01 comments: true external-url: author: Eric Seidel published: false categories: benchmarks, text demo: TextBug.hs --- For our last post on `text`, we return to the topic of building a new `Text` value, i.e. proving the safety of write operations. <!-- more --> <div class="hidden"> \begin{code} {-# LANGUAGE BangPatterns, CPP, MagicHash, Rank2Types, RecordWildCards, UnboxedTuples, ExistentialQuantification #-} {-@ LIQUID "--no-termination" @-} module TextBug where import Control.Monad.ST.Unsafe (unsafeIOToST) import Foreign.C.Types (CSize) import GHC.Base hiding (unsafeChr) import GHC.ST import GHC.Word (Word16(..)) import Data.Bits hiding (shiftL) import Data.Word import Language.Haskell.Liquid.Prelude -------------------------------------------------------------------------------- --- From TextInternal -------------------------------------------------------------------------------- {-@ shiftL :: i:Nat -> n:Nat -> {v:Nat | ((n = 1) => (v = (i * 2)))} @-} shiftL :: Int -> Int -> Int shiftL = undefined -- (I# x#) (I# i#) = I# (x# `iShiftL#` i#) {-@ measure isUnknown :: Size -> Prop isUnknown (Exact n) = false isUnknown (Max n) = false isUnknown (Unknown) = true @-} {-@ measure getSize :: Size -> Int getSize (Exact n) = n getSize (Max n) = n @-} {-@ invariant {v:Size | (getSize v) >= 0} @-} data Size = Exact {-# UNPACK #-} !Int -- ^ Exact size. | Max {-# UNPACK #-} !Int -- ^ Upper bound on size. | Unknown -- ^ Unknown size. deriving (Eq, Show) {-@ upperBound :: k:Nat -> s:Size -> {v:Nat | v = ((isUnknown s) ? k : (getSize s))} @-} upperBound :: Int -> Size -> Int upperBound _ (Exact n) = n upperBound _ (Max n) = n upperBound k _ = k data Step s a = Done | Skip !s | Yield !a !s data Stream a = forall s. Stream (s -> Step s a) -- stepper function !s -- current state !Size -- size hint {-@ data Array = Array { aBA :: ByteArray# , aLen :: Nat } @-} data Array = Array { aBA :: ByteArray# , aLen :: !Int } {-@ measure alen :: Array -> Int alen (Array a n) = n @-} {-@ aLen :: a:Array -> {v:Nat | v = (alen a)} @-} {-@ type ArrayN N = {v:Array | (alen v) = N} @-} {-@ type AValidI A = {v:Nat | v < (alen A)} @-} {-@ type AValidO A = {v:Nat | v <= (alen A)} @-} {-@ type AValidL O A = {v:Nat | (v+O) <= (alen A)} @-} {-@ data MArray s = MArray { maBA :: MutableByteArray# s , maLen :: Nat } @-} data MArray s = MArray { maBA :: MutableByteArray# s , maLen :: !Int } {-@ measure malen :: MArray s -> Int malen (MArray a n) = n @-} {-@ maLen :: a:MArray s -> {v:Nat | v = (malen a)} @-} {-@ type MArrayN s N = {v:MArray s | (malen v) = N} @-} {-@ type MAValidI MA = {v:Nat | v < (malen MA)} @-} {-@ type MAValidO MA = {v:Nat | v <= (malen MA)} @-} {-@ new :: forall s. n:Nat -> ST s (MArrayN s n) @-} new :: forall s. Int -> ST s (MArray s) new n | n < 0 || n .&. highBit /= 0 = error $ "Data.Text.Array.new: size overflow" | otherwise = ST $ \s1# -> case newByteArray# len# s1# of (# s2#, marr# #) -> (# s2#, MArray marr# n #) where !(I# len#) = bytesInArray n highBit = maxBound `xor` (maxBound `shiftR` 1) bytesInArray n = n `shiftL` 1 {-@ unsafeWrite :: ma:MArray s -> MAValidI ma -> Word16 -> ST s () @-} unsafeWrite :: MArray s -> Int -> Word16 -> ST s () unsafeWrite MArray{..} i@(I# i#) (W16# e#) | i < 0 || i >= maLen = liquidError "out of bounds" | otherwise = ST $ \s1# -> case writeWord16Array# maBA i# e# s1# of s2# -> (# s2#, () #) {-@ copyM :: dest:MArray s -> didx:MAValidO dest -> src:MArray s -> sidx:MAValidO src -> {v:Nat | (((didx + v) <= (malen dest)) && ((sidx + v) <= (malen src)))} -> ST s () @-} copyM :: MArray s -- ^ Destination -> Int -- ^ Destination offset -> MArray s -- ^ Source -> Int -- ^ Source offset -> Int -- ^ Count -> ST s () copyM dest didx src sidx count | count <= 0 = return () | otherwise = liquidAssert (sidx + count <= maLen src) . liquidAssert (didx + count <= maLen dest) . unsafeIOToST $ memcpyM (maBA dest) (fromIntegral didx) (maBA src) (fromIntegral sidx) (fromIntegral count) {-@ memcpyM :: MutableByteArray# s -> CSize -> MutableByteArray# s -> CSize -> CSize -> IO () @-} memcpyM :: MutableByteArray# s -> CSize -> MutableByteArray# s -> CSize -> CSize -> IO () memcpyM = undefined {-@ unsafeFreeze :: ma:MArray s -> ST s (ArrayN (malen ma)) @-} unsafeFreeze :: MArray s -> ST s Array unsafeFreeze MArray{..} = ST $ \s# -> (# s#, Array (unsafeCoerce# maBA) maLen #) {-@ unsafeIndex :: a:Array -> AValidI a -> Word16 @-} unsafeIndex :: Array -> Int -> Word16 unsafeIndex Array{..} i@(I# i#) | i < 0 || i >= aLen = liquidError "out of bounds" | otherwise = case indexWord16Array# aBA i# of r# -> (W16# r#) data Text = Text Array Int Int {-@ data Text [tlen] = Text (tarr :: Array) (toff :: TValidO tarr) (tlen :: TValidL toff tarr) @-} {-@ type TValidI T = {v:Nat | v < (tlen T)} @-} {-@ type TValidO A = {v:Nat | v <= (alen A)} @-} {-@ type TValidL O A = {v:Nat | (v+O) <= (alen A)} @-} -------------------------------------------------------------------------------- --- From TextRead -------------------------------------------------------------------------------- {-@ measure numchars :: Array -> Int -> Int -> Int @-} {-@ measure tlength :: Text -> Int @-} {-@ invariant {v:Text | (tlength v) = (numchars (tarr v) (toff v) (tlen v))} @-} -------------------------------------------------------------------------------- --- From TextWrite -------------------------------------------------------------------------------- {-@ qualif Ord(v:int, i:int, x:Char) : ((((ord x) < 65536) => (v = i)) && (((ord x) >= 65536) => (v = (i + 1)))) @-} {-@ predicate Room C MA I = (((One C) => (RoomN 1 MA I)) && ((Two C) => (RoomN 2 MA I))) @-} {-@ predicate RoomN N MA I = (I+N <= (malen MA)) @-} {-@ measure ord :: Char -> Int @-} {-@ ord :: c:Char -> {v:Int | v = (ord c)} @-} {-@ predicate One C = ((ord C) < 65536) @-} {-@ predicate Two C = ((ord C) >= 65536) @-} {-@ writeChar :: ma:MArray s -> i:Nat -> {v:Char | (Room v ma i)} -> ST s {v:Nat | (RoomN v ma i)} @-} writeChar :: MArray s -> Int -> Char -> ST s Int writeChar marr i c | n < 0x10000 = do unsafeWrite marr i (fromIntegral n) return 1 | otherwise = do unsafeWrite marr i lo unsafeWrite marr (i+1) hi return 2 where n = ord c m = n - 0x10000 lo = fromIntegral $ (m `shiftR` 10) + 0xD800 hi = fromIntegral $ (m .&. 0x3FF) + 0xDC00 -------------------------------------------------------------------------------- --- Helpers -------------------------------------------------------------------------------- {-@ qualif MALenLE(v:int, a:MArray s): v <= (malen a) @-} {-@ qualif ALenLE(v:int, a:Array): v <= (alen a) @-} \end{code} </div> Let's take a look at `mapAccumL`, which combines a map and a fold over a `Stream` and bundles the result of the map into a new `Text`. Again, we'll want to focus our attention on the `Yield` case of the inner loop. \begin{code} mapAccumL f z0 (Stream next0 s0 len) = (nz, Text na 0 nl) where mlen = upperBound 4 len (na,(nz,nl)) = runST $ do (marr,x) <- (new mlen >>= \arr -> outer arr mlen z0 s0 0) arr <- unsafeFreeze marr return (arr,x) outer arr top = loop where loop !z !s !i = case next0 s of Done -> return (arr, (z,i)) Skip s' -> loop z s' i Yield x s' | j >= top -> do let top' = (top + 1) `shiftL` 1 arr' <- new top' copyM arr' 0 arr 0 top outer arr' top' z s i | otherwise -> do let (z',c) = f z x d <- writeChar arr i c loop z' s' (i+d) where j | ord x < 0x10000 = i | otherwise = i + 1 \end{code} If you recall `unstream` from last time, you'll notice that this loop body looks almost identical to the one found in `unstream`, but LiquidHaskell has flagged the `writeChar` call as unsafe! What's going on here? Let's take a look at `j`, recalling that it carried a crucial part of the safety \begin{code} proof last time, and see what LiquidHaskell was able to infer. {v:Int | ((ord x >= 65536) => (v == i+1)) && ((ord x < 65536) => (v == i))} \end{code} Well that's not very useful at all! LiquidHaskell can prove that it's safe to write `x` but here we are trying to write `c` into the array. This is actually a *good* thing though, because `c` is the result of calling an arbitrary function `f` on `x`! We haven't constrained `f` in any way, so it could easily return a character above `U+10000` given any input. So `mapAccumL` is actually *unsafe*, and our first wild bug caught by LiquidHaskell! The fix is luckily easy, we simply have to lift the `let (z',c) = f z x` binder into the `where` clause, and change `j` to depend on `ord c` instead. \begin{code} mapAccumL' f z0 (Stream next0 s0 len) = (nz, Text na 0 nl) where mlen = upperBound 4 len (na,(nz,nl)) = runST $ do (marr,x) <- (new mlen >>= \arr -> outer arr mlen z0 s0 0) arr <- unsafeFreeze marr return (arr,x) outer arr top = loop where loop !z !s !i = case next0 s of Done -> return (arr, (z,i)) Skip s' -> loop z s' i Yield x s' | j >= top -> do let top' = (top + 1) `shiftL` 1 arr' <- new top' copyM arr' 0 arr 0 top outer arr' top' z s i | otherwise -> do d <- writeChar arr i c loop z' s' (i+d) where (z',c) = f z x j | ord c < 0x10000 = i | otherwise = i + 1 \end{code} LiquidHaskell happily accepts our revised `mapAccumL`, as did the `text` maintainers. We hope you've enjoyed this whirlwind tour of using LiquidHaskell to verify production code, we have many more examples in the `benchmarks` folder of our GitHub repository for the intrepid reader.