evm-opcodes

This Haskell library provides opcode types for the Ethereum Virtual Machine (EVM).
The library has one parameterised type, Opcode' j where j is the annotation
for the jump-related instructions JUMP, JUMPI and JUMPDEST, and it has
three concrete variants:
type Opcode = Opcode' ()
type PositionalOpcode = Opcode' Word
type LabelledOpcode = Opcode' Text
The library has a fixpoint algorithm that translates labelled jumps into
positional jumps, and it has another function that translates those positional
jumps into plain EVM opcodes where a constant is pushed before a jump is made.
The constant that is pushed depends on the position of JUMPDEST which depends
on the constant that is pushed, hence the need for fixpoint iteration.
When pushing a constant to the stack, EVM uses push0, push1, push2, ...,
push32 where the number 0-32 refers to how many bytes the constant occupies.
Instead of having 32 unique push commands, this library has a single PUSH !Word256 constructor that serializes to the right push1, push2, etc.
Example
Imagine translating the following C program to EVM opcodes:
int x = 1;
while (x != 0) { x *= 2 };
Since EVM is stack-based, let's put x on the stack.
λ> import EVM.Opcode
λ> import EVM.Opcode.Labelled as L
λ> import EVM.Opcode.Positional as P
λ> let opcodes = [PUSH 1,JUMPDEST "loop",DUP1,ISZERO,JUMPI "end",PUSH 2,MUL,JUMP "loop",JUMPDEST "end"]
λ> L.translate opcodes
Right [PUSH 1,JUMPDEST 2,DUP1,ISZERO,JUMPI 14,PUSH 2,MUL,JUMP 2,JUMPDEST 14]
λ> P.translate <$> L.translate opcodes
Right [PUSH 1,JUMPDEST,DUP1,ISZERO,PUSH 14,JUMPI,PUSH 2,MUL,PUSH 2,JUMP,JUMPDEST]
λ> fmap opcodeText . P.translate <$> L.translate opcodes
Right ["push1 1","jumpdest","dup1","iszero","push1 14","jumpi","push1 2","mul","push1 2","jump","jumpdest"]
Accounts for size of PUSHes when doing absolute jumps
EVM's jump and jumpi instructions are parameterless. Instead they pop and
jump to the address on the top of the stack. In order to perform absolute jumps
in the code, it is necessary to PUSH an address on the stack first. This is
inconvenient, and so PositionalOpcode and LabelledOpcode are easier to use.
But what's more inconvenient is what happens to the offset of an absolute jump
when the address being jumped to crosses a boundary where its byte index can no
longer be represented by the same amount of bytes.
Take for example this EVM code:
0x00: push1 255
0x02: jump
0x03: stop
0x04: stop
0x05: stop
...
0xfe: stop
0xff: jumpdest
which can be represented with the following LabelledOpcode:
λ> import EVM.Opcode
λ> import EVM.Opcode.Labelled as L
λ> import EVM.Opcode.Positional as P
λ> let opcodes = [JUMP "skip"] <> replicate 252 STOP <> [JUMPDEST "skip"]
λ> fmap (fmap opcodeText . P.translate) (L.translate opcodes)
Right ["push1 255","jump","stop","stop","stop",...,"jumpdest"]
Note especially the byte size of a PUSH 255 vs. a PUSH 256:
λ> opcodeSize (PUSH 255)
2
λ> opcodeSize (PUSH 256)
3
Then add another one-byte opcode between the jump and the jumpdest:
λ> let opcodes = [JUMP "skip"] <> replicate 253 STOP <> [JUMPDEST "skip"]
λ> fmap (fmap opcodeText . P.translate) (L.translate opcodes)
Right ["push2 257","jump","stop","stop","stop",...,"jumpdest"]
Even though one byte was added, because the address of jumpdest is now
greater than 255, all references to it now take more than 2 bytes. Concretely,
one reference went from 2 bytes to 3 bytes, or rather, one JUMP "skip" became
a push2 257 instead of a push1 255. And if there were many such jumps,
this amounts to a bit of book-keeping.
This happens at subsequent boundaries as well. While this library handles each
boundary the same way, it is unlikely to have EVM bytecode of more than a few
kilobytes at present time.
λ> let opcodes = [JUMP "skip"] <> replicate 65532 STOP <> [JUMPDEST "skip"]
λ> fmap (fmap opcodeText . P.translate) (L.translate opcodes)
Right ["push3 65537","jump","stop","stop","stop",...,"jumpdest"]