Abstractions for user-controlled mapping of u8 and u8 sequences/collections to Solidity types #2428

@davidsemakula

Primer

Is &[u8] a byte slice or an 8-bit unsigned integer slice? What about Vec<u8>?

In Rust/ink!, there isn't any semantic difference.
This can be seen in various <from/to/into/as>_bytes conversions (including from the standard library) that either take or return some u8 sequence/collection (e.g. String::as_bytes).

NOTE: Newtype wrappers are commonly used to add higher-level semantics to sequences/collections of bytes/u8s, but fundamentally, these are zero-cost abstractions that are cheap/free to convert back into the underlying sequences/collections (e.g. struct Bytes(pub Vec<u8>)).
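As a rough illustration of the point above, the sketch below passes the result of String::as_bytes to a function that treats its input as small integers, and then wraps the same data in a newtype. The Bytes struct mirrors the example in the note; sum_as_integers is a hypothetical helper introduced only for this illustration.

```rust
use std::ops::Deref;

/// Newtype wrapper over a byte vector (mirrors `struct Bytes(pub Vec<u8>)` above).
#[repr(transparent)]
pub struct Bytes(pub Vec<u8>);

impl Deref for Bytes {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        &self.0
    }
}

/// Hypothetical helper: treats its input as 8-bit unsigned integers and sums them.
fn sum_as_integers(values: &[u8]) -> u32 {
    values.iter().map(|&v| u32::from(v)).sum()
}

fn main() {
    let text = String::from("ink!");

    // `String::as_bytes` returns `&[u8]`, the same type used for a slice of small integers.
    let raw: &[u8] = text.as_bytes();
    assert_eq!(raw, &[105u8, 110, 107, 33][..]);
    assert_eq!(sum_as_integers(raw), 355);

    // Wrapping adds a name, not a different representation.
    let wrapped = Bytes(raw.to_vec());
    assert_eq!(sum_as_integers(&wrapped), 355);
    assert_eq!(
        std::mem::size_of::<Bytes>(),
        std::mem::size_of::<Vec<u8>>()
    );

    // Unwrapping moves the inner vector out without copying its contents.
    let inner: Vec<u8> = wrapped.0;
    assert_eq!(inner.len(), 4);
}
```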

However, unlike Rust/ink!, Solidity has primitives for both 8-bit unsigned integers (i.e. uint8), and byte sequences (i.e. bytes and bytes1, bytes2 ... bytes32).
As such, there are meaningful differences between how bytes (and bytes<N>) sequences, and uint8 sequences (i.e. fixed-size and dynamic uint8 arrays) are represented/encoded in calldata (i.e. Solidity ABI encoding), but also in memory.

One meaningful difference for us is that in Solidity ABI encoding, bytes (i.e. dynamic) and bytes<N> (i.e. fixed-sized) arrays are packed, while uint8[] (i.e. dynamic) and uint8[N] (i.e. fixed-sized) are not.
As a concrete example, bytes32 is encoded/packed into a 32 byte sequence (i.e. a single word) in Solidity calldata (and memory), while uint8[32] is encoded into a 1024 byte sequence (i.e. 32 words, with 32 bytes used for each element).
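The size difference can be checked with a small hand-rolled encoder. This is only a sketch of the two layouts described above, not code from ink! or any ABI library; encode_bytes32 and encode_uint8_array32 are hypothetical names.

```rust
/// Size of one Solidity ABI word in bytes.
const WORD: usize = 32;

/// Hypothetical encoder for `bytes32`: the 32 bytes fill exactly one word.
fn encode_bytes32(value: &[u8; 32]) -> Vec<u8> {
    value.to_vec()
}

/// Hypothetical encoder for `uint8[32]`: every element becomes its own word,
/// left-padded with zeros (big-endian), and the words are concatenated.
fn encode_uint8_array32(value: &[u8; 32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(WORD * value.len());
    for &element in value {
        let mut word = [0u8; WORD];
        word[WORD - 1] = element;
        out.extend_from_slice(&word);
    }
    out
}

fn main() {
    let data = [0xABu8; 32];

    let as_bytes32 = encode_bytes32(&data);
    let as_uint8_array = encode_uint8_array32(&data);

    // One word versus 32 words.
    assert_eq!(as_bytes32.len(), 32);
    assert_eq!(as_uint8_array.len(), 1024);

    // In the `uint8[32]` layout, the first word is 31 zero bytes followed by the element.
    assert!(as_uint8_array[..31].iter().all(|&b| b == 0));
    assert_eq!(as_uint8_array[31], 0xAB);
}
```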

NOTE: abi.encodePacked() is not part of the Solidity ABI spec, so it's not a transparent optimization (i.e. interacting contracts have to be aware of its usage).

Takeaways

  • For Rust/ink!, byte and 8-bit unsigned integer sequences are semantically equivalent
  • Representation/encoding differences only matter at the Solidity ABI/interoperability boundary

Goals

  • Because byte and 8-bit unsigned integer sequences are semantically equivalent in Rust/ink!, any abstractions over them should be zero-cost (i.e. it should be cheap/free to convert between any two semantically equivalent representations)
  • It should be possible for ink! smart contract authors to keep abstractions at the interface boundary (i.e. only deal with them close to the signature of an ink! message and mostly ignore them in the body)

Design

The default mappings will be as follows:

  • u8 is mapped to uint8
  • u8 sequences/collections are mapped to equivalent Solidity fixed-size arrays (i.e. uint8[N] where N is the array size) and dynamic arrays (i.e. uint8[])
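One way to picture these default mappings is a trait that reports the Solidity type name for a Rust type. The SolTypeName trait below is a hypothetical stand-in invented for this sketch; it is not the trait ink! uses.

```rust
/// Hypothetical trait: reports the Solidity type a Rust type maps to by default.
trait SolTypeName {
    fn sol_type_name() -> String;
}

impl SolTypeName for u8 {
    fn sol_type_name() -> String {
        "uint8".to_string()
    }
}

/// Fixed-size arrays map to Solidity fixed-size arrays, e.g. `uint8[N]`.
impl<T: SolTypeName, const N: usize> SolTypeName for [T; N] {
    fn sol_type_name() -> String {
        format!("{}[{}]", T::sol_type_name(), N)
    }
}

/// `Vec<T>` maps to a Solidity dynamic array, e.g. `uint8[]`.
impl<T: SolTypeName> SolTypeName for Vec<T> {
    fn sol_type_name() -> String {
        format!("{}[]", T::sol_type_name())
    }
}

fn main() {
    assert_eq!(<u8 as SolTypeName>::sol_type_name(), "uint8");
    assert_eq!(<[u8; 32] as SolTypeName>::sol_type_name(), "uint8[32]");
    assert_eq!(<Vec<u8> as SolTypeName>::sol_type_name(), "uint8[]");
}
```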

We then introduce a newtype wrapper AsBytes<T> that:

  • Can only be applied to u8 sequences/collections with equivalent Solidity byte types (enforced by a sealed trait bound on T)
  • Encapsulates logic for encoding/decoding u8 sequences/collections as their Solidity bytes types (e.g. AsBytes<[u8; 32]> is mapped to bytes32)
  • Implements core/standard traits for cheaply/freely using/passing the wrapper type in place of the underlying type (e.g. Deref, Borrow, AsRef, etc.)

This roughly translates to:

ink! <=> Solidity
AsBytes<[u8; 1]> == bytes1
AsBytes<[u8; 2]> == bytes2
...
AsBytes<[u8; 32]> == bytes32
AsBytes<Vec<u8>> == bytes
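A minimal sketch of how such a wrapper could be shaped, assuming a sealed trait restricts T to [u8; 1] through [u8; 32] and Vec<u8>. The ByteSequence trait, the sealed module, the impl_fixed_bytes macro and the sol_type_name method are all hypothetical names for this illustration, and the actual encoding/decoding logic is left out.

```rust
use std::ops::Deref;

mod sealed {
    /// Private marker trait: types outside this file cannot implement it.
    pub trait Sealed {}
}

/// Hypothetical trait for `u8` sequences that have a Solidity bytes equivalent.
pub trait ByteSequence: sealed::Sealed {
    fn sol_type_name() -> String;
}

/// Implements the traits for each listed array length (stable Rust has no
/// direct way to bound a const generic to a range, hence the macro).
macro_rules! impl_fixed_bytes {
    ($($n:literal),*) => {$(
        impl sealed::Sealed for [u8; $n] {}
        impl ByteSequence for [u8; $n] {
            fn sol_type_name() -> String {
                format!("bytes{}", $n)
            }
        }
    )*};
}

impl_fixed_bytes!(
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
    17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32
);

impl sealed::Sealed for Vec<u8> {}
impl ByteSequence for Vec<u8> {
    fn sol_type_name() -> String {
        "bytes".to_string()
    }
}

/// Sketch of the wrapper: same layout as `T`, different Solidity mapping.
#[derive(Debug, Clone, PartialEq, Eq)]
#[repr(transparent)]
pub struct AsBytes<T: ByteSequence>(pub T);

impl<T: ByteSequence> AsBytes<T> {
    pub fn sol_type_name() -> String {
        T::sol_type_name()
    }
}

impl<T: ByteSequence> Deref for AsBytes<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

impl<T: ByteSequence> AsRef<T> for AsBytes<T> {
    fn as_ref(&self) -> &T {
        &self.0
    }
}

impl<T: ByteSequence> From<T> for AsBytes<T> {
    fn from(value: T) -> Self {
        AsBytes(value)
    }
}

fn main() {
    assert_eq!(AsBytes::<[u8; 1]>::sol_type_name(), "bytes1");
    assert_eq!(AsBytes::<[u8; 32]>::sol_type_name(), "bytes32");
    assert_eq!(AsBytes::<Vec<u8>>::sol_type_name(), "bytes");

    // `AsBytes<[u8; 33]>` would not compile: `[u8; 33]` is not a `ByteSequence`.
    let wrapped: AsBytes<Vec<u8>> = vec![1u8, 2, 3].into();
    assert_eq!(wrapped.len(), 3); // `Vec` methods are reachable through `Deref`
}
```

The macro is used here only because a range bound on N is not expressible with stable const generics; a real implementation may take a different route.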

This allows ink! developers to largely keep the AsBytes<T> wrapper at the interface boundary (e.g. an ink! message with AsBytes<[u8; 32]> input and output types can immediately deref the input to [u8; 32], deal with only the underlying type in the function body, and then cheaply wrap the result back up into AsBytes<[u8; 32]> at the return site).
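The boundary pattern might look roughly like the following. To stay self-contained, the sketch uses a simplified AsBytes without the sealed trait bound and a plain function in place of an ink! message; flip and invert are hypothetical names.

```rust
use std::ops::Deref;

/// Simplified stand-in for the proposed wrapper (no sealed trait bound here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
struct AsBytes<T>(T);

impl<T> Deref for AsBytes<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

/// Internal logic: knows nothing about the wrapper, only about `[u8; 32]`.
fn invert(data: &[u8; 32]) -> [u8; 32] {
    let mut out = *data;
    for byte in out.iter_mut() {
        *byte = !*byte;
    }
    out
}

/// Stand-in for an ink! message: the wrapper appears only in the signature.
fn flip(input: AsBytes<[u8; 32]>) -> AsBytes<[u8; 32]> {
    // Deref once at the top of the body...
    let raw: [u8; 32] = *input;
    // ...work with the plain array, then wrap the result on the way out.
    AsBytes(invert(&raw))
}

fn main() {
    let result = flip(AsBytes([0x0F; 32]));
    assert_eq!(result, AsBytes([0xF0; 32]));
}
```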

Follow ups/Updates

Alternatives

The following alternatives were considered and rejected/abandoned:

  • Introducing a Byte type as a semantically distinct u8 equivalent was abandoned because conversions from Byte sequences/collections to u8 equivalents would be expensive
  • Mapping u8 to uint8, and u8 sequences/collections to bytes and bytesN equivalents by default, and then providing two wrappers (e.g. AsBytes<T> and AsInt<T>) to override the preferred representation/encoding was abandoned due to:
    • Perceived higher user-level cognitive load (e.g. u8 would map to uint8 by default, while [u8; N] for 1 <= N <= 32 would map to bytes<N> by default, and [u8; N] for N > 32 would always map to uint8[N])
    • Generic implementation complexity due to Rust's limited support for specialization

Metadata

Labels

  • B-design: Designing a new component, interface or functionality.
  • E-in-progress: A task that is already being worked on.

Projects

Status

Done
