# remote-dom-vm bytecode format description

    **Précis:**
        write-only instruction set;
        single byte opcodes;
        variable-width instructions;
        big-endian;
        UTF-8 strings prefixed by 32-bit length.

## Types

    ### u32 (unsigned 32-bit integers)

        Four bytes in big-endian order.

            ◊code`
              0   1   2   3  
            ├───┴───┴───┴───┤
            │ number        │
            └───────────────┘
            `

        As an example, the bytes \[0x12, 0x34, 0x56, 0x78\] represent the number 0x12345678 (305,419,896₁₀).
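
        A minimal sketch of reading and writing this type in Rust,
        using the standard library’s `u32::from_be_bytes` and `u32::to_be_bytes`.
        The function names `read_u32` and `write_u32` are illustrative only,
        not taken from any existing implementation.

            ```rust
            /// Reads a big-endian u32 from the first four bytes of `input`.
            /// Returns `None` if fewer than four bytes are available.
            /// (Hypothetical helper, not part of any existing implementation.)
            fn read_u32(input: &[u8]) -> Option<u32> {
                match input {
                    [a, b, c, d, ..] => Some(u32::from_be_bytes([*a, *b, *c, *d])),
                    _ => None,
                }
            }

            /// Appends `value` to `out` as four bytes, most significant byte first.
            fn write_u32(out: &mut Vec<u8>, value: u32) {
                out.extend_from_slice(&value.to_be_bytes());
            }
            ```

        With these, `read_u32(&[0x12, 0x34, 0x56, 0x78])` yields `Some(0x12345678)`.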

    ### String

            ◊code`
              0   1   2   3   ...
            ├───┴───┴───┴───┼───────────────────
            │ length: u32   │ ... data
            └───────────────┴───────────────────
            `

        A string is represented as its length in UTF-8 code units (bytes) as a u32,
        followed by that many UTF-8 code units.

        Although JavaScript strings allow unpaired surrogates,
        this layer only allows well-formed Unicode,
        for performance, simplicity and brevity of implementation.

        As a 32-bit unsigned integer, a string’s length lies in the range ◊`[0, 2³²)`.
        It is not possible to represent strings greater than 4,294,967,295 UTF-8 code units long.
        If you feel a pressing need to put strings even one *hundredth* of this size into the DOM, you should rethink things.
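
        The layout above could be written and read roughly as follows.
        This is a sketch under the rules stated in this section;
        `write_string` and `read_string` are hypothetical names,
        and the choice to return `None` on bad input is an assumption of this example,
        not documented behaviour.

            ```rust
            /// Appends `s` as a big-endian u32 byte length followed by its UTF-8 bytes.
            /// Returns `None` (writing nothing) if the length does not fit in a u32.
            fn write_string(out: &mut Vec<u8>, s: &str) -> Option<()> {
                if s.len() > u32::MAX as usize {
                    return None;
                }
                out.extend_from_slice(&(s.len() as u32).to_be_bytes());
                out.extend_from_slice(s.as_bytes());
                Some(())
            }

            /// Reads a length-prefixed string from the front of `input`.
            /// Returns the string and the remaining bytes,
            /// or `None` if the input is truncated or is not valid UTF-8.
            fn read_string(input: &[u8]) -> Option<(&str, &[u8])> {
                if input.len() < 4 {
                    return None;
                }
                let (prefix, rest) = input.split_at(4);
                let len = u32::from_be_bytes([prefix[0], prefix[1], prefix[2], prefix[3]]) as usize;
                if rest.len() < len {
                    return None;
                }
                let (data, rest) = rest.split_at(len);
                let s = std::str::from_utf8(data).ok()?;
                Some((s, rest))
            }
            ```

        Note that the length counts bytes, not characters:
        the one-character string `é` is written with a length of 2.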

    ### NodeId

        DOM nodes (regardless of subtype, whether document, element, text or comment) are identified by an unsigned 32-bit integer.
        NodeId 0 is currently reserved for the document being operated upon.
        (This may change in the future, so that a single VM host can work with multiple documents.)
        Certain instructions (opcodes 0–4) reserve a new NodeId.
        The client and host agree at compile time which of the following NodeId allocation schemes to use.
        Both can then calculate the next NodeId independently,
        which is what allows the VM to be write-only.

## NodeId allocation schemes

    ### Simplistic

        Each new NodeId is one greater than the previous NodeId,
        with no reuse of past NodeIds.

          • Excellent for static or almost-static pages,
            having the lowest baseline memory usage and highest performance.

          • Cannot create more than 2³² nodes in total.
            In practice, this is not a serious concern:
            if you create ten thousand nodes per second,
            which would suggest you were doing something wildly wrong,
            it still takes about five days to exhaust this.
            (The memory usage described in the next point will crash your page long before then.)

          • Memory usage is likely to be proportional to the number of nodes ever created,
            so heavily dynamic pages will use more memory than they should.
            (On the VM host side, the node map will become a sparse array,
            and the JavaScript engine will probably optimise it as such,
            leaving memory usage proportional to the number of current nodes.
            But on the client side, this is unlikely to be the case,
            the node map probably being something like ◊code.rs`Vec<Option<Node>>`,
            and so it’ll probably be using at least four bytes per node ever allocated.
            This, incidentally, will cause an out-of-memory crash much earlier,
            probably by no more than a billion nodes.)
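
        A minimal sketch of this scheme, assuming the counter starts after NodeId 0 (the document).
        The type `SimplisticAllocator` and the helper `days_to_exhaust` are hypothetical names for this example;
        the latter just reproduces the five-day estimate above.

            ```rust
            /// Hypothetical allocator: hands out 1, 2, 3, … and never reuses a NodeId.
            struct SimplisticAllocator {
                /// The most recently allocated NodeId; starts at 0, the document.
                last: u32,
            }

            impl SimplisticAllocator {
                fn new() -> Self {
                    SimplisticAllocator { last: 0 }
                }

                /// Returns the next NodeId, or `None` once the u32 space is exhausted.
                /// (Reporting exhaustion this way is an assumption of this sketch.)
                fn allocate(&mut self) -> Option<u32> {
                    let id = self.last.checked_add(1)?;
                    self.last = id;
                    Some(id)
                }
            }

            /// Days needed to create 2³² nodes at a steady rate.
            fn days_to_exhaust(nodes_per_second: u32) -> f64 {
                4_294_967_296.0 / f64::from(nodes_per_second) / 86_400.0
            }
            ```

        At ten thousand nodes per second, `days_to_exhaust(10_000)` comes to a little under five days.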

    ### Compact

        NodeIds are reused in most-recently-freed order,
        and if there are no freed NodeIds remaining then a new NodeId is minted,
        one higher than the previous highest.

        (The implementation of this is straightforward,
        but a prose description cumbersome;
        look at the code if you want more.
        It involves leaving tombstones and swapping next NodeIds.)

          • Somewhat higher baseline memory per node,
            and probably very slightly lower performance.

          • Memory usage is proportional to the largest number of nodes that existed at one time.
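
        One way the tombstone approach might look is sketched below.
        This is a guess at the general shape based on the description above, not the actual code:
        `CompactAllocator` and `Slot` are hypothetical names,
        and a real node map would store node data in the live slots.

            ```rust
            /// Hypothetical slot: a live node, or a tombstone recording
            /// which NodeId to hand out after this one has been reused.
            enum Slot {
                Live,
                Tombstone { next: u32 },
            }

            struct CompactAllocator {
                slots: Vec<Slot>,
                /// The NodeId the next allocation will return.
                next: u32,
            }

            impl CompactAllocator {
                fn new() -> Self {
                    // Slot 0 is the document, so the first allocation returns 1.
                    CompactAllocator { slots: vec![Slot::Live], next: 1 }
                }

                fn allocate(&mut self) -> u32 {
                    let id = self.next;
                    let index = id as usize;
                    if index == self.slots.len() {
                        // No freed NodeIds remain: mint one higher than the previous highest.
                        self.slots.push(Slot::Live);
                        self.next = id + 1;
                    } else {
                        // Reuse the most recently freed NodeId and follow its link.
                        match std::mem::replace(&mut self.slots[index], Slot::Live) {
                            Slot::Tombstone { next } => self.next = next,
                            Slot::Live => panic!("next NodeId points at a live slot"),
                        }
                    }
                    id
                }

                /// Leaves a tombstone holding the old `next`, then swaps `next` to `id`.
                /// Assumes `id` is currently live; freeing twice would corrupt the chain.
                fn free(&mut self, id: u32) {
                    self.slots[id as usize] = Slot::Tombstone { next: self.next };
                    self.next = id;
                }
            }
            ```

        After allocating 1, 2 and 3 and then freeing 2 followed by 1,
        the next three allocations return 1, 2 and 4.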

## Instructions

    The first byte of an instruction is the opcode.
    After that, instruction widths vary.
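
    As an illustration, the opcodes listed in this section could be modelled as a Rust enum.
    The enum and `from_byte` are hypothetical;
    returning `None` for an unknown byte is a choice made for this sketch,
    whereas current implementations leave that case undefined.

        ```rust
        /// Opcodes as numbered in this document (illustrative enum).
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        #[repr(u8)]
        enum Opcode {
            CreateElement = 0,
            CreateSvgElement = 1,
            CreateTextNode = 2,
            CreateComment = 3,
            CreateDocumentFragment = 4,
            SetData = 5,
            SetAttribute = 6,
            RemoveAttribute = 7,
            AppendChild = 8,
            InsertBefore = 9,
            Free = 10,
        }

        impl Opcode {
            /// Maps the first byte of an instruction to its opcode, if known.
            fn from_byte(byte: u8) -> Option<Opcode> {
                Some(match byte {
                    0 => Opcode::CreateElement,
                    1 => Opcode::CreateSvgElement,
                    2 => Opcode::CreateTextNode,
                    3 => Opcode::CreateComment,
                    4 => Opcode::CreateDocumentFragment,
                    5 => Opcode::SetData,
                    6 => Opcode::SetAttribute,
                    7 => Opcode::RemoveAttribute,
                    8 => Opcode::AppendChild,
                    9 => Opcode::InsertBefore,
                    10 => Opcode::Free,
                    _ => return None,
                })
            }

            /// Whether this instruction reserves a new NodeId (opcodes 0–4).
            fn allocates_node_id(self) -> bool {
                (self as u8) <= 4
            }
        }
        ```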

    ### 0: CreateElement

          • document (NodeId of a Document in JavaScript-land)
          • tag_name (string)

        Allocates a new NodeId, corresponding to an HTMLElement in JavaScript-land.
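
        A sketch of how a client might encode this instruction.
        It assumes the operands follow the opcode in the order listed
        and that a NodeId is written as a u32;
        `encode_create_element` is a hypothetical name.

            ```rust
            /// Encodes CreateElement: opcode 0, document NodeId, then tag_name as a string.
            /// Assumes operands are written in the order listed and NodeIds as big-endian u32.
            /// Also assumes `tag_name` is shorter than 2³² bytes.
            fn encode_create_element(document: u32, tag_name: &str) -> Vec<u8> {
                let mut out = vec![0u8];
                out.extend_from_slice(&document.to_be_bytes());
                out.extend_from_slice(&(tag_name.len() as u32).to_be_bytes());
                out.extend_from_slice(tag_name.as_bytes());
                out
            }
            ```

        Under those assumptions, creating a `div` in document 0 takes twelve bytes:
        one for the opcode, four for the NodeId, four for the length and three for the tag name.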

    ### 1: CreateSvgElement

          • document (NodeId of a Document in JavaScript-land)
          • tag_name (string)

        Allocates a new NodeId, corresponding to an SVGElement in JavaScript-land.

    ### 2: CreateTextNode

          • document (NodeId of a Document in JavaScript-land)
          • data (string)

        Allocates a new NodeId, corresponding to a Text in JavaScript-land.

    ### 3: CreateComment

          • document (NodeId of a Document in JavaScript-land)
          • data (string)

        Allocates a new NodeId, corresponding to a Comment in JavaScript-land.

    ### 4: CreateDocumentFragment

          • document (NodeId of a Document in JavaScript-land)

        Allocates a new NodeId, corresponding to a DocumentFragment in JavaScript-land.

    ### 5: SetData

          • node (NodeId of a CharacterData in JavaScript-land, meaning a Text or a Comment)
          • data (string)

    ### 6: SetAttribute

          • element (NodeId of an Element in JavaScript-land)
          • name (string)
          • value (string)
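
        With two string operands, this instruction shows why widths vary.
        A sketch, assuming operands are written in the order listed and NodeIds as u32;
        both function names are hypothetical.

            ```rust
            /// Appends a length-prefixed UTF-8 string (assumes fewer than 2³² bytes).
            fn push_string(out: &mut Vec<u8>, s: &str) {
                out.extend_from_slice(&(s.len() as u32).to_be_bytes());
                out.extend_from_slice(s.as_bytes());
            }

            /// Encodes SetAttribute: opcode 6, element NodeId, name, value.
            /// The operand order on the wire is an assumption of this sketch.
            fn encode_set_attribute(element: u32, name: &str, value: &str) -> Vec<u8> {
                let mut out = vec![6u8];
                out.extend_from_slice(&element.to_be_bytes());
                push_string(&mut out, name);
                push_string(&mut out, value);
                out
            }
            ```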

    ### 7: RemoveAttribute

          • element (NodeId of an Element in JavaScript-land)
          • name (string)

    ### 8: AppendChild

          • parent (NodeId of an Element in JavaScript-land)
          • new_child (NodeId, typically of a DocumentFragment, Element, Text or Comment)

    ### 9: InsertBefore

          • parent (NodeId of an Element in JavaScript-land)
          • reference_child (NodeId, typically of an Element, Text or Comment)
          • new_child (NodeId, typically of a DocumentFragment, Element, Text or Comment)
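
        Because all three operands are NodeIds, this instruction has a fixed width.
        A sketch, assuming operands are written in the order listed and NodeIds as u32;
        `encode_insert_before` is a hypothetical name.

            ```rust
            /// Encodes InsertBefore: opcode 9 followed by three NodeIds, 13 bytes in total.
            /// The operand order on the wire is an assumption of this sketch.
            fn encode_insert_before(parent: u32, reference_child: u32, new_child: u32) -> [u8; 13] {
                let mut out = [0u8; 13];
                out[0] = 9;
                out[1..5].copy_from_slice(&parent.to_be_bytes());
                out[5..9].copy_from_slice(&reference_child.to_be_bytes());
                out[9..13].copy_from_slice(&new_child.to_be_bytes());
                out
            }
            ```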

    ### 10: Free

          • node (NodeId of any Node)
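
        Putting several instructions together, the sketch below builds the bytecode for `<p>hi</p>`
        without ever reading anything back from the host.
        It assumes the Simplistic allocation scheme,
        operands written in the order listed, and NodeIds written as u32;
        the `Writer` type and its methods are hypothetical.

            ```rust
            /// Hypothetical client-side writer using the Simplistic scheme.
            struct Writer {
                bytes: Vec<u8>,
                /// Most recently allocated NodeId; 0 is the document.
                last_id: u32,
            }

            impl Writer {
                fn new() -> Self {
                    Writer { bytes: Vec::new(), last_id: 0 }
                }

                fn put_u32(&mut self, value: u32) {
                    self.bytes.extend_from_slice(&value.to_be_bytes());
                }

                /// Assumes `s` is shorter than 2³² bytes.
                fn put_string(&mut self, s: &str) {
                    self.put_u32(s.len() as u32);
                    self.bytes.extend_from_slice(s.as_bytes());
                }

                /// Predicts the NodeId the host will assign to the node just created.
                fn next_id(&mut self) -> u32 {
                    self.last_id += 1;
                    self.last_id
                }

                fn create_element(&mut self, document: u32, tag_name: &str) -> u32 {
                    self.bytes.push(0); // opcode 0: CreateElement
                    self.put_u32(document);
                    self.put_string(tag_name);
                    self.next_id()
                }

                fn create_text_node(&mut self, document: u32, data: &str) -> u32 {
                    self.bytes.push(2); // opcode 2: CreateTextNode
                    self.put_u32(document);
                    self.put_string(data);
                    self.next_id()
                }

                fn append_child(&mut self, parent: u32, new_child: u32) {
                    self.bytes.push(8); // opcode 8: AppendChild
                    self.put_u32(parent);
                    self.put_u32(new_child);
                }
            }
            ```

        Calling `create_element(0, "p")`, then `create_text_node(0, "hi")`, then `append_child(1, 2)`
        leaves 30 bytes in `bytes`, with the element predicted as NodeId 1 and the text node as NodeId 2.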

## Error handling

    Current implementations assume correct usage and are not robust against incorrect input.
    Behaviour in the presence of incorrect input is undefined.

    Client and VM implementations may coordinate on which instructions to support.
    For example, in an app that never creates any SVG,
    the code implementing opcode 1 (CreateSvgElement) may be removed;
    using that opcode would then produce undefined behaviour.