# Miscellaneous notes and things to possibly do and such

(This started out as a short file of simple entries, but then I went adding more notes to individual items, and it ended up as what it is now.)

## Better node ID allocator

I’ve currently implemented the simplest possible node ID allocator: auto-increment without reuse; for this, the client need only track a u32 of the next ID, and the VM side uses an ever-growing sparse array.

I should compare it with the following more serious allocator: client and VM both track all allocated IDs in an array that will always be as large as the maximum number of IDs that have existed at any given time. The VM code will look quite like this:

◊code.ts`
let nodes: (Node | number)[] = [document];
let next_id = 1; // conceptually a u32

function free(id: number) {
    // Store the old head of the free list in the vacated slot.
    nodes[id] = next_id;
    next_id = id;
}

function push_node(node: Node): number {
    // If the free list is exhausted, grow the array by one slot
    // whose free-list pointer is the slot after it.
    if (next_id === nodes.length) nodes.push(next_id + 1);
    const id = next_id;
    next_id = nodes[id] as number;
    nodes[id] = node;
    return id;
}
`

And the client side like this:

◊code.rs`
use std::num::NonZeroU32;

type NodeId = NonZeroU32;

enum NodeSlot {
    Free { next_free_slot: NonZeroU32 },
    Full,
}

struct Vm {
    // Slot 0 is the document; handed-out IDs start at 1.
    nodes: Vec<NodeSlot>,
    next_id: NonZeroU32,
}

impl Vm {
    fn new() -> Vm {
        Vm {
            nodes: vec![NodeSlot::Full],
            next_id: NonZeroU32::new(1).unwrap(),
        }
    }

    fn free(&mut self, id: NodeId) {
        self.nodes[id.get() as usize] = NodeSlot::Free { next_free_slot: self.next_id };
        self.next_id = id;
    }

    fn allocate_node(&mut self) -> NodeId {
        if self.next_id.get() as usize == self.nodes.len() {
            self.nodes.push(NodeSlot::Free {
                next_free_slot: self.next_id.checked_add(1).unwrap(),
            });
        }
        let id = self.next_id;
        let slot = &mut self.nodes[id.get() as usize];
        self.next_id = match *slot {
            NodeSlot::Free { next_free_slot } => next_free_slot,
            NodeSlot::Full => unreachable!(),
        };
        *slot = NodeSlot::Full;
        id
    }
}
`

If I ever decide that the client side needs to materialise an actual DOM-like tree (which I’m trying to avoid, but which may become desirable for event propagation), then this would become a fairly obvious choice (though it might need some finesse to keep size down—2 billion nodes ought to be enough for anyone, I could limit the next free slot numbers to 31 bits and
use one bit as a discriminant over some kind of pointer to the actual node data). Otherwise, I honestly don’t know; the limit of only ever creating 2³² nodes in total is not too awful, and sparse arrays are probably fairly decent. Try to benchmark it plausibly. Try to keep it easy to swap in and out, too.

## Terminology

The **host** (also known as the **VM host**) is the bit of JavaScript that runs in the main browser thread and executes the VM bytecode.

The **client** (also known as the **VM client**) is the bit that writes the VM bytecode and potentially receives serialised events. In my primary intended use case, it runs in a worker and is WebAssembly, with ideally a SharedArrayBuffer or two as the channel between the client and host, but it could be written in any language and even run on a completely different machine. In fact, you could use this to implement something like Phoenix LiveView with a WebSocket as the channel—in that instance, the VM client would run on the server! (Mind you, you *would* need to be a bit careful about disconnect recovery.)

## Event handling

Yes, the first part that’s not write-only.

1. By some means, the host will learn what events the client is interested in. (Qwik’s approach is interesting here: it writes the event names of interest to the DOM, and Qwik only even bothers registering for the ones that are *on screen*, IntersectionObserver and all. I don’t intend to *start* that way, but I’m keeping it in mind and definitely not ruling it out.)
2. There will also be some means of the host synchronously determining where `preventDefault` should be invoked. (The two main approaches that I have in mind: declare a JavaScript function that takes the target element and returns a boolean; or specify a DOM attribute so that you can check ◊code.js`target.closest('[prevent-default]')`.)
3. The host will add corresponding event listeners to the rendering root (which will probably not be ◊code.js`document` or ◊code.js`document.documentElement`). (It’d be quite at liberty to add and remove them dynamically, too.)
4. When an event is fired, the host will serialise it along with the target’s node ID. (The precise serialisation may vary by application, as with the instruction bytecode—for optimal size and performance, this project is a VM factory, not a VM.)
5. The client will deserialise this and can do what it likes with it, which can, if desired, entail a total reimplementation of event dispatch except for `preventDefault`. (It actually doesn’t even need a full tree, only a mapping from node to parent (which doesn’t even take more memory, given the reallocator: limit it to ◊`2³¹ − 1` node IDs, and store the parent ◊code.rs`Option<NodeId>` in ◊code.rs`NodeSlot::Full`): trace the path to the root node, fire the capture phase while unwinding that, then trace it again for bubbling.)

(In practice, I’m not decided on the precise event dispatch model. There is some appeal to widget-based dispatch like Druid’s, where propagation is entirely left to the component.)

With the proposed node → parent mapping, the client side will still not be able to enumerate a node’s children, so a single “remove node” instruction (… huh, haven’t implemented that yet) may need to be followed by a large number of “free” instructions, because I expect that’ll still be the cheapest way of implementing it. I’ll still want a “remove all children” opcode, too, because for nodes with many children its performance is better (though I should benchmark that at length in various realistic circumstances).

## The race condition problem

It sounds like everything’s neat and sorted at this point, but there’s one rather uncomfortable problem remaining: race conditions. This architecture is asynchronous, so the host could send an event for a node that has been removed or moved.
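(As an aside, before tackling those races: the parent-chain tracing from item 5 under Event handling could be sketched roughly like this on the client side. The `HashMap<u32, Option<u32>>` parent map and the function name are illustrative assumptions, standing in for whatever `NodeSlot::Full` actually ends up storing.)

```rust
use std::collections::HashMap;

/// Walk the node → parent map from `target` up to the root, returning the
/// capture-phase order (root first) and bubble-phase order (target first).
/// Returns None if the chain is broken (an unknown or freed node), which is
/// also how a stale event for a removed node would surface.
fn propagation_paths(
    parents: &HashMap<u32, Option<u32>>,
    target: u32,
) -> Option<(Vec<u32>, Vec<u32>)> {
    let mut bubble = vec![target];
    let mut current = target;
    loop {
        match parents.get(&current)? {
            Some(parent) => {
                bubble.push(*parent);
                current = *parent;
            }
            None => break, // reached the root
        }
    }
    let capture: Vec<u32> = bubble.iter().rev().copied().collect();
    Some((capture, bubble))
}
```

Dispatch would then run listeners along `capture`, then along `bubble`, with `preventDefault` alone still handled synchronously on the host as per item 2.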
*Most* of the time, these race conditions should be harmless, because root-finding on a removed node will fail due to a parentless or freed node, or event semantics are probably equivalent on a moved node in its before and after locations. But both could be bad: if a node is removed and freed, that ID may already have been reallocated, and so the event could be fired on the wrong node altogether. The sorts of checks that would be required to catch that (some kind of DOM epoch tracking) would be onerous and usually wasteful. A move could also be bad: in a lazy-rendered scrolling component, nodes are commonly reused, so the same node ID may represent entirely different content before and after the move.

In each case, the problem is rendered extremely uncommon by the intended very low latency between client and host, and mitigated further by the unlikelihood of user events so very immediately following UI state change; but I’d find it a genuine concern for something like LiveView on high-latency connections, because that easily gives a problem window *baseline* of hundreds of milliseconds, and there I’d probably want to do event dispatch by something other than node ID, e.g. serialise any event-data DOM attributes from the event target’s lineage.

## Limitations

No dynamic `preventDefault`. No rewriting of an ◊code.css`a[href]` in the mousedown or click handler, which can occasionally be genuinely useful (and I don’t count Google’s historical use of it in search result pages), and `preventDefault()` followed by an href change followed by `click()` is not a suitable alternative because it’ll lose the effect of modifiers like ◊kbd`Ctrl`. This will be one of those “redesign your UI” cases, which will, I think, *normally* be a good thing.

One concern of doing this asynchronously is that user-activation-gated API calls might not work. has the spec details.
We should be OK now (though when they first started doing this we probably *wouldn’t* have been): it’s time-based, and although the transient activation duration is not specified and not supposed to be more than a few seconds, that should be enough for conceptually-relatively-synchronous actions even if the VM client is on the other side of the world. (In Firefox as I write this on 2021-09-23, it’s controlled by `dom.user_activation.transient.timeout`, which defaults to 5000 milliseconds.)

Might have issues calling things that are only allowed to be called in response to a user action—not certain. But per , probably OK.

## Opcodes to implement

• `Remove { node: ChildNode }`: equivalent to `node.remove()`. Does *not* free: you might want to put it back later.

## Other potential opcodes

(But they need justification, whether of functionality or performance.)

• `AppendChildren { parent: ParentNode, number_of_children: u32, children: [...Node] }`: equivalent to `parent.append(...children)`. Compare performance with a sequence of `appendChild`.

• `AppendUntrackedString { parent: ParentNode, data: string }`: equivalent to `CreateTextNode` followed by `AppendChild`, but without allocating a node ID. The host can use `parent.append(data)` instead of `parent.appendChild(document.createTextNode(data))`. Very unlikely to be warranted, but `AppendChildren` got me thinking of it; I don’t think there’d be any sane way of integrating untracked string children into `AppendChildren`.

• `RemoveAllChildren { node: ParentNode }`: removes all children; equivalent to `node.textContent = ""` or any number of ways of doing the same thing. (Doesn’t *free* them, though.)

• `SetUntrackedInnerHtml { parent: ParentNode, inner_html: string }`: sets innerHTML. The resulting child nodes will be invisible to both client and host.
At first glance this might seem to complicate event handling for the host, because now it may need to traverse the target’s parents until it finds one with a node ID; but browser extensions mean that we could never quite trust that all targets would have node IDs anyway, so making it more robust in this way is probably for the best.

This could be justified as a performance thing for what I’ll call “static trees”, meaning parts of components that don’t contain anything where node identification is required (though performance claims will naturally need to be verified). In whatever sort of component layer my own client implementations end up with, this could really start to shine if components could be identified as being static trees, and allow such components to be part of static trees, and merge them recursively—I don’t know of any framework where components are able to be zero-cost like this, even in environments that try to be at least a little bit clever about it (I’m talking DOM ones; of SSR ones there are plenty that directly write strings to a buffer, so the difference will be negligible).

Any tracked children removed by this operation would still be allocated and need freeing.

• `SetTextContent { node: Node, data: string }`: sets `textContent`. TODO: verify performance of setting `data` versus `textContent` on `CharacterData`, as they do exactly the same thing for `CharacterData`, and maybe I should remove `SetData` in favour of `SetTextContent`. If this were implemented, I’d probably also ditch plans for `RemoveAllChildren` in favour of `SetTextContent { node, data: "" }`, as saving four bytes on the instruction isn’t worth the cost to the VM size.

• `SetTrackedInnerHtml { parent: ParentNode, inner_html: string }`: sets innerHTML and then traverses the children recursively in tree order, assigning a node ID to each. (The client must naturally have done something equivalent.)
It’s *possible* that this could be faster than the corresponding sequence of `createElement`/`createTextNode`/`appendChild` calls, but I think it’d need to be quite big before it’d stand much chance of being worthwhile. As far as wire size is concerned, the bytecode is very compact, so serialised HTML would be unlikely to be smaller. I’d levy a pretty big burden of proof on this one.

• `InsertTrackedAdjacentHtml { parent: ParentNode, position: enum { BeforeBegin, AfterBegin, BeforeEnd, AfterEnd }, html: string }`: `SetTrackedInnerHtml`, but for `insertAdjacentHTML` instead of `innerHTML`. A smidgeon more complicated.

## On dictionaries and string interning

I doubt that a dynamic dictionary will be worthwhile, though I could turn out to be wrong. But a static dictionary, absolutely: I like the idea of applying profile-guided optimisation. Definitely going to need to generate the remote-dom-vm JS and client together for this.
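For a sense of what a generated static dictionary could look like on the client side, here’s a minimal sketch. The dictionary contents, the `Str` encoding, and `encode` are all made-up illustrations of the idea, not a design: the real table would be emitted by the codegen step, identically into both the host JS and the client, so an instruction can carry a one-byte index instead of an inline string.

```rust
// Hypothetical generated output: a static, profile-derived dictionary of
// common strings (tag names, attribute names, classes, …).
static DICTIONARY: &[&str] = &["div", "class", "span", "href"];

// How a string would go on the wire: a dictionary index, or inline.
enum Str<'a> {
    Interned(u8),
    Inline(&'a str),
}

fn encode(s: &str) -> Str<'_> {
    match DICTIONARY.iter().position(|&d| d == s) {
        Some(i) => Str::Interned(i as u8),
        None => Str::Inline(s),
    }
}
```

Profile-guided optimisation would then just mean choosing `DICTIONARY`’s contents from recorded traffic of the actual application.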