"How can decentralized apps work well even with 5-10 second blockchain latency?"
This was a question posed by Vitalik Buterin to the community recently, and it hits upon a topic I'd been thinking about for a while now too. The Ethereum blockchain has been described as a global supercomputer, but purely in terms of hardware specs, this computer's not all that impressive (slow clock speed, expensive storage, etc.). That leads to a problem: if an application submits a command and then makes the user wait for confirmation, they may be staring at a loading screen for ten or more seconds, which is not how users want to interact with an application.
The Web 2.0 standard
The problem I see is one of expectations built up by the user experience of web applications. We've arrived at a point where "Web 2.0" applications have a front end with dynamic logic (Javascript, mostly) that runs in the visitor's own browser and periodically makes network calls to some web endpoint that saves and restores data. We've come to expect computers to be snappy enough that a call to the back-end service takes less than a second for most actions. Web UI toolkits and best practices have embraced that, so the user workflow for saving data (take adding a to-do item to a list as an example) typically looks something like this (sketched in code after the list):
- User types the description of the to-do into a text box and clicks "Add"
- The Add button itself becomes disabled (can't click it again), as well as the text box for description (can't change its contents)
- A "Loading..." message/animation pops up
- A call is made to the back-end to add the record
- The response is received from the back-end
- If successful, the UI adds the new to-do item to the rendered page
- The description field and Add button are re-enabled and the "Loading..." indicator hidden
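As a concrete reference point, a minimal sketch of that flow in plain browser Javascript might look like this (the `/api/todos` endpoint and the element IDs are hypothetical):

```javascript
// Hypothetical "Web 2.0" save flow: the UI blocks until the back-end confirms.
const addButton = document.getElementById('add-button');
const descInput = document.getElementById('todo-description');
const loading = document.getElementById('loading-indicator');
const list = document.getElementById('todo-list');

addButton.addEventListener('click', async () => {
  // Disable the controls and show the loading indicator
  addButton.disabled = true;
  descInput.disabled = true;
  loading.style.display = 'block';
  try {
    // Call the (hypothetical) back-end endpoint and wait for its response
    const response = await fetch('/api/todos', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ description: descInput.value }),
    });
    if (response.ok) {
      // Only render the new item once the back-end confirms it was stored
      const item = document.createElement('li');
      item.textContent = descInput.value;
      list.appendChild(item);
      descInput.value = '';
    }
  } finally {
    // Re-enable the controls and hide the indicator
    addButton.disabled = false;
    descInput.disabled = false;
    loading.style.display = 'none';
  }
});
```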
With that sort of interaction, the front-end view of the data is always in step with reality. The to-do item doesn't get added to the page until the back-end responds that it has successfully stored the new record, so we can be certain the new "reality" includes that item. Because the saving process takes a short time, this works from a user perspective. But if the back-end is saving to a blockchain, the "response is received from the back-end" step could take 10 seconds or more, and that feels really laggy to the user. Aside from shrinking that lag, there are a few things that can be done to alter users' expectations when dealing with a blockchain-backed app, some of which existing frameworks already use.
Non-immediate interactions
Mobile games
Mobile games that follow the "city builder" or "war game" or "build up a tribe" style of gameplay have many interactions that a user may fire off in rapid succession (harvesting from all their fields of crops one after the other, collecting rent from all their buildings by tapping each one, etc.). It would be infeasible for the game to call the back-end servers that keep the game state for every single action: even a one-second wait between actions would feel slow to users, and with thousands of users interacting at once, the back-end servers would need to handle a flood of requests, and all that traffic would chew up bandwidth for clients (who are likely on a phone data plan and wouldn't appreciate it).
So, instead, those sorts of games typically keep a journal/log of actions locally on the phone and only periodically sync it back to the server. You may have experienced the impact of this if your network connection dropped in the middle of playing and you had to quit and re-launch the game, only to find your last few actions had been undone (the items in the phone's local log never made it back to the server to become "reality" for the next time the application fetched the game state). This keeps the game interface snappy, and as long as the back-end transactions catch up eventually (possibly syncing while your phone is in standby after you're done actively playing), the experience works.
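A stripped-down version of that local-log pattern might look something like this (the sync endpoint, batch interval, and action shape are all invented for illustration):

```javascript
// Hypothetical local action log: apply actions to the UI immediately,
// then sync them to the server in periodic batches.
const pendingActions = [];

function applyToLocalGameState(action) {
  // Update the local game state and UI right away (app-specific)
}

function recordAction(action) {
  applyToLocalGameState(action); // instant feedback for the player
  pendingActions.push(action);   // remember it for the next sync
}

// Periodically flush the log; anything still in pendingActions when the
// app dies is exactly the part that gets "undone" on the next launch.
setInterval(async () => {
  if (pendingActions.length === 0) return;
  const batch = pendingActions.splice(0, pendingActions.length);
  try {
    await fetch('/api/sync', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(batch),
    });
  } catch (err) {
    // Network failed: put the batch back so it's retried next interval
    pendingActions.unshift(...batch);
  }
}, 30000);
```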
Meteor
The Meteor development platform takes the idea of an application written in Javascript and runs it both in the client's browser (which runs Javascript natively) and on the server (using Node). The developer writes the logic once, and the Meteor compiler splits it into client and server parts. Then, when a user interacts with the application, the client side is able to make a pretty good guess at what will happen as a result of a given action (adding a to-do item to a list will cause it to show up in the rendered page) and performs the action before the back-end responds. If the back-end responds with something the front end didn't expect, the front end's actions are undone and the back-end's logic wins.
This sort of implementation makes for a very snappy front-end where the user never needs to see a "Loading..." indicator. But if something goes wrong, it can be a jarring experience: items appear or disappear unexpectedly (e.g. a to-do item is added, shows in the list for three seconds, and then suddenly disappears), which causes confusion the developer needs to defensively code against (providing some sort of notification). For the most part, though, it makes for a very performant-feeling application.
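In Meteor terms, the to-do example looks roughly like the sketch below (a simplified rendition of Meteor's classic API): the same method body runs once as a client-side stub (the optimistic guess) and once on the server (the authority), and the client's guess is rolled back if the server disagrees.

```javascript
// Meteor-style latency compensation (simplified, classic API): the
// method body below runs twice -- once as a client simulation that
// updates the local cache immediately, and once on the server for real.
import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';

export const Todos = new Mongo.Collection('todos');

Meteor.methods({
  addTodo(description) {
    // Client: inserts into the local minimongo cache, so the UI updates
    // instantly. Server: performs the authoritative insert. If the two
    // results differ, Meteor reverts the client's version.
    Todos.insert({ description, createdAt: new Date() });
  },
});

// In UI code, the call returns immediately from the user's perspective:
// Meteor.call('addTodo', 'Buy milk');
```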
Desktop email clients
One of the older pieces of the Internet technology ecosystem, these applications carry with them the expectation that they are a local store of something that exists in a remote place. Email clients typically offer configuration parameters for whether they should auto-fetch from the server (and how frequently) or just wait for manual sync actions from the user. Most have a status bar showing the current state of the sync with the remote server (offline, actively sending/receiving, idle, and some indicator of how long it has been in that state).
The expectation is well established among email users that pressing the "Send" button on an email window doesn't lock the whole UI until the recipient gets the message. The message first goes to an "Outbox", where it might sit for a while (if the network connection is down), and once it is successfully delivered, it gets moved to a "Sent" folder.
Emerging as Web 3.0
All these examples have in common the concept of a local journal/log/queue of items, and a process to reconcile those items with a remote listing. Meteor is a very slick framework that handles the sync portion for the developer, but the front and back ends both need to be written in Javascript, which means a centralized back-end (running Node). The goal for DApps in the Web 3.0 ecosystem is to have the blockchain be the back-end, with no centralized dependency, so that framework as-is isn't ideal.
The `web3js` toolchain exists as a communication tool for interacting with Ethereum nodes, either remotely or as part of a browser extension (e.g. Metamask), but it doesn't have any built-in options for creating queues or delayed batches of transactions. What I think the ecosystem needs is a modular front-end library that takes the `web3js` mechanism as the "action" part of the process and puts a job-queueing and status interface in front of it: essentially an "outbox" (as an email client has), with options to query the length of the queue, read individual items off it, edit and delete jobs that are waiting to go out, and trigger the processing of the queue. Jobs in the queue would need to be fully ready-to-go actions (e.g. already-signed transactions), so they could operate completely independently of a human sitting in front of them. The ability to edit/replace items in the queue would allow for interactions with applications like counterfactual state channels (where only the most-recent state-update transaction needs to be held) by updating the queued item with the newest state. That much of the concept could be implemented now, using existing technologies, and start getting users to think of their data/transactions more like "an email outbox".
But, in addition to just being a queue, the list of pending transactions needs to act as an extension to the real Ethereum blockchain state. For example, calling an ERC20 contract's `balanceOf(_who)` function should get the current balance of the address passed (`_who`). Querying the state as of the most recent block is accurate only as long as there's nothing in the local queue that could change that value. Knowing the typical interactions of an ERC20 token, it's the `transfer` and `transferFrom` functions that would change the result of `balanceOf` calls, though a token may have custom functions beyond the ERC20 standard that change balances and would need to be taken into account. Having the queue virtually extend the blockchain state would let uses like the mobile-game example "work ahead" of the blockchain, performing actions faster than new blocks can keep up and retroactively changing things if new blocks on the main chain contradict the best guess. (This could bring us back to the era of fighting "lag" in online games. If you were playing a blockchain-based live-action multiplayer game and took several steps forward before shooting at an enemy, your client could render all of that as if it were reality; but when the next block is confirmed (recording your first step forward), it's revealed that an opponent targeted you with a freeze spell and you cannot move, so your movement transactions are canceled and you snap back to where you were.) This sort of solution wouldn't help with those "live action"-speed needs, but for more traditional web-based interactions it would help keep the UX moving.
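For the narrow ERC20 case, this overlay could even be special-cased without a full EVM, since the two standard balance-moving functions are known. A naive sketch (assuming queue items have already been decoded into method/from/to/value fields, which is itself hand-waving):

```javascript
// Naive ERC20-only overlay: adjust the on-chain balance by any pending
// transfers in the local queue. This only works because we special-case
// the two standard functions known to move balances.
async function virtualBalanceOf(token, who, pendingTransfers) {
  // Real on-chain balance as of the latest block
  let balance = BigInt(await token.methods.balanceOf(who).call());

  // Apply each queued (not yet mined) transfer on top of it
  for (const tx of pendingTransfers) {
    // tx is assumed decoded: { method, from, to, value }
    if (tx.method !== 'transfer' && tx.method !== 'transferFrom') continue;
    if (tx.from === who) balance -= BigInt(tx.value);
    if (tx.to === who) balance += BigInt(tx.value);
  }
  return balance;
}
```

This is exactly where the "custom functions" caveat bites: the overlay breaks the moment a token mints, burns, or moves balances through any non-standard path, which is what motivates emulating the full EVM for the general case.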
To know whether a pending transaction would modify the output of that `balanceOf()` call in the general case, the whole Ethereum Virtual Machine (EVM) would need to be emulated, with all the pending items in the queue processed as if they were part of virtual blocks on top of the last confirmed block. Several Javascript implementations of the EVM already exist. Some are used as part of the Remix and EthFiddle online IDEs. The Truffle framework has Ganache (formerly TestRPC), but that is a standalone application, so it wouldn't work well for web-based DApps. Implementations like `ethereumjs-vm` provide the EVM logic as a module to include in other applications, but those VMs are initialized as blank, empty, new blockchains (essentially a private clone of the Ethereum blockchain with no blocks in it yet), so there's still the issue of how to get that emulated EVM into sync with the mainnet EVM state.
Nicely, each contract that runs in the EVM has its own sandboxed storage area, so if we want to "seed" our EVM with the existing Ethereum mainnet state, we only need to do it for the contracts we want to interact with; no need to fetch the whole blockchain. However, while the Ethereum node RPC interface has a call to get a smart contract's code in its entirety (`eth_getCode`), the command to get the contract's storage (`eth_getStorageAt`) doesn't return everything. Rather, the storage of a contract can only be queried word-by-word (in 32-byte chunks). It's infeasible to iterate over the whole index space, since a contract's storage address space is astronomically large (2^256 slots), so how could we interact with this?
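To make the word-by-word limitation concrete, here is how a single balance entry could be read over RPC, assuming purely for illustration that the token keeps its balances in a Solidity mapping at storage slot 0 (the slot differs per contract) and that a provider URL is available:

```javascript
// Reading one storage word: the balance entry of a (hypothetical) ERC20
// token whose balances mapping lives at Solidity storage slot 0. For a
// mapping at slot p, the value for key k lives at keccak256(pad(k) . pad(p)).
const Web3 = require('web3');
const web3 = new Web3('https://mainnet.infura.io/v3/YOUR_PROJECT_ID');

async function readBalanceWord(tokenAddress, holder) {
  const paddedKey = web3.utils.padLeft(holder, 64);           // 32-byte key
  const paddedSlot = web3.utils.padLeft('0x0', 64).slice(2);  // 32-byte slot index
  const slot = web3.utils.sha3(paddedKey + paddedSlot);       // keccak256 of 64 bytes
  // One word (32 bytes) of contract storage -- this is all that
  // eth_getStorageAt can give us per call.
  return web3.eth.getStorageAt(tokenAddress, slot);
}
```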
The primary way to know a contract's storage state is to start at the contract's genesis and apply every transaction ever sent to it. But in this situation that would mean fetching all transactions for a given contract for all time, which is a lot of data. Even if a query were added to fetch the whole representation of a contract's storage from an Ethereum node in its compressed (Patricia tree) form, that wouldn't be feasible in all cases: for a contract like a popular ERC20 token, the storage could be many megabytes in size (chewing up bandwidth getting it to the client), while each transaction we run against it likely only touches a small portion of that data (so much of the effort would be wasted).
The missing link
The `ethereumjs-vm` library has the option to have a custom `stateManager` inserted into it (experimental at this stage), which might be able to help with this issue. The EVM logic can be used as-is to load up the bytecode of the smart contract, and every time an `SLOAD` instruction is encountered, the state manager is queried for the storage value at a particular index. The key piece here is that because the EVM bytecode is being executed, the running program knows exactly which storage index it's supposed to be looking up at that moment in the execution. Logic could be implemented such that the state manager first makes an RPC call to an Ethereum node to get the current storage at that location (`eth_getStorageAt` works for this, since the index is known) and returns that to the EVM. Any `SSTORE` values would save a new value into a temporary "virtual" state (which any subsequent `SLOAD` calls would receive, rather than triggering a call to the real node).
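A sketch of that idea follows. The real `stateManager` interface in `ethereumjs-vm` has many more methods and different types (Buffers, callbacks); only the two storage hooks that matter here are shown, and the `MissingStorage` exception is an invented convention:

```javascript
// Sketch of a lazy, RPC-backed state manager for an in-browser EVM.
class MissingStorage extends Error {
  constructor(address, key) {
    super('storage not cached yet');
    this.address = address;
    this.key = key;
  }
}

class LazyStateManager {
  constructor() {
    this.cache = new Map(); // "address:key" -> 32-byte value from the real chain
    this.dirty = new Map(); // values written by pending (virtual) transactions
  }

  slotId(address, key) {
    return `${address}:${key}`;
  }

  // Called when the EVM executes an SLOAD
  getContractStorage(address, key) {
    const id = this.slotId(address, key);
    if (this.dirty.has(id)) return this.dirty.get(id); // virtual overlay wins
    if (this.cache.has(id)) return this.cache.get(id);
    // Not cached: abort execution; the caller must fetch this slot from
    // a real node (eth_getStorageAt) and re-run the transaction.
    throw new MissingStorage(address, key);
  }

  // Called when the EVM executes an SSTORE: keep it in the overlay only,
  // never write back to the real chain.
  putContractStorage(address, key, value) {
    this.dirty.set(this.slotId(address, key), value);
  }

  // Used by the retry loop to cache a freshly fetched slot
  seed(address, key, value) {
    this.cache.set(this.slotId(address, key), value);
  }
}
```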
The trick to making this work is that the link between the EVM and the state manager has to be asynchronous; if the state manager needs to call out to an Ethereum node for a storage value, it can't return one immediately. If the EVM isn't willing to wait around, the state manager could note that it needs to fetch storage at a particular index and throw an exception to abort the EVM execution. It would then fetch the value and signal for the EVM action to be retried, which would run past that first `SLOAD` and might abort again at the next one. So an application could "brute force" it, running the EVM transaction over and over until all the needed storage is cached and it runs through.
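With the invented `MissingStorage` signal from the sketch above, that brute-force loop is short; `runVirtualTx` and `fetchStorageFromNode` stand in for the EVM invocation and the RPC call:

```javascript
// Keep re-running the transaction until every storage slot it touches
// has been fetched and cached. Each pass gets at least one slot further.
async function runWithLazyState(stateManager, runVirtualTx, fetchStorageFromNode) {
  for (;;) {
    try {
      return await runVirtualTx(); // executes against the LazyStateManager
    } catch (err) {
      if (!(err instanceof MissingStorage)) throw err;
      // Fetch the missing word from a real node, seed the cache, retry
      const value = await fetchStorageFromNode(err.address, err.key);
      stateManager.seed(err.address, err.key, value);
    }
  }
}
```

A real implementation would also need to discard any partial `SSTORE` writes left in the virtual overlay between retries, since each aborted pass may have half-executed the transaction.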
The end goal
With those tools in place, the example from earlier could look like this:
- User types the description of the to-do into a text box and clicks "Add"
- The "Add" button itself becomes disabled (can't click it again), as well as the text box for description (can't change its contents)
- The Javascript handler on the "Add" button adds a call to `add_todo(string _description)` to the transaction queue
- The status bar in the UI indicates there is now one transaction in the queue to be processed
- The description field and Add button are re-enabled
- A "loading..." indicator in the status bar is shown and a call is made to the in-browser EVM to calculate the contract's state as if all the items in the queue were finalized
- If the "Add" button is clicked again while the new state is being calculated, another `add_todo` transaction is added to the queue, and the state calculation processes it in order as well
- A call is made to the in-browser EVM to get the result of `get_todos()`, which does not modify blockchain state, so it returns the list of to-dos (including the pending one(s))
- The "loading..." indicator in the status bar is turned off and the list of to-dos is updated to include the new one(s)
- With each new block that gets confirmed, transactions from the queue that are detected in the block are removed from the queue
- A "loading..." indicator in the status bar is shown and a call is made to the in-browser EVM to calculate the contract's state as if all the items still left in the queue were finalized
- A call is made to the in-browser EVM to get the result of `get_todos()`, which does not modify blockchain state, so it returns the list of to-dos (including the pending one)
- The "loading..." indicator in the status bar is turned off and the list of to-dos is updated, which will likely not cause any changes, since the anticipated to-dos were already in the display
A few things to note in this flow: from the user's perspective, the time the form fields are disabled is very short, and compared to the original flow, the "Loading..." state is entirely in-browser (just client-side CPU figuring out the new contract state), not waiting on network traffic, and doesn't need to block the UI at all. Individual applications could finesse the transitions even more gracefully with their own custom logic. For example, in the to-do app, the moment the "Loading..." indicator goes on and the in-browser EVM starts processing, the to-do item could be added to the DOM with some sort of "pending" indicator (grayed out or some such). When the in-browser EVM finishes (which should only take a second or two), it could move to a "processing" indicator (higher confidence that the transaction will result in that item being added to the list), and the indicator would be removed once the item is confirmed in the blockchain. Alternatively, an application could block the whole UI while the in-browser EVM runs and just show the updated list after it completes; that shouldn't take long, and it would be less complex to implement.
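Pulling the pieces together, the "Add" handler could look something like this; `outbox` is the invented queue from earlier, and `signTx`, `recomputeVirtualState`, and the render helpers are stand-ins for application code:

```javascript
// Hypothetical wiring of the whole flow. Everything here is a stand-in:
// signTx produces an already-signed transaction, recomputeVirtualState
// re-runs the queue through the in-browser EVM, and the render helpers
// update the DOM with a status badge.
function setFormEnabled(enabled) { /* enable/disable the form controls */ }
function renderTodoItem(description, status) { /* draw/update item + badge */ }

async function onAddClicked(outbox, signTx, recomputeVirtualState, description) {
  // The form is only locked while the job is queued -- a fast, local step
  setFormEnabled(false);
  const signedTx = await signTx('add_todo', [description]);
  outbox.enqueue(signedTx, { description });
  setFormEnabled(true);

  // Optimistically show the item, grayed out as "pending"
  renderTodoItem(description, 'pending');

  // Client-side CPU only: once the in-browser EVM run succeeds,
  // confidence is higher, so promote the badge to "processing"
  await recomputeVirtualState(outbox);
  renderTodoItem(description, 'processing');

  // A separate block watcher would mark the item "confirmed" once the
  // transaction shows up in a mined block and leaves the queue.
}
```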
I think moving the user experience in this direction, where users expect to see a "queue" and queue-processing status indicators somewhere on their display, and where those indicators are how they understand what's going on with this newer means of network communication, would go a long way toward making DApps work well without relying on blockchain block times becoming extremely fast. What do you think? Any concerns or improvements to this idea?