State of decentralized file storage mid-2018
In my last article, Managing trust and risk in Fragments, I provided a short demo of creating mutual agreements that can be deployed on the Ethereum network. Interactions with these kinds of agreements work as intended and cost reasonable fees (Ethereum gas). If you intend to attach more comprehensive documents, however, things can become expensive. In this article, I will explore some other solutions for distributing supporting documents, and files in general, more economically.
It is important to bear in mind that the environment of distributed file storage is evolving rapidly. This text should be read in the context of the state of this technology in mid-2018.
Decentralized storage characteristics
In order to properly discuss distributed storage networks, we must first define some key terms. Most distributed storage systems offer Permanency of files. What this means is that files are assigned unique names or addresses on the network, usually the hash of the file. Because the address is derived from the content, a link to a permanently saved file can never be redirected to a different file.
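To make Permanency concrete, here is a minimal sketch of content addressing. It assumes a plain SHA-256 digest as the address, which is a simplification and not the exact format of any network discussed here; the function name and sample documents are hypothetical.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a hex SHA-256 digest, used here as the file's address."""
    return hashlib.sha256(data).hexdigest()


original = b"Agreement v1: both parties accept the terms."
modified = b"Agreement v2: both parties accept the terms."

address = content_address(original)

# The same bytes always map to the same address...
assert content_address(original) == address
# ...and any change to the content yields a different address.
assert content_address(modified) != address
```

Since the address changes whenever the content changes, a stored link cannot silently start pointing at a different document.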
The second important concept is Persistency. Persistency refers to a trait of the network that ensures uploaded files will be available as long as required. In an ideal world that would be forever, but in real scenarios it will be for as long as users (usually the original uploader) are willing to pay fees for file persistency, or host the files themselves and share them on the network.
The main advantage of distributed storage is the ability to make files available to all users with maximum possible uptime and transparency and also a minimal possibility of content being censored, all for a price competitive with traditional or cloud storage. At Fragments we will leverage these features for storing informative documents (e.g. for agreements); hosting source datasets for micro-tasking, which will be downloaded by workers; and to store the crowd-sourced data resulting from micro-tasks.
IPFS
InterPlanetary File System (IPFS) is probably the best-known network among the crypto community. Files on the network are shared by nodes on a voluntary basis and it offers Permanency. The system splits the uploaded file into pieces (256 KB by default) and computes a hash for each piece. The hashes are then composed into a hash tree. The root hash (which describes the entire tree) is then used to address the file. Nodes voluntarily host files and serve them to other nodes on request, in a similar manner to World Wide Web architecture.
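The chunk-and-hash-tree idea can be sketched in a few lines. This is only an illustration under simplifying assumptions: it builds a plain binary hash tree with SHA-256 and a caller-chosen chunk size, whereas IPFS uses its own object format and hash encoding, so the root computed here is not a valid IPFS address. All function names are hypothetical.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def split_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Split data into fixed-size pieces; the last piece may be shorter."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return chunks or [b""]  # an empty file still gets one (empty) chunk


def merkle_root(data: bytes, chunk_size: int) -> str:
    """Hash every chunk, then hash pairs upward until one root remains."""
    level = [sha256(chunk) for chunk in split_chunks(data, chunk_size)]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


# Tiny chunk size so the example produces several chunks.
root_a = merkle_root(b"micro-task dataset, version 1", chunk_size=8)
root_b = merkle_root(b"micro-task dataset, version 2", chunk_size=8)
assert root_a != root_b  # changing any chunk changes the root
```

Because the root depends on every chunk hash, a node can verify each downloaded piece independently and still be sure the pieces belong to the requested file.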
The network’s architecture combines proven techniques: BitTorrent-style swarms of untrusted peers to distribute the hashed pieces of files, Git-style versioning of changes, and SFS for authentication. The challenge which remains mostly unresolved is creating a human-readable naming subsystem that hides hashes from regular users and enables the building of new-era web tools on top of it. Some (failed) attempts to create a naming system that meets the desired quality include IPNS, ENS and Namecoin.
Despite the difficulty in making it user-friendly, IPFS looks to have a stable dev team and could possibly become the industry standard for hosting files of distributed applications.
In Fragments’ architecture, micro-task repository providers could run their own IPFS nodes and utilize the network for distributing micro-task data to workers. The Fragments team could run a couple of friendly nodes to help with file seeding, thus increasing availability and speed. On the other hand, such a solution works at its best when users have their own IPFS node on the device they use to solve micro-tasks. Nodes on consumer devices are not common at this time, which forces applications to use Web/IPFS gateways. This might change in the future as we see a rise in projects trying to bring IPFS to phones.
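As a rough illustration of the gateway approach, the sketch below builds an HTTP URL for a file addressed by its hash. It assumes the common /ipfs/<hash> gateway path layout and uses the public ipfs.io gateway as the default; the function name and the example hash are hypothetical placeholders.

```python
def gateway_url(content_hash: str, gateway: str = "https://ipfs.io") -> str:
    """Build an HTTP URL that fetches an IPFS object through a gateway."""
    return f"{gateway.rstrip('/')}/ipfs/{content_hash}"


dataset_hash = "QmExampleDatasetHash"  # placeholder, not a real IPFS hash

# A worker without a local node would go through a public gateway...
print(gateway_url(dataset_hash))
# ...while a worker running a node could use its local gateway instead.
print(gateway_url(dataset_hash, "http://127.0.0.1:8080"))
```

Keeping the gateway configurable lets the same application switch to a local node once one is available on the worker's device.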
We have used IPFS in one of our demos and found it to be surprisingly friendly for both developers and users.
Filecoin
Because IPFS gives nodes no economic incentive to store files they don’t need for themselves, Filecoin was created to provide an incentive layer on top of it. In essence, you pay fees for the time your files are saved on other nodes, thus achieving Persistency of files. The authors created two new algorithms for achieving this. Proof-of-Replication assures the user uploading a file to the IPFS network that the file is hosted on multiple nodes. Proof-of-Spacetime ensures that the file will be available on those nodes for as long as originally contracted.
Filecoin had its ICO in 2017 but the project’s team has kept their source code private, so the public has yet to see an alpha version of the software. Hopefully, we will see a working version before the end of 2018.
Zeronet
I personally have liked the Zeronet project from the start. Its advantage lies in the number of working demo projects which show the many possibilities of the platform and distributed applications in general. Demos are very often web-oriented, so you can easily create your own blog, forum or even social network. One of the more significant examples of Zeronet’s capabilities is Play, a site that facilitates browsing and downloading of torrents. I also like that Zeronet supports anonymity using Tor, a feature which has been included since the early days of the project. As with IPFS, Zeronet only ensures Permanency.
The disadvantage of this network is a lack of documentation or a broader community. For that reason, it is hard to find detailed information about the platform. This is probably a result of the project being rather small and poorly funded in contrast to other big names in the space.
Zeronet also utilizes BitTorrent to distribute content but, in contrast to IPFS, it aims to share the whole webpage (via a top-level folder of all files in general), rather than its component parts.
Due to the shaky foundations of this platform, it will be complicated for Fragments or other projects to create consumer-grade applications that could be used today. That said, it is a working platform and it’s great for experimenting. It would be interesting to see if we could create a small micro-task application prototype and distribute it to workers via this network but, for now, that’s still on the horizon.
Sia
Another approach to distributed storage is Sia. Sia can be understood as a market for file storage where parties negotiate the various conditions of file hosting via the platform’s smart contracts. It utilizes Siacoin, which you can use to buy storage on the network’s market or as collateral for offering hosting services. Whenever a file is uploaded to the network it is split into ten pieces, each of which is hosted by a different host. Each piece is then duplicated three times across different hosting nodes. When a host goes offline, its data is re-replicated from the remaining nodes, and penalties apply should a host fail to maintain over 95% uptime, ensuring a higher quality of hosts. Sia offers both Permanency and Persistency.
Hosting contracts can define maximum storage space, upload and download bandwidth, duration, price and various other conditions. At the time of writing, you can rent 1 TB of storage for a month for 75 Siacoins (less than $1), on average! That is the cheapest price for distributed storage I am aware of on the market today. If the network maintains its low price, it could find itself in a position to challenge services like Dropbox in the near future.
Of course, there’s a catch. The problem facing Sia right now is its upload and download speeds. While you can rent enormous amounts of storage space, transfer speeds will likely become a bottleneck. Because hosts include regular people sharing their PC storage space, often over an asymmetric internet connection, downloads (i.e. the host’s upload) can take a painfully long time. As far as I’m concerned, this is the biggest issue of the Sia network as it stands; for a more detailed breakdown, see this discussion. (Edit: as was pointed out by the Sia team, this information is outdated. In the newest version, download/upload speeds are significantly improved.) Another disadvantage is the current need to download the full node software, which pulls around 8 GB of data from the network on its first run. Furthermore, there’s currently an awkward minimum upload size of 40 MB, meaning that when you upload smaller files they are padded to meet this minimum.
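The effect of the 40 MB minimum on small files can be estimated with simple arithmetic. The sketch below assumes that each file is padded individually, as the description above suggests; the function names and the example workload are hypothetical.

```python
from typing import Iterable

MIN_UPLOAD_MB = 40.0  # minimum upload size cited in the text


def stored_size_mb(file_size_mb: float) -> float:
    """Size a file occupies once padded up to the minimum."""
    return max(file_size_mb, MIN_UPLOAD_MB)


def padding_overhead(file_sizes_mb: Iterable[float]) -> float:
    """Fraction of the stored volume that is padding rather than data."""
    sizes = list(file_sizes_mb)
    stored = sum(stored_size_mb(size) for size in sizes)
    if stored == 0:
        return 0.0  # no files, no overhead
    return (stored - sum(sizes)) / stored


# Hypothetical workload: 1000 micro-task result files of 2 MB each.
overhead = padding_overhead([2.0] * 1000)
print(f"{overhead:.0%} of the stored volume would be padding")
```

For workloads made of many small files, bundling them into larger archives before upload would avoid most of this overhead.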
The Sia node software seems to have a nice API and there is an active community of users and developers. Hopefully, in the future, there will be a lightweight software alternative that won’t initially require users to download the whole blockchain.
Storj
Storj’s network is currently more centralized than other projects. It has set goals similar to those of Sia but it introduces the idea of masternodes (called Bridges) that manage files’ metadata and speed up the network. The company Storj Labs, which stands behind the project, collects storage fees and pays hosts on a monthly basis. Payments are carried out via the Storj token, built on Ethereum (before 2017 it was a token on Counterparty). It is the biggest network discussed so far, storing 600 TB of data. (Edit: It was pointed out to me by Storj developers that the network is actually much bigger, with around 30 PB stored!) At first glance the platform comes across much like regular cloud storage, setting a fixed price for storing files. The price of storage space is higher when compared to Sia but so are its download and upload speeds. At the time of writing, you can rent 1 TB a month for $15. The Storj network supports Permanency and Persistency of stored files.
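To put the quoted prices side by side, here is a small sketch that estimates the monthly bill for a dataset. It uses the $15 per TB figure for Storj and treats Sia's "75 Siacoins (less than $1)" as an upper bound of $1 per TB, which is an assumption made only for this comparison; the dataset size is hypothetical.

```python
STORJ_USD_PER_TB_MONTH = 15.0  # price quoted in the text
SIA_USD_PER_TB_MONTH = 1.0     # assumed upper bound: 75 Siacoins is "less than $1"


def monthly_cost_usd(size_tb: float, usd_per_tb_month: float) -> float:
    """Monthly storage cost for a dataset of the given size."""
    return size_tb * usd_per_tb_month


dataset_tb = 0.5  # hypothetical micro-task dataset size

storj = monthly_cost_usd(dataset_tb, STORJ_USD_PER_TB_MONTH)
sia = monthly_cost_usd(dataset_tb, SIA_USD_PER_TB_MONTH)
print(f"Storj: ${storj:.2f} per month")
print(f"Sia:   at most ${sia:.2f} per month")
```

The comparison covers storage only; bandwidth fees and transfer speed, where the two networks differ, are not modeled.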
Storj’s developer portal is robust, client libraries for major programming languages are already available, and the community is very active. It may also be an advantage that the Storj token exists on Ethereum’s network, where the Fragments token will be.
Fragments will attempt to create solutions on both the Sia and Storj networks, leveraging their respective strengths.
Other projects worth mentioning
Freenet is a storage solution that focuses primarily on security, censorship resistance and anonymity. The project is actually older than Tor but is still in an experimental phase, thus not fit for our use.
People working on the Maidsafe project are developing their own browser which enables users to browse data and applications running on the network with enhanced security. The project is focused on security and censorship resistance. As of June 2018, only ~30 GB (vs ~200 TB on Sia) was stored on the network but it has a passionate community testing their current alpha version software. There are already a couple of applications running on the Maidsafe network.
Substratum appears to be similar to Freenet and Maidsafe but utilizes a blockchain. Currently, they have an ERC20 token on Ethereum but they have stated plans to develop their own blockchain. It is the newest project among these three and yet the beta version of the network is already operational. Substratum’s network can be browsed by a regular web browser thanks to the SubstratumDNS.
Conclusion
For our storage solution, the qualities which we are looking for include high availability, data security, accessibility of the network for user devices (desktop, phone), and competitive pricing. Three possibilities which seem to fulfill our requirements are IPFS (combined with Filecoin), Storj and Sia. We will try to create proof-of-concept applications with each that can be used and forked.
IPFS introduces the possibility for Fragments’ task repository providers to host their files the “old fashioned way”, on their own servers, while taking advantage of the faster speeds and other goodies brought about by IPFS. In the future, both data and micro-task applications could possibly be hosted on IPFS. Sia could be a cheap way to outsource the need to configure servers or hosting solutions for the micro-task datasets and results submitted by workers. It’s hard to predict what throughput will be needed by Fragments applications. Storj will be used for applications where the capabilities provided by Sia are insufficient and higher prices are acceptable.
Join the Fragments movement
We need you, our community, to help us achieve our global ambitions. With your help we can bring Fragments to life and create the workforce of the future.
Join Community on Telegram: https://t.me/fragments
See our website: https://fragments.network
Follow us on Twitter: https://twitter.com/fragment
Fork us on GitHub: https://github.com/FragmentsNetwork