STEEM internals #2: BASE58 and Base58Check

in steem •  8 years ago 

Base58

What are Base58 and Base58Check?

Base58 is a binary-to-text encoding, meaning it is a way of converting binary data into text. Base58Check is the encoding scheme used for addresses in Bitcoin and Bitcoin-based systems, including STEEM. The address is encoded in the special alphabet and a few additional bits of information (literally) are added. For STEEM-based addresses an STM prefix is added. For WIF keys a 0x80 prefix is added. The main reason it was introduced is to encode Bitcoin byte arrays as human-typable strings.

BASE58 alphabet

BASE58 alphabet consists of 58 characters:

BASE58_ALPHABET = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

>>> BASE58_ALPHABET = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
>>> len(BASE58_ALPHABET)
58

You may ask why this is used for Bitcoin at all.

Let me quote the original Bitcoin source code:

Why base-58 instead of standard base-64 encoding?

  • Don't want 0OIl characters that look the same in some fonts and
    could be used to create visually identical looking account numbers.
  • A string with non-alphanumeric characters is not as easily accepted as an account number.
  • E-mail usually won't line-break if there's no punctuation to break at.
  • Doubleclicking selects the whole number as one word if it's all alphanumeric.

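Compared to Base64, the following similar-looking characters are omitted: 0 (zero), O (capital o), I (capital i) and l (lower case L). The two non-alphanumeric Base64 characters, + and /, are dropped as well, which leaves 64 - 6 = 58 symbols.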
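
As a quick sanity check, the alphabet can be rebuilt from that description. The snippet below is only an illustration: the variable names are mine, and it uses a str instead of the bytes literal shown earlier. It takes digits, upper-case and lower-case letters in that order and drops the four look-alike characters (Base64's + and / never enter the picture because only alphanumeric characters are considered).

```python
import string

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Alphanumeric characters in the order Base58 uses: digits, upper case, lower case.
candidates = string.digits + string.ascii_uppercase + string.ascii_lowercase

# Drop the four characters that look alike in some fonts.
rebuilt = "".join(ch for ch in candidates if ch not in "0OIl")

print(rebuilt == BASE58_ALPHABET)  # True
print(len(rebuilt))                # 58
```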

How does Base58 encoding work?

  1. Count the leading zero bytes (bytes equal to 0x00) in the input. Each of them is encoded as a single '1' character, the first symbol of the Base58 alphabet, at the start of the result.
  2. Treat the remaining bytes as one big-endian integer. Repeatedly divide it by 58 and use each remainder as an index into the Base58 alphabet. The remainders come out least significant digit first, so reverse them and place them after the leading '1' characters.
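
Base58 encoding can be sketched in a few lines of Python. This is a minimal illustration, not the code from Bitcoin or python-graphenelib: the function name base58_encode is my own, and it leans on Python's arbitrary-size integers instead of the byte-wise long division a C++ implementation has to do. It follows the standard algorithm, in which every leading zero byte becomes a '1' and the rest of the input is converted to base 58 as one big-endian number.

```python
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as a Base58 string (illustrative helper, name is hypothetical)."""
    # Leading zero bytes carry no numeric value, so they are counted
    # separately and each one is written as '1' (alphabet index 0).
    stripped = data.lstrip(b"\x00")
    leading_zeros = len(data) - len(stripped)

    # Interpret the remaining bytes as one big-endian integer.
    number = int.from_bytes(stripped, "big")

    # Repeated division by 58 yields the digits, least significant first.
    digits = []
    while number > 0:
        number, remainder = divmod(number, 58)
        digits.append(BASE58_ALPHABET[remainder])

    return "1" * leading_zeros + "".join(reversed(digits))


print(base58_encode(b"a"))           # 2g
print(base58_encode(b"\x00\x00a"))   # 112g
```

The byte 'a' is 97 = 1 * 58 + 39, so it encodes to alphabet[1] followed by alphabet[39], which is "2g". The two leading zero bytes in the second call only add two '1' characters in front.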

You can find example implementations in the reference Bitcoin implementation (https://github.com/bitcoin/bitcoin/blob/master/src/base58.h) and in python-graphenelib.

And what about Base58check?

'Check' after Base58 means that a checksum is appended to the end of the data before the encoding happens. In Bitcoin the checksum is the first 4 bytes of a double SHA-256 hash. It lets software automatically detect typographical errors, but it cannot correct them. Additionally, one byte of version/application information is added to the front. Bitcoin addresses use 0x00 for this byte, WIF keys use 0x80.
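
To make the checksum idea concrete, here is a hedged sketch of Base58Check encoding and decoding in the Bitcoin style, where the checksum is the first 4 bytes of SHA-256 applied twice to the version byte plus payload. All function names are hypothetical, and the 20 zero bytes stand in for a real address payload. Other systems may compute the checksum differently, so treat this as an illustration of the principle only.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    stripped = data.lstrip(b"\x00")
    number = int.from_bytes(stripped, "big")
    digits = []
    while number > 0:
        number, remainder = divmod(number, 58)
        digits.append(BASE58_ALPHABET[remainder])
    return "1" * (len(data) - len(stripped)) + "".join(reversed(digits))


def base58_decode(text: str) -> bytes:
    stripped = text.lstrip("1")
    number = 0
    for char in stripped:
        # str.index raises ValueError for characters outside the alphabet.
        number = number * 58 + BASE58_ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return b"\x00" * (len(text) - len(stripped)) + body


def checksum(data: bytes) -> bytes:
    # First 4 bytes of SHA-256(SHA-256(data)), as used by Bitcoin.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]


def base58check_encode(version: int, payload: bytes) -> str:
    data = bytes([version]) + payload
    return base58_encode(data + checksum(data))


def base58check_decode(text: str):
    raw = base58_decode(text)
    if len(raw) < 5:
        raise ValueError("too short for version byte plus checksum")
    data, check = raw[:-4], raw[-4:]
    if checksum(data) != check:
        raise ValueError("checksum mismatch")
    return data[0], data[1:]


# Placeholder payload: 20 zero bytes with the 0x00 version byte.
encoded = base58check_encode(0x00, bytes(20))
version, payload = base58check_decode(encoded)

# Simulate a typo by changing the last character.
typo = encoded[:-1] + ("2" if encoded[-1] != "2" else "3")
try:
    base58check_decode(typo)
    typo_detected = False
except ValueError:
    typo_detected = True

print(encoded, version, typo_detected)
```

The decoder can only report that something is wrong with the mistyped string. It has no way of telling which character was changed.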

As an example, the full WIF is the Base58 encoding of 0x80 + key + checksum, where the checksum is the first 4 bytes of the double SHA-256 hash of 0x80 + key.
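
The WIF construction can be walked through step by step. The sketch below assumes the Bitcoin rules (0x80 version byte, 4-byte double SHA-256 checksum, uncompressed key) and uses the sample private key from the Bitcoin wiki's WIF article, which must never be used for real funds. The helper and variable names are mine.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    stripped = data.lstrip(b"\x00")
    number = int.from_bytes(stripped, "big")
    digits = []
    while number > 0:
        number, remainder = divmod(number, 58)
        digits.append(BASE58_ALPHABET[remainder])
    return "1" * (len(data) - len(stripped)) + "".join(reversed(digits))


# Publicly known sample key (32 bytes), for demonstration only.
private_key = bytes.fromhex(
    "0C28FCA386C7A227600B2FE50B7CAE11EC86D3BF1FBE471BE89827E19D72AA1D"
)

# 1. Prepend the WIF version byte.
extended = b"\x80" + private_key

# 2. Checksum: first 4 bytes of SHA-256(SHA-256(version byte + key)).
check = hashlib.sha256(hashlib.sha256(extended).digest()).digest()[:4]

# 3. Base58-encode version byte + key + checksum.
wif = base58_encode(extended + check)

print(check.hex())
print(wif)
```

Note that the checksum covers the version byte as well as the key, so a string with the wrong prefix fails verification just like one with a mistyped key.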

I guess the most important thing to remember from this post is that Bitcoin addresses use Base58 encoding, and that the alphabet consists of 58 alphanumeric symbols.

DISCLAIMER: THE INFORMATION IS DELIVERED FREE OF CHARGE AND 'AS IS' WITHOUT WARRANTY OF ANY KIND. I HOPE IT IS ACCURATE AND FREE OF ERRORS AND YOU FIND IT USEFUL.
