Encoding in Javascript: Part 1
A rose with many names 🌹
Encoding takes the raw bits and bytes of digital data and puts them in a presentable form. The same data can be encoded in many formats depending on the application. You may have come across some encoding standards before, like base64, hexadecimal, UTF-8, UTF-16 and ASCII. We humans see different numbers and letters, but the underlying digital data is the same.
Encoding | Representation
----------------------------------------------
Binary | 01110010 01101111 01110011 01100101
Decimal | 114 111 115 101
Hex | 72 6f 73 65
Base64 | cm9zZQ==
UTF-8 | rose
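The table above can be reproduced with a few lines of code. This is a minimal sketch, assuming an environment where TextEncoder, TextDecoder and btoa are available as globals (modern browsers and recent Node.js versions); the variable names are only illustrative.

```javascript
// One word, five presentations of the same four bytes.
const bytes = new TextEncoder().encode("rose"); // bytes 114, 111, 115, 101

// Each byte as eight binary digits.
const binary = Array.from(bytes, (b) => b.toString(2).padStart(8, "0")).join(" ");
// Each byte as a plain decimal number.
const decimal = Array.from(bytes).join(" ");
// Each byte as two hexadecimal digits.
const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
// btoa expects a string whose characters are in the 0-255 range,
// so we build one character per byte first.
const base64 = btoa(String.fromCharCode(...bytes));
// Decoding the bytes as UTF-8 gives the text back.
const utf8 = new TextDecoder().decode(bytes);

console.log({ binary, decimal, hex, base64, utf8 });
```

Note that the bytes spell a lowercase "rose": an uppercase "R" would be stored as a different byte.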
Encoding is possible and necessary because digital data is physically stored using electrical transistors, which only have two states: on and off. The state of transistors can be represented with binary code.
Binary basics ⌨️
Binary code represents the state of a transistor, where 1 means on and 0 means off. Each transistor or value is referred to as a bit of memory.
To represent numbers larger than 1, we group bits together, where each subsequent bit represents a higher power of 2 (1, 2, 4, 8, 16, 32, etc.). The rightmost bit indicates 2^0 (1), the second indicates 2^1 (2), the third indicates 2^2 (4), and so forth. A 1 in position x means we add 2^x to the total integer and a 0 means we do not.
2^3 (8) | 2^2 (4) | 2^1 (2) | 2^0 (1) | Decimal |
---|---|---|---|---|
0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 1 | 1 |
0 | 0 | 1 | 0 | 2 |
0 | 0 | 1 | 1 | 3 |
0 | 1 | 0 | 0 | 4 |
0 | 1 | 0 | 1 | 5 |
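The place-value rule in the table can be written out as a small function. This is only a sketch; bitsToDecimal is a hypothetical helper name, and the built-in parseInt with a radix of 2 does the same job in practice.

```javascript
// Add up the powers of two for every position that holds a 1.
function bitsToDecimal(bits) {
  let total = 0;
  for (let i = 0; i < bits.length; i++) {
    // The rightmost character is position 0, the next one is position 1, etc.
    const position = bits.length - 1 - i;
    if (bits[i] === "1") {
      total += 2 ** position;
    }
  }
  return total;
}

console.log(bitsToDecimal("0101")); // 5  (4 + 1)
console.log(bitsToDecimal("1111")); // 15 (8 + 4 + 2 + 1)
```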
We can represent any non-negative whole number using binary. See if you can spot the pattern below:
Binary | Decimal
----------------
0000 | 0
0001 | 1
0010 | 2
0011 | 3
Since a computer program that is just one integer wouldn't be very useful, we separate bits into groups to make multiple integers. Eight bits are grouped together to make what is called a byte of memory. Below you can see how a sequence of bytes can represent a sequence of integers.
Binary | 00000001 11111111 00110011 01000111
Decimal | 1 255 51 71
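As a rough illustration, the same row of bytes can be turned into integers in code. The sketch below assumes the bits arrive as a space-separated string, which is a choice made only for this example.

```javascript
// Four bytes written out as groups of eight bits.
const bitString = "00000001 11111111 00110011 01000111";

// Split on spaces and read every group as a base-2 number.
const byteValues = bitString.split(" ").map((group) => parseInt(group, 2));

// A Uint8Array stores each value in exactly one byte (0 to 255).
const byteArray = Uint8Array.from(byteValues);

console.log(byteValues); // [ 1, 255, 51, 71 ]
```

Eight bits can hold 256 different values, which is why a single byte tops out at 255.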
Since a computer program that is just a bunch of integers is still not super useful, we can assign added meaning to the integers. In other words, we can encode information into the numbers that we store using digital bytes.
Encoding assigns meaning ✍🏻
If we were to take the standard 26-letter Latin alphabet and assign each letter a number starting with a = 1 and b = 2, we could store and transfer any written information using only numbers. This is a form of encoding. You can see how we could take logical steps to convert information to binary, which we know is how digital computers store data.
h i = 8 9 = 00001000 00001001
b y e = 2 25 5 = 00000010 00011001 00000101
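The toy scheme above is easy to sketch in code. The names encodeToy, decodeToy and toBinary are made up for this example, and the scheme only covers the 26 lowercase letters; it is not a real standard.

```javascript
// Toy alphabet encoding: a = 1, b = 2, ..., z = 26.
const alphabet = "abcdefghijklmnopqrstuvwxyz";

// Turn a word into its list of letter numbers.
function encodeToy(word) {
  return Array.from(word.toLowerCase(), (letter) => alphabet.indexOf(letter) + 1);
}

// Turn a list of letter numbers back into a word.
function decodeToy(numbers) {
  return numbers.map((n) => alphabet[n - 1]).join("");
}

// Write every number as one byte (eight binary digits).
function toBinary(numbers) {
  return numbers.map((n) => n.toString(2).padStart(8, "0")).join(" ");
}

console.log(encodeToy("hi"));            // [ 8, 9 ]
console.log(toBinary(encodeToy("hi")));  // 00001000 00001001
console.log(toBinary(encodeToy("bye"))); // 00000010 00011001 00000101
console.log(decodeToy([2, 25, 5]));      // bye
```

Both sides have to share the same alphabet table for the decoding to work.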
Encoding is the crucial step of defining rules for what values mean what information. When engineers were designing the first digital computers, they could encode and decode data according to the purpose of the computer, but transferring data from one computer to another required the computers to know the same encoding and decoding rules.
Text Encoding 🔡
In the 1960s the American Standards Association created the text encoding standard called ASCII. It mapped each byte value to the character the computer should interpret it as. Since the first bit (farthest left) was always 0, only 7 bits were used for encoding. This gave 128 unique code points, which was enough for 32 control operations, the entire uppercase and lowercase English alphabet, spaces, numbers and many symbols. You can see a few encodings below:
A 65 0100 0001 | B 66 0100 0010 | C 67 0100 0011 | D 68 0100 0100 |
a 97 0110 0001 | b 98 0110 0010 | c 99 0110 0011 | d 100 0110 0100 |
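The rows above can be looked up directly. In this sketch, asciiBits is a hypothetical helper; it relies on charCodeAt, which returns a UTF-16 code unit that happens to match the ASCII value for these characters.

```javascript
// Return the 8-bit pattern for a single ASCII character, split into two nibbles.
function asciiBits(character) {
  const code = character.charCodeAt(0);
  const bits = code.toString(2).padStart(8, "0");
  return bits.slice(0, 4) + " " + bits.slice(4);
}

for (const character of ["A", "B", "a", "b"]) {
  console.log(character, character.charCodeAt(0), asciiBits(character));
}
// A 65 0100 0001
// B 66 0100 0010
// a 97 0110 0001
// b 98 0110 0010
```

Notice that an uppercase letter and its lowercase partner differ by exactly 32, which is a single bit.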
Over time, the limitations of a 128-point set became increasingly apparent when supporting software with more languages, non-Latin writing systems, more symbols and the possibility of emojis. Computer scientists decided that variable-length encoding was the way forward, and eventually most operating systems landed on the encoding standard called UTF-8. It is backwards compatible with ASCII, but because it allows for up to four bytes per character, it can represent more than a million unique code points. This entire website is encoded with UTF-8, which supports many languages (エンコーディングはクールだ) and even Unicode emojis such as 🥳.
In Javascript we use two similar objects to convert text to and from binary. One is TextEncoder, which takes text and encodes it into binary. The other is TextDecoder, which takes binary data and decodes it into text.
const rawData = new TextEncoder().encode("make raw data");
const text = new TextDecoder().decode(rawData);
console.log({ rawData, text });
You can inspect the binary produced by any text you encode. It's interesting to experiment with which characters take up one byte and which take up multiple bytes. Try encoding emojis and non-Latin characters such as á. For example, the single letter "a" encodes to one byte:
new TextEncoder().encode("a")
01100001
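To see the variable length of UTF-8 in practice, here is a small sketch that counts the bytes for a few characters. The helper name utf8Length and the sample characters are chosen only for illustration.

```javascript
const encoder = new TextEncoder();

// Number of bytes a piece of text occupies once encoded as UTF-8.
function utf8Length(text) {
  return encoder.encode(text).length;
}

console.log(utf8Length("a"));  // 1 byte: plain ASCII
console.log(utf8Length("á"));  // 2 bytes: accented Latin letter
console.log(utf8Length("エ")); // 3 bytes: Japanese katakana
console.log(utf8Length("🥳")); // 4 bytes: emoji
```

Some emojis are built from several code points joined together, so they can take far more than four bytes in total.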
Decoding binary to text does not necessarily produce valid letters because not every possible byte has a corresponding character encoding. Entering random 1s and 0s will return many unknown characters (οΏ½). You can experiment copying valid binary and making up your own random binary below:
new TextDecoder().decode()
a
While encodings like UTF-8 help us encode text, we can also encode numbers in formats that support the math operations we need. There's some important computer science involved in encoding numbers.
Encoding numbers in Javascript 🔢
In Javascript, as in many programming languages, numbers are encoded consistently. Each number takes up 64 bits of data in the IEEE 754 double-precision format, where one bit is the sign (positive or negative), 11 bits are the exponent (the power of two that scales the number), and 52 bits are the mantissa (the significant digits of the number). This enables us to work with the many mathematical operations included in the Math object and in operators like +, -, * and /.
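To make the 1 + 11 + 52 split visible, the sketch below writes a number into an 8-byte buffer and reads the raw bits back. It assumes DataView and BigInt support (available in current browsers and Node.js); float64Bits is a hypothetical helper name.

```javascript
// Split the 64 stored bits of a number into sign, exponent and mantissa.
function float64Bits(value) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value); // big-endian by default, so the sign bit comes first
  const bits = view.getBigUint64(0).toString(2).padStart(64, "0");
  return {
    sign: bits.slice(0, 1),      // 1 bit
    exponent: bits.slice(1, 12), // 11 bits
    mantissa: bits.slice(12),    // 52 bits
  };
}

console.log(float64Bits(1));
// sign "0", exponent "01111111111" (1023), mantissa is 52 zeros
console.log(float64Bits(-2));
// sign "1", exponent "10000000000" (1024), mantissa is 52 zeros
```

The stored exponent is offset by 1023, so 1023 stands for 2^0 and 1024 stands for 2^1.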
The Number prototype also provides methods that convert numbers into string formats. We can convert whole positive numbers to a binary string using Number.prototype.toString(2). We can convert the string back to an integer using parseInt(string, 2).
const fortyTwo = 42;
const binaryFortyTwo = fortyTwo.toString(2);
console.log(`Binary Forty Two: ${binaryFortyTwo}`); // 101010
const integerFortyTwo = parseInt(binaryFortyTwo, 2);
console.log(`Integer Forty Two: ${integerFortyTwo}`); // 42
const binaryFiftyFive = (55).toString(2);
console.log(`Binary Fifty Five: ${binaryFiftyFive}`); // 110111
const integerFiftyFive = parseInt(binaryFiftyFive, 2);
console.log(`Integer Fifty Five: ${integerFiftyFive}`); // 55
Keep in mind that the Number.prototype.toString(2) method does not display the binary representation the way the number is actually stored in memory. Instead of designating a bit for the sign and bits for the exponent and mantissa, it uses a negative sign and a binary point. This is why it's more straightforward to use it for whole positive numbers. For example, (-42.42).toString(2) returns:
-101010.0110101110000101000111101011100001010001111011
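A few smaller values make the behaviour easier to follow. This sketch also shows that parseInt with a radix of 2 round-trips a negative whole number but stops reading at the binary point; the variable names are illustrative only.

```javascript
// toString(2) writes a minus sign and a binary point, not the stored bits.
const negative = (-42).toString(2);
const half = (0.5).toString(2);
const fraction = (2.75).toString(2);

console.log(negative); // -101010
console.log(half);     // 0.1   (one half)
console.log(fraction); // 10.11 (2 + 1/2 + 1/4)

// parseInt understands the minus sign but ignores everything after the point.
console.log(parseInt(negative, 2)); // -42
console.log(parseInt(fraction, 2)); // 2
```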
To be continued...
We've discussed text encodings and reviewed some of the concepts underlying numbers, but there are many types of data that have nothing to do with letters or numbers. Think about colors, images, audio, video and cryptography. How can we efficiently represent these types of data? We will need to review more binary helpers in Javascript to work with that data.
This article continues in Encoding 2.