Character encoding works seamlessly in modern computing. Until it doesn’t, and you find yourself staring at a screen of complete gibberish, wondering how you landed yourself in the matrix.
If you do find yourself staring at such a screen of random characters, whatever you do, don’t panic! You haven’t entered the matrix. You’ve simply come across the work of a developer who most likely didn’t understand character encoding. And today, in 2015, with more than 7 billion people speaking just over 7,000 languages, character encoding is a concept every developer should understand, at least at a high level.
Computers are dumb
While there may be over 7,000 languages spoken across the globe1, our fancy computers, stacked with gigabytes of RAM, terabytes of storage capacity, and capable of executing billions of calculations per second, speak only one - binary.
1 or 0, true or false, on or off. Binary is really just a way for us humans to represent the electrical signals computers use to communicate. Your computer has no idea, nor does it care, what a number, letter, or symbol is - all it truly knows is on or off.
Just because my knowledge of Japanese is non-existent doesn’t mean I can’t hold a conversation with a Japanese speaker. On my trip to Tokyo I could always pack an English - Japanese dictionary and translate word-for-word what I’d like to say2. Encoding schemes, or encodings, work the same way, translating bits to characters just as my dictionary translates English to Japanese. And with this, let’s discuss our first encoding - the American Standard Code for Information Interchange, or ASCII. ASCII encodes 128 characters as 7-bit binary integers (8 bits total, with the most significant bit always 0).
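To make that concrete, here’s a small Python sketch (my own illustration, not part of the original article) that prints the ASCII code point and 8-bit pattern for each character in a string; the show_ascii name is just a placeholder.

```python
def show_ascii(text):
    """Print each character's ASCII code point and its 8-bit representation."""
    for ch in text:
        code = ord(ch)  # code point of the character
        if code > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
        # Pad to 8 bits: for ASCII, the most significant bit is always 0.
        print(f"{ch!r} -> {code:3d} -> {code:08b}")

show_ascii("Hi!")
# 'H' ->  72 -> 01001000
# 'i' -> 105 -> 01101001
# '!' ->  33 -> 00100001
```

Each character maps to exactly one 7-bit integer, which is why plain ASCII text can only ever represent those 128 characters and nothing more.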