Jim and I talked about a range of design topics and a bit about how today’s Internet works in the seminar this week, and I thought I’d write my blog in the same “scattered” fashion. While at first seemingly scattered and long, if you bear with me, we might get to the second half of this blog’s title. Ready?
I’ll start with a pointer to a Wired article about DNS hijacking. On Monday, we discussed, at a high level, how the Domain Name System (DNS) on the Internet helps people to get to web sites (or other resources on the Internet) by converting easy to memorize domain names (e.g., college.harvard.edu) to the numerical IP addresses (i.e., 184.73.172.23 in my example). The IP address is simply a standardized name space that works like U.S. postal addresses. For example, the address of Pinocchio’s Pizza & Subs shop is 74 Winthrop St., Cambridge MA. You’d find it by traveling to Massachusetts and then to the Massachusetts’s town of Cambridge. In Cambridge, you’d look for Winthrop Street and hope you’d find a building with the number 74 on it. IP addresses work the same way. The first numbers take you to a particular part of the network, and the last number corresponds to some particular machine (house) on the identified subnet (street). The Wired article talks about how adversaries have attacked this system to cut certain resources out of the Internet for a short time.
Unfortunately, adversaries are tougher to defeat than Mother Nature. So let’s stick with getting around the problems that Mother Nature throws at us for the rest of this blog post.
While Jim and I were repeatedly expressing our deep admiration for the Saltzer et al paper this week, we somehow stumbled into a discussion of bit-error detection and correction. I’m worried that I glossed over things too quickly. So, here’s a bit more explanation of how this stuff works. There’s a whole science behind this, but I’ll attempt to just give you a flavor of how it works and why it is powerful.
As we discussed, you can detect a single bit error by adding a parity bit to any group of bits. For example, if I wanted to protect a 4-bit data string against single bit errors while the packet was in transmission, I would add a single bit to the data string (creating a 5-bit packet), and the value of this fifth bit would be the sum of the other four bits modulo 2. As described, I’ve added what’s called an even parity bit to my packet. It’s called this because the addition of the parity bit makes the number of ones in the 5-bit packet even (or zero). Lots of things in the digital world have duals, and the same is true here for we could also create an odd parity bit that would make the number of ones in the packet odd. Let’s stick with even parity.
How does this work in practice? The machine sending the original 4-bit data string computes the even parity bit, appends it to the end of the four bits of data, and sends a 5-bit packet. The receiving machine grabs the 5-bit packet and computes the packet’s parity (over all 5 bits). If the result of this computation is zero, the first 4 bits are the data it wanted without any single-bit errors. If the result is one, then there was an error in transmission and the packet needs to be resent.
Notice that we can’t correct the error without retransmission because we don’t know where the error occurred in the 5 bits we received. One of the 1s was flipped to a 0 or a 0 was flipped to a 1, but all we know is that a bit was flipped somewhere.
Of course, this scheme isn’t guaranteed to work for 2 bit errors. If you flip any two bits, the effect of their flips cancels each other out in the parity calculation. I’ll let you figure out what happens when you have more than 2 bit errors.
We, however, were interested in fixing (Mother-Nature-type) errors in the network so that you could create a reliable network and remove the need for retransmissions. To explore this question, let’s assume that your network never has anything more than a single bit error in a packet no matter what its size (an unrealistic, but pedagogically interesting situation). What would I have to do to guarantee that the receiving machine had enough information to correct any single bit error in the packet?
Well, you could send the same data string twice in a single packet, which guarantees that you would get a clean copy of the data bits. Unfortunately, if the two received copies don’t match, you don’t know which copy of the duplicated data bits is the one you want. You simply made a very space-expensive, single-bit error detector that not only tells you that an error occurred, but where it occurred in the string of data bits.
Ok, that design failed, but fear not. We can always try again. Let’s try sending three copies of the data bits in a single packet. Now in the case of a single bit error in the packet, we can use a simple voting mechanism to decide what is the correct string of data bits. Success! The cost of this success was that we need a packet three-times longer than the data we wanted to send. Hmmm, maybe we can do better.
This is where the work of Richard Hamming comes in. Hamming did pioneering work in the middle of the last century on error detection and how to build efficient error correcting codes. Continuing with our example of wanting to send 4 data bits over a network, Hamming showed that you could add just 3 parity bits to the 4 data bits (creating a 7-bit packet) and correct any single bit error that occurs in the entire 7-bit packet. That’s less space overhead than we added in our first duplication scheme that failed to achieve error correction! More impressive, the overhead of Hamming’s approach decreases very quickly as you send longer and longer strings of data bits. For example, when we send 4 data bits in a 7-bit packet, the efficiency of the transmission is (4/7 =) 0.571. If we wanted to send 247 data bits, we’d need only 8 parity bits to cover all (247+8 =) 256 bits in the packet for a (247/256 =) 0.969 efficiency! Nearly every bit sent is a useful data bit! That’s the power of exponentials and clever encodings.
To see exactly how this works, I encourage you to look at the example on the Hamming(7,4) Wikipedia page. The matrix arithmetic might look intimidating, but it’s just a mathematical way of saying that the three parity bits looked at as a 3-bit number can cover all 7 locations in our 7-bit packet, allowing us through clever encoding to know exactly which bit was flipped, if one was. If no bit was flipped (i.e., no single bit error occurred), the eighth encoding of the three parity bits means no single bit error occurred.
In class, I said we needed to add only three parity bits to correct any single bit error in eight data bits. Off by one error! You clearly need four parity bits to correct a single bit error in eight data bits (a 12-bit packet). Please fix your notes if you took some.
I want to bring us back to design as I end this blog post. I often cold call on some of you during our seminar asking, for example, “Yasmin, what would you do to make this work?” In asking this, I’m asking a design question, and design questions are, at first, hard. Why? Because design questions have no right answer. (Please go back and read the previous line again.) And because they have no right answer, you can answer any way you like! The worst thing that will happen is that you’ll describe something that doesn’t do what we hoped it would do, as I did above in the doubling-the-data-string example. No problem with that. Maybe we’ll design something cool for a different application, or we’ll learn something and restart our design process a bit smarter.
Now, most of you, probably, have never done any significant design. That’s ok. More importantly, in my humble opinion, the only way to learn to do design is to do it. How might you start doing it? (How might you answer my question to Yasmin?) Well, you know where you want to go and so just take a small step in that direction. Suggest something that sounds like it moves us in the desired direction and maybe give the reason why you think it moves us in the right direction. We did that above. With enough practice, you’ll see patterns in design that lead to fundamental advances like Hamming did, and like Saltzer and his colleagues did.