5.1. DATA REPRESENTATIONS
This section under construction.
All data on digital computers is represented as a sequence of 0s
and 1s. This includes numeric data, text, executable files, images,
audio, and video. The ASCII standard associates a seven bit binary
number with each of 128 distinct characters. The MP3 file format
rigidly specifies how to encode each raw audio file as
a sequence of 0s and 1s.
All data are numbers, and all numbers are data.
In this chapter, we will treat all data as binary numbers;
in Chapter XYZ, we will treat all data as strings of symbols.
Here is a recent study of
how much new information is created each year.
We review how to represent integers in binary, decimal, and hexadecimal. We also review how to convert between bases and do arithmetic. Finally, we describe how to represent negative integers using ``two's complement notation.''
Number systems. There are many ways to represent integers: the number of days in the month of October can be represented as 31 in decimal, 11111 in binary, 1F in hexadecimal, or XXXI in Roman Numerals. It is important to remember than an integer is an integer, no matter whether it is represented in decimal or with Roman Numerals. We are most familiar with performing arithmetic with the decimal (base 10) number system. This number system has been widely adopted, in large part because we have 10 fingers. Other number systems still persist in modern society. The Babylonians divided a circle into 360 degrees since they believed the Sun rotated around the Earth in about 360 days. 360 is more convenient than 365 since 360 has lots of divisors, so it can be divided into 12 months of 30 days each. Ptolemy tabulated trigonometric tables using base 360, and, even today, we still often use degrees instead of radians when doing geometry. Following in this spirit, most clocks are based on the sexagecimal (base 60) number system. Computers are based on the binary (base 2) number system because each wire can be in one of two states (on or off). Writing numbers in binary is tedious since this representation uses between 3 and 4 times as many digits as the decimal representation. The hexadecimal (base 16) number system is often used in machine languages, including TOY, as a shorthand for binary. Base 16 is useful because 16 is a power of 2, and numbers have roughly has many digits as in the corresponding decimal representation. A sequence of digits x = xn, xn-1, ..., x1, x0 in base b integer
x = xn bn + xn-1 bn-1 + ... + x1 b1 + x0 b0.
This representation is called positional notation. The xi terms are the positional digits, and each digit is required to be an integer between 0 and b - 1. In binary, the two digits (also referred to as bits) are 0 and 1; in decimal, the ten digits are 0 through 9; in hexadecimal, the sixteen digits are 0 through 9 and the letters A through F. Every positive integer can be expressed using positional notation, and the representation is unique modulo an arbitrary number of the leading 0's. As an example the number of days in a leap year is 36610 = 1011011102 = 16E16 since:
366 = 3×102 + 6×101 + 6× 100 366 = 1× 28 + 0× 27 + 1× 26 + 1×25 + 0×24 + 1×23 + 1×22 + 1×21 + 0×20 366 = 1×162 + 6×161 + 14×160
The following table gives the binary, decimal, and hexadecimal representations of the first 48 integers.
| BIN | DEC | HEX | BIN | DEC | HEX | BIN | DEC | HEX | ||
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 10000 | 16 | 10 | 100000 | 32 | 20 | ||
| 1 | 1 | 1 | 10001 | 17 | 11 | 100001 | 33 | 21 | ||
| 10 | 2 | 2 | 10010 | 18 | 12 | 100010 | 34 | 22 | ||
| 11 | 3 | 3 | 10011 | 19 | 13 | 100011 | 35 | 23 | ||
| 100 | 4 | 4 | 10100 | 20 | 14 | 100100 | 36 | 24 | ||
| 101 | 5 | 5 | 10101 | 21 | 15 | 100101 | 37 | 25 | ||
| 110 | 6 | 6 | 10110 | 22 | 16 | 100110 | 38 | 26 | ||
| 111 | 7 | 7 | 10111 | 23 | 17 | 100111 | 39 | 27 | ||
| 1000 | 8 | 8 | 11000 | 24 | 18 | 101000 | 40 | 28 | ||
| 1001 | 9 | 9 | 11001 | 25 | 19 | 101001 | 41 | 29 | ||
| 1010 | 10 | A | 11010 | 26 | 1A | 101010 | 42 | 2A | ||
| 1011 | 11 | B | 11011 | 27 | 1B | 101011 | 43 | 2B | ||
| 1100 | 12 | C | 11100 | 28 | 1C | 101100 | 44 | 2C | ||
| 1101 | 13 | D | 11101 | 29 | 1D | 101101 | 45 | 2D | ||
| 1110 | 14 | E | 11110 | 30 | 1E | 101110 | 46 | 2E | ||
| 1111 | 15 | F | 11111 | 31 | 1F | 101111 | 47 | 2F |
Number conversion.
Converting from Decimal. It is slightly more difficult to convert an integer represented in decimal to one in base b because we are accustomed to performing arithmetic in base 10. The easiest way to convert from decimal to base b by hand is to repeatedly divide by the base b, and read the remainder upwards. For example, the calculations below show that 36610 = 1011011102 = 16E16.
Converting between binary and hexadecimal. We describe a fast way to convert from the binary to hexadecimal representation of an integer. In principle, we could convert from binary to decimal, and then from decimal to hexadecimal. The following approach is simpler. To convert from binary to hexadecimal: first, group the digits 4 at a time starting from the right; then convert each group to a single hexadecimal digit, padding 0's to the very last group if necessary.
0011 1010 1110 0111 0001 3 A E 7 1 |
To convert from hexadecimal to binary: convert each hexadecimal digit individually into its corresponding 4 digit binary number, removing any leading 0's.
9 F 0 3 1001 1111 0000 0011 |
These simple procedures work because one base is a power of the other. Likewise, it would be easy to convert between the base 125 and base 5 representations.
Number conversion in Java. Converting a string to another type is called parsing, and can be a significant computational burden, since there are normally numerous cases to consider. We have already used one such built-in static method extensively: Integer.parseInt converts from a string of decimal digits to an integer. The program BinaryConverter.java defines a static method fromBinaryString(s) that converts a string s of binary digits to an integer, and another static method toBinaryString(n) that converts an int n to a binary string. We note that Java's library calls Integer.parseInt(s, 2) and Integer.toBinaryString(n) accomplish the same goal.
Program HexInOut.java reads in a hexadecimal integer from standard input and prints it out in hexadecimal to standard output.
Arithmetic in other number systems.
Addition. In grade school you learned how to add two decimal integers: add the two least significant digits (rightmost digits); if the sum is more than 10, then carry a 1 and right down the sum modulo 10. Repeat with the next digit, but this time include the carry bit in the addition. The same procedure generalizes to base b by replacing the 10 with the base b. For example, if you are working in base 16 and the two summand digits are 7 and E, then you should carry a 1 and write down a 6 because 7 + E = 1616. Below, we compute 456710 + 36610 = 493310 in binary (left), decimal (middle) and hexadecimal (right).
|
|
|
Multiplication. One compelling reason to use positional number systems is to facilitate multiplication. Multiplying two Roman Numerals is awkward and slow. In contrast, the grade school algorithm for multiplying two decimal integers is straightforward and reasonably efficient. As with addition, it easily generalizes to handle base b integers. All intermediate single-digit multiplications and additions are done in base b. Below, we multiply the decimal integers 4,567 and 366 (left) and then again in hex (right).
|
|
Signed and unsigned integers. On a machine with 16-bit words, there are 216 = 65,536 possible integers that can be stored in one word of memory. By interpreting the 16 bits as a binary number, we obtain an unsigned integer in the range 0 through 65,535. Instead, we can interpret the leading bit as the sign of the number, using two's complement notation. This allows us to interpret the 16 bits as a signed integer in the range -32,768 through +32,767, as described in the table below. As with binary integers, it is often convenient to express 16-bit two's complement integers using hexadecimal notation.
| BINARY | HEX | DECIMAL |
|---|---|---|
| 0000 0000 0000 0000 | 0000 | 0 |
| 0000 0000 0000 0001 | 0001 | +1 |
| 0000 0000 0000 0010 | 0002 | +2 |
| 0000 0000 0000 0011 | 0003 | +3 |
| ... | ||
| 0111 1111 1111 1110 | 7FFE | +32,766 |
| 0111 1111 1111 1111 | 7FFF | +32,767 |
| 1000 0000 0000 0000 | 8000 | -32,768 |
| 1000 0000 0000 0001 | 8001 | -32,767 |
| 1000 0000 0000 0010 | 8002 | -32,766 |
| ... | ||
| 1111 1111 1111 1101 | FFFD | -3 |
| 1111 1111 1111 1110 | FFFE | -2 |
| 1111 1111 1111 1111 | FFFF | -1 |
Negating a two's complement integer. To negate (change from positive to negative or negative to positive) a two's complement integer, first complement all of the bits, then add 1. By complement, we mean replace all of the 0's with 1's, and the 1's with 0's. The table below illustrates a few examples.
| DECIMAL | BINARY | COMPLEMENT | INCREMENT | DECIMAL |
|---|---|---|---|---|
| +3 | 0000 0000 0000 0011 | 1111 1111 1111 1100 | 1111 1111 1111 1101 | -3 |
| -3 | 1111 1111 1111 1101 | 0000 0000 0000 0010 | 0000 0000 0000 0011 | +3 |
| +40 | 0000 0000 0010 1000 | 1111 1111 1101 0111 | 1111 1111 1101 1000 | -40 |
| -40 | 1111 1111 1101 1000 | 0000 0000 0010 0111 | 0000 0000 0010 1000 | +40 |
| +366 | 0000 0001 0110 1110 | 1111 1110 1001 0001 | 1111 1110 1001 0010 | -366 |
| 0 | 0000 0000 0000 0000 | 1111 1111 1111 1111 | 0000 0000 0000 0000 | 0 |
| -32,768 | 1000 0000 0000 0000 | 0111 1111 1111 1111 | 1000 0000 0000 0000 | -32,768 |
Number Conversions with two's complement integers.
We describe how to convert between the decimal and two's
complement representations of an integer.
To convert the 16-bit two's complement integer
FE92 into decimal, we start by writing down its
binary representation: 1111 1110 1001 0010.
We recognize it as a negative integer since the most
significant bit is 1. Then, we negate it (flip bits and add 1)
to obtain: 0000 0001 0110 1110. We
convert 101101110 (binary) to 366 (decimal).
After putting back in the negative sign, we obtain the final
answer of -366 (decimal).
To convert from the decimal integer -366 to its 16-bit two's complement representation, we can apply the above steps in reverse. First we convert 366 (decimal) into 101101110 (binary). Next, we negate it (flip bits and add one) to obtain 1111 1110 1001 0010. It is important that we fill in all 16 bits. If desired, we can convert this to hexadecimal: FE92.
Adding two's complement integers. Adding two's complement integers is straightforward: add the numbers as if they were unsigned integers, ignoring any overflow. Below we compute 456710 + -36610 = 420110 in decimal (left) and again in binary using 16-bit two's complement integers (right). Note that the second binary integer represents a negative integer using two's complement notation.
|
|
In the example above, we carry a one into the most significant bit (leftmost bit), and we carry a one out, which we subsequently discard. Despite the apparent overflow, we are left with the correct result. However, there is one situation where this will produce an incorrect answer: if we do not carry a one into the most significant bit, but do carry a one out. Below we compute -32,76610 + (-36610) = -33,13210 using 16-bit two's complement integers. At first, we might be surprised to see that the result of adding two negative integers is a positive integer (to see this quickly, look at the most significant bit of the result). This occurs because there is carry out of the most significant digit, but no carry in to it. In hindsight, we should not be surprised because the true answer -33,132 cannot be represented as a 16-bit two's complement integer.
|
|
Bit-whacking operators in Java. In Java, an int is a 32-bit two's complement integer. Like TOY, Java supports a number of operators to manipulate the bits of integer types, as summarized below. These bit-whacking operators are especially useful when performing low-level data processing such as cryptography, data compression, error correction, and transmitting email. We will also use them in Section 5.5 to simulate the behavior of the TOY machine.
| NAME | OPERATOR | PURPOSE |
|---|---|---|
| bitwise NOT | ~x |
Flip all the bits of x |
| bitwise AND | x & y |
Take the AND of each pair of bits |
| bitwise OR | x | y |
Take the OR of each pair of bits |
| bitwise XOR | x ^ y |
Take the XOR of each pair of bits |
| left shift | x << y |
Move the bits of x to the left y positions |
| right shift | x >> y |
Move the bits of x to the right y positions |
| unsigned right shift | x >>> y |
Move the bits of x to the right y positions |
Left shifting an integer by i bits means that each of the bits in the integer are moved i positions to the left. Zeros are padded to the right end. This is equivalent to multiplying the integer by 2i. For example 43 <3 is 344 and -43 <3 is -344.
43 = 0000 0000 0000 0000 0000 0000 0010 1011 43 <3 =0000 0000 0000 0000 0000 0001 0101 1000=344 -43=1111 1111 1111 1111 1111 1111 1101 0101 -43 << 3=1111 1111 1111 1111 1111 1110 1010 1000=-344
Right shifting an integer means that you shift all of the bits i positions to the right, padding 0s to the beginning if the integer is positive and 1s if the integer is negative. It is equivalent to dividing the integer by 2i and rounding the result down (towards -infinity). For example 43 >> 3 is 5 and -43 >> 3 is -6. For positive integers, this is equivalent to integer division, but for negative integers, the rounding works in the opposite direction.
43 = 0000 0000 0000 0000 0000 0000 0010 1011 43 >> 3 = 0000 0000 0000 0000 0000 0000 0000 0101 = 5 43 / 8 = 0000 0000 0000 0000 0000 0000 0000 0101 = 5 -43 = 1111 1111 1111 1111 1111 1111 1101 0101 -43 >> 3 = 1111 1111 1111 1111 1111 1111 1111 1010 = -6 -43 / 8 = 1111 1111 1111 1111 1111 1111 1111 1011 = -5
An unsigned right shift is identical to a signed right shift when applied to positive integers, but differs when applied to negative integers. An unsigned right shift always pads 0s to the left, regardless of whether the integer is positive or negative. For example, 43 >>> 3 is 5 but -43 >>> 3 is 536870906. To see why, observe that
43 = 0000 0000 0000 0000 0000 0000 0010 1011 43 >>> 3 = 0000 0000 0000 0000 0000 0000 0000 0101 = 5 -43 = 1111 1111 1111 1111 1111 1111 1101 0101 -43 >>> 3 = 0001 1111 1111 1111 1111 1111 1111 1010 = 536870906
Program Shift.java reads in two integers a and b from the command line and computes the above quantities and prints the results.
Big Endian, little endian. Computers differ in the way in which they store multi-byte chunks of information, e.g., the 16-bit short integer 0111000011110010 = 70F2. The two are two primary formats, and they differ only in the order or "endianness" in which they store the bytes. Big endian systems store the most significant bytes first, e.g., they store the integer above in its natural order 70F2. Java uses this format, as does Apple Mac, IBM PowerPC G5, Cray, and Sun Sparc. Little endian systems store the least significant bytes first, e.g., they store the integer above in reverse order 2F07. This format is more natural when manually performing arithmetic, e.g., since for addition you work from least significant byte to most significant byte. Intel 8086, Intel Pentium, Intel Xeon use this format. Computer scientists occasionally engage in religious wars about which is better. Fortunately, Java hides the endianness from the end user, so if you create binary data files in Java, you won't need to worry about endianness when sharing them over the Internet with Java users on Mac, PC, and Solaris platforms. Unless you need to read in a legacy binary file (e.g., written in C on a PC), you shouldn't have to directly confront these details.
Dangers of not recognizing overflow. In September 2004, a computer glitch grounded air traffic in southern California for several hours and left planes without radio communication. The software was running on Windows 2000 Advanced Server. The software relied on a Windows function named GetTickCount(), which returns the number of milliseconds since the system was started. Unfortunately, this value is represented using a 32-bit integer. After approximately 49.7 days, the value overflows. This glitch was known when the system was launched. To avoid the ensuing problems, the documentation recommended manually rebooting the system once a month.
Q + A
Q. Do all programming languages use 32 bit two's complement integers?
A. No. Java is unusual in that it completely specifies the representation of an int. In C, there is no requirement that an int be 32 bits or that it uses two's complement notation to represent negative integers. Different C compilers can represent integers in different ways, and this can lead to incompatibilities when trying to use the same program with different compilers.
Q. My program only needs integers between -32,768 and 32,767. Does using a 16-bit short make my program faster? Does it save space?
A. Typically a single short variable is internally stored using 32 bits (especially if the underlying hardware architecture is 32 bit), in which case it does not save any space to declare a single short. Moreover, it can take longer because the Java system must make the 32 bits behave as if it were representing a 16-bit two's complement integer. On the other hand, if you declare a huge array of type short, then the elements will be packed two to an int, and the short array will use approximately half as much memory as an int array of the same length.
Q. Why do I need to cast to add two variables of type short?
A. Java convert the results of most integer operations to be of type int. If a, b, and c are of type short, then a + b is promoted to type int, and assigning this sum to a short requires an explicit cast. One exception to this rule is if you use +=, in which case the cast is not necessary.
Q. Can I apply bitwise & and bitwise | to boolean values? If so, is there any difference between the corresponding logical operators && and ||?
A. Yes. The difference is that the logical operators are subject to short circuiting: the expression (f(x) && g(x)) will not evaluate g(x) if f(x) is false. This also explains why there is no ^^ operator for logical XOR: it is never possible to short-circuit an XOR, so it would always be identical to bitwise ^.
Exercises
- Convert the decimal number 92 to binary.
Answer: 1100100.
- Convert the hexadecimal number BB23A
to octal.
Answer: first convert to binary 1011 1011 0010 0011 1010, then consider the bits three at a time 10 111 011 001 000 111 010, and convert to octal 2731072.
- Add the two hexadecimal numbers 23AC and 4B80 and give the result in hexadecimal. Hint: add directly in hex instead of converting to decimal, adding, and converting back.
- Assume m and n are positive integers. How many 1 bits are there in the binary representation of 2^(m+n)?
- How many bits are in the binary representation of 2^2^2^2^17?
- IPv4 is the protocol developed in the 1970s that dictates how computers on the Internet communicate. Each computer on the Internet needs it own Internet address. IPv4 uses 32 bit addresses. How many computers can the Internet handle? Is this enough for every human being to have their own? Every cell phone and toaster?
- IPv6 is an Internet protocol in which each computer has a 128 bit address. How many computers would the Internet be able to handle if this standard is adopted? Is this enough? Answer: 2^128. That at least enough for the short term - 5000 addresses per square micrometer of the Earth's surface!
- When you but a hard drive, 1 GB means 1,000 MB (megabytes) and 1 MB means 1,000 KB (kilobytes) and 1 KB means 1,000 bytes. But when you buy memory, 1 GB means 1,024 MB, 1 MB means 1,024 KB, and 1 KB means 1,024 bytes. What percentage difference is there in the amount of storage in a 100 MB hard drive vs. 100 MB memory? 1GB hard drive vs. 1 GB memory? (1024/1000)2 = 4.9% and (1024/1000)3 = 7.4%.
-
Why does the following code fragment fail?
Answer: Java automatically promotes the sum to be of type int. To assign the result to a short, you need to explicitly cast it back c = (short) (a + b). Yes, this is rather quirky.short a = 4; short b = 5; short c = a + b;
-
What does the following code fragment from
program Overflow.java print out?
int a = 2147483647; // 2^31 - 1 int b = a + 1; System.out.println("a = " + a); System.out.println("b = " + b); -
What does the following code fragment print out?
int a = -5 >> 3; int b = -5 >>> 3; System.out.println(a); System.out.println(b);
- List all values a of type int for which (a == (a >> 1)). Hint: there is more than one.
-
What does the following code fragment print out?
int a = 11 & 17; int b = 11 ^ 17; int c = 11 | 17; int d = ~11; System.out.println(a); System.out.println(b); System.out.println(c); System.out.println(d);
-
Given two positive integers a and b, what result does the
following Java code fragment leave in c?
c = 0; while (b > 0) { if (b & 1 == 1) c = c + a; b = b >> 1; a = a <1; }Answer: a * b.
-
What does the following code do to the integers stored in two
different variables a and b?
a = a ^ b; b = a ^ b; a = a ^ b;
- Repeat the previous question, but assume a and b are the same variable.
-
What does the following code do to the integers stored in two
different variables a and b? Any problems with overflow?
a = a + b; b = a - b; a = a - b;
-
What do each of the following to statements do?
x = - ~x; x = ~ -x;
Increment x, decrement x
- Modify Binary.java so that it converts from base 7 to decimal and vice versa.
-
What is the value of cnt after the following loop?
Hint: it's not an infinite loop.int cnt = 0; for (int i = 1; i != 0; i = 2 * i) { cnt++; } -
Explain why the following Java code fragment correctly determines
whether the integer n is a power of 2.
boolean isPowerOfTwo = (n & -n) == n;
Creative Exercises
- Linear feedback shift register. Rewrite LFSR.java to simulate the linear feedback shift register from Chapter 1 using bit-whacking operations.
- Linear feedback shift register cycle length. Modify the program from the previous exercise to compute the cycle length of the LFSR using Floyd's method. What if you change the taps to xyz?
- IP addresses and IP numbers
An IP address (IPV4) is comprised of integers w, x,
y, and z and is typically written
as the string w.x.y.z. The corresponding IP number is given by
IP number = 16777216*w + 65536*x + 256*y + z. Given an IP number, the corresponding IP address is w = (ipnum / 16777216 ) % 256, x = (ipnum / 65536) % 256, y = (ipnum / 256) % 256, z = (ipnum) % 256. [Or use shifting and masking.]
Write a function that takes an IP number and returns a String corresponding to the IP address. Write a function that takes an IP address and returns a int corresponding to the IP number. For example, if the IP number is 3401190660, then the function should return "202.186.13.4".
- IP address. Write a program that takes a 32 bit string as a command line argument, and prints out the corresponding IP address in dotted decimal form. That is, take the bits 8 at a time, convert each group to decimal, and separate each group with a dot. For example, the binary IP address 01010000000100000000000000000001 should be converted to 80.16.0.1.
- Base64 encoding. Base64 encoding is a popular method for sending binary data over the Internet. It converts arbitrary data to ASCII text, which can be emailed back between systems without problems. Write a program to read in a arbitrary binary file and encode it using Base64.
- Counting in base -2.
Use the definition of the positional notation to define the base -2
number system. There are two digits 0 and 1. Count from -7 to 7
in this system.
Answer:
0 = 0 -1 = 11 (-2 + 1) 1 = 1 -2 = 10 2 = 110 (4 + -2) -3 = 1101 (-8 + 4 + 1) 3 = 111 -4 = 100 4 = 100 -5 = 1111 5 = 101 -6 = 1110 6 = 11010 (16 + -8 + -2) -7 = 1001 7 = 11011 System.out.println(ans);
- RGBA color format.
Some of Java's classes (BufferedImage, PixelGrabber) use a special
encoding called RGBA to store the color of each pixel.
The format consists of four integers, representing the red, green, and
blue intensities from 0 (not present) to 255 (fully used),
and also the alpha transparency value from 0 (transparent) to
255 (opaque).
The four 8-bit integers are compacted into a single 32-bit integer.
Write a code fragment to extract the four components from
the RGBA integer, and to go the other way.
// extract int alpha = (rgba >> 24) & 0xff; int red = (rgba >> 16) & 0xff; int green = (rgba >> 8) & 0xff; int blue = (rgba >> 0) & 0xff; // write back rgba = (alpha <24) | (red << 16) | (green << 8) | (blue << 0); system.out.println(ans);
- Find the missing value.
Suppose you have an array consisting of 232 - 1
integers of type int such that no two
integer appears more than once. Since there are 232
possible values, exactly one integer is missing.
Write a code fragment to find the missing integer using
as little extra storage as possible.
Hint: this is a popular interview question. It's possible to do it using only one extra int. Use either properties of integer overflow on two's complement integers or use the XOR function.
- Cyclic redundancy check. Write programs CRC16.java and CRC32.java that read in data from standard input and computes its 16 or 32-bit CRC. Use the library CharStdIn.
- Number conversion.
Write a program Converter.java that
converts between base b and decimal for any 2 &le b ≤ 36. You should have
a static method toString(int n, int b) that converts n to a base b
string and a static method fromString(String s, int b) that converts
from a base b string to an integer.
Consider defining and using
String digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
- Excel column numbering. Write a function that takes a nonnegative integer and converts it into the corresponding Excel column name (0 = A, 1 = B, ..., 25 = Z, 26 = AA, ..., 702 = AAA).
- Elias Gamma coding.
Write a function elias that takes as input an integer N and returns the
Elias Gamma code as a string.
The Elias Gamma code
is a scheme to encode the positive integers. To generate the code for
an integer N, write the integer N in binary, subtract 1 from the number of
bits in the binary encoding, and prepend that many zeros. For example,
the code for the first 10 positive integers is given below.
1 1 6 00110 2 010 7 00111 3 011 8 0001000 4 00100 9 0001001 5 00101 10 0001010
- Bit reversal.
Write a function that takes an integer input, reverse its bits, and
returns that integer.
For example if n = 8, and the input is 13 (00001101), then its
reversal is 176 (10110000).
public static int bitReverse(int input) { int ans = 0; for (int i = 0; i> 1; } return ans; } - Bit-reversal sorting.
Use the previous algorithm to "sort" an array of N = 2n elements
into their bit-reversed order.
Swap elements i and j if i and j are bit reversal of each other.
Such permutations arise in the Fast Fourier Transform.
0 1 2 3 4 5 6 12 13 14 15 0000 0001 0010 0011 0100 0101 0110 ... 1100 1101 1110 1111 0000 1000 0100 1100 0010 1010 0110 ... 0011 1011 0111 1111 0 8 4 9 2 10 6 3 11 7 15
- Swap without temporary storage.
What do the following two code fragments do given integers
a and b?
Answer: each 3-line fragment swaps a and b. It works for two's complement notation.a = a + b; b = a - b; a = a - b; a = a ^ b; b = a ^ b; a = a ^ b;
- Gray codes.
An order n Gray code
is an n-bit encoding of the integers 0 to 2n - 1 such that adjacent numbers
differ in only one bit. The following is a 3-bit Gray code.
000 001 011 010 110 111 101 100 0 1 3 2 6 7 5 4
Gray codes have many applications. For example, they can be used to increase the reliability of counting of analog-to-digital converters. Counting by Gray code reduces the chance of such errors since only one bit of the counter changes in each step. Gray codes have also been used in pulse code communication, the minimization of logic circuits, hypercube architectures, and were even proposed to organize books on library shelves.
The binary reflected Gray code of order n is generated as follows. The code of order one is 0, 1. To get the code of order two, write it forwards then backwards: 0, 1, 1, 0. Then, pre-pend 0's to the first half and 1's to the second half: 00, 01, 11, 10. In general, to get the code of order n, write the code of order n-1 forwards and backwards. Then pre-pend 0's to the first half and 1's to the second half. Write a program GrayCode.java that reads in a parameter N and prints out the order N binary reflected Gray code. Observe that the index of the bit that changes in line i is the disc that is moved in the ith move of our solution to the Towers of Hanoi problem!
- Puzzle. There are n people waiting outside an empty room. Each minute you can send one person into the room or take one person out of the room. Can you arrange it so that every possible combination of people is attained exactly once?
- Bit-whacking version of Gray codes Repeat the previous exercise, but use bit-whacking operations and iteration instead of recursion. Name your program BitWhackingGrayCode.java.
- Gray codes II. Redo the previous exercise but do not use Java's built in string class. Instead, do it with a boolean array and name your program GrayCodeArray.java.
- Combinational Gray Code. Print out all combination of k of n items in such a way that consecutive combinations differ in exactly one element, e.g., if k = 3 and n = 5, 123, 134, 234, 124, 145, 245, 345, 135, 235, 125. Hint: use the Gray code, but only print out those integers with exactly k 1's in their binary representation.
- Free the prisoners I. A warden meets with 17 new prisoners when they arrive. The warden tells them that they may meet today and plan a strategy, but after the meeting, each prisoner will be in solitary confinement and will not be able to communicate with one another. The prison has a switch room with 17 switches that can be on or off, although the initial configuration is not revealed. There is one special setting of the 17 switches that if it is ever achieved will enable the prisoners to go free. Each hour the warden escorts one prisoner to the switch room, where the prisoner can flip at most one switch (from on to off or off to on). The warden can choose prisoners in arbitrary order, so one prisoner may be chosen four times in a row, or not at all. Design a strategy for the 17 prisoners so that they are guaranteed to be set free after some finite amount of time.
- Free the prisoners II. Same premise as above, except that the switch room has 2 switches (initially both off), and a prisoner must flip exactly one of the two switches upon entering the switch room. At any time, a prisoner may declare "all 17 of us have visited the control room." If it is true, all prisoners are freed; otherwise they are all executed. The warden can choose prisoners in arbitrary order, so one prisoner may be chosen four times in a row, but each prisoner will be chosen infinitely often (assuming they are never freed). Design a strategy for the 17 prisoners so that they are guaranteed to be set free after some finite amount of time. Extra credit: don't assume the initial configuration is known.
- Count the number of 1 bits.
Write function that takes an integer input and returns the
number of 1's
in its binary representation.
Answer:
public static int bitCount(int input) { int count = 0; for (int i = 0; i <32; i++) count=count + (input>>> i & 1); return count; } - Sparse bit-counting.
Explain why the following function (that often appears in job interviews
for programmers)
correctly counts the number of 1 bits in the binary representation of its input.
If the input has k 1's, how many times does the while loop iterate?
public static int bitCount(int input) { int count = 0; while (input != 0) { count++; input = input & (input - 1); } return count; } - Table lookup bit-counting.
Repeat the previous exercise, but pre-compute a table to speed up
the computation.
Answer: this one assumes you have a precomputed table of size 256, with bits[i] storing the number of 1 bits in the binary representation of i. You can use the bit counting function from the previous exercise to initialize it. previous
public static int bitCount(int input) { return bits[(input >> 0) & 0xff] + bits[(input >> 8) & 0xff] + bits[(input >> 16) & 0xff] + bits[(input >> 24) & 0xff]; }Increasing the table size to 216 = 65,536 will make things faster assuming you have sufficient memory. A table of size 232 is likely prohibitive.
- Subset sum. Given N integers and a target value V, your goal is to find a subset of the integers whose sum is as close to V as possible. Write a program SubsetSum.java that takes N as a command line input, reads in N integers from standard input, and print out the best subset and sum. Enumerate all 2^N subsets by brute force, using an integer variable to count from 0 to 2^N - 1, and using bit-whacking to calculate the subset sums.
- Faster subset sum. Repeat the previous exercise, but enumerate the 2^N subsets using a Gray code.
- Even faster subset sum. Divide the elements in half. Enumerate the subset sums in each half, sort each half, and compare.
- Exact subset sum. Divide the elements in half. Enumerate the subset sums in the first half and insert into a hash table, enumerate the subset sums in the second half and check whether the complementary value is in the hash table.
- Dictionary attack. One method that sleazy spammers use to auto-generate email addresses is by enumerating all possible email addresses at a give domain, e.g., hotmail.com. This annoying tactic is called a dictionary or Rumpelstiltskin attack and explains why you sometimes receive spam on a new email address to which you haven't given to anybody. Use Converter.java to design such a program. Your program Rumpelstiltskin.java should take a command line parameter N and print out all 36N possible passwords of N or fewer characters involving numbers and uppercase letters.
- Breaking a gold chain. You have a gold chain with 14 links that you are going to use to pay an worker for 15 days at a fee of 1 gold link per day. It's possible to split the chain into 15 pieces by cutting 14 times. Your goal is to pay the worker while only breaking the chain 3 times. The worker must receive exactly the right fraction of total payment after each day of work. Hint: break the chain so there are pieces of 1 section, 2 sections, 4 sections, and 8 sections.
- Hamming encoder. Write a Java program HammingEncoder.java that reads in a sequence of 0s and 1s, 4 bits at a time, and encodes them using Hamming codes.
- Hamming decoder. Write a Java program HammingDecoder.java that reads in a sequence of 0s and 1s encoded using Hamming codes, 7 bits at a time, and decodes and correct them.
- Hamming codes. Modify your solutions to the previous two exercises so that the input bits are packed 8 to the byte.
- Absolute value. The constant Integer.MIN_VALUE is the most negative 32-bit two's complement integer. What is Math.abs(Integer.MIN_VALUE)?
- Prove that a k-digit decimal number can be represented in binary with no more than 4k bits.
- CD Database.
CDDB and freedb
are databases that allow you to look up
CD information on the Web and display the artist, title, and song name.
Each CD has a (nearly) unique disc ID number which is used to
query the database.
- Write a static method sumDigits() that takes an integer parameter and returns the sum of the decimal digits in the integer. For example, sumDigits(6324) returns 15 since 6 + 3 + 2 + 4 = 15.
-
Write a program CDDB.java that
computes the disc ID from a list of lengths of the track lengths.
The 32-bit (8 hex digit) ID number is computed from the length of
the tracks on the CD and the number of tracks as follows:
XXYYYYZZ XX = checksum of track offsets in seconds, taken mod 255 YYYY = length of the CD in seconds ZZ = number of tracks on the CD
