Modular arithmatic is a mathematical system discovered by K.F. Gauss (1777 - 1855) in 1801. It is referred to as the Gaussian number system (our normal system is also known as the Euclidean system).
The Gaussian system is composed of a set of numbers, where if the greatest number is exceeded, the system loops back to the first number.
Two numbers are said to be congruent within the Gaussian system when the difference between them is exactly divisible by N.
| N | (a-b) = 0 | Where N represents the mod of the system, and a and b represent two congruent numbers in the system. |
| N mod (a-b) | = 0 |
| 5 mod (7-2) | = 0 |
| 5 mod 5 | = 0 |
The accepted practice
is to write modular numbers in square brackets indicating the mod set by
a subscript:
Negative numbers complicate things somewhat as the are harder visualize. The trick is to remember that when counting backwards, after 0 comes the highest number of the set. Backwards counting is shown for the mod 5 set.
The set of mod 5: {0, 1, 2, 3, 4}
5 mod
5 = 0
4 mod
5 = 4
3 mod
5 = 3
2 mod
5 = 2
1 mod
5 = 1
0 mod
5 = 0
-1 mod 5 =
4
-2 mod 5 =
3
-3 mod 5
= 2
Subtraction works almost identically to addition, if we carry our rules over from the Euclidean system.
if a (mod N)
= b (mod N) [a]N = [b]N
and c (mod
N) = d (mod N) [c]N = [d]N
then (a-c) (mod N) = (b-d) (mod N) [a-c]N = [b-d]N
Examples:
[4]5 [0]5 = [4]5
[8]5 [4]5 = [4]5
[2]5 [4]5 = [3]5
[16]5 [3]5 = [3]5
[4]5 [12]5 = [2]5
Again, you can use your calculator for larger values
Try [120]13
[5438]13
The answer should
be [12]13
1) Zero element
0 + a = a
0 * a = 0
2) Unit element
1 * a = a
3) Communtavity
a + b = b + a
a * b = b * a
Examples:
[3]5 * [2]5
= [3 *2]5
= 5 mod 5
= [0]5
[32]7 * [8]9
= [32 * 8]9
= 256 mod
9
= [4]9
[42]7 * [12]7
= [42 * 12]7
= 504 mod
7
= [0]7
The last example
illustrates a difference between the Euclidean and Gaussian systems, zero
divisors. Zero divisors is a set of numbers which produce the result of
zero when multiplied together. In this example, 42 * 7 resulted in 0 making
them a zero divisor set. In the Euclidean system, only multiplying something
by zero will result in zero. For each Gaussian system, there is a unlimited
set of zero divisors.
Try making multiplication tables for the mod set of 4, 5 and 6.
1) Relative Primality
The number M is said to be relatively prime to N if there are no common factors between the two numbers.
[M]N
A number M in the Gaussian system, which is relatively prime to N, has special properties compared to a number that is not relatively prime.
Examples:
[2]4 is not
relatively prime
[5]3 is relatively
prime
2) Multipicative Inverses
The multipicative inverse of a x, is a number that when multiplied by x equal multipicative inverse.
Examples of Multipicative inverses:
[4]5 * [4]5 = [1]5
[2]5 * [3]5 = [1]5
3) Relative Primality and Multipicative Inverses
[M]N
In the Gaussian system, a number M, which is relatively prime to N, will have a multipicative inverse.
Examples:
[2]3 * x =
1
Since
[2]3 is relatively prime, a multipicative inverse exists.
x = [2]3
or [5]3
..
[2]4 * x =
1
Since
[2]4 is not relatively prime, no multipicative inverses exist.
4) Euclidean
Algorithm
The simplest way of determining the GCD (greatest common divisor) is the factor two numbers and pick out the common factors.
Example:
Find the GCD
of 36 and 16.
16
2 8
2 4
2 2
16 = 2 x 2 x 2 x2
36
2 18
2 9
3 3
36
= 2 * 2 * 3 *3
The common
factors are (2 * 2) which equals 4.
An alternative method to finding the GCD is by using the Euclidean Algorithm. The Euclidean Algorithm operates on several simple steps for the set (m, n):
1) Put the highest
number on the left hand side of the set
2) If n = 0, the
GCD is m -> Exit
3) Subtract the
maximum number of ns from m
4) Swap m and n
5) Go back to step
two
Example:
Find the GCD of 36 and 12.
(36, 16)
(36 (16
*2), 16)
(4, 16)
(16, 4)
(16, 4)
(16 (4
*4), 4)
(0, 4)
(4, 0)
The GCD is
4.
5) Implementation
of the Euclidean Algorithm
The Euclidean Algorithm can be easily implemented both iterative and recursively in C++. Below is the iterative implementation:
Download the GCD algorithm source code
6) Euclidean Algorithm and Relative Primality
[M]N
A number in the Gaussian system is relatively prime if no common factor exists between M and N. This means that the GCD of these two numbers will be 1. We can use the Euclidean Algorithm to determine if a Gaussian number has a multipicative inverse.
Example:
[0]6 GCD(0,6)
= 6 therefore no inverse exists
[1]6 GCD(1,6)
= 1 therefore a inverse exists
[2]6 GCD(2,6)
= 2 therefore no inverse exists
[3]6 GCD(3,6)
= 3 therefore no inverse exists
[4]6 GCD(4,6)
= 2 therefore no inverse exists
[5]6 GCD(5,6)
= 1 therefore a inverse exists
7) The Extended
Euclidean Algorithm
The Extended Euclidean
algorithm is simply the Euclidean algorithm expanded to store several values
in each iteration of its loop. The method used to dervice this method is
beyond the scope of this document, however the results are very important.
The Extended Euclidean algorithm expresses the gcd of x and y as whole
number coefficients:
gcd = kx + ly Where
gcd is the greatest common divisor of x and y,
And k and l are
whole number coefficients
One of these whole number coefficients is zero or negative
These coefficients are found as follows:
Set a=1, b=0, c=0,
d=1
Loop while y!=0
and x!=0
If x<=y
l = y mod x
k = (y-l) / x
c = c (k*a)
d = d (k*b)
y = l
Else
l = x mod y
k = (x-l) / y
a = a (k*c)
b = b (k*d)
x = l
If x=0
gcd = y
k = c
l = d
Else
gcd = x
k = a
l = b
The above method
is most suited to a computerized calculation method because of the number
of variables used, however it is possible with paper and pencil.
8) The
Extended Euclidean Algorithm (substitution method)
An easier method exists for paper and pencil, this is the substitution method. The example below will use the numbers 54 and 21.
Recall the steps
for finding the gcd using the Euclidean Algorithm.
Now express each
of the steps using:
x = (k *
y) + remainder
The remainder is simply x mod y. k can be found by dividing x minus the remainder by y.
54 = (2 *
21) + 12
21 = (1 *
12) + 9
12 = (1 *
9) + 3
9 = (3 *3) + 0
The last non-zero remainder, 3 is the gcd.
Now re-arrange all these equations to solve for the remainder.
12 = (1 *
54) (2 * 21)
9 = (1 * 21) (1 * 12)
3 = (1 * 12) (1 * 9)
0 = (1 * 9) (3 * 3)
Now we want to express the gcd in the form gcd = kx + ly, only this time we can use 12 and 9 as x and y. This is expressed by the second to last equation above.
3 = (1 * 12) (1 * 9)
Now we use the next equation, and substitute it into the above equation.
Sub 9 = (1
* 21) (1 * 12)
Into 3 =
(1 * 12) (1 * 9)
3 = (1 * 12)
(1 * ( (1 * 21) (1 * 12) ) )
3 = (2 *
12) (1 * 21)
Next Sub
12 = (1 * 54) (2 * 21)
Into
3 = (2 *12) (1 * 21)
3 = (2 * (
(1 * 54) (2 * 21) ) (1 * 21)
3 = (2 *
54) (5 * 21)
So, now we have expressed the gcd with kx + ly, solving the algorithm
9) Implementation of the Extended Euclidean Algorithm
The Extended Euclidean
Algorithm is best implemented by iteration due to the number of variables
used during the calculation.
Download Extended Euclidean Algorithm
source code
10) The Extended Euclidean Algorithm and Multipicative Inverses
The Extended Euclidean Algorithm can be used to find the Multipicative Inverse of a Gaussian number. Given that
gcd = kx + ly Where
gcd is one,
And x and y represent
[x]y of a Gaussian number
Then [k]y is the multipicative inverse of [x]y.
Example:
Find the multipicative inverse of [2]5
Using the
Extended Euclidean Algorithm we find that:
1 =
(-2 * 2) + (1 * 5)
So [2]5 * [-2]5 equals 1
It is a good idea to always express the multipicative inverse as a Gaussian number within the mod set.
Example:
Find the multipicative inverse of [2]13
Using the
Extended Euclidean Algorithm we find that:
1 =
(-6 * 2) + (1 * 13)
So [2]13 * [-6]13 equals one
However, since [-6]13 equals [7]13 [2]13 * [7]13 is equivalent
1) Single parity check
A common and simple method of error checking is the single parity check. A check bit is used to create a even or odd number of the digit 1 in rows (or columns) of binary data.
Examples: (even
parity)
1 0 1 1 0 0 0 1
?
Check bit
0 1 0 1 0 0 1 1
0 0 0 0 0 1 0 1
1 0 1 1 0 0 0 1
Examples: (odd parity)
1 1 0 1 0 0 1 1
1 0 0 0 0 1 0 1
0 0 1 1 0 0 0 1
2) Double parity check
A double parity check makes use of a parity check on each row of data, as well as the columns of data. This is only useful during synchronous transfers where a large known amount of data is being transferred.
Example: (even parity) (D Data, P Parity bits)
P D D D D D
D D
D 0 1 0 1 0 0 1
1
D 0 0 0 0 0 1 0
1
D 1 0 1 1 0 0 0
1
D 1 0 0 1 1 0 1
0
D 0 1 0 0 1 0 1
1
D 1 0 0 0 0 0 0
1
D 1 1 1 0 0 1 0
0
P 0 1 0 1 0 0 1
1
3) Evaluation of Check Bits
Check bits are useful for detecting single and multiple bit errors when these errors are independent, however most transmission errors occur in chunks, affecting a large number of bits. Check bits can be easily fooled by these types of error.
4) Checksums
A check sum is another method of detecting errors. It calculates a sum for a row (or column) of data usually using modular arithmetic. A checksum is typically added to the end of a row of data and calculated using a mod-256 set. A checksum is typically used to make a row of data equal zero when it is added using mod-256.
Example:
146 114 96
49 146 160 155 37 121
?
??????????Checksum
If you sum the row using mod-256, it will equal zero.
1) The Fletcher Checksum
The Fletcher Checksum is a slightly better checksum resulting in two digits. It is calculated using the following method:
c[0] = The first
checksum digit
c[1] = The second
checksum digit
i = The loop variable
b[0]
b[n] = The
data the checksum is being calculated from
1) Set C[0] = 0,
and C[1] = 0. The final checksum value will be stored in these
2) Loop from 0 to
n
1) C[0] = C[0] +
b[i]
2) C[1] = C[1] +
C[0]
Example:
43 54 214 165 157 166 175 157 107 112
Where the
last two digits are C[0] and C[1]
The Fletcher checksum
can also be used as a hash algorithm.
2) The Division
/ Remainder hash algorithm
The Division / Remainder hash algorithm is a simple and common hash algorithm. It operates by summing a set of data and taking the mod-set of the data. It is typically used for a hashed search (see next section) so the mod-set used is typically determined by the searching method.
Example:
Word A K E Y
Value 1 11 5 25
We can represent each of these values by a 5 digit binary number
Word A K E Y
Value 00001 01011
00101 11001
Therefore our
number is: 00001010110010111001
Which equals
44217 in decimal.
If we were
wanted a maximum of 101 values, then we would use:
44217 mod
101 = 80
So the hash code for AKEY is 80.
One potential problem
is that for a string with more characters, it might be impossible to find
the value of the key word without using special large number algorithms.
The solution is to calculate the hash code one character at a time instead
of for the entire word. By using Horners method we can create the following
method to calculate a hash code.
h = The hash code
i = The loop variable
b[0]
b[n] = The
data the checksum is being calculated from
1) h = b[n]
2) Loop from n-1
to 0
h = ( (h*32)
+ b[i]) mod M
Searching using
Hash codes
1) Idea behind searching with hash codes
The idea behind Hashing is to reduce the time required to search for a large data record by only comparing a smaller value. This smaller value is the hash code of the data record and is calculated by a hash function. The hash code is typically used to store the data (or a pointer to the data) in an ordered table of some sort.
To start with we create a hash function, which calculates the hash code for the data. Ideally, each set of data should produce a unique hash value, however, in the real world this is not possible. Hence, a collision-resolution process is required to deal with two sets of data producing the same hash code.
2) Separate Chaining
Separate Chaining is by far the simplest method of dealing with collisions. If we have a hash function that can create a maximum of M hash values, we create an array of M linked lists. For each set of data that is hashed, it is stored in the linked list of the corresponding array element for its hash value.
Example:
H A S H I N G I S
G O O D
8 1 8 8 9 3 7 9
8 7 4 4 4
Produces:
The separate chaining
method is most useful if the number of elements is unknown in advance.
This method does use a lot of overhead, and requires careful programming
to implement.
3) Linear Probing
The Linear Probing method stores N hash values in a table of size M > N. The table size must be greater than N because empty spaces are used to deal with collisions. For every piece of data that is to be added, the hash code is calculated. If the position in the table corresponding to the hash code is empty, the data is added to that position. If it is empty, then the data is added in the next empty element.
Example:
A S E A R C H I N
G E X A M P L E
1 0 5 1 18 3 8 9
14 7 5 5 1 13 16 12 5
Produces the following table:
0 1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16 17 18
S A A C A E E G
H I X E L M N P R
Legend:
Bold indicates the element just added
Double
Strikethrough indicates elements jumped due to a collision
Start with
blank table
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
Insert A (Hash
Code = 1)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
A
Insert S (Hash
Code = 0)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A
Insert E (Hash
Code = 5)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A
E
Insert A (Hash Code
= 1)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A A
E
Insert R (Hash
Code = 18)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A A
E
R
Insert C (Hash
Code = 3)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A A C E
R
Insert H (Hash
Code = 8)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A A C E
H R
Insert I (Hash
Code = 9)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A A C E
H I R
Insert N (Hash
Code = 14)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A A C E
H I N R
Insert G (Hash
Code = 7)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A A C E
G H I N R
Insert E (Hash
Code = 5)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A A C E
E G H I N R
Insert X (Hash
Code = 5)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A A C E
E G H I X N R
Insert A (Hash
Code = 1)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A A C A E E G
H I X N R
Insert M (Hash
Code = 13)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A A C A E E G
H I X M N R
Insert P (Hash
Code = 16)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A A C A E E G
H I X M N P R
Insert L (Hash
Code = 12)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A A C A E E G
H I X L M N P R
Insert E (Hash
Code = 5)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A A C A E E G
H I X E L M N P R
To find an element, you calculate the hash value of the search key, and go to that position in the table. If the element isnt present there, then you continue to probe the next position until you find the element, or you hit a blank element in which case the element is not present.
There are a number of rules for deciding on the size of the table for x number of elements. The most important ones are that the size of the table should be prime (relative primality since were dealing with modular arithmetic), and that it should be at least one element larger than the number of being stored. A table that is only one element larger is very inefficient, however, so a much larger table is desired.
A simple C++ implementation for linear probing is on our web site. The simple remainder hash function is used to calculate the hash code. No safety check is included to check if table is full, in which case an infinite loop would occur if another item were added, or an unsuccessful search was conducted.
4) Double Hashing
Although the idea behind linear probing is good, clustering can occur very easily. This is when large amount of data clusters together resulting in very large searches to find data. Double Hashing is an extending of linear probing which solves this problem. If a collision occurs in double hashing, instead of incrementing the table position by one, it increments it by another hash code. This results in spreading out of data.
The choice of the second hash function is very important, as otherwise clustering can still occur. The primary criterion is that the two hash functions must be independent, not just modifications of the same type of function. However, at the same time efficiency must be considered.
Example:
A S E A R C H I N
G E X A M P L E
1 0 5 1 18 3 8 9
14 7 5 5 1 13 16 12 5
7 3 3 7 6 5 8 7
2 1 3 8 7 3 8 4 3
The first row of the table is the primary hash codes, the second row contains the hash codes if a collision occurs.
Start with
blank table
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
Insert A (Hash
Code = 1)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
A
Insert S (Hash
Code = 0)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A
Insert E (Hash
Code = 5)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A
E
Insert A (Hash
Code = 1, Second Hash Code = 7)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A
E A
Insert R (Hash Code
= 18)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A
E A R
Insert C (Hash Code
= 3)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A C
E A R
Insert H (Hash Code
= 8, Second Hash Code = 8)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A C
E A H R
Insert I (Hash Code
= 9)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A C
E A I H R
Insert N (Hash Code
= 14)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A C
E A I N H R
Insert G (Hash Code
= 7)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A C
E G A I N H R
Insert E (Hash Code
= 5, Second Hash Code=3)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A C
E G A I E N H R
Insert X (Hash Code
= 5, Second Hash Code=8)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A C
E G A I E X N H R
Insert A (Hash Code
= 1, Second Hash Code=7)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A C
E G A I E X N A H R
Insert M (Hash Code
= 13, Second Hash Code=3)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A C
E M G A I E X N A H R
Insert P (Hash Code
= 16, Second Hash Code=8)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A P C E
M G A I E X N A H R
Insert L (Hash Code
= 12)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A P C E
M G A I E L X N A H R
Insert E (Hash Code
= 5, Second Hash Code =3)
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15 16 17 18
S A P C E
M G A I E L X N A H E R
To find an element, you calculate the hash value of the search key, and go to that position in the table. If the element isnt present there, then you advance sequentially by the second hash code until you find the element, or hit a blank in which case the element is not present.
The Double Hashing method has been mathematically proven to be much more efficient than Linear Probing. The average number of probes is less than five for an unsuccessful search if the table is less than 80% full, and for a successful search if the table is less than 99% full. This means that a smaller table can be used to get the same search times as using Linear Probing.
A simple Double Hashing C++ class is included on our web-site. Again no test is made for a full table so caution must be used when using this class.
5) Comparison of Hash Methods
Any search using hashing should decrease search times, however, some of the collision methods are more efficient than others. To decide on which method to use you must balance several factors. The Separate Chaining method is very effective in dynamic situations when the number of elements is unknown, however its implementation is much more complicated than either of the open-addressing methods (Linear Probing and Double Hashing). The open-addressing methods have their own problems. An approximate number of elements needs to be known, if underestimated the table will seize to work, if overestimated a lot of memory will be wasted. Elements cannot be easily deleted, it will mess up the searching functions. Double Hashing is generally accepted to be more efficient than Linear Probing for larger amounts of data assuming that appropriate hash functions are chosen.
Random Number Generation
1) Linear Congruential Method
This is a very well known method of generating random numbers and is very popular. It is also very simple. It is shown below:
To fill an array (a) with N random numbers: (i is loop variable)
1) a[0] = seed
2) loop from 1 to
N
1) a[i] = (a[i-1]
* b +1 ) mod m
Where seed, b and m and constants.
Much mathematical research has been done as to the ideal values to set b and m to. Several rules have been presented.
1) m should be large
2) b should be one
digit less than m
3) b should have
no pattern in its digits expect;
4) b should end
with x21, where x is even
D.E. Knuth who did considerable research into the generation of random numbers using this method created these rules.
A C++ implementation of this generator can be found on our web-site.
Robert Sedgewick, Algorithms, second edition, Addison-Wesley, 1989
Frank Beatrous, A Short Course in Cryptography, http://fennario.math.pitt.edu/~frank/crypto/
Sherry Mayo, Some
mathematical ideas from modular arithmetic used in RSA,
http://www.momentus.com.br/PGP/doc//modulus.html
Danny Eizips and
Teddy Flatau, Error Control in Data Communications,
http://www.rad.com/networks/1994/err_con/err_ctl.htm
R. Jain, A Comparison
of Hashing Schemes for Address Lookup in Computer Networks, IEEE
Transactions
on Communications, Vol. 40, No. 3, October 1992, pp. 1570-1573.
Steve Linton, Cryptology
Lecture,
http://www-theory.dcs.st-and.ac.uk/~sal/school/CS3010/Lectures/forhtml/node4.html
Applications of
Modular Arithmetic