Assignment #7.

Goal

The goal is to empirically verify the performance of a hash table with two different probing strategies. Starting with the author’s Quadratic Probing hash table, you must modify it so as to use (a) linear probing and (b) random probing, and in both cases to report the average number of probes. The latter is defined as the total number of times a table location is calculated, i.e., including the one calculated by the hash function.

After being sure that your average number of probes is being calculated correctly, empirically verify the performance vs. load factor lambda. See Figure 5.12 in the text.

Notes:

1. Use words from words.txt under downloads on the course Web page. A function readLines() is provided under downloads to read these words into an array in your driver. (Note: the words file has one word per line, so word and line are used interchangeably here.) The file has about 25,000-26,000 words in it. After the words are stored in the array, they need to be inserted into your hash table. This should be done randomly (there’s a lot of randomness in this assignment!).

2. Figure 5.12 has several curves. For this assignment, we are interested only in the ones labeled U,I. These curves give the expected number of probes required to do a single unsuccessful search or an insertion.

3. To verify a point on one of the curves, you have to measure the expected number of probes that it takes to insert a single item into a hash table that is already “loaded” to a load factor lambda. Naturally, you should be picking words at random to do test insertions, so you should expect some variability in the number of probes it takes to insert each one. Therefore, to get meaningful answers, you have to insert several and average the number of probes. A minor problem is that each insertion you do will change the load factor. However, with a large table this effect should be small enough to be ignored. That is, if you have 25,005 words and create a table of size 50,000, you will have a load factor of 0.5 after inserting 25,000 of them. Inserting 5 or 10 more will change the load factor very little.

 4. The author’s hash class implements rehashing. Since this creates an entirely new table when the load factor gets over 50%, you will have to disable this feature for this assignment.
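As a reference for notes 2 and 3, the U,I curves can be tabulated from the usual textbook estimates: roughly 1/(1 - lambda) probes for random probing and (1/2)(1 + 1/(1 - lambda)^2) for linear probing. The sketch below assumes those are the formulas behind Figure 5.12 (check them against the text before relying on them); the function names are made up for illustration. It also shows how little 10 extra insertions move the load factor in the example from note 3.

```python
def expected_probes_random(lam):
    """Estimated probes for an insertion or unsuccessful search, random probing."""
    return 1.0 / (1.0 - lam)


def expected_probes_linear(lam):
    """Estimated probes for an insertion or unsuccessful search, linear probing."""
    return 0.5 * (1.0 + 1.0 / (1.0 - lam) ** 2)


def load_factor(items, table_size):
    """Load factor lambda = number of stored items / table size."""
    return items / table_size


if __name__ == "__main__":
    for lam in (0.1, 0.5, 0.9):
        print(lam, expected_probes_random(lam), expected_probes_linear(lam))

    # Drift in lambda caused by 10 test insertions into the table from note 3.
    before = load_factor(25000, 50000)
    after = load_factor(25010, 50000)
    print(before, after, after - before)
```

Your measured averages should scatter around these values, not match them exactly, because each average is taken over only a handful of insertions.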

 

Suggested experiment plan:

Random Probing

1.  Begin by implementing random probing for your closed hash table. Test it thoroughly. Note that random probing requires a random permutation of size tableSize-1. This should be created in the hash table constructor since it cannot change over the life of the table. (Why not?)

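 2. Your program should accept the load factor lambda from input since it is a parameter in the study. The hash table should be instantiated with a size of the next prime number beyond n/lambda, where n is the total number of words, so that inserting all n words brings the table to load factor lambda. Note that you should always insert all of the words.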

 3.  In order to insert the words randomly into the hash table, create a random permutation r[] the same size as the array of words. (Note: this is a different random permutation from the one you use for probing.) Then insert word[r[i]] each time through the loop. By stopping the loop a little short of n, the total number of words, you can then have another loop in which you insert the remaining ones; these are the ones for which you should compute the average number of probes.

4. Run your program for lambda = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.65, 0.70, 0.75, 0.8, 0.85 and 0.9. Plot the results with a spreadsheet, or carefully on graph paper.
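The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the author's hash class: the class RandomProbingTable, the helpers next_prime, string_hash and average_insert_probes, and the synthetic word list are all hypothetical, and the table size is taken as the next prime beyond n/lambda. In the real driver the words would come from readLines().

```python
import random


def next_prime(n):
    """Smallest prime strictly greater than n (trial division)."""
    def is_prime(k):
        if k < 2:
            return False
        i = 2
        while i * i <= k:
            if k % i == 0:
                return False
            i += 1
        return True

    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def string_hash(key, table_size):
    """Simple polynomial string hash (an assumption, not the author's function)."""
    h = 0
    for ch in key:
        h = (37 * h + ord(ch)) % table_size
    return h


class RandomProbingTable:
    def __init__(self, size, rng):
        self.size = size
        self.slots = [None] * size
        # Offset 0 is the home slot; the rest is a random permutation of
        # 1..size-1, fixed for the life of the table.
        perm = list(range(1, size))
        rng.shuffle(perm)
        self.offsets = [0] + perm

    def insert(self, key):
        """Insert key and return the number of locations calculated,
        counting the one produced by the hash function."""
        home = string_hash(key, self.size)
        for probes, offset in enumerate(self.offsets, start=1):
            pos = (home + offset) % self.size
            if self.slots[pos] is None:
                self.slots[pos] = key
                return probes
            if self.slots[pos] == key:
                return probes  # duplicate word: nothing new is stored
        raise RuntimeError("table is full")


def average_insert_probes(words, lam, test_count=10, seed=0):
    """Load all but test_count words, then average probes over the rest."""
    rng = random.Random(seed)
    n = len(words)
    table = RandomProbingTable(next_prime(int(n / lam)), rng)
    order = list(range(n))
    rng.shuffle(order)  # the permutation r[] used to pick insertion order
    for i in order[:n - test_count]:
        table.insert(words[i])
    total = sum(table.insert(words[i]) for i in order[n - test_count:])
    return total / test_count


if __name__ == "__main__":
    words = ["word%d" % i for i in range(5000)]  # stand-in for words.txt
    for lam in (0.1, 0.5, 0.9):
        print(lam, average_insert_probes(words, lam, test_count=10))
```

Seeding the generator makes a run repeatable while you are debugging the probe count; averaging over several seeds is one way to smooth the curve once the count is trusted.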

Linear Probing

Implement linear probing and repeat the above steps. You can save yourself some work by noting that linear probing can be implemented (although not efficiently) as a special case of random probing.
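One way to read the special case: if the table's offset array holds 1, 2, ..., tableSize-1 in order instead of a shuffled permutation, the probe sequence becomes home, home+1, home+2, and so on, which is linear probing. The sketch below is a hypothetical illustration (OffsetProbingTable, linear_offsets and random_offsets are invented names, and integer keys stand in for hashed words), not the author's class.

```python
import random


class OffsetProbingTable:
    """Open-addressing table whose probe sequence is home + offsets[i]."""

    def __init__(self, size, offsets):
        # offsets must be some ordering of 1..size-1
        if sorted(offsets) != list(range(1, size)):
            raise ValueError("offsets must be a permutation of 1..size-1")
        self.size = size
        self.slots = [None] * size
        self.offsets = [0] + list(offsets)  # offset 0 is the home slot

    def insert(self, key):
        """Insert an integer key; return the number of locations calculated."""
        home = key % self.size  # integer keys keep the example short
        for probes, offset in enumerate(self.offsets, start=1):
            pos = (home + offset) % self.size
            if self.slots[pos] is None:
                self.slots[pos] = key
                return probes
        raise RuntimeError("table is full")


def linear_offsets(size):
    """Identity ordering 1, 2, ..., size-1: this gives linear probing."""
    return list(range(1, size))


def random_offsets(size, rng):
    """Shuffled ordering of 1..size-1: this gives random probing."""
    perm = list(range(1, size))
    rng.shuffle(perm)
    return perm


if __name__ == "__main__":
    table = OffsetProbingTable(7, linear_offsets(7))
    # Keys 0, 7 and 14 all hash to slot 0; key 1 hashes to slot 1.
    print([table.insert(k) for k in (0, 7, 14, 1)])
    print(table.slots)
```

The inefficiency is that a whole array is stored and indexed just to reproduce the sequence 1, 2, 3, ..., which a loop counter would give for free.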