Problem: Maximum Finding

Everybody knows how to find the maximum of a set of integers. You might want to do the following:

max = x[0];
for (i = 1; i < n; i++)
     if (x[i] > max)
          max = x[i];
This is based on the fact that you have only one CPU and take n - 1 comparisons to find the maximum. Can we do better if more CPUs are available? The answer is certainly a "yes". Here is how.

Let us use a thread to simulate a CPU. Suppose we have n distinct integers x1, x2, ..., xn. We first initialize a working array of n entries, say w (winners), to all 1s. This can be done by using n threads, each of which writes a 1 into an entry. More precisely, thread Ti writes 1 into wi. After this initialization step, two more steps are required. Since all threads (or CPUs if you prefer) take one step to complete their job, the initialization step only needs one step. Writing a 1 into w[i] means that we assume it is a winner until we strike it out!

Step 2 For each pair of integers xi and xj, we create a thread (or a CPU if you prefer) Tij. This thread compares xi and xj, and writes a 0 into wi if xi < xj. Otherwise, it writes a 0 into wj. Writing a 0 into w[i] means that it is a loser in this comparison. Therefore, in this step, we need n(n-1)/2 threads, each of which executes one step to compare and write. Why n(n-1)/2 threads rather than n2?

Step 3 This step requires n threads. The i-th thread checks the value of wi. If it is a 0, do nothing. If it is a 1, output the value of xi because it is the maximum! Here is why. The maximum value is larger than any other number. As a result, when it is compared with other values in Step 2, its corresponding entry in array w never receives a zero. Step 3 also requires one step to output the maximum. Thus, if we have n(n-1)/2 threads, it only takes three comparison steps to find the maximum!

Example

Let us take a look at an example. Suppose we have 4 numbers, x1 = 3, x2 = 1, x3 = 7, x4 = 4. Array w is initialized to 1, 1, 1, 1 (i.e., w1 = w2 = w3 = w4 = 1). In Step 2, we need 4×(4-1)/2= 6 threads:

Thread Name Its Comparison Output
T12 Compares x1 = 3 and x2 = 1 Writes 0 into w2
T13 Compares x1 = 3 and x3 = 7 Writes 0 into w1
T14 Compares x1 = 3 and x4 = 4 Writes 0 into w1
T23 Compares x2 = 1 and x3 = 7 Writes 0 into w2
T24 Compares x2 = 1 and x4 = 4 Writes 0 into w2
T34 Compares x3 = 7 and x4 = 4 Writes 0 into w4

After this step, w1, w2 and w4 are set to zero, and w3 is still 1. Therefore, in Step 3, the thread associated with w3 sees a 1 and outputs the value of x3, which is 7. Thus, the maximum is 7.

Problem

Write a program that reads in an array of some distinct integers and use the above algorithm to find the maximum. Note that you should aim for maximum parallelism. More precisely, all created threads should run concurrently.