Problem: Maximum Finding

Everybody knows how to find the maximum of a set of integers. You might want to do the following:

max = x[0];
for (i = 1; i < n; i++)
     if (x[i] > max)
          max = x[i];

This is based on the fact that you have only one CPU and take n - 1 comparisons to find the maximum. Can we do better if more CPUs are available? The answer is certainly a "yes". Here is how.

Let us use a thread to simulate a CPU. Suppose we have n distinct integers x₁, x₂, ..., x_n. We first initialize a working array of n entries, say w (winners), to all 1s. This can be done by using n threads, each of which writes a 1 into an entry. More precisely, thread T_i writes 1 into w_i. After this initialization step, two more steps are required. Since all threads (or CPUs if you prefer) take one step to complete their job, the initialization step only needs one step. Writing a 1 into w[i] means that we assume it is a winner until we strike it out!

Step 2 For each pair of integers x_i and x_j, we create a thread (or a CPU if you prefer) T_ij. This thread compares x_i and x_j, and writes a 0 into w_i if x_i < x_j. Otherwise, it writes a 0 into w_j. Writing a 0 into w[i] means that it is a loser in this comparison. Therefore, in this step, we need n(n-1)/2 threads, each of which executes one step to compare and write. Why n(n-1)/2 threads rather than n²?

Step 3 This step requires n threads. The i-th thread checks the value of w_i. If it is a 0, do nothing. If it is a 1, output the value of x_i because it is the maximum! Here is why. The maximum value is larger than any other number. As a result, when it is compared with other values in Step 2, its corresponding entry in array w never receives a zero. Step 3 also requires one step to output the maximum. Thus, if we have n(n-1)/2 threads, it only takes three comparison steps to find the maximum!

Example

Let us take a look at an example. Suppose we have 4 numbers, x₁ = 3, x₂ = 1, x₃ = 7, x₄ = 4. Array w is initialized to 1, 1, 1, 1 (i.e., w₁ = w₂ = w₃ = w₄ = 1). In Step 2, we need 4×(4-1)/2= 6 threads:

*Thread Name*	*Its Comparison*	*Output*
T₁₂	Compares x₁ = 3 and x₂ = 1	Writes 0 into w₂
T₁₃	Compares x₁ = 3 and x₃ = 7	Writes 0 into w₁
T₁₄	Compares x₁ = 3 and x₄ = 4	Writes 0 into w₁
T₂₃	Compares x₂ = 1 and x₃ = 7	Writes 0 into w₂
T₂₄	Compares x₂ = 1 and x₄ = 4	Writes 0 into w₂
T₃₄	Compares x₃ = 7 and x₄ = 4	Writes 0 into w₄

After this step, w₁, w₂ and w₄ are set to zero, and w₃ is still 1. Therefore, in Step 3, the thread associated with w₃ sees a 1 and outputs the value of x₃, which is 7. Thus, the maximum is 7.

Problem

Write a program that reads in an array of some distinct integers and use the above algorithm to find the maximum. Note that you should aim for maximum parallelism. More precisely, all created threads should run concurrently.