# Problem: Maximum Finding

Everybody knows how to find the maximum of a set of integers. You might want
to do the following:

max = x[0];
for (i = 1; i < n; i++)
if (x[i] > max)
max = x[i];

This is based on the fact that you have *only* one CPU and take
*n* - 1 comparisons to find the maximum. Can we do better if more
CPUs are available? The answer is certainly a "yes". Here is how.
Let us use a thread to simulate a CPU. Suppose we have *n*
*distinct* integers
*x*_{1}, *x*_{2}, ...,
*x*_{n}. We first initialize a working
array of *n* entries, say *w* (winners), to all 1s.
This can be done by using *n* threads, each of which writes a 1
into an entry. More precisely, thread *T*_{i}
writes 1 into *w*_{i}. After this initialization
step, two more steps are required. Since all threads (or CPUs if you prefer)
take one step to complete their job, the initialization step only needs
*one* step. Writing a 1 into *w*[*i*] means that we
assume it is a winner until we strike it out!

**Step 2**
For each pair of integers *x*_{i} and
*x*_{j}, we create a thread
(or a CPU if you prefer) *T*_{ij}.
This thread compares *x*_{i} and
*x*_{j}, and writes a 0 into
*w*_{i} if
*x*_{i} < *x*_{j}.
Otherwise, it writes a 0 into *w*_{j}.
Writing a 0 into *w*[*i*] means that it is a loser in this
comparison. Therefore, in this step, we need
*n*(*n*-1)/2 threads, each of
which executes one step to compare and write.
**Why ***n*(*n*-1)/2 threads
rather than *n*^{2}?

**Step 3**
This step requires *n* threads. The *i*-th thread
checks the value of *w*_{i}. If it is a 0, do
nothing. If it is a 1, output the value of *x*_{i}
because it is the maximum! Here is why. The maximum value is larger than any
other number. As a result, when it is compared with other values in Step 2,
its corresponding entry in array *w* never receives a zero.
Step 3 also requires one step to output the maximum. Thus, if we have
*n*(*n*-1)/2 threads, it only takes three comparison
steps to find the maximum!

### Example

Let us take a look at an example. Suppose we have 4 numbers,
*x*_{1} = 3,
*x*_{2} = 1,
*x*_{3} = 7,
*x*_{4} = 4.
Array *w* is initialized to 1, 1, 1, 1
(*i.e.*, *w*_{1} = *w*_{2} =
*w*_{3} = *w*_{4} = 1). In Step 2, we
need 4×(4-1)/2= 6 threads:

*Thread Name* |
*Its Comparison* |
*Output* |

*T*_{12} |
Compares *x*_{1} = 3 and
*x*_{2} = 1 |
Writes 0 into *w*_{2} |

*T*_{13} |
Compares *x*_{1} = 3 and
*x*_{3} = 7 |
Writes 0 into *w*_{1} |

*T*_{14} |
Compares *x*_{1} = 3 and
*x*_{4} = 4 |
Writes 0 into *w*_{1} |

*T*_{23} |
Compares *x*_{2} = 1 and
*x*_{3} = 7 |
Writes 0 into *w*_{2} |

*T*_{24} |
Compares *x*_{2} = 1 and
*x*_{4} = 4 |
Writes 0 into *w*_{2} |

*T*_{34} |
Compares *x*_{3} = 7 and
*x*_{4} = 4 |
Writes 0 into *w*_{4} |

After this step, *w*_{1}, *w*_{2} and
*w*_{4} are set to zero, and
*w*_{3} is still 1. Therefore, in Step 3, the
thread associated with *w*_{3} sees a 1 and outputs
the value of *x*_{3}, which is 7. Thus, the maximum is 7.

### Problem

Write a program that reads in an array of some distinct integers and use
the above algorithm to find the maximum.
**Note that you should aim for maximum parallelism.
More precisely, all created threads should run concurrently.**