First Attempt

This is our first attempt to solve the problem. With the business card exchange metaphor, each employee offers his card and waits for the other party to make his offer. We might use a signal to indicate that Here is my card followed by a wait to mimic Wait for yours. Thus, we use two semaphores A and B for blocking threads in A and in B, respectively. Because everyone must wait until the other party offers his card, the initial values of these two semaphores are 0. This is shown below. Note that we use semaphore A = 0, B = 0 to declare two semaphores A and B with initial values 0's.

In addition to two semaphores, two global variables Buf_A and Buf_B are used. Variable Buf_A (resp., Buf_B) is for a thread in A (resp., B) to store his message for a thread in B (resp., A) to retrieve. In the above code, each iteration of Thread_A() does the following:

  1. Produces a message in its local variable Var_A.
  2. Notifies a thread in B, with a signal call Signal(B), that Here is my card and I am ready to exchange my card with you. Then, blocks itself with a call Wait(A).
  3. Once a thread in B is ready to exchange his business card by calling Signal(A), this thread (in group A) wakes up, moves its card into the global variable Buf_A for that thread in B to retrieve, and, finally, retrieves the other party's card from variable Buf_B.
Is this program correct? It is incorrect for an obvious reason that there is no mutual exclusion for global variables Buf_A and Buf_B. Because they are shared and can be accessed by threads in group A and group B simultaneously, race conditions will likely occur. If this is the case, can we find some possible race conditions?

An Easily Found Race Condition

Consider the following execution sequence. A thread in group A executes Signal(B), indicating the intention of a message exchange, followed by Wait(A) for a response from the other party. Because this thread blocks, the CPU is given to another thread. Suppose this is a thread in B. It executes a Signal(A) to show its intention followed by a Wait(B). The Signal() executed by B releases thread A and causes it to move its message from Var_A to the global variable Buf_A followed by retrieving B's message from global variable Buf_B. However, the value in Buf_B is not yet set by thread B and can be a garbage value or the value set by the previous message exchange. Therefore, thread A may show its good intention; however, it does not get the right message. Why is this a race condition? If everything runs fine, both A and B will get correct messages. Now, because two different orders of execution yield two different results, we have a race condition.

An Subtler Race Condition

The following is a subtler and more complicated race condition. Thread A1 executes Signal(B) to indicate that it is ready for exchanging a message and waits on Wait(A). Right after this, thread B1 comes into the scene and executes Signal(A) followed by Wait(B). However, thread B1's signal may not reach thread A1. Instead, a context switching causes thread A2 to run. Thus, thread A2 executes Signal(B) followed by Wait(A). Because of thread B1's Signal(A), thread A2 is released and running. Do not assume thread A1 will run, because we should not assume any order for releasing a thread from a semaphore's waiting queue. Now, thread A2 and thread B1 are running. After thread B1 makes its message into the global variable Buf_B, thread B2 comes into the scene and executes Signal(A). This releases thread A1 from Wait(A). Now, we have a problem. Because both threads A1 and A2 are running, they could store their own messages into the global variable Buf_A. Therefore, we have a race condition!

Lesson Learned

  • When a variable is shared by many threads, without a proper mutual exclusion protection, race conditions are likely to occur. In both execution sequences discussed above, messages received may not be the correct ones.
  • Global variables Buf_A and Buf_B should be protected.