Thursday, July 30, 2015

Synchronization


Thread Creation, Manipulation and Synchronization



  • We first must postulate a thread creation and manipulation interface. Will use the one in Nachos:
    class Thread {
      public:
        Thread(char* debugName); 
        ~Thread();
        void Fork(void (*func)(int), int arg);
        void Yield();
        void Finish();
    };
    
  • The Thread constructor creates a new thread. It allocates a data structure with space for the thread control block (TCB).
  • To actually start the thread running, must tell it what function to execute. The Fork method gives it the function and a parameter to pass to that function.
  • What does Fork do? It first allocates a stack for the thread. It then sets up the TCB so that when the thread starts running, it will invoke the function and pass it the correct parameter. It then puts the thread on a run queue someplace. Fork then returns, and the thread that called Fork continues.
  • How does OS set up TCB so that the thread starts running at the function? First, it sets the stack pointer in the TCB to the stack. Then, it sets the PC in the TCB to be the first instruction in the function. Then, it sets the register in the TCB holding the first parameter to the parameter. When the thread system restores the state from the TCB, the function will magically start to run.
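  • Here is a minimal sketch of that setup, assuming a hypothetical TCB layout (the field and function names below are illustrative assumptions, not the actual Nachos definitions):
    // Hypothetical TCB fields - names are assumptions for illustration.
    struct TCB {
      int *sp;            // saved stack pointer
      void (*pc)(int);    // saved program counter
      int arg;            // saved register holding the first parameter
    };
    // What Fork conceptually does before putting the thread on the run queue.
    void SetupThread(TCB *tcb, int *stack, void (*func)(int), int arg) {
      tcb->sp = stack;    // stack pointer -> the newly allocated stack
      tcb->pc = func;     // PC -> the first instruction of func
      tcb->arg = arg;     // parameter register -> the parameter
    }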
  • The system maintains a queue of runnable threads. Whenever a processor becomes idle, the thread scheduler grabs a thread off of the run queue and runs the thread.
  • Conceptually, threads execute concurrently. This is the best way to reason about the behavior of threads. But in practice, the OS only has a finite number of processors, and it can't run all of the runnable threads at once. So, must multiplex the runnable threads on the finite number of processors.
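  • A minimal sketch of that dispatch step, assuming a hypothetical readyQueue and a RestoreState routine that reloads a thread's saved registers (neither name is the actual Nachos interface):
    // Run whenever a processor becomes idle: grab a runnable thread and
    // restore its saved state so it starts (or resumes) running.
    void Dispatch() {
      Thread *next = readyQueue->Remove();   // NULL if nothing is runnable
      if (next != NULL)
        RestoreState(next);                  // load SP, PC, registers from its TCB
    }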
  • Let's do a few thread examples. First example: two threads that increment a variable.
    #include <stdio.h>
    #include "thread.h"   // Nachos Thread interface

    int a = 0;
    void sum(int p) { 
      a++;
      printf("%d : a = %d\n", p, a);
    }
    void main() {
      Thread *t = new Thread("child");
      t->Fork(sum, 1);
      sum(0);
    }
    
  • The two calls to sum run concurrently. What are the possible results of the program? To understand this fully, we must break the sum subroutine up into its primitive components.
  • sum first reads the value of a into a register. It then increments the register, then stores the contents of the register back into a. It then reads the values of the control string, p, and a into the registers that it uses to pass arguments to the printf routine. It then calls printf, which prints out the data.
  • The best way to understand the instruction sequence is to look at the generated assembly language (cleaned up just a bit). You can have the compiler generate assembly code instead of object code by giving it the -S flag. It will put the generated assembly in a file with the same name as the .c or .cc file, but with a .s suffix.
            la      a, %r0          ! %r0 = address of a
            ld      [%r0],%r1       ! %r1 = current value of a
            add     %r1,1,%r1       ! increment the register
            st      %r1,[%r0]       ! store the new value back into a

            ld      [%r0], %o2      ! third printf argument: the value of a
            mov     %o0, %o1        ! second argument: p (parameters are passed starting with %o0)
            la      .L17, %o0       ! first argument: address of the control string
            call    printf
    
  • So when the two calls execute concurrently, the result depends on how the instructions interleave. What are the possible results? Each two-line group below is the output of one possible run (eight runs, shown in two columns):
    0 : 1                                      0 : 1
    1 : 2                                      1 : 1
    
    1 : 2                                      1 : 1
    0 : 1                                      0 : 1
    
    1 : 1                                      0 : 2
    0 : 2                                      1 : 2
     
    0 : 2                                      1 : 2
    1 : 1                                      0 : 2
    
    So the results are nondeterministic - you may get different results when you run the program more than once. So, it can be very difficult to reproduce bugs. Nondeterministic execution is one of the things that makes writing parallel programs much more difficult than writing serial programs.
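  • For example, the right-hand outputs in which both threads print 1 come from an interleaving where the two load/add/store sequences overlap, so one increment is lost:
            thread 0: ld  [a] -> %r1    ! reads 0
            thread 1: ld  [a] -> %r1    ! reads 0
            thread 0: add, st           ! a = 1
            thread 1: add, st           ! a = 1 - thread 0's update is overwritten
            thread 0: call printf       ! prints 0 : 1
            thread 1: call printf       ! prints 1 : 1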
  • Chances are, the programmer is not happy with all of the possible results listed above. Probably wanted the value of a to be 2 after both threads finish. To achieve this, must make the increment operation atomic. That is, must prevent the interleaving of the instructions in a way that would interfere with the additions.
  • Concept of atomic operation. An atomic operation is one that executes without any interference from other operations - in other words, it executes as one unit. Typically build complex atomic operations up out of sequences of primitive operations. In our case the primitive operations are the individual machine instructions.
  • More formally, if several atomic operations execute, the final result is guaranteed to be the same as if the operations executed in some serial order.
  • In our case above, build an increment operation up out of load, add, and store machine instructions. Want the increment operation to be atomic.
  • Use synchronization operations to make code sequences atomic. First synchronization abstraction: semaphores. A semaphore is, conceptually, a counter that supports two atomic operations, P and V. Here is the Semaphore interface from Nachos:
    class Semaphore {
      public:
        Semaphore(char* debugName, int initialValue);       
        ~Semaphore();                                      
        void P();
        void V();
    };
    
  • Here is what the operations do:
    • Semaphore(name, count) : creates a semaphore and initializes the counter to count.
    • P() : Atomically waits until the counter is greater than 0, then decrements the counter and returns.
    • V() : Atomically increments the counter.
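  • One way to see how P and V can be atomic: on a uniprocessor, implement them with an ordinary counter plus a queue of waiting threads, and make each operation atomic by disabling interrupts around it. A sketch, assuming DisableInterrupts/EnableInterrupts primitives, a currentThread pointer, and a scheduler with a ReadyToRun method (the real Nachos code is similar in spirit but differs in detail):
    void Semaphore::P() {
      DisableInterrupts();             // nothing can interleave with us now
      while (value == 0) {             // counter is 0: must wait
        queue->Append(currentThread);  // join this semaphore's wait queue
        currentThread->Sleep();        // give up the CPU until a V wakes us
      }
      value--;                         // claim one unit
      EnableInterrupts();
    }
    void Semaphore::V() {
      DisableInterrupts();
      Thread *t = (Thread *)queue->Remove();
      if (t != NULL)
        scheduler->ReadyToRun(t);      // wake one waiter, if any
      value++;                         // add one unit
      EnableInterrupts();
    }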
  • Here is how we can use the semaphore to make the sum example work:
    #include <stdio.h>
    #include "thread.h"   // Nachos Thread
    #include "synch.h"    // Nachos Semaphore

    int a = 0;
    Semaphore *s;
    void sum(int p) {
      int t;
      s->P();             // enter the critical section
      a++;
      t = a;              // copy a while still holding the semaphore, so we
      s->V();             // print the value our own increment produced
      printf("%d : a = %d\n", p, t);
    }
    void main() {
      Thread *t = new Thread("child");
      s = new Semaphore("s", 1);
      t->Fork(sum, 1);
      sum(0);
    }
    
  • We are using semaphores here to implement a mutual exclusion mechanism. The idea behind mutual exclusion is that only one thread at a time should be allowed to do something. In this case, only one thread should access a. Use mutual exclusion to make operations atomic. The code that performs the atomic operation is called a critical section.
  • Semaphores do much more than mutual exclusion. They can also be used to synchronize producer/consumer programs. The idea is that the producer is generating data and the consumer is consuming data. A Unix pipe, for example, has a producer and a consumer. You can also think of a person typing at a keyboard as a producer and the shell program reading the characters as a consumer.
  • Here is the synchronization problem: make sure that the consumer does not get ahead of the producer. But, we would like the producer to be able to produce without waiting for the consumer to consume. Can use semaphores to do this. Here is how it works:
    #include "thread.h"
    #include "synch.h"

    Semaphore *s;
    void consumer(int dummy) {
      while (1) {
        s->P();                  // wait until a unit has been produced
        // ... consume the next unit of data ...
      }
    }
    void producer(int dummy) {
      while (1) {
        // ... produce the next unit of data ...
        s->V();                  // announce one more available unit
      }
    }
    void main() {
      s = new Semaphore("s", 0);
      Thread *t = new Thread("consumer");
      t->Fork(consumer, 1);
      t = new Thread("producer");
      t->Fork(producer, 1);
    }
    
    In some sense the semaphore is an abstraction of the collection of data: its counter always equals the number of units that have been produced but not yet consumed.
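  • To make the unit of data concrete, here is a sketch that combines the two uses of semaphores above: a counting semaphore tracks available units, and a second semaphore initialized to 1 provides mutual exclusion on a shared buffer. The List type and its Append/Remove methods are assumptions for illustration (Nachos has a similar list class):
    Semaphore *avail;    // counts produced-but-unconsumed units
    Semaphore *mutex;    // guards the shared buffer
    List *buffer;        // holds the units themselves

    void consumer(int dummy) {
      while (1) {
        avail->P();                      // wait for a unit to exist
        mutex->P();                      // lock the buffer
        void *unit = buffer->Remove();   // take the oldest unit
        mutex->V();                      // unlock before consuming
        // ... consume unit ...
      }
    }
    void producer(int dummy) {
      int i = 0;
      while (1) {
        mutex->P();                      // lock the buffer
        buffer->Append((void *)i++);     // add a new unit
        mutex->V();
        avail->V();                      // announce it to the consumer
      }
    }
    void main() {
      avail = new Semaphore("avail", 0);
      mutex = new Semaphore("mutex", 1);
      buffer = new List();
      Thread *t = new Thread("consumer");
      t->Fork(consumer, 1);
      t = new Thread("producer");
      t->Fork(producer, 1);
    }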
