Basic Posix Threads

by Patrick Horgan


Let's start off some threads


pthread_create starts threads for you. It has several arguments:

int pthread_create(pthread_t *t, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg);

but at first we'll be calling it in a simplified form,

pthread_create(&t1,NULL,runt1,NULL);

i.e. we aren't going to pass any attributes or arguments to pthread_create. We'll talk more about them later in a more advanced tutorial.


Our first program will be thread1.cpp. It's not significant for this tutorial that it is a C++ program. It could just as well have been C. We'll compile it with g++ like this:

g++ -ggdb -o thread1 thread1.cpp -lpthread

The -lpthread has to be used every time. It links the pthread code to ours so that we can find all of the pthread routines like pthread_create.

Here's what the program will do

#include <iostream>
#include <pthread.h>

int ctr=0;

First we include <iostream> so I can print stuff to the console, and <pthread.h> which needs to be included in all pthread programs so your program will have all of the definitions it needs to run pthreads. Then we declare a global variable, ctr and set it to zero.

We're going to start two threads which will run the functions runt1() and runt2().

void *runt1(void* arg)
{
    for(int i=0;i<10000;i++){
        ctr++;
    }
    return NULL;
}

runt1() increments ctr 10,000 times as fast as it can and then returns nothing useful.

void *runt2(void* arg)
{
    for(int i=0;i<10000;i++){
        ctr--;
    }
    return NULL;
}

runt2() decrements ctr 10,000 times as fast as it can and then returns nothing useful.

Let's look at how our main starts them

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1,NULL,runt1,NULL);
    pthread_create(&t2,NULL,runt2,NULL);
    pthread_join(t1,NULL);
    pthread_join(t2,NULL);
    std::cout << ctr << '\n';
    return 0;
}

t1 and t2 hold the thread identifiers that pthread_create fills in. They let us refer to the threads later. We pass the address of each as the first argument to pthread_create, and pthread_create stores an opaque value into it.

Opaque just means that we don't have to care what it is, just hang on to it and pass it back to functions from the pthread library as needed.

We call pthread_create twice, once asking for a thread to run runt1, and then the second time asking for a thread to run runt2.

In this program we don't have anything to do until the threads are done, so we call pthread_join twice, passing to it our opaque thread identifiers t1 and t2.

pthread_join pauses until the thread is finished and then lets you have access to whatever it returned. Since our threads return nothing, our second arguments to pthread_join are NULL, to tell it that we don't care about the return values. We'll deal with this in a later tutorial.
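To make the second argument of pthread_join a little more concrete, here is a minimal sketch of a thread that does return something. The names make_answer and run_and_collect and the value 42 are made up for illustration; they are not part of thread1.cpp.

```cpp
#include <pthread.h>

// Hypothetical thread routine: hands back a heap-allocated int through its
// void* return value. A local variable would not do, because it would be
// gone by the time the thread has finished.
void *make_answer(void *)
{
    return new int(42);
}

// Starts the thread, waits for it, and returns the int it produced.
// Returns -1 if the thread could not be started or joined.
int run_and_collect()
{
    pthread_t t;
    if (pthread_create(&t, NULL, make_answer, NULL) != 0)
        return -1;

    void *result = NULL;                 // pthread_join stores the return value here
    if (pthread_join(t, &result) != 0)
        return -1;

    int *p = static_cast<int *>(result);
    int value = *p;
    delete p;                            // the joining thread now owns the allocation
    return value;
}
```

Calling run_and_collect() from main would start the thread, block in pthread_join until it is done, and then give you the number the thread allocated.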

Finally, we print the value of the ctr, and exit.

Now let's ask bash to run our program ten times

$ for count in {1..10}; do ./thread1; done
0
0
0
-7760
-7762
0
384
0
-7275
0
$

What? It seems that our threads, somehow, interfered with each other. If you thought that incrementing a counter 10,000 times in one thread and decrementing it 10,000 times in another thread would always give you zero, then you must be surprised. How can one thread get into the middle of incrementing or decrementing an integer? It seems such a basic operation.

Let's drop to assembler and see what's going on

$ gdb thread1
Reading symbols from thread1...done.
(gdb) set print asm-demangle
(gdb) disassemble runt1
Dump of assembler code for function runt1(void*):
   0x08048654 <+0>:     push   %ebp
   0x08048655 <+1>:     mov    %esp,%ebp
   0x08048657 <+3>:     sub    $0x10,%esp
   0x0804865a <+6>:     movl   $0x0,-0x4(%ebp)
   0x08048661 <+13>:    jmp    0x8048674 <runt1(void*)+32>
   0x08048663 <+15>:    mov    0x8049b54,%eax
   0x08048668 <+20>:    add    $0x1,%eax
   0x0804866b <+23>:    mov    %eax,0x8049b54
   0x08048670 <+28>:    addl   $0x1,-0x4(%ebp)
   0x08048674 <+32>:    cmpl   $0x270f,-0x4(%ebp)
   0x0804867b <+39>:    setle  %al
   0x0804867e <+42>:    test   %al,%al
   0x08048680 <+44>:    jne    0x8048663 <runt1(void*)+15>
   0x08048682 <+46>:    mov    $0x0,%eax
   0x08048687 <+51>:    leave
   0x08048688 <+52>:    ret
End of assembler dump.
(gdb)

The first three lines are our preamble. It carves out 16 bytes for local variables.

If this is completely new to you, check out my Basic Assembler Debugging with GDB which teaches about function preambles in assembler and a lot of other stuff.

The next line initializes our loop counter i. Then we jump to the bottom of our loop to test our loop condition, i<10000, by comparing i to 0x270f (9999 in decimal), setting register %al to 1 if i was less than or equal, and then branching to the top of the loop at runt1+15 if we haven't gone over 9999.

Finally we get to the part we care about. We move the value from address 0x8049b54 into register %eax. That value is the value of ctr. Then we add one to it, and then store it back into 0x8049b54. Our other thread works exactly the same, except it subtracts one from ctr.

Then we add one to our loop counter and hit the bottom of the loop again.

They don't do it in one step even when you think they do

Thread 1                          Thread 2
--------                          --------
Start running, Thread 1
randomly starts first
mov 0x8049b54,%eax
add $0x1,%eax
                                  Switch to Thread 2
                                  mov 0x8049b54,%eax
                                  sub $0x1,%eax
                                  mov %eax,0x8049b54
Switch back to Thread 1
mov %eax,0x8049b54
Value stored from Thread 2 lost

This shows the case with one processor running both threads and the operating system interrupting one to run the other. It could be that the threads run on different CPUs and are completely intermixed.

Even if you operate directly on the memory like this:

mov $0x8049b54,%eax
addl $0x1,(%eax)

to increment the memory, you can't guarantee that another thread won't get in between the read and the write. The value might not be in regular memory at all: it could be in various levels of cache, or paged out to a hard disk. It could be a 32 bit int at an odd address, in which case one byte might be in one cache line and the other three in a different one. With an ordinary int like ctr, C and C++ give you no way to make sure that you have the only access to it in memory.

What you're looking for is atomic access, i.e. you can read and change and write in one operation that another thread can't interrupt.

Such things exist in machine code, but they work differently with different types of CPUs. Even given that, in more complicated programs you might need to change several things based on checking one or more values and a simple atomic operation won't be enough. The pthread library gives you several things to help. We're going to look at a mutex in this tutorial, and wait for other tutorials to see other approaches.
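Just to show what a simple atomic operation looks like from C++, here is a rough sketch using std::atomic. This is an assumption on my part about your toolchain: it needs a C++11 or later compiler, it is not part of the pthread library, and the names actr, atomic_up, atomic_down and run_atomic_counter are made up for this example. It only helps when a single variable is all you need to protect.

```cpp
#include <atomic>
#include <pthread.h>

// Assumption: C++11 or later. std::atomic comes from the C++ standard
// library, not from pthreads.
std::atomic<int> actr(0);   // hypothetical atomic stand-in for ctr

void *atomic_up(void *)
{
    for (int i = 0; i < 10000; i++)
        actr.fetch_add(1);   // read, add and write back as one indivisible step
    return NULL;
}

void *atomic_down(void *)
{
    for (int i = 0; i < 10000; i++)
        actr.fetch_sub(1);   // same idea, subtracting
    return NULL;
}

// Runs both threads to completion and returns the final counter value,
// or -1 if a thread could not be started.
int run_atomic_counter()
{
    actr = 0;
    pthread_t t1, t2;
    if (pthread_create(&t1, NULL, atomic_up, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, atomic_down, NULL) != 0) {
        pthread_join(t1, NULL);   // don't leave the first thread behind
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return actr.load();
}
```

Because every increment and decrement is indivisible, none of them can be lost, so the counter should come out as 0 however the threads are interleaved.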

So that didn't work so well, let's try again

Our program is broken, so we're going to fix it by using a mutex variable to control access to ctr.

MUTually EXclusive access (pthread_mutex)

pthread_mutex_t pmt;

A mutex is a special variable used to control access. Only one thread at a time can lock the mutex. Lock means have exclusive access. You can also say that a thread that has successfully locked the mutex holds the lock. To create a mutex, you declare it.


pthread_mutex_init(&pmt,NULL);

Next you initialize it (we pass NULL for the second argument because we are taking the default, which on Linux is a fast mutex). See the man pages pthread_mutex_init(3) and pthread_mutexattr_init(3) for more information.


pthread_mutex_lock(&pmt);

Then a thread has to try to acquire the lock. Only one thread at a time can hold the lock. If any other thread tries to acquire it, that thread is suspended until the lock becomes free, and then it acquires the lock. The library manages this for you. What it has to do is different for different hardware and software architectures, but you don't have to worry about that, just use the library provided on your machine.


pthread_mutex_unlock(&pmt);

Later, when the thread is done with the thing controlled by the lock, it releases the lock.

First we declare and initialize the mutex

int ctr=0;
pthread_mutex_t pmt;

At global scope, so that both threads can get to it, we declare our mutex.


pthread_mutex_init(&pmt,NULL);

Then in our main we have to initialize it before using it.
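For a mutex at global scope that takes the default attributes, POSIX also provides a static initializer, PTHREAD_MUTEX_INITIALIZER, that you can use in place of the call in main. Here is a small sketch of that alternative; the names sctr, spmt and add_locked are made up for this example and are not part of the tutorial programs.

```cpp
#include <pthread.h>

int sctr = 0;                                       // hypothetical counter
pthread_mutex_t spmt = PTHREAD_MUTEX_INITIALIZER;   // ready to use, no init call needed

// Adds n to the counter while holding the lock and returns the new value,
// or -1 if locking or unlocking failed.
int add_locked(int n)
{
    if (pthread_mutex_lock(&spmt) != 0)
        return -1;
    sctr += n;
    int now = sctr;
    if (pthread_mutex_unlock(&spmt) != 0)
        return -1;
    return now;
}
```

The explicit pthread_mutex_init call is still what you want when the mutex is created at run time or needs non-default attributes.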

pthread_mutex_lock(&pmt);
ctr++;
pthread_mutex_unlock(&pmt);

In each of the thread routines (this is from runt1()), we have to lock the mutex before we access ctr, and then release it when we're done.
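Since this is C++, one way to make sure the unlock is never forgotten is to let a destructor do it. The ScopedLock class below is a hypothetical helper written for this example, not something the pthread library provides, and gmt, gctr and bump are made-up names as well. For brevity it ignores the return values of the lock and unlock calls.

```cpp
#include <pthread.h>

// Hypothetical helper, not part of pthreads: locks the mutex in the
// constructor and unlocks it in the destructor, so the lock is released
// on every path out of the enclosing scope.
class ScopedLock {
public:
    explicit ScopedLock(pthread_mutex_t &m) : m_(m) { pthread_mutex_lock(&m_); }
    ~ScopedLock() { pthread_mutex_unlock(&m_); }
private:
    ScopedLock(const ScopedLock &);              // declared but not defined: no copying
    ScopedLock &operator=(const ScopedLock &);
    pthread_mutex_t &m_;
};

pthread_mutex_t gmt = PTHREAD_MUTEX_INITIALIZER;   // hypothetical mutex
int gctr = 0;                                       // hypothetical counter

// Increments the counter while holding the lock and returns the new value.
int bump()
{
    ScopedLock guard(gmt);   // lock acquired here
    return ++gctr;           // lock released when guard goes out of scope
}
```

The body of runt1's loop could then be written as a block containing a ScopedLock and the ctr++, with no explicit unlock call.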

Here's the code to the whole program thread2.cpp

In this version, no matter how many times we run it, the output at the end is always 0.

#include <iostream>
#include <pthread.h>

int ctr=0;
pthread_mutex_t pmt;    // controls access to ctr

void *runt1(void* arg)
{
    for(int i=0;i<10000;i++){
        pthread_mutex_lock(&pmt);
        ctr++;
        pthread_mutex_unlock(&pmt);
    }
    return NULL;
}

void *runt2(void* arg)
{
    for(int i=0;i<10000;i++){
        pthread_mutex_lock(&pmt);
        ctr--;
        pthread_mutex_unlock(&pmt);
    }
    return NULL;
}

int main(void)
{
    void *tret1,*tret2;    // receive the threads' return values (NULL here)
    pthread_t t1, t2;
    pthread_mutex_init(&pmt,NULL);
    pthread_create(&t1,NULL,runt1,NULL);
    pthread_create(&t2,NULL,runt2,NULL);
    pthread_join(t1,&tret1);
    pthread_join(t2,&tret2);
    std::cout << ctr << '\n';
    return 0;
}

Don't do as I did in this example: check return values

Each of the pthread routines tells you if it succeeds, and in real code, unlike in this simple demo code, you need to check whether they succeeded. You can check the man pages for the possible return values, and your code needs to handle each possible case.
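As a starting point, here is a minimal sketch of what checking a return value can look like. The helpers describe and init_checked are hypothetical names made up for this example. Note that the pthread routines hand you the error number as their return value; they do not set errno.

```cpp
#include <cstring>
#include <iostream>
#include <pthread.h>
#include <string>

// Hypothetical helper: turns a pthread return code into a readable message.
std::string describe(const char *what, int rc)
{
    if (rc == 0)
        return std::string(what) + ": ok";
    return std::string(what) + ": " + std::strerror(rc);
}

// Hypothetical helper: initializes a mutex with default attributes and
// reports the reason on standard error if that fails.
bool init_checked(pthread_mutex_t *m)
{
    int rc = pthread_mutex_init(m, NULL);
    if (rc != 0) {
        std::cerr << describe("pthread_mutex_init", rc) << '\n';
        return false;
    }
    return true;
}
```

Logging the reason is only the first step; what your program should do next (retry, clean up, or quit) depends on the error and on the program.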

Possible responses to various errors are

Let's look at errors returned from functions we've called so far

Please be aware that the following is for example and inspiration. The errors returned can vary between one system and another. That seems strange, but it isn't. pthreads deals with the operating system and the hardware at the lowest or next to lowest level, depending on the implementation on your machine. Errors that can happen on one operating system, or on one CPU type, are different from the errors that can happen on others. Please read the documentation that came with your version of the pthread library to see what the possible errors are. The information I give you below is for a recent Linux box running on an x86 processor using the Native POSIX Thread Library (NPTL), rather than the older LinuxThreads.

pthread_create errors

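When you call pthread_create it returns an int. On success that is 0, and on failure it returns non-zero. If it fails, the value returned tells you what the error was. The reasons it can fail are EAGAIN (not enough resources to create another thread, or a system limit on the number of threads was hit), EINVAL (invalid settings in the attributes), and EPERM (no permission to set the scheduling policy and parameters asked for in the attributes).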

#include <errno.h>
#include <pthread.h>

// the rest goes inside a function such as main(), with runt1 defined as before
int retval;
pthread_t t1;
if((retval=pthread_create(&t1,NULL,runt1,NULL))){
    // failure
    switch(retval){
    case EAGAIN:
        // code goes here to deal with it.
        break;
    case EINVAL:
        // code to fix the attr and retry, or else to quit
        break;
    case EPERM:
        // fix it if we can or exit
        break;
    default:
        // unexpected error, log, quit, abort
        break;
    }
} else {
    // here's the code for when things worked
}

pthread_mutex_init errors

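pthread_mutex_init also returns an int. On Linux, with the default attributes we use here, it returns zero, so in this demo there is nothing to check. POSIX does allow it to fail (for example with ENOMEM or EAGAIN when resources run out), so portable code should check it like any other call.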

pthread_mutex_lock errors

pthread_mutex_lock also returns an int. It returns 0 for success and non-zero for failure. On failure, it returns an error number; the pthread_mutex_lock(3) man page lists the ones your system can produce.
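One failure you can provoke on purpose is EDEADLK. The sketch below assumes an error-checking mutex (type PTHREAD_MUTEX_ERRORCHECK), which is different from the default mutex used in thread2.cpp; with the default type, locking twice from the same thread is not guaranteed to be detected. The function name relock_result is made up for this example.

```cpp
#include <cerrno>
#include <pthread.h>

// Creates an error-checking mutex, locks it, then tries to lock it again
// from the same thread. Returns what the second lock attempt reported,
// or the error from an earlier step if setup failed.
int relock_result()
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    pthread_mutex_t m;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutex_lock(&m);        // first lock: should succeed
    if (rc == 0) {
        rc = pthread_mutex_lock(&m);    // second lock by the owner: expected EDEADLK
        pthread_mutex_unlock(&m);       // release the lock we do hold
    }
    pthread_mutex_destroy(&m);
    return rc;
}
```

A caller would compare the result with EDEADLK and treat it as a bug in the program's own locking rather than something to retry.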

pthread_mutex_unlock errors

pthread_mutex_unlock also returns an int. It returns 0 for success and non-zero for failure. On failure, it returns an error number; the pthread_mutex_unlock(3) man page lists the ones your system can produce.
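The unlock failure that is easiest to see is EPERM, which an error-checking mutex reports when a thread unlocks a mutex it does not hold. As before, this sketch assumes the PTHREAD_MUTEX_ERRORCHECK type rather than the default mutex, and stray_unlock_result is a made-up name.

```cpp
#include <cerrno>
#include <pthread.h>

// Builds an error-checking mutex and unlocks it without ever locking it.
// Returns what pthread_mutex_unlock reported, or the error from an
// earlier step if setup failed.
int stray_unlock_result()
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    pthread_mutex_t m;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutex_unlock(&m);   // this thread does not hold the lock: expected EPERM
    pthread_mutex_destroy(&m);
    return rc;
}
```

With a default mutex the same mistake may go unreported, which is one reason to try an error-checking mutex while debugging.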

pthread_join errors

pthread_join also returns an int. It returns 0 for success and non-zero for failure. On failure, it returns an error number; the pthread_join(3) man page lists the ones your system can produce.

Don't skip this

Fight the urge to skip this. In this tutorial I leave out the error checking code for brevity, but in real code I always check and handle all possible error returns. If you catch yourself thinking, "That's stupid, that could never happen," then you're asking for trouble. The man pages tell you about things that can happen. Your job is just to figure out what you will do if they do happen. For all the rest of the routines, make sure you check the man pages and deal with all possible errors.

Wrap up

This is all you need for basic use of pthreads, but I'll do more advanced tutorials soon.
