Understanding Threads in Python

By: Akshar Raaj

We will look at some examples of using threads in Python and at how to avoid race conditions.

Run each example several times: thread scheduling is unpredictable, so the output can differ from run to run.

Disclaimer: Forget anything you have heard about the GIL for now. The GIL does not interfere with the scenarios shown here.

Example 1:

We want to fetch five different URLs:

Single threaded way:

import time
import urllib2

def get_responses():
    urls = ['http://www.google.com', 'http://www.amazon.com', 'http://www.ebay.com', 'http://www.alibaba.com', 'http://www.reddit.com']
    start = time.time()
    for url in urls:
        resp = urllib2.urlopen(url)
        # print the URL and its HTTP status code on one line
        print url, resp.getcode()
    print "Elapsed time: %s" % (time.time()-start)

get_responses()


Output is:

http://www.google.com 200
http://www.amazon.com 200
http://www.ebay.com 200
http://www.alibaba.com 200
http://www.reddit.com 200
Elapsed time: 3.0814409256
  • All the URLs are fetched in sequence.
  • The program does not request the next URL until it has received the response for the current one.
  • Network operations are slow, so the processor sits idle while it waits for each response.

Even a single-threaded program has one thread of execution. Let's call it the main thread. So the last example had only one thread: the main thread.
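
To see the main thread for yourself, here is a minimal sketch. The names thread_names and record are made up for illustration, and the sketch passes a function to Thread through target= instead of subclassing. It asks the threading module for the name of the running thread, once from the calling code and once from a second thread. print is written with parentheses so the sketch also runs on Python 3.

```python
import threading

def thread_names():
    """Return (name of the calling thread, name of a freshly started thread)."""
    names = []

    def record():
        # Runs inside the new thread, so this is the new thread's name.
        names.append(threading.current_thread().name)

    worker = threading.Thread(target=record)
    worker.start()
    worker.join()  # wait until the worker has recorded its name
    return threading.current_thread().name, names[0]

caller_name, worker_name = thread_names()
print("caller: %s, worker: %s" % (caller_name, worker_name))
```

When this is run as a plain script, the caller is normally reported as MainThread, while the worker gets an automatically generated name.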

Multi threaded way:

You need to create a class that subclasses Thread:

import time
import urllib2
from threading import Thread

class GetUrlThread(Thread):
    def __init__(self, url):
        self.url = url
        super(GetUrlThread, self).__init__()

    def run(self):
        resp = urllib2.urlopen(self.url)
        print self.url, resp.getcode()

def get_responses():
    urls = ['http://www.google.com', 'http://www.amazon.com', 'http://www.ebay.com', 'http://www.alibaba.com', 'http://www.reddit.com']
    start = time.time()
    threads = []
    for url in urls:
        t = GetUrlThread(url)
        threads.append(t)
        t.start()
    # wait for every thread to finish before measuring the elapsed time
    for t in threads:
        t.join()
    print "Elapsed time: %s" % (time.time()-start)

get_responses()


Output is:

http://www.reddit.com 200
http://www.google.com 200
http://www.amazon.com 200
http://www.alibaba.com 200
http://www.ebay.com 200
Elapsed time: 0.689890861511
  • Notice the improvement in the running time of this program.
  • We wrote a multithreaded program to reduce the processor's idle time. While one thread waits for the response to its URL, the processor can work on another thread and fetch that thread's URL.
  • We wanted each thread to act on one URL, so we overrode the constructor of the Thread class to accept a URL.
  • Executing a thread means executing the thread's run().
  • So whatever we want the thread to do must go in its run().
  • We created one thread for each URL and called start() on it. This tells the processor that it can execute that thread, i.e. the thread's run().
  • We don't want the elapsed time to be computed until all the threads have finished; this is where join() comes into the picture.
  • Calling join() on a thread makes the main thread wait for that thread to finish before the main thread executes its next instruction.
  • We call join() on all the threads, so the elapsed time is printed only after all the threads have run.

A few things about threads
  • The processor might not execute the run() of a thread immediately after start().
  • You can't predict the order in which the run() of different threads will be called.
  • For a specific thread, it is guaranteed that the statements inside run() are executed sequentially.
  • This means that the URL associated with the thread is fetched first, and only then is the received response printed.
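
A small sketch can check the last two points. The names events and TwoStepThread are hypothetical, and the two steps are just labels standing in for "fetch" and "print". Each thread logs its two steps to a shared list; the sketch relies on list.append being thread-safe in CPython.

```python
from threading import Thread

events = []  # shared log of (thread name, step) pairs

class TwoStepThread(Thread):
    def run(self):
        # Within one thread these two statements always run in this order.
        events.append((self.name, 'fetch'))
        events.append((self.name, 'print'))

threads = [TwoStepThread() for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for t in threads:
    fetch_pos = events.index((t.name, 'fetch'))
    print_pos = events.index((t.name, 'print'))
    print("%s: fetch at %d, print at %d" % (t.name, fetch_pos, print_pos))
```

How the entries of different threads interleave can change between runs, but for any single thread the 'fetch' entry always comes before its 'print' entry.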

Example 2:

We will demonstrate a race condition with a program and then fix it.

Read the Wikipedia example to understand what a race condition means.

from threading import Thread

# define a global variable
some_var = 0

class IncrementThread(Thread):
    def run(self):
        # we want to read a global variable
        # and then increment it
        global some_var
        read_value = some_var
        print "some_var in %s is %d" % (self.name, read_value)
        some_var = read_value + 1
        print "some_var in %s after increment is %d" % (self.name, some_var)

def use_increment_thread():
    threads = []
    for i in range(50):
        t = IncrementThread()
        threads.append(t)
        t.start()
    # wait for all 50 threads before reading the final value
    for t in threads:
        t.join()
    print "After 50 modifications, some_var should have become 50"
    print "After 50 modifications, some_var is %d" % (some_var,)

use_increment_thread()


Run this several times and you may find a different result each time.

  • There is a global variable, and all the threads modify it.
  • Each thread should add 1 to the existing value of the variable.
  • There are 50 threads, so at the end the value of some_var should be 50, but it isn't.
Why didn't some_var reach 50?
  • At some point thread t1 read the value of some_var as 15, and then the processor took control away from this thread and gave it to thread t2.
  • t2 also read some_var as 15.
  • Both t1 and t2 set the value of some_var to 15+1, i.e. 16.
  • But when two threads act on some_var, we expect its value to increase by 2.
  • So we have a race condition here.
  • A similar race condition may have occurred a few more times, so the final value of some_var ends up as something like 41 or 42, or anything less than 50.
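
The interleaving described above can be replayed by hand, without any threads. This is only an illustration: t1_read_value and t2_read_value are made-up names that stand for the local read_value of each thread, and the starting value 15 is taken from the explanation.

```python
some_var = 15

# t1 reads some_var, then the processor switches to t2 before t1 writes back
t1_read_value = some_var
t2_read_value = some_var

# each thread now writes "the value I read, plus one"
some_var = t1_read_value + 1
some_var = t2_read_value + 1

print("some_var after two increments: %d" % some_var)
```

This prints 16 rather than 17: the second write overwrites the first because both were computed from the same stale read.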

Fix this race condition

Change the run() of IncrementThread to:

from threading import Lock
lock = Lock()

class IncrementThread(Thread):
    def run(self):
        # we want to read a global variable
        # and then increment it
        global some_var
        # only one thread at a time may read and modify some_var
        lock.acquire()
        read_value = some_var
        print "some_var in %s is %d" % (self.name, read_value)
        some_var = read_value + 1
        print "some_var in %s after increment is %d" % (self.name, some_var)
        lock.release()

Run use_increment_thread again and the result will match your expectation.

  • A Lock is used to guard against the race condition.
  • If thread t1 has acquired the lock before performing a set of operations, any other thread that tries to acquire the same lock must wait until t1 releases it.
  • We want to make sure that once t1 has read some_var, no other thread can read some_var until t1 has finished modifying it.
  • So reading some_var and modifying it are logically related operations here.
  • That is why we keep the read and the modification of some_var guarded by a Lock instance.
  • The Lock is a separate object, and it is acquired by whichever thread calls acquire() on it.
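
A Lock can also be used as a context manager: `with lock:` acquires it on entry and releases it on exit, even if the guarded code raises an exception. The sketch below is a variation on the example rather than the original code. The names counter, counter_lock, SafeIncrementThread and run_safe_increments are made up, and the short time.sleep() between the read and the write is an assumption added to invite a thread switch at the worst possible moment.

```python
import time
from threading import Lock, Thread

counter = 0
counter_lock = Lock()

class SafeIncrementThread(Thread):
    def run(self):
        global counter
        with counter_lock:  # acquired here, released when the block exits
            read_value = counter
            time.sleep(0.001)  # other threads may run now, but cannot enter
            counter = read_value + 1

def run_safe_increments(count=50):
    threads = [SafeIncrementThread() for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

final_value = run_safe_increments()
print("After 50 modifications, counter is %d" % final_value)
```

Because the read and the write happen while the lock is held, no thread can observe a stale value, and the counter reaches 50 despite the sleep.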

Example 3:

In the last example we saw how a global variable is affected by multithreading. Let's look at an example to verify that one thread does not affect the instance variables of another thread.

Let's introduce time.sleep() in this example. It puts the calling thread into a suspended state and so forces a thread switch to occur.

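import time
from threading import Thread

class CreateListThread(Thread):
    def run(self):
        self.entries = []
        for i in range(10):
            # sleeping suspends this thread and lets the others run
            time.sleep(0.01)  # the exact duration is an assumption
            self.entries.append(i)
        print self.entries

def use_create_list_thread():
    for i in range(3):
        t = CreateListThread()
        t.start()

use_create_list_thread()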

Run it a few times and you will notice that the lists do not get printed properly.

Possibly one thread was printing its entries when the processor switched to another thread, which then started printing its own entries. We want the entries of separate threads to be printed one after another.

Change run() of CreateListThread to use the lock:

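class CreateListThread(Thread):
    def run(self):
        self.entries = []
        for i in range(10):
            time.sleep(0.01)  # the exact duration is an assumption
            self.entries.append(i)
        # only one thread at a time may print its entries
        lock.acquire()
        print self.entries
        lock.release()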

So we put the print operation inside a lock. While one thread holds the lock and is printing its entries, no other thread can print its entries. As a result, you will see the entries of different threads printed on separate lines.

This shows that entries, which is an instance variable, is a list of the numbers 0 to 9 in every thread. So thread switching doesn't affect the instance variables of a thread.
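
The same check can be made from the main thread instead of by reading the printed output. In this sketch ListBuilderThread, builders and results are hypothetical names, and the sleep duration is an arbitrary small value. The main thread waits for every thread with join() and then inspects each thread's own entries attribute.

```python
import time
from threading import Thread

class ListBuilderThread(Thread):
    def run(self):
        self.entries = []  # instance variable: each thread object has its own
        for i in range(10):
            time.sleep(0.001)  # encourage switching between the threads
            self.entries.append(i)

builders = [ListBuilderThread() for _ in range(3)]
for t in builders:
    t.start()
for t in builders:
    t.join()

results = [t.entries for t in builders]
for t in builders:
    print("%s built %s" % (t.name, t.entries))
```

Each thread ends up with its own list of the numbers 0 to 9, however the three threads were interleaved while building them.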

Shawn Milochik

Thanks for the great introduction. It was very easy to follow.

Kiss György 22nd Sept., 2013

Nice article, thanks!

Ilias 24th Sept., 2013

I think you have a bug on your last example. This is what works for me synchronizing the threads:

class CreateListThread(Thread):
def run(self):
global lock
self.entries = []
for i in range(10):
print self.entries

Akshar Raaj 28th Sept., 2013

Can you be more specific about the bug? The example works fine for me.

berkay 29th Sept., 2013

Great examples. In the second example, adding time.sleep(.01) after read_value = some_var makes it easier to see that some_var does not add up to 50; otherwise, depending on the processor's scheduler, it is mostly 50.

@Ilias the code is correct: self.entries = [] is not a global variable. The only issue is that once one thread starts printing its entries, the others may print too, so the output is not printed correctly if the lock is not used.


Can I translate your article to Chinese?

Akshar Raaj 31st Oct., 2013

Sure, please do.


These examples rock! Thanks for sparing the time to teach others.

RIshabh Roy 27th March, 2014

Hi, I actually need help implementing my code using threads.
Currently I am working with the GOslate translate API.

I want to make an application that uses it to translate text files,
but I am unable to implement this using threads. Please suggest how I can read the file line by line and translate it.

© Agiliq, 2009-2012