Python Training – Part 3

This is part 3 of a Python training that I gave while I was working at SPIELO International. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Exceptions

In Python, exceptions are the primary error handling mechanism. Whether you access an invalid list index, or you open a file that doesn’t exist, or you divide by zero—an exception is raised in all of these cases. If your program doesn’t handle the exception explicitly, a traceback is printed and the program terminates:

Traceback (most recent call last):
  File "C:\temp\x.py", line 9, in <module>
    Main()
  File "C:\temp\x.py", line 7, in Main
    print Divide(5, 0)
  File "C:\temp\x.py", line 3, in Divide
    x = a / b
ZeroDivisionError: integer division or modulo by zero

Basic Usage

To handle exceptions, enclose code that might raise an exception in a “try” / “except” block. To raise an exception when something goes wrong, use the “raise” keyword.

If you are familiar with exception handling in C++, here’s how it maps to Python:

C++:

// Define your own exception class.
class MyException
{
public:
    MyException(const string& msg)
    :   m_msg(msg)
    { }

    string m_msg;
};

// Raise an exception.
void FunctionWithError()
{
    throw MyException("Oops.");
}

// Handle an exception.
void HandleException()
{
    try
    {
        FunctionWithError();
    }
    catch (MyException& e)
    {
        cerr << e.m_msg;
    }
    catch (YourException)
    {
        cerr << "Your exception.";
    }
    catch (...)
    {
        cerr << "Unknown error.";
    }
}

Python:

# Define your own exception class.
class MyException:
    def __init__(self, msg):
        self.m_msg = msg

# Raise an exception.
def FunctionWithError():
    raise MyException("Oops.")

# Handle an exception.
def HandleException():
    try:
        FunctionWithError()
    except MyException, e:
        print e.m_msg
    except YourException:
        print "Your exception."
    except:
        print "Unknown error."

“Exception” Base Class

Almost all of the exceptions that the Python interpreter or the standard library functions raise are derived from “Exception”. (The notable exceptions are “KeyboardInterrupt” and “SystemExit”, which derive directly from “BaseException” since Python 2.5 so that an “except Exception” clause does not swallow them.) When you define your own exception classes, you should derive from “Exception” as well.

For exceptions that don’t need anything besides a message:

class MyException(Exception):
    pass

try:
    raise MyException("Oh no!")
except Exception, e:
    # Catches all exceptions that are derived from Exception.
    print e

For exceptions that need more:

class MyExtendedException(Exception):
    def __init__(self, info):
        # Initialize the base class with our own message.
        Exception.__init__(self, str(info) + "it happens")
        self.m_info = info

try:
    raise MyExtendedException("Sh")
except MyExtendedException, e:
    print e, e.m_info
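
To see what deriving from “Exception” buys you, here is a small sketch that reports which “except” clause catches which exception class. The helper name “WhichHandler” is made up for this example, and the code is written so that it should run on Python 2.6+ as well as Python 3 (hence “print(...)” with a single argument). It also shows that “KeyboardInterrupt” and “SystemExit” are not caught by “except Exception”:

```python
class MyException(Exception):
    pass

def WhichHandler(exc_class):
    """Raise exc_class and report which except clause caught it."""
    try:
        raise exc_class("demo")
    except Exception:
        return "Exception"
    except BaseException:
        return "BaseException"

for cls in (MyException, ZeroDivisionError, KeyboardInterrupt, SystemExit):
    print("%s is caught by 'except %s'" % (cls.__name__, WhichHandler(cls)))
```

The first two classes are reported under “Exception”, the last two under “BaseException”.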

Catching Multiple Types of Exceptions

The “except” keyword accepts a tuple with any number of exception classes:

try:
    if x:
        raise FirstException()
    else:
        raise SecondException()
except (FirstException, SecondException), e:
    print e

The previous code is equivalent to:

try:
    if x:
        raise FirstException()
    else:
        raise SecondException()
except FirstException, e:
    print e
except SecondException, e:
    print e

“try” / “except” / “else”

To run a code block only if no exception was raised, add an “else” clause to the “try” block:

try:
    print "Don't raise."
except:
    print "We never get here."
else:
    print "This is only run when no exception occurred."
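
One reason to use “else” is that it keeps the “try” body down to the single call that is expected to fail, so an error in the follow-up code is not mistaken for the error you meant to handle. A minimal sketch, assuming a made-up function “ParseAndDouble” (written to run on Python 2.6+ and Python 3):

```python
def ParseAndDouble(text):
    try:
        value = int(text)      # the only line we expect to fail
    except ValueError:
        return None            # text was not a number
    else:
        # Runs only if int() succeeded. An exception raised here
        # would not be caught by the "except ValueError" above.
        return value * 2

print(ParseAndDouble("21"))    # 42
print(ParseAndDouble("abc"))   # None
```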

“try” / “except” / “finally”

To run a code block regardless of whether an exception occurred or not, use “finally”:

def IntermediateFunction(fail):
    try:
        FunctionThatFailsSometimes(fail)
    finally:
        print "We always get here."
        # Hidden homework: See what happens when you
        # add "return" here. (Hint: Does the exception still
        # get through?)

def FunctionThatFailsSometimes(fail):
    if fail:
        print "Raise."
        raise Exception()
    else:
        print "Don't raise."

try:
    print "---Fail"
    IntermediateFunction(True)
except:
    print "The exception still gets through."
finally:
    print '"except" and "finally" can be used together.'

try:
    print "---Success"
    IntermediateFunction(False)
except:
    print "We never get here."
finally:
    print '"except" and "finally" can be used together.'

Output:

---Fail
Raise.
We always get here.
The exception still gets through.
"except" and "finally" can be used together.
---Success
Don't raise.
We always get here.
"except" and "finally" can be used together.

Printing a Traceback

Sometimes you want to handle an exception and still print the same kind of traceback that you would get from the interpreter if you didn’t have a “try” / “except”. Use the “traceback” module for this:

import traceback
try:
    raise Exception()
except:
    traceback.print_exc()    # print to stderr
    f = open("exception.txt", "w")
    traceback.print_exc(file=f)
    f.close()
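
If you would rather have the traceback as a string (for example, to pass it to your own logging code), the “traceback” module also provides “format_exc()”. The following is only a sketch; the function names “Divide” and “DescribeFailure” are invented for the example, and the code is meant to run on Python 2.6+ and Python 3:

```python
import traceback

def Divide(a, b):
    return a / b

def DescribeFailure(a, b):
    """Return the traceback text if Divide fails, else None."""
    try:
        Divide(a, b)
    except ZeroDivisionError:
        # format_exc() returns what print_exc() would have printed.
        return traceback.format_exc()
    return None

report = DescribeFailure(5, 0)
print(report)
```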

Operator Overloading and Other Magic

A class can define a number of special methods to do things for which you would use operator overloading in C++.

Note: Some of these methods make your class behave more like one of the built-in types, like “list”, “dict”, or “str”. If your class is merely an extension of one of these types, consider deriving from the built-in class, but mind the Liskov Substitution Principle.

Destructor

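To perform clean-up when an object is destroyed, add a “__del__” method to your class. The method runs when the last reference to the object goes away, not each time “del” removes one of several references: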

class MyClass(object):
    def __del__(self):
        print "Object is deleted."

a = MyClass()
b = {1: a}
print "Removing first reference to object"
del a
print "Removing last reference to object"
b.clear()

# Output:
#    Removing first reference to object
#    Removing last reference to object
#    Object is deleted.

Support “str()” and “repr()”

When you call “str()” and “repr()” on an instance of a user-defined class, you get something like this:

<MyClass instance at 0x12345678>

To end up with something nicer, add “__str__” and “__repr__” methods to your class:

class MyClass(object):
    def __str__(self):
        return "I am a MyClass."

    def __repr__(self):
        return "MyClass()"

x = MyClass()
print str(x)     # prints "I am a MyClass."
print repr(x)    # prints "MyClass()"

Note: If you don’t have a “__str__” method, the “__repr__” method is used for “str()” as well.
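
A short sketch of that fallback, using a hypothetical “Point” class that defines only “__repr__” (written to run on Python 2.6+ and Python 3). It also shows that containers such as lists use “repr()” for their elements:

```python
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return "Point(%r, %r)" % (self.x, self.y)

p = Point(1, 2)
print(str(p))        # Point(1, 2)  (str() falls back to __repr__)
print(str([p, p]))   # [Point(1, 2), Point(1, 2)]
```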

If the string representation should be a Unicode string, add a “__unicode__” method:

class MyClass(object):
    def __unicode__(self):
        return u"\xe4\xf6\xfc"

    def __str__(self):
        return "aou"

x = MyClass()
print str(x)     # prints "aou"
print "%s" % x   # prints "aou"
print unicode(x) # prints "äöü"
print u"%s" % x  # prints "äöü"

Support “==”, “!=”, “<”, “<=”, etc.

To support comparison operations, add these methods to your class:

Operator    Method
==          __eq__
!=          __ne__
<           __lt__
<=          __le__
>           __gt__
>=          __ge__

Each of these methods receives the other object as its only parameter and should return True or False:

class MyClass(object):
    def __init__(self, a, b):
        self.m_a = a
        self.m_b = b
    def __eq__(self, other):
        return (self.m_a, self.m_b) == (other.m_a, other.m_b)
    def __ne__(self, other):
        return not self.__eq__(other)
    def __lt__(self, other):
        return (self.m_a, self.m_b) < (other.m_a, other.m_b)
    def __le__(self, other):
        return self.__lt__(other) or self.__eq__(other)
    def __gt__(self, other):
        return (self.m_a, self.m_b) > (other.m_a, other.m_b)
    def __ge__(self, other):
        return self.__gt__(other) or self.__eq__(other)
    def __repr__(self):
        return repr((self.m_a, self.m_b))

a = MyClass(3, 7)
b = MyClass(3, 7)
print a == b   # True
print a != b   # False
print a < b    # False
print a > b    # False
print a >= b   # True

# Comparison is also required for sorting:
ls = [MyClass(3, 7), MyClass(5, 2), MyClass(3, 5)]
ls.sort()
print ls    # prints [(3, 5), (3, 7), (5, 2)]
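
Writing all six methods by hand is repetitive. As an alternative sketch, assuming Python 2.7 or later (where “functools.total_ordering” is available), you can define “__eq__” and one ordering method and let the decorator derive the rest. The class name “Pair” and the helper “_key” are made up for this example:

```python
import functools

@functools.total_ordering
class Pair(object):
    def __init__(self, a, b):
        self.m_a = a
        self.m_b = b
    def _key(self):
        return (self.m_a, self.m_b)
    def __eq__(self, other):
        return self._key() == other._key()
    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        return not self.__eq__(other)
    def __lt__(self, other):
        return self._key() < other._key()
    def __repr__(self):
        return repr(self._key())

pairs = sorted([Pair(3, 7), Pair(5, 2), Pair(3, 5)])
print(pairs)                      # [(3, 5), (3, 7), (5, 2)]
print(Pair(3, 7) >= Pair(3, 7))   # True
print(Pair(3, 7) > Pair(5, 2))    # False
```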

Support “[ ]”

To support the subscript operator “[ ]”, add “__getitem__” and “__setitem__” methods. For list-like classes, these methods should receive an integer index and raise an “IndexError” if the index is invalid. For dict-like classes, these methods should receive a key of any type and raise a “KeyError” if the key is unknown.

class Alphabet(object):
    def __getitem__(self, idx):
        if 0 <= idx < 26:
            return chr(idx + ord("A"))
        else:
            raise IndexError("Index out of range")

    def __setitem__(self, idx, value):
        if 0 <= idx < 26:
            print "All %s will be changed to %s" % (
                chr(idx + ord("A")), value)
        else:
            raise IndexError("Index out of range")

x = Alphabet()
print x[3]       # prints "D"
x[2] = "Y"       # prints "All C will be changed to Y"
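
The example above is list-like. For the dict-like case, here is a minimal sketch of a mapping that ignores the case of its string keys and raises “KeyError” for unknown keys. The class name “CaseInsensitiveDict” is invented for this example, and the code is written to run on Python 2.6+ and Python 3:

```python
class CaseInsensitiveDict(object):
    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        try:
            return self._data[key.lower()]
        except KeyError:
            # Report the key as the caller spelled it.
            raise KeyError(key)

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

d = CaseInsensitiveDict()
d["Content-Type"] = "text/plain"
print(d["content-type"])    # text/plain

try:
    d["Accept"]
except KeyError:
    print("No such key.")
```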

Support “for” Loops (Iteration)

There are two ways of supporting “for” loops over a sequence:

  • Implement “len()” and “[ ]” with “__len__” and “__getitem__” (the loop itself only needs “__getitem__”; it stops at the first “IndexError”)
  • Implement an iterator with “__iter__”

Using “len()” and “[ ]”

class MyList(object):
    def __len__(self):
        return 5

    def __getitem__(self, idx):
        if 0 <= idx < len(self):
            return idx * 10
        else:
            raise IndexError("Index out of range")

ls = MyList()
for i in ls:
    print i,

# Output:
#    0 10 20 30 40

Using an Iterator

class MyList(object):
    class MyIterator(object):
        def __init__(self, the_str):
            self.__m_the_str = the_str
            self.__m_idx = 0
        def __iter__(self):
            return self
        def next(self):
            if self.__m_idx >= len(self.__m_the_str):
                raise StopIteration()
            else:
                self.__m_idx += 1
                return self.__m_the_str[self.__m_idx - 1]

    def __iter__(self):
        return MyList.MyIterator("Iterate over this")

ls = MyList()
for i in ls:
    print i,

# Output:
#    I t e r a t e  o v e r  t h i s

How it works:

  • The “for” loop calls the “__iter__” method.
  • The “__iter__” method must return an iterator object that implements two methods:
    • “__iter__”, which returns the iterator itself
    • “next”, which is called repeatedly to retrieve the elements, until it raises a “StopIteration” exception.
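
The steps above can be spelled out by hand. The following sketch does roughly what a “for” loop does internally; the function name “ManualFor” is made up. It uses the built-in “iter()” and “next()” functions (available since Python 2.6) so that it also runs on Python 3, where the iterator method is called “__next__” instead of “next”:

```python
def ManualFor(iterable):
    """Collect the items of iterable the way a "for" loop would."""
    items = []
    iterator = iter(iterable)        # asks the object for an iterator
    while True:
        try:
            item = next(iterator)    # calls the iterator's next method
        except StopIteration:
            break                    # the "for" loop ends silently here
        items.append(item)
    return items

print(ManualFor([10, 20, 30]))   # [10, 20, 30]
print(ManualFor(()))             # []
```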

Using an Iterator and a Generator Function

The preceding example can be written much more concisely using a “generator” function:

class MyList(object):
    def __iter__(self):
        for c in "Iterate over this":
            yield c

ls = MyList()
for i in ls:
    print i,

# Output:
#    I t e r a t e  o v e r  t h i s

How it works:

  • The “yield” keyword turns a normal function into a generator function.
  • When the generator function is called, it really returns an iterator.
  • For each step of the iteration, the function executes until it encounters a “yield”.
  • The result of the “yield” becomes the value of the current step.
  • The iteration continues until the function exits through an implicit or explicit “return”.

Another generator example:

def OddFibonacci(maximum):
    a, b = 0, 1
    while b <= maximum:
        if (b % 2) == 0:
            yield str(b) + " is even!"
        else:
            yield b
        a, b = b, a + b

for x in OddFibonacci(5):
    print x

# Output:
#    1
#    1
#    2 is even!
#    3
#    5
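
Because a generator produces its values only on demand, it can even describe an endless sequence; the caller decides how much of it to consume. A sketch, assuming a made-up generator “Fibonacci” and using “itertools.islice()” from the standard library to take the first ten values (runs on Python 2.6+ and Python 3):

```python
import itertools

def Fibonacci():
    a, b = 0, 1
    while True:          # never returns; the caller stops asking
        yield b
        a, b = b, a + b

first_ten = list(itertools.islice(Fibonacci(), 10))
print(first_ten)    # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```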

Calling an Object like a Function (Functors)

To be able to call an object like a function, add a “__call__” method with any number of arguments:

class MyFunctor(object):
    def __init__(self, factor):
        self.__m_factor = factor

    def __call__(self, a):
        return self.__m_factor * a

f = MyFunctor(10)
print f(3)    # prints 30

This particular example can also be written using the “lambda” keyword:

f = lambda a: 10 * a
    # This is equivalent to:
    #    def f(a):
    #        return 10 * a
print f(3)    # prints 30
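
Where a functor goes beyond a lambda is when the callable has to remember something between calls. A minimal sketch, with an invented class name “RunningTotal” (runs on Python 2.6+ and Python 3):

```python
class RunningTotal(object):
    def __init__(self):
        self.total = 0

    def __call__(self, amount):
        # State survives between calls because it lives on the object.
        self.total += amount
        return self.total

add = RunningTotal()
print(add(10))   # 10
print(add(5))    # 15
```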

More

There are many other special methods that we didn’t talk about. See chapter 3.4, “Special method names,” in the “Python Reference Manual.”

Common Scripting Tasks

Walking a Directory Structure

The “os.walk()” function walks a directory tree and returns a generator that yields one tuple of the form “(dirpath, dirnames, filenames)” for each directory it visits. Here’s an example:

import os
root = r"c:\temp"
for dirpath, dirnames, filenames in os.walk(root):
    print dirpath    # dirpath already starts with root
    print "  Sub-directories:",
    prefix = "\n   - "
    print prefix + prefix.join(dirnames)
    print "  Files:",
    print prefix + prefix.join(filenames)

  • To list the contents of a single directory, you can also use “os.listdir()”.
  • To list filenames that match a pattern (e.g., “*.txt” or “log????.*”), use “glob.glob()”.

More info: See the docs of the following modules for other filesystem-related functions that you might find useful: “os”, “os.path”, “shutil”, “glob”
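
“glob.glob()” looks at a single directory. To match a pattern in a whole tree, one option is to combine “os.walk()” with “fnmatch.filter()”, which applies the same wildcard rules to a list of names. This is only a sketch; the function name “FindFiles” is made up, and the code is written to run on Python 2.6+ and Python 3:

```python
import fnmatch
import os

def FindFiles(root, pattern):
    """Return the paths of all files below root whose name matches pattern."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

For example, FindFiles(r"c:\temp", "*.txt") would return every “.txt” file in “c:\temp” and its sub-directories.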

Running External Programs

There are several ways of running external programs. The most important ones are:

  • Call “os.system()” with the command as you would type it on the command line. This function waits for the program to finish and returns its exit code.
  • Use the “subprocess.Popen” class to run a program asynchronously (without blocking the calling Python program) and to communicate with it via stdin, stdout, and stderr.
  • Use “os.startfile()” (Windows only) to open a file with its associated program. For example, to open a Word document in Word, you can write “os.startfile('document1.doc')”. This is like double-clicking the file in Explorer.

Here’s an example of using “subprocess.Popen” to lay out a graph using “dot.exe” (from the Graphviz package):

import subprocess

PROGRAM_PATH = r"dot.exe"

p = subprocess.Popen(PROGRAM_PATH + " -T plain",
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE)
graph = """digraph {
    a -> b
    b -> c
    a -> c
    }
    """
stdout, stderr = p.communicate(graph)

print stdout
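
The same pattern works for any program that reads stdin and writes stdout. If you don’t have Graphviz installed, the following sketch lets you try it anyway by using the running Python interpreter (“sys.executable”) as a stand-in for the external tool. The helper name “RunAndCapture” is made up; passing the command as a list avoids quoting problems with paths that contain spaces. The code is written to run on Python 2.6+ and Python 3:

```python
import subprocess
import sys

def RunAndCapture(args, input_bytes=None):
    """Run a program, feed it input_bytes on stdin, and
    return (exit code, stdout bytes)."""
    p = subprocess.Popen(args,
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE)
    stdout, _ = p.communicate(input_bytes)
    return p.returncode, stdout

# The child process upper-cases whatever it receives on stdin.
code = "import sys; sys.stdout.write(sys.stdin.read().upper())"
exit_code, output = RunAndCapture([sys.executable, "-c", code], b"a -> b")
print(exit_code)                 # 0
print(output.decode("ascii"))    # A -> B
```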

Instead of redirecting the output of the program, we can just as well work with temporary files and “os.system()”:

import os
import tempfile

DOT_PATH = r"dot.exe"
EXAMPLE_GRAPH = "digraph { a -> b; b -> c; a -> c }"

def GetTempFilename():
    fh, fname = tempfile.mkstemp(suffix=".tmp", prefix="dot")
    os.close(fh)
    return fname

def LayoutGraph(graph):
    input_file = GetTempFilename()
    output_file = GetTempFilename()

    try:
        open(input_file, "w").write(graph)
        os.system(DOT_PATH + ' -T plain -o "%s" "%s"'
                  % (output_file, input_file))
        return open(output_file).read()
    finally:
        # Delete the temporary files.
        os.remove(input_file)
        os.remove(output_file)

if __name__ == "__main__":
    print LayoutGraph(EXAMPLE_GRAPH)

Regular Expressions

Regular expressions facilitate searching for patterns in a string. The syntax appears a bit cryptic at first (which is probably due to it being cryptic), but don’t give up easily. It’s often easier to use a regular expression than to perform the same parsing using basic string operations like “find”, “split”, and slicing.

As an example, let’s assume we have a text file that contains dates in a certain format:

...
Meeting on 2007-09-13. Call Joe at 555-1232-4756.
... See last week's report (2008-02-01). ...
Reservations were made from 2009-1-17
to 2009-2-16.
...

Using a Python script, we’d like to transform it to this:

...
Meeting on Thursday, 13 September 2007. Call Joe at 555-1232-4756.
... See last week's report (Friday, 01 February 2008). ...
Reservations were made from Saturday, 17 January 2009
to Monday, 16 February 2009.
...

Let’s start with a regex that matches only the string “2007-09-13” and use the “re.sub()” function to replace it with “Thursday, 13 September 2007”:

import re

text = """...
Meeting on 2007-09-13. Call Joe at 555-1232-4756.
... See last week's report (2008-02-01). ...
Reservations were made from 2009-1-17
to 2009-2-16.
...
"""

print re.sub(r"2007-09-13", "Thursday, 13 September 2007", text)

Note: You should always use a raw string (r”…”) for the pattern string. (Backslashes are used frequently as part of the regular expression syntax. If you don’t use raw strings, you have to escape each backslash, which makes patterns harder to read.)
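
A quick sketch of the difference (runs on Python 2.6+ and Python 3): the two pattern strings below are identical, but the raw one is easier to read. The last lines show a case where forgetting the “r” silently changes the meaning:

```python
import re

escaped = "\\d+\\.\\d+"    # every backslash doubled
raw = r"\d+\.\d+"          # the same pattern as a raw string
print(escaped == raw)      # True

print(re.findall(raw, "pi is 3.14, e is 2.71"))   # ['3.14', '2.71']

# Without the "r", "\b" is a single backspace character,
# not the two-character regex token for a word boundary.
print(len("\b"))     # 1
print(len(r"\b"))    # 2
```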

Instead of hard-coding the replacement string, what we really want is to call a function each time the pattern matches and calculate the replacement string in the function. This is possible by passing a function to “re.sub()”:

...
def ReplaceDate(match):
    return "Thursday, 13 September 2007"
print re.sub(r"2007-09-13", ReplaceDate, text)

The argument “match” to the “ReplaceDate” function is a “re.MatchObject” instance. To find out what a match object can do, try this in an interactive shell:

>>> text = "Meeting on 2007-09-13."
>>> m = re.search(r"2007-09-13", text)
>>> m
… <_sre.SRE_Match object at 0x01E84058>
>>> dir(_)
… ['__copy__', '__deepcopy__', 'end', 'expand', 'group', 'groupdict', 'groups', 'span', 'start']
>>> m.group()
… '2007-09-13'
>>> m.start(), m.end()
… (11, 21)
>>> text[11:21]
… '2007-09-13'

Let’s write a better regular expression:

def ReplaceDate(match):
    return repr(match.groups())
print re.sub(r"(\d{4})-(\d{1,2})-(\d{1,2})", ReplaceDate, text)

Output:

...
Meeting on ('2007', '09', '13'). Call Joe at 555-1232-4756.
... See last week's report (('2008', '02', '01')). ...
Reservations were made from ('2009', '1', '17')
to ('2009', '2', '16').
...

Let’s pick apart the regular expression: (\d{4})-(\d{1,2})-(\d{1,2})

  • “\d” matches any decimal digit.
  • Appending {m,n} matches m to n repetitions of the preceding pattern. For example, “\d{1,2}” matches a single digit or two digits.
  • Other ways of indicating repetitions are “\d?” (an optional digit), “\d+” (one or more digits), and “\d*” (zero or more digits).
  • The parentheses are used to create groups. A tuple of these groups is returned by the “groups” method of the match object.
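
To try out the pieces described in the list above, here is a small sketch. The sample strings are made up for illustration, and the code is written to run on Python 2.6+ and Python 3:

```python
import re

date_pattern = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})")
m = date_pattern.search("Reservations were made from 2009-1-17")
print(m.group(0))    # 2009-1-17  (the whole match)
print(m.group(1))    # 2009       (the first group)
print(m.groups())    # ('2009', '1', '17')

# The three repetition operators applied to the same input:
print(re.findall(r"x\d?", "x x1 x12"))   # ['x', 'x1', 'x1']
print(re.findall(r"x\d+", "x x1 x12"))   # ['x1', 'x12']
print(re.findall(r"x\d*", "x x1 x12"))   # ['x', 'x1', 'x12']
```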

What’s missing is some code that converts a tuple like “('2007', '09', '13')” to the string “Thursday, 13 September 2007”. Here’s the final code:

import datetime
import re

text = """...
Meeting on 2007-09-13. Call Joe at 555-1232-4756.
... See last week's report (2008-02-01). ...
Reservations were made from 2009-1-17
to 2009-2-16.
...
"""

def ReplaceDate(match):
    year, month, day = map(int, match.groups())
    date = datetime.date(year, month, day)
    return date.strftime("%A, %d %B %Y")

print re.sub(r"(\d{4})-(\d{1,2})-(\d{1,2})", ReplaceDate, text)

More info: See the docs of the “re” module. An overview of the regular expression syntax can be found in the section “Regular Expression Syntax” in the “Python Library Reference.”
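
One thing the date-replacement script does not handle is text that looks like a date but isn’t one (say, “2009-2-30”): “datetime.date()” raises a “ValueError” for it. A possible variation, sketched here with the invented names “DATE_PATTERN” and “ReplaceDateSafely”, leaves such text untouched. It is written to run on Python 2.6+ and Python 3; the weekday and month names assume the default (English) locale:

```python
import datetime
import re

DATE_PATTERN = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})")

def ReplaceDateSafely(match):
    year, month, day = map(int, match.groups())
    try:
        date = datetime.date(year, month, day)
    except ValueError:
        # Not a real calendar date: leave the text as it was.
        return match.group(0)
    return date.strftime("%A, %d %B %Y")

print(DATE_PATTERN.sub(ReplaceDateSafely, "Due 2009-2-16, not 2009-2-30."))
# Due Monday, 16 February 2009, not 2009-2-30.
```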

Homework

The homework combines several techniques presented in this handout. The program can be written in less than 100 lines (comments included).

Write a program that draws an “#include” graph of some C++ code of your choice:

  • Walk one or more directories that contain your “.cpp” and “.h” files.
  • Open each “.cpp” file and search for “#include” directives (preferably using a regular expression).
  • For each “.cpp” file, store a list of all “.h” files that you find in the “#include” directives. The pairs of “.cpp” and “.h” files are the edges of your graph.
  • Once you have all the edges, write a graph file for “dot.exe” similar to this:
    digraph {
        "main.cpp" -> "helper.h"
        "main.cpp" -> "container.h"
        "helper.cpp" -> "os.h"
        "helper.cpp" -> "helper.h"
        ...
    }
  • Invoke “util\dot\dot.exe -T png -o includes.png the_graph.txt” to draw the include graph.
