Python Training – Part 2

This is part 2 of a Python training that I gave while I was working at SPIELO International. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Note: The training was based on Python 2.x, because that’s what we were using at the time. I would love to update it to Python 3 at some point. Any help with this would be greatly appreciated.

Namespaces and Scopes

When you define a variable anywhere in a Python function, the variable is added to the local namespace of the function. Unlike in C++, even if the variable is defined inside an “if” statement or in a “for” loop, the variable name is also visible outside the “if” or “for” block:

def SomeFunc(a):
    def InnerFunc():
        return 2 * x    # x is visible here as well
    if a:
        x = a
    else:
        x = 0
    return x + InnerFunc()

If a name is not found in the local scope at runtime, Python searches the scopes of any enclosing functions next (this is how “InnerFunc” finds “x” above), then the global scope (aka module scope, where a module refers to the “.py” file), and finally the built-in scope (containing things like “int()”, “sorted()”, etc.).

the_global = 123

def SomeFunc():
    return the_global

def OtherFunc():
    the_global = 456
    return the_global

def GlobalFunc():
    global the_global
    the_global = 789
    return the_global

def BuggyFunc():
    x = the_global
    the_global = 456
    return x

print the_global    # prints 123
print SomeFunc()    # prints 123
print OtherFunc()   # prints 456
print the_global    # still prints 123
print GlobalFunc()  # prints 789
print the_global    # prints 789
print BuggyFunc()   # UnboundLocalError: local variable
                    # 'the_global' referenced before assignment

Things to note:

  • When you assign to a variable in a function, you are always writing to the local namespace of the function, even if the variable name exists in the global scope as well (see “OtherFunc”).
  • To assign to a variable in the global namespace, use the “global” keyword (see “GlobalFunc”).
  • As soon as you assign to a variable in a function, reading from the variable anywhere in the function accesses the local variable, even in a line that precedes the assignment. Therefore, you receive an error in “BuggyFunc”, because “x = the_global” tries to read the local variable “the_global”, which hasn’t been assigned yet.

You can use these built-in functions to access the contents of namespaces:

  • globals(): Returns a dictionary containing the variables in the module scope.
  • locals(): Returns a dictionary containing the variables in the local (class or function) scope. You should not modify the dictionary.
  • dir(): When called without parameters, returns a list of variables in the local scope. When called with an object as the parameter, returns a list of attributes of the object.
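A small sketch of these functions in action. The names “module_level”, “LocalNames”, and “IsGlobal” are made up for this example, and single-argument “print(...)” calls are used so the snippet runs under both Python 2.7 and Python 3:

```python
module_level = 123    # a variable in the module (global) scope

def LocalNames(param):
    local_value = param * 2
    # locals() returns a dictionary of the function's local variables.
    # Reading it is fine; don't rely on modifying it.
    return sorted(locals().keys())

def IsGlobal(name):
    # globals() returns the dictionary of the module namespace.
    return name in globals()

print(LocalNames(21))            # ['local_value', 'param']
print(IsGlobal("module_level"))  # True
print(IsGlobal("local_value"))   # False: it only exists inside LocalNames
# With an argument, dir() lists the attributes of that object.
print("upper" in dir("text"))    # True
```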

You can remove the binding of a name to an object by using the “del” keyword:

>>> the_global = 123
>>> print the_global
…
>>> del the_global
>>> print the_global
… NameError: name 'the_global' is not defined

Classes

To define a class in Python, use the “class” keyword. For the methods, use the “def” keyword, just like for functions.

class SomeClass(object):
    """@brief The docstring for the class."""

    def __init__(self, initial_value):
        """@brief This is the constructor, or more precisely,
                the initializer of the class.
        """
        self.m_some_member = initial_value

    def SomeMethod(self, inc):
        self.m_some_member += inc
        return self.m_some_member

# Working with the class
c = SomeClass(100)
print c.SomeMethod(10)    # prints 110
c.m_some_member = 0
print c.m_some_member     # prints 0

What we see:

  • New-style classes are derived from the “object” class.
    • If you write just “class SomeClass:”, you still get a class, but some of the features that we’ll discuss later (such as properties) won’t work. So you should always derive from “object” (or from a class that’s already derived from “object”) if possible.
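  • The equivalent to a C++ constructor is the “__init__” method. You can leave it out if you have no fields to initialize. (“__del__” is roughly the opposite: it is called when the object is about to be destroyed, but unlike with a C++ destructor, you should not rely on exactly when that happens.)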
  • The first parameter to all methods must be the object instance. By convention, you should name it “self”.
    • This is the equivalent to the implicit “this” parameter in C++.
  • To create fields, assign to attributes of “self”. Typically, this is done in “__init__”, but it can be done anywhere.
    • According to our Python coding guidelines, fields are prefixed with “m_”, but this is just convention.
  • To create instances of the class, call the class like a function with the parameters specified for “__init__”.
  • You can access fields directly from outside the class (“c.m_some_member = …”).

Inheritance

To create a derived class, specify a comma-separated list of base classes when you define the class:

class DerivedClass(BaseA, BaseB):

The derived class inherits all the attributes of the base classes. When an attribute name appears both in “BaseA” and in “BaseB”, the attribute from “BaseA” has precedence, because it appears first in the list of base classes.

An example demonstrating some aspects of inheritance:

class BaseA(object):
    def __init__(self):
        self.m_member = None

    def MethodA(self):
        print "BaseA.MethodA", self.m_member

    def CommonMethod(self):
        print "BaseA.CommonMethod"

class BaseB(object):
    def MethodB(self):
        print "BaseB.MethodB"

    def CommonMethod(self):
        print "BaseB.CommonMethod"

class DerivedClass(BaseA, BaseB):
    def MethodA(self):
        print "DerivedClass.MethodA"
        self.m_member = "hi!"
        # Call the inherited method
        BaseA.MethodA(self)

d = DerivedClass()
d.MethodA()
# Output:
#  DerivedClass.MethodA
#  BaseA.MethodA hi!

d.MethodB()
# Output:
#  BaseB.MethodB

d.CommonMethod()
# Output:
#  BaseA.CommonMethod

What we see:

  • “BaseA.__init__” is invoked automatically when you create an instance of “DerivedClass”.
  • An attribute in a derived class overrides an attribute of the same name in the base class (“DerivedClass.MethodA”). The base class attribute itself is not changed.
  • If an attribute appears in more than one base class, the attribute from the class that was specified first in the list of base classes has precedence (“BaseA.CommonMethod”).
  • To invoke the base class implementation of a method, write BaseClass.MethodName and pass “self” as the first parameter explicitly.
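For new-style classes, Python stores this lookup order in the class attribute “__mro__” (method resolution order). The sketch below uses simplified stand-ins for “BaseA”, “BaseB”, and “DerivedClass” (their methods return strings instead of printing) and adds a hypothetical “SwappedClass” to show that reversing the base classes reverses the precedence:

```python
class BaseA(object):
    def CommonMethod(self):
        return "BaseA.CommonMethod"

class BaseB(object):
    def CommonMethod(self):
        return "BaseB.CommonMethod"

class DerivedClass(BaseA, BaseB):
    pass

class SwappedClass(BaseB, BaseA):    # same bases, opposite order
    pass

# For these simple hierarchies, __mro__ lists the class itself, then the
# bases in the order they were declared, then object.
print([cls.__name__ for cls in DerivedClass.__mro__])
# ['DerivedClass', 'BaseA', 'BaseB', 'object']
print(DerivedClass().CommonMethod())   # BaseA.CommonMethod
print(SwappedClass().CommonMethod())   # BaseB.CommonMethod
```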

Please note that if you define your own “__init__” in the derived class, or if you have multiple base classes with an “__init__”, you are responsible for invoking the base class implementation of “__init__”:

class BaseClass(object):
    def __init__(self):
        self.m_member = 0

    def SomeMethod(self):
        return self.m_member

class GoodDerivedClass(BaseClass):
    def __init__(self):
        BaseClass.__init__(self)

class BadDerivedClass(BaseClass):
    def __init__(self):
        pass

good = GoodDerivedClass()
print good.SomeMethod()    # prints 0

bad = BadDerivedClass()
print bad.SomeMethod()     # AttributeError: 'BadDerivedClass'
                           # object has no attribute 'm_member'
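The second case, multiple base classes that each define “__init__”, might look like the sketch below. “Named”, “Counted”, and “NamedCounter” are hypothetical classes invented for the example; the derived initializer calls each base implementation explicitly, in the same “BaseClass.__init__(self)” style as “GoodDerivedClass”:

```python
class Named(object):
    def __init__(self, name):
        self.m_name = name

class Counted(object):
    def __init__(self):
        self.m_count = 0

class NamedCounter(Named, Counted):
    def __init__(self, name):
        # Once NamedCounter defines its own __init__, neither base
        # __init__ runs automatically, so call each one explicitly.
        Named.__init__(self, name)
        Counted.__init__(self)

nc = NamedCounter("widgets")
print(nc.m_name)     # widgets
print(nc.m_count)    # 0
```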

Public, private, protected

Python distinguishes between public and private attributes. Any attribute name prefixed with two underscores becomes private (except for names of the form “__xy__”).

class SomeClass(object):
    def PublicMethod(self):
        self.__m_private_field = "encapsulated"
        return self.__PrivateMethod()

    def __PrivateMethod(self):
        return self.__m_private_field

c = SomeClass()
print c.PublicMethod()    # prints "encapsulated"
print c.__PrivateMethod() # AttributeError: 'SomeClass' object has
                          # no attribute '__PrivateMethod'
print c.__m_private_field # AttributeError: 'SomeClass' object has
                          # no attribute '__m_private_field'

What we see:

  • To make an attribute (method or field) private, prefix it with “__”.
  • Private attributes can only be accessed from inside the class.
  • An “AttributeError” exception is raised when you try to access private attributes from outside the class.
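Strictly speaking, this privacy is implemented by name mangling: inside a class body, a name such as “__m_private_field” is rewritten to “_SomeClass__m_private_field”. The sketch below repeats the “SomeClass” example to show that the mangled names are still reachable from outside, so the mechanism guards against accidental access and name clashes in derived classes rather than against deliberate access:

```python
class SomeClass(object):
    def PublicMethod(self):
        self.__m_private_field = "encapsulated"
        return self.__PrivateMethod()

    def __PrivateMethod(self):
        return self.__m_private_field

c = SomeClass()
c.PublicMethod()
# Inside the class body, "__m_private_field" became "_SomeClass__m_private_field".
print(c._SomeClass__m_private_field)    # encapsulated
print(c._SomeClass__PrivateMethod())    # encapsulated
print(hasattr(c, "__m_private_field"))  # False: the unmangled name doesn't exist
```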

Python does not support protected attributes (i.e., attributes that you can access in derived classes only). By convention, we prefix such attributes with a single underscore, so that users of the class know they’re an implementation detail, but authors of derived classes can still access them:

class BaseClass(object):
    def _ProtectedMethod(self):
        self._m_protected = "protected"

class DerivedClass(BaseClass):
    def PublicMethod(self):
        self._ProtectedMethod()
        print self._m_protected

d = DerivedClass()
d.PublicMethod()    # prints "protected"

Properties

Python supports “properties”, i.e., pairs of Get/Set methods that are called transparently when you access an attribute:

class SomeClass(object):
    def __init__(self, initial_value):
        self.__m_read_write_prop = initial_value
        self.__m_read_only_prop = initial_value

    def __GetReadWriteProp(self):
        print "Someone's reading ReadWriteProp"
        return self.__m_read_write_prop

    def __SetReadWriteProp(self, new_value):
        print "Someone's writing ReadWriteProp"
        self.__m_read_write_prop = new_value

    ReadWriteProp = property(fget=__GetReadWriteProp,
                             fset=__SetReadWriteProp)

    def __GetReadOnlyProp(self):
        print "Someone's reading ReadOnlyProp"
        return self.__m_read_only_prop

    ReadOnlyProp = property(fget=__GetReadOnlyProp)

c = SomeClass("initial")
print "val =", c.ReadWriteProp
# Output:
#  Someone's reading ReadWriteProp
#  val = initial

c.ReadWriteProp = "new"
# Output:
#  Someone's writing ReadWriteProp

print "val =", c.ReadWriteProp
# Output:
#  Someone's reading ReadWriteProp
#  val = new

print "val =", c.ReadOnlyProp
# Output:
#  Someone's reading ReadOnlyProp
#  val = initial

c.ReadOnlyProp = "new"    # AttributeError: can't set attribute

Note: For properties to work, the class must be derived from “object”. Otherwise, the property loses its special behavior and becomes a normal attribute as soon as you write “c.ReadWriteProp = 1000”.

Note: Properties are an application of Python’s “descriptor” concept. For more about this and other features of new-style classes, see my article “Introduction to New-Style Classes in Python”.
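Properties are also handy for values that are validated or computed on access. The hypothetical “Temperature” class below follows the same “property(fget=..., fset=...)” pattern as the example above: the setter of “Celsius” rejects impossible values, and the read-only “Fahrenheit” is computed from the stored field on every read:

```python
class Temperature(object):
    def __init__(self, celsius):
        self.__m_celsius = celsius

    def __GetCelsius(self):
        return self.__m_celsius

    def __SetCelsius(self, value):
        # Validation happens transparently on "t.Celsius = value".
        if value < -273.15:
            raise ValueError("below absolute zero")
        self.__m_celsius = value

    def __GetFahrenheit(self):
        # Computed on every read; no separate field is stored.
        return self.__m_celsius * 9.0 / 5.0 + 32

    Celsius = property(fget=__GetCelsius, fset=__SetCelsius)
    Fahrenheit = property(fget=__GetFahrenheit)    # read-only

t = Temperature(100)
print(t.Fahrenheit)    # 212.0
t.Celsius = 0
print(t.Fahrenheit)    # 32.0
```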

Static Fields and Static Methods

Python supports static fields and methods:

class InstanceCounter(object):
    s_num_instances = 0

    def __init__(self):
        InstanceCounter.s_num_instances += 1

    def GetNumInstances():    # no "self" here
        return InstanceCounter.s_num_instances

    GetNumInstances = staticmethod(GetNumInstances)

a = InstanceCounter()
b = InstanceCounter()
print InstanceCounter.GetNumInstances()    # prints 2
InstanceCounter.s_num_instances = 100
c = InstanceCounter()
print InstanceCounter.GetNumInstances()    # prints 101

print a.s_num_instances    # prints 101
a.s_num_instances = 5
print a.s_num_instances                    # prints 5
print InstanceCounter.GetNumInstances()    # still prints 101

Things to note:

  • According to our Python coding guidelines, static fields are prefixed with “s_”, but this is just convention.
  • Static methods do not have a “self” parameter, obviously.
  • Static fields and methods can be used both via the class and via an instance.
  • In the line “a.s_num_instances = 5”, an attribute named “s_num_instances” is added to the namespace of the instance “a”. This attribute hides the static field of the same name in “InstanceCounter” when you access it through “a”. The static field of “InstanceCounter” is not changed.
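The hiding described in the last point can be undone with “del”, and it is also the reason “__init__” increments “InstanceCounter.s_num_instances” rather than “self.s_num_instances”. A sketch, in which “BrokenCounter” is a hypothetical counter-example:

```python
class InstanceCounter(object):
    s_num_instances = 0

    def __init__(self):
        InstanceCounter.s_num_instances += 1

a = InstanceCounter()
a.s_num_instances = 5                   # adds an attribute to "a" only
print(a.__dict__)                       # {'s_num_instances': 5}
print(InstanceCounter.s_num_instances)  # 1
del a.s_num_instances                   # remove the instance attribute
print(a.s_num_instances)                # 1: lookup falls back to the class

class BrokenCounter(object):
    s_num_instances = 0

    def __init__(self):
        # Reads the class field (0), then assigns the result to a NEW
        # attribute on the instance; the class field never changes.
        self.s_num_instances += 1

b1 = BrokenCounter()
b2 = BrokenCounter()
print(BrokenCounter.s_num_instances)    # 0
print(b2.s_num_instances)               # 1
```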

Class Methods

Class methods are similar to static methods (but less frequently used). Like static methods, they don’t receive a “self” parameter, but they receive a “cls” parameter with a reference to the class object:

class SomeClass(object):
    def ClassMethod(cls):
        print cls.__name__

    ClassMethod = classmethod(ClassMethod)

class DerivedClass(SomeClass):
    pass

SomeClass.ClassMethod()    # prints "SomeClass"
c = SomeClass()
c.ClassMethod()            # prints "SomeClass"
DerivedClass.ClassMethod() # prints "DerivedClass"
d = DerivedClass()
d.ClassMethod()            # prints "DerivedClass"
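One common use of class methods is an alternative constructor: because “cls” refers to the class the method was called on, derived classes automatically get instances of their own type, which a static method that names the class explicitly could not provide. “Point”, “LabeledPoint”, and “FromString” are hypothetical names for this sketch:

```python
class Point(object):
    def __init__(self, x, y):
        self.m_x = x
        self.m_y = y

    def FromString(cls, text):
        # Parse "x,y" and create an instance of whatever class "cls" is.
        x, y = [int(part) for part in text.split(",")]
        return cls(x, y)

    FromString = classmethod(FromString)

class LabeledPoint(Point):
    pass

p = Point.FromString("3,4")
lp = LabeledPoint.FromString("5, 6")
print(type(p).__name__)     # Point
print(type(lp).__name__)    # LabeledPoint
print(lp.m_x + lp.m_y)      # 11
```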

Example – Parsing XML

The following sections walk you through the task of writing a Python program that prints the contents of an XML document. This will give us plenty of opportunity to learn new things about Python programming in general.

Note: You can find the code and example XML documents in the ZIP package for this lesson.

Setting up a ContentHandler

The standard Python library contains an XML parser and modules to access XML documents using the SAX and DOM APIs. We’ll be using the SAX API from the “xml.sax” module. This module contains the function “parse”, which takes an instance of a user-defined handler class whose methods are called back for the various parts of the XML document.

# File: step_1\xml_printer.py
import sys
import xml.sax
import xml.sax.handler

class Printer(xml.sax.handler.ContentHandler):
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.__m_level = 0
        self.m_num_elements = 0

    def startElement(self, name, attrs):
        # Invoked for each opening tag
        print " " * self.__m_level + name
        self.__m_level += 1
        self.m_num_elements += 1

    def endElement(self, name):
        # Invoked for each closing tag
        self.__m_level -= 1

def Main(filename_or_stream):
    handler = Printer()
    xml.sax.parse(filename_or_stream, handler)
    print handler.m_num_elements, "elements"

if __name__ == "__main__":
    Main(sys.argv[1])  # The name of the XML file must be the
                       # first command-line parameter.

The program produces output like this:

...
     MOB_Object
      MOB_InfoLine
       MOB_Text
      MOB_InfoLine
       MOB_Text
     MOB_Object
      MOB_GDL_ReelSlotGameLine
...
2995 elements

New things in the code:

  • The code in “if __name__ == "__main__":” is executed only when the “.py” file is run as the main program. When the “.py” file is loaded into another program via the “import” keyword, the code is not run, and the importing module can call the “Main()” function later with an XML file of its own choice.
  • “sys.argv” is a list of command-line parameters to the Python program. “sys.argv[0]” contains the path to the program. The remaining elements contain the parameters.
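To see the order in which “xml.sax” invokes the callbacks, it can help to record them instead of printing them. A sketch with a hypothetical “EventRecorder” handler; it uses “xml.sax.parseString”, which parses XML held in a string rather than in a file, and it calls the base class “__init__” explicitly:

```python
import xml.sax
import xml.sax.handler

class EventRecorder(xml.sax.handler.ContentHandler):
    """Hypothetical handler that records callbacks instead of printing."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.m_events = []

    def startElement(self, name, attrs):
        self.m_events.append(("start", name))

    def endElement(self, name):
        self.m_events.append(("end", name))

recorder = EventRecorder()
xml.sax.parseString(b"<A><B/><C>text</C></A>", recorder)
for event, name in recorder.m_events:
    print(event + " " + name)
# start A, start B, end B, start C, end C, end A (one per line)
```

Note that the empty element “<B/>” still produces both a start and an end callback, which is why the level counter in “Printer” stays balanced.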

Writing the Unit Test

In this section: In-memory files with the “StringIO” class and redirecting “sys.stdout”.

The program reads its input from a file and prints the output directly to STDOUT. One way of writing the unit test would be:

  • Prepare an XML file “test_input.xml” with test data.
  • Invoke the program from the unit test by running another instance of Python with redirected output:
    import os
    import sys
    …
    os.system(sys.executable + " xml_printer.py "
              "test_input.xml >output.txt")

    This is equivalent to running “xml_printer.py test_input.xml >output.txt” on the command line.

  • Compare the expected results to the results in “output.txt”.

While this approach might have its advantages, we’ll go down a different route:

  • Prepare the XML input in an in-memory file, which we can pass directly to the “Main” function. The “StringIO” class serves as an in-memory file.
  • Redirect STDOUT by setting the “sys.stdout” variable to another “StringIO” object. (The “print” statements in the program go through “sys.stdout” implicitly.)
  • Compare the expected results to the contents of the redirected STDOUT.

This is our unit test:

# File: step_1\test_xml_printer.py
import StringIO
import sys
import unittest
import xml_printer

class TestXmlPrinter(unittest.TestCase):
    def setUp(self):
        # Redirect STDOUT so that all subsequent "print"
        # statements in the Python program go to a StringIO buffer.
        self.__m_old_stdout = sys.stdout
        sys.stdout = StringIO.StringIO()

    def tearDown(self):
        # Restore STDOUT so that prints go to the screen again.
        sys.stdout = self.__m_old_stdout

    def test_PrintHierarchy(self):
        # Prepare the XML in an in-memory file, i.e., in a
        # StringIO buffer.
        data = StringIO.StringIO(
            """<?xml version="1.0"?>
               <A>
                 <B>
                   <C/>
                 </B>
                 <D>
                   <E/>
                 </D>
               </A>
            """)
        xml_printer.Main(data)
        expected = ("A\n"
                    " B\n"
                    "  C\n"
                    " D\n"
                    "  E\n"
                    "5 elements\n")
        # Compare the expected results to the contents of
        # our redirected STDOUT.
        self.assertEquals(expected,
                          sys.stdout.getvalue())

if __name__ == "__main__":
    unittest.main()

Things to note:

  • The “setUp” method is called before each “test_” method.
  • The “tearDown” method is called after each “test_” method, even if the test fails.
  • We redirect STDOUT by temporarily setting the global variable “sys.stdout” to a “StringIO” buffer, and restoring the original stream afterwards.
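If several tests need to capture output, the save/redirect/restore steps can be wrapped in a helper. The sketch below uses “try”/“finally” so that STDOUT is restored even when the called function raises; “CaptureOutput” and “Greet” are hypothetical names, and the import fallback picks “StringIO.StringIO” on Python 2 and “io.StringIO” on Python 3:

```python
import sys

try:
    from StringIO import StringIO   # Python 2
except ImportError:
    from io import StringIO         # Python 3

def CaptureOutput(func, *args):
    """Call func(*args) and return what it wrote to sys.stdout."""
    old_stdout = sys.stdout
    sys.stdout = StringIO()
    try:
        func(*args)
        return sys.stdout.getvalue()
    finally:
        # Runs even if func raises, so STDOUT is always restored.
        sys.stdout = old_stdout

def Greet(name):
    sys.stdout.write("Hello, " + name + "\n")

captured = CaptureOutput(Greet, "Sue")
print(repr(captured))    # 'Hello, Sue\n'
```

With such a helper, a test method could call “CaptureOutput(xml_printer.Main, data)” directly instead of relying on “setUp” and “tearDown”.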

Printing Attribute Values

In this section: Working with Unicode strings.

In the next step, we’ll print the XML attributes for each element. The attributes are passed as a dictionary to the “startElement” method of the “ContentHandler”. Try this:

# File: step_2\xml_printer.py
import sys
import xml.sax
import xml.sax.handler

class Printer(xml.sax.handler.ContentHandler):
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.__m_level = 0
        self.m_num_elements = 0

    def startElement(self, name, attrs):
        # Invoked for each opening tag
        print " " * self.__m_level + name
        self.__m_level += 1
        self.m_num_elements += 1
        for attr_name, attr_value in attrs.items():
            print " " * self.__m_level + " -",
            print attr_name, "=", attr_value
            # This might cause an error. Explanation follows.

    def endElement(self, name):
        # Invoked for each closing tag
        self.__m_level -= 1

def Main(filename_or_stream):
    handler = Printer()
    xml.sax.parse(filename_or_stream, handler)
    print handler.m_num_elements, "elements"

if __name__ == "__main__":
    Main(sys.argv[1])  # The name of the XML file must be the
                       # first command-line parameter.

When you run this program on the command line with the file “scene.xml” from the example ZIP package, you receive this error in the line “print attr_name, “=”, attr_value”:

…
  File "C:\Python25\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u20ac' in position 0: character maps to <undefined>

The reason for the error is this: The XML parser works with Unicode strings. One of the attribute values contains the € sign (Unicode 0x20ac). This character is stored in a Python string as follows:

>>> u"\u20ac"
… u'\u20ac'
>>> type(_)
… <type 'unicode'>

When you print a string of type “unicode” to STDOUT or to a file, it is converted to an 8-bit string using an encoding. The encoding used by the console (at least on my computer) is cp437. However, codepage 437 does not define the € sign, so the attempt to encode the string results in an error. Try it yourself:

>>> u"\u20ac".encode("cp437")
… UnicodeEncodeError: 'charmap' codec can't encode character u'\u20ac' in position 0: character maps to <undefined>

Important notes:

  • In PyCrust or another graphical shell, you can “print u'\u20ac'” without problems. The error occurs only on the command line. The reason is that PyCrust uses its own file object for “sys.stdout”, which supports Unicode strings directly.
  • There is no encoding error when you work with normal strings. For example, if you read the € sign from a file in the standard Windows encoding, cp1252, it ends up as the Python string “\x80”. When you print this, no encoding or decoding takes place. However, the wrong character might show up if the target of the print uses a different encoding than cp1252.

To solve the problem, we could either print “repr(attr_value)” instead, or encode the string to ASCII before printing it, replacing all unknown characters with “?”:

>>> u"\u20ac".encode("ascii", "replace")
… '?'

So, the new code for printing the attributes looks like this:

for attr_name, attr_value in attrs.items():
    print " " * self.__m_level + " -",
    print attr_name, "=", \
          attr_value.encode("ascii", "replace")

If you need to convert an 8-bit string that contains characters in a certain encoding to a Unicode object, use the “decode” method:

>>> "\x80".decode("cp1252")
… u'\u20ac'
>>> _.encode("utf-8")
… '\xe2\x82\xac'
>>> _.decode("utf-8")
… u'\u20ac'
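The conversions above can be collected into a small script. This sketch (the variable names and the “CanEncode” helper are hypothetical) also shows the “xmlcharrefreplace” error handler, which replaces unencodable characters with XML character references instead of “?”, an option worth knowing when the output is XML. It uses escape sequences only, so it runs unchanged on Python 2.7 and Python 3:

```python
euro = u"\u20ac"                               # the euro sign, U+20AC

# Byte values are shown as Python 2 displays them; Python 3 adds a b prefix.
utf8_bytes = euro.encode("utf-8")              # '\xe2\x82\xac' (three bytes)
cp1252_bytes = euro.encode("cp1252")           # '\x80' (one byte)
ascii_bytes = euro.encode("ascii", "replace")  # '?'
xml_bytes = euro.encode("ascii", "xmlcharrefreplace")   # '&#8364;'

# Decoding with the same encoding gives back the original Unicode string.
print(utf8_bytes.decode("utf-8") == euro)      # True
print(cp1252_bytes.decode("cp1252") == euro)   # True

def CanEncode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(CanEncode(euro, "cp437"))     # False: codepage 437 has no euro sign
print(CanEncode(euro, "cp1252"))    # True
```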

Writing to a File Using an Encoding

To write Unicode strings to a file using a specific encoding, you can use the “open” function from the “codecs” module. This is an example of writing the € sign to a UTF-8-encoded XML file:

import codecs
f = codecs.open("utf8.xml", "w", "utf-8")
print >> f, '<?xml version="1.0" encoding="utf-8"?>'
print >> f, u"<Root>\u20ac</Root>"
f.close()

When you open the resulting file in a hex editor, you can see that the € sign is stored as a sequence of three bytes, as defined by the UTF-8 encoding. When you open the file in a text editor that supports the UTF-8 encoding, the € sign appears correctly.
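The three-byte claim can also be checked from Python instead of a hex editor. This sketch writes the same two lines with “codecs.open”, then reads the file back once as raw bytes and once through “codecs.open” again; the temporary directory and the file name are arbitrary choices for the example:

```python
import codecs
import os
import tempfile

temp_dir = tempfile.mkdtemp()                  # scratch directory
path = os.path.join(temp_dir, "utf8.xml")

f = codecs.open(path, "w", "utf-8")
f.write(u'<?xml version="1.0" encoding="utf-8"?>\n')
f.write(u"<Root>\u20ac</Root>\n")
f.close()

# Raw bytes: the euro sign is stored as the UTF-8 sequence E2 82 AC.
raw_file = open(path, "rb")
raw = raw_file.read()
raw_file.close()

# Decoded text: reading with the same encoding restores the euro sign.
text_file = codecs.open(path, "r", "utf-8")
text = text_file.read()
text_file.close()

os.remove(path)
os.rmdir(temp_dir)

print(b"\xe2\x82\xac" in raw)    # True
print(u"\u20ac" in text)         # True
```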

Building an Object Tree

In this section: The “setattr” introspection function and more about lists.

Finally, let’s create Python objects from the XML elements. The unit test shows the desired interface of these objects:

# File: step_3\test_xml_objects.py
import unittest
import xml_objects

data = """<?xml version="1.0"?>
<Page>
  <Paragraph align="left">
    This is <Bold>bold</Bold> text.
  </Paragraph>
  <Paragraph align="center">
    <Bold>Bold</Bold> and <Italic>italic</Italic>.
  </Paragraph>
  <Table border="1">
    <Row><Cell>A</Cell><Cell>B</Cell></Row>
  </Table>
</Page>"""

class TestXmlObjects(unittest.TestCase):
    def setUp(self):
        self.__m_root = xml_objects.Load(data)
    def test_ChildNodes(self):
        self.assertEquals(
            set(["Paragraph", "Table"]),
            self.__m_root.GetChildNames())
        self.assertEquals(
            2, len(self.__m_root.GetChildNodes("Paragraph")))
        self.assertEquals(
            1, len(self.__m_root.GetChildNodes("Table")))

    def test_Attributes(self):
        first_para = self.__m_root.GetChildNodes("Paragraph")[0]
        self.assertEquals(
            ["align"],
            first_para.GetAttributeNames())
        self.assertEquals(
            "left", first_para.align)

if __name__ == "__main__":
    unittest.main()

To summarize the interface:

  • The “Load” function returns the object for the root element.
  • The “GetChildNames” method returns a set of the child element names.
  • The “GetChildNodes” method returns a list of the child elements of a given name.
  • The “GetAttributeNames” method returns a list of XML attribute names.
  • The XML attribute values can be queried using normal Python attributes.

This is the code of the program:

# File: step_3\xml_objects.py
import xml.sax
import xml.sax.handler

class Element(object):
    def __init__(self):
        self.__m_child_nodes = []
        self.__m_attributes = {}

    def AddChildNode(self, name, element):
        self.__m_child_nodes.append((name, element))

    def AddAttributes(self, attrs):
        self.__m_attributes.update(attrs)
        for (attr_name,
             attr_value) in self.__m_attributes.iteritems():
            setattr(self, attr_name, attr_value)

    def GetChildNames(self):
        return set([name for name, element in self.__m_child_nodes])

    def GetChildNodes(self, element_name):
        return [element for name, element in self.__m_child_nodes
                if name == element_name]

    def GetAttributeNames(self):
        return self.__m_attributes.keys()

class Loader(xml.sax.handler.ContentHandler):
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.__m_element_stack = []
        self.__m_root = None

    def GetRoot(self):
        return self.__m_root

    def startElement(self, name, attrs):
        element = Element()
        element.AddAttributes(attrs)
        self.__m_element_stack.append(element)

    def endElement(self, name):
        element = self.__m_element_stack.pop()
        if self.__m_element_stack:
            self.__m_element_stack[-1].AddChildNode(name, element)
        else:
            self.__m_root = element

def Load(xml_string):
    handler = Loader()
    xml.sax.parseString(xml_string, handler)
    return handler.GetRoot()

The code works like this:

  • When a new XML element starts, an “Element” instance is pushed on a stack.
  • The XML attributes are passed to the “AddAttributes” method of the “Element”.
  • In “AddAttributes”, new Python attributes are added to the instance using “setattr”: The code “setattr(x, “y”, z)” has the same effect as “x.y = z”. (“getattr” can be used for reading.)
  • When an XML element ends, the “Element” instance is popped off the stack and passed to the “AddChildNode” method of the parent element.
  • When there are no more elements on the stack, we have reached the root.
  • The “GetChildNames” and “GetChildNodes” methods use the list comprehension syntax (“[x for y in ys if expr]”) to return parts of the list “self.__m_child_nodes”.
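The introspection functions can be tried out on their own. In this sketch, “Element” is a bare stand-in class and the attribute dictionary is made up. Note that “setattr” also accepts names that are not valid Python identifiers, such as an XML attribute called “data-id”; such attributes can then only be read back with “getattr”:

```python
class Element(object):
    """Bare stand-in for the Element class above."""
    pass

element = Element()
xml_attributes = {"align": "left", "data-id": "7"}
for name, value in xml_attributes.items():
    setattr(element, name, value)    # like element.align = "left"

print(element.align)                        # left
# "element.data-id" would parse as a subtraction, so use getattr.
print(getattr(element, "data-id"))          # 7
# A default value avoids an AttributeError for missing attributes.
print(getattr(element, "border", "none"))   # none
print(hasattr(element, "border"))           # False
```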

More list comprehension examples:

>>> names = ["John", "Frank", "Sue", "Jane"]
>>> numbers = [1, 2, 3, 4, 5, 6, 7, 8]
>>> [n.upper() for n in names if n.startswith("J")]
… ['JOHN', 'JANE']
>>> # Convert two lists to a list of tuples by using "zip".
>>> # "zip" stops at the end of the shorter list.
>>> [x for x in zip(numbers, names)]
… [(1, 'John'), (2, 'Frank'), (3, 'Sue'), (4, 'Jane')]

Homework

Extend “xml_objects.py” so that it handles the text contents of the elements. In the element…

    <Paragraph align="left">
        This is <Bold>bold</Bold> text.
    </Paragraph>

… it should be possible to retrieve the text “This is”, the child element “<Bold>”, and the text “text.” in the correct order.

Write the unit test first to help you find a convenient interface.
