Sunday, October 3, 2021

A Bug Story

User reports an application crash that occurs when using a certain software component ("DataFile") but which only occurs when *not* connected to the test hardware. The DataFile component has nothing to do with hardware - it provides a proxy for a data file and allows the user to access the floating-point data in that file via array indexing notation (with the actual file reading happening transparently and the data being cached).

So the fact that a problem is being reported that correlates with a hardware connection (or lack of one) seemed reminiscent of the stories like the car that had trouble starting whenever the owner went to the store for *vanilla* ice cream, but not other flavours (http://www.snopes.com/autos/techno/icecream.asp). Or the password that didn't work if the user was standing up.

It turned out that the user was grossly misusing the DataFile component. It is intended for use with a file with N floats per line, separated by spaces or commas, etc. The user's file had one line with a string containing 30,000 0's and 1's, with no spaces or other delimiters. The low-level file reading and parsing is done via a standard Python library function. That function dutifully converted the string from the file into a float - but since it treated it as a float with 30,000 digits, the result was IEEE infinity. So my code got back an array with one entry - that infinity. And I had a bug in my error-checking code in the DataFile component (wasn't expecting infinity) that caused a Python exception.

So I fixed the error checking code, added some more validation to the DataFile component to prevent users misusing it like this, and started to type up my bug fix report. Then I remembered the weirdness about this only happening when *not* connected to the hardware and investigated that. I have some code that tries to determine whether a hardware connection is necessary for what the user wants to do with our tool. Some of the components are purely computational but others "talk" to the hardware to make measurements. So I parse out the variable names that the user refers to in their program, and loop through the list of variable names, checking to see if any of them are components that need to talk to the hardware. That code did a lookup of the variable name followed by a check to see if the lookup succeeded. The latter was testing the object  pointer from the lookup in the way that you might do it in C:

	if objPtr:
		doSomething()

That works fine in Python as well for testing if for a null object pointer. The problem is that in Python it does a bit more. If the object pointer is not null, it checks to see if the object contains anything. E.g. if checks if a string contains any characters. This sort of thing is often what you want and so it is convenient. But in the case of the DataFile class, it did something which I hadn't considered. The DataFile class implements a 'len' operator which allows the user to find out how many records are in the file by applying the 'len' operator to the DataFile instance. And so the 'if objPtr' line triggered a call to the 'len' operator. But that triggered a reading of the file (since the data is only read on demand and then cached). So my check to see if the variable was of a class that needed a hardware connection was triggering a reading of the data file - which caused an exception due to my bad error checking. And this code was "so simple" that I didn't bother checking for exceptions there, causing the application crash. And of course this whole check was only done if the user ran the program when *not* connected to the hardware since if it's already connected to the hardware there is no need to check if an connection is needed.

I fixed that problem by explicitly checking for a null object pointer (as I should have done before). Bug report done. Fixes checked into source code control. One last thing - check that the sample programs illustrating the use of the DataFile component are still working. Boom - first one fails right away. It turns out that my data caching code had forgotten to set the "good data" flag in one case. So the first time you asked for a data entry using the DataFile component, it read the file and then threw the data away. Only on the second time the file was read did the cached data stick around. And this egregious bug hadn't been noticed (for months!) since the data file was getting read inadvertently at program startup due to the 'if objPtr' bug I described above.

Sunday, January 3, 2021

Trapped, Icelandic murder mystery series

I highly recommend the Icelandic murder mystery series "Trapped"
(Trapped on Wikipedia)

There are a lot of characters to keep track of, so to make it easier for new viewers of the series, I made a character list:
Trapped TV Series Characters (PDF)

Monday, October 27, 2014

Getting version info for Python modules (for use in a super-installer)

The application that I am developing uses a lot of different 3rd-party Python modules. The installer (a Windows program) for this application installs these Python modules if they don't already exist on the target system. So the first thing the installer does is find out what version (if any) of these Python modules are currently installed. Below I refer to the application installer as a "super-installer" since it invokes the installers for these 3rd-party modules.

The semi-standard way for a Python module to provide version info is to have an attribute '__version__'.  For example:
>>> import numpy
>>> numpy.__version__
'1.8.2'
But some modules have an attribute 'version'.  For example:
>>> import tornado
>>> tornado.version
'3.1.1'
And some other modules have 'version' as a sub-attribute of 'release'. For example:
>>> import pyreadline
>>> pyreadline.release.version
'1.7.1'
Unfortunately, a few 3rd-party modules don't provide any runtime-queryable version info at all. For those, my super-installer writes the version info into a file in the module's folder after the module's installer has finished. The file name used for this was chosen to be a non-standard one (including the name of the company) so that it won't collide with any version mechanism introduced in future Python or module revisions. The code for reading the version of a Python module looks like this:
def getPythonModuleVersion(moduleName):
    """
    Return a string with the version info for the specified
    Python module.
    Return None if the module does not exist (import fails).
    Return 'UnknownVersion' if no version info is available.
    """
    try:
        moduleObj = importlib.import_module(moduleName)
        if moduleObj is None:
            return None
    except Exception:
        return None

    versionStr = getattr(moduleObj, '__version__', None)
    if versionStr is not None:
        return versionStr

    # '.version' works for 'tornado'
    versionStr = getattr(moduleObj, 'version', None)
    if versionStr is not None:
        return versionStr

    # '.release.version' works for 'pyreadline'
    releaseObj = getattr(moduleObj, 'release', None)
    if releaseObj is not None:
        versionStr = getattr(releaseObj, 'version', None)
        if versionStr is not None:
            return versionStr

    # Final try: see if there is a version file in the folder
    # of the module (this is not a standard mechanism but our
    # installer creates such a file if there is no other
    # version info.)
    versionFilePath = getVersionFilePath(moduleObj)
    if (versionFilePath is not None
        and os.path.isfile(versionFilePath)):
        contents = readContentsOfFile(versionFilePath)
        versionStr = contents.strip()
        if versionStr != '':
            return versionStr

    return 'UnknownVersion'
The function 'getVersionFilePath' looks like this:
def getVersionFilePath(moduleObj):
    """
    Return the path to the version file
    (in the module's folder).
    Note: this file does not necessarily exist.
    Return None if can't get the module's folder.
    """
    moduleFilePath = getattr(moduleObj, '__file__', None)
    if moduleFilePath is None:
        return None
    moduleFolderPath = os.path.dirname(moduleFilePath)
    # VersionFileName is a name chosen to not collide
    versionFilePath = os.path.join(moduleFolderPath,
                                   VersionFileName)
    return versionFilePath
The Python script (run from the super-installer) that checks if a particular Python module is installed and gets the version of the currently installed module looks like this:
    if len(sys.argv) < 2:
        sys.exit("Must supply the name of a Python module")
    moduleName = sys.argv[1]
    versionStrToAddIfNone = None
    if len(sys.argv) > 2:
        versionStrToAddIfNone = sys.argv[2]

    versionStr = getPythonModuleVersion(moduleName)

    # If 'versionStrToAddIfNone' is not None,
    # handle the case where module supplies no version info:
    if (versionStr == 'UnknownVersion'
        and versionStrToAddIfNone is not None):
        status = addVersionFile(moduleName,
                                versionStrToAddIfNone)
        if status:
            versionStr = versionStrToAddIfNone

    if versionStr is not None:
        print versionStr
    else:
        print "NoSuchModule"
The super-installer runs the above script with just one command-line argument (the module name) to find out the currently installed version of each 3rd-party Python module. If the current version is older than what is needed by the application, the installer for the new version of the Python module is run, and then the above script is run with a second command-line argument specifying the version of the module that was just installed. If the module didn't supply any version info, a version file is created via the function 'addVersionFile' which looks like this:
def addVersionFile(moduleName, versionStr):
    """
    Create a version file in the folder of the module.
    Return True iff successful.
    """
    try:
        moduleObj = importlib.import_module(moduleName)
        if moduleObj is None:
            return False
    except Exception:
        return False
    versionFilePath = getVersionFilePath(moduleObj)
    if versionFilePath is not None:
        writeStringToFile(versionFilePath, versionStr)
        return True
    return False
By the way, the use of 'pkg_resources' to obtain version info is problematic if binary Windows installers have been used to install a Python module. For example, look at this:
>>> import pkg_resources
>>> pkg_resources.get_distribution('numpy').version
'1.7.1'
>>> import numpy
>>> numpy.__version__
'1.8.2'

Sunday, October 16, 2011

Getting Process Info from iOS 5's LowMemory Diagnostic Reports

In iOS 5 (the latest version of Apple's OS for iPhones, iPads, etc), you can look at the diagnostic reports that it sends (if you have okayed this) to Apple via the "Settings" app.
The diagnostic reports are in the section:
General >> About >> Diagnostics & Usage >> Diagnostics & Usage Data

I wrote a Python script that extracts info from the LowMemory reports. It prints out the names of the processes and the amount (in MB) of memory that each was using.
Here's the script: processLowMemReport.py
(It is published under the MIT licence.)

Sample output (from my iPhone 3GS):

1PasswordPro7.230
AppleIDAuthAgent0.996
BTServer1.023
CommCenterClassi2.082
Medieval_iphone_36.859
MobileMail11.609
MobilePhone4.941
Music~iphone10.059
Pastebot4.453
Preferences3.215
SCHelper0.633
SpringBoard29.078
Tweetbot7.816
UserEventAgent1.664
absinthed.N880.531
accountsd1.043
aggregated1.137
aosnotifyd1.918
apsd0.965
awdd0.762
configd1.531
dataaccessd10.750
fairplayd.N880.633
fseventsd1.141
geod1.285
iapd1.254
imagent1.992
installd4.457
launchd0.480
locationd3.000
lockdownd1.312
mDNSResponder0.754
mediaremoted0.801
mediaserverd4.160
misd0.168
networkd0.582
notifyd0.898
powerd0.543
scrabble15.738
securityd0.848
syslogd0.445
ubd1.273
wifid1.117
yap.TV3.730

Wednesday, May 4, 2011

Enums in Python: a more flexible and powerful implementation

There is no standard implementation of enums (enumerated values) in Python. But enums are very useful and exist in most other programming languages. Hence several "recipes" for enum implementations have been proposed. As is clear from the differences between these implementations (and the comments that others have made about them), there are many different ideas about what the characteristics of an enum should be.

I needed enums for a Python project where the enums would be part of an API that I was defining for use by end-users. My requirements for an enum implementation were:

  1. Allow definition of a named class representing the set of allowable values. (I needed a class, not an instance, since I wanted to be able to specify the enum class as the parameter type for API functions that expected a class.) The enum classes should be derived from a common superclass (so I could test if a type was an enum type).
  2. The allowable values would be represented by instances of the enum class that would be automatically created at definition time. Each instance would have a name and a value (i.e. would be a name/value pair).
  3. The values should not be restricted to be integers (e.g. the values could be floats). But the values should be immutable objects (e.g. not lists)).
  4. Each enum class can be queried to get the list of allowable names and values.
  5. Each enum class can be queried to find the name/value pair instance given the name.
  6. It shouldn't be possible to modify the value of an existing name/value pair.
  7. Allow definition of alternative names for existing enum values. (e.g. so that the name can be changed in a future revision of the API while maintaining backward compatibility for users' source code) but these alternative names should not show up when the list of allowable values is queried.
None of the existing enum implementations fulfilled the above requirements, so I wrote my own. Here's the source code for my implementation: enum.py
(It is published under the MIT licence.)

To define an enum class, you use the 'Enum' function from the above 'enum' module. This function expects its first argument to be a string specifying the name of the enum class you are defining. The subsequent arguments should be keyword arguments specifying the names and values for this enum class. The 'Enum' function returns a class object (which is why its name is capitalized). For example, to define an enum class named 'Colours', you could do this:
from enum import Enum

>>> Colours = Enum('Colours', red=1, green=2, blue=3)

After that, there would be instances Colours.red, Colours.green, Colours.blue representing the possible colour values. To get the value of the 'red' instance, you would use 'Colours.red.value' (which is 1). To get the 'red' instance given a colour string in the variable 'col', you would use 'Colours.get(col)' or 'Colours(col)'. (If that string variable contained anything other than "red", "green" or "blue", you would get an exception.)

Some more examples of use will show the other facilities of this enum implementation:

from enum import Enum

>>> Colours = Enum('Colours', red=1, green=2, blue=3)
>>> Colours.green.name
'green'

>>> Colours.green.value
2

>>> Colours.names()
['red', 'green', 'blue']

>>> Colours.get('blue').value
3

>>> Colours.getByValue(3).name
'blue'

>>> Colours('blue').value
3

>>> Colours.addAlias('bleu', 'blue')
>>> Colours.names()
['red', 'green', 'blue']

>>> Colours.bleu.value
3

>>> [x.name for x in Colours.objects()]
['red', 'green', 'blue']

>>> listOfNames = ['red', 'blue']
>>> [x.name for x in Colours.listOfNamesToListOfEnums(listOfNames)]
['red', 'blue']

>>> commaSepStr = "red, blue"
>>> [x.name for x in Colours.commaSepStrToEnumList(commaSepStr)]
['red', 'blue']


>>> Numbers = Enum('Numbers', pi=3.1415926, e=2.71828)
>>> isinstance(Numbers, type)
True

>>> Numbers.names()
['e', 'pi']

>>> round(Numbers.e.value, 3)
2.718

>>> x = Numbers.pi
>>> x is Numbers('pi')
True

>>> Numbers('phi')
Traceback (most recent call last):
    ...
ValueError: 'phi' is not a valid enum name (Must be one of: ['e', 'pi'])

>>> Numbers.e.value = 42
Traceback (most recent call last):
    ...
TypeError: can't modify an enum

Thursday, April 28, 2011

Where did the time go? (Instrumenting C++/CLI and Python code)

The Problem:
How to speed up your program?
It's working ("Make it right, then make it fast") but it's a bit slower than you hoped. The first step in making it faster is to find out which parts are taking the most time.
I'm not (necessarily) talking about CPU time here - what matters in most cases is how long the user needs to wait until the program finishes its task. The bottlenecks might not be CPU time - e.g. the program might be spending a lot of time doing I/O or waiting for some other program that it interacts with, etc. So what I usually want to know is which parts of the program take up the most wall clock time. There are various sophisticated tools available to do this, but often they won't work for your specific situation (e.g. when your program has different interacting components written in different programming languages). And often you don't want to invest a lot of effort in acquiring and learning to use a sophisticated tool. Something simple is often best. And something that allows you to successively refine your efforts to "zoom in" on the bottlenecks.

The Solution:
A solution that has worked well for me is a utility class that makes it easy to instrument my code, bracketing sections of interest and generating reports at certain points. The reports refer to the sections of the program by the names that I assigned to them when I instrumented the code and show how much time elapsed in each section.

I've implemented a utility class called "TimeSpent" in Python and in C++/CLI for .NET. I supply the code below but here's an example of how to use it in Python (the usage in C++ is very similar). Suppose your Python program looks like this:
#!/usr/bin/env python
from myStuff import (func1, func2, func3, func4)

x = 1
func2('hello')
func1(x)
func3()
x = func4()
func1(x)
func2('goodbye')

To find out how much each part of this program contributes to the overall execution time, we instrument the code by creating an instance of the 'TimeSpent' class and adding calls to the 'start' and 'stop' methods around each section of interest:
#!/usr/bin/env python
from myStuff import (func1, func2, func3, func4)
from timeSpent import TimeSpent

x = 1
timeSpent = TimeSpent()
timeSpent.start("A")
func2('hello')
func1(x)
timeSpent.stop("A")
func3()
timeSpent.start("B")
x = func4()
timeSpent.stop("B")
timeSpent.start("A")
func1(x)
func2('goodbye')
timeSpent.stop("A")
print timeSpent.generateReportAndReset()

Each call to the 'start' method starts a stopwatch (creating it automatically if one with the specified name doesn't exist) and the corresponding 'stop' method stops that stopwatch. The accumulated times for all of the stopwatches is reported via the 'generateReportAndReset' method. Above I have named the stopwatches "A" and "B" and have used stopwatch "A" to measure the two separate sections where 'func1' and 'func2' are called.
Here's the output from the instrumented version of the program:
hello
goodbye
Elapsed seconds: 0.7
Breakdown of time spent (in milliseconds):
A: 629 (count: 2 avg: 314.6)
B: 95 (count: 1 avg: 95.0)
Note that the count for "A" is 2 - this reflects the fact that stopwatch "A" was used twice. The time of 629 milliseconds is the sum of the times taken for the two sections of code where stopwatch "A" was used.
Running the instrumented program again shows a longer execution time:
hello
goodbye
Elapsed seconds: 1.2
Breakdown of time spent (in milliseconds):
A: 1070 (count: 2 avg: 534.8)
B: 103 (count: 1 avg: 102.7)
And further runs would show that the "A" time is sometimes around 600 ms but other times around 1100 ms. To investigate this, we change the stopwatch name to "A1" for the first section and "A2" for the second section. Here's some outputs from this version of the program:
hello
goodbye
Elapsed seconds: 0.7
Breakdown of time spent (in milliseconds):
A1: 314 (count: 1 avg: 314.2)
A2: 293 (count: 1 avg: 292.9)
B: 97 (count: 1 avg: 96.7)
hello
goodbye
Elapsed seconds: 1.3
Breakdown of time spent (in milliseconds):
A1: 310 (count: 1 avg: 310.0)
A2: 843 (count: 1 avg: 842.7)
B: 105 (count: 1 avg: 104.6)
So it seems that the variation in overall execution time is due to variations in section "A2". With the hypothesis that the variation might be due to different values of 'x', let's change the names of the stopwatches to incorporate the 'x' value in their names:
#!/usr/bin/env python
from myStuff import (func1, func2, func3, func4)
from timeSpent import TimeSpent

x = 1
timeSpent = TimeSpent()
timeSpent.start("A (x=%d)" % x)
func2('hello')
func1(x)
timeSpent.stop("A (x=%d)" % x)
func3()
timeSpent.start("B")
x = func4()
timeSpent.stop("B")
timeSpent.start("A (x=%d)" % x)
func1(x)
func2('goodbye')
timeSpent.stop("A (x=%d)" % x)
print timeSpent.generateReportAndReset()
Here's some outputs from this version of the program:
hello
goodbye
Elapsed seconds: 0.7
Breakdown of time spent (in milliseconds):
A (x=1): 600 (count: 2 avg: 300.1)
B: 98 (count: 1 avg: 97.8)
hello
goodbye
Elapsed seconds: 1.2
Breakdown of time spent (in milliseconds):
A (x=-1): 783 (count: 1 avg: 783.3)
A (x=1): 284 (count: 1 avg: 283.8)
B: 108 (count: 1 avg: 108.3)
The above shows that when x is -1, the second "A" section takes a lot longer than when x is 1.

The ability to use arbitrary strings to name the stopwatches is one of the things that make this utility class user-friendly (here the user is the programmer). Preventing or detecting common errors is another aspect of user (programmer) friendliness. The usage of the TimeSpent class relies on the names matching up in the 'start' and 'stop' methods. What happens if the programmer slips up? Here's an example with two slip-ups: the call to stop("8") should have been stop("B"), and the call to start("A") on the second last line should have been stop("A")
#!/usr/bin/env python
from myStuff import (func1, func2, func3, func4)
from timeSpent import TimeSpent

x = 1
timeSpent = TimeSpent()
timeSpent.start("A")
func2('hello')
func1(x)
timeSpent.stop("A")
func3()
timeSpent.start("B")
x = func4()
timeSpent.stop("8")
timeSpent.start("A")
func1(x)
func2('goodbye')
timeSpent.start("A")
print timeSpent.generateReportAndReset()
The output from running the above version of the program makes it clear to the programmer that there were some mistakes:
hello
WARNING: unknown stopwatch name passed to 'stop' (8)
goodbye
WARNING: start(A) called when stopwatch is already running
Elapsed seconds: 0.7
Breakdown of time spent (in milliseconds):
A: 599 (count: 2 avg: 299.4)
B: 390 (count: 1 avg: 390.5)

The is no limit to how many stopwatches can be running at once. If you want to "zoom in" on a section of code, just bracket the sub-sections with more stopwatches (named appropriately) and the report will show the time for the whole section as well as the breakdown by sub-section.

If you want to track the time spent in code that is in many different files, it is convenient to use the 'getGlobalInstance' method of the 'TimeSpent' class - it gives you access to a singleton instance that you can use anywhere in your program.

By the way, here's the 'myStuff' module I used in the above test program:
#!/usr/bin/env python
from __future__ import division
import time
import random

def wasteTime(numMillis, stdDev=None):
    if stdDev is None:
        stdDev = numMillis / 15
    time.sleep(random.gauss(numMillis, stdDev) / 1000)

def func1(x):
    if x > 0:
        wasteTime(100)
    else:
        wasteTime(600)

def func2(msg):
    wasteTime(200)
    print msg

def func3():
    wasteTime(10)

def func4():
    wasteTime(100)
    if random.random() < 0.5:
        return 1
    else:
        return -1


Source code:
The source code for the TimeSpent class (in Python and in C++/CLI (.NET)) is published under the MIT licence and is available from the following links:

Thursday, April 21, 2011

Minimize Nesting of If Statements

The essay "Code in Motion" by Laura Wingerd and Christopher Seiwald of Perforce Software (from the book "Beautiful Code") emphasizes the importance of readability of source code for avoiding bugs. One of their style rules is to minimize the nesting of 'if' statements.
Here's a graph showing an increase in bugs possibly correlated with an
increase in nesting levels in the Perforce "DiffMerge" program source code: