Sunday, October 16, 2011

Getting Process Info from iOS 5's LowMemory Diagnostic Reports

In iOS 5 (the latest version of Apple's OS for iPhones, iPads, etc.), you can view the diagnostic reports that your device sends to Apple (if you have agreed to this) via the "Settings" app.
The diagnostic reports are in the section:
General >> About >> Diagnostics & Usage >> Diagnostics & Usage Data

I wrote a Python script that extracts info from the LowMemory reports. It prints out the names of the processes and the amount (in MB) of memory that each was using.
Here's the script: processLowMemReport.py
(It is published under the MIT licence.)

Sample output (from my iPhone 3GS):

1PasswordPro      7.230
AppleIDAuthAgent  0.996
BTServer          1.023
CommCenterClassi  2.082
Medieval_iphone_  36.859
MobileMail        11.609
MobilePhone       4.941
Music~iphone      10.059
Pastebot          4.453
Preferences       3.215
SCHelper          0.633
SpringBoard       29.078
Tweetbot          7.816
UserEventAgent    1.664
absinthed.N88     0.531
accountsd         1.043
aggregated        1.137
aosnotifyd        1.918
apsd              0.965
awdd              0.762
configd           1.531
dataaccessd       10.750
fairplayd.N88     0.633
fseventsd         1.141
geod              1.285
iapd              1.254
imagent           1.992
installd          4.457
launchd           0.480
locationd         3.000
lockdownd         1.312
mDNSResponder     0.754
mediaremoted      0.801
mediaserverd      4.160
misd              0.168
networkd          0.582
notifyd           0.898
powerd            0.543
scrabble          15.738
securityd         0.848
syslogd           0.445
ubd               1.273
wifid             1.117
yap.TV            3.730
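
The real script is the one linked above; the following is just a rough sketch of the kind of parsing involved. The line format (process name, a UUID in angle brackets, then a count of resident 4 KB pages) and the conversion from pages to MB are my assumptions about the report layout, not a description of the actual script.

```python
import re

# Assumed format of a process line in the report's "Processes" section:
#   MobileMail <some-uuid>  2972 (active)
# where the number is the count of resident 4 KB pages.
LINE_RE = re.compile(r'^\s*(\S+)\s+<[0-9a-fA-F-]+>\s+(\d+)')

def extractProcessInfo(reportText):
    """Return a sorted list of (processName, megabytes) pairs."""
    results = []
    for line in reportText.splitlines():
        m = LINE_RE.match(line)
        if m:
            name, pages = m.group(1), int(m.group(2))
            # 4 KB pages -> MB
            results.append((name, pages * 4096 / 1048576.0))
    return sorted(results)
```

Lines that don't match the pattern (section headers, summary lines) are simply skipped.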

Wednesday, May 4, 2011

Enums in Python: a more flexible and powerful implementation

There is no standard implementation of enums (enumerated values) in Python. But enums are very useful and exist in most other programming languages. Hence several "recipes" for enum implementations have been proposed. As is clear from the differences between these implementations (and the comments that others have made about them), there are many different ideas about what the characteristics of an enum should be.

I needed enums for a Python project where the enums would be part of an API that I was defining for use by end-users. My requirements for an enum implementation were:

  1. Allow definition of a named class representing the set of allowable values. (I needed a class, not an instance, since I wanted to be able to specify the enum class as the parameter type for API functions that expected a class.) The enum classes should be derived from a common superclass (so I could test if a type was an enum type).
  2. The allowable values would be represented by instances of the enum class that would be automatically created at definition time. Each instance would have a name and a value (i.e. would be a name/value pair).
  3. The values should not be restricted to be integers (e.g. the values could be floats). But the values should be immutable objects (e.g. not lists).
  4. Each enum class can be queried to get the list of allowable names and values.
  5. Each enum class can be queried to find the name/value pair instance given the name.
  6. It shouldn't be possible to modify the value of an existing name/value pair.
  7. Allow definition of alternative names (aliases) for existing enum values (e.g. so that a name can be changed in a future revision of the API while maintaining backward compatibility for users' source code). These alternative names should not show up when the list of allowable names is queried.
None of the existing enum implementations fulfilled the above requirements, so I wrote my own. Here's the source code for my implementation: enum.py
(It is published under the MIT licence.)

To define an enum class, you use the 'Enum' function from the above 'enum' module. This function expects its first argument to be a string specifying the name of the enum class you are defining. The subsequent arguments should be keyword arguments specifying the names and values for this enum class. The 'Enum' function returns a class object (which is why its name is capitalized). For example, to define an enum class named 'Colours', you could do this:
>>> from enum import Enum

>>> Colours = Enum('Colours', red=1, green=2, blue=3)

After that, there would be instances Colours.red, Colours.green, Colours.blue representing the possible colour values. To get the value of the 'red' instance, you would use 'Colours.red.value' (which is 1). To get the 'red' instance given a colour string in the variable 'col', you would use 'Colours.get(col)' or 'Colours(col)'. (If that string variable contained anything other than "red", "green" or "blue", you would get an exception.)

Some more examples of use will show the other facilities of this enum implementation:

>>> from enum import Enum

>>> Colours = Enum('Colours', red=1, green=2, blue=3)
>>> Colours.green.name
'green'

>>> Colours.green.value
2

>>> Colours.names()
['red', 'green', 'blue']

>>> Colours.get('blue').value
3

>>> Colours.getByValue(3).name
'blue'

>>> Colours('blue').value
3

>>> Colours.addAlias('bleu', 'blue')
>>> Colours.names()
['red', 'green', 'blue']

>>> Colours.bleu.value
3

>>> [x.name for x in Colours.objects()]
['red', 'green', 'blue']

>>> listOfNames = ['red', 'blue']
>>> [x.name for x in Colours.listOfNamesToListOfEnums(listOfNames)]
['red', 'blue']

>>> commaSepStr = "red, blue"
>>> [x.name for x in Colours.commaSepStrToEnumList(commaSepStr)]
['red', 'blue']


>>> Numbers = Enum('Numbers', pi=3.1415926, e=2.71828)
>>> isinstance(Numbers, type)
True

>>> Numbers.names()
['e', 'pi']

>>> round(Numbers.e.value, 3)
2.718

>>> x = Numbers.pi
>>> x is Numbers('pi')
True

>>> Numbers('phi')
Traceback (most recent call last):
    ...
ValueError: 'phi' is not a valid enum name (Must be one of: ['e', 'pi'])

>>> Numbers.e.value = 42
Traceback (most recent call last):
    ...
TypeError: can't modify an enum
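
My actual implementation is in the enum.py linked above; what follows is an independent sketch (not the real code) of one way the API shown in the examples could be built. Two assumptions to note: members here are instances of a shared 'EnumValue' class rather than of the enum class itself (a departure from requirement 1), and 'names()' lists members in order of increasing value, which is consistent with the outputs above.

```python
class EnumValue(object):
    """An immutable name/value pair belonging to an enum class."""
    __slots__ = ('name', 'value')

    def __init__(self, name, value):
        # Bypass the frozen __setattr__ below during construction.
        object.__setattr__(self, 'name', name)
        object.__setattr__(self, 'value', value)

    def __setattr__(self, attr, val):
        raise TypeError("can't modify an enum")


class _EnumMeta(type):
    """Metaclass so that Colours('blue') is shorthand for Colours.get('blue')."""
    def __call__(cls, name):
        return cls.get(name)


def Enum(className, **namesToValues):
    """Return a new class with one EnumValue attribute per keyword argument."""
    members = {}                          # name (incl. aliases) -> EnumValue
    for name, value in namesToValues.items():
        members[name] = EnumValue(name, value)
    ordered = sorted(members.values(), key=lambda m: m.value)

    @classmethod
    def objects(cls):
        return list(ordered)

    @classmethod
    def names(cls):
        return [m.name for m in ordered]

    @classmethod
    def get(cls, name):
        try:
            return members[name]
        except KeyError:
            raise ValueError("%r is not a valid enum name (Must be one of: %s)"
                             % (name, cls.names()))

    @classmethod
    def getByValue(cls, value):
        for m in ordered:
            if m.value == value:
                return m
        raise ValueError("no enum member has value %r" % (value,))

    @classmethod
    def addAlias(cls, alias, existingName):
        member = members[existingName]
        members[alias] = member           # get()/call lookup finds the alias
        setattr(cls, alias, member)       # attribute access works too
        # names()/objects() still report only the primary names

    attrs = dict(members)
    attrs.update(objects=objects, names=names, get=get,
                 getByValue=getByValue, addAlias=addAlias)
    return _EnumMeta(className, (object,), attrs)
```

Because aliases are added only to the lookup dictionary and as class attributes, but not to the 'ordered' list, they satisfy requirement 7: usable everywhere a primary name is, yet invisible to 'names()'.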

Thursday, April 28, 2011

Where did the time go? (Instrumenting C++/CLI and Python code)

The Problem:
How to speed up your program?
It's working ("Make it right, then make it fast") but it's a bit slower than you hoped. The first step in making it faster is to find out which parts are taking the most time.
I'm not (necessarily) talking about CPU time here - what matters in most cases is how long the user needs to wait until the program finishes its task. The bottlenecks might not be CPU-bound - e.g. the program might be spending a lot of time doing I/O or waiting for some other program that it interacts with. So what I usually want to know is which parts of the program take up the most wall clock time.

There are various sophisticated tools available to do this, but often they won't work for your specific situation (e.g. when your program has different interacting components written in different programming languages). And often you don't want to invest a lot of effort in acquiring and learning to use a sophisticated tool. Something simple is often best - something that allows you to successively refine your efforts to "zoom in" on the bottlenecks.

The Solution:
A solution that has worked well for me is a utility class that makes it easy to instrument my code, bracketing sections of interest and generating reports at certain points. The reports refer to the sections of the program by the names that I assigned to them when I instrumented the code and show how much time elapsed in each section.

I've implemented a utility class called "TimeSpent" in Python and in C++/CLI for .NET. I supply the code below but here's an example of how to use it in Python (the usage in C++ is very similar). Suppose your Python program looks like this:
#!/usr/bin/env python
from myStuff import (func1, func2, func3, func4)

x = 1
func2('hello')
func1(x)
func3()
x = func4()
func1(x)
func2('goodbye')

To find out how much each part of this program contributes to the overall execution time, we instrument the code by creating an instance of the 'TimeSpent' class and adding calls to the 'start' and 'stop' methods around each section of interest:
#!/usr/bin/env python
from myStuff import (func1, func2, func3, func4)
from timeSpent import TimeSpent

x = 1
timeSpent = TimeSpent()
timeSpent.start("A")
func2('hello')
func1(x)
timeSpent.stop("A")
func3()
timeSpent.start("B")
x = func4()
timeSpent.stop("B")
timeSpent.start("A")
func1(x)
func2('goodbye')
timeSpent.stop("A")
print timeSpent.generateReportAndReset()

Each call to the 'start' method starts a stopwatch (creating it automatically if one with the specified name doesn't exist) and the corresponding 'stop' method stops that stopwatch. The accumulated times for all of the stopwatches are reported via the 'generateReportAndReset' method. Above I have named the stopwatches "A" and "B" and have used stopwatch "A" to measure the two separate sections where 'func1' and 'func2' are called.
Here's the output from the instrumented version of the program:
hello
goodbye
Elapsed seconds: 0.7
Breakdown of time spent (in milliseconds):
A: 629 (count: 2 avg: 314.6)
B: 95 (count: 1 avg: 95.0)
Note that the count for "A" is 2 - this reflects the fact that stopwatch "A" was used twice. The time of 629 milliseconds is the sum of the times taken for the two sections of code where stopwatch "A" was used.
Running the instrumented program again shows a longer execution time:
hello
goodbye
Elapsed seconds: 1.2
Breakdown of time spent (in milliseconds):
A: 1070 (count: 2 avg: 534.8)
B: 103 (count: 1 avg: 102.7)
And further runs would show that the "A" time is sometimes around 600 ms but other times around 1100 ms. To investigate this, we change the stopwatch name to "A1" for the first section and "A2" for the second section. Here's some outputs from this version of the program:
hello
goodbye
Elapsed seconds: 0.7
Breakdown of time spent (in milliseconds):
A1: 314 (count: 1 avg: 314.2)
A2: 293 (count: 1 avg: 292.9)
B: 97 (count: 1 avg: 96.7)
hello
goodbye
Elapsed seconds: 1.3
Breakdown of time spent (in milliseconds):
A1: 310 (count: 1 avg: 310.0)
A2: 843 (count: 1 avg: 842.7)
B: 105 (count: 1 avg: 104.6)
So it seems that the variation in overall execution time is due to variations in section "A2". With the hypothesis that the variation might be due to different values of 'x', let's change the names of the stopwatches to incorporate the 'x' value in their names:
#!/usr/bin/env python
from myStuff import (func1, func2, func3, func4)
from timeSpent import TimeSpent

x = 1
timeSpent = TimeSpent()
timeSpent.start("A (x=%d)" % x)
func2('hello')
func1(x)
timeSpent.stop("A (x=%d)" % x)
func3()
timeSpent.start("B")
x = func4()
timeSpent.stop("B")
timeSpent.start("A (x=%d)" % x)
func1(x)
func2('goodbye')
timeSpent.stop("A (x=%d)" % x)
print timeSpent.generateReportAndReset()
Here's some outputs from this version of the program:
hello
goodbye
Elapsed seconds: 0.7
Breakdown of time spent (in milliseconds):
A (x=1): 600 (count: 2 avg: 300.1)
B: 98 (count: 1 avg: 97.8)
hello
goodbye
Elapsed seconds: 1.2
Breakdown of time spent (in milliseconds):
A (x=-1): 783 (count: 1 avg: 783.3)
A (x=1): 284 (count: 1 avg: 283.8)
B: 108 (count: 1 avg: 108.3)
The above shows that when x is -1, the second "A" section takes a lot longer than when x is 1.

The ability to use arbitrary strings to name the stopwatches is one of the things that makes this utility class user-friendly (here the user is the programmer). Preventing or detecting common errors is another aspect of user (programmer) friendliness. The usage of the TimeSpent class relies on the names matching up in the 'start' and 'stop' methods. What happens if the programmer slips up? Here's an example with two slip-ups: the call to stop("8") should have been stop("B"), and the call to start("A") on the second-last line should have been stop("A").
#!/usr/bin/env python
from myStuff import (func1, func2, func3, func4)
from timeSpent import TimeSpent

x = 1
timeSpent = TimeSpent()
timeSpent.start("A")
func2('hello')
func1(x)
timeSpent.stop("A")
func3()
timeSpent.start("B")
x = func4()
timeSpent.stop("8")
timeSpent.start("A")
func1(x)
func2('goodbye')
timeSpent.start("A")
print timeSpent.generateReportAndReset()
The output from running the above version of the program makes it clear to the programmer that there were some mistakes:
hello
WARNING: unknown stopwatch name passed to 'stop' (8)
goodbye
WARNING: start(A) called when stopwatch is already running
Elapsed seconds: 0.7
Breakdown of time spent (in milliseconds):
A: 599 (count: 2 avg: 299.4)
B: 390 (count: 1 avg: 390.5)

There is no limit to how many stopwatches can be running at once. If you want to "zoom in" on a section of code, just bracket the sub-sections with more stopwatches (named appropriately) and the report will show the time for the whole section as well as the breakdown by sub-section.

If you want to track the time spent in code that is in many different files, it is convenient to use the 'getGlobalInstance' method of the 'TimeSpent' class - it gives you access to a singleton instance that you can use anywhere in your program.
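
The real class is available from the source links at the end of this post; here's a rough Python sketch (not the published code) of the core mechanics described above: named stopwatches, warnings for mismatched names, and a report that resets the accumulators. Details such as the exact report layout and the handling of still-running stopwatches are my assumptions.

```python
import time

class TimeSpent(object):
    """Accumulates wall-clock time in named stopwatches."""
    _globalInstance = None

    def __init__(self):
        self._reset()

    def _reset(self):
        self._totals = {}    # name -> accumulated seconds
        self._counts = {}    # name -> number of start/stop cycles
        self._running = {}   # name -> timestamp of pending start()
        self._created = time.time()

    def start(self, name):
        if name in self._running:
            print("WARNING: start(%s) called when stopwatch is already running" % name)
            return
        self._running[name] = time.time()

    def stop(self, name):
        if name not in self._running:
            print("WARNING: unknown stopwatch name passed to 'stop' (%s)" % name)
            return
        elapsed = time.time() - self._running.pop(name)
        self._totals[name] = self._totals.get(name, 0.0) + elapsed
        self._counts[name] = self._counts.get(name, 0) + 1

    def generateReportAndReset(self):
        lines = ["Elapsed seconds: %.1f" % (time.time() - self._created),
                 "Breakdown of time spent (in milliseconds):"]
        for name in sorted(self._totals):
            ms = self._totals[name] * 1000
            n = self._counts[name]
            lines.append("%s: %d (count: %d avg: %.1f)" % (name, ms, n, ms / n))
        self._reset()
        return "\n".join(lines)

    @classmethod
    def getGlobalInstance(cls):
        # Singleton accessor, convenient when instrumenting many files.
        if cls._globalInstance is None:
            cls._globalInstance = cls()
        return cls._globalInstance
```

Note that a sketch like this measures wall clock time via time.time(); on a modern Python a monotonic clock would be the more robust choice for interval timing.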

By the way, here's the 'myStuff' module I used in the above test program:
#!/usr/bin/env python
from __future__ import division
import time
import random

def wasteTime(numMillis, stdDev=None):
    if stdDev is None:
        stdDev = numMillis / 15
    time.sleep(random.gauss(numMillis, stdDev) / 1000)

def func1(x):
    if x > 0:
        wasteTime(100)
    else:
        wasteTime(600)

def func2(msg):
    wasteTime(200)
    print msg

def func3():
    wasteTime(10)

def func4():
    wasteTime(100)
    if random.random() < 0.5:
        return 1
    else:
        return -1


Source code:
The source code for the TimeSpent class (in Python and in C++/CLI (.NET)) is published under the MIT licence and is available from the following links:

Thursday, April 21, 2011

Minimize Nesting of If Statements

The essay "Code in Motion" by Laura Wingerd and Christopher Seiwald of Perforce Software (from the book "Beautiful Code") emphasizes the importance of readability of source code for avoiding bugs. One of their style rules is to minimize the nesting of 'if' statements.
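As a quick illustration of that rule (my own example, not from the essay), guard clauses with early returns flatten a nested chain of 'if' statements:

```python
class Order(object):
    def __init__(self, items, paid):
        self.items = items
        self.paid = paid

shipped = []

def ship(order):
    shipped.append(order)

def process_order_nested(order):
    # Each condition pushes the interesting code one level deeper.
    if order is not None:
        if order.items:
            if order.paid:
                ship(order)

def process_order_flat(order):
    # Guard clauses: reject the uninteresting cases first and return early,
    # so the main action stays at the top indentation level.
    if order is None:
        return
    if not order.items:
        return
    if not order.paid:
        return
    ship(order)
```

Both functions behave identically; the flat version just makes each precondition, and the happy path, readable at a glance.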
Here's a graph showing an increase in bugs possibly correlated with an increase in nesting levels in the Perforce "DiffMerge" program source code: