Thursday, April 28, 2011

Where did the time go? (Instrumenting C++/CLI and Python code)

The Problem:
How to speed up your program?
It's working ("Make it right, then make it fast") but it's a bit slower than you hoped. The first step in making it faster is to find out which parts are taking the most time.
I'm not (necessarily) talking about CPU time here - what matters in most cases is how long the user needs to wait until the program finishes its task. The bottlenecks might not be CPU time - e.g. the program might be spending a lot of time doing I/O or waiting for some other program that it interacts with, etc. So what I usually want to know is which parts of the program take up the most wall clock time. There are various sophisticated tools available to do this, but often they won't work for your specific situation (e.g. when your program has different interacting components written in different programming languages). And often you don't want to invest a lot of effort in acquiring and learning to use a sophisticated tool. Something simple is often best. And something that allows you to successively refine your efforts to "zoom in" on the bottlenecks.

The Solution:
A solution that has worked well for me is a utility class that makes it easy to instrument my code, bracketing sections of interest and generating reports at certain points. The reports refer to the sections of the program by the names that I assigned to them when I instrumented the code and show how much time elapsed in each section.

I've implemented a utility class called "TimeSpent" in Python and in C++/CLI for .NET. I supply the code below but here's an example of how to use it in Python (the usage in C++ is very similar). Suppose your Python program looks like this:
#!/usr/bin/env python
from myStuff import (func1, func2, func3, func4)

x = 1
func2('hello')
func1(x)
func3()
x = func4()
func1(x)
func2('goodbye')

To find out how much each part of this program contributes to the overall execution time, we instrument the code by creating an instance of the 'TimeSpent' class and adding calls to the 'start' and 'stop' methods around each section of interest:
#!/usr/bin/env python
from myStuff import (func1, func2, func3, func4)
from timeSpent import TimeSpent

x = 1
timeSpent = TimeSpent()
timeSpent.start("A")
func2('hello')
func1(x)
timeSpent.stop("A")
func3()
timeSpent.start("B")
x = func4()
timeSpent.stop("B")
timeSpent.start("A")
func1(x)
func2('goodbye')
timeSpent.stop("A")
print timeSpent.generateReportAndReset()

Each call to the 'start' method starts a stopwatch (creating it automatically if one with the specified name doesn't exist) and the corresponding 'stop' method stops that stopwatch. The accumulated times for all of the stopwatches are reported via the 'generateReportAndReset' method. Above I have named the stopwatches "A" and "B" and have used stopwatch "A" to measure the two separate sections where 'func1' and 'func2' are called.
Here's the output from the instrumented version of the program:
hello
goodbye
Elapsed seconds: 0.7
Breakdown of time spent (in milliseconds):
A: 629 (count: 2 avg: 314.6)
B: 95 (count: 1 avg: 95.0)
Note that the count for "A" is 2 - this reflects the fact that stopwatch "A" was used twice. The time of 629 milliseconds is the sum of the times taken for the two sections of code where stopwatch "A" was used.
Running the instrumented program again shows a longer execution time:
hello
goodbye
Elapsed seconds: 1.2
Breakdown of time spent (in milliseconds):
A: 1070 (count: 2 avg: 534.8)
B: 103 (count: 1 avg: 102.7)
And further runs would show that the "A" time is sometimes around 600 ms and other times around 1100 ms. To investigate this, we change the stopwatch name to "A1" for the first section and "A2" for the second section.
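The instrumented part of the program then looks like this (a sketch showing only the renamed calls; the rest of the program is unchanged):
timeSpent.start("A1")
func2('hello')
func1(x)
timeSpent.stop("A1")
func3()
timeSpent.start("B")
x = func4()
timeSpent.stop("B")
timeSpent.start("A2")
func1(x)
func2('goodbye')
timeSpent.stop("A2")
Here are some outputs from this version of the program: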
hello
goodbye
Elapsed seconds: 0.7
Breakdown of time spent (in milliseconds):
A1: 314 (count: 1 avg: 314.2)
A2: 293 (count: 1 avg: 292.9)
B: 97 (count: 1 avg: 96.7)
hello
goodbye
Elapsed seconds: 1.3
Breakdown of time spent (in milliseconds):
A1: 310 (count: 1 avg: 310.0)
A2: 843 (count: 1 avg: 842.7)
B: 105 (count: 1 avg: 104.6)
So it seems that the variation in overall execution time is due to variations in section "A2". With the hypothesis that the variation might be due to different values of 'x', let's change the names of the stopwatches to incorporate the 'x' value in their names:
#!/usr/bin/env python
from myStuff import (func1, func2, func3, func4)
from timeSpent import TimeSpent

x = 1
timeSpent = TimeSpent()
timeSpent.start("A (x=%d)" % x)
func2('hello')
func1(x)
timeSpent.stop("A (x=%d)" % x)
func3()
timeSpent.start("B")
x = func4()
timeSpent.stop("B")
timeSpent.start("A (x=%d)" % x)
func1(x)
func2('goodbye')
timeSpent.stop("A (x=%d)" % x)
print timeSpent.generateReportAndReset()
Here are some outputs from this version of the program:
hello
goodbye
Elapsed seconds: 0.7
Breakdown of time spent (in milliseconds):
A (x=1): 600 (count: 2 avg: 300.1)
B: 98 (count: 1 avg: 97.8)
hello
goodbye
Elapsed seconds: 1.2
Breakdown of time spent (in milliseconds):
A (x=-1): 783 (count: 1 avg: 783.3)
A (x=1): 284 (count: 1 avg: 283.8)
B: 108 (count: 1 avg: 108.3)
The above shows that when x is -1, the second "A" section takes a lot longer than when x is 1.

The ability to use arbitrary strings to name the stopwatches is one of the things that make this utility class user-friendly (here the user is the programmer). Preventing or detecting common errors is another aspect of user (programmer) friendliness. The usage of the TimeSpent class relies on the names matching up in the 'start' and 'stop' methods. What happens if the programmer slips up? Here's an example with two slip-ups: the call to stop("8") should have been stop("B"), and the call to start("A") on the second-to-last line should have been stop("A"):
#!/usr/bin/env python
from myStuff import (func1, func2, func3, func4)
from timeSpent import TimeSpent

x = 1
timeSpent = TimeSpent()
timeSpent.start("A")
func2('hello')
func1(x)
timeSpent.stop("A")
func3()
timeSpent.start("B")
x = func4()
timeSpent.stop("8")
timeSpent.start("A")
func1(x)
func2('goodbye')
timeSpent.start("A")
print timeSpent.generateReportAndReset()
The output from running the above version of the program makes it clear to the programmer that there were some mistakes:
hello
WARNING: unknown stopwatch name passed to 'stop' (8)
goodbye
WARNING: start(A) called when stopwatch is already running
Elapsed seconds: 0.7
Breakdown of time spent (in milliseconds):
A: 599 (count: 2 avg: 299.4)
B: 390 (count: 1 avg: 390.5)

There is no limit to how many stopwatches can be running at once. If you want to "zoom in" on a section of code, just bracket the sub-sections with more stopwatches (named appropriately) and the report will show the time for the whole section as well as the breakdown by sub-section.
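For example, the first "A" section of the test program above could be instrumented like this, with an outer stopwatch around the whole section and inner stopwatches around the individual calls (the names here are just illustrative):
timeSpent.start("A")
timeSpent.start("A / func2")
func2('hello')
timeSpent.stop("A / func2")
timeSpent.start("A / func1")
func1(x)
timeSpent.stop("A / func1")
timeSpent.stop("A")
The report would then contain a line for "A" as well as lines for "A / func2" and "A / func1", so you can see both the total for the section and how it splits between the two calls.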

If you want to track the time spent in code that is in many different files, it is convenient to use the 'getGlobalInstance' method of the 'TimeSpent' class - it gives you access to a singleton instance that you can use anywhere in your program.
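For example, a function in some other module might be instrumented like this (a sketch; the module and function names are made up, and it assumes 'getGlobalInstance' is exposed as a class method, as in the illustrative sketch at the end of this post):
from timeSpent import TimeSpent

def processRecords(records):
    # Use the shared singleton so the times recorded here end up in the same
    # report as times recorded elsewhere in the program.
    timeSpent = TimeSpent.getGlobalInstance()
    timeSpent.start("processRecords")
    for record in records:
        pass   # ... the real per-record work would go here ...
    timeSpent.stop("processRecords")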

By the way, here's the 'myStuff' module I used in the above test program:
#!/usr/bin/env python
from __future__ import division
import time
import random

def wasteTime(numMillis, stdDev=None):
    # Sleep for roughly 'numMillis' milliseconds; the actual duration is drawn
    # from a normal distribution to simulate work that varies from run to run.
    if stdDev is None:
        stdDev = numMillis / 15
    time.sleep(random.gauss(numMillis, stdDev) / 1000)

def func1(x):
    if x > 0:
        wasteTime(100)
    else:
        wasteTime(600)

def func2(msg):
    wasteTime(200)
    print msg

def func3():
    wasteTime(10)

def func4():
    wasteTime(100)
    if random.random() < 0.5:
        return 1
    else:
        return -1


Source code:
The source code for the TimeSpent class (in Python and in C++/CLI (.NET)) is published under the MIT licence and is available from the following links:
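To give a flavour of the implementation, here is a minimal sketch of a TimeSpent-like class in Python, written to match the behaviour described in this post; it is an illustrative approximation, not the published source:
#!/usr/bin/env python
# timeSpent.py - an illustrative approximation of the TimeSpent class.
from __future__ import division
import time

class TimeSpent(object):
    _globalInstance = None

    def __init__(self):
        self.reset()

    def reset(self):
        self._creationTime = time.time()
        self._startTimes = {}   # name -> start time of a currently running stopwatch
        self._totals = {}       # name -> accumulated seconds
        self._counts = {}       # name -> number of times the stopwatch was used

    def start(self, name):
        if name in self._startTimes:
            print "WARNING: start(%s) called when stopwatch is already running" % name
            return
        self._startTimes[name] = time.time()

    def stop(self, name):
        if name not in self._startTimes:
            print "WARNING: unknown stopwatch name passed to 'stop' (%s)" % name
            return
        elapsed = time.time() - self._startTimes.pop(name)
        self._totals[name] = self._totals.get(name, 0.0) + elapsed
        self._counts[name] = self._counts.get(name, 0) + 1

    def generateReportAndReset(self):
        # Include any stopwatches that are still running at report time.
        for name in list(self._startTimes):
            self.stop(name)
        lines = ["Elapsed seconds: %.1f" % (time.time() - self._creationTime),
                 "Breakdown of time spent (in milliseconds):"]
        for name in sorted(self._totals):
            totalMillis = self._totals[name] * 1000
            count = self._counts[name]
            lines.append("%s: %d (count: %d avg: %.1f)" % (name, totalMillis, count, totalMillis / count))
        self.reset()
        return "\n".join(lines)

    @classmethod
    def getGlobalInstance(cls):
        # Singleton instance, convenient when the instrumented code is spread
        # across many modules.
        if cls._globalInstance is None:
            cls._globalInstance = TimeSpent()
        return cls._globalInstance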
