How to speed up your program?
It's working ("Make it right, then make it fast") but it's a bit slower than you hoped. The first step in making it faster is to find out which parts are taking the most time.
I'm not (necessarily) talking about CPU time here - what matters in most cases is how long the user needs to wait until the program finishes its task. The bottlenecks might not be CPU time - e.g. the program might be spending a lot of time doing I/O or waiting for some other program that it interacts with, etc. So what I usually want to know is which parts of the program take up the most wall clock time. There are various sophisticated tools available to do this, but often they won't work for your specific situation (e.g. when your program has different interacting components written in different programming languages). And often you don't want to invest a lot of effort in acquiring and learning to use a sophisticated tool. Something simple is often best. And something that allows you to successively refine your efforts to "zoom in" on the bottlenecks.
The Solution:
A solution that has worked well for me is a utility class that makes it easy to instrument my code, bracketing sections of interest and generating reports at certain points. The reports refer to the sections of the program by the names that I assigned to them when I instrumented the code and show how much time elapsed in each section.
I've implemented a utility class called "TimeSpent" in Python and in C++/CLI for .NET. I supply the code below but here's an example of how to use it in Python (the usage in C++ is very similar). Suppose your Python program looks like this:
#!/usr/bin/env python from myStuff import (func1, func2, func3, func4) x = 1 func2('hello') func1(x) func3() x = func4() func1(x) func2('goodbye')
To find out how much each part of this program contributes to the overall execution time, we instrument the code by creating an instance of the 'TimeSpent' class and adding calls to the 'start' and 'stop' methods around each section of interest:
#!/usr/bin/env python from myStuff import (func1, func2, func3, func4) from timeSpent import TimeSpent x = 1 timeSpent = TimeSpent() timeSpent.start("A") func2('hello') func1(x) timeSpent.stop("A") func3() timeSpent.start("B") x = func4() timeSpent.stop("B") timeSpent.start("A") func1(x) func2('goodbye') timeSpent.stop("A") print timeSpent.generateReportAndReset()
Each call to the 'start' method starts a stopwatch (creating it automatically if one with the specified name doesn't exist) and the corresponding 'stop' method stops that stopwatch. The accumulated times for all of the stopwatches is reported via the 'generateReportAndReset' method. Above I have named the stopwatches "A" and "B" and have used stopwatch "A" to measure the two separate sections where 'func1' and 'func2' are called.
Here's the output from the instrumented version of the program:
hello goodbye Elapsed seconds: 0.7 Breakdown of time spent (in milliseconds): A: 629 (count: 2 avg: 314.6) B: 95 (count: 1 avg: 95.0)
Running the instrumented program again shows a longer execution time:
hello goodbye Elapsed seconds: 1.2 Breakdown of time spent (in milliseconds): A: 1070 (count: 2 avg: 534.8) B: 103 (count: 1 avg: 102.7)
hello goodbye Elapsed seconds: 0.7 Breakdown of time spent (in milliseconds): A1: 314 (count: 1 avg: 314.2) A2: 293 (count: 1 avg: 292.9) B: 97 (count: 1 avg: 96.7)
hello goodbye Elapsed seconds: 1.3 Breakdown of time spent (in milliseconds): A1: 310 (count: 1 avg: 310.0) A2: 843 (count: 1 avg: 842.7) B: 105 (count: 1 avg: 104.6)
#!/usr/bin/env python from myStuff import (func1, func2, func3, func4) from timeSpent import TimeSpent x = 1 timeSpent = TimeSpent() timeSpent.start("A (x=%d)" % x) func2('hello') func1(x) timeSpent.stop("A (x=%d)" % x) func3() timeSpent.start("B") x = func4() timeSpent.stop("B") timeSpent.start("A (x=%d)" % x) func1(x) func2('goodbye') timeSpent.stop("A (x=%d)" % x) print timeSpent.generateReportAndReset()
hello goodbye Elapsed seconds: 0.7 Breakdown of time spent (in milliseconds): A (x=1): 600 (count: 2 avg: 300.1) B: 98 (count: 1 avg: 97.8)
hello goodbye Elapsed seconds: 1.2 Breakdown of time spent (in milliseconds): A (x=-1): 783 (count: 1 avg: 783.3) A (x=1): 284 (count: 1 avg: 283.8) B: 108 (count: 1 avg: 108.3)
The ability to use arbitrary strings to name the stopwatches is one of the things that make this utility class user-friendly (here the user is the programmer). Preventing or detecting common errors is another aspect of user (programmer) friendliness. The usage of the TimeSpent class relies on the names matching up in the 'start' and 'stop' methods. What happens if the programmer slips up? Here's an example with two slip-ups: the call to stop("8") should have been stop("B"), and the call to start("A") on the second last line should have been stop("A")
#!/usr/bin/env python from myStuff import (func1, func2, func3, func4) from timeSpent import TimeSpent x = 1 timeSpent = TimeSpent() timeSpent.start("A") func2('hello') func1(x) timeSpent.stop("A") func3() timeSpent.start("B") x = func4() timeSpent.stop("8") timeSpent.start("A") func1(x) func2('goodbye') timeSpent.start("A") print timeSpent.generateReportAndReset()
hello WARNING: unknown stopwatch name passed to 'stop' (8) goodbye WARNING: start(A) called when stopwatch is already running Elapsed seconds: 0.7 Breakdown of time spent (in milliseconds): A: 599 (count: 2 avg: 299.4) B: 390 (count: 1 avg: 390.5)
The is no limit to how many stopwatches can be running at once. If you want to "zoom in" on a section of code, just bracket the sub-sections with more stopwatches (named appropriately) and the report will show the time for the whole section as well as the breakdown by sub-section.
If you want to track the time spent in code that is in many different files, it is convenient to use the 'getGlobalInstance' method of the 'TimeSpent' class - it gives you access to a singleton instance that you can use anywhere in your program.
By the way, here's the 'myStuff' module I used in the above test program:
#!/usr/bin/env python from __future__ import division import time import random def wasteTime(numMillis, stdDev=None): if stdDev is None: stdDev = numMillis / 15 time.sleep(random.gauss(numMillis, stdDev) / 1000) def func1(x): if x > 0: wasteTime(100) else: wasteTime(600) def func2(msg): wasteTime(200) print msg def func3(): wasteTime(10) def func4(): wasteTime(100) if random.random() < 0.5: return 1 else: return -1
Source code:
The source code for the TimeSpent class (in Python and in C++/CLI (.NET)) is published under the MIT licence and is available from the following links:
No comments:
Post a Comment