Code performance with gprof
Learn how gprof can help you to identify the performance bottlenecks in your program’s source code
Code profiling is an important aspect of software development. It is mostly done to identify those code snippets that consume more time than expected, or to understand and trace the call flow of functions. This not only aids in debugging many tricky problems, but also helps the programmer to improve the software’s performance.
Although performance requirements vary from program to program, it’s always advantageous to have minimum performance bottlenecks in the final product. For example, a video player will usually have very strict speed requirements while a calculator might not have the same kind of requirements. Even so, a better-performing calculator will always be preferred.
There are many tools available for code profiling in Linux, but one of the popular tools is the GNU profiler – gprof. It is a free program that comes as a part of GNU binary utils and is based on BSD gprof.
The sample code (sampleCode.c – on the disc) used in this guide is written in the C programming language and compiled using GCC 4.7.3. All the commands are executed on Bash 4.2.45 and the gprof version used is 2.23.2. The whole test environment is built on Ubuntu 13.04.
Step 01 Compile profiling-enabled code
In order to profile a code, the first step is to enable profiling while the code is being compiled and linked. In most cases, the command line option -pg should enable profiling.
If compilation and linking commands are used separately, then this option is to be used in both cases. For example:
gcc -Wall -c sampleCode.c -pg gcc -Wall sampleCode.o -o sampleCode -pg
And if compilation and linking is being done in the same command then this option also needs to be added. For example:
gcc -Wall sampleCode.c -o sampleCode -pg
Step 02 Execute the binary – part 1
After the program is compiled (and linked) for profiling, the next step is to execute it. One important point to remember is that the program execution should happen in such a way that all the code blocks (or at least the ones you want to profile) get executed. So make sure that command-line arguments and inputs are given to the program accordingly.
Here is how the profiling-enabled executable program ‘sampleCode’ was executed:
./sampleCode Count = 
So you can see that the program ‘sampleCode’ executed and exited normally.
Step 03 Execute the binary – part 2
Once the program is executed, it produces a file named gmon.out.
ls gmon.out gmon.out
This file contains the profiling data of the code blocks that were actually hit during the program execution. It is not a regular text file and therefore cannot be read normally. This can be confirmed by using the file command in Linux.
file gmon.out gmon.out: GNU prof performance data - version 1
Note 1: The file gmon.out is not produced if the program abnormally terminates because of, say, an unhandled signal, by calling _exit() function directly etc.
Note 2: This file gets created in the working directory of the program at the time of its exit. So, make sure that the program has required permissions for the same.
Note 3: A profiling-enabled program always produces a file named gmon.out. So, make sure that an existing file with this name is not overwritten.
Step 04 Execute gprof
Once the profiling data (gmon.out) is available, the gprof tool can be used to analyse and produce meaningful data from it. Here is the general syntax of the gprof command :
gprof [command-line-options] [executable- file-name] [profiling-data-file-name] > [output-file]
So, the gprof command accepts the executable filename, profiling data filename and the required command-line options to produce human- readable profiling information which can be redirected to an output file.
But, in the simplest form, the command-line tool gprof does not require any argument (the arguments within [ ] are not mandatory). When no argument is supplied, gprof looks for a.out as the default executable-file-name and gmon.out as profiling-data-file-name in the current directory, and the default output is produced on standard output – stdout.
Let’s run gprof in our case :
gprof sampleCode gmon.out > prof_output
The command above redirects the output of gprof to a file named prof_output. This file will now contain human-readable profiling information in the form of a flat profile and call graph (more on these later).
Step 05 Annotated source
The annotated source listing gives an idea about the number of times each line of the program was executed. To get the annotated source listing …
First compile the source code with the -g option. This option enables debugging:
gcc -Wall -pg -g sampleCode.c -o sampleCode
Next, while running the gprof command, use the command-line option -A to produce the annotated source listing:
gprof -A sampleCode gmon.out > prof_ output
Step 06 Flat profile
The flat profile (see screen grab at top of page) shows how much time your program spent in each function, and how many times that function was called. If you simply want to know which functions burn most of the cycles, it is stated concisely here.
The different columns in the Flat Profile table represent :
% time – The percentage of the total running time of the program used by this function.
cumulative seconds – A running sum of the number of seconds accounted for by this function and those listed above it.
self seconds – The number of seconds accounted for by this function alone. This is the major sort for this listing.
calls – The number of times this function was invoked (if this function is profiled, else blank).
self ms/call – The average number of milliseconds spent in this function per call (if this function is profiled, else blank).
total ms/call – The average number of milliseconds spent in this function and its descendants per call (if this function is profiled, else blank).
name – The name of the function. This is the minor sort for this listing. The index shows the location of the function in the gprof listing. If the index is in parentheses, it shows where it would appear in the gprof listing if it were to be printed.
Step 07 Call graph
The Call Graph (see screen below) shows, for each function, which functions called it, which other functions it called, and how many times. There is also an estimate of how much time was spent in the subroutines of each function. This can suggest places where you might try to eliminate function calls that use a lot of time.
Each entry in this table consists of several lines. The line with the index number at the left- hand margin lists the current function. The lines above it list the functions that called this function, and the lines below it list the functions this one called. This line lists:
index – A unique number given to each element of the table. Index numbers are sorted numerically. The index number is printed next to every function name so it is easier to look up where the function is in the table.
% time – This is the percentage of the ‘total’ time that was spent in this function and its children. Note that due to different viewpoints, functions excluded by options etc, these numbers will not add up to 100%.
self – This is the total amount of time spent in this function. For the function’s parents, this is the amount of time that was propagated directly from the function into this parent. For the function’s children, this is the amount of time that was propagated directly from the child into the function.
children – This is the total amount of time propagated into this function by its children.
For the function’s parents, it’s the amount of time that was propagated from the function’s children into this parent. For the function’s children, this is the amount of time that was propagated from the child’s children to the function.
called – This is the number of times the
function was called. If the function called itself recursively, the number only includes non- recursive calls and is followed by a ‘+’ and the number of recursive calls.
For the function’s parents, this is the number of times this parent called the function / the total number of times the function was called. For the function’s children, this is the number of times the function called this child / the total number of times the child was called
name – The name of the current function. The index number is printed after it. If the function is a member of a cycle, the cycle number is printed between the function’s name and the index number.
For the function’s parents, this is the name of the parent. For the function’s children, this is the name of the child.
Step 08 Exclude a particular function
To exclude a particular function from the flat profile or call graph, use the -P or -Q option respectively, along with the function name as
For example, the following command would exclude the flat profile- and call graph-related details of func_a :
gprof -b -Pfunc_a -Qfunc_a sampleCode gmon.out > prof_output
Step 09 Profile a particular function
To fetch the flat profile and call graph information of only a particular function, use the -p and -q options respectively along with the function name as the argument.
For example, the following command would produce the flat profile- and call graph-related details of func_a:
gprof -b -pfunc_a -qfunc_a sampleCode gmon.out > prof_output
Step 10 Suppress verbose blurbs
By default, the gprof output contains detailed explanation of each column of flat profile and call graph. This is good for beginners, but you may want to suppress these details once you know everything. The command-line option -b (or -brief) can be used for this purpose.