Timing Methods in C++ Under Linux

Measuring the execution time of a section of code can be done in several ways in C++. Apart from differences in resolution, the various timing methods behave much the same on a single-processor machine. As multi-core processors have become prevalent, however, we need to choose the timing mechanism carefully, because not all of these routines measure elapsed wall time.

Here I will examine a few commonly used methods for measuring time intervals under Linux. All of the following timing routines are timed against the same OpenMP parallel loop (on a quad-core CPU, the parallel region spawns four concurrent threads).

time()

The time() function returns the time with a resolution of one second, so it is generally only useful for measuring long-running processes.

clock()

On single-core systems, clock() is often used for time measurements. Its result is expressed in units of 1/CLOCKS_PER_SEC seconds, which is one microsecond under Linux, although the value may be updated at a coarser granularity. Because clock() reports the processor time consumed by the process rather than the time elapsed, it is not particularly useful for measuring wall time on a multi-core system with concurrently executing threads: the result accumulates CPU time across all active CPUs. On a quad-core system with all cores fully utilized, the reported time is roughly four times the wall time.
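
The conversion from clock ticks to seconds can be sketched as follows. The helper name cpu_seconds is my own choice for illustration, not a library function.

```cpp
#include <ctime>

// Convert two clock() readings into seconds of CPU time.
// Note: this is processor time summed over all threads of the
// process, not elapsed wall time.
double cpu_seconds(std::clock_t start, std::clock_t end) {
    return static_cast<double>(end - start) / CLOCKS_PER_SEC;
}
```

Take one reading with clock() before the work and one after, then pass both to cpu_seconds().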

gettimeofday()

Like clock(), gettimeofday() has a resolution of up to one microsecond. As the function name suggests, gettimeofday() measures wall time and is therefore suitable for time measurement on multi-core, multi-CPU systems.
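
As a minimal sketch, the difference between two gettimeofday() samples can be turned into seconds without timersub(). The names timeval_diff_seconds and wall_seconds are hypothetical helpers, and the template and nullptr assume a C++11 compiler.

```cpp
#include <sys/time.h>

// Elapsed seconds between two gettimeofday() samples. The microsecond
// difference may be negative; adding it to the whole-second difference
// still gives the right result.
double timeval_diff_seconds(const timeval& start, const timeval& end) {
    return (end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec) / 1e6;
}

// Run a callable and return the wall time it took, in seconds.
template <typename Work>
double wall_seconds(Work work) {
    timeval t1, t2;
    gettimeofday(&t1, nullptr);
    work();
    gettimeofday(&t2, nullptr);
    return timeval_diff_seconds(t1, t2);
}
```

Since gettimeofday() follows the system clock, the measured interval will be off if the clock is adjusted while the work is running.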

rdtsc

The time stamp counter (TSC) is available on most modern x86 CPUs (since the Pentium) and is read with the rdtsc instruction. Many timers are built on rdtsc (e.g. on Windows systems, the Win32 API call QueryPerformanceCounter may use it). An rdtsc-based implementation generally offers very fine resolution (up to about one nanosecond), but depending on the implementation, its accuracy can be susceptible to CPU clock throttling (common in mobile CPUs). In my implementation below, the rdtsc results are divided by the CPU frequency, which can be read from /proc/cpuinfo:

grep "cpu MHz" /proc/cpuinfo | cut -d':' -f2

This implementation assumes that the CPU frequency remains constant during the measurement; if the frequency changes, accuracy will suffer. For desktop CPUs, however, this is less of a concern.
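
Rather than hard-coding the frequency, it could be read at run time. The following is only a sketch: parse_cpu_mhz, read_cpu_mhz and cycles_to_seconds are hypothetical helper names, and it assumes /proc/cpuinfo contains a "cpu MHz" line, which is not the case on every architecture.

```cpp
#include <cstdlib>
#include <fstream>
#include <string>

// Extract the value from a line such as "cpu MHz\t\t: 3199.987".
// Returns 0.0 if the line is not a "cpu MHz" entry.
double parse_cpu_mhz(const std::string& line) {
    if (line.compare(0, 7, "cpu MHz") != 0) return 0.0;
    std::string::size_type colon = line.find(':');
    if (colon == std::string::npos) return 0.0;
    return std::strtod(line.c_str() + colon + 1, nullptr);
}

// Return the first "cpu MHz" value found in the file, or 0.0 if none.
double read_cpu_mhz(const char* path = "/proc/cpuinfo") {
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        double mhz = parse_cpu_mhz(line);
        if (mhz > 0.0) return mhz;
    }
    return 0.0;
}

// Convert a TSC difference to seconds, assuming the counter ticked
// at a constant mhz for the whole interval.
double cycles_to_seconds(unsigned long long cycles, double mhz) {
    return static_cast<double>(cycles) / (mhz * 1e6);
}
```

The value in /proc/cpuinfo is the frequency at the moment it is read, so the constant-frequency assumption above still applies.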

clock_gettime()

Like rdtsc, this function offers nanosecond resolution, and it is available on POSIX-compliant systems. With glibc versions before 2.17, programs that use it must be linked with -lrt.
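
clock_gettime() takes a clock ID as its first argument. For interval timing, CLOCK_MONOTONIC is an alternative to CLOCK_REALTIME that does not jump if the system clock is reset. A minimal sketch, with timespec_diff_seconds and monotonic_seconds as hypothetical helper names and a C++11 compiler assumed:

```cpp
#include <time.h>

// Elapsed seconds between two clock_gettime() samples. Uses double
// throughout so the nanosecond field is not rounded.
double timespec_diff_seconds(const timespec& start, const timespec& end) {
    return (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
}

// Run a callable and return the elapsed time on CLOCK_MONOTONIC.
template <typename Work>
double monotonic_seconds(Work work) {
    timespec t1, t2;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    work();
    clock_gettime(CLOCK_MONOTONIC, &t2);
    return timespec_diff_seconds(t1, t2);
}
```

Both clocks measure wall time, so either is suitable for timing multi-threaded code.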

Intel Threading Building Blocks (TBB) also provides a timer function, tick_count::now(), and elapsed time can easily be measured using the code snippet below:

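    // requires #include "tbb/tick_count.h" and using namespace tbb;
    tick_count t_start = tick_count::now();
    // statements to be timed
    tick_count t_end = tick_count::now();
    cout << (t_end - t_start).seconds() * 1000 << " ms" << endl;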

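The following listing measures time using the methods mentioned above (excluding the TBB timer). Compile it with OpenMP enabled (e.g. g++ -fopenmp); otherwise the pragma is ignored and the loop runs on a single thread.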

#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>
#include <ctime>

using namespace std;

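// Busy-loop workload. Every thread in the parallel region runs the full
// nested loop, so total CPU time grows with the number of threads.
// Build without optimization, or the empty loops may be removed.
void Foo() {
#pragma omp parallel
    {
        for (long i = 0; i < 50000; i++)
            for (long j = 0; j < 50000; j++);
    }
}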

unsigned long long rdtsc() {
    unsigned a, d;

    __asm__ volatile("rdtsc" : "=a" (a), "=d" (d));

    return ((unsigned long long) a) | (((unsigned long long) d) << 32);
}

void Time() {
    time_t t1, t2;

    time(&t1);
    Foo();
    time(&t2);

    cout << "time() : " << t2 - t1 << " s" << endl;
}

void Clock() {
    clock_t c1 = clock();
    Foo();
    clock_t c2 = clock();
    cout << "clock() : " << (float) (c2 - c1) / (float) CLOCKS_PER_SEC << " s" << endl;
}

void GetTimeOfDay() {
    timeval t1, t2, t;
    gettimeofday(&t1, NULL);
    Foo();
    gettimeofday(&t2, NULL);
    timersub(&t2, &t1, &t);

    cout << "gettimeofday() : " << t.tv_sec + t.tv_usec / 1000000.0 << " s" << endl;
}

void RDTSC() {
    unsigned long long t1, t2;

    t1 = rdtsc();
    Foo();
    t2 = rdtsc();

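    // 3199987.0 is this machine's CPU frequency in kHz (3199.987 MHz from
    // /proc/cpuinfo); replace it with the value for your own CPU.
    cout << "rdtsc() : " << 1.0 * (t2 - t1) / 3199987.0 / 1000.0 << " s" << endl;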
}

void ClockGettime() {
    timespec res, t1, t2;
    clock_getres(CLOCK_REALTIME, &res);

    clock_gettime(CLOCK_REALTIME, &t1);
    Foo();
    clock_gettime(CLOCK_REALTIME, &t2);
    
    // Cast to double: float cannot hold a nanosecond count exactly.
    cout << "clock_gettime() : "
         << (t2.tv_sec - t1.tv_sec) + (double) (t2.tv_nsec - t1.tv_nsec) / 1000000000.0
         << " s" << endl;
}

int main() {
    cout.setf(ios::fixed);
    cout.setf(ios::showpoint);
    cout.precision(5);

    Time();
    Clock();
    GetTimeOfDay();

    RDTSC();
    ClockGettime();
    
    return (EXIT_SUCCESS);
}

And here is the output from my quad-core computer, using a debug (unoptimized) build.

time() : 6 s
clock() : 21.85000 s
gettimeofday() : 5.67368 s
rdtsc() : 5.65957 s
clock_gettime() : 5.65894 s
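
As a rough check on the clock() figure, dividing it by a wall-time figure gives the average number of cores kept busy. A minimal sketch (the function name cpu_to_wall_ratio is hypothetical):

```cpp
// Ratio of accumulated CPU time to elapsed wall time. For a fully
// parallel workload this approaches the number of busy cores.
double cpu_to_wall_ratio(double cpu_seconds, double wall_seconds) {
    return cpu_seconds / wall_seconds;
}
```

With the figures above, cpu_to_wall_ratio(21.85, 5.67368) comes to roughly 3.85, close to the four threads running on this machine.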


4 Comments

  1. Jason says:

    Thanks, I found this quite useful!

  2. weberc2 says:

    So seeing as how wall-time is a poor method for comparing algorithms as one set of code might get more CPU time during its total execution time, are there any good methods for removing the environmental variability from each trial? My concern isn’t absolute precision as much as relative precision.
