TBB Benchmarks

Since I started using Ubuntu 8.04 as my main operating system, I have been trying to obtain some benchmark numbers for my month-old build.

Compared with Windows, benchmarking suites for Linux are few and far between, and they are especially hard to find for 64-bit systems. Phoronix does provide an excellent test suite designed to run under Linux, but I have not yet managed to get the latest version (0.6.0) to build and run properly under 64-bit Linux. And since most of the applications in the test suite have Linux versions only, it would be very difficult to use it for cross-OS performance comparisons.

If your primary goal is to test your CPU and memory subsystem, then I would recommend using Intel’s open source Threading Building Blocks (TBB). The source includes a few algorithms that are executed after compilation to test whether the build was successful. As a side benefit, these tests are timed and can be used as benchmarks as well.

As an example, the following listing shows the benchmark output obtained while building the latest stable version of TBB (tbb20_020oss_src). The library was built on my machine (Q9450 @ 3.2 GHz, 8 GB DDR2-800, Linux 2.6.24-16-generic SMP x86_64):

./count_strings 1
threads = 1  total = 1000000  time = 0.336895
./count_strings 2
threads = 2  total = 1000000  time = 0.214048
./count_strings 4
threads = 4  total = 1000000  time = 0.181645
./seismic - 300
101.5 frame per sec with serial version
102.3 frame per sec with 1 way parallelism
193.9 frame per sec with 2 way parallelism
219.0 frame per sec with 3 way parallelism
244.2 frame per sec with 4 way parallelism
./convex_hull_bench
Starting TBB unbufferred push_back version of QUICK HULL algorithm
  Number of nodes:5000000  Number of threads:1  Initialization time:0.293048  Calculation time:0.807145
  Number of nodes:5000000  Number of threads:2  Initialization time:0.822569  Calculation time:1.02838
  Number of nodes:5000000  Number of threads:3  Initialization time:0.607247  Calculation time:1.13264
  Number of nodes:5000000  Number of threads:4  Initialization time:0.5828  Calculation time:1.08477
  Number of nodes:5000000  Number of threads:5  Initialization time:0.569491  Calculation time:1.10567
  Number of nodes:5000000  Number of threads:6  Initialization time:0.585655  Calculation time:1.09051
  Number of nodes:5000000  Number of threads:7  Initialization time:0.583944  Calculation time:1.08213
  Number of nodes:5000000  Number of threads:8  Initialization time:0.561563  Calculation time:1.09363
Starting TBB bufferred version of QUICK HULL algorithm
  Number of nodes:5000000  Number of threads:1  Initialization time:0.180772  Calculation time:0.713631
  Number of nodes:5000000  Number of threads:2  Initialization time:0.09458  Calculation time:0.369742
  Number of nodes:5000000  Number of threads:3  Initialization time:0.0698851  Calculation time:0.266026
  Number of nodes:5000000  Number of threads:4  Initialization time:0.0567744  Calculation time:0.207367
  Number of nodes:5000000  Number of threads:5  Initialization time:0.0555128  Calculation time:0.230236
  Number of nodes:5000000  Number of threads:6  Initialization time:0.0598358  Calculation time:0.23095
  Number of nodes:5000000  Number of threads:7  Initialization time:0.0624336  Calculation time:0.257518
  Number of nodes:5000000  Number of threads:8  Initialization time:0.0586483  Calculation time:0.278667
./primes 100000000 0:4
#primes from [2..100000000] = 5761455 (0.16 sec with serial code)
#primes from [2..100000000] = 5761455 (0.18 sec with 1-way parallelism)
#primes from [2..100000000] = 5761455 (0.09 sec with 2-way parallelism)
#primes from [2..100000000] = 5761455 (0.06 sec with 3-way parallelism)
#primes from [2..100000000] = 5761455 (0.05 sec with 4-way parallelism)
./parallel_preorder 1:4
0.235308 seconds using 1 threads (average of 199.74 nodes in root_set)
0.202356 seconds using 2 threads (average of 199.74 nodes in root_set)
0.153144 seconds using 3 threads (average of 199.74 nodes in root_set)
0.181067 seconds using 4 threads (average of 199.74 nodes in root_set)
./sum_tree
Tree creation using TBB scalable allocator
   half created serially: time = 177.1 msec
   half done in parallel: time = 77.9 msec
Calculations:
           SerialSumTree: time = 77.9 msec, sum=7.01275e+08
   SimpleParallelSumTree: time = 44.5 msec, sum=7.01275e+08
OptimizedParallelSumTree: time = 43.4 msec, sum=7.01275e+08
./sum_tree -stdmalloc
Tree creation using standard operator new
   half created serially: time = 369.2 msec
   half done in parallel: time = 548.7 msec
Calculations:
           SerialSumTree: time = 94.7 msec, sum=7.01275e+08
   SimpleParallelSumTree: time = 65.3 msec, sum=7.01275e+08
OptimizedParallelSumTree: time = 65.4 msec, sum=7.01275e+08
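
To read these numbers as scaling rather than raw times, it helps to turn them into speedup (one-thread time divided by N-thread time) and parallel efficiency (speedup divided by N). The following is only a minimal sketch: the timings are copied by hand from the count_strings runs and from the buffered QUICK HULL calculation times above, and the names speedup_table and report are made up for this illustration and are not part of TBB.

```python
# Timings in seconds, copied by hand from the listing above, keyed by thread count.
count_strings = {1: 0.336895, 2: 0.214048, 4: 0.181645}
quick_hull_buffered = {1: 0.713631, 2: 0.369742, 3: 0.266026, 4: 0.207367}


def speedup_table(timings):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run.

    Hypothetical helper for this post, not a TBB function.
    """
    base = timings[1]
    return {n: (base / t, base / t / n) for n, t in sorted(timings.items())}


def report(name, timings):
    """Print one line per thread count with speedup and efficiency."""
    print(name)
    for n, (speedup, efficiency) in speedup_table(timings).items():
        print(f"  {n} threads: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")


report("count_strings", count_strings)
report("convex_hull_bench (buffered, calculation time)", quick_hull_buffered)
```

On the figures above this gives roughly 1.85x for count_strings at 4 threads, against roughly 3.44x for the buffered QUICK HULL calculation at 4 threads, so the individual tests scale quite differently on the same four cores.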

 
