Download MPThreadBench.zip

There is no doubt that Hyper-Threading can, under certain circumstances, boost application performance. The gain depends heavily on the type of application; according to Intel, it averages 15-30%.

For computationally intensive multi-threaded applications, however, Hyper-Threading provides little benefit. Below are some benchmarks that support this assertion.

The Benchmark Program

I wrote a very simple benchmark program in C# that finds the prime numbers in a given range using a user-specified number of threads. The main program creates the requested number of threads, and each thread searches the same range. The work is interleaved: each thread starts at a different odd number and advances by twice the thread count, so each thread finds a unique subset of the primes in the interval.
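To make the interleaving concrete, here is a minimal C# sketch that lists the candidates a single thread would test. `InterleavedSplit` and `Candidates` are hypothetical names, and for simplicity every thread here shares the same inclusive upper bound, which may differ from the per-thread bounds used in the actual benchmark.

```csharp
using System.Collections.Generic;

// Hypothetical helper illustrating the interleaved split of odd candidates.
// Assumption: all threads share one inclusive upper bound.
public static class InterleavedSplit
{
    // Candidates for thread `threadIndex` of `numThreads`: it starts at
    // 3 + 2*threadIndex and then skips ahead by 2*numThreads each step.
    public static IEnumerable<int> Candidates(int threadIndex, int numThreads, int ubound)
    {
        const int startNum = 3;
        int step = numThreads * 2;
        for (int n = startNum + threadIndex * 2; n <= ubound; n += step)
            yield return n;
    }
}
```

With 3 threads and an upper bound of 21, thread 0 tests 3, 9, 15, 21, thread 1 tests 5, 11, 17 and thread 2 tests 7, 13, 19. Together they cover every odd number from 3 to 21 exactly once, so the threads never share work and need no locking.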

The main function is listed here:

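```csharp
public double Run(int numOfThreads, int ubound)
{
    MPThreadingTest[] threads = new MPThreadingTest[numOfThreads];
    int startNum = 3;

    // Thread i starts at the odd number startNum + 2*i and advances by
    // 2*numOfThreads, so the threads share the odd candidates without overlap.
    // The two '-' operators in the upper-bound argument are assumed.
    for (int i = 0; i < numOfThreads; i++)
        threads[i] = new MPThreadingTest(startNum + i * 2, numOfThreads * 2,
                                         ubound - (numOfThreads - i) * 2);

    perfCounter.Start();  // high-precision timer field

    for (int i = 0; i < numOfThreads; i++) threads[i].Start();  // launch all workers
    for (int i = 0; i < numOfThreads; i++) threads[i].Wait();   // block until each finishes

    perfCounter.End();

    return perfCounter.TimeElapsed("MS");  // elapsed time in milliseconds
}
```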
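Run() depends on MPThreadingTest, whose source is not shown in this post. As a rough guide, here is a hypothetical `PrimeWorker` sketch with the same (start, step, end) constructor and Start/Wait methods, built on System.Threading.Thread with trial-division primality testing. It is not the author's class, and treating `end` as an inclusive bound is an assumption.

```csharp
using System.Threading;

// Hypothetical worker with the same constructor/Start/Wait shape that Run()
// uses for MPThreadingTest; not the author's MPThreadingTest implementation.
public class PrimeWorker
{
    private readonly int _start;   // first candidate tested by this worker
    private readonly int _step;    // distance between this worker's candidates
    private readonly int _end;     // inclusive upper bound (assumption)
    private readonly Thread _thread;

    // Number of primes found; read it only after Wait() has returned.
    public int PrimeCount { get; private set; }

    public PrimeWorker(int start, int step, int end)
    {
        _start = start;
        _step = step;
        _end = end;
        _thread = new Thread(Compute);
    }

    // Launches the worker on its own OS thread.
    public void Start() => _thread.Start();

    // Blocks the caller until the worker's thread has finished.
    public void Wait() => _thread.Join();

    private void Compute()
    {
        int count = 0;
        for (int n = _start; n <= _end; n += _step)
            if (IsPrime(n)) count++;
        PrimeCount = count;
    }

    // Plain trial division by odd divisors up to sqrt(n).
    private static bool IsPrime(int n)
    {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int d = 3; (long)d * d <= n; d += 2)
            if (n % d == 0) return false;
        return true;
    }
}
```

With three workers using the starts and step Run() would pass for three threads (3, 5, 7 and step 6) and an upper bound of 100 for each, the counts sum to 24, the number of odd primes below 100. Because each worker owns its own counter and the caller reads it only after Join(), no synchronization beyond Wait() is needed.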

The high-precision timer (perfCounter) was described in one of my earlier posts.
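For readers without that post, a stand-in built on System.Diagnostics.Stopwatch could look like the sketch below. `HighResTimer` is a hypothetical name; it only mimics the Start/End/TimeElapsed("MS") calls that Run() makes on perfCounter. Stopwatch uses the high-resolution performance counter when the hardware provides one.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the perfCounter object used in Run();
// not the timer class from the earlier post.
public class HighResTimer
{
    private readonly Stopwatch _watch = new Stopwatch();

    // Resets the elapsed time to zero and starts measuring.
    public void Start() => _watch.Restart();

    // Stops measuring; the elapsed time stays readable afterwards.
    public void End() => _watch.Stop();

    // Mimics TimeElapsed("MS"); only "MS" (milliseconds) and "S" (seconds) are supported.
    public double TimeElapsed(string unit)
    {
        double seconds = (double)_watch.ElapsedTicks / Stopwatch.Frequency;
        switch (unit.ToUpperInvariant())
        {
            case "MS": return seconds * 1000.0;
            case "S": return seconds;
            default: throw new ArgumentException("Unsupported unit: " + unit, nameof(unit));
        }
    }
}
```

Declaring a field such as `private readonly HighResTimer perfCounter = new HighResTimer();` in the class that contains Run() would satisfy the three calls the listing makes.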

Results

Below are the results from two CPUs: a Pentium 4 3.0 GHz with Hyper-Threading (Northwood) and a dual-core Pentium D 2.8 GHz (Smithfield). Each measurement was repeated 10 times and the results were averaged. The reported time is how long it takes to find all prime numbers below 1,000,000.
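The repeat-and-average procedure can be sketched as a small helper. `RepeatedBenchmark.AverageOf` is a hypothetical name; in practice `measure` would wrap a call such as `Run(numOfThreads, 1000000)`, which is an assumed usage rather than the author's harness.

```csharp
using System;
using System.Linq;

public static class RepeatedBenchmark
{
    // Calls `measure` `repeats` times and returns the mean of the timings it reports.
    // Hypothetical usage: AverageOf(10, () => Run(numOfThreads, 1000000)).
    public static double AverageOf(int repeats, Func<double> measure)
    {
        if (repeats <= 0)
            throw new ArgumentOutOfRangeException(nameof(repeats), "repeats must be positive");

        var samples = new double[repeats];
        for (int r = 0; r < repeats; r++)
            samples[r] = measure();
        return samples.Average();
    }
}
```

Fed the ten single-thread Pentium 4 timings from the first table, it returns about 756.25 ms, which the table reports as 756.2.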

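Pentium 4 3.0 GHz (Northwood, Hyper-Threading): elapsed time in ms, by number of threads

| Run / threads | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 761.6 | 748.8 | 748.5 | 748.8 | 796.9 | 811.6 | 790.1 | 778.5 | 864.8 | 832 |
| 2 | 755.5 | 754.7 | 749.5 | 790.4 | 791.1 | 759.2 | 860.1 | 748 | 829.8 | 808 |
| 3 | 755.9 | 748.7 | 790.4 | 751.4 | 786 | 799.6 | 797.9 | 775 | 869.7 | 810.2 |
| 4 | 753.8 | 749.5 | 791.2 | 787.9 | 801.7 | 800.6 | 861.1 | 782.7 | 835.6 | 800.5 |
| 5 | 755.9 | 749.8 | 750.2 | 757.4 | 805.4 | 808.5 | 784.9 | 776.2 | 869.5 | 835.8 |
| 6 | 754.6 | 750.5 | 787.6 | 786.4 | 787.5 | 745.9 | 747.3 | 747.3 | 831.7 | 812.1 |
| 7 | 760.8 | 749.5 | 785.7 | 748.7 | 809.1 | 790.7 | 831 | 807.4 | 824.3 | 781.6 |
| 8 | 753.9 | 755.9 | 783.2 | 804.5 | 788 | 758 | 784.6 | 785.4 | 851.9 | 840.5 |
| 9 | 756.2 | 747 | 749.8 | 747.5 | 791.5 | 756.8 | 759.6 | 777.8 | 869.1 | 811.9 |
| 10 | 754.3 | 792.8 | 756.8 | 802.3 | 786.8 | 749.6 | 785.9 | 793.3 | 826.1 | 839.7 |
| Avg. | 756.2 | 754.7 | 769.3 | 772.5 | 794.4 | 778.1 | 800.3 | 777.2 | 847.2 | 817.2 |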

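Pentium D 2.8 GHz (Smithfield, dual-core): elapsed time in ms, by number of threads

| Run / threads | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 930.7 | 484.1 | 466.9 | 467.6 | 494.5 | 464.8 | 468.5 | 467.9 | 465.1 | 524.4 |
| 2 | 931.8 | 467.6 | 467.1 | 467.3 | 465.7 | 465.1 | 468.3 | 467.9 | 465.1 | 522.4 |
| 3 | 931.2 | 468.1 | 467.7 | 513.8 | 525.5 | 506.6 | 469.8 | 468.6 | 465.4 | 523.7 |
| 4 | 931.5 | 468.2 | 466.1 | 506.5 | 485.2 | 465 | 468.8 | 468.1 | 465.1 | 518.4 |
| 5 | 932.3 | 467.8 | 467.3 | 475.7 | 466.4 | 465.8 | 469.4 | 468.1 | 468 | 521.9 |
| 6 | 934 | 467.3 | 466.7 | 482.9 | 497.4 | 464.1 | 469.8 | 468.8 | 465.6 | 514.1 |
| 7 | 930.2 | 467.2 | 467.2 | 479.1 | 498.9 | 464.2 | 468.8 | 468 | 465.5 | 510.3 |
| 8 | 931.6 | 467.5 | 466.8 | 480.9 | 480.8 | 507.9 | 496 | 468.5 | 465.2 | 519 |
| 9 | 931.3 | 467.4 | 466.6 | 509.1 | 494.4 | 464.6 | 468.9 | 468 | 465.8 | 518.2 |
| 10 | 931.1 | 468.8 | 466.2 | 474.7 | 553.9 | 465.9 | 469.6 | 467.8 | 464.8 | 522.2 |
| Avg. | 931.6 | 469.4 | 466.9 | 485.8 | 496.3 | 473.4 | 471.8 | 468.2 | 465.6 | 519.5 |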


As the tables show, going from 1 to 2 threads on the Hyper-Threaded Pentium 4 improved speed by only about 0.2% (756.2 ms vs. 754.7 ms), while the Pentium D achieved a 98.5% speed gain (931.6 ms vs. 469.4 ms).

Intuitively, running more threads than there are cores should slow the computation down. As the benchmarks show, however, the slowdown is not as dramatic as you might expect: going from 2 threads to 10 increased the run time by only about 8% on the Pentium 4 (754.7 ms to 817.2 ms) and about 11% on the Pentium D (469.4 ms to 519.5 ms).
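The percentages quoted in these results follow from the averaged times with simple ratios. A minimal sketch; the `Scaling` class and its method names are hypothetical.

```csharp
// Hypothetical helpers for the ratios quoted in the text; inputs are elapsed times in ms.
public static class Scaling
{
    // Speed gain of the new configuration over the old one:
    // 0.985 means the new run does the same work 98.5% faster.
    public static double SpeedGain(double oldMs, double newMs) => oldMs / newMs - 1.0;

    // Extra run time of the new configuration relative to the old one:
    // 0.083 means the new run took 8.3% longer.
    public static double Slowdown(double oldMs, double newMs) => newMs / oldMs - 1.0;
}
```

SpeedGain(931.6, 469.4) is about 0.985 (98.5%) and SpeedGain(756.2, 754.7) about 0.002 (0.2%), while Slowdown(754.7, 817.2) and Slowdown(469.4, 519.5) come to about 0.083 and 0.107 for the move from 2 to 10 threads.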
