Earlier today, I decided to see how fast my quad-core PC really is in terms of raw floating-point performance, measured in GFLOPS. The standard benchmark for scientific computing performance is LINPACK. Using Intel’s implementation, I obtained the following results (Q9450 overclocked to 3.2GHz):

CPU frequency:    3.200 GHz
Number of CPUs: 4
Number of threads: 4
Parameters are set to:

Number of tests :  1
Number of equations to solve (problem size) : 10000
Leading dimension of array : 10000
Number of trials to run  : 10
Data alignment value (in Kbytes) : 1024

Maximum memory requested that can be used = 801248576, at the size = 10000

============= Timing linear equation system solver =================

Size   LDA    Align. Time(s)    GFlops   Residual      Residual(norm)
10000  10000  1024    16.427     40.5946  1.012665e-10 3.570760e-02
10000  10000  1024    16.398     40.6676  1.012665e-10 3.570760e-02
10000  10000  1024    16.395     40.6740  1.012665e-10 3.570760e-02
10000  10000  1024    16.473     40.4833  1.012665e-10 3.570760e-02
10000  10000  1024    16.391     40.6852  1.012665e-10 3.570760e-02
10000  10000  1024    16.394     40.6785  1.012665e-10 3.570760e-02
10000  10000  1024    16.427     40.5970  1.012665e-10 3.570760e-02
10000  10000  1024    16.397     40.6712  1.012665e-10 3.570760e-02
10000  10000  1024    16.394     40.6766  1.012665e-10 3.570760e-02
10000  10000  1024    16.396     40.6733  1.012665e-10 3.570760e-02

Performance Summary (GFlops)

Size   LDA    Align.  Average  Maximal
10000  10000  1024     40.6401  40.6852

End of tests
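
As a rough sanity check, the GFlops column can be reproduced from the Time(s) column. The sketch below assumes the conventional LINPACK operation count of 2/3·n³ + 2·n² for solving an n×n dense system; the function names are my own and are not part of Intel's tool. Because the printed times are rounded to milliseconds, the recomputed values should only match the reported ones to about two decimal places.

```python
# Recompute GFLOPS from the timings in the LINPACK output above.
# Assumption: the benchmark uses the conventional operation count
# 2/3*n^3 + 2*n^2 for an n x n solve (LU factorization plus back-substitution).

def linpack_flops(n: int) -> float:
    """Nominal number of floating-point operations for an n x n solve."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def gflops(n: int, seconds: float) -> float:
    """Convert a problem size and wall-clock time into GFLOPS."""
    return linpack_flops(n) / seconds / 1e9


N = 10000
# Time(s) column copied from the ten trials above.
TIMES = [16.427, 16.398, 16.395, 16.473, 16.391,
         16.394, 16.427, 16.397, 16.394, 16.396]

rates = [gflops(N, t) for t in TIMES]
average = sum(rates) / len(rates)
maximum = max(rates)

if __name__ == "__main__":
    for t, r in zip(TIMES, rates):
        print(f"{t:8.3f} s  ->  {r:8.4f} GFLOPS")
    print(f"average {average:.4f}, maximal {maximum:.4f}")
```

The recomputed average and maximum should land very close to the 40.6401 and 40.6852 reported in the performance summary.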

It is amazing to see that a desktop PC nowadays can achieve 40+ GFLOPS! To put this number in perspective, take a look at the TOP500 supercomputer ranking back in 2005. At that time, this was the performance of a supercomputer. (Since the benchmark program used here is different and the test conditions are not necessarily the same, a direct numerical comparison might not be meaningful. Nevertheless, the general trend still holds.)

It is also worth noting that the Q9450 is an extremely overclockable CPU. To reach a 3.2GHz core frequency, I only needed to raise the Front Side Bus (FSB) frequency from the default 333MHz to 400MHz (with vcore set manually to 1.2V). I was able to reach a maximum of 3.4GHz without any stability issues. In fact, the only thing keeping me from a higher clock rate is my DDR2 800 RAM (4x2GB, G.SKILL F2-6400CL5D-4GBPQ). With faster RAM, the Q9450 can easily achieve 50 GFLOPS in the LINPACK test.
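
The clock arithmetic behind these numbers can be sketched in a few lines. Two things here are assumptions of mine rather than figures from the test: that the Q9450 runs a fixed 8x multiplier (so core clock = FSB × 8), and that each core can retire at most 4 double-precision FLOPs per cycle (one 2-wide SSE add plus one 2-wide SSE multiply). Under those assumptions, the sketch estimates the theoretical peak, the LINPACK efficiency of the run above, and the FSB that 50 GFLOPS would call for if that efficiency stayed the same.

```python
# Back-of-the-envelope clock and peak-throughput arithmetic for the Q9450.
# Assumptions (not taken from the benchmark output):
#   - fixed CPU multiplier of 8x
#   - 4 double-precision FLOPs per cycle per core (SSE add + SSE multiply)
MULTIPLIER = 8
CORES = 4
FLOPS_PER_CYCLE = 4

MEASURED_GFLOPS = 40.6401  # average from the performance summary


def core_ghz(fsb_mhz: float) -> float:
    """Core frequency in GHz for a given FSB frequency in MHz."""
    return fsb_mhz * MULTIPLIER / 1000.0


def peak_gflops(ghz: float) -> float:
    """Theoretical double-precision peak across all cores."""
    return ghz * CORES * FLOPS_PER_CYCLE


def fsb_needed(target_gflops: float, efficiency: float) -> float:
    """FSB (MHz) needed to hit a target, assuming constant efficiency."""
    ghz = target_gflops / (efficiency * CORES * FLOPS_PER_CYCLE)
    return ghz * 1000.0 / MULTIPLIER


# Fraction of the theoretical peak reached in the 3.2GHz run.
efficiency = MEASURED_GFLOPS / peak_gflops(core_ghz(400))

if __name__ == "__main__":
    for fsb in (333, 400, 425):
        ghz = core_ghz(fsb)
        print(f"FSB {fsb} MHz -> {ghz:.2f} GHz, peak {peak_gflops(ghz):.1f} GFLOPS")
    print(f"LINPACK efficiency at 3.2 GHz: {efficiency:.1%}")
    print(f"FSB for 50 GFLOPS at that efficiency: {fsb_needed(50, efficiency):.0f} MHz")
```

If those assumptions hold, the 3.2GHz run reaches a bit under 80% of the theoretical peak, and 50 GFLOPS would correspond to an FSB somewhere around 490MHz. Real scaling also depends on memory bandwidth, so treat this as an estimate only.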
