A Simple Beowulf Cluster

As I mentioned previously, one of the reasons I built another quad-core PC was that I wanted to experiment with high performance computing. A two-node mini cluster may not sound like much, but the principle behind it is the same as that of much larger and more powerful clusters.

Typically a large computer cluster has a single head node and many compute nodes, interconnected via a fast dedicated network (e.g. Gigabit Ethernet or InfiniBand). The head node usually has two network adapters, which bridge the WAN (or Internet) and the cluster network.

For a two-node Beowulf cluster, however, these requirements (one head node and many compute nodes) can be greatly relaxed. In a two-node environment we do not have the luxury of a dedicated head node, so both nodes must act as compute nodes; as a result, one of the two nodes serves as both the head node and a compute node. As you will see later, the difference between the head node and a compute node in a two-node cluster is not all that significant. Furthermore, since the cluster sits behind a firewall, there is no need to bridge the head node to the outside world via a second network adapter; the DSL router is sufficient to isolate the LAN from the Internet, which eliminates the two-NIC requirement.

1. Choosing an MPI (Message Passing Interface) package
There are two main popular MPI implementations: MPICH and LAM/MPI. Some technical and performance differences exist between the two (you can find more information in this research paper; this link has some high-level information). Since MPICH2 launches its process managers over SSH and does not require a permanently installed daemon on each machine in the cluster, it is presumably easier to set up in an environment with a large number of nodes. For my particular setup, either implementation would serve the purpose; I chose MPICH2.

2. Setting up the MPI Environment
The two PCs I use for the cluster (zeta and sigma) both have a quad-core processor (Q9450) and 8GB of RAM. The hardware configurations are slightly different (one motherboard is based on the Intel X38 chipset, the other on the P43), and so are the operating systems (one runs 64-bit Ubuntu 8.04, the other 64-bit Ubuntu 8.10). None of these differences is material though.

In my setup, I used zeta (Ubuntu 8.04) as the head node since that is where my primary desktop is. My new build (sigma, Ubuntu 8.10) runs headless and serves as a compute node.

There is a lot of information on the Internet on how to set up an MPI environment, but most articles are rather dated. For an Ubuntu cluster setup, this article is pretty much up to date (it mainly targets Ubuntu 7.04, and there are some places that need to be updated; I will point those out as we move along). Some of the best references on MPI environment setup come with the MPICH2 distribution tarball:

mpich2-1.0.8/doc/installguide/install.pdf
mpich2-1.0.8/doc/userguide/user.pdf

It definitely pays to read them through before setting up the environment. The installation guide also has many useful troubleshooting tips, and they will come in handy should you run into any setup issues.

3. Setting up NFS on the Head Node
The installation guide that comes with the MPICH2 distribution points out that NFS is not strictly required for MPICH2 to function, but without it you would have to duplicate the MPICH2 installation and all the MPI applications under the same directory structure on each node. Things become much easier if all nodes can use the same NFS directory for the MPICH2 installation and the MPI applications. See Step 2 in the MPICH Cluster article.
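For completeness, here is roughly what the NFS side looks like on my two machines. This is a sketch rather than an exact transcript of my configuration: /mirror is the shared directory used later in this post, and the export options shown are just one reasonable choice.

# On zeta (head node): install the NFS server and create the shared directory
sudo apt-get install nfs-kernel-server
sudo mkdir -p /mirror
# Allow sigma to mount /mirror read-write
echo "/mirror sigma(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports

# On sigma (compute node): only the NFS client tools are needed
sudo apt-get install nfs-common
sudo mkdir -p /mirror

The actual exportfs and mount commands are shown in the test run section below.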

4. /etc/hosts, mpd.hosts and SSH Setup
As indicated in the article I mentioned earlier, we need to change /etc/hosts on both nodes so that it contains the IP address of each server.

Pay special attention to the 127.0.1.1 entry that points to the server's own hostname. For MPICH2 to work properly, this entry must be changed to the real IP address of that server. For instance, here is the /etc/hosts on zeta:

127.0.0.1    localhost
#127.0.1.1    zeta
192.168.1.67    zeta
192.168.1.68    sigma

And similarly here is the /etc/hosts file on sigma:

127.0.0.1       localhost
#127.0.1.1      sigma
192.168.1.67    zeta
192.168.1.68    sigma

You will also need to create an mpd.hosts file on each node, in the home directory of the user you wish to run MPI as. Here is my mpd.hosts (I only have two machines in the cluster):

zeta
sigma

MPICH2 requires trusted SSH between the compute nodes and the head node. Please see Step 7 in the article I mentioned earlier. Note that in Ubuntu 8.04 and 8.10 the authorized keys file is called authorized_keys2, and the keys generated using ssh-keygen are id_dsa and id_dsa.pub by default. It does not really matter which type of key you choose; either DSA or RSA is fine. This step should be repeated on each cluster node, and the id_dsa.pub generated on one machine should be added to authorized_keys2 on the other machine. In my setup, I added the id_dsa.pub from zeta to the authorized_keys2 on sigma and vice versa. This is where we make a distinction between the head node and the compute nodes: since the head node needs to communicate with all the compute nodes, it must have every compute node's public key, whereas each compute node only needs the head node's public key. As you can see, in a two-node scenario either node can be used as the head node.
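Here is a rough sketch of what this looks like when run on zeta; repeat the mirror image on sigma. It assumes the account that will run MPI exists under the same name on both machines.

# On zeta, as the user that will run MPI: generate a DSA key pair
# (accept the default file name and use an empty passphrase)
ssh-keygen -t dsa

# Append zeta's public key to sigma's authorized_keys2
cat ~/.ssh/id_dsa.pub | ssh sigma 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys2'

# Verify that password-less login now works
ssh sigma hostname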

5. Building and Installing MPICH2
The MPICH Cluster article in the Ubuntu documentation covers this in great detail, so I am not going to repeat it here. One thing you need to pay special attention to is the .mpd.conf file (located in the user's home directory), which I will come back to right after a quick sketch of the build itself.
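For reference, the from-source build boils down to something like the following. The /mirror/mpich2 prefix is just how I lay things out on the shared directory; adjust it to your own setup.

# Unpack and build MPICH2, installing it into the NFS-shared directory so
# that both nodes see exactly the same installation
tar xzf mpich2-1.0.8.tar.gz
cd mpich2-1.0.8
./configure --prefix=/mirror/mpich2
make
make install    # add sudo if the MPI user cannot write to /mirror

# Make the MPICH2 binaries visible to the MPI user (do this on both nodes)
echo 'export PATH=/mirror/mpich2/bin:$PATH' >> ~/.bashrc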

The .mpd.conf file is used to secure communications among the cluster nodes: every machine in the cluster must contain the same secret word. If the secret words do not match, you will receive a handle_mpd_output error (e.g. failed to handshake with mpd on <node name>; recvd output={}).
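Creating the file is straightforward; something like the following on every node should do, with your own secret word substituted in (the value below is only a placeholder):

# ~/.mpd.conf must contain the same secret word on every node, and mpd
# refuses to start unless the file is readable by its owner only
echo "secretword=some-secret-word" > ~/.mpd.conf
chmod 600 ~/.mpd.conf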

6. Test Run
On zeta (the head node), export the NFS share:

sudo exportfs -a

On sigma (the compute node), mount the directory exported from zeta at the same path, so that the directory structures on both machines are identical:

sudo mount zeta:/mirror /mirror

On zeta (the head node), start up mpd on both nodes according to mpd.hosts:

mpdboot -n 2

At this point, running mpdtrace on either machine should list both nodes.

Running the interactive version of cpi on one core with 100,000,000 intervals

mpiexec -n 1 ./icpi

took 1.93 seconds to complete.

Running with all 8 cores and the same parameter:

mpiexec -n 8 ./icpi

took only 0.31 seconds, which is more than 6 times faster than the single-core version.
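If you want to run your own code rather than the bundled examples, compile it with the mpicc wrapper into the shared /mirror directory so that both nodes see the same binary. Here is a minimal sketch; the hello.c below is just a throwaway test program, not something from the MPICH2 distribution.

cd /mirror
# A minimal MPI program: each process reports its rank
cat > hello.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
mpicc -o hello hello.c
mpiexec -n 8 ./hello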

The following figure recaps the structure of my cluster setup:

