CUDA 5 Multi-GPU Cluster via Amazon EC2 and StarCluster

As promised, here is a tutorial on configuring and running a CUDA 5 multi-GPU cluster of up to, say, 20 nodes on Amazon's AWS cloud infrastructure. The trick to avoiding the full on-demand cost of $2.10 × 20 = $42/hour is to use Spot Instances together with the awesome StarCluster Python package, which takes the pain out of creating clusters on AWS. For the purpose of this post we will stick to just 2 nodes, and I will point out where you can easily add more nodes, all the way up to 20. So let's get started!

Prerequisites

The first thing we need to do is install StarCluster and configure our Amazon AWS credentials and keys. On my 64-bit Mac OS X machine, I had to install pycrypto first with the following commands (you may need to sudo):

➜ export ARCHFLAGS='-arch x86_64'
➜ easy_install pycrypto
...
➜ easy_install starcluster
...

Once StarCluster is installed, we run the help command, which notices the missing config file and offers to create one; press 2 to write the template:

➜ starcluster help
StarCluster - (http://web.mit.edu/starcluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

!!! ERROR - config file /Users/kashif/.starcluster/config does not exist

Options:
--------
[1] Show the StarCluster config template
[2] Write config template to /Users/kashif/.starcluster/config
[q] Quit

Please enter your selection: 2

>>> Config template written to /Users/kashif/.starcluster/config
>>> Please customize the config template

Next we need to look up our AWS security credentials and fill in the [aws info] section of the .starcluster/config file:

➜ cat ~/.starcluster/config
...
#############################################
## AWS Credentials and Connection Settings ##
#############################################
[aws info]
# This is the AWS credentials section (required).
# These settings apply to all clusters
# replace these with your AWS keys
AWS_ACCESS_KEY_ID = blahblah
AWS_SECRET_ACCESS_KEY = blahblahblahblah
# replace this with your account number
AWS_USER_ID= blahblah
...

Now would be a good time to create a key via:

➜ starcluster createkey cuda -o ~/.ssh/cuda.rsa
...
>>> keypair written to /Users/kashif/.ssh/cuda.rsa

and add its location to the .starcluster/config under the [key cuda] section.
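That section simply points StarCluster at the key file we just wrote; a minimal version, assuming the path from the createkey command above, looks like:

[key cuda]
KEY_LOCATION = ~/.ssh/cuda.rsa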

It's also a good idea to create a ~/.aws-credentials-master file and fill it in with the same information, so that we can use the Amazon command-line tools as well:

➜ cat ~/.aws-credentials-master
# Enter the AWS Keys without the < or >
# You can either use the AWS Accounts access keys and they can be found at
# http://aws.amazon.com under Account->Security Credentials
# or you can use the access keys of a user created with IAM
AWSAccessKeyId=blahblah
AWSSecretKey=blahblahblah

Now commands like:

➜ starcluster spothistory cg1.4xlarge
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Current price: $0.35
>>> Max price: $2.10
>>> Average price: $0.46

should work as above.

Basic Idea

What we are going to do is launch an official StarCluster HVM AMI, update it, and create an EBS-backed AMI from it. Then we will use this new AMI to run the cluster. The updated AMI will have the latest CUDA 5 as well as other goodies.

Customizing an Image Host

We first launch a new single-node cluster called imagehost as a spot instance, based on an existing StarCluster AMI running on a GPU-enabled instance type. We need to choose an AMI (machine image) which supports HVM so that we have access to the GPUs. We can list all the StarCluster AMIs via:

➜ starcluster listpublic
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Listing all public StarCluster images...
32bit Images:
-------------
[0] ami-899d49e0 us-east-1 starcluster-base-ubuntu-11.10-x86 (EBS)
[1] ami-8cf913e5 us-east-1 starcluster-base-ubuntu-10.04-x86-rc3
[2] ami-d1c42db8 us-east-1 starcluster-base-ubuntu-9.10-x86-rc8
[3] ami-8f9e71e6 us-east-1 starcluster-base-ubuntu-9.04-x86

64bit Images:
--------------
[0] ami-4583572c us-east-1 starcluster-base-ubuntu-11.10-x86_64-hvm (HVM-EBS)
[1] ami-999d49f0 us-east-1 starcluster-base-ubuntu-11.10-x86_64 (EBS)
[2] ami-0af31963 us-east-1 starcluster-base-ubuntu-10.04-x86_64-rc1
[3] ami-2faa7346 us-east-1 starcluster-base-ubuntu-10.04-x86_64-qiime-1.4.0 (EBS)
[4] ami-8852a0e1 us-east-1 starcluster-base-ubuntu-10.04-x86_64-hadoop
[5] ami-a5c42dcc us-east-1 starcluster-base-ubuntu-9.10-x86_64-rc4
[6] ami-a19e71c8 us-east-1 starcluster-base-ubuntu-9.04-x86_64
[7] ami-06a75a6f us-east-1 starcluster-base-centos-5.4-x86_64-ebs-hvm-gpu-hadoop-rc2 (HVM-EBS)
[8] ami-12b6477b us-east-1 starcluster-base-centos-5.4-x86_64-ebs-hvm-gpu-rc2 (HVM-EBS)

total images: 13

and it's [0] ami-4583572c that we need to start as a single-node cluster, with a bid price at least matching the current spot price:

➜ starcluster start -o -s 1 -b 0.35 -i cg1.4xlarge -n ami-4583572c imagehost
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Using default cluster template: smallcluster
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 1-node cluster...
>>> Launching master node (ami: ami-4583572c, type: cg1.4xlarge)...
>>> Creating security group @sc-imagehost...
>>> Creating placement group @sc-imagehost...
SpotInstanceRequest:sir-98fb8411
>>> Starting cluster took 0.042 mins

We can now check to see if our instance is available:

➜ starcluster listclusters --show-ssh-status imagehost
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

-----------------------------------------
imagehost (security group: @sc-imagehost)
-----------------------------------------
Launch time: N/A
Uptime: N/A
Zone: N/A
Keypair: N/A
EBS volumes: N/A
Spot requests: 1 open
Cluster nodes: N/A
....
➜ starcluster listclusters --show-ssh-status imagehost
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

-----------------------------------------
imagehost (security group: @sc-imagehost)
-----------------------------------------
Launch time: 2012-09-04 12:48:54
Uptime: 0 days, 00:02:38
Zone: us-east-1a
Keypair: cuda
EBS volumes: N/A
Spot requests: 1 active
Cluster nodes:
master running i-5654f92c ec2-50-19-21-200.compute-1.amazonaws.com (spot sir-98fb8411) (SSH: Up)
Total nodes: 1

And once it's up we can ssh into it:

➜ starcluster sshmaster imagehost
...
root@ip-10-16-20-37:~#

Install CUDA 5

We can now update the system:

$ apt-get update
...
$ apt-get upgrade
...
$ apt-get dist-upgrade

and reboot. Once back in, we are ready to install CUDA 5. First we remove the installed NVIDIA drivers and related packages:

$ sudo apt-get purge nvidia*
...

Next we adjust the linux-restricted-modules-common file so that it has:

$ cat /etc/default/linux-restricted-modules-common
DISABLED_MODULES="nv nvidia_new"

Next we remove the older CUDA version:

$ sudo rm -rf /usr/local/cuda

After that we install some dependencies of CUDA 5:

$ sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev

And finally we download the latest CUDA 5 and install it:

$ wget http://developer.download.nvidia.com/compute/cuda/5_0/rc/installers/cuda_5.0.24_linux_64_ubuntu11.10.run
....
$ chmod +x cuda_5.0.24_linux_64_ubuntu11.10.run
$ sudo ./cuda_5.0.24_linux_64_ubuntu11.10.run
Logging to /tmp/cuda_install_7078.log
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 304.33? (yes/no/quit): yes
Install the CUDA 5.0 Toolkit? (yes/no/quit): yes
Enter Toolkit Location [ default is /usr/local/cuda-5.0 ]
Install the CUDA 5.0 Samples? (yes/no/quit): yes
Enter CUDA Samples Location [ default is /usr/local/cuda-5.0/samples ]
Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-5.0 ...
...

Once installed we can check if CUDA is working by going to:

$ cd /usr/local/cuda/samples/C/1_Utilities/deviceQuery
$ make
g++ -m64 -I/usr/local/cuda-5.0/include -I. -I.. -I../../common/inc -I../../../shared/inc -o deviceQuery.o -c deviceQuery.cpp
g++ -m64 -o deviceQuery deviceQuery.o -L/usr/local/cuda-5.0/lib64 -lcuda -lcudart
mkdir -p ../../bin/linux/release
cp deviceQuery ../../bin/linux/release
$ ./../../bin/linux/release/deviceQuery
[deviceQuery] starting...

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Found 2 CUDA Capable device(s)

Device 0: "Tesla M2050"
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 2687 MBytes (2817982464 bytes)
(14) Multiprocessors x ( 32) CUDA Cores/MP: 448 CUDA Cores
GPU Clock rate: 1147 MHz (1.15 GHz)
Memory Clock rate: 1546 Mhz
Memory Bus Width: 384-bit
L2 Cache Size: 786432 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: Yes
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 0 / 3
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "Tesla M2050"
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 2687 MBytes (2817982464 bytes)
(14) Multiprocessors x ( 32) CUDA Cores/MP: 448 CUDA Cores
GPU Clock rate: 1147 MHz (1.15 GHz)
Memory Clock rate: 1546 Mhz
Memory Bus Width: 384-bit
L2 Cache Size: 786432 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: Yes
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 0 / 4
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 2, Device = Tesla M2050, Device = Tesla M2050
[deviceQuery] test results...
PASSED

> exiting in 3 seconds: 3...2...1...done!

The last thing we need to do is to ensure that the device files /dev/nvidia* exist and have the correct file permissions. This can be done by creating a startup script e.g.:

$ cat /etc/init.d/nvidia
#!/bin/bash
PATH=/sbin:/bin:/usr/bin:$PATH

/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then
# Count the number of NVIDIA controllers found.
NVDEVS=`lspci | grep -i NVIDIA`
N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`
N=`expr $N3D + $NVGA - 1`
for i in `seq 0 $N`; do
mknod -m 666 /dev/nvidia$i c 195 $i
done
mknod -m 666 /dev/nvidiactl c 195 255
else
exit 1
fi

$ sudo chmod +x /etc/init.d/nvidia
$ sudo update-rc.d nvidia defaults

And all should be working! We can now clean up by removing the downloaded files and log out.
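For reference, a minimal cleanup might look something like this (assuming the CUDA installer was downloaded into root's home directory as above):

$ rm cuda_5.0.24_linux_64_ubuntu11.10.run   # remove the CUDA installer
$ apt-get clean                             # drop cached packages from the upgrade
$ exit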

Creating an EBS-Backed AMI

We can now create an AMI called starcluster-cuda5-ami of our updated CUDA 5 instance (ID i-5654f92c) by using:

➜ starcluster ebsimage i-5654f92c starcluster-cuda5-ami
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Removing private data...
>>> Creating EBS image...
>>> Waiting for AMI ami-9f6ed8f6 to become available...
>>> create_image took 12.236 mins
>>> Your new AMI id is: ami-9f6ed8f6

And we now have an AMI which we can use for our Cluster.

Cluster Template

Now we can set up the cluster template in the StarCluster config file, using the AMI we just created (ami-9f6ed8f6) as the node image for a small cluster template:

...
[cluster smallcluster]
KEYNAME = cuda
CLUSTER_SIZE = 2
CLUSTER_USER = sgeadmin
CLUSTER_SHELL = bash
NODE_IMAGE_ID = ami-9f6ed8f6
NODE_INSTANCE_TYPE = cg1.4xlarge
SPOT_BID = x.xx

It's important to set SPOT_BID = x.xx (with your own bid), or else the full on-demand price will be charged, which is not what we want :-) Also, to run a bigger cluster just replace CLUSTER_SIZE = 2 with the number of nodes you need.

Finally in the [global] section of the config file we need to tell StarCluster to use this template:

[global]
DEFAULT_TEMPLATE=smallcluster

Start the Cluster

OK, so let's fire up the cluster with the command:

➜ starcluster start smallcluster
StarCluster - (http://web.mit.edu/starcluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Using default cluster template: smallcluster
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 2-node cluster...
>>> Launching master node (ami: ami-9f6ed8f6, type: cg1.4xlarge)...
>>> Creating security group @sc-smallcluster...
Reservation:r-d03ff8b5
>>> Launching node001 (ami: ami-9f6ed8f6, type: cg1.4xlarge)
SpotInstanceRequest:sir-cdc0bc12
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for open spot requests to become active...
...
>>> Configuring cluster took 1.978 mins
>>> Starting cluster took 15.578 mins

And we can ssh into it via:

➜ starcluster sshmaster -u ubuntu smallcluster
...

Using the Cluster

CUDA 5 comes with a number of exciting new features, especially for multi-GPU programming (GPUDirect™), and in the next blog post I will show some of them, so stay tuned.
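As a small teaser, here is a minimal sketch of checking and enabling peer-to-peer access between the two Tesla M2050s on a node; the file name p2p_check.cu and the nvcc flags are my own choices, and the GPUs themselves will report whether P2P is actually available on a cg1.4xlarge:

// p2p_check.cu - check and enable peer-to-peer access between the GPUs on a node.
// Build with the toolkit installed above, e.g.: nvcc -arch=sm_20 p2p_check.cu -o p2p_check
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    printf("Found %d CUDA device(s)\n", n);

    // Check every ordered pair of devices for peer access.
    for (int a = 0; a < n; ++a) {
        for (int b = 0; b < n; ++b) {
            if (a == b) continue;
            int ok = 0;
            cudaDeviceCanAccessPeer(&ok, a, b);
            printf("GPU %d -> GPU %d peer access: %s\n", a, b, ok ? "yes" : "no");
            if (ok) {
                // Enable P2P from device a's context so cudaMemcpyPeer and
                // kernels running on device a can touch device b's memory directly.
                cudaSetDevice(a);
                cudaDeviceEnablePeerAccess(b, 0);
            }
        }
    }
    return 0;
}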

By the way, after you are finished, don't forget to terminate the cluster via:

➜ starcluster terminate smallcluster

CUDA 4.0 MultiGPU on an Amazon EC2 instance

This post will take you through starting and configuring an Amazon EC2 instance to use the multi-GPU features of CUDA 4.0.

Motivation

CUDA 4.0 comes with some exciting new features, such as:

  • the ability to share GPUs across multiple host threads;
  • the ability to use all the GPUs in the system concurrently from a single host thread (a minimal sketch of this follows below);
  • unified virtual addressing (UVA) for faster multi-GPU programming;

and many more.

The ability to access all the GPUs in a system is particularly nice on Amazon, since the large GPU-enabled instances come with two Tesla M2050 Fermi boards, each capable of 1030 GFLOPS theoretical peak performance with 448 cores and 3 GB of memory.
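To make the single-host-thread model from the list above concrete, here is a minimal sketch that issues work to every GPU on the instance from one thread by switching the current device with cudaSetDevice; the file name, kernel and sizes are arbitrary choices of mine:

// multi_gpu_sketch.cu - use all GPUs from a single host thread (CUDA 4.0 and later).
// Build e.g. with: nvcc -arch=sm_20 multi_gpu_sketch.cu -o multi_gpu_sketch
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fill(float *x, float v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = v;
}

int main() {
    const int n = 1 << 20;
    int count = 0;
    cudaGetDeviceCount(&count);

    // One device buffer per GPU; every launch below comes from this single host thread.
    float **d = new float *[count];
    for (int g = 0; g < count; ++g) {
        cudaSetDevice(g);                                   // make GPU g the current device
        cudaMalloc(&d[g], n * sizeof(float));
        fill<<<(n + 255) / 256, 256>>>(d[g], (float)g, n);  // asynchronous launch on GPU g
    }

    // Wait for all GPUs to finish and release the buffers.
    for (int g = 0; g < count; ++g) {
        cudaSetDevice(g);
        cudaDeviceSynchronize();
        cudaFree(d[g]);
    }
    delete[] d;

    printf("Issued work to %d GPU(s) from one host thread\n", count);
    return 0;
}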

Getting started

Signing up for Amazon's AWS is easy enough with a credit card. Once you are logged in, go to the EC2 tab of your console, which should look something like this:

The EC2 console page.

Now press the Launch Instance button. In the Community AMIs tab, set the Viewing option to Amazon Images, search for gpu, select the CentOS 5.5 GPU HVM AMI, and press Continue:

Choose an AMI
Choose the CentOS 5.5 GPU HVM AMI (bottom one).

Next we need to select the Instance Type; it's important here to select the Cluster GPU type. Then press Continue:

Instance type
Select the Cluster GPU Instance Type.

Next we need to Create a New Key Pair: give it a name like amazon-gpu and press Create & Download your Key Pair to save it to your local computer as a file called amazon-gpu.pem:

Create Key Pair
Create and download Key Pair.

We press Continue to go to the Firewall setting. Here we Create a new Security Group, give it a name and description, then Create a new rule for ssh so that we can log into our instance once it's up and running, and press Continue:

Security Group
Create a new Security Group and a new ssh rule.

And finally we can review our settings and Launch it:

Review and Launch
Review and Launch instance.

Back in our EC2 console we can go to Instances and see our new instance's status. It should be booting or running, rather than stopped as in the screenshot below:

AMI Instance
AMI Instance's Status and Description.

The Description tab will also contain the Public DNS which we can use together with the Key Pair we downloaded locally to ssh into our instance:

$ chmod 400 amazon-gpu.pem
$ ssh root@ec2-50-16-170-159.compute-1.amazonaws.com -i amazon-gpu.pem

__| __|_ ) CentOS
_| ( / v5.5
___|\___|___| HVMx64 GPU

Welcome to an EC2 Public Image
Please view /root/README
:-)

[root@ip-10-16-7-119 ~]#

Updating CUDA to 4.0

Now we need to update the CUDA driver and toolkit on our instance, so the first thing we do is update the Linux kernel and then reboot the instance via the web console:

[root@ip-10-16-7-119 ~]# yum update kernel kernel-devel kernel-headers
Loaded plugins: fastestmirror
Determining fastest mirrors
* addons: mirror.cogentco.com
* base: mirror.umoss.org
* extras: mirror.symnds.com
* updates: mirror.umoss.org
addons | 951 B 00:00
base | 2.1 kB 00:00
base/primary_db | 2.2 MB 00:00
extras | 2.1 kB 00:00
extras/primary_db | 260 kB 00:00
updates | 1.9 kB 00:00
updates/primary_db | 635 kB 00:00
Setting up Update Process
Resolving Dependencies
--> Running transaction check
---> Package kernel.x86_64 0:2.6.18-238.12.1.el5 set to be installed
---> Package kernel-devel.x86_64 0:2.6.18-238.12.1.el5 set to be installed
---> Package kernel-headers.x86_64 0:2.6.18-238.12.1.el5 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
kernel x86_64 2.6.18-238.12.1.el5 updates 19 M
kernel-devel x86_64 2.6.18-238.12.1.el5 updates 5.5 M
Updating:
kernel-headers x86_64 2.6.18-238.12.1.el5 updates 1.2 M

Transaction Summary
================================================================================
Install 2 Package(s)
Upgrade 1 Package(s)

Total download size: 26 M
Is this ok [y/N]: y
Downloading Packages:
(1/3): kernel-headers-2.6.18-238.12.1.el5.x86_64.rpm | 1.2 MB 00:00
(2/3): kernel-devel-2.6.18-238.12.1.el5.x86_64.rpm | 5.5 MB 00:00
(3/3): kernel-2.6.18-238.12.1.el5.x86_64.rpm | 19 MB 00:00
--------------------------------------------------------------------------------
Total 18 MB/s | 26 MB 00:01
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : kernel-devel 1/4
Installing : kernel 2/4
Updating : kernel-headers 3/4
Cleanup : kernel-headers 4/4

Installed:
kernel.x86_64 0:2.6.18-238.12.1.el5 kernel-devel.x86_64 0:2.6.18-238.12.1.el5

Updated:
kernel-headers.x86_64 0:2.6.18-238.12.1.el5

Complete!

I leave it as an exercise to figure out how to reboot the instance from the console, but once it's back up and running, we can ssh back into it to download and install the CUDA 4.0 driver, toolkit and SDK. For example:

[root@ip-10-16-7-119 ~]# wget http://developer.download.nvidia.com/compute/cuda/4_0/toolkit/cudatoolkit_4.0.17_linux_64_rhel5.5.run
--2011-06-23 04:47:05-- http://developer.download.nvidia.com/compute/cuda/4_0/toolkit/cudatoolkit_4.0.17_linux_64_rhel5.5.run
Resolving developer.download.nvidia.com... 168.143.242.144, 168.143.242.203
Connecting to developer.download.nvidia.com|168.143.242.144|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 212338897 (203M) [application/octet-stream]
Saving to: `cudatoolkit_4.0.17_linux_64_rhel5.5.run'

100%[======================================>] 212,338,897 33.2M/s in 6.3s

2011-06-23 04:47:12 (32.0 MB/s) - `cudatoolkit_4.0.17_linux_64_rhel5.5.run' saved [212338897/212338897]

[root@ip-10-16-7-119 ~]# chmod +x cudatoolkit_4.0.17_linux_64_rhel5.5.run
[root@ip-10-16-7-119 ~]# ./cudatoolkit_4.0.17_linux_64_rhel5.5.run

will install the CUDA toolkit. Similarly, install the driver and the SDK, and finally check that everything is working by typing:

[root@ip-10-16-7-119 ~]# nvidia-smi -a -q

==============NVSMI LOG==============

Timestamp : Thu Jun 23 04:46:42 2011

Driver Version : 270.41.19

Attached GPUs : 2

GPU 0:0:3
Product Name : Tesla M2050
Display Mode : Disabled
Persistence Mode : Disabled
Driver Model
...
GPU 0:0:4
....

MultiGPU example

Once CUDA 4.0 is installed and working, we can test the simpleMultiGPU example that comes with the SDK we installed earlier. First, we will need to install the C++ compiler:

[root@ip-10-16-7-119 simpleMultiGPU]# yum install gcc-c++

and then we need to set our LD_LIBRARY_PATH to include the CUDA libraries:

[root@ip-10-16-7-119 release]# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/lib

After that, we can go to the NVIDIA_GPU_Computing_SDK/C/ folder and type make. The binaries will be installed in the NVIDIA_GPU_Computing_SDK/C/bin/linux/release/ directory and if we go there, we can run the simpleMultiGPU example:

[root@ip-10-16-7-119 release]# ./simpleMultiGPU
[simpleMultiGPU] starting...
CUDA-capable device count: 2
Generating input data...

Computing with 2 GPU's...
GPU Processing time: 24.472000 (ms)

Computing with Host CPU...

Comparing GPU and Host CPU results...
GPU sum: 16777280.000000
CPU sum: 16777294.395033
Relative difference: 8.580068E-07

[simpleMultiGPU] test results...
PASSED

Press ENTER to exit...

MultiGPU Cluster Setup

Using the above setup and this video, it is also possible to configure an 8-node cluster of GPU instances, as described here, for high-performance computing applications. I will try to do a multi-GPU and Open MPI example in another blog post, so stay tuned.