Sample Script
Batch jobs are initiated by submitting a shell script to the queue. Once the scheduler has allocated nodes to your job, the shell script will be started. The script has your AFS permissions and can access variables that provide essential information about the job environment.
my_script.sh
------------
#!/bin/bash
# Change to the submission directory
cd $PBS_O_WORKDIR
# Perform tasks
./process_data segment1.dat
This script, for example, changes to the directory the job was submitted from and runs a program named process_data. If desired, you can pass variables to your script at submission time. See the section on running batch jobs for details.
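PBS-style schedulers typically pass submission-time variables into the job's environment (for example via a flag such as qsub -v; check the batch jobs section for the exact syntax on this cluster). A minimal sketch of a script that reads such a variable, falling back to a default when none was passed — the variable name SEGMENT is illustrative, not prescribed:

```shell
# Hypothetical submission: qsub -v SEGMENT=segment2.dat my_script.sh
# Inside the script, use a default when SEGMENT was not supplied:
SEGMENT=${SEGMENT:-segment1.dat}
echo "Processing $SEGMENT"
```

The ${VAR:-default} expansion keeps the script usable both with and without the submission-time override.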
After your job is finished, two files are created in the submission directory. In the my_script.sh example, the file names would be my_script.oJOBID and my_script.eJOBID. my_script.oJOBID contains the screen output of your program, and my_script.eJOBID contains any error messages.
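The two files correspond to the script's stdout and stderr streams. A small sketch of how output is routed — the #PBS -j oe directive is an assumption that this cluster's PBS installation supports joining the streams into a single .oJOBID file:

```shell
#!/bin/bash
#PBS -j oe   # assumption: merge stderr into stdout, producing only the .oJOBID file
echo "normal output"        # ends up in the .oJOBID file
echo "error output" >&2     # ends up in .eJOBID, or in .oJOBID when the streams are joined
```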
When writing your job script, it usually helps to test it first from an interactive job, where you can quickly make changes. Make sure the script is executable (chmod +x my_script.sh). Job-specific environment variables can be examined by typing printenv | grep PBS at the command line.
Using MATLAB
You can use the cluster to run jobs containing MATLAB code. An example m-file and job script are provided below. See the sample script for more information on writing the script, and the section on batch jobs for details on submitting the script for execution.
matlab_sample.m
---------------
function [a,b] = matlab_sample(x)
noise = rand(2,1) * 1e-5;
a = cos(x) + noise(1);
b = sin(x) + noise(2);
end
The job script starts up MATLAB and issues commands directly to its command line. In the following example, MATLAB output is redirected to matlab.out. You can issue load and save commands from the m-file or the job script to manage data files.
run_matlab_sample.sh
--------------------
#!/bin/bash
# Change to the submission directory
cd $PBS_O_WORKDIR
# Start up MATLAB with desired parameters
matlab -nojvm -nodisplay > matlab.out << EOF
% Run your MATLAB commands inline
[a,b] = matlab_sample(sqrt(2)*pi)
% Plot a figure and store it to a PNG file
x = linspace(-10, 10);
y = 1./(1+exp(-x));
figure(1);
plot(x,y);
print(1,'-dpng','example_plot');
% Exit MATLAB
exit
EOF
# Display the output
cat matlab.out
Alternatively, you can place all your MATLAB code into an m-file and execute that directly. Given a MATLAB script named, for example, matlab_task.m, your job script might look like this:
run_matlab_sample1.sh
---------------------
#!/bin/bash
# Change to the submission directory
cd $PBS_O_WORKDIR
# Run the m-file
matlab -nojvm -nodisplay -r matlab_task > matlab.out
# Display the output
cat matlab.out
If you use the -r option, ensure that your m-file ends with the exit command, or your job will not finish.
If you want to pass data to your m-file and save figures, the script below can be used:
run_matlab_sample2.sh
---------------------
#!/bin/bash
# Change to the submission directory
cd $PBS_O_WORKDIR
# Data to pass to the m-file (taken here from the script's first argument)
input_data=$1
# Run the MATLAB commands; the unquoted heredoc expands $input_data before MATLAB sees it
unset DISPLAY
matlab > matlab.out 2>&1 << EOF
matlab_sample($input_data);
plot(1:10)
print file
exit
EOF
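The mechanism at work here is that an unquoted heredoc expands shell variables before the text reaches the program's standard input. A pure-shell demonstration of the same technique (cat stands in for matlab; the variable name is illustrative):

```shell
# Unquoted heredocs expand shell variables before the text is piped to the program:
input_data=42
result=$(cat << EOF
value is $input_data
EOF
)
echo "$result"   # prints: value is 42
```

Quoting the delimiter (<< 'EOF') would suppress this expansion and pass the text through literally.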
Using MPI
MPI is a message-passing API that simplifies parallel programming. The SAIL cluster supports an implementation called LAM/MPI. The LAM/MPI web site offers documentation, tutorials, and mailing lists.
The MPI interface supports a broad set of operations, from simple message passing and synchronization to collective operations that combine data from every node. This page will help you get a simple MPI app running on the cluster.
Following is a simple C program that instructs each node to print Hello, world along with its rank and the total number of nodes. You can perform any MPI operation between the calls to MPI_Init and MPI_Finalize. If an error occurs between these calls, call MPI_Abort(MPI_COMM_WORLD, -1) to terminate all of the job's processes.
When an MPI job runs, N copies of the binary are started on the nodes allocated to your job, and each process is assigned a unique rank from 0...N-1. The rank can be used to specialize the behavior of each process.
mpi_sample.c
------------
#include <stdio.h>
#include <mpi.h>

int main(int argc, const char *argv[])
{
    int rank, world_size;

    // Initialize MPI, get the rank of this process and the number of processes
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Print the number of processes (only if rank==0 so it happens once)
    if (rank == 0)
        printf("Starting MPI job with %d processes...\n", world_size);

    // Wait for all other processes to start
    MPI_Barrier(MPI_COMM_WORLD);

    // Print a message
    printf("Hello, world -- this is the node of rank %d.\n", rank);

    // Finalize
    MPI_Finalize();
    return 0;
}
To compile an MPI program, use the wrappers around gcc provided by the MPI distribution. For a C program, use mpicc. For a C++ program, use mpic++. The syntax is the same as when using gcc. For example, to compile this program, type the following command from an interactive job:

mpicc -o mpi_sample mpi_sample.c
Once you have successfully built your program, you need to define a shell script to boot up MPI, run your job, and shut down MPI at the end. While the invocations of lamboot, lamhalt, and mpirun are MPI-specific, everything described on the sample script page still applies.
run_mpi_sample.sh
-----------------
#!/bin/bash
# Count the number of CPUs allocated to the job
export CPU_COUNT=`wc -l $PBS_NODEFILE | awk '{print \$1}'`
# Start up the MPI environment on the nodes allocated to the job
lamboot -v $PBS_NODEFILE
echo
# Change to the submission directory
cd $PBS_O_WORKDIR
# Run the MPI program (pass arguments as usual after the process name)
mpirun -np $CPU_COUNT mpi_sample arg1 arg2
# Shut down the MPI environment when finished
echo
lamhalt -v
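The CPU-counting line works because the scheduler's node file lists one line per allocated CPU. You can try the same pipeline outside a job by pointing it at any file; here /tmp/nodes.example is a hypothetical stand-in for $PBS_NODEFILE:

```shell
# Stand-in node file: one line per allocated CPU (path is illustrative)
printf 'node01\nnode02\nnode03\nnode04\n' > /tmp/nodes.example
# Same counting technique as run_mpi_sample.sh: wc -l prints "count filename",
# and awk keeps only the first field
CPU_COUNT=$(wc -l /tmp/nodes.example | awk '{print $1}')
echo "$CPU_COUNT"   # prints 4
```

Inside run_mpi_sample.sh the dollar sign is written as \$1 because the command sits inside backquotes, where an unescaped $ could be expanded by the outer shell first.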
Using an interactive job is a good way to make sure your script runs correctly. When you are ready to go, see batch jobs for details on job submission.