Working with Interactive Servers
The RCE consists of multiple layers of servers:
- Login servers - For all desktop services, including launching jobs on the batch and interactive servers
- Batch servers - For large-scale parallel processing
- Interactive servers - For memory-intensive nonparallel processing
RCE Powered Applications run your memory-intensive research jobs on the interactive servers, which isolates the login servers from the load these processes incur.
For information about how to use these resources, see Working with RCE Powered Statistical Applications.
Cluster Computing - Overview
We maintain several research computing clusters. These powerful computing resources relieve you of the burden of purchasing, provisioning, configuring, and maintaining your own servers. By making these clusters available, we enable you to concentrate on your own specialties, secure in the knowledge that technical resources are available when needed. Our systems support concurrent use by many projects, which reduces competition among users for computing resources.
Our computing clusters consist of two main pools of resources:
- Batch servers - Intended for long-running processes that are CPU intensive and able to run in parallel. You use the batch processing utilities to submit your jobs to the batch servers.
- Interactive servers - Intended for large processes that are memory intensive. You use the RCE tools to submit your jobs to the interactive servers.
Our computing clusters use parallel processing to enable faster execution of computation-intensive tasks. Many computing tasks can benefit from implementation in a parallel processing form. A thorough explanation of parallel processing and how to use it is available in Introduction to Parallel Computing, from Lawrence Livermore National Laboratory.
We designed our computing clusters around open standards for reliability, scalability, extensibility, and interoperability. We use hardware from major vendors and a standard, enterprise-grade Linux distribution customized to address the specific needs of our users. Our infrastructure is designed to provide the greatest possible range of options to you, rather than obliging you to restrict yourself to a narrow range of tools and methodologies. We provide a stable platform on which a wide range of technologies can be deployed.
Access to our computing clusters is available to all RCE users. To learn how to get an RCE account, see Research Computing Environment.
Getting Started with Batch Processing Summer 2009
Guide Overview
This guide helps you set up and execute batch processing. It walks you through an example batch submission and offers details about each step in the Batch Submission Workflow.
The default environment for batch processing supports the R language, but you can submit any executable file or script compatible with the Linux environment.
Note: All batch processing is performed in the RCE. You must have an RCE account to use the batch servers. See Research Computing Environment for more information.
We provide the Automated Condor Submission script, condor_submit_util, to automate submitting jobs to the batch servers' Condor system and executing them. You can also submit your jobs manually.
Use this guide to perform the following:
- Review the batch processing workflow
- Set up to use the batch servers
- Determine your batch parameters
- Submit a job to the batch servers
- Receive email notifications of job status
- Check job status manually
Script Options Reference
The Automated Condor Submission script makes the task of running jobs using the batch servers easier and more intuitive. The process it automates is described in Getting Started with Batch Processing Summer 2009. This script negotiates all job scheduling; it constructs the appropriate submit file for your job and calls condor_submit. To use this utility, you need a program to run. The format for using this script is:
condor_submit_util [OPTIONS]
In addition, the script can notify you by email when your job is done, so you do not have to check the queue constantly using condor_q. In future releases, the script will also be able to keep usage data so administrators can track overall performance.
The script can be run in two ways: interactively or from the command line. When running interactively, the script prompts you for the values required to run the batch job. If you supply arguments on the command line, the script uses them and falls back to default values for any values you do not supply.
Options
-h, --help
    Print the help page and exit.
-V, --version
    Print version information and exit.
-v, --verbose
    Show information about what goes on during script execution.
-I, --Interactive
    Enter interactive mode, in which the script prompts you for the required values.
-s, --submitfile FILE
    Specify the name of the created submit file (default is <user-name-datetime>.submit).
-k, --keep
    Do not delete the created submit file.
-N, --Notify
    Receive notification by email when jobs are complete.
-x, --executable FILE
    The executable for Condor to run (default is /usr/bin/R).
-a, --arguments ARGS
    Any arguments you want to pass to the executable (should be quoted; default is "--no-save --vanilla").
-i, --input [FILE|PATT]
    Either an explicit file name or the base name of input files to the executable (default is in).
-o, --output [PATT]
    Base name of output files for the executable (default is out).
-e, --error [PATT]
    Base name of error files for the executable (default is error).
-l, --log [PATT]
    Base name of log files for the executable (default is log).
-n, --iterations NUM
    Number of iterations to submit (default is 10).
-f, --force
    Overwrite any existing files.
--noinput
    Use no input file for the executable.
--noargs
    Send no arguments to the executable.
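This guide does not show the submit file that the script writes. As a rough illustration of how the default option values above could map onto HTCondor submit commands, the following Python sketch assembles a submit description as text. The function name build_submit_description, the choice of the vanilla universe, and the exact layout are assumptions for illustration; the file that condor_submit_util actually produces may differ.

```python
def build_submit_description(
    executable="/usr/bin/R",
    arguments="--no-save --vanilla",
    input_base="in",
    output_base="out",
    error_base="error",
    log_base="log",
    iterations=10,
    notify=False,
):
    """Return an HTCondor submit description as a string.

    Hypothetical sketch: defaults mirror the documented option defaults,
    but this is not the code used by condor_submit_util.
    """
    lines = [
        "universe = vanilla",  # assumption: the universe is not documented here
        f"executable = {executable}",
        f"arguments = {arguments}",
        # Condor replaces $(Process) with 0, 1, ..., iterations - 1 for each job
        f"input = {input_base}.$(Process)",
        f"output = {output_base}.$(Process)",
        f"error = {error_base}.$(Process)",
        f"log = {log_base}.$(Process)",
    ]
    if notify:
        # Ask Condor to send email when each job completes (like -N)
        lines.append("notification = Complete")
    # Queue one job per iteration (like -n)
    lines.append(f"queue {iterations}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_submit_description(input_base="data", iterations=5, notify=True), end="")
```

Condor expands $(Process) separately for each queued job, which is how a single submit description can yield the numbered file names described in the notes that follow the examples.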
Examples
- You have a compiled executable (named foo) that takes a data set and does some analysis. You have five different data sets to run against (named data.0, data.1 ... data.4). You want to save the submit file and be notified when the job is done. Because the default is 10 iterations, specify -n 5 so that exactly five jobs are submitted.
condor_submit_util -x foo -i "data" -n 5 -k -N
- You have an R program that has some random output. You want to run it 10 times to see the results.
condor_submit_util -i random.R -n 10
- You have an R program that will take a long time to complete. You only need to run it once, but you want to be notified when it is done.
condor_submit_util -i long.R -n 1 -N
Notes: The -o, -e, and -l options take base names for the implied files. The script creates the actual file names by appending a numeric extension equal to each job's Condor process number (0-indexed). For example, if you execute condor_submit_util -o "out" -n 3, the script creates three output files named out.0, out.1, and out.2.
Also, for -i, the script first checks whether the supplied name is an actual file on disk. If it is not, the script uses the argument as a base name, as it does for -o, -e, and -l.
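The naming rules in these notes can be sketched in Python as follows. This illustrates the documented behavior and is not the script's own code; the function names numbered_names and resolve_input are hypothetical, and the sketch assumes that an existing input file is reused by every job, as in the random.R example.

```python
import os


def numbered_names(base, iterations):
    """Per-job file names: base.0, base.1, ..., base.(iterations - 1)."""
    return [f"{base}.{i}" for i in range(iterations)]


def resolve_input(name, iterations):
    """Apply the documented -i rule (hypothetical helper).

    If `name` is an existing file, every job reads that same file;
    otherwise `name` is treated as a base name, like -o, -e, and -l.
    """
    if os.path.isfile(name):
        return [name] * iterations
    return numbered_names(name, iterations)
```

For example, numbered_names("out", 3) returns ["out.0", "out.1", "out.2"], matching the -o example in the notes.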
Which R packages are available for cluster jobs?
You can use the following packages from any cluster node without using install.packages:
- CRAN (http://cran.r-project.org/src/contrib) - Updated weekly on Sundays.
- Gary King's R Packages (http://gking.harvard.edu/src/contrib) - Updated daily.
How do I use files in my home directory from my batch job?
When you submit a batch job, the R script is copied to a staging area and then executed by a cluster node. Because the script does not run from your home directory, you must set paths explicitly in your R scripts.
To set paths, add the following code to the beginning of all your R scripts. This tells R to find the absolute path to your home directory, then set the working directory to that path:
setwd(path.expand("~<username>"))
Use this code to address such problems as the following:
Loading required package: MASS
Error in file(file, "r") : unable to open connection
In addition: Warning message:
cannot open file '<filename>', reason 'No such file or directory'
Execution halted
Note: If you use a subdirectory, include the path to the subdirectory in the setwd command referenced previously.
How can I make my batch submission track iteration number?
To track the iteration number for batch submissions, do both of the following:
- Add --args '$(Process)' to the Arguments line of your Condor submit file. This passes the Condor process number of each R run to R; the number progresses from 0 to one less than the number of runs.
- Capture the argument in a variable in your R code by entering the following line:
run <- as.integer(commandArgs(trailingOnly = TRUE)[1])
The R object run then contains the run number as an integer. You can use this object to construct appropriate output file names for your job.
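Because you can submit any executable or script compatible with the Linux environment, the same iteration-tracking idea also applies to non-R jobs. The sketch below shows it for a Python script. It assumes a submit file whose executable is a Python interpreter and whose Arguments line passes $(Process) as the first script argument (for example, arguments = myscript.py $(Process)); the script name myscript.py and the output pattern result.<n>.txt are hypothetical.

```python
import sys


def run_number(argv):
    """Parse the Condor process number passed as the first script argument."""
    if len(argv) < 2:
        raise ValueError("expected the process number as the first argument")
    return int(argv[1])


def output_name(run, prefix="result", ext="txt"):
    """Build a per-run output file name such as result.3.txt (hypothetical pattern)."""
    return f"{prefix}.{run}.{ext}"


def main(argv):
    run = run_number(argv)
    # Each run writes its own file, so parallel jobs never overwrite each other
    with open(output_name(run), "w") as fh:
        fh.write(f"output of run {run}\n")


# Only do work when an argument is supplied, as it will be under Condor
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Keeping the parsing and naming in small functions makes it easy to check the file-name logic locally before submitting many iterations.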