- Getting Started with Batch Processing Summer 2009
-
Getting Started with Batch Processing Summer 2009
Guide Overview
This guide helps you to set up and execute batch processing. It will walk you through an example batch submission as well as offer details to each step in the Batch Submission Workflow.
The default environment for batch processing supports the R language, but you can submit any executable file or script compatible with the Linux environment.
Note: All batch processing is performed in the RCE. You must have an RCE account to use the batch servers. See Research Computing Environment for more information.
We provide the Automated Condor Submission script, condor_submit_util, to automate the process of submitting and executing jobs to the batch servers' Condor system. You also can submit your jobs manually.
Use this guide to perform the following:
- Review the batch processing workflow
- Set up to use the batch servers
- Determine your batch parameters
- Submit a job to the batch servers
- Receive email notifications of job status
- Check job status manually
- Batch Processing Basics
-
Batch Processing Basics
This section describes the batch processing environment in our facilities.
What is Batch Processing?
Batch processing is a procedure by which you submit a program for delayed execution. Batch processing enables you to perform multiple commands and functions without waiting for results from one command to begin another, and to execute these processes without your attendance. The terms process and job are interchangeable.
The batch processing system at HMDC runs on a high throughput cluster on which you can perform extensive, time-consuming calculations without the technical limitations imposed by a typical workstation.
Why Use Batch Processing?
HMDC provides a large, powerful pool of computers that are available for you to use to conduct research. This pool is extremely useful for the following applications:
-
Jobs that run for a long time - You can submit a batch processing job that executes for days or weeks and does not tie up your RCE session during that time.
-
Jobs that are too big to run on your desktop - You can submit batch processing that requires more infrastructure than your workstation provides. For example, you could use a dataset that is larger in size than the memory on your workstation.
-
Groups of dozens or hundreds of jobs that are similar - You can submit batch processing that entails multiple uses of the same program with different parameters or input data. Examples of these types of submission are simulations, sensitivity analysis, or parameterization studies.
Condor System for Batch Processing
The Condor system enables you to submit a program for execution as batch processing, which then does not require your attention until processing is complete. The Condor project website is located at the following URL:
http://www.cs.wisc.edu/condor/
To view the user manual for this software, go to the following URL and choose a viewing option:
http://www.cs.wisc.edu/condor/manual/
Condor System Components and Terminology
A Condor system comprises a central manager and a pool. A Condor central manager machine manages the execution of all jobs that you submit as batch processing. An associated pool of Condor machines associated with that central manager execute individual processes based on policies defined for each pool member. If a computing installation has multiple Condor pools or additional machine clusters dedicated to Condor system use, these pools and clusters can be associated as a flock.
Listed below are some common Condor terms and references, which are unique to Condor:
-
Cluster - A group of jobs or processes submitted together to Condor for batch processing is known as a cluster. Each job has a unique job identifier in a cluster, but shares a common cluster identifier.
-
Pool - A Condor pool comprises a single machine serving as a central manager, and an arbitrary number of machines that have joined the pool. Simply put, the pool is a collection of resources (machines) and resource requests (jobs).
-
Jobs - In a Condor system, jobs are unique processes submitted to a pool for execution and are tracked with a unique process ID number.
-
Flock - A Condor flock is a collection of Condor pools and clusters associated for managing jobs and clusters with varying priorities. A Condor flock functions in the same manner as a pool, but provides greater processing power.
When you submit batch processing to the Condor system, you use a submit description file (or submit file) to describe your jobs. This file results in a
ClassAdfor each job, which defines requirements and preferences for running that job. Each pool machine has a description of what job requirements and preferences that machine can run, called the machineClassAd. The central manager matches jobClassAdswith pool machineClassAdsto select the machine on which to execute a job.Process Identification Numbers
For Condor batch processing, there are two identification numbers that are important to you:
-
Cluster number - The cluster number represents each set of executable jobs submitted to the Condor system. It is a cluster of jobs, or processes. A cluster can consist of a single job.
-
Process number - The process number represents each individual job (process) within a single cluster. Process numbers for a cluster always start at zero.
Each single job in a cluster is assigned a process identification number, called the process ID or job ID. This ID consists of both cluster and process number in the form
<cluster>.<process>.For example, if you submit a batch that consists of a single job, and your batch submission to the Condor queue is assigned cluster number 20, then your process ID is 20.0. If you submit a batch that consists of fifteen jobs that all use the same executable, and your batch submission to the Condor queue is assigned cluster number 8, then your process IDs range from 8.0 to 8.14.
-
- Batch Processing Workflow
-
Batch Processing Workflow
The workflow to submit batch processing to the Condor system is as follows:
-
Create a directory in which to submit jobs to the Condor system.
Make sure that the directory and files with which you plan to work are readable and writable by other users, which include Condor processes.
-
Choose an execution environment, called a universe, for your jobs.
At HMDC, you always use the
vanillauniverse. This execution environment supports processing of individual serial jobs, but has few other restrictions on the types of jobs that you can execute. -
Make your jobs batch ready.
Batch processing runs in the background, meaning that you cannot input to your executable interactively. You must create a program or script that reads in your inputs from a file, and writes out your outputs to another file.
You also must identify the full path and executable source to use for your Condor cluster. The default executable for the condor_submit_util script is the R language. In the RCE, the path and executable source for this language is
/usr/bin/R. -
If you choose to use the
condor_submit_utilscript to create the submit description file (or submit file) and submit your jobs to the Condor system for batch processing automatically, skip to step the next step.If you choose to submit your batch processing to the Condor system manually, create a submit file.
A submit file is a plain-text file that describes a batch of jobs for the Condor software. This file contains the following descriptors:
-
Environment (
vanilla) -
Executable program path and file name
-
Program arguments
-
Input and output file names
-
Log and error file names
-
-
Execute the
condor_submit_utilcommand to write the submit file and submit your program automatically to the Condor job queue.If you chose to write your own submit file, execute the
condor_submit <submit file>.submitcommand to submit your jobs to the queue.Condor then checks the submit file for errors, creates a
ClassAdobject and the object attributes for that cluster, and then places this object in the queue for processing.
-
- Batch Processing Example
-
Batch Processing Example
To illustrate Condor system use in the RCE, we provide the source files for an example R script that can be used for batch processing. You can download the example material from our website, and use these sources to follow procedures described throughout this guide.
- Preparing to Use Batch Processing
-
Preparing to Use Batch Processing
This section covers the peparations necessary to use batch processing, including:
- Setting up your environment
- Making your process batch ready
- Determining batch parameters
- Setting Up Your Environment
-
Setting Up Your Environment
Before you submit any programs for batch processing, perform the following:
Create a directory in which to submit your batch processing, and then change to that directory.
For example, type the following:
> mkdir condor
> cd condorYou can contact us and request that a project directory be set up for you to use for batch processing. If you perform you batch processing within your home directory, the space used for your data and program files can consume much of your allotted resources.
- Determining Batch Parameters
-
Determining Batch Parameters
Before you submit
dwarves.plfor batch processing, you need to determine the parameters for this submission. To use the Condor system for batch processing, you must define these parameters by assigning values to submit file arguments, which describe the jobs that you choose to submit for processing.In the RCE, you always use the
vanillaenvironment.To determine the remaining submit file arguments, answer the following questions:
-
What is the executable path and file name?
For any shell script or statistical application installed in the RCE, the
condor_submit_utilscript can determine the full path for the executable. At the script prompt, you type in the name of your script, program, or application. The default executable in the RCE is the R language, and the path and executable name are/usr/bin/R. -
Do you have any arguments to supply to the executable?
Arguments are parameters that you specify for your executable. For example, the default arguments in the
condor_submit_utilscript are--no-saveand--vanilla, which specify how to launch and exit the R program. The argument--no-savespecifies not to save the R workspace at exit. The argument--vanillainstructs R to not read any user or site profiles or restored data at start up and to not save data files at exit. -
What are the input file names?
If you are using the R program, your input file(s) will be whatever R script you want to execute.
-
What do you plan to name the output files?
A general rule for batch processing is that you have one output file for each input file. Therefore, if you have seven input files, you expect to have seven output files after processing is complete. A useful practice is to correlate the names of input and output files.
-
How many times do you need to execute this script or program?
A general rule for batch processing is that you execute your job one time for each input file that you use.
-
- Making Your Program Batch Ready
-
Making Your Program Batch Ready
Before you submit a program for batch processing, write your program files and compile the programs if necessary. Write the input files for your submission. Then, place your program files in your working directory.
If you need assistance with making your program batch ready, contact us.
If you need assistance with statistical questions and not technical issues, contact the HMDC Data Fellows by email at
dataquest@help.hmdc.harvard.edu.
- Setting Up Batch Processing Example
-
Setting Up Batch Processing Example
To set up our batch processing example for use, you first download the source material, and then determine your batch processing parameters.
Downloading the Source Files
To download the source files for use in this case study:
-
Log in to your RCE session.
-
Open this page in a web browser in your RCE session.
-
Click the file
condor_example.tar.gzto download it to your desktop condor_example.tar.gzYou are prompted to save or open the file (Figure 1).
Figure 1. Download dwarves Case Study Sources
-
Click the Save to Disk option, and then click OK to save the
tarfile to your desktop.The file is downloaded to your desktop, and the Downloads window is displayed (Figure 2), listing files downloaded in your RCE session.
Figure 2. Downloads Window
-
Open a terminal window, and unzip the
tarfile in theDesktopdirectory. Type:> tar zxvf Desktop/condor_example.tar.gz
condor_example/
condor_example/condor_submit_util/
condor_example/bootstrap.R
You now have a directory named
condor_examplein your home directory, which contains the files necessary to run our example.Image(s):File(s): -
- Submitting for Batch Processing
-
Submitting for Batch Processing
After you set up your working directory and define your batch processing parameters, you can submit your script and input files for processing. You can use the HMDC script to set up your submit file and submit your batch or create your submit file manually and submit to the cluster.
This section covers both methods of submission and includes directions for submitting our example program.
- Submitting Using the Script
-
Submitting Using the Script
To build a submit file automatically and submit your program for batch processing, you can use the Automated Condor Submission script in two modes: interactive or command line.
Note: If you do not specify any options when you use the script, the script enters interactive mode automatically. Also, if you do not specify required options when you use the script in command-line mode, the script enters interactive mode automatically, or it reports an error and returns you to the command-line prompt.
- Working Interactively
-
Working Interactively
When you use the script in interactive mode, you can press the Return key to accept default values. Default values are specified in the prompts inside square brackets, and appear at the end of the prompt.
To use the
condor_submit_utilscript in interactive mode:-
Execute the
condor_submit_utilcommand.Type the following at the command prompt in your Condor working directory:
> condor_submit_util
*** No arguments specified, defaulting to interactive mode...
*** Entering interactive mode.
*** Press return to use default value.
*** Some options allow the use of '--' to unset the value. -
The script first prompts you to define the executable program that you choose to submit for batch processing, and then requests the list of arguments to provide to that executable:
Enter executable to submit [/usr/bin/R]: <executable name>
Enter arguments to /usr/bin/R [--no-save --vanilla]: <arguments>The default argument
--no-savespecifies not to save the R workspace at exit. The default argument--vanillainstructs R to not read any user or site profiles or restored data at start up and to not save data files at exit.If you do not have any arguments to apply to your executable, then type -- to supply no arguments.
-
Next, the script prompts you to provide a name or pattern for the input, output, log, and error files for this Condor cluster submission. You can include a relative path in these entries, if you choose:
Enter input file base [in]: <input path and file name or pattern>
Enter output file base [out]: <output path and file name or pattern>
Enter log file base [log]: <log path and file name or pattern>
Enter error file base [error]: <error path and file name or pattern> -
After specifying the files, the script prompts you to define the number of iterations that you choose to execute your program for processing:
Enter number of iterations [10]: <integer>
-
The system creates the submit file for this batch process using your responses to script prompts.
An example submit file is shown here. To view the contents of your submit file, include the option
-v(verbose) when you launch thecondor_submit_utilscript:*** creating submit file '<login account name>-<date-time>.submit'
Universe = vanilla
Executable = /usr/bin/R
Arguments = --no-save --vanilla
when_to_transfer_output = ON_EXIT_OR_EVICT
transfer_output_files = <output file>
input = <input file>
output = <output file>
error = <error file>
Log = <log file>
Queue <integer> -
If you use the verbose option, the script prompts you to confirm that the submit file is correct. To continue, press Return or type y.
Condor checks the submit file for errors, creates the
ClassAdobject for your submission, and adds that object to the end of the queue for processing. The script lists messages that report this progress in your terminal window, and includes the cluster number assigned to the batch process. For example:Is this correct? (Enter y or n) [yes]: y
] submitting job to condor...
] removing submit file '<login account name>-<date-time>'
*** Job successfully submitted to cluster <cluster ID>. -
Finally, the script prompts whether you choose to receive email when execution of your batch processing is complete. Press Return or type y to receive email, or type n to not send email and exit the script.
If you choose to receive email, before exiting, the script prompts you to enter the email address to which you choose to send the notification. The default email address for notification is your email account on the server on which you launched the script. For example:
Would you like to be notified when your jobs complete? (Enter y or n)
[yes]: y
Please enter your email address [<your email account on this server>]:
*** creating watch file '/nfs/fs1/projects/condor_watch/<Condor machine>.<batch cluster>.<your email>' -
View your job queue to ensure that your batch processing begins execution successfully.
See for complete details about checking the queue. An example is:
> condor_q
-- Submitter: vnc.hmdc.harvard.edu : <10.0.0.47:60603> : vnc.hmdc.harvard.edu
IDOWNER SUBMITTED RUN_TIME STPRISIZECMD
9.0arose10/4 11:02 0+00:00:00 R 0 9.8 dwarves.pl
9.1arose10/4 11:02 0+00:00:00 R0 9.8 dwarves.pl
9.2arose10/4 11:02 0+00:00:00 I 0 9.8 dwarves.pl
9.3arose10/4 11:02 0+00:00:00 R 0 9.8 dwarves.pl
4 jobs; 1 idle, 3 running, 0 held
-
- Working on the Command Line
-
Working on the Command Line
When you use the script in command-line mode, you must specify all required options or the script does not execute. For example, the default number of iterations for the script is 10. If you do not have 10 input files in your working directory and you do not enter the option to specify the correct number of iterations that you plan to perform, the script does not execute and returns a message similar to the following:
> condor_submit_util -v
*** Fatal error; exiting script
*** Reason: could not find input file 'in.7'.To use the
condor_submit_utilscript in command-line mode:-
Execute the
condor_submit_utilcommand with the appropriate arguments. See Script Options for detailed information about script options.At a minimum, you must include the following options on the command line:
-
Executable program file name
-
Executable file arguments, or
--noargsoption -
Input file, or
--noinputoption -
Number of iterations, if you do not have 10 input files
At a minimum, type the following at the command prompt from within your Condor working directory:
> condor_submit_util -x <program> -a <arguments> -i <input files>
-
-
Condor creates a submit file and checks it for errors, creates the
ClassAdobject, and adds that object to the end of the queue for processing. The script supplies messages that report this progress, and includes the cluster number assigned to your Condor cluster. For example:> condor_submit_util -x <program> --noargs
Submitting job(s)..........
Logging submit event(s)..........
10 job(s) submitted to cluster 24.If the script encounters a problem when creating the submit file, it enters interactive mode automatically and prompts you for the correct inputs.
-
View your job queue to ensure that your batch processing begins execution.
-
- Saving and Reusing a Submit File
-
Saving and Reusing a Submit File
When you use the script in command-line mode to submit a program for batch processing, include the option
-k(keep) to save the submit file created by the utility.You can edit and reuse that submit file to submit similar programs to the Condor queue for batch processing. You also can include Condor macros to further improve the usability of the file. See Working with Submit Files for detailed information about how to use Condor macros.
For example, if you plan to submit several iterations of a program for batch processing, you can use a single submit file for all iterations. In that submit file, you use the
$(PROCESS)macro to specify unique input, output, error, and log files for each iteration.Use of the
$(PROCESS)macro requires that you develop a naming convention for files or subdirectories that includes the full range of process IDs for your iterations.To use an existing submit file when you submit a batch process, you cannot use the script and must execute the
condor_submitcommand instead. Type the following:> condor_submit my.submit
- Passing Arguments to the Program
-
Passing Arguments to the Program
You can pass arguments to the batch program using the
--argsflag in your submit file. For example, if you change the arguments line in your submit file to something like the following:Arguments = --no-save --vanilla --args <arguments>
Then the contents of
<arguments>will be passed in to the program as command-line arguments. The syntax for passing and handling these arguments differs depending on the statistics program in use.Passing Arguments to R
To parse command-line arguments in R, use the following command in your R script:
args <- commandArgs(TRUE)
This puts the command-line arguments (the contents of <arguments>) into the variable
args.- Script Options
-
Script Options
This section describes the
condor_submit_utilscript options, and how to use them.Refer to Script Options Reference for a full reference.
- Option Conventions
-
Option Conventions
Options for the
condor_submit_utilscript are described in . For most options, there are two conventions that you can use to specify that option on the command line:-
The
-<letter>convention - Use this simple convention as a short cut.For example, the simple option to receive email notification when your batch processing is complete is
-N. -
The
--<term>convention - Use this lengthy convention to make it easy to determine what option you use.For example, the lengthy option to receive email notification when your batch processing is complete is
--Notify.
Both conventions for specifying an option perform the same function. For example, to receive email notification when your batch processing is complete, the options
-Nand--Notifyperform the same function. -
- Pattern Arguments
-
Pattern Arguments
For file-related options, such as the output file name or the error file name, you can use a pattern-matching argument. For example, if you specify the option
-i "run", Condor looks for an input file with the namerun. If there is no file namedrun, Condor looks for a file name that begins withrun., such asrun.14.If there are multiple files with names that begin with the pattern that you specify, then for the first execution within a cluster, Condor uses the file with the name that matches first in alphanumeric order. For successive executions within a cluster, Condor uses the files with names that match successively in alphanumeric order.
- Submitting Manually
-
Submitting Manually
This appendix describes how to write a submit file to submit batch processing to the HMDC Condor flocks. It includes a description of common attributes that you can include in your Condor submit file.
You can use the
condor_submit_utiltool to create the submit file for you and submit your batch processing automatically. See for detailed information.- Submission Weights
-
Submission Weights
When you acquire a login account to the RCE, your account is assigned to a group. When you use the Automated Condor Submission script to submit batch processing, your jobs are assigned a weight (priority) based on the group to which your login account belongs. If you submit jobs manually for batch processing, your jobs might not have the same weight that they would have if you submit them by using the script.
You can use the script to submit a job and include the option
-kto keep a copy of the submit file. Then, you can edit and reuse this submit file to make sure that your job have the same weight when you submit them manually that they have when you submit them by using the script.
- Submitting a Batch Job Manually
-
Submitting a Batch Job Manually
You use the command
condor_submitto submit batch processing manually to the Condor system.In the RCE, you must include the attribute
Universe = vanillain every submit file. If you do not include this statement, Condor attempts to enable job-check pointing, which consumes the central manager resource.Perform the following to submit batch processing manually:
-
Before you submit your program for batch processing, create a directory in which to run your submission, and then change to that directory. Make sure that you set permissions to enable the Condor software to read from and write to the directory and its contents.
Also make sure that your program is batch ready.
-
Create a submit file for your program.
For information about how to create a submit file, see Submit File Basics.
Note: You can use the HMDC Automated Condor Submission script and include the
-koption to create a submit file, and then edit and reuse that submit file for other submissions. -
Submit your program for batch processing.
Type the following at the command prompt:
> condor_submit <submit file>
Condor then checks the submit file for errors, creates the
ClassAdobject, and places that object in the queue for processing. New jobs are added to the end of the queue. For example:> condor_submit hosttest1.submit
Submitting job(s)..........
Logging submit event(s)..........
10 job(s) submitted to cluster 24. -
View your job queue (type condor_q) to ensure that execution begins. When your batch processing is complete, check your output for errors. Output from the example program is as follows:
> cat out.* | grep -A 1 '^> system'
> system("hostname -f")
x1.hmdc.harvard.edu
--
> system("hostname -f")
x2.hmdc.harvard.edu
--
> system("hostname -f")
x3.hmdc.harvard.edu
--
> system("hostname -f")
x1.hmdc.harvard.edu
--
> system("hostname -f")
x2.hmdc.harvard.edu
--
> system("hostname -f")
>x3.hmdc.harvard.edu
--
> system("hostname -f")
x1.hmdc.harvard.edu
--
> system("hostname -f")
x2.hmdc.harvard.edu
--
> system("hostname -f")
x3.hmdc.harvard.edu
--
> system("hostname -f")
x1.hmdc.harvard.edu
-
- Submit File Basics
-
Submit File Basics
You send input to the Condor system using a submit file, which is a text file of
<attribute> = <value>pairs. The naming convention for a submit file is<file name>.submit. Before you submit any batch processing, you first set up a directory in which to work, and create the executable script or program that you choose to submit for processing.- Basic Attributes
-
Basic Attributes
Basic attributes used in the submit file include the following:
-
Universe- At HMDC you specify thevanillauniverse, which supports serial job processing. HMDC does not support use of other Condor environments. -
Executable- Type the name of your program. In the jobClassAd, this becomes theCmdvalue. The default value in the RCE for this attribute is the R program. -
Arguments- Include any arguments required as parameters for your program. When your program is executed, the Condor software issues the string assigned to this attribute as a command-line argument. In the RCE, the default arguments for the R program are--no-saveand--vanilla. -
Input- Type the name of the file or the base name of multiple files that contain inputs for your executable program. -
Output- Type the name of the file or the base name of multiple files in which Condor can place the output from your batch job. -
Log- Type the name of the file or the base name of multiple files in which Condor can record information about your job's execution. -
Error- Type the name of the file or the base name of multiple files in which Condor can record errors from your job. -
Queue- The commandqueueinstructs the Condor system to submit one set of program, attributes, and input file for processing. You use this command one time for each input file that you choose to submit.
When you specify file-related attributes (executable, input, output, log, and error), either place those files within the directory from which you execute the Condor submission or include the relative path name of the files. See Executing in Unique Directories for more information about using subdirectories.
-
- Required Attributes
-
Required Attributes
There are three additional attributes that are required in your submit file when you use batch processing in the RCE. These attributes define when to write an output file, the name of the output file to write at that time, and the number of Condor machines to use when executing your batch process. For each of these attributes, use the following specific values in your submit file:
when_to_transfer_output = ON_EXIT_OR_EVICT
transfer_output_files = <your output file>
- Example Submit File
-
Example Submit File
An example submit file with the minimum required arguments is as follows:
> cat hosttest1.submit
Universe = vanilla
Executable = /usr/bin/R
Arguments = --no-save --vanilla
when_to_transfer_output = ON_EXIT_OR_EVICT
transfer_output_files = out.$(PROCESS)
input = <program>.R
output = out.$(Process)
error = error.$(Process)
Log = log.$(Process)
Queue 10This file instructs Condor to execute ten R jobs using one input program (
<program>.R) and to write unique output files to the current directory.
- Working with Submit Files
-
Working with Submit Files
This section describes some common submit file attributes that you might find useful when submitting your batch processes. In particular, this section describes useful attributes for performing iterative executions of a program.
Note: This section describes how to use input and output files for iterative jobs in a batch process. The same information applies to log and error files.
- Specifying Multiple Executions
-
Specifying Multiple Executions
When you write a submit file, you define the execution parameters for your Condor cluster processes, and then you specify the
Queuecommand. This command instructs the Condor software to create a job cluster and place that cluster in the queue for execution.To instruct Condor to repeat the execution of a process, include the number of times you choose to repeat execution (the number of iterations) after the
Queuecommand. Syntax for this command isQueue <integer>.For example, to execute the program
RanVal.R 10times use the following attributes:Executable = RanVal.R
Queue 10
- Macros and Directories
-
Macros and Directories
Macros are generic attributes that are replaced with specific values during execution of batch processing. Two useful predefined macros are
$(Process)and$(Cluster), which return the process number or cluster number of a job.For example, you can use the
$(Process)macro to submit batch processing that executes the same job numerous times and uses individual input files for each execution. You create unique input files before you submit your batch processing, and use a consistent naming convention for each file that includes the full range of process IDs for your iterations. This enables Condor to match the process ID with the name of a file (or directory).Another good use of the
$(Process)macro combines the macro with the use of theinitialdirattribute to perform iterative executions from within unique directories. Theinitialdirattribute gives individual job executions a directory for file input and output use. If you specify a path for this attribute, it is relative to the directory in which you execute the script or thecondor_submitcommand. Note that the path to the executable is not related to the value ofinitialdir.Another macro enables you to use the dollar sign (
$) as a literal character. For example, to include a dollar sign in a file name, use the macro$(DOLLAR)before the symbol in the file name.
- Executing in Unique Directories
-
Executing in Unique Directories
You can instruct Condor to read input files from and write output files to more than one directory.
To direct Condor to use individual directories for reading and writing files in an iterative process, first create the directories. Use a consistent naming convention for each directory, and include in the names the full range of process numbers that you plan to execute.
For example, to execute a program four times and use individual directories for each of the four executions, create four directories. Use a naming convention that includes the full range of process IDs for four executions,
0-3. You can use any naming convention that you choose. In this example, you might name your directoriesdir0-dir3.The submit file for this example contains attributes and commands that instruct the Condor system to perform one executable four times. The following attributes direct Condor to perform each of the program executions within individual directories:
InitialDir = dir0 # This directory is used for job <cluster ID>.0
Queue
InitialDir = dir1 # This directory is used for job <cluster ID>.1
Queue
InitialDir = dir2 # This directory is used for job <cluster ID>.2
Queue
InitialDir = dir3 # This directory is used for job <cluster ID>.3
QueueA shorter way to do the same thing is to use the following:
InitialDir = dir$(Process)
Queue 4You then can place an input file for individual executions within each directory.
- Executing from Unique Files
-
Executing from Unique Files
You can instruct Condor to read unique input files for each execution in an iterative process. To direct Condor to read unique input files for individual iterations within your cluster, use unique input file names, or unique directory names, or both.
Use a consistent naming convention for your files or directories, and include in the names the full range of process numbers that you plan to execute.
- Retrieving from One Directory
-
Retrieving from One Directory
You can direct Condor to read unique input files for iterative processes from one directory. Create the input files, and use a consistent naming convention that includes the full range of process numbers that you plan to execute.
For example, to run the
RanVal.Rprocess 600 times and use unique input files for each execution, you might name your input filesin.<process ID>, where<process ID>ranges from0-599. The attributes and command for this example are as follows:Executable = RanVal.R
Input = in.$(PROCESS)
Queue 600This example uses a single directory for all iterations of the process.
- Retrieving Unique Files from Unique Directories
-
Retrieving Unique Files from Unique Directories
To read unique input files from individual directories for each execution, create one directory for each execution that you plan to perform and use a consistent naming convention that includes the full range of process IDs. Then, create one input file for each execution and use a consistent naming convention that includes the full range of process IDs. Place each input file in the associated directory.
For this example, use the same input file names you used in the previous section, and create directories named
run_0-run_599. Placein.0in the directoryrun_0, placein.1in the directoryrun_1, and so on. Your input files look like this:/<working directory>/run_0/
in.0
/<working directory>/run_1/
in.1
...
/<working directory>/run_599/
in.599Using unique directories and file names for each iteration, the attributes in your submit file for this example look like this:
Executable = RanVal.R
InitialDir = run_$(PROCESS)
Input = in.$(PROCESS)
Queue 600
- Retrieving One File from Unique Directories
-
Retrieving One File from Unique Directories
To read an input file from individual directories for each execution during batch processing, you can use the same input file name for every execution. Because the input files are located in unique directories, you can use the same file name but include unique content within the file.
For this example, you might name your input file
infile. You create 500 copies of this file, and place one copy in each directory. After you place a copy ofinfilein a directory, you can edit the content of that copy to contain the unique inputs for that iteration. Using the same directory names from the previous section, your input files now look like this:/<working directory>/run_0/
infile
/<working directory>/run_1/
infile
...
/<working directory>/run_499/
infileUsing individual directories and one file name for each iteration, the attributes in your submit file for this example look like this:
Executable = RanVal.R
InitialDir = run_$(PROCESS)
Input = infile
Queue 500mospagebreak title=Directing Output from Multiple Executions to Unique Files
- Directing Output to Unique Files
-
Directing Output to Unique Files
You can instruct Condor to write unique output files for iterative processes, or to write output files in more than one directory, or both.
To direct Condor to write output from your batch processing to specific directories, first create the directories. Use a consistent naming convention for each directory, and include in the names the full range of process numbers that you plan to execute. Then specify an output file name for each process.
For example, you first create fifteen directories named
dir_0-dir_14. You instruct the Condor system to execute your program 15 times (using theQueue 15command). You instruct Condor to create individual output files for each iteration of the executable and name those filesout.<process ID>. The Condor system then places those files in the directory that is assigned the name that includes the same<process ID>.For this example, your submit file includes the following attributes:
InitialDir = dir_$(PROCESS)
output = out.$(PROCESS)
Queue 15Your results look like this:
/<working directory>/dir_0/
out.0
/<working directory>/dir_1/
out.1
...
/<working directory>/dir_14/
out.14
- Combining Unique Input and Output Files
-
Combining Unique Input and Output Files
You can instruct the Condor software to read an individual input file for each iteration of a process, and then to write an individual output file for each iteration. You can organize these input and output files by placing them within one directory, or you can place them within individual directories for each iteration.
For example, to execute an R program (named
RanVal.R) 15 times and use individual directories, input files, and output files for each execution:-
Create fifteen directories. Use a naming convention that includes the full range of process IDs for 15 executions, 0 - 14.
For this example, name your directories
dir_0-dir_14. -
Create individual input files for each execution that you plan to perform. Use a naming convention that includes the full range of process IDs for 15 executions, 0 - 14.
For this example, name your files
in.0-in.14. -
Place each input file in the associated directory.
That is, place
in.0in the directorydir_0, placein.1in the directoryin.1, and so on. Your input files look like this:/<working directory>/dir_0/
in.0
/<working directory>/dir_1/
in.1
...
/<working directory>/dir_14/
in.14 -
Instruct the Condor system to write unique output files for each iteration of the program.
For this example, use the output file name
out.$(PROCESS). -
Instruct the Condor system to execute your program 15 times.
The attributes in your submit file look like this:
Executable = RanVal.R
InitialDir = dir_$(PROCESS)
input = in.$(PROCESS)
output = out.$(PROCESS)
Queue 15
The results of execution of your batch process are as follows:
/<working directory>/dir_0/
in.0
out.0
/<working directory>/dir_1/
in.1
out.1
...
/<working directory>/dir_14/
in.14
out.14 -
- Defining R Component Paths
-
Defining R Component Paths
In R,
/usr/lib64/R/libraryis the location of installed libraries and packages. However, if R packages and libraries are installed manually by using the commandinstall.packages()orR CMD build source.tar.gz, the path for these newly installed components are not known to Condor unless specified.The following are common R utilities and commands used to find and specify absolute and relative paths for unique components:
-
To identify the default directory from which you read input and to which you write output, use the command
getwd(). For example:> getwd()
[1] "/nfs/home/S/sspade"You also can write to a specific directory using the command
sink(<path and file name>). -
To set the default or working directory from which you read input and to which you write output, use the command
setwd(). Insert the full path between the parentheses. For example:> setwd("/nfs/home/S/sspade") -
When installing update packages or libraries from sources other than HMDC's Comprehensive R Archive Network (CRAN) repository, you must specify an absolute path for new components if they do not reside in the default working directory.
For example, to load a library installed in your home directory type the following:
> library(experiment, lib.loc="/nfs/home/S/sspade/.R/library-x886_64")
-
To save results to your home directory, use the command
save.image(<path and file name>). For example:> save.image("/nfs/home/S/sspade/condor-temp/condorprac.Rdata")
-
- Checking the Status of Your Processes
-
Checking the Status of Your Processes
Once you have submitted your job(s) to the queue, you have various ways of checking in on the status of your jobs including e-mail notification of job completion and command line access to both your jobs status and the current state of the cluster.
This section covers:
- Automated e-mail notification via condor_watch
- Checking status of jobs with condor_q
- Checking status of cluster with condor_status
- Receiving Email Notifications
-
Receiving Email Notifications
If you accepted the HMDC script default to receive notification when your batch processing is complete, you receive two emails from the Condor system. The sender of these emails is
condor_watch.You receive one email to notify you that the Condor system is watching your cluster of jobs. For example:
Date: Wed Oct 4 10:20:01 2006 -0400
From: condor_watch@hmdc.harvard.edu
To: sspade@hmdc.harvard.edu
Subject: Condor Watch Greeting
Hello,
You've requested that I watch your jobs running on cluster 7.
When these jobs complete, I will send you another message.
-Condor Watch on vnc.hmdc.harvard.eduYou receive a second email to notify you that your cluster processing is complete. For example:
Date: Wed Oct 4 10:25:02 2006 -0400
From: condor_watch@hmdc.harvard.edu
To: sspade@hmdc.harvard.edu
Subject: Condor Watch - Job(s) Complete
Hello,
Your 7 job(s) on cluster 7 running dwarves.pl -- are complete.
Thank you for using Condor Watch.
-Condor Watch on vnc.hmdc.harvard.edu
- Viewing and Managing Job Status
-
Viewing and Managing Job Status
You can monitor progress of your batch processing using the
condor_statusandcondor_qcommands. This section describes how to check the status of your processes at any time, and how to remove a process from the Condor queue.- Checking the Status of the Pool
-
Checking the Status of the Pool
After you submit a cluster for processing, you can check the status of the Condor pool machines and verify that machines are available on which your jobs can execute.
To check the status of the Condor pool, type the command condor_status. This command returns information about the pool resources. Output lists the number of virtual machines (VMs) available in the pool and whether they are in use. If there are no idle VMs, your batch processing is queued when it is submitted.
For example:
> condor_status
Name OpSys Arch State ActivityLoadAvMemActvtyTime
vm1@mc-1-1.hm LINUX X86_64Claimed Busy 1.060 19750+17:43:50
vm2@mc-1-1.hm LINUX X86_64 Claimed Busy 1.060 1975 0+17:43:48
vm1@mc-1-2.hm LINUX X86_64 Claimed Busy 1.000 1975 0+17:44:43
vm2@mc-1-2.hm LINUX X86_64 Claimed Busy 1.000 1975 0+17:44:36
vm1@mc-1-3.hm LINUX X86_64 UnclaimedIdle 0.010 1975 0+00:03:57
vm2@mc-1-3.hm LINUX X86_64 Unclaimed Idle 0.000 1975 0+00:00:04
vm1@mc-1-4.hm LINUX X86_64 Unclaimed Idle 0.000 1975 0+00:00:04
Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX 7 0 4 3 0 0 0
Total 7 0 4 3 0 0 0To check the cumulative use of resources within in the Condor pool, include the option
-submitterwith the commandcondor_status. This command returns information about each user in the Condor pool. Output lists the user's name, machine in use, and current number of jobs per machine. Use this command to help determine how many resources Condor has available to run your jobs. An example is shown here:> condor_status -submitter
Name Machine Running IdleJobs HeldJobs
mkellerm@hmdc.harvar w4.hmdc.ha 2 0 0
jgreiner@hmdc.harvar x1.hmdc.ha 9 0 0
jgreiner@hmdc.harvar x3.hmdc.ha 40 0 0
kquinn@hmdc.harvard. x5.hmdc.ha 32 0 0
RunningJobs IdleJobs HeldJobs
jgreiner@hmdc.harvar 49 0 0
kquinn@hmdc.harvard. 32 0 0
mkellerm@hmdc.harvar 2 0 0
Total 83 0 0
- Checking the Status of Processes
-
Checking the Status of Processes
To check the status of your jobs in the Condor queue, type the following command:
> condor_q
-- Submitter: vnc.hmdc.harvard.edu : <10.0.0.47:60603> : vnc.hmdc.harvard.edu
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
9.0 arose 10/4 11:02 0+00:00:00 R 0 9.8 dwarves.pl
9.1 arose 10/4 11:02 0+00:00:00 R 0 9.8 dwarves.pl
9.2 arose 10/4 11:02 0+00:00:00 I 0 9.8 dwarves.pl
9.3 arose 10/4 11:02 0+00:00:00 R 0 9.8 dwarves.pl
9.4 arose 10/4 11:02 0+00:00:00 R 0 9.8 dwarves.pl
9.5 arose 10/4 11:02 0+00:00:00 R 0 9.8 dwarves.pl
9.6 arose 10/4 11:02 0+00:00:00 R 0 9.8 dwarves.pl
7 jobs; 1 idle, 6 running, 0 heldThe column
STcontains the status of each job in the queue. A value ofRindicates that the job is running. Valid status values are listed the following table.Table 2. Job Status Values
Status Value
Description
RRunning
IIdle
HHold
- Removing a Process from the Queue
-
Removing a Process from the Queue
To remove a process from the queue, type the command
condor_rm <cluster ID>.<process ID>. For example:> condor_rm 9.9
Job 9.9 marked for removalTo remove all jobs affiliated with a cluster, type the command condor_rm <cluster ID>. For example, the command
condor_rm 4removes all jobs assigned to cluster 4.To remove all of your clusters' jobs from the Condor queue, type condor_rm -a. For example:
> condor_rm -a
All jobs marked for removal.
- Troubleshooting Processes
-
Troubleshooting Processes
The Condor central manager stops (evicts or preempts) a process for several reasons, including the following:
-
Another job or another user's job in the queue has a higher priority and preempts or evicts your job.
-
The pool machine on which your process is executed encounters an issue with the machine state or the machine policy.
-
You specified attributes in your submit file that cannot process without error.
Refer to the Condor project's Frequently Asked Questions (FAQ) website at the following URL for detailed information about submission, job status, and processing errors:
http://www.cs.wisc.edu/condor/manual/v6.8/7_Frequently_Asked.html
Note: A simple action can help you to diagnose problems if you submit multiple jobs to Condor. Be sure to specify unique file names for each job's output, history, error, and log files. If you do not specify unique file names for each submission, Condor overwrites existing files that have the same names. This can prevent you from locating information about problems that might occur.
- Priorities and Preemption
-
Priorities and Preemption
Job priorities enable you to assign a priority level to each submitted Condor job. Job priorities, however, do not impact user priorities.
User priorities are linked to the allocation of Condor resources based upon a user's priority. A lower numerical value for user priority means higher priority, so a user with priority 5 is allocated more resources than a user with priority 50. You can view user priorities by using the
condor_userpriocommand. For example:> condor_userprio -allusers
Condor continuously calculates the share of available machines. For example, a user with a priority of 10 is allocated twice as many machines as a user with a priority of 20. New users begin with a priority of 0.5 and, based upon increased usage, their priority rating rises proportionately in relation to other users. Condor enforces this function such that each user gets a fair share of machines according to user priority and historical volume. For example, if a low-priority user is using all available machines and a higher-priority user submits a job, Condor immediately performs a checkpoint and vacates the jobs that belong to the lower-priority user, except for that user's last job.
User priority rating decreases over time and returns to a baseline of 0.5 as jobs are completed and idle time is realized relative to other users.
- Tracking Progress of a Process
-
Tracking Progress of a Process
To track progress of your processes:
-
Type condor_q to view the status of your process IDs.
-
Check your output directory for the time stamps of your output, log, and error files.
If the output file and log file for a submitted process are more current than the error file, your process probably is running without error.
-
- Analyzing the Process Queue
-
Analyzing the Process Queue
To view detailed information about your processes, including the
ClassAdrequirements for your jobs, type the command condor_q -analyze.Refer to the Condor Version 6.8.0 Manual for a description of the value that represents why a process was placed on hold or evicted. Go to the following URL for section 2.5, "Submitting a Job," and search for the text
JobStatusunder the heading "ClassAd Job Attributes":http://www.cs.wisc.edu/condor/manual/v6.8.0/2_5Submitting_Job.html
For example:
> condor_q -analyze
Run analysis summary. Of 43 machines,
43 are rejected by your job's requirements
0 are available to run your job
WARNING: Be advised:
No resources matched request's constraints
Check the Requirements expression below:
Requirements = ((Memory > 8192)) && (Disk >= DiskUsage)
- Viewing the Process Log File
-
Viewing the Process Log File
A log file includes information about everything that occurred during your cluster processing: when it was submitted, when execution began and ended, when a process was restarted, if there were any issues. When processing finishes, the exit conditions for that process are noted in the log file.
Refer to the Condor Version 6.8.0 Manual for a description of the entries in the process log file. Go to the following URL for section 2.6, "Managing a Job," and go to subsection 2.6.6, "In the log file":
http://www.cs.wisc.edu/condor/manual/v6.8.0/2_6Managing_Job.html
To view the log file for a process and determine where an error occurred, use the
catcommand. For example, the following log file indicates that the process completed normally:> cat log.1
000 (012.001.000) 10/04 12:14:51 Job submitted from host: <10.0.0.47:60603>
...
001 (012.001.000) 10/04 12:15:00 Job executing on host: <10.0.0.61:37097>
...
005 (012.001.000) 10/04 12:15:00 Job terminated.
(1) Normal termination (return value 0)
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
7 - Run Bytes Sent By Job
163 - Run Bytes Received By Job
7 - Total Bytes Sent By Job
163 - Total Bytes Received By Job
...Following is an example log file for a process that did not complete execution:
> cat log.4
000 (09.000.000) 09/20 14:47:31 Job submitted from host:
<x1.hmdc.harvard.edu>
...
007 (09.000.000) 09/20 15:02:10 Shadow exception!
Error from starter on x1.hmdc.harvard.edu: Failed
to open 'scratch.1/frieda/workspace/v67/condor-
test/test3/run_0/b.input' as standard input: No such
file or directory (errno 2)
0 - Run Bytes Sent By Job
0 - Run Bytes Received By Job
...
- Viewing the Process Error File
-
Viewing the Process Error File
An error file includes information about any errors occurred when your batch processing executed.
Refer to the Condor Version 6.8.0 Manual for information about entries in the error file. Go to the following URL:
http://www.cs.wisc.edu/condor/manual/v6.8.0/ref.html
To view the error file for a process and determine where an error occurred, use the cat command. For example:
> cat errorfile
Error in readChar(con, 5) : cannot open the connection
In addition: Warning message:
cannot open compressed file 'Utilization1.RData'
Execution halted
- Viewing the History File
-
Viewing the History File
When batch processing completes, Condor removes the cluster from the queue and records information about the processes in the history file. History is displayed for each process on a single line. Information provided includes the following:
-
ID- The cluster and process IDs of the job -
OWNER- The owner of the job -
SUBMITTED- The month, day, hour, and minute at which the job was submitted to the queue -
CPU_USAGE- Remote user central processing unit (CPU) time accumulated by the job to date, in days, hours, minutes, and seconds -
ST- Completion status of the job, whereCis completed andXis removed -
COMPLETED- Time at which the job was completed -
CMD- Name of the executable
To view information about processes that you executed on the Condor system, type the command condor_history. For example:
> condor_history
IDOWNER SUBMITTED RUN_TIME ST COMPLETED CMD
1.0 arose 9/26 11:45 0+00:00:00 C 9/26 11:45 /usr/bin/R --no
2.0 arose 9/26 11:48 0+00:00:01 C 9/26 11:48 /usr/bin/R --no
3.0 arose 9/26 11:49 0+00:00:00 C 9/26 11:50 /usr/bin/R --no
3.1 arose 9/26 11:49 0+00:00:01 C 9/26 11:50 /usr/bin/R --no
6.0 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
6.1 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
6.2 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
6.5 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
6.3 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
6.4 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
6.6 arose 10/3 15:52 0+00:00:01 C 10/3 15:52 /nfs/fs1/home/A
9.0 arose 10/4 11:02 0+00:00:00 C 10/4 11:02 /nfs/fs1/home/A
9.1 arose 10/4 11:02 0+00:00:01 C 10/4 11:02 /nfs/fs1/home/A
9.2 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
9.3 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
9.5 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
9.6 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
9.4 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/ASearch through the history file for your process and cluster IDs to locate information about your jobs.
To view information about all completed processes in a cluster, type the command condor_history <cluster ID>. To view information about one process, type the command condor_history <cluster ID>.<process ID>. For example:
> condor_history 9
IDOWNER SUBMITTED RUN_TIME ST COMPLETED CMD
9.0 arose 10/4 11:02 0+00:00:00 C 10/4 11:02 /nfs/fs1/home/A
9.1 arose 10/4 11:02 0+00:00:01 C 10/4 11:02 /nfs/fs1/home/A
9.2 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
9.3 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
9.5 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
9.6 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
9.4 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A -
- Managing Processes on Hold
-
Managing Processes on Hold
To view information about processes that Condor placed on hold, type condor_q -hold. For example:
> condor_q -hold
-- Submitter: vnc.hmdc.harvard.edu : <10.0.0.47:60603> : vnc.hmdc.harvard.edu
ID OWNER HELD_SINCEHOLD_REASON
17.0 arose 10/5 12:53via condor_hold (by user arose)
17.1 arose 10/5 12:53via condor_hold (by user arose)
17.2 arose 10/5 12:53via condor_hold (by user arose)
17.3 arose 10/5 12:53via condor_hold (by user arose)
17.4 arose 10/5 12:53via condor_hold (by user arose)
17.5 arose 10/5 12:53via condor_hold (by user arose)
17.6 arose 10/5 12:53via condor_hold (by user arose)
17.7 arose 10/5 12:53via condor_hold (by user arose)
17.9 arose 10/5 12:53via condor_hold (by user arose)
9 jobs; 0 idle, 0 running, 9 heldRefer to the Condor Version 6.8.0 Manual for a description of the value that represents why a process was placed on hold. Go to the following URL for section 2.5, "Submitting a Job," and look for subsection 2.5.2.2, "ClassAd Job Attributes." Look for the entry
HoldReasonCode:http://www.cs.wisc.edu/condor/manual/v6.8.0/2_5Submitting_Job.html
To place a process on hold, type the command condor_hold <cluster ID>.<process ID>. For example:
> condor_hold 8.33
Job 8.33 heldTo place on hold any processes not completed in a full cluster, type condor_hold <cluster ID>. For example:
> condor_hold 8
Cluster 8 held.The status of those uncompleted processes in cluster 8 is now
H(on hold):> condor_q
-- Submitter: vnc.hmdc.harvard.edu : <10.0.0.47:60603> vnc.hmdc.harvard.edu
ID OWNER SUBMITTED RUN_TIME STPRISIZECMD
8.2 sspade 10/4 11:19 0+00:00:00 H 0 9.8 dwarves.pl
8.5 sspade 10/4 11:19 0+00:00:00 H 0 9.8 dwarves.pl
8.6 sspade 10/4 11:19 0+00:00:00 H 0 9.8 dwarves.pl
3 jobs; 0 idle, 0 running, 3 heldTo release a process from hold, type the command condor_release <cluster ID>.<process ID>. For example:
> condor_release 8.33
Job 8.33 released.To release the full cluster from hold, type the command condor_release <cluster ID>. For example:
> condor_release 8
Cluster 8 released.You can instruct the Condor system to place your batch processing on hold if it spends a specified amount of time suspended (that is, not processing). For example, include the following attribute in your submit file to place your jobs on hold if they spends more than 50 percent of their time suspended:
Periodic_hold = CumulativeSuspensionTime > (RemoteWallClockTime /2.0)
-
- Other Batch Processing Examples
-
Other Batch Processing Examples
We created the condor_submit_util script to automate the process of writing a submit file and submitting a cluster of jobs to the Condor queue. When you execute this script, you can include all arguments on the command line. Or, you can execute the script in interactive mode and be prompted for your submit file attributes.
The default settings for the Automated Condor Submission script support creation of submit files for programs that are written in the R language. To submit another type of program to the Condor queue, such as an Octave program, specify the full path and program for the executable (in this example,
Octave). You then define your program file as the input to the executable.Note: To use the
condor_submit_utilscript, you must have an RCE account. See for more information.The following are example uses of the
condor_submit_utilscript and options to submit batch processing in the RCE. A complete description of options is provided in Script Options Reference.Example Using Multiple Input Files
Start with an executable program (named
foo) that uses a set of input data files (nameddata0-data4) and does some analysis.To save the submit file and receive notification when processing is done, type the following command:
> condor_submit_util -x foo -i "data" -k -N
The submit file for this batch looks like this:
Universe = vanilla
Executable = /usr/bin/foo
Arguments = --no-save --vanilla
when_to_transfer_output = ON_EXIT_OR_EVICT
transfer_output_files = out.$(Process)
Notification = Complete
input = data.$(Process)
output = out.$(Process)
error = err.$(Process)
Log = log.$(Process)
Queue 5Example Using Multiple Iterations of One Executable Program
An R program (named
random.R) produces random output.To execute this program eight times and place the output of each execution in separate files in your default working directory, type the following command:
> condor_submit_util -i random.R -n 8 -o "outrun"
Following is the submit file for this batch:
Universe = vanilla
Executable = /usr/bin/R
Arguments = --no-save --vanilla
when_to_transfer_output = ON_EXIT_OR_EVICT
transfer_output_files = outrun.$(Process)
input = random.R
output = outrun.$(Process)
error = error.$(Process)
Log = log.$(Process)
Queue 8Example Checking Process Status
To check the status of the Condor queue after submitting your program for processing, type:
> condor_q
-- Submitter: x1.hmdc.harvard.edu : <10.0.0.47:60603> : x1.hmdc.harvard.edu
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
24.4 mcox8/18 16:35 0+00:00:01 R 0 0.0 R --no-save--vani
24.5 mcox8/18 16:35 0+00:00:00 R 0 0.0 R --no-save--vani
24.6 mcox8/18 16:35 0+00:00:00 R 0 0.0 R --no-save--vani
24.7 mcox8/18 16:35 0+00:00:00 I 0 0.0 R --no-save--vani
24.8 mcox8/18 16:35 0+00:00:00 I 0 0.0 R --no-save--vani
24.9 mcox8/18 16:35 0+00:00:00 I 0 0.0 R --no-save--vani
6 jobs; 3 idle, 3 running, 0 heldThe column
IDlists the process IDs for your jobs. The columnSTlists the status of each job in the Condor queue. A value ofRindicates that the job is running. Valid status values are listed in Checking the Status of Processes.