SAGA Tutorial Part 3: Remote Job Submission
In this part of the tutorial, we take the previous example and modify it so that our job is executed on a remote machine instead of localhost. This second example shows one of the most important capabilities of SAGA: abstracting system heterogeneity. With minimal modifications, the same code we used to run a job via 'fork' can run a job on a completely different system, e.g., via 'ssh' or even 'pbs'.
This example assumes that you have SSH access to a remote resource, either a single host (e.g., a cloud VM) or an HPC cluster. Alternatively, you can run an SSH server on your local machine to 'emulate' a remote resource.
The example also assumes that you have a working public/private SSH key pair and that you can log in to your remote resource of choice using those keys, i.e., your public key is in the ~/.ssh/authorized_keys file on the remote machine. If you are not sure how this works, you might want to read SSH and GSISSH first.
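Before changing the script, it can help to confirm that key-based login really works without a password prompt. The following is a minimal sketch, not part of SAGA: the helper names `ssh_check_command` and `can_login_with_key` are hypothetical, and it assumes an OpenSSH `ssh` client is on your PATH.

```python
import subprocess

def ssh_check_command(host, user=None):
    """Build an ssh command that succeeds only if key-based login works."""
    target = f"{user}@{host}" if user else host
    # BatchMode=yes disables password prompts, so ssh fails right away
    # instead of asking for a password when the key is not accepted.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10",
            target, "true"]

def can_login_with_key(host, user=None):
    """Return True if the non-interactive login exits with status 0."""
    result = subprocess.run(ssh_check_command(host, user),
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

Calling `can_login_with_key("remote.host.net")` with your own host name in place of the placeholder should return `True` once your public key is installed on the remote side.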
Copy the code from the previous example to a new file saga_example_remote.py. To change the execution host for the job, change the URL in the job.Service constructor. If you want to use a remote SSH host, use an ssh://... URL:
js = saga.job.Service("ssh://remote.host.net")
Alternatively, if you have access to a PBS cluster, use a pbs+ssh://... URL:
js = saga.job.Service("pbs+ssh://remote.hpchost.net")
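Since only the URL differs between the local and remote variants, one option is to build it from a scheme and a host name. This is a small sketch under assumptions: `service_url` and `KNOWN_SCHEMES` are hypothetical names, and the list of schemes covers only the ones mentioned in this tutorial.

```python
# Schemes mentioned in this tutorial; other plugins may accept more.
KNOWN_SCHEMES = ("fork", "ssh", "pbs+ssh")

def service_url(scheme, host):
    """Return a job service URL such as 'ssh://remote.host.net'."""
    if scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unknown scheme {scheme!r}, expected one of {KNOWN_SCHEMES}")
    return f"{scheme}://{host}"

# In the tutorial script the result would be passed to the constructor:
#   js = saga.job.Service(service_url("pbs+ssh", "remote.hpchost.net"))
print(service_url("ssh", "remote.host.net"))
```

Keeping the scheme as a separate value makes it easy to switch the same script between 'fork', 'ssh' and 'pbs+ssh', for example from a command-line argument.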
There are more URL options. Have a look at the Plugins page for a complete list. If you are submitting your job to a PBS cluster, you will probably also have to make some modifications to your job.Description. Depending on the configuration of your cluster, you might have to specify the name of the queue you want to use, or the allocation or project name that should be charged:
jd = saga.job.Description()
jd.environment = {'MYOUTPUT':'"Hello from Bliss"'}
jd.executable = '/bin/echo'
jd.arguments = ['$MYOUTPUT']
jd.output = "myjob.stdout"
jd.error = "myjob.stderr"
jd.queue = "short" # Using a specific queue
jd.project = "TG-XYZABCX" # Example for an XSEDE/TeraGrid allocation
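Because the queue and project settings only matter on some clusters, you may prefer to set them conditionally so the same script still works with 'fork' or 'ssh'. The sketch below assumes a hypothetical helper `apply_cluster_settings`; it uses `SimpleNamespace` purely as a stand-in so it runs without SAGA installed.

```python
from types import SimpleNamespace

def apply_cluster_settings(jd, queue=None, project=None):
    """Set queue and project on a job description only when they are given."""
    if queue is not None:
        jd.queue = queue
    if project is not None:
        jd.project = project
    return jd

# SimpleNamespace is only a stand-in here; in the tutorial script, jd would
# be the saga.job.Description() object created above.
jd = SimpleNamespace(executable="/bin/echo")
apply_cluster_settings(jd, queue="short")
print(hasattr(jd, "project"))  # project was not given, so it is never set
```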
Save the file and execute it (make sure your virtualenv is activated):
python saga_example_remote.py
The output should look something like this:
Job ID : [pbs+ssh://remote.hpchost.net]-[None]
Job State : saga.job.Job.New
...starting job...
Job ID : [pbs+ssh://remote.hpchost.net]-[644240]
Job State : saga.job.Job.Pending
...waiting for job...
Job State : saga.job.Job.Done
Exitcode : None
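The job ID printed above combines the service URL with the ID assigned by the backend. If you want to look the job up with the cluster's own tools, you could split the two parts as sketched here. The bracketed format is inferred from the sample output only, and `split_job_id` is a hypothetical helper, not a SAGA function.

```python
import re

# Matches IDs of the form "[<service url>]-[<backend id>]" as in the output above.
JOB_ID_PATTERN = re.compile(r"^\[(?P<backend>.+)\]-\[(?P<native_id>.+)\]$")

def split_job_id(job_id):
    """Return (service_url, native_id); native_id is None before submission."""
    match = JOB_ID_PATTERN.match(job_id)
    if match is None:
        raise ValueError(f"unexpected job ID format: {job_id!r}")
    native_id = match.group("native_id")
    # A job that has not been started yet is shown with the literal text "None".
    return match.group("backend"), (None if native_id == "None" else native_id)

print(split_job_id("[pbs+ssh://remote.hpchost.net]-[644240]"))
```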
Note: If you're using PBS, it might take some time for your job to switch from the Pending to the Running state, since it might have to wait in the queue for a while until it gets scheduled. That's perfectly normal.
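If you would like to watch the job leave the queue rather than just block until it finishes, a simple polling loop is one way to do it. This is a generic sketch: `wait_while` is a hypothetical helper, and it takes any zero-argument callable that returns the current state, so it makes no assumption about how SAGA represents its state constants.

```python
import time

def wait_while(get_state, waiting_states, poll_interval=5.0, timeout=600.0,
               sleep=time.sleep):
    """Poll get_state() until it returns a state outside waiting_states."""
    waited = 0.0
    state = get_state()
    while state in waiting_states:
        if waited >= timeout:
            raise TimeoutError(f"still in state {state!r} after {waited} seconds")
        sleep(poll_interval)
        waited += poll_interval
        state = get_state()
    return state

# Stand-in state sequence so the sketch runs without a real job.
states = iter(["Pending", "Pending", "Running"])
print(wait_while(lambda: next(states), {"Pending"}, poll_interval=0.0))
```

With a real job object you would pass something like `lambda: str(myjob.get_state())` together with the string shown for the pending state in the output above; that string form is an assumption based on the sample output.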
As opposed to the previous "local" example, you won't find a myjob.stdout file in your working directory. This is because the file has been created on the remote host where your job was executed. In order to check the content, you would have to log in to the remote machine. We will address this issue in the next example.
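Until then, one manual way to look at the file is to run `cat` on the remote host over SSH. The sketch below does this with plain `ssh` rather than SAGA; `remote_cat_command` and `read_remote_file` are hypothetical helpers, and the relative path assumes the job wrote its output to your remote home directory, which may not hold on every cluster.

```python
import shlex
import subprocess

def remote_cat_command(host, path):
    """Build an ssh command that prints a remote file to stdout."""
    # The remote shell interprets the command string, so the path is quoted.
    return ["ssh", "-o", "BatchMode=yes", host, "cat " + shlex.quote(path)]

def read_remote_file(host, path):
    """Return the content of a remote file, raising if ssh or cat fails."""
    result = subprocess.run(remote_cat_command(host, path),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `read_remote_file("remote.host.net", "myjob.stdout")` with your own host name would return the text your job wrote.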
Back: [Tutorial Home](SAGA Tutorial) Next: SAGA Tutorial Part 4: XYZ