Introduction to Bliss

oweidner edited this page Jun 29, 2012 · 5 revisions

NOTE: Please also have a look at the Known Problems And Solutions page for a list of common problems and pitfalls.

Getting Started: Submitting a Job with Bliss

One of the most important features of Bliss is its ability to submit jobs to local and remote queueing systems and resource managers. This first example explains how.

The job submission and management capabilities of Bliss are packaged in the bliss.saga.job module (API Doc). Three classes are defined in this module:

  • The job.Service class (API Doc) provides a handle to the resource manager, for example a remote PBS cluster.
  • The job.Description class (API Doc) is used to describe the executable, arguments, environment and requirements (e.g., number of cores) of a new job.
  • The job.Job class (API Doc) is a handle to a job associated with a job.Service. It is used to control (start, stop) the job and query its status (e.g., Running, Finished).

In order to use the Bliss Job API, we first need to import the Bliss module:

import bliss.saga as saga

Next, we create a job service object that represents a local or cluster resource. The job service takes a single URL as its parameter. The URL is passed to Bliss' plug-in mechanism, which selects a specific plug-in based on the URL scheme and uses it to connect to the specified location. In other words, the URL tells Bliss what type of queueing system or middleware you want to use and where it is. For example:

js = saga.job.Service("pbs+ssh://india.futuregrid.org")

will tell Bliss to use the PBS over SSH plug-in to connect to a remote PBS cluster that runs on india.futuregrid.org. The latest version of Bliss supports the following plug-ins:

  • fork://localhost Connects to a pseudo job.Service on the local machine: jobs submitted to a job.Service object instantiated with a fork://localhost URL will execute on the local machine.
  • ssh://hostname Connects to a pseudo job.Service on a remote machine via SSH: jobs submitted to a job.Service object instantiated with an ssh://hostname URL will execute on the remote machine via the login shell.
  • pbs://localhost Connects to a PBS cluster on the local machine: jobs submitted to a job.Service object instantiated with a pbs://localhost URL will be submitted to the queue specified in the job.Description.
  • pbs+ssh://hostname Connects to a remote PBS cluster via SSH: jobs submitted to a job.Service object instantiated with a pbs+ssh://hostname URL will be submitted to the queue specified in the job.Description.
  • sge://localhost Connects to a Sun Grid Engine (SGE) cluster on the local machine: jobs submitted to a job.Service object instantiated with an sge://localhost URL will be submitted to the queue specified in the job.Description.
  • sge+ssh://hostname Connects to a remote SGE cluster via SSH: jobs submitted to a job.Service object instantiated with an sge+ssh://hostname URL will be submitted to the queue specified in the job.Description.
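
The scheme-based selection described above can be pictured with a short sketch. This is not Bliss' actual plug-in loader: the PLUGINS table and the select_plugin function are hypothetical names made up for illustration, and only the URL schemes are taken from the list above.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2

# Hypothetical lookup table: URL scheme -> short plug-in description.
PLUGINS = {
    "fork":    "local fork",
    "ssh":     "remote shell via SSH",
    "pbs":     "local PBS",
    "pbs+ssh": "remote PBS via SSH",
    "sge":     "local SGE",
    "sge+ssh": "remote SGE via SSH",
}

def select_plugin(url):
    """Return (plug-in description, hostname) for a job service URL."""
    parsed = urlparse(url)
    if parsed.scheme not in PLUGINS:
        raise ValueError("no plug-in for URL scheme %r" % parsed.scheme)
    return PLUGINS[parsed.scheme], parsed.hostname

print(select_plugin("pbs+ssh://india.futuregrid.org"))
```

The scheme (the part before ://) says which kind of system to talk to, and the host part says where it is. A URL with a scheme that is not in the table is rejected.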

Once the job.Service object has been created, it can be used to create and start new jobs. To define a new job, a job.Description object needs to be created that contains information about the executable we want to run, the arguments that we need to pass to it, the environment that needs to be set and what requirements we have for our job. Here's an example:

jd = saga.job.Description()

# requirements 
jd.queue  = "development" 
jd.wall_time_limit = 1 # minutes

# environment, executable & arguments
jd.environment = {'MYOUTPUT':'"Hello from Bliss"'}       
jd.executable  = '/bin/echo'
jd.arguments   = ['$MYOUTPUT']

# output options
jd.output = "myjob.stdout"
jd.error  = "myjob.01.stderr"
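
A job description by itself does not start anything; it still has to be handed to the job service. The sketch below shows one way this could look. The method names create_job, run, wait and get_state are assumptions based on the SAGA job API and do not appear on this page, so please check them against the API Doc before relying on them. submit_and_wait is a hypothetical helper; service and description stand for the js and jd objects created above.

```python
def submit_and_wait(service, description):
    """Create a job from a description, start it, and block until it ends.

    Assumes that `service` offers create_job() and that the returned job
    offers run(), wait() and get_state(). These names are assumptions.
    """
    job = service.create_job(description)  # assumed method name
    job.run()                              # assumed: submits and starts the job
    job.wait()                             # assumed: blocks until the job has ended
    return job.get_state()                 # assumed: returns the job's final state
```

With the objects from this page, the call would be state = submit_and_wait(js, jd).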

Exceptions and Error Handling

It is always a good idea to implement proper error handling. Especially when working in distributed environments, things will go wrong sooner or later due to the unreliable nature of distributed resources: a cluster might be down, a network link faulty, etc.

Bliss adheres to the exception mechanism provided by Python. A saga.Exception is raised every time an error is detected at the plug-in level. Bliss calls should hence always be wrapped in a try block:

try:
    # bliss call(s), for example:
    js = saga.job.Service("fork://localhost")
except saga.Exception as ex:
    print("Oh no, something went wrong: %s" % ex)

Debugging and Logging

For debugging purposes, Bliss provides a logging mechanism that can be enabled by setting the environment variable SAGA_VERBOSE to a value between 1 (less verbose) and 5 (very verbose). In very verbose mode, Bliss produces a large number of log messages concerning the internals of the currently active plug-in. If an error is not propagated properly to the application via an exception, examining the logs can help to figure out what went wrong.

Example using bash:

SAGA_VERBOSE=5 python mysagaprog.py

A Note on SSH-Based Plug-Ins

Many of Bliss' plug-ins, like the PBS and SGE plug-ins, provide middleware access tunneled over SSH. For security reasons, Bliss (just like the SSH command-line utility) doesn't provide an option for hardcoded passwords.

In order to use plug-ins that allow SSH tunneling (xyz+ssh://), it is hence necessary to set up password-less ssh-keychain access to the remote hosts you want to use. Otherwise, you will end up with error messages like:

bliss.SSHJobPlugin(0x102054320) - ERROR - Couldn't run job because: Private key file is encrypted

or

bliss.PBSJobPlugin(0x10ebdacb0) - ERROR - Couldn't run job because: Permission denied (publickey,hostbased).

Most systems should come with keychain already installed. If not, a simple yum install keychain (RedHat-based systems), apt-get install keychain (Debian-based systems) or brew install keychain (Mac OS X via Homebrew) should do the trick.

If you're not familiar with SSH keys and authentication mechanisms at all, please refer to this tutorial for an introduction.

Assuming you have your public/private key pair stored in $HOME/.ssh/id_rsa, the following command will ask you for your SSH key's passphrase and add your key to the ssh-agent for subsequent password-less use:

$> keychain $HOME/.ssh/id_rsa

 * keychain 2.7.1 ~ http://www.funtoo.org
 * Found existing ssh-agent: 4175
 * Adding  1 ssh key(s): /Users/oweidner/.ssh/id_rsa
   Enter passphrase for /Users/oweidner/.ssh/id_rsa: 
 * ssh-add: Identities added: /Users/oweidner/.ssh/id_rsa

In order to use this identity, you simply source it into your environment:

$> source ~/.keychain/<your-hostname>-sh
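
Before starting a script that uses one of the SSH-based plug-ins, it can be handy to check that the agent really holds a key. The following is a minimal sketch and not part of Bliss: agent_has_keys is a hypothetical helper, and its run parameter exists only so that the check can be exercised without a real agent. It relies on the exit status of ssh-add -l, which is 0 when identities are listed, 1 when the agent holds none, and 2 when no agent can be contacted.

```python
import subprocess

def agent_has_keys(run=subprocess.call):
    """Return True if `ssh-add -l` reports at least one loaded identity.

    `run` is a hypothetical hook that takes the command as a list and
    returns its exit status; by default the real ssh-add is executed.
    """
    try:
        return run(["ssh-add", "-l"]) == 0
    except OSError:
        # ssh-add is not installed or not on the PATH.
        return False
```

Calling agent_has_keys() in the shell where you sourced the keychain file should return True. If it returns False, the error messages shown above are the likely outcome of a pbs+ssh:// or sge+ssh:// job submission.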