Introduction to BioHPC

 

This guide is a companion to the BioHPC introductory training session, which is required for all new users. Essential information about BioHPC and how to use our services can be found here. Use it as a quick reference, and in conjunction with the other detailed guides to find your way around our systems. Click the links below to quickly jump to the information you need. Any questions or suggestions can be emailed to biohpc-help@utsouthwestern.edu or submitted using the 'Comment on this page' link in the top right of this site.


About BioHPC
Who and what is BioHPC?
What services does BioHPC provide?
How do I register for an account?
When will my account be activated?
How do I contact BioHPC?

Using BioHPC Storage
What storage is allocated for my account?
About lamella - the BioHPC cloud storage gateway
Working with storage on the web using the lamella web interface
Mounting storage on your Windows PC or Mac
Transferring data using FTP

The BioHPC Portal
Introduction to the BioHPC portal
Training calendar and materials
Portal Guides and FAQs
Accessing cloud services

Using the BioHPC Compute Cluster
Introduction to the compute cluster
The modules system for software
Module commands
Submitting a compute job via the portal
Connecting to an interactive GUI session
Command line access via the portal and SSH
Useful Linux commands
Using the SLURM job scheduler

 


About BioHPC

Who and what is BioHPC?


BioHPC is the high-performance computing group at UT Southwestern. We provide the hardware, services, and assistance necessary for UT Southwestern researchers to tackle large computational problems in their research. BioHPC differs from many HPC centers because of the diverse range of users we serve. We aim to offer easy access to our systems through a wide range of web-based cloud services, so that users can benefit from our large storage systems and powerful compute cluster without needing Linux or HPC expertise.

BioHPC’s core hardware currently consists of a 172-node compute cluster and over 3.5 PB of high-performance storage. Follow the links on this site to our systems page to find out more. Most users will become familiar with the Lamella cloud storage gateway and our compute cluster, Nucleus.

A team of 7 staff manages BioHPC systems, collaborates on research projects, and provides general support to users. Led by the BioHPC director, Liqiang Wang, the team has expertise covering bioinformatics, mathematical simulation, software development, and hardware support. We can be contacted via biohpc-help@utsouthwestern.edu.

 

What services does BioHPC provide?


  • Storage – user and laboratory allocations on our high-performance storage systems, with fast access from the campus 10Gb or 1Gb networks via the web, drive mounts, or FTP.
  • Computing – scheduled jobs and interactive GUI sessions on our 164-node cluster, Nucleus, which contains 128GB, 256GB and 384GB nodes, some with NVIDIA Tesla GPU cards.
  • Visualization – run complex 3D visualization tasks on our cluster GPU nodes, with smooth and responsive access from your BioHPC workstation, Windows PC, or Mac.
  • Cloud services – easy access to our systems via your web browser. Submit a job via the web, access the Nucleus cluster, connect to a GUI visualization session or web desktop (which share the same session within a research group), use our in-house NGS or Galaxy pipelines, store images in our OMERO image bank, collaborate on code with our Git repository service, and more.
  • Training – we offer sessions each Wednesday as part of a comprehensive training calendar targeted at the needs of researchers at UTSW.
  • Support & Help – staff are available to advise and assist you in the effective use of our systems for your research, as well as to troubleshoot problems.

 

How do I register for an account?

Our initial account registration process is automated. Fill out the online registration form, and watch for an email confirming your registration details. Once your account is registered you’ll need to attend a compulsory new user introduction session. These take place on the first Wednesday of each month, at 10:30 am in seminar room NL6.125.

When will my account be activated?

Your account will be fully activated for access to our systems once you have attended the introductory training session. Please make sure you sign in when you take the session, so that we know you have received the required training. Accounts are generally activated in the afternoon following introductory training. At very busy times, with a large number of user registrations, you may need to wait a few hours before receiving your activation email.

In exceptional circumstances, if we have the agreement of your department, the training requirement can be waived for early activation at the request of your PI. In this case we still strongly recommend you attend the training session as soon as possible, and review all the contents of this document before starting to use BioHPC.

How do I contact BioHPC?

You can contact BioHPC staff:

  • By email to biohpc-help@utsouthwestern.edu
  • By telephone at extension 84833
  • In person at our office in the NL building, room NL5.136

We strongly prefer email to biohpc-help@utsouthwestern.edu, as this is tracked using a ticket system, ensuring a prompt response from the member of staff best equipped to answer your question.

Please try to provide as much information as possible when contacting us. Copied and pasted messages, error logs, and screenshots are all appreciated! It is much easier for us if you can follow the questions listed below and answer them as fully as you can; this saves a lot of time in back-and-forth emails requesting additional information.
 
What is the problem?
    Provide any error messages and diagnostic output you have
When did it happen?
    What time? Cluster or client? What job ID?
How did you run it?
    What did you run, what parameters, what do they mean?
Any unusual circumstances?
    Have you compiled your own software? Do you customize startup scripts?
Can we look at your scripts and data?
    Tell us if you are happy for us to access your scripts or data to help troubleshoot.

 


The BioHPC Portal

Introduction to the BioHPC portal


The BioHPC user portal at https://portal.biohpc.swmed.edu is the central location on the web to find information about, and gain access to, all of our services. If you use BioHPC heavily you might want to set the portal as your home page, or bookmark it and visit it regularly.

The home page of the portal highlights news and upcoming training sessions. You can directly view any open support tickets and access our cloud services. The status of the cluster and the current job queue are shown for quick reference. The content of the portal is broken into several sections in the top menu bar:

  • News – a full list of updates from BioHPC, posted when we make important upgrades, introduce new services, or have important announcements for users concerning downtime, etc.
  • About – detailed information about BioHPC, our systems, and our staff. Here you can learn more technical information about our hardware and read the backgrounds of our staff.
  • System Status – access to an overview of our cluster status, the job queue, and cluster usage (restricted).
  • Training – browse our calendar of upcoming training sessions, download slides and materials from past sessions, and find links to other recommended training resources on the web.
  • Guides / FAQs – in-depth guides in a web format that complement our training sessions and provide a reference for users. These guides are kept up to date and added to when new training sessions are given.
  • Cloud Services – links to all of our web-based cloud services that provide easy access to our systems via the web.
  • Software – downloads for software that is recommended or required to access our systems. We provide links to the most common programs that are useful on Windows, Mac, and Linux systems.

 

Training Calendar and Materials

BioHPC holds four training sessions each month: regular sessions on the 1st, 2nd and 3rd Wednesdays at 10am in seminar room NL6.125, plus a drop-in session, Coffee with BioHPC, on the 4th Wednesday at 10am in the same room. We have a calendar of introductory, intermediate and advanced training which operates over the year:

                1st Wednesday - Introductory training for new users (repeated every month)

                2nd Wednesday - Recommended intermediate topics (sessions repeat every 3 months)

                3rd Wednesday - Advanced topics

                4th Wednesday - Coffee with the BioHPC team

Please browse our training calendar to find sessions of interest to you. At a minimum we recommend that all users attend the cloud storage session, and that computational users attend the SLURM job scheduler session.

After each training session, slides and any other materials (source code examples, etc.) are placed on the portal.

 

Portal Guides and FAQs

We’re working hard to improve the tutorial and reference material available on our website. Guides (like this one) will be added to the Guides / FAQs section of the portal as quickly as possible. When a new training session is delivered we’ll try to offer a guide as a companion and reference for the training session.

We’re interested in providing guides for other topics important to users, and improving our existing guides based on user feedback. Hit the ‘Comment on this page’ link, or email biohpc-help@utsouthwestern.edu with any suggestions.


Using BioHPC Storage

What storage is allocated for my account?

Every BioHPC user receives multiple allocations of storage, with different amounts at different locations. The major storage locations on our cluster are known as home2, project and work, reflecting their paths in the cluster filesystem. Standard allocations for each user are:

home2 – A 50GB quota for a home directory at /home2/<username>. This is a small area that can be used to store private configuration files, scripts and programs that you have installed yourself. It should not be used for storing and analyzing large datasets.

project – The /project directory is the main storage area on BioHPC. Space is allocated for each lab group, typically at least 5TB initially. Additional space is available at the request of a lab PI, depending on arrangements with your department. Your lab’s project space can be found at the path /project/<department>/<lab>. Some labs choose to have a folder for each lab member; others have folders for each project. All labs have a ‘shared’ folder inside their project space that can be accessed by anyone in the lab. Another ‘shared’ folder at /project/<department>/shared is accessible by all members of a department.

work – The /work directory is an additional storage area on different hardware from the project space, and may offer better performance for some workloads. Each user is allocated 5TB. It should not be used for long-term storage, and inactive data should be moved to project space. Each user has a work directory at /work/<department>/<username>. A department shared directory can be found at /work/<department>/shared.

lamella – A 100GB private cloud storage allocation, accessible on campus only.

External cloud – A 50GB allocation of web-accessible external cloud storage, which can be shared with researchers outside the UT Southwestern campus.
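
For command-line users, the paths and quota commands below summarize where these allocations live (the quota commands also appear in the Useful Linux Commands section later in this guide). Replace <department>, <lab>, and <username> with your own details:

ls /home2/<username>                # your home directory
ls /project/<department>/<lab>      # your lab's project space
ls /work/<department>/<username>    # your personal work space
quota -ugs                          # home2 and project quota and usage
panfs_quota -G /work                # work quota and usage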

 

Is my data backed up?

We currently backup data on BioHPC as follows:

  • Data stored directly in the lamella system is backed up weekly.
  • /home2 is backed up twice per week (Monday and Wednesday), with two copies kept; total /home2 usage is three times the lab storage allocation.
  • Files under /work are backed up weekly, with one copy kept; total usage is two times the lab storage allocation, except for some heavy users.
  • Files under /project are backed up at the request of PIs. Email BioHPC to specify which directories to back up and how often; weekly incremental backups are performed by default, and old versions of files are kept available.
  • See the BioHPC Cloud Storage Guide for more information.
  • Email biohpc-help@utsouthwestern.edu to request file recovery.

 

About Lamella – the BioHPC cloud storage gateway

Lamella (https://lamella.biohpc.swmed.edu) is our cloud storage gateway. Through lamella.biohpc.swmed.edu you can access your files via a web browser, mount your BioHPC space on your Windows PC, Mac or Linux machine, and transfer files via FTP. Lamella is the gateway system for any non-BioHPC machine to access files stored on BioHPC.

 

Working with storage on the web – using the lamella web interface

A full cloud-storage guide is available separately.

The lamella web interface is available at https://lamella.biohpc.swmed.edu or via the links on the BioHPC portal. Log in to lamella using your BioHPC username and password. Lamella runs ownCloud, a service that provides a similar experience to websites such as Dropbox and Google Drive. You can download the ownCloud client from https://portal.biohpc.swmed.edu/content/software/.

When you first log in to lamella you will be in the files view. This shows your files in lamella cloud storage, a separate 100GB allocation only available via the web or the ownCloud client. To access your BioHPC project and work space you must mount them into the lamella web interface.


To mount project and work storage within the lamella web system choose the ‘Personal’ option from the user menu at the top right of the screen.


Scroll down to the ‘External Storage’ section. You can then mount your main BioHPC space by adding storage definitions as shown below. If successful you will see a green circle on the left of the storage definition, and you will find your storage via the files section of the web interface.

 

Your home directory and BioHPC file exchange (cloud.biohpc.swmed.edu) space are mounted by default. If you want to access your project or work space you must add them here. Type the desired folder name, pick ‘BioHPC Lysosome’ for the External storage option, and choose either the ‘Log-in credentials, save in session’ or the ‘Username and password’ option for authentication.

Project Directory: Enter project in the Share box and the directory inside project (excluding the first /project) you want to access in the Remote subfolder box. E.g. to access your personal project space at /project/department/lab/s999999 you would enter department/lab/s999999 into the Remote subfolder box. To access your lab shared space you would enter department/lab/shared.

Work Directory: Enter work in the Share box and the directory inside work (excluding the first /work) you want to access in the Remote subfolder box. E.g. to access your personal work space at /work/department/s999999 you would enter department/s999999 into the Remote subfolder box. To access your department shared space you would enter department/shared.

If you choose ‘Log-in credentials, save in session’ you won’t be able to share the folders under this directory with other people. If you pick the ‘Username and password’ option and manually type in your username and password, you will be able to share files and folders under this directory. Make sure you actually type your credentials: they may appear to have been filled in automatically, but they won’t work unless you re-enter them and the red frame around the text box disappears.

(The exact paths of your directories may vary; please refer to the activation notice email we sent to you after training.)

 

[Screenshot: external storage definitions for project and work space]

 

Mounting storage on your Windows PC or Mac

You can mount your home2, project and work space on your PC or Mac to access them directly, just like a local hard disk. These use Samba shares, often known as ‘Network Drives’ on Windows or ‘SMB shares’ on Mac.

IMPORTANT - If you use symlinks on Linux, be aware that they behave differently when you mount your storage on Windows or Mac. Because Windows does not have the concept of symlinks, the server follows any symlink present on Linux and provides the actual file over the drive mount, not the link. This means that if you delete a symlink (to a file or folder) from a Windows/Mac drive mount, it may delete the actual files, not just the link itself.

Windows

On Windows, in the ‘Computer’ file browser, click the ‘Map Network Drive’ button on the toolbar.

Pick a drive letter to map your storage to, then enter one of the following addresses to mount home2, project or work space. To mount home2 space, replace <username> with your BioHPC username.

\\lamella.biohpc.swmed.edu\<username>
\\lamella.biohpc.swmed.edu\project                   
\\lamella.biohpc.swmed.edu\work

[Screenshot: Windows ‘Map Network Drive’ dialog]

If you log in to your PC with a username and password other than your BioHPC account, check the ‘Connect using different credentials’ box. Click ‘Finish’ and you’ll be prompted for a username and password. If the computer is not shared with others you might want to select the option to ‘Remember my credentials’ to avoid being prompted for your password each time you connect.

[Screenshot: Windows credentials prompt]

If your connection is successful the BioHPC space you connected to will open in an explorer window. It will also appear in ‘Computer’ as a drive. You can work with files on the mounted drive in the same way as if they were on a local hard disk. Note, however, that you must be on the campus network or connected to the UTSW VPN to obtain access.

[Screenshot: a mounted BioHPC drive in Windows Explorer]
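
If you prefer the command line, you can also map a drive from a Windows Command Prompt using the built-in net use command. The Z: drive letter here is just an example, and the * makes Windows prompt for your BioHPC password:

net use Z: \\lamella.biohpc.swmed.edu\project * /user:<username>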

Mac OSX

To mount your BioHPC storage on your Mac, open a Finder window and choose ‘Connect to Server’ from the ‘Go’ menu at the top of your screen. Enter one of the server addresses listed below and click the ‘Connect’ button. To mount home2 space, replace <username> with your BioHPC username.

smb://lamella.biohpc.swmed.edu/<username>
smb://lamella.biohpc.swmed.edu/project
smb://lamella.biohpc.swmed.edu/work

[Screenshot: Mac ‘Connect to Server’ dialog]

You’ll be prompted to enter your BioHPC username and password, and have the option of saving the password to your keychain if the computer is not shared with others. Click ‘Connect’ and the BioHPC space you mounted will open in a new finder window. You can work directly with files in this space like you would on your local computer.

[Screenshot: Mac credentials prompt]

After a connection is made to lamella from OS X, you’ll find lamella.biohpc.swmed.edu listed in the sidebar of Finder windows. For easier access to individual shares you can turn on desktop icons for the mounted drives:

Open a Finder window and choose Finder->Preferences from the menu bar. Check the ‘Connected servers’ checkbox under ‘Show these items on the desktop’.
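
From a Terminal window you can also trigger the same Finder connection with the open command; the share shown here is just an example:

open smb://lamella.biohpc.swmed.edu/project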

 

Transferring data using FTP

Using FTP for data transfer to/from BioHPC storage might be convenient if you have a very large amount of data to move or are working on the command line. FTP can be faster than Windows or Mac mounted shares, but you cannot directly work on files – you must download and upload between your computer and BioHPC.

To connect using FTP we recommend the FileZilla client, which can be downloaded via the Software section of the portal.

Using your FTP client you will need to connect to:

Host/Server:  lamella.biohpc.swmed.edu
Port:         21

Use your regular BioHPC username and password for the FTP connection.

* Previous host lysosome.biohpc.swmed.edu continues to work from computers on the campus 10Gb network only. New users should always use lamella.biohpc.swmed.edu
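
If you work on the command line instead of a graphical client, a transfer can be scripted. Below is a minimal sketch using the lftp client, assuming it is installed on your machine; the file names are illustrative, and the remote directory layout may differ, so check it with ls after connecting:

lftp -u <username> ftp://lamella.biohpc.swmed.edu
ls                         # list the directories available on the server
cd <remote directory>      # move to the directory you want to work in
put results.tar.gz         # upload a file
get reference.fa           # download a file
exit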


Using the BioHPC Compute Cluster

Introduction to the compute cluster

Our compute cluster is called Nucleus, and currently has 148 nodes. It’s a heterogeneous cluster where the nodes have different specifications. At present there’s a mix of 128GB, 256GB and 384GB nodes, plus 8 nodes with GPU cards. The cluster runs Red Hat Enterprise Linux 6 and uses the SLURM job scheduling software. Not by accident, this is the same basic setup as the TACC Stampede supercomputer in Austin.

To run programs on Nucleus you must interact with the SLURM job scheduler and understand how to use software modules. The job scheduler allocates time on the cluster to users, queueing their jobs and running them when time is free on a compute node. Jobs can be submitted to the scheduler manually via the command line, or more easily using the online submission tool on our web portal. Special visualization jobs can also be submitted via the portal, which allow you to connect to a graphical desktop from your local workstation, Windows PC or Mac. See below for instructions.


The modules system for software

The groups who are members of BioHPC require a wide variety of software, and different users may need different versions. We use a system of ‘modules’ to provide a wide range of packages. On the cluster, clients and workstations you can load modules for the software you need. If you need additional software, or updated versions of existing software, please email us. If you are trying things out, and know how to, you can also install software into your home directory for your sole use.

Here’s an example of using software modules where we want to run the 'cufflinks' RNA-Seq analysis tool. From a command line on the compute cluster or a workstation we can run module list to see the software modules currently loaded in our session. If we try to run the cufflinks command it fails, because the relevant module is not loaded. To run software the system must know the path of the program, and often the location of libraries and configuration; each module provides this information for a particular software package.

We can search for a cufflinks module with module avail cufflinks, and then load it into our session with module load cufflinks, or module load cufflinks/2.1.1 if we want a specific version. Now the output of module list shows the cufflinks module, as well as boost, a library which cufflinks depends on. We can now run the cufflinks command directly to use the software, as below:

[Screenshot: loading the cufflinks module and running cufflinks]
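
For reference, the sequence of commands looks like this (the version number is illustrative and may differ on the cluster):

module list                   # show modules currently loaded
module avail cufflinks        # search for available cufflinks modules
module load cufflinks/2.1.1   # load a specific version; boost is loaded automatically
module list                   # cufflinks and boost now appear in the list
cufflinks                     # cufflinks now runs and prints its usage information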

Module Commands

To use BioHPC software modules effectively familiarize yourself with the following commands, which load, unload, list, search for, and display help about modules. Remember that you can contact biohpc-help@utsouthwestern.edu if you are unsure about a module, which version of a module to use, or if you need additional software setup.

module list
Show loaded modules

module avail
Show available modules

module load <module name>
Load a module, setting paths and environment in the current session

module unload <module name>
Unload a module, removing it from the current session.

module help <module name>
Help notes for a module

module -H
Help for the module command

 

Submitting a compute job via the portal

Once you have transferred data to BioHPC storage, the easiest way to submit a compute job is to use the Web Job Submission tool, which can be accessed from the BioHPC portal Cloud Services menu. This tool allows you to set up a job using a simple web form, automatically creating the job script and submitting it to the SLURM scheduler on the cluster. Work through the form, filling in the fields according to the information below:

[Screenshot: web job submission form]

Job Name – A name for your job, which will be visible in the output of the squeue command and on the job list shown on the BioHPC portal. Use a short but descriptive name, without spaces or special characters, to identify your job.
Modules – When running a job you must load any modules that provide the software packages you require. If you are going to run a MATLAB script you must load a matlab module. Click the Select Modules button to access the module list. You can select any combination of modules with the checkboxes. Note, however, that some combinations don't make sense (e.g. two versions of the same package) and may cause an error on submission.
STDOUT file – Any output your commands would normally print to the screen will be directed to this file when your job runs on the cluster. You will find the file within your home directory, under the portal_jobs subdirectory. You can use the code '%j' to include the numeric job ID in the filename.
STDERR file – Any errors your commands would normally print to the screen will be directed to this file when your job runs on the cluster. You will find the file within your home directory, under the portal_jobs subdirectory. You can use the code '%j' to include the numeric job ID in the filename.
Partition/Queue – The Nucleus cluster contains nodes with different amounts of RAM, and some with GPU cards. The cluster is organized into partitions separating these nodes. You can choose a specific RAM partition, the GPU partition if you need a GPU card, or the super partition. super is an aggregate partition containing all 128GB, 256GB and 384GB nodes, which can be used when it's not important that your job runs on any specific type of node.
Number of Nodes – The number of nodes required for your job. Programs do not automatically run on more than one node - they must use a parallel framework such as MPI to spread work over multiple nodes. Please review the SLURM training before attempting to run jobs across multiple nodes.
Memory Limit (GB) – The amount of RAM your job needs. The options will depend on the partition selected. Choose the lowest amount required, so that your job can be allocated to the widest range of nodes, reducing wait times.
Email me – The SLURM scheduler can send you an email when your job starts running, finishes, etc. You can turn these emails off if you wish.
Time Limit – Try to estimate the amount of time your job needs, add a margin of safety, and enter that time here. The scheduler relies on job time limits to efficiently fit smaller jobs in between larger ones, and jobs with shorter time limits will generally be scheduled more quickly. Beware - this is a hard limit. If your job takes longer than the limit entered it will be killed by the scheduler.
Job Command – The actual commands to run, such as 'matlab -nodisplay < hello.m' to run a MATLAB script, are entered in this section. You can have one or more command groups, each containing one or more commands. All of the commands in a group run at the same time, in parallel, while the groups themselves run sequentially: everything in the first group must finish before the second group begins. By default there is a single command in the first group, hostname, which simply prints the name of the compute node your job runs on. You can replace it with a real command, or add another command group for your own commands.

 

Below the web form you will see the SLURM sbatch script that is being created from your choices. It is updated whenever you make a change in the form. The script contains comments beginning with #, and parameters for the scheduler beginning with #SBATCH. Setting up jobs using the web form, then reviewing the script that is created, is a good way to learn the basics of SLURM job scripts. Note that you can even edit the script before you submit the job, but a script that has been edited cannot be further modified using the web form.

 

[Screenshot: generated SLURM sbatch script]
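
As a reference, a generated script resembles the minimal sketch below; the partition, node count, time limit, output paths, module and command shown here are illustrative and will reflect whatever you chose in the form:

#!/bin/bash
#SBATCH --job-name=my_analysis            # job name shown by squeue
#SBATCH --partition=super                 # partition/queue to run in
#SBATCH --nodes=1                         # number of nodes requested
#SBATCH --time=0-02:00:00                 # time limit (D-HH:MM:SS)
#SBATCH --output=portal_jobs/job_%j.out   # STDOUT file, %j is the job ID
#SBATCH --error=portal_jobs/job_%j.err    # STDERR file
#SBATCH --mail-type=ALL                   # email when the job starts/ends

module load matlab                        # load required software modules

matlab -nodisplay < hello.m               # the job command itself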

When you are happy with the settings and commands for your job you can submit it using the button at the bottom of the page, below the script. If all settings are okay the portal will report that the job has been submitted and supply a Job ID. If there is a problem you may receive an error message from sbatch, which the portal will display. You can email biohpc-help@utsouthwestern.edu with any questions or problems you have submitting jobs.


Connecting to an interactive GUI visualization session

BioHPC allows interactive use of cluster nodes, for visualization, debugging and running interactive software, through the portal's Web Visualization service. Using this facility you can run a Linux desktop session on a cluster node (webGUI), on a GPU node with 3D OpenGL acceleration (webGPU), or start a powerful Windows 7 virtual machine with 3D acceleration (webWinDCV). To start a session and connect to it, use the Web Visualization link in the Cloud Services menu of the portal site:

[Screenshot: Web Visualization page]

The page that is displayed lists the connection information for any running visualization sessions. At the bottom is a form allowing you to start a new session. Choose the type of session you need and click the submit button to queue the visualization job on the cluster. All visualization jobs are limited to 20 hours. They run on cluster nodes and are managed by the SLURM scheduler, so it can take some time for them to start when the cluster queue is busy. Once a session has started you will see VNC/DCV connection details for your session, and a link to connect directly from your web browser. The screenshot below shows MATLAB running in a webGPU session with a connection made from the web browser:

[Screenshot: MATLAB running in a webGPU session, connected from the web browser]

The web browser connections are convenient, but a smoother response is possible by connecting with a VNC client, particularly if you are using a wired network connection on campus. Connection details are displayed for each running session. If you are not using a BioHPC client or workstation you can download the TurboVNC client using the links provided; TurboVNC is the recommended VNC client for best performance and compatibility with our sessions. To resize the TurboVNC display, increase the resolution of the remote session via its System->Preferences->Display menu.

When using a webGPU session to run 3D visualization software you need to start programs with the vglrun command. Type vglrun in front of the command line you would usually use to start your software, e.g. vglrun paraview. This is necessary to ensure that the 3D images rendered on the GPU card of the cluster node can be passed back to your VNC connection.
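
To check that 3D acceleration is working in your session, you can run a simple OpenGL test program through vglrun, for example (assuming glxgears is available):

vglrun glxgears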

The Windows virtual machine webWinDCV sessions perform best when the dedicated DCV client is used to connect. This offers far better performance for 3D graphics than a standard VNC connection. A download link is provided on the Web Visualization page.


Command line access via the portal and SSH

If you are comfortable using the Linux command line, or want to learn, you can log in to our systems using Secure Shell (SSH) to the nucleus.biohpc.swmed.edu cluster head node. The head node of the cluster allows users to log in, manipulate and edit files, compile code, and submit jobs. It should not be used to run analyses, as this will affect the responsiveness of the system for other users.

The easiest way to log in to the cluster is to use the Cloud Services - Nucleus Web Terminal page of the portal website. This provides a command line interface inside your web browser. You will need to enter your password when the connection is made. Note that if you close your browser, or browse to a different page, your connection will close.

[Screenshot: Nucleus web terminal]

To log in using a stand-alone SSH client, connect to nucleus.biohpc.swmed.edu using your BioHPC username and password.

If you are using a Mac or Linux computer, you can use the ssh command in a terminal window - ssh username@nucleus.biohpc.swmed.edu

If you are using a Windows PC you will need to download an SSH client program. We recommend PuTTY, which is available via a link in the portal Software section.

 

Useful Linux Commands

The following commands are useful when working with Linux on BioHPC. See also the material from our Linux command line & scripting training session.

quota -ugs
Show home directory and project directory quota and usage

panfs_quota -G /work
Show work directory quota and usage

du -sh <directory>
Show the size of a specific directory and its contents

squeue
Show cluster job information

sinfo
Show cluster node status

sbatch myscript.sh
Submit a cluster batch job using a script file

cat filename
Display a file on the screen

less filename
Display a file so that you can scroll up and down; press ‘q’ to quit

vi or vim
Powerful text editors, with a cryptic set of commands! See http://www.webmonkey.com/2010/02/vi_tutorial_for_beginners

nano
Simpler, easier to use! See http://mintaka.sdsu.edu/reu/nano.html

 

Using the SLURM Job Scheduler

Earlier in this document, we described how to submit a job using the web job submission service on the BioHPC portal. You can also work with the cluster's SLURM job scheduler from the command line. If you are familiar with another job scheduler (e.g. PBS, SGE), SLURM is similar but with different names for the common commands.

A comprehensive training session on using the SLURM job scheduler is given every 3 months. Please check the BioHPC training calendar and the materials from past sessions for more information.

squeue - view the job queue - From any BioHPC system you can run the squeue command to display a list of jobs currently being managed by the scheduler. The job status is shown as a 1-2 letter code, and times are in Days-Hours:Mins:Seconds format:

[Screenshot: squeue output]

A more complete output, including the full status and the time-limit for each job can be obtained using the -l option to squeue:

[Screenshot: squeue -l output]

The list of jobs on the cluster can be long at times. To see only your own jobs use the -u <username> option, e.g. squeue -u dtrudgian

sbatch - submit a job - If you are comfortable writing SLURM job scripts you can submit them to the cluster using the sbatch command. In the sample below the script myjob.sh was created using the vi editor, then submitted with sbatch myjob.sh. The output of the sbatch command is a numeric job ID if successful, or an error message if there are problems with your job script. Once you have submitted a job you can see it in the cluster queue with the squeue command. In the example, the job is waiting to run with the reason given as (RESOURCES), as the scheduler is waiting for a node to become available:

 

[Screenshot: submitting a job with sbatch and checking it with squeue]

When a node became available the job executed successfully, and its output messages were written to the file specified in the job script. In this case the file was named job_26948.out, and we can view its contents with the cat command:

[Screenshot: viewing the job output file with cat]

scancel - cancelling a submitted job - If you make a mistake when submitting a job, please cancel it so that it does not occupy time on the cluster. The scancel command takes a job ID as its only argument. In the example below we use the command scancel 26953 to stop our job running. We can check that it was cancelled correctly by examining the output of squeue: the job is no longer in the cluster queue. If you don't remember the job ID of a job you need to cancel, check the output of squeue -u <username>, which lists the details of all of your current jobs.

[Screenshot: cancelling a job with scancel]
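
Putting these commands together, a typical command-line workflow looks something like the sketch below (the job ID and username are illustrative):

nano myjob.sh             # create or edit your job script
sbatch myjob.sh           # submit it; sbatch prints a numeric job ID
squeue -u <username>      # check the status of your jobs
scancel 26953             # cancel a job using its job ID
squeue -u <username>      # confirm the job has left the queue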


Next Steps

Now that you have worked through this introduction to BioHPC you should experiment with our systems! Make sure you can successfully log in to the portal, use the web-based job submission tool, and connect to a web visualization session.

We offer training sessions on our cloud storage system, the SLURM scheduler, and our clients and workstations in a 3-month rotation on the 2nd Wednesday of each month. Advanced topics are covered on the 3rd Wednesday of each month with a drop-in coffee session held on the 4th Wednesday. Please check the training calendar and make a note of any sessions that are applicable to your work.

If you have any questions, comments, or suggestions please contact us via biohpc-help@utsouthwestern.edu or use the 'Comment on this page' link above the menu bar.


Last updated Dec 8th, 2016, YC.