Bioinformatics tools often require installing different dependencies in a controlled environment. Containers allow you to logically package your application (e.g., a bioinformatics tool) together with its libraries and other dependencies, providing isolated environments for running your software services. A containerised application runs in an isolated runtime environment that is independent of the actual environment in which it is running (e.g., a private data center, the public cloud, or even a developer’s personal laptop). Containers have recently gained popularity among developers and system administrators as a standard way to distribute, deploy, and run services. This course will focus on the deployment of containerised applications in HPC environments. The course will also introduce a modern workflow manager, Nextflow, for performing complex analyses in bioinformatics.

In March you can spend your Tuesday and Wednesday mornings learning how to use the Puhti supercomputer to do your data processing or simulations effectively, and how to handle your data not only during the analysis, but also before and after it! Join the BioMonth 2021 classes to get the most out of our services* and to get your data management in shape!

* The CSC services discussed in this course are free of charge for academic research, education and training purposes in Finnish higher education institutions and in state research institutes (subsidised by the Ministry of Education and Culture, Finland).

With one registration you get six half-days (9:00-12:00) of interesting topics in March 2021:

16.3. and 17.3.
23.3. and 24.3.
30.3. and 31.3.


Puhti is CSC's supercomputer, comprising powerful CPU partitions with a wide range of memory sizes and local storage options. Puhti allows users to reserve compute and memory resources flexibly, so they can run anything from interactive single-core data processing to medium-scale simulations spanning multiple nodes. Puhti has a wide selection of scientific software installed.

Allas is CSC's general-purpose research data storage server. It is part of the CSC storage portfolio and can be accessed from the CSC servers as well as from anywhere on the internet. Allas can be used both for static research data that needs to be available for analysis and for collecting and hosting accumulating or changing data.

Good research data management is the basis of successful research. Research data management (RDM) concerns the management and organisation of data both during and after the active phase of a project. It is important to plan all stages of data management, from collecting and processing the data to publishing and sharing it, in a Data Management Plan (DMP). This will increase the impact and visibility of your work and enable reuse of the data in the future.

Later in this course we will introduce HPC-compliant containers called Singularity containers, which allow Puhti users to run their applications in a containerised environment. These containers can serve as an alternative to conda packages.

Puhti (16.-17.3. 9:00-12:00 EET (UTC+2))
Getting started with Puhti
Module system
Data storage in Puhti
Running sbatch jobs
Performance analysis
Running interactive jobs in Puhti

Data (23.-24.3. 9:00-12:00 EET (UTC+2))
What is Allas? 
Projects, clients and interfaces
Examples of storing, using and sharing small or large datasets
Examples of using Allas from Puhti and from your local environment
Research data management: what happens before and after computing?
Data management planning
Sensitive data services offered by CSC
Publishing and sharing data after the project

Containers (30.-31.3. 9:00-12:00 EET (UTC+2)) 
Introduction to Singularity
Running applications as Singularity containers
Building Singularity containers
Converting conda packages into Singularity applications
Workflows with Singularity containers (if this part spans two days)

This is a basic-level course on containers, which are a modern way of deploying complex applications. In this course, the basics of Docker containers and containerised applications in bioinformatics will be discussed. Emphasis will be placed on deploying various pre-existing dockerised bioinformatics applications. Selected examples from different omics disciplines, such as genomics, proteomics and metabolomics, will be covered. We will also introduce HPC-compliant containers in the CSC environment.

This introductory course covers single cell RNA-seq data analysis methods, tools and file formats. The free* and user-friendly Chipster software is used in the exercises, and the course is thus suitable for everybody.
* See: