Section: 8. Cluster cells and detect marker genes for clusters | Single-cell RNA-seq data analysis with Chipster | csc

Main course page
  • About the course

    Course contents

    In this course, you will learn how to analyse single-cell RNA-seq data using the Seurat single-cell tools integrated in the easy-to-use Chipster software. The exercises and course data are based on the Seurat guided analyses "Guided tutorial - 2700 PBMCs" and "Introduction to scRNAseq integration".

    This course contains two types of lecture videos: short lectures on each topic by trainers from CSC (ELIXIR-FI), and more in-depth lectures by Paulo Czarnewski (NBIS / ELIXIR-SE), Ahmed Mahfouz (LUMC / ELIXIR-NL) and Jules Gilet (ELIXIR-FR).

    You will learn the following topics, and how to perform these steps in the Chipster software:

    • perform quality control and filter out low-quality cells
    • normalize gene expression values (with global scaling normalization and SCTransform)
    • scale data and remove unwanted sources of variation
    • select highly variable genes
    • perform dimensionality reduction (PCA, tSNE, UMAP, CCA)
    • cluster cells
    • find marker genes for a cluster
    • annotate cells and clusters using reference data
    • take a closer look at the Seurat objects
    • integrate two samples
    • find conserved cluster marker genes for two samples
    • find genes which are differentially expressed between two samples in a cell type specific manner
    • visualize genes with cell type specific responses in two samples

    "It is so nice to be able to do the whole workflow in Chipster, compared to the old model, where I had to transfer the tsv file to R-studio and run Seurat there. -- I learned how to use the Seurat tools in Chipster and what all the steps really mean. I learned to check the results after every step to adjust the next steps parameters and to test different PCA plotting tools. I also learned how to find different genes in the clusters and how to visualize them. I never got this far using the R-pipeline. " -Pinja, course participant & PhD student from University of Helsinki

    Learning objectives

    After this course you should be able to:

    • use the Seurat tools available in Chipster to undertake basic analysis of single-cell RNA-seq data
    • name and discuss the different steps of single-cell RNA-seq data analysis
    • understand the advantages and limitations of single-cell RNA-seq data analysis in general and in Chipster

    Keywords: Chipster, Seurat, single-cell sequencing, RNA-seq, clustering, aligning cells, cluster markers

    Links to material
    The relevant material is linked in each course section.

    Each section of this course contains lecture videos, hands-on exercises and questions/tasks. The tasks can be used to confirm that you have reached the learning goals. You can use the Q&A Forum below to ask questions regarding the course topics or the exercises. Once you have finished all the tasks, you can download a course certificate with a unique course identifier. You can follow your progress with the progress bar on the right. The estimated time to complete the course is 2-3 days. In the certificate we recommend granting 1 credit (ECTS) for the course.  

    For practical matters, please contact event-support (at), and for content-related questions, chipster (at). You can also join the Weekly CSC research user meetings in Zoom to discuss course matters and get help with the exercises.

8. Cluster cells and detect marker genes for clusters


    In this section, you will learn about clustering of the cells, and finding and visualising cluster marker genes.

    We want to know what kinds of cells are present in our dataset, so we cluster the cells and study the similarities in expression within these clusters. Clustering is not a simple task! Luckily, the Seurat tools wrapped in Chipster's clustering tool take care of it, but it is good to understand what happens under the hood.
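    To make the "under the hood" part more concrete, here is a minimal pure-Python sketch of graph-based clustering in the spirit of Seurat's approach (FindNeighbors followed by FindClusters): each cell is linked to its k nearest neighbours in PCA space, and each pair of cells is weighted by the Jaccard index of their neighbour sets, giving a shared-nearest-neighbour (SNN) graph. This is an illustration under stated assumptions, not the code the Chipster tool runs. Seurat partitions the SNN graph with the Louvain algorithm by default; to keep the sketch short, it swaps in a much simpler partition (connected components of the strong edges). All function names, the toy coordinates and the min_weight threshold are hypothetical; the 1/15 pruning cutoff is meant to mirror Seurat's prune.SNN default.

```python
import math

def knn_sets(points, k):
    """For each cell, the indices of its k nearest cells (Euclidean distance).
    The cell itself is at distance 0, so it counts as one of the k."""
    neighbour_sets = []
    for p in points:
        order = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        neighbour_sets.append(set(order[:k]))
    return neighbour_sets

def snn_edges(points, k, prune=1 / 15):
    """Weight each pair of cells by the Jaccard index of their neighbour sets
    and drop edges whose weight is at or below the pruning cutoff."""
    nbrs = knn_sets(points, k)
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = len(nbrs[i] & nbrs[j])
            if shared == 0:
                continue
            jaccard = shared / len(nbrs[i] | nbrs[j])
            if jaccard > prune:
                edges[(i, j)] = jaccard
    return edges

def connected_clusters(n, edges, min_weight):
    """Simplified stand-in for Louvain (hypothetical): cells joined by edges
    with weight >= min_weight end up in the same cluster (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), weight in edges.items():
        if weight >= min_weight:
            parent[find(i)] = find(j)
    # Relabel clusters 0, 1, 2, ... in order of first appearance
    labels, result = {}, []
    for i in range(n):
        root = find(i)
        labels.setdefault(root, len(labels))
        result.append(labels[root])
    return result

def cluster_cells(pca_coords, n_pcs=10, k=20, min_weight=0.2):
    """Cluster cells using only the first n_pcs principal components."""
    points = [tuple(row[:n_pcs]) for row in pca_coords]
    return connected_clusters(len(points), snn_edges(points, k), min_weight)

# Two toy groups of cells with three "PCs" each; only the first two are used
cells = [(0, 0, 9), (0.1, 0, 1), (0, 0.1, 5), (0.1, 0.1, 3),
         (5, 5, 2), (5.1, 5, 8), (5, 5.1, 0), (5.1, 5.1, 4)]
print(cluster_cells(cells, n_pcs=2, k=3))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

The n_pcs argument plays the same role as the "Number of principal components to use" parameter in the exercise: coordinates beyond it are simply ignored when neighbours are computed.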

    First, watch the lecture video about clustering:



    For more theory, you can also watch this lecture video and slides by Ahmed Mahfouz (LUMC, ELIXIR-NL):



    Once we have clustered the cells, we can look for marker genes for these clusters. In this section, you will learn:

    • What is a marker gene?
    • What aspects of scRNA-seq data complicate differential expression analysis?
    • Why do we want to filter out genes prior to statistical testing?

    Watch the lecture on detecting marker genes for clusters:



    After watching the video, reflect on the questions above.

    Next, do the following exercises in Chipster:

    1. Clustering

    Select seurat_obj_PCA.Robj from the previous step. Select the tool Single-cell RNA-seq / Seurat - Clustering. In the parameters, set Number of principal components to use = 10.

    While waiting for the tool to run, you can study the manual (click "More info..." to access the manual page).

    What are the two main steps of this tool?

    When the results are ready, inspect the clusterPlot.pdf. How many clusters are there in this data?

    2. Marker genes for clusters

    Select seurat_obj_clustering.Robj from the previous step and the tool Seurat v4 - Find differentially expressed genes between the clusters. In the parameters tab, set the parameters as indicated below, and run the tool.

    • Find all markers = FALSE
    • Cluster of interest = 3 (note: we want to select the one "lonesome" cluster far away from the others. If you used different parameters in previous steps, the cluster number might be different!)
    • Limit testing to genes which are expressed in at least this fraction of cells = 0.25  
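    The last parameter corresponds to the idea behind Seurat's min.pct option: a gene is tested only if it is detected in at least that fraction of cells in either of the two groups being compared (here, cluster 3 versus all other cells). Skipping rarely detected genes saves computation and avoids spending tests on genes that could hardly be markers anyway. The sketch below is a hypothetical plain-Python illustration of that filter, assuming "detected" means an expression value above zero; it is not the tool's actual code, and the gene names and values are made up.

```python
def min_pct_filter(expr, in_cluster, min_pct=0.25):
    """Keep genes detected in at least min_pct of the cells of either group.

    expr: dict mapping gene name -> list of expression values, one per cell.
    in_cluster: list of booleans, True for cells in the cluster of interest.
    Returns dict gene -> (fraction detected in cluster, fraction in others).
    """
    n_in = sum(in_cluster)
    n_out = len(in_cluster) - n_in
    if n_in == 0 or n_out == 0:
        raise ValueError("both groups must contain at least one cell")
    kept = {}
    for gene, values in expr.items():
        pct_in = sum(1 for v, c in zip(values, in_cluster) if c and v > 0) / n_in
        pct_out = sum(1 for v, c in zip(values, in_cluster) if not c and v > 0) / n_out
        if max(pct_in, pct_out) >= min_pct:
            kept[gene] = (pct_in, pct_out)
    return kept

# Hypothetical toy data: 8 cells, the first 4 belong to the cluster of interest
in_cluster = [True] * 4 + [False] * 4
expr = {
    "GENE_A": [1.2, 0, 2.5, 0, 0, 0, 0, 0],  # detected in 2/4 cluster cells
    "GENE_B": [0, 0, 0, 0, 0, 0.7, 0, 0],    # detected in 1/4 other cells
    "GENE_C": [0, 0, 0, 0, 0, 0, 0, 0],      # never detected
}
print(min_pct_filter(expr, in_cluster))
# {'GENE_A': (0.5, 0.0), 'GENE_B': (0.0, 0.25)}
```

GENE_C is dropped before any statistical test is run, whereas GENE_B is kept because it reaches the 0.25 threshold in the other cells, even though it is never seen in the cluster.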

    Open markers.tsv as a spreadsheet. How many marker genes were found for cluster 3? What are the top two marker genes for this cluster?
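    If you want to double-check these answers outside Chipster, a table like markers.tsv can also be read with a few lines of Python. The sketch below assumes a tab-separated file whose header row has no title for the first (gene name) column, which is what the "Does the first column lack a title" parameter in the filtering step refers to. Apart from avg_log2FC, the column names are assumptions based on Seurat's usual FindMarkers output (p_val, pct.1, pct.2, p_val_adj). It also assumes the rows are already in the order the tool wrote them; Seurat's FindMarkers orders results by p-value, so "top" here means smallest p-value. The sample rows and gene names are made up.

```python
import csv
import io

def read_markers(text):
    """Parse a markers table whose header lacks a title for the gene column.

    Returns a list of dicts, one per gene, with the gene name under "gene".
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    markers = []
    for row in body:
        gene, values = row[0], row[1:]
        # Data rows have one more field (the gene name) than the header
        if len(values) != len(header):
            raise ValueError(f"unexpected number of fields for {gene}")
        record = dict(zip(header, values))
        record["gene"] = gene
        markers.append(record)
    return markers

# Made-up example in the assumed layout (not real results)
sample = (
    "p_val\tavg_log2FC\tpct.1\tpct.2\tp_val_adj\n"
    "GENE1\t1e-50\t3.1\t0.95\t0.10\t1e-46\n"
    "GENE2\t1e-40\t1.4\t0.80\t0.30\t1e-36\n"
    "GENE3\t1e-10\t2.2\t0.40\t0.05\t1e-06\n"
)
markers = read_markers(sample)
print(len(markers))                       # 3
print([m["gene"] for m in markers[:2]])   # ['GENE1', 'GENE2']
```

For a real file, the same function can be called as `read_markers(open("markers.tsv").read())`.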

    Let's check which markers show a more than 4-fold difference in expression between cluster 3 and all other cells. Select markers.tsv and run the tool Filter table by column value from the Utilities category using the following parameters:

    • Column to filter by = avg_log2FC
    • Does the first column lack a title = yes
    • Cutoff = 2 (why do we put 2 here if we want a 4-fold difference?)
    • Filtering criteria = larger-than
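    The cutoff is given on the log2 scale because the avg_log2FC column stores log2 fold changes: a 4-fold difference corresponds to log2(4) = 2. The sketch below shows that arithmetic and a plain-Python equivalent of the larger-than filter on a few made-up rows. It assumes the filter keeps values strictly greater than the cutoff and is not the Utilities tool's own code.

```python
import math

# A 4-fold change on the log2 scale: 2 ** 2 == 4, so log2(4) == 2
fold_change = 4
cutoff = math.log2(fold_change)
print(cutoff)  # 2.0

def filter_larger_than(rows, column, cutoff):
    """Keep rows whose value in `column` is strictly larger than `cutoff`."""
    return [row for row in rows if float(row[column]) > cutoff]

# Made-up rows standing in for markers.tsv (gene name plus avg_log2FC only)
rows = [
    {"gene": "GENE1", "avg_log2FC": "3.1"},
    {"gene": "GENE2", "avg_log2FC": "1.4"},
    {"gene": "GENE3", "avg_log2FC": "2.2"},
    {"gene": "GENE4", "avg_log2FC": "-2.5"},
]
print([r["gene"] for r in filter_larger_than(rows, "avg_log2FC", cutoff)])
# ['GENE1', 'GENE3']
```

GENE4 is more than 4-fold lower in the cluster, yet it is not kept: a larger-than filter on avg_log2FC only picks up genes with higher expression in the cluster of interest.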

    How many genes do you get?

    3. Visualize markers

    Choose seurat_obj_clustering.Robj generated in step 1. Select the tool Single-cell RNA-seq / Seurat - Visualize genes. Type marker gene names in the parameter field (try, for example, MS4A1, LYZ, PF4). You can enter several gene names at the same time, separated by commas (,). Set the parameters:

    • Add labels on top of clusters in plot = yes 
    • Plotting order of cells based on expression = yes
    • Give a list of average expression and percentage of cells expressing in each cluster = yes
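    The table requested by the last parameter summarises, per cluster, how strongly and how widely a gene is expressed. A good marker for a cluster typically combines a high percentage of expressing cells in that cluster with a low percentage elsewhere. Here is a simplified, hypothetical sketch of the two numbers for one gene; the Chipster tool's exact computation (for example, which expression values it averages, or whether they are transformed first) may differ, and the data are made up.

```python
from collections import defaultdict

def expression_summary(values, clusters):
    """Per-cluster average expression and percentage of expressing cells.

    values: expression of one gene in each cell.
    clusters: cluster label of each cell, in the same order.
    Returns {cluster: (average expression, percent of cells with value > 0)}.
    """
    groups = defaultdict(list)
    for value, cluster in zip(values, clusters):
        groups[cluster].append(value)
    summary = {}
    for cluster in sorted(groups):
        vals = groups[cluster]
        average = sum(vals) / len(vals)
        percent = 100 * sum(1 for v in vals if v > 0) / len(vals)
        summary[cluster] = (average, percent)
    return summary

# Made-up expression of one gene in 8 cells from clusters 3 and 0
values = [0, 2, 2, 0, 0, 0, 0, 1]
clusters = [3, 3, 3, 3, 0, 0, 0, 0]
print(expression_summary(values, clusters))
# {0: (0.25, 25.0), 3: (1.0, 50.0)}
```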

    Open biomarker_plot.pdf.

    Are any of your genes good markers for cluster 3? Are the genes you selected good markers for other clusters (check both the plots and the tables)?

    Finally, answer the questions in the quiz.