In this session we demonstrate the use of Seurat tools for the joint analysis of two samples. The session uses the same data and follows the same steps as the Seurat tutorial for integrated analysis: https://satijalab.org/seurat/v3.0/immune_alignment.html
We begin with two expression matrices: one for control PBMC cells and another for PBMC cells stimulated with interferon beta. The cells will now cluster not only by cell type but also by treatment, which makes the analysis a bit more complex.
We wish to:
- Identify cell types that are present in both datasets
- Obtain cell type markers that are conserved in both control and stimulated cells
- Compare the datasets to find cell-type specific responses to stimulation
The first steps of the analysis are already familiar to us. After we have preprocessed both samples, we combine them, perform the integrated analysis, find markers for samples and for clusters, and visualise these.
The process of integrating samples is described in more detail in the Methods section of the paper by Stuart*, Butler*, et al., Cell 2019. (You can access the paper on bioRxiv: https://www.biorxiv.org/content/10.1101/460147v1)
Please watch the video for an introduction to two-sample analysis:
We are switching to another dataset and another Chipster session, with two samples.
Please go through these exercises:
1. Open example session
Click Sessions and open training session course_single_cell_RNAseq_Seurat_integrated.
2. Setup Seurat object & quality control
Select the immune_control_expression_matrix.txt.gz. Select tool Single cell RNA-seq / Seurat -Setup and QC. Check the parameters, and name your sample CTRL. You can also name the project and set a slightly stricter filtering parameter (for example, keep only genes expressed in at least 5 cells). Make sure that you have assigned the file correctly: this is a digital expression matrix (a DGE table). Run the tool.
Repeat this step for the immune_stimulated_expression_matrix.txt.gz, but this time name the sample STIM.
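To make the gene filter in step 2 concrete, here is a minimal Python sketch of a "keep genes expressed in at least 5 cells" rule. It is not the code the Chipster tool runs; the gene names, counts and the MIN_CELLS variable are all made up for illustration, and it assumes that "expressed" means a non-zero count.

```python
# Toy digital expression matrix: one row per gene, one count per cell.
# Gene names and counts are invented for illustration only.
counts = {
    "GENE_A": [0, 2, 1, 0, 3, 1, 4],  # non-zero in 5 cells
    "GENE_B": [0, 0, 0, 1, 0, 0, 0],  # non-zero in 1 cell
    "GENE_C": [5, 1, 2, 2, 1, 3, 1],  # non-zero in 7 cells
}

MIN_CELLS = 5  # hypothetical threshold: keep genes detected in at least 5 cells


def cells_expressing(gene_counts):
    """Count the cells in which the gene has a non-zero count."""
    return sum(1 for c in gene_counts if c > 0)


# Keep only the genes that pass the threshold.
kept_genes = [g for g, row in counts.items() if cells_expressing(row) >= MIN_CELLS]
print(kept_genes)
```

A stricter threshold removes more rarely detected genes, which is why raising it makes the filtering "stricter".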
3. Filtering, regression and detection of variable genes
Select seurat_obj.Robj. Select the tool Single cell RNA-seq / Seurat - Filter cells, normalize, regress and detect variable genes. Adjust the parameters so that you filter out cells that have fewer than 500 genes expressed, and run the tool. Repeat for the other sample as well.
When the tool has finished, open Dispersion_plot.pdf and also check the second page.
How many variable genes are there?
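The cell filter in step 3 can be illustrated with a small Python sketch. The cell names and numbers below are made up, and the sketch assumes that a cell with exactly 500 detected genes is kept; how the tool treats that boundary is not stated here, so check the tool manual.

```python
# Number of genes detected in each cell (invented values for illustration).
genes_detected = {
    "CELL_1": 1200,
    "CELL_2": 340,
    "CELL_3": 500,
    "CELL_4": 499,
}

MIN_GENES = 500  # filter out cells with fewer than 500 genes expressed

# Assumption: cells at exactly the threshold are kept.
kept_cells = [cell for cell, n in genes_detected.items() if n >= MIN_GENES]
removed_cells = [cell for cell, n in genes_detected.items() if n < MIN_GENES]

print("kept:", kept_cells)
print("removed:", removed_cells)
```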
4. Combine two samples
Select both seurat_obj.Robj objects from the previous step and run the tool Single cell RNA-seq / Seurat –Combine two samples.
Please watch the video for aligning two samples and clustering:
For more theory, you can view the lecture video by Ahmed Mahfouz (slides):
Then go through these exercises:
5. Integrated analysis of two samples
Select the combined seurat_obj.Robj from the previous step. Run the tool Single cell RNA-seq / Seurat –Integrated analysis of two samples with default parameters.
While waiting, you can study the manual (click More info...). What are the main steps of this tool?
When the results are ready, study the integrated_plot.pdf. How many clusters are there in this data?
Please watch the video for finding differentially expressed genes and conserved cluster markers:
Then go through these exercises:
6. Find conserved cluster markers and DE genes in two samples
Select the seurat_obj.Robj from the previous step. Run tool Find conserved cluster markers and DE genes in two samples for a cluster of your interest. Inspect the tables generated by the tool.
What was used as a cut-off for the adjusted p-value?
How many differentially expressed genes were there between the two samples in this cluster? Write down a few interesting genes from the list for the visualization exercise in step 7.
How many conserved biomarkers were recognized for the cluster? Write down a few interesting genes from the list for the next tool.
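When you inspect the result tables in step 6, it helps to keep in mind what an adjusted p-value cut-off does. The Python sketch below is only an illustration: the gene names and p-values are invented, and the cut-off of 0.05 is a hypothetical value, not necessarily the one the tool uses (that is what the question above asks you to find out).

```python
# Invented (gene, adjusted p-value) pairs standing in for a result table.
de_table = [
    ("GENE_1", 0.001),
    ("GENE_2", 0.20),
    ("GENE_3", 0.049),
    ("GENE_4", 0.06),
]

P_ADJ_CUTOFF = 0.05  # hypothetical cut-off; look up the real one in the tool manual

# Keep genes whose adjusted p-value is below the cut-off,
# then order them from most to least significant.
significant = sorted(
    (row for row in de_table if row[1] < P_ADJ_CUTOFF),
    key=lambda row: row[1],
)

for gene, p_adj in significant:
    print(gene, p_adj)
```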
7. Visualize markers and differentially expressed genes
Choose the seurat_obj.Robj generated in step 5. Select tool Single cell RNA-seq / Seurat - Visualize genes with cell type specific responses in two samples. Type the gene names into the parameter field (the ones you listed in the previous step, or try for example: CD3D, GNLY, IFI16, ISG15, CD14, CXCL10). Use a comma (,) as the separator. You can run the tool several times with different gene lists.
Are the differentially expressed genes expressed differently also in other clusters? Are the conserved markers expressed in other clusters?
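As an aside on the comma-separated gene list in step 7, the following Python sketch shows one way such a field could be split into individual gene names. It is an assumption about how a list like this can be parsed, not a description of how the Chipster tool reads the parameter.

```python
# The example gene list from step 7, as it would be typed into the field.
gene_field = "CD3D, GNLY, IFI16, ISG15, CD14, CXCL10"

# Split on commas, trim surrounding spaces, and drop empty entries
# (for example those caused by a trailing comma).
genes = [name.strip() for name in gene_field.split(",") if name.strip()]

print(genes)
print(len(genes), "genes")
```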