Introduction

Last updated: 2024-03-19

Checks: 7 passed, 0 failed

Knit directory: ProtocolLabRotationSaezRodriguezGroup/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20240306)

The command set.seed(20240306) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 7cb8923

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 7cb8923. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    analysis/.Rhistory

Untracked files:
    Untracked:  10X_Visium_ACH005.tar.gz
    Untracked:  ACH005/
    Untracked:  analysis/10X_Visium_ACH005.tar.gz
    Untracked:  analysis/ACH005/
    Untracked:  analysis/hca_p14.rds
    Untracked:  bc_metadata.tsv
    Untracked:  data/10X_Visium_ACH005.tar.gz
    Untracked:  data/ACH005/
    Untracked:  data/bc_metadata.tsv
    Untracked:  data/hca_p14.rds
    Untracked:  data/imc_bc_optim_zoi.RDS
    Untracked:  data/omni_resource.csv
    Untracked:  hca_p14.rds
    Untracked:  imc_bc_optim_zoi.RDS
    Untracked:  omni_resource.csv
    Untracked:  omnipathr-log/
    Untracked:  result/
    Untracked:  results/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/Introduction.Rmd) and HTML (docs/Introduction.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	e2e128e	leotenshii	2024-03-18	samll additions
html	e2e128e	leotenshii	2024-03-18	samll additions
html	a00c940	leotenshii	2024-03-17	Build site.
Rmd	b512668	leotenshii	2024-03-17	small updates
html	1e4a2af	leotenshii	2024-03-17	Build site.
html	ec66e86	leotenshii	2024-03-17	Build site.
Rmd	5f6f819	leotenshii	2024-03-17	wflow_publish(c("analysis/_site.yml", "analysis/Introduction.Rmd",

General

Single-cell technologies have revolutionized our understanding of cellular processes by enabling the discovery of new gene regulation mechanisms and protein expression dynamics. A significant limitation of single-cell methods, however, is that they remove cells from their spatial context, which prevents unraveling the complex interplay between cells in tissues. To address this challenge, spatial omics methods have emerged that analyze different biomolecules directly in tissue samples and thereby preserve the spatial context. These methods bridge the gap between single-cell resolution and spatial information and provide invaluable insights into cellular interactions, tissue structure, and disease mechanisms. In this context, spatial transcriptomics and spatial proteomics have proven to be powerful tools for analyzing the spatial organization of RNA and proteins.

Spatial transcriptomics

Spatial transcriptomics makes it possible to analyze RNA-seq data at a spatial level by capturing the spatial context and transcriptional patterns at the same time. Yue et al. (1) categorize the methods into four groups.

In situ hybridization-based technologies use labeled complementary probes to bind and detect specific mRNA targets in tissue samples. The second group comprises in situ sequencing-based technologies, in which RNA molecules are sequenced directly in the tissue sample using steps such as reverse transcription, cDNA circularization, and rolling-circle amplification. Next-generation sequencing (NGS)-based technologies form the third group; they capture RNA from tissue sections using RNA-capturing slides or DNA barcoding. The last group contains spatial information reconstruction technologies, which combine RNA sequencing with computational methods such as image reconstruction or iterative algorithms.

Choosing the appropriate spatial transcriptomics technology involves weighing gene throughput, sensitivity, resolution, and feasibility against each other. Knowing the advantages and limitations of each method is therefore essential for choosing the right one for a given experiment.

Visium 10x

Visium (2) is a spatial transcriptomics technology from 10x Genomics. It belongs to the NGS-based technologies that use RNA-capturing slides. Tissue sections are placed on a special slide with capture areas, each containing about 5,000 spots of barcoded capture probes. The tissue is fixed, stained, and imaged, and then permeabilized so that the released mRNA binds to the capture probes. The captured RNA is reverse-transcribed, a sequencing library is constructed, and the library is sequenced. Because each spot carries a unique spatial barcode, every sequenced transcript can be linked to its location on the tissue section.
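The barcode-to-location step can be illustrated with a minimal base R sketch. Everything in it is hypothetical toy data: the short barcodes, the counts, and the column names `array_row`/`array_col` were chosen for illustration and are not the actual Visium output format.

```r
# Hypothetical toy data: a gene-by-barcode count matrix from sequencing ...
counts <- matrix(c(5, 0, 2,
                   1, 7, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("GeneA", "GeneB"),
                                 c("AAACG-1", "AAACT-1", "AAAGA-1")))
# ... and a (hypothetical) table mapping each spot barcode to its slide position.
positions <- data.frame(barcode = c("AAAGA-1", "AAACG-1", "AAACT-1"),
                        array_row = c(0, 2, 4),
                        array_col = c(1, 3, 5))

# The spatial barcode is the key that joins each spot's transcript counts
# to its position on the capture area.
idx <- match(colnames(counts), positions$barcode)
spot_table <- cbind(positions[idx, c("barcode", "array_row", "array_col")],
                    t(counts))
rownames(spot_table) <- NULL
spot_table
```

Each row of `spot_table` now holds one spot's coordinates together with its expression profile, which is the starting point for spatial analyses such as MISTy.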

Visium can profile the whole transcriptome at a spatial resolution of 55 µm, corresponding to the diameter of a spot. It therefore cannot provide single-cell resolution, since eukaryotic cells typically have a diameter of around 10 µm, so a single spot can contain several cells. Furthermore, its RNA capture efficiency is relatively low (3). Still, Visium is one of the most popular commercially available spatial transcriptomics assays.
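To make the resolution argument concrete, here is a rough area comparison that treats both the 55 µm spot and a 10 µm cell as circles. This is an idealization that ignores cell packing and section thickness, so it indicates scale rather than an actual number of cells per spot:

$$\frac{A_\text{spot}}{A_\text{cell}} = \frac{\pi\,(55\,\mu\text{m}/2)^2}{\pi\,(10\,\mu\text{m}/2)^2} = \left(\frac{55}{10}\right)^2 \approx 30$$

The footprint of one spot is thus roughly 30 times the cross-section of a single cell, so each spot's expression profile is generally a mixture of signals from more than one cell.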

Spatial Proteomics

Spatial proteomics enables the analysis of protein localization within tissues, providing insights into spatial expression patterns. Image-based methods visualize protein locations in tissues by targeting proteins with specific antibodies. Moffitt et al. (4) group these methods into cyclic fluorescent and one-step mass-tag approaches. Both rely on the same principle: antibodies are tagged with fluorophores, DNA barcodes, enzymes, or metal tags, and the signal is then read out by fluorescence imaging or mass spectrometry.

Cyclic fluorescent approaches repeat cycles of staining, imaging, and signal removal until all target proteins have been measured. When antibodies are DNA-barcoded, only one round of antibody staining is needed, which makes the assay faster. In contrast, one-step mass-tag methods require no cycling: the tissue is immunostained once with antibodies conjugated to ionizable metal mass tags, and the data are then acquired by mass spectrometry. These methods are more robust than cyclic fluorescent methods but also more expensive.

Compared with spatial transcriptomics, these methods can give a better estimate of cellular activity, because RNA and protein levels do not always correlate.

Imaging Mass Cytometry (IMC)

IMC (5) belongs to the one-step mass-tag technologies. The tissue is stained with antibodies, each tagged with a unique rare-earth metal isotope. The tissue is then ablated in a grid using a laser with a spot size of 1 µm, and the ablated material is transferred to a CyTOF mass cytometer. The signal of each isotope is extracted for all measured markers and plotted according to the spot coordinates, and the resulting layers are stacked into a high-dimensional image. Finally, the image is segmented into single cells before downstream data analysis.
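The layering step can be sketched in base R, assuming the ablation output is already a table of grid coordinates plus one ion count per marker for each laser spot. The grid size, the generic marker names, and the Poisson-simulated counts are hypothetical; real IMC processing pipelines handle file parsing, calibration, and segmentation as well.

```r
# Hypothetical ablation output: one row per 1 µm laser spot on a 4 x 3 grid,
# with simulated ion counts for three metal-tagged markers.
set.seed(1)
n_x <- 4
n_y <- 3
markers <- c("marker_1", "marker_2", "marker_3")
spots <- expand.grid(x = seq_len(n_x), y = seq_len(n_y))
counts <- matrix(rpois(nrow(spots) * length(markers), lambda = 5),
                 ncol = length(markers), dimnames = list(NULL, markers))

# Place each marker's signal at its spot coordinates, one layer per marker,
# giving an x-by-y-by-marker image stack.
img <- array(NA_real_, dim = c(n_x, n_y, length(markers)),
             dimnames = list(NULL, NULL, markers))
for (m in seq_along(markers)) {
  img[cbind(spots$x, spots$y, m)] <- counts[, m]
}
dim(img)
```

Each slice `img[, , "marker_k"]` is a single-marker image; single-cell segmentation would then operate on this stack.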

IMC can image up to 40 markers simultaneously, and it is expected that more than 100 markers can be imaged in the future. IMC is also faster than comparable methods such as Multiplexed Ion Beam Imaging (MIBI), at the cost of a larger spot size.

MISTy

As the number of spatial omics methods grows, so does the number of tools for analyzing their data. Examples include tools for neighborhood analysis (e.g. Squidpy (6)), identification of spatially variable genes (e.g. SpatialDE (7)), and spatial deconvolution (e.g. Cell2location (8)). Discovering interesting spatial patterns with these tools raises the question of which interactions create them. One approach to answering this question is MISTy (Multiview Intercellular SpaTial modeling framework) (9), available as the R package mistyR.

MISTy uses an explainable machine learning approach to analyze spatial omics data within and between spatial contexts called views. The baseline of the model is the intraview, which captures relationships within a spatial unit, be it a spot or a cell. Further views are optional. Two built-in views that take spatial context into account are the juxtaview and the paraview. The juxtaview considers marker expression in the immediate neighbors of a spatial unit, while the paraview includes marker expression in the broader tissue, weighted by distance to the spatial unit. The user can set the radius of these views and add custom views. In the first step, for each view, the expression of each target marker is predicted from the expression of the other markers in that view's spatial context. By default, this is modeled with a random forest, but other algorithms are available. In the second step, the contribution of each view to predicting a marker's expression is estimated by fitting a ridge regression to the predictions obtained in the first step. The output contains performance, contribution, and importance estimates. The downstream analysis can answer the following questions:
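The two-step idea can be illustrated with a minimal base R sketch. It is not the mistyR implementation and makes several simplifying assumptions: the spots and markers are simulated, the paraview uses a Gaussian kernel of the form exp(-d²/l²) with an arbitrary radius `l`, linear models stand in for the random forests, in-sample fitted values are used instead of held-out predictions, the ridge penalty `lambda` is arbitrary, and the juxtaview is omitted.

```r
# Sketch of a MISTy-like two-step model (simplified, NOT mistyR's code).
set.seed(20240306)
n <- 200
coords <- cbind(x = runif(n, 0, 100), y = runif(n, 0, 100))
expr <- matrix(rnorm(n * 3), ncol = 3, dimnames = list(NULL, c("A", "B", "C")))

# Paraview weights: every other spot, down-weighted with distance
# (assumed Gaussian kernel, illustrative radius l).
l <- 10
w <- exp(-as.matrix(dist(coords))^2 / l^2)
diag(w) <- 0  # a spot is not part of its own surroundings

# Simulate target A from B in the same spot plus C in the surroundings.
expr[, "A"] <- 0.8 * expr[, "B"] + 0.5 * drop(w %*% expr[, "C"]) +
  rnorm(n, sd = 0.3)
para <- w %*% expr  # paraview: weighted sums of all markers around each spot

target <- expr[, "A"]
intra_X <- expr[, c("B", "C")]  # intraview: other markers in the same spot

# Step 1: one model per view predicts the target
# (linear models stand in for random forests here).
pred_intra <- fitted(lm(target ~ intra_X))
pred_para <- fitted(lm(target ~ para))

# Step 2: ridge meta-model on the view-specific predictions; its
# coefficients play the role of view contributions.
ridge <- function(X, y, lambda = 1) {
  Xc <- scale(X, scale = FALSE)
  drop(solve(crossprod(Xc) + lambda * diag(ncol(Xc)),
             crossprod(Xc, y - mean(y))))
}
P <- cbind(intra = pred_intra, para = pred_para)
contrib <- ridge(P, target)
pred_multi <- mean(target) + drop(scale(P, scale = FALSE) %*% contrib)

# Performance: R2 of the intraview alone vs. the multiview model.
r2 <- function(y, yhat) 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
intra_R2 <- r2(target, pred_intra)
multi_R2 <- r2(target, pred_multi)
round(c(intra.R2 = intra_R2, multi.R2 = multi_R2,
        gain.R2 = multi_R2 - intra_R2), 2)
round(contrib, 2)
```

Because the simulated target depends on marker C in neighboring spots, the multiview R² exceeds the intraview R², and the paraview receives a clearly non-zero contribution. A real analysis should use out-of-sample predictions in step 1 so that the meta-model does not reward overfitted views.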

To what extent can marker expression in the surrounding tissue explain marker expression in a spot, and how does this compare to the intraview alone? Three statistics are relevant here: the R² (fraction of variance explained) of the intraview model, the R² of the multiview model, and the resulting gain in R² contributed by the non-intraviews. These show which markers are better explained when influences from outside the spot are taken into account. The contribution of each view to the improved prediction can also be visualized.
What specific relations explain the contributions? To answer this, the importance of each marker as a predictor of every target marker is visualized separately for each view. Interesting predictor-target pairs can then be visualized and analyzed further with other tools.
What drives differences in the performance, contribution, and importance measurements between samples? After MISTy has been trained on multiple samples, each sample is represented in the result space by a vector, its signature. There are three types of signatures: performance, contribution, and importance. Based on these signatures, the causes of differences between samples can be analyzed. This is done by performing PCA on the signatures and identifying the factors responsible for how the samples cluster.
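The PCA step on signatures can be sketched with base R's `prcomp`. The signature matrix below is simulated and hypothetical: each row is a sample, each column a per-marker performance value (e.g. multiview R²), and the two groups stand in for a clinical variable such as tumor grade.

```r
# Hypothetical performance signatures: 10 samples x 4 target markers.
set.seed(1)
groups <- rep(c("grade_1", "grade_3"), each = 5)
sig <- matrix(rnorm(10 * 4, mean = 0.4, sd = 0.05), nrow = 10,
              dimnames = list(paste0("sample_", 1:10), paste0("marker_", 1:4)))
# Give grade_3 samples systematically higher values for two markers.
sig[groups == "grade_3", 1:2] <- sig[groups == "grade_3", 1:2] + 0.3

# PCA on the signature matrix; the sign of each PC is arbitrary.
pca <- prcomp(sig, center = TRUE, scale. = FALSE)
scores <- data.frame(pca$x[, 1:2], group = groups)
summary(pca)$importance["Proportion of Variance", 1:2]

# Compare the groups along PC1 to see whether the clinical variable
# explains the main axis of variation between signatures.
aggregate(PC1 ~ group, data = scores, FUN = mean)
```

If samples separate along a principal component in line with a clinical variable, the loadings in `pca$rotation` indicate which markers' signature values drive that separation.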

My project

The goal of my project was to add new vignettes to the MISTy package, since they are the most frequently accessed part of the MISTy website. I have written a total of six vignettes that show how to answer the three questions described above.

For five of the vignettes, I used a 10x Visium dataset from Kuppe et al. (10), who generated a comprehensive map of cardiac remodeling after myocardial infarction in humans. They integrated single-cell gene expression, chromatin accessibility, and spatial transcriptomics data to determine, for example, cell-type composition. I selected the data of patient 14 (P14) for the vignettes. Two of the vignettes analyze structural relationships based on cell-type distribution, while the other two analyze functional relationships based on spatial patterns of markers such as receptors, ligands, or pathway-specific genes. The fifth vignette combines structural and functional analysis.

The sixth vignette demonstrates sample signature analysis on an IMC dataset (11) of tumors of three different grades, measuring 26 protein markers. In the vignette, I show how the samples can be distinguished by their signatures and that this grouping resembles the grouping by grade or clinical type.

To summarize, by creating six vignettes, I have increased the accessibility of MISTy and established a general recipe for analysis with MISTy that can be adapted to individual needs.

References

1. L. Yue et al., A guidebook of spatial transcriptomic technologies, data resources and analysis approaches. Computational and Structural Biotechnology Journal. 21, 940–955 (2023).

2. Visium spatial gene expression (14/03/2024) (available at https://www.10xgenomics.com/products/spatial-gene-expression).

3. Y. Wang et al., Spatial transcriptomics: Technologies, applications and experimental considerations. Genomics. 115, 110671 (2023).

4. J. R. Moffitt, E. Lundberg, H. Heyn, The emerging landscape of spatial profiling technologies. Nat Rev Genet. 23, 741–759 (2022).

5. C. Giesen et al., Highly multiplexed imaging of tumor tissues with subcellular resolution by mass cytometry. Nat Methods. 11, 417–422 (2014).

6. G. Palla et al., Squidpy: A scalable framework for spatial omics analysis. Nat Methods. 19, 171–178 (2022).

7. V. Svensson, S. A. Teichmann, O. Stegle, SpatialDE: Identification of spatially variable genes. Nat Methods. 15, 343–346 (2018).

8. V. Kleshchevnikov et al., Cell2location maps fine-grained cell types in spatial transcriptomics. Nat Biotechnol. 40, 661–671 (2022).

9. J. Tanevski, R. O. R. Flores, A. Gabor, D. Schapiro, J. Saez-Rodriguez, Explainable multiview framework for dissecting spatial relationships from highly multiplexed data. Genome Biol. 23, 97 (2022).

10. C. Kuppe et al., Spatial multi-omic map of human myocardial infarction. Nature. 608, 766–777 (2022).

11. D. Schapiro et al., histoCAT: Analysis of cell phenotypes and interactions in multiplex image cytometry data. Nat Methods. 14, 873–876 (2017).

sessionInfo()

R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=German_Germany.utf8  LC_CTYPE=German_Germany.utf8   
[3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] workflowr_1.7.1

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5       httr_1.4.7        cli_3.6.2         knitr_1.45       
 [5] rlang_1.1.2       xfun_0.41         stringi_1.8.3     processx_3.8.3   
 [9] promises_1.2.1    jsonlite_1.8.8    glue_1.6.2        rprojroot_2.0.4  
[13] git2r_0.33.0      htmltools_0.5.7   httpuv_1.6.13     ps_1.7.5         
[17] sass_0.4.8        fansi_1.0.6       rmarkdown_2.25    jquerylib_0.1.4  
[21] tibble_3.2.1      evaluate_0.23     fastmap_1.1.1     yaml_2.3.8       
[25] lifecycle_1.0.4   whisker_0.4.1     stringr_1.5.1     compiler_4.3.2   
[29] fs_1.6.3          pkgconfig_2.0.3   Rcpp_1.0.11       rstudioapi_0.15.0
[33] later_1.3.2       digest_0.6.33     R6_2.5.1          utf8_1.2.4       
[37] pillar_1.9.0      callr_3.7.3       magrittr_2.0.3    bslib_0.6.1      
[41] tools_4.3.2       cachem_1.0.8      getPass_0.2-4