mudRapp-seq

This repository contains the code that accompanies the paper introducing "Multiple direct RNA padlock probing in combination with in-situ sequencing (mudRapp-seq)":

Ahmad S, Gribling-Burrer AS, Schaust J, Fischer SC, Ambil UB, Ankenbrand MJ, Smyth RP. Visualizing the transcription and replication of influenza A viral RNAs in cells by multiple direct RNA padlock probing and in-situ sequencing (mudRapp-seq) (in preparation)

Warning

This repository is a work in progress, so if you notice any errors or have any suggestions, please open an issue.

This repository is archived at Zenodo

Data

Raw data is archived independently in the BioImage Archive (BIA) with accession number S-BIAD1376 (under embargo until publication). To reproduce our analyses, download the raw data from BIA and place it in the data/raw folder.

Image data acquired on the Leica DMI8 were maximum intensity projected along the z-axis, and instant computational clearing (ICC) was applied using Leica software. These images, together with the associated metadata, constitute the raw data of our analysis. For one dataset only (2nt), the images without ICC were used for cell segmentation. Except for these non-ICC images, all datasets were converted to SpaceTx format for further analysis.
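
As an illustration of the projection step: a maximum intensity projection keeps, for every (y, x) pixel, the brightest value found across all z-planes. The NumPy sketch below only demonstrates the operation on a synthetic stack. It is not the procedure that produced the raw data, and the (z, y, x) array layout and the function name are assumptions.

```python
import numpy as np


def max_intensity_projection(stack: np.ndarray, z_axis: int = 0) -> np.ndarray:
    """Collapse a z-stack to 2D by keeping the brightest value per pixel."""
    return stack.max(axis=z_axis)


# Synthetic stack with an assumed (z, y, x) layout: 3 planes of 2x2 pixels.
stack = np.array(
    [
        [[1, 5], [0, 2]],
        [[4, 3], [9, 1]],
        [[2, 2], [7, 8]],
    ],
    dtype=np.uint16,
)

projection = max_intensity_projection(stack)
print(projection)
```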

All final and some intermediate results are included in the repository to facilitate additional analyses without having to re-process all files from scratch.

Data sets

cDNA_vRNA: sensitivity of cDNA vs direct vRNA probing (raw (cDNA, vRNA), raw (cDNA*), spacetx)
specificity: specificity of PLPs with closely related strains PR8 and StPt (raw)
plp_individual: sensitivity of direct vRNA probing with 10 individual PLPs on the HA and NA segments (raw HA, raw NA, spacetx)
plp_cumulative: sensitivity of direct vRNA probing with increasing number of PLPs on the HA, NA, and PB1 segments (initial experiment for HA, NA, and PB1, replicates for HA and PB1, replicates for NA, more replicates for HA, spacetx)
seq_qc: quality control experiment for the sequencing method, only a single barcode is present in each sample (raw, spacetx)
seq_2nt: biologically relevant experiment, simultaneous detection of all viral mRNA and vRNA segments using a 2nt barcode (raw initial with ICC, raw initial no ICC, raw replicates with ICC, raw replicates no ICC, spacetx, intensity_scaled)

Computational environments

Python environments

To re-create the Python environments with mamba, run:

mamba env create -f envs/starfish.yml # mudRapp-seq-starfish
mamba env create -f envs/cellpose.yml # mudRapp-seq-cellpose

R environment

The R environment for this project is managed via renv. A local environment is automatically created for you when you run R or Rscript for the first time in the main project directory. This happens because of the .Rprofile file.

jupyter setting (vscode)

The Jupyter notebooks are in sub-folders of code/ but assume that the kernel runs in the project root. This is necessary to make the R kernel use the local renv, and it allows paths to be given consistently relative to the project root rather than to the specific notebook location. In VS Code, this can be achieved by changing the setting jupyter.notebookFileRoot to ${workspaceFolder}. For JupyterLab there seems to be no simple solution at the moment (see jupyterlab#11619).

Analyses

Data formatting

Because starfish is used for the analysis, the raw data needs to be restructured into SpaceTx format.

To create the formatted data in data/spacetx, run these steps in the root of the mudRapp-seq repo, using the mudRapp-seq-starfish environment:

mamba run -n mudRapp-seq-starfish python code/data_formatting/cDNA_vRNA.py
mamba run -n mudRapp-seq-starfish python code/data_formatting/specificity.py
mamba run -n mudRapp-seq-starfish python code/data_formatting/plp_individual.py
mamba run -n mudRapp-seq-starfish python code/data_formatting/plp_cumulative.py
mamba run -n mudRapp-seq-starfish python code/data_formatting/seq_qc.py
mamba run -n mudRapp-seq-starfish python code/data_formatting/seq_2nt.py
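
The six commands above differ only in the script name, so they can also be generated in a loop. The following is a minimal sketch, not part of the repository: the helper name `formatting_commands` is hypothetical, and the script only prints the commands rather than executing them.

```python
import shlex

ENV_NAME = "mudRapp-seq-starfish"
DATASETS = [
    "cDNA_vRNA",
    "specificity",
    "plp_individual",
    "plp_cumulative",
    "seq_qc",
    "seq_2nt",
]


def formatting_commands(env_name=ENV_NAME, datasets=DATASETS):
    """Return one `mamba run` argument list per data set, in the order above."""
    return [
        ["mamba", "run", "-n", env_name, "python", f"code/data_formatting/{name}.py"]
        for name in datasets
    ]


commands = formatting_commands()
for cmd in commands:
    # Print only; to execute, pass each list to subprocess.run(cmd, check=True)
    # from the repository root.
    print(shlex.join(cmd))
```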

For cell segmentation in the seq_2nt dataset, the intensity of the raw images without ICC needs to be re-scaled so that the autofluorescence within the cells is amplified:

mamba run -n mudRapp-seq-starfish python code/data_formatting/seq_2nt_scale_intensity.py
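
To give an idea of what amplifying dim signal can look like, the sketch below clips an image at an upper percentile and stretches the remaining range to [0, 1], so that faint intracellular background occupies more of the dynamic range. This is an assumed, generic approach for illustration only; the actual transformation is defined in seq_2nt_scale_intensity.py and may differ. The function name and the percentile value are hypothetical.

```python
import numpy as np


def scale_intensity(image: np.ndarray, upper_percentile: float = 90.0) -> np.ndarray:
    """Clip at an upper percentile and rescale to [0, 1] to amplify dim signal."""
    img = image.astype(np.float64)
    low = img.min()
    high = np.percentile(img, upper_percentile)
    if high <= low:
        # Flat image: nothing to stretch.
        return np.zeros_like(img)
    return np.clip((img - low) / (high - low), 0.0, 1.0)


# Synthetic example: one very bright pixel next to dim background values.
image = np.array([[0, 10], [20, 1000]], dtype=np.uint16)
scaled = scale_intensity(image)
print(scaled)
```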

Segmentation

Images were separately segmented for nuclei and cell instances. Nuclei segmentation is used to separate spots based on their location into nucleus and cytoplasm. Cell segmentation is used to count spots per cell, filter infected cells and perform single cell analyses.
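
The way the two masks are used together can be sketched as a lookup of each spot position in both label images. The example below is a simplified illustration with toy masks given as nested lists (the same indexing also works on 2D NumPy arrays). The function name, the (y, x) integer coordinates, and the convention that label 0 means background are assumptions.

```python
from collections import Counter


def assign_spot(y, x, nuclei_mask, cell_mask):
    """Return (cell_id, compartment) for a spot at integer pixel position (y, x)."""
    cell_id = cell_mask[y][x]
    if nuclei_mask[y][x] != 0:
        return cell_id, "nucleus"
    if cell_id != 0:
        return cell_id, "cytoplasm"
    return cell_id, "background"


# Toy label masks: one cell (label 1) with a small nucleus (label 1).
nuclei_mask = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
cell_mask = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]

spots = [(0, 1), (1, 2), (2, 3), (0, 0)]  # hypothetical (y, x) spot positions
assignments = [assign_spot(y, x, nuclei_mask, cell_mask) for y, x in spots]

# Count spots per cell, ignoring spots that fall outside every cell.
spots_per_cell = Counter(
    cell_id for cell_id, compartment in assignments if compartment != "background"
)
print(assignments)
print(spots_per_cell)
```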

All segmentation masks (along with training data and models) are deposited at Zenodo:

You can either download from there and unpack them into analysis/segmentation or follow the instructions below to create masks yourself.

For nucleus segmentation, a cellpose model (models/cellpose/nuclei) was trained and applied to the raw dapi images (in SpaceTx format). The model was trained on a total of 5 dapi images with human-provided sparse labels (seq_2nt, rep3, hpi5, fov1-5).

For cell instance segmentation, two different approaches were used:

Watershed of the dapi image with nuclei as seeds
A separate cellpose model with manual correction (details below)

Strategy 1 (nuclei via cellpose, cells via watershed)

The first strategy was used for most data, as it was deemed sufficient for filtering of infected cells and to calculate summary statistics like spots per cell. However, for single cell analyses the cell borders were not reliable enough.

This code performs nuclei segmentation with the cellpose model and watershed for cell segmentation (strategy 1).

mamba run -n mudRapp-seq-cellpose python code/segmentation/cellpose_nuclei_watershed_cells.py

Strategy 2 (nuclei and cells via cellpose, cells manually corrected)

The separate cellpose model was trained on raw images without ICC (computational clearing by the microscope vendor). In addition, the data was preprocessed with intensity scaling (see code). The model was trained on a total of 7 images with human-provided sparse labels (2nt_rep1_0.3MOI_5hpi_fov4, 2nt_rep1_0.3MOI_7hpi_fov1, 2nt_rep1_0.3MOI_8hpi_fov1, 2nt_rep1_0.3MOI_8hpi_fov4, 2nt_rep1_1.0MOI_7hpi_fov1, 2nt_rep1_1.0MOI_8hpi_fov1, 2nt_rep2_0.3MOI_8hpi_fov2). Based on preliminary experiments, the following parameters were used to maximize the number of correctly detected cells: cellprob_threshold=-4.0, flow_threshold=0.7. The resulting masks were post-processed by removing small objects and closing small holes and gaps (see code).

mamba run -n mudRapp-seq-cellpose python code/segmentation/cellpose_nuclei.py
mamba run -n mudRapp-seq-cellpose python code/segmentation/cellpose_cells.py

Masks produced this way were manually corrected using the label editing tools in napari. Manual correction involved extending cells, shrinking cells, moving cell borders, and adding new cells (new cells were assigned IDs of 2000 and higher). The manual correction focused mainly on infected cells; if no visibly infected cells were identified during inspection, the FOV was saved unchanged.

The script used for manual correction can be started for a specific replicate, MOI, hpi, and FOV like this:

mamba run -n mudRapp-seq-starfish python code/segmentation/manual_correction_via_napari.py --rep 1 --moi 0.3MOI --hpi 7 --fov_index 1

The "cells (manually corrected)" layer can be modified using napari tools. When finished, save the layer in the corresponding folder (e.g. analysis/segmentation/seq_2nt/rep1/0.3MOI/7hpi/fov_1_cpmc_cells.png). The infix _cpmc_ stands for cellpose with manual correction.
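
The output location follows from the same four values passed to the correction script. The helper below is a small sketch that rebuilds the example path from them. The function name is hypothetical, and the folder pattern is inferred from the single example above, so it is an assumption rather than a documented convention.

```python
from pathlib import Path


def corrected_mask_path(rep, moi, hpi, fov_index, root="analysis/segmentation/seq_2nt"):
    """Build the path of a manually corrected cell mask (pattern inferred from the example)."""
    return Path(root) / f"rep{rep}" / moi / f"{hpi}hpi" / f"fov_{fov_index}_cpmc_cells.png"


path = corrected_mask_path(rep=1, moi="0.3MOI", hpi=7, fov_index=1)
print(path.as_posix())
```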

Spot detection

Spot detection is performed using starfish methods. The following commands create csv and netCDF files in analysis/spot_detection:

mamba run -n mudRapp-seq-starfish python code/spot_detection/cDNA_vRNA.py
mamba run -n mudRapp-seq-starfish python code/spot_detection/specificity.py
mamba run -n mudRapp-seq-starfish python code/spot_detection/plp_individual.py
mamba run -n mudRapp-seq-starfish python code/spot_detection/plp_cumulative.py
mamba run -n mudRapp-seq-starfish python code/spot_detection/seq_qc.py
mamba run -n mudRapp-seq-starfish python code/spot_detection/seq_2nt.py

This creates a separate spots file for each FOV. These can be combined into a single tsv.xz file per experiment using:

Rscript code/spot_detection/combine_csvs.R

The result of this step is included in the repository:

analysis/spot_detection/cDNA_vRNA/all_spots.tsv.xz
analysis/spot_detection/plp_cumulative/all_spots.tsv.xz
analysis/spot_detection/plp_individual/all_spots.tsv.xz
analysis/spot_detection/seq_2nt/all_spots.tsv.xz
analysis/spot_detection/specificity/all_spots.tsv.xz
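
For additional analyses outside of R, these xz-compressed, tab-separated files can be read with the Python standard library alone. The sketch below is self-contained: it writes a tiny demo file and reads it back. The function name `read_spots` and the demo column names are hypothetical; the real column names are whatever header the all_spots.tsv.xz files contain.

```python
import csv
import lzma
import os
import tempfile


def read_spots(path):
    """Read an xz-compressed, tab-separated spot table into a list of dicts."""
    with lzma.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Self-contained demo with made-up content (column names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "all_spots.tsv.xz")
    with lzma.open(demo_path, mode="wt", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["fov", "x", "y"])
        writer.writerow(["fov_1", "10", "20"])
    rows = read_spots(demo_path)

print(rows)
```

With the repository checked out, the same function can be pointed at one of the files listed above, for example `read_spots("analysis/spot_detection/seq_2nt/all_spots.tsv.xz")`. All values are returned as strings and need to be converted as required.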

Summary results of the QC experiment are in:

analysis/spot_detection/seq_qc/rep0/A_PB2/results.csv

Spot analysis

Sensitivity of cDNA vs direct vRNA probing

The main result is the much higher sensitivity for direct vRNA probing compared to cDNA probing. For details, see the analysis notebook. These figures are included in the manuscript as Supp. Fig. 1b and Fig. 1c, respectively.

Specificity of padlock probing on closely related strains

The main result is that padlock probing has very high specificity. There are almost no false positive results of probes designed for another strain. For details, see the analysis notebook. These figures are included in the manuscript as Supp. Fig. 1b and Fig. 1c, respectively.

Sensitivity of individual padlock probes

To test the sensitivity of individual padlock probes (PLPs), PLPs for ten distinct locations each on NA and HA were designed. The main result is that individual PLPs have different levels of sensitivity. While most PLPs produce a good number of spots, some PLPs produce almost none. For details, see the analysis notebook. These figures are included in the manuscript as Fig. 2c,d and Supp. Fig. 2c,d, respectively.

Binding site reactivity with Nano-DMS-MaP

To explain the different efficiencies of the individual PLPs, the reactivity of the binding sites was analyzed with Nano-DMS-MaP, both with and without PLPs bound. PLP binding reduces the reactivity of the binding sites (as expected). Further, the binding site reactivity (without PLPs) correlates positively with PLP efficiency (both when looking at the whole binding site and when looking only at a small window around the junctions). For details, see the analysis notebook. An overview of the reactivities along the NA segment and the correlations with efficiencies are shown in the manuscript in Fig. 3b,c and Supp. Fig. 3a. The same plots were generated for the HA segment in the same notebook and are shown in Supp. Fig. 4a,b,c.

Sensitivity with increasing number of padlock probes

To test the sensitivity with an increasing number of padlock probes per segment, increasing numbers of the ten distinct locations on NA and HA were used. The main result is that sensitivity increases with the number of PLPs used but saturates at around 6 PLPs. For details, see the analysis notebook. These figures are included in the manuscript as Fig. 2f,g and Supp. Fig. 2f,g, respectively.

Sequencing quality control

Dedicated experiments were performed to check the quality of the sequencing procedure.

Channel bleed-through estimation

Based on four experiments, in each of which only one of the channels (A, G, T, and C) is active in the first round, the bleed-through of signal from each channel into every other channel was estimated.

Only a moderate bleed-through (factor 0.656) was detected from channel A to channel T.
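
One simple way to express such a factor is the ratio of the mean signal measured in a non-active channel to the mean signal in the active channel. The sketch below uses this definition purely as an assumption for illustration; the exact estimation procedure is documented in the notebooks and may differ. The intensities are made up and are not measured values.

```python
from statistics import mean


def bleed_through_factor(active_intensities, other_intensities):
    """Ratio of mean signal in a non-active channel to mean signal in the active channel."""
    return mean(other_intensities) / mean(active_intensities)


# Made-up per-spot intensities from an experiment where only channel A is active.
a_channel = [100.0, 120.0, 80.0]
t_channel = [60.0, 70.0, 50.0]

factor = bleed_through_factor(a_channel, t_channel)
print(f"A -> T bleed-through factor: {factor:.3f}")
```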

For details, see the notebooks (analysis, plot). This figure is included in the manuscript as Supp. Fig. 5b.

Decoding quality

A detailed analysis of decoding correctness showed that, in an experiment with only a single valid 6nt barcode present, more than 92% of all detected spots were correct after 2 rounds of sequencing and 84% of spots after 6 rounds of sequencing.
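
The per-round correctness can be thought of as the fraction of spots whose decoded bases match the expected barcode up to a given round. The sketch below illustrates this under the assumption that each spot has already been decoded to a string of bases. The barcode and the decoded reads are made up, and the function name is hypothetical.

```python
def fraction_correct(decoded, expected, n_rounds):
    """Fraction of spots whose first n_rounds bases match the expected barcode."""
    if not decoded:
        raise ValueError("no spots given")
    target = expected[:n_rounds]
    hits = sum(1 for barcode in decoded if barcode[:n_rounds] == target)
    return hits / len(decoded)


expected = "ACGTAC"  # hypothetical 6nt barcode, not the one used in the experiment
decoded = ["ACGTAC", "ACGTAA", "ACTTAC", "GCGTAC"]  # made-up decoded spots

after_2 = fraction_correct(decoded, expected, n_rounds=2)
after_6 = fraction_correct(decoded, expected, n_rounds=6)
print(after_2, after_6)
```

Because every additional round is another opportunity for a miscall, the fraction can only stay the same or decrease as more rounds are included, which matches the trend reported above.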

For details, see the analysis notebook. These figures are included in the manuscript as Supp. Fig. 6b,c.

Segment analysis (seq_2nt)

channel order: "AGTC"
magnification: images were taken with a 63x oil objective; the pixel size is 0.103 µm
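
With the pixel size given above, distances and areas measured in pixels can be converted to physical units. This is a minimal sketch assuming square pixels with the same size in x and y; the function names are hypothetical.

```python
PIXEL_SIZE_UM = 0.103  # pixel size in µm, from the acquisition settings above


def px_to_um(length_px):
    """Convert a length in pixels to micrometres."""
    return length_px * PIXEL_SIZE_UM


def px_area_to_um2(area_px):
    """Convert an area in pixels to square micrometres (assumes square pixels)."""
    return area_px * PIXEL_SIZE_UM**2


length_um = px_to_um(1000)
area_um2 = px_area_to_um2(10_000)
print(f"{length_um:.2f} µm, {area_um2:.2f} µm²")
```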

2nt barcode

Segment	vRNA/mRNA	Code
PB2	vRNA	TT
PB1	vRNA	TG
PA	vRNA	TC
HA	vRNA	TA
NP	vRNA	GT
NA	vRNA	GG
M	vRNA	GC
NS	vRNA	GA
PB2	mRNA	CT
PB1	mRNA	CG
PA	mRNA	CC
HA	mRNA	CA
NP	mRNA	AT
NA	mRNA	AG
M	mRNA	AC
NS	mRNA	AA
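
The table above can be used directly as a lookup when working with decoded spots. In this scheme the first base determines the RNA type (T or G for vRNA, C or A for mRNA). The sketch below encodes the table as a dictionary and counts decoded targets; the names `CODEBOOK` and `decode` and the example reads are hypothetical.

```python
from collections import Counter

# 2nt barcode -> (segment, RNA type), copied from the table above.
CODEBOOK = {
    "TT": ("PB2", "vRNA"), "TG": ("PB1", "vRNA"), "TC": ("PA", "vRNA"), "TA": ("HA", "vRNA"),
    "GT": ("NP", "vRNA"), "GG": ("NA", "vRNA"), "GC": ("M", "vRNA"), "GA": ("NS", "vRNA"),
    "CT": ("PB2", "mRNA"), "CG": ("PB1", "mRNA"), "CC": ("PA", "mRNA"), "CA": ("HA", "mRNA"),
    "AT": ("NP", "mRNA"), "AG": ("NA", "mRNA"), "AC": ("M", "mRNA"), "AA": ("NS", "mRNA"),
}


def decode(code):
    """Return (segment, RNA type) for a 2nt code, or None if the code is not in the table."""
    return CODEBOOK.get(code.upper())


reads = ["TT", "GC", "AA", "AA", "CA"]  # made-up decoded spots
counts = Counter(decode(read) for read in reads)
relative_abundance = {target: n / len(reads) for target, n in counts.items()}
print(counts)
print(relative_abundance)
```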

Bulk analysis

The molecule and segment counts, relative abundances and ambiguous spots were analysed by MOI and hpi, separately for nucleus and cytoplasm.

For details, see the analysis notebook. These figures are included in the manuscript as Fig. 5c,d and Supp. Fig. 7a,b, and 9a,b respectively.

Temporal correlation

The temporal correlation analysis of single segment mRNA expression with total vRNA abundance revealed a high correlation of the M segment.

For details, see the analysis notebook. This figure is included in the manuscript as Supp. Fig. 8.

Single-cell analysis

The single-cell analysis reveals extensive cell-to-cell heterogeneity. A substantial proportion of cells fail to replicate all vRNA segments. Cells missing the vRNA segment of any polymerase complex component or of NP are associated with very low vRNA replication.
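
The classification of cells by missing segments can be sketched with simple set operations. The example below is illustrative only: the per-cell sets of detected vRNA segments are made up, the function names are hypothetical, and the criterion used in the notebook for calling a segment present or missing is not reproduced here.

```python
ALL_SEGMENTS = {"PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"}
# Polymerase complex subunits (PB2, PB1, PA) plus NP.
POLYMERASE_AND_NP = {"PB2", "PB1", "PA", "NP"}


def missing_segments(detected):
    """Return the set of vRNA segments not detected in a cell."""
    return ALL_SEGMENTS - set(detected)


def lacks_polymerase_or_np(detected):
    """True if at least one polymerase segment or NP is missing."""
    return bool(POLYMERASE_AND_NP - set(detected))


# Made-up cells: cell id -> set of vRNA segments detected in that cell.
cells = {
    101: ALL_SEGMENTS,
    102: ALL_SEGMENTS - {"PA"},
    103: ALL_SEGMENTS - {"HA", "NS"},
}

for cell_id, detected in cells.items():
    print(cell_id, sorted(missing_segments(detected)), lacks_polymerase_or_np(detected))
```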

For details, see the analysis notebook. The results of the linear modelling are included in Table 2 in the manuscript. These figures are included in the manuscript as Fig. 6b,c,d,e and Supp. Fig. 10a,b, and 12a,b respectively.
