- Study thymic selection using rep-seq data
- Generate TCR data using OLGA software and use it as TCR data before thymic selection
- Compare gene usage between OLGA and the experimental samples
- Compare k-mer usage between OLGA and the experimental samples
- Study changes in physicochemical properties of CDR3, both overall and per gene pair
- Study selected clusters of CDR3 sequences
- Data generation. (naive_cells/notebooks/eda_keck.ipynb)
  As TCR sequences before selection, we took sequences generated by the OLGA$^1$ software. As naive cells after selection, we took the KECK dataset$^2$ from Adaptive Biotechnologies (10 samples, approx. $10^6$ CDR3 sequences). Clonotypes with a read count of one were treated as naive. Most of the generated data were too large for GitHub, so we kept only the code that generates them.
- Gene usage plots. (naive_cells/notebooks/Genes_usage.ipynb)
  We calculated gene usage frequencies before and after thymic selection and compared them.
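The gene usage comparison described above can be sketched in a few lines. This is a minimal illustration, not the code from the notebook: the function names, the pseudocount, and the toy gene calls are all assumptions made for the example.

```python
from collections import Counter
from math import log2


def gene_usage(genes):
    """Return the relative frequency of each gene in a list of gene names."""
    counts = Counter(genes)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def usage_log2_ratio(before, after, pseudo=1e-6):
    """log2(after / before) per gene; positive means enriched after selection.

    `pseudo` is a hypothetical pseudocount that avoids division by zero
    for genes missing from one of the two samples.
    """
    genes = set(before) | set(after)
    return {
        g: log2((after.get(g, 0.0) + pseudo) / (before.get(g, 0.0) + pseudo))
        for g in genes
    }


# Toy V-gene calls (hypothetical, for illustration only)
olga_v = ["TRBV19", "TRBV19", "TRBV5-1", "TRBV7-9"]
keck_v = ["TRBV19", "TRBV5-1", "TRBV5-1", "TRBV5-1"]

ratios = usage_log2_ratio(gene_usage(olga_v), gene_usage(keck_v))
for gene, value in sorted(ratios.items()):
    print(f"{gene}\t{value:+.2f}")
```

The same two functions apply unchanged to J genes or to V-J pairs encoded as tuples.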
- K-mer sequences. (naive_cells/notebooks/k_mers_and_general_physical_chemistry.ipynb)
  We compared k-mer frequencies between the OLGA and KECK samples. Most attention was paid to 1-mers (single amino acids) and 3-mers.
1-mers:
3-mers:
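The k-mer comparison described above could be sketched as follows. This is an assumption-laden illustration rather than the notebook's code: the function name, the use of overlapping k-mers, and the toy CDR3 sequences are hypothetical.

```python
from collections import Counter


def kmer_frequencies(sequences, k):
    """Relative frequency of every overlapping k-mer across a set of sequences."""
    counts = Counter()
    for seq in sequences:
        # Slide a window of width k over the sequence; short sequences yield nothing
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


# Toy CDR3 sequences (hypothetical, for illustration only)
olga_cdr3 = ["CASSLGQGYEQYF", "CASSCNGTEAFF"]
keck_cdr3 = ["CASSLGGGYEQYF", "CASSPGTGELFF"]

for k in (1, 3):
    before = kmer_frequencies(olga_cdr3, k)
    after = kmer_frequencies(keck_cdr3, k)
    # Frequency change per k-mer: negative values are depleted after selection
    diff = {m: after.get(m, 0.0) - before.get(m, 0.0) for m in set(before) | set(after)}
    most_depleted = sorted(diff, key=lambda m: (diff[m], m))[:3]
    print(k, most_depleted)
```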
- General physicochemical properties. (naive_cells/notebooks/k_mers_and_general_physical_chemistry.ipynb)
  We calculated key physicochemical properties (length, charge, hydrophobicity) for all OLGA and KECK TCRs.
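To make the three properties concrete, here is a simplified, dependency-free sketch. It does not use the Peptides library and will not reproduce its values exactly. It assumes a net charge that only counts K/R as +1 and D/E as -1, and a mean Kyte-Doolittle hydropathy as the hydrophobicity measure. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy scale (one common choice; an assumption for this sketch)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def cdr3_properties(seq):
    """Length, crude net charge, and mean hydropathy of one amino acid sequence."""
    if not seq:
        raise ValueError("empty sequence")
    # Crude charge: +1 per K/R, -1 per D/E; histidine and termini are ignored
    charge = sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")
    hydrophobicity = sum(KYTE_DOOLITTLE[a] for a in seq) / len(seq)
    return {"length": len(seq), "charge": charge, "hydrophobicity": hydrophobicity}


print(cdr3_properties("CASSLGQGYEQYF"))
```

Applying this function to every sequence in both repertoires gives per-property distributions that can then be compared between OLGA and KECK.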
- Kidera factors analysis. (naive_cells/notebooks/gene_pairs_analysis.ipynb)
  Kidera factors are key physicochemical descriptors of peptides, first introduced by Kidera et al.$^3$ We analyzed each of the 10 Kidera factors for every gene pair. The Kidera factors with the largest fold change (FC) are shown in the volcano plot above. The most strongly selected Kidera factors: KF2, KF6 (size); KF4 (hydrophilicity); KF8 (occurrence in alpha region).
- Gene clusters. (naive_cells/notebooks/VDJ_tools_analysis.ipynb)
  We analysed gene clusters that are enriched in OLGA and rare in KECK, and vice versa, with VDJtools$^4$. Volcano plots for these clusters are presented below.
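The gene-pairwise analysis mentioned in the Kidera factors item amounts to grouping sequences by V-J pair and comparing a per-sequence feature between the two repertoires. The sketch below shows that grouping step only. It is an assumption, not the notebook's method: it reports a difference of means rather than the fold change and significance used for the volcano plots, and it uses CDR3 length as a stand-in feature because Kidera factor values are not reproduced here. All names and records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_by_gene_pair(records, feature):
    """records: iterable of (v_gene, j_gene, cdr3). Returns {(v, j): mean feature}."""
    groups = defaultdict(list)
    for v, j, cdr3 in records:
        groups[(v, j)].append(feature(cdr3))
    return {pair: mean(values) for pair, values in groups.items()}


def compare_gene_pairs(before, after, feature):
    """Mean feature after selection minus before, for gene pairs seen in both sets."""
    b = mean_by_gene_pair(before, feature)
    a = mean_by_gene_pair(after, feature)
    return {pair: a[pair] - b[pair] for pair in b.keys() & a.keys()}


# Toy records (hypothetical, for illustration only)
olga = [
    ("TRBV19", "TRBJ2-7", "CASSIRSSYEQYF"),
    ("TRBV19", "TRBJ2-7", "CASSYEQYF"),
    ("TRBV5-1", "TRBJ1-1", "CASSLGTEAFF"),
]
keck = [
    ("TRBV19", "TRBJ2-7", "CASSIRSAYEQYF"),
    ("TRBV7-9", "TRBJ2-1", "CASSLGNEQFF"),
]

# `len` stands in for any per-sequence feature, e.g. a single Kidera factor
print(compare_gene_pairs(olga, keck, feature=len))
```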
For all plots please see https://docs.google.com/presentation/d/12NM-7CLGjYuhLo4dROhbbfClH7IH4oHbi5svjMNscro/edit?usp=sharing
- C - strong negative selection
- Nx[S,T] - glycosylation sites - strong negative selection
- Other post-translational modifications do not affect selection
- Decrease in charge and increase in hydrophobicity. Extreme lengths are also negatively selected
- Decrease in Kidera factors responsible for size and hydrophilicity. Increase in Kidera factors responsible for occurrence in alpha region
- Negative selection against “bad” amino acids
- Clusters enriched in OLGA are short and R-rich
- Clusters enriched in KECK are of normal size and G-rich
- Clusters with glycosylation sites that survive selection carry the J1-6 gene, which is rare
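The two sequence motifs named in the list above (cysteine and the Nx[S,T] glycosylation site) are easy to screen for. The sketch below is a hedged illustration with hypothetical function names. It assumes that "C" refers to cysteines other than the conserved first residue of the CDR3, and it uses the pattern exactly as written in the list, without the usual exclusion of proline at the middle position.

```python
import re

# N, any residue, then S or T, as written in the list above.
# A stricter sequon would be r"N[^P][ST]"; that variant is an assumption not used here.
GLYCO_SITE = re.compile(r"N.[ST]")


def has_glycosylation_site(cdr3):
    """True if the sequence contains at least one N-x-[S/T] motif."""
    return GLYCO_SITE.search(cdr3) is not None


def has_inner_cysteine(cdr3):
    """True if a cysteine occurs anywhere after the first (conserved) residue."""
    return "C" in cdr3[1:]


def fraction_with(sequences, predicate):
    """Share of sequences for which the predicate holds."""
    sequences = list(sequences)
    return sum(predicate(s) for s in sequences) / len(sequences) if sequences else 0.0


# Toy CDR3 sequences (hypothetical, for illustration only)
sample = ["CASSNGTEAFF", "CASSLGQGYEQYF", "CASSCGFEQYF", "CASSPGTGELFF"]
print("glycosylation site:", fraction_with(sample, has_glycosylation_site))
print("inner cysteine:", fraction_with(sample, has_inner_cysteine))
```

Comparing these fractions between the OLGA and KECK repertoires gives a quick check of how strongly each motif is depleted.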
- Generation of TCR beta chains was carried out with the OLGA software (1.2.4)
olga-generate_sequences --humanTRB -n 10000000
- All statistics and visualisation, except cluster enrichment, were done in Python 3.7 (please see requirements.txt for more details)
- Kidera factors and other physicochemical properties were calculated with Peptides$^5$ (0.3.2)
- Cluster enrichment comparison was carried out in VDJtools (1.2.1)
vdjtools --CalcDegreeStats
with default parameters
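For the Python steps, the sequences produced by the OLGA command above have to be loaded first. The reader below is a sketch under a stated assumption: that the generated sequences were saved as tab-separated text with the columns nucleotide sequence, amino acid sequence, V gene, J gene. Please check this against the output of your OLGA version. The function name and the example row are illustrative.

```python
import csv
import io


def read_olga_tsv(handle):
    """Yield (cdr3_aa, v_gene, j_gene) from tab-separated OLGA-style output.

    Assumed column order: nucleotide sequence, amino acid sequence, V gene, J gene.
    Rows with fewer than four columns are skipped.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4:
            continue
        _nt, aa, v_gene, j_gene = row[:4]
        yield aa, v_gene, j_gene


# Illustrative row in the assumed layout; a real file would be opened with open(path)
example = io.StringIO(
    "TGTGCCAGTAGTATAACAACCCAGGGCTTGTACGAGCAGTACTTC\tCASSITTQGLYEQYF\tTRBV19\tTRBJ2-7\n"
)
records = list(read_olga_tsv(example))
print(records)
```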
- Zachary Sethna et al., OLGA: fast computation of generation probabilities of B- and T-cell receptor amino acid sequences and motifs, Bioinformatics, Volume 35, Issue 17, 1 September 2019, Pages 2974–2981, https://doi.org/10.1093/bioinformatics/btz035
- Emerson, R., DeWitt, W., Vignali, M. et al., Immunosequencing identifies signatures of cytomegalovirus exposure history and HLA-mediated effects on the T cell repertoire. Nat Genet 49, 659–665 (2017). https://doi.org/10.1038/ng.3822
- Kidera, A., Konishi, Y., Oka, M. et al. Statistical analysis of the physical properties of the 20 naturally occurring amino acids. J Protein Chem 4, 23–55 (1985). https://doi.org/10.1007/BF01025492
- Shugay, M. et al. VDJtools: Unifying Post-analysis of T Cell Receptor Repertoires. PLoS Comput Biol 11(11), e1004503 (2015).
- Peptides lib: https://pypi.org/project/peptides/