forked from phillipnicol/scGBM
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
130 lines (89 loc) · 2.87 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
---
title: "Model-based Dimensionality Reduction for Single-cell RNA-seq with Generalized Bilinear Models"
author: "R package version 1.0.0"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Installation
From the R console, `devtools::install_github("phillipnicol/scGBM")`.
## Demo
```{r}
library(scGBM)
set.seed(1126490984)
```
Generate a count matrix that is random Poisson noise:
```{r, echo}
I <- 500
J <- 500
Y <- matrix(rpois(I*J,lambda=1),nrow=I,ncol=J)
colnames(Y) <- 1:J; rownames(Y) <- 1:I
```
Run scGBM with $M = 10$ latent factors
```{r}
out <- gbm.sc(Y,M=10)
```
To use the projection method (faster version based on subsampling), use
```{r, eval=FALSE}
##Specify subsample size and number of cores
out.proj <- gbm.sc(Y,M=10,subset=100,ncores=8)
```
Cluster the cell scores using Seurat
```{r, echo=T, eval=F}
library(Seurat)
Sco <- CreateSeuratObject(counts=Y)
colnames(out$V) <- 1:10
Sco[["gbm"]] <- CreateDimReducObject(embeddings=out$V,key="GBM_")
Sco <- FindNeighbors(Sco,reduction = "gbm")
Sco <- FindClusters(Sco)
```
```{r, include=F}
library(Seurat)
Sco <- CreateSeuratObject(counts=Y)
colnames(out$V) <- 1:10
Sco[["gbm"]] <- CreateDimReducObject(embeddings=out$V,key="GBM_")
Sco <- FindNeighbors(Sco,reduction = "gbm")
Sco <- FindClusters(Sco)
```
Plot the scores and color by the assigned clustering:
```{r}
plot_gbm(out, cluster=Sco$seurat_clusters)
```
Quantify the uncertainty in the low dimensional embedding:
```{r}
out <- get.se(out)
## Standard errors of V and U are now in the list
head(out$se_V)
```
You can visualize the uncertainty with ellipses around the points
```{r}
plot_gbm(out, cluster=Sco$seurat_clusters, se=TRUE)
```
Now we evaluate cluster stability using the cluster confidence index. First we need to define a function that takes as input a set of simulated scores $\tilde{V}$ and returns a new clustering:
```{r}
cluster_fn <- function(V,Y) {
Sco <- CreateSeuratObject(Y)
colnames(V) <- 1:ncol(V)
Sco[["gbm"]] <- CreateDimReducObject(embeddings=V,key="GBM_")
Sco <- FindNeighbors(Sco,reduction = "gbm")
Sco <- FindClusters(Sco)
as.vector(Sco$seurat_clusters)
}
```
Now we can run the CCI function. Here we set `reps=10` to make it fast but `reps=100` (or higher) is recommended on real analyses.
```{r, echo=T, eval=F}
cci <- CCI(out,cluster.orig=Sco$seurat_clusters, reps=10, cluster.fn = cluster_fn, Y=Y)
```
```{r, include=F}
cci <- CCI(out,cluster.orig=Sco$seurat_clusters, reps=10, cluster.fn = cluster_fn, Y=Y)
```
```{r}
pheatmap::pheatmap(cci$H.table,legend=TRUE, color=colorRampPalette(c("white","red"))(100),
breaks=seq(0,1,by=0.01),
rownames=TRUE,
colnames=TRUE)
#Just the diagonal
cci$cci_diagonal
```
The heatmap shows there is significant overlap between the clusters. This makes sense because the data was simulated to have no latent variability.