Conos alignment of all datasets

Last updated: 2020-07-09

Checks: 7 passed, 0 failed

Knit directory: Epilepsy19/

This reproducible R Markdown analysis was created with workflowr (version 1.6.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20200706)

The command set.seed(20200706) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
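
As a minimal illustration of this point (not part of the original analysis), the sketch below resets the seed before each of two random draws and checks that the draws agree. The variable names are hypothetical.

```r
# Reset the seed before each draw so both draws start from the same random state.
set.seed(20200706)
first_draw <- sample(1:100, 5)

set.seed(20200706)
second_draw <- sample(1:100, 5)

# With identical seeds the two samples are identical.
identical(first_draw, second_draw)
```

Without the second `set.seed()` call, the two draws would generally differ.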

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: cc9a53b

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version cc9a53b. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/fig_go.nb.html
    Ignored:    analysis/fig_neun.nb.html
    Ignored:    analysis/fig_overview.nb.html
    Ignored:    analysis/fig_smart_seq.nb.html
    Ignored:    analysis/fig_summary.nb.html
    Ignored:    analysis/fig_type_distance.nb.html
    Ignored:    analysis/gene_testing.nb.html
    Ignored:    analysis/prep_alignment.nb.html
    Ignored:    analysis/prep_filtration.nb.html
    Ignored:    cache/con_allen.rds
    Ignored:    cache/con_filt_cells.rds
    Ignored:    cache/con_filt_samples.rds
    Ignored:    cache/con_ss.rds
    Ignored:    cache/count_matrices.rds
    Ignored:    cache/p2s/
    Ignored:    output/

Untracked files:
    Untracked:  code/

Unstaged changes:
    Modified:   README.md
    Modified:   analysis/index.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/prep_alignment.Rmd) and HTML (docs/prep_alignment.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	83671cf	viktor_petukhov	2020-07-09	Pre-processing notebooks

# library(Epilepsy19)
library(dataorganizer)
library(magrittr)
library(Matrix)
library(pbapply)
library(conos)
library(readr)
library(tidyverse)

theme_set(theme_bw())

devtools::load_all()

annotation <- read_csv(MetadataPath("annotation.csv"))

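# Load the cached count matrices, prefix each cell barcode with its sample name,
# and make duplicated gene names unique within each matrix
cms_all <- CachePath("count_matrices.rds") %>% read_rds()
cms_all <- names(cms_all) %>% setNames(., .) %>% lapply(function(n)
  cms_all[[n]] %>% set_colnames(paste0(n, "_", colnames(.))) %>%
    set_rownames(make.unique(rownames(.))))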

cms_all$NeuN <- NULL

Filter cells

cms_filt <- cms_all %>% lapply(function(cm) cm[, (colnames(cm) %in% annotation$cell)])
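
To make the barcode prefixing and cell filtering concrete, here is a small base-R sketch on made-up data. The toy matrices, the barcodes, and `toy_annotation` are all hypothetical; only the logic mirrors the code above. The `drop = FALSE` argument is an addition that keeps a one-column result as a matrix.

```r
# Toy count matrices (hypothetical data): genes in rows, cells in columns.
toy_cms <- list(
  C1 = matrix(1:6, nrow = 2,
              dimnames = list(c("geneA", "geneA"), c("AAA", "CCC", "GGG"))),
  E1 = matrix(1:4, nrow = 2,
              dimnames = list(c("geneA", "geneB"), c("AAA", "TTT")))
)

# Prefix each barcode with its sample name and de-duplicate gene names.
toy_cms <- lapply(setNames(names(toy_cms), names(toy_cms)), function(n) {
  cm <- toy_cms[[n]]
  colnames(cm) <- paste0(n, "_", colnames(cm))
  rownames(cm) <- make.unique(rownames(cm))
  cm
})

# Hypothetical annotation table listing the cells to keep.
toy_annotation <- data.frame(cell = c("C1_AAA", "C1_GGG", "E1_TTT"),
                             stringsAsFactors = FALSE)

# Keep only the annotated cells in every matrix.
toy_filt <- lapply(toy_cms, function(cm)
  cm[, colnames(cm) %in% toy_annotation$cell, drop = FALSE])

sapply(toy_filt, ncol)
```

The barcode `AAA` occurs in both toy samples; after prefixing it becomes `C1_AAA` and `E1_AAA`, so the two cells can no longer be confused once the samples are combined.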

p2s_filt <- lapply(cms_filt, GetPagoda, graph.k=10, embeding.type="UMAP", n.cores=30,
                   spread=5, min.dist=1.0, build.graph=FALSE)

9394 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 881 overdispersed genes ... 881persisting ... done.
running PCA using 1000 OD genes .... done
6618 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 1454 overdispersed genes ... 1454persisting ... done.
running PCA using 1000 OD genes .... done
3944 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 463 overdispersed genes ... 463persisting ... done.
running PCA using 1000 OD genes .... done
7295 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 819 overdispersed genes ... 819persisting ... done.
running PCA using 1000 OD genes .... done
1389 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 1561 overdispersed genes ... 1561persisting ... done.
running PCA using 1000 OD genes .... done
217 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 66 overdispersed genes ... 66persisting ... done.
running PCA using 1000 OD genes .... done
2045 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 468 overdispersed genes ... 468persisting ... done.
running PCA using 1000 OD genes .... done
6472 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 566 overdispersed genes ... 566persisting ... done.
running PCA using 1000 OD genes .... done
6034 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 475 overdispersed genes ... 475persisting ... done.
running PCA using 1000 OD genes .... done
6074 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 577 overdispersed genes ... 577persisting ... done.
running PCA using 1000 OD genes .... done
11023 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 430 overdispersed genes ... 430persisting ... done.
running PCA using 1000 OD genes .... done
8431 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 712 overdispersed genes ... 712persisting ... done.
running PCA using 1000 OD genes .... done
4095 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 503 overdispersed genes ... 503persisting ... done.
running PCA using 1000 OD genes .... done
542 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 10 overdispersed genes ... 10persisting ... done.
running PCA using 1000 OD genes .... done
3253 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 265 overdispersed genes ... 265persisting ... done.
running PCA using 1000 OD genes .... done
2654 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 86 overdispersed genes ... 86persisting ... done.
running PCA using 1000 OD genes .... done
9780 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 471 overdispersed genes ... 471persisting ... done.
running PCA using 1000 OD genes .... done
5461 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 246 overdispersed genes ... 246persisting ... done.
running PCA using 1000 OD genes .... done
5758 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 360 overdispersed genes ... 360persisting ... done.
running PCA using 1000 OD genes .... done

con_filt <- conos::Conos$new(p2s_filt, n.cores=30)
sample_per_cell <- con_filt$getDatasetPerCell()
condition_per_sample <- ifelse(grepl("ep|HB", levels(sample_per_cell)), "epilepsy", "control") %>%
  setNames(levels(sample_per_cell))
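
The condition labels are derived from the sample names alone. The sketch below applies the same pattern to a few hypothetical sample names; the real names come from `con_filt$getDatasetPerCell()` and are not listed on this page.

```r
# Hypothetical sample names, one entry per cell.
toy_sample_per_cell <- factor(c("C1", "C2", "ep3", "HB7", "E5", "C1"))

# A sample is labelled "epilepsy" if its name contains "ep" or "HB".
toy_condition <- setNames(
  ifelse(grepl("ep|HB", levels(toy_sample_per_cell)), "epilepsy", "control"),
  levels(toy_sample_per_cell)
)

toy_condition
```

`grepl()` is case-sensitive by default, so in this toy input `"E5"` is labelled `"control"`. Whether that matters for the real data depends on the actual sample names.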

con_filt$buildGraph(k=50, k.self=5, k.self.weight=0.1, same.factor.downweight=0.25, 
                    k.same.factor=5, balancing.factor.per.sample=condition_per_sample)

found 0 out of 171 cached PCA  space pairs ... running 171 additional PCA  space pairs  done
inter-sample links using  mNN   done
local pairs local pairs  done
building graph ..done
balancing edge weights done
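
The 171 PCA space pairs in this log are consistent with one pairwise alignment for every unordered pair of the 19 samples processed above. A quick check of that arithmetic, assuming each pair is counted once:

```r
# 19 per-sample Pagoda 2 logs were printed above, one per sample.
n_samples <- 19

# Number of unordered sample pairs: 19 * 18 / 2.
n_pairs <- choose(n_samples, 2)
n_pairs
```

This evaluates to 171, matching the log line.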

con_filt$embedGraph(method="UMAP", n.cores=45, min.prob.lower=1e-5, n.neighbors=30,
                    n.epochs=1000, spread=5, min.dist=1.0, verbose=TRUE)

Convert graph to adjacency list...
Done
Estimate nearest neighbors and commute times...
Estimating hitting distances: 03:59:13.
Done.
Estimating commute distances: 03:59:31.
Hashing adjacency list: 03:59:31.
Done.
Estimating distances: 04:00:42.
Done
Done.
All done!: 04:01:30.
Done
Estimate UMAP embedding...

04:01:30 UMAP embedding parameters a = 0.05 b = 1.003

04:01:30 Read 100479 rows and found 1 numeric columns

04:01:31 Commencing smooth kNN distance calibration using 45 threads

04:01:36 Initializing from normalized Laplacian + noise

04:01:55 Commencing optimization for 1000 epochs, with 4150582 positive edges using 45 threads

04:02:24 Optimization finished

Done
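
As a sanity check, the 100479 rows read by UMAP should equal the total number of cells across all samples. The snippet below sums the per-sample cell counts copied from the Pagoda 2 logs earlier on this page.

```r
# Cell counts per sample, in the order they appear in the Pagoda 2 logs.
cells_per_sample <- c(9394, 6618, 3944, 7295, 1389, 217, 2045, 6472, 6034, 6074,
                      11023, 8431, 4095, 542, 3253, 2654, 9780, 5461, 5758)

length(cells_per_sample)  # number of samples
sum(cells_per_sample)     # total number of cells in the joint graph
```

The sum is 100479 over 19 samples, which agrees with the UMAP log.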

write_rds(con_filt, CachePath("con_filt_cells.rds"))

Filter samples

bad_samples <- c("C3", "C5", "E5")

con_sf <- Conos$new(con_filt$samples[!(names(con_filt$samples) %in% bad_samples)], n.cores=30)
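
The sample filtering above is plain name-based subsetting of a list. A minimal sketch with a hypothetical list standing in for `con_filt$samples`:

```r
# Hypothetical stand-in for con_filt$samples: a named list, one entry per sample.
toy_p2s <- list(C1 = "p2_C1", C3 = "p2_C3", E1 = "p2_E1", E5 = "p2_E5")
toy_bad_samples <- c("C3", "C5", "E5")

# Drop every entry whose name is listed in toy_bad_samples.
toy_kept <- toy_p2s[!(names(toy_p2s) %in% toy_bad_samples)]
names(toy_kept)
```

Names in the exclusion vector that do not occur in the list (here `"C5"`) are ignored without a warning, so a misspelled sample name would go unnoticed.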

con_sf$buildGraph(verbose=TRUE, var.scale=TRUE, k=40, k.self=5, k.self.weight=0.1,
                  balancing.factor.per.sample=condition_per_sample, k.same.factor=5,
                  same.factor.downweight=0.25)
con_sf$findCommunities(method=conos::leiden.community, resolution=10)
con_sf$embedGraph(method="UMAP", spread=5, min.dist=1.0, n.epochs=2000, verbose=TRUE,
                  n.cores=30, min.prob.lower=1e-5)

write_rds(con_sf, CachePath("con_filt_samples.rds"))

Save Pagoda 2 objects

# con_sf <- read_rds(CachePath("con_filt_samples.rds"))
# con_filt <- read_rds(CachePath("con_filt_cells.rds"))

# p2_con <- Pagoda2FromConos(con_sf, n.pcs=0)
# go_env <- pagoda2::p2.generate.human.go(p2_con)
# 
# go_sets <- ExtractGoSets(go_env)

Conos

Filtered samples:

# t_anns <- lapply(annotation[2:4], setNames, annotation$cell) %>% 
#   lapply(`[`, names(p2_con$clusters$dataset))
# metadata <- ConvertMetadataToPagoda2Format(
#   AnnotationL1=t_anns$l1, AnnotationL2=t_anns$l2, 
#   AnnotationL3=t_anns$l3, Dataset=p2_con$clusters$dataset
# )
# 
# p2_web <- GetPagodaWebApp(p2_con, con_sf$clusters$leiden$groups, 
#                           additional.metadata=metadata, go.sets=go_sets, go.env=go_env)
# 
# show.app(p2_web, "Samples Filtered")
# p2_web$serializeToStaticFast(CachePath("p2s", "con_filt_samples.bin"));

Filtered cells:

# con_filt$findCommunities(method=leiden.community, resolution=10, n.iterations=10)
# p2_con_filt <- Pagoda2FromConos(con_filt, n.pcs=0)
# t_anns <- lapply(annotation[2:4], setNames, annotation$cell) %>% lapply(`[`, names(p2_con_filt$clusters$dataset))
# metadata <- ConvertMetadataToPagoda2Format(AnnotationL1=t_anns$l1, AnnotationL2=t_anns$l2,
#                                            AnnotationL3=t_anns$l3, Dataset=p2_con_filt$clusters$dataset)
# 
# p2_web_filt <- GetPagodaWebApp(p2_con_filt, con_filt$clusters$leiden$groups, additional.metadata=metadata,
#                                go.sets=go_sets, go.env=go_env)
# 
# # show.app(p2_web_filt, "Cells Filtered")
# p2_web_filt$serializeToStaticFast(CachePath("p2s", "con_filt_cells.bin"));

Individual

# p2 <- con_filt$samples[[1]]
# p2s_web <- lapply(con_filt$samples, function(p2) {
#   p2$makeKnnGraph(k=30, type="PCA", center=T, distance="cosine", weight.type="none", verbose=F)
#   p2$getKnnClusters(type="PCA", method=leiden.community, resolution=7, n.iterations=10, name="leiden")
# 
#   t_anns <- lapply(annotation[2:4], setNames, annotation$cell) %>% lapply(`[`, names(p2$clusters$PCA$leiden))
#   metadata <- ConvertMetadataToPagoda2Format(AnnotationL1=t_anns$l1, AnnotationL2=t_anns$l2, AnnotationL3=t_anns$l3)
#   GetPagodaWebApp(p2, p2$clusters$PCA$leiden, additional.metadata=metadata, go.sets=go_sets, go.env=go_env)
# })

# for (n in names(p2s_web)) {
#   p2s_web[[n]]$serializeToStaticFast(CachePath("p2s", paste0(n, ".bin")));
# }

data.frame(value=unlist(sessioninfo::platform_info()))

	value
version	R version 3.5.1 (2018-07-02)
os	Ubuntu 18.04.2 LTS
system	x86_64, linux-gnu
ui	X11
language	(EN)
collate	en_US.UTF-8
ctype	en_US.UTF-8
tz	America/New_York
date	2020-07-09

as.data.frame(sessioninfo::package_info())[c('package', 'loadedversion', 'date', 'source')]

	package	loadedversion	date	source
AnnotationDbi	AnnotationDbi	1.44.0	2018-10-30	Bioconductor
assertthat	assertthat	0.2.1	2019-03-21	CRAN (R 3.5.1)
backports	backports	1.1.5	2019-10-02	CRAN (R 3.5.1)
base64enc	base64enc	0.1-3	2015-07-28	CRAN (R 3.5.1)
beeswarm	beeswarm	0.2.3	2016-04-25	CRAN (R 3.5.1)
Biobase	Biobase	2.42.0	2018-10-30	Bioconductor
BiocGenerics	BiocGenerics	0.28.0	2018-10-30	Bioconductor
bit	bit	1.1-15.2	2020-02-10	CRAN (R 3.5.1)
bit64	bit64	0.9-7	2017-05-08	CRAN (R 3.5.1)
blob	blob	1.2.1	2020-01-20	CRAN (R 3.5.1)
brew	brew	1.0-6	2011-04-13	CRAN (R 3.5.1)
broom	broom	0.5.5	2020-02-29	CRAN (R 3.5.1)
callr	callr	3.4.2	2020-02-12	CRAN (R 3.5.1)
cellranger	cellranger	1.1.0	2016-07-27	CRAN (R 3.5.1)
cli	cli	2.0.2	2020-02-28	CRAN (R 3.5.1)
colorspace	colorspace	1.4-1	2019-03-18	CRAN (R 3.5.1)
conos	conos	1.3.0	2020-05-12	local
crayon	crayon	1.3.4	2017-09-16	CRAN (R 3.5.1)
data.table	data.table	1.12.8	2019-12-09	CRAN (R 3.5.1)
dataorganizer	dataorganizer	0.1.0	2019-11-08	local
DBI	DBI	1.1.0	2019-12-15	CRAN (R 3.5.1)
dbplyr	dbplyr	1.4.2	2019-06-17	CRAN (R 3.5.1)
dendextend	dendextend	1.13.4	2020-02-28	CRAN (R 3.5.1)
desc	desc	1.2.0	2018-05-01	CRAN (R 3.5.1)
devtools	devtools	2.2.2	2020-02-17	CRAN (R 3.5.1)
digest	digest	0.6.25	2020-02-23	CRAN (R 3.5.1)
dplyr	dplyr	0.8.5	2020-03-07	CRAN (R 3.5.1)
ellipsis	ellipsis	0.3.0	2019-09-20	CRAN (R 3.5.1)
Epilepsy19	Epilepsy19	0.0.0.9000	2019-10-15	local
evaluate	evaluate	0.14	2019-05-28	CRAN (R 3.5.1)
fansi	fansi	0.4.1	2020-01-08	CRAN (R 3.5.1)
fastmap	fastmap	1.0.1	2019-10-08	CRAN (R 3.5.1)
forcats	forcats	0.5.0	2020-03-01	CRAN (R 3.5.1)
fs	fs	1.3.2	2020-03-05	CRAN (R 3.5.1)
generics	generics	0.0.2	2018-11-29	CRAN (R 3.5.1)
ggbeeswarm	ggbeeswarm	0.6.0	2018-10-16	Github (eclarke/ggbeeswarm@fb85521)
ggplot2	ggplot2	3.3.0	2020-03-05	CRAN (R 3.5.1)
ggrastr	ggrastr	0.1.7	2018-12-04	Github (VPetukhov/ggrastr@203d5cc)
git2r	git2r	0.26.1	2019-06-29	CRAN (R 3.5.1)
glue	glue	1.3.2	2020-03-12	CRAN (R 3.5.1)
gridExtra	gridExtra	2.3	2017-09-09	CRAN (R 3.5.1)
gtable	gtable	0.3.0	2019-03-25	CRAN (R 3.5.1)
haven	haven	2.2.0	2019-11-08	CRAN (R 3.5.1)
highr	highr	0.8	2019-03-20	CRAN (R 3.5.1)
hms	hms	0.5.3	2020-01-08	CRAN (R 3.5.1)
htmltools	htmltools	0.4.0	2019-10-04	CRAN (R 3.5.1)
httpuv	httpuv	1.5.2	2019-09-11	CRAN (R 3.5.1)
httr	httr	1.4.1	2019-08-05	CRAN (R 3.5.1)
igraph	igraph	1.2.4.2	2019-11-27	CRAN (R 3.5.1)
IRanges	IRanges	2.16.0	2018-10-30	Bioconductor
irlba	irlba	2.3.3	2019-02-05	CRAN (R 3.5.1)
jsonlite	jsonlite	1.6.1	2020-02-02	CRAN (R 3.5.1)
knitr	knitr	1.28	2020-02-06	CRAN (R 3.5.1)
later	later	1.0.0	2019-10-04	CRAN (R 3.5.1)
lattice	lattice	0.20-40	2020-02-19	CRAN (R 3.5.1)
lifecycle	lifecycle	0.2.0	2020-03-06	CRAN (R 3.5.1)
lubridate	lubridate	1.7.4	2018-04-11	CRAN (R 3.5.1)
magrittr	magrittr	1.5	2014-11-22	CRAN (R 3.5.1)
MASS	MASS	7.3-51.5	2019-12-20	CRAN (R 3.5.1)
Matrix	Matrix	1.2-18	2019-11-27	CRAN (R 3.5.1)
memoise	memoise	1.1.0	2017-04-21	CRAN (R 3.5.1)
mgcv	mgcv	1.8-31	2019-11-09	CRAN (R 3.5.1)
mime	mime	0.9	2020-02-04	CRAN (R 3.5.1)
modelr	modelr	0.1.6	2020-02-22	CRAN (R 3.5.1)
munsell	munsell	0.5.0	2018-06-12	CRAN (R 3.5.1)
nlme	nlme	3.1-145	2020-03-04	CRAN (R 3.5.1)
org.Hs.eg.db	org.Hs.eg.db	3.7.0	2019-10-08	Bioconductor
pagoda2	pagoda2	0.1.1	2019-12-10	local
pbapply	pbapply	1.4-2	2019-08-31	CRAN (R 3.5.1)
pillar	pillar	1.4.3	2019-12-20	CRAN (R 3.5.1)
pkgbuild	pkgbuild	1.0.6	2019-10-09	CRAN (R 3.5.1)
pkgconfig	pkgconfig	2.0.3	2019-09-22	CRAN (R 3.5.1)
pkgload	pkgload	1.0.2	2018-10-29	CRAN (R 3.5.1)
prettyunits	prettyunits	1.1.1	2020-01-24	CRAN (R 3.5.1)
processx	processx	3.4.2	2020-02-09	CRAN (R 3.5.1)
promises	promises	1.1.0	2019-10-04	CRAN (R 3.5.1)
ps	ps	1.3.2	2020-02-13	CRAN (R 3.5.1)
purrr	purrr	0.3.3	2019-10-18	CRAN (R 3.5.1)
R6	R6	2.4.1	2019-11-12	CRAN (R 3.5.1)
Rcpp	Rcpp	1.0.4	2020-03-17	CRAN (R 3.5.1)
readr	readr	1.3.1	2018-12-21	CRAN (R 3.5.1)
readxl	readxl	1.3.1	2019-03-13	CRAN (R 3.5.1)
remotes	remotes	2.1.1	2020-02-15	CRAN (R 3.5.1)
reprex	reprex	0.3.0	2019-05-16	CRAN (R 3.5.1)
rjson	rjson	0.2.20	2018-06-08	CRAN (R 3.5.1)
rlang	rlang	0.4.5	2020-03-01	CRAN (R 3.5.1)
rmarkdown	rmarkdown	2.1	2020-01-20	CRAN (R 3.5.1)
Rook	Rook	1.1-1	2014-10-20	CRAN (R 3.5.1)
rprojroot	rprojroot	1.3-2	2018-01-03	CRAN (R 3.5.1)
RSpectra	RSpectra	0.16-0	2019-12-01	CRAN (R 3.5.1)
RSQLite	RSQLite	2.2.0	2020-01-07	CRAN (R 3.5.1)
rstudioapi	rstudioapi	0.11	2020-02-07	CRAN (R 3.5.1)
rvest	rvest	0.3.5	2019-11-08	CRAN (R 3.5.1)
S4Vectors	S4Vectors	0.20.1	2018-11-09	Bioconductor
scales	scales	1.1.0	2019-11-18	CRAN (R 3.5.1)
sccore	sccore	0.1	2020-04-24	Github (hms-dbmi/sccore@2b34b61)
sessioninfo	sessioninfo	1.1.1	2018-11-05	CRAN (R 3.5.1)
shiny	shiny	1.4.0.2	2020-03-13	CRAN (R 3.5.1)
stringi	stringi	1.4.6	2020-02-17	CRAN (R 3.5.1)
stringr	stringr	1.4.0	2019-02-10	CRAN (R 3.5.1)
testthat	testthat	2.3.2	2020-03-02	CRAN (R 3.5.1)
tibble	tibble	2.1.3	2019-06-06	CRAN (R 3.5.1)
tidyr	tidyr	1.0.2	2020-01-24	CRAN (R 3.5.1)
tidyselect	tidyselect	1.0.0	2020-01-27	CRAN (R 3.5.1)
tidyverse	tidyverse	1.3.0	2019-11-21	CRAN (R 3.5.1)
triebeard	triebeard	0.3.0	2016-08-04	CRAN (R 3.5.1)
urltools	urltools	1.7.3	2019-04-14	CRAN (R 3.5.1)
usethis	usethis	1.5.1	2019-07-04	CRAN (R 3.5.1)
uwot	uwot	0.1.8	2020-03-16	CRAN (R 3.5.1)
vctrs	vctrs	0.2.4	2020-03-10	CRAN (R 3.5.1)
vipor	vipor	0.4.5	2017-03-22	CRAN (R 3.5.1)
viridis	viridis	0.5.1	2018-03-29	CRAN (R 3.5.1)
viridisLite	viridisLite	0.3.0	2018-02-01	CRAN (R 3.5.1)
whisker	whisker	0.4	2019-08-28	CRAN (R 3.5.1)
withr	withr	2.1.2	2018-03-15	CRAN (R 3.5.1)
workflowr	workflowr	1.6.1	2020-03-11	CRAN (R 3.5.1)
xfun	xfun	0.12	2020-01-13	CRAN (R 3.5.1)
xml2	xml2	1.2.5	2020-03-11	CRAN (R 3.5.1)
xtable	xtable	1.8-4	2019-04-21	CRAN (R 3.5.1)
yaml	yaml	2.2.1	2020-02-01	CRAN (R 3.5.1)