Last updated: 2020-07-09

Checks: 7 passed, 0 failed

Knit directory: Epilepsy19/

This reproducible R Markdown analysis was created with workflowr (version 1.6.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20200706) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
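
As a minimal illustration of why the seed matters (a sketch with illustrative variable names, not part of the original analysis), resetting the same seed before a random draw reproduces that draw exactly:

```r
# Draw a random permutation twice, resetting the seed before each draw
set.seed(20200706)
first_draw <- sample(10)

set.seed(20200706)
second_draw <- sample(10)

# Same seed, same stream of random numbers, same permutation
identical(first_draw, second_draw)
```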

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version cc9a53b. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/fig_go.nb.html
    Ignored:    analysis/fig_neun.nb.html
    Ignored:    analysis/fig_overview.nb.html
    Ignored:    analysis/fig_smart_seq.nb.html
    Ignored:    analysis/fig_summary.nb.html
    Ignored:    analysis/fig_type_distance.nb.html
    Ignored:    analysis/gene_testing.nb.html
    Ignored:    analysis/prep_alignment.nb.html
    Ignored:    analysis/prep_filtration.nb.html
    Ignored:    cache/con_allen.rds
    Ignored:    cache/con_filt_cells.rds
    Ignored:    cache/con_filt_samples.rds
    Ignored:    cache/con_ss.rds
    Ignored:    cache/count_matrices.rds
    Ignored:    cache/p2s/
    Ignored:    output/

Untracked files:
    Untracked:  code/

Unstaged changes:
    Modified:   README.md
    Modified:   analysis/index.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/prep_alignment.Rmd) and HTML (docs/prep_alignment.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 83671cf viktor_petukhov 2020-07-09 Pre-processing notebooks

# library(Epilepsy19)
library(dataorganizer)
library(magrittr)
library(Matrix)
library(pbapply)
library(conos)
library(readr)
library(tidyverse)

theme_set(theme_bw())

devtools::load_all()
annotation <- read_csv(MetadataPath("annotation.csv"))

cms_all <- CachePath("count_matrices.rds") %>% read_rds()
cms_all <- names(cms_all) %>% setNames(., .) %>% lapply(function(n)
  cms_all[[n]] %>% set_colnames(paste0(n, "_", colnames(.))) %>% 
    set_rownames(make.unique(rownames(.))))
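
The renaming step above can be sketched on toy data. The example below uses base R only, and the sample names, barcodes and gene names are hypothetical. It shows why both operations are useful: prefixing barcodes with the sample name keeps cells distinct when the same barcode occurs in two samples, and `make.unique` resolves duplicated gene names.

```r
# Toy stand-in for cms_all: two small count matrices (hypothetical names)
cms_toy <- list(
  S1=matrix(1:4, nrow=2, dimnames=list(c("GeneA", "GeneA"), c("AAAC", "AAAG"))),
  S2=matrix(5:8, nrow=2, dimnames=list(c("GeneA", "GeneB"), c("AAAC", "AAAT")))
)

# Same logic as the pipeline above, written without magrittr
cms_toy <- lapply(setNames(names(cms_toy), names(cms_toy)), function(n) {
  cm <- cms_toy[[n]]
  colnames(cm) <- paste0(n, "_", colnames(cm))  # barcode -> "<sample>_<barcode>"
  rownames(cm) <- make.unique(rownames(cm))     # duplicated genes get a numeric suffix
  cm
})

colnames(cms_toy$S1)  # "S1_AAAC" "S1_AAAG"
rownames(cms_toy$S1)  # "GeneA" "GeneA.1"
```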

cms_all$NeuN <- NULL

Filter cells

cms_filt <- cms_all %>% lapply(function(cm) cm[, (colnames(cm) %in% annotation$cell)])
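
A small sketch of the same column filter on toy data (the matrix and annotation table below are hypothetical). One caveat it illustrates: if only one annotated cell remains, plain `[` indexing drops the result to a vector, so adding `drop=FALSE` may be a safer choice. The original code does not do this, and whether it matters depends on the data.

```r
# Toy count matrix: 2 genes x 3 cells (hypothetical barcodes)
cm <- matrix(1:6, nrow=2,
             dimnames=list(c("GeneA", "GeneB"), c("S1_AAAC", "S1_AAAG", "S1_AAAT")))

# Hypothetical annotation table; only cells listed here are kept
annotation_toy <- data.frame(cell=c("S1_AAAG", "S2_CCCC"), stringsAsFactors=FALSE)

keep <- colnames(cm) %in% annotation_toy$cell
cm_filt <- cm[, keep, drop=FALSE]  # drop=FALSE keeps a 2 x 1 matrix

dim(cm_filt)
```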

p2s_filt <- lapply(cms_filt, GetPagoda, graph.k=10, embeding.type="UMAP", n.cores=30,
                   spread=5, min.dist=1.0, build.graph=FALSE)
9394 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 881 overdispersed genes ... 881persisting ... done.
running PCA using 1000 OD genes .... done
6618 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 1454 overdispersed genes ... 1454persisting ... done.
running PCA using 1000 OD genes .... done
3944 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 463 overdispersed genes ... 463persisting ... done.
running PCA using 1000 OD genes .... done
7295 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 819 overdispersed genes ... 819persisting ... done.
running PCA using 1000 OD genes .... done
1389 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 1561 overdispersed genes ... 1561persisting ... done.
running PCA using 1000 OD genes .... done
217 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 66 overdispersed genes ... 66persisting ... done.
running PCA using 1000 OD genes .... done
2045 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 468 overdispersed genes ... 468persisting ... done.
running PCA using 1000 OD genes .... done
6472 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 566 overdispersed genes ... 566persisting ... done.
running PCA using 1000 OD genes .... done
6034 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 475 overdispersed genes ... 475persisting ... done.
running PCA using 1000 OD genes .... done
6074 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 577 overdispersed genes ... 577persisting ... done.
running PCA using 1000 OD genes .... done
11023 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 430 overdispersed genes ... 430persisting ... done.
running PCA using 1000 OD genes .... done
8431 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 712 overdispersed genes ... 712persisting ... done.
running PCA using 1000 OD genes .... done
4095 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 503 overdispersed genes ... 503persisting ... done.
running PCA using 1000 OD genes .... done
542 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 10 overdispersed genes ... 10persisting ... done.
running PCA using 1000 OD genes .... done
3253 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 265 overdispersed genes ... 265persisting ... done.
running PCA using 1000 OD genes .... done
2654 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 86 overdispersed genes ... 86persisting ... done.
running PCA using 1000 OD genes .... done
9780 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 471 overdispersed genes ... 471persisting ... done.
running PCA using 1000 OD genes .... done
5461 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 246 overdispersed genes ... 246persisting ... done.
running PCA using 1000 OD genes .... done
5758 cells, 33681 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 360 overdispersed genes ... 360persisting ... done.
running PCA using 1000 OD genes .... done
con_filt <- conos::Conos$new(p2s_filt, n.cores=30)
sample_per_cell <- con_filt$getDatasetPerCell()
condition_per_sample <- ifelse(grepl("ep|HB", levels(sample_per_cell)), "epilepsy", "control") %>%
  setNames(levels(sample_per_cell))
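
The condition assignment relies on pattern matching against sample names. The sketch below uses hypothetical names to show how `grepl("ep|HB", ...)` behaves. Matching is case-sensitive and succeeds anywhere in the string, so a name such as "E5" would be labelled "control". It may therefore be worth inspecting `table(condition_per_sample)` before building the graph.

```r
# Hypothetical sample names; the real ones come from levels(sample_per_cell)
sample_names <- c("C1", "ep1", "HB2", "E5")

condition_toy <- setNames(
  ifelse(grepl("ep|HB", sample_names), "epilepsy", "control"),
  sample_names
)

condition_toy         # "E5" ends up as "control": no lowercase "ep", no "HB"
table(condition_toy)  # quick sanity check of group sizes
```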

con_filt$buildGraph(k=50, k.self=5, k.self.weight=0.1, same.factor.downweight=0.25, 
                    k.same.factor=5, balancing.factor.per.sample=condition_per_sample)
found 0 out of 171 cached PCA  space pairs ... running 171 additional PCA  space pairs  done
inter-sample links using  mNN   done
local pairs local pairs  done
building graph ..done
balancing edge weights done
con_filt$embedGraph(method="UMAP", n.cores=45, min.prob.lower=1e-5, n.neighbors=30, 
                    n.epochs=1000, spread=5, min.dist=1.0, verbose=TRUE)
Convert graph to adjacency list...
Done
Estimate nearest neighbors and commute times...
Estimating hitting distances: 03:59:13.
Done.
Estimating commute distances: 03:59:31.
Hashing adjacency list: 03:59:31.
Done.
Estimating distances: 04:00:42.
Done
Done.
All done!: 04:01:30.
Done
Estimate UMAP embedding...
04:01:30 UMAP embedding parameters a = 0.05 b = 1.003
04:01:30 Read 100479 rows and found 1 numeric columns
04:01:31 Commencing smooth kNN distance calibration using 45 threads
04:01:36 Initializing from normalized Laplacian + noise
04:01:55 Commencing optimization for 1000 epochs, with 4150582 positive edges using 45 threads
04:02:24 Optimization finished
Done
write_rds(con_filt, CachePath("con_filt_cells.rds"))
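
The log output above can be cross-checked with simple arithmetic. The sketch below copies the per-sample cell counts from the pagoda2 messages. The number of samples, the number of unordered sample pairs and the total cell count appear to agree with the 171 PCA space pairs and the 100479 rows reported during embedding.

```r
# Per-sample cell counts copied from the pagoda2 log lines above
n_cells <- c(9394, 6618, 3944, 7295, 1389, 217, 2045, 6472, 6034, 6074,
             11023, 8431, 4095, 542, 3253, 2654, 9780, 5461, 5758)

length(n_cells)             # number of samples: 19
choose(length(n_cells), 2)  # unordered sample pairs: 171
sum(n_cells)                # total cells: 100479
```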

Filter samples

bad_samples <- c("C3", "C5", "E5")

con_sf <- Conos$new(con_filt$samples[!(names(con_filt$samples) %in% bad_samples)], n.cores=30)

con_sf$buildGraph(verbose=TRUE, var.scale=TRUE, k=40, k.self=5, k.self.weight=0.1, 
                  balancing.factor.per.sample=condition_per_sample, k.same.factor=5, 
                  same.factor.downweight=0.25)
con_sf$findCommunities(method=conos::leiden.community, resolution=10)
con_sf$embedGraph(method="UMAP", spread=5, min.dist=1.0, n.epochs=2000, verbose=TRUE, 
                  n.cores=30, min.prob.lower=1e-5)
write_rds(con_sf, CachePath("con_filt_samples.rds"))
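
Dropping samples by name, as done above, can be sketched with a plain named list (the list contents here are hypothetical placeholders for Pagoda 2 objects). Names in the exclusion list that do not occur in the data are silently ignored by `%in%`, so a `setdiff` check can help catch typos.

```r
# Hypothetical stand-in for con_filt$samples: a named list, one element per sample
samples_toy <- list(C1="p2_C1", C3="p2_C3", E2="p2_E2", E5="p2_E5")
bad_toy <- c("C3", "C5", "E5")  # "C5" is absent from samples_toy

kept <- samples_toy[!(names(samples_toy) %in% bad_toy)]
names(kept)  # "C1" "E2"

# Names listed for removal that were not found in the data
setdiff(bad_toy, names(samples_toy))  # "C5"
```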

Save Pagoda 2 objects

# con_sf <- read_rds(CachePath("con_filt_samples.rds"))
# con_filt <- read_rds(CachePath("con_filt_cells.rds"))
# p2_con <- Pagoda2FromConos(con_sf, n.pcs=0)
# go_env <- pagoda2::p2.generate.human.go(p2_con)
# 
# go_sets <- ExtractGoSets(go_env)

Conos

Filtered samples:

# t_anns <- lapply(annotation[2:4], setNames, annotation$cell) %>% 
#   lapply(`[`, names(p2_con$clusters$dataset))
# metadata <- ConvertMetadataToPagoda2Format(
#   AnnotationL1=t_anns$l1, AnnotationL2=t_anns$l2, 
#   AnnotationL3=t_anns$l3, Dataset=p2_con$clusters$dataset
# )
# 
# p2_web <- GetPagodaWebApp(p2_con, con_sf$clusters$leiden$groups, 
#                           additional.metadata=metadata, go.sets=go_sets, go.env=go_env)
# 
# show.app(p2_web, "Samples Filtered")
# p2_web$serializeToStaticFast(CachePath("p2s", "con_filt_samples.bin"));

Filtered cells:

# con_filt$findCommunities(method=leiden.community, resolution=10, n.iterations=10)
# p2_con_filt <- Pagoda2FromConos(con_filt, n.pcs=0)
# t_anns <- lapply(annotation[2:4], setNames, annotation$cell) %>% lapply(`[`, names(p2_con_filt$clusters$dataset))
# metadata <- ConvertMetadataToPagoda2Format(AnnotationL1=t_anns$l1, AnnotationL2=t_anns$l2,
#                                            AnnotationL3=t_anns$l3, Dataset=p2_con_filt$clusters$dataset)
# 
# p2_web_filt <- GetPagodaWebApp(p2_con_filt, con_filt$clusters$leiden$groups, additional.metadata=metadata,
#                                go.sets=go_sets, go.env=go_env)
# 
# # show.app(p2_web_filt, "Cells Filtered")
# p2_web_filt$serializeToStaticFast(CachePath("p2s", "con_filt_cells.bin"));

Individual

# p2 <- con_filt$samples[[1]]
# p2s_web <- lapply(con_filt$samples, function(p2) {
#   p2$makeKnnGraph(k=30, type="PCA", center=T, distance="cosine", weight.type="none", verbose=F)
#   p2$getKnnClusters(type="PCA", method=leiden.community, resolution=7, n.iterations=10, name="leiden")
# 
#   t_anns <- lapply(annotation[2:4], setNames, annotation$cell) %>% lapply(`[`, names(p2$clusters$PCA$leiden))
#   metadata <- ConvertMetadataToPagoda2Format(AnnotationL1=t_anns$l1, AnnotationL2=t_anns$l2, AnnotationL3=t_anns$l3)
#   GetPagodaWebApp(p2, p2$clusters$PCA$leiden, additional.metadata=metadata, go.sets=go_sets, go.env=go_env)
# })
# for (n in names(p2s_web)) {
#   p2s_web[[n]]$serializeToStaticFast(CachePath("p2s", paste0(n, ".bin")));
# }

data.frame(value=unlist(sessioninfo::platform_info()))
         value
version  R version 3.5.1 (2018-07-02)
os       Ubuntu 18.04.2 LTS
system   x86_64, linux-gnu
ui       X11
language (EN)
collate  en_US.UTF-8
ctype    en_US.UTF-8
tz       America/New_York
date     2020-07-09
as.data.frame(sessioninfo::package_info())[c('package', 'loadedversion', 'date', 'source')]
package loadedversion date source
AnnotationDbi AnnotationDbi 1.44.0 2018-10-30 Bioconductor
assertthat assertthat 0.2.1 2019-03-21 CRAN (R 3.5.1)
backports backports 1.1.5 2019-10-02 CRAN (R 3.5.1)
base64enc base64enc 0.1-3 2015-07-28 CRAN (R 3.5.1)
beeswarm beeswarm 0.2.3 2016-04-25 CRAN (R 3.5.1)
Biobase Biobase 2.42.0 2018-10-30 Bioconductor
BiocGenerics BiocGenerics 0.28.0 2018-10-30 Bioconductor
bit bit 1.1-15.2 2020-02-10 CRAN (R 3.5.1)
bit64 bit64 0.9-7 2017-05-08 CRAN (R 3.5.1)
blob blob 1.2.1 2020-01-20 CRAN (R 3.5.1)
brew brew 1.0-6 2011-04-13 CRAN (R 3.5.1)
broom broom 0.5.5 2020-02-29 CRAN (R 3.5.1)
callr callr 3.4.2 2020-02-12 CRAN (R 3.5.1)
cellranger cellranger 1.1.0 2016-07-27 CRAN (R 3.5.1)
cli cli 2.0.2 2020-02-28 CRAN (R 3.5.1)
colorspace colorspace 1.4-1 2019-03-18 CRAN (R 3.5.1)
conos conos 1.3.0 2020-05-12 local
crayon crayon 1.3.4 2017-09-16 CRAN (R 3.5.1)
data.table data.table 1.12.8 2019-12-09 CRAN (R 3.5.1)
dataorganizer dataorganizer 0.1.0 2019-11-08 local
DBI DBI 1.1.0 2019-12-15 CRAN (R 3.5.1)
dbplyr dbplyr 1.4.2 2019-06-17 CRAN (R 3.5.1)
dendextend dendextend 1.13.4 2020-02-28 CRAN (R 3.5.1)
desc desc 1.2.0 2018-05-01 CRAN (R 3.5.1)
devtools devtools 2.2.2 2020-02-17 CRAN (R 3.5.1)
digest digest 0.6.25 2020-02-23 CRAN (R 3.5.1)
dplyr dplyr 0.8.5 2020-03-07 CRAN (R 3.5.1)
ellipsis ellipsis 0.3.0 2019-09-20 CRAN (R 3.5.1)
Epilepsy19 Epilepsy19 0.0.0.9000 2019-10-15 local
evaluate evaluate 0.14 2019-05-28 CRAN (R 3.5.1)
fansi fansi 0.4.1 2020-01-08 CRAN (R 3.5.1)
fastmap fastmap 1.0.1 2019-10-08 CRAN (R 3.5.1)
forcats forcats 0.5.0 2020-03-01 CRAN (R 3.5.1)
fs fs 1.3.2 2020-03-05 CRAN (R 3.5.1)
generics generics 0.0.2 2018-11-29 CRAN (R 3.5.1)
ggbeeswarm ggbeeswarm 0.6.0 2018-10-16 Github ()
ggplot2 ggplot2 3.3.0 2020-03-05 CRAN (R 3.5.1)
ggrastr ggrastr 0.1.7 2018-12-04 Github ()
git2r git2r 0.26.1 2019-06-29 CRAN (R 3.5.1)
glue glue 1.3.2 2020-03-12 CRAN (R 3.5.1)
gridExtra gridExtra 2.3 2017-09-09 CRAN (R 3.5.1)
gtable gtable 0.3.0 2019-03-25 CRAN (R 3.5.1)
haven haven 2.2.0 2019-11-08 CRAN (R 3.5.1)
highr highr 0.8 2019-03-20 CRAN (R 3.5.1)
hms hms 0.5.3 2020-01-08 CRAN (R 3.5.1)
htmltools htmltools 0.4.0 2019-10-04 CRAN (R 3.5.1)
httpuv httpuv 1.5.2 2019-09-11 CRAN (R 3.5.1)
httr httr 1.4.1 2019-08-05 CRAN (R 3.5.1)
igraph igraph 1.2.4.2 2019-11-27 CRAN (R 3.5.1)
IRanges IRanges 2.16.0 2018-10-30 Bioconductor
irlba irlba 2.3.3 2019-02-05 CRAN (R 3.5.1)
jsonlite jsonlite 1.6.1 2020-02-02 CRAN (R 3.5.1)
knitr knitr 1.28 2020-02-06 CRAN (R 3.5.1)
later later 1.0.0 2019-10-04 CRAN (R 3.5.1)
lattice lattice 0.20-40 2020-02-19 CRAN (R 3.5.1)
lifecycle lifecycle 0.2.0 2020-03-06 CRAN (R 3.5.1)
lubridate lubridate 1.7.4 2018-04-11 CRAN (R 3.5.1)
magrittr magrittr 1.5 2014-11-22 CRAN (R 3.5.1)
MASS MASS 7.3-51.5 2019-12-20 CRAN (R 3.5.1)
Matrix Matrix 1.2-18 2019-11-27 CRAN (R 3.5.1)
memoise memoise 1.1.0 2017-04-21 CRAN (R 3.5.1)
mgcv mgcv 1.8-31 2019-11-09 CRAN (R 3.5.1)
mime mime 0.9 2020-02-04 CRAN (R 3.5.1)
modelr modelr 0.1.6 2020-02-22 CRAN (R 3.5.1)
munsell munsell 0.5.0 2018-06-12 CRAN (R 3.5.1)
nlme nlme 3.1-145 2020-03-04 CRAN (R 3.5.1)
org.Hs.eg.db org.Hs.eg.db 3.7.0 2019-10-08 Bioconductor
pagoda2 pagoda2 0.1.1 2019-12-10 local
pbapply pbapply 1.4-2 2019-08-31 CRAN (R 3.5.1)
pillar pillar 1.4.3 2019-12-20 CRAN (R 3.5.1)
pkgbuild pkgbuild 1.0.6 2019-10-09 CRAN (R 3.5.1)
pkgconfig pkgconfig 2.0.3 2019-09-22 CRAN (R 3.5.1)
pkgload pkgload 1.0.2 2018-10-29 CRAN (R 3.5.1)
prettyunits prettyunits 1.1.1 2020-01-24 CRAN (R 3.5.1)
processx processx 3.4.2 2020-02-09 CRAN (R 3.5.1)
promises promises 1.1.0 2019-10-04 CRAN (R 3.5.1)
ps ps 1.3.2 2020-02-13 CRAN (R 3.5.1)
purrr purrr 0.3.3 2019-10-18 CRAN (R 3.5.1)
R6 R6 2.4.1 2019-11-12 CRAN (R 3.5.1)
Rcpp Rcpp 1.0.4 2020-03-17 CRAN (R 3.5.1)
readr readr 1.3.1 2018-12-21 CRAN (R 3.5.1)
readxl readxl 1.3.1 2019-03-13 CRAN (R 3.5.1)
remotes remotes 2.1.1 2020-02-15 CRAN (R 3.5.1)
reprex reprex 0.3.0 2019-05-16 CRAN (R 3.5.1)
rjson rjson 0.2.20 2018-06-08 CRAN (R 3.5.1)
rlang rlang 0.4.5 2020-03-01 CRAN (R 3.5.1)
rmarkdown rmarkdown 2.1 2020-01-20 CRAN (R 3.5.1)
Rook Rook 1.1-1 2014-10-20 CRAN (R 3.5.1)
rprojroot rprojroot 1.3-2 2018-01-03 CRAN (R 3.5.1)
RSpectra RSpectra 0.16-0 2019-12-01 CRAN (R 3.5.1)
RSQLite RSQLite 2.2.0 2020-01-07 CRAN (R 3.5.1)
rstudioapi rstudioapi 0.11 2020-02-07 CRAN (R 3.5.1)
rvest rvest 0.3.5 2019-11-08 CRAN (R 3.5.1)
S4Vectors S4Vectors 0.20.1 2018-11-09 Bioconductor
scales scales 1.1.0 2019-11-18 CRAN (R 3.5.1)
sccore sccore 0.1 2020-04-24 Github ()
sessioninfo sessioninfo 1.1.1 2018-11-05 CRAN (R 3.5.1)
shiny shiny 1.4.0.2 2020-03-13 CRAN (R 3.5.1)
stringi stringi 1.4.6 2020-02-17 CRAN (R 3.5.1)
stringr stringr 1.4.0 2019-02-10 CRAN (R 3.5.1)
testthat testthat 2.3.2 2020-03-02 CRAN (R 3.5.1)
tibble tibble 2.1.3 2019-06-06 CRAN (R 3.5.1)
tidyr tidyr 1.0.2 2020-01-24 CRAN (R 3.5.1)
tidyselect tidyselect 1.0.0 2020-01-27 CRAN (R 3.5.1)
tidyverse tidyverse 1.3.0 2019-11-21 CRAN (R 3.5.1)
triebeard triebeard 0.3.0 2016-08-04 CRAN (R 3.5.1)
urltools urltools 1.7.3 2019-04-14 CRAN (R 3.5.1)
usethis usethis 1.5.1 2019-07-04 CRAN (R 3.5.1)
uwot uwot 0.1.8 2020-03-16 CRAN (R 3.5.1)
vctrs vctrs 0.2.4 2020-03-10 CRAN (R 3.5.1)
vipor vipor 0.4.5 2017-03-22 CRAN (R 3.5.1)
viridis viridis 0.5.1 2018-03-29 CRAN (R 3.5.1)
viridisLite viridisLite 0.3.0 2018-02-01 CRAN (R 3.5.1)
whisker whisker 0.4 2019-08-28 CRAN (R 3.5.1)
withr withr 2.1.2 2018-03-15 CRAN (R 3.5.1)
workflowr workflowr 1.6.1 2020-03-11 CRAN (R 3.5.1)
xfun xfun 0.12 2020-01-13 CRAN (R 3.5.1)
xml2 xml2 1.2.5 2020-03-11 CRAN (R 3.5.1)
xtable xtable 1.8-4 2019-04-21 CRAN (R 3.5.1)
yaml yaml 2.2.1 2020-02-01 CRAN (R 3.5.1)