vignettes/tenx-unfiltered-pbmc4k.Rmd
tenx-unfiltered-pbmc4k.Rmd
Here, we describe a brief analysis of the peripheral blood mononuclear cell (PBMC) dataset from 10X Genomics [@zheng2017massively]. The data are publicly available from the 10X Genomics website, from which we download the raw gene/barcode count matrices, i.e., before cell calling from the CellRanger pipeline.
This tutorial is edited and extracted from the OSCA Unfiltered human PBMCs workflow tenx.unfiltered-pbmc4k.Rmd
library(DropletTestFiles)
raw.path <- getTestFile("tenx-2.1.0-pbmc4k/1.0.0/raw.tar.gz")
out.path <- file.path(tempdir(), "pbmc4k")
untar(raw.path, exdir=out.path)
library(DropletUtils)
fname <- file.path(out.path, "raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names=TRUE)
library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol)
library(EnsDb.Hsapiens.v86)
location <- mapIds(EnsDb.Hsapiens.v86, keys=rowData(sce.pbmc)$ID,
column="SEQNAME", keytype="GENEID")
We perform cell detection using the emptyDrops()
algorithm, as discussed in the book Advanced
Single-Cell Analysis with Bioconductor.
set.seed(100)
e.out <- emptyDrops(counts(sce.pbmc))
sce.pbmc <- sce.pbmc[,which(e.out$FDR <= 0.001)]
unfiltered <- sce.pbmc
We use a relaxed QC strategy and only remove cells with large mitochondrial proportions, using it as a proxy for cell damage. This reduces the risk of removing cell types with low RNA content, especially in a heterogeneous PBMC population with many different cell types.
stats <- perCellQCMetrics(sce.pbmc, subsets=list(Mito=which(location=="MT")))
high.mito <- isOutlier(stats$subsets_Mito_percent, type="higher")
sce.pbmc <- sce.pbmc[,!high.mito]
summary(high.mito)
## Mode FALSE TRUE
## logical 3985 315
colData(unfiltered) <- cbind(colData(unfiltered), stats)
unfiltered$discard <- high.mito
gridExtra::grid.arrange(
plotColData(unfiltered, y="sum", colour_by="discard") +
scale_y_log10() + ggtitle("Total count"),
plotColData(unfiltered, y="detected", colour_by="discard") +
scale_y_log10() + ggtitle("Detected features"),
plotColData(unfiltered, y="subsets_Mito_percent",
colour_by="discard") + ggtitle("Mito percent"),
ncol=2
)
plotColData(unfiltered, x="sum", y="subsets_Mito_percent",
colour_by="discard") + scale_x_log10()
library(scran)
set.seed(1000)
clusters <- quickCluster(sce.pbmc)
sce.pbmc <- computeSumFactors(sce.pbmc, cluster=clusters)
sce.pbmc <- logNormCounts(sce.pbmc)
summary(sizeFactors(sce.pbmc))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00749 0.71207 0.87490 1.00000 1.09900 12.25412
plot(librarySizeFactors(sce.pbmc), sizeFactors(sce.pbmc), pch=16,
xlab="Library size factors", ylab="Deconvolution factors", log="xy")
set.seed(1001)
dec.pbmc <- modelGeneVarByPoisson(sce.pbmc)
top.pbmc <- getTopHVGs(dec.pbmc, prop=0.1)
plot(dec.pbmc$mean, dec.pbmc$total, pch=16, cex=0.5,
xlab="Mean of log-expression", ylab="Variance of log-expression")
curfit <- metadata(dec.pbmc)
curve(curfit$trend(x), col='dodgerblue', add=TRUE, lwd=2)
set.seed(10000)
sce.pbmc <- denoisePCA(sce.pbmc, subset.row=top.pbmc, technical=dec.pbmc)
set.seed(100000)
sce.pbmc <- runTSNE(sce.pbmc, dimred="PCA")
set.seed(1000000)
sce.pbmc <- runUMAP(sce.pbmc, dimred="PCA")
We verify that a reasonable number of PCs is retained.
ncol(reducedDim(sce.pbmc, "PCA"))
## [1] 9
g <- buildSNNGraph(sce.pbmc, k=10, use.dimred = 'PCA')
clust <- igraph::cluster_walktrap(g)$membership
colLabels(sce.pbmc) <- factor(clust)
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 205 508 541 56 374 125 46 432 302 867 47 155 166 61 84 16
plotTSNE(sce.pbmc, colour_by="label")
markers <- findMarkers(sce.pbmc, pval.type="some", direction="up")
We examine the markers for cluster 8 in more detail. High expression of CD14, CD68 and MNDA combined with low expression of CD16 suggests that this cluster contains monocytes, compared to macrophages in cluster 15 (Figure @ref(fig:unref-mono-pbmc-markers)).
marker.set <- markers[["8"]]
as.data.frame(marker.set[1:30,1:3])
## p.value FDR summary.logFC
## CSTA 7.170624e-222 2.015964e-217 2.4178954
## MNDA 1.196631e-221 2.015964e-217 2.6614935
## FCN1 2.375980e-213 2.668543e-209 2.6380934
## S100A12 4.393470e-212 3.700839e-208 3.0808902
## VCAN 1.711043e-199 1.153038e-195 2.2603760
## TYMP 1.173532e-154 6.590164e-151 2.0237930
## AIF1 3.673649e-149 1.768285e-145 2.4603604
## LGALS2 4.004740e-137 1.686696e-133 1.8927606
## MS4A6A 5.639909e-134 2.111457e-130 1.5457061
## FGL2 2.044513e-124 6.888781e-121 1.3859366
## RP11-1143G9.4 6.891551e-122 2.110945e-118 2.8042347
## AP1S2 1.786019e-112 5.014842e-109 1.7703547
## CD14 1.195352e-110 3.098169e-107 1.4259764
## CFD 6.870490e-109 1.653531e-105 1.3560255
## GPX1 9.048825e-107 2.032607e-103 2.4013937
## TNFSF13B 3.920319e-95 8.255701e-92 1.1151275
## KLF4 3.309726e-94 6.559876e-91 1.2049050
## GRN 4.801206e-91 8.987324e-88 1.3814668
## NAMPT 2.489624e-90 4.415020e-87 1.1438687
## CLEC7A 7.736088e-88 1.303299e-84 1.0616120
## S100A8 3.124930e-84 5.013875e-81 4.8051993
## SERPINA1 1.580359e-82 2.420392e-79 1.3842689
## CD36 8.018347e-79 1.174653e-75 1.0538169
## MPEG1 8.481588e-79 1.190744e-75 0.9778095
## CD68 5.118714e-78 6.898798e-75 0.9481203
## CYBB 1.200516e-77 1.555776e-74 1.0300245
## S100A11 1.174556e-72 1.465759e-69 1.8962486
## RBP7 2.467027e-71 2.968714e-68 0.9666127
## BLVRB 3.762610e-71 4.371634e-68 0.9701168
## CD302 9.859086e-71 1.107307e-67 0.8792077
plotExpression(sce.pbmc, features=c("CD14", "CD68",
"MNDA", "FCGR3A"), x="label", colour_by="label")
R version 4.2.2 (2022-10-31) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 22.04.1 LTS
Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats4 stats graphics grDevices utils
datasets methods
[8] base
other attached packages: [1] scran_1.26.0
EnsDb.Hsapiens.v86_2.99.0
[3] ensembldb_2.22.0 AnnotationFilter_1.22.0
[5] GenomicFeatures_1.50.2 AnnotationDbi_1.60.0
[7] scater_1.26.1 ggplot2_3.4.0
[9] scuttle_1.8.1 DropletUtils_1.18.1
[11] SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0 [13]
Biobase_2.58.0 GenomicRanges_1.50.1
[15] GenomeInfoDb_1.34.4 IRanges_2.32.0
[17] S4Vectors_0.36.0 BiocGenerics_0.44.0
[19] MatrixGenerics_1.10.0 matrixStats_0.63.0
[21] DropletTestFiles_1.8.0 knitr_1.41
loaded via a namespace (and not attached): [1] AnnotationHub_3.6.0
BiocFileCache_2.6.0
[3] systemfonts_1.0.4 igraph_1.3.5
[5] lazyeval_0.2.2 BiocParallel_1.32.4
[7] digest_0.6.30 htmltools_0.5.3
[9] viridis_0.6.2 fansi_1.0.3
[11] magrittr_2.0.3 memoise_2.0.1
[13] ScaledMatrix_1.6.0 cluster_2.1.4
[15] limma_3.54.0 Biostrings_2.66.0
[17] R.utils_2.12.2 pkgdown_2.0.6
[19] prettyunits_1.1.1 colorspace_2.0-3
[21] blob_1.2.3 rappdirs_0.3.3
[23] ggrepel_0.9.2 textshaping_0.3.6
[25] xfun_0.35 dplyr_1.0.10
[27] crayon_1.5.2 RCurl_1.98-1.9
[29] jsonlite_1.8.3 glue_1.6.2
[31] gtable_0.3.1 zlibbioc_1.44.0
[33] XVector_0.38.0 DelayedArray_0.24.0
[35] BiocSingular_1.14.0 Rhdf5lib_1.20.0
[37] HDF5Array_1.26.0 scales_1.2.1
[39] DBI_1.1.3 edgeR_3.40.0
[41] Rcpp_1.0.9 viridisLite_0.4.1
[43] xtable_1.8-4 progress_1.2.2
[45] dqrng_0.3.0 bit_4.0.5
[47] rsvd_1.0.5 metapod_1.6.0
[49] httr_1.4.4 FNN_1.1.3.1
[51] ellipsis_0.3.2 farver_2.1.1
[53] pkgconfig_2.0.3 XML_3.99-0.12
[55] R.methodsS3_1.8.2 uwot_0.1.14
[57] sass_0.4.4 dbplyr_2.2.1
[59] locfit_1.5-9.6 utf8_1.2.2
[61] labeling_0.4.2 tidyselect_1.2.0
[63] rlang_1.0.6 later_1.3.0
[65] munsell_0.5.0 BiocVersion_3.16.0
[67] tools_4.2.2 cachem_1.0.6
[69] cli_3.4.1 generics_0.1.3
[71] RSQLite_2.2.19 ExperimentHub_2.6.0
[73] evaluate_0.18 stringr_1.4.1
[75] fastmap_1.1.0 yaml_2.3.6
[77] ragg_1.2.4 bit64_4.0.5
[79] fs_1.5.2 purrr_0.3.5
[81] KEGGREST_1.38.0 sparseMatrixStats_1.10.0
[83] mime_0.12 R.oo_1.25.0
[85] xml2_1.3.3 biomaRt_2.54.0
[87] compiler_4.2.2 beeswarm_0.4.0
[89] filelock_1.0.2 curl_4.3.3
[91] png_0.1-8 interactiveDisplayBase_1.36.0 [93] statmod_1.4.37
tibble_3.1.8
[95] bslib_0.4.1 stringi_1.7.8
[97] highr_0.9 desc_1.4.2
[99] bluster_1.8.0 lattice_0.20-45
[101] ProtGenerics_1.30.0 Matrix_1.5-3
[103] vctrs_0.5.1 pillar_1.8.1
[105] lifecycle_1.0.3 rhdf5filters_1.10.0
[107] BiocManager_1.30.19 jquerylib_0.1.4
[109] BiocNeighbors_1.16.0 bitops_1.0-7
[111] irlba_2.3.5.1 httpuv_1.6.6
[113] rtracklayer_1.58.0 R6_2.5.1
[115] BiocIO_1.8.0 promises_1.2.0.1
[117] gridExtra_2.3 vipor_0.4.5
[119] codetools_0.2-18 assertthat_0.2.1
[121] rhdf5_2.42.0 rprojroot_2.0.3
[123] rjson_0.2.21 withr_2.5.0
[125] GenomicAlignments_1.34.0 Rsamtools_2.14.0
[127] GenomeInfoDbData_1.2.9 parallel_4.2.2
[129] hms_1.1.2 grid_4.2.2
[131] beachmat_2.14.0 rmarkdown_2.18
[133] DelayedMatrixStats_1.20.0 Rtsne_0.16
[135] shiny_1.7.3 ggbeeswarm_0.6.0
[137] restfulr_0.0.15