“SummarizedExperiment” and the historical “ExpressionSet” are S4 objects storing high throughput omics data. The core component of the objects is an expression matrix, where the rows are features, such as genes, proteins, and columns are samples. The values in the matrix represent the abundance or presence/absence of features. The meta-information about features (rows) and samples (columns) are stored in data.frames-like object called “feature data” (or “row data”) and “phenotype data” (or "col data), respectively. More detailed instructions of ExpressionSet and SummarizeExperiment could be found here and here.
omicsViewer visualizes ExpressionSet/SummarizedExperiment in an interactive way to facilitate data exploration. The object to be visualized needs to be prepared in R, usually, it contains all the necessary information for the data interpretation, e.g. functional annotation of features, meta information of samples, and the statistical results (See Prepare the ExpressionSet). Using R/Bioconductor in the data preparation stage guarantees maximum flexibility in the statistical analysis. Once the object is prepared, it can be visualized using the omicsViewer interface, implemented using shiny and plotly. At this stage, coding in R is not required, therefore, it can be shared with anyone who does not have experience with any programming knowledge.
Both features and samples could be selected from (data) tables or graphs (scatter plot/heatmap) in the omicsViewer. Different types of analyses, such as enrichment analysis (using Bioconductor package fgsea or fisher’s exact test) and STRING network analysis, will be performed on the fly when a subset of features is selected. The statistical results are visualized simultaneously. When a subset of samples and a phenotype variable is selected, a significance test on mean difference (t-test or ranked based test; when phenotype variable is quantitative) or test of independence (chi-square or fisher’s exact test; when phenotype data is categorical) will be performed to test the association between the phenotype of interest with the selected samples. Therefore, the immediate analyses and result visualization in omicsViewer greatly facilitates data exploration, many different hypotheses can be explored in a short time without the need for knowledge of R.
Install the package from bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("omicsViewer")
BiocManager::install("omicsViewer")
Let’s assume the object of ExpressionSet has been prepared (See Prepare the ExpressionSet), and the task now is to visualize the object using omicsViewer. Inside R, this can be done using:
library(omicsViewer)
path2data <- system.file("extdata", package = "omicsViewer")
omicsViewer(path2data)
Here path2data is the path to the folder containing the .RDS files to be visualized by omicsViewer. All the .RDS files will be listed in the text box on the top right corner. Select one of the datasets to load it into the viewer. A video introduction on how to use the viewer is here.
The example data is an object of ExpressionSet, storing the protein expression of 60 cell lines from 9 different tissues of origin. There are around 2700 proteins measured. The phenotype data contain the basic information about cell lines, such as tissue of origin (categorical), TP53 status (categorical), doubling time (quantitative), multi-drug resistance (quantitative), principal components (quantitative), drug sensitivity data (quantitative) and survival data (fake data for demonstrative purpose), etc. The feature data include the overall intensity of feature, t-test results comparing RE and ME (p-value, fold change, etc), and feature loading from principal component analysis, etc. All the information could be visualized in the omicsViewer and incorporated into the data interpretation interactively.
The viewer has a “Left and Right Halves Layout”. The left side is the “Eset” panel, where the ExpressionSet (feature data, phenotype data and expression matrix) is visualized. This panel also serves as a selector, i.e. a subset of features and samples could be selected from this panel, which will be passed to the “Analyst” panel. In the “Analyst” panel, different statistical analyses will be performed depending on the features or samples selected.
The left side visualize the ExpressionSet, it contains the following tabs:
Feature visualizes the feature data. It is usually a scatter plot or beeswarm plot, every point in the plot is a feature. The x- and y-axis could be selected by the user. A single or multiple features can be selected from the plot and passed to the “analyst” (right) panel. A video tutorial about how to work with the *feature space":
Sample visualizes the sample data. Like the feature space, usually, a scatter plot or a beeswarm plot is shown here. But every point in this tab is a sample. One or more samples could be selected from the figure. A video tutorial about the data:
Heatmap shows the expression matrix as heatmap. Additionally, the information in phenotype and feature data could be added as side colors or used as variables to sort the rows and columns of the heatmap. Both features and samples could be selected from the heatmap. Video tutorial:
Analyst will perform analysis in response to the features and samples selected from the eset panel. Depending on the input data (see here), these tabs are maybe shown:
contrast variable | Significance test | Visualization |
---|---|---|
quantitative | t-test/Mann-Whitney U test | Beeswarm |
categorical | fischer’s exact test/chi-sq test | Contingency table |
Surv | log-rank test | KM-curve |
Video tutorial:
contrast variable | Significance test | Visualization |
---|---|---|
no | - | line chart/boxplot |
quantitative | Pearson correlation analysis | Scatter plot |
categorical | t-test/Mann-Whitney U test | Beeswarm |
Video tutorial:
ORA represents Over-Representation Analysis, this tab will be shown if the gene set information is given in the feature data. This tab shows which gene sets are over-represented in the selected features.
fGSEA the results of fGSEA. Please note the result of fgsea does not depend on the feature selected, instead, the user needs to give a ranking statistic, which is usually stored in the feature data.
StringDB This tab appears when there is a column in the feature data that can be used to query string database. When a subset of features is selected from the left panel, these features will be used to query Enrichment analysis and network analysis will be performed. This tab appears when there is a column in the feature data that can be used to query string database. When a subset of features is selected from the left panel, these features will be used to query Enrichment analysis and network analysis will be performed.
Geneshot Geneshot is web service used to retrieve genes related to a biomedical term. Unlike GSA, the term-related genes are RANKED, mainly according to the how frequently the gene is associated with a term in the literature.Therefore, this method may particularly useful in prioritizing the gene candidate in the data analysis.
gsList The functional annotation of genes in a simple tabular format.
SeqLogo This tab will only be shown when users give at least one column as the “sequence window” in the feature data. The column should contain the peptide sequence with the same length. SeqLogo will be drawn.
The back-end user needs to prepare the data to be visualized by ExpresssionSetViewer. There are functions defined in the package to perform the most basic analyses, such as principal component analysis and t-test. The results of these analyses will be integrated into the ExpressionSet in a format that is compatible with omicsViewer.
A simple example start with an expression matrix:
library(omicsViewer)
library(Biobase)
packdir <- system.file("extdata", package = "omicsViewer")
# reading expression
expr <- read.delim(file.path(packdir, "expressionMatrix.tsv"), stringsAsFactors = FALSE)
colnames(expr) <- make.names(colnames(expr))
rownames(expr) <- make.names(rownames(expr))
(Optional) When the expression matrix has thousands of rows, calculating the dendrogram of rows in shinyapp will be slow, we can also calculate the dendrograms in advance and store them as the “rowDendrogram” (or “colDendrogram” for columns) attribute of expression matrix. The “rowDendrogram” should be a named list, the suggested name is in the format like:
[distance measure]_[linkage method]_[imputed/noImpute]
Every element should be a list of length two:
An example:
# euclidean distance + complete linkage
hcl <- hclust(dist(expr))
dend <- as.dendrogram(hcl)
# correlation distance + ward.D linkage
hcl2 <- hclust(as.dist(1-cor(t(expr), use= "pair")), method = "ward.D")
dend2 <- as.dendrogram(hcl2)
# put two cluster results into one list and add it to the attributes of expression
dl <- list(
euclidean_complete_imputed = list(ord = hcl$order, hcl = dend),
pearson_ward.D_imputed = list(ord = hcl2$order, hcl = dend2)
)
attr(expr, "rowDendrogram") <- dl
Then we can continue to create the phenotypic and sample data:
# reading feature data
fd <- read.delim(file.path(packdir, "featureGeneral.tsv"), stringsAsFactors = FALSE)
# reading phenotype data
pd <- read.delim(file.path(packdir, "sampleGeneral.tsv"), stringsAsFactors = FALSE)
# reading other datasets
drugData <- read.delim(file.path(packdir, "sampleDrug.tsv"))
# survival data
# this data is from cell line, the survival data are fake data to show how to use the survival data in omicsViewer
surv <- read.delim(file.path(packdir, "sampleSurv.tsv"))
# gene set information
genesets <- read_gmt(file.path(packdir, "geneset.gmt"), data.frame = TRUE)
gsannot <- gsAnnotIdList(idList = rownames(fd), gsIdMap = genesets, data.frame = FALSE)
# Define t-test to be done, a matrix nx3
# every row define a t-test, the format
# [column header] [group 1 in the test] [group 2 in the test]
tests <- rbind(
c("Origin", "RE", "ME"),
c("Origin", "RE", "LE"),
c('TP53.Status', "MT", "WT")
)
# prepare column for stringDB query
strid <- sapply(strsplit(fd$Protein.ID, ";|-"), "[", 1)
###
d <- prepOmicsViewer(
expr = expr, pData = pd, fData = fd,
PCA = TRUE, pca.fillNA = TRUE,
t.test = tests, ttest.fillNA = FALSE,
gs = gsannot, stringDB = strid, surv = surv,
SummarizedExperiment = FALSE)
# feature space - default x axis
attr(d, "fx") <- "ttest|RE_vs_ME|mean.diff"
# feature space - default y axis
attr(d, "fy") <- "ttest|RE_vs_ME|log.fdr"
# sample space - default x axis
attr(d, "sx") <- "PCA|All|PC1("
# sample space - default y axis
attr(d, "sy") <- "PCA|All|PC2("
# saveRDS(d, file = "dtest.RDS")
## to open the viewer
# omicsViewer("./")
Additional columns of phenotype data and feature data could be added using extendMetaData function. For example, if we can to correlate drug sensitivity data with our protein expression data
expr <- exprs(d)
drugcor <- correlationAnalysis(expr, pheno = drugData[, 1:2])
d <- extendMetaData(d, drugcor, where = "fData")
# saveRDS(d, file = "dtest.RDS")
## to open the viewer
# omicsViewer("./")
When you prepare the object by yourself, please note that to be compatible with omicsViewer, there are two extra requirements imposed on the object:
The expression matrix should have unique row names and column names. The row names of feature data should be the same as the row names of the expression matrix. The row names of phenotype data should be the same as the column names of the expression matrix. It is also recommended to use meaningful names such as gene names because the names will be displayed in the expression viewer.
The columns names of feature data and phenotype data should be in the format consisting of three elements:
Analysis|Subset|Variable
The first element is called analysis, such as “t-test”, “PCA”, which tells the front-end user from which analysis the result is. There are few analysis “keywords” reserved so the omicsViewer knows how to process/interpret it internally. Please see the next section for more information. The second element is subset, it often indicates which subset of samples/features was used in the analysis. For example, in an experiment with three groups A, B and C, a t-test was used to compare the group A and B, the first two elements could be “ttest|A_vs_B”. Alternatively, the Subset can be anything that helps to further narrow down the information given an “Analysis”. If nothing needs to give, a place holder (e.g. “All”) needs to be added to this position. The last elements are the variables returned by the analysis, for example, the “fold change” and (log-transformed) “p-value” could be given by header “ttest|A_vs_B|fold.change” and “ttest|A_vs_B|p.value”.
Any information could be added to the feature and sample data as long as they meet these requirements. The purpose of this rule for headers is to let the users quickly navigate through the potentially large number of columns in the phenotype and feature data. Once the data are prepared in this way, the information of interest can be selected using a three-step selector in the front-end.
These keywords are reserved for the Analysis element in the headers. Users should avoid using them in cases other than the designated usage.
In addition to opening the viewer inside R, omicsViewer together with the data prepared, can be deployed via several other ways.
docker pull mengchen18/omicsviewer
Portable R: To share the omicsViewer and data with users do not use docker, you can use portable R with xcmsViewer installed. The ready-to-use package could be access from zenodo.
Shiny server: omicsViewer can also be deployed via shiny server. To do so, a server.R and ui.R file should be created and put into the shiny app directory in the shiny server. By doing this, any users who have accession to the shiny server will be able to explore the data and do not need to install anything. The example files of server.R and ui.R are
# server.R
library(shiny)
dir <- "/path/to/obj" # this should be changed
server <- function(input, output, session) {
callModule(omicsViewer::app_module, id = "app", dir = reactive(dir))
}
# ui.R
library(shiny)
ui <- fluidPage(
omicsViewer::app_ui("app")
)
Assume you have prepared an object for omicsViewer, you can open it in R using
library(omicsViewer)
omicsViewer("path/to/object")
The object could be loaded from the top-right corner.
The users can also define their own analysis and plug it into the Analyst panel. To add a self-defined analysis/tab, users first need to create a list of four elements:
tabName Tab name to be shown in the Analyst panel
moduleName ID used when call modules
moduleUi shiny module UI function
moduleServer shiny module server function. The module function should a function with the following arguments:
Then the four-element list should be wrapped by a parent list, which can store a single or multiple “four-element lists”, each representing an additional analysis/tab. Here is an example including a single additional tab:
library(omicsViewer)
ctxt <- function(x) paste(
x[1:min(10, length(x))], collapse = ";"
)
### extra tab 1 -- feature selected ###
exampleCustomeUI_1 <- function(id) {
ns <- NS(id)
tagList(
verbatimTextOutput(ns("txt"))
)
}
exampleCustomeMoudule_1 <- function(
input, output, session, pdata, fdata, expr, feature_selected, sample_selected, object
) {
output$txt <- renderText({
sprintf(
"feature selected: %s ; index feature selected: %s",
ctxt(feature_selected()),
ctxt(match(feature_selected(), rownames(fdata())))
)
})
}
ll <- list(
list(
tabName = "CustomeAddOn",
moduleName = "opt1",
moduleUi = exampleCustomeUI_1,
moduleServer = exampleCustomeMoudule_1
)
)
if (interactive())
omicsViewer("Git/package/inst/extdata/", additionalTabs = ll)
When more tabs should be added, simply add more elements in to the list and pass it to the additionalTabs argument:
### continue from above : extra tab 2 -- sample selected ###
exampleCustomeUI_2 <- function(id) {
ns <- NS(id)
tagList(
verbatimTextOutput(ns("txt"))
)
}
exampleCustomeMoudule_2 <- function(
input, output, session, pdata, fdata, expr, feature_selected, sample_selected, object
) {
output$txt <- renderText({
sprintf(
"sample selected: %s ; colnames sample selected: %s",
ctxt(sample_selected()),
ctxt(match(sample_selected(), rownames(pdata())))
)
})
}
### use oiginal object ###
exampleCustomeUI_3 <- function(id) {
ns <- NS(id)
tagList(
verbatimTextOutput(ns("txt"))
)
}
exampleCustomeMoudule_3 <- function(
input, output, session, pdata, fdata, expr, feature_selected, sample_selected, object
) {
output$txt <- renderText({
as.character(classVersion(object()))
})
}
ll <- list(
# first optional tab
list(
tabName = "Opt feature",
moduleName = "opt-feature",
moduleUi = exampleCustomeUI_1,
moduleServer = exampleCustomeMoudule_1
),
# second optional tab
list(
tabName = "Opt sample",
moduleName = "opt-sample",
moduleUi = exampleCustomeUI_2,
moduleServer = exampleCustomeMoudule_2
),
# third optional tab
list(
tabName = "Opt object",
moduleName = "opt-object",
moduleUi = exampleCustomeUI_3,
moduleServer = exampleCustomeMoudule_3
)
)
if (interactive())
omicsViewer("Git/package/inst/extdata/", additionalTabs = ll)
Include a “three-step selector” in the module:
selector_example_ui <- function(id) {
ns <- NS(id)
tagList(
# select variable
omicsViewer:::triselector_ui(ns("selector_id"))
)
}
selector_example_module <- function(
input, output, session, pdata, fdata, expr, feature_selected, sample_selected, object
) {
ns <- session$ns
# select stats from feature data
triset <- reactive({
fd <- fdata()
cn <- colnames(fd)[sapply(fd, is.numeric) & !grepl("^GS\\|", colnames(fd))]
str_split_fixed(cn, "\\|", n = 3)
})
v1 <- callModule(
omicsViewer:::triselector_module, id = "selector_id", reactive_x = triset, label = "Label"
)
}
ll <- list(
list(
tabName = "selector",
moduleName = "selector-feature",
moduleUi = selector_example_ui,
moduleServer = selector_example_module
)
)
if (interactive())
omicsViewer("Git/package/inst/extdata/", additionalTabs = ll)
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] shiny_1.7.1 Biobase_2.52.0 BiocGenerics_0.38.0
## [4] omicsViewer_0.99.11
##
## loaded via a namespace (and not attached):
## [1] fgsea_1.18.0 colorspace_2.0-2
## [3] ggsignif_0.6.3 ellipsis_0.3.2
## [5] rio_0.5.27 XVector_0.32.0
## [7] GenomicRanges_1.44.0 ggpubr_0.4.0
## [9] DT_0.19 bit64_4.0.5
## [11] fansi_0.5.0 splines_4.1.2
## [13] mnormt_2.0.2 cachem_1.0.6
## [15] knitr_1.34 shinythemes_1.2.0
## [17] flatxml_0.1.1 jsonlite_1.7.2
## [19] km.ci_0.5-2 broom_0.7.9
## [21] compiler_4.1.2 httr_1.4.2
## [23] backports_1.2.1 assertthat_0.2.1
## [25] Matrix_1.4-0 fastmap_1.1.0
## [27] lazyeval_0.2.2 later_1.3.0
## [29] htmltools_0.5.2 tools_4.1.2
## [31] igraph_1.2.6 gtable_0.3.0
## [33] glue_1.4.2 GenomeInfoDbData_1.2.6
## [35] reshape2_1.4.4 dplyr_1.0.7
## [37] fastmatch_1.1-3 Rcpp_1.0.7
## [39] carData_3.0-4 cellranger_1.1.0
## [41] jquerylib_0.1.4 vctrs_0.3.8
## [43] nlme_3.1-153 crosstalk_1.1.1
## [45] ggseqlogo_0.1 psych_2.1.6
## [47] xfun_0.25 stringr_1.4.0
## [49] networkD3_0.4 openxlsx_4.2.4
## [51] mime_0.11 lifecycle_1.0.0
## [53] shinycssloaders_1.0.0 rstatix_0.7.0
## [55] zoo_1.8-9 zlibbioc_1.38.0
## [57] scales_1.1.1 hms_1.1.0
## [59] promises_1.2.0.1 MatrixGenerics_1.4.3
## [61] SummarizedExperiment_1.22.0 RColorBrewer_1.1-2
## [63] yaml_2.2.1 curl_4.3.2
## [65] memoise_2.0.0 gridExtra_2.3
## [67] KMsurv_0.1-5 ggplot2_3.3.5
## [69] sass_0.4.0 stringi_1.7.4
## [71] RSQLite_2.2.8 S4Vectors_0.30.0
## [73] zip_2.2.0 BiocParallel_1.26.2
## [75] GenomeInfoDb_1.28.4 rlang_0.4.11
## [77] pkgconfig_2.0.3 matrixStats_0.60.1
## [79] bitops_1.0-7 fontawesome_0.2.2
## [81] evaluate_0.14 lattice_0.20-45
## [83] purrr_0.3.4 htmlwidgets_1.5.4
## [85] bit_4.0.4 tidyselect_1.1.1
## [87] plyr_1.8.6 magrittr_2.0.1
## [89] R6_2.5.1 IRanges_2.26.0
## [91] generics_0.1.0 DelayedArray_0.18.0
## [93] DBI_1.1.1 pillar_1.6.2
## [95] haven_2.4.3 foreign_0.8-82
## [97] abind_1.4-5 survival_3.2-13
## [99] RCurl_1.98-1.4 tibble_3.1.4
## [101] crayon_1.4.1 car_3.0-11
## [103] survMisc_0.5.5 shinyWidgets_0.6.2
## [105] utf8_1.2.2 tmvnsim_1.0-2
## [107] plotly_4.9.4.1 shinybusy_0.2.2
## [109] rmarkdown_2.10 grid_4.1.2
## [111] readxl_1.3.1 data.table_1.14.0
## [113] blob_1.2.2 forcats_0.5.1
## [115] digest_0.6.27 xtable_1.8-4
## [117] tidyr_1.1.3 httpuv_1.6.2
## [119] stats4_4.1.2 munsell_0.5.0
## [121] beeswarm_0.4.0 viridisLite_0.4.0
## [123] survminer_0.4.9 bslib_0.3.0
## [125] shinyjs_2.0.0