Title: | Gene Set Analysis Toolkit WebGestaltR |
---|---|
Description: | The web version WebGestalt <https://www.webgestalt.org> supports 12 organisms, 354 gene identifiers and 321,251 functional categories. Users can upload their data and functional categories with their own gene identifiers. In addition to Over-Representation Analysis, WebGestalt also supports Gene Set Enrichment Analysis and Network Topology Analysis. The user-friendly output report allows interactive and efficient exploration of enrichment results. The WebGestaltR package supports all of the above functions and can also be integrated into other pipelines or used to analyze multiple gene lists simultaneously. |
Authors: | John Elizarraras [aut, cre], Jing Wang [aut], Yuxing Liao [aut], Eric Jaehnig [ctb], Zhiao Shi [ctb], Quanhu Sheng [ctb] |
Maintainer: | John Elizarraras |
License: | LGPL |
Version: | 1.0.0 |
Built: | 2025-02-14 18:32:58 UTC |
Source: | https://github.com/bzhanglab/webgestaltr |
Use affinity propagation to cluster similar gene sets to reduce redundancy in the report.
affinityPropagation(idsInSet, score)
idsInSet |
A list of set names and their member IDs. |
score |
A vector of additive scores, of the same length as idsInSet, used to assign the input preference; a higher score carries a larger weight, e.g. -log(P). |
A list with two components:
clusters: A list of character vectors of set IDs in each cluster.
representatives: A character vector of representatives for each cluster.
Zhiao Shi, Yuxing Liao
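To illustrate the return value described above, the sketch below builds a hypothetical result with the documented clusters/representatives shape and maps each set to its cluster's representative. It assumes representatives[i] belongs to clusters[[i]]; ap_result, set_to_rep and the toy enrich table are made up for the example and are not part of the package.

```r
# Hypothetical result with the shape documented for affinityPropagation():
# clusters[[i]] holds the set IDs of cluster i, representatives[i] its exemplar (assumed order).
ap_result <- list(
  clusters = list(c("setA", "setB"), c("setC")),
  representatives = c("setA", "setC")
)

# Map every set ID to the representative of its cluster
set_to_rep <- function(res) {
  reps <- rep(res$representatives, lengths(res$clusters))
  setNames(reps, unlist(res$clusters))
}

lookup <- set_to_rep(ap_result)
lookup[["setB"]]   # "setA"

# Keep only representative sets in a toy enrichment table
enrich <- data.frame(geneSet = c("setA", "setB", "setC"),
                     pValue = c(1e-5, 1e-4, 0.01),
                     stringsAsFactors = FALSE)
reduced <- enrich[enrich$geneSet %in% ap_result$representatives, ]
```

Collapsing a results table to one row per cluster like this is one way to use the clustering to shorten a report.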
Check Format and Read Data
formatCheck(dataType = "list", inputGeneFile = NULL, inputGene = NULL)
dataType |
Type of data, either |
inputGeneFile |
The data file to be mapped. |
inputGene |
Or the input could be given as an R object.
GMT file should be read with |
A list of data frames.
Outputs a brief summary of input genes based on GO Slim data.
goSlimSummary( organism = "hsapiens", geneList, outputFile, outputType = "pdf", isOutput = TRUE, cache = NULL, hostName = "https://www.webgestalt.org" )
organism |
Currently, WebGestaltR supports 12 organisms. Users can use the function
|
geneList |
A list of input genes. |
outputFile |
Output file name. |
outputType |
File format of the plot: |
isOutput |
Boolean; whether the plot is saved to |
cache |
A directory to save the data cache for reuse. Defaults to NULL. |
hostName |
The server URL for accessing data. Mostly for development purposes. |
A list of the summary result.
ID mapping utility using the WebGestalt server.
idMapping( organism = "hsapiens", dataType = "list", inputGeneFile = NULL, inputGene = NULL, sourceIdType, targetIdType = NULL, collapseMethod = "mean", mappingOutput = FALSE, outputFileName = "", cache = NULL, hostName = "https://www.webgestalt.org/" )
idToSymbol( organism = "hsapiens", dataType = "list", inputGeneFile = NULL, inputGene = NULL, sourceIdType = "ensembl_gene_id", collapseMethod = "mean", mappingOutput = FALSE, outputFileName = NULL, cache = NULL, hostName = "https://www.webgestalt.org/" )
organism |
Currently, WebGestaltR supports 12 organisms. Users can use the function
|
dataType |
Type of data, either |
inputGeneFile |
The data file to be mapped. |
inputGene |
Or the input could be given as an R object.
GMT file should be read with |
sourceIdType |
The ID type of the data. |
targetIdType |
The ID type of the mapped data. |
collapseMethod |
The method to collapse duplicate IDs with scores. |
mappingOutput |
Boolean if the mapping output is written to file. |
outputFileName |
The output file name. |
cache |
A directory to save the data cache for reuse. Defaults to NULL. |
hostName |
The server URL for accessing data. Mostly for development purposes. |
A list of mapped and unmapped IDs.
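idMapping() documents collapseMethod = "mean" for collapsing duplicate IDs that carry scores. A minimal base-R sketch of mean collapsing, assuming the input is a vector of IDs with a parallel vector of scores; collapse_scores is a hypothetical helper, and the package's other collapse options are not reproduced here.

```r
# Collapse duplicate IDs to one score each (mean, as in the documented default)
collapse_scores <- function(ids, scores, fun = mean) {
  # split() groups scores by ID; vapply() applies fun to each group
  vapply(split(scores, ids), fun, numeric(1))
}

collapsed <- collapse_scores(c("TP53", "TP53", "MYC"), c(1, 3, 5))
collapsed   # MYC = 5, TP53 = 2
```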
Calculate Jaccard Similarity.
jaccardSim(idsInSet, score)
idsInSet |
A list of set names and their member IDs. |
score |
A vector of additive scores, of the same length as idsInSet, used to assign the input preference; a higher score carries a larger weight, e.g. -log(P). |
A list of the similarity matrix sim.mat and the input preference vector ip.vec.
Zhiao Shi, Yuxing Liao
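As a sketch of the sim.mat component, assuming the standard Jaccard index (size of the intersection divided by size of the union of two member-ID sets), the pairwise matrix can be built in base R. jaccard_matrix is a hypothetical name, and the ip.vec preference computation is not reproduced.

```r
# Pairwise Jaccard similarity |A intersect B| / |A union B| for a named list of ID sets
jaccard_matrix <- function(idsInSet) {
  n <- length(idsInSet)
  sim <- matrix(0, n, n, dimnames = list(names(idsInSet), names(idsInSet)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      a <- idsInSet[[i]]
      b <- idsInSet[[j]]
      sim[i, j] <- length(intersect(a, b)) / length(union(a, b))
    }
  }
  sim
}

sets <- list(a = c("g1", "g2", "g3"), b = c("g2", "g3", "g4"), c = "g5")
sim <- jaccard_matrix(sets)
sim["a", "b"]   # 0.5 (2 shared IDs out of 4 distinct)
```

Empty sets would give 0/0 (NaN) here, so they should be filtered out beforehand.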
kMedoid clustering
kMedoid(idsInSet, score, maxK = 10)
idsInSet |
A list of sets of IDs. |
score |
A vector of scores, one for each set. |
maxK |
The maximum number of clusters. |
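The kMedoid() entry documents only its arguments. As a hypothetical sketch of k-medoid clustering on gene sets, the code below uses 1 minus the Jaccard similarity as the distance and a simple alternating (Voronoi-iteration) update with a fixed k. Whether kMedoid() uses this distance or this update rule, and how it selects k up to maxK, is not stated here, so treat all of it as an assumption; jaccard_dist and kmedoids_sketch are illustrative names.

```r
# Jaccard distance (1 - |A intersect B| / |A union B|) between named ID sets
jaccard_dist <- function(sets) {
  n <- length(sets)
  d <- matrix(0, n, n, dimnames = list(names(sets), names(sets)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      shared <- length(intersect(sets[[i]], sets[[j]]))
      d[i, j] <- 1 - shared / length(union(sets[[i]], sets[[j]]))
    }
  }
  d
}

# Alternating k-medoids: assign to the nearest medoid, then re-pick each medoid
kmedoids_sketch <- function(D, k, maxit = 100) {
  medoids <- seq_len(k)                        # naive start: first k items
  for (it in seq_len(maxit)) {
    cl <- apply(D[, medoids, drop = FALSE], 1, which.min)
    new <- vapply(seq_len(k), function(j) {
      m <- which(cl == j)
      if (length(m) == 0) return(medoids[j])   # keep medoid of an empty cluster
      m[which.min(rowSums(D[m, m, drop = FALSE]))]  # member with least total distance
    }, integer(1))
    if (all(new == medoids)) break
    medoids <- new
  }
  cl <- apply(D[, medoids, drop = FALSE], 1, which.min)  # final assignment
  list(medoids = rownames(D)[medoids], clustering = cl)
}

sets <- list(a = c("g1", "g2", "g3"), b = c("g1", "g2", "g3", "g4"),
             c = c("g7", "g8"), d = c("g7", "g8", "g9"))
res <- kmedoids_sketch(jaccard_dist(sets), k = 2)
res$clustering   # a and b share one cluster, c and d the other
```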
List available WebGestalt servers.
listArchiveUrl()
A data frame of available servers.
List available gene sets for the given organism on WebGestalt server.
listGeneSet( organism = "hsapiens", hostName = "https://www.webgestalt.org/", cache = NULL )
organism |
Currently, WebGestaltR supports 12 organisms. Users can use the function
|
hostName |
The server URL for accessing data. Mostly for development purposes. |
cache |
A directory to save the data cache for reuse. Defaults to NULL. |
A data frame of available gene sets.
List supported ID types for the given organism on WebGestalt server.
listIdType( organism = "hsapiens", hostName = "https://www.webgestalt.org/", cache = NULL )
organism |
Currently, WebGestaltR supports 12 organisms. Users can use the function
|
hostName |
The server URL for accessing data. Mostly for development purposes. |
cache |
A directory to save the data cache for reuse. Defaults to NULL. |
A list of supported ID types.
List supported organisms on WebGestalt server.
listOrganism(hostName = "https://www.webgestalt.org/", cache = NULL)
hostName |
The server URL for accessing data. Mostly for development purposes. |
cache |
A directory to save the data cache for reuse. Defaults to NULL. |
A list of supported organisms.
List available reference sets for the given organism on WebGestalt server.
listReferenceSet( organism = "hsapiens", hostName = "https://www.webgestalt.org/", cache = NULL )
organism |
Currently, WebGestaltR supports 12 organisms. Users can use the function
|
hostName |
The server URL for accessing data. Mostly for development purposes. |
cache |
A directory to save the data cache for reuse. Defaults to NULL. |
A list of reference sets.
Load gene set data
loadGeneSet( organism = "hsapiens", enrichDatabase = NULL, enrichDatabaseFile = NULL, enrichDatabaseType = NULL, enrichDatabaseDescriptionFile = NULL, cache = NULL, hostName = "https://www.webgestalt.org/", isMultiOmics = FALSE )
organism |
Currently, WebGestaltR supports 12 organisms. Users can use the function
|
enrichDatabase |
The functional categories for the enrichment analysis. Users can use
the function |
enrichDatabaseFile |
Users can provide one or more GMT files as the functional
category for enrichment analysis. The extension of the file should be |
enrichDatabaseType |
The ID type of the genes in the |
enrichDatabaseDescriptionFile |
Users can also provide description files for the custom
|
cache |
A directory to save the data cache for reuse. Defaults to NULL. |
hostName |
The server URL for accessing data. Mostly for development purposes. |
isMultiOmics |
Boolean; whether to load gene sets for multi-omics analysis. Defaults to FALSE. |
A list of geneSet, geneSetDes, geneSetDag, geneSetNet, and standardId.
Gene set: A data frame with columns "geneSet", "description", and "genes".
Description: A data frame with two columns: gene set ID and description.
DAG: An edge list data frame with two columns, parent and child, or a list of data frames if multiple databases are given.
Network: An edge list data frame with two columns connecting nodes, or a list of data frames if multiple databases are given.
The standard ID of the gene set
Performs site-weighted gene set enrichment analysis, or standard GSEA when the likelihood/weight columns in input_df are 1 or 0, p=1, q=1, and thresh_type="val".
multiswGsea( input_df_list, thresh_type = "percentile", thresh = 0.9, thresh_action = "exclude", min_set_size = 10, max_set_size = 500, max_score = "max", min_score = "min", psuedocount = 0.001, perms = 1000, p = 1, q = 1, nThreads = 1, rng_seed = 1, fork = FALSE, fdrMethod = "BH" )
input_df_list |
A data frame in which first column is name of item of interest (gene, protein, phosphosite, etc.), the second is the correlation of that item of interest with the phenotype (typically log ratio of expression for phenotype vs. normal), and the remaining columns are the scores for the likelihood that the item belongs in each set (one column per set). |
thresh_type |
The type of |
thresh |
Depends on |
thresh_action |
Either "include", "exclude" (default), or "adjust"; this specifies how to treat each set if it does not contain a minimum number of items or contains all of the items. This option cannot be used with predefined lists of items in sets (if the number of items in a given set does not meet requirements, that set will be skipped). |
min_set_size, max_set_size |
The minimum/maximum number of items each set needs for the analysis to proceed. |
max_score, min_score |
An optional numeric vector of minimum/maximum boundaries to clip scores for each set. |
psuedocount |
Pseudocount (pc) is used for rescaling set scores:
|
perms |
The number of permutations. |
p |
The exponential scaling factor of the phenotype score (second column in
|
q |
The exponential scaling factor of the likelihood score (weights). |
nThreads |
The number of threads to use in calculating permutations. |
rng_seed |
Random seed. |
fork |
A boolean. Whether to pass "fork" to |
fdrMethod |
For the ORA method, WebGestaltR supports five FDR methods: |
The formula for weighting is as follows
where r is the log ratio score, s is the likelihood score, and j is the index of the gene.
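Assuming each item's weight is |r_j|^p * s_j^q (consistent with the descriptions of p and q above), the sketch below computes a GSEA-style running sum and enrichment score for a single set. With 0/1 likelihoods and p = q = 1 the hit/miss steps reduce to the standard GSEA running sum; the (1 - s) miss term for fractional s is a hypothetical choice and not necessarily what swGsea does. running_sum_es is an illustrative name.

```r
# Running-sum enrichment score for one set (sketch; the weighting is assumed)
running_sum_es <- function(r, s, p = 1, q = 1) {
  o <- order(r, decreasing = TRUE)      # rank items by phenotype score
  r <- r[o]
  s <- s[o]
  w <- abs(r)^p * s^q                   # assumed per-item weight
  hit <- w / sum(w)                     # step up at weighted members
  miss <- (1 - s) / sum(1 - s)          # step down at non-members
  rs <- cumsum(hit - miss)
  list(running_sum = rs, ES = rs[which.max(abs(rs))])  # maximum deviation from 0
}

res <- running_sum_es(r = c(3, 2, 1, -1, -2), s = c(1, 0, 1, 0, 0))
res$ES   # 0.75
```

Because hits and misses are each normalized to sum to 1, the running sum returns to 0 at the end of the ranked list.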
A list of Enrichment_Results, Items_in_Set, and Running_Sums.
A data frame with row names of gene set and columns of "ES", "NES", "p_val", "fdr".
A list of one-column data frames. Describes genes and their ranks in each set.
Running sum scores along genes sorted by ranked scores, with gene sets as columns.
John Elizarraras
A helper to read files for performing standard GSEA.
prepareGseaInput(rankFile, gmtFile)
rankFile |
Path of the rnk file |
gmtFile |
Path of the GMT file |
A data frame to be used as input to swGsea.
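prepareGseaInput() reads an rnk file and a GMT file. As a minimal sketch of the rnk half, assuming a headerless, two-column, tab-separated file of gene and score; read_rnk is a hypothetical helper, not the package's reader.

```r
# Read a headerless, tab-separated .rnk file (gene, score) and sort by score
read_rnk <- function(path) {
  rnk <- read.delim(path, header = FALSE, col.names = c("gene", "score"),
                    stringsAsFactors = FALSE)
  rnk[order(rnk$score, decreasing = TRUE), ]
}

tmp <- tempfile(fileext = ".rnk")
writeLines(c("TP53\t-1.2", "MYC\t2.5", "EGFR\t0.3"), tmp)
rnk <- read_rnk(tmp)
rnk$gene   # "MYC" "EGFR" "TP53"
```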
Prepare Input Matrix for GSEA
prepareInputMatrixGsea(rank, gmt)
rank |
A two-column data frame of gene and score. |
gmt |
A three-column data frame of geneSet, description, and gene. |
A matrix used as input to swGsea.
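Going by the input_df layout described for swGsea (item, phenotype score, then one likelihood column per set), a hypothetical base-R version for the standard-GSEA case fills each set column with 1 for members and 0 otherwise. build_gsea_input is illustrative only; note that prepareInputMatrixGsea() is documented to return a matrix, whereas this sketch returns a data frame.

```r
# Build an swGsea-style table: gene, score, then a 0/1 column per gene set
build_gsea_input <- function(rank, gmt) {
  sets <- unique(gmt$geneSet)
  membership <- vapply(sets, function(gs) {
    as.numeric(rank[[1]] %in% gmt$gene[gmt$geneSet == gs])
  }, numeric(nrow(rank)))
  # re-wrap so a single-gene rank still yields a one-row matrix
  membership <- matrix(membership, nrow = nrow(rank), dimnames = list(NULL, sets))
  data.frame(gene = rank[[1]], score = rank[[2]], membership, check.names = FALSE)
}

rank <- data.frame(gene = c("A", "B", "C"), score = c(2, 1, -1), stringsAsFactors = FALSE)
gmt <- data.frame(geneSet = c("S1", "S1", "S2"), description = c("d1", "d1", "d2"),
                  gene = c("A", "C", "B"), stringsAsFactors = FALSE)
input_df <- build_gsea_input(rank, gmt)
input_df$S1   # 1 0 1
```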
Read GMT File
readGmt(gmtFile, cache = NULL)
gmtFile |
The file path or URL of the GMT file. |
cache |
A directory to save the data cache for reuse. Defaults to NULL. |
A data frame with columns of "geneSet", "description", "gene".
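readGmt() returns a long data frame with "geneSet", "description", and "gene" columns. A minimal base-R sketch for a local file, assuming the usual GMT layout (one set per line: name, description, then member IDs, all tab-separated); read_gmt_local is hypothetical and, unlike readGmt(), does not handle URLs or caching.

```r
# Parse a local GMT file into a long data frame (geneSet, description, gene)
read_gmt_local <- function(path) {
  lines <- readLines(path)
  fields <- strsplit(lines[nzchar(lines)], "\t", fixed = TRUE)
  fields <- fields[lengths(fields) > 2]        # skip sets without member IDs
  rows <- lapply(fields, function(f) {
    data.frame(geneSet = f[1], description = f[2], gene = f[-(1:2)],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

tmp <- tempfile(fileext = ".gmt")
writeLines(c("S1\tfirst set\tA\tB", "S2\tsecond set\tC"), tmp)
gmt <- read_gmt_local(tmp)
```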
Performs site-weighted gene set enrichment analysis, or standard GSEA when the likelihood/weight columns in input_df are 1 or 0, p=1, q=1, and thresh_type="val".
swGsea( input_df, thresh_type = "percentile", thresh = 0.9, thresh_action = "exclude", min_set_size = 10, max_set_size = 500, max_score = "max", min_score = "min", psuedocount = 0.001, perms = 1000, p = 1, q = 1, nThreads = 1, rng_seed = 1, fork = FALSE )
input_df |
A data frame in which first column is name of item of interest (gene, protein, phosphosite, etc.), the second is the correlation of that item of interest with the phenotype (typically log ratio of expression for phenotype vs. normal), and the remaining columns are the scores for the likelihood that the item belongs in each set (one column per set). |
thresh_type |
The type of |
thresh |
Depends on |
thresh_action |
Either "include", "exclude" (default), or "adjust"; this specifies how to treat each set if it does not contain a minimum number of items or contains all of the items. This option cannot be used with predefined lists of items in sets (if the number of items in a given set does not meet requirements, that set will be skipped). |
min_set_size, max_set_size |
The minimum/maximum number of items each set needs for the analysis to proceed. |
max_score, min_score |
An optional numeric vector of minimum/maximum boundaries to clip scores for each set. |
psuedocount |
Pseudocount (pc) is used for rescaling set scores:
|
perms |
The number of permutations. |
p |
The exponential scaling factor of the phenotype score (second column in
|
q |
The exponential scaling factor of the likelihood score (weights). |
nThreads |
The number of threads to use in calculating permutations. |
rng_seed |
Random seed. |
fork |
A boolean. Whether to pass "fork" to |
The formula for weighting is as follows
where r is the log ratio score, s is the likelihood score, and j is the index of the gene.
A list of Enrichment_Results, Items_in_Set, and Running_Sums.
A data frame with row names of gene set and columns of "ES", "NES", "p_val", "fdr".
A list of one-column data frames. Describes genes and their ranks in each set.
Running sum scores along genes sorted by ranked scores, with gene sets as columns.
Eric Jaehnig
Main function for enrichment analysis
WebGestaltR( enrichMethod = "ORA", organism = "hsapiens", enrichDatabase = NULL, enrichDatabaseFile = NULL, enrichDatabaseType = NULL, enrichDatabaseDescriptionFile = NULL, interestGeneFile = NULL, interestGene = NULL, interestGeneType = NULL, interestGeneNames = NULL, collapseMethod = "mean", referenceGeneFile = NULL, referenceGene = NULL, referenceGeneType = NULL, referenceSet = NULL, minNum = 10, maxNum = 500, sigMethod = "fdr", fdrMethod = "BH", fdrThr = 0.05, topThr = 10, reportNum = 20, perNum = 1000, gseaP = 1, isOutput = TRUE, outputDirectory = getwd(), projectName = NULL, dagColor = "continuous", saveRawGseaResult = FALSE, gseaPlotFormat = c("png", "svg"), setCoverNum = 10, networkConstructionMethod = NULL, neighborNum = 10, highlightType = "Seeds", highlightSeedNum = 10, nThreads = 1, cache = NULL, hostName = "https://www.webgestalt.org/", useWeightedSetCover = FALSE, useAffinityPropagation = FALSE, usekMedoid = TRUE, kMedoid_k = 25, listName = NULL, ... )
WebGestaltRBatch( interestGeneFolder = NULL, enrichMethod = "ORA", isParallel = FALSE, nThreads = 3, ... )
enrichMethod |
Enrichment methods: |
organism |
Currently, WebGestaltR supports 12 organisms. Users can use the function
|
enrichDatabase |
The functional categories for the enrichment analysis. Users can use
the function |
enrichDatabaseFile |
Users can provide one or more GMT files as the functional
category for enrichment analysis. The extension of the file should be |
enrichDatabaseType |
The ID type of the genes in the |
enrichDatabaseDescriptionFile |
Users can also provide description files for the custom
|
interestGeneFile |
If |
interestGene |
Users can also use an R object as the input. If |
interestGeneType |
The ID type of the interesting gene list. The supported ID types of
WebGestaltR for the selected organism can be found by the function |
interestGeneNames |
The names of the id lists for multiomics data. |
collapseMethod |
The method to collapse duplicate IDs with scores. |
referenceGeneFile |
For the ORA method, users need to upload the reference gene
list. The extension of the |
referenceGene |
For the ORA method, users can also use an R object as the reference
gene list. |
referenceGeneType |
The ID type of the reference gene list. The supported ID types
of WebGestaltR for the selected organism can be found by the function |
referenceSet |
Users can directly select the reference set from existing platforms in
WebGestaltR and do not need to provide the reference set through |
minNum |
WebGestaltR will exclude the categories with the number of annotated genes
less than |
maxNum |
WebGestaltR will exclude the categories with the number of annotated genes
larger than |
sigMethod |
Two methods of significance are available in WebGestaltR: |
fdrMethod |
For the ORA method, WebGestaltR supports five FDR methods: |
fdrThr |
The significance threshold for the |
topThr |
The threshold for the |
reportNum |
The number of enriched categories visualized in the final report. The default
is 20. |
perNum |
The number of permutations for the GSEA method. The default is 1000. |
gseaP |
The exponential scaling factor of the phenotype score. The default is 1. |
isOutput |
If |
outputDirectory |
The output directory for the results. |
projectName |
The name of the project. If |
dagColor |
If |
saveRawGseaResult |
Whether the raw result from GSEA is saved as an RDS file, which can be
used for plotting. Defaults to FALSE. |
gseaPlotFormat |
The graphic format of GSEA enrichment plots. Either |
setCoverNum |
The number of expected gene sets after set cover to reduce redundancy.
Fewer sets may be returned if the coverage reaches 100%. The default is 10. |
networkConstructionMethod |
Network construction method for NTA. Either
|
neighborNum |
The number of neighbors to include in NTA Network Expansion method. |
highlightType |
The type of nodes to highlight in the NTA Network Expansion method,
either |
highlightSeedNum |
The number of top input seeds to highlight in the NTA Network Retrieval & Prioritization method. |
nThreads |
The number of cores to use for GSEA, set cover, and the batch function. |
cache |
A directory to save data cache for reuse. Defaults to |
hostName |
The server URL for accessing data. Mostly for development purposes. |
useWeightedSetCover |
Use weighted set cover for ORA. Defaults to FALSE. |
useAffinityPropagation |
Use affinity propagation for ORA. Defaults to FALSE. |
usekMedoid |
Use k-medoid for ORA. Defaults to TRUE. |
kMedoid_k |
The number of clusters for k-medoid. Defaults to 25. |
listName |
(optional) The name of the analyte list, used as the HTML title of the report. Defaults to NULL. |
... |
In batch function, passes parameters to WebGestaltR function. Also handles backward compatibility for some parameters in old versions. |
interestGeneFolder |
Run WebGestaltR for gene list files in the folder. |
isParallel |
Whether jobs in the batch are run in parallel. |
The WebGestaltR function can perform three enrichment analyses: ORA (Over-Representation Analysis), GSEA (Gene Set Enrichment Analysis), and NTA (Network Topology Analysis). Based on the user-uploaded gene list or gene list with scores, WebGestaltR first maps the gene list to Entrez gene IDs and then summarizes it based on GO (Gene Ontology) Slim. After performing the enrichment analysis, WebGestaltR also returns a user-friendly HTML report containing the GO Slim summary and the enrichment analysis result. If the functional categories have a DAG (directed acyclic graph) structure, or the genes in the functional categories have a network structure, those relationships can also be visualized in the report.
The WebGestaltR function returns a data frame containing the enrichment analysis result and also outputs a user-friendly HTML report if isOutput is TRUE.
The columns in the data frame depend on the enrichMethod and are the following:
ID of the gene set.
Description of the gene set if available.
Link to the data source.
The number of genes in the set after filtering by minNum and maxNum.
The number of mapped input genes that are annotated in the gene set.
Expected number of input genes that are annotated in the gene set.
Enrichment ratio, overlap / expect.
Enrichment score, the maximum running sum of scores for the ranked list.
Normalized enrichment score, normalized against the average enrichment score of all permutations.
Number of genes/phosphosites in the leading edge.
P-value from hypergeometric test for ORA. For GSEA, please refer to its original publication or online at https://software.broadinstitute.org/gsea/doc/GSEAUserGuideTEXT.htm.
Corrected P-value for multiple testing with fdrMethod for ORA.
The gene/phosphosite IDs of overlap for ORA (Entrez gene IDs or phosphosite sequence).
Genes/phosphosites in the leading edge in entrez gene ID or phosphosite sequence.
The gene/phosphosite IDs of overlap for ORA or leadingEdgeId for GSEA in user input IDs.
Path of the GSEA enrichment plot.
Name of the source database if multiple enrichment databases are given.
In NTA, like geneSet, the enriched GO terms of genes in the returned subnetwork.
In NTA, the gene IDs in the subnetwork with 0/1 annotations indicating if it is from user input.
The WebGestaltRBatch function returns a list of enrichment results.
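For ORA, several of the columns above are linked by simple arithmetic. A sketch, assuming expect = size × (number of input genes) / (number of reference genes) and a one-sided hypergeometric p-value P(X ≥ overlap), followed by Benjamini-Hochberg adjustment (the default fdrMethod = "BH") and the default fdrThr = 0.05 cut-off. ora_sketch and its column names are illustrative, not the package's internals.

```r
# Per-set ORA statistics from counts (sketch)
ora_sketch <- function(overlap, size, nInput, nRef) {
  expect <- size * nInput / nRef                     # expected overlap
  pValue <- phyper(overlap - 1, size, nRef - size, nInput,
                   lower.tail = FALSE)                # P(X >= overlap)
  data.frame(overlap, expect, enrichmentRatio = overlap / expect, pValue)
}

res <- ora_sketch(overlap = c(5, 2, 1), size = c(10, 10, 40), nInput = 20, nRef = 100)
res$FDR <- p.adjust(res$pValue, method = "BH")       # multiple-testing correction
significant <- res[res$FDR < 0.05, ]
```

In the first row, 10 annotated genes × 20 input genes / 100 reference genes gives expect = 2, so an overlap of 5 is an enrichment ratio of 2.5.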
Maintainer: John Elizarraras
Authors:
Jing Wang
Yuxing Liao
Other contributors:
Eric Jaehnig [contributor]
Zhiao Shi [contributor]
Quanhu Sheng [contributor]
## Not run:
library(WebGestaltR)

####### ORA example #########
geneFile <- system.file("extdata", "interestingGenes.txt", package = "WebGestaltR")
refFile <- system.file("extdata", "referenceGenes.txt", package = "WebGestaltR")
outputDirectory <- getwd()
enrichResult <- WebGestaltR(
  enrichMethod = "ORA", organism = "hsapiens", enrichDatabase = "pathway_KEGG",
  interestGeneFile = geneFile, interestGeneType = "genesymbol",
  referenceGeneFile = refFile, referenceGeneType = "genesymbol",
  isOutput = TRUE, outputDirectory = outputDirectory, projectName = NULL
)

####### GSEA example #########
rankFile <- system.file("extdata", "GeneRankList.rnk", package = "WebGestaltR")
outputDirectory <- getwd()
enrichResult <- WebGestaltR(
  enrichMethod = "GSEA", organism = "hsapiens", enrichDatabase = "pathway_KEGG",
  interestGeneFile = rankFile, interestGeneType = "genesymbol",
  sigMethod = "top", topThr = 10, minNum = 5, outputDirectory = outputDirectory
)

####### NTA example #########
enrichResult <- WebGestaltR(
  enrichMethod = "NTA", organism = "hsapiens", enrichDatabase = "network_PPI_BIOGRID",
  interestGeneFile = geneFile, interestGeneType = "genesymbol",
  sigMethod = "top", topThr = 10, outputDirectory = getwd(),
  highlightSeedNum = 10, networkConstructionMethod = "Network_Retrieval_Prioritization"
)
## End(Not run)
Perform multi-omics analysis using WebGestaltR
WebGestaltRMultiOmics( analyteLists = NULL, analyteListFiles = NULL, analyteTypes = NULL, enrichMethod = "ORA", organism = "hsapiens", enrichDatabase = NULL, enrichDatabaseFile = NULL, enrichDatabaseType = NULL, enrichDatabaseDescriptionFile = NULL, collapseMethod = "mean", minNum = 10, maxNum = 500, fdrMethod = "BH", sigMethod = "fdr", fdrThr = 0.05, topThr = 10, reportNum = 100, setCoverNum = 10, perNum = 1000, gseaP = 1, isOutput = TRUE, outputDirectory = getwd(), projectName = NULL, dagColor = "binary", saveRawGseaResult = FALSE, gseaPlotFormat = "png", nThreads = 1, cache = NULL, hostName = "https://www.webgestalt.org/", useWeightedSetCover = TRUE, useAffinityPropagation = FALSE, usekMedoid = FALSE, kMedoid_k = 25, isMetaAnalysis = TRUE, mergeMethod = "mean", normalizationMethod = "rank", referenceLists = NULL, referenceListFiles = NULL, referenceTypes = NULL, referenceSets = NULL, listNames = NULL )
analyteLists |
|
analyteListFiles |
If |
analyteTypes |
A vector containing the ID types of the analyte lists. |
enrichMethod |
Enrichment methods: |
organism |
Currently, WebGestaltR supports 12 organisms. Users can use the function
|
enrichDatabase |
The functional categories for the enrichment analysis. Users can use
the function |
enrichDatabaseFile |
Users can provide one or more GMT files as the functional
category for enrichment analysis. The extension of the file should be |
enrichDatabaseType |
The ID type of the genes in the |
enrichDatabaseDescriptionFile |
Users can also provide description files for the custom
|
collapseMethod |
The method to collapse duplicate IDs with scores. |
minNum |
WebGestaltR will exclude the categories with the number of annotated genes
less than |
maxNum |
WebGestaltR will exclude the categories with the number of annotated genes
larger than |
fdrMethod |
For the ORA method, WebGestaltR supports five FDR methods: |
sigMethod |
Two methods of significance are available in WebGestaltR: |
fdrThr |
The significance threshold for the |
topThr |
The threshold for the |
reportNum |
The number of enriched categories visualized in the final report. The default
is 100. |
setCoverNum |
The number of expected gene sets after set cover to reduce redundancy.
Fewer sets may be returned if the coverage reaches 100%. The default is 10. |
perNum |
The number of permutations for the GSEA method. The default is 1000. |
gseaP |
The exponential scaling factor of the phenotype score. The default is 1. |
isOutput |
If |
outputDirectory |
The output directory for the results. |
projectName |
The name of the project. If |
dagColor |
If |
saveRawGseaResult |
Whether the raw result from GSEA is saved as an RDS file, which can be
used for plotting. Defaults to FALSE. |
gseaPlotFormat |
The graphic format of GSEA enrichment plots. Either |
nThreads |
The number of cores to use for GSEA, set cover, and the batch function. |
cache |
A directory to save the data cache for reuse. Defaults to NULL. |
hostName |
The server URL for accessing data. Mostly for development purposes. |
useWeightedSetCover |
Use weighted set cover for ORA. Defaults to TRUE. |
useAffinityPropagation |
Use affinity propagation for ORA. Defaults to FALSE. |
usekMedoid |
Use k-medoid for ORA. Defaults to FALSE. |
kMedoid_k |
The number of clusters for k-medoid. Defaults to 25. |
isMetaAnalysis |
Whether to perform meta-analysis. Defaults to TRUE. |
mergeMethod |
The method to merge the results from multiple omics (options: |
normalizationMethod |
The method to normalize the results from multiple omics (options: |
referenceLists |
For the ORA method, users can also use an R object as the reference
gene list. |
referenceListFiles |
For the ORA method, users need to upload the reference gene
list. The extension of the |
referenceTypes |
Vector of the ID types of the reference lists. The supported ID types
of WebGestaltR for the selected organism can be found by the function |
referenceSets |
Users can directly select the reference sets from existing platforms in
WebGestaltR and do not need to provide the reference set through |
listNames |
The names of the analyte lists. |
Multi-omics GSEA
WebGestaltRMultiOmicsGSEA( analyteLists = NULL, analyteListFiles = NULL, analyteTypes = NULL, enrichMethod = "GSEA", organism = "hsapiens", enrichDatabase = NULL, enrichDatabaseFile = NULL, enrichDatabaseType = NULL, enrichDatabaseDescriptionFile = NULL, collapseMethod = "mean", minNum = 10, maxNum = 500, fdrMethod = "BH", sigMethod = "fdr", fdrThr = 0.05, topThr = 10, reportNum = 100, setCoverNum = 10, perNum = 1000, gseaP = 1, isOutput = TRUE, outputDirectory = getwd(), projectName = NULL, dagColor = "binary", saveRawGseaResult = FALSE, gseaPlotFormat = "png", nThreads = 1, cache = NULL, hostName = "https://www.webgestalt.org/", useWeightedSetCover = TRUE, useAffinityPropagation = FALSE, usekMedoid = FALSE, kMedoid_k = 25, isMetaAnalysis = TRUE, mergeMethod = "mean", normalizationMethod = "rank", listNames = NULL )
analyteLists |
|
analyteListFiles |
If |
analyteTypes |
A vector containing the ID types of the analyte lists. |
enrichMethod |
Enrichment methods: |
organism |
Currently, WebGestaltR supports 12 organisms. Users can use the function
|
enrichDatabase |
The functional categories for the enrichment analysis. Users can use
the function |
enrichDatabaseFile |
Users can provide one or more GMT files as the functional
category for enrichment analysis. The extension of the file should be |
enrichDatabaseType |
The ID type of the genes in the |
enrichDatabaseDescriptionFile |
Users can also provide description files for the custom
|
collapseMethod |
The method to collapse duplicate IDs with scores. |
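To make "collapsing duplicate IDs" concrete, the base R sketch below averages the scores of repeated IDs in a ranked list, which is what the default collapseMethod = "mean" in the usage above suggests. It assumes the input is a data frame with columns named id and score (hypothetical names), and the helper collapseDuplicates is hypothetical rather than part of WebGestaltR; the package's internal handling may differ.

```r
# Hypothetical helper (not part of WebGestaltR): collapse duplicate IDs in a
# ranked list by summarising their scores, then re-sort by score.
collapseDuplicates <- function(df, fun = mean) {
  out <- aggregate(score ~ id, data = df, FUN = fun)
  out <- out[order(out$score, decreasing = TRUE), , drop = FALSE]
  rownames(out) <- NULL
  out
}

ranked <- data.frame(
  id = c("TP53", "EGFR", "TP53", "MYC"),
  score = c(2.0, -1.5, 1.0, 0.5),
  stringsAsFactors = FALSE
)
collapsed <- collapseDuplicates(ranked)   # TP53 becomes mean(2, 1)
print(collapsed)
collapseDuplicates(ranked, fun = max)     # keep the largest score instead
```

Passing a different summary function (for example max) shows how another collapsing rule would change the ranking.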
minNum |
WebGestaltR will exclude the categories with fewer than minNum annotated
genes. The default is 10. |
maxNum |
WebGestaltR will exclude the categories with more than maxNum annotated
genes. The default is 500. |
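The minNum/maxNum rule amounts to keeping only categories whose size falls between the two bounds. The sketch below applies it to a toy list of gene sets using the defaults from the usage above (10 and 500). The helper filterBySize is hypothetical; it counts set members as given, whereas WebGestaltR may count annotated genes only after matching them against the organism or reference IDs.

```r
# Hypothetical helper (not part of WebGestaltR): keep categories whose
# member count lies in [minNum, maxNum], inclusive at both ends.
filterBySize <- function(geneSets, minNum = 10, maxNum = 500) {
  sizes <- lengths(geneSets)
  geneSets[sizes >= minNum & sizes <= maxNum]
}

geneSets <- list(
  tiny   = paste0("g", 1:5),    # below minNum, dropped
  medium = paste0("g", 1:50),   # kept
  huge   = paste0("g", 1:800)   # above maxNum, dropped
)
kept <- filterBySize(geneSets)
print(names(kept))
```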
fdrMethod |
For the ORA method, WebGestaltR supports five FDR methods: |
sigMethod |
Two methods of significance are available in WebGestaltR: |
fdrThr |
The significance threshold for the fdr method. The default is 0.05. |
topThr |
The threshold for the top method. The default is 10. |
reportNum |
The number of enriched categories visualized in the final report. The default
is 100. |
setCoverNum |
The number of expected gene sets after set cover to reduce redundancy.
Fewer sets may be returned if the coverage reaches 100% first. The default is 10. |
perNum |
The number of permutations for the GSEA method. The default is 1000. |
gseaP |
The exponential scaling factor of the phenotype score. The default is 1. |
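To show where an exponent on the phenotype score enters, here is a minimal base R sketch of the weighted running-sum enrichment score from the original GSEA method (Subramanian et al., 2005), assuming gseaP plays the role of the exponent p applied to each gene's score. The helper enrichmentScore is hypothetical, not WebGestaltR code, and it leaves out permutation testing (perNum) and score normalization.

```r
# Hypothetical helper (not WebGestaltR API): weighted running-sum enrichment
# score. 'scores' is a named numeric vector; 'p' is the exponent (cf. gseaP).
enrichmentScore <- function(scores, geneSet, p = 1) {
  scores <- sort(scores, decreasing = TRUE)
  hits <- names(scores) %in% geneSet
  nHit <- sum(hits)
  nMiss <- length(scores) - nHit
  if (nHit == 0 || nMiss == 0) {
    stop("the gene set must partially overlap the ranked list")
  }
  weights <- abs(scores)^p                          # p = 0 gives equal weights
  step <- numeric(length(scores))
  step[hits] <- weights[hits] / sum(weights[hits])  # hits step up
  step[!hits] <- -1 / nMiss                         # misses step down
  runningSum <- cumsum(step)
  unname(runningSum[which.max(abs(runningSum))])    # maximum deviation from 0
}

scores <- c(A = 3, B = 2, C = 1, D = -1, E = -2)
enrichmentScore(scores, c("A", "D"), p = 1)  # 0.75: hits weighted by |score|
enrichmentScore(scores, c("A", "D"), p = 0)  # 0.5: unweighted, KS-like
```

With p = 1 the strongly scored gene A dominates the running sum; with p = 0 every hit contributes equally, which is why the two scores differ.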
isOutput |
If |
outputDirectory |
The output directory for the results. |
projectName |
The name of the project. If |
dagColor |
If |
saveRawGseaResult |
Whether the raw result from GSEA is saved as an RDS file, which can be
used for plotting. Defaults to FALSE. |
gseaPlotFormat |
The graphic format of GSEA enrichment plots. Either |
nThreads |
The number of cores to use for GSEA, set cover, and batch functions. |
cache |
A directory to save data cache for reuse. Defaults to NULL. |
hostName |
The server URL for accessing data. Mostly for development purposes. |
useWeightedSetCover |
Use weighted set cover for ORA. Defaults to TRUE. |
useAffinityPropagation |
Use affinity propagation for ORA. Defaults to FALSE. |
usekMedoid |
Use k-medoid for ORA. Defaults to FALSE. |
kMedoid_k |
The number of clusters for k-medoid. Defaults to 25. |
isMetaAnalysis |
Whether to perform meta-analysis. Defaults to TRUE. |
mergeMethod |
The method to merge the results from multiple omics (options: |
normalizationMethod |
The method to normalize the results from multiple omics (options: |
listNames |
The names of the analyte lists. |
Multi-omics ORA
WebGestaltRMultiOmicsOra( analyteLists = NULL, analyteListFiles = NULL, analyteTypes = NULL, enrichMethod = "ORA", organism = "hsapiens", enrichDatabase = NULL, enrichDatabaseFile = NULL, enrichDatabaseType = NULL, enrichDatabaseDescriptionFile = NULL, collapseMethod = "mean", minNum = 10, maxNum = 500, fdrMethod = "BH", sigMethod = "fdr", fdrThr = 0.05, topThr = 10, reportNum = 100, setCoverNum = 10, perNum = 1000, gseaP = 1, isOutput = TRUE, outputDirectory = getwd(), projectName = NULL, dagColor = "binary", nThreads = 1, cache = NULL, hostName = "https://www.webgestalt.org/", useWeightedSetCover = TRUE, useAffinityPropagation = FALSE, usekMedoid = FALSE, kMedoid_k = 25, referenceLists = NULL, referenceListFiles = NULL, referenceTypes = NULL, referenceSets = NULL, listNames = NULL )
analyteLists |
|
analyteListFiles |
If |
analyteTypes |
A vector containing the ID types of the analyte lists. |
enrichMethod |
Enrichment methods: |
organism |
Currently, WebGestaltR supports 12 organisms. Users can use the function
|
enrichDatabase |
The functional categories for the enrichment analysis. Users can use
the function |
enrichDatabaseFile |
Users can provide one or more GMT files as the functional
category for enrichment analysis. The extension of the file should be |
enrichDatabaseType |
The ID type of the genes in the |
enrichDatabaseDescriptionFile |
Users can also provide description files for the custom
|
collapseMethod |
The method to collapse duplicate IDs with scores. |
minNum |
WebGestaltR will exclude the categories with fewer than minNum annotated
genes. The default is 10. |
maxNum |
WebGestaltR will exclude the categories with more than maxNum annotated
genes. The default is 500. |
fdrMethod |
For the ORA method, WebGestaltR supports five FDR methods: |
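Because the list of FDR methods is cut off above, the following is only a sketch: it assumes that fdrMethod values such as "BH" (the default in the usage above) correspond to the method names accepted by base R's p.adjust(), and applies that function to a few made-up ORA p-values.

```r
# Made-up raw p-values for four categories (illustrative data only)
pValues <- c(setA = 0.001, setB = 0.01, setC = 0.03, setD = 0.2)

# Benjamini-Hochberg adjustment, assumed to match fdrMethod = "BH"
fdr <- p.adjust(pValues, method = "BH")
print(round(fdr, 3))
```

Benjamini-Hochberg multiplies each sorted p-value by n/rank and then enforces monotonicity, so here the adjusted values are 0.004, 0.02, 0.04 and 0.2.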
sigMethod |
Two methods of significance are available in WebGestaltR: |
fdrThr |
The significance threshold for the fdr method. The default is 0.05. |
topThr |
The threshold for the top method. The default is 10. |
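The two significance rules can be illustrated with a small hypothetical helper, selectSignificant, assuming sigMethod takes the values "fdr" and "top" (as the default sigMethod = "fdr" in the usage above suggests), that fdrThr applies to the former and topThr to the latter. Whether WebGestaltR compares against fdrThr strictly or non-strictly is an assumption here.

```r
# Hypothetical helper (not WebGestaltR API): select categories either by an
# FDR cutoff ("fdr") or by taking the topThr smallest FDRs ("top").
selectSignificant <- function(fdr, sigMethod = c("fdr", "top"),
                              fdrThr = 0.05, topThr = 10) {
  sigMethod <- match.arg(sigMethod)
  if (sigMethod == "fdr") {
    names(fdr)[fdr < fdrThr]                         # strict cutoff (assumed)
  } else {
    names(sort(fdr))[seq_len(min(topThr, length(fdr)))]
  }
}

fdr <- c(setA = 0.004, setB = 0.02, setC = 0.04, setD = 0.2)
selectSignificant(fdr)                        # every category below 0.05
selectSignificant(fdr, "top", topThr = 2)     # the two smallest FDRs
```

The "top" rule always returns up to topThr categories regardless of how large their FDRs are, which is useful when no category passes the cutoff.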
reportNum |
The number of enriched categories visualized in the final report. The default
is 100. |
setCoverNum |
The number of expected gene sets after set cover to reduce redundancy.
Fewer sets may be returned if the coverage reaches 100% first. The default is 10. |
perNum |
The number of permutations for the GSEA method. The default is 1000. |
gseaP |
The exponential scaling factor of the phenotype score. The default is 1. |
isOutput |
If |
outputDirectory |
The output directory for the results. |
projectName |
The name of the project. If |
dagColor |
If |
nThreads |
The number of cores to use for GSEA, set cover, and batch functions. |
cache |
A directory to save data cache for reuse. Defaults to NULL. |
hostName |
The server URL for accessing data. Mostly for development purposes. |
useWeightedSetCover |
Use weighted set cover for ORA. Defaults to TRUE. |
useAffinityPropagation |
Use affinity propagation for ORA. Defaults to FALSE. |
usekMedoid |
Use k-medoid for ORA. Defaults to FALSE. |
kMedoid_k |
The number of clusters for k-medoid. Defaults to 25. |
referenceLists |
For the ORA method, users can also use an R object as the reference
gene list. |
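One common way a reference list is used in ORA is to define the universe for a one-sided hypergeometric test. The base R sketch below illustrates that idea with a hypothetical helper, oraTest; it is not taken from WebGestaltR, and the package's exact filtering of IDs against the reference may differ.

```r
# Hypothetical helper (not WebGestaltR API): one-sided hypergeometric ORA
# test in which the reference list defines the universe.
oraTest <- function(interest, reference, geneSet) {
  universe <- unique(reference)
  interest <- intersect(unique(interest), universe)  # drop IDs outside universe
  geneSet <- intersect(unique(geneSet), universe)
  overlap <- length(intersect(interest, geneSet))
  # P(X >= overlap) when drawing length(interest) genes from the universe
  phyper(overlap - 1, length(geneSet), length(universe) - length(geneSet),
         length(interest), lower.tail = FALSE)
}

reference <- paste0("g", 1:100)                      # background universe
geneSet <- paste0("g", 1:10)                         # one functional category
interest <- c(paste0("g", 1:5), paste0("g", 51:55))  # 5 of 10 genes in the set
pValue <- oraTest(interest, reference, geneSet)
print(pValue)
```

Only about one overlapping gene would be expected by chance here, so an overlap of five yields a small p-value; changing the reference list changes the universe and therefore the result.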
referenceListFiles |
For the ORA method, users need to upload the reference gene
list. The extension of the |
referenceTypes |
Vector of the ID types of the reference lists. The supported ID types
of WebGestaltR for the selected organism can be found by the function |
referenceSets |
Users can directly select the reference sets from existing platforms in
WebGestaltR and do not need to provide the reference set through |
listNames |
The names of the analyte lists. |
Size constrained weighted set cover problem to find top N sets while maximizing the coverage of all elements.
weightedSetCover(idsInSet, costs, topN, nThreads = 4)
idsInSet |
A list of set names and their member IDs. |
costs |
A vector of the same length as idsInSet giving the penalty weight (cost) of each set, i.e. 1/-logP. |
topN |
The number of sets (or less when it completes early) to return. |
nThreads |
The number of processes to use. On Windows, it falls back to 1. |
A list of topSets and coverage.
A list of set IDs.
The percentage of IDs covered in the top sets.
Zhiao Shi, Yuxing Liao
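The size-constrained weighted set cover described above can be sketched with the classic greedy heuristic: at each step, pick the set that covers the most still-uncovered IDs per unit cost, and stop after topN sets or full coverage. The helper greedyWeightedSetCover below is hypothetical and is not WebGestaltR's implementation; it ignores nThreads and reports coverage as a fraction between 0 and 1 rather than a percentage.

```r
# Hypothetical sketch (not WebGestaltR code): greedy size-constrained
# weighted set cover returning the selected sets and the covered fraction.
greedyWeightedSetCover <- function(idsInSet, costs, topN) {
  stopifnot(length(idsInSet) == length(costs))
  names(costs) <- names(idsInSet)
  universe <- unique(unlist(idsInSet))
  covered <- character(0)
  topSets <- character(0)
  remaining <- names(idsInSet)
  while (length(topSets) < topN && length(covered) < length(universe) &&
         length(remaining) > 0) {
    # benefit per unit cost: newly covered IDs divided by the set's cost
    gain <- vapply(remaining, function(s) {
      length(setdiff(idsInSet[[s]], covered)) / costs[[s]]
    }, numeric(1))
    if (max(gain) <= 0) break                 # nothing new can be covered
    best <- remaining[which.max(gain)]
    topSets <- c(topSets, best)
    covered <- union(covered, idsInSet[[best]])
    remaining <- setdiff(remaining, best)
  }
  list(topSets = topSets, coverage = length(covered) / length(universe))
}

idsInSet <- list(A = c("a", "b", "c", "d"), B = c("a", "b"),
                 C = c("e"), D = c("d", "e"))
costs <- 1 / c(A = 5, B = 4, C = 2, D = 3)    # i.e. 1/-logP with toy -logP values
res <- greedyWeightedSetCover(idsInSet, costs, topN = 3)
print(res)
```

In this toy case A is chosen first, then D covers the last ID, so the loop stops after two sets even though topN is 3, matching the note that fewer sets may be returned once coverage is complete.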