Skip to Content

hca.test {swamp}

Tests for annotation differences among sample clusters
Package: 
swamp
Version: 
1.2.3

Description

The main clusters of a dendrogram are tested for different patient annotations.

Usage

hca.test(g, o, dendcut = 2, method = "correlation", link = "ward", 
         test = "chisq", workspace = 2e+07)

Arguments

g
the input data in form of a matrix with features as rows and samples as columns.
o
the corresponding sample annotations in the form of a data.frame. Sample annotation as a single vector is allowed and will be transformed to a data.frame. rownames (o) must be identical to colnames (g). o can contain factors and numeric variables. No character variables are allowed. NAs are allowed.
dendcut
the number of clusters to cut the dendrogram tree (using cutree()). default=2.
method
the distance method for the clustering. default="correlation". hcluster from the package amap is used and method must be one of "euclidean", "maximum", "manhattan", "canberra" "binary" "pearson", "correlation", "spearman" or "kendall".
link
the agglomeration principle for the clustering. default="ward". hcluster from the package amap is used and link must be one of "ward", "single", "complete", "average", "mcquitty", "median" or "centroid".
test
the test to use for the annotations that are factors. this can be either "fisher" or "chisq" to use fisher.test() or chisq.test(), respectively. default = "chisq". However fisher.test is preferable as it is an exact test. Note that fisher.test() is computationally expensive and can cause R to crash.
workspace
workspace to use if test="fisher"

Details

The function clusters the samples using amap and then cuts the dendrogram into a specified number of clusters. The obtained sample clusters are tested for differences in sample annotations. fisher.test() or chisq.test() is used if the annotation is a factor, lm(annotation~clusters) is used for numeric annotations. The p-values for the dependence of sample annotation and sample clusters are returned.

Values

a list with components

p.values
a numeric vector containing the p.values for the annotation variable.
classes
the classes of the annotation variables in o.

Note

requires the package amap

Examples

## data as a matrix
set.seed(100)
g<-matrix(nrow=1000,ncol=50,rnorm(1000*50),dimnames=list(paste("Feature",1:1000),
          paste("Sample",1:50)))
g[1:100,26:50]<-g[1:100,26:50]+1 # the first 100 features show higher values in the samples 26:50
## patient annotations as a data.frame, annotations should be numbers and factor but not characters.
## rownames have to be the same as colnames of the data matrix 
set.seed(200)
o<-data.frame(Factor1=factor(c(rep("A",25),rep("B",25))),
              Factor2=factor(rep(c("A","B"),25)),
              Numeric1=rnorm(50),row.names=colnames(g))
 
# perform the test for the main 2 clusters
res3<-hca.test(g,o,dendcut=2,test="fisher") 
        # use test="chisq" for large ncol(g) to avoid crash of R
res3$p.values

Author(s)

Martin Lauss

Documentation reproduced from package swamp, version 1.2.3. License: GPL (>= 2)