# elbow {GMD}

The "Elbow" Method for Clustering Evaluation

Package: GMD, Version: 0.3.1.1

### Description

Determines the number of clusters in a data set using the "elbow" rule.

### Usage

```
## find a good k given thresholds of EV and its increment.
elbow(x,inc.thres,ev.thres,precision=3,print.warning=TRUE)

## a wrapper of `elbow' testing multiple thresholds.
elbow.batch(x,inc.thres=c(0.01,0.05,0.1),
ev.thres=c(0.95,0.9,0.8,0.75,0.67,0.5,0.33),precision=3)

## S3 method for class 'elbow'
plot(x,elbow.obj=NULL,main,xlab="k",
     ylab="Explained_Variance",type="b",pch=20,col.abline="red",
     lty.abline=3,if.plot.new=TRUE,print.info=TRUE,
     mar=c(4,5,3,3),omi=c(0.75,0,0,0),...)

```

### Arguments

| Argument | Description |
| --- | --- |
| `x` | a `css.multi` object, generated by `css.hclust`. |
| `inc.thres` | numeric with value(s) from 0 to 1, the threshold of the increment of EV. A single value is used in `elbow`, while a vector of values is used in `elbow.batch`. |
| `ev.thres` | numeric with value(s) from 0 to 1, the threshold of EV. A single value is used in `elbow`, while a vector of values is used in `elbow.batch`. |
| `precision` | integer, the number of digits to round to for numerical comparison. |
| `print.warning` | logical, whether to print warning messages. |
| `elbow.obj` | an `elbow` object, generated by `elbow` or `elbow.batch`. |
| `main` | an overall title for the plot. |
| `ylab` | a title for the y axis. |
| `xlab` | a title for the x axis. |
| `type` | what type of plot should be drawn. See `help("plot", package="graphics")`. |
| `pch` | either an integer specifying a symbol or a single character to be used as the default in plotting points (see `par`). |
| `col.abline` | color for straight lines through the current plot (see option `col` in `par`). |
| `lty.abline` | line type for straight lines through the current plot (see option `lty` in `par`). |
| `if.plot.new` | logical, whether to start a new plot device or not. |
| `print.info` | logical, whether to print the information of `elbow.obj`. |
| `mar` | a numerical vector of the form `c(bottom, left, top, right)` which gives the number of lines of margin to be specified on the four sides of the plot (see option `mar` in `par`). The default is `c(4, 5, 3, 3)`. |
| `omi` | a vector of the form `c(bottom, left, top, right)` giving the size of the outer margins in inches (see option `omi` in `par`). |
| `...` | arguments to be passed to method `plot.elbow`, such as graphical parameters (see `par`). |
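
As a small illustration of why rounding before a comparison can matter, the sketch below compares a made-up EV value with a threshold, with and without rounding. It uses base R only. How `elbow` applies `precision` internally is not stated here, so treat this as an assumption about the general idea rather than the package's implementation.

```r
# Made-up EV value that sits just below a threshold of 0.95
ev <- 0.9496
ev.thres <- 0.95

# Without rounding, the value does not reach the threshold
raw_pass <- ev >= ev.thres

# Rounded to 3 digits, 0.9496 becomes 0.95 and reaches the threshold
rounded_pass <- round(ev, 3) >= ev.thres

print(c(raw = raw_pass, rounded = rounded_pass))
```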

### Details

The number of clusters in a data set is determined by the "elbow" rule, using thresholds on the explained variance (EV) and on its increment.
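
One way to read a two-threshold rule of this kind is sketched below in base R. The helper `pick_k` is hypothetical and not part of GMD. It assumes that a "good" `k` is the smallest `k` whose EV reaches `ev.thres` and whose gain in EV from one more cluster stays below `inc.thres`. The EV values are made up for illustration.

```r
# Hypothetical helper, not part of GMD: return the smallest k whose EV
# reaches ev.thres and whose next increment in EV is below inc.thres.
pick_k <- function(ev, inc.thres, ev.thres, precision = 3) {
  ev <- round(ev, precision)
  # inc[k] is the gain in EV when going from k to k + 1 clusters;
  # the last k has no known increment (NA) and is never selected
  inc <- round(c(diff(ev), NA), precision)
  ok <- which(ev >= ev.thres & inc < inc.thres)
  if (length(ok) == 0) return(NA_integer_)
  ok[1]
}

# Made-up EV curve for k = 1..8
ev <- c(0, 0.45, 0.70, 0.86, 0.96, 0.97, 0.975, 0.98)

k_strict  <- pick_k(ev, inc.thres = 0.05, ev.thres = 0.95)
k_relaxed <- pick_k(ev, inc.thres = 0.20, ev.thres = 0.67)
print(c(strict = k_strict, relaxed = k_relaxed))
```

In this sketch, relaxing both thresholds selects a smaller `k`, and `NA` is returned when no `k` satisfies both conditions.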

### Value

Both `elbow` and `elbow.batch` return an `elbow` object (if a "good" `k` exists), which is a list containing the following components:

| Component | Description |
| --- | --- |
| `k` | number of clusters |
| `ev` | explained variance (EV) given `k` |
| `inc.thres` | the threshold of the increment in EV |
| `ev.thres` | the threshold of the EV |

The object also has an attribute `meta` that contains:

| Component | Description |
| --- | --- |
| `description` | a description of the "good" `k` |

### See Also

`css` and `css.hclust` for computing Clustering Sum-of-Squares.

### Examples

```
## load library
library(GMD)

## simulate data around 12 points in Euclidean space
pointv <- data.frame(x=c(1,2,2,4,4,5,5,6,7,8,9,9),
                     y=c(1,2,8,2,4,4,5,9,9,8,1,9))
set.seed(2012)
mydata <- c()
for (i in 1:nrow(pointv)){
  mydata <- rbind(mydata,cbind(rnorm(10,pointv[i,1],0.1),
                               rnorm(10,pointv[i,2],0.1)))
}
mydata <- data.frame(mydata)
colnames(mydata) <- c("x","y")
plot(mydata,type="p",pch=21, main="Simulated data")

## determine a "good" k using elbow
dist.obj <- dist(mydata[,1:2])
hclust.obj <- hclust(dist.obj)
css.obj <- css.hclust(dist.obj,hclust.obj)
elbow.obj <- elbow.batch(css.obj)
print(elbow.obj)

## make partition given the "good" k
k <- elbow.obj$k
cutree.obj <- cutree(hclust.obj,k=k)
mydata$cluster <- cutree.obj

## draw an elbow plot and label the data
dev.new(width=12, height=6)
par(mfcol=c(1,2),mar=c(4,5,3,3),omi=c(0.75,0,0,0))
plot(mydata$x,mydata$y,pch=as.character(mydata$cluster),
     col=mydata$cluster,cex=0.75,main="Clusters of simulated data")
plot(css.obj,elbow.obj,if.plot.new=FALSE)

## clustering with more relaxed thresholds (resulting in a smaller "good" k)
elbow.obj2 <- elbow.batch(css.obj,ev.thres=0.90,inc.thres=0.05)
mydata$cluster2 <- cutree(hclust.obj,k=elbow.obj2$k)

dev.new(width=12, height=6)
par(mfcol=c(1,2), mar=c(4,5,3,3),omi=c(0.75,0,0,0))
plot(mydata$x,mydata$y,pch=as.character(mydata$cluster2),
     col=mydata$cluster2,cex=0.75,main="Clusters of simulated data")
plot(css.obj,elbow.obj2,if.plot.new=FALSE)
```
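
For readers who want to see what an EV curve looks like without installing GMD, here is a minimal base R sketch on the same kind of simulated data as in the example above. It assumes EV is defined as one minus the ratio of the within-cluster sum of squares to the total sum of squares. That definition, the helper `ss`, and the range `1:20` are assumptions made for illustration and are not taken from `css.hclust`.

```r
# Simulate 10 noisy points around each of 12 centres
centres <- data.frame(x = c(1,2,2,4,4,5,5,6,7,8,9,9),
                      y = c(1,2,8,2,4,4,5,9,9,8,1,9))
set.seed(2012)
pts <- do.call(rbind, lapply(seq_len(nrow(centres)), function(i) {
  cbind(x = rnorm(10, centres$x[i], 0.1),
        y = rnorm(10, centres$y[i], 0.1))
}))

hc <- hclust(dist(pts))

# Hypothetical helper: sum of squared deviations of rows from column means
ss <- function(m) sum(scale(m, center = TRUE, scale = FALSE)^2)

# Assumed definition: EV(k) = 1 - WSS(k) / TSS
tss <- ss(pts)
ks <- 1:20
ev <- sapply(ks, function(k) {
  cl <- cutree(hc, k = k)
  groups <- split(as.data.frame(pts), cl)
  wss <- sum(sapply(groups, function(g) ss(as.matrix(g))))
  1 - wss / tss
})

# EV against k; the "elbow" is where the curve flattens
plot(ks, ev, type = "b", pch = 20, xlab = "k", ylab = "Explained_Variance")
```

Because the partitions obtained by cutting one dendrogram are nested, the within-cluster sum of squares cannot increase as `k` grows, so the curve computed this way never decreases.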

Documentation reproduced from package GMD, version 0.3.1.1. License: GPL (>= 2)