# How can I superimpose an arbitrary parametric distribution over a histogram using ggplot?

How can I superimpose an arbitrary parametric distribution over a histogram using ggplot?

I have made an attempt based on a Quick-R example, but I don't understand where the scaling factor comes from. Is this method reasonable? How can I modify it to use ggplot?

An example overplot the normal and lognormal distributions using this method follows:

```## Get a log-normalish data set: the number of characters per word in "Alice in Wonderland"
n = -1L, ok = TRUE, warn = TRUE,
encoding = "UTF-8")

alice.long <- paste(alice.raw, collapse=" ")
alice.long.noboilerplate <- strsplit(alice.long, split="\\*\\*\\*")[[1]][3]
alice.words <- strsplit(alice.long.noboilerplate, "[[:space:]]+")[[1]]
alice.nchar <- nchar(alice.words)
alice.nchar <- alice.nchar[alice.nchar > 0]

# Now we want to plot both the histogram and then log-normal probability dist
require(MASS)
h <- hist(alice.nchar, breaks=1:50, xlab="Characters in word", main="Count")
xfit <- seq(1, 50, 0.1)

# Plot a normal curve
yfit<-dnorm(xfit,mean=mean(alice.nchar),sd=sd(alice.nchar))
yfit <- yfit * diff(h\$mids[1:2]) * length(alice.nchar)
lines(xfit, yfit, col="blue", lwd=2)

# Now plot a log-normal curve
params <- fitdistr(alice.nchar, densfun="lognormal")
yfit <- dlnorm(xfit, meanlog=params\$estimate[1], sdlog=params\$estimate[1])
yfit <- yfit * diff(h\$mids[1:2]) * length(alice.nchar)
lines(xfit, yfit, col="red", lwd=2)```

This produces the following plot:

To clarify, I would like to have counts on the y-axis, rather than a density estimate.