New Perspective: Latitude and Longitude Graphical Analysis of Distance and Frequency

Problem: How do you determine the best use of public funds when revenue is shrinking?

Solution: Analyze zip code data to determine three things: 1) where participants are coming from, 2) how many come from each location, and 3) how far they traveled.

There are numerous ways of getting this type of data, some more complicated than others. Zip codes were chosen because they strike a good balance between accuracy and privacy. With a full address it is possible to get down to the street and house level, but in many cases people are unwilling to give out such information, making the data difficult to collect. A zip code is nearly as accurate as an address without invading participants' privacy. Zip code data also makes for easier coding, cleaning, and faster analysis in R.
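Once collected, zip codes can be joined to coordinates with a lookup table. The sketch below uses a small made-up lookup table for illustration; in practice the coordinates would come from a zip code database such as the Census ZCTA files:

```r
# Hypothetical participant data; the zip values are for illustration only
participants <- data.frame(id  = 1:4,
                           zip = c("90210", "10001", "60614", "90210"))

# Made-up coordinate lookup; a real one would come from a zip code database
zip.lookup <- data.frame(zip  = c("90210", "10001", "60614"),
                         lat  = c(34.10, 40.75, 41.92),
                         long = c(-118.41, -73.99, -87.65))

# Attach a latitude and longitude to each participant
located <- merge(participants, zip.lookup, by = "zip")
```

Because `merge()` joins on the shared `zip` column, repeated zip codes (two participants from 90210 here) each get their own row with the same coordinates.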

This example shows how to answer these questions using the us.cities data set found in the maps package, producing three key graphs. The first graph is a bar plot of the top 5 cities.
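With real participant data, each city would typically appear more than once, so the top 5 come from sorting a frequency table rather than taking the first five rows. A minimal sketch, using made-up city names:

```r
# Made-up participant cities for illustration
cities <- c("Fresno", "Fresno", "San Jose", "Reno", "Fresno",
            "San Jose", "Phoenix", "Reno", "Tucson", "Fresno")

# Tabulate, sort by frequency, and keep the five most common
counts <- sort(table(cities), decreasing = TRUE)
top.5 <- head(counts, n = 5)

barplot(top.5, col = "blue", main = "Top 5 Cities")
```

Changing `n` adjusts how many cities are shown, and `tail()` in place of `head()` would give the least common ones instead.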

Normally there would be more than one person per city. The second graph is a frequency histogram of the distances traveled by the participants. This can be used to distinguish those who stay overnight (eating at restaurants and staying at hotels) from those making a day trip. For this data set, the vast majority would stay the night, a good thing for the community's revenue. The distance that separates overnight visitors from day-trippers varies by location, since different factors influence the decision: population density, traffic (interstate or state highways), natural barriers (mountains, lakes, rivers, etc.), and others.
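The day-trip versus overnight split can be made explicit with a distance cutoff. In the sketch below, both the distances and the 100-mile threshold are assumptions for illustration; the threshold should be tuned to the local geography as described above:

```r
# Made-up distances traveled, in miles
dist <- c(12, 45, 210, 87, 340, 150, 60, 520)

# 100 miles is an assumed cutoff, not a rule; adjust for local conditions
trip.type <- ifelse(dist > 100, "overnight", "day trip")

# Counts of each trip type
table(trip.type)
```

The resulting table gives a quick estimate of how many visitors are likely contributing hotel and restaurant revenue.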

The final graph is a map of the various locations from which participants came. Because it plots every city with a population greater than 40,000 people, lines stretch out all over the United States. This should give a flavor of what such an analysis can do.

The R code is as follows:

```r
require(geosphere)
require(maps)

# Load the data and get summary statistics
data(us.cities)
summary(us.cities)

# Set up Los Angeles, USA, as the point of origin, and all the US cities
# as destinations
LA <- c(-118.41, 34.11)
all <- matrix(data = c(us.cities$long, us.cities$lat), ncol = 2)

# Top 5 cities; in this data set every city appears exactly once. Change
# n to adjust the number, and use tail() instead of head() for the bottom.
top.5 <- barplot(table(head(us.cities$name, n = 5)), col = "blue",
                 main = "Top 5 Cities")

# Distances and histogram. distm() returns meters by default, so the
# result is converted to miles. The Vincenty ellipsoid formula accounts
# for the fact that the earth is not perfectly round.
dist <- distm(LA, all, fun = distVincentyEllipsoid) * 0.000621371192
par(las = 0)
hist(dist, main = "Histogram of the Distances: Using Vincenty Ellipsoid Function",
     col = "blue")

# Mapping it out (US)
map("world", col = "#f2f2f2", fill = TRUE, bg = "white", lwd = 0.15,
    xlim = c(-170, -65), ylim = c(15, 60))

title(main = "US Cities")

# Draw a great-circle line from Los Angeles to each city
for (i in 1:nrow(all)) {
  inter <- gcIntermediate(LA, all[i, 1:2], n = 100, addStartEnd = TRUE)
  lines(inter, col = "blue")
}

# From here the data can be zoomed in on, subset, sorted, or the line
# colors changed to designate different frequencies
```
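To color the lines by frequency, as the closing comment suggests, the counts can be binned with `cut()` and mapped to a palette. The counts and break points below are made up for illustration:

```r
# Made-up visit counts per city; the breaks are arbitrary assumptions
counts <- c(1, 3, 8, 15, 2, 40)

# Bin the counts into low / medium / high frequency
bins <- cut(counts, breaks = c(0, 2, 10, Inf),
            labels = c("low", "medium", "high"))

# Map each bin to a shade of blue
palette <- c(low = "lightblue", medium = "blue", high = "darkblue")
line.cols <- palette[as.character(bins)]

# Inside the plotting loop, this would become: lines(inter, col = line.cols[i])
```

Heavier-traveled origins then stand out in darker blue, turning the map from a simple spider plot into a rough density picture.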