Finding Data on the Internet
What I would like is a nice list of all the credible sources on the Internet for finding data to use with R projects. I know that this is a crazy idea, not well formulated (what are data, after all?) and loaded with absurd computational and theoretical challenges. (Why can't I just google "data R" and get what I want?) So, what can I do? As many other people are doing, I can begin to make lists (in many cases lists of lists) on a platform that is stable enough to survive and grow, and perhaps encourage others to help with the effort.
Here follows a list of data sources that may easily be imported into R.
If an (R) appears after a source, the data are already in R format or there exist R commands for importing the data directly into R. (See http://www.quantmod.com/examples/intro/ for some code.) Otherwise, I have limited the list to data sources for which there is a reasonably simple process for importing csv files. The list is organized into categories that are not mutually exclusive but which reflect what's out there.
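As a minimal sketch of the csv route mentioned above, base R's read.csv() is usually enough. The file and its values below are made up so that the example runs offline; the same call also accepts a URL in place of the local path.

```r
# Illustrative data only: these values are invented to keep the example offline.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("year,value", "2009,1.5", "2010,2.0", "2011,2.5"), csv_path)

# read.csv() takes a local path or a URL string in the same argument,
# so a downloaded file and a remote file are read the same way.
df <- read.csv(csv_path, stringsAsFactors = FALSE)

str(df)          # inspect the column types
mean(df$value)   # quick sanity check on the numeric column
```

For a source in the list below, the path would be replaced by the address of the csv file the site offers for download.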
Economics
UMD: http://inforumweb.umd.edu/econdata/econdata.html
World Bank: http://data.worldbank.org/indicator
Finance
CBOE Futures Exchange: http://cfe.cboe.com/Data/
Google Finance: http://www.google.com/finance (R)
Google Trends: http://www.google.com/trends?q=google&ctab=0&geo=all&date=all&sort=0
St Louis Fed: http://research.stlouisfed.org/fred2/ (R)
NASDAQ: https://data.nasdaq.com/
OANDA: http://www.oanda.com/ (R)
Yahoo Finance: http://finance.yahoo.com/ (R)
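For the finance sources marked (R), the quantmod package referenced in the introduction is one way in. The following is a sketch only: it assumes quantmod is installed and a network connection is available, and the helper name fetch_fred and the series ID in the commented call are illustrative choices, not part of the original list.

```r
# Assumes the quantmod package is installed; it is used only if available.
fetch_fred <- function(series_id) {
  if (!requireNamespace("quantmod", quietly = TRUE)) {
    stop("install.packages('quantmod') is needed for this sketch")
  }
  # auto.assign = FALSE returns the series instead of creating a variable.
  quantmod::getSymbols(series_id, src = "FRED", auto.assign = FALSE)
}

# Example call (requires a network connection), left commented out:
# unrate <- fetch_fred("UNRATE")   # "UNRATE" is an illustrative series ID
# head(unrate)
```

Changing the src argument of getSymbols() selects a different provider, subject to whatever that provider currently allows.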
Government
Archived national government statistics: http://www.archive-it.org/
Australia: http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/3301.02009?OpenDocument
Canada: http://www.data.gc.ca/default.asp?lang=En&n=5BCD274E-1
DataMarket: http://datamarket.com/
Fed Stats: http://www.fedstats.gov/cgi-bin/A2Z.cgi
Guardian world governments: http://www.guardian.co.uk/world-government-data
London, U.K. data: http://data.london.gov.uk/catalogue
New Zealand: http://www.stats.govt.nz/tools_and_services/tools/TableBuilder/tables-by...
NYC data: http://nycplatform.socrata.com/
OECD: http://www.oecd.org/document/0,3746,en_2649_201185_46462759_1_1_1_1,00.html
San Francisco Data sets: http://datasf.org/
U.K. Government Data: http://data.gov.uk/data
United Nations: http://data.un.org/
U.S. Federal Government Agencies: http://www.data.gov/metric
US CDC Public Health datasets: http://www.cdc.gov/nchs/data_access/ftp_data.htm
The World Bank: http://wdronline.worldbank.org/
UK 2011 Census Open Atlas Project: http://www.alex-singleton.com/2011-census-open-atlas-project/
Machine Learning
Causality Workbench: http://www.causality.inf.ethz.ch/repository.php
Kaggle competition data: http://www.kaggle.com/
KDNuggets competition site: http://www.kdnuggets.com/datasets/
UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/
Machine Learning Data Set Repository: http://mldata.org/
Microsoft Research: http://research.microsoft.com/apps/dp/dl/downloads.aspx
Million songs: http://blog.echonest.com/post/3639160982/million-song-dataset
Social Networking: http://www.cs.cmu.edu/~jelsas/data/ancestry.com/
The Koblenz Network Collection: http://konect.uni-koblenz.de/
53.5 billion clicks: http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset
Public Domain Collections
Data360: http://www.data360.org/index.aspx
Datamob.org: http://datamob.org/datasets
Factual: http://www.factual.com/topics/browse
Freebase: http://www.freebase.com/
Google: http://www.google.com/publicdata/directory
infochimps: http://www.infochimps.com/
numbrary: http://numbrary.com/
Sample R data sets: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/00Index.html (R)
SourceForge Research Data: http://www.nd.edu/~oss/Data/data.html
UFO Reports: http://www.nuforc.org/webreports.html
Wikileaks 911 pager intercepts: http://911.wikileaks.org/files/index.html
Stats4Stem.org: R data sets: http://www.stats4stem.org/data-sets.html (R)
The Washington Post List: http://www.washingtonpost.com/wp-srv/metro/data/datapost.html
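The sample R data sets listed above need no import step at all, since the datasets package ships with R. A short sketch of browsing and using them follows; mtcars is picked here only as a familiar example.

```r
# The datasets package ships with R, so no download is needed.
available <- data(package = "datasets")$results[, "Item"]
head(available)   # names of some of the bundled data sets

# mtcars is one of the bundled data frames and can be used directly.
str(mtcars)
summary(mtcars$mpg)
```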
Science
Agricultural Experiments: http://www.inside-r.org/packages/cran/agridat/docs/agridat (R)
Climate data: http://www.cru.uea.ac.uk/cru/data/temperature/#datter and ftp://ftp.cmdl.noaa.gov/
Gene Expression Omnibus: http://www.ncbi.nlm.nih.gov/geo/
Geo Spatial Data: http://geodacenter.asu.edu/datalist/
Human Microbiome Project: http://www.hmpdacc.org/reference_genomes/reference_genomes.php
MIT Cancer Genomics Data: http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi
NASA: http://nssdc.gsfc.nasa.gov/nssdc/obtaining_data.html
NIH Microarray data: ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE6532/ (R)
Protein structure: http://www.infobiotic.net/PSPbenchmarks/
Public Gene Data: http://www.pubgene.org/
Stanford Microarray Data: http://smd.stanford.edu/
Social Sciences
General Social Survey: http://www3.norc.org/GSS+Website/
ICPSR: http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/index.jsp
UCLA Social Sciences Archive: http://dataarchives.ss.ucla.edu/Home.DataPortals.htm
Upjohn Institute: http://www.upjohn.org/erdc/erdc.html
Time Series
Time Series Data Library: http://robjhyndman.com/TSDL/
Universities
Carnegie Mellon University Enron email: http://www.cs.cmu.edu/~enron/
Carnegie Mellon University StatLab: http://lib.stat.cmu.edu/datasets/
Carnegie Mellon University JASA data archive: http://lib.stat.cmu.edu/jasadata/
Ohio State University Financial data: http://fisher.osu.edu/fin/osudata.htm
UC Berkeley: http://ucdata.berkeley.edu/
UCLA: http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data
UC Riverside Time Series: http://www.cs.ucr.edu/~eamonn/time_series_data/
University of Toronto: http://www.cs.toronto.edu/~delve/data/datasets.html

Comments
You could add my Time Series Data Library: http://robjhyndman.com/TSDL/
Thank you Rob!
Joe,
Thanks for this totally awesome list! The best part is that wonderful little (R). That should save me a bunch of time when preparing teaching examples.
Cheers,
Bob Muenchen
Thank you Bob. I am especially keen on adding more of those little (R)'s myself.
Joe,
Thanks for doing this!
This Ancestry.com forum archive was just released this morning:
http://www.cs.cmu.edu/~jelsas/data/ancestry.com/
looks like a great data set for both text mining and social network analysis.
-Jim
Thank you Jim.
The Ancestry set is on the list.
Joe
Hi, I also found a couple of sites for downloading exchange rate data; with a couple of adjustments they are ready to work with R.
http://ratedata.gaincapital.com/
http://www.forexite.com/free_forex_quotes/forex_history_arhiv.html
http://www.dukascopy.com/swiss/english/data_feed/csv_data_export/
For me, Dukascopy data has better quality and is very reliable.
Hey Joe,
Here is some data for large networks.
http://snap.stanford.edu/data/index.html
Cheers,
Derek
Great list!
My 2 cents: there are tons of data from different providers aggregated at Hans Rosling's website: http://www.gapminder.org/data
The files are in Excel format, which is not a problem to import into R
- Sergey
There is a sizable thread on this subject on Quora with many, many sources:
http://www.quora.com/Data/Where-can-I-get-large-datasets-open-to-the-public
There are also links there to a Reddit thread and other similar collections.
Thanks for posting. Just noticed though the link to Google Finance actually points to Yahoo Finance.