How to Learn R

RevoJoe's picture

The R programming language was designed for doing statistics. In my view, its great popularity among statisticians, people learning statistics, data miners and others is due to the way it facilitates the process of thinking about statistics. R’s syntax greatly aids in expressing statistical models; often, it is an intuitive shorthand for the mathematics. R’s interactive nature and the ability to get near-instantaneous feedback encourage experimentation and self-learning. And, once you get a feel for where the resources can be found, the commitment and creativity of the R community are a source of great encouragement.
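As a small illustration of that shorthand, a regression of the form mpg = b0 + b1*wt + b2*hp + error can be written almost verbatim as an R formula. This is only a sketch: it uses the mtcars data set that ships with R, and the choice of data set and variables is an assumption made for the example.

```r
# Fit a linear model using R's formula notation.
# mtcars is a data set bundled with base R; the variables are chosen
# only for illustration.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Estimated coefficients: intercept, wt and hp.
print(coef(fit))

# summary() adds standard errors, t-values and R-squared.
print(summary(fit))
```

Typing these lines at the R prompt one at a time shows the interactive feedback described above: each expression is evaluated as soon as it is entered.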

It is true that learning R takes some effort. However, just as with learning a new natural language, useful things can be done and great fun had before achieving fluency. I think that the process of learning R can be broken down into the following five stages:

1. Understand something of the culture of the R community, the environment in which the R programming language is maintained and developed. Become familiar with the resources available. Install R on your computer and run a test script.

2. Read CSV files into data frames and confidently use R functions to perform statistical analyses in a domain with which you are familiar.

3. Use the basic control structures of the R language to write simple programs. Write your own functions, become familiar with the data structures included in R and begin to explore the rich features of the language. Interface with databases, web pages and other external data sources.

4. Write complex programs in the language. Develop an understanding of the deep structure of the language: S3 and S4 objects, closures, etc.

5. Develop programs for production use. Write an R package.
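
To make stages 2 and 3 concrete, here is a minimal sketch of what work at that level can look like. The file contents, column names and the `standard_error()` helper are all made up for illustration; the CSV file is written to a temporary location first so that the example is self-contained.

```r
# Stage 2: read a CSV file into a data frame and analyze it.
# The data below are invented; in practice csv_path would point at
# a file of your own.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("group,score",
             "a,4.1", "a,5.0", "a,4.6",
             "b,6.2", "b,5.8", "b,6.9"),
           csv_path)

scores <- read.csv(csv_path, stringsAsFactors = TRUE)
str(scores)                                   # structure of the data frame
print(aggregate(score ~ group, data = scores, FUN = mean))
print(t.test(score ~ group, data = scores))   # Welch two-sample t-test

# Stage 3: a user-defined function and a basic control structure.
# standard_error() is a hypothetical helper, not a base R function.
standard_error <- function(x) {
  sd(x) / sqrt(length(x))
}

for (g in levels(scores$group)) {
  se <- standard_error(scores$score[scores$group == g])
  cat("group", g, "standard error:", round(se, 3), "\n")
}
```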

Stage 1 can be achieved in less than a day and, with the right reference book, should be enough to launch anyone sitting down to learn statistics on a very good trajectory. Completing stage 2, with regular work at stage 3, might be all that most people ever need. Once one becomes familiar with the libraries of R functions that are important to one’s field, it is not inconceivable that proficiency at this level is sufficient for professional scientists, social scientists and others for whom the mechanics of model building and analysis are not the main focus to go about their daily work. For the rest of us who want to do some serious modeling and analysis, it’s a matter of taking Malcolm Gladwell’s advice and getting in your 10,000 hours.
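
For a flavor of the stage 4 material, the sketch below shows a closure and a very small S3 class. The names `make_counter`, `new_temperature` and the `temperature` class are hypothetical, chosen only for this example; S4 classes are not shown.

```r
# A closure: make_counter() returns a function that remembers `count`
# in its enclosing environment between calls.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # update the variable in the enclosing scope
    count
  }
}

counter <- make_counter()
counter()
counter()   # the second call returns 2

# An S3 class: an object is a value with a class attribute, and
# generics such as print() dispatch on that attribute.
new_temperature <- function(celsius) {
  structure(list(celsius = celsius), class = "temperature")
}

print.temperature <- function(x, ...) {
  cat("Temperature:", x$celsius, "degrees Celsius\n")
  invisible(x)
}

t1 <- new_temperature(21.5)
print(t1)   # dispatches to print.temperature()
```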

So, how would I advise an R newbie to go about learning R? Jump right in, get oriented, latch on to a learning resource that fits your style, run other people’s R scripts that do something interesting, and begin writing your own.

Getting oriented

The best way to get oriented is to explore the Inside-R web site, CRAN (particularly the task views) and crantastic. Download R and a GUI-based integrated development environment (IDE). If you are fortunate enough to have access to Revolution Analytics Enterprise R IDE, then you are off to a very good start. Otherwise, try RStudio.

Resources

Resources for learning R generally fit into three categories:
1. Books, papers, presentations and other “slideware”
2. Blogs
3. Formal courses

Books

I am a book person, so my knee-jerk reaction to learning anything new is to find a good book. This might seem quaint to the mobile app generation but, as it turns out, each of the major technical publishing houses specializing in statistics books (Springer, Cambridge University Press, Chapman & Hall/CRC) has excellent books on doing statistics with R. Springer is the clear leader. The short texts in Springer’s Use R! series are at an introductory level, are modestly priced, and each focuses on a different statistical area. The following recommendations are only a small sample of what is available. Even the extensive list on the Inside-R site is no longer complete.

Probably the best text for someone new to both statistics and R is Peter Dalgaard’s “Introductory Statistics with R”. A personal favorite of mine at approximately the same level is John Fox’s “An R and S-Plus Companion to Applied Regression”. Slightly more advanced but very readable and enjoyable texts are Maindonald and Braun’s “Data Analysis and Graphics Using R: An Example-based Approach” and Gelman and Hill’s “Data Analysis Using Regression and Multilevel/Hierarchical Models”. A reference text that every aspiring R-competent statistician ought to have is Venables and Ripley’s “Modern Applied Statistics with S”.

A very short but sweet book that ought to help beginners become familiar with R’s data structures is Phil Spector’s “Data Manipulation with R”. Two other noteworthy books in this class are the O’Reilly publications “R in a Nutshell” by Joe Adler and the “R Cookbook” by Paul Teetor. If you have a SAS or SPSS background, then Robert Muenchen’s “R for SAS and SPSS Users” might be your bible. If you are an accomplished programmer and want a technical overview of the R language, try John Chambers’s “Software for Data Analysis”.

Blogs

Besides books and their accompanying websites, blogs are an excellent place to get your hands on interesting, useful code. My favorite blogs are David Smith’s blog at Revolution, Quick-R, R-Bloggers, and Rob Hyndman’s blog.

Courses

If a semi-formal setting better suits your style of learning, then please do have a look at the courses offered by Statistics.com. I took one of their courses, taught by Hadley Wickham, and very much enjoyed it.

Comments

Anonymous's picture

I particularly liked your characterisation of the phases of learning R and the careful selection of book recommendations. There are so many resources these days that the real challenge is helping new R users filter through what is available. You inspired me to update my own guide on getting started with R.

Anonymous's picture

Thank you. It is kind of you to say that. Good luck with R!

Anonymous's picture

Thank you for an excellent guide. Very concise and instructive.

Anonymous's picture

Joe, like you, I'm a book person. Our short lists are nearly identical. One nit: you may want to update the link to John Fox's book to http://www.amazon.com/R-Companion-Applied-Regression/dp/141297514X/ref=s...

The 2nd edition, which he jointly wrote with Sandy Weisberg, is totally up to date and greatly expanded.
-Jim

Anonymous's picture

Good catch. Thank you. The link has been updated.

Anonymous's picture

R best practice == emacs + ess + ( Sweave() + LaTeX | R2wd + MSWord | odfWeave + OpenOffice )

Anonymous's picture

I am finding "A Handbook of Statistical Analyses Using R", second edition, by B. Everitt and T. Hothorn very useful. Each chapter has exercises using the package HSAUR2.

R statistics's picture

I have written some basic R tutorials myself.
The tutorials on R statistics are just basic tutorials to get to know R. They cover the basics and how to work with distributions, frequencies, etc. in R.

I'm planning on making some more advanced tutorials in the near future.
But the tutorials that are online right now make a good base for new R users.

leveloneluke's picture

Your link for Peter Dalgaard's book is to the version published in 2004. There seems to be a second edition published recently. Any reason for not recommending the newer version? This looks like exactly what I'm looking for.

Joshua's picture

How would you say it compares with C++? I am familiar with the STL of C++. Would it be similar concepts, just with different syntax?