Previous Questions of the Day

How to compute descriptive statistics on a set of differently sized vectors

In a problem, I have a set of vectors. Each vector holds sensor readings, but the vectors are of different lengths. I'd like to compute the same descriptive statistics on each of these vectors. My question is, how should I store them in R? Using c() concatenates the vectors. Using list() seems to cause functions like mean() to misbehave. Is a data frame the right object?

What is the best practice for applying the same function to vectors of different sizes? Supposing the data resides in a SQL server, how should it be imported?
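One possible approach, sketched here under assumptions rather than taken from the original question: keep the vectors in a named list and apply a summary function to each element, instead of calling mean() on the list itself. The sensor names, the readings, and the helper `describe()` are hypothetical.

```r
# Hypothetical sensor readings of different lengths, stored in a named list
readings <- list(
  sensor_a = c(1.2, 3.4, 2.2),
  sensor_b = c(5, 7, 9, 11, 13),
  sensor_c = c(0.5, 0.7)
)

# Hypothetical helper: the same descriptive statistics for any numeric vector
describe <- function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}

# sapply() calls describe() once per list element and binds the results
# column-wise; t() turns that into one row per sensor
stats <- t(sapply(readings, describe))
stats
```

Calling mean(readings) fails because mean() expects a numeric vector, not a list; applying the function element by element avoids that.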

Tricks to manage the available memory in an R session?

What tricks do people use to manage the available memory of an interactive R session? I use the functions below [based on postings by Petr Pikal and David Hinds to the r-help list in 2004] to list (and/or sort) the largest objects and to occasionally rm() some of them. But by far the most effective solution was ... to run under 64-bit Linux with ample memory.

Any other nice tricks folks want to share? One per post, please.
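The functions the poster refers to are not reproduced in this listing. As a rough illustration of the idea only (not the original r-help code), a minimal base-R sketch for listing the largest objects in an environment could look like this; the name `largest_objects()` and its arguments are hypothetical.

```r
# Hypothetical helper: report the n largest objects in an environment,
# in bytes, largest first. Uses only base R (ls, get) and utils::object.size.
largest_objects <- function(env = globalenv(), n = 10) {
  obj_names <- ls(envir = env)
  sizes <- vapply(
    obj_names,
    function(nm) as.numeric(utils::object.size(get(nm, envir = env))),
    numeric(1)
  )
  head(sort(sizes, decreasing = TRUE), n)
}

# Usage example in a scratch environment
scratch <- new.env()
assign("big", numeric(1e5), envir = scratch)
assign("small", 1:3, envir = scratch)
largest_objects(scratch)

# Remove the largest object once it is no longer needed
rm(list = "big", envir = scratch)
```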

Learning functional programming with R

I have recently studied functional programming with Haskell and Clojure and found it has also improved my R coding practices. For example, I've better grasped the possibilities of using lists and the apply family of functions instead of loops. As a side effect I also discovered that a lot of my problems are parallel (and I can use mclapply for significant speed up), but I've been thinking sequentially in terms of loops.
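To make the contrast concrete, here is a small sketch of the same computation written as a loop, with vapply(), and with mclapply() from the base `parallel` package. The task `sum_sq()` and the inputs are hypothetical; the core count is an assumption, and mclapply() is limited to one core on Windows because it relies on forking.

```r
# Hypothetical per-item task: sum of squares of 1..n
sum_sq <- function(n) sum(seq_len(n)^2)
inputs <- c(10, 100, 1000)

# Sequential loop: pre-allocate the result, fill it element by element
loop_result <- numeric(length(inputs))
for (i in seq_along(inputs)) {
  loop_result[i] <- sum_sq(inputs[i])
}

# Functional version: apply the function to each element, no index bookkeeping
apply_result <- vapply(inputs, sum_sq, numeric(1))

# Parallel version: each element is independent, so the work can be forked.
# Assumption: 2 cores on Unix-alikes; Windows only supports mc.cores = 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
par_result <- unlist(parallel::mclapply(inputs, sum_sq, mc.cores = cores))
```

Because each call to `sum_sq()` depends only on its own input, swapping vapply() for mclapply() changes how the work is scheduled, not the result.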

How do I manually change the key labels in a legend in ggplot2

I am preparing a plot for publication. I created a stacked box plot to show frequency of patients in each group who were some complicated accumulation of seronegatives versus not. The legend is using the labels from the data frame, which are appropriate for those of us working on the project but not for publication. I want to change the names to something more rapidly understood by the reader.

So for instance run the following script

What is the most useful R trick? [closed]

In order to share some more tips and tricks for R, what is your single-most useful feature or trick? Clever vectorization? Data input/output? Visualization and graphics? Statistical analysis? Special functions? The interactive environment itself?

One item per post, and we will see if we get a winner by means of votes.

[Edit 25-Aug 2008]: So after one week, it seems that the simple str() won the poll. As I like to recommend that one myself, it is an easy answer to accept.

What can MATLAB do that R cannot do?

I often hear people complain how expensive MATLAB licenses are. Then I wonder why they don't just use Octave or R. But is the latter right? Can you use R to replace MATLAB?

Workflow for statistical analysis and report writing

Does anyone have any wisdom on workflows for data analysis related to custom report writing? The use-case is basically this:

  1. Client commissions a report that uses data analysis, e.g. a population estimate and related maps for a water district.

  2. The analyst downloads some data, munges the data and saves the result (e.g. adding a column for population per unit, or subsetting the data based on district boundaries).

  3. The analyst analyzes the data created in (2), gets close to her goal, but sees that she needs more data and so goes back to (1).

How to sort a dataframe by column(s) in R

I want to sort a data.frame by multiple columns in R. For example, with the data.frame below I would like to sort by column z (descending) then by column b (ascending):

dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"), 
      levels = c("Low", "Med", "Hi"), ordered = TRUE),
      x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
      z = c(1, 1, 1, 2))
dd
    b x y z
1  Hi A 8 1
2 Med D 3 1
3  Hi A 9 1
4 Low C 9 2
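One common base-R way to get this ordering, offered as a sketch rather than as the accepted answer, is to index the rows with order(). Negating `z` gives a descending sort on that column, and the ordered factor `b` sorts by its level order (Low < Med < Hi). The name `sorted` is just an illustrative variable.

```r
# Rebuild the example data frame from the question
dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
      levels = c("Low", "Med", "Hi"), ordered = TRUE),
      x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
      z = c(1, 1, 1, 2))

# order() returns row indices: z descending (via negation), then b ascending;
# rows that tie on both keys keep their original relative order
sorted <- dd[order(-dd$z, dd$b), ]
sorted
```

Negation only works for numeric keys; for a descending sort on a factor or character column, `order(..., decreasing = TRUE)` or `-xtfrm()` on that column are the usual alternatives.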