Packing everything into a data.frame

OK, I know I talk about R too much, but I like R, so I’m going to talk about it some more.

Common situation: repeat a procedure many times; each time generates some large wadge of awful-structured data, and in the end you’d like to go back and look at it all.

OK, sounds reasonably simple, just

lapply(1:Num.Trials, function(N) {
    ...
    list(
        A = ...,
        B = ...
    )
})

and you’ve got a list of structs containing that data. It works, but I find it undesirable for a few reasons:

  1. A list of lists is cumbersome to navigate. You have to subscript the first list before the second.
  2. You can’t do nice data.frame things with it like plot(…, data=…). Basically, it should be a data.frame, because data.frames are pretty.
  3. Having to explicitly put everything into the struct there at the end forces you to choose what gets remembered and what gets dropped. Rarely do I have such foresight.

So to get a data.frame, we can use the magic of sapply. Like this:

as.data.frame(t(sapply(1:N, function(I) {
    ...
    list(
        A = ...,
        B = ...
    )
})))

I have to admit I don’t actually know why sapply is smart enough to do this, but it turns the whole shebang into a matrix of mode “list”. t() transposes that matrix so the fields A, B… become the columns. as.data.frame() makes the whole thing a data frame. Excellent.
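A likely explanation, as far as I can tell: sapply tries to simplify its result, and when every call returns a list of the same length (more than one field), it binds the results into a matrix, one column per trial. Here is a minimal sketch of each stage; the `trial` function and its fields are made up for illustration.

```r
# Hypothetical trial function: returns two named fields per run.
trial <- function(i) {
    list(A = i^2, B = letters[i])
}

raw <- sapply(1:3, trial)
# Every result is a list of length 2, so sapply binds them into a
# 2 x 3 matrix whose cells are list elements.
dim(raw)        # 2 3
mode(raw)       # "list"
rownames(raw)   # "A" "B"

# Transpose so that fields become columns, then convert.
df <- as.data.frame(t(raw))
dim(df)         # 3 2
names(df)       # "A" "B"
```

One caveat: if the function returns only a single field, sapply hands back a plain list rather than a matrix, so the t() step assumes at least two fields.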

Well, there’s a little problem here. I didn’t realize this at first, but a data.frame is just a list() of columns plus some attributes() attached. And those columns are welcome to be of mode “list”, as they will be here. In one way that’s actually really convenient, because you can stick complex stuff inside a data.frame, as in, like anything, even whole other data.frames. But you can’t call mean() or sd() or acf() on a vector of mode “list”. Inconvenient.
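To see the problem concretely, here is a small sketch using a made-up two-field trial (the fields `A` and `B` are just placeholders):

```r
df <- as.data.frame(t(sapply(1:3, function(i) list(A = i^2, B = letters[i]))))

is.list(df)             # TRUE: a data.frame is a list of columns
names(attributes(df))   # the attached attributes, including "names" and "class"
mode(df$A)              # "list"

# mean() refuses a list column: it warns and returns NA.
suppressWarnings(mean(df$A))   # NA

# Flattening the column first gives an ordinary numeric vector.
mean(unlist(df$A))
```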

(By the way, is there any other language in which every object has a type, a mode and a class, all of which mean different things? What is up with that?)
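For the curious, a quick illustration of the three disagreeing with each other, using only base R:

```r
x <- 1:3
typeof(x)    # "integer"
mode(x)      # "numeric"
class(x)     # "integer"

df <- data.frame(a = 1)
typeof(df)   # "list"
mode(df)     # "list"
class(df)    # "data.frame"

typeof(sum)  # "builtin"
mode(sum)    # "function"
class(sum)   # "function"
```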

So the solution is this “clean” function, which converts, where possible, columns of mode “list” into numeric or character vectors.

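clean = function(Data.Frame) {
	# TRUE if X is a single atomic value (not a list, not a longer vector)
	is.one = function(X) {
		is.atomic(X) && (length(X) == 1)
	}
	
	# TRUE if every element of the column is a single atomic value
	is.good = function(Col) {
		all(sapply(Col, is.one))
	}
	
	# Flatten each column that qualifies; leave the others as lists
	for (Col.Name in colnames(Data.Frame)) {
		Col = Data.Frame[[Col.Name]]
		if (is.good(Col)) {
			Data.Frame[[Col.Name]] = unlist(Col)
		}
	}
	
	Data.Frame
}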

Basically, check to see that all the elements are atomic vectors (i.e. not lists) of length 1; if so, flatten (“unlist”).
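The check is easy to try on its own. Below is a sketch with three made-up list columns; `flattenable` is a hypothetical stand-in for the same test, written with vapply so it runs by itself.

```r
# Hypothetical list columns, as they might come out of the sapply step.
scalars <- list(1, 4, 9)          # every element atomic, length 1
ragged  <- list(1, c(2, 3), 4)    # one element has length 2
nested  <- list(1, list(2), 3)    # one element is itself a list

# Same test as described above: all elements atomic and of length 1.
flattenable <- function(col) {
    all(vapply(col, function(x) is.atomic(x) && length(x) == 1, logical(1)))
}

flattenable(scalars)   # TRUE
flattenable(ragged)    # FALSE
flattenable(nested)    # FALSE

unlist(scalars)        # 1 4 9
```

Worth knowing: unlist coerces mixed types to a common one, so a column holding both numbers and strings would come out as character.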

And lastly, how about automatically grabbing everything you created along the way? Just end each trial function with

as.list(environment())
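A small sketch of what that returns, using a made-up trial function:

```r
# Hypothetical trial: every local variable, plus the argument, is captured.
trial <- function(i) {
    x <- i * 2
    y <- x + 1
    as.list(environment())
}

res <- trial(5)
sort(names(res))   # "i" "x" "y"
res$y              # 11
```

Note that the argument comes along too, and that by default names starting with a dot are left out. Bulky temporaries get captured as well, so this trades memory for not having to plan ahead.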

Putting this all together, we have:

do.trials = function(N, Func) {
	clean(as.data.frame(t(sapply(1:N, Func))))
}
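
To try the whole thing end to end, here is a sketch that runs on its own. It restates clean and do.trials in condensed form; the trial itself (normal draws, their mean and standard deviation) is invented for illustration.

```r
# Condensed restatement of clean() and do.trials() from the text,
# so that this block runs by itself.
clean <- function(df) {
    for (name in colnames(df)) {
        col <- df[[name]]
        ok <- all(vapply(col, function(x) is.atomic(x) && length(x) == 1,
                         logical(1)))
        if (ok) df[[name]] <- unlist(col)
    }
    df
}

do.trials <- function(n, func) {
    clean(as.data.frame(t(sapply(1:n, func))))
}

# Hypothetical trial: draw a sample and keep every local variable.
set.seed(1)
results <- do.trials(5, function(i) {
    draws <- rnorm(10, mean = i)
    m <- mean(draws)
    s <- sd(draws)
    as.list(environment())
})

sapply(results, mode)   # i, m and s are numeric; draws stays a list
mean(results$m)         # now works, since m is a plain numeric column
```

The `draws` column stays a list because each entry holds ten values, which is exactly the case clean leaves alone. This also assumes every trial returns the same fields in the same order, since sapply takes the names from the first result.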