Wrangling messy data in R using the tidyverse
When I was learning R, the biggest obstacle was figuring out how to organize and format data: how to get from raw, messy data to something I could enter into an analysis. This process has become much easier with the introduction of the tidyverse suite of packages. Over the past several months, I have been adopting the tidyverse syntax in my new scripts. Compared to my old routines, I’m finding this syntax much more transparent and readable, and I think it will make data wrangling in R much easier for new users to learn.
If you have been tempted to switch to R but have been stymied by data wrangling, I hope you find the notebook I’ve written useful. To learn the tidyverse, I took a large data set of output from the automated operation span task created by Prof. Randy Engle’s lab (collected by Dr. Jonathan Mall when he worked with me at Rijksuniversiteit Groningen) and created data frames focusing on the processing responses and on individual, trial-level memory responses. The program is designed to conveniently output summary scores. The trial-level data are also available, but they need a lot of wrangling to be useful for analysis. In the notebook, I describe the steps needed to wrangle the raw data and how to implement them.