No more excuses: R is better than SPSS for psychology undergrads, and students agree

Starting this academic year (2016/7), all of our stats teaching in Psychology at the University of Glasgow will use R and RStudio instead of Excel and SPSS.

When we proposed the transition, some staff worried that the scripting nature of R would be too challenging for incoming students, many of whom would be starting the program with little or no programming experience.  Staff leading similar transitions to R at other universities have told me that they faced the same pessimism from their teaching colleagues.

I did not share these doubts.  For six years I have given students the choice between R and SPSS in my level 3 course, and those who chose R and RStudio were mostly able to get up and running on their own, with little guidance from me beyond links to online tutorials.  To be sure, students found it challenging at first, but these were students who had just gone through two years of the program working in Excel and SPSS, and had become accustomed to the point-and-click nature of these programs.  The truth is that you really aren’t exposed to the quirks of R until you start using it for data wrangling, which generally isn’t part of the psychology stats curriculum anyway (but should be!).  If all you are doing is plugging in pre-formatted, pre-cleaned, canned datasets, cranking out a t-test or an ANOVA and maybe also a bar graph, the software you use does not make a huge difference.  But if that is all you are teaching, your students will be ill-prepared when they first encounter their own messy datasets.

This kind of teaching should be a relic of the past, and this is one of the many reasons we decided to move to R: we thought that R’s more interactive, transparent, and reproducible approach to data analysis would be better for learning, and we also wanted to incorporate more data science skills into our curriculum.

We recently piloted some of our new R materials on a group of undergraduates who already had some exposure to SPSS in previous years, which allowed them to compare the two platforms.  Bear in mind that these students had no experience with R before the piloting day, and had not yet been subjected to my annual brainwashing lecture on the benefits of R at the start of year three.

One of the questions we asked at the end of the session was: “Did you understand how to use the software?” 12/13 students said “yes”, with the remaining student saying “yes and no; with some practice, it should be OK.”

The students’ comments are very illuminating, and I will let them speak for themselves.  They have not been edited or cherry-picked.

I hope they encourage other departments to make the transition to R.

R generally is fascinating, and I can absolutely see the benefits of it. I am sure it will make students more critically aware about what they’re doing instead of just following SPSS drop-down menus.

 

R is very easy to use. Typing in code helps with understanding.

 

Already find R much better than SPSS because I feel much more in control and enjoy having a clear oversight over what I’ve been doing, while SPSS just felt like I was clicking random buttons and looking for numbers to write down.

 

I find it more engaging than SPSS.

 

You are able to see exactly what is happening to the stats, you can change the graphs around and if you add a sort of interactive feature, e.g., change the error bars, colour the graph, etc., you can start to see how the code works.

 

You can edit/manipulate the data much easier than with SPSS.  It forced you to engage with it.

 

Different things are on the screen at the same time so you get a complete overview of your stats.  It would help if most relevant results were written out in bold; volume of text can be a bit overwhelming, but I guess that’s how the software is.

 

If you know what to look for, it can really help with solidifying mechanics behind stats.

 

I found the experience very exciting and going through it with the teaching assistant was extremely helpful. I personally prefer R to SPSS and can’t wait to learn it and be able to run it. R looks more timesaving and less confusing than SPSS.

 

Better than pointing and clicking. Coding is much more flexible. But the results of the tests look a bit more confusing than with SPSS.

What the world needs now: Even more R/RStudio instructional videos

Backstory

You might be interested in the backstory for these two R/RStudio instructional videos I created, which gives insight into our department’s recent transition to teaching stats using R/RStudio instead of Excel/SPSS (if not, the TL;DR for this section is that these two videos were born out of frustration, last-minute panic, and pedagogical role-modeling by my son).

Starting this 2016/7 academic year, all our statistics instruction in psychology is leaving Excel/SPSS behind and moving on to R/RStudio.  We chose R because we want our students to learn how to make their data analyses reproducible.  The first year of our program will now be devoted to developing basic data science skills: loading in different kinds of datasets, tidying them, merging them with others, visualizing distributions, calculating basic descriptive statistics, and generating reports in RMarkdown.  We needed some kind of tutorial on interacting with R/RStudio that incoming students could work through at their own pace.
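To give a flavor of what those first-year skills look like in practice, here is a minimal sketch in base R.  The data are made up for illustration and the variable names are hypothetical; none of this comes from our actual lab materials, which may well use other packages (readr, for example) rather than base functions.

```r
# Hypothetical toy data, invented for this example only.
scores_csv <- "id,condition,rt
1,control,512
2,control,478
3,treatment,430
4,treatment,455
5,control,NA
6,treatment,441"

participants_csv <- "id,age
1,19
2,21
3,20
4,22
5,19
6,23"

# Load: read.csv() normally takes a file path; text= keeps the example self-contained.
scores <- read.csv(text = scores_csv, stringsAsFactors = FALSE)
participants <- read.csv(text = participants_csv, stringsAsFactors = FALSE)

# Tidy: drop rows with a missing reaction time.
scores_clean <- scores[!is.na(scores$rt), ]

# Merge: attach each participant's age via the shared id column.
dat <- merge(scores_clean, participants, by = "id")

# Describe: mean and standard deviation of reaction time per condition.
mean_rt <- aggregate(rt ~ condition, data = dat, FUN = mean)
sd_rt <- aggregate(rt ~ condition, data = dat, FUN = sd)
print(mean_rt)
print(sd_rt)

# Visualize: distribution of reaction times.
hist(dat$rt, main = "Reaction times", xlab = "RT (ms)")
```

Every step is written down, so rerunning the script on a corrected data file reproduces the whole analysis, which is exactly the property that is hard to get from a sequence of menu clicks.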

We also needed some additional training materials for our teaching staff.  We set aside a day before the start of term where we would pilot our new lab materials with students, almost none of whom had encountered R/RStudio before. The students would therefore need lots of support to get through the exercises.  But in spite of our best efforts to get teaching staff sufficiently trained up, as the pilot day approached, many were still anxious about having to help students use software they themselves were still coming to grips with.

I had developed a series of web-based step-by-step walkthrough documents for our incoming students that I sent out to some of the staff to try out, and although staff politely expressed gratitude for my efforts, I think they found them overwhelming.  Some complained that the documents took far too much time to get through (and, by the way, were also missing important information).  So clearly the format was not working.

While I had been working on these materials, my son had been spending his last days of summer vacation producing videogame walkthroughs (BTW, if you’re looking for cool Minecraft videos, this 11-year-old has got your back).  So 24 hours before piloting day, with staff panic approaching meltdown levels, I realized through my son’s example that the pedagogical medium I needed all along was video (DUH, Dad!).

I needed a video, and I needed it quickly.  I did not have hours to spend watching videos on YouTube to find the exact one that would suit my needs, and I quickly realized that it would probably take me less time to make my own than to review the many hours of instructional videos already available, if I limited myself to one take.  After all, wasn’t Sister Ray cut in a single take?  So I thought up an analysis task, launched the video capture software, and hoped for the best.  Judge accordingly.

The staff and students were happy with the result, so I decided to make the videos public.  Hopefully others will find these introductory videos useful, especially those just starting out in R.

The videos

The videos provide a demo of R/RStudio in the context of an analysis of Scottish baby names.  I had three goals:

  • Dazzle students with some R black magic so that they get excited about its possibilities, while still giving them the basics;
  • Provide a model of how to interact with R/RStudio in the context of a well-defined analysis task;
  • Choose an analysis project that would be fun and personally relevant to our incoming students.  I had been playing around with the babynames package for one of our homework assignments, and in the process had discovered that the National Records of Scotland has a similar database in CSV format.

I did the analysis twice, once as an R script and once as part of an RMarkdown report, and decided to split the video into two parts.

Video 1: Basic interaction with RStudio, developing an R script

Yes, around a minute of the first video involves me awkwardly watching the readr package installation process, wishing I knew some good R jokes or had some background music to make the time pass more quickly.

Yes, in the video I accidentally reveal that I had recently been using R to make chickens talk.

Yes, color is probably not the best way to differentiate the names in the graphs.

Yes, at no point in the video do I appear to realize that the trends I am looking at in the videos largely reflect the statistical phenomenon of regression toward the mean, in spite of having published on this very topic.  But at least it did remind me after the fact that we need to discuss this phenomenon somewhere in our curriculum!
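For anyone unfamiliar with regression toward the mean, a small simulation shows the effect.  To be clear, this is not the analysis from the video and it does not use the Scottish names data; it is a sketch under a made-up model in which every name has a stable underlying popularity and each year's count adds independent noise.  All numbers and variable names are hypothetical.

```r
set.seed(1)  # arbitrary seed so the simulation is repeatable

n_names <- 1000

# Assumed model: each name has a fixed underlying popularity...
true_pop <- rnorm(n_names, mean = 100, sd = 10)
# ...and each year's observed count is that popularity plus independent noise.
year1 <- true_pop + rnorm(n_names, sd = 10)
year2 <- true_pop + rnorm(n_names, sd = 10)

# Pick out the names that looked most popular in year 1 (top 10%).
top <- year1 >= quantile(year1, 0.9)

# Average change from year 1 to year 2, for the top names and for all names.
change_top <- mean(year2[top] - year1[top])
change_all <- mean(year2 - year1)

cat("Mean change, top names:", round(change_top, 1), "\n")
cat("Mean change, all names:", round(change_all, 1), "\n")
```

Nothing about the underlying popularity changes between the two years, yet the names selected for being extreme in year 1 drop on average in year 2, because part of what made them extreme was noise.  A "decline" among the top names is therefore what you would expect to see even when no real trend exists.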

The R script for this analysis:

Video 2: RMarkdown and knitting an HTML report

This second part of the video reproduces the analysis in an RMarkdown document and shows how to compile an HTML report.

And the RMarkdown document (click “View Raw” at the bottom right of the window for the RMarkdown source)