60+ R resources to improve your data skills
16 January, 2014 01:09
This list was originally published as part of the Computerworld Beginner's Guide to R but has since been expanded to also include resources for advanced beginner and intermediate users. If you're just starting out with R, I recommend first heading to the Beginner's Guide.
These websites, videos, blogs, social media/communities, software and books/ebooks can help you do more with R.
Books and e-books
R Cookbook. Like the rest of the O'Reilly Cookbook series, this one offers how-to "recipes" for doing lots of different tasks, from the basics of R installation and creating simple data objects to generating probabilities, graphics and linear regressions. It has the added bonus of being well written. If you like learning by example or are seeking a good R reference book, this is well worth adding to your reference library. By Paul Teetor, a quantitative developer working in the financial sector.
R Graphics Cookbook. If you want to do beyond-the-basics graphics in R, this is a useful resource both for its graphics recipes and brief introduction to ggplot2. While this goes way beyond the graphics capabilities that I need in R, I'd recommend this if you're looking to move beyond advanced-beginner plotting. By Winston Chang, a software engineer at RStudio.
R in Action: Data analysis and graphics with R. This book aims at all levels of users, with sections for beginning, intermediate and advanced R ranging from "Exploring R data structures" to running regressions and conducting factor analyses. The beginner's section may be a bit tough to follow if you haven't had any exposure to R, but it offers a good foundation in data types, imports and reshaping once you've had a bit of experience. There are some particularly useful explanations and examples for aggregating, restructuring and subsetting data, as well as a lot of applied statistics. Note that if your interest in graphics is learning ggplot2, there's relatively little on that here compared with base R graphics and the lattice package. You can see an excerpt from the book online: Aggregation and restructuring data. By Robert I. Kabacoff.
The Art of R Programming. For those who want to move beyond using R "in an ad hoc way ... to develop[ing] software in R." This is best if you're already at least moderately proficient in another programming language. It's a good resource for systematically learning fundamentals such as types of objects, control statements (unlike many R purists, the author doesn't actively discourage for loops), variable scope, classes and debugging -- in fact, there's nearly as large a chapter on debugging as there is on graphics. With some robust examples of solving real-world statistical problems in R. By Norman Matloff.
R in a Nutshell. A reasonably readable guide to R that teaches the language's fundamentals -- syntax, functions, data structures and so on -- as well as how-to statistical and graphics tasks. Useful if you want to start writing robust R programs, as it includes sections on functions, object-oriented programming and high-performance R. By Joseph Adler, a senior data scientist at LinkedIn.
Visualize This. Note: Most of this book is not about R, but there are several examples of visualizing data with R. And there's so much other interesting info here about how to tell stories with data that it's worth a read. By Nathan Yau, who runs the popular FlowingData blog and whose doctoral dissertation was on "personal data collection and how we can use visualization to learn about ourselves."
R For Dummies. I haven't had a chance to read this one, but it's garnered some good reviews on Amazon.com. If you're familiar with the Dummies series and have found them helpful in the past, you might want to check this one out. You can get a taste of the authors' style in the Programming in R section of Dummies.com, which has more than 100 short sections such as How to construct vectors in R and How to use the apply family of functions in R. By Joris Meys and Andrie de Vries.
Introduction to Data Science. It's highly readable, packed with useful examples and free -- what more could you want? This e-book isn't technically an "R book," but it uses R for all of its examples as it teaches concepts of data analysis. If you're familiar with that topic you may find some of the explanations rather basic, but there's still a lot of R code for things like analyzing tweet rates (including a helpful section on how to get Twitter OAuth authorization working in R), simple map mashups and basic linear regression. Although author Jeffrey Stanton calls this an "electronic textbook," Introduction to Data Science has a conversational style that's pleasantly non-textbook-like. There used to be a downloadable PDF, but now the only versions are for OS X or iOS.
R for Everyone. Author Jared P. Lander promises to go over "20% of the functionality needed to accomplish 80% of the work." And in fact, the topics that are actually covered are covered pretty well; but be warned that some topics appearing in the table of contents can be a little thin. This is still a well-organized reference, though, with sections on topics beginning and intermediate users might want to know: importing data, generating graphs, grouping and reshaping data, working with basic stats and more.
Statistical Analysis With R: Beginner's Guide. This book has you "pretend" you're a strategist for an ancient Chinese kingdom analyzing military strategies with R. If you find that idea hokey, move along to see another resource; if not, you'll get a beginner-level introduction to various tasks in R, including tasks you don't always see in an intro text, such as multiple linear regressions and forecasting. Note: My early e-version had a considerable number of bad spaces in my Kindle app, but it was still certainly readable and usable.
Reproducible Research with R and RStudio. Although categorized as a "bioinformatics" textbook (and priced that way - even the Kindle edition is more than $50), this is more general advice on steps to make sure you can document and present your work. This includes numerous sections on creating report documents using the knitr package, LaTeX and Markdown -- tasks not often covered in-depth in general R books. The author has posted source code for generating the book on GitHub, though, if you want to create an electronic version of it yourself.
Exploring Everyday Things with R and Ruby. This book oddly goes from a couple of basic introductory chapters to some fairly robust, beyond-beginner programming examples; for those who are just starting to code, much of the book may be tough to follow at the outset. However, the intro to R is one of the better ones I've read, including a lot of language fundamentals and basics of graphing with ggplot2. Plus experienced programmers can see how author Sau Sheong Chang splits up tasks between a general language like Ruby and the statistics-focused R.
4 data wrangling tasks in R for advanced beginners. This follow-up to our Beginner's Guide outlines how to do several specific data tasks in R: add columns to an existing data frame, get summaries, sort results and reshape data. With sample code and explanations.
Cookbook for R. Not to be confused with the R Cookbook book mentioned above, this website by software engineer Winston Chang (author of the R Graphics Cookbook) offers how-to's for tasks such as data input and output, statistical analysis and creating graphs. It's got a similar format to an O'Reilly Cookbook; and while not as complete, can be helpful for answering some "How do I do that?" questions.
Quick-R. This site has a fair amount of samples and brief explanations grouped by major category and then specific items. For example, you'd head to "Stats" and then "Frequencies and crosstabs" to get an explainer of the table() function. This ranges from basics (including useful how-to's for customizing R startup) through beyond-beginner statistics (matrix algebra, anyone?) and graphics. By Robert I. Kabacoff, author of R in Action.
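To give a sense of what table() produces, here is a minimal sketch of a frequency table and a crosstab. The mtcars data set (which ships with R) and the columns used are my own choices for illustration, not examples taken from Quick-R.

```r
# Frequency count of cars by number of cylinders (mtcars ships with base R)
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)

# Two-way crosstab: cylinders (rows) by number of gears (columns)
cyl_by_gear <- table(cylinders = mtcars$cyl, gears = mtcars$gear)
print(cyl_by_gear)

# Convert the crosstab counts to row proportions, rounded for readability
row_props <- round(prop.table(cyl_by_gear, margin = 1), 2)
print(row_props)
```

Passing one vector gives simple counts; passing two gives a crosstab, and prop.table() turns either into proportions.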
R Reference Card. If you want help remembering function names and formats for various tasks, this 4-page PDF is quite useful despite its age (2004) and the fact that a link to what's supposed to be the latest version no longer works. By Tom Short, an engineer at the Electric Power Research Institute.
A short list of the most useful R commands. Commands grouped by function such as input, "moving around" and "statistics and transformations." This offers minimal explanations, but there's also a link to a longer guide to Using R for psychological research. HTML format makes it easy to cut and paste commands. Also somewhat old, from 2005. By William Revelle, psychology professor at Northwestern University.
Chart Chooser in R. This has numerous examples of R visualizations and sample code to go with them, including bar, column, stacked bar & column, bubble charts and more. It also breaks down the visualizations by categories like comparison, distribution and trend. By Greg Lamp, based on Juice Labs' Chart Chooser for Excel and PowerPoint.
Frequently Asked Questions about R. Some basics about reading, writing, sorting and shaping data as well as a lineup of how to do various statistical operations and a few specialized graphics such as spaghetti plots. From UCLA's Institute for Digital Research and Education.
R Reference Card for Data Mining. This is a task-oriented compilation of useful R packages and functions for things ranging from text mining and time series analysis to more general subjects like graphics and data manipulation. Since descriptions are somewhat bare-bones, this will likely be more useful to either remind you of functions you've seen before or give you suggestions for things to try. For much more on the subject, head to the author's R and Data Mining website, which includes examples and other documentation, including a substantial portion of his book R and Data Mining published by Elsevier in 2012. By Yanchang Zhao.
Spatial Cheat Sheet. For those doing GIS and spatial analysis work, this list offers some key functions and packages for working with spatial vector and raster data. By Barry Stephen Rowlingson at Lancaster University in the U.K.
Web interface for ggplot2. This online tool by UCLA Ph.D. candidate Jeroen Ooms creates an interactive front end for ggplot2, allowing users to input tasks they want to do and get a plot plus R code in return. Useful for those who want to learn more about using ggplot2 for graphics without having to read through lengthy documentation.
Ten Things You Can Do in R That You Would've Done in Microsoft Excel. From the R for Dummies Web site, these code samples aim to help Excel users feel more comfortable with R.
Twotorials. You'll either enjoy these snappy 2-minute "twotorial" videos or find them, oh, corny or over the top. I think they're both informative and fun, a welcome antidote to the typically dry how-to's you often find in statistical programming. Analyst Anthony Damico takes on R in 2-minute chunks, from "how to create a variable with R" to "how to plot residuals from a regression in R;" he also tackles an occasional problem such as "how to calculate your ten, fifteen, or twenty thousandth day on earth with R." I'd strongly recommend giving this a look if textbook-style instruction leaves you cold.
Google Developers' Intro to R. This series of 21 short YouTube videos includes some basic R concepts, a few lessons on reshaping data and some info on loops. In addition, six videos focus on a topic that's often missing in R intros: working with and writing your own functions. This YouTube playlist offers a good programmer's introduction to the language -- just note that if you're looking to learn more about visualizations with R, that's not one of the topics covered.
Up and Running with R. This lynda.com video class covers the basics of topics such as using the R environment, reading in data, creating charts and calculating statistics. The curriculum is limited, but presenter Barton Poulson tries to explain what he's doing and why, not simply run commands. He also has a more in-depth 6-hour class, R Statistics Essential Training. Lynda.com is a subscription service that starts at $25/month, but several of the videos are available free for you to view and see if you like the instruction style, and there's a 7-day free trial available.
Coursera: Computing for Data Analysis. Coursera's free online classes are time-sensitive: You've got to enroll while they're taking place or you're out of luck. However, if there's no session starting soon, instructor Roger Peng, associate professor of biostatistics at Johns Hopkins University, posted his lectures on YouTube; Revolution Analytics then collected them on a handy single page. While I found some of these a bit difficult to follow at times, they are packed with information, and you may find them useful.
Coursera: Data Analysis. This was more of an applied statistics class that uses R as opposed to one that teaches R; but if you've got the R basics down and want to see it in action, this might be a good choice. There are no upcoming scheduled sessions for this at Coursera, but instructor Jeff Leek, an assistant professor of biostatistics at Johns Hopkins, posted his lecture videos on YouTube, and Revolution Analytics collected links to them all by week.
Coursera: Statistics One. If you don't mind going through a full, 12-week stats course along with learning R, Princeton University senior lecturer Andrew Conway's class includes an introduction to R. "All the examples and assignments will involve writing code in R and interpreting R output," says the course description. You can check the Coursera link to see if and when future sessions are scheduled.
Other online introductions and tutorials
Try R. This beginner-level interactive online course will probably seem somewhat basic for anyone who has experience in another programming language. However, even if the focus on pirates and plunder doesn't appeal to you, it may be a good way to get some practice and get more comfortable using R syntax.
An Introduction to R. Let's not forget the R Project site itself, which has numerous resources on the language including this intro. The style here is somewhat dry, but you'll know you're getting accurate, up-to-date information from the R Core Team.
Learning statistics with R: A tutorial for psychology students and other beginners by Daniel Navarro at the University of Adelaide (PDF). 500+ pages that go from "Why do we learn statistics" and "Statistics in every day life" to linear regression and ANOVA (ANalysis Of VAriance). If you don't need/want a primer in statistics, there are still many sections that focus specifically on R.
R Tutorial. A reasonably robust beginning guide that includes sections on data types, probability and plots as well as sections focused on statistical topics such as linear regression, confidence intervals and p-values. By Kelly Black, associate professor at Clarkson University.
r4stats.com. This site is probably best known in the R community for author Bob Muenchen's tracking of R's popularity vs. other statistical software. However, in the Examples section, he's got some R tutorials such as basic graphics and graphics with ggplot2. He's also posted code for tasks such as data import and extracting portions of your data, comparing R with alternatives such as SAS and SPSS.
Aggregating and restructuring data. This excerpt from R in Action goes over one of the most important subjects in using R: reshaping your data so it's in the format needed for analysis and then grouping and summarizing that data by factors. In addition to touching on base-R functions like the useful-but-not-always-well-known aggregate(), it also covers melt() and cast() with the reshape package. By Robert I. Kabacoff.
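For readers who haven't met aggregate() yet, here is a minimal sketch of grouping and summarizing by factors. It is not taken from the excerpt; the mtcars data set and the chosen columns are assumptions made purely for illustration, and it sticks to base R rather than the reshape package.

```r
# Mean mpg and horsepower for each combination of cylinders and gears.
# The formula reads: summarize mpg and hp, grouped by cyl and gear.
summary_df <- aggregate(cbind(mpg, hp) ~ cyl + gear, data = mtcars, FUN = mean)
print(summary_df)

# The same idea with a single summarized column and a single grouping factor
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = median)
print(mpg_by_cyl)
```

The result is an ordinary data frame with one row per group, so it can be sorted, plotted or merged like any other.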
Getting started with charts in R. From the popular FlowingData visualization website run by Nathan Yau, this tutorial offers examples of basic plotting in R. Includes downloadable source code. (While many FlowingData tutorials now require a paid membership to the site, as of May 2013 this one did not.)
Using R for your basic statistical Needs LISA Short Course. Aimed at those who already know stats but want to learn R, this is a file of R code with comments, making it easy to run (and alter) the code yourself. The programming is easy to follow, but if you haven't brushed up on your stats lately, be advised that comments such as
# Suppose we'd like to produce a reduced set of independent variables. We could use the function
# step() to perform stepwise model selection based on AIC which is -2log(Likelihood) + kp? Where k=2
# and p = number of model parameters (beta coefficients).
may be tough to follow. By Nels Johnson at Virginia Tech's Laboratory for Interdisciplinary Statistical Analysis.
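If that comment reads as opaque, a small hands-on run may help. The following is only a sketch of stepwise selection with step(), not the LISA course's own code; the mtcars data set and the starting predictors are assumptions chosen for illustration. It also checks the AIC formula by hand. Note that R's logLik() counts the error variance as a parameter in addition to the beta coefficients.

```r
# Start from a linear model with several candidate predictors
full_model <- lm(mpg ~ wt + hp + disp + drat, data = mtcars)

# Backward stepwise selection by AIC; trace = 0 suppresses the step-by-step log
reduced_model <- step(full_model, direction = "backward", trace = 0)
print(formula(reduced_model))

# Recompute AIC by hand as -2 * log-likelihood + k * p, with k = 2
ll <- logLik(reduced_model)
manual_aic <- -2 * as.numeric(ll) + 2 * attr(ll, "df")
print(c(manual = manual_aic, builtin = AIC(reduced_model)))
```

A lower AIC is better, so step() keeps dropping a term only while doing so reduces the criterion.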
Producing Simple Graphs with R. Although 6+ years old now, this gives a few more details and examples for several of the visualization concepts touched on in our beginner's guide. By Frank McCown at Harding University.
Short courses. Materials from various courses taught by Hadley Wickham, chief scientist at RStudio and author of several popular R packages including ggplot2. Features slides and code for topics beyond beginning R, such as R development master class.
Quick introduction to ggplot2. Very nice, readable and -- as promised -- quick introduction to the ggplot2 add-on graphic package in R, including lots of sample plots and code. By Google engineer Edwin Chen.
ggplot2 workshop presentation. This robust, single-but-very-long-page tutorial offers a detailed yet readable introduction to the ggplot2 graphing package. What sets this apart is its attention to its theoretical underpinnings while also offering useful, concrete examples. From a presentation at the Advances in Visual Methods for Linguistics conference. By Josef Fruehwald, then a PhD candidate at the University of Pennsylvania.
ggplot2_tutorial.R. This online page at RPubs.com, prepared for the Santa Barbara R User Group, includes a lot of commented R code and graph examples for creating data visualizations with ggplot2.
More and Fancier Graphics. This one-page primer features loads of examples, including explainers of a couple of functions that let you interact with R plots, locator() and identify(), as well as a lot of core-R plotting customization. By William B. King, Coastal Carolina University.
ggplot2 Guide. This ggplot2 explainer skips the simpler qplot option and goes straight to the more powerful but complicated ggplot command, starting with basics of a simple plot and going through geoms (type of plot), faceting (plotting by subsets), statistics and more. By data analyst George Bull at Sharp Statistics.
Using R. In addition to covering basics, there are useful sections on data manipulation -- an important topic not easily covered for beginners -- as well as getting statistical summaries and generating basic graphics with base R, the Lattice package and ggplot2. Short explainers are interspersed with demo code, making this useful as both a tutorial and reference site. By analytics consultant Alastair Sanderson, formerly research fellow in the Astrophysics & Space Research (ASR) Group at the University of Birmingham in the U.K.
The Undergraduate Guide to R. This is a highly readable, painless introduction to R that starts with installation and the command environment and goes through data types, input and output, writing your own functions and programming tips. Viewable as a Google Doc or downloadable as a PDF, plus accompanying files. By Trevor Martin, then at Princeton University, funded in part by an NIH grant.
How to turn CSV data into interactive visualizations with R and rCharts. 9-page slideshow gives step-by-step instructions on various options for generating interactive graphics. The charts and graphs use jQuery libraries as the underlying technology but only a couple of lines of R code are needed. By Sharon Machlis, Computerworld.
Higher Order Functions in R. If you're at the point where you want to apply functions on multiple vectors and data frames, you may start bumping up against the limits of R's apply family. This post goes over 6 extremely useful base R functions with readable explanations and helpful examples. By John Mules White, "soon-to-be scientist at Facebook."
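As a rough idea of what base R's higher-order functions look like, here is a minimal sketch using Map(), Filter(), Reduce(), Find() and Position(). Which functions the post itself covers is not stated above, so treat this selection and the toy inputs as my assumptions.

```r
# Map: apply a function to each element, returning a list
squares <- Map(function(x) x^2, 1:5)

# Filter: keep only the elements for which the predicate is TRUE
evens <- Filter(function(x) x %% 2 == 0, 1:10)

# Reduce: fold a vector into a single value with a two-argument function
total <- Reduce(`+`, 1:5)

# Reduce with accumulate = TRUE keeps every intermediate result
running_total <- Reduce(`+`, 1:5, accumulate = TRUE)

# Find returns the first matching element; Position returns its index
first_big <- Find(function(x) x > 3, c(2, 9, 1, 7))
first_big_index <- Position(function(x) x > 3, c(2, 9, 1, 7))

print(unlist(squares))
print(evens)
print(running_total)
```

Each of these takes a function as an argument, which is what sets them apart from ordinary vectorized calls.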
Introductory Econometrics Using Quandl and R. While this does indeed promote Quandl as your data source, that data is free, and for those interested in using R for regressions, you'll find several detailed walk-throughs from data import through statistical analysis.
More free downloads and websites from academia:
Introducing R. Slide presentation from the UCLA Institute for Digital Research and Education, with downloadable data and code.
Introducing R. Although titled for beginners and including sections on getting started and reading data, this also shows how to use R for various types of linear models. By German Rodriguez at Princeton University's Office of Population Research.
R: A self-learn tutorial. Intro PDF from National Center for Ecological Analysis and Synthesis at UC Santa Barbara. While a bit dry, it goes over a lot of fundamentals and includes exercises.
Statistics with R Computing and Graphics. Unlike many PDF downloads from academia, this one is both short (15 pages) and basic, with some suggested informal exercises as well as explanations on things like getting data into R and statistical modeling (understanding statistical concepts like linear modeling is assumed). By Kjell Konis, then at the University of Oxford.
Little Book of R for Time Series. This is extremely useful if you want to use R for analyzing data collected over time, and also has some introductory sections for general R use even if you're not doing time series. By Avril Coghlan at the Wellcome Trust Sanger Institute, Cambridge, U.K.
Introduction to ggplot2. 11-page PDF with some ggplot basics, by N. Matloff at UC Davis.
Pretty much every social media platform has an R group. I'd particularly recommend:
Statistics and R on Google+. Community members are knowledgeable and helpful, and various conversation threads engage both newbies and experts.
Twitter #rstats hashtag. Level of discourse here ranges from beginner to extremely advanced, with a lot of useful R resources and commentary getting posted.
You can also find R groups on LinkedIn, Reddit and Facebook, among other platforms.
Stack Overflow has a very active R community where people ask and answer coding questions. If you've got a specific coding challenge, it's definitely worth searching here to see if someone else has already asked about something similar.
There are dozens of R User Meetups worldwide. In addition, there are other user groups not connected with Meetup.com. Revolution Analytics has an R User Group Directory.
Blogs & blog posts
R-bloggers. This site aggregates posts and tutorials from more than 250 R blogs. While both skill level and quality can vary, this is a great place to find interesting posts about R -- especially if you look at the "top articles of the week" box on the home page.
Revolutions. There's plenty here of interest to all levels of R users. Although author Revolution Analytics is in the business of selling enterprise-class R platforms, the blog is not focused exclusively on their products.
Post: 10 R packages I wish I knew about earlier. Not sure all of these would be in my top 10, but unless you've spent a fair amount of time exploring packages, you'll likely find at least a couple of interesting and useful R add-ons.
Post: R programming for those coming from other languages. If you're an experienced programmer trying to learn R, you'll probably find some useful tips here.
Post: A brief introduction to 'apply' in R. If you want to learn how the apply() function family works, this is a good primer.
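For a quick feel of how the family fits together, here is a minimal sketch of apply(), lapply(), sapply() and tapply(). It is not drawn from the post; the toy matrix and the use of the built-in mtcars data set are assumptions for illustration.

```r
# A 2 x 3 matrix filled column by column: rows are (1, 3, 5) and (2, 4, 6)
m <- matrix(1:6, nrow = 2)

# apply works over a matrix margin: 1 = rows, 2 = columns
row_sums <- apply(m, 1, sum)
col_means <- apply(m, 2, mean)

# lapply always returns a list; sapply simplifies to a vector when it can
squares_list <- lapply(1:3, function(x) x^2)
lengths_vec <- sapply(list(a = 1:3, b = letters), length)

# tapply summarizes one vector by the groups defined in another
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)

print(row_sums)
print(col_means)
print(lengths_vec)
print(mpg_by_cyl)
```

The main thing to remember is the return type: apply() and sapply() try to give back a vector or matrix, lapply() always gives a list.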
Translating between R and SQL. If you're more experienced (and comfortable) with SQL than R, it can be frustrating and confusing at times to figure out how to do basic data tasks such as subsetting your data. Statistics consultant Patrick Burns shows how to do common data slicing in both SQL and R, making it easier for experienced database users to add R to their toolkit.
Graphs & Charts in base R, ggplot2 and rCharts. There are lots of sample charts with code here, showing how to do similar visualization tasks with basic R, the ggplot2 add-on package and rCharts for interactive HTML visualizations.
When to use Excel, when to use R? For spreadsheet users starting to learn R, this is a useful question to consider. Michael Milton, author of Head First Data Analysis (which discusses both Excel and R), offers practical (and short) advice on when to use each.
A First Step Towards R From Spreadsheets. Some advice on both when and how to start moving from Excel to R, with a link to a follow-up post, From spreadsheet thinking to R thinking.
Searching for "R" on a general search engine like Google can be somewhat frustrating, given how many utterly unrelated English words include the letter r. Some search possibilities:
RSeek is a Web search engine that just returns results from certain R-focused websites.
R site search returns results just from R functions, package "vignettes" (documentation that helps explain how a function works) and task views (focusing on a particular field such as social science or econometrics).
You can also search the R mailing list archives.
Google's R Style Guide. Want to write neat code with a consistent style? You'll probably want a style guide; and Google has helpfully posted their internal R style for all to use. If that one doesn't work for you, Hadley Wickham has a fairly abbreviated R style guide based on Google's but "with a few tweaks."
RStudio documentation. If you're using RStudio, it's worth taking a look at parts of the documentation at some point so you can take advantage of all it has to offer.
History of R Financial Time Series Plotting. Although, as the name implies, this focuses on financial time-series graphics, it's also a useful look at various options for plotting any data over time. With lots of code samples along with graphics. By Timely Portfolio on GitHub.
Grouping & Summarizing Data in R. There are so many ways to do these tasks in R that it can be a little overwhelming even for those beyond the beginner stage to decide which to use when. This downloadable Slideshare presentation by analyst Jeffrey Breen from the Greater Boston useR Group is a useful overview.
R Instructor. This app is primarily a well-designed, very thorough index to R, offering snippets on how to import, summarize and plot data, as well as an introductory section. An "I want to..." section gives short how-to's on a variety of tasks such as changing data classes or column/row names, ordering or subsetting data and more. Similar information is available free online; the value-add is if you want the info organized in an attractive mobile app. Extras include instructional videos and a "statistical tests" section explaining when to use various tests as well as R code for each. For iOS and Android, about $5.
Comprehensive R Archive Network (CRAN). The most important of all: home of the R Project for Statistical Computing, including downloading the basic R platform, FAQs and tutorials as well as thousands of add-on packages. Also features detailed documentation and a number of links to more resources.
RStudio. You can download the free RStudio IDE as well as RStudio's Shiny project aimed at turning R analyses into interactive Web applications.
Revolution Analytics. In addition to its commercial Revolution R Enterprise, you can request a download of their free Revolution R Community (you'll need to provide an email address). Both are designed to improve R performance and reliability.
Tibco. This software company recently released a free Tibco Enterprise Runtime for R Developers Edition to go along with its commercial Tibco Enterprise Runtime for R engine aimed at helping to integrate R analysis into other enterprise platforms.
Learn to use R: Your hands-on guide
If you're just starting with R, don't miss the Computerworld Beginner's Guide to R:
Part 1: Introduction - get started with this popular programming language.
Part 2: Getting your data into R - tips on how to import data in various formats, both local and on the Web.
Part 3: Easy ways to do basic data analysis - extract some simple stats.
Part 4: Painless data visualization - simple graphics, bar graphs and a few more complex charts.
Part 5: Syntax quirks you'll want to know - some R idiosyncrasies.
This article, 60+ R resources to improve your data skills, was originally published at Computerworld.com as part of the Computerworld Beginner's Guide to R, which was written by Sharon Machlis and edited by Johanna Ambrosio.