4 data wrangling tasks in R for advanced beginners

Learn how to add columns, get summaries, sort your results and reshape your data.

To get the highest profit margins for each company, we're splitting the data frame by only one factor -- company. To get just the highest value and company name for each company, use summarize :

highestProfitMargins <- companiesData %>%
  group_by(company) %>%
  summarize(
    bestMargin = max(margin)
  )

(Here we've assigned the results to the variable highestProfitMargins, which will contain a new data frame.)

  company bestMargin
1 Apple 26.7
2 Google 29.0
3 Microsoft 33.1

Summarize doesn't give any information from other columns in the original data frame. In what year did each of the highest margins occur? We can't tell by using summarize.

If you want all the other column data, too, change "summarize" to "mutate." That will return your existing data frame with a new column that repeats the maximum margin for each company:

highestProfitMargins <- companiesData %>%
  group_by(company) %>%
  mutate(
    bestMargin = max(margin)
  )

  fy company revenue profit margin bestMargin
1 2010 Apple 65225 14013 21.5 26.7
2 2011 Apple 108248 25922 23.9 26.7
3 2012 Apple 156508 41733 26.7 26.7
4 2010 Google 29321 8505 29.0 29.0
5 2011 Google 37905 9737 25.7 29.0
6 2012 Google 50175 10737 21.4 29.0
7 2010 Microsoft 62484 18760 30.0 33.1
8 2011 Microsoft 69943 23150 33.1 33.1
9 2012 Microsoft 73723 16978 23.0 33.1

Note that this result shows the profit margin for each company and year in the margin column along with the bestMargin repeated for each company and year. The only way to tell which year has the best margin is to compare the two columns to see where they're equal.

summarize() and mutate() let you apply more than one function at a time, for example:

highestProfitMargins <- companiesData %>%
  group_by(company) %>%
  mutate(
    highestMargin = max(margin),
    lowestMargin = min(margin)
  )

This gets you:

  fy company revenue profit margin highestMargin lowestMargin
1 2010 Apple 65225 14013 21.5 26.7 21.5
2 2011 Apple 108248 25922 23.9 26.7 21.5
3 2012 Apple 156508 41733 26.7 26.7 21.5
4 2010 Google 29321 8505 29.0 29.0 21.4
5 2011 Google 37905 9737 25.7 29.0 21.4
6 2012 Google 50175 10737 21.4 29.0 21.4
7 2010 Microsoft 62484 18760 30.0 33.1 23.0
8 2011 Microsoft 69943 23150 33.1 33.1 23.0
9 2012 Microsoft 73723 16978 23.0 33.1 23.0

In some cases, though, what you want is a new data frame with just the (entire) rows that have the highest profit margins. One way to do that is to add a filter() statement to the pipe. filter() takes the syntax filter(mydata, conditional statement), or in this case:

highestProfitMargins <- companiesData %>%
  group_by(company) %>%
  mutate(
    highestMargin = max(margin)
  ) %>%
  filter(margin == highestMargin)

That last line says "only keep rows within each group where the value of the margin column equals the value of the highestMargin column." Also note the double equals sign, which means "test whether the left side of the equation equals the right side of the equation." A single equals sign assigns the value of the right side of the equation to the variable on the left side.

  fy company revenue profit margin
1 2012 Apple 156508 41733 26.7
2 2010 Google 29321 8505 29.0
3 2011 Microsoft 69943 23150 33.1

Again, a reminder that %>% is a "chaining" operation that allows you to string together multiple commands on a data frame.

Note that highestProfitMargins and myresults are a special type of data frame created by dplyr called a tibble. If you have problems running more conventional non-dplyr operations on a dplyr result, which has happened to me on occasion, convert it to a "regular" data frame with as.data.frame(), such as

highestProfitMargins <- as.data.frame(highestProfitMargins)

Join the newsletter!

Or

Sign up to gain exclusive access to email subscriptions, event invitations, competitions, giveaways, and much more.

Membership is free, and your security and privacy remain protected. View our privacy policy before signing up.

Error: Please check your email address.

More about AdvancedAppleGitHubGoogleMicrosoft

Show Comments