Posted: September 20th, 2022

you used Google Sheets for data manipulation. In week 4, you learned how to use the tidyverse for data manipulation.

Compare the use of Google Sheets and the tidyverse for data manipulation. Be specific about how you can do particular data manipulation tasks in the tidyverse and how the same tasks can be done in Google Sheets.

Review the Tidyverse cheat sheet for a summary of data manipulation commands. This cheat sheet will also be useful for the practice problems and other assignments and projects.
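
As a starting point for the comparison, the sketch below pairs a few common manipulation tasks in dplyr with roughly equivalent Google Sheets formulas, written as comments. It assumes the iris data has been pasted into a sheet with headers in row 1 and data in A2:E151, with columns in the same order as in R. That sheet layout and the variable names are assumptions for illustration, not part of the assignment.

```r
library(dplyr)

# Filter rows: keep only the virginica flowers.
# Sheets (assumed layout): =FILTER(A2:E151, E2:E151="virginica")
virginica <- iris %>%
  filter(Species == "virginica")

# Sort rows by sepal length, largest first.
# Sheets (assumed layout): =SORT(A2:E151, 1, FALSE)
sorted <- iris %>%
  arrange(desc(Sepal.Length))

# Add a computed column: sepal length in millimeters.
# Sheets: type =A2*10 in an empty column and fill down.
with_mm <- iris %>%
  mutate(SLMm = Sepal.Length * 10)

# Summarize within groups: median sepal length per species.
# Sheets: one formula per species, for example
#   =MEDIAN(FILTER(A2:A151, E2:E151="virginica"))
by_species <- iris %>%
  group_by(Species) %>%
  summarize(medianSL = median(Sepal.Length))
```

One difference worth noting in a comparison: the R version keeps every step in a script that can be rerun on new data, while the Sheets version keeps the steps in cell formulas next to the data.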

R For Data Science Cheat Sheet
Tidyverse for Beginners

The tidyverse is a collection of R packages for transforming and
visualizing data. All tidyverse packages share an underlying
philosophy and common APIs.

The core packages are:

• ggplot2, which implements the grammar of graphics. You can use it
to visualize your data.

• dplyr is a grammar of data manipulation. You can use it to solve the
most common data manipulation challenges.

• tidyr helps you create tidy data: data where each variable is a
column, each observation is a row, and each value is a cell.

• readr is a fast and friendly way to read rectangular data.

• purrr enhances R’s functional programming (FP) toolkit by providing a
complete and consistent set of tools for working with functions and
vectors.

• tibble is a modern reimagining of the data frame.

• stringr provides a cohesive set of functions designed to make
working with strings as easy as possible.

• forcats provides a suite of useful tools that solve common problems
with factors.
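
The packages in this list that get no worked example elsewhere on the sheet (tidyr, readr, purrr, tibble, stringr, forcats) can be sketched briefly as follows. The toy data and all variable names here are hypothetical, chosen only for illustration; reading CSV text with I() assumes readr 2.0 or later.

```r
library(tidyverse)

# tibble: build a small data frame (hypothetical toy data).
scores <- tibble(
  student = c("ana", "ben", "cy"),
  quiz1   = c(8, 6, 9),
  quiz2   = c(7, 9, 10)
)

# tidyr: reshape so each row is one student-quiz observation.
long <- scores %>%
  pivot_longer(cols = c(quiz1, quiz2),
               names_to = "quiz",
               values_to = "score")

# stringr: capitalize the student names.
long <- long %>%
  mutate(student = str_to_title(student))

# forcats: store quiz as a factor and move "quiz2" to the front of the levels.
long <- long %>%
  mutate(quiz = fct_relevel(factor(quiz), "quiz2"))

# purrr: apply mean() to each quiz column and collect a numeric vector.
col_means <- map_dbl(select(scores, quiz1, quiz2), mean)

# readr: parse literal CSV text into a tibble.
parsed <- read_csv(I("x,y\n1,2\n3,4"), show_col_types = FALSE)
```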

You can install the complete tidyverse with:

> install.packages("tidyverse")

Then, load the core tidyverse and make it available in your current R
session by running:

> library(tidyverse)

Note: there are many other tidyverse packages with more specialised usage. They are not
loaded automatically with library(tidyverse), so you’ll need to load each one with its own call
to library().

Useful Functions

> tidyverse_conflicts()   # Conflicts between tidyverse and other packages
> tidyverse_deps()        # List all tidyverse dependencies
> tidyverse_logo()        # Get tidyverse logo, using ASCII or unicode
> tidyverse_packages()    # List all tidyverse packages
> tidyverse_update()      # Update tidyverse packages

Loading in the data

> library(datasets)       # Load the datasets package
> library(gapminder)      # Load the gapminder package
> attach(iris)            # Attach iris data to the R search path

filter() allows you to select a subset of rows in a data frame.

Select iris data of species "virginica":

> iris %>%
    filter(Species == "virginica")

Select iris data of species "virginica" and sepal length greater than 6:

> iris %>%
    filter(Species == "virginica",
           Sepal.Length > 6)

arrange() sorts the observations in a dataset in ascending or descending order
based on one of its variables.

Sort in ascending order of sepal length:

> iris %>%
    arrange(Sepal.Length)

Sort in descending order of sepal length:

> iris %>%
    arrange(desc(Sepal.Length))

Combine multiple dplyr verbs in a row with the pipe operator %>%. For example,
filter for species "virginica", then arrange in descending order of sepal length:

> iris %>%
    filter(Species == "virginica") %>%
    arrange(desc(Sepal.Length))
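
To make the pipe concrete: x %>% f(y) passes x as the first argument of f, so it is equivalent to f(x, y). The sketch below writes the same filter-then-arrange step both ways; the names nested and piped are only illustrative.

```r
library(dplyr)

# Nested calls: read from the inside out.
nested <- arrange(filter(iris, Species == "virginica"),
                  desc(Sepal.Length))

# Piped calls: read top to bottom, one verb per line.
piped <- iris %>%
  filter(Species == "virginica") %>%
  arrange(desc(Sepal.Length))

# Both forms should produce the same data frame.
same_result <- identical(nested, piped)
```

The piped form tends to be easier to extend, since adding another verb means adding another line rather than another layer of parentheses.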

mutate() allows you to update or create new columns of a data frame.

Change Sepal.Length to be in millimeters:

> iris %>%
    mutate(Sepal.Length = Sepal.Length * 10)

Create a new column called SLMm:

> iris %>%
    mutate(SLMm = Sepal.Length * 10)

Combine the verbs filter(), arrange(), and mutate():

> iris %>%
    filter(Species == "virginica") %>%
    mutate(SLMm = Sepal.Length * 10) %>%
    arrange(desc(SLMm))

summarize() allows you to turn many observations into a single data point.

Summarize to find the median sepal length:

> iris %>%
    summarize(medianSL = median(Sepal.Length))

Filter for virginica, then summarize the median sepal length:

> iris %>%
    filter(Species == "virginica") %>%
    summarize(medianSL = median(Sepal.Length))

You can also summarize multiple variables at once:

> iris %>%
    filter(Species == "virginica") %>%
    summarize(medianSL = median(Sepal.Length),
              maxSL = max(Sepal.Length))

group_by() allows you to summarize within groups instead of summarizing the
entire dataset.

Find median and max sepal length of each species:

> iris %>%
    group_by(Species) %>%
    summarize(medianSL = median(Sepal.Length),
              maxSL = max(Sepal.Length))

Find median and max petal length of each species with sepal length > 6:

> iris %>%
    filter(Sepal.Length > 6) %>%
    group_by(Species) %>%
    summarize(medianPL = median(Petal.Length),
              maxPL = max(Petal.Length))

Scatter plot

Scatter plots allow you to compare two variables within your data. To do this with
ggplot2, you use geom_point().

Compare petal width and length:

> iris_small <- iris %>%
    filter(Sepal.Length > 5)
> ggplot(iris_small, aes(x = Petal.Length,
                         y = Petal.Width)) +
    geom_point()

Additional Aesthetics

• Color

> ggplot(iris_small, aes(x = Petal.Length,
                         y = Petal.Width,
                         color = Species)) +
    geom_point()

• Size

> ggplot(iris_small, aes(x = Petal.Length,
                         y = Petal.Width,
                         size = Sepal.Length)) +
    geom_point()

> ggplot(iris_small, aes(x = Petal.Length,
                         y = Petal.Width)) +
    geom_point()

Line Plots

> by_year <- gapminder %>%
    group_by(year) %>%
    ...
> ggplot(by_year, aes(x = year,
                      ...

Bar Plots

> by_species <- iris %>%
    filter(Sepal.Length > 6) %>%
    group_by(Species) %>%
    summarize(medianPL = median(Petal.Length))
> ggplot(by_species, aes(x = Species,
                         y = medianPL)) +
    geom_col()

> ggplot(iris_small, aes(x = Petal.Length)) +
    ...

Box Plots

> ggplot(iris_small, aes(x = Species,
                         ...