Download the script here.
In this tutorial, we will learn how to produce this graph.
Actually, it only takes 12 lines of code (5 of which only handle the text captions) and some pretty straightforward logic.
This site builds heavily upon The R Bootcamp, who employ the widely used mpg dataset and the accompanying coding instructions (see, for example, R for Data Science) for their data visualization tutorial.
But before we start, as always - clean the environment and load packages. First thing. Always.
rm(list = ls())
library(tidyverse) # We will now be using ggplot2 which is part of the tidyverse
Base R has several functions for data visualization. Crucially, you need separate functions for each type of plot. Some examples:
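As a minimal sketch of what "separate functions for each type of plot" means in practice, here are four common base R plotting calls. The built-in mtcars data is our own choice for illustration (it ships with R, so no package is needed); it is not the dataset used in the rest of this tutorial.

```r
# Base R: one dedicated function per plot type.
# mtcars is a built-in dataset, used here only for illustration.
hist(mtcars$mpg)                    # histogram of one numeric variable
plot(mtcars$disp, mtcars$mpg)       # scatter plot of two numeric variables
boxplot(mpg ~ cyl, data = mtcars)   # box plots of mpg by number of cylinders
barplot(table(mtcars$cyl))          # bar chart of counts per category
```

Each function has its own arguments and conventions, which is exactly the inconvenience that ggplot2 tries to remove.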
Solution: ggplot2
Component | Description | Code |
Data | Which data do we want to use? | ggplot(data = ) |
Relationship | Which relationships do we want to display (axes, color, size, shape, etc.)? | ggplot(data = df, mapping = aes(x = , y = , color = , size = , shape = )) |
Geometric objects | In which form do we want to see the relationships in our data (histogram, points, etc.)? | ggplot(data = df, mapping = aes(x = , y = , color = , size = )) + geom_XX() |
Additional stuff | Additional functions to adjust scales, labels, tick marks, and titles, among other things | + labs(), + scale_XX() |
Step by step, with some digressions, we will now replicate the graph from above.
Select the data (mpg) and map displ (the engine displacement) to the x axis, hwy (highway miles per gallon) to the y axis, and class (car class) to the color.
ggplot(data = mpg,
mapping = aes(x = displ,
y = hwy,
color = class))
Well, we don’t see much. The reason is that, so far, we have only specified the data and the relationship. We have not yet told ggplot which kinds of objects it should plot.
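One way to confirm this is to look inside the plot object. The sketch below assumes that the layers of a ggplot object can be read via `$layers`; the object names `p` and `p_points` are ours and only serve the illustration.

```r
library(ggplot2)

# Data and mappings only: no geom has been added yet
p <- ggplot(data = mpg,
            mapping = aes(x = displ, y = hwy, color = class))
length(p$layers)          # number of layers attached so far

# Adding a geom attaches one layer to the plot
p_points <- p + geom_point()
length(p_points$layers)
```

The first count should be zero, which is why only the empty coordinate system is drawn.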
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = class)) +
  geom_point()  # add a point layer (scatter plot)
Now, it is up to you.
Task: Reproduce the following plots!
Follow the instructions about the mapping and the type of geom below each plot.
Back to business. Let’s continue working on our replication.
A regression line can be added with geom_smooth(). Choose method = "lm" for a standard OLS fit.
ggplot(data = mpg,
mapping = aes(x = displ,
y = hwy,
color = class))+
geom_smooth(color = "blue",
method = "lm")
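To see the numbers behind the line, we can fit the same kind of model by hand. This is a sketch that assumes the blue line corresponds to a single regression of hwy on displ over all cars (the default formula y ~ x); the object name `fit` is ours.

```r
library(ggplot2)  # provides the mpg data

# Linear model of highway mpg on engine displacement (y ~ x)
fit <- lm(hwy ~ displ, data = mpg)
coef(fit)  # intercept and slope of the fitted line
```

The slope on displ is negative, which matches the downward-sloping line in the plot.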
Adding labels with labs() is largely self-explanatory.
ggplot(data = mpg,
mapping = aes(x = displ,
y = hwy,
color = class)) +
geom_point() +
geom_smooth(col = "blue",
method = "lm")+
labs(x = "Engine Displacement in Liters",
y = "Highway miles per gallon",
title = "MPG data",
subtitle = "Cars with higher engine displacement tend to have lower highway mpg",
caption = "Source: mpg data in ggplot2 \n Credits for the tutorial: TheRBootcamp",
color = "Car Classes")
## `geom_smooth()` using formula 'y ~ x'
We can modify the theme, i.e. the non-data components such as fonts, background, gridlines, and legends (besides titles and labels), as much as we like with theme().
Further below we will explain how to build this beautiful graph:
While you can adjust the theme as much as you want, the lazy but safer option (i.e. plot designs which are deemed acceptable by more people than just yourself) is to rely on the large set of predesigned themes from ggplot2. Additionally, you can make use of the ggthemes package, which offers even more themes.
ggplot(data = mpg,
mapping = aes(x = displ,
y = hwy,
color = class)) +
geom_point() +
geom_smooth(col = "blue",
method = "lm")+
labs(x = "Engine Displacement in Liters",
y = "Highway miles per gallon",
title = "MPG data",
subtitle = "Cars with higher engine displacement tend to have lower highway mpg",
caption = "Source: mpg data in ggplot2 \n Credits for the tutorial: TheRBootcamp",
color = "Car Classes")+
## `geom_smooth()` using formula 'y ~ x'
There are many more themes, for example theme_grey(), theme_void(), theme_dark(), and theme_minimal().
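Switching between the built-in themes only requires changing the last line. The following is a minimal sketch on a reduced version of our plot; the object names `base_plot`, `minimal_plot`, and `dark_plot` are ours.

```r
library(ggplot2)

# A reduced version of the plot, stored so that we can reuse it
base_plot <- ggplot(data = mpg,
                    mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()

# The same plot with two different complete themes
minimal_plot <- base_plot + theme_minimal()
dark_plot    <- base_plot + theme_dark()

minimal_plot  # print one of them
```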
With the ggthemes package, you have even more themes at hand:
ggplot(data = mpg,
mapping = aes(x = displ,
y = hwy,
color = class)) +
geom_point() +
geom_smooth(col = "blue",
method = "lm")+
labs(x = "Engine Displacement in Liters",
y = "Highway miles per gallon",
title = "MPG data",
subtitle = "Cars with higher engine displacement tend to have lower highway mpg",
caption = "Source: mpg data in ggplot2 \n Credits for the tutorial: TheRBootcamp",
color = "Car Classes")+
## `geom_smooth()` using formula 'y ~ x'
Other themes from the ggthemes package include theme_economist() (The Economist style graphs), theme_stata() (for those who miss Stata), and theme_tufte(). It also has a cool functionality for creating color scales for color-blind people.
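As a sketch of how this could look, the code below combines theme_tufte() with the color-blind-friendly scale scale_color_colorblind() from ggthemes. It assumes that ggthemes is installed (install.packages("ggthemes")); if it is not, the plot is simply left unchanged. The object name `p` is ours.

```r
library(ggplot2)

p <- ggplot(data = mpg,
            mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()

# ggthemes is an add-on package; only use it if it is available
if (requireNamespace("ggthemes", quietly = TRUE)) {
  p <- p +
    ggthemes::theme_tufte() +            # minimal-ink theme
    ggthemes::scale_color_colorblind()   # color-blind-friendly palette
}

p
```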
… This still doesn’t look like the plot we wanted to create. Actually, the plot from above was built with theme_bw():
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = class)) +
  geom_point() +
  geom_smooth(col = "blue",
              method = "lm") +
  labs(x = "Engine Displacement in Liters",
       y = "Highway miles per gallon",
       title = "MPG data",
       subtitle = "Cars with higher engine displacement tend to have lower highway mpg",
       caption = "Source: mpg data in ggplot2 \n Credits for the tutorial: TheRBootcamp",
       color = "Car Classes") +
  theme_bw()  # black-and-white theme
## `geom_smooth()` using formula 'y ~ x'
Just to mention this, you can save a ggplot object in the environment like everything else (remember, everything in R is an object).
Storing our final plot from above as final_plot:
final_plot <- ggplot(data = mpg,
                     mapping = aes(x = displ,
                                   y = hwy,
                                   color = class)) +
  geom_point() +
  geom_smooth(col = "blue",
              method = "lm") +
  labs(x = "Engine Displacement in Liters",
       y = "Highway miles per gallon",
       title = "MPG data",
       subtitle = "Cars with higher engine displacement tend to have lower highway mpg",
       caption = "Source: mpg data in ggplot2 \n Credits for the tutorial: TheRBootcamp",
       color = "Car Classes") +
  theme_bw()  # the final plot uses the black-and-white theme
We can then add stuff to the ggplot object in the usual way.
Suppose we want to add a fat vertical line (at x = 3):
final_plot +
  geom_vline(xintercept = 3, size = 5)  # in ggplot2 >= 3.4.0, use linewidth = 5 instead of size
## `geom_smooth()` using formula 'y ~ x'
ggplot2, being part of the tidyverse, integrates seamlessly with other tidyverse functions.
As an easy example, suppose we only want to keep hwy values above 25:
mpg %>%
  filter(hwy > 25) %>%  # keep only cars with hwy above 25
  ggplot( # do not call data=mpg again here!
    mapping = aes(x = displ,
                  y = hwy,
                  color = class)) +
  geom_point() +
  geom_smooth(col = "blue",
              method = "lm") +
  labs(x = "Engine Displacement in Liters",
       y = "Highway miles per gallon",
       title = "MPG data",
       subtitle = "Cars with higher engine displacement tend to have lower highway mpg",
       caption = "Source: mpg data in ggplot2 \n Credits for the tutorial: TheRBootcamp",
       color = "Car Classes") +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'
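The same pattern works with any other dplyr verb. As a further sketch (our own example, not part of the replication), we can first compute the mean hwy per car class and then pipe the summary straight into ggplot(). The names `class_means`, `mean_hwy`, and `mean_plot` are ours.

```r
library(dplyr)
library(ggplot2)

# Summarise first: one row per car class with its mean highway mpg
class_means <- mpg %>%
  group_by(class) %>%
  summarise(mean_hwy = mean(hwy))

# Then pipe the summary into ggplot(); the data argument is filled by the pipe
mean_plot <- class_means %>%
  ggplot(mapping = aes(x = class, y = mean_hwy)) +
  geom_col()

mean_plot
```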
You can save a ggplot object to a file with ggsave(). For this, we use the plot object final_plot, which we created above.
ggsave(plot = final_plot,
       filename = "FILEDIRECTORYtoSAVEto.png")
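ggsave() also lets us control the size of the output. Here is a minimal sketch that writes a small demo plot to a temporary folder; the plot `demo_plot`, the file name, and the dimensions are arbitrary choices of ours. We use a PDF here, but a file name ending in .png works the same way, since ggsave() picks the format from the file extension.

```r
library(ggplot2)

# A small plot to save (any ggplot object works, e.g. final_plot)
demo_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Write to a temporary folder so that nothing in the project is overwritten
out_file <- file.path(tempdir(), "demo_plot.pdf")

ggsave(filename = out_file,
       plot = demo_plot,
       width = 8,      # width in inches
       height = 5,     # height in inches
       units = "in")

file.exists(out_file)  # check that the file was written
```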
Below, find information on how to create a Likert scale type of graph and the basics on how to create an individual theme.
Let’s create fake Likert scale data. Let’s pretend that the different types of transmissions are the levels of a Likert scale and that car classes are the Likert items.
Before we can plot this fake Likert scale, we have to do some data transformation:
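mpg2 <- mpg %>%
  group_by(class) %>%
  mutate(length.class = n()) %>%          # number of cars per class
  group_by(trans, .add = TRUE) %>%        # dplyr < 1.0.0: use add = TRUE instead
  mutate(length.trans = n(),              # number of cars per class and transmission
         percentage.trans = 100 * length.trans / length.class) %>%
  distinct(trans, class, .keep_all = TRUE)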
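For comparison, the same percentages can be computed more compactly with count(). This is an alternative sketch, not the approach used above; the object name `mpg_counts` is ours, and the result only keeps the columns needed for plotting.

```r
library(dplyr)
library(ggplot2)  # provides the mpg data

mpg_counts <- mpg %>%
  count(class, trans, name = "length.trans") %>%  # cars per class and transmission
  group_by(class) %>%
  mutate(percentage.trans = 100 * length.trans / sum(length.trans)) %>%
  ungroup()

# Sanity check: within each class, the percentages should add up to 100
mpg_counts %>%
  group_by(class) %>%
  summarise(total = sum(percentage.trans))
```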
ggplot(data = mpg2,
       mapping = aes(x = class,
                     y = percentage.trans,
                     fill = trans)) +
  geom_bar(stat = "identity", width = 0.7)  # stacked bars: one bar per class
As indicated above, here is an example of a manually (ridiculously) adjusted theme.
Some random ideas for the theme (through the function theme()):
panel.background = element_rect(fill = "pink")
panel.grid.major.y = element_line(colour = "green"), panel.grid.minor.y = element_line(colour = "green")
legend.position = "bottom"
panel.border = element_rect(color="orange", fill = NA, size=5)
Additionally, outside of theme, we also want to add even more beautiful stuff:
+ geom_vline(xintercept = 3, size = 5)
+ scale_x_continuous(breaks = seq(1, 8, by = 0.5))
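Put together, the ideas above could be combined as in the following sketch. It uses a reduced version of our plot (points only), the object name `ugly_plot` is ours, and it assumes ggplot2 >= 3.4.0, where line widths are set with linewidth (with older versions, use size as in the snippets above).

```r
library(ggplot2)

ugly_plot <- ggplot(data = mpg,
                    mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  geom_vline(xintercept = 3, linewidth = 5) +           # fat vertical line at x = 3
  scale_x_continuous(breaks = seq(1, 8, by = 0.5)) +    # tick marks every 0.5 liters
  theme(
    panel.background   = element_rect(fill = "pink"),
    panel.grid.major.y = element_line(colour = "green"),
    panel.grid.minor.y = element_line(colour = "green"),
    legend.position    = "bottom",
    panel.border       = element_rect(color = "orange", fill = NA, linewidth = 5)
  )

ugly_plot
```

All theme settings go into a single theme() call, while the vertical line and the x-axis breaks are added as separate components.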
Task: Reproduce the plot below and follow the instructions from above!