Download the script here
Download the data here
Solutions can be downloaded here - but try to solve everything without the solutions first!
We practice with the Eurostat data again. However, to spare us the time of data management, we already have a prepared dataset at hand.
What do we always do first? (We only need the tidyverse)
Read the data (eurostat_data.csv) and store them as eurost.
There is only one single data management task for you here: For now, we only want to use the year (denoted by the variable time in the dataset) 2014. Please, filter accordingly.
eurost <-
# filter for the time==2014
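The filtering pattern asked for above can be sketched on a tiny made-up table. This is only an illustration: the data frame `toy` and its `value` column are hypothetical stand-ins, not the Eurostat file, and dplyr is loaded directly here even though the tidyverse loads it for you.

```r
library(dplyr)  # part of the tidyverse

# Toy data standing in for the Eurostat file (all values are made up)
toy <- data.frame(
  geo_code = c("DE", "DE", "IT", "IT"),
  time     = c(2013, 2014, 2013, 2014),
  value    = c(1.0, 2.0, 3.0, 4.0)
)

# Keep only the rows where time equals 2014
toy_2014 <- toy %>%
  filter(time == 2014)

toy_2014
```

The same `filter()` call should work on eurost, assuming time is stored as a number there.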
You might want to explore the data first. Remember, you can use some of these: str(), head(), summary(), table(), quantile(), and View().
Reproduce these three plots. Below the graphs you can find some information.
Use theme_classic().
Aesthetics set in the ggplot() function are inherited by every layer. Therefore, as soon as you add the regression function, move the size argument to geom_point. But don't forget that the size argument has to be put within aes()!
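The hint about moving size into geom_point can be sketched with the mpg dataset that ships with ggplot2. The variables displ, hwy, and cyl are stand-ins chosen for illustration; they are not part of the exercise data.

```r
library(ggplot2)

# mpg ships with ggplot2; it stands in for the Eurostat data here
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(size = cyl)) +             # size is mapped for the points only
  geom_smooth(method = "lm", se = FALSE) +  # the regression line does not see size
  theme_classic()

p
```

Because size is mapped inside geom_point() rather than in ggplot(), the regression layer is not affected by it.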
Here, we first do some data management. We reload the data (but save them as eurost2), filter for some countries and the time period 1990-2015. Furthermore, we create two new columns which copy the values of (youth) unemployment ONLY in the year 2015.
eurost2 <- read_csv2("data/eurostat_data.csv") %>%
  filter(geo_code %in% c("DE", "IT", "EL", "ES", "UK"),
         time >= 1990,
         time <= 2015) %>%
  mutate(
    unemp_tod = if_else(time == 2015, unemp_workagepop_t, NA_real_),
    unemp_youth_tod = if_else(time == 2015, unemp_youth_t, NA_real_)
  )
We use this data for the plot.
In the ggplot() function, just use x = geo_code.
Use geom_violin and two separate geom_point functions.
Use alpha = 0.5 to increase the transparency of the violin plots.
One violin uses y = unemp_youth_t and the other y = unemp_workagepop_t. Use the appropriate colors.
One geom_point uses y = unemp_tod and the other y = unemp_youth_tod. Use color = "black", size = 3 outside of the aesthetic.
Use theme_minimal().
Use scale_fill_manual(values = c("red", "blue"), labels = c("Total Unemployment", "Youth Unemployment")).
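As a rough sketch of how these hints fit together, here is the same layering on simulated data. Everything in `toy` is hypothetical (random numbers, made-up column names total, youth, total_tod, youth_tod), and using two geom_violin layers with a constant fill label is one possible reading of the hints, not necessarily the intended solution.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in data: 20 "years" for each of two countries
toy <- data.frame(
  geo_code = rep(c("DE", "IT"), each = 20),
  year     = rep(1996:2015, times = 2),
  total    = c(rnorm(20, mean = 6, sd = 1), rnorm(20, mean = 10, sd = 1)),
  youth    = c(rnorm(20, mean = 9, sd = 2), rnorm(20, mean = 30, sd = 4))
)

# Copy the values of the last year only; all other rows stay NA
toy$total_tod <- ifelse(toy$year == 2015, toy$total, NA)
toy$youth_tod <- ifelse(toy$year == 2015, toy$youth, NA)

p <- ggplot(toy, aes(x = geo_code)) +
  # fill is mapped to a constant label so that a legend is created
  geom_violin(aes(y = total, fill = "total"), alpha = 0.5) +
  geom_violin(aes(y = youth, fill = "youth"), alpha = 0.5) +
  # fixed color and size go outside aes()
  geom_point(aes(y = total_tod), color = "black", size = 3, na.rm = TRUE) +
  geom_point(aes(y = youth_tod), color = "black", size = 3, na.rm = TRUE) +
  scale_fill_manual(
    values = c("red", "blue"),
    labels = c("Total Unemployment", "Youth Unemployment")
  ) +
  theme_minimal()

p
```

The fill labels "total" and "youth" are sorted alphabetically, so the first color and the first legend label go to the total series.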
In this section, we cover some new functions within ggplot2. However, as the ggplot logic is so straightforward, try to learn these new techniques yourself (nevertheless, there are some hints).
Learning how to find out about stuff in R by yourself is one of the key techniques for smooth coding.
If you have any questions, one of the following might help:
Type ?functionname (and press enter) to retrieve the official documentation.
All the exercises will use the first plot. To make our lives easier, we save this plot as main_plot.
main_plot <- ggplot(
  data = eurost,
  mapping = aes(
    x = unemp_youth_t,
    y = gdp_gr,
    color = emigration_t / immigration_t
  )
) +
  geom_point(aes(size = inv_per_empl)) +
  labs(
    x = "Share of Unemployed Youth (15-24) in Pct.",
    y = "Real GDP growth rate (YOY)",
    title = "GDP growth and youth unemployment in 2014",
    subtitle = "Correlation between lower growth rate and higher youth unemployment",
    caption = "Source: Eurostat",
    size = "Investment p. person\n employed (in Mill. €)",
    color = "Ratio of Emigration \n to Immigration"
  ) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  theme_classic()
main_plot
## `geom_smooth()` using formula 'y ~ x'
First of all, the colors of this main_plot are not so nice.
There are hundreds of ways to change the colors of all the components of a ggplot. You might want to check out this page for more information.
Try to replicate this plot:
Use scale_color_gradient2() to get the colors.
Use ?scale_color_gradient2() to understand the arguments you need to use to replicate the plot.
We want to have labels with the country codes next to our points. ggplot2 has its own function for this, but a cleaner and more efficient function comes from the ggrepel package (which you may need to install first).
Again, use main_plot as the base.
The function from ggrepel we want to use is geom_text_repel. Use geo_code as the label within the aes() argument of geom_text_repel.
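A minimal sketch of the labelling pattern, on a small made-up table rather than on main_plot. The columns x and y and their values are hypothetical; only geo_code mirrors the exercise data, and the ggrepel package is assumed to be installed.

```r
library(ggplot2)
library(ggrepel)  # assumed to be installed, e.g. via install.packages("ggrepel")

# Made-up points with country codes as labels
toy <- data.frame(
  geo_code = c("DE", "IT", "EL", "ES"),
  x        = c(1, 2, 2.1, 3),
  y        = c(2, 1, 1.1, 3)
)

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  geom_text_repel(aes(label = geo_code)) +  # labels are nudged away from points
  theme_classic()

p
```

In the exercise, the same geom_text_repel() layer would be added to main_plot instead of a new ggplot() call.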
You can add facets (i.e., the same plot repeated in several panels, one for each value of a variable such as year) with the function + facet_wrap(~FACETS_VARIABLE_NAME). Make facets using our main graph, using location as the facet variable.
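For illustration, this is what faceting looks like on the mpg dataset that ships with ggplot2. The variables displ, hwy, and drv are stand-ins; in the exercise the facet variable is location.

```r
library(ggplot2)

# One panel per value of drv (front, rear, and four-wheel drive)
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~drv) +
  theme_classic()

p
```

Since a saved plot can be extended with +, the same facet_wrap() call can be appended to main_plot.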
With the plotly package, we can actually build interactive graphs.
The easiest way is to simply use the ggplotly() function and pass a ggplot object to the p argument:
# Load the plotly package (install it once with install.packages("plotly")):
library(plotly)
ggplotly(p = main_plot)