Download the script here
Let's take a look at real data examples! You have several datasets to choose from (feel free to use others). We now want to introduce:
+ UN General Assembly voting data by Voeten et al. (2000)
+ Survey data on immigration
+ A subset of aiddata (provided by aiddata.org)
In contrast to the Eurostat data, these datasets have no API attached. In theory, this leaves us two ways to get the data: we can download it manually and store it somewhere in the project folder (maybe “/data/xyz”), or we can tell R to download it directly from the website. While I have already uploaded subsets of the latter two datasets here and here, we now want to download the Voeten data directly from the internet:
rm(list = ls())
# install.packages("tidyverse")
library(tidyverse)
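# file.exists() checks the disk; exists() would only look for an R object of that name
if (!file.exists("data/UN-73new_small.RData")) {
  dir.create("data", showWarnings = FALSE)  # make sure the target folder exists
  url <- "https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/LEJUQZ/KKG7SW"
  download.file(url,
                destfile = "data/UN-73new_small.RData",
                quiet = FALSE,
                mode = "wb",  # binary mode; "w" can corrupt .RData files on Windows
                cacheOK = TRUE,
                extra = getOption("download.file.extra"),
                headers = NULL)
}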
load("data/UN-73new_small.RData")
# Now we subset the data to delete irrelevant variables
completeVotes <- completeVotes[c("rcid", "vote", "Country", "year", "importantvote",
                                 "me", "nu", "di", "hr", "co", "ec")]
completeVotes <- completeVotes[!is.na(completeVotes$Country), ]
save(completeVotes, file = "data/UN-73new_small.RData")
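Because load() restores objects under the names they were saved with, it may not be obvious where completeVotes comes from. The following is a minimal sketch, using a toy data frame and a temporary file (both hypothetical, not part of the Voeten data), of how one might check which objects an .RData file contains by loading it into a separate environment:

```r
# Toy object standing in for the real data (hypothetical values)
toy_votes <- data.frame(rcid = c(1, 2), vote = c(1, 3))

# Save it to a temporary .RData file
tmp <- tempfile(fileext = ".RData")
save(toy_votes, file = tmp)

# Load into a fresh environment so the global workspace stays untouched
e <- new.env()
loaded <- load(tmp, envir = e)

# load() invisibly returns the names of the restored objects
print(loaded)
```

The same pattern applied to "data/UN-73new_small.RData" would list the objects stored in the downloaded file before they are placed in the workspace.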
If we downloaded the full aiddata here and saved it to our local data folder, we would face the problem of a massive amount of data. To still be able to handle it, the fread function in the data.table package offers great features for opening huge CSV files quickly and completely.
# library(data.table)  # provides fread()
#aiddata <- fread("data/AidDataCoreFull_ResearchRelease_Level1_v3.1.csv",
# sep=",",
# nrows = -1,
# na.strings = c("NA","N/A",""),
# stringsAsFactors=FALSE
#)
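To see what the fread arguments above do without downloading the full file, here is a small sketch on a made-up CSV. The file contents and column names are hypothetical and do not reflect the real aiddata layout; the select argument is an addition not used in the original call, shown because reading only the needed columns is one common way to keep large files manageable. It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical miniature CSV standing in for the large aiddata file
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "donor,recipient,year,amount",
  "Germany,Kenya,2005,1200",
  "Japan,N/A,2006,",
  "France,Peru,2007,800"
), tmp_csv)

# Read only the columns we need; treat "NA", "N/A" and empty fields as missing
small <- fread(tmp_csv,
               sep = ",",
               na.strings = c("NA", "N/A", ""),
               select = c("donor", "recipient", "year"),
               stringsAsFactors = FALSE)
print(small)
```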
All three datasets should now be stored in the data folder. As learned, we now want to make the following