Download the script here

1 General

Let's take a look at real data examples! There are several datasets you can use (feel free to use others as well). We now want to introduce:

UN General Assembly voting data by Voeten et al. (2000)
Survey data on immigration
A subset of aiddata (provided by aiddata.org)

2 Download the Data

In contrast to Eurostat data, there is no direct API attached to these datasets, so we have two ways to get the data. We can download it manually and store it somewhere in the project folder (maybe “/data/xyz”), or we can tell R to download it directly from the website. While I have already uploaded subsets of the latter two datasets here and here, we now want to download the Voeten data directly from the internet:

rm(list = ls())
# install.packages("tidyverse")
library(tidyverse)

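# Make sure the target folder exists before downloading into it
if (!dir.exists("data")) dir.create("data")

# exists() checks for an R object, not a file on disk, so use file.exists()
if (!file.exists("data/UN-73new_small.RData")) {
  url <- "https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/LEJUQZ/KKG7SW"
  download.file(url,
                destfile = "data/UN-73new_small.RData",
                quiet = FALSE,
                mode = "wb",  # .RData is a binary file; "w" can corrupt it on Windows
                cacheOK = TRUE,
                extra = getOption("download.file.extra"),
                headers = NULL)
}
# Loads the object completeVotes into the environment
load("data/UN-73new_small.RData")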

# Now we subset the data to delete irrelevant variables
completeVotes <- completeVotes[c("rcid", "vote", "Country", "year", "importantvote",
                                 "me", "nu", "di", "hr", "co", "ec")]

# Drop rows without a country identifier
completeVotes <- completeVotes[!is.na(completeVotes$Country), ]

# Overwrite the downloaded file with the smaller version
save(completeVotes, file = "data/UN-73new_small.RData")

If we downloaded the full aiddata from here and saved it to our local data folder, we would face the problem of a massive amount of data. To still be able to handle it, the fread function in the data.table package offers great features for reading huge csv files quickly and completely.

# install.packages("data.table")
# library(data.table)
#aiddata <- fread("data/AidDataCoreFull_ResearchRelease_Level1_v3.1.csv",
#                    sep = ",",
#                    nrows = Inf,   # read all rows (the default)
#                    na.strings = c("NA", "N/A", ""),
#                    stringsAsFactors = FALSE
#)
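To see what the fread arguments above do without downloading the full file, here is a minimal sketch on a tiny csv written to a temporary file. The column names and values are made up for illustration and are not taken from the real aiddata release.

```r
library(data.table)

# Write a tiny csv to a temporary file (made-up columns and values)
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("donor,year,commitment",
             "USA,2013,100.5",
             "CHN,2013,N/A",
             "RUS,2012,20"),
           tmp_csv)

# Same arguments as in the commented call above:
# "N/A" and empty fields are turned into NA while reading
aid_demo <- fread(tmp_csv,
                  sep = ",",
                  na.strings = c("NA", "N/A", ""),
                  stringsAsFactors = FALSE)
str(aid_demo)

# For a first look at a huge file, it can help to read only
# a few rows and a few columns
aid_peek <- fread(tmp_csv, nrows = 2, select = c("donor", "year"))
print(aid_peek)
```

Reading a handful of rows first is a cheap way to check column names and types before loading everything.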

3 Tasks

All three datasets should be stored in the data folder. As learned, we now want to do the following:

Import the data
Load it to the environment
Tell us about the structure of the data
Subset according to the tasks and gogogo!
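The four steps above can be sketched on a toy example. The object name toy and its columns are hypothetical stand-ins, not one of the three datasets; the sketch only shows the general pattern in base R.

```r
# Toy stand-in for a dataset (hypothetical name and columns)
toy <- data.frame(id = 1:4,
                  group = c("a", "b", "a", "b"),
                  value = c(10, NA, 30, 40),
                  stringsAsFactors = FALSE)
tmp <- tempfile(fileext = ".RData")
save(toy, file = tmp)
rm(toy)

# Steps 1 and 2: import the file and load its objects into the environment;
# load() returns the names of the objects it restored
loaded <- load(tmp)
print(loaded)

# Step 3: look at the structure of the data
str(toy)
print(head(toy))
print(summary(toy))

# Step 4: subset rows by a condition and keep selected columns
toy_a <- toy[toy$group == "a", c("id", "value")]
print(toy_a)
```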

3.1 Voeten

Subset the data to only choose important votes (what do we notice?)
Subset the data and only focus on Russia, USA and China
Tabulate the countries' decisions in the year 2006
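One possible way to approach these three steps, shown on a small made-up data frame rather than on completeVotes itself. The column names follow the subset created earlier; the country codes "USA", "RUS" and "CHN" and all values are assumptions for illustration, so check how countries are actually coded in your data.

```r
# Made-up stand-in for completeVotes (values are illustrative only)
votes_demo <- data.frame(
  rcid          = rep(1:3, each = 4),
  Country       = rep(c("USA", "RUS", "CHN", "DEU"), times = 3),
  year          = rep(c(2005, 2006, 2006), each = 4),
  importantvote = rep(c(0, 1, NA), each = 4),
  vote          = c(1, 1, 1, 1,  3, 1, 2, 1,  1, 3, 3, 2),
  stringsAsFactors = FALSE
)

# Keep only important votes; which() drops rows where the comparison is NA,
# whereas plain logical indexing would return all-NA rows for them
important <- votes_demo[which(votes_demo$importantvote == 1), ]

# Focus on three countries (codes are an assumption)
big3 <- votes_demo[votes_demo$Country %in% c("USA", "RUS", "CHN"), ]

# Cross-tabulate country by vote for the year 2006
sub2006 <- big3[big3$year == 2006, ]
tab2006 <- table(sub2006$Country, sub2006$vote)
print(tab2006)
```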

3.2 Immigration

Subset the data to keep only those participants older than 23
Table the ethnocentrism scale (variable ethno)
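A minimal sketch of both steps on made-up survey data. The variable ethno is named in the task; the variable name age and all values are assumptions, so adapt them to the actual column names in the immigration file.

```r
# Made-up survey data (age is a hypothetical column name)
survey_demo <- data.frame(age   = c(19, 23, 24, 35, NA, 52),
                          ethno = c(2, 5, 3, 3, 4, 1))

# subset() keeps rows where the condition is TRUE and drops rows
# where it is NA (here: the participant with missing age)
older <- subset(survey_demo, age > 23)

# Frequency table of the ethnocentrism scale; useNA = "ifany"
# adds a count of missing values if there are any
ethno_tab <- table(older$ethno, useNA = "ifany")
print(ethno_tab)
```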

3.3 Aiddata

Subset the data to get all aid in 2013
Subset donor countries and only focus on Russia, USA and China
Tabulate the committed amount of these donor countries
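The same pattern can be sketched for the aid data. Everything in this example is made up: the column names year, donor and commitment as well as the donor spellings are assumptions, and the real file may name and code them differently.

```r
# Made-up aid records (column names and donor spellings are assumptions)
aid_demo <- data.frame(
  year       = c(2012, 2013, 2013, 2013, 2013, 2013),
  donor      = c("United States", "United States", "United States",
                 "China", "Russia", "Germany"),
  commitment = c(50, 100, 25, 40, 10, 70),
  stringsAsFactors = FALSE
)

# All aid in 2013
aid2013 <- aid_demo[aid_demo$year == 2013, ]

# Only the three donors of interest
donors <- aid2013[aid2013$donor %in% c("United States", "Russia", "China"), ]

# Total committed amount per donor
totals <- tapply(donors$commitment, donors$donor, sum)
print(totals)
```

Summing per donor is only one reading of the task; table() on the raw amounts would instead count how often each amount occurs.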