ETC1010-5510: Introduction to Data Analysis

Read in the data

Question 1: Display the first 10 rows of the data set (1pt). Hint: Check ?head in your R console

head(terror,10)  #Run in R console

## # A tibble: 10 × 135
##       eventid iyear imonth  iday approxdate extended resolution country
##         <dbl> <dbl>  <dbl> <dbl> <chr>         <dbl> <dttm>       <dbl>
##  1    1.97e11  1970      7     2 <NA>              0 NA              58
##  2    1.97e11  1970      0     0 <NA>              0 NA             130
##  3    1.97e11  1970      1     0 <NA>              0 NA             160
##  4    1.97e11  1970      1     0 <NA>              0 NA              78
##  5    1.97e11  1970      1     0 <NA>              0 NA             101
##  6    1.97e11  1970      1     1 <NA>              0 NA             217
##  7    1.97e11  1970      1     2 <NA>              0 NA             218
##  8    1.97e11  1970      1     2 <NA>              0 NA             217
##  9    1.97e11  1970      1     2 <NA>              0 NA             217
## 10    1.97e11  1970      1     3 <NA>              0 NA             217
## # ℹ 127 more variables: country_txt <chr>, region <dbl>, region_txt <chr>,
## #   provstate <chr>, city <chr>, latitude <dbl>, longitude <dbl>,
## #   specificity <dbl>, vicinity <dbl>, location <chr>, summary <chr>,
## #   crit1 <dbl>, crit2 <dbl>, crit3 <dbl>, doubtterr <dbl>, alternative <dbl>,
## #   alternative_txt <chr>, multiple <dbl>, success <dbl>, suicide <dbl>,
## #   attacktype1 <dbl>, attacktype1_txt <chr>, attacktype2 <dbl>,
## #   attacktype2_txt <chr>, attacktype3 <lgl>, attacktype3_txt <lgl>, …

Question 2: How many observations and variables does the data set terror have (1pt)? Use inline code to complete the sentence below (2pts)

nrow(terror)

## [1] 209706

ncol(terror)

## [1] 135

Data terror contains 209706 record rows and 135 columns

Question 3: What is the name of the 3rd variable in this data set (2pts)? Use R commands to answer this question.

variables<- names(terror)
variables[3]

## [1] "imonth"

Question 4: Using the terror data set, rename the first variable to event_id and save this new data frame as terror (2pts). Display the first 4 rows and 9 columns corresponding to the country “France” (1pt).

terror<-rename(terror,"event_id"="eventid")

terror1<-terror %>% 
  dplyr::filter(country_txt == "France") %>% head(4) 
terror1 %>% select("event_id","iyear","imonth","iday","approxdate","extended","resolution","country","country_txt")

## # A tibble: 4 × 9
##      event_id iyear imonth  iday approxdate extended resolution country
##         <dbl> <dbl>  <dbl> <dbl> <chr>         <dbl> <dttm>       <dbl>
## 1     1.97e11  1972      5    25 <NA>              0 NA              69
## 2     1.97e11  1972      5    25 <NA>              0 NA              69
## 3     1.97e11  1972      5    25 <NA>              0 NA              69
## 4     1.97e11  1972      5    25 <NA>              0 NA              69
## # ℹ 1 more variable: country_txt <chr>

Question 5: Using the terror data set, and the function `filter` extract all data between the years of 1995 and 2011. Call this new dataset terror_short.(4pt)

terror_short<-terror %>% 
  dplyr::filter(iyear >= 1995 & iyear <= 2011 )
terror_short

## # A tibble: 46,626 × 135
##      event_id iyear imonth  iday approxdate extended resolution country
##         <dbl> <dbl>  <dbl> <dbl> <chr>         <dbl> <dttm>       <dbl>
##  1    2.00e11  1995      1     0 <NA>              0 NA             217
##  2    2.00e11  1995      1     0 <NA>              0 NA             217
##  3    2.00e11  1995      1     0 <NA>              0 NA             217
##  4    2.00e11  1995      1     1 <NA>              0 NA             202
##  5    2.00e11  1995      1     1 <NA>              0 NA             209
##  6    2.00e11  1995      1     1 <NA>              0 NA              55
##  7    2.00e11  1995      1     1 <NA>              0 NA              55
##  8    2.00e11  1995      1     1 <NA>              0 NA              55
##  9    2.00e11  1995      1     1 <NA>              0 NA              55
## 10    2.00e11  1995      1     1 <NA>              0 NA              83
## # ℹ 46,616 more rows
## # ℹ 127 more variables: country_txt <chr>, region <dbl>, region_txt <chr>,
## #   provstate <chr>, city <chr>, latitude <dbl>, longitude <dbl>,
## #   specificity <dbl>, vicinity <dbl>, location <chr>, summary <chr>,
## #   crit1 <dbl>, crit2 <dbl>, crit3 <dbl>, doubtterr <dbl>, alternative <dbl>,
## #   alternative_txt <chr>, multiple <dbl>, success <dbl>, suicide <dbl>,
## #   attacktype1 <dbl>, attacktype1_txt <chr>, attacktype2 <dbl>, …

Question 6: How many variables and observations are in the terror_short data frame during these years (2pts)? Hint: you can use `count` or `nrow` to complete this.

nrow(terror_short)

## [1] 46626

ncol(terror_short)

## [1] 135

Data terror contains 46626 record rows and 135 columns

Question 7: Use terror_short dataframe to answer the following questions.

Question 7.1: Since 1995, which countries have experienced the most terrorist attacks. Display the five countries with the highest frequency of attacks. (5pt)

terror_short %>% group_by(country_txt) %>% summarise(ncount=n()) %>% arrange(-ncount) %>% head(5)

## # A tibble: 5 × 2
##   country_txt ncount
##   <chr>        <int>
## 1 Iraq          7742
## 2 Pakistan      4833
## 3 India         4638
## 4 Afghanistan   2935
## 5 Colombia      2585

Question 7.2: In Western Europe, which county has experienced the most attacks.(2pt)

terror_short %>% filter(region_txt=="Western Europe")%>%  group_by(country_txt) %>% summarise(ncount=n()) %>% arrange(-ncount) %>% head(1)

## # A tibble: 1 × 2
##   country_txt ncount
##   <chr>        <int>
## 1 France         771

Question 7.3: What are the three most common type of attack in Eastern Europe? Use the variable ‘attacktype1_txt’ to answer this question. (2pt)

terror_short %>% filter(region_txt=="Eastern Europe")%>%  group_by(attacktype1_txt) %>% summarise(ncount=n()) %>% arrange(-ncount) %>% head(3)

## # A tibble: 3 × 2
##   attacktype1_txt   ncount
##   <chr>              <int>
## 1 Bombing/Explosion   1370
## 2 Armed Assault        624
## 3 Assassination        259

Question 8: Use terror_short dataframe to answer the following questions.

Question 8.1: Calculate the number of attacks per month across all years. Display the month with the highest number of attacks. (4pt)

terror_each_month <- terror_short %>% group_by(iyear,imonth)%>% summarise(number_of_attack=n()) %>% arrange(-number_of_attack)

## `summarise()` has grouped output by 'iyear'. You can override using the
## `.groups` argument.

terror_each_month

## # A tibble: 204 × 3
## # Groups:   iyear [17]
##    iyear imonth number_of_attack
##    <dbl>  <dbl>            <int>
##  1  2011     11              661
##  2  2007      6              588
##  3  2001      8              568
##  4  2008      7              554
##  5  2007     11              534
##  6  2010      9              528
##  7  2008      6              510
##  8  2008      4              507
##  9  2008      5              500
## 10  2009      7              467
## # ℹ 194 more rows

Question 8.2: Calculate the average number of attacks per month across all year, and display the month with the highest average. Which month has the highest variability as measured by standard deviation? (5pt)

terror_each_month %>% group_by(imonth) %>% summarise(mean=mean(number_of_attack),sd=sd(number_of_attack))%>% arrange(-sd)

## # A tibble: 12 × 3
##    imonth  mean    sd
##     <dbl> <dbl> <dbl>
##  1     11  242.  178.
##  2      6  244.  166.
##  3      8  257.  145.
##  4      4  227.  145.
##  5      5  239.  144.
##  6      7  255.  143.
##  7     12  200.  133.
##  8      9  208.  127.
##  9      2  204.  126.
## 10     10  229.  120.
## 11      3  219.  117.
## 12      1  219.  103.

Question 8.3: Use a boxplot to display the frequency of attacks by Month across all years. By visual analysis, state which month has the highest and lowest variability? (2pts)

ggplot(terror_each_month, aes(x =factor(imonth),y=number_of_attack,color = factor(imonth)))+
  geom_boxplot()

Question 8.4: Recalling the relationship between boxplots and empirical quantiles, state which month has the highest interquartile range of attacks. (4pts)

based on gg-plot above, the highest interquartile of number of attack is either month June or November

terror_each_month %>% group_by(imonth) %>% summarise(q1=quantile(number_of_attack,0.25),q3=quantile(number_of_attack,0.75))%>% arrange(-q3)

## # A tibble: 12 × 3
##    imonth    q1    q3
##     <dbl> <dbl> <dbl>
##  1      6   119   376
##  2      4   104   358
##  3      8   125   347
##  4      7   117   337
##  5     11   107   337
##  6     10   119   326
##  7      5   130   323
##  8      9   105   287
##  9      1   164   282
## 10      3   120   270
## 11     12    83   268
## 12      2   112   267

After validating it with the gap between Q3 and Q1, we can take conclusion that the highest number of attack occur on June

Question 8.5: Do you observe any pattern in the frequency of attacks throughout the year? (Hint: Make sure your boxplot is aligned from January to December.) Use two to three sentences to answer this question. (5pt)

** Mostly the number of attacks for each month throughout the year is arround 200 but on selected period, specially on July and Aug, frequency of attack tend experience an increasing. Therefore, median number on these period were high**

ETC1010-5510: Introduction to Data Analysis

Melisa

Load all the libraries that you need here

Read in the data

Question 1: Display the first 10 rows of the data set (1pt). Hint: Check ?head in your R console

Question 2: How many observations and variables does the data set terror have (1pt)? Use inline code to complete the sentence below (2pts)

Question 3: What is the name of the 3rd variable in this data set (2pts)? Use R commands to answer this question.

Question 4: Using the terror data set, rename the first variable to event_id and save this new data frame as terror (2pts). Display the first 4 rows and 9 columns corresponding to the country “France” (1pt).

Question 5: Using the terror data set, and the function `filter` extract all data between the years of 1995 and 2011. Call this new dataset terror_short.(4pt)

Question 6: How many variables and observations are in the terror_short data frame during these years (2pts)? Hint: you can use `count` or `nrow` to complete this.

Question 7: Use terror_short dataframe to answer the following questions.

Question 7.1: Since 1995, which countries have experienced the most terrorist attacks. Display the five countries with the highest frequency of attacks. (5pt)

Question 7.2: In Western Europe, which county has experienced the most attacks.(2pt)

Question 7.3: What are the three most common type of attack in Eastern Europe? Use the variable ‘attacktype1_txt’ to answer this question. (2pt)

Question 8: Use terror_short dataframe to answer the following questions.

Question 8.1: Calculate the number of attacks per month across all years. Display the month with the highest number of attacks. (4pt)

Question 8.2: Calculate the average number of attacks per month across all year, and display the month with the highest average. Which month has the highest variability as measured by standard deviation? (5pt)

Question 8.3: Use a boxplot to display the frequency of attacks by Month across all years. By visual analysis, state which month has the highest and lowest variability? (2pts)

Question 8.4: Recalling the relationship between boxplots and empirical quantiles, state which month has the highest interquartile range of attacks. (4pts)

Question 8.5: Do you observe any pattern in the frequency of attacks throughout the year? (Hint: Make sure your boxplot is aligned from January to December.) Use two to three sentences to answer this question. (5pt)

R Markdown

Including Plots

ETC1010-5510: Introduction to Data Analysis

Melisa

Load all the libraries that you need here

Read in the data

Question 1: Display the first 10 rows of the data set (1pt). Hint: Check ?head in your R console

Question 2: How many observations and variables does the data set terror have (1pt)? Use inline code to complete the sentence below (2pts)

Question 3: What is the name of the 3rd variable in this data set (2pts)? Use R commands to answer this question.

Question 4: Using the terror data set, rename the first variable to event_id and save this new data frame as terror (2pts). Display the first 4 rows and 9 columns corresponding to the country “France” (1pt).

Question 5: Using the terror data set, and the function filter extract all data between the years of 1995 and 2011. Call this new dataset terror_short.(4pt)

Question 6: How many variables and observations are in the terror_short data frame during these years (2pts)? Hint: you can use count or nrow to complete this.

Question 7: Use terror_short dataframe to answer the following questions.

Question 7.1: Since 1995, which countries have experienced the most terrorist attacks. Display the five countries with the highest frequency of attacks. (5pt)

Question 7.2: In Western Europe, which county has experienced the most attacks.(2pt)

Question 7.3: What are the three most common type of attack in Eastern Europe? Use the variable ‘attacktype1_txt’ to answer this question. (2pt)

Question 8: Use terror_short dataframe to answer the following questions.

Question 8.1: Calculate the number of attacks per month across all years. Display the month with the highest number of attacks. (4pt)

Question 8.2: Calculate the average number of attacks per month across all year, and display the month with the highest average. Which month has the highest variability as measured by standard deviation? (5pt)

Question 8.3: Use a boxplot to display the frequency of attacks by Month across all years. By visual analysis, state which month has the highest and lowest variability? (2pts)

Question 8.4: Recalling the relationship between boxplots and empirical quantiles, state which month has the highest interquartile range of attacks. (4pts)

Question 8.5: Do you observe any pattern in the frequency of attacks throughout the year? (Hint: Make sure your boxplot is aligned from January to December.) Use two to three sentences to answer this question. (5pt)

R Markdown

Including Plots

Question 5: Using the terror data set, and the function `filter` extract all data between the years of 1995 and 2011. Call this new dataset terror_short.(4pt)

Question 6: How many variables and observations are in the terror_short data frame during these years (2pts)? Hint: you can use `count` or `nrow` to complete this.