2.7 Practice Problems
Consider the following set of attributes about the American Film
Institute’s top five movies ever from their 2007 list.
1. What code would you use to create a vector named Movie with the
values Citizen Kane, The Godfather, Casablanca, Raging Bull, and Singing
in the Rain?
Movie = c("Citizen Kane", "The Godfather", "Casablanca", "Raging Bull", "Singing in the Rain")
2. What code would you use to create a vector—giving the year that
the movies in Problem 1 were made—named Year with the values 1941, 1972,
1942, 1980, and 1952?
Year = c(1941, 1972, 1942,
1980, 1952)
3. What code would you use to create a vector—giving the run times
in minutes of the movies in Problem 1—named RunTime with the values 119,
177, 102, 129, and 103?
RunTime = c(119, 177, 102,
129, 103)
4. What code would you use to find the run times of the movies in
hours and save them in a vector called RunTimeHours?
RunTimeHours = RunTime/60
RunTimeHours
## [1] 1.983333 2.950000 1.700000 2.150000 1.716667
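The division in Problem 4 works because arithmetic on a vector is applied to every element. As a small illustrative sketch (the names WholeHours and LeftoverMinutes are made up here, not part of the exercise), the same idea can split each run time into whole hours and leftover minutes:

```r
# Run times in minutes, as given in Problem 3
RunTime <- c(119, 177, 102, 129, 103)

# Dividing a vector by a single number divides every element
RunTimeHours <- RunTime / 60

# Integer division and modulo are also applied element by element
WholeHours <- RunTime %/% 60       # 1 2 1 2 1
LeftoverMinutes <- RunTime %% 60   # 59 57 42 9 43

# Rounded hours are often easier to read
round(RunTimeHours, 2)
```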
5. What code would you use to create a data frame named MovieInfo
containing the vectors created in Problem 1, Problem 2, and Problem
3?
MovieInfo=data.frame(Movie, Year, RunTime)
MovieInfo
##                 Movie Year RunTime
## 1        Citizen Kane 1941     119
## 2       The Godfather 1972     177
## 3          Casablanca 1942     102
## 4         Raging Bull 1980     129
## 5 Singing in the Rain 1952     103
Consider the following set of attributes about a series of video games
from LucasArts, an early video game company under the umbrella of George
Lucas’s Lucasfilm company.
6. What code would you use to create a vector named Title with the
values The Secret of Monkey Island, Indiana Jones and the Fate of
Atlantis, Day of the Tentacle, and Grim Fandango?
Title = c("The Secret of Monkey Island", "Indiana Jones and the Fate of Atlantis", "Day of the Tentacle", "Grim Fandango")
Title
## [1] "The Secret of Monkey Island"
## [2] "Indiana Jones and the Fate of Atlantis"
## [3] "Day of the Tentacle"
## [4] "Grim Fandango"
7. What code would you use to create a vector—giving the year that
the games in Problem 6 were released—named Release with the values 1990,
1992, 1993, and 1998?
Release = c(1990, 1992, 1993, 1998)
Release
## [1] 1990 1992 1993 1998
8. LucasArts was founded in 1982. What code would you use to
calculate how many years after the founding of the company each game
was released?
Release-1982
## [1] 8 10 11 16
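10. What code would you use to create a data frame called
AdventureGames containing the vectors created in Problem 6, Problem 7,
and Problem 9?
# Rank is the vector from Problem 9, which is not shown above; these values are taken from the output below
Rank = c(14, 11, 6, 1)
AdventureGames=data.frame(Title, Release, Rank)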
AdventureGames
##                                    Title Release Rank
## 1            The Secret of Monkey Island    1990   14
## 2 Indiana Jones and the Fate of Atlantis    1992   11
## 3                    Day of the Tentacle    1993    6
## 4                          Grim Fandango    1998    1
Chapter 4: Subsetting data, random numbers, and selecting a random
sample
4.2 Subsetting vectors
hours=c(8.84, 3.26, 2.81, 0.64, 0.60, 0.53, 0.37, 0.35, 0.31, 0.24)
hours[1]
## [1] 8.84
hours[c(1,3,9)]
## [1] 8.84 2.81 0.31
hours[hours>1]
## [1] 8.84 3.26 2.81
hours[hours>=0]
## [1] 8.84 3.26 2.81 0.64 0.60 0.53 0.37 0.35 0.31 0.24
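A condition such as hours>1 first produces a logical vector, and the brackets then keep the positions that are TRUE. The sketch below makes that intermediate step visible; the names is_long, long_positions, and without_first are illustrative only.

```r
hours <- c(8.84, 3.26, 2.81, 0.64, 0.60, 0.53, 0.37, 0.35, 0.31, 0.24)

# The comparison returns one TRUE/FALSE value per element
is_long <- hours > 1

# which() turns the logical vector into the matching positions
long_positions <- which(is_long)   # 1 2 3

# Indexing with the logical vector keeps only the TRUE positions
long_hours <- hours[is_long]       # 8.84 3.26 2.81

# A negative index drops a position instead of selecting it
without_first <- hours[-1]
```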
4.3 Subsetting data frames
Creating the data frame
DailyACT = c("Sleeping", "Working", "Watching Television", "Socializing", "Food Preparation", "Housework", "Childcare", "Consumer Goods Purchase", "Participating in Recreation", "Attending Class")
AverageHours = c(8.84, 3.26, 2.81, 0.64, 0.60, 0.53, 0.37, 0.35, 0.31, 0.24)
Category = c("Personal Care", "Work-Related", "Leisure", "Leisure", "Household", "Household", "Caring for Household", "Purchasing", "Leisure", "Education")
Aktivitas = data.frame(DailyACT, AverageHours, Category)
Viewing the data frame
head(Aktivitas)
##              DailyACT AverageHours      Category
## 1            Sleeping         8.84 Personal Care
## 2             Working         3.26  Work-Related
## 3 Watching Television         2.81       Leisure
## 4         Socializing         0.64       Leisure
## 5    Food Preparation         0.60     Household
## 6           Housework         0.53     Household
Aktivitas[5, 3]
## [1] "Household"
Aktivitas[9,1]
## [1] "Participating in Recreation"
Aktivitas[10,]
##           DailyACT AverageHours  Category
## 10 Attending Class         0.24 Education
Aktivitas$Category
## [1] "Personal Care" "Work-Related" "Leisure"
## [4] "Leisure" "Household" "Household"
## [7] "Caring for Household" "Purchasing" "Leisure"
## [10] "Education"
Aktivitas[Aktivitas$AverageHours>1,]
##              DailyACT AverageHours      Category
## 1            Sleeping         8.84 Personal Care
## 2             Working         3.26  Work-Related
## 3 Watching Television         2.81       Leisure
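The bracket form used above takes a row condition before the comma and, optionally, a set of columns after it. As a hedged sketch, base R's subset() is one alternative way to express the same row filter; the names by_bracket, by_subset, and leisure are illustrative only.

```r
DailyACT <- c("Sleeping", "Working", "Watching Television", "Socializing",
              "Food Preparation", "Housework", "Childcare",
              "Consumer Goods Purchase", "Participating in Recreation",
              "Attending Class")
AverageHours <- c(8.84, 3.26, 2.81, 0.64, 0.60, 0.53, 0.37, 0.35, 0.31, 0.24)
Category <- c("Personal Care", "Work-Related", "Leisure", "Leisure",
              "Household", "Household", "Caring for Household", "Purchasing",
              "Leisure", "Education")
Aktivitas <- data.frame(DailyACT, AverageHours, Category)

# Row condition before the comma, all columns kept
by_bracket <- Aktivitas[Aktivitas$AverageHours > 1, ]

# subset() evaluates the condition inside the data frame, so no $ is needed
by_subset <- subset(Aktivitas, AverageHours > 1)

# Row condition plus a vector of column names after the comma
leisure <- Aktivitas[Aktivitas$Category == "Leisure", c("DailyACT", "AverageHours")]
```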
4.4 Random numbers
set.seed(8)
rnorm(10)
## [1] -0.08458607 0.84040013 -0.46348277 -0.55083500 0.73604043 -0.10788140
## [7] -0.17028915 -1.08833171 -3.01105168 -0.59317433
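set.seed() fixes the starting point of the random number generator, so the same seed followed by the same call gives the same numbers. A minimal sketch of that behaviour (the variable names are illustrative):

```r
# Same seed, same call: the two draws match exactly
set.seed(8)
first_draw <- rnorm(10)

set.seed(8)
second_draw <- rnorm(10)

identical(first_draw, second_draw)

# Without resetting the seed, the generator simply continues,
# so the next draw gives different numbers
third_draw <- rnorm(10)
```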
4.7 Practice problems
1. What code would you use to select the first, third, tenth, and
twelfth entries in the TopSalary vector from the Colleges data
frame?
College = c("William and Mary", "Christopher Newport", "George Mason", "James Madison", "Longwood", "Norfolk State", "Old Dominion", "Radford", "Mary Washington", "Virginia", "Virginia Commonwealth", "Virginia Military Institute", "Virginia Tech", "Virginia State")
Employees = c(2104, 922, 4043, 2833, 746, 919, 2369, 1273, 721, 7431, 5825, 550, 7303, 761)
TopSalary = c(425000, 381486, 536714, 428400, 322868, 295000, 448272, 312080, 449865, 561099, 503154, 364269, 500000, 356524)
MedianSalary = c(56496, 47895, 63029, 53080, 52000, 49605, 54416, 51000, 53045, 60048, 55000, 44999, 51656, 55925)
Colleges <- data.frame(College,Employees,TopSalary,MedianSalary)
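No answer code is given for Problem 1 above. One possible answer is sketched here, assuming the problem intends plain bracket indexing on the TopSalary column; only that column is rebuilt to keep the example short, and selected_salaries is an illustrative name.

```r
# Top salaries from the Colleges data frame (other columns omitted for brevity)
TopSalary <- c(425000, 381486, 536714, 428400, 322868, 295000, 448272,
               312080, 449865, 561099, 503154, 364269, 500000, 356524)
Colleges <- data.frame(TopSalary)

# Pull the column out with $, then index it with a vector of positions
selected_salaries <- Colleges$TopSalary[c(1, 3, 10, 12)]
selected_salaries
```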
3. What code would you use to select the rows of the data frame for
colleges with less than or equal to 1000 employees?
selected_colleges <- Colleges[Colleges$Employees <= 1000, ]
print(selected_colleges)
##                        College Employees TopSalary MedianSalary
## 2          Christopher Newport       922    381486        47895
## 5                     Longwood       746    322868        52000
## 6                Norfolk State       919    295000        49605
## 9              Mary Washington       721    449865        53045
## 12 Virginia Military Institute       550    364269        44999
## 14              Virginia State       761    356524        55925
4. What code would you use to select a sample of 5 colleges from
this data frame (there are 14 rows)?
sampled_colleges <- Colleges[sample(1:nrow(Colleges), size = 5), ]
print(sampled_colleges)
##               College Employees TopSalary MedianSalary
## 2 Christopher Newport       922    381486        47895
## 7        Old Dominion      2369    448272        54416
## 4       James Madison      2833    428400        53080
## 3        George Mason      4043    536714        63029
## 9     Mary Washington       721    449865        53045
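The sample in Problem 4 was drawn without a seed, so rerunning the code will generally return different rows. A small sketch of the row-index step on its own, assuming a data frame with 14 rows; n_rows and row_index are illustrative names and the seed value is arbitrary.

```r
# Number of rows in the data frame being sampled (14 for Colleges)
n_rows <- 14

# Fixing a seed first makes the draw repeatable
set.seed(2024)

# sample() draws without replacement by default, so no row is picked twice
row_index <- sample(seq_len(n_rows), size = 5)
row_index

# The indices would then go before the comma, e.g. Colleges[row_index, ]
```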
Suppose we have the following data frame named Countries:
# Create the data frame 'Countries'
Countries <- data.frame(
Nation = c("China", "India", "United States", "Indonesia", "Brazil",
"Pakistan", "Nigeria", "Bangladesh", "Russia", "Mexico"),
Region = c("Asia", "Asia", "North America", "Asia", "South America",
"Asia", "Africa", "Asia", "Europe", "North America"),
Population = c(1409517397, 1339180127, 324459463, 263991379, 209288278,
197015955, 190886311, 164669751, 143989754, 129163276),
PctIncrease = c(0.40, 1.10, 0.70, 1.10, 0.80,
2.00, 2.60, 1.10, 0.00, 1.30),
GDPcapita = c(8582, 1852, 57467, 3895, 10309,
1629, 2640, 1524, 10248, 8562)
)
# Display the data frame
print(Countries)
##           Nation        Region Population PctIncrease GDPcapita
## 1          China          Asia 1409517397         0.4      8582
## 2          India          Asia 1339180127         1.1      1852
## 3  United States North America  324459463         0.7     57467
## 4      Indonesia          Asia  263991379         1.1      3895
## 5         Brazil South America  209288278         0.8     10309
## 6       Pakistan          Asia  197015955         2.0      1629
## 7        Nigeria        Africa  190886311         2.6      2640
## 8     Bangladesh          Asia  164669751         1.1      1524
## 9         Russia        Europe  143989754         0.0     10248
## 10        Mexico North America  129163276         1.3      8562
5. What code would you use to select the rows of the data frame
that have GDP per capita less than 10000 and are not in the Asia
region?
selected_countries <- Countries[Countries$GDPcapita < 10000 & Countries$Region != "Asia", ]
print(selected_countries)
##     Nation        Region Population PctIncrease GDPcapita
## 7  Nigeria        Africa  190886311         2.6      2640
## 10  Mexico North America  129163276         1.3      8562
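The filter in Problem 5 joins two conditions with &, which keeps a row only when both are TRUE. The sketch below separates the two conditions and contrasts & with |, using a reduced copy of Countries (three columns only); low_gdp, not_asia, both, and either are illustrative names.

```r
# Reduced copy of Countries with just the columns the conditions need
Countries <- data.frame(
  Nation = c("China", "India", "United States", "Indonesia", "Brazil",
             "Pakistan", "Nigeria", "Bangladesh", "Russia", "Mexico"),
  Region = c("Asia", "Asia", "North America", "Asia", "South America",
             "Asia", "Africa", "Asia", "Europe", "North America"),
  GDPcapita = c(8582, 1852, 57467, 3895, 10309,
                1629, 2640, 1524, 10248, 8562)
)

# Each condition is a logical vector with one value per row
low_gdp <- Countries$GDPcapita < 10000
not_asia <- Countries$Region != "Asia"

# & keeps rows where both conditions hold
both <- Countries[low_gdp & not_asia, ]

# | keeps rows where at least one condition holds
either <- Countries[low_gdp | not_asia, ]
```

Here both contains Nigeria and Mexico, while either happens to contain all ten rows, because every nation is either below 10000 or outside Asia.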
6. What code would you use to select a sample of three nations from
this data frame (there are 10 rows)?
# Set seed for reproducibility (optional)
set.seed(123)
# Select a random sample of 3 nations
sample_countries <- Countries[sample(1:nrow(Countries), 3), ]
# Display the sampled nations
print(sample_countries)
##           Nation        Region Population PctIncrease GDPcapita
## 3  United States North America  324459463         0.7     57467
## 10        Mexico North America  129163276         1.3      8562
## 2          India          Asia 1339180127         1.1      1852
7. What code would you use to select which nations saw a population
percent increase greater than 1.5%?
# Filter rows where the population percent increase is greater than 1.5%
high_increase_nations <- Countries[Countries$PctIncrease > 1.5, ]
# Display the nations with a population percent increase greater than 1.5%
print(high_increase_nations)
##     Nation Region Population PctIncrease GDPcapita
## 6 Pakistan   Asia  197015955         2.0      1629
## 7  Nigeria Africa  190886311         2.6      2640
Suppose we have the following data frame named Olympics:
Year = c(1992, 1992, 1994, 1996, 1998, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016, 2018)
Type = c("Summer", "Winter", "Winter", "Summer", "Winter", "Summer", "Winter", "Summer", "Winter", "Summer", "Winter", "Summer", "Winter", "Summer", "Winter")
Host = c("Spain", "France", "Norway", "United States", "Japan", "Australia", "United States", "Greece", "Italy", "China", "Canada", "United Kingdom", "Russia", "Brazil", "South Korea")
Competitors = c(9356, 1801, 1737, 10318, 2176, 10651, 2399, 10625, 2508, 10942, 2566, 10768, 2873, 11238, 2922)
Events = c(257, 57, 61, 271, 68, 300, 78, 301, 84, 302, 86, 302, 98, 306, 102)
Nations = c(169, 64, 67, 197, 72, 199, 77, 201, 80, 204, 82, 204, 88, 207, 92)
Leader = c("Unified Team", "Germany", "Russia", "United States", "Germany", "United States", "Norway", "United States", "Germany", "China", "Canada", "United States", "Russia", "United States", "Norway")
Olympics <- data.frame(Year,Type,Host,Competitors,Events,Nations,Leader)
8. What code would you use to select the rows of the data frame where
the host nation was also the medal leader?
# Filter rows where the host nation is also the medal leader
host_leader_match <- Olympics[Olympics$Host == Olympics$Leader, ]
# Display the filtered data
print(host_leader_match)
##    Year   Type          Host Competitors Events Nations        Leader
## 4  1996 Summer United States       10318    271     197 United States
## 10 2008 Summer         China       10942    302     204         China
## 11 2010 Winter        Canada        2566     86      82        Canada
## 13 2014 Winter        Russia        2873     98      88        Russia
9. What code would you use to select the rows of the data frame
where the number of competitors per event is greater than 35?
# Filter rows where the number of competitors per event is greater than 35
competitors_per_event <- Olympics[Olympics$Competitors / Olympics$Events > 35, ]
# Display the filtered data
print(competitors_per_event)
##    Year   Type           Host Competitors Events Nations        Leader
## 1  1992 Summer          Spain        9356    257     169  Unified Team
## 4  1996 Summer  United States       10318    271     197 United States
## 6  2000 Summer      Australia       10651    300     199 United States
## 8  2004 Summer         Greece       10625    301     201 United States
## 10 2008 Summer          China       10942    302     204         China
## 12 2012 Summer United Kingdom       10768    302     204 United States
## 14 2016 Summer         Brazil       11238    306     207 United States
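The condition in Problem 9 computes the ratio inside the brackets, so the ratio itself never appears in the result. As one possible variation, the ratio can be stored as a new column first and then used in the filter. The sketch uses a reduced copy of Olympics (four columns only), and PerEvent and crowded are illustrative names.

```r
# Reduced copy of Olympics with just the columns the ratio needs
Year <- c(1992, 1992, 1994, 1996, 1998, 2000, 2002, 2004,
          2006, 2008, 2010, 2012, 2014, 2016, 2018)
Type <- c("Summer", "Winter", "Winter", "Summer", "Winter", "Summer", "Winter",
          "Summer", "Winter", "Summer", "Winter", "Summer", "Winter", "Summer",
          "Winter")
Competitors <- c(9356, 1801, 1737, 10318, 2176, 10651, 2399, 10625,
                 2508, 10942, 2566, 10768, 2873, 11238, 2922)
Events <- c(257, 57, 61, 271, 68, 300, 78, 301, 84, 302, 86, 302, 98, 306, 102)
Olympics <- data.frame(Year, Type, Competitors, Events)

# Store competitors per event as its own column
Olympics$PerEvent <- Olympics$Competitors / Olympics$Events

# Filter on the stored column; the ratio now shows up in the output too
crowded <- Olympics[Olympics$PerEvent > 35, ]
crowded
```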
10. What code would you use to select the rows of the data frame
where the number of competing nations in the Winter Olympics is at least
80?
# Filter rows where the Olympic type is Winter and the number of competing nations is at least 80
winter_nations_80 <- Olympics[Olympics$Type == "Winter" & Olympics$Nations >= 80, ]
# Display the filtered data
print(winter_nations_80)
##    Year   Type        Host Competitors Events Nations  Leader
## 9  2006 Winter       Italy        2508     84      80 Germany
## 11 2010 Winter      Canada        2566     86      82  Canada
## 13 2014 Winter      Russia        2873     98      88  Russia
## 15 2018 Winter South Korea        2922    102      92  Norway