library(dslabs)
data(murders)
Use the $ operator to access the population size data and store it as the object pop. Then use the sort function to redefine pop so that it is sorted. Finally, use the [ operator to report the smallest population size.
# Access the population size data using the $ operator
pop <- murders$population
# Redefine pop so that it is sorted in ascending order
pop <- sort(pop)
# Report the smallest population size (the first element of the sorted vector)
print(pop[1])
## [1] 563626
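Because sort() returns the values in increasing order, the first element of the sorted vector is the same value that min() returns directly. A minimal sketch on a made-up vector (the numbers are illustrative only, not taken from murders):

```r
# Made-up values, for illustration only
x <- c(50, 10, 40, 20, 30)
# sort() returns a new vector in increasing order
sorted_x <- sort(x)
print(sorted_x)      # 10 20 30 40 50
# The first element of the sorted vector is the minimum
print(sorted_x[1])   # 10
print(min(x))        # 10
```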
Now instead of the smallest population size, find the index of the entry with the smallest population size. Hint: use order instead of sort.
# Use the order function to get the indices that would sort the population data
sorted_indices <- order(murders$population)
sorted_indices
## [1] 51 9 46 35 2 42 8 27 40 30 20 12 13 28 49 32 29 45 17 4 25 16 7 37 38
## [26] 18 19 41 1 6 24 50 21 26 43 3 15 22 48 47 31 34 23 11 36 39 14 33 10 44
## [51] 5
# Find the index of the entry with the smallest population size (the first index)
smallest_population_index <- sorted_indices[1]
# Print the index of the entry with the smallest population size
print(smallest_population_index)
## [1] 51
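The difference between the two functions is that order() returns positions rather than values: order(x)[1] is the position of the smallest entry, and indexing a vector with order(x) reproduces sort(x). A small sketch on a made-up vector:

```r
# Made-up values, for illustration only
x <- c(50, 10, 40, 20, 30)
# Positions of the entries, from the smallest value to the largest
idx <- order(x)
print(idx)       # 2 4 5 3 1
# Indexing with those positions gives the sorted values
print(x[idx])    # 10 20 30 40 50
# The first position points to the smallest value
print(idx[1])    # 2
```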
We can actually perform the same operation as in the previous exercise using the function which.min. Write one line of code that does this.
smallest_population_index <- which.min(murders$population)
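which.min() returns the index of the first minimum directly, so it agrees with order(x)[1] without ordering the whole vector; which.max() is the counterpart for the largest value. A sketch on the same kind of made-up vector:

```r
# Made-up values, for illustration only
x <- c(50, 10, 40, 20, 30)
print(which.min(x))   # 2
print(order(x)[1])    # 2
print(which.max(x))   # 1
```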
Now we know how small the smallest state is and we know which row represents it. Which state is it? Define a variable states to be the state names from the murders data frame. Report the name of the state with the smallest population.
# Find the index of the entry with the smallest population
smallest_population_index <- which.min(murders$population)
# Access the state names from the murders data frame
states <- murders$state
# Find the state with the smallest population
smallest_state <- states[smallest_population_index]
# Print the name of the state with the smallest population
print(smallest_state)
## [1] "Wyoming"
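A common pitfall at this step: column names are case sensitive. Asking a data frame for a column that does not exist with $ returns NULL instead of raising an error, which.min(NULL) returns integer(0), and indexing a vector with an empty index yields character(0). The toy data frame below (made-up values) shows the correct and the miscapitalized lookup side by side:

```r
# Toy data frame with made-up values; the column name is lowercase
df <- data.frame(state = c("A", "B", "C"),
                 population = c(30, 10, 20),
                 stringsAsFactors = FALSE)
# Correct column name: the smallest value is in row 2
good_idx <- which.min(df$population)
print(df$state[good_idx])        # "B"
# Wrong capitalization: df$Population is NULL and no error is raised
print(is.null(df$Population))    # TRUE
bad_idx <- which.min(df$Population)
print(df$state[bad_idx])         # character(0)
```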
You can create a data frame using the data.frame function. Here is a quick example:
temp <- c(35, 88, 42, 84, 81, 30)
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")
city_temps <- data.frame(name = city, temperature = temp)
Use the rank function to determine the population rank of each state from smallest population size to biggest. Save these ranks in an object called ranks, then create a data frame with the state name and its rank. Call the data frame my_df.
# Use the rank function to determine the population rank
ranks <- rank(murders$population)
# Create a data frame with state name and its rank
my_df <- data.frame(State = murders$state, Rank = ranks)
my_df
## State Rank
## 1 Alabama 29
## 2 Alaska 5
## 3 Arizona 36
## 4 Arkansas 20
## 5 California 51
## 6 Colorado 30
## 7 Connecticut 23
## 8 Delaware 7
## 9 District of Columbia 2
## 10 Florida 49
## 11 Georgia 44
## 12 Hawaii 12
## 13 Idaho 13
## 14 Illinois 47
## 15 Indiana 37
## 16 Iowa 22
## 17 Kansas 19
## 18 Kentucky 26
## 19 Louisiana 27
## 20 Maine 11
## 21 Maryland 33
## 22 Massachusetts 38
## 23 Michigan 43
## 24 Minnesota 31
## 25 Mississippi 21
## 26 Missouri 34
## 27 Montana 8
## 28 Nebraska 14
## 29 Nevada 17
## 30 New Hampshire 10
## 31 New Jersey 41
## 32 New Mexico 16
## 33 New York 48
## 34 North Carolina 42
## 35 North Dakota 4
## 36 Ohio 45
## 37 Oklahoma 24
## 38 Oregon 25
## 39 Pennsylvania 46
## 40 Rhode Island 9
## 41 South Carolina 28
## 42 South Dakota 6
## 43 Tennessee 35
## 44 Texas 50
## 45 Utah 18
## 46 Vermont 3
## 47 Virginia 40
## 48 Washington 39
## 49 West Virginia 15
## 50 Wisconsin 32
## 51 Wyoming 1
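rank() and order() are easy to confuse: rank(x)[i] says where element i would land in the sorted vector, while order(x)[k] says which element lands in position k. For a vector without ties the two are inverse permutations. With ties, rank() averages the tied ranks by default (ties.method = "average"). A sketch on made-up values:

```r
# Made-up values, for illustration only
x <- c(50, 10, 40, 20, 30)
r <- rank(x)
o <- order(x)
print(r)      # 5 1 4 2 3
print(o)      # 2 4 5 3 1
# Inverse permutations: the ranks, read in sorted order, are 1..n
print(r[o])   # 1 2 3 4 5
# Ties: the default method averages the tied ranks
print(rank(c(10, 20, 20, 30)))   # 1.0 2.5 2.5 4.0
```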
Repeat the previous exercise, but this time order my_df so that the states are ordered from least populous to most populous. Hint: create an object ind that stores the indexes needed to order the population values. Then use the bracket operator [ to re-order each column in the data frame.
# Create an object ind to store the indexes needed to order the population values
# (the rows of my_df are in the same order as the rows of murders)
ind <- order(murders$population)
# Use the bracket operator to re-order each column in the data frame
my_df <- my_df[ind, ]
# Print the re-ordered data frame
print(my_df)
## State Rank
## 51 Wyoming 1
## 9 District of Columbia 2
## 46 Vermont 3
## 35 North Dakota 4
## 2 Alaska 5
## 42 South Dakota 6
## 8 Delaware 7
## 27 Montana 8
## 40 Rhode Island 9
## 30 New Hampshire 10
## 20 Maine 11
## 12 Hawaii 12
## 13 Idaho 13
## 28 Nebraska 14
## 49 West Virginia 15
## 32 New Mexico 16
## 29 Nevada 17
## 45 Utah 18
## 17 Kansas 19
## 4 Arkansas 20
## 25 Mississippi 21
## 16 Iowa 22
## 7 Connecticut 23
## 37 Oklahoma 24
## 38 Oregon 25
## 18 Kentucky 26
## 19 Louisiana 27
## 41 South Carolina 28
## 1 Alabama 29
## 6 Colorado 30
## 24 Minnesota 31
## 50 Wisconsin 32
## 21 Maryland 33
## 26 Missouri 34
## 43 Tennessee 35
## 3 Arizona 36
## 15 Indiana 37
## 22 Massachusetts 38
## 48 Washington 39
## 47 Virginia 40
## 31 New Jersey 41
## 34 North Carolina 42
## 23 Michigan 43
## 11 Georgia 44
## 36 Ohio 45
## 39 Pennsylvania 46
## 14 Illinois 47
## 33 New York 48
## 10 Florida 49
## 44 Texas 50
## 5 California 51
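Indexing a data frame with [ind, ] reorders all of its columns at once, so each row stays intact. The original row numbers are kept as row names, which is why the leftmost column in the printed output above is not 1, 2, 3, and so on. Passing decreasing = TRUE to order() gives the reverse order. A sketch with a made-up data frame:

```r
# Toy data frame with made-up values
df <- data.frame(state = c("A", "B", "C"),
                 population = c(30, 10, 20),
                 stringsAsFactors = FALSE)
# Least to most populous
asc <- df[order(df$population), ]
print(asc$state)        # "B" "C" "A"
print(rownames(asc))    # "2" "3" "1"
# Most to least populous
desc <- df[order(df$population, decreasing = TRUE), ]
print(desc$state)       # "A" "C" "B"
```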
The na_example vector represents a series of counts. You can quickly examine the object using:
data("na_example")
str(na_example)
#> int [1:1000] 2 1 3 2 1 3 1 4 3 2 ...
However, when we compute the average with the function mean, we obtain an NA:
mean(na_example)
#> [1] NA
The is.na function returns a logical vector that tells us which entries are NA. Assign this logical vector to an object called ind and determine how many NAs na_example has.
# Load the na_example vector
data("na_example")
# Check the structure of na_example
str(na_example)
## int [1:1000] 2 1 3 2 1 3 1 4 3 2 ...
# Compute a logical vector indicating NA values
ind <- is.na(na_example)
# Count the number of NA values
num_nas <- sum(ind)
# Print the number of NA values
print(num_nas)
## [1] 145
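This works because R treats TRUE as 1 and FALSE as 0 in arithmetic, so sum(is.na(x)) counts the missing entries and mean(is.na(x)) gives their proportion. A sketch on a made-up vector:

```r
# Made-up values with two missing entries
x <- c(2, NA, 3, NA, 1)
ind <- is.na(x)
print(ind)         # FALSE TRUE FALSE TRUE FALSE
print(sum(ind))    # 2
print(mean(ind))   # 0.4
```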
Now compute the average again, but only for the entries that are not NA. Hint: remember the ! operator.
# Load the na_example vector
data("na_example")
# Compute a logical vector indicating non-NA values
ind <- !is.na(na_example)
# Compute the average for non-NA entries
average_non_na <- mean(na_example[ind])
# Print the average for non-NA entries
print(average_non_na)
## [1] 2.301754
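mean() also accepts na.rm = TRUE, which drops the NAs before averaging and gives the same result as subsetting with !is.na(). A sketch on made-up values:

```r
# Made-up values with two missing entries
x <- c(2, NA, 3, NA, 1)
print(mean(x))                  # NA
print(mean(x[!is.na(x)]))       # 2
print(mean(x, na.rm = TRUE))    # 2
```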
Previously we created this data frame:
temp <- c(35, 88, 42, 84, 81, 30)
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")
city_temps <- data.frame(name = city, temperature = temp)
Remake the data frame using the code above, but add a line that converts the temperature from Fahrenheit to Celsius. The conversion is C = 5/9 × (F − 32).
# Define the temperature data in Fahrenheit
temp_fahrenheit <- c(35, 88, 42, 84, 81, 30)
# Convert Fahrenheit to Celsius using the conversion formula
temp_celsius <- (5/9) * (temp_fahrenheit - 32)
# Define the city names
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")
# Create the data frame with city names and temperatures in Celsius
city_temps <- data.frame(name = city, temperature = temp_celsius)
# Print the data frame
print(city_temps)
## name temperature
## 1 Beijing 1.666667
## 2 Lagos 31.111111
## 3 Paris 5.555556
## 4 Rio de Janeiro 28.888889
## 5 San Juan 27.222222
## 6 Toronto -1.111111
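The conversion is vectorized: 5/9 * (F - 32) is applied to every element at once, so no loop is needed. One variation, assuming you also want to keep the original Fahrenheit values, is to add the Celsius values as an extra column with $. The names city_temps_both and celsius are illustrative choices, not part of the exercise:

```r
temp <- c(35, 88, 42, 84, 81, 30)
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")
city_temps_both <- data.frame(name = city, temperature = temp)
# Add a column computed from an existing one; every row is converted at once
city_temps_both$celsius <- 5/9 * (city_temps_both$temperature - 32)
print(round(city_temps_both$celsius, 2))   # 1.67 31.11 5.56 28.89 27.22 -1.11
```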
What is the following sum: $1 + \frac{1}{2^2} + \frac{1}{3^2} + \dots + \frac{1}{100^2}$? Hint: thanks to Euler, we know it should be close to $\pi^2/6$.
# Initialize a variable to store the sum
sum_result <- 0
# Calculate the sum
for (i in 1:100) {
sum_result <- sum_result + 1 / (i^2)
}
# Calculate the result
result <- sum_result
# Print the result
print(result)
## [1] 1.634984
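The loop can also be written as a single vectorized expression: 1 / n^2 builds all 100 terms at once (^ binds more tightly than /) and sum() adds them. Comparing with pi^2/6 shows that the partial sum falls short by roughly 0.01, which is the size of the omitted tail of the series:

```r
n <- 1:100
# All 100 terms 1/n^2 at once, then their sum
s <- sum(1 / n^2)
print(s)              # 1.634984
# Gap to Euler's limit pi^2/6
print(pi^2 / 6 - s)   # about 0.00995
```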
Compute the per 100,000 murder rate for each state and store it in the object murder_rate. Then compute the average murder rate for the US using the function mean. What is the average?
# Compute the murder rate per 100,000 for each state
murder_rate <- (murders$total * 100000) / murders$population
# Compute the average murder rate for the US
average_murder_rate <- mean(murder_rate, na.rm = TRUE)
# Print the average murder rate
print(average_murder_rate)
## [1] 2.779125
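Note that mean(murder_rate) gives every state the same weight regardless of its size. If the goal is instead the rate for the country as a whole, one option is to divide total murders by total population, which is the same as a population-weighted mean of the state rates. A sketch with two made-up states:

```r
# Two made-up states: a large one and a small one
total <- c(10, 5)
population <- c(1e6, 1e5)
rate <- total / population * 100000
print(rate)                                    # 1 5
# Unweighted average of the state rates
print(mean(rate))                              # 3
# Overall rate, equivalently a population-weighted mean
print(sum(total) / sum(population) * 100000)   # 1.363636
print(weighted.mean(rate, population))         # 1.363636
```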