BS Data Science: Exercises 3.11 & 3.13

R Markdown

Exercise 3.11

Q1. Use the $ operator to access the population size data and store it as the object pop. Then use the sort function to redefine pop so that it is sorted. Finally, use the [ operator to report the smallest population.

library(dslabs)
data(murders)

pop <- murders$population          # population of each state, in data-frame row order
sorted_pop <- sort(pop)            # the same values, smallest first
smallest_population <- sorted_pop[1]
print(smallest_population)

## [1] 563626
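As a quick check on the idea behind Q1, here is a sketch on a small made-up vector (the name pop_toy and its values are hypothetical, not the murders data). Taking the first element after sort() and calling base R's min() return the same smallest value; min() simply avoids sorting the whole vector.

```r
# Hypothetical toy populations (not the murders data)
pop_toy <- c(4500, 710, 12000, 560)

sorted_toy <- sort(pop_toy)   # values in increasing order
sorted_toy[1]                 # smallest value via sort() and [
min(pop_toy)                  # same value, without sorting everything
```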

Q2. Now, instead of the smallest population size, find the index of the entry with the smallest population size. Hint: use order instead of sort.

index_of_smallest_population <- order(pop)[1]
print(index_of_smallest_population)

## [1] 51
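The difference between sort and order is easiest to see on a tiny vector. In this sketch (the vector x is hypothetical), sort() returns the values themselves while order() returns the positions of those values, so indexing x by order(x) reproduces sort(x).

```r
x <- c(31, 4, 15, 9)   # hypothetical values
sort(x)                # the values in increasing order: 4 9 15 31
order(x)               # the positions of those values: 2 4 3 1
x[order(x)]            # indexing by order() gives the same result as sort()
```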

Q3. We can actually perform the same operation as in the previous exercise using the function which.min. Write one line of code that does this.

index_of_smallest_population <- which.min(pop)
print(index_of_smallest_population)

## [1] 51
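One detail worth checking is what happens when the minimum appears more than once. In this sketch (the vector y is hypothetical), which.min() returns the first position holding the minimum, order()[1] agrees because order() leaves tied values in their original order, and which() with a comparison lists every tied position.

```r
y <- c(7, 2, 9, 2)    # hypothetical vector with a tie for the minimum
which.min(y)          # 2: first position holding the minimum
order(y)[1]           # 2: ties keep their original order in order()
which(y == min(y))    # 2 4: every position holding the minimum
```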

Q4. Now we know how small the smallest state is and we know which row represents it. Which state is it? Define a variable states to be the state names from the murders data frame. Report the name of the state with the smallest population.

states <- murders$state
state_with_smallest_population <- states[index_of_smallest_population]
print(state_with_smallest_population)

## [1] "Wyoming"
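This works because states and pop are parallel vectors: both come from the same data frame, so position i in one matches position i in the other. A minimal sketch of the same lookup in one line, using hypothetical names toy_states and toy_pop:

```r
# Hypothetical state names and populations, in matching order
toy_states <- c("Alpha", "Beta", "Gamma")
toy_pop <- c(900, 150, 400)

# Index the names with the position of the smallest / largest population
toy_states[which.min(toy_pop)]   # "Beta"
toy_states[which.max(toy_pop)]   # "Alpha"
```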

Q5. You can create a data frame using the data.frame function. Here is a quick example:

temp <- c(35, 88, 42, 84, 81, 30)
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")
city_temps <- data.frame(name = city, temperature = temp)

Use the rank function to determine the population rank of each state from smallest population size to biggest. Save these ranks in an object called ranks, then create a data frame with the state name and its rank. Call the data frame my_df.

ranks <- rank(pop)
my_df <- data.frame(state = states, rank = ranks)
my_df

##                   state rank
## 1               Alabama   29
## 2                Alaska    5
## 3               Arizona   36
## 4              Arkansas   20
## 5            California   51
## 6              Colorado   30
## 7           Connecticut   23
## 8              Delaware    7
## 9  District of Columbia    2
## 10              Florida   49
## 11              Georgia   44
## 12               Hawaii   12
## 13                Idaho   13
## 14             Illinois   47
## 15              Indiana   37
## 16                 Iowa   22
## 17               Kansas   19
## 18             Kentucky   26
## 19            Louisiana   27
## 20                Maine   11
## 21             Maryland   33
## 22        Massachusetts   38
## 23             Michigan   43
## 24            Minnesota   31
## 25          Mississippi   21
## 26             Missouri   34
## 27              Montana    8
## 28             Nebraska   14
## 29               Nevada   17
## 30        New Hampshire   10
## 31           New Jersey   41
## 32           New Mexico   16
## 33             New York   48
## 34       North Carolina   42
## 35         North Dakota    4
## 36                 Ohio   45
## 37             Oklahoma   24
## 38               Oregon   25
## 39         Pennsylvania   46
## 40         Rhode Island    9
## 41       South Carolina   28
## 42         South Dakota    6
## 43            Tennessee   35
## 44                Texas   50
## 45                 Utah   18
## 46              Vermont    3
## 47             Virginia   40
## 48           Washington   39
## 49        West Virginia   15
## 50            Wisconsin   32
## 51              Wyoming    1
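rank and order are closely related but answer different questions: rank(x) gives the rank of each entry in its original position, while order(x) gives the positions that would sort x. The sketch below (x is a hypothetical vector without ties) shows that applying order() twice recovers rank(), and that by default rank() gives tied values the average of their ranks.

```r
x <- c(31, 4, 15, 9)   # hypothetical values without ties
rank(x)                # 4 1 3 2: rank of each entry, in the original order
order(x)               # 2 4 3 1: positions that would sort x
order(order(x))        # 4 1 3 2: with no ties, this equals rank(x)

rank(c(10, 20, 20))    # 1.0 2.5 2.5: ties share the average rank by default
```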

ind <- order(ranks)
my_df <- my_df[ind, ]
my_df

##                   state rank
## 51              Wyoming    1
## 9  District of Columbia    2
## 46              Vermont    3
## 35         North Dakota    4
## 2                Alaska    5
## 42         South Dakota    6
## 8              Delaware    7
## 27              Montana    8
## 40         Rhode Island    9
## 30        New Hampshire   10
## 20                Maine   11
## 12               Hawaii   12
## 13                Idaho   13
## 28             Nebraska   14
## 49        West Virginia   15
## 32           New Mexico   16
## 29               Nevada   17
## 45                 Utah   18
## 17               Kansas   19
## 4              Arkansas   20
## 25          Mississippi   21
## 16                 Iowa   22
## 7           Connecticut   23
## 37             Oklahoma   24
## 38               Oregon   25
## 18             Kentucky   26
## 19            Louisiana   27
## 41       South Carolina   28
## 1               Alabama   29
## 6              Colorado   30
## 24            Minnesota   31
## 50            Wisconsin   32
## 21             Maryland   33
## 26             Missouri   34
## 43            Tennessee   35
## 3               Arizona   36
## 15              Indiana   37
## 22        Massachusetts   38
## 48           Washington   39
## 47             Virginia   40
## 31           New Jersey   41
## 34       North Carolina   42
## 23             Michigan   43
## 11              Georgia   44
## 36                 Ohio   45
## 39         Pennsylvania   46
## 14             Illinois   47
## 33             New York   48
## 10              Florida   49
## 44                Texas   50
## 5            California   51

Q6. Repeat the previous exercise, but this time order my_df so that the states are ordered from least populous to most populous. Hint: create an object ind that stores the indexes needed to order the population values. Then use the bracket operator [ to reorder each column in the data frame.

# Create a data frame with state names and population ranks
ranks <- rank(pop)
my_df <- data.frame(state = states, rank = ranks)

# Create an index to order the data frame
ind <- order(ranks)

# Reorder the data frame based on the population ranks
my_df <- my_df[ind, ]

# Now, my_df is ordered from least populous to most populous
my_df

##                   state rank
## 51              Wyoming    1
## 9  District of Columbia    2
## 46              Vermont    3
## 35         North Dakota    4
## 2                Alaska    5
## 42         South Dakota    6
## 8              Delaware    7
## 27              Montana    8
## 40         Rhode Island    9
## 30        New Hampshire   10
## 20                Maine   11
## 12               Hawaii   12
## 13                Idaho   13
## 28             Nebraska   14
## 49        West Virginia   15
## 32           New Mexico   16
## 29               Nevada   17
## 45                 Utah   18
## 17               Kansas   19
## 4              Arkansas   20
## 25          Mississippi   21
## 16                 Iowa   22
## 7           Connecticut   23
## 37             Oklahoma   24
## 38               Oregon   25
## 18             Kentucky   26
## 19            Louisiana   27
## 41       South Carolina   28
## 1               Alabama   29
## 6              Colorado   30
## 24            Minnesota   31
## 50            Wisconsin   32
## 21             Maryland   33
## 26             Missouri   34
## 43            Tennessee   35
## 3               Arizona   36
## 15              Indiana   37
## 22        Massachusetts   38
## 48           Washington   39
## 47             Virginia   40
## 31           New Jersey   41
## 34       North Carolina   42
## 23             Michigan   43
## 11              Georgia   44
## 36                 Ohio   45
## 39         Pennsylvania   46
## 14             Illinois   47
## 33             New York   48
## 10              Florida   49
## 44                Texas   50
## 5            California   51
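Putting ind in the row position of [ind, ] reorders every column of the data frame at once, which is why a single index vector is enough. Ordering by the population values directly gives the same rows as ordering by their ranks when there are no ties. A sketch on a small hypothetical data frame (toy_df and its values are made up), also showing decreasing = TRUE for the reverse order:

```r
toy_df <- data.frame(state = c("Alpha", "Beta", "Gamma"),
                     pop = c(900, 150, 400),
                     stringsAsFactors = FALSE)   # hypothetical data

ind_toy <- order(toy_df$pop)
toy_df[ind_toy, ]                                # least to most populous
toy_df[order(toy_df$pop, decreasing = TRUE), ]   # most to least populous
```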

Q7. The na_example vector represents a series of counts. You can quickly examine the object using:

data("na_example")

However, when we compute the average with the function mean, we obtain an NA:

mean(na_example)
#> [1] NA

The is.na function returns a logical vector that tells us which entries are NA. Assign this logical vector to an object called ind and determine how many NAs na_example has.

data("na_example")
str(na_example)

##  int [1:1000] 2 1 3 2 1 3 1 4 3 2 ...

ind <- is.na(na_example)
number_of_nas <- sum(ind)
print(number_of_nas)

## [1] 145
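Counting with sum() works because R treats TRUE as 1 and FALSE as 0 in arithmetic, so the same logical vector also gives the proportion of missing entries through mean(). A sketch on a hypothetical counts vector:

```r
counts <- c(2, NA, 3, 1, NA)   # hypothetical counts with missing values
ind_toy <- is.na(counts)       # FALSE TRUE FALSE FALSE TRUE
sum(ind_toy)                   # 2: number of NA entries
mean(ind_toy)                  # 0.4: proportion of entries that are NA
mean(counts)                   # NA: a single NA propagates through mean()
```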

#Compute the average again, but only for the entries that are not NA.
average_without_nas <- mean(na_example[!ind])
print(average_without_nas)

## [1] 2.301754

Q8. Now compute the average again, but only for the entries that are not NA. Hint: remember the ! operator.

data("na_example")

# Create a logical vector that identifies non-NA entries
not_na <- !is.na(na_example)

# Calculate the average only for non-NA entries
average_without_nas <- mean(na_example[not_na])

# Print the result
print(average_without_nas)

## [1] 2.301754
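Besides subsetting with !is.na(), base R's mean() has an na.rm argument that drops missing values before averaging. A sketch on a hypothetical vector comparing the two approaches:

```r
counts <- c(2, NA, 3, 1, NA)     # hypothetical counts
mean(counts[!is.na(counts)])     # 2: keep only the non-NA entries
mean(counts, na.rm = TRUE)       # 2: let mean() drop the NAs itself
```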

Exercise 3.13

Q1. Previously we created this data frame:

temp <- c(35, 88, 42, 84, 81, 30)
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")
city_temps <- data.frame(name = city, temperature = temp)

Remake the data frame using the code above, but add a line that converts the temperature from Fahrenheit to Celsius. The conversion is $C = \frac{5}{9} \times (F - 32)$.

# Create vectors for city names and temperatures in Fahrenheit
temp_fahrenheit <- c(35, 88, 42, 84, 81, 30)
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")

# Convert temperatures from Fahrenheit to Celsius using the conversion formula
temp_celsius <- (5/9) * (temp_fahrenheit - 32)

# Create the data frame with city names and temperatures in Celsius
city_temps <- data.frame(name = city, temperature_Celsius = temp_celsius)
city_temps

##             name temperature_Celsius
## 1        Beijing            1.666667
## 2          Lagos           31.111111
## 3          Paris            5.555556
## 4 Rio de Janeiro           28.888889
## 5       San Juan           27.222222
## 6        Toronto           -1.111111
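If the same conversion is needed for other temperature vectors, one option is to wrap the formula in a small function. A sketch, where the function name fahrenheit_to_celsius and the toy_temps data frame are hypothetical; it also shows adding the converted values as a new column with $, which keeps the original Fahrenheit column alongside:

```r
# Hypothetical helper implementing C = 5/9 * (F - 32)
fahrenheit_to_celsius <- function(f) {
  (5 / 9) * (f - 32)
}

fahrenheit_to_celsius(c(32, 212, -40))   # 0 100 -40 (up to floating-point rounding)

# Add a Celsius column to an existing data frame (toy values)
toy_temps <- data.frame(name = c("CityA", "CityB"), temperature = c(50, 95))
toy_temps$temperature_c <- fahrenheit_to_celsius(toy_temps$temperature)
toy_temps
```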

Q2. What is the following sum: $1 + \frac{1}{2^2} + \frac{1}{3^2} + \dots + \frac{1}{100^2}$? Hint: thanks to Euler, we know it should be close to $\pi^2/6$.

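# Sum 1/n^2 for n = 1, ..., 100, as the question asks
n <- 1:100
sum_result <- sum(1/n^2)
sum_result

## [1] 1.634984

# Euler's limit for comparison
pi^2 / 6

## [1] 1.644934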
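To see how quickly the partial sums approach Euler's value, the sketch below (partial_sum is a hypothetical helper) measures how far the sum stops short of $\pi^2/6$ for N = 100 and N = 1000. The gap is close to 1/N, so each extra factor of ten in N buys roughly one more correct decimal.

```r
# Hypothetical helper: sum of 1/n^2 for n = 1, ..., N
partial_sum <- function(N) sum(1 / (1:N)^2)

gap_100  <- pi^2 / 6 - partial_sum(100)    # about 0.00995
gap_1000 <- pi^2 / 6 - partial_sum(1000)   # about 0.001
c(gap_100, gap_1000)
```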

Q3. Compute the per 100,000 murder rate for each state and store it in the object murder_rate. Then compute the average murder rate for the US using the function mean. What is the average?

# Calculate the murder rate for each state
murder_rate <- (murders$total / murders$population) * 100000

# Compute the average murder rate for the US
average_murder_rate_us <- mean(murder_rate)

# Print the average murder rate
print(average_murder_rate_us)

## [1] 2.779125
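Note that mean(murder_rate) gives every state equal weight, so it is the average of the state rates rather than the murder rate of the US population as a whole (total murders divided by total population). A toy sketch with two hypothetical states shows the difference; base R's weighted.mean(), with populations as weights, recovers the combined rate.

```r
# Two hypothetical states: a large one and a small one
toy_pop   <- c(1e6, 1e5)
toy_total <- c(10, 5)

toy_rate <- toy_total / toy_pop * 100000   # 1 and 5 per 100,000
mean(toy_rate)                             # 3: unweighted mean of the state rates
sum(toy_total) / sum(toy_pop) * 100000     # about 1.36: rate for the combined population
weighted.mean(toy_rate, toy_pop)           # same as the combined rate
```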