Assignment 1 (50 points)

install.packages("tidyverse")

Case 1 (20 points)

A survey of 417 individuals asks questions about how often they exercise, marital status, and annual income.

1. Load the Fitness dataset and write a script in R to calculate the 25th, 50th, and 75th percentiles of income. What does this tell you about the income distribution? (10 points)

Fitness <- read.csv("Fitness.csv") 
quantile(Fitness$Income, probs = c(0.25, 0.5, 0.75))
##      25%      50%      75% 
##  64218.0  80705.0 101133.8

answer: The percentiles show that 25% of individuals have an income below 64,218, 50% have an income below 80,705 (the median), and 25% have an income above 101,133.8. The middle 50% of incomes therefore span 36,915.8. The gap from the median to the 75th percentile (20,428.8) is wider than the gap from the 25th percentile to the median (16,487), which suggests a mildly right-skewed distribution in which a portion of individuals earn considerably more than the rest.
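
One way to check the skew reading numerically is to compare the two halves of the interquartile range. The sketch below uses only the three quartiles reported in the output above and computes Bowley's quartile skewness. That measure is not part of the original answer and is brought in here purely as an illustration; the variable names are assumptions.

```r
# Quartiles copied from the quantile() output above
q25 <- 64218.0
q50 <- 80705.0
q75 <- 101133.8

lower_gap <- q50 - q25   # distance from the 25th percentile to the median
upper_gap <- q75 - q50   # distance from the median to the 75th percentile

# Bowley's quartile skewness: positive values point to right skew,
# values near 0 to a roughly symmetric middle half
bowley <- (upper_gap - lower_gap) / (upper_gap + lower_gap)

lower_gap
upper_gap
round(bowley, 3)
```

The coefficient comes out at about 0.107: positive but small, which is consistent with a mild rather than a strong right skew.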

2. Use R to calculate the mean income for different marital statuses. What is the difference between mean income of married and that of non-married individuals? (10 points)

mean_income <- tapply(Fitness$Income, Fitness$Married, mean)
mean_income
##                No      Yes 
## 94913.00 81310.06 81829.97

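answer: The gap between “Yes” and “No” is Yes - No = 81829.97 - 81310.06 = 519.91, so married individuals earn on average $519.91 more than non-married individuals. The output also contains an unlabeled group (mean 94913.00) for records with a blank Married value; it is left out of this comparison.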
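
The unlabeled first column in the tapply() output suggests that some rows have an empty Married value. A minimal sketch of how such rows could be filtered out before comparing the two groups is given below. The data frame `toy` and its values are hypothetical and only stand in for Fitness.csv; they are not taken from the real dataset.

```r
# Hypothetical stand-in for the Fitness data, including one blank status
toy <- data.frame(
  Married = c("Yes", "No", "", "Yes", "No"),
  Income  = c(80000, 70000, 95000, 90000, 60000)
)

# Keep only rows with a known marital status
known <- toy[toy$Married %in% c("Yes", "No"), ]

# Group means, then the married minus non-married difference
means <- tapply(known$Income, known$Married, mean)
gap <- as.numeric(means["Yes"] - means["No"])

means
gap
```

With the toy values the group means are 65000 (No) and 85000 (Yes), so `gap` is 20000. The same filter applied to the real data would leave the Yes and No means unchanged and simply drop the blank column.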

Case 2 (30 points)

The Country data file shows the annual returns (in %) for a mutual fund focusing on investments in Latin America and a mutual fund focusing on investments in Canada over the past 20 years.

1. Write code to find out which fund had the higher average returns over this time period. Explain the result in your own words. (10 points)

Country <- read.csv("Country.csv")

summary(Country)
##       Year    Latin_America        Canada       
##  Min.   : 1   Min.   :-54.64   Min.   :-42.640  
##  1st Qu.: 7   1st Qu.:-17.23   1st Qu.: -9.610  
##  Median :13   Median :  4.11   Median : 12.280  
##  Mean   :13   Mean   : 10.58   Mean   :  9.433  
##  3rd Qu.:19   3rd Qu.: 41.11   3rd Qu.: 21.840  
##  Max.   :25   Max.   : 91.60   Max.   : 51.910
mean(Country$Latin_America)
## [1] 10.5768
mean(Country$Canada)
## [1] 9.4328

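answer: Latin_America had the higher average return over this period: 10.58% per year versus 9.43% for Canada, a difference of about 1.14 percentage points. Its median return (4.11%) is lower than Canada’s (12.28%), however, so the higher mean likely reflects a few very strong years rather than consistently better performance.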

2. Write code to measure the location (range, IQR, mean, variance, sd) of both Latin_America and Canada. (10 points) Compare and explain the result in your own words. (10 points) (20 points in total)

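# Range: largest minus smallest annual return
# (named range_la so that base R's range() is not masked)
range_la <- max(Country$Latin_America) - min(Country$Latin_America)
range_la
## [1] 146.24
# Interquartile range: 75th minus 25th percentile
iqr_la <- as.numeric(quantile(Country$Latin_America, 0.75) - quantile(Country$Latin_America, 0.25))
iqr_la
## [1] 58.34
# Mean absolute deviation from the mean
mean(abs(Country$Latin_America - mean(Country$Latin_America)))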
## [1] 31.98867
var(Country$Latin_America)
## [1] 1375.87
sd(Country$Latin_America)
## [1] 37.09272
# Same measures for Canada
range_ca <- max(Country$Canada) - min(Country$Canada)
range_ca
## [1] 94.55
iqr_ca <- as.numeric(quantile(Country$Canada, 0.75) - quantile(Country$Canada, 0.25))
iqr_ca
## [1] 31.45
# Mean absolute deviation from the mean
mean(abs(Country$Canada - mean(Country$Canada)))
## [1] 17.27686
var(Country$Canada)
## [1] 484.9471
sd(Country$Canada)
## [1] 22.02151

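answer: Every measure of spread is larger for Latin_America than for Canada: range 146.24 vs 94.55, IQR 58.34 vs 31.45, mean absolute deviation 31.99 vs 17.28, variance 1375.87 vs 484.95, and sd 37.09 vs 22.02. Canada’s returns are therefore more concentrated around their mean, while Latin_America’s returns are much more volatile.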
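
The same five measures can be computed for both funds in one pass instead of repeating each line per column. The sketch below is one possible way to do that; the helper `spread_summary` and the data frame `toy` are hypothetical names with made-up values, not part of the original script or of Country.csv.

```r
# Hypothetical helper: returns the five spread measures for one numeric vector
spread_summary <- function(x) {
  c(
    range    = diff(range(x)),            # max minus min
    iqr      = IQR(x),                    # 75th minus 25th percentile
    mad_mean = mean(abs(x - mean(x))),    # mean absolute deviation from the mean
    variance = var(x),
    sd       = sd(x)
  )
}

# Made-up returns standing in for the two fund columns
toy <- data.frame(
  fund_a = c(-10, 0, 5, 15, 40),
  fund_b = c(2, 4, 5, 6, 8)
)

# One column of results per fund, one row per measure
res <- sapply(toy, spread_summary)
res
```

On the real data the call would be `sapply(Country[c("Latin_America", "Canada")], spread_summary)`. `IQR()` uses the same default quantile definition as `quantile()`, so it should reproduce the manually computed values.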

Assignment 2 (50 points)

You have just obtained a dataset, stockprice, for analysis. It includes the date, AMZN, GOOG, and X Index.

1. Create a scatterplot for AMZN against X Index, set the title as “Amazon Stock Price against X Index”, name the X-axis as “Amazon Stock Price” and the Y-axis as “X Index”. (10 points) Comment on the relationship between AMZN and X Index (10 points). (20 points in total)

stockprice <- read.csv("Stockprice.csv")
plot(stockprice$X.Index ~ stockprice$AMZN, main = "Amazon Stock Price against X Index", xlab = "Amazon Stock Price", ylab = "X Index", pch = 16)

answer: The plot shows no clear relationship between X Index and the Amazon stock price. There is a large gap with no observations at stock prices of roughly 1200 to 1500, and the points gather into two groups, one on each side of that gap. Within each group the points are widely scattered.
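
A visual impression like this can be backed up with a correlation coefficient. The sketch below shows the idea with `cor()` on made-up vectors; `amzn` and `x_index` are hypothetical values chosen only to mimic two clusters separated by a gap, not figures from Stockprice.csv.

```r
# Made-up prices in two clusters with a gap between them
amzn    <- c(1000, 1100, 1150, 1600, 1700, 1750)
x_index <- c(210, 190, 205, 195, 215, 185)

# Pearson correlation: values near 0 indicate little linear relationship,
# values near -1 or 1 a strong one
r <- cor(amzn, x_index)
r
```

On the real data the equivalent call would be `cor(stockprice$AMZN, stockprice$X.Index)`; a value close to 0 would support the reading that there is no clear linear relationship.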

2. Update the above scatterplot so that when X Index is equal to or greater than 200, the color of the points changes, and add the legend as “Above or equal 200” and “Below 200”. (10 points)

plot(stockprice$X.Index ~ stockprice$AMZN,
     main = "Amazon Stock Price against X Index", 
     xlab = "Amazon Stock Price", ylab = "X Index", 
     col = ifelse(stockprice$X.Index >= 200, "red", "blue"),
     pch=16)
legend("topleft", legend=c("Above or equal 200", "Below 200"), pch=16, col=c("red", "blue"))
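
In the call above the colors are written out twice, once in `ifelse()` and once in `legend()`, so they can drift apart if one is edited. A possible alternative, sketched here under the assumption that a single named lookup vector is acceptable, derives both from the same source. The names `labels`, `palette_cols`, `group`, and `point_col` are hypothetical, and `x_index` holds made-up values.

```r
# Made-up index values around the 200 threshold
x_index <- c(150, 200, 250, 199.9)

# Single source of truth for legend labels and their colors
labels <- c("Below 200", "Above or equal 200")
palette_cols <- c("Below 200" = "blue", "Above or equal 200" = "red")

# Assign each point to a group, then look up its color by group name
group <- factor(ifelse(x_index >= 200, labels[2], labels[1]), levels = labels)
point_col <- unname(palette_cols[as.character(group)])

point_col
```

`point_col` could then be passed as `col = point_col` in `plot()`, with `legend = names(palette_cols)` and `col = palette_cols` in `legend()`, so the legend always matches the points.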