JAT believes that the average number of app users has increased significantly over the years. Is there statistical evidence that the average number of users in 2017-2018 is greater than the average number of users in 2015-2016 at α = 0.05? Support your answer with all necessary tests.
data4 <- data1 %>% select(user.number, year)
# Ho: mean users in 2017-2018 <= mean users in 2015-2016
# Ha: mean users in 2017-2018 > mean users in 2015-2016
# x = 2017-2018 users, y = 2015-2016 users; the x/y interface takes no data argument, so it is omitted
t.test(x = data4$user.number[data4$year == "2017" | data4$year == "2018"], y = data4$user.number[data4$year == "2015" | data4$year == "2016"], var.equal = TRUE)
##
## Two Sample t-test
##
## data: data4$user.number[data4$year == "2017" | data4$year == "2018"] and data4$user.number[data4$year == "2015" | data4$year == "2016"]
## t = 9.2567, df = 121, p-value = 0.0000000000000009507
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 103.0074 159.0556
## sample estimates:
## mean of x mean of y
## 181.10000 50.06849
#OR
data4.1 <- data4 %>% mutate(year.range = case_when(year %in% c("2017", "2018") ~ "2017-2018", year %in% c("2015", "2016") ~ "2015-2016"))
# mutate() with case_when() builds a new grouping column from the categorical year column
t.test(data4.1$user.number ~ data4.1$year.range, var.equal = TRUE)
##
## Two Sample t-test
##
## data: data4.1$user.number by data4.1$year.range
## t = -9.2567, df = 121, p-value = 0.0000000000000009507
## alternative hypothesis: true difference in means between group 2015-2016 and group 2017-2018 is not equal to 0
## 95 percent confidence interval:
## -159.0556 -103.0074
## sample estimates:
## mean in group 2015-2016 mean in group 2017-2018
## 50.06849 181.10000
#OR
data1$group <- factor(ifelse(data1$month.year <= "2016-12-01", 0,1))
# ifelse() assigns group 0 to months up to 2016-12-01 and group 1 to later months
t.test(data1$user.number ~ data1$group, var.equal = TRUE)
##
## Two Sample t-test
##
## data: data1$user.number by data1$group
## t = -9.2567, df = 121, p-value = 0.0000000000000009507
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -159.0556 -103.0074
## sample estimates:
## mean in group 0 mean in group 1
## 50.06849 181.10000
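# All three calls above run a two-sided test (R's default), while Ha is one-sided.
# The 2017-2018 mean (181.10) exceeds the 2015-2016 mean (50.07), so the one-sided p-value is half of the two-sided value (about 4.75e-16).
# Since the p-value is far below 0.05, we reject the null hypothesis: the average number of users in 2017-2018 is higher than in 2015-2016.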
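The calls above use a two-sided alternative and assume equal variances without checking them. A minimal sketch of how the one-sided test and a variance check could be written is shown here; the data frame `demo` and all of its values are simulated for illustration only and are not the JAT data.

```r
# Hypothetical stand-in for data4 (simulated values, not the JAT data)
set.seed(1)
demo <- data.frame(
  user.number = c(rnorm(30, mean = 50, sd = 20), rnorm(30, mean = 180, sd = 25)),
  year = rep(c("2015", "2016", "2017", "2018"), each = 15)
)

later   <- demo$user.number[demo$year %in% c("2017", "2018")]
earlier <- demo$user.number[demo$year %in% c("2015", "2016")]

# F test of equal variances; a small p-value argues against var.equal = TRUE
var_check <- var.test(later, earlier)

# One-sided test matching Ha: mean(later) > mean(earlier)
one_sided <- t.test(later, earlier, alternative = "greater", var.equal = TRUE)

# Welch's test (R's default) does not assume equal variances
welch <- t.test(later, earlier, alternative = "greater")

# For comparison: when t > 0, the one-sided p-value is half the two-sided one
two_sided <- t.test(later, earlier, var.equal = TRUE)
```

If the F test suggests unequal variances, the Welch result is the safer one to report. With the formula interface used in the second and third variants, the earlier period is the first group, so the matching one-sided alternative there would be `"less"`.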
If a disease is likely to spread in particular weather conditions (given in the disease index sheet), then access of that disease should be higher in the months with suitable weather conditions. Help the analyst devise a statistical test to support this claim for the two districts for which samples of weather and disease access data are provided in the data sheet. Identify the diseases for which the claim can be supported. Test the claim for both temperature and relative humidity at 95% confidence.
# import Belagavi weather data
bel_weather <- read_excel("JAT.xlsx", sheet = "Belagavi_weather")
bel_weather <- bel_weather %>% rename(relative.humidity = `Relative Humidity`)
#Calculation for D1
data8.1 <- bel_weather %>% select(D1, `relative.humidity`, Temperature)
# Ho: mean access of D1 in months with suitable weather <= mean access in other months
# Ha: mean access of D1 in months with suitable weather > mean access in other months
data9.1 <- data8.1 %>% mutate(D1.probability = case_when(relative.humidity > 80 & Temperature >= 20 & Temperature <= 24 ~ "Favourable", relative.humidity <= 80 | Temperature < 20 | Temperature > 24 ~ "Unfavourable"))
# The step above creates a conditional column: months with humidity > 80 and temperature within 20-24 are labelled "Favourable" (access of D1 expected to be higher); all other months are labelled "Unfavourable".
# Next, test whether the difference between the Favourable and Unfavourable means is statistically significant.
t.test(data9.1$D1 ~ data9.1$D1.probability, alternative = "greater", var.equal = TRUE)
##
## Two Sample t-test
##
## data: data9.1$D1 by data9.1$D1.probability
## t = 2.7605, df = 22, p-value = 0.005707
## alternative hypothesis: true difference in means between group Favourable and group Unfavourable is greater than 0
## 95 percent confidence interval:
## 9.704442 Inf
## sample estimates:
## mean in group Favourable mean in group Unfavourable
## 37.59305 11.91669
# Since the p-value (0.005707) is below 0.05, we reject the null hypothesis: access of D1 is higher in months with suitable weather conditions.
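With the formula interface, `alternative = "greater"` refers to the mean of the first factor level minus the mean of the second. "Favourable" sorts before "Unfavourable" (and "High" before "Low") alphabetically, which is why the calls in this section test the intended direction. The sketch below makes that order explicit; the data frame `demo` and its values are made up for illustration.

```r
# Hypothetical monthly access values (made up for illustration)
demo <- data.frame(
  access = c(40, 35, 38, 42, 12, 10, 15, 9, 11, 14),
  condition = c(rep("Favourable", 4), rep("Unfavourable", 6))
)

# Fix the level order so the first level is the group expected to be larger
demo$condition <- factor(demo$condition, levels = c("Favourable", "Unfavourable"))

# Tests mean(Favourable) - mean(Unfavourable) > 0
res <- t.test(access ~ condition, data = demo, alternative = "greater", var.equal = TRUE)

# estimate[1] is the mean of the first level, estimate[2] of the second
res$estimate
```

Setting the levels explicitly guards against a silent reversal of the test direction if the labels are ever renamed.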
#Calculation for D2
data8.2 <- bel_weather %>% select(D2, `relative.humidity`, Temperature)
data9.2 <- data8.2 %>% mutate(D2.probability = case_when(relative.humidity > 83 & Temperature >= 21.5 & Temperature <= 24.5 ~ "High", relative.humidity <= 83 | Temperature < 21.5 | Temperature > 24.5 ~ "Low"))
# Ho: mean access of D2 in months with suitable weather <= mean access in other months
# Ha: mean access of D2 in months with suitable weather > mean access in other months
t.test(data9.2$D2 ~ data9.2$D2.probability, alternative = "greater", var.equal = TRUE)
##
## Two Sample t-test
##
## data: data9.2$D2 by data9.2$D2.probability
## t = 3.7247, df = 22, p-value = 0.0005887
## alternative hypothesis: true difference in means between group High and group Low is greater than 0
## 95 percent confidence interval:
## 10.89113 Inf
## sample estimates:
## mean in group High mean in group Low
## 29.380223 9.173547
# Since the p-value (0.0005887) is below 0.05, we reject the null hypothesis: access of D2 is higher in months with suitable weather conditions.
#Calculation for D3
data8.3 <- bel_weather %>% select(D3, `relative.humidity`, Temperature)
data9.3 <- data8.3 %>% mutate(D3.probability = case_when( Temperature >= 22 & Temperature <= 24 ~ "High", Temperature < 22 | Temperature > 24 ~ "Low"))
# Ho: mean access of D3 in months with suitable weather <= mean access in other months
# Ha: mean access of D3 in months with suitable weather > mean access in other months
t.test(data9.3$D3 ~ data9.3$D3.probability, alternative = "greater", var.equal = TRUE)
##
## Two Sample t-test
##
## data: data9.3$D3 by data9.3$D3.probability
## t = 2.2224, df = 22, p-value = 0.01843
## alternative hypothesis: true difference in means between group High and group Low is greater than 0
## 95 percent confidence interval:
## 4.39784 Inf
## sample estimates:
## mean in group High mean in group Low
## 30.95773 11.61233
# Since the p-value (0.01843) is below 0.05, we reject the null hypothesis: access of D3 is higher in months with suitable weather conditions.
# Calculation for D4
data8.4 <- bel_weather %>% select(D4, `relative.humidity`, Temperature)
data9.4 <- data8.4 %>% mutate(D4.probability = case_when(relative.humidity > 85 & Temperature >= 22 & Temperature <= 26 ~ "High", relative.humidity <= 85 | Temperature < 22 | Temperature > 26 ~ "Low"))
# Ho: mean access of D4 in months with suitable weather <= mean access in other months
# Ha: mean access of D4 in months with suitable weather > mean access in other months
t.test(data9.4$D4 ~ data9.4$D4.probability, alternative = "greater", var.equal = TRUE)
##
## Two Sample t-test
##
## data: data9.4$D4 by data9.4$D4.probability
## t = 1.793, df = 22, p-value = 0.04337
## alternative hypothesis: true difference in means between group High and group Low is greater than 0
## 95 percent confidence interval:
## 0.4785112 Inf
## sample estimates:
## mean in group High mean in group Low
## 24.28984 12.97384
# Since the p-value (0.04337) is below 0.05, we reject the null hypothesis: access of D4 is higher in months with suitable weather conditions.
#Calculation for D5
data8.5 <- bel_weather %>% select(D5, `relative.humidity`, Temperature)
data9.5 <- data8.5 %>% mutate(D5.probability = case_when(relative.humidity >= 77 & relative.humidity <= 85 & Temperature >= 22 & Temperature <= 24.5 ~ "High", relative.humidity < 77 | relative.humidity > 85 | Temperature < 22 | Temperature > 24.5 ~ "Low"))
# Ho: mean access of D5 in months with suitable weather <= mean access in other months
# Ha: mean access of D5 in months with suitable weather > mean access in other months
t.test(data9.5$D5 ~ data9.5$D5.probability, alternative = "greater", var.equal = TRUE)
##
## Two Sample t-test
##
## data: data9.5$D5 by data9.5$D5.probability
## t = 3.6675, df = 22, p-value = 0.0006761
## alternative hypothesis: true difference in means between group High and group Low is greater than 0
## 95 percent confidence interval:
## 13.85781 Inf
## sample estimates:
## mean in group High mean in group Low
## 36.57407 10.51547
# Since the p-value (0.0006761) is below 0.05, we reject the null hypothesis: access of D5 is higher in months with suitable weather conditions.
# Calculation for D7
data8.7 <- bel_weather %>% select(D7, `relative.humidity`, Temperature)
data9.7 <- data8.7 %>% mutate(D7.probability = case_when(relative.humidity > 80 & Temperature > 25 ~ "High", relative.humidity <= 80 | Temperature <= 25 ~ "Low"))
# Ho: mean access of D7 in months with suitable weather <= mean access in other months
# Ha: mean access of D7 in months with suitable weather > mean access in other months
t.test(data9.7$D7 ~ data9.7$D7.probability, alternative = "greater", var.equal = TRUE)
##
## Two Sample t-test
##
## data: data9.7$D7 by data9.7$D7.probability
## t = 3.4275, df = 22, p-value = 0.001204
## alternative hypothesis: true difference in means between group High and group Low is greater than 0
## 95 percent confidence interval:
## 25.65723 Inf
## sample estimates:
## mean in group High mean in group Low
## 72.42328 21.00642
# Since the p-value (0.001204) is below 0.05, we reject the null hypothesis: access of D7 is higher in months with suitable weather conditions.
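Each disease block above repeats the same two steps: label months as favourable or not, then run a one-sided pooled t-test. A minimal sketch of how those steps could be wrapped in one function follows. The helper `test_disease()` is hypothetical and not part of the original analysis, it uses base R rather than dplyr, and the `weather` data frame holds made-up values, not the Belagavi sheet.

```r
# Hypothetical helper: labels each month "High" or "Low" from a logical
# vector and runs the one-sided pooled t-test used throughout this section.
test_disease <- function(access, favourable) {
  group <- factor(ifelse(favourable, "High", "Low"), levels = c("High", "Low"))
  t.test(access ~ group, alternative = "greater", var.equal = TRUE)
}

# Made-up stand-in for a 24-month weather sheet
weather <- data.frame(
  Temperature = rep(c(22, 24, 26, 28), times = 6),
  relative.humidity = rep(c(70, 75, 82, 85, 88, 78), each = 4)
)

# D7 is considered favoured by humidity > 80 and temperature > 25
fav <- weather$relative.humidity > 80 & weather$Temperature > 25

# Made-up access values: higher in favourable months, plus some spread
weather$D7 <- ifelse(fav, 70, 20) + rep(c(-5, 0, 5), length.out = 24)

res <- test_disease(weather$D7, fav)
res$p.value
```

Passing the favourable condition as a logical vector keeps the disease-specific thresholds at the call site, so the same function would serve every disease and both districts.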
# For Dharwad weather
# import Dharwad weather data
dhar_weather <- read_excel("JAT.xlsx", sheet = "Dharwad_weather")
#Calculation for D1
data10.1 <- dhar_weather %>% select(D1, Temperature, `Relative Humidity`)
data11.1 <- data10.1 %>% mutate(D1.probability = case_when(`Relative Humidity` > 80 & Temperature >=20 & Temperature <= 24 ~ "High", `Relative Humidity`<= 80 | Temperature < 20 | Temperature >24 ~ "Low"))
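# Ho: mean access of D1 in months with suitable weather <= mean access in other months
# Ha: mean access of D1 in months with suitable weather > mean access in other months
t.test(data11.1$D1 ~ data11.1$D1.probability, alternative = "greater", var.equal = TRUE)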
##
## Two Sample t-test
##
## data: data11.1$D1 by data11.1$D1.probability
## t = 4.5934, df = 20, p-value = 0.00008801
## alternative hypothesis: true difference in means between group High and group Low is greater than 0
## 95 percent confidence interval:
## 15.66022 Inf
## sample estimates:
## mean in group High mean in group Low
## 31.590651 6.515126
# Since the p-value (0.00008801) is below 0.05, we reject the null hypothesis: access of D1 is higher in months with suitable weather conditions.
#Calculation for D2
data10.2 <- dhar_weather %>% select(D2, Temperature, `Relative Humidity`)
data11.2 <- data10.2 %>% mutate(D2.probability = case_when(`Relative Humidity` > 83 & Temperature >= 21.5 & Temperature <= 24.5 ~ "High", `Relative Humidity` <= 83 | Temperature < 21.5 | Temperature > 24.5 ~ "Low"))
# Ho: mean access of D2 in months with suitable weather <= mean access in other months
# Ha: mean access of D2 in months with suitable weather > mean access in other months
t.test(data11.2$D2 ~ data11.2$D2.probability, alternative = "greater", var.equal = TRUE)
##
## Two Sample t-test
##
## data: data11.2$D2 by data11.2$D2.probability
## t = 4.0726, df = 20, p-value = 0.0002968
## alternative hypothesis: true difference in means between group High and group Low is greater than 0
## 95 percent confidence interval:
## 19.62338 Inf
## sample estimates:
## mean in group High mean in group Low
## 40.134921 6.096486
# Since the p-value (0.0002968) is below 0.05, we reject the null hypothesis: access of D2 is higher in months with suitable weather conditions.
#Calculation for D3
data10.3 <- dhar_weather %>% select(D3, `Relative Humidity`, Temperature)
data11.3 <- data10.3 %>% mutate(D3.probability = case_when( Temperature >= 22 & Temperature <= 24 ~ "High", Temperature < 22 | Temperature > 24 ~ "Low"))
# Ho: mean access of D3 in months with suitable weather <= mean access in other months
# Ha: mean access of D3 in months with suitable weather > mean access in other months
t.test(data11.3$D3 ~ data11.3$D3.probability, alternative = "greater", var.equal = TRUE)
##
## Two Sample t-test
##
## data: data11.3$D3 by data11.3$D3.probability
## t = 1.5057, df = 20, p-value = 0.07389
## alternative hypothesis: true difference in means between group High and group Low is greater than 0
## 95 percent confidence interval:
## -4.118138 Inf
## sample estimates:
## mean in group High mean in group Low
## 40.26971 11.96166
# Since the p-value (0.07389) is above 0.05, we fail to reject the null hypothesis: the data do not show that access of D3 is higher in months with suitable weather conditions.
# Calculation for D4
data10.4 <- dhar_weather %>% select(D4, `Relative Humidity`, Temperature)
data11.4 <- data10.4 %>% mutate(D4.probability = case_when(`Relative Humidity` > 85 & Temperature >= 22 & Temperature <= 26 ~ "High", `Relative Humidity` <= 85 | Temperature < 22 | Temperature > 26 ~ "Low"))
# Ho: mean access of D4 in months with suitable weather <= mean access in other months
# Ha: mean access of D4 in months with suitable weather > mean access in other months
t.test(data11.4$D4 ~ data11.4$D4.probability, alternative = "greater", var.equal = TRUE)
##
## Two Sample t-test
##
## data: data11.4$D4 by data11.4$D4.probability
## t = 2.3147, df = 20, p-value = 0.01569
## alternative hypothesis: true difference in means between group High and group Low is greater than 0
## 95 percent confidence interval:
## 6.896259 Inf
## sample estimates:
## mean in group High mean in group Low
## 39.16667 12.10875
# Since the p-value (0.01569) is below 0.05, we reject the null hypothesis: access of D4 is higher in months with suitable weather conditions.
#Calculation for D5
data10.5 <- dhar_weather %>% select(D5, `Relative Humidity`, Temperature)
data11.5 <- data10.5 %>% mutate(D5.probability = case_when(`Relative Humidity` >= 77 & `Relative Humidity` <= 85 & Temperature >= 22 & Temperature <= 24.5 ~ "High", `Relative Humidity` < 77 | `Relative Humidity` > 85 | Temperature < 22 | Temperature > 24.5 ~ "Low"))
# Ho: mean access of D5 in months with suitable weather <= mean access in other months
# Ha: mean access of D5 in months with suitable weather > mean access in other months
t.test(data11.5$D5 ~ data11.5$D5.probability, alternative = "greater", var.equal = TRUE)
##
## Two Sample t-test
##
## data: data11.5$D5 by data11.5$D5.probability
## t = 0.10853, df = 20, p-value = 0.4573
## alternative hypothesis: true difference in means between group High and group Low is greater than 0
## 95 percent confidence interval:
## -16.53381 Inf
## sample estimates:
## mean in group High mean in group Low
## 14.17749 13.06725
# Since the p-value (0.4573) is above 0.05, we fail to reject the null hypothesis: the data do not show that access of D5 is higher in months with suitable weather conditions.
# Calculation for D7
data10.7 <- dhar_weather %>% select(D7, `Relative Humidity`, Temperature)
data11.7 <- data10.7 %>% mutate(D7.probability = case_when(`Relative Humidity` > 80 & Temperature > 25 ~ "High", `Relative Humidity` <= 80 | Temperature <= 25 ~ "Low"))
# Ho: mean access of D7 in months with suitable weather <= mean access in other months
# Ha: mean access of D7 in months with suitable weather > mean access in other months
t.test(data11.7$D7 ~ data11.7$D7.probability, alternative = "greater", var.equal = TRUE)
##
## Two Sample t-test
##
## data: data11.7$D7 by data11.7$D7.probability
## t = 0.72663, df = 20, p-value = 0.2379
## alternative hypothesis: true difference in means between group High and group Low is greater than 0
## 95 percent confidence interval:
## -20.73009 Inf
## sample estimates:
## mean in group High mean in group Low
## 35.00000 19.90822
# Since the p-value (0.2379) is above 0.05, we fail to reject the null hypothesis: the data do not show that access of D7 is higher in months with suitable weather conditions.
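The question asks which diseases support the claim, so the twelve results can be collected in one table. The sketch below simply copies the p-values from the t-test outputs reported above and flags those below 0.05; the `results` data frame and its column names are introduced here for illustration.

```r
# p-values copied from the t-test outputs reported above
results <- data.frame(
  district = rep(c("Belagavi", "Dharwad"), each = 6),
  disease  = rep(c("D1", "D2", "D3", "D4", "D5", "D7"), times = 2),
  p.value  = c(0.005707, 0.0005887, 0.01843, 0.04337, 0.0006761, 0.001204,
               0.00008801, 0.0002968, 0.07389, 0.01569, 0.4573, 0.2379)
)

# TRUE where the one-sided test rejects Ho at alpha = 0.05
results$supported <- results$p.value < 0.05

# District/disease pairs for which the claim is supported
results[results$supported, c("district", "disease")]
```

On these numbers, the claim is supported for all six diseases in Belagavi and for D1, D2 and D4 in Dharwad; it is not supported for D3, D5 and D7 in Dharwad.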