Hypothesis
H0 i.e. null hypothesis says: Disease 6 (d6) information was accessed at most 60 times on average (µ<=60)
Ha i.e. alternative hypothesis says: Disease 6 (d6) information was accessed more than 60 times on average (µ>60)
Approach
We compare the sample mean against a hypothesized population mean (µ = 60), and the population standard deviation is not given to us. So the hypothesis can be tested using a One Sample T-test.
getwd()
## [1] "C:/Users/pahar/OneDrive/IIM-K/Classes/2022-09-10"
setwd(dir="C:\\Users\\pahar\\OneDrive\\IIM-K\\Classes\\2022-09-10")
library(readxl)
all_jat_data<-read_xlsx("IMB733-XLS-ENG.xlsx",sheet = "Data Sheet",col_names = TRUE) # this is all data from IMB733-XLS-ENG.xlsx, sheet Data Sheet
library(janitor)
## Warning: package 'janitor' was built under R version 4.2.1
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
all_jat_data<-clean_names(all_jat_data) # using janitor package to replace spaces/other symbols with underscores and to use lowercase naming throughout in variable names
all_jat_q1<-subset(all_jat_data,all_jat_data$month_year>="2017-10-01") #this is all data from October 2017 onwards
#One Sample T-test
t.test(all_jat_q1$d6,mu=60,alternative = "greater")
##
## One Sample t-test
##
## data: all_jat_q1$d6
## t = 2.341, df = 28, p-value = 0.01329
## alternative hypothesis: true mean is greater than 60
## 95 percent confidence interval:
## 62.29976 Inf
## sample estimates:
## mean of x
## 68.41379
Outcomes
Using the One Sample T-test, we found that the sample mean is 68.41379, which is higher than the hypothesized mean of 60
t = 2.341, df = 28, p-value = 0.01329
As the p-value from the t-test is less than alpha (0.05), we can reject the null hypothesis i.e. µ<=60
The data support the alternative hypothesis i.e. µ>60
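As a cross-check, the p-value and the confidence bound can be recomputed from the figures printed by t.test. This is only a sketch: the inputs are the rounded values from the output above, so the results agree with it up to rounding.

```r
# Values copied from the t.test() output above
x_bar <- 68.41379   # sample mean of d6
mu0   <- 60         # hypothesized mean
t_obs <- 2.341      # reported t statistic
df    <- 28         # degrees of freedom (n - 1, so n = 29)

# Standard error implied by t = (x_bar - mu0) / se
se <- (x_bar - mu0) / t_obs

# One-sided p-value for Ha: mu > 60
p_value <- pt(t_obs, df, lower.tail = FALSE)

# Lower bound of the one-sided 95 percent confidence interval
ci_lower <- x_bar - qt(0.95, df) * se

round(c(se = se, p_value = p_value, ci_lower = ci_lower), 4)
```

Because the alternative is one-sided, the interval has only a lower bound, which is why the output reports Inf as the upper limit.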
Inference
Assumption
Avg number of weekly app users in 2015-2016 =µ1
Avg number of weekly app users in 2017-2018 =µ2
Hypothesis
H0 i.e. null hypothesis says: µ2 <= µ1
Ha i.e. alternative hypothesis says: µ2 > µ1
Approach
We will use a Two Sample T-test to compare the sample mean of the 2017-2018 data with that of the 2015-2016 data. This will show whether the null hypothesis can be rejected.
all_jat_q2<-all_jat_data
all_jat_q2$group<-factor(ifelse(all_jat_data$month_year>="2017-01-01","2017-2018","2015-2016"))
all_jat_q2$group<-relevel(all_jat_q2$group,ref = "2017-2018")
t.test(all_jat_q2$no_of_users~all_jat_q2$group,alternative = "greater",var.eq=TRUE)
##
## Two Sample t-test
##
## data: all_jat_q2$no_of_users by all_jat_q2$group
## t = 9.2567, df = 121, p-value = 4.753e-16
## alternative hypothesis: true difference in means between group 2017-2018 and group 2015-2016 is greater than 0
## 95 percent confidence interval:
## 107.5685 Inf
## sample estimates:
## mean in group 2017-2018 mean in group 2015-2016
## 181.10000 50.06849
Outcomes
Using the Two Sample T-test, we found that t = 9.2567, df = 121, p-value = 4.753e-16
As the p-value from the t-test is less than alpha (0.05), we can reject the null hypothesis i.e. µ2 <= µ1
The data support the alternative hypothesis i.e. µ2 > µ1: the sample mean is 181.10 weekly users in 2017-2018 against 50.07 in 2015-2016
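The relevel step matters because, with the formula interface, alternative = "greater" tests the first factor level minus the second. The sketch below uses synthetic numbers (not the app data; the data frame demo and its values are made up for illustration) to show that the one-sided p-value flips when the reference level changes.

```r
# Synthetic data for illustration only (not the app data)
demo <- data.frame(
  users = c(50, 55, 60, 45, 52, 180, 175, 190, 185, 170),
  group = factor(rep(c("2015-2016", "2017-2018"), each = 5))
)

# Default level order is alphabetical, so this tests
# mean(2015-2016) - mean(2017-2018) > 0
p_default <- t.test(users ~ group, data = demo,
                    alternative = "greater", var.equal = TRUE)$p.value

# With 2017-2018 as the reference level, this tests
# mean(2017-2018) - mean(2015-2016) > 0
demo$group <- relevel(demo$group, ref = "2017-2018")
p_releveled <- t.test(users ~ group, data = demo,
                      alternative = "greater", var.equal = TRUE)$p.value

c(default = p_default, releveled = p_releveled)
```

The two one-sided p-values add up to 1, so forgetting to relevel would lead to the opposite conclusion.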
Inference
Assumption
Avg App usage in first week of every month =µ1
Avg App usage in second week of every month =µ2
Avg App usage in third week of every month =µ3
Avg App usage in fourth week of every month =µ4
Hypothesis
H0 i.e. null hypothesis says: µ1 = µ2 = µ3 = µ4
Ha i.e. alternative hypothesis says: at least one of µ1, µ2, µ3, µ4 differs from the others
Approach
We will subset the data to use only the data from 01-01-2016 onward, as given in the question. A one-way ANOVA test can be used to compare Avg App usage across the four weeks. This will show whether the null hypothesis can be rejected.
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.2.1
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
all_jat_q3_gt16<-filter(all_jat_data,month_year>="2016-01-01")
all_jat_q3_gt16_sub<-select(all_jat_q3_gt16, week,usage)
all_jat_q3_gt16_sub$week<-factor(all_jat_q3_gt16_sub$week)
anova_q3_gt16<-aov(usage~week,data=all_jat_q3_gt16_sub)
summary(anova_q3_gt16)
## Df Sum Sq Mean Sq F value Pr(>F)
## week 3 1675404 558468 2.319 0.0804 .
## Residuals 94 22633380 240781
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Outcomes
Using the ANOVA test, we found that the p-value is 0.0804, which is greater than 0.05, so the result is not significant
Since the p-value is greater than alpha (0.05), we can NOT reject the null hypothesis of equal means i.e. µ1 = µ2 = µ3 = µ4
We conclude that there is not enough evidence that Avg App usage differs across the four weeks of a month for data from 01-01-2016 onward.
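The F value and p-value in the ANOVA table can be recomputed from the sums of squares and degrees of freedom. This is a sketch using the rounded figures printed by summary above, so it matches the table only up to rounding.

```r
# Values copied from the summary(aov) table above
ss_week  <- 1675404
df_week  <- 3
ss_resid <- 22633380
df_resid <- 94

# Mean squares are sums of squares divided by their degrees of freedom
ms_week  <- ss_week / df_week
ms_resid <- ss_resid / df_resid

# F statistic and its upper-tail p-value
f_value <- ms_week / ms_resid
p_value <- pf(f_value, df_week, df_resid, lower.tail = FALSE)

round(c(F = f_value, p = p_value), 4)
p_value < 0.05   # decision at alpha = 0.05
```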
Inference
Assumption
Avg App Usage till 31-12-2015 = µ1
Avg App Usage from 01-01-2016 onward = µ2
Hypothesis
H0 i.e. null hypothesis says: µ1>=µ2
Ha i.e. alternative hypothesis says: µ2 > µ1
Approach
We will use a Two Sample T-test to compare the mean App Usage of the two groups. This will show whether the null hypothesis can be rejected.
all_jat_q3_1516<-all_jat_data
all_jat_q3_1516$group<-factor(ifelse(all_jat_data$month_year>="2016-01-01","2016_Onward","Before_2016"))
all_jat_q3_1516$group<-relevel(all_jat_q3_1516$group,ref = "2016_Onward")
t.test(all_jat_q3_1516$usage~all_jat_q3_1516$group,alternative = "greater",var.eq=TRUE)
##
## Two Sample t-test
##
## data: all_jat_q3_1516$usage by all_jat_q3_1516$group
## t = 3.5721, df = 121, p-value = 0.0002547
## alternative hypothesis: true difference in means between group 2016_Onward and group Before_2016 is greater than 0
## 95 percent confidence interval:
## 198.3213 Inf
## sample estimates:
## mean in group 2016_Onward mean in group Before_2016
## 657.7041 287.6800
Outcomes
Using the Two Sample T-test, we found that t = 3.5721, df = 121, p-value = 0.0002547
As the p-value from the t-test is less than alpha (0.05), we can reject the null hypothesis i.e. µ1>=µ2
The data support the alternative hypothesis i.e. µ2 > µ1
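The call above pools the two variances: var.eq=TRUE is partially matched to t.test's var.equal argument. R's default is the Welch test, which does not assume equal variances. The sketch below, on synthetic numbers (the vectors before and onward are made up for illustration), shows how the two versions differ in their degrees of freedom.

```r
# Synthetic data for illustration only: unequal group sizes and spreads
before <- c(250, 300, 280, 310, 290)
onward <- c(400, 900, 650, 1200, 300, 750, 500, 980)

pooled <- t.test(onward, before, alternative = "greater", var.equal = TRUE)
welch  <- t.test(onward, before, alternative = "greater", var.equal = FALSE)

# Pooled df is n1 + n2 - 2; Welch df is usually fractional and never larger
c(pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter))
c(pooled_p = pooled$p.value, welch_p = welch$p.value)
```

When the group spreads differ as much as in this example, reporting the Welch result alongside the pooled one is a reasonable check on the equal-variance assumption.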
Inference
Assumption
Assuming the app was released on the first day of August 2016, we assume that this release will impact the data from 01-08-2016 onward
Approach
This involves filtering the data from 01-08-2016 onward, grouping it by month and aggregating it to find the mean usage per month. This can be achieved using filter, followed by the group_by and summarise functions.
all_jat_data_q4<-all_jat_data
all_jat_data_q4 <- filter (all_jat_data_q4, all_jat_data_q4$month_year >= "2016-08-01")
all_jat_data_q4_monthwise <- group_by (all_jat_data_q4,month_year)
all_jat_data_q4_summary <- summarise (all_jat_data_q4_monthwise, usage = mean(usage))
all_jat_data_q4_summary
## # A tibble: 19 × 2
## month_year usage
## <dttm> <dbl>
## 1 2016-08-01 00:00:00 465
## 2 2016-09-01 00:00:00 396.
## 3 2016-10-01 00:00:00 1215
## 4 2016-11-01 00:00:00 751
## 5 2016-12-01 00:00:00 496.
## 6 2017-01-01 00:00:00 442.
## 7 2017-02-01 00:00:00 429.
## 8 2017-03-01 00:00:00 681.
## 9 2017-04-01 00:00:00 356.
## 10 2017-05-01 00:00:00 203
## 11 2017-09-01 00:00:00 1923
## 12 2017-10-01 00:00:00 1398.
## 13 2017-11-01 00:00:00 1300.
## 14 2017-12-01 00:00:00 1446.
## 15 2018-01-01 00:00:00 1123.
## 16 2018-02-01 00:00:00 718
## 17 2018-03-01 00:00:00 865.
## 18 2018-04-01 00:00:00 623.
## 19 2018-05-01 00:00:00 380
Observation
From the summary produced by the summarise function above, we found that within 2 months of the launch in August 2016 there was a spike in average usage (1215 in October 2016), but it did not last long
From September 2017, there was a much larger rise in average usage (1923 in September 2017), and usage stayed above 1100 until January 2018
Inference
plot (all_jat_data_q4_summary$month_year, all_jat_data_q4_summary$usage, type="o", col = "orange", xlab = "Year-Month", ylab = "Mean Usage",
main = "Mean Usage by Month: Aug 2016 onward")
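The filter, group_by and summarise steps used for the monthly summary can also be chained in a single pipe. The sketch below runs on a small synthetic data frame (demo and its values are made up for illustration) with the same column names as the app data.

```r
library(dplyr)

# Synthetic data for illustration only
demo <- data.frame(
  month_year = as.Date(c("2016-07-01", "2016-08-01", "2016-08-01", "2016-09-01")),
  usage      = c(100, 200, 400, 300)
)

monthly_usage <- demo %>%
  filter(month_year >= as.Date("2016-08-01")) %>%  # keep August 2016 onward
  group_by(month_year) %>%                          # one group per month
  summarise(usage = mean(usage))                    # mean usage per month

monthly_usage
```

Inside filter the column can be referred to by its bare name, so the data frame does not need to be repeated with $.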
Assumption
Avg disease access when conditions are not favorable to disease = µ1
Avg disease access when conditions are favorable to disease= µ2
Hypothesis
H0 i.e. null hypothesis says: µ2 <= µ1
Ha i.e. alternative hypothesis says: µ2 > µ1
Approach
We will use a Two Sample T-test for each disease and district to compare mean disease access between favorable and unfavorable weather conditions. This will show whether the null hypothesis can be rejected.
We will use the data set naming convention:
q5_bwdata_d1 (data for Belagavi, disease 1)
q5_dwdata_d1 (data for Dharwad, disease 1)
setwd(dir="C:\\Users\\pahar\\OneDrive\\IIM-K\\Classes\\2022-09-10")
library(readxl)
q5_bwdata<-read_xlsx("IMB733-XLS-ENG.xlsx",sheet = "Belagavi_weather",col_names = TRUE)
q5_dwdata<-read_xlsx("IMB733-XLS-ENG.xlsx",sheet = "Dharwad_weather",col_names = TRUE)
library(janitor)
q5_bwdata<-clean_names(q5_bwdata)
q5_dwdata<-clean_names(q5_dwdata)
q5_wdata<-read_xlsx("IMB733-XLS-ENG.xlsx",sheet = "Disease_index",col_names = TRUE)
q5_bwdata_d1<-q5_bwdata
q5_bwdata_d1$flag<-factor(ifelse((q5_bwdata_d1$temperature>=20 & q5_bwdata_d1$temperature<=24 & q5_bwdata_d1$humidity>80),"Y","N"))
q5_bwdata_d1$flag<-relevel(q5_bwdata_d1$flag,ref = "Y")
t.test(q5_bwdata_d1$d1~q5_bwdata_d1$flag,data=q5_bwdata_d1,alternative="greater",var.eq=TRUE)
##
## Two Sample t-test
##
## data: q5_bwdata_d1$d1 by q5_bwdata_d1$flag
## t = 2.7605, df = 22, p-value = 0.005707
## alternative hypothesis: true difference in means between group Y and group N is greater than 0
## 95 percent confidence interval:
## 9.704442 Inf
## sample estimates:
## mean in group Y mean in group N
## 37.59305 11.91669
Outcomes
Using the Two Sample T-test for Disease 1 in Belagavi, we found that t = 2.7605, df = 22, p-value = 0.005707
As the p-value from the t-test is less than alpha (0.05), we can reject the null hypothesis i.e. µ2<=µ1
The data support the alternative hypothesis i.e. µ2 > µ1
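The same flag-and-test pattern is repeated for every disease and district, so it could be wrapped in a small helper. The function favorable_test below is hypothetical (it is not part of the original analysis) and is shown on synthetic weather data; it sets the factor levels directly instead of calling relevel.

```r
# Hypothetical helper: one-sided pooled t-test of disease access in
# favorable ("Y") versus unfavorable ("N") periods
favorable_test <- function(data, disease_col, favorable) {
  flag <- factor(ifelse(favorable, "Y", "N"), levels = c("Y", "N"))
  access <- data[[disease_col]]
  t.test(access[flag == "Y"], access[flag == "N"],
         alternative = "greater", var.equal = TRUE)
}

# Synthetic weather data for illustration only
weather <- data.frame(
  temperature = c(21, 22, 23, 30, 31, 32),
  humidity    = c(85, 90, 88, 60, 65, 70),
  d1          = c(40, 35, 45, 10, 12, 8)
)

res <- favorable_test(
  weather, "d1",
  weather$temperature >= 20 & weather$temperature <= 24 & weather$humidity > 80
)
res$estimate   # mean access in favorable, then unfavorable periods
res$p.value
```

Passing the column name as a string makes it harder to test one disease's column against another disease's weather conditions by accident.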
Inference
q5_dwdata_d1<-q5_dwdata
q5_dwdata_d1$flag<-factor(ifelse((q5_dwdata_d1$temperature>=20 & q5_dwdata_d1$temperature<=24 & q5_dwdata_d1$relative_humidity>80),"Y","N"))
q5_dwdata_d1$flag<-relevel(q5_dwdata_d1$flag,ref = "Y")
t.test(q5_dwdata_d1$d1~q5_dwdata_d1$flag,data=q5_dwdata_d1,alternative="greater",var.eq=TRUE)
##
## Two Sample t-test
##
## data: q5_dwdata_d1$d1 by q5_dwdata_d1$flag
## t = 4.5934, df = 20, p-value = 8.801e-05
## alternative hypothesis: true difference in means between group Y and group N is greater than 0
## 95 percent confidence interval:
## 15.66022 Inf
## sample estimates:
## mean in group Y mean in group N
## 31.590651 6.515126
Outcomes
Using the Two Sample T-test for Disease 1 in Dharwad, we found that t = 4.5934, df = 20, p-value = 8.801e-05
As the p-value from the t-test is less than alpha (0.05), we can reject the null hypothesis i.e. µ2<=µ1
The data support the alternative hypothesis i.e. µ2 > µ1
Inference
q5_bwdata_d2<-q5_bwdata
q5_bwdata_d2$flag<-factor(ifelse((q5_bwdata_d2$temperature>=21.5 & q5_bwdata_d2$temperature<=24.5 & q5_bwdata_d2$humidity>83),"Y","N"))
q5_bwdata_d2$flag<-relevel(q5_bwdata_d2$flag,ref = "Y")
t.test(q5_bwdata_d2$d2~q5_bwdata_d2$flag,data=q5_bwdata_d2,alternative="greater",var.eq=TRUE)
##
## Two Sample t-test
##
## data: q5_bwdata_d2$d2 by q5_bwdata_d2$flag
## t = 3.7247, df = 22, p-value = 0.0005887
## alternative hypothesis: true difference in means between group Y and group N is greater than 0
## 95 percent confidence interval:
## 10.89113 Inf
## sample estimates:
## mean in group Y mean in group N
## 29.380223 9.173547
Outcomes
Using the Two Sample T-test for Disease 2 in Belagavi, we found that t = 3.7247, df = 22, p-value = 0.0005887
As the p-value from the t-test is less than alpha (0.05), we can reject the null hypothesis i.e. µ2<=µ1
The data support the alternative hypothesis i.e. µ2 > µ1
Inference
q5_dwdata_d2<-q5_dwdata
q5_dwdata_d2$flag<-factor(ifelse((q5_dwdata_d2$temperature>=21.5 & q5_dwdata_d2$temperature<=24.5 & q5_dwdata_d2$relative_humidity>83),"Y","N"))
q5_dwdata_d2$flag<-relevel(q5_dwdata_d2$flag,ref = "Y")
t.test(q5_dwdata_d2$d2~q5_dwdata_d2$flag,data=q5_dwdata_d2,alternative="greater",var.eq=TRUE)
##
## Two Sample t-test
##
## data: q5_dwdata_d2$d2 by q5_dwdata_d2$flag
## t = 4.0726, df = 20, p-value = 0.0002968
## alternative hypothesis: true difference in means between group Y and group N is greater than 0
## 95 percent confidence interval:
## 19.62338 Inf
## sample estimates:
## mean in group Y mean in group N
## 40.134921 6.096486
Outcomes
Using the Two Sample T-test for Disease 2 in Dharwad, we found that t = 4.0726, df = 20, p-value = 0.0002968
As the p-value from the t-test is less than alpha (0.05), we can reject the null hypothesis i.e. µ2<=µ1
The data support the alternative hypothesis i.e. µ2 > µ1
Inference
q5_bwdata_d3<-q5_bwdata
q5_bwdata_d3$flag<-factor(ifelse((q5_bwdata_d3$temperature>=22 & q5_bwdata_d3$temperature<=24),"Y","N"))
q5_bwdata_d3$flag<-relevel(q5_bwdata_d3$flag,ref = "Y")
t.test(q5_bwdata_d3$d3~q5_bwdata_d3$flag,data=q5_bwdata_d3,alternative="greater",var.eq=TRUE)
##
## Two Sample t-test
##
## data: q5_bwdata_d3$d3 by q5_bwdata_d3$flag
## t = 2.2224, df = 22, p-value = 0.01843
## alternative hypothesis: true difference in means between group Y and group N is greater than 0
## 95 percent confidence interval:
## 4.39784 Inf
## sample estimates:
## mean in group Y mean in group N
## 30.95773 11.61233
Outcomes
Using the Two Sample T-test for Disease 3 in Belagavi, we found that t = 2.2224, df = 22, p-value = 0.01843
As the p-value from the t-test is less than alpha (0.05), we can reject the null hypothesis i.e. µ2<=µ1
The data support the alternative hypothesis i.e. µ2 > µ1
Inference
q5_dwdata_d3<-q5_dwdata
q5_dwdata_d3$flag<-factor(ifelse((q5_dwdata_d3$temperature>=22 & q5_dwdata_d3$temperature<=24),"Y","N"))
q5_dwdata_d3$flag<-relevel(q5_dwdata_d3$flag,ref = "Y")
t.test(q5_dwdata_d3$d3~q5_dwdata_d3$flag,data=q5_dwdata_d3,alternative="greater",var.eq=TRUE)
##
## Two Sample t-test
##
## data: q5_dwdata_d3$d3 by q5_dwdata_d3$flag
## t = 1.5057, df = 20, p-value = 0.07389
## alternative hypothesis: true difference in means between group Y and group N is greater than 0
## 95 percent confidence interval:
## -4.118138 Inf
## sample estimates:
## mean in group Y mean in group N
## 40.26971 11.96166
Outcomes
Using the Two Sample T-test for Disease 3 in Dharwad, we found that t = 1.5057, df = 20, p-value = 0.07389
As the p-value from the t-test is greater than alpha (0.05), we fail to reject the null hypothesis i.e. µ2<=µ1
Inference
q5_bwdata_d4<-q5_bwdata
q5_bwdata_d4$flag<-factor(ifelse((q5_bwdata_d4$temperature>=22 & q5_bwdata_d4$temperature<=26 & q5_bwdata_d4$humidity>85),"Y","N"))
q5_bwdata_d4$flag<-relevel(q5_bwdata_d4$flag,ref = "Y")
t.test(q5_bwdata_d4$d4~q5_bwdata_d4$flag,data=q5_bwdata_d4,alternative="greater",var.eq=TRUE)
##
## Two Sample t-test
##
## data: q5_bwdata_d4$d4 by q5_bwdata_d4$flag
## t = 1.793, df = 22, p-value = 0.04337
## alternative hypothesis: true difference in means between group Y and group N is greater than 0
## 95 percent confidence interval:
## 0.4785112 Inf
## sample estimates:
## mean in group Y mean in group N
## 24.28984 12.97384
Outcomes
Using the Two Sample T-test for Disease 4 in Belagavi, we found that t = 1.793, df = 22, p-value = 0.04337
As the p-value from the t-test is less than alpha (0.05), we can reject the null hypothesis i.e. µ2<=µ1
The data support the alternative hypothesis i.e. µ2 > µ1
Inference
q5_dwdata_d4<-q5_dwdata
q5_dwdata_d4$flag<-factor(ifelse((q5_dwdata_d4$temperature>=22 & q5_dwdata_d4$temperature<=26 & q5_dwdata_d4$relative_humidity>85),"Y","N"))
q5_dwdata_d4$flag<-relevel(q5_dwdata_d4$flag,ref = "Y")
t.test(q5_dwdata_d4$d4~q5_dwdata_d4$flag,data=q5_dwdata_d4,alternative="greater",var.eq=TRUE)
##
## Two Sample t-test
##
## data: q5_dwdata_d4$d4 by q5_dwdata_d4$flag
## t = 2.3147, df = 20, p-value = 0.01569
## alternative hypothesis: true difference in means between group Y and group N is greater than 0
## 95 percent confidence interval:
## 6.896259 Inf
## sample estimates:
## mean in group Y mean in group N
## 39.16667 12.10875
Outcomes
Using the Two Sample T-test for Disease 4 in Dharwad, we found that t = 2.3147, df = 20, p-value = 0.01569
As the p-value from the t-test is less than alpha (0.05), we can reject the null hypothesis i.e. µ2<=µ1
The data support the alternative hypothesis i.e. µ2 > µ1
Inference
q5_bwdata_d5<-q5_bwdata
q5_bwdata_d5$flag<-factor(ifelse(((q5_bwdata_d5$temperature>=22 & q5_bwdata_d5$temperature<=24.5) & (q5_bwdata_d5$humidity>=77 & q5_bwdata_d5$humidity<=85)),"Y","N"))
q5_bwdata_d5$flag<-relevel(q5_bwdata_d5$flag,ref = "Y")
t.test(q5_bwdata_d5$d5~q5_bwdata_d5$flag,data=q5_bwdata_d5,alternative="greater",var.eq=TRUE)
##
## Two Sample t-test
##
## data: q5_bwdata_d5$d5 by q5_bwdata_d5$flag
## t = 3.6675, df = 22, p-value = 0.0006761
## alternative hypothesis: true difference in means between group Y and group N is greater than 0
## 95 percent confidence interval:
## 13.85781 Inf
## sample estimates:
## mean in group Y mean in group N
## 36.57407 10.51547
Outcomes
Using the Two Sample T-test for Disease 5 in Belagavi, we found that t = 3.6675, df = 22, p-value = 0.0006761
As the p-value from the t-test is less than alpha (0.05), we can reject the null hypothesis i.e. µ2<=µ1
The data support the alternative hypothesis i.e. µ2 > µ1
Inference
q5_dwdata_d5<-q5_dwdata
q5_dwdata_d5$flag<-factor(ifelse(((q5_dwdata_d5$temperature>=22 & q5_dwdata_d5$temperature<=24.5) & (q5_dwdata_d5$relative_humidity>=77 & q5_dwdata_d5$relative_humidity<=85)),"Y","N"))
q5_dwdata_d5$flag<-relevel(q5_dwdata_d5$flag,ref = "Y")
t.test(q5_dwdata_d5$d5~q5_dwdata_d5$flag,data=q5_dwdata_d5,alternative="greater",var.eq=TRUE)
##
## Two Sample t-test
##
## data: q5_dwdata_d5$d5 by q5_dwdata_d5$flag
## t = 0.10853, df = 20, p-value = 0.4573
## alternative hypothesis: true difference in means between group Y and group N is greater than 0
## 95 percent confidence interval:
## -16.53381 Inf
## sample estimates:
## mean in group Y mean in group N
## 14.17749 13.06725
Outcomes
Using the Two Sample T-test for Disease 5 in Dharwad, we found that t = 0.10853, df = 20, p-value = 0.4573
As the p-value from the t-test is greater than alpha (0.05), we fail to reject the null hypothesis i.e. µ2<=µ1
Inference
q5_bwdata_d7<-q5_bwdata
q5_bwdata_d7$flag<-factor(ifelse((q5_bwdata_d7$temperature>25 & q5_bwdata_d7$humidity>80),"Y","N"))
q5_bwdata_d7$flag<-relevel(q5_bwdata_d7$flag,ref = "Y")
t.test(q5_bwdata_d7$d4~q5_bwdata_d7$flag,data=q5_bwdata_d7,alternative="greater",var.eq=TRUE)
##
## Two Sample t-test
##
## data: q5_bwdata_d7$d4 by q5_bwdata_d7$flag
## t = 1.1738, df = 22, p-value = 0.1265
## alternative hypothesis: true difference in means between group Y and group N is greater than 0
## 95 percent confidence interval:
## -4.373739 Inf
## sample estimates:
## mean in group Y mean in group N
## 23.59788 14.15040
Outcomes
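Using the Two Sample T-test with the Disease 7 weather conditions in Belagavi, we found that t = 1.1738, df = 22, p-value = 0.1265
Note that the t.test call above uses column d4 instead of the Disease 7 column, so this output reflects Disease 4 access under the Disease 7 conditions; the test needs to be rerun on the Disease 7 column before drawing a conclusion about Disease 7
For the test as run, the p-value is greater than alpha (0.05), so we fail to reject the null hypothesis i.e. µ2<=µ1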
Inference
q5_dwdata_d7<-q5_dwdata
q5_dwdata_d7$flag<-factor(ifelse((q5_dwdata_d7$temperature>25 & q5_dwdata_d7$relative_humidity>80),"Y","N"))
q5_dwdata_d7$flag<-relevel(q5_dwdata_d7$flag,ref = "Y")
t.test(q5_dwdata_d7$d4~q5_dwdata_d7$flag,data=q5_dwdata_d7,alternative="greater",var.eq=TRUE)
##
## Two Sample t-test
##
## data: q5_dwdata_d7$d4 by q5_dwdata_d7$flag
## t = 1.0197, df = 20, p-value = 0.16
## alternative hypothesis: true difference in means between group Y and group N is greater than 0
## 95 percent confidence interval:
## -10.80039 Inf
## sample estimates:
## mean in group Y mean in group N
## 30.00000 14.37831
Outcomes
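Using the Two Sample T-test with the Disease 7 weather conditions in Dharwad, we found that t = 1.0197, df = 20, p-value = 0.16
Note that the t.test call above uses column d4 instead of the Disease 7 column, so this output reflects Disease 4 access under the Disease 7 conditions; the test needs to be rerun on the Disease 7 column before drawing a conclusion about Disease 7
For the test as run, the p-value is greater than alpha (0.05), so we fail to reject the null hypothesis i.e. µ2<=µ1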
Inference