ANALYSIS OF JAYALAKSHMI AGRO TECH CASES USING THE DATA SHARED

Set the working directory, load the needed libraries and the data

setwd("C:/_MyData_/IIMK/Assignment 1")

library (readxl)
library (dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

agriData_org <- read_excel ("IMB733-XLS-ENG Spreadsheet 3.xlsx", sheet = "Data Sheet")
agriData <- agriData_org
str (agriData)

## tibble [123 x 26] (S3: tbl_df/tbl/data.frame)
##  $ Month-Year   : POSIXct[1:123], format: "2015-06-01" "2015-07-01" ...
##  $ Week         : chr [1:123] "Week4" "Week1" "Week2" "Week3" ...
##  $ No of users  : num [1:123] 2 1 1 4 6 12 13 10 7 12 ...
##  $ Usage        : num [1:123] 4 1 25 70 100 291 225 141 148 215 ...
##  $ D1           : num [1:123] 0 0 0 4 1 12 7 4 1 5 ...
##  $ D2           : num [1:123] 0 0 1 2 1 6 5 4 0 3 ...
##  $ D3           : num [1:123] 1 0 2 3 0 11 6 8 1 6 ...
##  $ D4           : num [1:123] 0 0 2 4 2 4 2 4 5 3 ...
##  $ D5           : num [1:123] 0 0 0 2 0 5 5 3 4 6 ...
##  $ D6           : num [1:123] 0 0 0 4 2 15 6 5 5 12 ...
##  $ D7           : num [1:123] 0 0 2 1 7 7 4 8 3 7 ...
##  $ D8           : num [1:123] 0 0 3 7 9 7 5 7 3 33 ...
##  $ D9           : num [1:123] 0 0 2 3 2 6 6 6 3 11 ...
##  $ D10          : num [1:123] 0 0 1 0 1 10 1 4 2 3 ...
##  $ D11          : num [1:123] 0 0 1 3 4 12 6 3 5 8 ...
##  $ V1           : num [1:123] 0 0 0 5 11 26 28 18 18 20 ...
##  $ V2           : num [1:123] 0 1 0 4 8 20 19 11 16 15 ...
##  $ V3           : num [1:123] 0 0 1 2 5 12 13 8 7 10 ...
##  $ V4           : num [1:123] 0 0 1 2 5 13 8 7 6 6 ...
##  $ V5           : num [1:123] 2 0 1 1 9 16 9 4 15 6 ...
##  $ V6           : num [1:123] 0 0 1 3 3 22 14 4 10 10 ...
##  $ V7           : num [1:123] 0 0 0 3 7 21 13 5 4 9 ...
##  $ V8           : num [1:123] 0 0 0 2 6 14 9 5 10 7 ...
##  $ V9           : num [1:123] 0 0 0 4 7 16 20 10 9 8 ...
##  $ V10          : num [1:123] 0 0 1 4 7 23 17 5 14 10 ...
##  $ Micronutrient: num [1:123] 1 0 6 7 3 13 22 8 7 17 ...

as.Date (agriData$`Month-Year`)

##   [1] "2015-06-01" "2015-07-01" "2015-07-01" "2015-07-01" "2015-07-01"
##   [6] "2015-08-01" "2015-08-01" "2015-08-01" "2015-08-01" "2015-09-01"
##  [11] "2015-09-01" "2015-09-01" "2015-09-01" "2015-10-01" "2015-10-01"
##  [16] "2015-10-01" "2015-10-01" "2015-11-01" "2015-11-01" "2015-11-01"
##  [21] "2015-11-01" "2015-12-01" "2015-12-01" "2015-12-01" "2015-12-01"
##  [26] "2016-01-01" "2016-01-01" "2016-01-01" "2016-01-01" "2016-02-01"
##  [31] "2016-02-01" "2016-02-01" "2016-02-01" "2016-03-01" "2016-03-01"
##  [36] "2016-03-01" "2016-03-01" "2016-04-01" "2016-04-01" "2016-04-01"
##  [41] "2016-04-01" "2016-05-01" "2016-05-01" "2016-05-01" "2016-05-01"
##  [46] "2016-06-01" "2016-06-01" "2016-06-01" "2016-06-01" "2016-07-01"
##  [51] "2016-07-01" "2016-07-01" "2016-07-01" "2016-08-01" "2016-08-01"
##  [56] "2016-08-01" "2016-08-01" "2016-09-01" "2016-09-01" "2016-09-01"
##  [61] "2016-09-01" "2016-10-01" "2016-10-01" "2016-10-01" "2016-10-01"
##  [66] "2016-11-01" "2016-11-01" "2016-11-01" "2016-11-01" "2016-12-01"
##  [71] "2016-12-01" "2016-12-01" "2016-12-01" "2017-01-01" "2017-01-01"
##  [76] "2017-01-01" "2017-01-01" "2017-02-01" "2017-02-01" "2017-02-01"
##  [81] "2017-02-01" "2017-03-01" "2017-03-01" "2017-03-01" "2017-03-01"
##  [86] "2017-04-01" "2017-04-01" "2017-04-01" "2017-04-01" "2017-05-01"
##  [91] "2017-05-01" "2017-05-01" "2017-05-01" "2017-09-01" "2017-10-01"
##  [96] "2017-10-01" "2017-10-01" "2017-10-01" "2017-11-01" "2017-11-01"
## [101] "2017-11-01" "2017-11-01" "2017-12-01" "2017-12-01" "2017-12-01"
## [106] "2017-12-01" "2018-01-01" "2018-01-01" "2018-01-01" "2018-01-01"
## [111] "2018-02-01" "2018-02-01" "2018-02-01" "2018-02-01" "2018-03-01"
## [116] "2018-03-01" "2018-03-01" "2018-03-01" "2018-04-01" "2018-04-01"
## [121] "2018-04-01" "2018-04-01" "2018-05-01"

Case - 1

Anand, the cofounder of JAT, claims that disease 6 (leaf curl) information was accessed at least 60 times every week on average since October 2017 due to this disease outbreak. Test this claim at a significance level of 0.05 using an appropriate hypothesis test.

Hypothesis

u = Average number of weekly accesses of disease 6 information since Oct 2017
H0: u <= 60
Ha: u > 60

Test: One sample t-test

case1Data <- subset (agriData, agriData$`Month-Year` >= "2017-10-01")
case1Test <- t.test (case1Data$D6, mu = 60, alernative = "greater")
case1Test

## 
##  One Sample t-test
## 
## data:  case1Data$D6
## t = 2.341, df = 28, p-value = 0.02658
## alternative hypothesis: true mean is not equal to 60
## 95 percent confidence interval:
##  61.05162 75.77597
## sample estimates:
## mean of x 
##  68.41379

Findings

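The argument name is misspelled as alernative in the call above, so t.test ignored it and ran its default two-sided test, as the output line "true mean is not equal to 60" shows
The two-sided p-value is 0.02658. Because t = 2.341 is positive, the one-sided p-value for Ha is half of that, about 0.0133, which is less than 0.05, so we reject H0
The two-sided 95% confidence interval for the weekly average is 61.05 to 75.78, which lies entirely above 60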

Conclusion

The data supports the claim: since Oct 2017, information on disease 6 was accessed more than 60 times per week on average (sample mean 68.41)
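
The call above spells the argument as alernative. A minimal sketch, on made-up weekly counts rather than the JAT data, of how t.test treats a misspelled argument and how the one-sided and two-sided p-values relate. The vector weekly_hits is hypothetical.

```r
# Hypothetical weekly access counts; not the JAT data
weekly_hits <- c(72, 65, 58, 80, 69, 74, 61, 77, 66, 70)

# Misspelled argument: absorbed by '...' and ignored, so the default
# two-sided test runs without any warning
two_sided <- t.test(weekly_hits, mu = 60, alernative = "greater")

# Correctly spelled argument: one-sided test of Ha: mean > 60
one_sided <- t.test(weekly_hits, mu = 60, alternative = "greater")

two_sided$alternative   # "two.sided"
one_sided$alternative   # "greater"

# With a positive t statistic the one-sided p-value is half the two-sided one
one_sided$p.value
two_sided$p.value / 2
```

Checking the alternative element of the returned object is a quick way to confirm which test actually ran.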

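Case - 2

Among the app users for disease information, at least 15% of them access disease information related to disease 6. Use an appropriate hypothesis test to check this claim at a = 0.05

Hypothesis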

p = Proportion of all app accesses (Usage) that relate to D6, used here in place of the proportion of users
H0: p <= 0.15
Ha: p > 0.15

Test: Z-proportion test

d6Proportion <- sum(agriData$D6) / sum(agriData$Usage)
se <- sqrt((0.15 * (1 - 0.15)) / 123) # We have 123 observations
z_stat_d6 <- (d6Proportion - 0.15) / se
d6pValue <- 1 - pnorm (z_stat_d6)
d6pValue

## [1] 0.9974213

Findings

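The p-value is 0.9974, which is greater than 0.05, so we fail to reject H0
A p-value above 0.5 means the z statistic is negative, i.e. the observed D6 share is below 0.15
The standard error uses n = 123, the number of weekly rows, although the proportion is taken over sum(Usage) accesses. Using the access count as n would shrink the standard error and make z more negative, so the decision would not change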

Conclusion

The data does not support the claim that at least 15% of disease information accesses relate to disease 6
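
A minimal sketch of the same z-test with the number of accesses as the sample size, which is the denominator of the proportion. The function name z_prop_test and the counts are hypothetical and are not taken from the JAT data.

```r
# One-sample z-test for a proportion, H0: p <= p0 against Ha: p > p0
z_prop_test <- function(successes, n, p0) {
  p_hat <- successes / n
  se <- sqrt(p0 * (1 - p0) / n)   # standard error under H0, n = number of trials
  z <- (p_hat - p0) / se
  list(p_hat = p_hat, z = z, p_value = pnorm(z, lower.tail = FALSE))
}

# Hypothetical counts: 600 disease 6 accesses out of 10000 total accesses
result <- z_prop_test(successes = 600, n = 10000, p0 = 0.15)
result$p_hat     # observed share
result$z         # negative when the share is below p0
result$p_value   # close to 1, so H0 is not rejected
```

With the real data the call would take sum(agriData$D6) and sum(agriData$Usage) as its first two arguments.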

Case - 3

JAT believes that over the years, the average number of app users have increased significantly. Is there statistical evidence to support that the average number of users in year 2017-2018 is more than average number of users in year 2015-2016 at a=0.05? Support your answer with all necessary tests.

Hypothesis

u1 = Average number of app users per week in 2015-16
u2 = Average number of app users per week in 2017-18
H0: u1 >= u2
Ha: u1 < u2

Test: Two sample t-test

agriData2015_16 <- subset (agriData, agriData$`Month-Year` >= "2015-01-01" & agriData$`Month-Year` <= "2016-12-31")
agriData2015_16$yearGroup <- "2015-16"

agriData2017_18 <- subset (agriData, agriData$`Month-Year` >= "2017-01-01" & agriData$`Month-Year` <= "2018-12-31")
agriData2017_18$yearGroup <- "2017-18"

agriDataByYearGroup <- rbind (agriData2015_16, agriData2017_18)

case3Test <- t.test (agriDataByYearGroup$`No of users` ~ agriDataByYearGroup$yearGroup, alternative = "greater", var.equal = TRUE)
case3Test

## 
##  Two Sample t-test
## 
## data:  agriDataByYearGroup$`No of users` by agriDataByYearGroup$yearGroup
## t = -9.2567, df = 121, p-value = 1
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -154.4945       Inf
## sample estimates:
## mean in group 2015-16 mean in group 2017-18 
##              50.06849             181.10000

Findings

The groups are ordered alphabetically, so the test is based on mean(2015-16) - mean(2017-18), and alternative = "greater" in the call above tests u1 > u2, the opposite of Ha. That is why the reported p-value is 1
The p-value for Ha: u1 < u2 is 1 minus the reported value, which is effectively 0 for t = -9.2567 on 121 degrees of freedom. This is less than 0.05, so we reject H0
Mean users per week during 2015-16 is 50.07 and during 2017-18 is 181.10

Conclusion

Average number of app users is higher in 2017-18 than in 2015-16.
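
With the formula interface, t.test orders the groups alphabetically and tests the first mean minus the second. A minimal sketch, on hypothetical counts rather than the JAT data, of which alternative matches the claim that the later period has the higher mean. The data frame users is made up for illustration.

```r
# Hypothetical weekly user counts for two periods; not the JAT data
users <- data.frame(
  count = c(40, 55, 48, 60, 52, 170, 190, 185, 160, 200),
  yearGroup = rep(c("2015-16", "2017-18"), each = 5)
)

# Groups are ordered alphabetically, so the difference tested is
# mean(2015-16) - mean(2017-18). "less" therefore means 2015-16 < 2017-18.
increase <- t.test(count ~ yearGroup, data = users,
                   alternative = "less", var.equal = TRUE)
opposite <- t.test(count ~ yearGroup, data = users,
                   alternative = "greater", var.equal = TRUE)

increase$p.value   # small: evidence that the later mean is higher
opposite$p.value   # equals 1 - increase$p.value
```

The two one-sided p-values always sum to 1, which is why a reported p-value near 1 signals that the opposite direction is strongly supported.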

Case - 4a

Farmers use apps to access information throughout the month. Using the data, check whether app usage is same or different across the four weeks of a month.

Hypothesis

u1 = Average usage in 1st week of every month
u2 = Average usage in 2nd week of every month
u3 = Average usage in 3rd week of every month
u4 = Average usage in 4th week of every month
H0: u1 = u2 = u3 = u4
Ha: Not all u are equal

Test: Anova

case4aData <- data.frame(Usage = agriData$Usage, Week = agriData$Week)
case4aData$Week <- factor (case4aData$Week)
agriAnova <- aov(Usage~Week, data = case4aData)
summary (agriAnova)

##              Df   Sum Sq Mean Sq F value Pr(>F)  
## Week          3  1515178  505059    2.22 0.0894 .
## Residuals   119 27074553  227517                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model.tables (agriAnova, type = "means")

## Tables of means
## Grand mean
##          
## 582.4959 
## 
##  Week 
##     Week1 Week2 Week3 Week4
##     551.9 522.2   480 764.7
## rep  31.0  30.0    30  32.0

Findings

Mean square between weeks is 505059
Mean square within weeks is 227517
The F value, the ratio of these two mean squares, is 2.22
The p-value is 0.0894, which is greater than 0.05, so we fail to reject H0. The F value is not statistically significant at the 0.05 level
Average usage by week is Week1 - 551.9, Week2 - 522.2, Week3 - 480, Week4 - 764.7. Week4 is the highest, but the differences are not significant at the 0.05 level

Conclusion

There is no statistically significant evidence at the 0.05 level that app usage differs across the four weeks of a month.
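
The F value and its p-value can be rechecked from the mean squares in the ANOVA table. A small sketch using only the numbers printed above:

```r
# Values copied from the ANOVA table above
ms_between <- 505059   # Mean Sq for Week
ms_within  <- 227517   # Mean Sq for Residuals
df_between <- 3
df_within  <- 119

# F is the ratio of the two mean squares
f_value <- ms_between / ms_within

# Pr(>F) is the upper-tail probability of the F distribution
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(f_value, 2)
round(p_value, 4)
```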

Case - 4b

Anand claims that app usage picked up after January 2016; so, test this hypothesis using data from January-2016 - May 2018.

Hypothesis

u1 = Average usage per week until Dec 2015
u2 = Average usage per week since Jan 2016
H0: u1 >= u2
Ha: u1 < u2

Test: Two sample t-test

agriDataTillDec2015 <- subset (agriData, agriData$`Month-Year` <= "2015-12-31")
agriDataTillDec2015$YearGroup <- "Till_2015"

agriDataFromJan2016 <- subset (agriData, agriData$`Month-Year` >= "2016-01-01")
agriDataFromJan2016$YearGroup <- "From_2016"

case4bData <- rbind(agriDataTillDec2015, agriDataFromJan2016)

case4Test <- t.test (case4bData$Usage ~ case4bData$YearGroup, alternative = "less", var.equal = TRUE)
case4Test

## 
##  Two Sample t-test
## 
## data:  case4bData$Usage by case4bData$YearGroup
## t = 3.5721, df = 121, p-value = 0.9997
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf 541.7268
## sample estimates:
## mean in group From_2016 mean in group Till_2015 
##                657.7041                287.6800

Findings

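The group labels are ordered alphabetically, so From_2016 comes first and the test above is based on mean(From_2016) - mean(Till_2015). With alternative = "less" the call therefore tests u2 < u1, the opposite of Ha, which is why the reported p-value is 0.9997
The p-value for Ha: u1 < u2 is 1 - 0.9997 = 0.0003, which is less than 0.05, so we reject H0
Mean usage per week until Dec 2015 is 287.68 and from Jan 2016 it is 657.70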

Conclusion

App usage picked up after Jan 2016
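
One way to keep the direction of the test aligned with the hypothesis is to set the factor levels explicitly instead of relying on alphabetical order. A minimal sketch on hypothetical numbers, not the JAT data:

```r
# Hypothetical weekly usage; not the JAT data
usage <- data.frame(
  Usage = c(250, 300, 280, 310, 290, 600, 700, 650, 720, 680),
  YearGroup = rep(c("Till_2015", "From_2016"), each = 5)
)

# Put the earlier period first so the difference is Till_2015 - From_2016
usage$YearGroup <- factor(usage$YearGroup,
                          levels = c("Till_2015", "From_2016"))

# "less" now tests mean(Till_2015) < mean(From_2016), i.e. Ha: u1 < u2
picked_up <- t.test(Usage ~ YearGroup, data = usage,
                    alternative = "less", var.equal = TRUE)

names(picked_up$estimate)   # Till_2015 group listed first
picked_up$p.value           # small: usage is higher from 2016
```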

Case - 5

Anand claims that number of users have increased over a period of two years. He wants to understand if app usage (number of times his app is accessed in a month by various users) has increased with the increased number of users. Prove this claim statistically. Also suggest a suitable statistical test to prove that the correlation between users and usage is non-zero.

Hypothesis

H0: The app usage has not increased with the number of users (in last 2 years)
Ha: The app usage has increased with the number of users (in last 2 years)

Test: Pearson’s Correlation test

case5Data <- subset (agriData, agriData$`Month-Year` >= "2016-05-01") # We have data till May 2018. So select data from May 2016
groupByMonthCase5 <- group_by (case5Data, `Month-Year`)
case5Summary <- summarise(groupByMonthCase5, Users = sum(`No of users`), Usage = sum(Usage))

## `summarise()` ungrouping output (override with `.groups` argument)

cor.test (case5Summary$Users, case5Summary$Usage)

## 
##  Pearson's product-moment correlation
## 
## data:  case5Summary$Users and case5Summary$Usage
## t = 9.1887, df = 20, p-value = 1.287e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7691185 0.9577074
## sample estimates:
##       cor 
## 0.8991594

Findings

p-value is 1.287e-08, which is less than 0.05, so we reject H0
The estimated correlation coefficient is 0.899, which is significantly different from zero
With 95% confidence, the correlation coefficient lies between 0.769 and 0.958

Conclusion

The app usage has increased with the number of users in the last 2 years
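
Pearson's test is a t-test on the correlation coefficient, with t = r * sqrt(df) / sqrt(1 - r^2) and df = n - 2. A small sketch that rechecks the reported statistic from the numbers printed above:

```r
# Values copied from the cor.test output above
r  <- 0.8991594
df <- 20            # number of months minus 2

# t statistic for H0: correlation = 0
t_stat  <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

round(t_stat, 4)
p_value
```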

Case - 6

A new version of the app was released in August 2016. Anand wants to understand which month in the given time frame after the launch of the new version, the mean usage pattern would start to show a statistically significant shift.

Test: Draw a line graph

case6Data <- subset (agriData, agriData$`Month-Year` >= "2016-09-01")
groupByMonthCase6 <- group_by (case6Data, `Month-Year`)
case6Summary <- summarise (groupByMonthCase6, Usage = mean(Usage))

## `summarise()` ungrouping output (override with `.groups` argument)

case6Summary

## # A tibble: 18 x 2
##    `Month-Year`        Usage
##    <dttm>              <dbl>
##  1 2016-09-01 00:00:00  396.
##  2 2016-10-01 00:00:00 1215 
##  3 2016-11-01 00:00:00  751 
##  4 2016-12-01 00:00:00  496.
##  5 2017-01-01 00:00:00  442.
##  6 2017-02-01 00:00:00  429.
##  7 2017-03-01 00:00:00  681.
##  8 2017-04-01 00:00:00  356.
##  9 2017-05-01 00:00:00  203 
## 10 2017-09-01 00:00:00 1923 
## 11 2017-10-01 00:00:00 1398.
## 12 2017-11-01 00:00:00 1300.
## 13 2017-12-01 00:00:00 1446.
## 14 2018-01-01 00:00:00 1123.
## 15 2018-02-01 00:00:00  718 
## 16 2018-03-01 00:00:00  865.
## 17 2018-04-01 00:00:00  623.
## 18 2018-05-01 00:00:00  380

plot (case6Summary$`Month-Year`, case6Summary$Usage, type="o", col = "red", xlab = "Month", ylab = "Average Usage",
      main = "Line Graph for Average Usage by Month")

Findings

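From the monthly means and the graph, average usage rises sharply from Sep 2017: it is 203 in May 2017 and stays above 1100 from Sep 2017 to Jan 2018. There is also a shorter spike in Oct 2016 (1215), two months after the launch.
A line graph is descriptive only, so it suggests the shift but does not by itself establish statistical significance. The data has no rows for Jun to Aug 2017 and a single row for Sep 2017.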
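
One possible way to back the graph with a test, sketched under assumptions: compare each post-launch month's weekly usage with the pre-launch weeks using a one-sided t-test and report the first month with p < 0.05. The vectors below are hypothetical, not the JAT data, and a month needs at least two weekly rows to be tested this way. Running one test per month also raises the chance of a false positive, so a correction for multiple comparisons may be appropriate.

```r
# Hypothetical weekly usage; not the JAT data
baseline <- c(400, 420, 380, 410, 395, 405, 390, 415)   # weeks before the launch
post <- data.frame(
  month = rep(c("2016-09", "2016-10", "2016-11"), each = 4),
  usage = c(405, 398, 412, 391,    # close to the baseline
            410, 395, 405, 400,    # close to the baseline
            900, 950, 870, 920)    # clearly higher
)

# One-sided Welch t-test of each month against the baseline weeks
p_values <- sapply(split(post$usage, post$month), function(x) {
  t.test(x, baseline, alternative = "greater")$p.value
})

# First month, in calendar order, whose mean is significantly above the baseline
first_shift <- names(p_values)[which(p_values < 0.05)[1]]
first_shift
```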

Case - 7

If a disease is likely to spread in particular weather condition (data given in the disease index sheet), then the access of that disease should be more in the months having suitable weather conditions. Help the analyst in coming up with a statistical test to support the claim for two districts for which the sample of weather and disease access data is provided in the data sheet. Identify the diseases for which you can support this claim. Test this claim both for temperature and relative humidity at 95% confidence.

Hypothesis

Weather condition is determined based on temperature and relative humidity
u1 = Mean access of the disease when the weather condition is favorable to that disease
u2 = Mean access of the disease when the weather condition is unfavorable to that disease
H0: u1 <= u2 (information on a disease is not accessed more when the weather is favorable to it)
Ha: u1 > u2 (information on a disease is accessed more when the weather is favorable to it)

Test: Two sample t-test

Load district wise data

belagaviAgriData <- read_excel ("IMB733-XLS-ENG Spreadsheet 3.xlsx", sheet = "Belagavi_weather")
dharwadAgriData <- read_excel ("IMB733-XLS-ENG Spreadsheet 3.xlsx", sheet = "Dharwad_weather")

Analyze for D1 in Belagavi

bel_d1FavorableData <- subset (belagaviAgriData, belagaviAgriData$Temperature >= 20 & belagaviAgriData$Temperature <= 24 & belagaviAgriData$`Relative Humidity` > 80)
bel_d1FavorableData$D1Favorable <- "Yes"
bel_d1UnfavorableData <- subset (belagaviAgriData, !(belagaviAgriData$Temperature >= 20 & belagaviAgriData$Temperature <= 24 & belagaviAgriData$`Relative Humidity` > 80))
bel_d1UnfavorableData$D1Favorable <- "No"
belagaviD1Data <- rbind (bel_d1FavorableData, bel_d1UnfavorableData)
belagaviD1Data <- arrange (belagaviD1Data, belagaviD1Data$Months)

belagaviD1Test <- t.test (D1 ~ D1Favorable, data = belagaviD1Data, alternative = "greater", var.equal = TRUE)

Analyze for D1 in Dharwad

dhar_d1FavorableData <- subset (dharwadAgriData, dharwadAgriData$Temperature >= 20 & dharwadAgriData$Temperature <= 24 & dharwadAgriData$`Relative Humidity` > 80)
dhar_d1FavorableData$D1Favorable <- "Yes"
dhar_d1UnfavorableData <- subset (dharwadAgriData, !(dharwadAgriData$Temperature >= 20 & dharwadAgriData$Temperature <= 24 & dharwadAgriData$`Relative Humidity` > 80))
dhar_d1UnfavorableData$D1Favorable <- "No"
dharwadD1Data <- rbind (dhar_d1FavorableData, dhar_d1UnfavorableData)
dharwadD1Data <- arrange (dharwadD1Data, dharwadD1Data$Months)

dharwadD1Test <- t.test (D1 ~ D1Favorable, data = dharwadD1Data, alternative = "greater", var.equal = TRUE)

Display D1 results for both districts

belagaviD1Test

## 
##  Two Sample t-test
## 
## data:  D1 by D1Favorable
## t = -2.7605, df = 22, p-value = 0.9943
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -41.64827       Inf
## sample estimates:
##  mean in group No mean in group Yes 
##          11.91669          37.59305

dharwadD1Test

## 
##  Two Sample t-test
## 
## data:  D1 by D1Favorable
## t = -4.5934, df = 20, p-value = 0.9999
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -34.49083       Inf
## sample estimates:
##  mean in group No mean in group Yes 
##          6.515126         31.590651

Findings

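The groups are ordered alphabetically (No, Yes), so with alternative = "greater" the tests above are based on mean(No) - mean(Yes), the opposite of Ha. The p-value for Ha: u1 > u2 is 1 minus the reported value
This gives about 0.0057 for Belagavi and 0.0001 for Dharwad. Both are less than 0.05, so we reject H0 in both districts
Mean access in favorable against unfavorable months is 37.59 against 11.92 in Belagavi and 31.59 against 6.52 in Dharwad

Conclusion

D1 information is accessed more when the weather is favorable to the disease, in both districts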
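
The split into favorable and unfavorable months and the t-test are repeated for every disease and district. A minimal sketch of a helper that does this once, with the factor levels ordered so that the tested difference is favorable minus unfavorable. The function name favorable_test, the column name Humidity and the data are hypothetical and do not come from the Belagavi or Dharwad sheets.

```r
# Run the favorable-vs-unfavorable comparison for one disease column.
# 'favorable' is a logical vector marking months with suitable weather.
favorable_test <- function(data, disease, favorable) {
  # Levels ordered Yes, No so the tested difference is mean(Yes) - mean(No)
  group <- factor(ifelse(favorable, "Yes", "No"), levels = c("Yes", "No"))
  t.test(data[[disease]] ~ group, alternative = "greater", var.equal = TRUE)
}

# Hypothetical monthly data; not the Belagavi or Dharwad sheets
weather <- data.frame(
  Temperature = c(22, 23, 21, 28, 30, 27, 22, 29),
  Humidity    = c(85, 88, 82, 60, 55, 65, 90, 58),
  D1          = c(35, 40, 30, 10, 8, 12, 38, 9)
)
is_favorable <- weather$Temperature >= 20 & weather$Temperature <= 24 &
  weather$Humidity > 80

result <- favorable_test(weather, "D1", is_favorable)
result$estimate   # mean in group Yes listed first
result$p.value    # small when access is higher in favorable months
```

Only the logical condition changes from one disease to the next, so each disease and district needs a single call.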

Analyze for D2 in Belagavi

bel_d2FavorableData <- subset (belagaviAgriData, belagaviAgriData$Temperature >= 21.5 & belagaviAgriData$Temperature <= 24.5 & belagaviAgriData$`Relative Humidity` > 83)
bel_d2FavorableData$D2Favorable <- "Yes"
bel_d2UnfavorableData <- subset (belagaviAgriData, !(belagaviAgriData$Temperature >= 21.5 & belagaviAgriData$Temperature <= 24.5 & belagaviAgriData$`Relative Humidity` > 83))
bel_d2UnfavorableData$D2Favorable <- "No"
belagaviD2Data <- rbind (bel_d2FavorableData, bel_d2UnfavorableData)
belagaviD2Data <- arrange (belagaviD2Data, belagaviD2Data$Months)

belagaviD2Test <- t.test (D2 ~ D2Favorable, data = belagaviD2Data, alternative = "greater", var.equal = TRUE)

Analyze for D2 in Dharwad

dhar_d2FavorableData <- subset (dharwadAgriData, dharwadAgriData$Temperature >= 21.5 & dharwadAgriData$Temperature <= 24.5 & dharwadAgriData$`Relative Humidity` > 83)
dhar_d2FavorableData$D2Favorable <- "Yes"
dhar_d2UnfavorableData <- subset (dharwadAgriData, !(dharwadAgriData$Temperature >= 21.5 & dharwadAgriData$Temperature <= 24.5 & dharwadAgriData$`Relative Humidity` > 83))
dhar_d2UnfavorableData$D2Favorable <- "No"
dharwadD2Data <- rbind (dhar_d2FavorableData, dhar_d2UnfavorableData)
dharwadD2Data <- arrange (dharwadD2Data, dharwadD2Data$Months)

dharwadD2Test <- t.test (D2 ~ D2Favorable, data = dharwadD2Data, alternative = "greater", var.equal = TRUE)

Display D2 results for both districts

belagaviD2Test

## 
##  Two Sample t-test
## 
## data:  D2 by D2Favorable
## t = -3.7247, df = 22, p-value = 0.9994
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -29.52222       Inf
## sample estimates:
##  mean in group No mean in group Yes 
##          9.173547         29.380223

dharwadD2Test

## 
##  Two Sample t-test
## 
## data:  D2 by D2Favorable
## t = -4.0726, df = 20, p-value = 0.9997
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -48.45349       Inf
## sample estimates:
##  mean in group No mean in group Yes 
##          6.096486         40.134921

Findings

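The groups are ordered alphabetically (No, Yes), so with alternative = "greater" the tests above are based on mean(No) - mean(Yes), the opposite of Ha. The p-value for Ha: u1 > u2 is 1 minus the reported value
This gives about 0.0006 for Belagavi and 0.0003 for Dharwad. Both are less than 0.05, so we reject H0 in both districts
Mean access in favorable against unfavorable months is 29.38 against 9.17 in Belagavi and 40.13 against 6.10 in Dharwad

Conclusion

D2 information is accessed more when the weather is favorable to the disease, in both districts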

Analyze for D3 in Belagavi

bel_d3FavorableData <- subset (belagaviAgriData, belagaviAgriData$Temperature >= 22 & belagaviAgriData$Temperature <= 24)
bel_d3FavorableData$D3Favorable <- "Yes"
bel_d3UnfavorableData <- subset (belagaviAgriData, !(belagaviAgriData$Temperature >= 22 & belagaviAgriData$Temperature <= 24))
bel_d3UnfavorableData$D3Favorable <- "No"
belagaviD3Data <- rbind (bel_d3FavorableData, bel_d3UnfavorableData)
belagaviD3Data <- arrange (belagaviD3Data, belagaviD3Data$Months)

belagaviD3Test <- t.test (D3 ~ D3Favorable, data = belagaviD3Data, alternative = "greater", var.equal = TRUE)

Analyze for D3 in Dharwad

dhar_d3FavorableData <- subset (dharwadAgriData, dharwadAgriData$Temperature >= 22 & dharwadAgriData$Temperature <= 24)
dhar_d3FavorableData$D3Favorable <- "Yes"
dhar_d3UnfavorableData <- subset (dharwadAgriData, !(dharwadAgriData$Temperature >= 22 & dharwadAgriData$Temperature <= 24))
dhar_d3UnfavorableData$D3Favorable <- "No"
dharwadD3Data <- rbind (dhar_d3FavorableData, dhar_d3UnfavorableData)
dharwadD3Data <- arrange (dharwadD3Data, dharwadD3Data$Months)

dharwadD3Test <- t.test (D3 ~ D3Favorable, data = dharwadD3Data, alternative = "greater", var.equal = TRUE)

Display D3 results for both districts

belagaviD3Test

## 
##  Two Sample t-test
## 
## data:  D3 by D3Favorable
## t = -2.2224, df = 22, p-value = 0.9816
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -34.29296       Inf
## sample estimates:
##  mean in group No mean in group Yes 
##          11.61233          30.95773

dharwadD3Test

## 
##  Two Sample t-test
## 
## data:  D3 by D3Favorable
## t = -1.5057, df = 20, p-value = 0.9261
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -60.73424       Inf
## sample estimates:
##  mean in group No mean in group Yes 
##          11.96166          40.26971

Findings

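The groups are ordered alphabetically (No, Yes), so with alternative = "greater" the tests above are based on mean(No) - mean(Yes), the opposite of Ha. The p-value for Ha: u1 > u2 is 1 minus the reported value
This gives about 0.0184 for Belagavi, which is less than 0.05, and 0.0739 for Dharwad, which is greater than 0.05. We reject H0 for Belagavi and fail to reject it for Dharwad
Mean access in favorable against unfavorable months is 30.96 against 11.61 in Belagavi and 40.27 against 11.96 in Dharwad

Conclusion

D3 information is accessed more when the weather is favorable to the disease in Belagavi. The difference in Dharwad is in the same direction but is not significant at the 0.05 level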

Analyze for D4 in Belagavi

bel_d4FavorableData <- subset (belagaviAgriData, belagaviAgriData$Temperature >= 22 & belagaviAgriData$Temperature <= 26 & belagaviAgriData$`Relative Humidity` > 85)
bel_d4FavorableData$D4Favorable <- "Yes"
bel_d4UnfavorableData <- subset (belagaviAgriData, !(belagaviAgriData$Temperature >= 22 & belagaviAgriData$Temperature <= 26 & belagaviAgriData$`Relative Humidity` > 85))
bel_d4UnfavorableData$D4Favorable <- "No"
belagaviD4Data <- rbind (bel_d4FavorableData, bel_d4UnfavorableData)
belagaviD4Data <- arrange (belagaviD4Data, belagaviD4Data$Months)

belagaviD4Test <- t.test (D4 ~ D4Favorable, data = belagaviD4Data, alternative = "greater", var.equal = TRUE)

Analyze for D4 in Dharwad

dhar_d4FavorableData <- subset (dharwadAgriData, dharwadAgriData$Temperature >= 22 & dharwadAgriData$Temperature <= 26 & dharwadAgriData$`Relative Humidity` > 85)
dhar_d4FavorableData$D4Favorable <- "Yes"
dhar_d4UnfavorableData <- subset (dharwadAgriData, !(dharwadAgriData$Temperature >= 22 & dharwadAgriData$Temperature <= 26 & dharwadAgriData$`Relative Humidity` > 85))
dhar_d4UnfavorableData$D4Favorable <- "No"
dharwadD4Data <- rbind (dhar_d4FavorableData, dhar_d4UnfavorableData)
dharwadD4Data <- arrange (dharwadD4Data, dharwadD4Data$Months)

dharwadD4Test <- t.test (D4 ~ D4Favorable, data = dharwadD4Data, alternative = "greater", var.equal = TRUE)

Display D4 results for both districts

belagaviD4Test

## 
##  Two Sample t-test
## 
## data:  D4 by D4Favorable
## t = -1.793, df = 22, p-value = 0.9566
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -22.15349       Inf
## sample estimates:
##  mean in group No mean in group Yes 
##          12.97384          24.28984

dharwadD4Test

## 
##  Two Sample t-test
## 
## data:  D4 by D4Favorable
## t = -2.3147, df = 20, p-value = 0.9843
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -47.21957       Inf
## sample estimates:
##  mean in group No mean in group Yes 
##          12.10875          39.16667

Findings

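The groups are ordered alphabetically (No, Yes), so with alternative = "greater" the tests above are based on mean(No) - mean(Yes), the opposite of Ha. The p-value for Ha: u1 > u2 is 1 minus the reported value
This gives about 0.0434 for Belagavi and 0.0157 for Dharwad. Both are less than 0.05, so we reject H0 in both districts
Mean access in favorable against unfavorable months is 24.29 against 12.97 in Belagavi and 39.17 against 12.11 in Dharwad

Conclusion

D4 information is accessed more when the weather is favorable to the disease, in both districts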

Analyze for D5 in Belagavi

bel_d5FavorableData <- subset (belagaviAgriData, belagaviAgriData$Temperature >= 22 & belagaviAgriData$Temperature <= 24.5 & belagaviAgriData$`Relative Humidity` >= 77 & belagaviAgriData$`Relative Humidity` <= 85)
bel_d5FavorableData$D5Favorable <- "Yes"
bel_d5UnfavorableData <- subset (belagaviAgriData, !(belagaviAgriData$Temperature >= 22 & belagaviAgriData$Temperature <= 24.5 & belagaviAgriData$`Relative Humidity` >= 77 & belagaviAgriData$`Relative Humidity` <= 85))
bel_d5UnfavorableData$D5Favorable <- "No"
belagaviD5Data <- rbind (bel_d5FavorableData, bel_d5UnfavorableData)
belagaviD5Data <- arrange (belagaviD5Data, belagaviD5Data$Months)

belagaviD5Test <- t.test (D5 ~ D5Favorable, data = belagaviD5Data, alternative = "greater", var.equal = TRUE)

Analyze for D5 in Dharwad

dhar_d5FavorableData <- subset (dharwadAgriData, dharwadAgriData$Temperature >= 22 & dharwadAgriData$Temperature <= 24.5 & dharwadAgriData$`Relative Humidity` >= 77 & dharwadAgriData$`Relative Humidity` <= 85)
dhar_d5FavorableData$D5Favorable <- "Yes"
dhar_d5UnfavorableData <- subset (dharwadAgriData, !(dharwadAgriData$Temperature >= 22 & dharwadAgriData$Temperature <= 24.5 & dharwadAgriData$`Relative Humidity` >= 77 & dharwadAgriData$`Relative Humidity` <= 85))
dhar_d5UnfavorableData$D5Favorable <- "No"
dharwadD5Data <- rbind (dhar_d5FavorableData, dhar_d5UnfavorableData)
dharwadD5Data <- arrange (dharwadD5Data, dharwadD5Data$Months)

dharwadD5Test <- t.test (D5 ~ D5Favorable, data = dharwadD5Data, alternative = "greater", var.equal = TRUE)

Display D5 results for both districts

belagaviD5Test

## 
##  Two Sample t-test
## 
## data:  D5 by D5Favorable
## t = -3.6675, df = 22, p-value = 0.9993
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -38.2594      Inf
## sample estimates:
##  mean in group No mean in group Yes 
##          10.51547          36.57407

dharwadD5Test

## 
##  Two Sample t-test
## 
## data:  D5 by D5Favorable
## t = -0.10853, df = 20, p-value = 0.5427
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -18.75428       Inf
## sample estimates:
##  mean in group No mean in group Yes 
##          13.06725          14.17749

Findings

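The groups are ordered alphabetically (No, Yes), so with alternative = "greater" the tests above are based on mean(No) - mean(Yes), the opposite of Ha. The p-value for Ha: u1 > u2 is 1 minus the reported value
This gives about 0.0007 for Belagavi, which is less than 0.05, and 0.4573 for Dharwad, which is greater than 0.05. We reject H0 for Belagavi and fail to reject it for Dharwad
Mean access in favorable against unfavorable months is 36.57 against 10.52 in Belagavi and 14.18 against 13.07 in Dharwad

Conclusion

D5 information is accessed more when the weather is favorable to the disease in Belagavi. There is no significant difference in Dharwad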

Analyze for D7 in Belagavi

bel_d7FavorableData <- subset (belagaviAgriData, belagaviAgriData$Temperature > 25 & belagaviAgriData$`Relative Humidity` > 80)
bel_d7FavorableData$D7Favorable <- "Yes"
bel_d7UnfavorableData <- subset (belagaviAgriData, !(belagaviAgriData$Temperature > 25 & belagaviAgriData$`Relative Humidity` > 80))
bel_d7UnfavorableData$D7Favorable <- "No"
belagaviD7Data <- rbind (bel_d7FavorableData, bel_d7UnfavorableData)
belagaviD7Data <- arrange (belagaviD7Data, belagaviD7Data$Months)

belagaviD7Test <- t.test (D7 ~ D7Favorable, data = belagaviD7Data, alternative = "greater", var.equal = TRUE)

Analyze for D7 in Dharwad

dhar_d7FavorableData <- subset (dharwadAgriData, dharwadAgriData$Temperature > 25 & dharwadAgriData$`Relative Humidity` > 80)
dhar_d7FavorableData$D7Favorable <- "Yes"
dhar_d7UnfavorableData <- subset (dharwadAgriData, !(dharwadAgriData$Temperature > 25 & dharwadAgriData$`Relative Humidity` > 80))
dhar_d7UnfavorableData$D7Favorable <- "No"
dharwadD7Data <- rbind (dhar_d7FavorableData, dhar_d7UnfavorableData)
dharwadD7Data <- arrange (dharwadD7Data, dharwadD7Data$Months)

dharwadD7Test <- t.test (D7 ~ D7Favorable, data = dharwadD7Data, alternative = "greater", var.equal = TRUE)

Display D7 results for both districts

belagaviD7Test

## 
##  Two Sample t-test
## 
## data:  D7 by D7Favorable
## t = -3.4275, df = 22, p-value = 0.9988
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -77.17649       Inf
## sample estimates:
##  mean in group No mean in group Yes 
##          21.00642          72.42328

dharwadD7Test

## 
##  Two Sample t-test
## 
## data:  D7 by D7Favorable
## t = -0.72663, df = 20, p-value = 0.7621
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -50.91364       Inf
## sample estimates:
##  mean in group No mean in group Yes 
##          19.90822          35.00000

Findings

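The groups are ordered alphabetically (No, Yes), so with alternative = "greater" the tests above are based on mean(No) - mean(Yes), the opposite of Ha. The p-value for Ha: u1 > u2 is 1 minus the reported value
This gives about 0.0012 for Belagavi, which is less than 0.05, and 0.2379 for Dharwad, which is greater than 0.05. We reject H0 for Belagavi and fail to reject it for Dharwad
Mean access in favorable against unfavorable months is 72.42 against 21.01 in Belagavi and 35.00 against 19.91 in Dharwad

Conclusion

D7 information is accessed more when the weather is favorable to the disease in Belagavi. The difference in Dharwad is in the same direction but is not significant at the 0.05 level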
