1 Research question
- 1.1 Clean and read data
- 1.2 Remove outliers
2 About the data
3 Sprints (hours and points) in weeks with 7hs/day
4 Sprints in weeks with 8hs/day
5 Mean Comparison between Total weeks
6 Hypothesis tests
- 6.1 Hypothesis t-test: points per week in weeks with 8 h/day compared with weeks with 7 h/day
7 Coefficients
8 Conclusions

1 Research question

Does increasing the number of programming hours per day make a worker complete more points in a week?

Hypothesis: Adding one hour of programming work per day does not make a worker complete more points in a week.

1.1 Clean and read data

The hours have been rounded to whole numbers. For example, 36.87 is rounded up to 37 and 36.40 is rounded down to 36.
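As a small illustration of the rounding step, the sketch below assumes conventional round-to-nearest, which the text does not state explicitly. `hours_raw` is a hypothetical vector, not data from the study.

```r
# Hypothetical raw weekly hours (illustrative values only)
hours_raw <- c(36.87, 36.40)

# Assumption: hours were rounded to the nearest whole number
hours_rounded <- round(hours_raw)
print(hours_rounded)
```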

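library(readxl)
library(readr)    # read_csv(), read_delim()
library(dplyr)    # %>%, filter(), group_by(), summarise()
library(ggplot2)  # ggplot() figures in section 6

# Raw sprint export
AlleySprint <- read_csv("AlleySprint.csv")

# Cleaned, semicolon-delimited data used in the rest of the analysis
clean <- read_delim("clean.csv", delim = ";", escape_double = FALSE,
    col_types = cols(Hours_p_point = col_number(),
        Points = col_number(), Program_Hours = col_number(),
        regime = col_number()), trim_ws = TRUE)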


DT::datatable(clean[,1:7])

1.2 Remove outliers

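# Values outside the boxplot whiskers, computed separately for each column
out_points <- boxplot.stats(clean$Points)$out
out_hours  <- boxplot.stats(clean$Program_Hours)$out

# Keep rows that are not outliers in either column. Logical indexing
# also works when no outliers are found.
data_outliers <- clean[!(clean$Points %in% out_points |
                         clean$Program_Hours %in% out_hours), ]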
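The rule behind `boxplot.stats()` can be sketched on toy data. The vectors `x` and `y` below are hypothetical. The example also shows why logical indexing is more robust than negative `which()` indices for dropping rows when a column may contain no outliers.

```r
# Hypothetical weekly points with one extreme value
x <- c(7, 8, 9, 8, 10, 9, 7, 30)

# boxplot.stats() flags values more than 1.5 * IQR (hinge-based) beyond the box
out <- boxplot.stats(x)$out
x_clean <- x[!(x %in% out)]

# Pitfall: with no outliers, which() returns integer(0), and
# negative indexing with an empty vector selects nothing at all
y <- c(7, 8, 9, 8, 10)
no_out <- boxplot.stats(y)$out
dropped_everything <- y[-which(y %in% no_out)]  # length 0
kept_everything    <- y[!(y %in% no_out)]       # all 5 values
```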

2 About the data

From week xxx to week xx, there are 35 weeks of analysis, covering about 1211 hours of programming work and 315 points. On average, 3.44 hours are spent per point.
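An "hours per point" average can be computed in two ways that generally give different numbers: the ratio of the totals, or the mean of the weekly ratios. Using only the totals quoted above, the ratio of totals is 1211 / 315, about 3.84, so the 3.44 figure presumably comes from a different calculation or subset (for example, averaging the weekly `Hours_p_point` values); the text does not say which. The sketch below uses hypothetical weekly values to show how the two approaches can differ.

```r
# Ratio of the totals quoted in the text
ratio_of_totals <- 1211 / 315          # about 3.84 hours per point

# Hypothetical weekly data (not from the study)
hours  <- c(30, 35, 40)
points <- c(10,  7, 16)

toy_ratio_of_totals <- sum(hours) / sum(points)   # 105 / 33
toy_mean_of_ratios  <- mean(hours / points)       # mean of 3, 5, 2.5
```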

3 Sprints (hours and points) in weeks with 7hs/day

# regime == 1 marks the weeks worked at 7 h/day
Weeks_7hs <- data_outliers %>%
    filter(regime == 1)

DT::datatable(Weeks_7hs)

4 Sprints in weeks with 8hs/day

# regime == 2 marks the weeks worked at 8 h/day
Weeks_8hs <- data_outliers %>%
    filter(regime == 2)

DT::datatable(Weeks_8hs)

5 Mean Comparison between Total weeks

# Per-regime summary: number of weeks, weekly means, and totals
Comparation <- data_outliers %>%
    dplyr::select(Sprint_enddate, cat_regime, regime, Points, Program_Hours, Hours_p_point) %>%
    group_by(cat_regime) %>%
    summarise(Weeks_qnt = n_distinct(Sprint_enddate),
              M_Points = mean(Points),
              M_Program_Hours_W = mean(Program_Hours),
              M_Hours_p_point_W = mean(Hours_p_point),
              Total_points = sum(Points),
              Total_Hours = sum(Program_Hours))

DT::datatable(Comparation)

6 Hypothesis tests

6.1 Hypothesis t-test: points per week in weeks with 8 h/day compared with weeks with 7 h/day

H0: X = Y

H1: X > Y

where X = points per week with 8 h/d and Y = points per week with 7 h/d.

That is, H1 states that weeks with 8 h/d of work produce statistically more points than weeks with 7 h/d.

H0 is rejected in favor of H1 when the p-value is less than or equal to α (alpha = 0.05). If the p-value is greater than α, there is no statistically significant difference between points per week with 8 h/d and with 7 h/d (H0 is not rejected).

The significance level, also denoted as alpha or α, is the probability of rejecting the null hypothesis when it is true. For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.

t.test(Points ~ cat_regime, data = data_outliers)

## 
##  Welch Two Sample t-test
## 
## data:  Points by cat_regime
## t = -1.4194, df = 22.865, p-value = 0.1693
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.3095889  0.6166065
## sample estimates:
## mean in group 7h_p_day mean in group 8h_p_day 
##               7.916667               9.263158

The results show that H0 could not be rejected (p-value = 0.1693); that is, there is no statistically significant difference in points between weeks with 8 h/day and weeks with 7 h/day of work (programming hours).
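The reported p-value can be reproduced from the test statistic and degrees of freedom printed in the output. Note that `t.test()` defaults to a two-sided alternative, while H1 as stated is one-sided (8 h/d greater than 7 h/d). Because the observed difference lies in the direction of H1, the one-sided p-value is half the two-sided one. The sketch below uses only the numbers printed above; the variable names are illustrative.

```r
t_stat <- -1.4194   # standardized difference mean(7h) - mean(8h)
df     <- 22.865    # Welch-Satterthwaite degrees of freedom
alpha  <- 0.05

p_two_sided <- 2 * pt(-abs(t_stat), df)   # about 0.169, as reported
p_one_sided <- pt(t_stat, df)             # H1: mean(7h) < mean(8h)

reject_h0 <- p_one_sided <= alpha         # H0 is still not rejected
```

With the data frame at hand, the one-sided test could be requested directly with `t.test(Points ~ cat_regime, data = data_outliers, alternative = "less")`, assuming `7h_p_day` is the first factor level, as the output suggests.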

# Boxplot of points by regime, with reference lines at 7.3 and 10.3 points
ggplot(data_outliers, aes(x = as.factor(cat_regime), y = Points)) +
    geom_boxplot() +
    geom_hline(yintercept = 7.3, color = "red", size = 1.5) +
    geom_hline(yintercept = 10.3, color = "red", size = 1.5)

boxplot(Weeks_7hs$Points, Weeks_8hs$Points,
        outline = FALSE,
        names = c("Point_W7", "Point_W8"),
        col = c("blue", "yellow"),
        main = "Points comparison")
# Reference lines at 7.3 and 10.3 points
abline(h = 7.3, col = "red")
abline(h = 10.3, col = "red")

boxplot(Weeks_7hs$Program_Hours, Weeks_8hs$Program_Hours,
        outline = FALSE,
        names = c("Hours_W7", "Hours_W8"),
        col = c("blue", "yellow"),
        main = "Hours comparison")
# Reference lines at 32 and 35 hours
abline(h = 32, col = "red")
abline(h = 35, col = "red")

library("ggpubr")
library(rpart)
library(rpart.plot)
library(corrplot)
library(ggvis)
library(viridis)
library(hrbrthemes)

# Points against programming hours, one linear fit per regime
ggplot(data = data_outliers,
       aes(x = Program_Hours, y = Points, colour = as.factor(cat_regime))) +
    geom_point() +
    geom_smooth(method = "lm") +
    labs(x = "Programming hours", y = "Points") +
    theme(legend.position = "bottom") +
    ggtitle("Points and Hours")

# Bubble plot: point size is mapped to programming hours
data_outliers %>%
    ggplot(aes(x = Program_Hours, y = Points, size = Program_Hours,
               color = as.factor(cat_regime))) +
    geom_point(alpha = 0.5) +
    scale_size(range = c(.1, 20), name = "Programming hours")

# One panel per regime with a linear fit; points coloured by hours per point
ggplot(data_outliers, aes(x = Program_Hours, y = Points)) +
    geom_point(aes(color = Hours_p_point)) +
    geom_smooth(method = "lm") +
    facet_grid(. ~ as.factor(cat_regime))

7 Coefficients

# Linear model without intercept: points explained by hours and regime
fit2 <- lm(formula = Points ~ Program_Hours + as.factor(cat_regime) - 1, data = data_outliers)

summary(fit2)

## 
## Call:
## lm(formula = Points ~ Program_Hours + as.factor(cat_regime) - 
##     1, data = data_outliers)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5871 -1.7448  0.2036  1.6077  5.4129 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)  
## Program_Hours                   0.2980     0.1516   1.965   0.0594 .
## as.factor(cat_regime)7h_p_day  -1.8415     5.0159  -0.367   0.7163  
## as.factor(cat_regime)8h_p_day  -1.8241     5.6704  -0.322   0.7501  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.433 on 28 degrees of freedom
## Multiple R-squared:  0.9355, Adjusted R-squared:  0.9286 
## F-statistic: 135.4 on 3 and 28 DF,  p-value: < 2.2e-16
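To read this model, the predicted points for each regime at the same number of programming hours can be computed from the printed coefficients. The 35-hour week below is an arbitrary illustrative value, not a figure from the study. The two regime terms differ by less than 0.02 points, so at equal hours the model predicts essentially the same output for both schedules.

```r
b_hours <- 0.2980    # points per programming hour
b_7h    <- -1.8415   # regime term, 7h_p_day
b_8h    <- -1.8241   # regime term, 8h_p_day

hours <- 35          # illustrative week length (assumption)

pred_7h <- b_hours * hours + b_7h    # about 8.59 points
pred_8h <- b_hours * hours + b_8h    # about 8.61 points
regime_gap <- pred_8h - pred_7h      # about 0.017 points
```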

# Same model without the regime term
fit2 <- lm(formula = Points ~ Program_Hours - 1, data = data_outliers)

summary(fit2)

## 
## Call:
## lm(formula = Points ~ Program_Hours - 1, data = data_outliers)
## 
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.642 -1.765  0.099  1.599  5.358 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## Program_Hours  0.24691    0.01188   20.79   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.358 on 30 degrees of freedom
## Multiple R-squared:  0.9351, Adjusted R-squared:  0.9329 
## F-statistic: 432.2 on 1 and 30 DF,  p-value: < 2.2e-16
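Two points help when reading this output. First, the slope of 0.24691 points per hour corresponds to roughly 1 / 0.24691, about 4.05 hours per point. Second, when the intercept is removed with `- 1`, `summary.lm()` computes R-squared against a baseline of zero rather than the mean of the response, so values such as 0.93 are not comparable with the R-squared of a model that includes an intercept. The sketch below illustrates the second point on simulated data (hypothetical values, not the study data).

```r
set.seed(1)
# Simulated data: the response varies around 9 with no real dependence on x
x <- runif(30, 30, 40)
y <- 9 + rnorm(30, sd = 2)

fit_int   <- lm(y ~ x)       # with intercept
fit_noint <- lm(y ~ x - 1)   # through the origin

r2_int   <- summary(fit_int)$r.squared
r2_noint <- summary(fit_noint)$r.squared

# Without an intercept, R-squared equals 1 - RSS / sum(y^2)
r2_manual <- 1 - sum(residuals(fit_noint)^2) / sum(y^2)

hours_per_point <- 1 / 0.24691   # slope from the output above, inverted
```

In this simulation the through-origin fit reports a much higher R-squared than the fit with an intercept, even though `y` does not depend on `x`.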

8 Conclusions

Although the number of hours worked in a week with an 8 h/d schedule is statistically greater than the number of hours worked in a week with a 7 h/d schedule, there is no statistically significant difference in the number of points produced between those weeks.

More hours means more points?

Viviane Schneider

October 21, 2020