P-Value

The p-value is an important concept in statistics.

A p-value is the probability, assuming the null hypothesis is true, of seeing a result at least as extreme as the one observed in a random sample. A small p-value means the data would be unusual under the null hypothesis, which counts as evidence against it.

Let's look at heights and weights for women.

I hypothesize that women weigh more as their height increases!

Further on we will look at baby chickens' weight and see how different factors, such as time and diet, affect it as well.

R Code for the plotly plot

We will use a plotly plot to compare weight vs. height in R's built-in women dataset.

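library(plotly)  # provides plot_ly(), layout(), config() and the %>% pipe

# Fit the linear model; summary(mod) prints the regression output used later
mod = lm(weight ~ height, data=women)
Weight = women$weight
Height = women$height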

xax <- list( 
  title = "Weight", 
  titlefont = list(family="Verdana") 
) 
yax <- list( 
  title = "Height", 
  titlefont = list(family="Verdana") 
)

W = plot_ly(women, x=Weight, y=Height, type = "scatter",
            mode="markers") %>%
  layout(xaxis= xax, yaxis=yax)
   
config(W, displaylogo=FALSE)

Plotly Plot of Women's Weight vs. Height

Solving P-Value

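Fitting a linear regression gives us the p-value. The general formula is: \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon\). Here there is a single predictor, so the model is \(\text{weight} = \beta_0 + \beta_1 \cdot \text{height} + \epsilon\), and the p-value for height tests the null hypothesis \(\beta_1 = 0\). The output of summary(mod) is: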

Call:
lm(formula = weight ~ height, data = women)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.7333 -1.1333 -0.3833  0.7417  3.1167 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
height        3.45000    0.09114   37.85 1.09e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.525 on 13 degrees of freedom
Multiple R-squared:  0.991, Adjusted R-squared:  0.9903 
F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14

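With this we determine that the p-value for height is \(1.09 \times 10^{-14}\), which is basically \(0\), meaning height and weight are associated with one another. The null hypothesis (that there is no linear relationship between height and weight) is therefore rejected.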
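
The p-value in the coefficient table can be reproduced by hand from the estimate and its standard error. The following is a minimal sketch using base R and the same women dataset; the names `coefs`, `t_stat`, `p_manual` and `p_table` are chosen here for illustration only.

```r
# Refit the model from the text so this block is self-contained
mod <- lm(weight ~ height, data = women)
coefs <- summary(mod)$coefficients

# t statistic for the slope: estimate divided by its standard error
t_stat <- coefs["height", "Estimate"] / coefs["height", "Std. Error"]

# Two-sided p-value from the t distribution with the residual degrees of freedom
p_manual <- 2 * pt(-abs(t_stat), df = df.residual(mod))

# Value reported by summary() in the Pr(>|t|) column
p_table <- coefs["height", "Pr(>|t|)"]

print(c(manual = p_manual, table = p_table))
```

Both numbers should agree, which shows that the Pr(>|t|) column is simply the two-sided tail probability of the t value printed next to it.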

GGplot ChickWeight vs Time

GGplot with Diet

This suggests that different diets can affect weight.
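
A plot can only hint at an effect; a p-value can back it up. As a rough sketch that is not part of the original analysis, one could compare mean weight across diets with a one-way ANOVA. Restricting the data to `Time == 21` is an assumption made here so that each chick contributes a single observation; `final`, `diet_fit`, `diet_table` and `p_diet` are illustrative names.

```r
# Keep one row per chick: weights on the last measured day
final <- subset(ChickWeight, Time == 21)

# One-way ANOVA: does mean final weight differ between diets?
diet_fit <- lm(weight ~ Diet, data = final)
diet_table <- anova(diet_fit)
print(diet_table)

# p-value for the Diet term (first row of the ANOVA table)
p_diet <- diet_table[["Pr(>F)"]][1]
print(p_diet)
```

A small `p_diet` would indicate that the differences between diets seen in the plot are unlikely to be due to chance alone.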

Solving P-Value for Chicks

Call:
lm(formula = ChickWeight$Time ~ ChickWeight$weight)

Residuals:
   Min     1Q Median     3Q    Max 
-9.757 -2.385 -0.817  1.954 14.088 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)        1.021013   0.305627   3.341  0.00089 ***
ChickWeight$weight 0.079602   0.002168  36.725  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.7 on 576 degrees of freedom
Multiple R-squared:  0.7007,    Adjusted R-squared:  0.7002 
F-statistic:  1349 on 1 and 576 DF,  p-value: < 2.2e-16

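This model regresses Time on weight. The p-value for the slope is below \(2 \times 10^{-16}\), so it is again basically \(0\), and we reject the null hypothesis that time and weight have no linear association.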
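
The printed model has Time as the response and weight as the predictor. To match the stated question (time affecting weight), the roles can be swapped. The sketch below fits both versions; `fit_printed` and `fit_swapped` are names chosen here. In a simple regression with one predictor, the slope's t statistic, and therefore its p-value, is the same in both directions.

```r
# Model as printed in the output: Time explained by weight
fit_printed <- lm(Time ~ weight, data = ChickWeight)

# Model matching the question: weight explained by Time
fit_swapped <- lm(weight ~ Time, data = ChickWeight)

# Slope t statistics from each fit
t_printed <- summary(fit_printed)$coefficients["weight", "t value"]
t_swapped <- summary(fit_swapped)$coefficients["Time", "t value"]

print(c(printed = t_printed, swapped = t_swapped))
```

The coefficients themselves differ between the two fits, so the swapped model is the one to use when reading the slope as weight gained per day.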

Conclusion

There are a lot of factors that determine weight, and finding the p-value helps you come to a conclusion.

One common pair of hypotheses that a p-value is used to test is:
\(H_0:\mu_1-\mu_2=0\) versus \(H_a:\mu_1-\mu_2\neq0\)
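
These hypotheses correspond to a two-sample t-test. As a hedged illustration, base R's `t.test()` tests a difference in means of zero against a two-sided alternative by default. The choice of diets 1 and 4 at day 21 is an assumption made here for the example and is not part of the original analysis; `final`, `w1`, `w4` and `tt` are illustrative names.

```r
# One observation per chick: weights on the last measured day
final <- subset(ChickWeight, Time == 21)

# Final weights for two of the diets (an arbitrary pair chosen for illustration)
w1 <- final$weight[final$Diet == 1]
w4 <- final$weight[final$Diet == 4]

# Welch two-sample t-test; defaults are mu = 0 and a two-sided alternative
tt <- t.test(w1, w4)
print(tt$p.value)
```

If the printed p-value falls below the chosen significance level (0.05 is a common choice), the null hypothesis of equal means is rejected.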

In both scenarios the p-value was basically \(0\). This means that an association this strong would be very unlikely to show up by coincidence if the null hypothesis were true.

Thank you!!!