Goals :

Application exercise

The “Birthweight” dataset comes from a survey of risk factors associated with low birth weight in infants (data collected at Baystate Medical Center in Massachusetts during 1986). Low birth weight has interested physicians for many years because infant mortality and infant abnormality rates are very high among low birth weight infants. A woman’s behavior during pregnancy (diet, smoking habits, etc.) can significantly alter the chances of carrying the pregnancy to term, and therefore of giving birth to a child of normal weight. A child is considered to have low birth weight if the birth weight is less than 2500 g.

knitr::opts_chunk$set(comment = NA)

1. Download the “Birth_weight.xls” data file directly from the Internet using the RStudio import window: Import Dataset

Install the following libraries to help in data manipulation and display of summary statistics

if (!require(dplyr)) { install.packages('dplyr') } # install dplyr if it is not already available
library(stargazer)
library(dplyr) # load the library
library(gtsummary)
library(epiDisplay)
library(vtable)
library(wilson)
library(fastR2)
library(readxl)
library(epitools)
library(mosaic)
library(finalfit)
library(magrittr)
library(ggplot2)
library(psych)
library(corrplot)
library(VIM)
library(gridExtra)
library(car)
library(knitr)
library(gmodels)
# Note: several of these packages mask functions from packages loaded earlier.
# In particular, MASS (loaded by epiDisplay) masks dplyr::select, and mosaic
# (loaded by fastR2) masks stats::prop.test, mean, sd and other summary functions.

Load the dataset

Poids_naissance <- read.csv("C:\\Users\\user\\Downloads\\Poids_naissance.csv")

2. Display the first 3 lines of the file

head(Poids_naissance,3)

3. Redefine the table as a data frame using the as.data.frame() command, then use the attach() command

Poids_naissance <- as.data.frame(Poids_naissance)

Attach the dataset

attach(Poids_naissance)

4. Structuring categorical variables

Structuring categorical variables refers to organizing and preparing categorical data for analysis using the R programming language. Categorical variables are those that represent qualitative or categorical data, such as gender, race, education level, or favorite color.

In R, categorical variables are stored as factors, a data type that represents a categorical variable with a fixed set of levels. Factors allow efficient storage and analysis of categorical data, and many statistical tests and plots treat them appropriately.

To structure a categorical variable as a factor, use the factor() function. It takes a vector of categorical data as its first argument and the set of levels as its second argument (if not specified, the levels are determined automatically from the unique values in the data).

Factor the categorical variables

# RACE is coded 1, 2, 3; attach readable labels to the codes
Poids_naissance$RACE <- factor(Poids_naissance$RACE, levels = c(1, 2, 3),
                               labels = c("White", "Black", "Other"))
# Convert the remaining categorical variables to factors
Poids_naissance <- Poids_naissance %>%
  mutate(across(c(LOW, SMOKE, PTL, HT, UI, FVT), factor))
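As a small illustration of the factor() call above, the sketch below uses a made-up vector of race codes (hypothetical values, not the survey data) to show how levels and labels map numeric codes to named categories.

```r
# Hypothetical race codes, for illustration only (not taken from the dataset)
race_code <- c(1, 3, 2, 1, 1, 3)

# levels lists the codes in order; labels gives the name attached to each code
race <- factor(race_code, levels = c(1, 2, 3),
               labels = c("White", "Black", "Other"))

levels(race)       # the fixed set of categories
table(race)        # counts per category
as.integer(race)   # the underlying integer codes
```

Giving levels and labels explicitly keeps the category order under the analyst's control instead of leaving it to the sort order of the observed values.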

5. For the “RACE” variable, construct a frequency table and an appropriate diagram

Frequency Table
tab1(Poids_naissance$RACE)

Poids_naissance$RACE : 
        Frequency Percent Cum. percent
White          96    50.8         50.8
Black          26    13.8         64.6
Other          67    35.4        100.0
  Total       189   100.0        100.0

From the results above, there were 96 individuals in race category 1 (White), 26 in category 2 (Black) and 67 in category 3 (Other). These represent 51%, 14% and 35% of the sample, respectively. The bar chart that tab1() draws alongside this table is the visual representation of the same distribution.
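The percentages in the frequency table can be checked in base R from the counts alone. The sketch below re-enters the three counts printed by tab1() by hand; the object names are illustrative.

```r
# Counts taken from the tab1() output above
race_counts <- c(White = 96, Black = 26, Other = 67)

# Percentages rounded to one decimal place, then accumulated
percent <- round(100 * race_counts / sum(race_counts), 1)
freq_table <- data.frame(Frequency = race_counts,
                         Percent = percent,
                         Cum.percent = cumsum(percent))
freq_table
```

Passing race_counts to barplot() draws the corresponding bar chart.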

6. Calculate the mean and standard deviations of continuous quantitative variables

sum_stat <- data.frame(Poids_naissance$AGE, Poids_naissance$LWT, Poids_naissance$BWT)
stargazer(Poids_naissance[,-1], type = "text")

==========================================
Statistic  N    Mean    St. Dev. Min  Max 
------------------------------------------
AGE       189  23.238    5.299   14   45  
LWT       189  129.815   30.579  80   250 
BWT       189 2,944.656 729.022  709 4,990
------------------------------------------
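The same kind of summary can be produced in base R with sapply(). The sketch below uses a small hypothetical data frame (the values are invented for illustration and are not the survey data) and applies sd() to the raw values of each column.

```r
# Hypothetical values for illustration only (not the survey data)
toy <- data.frame(AGE = c(19, 23, 31, 26),
                  LWT = c(110, 121, 140, 155))

# One row per statistic, one column per variable
stats <- sapply(toy, function(x) c(N = length(x), Mean = mean(x), SD = sd(x),
                                   Min = min(x), Max = max(x)))
round(stats, 3)
```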
Summary Statistics for a few selected variables
st(sum_stat, col.breaks = 3,
   summ = list(
     c('notNA(x)','mean(x)', 'median(x)','sd(x)','min(x)','max(x)'),
     c('notNA(x)','mean(x)')
   ),
   summ.names = list(
     c('N','Mean','median','SD','Min','Max'),
     c('Count','Percent')
   ))
Summary Statistics
Variable             N    Mean      median  SD       Min  Max
Poids_naissance.AGE  189  23.238    23      5.299    14   45
Poids_naissance.LWT  189  129.815   121     30.579   80   250
Poids_naissance.BWT  189  2944.656  2977    729.022  709  4990
Summary Statistics of all variables
# The ID column (column 1) is an identifier, so it is left out of the summary
st(Poids_naissance[,-1], col.breaks = 11,
   summ = list(
     c('notNA(x)','mean(x)', 'median(x)','sd(x)','min(x)','max(x)', 'skew(x)'),
     c('notNA(x)','mean(x)')
   ),
   summ.names = list(
     c('N','Mean','median','SD','Min','Max', 'skew'),
     c('Count','Percent')
   ))
Summary Statistics
Variable   N    Mean      median  SD       Min  Max   skew
AGE        189  23.238    23      5.299    14   45    0.711
LWT        189  129.815   121     30.579   80   250   1.38
RACE       189
… White    96   50.8%
… Black    26   13.8%
… Other    67   35.4%
SMOKE      189
… 0        115  60.8%
… 1        74   39.2%
PTL        189
… 0        159  84.1%
… 1        24   12.7%
… 2        5    2.6%
… 3        1    0.5%
HT         189
… 0        177  93.7%
… 1        12   6.3%
UI         189
… 0        161  85.2%
… 1        28   14.8%
FVT        189
… 0        100  52.9%
… 1        47   24.9%
… 2        30   15.9%
… 3        7    3.7%
… 4        4    2.1%
… 6        1    0.5%
BWT        189  2944.656  2977    729.022  709  4990  -0.207
LOW        189
… 0        130  68.8%
… 1        59   31.2%

Alternative Summary Statistics

describe(Poids_naissance[,2:11])
Other Grouped Summary Statistics
Poids_naissance[, 2:10] %>%
  tbl_summary(by = SMOKE) %>%
  add_p() %>%
  add_overall() %>% 
  bold_labels()
Characteristic   Overall, N = 189 [1]   0, N = 115 [1]         1, N = 74 [1]          p-value [2]
AGE              23 (19, 26)            23 (20, 26)            22 (19, 26)            0.5
LWT              121 (110, 140)         124 (112, 142)         120 (107, 137)         0.2
RACE                                                                                  <0.001
    White        96 (51%)               44 (38%)               52 (70%)
    Black        26 (14%)               16 (14%)               10 (14%)
    Other        67 (35%)               55 (48%)               12 (16%)
PTL                                                                                   0.036
    0            159 (84%)              103 (90%)              56 (76%)
    1            24 (13%)               10 (8.7%)              14 (19%)
    2            5 (2.6%)               2 (1.7%)               3 (4.1%)
    3            1 (0.5%)               0 (0%)                 1 (1.4%)
HT                                                                                    >0.9
    0            177 (94%)              108 (94%)              69 (93%)
    1            12 (6.3%)              7 (6.1%)               5 (6.8%)
UI                                                                                    0.4
    0            161 (85%)              100 (87%)              61 (82%)
    1            28 (15%)               15 (13%)               13 (18%)
FVT                                                                                   0.12
    0            100 (53%)              55 (48%)               45 (61%)
    1            47 (25%)               35 (30%)               12 (16%)
    2            30 (16%)               19 (17%)               11 (15%)
    3            7 (3.7%)               3 (2.6%)               4 (5.4%)
    4            4 (2.1%)               3 (2.6%)               1 (1.4%)
    6            1 (0.5%)               0 (0%)                 1 (1.4%)
BWT              2,977 (2,414, 3,475)   3,100 (2,509, 3,622)   2,776 (2,370, 3,246)   0.007
[1] Median (IQR); n (%)
[2] Wilcoxon rank sum test; Pearson's Chi-squared test; Fisher's exact test

7. For the binary variable “LOW” (birth weight less than or equal to 2500 g), determine a Wilson confidence interval and interpret it

Get the proportion of success and failure

tab1(Poids_naissance$LOW)

Poids_naissance$LOW : 
        Frequency Percent Cum. percent
0             130    68.8         68.8
1              59    31.2        100.0
  Total       189   100.0        100.0

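prop.test() tests whether the proportion of successes in a sample differs significantly from a hypothesized value (0.5 by default) and also returns a confidence interval for the proportion. From the output above, there are 59 successes (birth weight less than or equal to 2500 g) and 130 failures. The interval returned by prop.test() is the Wilson score interval: with correct = FALSE it is the plain Wilson interval, and with the default continuity correction it is the continuity-corrected version. Both are computed below.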

prop.test(59,189)

    1-sample proportions test with continuity correction

data:  59 out of 189
X-squared = 25.926, df = 1, p-value = 3.548e-07
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.2479596 0.3841585
sample estimates:
        p 
0.3121693 
prop.test(59,189, correct=FALSE)

    1-sample proportions test without continuity correction

data:  59 out of 189
X-squared = 26.672, df = 1, p-value = 2.411e-07
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.2504031 0.3814188
sample estimates:
        p 
0.3121693 

The sample size is 189, and out of those 189, 59 were successes. The sample proportion of successes (p) is 0.3121693. The results of the test with continuity correction show a chi-squared value of 25.926, with 1 degree of freedom and a p-value of 3.548e-07. The alternative hypothesis is that the true proportion (p) is not equal to 0.5. The 95% confidence interval for the true proportion ranges from 0.2479596 to 0.3841585.

The results of the test without continuity correction show a chi-squared value of 26.672, with 1 degree of freedom and a p-value of 2.411e-07. The alternative hypothesis is the same as before. The 95% confidence interval for the true proportion ranges from 0.2504031 to 0.3814188.

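Both tests reject the null hypothesis that the true proportion equals 0.5, since the p-values are far below the usual 0.05 significance level. Consistently, neither confidence interval contains 0.5. The two intervals differ only slightly, and the continuity correction gives the slightly wider one (0.248 to 0.384, against 0.250 to 0.381 for the Wilson interval without correction). In other words, we are 95% confident that between about 25% and 38% of births in the population this sample represents are low birth weight.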
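To make the Wilson interval less of a black box, the sketch below computes it by hand from the score-interval formula, using the counts reported above (59 out of 189), and compares it with the interval returned by stats::prop.test() without continuity correction. The variable names are illustrative.

```r
x <- 59    # low birth weight infants (LOW = 1)
n <- 189   # total births
p_hat <- x / n
z <- qnorm(0.975)   # about 1.96 for a 95% interval

# Wilson score interval: centre and half-width
centre <- (p_hat + z^2 / (2 * n)) / (1 + z^2 / n)
half_width <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
wilson <- c(lower = centre - half_width, upper = centre + half_width)
wilson

# Same interval from base R; stats:: avoids the version masked by mosaic
stats::prop.test(x, n, correct = FALSE)$conf.int
```

Unlike the simple interval p_hat ± z * sqrt(p_hat * (1 - p_hat) / n), the Wilson interval is not centred on the sample proportion, which is why it behaves better for small samples and for proportions near 0 or 1.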

9. Create a contingency table crossing the “LOW” response and smoking during pregnancy

A contingency table, also known as a cross-tabulation table, is a table that displays the distribution of two or more categorical variables. The table presents the frequency or count of each combination of the variables in rows and columns.

Each row in the table represents a level of one categorical variable, while each column represents a level of the other categorical variable. The cells in the table contain the number of observations that fall into each combination of the two variables. The table can also display percentages or proportions instead of counts, depending on the purpose of the analysis.

Contingency tables are commonly used in statistical analysis to examine the relationship between two or more categorical variables, such as gender and occupation or region of residence and political party affiliation. They can help identify patterns and associations between variables, and can be used to test hypotheses about the relationship between them. Chi-square tests and Fisher’s exact test are commonly used statistical tests for analyzing contingency tables.

dat <- Poids_naissance[,c(5,11)] %>%
  tbl_summary(by = LOW) %>%
  add_p() %>%
  add_overall() %>% 
  bold_labels()
dat
Characteristic   Overall, N = 189 [1]   0, N = 130 [1]   1, N = 59 [1]   p-value [2]
SMOKE                                                                    0.026
    0            115 (61%)              86 (66%)         29 (49%)
    1            74 (39%)               44 (34%)         30 (51%)
[1] n (%)
[2] Pearson's Chi-squared test
Summary Statistics of other Variables
Poids_naissance [,c(2,3,5,10)] %>%
  tbl_summary(by = SMOKE) %>%
  add_p() %>%
  add_overall() %>% 
  bold_labels()
Characteristic   Overall, N = 189 [1]   0, N = 115 [1]         1, N = 74 [1]          p-value [2]
AGE              23 (19, 26)            23 (20, 26)            22 (19, 26)            0.5
LWT              121 (110, 140)         124 (112, 142)         120 (107, 137)         0.2
BWT              2,977 (2,414, 3,475)   3,100 (2,509, 3,622)   2,776 (2,370, 3,246)   0.007
[1] Median (IQR)
[2] Wilcoxon rank sum test

The table represents a contingency table with two categorical variables: “SMOKE” and “LOW”, and the corresponding counts or percentages for each combination of the two variables.

The “LOW” variable has two levels, 0 and 1, with 130 observations in level 0 and 59 in level 1. The “SMOKE” variable also has two levels, 0 and 1, with 115 and 74 observations respectively, for 189 observations in total.

The p-value of 0.026 comes from Pearson’s Chi-squared test of association between the two variables. The null hypothesis is that the two variables are independent, meaning that the distribution of one does not depend on the other; the alternative hypothesis is that they are dependent. Since 0.026 is below the commonly used threshold of 0.05, we reject the null hypothesis and conclude that there is a statistically significant association between “SMOKE” and “LOW”. Specifically, 51% of the mothers in level 1 of “LOW” smoked, against 34% of the mothers in level 0. Equivalently, women who smoked during pregnancy were more likely to give birth to a child weighing 2500 g or less (30 of 74, or 41%) than women who did not (29 of 115, or 25%).
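The Chi-squared statistic behind this p-value can be rebuilt from the four cell counts. The sketch below re-enters the SMOKE by LOW counts by hand, computes the expected counts under independence, and cross-checks the result against chisq.test() without continuity correction; the object names are illustrative.

```r
# Counts from the SMOKE x LOW table above
tab <- matrix(c(86, 29,
                44, 30), nrow = 2, byrow = TRUE,
              dimnames = list(SMOKE = c("0", "1"), LOW = c("0", "1")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic and its p-value on 1 degree of freedom
chi_sq <- sum((tab - expected)^2 / expected)
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)
c(statistic = chi_sq, p.value = p_value)

# Cross-check with the built-in test (no continuity correction)
chisq.test(tab, correct = FALSE)$p.value
```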

10. Is smoking during pregnancy a potential factor in causing premature birth? To answer this question, present the three measures of comparison and interpret each of them:

  • Proportion difference test
  • Relative risk confidence interval
  • Odds ratio confidence interval

The Pearson’s Chi-squared test above (p-value = 0.026) already shows a statistically significant association between “SMOKE” and “LOW”: women who smoked during pregnancy were more likely to give birth to a child weighing 2500 g or less. This suggests that smoking during pregnancy is a potential risk factor, with low birth weight as the measured outcome, although an association in observational data does not by itself establish causation. The three measures below quantify the size of the association.

Odds Ratio Confidence Interval

TBL<-table(Poids_naissance$SMOKE, Poids_naissance$LOW) 
TBL
   
     0  1
  0 86 29
  1 44 30

Perform the test

test <- fisher.test(TBL)
test

    Fisher's Exact Test for Count Data

data:  TBL
p-value = 0.03618
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 1.028780 3.964904
sample estimates:
odds ratio 
  2.014137 

Fisher’s Exact Test is used to test the association between two categorical variables, and fisher.test() also reports a confidence interval for the odds ratio. Applied to the count data in TBL, it gives a p-value of 0.03618. The null hypothesis is that there is no association between the two variables (odds ratio equal to 1), while the alternative hypothesis is that there is an association.

The estimated odds ratio is 2.014137, which means that the odds of a low birth weight infant are about twice as high for mothers who smoked during pregnancy as for mothers who did not. The 95% confidence interval for the odds ratio ranges from 1.028780 to 3.964904; we are 95% confident that the true odds ratio lies in this range, and the interval excludes 1. Since the p-value is below the commonly used significance level of 0.05, we reject the null hypothesis and conclude that there is evidence of an association between smoking and low birth weight. Note that fisher.test() reports the conditional maximum likelihood estimate of the odds ratio, which differs slightly from the sample cross-product ratio (86 × 30) / (29 × 44) ≈ 2.02.
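For comparison with the exact interval from fisher.test(), the sketch below computes the sample odds ratio and a Wald (normal approximation) confidence interval on the log scale, using the four counts from TBL. This is a different, approximate method, so the interval is close to but not identical with the one printed above; the variable names are illustrative.

```r
# Cell counts from TBL (rows SMOKE = 0/1, columns LOW = 0/1)
smk_low <- 30; smk_norm <- 44   # smokers: low and normal birth weight
non_low <- 29; non_norm <- 86   # non-smokers: low and normal birth weight

# Sample odds ratio: odds of low birth weight, smokers vs non-smokers
odds_ratio <- (smk_low / smk_norm) / (non_low / non_norm)

# Wald interval: log(OR) +/- z * standard error, then back-transform
se_log_or <- sqrt(1 / smk_low + 1 / smk_norm + 1 / non_low + 1 / non_norm)
ci <- exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * se_log_or)

c(estimate = odds_ratio, lower = ci[1], upper = ci[2])
```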

Proportion Difference Test

tbl <- matrix(c(86, 29, 44, 30), nrow = 2, byrow = TRUE)
colnames(tbl) <- c("LOW[yes]", "LOW[no]")
rownames(tbl) <- c("SMOKE[yes]", "SMOKE[no]")
View the table
tbl
           LOW[yes] LOW[no]
SMOKE[yes]       86      29
SMOKE[no]        44      30
# conduct the test
prop.test(tbl)
Warning in stats::prop.test(x = count, n = n, p = p, alternative =
alternative, : Chi-squared approximation may be incorrect

    1-sample proportions test with continuity correction

data:  tbl  [with success = 29]
X-squared = 0.25, df = 1, p-value = 0.6171
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.01319116 0.78057348
sample estimates:
   p 
0.25 

The output above is not the two-sample comparison that the question asks for. R reports a 1-sample proportions test of the null hypothesis p = 0.5, apparently because the prop.test() supplied by the mosaic package (which masks stats::prop.test) treated the four cell counts of tbl as four raw observations and counted how many of them equal 29 (one out of four, hence the estimate of 0.25). The chi-squared statistic of 0.25, the p-value of 0.6171 and the interval from 0.013 to 0.781 therefore say nothing about smokers versus non-smokers and should not be interpreted.

The proportions to compare can be read directly from TBL: 30 of the 74 mothers who smoked (40.5%) had a low birth weight infant, against 29 of the 115 mothers who did not smoke (25.2%), a difference of about 15 percentage points. Pearson’s Chi-squared test on this table, reported in question 9, gives a p-value of 0.026, so the difference between the two proportions is statistically significant at the 5% level.
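A two-sample test of the difference in proportions can be run directly on the counts in TBL. The sketch below is one way to do it, assuming the counts are re-entered by hand; it calls stats::prop.test() explicitly because the mosaic package masks prop.test, and it turns the continuity correction off so that the p-value matches Pearson's Chi-squared test.

```r
# Low birth weight counts and group sizes, read from TBL
low   <- c(smokers = 30, non_smokers = 29)
total <- c(smokers = 74, non_smokers = 115)

# Two-sample test of equal proportions, without continuity correction
res <- stats::prop.test(low, total, correct = FALSE)

prop_diff <- unname(res$estimate[1] - res$estimate[2])  # smokers minus non-smokers
res$estimate    # the two sample proportions
prop_diff       # difference in proportions
res$conf.int    # 95% confidence interval for the difference
res$p.value
```

Leaving correct at its default of TRUE applies Yates' continuity correction, which gives a somewhat larger p-value for the same table.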

Relative Risk Confidence Interval

epitools::riskratio(tbl,rev='both',method = 'wald')
$data
           LOW[no] LOW[yes] Total
SMOKE[no]       30       44    74
SMOKE[yes]      29       86   115
Total           59      130   189

$measure
                        NA
risk ratio with 95% C.I. estimate    lower    upper
              SMOKE[no]  1.000000       NA       NA
              SMOKE[yes] 1.257708 1.013374 1.560953

$p.value
            NA
two-sided    midp.exact fisher.exact chi.square
  SMOKE[no]          NA           NA         NA
  SMOKE[yes] 0.02914865    0.0361765 0.02649064

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

The output above comes from epitools::riskratio() applied to the 2x2 table, with the confidence interval computed by the Wald (normal approximation) method. The “data” section shows the table after both rows and columns were reversed (rev = 'both'), together with row and column totals. One caution when reading it: the labels assigned to tbl are swapped relative to TBL. The counts 86 and 29 belong to the non-smokers (SMOKE = 0), and the first column holds the normal-weight births (LOW = 0), so the group printed as SMOKE[yes] is in fact the non-smokers and the column printed as LOW[yes] is normal birth weight.

The “measure” section shows the estimated relative risk (risk ratio) and its 95% confidence interval (CI). With the labels corrected, the estimate of 1.2577 is the proportion of normal-weight births among non-smokers (86/115 = 74.8%) divided by the same proportion among smokers (44/74 = 59.5%). The 95% CI ranges from 1.0134 to 1.5610 and excludes 1. Expressed as the risk of low birth weight, the same table gives 30/74 = 40.5% for smokers against 29/115 = 25.2% for non-smokers, a relative risk of about 1.61.

The “p.value” section shows p-values for the null hypothesis that the two groups have the same proportion, computed by three methods: mid-p exact (0.0291), Fisher’s exact test (0.0362) and the chi-squared test (0.0265). All three are below the significance level of 0.05, which is evidence of a difference between smokers and non-smokers.

The “correction” section of the output indicates whether a continuity correction was applied to the test statistic. In this case, the continuity correction was not applied. The “method” attribute indicates that the unconditional maximum likelihood estimation (MLE) and normal approximation (Wald) CI method was used to estimate the relative risk and its confidence interval.
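The Wald interval for a risk ratio is simple enough to compute by hand. The helper below, rr_wald(), is a hypothetical function written for this illustration (it is not part of epitools); it takes the event count and size of each group and returns the risk ratio with its interval on the log scale. The first call uses the counts behind the printed estimate, and the second expresses the comparison as the risk of low birth weight among smokers relative to non-smokers.

```r
# Wald (normal approximation) confidence interval for a risk ratio.
# x1, n1: events and size of the group of interest; x0, n0: reference group.
rr_wald <- function(x1, n1, x0, n0, level = 0.95) {
  rr <- (x1 / n1) / (x0 / n0)
  se_log <- sqrt(1 / x1 - 1 / n1 + 1 / x0 - 1 / n0)
  z <- qnorm(1 - (1 - level) / 2)
  c(estimate = rr,
    lower = exp(log(rr) - z * se_log),
    upper = exp(log(rr) + z * se_log))
}

# Counts behind the printed estimate: 86 of 115 versus 44 of 74
printed <- rr_wald(86, 115, 44, 74)
printed

# Risk of low birth weight: smokers (30 of 74) vs non-smokers (29 of 115)
low_bw <- rr_wald(30, 74, 29, 115)
low_bw
```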

11. Redo question 10 for the factor “UI” and the response “LOW”

TBL2<-table(Poids_naissance$UI, Poids_naissance$LOW) 
TBL2
   
      0   1
  0 116  45
  1  14  14
tbl_1 <- matrix(c(116, 45, 14, 14), nrow = 2, byrow = TRUE)
colnames(tbl_1) <- c("LOW[yes]", "LOW[no]")
rownames(tbl_1) <- c("UI[yes]", "UI[no]")
View the two-way table
tbl_1
        LOW[yes] LOW[no]
UI[yes]      116      45
UI[no]        14      14
Odds Ratio Confidence Interval
test <- fisher.test(tbl_1)
test

    Fisher's Exact Test for Count Data

data:  tbl_1
p-value = 0.02692
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 1.041921 6.324135
sample estimates:
odds ratio 
  2.563399 

The p-value for the test is 0.02692, which is less than the significance level of 0.05, indicating that there is evidence of a significant association between the two variables in the contingency table. The alternative hypothesis is that the true odds ratio is not equal to 1.

The 95% confidence interval for the odds ratio ranges from 1.0419 to 6.3241, so we are 95% confident that the true odds ratio lies within this interval. Since the interval excludes 1, the odds of low birth weight differ significantly between mothers with UI = 1 and mothers with UI = 0.

The estimated odds ratio is 2.5634, the point estimate based on the contingency table. It means that the odds of a low birth weight infant are about 2.56 times as high for mothers with UI = 1 as for mothers with UI = 0.

Proportion Difference Test

prop.test(tbl_1)
Warning in stats::prop.test(x = count, n = n, p = p, alternative =
alternative, : Chi-squared approximation may be incorrect

    1-sample proportions test without continuity correction

data:  tbl_1  [with success = 14]
X-squared = 0, df = 1, p-value = 1
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.150039 0.849961
sample estimates:
  p 
0.5 

As in question 10, the call returned a 1-sample proportions test of the null hypothesis p = 0.5 rather than a comparison of two groups, apparently because the prop.test() supplied by the mosaic package (which masks stats::prop.test) treated the four cell counts of tbl_1 as four raw observations and counted how many of them equal 14 (two out of four, hence the estimate of 0.5). The reported statistic (X-squared = 0, p-value = 1) and the interval from 0.1500 to 0.8500 therefore say nothing about the difference between mothers with and without UI and should not be interpreted.

The proportions to compare can be read from TBL2: 14 of the 28 mothers with UI = 1 (50.0%) had a low birth weight infant, against 45 of the 161 mothers with UI = 0 (28.0%), a difference of about 22 percentage points. The chi-squared p-value for this table, reported in the riskratio() output below, is 0.020, so the difference is statistically significant at the 5% level.

Relative Risk Confidence Interval

epitools::riskratio(tbl_1,rev='both',method = 'wald')
$data
        LOW[no] LOW[yes] Total
UI[no]       14       14    28
UI[yes]      45      116   161
Total        59      130   189

$measure
                        NA
risk ratio with 95% C.I. estimate     lower    upper
                 UI[no]  1.000000        NA       NA
                 UI[yes] 1.440994 0.9827936 2.112817

$p.value
         NA
two-sided midp.exact fisher.exact chi.square
  UI[no]          NA           NA         NA
  UI[yes] 0.02648209   0.02691811 0.02012792

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

The contingency table used in the analysis crosses the two categorical variables UI and LOW, each with two levels. As in question 10, the labels given to tbl_1 are swapped relative to TBL2: the counts 116 and 45 belong to the mothers with UI = 0, and the first column holds the normal-weight births (LOW = 0). The “measure” section shows the estimated risk ratio with a 95% confidence interval. A risk ratio is the probability of an event in one group divided by the probability of the same event in another group. With the labels corrected, the estimate of 1.44 is the proportion of normal-weight births among mothers with UI = 0 (116/161 = 72.0%) divided by the same proportion among mothers with UI = 1 (14/28 = 50.0%). Expressed as the risk of low birth weight, the table gives 14/28 = 50.0% against 45/161 = 28.0%, a relative risk of about 1.79. The 95% confidence interval for the reported ratio ranges from 0.98 to 2.11 and just includes 1.

The “p.value” section shows p-values for the two-sided test of the null hypothesis that the two groups have the same risk: 0.026 (mid-p exact), 0.027 (Fisher’s exact test) and 0.020 (chi-squared test). All are below the typical significance level of 0.05, so these tests indicate a significant difference in risk between the two groups. The Wald confidence interval above points the other way because it relies on a different (normal) approximation, and with only 28 mothers in the UI = 1 group the two procedures can disagree; the result is best described as borderline.

The correction section of the output shows that no continuity correction was applied in the analysis. Finally, the method section of the output describes the statistical method used to compute the confidence interval. In this case, an unconditional maximum likelihood estimate (MLE) was used with a normal approximation (Wald) confidence interval.
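The three point estimates for UI can be gathered in one place from the counts in TBL2. The sketch below re-enters the table by hand with its original 0/1 coding, so that no relabelling is involved, and computes the risk difference, risk ratio and odds ratio for low birth weight (UI = 1 relative to UI = 0). The object names are illustrative.

```r
# Counts from TBL2: rows are UI (0, 1), columns are LOW (0, 1)
tab_ui <- matrix(c(116, 45,
                    14, 14), nrow = 2, byrow = TRUE,
                 dimnames = list(UI = c("0", "1"), LOW = c("0", "1")))

risk <- tab_ui[, "1"] / rowSums(tab_ui)   # P(LOW = 1) within each UI group
odds <- tab_ui[, "1"] / tab_ui[, "0"]     # odds of LOW = 1 within each UI group

measures <- c(risk_difference = unname(risk["1"] - risk["0"]),
              risk_ratio      = unname(risk["1"] / risk["0"]),
              odds_ratio      = unname(odds["1"] / odds["0"]))
round(measures, 3)
```

The odds ratio here is the sample cross-product ratio, so it is close to, but not the same as, the conditional estimate printed by fisher.test().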