Week 6 Assignment

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

datas <- read.csv("C:\\Users\\karth\\Downloads\\Child Growth and Malnutrition.csv")

view(datas)

datas$Wasting <- as.numeric(datas$Wasting)

## Warning: NAs introduced by coercion

datas$Overweight <- as.numeric(datas$Overweight)

## Warning: NAs introduced by coercion
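
The warnings above mean that some entries in Wasting and Overweight could not be parsed as numbers and were turned into NA. A minimal sketch of how one might list the offending strings before converting them; the vector `raw` is made-up illustration data, not values from the CSV:

```r
# Made-up example values standing in for a character column such as Wasting
raw <- c("12.5", "3", "", "n/a", NA)

# Convert, silencing the coercion warning because the failures are inspected below
num <- suppressWarnings(as.numeric(raw))

# Entries that were present in the input but could not be parsed as numbers
failed <- raw[is.na(num) & !is.na(raw)]
print(failed)
```

Running the same check on the real columns would show whether the NAs come from blanks, text labels, or stray header rows.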

a <- datas |>
  group_by(Country.Short.Name, Year.period) |>
  summarise(sum(Wasting))

## `summarise()` has grouped output by 'Country.Short.Name'. You can override
## using the `.groups` argument.

## # A tibble: 1,197 × 3
## # Groups:   Country.Short.Name [173]
##    Country.Short.Name                                 Year.period `sum(Wasting)`
##    <chr>                                              <chr>                <dbl>
##  1 ""                                                 "0.   - 4.…           NA  
##  2 ""                                                 "Not selec…           NA  
##  3 ""                                                 "Selected …           NA  
##  4 " Office of the Chief Government Statistician (OC… " and ICF\…           NA  
##  5 " Uverhkangai and"                                 ""                    NA  
##  6 " and Khovd aimags\""                              ""                    NA  
##  7 "Afghanistan"                                      "1997"                18.2
##  8 "Afghanistan"                                      "2004"               169. 
##  9 "Afghanistan"                                      "2013"               629. 
## 10 "Afghanistan"                                      "2018"               386. 
## # ℹ 1,187 more rows
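
The blank country names and NA sums in the output above suggest that the CSV contains some non-data rows and that some groups have missing Wasting values. A sketch of one way to handle both, on a made-up toy data frame; the names `toy`, `wasting_by_group` and `total_wasting` are hypothetical, and dropping blank countries is an assumption about what the analysis wants:

```r
library(dplyr)

# Toy data standing in for the real CSV (values are made up for illustration)
toy <- data.frame(
  Country.Short.Name = c("Afghanistan", "Afghanistan", "Albania", ""),
  Year.period        = c("1997", "1997", "2000", "Not selected"),
  Wasting            = c(18.2, NA, 12.2, NA)
)

wasting_by_group <- toy |>
  filter(Country.Short.Name != "") |>                 # drop rows with no country name
  group_by(Country.Short.Name, Year.period) |>
  summarise(
    total_wasting = sum(Wasting, na.rm = TRUE),       # ignore missing values within a group
    .groups = "drop"                                  # return an ungrouped tibble
  )

print(wasting_by_group)
```

Note that `sum(..., na.rm = TRUE)` returns 0 for a group whose values are all missing, so such groups may need separate handling.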

# Derived columns: two ratios relative to Stunting and the Wasting - Overweight difference
datas_new <- datas |>
  mutate(
    BMI_Over = Overweight / Stunting,
    BMI_Under = Underweight / Stunting,
    SD_Difference = Wasting - Overweight
  )
view(datas_new)

datas_1 <- datas_new |>
  select(Overweight, Stunting, BMI_Over)
datas_2 <- datas_new |>
  select(Underweight, Stunting, BMI_Under)
datas_3 <- datas_new |>
  select(Wasting, Overweight, SD_Difference)
view(datas_1)

a <- datas_1 |>
  ggplot() +
  geom_point(mapping = aes(x = Overweight, y = BMI_Over))
a

## Warning: Removed 2357 rows containing missing values (`geom_point()`).

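The graph above plots BMI_Over, the ratio of Overweight to Stunting, against Overweight. Most of the values are small, which suggests that these children are not healthy, but a few points are extremely large, meaning that weight for height is far too high in those records. Because Stunting is the denominator, the ratio grows very large wherever Stunting is close to zero.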

b <- datas_2 |>
  ggplot() +
  geom_point(mapping = aes(x = Underweight, y = BMI_Under))
b

## Warning: Removed 1812 rows containing missing values (`geom_point()`).

The graph above plots BMI_Under, the ratio of Underweight to Stunting, against Underweight. Again most of the values are small, which suggests that these children are not healthy, but a few points are extremely large, meaning that weight is too high for a somewhat short height in those records. We also see that as weight increases, height goes up slightly for these children.

c <- datas_3 |>
  ggplot() +
  geom_point(mapping = aes(x = Wasting, y = SD_Difference))
c

## Warning: Removed 2172 rows containing missing values (`geom_point()`).

The graph above plots SD_Difference, the difference between Wasting and Overweight, against Wasting. The differences are spread widely across the weight-for-height range, and some of them are very large.

any(!is.finite(datas_2$BMI_Under))

## [1] TRUE
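
The TRUE above says that BMI_Under contains at least one value that is not finite (NA, NaN, Inf or -Inf). Dividing by a Stunting value of zero is one way such values can arise. A small sketch with made-up numbers; `safe_ratio` is a hypothetical helper, and returning NA for a zero denominator is an assumption about the desired behaviour:

```r
# Hypothetical helper: ratio that yields NA when the denominator is zero or missing
safe_ratio <- function(num, den) {
  ifelse(is.na(den) | den == 0, NA_real_, num / den)
}

# Made-up values covering the problem cases
underweight <- c(10, 5, 0, NA)
stunting    <- c(20, 0, 0, 4)

raw_ratio <- underweight / stunting            # contains Inf (5/0) and NaN (0/0)
safe      <- safe_ratio(underweight, stunting) # problem cases become NA

print(raw_ratio)
print(safe)
```

With the problem cases stored as NA, functions called with `na.rm = TRUE` skip them instead of returning Inf or NaN.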

view(datas_2)

cor(na.omit(datas_2))

##             Underweight  Stunting BMI_Under
## Underweight   1.0000000 0.8129589       NaN
## Stunting      0.8129589 1.0000000       NaN
## BMI_Under           NaN       NaN         1

We see that Underweight and Stunting are strongly positively correlated (Pearson coefficient of about 0.81). This makes sense: a malnourished child is likely to be both underweight and stunted, so as one increases, so does the other. The correlations with BMI_Under are NaN because the column contains infinite values, which na.omit() does not remove.
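
A sketch of how the NaN entries in a correlation matrix like the one above could be avoided by keeping only rows in which every column is finite. The data frame `toy` is made up for illustration, and dropping those rows is one possible choice, not the only one:

```r
# Made-up data; the last row has Stunting = 0, so its ratio is Inf
toy <- data.frame(
  Underweight = c(10, 20, 30, 40, 5),
  Stunting    = c(20, 30, 50, 60, 0)
)
toy$BMI_Under <- toy$Underweight / toy$Stunting

# na.omit() keeps the Inf row, so correlations involving BMI_Under are NaN
cor_with_inf <- cor(na.omit(toy))

# Keep only rows where every column is finite, then correlate
finite_rows <- apply(toy, 1, function(r) all(is.finite(r)))
cor_finite  <- cor(toy[finite_rows, ])

print(cor_with_inf)
print(cor_finite)
```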

cor(na.omit(datas_1))

##            Overweight   Stunting BMI_Over
## Overweight  1.0000000 -0.2899611      NaN
## Stunting   -0.2899611  1.0000000      NaN
## BMI_Over          NaN        NaN        1

We see that Overweight and Stunting are weakly negatively correlated (about -0.29), so they tend to move in opposite directions. This also makes sense, since a child who is overweight is less likely to be stunted.

cor(na.omit(datas_3))

##                  Wasting Overweight SD_Difference
## Wasting        1.0000000 -0.2582857     0.8231593
## Overweight    -0.2582857  1.0000000    -0.7611542
## SD_Difference  0.8231593 -0.7611542     1.0000000

We see that Wasting and Overweight are weakly negatively correlated (about -0.26). Since both are defined in terms of weight-for-height, and the definitions are almost polar opposites of one another, this correlation makes sense.

x_m <- mean(datas_1$BMI_Over, na.rm = TRUE)
y_m <- mean(datas_2$BMI_Under, na.rm = TRUE)
z_m <- mean(datas_3$SD_Difference, na.rm = TRUE)
cat(x_m, y_m, z_m, "\n")

## Inf Inf 1.114192

x_sd <- sd(datas_1$BMI_Over, na.rm = TRUE)
y_sd <- sd(datas_2$BMI_Under, na.rm = TRUE)
z_sd <- sd(datas_3$SD_Difference, na.rm = TRUE)
cat(x_sd, y_sd, z_sd)

## NaN NaN 8.647883

x_ma <- qt(0.975, df=nrow(datas_1)-1)*x_sd/sqrt(nrow(datas_1))
y_ma <- qt(0.975, df=nrow(datas_2)-1)*y_sd/sqrt(nrow(datas_2))
z_ma <- qt(0.975, df=nrow(datas_3)-1)*z_sd/sqrt(nrow(datas_3))

x_lo <- x_m - x_ma
x_hi <- x_m + x_ma
y_lo <- y_m - y_ma
y_hi <- y_m + y_ma
z_lo <- z_m - z_ma
z_hi <- z_m + z_ma
cat(x_lo, x_hi, "\n")

## NaN NaN

cat(y_lo, y_hi, "\n")

## NaN NaN

cat(z_lo, z_hi, "\n")

## 1.028928 1.199457

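I was only able to get a confidence interval for the mean of the SD_Difference column. The intervals for BMI_Over and BMI_Under are NaN because both columns contain infinite values, which makes their means Inf and their standard deviations NaN. For SD_Difference the mean is 1.114, the standard deviation is 8.648, and the 95% confidence interval is (1.028928, 1.199457). This means we can be 95% confident that the true mean difference between Wasting and Overweight lies between 1.03 and 1.20. Note that the margin of error uses nrow(datas_3), which also counts rows with missing values, so an interval based only on the non-missing rows would be slightly wider.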
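
A sketch of a confidence interval for the mean that counts only the finite values, so that NA and Inf entries affect neither the mean nor the sample size. `mean_ci` is a hypothetical helper and the input vector is made up for illustration:

```r
# Hypothetical helper: t-based confidence interval for the mean of the finite values in x
mean_ci <- function(x, level = 0.95) {
  x <- x[is.finite(x)]                 # drops NA, NaN, Inf and -Inf
  n <- length(x)                       # sample size after dropping
  m <- mean(x)
  margin <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(n = n, mean = m, lower = m - margin, upper = m + margin)
}

# Made-up values including one missing and one infinite entry
x  <- c(1, 2, 3, 4, 5, NA, Inf)
ci <- mean_ci(x)
print(ci)
```

Applied to columns such as BMI_Over or BMI_Under, this would return finite bounds as long as at least two finite values remain.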

2023-09-28
