Import Nutrition Data

Data downloaded from Kaggle.

https://www.kaggle.com/datasets/utsavdey1410/food-nutrition-dataset

Clear the workspace, then read the help file for read.csv.

remove(list = ls()) # delete every object in the workspace

?read.csv # open the help file

My original import commands are below. You will have to change the file path so that it points to where the file is saved on your machine.

FOOD.DATA.GROUP1 <- read.csv("~/Desktop/FOOD-DATA-GROUP1.csv") # the path is unnamed here, so you have to work out which argument it fills

FOOD.DATA.GROUP1 <- read.csv(file = "~/Desktop/FOOD-DATA-GROUP1.csv") # better coding practice is to specify the key argument

FOOD.DATA.GROUP2 <- read.csv(file = "~/Desktop/FOOD-DATA-GROUP1.csv", 
                             header = TRUE) # explicitly specifying the default argument does not change anything, but it can be good practice when you are new

FOOD.DATA.GROUP2 <- read.csv(file = "~/Desktop/FOOD-DATA-GROUP1.csv", 
                             header = FALSE) # the first line of the file is now read as data and the columns are named V1, V2, ...
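
The effect of the header argument is easier to see on a tiny file. The following is a minimal sketch that writes a made-up two-row CSV to a temporary file (the food names and values are hypothetical, not taken from the Kaggle data) and reads it back three ways.

```r
# Toy CSV written to a temporary file (hypothetical data, not the Kaggle file)
tmp <- tempfile(fileext = ".csv")
writeLines(c("food,Nutrition.Density", "apple,12.5", "candy,3.1"), tmp)

default_read  <- read.csv(file = tmp)                  # header defaults to TRUE
explicit_read <- read.csv(file = tmp, header = TRUE)   # same call, spelled out
no_header     <- read.csv(file = tmp, header = FALSE)  # first line becomes data

identical(default_read, explicit_read)  # TRUE
names(no_header)                        # "V1" "V2"
nrow(default_read)                      # 2
nrow(no_header)                         # 3

unlink(tmp) # delete the temporary file
```

With header = FALSE the column names end up as the first row of data, so every column is read as text.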

This piece of code is better because you do not have to change the file path: you can run it without making any changes, as long as you keep the original folder structure (do not delete the subfolder FINAL FOOD DATASET).

FOOD.DATA.GROUP1 <- read.csv("~/Desktop/FOOD-DATA-GROUP1.csv") # the path is unnamed here, so you have to work out which argument it fills

FOOD.DATA.GROUP2 <- read.csv(file = "~/Desktop/FOOD-DATA-GROUP1.csv", 
                             header = FALSE)
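
A path that needs no editing has to be relative to the working directory rather than to ~/Desktop. The following is a minimal sketch, assuming the CSV sits inside a subfolder named FINAL FOOD DATASET directly under the working directory. That location and the name csv_path are assumptions made for illustration.

```r
# Build a path relative to the working directory
# (assumed layout: <working directory>/FINAL FOOD DATASET/FOOD-DATA-GROUP1.csv)
csv_path <- file.path("FINAL FOOD DATASET", "FOOD-DATA-GROUP1.csv")

if (file.exists(csv_path)) {
  FOOD.DATA.GROUP1 <- read.csv(file = csv_path, header = TRUE)
} else {
  # Report where R looked so the folder layout can be checked
  message("File not found: ", csv_path, " (working directory: ", getwd(), ")")
}
```

Run getwd() to check which folder R treats as the working directory before relying on a relative path.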

Subsetting Data

We will split the foods into a healthy panel and an unhealthy panel, based on their Nutrition.Density values.

?remove
remove(FOOD.DATA.GROUP2) # remove data group two.

FOOD.DATA.GROUP1$X <- NULL # drop the X column

FOOD.DATA.GROUP1$healthy <- FOOD.DATA.GROUP1$Nutrition.Density > 10 # new TRUE/FALSE column: TRUE when Nutrition.Density is above 10, FALSE otherwise; the cutoff of 10 is our own definition of healthy

df_healthy_food <- FOOD.DATA.GROUP1[FOOD.DATA.GROUP1$healthy, ] # keep the rows flagged as healthy
df_unhealthy_food <- FOOD.DATA.GROUP1[!FOOD.DATA.GROUP1$healthy, ] # keep the remaining (unhealthy) rows
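
The same split can be tried on a small made-up data frame. The following is a minimal sketch with hypothetical foods and values. It wraps the condition in which(), a variation on the code above: a plain logical index returns a row of NAs for every food whose Nutrition.Density is missing, whereas which() drops those rows.

```r
# Toy data frame (hypothetical values, not taken from the Kaggle file)
toy <- data.frame(
  food = c("spinach", "candy", "salmon", "soda"),
  Nutrition.Density = c(45.2, 3.1, 10.0, NA),
  stringsAsFactors = FALSE
)

toy$healthy <- toy$Nutrition.Density > 10 # TRUE, FALSE, FALSE, NA

# which() returns the positions where the condition is TRUE and skips NA
toy_healthy   <- toy[which(toy$healthy), ]
toy_unhealthy <- toy[which(!toy$healthy), ]

toy_healthy$food   # "spinach"
toy_unhealthy$food # "candy" "salmon"
```

A value of exactly 10 counts as unhealthy here because the comparison is strictly greater than 10.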

Subsetting Large Data

You can use the nrows argument to import only the first few rows of the original data, which is useful when the full file is large.

FOOD.DATA.GROUP2 <- read.csv(file = "~/Desktop/FOOD-DATA-GROUP1.csv", 
                             header = TRUE,
                             nrows = 10)
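
To confirm what nrows does without the Kaggle file, the following minimal sketch writes a made-up 100-row CSV to a temporary file and reads back only the first 10 rows. The column names id and value are hypothetical.

```r
# Write a 100-row toy CSV to a temporary file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:100, value = (1:100) * 2), tmp, row.names = FALSE)

# Read the header line plus the first 10 data rows
first_rows <- read.csv(file = tmp, header = TRUE, nrows = 10)

dim(first_rows) # 10 2
str(first_rows) # inspect the column types of the sample

unlink(tmp) # delete the temporary file
```

The header line does not count towards nrows, so the result has exactly 10 data rows.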

The ‘read.csv’ help file gives some examples of how to use the command.

?read.csv # open the help file

## using count.fields to handle unknown maximum number of fields
## when fill = TRUE
test1 <- c(1:5, "6,7", "8,9,10")
tf <- tempfile()
writeLines(test1, tf)

read.csv(tf, fill = TRUE) # 1 column
ncol <- max(count.fields(tf, sep = ","))
read.csv(tf, fill = TRUE, header = FALSE,
         col.names = paste0("V", seq_len(ncol)))
unlink(tf)

## "Inline" data set, using text=
## Notice that leading and trailing empty lines are auto-trimmed

read.table(header = TRUE, text = "
a b
1 2
3 4
")

Summary Stats

We will use the psych package. You can use the stargazer package instead if you prefer.

# install.packages("psych") # run once to install the package
library(psych)