Instructions

In this assignment you will practice working with dataframes in R: reading in data and calculating simple statistics.

We will be working with the education_data.csv file we saw in class. This file contains records of students' test scores in the fall and spring semesters of one academic school year.

Exercise

Create an R script to record all the answers to these questions.

Set Up Work Space

Set up an R project to do this assignment in. Download education_data.csv from Google Drive and put it into the project folder.

  1. Where is education_data.csv located on your computer?
    1. What is the absolute path?
    2. What is the relative path?
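
As a rough sketch of the difference between the two kinds of path, the snippet below assumes that the file was saved directly in the project folder and that the working directory is that same folder (normally the case when the R project is open). Your own paths will depend on where you saved the file.

```r
# Assumption: education_data.csv sits directly in the working directory.
# A relative path is written relative to the working directory.
relative_path <- "education_data.csv"

# getwd() returns the working directory as an absolute path.
working_directory <- getwd()

# An absolute path starts from the root of the file system.
absolute_path <- file.path(working_directory, relative_path)

print(relative_path)
print(absolute_path)

# file.exists() returns TRUE only if a file is really at that location.
file.exists(relative_path)
```

If file.exists() returns FALSE, the file is probably in a different folder than the one getwd() reports.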

Documentation

  1. What does the abs function do? Use it in an example.
#code template
?function_name
  1. What is the default value of the na.rm argument of the mean function and what does the na.rm argument control?
#code template
?function_name
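
To illustrate how the ?function_name template is used, here is a short sketch with a different function, round, chosen only as an example. The same steps apply to abs and mean.

```r
# Open the help page for round() (same as help("round")).
?round

# args() prints a function's arguments and their default values.
args(round)

# The digits argument controls the number of decimal places.
round(3.14159)              # uses the default, digits = 0
round(3.14159, digits = 2)  # keeps two decimal places
```

The Usage section of a help page lists each argument with its default value, and the Arguments section explains what each one controls.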

Exploring Data

  1. Read in the data
#Code template
my_dataframe <- read.csv("path to the file")
  1. What are the names of the columns?
#Code template
names(my_dataframe)
  1. How many rows are in the dataframe? Save the result in an object; you will need it later.
#Code template
nrow(my_dataframe)
numbers_of_rows <- nrow(my_dataframe)
  1. How many different races are contained in this dataframe?
#Code template
unique(my_dataframe$column_name)
length(unique(my_dataframe$column_name))
  1. How many male and female students are there?
#Code template
table(my_dataframe$column_name)
  1. How many missing values are there in Fall Reading Scores? What percentage of the rows in the dataframe does that represent?
#R knowledge
#TRUE is treated as the value 1 (and FALSE as 0), so taking the sum
#gives the number of TRUE values
logic_test <- c(TRUE, FALSE, TRUE, FALSE) # 2 TRUE values
sum(logic_test)
#Code template
logic_missing <- is.na(my_dataframe$column_name)

total_missing <- sum(logic_missing)

total_missing / numbers_of_rows * 100
  1. How many students have a fall math score greater than their fall reading score? What percentage of students have the greater math score, and what does it tell you?
#Code template
logic_greaterthan <- my_dataframe$column_name > my_dataframe$column_name

#na.rm = TRUE skips comparisons that are NA because a score is missing
sum(logic_greaterthan, na.rm = TRUE) / numbers_of_rows * 100
  1. Calculate the sum of the fall reading score and the fall math score. Save it in a new column.
#Code template
my_dataframe$new_column <- my_dataframe$column_name + my_dataframe$column_name
  1. What is the average of the summed reading and math scores?
#Code template
mean(my_dataframe$new_column, na.rm = TRUE)
  1. Create 2 new columns, one for the change in reading score and one for the change in math score.
#Code template
my_dataframe$new_column <- my_dataframe$column_name - my_dataframe$column_name
  1. Save the dataframe with the new columns to a new data file.
#row.names = FALSE keeps R from writing an extra column of row numbers
write.csv(my_dataframe, file = "newfilename.csv", row.names = FALSE)
  1. Read in the saved data and check its columns to see if all the new columns are there.
new_education_data <- read.csv("newfilename.csv")

names(new_education_data)
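
The templates above can be tried out on a small made-up dataframe before running them on the real file. The sketch below is only an illustration: the toy_scores object, its column names, and its values are invented and are not taken from education_data.csv.

```r
# Toy data; the column names and values are made up for illustration.
toy_scores <- data.frame(
  student_id   = 1:5,
  fall_reading = c(200, NA, 215, 190, 205),
  fall_math    = c(210, 195, 205, NA, 220)
)

# is.na() returns TRUE for each missing value; sum() counts the TRUEs.
missing_reading <- sum(is.na(toy_scores$fall_reading))
missing_reading / nrow(toy_scores) * 100   # percentage of rows missing

# A comparison involving NA gives NA, so skip those when counting.
math_higher <- toy_scores$fall_math > toy_scores$fall_reading
sum(math_higher, na.rm = TRUE)

# Adding two columns gives NA wherever either score is missing.
toy_scores$fall_total <- toy_scores$fall_reading + toy_scores$fall_math
mean(toy_scores$fall_total)                # NA because some totals are missing
mean(toy_scores$fall_total, na.rm = TRUE)  # mean of the non-missing totals

# Round trip through a temporary file to confirm the new column is saved.
toy_file <- tempfile(fileext = ".csv")
write.csv(toy_scores, file = toy_file, row.names = FALSE)
names(read.csv(toy_file))
```

Working through a dataframe small enough to check by hand makes it easier to spot when a result is NA because of missing values and not because of a typo.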