Instructions:

  • This exam is open note, though you will want to use a cheat sheet to answer the following questions efficiently.
  • The data provided includes the estimated and actual costs of large road construction projects managed by the US Department of Transportation (DoT) in 2013.
  • Note that the numbers provided in this spreadsheet are in thousands of dollars (e.g., 100 = $100,000).

Importing and Summarizing Data

  1. Import the data from “Road construction bids.csv”.
# read.csv() is part of base R, so no extra package is needed to read a .csv file.
Road <- read.csv("C:\\Users\\etfie\\OneDrive\\Data Visualization\\Road construction bids.csv")
  1. How many bids are in this data set?

There are 235 bids in this data set.
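
The count can be checked with nrow(), assuming each row of the file is one bid. A minimal sketch, using a small made-up data frame in place of the real file (the three rows and their values are invented for illustration; only the column names follow the ones used in this exam):

```r
# Hypothetical stand-in for the imported data; values are invented for illustration.
toy <- data.frame(
  DoT_Estimate = c(500, 1200, 2500),
  Actual_Cost  = c(450, 1300, 2400)
)

# Each row is one bid, so the number of rows is the number of bids.
n_bids <- nrow(toy)
n_bids
```

Run against the real data, nrow(Road) should return the count reported above.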

  1. What are the average estimate and average actual costs of these projects? What can you conclude?
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
newdata <- summarize(Road,
                     mean_ActualCost = mean(Actual_Cost, na.rm = TRUE),
                     mean_DoTEstimate = mean(DoT_Estimate, na.rm = TRUE))
newdata
##   mean_ActualCost mean_DoTEstimate
## 1        1268.715         1347.077

On average, the DoT estimate (1,347.077 thousand, about $1.35 million) is higher than the actual cost (1,268.715 thousand, about $1.27 million), so these projects came in roughly $78,000 under their estimates on average.

Manipulating Data

  1. Add the following variables to your data set:
  • difference = the difference between the estimate and the actual cost.
  • percent_difference = difference/cost
  • budget = projects whose actual cost is more than the estimate are labeled as “Over Budget”, otherwise “Under Budget”.
library(dplyr)

# Add the requested columns to every row of Road (not to the one-row summary).
# Assumptions: difference is estimate minus actual cost, and "cost" in
# percent_difference is the actual cost.
Road <- mutate(Road,
               difference = DoT_Estimate - Actual_Cost,
               percent_difference = difference / Actual_Cost,
               budget = ifelse(Actual_Cost > DoT_Estimate,
                               "Over Budget", "Under Budget"))

  1. What is the mean difference between large and small projects? What does this tell us?

  2. What is the mean percent difference between large and small projects? What does this tell us?
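
One way to approach these two questions is to group the projects by size and average both measures within each group. The sketch below is only an illustration under stated assumptions: the data frame is made up, difference is taken as estimate minus actual cost, percent_difference divides by the actual cost, and "Large" is a hypothetical rule meaning an actual cost above the median (the exam does not say how project size is defined).

```r
library(dplyr)

# Made-up values (in thousands of dollars), for illustration only.
toy <- data.frame(
  DoT_Estimate = c(500, 900, 2000, 3000),
  Actual_Cost  = c(400, 1000, 1800, 3300)
)

size_summary <- toy %>%
  mutate(
    difference = DoT_Estimate - Actual_Cost,        # assumed sign: estimate minus actual
    percent_difference = difference / Actual_Cost,  # assumed denominator: actual cost
    # Hypothetical size rule: "Large" means actual cost above the median.
    size = ifelse(Actual_Cost > median(Actual_Cost), "Large", "Small")
  ) %>%
  group_by(size) %>%
  summarize(
    mean_difference = mean(difference, na.rm = TRUE),
    mean_percent_difference = mean(percent_difference, na.rm = TRUE)
  )

size_summary
```

If the real data already contain a size column, grouping on that column directly would replace the median rule used here.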

Plotting Data

  1. Plot a graph of the difference (between estimate and actual cost) vs. actual costs. Make your graph as descriptive as possible, adding smoothing, color grouping, scales, and labels to best present the data.

  2. Plot a graph of the percent difference (between estimate and actual cost) vs. actual costs. Make your graph as descriptive as possible, adding color, grouping, scales, and labels to best present the data.

  3. Summarize what these graphs show us.
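
The two plots could be sketched with ggplot2 roughly as follows. This is a minimal illustration, not a finished answer: the data frame is made up, difference is assumed to be estimate minus actual cost, percent_difference is assumed to divide by the actual cost, and the linear smoother, titles, and axis labels are arbitrary choices.

```r
library(ggplot2)

# Made-up values (in thousands of dollars), for illustration only.
toy <- data.frame(
  DoT_Estimate = c(500, 900, 2000, 3000, 1500, 700),
  Actual_Cost  = c(400, 1000, 1800, 3300, 1450, 650)
)
toy$difference <- toy$DoT_Estimate - toy$Actual_Cost
toy$percent_difference <- toy$difference / toy$Actual_Cost
toy$budget <- ifelse(toy$Actual_Cost > toy$DoT_Estimate,
                     "Over Budget", "Under Budget")

# Difference vs. actual cost: points colored by budget status, one linear trend line.
p_diff <- ggplot(toy, aes(x = Actual_Cost, y = difference)) +
  geom_point(aes(color = budget)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "grey40") +
  scale_x_continuous(labels = scales::comma) +
  labs(title = "Estimate minus actual cost vs. actual cost",
       x = "Actual cost (thousands of dollars)",
       y = "Difference (thousands of dollars)",
       color = "Budget status")

# Percent difference vs. actual cost: same layout, y axis shown as percentages.
p_pct <- ggplot(toy, aes(x = Actual_Cost, y = percent_difference)) +
  geom_point(aes(color = budget)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "grey40") +
  scale_x_continuous(labels = scales::comma) +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Percent difference vs. actual cost",
       x = "Actual cost (thousands of dollars)",
       y = "Difference as a share of actual cost",
       color = "Budget status")
```

Calling print(p_diff) or print(p_pct) draws each plot. With the real data, a loess smoother or a log scale on the x axis may show the pattern better if a few very large projects dominate.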