Generator Capacity Prices (Dan Smilowitz)

Reading the Data

The data is loaded from a csv. The first two rows of the file have the headers – they are read in, combined, and set as the names:

UCAP         <- read.csv('./UCAP.csv', stringsAsFactors = FALSE, header = FALSE, skip = 2)
UCAP_headers <- read.csv('./UCAP.csv', stringsAsFactors = FALSE, header = FALSE, nrows = 2)
UCAP_names   <- c("Month_Year", rep("", length(UCAP_headers)-1))

for (i in 2:length(UCAP_headers)) {
  UCAP_names[i] = paste(UCAP_headers[1, i], UCAP_headers[2, i], sep = "-")
}

names(UCAP) <- UCAP_names
Month_Year Monthly-NYC Monthly-LHV Monthly-LI Monthly-ROS Spot-NYC Spot-LHV Spot-LI Spot-ROS Strip-NYC Strip-LHV Strip-LI Strip-ROS
November 2003 $6.67 $1.15 $0.50 $1.15 $6.98 $1.94 $8.14 $1.94 $6.55 $1.17 $4.00 $1.17
December 2003 $6.67 $1.58 $5.00 $1.58 $6.98 $1.78 $8.22 $1.78 $6.55 $1.17 $4.00 $1.17
January 2004 $6.67 $1.65 $8.10 $1.65 $6.98 $1.75 $7.99 $1.75 $6.55 $1.17 $4.00 $1.17
February 2004 $6.95 $1.67 $7.50 $1.67 $6.98 $1.73 $7.08 $1.73 $6.55 $1.17 $4.00 $1.17
March 2004 $6.25 $1.65 $7.00 $1.65 $6.98 $1.00 $7.72 $1.00 $6.55 $1.17 $4.00 $1.17
April 2004 $6.25 $0.99 $6.85 $0.99 $6.98 $0.80 $7.04 $0.80 $6.55 $1.17 $4.00 $1.17

Tidying the Data

The dataset is converted to a tiny format with each row representing an observation:

UCAP <- UCAP %>% gather(Auction_Location, Price, 2:13)
Month_Year Auction_Location Price
November 2003 Monthly-NYC $6.67
December 2003 Monthly-NYC $6.67
January 2004 Monthly-NYC $6.67
February 2004 Monthly-NYC $6.95
March 2004 Monthly-NYC $6.25
April 2004 Monthly-NYC $6.25

The first two columns contain two variables each, and the third column is stored as a string. The variables are separated using regular expressions and the stringr package, and the prices are converted to values.

UCAP$Year     <- str_extract(UCAP$Month_Year, "[[:digit:]]{4}")
UCAP$Month    <- str_extract(UCAP$Month_Year, "[[:alpha:]]+")
UCAP$Auction  <- str_sub(UCAP$Auction_Location, 1, str_locate(UCAP$Auction_Location, "-")[, 1] - 1)
UCAP$Location <- str_sub(UCAP$Auction_Location, str_locate(UCAP$Auction_Location, "-")[, 1] + 1)
UCAP$Price    <- extract_numeric(UCAP$Price)

UCAP          <- UCAP %>% select(Year, Month, Auction, Location, Price)
Year Month Auction Location Price
2003 November Monthly NYC 6.67
2003 December Monthly NYC 6.67
2004 January Monthly NYC 6.67
2004 February Monthly NYC 6.95
2004 March Monthly NYC 6.25
2004 April Monthly NYC 6.25

Analyzing the Data

Which month of the year sees the highest prices in each location?

The average price is calculated by location and month. With the monthly means for each location sorted in descending order, returning every 12th row will return the highest value for each location:

UCAP_month <- UCAP %>% 
  group_by(Location, Month) %>%
  summarize(Mean = mean(Price, na.rm = TRUE)) %>%
  arrange(desc(Mean))

UCAP_month[seq(1, nrow(UCAP_month), 12), 1:2]
## Source: local data frame [4 x 2]
## Groups: Location [4]
## 
##   Location Month
##      (chr) (chr)
## 1      LHV  June
## 2       LI   May
## 3      NYC   May
## 4      ROS  June

What is the average difference between NYC and ROS prices?

UCAP_diff <- UCAP %>%
  group_by(Location) %>%
  filter(Location == "NYC" | Location == "ROS") %>%
  summarize(Mean=mean(Price, na.rm = TRUE))

UCAP_diff[1, 2] - UCAP_diff[2,2]
##      Mean
## 1 6.72846

Which calendar year saw the highest average price across regions?

UCAP %>%
  group_by(Year) %>%
  summarize(Mean = mean(Price, na.rm = TRUE)) %>%
  arrange(desc(Mean)) %>%
  top_n(3)
## Source: local data frame [3 x 2]
## 
##    Year     Mean
##   (chr)    (dbl)
## 1  2014 7.532708
## 2  2015 6.254306
## 3  2013 5.695833

Is the monthly auction or the spot auction more volatile?

UCAP %>%
  group_by(Auction) %>%
  filter(Auction == "Monthly" | Auction == "Spot") %>%
  summarize(Variance = var(Price, na.rm = TRUE)) %>%
  arrange(desc(Variance))
## Source: local data frame [2 x 2]
## 
##   Auction Variance
##     (chr)    (dbl)
## 1    Spot 16.66132
## 2 Monthly 15.79204

The spot auction is more volatile than the monthly auction, as it has a higher variance.

Untidy Auction Data (Christopher Martin)

Reading the Data

The data is loaded from a csv.

auction <- read.csv('./untidy_auction_data.csv', stringsAsFactors = FALSE, header = TRUE)
Lot Description Range Price
2001 New England Federal mahogany slant front desk, ca. 1810, with allover line inlay, 45" h., 40" w. $300 - $400 $283
2002 Federal brass and wire fire fender, early 19thc., 13 3/4" h., 30" w. $80 - $120 $234
2003 Pennsylvania or New Jersey late Chippendale applewood card table, ca. 1790, 29 1/2" h., 36" w. $200 - $300 $357

Tidying the Data

The item descriptions are not needed for this analysis – they are stored in a separate table with the lot number to allow for join functions in dplyr if needed for further analysis.

lot_info <- auction %>% select(Lot, Description)
auction  <- auction %>% select(Lot, Range, Price)
Lot Range Price
2001 $300 - $400 $283
2002 $80 - $120 $234
2003 $200 - $300 $357

The auction dataset is tidied, including the creation of two separate values for estimates (Low and High):

auction$Low  <- str_trim(str_sub(auction$Range, 1, str_locate(auction$Range, "-")[, 1] - 1))
auction$High <- str_trim(str_sub(auction$Range, str_locate(auction$Range, "-")[, 1] + 1))
auction <- auction %>% select(Lot, Low, High, Price)

auction <- auction %>% gather(Value, Price, -Lot) %>% arrange(Lot, Value)
auction$Price <- extract_numeric(auction$Price)
Lot Value Price
2001 High 400
2001 Low 300
2001 Price 283
2002 High 120
2002 Low 80
2002 Price 234
2003 High 300
2003 Low 200
2003 Price 357

Analyzing the Data

What is the percent different between estimates and sale prices?

The requested analysis asked for the total percent difference between estimates and sale prices. Thus, the data are summarized across lot numbers before calculation.

auction <- auction %>% group_by(Value) %>% summarize(Sum = sum(Price))

HighDiff <- round((auction[auction$Value == "High", 2] - auction[auction$Value == "Price", 2]) / 
                    auction[auction$Value == "Price", 2], 4) * 100
LowDiff <- round((auction[auction$Value == "Low", 2] - auction[auction$Value == "Price", 2]) / 
                   auction[auction$Value == "Price", 2], 4) * 100

The stated problem suggested calculating the percent difference with the estimate as the denominator; however, using the sale price as the denominator is likely more accurate, as it compares to the actual value rather than the predicted value.

The difference between the low estimates and sale prices is -33.64%, and the difference between high estimates and sale prices is -6.18%. In both cases, the negative numbers indicate that the estimates underpredict the actual sale prices.

Distribution of Wealth (Ken Markus)

Reading the Data

The data is loaded from a csv. The first row of the file contains the chart title, and the second row has column names with a number of spaces — as a result, the first two rows will be skipped and the column names entered manually.

wealth <- read.csv('./wealth.csv', stringsAsFactors = FALSE, header = FALSE, skip = 2)
names(wealth) <- c('Year', 'Top1', 'Next19', 'Bottom80')
Year Top1 Next19 Bottom80
1983 42.90% 48.40% 8.70%
1989 46.90% 46.50% 6.60%
1992 45.60% 46.70% 7.70%
1995 47.20% 45.90% 7.00%
1998 47.30% 43.60% 9.10%
2001 39.70% 51.50% 8.70%

Tidying the Data

Because of the percentage signs in the data, the values are read in as strings. This is corrected, and the numbers are converted to decimal form.

for (i in 2:4) {
  wealth[, i] <- extract_numeric(wealth[, i]) / 100
}

The data is then converted to a “long” data set to allow for easier analysis.

wealth <- wealth %>% gather(Tier, Share, -Year)
Year Tier Share
1983 Top1 0.429
1989 Top1 0.469
1992 Top1 0.456
1995 Top1 0.472
1998 Top1 0.473
2001 Top1 0.397

Analyzing the Data

How has the distribution of wealth changed over time?

To analyze the change in distribution of wealth, the values for each group are plotted over time:

From this chart, it is clear that the share of total wealth of the “Next 19%” has increased, surpassing taht of the “Top 1%”" some time between 1998 and 2001. The overall share of these two groups has increased, as the share of the “Bottom 80%” has decreased from almost 10% in 1983 to under 5% in 2010.