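library(dplyr)    # filter(), select(), group_by(), summarise(), arrange(), %>%
library(ggplot2)  # ggplot() and the geoms used in the plots
fish <- read.table("http://www.amstat.org/publications/jse/datasets/fishcatch.dat.txt")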
colnames(fish) <- c("obs", "species", "weight", "len1", "len2", "len3",
                    "height.pct", "width.pct", "sex")
fish$sex <- factor(fish$sex, levels = c(0, 1), labels = c("female", "male"))
fish$species <- factor(fish$species, levels = 1:7,
                       labels = c("Common Bream", "Whitefish", "Roach", "Silver Bream",
                                  "Smelt", "Pike", "Perch"))
# Do not modify the following code:
fish.sub <- filter(fish, sex != "NA")
knitr::kable(head(fish.sub), format = "markdown")
| obs | species | weight | len1 | len2 | len3 | height.pct | width.pct | sex |
|---|---|---|---|---|---|---|---|---|
| 14 | Common Bream | NA | 29.5 | 32 | 37.3 | 37.3 | 13.6 | male |
| 15 | Common Bream | 600 | 29.4 | 32 | 37.2 | 40.2 | 13.9 | male |
| 17 | Common Bream | 700 | 30.4 | 33 | 38.3 | 38.8 | 13.8 | male |
| 21 | Common Bream | 575 | 31.3 | 34 | 39.5 | 38.3 | 14.1 | male |
| 26 | Common Bream | 725 | 31.8 | 35 | 40.9 | 40.0 | 14.8 | male |
| 30 | Common Bream | 1000 | 33.5 | 37 | 42.6 | 44.5 | 15.5 | female |
mean.wt <- fish %>%
  select(species, weight) %>%
  na.omit() %>%  # omit the Common Bream record with a missing weight observation
  group_by(species) %>%
  dplyr::summarise(Weight = mean(weight)) %>%
  arrange(Weight)
# Do not modify the following code:
knitr::kable(mean.wt, format = "markdown")
| species | Weight |
|---|---|
| Smelt | 11.17857 |
| Roach | 152.05000 |
| Silver Bream | 154.81818 |
| Perch | 382.23929 |
| Whitefish | 531.00000 |
| Common Bream | 626.00000 |
| Pike | 718.70588 |
The species with the smallest mean weight is Smelt, at 11.18 g.
ggplot(data = mean.wt, aes(reorder(species, desc(Weight)), Weight)) +
  geom_col() +
  theme(axis.text.x = element_text(vjust = .75, angle = 45)) +
  labs(title = "Mean Weight per Species", x = "Species", y = "Weight (g)")
Forbes <- read.csv("2014 Forbes Global 2000.csv", stringsAsFactors = FALSE,
                   na.strings = "")
Forbes <- Forbes %>%
  filter(!is.na(Sector), Sales > 0)
Forbes$Sector <- factor(Forbes$Sector)
Forbes$Industry <- factor(Forbes$Industry)
Forbes$Continent <- factor(Forbes$Continent)
Forbes$Country <- factor(Forbes$Country)
f_sub <- subset(Forbes, Continent %in% c("Asia", "Europe", "North America"))
ggplot(f_sub, aes(x = Sales, y = Market_Value)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(cols = vars(Continent))
The graphs above show several interesting results. It is probably not surprising that the most valuable companies on Forbes' 2014 list are based in the United States. What is surprising is how much more quickly (or easily) US companies reach a market capitalization above $100 billion. The average Asian or European company needs to generate roughly 1.5 times the sales of a US company to reach the same valuation. Even more curious, some US-based companies generate more than $200 billion in market value on sales of $100 billion or less.
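One way to check the "sales needed for the same valuation" comparison numerically is to fit a straight line per continent, as `geom_smooth(method = "lm")` does in the plot, and solve each fitted line for the sales level at which predicted market value hits a target. The sketch below is only an illustration: `sales_for_value` is a hypothetical helper that is not part of the original analysis, and the toy data frame is made up and does not come from the Forbes file.

```r
# Hypothetical helper (not part of the original analysis): for each continent,
# fit Market_Value ~ Sales and solve the fitted line for the sales level at
# which the predicted market value equals `target`.
sales_for_value <- function(df, target) {
  groups <- split(df, droplevels(factor(df$Continent)))
  sapply(groups, function(d) {
    fit <- coef(lm(Market_Value ~ Sales, data = d))
    unname((target - fit[["(Intercept)"]]) / fit[["Sales"]])
  })
}

# Toy data for illustration only; these are NOT values from the Forbes file.
toy <- data.frame(
  Continent    = rep(c("A", "B"), each = 3),
  Sales        = c(10, 20, 30, 10, 20, 30),
  Market_Value = c(20, 40, 60, 15, 25, 35)
)

# Sales level at which each toy group's fitted line reaches a value of 100.
toy_result <- sales_for_value(toy, target = 100)
print(toy_result)
```

Applied to the report's own data, a call such as `sales_for_value(f_sub, target = 100)` would give one sales figure per continent, and the ratio between continents could then be compared with the rough 1.5x figure read off the plots. A simple linear fit is an assumption here and may be a poor description of the largest companies.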
Forbes <- Forbes %>%
  mutate(ProfMgn = Profits / Sales)
# Do not modify the following code:
knitr::kable(head(Forbes), format = "markdown")
| Rank | Company | Sector | Industry | Continent | Country | Sales | Profits | Assets | Market_Value | ProfMgn |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ICBC | Financials | Major Banks | Asia | China | 148.7 | 42.7 | 3124.9 | 215.6 | 0.2871553 |
| 2 | China Construction Bank | Financials | Regional Banks | Asia | China | 121.3 | 34.2 | 2449.5 | 174.4 | 0.2819456 |
| 3 | Agricultural Bank of China | Financials | Regional Banks | Asia | China | 136.4 | 27.0 | 2405.4 | 141.1 | 0.1979472 |
| 4 | JPMorgan Chase | Financials | Major Banks | North America | United States | 105.7 | 17.3 | 2435.3 | 229.7 | 0.1636708 |
| 5 | Berkshire Hathaway | Financials | Investment Services | North America | United States | 178.8 | 19.5 | 493.4 | 309.1 | 0.1090604 |
| 6 | Exxon Mobil | Energy | Oil & Gas Operations | North America | United States | 394.0 | 32.6 | 346.8 | 422.3 | 0.0827411 |
ggplot(data = Forbes, aes(Sector, ProfMgn)) +
  geom_boxplot() +
  coord_flip()
The sector that appears to have the greatest standard deviation is Consumer Discretionary.
Forbes.SD <- Forbes %>%
  group_by(Sector) %>%
  summarize(stdev = sd(ProfMgn))
# Do not modify the following code:
knitr::kable(Forbes.SD, format = "markdown")
| Sector | stdev |
|---|---|
| Consumer Discretionary | 0.6289455 |
| Consumer Staples | 0.1000578 |
| Energy | 0.1058560 |
| Financials | 0.4052307 |
| Health Care | 0.1074421 |
| Industrials | 0.0993903 |
| Information Technology | 0.2345233 |
| Materials | 0.2154817 |
| Telecommunication Services | 0.1022383 |
| Utilities | 0.1623265 |
The table confirms that Consumer Discretionary has the greatest standard deviation (about 0.63). Although most of the data appear to be relatively concentrated about a profit margin of ~0.5, the Consumer Discretionary sector has the largest variance of any sector.
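A sector can have tightly clustered margins and still show a large standard deviation if a few extreme values are present, because the standard deviation is sensitive to outliers while the interquartile range is not. The sketch below shows one way to put the two measures side by side. `spread_by_group` is a hypothetical helper, not something from the original analysis, and the toy values are invented for illustration rather than taken from the Forbes data.

```r
# Hypothetical helper (not part of the original analysis): standard deviation
# and interquartile range of `values` within each level of `groups`.
spread_by_group <- function(values, groups) {
  sds  <- tapply(values, groups, sd, na.rm = TRUE)
  iqrs <- tapply(values, groups, IQR, na.rm = TRUE)
  data.frame(group = names(sds),
             stdev = as.numeric(sds),
             iqr   = as.numeric(iqrs))
}

# Toy data for illustration only; these are NOT values from the Forbes file.
# Sector "X" is tightly clustered apart from one extreme loss.
toy <- data.frame(
  Sector  = rep(c("X", "Y"), each = 5),
  ProfMgn = c(0.04, 0.05, 0.05, 0.06, -3.00,
              0.01, 0.03, 0.05, 0.07, 0.09)
)

toy_spread <- spread_by_group(toy$ProfMgn, toy$Sector)
print(toy_spread)
```

In the toy data, sector "X" has the larger standard deviation but the smaller interquartile range, which is the pattern a single outlier produces. Running `spread_by_group(Forbes$ProfMgn, Forbes$Sector)` on the report's data would show whether the Consumer Discretionary result is driven in the same way.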