library(dplyr)    # filter(), group_by(), summarize(), mutate(), %>%
library(ggplot2)  # ggplot() and the geoms used below
fish <- read.csv("Fish Data.csv", header = FALSE,
                 col.names = c("species", "weight", "len1", "len2", "len3",
                               "height.pct", "width.pct", "sex"))
fish$sex <- factor(fish$sex, levels = c(0, 1), labels = c("Female", "Male"))
fish$species <- factor(fish$species, levels = 1:7,
                       labels = c("Bream", "Whitefish", "Roach", "Silver Bream",
                                  "Smelt", "Pike", "Perch"))
# Do not modify the following code:
fish.sub <- filter(fish, sex != "NA")
knitr::kable(head(fish.sub), format = "markdown")
| species | weight | len1 | len2 | len3 | height.pct | width.pct | sex |
|---|---|---|---|---|---|---|---|
| Bream | NA | 29.5 | 32 | 37.3 | 37.3 | 13.6 | Male |
| Bream | 600 | 29.4 | 32 | 37.2 | 40.2 | 13.9 | Male |
| Bream | 700 | 30.4 | 33 | 38.3 | 38.8 | 13.8 | Male |
| Bream | 575 | 31.3 | 34 | 39.5 | 38.3 | 14.1 | Male |
| Bream | 725 | 31.8 | 35 | 40.9 | 40.0 | 14.8 | Male |
| Bream | 1000 | 33.5 | 37 | 42.6 | 44.5 | 15.5 | Female |
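The filter above relies on a subtlety worth spelling out: comparing a factor with the string "NA" does not test for missingness. For a missing entry the comparison itself evaluates to NA, and dplyr's filter() drops rows whose condition is NA, which is why those rows disappear. A minimal base R sketch, using a made-up three-element `sex` vector (an assumption for illustration, not the fish data), contrasts this with the explicit test `is.na()`:

```r
# Hypothetical toy vector, not taken from "Fish Data.csv"
sex <- factor(c(0, 1, NA), levels = c(0, 1), labels = c("Female", "Male"))

# Comparing with the string "NA" yields NA (not FALSE) for the missing entry
cmp <- sex != "NA"
print(cmp)

# is.na() tests for missingness directly and never returns NA
keep <- !is.na(sex)
print(keep)
```

Writing the condition as `!is.na(sex)` would make the intent explicit; with filter() the resulting rows are the same.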
mean.wt <- fish %>%
  group_by(species) %>%
  dplyr::summarize(Mean = mean(weight, na.rm = TRUE))
# Do not modify the following code:
knitr::kable(mean.wt, format = "markdown")
| species | Mean |
|---|---|
| Bream | 626.00000 |
| Whitefish | 531.00000 |
| Roach | 152.05000 |
| Silver Bream | 154.81818 |
| Smelt | 11.17857 |
| Pike | 718.70588 |
| Perch | 382.23929 |
The species with the smallest mean weight is Smelt, at about 11.18 g.
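The smallest mean can also be picked out programmatically instead of by eye. The sketch below re-enters the summary table shown above as a plain data frame (values copied from that table; `mean.wt.copy` is a name used only here, since the real analysis already has `mean.wt`) and uses base R's `which.min()`:

```r
# Values copied from the mean-weight table above
mean.wt.copy <- data.frame(
  species = c("Bream", "Whitefish", "Roach", "Silver Bream",
              "Smelt", "Pike", "Perch"),
  Mean = c(626.00000, 531.00000, 152.05000, 154.81818,
           11.17857, 718.70588, 382.23929)
)

# Row with the smallest mean weight
lightest <- mean.wt.copy[which.min(mean.wt.copy$Mean), ]
print(lightest)
```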
ggplot(data = mean.wt) +
  geom_col(mapping = aes(x = species, y = Mean, fill = species)) +
  labs(title = "Mean Weight per Species", x = "Species", y = "Weight in Grams")
Forbes <- read.csv("2014 Forbes Global 2000.csv", header = TRUE)
Forbes <- filter(Forbes, Sector != "" & Sales > 0)
Forbes$Sector <- factor(Forbes$Sector)
# Note: this adds a new factor column, SIndustry, and leaves Industry itself unconverted.
Forbes$SIndustry <- factor(Forbes$Industry)
Forbes$Continent <- factor(Forbes$Continent)
Forbes$Country <- factor(Forbes$Country)
ggplot(subset(Forbes, Continent %in% c("Asia", "Europe", "North America")),
       aes(x = Sales, y = Market_Value)) +
  # alpha is a fixed setting rather than a mapping, so it goes outside aes()
  geom_point(mapping = aes(color = Continent, shape = Continent), alpha = 0.3) +
  facet_grid(~ Continent) +
  geom_smooth(method = "lm")
Market values of European and Asian companies are closely correlated with sales, and the two continents show a similar relationship: their fitted lines follow roughly the same trajectory. North American companies do not match that trajectory. Their market values rise more steeply with sales, so a North American company's market value increases with sales much faster than that of a European or Asian company.
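One way to check a visual impression like this is to fit the same linear model within each continent and compare the slopes directly. The sketch below uses a small made-up data frame (the numbers are illustrative assumptions, not Forbes values) with the same column names as the plot:

```r
# Hypothetical toy data: exactly linear within each continent
toy <- data.frame(
  Continent = rep(c("Asia", "North America"), each = 4),
  Sales = rep(c(10, 20, 30, 40), times = 2)
)
toy$Market_Value <- ifelse(toy$Continent == "Asia",
                           1 + 0.5 * toy$Sales,   # shallower line
                           1 + 2.0 * toy$Sales)   # steeper line

# Fit Market_Value ~ Sales separately per continent and extract the slope
slopes <- sapply(split(toy, toy$Continent), function(d) {
  coef(lm(Market_Value ~ Sales, data = d))[["Sales"]]
})
print(slopes)
```

Swapping `toy` for the filtered `Forbes` data frame would give one slope per continent to set against the fitted lines in the plot.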
Forbes <- Forbes %>%
  mutate(ProfMgn = Profits / Sales)
# Do not modify the following code:
knitr::kable(head(Forbes), format = "markdown")
| Rank | Company | Sector | Industry | Continent | Country | Sales | Profits | Assets | Market_Value | SIndustry | ProfMgn |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ICBC | Financials | Major Banks | Asia | China | 148.7 | 42.7 | 3124.9 | 215.6 | Major Banks | 0.2871553 |
| 2 | China Construction Bank | Financials | Regional Banks | Asia | China | 121.3 | 34.2 | 2449.5 | 174.4 | Regional Banks | 0.2819456 |
| 3 | Agricultural Bank of China | Financials | Regional Banks | Asia | China | 136.4 | 27.0 | 2405.4 | 141.1 | Regional Banks | 0.1979472 |
| 4 | JPMorgan Chase | Financials | Major Banks | North America | United States | 105.7 | 17.3 | 2435.3 | 229.7 | Major Banks | 0.1636708 |
| 5 | Berkshire Hathaway | Financials | Investment Services | North America | United States | 178.8 | 19.5 | 493.4 | 309.1 | Investment Services | 0.1090604 |
| 6 | Exxon Mobil | Energy | Oil & Gas Operations | North America | United States | 394.0 | 32.6 | 346.8 | 422.3 | Oil & Gas Operations | 0.0827411 |
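As a quick arithmetic check, the ProfMgn column can be recomputed by hand from the Sales and Profits values in the six rows shown above. The vectors below are copied from that table; the variable names are used only in this sketch:

```r
# Sales and Profits copied from the six rows of the table above
sales   <- c(148.7, 121.3, 136.4, 105.7, 178.8, 394.0)
profits <- c(42.7, 34.2, 27.0, 17.3, 19.5, 32.6)

# Profit margin is profit per unit of sales
prof_mgn <- profits / sales
print(round(prof_mgn, 7))

# ProfMgn values as printed in the table
table_mgn <- c(0.2871553, 0.2819456, 0.1979472, 0.1636708, 0.1090604, 0.0827411)

# TRUE if every recomputed margin agrees with the table to within 1e-6
print(all(abs(prof_mgn - table_mgn) < 1e-6))
```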
ggplot(Forbes, aes(x = Sector, y = ProfMgn)) +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  geom_boxplot(outlier.size = 1, aes(fill = Sector)) +
  scale_y_continuous(limits = c(-2, 10)) +
  coord_flip() +
  # `fun` replaces `fun.y`, which is deprecated as of ggplot2 3.3.0
  stat_summary(fun = mean, color = "yellow", geom = "point", size = 2, shape = 18)
The sector that appears to have the greatest standard deviation is Consumer Discretionary.
Forbes.SD <- Forbes %>%
  group_by(Sector) %>%
  summarize(standard_deviation = sd(ProfMgn))
# Do not modify the following code:
knitr::kable(Forbes.SD, format = "markdown")
| Sector | standard_deviation |
|---|---|
| Consumer Discretionary | 0.6289455 |
| Consumer Staples | 0.1000578 |
| Energy | 0.1058560 |
| Financials | 0.4052307 |
| Health Care | 0.1074421 |
| Industrials | 0.0993903 |
| Information Technology | 0.2345233 |
| Materials | 0.2154817 |
| Telecommunication Services | 0.1022383 |
| Utilities | 0.1623265 |
The sector with the greatest standard deviation is Consumer Discretionary (about 0.63). This large standard deviation is attributable to an outlying ProfMgn value of 10, far above the rest. The next-highest value, 6.5, gives Financials the second-highest standard deviation (about 0.41). Because the majority of ProfMgn values are less than 2, an outlier of 10 greatly skews that sector's data.
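The effect of a single extreme value on the standard deviation is easy to demonstrate. The margins below are made up for illustration (they are not Forbes values); only the outlier of 10 mirrors the case described above:

```r
# Hypothetical profit margins clustered well below 2
margins <- c(0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20)

sd_without <- sd(margins)          # spread of the typical values
sd_with    <- sd(c(margins, 10))   # same values plus one outlier of 10

print(c(without = sd_without, with = sd_with))
```

Because the standard deviation squares each deviation from the mean, one far-off point dominates the sum.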