Taking a look at DSLabs Datasets
Loading libraries
#install.packages("dslabs")
library("dslabs")
## Warning: package 'dslabs' was built under R version 4.1.3
data(package="dslabs")
#list.files(system.file("script", package = "dslabs"))
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.8
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library("RColorBrewer")
library(dplyr)
library(ggplot2)
#install.packages("highcharter")
library(highcharter)
## Warning: package 'highcharter' was built under R version 4.1.3
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
##
## Attaching package: 'highcharter'
## The following object is masked from 'package:dslabs':
##
## stars
I have chosen the Italian olive dataset. I’m Italian so I felt like I just had to!
Italian Olive Dataset
This dataset reports the percentage composition of eight fatty acids found in the lipid fraction of 572 Italian olive oils.
Exploring the dataset…
data("olive")
head(olive)
## region area palmitic palmitoleic stearic oleic linoleic
## 1 Southern Italy North-Apulia 10.75 0.75 2.26 78.23 6.72
## 2 Southern Italy North-Apulia 10.88 0.73 2.24 77.09 7.81
## 3 Southern Italy North-Apulia 9.11 0.54 2.46 81.13 5.49
## 4 Southern Italy North-Apulia 9.66 0.57 2.40 79.52 6.19
## 5 Southern Italy North-Apulia 10.51 0.67 2.59 77.71 6.72
## 6 Southern Italy North-Apulia 9.11 0.49 2.68 79.24 6.78
## linolenic arachidic eicosenoic
## 1 0.36 0.60 0.29
## 2 0.31 0.61 0.29
## 3 0.31 0.63 0.29
## 4 0.50 0.78 0.35
## 5 0.50 0.80 0.46
## 6 0.51 0.70 0.44
tail(olive)
## region area palmitic palmitoleic stearic oleic linoleic
## 567 Northern Italy West-Liguria 10.7 1.0 2.2 77.3 8.7
## 568 Northern Italy West-Liguria 12.8 1.1 2.9 74.9 7.9
## 569 Northern Italy West-Liguria 10.6 1.0 2.7 77.4 8.1
## 570 Northern Italy West-Liguria 10.1 0.9 2.1 77.2 9.7
## 571 Northern Italy West-Liguria 9.9 1.2 2.5 77.5 8.7
## 572 Northern Italy West-Liguria 9.6 0.8 2.4 79.5 7.4
## linolenic arachidic eicosenoic
## 567 0.1 0.1 0.02
## 568 0.1 0.1 0.02
## 569 0.1 0.1 0.03
## 570 0.0 0.0 0.02
## 571 0.1 0.1 0.02
## 572 0.1 0.2 0.02
dim(olive)
## [1] 572 10
summary(olive)
## region area palmitic palmitoleic
## Northern Italy:151 South-Apulia :206 Min. : 6.10 Min. :0.1500
## Sardinia : 98 Inland-Sardinia: 65 1st Qu.:10.95 1st Qu.:0.8775
## Southern Italy:323 Calabria : 56 Median :12.01 Median :1.1000
## Umbria : 51 Mean :12.32 Mean :1.2609
## East-Liguria : 50 3rd Qu.:13.60 3rd Qu.:1.6925
## West-Liguria : 50 Max. :17.53 Max. :2.8000
## (Other) : 94
## stearic oleic linoleic linolenic
## Min. :1.520 Min. :63.00 Min. : 4.480 Min. :0.0000
## 1st Qu.:2.050 1st Qu.:70.00 1st Qu.: 7.707 1st Qu.:0.2600
## Median :2.230 Median :73.03 Median :10.300 Median :0.3300
## Mean :2.289 Mean :73.12 Mean : 9.805 Mean :0.3189
## 3rd Qu.:2.490 3rd Qu.:76.80 3rd Qu.:11.807 3rd Qu.:0.4025
## Max. :3.750 Max. :84.10 Max. :14.700 Max. :0.7400
##
## arachidic eicosenoic
## Min. :0.000 Min. :0.0100
## 1st Qu.:0.500 1st Qu.:0.0200
## Median :0.610 Median :0.1700
## Mean :0.581 Mean :0.1628
## 3rd Qu.:0.700 3rd Qu.:0.2800
## Max. :1.050 Max. :0.5800
##
table(olive$area)
##
## Calabria Coast-Sardinia East-Liguria Inland-Sardinia North-Apulia
## 56 33 50 65 25
## Sicily South-Apulia Umbria West-Liguria
## 36 206 51 50
table(olive$region)
##
## Northern Italy Sardinia Southern Italy
## 151 98 323
* This dataset has 572 observations of 10 variables.
* The areas covered in this dataset are: Calabria, Coast-Sardinia, East-Liguria, Inland-Sardinia, North-Apulia, Sicily, South-Apulia, Umbria, and West-Liguria.
* The regions covered in this dataset are: Northern Italy, Sardinia, and Southern Italy.
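The two tables above count areas and regions separately, so they do not show which area belongs to which region. A minimal sketch of one way to see that, assuming dplyr's `count()` and using the illustrative names `area_by_region` and `n_oils` (neither appears in the original analysis):

```r
library(dslabs)
library(dplyr)
data("olive")

# One row per region/area pair that occurs in the data,
# with the number of oils sampled in each pair
area_by_region <- olive %>%
  count(region, area, name = "n_oils")

area_by_region
```

Because each area sits in a single region, the result should have one row per area, and the `n_oils` column should add up to the 572 observations reported by `dim(olive)`.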
Researching…
Since I wasn’t familiar with most of these fatty acids, I listed them and provided definitions.
1. Palmitic acid: a solid saturated fatty acid obtained from palm oil and other vegetable and animal fats.
2. Palmitoleic acid: a non-essential omega-7 monounsaturated free fatty acid.
3. Stearic acid: a solid saturated fatty acid obtained from animal or vegetable fats.
4. Oleic acid: a monounsaturated omega-9 fatty acid that occurs naturally in various animal and vegetable fats and oils.
5. Linoleic acid: a polyunsaturated fatty acid present as a glyceride in linseed oil and other oils and essential in the human diet.
6. Linolenic acid: a polyunsaturated fatty acid (with one more double bond than linoleic acid) present as a glyceride in linseed and other oils and essential in the human diet.
7. Arachidic acid: a saturated fatty acid with a 20-carbon chain, also known as icosanoic acid.
8. Eicosenoic acid: a monounsaturated omega-9 fatty acid found in a variety of plant oils and nuts, such as jojoba oil. Several eicosenoic acid isomers exist; the omega-9 form is the one described here.
According to the Mayo Clinic, “studies show that eating foods rich in unsaturated fat instead of saturated fat improve blood cholesterol levels, which can decrease your risk of heart attack and stroke. One type in particular omega-3 fatty acid appears to boost heart health by improving cholesterol levels, reducing blood clotting, reducing irregular heartbeats, and slightly lowering blood pressure.”
There are two main types of unsaturated fat:
* Monounsaturated fat
* Polyunsaturated fat
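The definitions above can be turned into a rough grouping of the eight columns by fat type. The sketch below is only an illustration under that grouping: palmitic, stearic, and arachidic are treated as saturated; palmitoleic, oleic (a monounsaturated fatty acid), and eicosenoic as monounsaturated; linoleic and linolenic as polyunsaturated. The names `fat_by_region`, `saturated`, `monounsaturated`, and `polyunsaturated` are made up for this example.

```r
library(dslabs)
library(dplyr)
data("olive")

fat_by_region <- olive %>%
  mutate(
    # Percentages summed within each fat type, per olive oil
    saturated       = palmitic + stearic + arachidic,
    monounsaturated = palmitoleic + oleic + eicosenoic,
    polyunsaturated = linoleic + linolenic
  ) %>%
  group_by(region) %>%
  # Average share of each fat type across the oils of a region
  summarise(across(c(saturated, monounsaturated, polyunsaturated), mean))

fat_by_region
```

This gives one row per region, which makes it easier to compare regions by fat type rather than by individual acid.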
I want to focus on areas from Southern Italy only.
italysouth <- olive %>%
  filter(region == "Southern Italy")
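A quick sanity check on the filter can help here. The sketch below assumes `area` is stored as a factor, as the `summary(olive)` output suggests; in that case `filter()` keeps the levels of areas that no longer have any rows. The name `italysouth_clean` is introduced only for this example.

```r
library(dslabs)
library(dplyr)
data("olive")

italysouth <- olive %>%
  filter(region == "Southern Italy")

# Number of rows kept; table(olive$region) lists 323 oils for Southern Italy
nrow(italysouth)

# count() shows only the areas that still have rows after filtering
italysouth %>% count(area)

# Unused factor levels survive filter(); droplevels() removes them
italysouth_clean <- droplevels(italysouth)
nlevels(italysouth$area)        # levels before dropping
nlevels(italysouth_clean$area)  # levels after dropping
```

Dropping unused levels is optional, but it keeps later tables and legends from listing areas outside Southern Italy.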
Plotting the content of one saturated fat (palmitic acid) against one unsaturated fat (linoleic acid) for the areas of Southern Italy using Highcharter.
p1 <- italysouth %>%
  hchart("scatter", hcaes(x = linoleic, y = palmitic, group = area)) %>%
  hc_colors(c("#00bfff", "#ed9121", "#d70a53", "#00cc99")) %>%
  hc_xAxis(title = list(text = "linoleic acid")) %>%
  hc_subtitle(text = "Source: Olive data set") %>%
  hc_yAxis(title = list(text = "palmitic acid")) %>%
  hc_title(text = "Palmitic and Linoleic acid contents in olive oils from Southern Italy") %>%
  hc_add_theme(hc_theme_smpl())
p1
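Since the interactive chart does not survive in a static copy of this page, a numeric summary of the same two variables may be useful alongside it. This is a sketch rather than part of the original analysis; `south_summary` and its column names are chosen for illustration.

```r
library(dslabs)
library(dplyr)
data("olive")

south_summary <- olive %>%
  filter(region == "Southern Italy") %>%
  group_by(area) %>%
  summarise(
    n_oils        = n(),             # oils sampled in the area
    mean_linoleic = mean(linoleic),  # average linoleic acid percentage
    mean_palmitic = mean(palmitic),  # average palmitic acid percentage
    # Pearson correlation between the two plotted variables within the area
    cor_linoleic_palmitic = cor(linoleic, palmitic)
  )

south_summary
```

Each row corresponds to one of the colored groups in the scatterplot, so the means indicate where each cluster sits and the correlation indicates how strongly the two acids move together within it.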
I chose the “olive” dataset from the dslabs package. It reports the percentage composition of eight fatty acids found in the lipid fraction of 572 Italian olive oils. The oils come from three regions (Northern Italy, Sardinia, and Southern Italy), which are subdivided into nine areas: Calabria, Coast-Sardinia, East-Liguria, Inland-Sardinia, North-Apulia, Sicily, South-Apulia, Umbria, and West-Liguria.
First, I did some basic exploration of the dataset: its dimensions, summary statistics, and the number of oils per area and region. Since I wasn’t familiar with all eight fatty acids, I then researched them and listed their definitions, and I added information from the Mayo Clinic on which fats are considered healthy and unhealthy.
Lastly, I used Highcharter to create a scatterplot comparing one saturated fat, palmitic acid, with one polyunsaturated fat, linoleic acid, in olive oils from Southern Italy. I focused on Southern Italy because that is where I’m from, and I used dplyr to filter the data down to that region.