This data set provides information on the fate of passengers on the fatal maiden voyage of the ocean liner ‘Titanic’, summarized according to economic status (class), sex, age and survival.
The sinking of the Titanic is a famous event, and new books are still being published about it. Many well-known facts—from the proportions of first-class passengers to the ‘women and children first’ policy, and the fact that that policy was not entirely successful in saving the women and children in the third class—are reflected in the survival rates for various classes of passenger.
These data were originally collected by the British Board of Trade in their investigation of the sinking. Note that there is not complete agreement among primary sources as to the exact numbers on board, rescued, or lost.
Titanic Ship
## Class Sex Age Survived Freq
## 1 1st Male Child No 0
## 2 2nd Male Child No 0
## 3 3rd Male Child No 35
## 4 Crew Male Child No 0
## 5 1st Female Child No 0
## 6 2nd Female Child No 0
titanicData <- data.frame(Titanic)
survive <- split(titanicData, titanicData$Survived)
surviveNo <- survive$No
totalNoSurvive <- tapply(surviveNo$Freq, surviveNo$Age, sum)
surviveYes <- survive$Yes
totalYesSurvive <- tapply(surviveYes$Freq, surviveYes$Age, sum)
Based on the bar chart and the code above, the majority of the people aboard the Titanic did not survive the sinking. According to the data set, 711 people survived (57 children and 654 adults), while 1490 did not (52 children and 1438 adults). This gives a survival rate of approximately 52.29% for children (57 of 109) and approximately 31.26% for adults (654 of 2092).
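The percentages quoted above can be checked with a short cross-tabulation. This is a minimal sketch rather than the method used in the original lab: it relies only on the built-in `Titanic` data set and base R, and the names `ageCounts` and `ageRates` are chosen here for illustration.

```r
# Convert the built-in 4-way table into a data frame of counts
titanicData <- as.data.frame(Titanic)

# Cross-tabulate the counts: rows = Age, columns = Survived
ageCounts <- xtabs(Freq ~ Age + Survived, data = titanicData)

# Row-wise proportions, expressed as percentages with two decimals
ageRates <- round(100 * prop.table(ageCounts, margin = 1), 2)

print(ageCounts)
print(ageRates)
# ageRates["Child", "Yes"] is 52.29 and ageRates["Adult", "Yes"] is 31.26
```

Using `margin = 1` makes each row sum to 100%, so the `Yes` column can be read directly as the survival rate of each age group.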
titanicData <- data.frame(Titanic)
survive <- split(titanicData, titanicData$Survived)
surviveNo <- survive$No
totalNoSurvive <- tapply(surviveNo$Freq, surviveNo$Sex, sum)
surviveYes <- survive$Yes
totalYesSurvive <- tapply(surviveYes$Freq, surviveYes$Sex, sum)
Based on the data set and the code above, 711 people survived the sinking (367 male and 344 female), while 1490 did not (1364 male and 126 female). This gives a survival rate of approximately 73.19% for females (344 of 470) but only approximately 21.20% for males (367 of 1731).
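As an alternative to splitting the data frame, the same figures can be read directly from the original 4-way table. The sketch below assumes only base R; `sexCounts` and `sexRates` are illustrative names, and the dimension numbers follow the order of the `Titanic` table (Class, Sex, Age, Survived).

```r
# Collapse the 4-way table onto dimensions 2 (Sex) and 4 (Survived)
sexCounts <- margin.table(Titanic, c(2, 4))

# Percentage of each sex that did or did not survive (rows sum to 100)
sexRates <- round(100 * prop.table(sexCounts, margin = 1), 2)

print(sexCounts)
print(sexRates)
# sexRates["Female", "Yes"] is 73.19 and sexRates["Male", "Yes"] is 21.2
```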
## ================================================================================
##
## Class
##
## --------------------------------------------------------------------------------
##
## Storage mode: integer
## Factor with 4 levels
##
## Levels and labels N Valid
##
## 1 '1st' 8 25.0
## 2 '2nd' 8 25.0
## 3 '3rd' 8 25.0
## 4 'Crew' 8 25.0
##
## ================================================================================
##
## Sex
##
## --------------------------------------------------------------------------------
##
## Storage mode: integer
## Factor with 2 levels
##
## Levels and labels N Valid
##
## 1 'Male' 16 50.0
## 2 'Female' 16 50.0
##
## ================================================================================
##
## Age
##
## --------------------------------------------------------------------------------
##
## Storage mode: integer
## Factor with 2 levels
##
## Levels and labels N Valid
##
## 1 'Child' 16 50.0
## 2 'Adult' 16 50.0
##
## ================================================================================
##
## Survived
##
## --------------------------------------------------------------------------------
##
## Storage mode: integer
## Factor with 2 levels
##
## Levels and labels N Valid
##
## 1 'No' 16 50.0
## 2 'Yes' 16 50.0
##
## ================================================================================
##
## Freq
##
## --------------------------------------------------------------------------------
##
## Storage mode: double
##
## Min: 0.000
## Max: 670.000
## Mean: 68.781
## Std.Dev.: 133.854
## Skewness: 3.224
## Kurtosis: 10.780
| No | Variable | Stats / Values | Freqs (% of Valid) | Valid | Missing |
|---|---|---|---|---|---|
| 1 | Class [factor] | 1. 1st, 2. 2nd, 3. 3rd, 4. Crew | 8 (25.0%) each | 32 (100.0%) | 0 (0.0%) |
| 2 | Sex [factor] | 1. Male, 2. Female | 16 (50.0%) each | 32 (100.0%) | 0 (0.0%) |
| 3 | Age [factor] | 1. Child, 2. Adult | 16 (50.0%) each | 32 (100.0%) | 0 (0.0%) |
| 4 | Survived [factor] | 1. No, 2. Yes | 16 (50.0%) each | 32 (100.0%) | 0 (0.0%) |
| 5 | Freq [numeric] | | 22 distinct values | 32 (100.0%) | 0 (0.0%) |
Generated by summarytools 1.0.0 (R version 4.1.1)
2021-12-31
NFT sales dataset from https://www.kaggle.com/hemil26/nft-collections-dataset
An NFT (non-fungible token) is used to buy and sell ownership of a unique digital item through the blockchain. A large number of NFT transactions take place, and this data set summarises NFT sales with the columns Collections, Sales, Buyers, Txns (transactions) and Owners.
nftSales <- read.csv("D:/newCode/university/data sc/lab/nft_sales2.csv",stringsAsFactors=FALSE)
The filter() function is used to subset the main data set (nftSales), keeping only the rows that satisfy a given condition:
library(dplyr)
filter(nftSales, is.na(Owners)) # keep only rows where Owners is missing
## Collections Sales Buyers Txns Owners
## 1 Parallel Alpha 163724921 11103 67736 NA
## 2 Gutter Cat Gang 35876258 1729 3343 NA
## 3 Frontier Game 23972975 3257 7409 NA
## 4 Gutter Rats 12682958 1738 3157 NA
## 5 Illuvium 4849821 1255 3021 NA
## 6 Fluffy Polar Bears 3794206 3066 5104 NA
As the result above shows, filter() was applied to the Owners column of nftSales. The condition is.na() is TRUE for every row whose Owners value is NA (missing), so filter() returns the subset of nftSales containing exactly those rows.
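The behaviour described above can be reproduced without the CSV file. The following is a minimal sketch, assuming dplyr is installed; `nftToy` is a hypothetical four-row frame whose values are copied from the outputs shown in this document, and the `Buyers > 10000` threshold is an arbitrary illustration.

```r
library(dplyr)

# Hypothetical toy subset (values copied from the outputs in this document)
nftToy <- data.frame(
  Collections = c("Axie Infinity", "CryptoPunks", "Parallel Alpha", "Gutter Cat Gang"),
  Sales       = c(3328148500, 1664246968, 163724921, 35876258),
  Buyers      = c(1079811, 4723, 11103, 1729),
  Owners      = c(2656431, 3289, NA, NA)
)

# Rows where Owners is missing
missingOwners <- filter(nftToy, is.na(Owners))

# Negating the condition keeps the complete rows instead
knownOwners <- filter(nftToy, !is.na(Owners))

# Conditions separated by commas are combined with AND
bigKnown <- filter(nftToy, !is.na(Owners), Buyers > 10000)

print(missingOwners)
print(bigKnown)
```

Column names can be written bare inside filter(), so there is no need to repeat the data frame name with `$`.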
The arrange() function is used to sort the data set in either ascending or descending order:
library(dplyr)
head(arrange(nftSales, desc(Buyers))) # descending order
## Collections Sales Buyers Txns Owners
## 1 Axie Infinity 3328148500 1079811 9755511 2656431
## 2 Alien Worlds 33282729 405975 4630191 2562646
## 3 NBA Top Shot 781965423 374818 11790699 603928
## 4 CryptoKitties 45790208 111129 786656 109858
## 5 Sorare 129615752 42675 713122 60277
## 6 Zed Run 120191155 40469 160217 40190
head(arrange(nftSales, Buyers)) # ascending order
## Collections Sales Buyers Txns Owners
## 1 Chain Saw 5241292 31 47 28
## 2 Deafbeef 19249730 91 135 109
## 3 Wrapped Cryptocats 2774010 111 201 145
## 4 Non Fungible Fungi Genesis 4480098 127 167 71
## 5 Autoglyphs 41866276 183 349 157
## 6 Mutant Garden Seeder 9798416 250 468 272
As the results above show, arrange() sorts the rows by the Buyers column: in descending order when the column is wrapped in desc(), and in ascending order (the default) when it is not.
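One detail worth knowing when sorting a column that contains NA, such as Owners: arrange() places missing values at the end in both directions. A small sketch, assuming dplyr is installed and using a hypothetical frame `sortToy` built from rows shown earlier in this document:

```r
library(dplyr)

# Hypothetical toy frame; one row has a missing Owners value
sortToy <- data.frame(
  Collections = c("Chain Saw", "Parallel Alpha", "Axie Infinity", "Deafbeef"),
  Buyers      = c(31, 11103, 1079811, 91),
  Owners      = c(28, NA, 2656431, 109)
)

byOwnersAsc  <- arrange(sortToy, Owners)        # smallest first, NA last
byOwnersDesc <- arrange(sortToy, desc(Owners))  # largest first, NA still last

print(byOwnersAsc)
print(byOwnersDesc)
```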
The mutate() function is used to add a new variable (a new column) while preserving the existing ones:
library(dplyr)
head(mutate(nftSales,doubleSales = Sales * 2))
## Collections Sales Buyers Txns Owners doubleSales
## 1 Axie Infinity 3328148500 1079811 9755511 2656431 6656297000
## 2 CryptoPunks 1664246968 4723 18961 3289 3328493936
## 3 Art Blocks 1075223906 20934 117602 25094 2150447812
## 4 Bored Ape Yacht Club 783882186 8284 22584 5862 1567764372
## 5 NBA Top Shot 781965423 374818 11790699 603928 1563930846
## 6 Mutant Ape Yacht Club 422429206 10350 17343 10254 844858412
As the result above shows, mutate() takes the values in the Sales column, multiplies them by two, and adds the result as a new column named doubleSales while keeping the original columns unchanged.
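A new column can also combine several existing columns. The sketch below assumes dplyr is installed; `mutateToy` is a hypothetical two-row frame with values taken from the output above, and `salesPerBuyer` is an illustrative column that does not appear in the original lab.

```r
library(dplyr)

# Hypothetical toy frame (values copied from the output above)
mutateToy <- data.frame(
  Collections = c("Axie Infinity", "CryptoPunks"),
  Sales       = c(3328148500, 1664246968),
  Buyers      = c(1079811, 4723)
)

withNewCols <- mutate(
  mutateToy,
  doubleSales   = Sales * 2,       # same transformation as in the text
  salesPerBuyer = Sales / Buyers   # illustrative: average sales per buyer
)

print(withNewCols)
print(names(mutateToy))  # the input data frame itself is not modified
```

mutate() returns a new data frame, so the result has to be assigned (for example back to `mutateToy`) if the added columns should be kept.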
The select() function is used to pick specific columns from the data set. The columns can be specified by name, by a helper such as starts_with(), or by a regex pattern with matches():
library(dplyr)
head(select(nftSales,starts_with("Owner"))) #to search which column start with the word Owner
## Owners
## 1 2656431
## 2 3289
## 3 25094
## 4 5862
## 5 603928
## 6 10254
head(select(nftSales,matches("[Ow]n"))) #using regex pattern to determine which column to be selected
## Collections Owners
## 1 Axie Infinity 2656431
## 2 CryptoPunks 3289
## 3 Art Blocks 25094
## 4 Bored Ape Yacht Club 5862
## 5 NBA Top Shot 603928
## 6 Mutant Ape Yacht Club 10254
As the results above show, select() returns only the columns that were specified. With starts_with("Owner") it returns the single column whose name begins with "Owner". With a regex pattern it returns every column whose name matches the pattern: "[Ow]n" matches an "O" or a "w" followed by "n", and because matches() ignores case by default, both Collections (the "on" in the name) and Owners (the "wn" in the name) are selected.
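The role of case sensitivity in that regex can be made visible by switching it off. This is a small sketch, assuming dplyr is installed; `selectToy` is a hypothetical one-row frame that only reproduces the column names of nftSales.

```r
library(dplyr)

# Hypothetical one-row frame with the same column names as nftSales
selectToy <- data.frame(
  Collections = "Axie Infinity",
  Sales       = 3328148500,
  Buyers      = 1079811,
  Txns        = 9755511,
  Owners      = 2656431
)

# Default: case-insensitive, so the "on" in "Collections" also matches
defaultMatch <- names(select(selectToy, matches("[Ow]n")))

# Case-sensitive: only the "wn" in "Owners" matches
strictMatch <- names(select(selectToy, matches("[Ow]n", ignore.case = FALSE)))

print(defaultMatch)  # "Collections" "Owners"
print(strictMatch)   # "Owners"
```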
The summarise() function creates a new data frame containing one column for each summary statistic that has been specified:
library(dplyr)
summarise(nftSales, mean = mean(Sales))
## mean
## 1 59224828
summarise(nftSales,qs = quantile(Sales, c(0.25, 0.75)),n=n())
## qs n
## 1 4645715 250
## 2 33968942 250
As the results above show, summarise() creates a new data frame with one column per requested statistic. The first call returns a single column named mean, while the second returns two columns: qs, with one row for each requested quantile (25% and 75%), and n, the number of rows in the current group (250).
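If one row per group is preferred, each quantile can be given its own column instead. The sketch below assumes dplyr is installed; `summaryToy` and its five Sales values are made up for illustration and are not taken from the NFT data. Note also that from dplyr 1.1.0 onward, returning several rows from a single summarise() call, as in the quantile example above, is deprecated in favour of reframe().

```r
library(dplyr)

# Hypothetical Sales values, chosen only to keep the arithmetic easy to follow
summaryToy <- data.frame(Sales = c(10, 20, 30, 40, 50))

salesStats <- summarise(
  summaryToy,
  mean = mean(Sales),
  q25  = quantile(Sales, 0.25),  # one scalar per column keeps a single row
  q75  = quantile(Sales, 0.75),
  n    = n()
)

print(salesStats)
# one row: mean = 30, q25 = 20, q75 = 40, n = 5
```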