data <- read.csv("/Users/yaswithasakhamuri/Desktop/netflix.csv")
Printing Column Names of the Data
names(data)
## [1] "Country" "Total_Library_Size"
## [3] "Shows" "Movies"
## [5] "Cost_Per_Month_Basic" "Cost_Per_Month_Standard"
## [7] "Cost_Per_Month_Premium"
Finding Summary of Data
summary(data)
## Country Total_Library_Size Shows Movies
## Length:65 Min. :2274 Min. :1675 Min. : 373
## Class :character 1st Qu.:4948 1st Qu.:3154 1st Qu.:1628
## Mode :character Median :5195 Median :3512 Median :1841
## Mean :5314 Mean :3519 Mean :1795
## 3rd Qu.:5952 3rd Qu.:3832 3rd Qu.:1980
## Max. :7325 Max. :5234 Max. :2387
## Cost_Per_Month_Basic Cost_Per_Month_Standard Cost_Per_Month_Premium
## Min. : 1.970 Min. : 3.00 Min. : 4.02
## 1st Qu.: 7.990 1st Qu.:10.71 1st Qu.:13.54
## Median : 8.990 Median :11.49 Median :14.45
## Mean : 8.368 Mean :11.99 Mean :15.61
## 3rd Qu.: 9.030 3rd Qu.:13.54 3rd Qu.:18.06
## Max. :12.880 Max. :20.46 Max. :26.96
Finding Structure of Data
str(data)
## 'data.frame': 65 obs. of 7 variables:
## $ Country : chr "Argentina" "Austria" "Bolivia" "Bulgaria" ...
## $ Total_Library_Size : int 4760 5640 4991 6797 4994 4991 4988 2274 7325 4992 ...
## $ Shows : int 3154 3779 3155 4819 3156 3156 3152 1675 5234 3155 ...
## $ Movies : int 1606 1861 1836 1978 1838 1835 1836 599 2091 1837 ...
## $ Cost_Per_Month_Basic : num 3.74 9.03 7.99 9.03 7.07 4.31 8.99 9.03 8.83 7.99 ...
## $ Cost_Per_Month_Standard: num 6.3 14.67 10.99 11.29 9.91 ...
## $ Cost_Per_Month_Premium : num 9.26 20.32 13.99 13.54 12.74 ...
Head of the Data
head(data)
## Country Total_Library_Size Shows Movies Cost_Per_Month_Basic
## 1 Argentina 4760 3154 1606 3.74
## 2 Austria 5640 3779 1861 9.03
## 3 Bolivia 4991 3155 1836 7.99
## 4 Bulgaria 6797 4819 1978 9.03
## 5 Chile 4994 3156 1838 7.07
## 6 Colombia 4991 3156 1835 4.31
## Cost_Per_Month_Standard Cost_Per_Month_Premium
## 1 6.30 9.26
## 2 14.67 20.32
## 3 10.99 13.99
## 4 11.29 13.54
## 5 9.91 12.74
## 6 6.86 9.93
Tail of the Data
tail(data)
## Country Total_Library_Size Shows Movies Cost_Per_Month_Basic
## 60 Brazil 4972 3162 1810 4.61
## 61 Ireland 6486 4515 1971 9.03
## 62 Switzerland 5506 3654 1852 12.88
## 63 Australia 6114 4050 2064 7.84
## 64 Denmark 4558 2978 1580 12.00
## 65 United States 5818 3826 1992 8.99
## Cost_Per_Month_Standard Cost_Per_Month_Premium
## 60 7.11 9.96
## 61 14.67 20.32
## 62 20.46 26.96
## 63 12.12 16.39
## 64 15.04 19.60
## 65 13.99 17.99
Plotting Country vs. Number of Movies
library(ggplot2)
# Movies is the number of movie titles in each country's library
ggplot(data, aes(x = Country, y = Movies)) +
  geom_bar(stat = "identity", fill = "#B3A492") +
  labs(title = "Movies available by country", x = "Country", y = "No. of Movies")
From the above graph, keeping in mind that Movies counts the titles in each country's library rather than titles watched (Shows and Movies add up to Total_Library_Size in every row printed above):
1) The largest movie library belongs to Malaysia, in the 2000 to 2500 range (2387 movies).
2) The smallest movie library belongs to San Marino, in the 0 to 500 range (373 movies).
3) Brazil is close to the average, with 1810 movies against a mean of 1795.
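The three observations above can also be computed rather than read off the bars. The following is a minimal sketch, assuming only the six rows printed by head(data); `sample_rows` is a hypothetical stand-in, so its answers differ from those for the full file. With the real data, the same calls would be applied to `data` directly.

```r
# Hypothetical stand-in: Country and Movies from the six rows printed by head(data).
sample_rows <- data.frame(
  Country = c("Argentina", "Austria", "Bolivia", "Bulgaria", "Chile", "Colombia"),
  Movies  = c(1606, 1861, 1836, 1978, 1838, 1835),
  stringsAsFactors = FALSE
)

# Countries with the largest and smallest movie counts
most_movies   <- sample_rows$Country[which.max(sample_rows$Movies)]
fewest_movies <- sample_rows$Country[which.min(sample_rows$Movies)]

# Country whose movie count lies closest to the mean
gap_to_mean     <- abs(sample_rows$Movies - mean(sample_rows$Movies))
closest_to_mean <- sample_rows$Country[which.min(gap_to_mean)]

print(c(most = most_movies, fewest = fewest_movies, closest = closest_to_mean))
```

Using which.max() and which.min() avoids estimating values from bar heights, which is hard to do reliably with 65 countries on one axis.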
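ggplot(data, aes(x = Shows)) +
  geom_boxplot(fill = "#EF9595", col = "white") +
  labs(title = "Box Plot", x = "No. of TV Shows")
The boxplot shows the distribution of the number of TV shows available across the 65 countries, with the count on the x-axis. The box spans the first quartile (3154) to the third quartile (3832), and the line inside it marks the median (3512). The two whiskers extend from the box to the most extreme values within 1.5 times the interquartile range; values beyond that, which here include the minimum (1675) and the maximum (5234), are drawn as individual outlier points.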
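To make the parts of the boxplot concrete, the sketch below computes the box edges, the whisker ends, and the outliers by hand. It assumes only the first ten Shows values printed by str(data), so the numbers it produces are not those of the full column; the variable names are illustrative.

```r
# First ten Shows values, as printed by str(data) above (not the full column).
shows <- c(3154, 3779, 3155, 4819, 3156, 3156, 3152, 1675, 5234, 3155)

# Box edges (25% and 75%) and the middle line (50%)
q   <- quantile(shows, probs = c(0.25, 0.5, 0.75))
iqr <- IQR(shows)

# Whiskers reach the most extreme observations within 1.5 * IQR of the box
lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr
whiskers <- range(shows[shows >= lower_fence & shows <= upper_fence])

# Anything outside the fences is plotted as an individual point
outliers <- shows[shows < lower_fence | shows > upper_fence]

print(q)
print(whiskers)
print(outliers)
```

On the full data the same steps would start from `data$Shows` instead of the ten-value vector.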
ggplot(data, aes(x = 1, y = Shows, fill = Country)) +
  geom_bar(stat = "identity") +
  coord_polar(theta = "y") +
  labs(title = "Country-Wise Analysis") +
  theme_void()
The chart above is a pie chart of each country's share of the total number of TV shows (the Shows column), with one slice per country.
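Because a pie chart with one slice per country is hard to read, it can help to print the shares behind the slices. This is a minimal sketch, assuming only the six rows printed by head(data); `sample_rows` and `Share_Pct` are hypothetical names, and the percentages apply to this six-row sample only.

```r
# Hypothetical stand-in: Country and Shows from the six rows printed by head(data).
sample_rows <- data.frame(
  Country = c("Argentina", "Austria", "Bolivia", "Bulgaria", "Chile", "Colombia"),
  Shows   = c(3154, 3779, 3155, 4819, 3156, 3156),
  stringsAsFactors = FALSE
)

# Each slice's angle is proportional to the country's share of the total
sample_rows$Share_Pct <- round(100 * sample_rows$Shows / sum(sample_rows$Shows), 1)

# Largest share first
print(sample_rows[order(-sample_rows$Share_Pct), ])
```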
ggplot(data, aes(x = Movies, y = Cost_Per_Month_Premium)) +
  geom_point(color = "#545B77") +
  labs(title = "Scatterplot", x = "No. of Movies", y = "Cost Per Month, Premium (USD)")
The scatterplot shows the relationship between the number of movies in a country's library (x-axis) and the monthly cost of the Premium plan in USD (y-axis), with one point per country.
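One way to put a number on the relationship in the scatterplot is the Pearson correlation coefficient. The sketch below assumes only the six rows printed by head(data), which is far too few to draw any conclusion; it only illustrates the call. With the full file the equivalent would be cor(data$Movies, data$Cost_Per_Month_Premium).

```r
# Hypothetical stand-in: Movies and Cost_Per_Month_Premium from head(data).
movies  <- c(1606, 1861, 1836, 1978, 1838, 1835)
premium <- c(9.26, 20.32, 13.99, 13.54, 12.74, 9.93)

# Pearson correlation: +1 is a perfect rising line, -1 a perfect falling line,
# and values near 0 indicate no linear trend
r <- cor(movies, premium)
print(round(r, 2))
```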
ggplot(data, aes(x = Total_Library_Size)) +
  geom_histogram(fill = "#C4DFDF", color = "#ACB1D6", bins = 30) +
  labs(title = "Histogram", x = "Total Library Size", y = "Frequency")
1) The ggplot2 code creates a histogram of the “Total_Library_Size” variable.
2) The x-axis represents the total library size, and the y-axis represents the number of countries falling in each bin.
3) The histogram uses 30 bins for finer granularity in displaying the distribution.
4) Maximum library size: 7325
5) Minimum library size: 2274
6) Average library size: about 5300 (mean 5314)
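The minimum, maximum, and average quoted above, as well as the per-bin frequencies a histogram draws, can be computed directly. This sketch assumes only the first ten Total_Library_Size values printed by str(data). The overall minimum and maximum happen to be among those ten, but the mean of this sample differs from the full-data mean. The 500-wide bins are an illustrative choice, not the 30 bins used in the plot.

```r
# First ten Total_Library_Size values, as printed by str(data) above.
library_size <- c(4760, 5640, 4991, 6797, 4994, 4991, 4988, 2274, 7325, 4992)

size_range <- range(library_size)  # minimum and maximum
size_mean  <- mean(library_size)   # mean of these ten values only

# Count observations per bin without drawing anything;
# the 500-wide breaks are an assumption made for illustration.
bins <- hist(library_size, breaks = seq(2000, 7500, by = 500), plot = FALSE)

print(size_range)
print(size_mean)
print(data.frame(upper_edge = bins$breaks[-1], count = bins$counts))
```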