| LBB DATA VISUALIZATION |
| BY ANDREAS |
| 2022-05-23 |
As a YouTuber in the US who wants to raise the profile of our YouTube channel, we plan to make videos that trend. We have just obtained the YouTube US Trending Videos data and want to find out which characteristics make a video trend.
vids <- read.csv('data_input/USvideos.csv')
Check the data:
head(vids)
#> trending_date title
#> 1 17.14.11 WE WANT TO TALK ABOUT OUR MARRIAGE
#> 2 17.14.11 The Trump Presidency: Last Week Tonight with John Oliver (HBO)
#> 3 17.14.11 Racist Superman | Rudy Mancuso, King Bach & Lele Pons
#> 4 17.14.11 Nickelback Lyrics: Real or Fake?
#> 5 17.14.11 I Dare You: GOING BALD!?
#> 6 17.14.11 2 Weeks with iPhone X
#> channel_title category_id publish_time views likes
#> 1 CaseyNeistat 22 2017-11-13T17:13:01.000Z 748374 57527
#> 2 LastWeekTonight 24 2017-11-13T07:30:00.000Z 2418783 97185
#> 3 Rudy Mancuso 23 2017-11-12T19:05:24.000Z 3191434 146033
#> 4 Good Mythical Morning 24 2017-11-13T11:00:04.000Z 343168 10172
#> 5 nigahiga 24 2017-11-12T18:01:41.000Z 2095731 132235
#> 6 iJustine 28 2017-11-13T19:07:23.000Z 119180 9763
#> dislikes comment_count comments_disabled ratings_disabled
#> 1 2966 15954 FALSE FALSE
#> 2 6146 12703 FALSE FALSE
#> 3 5339 8181 FALSE FALSE
#> 4 666 2146 FALSE FALSE
#> 5 1989 17518 FALSE FALSE
#> 6 511 1434 FALSE FALSE
#> video_error_or_removed
#> 1 FALSE
#> 2 FALSE
#> 3 FALSE
#> 4 FALSE
#> 5 FALSE
#> 6 FALSE
str(vids)
#> 'data.frame': 13400 obs. of 12 variables:
#> $ trending_date : chr "17.14.11" "17.14.11" "17.14.11" "17.14.11" ...
#> $ title : chr "WE WANT TO TALK ABOUT OUR MARRIAGE" "The Trump Presidency: Last Week Tonight with John Oliver (HBO)" "Racist Superman | Rudy Mancuso, King Bach & Lele Pons" "Nickelback Lyrics: Real or Fake?" ...
#> $ channel_title : chr "CaseyNeistat" "LastWeekTonight" "Rudy Mancuso" "Good Mythical Morning" ...
#> $ category_id : int 22 24 23 24 24 28 24 28 1 25 ...
#> $ publish_time : chr "2017-11-13T17:13:01.000Z" "2017-11-13T07:30:00.000Z" "2017-11-12T19:05:24.000Z" "2017-11-13T11:00:04.000Z" ...
#> $ views : int 748374 2418783 3191434 343168 2095731 119180 2103417 817732 826059 256426 ...
#> $ likes : int 57527 97185 146033 10172 132235 9763 15993 23663 3543 12654 ...
#> $ dislikes : int 2966 6146 5339 666 1989 511 2445 778 119 1363 ...
#> $ comment_count : int 15954 12703 8181 2146 17518 1434 1970 3432 340 2368 ...
#> $ comments_disabled : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
#> $ ratings_disabled : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
#> $ video_error_or_removed: logi FALSE FALSE FALSE FALSE FALSE FALSE ...
YouTube's US Trending Videos is a collection of the 200 trending videos per day in the US, from 2017-11-14 to 2018-01-21. The columns are described below.
General information relating to the video:
* trending_date: the date the video was trending
* title: video title
* channel_title: YouTube channel name
* category_id: video category
* publish_time: when the video was uploaded
* comments_disabled: whether comments are disabled
* ratings_disabled: whether ratings are disabled
* video_error_or_removed: whether the video was removed
Statistics on a particular date:
* views: number of views
* likes: number of likes
* dislikes: number of dislikes
* comment_count: number of comments
Explore your data! Does every column already have the right data type?
Data wrangling is another term for data cleaning. Examples include converting data types and subsetting specific rows or columns.
### `lubridate`
lubridate is a very powerful package for working with dates and times.
Previously, we converted data to the date type using as.Date(), which relies on the following format codes:
YEAR
%Y = 4-digit year, e.g. 2020
%y = 2-digit year, e.g. 20
MONTH
%B = full month name, e.g. March
%b = abbreviated month name, e.g. Mar
%m = 2-digit month, e.g. 03
(%M is not a month code; it stands for minutes, 00-59)
DAY
%A = weekday name, e.g. Friday
%d = 2-digit day of the month, e.g. 05
Convert trending_date to the date type:
head(vids$trending_date)
#> [1] "17.14.11" "17.14.11" "17.14.11" "17.14.11" "17.14.11" "17.14.11"
base_date <- as.Date(x = vids$trending_date, format = "%y.%d.%m" )
head(base_date)
#> [1] "2017-11-14" "2017-11-14" "2017-11-14" "2017-11-14" "2017-11-14"
#> [6] "2017-11-14"
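The order of the codes in `format` has to mirror the order of the fields in the string. A minimal sketch, using a single value copied from the output above, of how a mismatched order fails (the variable names `raw`, `good` and `bad` are illustrative only):

```r
# One trending_date value, stored as year.day.month
raw <- "17.14.11"

# Correct order: %y (year), %d (day), %m (month)
good <- as.Date(raw, format = "%y.%d.%m")

# Wrong order: "14" is read as a month, which is invalid, so the result is NA
bad <- as.Date(raw, format = "%y.%m.%d")

format(good)  # "2017-11-14"
is.na(bad)    # TRUE
```

An NA after conversion is therefore a useful hint that the format string and the data disagree.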
Using lubridate:
library(lubridate)
a <- "19/04/22"
b <- "Tuesday, 19-04-2022"
c <- "April 19, 2022"
d <- "2022/04/19, 1:42PM"
# base method
as.Date(a, "%d/%m/%y")
#> [1] "2022-04-19"
# returns NA: the format lacks the comma after the date, and %h and %m are month codes, not hour and minute
as.Date(d, format="%Y/%m/%d %h:%m")
#> [1] NA
# lubridate method: just state the order of day, month and year
a <- dmy(a)
dmy(b)
#> [1] "2022-04-19"
d <- ymd_hm(d)
class(a) # Date, because it has no time information
#> [1] "Date"
class(d) # POSIXct, because it has time information
#> [1] "POSIXct" "POSIXt"
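The base call above returned NA because its format string does not match the input. Below is a sketch of a base format that does match, assuming an English or C locale so that `%p` recognises "PM". The names `stamp` and `parsed` are illustrative only.

```r
stamp <- "2022/04/19, 1:42PM"

# %I = hour on a 12-hour clock, %M = minute, %p = AM/PM indicator;
# the comma and the space must appear in the format exactly as in the input
parsed <- as.POSIXct(stamp, format = "%Y/%m/%d, %I:%M%p", tz = "UTC")

# as.Date() keeps only the calendar date
as.Date(parsed)

# Time of day on a 24-hour clock
format(parsed, "%H:%M")
```

This works, but the format has to be spelled out character by character, which is the effort that `ymd_hm()` saves.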
### `sapply()` & `lapply()`
**sapply**: applies a function to every element of a vector (for example, every row of a column) in one call.
form: `sapply(data, function)`
To translate a value into another value we can use `switch()`. However, `switch()` can only translate a single value (one row, not a whole column):
```r
switch("1", # data
       "1" = "Education", # dictionary
       "2" = "Travel",
       "3" = "Music")
#> [1] "Education"
# # will return an error
# switch(c("1","2"),
#        "1" = "Education",
#        "2" = "Travel",
#        "3" = "Music")
```
This is solved with sapply():
data <- c("1","2")
sapply(X = data, # data/column to translate
       FUN = switch, # function
       "1" = "Education", # dictionary
       "2" = "Travel",
       "4" = "Music")
#> 1 2
#> "Education" "Travel"
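A sketch of what happens when a value has no entry in the dictionary, and of how a final unnamed argument to `switch()` acts as a default. The `"Other"` label and the variable names are illustrative and not part of the original analysis.

```r
keys <- c("1", "9")  # "9" is not in the dictionary

# Without a default, switch() returns NULL for "9",
# so sapply() cannot simplify the result and returns a list
no_default <- sapply(keys, switch,
                     "1" = "Education",
                     "2" = "Travel")

# A trailing unnamed argument is used when nothing matches
with_default <- sapply(keys, switch,
                       "1" = "Education",
                       "2" = "Travel",
                       "Other")

is.list(no_default)   # TRUE
unname(with_default)  # "Education" "Other"
```

If a column unexpectedly turns into a list after this kind of recoding, a missing dictionary entry is a likely cause.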
Note:
switch() translates a value based on the dictionary. If the value is not in the dictionary, NULL is returned.
Change category_id for every row with switch(), with the help of sapply():
# convert the `category_id` column to its actual label
vids$category_id <- sapply(X = as.character(vids$category_id),
FUN = switch,
"1" = "Film and Animation",
"2" = "Autos and Vehicles",
"10" = "Music",
"15" = "Pets and Animals",
"17" = "Sports",
"19" = "Travel and Events",
"20" = "Gaming",
"22" = "People and Blogs",
"23" = "Comedy",
"24" = "Entertainment",
"25" = "News and Politics",
"26" = "Howto and Style",
"27" = "Education",
"28" = "Science and Technology",
"29" = "Nonprofit and Activism",
"43" = "Shows")
head(vids)
#> trending_date title
#> 1 17.14.11 WE WANT TO TALK ABOUT OUR MARRIAGE
#> 2 17.14.11 The Trump Presidency: Last Week Tonight with John Oliver (HBO)
#> 3 17.14.11 Racist Superman | Rudy Mancuso, King Bach & Lele Pons
#> 4 17.14.11 Nickelback Lyrics: Real or Fake?
#> 5 17.14.11 I Dare You: GOING BALD!?
#> 6 17.14.11 2 Weeks with iPhone X
#> channel_title category_id publish_time views
#> 1 CaseyNeistat People and Blogs 2017-11-13T17:13:01.000Z 748374
#> 2 LastWeekTonight Entertainment 2017-11-13T07:30:00.000Z 2418783
#> 3 Rudy Mancuso Comedy 2017-11-12T19:05:24.000Z 3191434
#> 4 Good Mythical Morning Entertainment 2017-11-13T11:00:04.000Z 343168
#> 5 nigahiga Entertainment 2017-11-12T18:01:41.000Z 2095731
#> 6 iJustine Science and Technology 2017-11-13T19:07:23.000Z 119180
#> likes dislikes comment_count comments_disabled ratings_disabled
#> 1 57527 2966 15954 FALSE FALSE
#> 2 97185 6146 12703 FALSE FALSE
#> 3 146033 5339 8181 FALSE FALSE
#> 4 10172 666 2146 FALSE FALSE
#> 5 132235 1989 17518 FALSE FALSE
#> 6 9763 511 1434 FALSE FALSE
#> video_error_or_removed
#> 1 FALSE
#> 2 FALSE
#> 3 FALSE
#> 4 FALSE
#> 5 FALSE
#> 6 FALSE
# convert the `category_id` column to a factor
vids$category_id <- as.factor(vids$category_id)
# check the data
str(vids)
#> 'data.frame': 13400 obs. of 12 variables:
#> $ trending_date : chr "17.14.11" "17.14.11" "17.14.11" "17.14.11" ...
#> $ title : chr "WE WANT TO TALK ABOUT OUR MARRIAGE" "The Trump Presidency: Last Week Tonight with John Oliver (HBO)" "Racist Superman | Rudy Mancuso, King Bach & Lele Pons" "Nickelback Lyrics: Real or Fake?" ...
#> $ channel_title : chr "CaseyNeistat" "LastWeekTonight" "Rudy Mancuso" "Good Mythical Morning" ...
#> $ category_id : Factor w/ 16 levels "Autos and Vehicles",..: 11 4 2 4 4 13 4 13 5 9 ...
#> $ publish_time : chr "2017-11-13T17:13:01.000Z" "2017-11-13T07:30:00.000Z" "2017-11-12T19:05:24.000Z" "2017-11-13T11:00:04.000Z" ...
#> $ views : int 748374 2418783 3191434 343168 2095731 119180 2103417 817732 826059 256426 ...
#> $ likes : int 57527 97185 146033 10172 132235 9763 15993 23663 3543 12654 ...
#> $ dislikes : int 2966 6146 5339 666 1989 511 2445 778 119 1363 ...
#> $ comment_count : int 15954 12703 8181 2146 17518 1434 1970 3432 340 2368 ...
#> $ comments_disabled : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
#> $ ratings_disabled : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
#> $ video_error_or_removed: logi FALSE FALSE FALSE FALSE FALSE FALSE ...
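As an alternative to `sapply()` with `switch()`, a named character vector can serve as the dictionary and be indexed directly, which is vectorised. This is only a sketch on a few made-up ids with three of the labels; `lookup`, `ids`, `labels` and `labels_f` are illustrative names.

```r
# Dictionary: names are the ids (as strings), values are the labels
lookup <- c("22" = "People and Blogs",
            "23" = "Comedy",
            "24" = "Entertainment")

ids <- c(22, 24, 23, 24)

# Index by name; unname() drops the id names from the result
labels <- unname(lookup[as.character(ids)])

# Convert to a factor, as was done for category_id
labels_f <- factor(labels)
levels(labels_f)
```

With this approach an id missing from the dictionary becomes NA rather than NULL, so the result stays a plain vector.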
lapply: applies a function (for example, a type conversion) to many columns at once.
form: lapply(data, function)
Note: Below is an example of using lapply(); it is not required in this case.
# base approach: one column at a time
vids$views <- as.numeric(vids$views)
vids$likes <- as.numeric(vids$likes)
vids$dislikes <- as.numeric(vids$dislikes)
vids$comment_count <- as.numeric(vids$comment_count)
# lapply approach: all four columns in one call
vids[,c("views","likes","dislikes","comment_count")] <- lapply(vids[,c("views","likes","dislikes","comment_count")], as.numeric)
str(vids)
#> 'data.frame': 13400 obs. of 12 variables:
#> $ trending_date : chr "17.14.11" "17.14.11" "17.14.11" "17.14.11" ...
#> $ title : chr "WE WANT TO TALK ABOUT OUR MARRIAGE" "The Trump Presidency: Last Week Tonight with John Oliver (HBO)" "Racist Superman | Rudy Mancuso, King Bach & Lele Pons" "Nickelback Lyrics: Real or Fake?" ...
#> $ channel_title : chr "CaseyNeistat" "LastWeekTonight" "Rudy Mancuso" "Good Mythical Morning" ...
#> $ category_id : Factor w/ 16 levels "Autos and Vehicles",..: 11 4 2 4 4 13 4 13 5 9 ...
#> $ publish_time : chr "2017-11-13T17:13:01.000Z" "2017-11-13T07:30:00.000Z" "2017-11-12T19:05:24.000Z" "2017-11-13T11:00:04.000Z" ...
#> $ views : num 748374 2418783 3191434 343168 2095731 ...
#> $ likes : num 57527 97185 146033 10172 132235 ...
#> $ dislikes : num 2966 6146 5339 666 1989 ...
#> $ comment_count : num 15954 12703 8181 2146 17518 ...
#> $ comments_disabled : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
#> $ ratings_disabled : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
#> $ video_error_or_removed: logi FALSE FALSE FALSE FALSE FALSE FALSE ...
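The same pattern on a tiny made-up data frame, as a sketch of what `lapply()` returns and why assigning it back works. The names `toy` and `num_cols` are illustrative only.

```r
# Toy data frame: two integer columns and one character column
toy <- data.frame(views = c(10L, 20L),
                  likes = c(1L, 2L),
                  title = c("a", "b"))

num_cols <- c("views", "likes")

# lapply() returns a list with one converted column per element;
# assigning that list back replaces the selected columns in place
toy[num_cols] <- lapply(toy[num_cols], as.numeric)

sapply(toy, class)
```

Keeping the column names in one vector avoids typing the list of columns twice.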
Feature engineering means creating new columns/variables from the existing data. It is useful for extracting additional information that can be used for data exploration and modeling.
Extract the hour from publish_time into a new column, publish_hour:
# note: publish_time is still a character column here, so no time of day is parsed and every hour below comes out as 0
vids$publish_hour <- hour(vids$publish_time)
head(vids,3)
#> trending_date title
#> 1 17.14.11 WE WANT TO TALK ABOUT OUR MARRIAGE
#> 2 17.14.11 The Trump Presidency: Last Week Tonight with John Oliver (HBO)
#> 3 17.14.11 Racist Superman | Rudy Mancuso, King Bach & Lele Pons
#> channel_title category_id publish_time views likes
#> 1 CaseyNeistat People and Blogs 2017-11-13T17:13:01.000Z 748374 57527
#> 2 LastWeekTonight Entertainment 2017-11-13T07:30:00.000Z 2418783 97185
#> 3 Rudy Mancuso Comedy 2017-11-12T19:05:24.000Z 3191434 146033
#> dislikes comment_count comments_disabled ratings_disabled
#> 1 2966 15954 FALSE FALSE
#> 2 6146 12703 FALSE FALSE
#> 3 5339 8181 FALSE FALSE
#> video_error_or_removed publish_hour
#> 1 FALSE 0
#> 2 FALSE 0
#> 3 FALSE 0
unique(vids$publish_hour)
#> [1] 0
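Every publish_hour in the output above is 0, which appears to be because publish_time is still a character column at this point, so its time of day was never parsed. A sketch of parsing the timestamp first with lubridate's `ymd_hms()`, using two values copied from the data. The variable names are illustrative, and the hours are in UTC (the trailing `Z`).

```r
library(lubridate)

raw_time <- c("2017-11-13T17:13:01.000Z", "2017-11-13T07:30:00.000Z")

# Parse the ISO 8601 strings into POSIXct (UTC) before extracting the hour
parsed_time <- ymd_hms(raw_time)

hour(parsed_time)  # 17 7
```

Applied to the full column, this would give publish_hour values between 0 and 23 and so change the results of the later steps that depend on it.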
Create a new column, publish_when, by splitting publish_hour into periods (Day/Night) using ifelse():
vids$publish_when <- ifelse(test = vids$publish_hour > 12, yes = "Night", no="Day")
# check the result
head(vids)
#> trending_date title
#> 1 17.14.11 WE WANT TO TALK ABOUT OUR MARRIAGE
#> 2 17.14.11 The Trump Presidency: Last Week Tonight with John Oliver (HBO)
#> 3 17.14.11 Racist Superman | Rudy Mancuso, King Bach & Lele Pons
#> 4 17.14.11 Nickelback Lyrics: Real or Fake?
#> 5 17.14.11 I Dare You: GOING BALD!?
#> 6 17.14.11 2 Weeks with iPhone X
#> channel_title category_id publish_time views
#> 1 CaseyNeistat People and Blogs 2017-11-13T17:13:01.000Z 748374
#> 2 LastWeekTonight Entertainment 2017-11-13T07:30:00.000Z 2418783
#> 3 Rudy Mancuso Comedy 2017-11-12T19:05:24.000Z 3191434
#> 4 Good Mythical Morning Entertainment 2017-11-13T11:00:04.000Z 343168
#> 5 nigahiga Entertainment 2017-11-12T18:01:41.000Z 2095731
#> 6 iJustine Science and Technology 2017-11-13T19:07:23.000Z 119180
#> likes dislikes comment_count comments_disabled ratings_disabled
#> 1 57527 2966 15954 FALSE FALSE
#> 2 97185 6146 12703 FALSE FALSE
#> 3 146033 5339 8181 FALSE FALSE
#> 4 10172 666 2146 FALSE FALSE
#> 5 132235 1989 17518 FALSE FALSE
#> 6 9763 511 1434 FALSE FALSE
#> video_error_or_removed publish_hour publish_when
#> 1 FALSE 0 Day
#> 2 FALSE 0 Day
#> 3 FALSE 0 Day
#> 4 FALSE 0 Day
#> 5 FALSE 0 Day
#> 6 FALSE 0 Day
This also works for more than 2 conditions [Optional]:
# x = a single publish hour
pw <- function(x){
  if(x < 8){
    "12am to 8am"
  }else if(x >= 8 && x < 16){
    "8am to 4pm"
  }else{
    "4pm to 12am"
  }
}
# use `sapply()` to apply it to every row
temp <- sapply(vids$publish_hour, pw)
# check the result
head(vids$publish_hour)
#> [1] 0 0 0 0 0 0
head(temp)
#> [1] "12am to 8am" "12am to 8am" "12am to 8am" "12am to 8am" "12am to 8am"
#> [6] "12am to 8am"
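For binning a numeric column into more than two periods, base R's `cut()` is a vectorised alternative to writing an `if`/`else` function and looping with `sapply()`. A sketch on made-up hours, with `hours` and `period` as illustrative names:

```r
hours <- c(0, 7, 8, 15, 16, 23)

# right = FALSE makes each interval closed on the left: [0,8), [8,16), [16,24)
period <- cut(hours,
              breaks = c(0, 8, 16, 24),
              labels = c("12am to 8am", "8am to 4pm", "4pm to 12am"),
              right = FALSE)

as.character(period)
```

`cut()` returns a factor whose levels follow the order of `labels`, which is convenient when the periods are plotted later.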
### `match()`
The vids data contains redundancy: some videos appear several times because they trended for more than one day.
length(vids$title)
#> [1] 13400
length(unique(vids$title))
#> [1] 2986
For further analysis, we will only use the data from the first time each video trended, to reduce redundancy. For this we can use unique() and match().
Example:
# dummy data
df <- data.frame(nama = c("Lita", "Lita", "Nurul", "Dwi"),
umur = c(22,23,22,22))
df
#> nama umur
#> 1 Lita 22
#> 2 Lita 23
#> 3 Nurul 22
#> 4 Dwi 22
# get the unique names
unique(df$nama)
#> [1] "Lita" "Nurul" "Dwi"
# find the index at which each unique name first appears
index <- match(unique(df$nama), df$nama)
# i.e. the positions where `unique(df$nama)` matches `df$nama`
index
#> [1] 1 3 4
# keep only the rows at those indices
df[index, ]
#> nama umur
#> 1 Lita 22
#> 3 Nurul 22
#> 4 Dwi 22
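The same first-occurrence filter can also be written with `duplicated()`, which flags every repeat after the first appearance. This is an alternative sketch on the same dummy data, not the method used in the rest of this walkthrough; `first_rows` is an illustrative name.

```r
df <- data.frame(nama = c("Lita", "Lita", "Nurul", "Dwi"),
                 umur = c(22, 23, 22, 22))

# duplicated() is TRUE for the second "Lita", FALSE everywhere else
duplicated(df$nama)

# Negating it keeps the first row of each name
first_rows <- df[!duplicated(df$nama), ]
first_rows
```

Both approaches keep rows 1, 3 and 4, so the choice is mostly a matter of readability.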
Apply this to the vids data:
index.vids <- match(unique(vids$title), vids$title)
vids.u <- vids[index.vids,] # subset to the first occurrence of each unique video
head(vids.u)
#> trending_date title
#> 1 17.14.11 WE WANT TO TALK ABOUT OUR MARRIAGE
#> 2 17.14.11 The Trump Presidency: Last Week Tonight with John Oliver (HBO)
#> 3 17.14.11 Racist Superman | Rudy Mancuso, King Bach & Lele Pons
#> 4 17.14.11 Nickelback Lyrics: Real or Fake?
#> 5 17.14.11 I Dare You: GOING BALD!?
#> 6 17.14.11 2 Weeks with iPhone X
#> channel_title category_id publish_time views
#> 1 CaseyNeistat People and Blogs 2017-11-13T17:13:01.000Z 748374
#> 2 LastWeekTonight Entertainment 2017-11-13T07:30:00.000Z 2418783
#> 3 Rudy Mancuso Comedy 2017-11-12T19:05:24.000Z 3191434
#> 4 Good Mythical Morning Entertainment 2017-11-13T11:00:04.000Z 343168
#> 5 nigahiga Entertainment 2017-11-12T18:01:41.000Z 2095731
#> 6 iJustine Science and Technology 2017-11-13T19:07:23.000Z 119180
#> likes dislikes comment_count comments_disabled ratings_disabled
#> 1 57527 2966 15954 FALSE FALSE
#> 2 97185 6146 12703 FALSE FALSE
#> 3 146033 5339 8181 FALSE FALSE
#> 4 10172 666 2146 FALSE FALSE
#> 5 132235 1989 17518 FALSE FALSE
#> 6 9763 511 1434 FALSE FALSE
#> video_error_or_removed publish_hour publish_when
#> 1 FALSE 0 Day
#> 2 FALSE 0 Day
#> 3 FALSE 0 Day
#> 4 FALSE 0 Day
#> 5 FALSE 0 Day
#> 6 FALSE 0 Day
dim(vids.u)
#> [1] 2986 14
Missing values (NA) can complicate data processing. They therefore need to be detected and, if present, handled.
# check the whole data set
anyNA(vids.u)
#> [1] FALSE
# count NAs per column
colSums(is.na(vids.u))
#> trending_date title channel_title
#> 0 0 0
#> category_id publish_time views
#> 0 0 0
#> likes dislikes comment_count
#> 0 0 0
#> comments_disabled ratings_disabled video_error_or_removed
#> 0 0 0
#> publish_hour publish_when
#> 0 0
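vids.u has no missing values, so no treatment is needed here. For a data set that does have them, two common treatments are dropping incomplete rows and filling the gaps. The sketch below uses a made-up data frame, and filling with the median is just one possible choice, not a recommendation from this analysis.

```r
toy <- data.frame(views = c(100, NA, 300),
                  likes = c(10, 20, NA))

# Count NAs per column
colSums(is.na(toy))

# Option 1: drop every row that has at least one NA
complete <- toy[complete.cases(toy), ]

# Option 2: replace the NAs in a column, here with that column's median
filled <- toy
filled$views[is.na(filled$views)] <- median(filled$views, na.rm = TRUE)
```

Which option is appropriate depends on how many rows are affected and on what the missing values mean.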
Exploratory Data Analysis (EDA) aims to extract information from the data (exploration). EDA can be done using base plot.
Purpose: check the distribution of the data (histogram).
Example: at which hours are trending videos mostly published? What does the distribution of publish_hour in vids.u look like?
hist(vids.u$publish_hour,
breaks = 20,
xlim = c(0,25),
xaxt = "n")
axis(side=1, at=seq(0,25,5))
Insight:
Purpose: check the distribution and the outliers of the data (boxplot).
Example, for the same question as above:
boxplot(vids.u$publish_hour)
Insight:
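The numbers behind both plots can also be inspected directly, which helps when writing down an insight. A sketch on made-up publish hours (not the real data): `hist()` with `plot = FALSE` returns the bin counts, and `boxplot.stats()` returns the points that `boxplot()` would draw as outliers.

```r
# Made-up publish hours, with one unusually late value
hours <- c(1, 2, 2, 3, 3, 3, 20)

# Bin counts for 4-hour bins, without drawing the histogram
h <- hist(hours, breaks = seq(0, 24, by = 4), plot = FALSE)
h$counts

# Values lying more than 1.5 times the box height beyond the box
boxplot.stats(hours)$out
```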
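### `plot()`
Purpose: draws different plot types depending on the type of data passed in.
* one categorical variable: bar chart -> frequency of each category
* one numeric variable: scatterplot -> spread of the data
* two numeric variables: scatterplot -> relationship between the data
* one categorical and one numeric variable: boxplot -> comparison of distributions across categories
# plot()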
plot(vids.u$category_id, horiz=T, las=2)
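For a factor, `plot()` draws the per-category counts that `table()` returns. Sorting those counts first can make a horizontal bar chart easier to read. A sketch on a made-up factor, with `cats`, `counts` and `sorted_counts` as illustrative names:

```r
cats <- factor(c("Music", "Gaming", "Music", "Comedy", "Music", "Gaming"))

# Frequency of each category
counts <- table(cats)

# Ascending order puts the most frequent category at the top,
# because horizontal bars are drawn from bottom to top
sorted_counts <- sort(counts)
barplot(sorted_counts, horiz = TRUE, las = 2)
```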
Business Question:
We are interested in the category_id values “Autos and Vehicles”, “Gaming”, and “Travel and Events”. Across these three categories, is there a relationship between likes/view and dislikes/view?
Steps:
Subset vids.u to the categories above and save the result to the object vids.agt:
# equivalent, but longer:
# vids.agt <- vids.u[vids.u$category_id == "Autos and Vehicles" | vids.u$category_id == "Gaming" | vids.u$category_id == "Travel and Events",]
vids.agt <- vids.u[vids.u$category_id %in% c("Autos and Vehicles", "Gaming", "Travel and Events"),]
Create the columns likesp (likes/view) and dislikesp (dislikes/view):
vids.agt$likesp <- vids.agt$likes/vids.agt$views
vids.agt$dislikesp <- vids.agt$dislikes/vids.agt$views
head(vids.agt)
#> trending_date title
#> 31 17.14.11 I TOOK THE $3,000,000 LAMBO TO CARMAX! They offered me......
#> 35 17.14.11 New Emirates First Class Suite | Boeing 777 | Emirates
#> 59 17.14.11 Train Swipes Parked Vehicle
#> 132 17.14.11 L.A. Noire - Nintendo Switch Trailer
#> 164 17.14.11 Caterham Chris Hoy 60 Second Donut Challenge
#> 198 17.14.11 Inside Keanu Reeves' Custom Motorcycle Shop | WIRED
#> channel_title category_id publish_time views likes
#> 31 hp_overload Autos and Vehicles 2017-11-13T01:43:12.000Z 98378 4035
#> 35 Emirates Travel and Events 2017-11-12T05:55:42.000Z 141148 1661
#> 59 ViralHog Autos and Vehicles 2017-11-13T00:46:11.000Z 7265 89
#> 132 Nintendo Gaming 2017-11-09T19:59:48.000Z 154872 7683
#> 164 Caterham Cars Autos and Vehicles 2017-11-09T09:59:31.000Z 4850 22
#> 198 WIRED Autos and Vehicles 2017-11-08T15:00:27.000Z 704363 16352
#> dislikes comment_count comments_disabled ratings_disabled
#> 31 495 486 FALSE FALSE
#> 35 70 236 FALSE FALSE
#> 59 8 22 FALSE FALSE
#> 132 164 1734 FALSE FALSE
#> 164 1 1 FALSE FALSE
#> 198 224 841 FALSE FALSE
#> video_error_or_removed publish_hour publish_when likesp dislikesp
#> 31 FALSE 0 Day 0.041015268 0.0050316128
#> 35 FALSE 0 Day 0.011767790 0.0004959333
#> 59 FALSE 0 Day 0.012250516 0.0011011700
#> 132 FALSE 0 Day 0.049608709 0.0010589390
#> 164 FALSE 0 Day 0.004536082 0.0002061856
#> 198 FALSE 0 Day 0.023215302 0.0003180178
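Subsetting keeps all of a factor's original levels, so vids.agt$category_id still carries the categories that were filtered out. If those empty levels get in the way, for example in a later table or plot, `droplevels()` removes them. A sketch on made-up values, with illustrative variable names:

```r
cat_id <- factor(c("Gaming", "Music", "Comedy", "Gaming"))
keep <- c("Gaming", "Comedy")

# %in% tests membership in a set, replacing a chain of == joined by |
sub <- cat_id[cat_id %in% keep]

levels(sub)              # still includes "Music"
levels(droplevels(sub))  # only the levels that remain
```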
Which plot type would be suitable?
plot(vids.agt$likesp,vids.agt$dislikesp)
cor(vids.agt$likesp, vids.agt$dislikesp)
#> [1] 0.1712322
Insight: likes per view and dislikes per view in the vids.agt data have a weak positive correlation (about 0.17).
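`cor()` computes the Pearson correlation by default, which measures linear association and can be distorted by extreme values. As a possible cross-check that is not part of the original analysis, the rank-based Spearman correlation is available through the `method` argument. A sketch on made-up numbers:

```r
x <- 1:10
y <- c(1:9, 100)  # increases with x, but the last value is extreme

pearson  <- cor(x, y)                       # linear association
spearman <- cor(x, y, method = "spearman")  # association between ranks

c(pearson = pearson, spearman = spearman)
```

Here the ranks of x and y agree perfectly, so Spearman is 1 while Pearson is lower. A large gap between the two on real data suggests looking at the scatterplot for outliers or curvature.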
THANKYOUUUU :)