DBI
package
is still trivial , i have resorted to creating new columns USING R
instead ,thus am Calculating the return on investment as the
worldwide_gross/production_budget
.<-dat_new |>
dat_newselect(-1) |>
mutate(release_date = mdy(release_date)) |> #mdy is the setup of the date variable
mutate_if(is.character,as.factor) |>
mutate(roi = worldwide_gross / production_budget) |>
mutate(pct_domestic = domestic_gross / worldwide_gross) |>
mutate(year = year(release_date)) |>
mutate(month = month(release_date,label=T)) |>
mutate(day = as.factor(wday(release_date,label=T))) |>
arrange(desc(release_date)) |>
filter(worldwide_gross > 0) |>
filter(!is.na(distributor)) |>
filter(!is.na(mpaa_rating)) |>
select(year,month,day,everything())
<-dbConnect(RSQLite::SQLite(), ":memory:")
concopy_to(con,dat_new)
copy_to(con,out_new)
dbGetQuery(con,'SELECT *
FROM dat_new
LIMIT 5')
release_date
now looks weird
, hope we will take care of that along the wayyear,month and day
dbGetQuery(con,'SELECT *
FROM dat_new
LIMIT 5')
Question 1
>=1975
) but before 2018
(<2018
).dbGetQuery(con,'SELECT day,AVG(worldwide_gross) AS avg_rev
FROM dat_new
WHERE year BETWEEN 1975 AND 2017
GROUP BY day
ORDER BY avg_rev DESC')
Question 2
dbGetQuery(con,'SELECT month,AVG(worldwide_gross) AS avg_rev
FROM dat_new
WHERE year BETWEEN 1975 AND 2017
GROUP BY month
ORDER BY avg_rev DESC')
It looks like summer months and holiday months at the end of the year fare well.
Question 3
dbGetQuery(con,'SELECT year,AVG(worldwide_gross) AS avg_rev
FROM dat_new
WHERE year BETWEEN 1975 AND 2017
GROUP BY year
ORDER BY avg_rev DESC
LIMIT 10')
Question 4
dbGetQuery(con,'SELECT genre,AVG(worldwide_gross) AS avg_rev
FROM dat_new
WHERE year BETWEEN 1975 AND 2017
GROUP BY genre
ORDER BY avg_rev DESC
')
Question 5
dbGetQuery(con,'SELECT genre,AVG(roi) AS avg_roi
FROM dat_new
WHERE year BETWEEN 1975 AND 2017
GROUP BY genre
ORDER BY avg_roi DESC
')
aha! , horror movies have the largest
return on investment
Question 6
What were those super profitable movies? Looks like they’re mostly horror movies.
dbGetQuery(con,'SELECT movie,roi
FROM dat_new
WHERE year BETWEEN 1975 AND 2017
ORDER BY roi DESC
LIMIT 10
')
the highest most profitable is a
horror
movie , i still think over40%
of those in the top 10 are horror movies
What about rating
What about movie ratings? R-rated movies have a lower average revenue but ROI isn’t substantially less.
dbGetQuery(con,'SELECT mpaa_rating ,AVG(worldwide_gross) AS mean_rev,
SUM(worldwide_gross) AS total_rev,
AVG(roi) AS roi,
COUNT(*) AS number
FROM dat_new
WHERE year BETWEEN 1975 AND 2017
GROUP BY mpaa_rating
')
We can see that while G-rated movies have the highest mean revenue, there were relatively few of them produced, and had a lower total revenue. There were more R-rated movies, but PG-13 movies really drove the total revenue worldwide.
dbGetQuery(con,'SELECT mpaa_rating ,AVG(imdb) AS mean_imdb,
AVG(votes) AS avg_votes
FROM dat_new AS d
INNER JOIN out_new AS r
ON d.movie=r.movie
WHERE year BETWEEN 1975 AND 2017
GROUP BY mpaa_rating')
dbGetQuery(con,'SELECT genre ,AVG(imdb) AS mean_imdb,
AVG(votes) AS avg_votes
FROM dat_new AS d
INNER JOIN out_new AS r
ON d.movie=r.movie
WHERE year BETWEEN 1975 AND 2017
GROUP BY genre')