# Question 1
Three columns are ambiguous unless the documentation is consulted:
date_x: The name suggests a generic date, but the values look like movie release dates. The documentation confirms that the column holds the release date.
score: Neither the source of this score nor how it is calculated is clear from the data alone. The documentation specifies that it is the Rotten Tomatoes critic score.
country: The values are abbreviations, so identifying the countries requires the documentation, which maps each abbreviation to a full country name.
Short column names and abbreviated values save space, but they invite misinterpretation when their meaning is not obvious. Consulting the documentation prevents incorrect assumptions.
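A quick inspection can show why these three columns need the documentation. The sketch below is only illustrative: the `toy` data frame and its values are made up rather than taken from the real file, and the month/day/year date format is an assumption that should be checked against the documentation.

```r
# Toy rows made up for illustration; they are NOT taken from the real file.
toy <- data.frame(
  date_x  = c("03/02/2023", "12/15/2022", "04/05/2023"),
  score   = c(73, 78, 76),
  country = c("AU", "AU", "US"),
  stringsAsFactors = FALSE
)

# Column types alone do not reveal meaning: date_x is stored as plain text.
str(toy)

# Assumption: dates are month/day/year. Confirm this in the documentation.
toy$release_date <- as.Date(trimws(toy$date_x), format = "%m/%d/%Y")

# The observed range hints at the scale of score (for example 0-100 vs 0-10),
# but not at who produced the score.
score_range <- range(toy$score, na.rm = TRUE)

# Tabulating country lists the abbreviations that need a lookup.
country_counts <- table(toy$country)
```

None of these checks says what a column means; they only narrow down what to look for in the documentation.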
# Question 2
One element remains unclear even after reading the documentation:
budget_x: The documentation does not state the currency in which the budget is reported. If budget and revenue are in different currencies, any comparison between them could be wrong.
# Question 3
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggthemes)
df <- read.csv("C:/Users/DELL/Downloads/imdb.csv")
# Scatter plot of budget against revenue, annotated to flag the unknown budget currency
ggplot(df, aes(x = budget_x, y = revenue)) +
  geom_point() +
  labs(x = 'Budget (Unclear Currency)',
       y = 'Revenue (USD)',
       title = 'Unclear Budget Currency Creates Risk of Incorrect Analysis') +
  annotate('text', x = 200000000, y = 2500000000,
           label = 'Is budget in $USD like revenue?')
This scatter plot shows how budget and revenue can be compared incorrectly when the budget's currency is unknown; the annotation on the plot flags the concern. To reduce this risk, identify the budget currency and convert it to the revenue currency before the analysis. Leaving the ambiguity unresolved can lead to inaccurate conclusions.
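The conversion step could look roughly like the sketch below. It rests on several assumptions: the `budget_currency` column is hypothetical (the real file has no such column), the `movies` rows are toy values, and the rates in `usd_per_unit` are placeholders rather than real exchange rates.

```r
# Toy data: budget_currency is a hypothetical column the real file does not have.
movies <- data.frame(
  budget_x        = c(100, 200, 300),
  budget_currency = c("USD", "EUR", "USD"),
  revenue         = c(250, 500, 150),
  stringsAsFactors = FALSE
)

# Placeholder rates (USD per 1 unit of each currency); NOT real exchange rates.
usd_per_unit <- c(USD = 1.0, EUR = 2.0)

# Stop if any currency has no known rate instead of silently guessing.
unknown <- setdiff(unique(movies$budget_currency), names(usd_per_unit))
if (length(unknown) > 0) {
  stop("No rate for: ", paste(unknown, collapse = ", "))
}

# Convert every budget to USD so that it is comparable with revenue.
movies$budget_usd <- movies$budget_x * unname(usd_per_unit[movies$budget_currency])

# Only after conversion is the revenue-to-budget ratio meaningful.
movies$revenue_to_budget <- movies$revenue / movies$budget_usd
```

Stopping on an unknown currency is a deliberate choice here: an error is easier to notice than a ratio quietly computed across mismatched units.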