# Question 1

Three columns that are unclear without reading the documentation:

1. date_x - The name implies a date, but not which date associated with the movie it refers to. The documentation clarifies that it is the release date.

2. score - Without the documentation, it is not clear what this score measures or how it was calculated. The documentation indicates it is the Rotten Tomatoes critic score.

3. country - The values are abbreviations, so it is unclear which countries they refer to without the documentation, which provides the mapping to country names.

The abbreviated column names and values help save space, but can lead to confusion or misinterpretation if the meanings are not clear. Reading the documentation helps avoid incorrect assumptions.
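A quick look at the structure and distinct values shows how little the names reveal on their own. The following is a minimal sketch, assuming a toy data frame called movies that stands in for the real file; every value in it is made up for illustration.

```r
# Toy stand-in for the movie data; all values are invented for illustration.
movies <- data.frame(
  date_x  = c("03/02/2023", "12/15/2022", "04/05/2023"),
  score   = c(73, 78, 76),
  country = c("AU", "AU", "US"),
  stringsAsFactors = FALSE
)

# Column names and types alone do not say what the columns mean.
str(movies)

# Distinct values show that country is coded, but not what the codes stand for.
country_codes <- sort(unique(movies$country))
print(country_codes)

# The range of score hints at a scale, but not at the source of the rating.
score_range <- range(movies$score)
print(score_range)
```

The inspection only surfaces the questions; the answers still have to come from the documentation.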

# Question 2

One element that remains unclear even after reading the documentation:

budget_x - The documentation does not state the currency in which the budget is reported. This could lead to incorrect analysis when comparing budget to revenue if the two are in different currencies.
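One rough way to probe the issue is to look at the revenue-to-budget ratio per movie. This is only a heuristic sketch, assuming a toy data frame with made-up values: ratios in a plausible range are consistent with a shared currency but do not prove it, and extreme ratios only flag rows worth checking by hand.

```r
# Toy rows, invented for illustration; the real values would come from the data file.
movies <- data.frame(
  budget_x = c(75e6, 460e6, 100e6),
  revenue  = c(271.6e6, 2316.8e6, 724.5e6)
)

# Revenue-to-budget ratio for each movie.
movies$rev_to_budget <- movies$revenue / movies$budget_x

# Summarise the ratios; very large or very small values would deserve a closer look.
print(summary(movies$rev_to_budget))
```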

# Question 3

library(ggplot2)
library(dplyr)
library(ggthemes)
df <- read.csv("D:/gdp/cinema.csv")

ggplot(df, aes(x = budget_x, y = revenue)) +
  geom_point() +
  labs(x = 'Budget (Unclear Currency)',
       y = 'Revenue (USD)',
       title = 'Unclear Budget Currency Creates Risk of Incorrect Analysis') +
  annotate('text', x = 200000000, y = 2500000000,
           label = 'Is budget in USD like revenue?')

This scatter plot illustrates how comparisons between budget and revenue can go wrong when the budget currency is ambiguous; the annotation on the plot flags the concern. To mitigate this risk, confirm the currency of the budget and convert it to the revenue currency where needed before conducting the analysis. Leaving the ambiguity unaddressed can lead to inaccurate conclusions.
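If the currency were known per movie, the conversion step could look like the sketch below. Everything in it is an assumption for illustration: the data set has no currency column, so budget_currency is hypothetical, the rows are made up, and the rates in usd_per_unit are placeholders rather than real exchange rates.

```r
# Hypothetical data: budget_currency does not exist in the real data set.
movies <- data.frame(
  budget_x        = c(100e6, 50e6),
  budget_currency = c("USD", "EUR"),
  revenue         = c(300e6, 120e6),
  stringsAsFactors = FALSE
)

# Placeholder conversion rates to USD; these are not real market rates.
usd_per_unit <- c(USD = 1.0, EUR = 1.1)

# Look up the rate for each row and convert the budget to USD.
movies$budget_usd <- movies$budget_x * unname(usd_per_unit[movies$budget_currency])

# With both columns in USD, the ratio compares like with like.
movies$revenue_to_budget <- movies$revenue / movies$budget_usd
print(movies)
```

Keeping the original budget_x column next to the converted one makes it easy to audit the conversion later.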