R Notebook

# Question 1: Run the following code to load the cars dataset. What is the median of the first column?

# Load the built-in cars dataset; its first column is speed
data("cars")
median(cars[, 1])

## [1] 15
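To see where the 15 comes from, the median can be computed by hand. `cars[, 1]` is the `speed` column, and because it has an even number of observations (50), the median is the mean of the two middle values of the sorted vector. This is a minimal sketch: `manual_median()` is a hypothetical helper written for illustration, not part of base R.

```r
# Built-in dataset: 50 rows with columns speed and dist
data("cars")

# Hypothetical helper: median computed by sorting
manual_median <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                 # odd n: the single middle value
  } else {
    mean(x[c(n / 2, n / 2 + 1)])   # even n: average of the two middle values
  }
}

# cars[, 1] and cars$speed refer to the same column
stopifnot(identical(cars[, 1], cars$speed))

manual_median(cars[, 1])
median(cars[, 1])
```

Both calls return the same value; `median()` is the idiomatic choice, and the helper only makes the even-length rule explicit.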

# Question 2: What is the maximum daily closing price of BTC in the data?

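# Install pacman if it is missing, then use it to load jsonlite
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
pacman::p_load(jsonlite)

# Daily BTC/USD history from the CryptoCompare API (limit = 100)
BTC <- fromJSON("https://min-api.cryptocompare.com/data/v2/histoday?fsym=BTC&tsym=USD&limit=100")

# The daily records are nested under Data$Data; keep the closing prices
BTC_data <- BTC$Data
BTC_data_2 <- BTC_data$Data$close
max(BTC_data_2)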

## [1] 69020.94
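It can also help to report the date on which the maximum close occurred. The sketch below assumes each daily record in `BTC$Data$Data` carries a `time` column (Unix seconds, UTC) next to `close`, as the histoday endpoint returns. `max_close()` is a hypothetical helper, and the toy data frame holds made-up values (not real BTC prices) so the example runs without a network call.

```r
# Hypothetical helper: highest close and the (UTC) date it occurred on.
# Assumes `daily` has a Unix-second `time` column and a `close` column.
max_close <- function(daily) {
  stopifnot(all(c("time", "close") %in% names(daily)))
  i <- which.max(daily$close)
  data.frame(
    date  = as.Date(as.POSIXct(daily$time[i], origin = "1970-01-01", tz = "UTC")),
    close = daily$close[i]
  )
}

# Toy input for illustration only: three consecutive days at 00:00 UTC
toy_daily <- data.frame(
  time  = c(1699920000, 1700006400, 1700092800),
  close = c(100, 250, 175)
)
max_close(toy_daily)
```

With the live response loaded as above, the same call would be `max_close(BTC$Data$Data)`.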

# Question 3: Prepare your final project phase I by showing your process, code, preliminary descriptive results, and answers.

## Identify a topic of interest and give your project a name/title.

### Top 100 IMDB Movies

## Phrase 3-5 research questions you would like to explore.

### What is the mean rating of the top 100 IMDB movies?
### Are certain genres more likely to achieve higher IMDB ratings?
### How does the distribution of the top 100 IMDB movies vary by decade?
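Each research question maps onto a short computation. The sketch below is one possible approach, not a finished analysis, and it rests on assumptions: the columns are named `genre`, `rating`, and `year` in lower case, and a movie's genres are stored as comma-separated text (the Kaggle file may use a different format, in which case the split pattern needs adjusting). `movies_toy` is a hypothetical stand-in with made-up values.

```r
# Hypothetical stand-in for `movies` (made-up values, not the Kaggle data)
movies_toy <- data.frame(
  title  = c("A", "B", "C", "D"),
  genre  = c("Drama", "Drama, Crime", "Action, Drama", "Crime"),
  rating = c(9.0, 8.8, 8.4, 8.6),
  year   = c(1994, 1972, 2008, 1999),
  stringsAsFactors = FALSE
)

# Question 1: mean rating
mean_rating <- mean(movies_toy$rating, na.rm = TRUE)

# Question 2: one row per (movie, genre) pair, then mean rating per genre
genre_list <- strsplit(movies_toy$genre, ",\\s*")
genre_long <- data.frame(
  genre  = unlist(genre_list),
  rating = rep(movies_toy$rating, lengths(genre_list))
)
rating_by_genre <- aggregate(rating ~ genre, data = genre_long, FUN = mean)
rating_by_genre <- rating_by_genre[order(-rating_by_genre$rating), ]

# Question 3: number of movies per decade (e.g. 1994 -> 1990)
movies_toy$decade <- (movies_toy$year %/% 10) * 10
movies_per_decade <- table(movies_toy$decade)

mean_rating
rating_by_genre
movies_per_decade
```

Once `movies` is loaded, the same steps apply with `movies` in place of `movies_toy`. Note that a multi-genre movie counts toward every genre it lists, so genre means are not independent.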

## List the data sources you found that are relevant to your research questions.

### From Kaggle: https://www.kaggle.com/datasets/mayurkadam9833/top-100-imdb-movies?resource=download

## Extract one or more relevant datasets associated with your research questions: import the downloaded dataset(s), extract from APIs, or ethically scrape the web.

# Read the CSV downloaded from Kaggle (the file must be in the working directory)
movies <- read.csv("TOP 100 IMDB Movies.csv")
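A quick sanity check after importing confirms that the expected columns arrived and counts missing values per column. `check_import()` is a hypothetical helper, and the toy data frame uses made-up values so the sketch runs without the CSV.

```r
# Hypothetical helper: stop if expected columns are absent,
# otherwise return the number of NA values in each expected column
check_import <- function(df, expected) {
  missing_cols <- setdiff(expected, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  colSums(is.na(df[expected]))
}

# Toy input for illustration only
toy_import <- data.frame(
  rank   = 1:3,
  rating = c(9.3, NA, 9.0),
  year   = c(1994, 1972, 2008)
)
na_counts <- check_import(toy_import, c("rank", "rating", "year"))
na_counts
```

For the real data, `names(movies)` shows the actual column names; the lower-case names used here are an assumption and may need to match the file's capitalization.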

## Describe the extracted data, statistically and/or visually.

### Column labels: Rank, title, description, genre, rating, year
### Rank: the movie's ranking based on IMDb’s criteria
### Description: a brief description or synopsis of the movie
### Genre: a list of genres the movie falls under (e.g., Drama, Action, Thriller)
### Rating: on a 1-10 scale
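A statistical description can start from a few base R summaries. This sketch assumes lower-case `rating` and `year` columns; `describe_movies()` is a hypothetical helper, and the toy data frame holds made-up values. It also checks that every rating falls on the stated 1-10 scale.

```r
# Hypothetical helper: basic descriptive statistics for rating and year
describe_movies <- function(df) {
  list(
    n_movies         = nrow(df),
    rating_summary   = summary(df$rating),  # min, quartiles, mean, max
    rating_sd        = sd(df$rating, na.rm = TRUE),
    year_range       = range(df$year, na.rm = TRUE),
    ratings_in_scale = all(df$rating >= 1 & df$rating <= 10, na.rm = TRUE)
  )
}

# Toy input for illustration only
toy_movies <- data.frame(
  rating = c(9.3, 9.2, 9.0, 8.9),
  year   = c(1994, 1972, 2008, 1974)
)
describe_movies(toy_movies)
```

For a visual check on the real data, `hist(movies$rating)` plots the distribution of ratings.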

## Perform necessary data cleaning and manipulation, especially if the raw data contain special values or are not directly in a format that can answer your research questions.

### The dataset is already clean.

## List future data preparation work needed, if any.

### Convert categorical variables (e.g., genre) to numerical form to perform certain analyses
### Reorder the dataset by rank (i.e., highest to lowest)
### TBD
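Both planned preparation steps are short in base R. The sketch below assumes lower-case `rank` and `genre` columns and comma-separated genre text; `prep_toy` is a hypothetical data frame with made-up values, and the `genre_` column prefix is an illustrative naming choice. Sorting by `rank` puts rank 1 first; `order(-rating)` would instead sort strictly by rating from highest to lowest.

```r
# Hypothetical stand-in for `movies` (made-up values)
prep_toy <- data.frame(
  rank   = c(3, 1, 2),
  title  = c("C", "A", "B"),
  genre  = c("Action, Drama", "Drama", "Crime, Drama"),
  rating = c(9.0, 9.3, 9.2),
  stringsAsFactors = FALSE
)

# Reorder so that rank 1 comes first
movies_sorted <- prep_toy[order(prep_toy$rank), ]

# Convert the categorical genre text into 0/1 indicator columns,
# one per distinct genre (a movie can have several 1s)
genre_list <- strsplit(movies_sorted$genre, ",\\s*")
all_genres <- sort(unique(unlist(genre_list)))
for (g in all_genres) {
  has_g <- vapply(genre_list, function(gs) g %in% gs, logical(1))
  movies_sorted[[paste0("genre_", g)]] <- as.integer(has_g)
}
movies_sorted
```

Indicator columns keep multi-genre movies intact, which a single numeric code per genre would not.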