University Solutions Hub provides Big Data Tools Week 5 solution (Big Data Tools).
Create a new connection to your Spark cluster and assign it to sc.
Make sure that you are using a version of Java that your Spark release supports.
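The setup steps above could look roughly like the sketch below. It assumes the sparklyr and dplyr packages, a local Spark installation, and that cars is the built-in mtcars data set copied to Spark; the assignment itself does not state where cars comes from, so adjust the master URL and the data source to match your course environment.

```r
library(sparklyr)
library(dplyr)

# Print the Java version visible on the PATH (written to the console).
# Which versions are acceptable depends on your Spark release.
system2("java", "-version")

# Assumption: Spark is installed locally (for example via spark_install()).
# Replace "local" with your cluster's master URL if you have one.
sc <- spark_connect(master = "local")

# Assumption: `cars` is the built-in mtcars data frame copied to Spark.
cars <- copy_to(sc, mtcars, "cars", overwrite = TRUE)
```

When you are finished, spark_disconnect(sc) closes the connection.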
Execute the following code:
summarize_all(cars, max)
summarize_all(cars, min)
summarize_all(cars, mean)
summarize_all(cars, mean) %>%
  show_query()
cars %>%
  mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
  group_by(transmission) %>%
  summarize_all(mean)
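To check your explanation of the last pipeline, it may help to run the same steps on a local data frame, where no Spark connection is needed. This is only a sketch and assumes cars holds the same columns as the built-in mtcars data set; the name local_summary is made up for illustration.

```r
library(dplyr)

# Same pipeline as above, run on the local mtcars data frame instead of Spark.
local_summary <- mtcars %>%
  # Label each car by transmission type (in mtcars, am == 0 means automatic).
  mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
  # Form one group per transmission type.
  group_by(transmission) %>%
  # Compute the mean of every remaining column within each group.
  summarize_all(mean)

print(local_summary)
```

The result has one row per transmission type and one column per averaged variable, which is the shape to expect from the Spark version as well.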
Submit a Word document with screenshots showing the results of the code, along with a timestamp generated in R, and explain in detail what each element in each line of code does.
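One way to produce the timestamp is with base R's Sys.time(), run in the console just before each screenshot. The variable name stamp and the format string are only illustrative choices.

```r
# Capture the current date and time and format it as readable text.
stamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S %Z")
print(stamp)
```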
Read this article about doing statistics with categorical variables. Write at least 500 words discussing how to use these statistics to help understand big data.
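As a starting point for the discussion, the sketch below shows a few common summaries for categorical variables in base R: counts, proportions, and a cross-tabulation. The article is not reproduced here, so these are generic examples rather than the article's own methods, and the use of mtcars and the variable names are assumptions made for illustration.

```r
# Turn the 0/1 transmission code into a labeled categorical variable.
transmission <- ifelse(mtcars$am == 0, "automatic", "manual")

# Frequency table: how many cars fall into each category.
counts <- table(transmission)

# Relative frequencies: the share of cars in each category.
shares <- prop.table(counts)

# Cross-tabulation of two categorical variables.
cross <- table(transmission, cylinders = mtcars$cyl)

print(counts)
print(round(shares, 2))
print(cross)
```

On large data sets the same ideas apply, but the counting is usually pushed to the cluster (for example with group_by and count in dplyr) so that only the small summary table is brought back into R.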