University Solutions Hub provides Big Data Tools Week 2 solution (Big Data Tools).
Following the instructions in the book, install the sparklyr library and a Spark cluster on your local machine. The instructions in the book assume that you are using Windows. If you are not running Windows, use a virtual machine. If you are using your employer’s equipment, you may encounter a trusted domain error, which will also require that you install a virtual machine. The code in the book assumes Spark version 2.3 and Java 8. You are free to use any version, of course, but the code may not run as written and you may have to modify it to complete the assignment. Be sure your system is set to use Java 8. You can use the Sys.setenv() function to set JAVA_HOME to the path of jre1.8.0_291 (the latest release of Java 8).
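The setup steps above can be sketched in R roughly as follows. This is a minimal sketch, not the book's exact listing: the JAVA_HOME path is an assumed default Windows install location and will likely differ on your machine, and recent sparklyr releases may not offer every older Spark version for download.

```r
# Install sparklyr from CRAN (one-time step).
install.packages("sparklyr")

library(sparklyr)

# Point R at a Java 8 runtime.
# ASSUMPTION: this is a typical Windows install location for jre1.8.0_291;
# replace it with the actual path on your machine.
Sys.setenv(JAVA_HOME = "C:/Program Files/Java/jre1.8.0_291")

# Download and install a local copy of Spark 2.3 (one-time step).
spark_install(version = "2.3")

# Start a local Spark cluster and connect to it.
sc <- spark_connect(master = "local", version = "2.3")

# Print the Spark version the session is actually running.
spark_version(sc)

# Disconnect when finished.
spark_disconnect(sc)
```

If spark_connect() fails with a Java-related error, running Sys.getenv("JAVA_HOME") in the console is a quick way to check which path R is actually using.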
Submit a Word document with screenshots from your computer showing RStudio and a time stamp. Your screenshots should show the console in RStudio with the following:
Write at least 500 words discussing what Spark is and does. Explain what problems it solves.
Use at least three sources. Include at least three quotes from your sources, enclosed in quotation marks and cited in-line by reference to your reference list. Example: “words you copied” (citation). Each quote should be one full sentence, not altered or paraphrased. Cite your sources using APA format. Use the quotes in your paragraphs.