Data Mining (95-791 Z4)

Project Overview

The analytical project in this course is a great opportunity for you to apply your knowledge of data mining and your R programming skills to a real-life analytical problem. Unlike the data sets you encounter in homework, the data sets we use in this project are often messy, dirty, and uncertain, and the problems are not necessarily clearly defined. This reflects the reality of the problems we try to solve in the real world.

Project Selection and Teams

  1. There are 8 projects for you to select from; the details of the projects are available from this link. One project can be picked by multiple teams; however, work cannot be shared between teams.

  2. You can work in a team of 1-3 members. We will adjust our expectations for the amount of work according to the team size. You may form a team yourself, or you may use the “Search for Teammate” function in Piazza to find a teammate.

  3. The project selection and team need to be finalized by the end of the 1st week (March 26); however, you’re strongly encouraged to decide early and start early. Please fill in the sign-up sheet for project and team here.

Weekly Progress Update

Starting from the second week, your team will be asked to submit a progress update in a Google Doc shared with the rest of the class and the instruction team. In this update, you will write a paragraph summarizing the following:
* What has been done?
* What are the obstacles?
* What is your plan for next week?

The instructors and TA will use this doc as an ongoing communication tool with your team during the project phase, giving you advice and guidance.

“Client” Interaction

The “client” for your project will be either Karen or Yixin. Just as in a real-life consulting project, you’re strongly encouraged to interact with the “client” to seek clarification or discuss expectations during the course of your project. At the same time, we’re also the mentors for your project, here to help you deliver a successful project. Depending on your preference, you may interact through forum Q&A (please use your specific project tag) or schedule a video conference (such as Google Hangout) at a mutually convenient time, in which case please give us advance notice.

Report Writing

It is highly recommended that you present your results using RMarkdown. This is a great tool that allows you to seamlessly integrate your analysis in R with the report production process. If you use RMarkdown, you may submit your report in HTML (in which case you may publish your report to Rpubs.com). Please refer to the RMarkdown cheat sheet.

The deadline for submitting the final report is midnight of May 14th.

Your report should include the following sections:
* Abstract
* Introduction
* Method and result
* Conclusion and future work
* Your takeaways from this project (a reflection on project execution, teamwork, what worked, what didn’t, what you would have done differently to make it a better project, etc.)

Grading Criteria

The grading of the project will be based on the following components:
* Understanding of the problem
* Appropriateness of the method
* Interpretation of results
* Clarity of report writing
* Creativity (the questions in each project serve only as guidelines; you’re encouraged to come up with other analysis ideas that could be interesting or valuable to the “clients”)

All team members receive the same grade unless there is a request to grade independently because of a severely unbalanced workload.

Advice

  1. Decide Early! Start Early!
  2. As the course progresses, think about which methods are applicable and build up your analysis incrementally (you’re welcome to share your work-in-progress with us and receive feedback).
  3. Enjoy the immersive experience!