Data-Cleaning-Assignment

Introduction:

This document describes a project I completed as an assignment for the online course “Getting and Cleaning Data”, offered by Johns Hopkins University through Coursera.

Objective:

The purpose of this project is to demonstrate the ability to collect, work with, and clean a data set. The goal is to prepare tidy data that can be used for later analysis.

Activities to be done:

  Create one R script called run_analysis.R that does the following:

  1. Merges the training and the test sets to create one data set.
  2. Extracts only the measurements on the mean and standard deviation for each measurement.
  3. Uses descriptive activity names to name the activities in the data set.
  4. Appropriately labels the data set with descriptive variable names.
  5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
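
The five steps can be sketched in base R on a small made-up data set. This is only an illustration, not the actual run_analysis.R: the toy values, the object names (`train`, `test`, `merged`, `tidy`) and the exact renaming rules are assumptions chosen for the example. Only the feature-name style (for example `tBodyAcc-mean()-X`) and the output file name come from the data set and this project.

```r
# Toy stand-ins for the raw files (hypothetical values, not real measurements).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-energy()-X")
activity_labels <- data.frame(id = 1:2, activity = c("WALKING", "SITTING"))

train <- data.frame(subject = c(1, 1, 2), activity = c(1, 2, 1),
                    matrix(c(0.1, 0.3, 0.5,    # first feature column
                             1.0, 1.2, 1.4,    # second feature column
                             9, 9, 9), nrow = 3))
test <- data.frame(subject = c(1, 2), activity = c(1, 2),
                   matrix(c(0.3, 0.7,
                            2.0, 2.2,
                            9, 9), nrow = 2))

# Step 1: merge the training and test sets.
merged <- rbind(train, test)
names(merged) <- c("subject", "activity", features)

# Step 2: keep only the mean() and std() measurements.
keep <- grepl("mean\\(\\)|std\\(\\)", features)
merged <- merged[, c(TRUE, TRUE, keep)]

# Step 3: replace activity codes with descriptive names.
merged$activity <- factor(merged$activity,
                          levels = activity_labels$id,
                          labels = as.character(activity_labels$activity))

# Step 4: make the variable names easier to read (one possible convention).
names(merged) <- gsub("-", "_", gsub("[()]", "", names(merged)))

# Step 5: average each variable for each subject and activity.
tidy <- aggregate(. ~ subject + activity, data = merged, FUN = mean)

# Write the result to a temporary file; the project itself uses
# tidyDataWithMeans.txt in the working directory.
out_file <- tempfile(fileext = ".txt")
write.table(tidy, out_file, row.names = FALSE)
```

With the real data, the feature names and activity labels would be read from the data set's own text files instead of being typed in by hand.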

Environment:

Hardware: 
MacBook Pro running OS X Yosemite, version 10.10.3
Processor: 2.4 GHz Intel Core Duo
Memory: 8 GB 1067 MHz DDR3
Internet Connection

Tools:

RStudio
R for Mac OS X GUI
GitHub Repository : https://github.com/nvramamoorthy/Data-Cleaning-Assignment
GitHub Desktop

Data Source:

http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones 
https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip 

My Activities:

Created a project workspace on my MacBook and set it as the working directory.

 Working Directory:
 
 "/Users/siddharth/Desktop/Data-Scientist/DataCleaning/Data-Cleaning-Assignment"

Unzipped the data downloaded from the source mentioned above and renamed the extracted folder to “UCI-HAR-Dataset”. This folder holds the raw data.
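
I did this step by hand, but it could also be scripted. The sketch below is one possible way, under stated assumptions: `fetch_dataset` and `missing_files` are hypothetical helper names, and the archive is assumed to extract into a folder called "UCI HAR Dataset". The listed file names are the ones shipped in the data set.

```r
# Files the analysis needs, relative to the dataset folder.
expected_files <- c("activity_labels.txt", "features.txt",
                    "train/X_train.txt", "train/y_train.txt",
                    "train/subject_train.txt",
                    "test/X_test.txt", "test/y_test.txt",
                    "test/subject_test.txt")

# Hypothetical helper: download the zip, extract it and rename the folder.
fetch_dataset <- function(url, dest_dir = "UCI-HAR-Dataset") {
  if (dir.exists(dest_dir)) return(invisible(dest_dir))  # already prepared
  zip_path <- tempfile(fileext = ".zip")
  download.file(url, zip_path, mode = "wb")
  unzip(zip_path)
  # Assumption: the archive extracts into "UCI HAR Dataset".
  file.rename("UCI HAR Dataset", dest_dir)
  invisible(dest_dir)
}

# Hypothetical helper: report which expected files are absent.
missing_files <- function(dest_dir = "UCI-HAR-Dataset") {
  expected_files[!file.exists(file.path(dest_dir, expected_files))]
}

# Example call (not run here because it needs an Internet connection):
# fetch_dataset("https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip")
# missing_files()
```

An empty result from `missing_files()` would indicate that the folder is ready for run_analysis.R.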

Analysed the objectives of the project and defined the data fields and functions needed to clean the source data set and prepare the tidy data required by the project, as stated above.

Details of the data variables and of the functions used to meet the objectives are given in a separate file, CodeBook.md.

Coded run_analysis.R to perform the main job of the project.

I wrote README.md and CodeBook.md to explain how run_analysis.R works.

Initial Files & Folders in Working Directory:

    Folder : UCI-HAR-Dataset
    Files  : README.md (initially created while creating Repository in GitHub)
    

Final Files & Folders in Working Directory:

    Folder: UCI-HAR-Dataset
    Files : 
            Created by me :
            run_analysis.R
            README.md
            CodeBook.md
            
            Created by run_analysis.R :
            mergedData.txt
            tidyDataWithMeans.txt
            

Created a repository on GitHub, cloned it as the working directory, worked on the project there, and then committed, synchronized and pushed my work so that peers can evaluate it.

Web Link : https://github.com/nvramamoorthy/Data-Cleaning-Assignment

Acknowledgements:

I thank all the staff and instructors of Johns Hopkins and Coursera, and all the peers who patiently evaluated my work.