2024-03-14

Introduction

  • Overview of the steps involved in data mining,
    • starting with goal definition and ending with model deployment.
  • The general steps are
    • data collection,
    • cleaning, and
    • preprocessing.
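
As a rough illustration of the cleaning and preprocessing steps listed above, the following is a minimal sketch in base R. The data frame `raw` and its columns `value` and `remodel` are hypothetical and made up for this example; they are not taken from the data used in these notes. Dropping incomplete rows with `na.omit()` is only one of several possible ways to handle missing values.

```r
# Hypothetical toy data standing in for a freshly collected data set.
raw <- data.frame(
  value   = c(344.2, NA, 330.1, 498.6),
  remodel = c("None", "Recent", "None", "Old"),
  stringsAsFactors = FALSE
)

# Cleaning: drop every row that contains a missing value.
clean <- na.omit(raw)

# Preprocessing: treat the text column as a categorical variable.
clean$remodel <- factor(clean$remodel)

nrow(clean)            # number of rows left after cleaning
levels(clean$remodel)  # categories present in the cleaned data
```

Whether to drop, impute, or flag missing values depends on the goal of the analysis, so this choice is an assumption of the sketch rather than a general recommendation.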

  • General definitions of data mining.
    • what has come to be called predictive analytics: the tasks of classification and prediction, as well as pattern discovery,
    • which have become key elements of a “business analytics” function in most large firms.

Core Idea in Data Mining

Data level summary

The following is the structure of the data

## 'data.frame':    5802 obs. of  14 variables:
##  $ TOTAL.VALUE: num  344 413 330 499 332 ...
##  $ TAX        : int  4330 5190 4152 6272 4170 4244 4521 4030 4195 5150 ...
##  $ LOT.SQFT   : int  9965 6590 7500 13773 5000 5142 5000 10000 6835 5093 ...
##  $ YR.BUILT   : int  1880 1945 1890 1957 1910 1950 1954 1950 1958 1900 ...
##  $ GROSS.AREA : int  2436 3108 2294 5032 2370 2124 3220 2208 2582 4818 ...
##  $ LIVING.AREA: int  1352 1976 1371 2608 1438 1060 1916 1200 1092 2992 ...
##  $ FLOORS     : num  2 2 2 1 2 1 2 1 1 2 ...
##  $ ROOMS      : int  6 10 8 9 7 6 7 6 5 8 ...
##  $ BEDROOMS   : int  3 4 4 5 3 3 3 3 3 4 ...
##  $ FULL.BATH  : int  1 2 1 1 2 1 1 1 1 2 ...
##  $ HALF.BATH  : int  1 1 1 1 0 0 1 0 0 0 ...
##  $ KITCHEN    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ FIREPLACE  : int  0 0 0 1 0 1 0 0 1 0 ...
##  $ REMODEL    : chr  "None" "Recent" "None" "None" ...

The data contain 5802 observations and 14 variables:

## [1] 5802   14

The first and last rows of the data can be viewed with head(data) and tail(data).
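
The structure listing and the `[1] 5802   14` line above have the form of output from `str()` and `dim()`. The sketch below runs those calls, together with `head()` and `tail()`, on a small stand-in data frame. The four rows and their values are hypothetical and only borrow three column names from the listing; the real data has 5802 rows and 14 columns, and how it was loaded is not shown in these notes.

```r
# Hypothetical stand-in for the real data: three columns, four rows.
data <- data.frame(
  TOTAL.VALUE = c(344.2, 412.6, 330.1, 498.6),
  ROOMS       = c(6L, 10L, 8L, 9L),
  REMODEL     = c("None", "Recent", "None", "None"),
  stringsAsFactors = FALSE
)

str(data)      # type and first values of each column
dim(data)      # number of rows, then number of columns
head(data, 2)  # first two rows
tail(data, 2)  # last two rows
```

Both `head()` and `tail()` return six rows by default; the second argument is used here only because the stand-in data is so small.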

The Steps in Data Mining

Preliminary Steps

  • Contents

Predictive Power and Overfitting

  • Contents

Building a Predictive Model

  • Contents

Using R for Data Mining on a Local Machine

  • Contents

Automating Data Mining

  • Contents