December 11th, 2017

Artificial Intelligence - A.I.

Inside AI

  • Many related areas;

What is AI?

  • Making computers that think?
  • The automation of activities we associate with human thinking, like decision making, learning … ?
  • The art of creating machines that perform functions that require intelligence when performed by people ?
  • The study of mental faculties through the use of computational models ?
  • Anything in Computing Science that we don't yet know how to do properly ? (!)

What is AI?

Systems that think like humans?

Systems that think rationally?

Systems that act like humans?

Systems that act rationally?

Machines that think like humans

  • Determine how humans think;
  • Get inside the actual workings of human minds;
    • Introspection - catch our own thoughts as they go by;
    • Psychological experiments;
  • Sufficiently precise theory of the mind:
    • Express the theory as a computer program;

Machines that think like humans

If the program's input/output and timing behavior matches human behavior, that is evidence that some of the program's mechanisms may also be operating in humans;

Machines that think like humans

  • How do we know how humans think?
  • A program that solves problems correctly is not enough!
    • Comparing the trace of its reasoning steps to traces of human subjects solving the same problems;
  • Cognitive science: computer models from AI and experimental techniques from psychology
    • Precise and testable theories of the workings of the human mind;
    • Vision, natural language, and learning;

Machines that think like humans

The exciting new effort to make computers think … machines with minds, in the full and literal sense. (Haugeland, 1985)

The automation of activities that we associate with human thinking, activities such as decision-making, problem solving, learning … (Bellman, 1978)

Machines that act like humans

  • Intelligent behavior as the ability to achieve human-level performance in all cognitive tasks, sufficient to fool an interrogator.

Turing Test

  • Uses the "Imitation Game"
  • Usual method:
    • Three people play (man, woman, and interrogator)
    • Interrogator determines which of the other two is a woman by asking questions
      • Example: How long is your hair?
    • Answers are typewritten or relayed by an intermediary

Turing Test

  • Requires success in:
    • Natural language processing: communicate with the interrogator;
    • Knowledge representation: store and retrieve what it knows;
    • Automated reasoning: use the stored information to answer questions and to draw new conclusions;
    • Machine learning: adapt to new circumstances and to detect and extrapolate patterns

Turing Test

  • Researchers have devoted little effort to passing the Turing test;
  • Acting like a human:
    • When AI programs have to interact with people
    • e.g. when an expert system explains how it came to its diagnosis;
    • e.g. natural language processing system has a dialogue with a user.
  • These programs must behave according to certain normal conventions of human interaction;
  • Underlying representation and reasoning may or may not be based on a human model.

Machines that act like humans

The art of creating machines that perform functions that require intelligence when performed by people. (Kurzweil)

The study of how to make computers do things at which, at the moment, people are better. (Rich and Knight)

Machines that think rationally

  • Aristotle was one of the first to attempt to codify right thinking
    • Irrefutable reasoning processes;
    • Syllogisms: patterns for argument structures: always gave correct conclusions given correct premises;
    • Laws of thought supposed to govern the operation of the mind
  • Formal logic provides a precise notation for statements about all kinds of things in the world and the relations between them;

Machines that think rationally

  • Humans are not always rational.
    • Big difference between being able to solve a problem and doing so;
  • Rational - defined in terms of logic?
  • Logic can’t express everything, e.g. informal knowledge or knowledge that is less than 100% certain;
  • Logical approach is often not feasible in terms of computation time (needs ‘guidance’)

Machines that think rationally

The study of mental faculties through the use of computational models. (Charniak and McDermott, 1985)

The study of the computations that make it possible to perceive, reason, and act. (Winston, 1992)

Machines that act rationally

  • Rational behavior: doing the right thing;
  • The right thing: that which is expected to maximize goal achievement, given the available information;
  • Correct inference is not all of rationality:
    • Situations where there is no provably correct thing to do, yet something must still be done;
    • Ways of acting rationally that cannot be reasonably said to involve inference:
      • pulling one's hand off of a hot stove is a reflex action that is more successful than a slower action taken after careful deliberation;

Machines that act rationally

A field of study that seeks to explain and emulate intelligent behavior in terms of computational processes. (Schalkoff, 1990)

The branch of computer science that is concerned with the automation of intelligent behavior. (Luger and Stubblefield, 1993)

What is inside A.I.?

  • Search (includes Game Playing).
  • Representing Knowledge and Reasoning with it.
  • Planning;
  • Learning;
  • Natural language processing.
  • Interacting with the Environment (e.g. Vision, Speech recognition, Robotics)

Machine Learning

Types of Learning

  • Supervised (inductive) learning
    • Training data includes desired outputs
  • Unsupervised learning
    • Training data does not include desired outputs
  • Semi-supervised learning
    • Training data includes a few desired outputs
  • Reinforcement learning
    • Rewards from sequence of actions

Supervised Learning

  • Prediction of future cases: Use the rule to predict the output for future inputs
  • Knowledge extraction: The rule is easy to understand
  • Compression: The rule is simpler than the data it explains
  • Outlier detection: Exceptions that are not covered by the rule, e.g., fraud

Unsupervised Learning

  • Learning “what normally happens”
  • No output
  • Clustering: Grouping similar instances
  • Other applications: Summarization, Association Analysis
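As a rough illustration of clustering, the sketch below groups the cars in R's built-in mtcars dataset with k-means. The choice of k-means, of three clusters, and of the mpg and hp columns are assumptions made here for illustration.

```r
# Unsupervised learning sketch: no response variable is used.
set.seed(42)                                  # k-means starts from random centers
features <- scale(mtcars[, c("mpg", "hp")])   # standardize so both columns weigh equally
clusters <- kmeans(features, centers = 3, nstart = 20)

table(clusters$cluster)                       # number of cars assigned to each group
plot(mtcars$hp, mtcars$mpg, col = clusters$cluster, pch = 19,
     xlab = "hp", ylab = "mpg")               # color each car by its cluster
```

The algorithm only sees the two feature columns; any meaning given to the resulting groups is an interpretation added afterwards.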

Linear Regression

Linear Regression

  • Regression analysis is used to describe the relationship between:
  • A single response variable: \(Y\) ; and
  • One or more predictor variables: \(X_1\), \(X_2\),…, \(X_n\)
    • n = 1: Simple Regression
    • n > 1: Multiple Regression

Linear Regression

The car R library: Companion to Applied Regression

install.packages("car")
library(car);

Load the built-in mtcars dataset:

data("mtcars")

Linear Regression

Load the dataset into a variable

cars.dataset <- mtcars;

Check the first lines of the dataset:

head(cars.dataset);
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Linear Regression

Check the summary of the data

summary(cars.dataset);
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

Linear Regression

Check scatterplot

plot(x=cars.dataset$hp, y=cars.dataset$mpg);
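A numeric companion to the scatterplot is Pearson's correlation coefficient. The short sketch below reads the built-in mtcars data directly, rather than the cars.dataset copy, so that it runs on its own.

```r
r <- cor(mtcars$hp, mtcars$mpg)   # Pearson correlation is the default method
r                                 # negative: more horsepower goes with fewer miles per gallon
```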

Linear Regression

  • Model a continuous variable \(Y\) as a mathematical function of one or more \(X\) variable(s);
  • Predict the \(Y\) when \(X\) is known;
  • This mathematical equation can be generalized as follows:

\[ \widehat{Y}=\beta_0 + \beta_1X_1 + \beta_2X_2+...+\beta_nX_n \]

  • The predictors \(X_1\),…, \(X_n\) can be continuous, discrete or categorical variables.
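Categorical predictors enter the equation through indicator (dummy) variables. A minimal sketch, assuming we treat the number of cylinders in mtcars as a category (a choice made here purely for illustration):

```r
# factor() tells lm() to treat cyl as a category rather than a number
fit.cat <- lm(mpg ~ hp + factor(cyl), data = mtcars)
coef(fit.cat)   # one coefficient per non-reference level of cyl
```

The first level (4 cylinders) is absorbed into the intercept, so the remaining coefficients are read as offsets relative to it.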

Simple Linear Regression

\[ Y=\beta_0 + \beta_1X+\epsilon \]

  • \(\beta_0\) (Intercept): point at which the line crosses the y-axis;
  • \(\beta_1\) (Slope): increase in Y per unit change in X.

Simple Linear Regression

We want to find the equation of the line that “best” fits the data. It means finding \(b_0\) and \(b_1\) such that the fitted values of \(y_i\), given by \[ \hat{y_i} = b_0+b_1x_i \] are as “close” as possible to the observed values \(y_i\).

Simple Linear Regression

Residuals: the difference between the observed value \(y_i\) and the fitted value \(\hat{y}_i\):

\[ e_i = y_i - \hat{y}_i \]

Simple Linear Regression

A usual way of calculating \(b_0\) and \(b_1\) is to minimize the sum of the squared residuals, or residual sum of squares (RSS): \[\begin{eqnarray} RSS &=& \sum_{i}e_i^2 \\ &=& \sum_{i}(y_i - \hat{y}_i)^2\\ &=& \sum_{i}(y_i - b_0 - b_1x_i)^2 \end{eqnarray}\]
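For simple regression this minimization has a well-known closed-form solution: \(b_1 = \sum_i (x_i-\bar{x})(y_i-\bar{y}) / \sum_i (x_i-\bar{x})^2\) and \(b_0 = \bar{y} - b_1\bar{x}\). The sketch below computes both by hand for mpg against hp in mtcars and compares them with lm(); the variable names are illustrative.

```r
x <- mtcars$hp
y <- mtcars$mpg

b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                     # intercept

e   <- y - (b0 + b1 * x)   # residuals e_i = y_i - yhat_i
rss <- sum(e^2)            # residual sum of squares

fit <- lm(mpg ~ hp, data = mtcars)
all.equal(c(b0, b1), unname(coef(fit)))   # TRUE when both routes agree
```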

Simple Linear Regression

Regression in R with lm() function:

cars.lm1 <- lm(mpg ~ hp, data = cars.dataset);
plot(x=cars.dataset$hp, y=cars.dataset$mpg);
abline(cars.lm1);

Simple Linear Regression

Check with summary() some details of the model:

summary(cars.lm1);
## 
## Call:
## lm(formula = mpg ~ hp, data = cars.dataset)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7121 -2.1122 -0.8854  1.5819  8.2360 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## hp          -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892 
## F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07
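Once fitted, the model can be used for prediction. A short sketch, refitting the same model on mtcars so the block is self-contained; the horsepower value of 150 is an arbitrary example.

```r
fit <- lm(mpg ~ hp, data = mtcars)

new.car <- data.frame(hp = 150)           # column name must match the predictor
pred <- predict(fit, newdata = new.car)   # point estimate of mpg at hp = 150
pred

confint(fit)                              # 95% confidence intervals for the coefficients
```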

Simple Linear Regression

Plot the residuals against the fitted values, obtained with residuals() and fitted():

plot(fitted(cars.lm1), residuals(cars.lm1));
abline(a=0,b=0); # Horizontal reference line at zero (intercept a = 0, slope b = 0)

Assessing the Model quality

Residual Standard Error - RSE:

  • Derived from the Residual Sum of Squares - RSS;
  • Associated with each observation is an error term \(\epsilon\):

\[ y_i = \beta_0+\beta_1x_i+\epsilon_i \]

  • Even if we knew the true regression line, we would not be able to perfectly predict \(Y\) from \(X\);
  • The RSE is an estimate of the standard deviation of \(\epsilon\);
  • The average amount that the response will deviate from the true regression line

Assessing the Model quality

Residual Standard Error - RSE:

  • a measure of the lack of fit of the model to the data;
  • If the predictions obtained using the model are very close to the true outcome:
    • RSE will be small, and we can conclude that the model fits the data very well;
  • If \(\hat{y}_i\) is very far from \(y_i\) for one or more observations, then:
    • The RSE may be quite large, indicating that the model doesn’t fit the data well;
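For simple regression the RSE is \(\sqrt{RSS/(n-2)}\), because two coefficients are estimated from the n observations. A small sketch on mtcars, checked against the value that summary() reports:

```r
fit <- lm(mpg ~ hp, data = mtcars)

rss <- sum(residuals(fit)^2)   # residual sum of squares
n   <- nrow(mtcars)
rse <- sqrt(rss / (n - 2))     # n - 2 residual degrees of freedom

rse
summary(fit)$sigma             # the "Residual standard error" line of summary()
```

Because the RSE is expressed in the units of the response (miles per gallon here), whether it counts as small depends on the scale of mpg.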

Assessing the Model quality

\(R^2\):

  • Provides an alternative measure to RSE;
  • "Unitless";
  • The proportion of variance explained;
  • Always takes on a value between 0 and 1;
  • Independent of the scale of \(Y\);

Assessing the Model quality

\(R^2\):

  • Statistic close to 1:
    • A large proportion of the variability in the response has been explained by the regression.
  • A value near 0:
    • Indicates that the regression did not explain much of the variability in the response;
  • It can still be challenging to determine what is a good \(R^2\) value;
    • It depends on the application;
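The proportion of variance explained can be computed as \(R^2 = 1 - RSS/TSS\), where \(TSS = \sum_i (y_i - \bar{y})^2\) is the total sum of squares. A sketch on mtcars, compared with the value stored by summary():

```r
fit <- lm(mpg ~ hp, data = mtcars)

rss <- sum(residuals(fit)^2)                    # variability left unexplained
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total variability in the response
r2  <- 1 - rss / tss

r2
summary(fit)$r.squared                          # "Multiple R-squared" in summary()
```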

Multiple Linear Regression

Reading a CSV file:

adv <- read.csv("Advertising.csv", header = TRUE, 
                colClasses = c("NULL", NA, NA, NA, NA)); # Drop First Col.
head(adv);
##      TV radio newspaper sales
## 1 230.1  37.8      69.2  22.1
## 2  44.5  39.3      45.1  10.4
## 3  17.2  45.9      69.3   9.3
## 4 151.5  41.3      58.5  18.5
## 5 180.8  10.8      58.4  12.9
## 6   8.7  48.9      75.0   7.2

Multiple Linear Regression

Exploring Simple Correlations:

attach(adv);
adv.lm1 <- lm(sales ~ TV);
summary(adv.lm1);
## 
## Call:
## lm(formula = sales ~ TV)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3860 -1.9545 -0.1913  2.0671  7.2124 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.032594   0.457843   15.36   <2e-16 ***
## TV          0.047537   0.002691   17.67   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.259 on 198 degrees of freedom
## Multiple R-squared:  0.6119, Adjusted R-squared:  0.6099 
## F-statistic: 312.1 on 1 and 198 DF,  p-value: < 2.2e-16

Multiple Linear Regression

Exploring Simple Correlations:

adv.lm2 <- lm(sales ~ radio);
summary(adv.lm2);
## 
## Call:
## lm(formula = sales ~ radio)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.7305  -2.1324   0.7707   2.7775   8.1810 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.31164    0.56290  16.542   <2e-16 ***
## radio        0.20250    0.02041   9.921   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.275 on 198 degrees of freedom
## Multiple R-squared:  0.332,  Adjusted R-squared:  0.3287 
## F-statistic: 98.42 on 1 and 198 DF,  p-value: < 2.2e-16

Multiple Linear Regression

Exploring Simple Correlations:

adv.lm3 <- lm(sales ~ newspaper);
summary(adv.lm3);
## 
## Call:
## lm(formula = sales ~ newspaper)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.2272  -3.3873  -0.8392   3.5059  12.7751 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 12.35141    0.62142   19.88  < 2e-16 ***
## newspaper    0.05469    0.01658    3.30  0.00115 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.092 on 198 degrees of freedom
## Multiple R-squared:  0.05212,    Adjusted R-squared:  0.04733 
## F-statistic: 10.89 on 1 and 198 DF,  p-value: 0.001148

Multiple Linear Regression

  • The values of radio and TV better explain the variance;
  • Fitting a separate simple linear regression model for each predictor is not satisfactory;
  • Each of the three regression equations ignores the other two media;
  • Extend the simple linear regression model to directly accommodate multiple predictors:

\[\begin{eqnarray} \hat{Y} &=& \beta_0 + \beta_1X_1+ \beta_2X_2+...+\beta_nX_n \\ sales &=& \beta_0 + \beta_1 \times TV + \beta_2 \times radio + \beta_3 \times newspaper + \epsilon \end{eqnarray}\]
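With several predictors, the least-squares estimates solve the normal equations \(X^\top X b = X^\top y\), where \(X\) is the design matrix with a leading column of ones. The sketch below solves them directly and compares the result with lm(). It uses mtcars with hp and wt as predictors, an assumption made so the block runs without the Advertising.csv file.

```r
X <- cbind(1, mtcars$hp, mtcars$wt)   # design matrix: intercept column, hp, wt
y <- mtcars$mpg

# Solve (X'X) b = X'y without forming an explicit inverse
beta.hat <- solve(crossprod(X), crossprod(X, y))

fit <- lm(mpg ~ hp + wt, data = mtcars)
cbind(by.hand = as.vector(beta.hat), lm = unname(coef(fit)))
```

lm() itself relies on a QR decomposition, which is numerically more stable; the direct solve is shown only to make the algebra concrete.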

Multiple Linear Regression

In R:

adv.lm4 <- lm(sales ~ TV + radio + newspaper);
summary(adv.lm4)$coefficients;
##                 Estimate  Std. Error    t value     Pr(>|t|)
## (Intercept)  2.938889369 0.311908236  9.4222884 1.267295e-17
## TV           0.045764645 0.001394897 32.8086244 1.509960e-81
## radio        0.188530017 0.008611234 21.8934961 1.505339e-54
## newspaper   -0.001037493 0.005871010 -0.1767146 8.599151e-01

Multiple Linear Regression

  • The coefficient estimate for newspaper in the multiple regression model is close to zero;
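  • The simple regression slope measures the average effect of an increase in newspaper advertising while ignoring the other predictors; the multiple regression coefficient measures it with TV and radio held fixed;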

Does it make sense for the multiple regression to suggest no relationship between sales and newspaper while the simple linear regression implies the opposite?

Multiple Linear Regression

Correlation Matrix:

cor(adv);
##                   TV      radio  newspaper     sales
## TV        1.00000000 0.05480866 0.05664787 0.7822244
## radio     0.05480866 1.00000000 0.35410375 0.5762226
## newspaper 0.05664787 0.35410375 1.00000000 0.2282990
## sales     0.78222442 0.57622257 0.22829903 1.0000000

  • Notice that the correlation between radio and newspaper is 0.35;
  • Markets that spend more on radio advertising also tend to spend more on newspaper advertising, so in the simple regression newspaper acts as a surrogate for radio;
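The effect can be reproduced with synthetic data. In the sketch below, all numbers are simulated and every coefficient is an arbitrary assumption; nothing is taken from the Advertising dataset. By construction newspaper has no direct effect on sales but is correlated with radio.

```r
set.seed(1)
n   <- 200
sim <- data.frame(radio = runif(n, 0, 50))
sim$newspaper <- 0.5 * sim$radio + rnorm(n, sd = 10)      # correlated with radio
sim$sales     <- 5 + 0.2 * sim$radio + rnorm(n, sd = 2)   # depends on radio only

simple   <- lm(sales ~ newspaper, data = sim)             # ignores radio
multiple <- lm(sales ~ radio + newspaper, data = sim)     # holds radio fixed

coef(summary(simple))["newspaper", ]     # slope picks up the effect of radio
coef(summary(multiple))["newspaper", ]   # slope shrinks towards zero
```

In the simple regression newspaper receives credit for the radio effect it is correlated with; once radio enters the model, that borrowed effect typically disappears.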

Hands on

The activity

  • Use the Prestige dataset from the car package;
  • Use R Markdown to create a Report with answers for the questions.
    • Report all steps up to the model creation/ testing;
  • Publish the report using RPubs

The questions

  1. Is at least one of the predictors \(X_1\), \(X_2\),…, \(X_p\) useful in predicting the response?
  2. Do all the predictors help to explain Y, or is only a subset of the predictors useful?
  3. How well does the model fit the data?
  4. Given a set of predictor values, what response value should we predict, and how accurate is our prediction?
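One possible way to approach the four questions is sketched below on mtcars rather than on the Prestige data. The choice of predictors and the values in new.car are arbitrary assumptions, and the same calls would need to be adapted to the activity's dataset.

```r
fit <- lm(mpg ~ hp + wt + qsec, data = mtcars)
s   <- summary(fit)

# 1. Is at least one predictor useful? Overall F-test.
f <- s$fstatistic
p.overall <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
p.overall

# 2. Which predictors help? Per-coefficient t-test p-values.
s$coefficients[, "Pr(>|t|)"]

# 3. How well does the model fit? RSE and R-squared.
c(RSE = s$sigma, R2 = s$r.squared)

# 4. What do we predict, and how accurately? Prediction interval.
new.car <- data.frame(hp = 110, wt = 2.6, qsec = 17)
pred <- predict(fit, newdata = new.car, interval = "prediction")
pred
```

The prediction interval accounts for both the uncertainty in the coefficients and the error term, so it is wider than the confidence interval obtained with interval = "confidence".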