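-Fetch the data and use 80% of it (the first 524 of 654 rows) for the training set:
library(kknn)
# header=TRUE keeps the column names (A1 ... A15, R1) that the model formula refers to
Data <- read.table("Users/mariajimenaolanov/Documents/R/credit_card_data-headers.txt",header=TRUE)
set.seed(1)
training <- Data[1:524,]
head(training)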
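The split above takes the first 524 rows in file order, so set.seed(1) does not influence which rows end up in the training set. If the file were sorted in some way, a random split could be preferable. A minimal sketch, assuming a stand-in data frame (`toy`, `training_rand` and `test_rand` are hypothetical names, not part of the original analysis):

```r
set.seed(1)

# Hypothetical stand-in with the same number of rows as the credit card data
toy <- data.frame(id = 1:654, x = rnorm(654))

# Draw 80% of the row indices at random, without replacement
train_idx <- sample(nrow(toy), size = floor(0.8 * nrow(toy)))

training_rand <- toy[train_idx, ]   # rows selected for training
test_rand     <- toy[-train_idx, ]  # all remaining rows

nrow(training_rand)
nrow(test_rand)
```

Indexing with `-train_idx` guarantees that no row appears in both sets.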
-Create the nearest-neighbor model:
model1 <- cv.kknn(R1~A1+A2+A3+A8+A9+A10+A11+A12+A14+A15,data=training,kcv=50)
model1 <- as.data.frame(model1)
-Create a list with the model output (the predictions, in column 2):
results <- (model1[,2])
-Round each prediction to the nearest class label:
results <- round(results)
-Check model accuracy:
sum(results==training[,11])/nrow(training)
-The model with a kcv of 50 proves the most effective, with an accuracy of 0.8320. Other values of kcv were tested as follows:
kcv 100: 0.8282
kcv 50: 0.8320
kcv 10: 0.8225
kcv 5: 0.8149
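The comparison of kcv values could also be scripted rather than rerun by hand. The sketch below is only an illustration: it uses hypothetical toy data (`toy`) and a hypothetical helper (`cv_accuracy`) instead of the credit card file, and it assumes, as the code above does, that the second column of the first element returned by cv.kknn holds the predictions.

```r
library(kknn)

set.seed(1)

# Hypothetical toy data standing in for the credit card file
n <- 600
toy <- data.frame(A1 = rnorm(n), A2 = rnorm(n))
toy$R1 <- as.numeric(toy$A1 + toy$A2 + rnorm(n, sd = 0.5) > 0)

# Hypothetical helper: cross-validated accuracy for a given number of folds
cv_accuracy <- function(data, folds) {
  cv <- cv.kknn(R1 ~ A1 + A2, data = data, kcv = folds)
  pred <- round(cv[[1]][, 2])   # column 2: predicted values
  mean(pred == data$R1)         # share of rows predicted correctly
}

fold_counts <- c(5, 10, 50)
accuracies <- sapply(fold_counts, function(f) cv_accuracy(toy, f))
names(accuracies) <- fold_counts
print(accuracies)
```

Because the folds are drawn at random, the scores can shift a little between runs unless the seed is fixed.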
-A test set (the remaining 20% of the data) was then used to check the model's true accuracy:
test <- Data[525:654,]
-Create the nearest-neighbor model:
model2 <- cv.kknn(R1~A1+A2+A3+A8+A9+A10+A11+A12+A14+A15,data=test,kcv=5)
model2 <- as.data.frame(model2)
-Create a list with the model output and round the predictions:
results2 <- (model2[,2])
results2 <- round(results2)
-Check model accuracy:
sum(results2==test[,11])/nrow(test)
-On the test set, our model has an accuracy of 0.8846.
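The check above cross-validates a new model inside the test rows. Another way to estimate out-of-sample accuracy, shown here only as a sketch, is to fit on the training rows and predict the held-out rows with kknn(). The toy data, the variable names and k = 10 are assumptions made for illustration and are not taken from the original analysis.

```r
library(kknn)

set.seed(1)

# Hypothetical toy data standing in for the credit card file
n <- 654
toy <- data.frame(A1 = rnorm(n), A2 = rnorm(n))
toy$R1 <- as.numeric(toy$A1 + toy$A2 + rnorm(n, sd = 0.5) > 0)

toy_train <- toy[1:524, ]
toy_test  <- toy[525:654, ]

# Fit on the training rows only and predict the held-out rows
# (k = 10 is an arbitrary choice for this sketch)
fit <- kknn(R1 ~ A1 + A2, train = toy_train, test = toy_test, k = 10)
pred <- round(fitted(fit))

# Share of held-out rows predicted correctly
test_accuracy <- mean(pred == toy_test$R1)
test_accuracy
```

With this setup no test row is ever used to fit the model that predicts it or any other test row.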
Working in the experience center of an e-commerce platform, we might want to know how valuable each seller is to our company, so that we can categorize sellers and assign them different levels of contact priority. To do this we could use the following predictors: 1) Sales 2) Seller reputation on the platform (ranking) 3) Time using the platform 4) Sales growth 5) Revenue from the seller's use of platform services (sales commission, advertising, payment tool, etc.)
-First we fetch the data for a quick look at what it contains:
Data <- read.table("Users/mariajimenaolanov/Documents/R/iris.txt",header=FALSE)
-After doing so, it is evident that petal length and width vary between species but stay within a certain range for each species. Sepal width and length, on the other hand, don't seem to be as strongly related to species.
-We use these two predictors for our model and set nstart=10, which means k-means is run from 10 random sets of starting centers and the best result is kept. We use 3 clusters since there are 3 species.
kmeans(iris[3:4],3,nstart=10)
-With this we get the cluster means of each predictor for our three clusters. Now we organize the information to reach a conclusion:
clusters <- kmeans(iris[3:4],3,nstart=10)
table(clusters$cluster,iris$Species)
-This shows us that our model placed 2 versicolor in the virginica cluster and 9 virginica in the versicolor cluster, for a 92.7% success rate (139 of 150 correct).
    setosa versicolor virginica
  1      0         48         9
  2      0          2        41
  3     50          0         0
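The success rate can also be computed directly from the table instead of by hand. A minimal sketch, assuming R's built-in iris data set and labelling each cluster with its most common species (`tab`, `correct` and `success_rate` are names introduced here); the counts may differ from the table above depending on the data file and the random starts.

```r
set.seed(1)

# Built-in iris data; columns 3 and 4 are petal length and petal width
clusters <- kmeans(iris[3:4], 3, nstart = 10)
tab <- table(clusters$cluster, iris$Species)

# Label each cluster with its most common species, then count the matches
correct <- sum(apply(tab, 1, max))
success_rate <- correct / sum(tab)

print(tab)
print(success_rate)
```

This majority labelling is a simplification: if two clusters were dominated by the same species, the rate would overstate how well the clusters separate the three species.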