Ok, I will try to give you the basics of how to load a dataset and run a regression; however, you will notice that the learning curve is steep. This is an R Notebook, which means that you can modify and run the code inside the document. For instance, cars is a dataset that ships with R (in the datasets package) and has two columns (variables).

For now I will stick to base R packages.

Internal data set

# Load a dataset that is inside R
data(cars)
# Show the structure of the dataset
str(cars)
'data.frame':   50 obs. of  2 variables:
 $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
 $ dist : num  2 10 4 22 16 10 18 26 34 17 ...

As you can see, there are two variables (speed and dist, the stopping distance), each with 50 observations.
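If you only need the size and the column names, dim() and names() are quicker than str(). A minimal check on the same built-in cars data:

```r
# Load the built-in cars data frame (datasets package)
data(cars)
# Number of rows and columns: 50 rows, 2 columns
dim(cars)
# Column names: the second one is "dist", not "distance"
names(cars)
```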

Loading an external dataset

The first thing you have to consider is the format of the data: is it an Excel .xlsx file? A Stata file? In these cases you usually need additional packages.

Second, it is common to work with .txt or .csv files in R; in my opinion the latter is handier for learning the basics.

Third, let’s assume that you have a csv file named “data”, that is, data.csv. Another particularity of this format is the separator, that is, the “symbol” that divides the columns. It can be a semicolon, a tab, a comma, etc.

Fourth, are decimals written with commas or dots? Does the first line contain the variable names? These details matter too.
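A minimal sketch of how the separator, decimal mark and header translate into arguments. To keep it self-contained, the file contents are hypothetical and passed as a string through the text argument instead of a real file:

```r
# Hypothetical csv content: semicolons between columns, commas as decimal
# marks, and the variable names on the first line
txt = "x;z\n4,75;2\n5,10;3\n"

# Spell out each relevant argument explicitly ...
d1 = read.csv(text = txt, header = TRUE, sep = ";", dec = ",")
# ... or use read.csv2(), whose defaults are exactly sep = ";" and dec = ","
d2 = read.csv2(text = txt)
identical(d1, d2)

# Tab-separated .txt files: read.delim() defaults to header = TRUE and sep = "\t"
d3 = read.delim(text = "x\tz\n4.75\t2\n5.10\t3\n")
d3
```

Getting sep or dec wrong does not always raise an error; it usually gives you a single squeezed column or numbers read as text, so it pays to check the result with head() or str().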

Data set used as an example

# Fix the random seed so the simulated numbers are reproducible
set.seed(123)
# Create 1000 random numbers from a normal distribution (mean 5, sd 1)
x=rnorm(n = 1000, mean = 5, sd = 1)
# Create 1000 random zeros and ones from a binomial distribution (probability 0.5)
y=rbinom(n = 1000, size = 1, prob = .5)
# Create 1000 random integers from 0 to 5 from a binomial distribution (5 trials, probability 0.5)
z=rbinom(n = 1000, size = 5, prob = .5)
# Bind all variables into one dataset (the data frame is one of the basic structures in R)
data=data.frame(x, y, z)
# Now the data looks like this (head shows the first observations of a dataset)
head(data)
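A caveat about data.frame(): when the vectors have different lengths and the longer length is a multiple of the shorter one, R silently recycles the shorter vector instead of raising an error, so a mismatch such as 1000 versus 100 observations would go unnoticed. A minimal check with toy vectors (the names a and b are just for illustration):

```r
# Lengths 4 and 2: 4 is a multiple of 2, so b is silently repeated
df = data.frame(a = 1:4, b = 1:2)
nrow(df)   # 4 rows
df$b       # 1 2 1 2

# Lengths 3 and 2 are not multiples, so this call fails with an error
res = try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(res, "try-error")
```

Checking nrow() or str() right after building a data frame is a cheap way to catch this.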

Functions have arguments, lots of them, and adjusting them is one of the main complications when you first start using R. For instance:

args(read.csv)
function (file, header = TRUE, sep = ",", quote = "\"", dec = ".", 
    fill = TRUE, comment.char = "", ...) 
NULL

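read.csv is the function for loading data from a file. Its main arguments are header (does the first line hold the variable names?), sep for the separator, quote for the quoting character (I have never needed to change it) and dec for the decimal mark.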
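The quote argument matters when a field contains the separator itself. A minimal sketch with hypothetical csv text (passed through the text argument instead of a file): with the default quote = "\"", the comma inside "Smith, J." stays part of the name instead of starting a new column.

```r
# Hypothetical csv content where each name contains a comma
txt = 'name,comment\n"Smith, J.",ok\n"Doe, A.",late\n'

# Default quoting keeps the quoted commas inside the field
people = read.csv(text = txt)
ncol(people)   # 2 columns, not 3
people$name
```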

Now, since the point is to load data from “outside”, we will first save the data dataset that we just created to a file. But before that, you need to know the “working directory”. How can you find it? With this function:

getwd()
[1] "C:/Users/opoys/Desktop"

If this directory does not suit you, it can be changed with setwd():

# In my case I will set it to my desktop
setwd("C:/Users/opoys/Desktop")

This is where the dataset is going to be exported. The function that writes the file is:

write.csv(x = data, file = "data.csv", row.names = FALSE)
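Because file = "data.csv" is a bare file name, write.csv() resolves it against the working directory. A minimal sketch of that behaviour, using R's temporary directory as a stand-in target and a hypothetical file example.csv, so nothing on your desktop is touched:

```r
# Remember the current working directory so it can be restored afterwards
old.wd = getwd()
# Stand-in target: R's session temporary directory, which always exists
setwd(tempdir())

# A bare file name is written into the working directory ...
write.csv(data.frame(a = 1:3), file = "example.csv", row.names = FALSE)
# ... so its full path is the working directory plus the file name
full.path = file.path(getwd(), "example.csv")
file.exists(full.path)

# Restore the original working directory
setwd(old.wd)
```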

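This is how it looks in a plain-text viewer such as Notepad: the first line holds the column names in quotes, and each following line holds one row, with the values separated by commas.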
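You can also inspect the raw text of a csv file from inside R with readLines(), which returns each line of the file as a string. A minimal sketch on a small hypothetical data frame written to a temporary file:

```r
# Small hypothetical data frame, written the same way as above
df = data.frame(x = c(4.5, 5.25), y = c(1L, 0L))
path = tempfile(fileext = ".csv")
write.csv(df, file = path, row.names = FALSE)

# Raw lines: quoted header, then one comma-separated row per line
readLines(path)
```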

As you can see, the decimal separator is a dot. If you want a comma instead, the right function is write.csv2, which then uses a semicolon to separate the columns.
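A minimal round trip with write.csv2() and its matching reader read.csv2(), on a small hypothetical data frame written to a temporary file:

```r
df = data.frame(x = c(4.5, 5.25), y = c(1L, 0L))
path = tempfile(fileext = ".csv")

# write.csv2() uses "," as the decimal mark and ";" between columns
write.csv2(df, file = path, row.names = FALSE)
readLines(path)

# read.csv2() expects exactly that layout and recovers the numbers
df.back = read.csv2(path)
df.back
```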

Enough about exporting; it is time to read the file back in!

# Note the difference between read.csv and read.csv2: here the decimal mark is a dot, so read.csv is the one we need
data.imported=read.csv(file = "C:/Users/opoys/Desktop/data.csv", header = TRUE, sep = ",")
# Let's check if everything is ok
head(data.imported) # It is fine!
# What would have happened with the wrong arguments (a semicolon instead of a comma as the separator)?
data.imported.bad=read.csv2(file = "C:/Users/opoys/Desktop/data.csv", header = TRUE, sep = ";")
head(data.imported.bad) # A mess!
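A quick way to spot the mess without eyeballing it is to count the columns. A minimal sketch on a small hypothetical data frame written to a temporary file: read.csv() matches the comma-separated file, whereas read.csv2() looks for semicolons, finds none, and puts each whole line into a single column.

```r
df = data.frame(x = c(4.5, 5.25), y = c(1L, 0L), z = c(3L, 2L))
path = tempfile(fileext = ".csv")
write.csv(df, file = path, row.names = FALSE)

good = read.csv(path)    # comma separator, dot decimal: matches the file
bad  = read.csv2(path)   # expects ";" and ",": wrong for this file

ncol(good)   # 3 columns, as expected
ncol(bad)    # everything squeezed into one column
```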

Regression

It is pretty simple to run a regression. Let's say that you want to test for the effect of z on x:

# The "~" separates the dependent variable (left) from the regressors (right)
fit1=lm(x~z, data = data.imported)
# See the results
summary(fit1)

Call:
lm(formula = x ~ z, data = data.imported)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4796 -0.6478  0.0379  0.6826  3.4733 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.00896    0.07318  68.448   <2e-16 ***
z            0.01625    0.02836   0.573    0.567    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.014 on 998 degrees of freedom
Multiple R-squared:  0.000329,  Adjusted R-squared:  -0.0006727 
F-statistic: 0.3285 on 1 and 998 DF,  p-value: 0.5667
# Basic diagnostics
plot(fit1)

# Add a second variable
fit2=lm(x~y+z, data = data.imported)
# Same procedure
summary(fit2)

Call:
lm(formula = x ~ y + z, data = data.imported)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5165 -0.6546  0.0287  0.6741  3.5059 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.93121    0.08851  55.713   <2e-16 ***
y            0.10309    0.06612   1.559    0.119    
z            0.02755    0.02925   0.942    0.346    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.013 on 997 degrees of freedom
Multiple R-squared:  0.002761,  Adjusted R-squared:  0.0007604 
F-statistic:  1.38 on 2 and 997 DF,  p-value: 0.252
plot(fit2)
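To see what lm() estimates, a minimal sketch on freshly simulated data (the seed, sample size and true coefficients are hypothetical): ordinary least squares solves the normal equations (X'X)b = X'x, where X is the design matrix with a column of ones for the intercept. lm() itself uses a QR decomposition, which is numerically more stable, but the estimates agree.

```r
# Hypothetical simulated data with a known effect of z on x
set.seed(1)
n = 200
z = rbinom(n, size = 5, prob = 0.5)
x = 5 + 0.1 * z + rnorm(n)
fit = lm(x ~ z)

# Design matrix: a column of ones (intercept) plus z
X = model.matrix(fit)
# Solve the normal equations (X'X) b = X'x
beta = solve(crossprod(X), crossprod(X, x))

# Side by side: lm() estimates and the hand-computed ones
cbind(lm = coef(fit), by.hand = drop(beta))
```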

I think these are the basics you asked for. The first steps in R are simply a headache, but it gets better as you become more proficient.

Good luck!
