First, we must calculate the mean, standard deviation, maximum, and minimum for the Age column using R.
In R, we read in the file again, extract the Age column, and compute each of the requested values.
#Read the file
scoringdata = read.csv(file="data/scoring.csv")
head(scoringdata)
#Extract the Age column and name it age
age = scoringdata$Age
#Calculate the mean age. Refer to Worksheet 1 for the correct command.
mean_age = mean(age)
mean_age
[1] 37.08412
#Calculate the standard deviation of age. Refer to Worksheet 1 for the correct command.
sd_age = sd(age)
sd_age
[1] 10.98637
#Calculating the maximum value present in the Age column.
max_age = max(age)
max_age
[1] 68
#Calculate the minimum value present in the Age column.
min_age = min(age)
min_age
[1] 18
Next, use the formula from class to detect any outliers. An outlier is a value that “lies outside” most of the other values in a set of data. A common way to estimate the upper and lower thresholds is to take the mean (+ or -) 3 * standard deviation. Try using this formula to find the upper and lower limits for age.
#Use the formula "mean (+ or -) 3 * standard deviation" to calculate the lower and upper thresholds
lower_age = mean_age - 3 * sd_age
lower_age
[1] 4.125023
upper_age = mean_age + 3 * sd_age
upper_age
[1] 70.04322
Because the minimum (18) and maximum (68) both lie inside these limits, this rule finds no outliers in Age.
Another method for finding the upper and lower thresholds, often taught in introductory statistics courses, uses the interquartile range (IQR): the distance between the first quartile (25th percentile) and the third quartile (75th percentile). First, look at the quartiles of age:
quantile(age)
0% 25% 50% 75% 100%
18 28 36 45 68
The formulas below calculate the thresholds, which are the boundaries that determine whether a value is an outlier. If a value falls above the upper threshold (third quartile + 1.5 * IQR) or below the lower threshold (first quartile - 1.5 * IQR), it is an outlier.
Below are the upper and lower thresholds:
#Extract the first quartile (25%) and the third quartile (75%)
lowerq = quantile(age)[2]
upperq = quantile(age)[4]
#The IQR is the third quartile minus the first quartile; the thresholds use it instead of the mean and sd
qrange = upperq - lowerq
qrange
75%
17
upperthreshold = (qrange * 1.5) + upperq
upperthreshold
75%
70.5
lowerthreshold = lowerq - (qrange * 1.5)
lowerthreshold
25%
2.5
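Are there any outliers? How many? To check, index age with each threshold: a result of integer(0) means that no values lie beyond that threshold. Here both checks return integer(0), so Age has no outliers. It can also be useful to visualize the data using a box-and-whisker plot. The boxplot below is consistent with the IQR of 17 and the upper and lower thresholds found above.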
age[age>upperthreshold]
integer(0)
age[age<lowerthreshold]
integer(0)
boxplot(age)
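Both outlier rules above can be wrapped as small reusable functions, so the same checks can be run on other numeric columns. This is a minimal sketch, not the worksheet's method: the names sd_outliers and iqr_outliers, their k and coef arguments, and the example vectors are illustrative assumptions.

```r
# Rule 1: values outside mean +/- k * standard deviation (k = 3 in the worksheet).
sd_outliers <- function(x, k = 3) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  lower <- m - k * s
  upper <- m + k * s
  list(lower = lower, upper = upper,
       outliers = x[!is.na(x) & (x < lower | x > upper)])
}

# Rule 2: values outside Q1 - coef * IQR and Q3 + coef * IQR (coef = 1.5 in the worksheet).
iqr_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - coef * iqr
  upper <- q[2] + coef * iqr
  list(lower = lower, upper = upper,
       outliers = x[!is.na(x) & (x < lower | x > upper)])
}

# Made-up data: twenty 10s and one 100 for rule 1; 1 to 10 plus 50 for rule 2.
sd_result <- sd_outliers(c(rep(10, 20), 100))
sd_result$outliers    # 100
iqr_result <- iqr_outliers(c(1:10, 50))
iqr_result$lower      # -4
iqr_result$upper      # 16
iqr_result$outliers   # 50
```

Called on age, both functions should return integer(0) as their outliers, matching the checks above. One design note: boxplot() draws its whiskers with boxplot.stats(), which uses Tukey's hinges from fivenum() rather than quantile(), so on some data the plotted whisker limits can differ slightly from these thresholds.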
Next, we read the original dataset into R. This file (named ‘creditriskorg.csv’ in the instructions, and read from data/scoring_original.csv below) contains missing values.
originalscoringdata = read.csv(file="data/scoring_original.csv")
head(originalscoringdata)
mydata = read.csv(file="data/scoring_original.csv", skip=1)
head(mydata)
To calculate the mean in R, follow Worksheet 1: extract the column first, and then find the average with R's built-in mean() function. The worksheet names the Checking column; the commands below use the Price column, and the same steps apply. What happens when we try to use the function?
price = originalscoringdata$Price
price
[1] $846.00 $1,658.00 $2,985.00 $1,325.00 $910.00 $1,645.00 $1,800.00 $1,093.00 $1,957.00
[10] $1,468.00 $1,577.00 $915.00 $1,650.00 $940.00 $500.00 $1,186.00 $2,201.00 $1,350.00
[19] $1,511.00 $1,253.00 $2,189.00 $- $1,159.00 $1,332.00 $1,497.00 $1,357.00 $2,100.00
[28] $1,070.00 $2,557.00 $1,600.00 $- $1,312.00 $400.00 $650.00 $1,394.00 $1,542.00
[37] $1,200.00 $1,560.00 $1,200.00 $950.00 $- $1,300.00 $1,700.00 $1,167.00 $1,150.00
[46] $1,566.00 $1,552.00 $1,300.00 $2,104.00 $2,154.00 $545.00 $1,778.00 $1,718.00 $1,500.00
[55] $1,274.00 $1,015.00 $1,701.00 $1,345.00 $1,238.00 $1,048.00 $1,048.00 $1,324.00 $2,201.00
[64] $1,430.00 $926.00 $1,500.00 $1,542.00 $1,360.00 $1,000.00 $1,564.00 $1,205.00 $2,200.00
[73] $2,214.00 $1,137.00 $1,025.00 $1,593.00 $1,735.00 $1,132.00 $1,338.00 $1,100.00 $2,276.00
[82] $1,360.00 $1,436.00 $612.00 $1,100.00 $1,300.00 $2,089.00 $2,262.00 $1,054.00 $1,426.00
[91] $1,048.00 $1,292.00 $1,177.00 $1,490.00 $850.00 $1,011.00 $1,571.00 $1,000.00 $270.00
[100] $1,382.00 $1,713.00 $1,086.00 $3,262.00 $1,610.00 $1,030.00 $650.00 $2,175.00 $1,780.00
[109] $950.00 $1,211.00 $1,650.00 $350.00 $1,117.00 $2,468.00 $1,740.00 $1,210.00 $963.00
[118] $1,646.00 $800.00 $470.00 $1,370.00 $839.00 $1,346.00 $1,350.00 $2,100.00 $1,307.00
[127] $1,126.00 $4,786.00 $1,478.00 $1,568.00 $1,056.00 $1,608.00 $1,179.00 $1,524.00 $1,730.00
[136] $823.00 $800.00 $1,482.00 $1,110.00 $1,555.00 $1,170.00 $886.00 $1,395.00 $2,084.00
[145] $8,800.00 $1,160.00 $1,211.00 $1,300.00 $850.00 $1,204.00 $1,528.00 $1,298.00 $600.00
[154] $1,320.00 $1,300.00 $1,380.00 $1,426.00 $5,200.00 $1,503.00 $1,449.00 $1,422.00 $1,430.00
[163] $1,578.00 $1,250.00 $2,260.00 $280.00 $1,298.00 $945.00 $1,694.00 $700.00 $1,290.00
[172] $1,581.00 $1,383.00 $1,637.00 $1,275.00 $1,400.00 $832.00 $2,886.00 $1,257.00 $1,346.00
[181] $1,138.00 $1,801.00 $1,200.00 $950.00 $1,589.00 $1,617.00 $1,200.00 $2,194.00 $1,642.00
[190] $1,423.00 $1,600.00 $700.00 $300.00 $1,165.00 $1,365.00 $1,706.00 $2,053.00 $1,700.00
[199] $1,462.00 $1,300.00 $1,122.00 $1,257.00 $1,387.00 $1,266.00 $2,195.00 $2,004.00 $1,203.00
[208] $1,138.00 $1,374.00 $1,514.00 $750.00 $1,894.00 $1,048.00 $1,369.00 $1,035.00 $1,218.00
[217] $1,637.00 $953.00 $1,212.00 $1,218.00 $1,419.00 $1,100.00 $987.00 $2,400.00 $1,375.00
[226] $1,595.00 $1,054.00 $1,651.00 $1,542.00 $2,014.00 $2,624.00 $1,135.00 $1,105.00 $1,500.00
[235] $1,200.00 $1,555.00 $2,251.00 $1,542.00 $2,186.00 $1,700.00 $1,180.00 $800.00 $1,668.00
[244] $1,750.00 $875.00 $1,514.00 $1,950.00 $1,462.00 $1,098.00 $840.00 $1,634.00 $1,603.00
[253] $1,200.00 $1,360.00 $1,236.00 $1,395.00 $1,390.00 $1,355.00 $1,604.00 $1,241.00 $375.00
[262] $1,045.00 $1,781.00 $900.00 $1,698.00 $1,048.00 $2,200.00 $1,119.00 $1,090.00 $4,100.00
[271] $1,400.00 $2,010.00 $2,014.00 $1,342.00 $1,132.00 $1,048.00 $1,218.00 $1,114.00 $1,206.00
[280] $2,132.00 $570.00 $1,094.00 $550.00 $1,255.00 $4,138.00 $1,714.00 $1,150.00 $2,673.00
[289] $1,630.00 $1,544.00 $1,195.00 $1,206.00 $2,050.00 $1,513.00 $3,300.00 $1,250.00 $1,339.00
[298] $875.00 $1,101.00 $1,568.00 $1,194.00 $1,378.00 $1,139.00 $350.00 $1,406.00 $1,497.00
[307] $1,600.00 $1,350.00 $2,032.00 $2,610.00 $1,193.00 $1,275.00 $3,750.00 $2,500.00 $1,790.00
[316] $1,419.00 $1,480.00 $1,542.00 $1,850.00 $1,380.00 $1,888.00 $1,672.00 $808.00 $1,107.00
[325] $1,335.00 $1,390.00 $1,363.00 $1,482.00 $2,357.00 $850.00 $1,180.00 $1,617.00 $1,114.00
[334] $1,976.00 $1,095.00 $1,380.00 $1,529.00 $1,263.00 $1,600.00 $1,400.00 $1,520.00 $1,617.00
[343] $1,500.00 $1,705.00 $1,292.00 $1,294.00 $1,936.00 $2,143.00 $1,394.00 $425.00 $1,600.00
[352] $1,563.00 $1,284.00 $990.00 $1,125.00 $1,267.00 $1,780.00 $1,298.00 $1,600.00 $1,340.00
[361] $1,244.00 $1,403.00 $1,980.00 $1,290.00 $1,339.00 $1,559.00 $1,078.00 $1,113.00 $2,290.00
[370] $1,578.00 $1,039.00 $1,218.00 $1,329.00 $500.00 $1,384.00 $1,553.00 $1,626.00 $1,674.00
[379] $1,500.00 $1,421.00 $480.00 $1,274.00 $1,580.00 $1,300.00 $1,314.00 $957.00 $1,500.00
[388] $1,504.00 $935.00 $1,028.00 $1,440.00 $1,030.00 $1,366.00 $1,768.00 $2,340.00 $1,700.00
[397] $1,450.00 $1,155.00 $1,191.00 $841.00 $2,441.00 $1,516.00 $1,591.00 $1,999.00 $1,336.00
[406] $1,600.00 $1,850.00 $1,775.00 $1,224.00 $1,663.00 $1,500.00 $2,225.00 $1,930.00 $1,607.00
[415] $1,230.00 $1,581.00 $1,072.00 $1,471.00 $1,300.00 $1,192.00 $2,700.00 $500.00 $1,720.00
[424] $1,770.00 $1,197.00 $950.00 $450.00 $1,707.00 $1,201.00 $1,355.00 $1,196.00 $1,297.00
[433] $1,420.00 $1,036.00 $1,201.00 $1,527.00 $1,500.00 $1,057.00 $2,127.00 $2,218.00 $1,570.00
[442] $650.00 $1,175.00 $825.00 $2,152.00 $978.00 $1,650.00 $2,150.00 $1,771.00 $1,615.00
[451] $994.00 $1,147.00 $1,194.00 $825.00 $1,128.00 $360.00 $1,190.00 $1,580.00 $1,600.00
[460] $1,735.00 $2,178.00 $1,589.00 $1,828.00 $1,566.00 $1,250.00 $1,626.00 $1,414.00 $1,687.00
[469] $1,638.00 $1,296.00 $1,460.00 $1,400.00 $911.00 $1,559.00 $1,315.00 $2,600.00 $1,208.00
[478] $975.00 $1,274.00 $1,197.00 $1,092.00 $1,100.00 $1,202.00 $2,646.00 $1,816.00 $1,700.00
[487] $1,700.00 $984.00 $1,387.00 $1,636.00 $1,135.00 $1,453.00 $1,500.00 $1,350.00 $2,247.00
[496] $1,367.00 $950.00 $2,154.00 $1,100.00 $1,492.00 $1,351.00 $2,040.00 $1,037.00 $1,202.00
[505] $1,683.00 $1,000.00 $1,100.00 $1,734.00 $1,604.00 $1,330.00 $877.00 $1,759.00 $1,564.00
[514] $1,306.00 $1,108.00 $1,500.00 $1,645.00 $1,500.00 $2,625.00 $1,456.00 $2,178.00 $1,642.00
[523] $1,265.00 $325.00 $1,471.00 $1,440.00 $2,140.00 $1,377.00 $1,780.00 $818.00 $1,017.00
[532] $1,788.00 $1,212.00 $1,800.00 $1,394.00 $400.00 $940.00 $1,433.00 $1,439.00 $2,414.00
[541] $1,571.00 $1,500.00 $1,258.00 $1,403.00 $1,346.00 $2,195.00 $1,637.00 $2,375.00 $873.00
[550] $1,450.00 $1,607.00 $1,376.00 $1,092.00 $1,307.00 $1,128.00 $1,698.00 $1,075.00 $1,920.00
[559] $1,298.00 $1,362.00 $1,770.00 $2,022.00 $1,225.00 $724.00 $1,346.00 $1,415.00 $1,380.00
[568] $1,500.00 $900.00 $1,639.00 $2,220.00 $1,600.00 $1,270.00 $1,556.00 $1,878.00 $1,532.00
[577] $900.00 $1,100.00 $1,556.00 $1,054.00 $1,285.00 $1,338.00 $1,571.00 $1,556.00 $1,194.00
[586] $1,556.00 $1,989.00 $2,624.00 $925.00 $1,128.00 $2,220.00 $1,395.00 $1,137.00 $875.00
[595] $1,417.00 $700.00 $1,207.00 $1,268.00 $1,326.00 $1,480.00 $1,228.00 $2,168.00 $2,100.00
[604] $1,197.00 $510.00 $325.00 $1,000.00 $1,748.00 $1,392.00 $820.00 $1,275.00 $1,758.00
[613] $1,250.00 $1,560.00 $1,570.00 $1,045.00 $500.00 $1,800.00 $1,600.00 $1,177.00 $1,371.00
[622] $1,100.00 $1,256.00 $1,469.00 $1,682.00 $1,150.00 $675.00 $1,236.00 $1,265.00 $1,204.00
[631] $1,353.00 $1,330.00 $1,200.00 $1,488.00 $1,695.00 $1,547.00 $1,112.00 $1,062.00 $1,567.00
[640] $900.00 $1,280.00 $1,340.00 $1,490.00 $2,640.00 $900.00 $1,569.00 $1,937.00 $1,403.00
[649] $1,501.00 $1,255.00 $1,345.00 $2,134.00 $1,590.00 $400.00 $1,406.00 $1,872.00 $1,123.00
[658] $1,067.00 $2,253.00 $580.00 $1,992.00 $1,400.00 $1,284.00 $1,749.00 $2,125.00 $1,505.00
[667] $1,376.00 $2,154.00 $1,138.00 $1,931.00 $1,698.00 $1,279.00 $2,755.00 $1,600.00 $2,200.00
[676] $2,260.00 $1,904.00 $1,580.00 $2,810.00 $700.00 $1,100.00 $1,910.00 $1,360.00 $1,030.00
[685] $460.00 $1,339.00 $525.00 $1,315.00 $1,517.00 $1,330.00 $1,290.00 $1,900.00 $1,366.00
[694] $4,063.00 $1,064.00 $1,606.00 $1,570.00 $1,381.00 $816.00 $1,150.00 $2,215.00 $1,586.00
[703] $1,469.00 $1,556.00 $1,720.00 $1,650.00 $1,191.00 $1,127.00 $1,100.00 $1,480.00 $1,258.00
[712] $1,544.00 $1,404.00 $2,053.00 $1,575.00 $1,430.00 $1,409.00 $962.00 $1,400.00 $1,571.00
[721] $1,200.00 $1,200.00 $1,842.00 $841.00 $1,585.00 $957.00 $950.00 $1,375.00 $1,800.00
[730] $2,212.00 $1,133.00 $1,570.00 $1,569.00 $1,590.00 $1,734.00 $450.00 $2,008.00 $1,150.00
[739] $630.00 $1,571.00 $1,770.00 $1,600.00 $1,627.00 $1,094.00 $1,570.00 $935.00 $2,259.00
[748] $1,419.00 $820.00 $1,060.00 $600.00 $1,372.00 $758.00 $1,164.00 $1,450.00 $2,125.00
[757] $1,557.00 $1,700.00 $1,571.00 $1,654.00 $350.00 $420.00 $2,173.00 $6,802.00 $1,061.00
[766] $1,425.00 $546.00 $3,400.00 $1,776.00 $1,200.00 $1,403.00 $1,086.00 $1,200.00 $2,200.00
[775] $2,360.00 $1,758.00 $1,783.00 $1,108.00 $1,136.00 $1,557.00 $886.00 $1,490.00 $400.00
[784] $1,105.00 $1,131.00 $1,256.00 $1,571.00 $1,443.00 $1,160.00 $896.00 $1,639.00 $1,571.00
[793] $1,075.00 $1,590.00 $2,200.00 $450.00 $1,536.00 $1,375.00 $1,005.00 $1,264.00 $1,313.00
[802] $1,365.00 $1,105.00 $1,100.00 $1,480.00 $1,310.00 $1,265.00 $1,474.00 $980.00 $600.00
[811] $1,635.00 $1,376.00 $675.00 $2,000.00 $978.00 $1,130.00 $2,600.00 $950.00 $1,493.00
[820] $2,240.00 $1,224.00 $1,000.00 $2,500.00 $1,120.00 $1,242.00 $725.00 $720.00 $2,470.00
[829] $1,150.00 $1,655.00 $1,389.00 $1,310.00 $758.00 $1,770.00 $425.00 $1,035.00 $1,226.00
[838] $1,750.00 $1,800.00 $2,150.00 $1,713.00 $975.00 $650.00 $916.00 $625.00 $1,318.00
[847] $750.00 $325.00 $1,014.00 $1,525.00 $1,144.00 $550.00 $1,556.00 $1,396.00 $2,470.00
[856] $1,890.00 $575.00 $1,294.00 $710.00 $2,600.00 $1,849.00 $450.00 $1,369.00 $2,281.00
[865] $4,575.00 $1,332.00 $1,240.00 $931.00 $1,330.00 $1,192.00 $950.00 $1,900.00 $1,810.00
[874] $1,700.00 $1,318.00 $750.00 $1,168.00 $1,300.00 $1,111.00 $1,264.00 $1,692.00 $1,123.00
[883] $1,386.00 $1,395.00 $1,127.00 $960.00 $2,179.00 $886.00 $1,314.00 $2,800.00 $960.00
[892] $873.00 $1,111.00 $1,040.00 $1,995.00 $1,333.00 $1,358.00 $1,613.00 $1,855.00 $2,271.00
[901] $925.00 $1,950.00 $2,058.00 $1,600.00 $860.00 $875.00 $1,710.00 $1,260.00 $1,705.00
[910] $1,307.00 $1,982.00 $1,479.00 $1,308.00 $2,201.00 $1,350.00 $1,386.00 $2,276.00 $1,608.00
[919] $1,585.00 $1,320.00 $3,190.00 $1,500.00 $1,571.00 $1,386.00 $1,680.00 $450.00 $1,488.00
[928] $1,396.00 $1,255.00 $1,883.00 $375.00 $3,070.00 $2,189.00 $1,692.00 $2,216.00 $1,298.00
[937] $750.00 $1,830.00 $1,242.00 $882.00 $1,137.00 $275.00 $580.00 $1,403.00 $957.00
[946] $1,300.00 $1,097.00 $1,033.00 $913.00 $962.00 $1,123.00 $1,400.00 $1,800.00 $1,980.00
[955] $989.00 $1,250.00 $1,646.00 $1,114.00 $1,365.00 $1,600.00 $1,133.00 $550.00 $350.00
[964] $1,321.00 $1,225.00 $950.00 $550.00 $1,277.00 $1,172.00 $1,318.00 $2,185.00 $2,610.00
[973] $1,257.00 $600.00 $1,408.00 $2,413.00 $1,388.00 $1,636.00 $850.00 $1,757.00 $1,600.00
[982] $2,022.00 $1,200.00 $1,550.00 $710.00 $1,286.00 $1,400.00 $1,638.00 $1,700.00 $1,850.00
[991] $1,677.00 $1,900.00 $1,601.00 $1,200.00 $1,456.00 $1,456.00 $960.00 $1,337.00 $1,359.00
[1000] $801.00
[ reached getOption("max.print") -- omitted 3459 entries ]
1419 Levels: $- $1,000.00 $1,001.00 $1,003.00 $1,005.00 $1,007.00 $1,008.00 $1,011.00 ... $999.00
mean_price = mean(price)
argument is not numeric or logical: returning NA
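To resolve the problem, we must first understand where it comes from. Because of the characters described below, read.csv() stored Price as a factor (note the ‘1419 Levels’ line), and mean() cannot average a factor, so it returns NA with a warning. First, there are missing values in the csv file, shown as ‘$-’; this is quite common, since most datasets are not perfect. Second, the values contain commas carried over from the Excel spreadsheet, and R does not recognize that ‘1,234’ is equivalent to ‘1234’. Lastly, every value includes a ‘$’ symbol, which is not a numeric character either.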
The sub() function replaces the first match of a pattern in each value with something else. So, to remove the comma in the number “1,234”, we substitute it with an empty string ("").
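Note that sub() changes only the first match in each value, so a value with two commas, such as “$1,234,567”, would keep its second comma; gsub() replaces every match. A small comparison on made-up values:

```r
values <- c("$1,234", "$1,234,567", "$-")

sub(",", "", values)      # "$1234" "$1234,567" "$-"  (first comma only)
gsub(",", "", values)     # "$1234" "$1234567" "$-"   (every comma)
gsub("[$,]", "", values)  # "1234" "1234567" "-"      (inside [ ], "$" is a literal character)
```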
As shown on the worksheet, type the exact commands below to clean the values and find the mean with the NA values removed.
#Take elements 17 to 25 of price as a small sample to clean
cleanscoring = price[17:25]
#Substitute the comma with an empty string
cleanscoring = sub(",", "", cleanscoring)
#Substitute the dollar sign with an empty string ("\\$" escapes $, which is special in patterns)
cleanscoring = sub("\\$", "", cleanscoring)
class(cleanscoring)
[1] "character"
#Convert to numeric; the missing value "-" becomes NA
cleanscoring = as.numeric(cleanscoring)
NAs introduced by coercion
class(cleanscoring)
[1] "numeric"
cleanscoring
[1] 2201 1350 1511 1253 2189 NA 1159 1332 1497
#Mean with the NA removed
mean(cleanscoring, na.rm = TRUE)
[1] 1561.5
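The commands above clean only elements 17 to 25 of price. A minimal sketch of applying the same steps to a whole column follows; clean_currency is a hypothetical helper name, it uses gsub() so that every ‘$’ and ‘,’ is removed, and suppressWarnings() only hides the expected coercion warning for the ‘-’ placeholder.

```r
# Hypothetical helper: turn currency text such as "$1,234.00" into numbers.
clean_currency <- function(x) {
  x <- as.character(x)             # factor levels become plain strings
  x <- gsub("[$,]", "", x)         # drop every "$" and ","
  suppressWarnings(as.numeric(x))  # text that is not a number, such as "-", becomes NA
}

# Made-up sample in the same format as the Price column
example <- factor(c("$2,201.00", "$1,350.00", "$-", "$1,159.00"))
cleaned <- clean_currency(example)
cleaned                       # 2201 1350 NA 1159
mean(cleaned, na.rm = TRUE)   # 1570
```

On the real data, mean(clean_currency(originalscoringdata$Price), na.rm = TRUE) should give the mean of the whole Price column, and the same helper should also work for a currency column such as the taxi fares.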
What are some other ways to clean this data in R? How about in Excel?
In R, you can use find-and-replace functions such as sub() and gsub(), and you can also make changes to the data while you are “importing” it. In Excel, you can use the Find and Replace tool, format the data, or use functions that automate the process for you.
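One way to clean while importing in R is read.csv()'s na.strings argument, which turns the listed strings into NA as the file is read. A minimal sketch on made-up inline CSV text; it assumes the missing entries appear in the file exactly as ‘$-’, which should be checked against the raw file.

```r
# Made-up CSV text standing in for a file on disk
csv_text <- 'Id,Price
1,"$1,234.00"
2,$-
3,$950.00'

imported <- read.csv(text = csv_text, na.strings = "$-")
# The "$-" entry is already NA; "$" and "," still need to be removed
imported$Price <- as.numeric(gsub("[$,]", "", imported$Price))
imported$Price                       # 1234 NA 950
mean(imported$Price, na.rm = TRUE)   # 1092
```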
Now, we will look at Chicago taxi data. Go and explore the interactive dashboard and read the description of the data.
Chicago Taxi Dashboard: https://data.cityofchicago.org/Transportation/Taxi-Trips-Dashboard/spcw-brbq
Chicago Taxi Data Description: http://digital.cityofchicago.org/index.php/chicago-taxi-data-released/
Open the csv file located in the data folder in RStudio, and note the size of the file and the number of rows and columns. Use the functions learned in lab00 and lab01 to describe the data, identify unique entities and fields, and summarize the data.
taxidata = read.csv(file="data/taxi_trips_sample.csv")
head(taxidata)
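A minimal sketch of the describe-and-summarize step. It uses a small made-up data frame so it runs on its own; the column names Taxi.ID and Fare are assumptions about the real file (read.csv() turns spaces in headers into dots, so a header such as ‘Taxi ID’ would become Taxi.ID). On the real file, file.size("data/taxi_trips_sample.csv") gives its size in bytes.

```r
# Made-up stand-in for taxidata
taxi <- data.frame(
  Taxi.ID = c("a", "b", "a", "c"),
  Fare = c("$7.05", "$6.05", "$31.25", "$5.50"),
  stringsAsFactors = FALSE
)

dim(taxi)                      # number of rows and columns
nrow(taxi)
ncol(taxi)
names(taxi)                    # field names
str(taxi)                      # type of each field
length(unique(taxi$Taxi.ID))   # number of distinct taxis (unique entities)
summary(taxi)
```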
fare = taxidata$Fare
fare
[1] $7.05 $6.05 $7.05 $31.25 $5.50 $9.25 $9.05 $30.45 $18.25 $17.25 $8.05 $21.25 $6.85
[14] $10.45 $7.45 $6.25 $7.45 $7.25 $10.05 $13.25 $35.25 $11.65 $3.25 $9.05 $14.25 $15.85
[27] $14.85 $6.85 $38.85 $5.65 $6.45 $15.45 $3.25 $9.75 $14.85 $16.25 $3.45 $6.50 $5.45
[40] $9.65 $107.45 $13.45 $35.04 $6.25 $13.65 $4.85 $5.05 $9.00 $13.05 $5.45 $10.25 $6.45
[53] $16.25 $8.65 $5.65 $7.05 $8.25 $9.65 $5.25 $7.00 $6.65 $5.85 $9.44 $5.50 $9.25
[66] $16.85 $4.65 $9.65 $39.25 $10.65 $4.84 $18.65 $5.85 $6.45 $18.85 $7.05 $7.25 $9.25
[79] $7.45 $4.25 $17.45 $4.50 $35.25 $7.45 $9.45 $11.45 $9.65 $4.25 $4.25 $6.65 $13.45
[92] $7.50 $5.65 $4.84 $7.45 $4.25 $38.00 $11.50 $8.75 $8.00 $4.45 $14.25 $6.65 $7.05
[105] $12.65 $4.45 $9.85 $5.05 $9.25 $31.45 $6.65 $10.65 $10.85 $5.25 $11.25 $5.45 $6.85
[118] $6.45 $7.85 $17.25 $8.65 $8.85 $5.05 $7.65 $6.45 $11.05 $8.65 $8.05 $37.45 $8.50
[131] $16.85 $4.84 $6.65 $9.45 $7.05 $7.25 $13.85 $15.45 $4.25 $12.45 $23.85 $6.85 $24.25
[144] $38.25 $7.05 $17.85 $7.25 $37.05 $5.65 $6.85 $10.75 $5.25 $5.25 $13.45 $36.65 $9.25
[157] $7.45 $8.25 $10.25 $35.05 $7.25 $14.00 $4.85 $8.25 $7.05 $8.85 $5.85 $7.45 $6.00
[170] $16.25 $3.45 $5.50 $9.25 $6.85 $7.05 $8.65 $8.05 $5.65 $5.25 $3.25 $36.25 $6.65
[183] $8.05 $4.45 $8.45 $5.45 $23.85 $7.05 $15.05 $6.75 $5.75 $9.05 $12.45 $9.65 $8.45
[196] $7.45 $31.45 $6.65 $10.25 $7.65 $7.65 $7.25 $6.45 $90.95 $20.45 $5.75 $3.25 $3.45
[209] $9.44 $9.75 $7.00 $8.85 $44.85 $7.65 $10.25 $10.25 $5.25 $8.25 $8.85 $6.05 $4.05
[222] $9.25 $13.05 $32.25 $6.05 $3.25 $6.25 $22.50 $6.25 $13.45 $5.85 $5.45 $4.85 $11.65
[235] $4.85 $25.65 $5.05 $6.25 $7.25 $4.05 $4.25 $5.00 $17.25 $16.00 $12.05 $44.75 $12.65
[248] $5.45 $48.65 $5.45 $4.25 $4.65 $6.25 $7.45 $5.85 $10.45 $7.45 $4.65 $58.45 $22.85
[261] $11.00 $14.45 $6.25 $7.05 $19.45 $11.45 $11.85 $12.25 $7.05 $4.65 $4.50 $6.45 $8.25
[274] $15.25 $35.04 $3.45 $36.45 $20.45 $4.65 $6.25 $36.05 $5.05 $16.85 $8.25 $5.75 $12.45
[287] $7.65 $10.45 $7.25 $4.25 $8.25 $7.05 $6.05 $11.85 $21.05 $14.00 $6.75 $20.05 $8.44
[300] $46.50 $4.25 $38.85 $7.85 $4.85 $7.05 $5.85 $4.45 $13.65 $8.45 $33.05 $5.05 $10.85
[313] $5.05 $11.65 $5.75 $4.65 $5.25 $7.65 $3.85 $4.45 $36.05 $6.65 $6.25 $13.05 $4.75
[326] $4.45 $37.05 $10.45 $30.25 $24.45 $8.25 $17.05 $8.65 $13.25 $20.75 $6.00 $28.65 $7.05
[339] $5.45 $15.25 $15.00 $6.00 $36.85 $7.75 $8.85 $7.25 $12.65 $14.05 $15.45 $52.50 $6.45
[352] $16.65 $8.25 $5.65 $27.05 $12.25 $10.25 $5.05 $32.05 $8.25 $7.05 $19.65 $29.45 $3.25
[365] $7.85 $4.65 $6.25 $8.65 $5.85 $9.85 $8.05 $16.75 $3.25 $4.65 $3.25 $9.25 $7.25
[378] $10.85 $7.85 $9.85 $9.65 $40.25 $8.05 $10.25 $11.75 $23.85 $30.05 $7.05 $5.25 $8.85
[391] $6.65 $7.05 $4.65 $7.50 $9.25 $17.25 $9.85 $5.45 $32.05 $34.25 $9.45 $13.05 $5.65
[404] $3.25 $34.85 $13.05 $4.85 $3.25 $6.05 $10.25 $14.65 $16.25 $13.25 $14.65 $7.45 $67.00
[417] $7.45 $5.45 $6.25 $10.05 $6.85 $4.65 $12.50 $6.45 $11.25 $5.25 $9.75 $30.25 $7.85
[430] $6.50 $5.45 $10.25 $6.25 $7.45 $6.05 $5.45 $8.50 $5.45 $5.25 $8.44 $10.85 $3.25
[443] $7.05 $9.25 $8.50 $17.85 $4.85 $0.00 $17.64 $7.85 $4.05 $7.25 $35.65 $15.05 $6.25
[456] $7.25 $39.45 $37.25 $5.85 $9.85 $26.85 $3.25 $7.85 $6.15 $5.25 $64.25 $6.05 $8.25
[469] $35.85 $10.45 $12.75 $4.45 $7.00 $7.05 $5.85 $37.45 $7.45 $5.85 $6.65 $3.25 $16.00
[482] $6.45 $8.25 $0.00 $10.05 $9.65 $26.85 $5.25 $10.75 $5.05 $5.65 $8.65 $9.25 $7.85
[495] $5.05 $7.05 $10.50 $6.45 $38.45 $32.85 $8.44 $4.45 $4.25 $7.05 $11.45 $10.45 $37.45
[508] $18.25 $13.25 $15.65 $8.05 $17.25 $35.65 $7.45 $6.05 $6.75 $3.25 $6.50 $10.65 $5.45
[521] $9.85 $6.50 $11.45 $7.85 $12.75 $9.75 $11.65 $33.65 $9.25 $5.25 $37.05 $7.45 $42.75
[534] $5.85 $19.85 $7.05 $7.05 $5.85 $12.25 $7.05 $5.25 $12.65 $11.25 $10.85 $7.05 $28.85
[547] $45.00 $16.05 $5.85 $6.65 $5.25 $5.25 $40.45 $5.00 $16.45 $7.45 $12.65 $13.25 $5.25
[560] $11.65 $45.65 $5.85 $6.00 $6.50 $17.45 $2.85 $7.45 $4.50 $8.65 $6.05 $8.45 $35.25
[573] $5.45 $7.75 $8.05 $9.05 $7.85 $5.65 $7.25 $7.25 $20.85 $6.85 $6.05 $6.00 $10.00
[586] $14.45 $12.45 $36.85 $8.25 $26.85 $11.65 $7.00 $8.05 $89.85 $6.25 $37.05 $8.25 $31.25
[599] $9.50 $6.45 $33.75 $5.45 $29.85 $8.25 $5.50 $13.45 $48.50 $5.45 $4.85 $7.05 $38.65
[612] $5.85 $9.65 $7.25 $10.75 $34.65 $8.25 $6.25 $11.45 $19.45 $8.45 $7.85 $7.65 $8.65
[625] $6.85 $8.65 $5.45 $6.05 $17.25 $10.65 $7.25 $12.45 $19.45 $9.00 $10.45 $8.00 $22.65
[638] $28.85 $3.45 $5.45 $8.05 $6.45 $9.65 $12.45 $5.25 $36.65 $11.25 $37.25 $5.45 $4.05
[651] $45.85 $11.75 $11.45 $11.05 $5.85 $24.05 $44.25 $14.85 $15.85 $5.85 $22.65 $6.45 $6.25
[664] $6.05 $30.65 $6.45 $6.25 $10.25 $8.45 $12.00 $7.45 $4.05 $6.25 $4.45 $8.25 $9.05
[677] $6.45 $4.25 $8.65 $6.05 $9.44 $6.45 $30.05 $27.25 $6.85 $6.50 $8.25 $35.45 $10.25
[690] $10.05 $12.65 $5.25 $6.45 $4.25 $5.85 $11.65 $4.65 $15.45 $6.65 $11.25 $5.05 $9.45
[703] $4.65 $11.45 $6.05 $5.25 $5.65 $18.25 $7.45 $14.65 $7.85 $6.25 $41.05 $15.05 $5.25
[716] $6.05 $5.65 $18.05 $36.85 $8.45 $10.45 $5.45 $8.85 $7.75 $15.65 $13.25 $6.85 $8.45
[729] $12.45 $8.25 $9.00 $8.05 $5.45 $8.65 $4.85 $11.65 $5.75 $23.65 $12.50 $14.05 $6.65
[742] $8.65 $39.25 $6.85 $10.05 $16.45 $5.45 $10.45 $5.45 $8.85 $6.05 $9.65 $82.25 $4.85
[755] $6.25 $8.65 $10.65 $5.05 $8.25 $5.25 $14.75 $5.65 $8.65 $42.00 $6.05 $7.05 $7.25
[768] $38.00 $6.65 $9.05 $6.25 $30.65 $6.50 $7.05 $25.85 $7.05 $10.85 $9.05 $6.00 $8.44
[781] $6.85 $5.05 $9.25 $6.50 $5.00 $8.85 $12.05 $11.25 $5.85 $10.45 $5.45 $6.25 $12.85
[794] $8.25 $7.85 $6.05 $8.25 $39.75 $5.25 $20.65 $8.45 $6.45 $6.85 $8.85 $10.85 $25.45
[807] $5.85 $18.45 $5.65 $12.65 $4.65 $7.75 $7.45 $36.25 $6.50 $38.25 $7.85 $7.85 $13.25
[820] $4.85 $4.65 $6.50 $3.25 $6.45 $7.25 $20.45 $29.05 $9.25 $7.25 $8.00 $4.84 $5.25
[833] $8.65 $5.25 $35.25 $5.65 $5.25 $36.50 $29.05 $10.85 $11.25 $8.00 $6.05 $37.65 $9.05
[846] $12.05 $4.75 $10.25 $7.05 $5.65 $26.25 $85.25 $5.45 $13.50 $10.00 $7.05 $6.25 $15.25
[859] $4.45 $4.84 $35.00 $5.25 $6.05 $8.45 $5.25 $8.85 $7.05 $5.65 $5.65 $7.85 $5.45
[872] $7.85 $10.25 $11.50 $6.85 $41.05 $5.65 $8.05 $7.00 $7.25 $5.65 $36.50 $10.50 $4.65
[885] $4.65 $10.05 $8.05 $6.45 $4.45 $27.85 $5.45 $7.85 $4.85 $12.05 $4.65 $9.05 $6.45
[898] $12.25 $7.05 $5.25 $9.75 $4.65 $7.25 $3.25 $9.00 $5.25 $4.84 $17.85 $8.25 $12.05
[911] $4.65 $14.25 $7.05 $4.50 $34.85 $7.45 $12.05 $3.45 $7.45 $5.65 $8.85 $8.85 $5.05
[924] $9.25 $5.65 $4.85 $9.65 $30.45 $12.85 $7.75 $6.25 $13.45 $4.45 $35.65 $17.75 $6.85
[937] $39.45 $9.05 $5.25 $11.75 $6.85 $11.65 $7.25 $7.45 $12.85 $7.45 $3.25 $8.50 $7.05
[950] $4.00 $8.25 $6.45 $5.65 $5.65 $5.65 $6.45 $5.65 $7.85 $22.25 $12.25 $37.45 $3.25
[963] $36.45 $21.85 $10.05 $5.85 $5.85 $10.45 $15.45 $6.65 $15.05 $34.45 $18.45 $8.25 $5.25
[976] $10.75 $13.85 $7.05 $15.65 $5.25 $33.00 $3.25 $17.45 $5.25 $4.05 $5.50 $9.05 $5.05
[989] $10.25 $6.25 $5.85 $9.85 $15.05 $10.45 $5.45 $27.45 $11.50 $6.75 $4.85 $6.45
[ reached getOption("max.print") -- omitted 98999 entries ]
892 Levels: $0.00 $0.01 $0.03 $0.05 $0.10 $0.11 $0.28 $0.30 $0.32 $0.34 $0.42 $0.60 $1.00 $1.11 $1.52 ... $99.99
#To find the mean, first deal with the dollar signs in the data (this example uses only the first 10 fares)
cleantaxi = fare[1:10]
cleantaxi = sub("\\$", "", cleantaxi)
class(cleantaxi)
[1] "character"
#The second step is to convert the class from "character" to "numeric".
cleantaxi = as.numeric(cleantaxi)
class(cleantaxi)
[1] "numeric"
mean(cleantaxi)
[1] 14.115
Define a relational business logic rule for the field ‘Trip Seconds’.
Trip Seconds should equal the difference between ‘Trip End Timestamp’ and ‘Trip Start Timestamp’; it can be neither more nor less than that difference.
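A minimal sketch of checking this rule in R. The column names, the sample rows, and the timestamp format (‘2016-01-15 08:15:00’) are assumptions; adjust fmt to match the real file. tolerance is a hypothetical parameter for allowing small differences, for example if the timestamps are rounded.

```r
# Made-up trips; the second row breaks the rule on purpose
trips <- data.frame(
  Trip.Start.Timestamp = c("2016-01-15 08:00:00", "2016-01-15 09:00:00"),
  Trip.End.Timestamp   = c("2016-01-15 08:15:00", "2016-01-15 09:30:00"),
  Trip.Seconds         = c(900, 1200),
  stringsAsFactors = FALSE
)

fmt <- "%Y-%m-%d %H:%M:%S"
start <- as.POSIXct(trips$Trip.Start.Timestamp, format = fmt, tz = "UTC")
end   <- as.POSIXct(trips$Trip.End.Timestamp, format = fmt, tz = "UTC")
elapsed <- as.numeric(difftime(end, start, units = "secs"))

# Rule: Trip Seconds must equal End - Start (within the tolerance)
tolerance <- 0
trips$rule_ok <- abs(trips$Trip.Seconds - elapsed) <= tolerance
trips[!trips$rule_ok, ]   # rows that break the rule
```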
Using https://erdplus.com/#/standalone, draw a star schema from the following three tables: