This is my first data analysis project. I did it as part of the Google Data Analytics course. The complete analysis process is divided into three parts as follows:

This document is not part of the final analysis, but it is important to me to publish it. The Reflection part contains my thoughts about the process. The Reject part contains the rejected graphs and tables that did not make it into the three documents above for various reasons.

Reflection

Why R?

This is the first time I have used R. I chose R for this project because I wanted to learn more about it. In the end, I was not disappointed: I learned a lot (but still not enough).

What did I learn?

What I found out about R is that it is very easy to learn. You can make some interesting graphs with a few lines of code. However, I also learned that the data processing part is very important. It is very easy to use the wrong data to create a good-looking graph that will mislead decision makers.

I learned how to make most, if not all, of the visualizations in this project by searching the web. There are many resources and communities that can help you create the diagram you dream of. If you have some kind of graph or diagram in mind, I am sure that someone has already thought of something similar and posted how to do it on the web.

What did I want to accomplish?

I did this project because I took the Google Data Analytics course. This is the capstone project. There are three choices for the data: bike-share data, smart-device data, or bring your own data. I took what looked like the easy path, the bike share. However, I know that there have been, and will be, thousands of Divvy bike analyses published on the web. My aim is not to produce the best Divvy bike analysis but to learn about data analysis with R.

My other goals are:

  • Force myself to complete everything in R and R Markdown.
  • Pay attention to details such as scales, labels, and captions. For example, most of the time I use daily average trips instead of total trips on the scale. This makes a difference when you compare two projects that may cover different time frames. I used two years of data; other people may use one year or one quarter. The total number of rides will not be comparable, but the average daily (or monthly) value should be.
  • Make sure my visualizations are easy to understand and consistent. I rejected many cool-looking graphs and diagrams because they were difficult to understand or too complex.
  • Not overlook any important information. I made sure that I considered every variable available to me in the data. I also did a lot of research to familiarize myself with Chicago.
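
To illustrate the point about daily averages versus totals, here is a minimal sketch in base R. It is not the code from the analysis: the data frames `trips_a` and `trips_b`, the `date` column, and the helper `daily_average()` are made up for this example, and the numbers are toy values.

```r
# Hypothetical toy data: one row per trip, with the trip date.
# "Project A" covers 4 days, "project B" covers only 2 days.
trips_a <- data.frame(date = as.Date("2021-06-01") + c(0, 0, 1, 1, 2, 3, 3, 3))
trips_b <- data.frame(date = as.Date("2021-06-01") + c(0, 0, 1, 1))

# Average number of trips per day that appears in the data.
# Assumption: every day in the time frame has at least one trip;
# days with zero trips would not be counted here.
daily_average <- function(trips) {
  nrow(trips) / length(unique(trips$date))
}

total_a <- nrow(trips_a)          # totals differ because the time frames differ
total_b <- nrow(trips_b)
avg_a <- daily_average(trips_a)   # daily averages are on the same scale
avg_b <- daily_average(trips_b)
```

In this toy case the totals differ only because project A covers more days, while the daily averages are the same, so the averages are the fairer comparison.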

Did I accomplish all of my goals?

The answer is partially yes. I tried, but the project took too long to finish and I had to stop myself. I learned a lot, but the more I learn, the more I realize how much more there is to learn. In the end, I am happy with the final product even though I know that I can do better.

What do I want to improve?

  1. Have more comments in the code.
  2. Use functions to reduce redundant code and make it easier to understand.
  3. Explain them better.
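
As a small illustration of points 1 and 2, a repeated calculation can be wrapped in one commented function and reused. This is only a sketch: the function name `ride_length_min()` and the example timestamps are hypothetical and are not taken from the project code.

```r
# Hypothetical helper: ride length in minutes from start and end times.
# Writing it once avoids repeating the same difftime() call in several places.
ride_length_min <- function(started_at, ended_at) {
  as.numeric(difftime(ended_at, started_at, units = "mins"))
}

# Made-up example timestamps (UTC is used only to keep the example unambiguous).
start <- as.POSIXct("2021-06-01 08:00:00", tz = "UTC")
end   <- as.POSIXct("2021-06-01 08:15:30", tz = "UTC")

len <- ride_length_min(start, end)   # 15 minutes 30 seconds, in minutes
```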

Reject

It was fun to create visualizations, and I wanted to include all of them in the final report. However, I decided not to include a lot of them for the following reasons:

  • They are redundant.
  • They are too complex or hard to understand.
  • They do not show what I want to show.
  • They do not help answer the main question.
  • They may be misleading.
  • I found a better way to visualize the data.
  • They do not look good.
  • I do not know how to do them correctly.

Many of these rejects were not finished and may have wrong titles, labels, and guides. I created them and then decided not to use them, so I did not bother to correct them.

Number of Rides per Day: Statistics by Day of Week

Day  Rider Type    min     max       mean        sd
Sun  casual         77  18,235   6,422.17  4,897.66
Sun  member        462  13,696   7,577.36  3,760.35
Mon  casual         52  15,368   4,461.83  3,064.72
Mon  member        619  16,415   9,808.53  3,881.85
Tue  casual        116  10,335   4,282.61  2,916.48
Tue  member        892  18,619  10,735.07  4,574.90
Wed  casual        324  11,562   4,544.30  3,273.72
Wed  member      1,113  19,045  10,891.78  4,739.09
Thu  casual        495  12,651   4,802.99  3,384.74
Thu  member      1,817  18,785  10,771.75  4,529.09
Fri  casual        337  14,078   5,718.96  4,140.33
Fri  member      1,608  16,844   9,778.95  4,399.76
Sat  casual        304  17,739   7,717.56  5,782.07
Sat  member      1,480  15,850   8,646.66  4,119.72

Number of Rides per Day: Statistics by Weekend/Weekday

Day      Rider Type    min     max       mean        sd
weekday  casual         52  15,368   4,760.64  3,406.25
weekday  member        619  19,045  10,396.74  4,444.57
weekend  casual         77  18,235   7,066.77  5,382.42
weekend  member        462  15,850   8,109.45  3,970.11
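
For readers who want to build a table like the ones above, here is a minimal base R sketch. It is an illustration under assumptions, not the code that produced these numbers: the data frame `daily`, its `date` and `rides` columns, and the toy values are made up; only the `member_casual` name comes from the data.

```r
# Hypothetical toy data: one row per day and rider type, with the ride count.
daily <- data.frame(
  date = rep(as.Date("2021-06-05") + 0:3, times = 2),   # Sat, Sun, Mon, Tue
  member_casual = rep(c("casual", "member"), each = 4),
  rides = c(100, 80, 30, 50, 60, 70, 90, 110)
)

# "%u" gives the ISO day of week, "1" (Monday) to "7" (Sunday),
# so the result does not depend on the locale's day names.
iso_day <- as.integer(format(daily$date, "%u"))
daily$day_type <- ifelse(iso_day >= 6, "weekend", "weekday")

# min, max, mean and sd of the daily ride counts for each group.
stats <- aggregate(
  rides ~ day_type + member_casual,
  data = daily,
  FUN = function(x) c(min = min(x), max = max(x), mean = mean(x), sd = sd(x))
)

# aggregate() returns the four statistics as one matrix column;
# this flattens it into rides.min, rides.max, rides.mean, rides.sd.
stats <- do.call(data.frame, stats)
```

Grouping by a day-of-week column instead of `day_type` would give the first table's layout in the same way.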

Summary of Random Sampling Data with Equal Casual and Member

member_casual   start_lat       start_lng        end_lat         end_lng          season         week_day         morning_afternoon   length_min
casual:100000   Min.   :41.65   Min.   :-87.86   Min.   :41.65   Min.   :-87.86   Spring:51952   weekday:138734   morning  : 57996    Min.   :  1.000
member:100000   1st Qu.:41.88   1st Qu.:-87.66   1st Qu.:41.88   1st Qu.:-87.66   Summer:85637   weekend: 61266   afternoon:142004    1st Qu.:  5.935
NA              Median :41.90   Median :-87.64   Median :41.90   Median :-87.64   Fall  :44412   NA               NA                  Median : 10.238
NA              Mean   :41.90   Mean   :-87.65   Mean   :41.90   Mean   :-87.65   Winter:17999   NA               NA                  Mean   : 15.397
NA              3rd Qu.:41.93   3rd Qu.:-87.63   3rd Qu.:41.93   3rd Qu.:-87.63   NA             NA               NA                  3rd Qu.: 18.083
NA              Max.   :42.07   Max.   :-87.53   Max.   :42.07   Max.   :-87.53   NA             NA               NA                  Max.   :179.977
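
A sample with equal numbers of casual and member rides, like the one summarized above, could be drawn roughly as follows. This is a sketch under assumptions and not the original sampling code: the data frame `trips`, the `ride_id` column, the seed, and the group size of 20 are made up for illustration (the summary above has 100,000 rides per group).

```r
set.seed(42)  # arbitrary seed, only to make the random sample reproducible

# Hypothetical toy data: 30 casual rides and 70 member rides.
trips <- data.frame(
  ride_id = 1:100,
  member_casual = rep(c("casual", "member"), times = c(30, 70))
)

n_per_group <- 20

# Split the row numbers by rider type, then draw the same number of rows
# from each group without replacement.
# Note: sample() treats a single number n as 1:n, so each group must
# contain more than one row for this to work as intended.
rows_by_type <- split(seq_len(nrow(trips)), trips$member_casual)
picked <- unlist(lapply(rows_by_type, sample, size = n_per_group))

balanced <- trips[picked, ]
counts <- table(balanced$member_casual)   # same count for both rider types
```

Balancing the groups this way makes the two rider types directly comparable in a summary, at the cost of no longer reflecting their real share of all rides.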