library(data.table)
library(DT)
library(lubridate)
library(ggplot2)
library(Hmisc)
library(dplyr)
custs <- fread("customers.csv")
prods <- fread("products.csv")
views <- fread("views -- January 2020.csv")
txns <- fread("transactions -- January 2020.csv")




## Part 1:  Summary {.tabset}

This part of the report will be directed to your internal team at the consulting company.  It is intended to document the sources of information that were used in the project.  It will also describe the data in less technical terms to team members who are not data scientists.  If another member of the team joins the project later, they will rely on your descriptions to gain familiarity with the data.  To that end, we recommend providing some instructions that will help other consultants use the information more effectively.

### Customers

The customers table contains 100,000 rows of 5 variables: customer_id, age, gender, income, region.

The average customer age is 41.5, the average income is $66,362, and the regions are Northeast, Midwest, South, and West.



### Products

The products table contains 8,637 rows of 3 variables: category, product_id, price.

The product categories are shoes, pants, shirt, coat, and hat.



### Views

The views table contains 4,474,131 rows of 3 variables: customer_id, product_id, time. The customer_id variable joins to the customers table and the product_id variable joins to the products table.



### Transactions

The transactions table contains 119,287 rows of 5 variables: customer_id, product_id, price, quantity, time. The customer_id variable joins to the customers table and the product_id variable joins to the products table.

```

Part 3: Generalization

This part of the report will be directed internally to your team’s engagement manager. The idea is to present these approaches to your team. The work will then be conveyed to the client’s technical team and middle managers who are working closely with you on the project. Plan your communication accordingly.

Q1

Question

Did you see any problems with the data set? If so, whom would you report them to, and what would you do to address them? What would be different about the next version of the data?

Answer

One problem I noticed is that the transactions table records one row per product rather than one row per purchase. Each product bought in a single checkout gets its own line with the same customer id and transaction time. If a customer buys several different items at once, treating these rows as separate transactions is misleading. Adding a transaction id would make it possible to group by the entire transaction and analyze how often purchases are made and what the spending patterns per purchase are. With the current layout, the only way to approximate this is to group by customer id and transaction time.
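As a rough sketch of that workaround, the snippet below infers purchases by grouping on customer id and time. The toy rows, the `basket_id` name, and the reading of `price` as a per-unit price are all assumptions for illustration, not facts about the client's data.

```r
library(data.table)

# Toy rows mirroring the transactions columns (values are made up).
txns_demo <- data.table(
  customer_id = c("c1", "c1", "c2", "c1"),
  product_id  = c("p1", "p2", "p1", "p3"),
  price       = c(10, 20, 10, 5),
  quantity    = c(1L, 2L, 1L, 1L),
  time        = c("2020-01-01T10:00:00Z", "2020-01-01T10:00:00Z",
                  "2020-01-01T11:30:00Z", "2020-01-02T09:15:00Z")
)

# Surrogate id (hypothetical name): rows sharing customer_id and time get one id.
txns_demo[, basket_id := .GRP, by = .(customer_id, time)]

# Roll line items up to one row per inferred purchase.
# Assumes price is per unit, so spend is price * quantity.
baskets <- txns_demo[, .(
  n_products  = uniqueN(product_id),
  total_items = sum(quantity),
  total_spend = sum(price * quantity)
), by = .(basket_id, customer_id, time)]

print(baskets)
```

This grouping would merge two genuinely separate purchases that a customer happened to make at the same recorded time, which is one reason a real transaction id from the data owner would be preferable.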

Another problem is that the times in the views table are stored as character strings rather than as a datetime type. The strings contain embedded T and Z characters, which appear to follow the ISO 8601 convention (T separates the date from the time and Z marks UTC), but this is not documented, and it makes the conversion to a usable date and time format less obvious. I worked around this by subsetting certain characters and then converting to a date, but that is messy and hard to reproduce. To let all users analyze viewing patterns by time, the data should be published in a standard datetime type.

I would find out who owns these data sets within the company and report these problems to them. I would also propose these solutions and offer to work with the owners on implementing them in the next version of the data.
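If the strings do follow the ISO 8601 layout, a more reproducible conversion might look like the sketch below. The example strings are invented and the format is an assumption that should be checked against the actual file before relying on it.

```r
library(lubridate)

# Example strings in the assumed layout (not taken from the real file).
raw_times <- c("2020-01-15T08:30:00Z", "2020-01-31T23:59:59Z")

# ymd_hms() accepts the T separator and trailing Z; tz = "UTC" keeps results in UTC.
parsed <- ymd_hms(raw_times, tz = "UTC")

# Derived fields that are convenient for analysis by time.
view_date <- as.Date(parsed)
view_hour <- hour(parsed)
view_wday <- wday(parsed, label = TRUE)

print(data.frame(raw_times, parsed, view_date, view_hour, view_wday))
```

Parsing the whole string in one call avoids the manual character subsetting and keeps the time of day, which a date-only conversion discards.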

Q2

Question

Now generate a version of the same report using the data on views and transactions from the month of February 2020.

In building this report, do not create a new RMarkdown file. Instead, build a small .R file that allows the user to specify some parameters (e.g. the names of the files). Then use the render function in the rmarkdown library to run the report. Supply these new parameters as a list object in the params input. Then you can make use of these parameters within the RMarkdown file. For instance, if your file name is "views -- January 2020.csv" and it is stored as params$views.file, then you can read the data with fread(input = params$views.file).

Use the dir.create function to build new subfolders to store each month’s report. Specify a name for the output file when calling the render function. Use this method to generate the separate reports for January and February.

Briefly describe your process for implementing this automated approach. What work would a non-technical user need to perform to run this script without your involvement?

Answer

This approach automates the reporting process and lets non-technical users generate the reports on their own machines. The .R script passes the input file names to the RMarkdown template as parameters through the render function and creates a subfolder for each month's report. Users only need to follow the instructions in the .R file to enter their desired output path and the input file names. They also need the template and the .R file downloaded and saved in the location the instructions specify. They then run the .R file, and once it finishes, the rendered reports are available in the specified location.
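A minimal sketch of what such a driver script could look like is shown below. The template name `report_template.Rmd`, the `reports` folder, the `txns.file` parameter, and both helper functions are hypothetical. The template is assumed to declare matching `params` entries in its YAML header.

```r
# Hypothetical driver script; every name below is illustrative.

# Build the inputs for one month's report without touching the file system.
build_report_job <- function(month, year, out_root = "reports") {
  label <- paste(month, year)
  list(
    params = list(
      views.file = sprintf("views -- %s.csv", label),
      txns.file  = sprintf("transactions -- %s.csv", label)
    ),
    output_dir  = file.path(out_root, label),
    output_file = sprintf("report -- %s.html", label)
  )
}

# Create the month's subfolder, then render the template with that month's params.
run_report <- function(job, template = "report_template.Rmd") {
  dir.create(job$output_dir, recursive = TRUE, showWarnings = FALSE)
  rmarkdown::render(
    input       = template,
    params      = job$params,
    output_dir  = job$output_dir,
    output_file = job$output_file
  )
}

jobs <- lapply(c("January", "February"), build_report_job, year = 2020)

# Render only when the assumed template is present, so the sketch is safe to source.
if (file.exists("report_template.Rmd")) {
  invisible(lapply(jobs, run_report))
}
```

Under this layout, a non-technical user would only edit the month names, the year, and the output folder at the bottom of the script.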

Q3

Question

What are the advantages of creating an automated approach to routine reporting?

Answer

The biggest advantage of automating routine reporting is the time saved. Instead of recreating the report every time it is needed, users get the information right away and can move on to other work. It also reduces the risk that business insights or opportunities are lost or delayed because the report was not run often enough.

Another advantage is that once the reporting process is automated, the person who created the report and process does not have to be involved. They can pass on the template to a colleague or client who can follow the process and run it themselves. Even non-technical users can easily access the report as needed if the instructions for the process are made clear.

Additionally, an automated approach reduces the possibility of error. Once the report is confirmed to be correct, we can reuse it instead of recreating it and risking mistakes or inconsistencies.

Part 4: Opportunities

This part of the report will be directed externally to your client’s senior leadership. Your work will help to determine the future direction of the project and the company’s contract with this client. Plan your communication accordingly.

Q1

Question

How would you build on the reporting capabilities that you have created? What would you design next?

Answer

I would like to build on the reporting capabilities created here to further our analyses and gain insight into the business. In addition to simply publishing the whole report of views and transactions, I would like to identify KPIs that are meaningful to the business and use the data to calculate and monitor those metrics on a regular basis.

To make the reports more accessible to non-technical end users, I would also like to build a user-friendly interface. An application in which end users enter parameters such as date ranges and product categories and receive the reports they need on demand would empower people across the business to make better decisions based on analytics.

Q2

Question

What are some opportunities to learn valuable information and inform strategic decisions? List a number of questions that you might explore.

Answer

What is the conversion success rate of each product, calculated as the ratio of transactions to views? Which products are performing the best and worst?

How does the price impact the conversion success rate? How does it impact the volume of transactions?

Which types of transactions are most and least common among each combination of demographic groups?

How have the total product views and transactions trended over time? Do these trends match up with events like launches, promotions, and marketing campaigns?

Which times of day and days of week are the most likely for purchases to occur?
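To make the first question concrete, the sketch below computes a per-product conversion rate as transaction rows divided by view rows. The toy tables and the column names `n_views`, `n_txns`, and `conversion_rate` are assumptions for illustration. Counting transaction rows (rather than quantity or distinct purchases) is one possible definition among several.

```r
library(data.table)

# Toy stand-ins for the views and transactions tables (values are made up).
views_demo <- data.table(
  customer_id = c("c1", "c2", "c3", "c1", "c2"),
  product_id  = c("p1", "p1", "p1", "p2", "p3")
)
txns_demo <- data.table(
  customer_id = c("c1", "c2"),
  product_id  = c("p1", "p2")
)

views_by_product <- views_demo[, .(n_views = .N), by = product_id]
txns_by_product  <- txns_demo[, .(n_txns = .N), by = product_id]

# Left join onto views so products that were viewed but never bought are kept.
conversion <- merge(views_by_product, txns_by_product,
                    by = "product_id", all.x = TRUE)
conversion[is.na(n_txns), n_txns := 0L]
conversion[, conversion_rate := n_txns / n_views]

# Best-converting products first.
setorder(conversion, -conversion_rate)
print(conversion)
```

Products with very few views can show extreme rates, so a minimum-views threshold would likely be needed before ranking best and worst performers on real data.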

Q3

Question

How would you approach other decision makers within the client’s organization to assess their priorities and help them better utilize the available information?

Answer

I would start by asking each decision maker what they care about and how their business area’s success is measured. Once we have discussed their business goals, I would ask how they currently retrieve the information they need for their day-to-day work. From there, we can identify pain points and time-consuming steps that could be streamlined with a reproducible reporting process. Once I understand what they want to do, I can begin to build potential solutions and work with the decision makers along the way to refine them.

To establish credibility and to give decision makers some starting ideas, I could share the prior work I’ve done with other areas of the client’s organization and discuss the outcomes and impact of those projects.

Q4

Video Submission: Make a 2-minute pitch to the client with a proposal for the next phase of work. Include in your request a budget, a time frame, and staffing levels. Explain why this proposal would be valuable for the client and worth the investment in your consulting services. Please submit this answer as a short video recording. You may use any video recording program you feel comfortable with. The only requirements are that you are visible and audible in the video. You may also submit a file of slides if that is part of your pitch.