First we look at the geographic market: where are the majority of adverts posted on each site?
Are road bikes or mountain bikes more expensive, on average? We look first at Gumtree.
It is evident that prices have a long tail on the right, indicating a few expensive bikes in the Eastern Cape, Gauteng and the Western Cape.
What about bikehub? It lists many more bike types, so we plot them lumped into the three largest categories and “other”.
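The lumping step can be sketched as follows. This is a minimal illustration in Python, not the code behind the plots; the function name `lump_top_n` and the sample bike types are hypothetical.

```python
from collections import Counter


def lump_top_n(categories, n=3, other_label="other"):
    """Keep the n most frequent categories and relabel the rest as other_label."""
    counts = Counter(categories)
    keep = {name for name, _ in counts.most_common(n)}
    return [c if c in keep else other_label for c in categories]


# Hypothetical bike types, for illustration only (not scraped data).
bike_types = [
    "mountain", "road", "mountain", "bmx", "gravel",
    "road", "mountain", "bmx", "tandem",
]
lumped = lump_top_n(bike_types, n=3)
print(Counter(lumped))
```

Rare types end up in a single “other” group, which keeps the number of plot panels manageable.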
The picture is more mixed here and there are bimodal distributions for many provinces and bike types. There is also a wider spread in terms of price than on gumtree.
We have information on each seller’s date of registration on the platform, the all-time number of ads from that seller, and the number of that seller’s ads currently active. First we look at the relationship between current ads and all-time ads. In other words, are sellers repeat users of the platforms?
For gumtree, there is a positive relationship between active ads and all-time ads, at least for sellers with fewer than 2000 all-time ads. It appears that sellers on gumtree have listed many more ads than those on bikehub.
There are at least two possible explanations for this. First, gumtree allows the sale of a much wider variety of goods, not only bicycles, so the average seller may have more ads on gumtree than on bikehub if they also sold other household items there. Second, gumtree’s search results show the most recently posted ads at the top, so dealers have an incentive to repost the same ads every day to stay at the top of the results. The same incentive exists on bikehub, but sellers there may not act on it.
It appears that more sellers have registered recently on bikehub than on gumtree, where the bulk of users sits a little further left, indicating earlier registration. It is also worth noting that prior to 2008 gumtree had few users while bikehub had some, potentially indicating that bikehub was the leader at the time. However, our sample of adverts is from users still active today, so it may instead indicate that some fraction of bikehub’s users are repeat advertisers who registered longer ago.
It is worth cautioning that this is not a time series plot of the number of users - rather it is the date of registration of active users whose ads were scraped.
Next we turn to each seller’s all-time number of ads and their year of registration.
As expected, the sellers who joined most recently have the fewest adverts. The two sites look similar, but it is worth noting that there are far fewer all-time ads per seller on bikehub. The y-axis is on a log scale.
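A comparison like this can be summarised numerically by taking the median log of all-time ads per registration year. The sketch below is illustrative only: the field names `reg_year` and `all_time_ads` and the sample records are assumptions, and base 10 is an arbitrary choice for the logarithm. Records with a missing year, or with a missing or non-positive ad count, are dropped because they have no position on a log axis.

```python
import math
from collections import defaultdict
from statistics import median


def median_log_ads_by_year(sellers):
    """Median log10 of all-time ads per registration year, skipping unusable rows."""
    by_year = defaultdict(list)
    for seller in sellers:
        year = seller.get("reg_year")
        n_ads = seller.get("all_time_ads")
        # Missing or non-positive values cannot be placed on a log scale.
        if year is None or n_ads is None or n_ads <= 0:
            continue
        by_year[year].append(math.log10(n_ads))
    return {year: median(values) for year, values in sorted(by_year.items())}


# Hypothetical seller records, for illustration only.
sellers = [
    {"reg_year": 2010, "all_time_ads": 1000},
    {"reg_year": 2010, "all_time_ads": 100},
    {"reg_year": 2010, "all_time_ads": 10},
    {"reg_year": 2018, "all_time_ads": 10},
    {"reg_year": 2018, "all_time_ads": 0},     # dropped: log of zero is undefined
    {"reg_year": None, "all_time_ads": 5},     # dropped: no registration year
]
medians = median_log_ads_by_year(sellers)
print(medians)
```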
On average, ads on gumtree last for a shorter duration. However, there are some very high outliers.
Is there a relationship between the number of images and length of description and the price asked?
My intuition is that the adverts that contain more information are likely to be for a more expensive bike.
We look first at the length of the text in each advert (measured by number of characters) compared to the log of price.
It appears that for gumtree users, the higher the asking price, the longer the text description of the bike. The upward trend is less clearly linear for bikehub adverts: the smoothed line is quite kinked, and the wider confidence interval does not indicate a strong relationship. Regression analysis on this topic will follow.
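As a rough check on the relationship described above, one could fit a straight line of description length on log price. The following is a minimal sketch of an ordinary least-squares fit, not the regression analysis planned for later. The field names `price` and `description`, the sample adverts, and the use of the natural log are all assumptions (the text does not state the base of the logarithm).

```python
import math


def ols_fit(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical adverts, for illustration only.
ads = [
    {"price": 100, "description": "x" * 10},
    {"price": 1000, "description": "x" * 20},
    {"price": 10000, "description": "x" * 30},
]

# Only positive prices have a defined logarithm.
usable = [ad for ad in ads if ad["price"] > 0]
log_price = [math.log(ad["price"]) for ad in usable]
text_length = [len(ad["description"]) for ad in usable]  # length in characters

slope, intercept = ols_fit(log_price, text_length)
print(slope, intercept)
```

A positive slope would be consistent with longer descriptions on more expensive bikes, but it says nothing about the strength of the relationship or its causal direction.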
Next we turn to the relationship between number of photos and price.
It appears that adverts on bikehub have a larger number of images on average. On gumtree, it appears that there is indeed a relationship wherein as the log of price increases, so does the number of images. This isn’t conclusive.
There is scope for a more complex text analysis that mines the descriptions in the adverts for specific words associated with higher asking prices or a relatively larger number of images, and contrasts these between the two sites. It is relatively easy to implement; I just need a little time to get it working.
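One simple starting point for such an analysis is to compare the mean log price of adverts that contain a given word with the overall mean. The sketch below is only one possible approach, and it uses hypothetical field names (`price`, `description`) and made-up adverts. It assumes all prices are positive and ignores words that appear in fewer than `min_count` adverts.

```python
import math
import re
from collections import defaultdict


def word_price_lift(ads, min_count=2):
    """Mean log price of adverts containing each word, minus the overall mean log price."""
    log_prices = [math.log(ad["price"]) for ad in ads]
    overall = sum(log_prices) / len(log_prices)
    by_word = defaultdict(list)
    for ad, log_price in zip(ads, log_prices):
        # Count each word at most once per advert.
        words = set(re.findall(r"[a-z]+", ad["description"].lower()))
        for word in words:
            by_word[word].append(log_price)
    return {
        word: sum(values) / len(values) - overall
        for word, values in by_word.items()
        if len(values) >= min_count
    }


# Hypothetical adverts, for illustration only.
ads = [
    {"price": 50000, "description": "Carbon frame race bike"},
    {"price": 40000, "description": "Carbon wheels"},
    {"price": 3000, "description": "Steel frame commuter"},
    {"price": 2000, "description": "Old steel bike"},
]
lift = word_price_lift(ads)
for word, value in sorted(lift.items(), key=lambda kv: kv[1], reverse=True):
    print(word, round(value, 2))
```

Running the same function separately on the gumtree and bikehub adverts would give a first contrast between the two sites.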