Discussion 2 - Spark Summit (Chris Johnson-Spotify)

My Takeaways From the Spark Summit Video

This video addressed how Spotify does recommendations using Spark (note: this video is from 2017). There were a couple of things I found interesting. The first was my recognition that music providers like Spotify, Pandora, etc. are examples of long-tail businesses, just like Amazon. This point hit home for me when the speaker presented a list of music providers and explained each provider's offering in terms of the algorithm or approach it uses to recommend music to its users. For example, Songza uses manual curation, Pandora uses experts to manually tag songs as part of the Music Genome Project, The Echo Nest uses audio content, metadata, and text analysis, and Last.fm and Spotify use collaborative filtering, among other techniques. I think it's very interesting how these algorithms define and shape the business models of these companies. It leads me to think of other long-tail opportunities that could benefit from recommender engines.

I also found interesting the discussion of how Spotify transitioned from Hadoop to Spark, moving from Spark with broadcasting and no caching, to Spark with caching in the full gridify approach, and then to Spark with caching in the half gridify method. It's interesting to me to see how the manufacturing efficiency process in these long-tail businesses has moved from the shop floor to the server room. One of the charts presented showed how moving from Hadoop to full gridify to half gridify cut "manufacturing" time from 10 hours to 3.5 hours to 1.5 hours - substantial improvements any way you cut it.
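
To put those run times in perspective, here is a minimal sketch that turns them into speedup factors. The three hour figures are the ones quoted above from the talk's chart; the dictionary labels and the speedup helper are my own hypothetical names for illustration, not anything from Spotify's code.

```python
# Run times in hours, as quoted above from the chart in the talk.
# The labels are shorthand chosen for this sketch.
run_times_hours = {
    "Hadoop": 10.0,
    "Spark, full gridify": 3.5,
    "Spark, half gridify": 1.5,
}


def speedup(baseline_hours, new_hours):
    """Return how many times faster the new run is than the baseline."""
    return baseline_hours / new_hours


baseline = run_times_hours["Hadoop"]
for name, hours in run_times_hours.items():
    factor = speedup(baseline, hours)
    # Share of the original Hadoop run time that was eliminated.
    saved_pct = 100.0 * (1.0 - hours / baseline)
    print(f"{name}: {hours} h, {factor:.2f}x vs Hadoop, {saved_pct:.0f}% less time")
```

By this arithmetic, full gridify is a bit under 3 times faster than the Hadoop run and half gridify is more than 6 times faster, which is a large gain for a job that presumably runs on a regular schedule.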


Jim Mundy

7/14/2020
