Abstract (English) |
Recommendation systems are widely used in the online entertainment industry. By building such a system, service providers like Amazon, Spotify, and Netflix can expose as many of their products or as much of their content to users as possible. Higher user satisfaction in turn means greater user engagement.
Take digital music services as an example. Traditionally, such systems recommended music based on historical records or on track metadata. As technology has improved, it has become easy to process large datasets such as user-rating data or user-behavior data and to apply data mining algorithms such as collaborative filtering to produce personalized recommendations.
This study uses the Yahoo! Music dataset. First, we tune the performance of the collaborative filtering (CF) algorithm and treat it as the baseline of our recommendation system. Second, we restructure the user-rating data so that two algorithms, Frequent-Pattern Growth (FP-Growth) and Word2vec, can be applied to find the similarity between songs. Finally, hybrid models combine the results of CF with those of FP-Growth or Word2vec, and both evaluation metrics, MAP@5 and MAP@10, are improved. Moreover, the approach is implemented on the Apache Spark framework, which is an advantage when dealing with larger real-world datasets. |
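As a rough illustration of two steps named in the abstract, the sketch below groups rating triples into one song list per user (the kind of "basket" or "sentence" input that FP-Growth and Word2vec expect) and computes MAP@k for a set of recommendations. It is a minimal sketch under stated assumptions, not the study's implementation: the function names, the `min_rating` threshold of 4.0, and the choice to normalize average precision by `min(k, number of relevant songs)` are all hypothetical choices made here for illustration.

```python
from collections import defaultdict


def ratings_to_baskets(ratings, min_rating=4.0):
    """Group (user, song, rating) triples into one song list per user.

    Assumption: only songs rated at or above `min_rating` count as "liked";
    the threshold is illustrative and not taken from the study.
    """
    baskets = defaultdict(list)
    for user, song, rating in ratings:
        if rating >= min_rating and song not in baskets[user]:
            baskets[user].append(song)
    return dict(baskets)


def average_precision_at_k(recommended, relevant, k):
    """Average precision over the top-k recommended items for one user."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for rank, item in enumerate(recommended[:k], start=1):
        # Count each relevant item once, even if it is recommended twice.
        if item in relevant and item not in seen:
            hits += 1
            score += hits / rank  # precision at this rank
        seen.add(item)
    # Assumed normalization: best achievable number of hits within k.
    return score / min(k, len(relevant))


def map_at_k(recommendations, ground_truth, k):
    """Mean of AP@k over every user that has ground-truth songs."""
    users = list(ground_truth)
    if not users:
        return 0.0
    total = sum(
        average_precision_at_k(recommendations.get(user, []), ground_truth[user], k)
        for user in users
    )
    return total / len(users)


# Toy data, invented purely for the example.
ratings = [("u1", "a", 5), ("u1", "b", 2), ("u1", "c", 4), ("u2", "a", 4)]
truth = ratings_to_baskets(ratings)
recs = {"u1": ["a", "x", "c"], "u2": ["x", "a"]}
score_at_5 = map_at_k(recs, truth, k=5)
```

The per-user song lists can be passed as transactions to an FP-Growth implementation or as sentences to a Word2vec implementation. The same `map_at_k` function can score the baseline and a hybrid model at k=5 and k=10, so the two can be compared on identical terms.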