Analyzing Intentional, Unintentional and Missing User Behaviors in Weblogs

Name

許哲芸 (Che-Yun Hsu)

Department

Department of Computer Science and Information Engineering

Thesis title

Analyzing Intentional, Unintentional and Missing User Behaviors in Weblogs
(分析網路日誌中有意圖、無意圖及缺失之使用者行為)

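Related theses

★ Predicting Users' Personal Information and Personality Traits from Web Browsing History
★ Predicting Changes in Users' Browsing Behavior before Special Holidays with a Matrix-Factorization-Based Multi-Target Prediction Method
★ A Study of Dynamic Multi-Model Fusion Analysis
★ Extended Clickstream: An Analysis of the Missing User Behaviors in the Clickstream
★ Associated Learning: Decomposing End-to-End Backpropagation with Autoencoders and Target Propagation
★ A Click Prediction Model Fusing Multi-Model Rankings
★ Adjusting Word Vectors with Synonym and Antonym Information Using a Non-Directional Sequence Encoder Based on Self-Attention
★ Exploring When to Use Deep Learning or Simple Learning Models in Click-Through Rate Prediction Tasks
★ Fault Detection for Air Quality Sensors: An Anomaly Detection Framework Based on Deep Spatio-Temporal Graph Models
★ An Empirical Study of the Effect of Word Vectors Adjusted with Synonym and Antonym Dictionaries on Downstream Natural Language Tasks
★ A Semi-Supervised Model Combining Spatio-Temporal Data, Applied to Anomaly Detection for PM2.5 Air Pollution Sensors
★ Training Neural Networks by Adjusting DropConnect Drop Probabilities According to Weight Gradient Magnitudes
★ Detecting Low-Activity Anomalous Accounts on PTT Using Graph Neural Networks
★ Generating Personalized Trend Lines for Individual Users from a Small Number of Their Trend Line Samples
★ Two New Probabilistic Clustering Models Based on Bivariate and Multivariate Beta Distributions
★ A New Technique for Simultaneously Updating the Parameters of All Layers of a Neural Network, Using Associated Learning and Pipelining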

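This electronic thesis is authorized for immediate open access.
Full texts that have reached their open-access date are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.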

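Abstract (translated from Chinese)

Weblogs are widely used to represent users' online behavior. However, we
found that weblogs record only part of that behavior: for example, they
record page clicks but ignore switches between browser tabs. They may also
record actions that the user did not initiate. When a page redirects to
another URL or opens a pop-up advertisement, the page that opens is not one
the user intended to visit, yet it is included in the browsing history. We
found that ordinary weblogs capture only about half of users' browsing
behavior, and that 5.6% of what they record likely reflects unconscious
actions.
By building a Google Chrome plugin and recruiting participants to install
and use it, we compiled statistics on intentional, unintentional, and missing
browsing records. We found that the ranking of the most frequently used
website categories in traditional browsing records differs from the rankings
obtained after adding the missing records or removing the unintentional
ones. We therefore question whether traditional browsing records can
represent users' browsing behavior, and analyses based on them may be
biased. This thesis analyzes three kinds of records (traditional browsing
records, intentional browsing records, and intentional plus missing browsing
records) and uses common classification models to predict "the type of the
next click event", "the time interval until the next click", and "the
proportion of websites browsed in the future". Compared with traditional
browsing records, the other two both perform well. This suggests that user
behaviors missed by weblogs may contain additional information, and that
records present in weblogs but not initiated by the user may contain more
noise than information.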
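The comparison of category rankings described in the abstract can be sketched as follows. This is only an illustrative sketch, not the thesis's implementation: the `Visit` record, the event kinds (`click`, `redirect`, `popup`, `tab_switch`), the category labels, and the toy data are all assumptions made for this example. It assumes a traditional weblog sees clicks, redirects, and pop-ups but not tab switches, and treats redirects and pop-ups as unintentional.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    category: str  # website category (hypothetical labels)
    source: str    # how the page was opened (assumed event kinds)


# Assumption: the traditional weblog misses tab switches, and
# redirects / pop-ups are the unintentional visits.
LOGGED = {"click", "redirect", "popup"}
UNINTENTIONAL = {"redirect", "popup"}


def traditional(visits):
    """Visits a traditional weblog would contain."""
    return [v for v in visits if v.source in LOGGED]


def intentional(visits):
    """Logged visits with the unintentional ones removed."""
    return [v for v in visits if v.source in LOGGED - UNINTENTIONAL]


def intentional_plus_missing(visits):
    """Intentional visits plus those the weblog failed to record."""
    return [v for v in visits if v.source not in UNINTENTIONAL]


def category_ranking(visits):
    """Categories ordered by descending visit count, ties broken by name."""
    counts = Counter(v.category for v in visits)
    return [c for c, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


# Made-up example data, not taken from the thesis.
visits = [
    Visit("news", "click"), Visit("ads", "popup"), Visit("ads", "redirect"),
    Visit("ads", "popup"), Visit("video", "click"), Visit("video", "tab_switch"),
    Visit("video", "tab_switch"), Visit("news", "click"),
]

for view in (traditional, intentional, intentional_plus_missing):
    print(view.__name__, category_ranking(view(visits)))
```

On this toy input the three views give three different rankings, which is the kind of disagreement the abstract reports. The actual magnitudes depend entirely on the collected data.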

Abstract (English)

Weblogs have been widely used to represent the behavior of online
users. However, we found that weblogs record only part of users' behaviors.
For example, traditional weblogs do not record tab switching and
browser window switching. In addition, weblogs may record some visits that
do not come from a user's conscious actions. For instance, web pages resulting
from page redirects and pop-ups are recorded in the browsing
history, but users may not have intended to visit these pages. We discovered
that, on average, weblogs record only approximately half of a user's
page visits, and 5.6% of the visits recorded in the weblog belong to the user's
unconscious actions. To collect and analyze the conscious visits, unconscious
visits, and "missing" visits (i.e., the visits that are unrecorded in
the traditional weblog), we created a Google Chrome plugin and recruited
users to install it. We report the statistics of visits and show
that the ranking of popular website categories based on the traditional weblog
differs from the rankings obtained by including the missing visits or
excluding the unintentional visits. Therefore, the traditional weblog may be a
biased representation of a user's online behaviors, and the observations or
conclusions derived from weblog analysis are questionable. Additionally,
we predicted users' future behaviors based on three types of training data:
all the visits in traditional weblogs, intentional visits in weblogs, and intentional
visits plus missing visits in weblogs. We applied supervised learning
algorithms to make predictions. The experimental results show that using
intentional visits in weblogs or intentional visits plus missing visits in weblogs
usually performs better than using all the visits in traditional
weblogs. This result indicates that missing visits in weblogs may contain
additional information, and unintentional visits in weblogs may have more
noise than information.
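The idea of training the same predictor on different views of the log can be sketched as follows. This is a hedged illustration only: a simple most-frequent-successor (first-order transition) baseline stands in for the supervised learning algorithms the abstract mentions, and the function names and category sequences are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(sequence):
    """Count how often each category is immediately followed by each other one."""
    table = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        table[current][nxt] += 1
    return table


def predict_next(table, current, fallback):
    """Return the most frequent successor of `current`, or `fallback` if unseen."""
    if current not in table:
        return fallback
    # Break ties alphabetically so the prediction is deterministic.
    return min(table[current].items(), key=lambda kv: (-kv[1], kv[0]))[0]


def accuracy(train, test):
    """Fraction of next-category predictions on `test` that are correct."""
    table = fit_transitions(train)
    fallback = Counter(train).most_common(1)[0][0]
    pairs = list(zip(test, test[1:]))
    hits = sum(predict_next(table, cur, fallback) == nxt for cur, nxt in pairs)
    return hits / len(pairs)


# Made-up sequences: the "traditional" log interleaves unintentional
# "ads" visits, while the "intentional" log has them removed.
train_traditional = ["news", "ads", "video", "ads", "news", "ads", "video"]
train_intentional = ["news", "video", "news", "video"]
test_intentional = ["news", "video", "news", "video"]

print(accuracy(train_traditional, test_intentional))
print(accuracy(train_intentional, test_intentional))
```

The toy sequences are constructed so that the interleaved unintentional visits mislead the baseline. They illustrate the mechanism only and say nothing about the size of the effect measured in the thesis.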

Keywords

★ Clickstream
★ Weblog analysis
★ User behavior analysis

Table of contents

Abstract (Chinese)
Abstract (English)
Table of Contents
1. Introduction
1.1 Motivation
1.2 Research Goals
1.3 Contributions
1.4 Thesis Organization
2. Related Work
2.1 Applications of Weblogs
2.1.1 Applications on Social Networking Sites
2.1.2 Applications on E-Commerce Sites
2.1.3 Applications on Travel and Rental Sites
2.2 Extended Clickstream: An Analysis of the Missing User Behaviors in the Clickstream
3. Intentional Clickstream, Unintentional Clickstream, and Extended Clickstream
3.1 Clickstream
3.2 Extended Clickstream
4. Dataset
4.1 Raw Dataset
4.2 Data Preprocessing
4.2.1 Time Unit Preprocessing
4.2.2 Website URL Preprocessing
4.2.3 Preprocessing of Time Intervals between Events
4.3 Data Statistics and Analysis
5. Experiments
5.1 Questions and Ideas
5.2 Experimental Design
5.3 Classifier Selection and Overview
6. Experimental Results and Analysis
6.1 Experimental Datasets
6.2 Model Evaluation Criteria
6.3 Analysis and Discussion of Results
6.3.1 Predicting the Website Category of the User's Next Click
6.3.2 Predicting the Time Interval until the User's Next Click
6.3.3 Predicting the Proportion of Websites the User Will Browse in the Future
7. Conclusion and Future Work
7.1 Conclusion
7.2 Future Work
References
Appendix A: Confusion Matrices of the Experiments

Advisor

陳弘軒

Approval date

2020-7-20
