On Large-Scale Multi-Label Classification for POI Tagging


Name

楊鎧謙 (Kai-Qian Yang)

Department

Department of Computer Science and Information Engineering

Thesis Title

On Large-Scale Multi-Label Classification for POI Tagging

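Related Theses

★ Recognition of Schedule Invitation Emails and Extraction of Irregular Time Expressions
★ Design of the NCUFree Campus Wireless Network Platform and Development of Application Services
★ Design and Implementation of a Semi-Structured Web Data Extraction System
★ Mining and Applications of Non-Simple Browsing Paths
★ Improvements to Association Rule Mining on Incremental Data
★ Applying the Chi-Square Test of Independence to Associative Classification
★ Design and Study of a Chinese Data Extraction System
★ Visualization of Non-Numeric Data and Clustering Combining Subjective and Objective Criteria
★ A Study of Associated Word Groups in Document Summarization
★ Web Page Cleaning: Web Page Segmentation and Data Region Extraction
★ Design and Study of a Question Answering System Using Sentence Classification and Ranking
★ Efficient Mining of Compact Frequent Consecutive Event Patterns in Temporal Databases
★ Axis Arrangement of Star Coordinates for Cluster Visualization
★ Automatic Generation of Web Crawling Programs from Browsing History
★ Template and Data Analysis of Dynamic Web Pages
★ Automated Integration of Data from Homogeneous Web Pages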


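The author has authorized immediate open access to this electronic thesis.
Open-access full texts are licensed to users only for personal, non-commercial searching, reading, and printing for academic research.
Please comply with the Copyright Act of the Republic of China (Taiwan) and do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.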

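Abstract (Chinese)

In recent years, smart handheld devices have spread rapidly, to the point where nearly everyone owns one. Better transportation has also greatly increased how often people travel, and with it the chance of visiting unfamiliar places. Finding points of interest in an unfamiliar environment is not easy, so an electronic map system is needed for lookup. Name search alone is not enough, because users may not know the exact names of these points; they may only want to find points of a particular type. A good electronic map therefore needs to provide category search.
To provide category search, we need to classify every point in the system. The system holds many records, and each record has one or more categories, so this is a large-scale multi-label classification problem. Map data can be categorized in many ways; we adopt the categories of the Chinese Yellow Pages (中華黃頁). The categories have two levels: 29 level-1 categories and 1,287 level-2 categories. With this many categories and records, the usual approach requires training many classifiers (typically one per category), which greatly increases training and testing time. We reduce the dimension of the category space to speed up training and testing.
Experiments show that the hybrid KDE+SVM model trains and tests nearly twice as fast as standard SVM classification. It reaches a Micro-F1 of 0.813 on the 29 level-1 categories and 0.718 on the level-2 categories, only slightly lower than SVM's Micro-F1 of 0.842 at level 1 and 0.783 at level 2. Because the data are imbalanced, we also compared reweighting and downsampling as ways to improve performance, but the results show that both methods have limited effect on large-scale data.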
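Reweighting and downsampling both act on the binary (one-vs-rest) problem for a single category, where positives are rare. The sketch below is a minimal illustration and not the thesis's exact procedure. It assumes 0/1 labels for one category and uses the common "balanced" weighting heuristic w_c = n / (2·n_c). When downsampling, it keeps a hypothetical `neg_per_pos` negatives for every positive.

```python
import random
from typing import List, Tuple


def class_weights(labels: List[int]) -> Tuple[float, float]:
    """Return (w_neg, w_pos) using the 'balanced' heuristic w_c = n / (2 * n_c).

    With these weights, both classes contribute the same total weight to the loss.
    """
    n = len(labels)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return n / (2 * n_neg), n / (2 * n_pos)


def downsample(labels: List[int], neg_per_pos: float, seed: int = 0) -> List[int]:
    """Return sorted indices of all positives plus a random subset of negatives.

    At most neg_per_pos negatives are kept per positive (hypothetical parameter).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y != 1]
    k = min(len(neg), int(round(neg_per_pos * len(pos))))
    return sorted(pos + rng.sample(neg, k))
```

Suppose a category has 10 positives among 1,000 points. `class_weights` then gives the positive class a weight of 50.0, and `downsample(labels, 1.0)` keeps 20 points, all 10 positives plus 10 sampled negatives. Reweighting keeps every example but changes the loss, while downsampling shrinks the training set and so also cuts training time. In LIBSVM, per-class weights of this kind can be passed to `svm-train` through its `-wi` option.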

Abstract (English)

In recent years, mobile devices have become very popular, and convenient transportation makes people more likely to visit unfamiliar places. Finding a point of interest in an unfamiliar place is not easy, so users need an electronic map system. Name search alone is not enough, because users may not know the exact name of a point; they may simply want to find points of a specific category. A good electronic map system therefore needs to provide category search.
To provide category search, we need to classify all the points in the system. The system contains many points, and each point has one or more categories, so this is a large-scale multi-label classification problem. Among the many possible category schemes, we follow the one defined by the Chinese Yellow Pages. It has two levels: 29 categories at level 1 and 1,287 at level 2. Because both the number of points and the number of categories are large, training the classifiers and testing the data take a long time. We reduce the dimension of the category space to speed up training and testing.
In our experiments, our KDE+SVM hybrid method trains and tests nearly twice as fast as standard SVM classification. It reaches a Micro-F1 of 0.813 at level 1 and 0.718 at level 2, slightly lower than SVM's 0.842 at level 1 and 0.783 at level 2. Because the data are imbalanced, we also tried reweighting and downsampling to improve performance, but neither method helped much on large-scale data.
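The abstract does not explain how the category dimension is reduced. The minimal linear sketch below shows why such a reduction saves time. It swaps in a principal-label-space-style method (PCA on the label matrix plus ridge regression) in place of the thesis's KDE+SVM approach. KDE (kernel dependency estimation) instead applies kernel PCA on the label side and needs a pre-image step, which this sketch does not reproduce. The sketch does not train one binary classifier per category (1,287 at level 2). It learns k regressors on a compressed label space and decodes by projecting back and thresholding. The function names, the ridge regressor, `lam`, `k`, and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np


def fit_label_reduction(X: np.ndarray, Y: np.ndarray, k: int, lam: float = 1.0):
    """Compress an n x L 0/1 label matrix Y to k dims and fit a ridge map from X.

    Returns (W, V, y_mean): W is the d x k weight matrix, V the L x k label projection.
    """
    y_mean = Y.mean(axis=0)
    Yc = Y - y_mean                                  # centre each label column
    # Top-k right singular vectors = principal directions of the label space
    _, _, vt = np.linalg.svd(Yc, full_matrices=False)
    V = vt[:k].T                                     # L x k
    Z = Yc @ V                                       # n x k reduced targets
    d = X.shape[1]
    # Closed-form ridge regression: only k targets instead of L binary problems
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Z)
    return W, V, y_mean


def predict_labels(X: np.ndarray, W: np.ndarray, V: np.ndarray,
                   y_mean: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Predict in the reduced space, decode to L label scores, threshold to 0/1."""
    scores = (X @ W) @ V.T + y_mean
    return (scores >= threshold).astype(int)


# Toy data: labels 0 and 1 always co-occur, as do labels 2 and 3
X = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)
Y = np.array([[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3, dtype=float)
W, V, y_mean = fit_label_reduction(X, Y, k=1)
print(predict_labels(X, W, V, y_mean))  # first three rows [1 1 0 0], last three [0 0 1 1]
```

In the toy data, the four labels collapse onto a single principal direction, so k=1 is enough to recover them. On real POI data, k trades training and testing time against how faithfully the decoded scores reconstruct the original categories.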
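Micro-F1 pools true positives, false positives, and false negatives over every (point, category) decision and then computes F1 once. As a result, frequent categories weigh more than rare ones. A minimal sketch follows. It assumes each point's gold and predicted categories are given as sets of category names, and the names in the example are made up.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 over all (point, category) decisions."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # categories predicted and correct
        fp += len(p - g)   # categories predicted but wrong
        fn += len(g - p)   # gold categories that were missed
    denom = 2 * tp + fp + fn
    # 2TP / (2TP + FP + FN) equals 2PR / (P + R) with P, R from the pooled counts
    return 2 * tp / denom if denom else 0.0


gold = [{"restaurant", "cafe"}, {"hotel"}]
pred = [{"restaurant"}, {"hotel", "parking"}]
print(micro_f1(gold, pred))
```

Here TP = 2, FP = 1 and FN = 1, so precision and recall are both 2/3 and Micro-F1 is about 0.667.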

Keywords (Chinese)

★ Machine Learning
★ Multi-Label Classification
★ Imbalanced Data
★ Point of Interest

Keywords (English)

★ Machine Learning
★ Multi-Label Classification
★ Imbalanced Data
★ Point of Interest

Table of Contents

Chinese Abstract
Abstract
List of Figures
List of Tables
1. Introduction
1.1 Research Motivation and Objectives
1.2 Multi-Label Classification
1.3 Chapter Overview
2. Related Work
3. System Architecture
3.1 Data Preprocessing
3.2 KDE-based Classification
3.3 Testing
4. Experimental Results
4.1 Dataset Description
4.2 Evaluation Metrics
4.3 Experimental Analysis and Discussion
4.3.1 Effect of β on KDE Results
4.3.2 Comparison of Training Tools
4.3.3 Training and Testing Time
4.3.4 Performance of Each Method
4.3.5 Experiments on Improving SVM for Imbalanced Data
4.3.6 Effect of Additional Features
5. Conclusions and Future Work
6. References


Advisor

張嘉惠 (Chia-Hui Chang)

Approval Date

2017-08-24
