On Large-Scale Multi-Label Classification for POI Tagging

DC Field	Value	Language
DC.contributor	Department of Computer Science and Information Engineering	zh_TW
DC.creator	楊鎧謙	zh_TW
DC.creator	Kai-Qian Yang	en_US
dc.date.accessioned	2017-8-24T07:39:07Z
dc.date.available	2017-8-24T07:39:07Z
dc.date.issued	2017
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=103522046
dc.contributor.department	Department of Computer Science and Information Engineering	zh_TW
DC.description	National Central University	zh_TW
DC.description	National Central University	en_US
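dc.description.abstract	In recent years, smart handheld devices have spread rapidly, to the point where almost everyone owns one. Advances in transportation have also made people travel far more often, so they find themselves in unfamiliar places more often. Finding points of interest in an unfamiliar environment is not easy, so an electronic map system is needed for lookup. An electronic map that offers only name search is not enough, because users may not know the exact names of these points; they may simply want to find points of a particular type, so a good electronic map needs to provide a category search service. To provide category search, we need to classify all the points in the system. Because the system contains many records and each record has one or more categories, this is a large-scale multi-label classification problem. Map data of this kind can usually be classified in many ways; we use the classification scheme of the Chinese Yellow Pages (中華黃頁). The categories have two levels: level 1 has 29 categories and level 2 has 1,287. Because the categories and records are numerous, the usual approach requires training many classifiers, which greatly increases training and testing time. We speed up training and testing by reducing the dimension of the categories. Experiments show that training and testing with the hybrid KDE+SVM model are both nearly twice as fast as with general SVM classification. Its Micro-F1 reaches 0.813 on the 29 level-1 categories and 0.718 on the level-2 categories, only slightly lower than the Micro-F1 of SVM, which is 0.842 at level 1 and 0.783 at level 2. Because the data are imbalanced, we compared Reweighting and Downsampling in an attempt to improve performance, but the results show that these two methods have little effect on large-scale data.	zh_TW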
dc.description.abstract	In recent years, mobile devices have become increasingly popular, and convenient transportation has made people more likely to visit unfamiliar places. Finding a point of interest in an unfamiliar place is not easy, so users need an electronic map system. Name search alone is not enough, because users may not know the exact name of a point; they may only want to find points of a specific category, so a good electronic map system needs to provide a category search service. To provide category search, we need to classify all the points in the system. Because the system contains many points and each point has one or more categories, this is a large-scale multi-label classification problem. Among the many possible category schemes, we follow the categories defined by the Chinese Yellow Pages, which consist of two levels: 29 categories in level 1 and 1,287 in level 2. Because the numbers of points and categories are large, training classifiers and testing data take a long time. We reduce the dimension of the categories to speed up training and testing. In our experiments, the training and testing times of our method are shorter than those of general SVM classification. Its Micro-F1 is 0.813 at level 1 and 0.718 at level 2, both slightly lower than those of SVM (0.842 at level 1 and 0.783 at level 2). We also tried Reweighting and Downsampling to improve performance, but neither had much effect on large-scale data.	en_US
DC.subject	Machine Learning	zh_TW
DC.subject	Multi-Label Classification	zh_TW
DC.subject	Imbalanced Data	zh_TW
DC.subject	Point of Interest	zh_TW
DC.subject	Machine Learning	en_US
DC.subject	Multi Label Classification	en_US
DC.subject	Unbalanced Data	en_US
DC.subject	point of interest	en_US
DC.title	On Large-Scale Multi-Label Classification for POI Tagging	en_US
dc.language.iso	en_US	en_US
DC.type	Master's and doctoral thesis	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis 103522046: full metadata record