dc.description.abstract |
With the increasing use of mobile phones, searching for POIs (Points of Interest), such as stores and addresses, has become part of people's daily life. Providing such services requires a massive POI database. However, the POI information in such a database may change over time, and it is annoying for users to receive wrong information. Keeping the POI database up to date by continuously identifying outdated POIs and updating the database has therefore become a key issue.
As the POI database grows, verifying the data manually becomes impractical. The government publishes open data on businesses, e.g. “全國營業(稅籍)登記資料集” (the national business tax registration dataset) and “公司解散登記清冊” (the company dissolution registry). However, these data must be used carefully, since the format of the open data sets differs from that of the general POI databases used in many services. On the other hand, rich information is readily available on the web. Using web information, such as the date a web page was last updated and the volume of mentions of a POI on the web, we can train a verification model to detect POIs in the database that may be outdated.
In this paper, our goal is to detect outdated POIs in the database within a feasible time. The approach is divided into two parts. The first part uses open government information. The second part uses web information to train a model to detect outdated POIs in the database. Experiments show that our approach achieves an F-measure of 0.758 (using Google Maps information, the time elapsed between today and the most recent publishing date, whether the POI appears on an official website, and words describing an outdated POI). The best performance, an F-measure of 0.91, is reached by feature combination, which is 0.201 higher than Chuang. | en_US |