With the increasing use of mobile phones, searching for POIs (Points of Interest), such as stores and addresses, has become part of people's daily lives. Providing such services requires a massive POI database. However, the POI information in such a database may change over time, and receiving wrong information is annoying for users. Keeping the POI database up to date by continuously identifying outdated POIs and updating the database has therefore become a key issue.
As the POI database grows, verifying the data manually becomes impractical. The government publishes open data on businesses, e.g. "全國營業(稅籍)登記資料集" (the national business tax registration dataset) and "公司解散登記清冊" (the register of dissolved companies). However, this data must be used carefully, since the format of the open data sets differs from that of the general POI databases used in many services. On the other hand, rich information is readily available on the Web. Using Web information, such as the date a web page was last updated and the volume of mentions of a POI on the Web, we can train a verification model to detect POIs in the database that may be outdated.
In this paper, our goal is to detect outdated POIs in the database within a feasible time. The approach is divided into two parts. The first part uses open government information. The second part uses Web information to train a model that detects outdated POIs in the database. Experiments show that our approach achieves an F-measure of 0.758 using four features: Google Maps information, the time elapsed between today and the most recent publishing date, whether the POI appears on an official website, and words that describe an outdated POI. With feature combination, the best performance reaches an F-measure of 0.91, which is 0.201 higher than Chuang.
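As an illustration of the four features and the evaluation metric named above, the following is a minimal sketch and not the implementation used in the experiments. The class `PoiEvidence`, its field names, the helper `to_features`, and the sentinel value -1 for a missing publishing date are all hypothetical choices made for this example; only the F-measure definition (the harmonic mean of precision and recall) is standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PoiEvidence:
    """Web evidence collected for one POI (hypothetical field names)."""
    listed_on_map: bool                   # Google Maps information
    latest_publish_date: Optional[date]   # most recent publishing date found
    on_official_site: bool                # appears on an official website
    closure_word_hits: int                # count of words describing an outdated POI


def to_features(ev: PoiEvidence, today: date) -> List[float]:
    """Turn the evidence into a numeric feature vector for a classifier."""
    if ev.latest_publish_date is None:
        days_since = -1.0  # assumed sentinel for "no date found"
    else:
        days_since = float((today - ev.latest_publish_date).days)
    return [
        float(ev.listed_on_map),
        days_since,
        float(ev.on_official_site),
        float(ev.closure_word_hits),
    ]


def f_measure(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """F-measure (F1) for the positive class, here 'outdated'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Any binary classifier could consume the vectors produced by `to_features`. Reporting the F-measure for the "outdated" class, rather than accuracy, keeps the score informative when outdated POIs are a small fraction of the database.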
Chuang, H. M., & Chang, C. H. (2015, May). Verification of POI and location pairs via weakly labeled web data. In Proceedings of the 24th International Conference on World Wide Web (pp. 743-748).
Al-Bahadili, H., Qtishat, H., & Naoum, R. S. (2013). Speeding up the Web Crawling process on a Multi-core processor using Virtualization. International Journal on Web Service Computing, 4(1), 19.
Chuang, H. M., Chang, C. H., & Kao, T. Y. (2014, September). Effective web crawling for Chinese addresses and associated information. In International Conference on Electronic Commerce and Web Technologies (pp. 13-25). Springer International Publishing.
Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992, July). A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on Computational learning theory (pp. 144-152). ACM.
Chuang, H. M., & Chang, C. H. (2016). POI Extraction and Relation Verification from the Web [Chuang, NCU, PhD thesis].
Platt, J. (1998). Sequential minimal optimization: A fast algorithm for training support vector machines.
Tran, T., & Cao, T. H. (2013). Automatic Detection of Outdated Information in Wikipedia Infoboxes. Research in Computing Science, 70, 211-222.
Hu, Y., Janowicz, K., & Prasad, S. (2014, November). Improving Wikipedia-based place name disambiguation in short texts using structured data from DBpedia. In Proceedings of the 8th workshop on geographic information retrieval (p. 8). ACM.
Lin, Y. Y., & Chang, C. H. (2014). Store Name Extraction and Name-Address Matching for Geographic Information Retrieval [Lin, NCU, Master's thesis].