Abstract (English) |
Mobile devices are the trend of 2014. According to an IDC report, unit shipments of tablets exceeded those of PCs for the first time in 2013 Q4, and smartphones have already surpassed other devices in unit shipments and market share. Location-based services (LBS) play an important role in this trend. Because these devices are mobile, they create many new demands, such as navigation and searching for nearby restaurants or gas stations. An LBS usually needs a point-of-interest (POI) database to support it. The Web is the largest data source for such a database: its data come from website managers, crowdsourcing, and people sharing information, and include addresses, names, phone numbers, and comments. Many methods exist today for extracting address-associated information, but they usually struggle to extract the name of the POI, which remains a limitation of information retrieval.
Our system consists of three parts: Taiwan address normalization, store named entity recognition (Store NER), and address-store name matching (Address-StoreNE matching). With the resulting data, users can search for store names on a mobile device and immediately get information such as the address, telephone number, and comments. For Store NER, we identify characteristics common to store and organization names and use them as features in a conditional random field (CRF) model, which improves the recognition results. |
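To make the Store NER step more concrete, the following is a minimal sketch of how name characteristics could be turned into per-character features and BIO labels for a CRF-style sequence labeler. It is not the thesis implementation: the suffix list `STORE_SUFFIXES`, the feature names, the `STORE` tag, and the helper functions are all hypothetical, since the abstract does not list the actual characteristics used.

```python
# Hypothetical suffix list: the abstract does not enumerate the actual
# characteristics of store names, so these are illustrative only.
STORE_SUFFIXES = ("店", "餐廳", "公司", "咖啡", "藥局")


def char_features(text, i):
    """Build a feature dict for the character at position i of text."""
    ch = text[i]
    return {
        "char": ch,
        "is_digit": ch.isdigit(),
        "prev_char": text[i - 1] if i > 0 else "<BOS>",
        "next_char": text[i + 1] if i < len(text) - 1 else "<EOS>",
        # True when the text up to and including this character ends
        # with one of the (assumed) store-name suffixes.
        "ends_suffix": any(text[: i + 1].endswith(s) for s in STORE_SUFFIXES),
    }


def sentence_features(text):
    """One feature dict per character, the usual input shape for CRF toolkits."""
    return [char_features(text, i) for i in range(len(text))]


def bio_labels(text, name):
    """Character-level BIO tags for the first occurrence of name in text."""
    labels = ["O"] * len(text)
    start = text.find(name) if name else -1
    if start >= 0:
        labels[start] = "B-STORE"
        for j in range(start + 1, start + len(name)):
            labels[j] = "I-STORE"
    return labels


if __name__ == "__main__":
    sentence = "阿明咖啡店在中正路"
    for feats, tag in zip(sentence_features(sentence), bio_labels(sentence, "阿明咖啡店")):
        print(tag, feats)
```

Pairs of feature sequences and label sequences in this shape could then be passed to a CRF trainer of choice; the trainer itself is omitted here because the abstract does not name the toolkit used.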