References
[1] Chang CH., Chen TS., Chen MC., Ding JL. Efficient Page-Level Data Extraction via
Schema Induction and Verification. PAKDD 2016.
[2] Chang, C.-H., Kayed, M., Girgis, M. R. and Shaalan, K. F. (2006). A Survey of Web
Information Extraction Systems. IEEE Transactions on Knowledge and Data Engineering.
18. 1411-1428.
[3] Chang, C.-H. and Lui, S.-C., IEPAD: Information extraction based on pattern discovery.
Proceedings of the Tenth International Conference on World Wide Web (WWW), Hong
Kong, pp. 223-231, 2001.
[4] Crescenzi, V., Mecca, G. and Merialdo, P., RoadRunner: towards automatic data extraction
from large Web sites. Proceedings of the 26th International Conference on Very Large
Data Bases (VLDB), Rome, Italy, pp. 109-118, 2001.
[5] M. Kayed and C.-H. Chang, "FiVaTech: Page-Level Web Data Extraction from Template Pages," IEEE
Transactions on Knowledge and Data Engineering, vol. 22, pp. 249-263, 2010.
[6] Chen, Ming-Cyuan, et al. A Study on Page-Level Data Extraction Using Path Information to
Assist Template Mining (in Chinese). Technologies and Applications of Artificial Intelligence, 2013.
[7] Chen, Tian-Cheng, et al. Fast Page-Level Web Data Extraction and Schema Verification
(in Chinese). Technologies and Applications of Artificial Intelligence, 2014.
[8] Gottron T. (2008) Clustering Template Based Web Documents. In: Macdonald C., Ounis I.,
Plachouras V., Ruthven I., White R.W. (eds) Advances in Information Retrieval. ECIR
2008. Lecture Notes in Computer Science, vol 4956.
[9] Huang X. et al. (2017) Web Content Extraction Using Clustering with Web Structure. In:
Cong F., Leung A., Wei Q. (eds) Advances in Neural Networks - ISNN 2017. ISNN 2017.
Lecture Notes in Computer Science, vol 10261.
[10] Nikolaos K. Papadakis, Dimitrios Skoutas, Konstantinos Raftopoulos, and Theodora A.
Varvarigou. 2005. STAVIES: A System for Information Extraction from Unknown Web
Data Sources through Automatic Web Wrapper Generation Using Clustering Techniques.
IEEE Trans. on Knowl. and Data Eng. 17, 12 (December 2005), 1638-1652.
[11] Mucha J., Snaprud M., Nietzio A. (2016) Web Page Clustering for More Efficient Website
Accessibility Evaluations. In: Miesenberger K., Buhler C., Penaz P. (eds) Computers
Helping People with Special Needs. ICCHP 2016. Lecture Notes in Computer Science,
vol 9758.
[12] Eibe Frank, Mark A. Hall, and Ian H. Witten (2016). The WEKA Workbench. Online
Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Morgan
Kaufmann, Fourth Edition, 2016.
[13] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
[14] Crescenzi, Valter & Merialdo, Paolo & Missier, Paolo. (2005). Clustering Web pages based
on their structure. Data & Knowledge Engineering. 54. 279-299.
[15] https://en.wikipedia.org/wiki/Support_vector_machine
[16] C.-C. Chang, C.-J. Lin, LIBSVM: A Library for Support Vector Machines, ACM
Transactions on Intelligent Systems and Technology, Vol. 2, No.3, Article 27, April, 2011.
[17] https://nlp.stanford.edu/IR-book/html/htmledition/single-link-and-complete-link-clustering-1.html
[18] https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html
[19] Peter J. Rousseeuw, Silhouettes: A graphical aid to the interpretation and validation of
cluster analysis, Journal of Computational and Applied Mathematics, Volume 20, 1987,
Pages 53-65.
[20] http://daisen.cc.kyushu-u.ac.jp/TBDW/
[21] C.-H. Chang, Y.-L. Lin, K.-C. Lin, and M. Kayed, "Page-Level Wrapper Verification for
Unsupervised Web Data Extraction," in Web Information Systems Engineering – WISE 2013,
vol. 8180, X. Lin, Y. Manolopoulos, D. Srivastava, and G. Huang, Eds. Springer
Berlin Heidelberg, 2013, pp. 454-467.
[22] O. Yuliana, C.-H. Chang, A novel alignment algorithm for effective web data extraction
from singleton pages, Applied Intelligence (To appear).