References
[1] T. Abeel, Y. Van de Peer, and Y. Saeys. Java-ML: A machine learning library. Journal of Machine Learning Research, 10:931–934, 2009. Software available at http://java-ml.sourceforge.net.
[2] A. Arasu and H. Garcia-Molina. Extracting structured data from web pages. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. ACM, New York, pages 337–348, 2003.
[3] M.-F. Balcan, S. Hanneke, and J. Vaughan. The true sample complexity of active learning. Machine Learning, 80:111–139, 2010.
[4] M. Bronzi, V. Crescenzi, P. Merialdo, and P. Papotti. Extraction and integration of partially overlapping web sources. In Proceedings of the VLDB Endowment, Vol. 6, No. 10, pages 805–816, 2013.
[5] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[6] C.-H. Chang, T.-S. Chen, M.-C. Chen, and J.-L. Ding. Efficient page-level data extraction via schema induction and verification. In Proceedings of the 20th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Switzerland, pages 478–490, 2016.
[7] C.-H. Chang, Y.-L. Lin, K.-C. Lin, and M. Kayed. Page-level wrapper verification for unsupervised web data extraction. In Proceedings of the 14th International Conference on Web Information Systems Engineering. Springer, Berlin, pages 454–467, 2013.
[8] C.-H. Chang and S.-C. Lui. IEPAD: information extraction based on pattern discovery. In Proceedings of the 10th International Conference on World Wide Web. ACM, New York, pages 681–688, 2001.
[9] V. Crescenzi and G. Mecca. Automatic information extraction from large websites. Journal of the ACM (JACM), 51(5):731–779, 2004.
[10] V. Crescenzi, P. Merialdo, and D. Qiu. Hybrid crowd-machine wrapper inference. ACM Transactions on Knowledge Discovery from Data, 13(5):1–43, 2019.
[11] I. F. de Viana, P. J. Abad, J. L. Alvarez, and J. L. Arjona. MAVE: Multilevel wrApper Verification systEm. IEEE Transactions on Knowledge and Data Engineering, 28(9):2393–2406, 2016.
[12] Diffbot. Diffbot, 2020. Retrieved May 5, 2020 from https://www.diffbot.com.
[13] R. R. Fayzrakhmanov, E. Sallinger, B. Spencer, T. Furche, and G. Gottlob. Browserless web data extraction: Challenges and opportunities. In Proceedings of WWW’18: The World Wide Web Conference, Lyon, France, pages 1095–1104, 2018.
[14] Fminer. Fminer, 2020. Retrieved May 5, 2020 from http://www.fminer.com.
[15] T. Furche, G. Gottlob, G. Grasso, X. Guo, G. Orsi, C. Schallhart, and C. Wang. DIADEM: thousands of websites to a single database. In Proceedings of the VLDB Endowment, Vol. 7, No. 14, pages 1845–1856, 2014.
[16] J. Guo, V. Crescenzi, T. Furche, G. Grasso, and G. Gottlob. RED: Redundancy-driven data extraction from result pages. In Proceedings of WWW’19: The World Wide Web Conference, San Francisco, CA, USA, pages 605–615, 2019.
[17] C.-N. Hsu and C.-C. Chang. Finite-state transducers for semi-structured text mining. In Proceedings of IJCAI-99 Workshop on Text Mining: Foundations, Techniques and Applications. USA, pages 38–49, 1999.
[18] Import.io. import.io, 2020. Retrieved May 5, 2020 from https://www.import.io.
[19] Ipswitch, Inc. iMacros, 2020. Retrieved May 5, 2020 from https://imacros.net.
[20] U. Irmak and T. Suel. Interactive wrapper generation with minimal user effort. In Proceedings of the 15th International Conference on World Wide Web, pages 553–563, 2006.
[21] P. Jiménez and R. Corchuelo. On learning web information extraction rules with TANGO. Information Systems, 62:74–103, 2016.
[22] M. Kayed and C.-H. Chang. FiVaTech: Page-level web data extraction from template pages. IEEE Transactions on Knowledge and Data Engineering, 22(2):249–263, 2010.
[23] N. Kushmerick. Wrapper verification. World Wide Web, 3(2):79–94, 2000.
[24] N. Kushmerick, D. S. Weld, and R. Doorenbos. Wrapper induction for information extraction. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 729–737, 1997.
[25] K. Lerman, S. N. Minton, and C. A. Knoblock. Wrapper maintenance: A machine learning approach. Journal of Artificial Intelligence Research, 18:149–181, 2003.
[26] B. Liu, R. Grossman, and Y. Zhai. Mining data records in web pages. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, pages 601–606, 2003.
[27] J. Lyseggen. Outside Insight: Navigating a World Drowning in Data. Penguin, 2017.
[28] Mozenda, Inc. Mozenda, 2020. Retrieved May 5, 2020 from https://www.mozenda.com.
[29] I. Muslea, S. Minton, and C. Knoblock. Stalker: Learning extraction rules for semistructured, web-based information sources. In Proceedings of AAAI-98 Workshop on AI and Information Integration. AAAI Press, USA, pages 74–81, 1998.
[30] I. Muslea, S. Minton, and C. Knoblock. Active learning with multiple views. Journal of Artificial Intelligence Research, 27(1):203–233, 2006.
[31] N. Okazaki. CRFsuite: A fast implementation of conditional random fields (CRFs), 2007. Retrieved May 25, 2016 from http://www.chokkan.org/software/crfsuite/.
[32] S. Ortona, G. Orsi, M. Buoncristiano, and T. Furche. WADaR: Joint repairs for web wrappers. In Proceedings of the VLDB Endowment, pages 1996–1999, 2015.
[33] S. Ortona, G. Orsi, T. Furche, and M. Buoncristiano. Joint repairs for web wrappers. In Proceedings of IEEE 32nd International Conference on Data Engineering (ICDE), pages 1146–1157, 2016.
[34] B. Settles. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison, 2009.
[35] H. A. Sleiman and R. Corchuelo. TEX: An efficient and effective unsupervised web information extractor. Knowledge-Based Systems, 39:109–123, 2013.
[36] Visual Web Ripper. Visual Web Ripper, 2020. Retrieved May 5, 2020 from http://visualwebripper.com.
[37] J. Wang and W. Tepfenhart. Formal Methods in Computer Science. CRC Press, 2019.
[38] Wrapidity Limited. Wrapidity, 2020. Retrieved May 5, 2020 from https://www.wrapidity.com.
[39] O. Y. Yuliana and C.-H. Chang. DCADE: divide and conquer alignment with dynamic encoding for full page data extraction. Applied Intelligence, 50:271–295, 2019.
[40] Y. Zhai and B. Liu. Structured data extraction from the web based on partial tree alignment. IEEE Transactions on Knowledge and Data Engineering, 18(12):1614–1628, 2006.