References
1. Aggarwal, C. C., Al-Garawi, F., and Yu, P. S. 2001. Intelligent crawling on the World Wide Web with arbitrary predicates. In Proceedings of the 10th international Conference on World Wide Web (Hong Kong, Hong Kong, May 01 - 05, 2001). WWW '01. ACM Press, New York, NY, 96-105.
2. Chakrabarti, S., Punera, K., and Subramanyam, M. 2002. Accelerated focused crawling through online relevance feedback. In Proceedings of the 11th international Conference on World Wide Web (Honolulu, Hawaii, USA, May 07 - 11, 2002). WWW '02. ACM Press, New York, NY, 148-159.
3. Diligenti, M., Coetzee, F., Lawrence, S., Giles, C. L., and Gori, M. 2000. Focused Crawling Using Context Graphs. In Proceedings of the 26th international Conference on Very Large Data Bases (September 10 - 14, 2000). A. E. Abbadi, M. L. Brodie, S. Chakravarthy, U. Dayal, N. Kamel, G. Schlageter, and K. Whang, Eds. Very Large Data Bases. Morgan Kaufmann Publishers, San Francisco, CA, 527-534.
4. Fontes, A. d. and Silva, F. S. 2004. SmartCrawl: a new strategy for the exploration of the hidden web. In Proceedings of the 6th Annual ACM international Workshop on Web information and Data Management (Washington DC, USA, November 12 - 13, 2004). WIDM '04. ACM Press, New York, NY, 9-15.
5. Cho, J., Garcia-Molina, H., and Page, L. 1998. Efficient crawling through URL ordering. In Proceedings of the Seventh International World Wide Web Conference (Brisbane, Australia, 1998).
6. Liu, H., Milios, E., and Janssen, J. 2004. Probabilistic models for focused web crawling. In Proceedings of the 6th Annual ACM international Workshop on Web information and Data Management (Washington DC, USA, November 12 - 13, 2004). WIDM '04. ACM Press, New York, NY,
7. Menczer, F., Pant, G., Srinivasan, P., and Ruiz, M. E. 2001. Evaluating topic-driven web crawlers. In Proceedings of the 24th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (New Orleans, Louisiana, United States). SIGIR '01. ACM Press, New York, NY, 241-249.
8. Ester, M., Kriegel, H.-P., and Schubert, M. 2004. Accurate and efficient crawling for relevant websites. In Proceedings of the 30th international Conference on Very Large Data Bases (Toronto, Canada, August 31 - September 3, 2004). VLDB '04. 396-407.
9. Najork, M. and Wiener, J. L. 2001. Breadth-first crawling yields high-quality pages. In Proceedings of the 10th international Conference on World Wide Web (Hong Kong, Hong Kong, May 01 - 05, 2001). WWW '01. ACM Press, New York, NY, 114-118.
10. Pandey, S. and Olston, C. 2005. User-centric Web crawling. In Proceedings of the 14th international Conference on World Wide Web (Chiba, Japan, May 10 - 14, 2005). WWW '05. ACM Press, New York, NY, 401-411.
11. Raghavan, S. and Garcia-Molina, H. 2001. Crawling the Hidden Web. In Proceedings of the 27th international Conference on Very Large Data Bases (September 11 - 14, 2001). P. M. Apers, P. Atzeni, S. Ceri, S. Paraboschi, K. Ramamohanarao, and R. T. Snodgrass, Eds. Very Large Data Bases. Morgan Kaufmann Publishers, San Francisco, CA, 129-138.
12. Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the Seventh International World Wide Web Conference (Brisbane, Australia, 1998).
13. Google SOAP Search API, http://code.google.com/apis/soapsearch/
14. JAMA: A Java Matrix Package, http://math.nist.gov/javanumerics/jama/
15. Jahmm-Hidden Markov Model: An Implementation in Java, http://www.run.montefiore.ulg.ac.be/~francois/software/jahmm/
16. JDIC: JDesktop Integration Components, https://jdic.dev.java.net/
17. Heaton, J. Programming Spiders, Bots, and Aggregators in Java. ISBN 0782140408, http://www.jeffheaton.com/java/bot/
18. K-means Clustering Tool, http://www.javaworld.com/javaworld/jw-11-2006/jw-1121-thread.html
19. K-Nearest-Neighbor, http://ww2.cs.fsu.edu/~chap/projects/knn/
20. LSI: Latent Semantic Indexing Tool, http://www.cs.utk.edu/~lsi/
21. String Edit Distance, http://en.wikipedia.org/wiki/Levenshtein_distance
22. Web Crawler, http://en.wikipedia.org/wiki/Web_crawling
23. Weka 3: Data Mining Software in Java, http://www.cs.waikato.ac.nz/ml/weka/
24. Wikipedia, http://en.wikipedia.org/wiki/Main_Page
25. WVTool: The Word Vector Tool, http://nemoz.org/joomla/index.php?option=com_content&task=view&id=43&Itemid=83