Searching Scientific Documents by Document Features: Fonts, Positions, and Cited References

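Name

Yun-Ling Cheng (鄭雲玲)

Department

Department of Information Management

Thesis title

Searching Scientific Documents by Document Features: Fonts, Positions, and Cited References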
(Search scientific documents by the features, positions, fonts, and cited references)

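Related theses

★ A Study of Business Intelligence in the Retail Industry
★ Building an Anomaly Detection System for Landline Telephone Calls
★ Applying Data Mining Techniques to the Analysis of School Grades and General Scholastic Ability Test Results: The Case of a Vocational High School Food and Beverage Management Program
★ Using Data Mining Techniques to Improve Wealth Management Performance: A Case Study of a Bank
★ Evaluation and Analysis of Wafer Manufacturing Yield Models: The Case of a Domestic DRAM Manufacturer
★ A Study of Applying Business Intelligence Analysis to Student Grades
★ Using Data Mining Techniques to Build a Prediction Model of Academic Achievement for Upper-Grade Elementary School Students
★ A Study of Building a Motorcycle Loan Risk Assessment Model with Data Mining Techniques: The Case of Company A
★ Applying Performance Indicator Evaluation to Improve Quality Assurance in R&D Design
★ Applying Machine Learning to Text Résumés and Personality Traits to Improve Hiring Quality
★ A General Framework Based on a Relation Genetic Algorithm for Solving Set Partitioning Problems with Constraint Handling
★ Mining Generalized Knowledge from Relational Databases
★ Constructing Decision Trees with Consideration of Attribute Value Acquisition Delay
★ A Method for Finding Preference Graphs from Sequence Data, Applied to Group Ranking Problems
★ Using a Partitional Clustering Algorithm to Find Consensus Groups for Group Decision-Making Problems
★ A Novel Approach to Ordered Consensus Groups Applied to Group Decision-Making Problems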

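This electronic thesis is authorized for immediate open access.
Full text that has reached its open-access date is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.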

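Abstract (translated from the Chinese)

In today's society, academic output is increasingly valued and in demand. However, as scientific results accumulate year by year and are published and circulated on the Internet, finding documents that meet a user's needs among this large volume of material has become a major challenge. Because scientific documents are structured texts, they necessarily contain factors that can be used to improve retrieval effectiveness. This study examines three features of scientific documents: fonts, positions, and cited references. Although each has been studied individually in the past, no research to date has integrated all three to improve retrieval effectiveness. We first clarify the relationships among fonts, positions, and cited references; then, based on those relationships, we combine the three to design a method that improves document retrieval effectiveness. Finally, we conduct experiments on real scientific documents to demonstrate the validity and effectiveness of the proposed method.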

Abstract (English)

With the rapid dissemination of research results on the World Wide Web, a user's task of finding useful information has become more challenging. Usage of scholarly material is growing rapidly, and there is a growing demand for high-quality scholarly information. Because a scientific document is a structured text, it should contain features that can be used to improve retrieval performance. Here, we investigate three such features: fonts, positions, and cited references. Although each of these features has been used individually in document search, no existing research discusses how to integrate all three to improve retrieval performance. Therefore, we first investigate the relationships among them and then study how to combine them, based on those relationships, to design a novel retrieval method. Finally, extensive experiments on real scientific documents demonstrate the usefulness and performance of the method.

Keywords (translated from the Chinese)

★ Information retrieval
★ Scientific documents
★ Text
★ Features
★ Similarity

Keywords (English)

★ Scientific documents
★ Information retrieval
★ Similarity
★ Features
★ Text

Table of contents

List of Illustrations
List of Tables
1. Introduction
2. Related Work
2.1. Fonts
2.2. Positions in documents
2.3. Cited references
2.4. Similarity measures
3. Methodology
3.1. Document Pre-processing
3.1.1. Database building
3.2. Vector Construction
3.2.1. Construction of Content Vector
3.2.2. Construction of Reference Vector
3.3. Determining the similarities between documents
4. Evaluation
4.1. The experimental environment
4.2. The stage of pre-test
4.2.1. The experiment design
4.2.2. Experimental results
4.3. The stage of formal evaluation
4.3.1. The experiment design
4.3.2. Experimental results
5. Conclusion
6. References

Advisor

Yen-Liang Chen (陳彥良)

Approval date

2006-6-23
