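Abstract: | The rapid growth of the Internet lets information and knowledge circulate more widely and efficiently. Easy access to information, however, also means that inappropriate material spreads more freely online. As computer education becomes widespread, more and more people can reach the network, and negative material spread over the Internet, such as pornography, violence, drug use, and racial hatred, becomes more penetrating than it is through physical distribution channels because the access environment is unguarded. It is therefore necessary, within limits that do not impede freedom of speech, to apply some degree of filtering to the website content reachable from, and the access behavior within, network environments that mainly serve elementary and junior high school education. In research on website filtering, blacklists are one popular approach, and the way the list is obtained varies by method; in general these include manual inspection, keyword analysis, and automated searching by programs. This paper develops a combined analysis method based on the image and text characteristics of pornographic websites. For pornographic images, it uses image processing and pattern analysis techniques such as color analysis, texture analysis, medial axis extraction, and Shape From Shading to determine whether an image contains skin-toned regions and whether those regions can represent a nude human body. For text, it applies information retrieval and document classification methods to measure the number and frequency of pornography-related keywords. Finally, it weighs the feature vectors extracted from both aspects, computes the similarity between them, and performs cluster analysis on the list to further separate pornographic from non-pornographic URLs and improve the overall precision of the list.
With the explosive growth of the Internet, information and knowledge can spread widely and efficiently. Computer education has become available to all in recent years, letting more and more people access a variety of material on the Internet, but at the same time this implies a flood of inappropriate Internet content. In an unprotected environment, objectionable topics such as pornography, violence, and hate messages will reach those who should not access these web sites. Thus, it is necessary to apply a filtering scheme to offensive content without harming free speech. Blacklists are a popular approach in current web filtering research, and there are various methods of collecting a blacklist, e.g. keyword analysis and human inspection, but some false positives always exist. In this paper we develop a compound method, based on the multiple characteristics of pornography sites in image and text, to refine the blacklist. For erotic images, we use image processing techniques: color segmentation, coarse detection, medial axis extraction, and shape from shading. For text in web documents, we use techniques from Information Retrieval and Document Classification to measure the number and frequency of erotic keywords. After extracting the two forms of feature vector, we measure the similarity of two documents by the angle between their feature vectors.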
Finally, the refinement task is cast as a graph partitioning problem, which divides the blacklist into two groups: pornographic sites and non-pornographic sites. |
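The last two steps described in the abstract (comparing documents by the angle between their feature vectors, then dividing the blacklist into two groups) can be illustrated with a minimal Python sketch. The feature layout, site names, numeric values, and function names below are hypothetical and chosen only for illustration. The abstract does not say which partitioning algorithm is used, so a simple farthest-pair seeding heuristic stands in for it here; it is not the author's method.

```python
import math
from itertools import combinations


def angle(u, v):
    """Angle in radians between two feature vectors (smaller = more similar)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: a zero vector is treated as orthogonal to everything.
        return math.pi / 2
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)


def two_way_split(vectors):
    """Split sites into two groups using a stand-in heuristic.

    The two sites separated by the largest angle become seeds, and every
    other site joins the seed it forms the smaller angle with.
    """
    names = sorted(vectors)
    if len(names) < 2:
        return set(names), set()
    seed_a, seed_b = max(
        combinations(names, 2),
        key=lambda pair: angle(vectors[pair[0]], vectors[pair[1]]),
    )
    group_a, group_b = {seed_a}, {seed_b}
    for name in names:
        if name in (seed_a, seed_b):
            continue
        if angle(vectors[name], vectors[seed_a]) <= angle(vectors[name], vectors[seed_b]):
            group_a.add(name)
        else:
            group_b.add(name)
    return group_a, group_b


# Hypothetical features: [skin-tone area ratio, erotic keyword frequency,
# frequency of other terms]. All values are made up for this example.
sites = {
    "site_a": [0.8, 0.7, 0.1],
    "site_b": [0.7, 0.9, 0.2],
    "site_c": [0.1, 0.0, 0.9],
    "site_d": [0.0, 0.1, 0.8],
}
groups = two_way_split(sites)
```

Because the angle ignores vector length, two sites with the same mix of features are treated as similar even if one has far fewer images or words. A real graph partitioning method would weigh all pairwise similarities rather than only the distance to two seeds.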