Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98023
Title: AI-based Automated Information Extraction, Import, and Annotation for Geekynotes Video Documentation
Author: Yang, Pin-Hsiang
Contributor: Institute of Software Engineering
Keywords: Software Documentation; Knowledge Transfer; Speech-to-Text (STT); Large Language Model (LLM); Optical Character Recognition (OCR)
Date: 2025-07-28
Upload time: 2025-10-17 12:15:56 (UTC+8)
Publisher: National Central University
Abstract: Geekynotes is a tool designed to address the problem of knowledge transfer in software development. Its goal is to consolidate the various types of documentation generated during development, such as images, videos, and presentations, into a single platform. Through its distinctive Label feature, these materials can be linked to specific parts of the source code, making it easier for later developers to retrieve and manage relevant information. Among these labels, the Video Label creates bidirectional links between videos and source code.
Replacing traditional text-based code explanations with recorded video walkthroughs often conveys the original development intent more clearly and quickly. However, relying on videos introduces three key challenges: long videos make it hard for viewers to identify the key points and cannot be searched, a video is limited to the language spoken by the recorder, and the link between a video explanation and the actual code is often imprecise.
To address these issues, this thesis proposes an AI Video Enhancement system that integrates Speech-to-Text (STT), Large Language Model (LLM), and Optical Character Recognition (OCR) technologies to transform videos into structured multilingual textual data. The system automatically analyzes the content to establish precise correspondences between video segments and source-code files.
This research addresses fundamental limitations of video documentation in knowledge transfer. By turning unstructured video media into searchable, interactive learning resources, it significantly improves developers' efficiency in acquiring and understanding technical information, providing an innovative technical solution for knowledge management and documentation maintenance in modern software development.
Appears in Collections: [Institute of Software Engineering] Master's and Doctoral Theses
Files in This Item:

| File | Description | Size | Format | Views |  |
|---|---|---|---|---|---|
| index.html |  | 0Kb | HTML | 6 | View/Open |
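The abstract above describes a three-stage pipeline: speech is transcribed into time-stamped text (STT), on-screen text is read from sampled video frames (OCR), and a language model maps each transcribed segment to the source-code files it discusses (LLM). The Python sketch below only illustrates how such a pipeline could be wired together; the data shapes, function names, and prompt are assumptions made for illustration and are not taken from the thesis.

```python
"""Minimal sketch of a video-to-code annotation pipeline (illustrative only).

All helpers below are hypothetical stubs; the thesis's actual STT, OCR, and
LLM components are not reproduced here.
"""
from dataclasses import dataclass


@dataclass
class Segment:
    start: float           # segment start time, in seconds
    end: float              # segment end time, in seconds
    transcript: str         # STT output for this time range
    on_screen_text: str = ""  # OCR output from frames in this range


def transcribe(video_path: str) -> list[Segment]:
    """Hypothetical STT step: return time-stamped transcript segments."""
    raise NotImplementedError("plug in an STT engine, e.g. a Whisper model")


def ocr_frames(video_path: str, segment: Segment) -> str:
    """Hypothetical OCR step: read code or file names visible on screen."""
    raise NotImplementedError("plug in an OCR engine, e.g. Tesseract")


def match_code_files(segment: Segment, repo_files: list[str]) -> list[str]:
    """Hypothetical LLM step: pick the repository files a segment explains."""
    prompt = (
        "Given this transcript and on-screen text, list the repository "
        f"files being explained.\nTranscript: {segment.transcript}\n"
        f"On-screen text: {segment.on_screen_text}\nFiles: {repo_files}"
    )
    raise NotImplementedError("send `prompt` to an LLM and parse its reply")


def annotate_video(video_path: str, repo_files: list[str]) -> list[dict]:
    """Drive the three stages and emit segment-to-file links."""
    links = []
    for seg in transcribe(video_path):
        seg.on_screen_text = ocr_frames(video_path, seg)
        links.append({
            "start": seg.start,
            "end": seg.end,
            "files": match_code_files(seg, repo_files),
        })
    return links
```

Under this sketch, each emitted link pairs a time range in the video with the files it appears to explain, which is the kind of segment-to-code correspondence the abstract attributes to the Video Label enhancement.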
All items in NCUIR are protected by original copyright.
            
	
        
		        