Thesis/Dissertation 109522148: Detailed Record




Author 陳哲民 (Che-Min Chen)   Department: Computer Science and Information Engineering
Thesis title
(Applying Content-Defined Chunking to OCSSD-based Deduplication Systems)
Related theses
★ Rethinking Virtual Memory Management with Open-Channel SSDs to Minimize the Read/Write Traffic of Deep Learning Recommendation System Algorithms
★ Enabling a Process-Similarity Checking Method on Assembled Superblocks to Minimize Extra Write Latency
★ LaDy: Enabling Locality-aware Deduplication Technology on Shingled Magnetic Recording Drives
★ On Minimizing Writing Overhead to Establish a Low-latency LSM-tree on Skyrmion-based Racetrack Memory
★ WABE: Rethinking B-epsilon-tree to Minimize Write-amplification on NAND Flash Memory
★ Rethinking Bϵ tree Indexing Structure over NVM with the Support of Multi-write Modes
★ Prophet’s Insight: Unleashing Deduplication System Performance in Multi-tier Storage Systems
★ Freeing the Power of High Parallelism: Accelerating the Bϵ-Tree Indexing Scheme Performance on Open-Channel SSD
★ GraLoc: Preserving Graph Locality to Minimize Read and Write Amplification on NAND Flash Memory
Files: full text available for viewing in the system after 2026-8-11
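Abstract (translated from Chinese) This thesis explores the challenges and opportunities encountered when applying Content-Defined Chunking (CDC) to data deduplication systems based on solid-state drives (SSDs). CDC is a technique that divides data into variable-length chunks according to the data's content rather than its logical addresses. CDC can reduce the amount of duplicate data written to the storage system, thereby improving the performance and storage efficiency of SSDs. However, CDC also poses difficulties for SSD-based deduplication systems, such as misalignment between chunks and SSD pages, and inefficient address mapping between logical block addresses (LBAs) and physical page addresses (PPAs).

To address these problems, we propose leveraging host-managed SSDs (e.g., OCSSD, ZNS), a new type of SSD that exposes its internal blocks to the host system and allows the host to fully control data placement within the SSD. We propose a simple data alignment technique to mitigate the negative impact of misalignment on read performance. We also incorporate the concept of multi-streaming, which assigns data to different streams according to its lifetime, and use host-provided information about chunk size and creation time to achieve better data placement and reduce the cost of garbage collection. We evaluate the proposed approach through a simulation-based implementation and share some insights into the challenges we encountered.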
Abstract (English) This paper explores the challenges and opportunities of applying Content-Defined Chunking (CDC) to SSD-based deduplication systems. CDC is a technique that divides data into variable-sized chunks based on their content, rather than into fixed-size blocks based on their logical addresses. CDC can reduce the amount of redundant data written to the storage system, which can improve the performance and storage efficiency of SSDs. However, CDC also introduces some difficulties for SSD-based deduplication systems, such as misalignment between chunks and SSD pages, and inefficient address mapping between logical block addresses (LBAs) and physical page addresses (PPAs).
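
To make the idea of content-defined boundaries concrete, the following is a minimal sketch of a CDC chunker paired with fingerprint-based deduplication. It is not the chunker used in the thesis: the gear-style rolling hash, the names `cdc_chunks` and `dedup_store`, and the parameters `MIN_SIZE`, `AVG_MASK`, and `MAX_SIZE` are all assumptions chosen for illustration, and SHA-1 is used only as an example fingerprint function.

```python
import hashlib
import random

# Hypothetical parameters, chosen only to keep the example small.
MIN_SIZE = 64             # never cut a chunk shorter than this
AVG_MASK = (1 << 8) - 1   # cut when the low 8 hash bits are all zero
MAX_SIZE = 1024           # always cut once a chunk reaches this size

# Fixed pseudo-random table mapping each byte value to a 32-bit number.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data: bytes) -> list:
    """Split data into variable-sized chunks at content-defined boundaries."""
    chunks = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        # Rolling hash: older bytes are shifted out of the 32-bit window.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        size = i - start + 1
        at_boundary = size >= MIN_SIZE and (h & AVG_MASK) == 0
        if at_boundary or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk may be shorter than MIN_SIZE
    return chunks


def dedup_store(chunks, store: dict) -> int:
    """Store each chunk under its fingerprint; return how many chunks were new."""
    new = 0
    for c in chunks:
        fp = hashlib.sha1(c).hexdigest()
        if fp not in store:
            store[fp] = c
            new += 1
    return new
```

Because a cut depends on the bytes just before it rather than on a fixed offset, an insertion early in a file tends to shift only nearby boundaries, whereas fixed-size blocks would all change. The resulting chunks have arbitrary lengths, which is the source of the page misalignment the abstract describes.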

To address these issues, we propose to leverage host-managed SSDs (e.g., OCSSD, ZNS), a new type of SSD that exposes its internal blocks to the host system and allows the host system to fully control data placement within the SSD. We propose a simple data fitting technique to alleviate the negative impact of misalignment on read performance. We also combine the concept of multi-streaming, which distributes data into different streams based on lifetime, with host-provided information about each chunk's size and creation time, to achieve better data placement and mitigate the overhead of garbage collection. We evaluate our approach with a simulated implementation and share some insights into the challenges we encountered.
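
The misalignment problem and the idea of a data fitting technique can be illustrated with a small sketch. This record does not describe how the thesis's placement technique works, so the code below uses the textbook best-fit bin-packing heuristic as a stand-in. The page size of 4096 bytes, the function names, and the assumption that every chunk is non-empty and no larger than one page are all hypothetical.

```python
PAGE_SIZE = 4096  # hypothetical flash page size in bytes


def pages_touched_sequential(sizes, page_size=PAGE_SIZE):
    """Append chunks back to back and report how many pages each chunk spans."""
    touched = []
    offset = 0
    for s in sizes:
        first = offset // page_size
        last = (offset + s - 1) // page_size
        touched.append(last - first + 1)
        offset += s
    return touched


def best_fit(sizes, page_size=PAGE_SIZE):
    """Place each chunk in the open page with the least free space that still fits.

    Returns (placement, free): the page index chosen for each chunk and the
    free bytes left in each page. No chunk ever straddles two pages.
    """
    free = []
    placement = []
    for s in sizes:
        if s > page_size:
            raise ValueError("chunk larger than a page")
        best = None
        for idx, f in enumerate(free):
            if f >= s and (best is None or f < free[best]):
                best = idx
        if best is None:
            free.append(page_size)  # open a new page
            best = len(free) - 1
        free[best] -= s
        placement.append(best)
    return placement, free


sizes = [3000, 2000, 1000, 2000]
print(pages_touched_sequential(sizes))  # [1, 2, 1, 1]
print(best_fit(sizes))                  # ([0, 1, 0, 1], [96, 96])
```

In this example the second chunk straddles a page boundary under sequential appending, so reading it costs two page reads. Under best-fit placement every chunk sits inside a single page, at the cost of 96 unused bytes per page.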
Keywords (Chinese) ★ Flash memory
★ Solid-state drive
★ Content-defined chunking
★ Data deduplication technology
Keywords (English) ★ NAND Flash
★ Solid State Drives
★ Content-Defined Chunking
★ Data Deduplication
Table of contents
1 Introduction
2 Background and Motivation
  2.1 Background
    2.1.1 NAND flash memory
    2.1.2 Data deduplication
    2.1.3 Open-Channel Solid State Drives (OCSSDs)
  2.2 Motivation
    2.2.1 Storage efficiency improvement of implementing CDC
    2.2.2 Storage Misalignment
    2.2.3 Overhead during Garbage Collection
3 OCSSD-based CDC-deduplication
  3.1 Overview
  3.2 CDC-Aware Fingerprint Store
  3.3 CDC-Aware Space Management
  3.4 BestFit: A CDC-aware data placement technique
  3.5 Lifetime-Aware Multi-Stream Relocation
4 Evaluations
  4.1 Experimental Environment
  4.2 Evaluation Results
5 Conclusion and Further Works
  5.1 Summary
  5.2 Further Works
Bibliography
Advisor 陳增益 (Tseng-Yi Chen)   Approval date 2023-8-11