Thesis/Dissertation 110522108: Detailed Record




Author: Yi-Chao Shih (拾以兆)   Department: Computer Science and Information Engineering
Title
Prophet’s Insight: Unleashing Deduplication System Performance in Multi-tier Storage Systems
Files: full text available through the system after 2026-7-25
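Abstract (Chinese) As the volume of data grows, multi-tier storage systems, such as combinations of hard disk drives (HDDs) and solid-state drives (SSDs), are widely used to improve cost-effectiveness. Although the fast read and write characteristics of SSDs raise overall system performance, SSDs still face limited lifespan and endurance. Data deduplication has been proposed to address these problems, and it can effectively reduce the amount of data written to SSDs. However, as the page, the basic unit of the NAND flash storage medium in SSDs, grows larger, the deduplication rate drops. This thesis proposes a multi-grained deduplication method for storage systems that combine HDDs and SSDs. We use the SSD only as a read cache and deduplicate at a finer granularity (for example, the sector, the basic unit of an HDD). By inspecting content across pages and applying fine-grained deduplication to similar pages, the method improves the deduplication rate, deduplication performance, and I/O performance. Experimental results show that, compared with the conventional method CAFTL, multi-grained deduplication improves the deduplication rate by up to 69.4% and reduces deduplication time by up to 4.8 times.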
Abstract (English) As the volume of data continues to increase, multi-tier storage systems such as HDD-SSD combinations are commonly adopted for cost-effectiveness. Although SSDs greatly benefit the overall system with their high performance, they still face challenges related to limited lifespan and durability. To address these problems, data deduplication has been proposed as an effective technique for reducing the amount of data written to SSDs. However, as NAND flash pages continue to grow in size, the deduplication rate achieved with fixed-size chunking decreases. In this thesis, we propose a multi-grained deduplication approach for HDD-SSD storage systems. We use the SSD solely as a read cache and deduplicate at a finer granularity (i.e., the sector). Specifically, the proposed solution minimizes computational overhead and efficiently detects similar page content, which enables fine-grained deduplication. As a result, it improves the deduplication rate, deduplication performance, and I/O performance. Our experimental results show that, compared with the baseline, multi-grained deduplication improves the deduplication rate by up to 69.4% and reduces computational latency by up to 4.8 times.
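The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's mechanism: the Hint Table and the similar-page detection are not described in this record, so the sketch simply falls back to sector-level fingerprints for every page that is not an exact duplicate. The function names, the use of SHA-1 fingerprints, the 512-byte sector, the toy 2048-byte page, and the definition of deduplication rate as the fraction of bytes eliminated are all assumptions made for illustration.

```python
import hashlib

SECTOR_SIZE = 512  # assumed sector size in bytes (hypothetical parameter)


def fingerprint(data: bytes) -> bytes:
    """Content fingerprint of a chunk (SHA-1 is an assumption of this sketch)."""
    return hashlib.sha1(data).digest()


def stored_bytes_page_level(pages):
    """Bytes that must be stored when only whole pages are deduplicated."""
    seen = set()
    stored = 0
    for page in pages:
        fp = fingerprint(page)
        if fp not in seen:
            seen.add(fp)
            stored += len(page)
    return stored


def stored_bytes_multi_grained(pages, sector_size=SECTOR_SIZE):
    """Bytes stored when exact-duplicate pages are dropped first and the
    remaining pages are deduplicated sector by sector."""
    page_seen = set()
    sector_seen = set()
    stored = 0
    for page in pages:
        page_fp = fingerprint(page)
        if page_fp in page_seen:
            continue  # whole page is a duplicate: nothing new to store
        page_seen.add(page_fp)
        for off in range(0, len(page), sector_size):
            sector = page[off:off + sector_size]
            sector_fp = fingerprint(sector)
            if sector_fp not in sector_seen:
                sector_seen.add(sector_fp)
                stored += len(sector)
    return stored


def dedup_rate(stored, total):
    """Fraction of bytes eliminated (assumed definition of deduplication rate)."""
    return 1 - stored / total


def make_page(sector_ids):
    """Build a toy page whose i-th sector is filled with the byte sector_ids[i]."""
    return b"".join(bytes([i]) * SECTOR_SIZE for i in sector_ids)


# Three toy 4-sector pages: the second differs from the first in one sector,
# the third is an exact copy of the first.
pages = [make_page([0, 1, 2, 3]), make_page([0, 1, 2, 9]), make_page([0, 1, 2, 3])]
total = sum(len(p) for p in pages)          # 6144 bytes written
page_level = stored_bytes_page_level(pages)  # 4096 bytes stored
multi = stored_bytes_multi_grained(pages)    # 2560 bytes stored
```

In this toy input, page-level deduplication stores 4096 of the 6144 bytes written because the two nearly identical pages share no page-level fingerprint, while the sector-level fallback stores 2560 bytes. This mirrors the observation in the abstract that larger fixed-size chunks lower the deduplication rate.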
Keywords (Chinese) ★ Flash memory
★ Multi-tier storage system
★ Data deduplication
Keywords (English) ★ NAND Flash
★ Multi-tier Storage System
★ Data Deduplication
Table of Contents 1 Introduction
2 Background and Motivation
2.1 Multi-tier Storage System
2.2 Trend of NAND Flash Development
2.3 Data Deduplication
2.4 The Effect of Deduplication under Different Chunk Size
3 Multi-grained Deduplication
3.1 Overview
3.2 Mechanism
3.2.1 Hint Table
3.2.2 Multi-grained Deduplicator
3.2.3 Address Mapping
3.2.4 Garbage Collection
4 Evaluations
4.1 Experimental Environment
4.2 Evaluation Results
References
References [1] "Synology C2", https://c2.synology.com.
[2] "Seagate Firecuda SSHD", https://www.seagate.com/tw/zh/products/internal-hard-drives/internal-hard-drives/firecuda-solid-state-hybriddrive–sshd-/
[3] "IBM storage", https://www.ibm.com/storage.
[4] "NetApp Flash Cache Datasheet", https://www.netapp.com/media/19759-ds-2811.pdf.
[5] Gokul Soundararajan, Vijayan Prabhakaran, Mahesh Balakrishnan, and
Ted Wobber. 2010. Extending SSD lifetimes with disk-based write caches.
In Proceedings of the 8th USENIX conference on File and storage technologies (FAST’10). USENIX Association, USA, 8.
[6] J. Kim et al., "Deduplication in SSDs: Model and quantitative analysis," 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST), 2012, pp. 1-12, doi: 10.1109/MSST.2012.6232379.
[7] Feng Chen, Tian Luo, and Xiaodong Zhang. 2011. CAFTL: a content-aware flash translation layer enhancing the lifespan of flash memory based solid state drives. In Proceedings of the 9th USENIX conference on File and storage technologies (FAST’11). USENIX Association, USA, 6.
[8] C. Lin, Y. Chang, T. Kuo, H. Chang and H. Li, "How to improve the space utilization of dedup-based PCM storage devices?," 2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2015, pp. 11-20, doi: 10.1109/CODESISSS.2015.7331363.
[9] Benjamin Zhu, Kai Li, and Hugo Patterson. 2008. Avoiding the disk bottleneck in the data domain deduplication file system. In Proceedings of the
6th USENIX Conference on File and Storage Technologies (FAST’08).
USENIX Association, USA, Article 18, 1–14.
[10] Zhou, You, Qiulin Wu, Fei Wu, Hong Jiang, Jian Zhou and Changsheng Xie. "Remap-SSD: Safely and Efficiently Exploiting SSD Address Remapping to Eliminate Duplicate Writes." FAST (2021).
[11] Z. Chen, Z. Chen, N. Xiao and F. Liu, "NF-Dedupe: A novel no-fingerprint deduplication scheme for flash-based SSDs," 2015 IEEE Symposium on Computers and Communication (ISCC), 2015, pp. 588-594, doi: 10.1109/ISCC.2015.7405578.
[12] Yang, Qirui, Runyu Jin and Ming Zhao. "SmartDedup: Optimizing Deduplication for Resource-constrained Devices." USENIX Annual Technical Conference (2019).
[13] Zhichao Cao, Hao Wen, Xiongzi Ge, Jingwei Ma, Jim Diehl, and David
H. C. Du. 2019. TDDFS: A Tier-Aware Data Deduplication-Based File System. ACM Trans. Storage 15, 1, Article 4 (February 2019), 26 pages.
DOI:https://doi.org/10.1145/3295461
[14] G. Cheng, D. Guo, L. Luo, J. Xia and S. Gu, "LOFS: A Lightweight Online File Storage Strategy for Effective Data Deduplication at Network Edge," in IEEE Transactions on Parallel & Distributed Systems, doi: 10.1109/TPDS.2021.3133098.
[15] Wen Xia, Yukun Zhou, Hong Jiang, Dan Feng, Yu Hua, Yuchong Hu, Yucheng Zhang, and Qing Liu. 2016. FastCDC: a fast and efficient content-defined chunking approach for data deduplication. In Proceedings of the 2016 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC ’16). USENIX Association, USA, 101–114.
[16] S. Wu et al., "EaD: a Collision-free and High Performance Deduplication Scheme for Flash Storage Systems," 2020 IEEE 38th International Conference on Computer Design (ICCD), 2020, pp. 155-162, doi: 10.1109/ICCD50377.2020.00039.
[17] F. Wu et al., "Characterizing 3D Charge Trap NAND Flash: Observations, Analyses and Applications," 2018 IEEE 36th International Conference on Computer Design (ICCD), 2018, pp. 381-388, doi: 10.1109/ICCD.2018.00064.
[18] Jinhua Cui, Youtao Zhang, Liang Shi, Chun Jason Xue, Jun Yang, Weiguang Liu, Laurence T. Yang, Leveraging partial-refresh for performance and lifetime improvement of 3D NAND flash memory in cyber-physical systems, Journal of Systems Architecture, Volume 103, 2020, 101685, ISSN 1383-7621.
[19] F. Margaglia and A. Brinkmann, "Improving MLC flash performance and endurance with extended P/E cycles," 2015 31st Symposium on Mass Storage Systems and Technologies (MSST), Santa Clara, CA, USA, 2015, pp. 1-12, doi: 10.1109/MSST.2015.7208278.
[20] Open Nand Flash Interface: https://www.onfi.org/specifications
[21] Arash Tavakkol, Juan Gómez-Luna, Mohammad Sadrosadati, Saugata
Ghose, and Onur Mutlu. 2018. MQsim: a framework for enabling realistic studies of modern multi-queue SSD devices. In Proceedings of the
16th USENIX Conference on File and Storage Technologies (FAST’18).
USENIX Association, USA, 49–65.
[22] "TensorFlow open source", https://github.com/tensorflow/tensorflow
[23] "PowerToys open source", https://github.com/microsoft/PowerToys
[24] "Enron email dataset", https://www.cs.cmu.edu/~enron/
[25] Gala Yadgar, Moshe Gabel, Shehbaz Jaffer, and Bianca Schroeder. 2021. SSD-based Workload Characteristics and Their Performance Implications. ACM Trans. Storage 17, 1, Article 8 (February 2021), 26 pages. https://doi.org/10.1145/3423137
[26] "SNIA block I/O traces", http://iotta.snia.org/traces/block-io
Advisor: Tseng-Yi Chen (陳增益)   Approval date: 2023-7-26

For questions about this thesis, please contact the Promotion Services Section of the National Central University Library, TEL: (03)422-7151 ext. 57407.