On Supporting Large Neural Networks Model Implementation in Programmable Data Plane


Name	Muhamad Rizka Maulana (莫拉那)	Department	Department of Computer Science and Information Engineering
Thesis Title	On Supporting Large Neural Networks Model Implementation in Programmable Data Plane
Files	Full text is not publicly available (never open access)
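Abstract (translated from Chinese)	Neural network algorithms are known for their high accuracy and are widely used to solve problems in many fields. Given their strong performance across applications, deploying neural networks in switches offers many benefits and is a forward-looking idea. The emergence of P4, a programmable data plane language, makes it possible to deploy neural networks in switches. However, current P4 switches still face many limitations in implementing complex functions, such as memory size and a limited number of instructions. In addition, neural networks are well known to be computationally expensive and usually require a complex architecture to achieve good accuracy, and a complex architecture degrades the packet forwarding efficiency of a P4 switch. It is therefore critical to minimize the performance loss incurred when implementing neural network algorithms in P4 switches. This thesis proposes a technique called NNSplit, which addresses this performance loss by deploying the hidden layers of a neural network across multiple P4 switches. To support this approach, the thesis also proposes a network protocol called SØREN, which lets P4 switches pass along the activation values needed for neural network computation while forwarding packets. The technique is implemented with Mininet and BMv2 and applied to classifying multiple types of traffic. According to the experimental results, NNSplit reduces memory usage by nearly 50% and improves overall throughput while adding only 14% delay. Moreover, adding the SØREN protocol to packets has little effect on overall processing time, only 213 microseconds. Overall, the proposed method allows large neural network models to be deployed in P4 switches with only a slight performance cost.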
Abstract (English)	Neural network algorithms are known for their high accuracy and are heavily used to solve problems in many fields. Given their proven capability in various tasks, embedding neural network algorithms in the data plane is an appealing and promising option. The emergence of the P4 language for programming the data plane makes this possible. However, current data plane technology still imposes many constraints on implementing complex functions: most data planes have limited memory and a limited set of operations. In addition, neural networks are widely known to be computationally expensive, and a complex neural network architecture is generally required to achieve high accuracy. Yet a complex architecture affects the data plane's forwarding capability, which is its main function. Therefore, minimizing the performance cost of implementing neural network algorithms in the data plane is critical. This thesis proposes a technique called NNSplit that addresses the performance issue by splitting neural network layers across several data planes. By splitting the layers, NNSplit distributes the computational burden of the neural network across data planes. To support layer splitting, a new protocol called SØREN is also proposed. The SØREN protocol header carries the activation values and bridges the neural network layers across all switches. Our implementation considers a use case of multi-class traffic classification. The results of experiments using Mininet and BMv2 show that NNSplit can reduce memory usage by almost 50% and increase throughput compared to the non-splitting scenario, at the cost of a small additional delay of 14%. In addition, adding the SØREN protocol to the packet has only a small impact of 213 µs on processing time. The results suggest that our method can support a large neural network model in the data plane with a small performance cost.
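The layer-splitting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's P4 implementation: the XNOR-and-popcount neuron, the 5-byte header layout (`HDR`), and the names `bnn_layer`, `run_layers`, `switch`, and `random_model` are all assumptions made for illustration, and the actual SØREN header format is not given in this record. The sketch only shows that running consecutive layers on two "switches", with the activation bits carried in a header between them, yields the same output as running the whole model in one place.

```python
import random
import struct

# Hypothetical header (NOT the real SØREN layout): 1 byte for the index of the
# next layer to run, 4 bytes for the activation bits produced so far.
HDR = struct.Struct("!BI")


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def bnn_layer(act_bits: int, weights: list, n_in: int) -> int:
    """One binarized layer: each neuron XNORs its weight word with the input
    and fires (bit = 1) when at least half of the n_in bits match."""
    mask = (1 << n_in) - 1
    out = 0
    for i, w in enumerate(weights):
        matches = popcount(~(act_bits ^ w) & mask)
        if 2 * matches >= n_in:
            out |= 1 << i
    return out


def run_layers(act_bits: int, layers: list) -> int:
    """Run a list of (n_in, weights) layers in order."""
    for n_in, weights in layers:
        act_bits = bnn_layer(act_bits, weights, n_in)
    return act_bits


def switch(header: bytes, local_layers: list) -> bytes:
    """A 'switch' reads the activation from the header, runs only the layers
    it hosts, and writes the new activation back into the header."""
    idx, act_bits = HDR.unpack(header)
    act_bits = run_layers(act_bits, local_layers)
    return HDR.pack(idx + len(local_layers), act_bits)


def random_model(widths: list, seed: int = 0) -> list:
    """Random binary weights; widths[k] is the bit width entering layer k."""
    rng = random.Random(seed)
    return [(n_in, [rng.getrandbits(n_in) for _ in range(n_out)])
            for n_in, n_out in zip(widths, widths[1:])]


model = random_model([16, 8, 8, 4])      # three layers, illustrative sizes
features = 0b1010001111000101            # 16 made-up input feature bits

whole = run_layers(features, model)      # all layers on a single device

hdr = HDR.pack(0, features)
hdr = switch(hdr, model[:2])             # first switch hosts layers 0 and 1
hdr = switch(hdr, model[2:])             # second switch hosts layer 2
idx, split = HDR.unpack(hdr)

assert split == whole and idx == len(model)
```

Because each switch stores only the weights of the layers it hosts, the per-device memory footprint shrinks, while the header adds a few bytes to every packet; this is the trade-off the abstract quantifies.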
Keywords	★ Programmable Data Plane ★ P4 Language ★ Neural Networks ★ Traffic Classification
Table of Contents	
Chinese Abstract
English Abstract
Acknowledgement
Table of Contents
List of Figures
List of Tables
1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Thesis Organization
2 Background
  2.1 In-Network Computing
  2.2 Programmable Data Plane
    2.2.1 Protocol-Independent Switching Architecture Model
  2.3 P4 Programming Language
    2.3.1 Forwarding Model
    2.3.2 Target Architecture
    2.3.3 Stateful Elements
  2.4 Neural Networks
    2.4.1 Binarized Neural Network
  2.5 Performance Metrics
3 Related Works
4 Design
  4.1 Overall Scenario
  4.2 System Architecture
  4.3 Layer Splitting Framework
  4.4 SØREN Protocol
  4.5 Data Plane Pipeline Design
5 Performance Evaluation
  5.1 Environment
  5.2 NN Model
  5.3 Benchmark: Decision Tree Algorithm
  5.4 Results
    5.4.1 Measuring the Performance
      Round-trip Time
      End-to-End Throughput
      Memory Usage
    5.4.2 Benchmarking BNN with Decision Tree
      Performance Metrics
      Inference Delay
      Impact of Number of Neurons
    5.4.3 Measuring the Impact of SØREN Protocol Header
      Impact on Processing Time
      Impact on End-to-End Throughput
6 Conclusion
  6.1 Conclusion
  6.2 Future Works
Bibliography
Advisors	Chou Li-Der (周立德), David C. Li (李大衛)	Approval Date	2021-9-6

Thesis 108522602: Detailed Record