CSR: Small: Cascading Failures Modeling and Mitigation in the Internet of Things
Abstract (NSF2302094)
Cascading failures take place when a single incident triggers successive malfunctions of system components in a chain reaction. Without timely and effective mitigation mechanisms, cascading failures often have catastrophic impacts on the entire system and even on the environment and society. A large gap remains between existing cascading failure theories and the realities of Internet of Things (IoT) systems. This project aims to bridge that gap through breakthroughs in IoT cascading failure and reliability research, mitigating the serious consequences such failures cause. In addition, the project has educational components that aim to increase the participation of underrepresented minorities, particularly women, in computing and engineering, and to integrate research and education through new curriculum development and research training of both graduate and undergraduate students.
The overarching objective of this project is to enhance the reliability and robustness of IoT-based systems by developing efficient cascading failure modeling and analysis methods as well as effective mitigation strategies that strengthen IoT systems’ resilience to cascading failures. Specifically, the project will (1) develop new comprehensive mathematical models to characterize the complex physical mechanisms of cascading failures in IoT; (2) create efficient and scalable reliability evaluation methodologies that provide accurate and fast prediction of IoT system reliability, considering the impacts of cascading failures and various dependencies; (3) develop innovative in-process and multi-phase mitigation mechanisms to respond effectively to cascading failures and prevent or at least alleviate their consequences; and (4) evaluate and validate the proposed research using case studies of real-world IoT-based systems as well as publications evaluated by independent peers and experts. The proposed cascading failure models, mitigation strategies, and reliability analysis methods are fundamental contributions to the body of knowledge on the reliable and robust design and operation of modern and evolving IoT-based systems.
Members
- Dr. Liudong Xing (PI)
- Guixiang Lyu (PhD student)
- Junxing Ren (MS student)
- Xianchao Guo (PhD student)
Collaborators
- Dr. Gregory Levitin; Senior Expert, Reliability Department, NOGA - Israel Independent System Operator, Israel
Publications
- L. Xing, Reliability and Resilience in the Internet of Things, Elsevier, Paperback ISBN: 9780443156106, eBook ISBN: 9780443156113, April 2024, https://doi.org/10.1016/C2022-0-01488-4.
- L. Xing, “Decision Diagrams for Complex System Reliability Analysis,” in Frontiers of Performability Engineering (Editor: Durga R Karanki), Springer Series in Risk, Reliability and Safety Engineering, Springer, Singapore, Chapter 3, pp. 51-67, March 2024, https://doi.org/10.1007/978-981-99-8258-5_3
- G. Lyu*, L. Xing, and G. Zhao*, “Static and Dynamic Load-Triggered Cascading Failure Mitigation for Storage Area Networks,” International Journal of Mathematical, Engineering and Management Sciences, vol. 9, no. 4, pp. 697-713, August 2024, https://doi.org/10.33889/IJMEMS.2024.9.4.036
- G. Levitin, L. Xing, and Y. Dai, “A new self-adaptive mission aborting policy for systems operating in uncertain random shock environment,” Reliability Engineering & System Safety, vol. 248, 110184, August 2024, https://doi.org/10.1016/j.ress.2024.110184
- G. Levitin, L. Xing, and Y. Dai, “Multi-attempt missions with multiple rescue options,” Reliability Engineering & System Safety, vol. 248, 110168, August 2024, https://doi.org/10.1016/j.ress.2024.110168
- G. Levitin, L. Xing, and Y. Dai, “Optimizing time-varying performance and mission aborting policy in resource constrained missions,” Reliability Engineering & System Safety, vol. 245, 110011, May 2024, https://doi.org/10.1016/j.ress.2024.110011
- G. Levitin, L. Xing, and Y. Dai, “Consecutively connected systems with unreliable resource generators and storages,” Reliability Engineering & System Safety, vol. 241, 109680, January 2024, https://doi.org/10.1016/j.ress.2023.109680
Conference & Poster Presentations
- G. Lyu* and L. Xing, “Mitigating Risk of Cascading Failures in Storage Area Networks,” UMassD Sigma Xi Research Exhibition, April 2024.
Seminars and Keynote Talks
- “Reliability in the Internet of Things,” Keynote Talk, International Conference on Quality, Reliability, Risk, Maintenance, and Safety Engineering, July 26, 2024
- “Complex System Reliability Engineering: Methodology, Behavior, Application,” Fu-Jen Catholic University, Institute of Information Management, Taiwan, June 6, 2024
- “Reliability in the Internet of Things,” Keynote Talk, IEEE International Conference on Computing, Power, and Communication Technologies, February 9, 2024
- “Research Papers Publishing & Writing,” Introduction to the Research University Workshop/Seminar Series for NSF S-STEM and SFS Scholars, College of Engineering, UMass Dartmouth, October 30, 2023
Supporting Activities
- Xing attended the workshop “Empowering Women in Science & Engineering,” October 25, 2023 (4:00-8:00 p.m.).
- Xing chaired the SIG – IoT in Tactile Internet, IoT-AHSN TC (Internet of Things, Ad Hoc and Sensor Networks Technical Committee), IEEE Communications Society, 2023, 2024
- MS student Junxing Ren participated in the College of Engineering’s high school visit (a recruitment activity) to demo the UAV, May 31, 2024
Acknowledgment: This site is based upon work supported by the National Science Foundation under Grant No. 2302094. Any opinions, findings, and conclusions or recommendations expressed in this site are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.