Abstract: As distributed systems become more complex, understanding the underlying algorithms that make these systems work becomes even harder. Traditional learning modalities based on didactic teaching and theoretical proofs alone are no longer sufficient for a holistic understanding of these algorithms. Instead, an environment that promotes immersive, hands-on learning of distributed systems algorithms is needed to complement existing teaching modalities. Such an environment must be flexible enough to support learning of a variety of algorithms. Moreover, since many of these algorithms share common traits and differ only in some aspects, the environment should support extensibility and reuse. Finally, it must also allow students to experiment with large-scale deployments in a variety of operating environments. To address these concerns, we use the principles of software product lines (SPLs) and model-driven engineering and adopt the cloud platform to design an immersive learning environment called the Playground of Algorithms for Distributed Systems (PADS). The research contributions in PADS include the underlying feature model, the design of a domain-specific modeling language that supports the feature model, and the generative capabilities that maximally automate the synthesis of experiments on cloud platforms. We describe a prototype implementation of PADS and use it to showcase a peer-to-peer file transfer algorithm based on BitTorrent, which illustrates the benefits of rapid deployment of distributed systems algorithms.
Industrial control systems (ICS) are composed of sensors, actuators, control processing units, and communication devices, all interconnected to provide monitoring and control capabilities. Due to the integral role of the networking infrastructure, such systems are vulnerable to cyber attacks. In-depth consideration of security and resilience, and of their effects on system performance, is therefore very important. This paper focuses on railway control systems (RCS), an important and potentially vulnerable class of ICS, and presents a simulation integration platform that enables (1) modeling and simulation, including realistic models of cyber and physical components and their interactions, as well as operational scenarios that can be used for evaluations of cybersecurity risks and mitigation measures, and (2) evaluation of the performance impact and security assessment of mitigation mechanisms, focusing on authentication mechanisms and firewalls. The approach is demonstrated using simulation results from a realistic RCS case study.
In this paper, we propose a mixed method for analyzing telemetry data from a robotic space mission. The idea is to first apply unsupervised learning methods to the telemetry data divided into temporal segments. The large clusters that ensue typically represent the nominal operations of the spacecraft and are not of interest from an anomaly detection viewpoint. However, the smaller clusters and outliers that result from this analysis may represent specialized modes of operation, e.g., the conduct of a specialized experiment on board the spacecraft, or they may represent truly anomalous or unexpected behaviors. To differentiate between specialized modes and anomalies, the approach presented in this paper employs a supervised step in which human mission experts are consulted. Our longer-term goal is to develop more automated methods for detecting anomalies in time series data and, once anomalies are identified, to use feature selection methods to build online detectors that can be used in future missions, thus contributing to making operations more effective and improving the overall safety of the mission.
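The segment-then-cluster idea can be illustrated with a minimal sketch. The abstract does not name the unsupervised method, so this example swaps in a simple greedy leader clustering over per-window (mean, standard deviation) features. The function names, the `radius` and `min_fraction` parameters, and the synthetic telemetry are all assumptions made for illustration, not the authors' implementation.

```python
import math
import statistics
from collections import defaultdict

def segment_features(signal, window):
    """Split a 1-D telemetry channel into fixed windows; summarize each as (mean, stdev)."""
    feats = []
    for start in range(0, len(signal) - window + 1, window):
        seg = signal[start:start + window]
        feats.append((statistics.fmean(seg), statistics.pstdev(seg)))
    return feats

def leader_cluster(points, radius):
    """Greedy leader clustering (assumed stand-in for the unsupervised step):
    a point joins the first leader within `radius`, otherwise it starts a new cluster."""
    leaders, labels = [], []
    for p in points:
        for idx, leader in enumerate(leaders):
            if math.dist(p, leader) <= radius:
                labels.append(idx)
                break
        else:
            leaders.append(p)
            labels.append(len(leaders) - 1)
    return labels

def review_candidates(labels, min_fraction=0.1):
    """Return indices of segments that fall in clusters smaller than
    `min_fraction` of all segments; these go to human experts for labeling."""
    members = defaultdict(list)
    for i, lab in enumerate(labels):
        members[lab].append(i)
    cutoff = min_fraction * len(labels)
    return sorted(i for idxs in members.values() if len(idxs) < cutoff for i in idxs)

# Synthetic channel (hypothetical data): 19 nominal windows near 1.0, one near 5.0.
nominal = [1.0, 1.1, 0.9, 1.0]
telemetry = nominal * 10 + [5.0, 5.2, 4.9, 5.1] + nominal * 9
labels = leader_cluster(segment_features(telemetry, window=4), radius=0.5)
flagged = review_candidates(labels)
```

In this sketch the large cluster stands for nominal operations and is ignored, while the segments in `flagged` are the candidates an expert would classify as either a specialized mode or an anomaly.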
Many seemingly simple questions that individual users face in their daily lives may actually require a substantial amount of computing resources to answer correctly. For example, a user may want to determine the right thermostat settings for different rooms of a house, based on a tolerance range, such that energy consumption and costs are maximally reduced while the house remains at comfortable temperatures. Such answers can be determined through simulations. However, some simulation models, as in this example, are stochastic and require the execution of a large number of simulation tasks and the aggregation of their results to ascertain whether the outcomes lie within specified confidence intervals. Other simulation models, such as those used to study traffic conditions, may need multiple instances to be executed for a number of different parameters. Cloud computing has opened up new avenues for individuals and organizations with limited resources to obtain answers to problems that hitherto required expensive and computationally intensive resources. This paper presents SIMaaS, a cloud-based Simulation-as-a-Service that addresses these challenges. We demonstrate how lightweight solutions using Linux containers (e.g., Docker) are better suited to support such services than heavyweight hypervisor-based solutions, which are shown to incur substantial overhead in provisioning virtual machines on demand. Empirical results validating our claims are presented in the context of two case studies.
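The need to run many stochastic simulation tasks and aggregate them until the outcome lies within a confidence interval can be sketched as follows. This is a minimal illustration under stated assumptions: `simulate_daily_energy` is a hypothetical toy model (its coefficients and temperature distribution are invented), and the stopping rule uses a normal approximation for a 95% confidence interval. It is not the SIMaaS implementation, and it runs tasks sequentially rather than in containers.

```python
import math
import random
import statistics

def simulate_daily_energy(setpoint_c, rng):
    """Hypothetical toy model: daily heating energy (kWh) for one thermostat setpoint."""
    outdoor_c = rng.gauss(5.0, 3.0)            # assumed random outdoor temperature
    demand = max(0.0, setpoint_c - outdoor_c)  # heating demand grows with the gap
    return 1.2 * demand + rng.gauss(0.0, 0.5)  # assumed coefficient plus noise

def run_until_confident(setpoint_c, half_width_kwh, batch=50, max_runs=5000, seed=0):
    """Run batches of simulation tasks until the 95% confidence interval
    half-width of the mean drops to `half_width_kwh`, or `max_runs` is reached."""
    rng = random.Random(seed)
    samples = []
    z = 1.96  # normal approximation for a 95% confidence interval
    while len(samples) < max_runs:
        samples.extend(simulate_daily_energy(setpoint_c, rng) for _ in range(batch))
        mean = statistics.fmean(samples)
        half = z * statistics.stdev(samples) / math.sqrt(len(samples))
        if half <= half_width_kwh:
            break
    return mean, half, len(samples)

mean_kwh, half_kwh, runs = run_until_confident(setpoint_c=20.0, half_width_kwh=0.2)
```

Each batch here corresponds to a group of independent simulation tasks; in a service setting those tasks could be dispatched in parallel, with only the aggregation step shared.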
Resiliency and reliability are of paramount importance for energy cyber-physical systems. Electrical protection systems, including detection elements such as distance relays and actuation elements such as breakers, are designed to protect the system from abnormal operations and arrest failure propagation by rapidly isolating the faulty components. However, failures in the protection devices themselves can and do lead to major system events and fault cascades, often leading to blackouts. This paper augments our past work on Temporal Causal Diagrams (TCD), a modeling formalism designed to help reason about failure progressions, by (a) describing a way to generate the TCD model from the system specification, and (b) understanding the system failure dynamics for TCD reasoners by configuring simulation models.