2022
publication
A High-Speed, Long-Distance and Wall-Penetrating Covert Channel Based on EM Emanations from DRAM Clock
An air-gapped computer is physically isolated from unsecured networks to guarantee effective protection against data exfiltration. Due to air gaps, unauthorized data transfer seems impossible over legitimate communication channels, but in reality many so-called physical covert channels can be constructed to allow data exfiltration across the air gaps. Most such covert channels are very slow and often require certain strict conditions to work (e.g., no physical obstacles between the sender and the receiver). In this paper, we introduce a new through-wall physical covert channel named BitJabber that is extremely fast and has a long attacking distance. We show that this covert channel can be easily created by an unprivileged sender running on a victim’s computer. Specifically, the sender constructs the channel by using only memory accesses to modulate the electromagnetic (EM) signals generated by the DRAM clock. While possessing a very high bandwidth (up to 300,000 bps), this new covert channel is also very reliable (less than 1% error rate). More importantly, this covert channel can enable data exfiltration from an air-gapped computer enclosed in a room with walls up to 15 cm thick, at an attacking distance of more than 6 m.
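The core idea, sketched below under stated assumptions, is that a bit can be encoded by turning DRAM traffic on or off for a fixed symbol period: sustained uncached memory accesses strengthen the EM emanations around the DRAM clock frequency, while idling weakens them. This is only a minimal on-off-keying illustration; BitJabber's actual modulation scheme, symbol rate, buffer size, and error handling are not taken from the paper.

```python
# Minimal, illustrative sketch only: on-off keying of memory-bus activity.
# BIT_PERIOD_S and the buffer size are assumptions, not the paper's parameters.
import time
import numpy as np

BIT_PERIOD_S = 0.001                                   # assumed symbol duration
buf = np.zeros(64 * 1024 * 1024, dtype=np.uint8)       # buffer larger than the last-level cache

def send_bit(bit: int) -> None:
    """Drive (bit=1) or withhold (bit=0) DRAM traffic for one symbol period."""
    deadline = time.perf_counter() + BIT_PERIOD_S
    if bit:
        while time.perf_counter() < deadline:
            # Strided accesses defeat caching and keep the memory bus busy,
            # strengthening emanations tied to the DRAM clock.
            buf[::4096] += 1
    else:
        while time.perf_counter() < deadline:
            pass                                       # idle: weak emanations

def send_message(bits):
    for b in bits:
        send_bit(b)

send_message([1, 0, 1, 1, 0, 0, 1, 0])
```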
Embedded and real-time systems are increasingly attached to networks. This enables broader coordination beyond the physical system, but also opens the system to attacks. The increasingly complex workloads of these systems include software of varying assurance levels, including that which might be susceptible to compromise by remote attackers. To limit the impact of compromise, μ-kernels focus on maintaining strong memory protection domains between different bodies of software, including system services. They enable limited coordination between processes through Inter-Process Communication (IPC). Real-time systems also require strong temporal guarantees for tasks, and thus need temporal isolation to limit the impact of malicious software. This is challenging as multiple client threads that use IPC to request service from a shared server will impact each other’s response times. To constrain the temporal interference between threads, modern μ-kernels often build priority and budget awareness into the system. Unfortunately, this paper demonstrates that this is more challenging than previously thought. Adding priority awareness to IPC processing can lead to significant interference due to the kernel’s prioritization logic. Adding budget awareness similarly creates opportunities for interference due to the budget tracking and management operations. In both situations, a Thundering Herd of malicious threads can significantly delay the activation of mission-critical tasks. The Thundering Herd effects are evaluated on seL4, and the results demonstrate that high-priority threads can be delayed by over 100,000 cycles per malicious thread. This paper reveals a challenging dilemma: the temporal protections μ-kernels add can, themselves, provide means of threatening temporal isolation. Finally, to defend the system, we identify and empirically evaluate possible mitigations, and propose an admission-control test based upon an interference-aware analysis.
publication
FedACA: An Adaptive Communication-Efficient Asynchronous Framework for Federated Learning
Federated Learning (FL) is a type of distributed machine learning, which avoids sharing private and sensitive data with a central server. Despite the advances in FL, current approaches cannot provide satisfactory performance when dealing with heterogeneity in data and unpredictability of system devices. First, straggler devices can adversely impact the convergence speed of the global model training. Second, for model aggregation in traditional FL, edge devices communicate frequently with a central server using their local updates. However, this process may encounter a communication bottleneck caused by substantial bandwidth usage. To address these challenges, this paper presents an adaptive, communication-efficient and asynchronous FL technique called FedACA comprising feedback loops at two levels. Our approach contains a self-adjusting local training step with active participant selection to accelerate the convergence of the global model. To reduce the communication overhead, FedACA supports an adaptive uploading policy at the edge devices, which leverages the model similarity and L2-norm differences between the current and previous local gradients. It also utilizes contrastive learning to tackle data heterogeneity by regularizing the local training when the local model deviates from the global model; this also helps with the model similarity measurement in the uploading policy. Extensive experiments on a benchmark comprising three image datasets with non-independent and identically distributed (non-i.i.d.) data show that FedACA adapts well to the straggler effect in asynchronous environments and also provides significant reductions in communication costs compared to other state-of-the-art FL algorithms.
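The adaptive uploading policy can be pictured as a simple gate at each client: compare the current local gradient with the one uploaded previously, and skip the upload when nothing meaningful has changed. The sketch below is a hedged illustration of that idea; the thresholds, the exact similarity measure, and how FedACA combines them are assumptions, not the paper's specification.

```python
# Hedged sketch of an adaptive upload decision in the spirit of FedACA.
import numpy as np

def should_upload(curr_grads, prev_grads, sim_thresh=0.9, norm_thresh=0.1):
    """curr_grads/prev_grads: lists of per-layer gradient arrays from this client."""
    curr = np.concatenate([g.ravel() for g in curr_grads])
    prev = np.concatenate([g.ravel() for g in prev_grads])
    cos_sim = curr @ prev / (np.linalg.norm(curr) * np.linalg.norm(prev) + 1e-12)
    l2_diff = np.linalg.norm(curr - prev)
    # Upload only when the update is no longer similar to the previous one,
    # or when its magnitude has drifted substantially.
    return cos_sim < sim_thresh or l2_diff > norm_thresh
```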
Edge-based, autonomous deep learning computer vision applications, such as those used in surveillance or traffic management, must be assuredly correct and performant. However, realizing these applications in practice incurs a number of challenges. First, the constraints on edge resources preclude the use of large-sized, deep learning computer vision models. Second, the heterogeneity in edge resource types causes different execution speeds and energy consumption during model inference. Third, deep learning models are known to be vulnerable to adversarial perturbations, which can make them ineffective or lead to incorrect inferences. Although some research that addresses the first two challenges exists, defending against adversarial attacks at the edge remains mostly an unresolved problem. To that end, this paper presents techniques to realize robust, edge-based deep learning computer vision applications, thereby providing a level of assured autonomy. We utilize state-of-the-art (SOTA) object detection attacks from the TOG (adversarial objectness gradient attacks) suite to design a generalized adversarial robustness evaluation procedure. It enables fast robustness evaluations on the popular object detection architectures YOLOv3, YOLOv3-tiny, and Faster R-CNN with different image classification backbones to test the robustness of these object detection models. We explore two variations of adversarial training. The first variant augments the training data with multiple types of attacks. The second variant exchanges a clean image in the training set for a randomly chosen adversarial image. Our solutions are then evaluated using the PASCAL VOC dataset. Using the first variant, we are able to improve the robustness of YOLOv3-tiny models by 1–2% mean average precision (mAP), and YOLOv3 realized an improvement of up to 17% mAP on attacked data. The second variant saw even better results in some cases, with improvements in robustness of over 25% for YOLOv3. The Faster R-CNN models also saw improvement, although less substantial, at around 10–15%. Yet, their mAP was improved on clean data as well.
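The second adversarial-training variant can be summarized as a data-loading step that, with some probability, swaps a clean training image for a randomly chosen pre-generated adversarial counterpart. The sketch below illustrates only that swapping step under assumed data structures; generating the adversarial images (e.g., with a TOG attack) and the exact swap schedule used in the paper are outside this snippet.

```python
# Hedged sketch of the "exchange clean for adversarial" training variant.
import random

def adversarially_augmented_batch(clean_batch, adv_versions, swap_prob=0.5):
    """clean_batch: list of (image_id, image, target) tuples.
    adv_versions: dict mapping image_id -> list of attacked versions of that image."""
    batch = []
    for image_id, image, target in clean_batch:
        candidates = adv_versions.get(image_id, [])
        if candidates and random.random() < swap_prob:
            image = random.choice(candidates)   # exchange the clean image for an adversarial one
        batch.append((image, target))
    return batch
```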
Although machine learning (ML)-based models are increasingly being used by cloud-based data-driven services, two key problems exist when they are used at the edge. First, the size and complexity of these models hamper their deployment at the edge, where heterogeneity of resource types and constraints on resources is the norm. Second, ML models are known to be vulnerable to adversarial perturbations. To address the edge deployment issue, model compression techniques, especially model quantization, have shown significant promise. However, the adversarial robustness of such quantized models remains mostly an open problem. To address this challenge, this paper investigates whether quantized models with different precision levels can be vulnerable to the same universal adversarial perturbation (UAP). Based on these insights, the paper then presents a cloud-native service that generates and distributes adversarially robust compressed models deployable at the edge using a novel, defensive post-training quantization approach. Experimental evaluations reveal that although quantized models are vulnerable to UAPs, post-training quantization of the synthesized, adversarially trained models is effective against such UAPs. Furthermore, deployments on heterogeneous edge devices with flexible quantization settings are efficient, thereby paving the way toward realizing adversarially robust data-driven cloud/edge services.
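The central question of whether one universal adversarial perturbation transfers across precision levels can be tested with a loop like the sketch below: add the same clipped perturbation to every test input and measure each quantized model's accuracy. The `quantize` call and the model/loader interfaces are placeholders for whatever framework is used, not the paper's actual API.

```python
# Hedged sketch: does a single UAP degrade models quantized at different precisions?
import numpy as np

def accuracy_under_uap(model, data_loader, uap, eps=8 / 255):
    correct, total = 0, 0
    for x, y in data_loader:                       # x: batch of images in [0, 1], y: labels
        x_adv = np.clip(x + np.clip(uap, -eps, eps), 0.0, 1.0)
        pred = model.predict(x_adv).argmax(axis=-1)
        correct += int((pred == y).sum())
        total += len(y)
    return correct / total

# for bits in (32, 8, 4):
#     q_model = quantize(base_model, num_bits=bits)   # placeholder quantizer
#     print(bits, accuracy_under_uap(q_model, test_loader, uap))
```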
Memory corruption attacks such as code injection, code reuse, and non-control data attacks have become widely popular for compromising safety-critical Cyber–Physical Systems (CPS). Moving target defense (MTD) techniques such as instruction set randomization (ISR), address space randomization (ASR), and data space randomization (DSR) can be used to protect systems against such attacks. CPS often use time-triggered architectures to guarantee predictable and reliable operation. MTD techniques can cause time delays with unpredictable behavior. To protect CPS against memory corruption attacks, MTD techniques can be implemented in a mixed time and event-triggered architecture that provides capabilities for maintaining safety and availability during an attack. This paper presents a mixed time and event-triggered MTD security approach based on the ARINC 653 architecture that provides predictable and reliable operation during normal operation and rapid detection and reconfiguration upon detection of attacks. We leverage a hardware-in-the-loop testbed and an advanced emergency braking system (AEBS) case study to show the effectiveness of our approach.
publication
Efficient Out-of-Distribution Detection Using Latent Space of β-VAE for Cyber-Physical Systems
Deep Neural Networks are actively being used in the design of autonomous Cyber-Physical Systems (CPSs). The advantage of these models is their ability to handle high-dimensional state-space and learn compact surrogate representations of the operational state spaces. However, the problem is that the sampled observations used for training the model may never cover the entire state space of the physical environment, and as a result, the system will likely operate in conditions that do not belong to the training distribution. These conditions are referred to as Out-of-Distribution (OOD). Detecting OOD conditions at runtime is critical for the safety of CPS. In addition, it is also desirable to identify the context or the feature(s) that are the source of OOD to select an appropriate control action to mitigate the consequences that may arise because of the OOD condition. In this article, we study this problem as a multi-labeled time series OOD detection problem over images, where the OOD is defined both sequentially across short time windows (change points) as well as across the training data distribution. A common approach to solving this problem is the use of multi-chained one-class classifiers. However, this approach is expensive for CPSs that have limited computational resources and require short inference times. Our contribution is an approach to design and train a single β-Variational Autoencoder detector with a partially disentangled latent space sensitive to variations in image features. We use the feature-sensitive latent variables in the latent space to detect OOD images and identify the most likely feature(s) responsible for the OOD. We demonstrate our approach using an Autonomous Vehicle in the CARLA simulator and a real-world automotive dataset called nuImages.
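A lightweight way to picture the detector is per-dimension scoring in the latent space: encode an image, compute the KL divergence of each latent dimension against the standard-normal prior, and flag OOD when the feature-sensitive dimensions exceed thresholds calibrated on training data. The sketch below is only an illustration of that idea; the encoder interface, thresholding, and time-window reasoning used by the actual detector are assumptions here.

```python
# Hedged sketch of per-latent-dimension OOD scoring with a trained beta-VAE encoder.
import numpy as np

def latent_kl_per_dim(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, 1) ), computed independently for each latent dimension."""
    return 0.5 * (mu ** 2 + np.exp(logvar) - logvar - 1.0)

def detect_ood(encoder, image, sensitive_dims, thresholds):
    mu, logvar = encoder(image)                  # placeholder encoder returning latent statistics
    kl = latent_kl_per_dim(mu, logvar)
    violated = [d for d in sensitive_dims if kl[d] > thresholds[d]]
    return len(violated) > 0, violated           # OOD flag plus the likely responsible feature(s)
```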
Modern smart cities are focusing on smart transportation solutions to detect and mitigate the effects of various traffic incidents in the city. To materialize this, roadside units and ambient transportation sensors are being deployed to collect vehicular data that provides real-time traffic monitoring. In this paper, we first propose a real-time, data-driven, anomaly-based traffic incident detection framework for a city-scale smart transportation system. Specifically, we propose an incremental region growing approximation algorithm for optimal spatio-temporal clustering of road segments and their data, such that road segments are strategically divided into highly correlated clusters. The highly correlated clusters enable identifying a Pythagorean Mean-based invariant as an anomaly detection metric that is highly stable under no incidents but shows a deviation in the presence of incidents. We learn the bounds of the invariants in a robust manner such that anomaly detection can generalize to unseen events, even when learning from real noisy data. We perform extensive experimental validation using mobility data collected from the City of Nashville, Tennessee, and prove that the method can detect incidents within each cluster in real-time.
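One way to realize a Pythagorean-mean-based invariant, sketched below under stated assumptions, is the ratio of the geometric mean to the arithmetic mean of segment speeds within a highly correlated cluster: the ratio stays near a stable value under normal traffic and drops when one segment slows down relative to its neighbors. The exact invariant and the robust bound-learning procedure in the paper may differ from this illustration.

```python
# Hedged sketch of a Pythagorean-mean invariant for one cluster of road segments.
import numpy as np

def pythagorean_invariant(speeds):
    """speeds: positive per-segment speeds for one cluster at one time step."""
    speeds = np.asarray(speeds, dtype=float)
    am = speeds.mean()                       # arithmetic mean
    gm = np.exp(np.log(speeds).mean())       # geometric mean (requires speeds > 0)
    return gm / am                           # close to 1.0 when speeds agree, smaller when skewed

def is_anomalous(speeds, lower_bound, upper_bound):
    value = pythagorean_invariant(speeds)
    return not (lower_bound <= value <= upper_bound)
```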
Physical therapy (PT) is crucial for patients to restore and maintain mobility, function, and well-being. Many on-site activities and body exercises are performed under the supervision of therapists or clinicians. However, the postures of some exercises at home cannot be performed accurately due to the lack of supervision, quality assessment, and self-correction. Therefore, in this paper, we design a new framework, PhysiQ, that continuously tracks and quantitatively measures people's off-site exercise activity through passive sensory detection. In the framework, we create a novel multi-task spatio-temporal Siamese Neural Network that measures the absolute quality through classification and relative quality based on an individual's PT progress through similarity comparison. PhysiQ digitizes and evaluates exercises in three different metrics: range of motion, stability, and repetition. We collect and annotate 31 participants' motion data with different levels of quality. Evaluation results show that PhysiQ recognizes the nuances in exercises, works with different numbers of repetitions, and achieves an accuracy of 89.67% in detecting levels of exercise quality and an average R-squared correlation of 0.949 in similarity comparison.
2021
This paper presents our preliminary results developing an incremental query and transformation engine for our modeling framework. Our prior framework combined WebGME, a cloud-based collaborative modeling tool, with FORMULA, a language and tool for specifying and analyzing domain-specific modeling languages. While this arrangement has been successful for defining non-trivial languages in domains like CPS, one ongoing challenge is the scalability of executing model queries and transformations on large models. The inherent incremental nature of the modeling process exacerbates this scalability issue: model queries and transformations are repeatedly performed on incrementally updated models. To address this issue, we are developing an incremental version of FORMULA that can perform efficient model queries and transformations in the face of continual model updates. This paper describes our experiences designing this incremental version, including the challenges we faced and design decisions. We also report encouraging benchmark results.
During the past decade, virtualization-based (e.g., virtual machine introspection) and hardware-assisted approaches (e.g., x86 SMM and ARM TrustZone) have been used to defend against low-level malware such as rootkits. However, these approaches either require a large Trusted Computing Base (TCB) or they must share CPU time with the operating system, disrupting normal execution. In this article, we propose an introspection framework called Nighthawk that transparently checks system integrity and monitors the runtime state of a target system. Nighthawk leverages the Intel Management Engine (IME), a co-processor that runs in isolation from the main CPU. By using the IME, our approach has a minimal TCB and incurs negligible overhead on the host system on a suite of indicative benchmarks. We use Nighthawk to introspect the system software and firmware of a host system at runtime. The experimental results show that Nighthawk can detect real-world attacks against the OS, hypervisors, and System Management Mode while mitigating several classes of evasive attacks. Additionally, Nighthawk can monitor the runtime state of the host system against suspicious applications running on the target machine.
There has been a prolific rise in the popularity of cloud storage in recent years. While cloud storage offers many advantages such as flexibility and convenience, users are typically unable to tell or control the actual locations of their data. This limitation may affect users' confidence and trust in the storage provider, or even render the cloud unsuitable for storing data with strict location requirements. To address this issue, we propose a system called LAST-HDFS which integrates the Location-Aware Storage Technique (LAST) into the open source Hadoop Distributed File System (HDFS). The LAST-HDFS system enforces location-aware file allocations and continuously monitors file transfers to detect potentially illegal transfers in the cloud. Illegal transfers here refer to attempts to move sensitive data outside the (“legal”) boundaries specified by the file owner and its policies. Our underlying algorithms model file transfers among nodes as a weighted graph and maximize the probability of storing data items of similar privacy preferences in the same region. We equip each cloud node with a socket monitor that is capable of monitoring the real-time communication among cloud nodes. Based on the real-time data transfer information captured by the socket monitors, our system calculates the probability of a given transfer being illegal. We have implemented our proposed framework and carried out an extensive experimental evaluation in a large-scale real cloud environment to demonstrate the effectiveness and efficiency of our proposed system.
Data visualization has become a vital tool to help people understand the driving forces behind real-world phenomena. Although the learning curve of visualization tools has been reduced, domain experts still often require significant amounts of training to use them effectively. To reduce this learning curve even further, this paper proposes Sketch2Vis, a novel solution that uses deep learning techniques and tools to generate the source code for multi-platform data visualizations automatically from hand-drawn sketches provided by domain experts, which is similar to how an expert might sketch on a cocktail napkin and ask a software engineer to implement the sketched visualization. This paper explores key challenges (such as model training) in generating visualization code from hand-drawn sketches, since acquiring a large dataset of sketches paired with visualization source code is often prohibitively complicated. We present solutions for these problems and conduct experiments on three baseline models that demonstrate the feasibility of generating visualizations from hand-drawn sketches. The best models tested reach a structural accuracy of 95% in generating correct data visualization code from hand-drawn sketches of visualizations.
Counterfeiting is a significant problem for safety-critical systems, since cyber-information, such as a quality control certification, may be passed off with a flawed counterfeit part. Safety-critical systems, such as planes, are at risk because cyber-information cannot be provably tied to a specific physical part instance (e.g., impeller). This paper presents promising initial work showing that using piezoelectric sensors to measure impedance identities of parts may serve as a physically unclonable function that can produce unclonable part instance identities. When one of these impedance identities is combined with cyber-information and signed using existing public key infrastructure approaches, it creates a provable binding of cyber-information to a specific part instance. Our initial results from experimentation with traditionally and additively manufactured parts indicate that it will be extremely expensive and improbable for an attacker to counterfeit a part that replicates the impedance signature of a legitimate part.
Software transactional memory (STM) is a synchronization paradigm originally proposed for throughput-oriented computing to facilitate producing performant concurrent code that is free of synchronization bugs. With STM, programmers merely annotate code sections requiring synchronization; the underlying STM framework automatically resolves how synchronization is done. Today, the programming issues that motivated STM are becoming a concern in embedded computing, where ever more sophisticated systems are being produced that require highly parallel implementations. These implementations are often produced by engineers and control experts who may not be well versed in concurrency-related issues. In this context, a real-time STM framework would be useful in ensuring that the synchronization aspects of a system pass real-time certification. However, all prior STM approaches fundamentally rely on retries to resolve conflicts, and such retries can yield high worst-case synchronization costs compared to lock-based approaches. This paper presents a new STM class called Retry-Free Real-Time STM (R2STM), which is designed for worst-case real-time performance. The benefit of a retry-free approach for use in a real-time system is demonstrated by a schedulability study, in which it improved overall schedulability across all considered task systems by an average of 95.3% over a retry-based approach. This paper also presents TORTIS, the first R2STM implementation for real-time systems. Throughput-oriented benchmarks are presented to highlight the tradeoffs between throughput and schedulability for TORTIS.
Many embedded systems have evolved from simple bare-metal control systems to highly complex network-connected systems. These systems increasingly demand rich and feature-full operating system (OS) functionalities. Furthermore, the network connectedness offers attack vectors that require stronger security designs. To that end, this paper defines a prototypical RTOS API called Patina that provides services common in feature-rich OSes (e.g., Linux) but absent in more trustworthy μ-kernel based systems. Examples of such services include communication channels, timers, event management, and synchronization. Two Patina implementations are presented, one on Composite and the other on seL4, each of which is designed based on the Principle of Least Privilege (PoLP) to increase system security. This paper describes how each of these μ-kernels affects the PoLP-based design, as well as discusses security and performance tradeoffs in the two implementations. Results of comprehensive evaluations demonstrate that the PoLP-based implementation of Patina offers comparable or superior performance to Linux, while offering heightened isolation.
The growing importance and maturity of Internet of Things (IoT) and wearable computing are revolutionizing healthcare diagnosis and body treatment by providing access to meaningful healthcare data and improving the effectiveness of medical services. In this context, personal health information must be exchanged via trusted transactions that provide secure and encrypted sensitive data of the patient. Moreover, healthcare smart devices need flexible, programmable, and agile networks to allow on-demand configuration and management to enable scalable and interoperable healthcare applications. Two complementary trends show promise in meeting these needs. First, blockchain is emerging as a transparent, immutable, and validated-by-design technology that offers a potential solution to address the key security challenges in healthcare domains by providing secure and pseudo-anonymous transactions in a fully distributed and decentralized manner. Second, software-defined networking (SDN) offers a significant promise in meeting the healthcare communication needs by providing a flexible and programmable environment to support customized security policies and services in a dynamic, software-based fashion. To that end, we present our ideas on SDN-enabled blockchains that can be used to develop and deploy privacy-preserving healthcare applications. First, we present a survey of the emerging trends and prospects, followed by an in-depth discussion of major challenges in this area. Second, we introduce a fog computing architecture that interconnects various IoT elements, SDN networking, and blockchain computing components that control and manage patients’ health-related parameters. Third, we validate our architecture in the context of three use cases involving smart health care, precision medicine, and pharmaceutical supply chain. Finally, we discuss open issues that need significant new research investigations.
The Internet of Things (IoT) is gaining popularity as it offers to connect billions of devices and exchange data over the internet. However, the large-scale and heterogeneous IoT network environment brings serious challenges to assuring the quality of service of IoT-based services. In this context, Software-Defined Networking (SDN) shows promise in improving the performance of IoT services by decoupling the control plane from the data plane. However, existing SDN-based distributed architectures are able to address the scalability and management issues in static IoT scenarios only. In this paper, we utilize multiple M/M/1 queues to model and optimize the service-level and system-level objectives in dynamic IoT scenarios, where the network switches and/or their request rates could change dynamically over time. We propose several heuristic-based solutions including a genetic algorithm, a simulated annealing algorithm and a modified greedy algorithm with the goal of minimizing the queuing and processing times of the requests from switches at the controllers and balancing the controller loads while also incorporating the switch migration costs. Empirical studies using Mininet-based simulations show that our algorithms offer effective self-adaptation and self-healing in dynamic network conditions.
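The M/M/1 abstraction makes the optimization target concrete: if a controller with service rate μ receives an aggregate request rate λ from its assigned switches, the expected time a request spends at that controller is 1/(μ - λ), provided λ < μ. The sketch below illustrates one plausible way to score a switch-to-controller assignment with this delay term plus a migration penalty; the actual objective, weights, and migration-cost model in the paper are assumptions here.

```python
# Hedged sketch of an M/M/1-based cost for a switch-to-controller assignment.
def controller_delay(arrival_rate, service_rate):
    """Mean sojourn time (queueing + service) of a request at one M/M/1 controller."""
    if arrival_rate >= service_rate:
        return float("inf")                  # unstable queue: load exceeds capacity
    return 1.0 / (service_rate - arrival_rate)

def assignment_cost(assignment, switch_rates, service_rates, migration_penalty, alpha=1.0):
    """assignment: switch -> controller; switch_rates: switch -> requests/s;
    service_rates: controller -> requests/s; migration_penalty: precomputed cost scalar."""
    load = {c: 0.0 for c in service_rates}
    for sw, ctrl in assignment.items():
        load[ctrl] += switch_rates[sw]
    # Aggregate expected delay, weighting each controller by the traffic it serves.
    delay = sum(load[c] * controller_delay(load[c], service_rates[c])
                for c in service_rates if load[c] > 0)
    return delay + alpha * migration_penalty
```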
The supervisory control and data acquisition (SCADA) network in a smart grid must be reliable and efficient to transmit real-time data to the controller, especially when the system is under contingencies or cyberattacks. Introducing the features of software-defined networks (SDN) into a SCADA network helps in better management of communication and deployment of novel grid control operations. Unfortunately, it is impossible to transform the overall smart grid network to have only SDN-enabled devices overnight because of budget and logistics constraints, which raises the requirement of a systematic deployment methodology. In this article, we present a framework, named SDNSynth, that can design a hybrid network consisting of both legacy forwarding devices and programmable SDN-enabled switches. The design satisfies the resiliency requirements of the SCADA network, which are determined with respect to a set of pre-identified threat vectors. The resiliency-aware SDN deployment plan primarily includes the best placements of the SDN-enabled switches (replacing the legacy switches). The plan may include one or more newly installed links to provide flexible or alternate routing paths. We design and implement the SDNSynth framework, which includes the modeling of the SCADA topology, SDN-based resiliency measures, resiliency threats, mitigation requirements, the deployment budget, and other constraints. It uses satisfiability modulo theories (SMT) for encoding the synthesis model and solving it. We demonstrate SDNSynth on a case study of an example small-scale network. We also evaluate SDNSynth on different synthetic SCADA systems and analyze how different parameters impact each other. We simulate the networks suggested by SDNSynth in a Mininet environment, which demonstrates the effectiveness of the deployment strategy over traditional networks and randomly deployed SDN switches in terms of packet loss and recovery time during network congestion.
<p><span dir="ltr">Autonomous Cyber Physical Systems (CPSs) are</span><br><span dir="ltr">often required to handle uncertainties and self-manage the system</span><br><span dir="ltr">operation in response to problems and increasing risk in the</span><br><span dir="ltr">operating paradigm. This risk may arise due to distribution</span><br><span dir="ltr">shifts, environmental context, or failure of software or hardware</span><br><span dir="ltr">components. Traditional techniques for risk assessment focus on</span><br><span dir="ltr">design-time techniques such as hazard analysis, risk reduction,</span><br><span dir="ltr">and assurance cases among others. However, these static, design-</span><br><span dir="ltr">time techniques do not consider the dynamic contexts and failures</span><br><span dir="ltr">the systems face at runtime. We hypothesize that this requires</span><br><span dir="ltr">a dynamic assurance approach that computes the likelihood</span><br><span dir="ltr">of unsafe conditions or system failures considering the safety</span><br><span dir="ltr">requirements, assumptions made at design time, past failures in a</span><br><span dir="ltr">given operating context, and the likelihood of system component</span><br><span dir="ltr">failures. We introduce the ReSonAte dynamic risk estimation</span><br><span dir="ltr">framework for autonomous systems. ReSonAte reasons over Bow-</span><br><span dir="ltr">Tie Diagrams (BTDs) which capture information about hazard</span><br><span dir="ltr">propagation paths and control strategies. Our innovation is the</span><br><span dir="ltr">extension of the BTD formalism with attributes for modeling</span><br><span dir="ltr">the conditional relationships with the state of the system and</span><br><span dir="ltr">environment. We also describe a technique for estimating these</span><br><span dir="ltr">conditional relationships and equations for estimating risk based</span><br><span dir="ltr">on the state of the system and environment. To help with this</span><br><span dir="ltr">process, we provide a scenario modeling procedure that can</span><br><span dir="ltr">use the prior distributions of the scenes and threat conditions</span><br><span dir="ltr">to generate the data required for estimating the conditional</span><br><span dir="ltr">relationships. To improve scalability and reduce the amount of</span><br><span dir="ltr">data required, this process considers each control strategy in</span><br><span dir="ltr">isolation and composes several single-variate distributions into</span><br><span dir="ltr">one complete multi-variate distribution for the control strategy</span><br><span dir="ltr">in question. Lastly, we describe the effectiveness of our approach</span><br><span dir="ltr">using two separate autonomous system simulations: CARLA and</span><br><span dir="ltr">an unmanned underwater vehicle.</span></p>
publication
Power-Attack: A Comprehensive Tool-Chain for Modeling and Simulating Attacks in Power Systems
Due to the increased deployment of novel communication, control and protection functions, the grid has become vulnerable to a variety of attacks. Designing robust machine learning based attack detection and mitigation algorithms requires large amounts of data that rely heavily on a representative environment, where different attacks can be simulated. This paper presents a comprehensive tool-chain for modeling and simulating attacks in power systems. The paper makes the following contributions: first, we present a probabilistic domain-specific language to define multiple attack scenarios and simulation configuration parameters. Second, we extend the PyPower-dynamics simulator with protection system components to simulate cyber attacks in the control and protection layers of a power system. Finally, we demonstrate multiple attack scenarios with a case study based on the IEEE 39-bus system.
2020
publication
Google Data Collection
Technological advancements in today’s electrical grids give rise to new vulnerabilities and increase the potential attack surface for cyber-attacks that can severely affect the resilience of the grid. Cyber-attacks are increasing in both number and sophistication, and these attacks can be strategically organized in chronological order (dynamic attacks), where they can be instantiated at different time instants. The chronological order of attacks enables us to uncover those attack combinations that can cause severe system damage, but this concept has remained unexplored due to the lack of dynamic attack models. Motivated by this idea, we consider a game-theoretic approach to design a new attacker-defender model for power systems. Here, the attacker can strategically identify the chronological order in which the critical substations and their protection assemblies can be attacked in order to maximize the overall system damage. In turn, the defender can intelligently identify the critical substations to protect such that the system damage can be minimized. We apply the developed algorithms to the IEEE 39- and 57-bus systems with finite attacker/defender budgets. Our results show the effectiveness of these models in improving the system resilience under dynamic attacks.
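The attacker-defender interaction can be viewed as a minimax problem: the defender commits protection resources to a subset of substations, and the attacker then chooses the chronological attack sequence over the remaining substations that maximizes damage. The brute-force sketch below only illustrates that structure on a tiny instance; the damage model, budget handling, and the efficient game-theoretic algorithms developed in the paper are not represented here.

```python
# Hedged, brute-force illustration of the minimax structure (tiny instances only).
from itertools import combinations, permutations

def best_defense(substations, damage, attack_budget, defense_budget):
    """damage: callable mapping an ordered attack sequence to a system-damage value."""
    best_plan, best_worst_case = None, float("inf")
    for protected in combinations(substations, defense_budget):
        exposed = [s for s in substations if s not in protected]
        r = min(attack_budget, len(exposed))
        # Attacker's best response: worst-case chronological attack order over exposed substations.
        worst = max((damage(seq) for seq in permutations(exposed, r)), default=0.0)
        if worst < best_worst_case:
            best_plan, best_worst_case = protected, worst
    return best_plan, best_worst_case
```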