Publications

Search by title, author, keywords, etc.

Year

Type

Topic Area

Sort by

Order

publication

StrongBox: A GPU TEE on Arm Endpoints

A wide range of Arm endpoints leverage integrated and discrete GPUs to accelerate computation such as image processing and numerical processing applications. However, in spite of these important use cases, Arm GPU security has yet to be scrutinized by the community. By exploiting vulnerabilities in the kernel, attackers can directly access sensitive data used during GPU computing, such as personally-identifiable image data in computer vision tasks. Existing work has used Trusted Execution Environments (TEEs) to address GPU security concerns on Intel-based platforms, while there are numerous architectural differences that lead to novel technical challenges in deploying TEEs for Arm GPUs. In addition, extant Arm-based GPU defenses are intended for secure machine learning, and lack generality. There is a need for generalizable and efficient Arm-based GPU security mechanisms.To address these problems, we present StrongBox, the first GPU TEE for secured general computation on Arm endpoints. During confidential computation on Arm GPUs, StrongBox provides an isolated execution environment by ensuring exclusive access to the GPU. Our approach is based in part on a dynamic, fine-grained memory protection policy as Arm-based GPUs typically share a unified memory with the CPU, a stark contrast with Intel-based platforms. Furthermore, by characterizing GPU buffers as secure and non-secure, StrongBox reduces redundant security introspection operations to control access to sensitive data used by the GPU, ultimately reducing runtime overhead. Our design leverages the widely-deployed Arm TrustZone and generic Arm features, without hardware modification or architectural changes. We prototype StrongBox using an off-the-shelf Arm Mali GPU and perform an extensive evaluation. Our results show that StrongBox successfully ensures the GPU computing security with a low (4.70\% - 15.26\%) overhead across several indicative benchmarks.

Authored by Yunjie Deng, Chenxu Wang, Shunchang Yu, Shiqing Liu, Zhenyu Ning, Kevin Leach, Jin Li, Shoumeng Yan, Zhengyu He, Jiannong Cao, and Fengwei Zhang

publication

START: A Framework for Trusted and Resilient Autonomous Vehicles (Practical Experience Report)

From delivering groceries and vital medical supplies to driving trucks and passenger vehicles, society is becoming increasingly reliant on autonomous vehicles (AVs), It is therefore vital that these systems be resilient to adversarial actions, perform mission-critical functions despite known and unknown vulnerabilities, and protect and repair themselves during or after operational failures and cyber-attacks. While techniques have been proposed to address individual aspects of software resilience, vulnerability assessment, automated repair, and invariant detection, there is no approach that provides end-to-end trusted and resilient mission operation and repair on AVs. In this paper, we describe our experience of building START,11Software Techniques for Automated Resilience and Trust a framework that provides increased resilience, accurate vul-nerability assessment, and trustworthy post-repair operation in autonomous vehicles. We combine techniques from binary analysis and rewriting, runtime monitoring and verification, auto-mated program repair, and invariant detection that cooperatively detect and eliminate a swath of software security vulnerabilities in cyberphysical systems. We evaluate our framework using an autonomous vehicle simulation platform, demonstrating its holistic applicability to AVs.

Authored by Kevin Leach, Christopher Timperley, Kevin Angstadt, Anh Nguyen-Tuong, Jason Hiser, Aaron Paulos, Partha Pal, Patrick Hurley, Carl Thomas, Jack Davidson, Stephanie Forrest, Claire Le Goues, and Westley Weimer

publication

Syntheto: A Surface Language for APT and ACL2

Syntheto is a surface language for carrying out formally verified program synthesis by transformational refinement in ACL2 using the APT toolkit. Syntheto aims at providing more familiarity and automation, in order to make this technology more widely usable. Syntheto is a strongly statically typed functional language that includes both executable and non-executable constructs, including facilities to state and prove theorems and facilities to apply proof-generating transformations. Syntheto is integrated into an IDE with a notebook-style, interactive interface that translates Syntheto to ACL2 definitions and APT transformation invocations, and back-translates the prover's results to Syntheto; the bidirectional translation happens behind the scenes, with the user interacting solely with Syntheto.

Authored by Alessandro Coglio, Eric McCarthy, Stephen Westfold, Daniel Balasubramanian, Abhishek Dubey, and Gabor Karsai

publication

Moving target defense for the security and resilience of mixed time and event triggered cyber–physical systems

Memory corruption attacks such as code injection, code reuse, and non-control data attacks have become widely popular for compromising safety-critical Cyber–Physical Systems (CPS). Moving target defense (MTD) techniques such as instruction set randomization (ISR), address space randomization (ASR), and data space randomization (DSR) can be used to protect systems against such attacks. CPS often use time-triggered architectures to guarantee predictable and reliable operation. MTD techniques can cause time delays with unpredictable behavior. To protect CPS against memory corruption attacks, MTD techniques can be implemented in a mixed time and event-triggered architecture that provides capabilities for maintaining safety and availability during an attack. This paper presents a mixed time and event-triggered MTD security approach based on the ARINC 653 architecture that provides predictable and reliable operation during normal operation and rapid detection and reconfiguration upon detection of attacks. We leverage a hardware-in-the-loop testbed and an advanced emergency braking system (AEBS) case study to show the effectiveness of our approach.

Authored by Bradley Potteiger, Abhishek Dubey, Feiyang Cai, Xenofon Koutsoukos, and Zhenkai Zhang

publication

Efficient Out-of-Distribution Detection Using Latent Space of β-VAE for Cyber-Physical Systems

Deep Neural Networks are actively being used in the design of autonomous Cyber-Physical Systems (CPSs). The advantage of these models is their ability to handle high-dimensional state-space and learn compact surrogate representations of the operational state spaces. However, the problem is that the sampled observations used for training the model may never cover the entire state space of the physical environment, and as a result, the system will likely operate in conditions that do not belong to the training distribution. These conditions that do not belong to training distribution are referred to as Out-of-Distribution (OOD). Detecting OOD conditions at runtime is critical for the safety of CPS. In addition, it is also desirable to identify the context or the feature(s) that are the source of OOD to select an appropriate control action to mitigate the consequences that may arise because of the OOD condition. In this article, we study this problem as a multi-labeled time series OOD detection problem over images, where the OOD is defined both sequentially across short time windows (change points) as well as across the training data distribution. A common approach to solving this problem is the use of multi-chained one-class classifiers. However, this approach is expensive for CPSs that have limited computational resources and require short inference times. Our contribution is an approach to design and train a single β-Variational Autoencoder detector with a partially disentangled latent space sensitive to variations in image features. We use the feature sensitive latent variables in the latent space to detect OOD images and identify the most likely feature(s) responsible for the OOD. We demonstrate our approach using an Autonomous Vehicle in the CARLA simulator and a real-world automotive dataset called nuImages.

Authored by Shreyas Ramakrishna, Zahra Rahiminasab, Gabor Karsai, Arvind Easwaran, and Abhishek Dubey

publication

Anomaly based Incident Detection in Large Scale Smart Transportation Systems

Modern smart cities are focusing on smart transportation solutions to detect and mitigate the effects of various traffic incidents in the city. To materialize this, roadside units and ambient trans-portation sensors are being deployed to collect vehicular data that provides real-time traffic monitoring. In this paper, we first propose a real-time data-driven anomaly-based traffic incident detection framework for a city-scale smart transportation system. Specifically, we propose an incremental region growing approximation algorithm for optimal Spatio-temporal clustering of road segments and their data; such that road segments are strategically divided into highly correlated clusters. The highly correlated clusters enable identifying a Pythagorean Mean-based invariant as an anomaly detection metric that is highly stable under no incidents but shows a deviation in the presence of incidents. We learn the bounds of the invariants in a robust manner such that anomaly detection can generalize to unseen events, even when learning from real noisy data. We perform extensive experimental validation using mobility data collected from the City of Nashville, Tennessee, and prove that the method can detect incidents within each cluster in real-time.

Authored by Jaminur Islam, Jose Talusan, Shameek Bhattacharjee, Francis Tiausas, Sayyed Vazirizade, Abhishek Dubey, Keiichi Yasumoto, and Sajal Das

publication

Sketch2Vis: Generating Data Visualizations from Hand-drawn Sketches with Deep Learning

Data visualization has become a vital tool to help people understand the driving forces behind real-world phenomena. Although the learning curve of visualization tools have been reduced, domain experts still often require significant amounts of training to use them effectively. To reduce this learning curve even further, this paper proposes Sketch2Vis, a novel solution using deep learning techniques and tools to generate the source code for multi-platform data visualizations automatically from hand-drawn sketches provided by domain experts, which is similar to how an expert might sketch on a cocktail napkin and ask a software engineer to implement the sketched visualization.This paper explores key challenges (such as model training) in generating visualization code from hand-drawn sketches since acquiring a large dataset of sketches paired with visualization source code is often prohibitively complicated. We present solutions for these problems and conduct experiments on three baseline models that demonstrate the feasibility of generating visualizations from hand-drawn sketches. The best models tested reach a structural accuracy of 95% in generating correct data visualization code from hand-drawn sketches of visualizations.

Authored by Zhongwei Teng, Quchen Fu, Jules White, and Douglas Schmidt

publication

Detecting Suspicious File Migration or Replication in the Cloud

There has been a prolific rise in the popularity of cloud storage in recent years. While cloud storage offers many advantages such as flexibility and convenience, users are typically unable to tell or control the actual locations of their data. This limitation may affect users confidence and trust in the storage provider, or even render cloud unsuitable for storing data with strict location requirements. To address this issue, we propose a system called LAST-HDFS which integrates Location-Aware Storage Technique (LAST) into the open source Hadoop Distributed File System (HDFS). The LAST-HDFS system enforces location-aware file allocations and continuously monitors file transfers to detect potentially illegal transfers in the cloud. Illegal transfers here refer to attempts to move sensitive data outside the (“legal”) boundaries specified by the file owner and its policies. Our underlying algorithms model file transfers among nodes as a weighted graph, and maximize the probability of storing data items of similar privacy preferences in the same region. We equip each cloud node with a socket monitor that is capable of monitoring the real-time communication among cloud nodes. Based on the real-time data transfer information captured by the socket monitors, our system calculates the probability of a given transfer to be illegal. We have implemented our proposed framework and carried out an extensive experimental evaluation in a large-scale real cloud environment to demonstrate the effectiveness and efficiency of our proposed system.

Authored by Adam Bowers, Cong Liao, Douglas Steiert, Dan Lin, Anna Squicciarini, and Ali Hurson

publication

A Coprocessor-Based Introspection Framework Via Intel Management Engine

During the past decade, virtualization-based (e.g., virtual machine introspection) and hardware-assisted approaches (e.g., x86 SMM and ARM TrustZone) have been used to defend against low-level malware such as rootkits. However, these approaches either require a large Trusted Computing Base (TCB) or they must share CPU time with the operating system, disrupting normal execution. In this article, we propose an introspection framework called Nighthawk that transparently checks system integrity and monitor the runtime state of target system. Nighthawk leverages the Intel Management Engine (IME), a co-processor that runs in isolation from the main CPU. By using the IME, our approach has a minimal TCB and incurs negligible overhead on the host system on a suite of indicative benchmarks. We use Nighthawk to introspect the system software and firmware of a host system at runtime. The experimental results show that Nighthawk can detect real-world attacks against the OS, hypervisors, and System Management Mode while mitigating several classes of evasive attacks. Additionally, Nighthawk can monitor the runtime state of host system against the suspicious applications running in target machine.

Authored by Lei Zhou, Fengwei Zhang, Jidong Xiao, Kevin Leach, Westley Weimer, Xuhua Ding, and Guojun Wang

publication

Differential-FORMULA: Towards a Semantic Backplane for Incremental Modeling

This paper presents our preliminary results developing an incremental query and transformation engine for our modeling framework. Our prior framework combined WebGME, a cloud-based collaborative modeling tool, with FORMULA, a language and tool for specifying and analyzing domain-specific modeling languages. While this arrangement has been successful for defining non-trivial languages in domains like CPS, one ongoing challenge is the scalability of executing model queries and transformations on large models. The inherent incremental nature of the modeling process exacerbates this scalability issue: model queries and transformations are repeatedly performed on incrementally updated models. To address this issue, we are developing an incremental version of FORMULA that can perform efficient model queries and transformations in the face of continual model updates. This paper describes our experiences designing this incremental version, including the challenges we faced and design decisions. We also report encouraging benchmark results.

Authored by Qishen Zhang, Daniel Balasubramanian, Tamas Kecskes, and Janos Sztipanovits

publication

Towards secure cyber-physical information association for parts

Counterfeiting is a significant problem for safety-critical systems, since cyber-information, such as a quality control certification, may be passed off with a flawed counterfeit part. Safety-critical systems, such as planes, are at risk because cyber-information cannot be provably tied to a specific physical part instance (e.g., impeller). This paper presents promising initial work showing that using piezoelectric sensors to measure impedance identities of parts may serve as a physically unclonable function that can produce unclonable part instance identities. When one of these impedance identities is combined with cyber-information and signed using existing public key infrastructure approaches, it creates a provable binding of cyber-information to a specific part instance. Our initial results from experimentation with traditionally and additively manufactured parts indicate that it will be extremely expensive and improbable for an attacker to counterfeit a part that replicates the impedance signature of a legitimate part.

Authored by Michael Sandborn, Carlos Olea, Jules White, Chris Williams, Pablo Tarazaga, Logan Sturm, Mohammad Albakri, and Charles Tenney

publication

BlockCare: SDN-Enabled Blockchain Framework for Securing Decentralized Healthcare and Precision Medicine Applications

The growing importance and maturity of Internet of Things (IoT) and wearable computing are revolutionizing healthcare diagnosis and body treatment by providing access to meaningful healthcare data and improving the effectiveness of medical services. In this context, personal health information must be exchanged via trusted transactions that provide secure and encrypted sensitive data of the patient. Moreover, healthcare smart devices need flexible, programmable, and agile networks to allow on-demand configuration and management to enable scalable and interoperable healthcare applications. Two complementary trends show promise in meeting these needs. First, blockchain is emerging as a transparent, immutable, and validated-by-design technology that offers a potential solution to address the key security challenges in healthcare domains by providing secure and pseudo-anonymous transactions in a fully distributed and decentralized manner. Second, software-defined networking (SDN) offers a significant promise in meeting the healthcare communication needs by providing a flexible and programmable environment to support customized security policies and services in a dynamic, software-based fashion. To that end, we present our ideas on SDN-enabled blockchains that can be used to develop and deploy privacy-preserving healthcare applications. First, we present a survey of the emerging trends and prospects, followed by an in-depth discussion of major challenges in this area. Second, we introduce a fog computing architecture that interconnects various IoT elements, SDN networking, and blockchain computing components that control and manage patients’ health-related parameters. Third, we validate our architecture in the context of three use cases involving smart health care, precision medicine, and pharmaceutical supply chain. Finally, we discuss open issues that need significant new research investigations.

Authored by Akram Hakiri, Aniruddha Gokhale, and Nicolae Tapus

publication

Power-Attack: A Comprehensive Tool-Chain for Modeling and Simulating Attacks in Power Systems

Due to the increased deployment of novel communication, control and protection functions, the grid has become vulnerable to a variety of attacks. Designing robust machine learning based attack detection and mitigation algorithms require large amounts of data that rely heavily on a representative environment, where different attacks can be simulated. This paper presents a comprehensive tool-chain for modeling and simulating attacks in power systems. The paper makes the following contributions, first, we present a probabilistic domain specific language to define multiple attack scenarios and simulation configuration parameters. Secondly, we extend the PyPower-dynamics simulator with protection system components to simulate cyber attacks in control and protection layers of power system. In the end, we demonstrate multiple attack scenarios with a case study based on IEEE 39 bus system.

Authored by Ajay Chhokra, Carlos Barreto, Abhishek Dubey, Gabor Karsai, and Xenofon Koutsoukos

publication

ReSonAte: A Runtime Risk Assessment Framework for Autonomous Systems

Autonomous Cyber Physical Systems (CPSs) are often required to handle uncertainties and self-manage the system operation in response to problems and increasing risk in the operating paradigm. This risk may arise due to distribution shifts, environmental context, or failure of software or hardware components. Traditional techniques for risk assessment focus on design-time techniques such as hazard analysis, risk reduction, and assurance cases among others. However, these static, design- time techniques do not consider the dynamic contexts and failures the systems face at runtime. We hypothesize that this requires a dynamic assurance approach that computes the likelihood of unsafe conditions or system failures considering the safety requirements, assumptions made at design time, past failures in a given operating context, and the likelihood of system component failures. We introduce the ReSonAte dynamic risk estimation framework for autonomous systems. ReSonAte reasons over Bow- Tie Diagrams (BTDs) which capture information about hazard propagation paths and control strategies. Our innovation is the extension of the BTD formalism with attributes for modeling the conditional relationships with the state of the system and environment. We also describe a technique for estimating these conditional relationships and equations for estimating risk based on the state of the system and environment. To help with this process, we provide a scenario modeling procedure that can use the prior distributions of the scenes and threat conditions to generate the data required for estimating the conditional relationships. To improve scalability and reduce the amount of data required, this process considers each control strategy in isolation and composes several single-variate distributions into one complete multi-variate distribution for the control strategy in question. Lastly, we describe the effectiveness of our approach using two separate autonomous system simulations: CARLA and an unmanned underwater vehicle.

Authored by Charles Hartsell, Shreyas Ramakrishna, Abhishek Dubey, Daniel Stojcsics, Nagabhushan Mahadevan, and Gabor Karsai

publication

A software-defined blockchain-based architecture for scalable and tamper-resistant IoT-enabled smart cities

Authored by Akram Hakiri and Aniruddha Gokhale

publication

Resiliency-Aware Deployment of SDN in Smart Grid SCADA: A Formal Synthesis Model

The supervisory control and data acquisition (SCADA) network in a smart grid requires to be reliable and efficient to transmit real-time data to the controller, especially when the system is under contingencies or cyberattacks. Introducing the features of software-defined networks (SDN) into a SCADA network helps in better management of communication and deployment of novel grid control operations. Unfortunately, it is impossible to transform the overall smart grid network to have only SDN-enabled devices overnight because of budget and logistics constraints, which raises the requirement of a systematic deployment methodology. In this article, we present a framework, named SDNSynth, that can design a hybrid network consisting of both legacy forwarding devices and programmable SDN-enabled switches. The design satisfies the resiliency requirements of the SCADA network, which are determined with respect to a set of pre-identified threat vectors. The resiliency-aware SDN deployment plan primarily includes the best placements of the SDN-enabled switches (replacing the legacy switches). The plan may include one or more links to be installed newly to provide flexible or alternate routing paths. We design and implement the SDNSynth framework that includes the modeling of the SCADA topology, SDN-based resiliency measures, resiliency threats, mitigation requirements, the deployment budget, and other constraints. It uses satisfiability modulo theories (SMT) for encoding the synthesis model and solving it. We demonstrate SDNSynth on a case study of an example small-scale network. We also evaluate SDNSynth on different synthetic SCADA systems and analyze how different parameters impact each other. We simulate the SDNSynth suggested networks in a Mininet environment, which demonstrate the effectiveness of the deployment strategy over traditional networks and randomly deployed SDN switches in terms of packet loss and recovery time during network congestions.

Authored by Ahm Jakaria, Mohammad Rahman, and Aniruddha Gokhale

publication

A Self-Adaptive Load Balancing Approach for Software-Defined Networks in IoT

The Internet of Things (IoT) is gaining popularity as it offers to connect billions of devices and exchange data over the internet. However, the large-scale and heterogeneous IoT network environment brings serious challenges to assuring the quality of service of IoT-based services. In this context, Software-Defined Networking (SDN) shows promise in improving the performance of IoT services by decoupling the control plane from the data plane. However, existing SDN-based distributed architectures are able to address the scalability and management issues in static IoT scenarios only. In this paper, we utilize multiple M/M/1 queues to model and optimize the service-level and system-level objectives in dynamic IoT scenarios, where the network switches and/or their request rates could change dynamically over time. We propose several heuristic-based solutions including a genetic algorithm, a simulated annealing algorithm and a modified greedy algorithm with the goal of minimizing the queuing and processing times of the requests from switches at the controllers and balancing the controller loads while also incorporating the switch migration costs. Empirical studies using Mininet-based simulations show that our algorithms offer effective self-adaptation and self-healing in dynamic network conditions.

Authored by Ziran Min, Hongyang Sun, Shunxing Bao, Aniruddha Gokhale, and Swapna Gokhale

publication

Practical Principle of Least Privilege for Secure Embedded Systems

Many embedded systems have evolved from simple bare-metal control systems to highly complex network-connected systems. These systems increasingly demand rich and feature-full operating-systems (OS) functionalities. Furthermore, the network connectedness offers attack vectors that require stronger security designs. To that end, this paper defines a prototypical RTOS API called Patina that provides services common in featurerich OSes (e.g., Linux) but absent in more trustworthy μ -kernel based systems. Examples of such services include communication channels, timers, event management, and synchronization. Two Patina implementations are presented, one on Composite and the other on seL4, each of which is designed based on the Principle of Least Privilege (PoLP) to increase system security. This paper describes how each of these μ -kernels affect the PoLP based design, as well as discusses security and performance tradeoffs in the two implementations. Results of comprehensive evaluations demonstrate that the performance of the PoLP based implementation of Patina offers comparable or superior performance to Linux, while offering heightened isolation.

Authored by Samuel Jero, Juliana Furgala, Runyu Pan, Phani Gadepalli, Alexandra Clifford, Bite Ye, Roger Khazan, Bryan Ward, Gabriel Parmer, and Richard Skowyra

publication

TORTIS: Retry-Free Software Transactional Memory for Real-Time Systems

Software transactional memory (STM) is a synchronization paradigm originally proposed for throughput-oriented computing to facilitate producing performant concurrent code that is free of synchronization bugs. With STM, programmers merely annotate code sections requiring synchronization; the underlying STM framework automatically resolves how synchronization is done. Today, the programming issues that motivated STM are becoming a concern in embedded computing, where ever more sophisticated systems are being produced that require highly parallel implementations. These implementations are often produced by engineers and control experts who may not be well versed in concurrency-related issues. In this context, a real-time STM framework would be useful in ensuring that the synchronization aspects of a system pass real-time certification. However, all prior STM approaches fundamentally rely on retries to resolve conflicts, and such retries can yield high worst-case synchronization costs compared to lock-based approaches. This paper presents a new STM class called Retry-Free Real-Time STM (R2STM), which is designed for worst-case real-time performance. The benefit of a retry-free approach for use in a real-time system is demonstrated by a schedulability study, in which it improved overall schedulability across all considered task systems by an average of 95.3% over a retry-based approach. This paper also presents TORTIS, the first R2STM implementation for real-time systems. Throughput-oriented benchmarks are presented to highlight the tradeoffs between throughput and schedulability for TORTIS.

Authored by Claire Nord, Shai Caspin, Catherine Nemitz, Howard Shrobe, Hamed Okhravi, James Anderson, Nathan Burow, and Bryan Ward

publication

Light Reading: Optimizing Reader/Writer Locking for Read-Dominant Real-Time Workloads

Authored by Catherine Nemitz, Shai Caspin, James Anderson, and Bryan Ward

publication

A physical hash for preventing and detecting cyber-physical attacks in additive manufacturing systems

Cyber-physical security is a major concern in the modern environment of digital manufacturing, wherein a cyber-attack has the potential to result in the production of defective parts, theft of IP, or damage to infrastructure or the operator have become a real threat that have the potential to create bad parts. Current cyber only solutions are insufficient due to the nature of manufacturing environments where it may not be feasible or even possible to upgrade physical equipment to the most current cyber security standards, necessitating an approach that addresses both the cyber and the physical components. This paper proposes a new method for detecting malicious cyber-physical attacks on additive manufacturing (AM) systems. The method makes use of a physical hash, which links digital data to the manufactured part via a disconnected side-channel measurement system. The disconnection ensures that if the network and/or AM system becomes compromised, the manufacturer can still rely on the measurement system for attack detection. The physical hash ensures protection of the intellectual property (IP) associated with both process and toolpath parameters while also enabling in situ quality assurance. In this paper, the physical hash takes the form of a QR code that contains a hash string of the nominal process parameters and toolpath. It is manufactured alongside the original geometry for the measurement system to scan and compare to the readings from its sensor suite. By taking measurements in situ, the measurement system can detect in real-time if the part being manufactured matches the designer’s specification. In this paper, the overall concept and underlying algorithm of the physical hash is presented. A proof-of-concept validation is realized on a material extrusion AM machine, to demonstrate the ability of a physical hash and in situ monitoring to detect the existence (and absence) of malicious attacks on the STL file, the printing process parameters, and the printing toolpath.

Authored by Josh Brandman, Logan Sturm, Jules White, and Chris Williams

publication

Byzantine Resilient Distributed Multi-Task Learning

Distributed multi-task learning provides significant advantages in multi-agent networks with heterogeneous data sources where agents aim to learn distinct but correlated models simultaneously. However, distributed algorithms for learning relatedness among tasks are not resilient in the presence of Byzantine agents. In this paper, we present an approach for Byzantine resilient distributed multi-task learning. We propose an efficient online weight assignment rule by measuring the accumulated loss using an agent s data and its neighbors models. A small accumulated loss indicates a large similarity between the two tasks. In order to ensure the Byzantine resilience of the aggregation at a normal agent, we introduce a step for filtering out larger losses. We analyze the approach for convex models and show that normal agents converge resiliently towards the global minimum. Further, aggregation with the proposed weight assignment rule always results in an improved expected regret than the non-cooperative case. Finally, we demonstrate the approach using three case studies, including regression and classification problems, and show that our method exhibits good empirical performance for non-convex models, such as convolutional neural networks.

Authored by Jiani Li, Waseem Abbas, and Xenofon Koutsoukos

publication

A game-theoretic approach for power systems defense against dynamic cyber-attacks

Technological advancements in today’s electrical grids give rise to new vulnerabilities and increase the potential attack surface for cyber-attacks that can severely affect the resilience of the grid. Cyber-attacks are increasing both in number as well as sophistication and these attacks can be strategically organized in chronological order (dynamic attacks), where they can be instantiated at different time instants. The chronological order of attacks enables us to uncover those attack combinations that can cause severe system damage but this concept remained unexplored due to the lack of dynamic attack models. Motivated by the idea, we consider a game-theoretic approach to design a new attacker-defender model for power systems. Here, the attacker can strategically identify the chronological order in which the critical substations and their protection assemblies can be attacked in order to maximize the overall system damage. However, the defender can intelligently identify the critical substations to protect such that the system damage can be minimized. We apply the developed algorithms to the IEEE-39 and 57 bus systems with finite attacker/defender budgets. Our results show the effectiveness of these models in improving the system resilience under dynamic attacks.

Authored by Saqib Hasan, Abhishek Dubey, Gabor Karsai, and Xenofon Koutsoukos

publication

A Human Study of Comprehension and Code Summarization

Software developers spend a great deal of time reading and understanding code that is poorly-documented, written by other developers, or developed using differing styles. During the past decade, researchers have investigated techniques for automatically documenting code to improve comprehensibility. In particular, recent advances in deep learning have led to sophisticated summary generation techniques that convert functions or methods to simple English strings that succinctly describe that code s behavior. However, automatic summarization techniques are assessed using internal metrics such as BLEU scores, which measure natural language properties in translational models, or ROUGE scores, which measure overlap with human-written text. Unfortunately, these metrics do not necessarily capture how machine-generated code summaries actually affect human comprehension or developer productivity.We conducted a human study involving both university students and professional developers (n = 45). Participants reviewed Java methods and summaries and answered established program comprehension questions. In addition, participants completed coding tasks given summaries as specifications. Critically, the experiment controlled the source of the summaries: for a given method, some participants were shown human-written text and some were shown machine-generated text.We found that participants performed significantly better (p = 0.029) using human-written summaries versus machine-generated summaries. However, we found no evidence to support that participants perceive human- and machine-generated summaries to have different qualities. In addition, participants performance showed no correlation with the BLEU and ROUGE scores often used to assess the quality of machine-generated summaries. These results suggest a need for revised metrics to assess and guide automatic summarization techniques.

Authored by Sean Stapleton, Yashmeet Gambhir, Alexander LeClair, Zachary Eberhart, Westley Weimer, Kevin Leach, and Yu Huang

publication

Neurological Divide: An FMRI Study of Prose and Code Writing

Software engineering involves writing new code or editing existing code. Recent efforts have investigated the neural processes associated with reading and comprehending code —- however, we lack a thorough understanding of the human cognitive processes underlying code writing. While prose reading and writing have been studied thoroughly, that same scrutiny has not been applied to code writing. In this paper, we leverage functional brain imaging to investigate neural representations of code writing in comparison to prose writing. We present the first human study in which participants wrote code and prose while undergoing a functional magnetic resonance imaging (fMRI) brain scan, making use of a full-sized fMRI-safe QWERTY keyboard.We find that code writing and prose writing are significantly dissimilar neural tasks. While prose writing entails significant left hemisphere activity associated with language, code writing involves more activations of the right hemisphere, including regions associated with attention control, working memory, planning and spatial cognition. These findings are unlike existing work in which code and prose comprehension were studied. By contrast, we present the first evidence suggesting that code and prose writing are quite dissimilar at the neural level.

Authored by Ryan Krueger, Yu Huang, Xinyu Liu, Tyler Santander, Westley Weimer, and Kevin Leach