III: Small: Collaborative Research: Summarizing Heterogeneous Crowdsourced & Web Streams Using Uncertain Concept Graphs
Ubiquitous access to mobile and web technologies enables the public to share valuable information about their surroundings anywhere and at any time. For example, during an emergency or crisis, people report needs from affected areas via social media as an alternative to traditional 911 calls. This information can be valuable to a range of emergency service officials. However, using this data poses several computational challenges because it is generated in real time and is heterogeneous, highly unstructured, redundant, and sometimes unreliable. This project innovates in two specific directions to alleviate the challenges associated with large, streaming datasets during emergencies: (1) The project investigates new summarization approaches to handle noisy, unstructured data streams from multiple web sources in real time while accounting for the possibility of untrustworthy information, so that they can be fed into decision support systems of public services in a structured and machine-readable format. (2) The project develops and validates robust decision support systems for allocating critical resources to areas in need based on the structured summary reports. The evaluation plan includes collaboration with emergency responders and the communities they serve. The broader impacts of this research include the design of a generic methodology to extract, integrate, and summarize structured information from big data streams on the web to help the public services of future smart cities. The research team plans to share simulated datasets with an open-source system for real-time decision support during emergency response exercises. This can assist in workforce training and also help design novel educational projects in data science for social good.
Formally, this research project investigates the theories behind a novel knowledge representation called the Uncertain Concept Graph. The graph contains heterogeneous nodes based on key concepts of an application domain (e.g., regions, incidents, and information sources during a disaster). The graph has heterogeneous edges connecting these concept nodes, based on the inference of concept relationships using the information extracted from data streams (e.g., Twitter and news sources). The structure of the graph evolves over time, and both nodes and edges can be added, deleted, or updated. An equivalent Bayesian Network is derived from the Uncertain Concept Graph, describing the dependencies between the events captured in the graph at a given time instance. Based on the relationship edges in a graph state and the constructed Bayesian Network, an action recommendation system is created to support an application domain task (e.g., dispatching ambulance resources to incident-specific regions). To ensure robustness, this project develops and validates a novel anomaly identification and diagnosis approach using mode similarity to assess the correctness of the current state of concept nodes and their relationships in the Uncertain Concept Graph at any time. The research team uses historical datasets of recent disasters to construct the graph and develop a demo system for domain evaluation, in order to recommend emergency response actions for city emergency services. The investigators are including the lessons learned and methodologies developed in their respective course curricula.
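To make the graph description above more concrete, the following is a minimal illustrative sketch in Python of a graph with typed concept nodes and edges that can be added, updated, or deleted over time. It is not the project's implementation. The class names, the relation labels, and the per-edge `confidence` value in [0, 1] are all assumptions introduced here for illustration, and the sketch does not attempt the Bayesian Network derivation or the anomaly diagnosis step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class ConceptNode:
    kind: str  # hypothetical node type, e.g. "region", "incident", "source"
    name: str


@dataclass
class UncertainEdge:
    relation: str      # hypothetical relation label, e.g. "reported_in"
    confidence: float  # assumed belief in [0, 1] that the relation holds
    updated_at: float  # time of the latest supporting observation


@dataclass
class UncertainConceptGraph:
    nodes: Set[ConceptNode] = field(default_factory=set)
    edges: Dict[Tuple[ConceptNode, ConceptNode], UncertainEdge] = field(
        default_factory=dict
    )

    def upsert_edge(self, src, dst, relation, confidence, timestamp):
        """Add a new edge, or overwrite the existing edge between src and dst."""
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        self.nodes.update((src, dst))
        self.edges[(src, dst)] = UncertainEdge(relation, confidence, timestamp)

    def remove_node(self, node):
        """Delete a node together with every edge that touches it."""
        self.nodes.discard(node)
        self.edges = {
            pair: edge for pair, edge in self.edges.items() if node not in pair
        }

    def neighbors(self, node, min_confidence=0.0) -> List[ConceptNode]:
        """Return targets of outgoing edges at or above a confidence threshold."""
        return [
            dst
            for (src, dst), edge in self.edges.items()
            if src == node and edge.confidence >= min_confidence
        ]


# Usage with made-up example data.
graph = UncertainConceptGraph()
flood = ConceptNode("incident", "flood")
downtown = ConceptNode("region", "downtown")
stream = ConceptNode("source", "social-media-stream")

graph.upsert_edge(stream, flood, "mentions", 0.9, timestamp=1.0)
graph.upsert_edge(flood, downtown, "reported_in", 0.6, timestamp=1.0)
graph.upsert_edge(flood, downtown, "reported_in", 0.8, timestamp=2.0)  # update
graph.remove_node(stream)  # deletion also drops the "mentions" edge
```

Keying edges by node pair keeps updates simple (a later observation replaces the earlier one), at the cost of allowing only one relation per ordered pair; a fuller design would likely key on the relation as well.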