“High Integrity Data Engineering and Knowledge Extraction for Human-Augmented Science”
Dr. Eric Davis
Thursday, Jan. 31, 11:45 am
Complex system analysis currently requires teams of domain experts, data scientists, mathematicians, and software engineers to support the entire life cycle of model-based inference. The models that result from this process are often bespoke, lack generalizability, are not performable, and make it difficult for practitioners to synthesize actionable knowledge and policies from their raw outputs. In addition, our growing reliance on automated reasoning, machine learning, and machine-aided decision making has led to serious vulnerabilities in the area of data integrity. The trustworthy and reliable operation of next-generation data-driven systems, and of the infrastructure that manages their data, will require effective and scalable solutions to the growing threat of data-integrity faults, especially when using patient-reported data or crowdsourced observations. In this talk, we will discuss current efforts from our laboratory to improve and formalize these processes and to establish more rigorous methods that can be more easily validated and verified. We will present AMIDOL, our novel system that aims to reduce the overhead associated with the entire model life cycle, enabling experts to more easily build, maintain, and reason over models of complex systems and to respond rapidly to emerging crises. AMIDOL leverages visual domain-specific languages and a rich intermediate representation that derives formal executable semantics from the semi-formal diagrams scientists normally construct. We also present IAIDO, a type system of Integrity-Aware Data Objects that uses the concepts of polymorphism, subsumption, composition, association, and aggregation to build a system of inheritance that improves data integrity for large-scale data sets with shared provenance, representations, and types.
We demonstrate these methods on data from our partners at the CDC, addressing problems in epidemic outbreak and disease management and helping to coordinate policies and responses to these crises in real time.
Dr. Eric Davis is a Principal Scientist at an Industrial Lab, where he is PI on over $10 million of externally funded research in the areas of data-driven modeling, data engineering, data science, machine learning, and artificial intelligence. His research focuses on issues surrounding complex systems, data integrity, and highly interdependent systems, especially those with applications for social good. Before moving to industry, Eric was an Assistant Professor at the University of Miami and Iowa State University, where he was director of the Trustworthy Data Engineering Laboratory and founder of the Fortinet Cybersecurity Laboratory. He has twice served as an Eric and Wendy Schmidt Data Science for Social Good Summer Faculty Fellow at the University of Chicago, has been named a Frontiers of Engineering Education Faculty Member by the National Academy of Engineering, and served as an IBM Doctoral Fellow while completing his PhD at the University of Illinois at Urbana-Champaign.