The Arkansas NSF EPSCoR program is a multi-institutional, interdisciplinary, statewide grant program leveraging $24 million over 5 years to expand research, workforce development, and science, technology, engineering, and mathematics (STEM) educational outreach in Arkansas. The program is administered by the Arkansas Economic Development Commission (AEDC) Division of Science and Technology to maximize resources available to support the advancement of STEM in Arkansas.
The new Track 1 project, Data Analytics that are Robust and Trusted (DART), was awarded on July 1, 2020, and will fund five years of cutting-edge data science research and education around the state.
DART is designed to address three fundamental barriers to the practical application and acceptance of modern data analytics, learning, and prediction. Any one of these barriers could derail or impede their full development and contributions. The barriers are:
Before data streams and datasets can be used in the many kinds of learning models, they are often curated manually, or at the very least curated for a specific problem. We still rely on hosts of analysts to assess the content and quality of source data, engineer features, define data models, create ETL (Extract, Transform, and Load) flows, and track and document data processes and movement. The application of machine and deep learning is currently limited by the additional need for human annotation of the training data. We want to develop the means to curate heterogeneous, unstructured, and/or poorly structured data more automatically, and to augment manual training to improve supervised models.
Robust and trusted data analytics is more than organizing data and enhancing its training value. Government agencies and private entities collect and integrate large amounts of data, process it in real time, and deliver products or services based on these data to consumers and constituents. There is growing concern that both the acquisition of big data and the subsequent application of big data analytics are insecure. These activities risk causing privacy breaches, enabling discrimination, or negatively affecting diversity in our society. Through the research in this proposal, we explore techniques for securing the data by 1) protecting the privacy of contributors, 2) defining and characterizing its quality, and 3) ensuring that data analytics do not inject intentional (perhaps adversarial) or unintentional bias into subsequent models and decision support systems.
Finally, even after machine learning models are trained and validated, their predictions, while accurate on the test data, may or may not generalize beyond it. Many machine learning models trade interpretability for predictive power. Interpretability and the generality of trained models are critical in many decision-making systems and processes, especially when learning from multi-modal and heterogeneous big data sources. We need to balance the predictive power of complex machine learning models with the inferential strengths of statistical models, and to configure deep learning models so that humans can see the reasoning behind their predictions.
The goal of DART is to systematically investigate key aspects of these three barriers and to develop novel, integrated solutions to address them. Our team of researchers is organized into five broad areas of expertise: data curation and life cycle analysis, social awareness, social media and networks, learning and prediction, and research and development cyberinfrastructure. Each of these topical areas contributes to particular aspects of the three integrative areas: big data management and curation, security and privacy, and model interpretability.
DART will also establish a statewide data science educational ecosystem by defining a combination of model programs, degrees, pedagogy, and curriculum; providing resources and training for K-20 educators; providing educational opportunities inside and outside the classroom for K-20 students; and ensuring broad participation to strengthen the state's pipeline of workers skilled in data science.
DART Participating Institutions:
Support for the Arkansas EPSCoR Program is provided by the National Science Foundation's Research Infrastructure Improvement Award OIA-1946391.