PSU Data Science Lab

NSF IIS -- CAREER: Automated Multimodal Learning for Healthcare, 2023-2028, $555,752

[Project Page] Multimodal learning is one of the central tasks of artificial intelligence (AI), which aims to effectively fuse and model multimodal data to gain a better understanding of the world around us. Many multimodal fusion strategies have been proposed, ranging from manually designed policies to advanced automated machine learning (AutoML)-based approaches. Although AutoML-based solutions outperform handcrafted ones, they are still far from optimal because their model designs generalize poorly and they fail to account for the unique characteristics of multimodal data. This project takes the multimodal healthcare predictive modeling task as a representative example, aiming to discover and identify the optimal way to fuse multimodal data via a new learning paradigm, i.e., automated multimodal learning, with minimal human intervention. The success of this project will yield new fundamental knowledge in various fields, including automated machine learning, multimodal deep learning, and healthcare predictive modeling. The new automated multimodal learning paradigm will revolutionize multimodal data mining by automatically searching for new and complex yet optimal fusion strategies from the data, potentially motivating researchers and domain experts to understand the multimodal data better. In addition, recognizing the research challenges posed by the unique nature of multimodal data in the healthcare domain and providing customized solutions will advance the research of healthcare predictive modeling significantly.

To meet these goals, the investigator proposes to equip automated multimodal learning with the ability to handle the unique challenges of multimodal health data, including varying data sizes, noise, and missing modalities. The investigator also proposes to validate the proposed research on different multimodal fusion tasks in healthcare informatics and beyond, and to gather feedback from experts to refine it. The results of this project will provide a needed paradigm shift toward automated multimodal data fusion, impacting a broad range of research fields, including machine learning, data mining, and healthcare informatics. The proposed research will also make an enduring contribution to multimodal predictive modeling in clinical practice and other domains. The generated data, source code, and software tools will be made available to researchers worldwide. The open platform will expedite research, enhance global collaborations in this field, and provide longstanding value for academia, healthcare organizations, and health industries. The proposed education plan will help to ensure that graduates are well equipped to design and evaluate machine learning solutions and will cultivate K-12 students' interest in computer science and informatics. It will also lead to a more diverse population of undergraduate research assistants and enhance collaboration and networking among graduate students.

NIH R01 -- SCH: AI-Enhanced Multimodal Sensor-on-a-chip for Alzheimer's Disease Detection, 2022-2026, $1.2M

[Project Page] We propose a new research paradigm aimed at addressing scientific questions in both biosensing and machine learning for the early prediction of Alzheimer's disease (AD), and at solving a grand challenge in the identification of minimally invasive AD biomarkers in tear, saliva, and blood. Our goal is to develop a novel and minimally invasive system that integrates a multimodal biosensing platform and a machine learning framework, which work together to significantly enhance the detection accuracy. The program will pioneer a novel Multimodal Optical, Mechanical, Electrochemical Nano-sensor with Two-dimensional material Amplification (MOMENTA) platform for sensitive and selective detection of AD biomarkers. The sensor outputs are used for training the new Hierarchical Multimodal Machine Learning (HMML) framework, which not only automatically integrates the heterogeneous data from different modalities but also ranks the importance of different biosensors and biomarkers for AD prediction. Moreover, the framework is able to identify potential new biomarkers based on a statistical analysis of the learned weights on the input signals and provide feedback information to further improve the MOMENTA platform design. This interdisciplinary research brings together materials scientists who create new two-dimensional (2D) material platforms for sensor enhancement, nanotechnology and device experts who advance chip-scale sensor platforms, data scientists who analyze data with machine learning methods to target early prediction of AD, and AD experts who help to identify potentially new AD biomarkers. The machine-learning-enhanced multimodal sensor system will not only offer a major performance boost over the state of the art, but also yield critical insights on new biomarker discovery for AD diagnosis at an early stage.
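As a rough illustration of ranking biosensors from learned weights, the sketch below scores each sensing modality by the mean absolute value of its weights and sorts the modalities by that score. This is not the HMML framework itself: the scoring rule, the function name `rank_modalities`, and the weight values are all assumptions made for the example.

```python
from statistics import mean


def rank_modalities(weights):
    """Return modality names ordered from most to least important.

    Importance here is the mean absolute learned weight per modality,
    an assumed stand-in for the statistical analysis in the abstract.
    """
    scores = {name: mean(abs(w) for w in ws) for name, ws in weights.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical learned input weights, one list per sensing modality.
example_weights = {
    "optical": [0.8, -0.6, 0.7],
    "mechanical": [0.1, -0.2, 0.05],
    "electrochemical": [0.4, 0.5, -0.3],
}

ranking = rank_modalities(example_weights)
print(ranking)
```

With these made-up weights the optical channel ranks first and the mechanical channel last; in practice the scores would come from a trained model and would need a significance test before being read as biomarker evidence.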

NSF SaTC -- Understanding and Mitigating the Security Risks of AutoML, 2022-2024, $500K

Automated machine learning (AutoML) represents a new machine learning paradigm that automates the pipeline from raw data to deployable models, enabling a much wider range of people to use machine learning techniques. However, each stage of this pipeline is subject to malicious attacks, which can lead to inaccurate or vulnerable models. This project aims to understand how both the technologies underlying AutoML and the ways it is adopted change the security risks around machine learning, and how possible defenses against those risks change when AutoML is used. The success of this project will not only improve the security of AutoML but also promote more principled practices of building and operating machine learning systems in general, while contributing to knowledge in the areas of security, machine learning, and human-computer interaction.

The project has three main sub-goals: accounting for the full spectrum of security risks that arise around AutoML; understanding the fundamental factors that drive such risks; and designing for machine learning practitioners without extensive expertise. To accomplish these goals, the team will (i) better understand current practices around AutoML through user studies and interviews; (ii) empirically and analytically explore the security vulnerabilities of AutoML-generated models by assessing these models on widely used datasets; (iii) analyze the results of the first two activities to develop a comprehensive accounting of underlying factors such as standardization of algorithmic choices in the technology or over-reliance on automated metrics by users; and (iv) develop new principles, methodologies, and tools to mitigate the aforementioned risks. The team will also integrate the work into a number of college courses and conduct public outreach to raise awareness of the role machine learning plays in everyday life.

Sony Research Award -- Federated Semi-Supervised Learning: Exploiting the Power of Unlabeled Data, 2022-2023, $100K

We propose a novel, adaptive, and general framework for federated semi-supervised learning, a new research paradigm that aims to exploit the power of unlabeled data ignored by existing federated supervised learning. Introducing unlabeled data to federated learning brings several new challenges, which limit the applicability of existing models. In this proposal, we define the federated semi-supervised learning problem from the perspective of data regularization and analyze the newly raised difficulties. In particular, in Task 1 we propose a novel learning framework, named FedSemi, that introduces the consistency regularization technique into federated learning using a teacher-student model. In Task 2, we further propose a new metric to measure the divergence of local model layers. Using the divergence, FedSemi can automatically select layer-level parameters to be uploaded to the server in an adaptive manner. In Task 3, we propose an element-wise aggregation approach to reduce the influence of noisy gradients on the global model aggregation. Task 4 aims to evaluate the proposed framework on three different real-world applications: image classification, sentiment analysis, and health prediction. This project creates new fundamental knowledge in both the federated learning and semi-supervised learning fields and will significantly advance them by developing novel methods and implementing analytic tools.
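The three technical ingredients named in Tasks 1 to 3 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the FedSemi implementation: the abstract does not define the divergence metric, the selection rule, or the aggregation rule, so the sketch substitutes an exponential-moving-average teacher, a mean-squared consistency loss, Euclidean distance as the layer divergence, a fixed threshold for layer selection, and a coordinate-wise median for element-wise aggregation. All function names are hypothetical.

```python
from statistics import median


def ema_update(teacher, student, decay=0.99):
    """Task 1 (assumed form): teacher weights follow an exponential
    moving average of the student weights."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Task 1 (assumed form): mean squared difference between student
    and teacher predictions on one unlabeled sample."""
    diffs = [(s - t) ** 2 for s, t in zip(student_probs, teacher_probs)]
    return sum(diffs) / len(diffs)


def layer_divergence(local_layer, global_layer):
    """Task 2 (stand-in metric): Euclidean distance between the local
    and global copies of one flattened layer."""
    return sum((a - b) ** 2 for a, b in zip(local_layer, global_layer)) ** 0.5


def select_layers(local_model, global_model, threshold):
    """Task 2 (assumed rule): upload only layers whose divergence from
    the global model exceeds the threshold."""
    return [
        name
        for name in local_model
        if layer_divergence(local_model[name], global_model[name]) > threshold
    ]


def elementwise_median(client_layers):
    """Task 3 (stand-in rule): aggregate one layer across clients by
    taking the median of each parameter independently."""
    return [median(values) for values in zip(*client_layers)]


# Hypothetical two-parameter layer reported by three clients; the third
# client is noisy and is ignored by the coordinate-wise median.
aggregated = elementwise_median([[1.0, 2.0], [1.2, 2.1], [100.0, -50.0]])
print(aggregated)
```

The median is used here only because it is a simple, well-known way to damp outlying updates one parameter at a time; whether high-divergence or low-divergence layers should be uploaded is likewise a choice made for the example.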
