PSU Data Science Lab

Let others know my work

Talks

Keynotes

IJCAI 2024 AI4TS Workshop: Learning Healthcare Foundation Models: From Pre-training to Fine-tuning [Link]
Foundation models have recently garnered significant attention due to their powerful capabilities across various tasks. In the medical domain, although some medical foundation models have been developed, their ability to handle diverse medical tasks remains limited. To address this gap, Dr. Ma's lab has developed a series of medical foundation models using pre-training and fine-tuning techniques, tailored to the unique characteristics of multi-sourced and multi-modal clinical data. In this talk, Dr. Ma will detail the development and capabilities of these medical foundation models.

Tutorials

SDM 2024: Heterogeneity in Federated Learning [Link]
Federated learning is a distributed machine learning paradigm that enables multiple participants to cooperate in training machine learning models without sharing data. Heterogeneity is one of the main challenges in federated learning. To address this challenge, in this tutorial, we will cover state-of-the-art federated learning techniques for handling heterogeneity. In particular, we focus on the following three aspects: (1) providing a comprehensive review of heterogeneity challenges in federated learning from three perspectives, namely data heterogeneity, model heterogeneity, and system heterogeneity; (2) introducing cutting-edge techniques to solve the heterogeneity issue in federated learning from both algorithm and application perspectives; and (3) identifying open challenges and proposing promising future research directions in heterogeneous federated learning. We believe this is an emerging and potentially high-impact topic in distributed machine learning, which will attract both researchers and practitioners from academia and industry.

KDD 2021: Advances in Mining Heterogeneous Healthcare Data [Link]
Thanks to the explosion of heterogeneous healthcare data and advanced machine learning and data mining techniques, specifically deep learning methods, we now have an opportunity to make a difference in healthcare. In this tutorial, we will present state-of-the-art deep learning methods and their real-world applications, specifically focusing on exploring the unique characteristics of different types of healthcare data. The first half will be spent on introducing recent advances in mining structured healthcare data, including computational phenotyping, early disease detection/risk prediction, and treatment recommendation. In the second half, we will focus on challenges specific to unstructured healthcare data, and introduce advanced deep learning methods in automated ICD coding, understandable medical language translation, clinical trial mining, and medical report generation. This tutorial is intended for students, engineers, and researchers who are interested in applying deep learning methods to healthcare, and prerequisite knowledge will be minimal. The tutorial will conclude with open problems and a Q&A session.

WSDM 2020: Learning with Small Data [Link]
In the era of big data, it is easy for us to collect a huge amount of image and text data. However, we frequently face real-world problems with only small (labeled) data in some domains, such as healthcare and urban computing. The challenge is how to make machine learning algorithms still work well with small data. To address this challenge, in this tutorial, we will cover state-of-the-art machine learning techniques for handling the small data issue. In particular, we focus on the following three aspects: (1) providing a comprehensive review of recent advances in exploring the power of knowledge transfer, especially focusing on meta-learning; (2) introducing cutting-edge techniques for incorporating human/expert knowledge into machine learning models; and (3) identifying the open challenges to data augmentation techniques, such as generative adversarial networks.

KDD 2019: Optimizing the Wisdom of the Crowd: Inference, Learning, and Teaching [Link]
The increasing need for labeled data has brought booming growth of crowdsourcing in a wide range of high-impact real-world applications, such as collaborative knowledge (e.g., data annotations, language translations), collective creativity (e.g., analogy mining, crowdfunding), and reverse Turing tests (e.g., CAPTCHA-like systems). In the context of supervised learning, crowdsourcing refers to the annotation procedure where the data items are outsourced and processed by a group of mostly unskilled online workers. Thus, researchers or organizations are able to collect a large amount of information via the feedback of the crowd in a short time and at a low cost.
Despite the wide adoption of crowdsourcing, several of its fundamental problems remain unsolved, especially at the information and cognitive levels with respect to incentive design, information aggregation, and heterogeneous learning. This tutorial aims to: (1) provide a comprehensive review of recent advances in exploring the power of crowdsourcing from the perspective of optimizing the wisdom of the crowd; and (2) identify the open challenges and provide insights into future trends in the context of human-in-the-loop learning. We believe this is an emerging and potentially high-impact topic in computational data science, which will attract both researchers and practitioners from academia and industry.

Selected Recent Invited Talks

Deep Predictive Models for Mining Electronic Health Records. (1) King Abdullah University of Science and Technology, Thuwal, Saudi Arabia, March 2019; (2) Pennsylvania State University, State College, PA, March 2019; (3) Case Western Reserve University, Cleveland, OH, February 2019; (4) Auburn University, Auburn, AL, February 2019; (5) Missouri University of Science and Technology, Rolla, MO, February 2019; (6) Nanyang Technological University, Singapore, January 2019; (7) Temple University, Philadelphia, PA, December 2018.

Truth Discovery from Multi-Sourced Data. Dalian University of Technology, Dalian, China, March 2017.

Truth Discovery for Crowdsourced Data Aggregation. PARC, a Xerox Company, Rochester, NY, USA, August 2016.

Selected Conference Presentations

"KAME: Knowledge-based Attention Model for Diagnosis Prediction in Healthcare", CIKM Conference Talk, Turin, Italy, October 2018.

"Risk Prediction on Electronic Healthcare Records with Prior Medical Knowledge", KDD Conference Talk, London, United Kingdom, August 2018.

"TextTruth: An Unsupervised Approach to Discover Trustworthy Information from Multi-Sourced Text Data", KDD Conference Talk, London, United Kingdom, August 2018.

"Deep Learning for Diagnosis Prediction in Healthcare", UB CSE Graduate Research Conference Poster, Buffalo, NY, USA, September 2017. (Best Poster Award)

"Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks", KDD Conference Poster, Halifax, NS, Canada, August 2017.

"Unsupervised Discovery of Drug Side-Effects From Heterogeneous Data Sources", KDD Conference Poster, Halifax, NS, Canada, August 2017.