Speakers & Presentations

Scaling Insights in Cancer Care with Machine Learning

Will Shapiro, Data Science and Machine Learning at Flatiron Health

Will Shapiro currently leads Data Insights Engineering at Flatiron, which encompasses data science, machine learning, artificial intelligence and product analytics. He became fascinated by machine learning while building personalization engines at Spotify, where he led AI teams focused on foundational research. He is a prolific inventor, with 11 (and counting) granted patents in AI, ML and personalization, as well as an author of a pivotal study that used listening behaviors to predict personality type. After experiencing firsthand the profound difference between biomarker-targeted cancer therapy and traditional chemotherapy, Will became passionate about the potential of personalized medicine and, in particular, about ensuring that the future of medicine is personalized for everyone, not just the targets of traditional clinical trials.

Abstract

For the past 6 years, the ML team at Flatiron Health has focused primarily on extracting real-world data (RWD) from unstructured documents, dramatically increasing the number of patients we’re able to learn from. In this talk I will give an overview of some of the approaches we’ve taken to ML-extraction historically, as well as recent experiments with LLMs and foundation models. I’ll also provide a brief overview of recent applications of ML and LLMs at the Point of Care and in the context of clinical research.

Advancing Clinical Development through Real-World Data, AI and Machine Learning

Harry Yang, Vice President of Biometrics at Recursion Pharmaceuticals

Harry Yang, Ph.D., is Vice President of Biometrics at Recursion Pharmaceuticals, following positions as VP of Biometrics at Fate Therapeutics and Head of Statistical Science at AstraZeneca and MedImmune. With 26 years of experience in small molecule, biologics, and cellular immunotherapy development, his expertise spans the therapeutic areas of transplantation, vaccines, autoimmune & inflammatory disease, oncology, and rare disease. Dr. Yang is well-versed in innovative trial design, regulatory submissions, real-world data utilization, and the integration of AI and machine learning in drug R&D. He is a prolific author, having published 8 books and over 130 articles and book chapters covering diverse statistical, scientific, and regulatory topics in drug R&D. Additionally, he serves as the Vice Chair of the USP Statistics Expert Committee.

Abstract

In the ever-evolving landscape of biomedical research, the confluence of recent breakthroughs in real-world data (RWD), artificial intelligence (AI), and machine learning has paved the way for unlocking the untapped potential residing within the vast reservoirs of historical and ongoing clinical trial data. This presentation delves into the transformative power of these advancements, showcasing two use cases that illuminate the pivotal role of RWD and machine learning models in shaping and enhancing clinical development strategies, specifically within the area of immuno-oncology products.

Active and Federated Learning in Drug Discovery and Development

Christopher James Langmead, Scientific Executive Director at Amgen

Dr. Langmead leads Amgen's efforts in the development and application of AI/ML-based methods for the discovery and optimization of biologics. His team at Amgen is involved with all stages of the pipeline. Prior to joining Amgen, Dr. Langmead was a tenured faculty member in the School of Computer Science at Carnegie Mellon University where his research concerned the development of Generative AI methods for the design of proteins, and algorithms for automatic scientific discovery and sequential optimization.

Abstract

This presentation will discuss Amgen's use of Active and Federated Learning to enhance its Generative Biology platform for engineering biologics.

Artificial Intelligence in Regulatory Decision-Making for Drug and Biological Products

Tala H. Fakhouri, Associate Director for Policy Analysis, Office of Medical Policy, Center for Drug Evaluation and Research, U.S. FDA

Tala H. Fakhouri, PhD, MPH, is the Associate Director for Policy Analysis in the Office of Medical Policy Initiatives (OMPI), Center for Drug Evaluation and Research (CDER), Food and Drug Administration (FDA). Dr. Fakhouri manages a team tasked with developing, coordinating, and implementing medical policy with a focus on the use of Artificial Intelligence (AI) and Machine Learning (ML) in drug development. These efforts include overseeing an AI policy group, as well as engaging external stakeholders and advancing the development of regulatory science around the use of AI in drug development. Recently, she led the development and publication of a discussion paper titled “Using Artificial Intelligence and Machine Learning in the Development of Drug and Biological Products”. She also contributes to the development of medical policy related to real-world evidence (RWE) and the use of digital health technologies for medical product development. In 2023, Dr. Fakhouri was selected by the Office of Management and Budget (OMB) to serve on the Federal Committee for Statistical Methodology (FCSM) for her expertise in statistical methods.

Prior to joining FDA, Dr. Fakhouri served as a Senior Health Scientist and Chief Statistician for the CDC’s flagship population survey, the National Health and Nutrition Examination Survey (NHANES). The NHANES program, which is conducted by the CDC’s National Center for Health Statistics (NCHS), is recognized as the premier source of nationally representative data on the health of the nation. In her role at NHANES, Dr. Fakhouri advised and provided guidance on all epidemiologic, statistical, and methodological issues related to NHANES within CDC and with external stakeholders, with special emphasis on issues related to selection bias, data linkage, and data quality. She was also responsible for designing strategies to increase survey cooperation rates, including changes to recruitment protocol and procedures, changes to survey sampling and design, and the development of targeted outreach materials to increase survey representativeness. In addition, Dr. Fakhouri served on the NCHS Disclosure Review Board and the Cancer Moonshot Data Science Workgroup, and co-led the FCSM Nonresponse Bias Subcommittee, which was tasked with identifying gaps in issues related to survey nonresponse and selection bias, and with providing recommendations to OMB on Federal survey standards and guidelines. Prior to joining NHANES, she served as an Epidemic Intelligence Service Officer with the CDC, and deputy lead for health surveys at ICF-Macro International. Dr. Fakhouri has published over 30 government reports, peer-reviewed papers, and book chapters on chronic disease epidemiology and on methodological issues related to nonresponse bias and data quality. Dr. Fakhouri earned a Ph.D. in Oncological Sciences from The Huntsman Cancer Institute at the University of Utah and an MPH in Epidemiologic and Biostatistical Methods from the Johns Hopkins University School of Public Health, completed a postdoctoral fellowship in molecular biology and genetics at Harvard University, and holds a BSc in Medical Technology from the Jordan University of Science and Technology.

Abstract

TBD

Multimodal Integration in the Age of Million Cells and Billion Parameters

Himel Mallick, Assistant Professor in the Department of Population Health Sciences, Weill Cornell Medicine, Cornell University

Himel Mallick is a data scientist and computational biologist with almost two decades of experience in Statistics, Biostatistics, and AI/ML in academia and industry. His methodological research interests are in Bayesian statistics, machine learning, and omics data science methods including multi-omics integration, microbiome, single-cell, spatial transcriptomics, imaging, and digital pathology. He has a highly cited publication track record with over 40 publications in top-tier scientific journals including Nature and Lancet as well as top health science journals such as Statistics in Medicine and PLoS Computational Biology. He is the lead developer of several popular Bioconductor packages including MaAsLin2. He is a recipient of the IISA Early Career Award in Statistics and Data Sciences (ECASDS), a Fellow of the American Statistical Association (FASA), and an Elected Member of the International Statistical Institute.

Abstract

Research in machine learning and data science is increasingly entering the realm of staggeringly large multiview data collections, that is, concurrent measurements (views) collected on the same subjects from multiple sources. Fueled by an explosion in recent high-throughput and AI technologies, we are now ready to enter the world of personalized medicine and individualized solutions, where clinical or other non-therapeutic interventions can be custom-tailored to individuals to achieve better outcomes based on their multiview profiles. Although analyses of such multimodal datasets have the potential to provide new insights into the underlying mechanistic processes that cannot be inferred with a single modality, the integration of very large, complex, multimodal data represents a considerable statistical and computational challenge. An understanding of the principles of data integration and visualization methods is thus necessary to determine which methods are best applied to a particular integration problem. In this talk, I will discuss open challenges in multimodal integration, including methodological issues that must be resolved to establish the resources needed to move beyond incremental advances toward translational intervention while keeping machine learning and data science at the forefront of the next generation of multiview research.

Mapping the Mind: Modeling Brain Connectivity and Its Link to Behavior

Yize Zhao, Associate Professor in the Department of Biostatistics, Yale School of Public Health, Yale University

Dr. Zhao is an Associate Professor in the Department of Biostatistics at Yale School of Public Health. She is also affiliated with Yale Center for Analytical Sciences, Yale Alzheimer's Disease Research Center, Yale Wu Tsai Institute, Yale Center for Brain and Mind Health and Yale Computational Biology and Bioinformatics. Her main research focuses on the development of statistical and machine learning methods to analyze large-scale complex data (imaging, -omics, EHRs), Bayesian methods, feature selection, predictive modeling, data integration, missing data and network analysis. She has strong interests in biomedical research areas including mental health, mental disorders and aging. Her most recent research agenda includes analytical method development and applications in brain network analyses, multimodal imaging modeling, imaging genetics, and the integration of biomedical data with EHR data. Her research is supported by multiple NIH grants.

Abstract

Brain functional connectivity, or the connectome, a unique measure of brain functional organization, offers great potential to explain the neurobiological underpinnings of behavioral profiles. In the first project, we study the complex impact of multi-state functional connectivity on behaviors by analyzing the data from a recent landmark brain development and child health study. We propose a nonparametric, Bayesian supervised heterogeneity analysis to uncover neurodevelopmental subtypes with distinct effect mechanisms. We impose stochastic block structures to identify network-based functional phenotypes and develop a variational expectation–maximization algorithm to facilitate an efficient posterior computation. Through integrating resting-state and task-related functional connectomes, we dissect heterogeneous effect mechanisms on children’s fluid intelligence from the functional network phenotypes including Fronto-parietal Network and Default Mode Network under different cognitive states. Based on extensive simulations, we further confirm the superior performance of our method in uncovering brain-to-behavior relationships. In the second project, we further incorporate the disease outcome and propose a mediation analysis to explore the effect mechanism among genetic exposure, structural connectivity and time to disease onset. Extensive simulations confirm the superiority of our methods compared with existing alternatives. By applying the methods to landmark brain connectivity and Alzheimer’s disease studies, we obtain biologically plausible insights.

iPIPE: Bayesian Supervised Learning under Monotonicity with Applications in mHealth and Cancer Screening

Ying Kuen (Ken) Cheung, Professor of Biostatistics in the Mailman School of Public Health, Columbia University

Ying Kuen (Ken) Cheung, PhD, is Professor of Biostatistics in the Mailman School of Public Health at Columbia University. He has general interests in the development and evaluation of evidence-based treatments, interventions and policies at all phases of translational research. He is an expert in adaptive designs in clinical trials of treatments for cancer, stroke, neurological disorders, cardiovascular diseases, and mental health, SMART designs for behavioral intervention technologies, N-of-1 personalized trials, implementation study designs, and the analysis of high dimensional behavioral data. An overarching goal of his research and professional activities is to advance precision medicine and digital health (e.g., mobile health apps, Internet of Things) using data science and biostatistical methods. He is a recipient of the IBM Faculty Award on Big Data and Analytics. He is a Fellow of the American Statistical Association, and a Fellow of the New York Academy of Medicine.

Abstract

In this talk, I will introduce a new Bayesian learning method, called iPIPE, for applications where monotonicity holds. Briefly, we formulate the estimation of a monotone response surface of multiple factors as the inverse of an iteration of partially ordered classifier ensembles. Each ensemble (called PIPE-classifiers) is a projection of Bayes classifiers on the constrained space. We prove that the inverse of PIPE-classifiers (iPIPE) exists, and propose algorithms to efficiently compute iPIPE by reducing the space over which optimization is conducted. The methods are applied in analysis and simulation settings where the surface dimension is higher than what the isotonic regression literature typically considers. Simulations show that iPIPE-based credible intervals achieve nominal coverage probability and are more precise compared to unconstrained estimation.

Clinical Drug Developments: Challenges and Data Science Applications

Dacheng Liu, Highly Distinguished Therapeutic Area and Methodology Statistician, Boehringer Ingelheim

Dacheng Liu serves as the Highly Distinguished Therapeutic Area and Methodology Statistician at Boehringer Ingelheim with 18 years of experience in the pharmaceutical industry. In this role, he provides leadership in driving statistical quality and fostering innovation across companywide clinical development programs. As the chair of the statistical strategy and review committee, he is instrumental in shaping the organization’s statistical practices. Dacheng represents Boehringer Ingelheim in industry-wide groups, such as the PhRMA clinical development working group, and leads collaborations with partners in the US from both industry and academia. Prior to his current role, Dacheng served as the Global Head of Clinical Data Sciences, and the US Head of Statistics, leading both US and global teams in clinical drug development for the company pipeline. He has extensive experience leading early and late-phase projects in multiple disease areas, including landmark studies, regulatory submissions, and FDA advisory committee meetings. He played a key role in harmonizing SOP processes and standardizing statistical methodologies within Boehringer Ingelheim. Dacheng has over 40 publications in areas of clinical research, trial design, statistical methodologies, and machine learning.

Abstract

Drug development is a highly complex process involving substantial investment and long cycle times in a complex regulatory and commercial environment. Pharma R&D has faced major challenges in the past decades, including substantial increases in drug development costs, high failure rates, strong economic pressures and regulatory hurdles. With the advancement of AI and machine learning, coupled with the availability of multiple sources of data including clinical trial and real-world data, data science has the potential to improve the productivity of Pharma R&D, particularly in evidence generation for clinical drug development. In this talk we will discuss some challenges of clinical drug development and provide examples of data science applications, such as treatment compliance, disease modeling, and patient screening.