[05.13] Elite Forum | Real-World Clinical Applications of Medical Foundation Models
Title: Closing the GAP between medical foundation models and real-world clinics: unmatched patient profiles, privacy, and GPU constraints
Speaker: Sheng Wang, Assistant Professor, University of Washington
Date & Time: Monday, May 13, 2024, 19:00-21:00
Location: Basement Lecture Hall, Peking University Innovation and Entrepreneurship Center
Host: Professor Ming Zhang
Organizer: Institute of Data Science and Engineering, School of Computer Science, Peking University
Abstract
Medical foundation models have achieved state-of-the-art performance on a variety of biomedical applications, driving a trend toward building even larger models trained on more medical datasets. Despite their encouraging performance on artificial biomedical benchmarks, critical gaps must still be filled before these models can be used in real-world clinics. In this talk, I will introduce three such gaps: unmatched patient information, privacy, and GPU constraints. First, I will introduce BioTranslator, a multilingual translation framework that projects a variety of biomedical modalities into the text space, allowing comparison between patients with unmatched profiles. Next, I will introduce BiomedCLIP, a public medical foundation model trained on 15 million public text-image pairs. I will illustrate how BiomedCLIP, as a public model, can be used as a proxy for clinicians to query large language models on the cloud without exposing their private data. Finally, I will introduce LLaVA-Rad, a 7B-parameter model that outperforms Med-PaLM M (84B) in radiology by exploiting the trade-off between domain specificity and model size, demonstrating the possibility of building small models for efficient fine-tuning and inference in clinics. I will conclude the talk with a vision of "everything everywhere all at once", where medical foundation models and generative AI benefit every patient in every clinic all at once.
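The abstract's core idea of projecting heterogeneous biomedical modalities into a shared text space means that once each modality has its own encoder into that space, patients with unmatched profiles can be compared by simple vector similarity. The toy embeddings and encoder roles below are hypothetical stand-ins for illustration, not BioTranslator's actual architecture or weights:

```python
import math

def cosine_similarity(u, v):
    # Cosine similarity between two embedding vectors in the shared space.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings: each modality-specific encoder (e.g., an image
# encoder for pathology slides, a genomics encoder for expression profiles)
# maps its input into the SAME text-embedding space, so two patients with
# entirely different data types become directly comparable.
patient_a_image_emb = [0.9, 0.1, 0.3]    # from an image encoder (toy values)
patient_b_genomic_emb = [0.8, 0.2, 0.4]  # from a genomics encoder (toy values)

score = cosine_similarity(patient_a_image_emb, patient_b_genomic_emb)
print(f"cross-modal similarity: {score:.3f}")
```

In a CLIP-style model such as BiomedCLIP, the same mechanism aligns image and text embeddings; here it only illustrates why a shared space removes the need for matched patient profiles.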
Speaker Bio
Sheng Wang is an assistant professor in the School of Computer Science and Engineering at the University of Washington, Seattle. He obtained his B.S. degree in Computer Science from Peking University and his Ph.D. degree in Computer Science from the University of Illinois Urbana-Champaign, and conducted postdoctoral training at the Stanford School of Medicine. Sheng is currently interested in developing large-scale models for biomedical applications, with a focus on digital pathology, chromatin structure prediction, and genomics-based drug discovery. His research has been published in top venues such as Nature, Science, Nature Biotechnology, Nature Machine Intelligence, and The Lancet Oncology, and is used by major biomedical institutes, including the Chan Zuckerberg Biohub, Mayo Clinic, UW Medicine, the NIH 4D Nucleome Center, and the National Center for Advancing Translational Sciences.