Hoifung Poon’s Post


General Manager, Microsoft Health Futures

Excited to attend #CVPR 2024 tomorrow (Summit 324 14:20-15:00) and talk about the research frontier in multimodal #GenerativeAI for precision health: https://lnkd.in/gheD5Svn. The confluence of digital transformation and the GenAI revolution opens unprecedented opportunities for optimizing patient care and accelerating biomedical discovery, but challenges abound on the path forward. Generic frontier models such as GPT-4 are amazingly proficient in understanding biomedical text (e.g., MedPrompt https://lnkd.in/gmgWPiS3), but they exhibit major competency gaps in other modalities such as medical imaging and multi-omics. In this talk, I'll present some of what we have learned in bridging such competency gaps:

BiomedCLIP: The study of any modality typically involves natural language, which means that modality-text pairs are often abundantly available. We explore using publicly available data for vision-language pretraining (15 million PMC image-caption pairs): https://lnkd.in/ghF55fyU. Excited to see many subsequent amazing works drawing on other sources, e.g., Twitter (PLIP, James Zou), YouTube (QUILT, Linda Shapiro), PMC/textbook (CONCH, Faisal Mahmood).

LLaVA-Med: Standard contrastive learning treats all modalities equally. We instead explore using text as the interlingua modality and focus on learning an adapter for "translating" into the text semantic space, which is very data efficient. To train a multimodal GenAI copilot, we leverage GPT-4 to synthesize instruction-following data from available image-text pairs. LLaVA-Med also features a modular late-fusion design with plug-and-play modality-specific encoders/decoders, so it can easily scale to general use cases (e.g., combining X-ray, CT, MRI, digital pathology, multi-omics). As a PoC, LLaVA-Med 1.0 was trained using just a tiny fraction of BiomedCLIP data: https://lnkd.in/deNH4XkR. We have since substantially improved the model and just released LLaVA-Med 1.5: https://lnkd.in/gndEMbC9. We are also exploring specialization such as LLaVA-Rad https://lnkd.in/g7cCZqse. Excited to see many subsequent amazing works, e.g., MAIRA (Javier Alvarez Valle), PRISM (SIQI LIU, Kristen Severson), PathChat (Faisal Mahmood).

Multimodal generation: The LLaVA-Med framework can incorporate modality-specific decoders for generating multimodal output, e.g., BiomedJourney https://lnkd.in/gFnvqFTw, BiomedParse https://lnkd.in/gzFPe5aH (Mu Wei will present in the same workshop 10:40-11:20).

Whole-slide modeling: Digital pathology poses unique computational challenges due to the enormous size of whole-slide images. We propose the first whole-slide pathology foundation model, GigaPath, in our #Nature paper https://lnkd.in/gHjmT7We.

So much more remains to be done!
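
For readers new to vision-language pretraining, here is a minimal sketch of the symmetric contrastive objective commonly used in CLIP-style training on image-caption pairs. It is an illustration only, not the BiomedCLIP code: the function names, the toy embeddings, and the temperature of 0.07 are all assumptions made for this example, and a real setup would compute embeddings with trained image and text encoders.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Pair i (image i, caption i) is the positive; every other
    combination in the batch acts as a negative.
    The temperature value is an assumed default for illustration.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    # Similarity matrix: rows are images, columns are captions.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Transposing gives the caption-to-image direction.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


if __name__ == "__main__":
    # Toy 2-D "embeddings": correctly paired vs. deliberately mismatched.
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched = contrastive_loss(images, images)
    mismatched = contrastive_loss(images, images[::-1])
    print(f"matched pairs:    {matched:.4f}")
    print(f"mismatched pairs: {mismatched:.4f}")
```

The loss is low when each image embedding is closest to its own caption and high when the pairing is scrambled, which is the signal that pulls matching image-text pairs together during pretraining.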

Multimodal Generative AI: the Next Frontier in Precision Health - Microsoft Research



Hoifung Poon

General Manager, Microsoft Health Futures


We are fortunate to work with many amazing collaborators, such as Carlo Bifulco, Brian Piening, Sheng Wang, Muhao Chen, Jianfeng Gao, Tao Qin, Furu Wei, Mu Wei, Hany Awadalla, to name just a few. There are many other exciting works in health AI by amazing teams at Microsoft, e.g., Biomedical Imaging (Javier Alvarez Valle), Biomedical Signal Processing (Michael Hansen), Biomed ML (Nicolo Fusi), and more.

Please check out the whole workshop: Foundation Models for Medical Vision https://fmv-cvpr24workshop.github.io/#about. There is an amazing array of great speakers: Shekoofeh Azizi, Sharon Xiaolei Huang, Mu Wei, Faisal Mahmood, David Ouyang, MD. Thanks to Bo Wang and all for the invite and for organizing! Looking forward to seeing many old friends and meeting new ones. If you want to check out the brand new #Microsoft campus in Redmond, I'm happy to be the tour guide if schedules align :)

James Weinstein

Global VP Access/Equity Microsoft


Hoifung and his team are leading the way in AI-supported technologies while being ever mindful of the complexities associated with real-world medicine. Ecosystem transformation will only occur with technology substitution when integrated into systems that work to improve outcomes.

Oded Kalev

CTO and Co-Founder @ Converge Bio Some men see things as they are and ask, "Why?" I dream things that never were and ask, "Why not?"


Amazing results, congratulations!

Bridger Ammar, PhD 🌍🕊️

CEO @ higg.world, Affiliate Professor @ University of Washington, Investor - Opinions my own.


Looking forward to this! Thanks for the heads up Hoifung Poon.

Sarah Gordon

Operations Program Manager at Microsoft


Amazing! Love to see this!

Stephen Ibaraki

Global Chairman REDDS Capital, Microsoft 22 Global Awards (7 Awards, 2018-2025 in AI), Investor/Venture Capitalist, Futurist, Serial Entrepreneur, Founder & Chair Outreach UN ITU AI For Good, Author, 300+ recognitions


Hoifung Poon 💯🌎

Md Mostafijur Rahman

Doctoral Candidate at UT Austin | AI in Healthcare | NIH IRTA Fellow | Ex - Machine Learning Research Intern at Bosch Research, USA


Great work!
