Excited to attend #CVPR 2024 tomorrow (Summit 324, 14:20-15:00) and talk about the research frontier in multimodal #GenerativeAI for precision health: https://lnkd.in/gheD5Svn. The confluence of digital transformation and the GenAI revolution opens unprecedented opportunities for optimizing patient care and accelerating biomedical discovery, but challenges abound on the path forward. Generic frontier models such as GPT-4 are amazingly proficient at understanding biomedical text (e.g., MedPrompt https://lnkd.in/gmgWPiS3), but they exhibit major competency gaps in other modalities such as medical imaging and multi-omics. In this talk, I'll present some of our learnings in bridging such competency gaps:

BiomedCLIP: The study of any modality typically involves natural language, which means that modality-text pairs are often abundantly available. We explore using publicly available data for vision-language pretraining (15 million PMC image-caption pairs): https://lnkd.in/ghF55fyU. Excited to see many subsequent amazing works, e.g., Twitter (PLIP, James Zou), YouTube (QUILT, Linda Shapiro), PMC/textbook (CONCH, Faisal Mahmood).

LLaVA-Med: Standard contrastive learning treats all modalities equally. We instead explore using text as the interlingua modality and focus on learning an adapter that "translates" into the text semantic space, which is very data-efficient. To train a multimodal GenAI copilot, we leverage GPT-4 to synthesize instruction-following data from available image-text pairs. LLaVA-Med also features a modular, late-fusion design with plug-and-play modality-specific encoders/decoders, so it can easily scale to general use cases (e.g., combining X-ray, CT, MRI, digital pathology, multi-omics). As a proof of concept, LLaVA-Med 1.0 was trained on just a tiny fraction of the BiomedCLIP data: https://lnkd.in/deNH4XkR. We have since substantially improved the model and just released LLaVA-Med 1.5: https://lnkd.in/gndEMbC9.
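For readers curious what the contrastive pretraining behind CLIP-style models like BiomedCLIP looks like, here is a minimal numpy sketch of the symmetric InfoNCE objective over a batch of image-caption pairs. This is an illustrative toy, not the actual BiomedCLIP implementation; the embedding sizes and temperature are assumptions.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of image-caption pairs.

    img_emb, txt_emb: (batch, dim) arrays; row i of each is a matched pair.
    Illustrative sketch of CLIP-style pretraining, not BiomedCLIP itself.
    """
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix

    # Matched pairs lie on the diagonal; each row (image->text) and each
    # column (text->image) is a softmax classification over the batch.
    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Training pushes matched image-caption embeddings together and mismatched ones apart, which is what makes abundant modality-text pairs such a useful supervision signal.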
We are also exploring specialization, such as LLaVA-Rad https://lnkd.in/g7cCZqse. Excited to see many subsequent amazing works, e.g., MAIRA (Javier Alvarez Valle), PRISM (SIQI LIU, Kristen Severson), PathChat (Faisal Mahmood).

Multimodal generation: The LLaVA-Med framework can incorporate modality-specific decoders for generating multimodal output, e.g., BiomedJourney https://lnkd.in/gFnvqFTw, BiomedParse https://lnkd.in/gzFPe5aH (Mu Wei will present in the same workshop, 10:40-11:20).

Whole-slide modeling: Digital pathology poses unique computational challenges due to the enormous size of whole-slide images. We propose the first whole-slide pathology foundation model, GigaPath, in our #Nature paper: https://lnkd.in/gHjmT7We. So much more remains to be done!
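To give a feel for why whole-slide modeling is computationally hard: a slide is far too large to encode in one pass, so pipelines in this space typically tile it into patches, encode each tile, and aggregate tile embeddings across the slide. The sketch below shows only the tiling step, with made-up sizes; it is not the GigaPath implementation.

```python
import numpy as np

def tile_slide(slide, tile=256):
    """Split a whole-slide image array into non-overlapping tiles.

    Real whole-slide images are on the order of 100,000 x 100,000 pixels,
    so they are tiled, each tile is encoded separately, and the tile
    embeddings are then aggregated slide-wide. Illustrative sketch only.
    """
    h, w = slide.shape[:2]
    tiles = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            tiles.append(slide[y:y + tile, x:x + tile])
    return np.stack(tiles)  # (num_tiles, tile, tile, channels)
```

Even a modest 100k x 100k slide yields over 150,000 tiles at this size, which is what makes slide-level aggregation a modeling problem in its own right.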
Hoifung and his team are leading the way in AI-supported technologies while being ever mindful of the complexities of real-world medicine. Ecosystem transformation will only occur when technology substitution is integrated into systems that work to improve outcomes.
Amazing results, congratulations!
Looking forward to this! Thanks for the heads up Hoifung Poon.
Amazing! Love to see this!
Great work!
We are fortunate to work with many amazing collaborators, such as Carlo Bifulco, Brian Piening, Sheng Wang, Muhao Chen, Jianfeng Gao, Tao Qin, Furu Wei, Mu Wei, Hany Awadalla, to name just a few. There are many other exciting works in health AI by amazing teams at Microsoft, e.g., Biomedical Imaging (Javier Alvarez Valle), Biomedical Signal Processing (Michael Hansen), Biomed ML (Nicolo Fusi). Please check out the whole workshop, Foundation Models for Medical Vision: https://fmv-cvpr24workshop.github.io/#about. There is an amazing array of great speakers: Shekoofeh Azizi, Sharon Xiaolei Huang, Mu Wei, Faisal Mahmood, David Ouyang, MD. Thanks Bo Wang and all for the invite and organizing! Looking forward to seeing many old friends and meeting new ones. If you want to check out the brand-new #Microsoft campus in Redmond, happy to be your tour guide if schedules align :)