9 docs tagged with "multimodal"

CheXagent

Paper title: Towards a Foundation Model for Chest X-Ray Interpretation

CLIP

Paper title: Learning Transferable Visual Models From Natural Language Supervision

DALLE

DALLE: from text to image.

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

Abstract and introduction

GPT4RoI-Instruction Tuning Large Language Model on Region-of-Interest

Paper title: GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

LLaVA

Paper title: Visual Instruction Tuning

Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Shikra-Unleashing Multimodal LLM’s Referential Dialogue Magic

Paper title: Shikra: Unleashing Multimodal LLM’s Referential Dialogue Magic

VisionLLM

Paper title: VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks