Visual Prompting Papers in Computer Vision

A curated list of prompt-based papers in computer vision and vision-language learning.

Keywords

Vision Prompt

This section collects papers on prompting pretrained vision foundation models (e.g., ViT) for parameter-efficient adaptation.
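To make the idea concrete, here is a minimal VPT-style sketch: learnable prompt tokens are prepended to a frozen ViT's patch embeddings, and only the prompts and the classification head receive gradients. The `patch_embed`/`encoder` attributes of the backbone are assumptions for illustration, not the interface of any specific library.

```python
import torch
import torch.nn as nn

class VisualPromptedViT(nn.Module):
    def __init__(self, backbone, num_prompts=10, embed_dim=768, num_classes=100):
        super().__init__()
        self.backbone = backbone                      # pretrained ViT, frozen below
        for p in self.backbone.parameters():
            p.requires_grad = False
        # learnable prompt tokens, shared across all inputs
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        tokens = self.backbone.patch_embed(x)         # (B, N, D) patch tokens (assumed API)
        B = tokens.shape[0]
        prompts = self.prompts.expand(B, -1, -1)      # broadcast prompts over the batch
        tokens = torch.cat([prompts, tokens], dim=1)  # prepend prompt tokens
        feats = self.backbone.encoder(tokens)         # frozen transformer blocks (assumed API)
        return self.head(feats.mean(dim=1))           # pool and classify
```

Only `self.prompts` and `self.head` are trainable, which is why this family of methods is called parameter-efficient.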

ArXiv Papers

Vision-Language Prompt

This section collects papers on prompting pretrained vision-language foundation models (e.g., CLIP) for parameter-efficient adaptation.
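A representative recipe (in the spirit of CoOp) replaces the hand-written template "a photo of a {class}" with learnable context vectors in the text branch while the CLIP-like model stays frozen. The sketch below assumes a `text_encoder` that maps token embeddings to a pooled feature; that interface is a simplification, not real CLIP code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoOpHead(nn.Module):
    """Learnable text context for a frozen CLIP-like model (hypothetical interface)."""
    def __init__(self, text_encoder, class_embeddings, n_ctx=16, ctx_dim=512):
        super().__init__()
        self.text_encoder = text_encoder              # frozen text branch
        for p in self.text_encoder.parameters():
            p.requires_grad = False
        # frozen token embeddings of the class names, shape (C, L_name, D)
        self.register_buffer("class_embeddings", class_embeddings)
        # shared learnable context vectors, shape (n_ctx, D)
        self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim) * 0.02)

    def forward(self, image_features):
        C = self.class_embeddings.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(C, -1, -1)           # (C, n_ctx, D)
        prompts = torch.cat([ctx, self.class_embeddings], dim=1)  # context + class name
        text_features = self.text_encoder(prompts)              # (C, D), assumed pooled output
        image_features = F.normalize(image_features, dim=-1)
        text_features = F.normalize(text_features, dim=-1)
        return 100.0 * image_features @ text_features.t()       # cosine-similarity logits
```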

ArXiv Papers

Language-Interactable Prompt

A language-interactable prompter develops zero-/few-shot capabilities by prompting several independent foundation models (VLMs, LLMs, VMs, etc.) through a language interface. One of the most attractive applications is the multimodal chatbot.
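A minimal sketch of this pattern, in the spirit of systems such as PICa or Socratic Models: an off-the-shelf captioner turns the image into text, and a frozen LLM answers a question conditioned on that caption. Both `caption_image` and `complete` are hypothetical stand-ins for whatever captioning model and LLM API are available.

```python
def answer_visual_question(image, question, caption_image, complete):
    caption = caption_image(image)  # vision model -> natural-language description
    prompt = (
        "Image description: " + caption + "\n"
        "Question: " + question + "\n"
        "Answer:"
    )
    return complete(prompt)         # LLM consumes the language interface
```

The models never share parameters or gradients; language is the only channel connecting them, which is what makes the composition zero-shot.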

ArXiv Papers

Vision-Language Instruction Tuning

The goal of vision-language instruction tuning is to train a model that can follow natural-language instructions for general-purpose multimodal tasks.
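For intuition, here is how a single training example is often formatted (modeled loosely on LLaVA-style templates; the exact role tokens and image placeholder are assumptions, not any specific paper's format). The loss is applied only to the response tokens, so the model learns to produce answers that follow the instruction.

```python
IMAGE_TOKEN = "<image>"  # placeholder later replaced by visual features

def format_example(instruction, response):
    prompt = f"USER: {IMAGE_TOKEN}\n{instruction}\nASSISTANT:"
    target = f" {response}"
    return prompt, target  # supervise only the target continuation

prompt, target = format_example(
    "Describe the main object in this image.",
    "A red bicycle leaning against a brick wall.",
)
```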

More Resources