[Multimodal Support] Add MiniGPT4 #390
Merged
This PR introduces MiniGPT4, a model that enables large language models to see. It is built on top of the Gradio interface and integrated into the MLC-llm Python package. This PR supports the CUDA backend in the CLI. Further performance optimization, support for other backends (Vulkan, Metal, iPhone, Android, WebGPU, etc.), and edge-case detection will be introduced in follow-up PRs.
Design in this PR:
0. Searching and combining different compiled models
Since multimodal models typically come in several parts (for example, the architecture in this PR consists of the MiniGPT image model and the Vicuna model), we adhere to the convention that they are compiled separately and stored in different folders. When the user specifies MiniGPT with a certain quantization type, the Vicuna model with the corresponding quantization type and the MiniGPT image model with the closest dtype are loaded. Currently no quantization is supported for the MiniGPT image model alone, since the image-uploading stage does not gain much performance from it. Details can be found in the reload_model() function in gradio.py.
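As a rough sketch of this selection step (not the actual reload_model() code; the directory layout and helper name below are assumptions for illustration), the matching could look like:

```python
import os

# Hypothetical sketch of how the Vicuna weights and the MiniGPT image model
# could be located for a requested quantization. The folder naming scheme and
# this helper are assumptions, not the actual reload_model() implementation.
def find_model_dirs(artifact_root: str, quantization: str):
    """Return (vicuna_dir, minigpt_dir) for a MiniGPT request.

    The Vicuna model must match the requested quantization exactly; the
    MiniGPT image model is not quantized on its own, so we pick the folder
    whose dtype is closest to the requested quantization.
    """
    dirs = os.listdir(artifact_root)

    # Vicuna: exact quantization match, e.g. a folder name ending in the
    # requested quantization string.
    vicuna_dir = next(
        d for d in dirs if d.startswith("vicuna") and d.endswith(quantization)
    )

    # MiniGPT image model: choose the folder whose dtype matches the
    # activation dtype implied by the requested quantization.
    dtype = "f16" if "f16" in quantization else "f32"
    minigpt_dir = next(
        d for d in dirs if d.startswith("minigpt") and dtype in d
    )

    return (os.path.join(artifact_root, vicuna_dir),
            os.path.join(artifact_root, minigpt_dir))
```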
1. Isolated Embed Tokens Workflow (only for llama-related models)
In order to insert the image embedding in the middle of the embedding of the tokenized prompt (identified by a placeholder), we found it necessary to isolate embed_tokens() into a separate runtime function called embed. To handle this new runtime function in the MiniGPT use case and minimize the impact on other use cases, I adhere to the following when introducing embed into the workflow (a sketch of the resulting flow is given after this list):
- I modified llama.py to isolate the embed_tokens() function and did not change other models, since MiniGPT only relies on Vicuna models.
- I modified cli_main.cc so that no matter whether the user has old compiled llama-related models or ones newly compiled with the updated llama.py, the CLI handles both workflows by detecting whether an embed function exists.
- Besides the existing PrefillStep workflow, I introduced a new PrefillWithEmbedStep function to handle embeddings produced by EmbedStep. This workflow is only applied when the model is detected to have an embed function, and thus does not affect the original workflow in other use cases.
- EmbedStep takes in a text input, gets the prompt, splits the prompt string by multimodal placeholders, and returns an array of embeddings of the tokenized prompt pieces. It does not concatenate the results because concatenation on TVM runtime NDArray is not supported, and it is easier to handle the concatenation in Python and numpy (in the Gradio case).
🛑 The embed workflow is still not mature enough to be introduced globally, for the following reason: in llm_chat.cc, the index is currently hardcoded for the Vicuna case, but the index could be different in other LMs.
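To make the split-and-concatenate flow concrete, here is a minimal numpy sketch of what the Gradio side could do with the per-chunk embeddings from EmbedStep and the cached image embedding. The placeholder string and function names are illustrative assumptions, not the exact implementation.

```python
import numpy as np

# Minimal sketch (assumptions, not the exact implementation): the prompt is
# split on a multimodal placeholder, each text chunk is embedded separately
# (conceptually what EmbedStep returns as an array of embeddings), and the
# image embedding is spliced in between the chunks with numpy concatenation,
# since concatenation on TVM runtime NDArray is not supported.
PLACEHOLDER = "<Img>"  # assumed placeholder string for illustration

def build_prefill_input(prompt: str, embed_text, vision_embed: np.ndarray) -> np.ndarray:
    """Concatenate text-chunk embeddings with the image embedding.

    embed_text: callable mapping a text chunk to an np.ndarray of shape
                (1, num_tokens, hidden_size); stands in for the runtime
                embed function exposed by the re-compiled llama model.
    vision_embed: np.ndarray of shape (1, num_image_tokens, hidden_size),
                  produced once when the image is uploaded.
    """
    chunks = prompt.split(PLACEHOLDER)
    pieces = []
    for i, chunk in enumerate(chunks):
        if chunk:
            pieces.append(embed_text(chunk))
        # Insert the image embedding at every placeholder position.
        if i < len(chunks) - 1:
            pieces.append(vision_embed)
    # The concatenated embedding would then be consumed by the
    # PrefillWithEmbedStep-style prefill.
    return np.concatenate(pieces, axis=1)
```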
2. Prompt and New Conversation Template

In order to support Vicuna with a prompt different from its original one, I override the conv_template to be "minigpt" when Vicuna is called from the MiniGPT use case. I introduced a new separator style called kAccumRoleMsg in conversation.h to handle the fact that in MiniGPT the prompt accumulates the history of all user inputs and LM responses. I also enabled splitting the prompt string by placeholder by modifying the GetInputTokens() workflow in llm_chat.cc.
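For intuition only, a prompt built under an accumulate-all-history separator style might be assembled as in the sketch below; the system prompt, role names, and separator are assumptions, not the exact "minigpt" template.

```python
# Illustrative sketch of the "accumulate role messages" idea behind
# kAccumRoleMsg: every past user input and LM response stays in the prompt.
# The role names and separator below are assumptions for illustration.
def build_accumulated_prompt(system, history, new_input, sep="###"):
    parts = [system]
    for user_msg, model_msg in history:
        parts.append(f"{sep}Human: {user_msg}")
        parts.append(f"{sep}Assistant: {model_msg}")
    parts.append(f"{sep}Human: {new_input}")
    parts.append(f"{sep}Assistant:")
    return " ".join(parts)

# Example: the second round's prompt still contains the whole first round.
history = [("<Img> What is in the picture?", "A cat sitting on a mat.")]
print(build_accumulated_prompt("Give helpful answers.", history, "What color is it?"))
```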
3. Gradio Workflow

The MiniGPT image model is not loaded into llm_chat.cc since it does not fit into the LM model pipeline; it is instead stored under vision-related attributes in GradioChatModule. When upload_image() is triggered, the image model generates an image embedding, which is stored as vision_embed. When the user resets the chat or removes the image, vision_embed is cleared as well.
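A minimal sketch of this state handling is given below; apart from upload_image() and vision_embed, which are named in this PR, the class, attribute, and method names are assumptions.

```python
# Sketch of the vision-related state kept outside llm_chat.cc. Only
# upload_image() and vision_embed are named in this PR; everything else
# here is an assumption for illustration.
class GradioChatModuleSketch:
    def __init__(self, image_model):
        self.image_model = image_model   # compiled MiniGPT image model
        self.vision_embed = None         # set once an image is uploaded

    def upload_image(self, image):
        # The image model runs once per uploaded image and the resulting
        # embedding is cached for later prefill calls.
        self.vision_embed = self.image_model(image)

    def remove_image(self):
        self.vision_embed = None

    def reset_chat(self):
        # Resetting the chat also clears the cached image embedding.
        self.vision_embed = None
```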
Example commands and demo:
Coming soon!
Progress: