@Kathryn-cat (Contributor) commented on Jun 12, 2023

This PR introduces MiniGPT4, a model that enables large language models to see. It is built on top of the Gradio interface and ships as part of the MLC-llm Python package. This PR supports the CUDA backend in the CLI. Further performance optimization, support for other backends (Vulkan, Metal, iPhone, Android, WebGPU, etc.), and edge-case detection will be introduced in follow-up PRs.


Design in this PR:

0. Searching and combining different compiled models
Since multimodal models typically come in several parts (for example, the architecture in this PR consists of the MiniGPT image model and the Vicuna language model), we follow the convention that the parts are compiled separately and stored in different folders. When the user specifies MiniGPT with a certain quantization type, the Vicuna model with the corresponding quantization type and the MiniGPT image model with the closest dtype are loaded. Currently, no quantization is supported for the MiniGPT image model on its own, since quantizing it gains little performance in the image-uploading stage. Details can be found in the reload_model() function in gradio.py.
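For concreteness, here is a minimal Python sketch of that pairing logic. The helper name `pair_compiled_models`, the folder-name patterns, and the dtype-matching rule are assumptions for illustration only; the actual logic lives in reload_model() in gradio.py.

```python
import os

def pair_compiled_models(artifact_dir, quantization):
    """Hypothetical sketch: given a quantization choice for Vicuna,
    locate the matching Vicuna folder and a MiniGPT image-model
    folder whose dtype is closest to that quantization."""
    vicuna_dir = os.path.join(artifact_dir, f"vicuna-v1-7b-{quantization}")
    # The image model is not quantized; pick the float dtype closest
    # to the requested quantization (e.g. a *f16 quantization pairs
    # with a float16 image model).
    image_dtype = "float16" if "f16" in quantization else "float32"
    minigpt_dir = os.path.join(artifact_dir, f"minigpt4-7b-{image_dtype}")
    for path in (vicuna_dir, minigpt_dir):
        if not os.path.isdir(path):
            raise FileNotFoundError(f"missing compiled model: {path}")
    return vicuna_dir, minigpt_dir
```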

1. Isolated Embed Tokens Workflow (only for llama-related models)
In order to insert the image embedding into the middle of the embedding of the tokenized prompt (identified by a placeholder), we found it necessary to isolate embed_tokens() into a separate runtime function called embed. To handle this new runtime function in the MiniGPT use case while minimizing the impact on other use cases, I adhered to the following when introducing embed into the workflow:

  • I only changed llama.py to isolate the embed_tokens() function and did not change other models, since MiniGPT only relies on Vicuna models.
  • I updated cli_main.cc so that whether the user has old compiled llama-family models or ones newly compiled with the updated llama.py, the CLI handles both by detecting whether an embed function exists (a minimal sketch of this check follows the list).
  • To distinguish it from the original PrefillStep workflow, I introduced a new PrefillWithEmbedStep function to handle embeddings produced by EmbedStep. This workflow is only applied when the model is detected to have an embed function, and thus does not affect the original workflow in other use cases.
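The detection itself happens in C++ in cli_main.cc; a rough Python analogue of the check, assuming a loaded tvm.runtime.Module and a hypothetical helper name, might look like this:

```python
import tvm

def has_embed_function(mod: tvm.runtime.Module) -> bool:
    # Old artifacts lack a separate "embed" function; newly compiled
    # llama-family models export one. Module.get_function raises
    # AttributeError when the symbol is absent.
    try:
        mod.get_function("embed")
        return True
    except AttributeError:
        return False
```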

EmbedStep takes a text input, gets the prompt, splits the prompt string on multimodal placeholders, and returns an array of embeddings of the tokenized segments. It does not concatenate the results because concatenation is not supported on TVM runtime NDArrays; I found it easier to handle the concatenation in Python with numpy (in the Gradio case).
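To make the flow concrete, below is a sketch of the Python-side splicing. The placeholder string, the `embed_fn` wrapper around the embed runtime function, and the (1, seq_len, hidden) array shapes are all assumptions for illustration:

```python
import numpy as np

IMAGE_PLACEHOLDER = "<ImageHere>"  # assumption; the actual token may differ

def prefill_with_image(prompt, embed_fn, vision_embed):
    """Split the prompt on the multimodal placeholder, embed each text
    segment, then splice the image embedding in between with numpy
    (TVM runtime NDArrays do not support concatenation directly)."""
    segments = prompt.split(IMAGE_PLACEHOLDER)
    pieces = []
    for i, seg in enumerate(segments):
        if i > 0:
            pieces.append(vision_embed)   # image embedding, (1, n_img, hidden)
        if seg:
            pieces.append(embed_fn(seg))  # text embedding, (1, n_tok, hidden)
    return np.concatenate(pieces, axis=1)
```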

🛑 The embed workflow is not yet mature enough to be introduced globally, for the following reason:

  • In LMs such as Vicuna, the parameters for the prefill and decode stages are originally shared. If embed_tokens() is isolated, we have to specify the index into the params array to load the required parameters. As can be seen in the EmbedStep() function in llm_chat.cc, the index is currently hardcoded for the Vicuna case, but it could differ in other LMs.

2. Prompt and New Conversation Template
In order to support Vicuna with a prompt different from its original one, I override the conv_template to "minigpt" when Vicuna is called from the MiniGPT use case. I introduced a new separator style, kAccumRoleMsg, in conversation.h to handle the fact that MiniGPT accumulates the full history of user inputs and LM responses into the prompt. I also enabled splitting the prompt string on the placeholder by modifying the GetInputTokens() workflow in llm_chat.cc.
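As a rough illustration of what an accumulate-role-message separator style implies, the sketch below rebuilds the full role-tagged history on every turn. The separator, role names, and helper are assumptions for illustration, not the actual strings in conversation.h:

```python
def build_accum_prompt(system, history, sep="###"):
    """Hypothetical illustration of the kAccumRoleMsg style: each new
    prompt re-emits every past (role, message) pair, so the LM sees
    the complete chat history on every turn."""
    parts = [system]
    for role, msg in history:
        parts.append(f"{role}: {msg}")
    return f" {sep} ".join(parts)

# Example: the prompt grows with each exchange.
print(build_accum_prompt(
    "Give helpful answers.",
    [("Human", "<ImageHere> What is in the picture?"),
     ("Assistant", "A cat on a sofa."),
     ("Human", "What color is it?")],
))
```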

3. Gradio Workflow
The MiniGPT image model is not loaded into llm_chat.cc since it does not fit into the LM pipeline; it is instead stored under vision-related attributes of GradioChatModule. When upload_image() is triggered, the image model generates an image embedding, which is stored as vision_embed. When the user resets the chat or removes the image, vision_embed is cleared as well.
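A minimal sketch of that vision-related state, assuming the real class in gradio.py holds a callable image model (the constructor and reset method names here are assumptions):

```python
class GradioChatModule:
    """Sketch of the vision-related attributes described above."""

    def __init__(self, vision_model):
        self.vision_model = vision_model  # compiled MiniGPT image model
        self.vision_embed = None          # set once an image is uploaded

    def upload_image(self, image):
        # Run the image through the MiniGPT image model and cache the
        # resulting embedding for later splicing into the prompt.
        self.vision_embed = self.vision_model(image)

    def reset_chat(self):
        # Resetting the chat (or removing the image) clears the embedding.
        self.vision_embed = None
```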


Example commands and demo:
Coming soon!


Progress:

  • Model architecture support for MiniGPT4
  • Pairing and combining compiled MiniGPT and Vicuna models
  • Prompt override for Vicuna
  • Separating embed_tokens from the prefill stage
  • Prompt engineering and multimodal feature combination
  • Gradio interface support, including model selection, chat reset, and other chat-interface usability features
@Kathryn-cat changed the title [model support] add MiniGPT4 → [model support] add MiniGPT4 for CLI on Jun 13, 2023
@Kathryn-cat changed the title [model support] add MiniGPT4 for CLI → [model support] add MiniGPT4 on Jun 13, 2023
@Kathryn-cat force-pushed the pr-minigpt branch 26 times, most recently from 04d5595 to 450734b on June 19, 2023 03:36
@Kathryn-cat marked this pull request as ready for review on June 19, 2023 03:49
@Kathryn-cat changed the title [model support] add MiniGPT4 → [Multimodal Support] Add MiniGPT4 on Jun 19, 2023
@Kathryn-cat force-pushed the pr-minigpt branch 2 times, most recently from dfe6add to ecf0ec6 on June 19, 2023 16:47
@yzh119 (Member) left a comment:

Good job @Kathryn-cat, some comments.

@yzh119 (Member) commented Jun 19, 2023:

Also, it would be good if you could explain the new separator style in https://mlc.ai/mlc-llm/docs/get_started/mlc_chat_config.html

@yzh119 (Member) left a comment:

Good job @Kathryn-cat!

@yzh119 yzh119 merged commit 3500963 into mlc-ai:main Jun 22, 2023
MasterJH5574 added a commit that referenced this pull request Jun 22, 2023
junrushao pushed a commit that referenced this pull request Jun 22, 2023