[Model] Support MiMo-7B inference with MTP #17433
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger a full CI run by default; only a small subset runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 …
This pull request has merge conflicts that must be resolved before it can be merged.
FYI, this model already works with vLLM: `vllm serve XiaomiMiMo/MiMo-7B-Base --model-impl transformers --trust-remote-code`
vllm/config.py
Outdated
    self.target_model_config.hf_text_config.architectures[0] \
            == "MiMoForCausalLM"):
Why not check if `model_type == "mimo"`, similar to deepseek?
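A minimal sketch of what the suggested check could look like, written as a standalone helper for illustration; the helper name is hypothetical, and only the `hf_text_config` attribute access is taken from the quoted diff:

```python
def _is_mimo_target(target_model_config) -> bool:
    # Hypothetical helper: key off model_type (as the DeepSeek MTP path
    # does) rather than inspecting the architectures list.
    return target_model_config.hf_text_config.model_type == "mimo"
```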
Make sure to add to the supported models list:

vllm/docs/source/models/supported_models.md, line 238 in f2e7af9:

    ### Generative Models

Line 111 in f2e7af9:

    _TEXT_GENERATION_EXAMPLE_MODELS = {
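If the test registry follows the pattern of its neighboring entries, the addition might look like the sketch below; the exact checkpoint and kwargs are assumptions, not taken from this PR:

```python
# Hypothetical entry in _TEXT_GENERATION_EXAMPLE_MODELS, mirroring the
# format of nearby entries; the model ID and kwargs may differ.
"MiMoForCausalLM": _HfExamplesInfo("XiaomiMiMo/MiMo-7B-RL",
                                   trust_remote_code=True),
```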
Signed-off-by: wp-alpha <wangpeng66@xiaomi.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Have you tried enabling MTP to test it?
lm-eval
vllm (pretrained=XiaomiMiMo/MiMo-7B-RL,max_model_len=2048,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.7794|± |0.0114|
| | |strict-match | 5|exact_match|↑ |0.7733|± |0.0115|
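For reference, a header like the one above is typically produced by an lm-evaluation-harness invocation along these lines; the exact command is an assumption reconstructed from the header, not shown in the PR:

```
lm_eval --model vllm \
    --model_args pretrained=XiaomiMiMo/MiMo-7B-RL,max_model_len=2048,trust_remote_code=True \
    --tasks gsm8k --num_fewshot 5 --batch_size auto
```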
        sampling_metadata: SamplingMetadata,
        spec_step_idx: int = 0,
    ) -> torch.Tensor:
        self.mtp_layers[str(self.mtp_start_layer_idx + spec_step_idx)]
Was this line redundant?
This line is required for predicting subsequent tokens (beyond the first). Currently only first-token prediction is supported, because parallel Multi-Token Prediction (MTP) is not yet implemented; the line is kept to make multi-token prediction easier to add once parallel MTP is supported (see the sketch below).
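A minimal sketch of the intended future use: select the draft layer for the current speculative step and actually run it, instead of only indexing it. The call signature here is an assumption for illustration, not the PR's actual API:

```python
# Hypothetical multi-step MTP forward; argument names are assumptions.
current_layer = self.mtp_layers[str(self.mtp_start_layer_idx + spec_step_idx)]
hidden_states = current_layer(input_ids, positions, previous_hidden_states)
```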
I mean this code only references an element in the layers; no calculation is done. If this is expected for future parallel MTP, just ignore this comment.
Sorry, I missed the line. This line is redundant.
@mgoin In our tests on aime24, the results with MTP enabled were consistent with the results without MTP. The draft acceptance rate is about 0.9.
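For readers who want to try this, a minimal offline sketch is below. The `speculative_config` keys are an assumption modeled on vLLM's DeepSeek MTP path and may differ across releases; check the docs for the version you run:

```python
from vllm import LLM, SamplingParams

# Hypothetical MTP-enabled run; speculative_config keys are assumptions.
llm = LLM(
    model="XiaomiMiMo/MiMo-7B-RL",
    trust_remote_code=True,
    speculative_config={"num_speculative_tokens": 1},
)
outputs = llm.generate(["Solve step by step: 12 * 13 ="],
                       SamplingParams(temperature=0.0, max_tokens=64))
print(outputs[0].outputs[0].text)
```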
        name = name.replace(group.group(), group.group() + "mtp_block.")
        return name

    def _rewrite_spec_layer_name(self, spec_layer: int, name: str) -> str:
Compared to deepseek_mtp.py, this function seems to be only defined but not used.
Yes, this function is useless.
Failing test here: https://buildkite.com/vllm/ci/builds/19355#0196a398-af9b-41fb-b4d9-1cb71b4282ad

    [2025-05-06T03:32:04Z] FAILED models/test_registry.py::test_registry_imports[MiMoMTPModel] - KeyError: 'MiMoMTPModel'
    [2025-05-06T03:32:04Z] FAILED models/test_registry.py::test_hf_registry_coverage - AssertionError: Please add the following architectures to …
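The assertion suggests the new architecture still needs a registry entry. A sketch of one plausible fix, mirroring the DeepSeek MTP entry's pattern; the dict name and tuple contents are assumptions about the registry's layout:

```python
# Hypothetical addition to the model registry; the MiMo entry mirrors
# the DeepSeek MTP entry's pattern and may not match the final PR.
_SPECULATIVE_DECODING_MODELS = {
    "DeepSeekMTPModel": ("deepseek_mtp", "DeepSeekMTP"),
    "MiMoMTPModel": ("mimo_mtp", "MiMoMTP"),
}
```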
MiMo-7B is a decoder-only Transformer LM with MTP layers that exhibits extraordinary reasoning potential. The model and technical report are open-sourced at https://github.com/XiaomiMiMo/MiMo.

In this PR we provide the main block and MTP layer definitions, along with the corresponding config tweaks; a rough sketch of the MTP layer idea follows below.

We hope MiMo can boost the exploration of reasoning models and RL in the future. All the best!
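As a rough illustration of the MTP layer idea (not the PR's actual code): each draft layer fuses the embedding of the next input token with the target model's last hidden state, then runs one decoder block. All names and shapes below are assumptions modeled on the DeepSeek-style MTP design:

```python
import torch
from torch import nn

class MTPLayerSketch(nn.Module):
    # Hypothetical sketch of a single MTP layer; the real MiMo layer
    # lives in the PR's model file and may differ in names and details.
    def __init__(self, hidden_size: int, block: nn.Module):
        super().__init__()
        self.token_layernorm = nn.RMSNorm(hidden_size)   # norm token embeds
        self.hidden_layernorm = nn.RMSNorm(hidden_size)  # norm prev hidden
        self.input_proj = nn.Linear(2 * hidden_size, hidden_size, bias=False)
        self.mtp_block = block  # a standard decoder block

    def forward(self, token_embeds: torch.Tensor,
                prev_hidden: torch.Tensor) -> torch.Tensor:
        # Fuse next-token embeddings with the target model's hidden
        # state, project back to hidden_size, and run the block.
        fused = torch.cat([self.token_layernorm(token_embeds),
                           self.hidden_layernorm(prev_hidden)], dim=-1)
        return self.mtp_block(self.input_proj(fused))
```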