
[Model] Support MiMo-7B inference with MTP #17433


Open · wants to merge 1 commit into base: main

Conversation

@bwshen-mi commented Apr 30, 2025

MiMo-7B is a decoder-only Transformer language model with Multi-Token Prediction (MTP) layers that exhibits extraordinary reasoning potential.

The model and technical report are open-sourced at https://github.com/XiaomiMiMo/MiMo

This PR adds the main decoder block and the MTP layer definitions, along with the corresponding config changes.

We hope MiMo can boost the exploration of reasoning models and RL in the future.

All the best!


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀


mergify bot commented Apr 30, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bwshen-mi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 30, 2025
@simon-mo simon-mo added this to the v0.9.0 milestone Apr 30, 2025
@hmellor (Member) commented Apr 30, 2025

FYI, this model already works with vLLM

vllm serve XiaomiMiMo/MiMo-7B-Base --model-impl transformers --trust-remote-code

vllm/config.py Outdated
Comment on lines 2323 to 2324
self.target_model_config.hf_text_config.architectures[0] \
== "MiMoForCausalLM"):

Why not check if model_type == "mimo" similar to deepseek?
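
A sketch of what the suggested check might look like (illustrative only; the helper name is hypothetical and the surrounding speculative-config plumbing is assumed — only the model_type == "mimo" comparison comes from the comment):

```python
# Illustrative sketch of the suggested detection, mirroring how DeepSeek MTP
# is matched by model_type rather than by architecture string.
# The helper name is hypothetical; hf_text_config is the transformers text
# config of the target model.
def _is_mimo(hf_text_config) -> bool:
    return getattr(hf_text_config, "model_type", None) == "mimo"
```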

@mgoin (Member) left a comment

Make sure to add the model to the supported models list (under ### Generative Models in the docs) and to the test model registry (_TEXT_GENERATION_EXAMPLE_MODELS in tests/models/registry.py).
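
For reference, the registry entry might look roughly like this (a sketch: _HfExamplesInfo and its arguments follow the pattern of neighboring entries in tests/models/registry.py, not this PR's diff):

```python
# tests/models/registry.py — hypothetical entry for the base model, following
# the pattern of the existing _TEXT_GENERATION_EXAMPLE_MODELS entries.
_TEXT_GENERATION_EXAMPLE_MODELS = {
    # ... existing entries ...
    "MiMoForCausalLM": _HfExamplesInfo("XiaomiMiMo/MiMo-7B-RL",
                                       trust_remote_code=True),
}
```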

Signed-off-by: wp-alpha <wangpeng66@xiaomi.com>
@mergify mergify bot added the documentation Improvements or additions to documentation label May 1, 2025
@mgoin (Member) left a comment

LGTM! Have you tried enabling mtp to test it?

lm-eval

vllm (pretrained=XiaomiMiMo/MiMo-7B-RL,max_model_len=2048,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.7794|±  |0.0114|
|     |       |strict-match    |     5|exact_match|↑  |0.7733|±  |0.0115|

sampling_metadata: SamplingMetadata,
spec_step_idx: int = 0,
) -> torch.Tensor:
self.mtp_layers[str(self.mtp_start_layer_idx + spec_step_idx)]


was this line redundant?


This line is required for predicting tokens beyond the first. Currently only first-token prediction is supported, because parallel Multi-Token Prediction (MTP) is not yet implemented; the line is kept to make multi-token prediction easier to add once parallel MTP is supported.
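
To make the indexing concrete, here is a toy, self-contained sketch (not the PR's code) of how spec_step_idx would select successive MTP layers keyed by their absolute layer index:

```python
import torch
from torch import nn

class ToyMTPStack(nn.Module):
    """Toy illustration: MTP layers live in a ModuleDict keyed by absolute
    layer index (as a string), and spec_step_idx picks the layer used for
    the k-th speculative step."""

    def __init__(self, hidden: int, mtp_start_layer_idx: int, num_mtp: int):
        super().__init__()
        self.mtp_start_layer_idx = mtp_start_layer_idx
        self.mtp_layers = nn.ModuleDict({
            str(mtp_start_layer_idx + i): nn.Linear(hidden, hidden)
            for i in range(num_mtp)
        })

    def forward(self, h: torch.Tensor, spec_step_idx: int = 0) -> torch.Tensor:
        # The line under discussion: select the layer for this speculative step.
        layer = self.mtp_layers[str(self.mtp_start_layer_idx + spec_step_idx)]
        return layer(h)

# With parallel MTP, step k would use layer mtp_start_layer_idx + k; with only
# first-token prediction, spec_step_idx stays 0.
stack = ToyMTPStack(hidden=8, mtp_start_layer_idx=36, num_mtp=1)
out = stack(torch.randn(2, 8), spec_step_idx=0)
```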


I mean that this code only references an element of the layers; no computation is done with it. If this is expected for future parallel MTP, just ignore this comment.


Sorry, I missed the line. This line is redundant.

@wp-alpha commented May 6, 2025

> LGTM! Have you tried enabling mtp to test it?

@mgoin In our tests on AIME24, the results with MTP enabled were consistent with the results without MTP, and the draft token acceptance rate is about 0.9.
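
For anyone trying to reproduce this, a minimal sketch of enabling MTP speculative decoding (assuming the speculative_config engine argument; the exact config keys are an assumption based on the DeepSeek MTP flow and may differ by vLLM version):

```python
from vllm import LLM, SamplingParams

# Hedged sketch: with this PR, passing a speculative config on the target
# model should let vLLM use the model's own MTP layer as the draft model.
llm = LLM(
    model="XiaomiMiMo/MiMo-7B-RL",
    trust_remote_code=True,
    speculative_config={"num_speculative_tokens": 1},
)

outputs = llm.generate(
    ["Janet has 3 apples and buys 4 more. How many does she have?"],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```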

@mgoin mgoin added new-model Requests to new models ready ONLY add when PR is ready to merge/full CI is needed labels May 6, 2025
wp-alpha referenced this pull request in XiaomiMiMo/vllm May 6, 2025

name = name.replace(group.group(), group.group() + "mtp_block.")
return name

def _rewrite_spec_layer_name(self, spec_layer: int, name: str) -> str:


Compared to deepseek_mtp.py, this function seems to be only defined but not used.


Yes, this function is useless.

@simon-mo (Collaborator) commented May 8, 2025

Failing test here

https://buildkite.com/vllm/ci/builds/19355#0196a398-af9b-41fb-b4d9-1cb71b4282ad

[2025-05-06T03:32:04Z] FAILED models/test_registry.py::test_registry_imports[MiMoMTPModel] - KeyError: 'MiMoMTPModel'
[2025-05-06T03:32:04Z] FAILED models/test_registry.py::test_hf_registry_coverage - AssertionError: Please add the following architectures to tests/models/registry.py: {'MiMoMTPModel'}
[2025-05-06T03:32:04Z] assert not {'MiMoMTPModel'}
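
The fix is presumably an entry like the following in tests/models/registry.py (a sketch; the dict name for speculative-decoding models and the _HfExamplesInfo arguments are assumptions based on the file's existing structure):

```python
# tests/models/registry.py — hypothetical entry for the MTP draft
# architecture, alongside the other speculative-decoding example models.
_SPECULATIVE_DECODING_EXAMPLE_MODELS = {
    # ... existing entries ...
    "MiMoMTPModel": _HfExamplesInfo("XiaomiMiMo/MiMo-7B-RL",
                                    trust_remote_code=True),
}
```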
