[v0.13.5] Vulkan 2 x GeForce RTX 3090 24 GB #201

b4rtaz (Maintainer) · 2025-04-19T21:19:53Z

| Model | 1 x GeForce RTX 3090 24 GB | 2 x GeForce RTX 3090 24 GB |
|---|---|---|
| Llama 3.1 8B Q40 - prediction | 27.0 tok/s | 34.4 tok/s |
| Llama 3.3 70B Instruct Q40 - prediction | not enough memory | 4.8 tok/s |
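
A rough back-of-the-envelope check of the "not enough memory" entry. This is only a sketch: it assumes Q40 stores 32 weights per 18-byte block (a 2-byte scale plus 16 bytes of 4-bit values), uses nominal parameter counts of 8e9 and 70e9, and ignores the KV cache, activation buffers, and any tensors kept at higher precision. None of these constants come from the post itself.

```python
# Rough weight-memory estimate; the block layout and parameter counts are assumptions.
Q40_BLOCK_WEIGHTS = 32   # weights per Q40 block (assumed)
Q40_BLOCK_BYTES = 18     # 2-byte scale + 16 bytes of packed 4-bit values (assumed)
GIB = 1024 ** 3
VRAM_GIB = 24            # per RTX 3090 (24576 MiB in the nvidia-smi output)


def q40_weight_gib(n_params: float) -> float:
    """Approximate size of Q40-quantized weights in GiB."""
    return n_params / Q40_BLOCK_WEIGHTS * Q40_BLOCK_BYTES / GIB


for name, n_params in [("8B", 8e9), ("70B", 70e9)]:
    total = q40_weight_gib(n_params)
    print(f"{name}: ~{total:.1f} GiB total, ~{total / 2:.1f} GiB per GPU across 2 GPUs")
```

Under these assumptions the 70B weights alone come to roughly 37 GiB, which exceeds one 24 GiB card but fits when halved across two.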

dllama_model_llama3_1_8b_instruct_q40

1 GPU

🔶 Pred   37 ms Sync    0 ms | Sent     0 kB Recv     0 kB | 2
🔶 Pred   37 ms Sync    0 ms | Sent     0 kB Recv     0 kB | .
🔶 Pred   37 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  
🔶 Pred   37 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  **

2 GPUs

🔶 Pred   30 ms Sync   16 ms | Sent   288 kB Recv   522 kB |  not
🔶 Pred   29 ms Sync   15 ms | Sent   288 kB Recv   522 kB |  to
🔶 Pred   29 ms Sync   15 ms | Sent   288 kB Recv   522 kB |  enable

dllama_model_llama3_3_70b_instruct_q40

2 GPUs

./dllama inference --prompt "Tensor parallelism is all you need" --steps 128 --model models/llama3_3_70b_instruct_q40/dllama_model_llama3_3_70b_instruct_q40.m --tokenizer models/llama3_3_70b_instruct_q40/dllama_tokenizer_llama3_3_70b_instruct_q40.t --nthreads 1 --buffer-float-type q80 --max-seq-len 256 --gpu-index 0 --workers 127.0.0.1:9999

🔶 Pred  208 ms Sync  164 ms | Sent  1392 kB Recv  1610 kB |   was
🔶 Pred  208 ms Sync  177 ms | Sent  1392 kB Recv  1610 kB |  it
🔶 Pred  205 ms Sync  162 ms | Sent  1392 kB Recv  1610 kB |  was
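
The tok/s figures in the table are close to 1000 divided by the logged Pred time (1000 / 37 ≈ 27.0, 1000 / 29 ≈ 34.5, 1000 / 208 ≈ 4.8). The post does not state how the table was computed, or whether Sync time is counted inside Pred, so treat that relation as an observation. A minimal sketch that summarizes such log lines, with `summarize` and `LINE_RE` being illustrative names rather than part of dllama:

```python
import re

# Matches the per-token "Pred ... ms Sync ... ms" fields in the log lines above.
LINE_RE = re.compile(r"Pred\s+(\d+) ms Sync\s+(\d+) ms")


def summarize(log: str) -> dict:
    """Return mean Pred/Sync times in ms and 1000 / mean Pred as tokens per second."""
    rows = [(int(p), int(s)) for p, s in LINE_RE.findall(log)]
    if not rows:
        raise ValueError("no Pred/Sync lines found")
    pred = sum(p for p, _ in rows) / len(rows)
    sync = sum(s for _, s in rows) / len(rows)
    return {"pred_ms": pred, "sync_ms": sync, "tok_per_s": 1000.0 / pred}


# Sample lines copied from the 70B run on 2 GPUs.
LOG_70B = """
🔶 Pred  208 ms Sync  164 ms | Sent  1392 kB Recv  1610 kB |   was
🔶 Pred  208 ms Sync  177 ms | Sent  1392 kB Recv  1610 kB |  it
🔶 Pred  205 ms Sync  162 ms | Sent  1392 kB Recv  1610 kB |  was
"""

stats = summarize(LOG_70B)
print(f"{stats['tok_per_s']:.1f} tok/s, mean sync {stats['sync_ms']:.0f} ms")
```

The same function can be applied to the 8B logs to compare the 1 GPU and 2 GPU runs.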

Spec

> nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:01:00.0 Off |                  N/A |
|  0%   29C    P8             25W /  350W |      47MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        On  |   00000000:04:00.0 Off |                  N/A |
|  0%   44C    P8             23W /  350W |      21MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
