I am trying to run DeepSeek-V3 model inference on a remote machine (accessed via SSH). This machine does not have any GPU, but it has many CPU cores.
1st method:
I tried to run the model inference using the DeepSeek-Infer Demo method:
```
generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --interactive --temperature 0.7 --max-new-tokens 200
```
This produced the following error message:
```
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
```
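For completeness, a quick check from the same Python environment confirms that PyTorch sees no CUDA device (which is expected, since the machine has no GPU):

```python
import torch

# Sanity check: this machine has no NVIDIA driver/GPU,
# so this prints False here.
print(torch.cuda.is_available())
```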
2nd method:
I then tried a second method, using the Hugging Face Transformers library.
I installed the Transformers Python package v4.51.3 (which supports DeepSeek-V3).
I then implemented the script described in the Transformers/DeepSeek-V3 documentation:
```python
# `run_deepseek_v1.py`
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

torch.manual_seed(30)

tokenizer = AutoTokenizer.from_pretrained("path/to/local/deepseek-v3")

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

model = AutoModelForCausalLM.from_pretrained("path/to/local/deepseek-v3", device_map="auto", torch_dtype=torch.bfloat16)
inputs = tokenizer.apply_chat_template(chat, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)

import time
start = time.time()
outputs = model.generate(inputs, max_new_tokens=50)
print(tokenizer.batch_decode(outputs))
print(time.time() - start)
```

I got a similar error message when running this:
```
File "transformers/quantizers/quantizer_finegrained_fp8.py", line 51, in validate_environment
    raise RuntimeError("No GPU found. A GPU is needed for FP8 quantization.")
```
I tried changing `device_map="auto"` to `device_map="cpu"`, but it did not change anything (I still got the same error message).
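From what I understand (I may be wrong), the FP8 quantizer is selected because the released DeepSeek-V3 checkpoint stores its weights in FP8, so `device_map` alone cannot avoid it. Here is a rough sketch of what I imagine a CPU-only load would look like, assuming the weights had first been converted to BF16 (for example with the conversion script shipped in the official DeepSeek-V3 repo, `inference/fp8_cast_bf16.py`, if I read that repo correctly; the local path below is hypothetical):

```python
# Sketch only: assumes a BF16 (non-FP8) copy of the weights already exists
# at this hypothetical local path, e.g. produced by converting the FP8
# checkpoint with the DeepSeek-V3 repo's inference/fp8_cast_bf16.py script.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "path/to/local/deepseek-v3-bf16",  # hypothetical path to BF16 weights
    device_map="cpu",                  # force everything onto the CPU
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("path/to/local/deepseek-v3-bf16")
```

Even if that loads, I realize the 671B parameters would need well over a terabyte of RAM in BF16 (671B × 2 bytes ≈ 1.3 TB), so I am not sure this is practical.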
So my question is the following: is there any way to run DeepSeek-V3 on CPU only (without any GPU), ideally using one of these methods (or another method that I may not know about)?
P.S.: I am new to the Data Science site, so if this question is too oriented toward implementation/runtime environment details, don't hesitate to tell me and I will close this post.