I start the llama-cpp-python server with the command:
python -m llama_cpp.server --model D:\Mistral-7B-Instruct-v0.3.Q4_K_M.gguf --n_ctx 8192 --chat_format functionary
Then I run my Python script, which looks like this:
from openai import OpenAI
import json
import requests

try:
    client = OpenAI(
        base_url="http://localhost:8000/v1",
        api_key="sk-xxx")
    response = client.chat.completions.create(
        model="mistralai--Mistral-7B-Instruct-v0.3",
        messages=[
            {"role": "user", "content": "hi"},
        ],
    )
    # Extract the assistant's reply
    response_message = response.choices[0].message
    print(response_message)
except Exception as e:
    error_msg = str(e)
    print(f"Exception type: {type(e)}")

However, I don’t know how to set the top_k value to 1.
I tried changing my code to:
from openai import OpenAI
import json
import requests

try:
    client = OpenAI(
        base_url="http://localhost:8000/v1",
        api_key="sk-xxx")
    response = client.chat.completions.create(
        model="mistralai--Mistral-7B-Instruct-v0.3",
        messages=[
            {"role": "user", "content": "hi"},
        ],
        top_k=1
    )
    # Extract the assistant's reply
    response_message = response.choices[0].message
    print(response_message)
except Exception as e:
    error_msg = str(e)
    print(f"Exception type: {type(e)}")

I also tried adding a top_k value when starting the server, like this:
python -m llama_cpp.server --model D:\Mistral-7B-Instruct-v0.3.Q4_K_M.gguf --top-k 1 --n_ctx 8192 --chat_format functionary
But neither approach seems to work. Can anyone help?
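For reference, here is a minimal sketch of what I imagine the client-side route would look like, using the OpenAI Python client's extra_body parameter to add extra fields to the request body (the typed create() call has no top_k argument of its own). I have not confirmed that the llama-cpp-python server actually reads top_k from the body, so treat this as an assumption rather than something that works:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-xxx")

response = client.chat.completions.create(
    model="mistralai--Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": "hi"}],
    # extra_body merges these fields into the JSON request body;
    # assumption: the llama-cpp-python server honors top_k sent this way
    extra_body={"top_k": 1},
)
print(response.choices[0].message)

If there is instead a server-side flag for setting a default top_k, that would work for me too.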