61 questions
0 votes
1 answer
413 views
How to properly install llama-cpp-python on Windows 11 with GPU support
I have been trying to install llama-cpp-python on Windows 11 with GPU support for a while, and it just doesn't work no matter what I try. I installed the necessary Visual Studio toolkit packages, ...
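A minimal sketch of one commonly reported route to a CUDA-enabled build on Windows, assuming the CUDA Toolkit and the Visual Studio C++ workload are installed; the CMake flag name varies between llama-cpp-python releases, and the model path is hypothetical.

    # Build with CUDA support (PowerShell); newer releases use -DGGML_CUDA=on,
    # older ones used -DLLAMA_CUBLAS=on:
    #   $env:CMAKE_ARGS = "-DGGML_CUDA=on"
    #   pip install llama-cpp-python --no-cache-dir --force-reinstall --verbose
    #
    # Sanity check that layers actually land on the GPU:
    from llama_cpp import Llama

    llm = Llama(
        model_path="model.gguf",  # hypothetical path
        n_gpu_layers=-1,          # offload all layers; the load log should report "offloaded ... to GPU"
        verbose=True,
    )
    print(llm("Hello", max_tokens=8)["choices"][0]["text"])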
1 vote
0 answers
324 views
Failed to build installable wheels for some pyproject.toml based projects llama-cpp-python
I tried to install llama-cpp-python via pip, but I get an error during the installation. The command I ran: CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip install ...
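A minimal sketch of a common retry for a failed Metal wheel build, assuming macOS with the Xcode command line tools; the GGML_* flag names are an assumption about recent releases (older releases used the LLAMA_* flags quoted in the question).

    # Refresh the build tooling, then rebuild with verbose output (shell):
    #   pip install --upgrade pip setuptools wheel cmake ninja
    #   CMAKE_ARGS="-DGGML_METAL=on -DGGML_METAL_EMBED_LIBRARY=on" \
    #       pip install llama-cpp-python --no-cache-dir --verbose
    #
    # Confirm the wheel imports once the build succeeds:
    import llama_cpp
    print(llama_cpp.__version__)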
1 vote
0 answers
193 views
llama-cpp-python installing for x86_64 instead of arm64
I am trying to set up local, high-speed NLP but am failing to install the arm64 version of llama-cpp-python. Even when I run CMAKE_ARGS="-DLLAMA_METAL=on -DLLAMA_METAL_EMBED_LIBRARY=on" \ ...
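A minimal diagnostic sketch, assuming the usual cause: if the Python interpreter itself runs under Rosetta as x86_64, pip builds x86_64 wheels regardless of CMAKE_ARGS.

    import platform

    print(platform.machine())   # 'arm64' for a native Apple Silicon interpreter
    print(platform.platform())  # an x86_64 result here means the interpreter is the culprit
    # If it reports x86_64, install a native arm64 Python (python.org, native Homebrew,
    # or a native conda) and re-run the CMAKE_ARGS install from that interpreter.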
3 votes
0 answers
209 views
Cannot run inference with images on llama-cpp-python
I am new to this. I have been trying but could not make the model answer questions about images. from llama_cpp import Llama import torch from PIL import Image import base64 llm = Llama( model_path='Holo1-...
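A minimal sketch of the usual llama-cpp-python multimodal pattern for LLaVA-style models: a separate mmproj (CLIP) file plus a chat handler. The handler class, file names, and prompt are assumptions; the right handler depends on the model family, not necessarily this one.

    import base64
    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler

    chat_handler = Llava15ChatHandler(clip_model_path="mmproj.gguf")  # hypothetical path
    llm = Llama(model_path="model.gguf", chat_handler=chat_handler, n_ctx=4096)

    with open("screenshot.png", "rb") as f:
        data_uri = "data:image/png;base64," + base64.b64encode(f.read()).decode()

    out = llm.create_chat_completion(messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": data_uri}},
            {"type": "text", "text": "Describe this image."},
        ],
    }])
    print(out["choices"][0]["message"]["content"])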
0 votes
0 answers
100 views
llama-cpp and transformers with PyInstaller when creating a .exe file
I am attempting to bundle a RAG agent into a .exe. However, when using the .exe I keep running into the same two problems. The first problem is with locating llama-cpp, which I have fixed. The ...
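A minimal sketch of a workaround often used when PyInstaller misses the native libraries (llama.dll / libllama.so) that llama-cpp-python ships; the command and paths are assumptions about a typical project layout.

    # Collect whole packages so their data files and binaries get bundled (shell):
    #   pyinstaller --onefile --collect-all llama_cpp --collect-all transformers app.py
    #
    # Equivalent hook call for a .spec file:
    from PyInstaller.utils.hooks import collect_all

    datas, binaries, hiddenimports = collect_all("llama_cpp")
    # Pass these into the Analysis(...) entry of the spec file.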
-1 votes
2 answers
599 views
Getting an error on a Windows PC while running pip install llama-cpp-python
Creating directory "llava_shared.dir\Release". Structured output is enabled. The formatting of compiler diagnostics will reflect the error hierarchy. See https://aka.ms/cpp/structured-output ...
0 votes
0 answers
97 views
Generating an n-gram dataset based on an LLM
I want a dataset of common n-grams and their log likelihoods. Normally I would download the Google Books Ngram Exports, but I wonder if I can generate a better dataset using a large language model. ...
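A minimal sketch of one way to score an n-gram with llama-cpp-python, assuming prompt-token log-probabilities via echo=True; whether model scores are a good substitute for corpus counts is exactly the open question in the post, and the model path is hypothetical.

    from llama_cpp import Llama

    llm = Llama(model_path="model.gguf", logits_all=True)  # logits_all is needed for prompt logprobs

    def ngram_logprob(text: str) -> float:
        n_prompt = len(llm.tokenize(text.encode("utf-8")))
        out = llm(text, max_tokens=1, echo=True, logprobs=1)
        token_logprobs = out["choices"][0]["logprobs"]["token_logprobs"][:n_prompt]
        return sum(lp for lp in token_logprobs if lp is not None)  # the first token has no logprob

    print(ngram_logprob("the quick brown fox"))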
1 vote
0 answers
147 views
Does Ollama guarantee cross-platform determinism with identical quantization, seed, temperature, and version but differing hardware?
I’m working on a project that requires fully deterministic outputs across different machines using Ollama. I’ve ensured the following parameters are identical: Model quantization (e.g., llama2:7b-q4_0)...
0 votes
0 answers
265 views
Why Does Running the LLaMA 13B Model with llama_cpp on a CPU Take Excessive Time and Produce Poor Outputs?
I'm experiencing significant performance and output quality issues when running the LLaMA 13B model using the llama_cpp library on my laptop. The same setup works efficiently with the LLaMA 7B model. ...
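A minimal sketch of the settings that usually dominate CPU speed with llama-cpp-python (thread count, context, batch size); the quantization and values shown are illustrative assumptions, not a guaranteed fix for a 13B model on laptop hardware.

    import multiprocessing
    from llama_cpp import Llama

    llm = Llama(
        model_path="llama-13b.Q4_K_M.gguf",     # hypothetical quantized model
        n_threads=multiprocessing.cpu_count(),  # try physical-core count if hyperthreading hurts
        n_ctx=2048,
        n_batch=512,
    )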
0 votes
0 answers
279 views
How do you enable runtime repack in llama-cpp-python?
After updating llama-cpp-python I am getting an error when trying to run an ARM-optimized GGUF model: TYPE_Q4_0_4_4 REMOVED, use Q4_0 with runtime repacking. After looking into it, the error comes from ...
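A minimal sketch under the assumption that recent llama.cpp builds reject the Q4_0_4_4 files outright and instead repack a plain Q4_0 GGUF at load time, with no extra Python flag; the file name is hypothetical and the behaviour is version-dependent.

    from llama_cpp import Llama

    # Load the plain Q4_0 quantization and let the CPU backend repack it at load time.
    llm = Llama(model_path="model.Q4_0.gguf", n_ctx=4096, verbose=True)
    # On supported ARM CPUs the load log should mention repacking / optimized kernels.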
2 votes
1 answer
1k views
How to make an LLM remember previous runtime chats
I want my LLM chatbot to remember previous conversations even after restarting the program. It is made with llama-cpp-python and LangChain; it has conversation memory of the present chat, but obviously ...
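A minimal sketch of the usual approach, assuming persistence simply means writing the chat history to disk and reloading it at startup; the file name is hypothetical and the snippet is independent of whichever LangChain memory class is in use.

    import json, os

    HISTORY_FILE = "chat_history.json"  # hypothetical path

    def load_history():
        if os.path.exists(HISTORY_FILE):
            with open(HISTORY_FILE) as f:
                return json.load(f)
        return []

    def save_history(history):
        with open(HISTORY_FILE, "w") as f:
            json.dump(history, f)

    history = load_history()
    history.append({"role": "user", "content": "Hello again"})
    # ... run the model with `history` prepended as context, append its reply, then:
    save_history(history)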
1 vote
0 answers
72 views
My Llama 2 model is talking to itself, asking questions and answering them, using ConversationalRetrievalChain
I was implementing RAG on a document using the Llama 2 model, but my model is asking itself questions and answering them. llm = LlamaCpp(model_path=model_path, temperature=0, ...
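A minimal sketch of a common mitigation, assuming the model keeps generating past its answer because no stop sequence matches the prompt template; the stop strings and model path are template-dependent assumptions.

    from langchain_community.llms import LlamaCpp

    llm = LlamaCpp(
        model_path="llama-2-13b-chat.Q4_K_M.gguf",  # hypothetical path
        temperature=0,
        stop=["[INST]", "Question:", "\nUser:"],    # cut generation before it starts "asking itself"
    )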
0 votes
0 answers
180 views
Unable to set top_k value in llama-cpp-python server
I start the llama-cpp-python server with the command: python -m llama_cpp.server --model D:\Mistral-7B-Instruct-v0.3.Q4_K_M.gguf --n_ctx 8192 --chat_format functionary Then I run my Python script, which ...
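A minimal sketch assuming the server's OpenAI-style HTTP API accepts sampling extras such as top_k per request rather than on the command line; whether a given chat_format honors them is version-dependent, and the port is the default assumption.

    import requests

    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": "Hello"}],
            "top_k": 40,        # per-request sampling extension
            "temperature": 0.7,
        },
    )
    print(resp.json()["choices"][0]["message"]["content"])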
2 votes
1 answer
972 views
How to use `llama-cpp-python` to output a list of candidate tokens and their probabilities?
I want to choose my tokens manually, instead of letting llama-cpp-python automatically choose one for me. This requires me to see a list of candidate next tokens, along with their probabilities, ...
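A minimal sketch: requesting a single token with logprobs=N returns the top-N candidate tokens with their log-probabilities, which can then be picked from manually; the model path and prompt are hypothetical, and logits_all=True is assumed to be required for logprobs.

    import math
    from llama_cpp import Llama

    llm = Llama(model_path="model.gguf", logits_all=True)  # hypothetical path

    out = llm("The capital of France is", max_tokens=1, logprobs=10)
    top = out["choices"][0]["logprobs"]["top_logprobs"][0]  # dict: token -> log-probability
    for token, logprob in sorted(top.items(), key=lambda kv: -kv[1]):
        print(repr(token), math.exp(logprob))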
0 votes
2 answers
861 views
How do I stream output as it is being generated by an LLM in Streamlit?
code: from langchain_community.vectorstores import FAISS from langchain_community.embeddings import HuggingFaceEmbeddings from langchain import PromptTemplate from langchain_community.llms import ...
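A minimal sketch assuming Streamlit >= 1.31, where st.write_stream renders text incrementally from a generator; the model path and wiring are assumptions, not the asker's chain.

    import streamlit as st
    from langchain_community.llms import LlamaCpp

    llm = LlamaCpp(model_path="model.gguf")  # hypothetical path

    prompt = st.text_input("Ask a question")
    if prompt:
        # .stream() yields text chunks as they are generated
        st.write_stream(llm.stream(prompt))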