18 votes
7 answers
77k views

I am using Llama to create an application. Previously I used OpenAI but am looking for a free alternative. Based on my limited research, this library provides OpenAI-like API access, making it quite ...
hehe • 845
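A minimal sketch of the OpenAI-style chat API that llama-cpp-python exposes, assuming a local GGUF model file (the path is a placeholder):

    from llama_cpp import Llama

    # Load a local GGUF model; the path is a placeholder.
    llm = Llama(model_path="./models/llama-2-7b-chat.gguf", n_ctx=2048)

    # create_chat_completion mirrors the OpenAI chat-completions shape.
    out = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Name three uses of a local LLM."},
        ],
        max_tokens=128,
    )
    print(out["choices"][0]["message"]["content"])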
13 votes
11 answers
77k views

I have been playing around with oobabooga text-generation-webui on my Ubuntu 20.04 with my NVIDIA GTX 1060 6GB for some weeks without problems. I have been using llama2-chat models sharing memory ...
imbr • 7,872
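For the GPU/CPU memory-sharing setup this question describes, the common pattern with GGUF models is partial layer offload; a sketch with llama-cpp-python directly (the layer count and model path are assumptions to tune for a 6 GB card):

    from llama_cpp import Llama

    # Offload only part of the model to the 6 GB GPU; the rest stays in RAM.
    # n_gpu_layers is the knob to tune; -1 would offload everything.
    llm = Llama(
        model_path="./models/llama-2-7b-chat.gguf",  # placeholder path
        n_gpu_layers=20,  # assumption: roughly fits a 6 GB card for a 7B Q4 model
        n_ctx=2048,
    )
    print(llm("Q: What is 2 + 2? A:", max_tokens=8)["choices"][0]["text"])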
5 votes
2 answers
10k views

I'm trying to use llama-cpp-python (a Python wrapper around llama.cpp) to do inference using the Llama LLM in Google Colab. My code looks like this: !pip install llama-cpp-python from llama_cpp import ...
Utrax • 53
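In Colab the usual stumbling block is building the wheel with GPU support; a hedged sketch of a notebook cell, reusing the same CMake flag that appears elsewhere on this page (the flag name has changed across llama.cpp releases):

    # Notebook cell: build llama-cpp-python with cuBLAS, then run inference.
    !CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir

    from llama_cpp import Llama

    llm = Llama(model_path="/content/model.gguf", n_gpu_layers=-1)  # placeholder path
    print(llm("Hello", max_tokens=16)["choices"][0]["text"])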
5 votes
2 answers
14k views

I built a Q/A query bot over a 4 MB CSV file I have locally. I'm using Chroma for vector DB creation, with the embedding model being Instructor Large from Hugging Face, and the LLM chat model being ...
Avish Wagde
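A sketch of the described pipeline with the classic LangChain imports (module paths have moved across LangChain versions, so treat these as assumptions):

    from langchain.document_loaders import CSVLoader
    from langchain.embeddings import HuggingFaceInstructEmbeddings
    from langchain.vectorstores import Chroma

    # Load the local CSV and embed it with Instructor Large into a Chroma store.
    docs = CSVLoader("data.csv").load()  # placeholder filename
    embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
    db = Chroma.from_documents(docs, embeddings, persist_directory="./chroma_db")

    # Retrieval step; the chat LLM from the question would consume these chunks.
    hits = db.similarity_search("What does row 42 say?", k=4)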
4 votes
1 answer
3k views

I can install llama-cpp-python with cuBLAS using pip as below: CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python However, I don't know how to install it with cuBLAS when ...
KimuGenie
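One way to script the same cuBLAS build, shown as a Python sketch that forwards CMAKE_ARGS through the environment (the flag has since been renamed in newer llama.cpp releases, so adjust per version):

    import os
    import subprocess
    import sys

    # Reproduce the shell one-liner from inside Python (e.g. a setup script).
    env = dict(os.environ, CMAKE_ARGS="-DLLAMA_CUBLAS=on", FORCE_CMAKE="1")
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "llama-cpp-python", "--no-cache-dir"],
        env=env,
    )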
4 votes
1 answer
4k views

I'm trying to run llama-index with llama.cpp by following the installation docs, but inside a Docker container. Following this repo for installation of llama_cpp_python==0.2.6. DOCKERFILE # Use the ...
Pratyush
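Once the wheel builds inside the container, wiring llama_cpp_python==0.2.6 into llama-index looked roughly like this at the time (import path and parameters are version-dependent assumptions):

    from llama_index.llms import LlamaCPP

    # llama-index wrapper around a local llama.cpp model; paths are placeholders.
    llm = LlamaCPP(
        model_path="/models/llama-2-7b-chat.gguf",
        temperature=0.1,
        max_new_tokens=256,
        context_window=2048,
        model_kwargs={"n_gpu_layers": -1},  # forwarded to llama-cpp-python
    )
    print(llm.complete("What is a Docker container?"))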
3 votes
1 answer
4k views

I'm reaching out to the community for some assistance with an issue I'm encountering in llama.cpp. Previously, the program was successfully utilizing the GPU for execution. However, recently, it seems ...
Montassar Jaziri
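A quick way to check whether layers are actually being offloaded: verbose loading typically prints the offload count in the llama.cpp load log (sketch, model path assumed):

    from llama_cpp import Llama

    # verbose=True makes llama.cpp print its load log to stderr; look for a line
    # like "offloaded N/N layers to GPU". If it never appears, the wheel was
    # probably built without CUDA and needs reinstalling with the CMake flag.
    llm = Llama(
        model_path="./models/model.gguf",  # placeholder
        n_gpu_layers=-1,
        verbose=True,
    )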
3 votes
1 answer
5k views

I have built a RAG application with LangChain and now want to deploy it with FastAPI. Generally it works to call a FastAPI endpoint and have the answer of the LCEL chain streamed. However I want ...
Maxl Gemeinderat
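A sketch of streaming an LCEL chain through FastAPI with StreamingResponse; the trivial RunnableLambda stands in for the real RAG chain, since any LCEL runnable streams the same way:

    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse
    from langchain_core.runnables import RunnableLambda

    app = FastAPI()

    # Stand-in for the real RAG chain from the question.
    chain = RunnableLambda(lambda x: "answer to: " + x["question"])

    @app.get("/ask")
    async def ask(question: str):
        async def token_stream():
            # .astream() yields chunks as the chain produces them.
            async for chunk in chain.astream({"question": question}):
                yield str(chunk)
        return StreamingResponse(token_stream(), media_type="text/plain")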
3 votes
1 answer
3k views

Confession: I am not an expert at all in this area; I am just practicing and trying to learn while working. Also, I am confused about whether this kind of model simply does not run on this type ...
Mahmud Arfan
3 votes
0 answers
209 views

I am new to this. I have been trying but could not make the model answer questions about images. from llama_cpp import Llama import torch from PIL import Image import base64 llm = Llama( model_path='Holo1-...
Abhash Rai
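For vision-capable GGUF models, llama-cpp-python routes images through a multimodal chat handler plus a CLIP projector file; a hedged sketch with the LLaVA 1.5 handler (whether the Holo1 model from the excerpt works this way is an open assumption):

    import base64
    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler

    # Both model files are placeholders; the projector (mmproj) file is required.
    chat_handler = Llava15ChatHandler(clip_model_path="./mmproj.gguf")
    llm = Llama(model_path="./llava-model.gguf", chat_handler=chat_handler, n_ctx=4096)

    # Images are passed as base64 data URIs inside the message content.
    with open("photo.jpg", "rb") as f:
        data_uri = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

    out = llm.create_chat_completion(messages=[
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": data_uri}},
            {"type": "text", "text": "Describe this image."},
        ]},
    ])
    print(out["choices"][0]["message"]["content"])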
2 votes
1 answer
973 views

I want to manually choose my tokens myself, instead of letting llama-cpp-python automatically choose one for me. This requires me to see a list of candidate next tokens, along with their probabilities, ...
caveman • 464
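The high-level completion API can surface candidate tokens via the logprobs parameter, which is one way to drive manual selection one token at a time (sketch, model path assumed):

    from llama_cpp import Llama

    llm = Llama(model_path="./models/model.gguf")  # placeholder

    # Ask for one token plus the top-10 alternatives with log-probabilities.
    out = llm("The capital of France is", max_tokens=1, logprobs=10)
    top = out["choices"][0]["logprobs"]["top_logprobs"][0]
    for token, logprob in top.items():
        print(repr(token), logprob)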
2 votes
2 answers
4k views

Question How can I programmatically check if llama-cpp-python is installed with support for a CUDA-capable GPU? Context In my program, I am trying to warn the developers when they fail to configure ...
Programmer.zip
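Recent llama-cpp-python versions expose llama.cpp's low-level capability probe, which is one hedged way to answer this at runtime (availability depends on the installed version):

    import llama_cpp

    # True only if the wheel was compiled with a GPU backend (e.g. CUDA, Metal).
    if llama_cpp.llama_supports_gpu_offload():
        print("GPU offload available")
    else:
        print("CPU-only build: reinstall with the CUDA CMake flags")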
2 votes
1 answer
4k views

I am trying to run a RAG with the Gemma LLM locally. It is running fine, but I can't handle more than one request at a time. Is there a way to handle concurrent requests while utilizing resources ...
khalidwalamri
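A single in-process Llama instance is not safe to call from several requests at once, so one workable pattern is to serialize model access behind an asyncio lock while the web layer stays concurrent; a sketch with FastAPI (model path assumed; for real throughput a batching inference server would be the better fix):

    import asyncio
    from fastapi import FastAPI
    from llama_cpp import Llama

    app = FastAPI()
    llm = Llama(model_path="./models/gemma.gguf")  # placeholder
    lock = asyncio.Lock()

    @app.get("/generate")
    async def generate(prompt: str):
        # Queue overlapping requests instead of corrupting the shared model state.
        async with lock:
            out = await asyncio.to_thread(llm, prompt, max_tokens=128)
        return {"text": out["choices"][0]["text"]}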
2 votes
1 answer
1k views

I want my LLM chatbot to remember previous conversations even after restarting the program. It is made with llama-cpp-python and LangChain; it has conversation memory for the present chat, but obviously ...
QUARKS • 29
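One way LangChain persists chat history across restarts is a file-backed message store plugged into the memory object; a sketch (import paths vary across LangChain versions):

    from langchain.memory import ConversationBufferMemory
    from langchain.memory.chat_message_histories import FileChatMessageHistory

    # Messages are written to chat_history.json, so they survive a restart.
    memory = ConversationBufferMemory(
        chat_memory=FileChatMessageHistory("chat_history.json"),
        return_messages=True,
    )

    memory.chat_memory.add_user_message("Hi, remember me?")
    memory.chat_memory.add_ai_message("Of course!")
    print(memory.load_memory_variables({}))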
2 votes
0 answers
1k views

raise ConnectionError(e, request=request) requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by ...
Abhishek Kapoor
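That traceback means nothing is listening on localhost:11434, Ollama's default port; the usual fix is to start the server (ollama serve) and post to /api/generate without the trailing slash. A sketch of a well-formed request, assuming the model has been pulled:

    import requests

    # Assumes `ollama serve` is running and `ollama pull llama2` has completed.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["response"])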
