61 questions
18 votes
7 answers
77k views
Error while installing Python package: llama-cpp-python
I am using Llama to create an application. Previously I used OpenAI, but I am looking for a free alternative. Based on my limited research, this library provides OpenAI-like API access, making it quite ...
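A minimal sketch of the OpenAI-style chat API the question refers to (the model path is a placeholder):

    from llama_cpp import Llama

    # Any local GGUF model file; the path here is a placeholder.
    llm = Llama(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf")

    # create_chat_completion mirrors OpenAI's chat-completion response shape.
    response = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Hello, who are you?"}]
    )
    print(response["choices"][0]["message"]["content"])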
13 votes
11 answers
77k views
llama-cpp-python not using NVIDIA GPU CUDA
I have been playing around with oobabooga text-generation-webui on my Ubuntu 20.04 with my NVIDIA GTX 1060 6GB for some weeks without problems. I have been using llama2-chat models sharing memory ...
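For reference, GPU offload in llama-cpp-python is opt-in; a sketch of the relevant parameter, assuming a CUDA-enabled build (model path is a placeholder):

    from llama_cpp import Llama

    # n_gpu_layers controls how many layers are offloaded to the GPU;
    # -1 offloads all of them. With the default of 0, inference stays on
    # the CPU even when the wheel was built with CUDA support.
    llm = Llama(
        model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",
        n_gpu_layers=-1,
        verbose=True,  # prints backend and offload info at load time
    )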
5 votes
2 answers
10k views
AssertionError when using llama-cpp-python in Google Colab
I'm trying to use llama-cpp-python (a Python wrapper around llama.cpp) to do inference using the Llama LLM in Google Colab. My code looks like this: !pip install llama-cpp-python from llama_cpp import ...
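This AssertionError often comes from a model file that failed to load (for example, a wrong path); a small sketch that fails with a clearer message instead (the Colab path is a placeholder):

    import os
    from llama_cpp import Llama

    model_path = "/content/model.gguf"  # placeholder Colab path
    # Llama() raises a bare AssertionError when loading fails, so checking
    # the file first gives a more useful error message.
    assert os.path.exists(model_path), f"model file not found: {model_path}"
    llm = Llama(model_path=model_path)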
5 votes
2 answers
14k views
Very slow response from LLM-based Q/A query engine
I built a Q/A query bot over a 4 MB CSV file I have locally. I'm using Chroma for vector DB creation, with the embedding model being Instructor Large from Hugging Face, and the LLM chat model being ...
4 votes
1 answer
3k views
How can I install llama-cpp-python with cuBLAS using poetry?
I can install llama cpp with cuBLAS using pip as below: CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python However, I don't know how to install it with cuBLAS when ...
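A sketch of one commonly suggested route: Poetry delegates the build to pip's machinery, which reads the same environment variables as the pip command quoted above:

    # Same build flags as the pip command in the question, passed to Poetry:
    CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 poetry add llama-cpp-python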
4 votes
1 answer
4k views
No GPU support while running llama-cpp-python inside a docker container
I'm trying to run llama index with llama cpp by following the installation docs but inside a docker container. Following this repo for installation of llama_cpp_python==0.2.6. DOCKERFILE # Use the ...
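Independent of the Dockerfile, the container also needs GPU access at run time (the host must have the NVIDIA Container Toolkit installed; the image name is a placeholder):

    # Build the image, then expose the host GPUs to the container:
    docker build -t llama-cpp-gpu .
    docker run --gpus all llama-cpp-gpu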
3 votes
1 answer
4k views
Llama.cpp GPU Offloading Issue - Unexpected Switch to CPU
I'm reaching out to the community for some assistance with an issue I'm encountering in llama.cpp. Previously, the program was successfully utilizing the GPU for execution. However, recently, it seems ...
3 votes
1 answer
5k views
RAG with Langchain and FastAPI: Stream generated answer and return source documents
I have built a RAG application with Langchain and now want to deploy it with FastAPI. Generally it works to call a FastAPI endpoint, and the answer of the LCEL chain gets streamed. However, I want ...
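A minimal sketch of one way to do this, assuming a `retriever` and an LCEL `chain` built elsewhere: emit the source documents as the first line, then stream the generated tokens:

    import json
    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse

    app = FastAPI()

    @app.get("/ask")
    async def ask(question: str):
        async def event_stream():
            # `retriever` and `chain` are assumed to be defined elsewhere.
            docs = retriever.invoke(question)
            # Send the source documents first, as one JSON line...
            yield json.dumps({"sources": [d.metadata for d in docs]}) + "\n"
            # ...then stream the generated answer chunk by chunk.
            async for chunk in chain.astream(question):
                yield chunk
        return StreamingResponse(event_stream(), media_type="text/plain")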
3 votes
1 answer
3k views
LLM model is not loading into the GPU even after BLAS = 1, LlamaCpp, Langchain, Mistral 7b GGUF Model
Confession: first of all, I am not an expert at all in this area; I am just practicing and trying to learn while working. Also, I am confused about whether this kind of model does not run on this type ...
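For context, BLAS = 1 in the load log only means the build has BLAS (e.g. cuBLAS) support; layers still have to be offloaded explicitly. A sketch with LangChain's LlamaCpp wrapper (model path is a placeholder):

    from langchain_community.llms import LlamaCpp

    llm = LlamaCpp(
        model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder
        n_gpu_layers=-1,  # offload all layers; the default keeps them on CPU
        n_batch=512,
        verbose=True,     # the load log should then show layers on the GPU
    )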
3 votes
0 answers
209 views
Cannot run inference with images on llama-cpp-python
I am new to this. I have been trying but could not make the model answer questions about images. from llama_cpp import Llama import torch from PIL import Image import base64 llm = Llama( model_path='Holo1-...
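For reference, image input in llama-cpp-python goes through a multimodal chat handler plus a separate CLIP projector file; a sketch with the LLaVA 1.5 handler (file paths are placeholders):

    import base64
    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler

    # The CLIP/projector file ships separately from the main GGUF model.
    chat_handler = Llava15ChatHandler(clip_model_path="./mmproj.gguf")  # placeholder
    llm = Llama(
        model_path="./llava-v1.5-7b.Q4_K_M.gguf",  # placeholder
        chat_handler=chat_handler,
        n_ctx=2048,  # larger context to make room for the image embedding
    )

    # Images are passed as data URIs inside an OpenAI-style message.
    with open("image.png", "rb") as f:
        data_uri = "data:image/png;base64," + base64.b64encode(f.read()).decode()

    response = llm.create_chat_completion(messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": data_uri}},
            {"type": "text", "text": "Describe this image."},
        ],
    }])
    print(response["choices"][0]["message"]["content"])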
2 votes
1 answer
973 views
How to use `llama-cpp-python` to output list of candidate tokens and their probabilities?
I want to choose my tokens manually, instead of letting llama-cpp-python automatically choose one for me. This requires me to see a list of candidate next tokens, along with their probabilities, ...
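A sketch using the OpenAI-style logprobs parameter of the completion API, which returns the top candidate tokens per position (model path is a placeholder):

    from llama_cpp import Llama

    # logits_all=True is required at construction time for logprobs to work.
    llm = Llama(model_path="./models/model.gguf", logits_all=True)  # placeholder

    # logprobs=N returns the N most likely tokens, with log-probabilities,
    # at each generated position, OpenAI-completion style.
    out = llm("The capital of France is", max_tokens=1, logprobs=10)
    print(out["choices"][0]["logprobs"]["top_logprobs"][0])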
2 votes
2 answers
4k views
Detecting GPU availability in llama-cpp-python
Question How can I programmatically check if llama-cpp-python is installed with support for a CUDA-capable GPU? Context In my program, I am trying to warn the developers when they fail to configure ...
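In recent versions, the low-level bindings expose llama.cpp's own capability check; a sketch (behaviour may vary across versions and backends):

    import llama_cpp

    # Thin binding around llama.cpp's llama_supports_gpu_offload(); returns
    # True when the installed wheel was built with a GPU backend (CUDA, Metal, ...).
    if llama_cpp.llama_supports_gpu_offload():
        print("GPU offload available")
    else:
        print("CPU-only build")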
2 votes
1 answer
4k views
Running Local LLMs in Production and handling multiple requests
I am trying to run a RAG with the Gemma LLM locally. It is running fine, but I can't handle more than one request at a time. Is there a way to handle concurrent requests while utilizing resources ...
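One commonly suggested route is the OpenAI-compatible server bundled with llama-cpp-python, which serves requests over HTTP instead of a single in-process model (model path is a placeholder):

    # Requires: pip install 'llama-cpp-python[server]'
    python -m llama_cpp.server --model ./models/gemma.gguf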
2 votes
1 answer
1k views
How to make an LLM remember previous runtime chats
I want my LLM chatbot to remember previous conversations even after restarting the program. It is made with llama-cpp-python and LangChain; it has conversation memory for the current chat, but obviously ...
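A minimal sketch of the usual approach: persist the message history to disk after each turn and reload it on startup (the file name and message format here are assumptions):

    import json
    import os

    HISTORY_FILE = "chat_history.json"  # hypothetical file name

    def load_history():
        # Reload earlier conversations when the program restarts.
        if os.path.exists(HISTORY_FILE):
            with open(HISTORY_FILE) as f:
                return json.load(f)
        return []

    def save_history(messages):
        # Call after each turn so nothing is lost on exit.
        with open(HISTORY_FILE, "w") as f:
            json.dump(messages, f)

    messages = load_history()
    messages.append({"role": "user", "content": "Hello again!"})
    save_history(messages)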
2 votes
0 answers
1k views
Connection error in langchain with llama2 model downloaded locally
raise ConnectionError(e, request=request) requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by ...
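Port 11434 is Ollama's default, so this error usually means the Ollama server isn't running; a quick reachability check, assuming a default local install:

    import requests

    try:
        requests.get("http://localhost:11434", timeout=2)
        print("Ollama server is reachable")
    except requests.exceptions.ConnectionError:
        print("Ollama is not running; start it with `ollama serve`")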