DHOB (IU5SGN): Ollama


Monday, 30 December 2024

Ollama and Jetson Nano 4G

Still trying to put the Jetson Nano to serious use, I tried running Ollama on it by following the instructions at https://dev.to/ajeetraina/running-ollama-2-on-nvidia-jetson-nano-with-gpu-using-docker-hfi (I used Docker because the install script from the Ollama site fails: it requires a higher compute capability than the Jetson Nano provides)

To set up the Docker container, run the following commands

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
    | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
    | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
    | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update

sudo apt-get install -y nvidia-container-toolkit

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

sudo docker run -d --gpus=all --runtime nvidia -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

sudo docker exec -it ollama ollama run llama2

Some people on GitHub complain that the Docker container does not use the Jetson's GPU, but after installing jtop (via sudo pip3 install -U jetson-stats) you can see that the GPU load is not zero

jtop screenshot

To check whether Ollama is running, just knock on port 11434
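As a rough way to "knock" from a script, here is a minimal sketch using only the Python standard library. It assumes the server is reachable at http://localhost:11434 and that its root page replies with the text "Ollama is running"; the function names are mine, not part of Ollama.

```python
import http.client
import urllib.request


def is_ollama_banner(body: str) -> bool:
    """Return True if the body looks like the banner served on Ollama's root page."""
    return "Ollama is running" in body


def check_ollama(base_url: str = "http://localhost:11434", timeout: float = 3.0) -> bool:
    """Return True if an Ollama server answers at base_url, False otherwise."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return is_ollama_banner(resp.read().decode("utf-8", errors="replace"))
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error or malformed URL: treat as not running.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", check_ollama())
```

If the container was started with a different port mapping than -p 11434:11434, pass the matching URL to check_ollama.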

My Jetson has 4 GB of RAM, so only a few models are able to run (unless you quantize them)

The following models work

tinyllama

llama3.2:1b

starcoder:1b

These, on the other hand, do not work

orca-mini:3b (requires 5.1 GiB of RAM)

mistral-small (requires 14.4 GiB)

llama3.2
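The arithmetic behind the list above is simply "required memory versus installed memory". The sketch below makes that comparison explicit, using the two figures quoted above. The helper names and the 4 GiB fallback value are assumptions for illustration; Ollama's own check looks at the memory actually available, so treat this as a first approximation only.

```python
from pathlib import Path

# Memory requirements quoted in the post, in GiB.
REQUIRED_GIB = {"orca-mini:3b": 5.1, "mistral-small": 14.4}

# Hypothetical fallback when /proc/meminfo is missing: exactly 4 GiB.
SAMPLE_MEMINFO = "MemTotal:        4194304 kB\n"


def parse_mem_total_gib(meminfo_text: str) -> float:
    """Extract MemTotal from /proc/meminfo-style text and convert kB to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) / (1024 ** 2)
    raise ValueError("MemTotal not found")


def fits_in_memory(required_gib: float, total_gib: float) -> bool:
    """Return True if the model's requirement does not exceed the installed RAM."""
    return required_gib <= total_gib


if __name__ == "__main__":
    path = Path("/proc/meminfo")
    text = path.read_text() if path.exists() else SAMPLE_MEMINFO
    total = parse_mem_total_gib(text)
    for model, need in REQUIRED_GIB.items():
        verdict = "fits" if fits_in_memory(need, total) else "too large"
        print(f"{model}: needs {need} GiB, {total:.1f} GiB installed -> {verdict}")
```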

To interact with it you can install Open WebUI (which is generally much slower than the command line) via Docker

docker run -d -p 3000:8080 --runtime=nvidia --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui2 --restart always ghcr.io/open-webui/open-webui:cuda

docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda

The only truly positive aspect of the Jetson Nano is its maximum power draw of 5 W

In any case, given this simple prompt in Italian, TinyLlama made a truly painful series of errors

>>> tell me the story of Albert Einstein

Einstein, one of the greatest thinkers and physicists of the 20th century, was born in 1879 in Ulm, Germany. His childhood was marked by a strong bond with the natural elements and a great affinity for mathematics.

In 1890 Einstein began middle school in Kiel and Karlsruhe, where he started studying physics and mathematics. In 1896 he joined the Max Planck High School in Göttingene, where he met his future teaching companion, Max Planck.

Einstein studied philosophy, literature and languages, but his true passion was physics. During his first university years he approached the theory of Special relativity of Albert Michelson and George Gamow, who were trying to explain the photoelectric effect.

In 1905 Einstein published his experiments on the electronic effect, which demonstrated the connection between waves and particles. This discovery opened him to Albert Einstein's theory of general relativity, developed after 1905. In 1914 Einstein joined the German military land corps during the First World War and was captured by Austrian soldiers. During his arrest Einstein wrote his famous letter to Max Planck, in which he asked to be held responsible for the lifetime he had spent in prison. In 1919 Einstein received the Nobel medal in physics for his theory of general relativity. However, he also received a series of threats and dark messages, including a final threat that forces him to leave Austria in 1933.

During the Nazi regime, Einstein retired to Switzerland, where he continued to study physics. In 1932 he received the Copley medal for his academic career. However, he joined the Order of Malta in 1933 and was invited to leave the United States. In 1935 Einstein published his book "General Relativity", which became an international success. However, he received more and more threats and dark messages, and in 1938 he wrote his famous formula of the Einstein-Rosen bridge.

During the Christmas Year 1939 Einstein was attacked with an atomic bomb in Prague, Czechoslovakia. His health deteriorated further in the following years and he died on 18 April 1955 in Princeton, New Jersey. Einstein left an indelible mark on science and culture, and his theory of general relativity remains one of the foundational foundations of present-day physics. His story is an example of how curiosity, initiative and determination can lead to great scientific results.

Tuesday, 17 September 2024

RAG with Ollama Mistral and LangChain

Another test, using this repository: https://github.com/CallumJMac/lessons

The reference folder is lessons/1. RAG/examples/pixegami /PDF_files_langchain/rag-tutorial-v2-main

The PDF files go in the data folder

Then run populate_database.py. Persistence is provided by ChromaDB

import argparse
import os
import shutil
from langchain.document_loaders.pdf import PyPDFDirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.schema.document import Document
from get_embedding_function import get_embedding_function
from langchain.vectorstores.chroma import Chroma


CHROMA_PATH = "chroma"
DATA_PATH = "data"


def main():

    # Check if the database should be cleared (using the --reset flag).
    parser = argparse.ArgumentParser()
    parser.add_argument("--reset", action="store_true", help="Reset the database.")
    args = parser.parse_args()
    if args.reset:
        print("✨ Clearing Database")
        clear_database()

    # Create (or update) the data store.
    documents = load_documents()
    chunks = split_documents(documents)
    add_to_chroma(chunks)


def load_documents():
    document_loader = PyPDFDirectoryLoader(DATA_PATH)
    return document_loader.load()


def split_documents(documents: list[Document]):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=800,
        chunk_overlap=80,
        length_function=len,
        is_separator_regex=False,
    )
    return text_splitter.split_documents(documents)


def add_to_chroma(chunks: list[Document]):
    # Load the existing database.
    db = Chroma(
        persist_directory=CHROMA_PATH,
        embedding_function=get_embedding_function()
    )

    # Calculate Page IDs.
    chunks_with_ids = calculate_chunk_ids(chunks)

    # Add or Update the documents.
    existing_items = db.get(include=[])  # IDs are always included by default
    existing_ids = set(existing_items["ids"])
    print(f"Number of existing documents in DB: {len(existing_ids)}")

    # Only add documents that don't exist in the DB.
    new_chunks = []
    for chunk in chunks_with_ids:
        if chunk.metadata["id"] not in existing_ids:
            new_chunks.append(chunk)

    if len(new_chunks):
        print(f"👉 Adding new documents: {len(new_chunks)}")
        new_chunk_ids = [chunk.metadata["id"] for chunk in new_chunks]
        db.add_documents(new_chunks, ids=new_chunk_ids)
        db.persist()
    else:
        print("✅ No new documents to add")


def calculate_chunk_ids(chunks):

    # This will create IDs like "data/monopoly.pdf:6:2"
    # Page Source : Page Number : Chunk Index

    last_page_id = None
    current_chunk_index = 0

    for chunk in chunks:
        source = chunk.metadata.get("source")
        page = chunk.metadata.get("page")
        current_page_id = f"{source}:{page}"

        # If the page ID is the same as the last one, increment the index.
        if current_page_id == last_page_id:
            current_chunk_index += 1
        else:
            current_chunk_index = 0

        # Calculate the chunk ID.
        chunk_id = f"{current_page_id}:{current_chunk_index}"
        last_page_id = current_page_id

        # Add it to the page meta-data.
        chunk.metadata["id"] = chunk_id

    return chunks


def clear_database():
    if os.path.exists(CHROMA_PATH):
        shutil.rmtree(CHROMA_PATH)


if __name__ == "__main__":
    main()
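To make the ID scheme in calculate_chunk_ids easier to follow, here is a small standalone sketch of the same idea. FakeChunk is a hypothetical stand-in for a LangChain Document (only its metadata dict is used), and the sample pages are made up.

```python
from dataclasses import dataclass, field


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a LangChain Document: only metadata matters here."""
    metadata: dict = field(default_factory=dict)


def assign_chunk_ids(chunks):
    """Tag each chunk with 'source:page:index', restarting the index on every new page."""
    last_page_id = None
    index = 0
    for chunk in chunks:
        page_id = f"{chunk.metadata.get('source')}:{chunk.metadata.get('page')}"
        index = index + 1 if page_id == last_page_id else 0
        chunk.metadata["id"] = f"{page_id}:{index}"
        last_page_id = page_id
    return chunks


chunks = [
    FakeChunk({"source": "data/monopoly.pdf", "page": 6}),
    FakeChunk({"source": "data/monopoly.pdf", "page": 6}),
    FakeChunk({"source": "data/monopoly.pdf", "page": 7}),
]
ids = [c.metadata["id"] for c in assign_chunk_ids(chunks)]
print(ids)
# ['data/monopoly.pdf:6:0', 'data/monopoly.pdf:6:1', 'data/monopoly.pdf:7:0']

# Same filtering idea as add_to_chroma: skip IDs that are already stored.
existing_ids = {"data/monopoly.pdf:6:0"}
new_ids = [i for i in ids if i not in existing_ids]
print(new_ids)
# ['data/monopoly.pdf:6:1', 'data/monopoly.pdf:7:0']
```

Because the IDs are deterministic, running the population script again on the same PDFs only adds chunks that are not already in the store.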

After that, you can run a query from the command line, for example

python query_data.py "what's monopoly"

import argparse
from langchain.vectorstores.chroma import Chroma
from langchain.prompts import ChatPromptTemplate
from langchain_community.llms.ollama import Ollama

from get_embedding_function import get_embedding_function

CHROMA_PATH = "chroma"

PROMPT_TEMPLATE = """
Answer the question based only on the following context:

{context}

---

Answer the question based on the above context: {question}
"""


def main():
    # Create CLI.
    parser = argparse.ArgumentParser()
    parser.add_argument("query_text", type=str, help="The query text.")
    args = parser.parse_args()
    query_text = args.query_text
    query_rag(query_text)


def query_rag(query_text: str):
    # Prepare the DB.
    embedding_function = get_embedding_function()
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)

    # Search the DB.
    results = db.similarity_search_with_score(query_text, k=5)

    context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
    prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
    prompt = prompt_template.format(context=context_text, question=query_text)
    # print(prompt)

    model = Ollama(model="mistral")
    response_text = model.invoke(prompt)

    sources = [doc.metadata.get("id", None) for doc, _score in results]
    formatted_response = f"Response: {response_text}\nSources: {sources}"
    print(formatted_response)
    return response_text


if __name__ == "__main__":
    main()
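The core of query_rag is the prompt assembly: the retrieved chunks are joined with a separator and dropped into the template together with the question. Here is a stripped-down sketch of that step using plain str.format instead of ChatPromptTemplate (which may add its own role prefix); the two sample chunks are invented for illustration.

```python
PROMPT_TEMPLATE = """
Answer the question based only on the following context:

{context}

---

Answer the question based on the above context: {question}
"""


def build_prompt(chunks: list[str], question: str) -> str:
    """Join the retrieved chunks and fill in the template, as query_rag does."""
    context = "\n\n---\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Made-up chunks standing in for the similarity search results.
sample_chunks = [
    "MONOPOLY is a property trading game.",
    "The Banker holds the Title Deed cards.",
]
prompt = build_prompt(sample_chunks, "what's monopoly")
print(prompt)
```

Printing the prompt like this (the original script has the print commented out) is a quick way to see exactly what text the model receives.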

Just to give an idea, this is the answer

Response: Monopoly is a property trading game from Parker Brothers designed for ages 8 and up, suitable for 2 to 8 players. The gameboard is used along with tokens, houses, hotels, Chance and Community Chest cards, Title Deed cards, play money, and a Banker's tray. Players can choose to play by the classic rules or use the Speed Die for faster gameplay. In Monopoly, the objective is to become the wealthiest player by buying, renting, and selling properties.

Using LLama3:7b the answer was

A simple one!

According to the context, Monopoly is a "Property Trading Game" from Parker Brothers.

RAG with Ollama Gemma2 and LangChain

Update:

To get answers in Italian (instead of the default English), just modify the context template

<|context|>
You are an AI assistant that follows instruction extremely well.
Please be truthful and give direct answers
Answer in italian
</s>

This is the answer to the question "Descrivimi il monopoly" ("Describe Monopoly to me")

Monopoly è un gioco da tavolo di strategia e fortuna.

I giocatori comprano, vendono e scambiano proprietà immobiliari sul tabellone, cercando di accumulare il maggior patrimonio possibile.

Si lanciano i dadi per muoversi sul tabellone, acquistando le caselle che incontrano. Alcune caselle comportano pagamenti o attività specifiche, come andare in prigione o pagare tasse.

Il gioco prosegue finché tutti tranne uno dei giocatori sono falliti a causa del debito. Il vincitore è il giocatore che rimane con denaro e proprietà alla fine.

For the record, the Monopoly instructions PDF was in English

=========================================

Following the trend of the moment, I wanted to try building a RAG (Retrieval Augmented Generation) application starting from Google's Gemma2 model and feeding it personal PDFs as additional context (RAG retrieves passages from them at query time rather than retraining the model). For this test I followed the instructions in this notebook

First: to use Gemma2 the machine must have at least 9 GB of free RAM (for the record, the Llama 70B model needs at least 40 GB of free RAM, so that is out of the question)

Second: the most convenient way to install the models is Ollama

Third: many examples rely on external APIs such as https://jina.ai/ or OpenAI, but these offer a limited free tier and then the service has to be paid for. I wanted to set up an offline solution independent of third-party cloud services

So download the model and a text embedding model (in this case Nomic-Embed-Text)

ollama pull gemma2

ollama pull nomic-embed-text

Then install the library packages (preferably in a venv)

pip install langchain chromadb langchain_community pypdf

ChromaDB is the database in which the vector stores are saved so that the indexed documents persist. It is essentially an AI-oriented database built on SQLite; an alternative with a real server is Milvus
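To give an idea of what a vector store does at query time, here is a toy sketch: each text is stored next to its embedding vector, and a query returns the texts whose vectors are closest to the query vector. The 3-dimensional vectors are made up, and cosine similarity is used only for illustration; Chroma's actual index and distance metric may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """store is a list of (text, vector); return the k texts most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "embeddings": invented numbers, not produced by nomic-embed-text.
store = [
    ("Buy properties and charge rent.", [0.9, 0.1, 0.0]),
    ("Claim railway routes between cities.", [0.1, 0.9, 0.0]),
    ("Roll the dice and move your token.", [0.7, 0.2, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]
print(top_k(query_vec, store, k=2))
# ['Buy properties and charge rent.', 'Roll the dice and move your token.']
```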

The PDF files must be placed in the ./Pdf folder

# -*- coding: utf-8 -*-

from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import CharacterTextSplitter,RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain.vectorstores import Chroma
from langchain_community.llms import Ollama
from langchain.chains import RetrievalQA, LLMChain

import pathlib
import textwrap
from IPython.display import display
from IPython.display import Markdown



def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))


loader = PyPDFDirectoryLoader("./Pdf")
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = text_splitter.split_documents(docs)


embeddings = OllamaEmbeddings(model="nomic-embed-text")



vectorstore = Chroma.from_documents(chunks, embeddings)



query = "who is at risk of heart disease"
search = vectorstore.similarity_search(query)

to_markdown(search[0].page_content)

"""## Retriever"""

retriever = vectorstore.as_retriever(
    search_kwargs={'k': 5}
)

retriever.get_relevant_documents(query)

"""## Large Language Model - Open Source

## RAG Chain
"""

from langchain_community.llms import Ollama
llm = Ollama(model="gemma2")

from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
from langchain.prompts import ChatPromptTemplate

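# The {context} placeholder receives the chunks returned by the retriever;
# without it the retrieved text is never included in the prompt.
template = """
<|context|>
You are an AI assistant that follows instruction extremely well.
Please be truthful and give direct answers
Use the following context to answer:
{context}
</s>
<|user|>
{query}
</s>
 <|assistant|>
"""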

prompt = ChatPromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever,  "query": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

response = rag_chain.invoke("why should I care about my heart health")

to_markdown(response)

import sys

while True:
  user_input = input(f"Input Prompt: ")
  if user_input == 'exit':
    print('Exiting')
    sys.exit()
  if user_input == '':
    continue
  result = rag_chain.invoke(user_input)
  print("Answer: ",result)
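The script splits the PDFs with chunk_size=300 and chunk_overlap=50. As a rough illustration of what those two numbers mean, here is a simplified fixed-window splitter. It is not the algorithm used by RecursiveCharacterTextSplitter, which tries to break on separators such as paragraphs and sentences first; the function name is my own.

```python
def split_with_overlap(text: str, chunk_size: int = 300, chunk_overlap: int = 50) -> list[str]:
    """Cut text into windows of chunk_size characters, each sharing chunk_overlap
    characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


# 700 characters of repeating letters, just to have something to split.
sample_text = "".join(chr(97 + i % 26) for i in range(700))
pieces = split_with_overlap(sample_text)
print([len(p) for p in pieces])
# [300, 300, 200]
print(pieces[0][-50:] == pieces[1][:50])
# True
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.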

I loaded two PDFs (the rules of Monopoly and Ticket to Ride); here is an example interaction

===========================================

Input Prompt: give me the rules of Monopoly
Answer: ## Monopoly Rules (Simplified):

**Goal:** Be the last player standing with money! Drive your opponents into bankruptcy by buying properties, building houses and hotels, and charging rent.

**Gameplay:**

1. **Roll Dice & Move:** Players roll two dice and move their piece around the board clockwise.
2. **Land on Property?**
* **Unowned:** You can buy it for the listed price.
* **Owned:** Pay rent to the owner based on the property's value and any buildings.
3. **Chance & Community Chest:** Draw a card and follow its instructions. These can be good or bad!
4. **Go To Jail:** Land on "Go To Jail," draw a "Go To Jail" card, or roll doubles three times in a row. While in jail, you can't collect rent or move until you pay a fine, use a "Get Out of Jail Free" card, or roll doubles.
5. **Build Houses & Hotels:** Once you own all properties in a color group, you can start building houses and hotels. These increase rent significantly!
6. **Bankruptcy:** If you can't afford to pay your debts, you're bankrupt and out of the game.

**Key Elements:**

* **Properties:** Divided into color groups, each with different rent values.
* **Railroads & Utilities:** Pay rent based on dice rolls.
* **Taxes:** Pay a set amount when landing on specific spaces.
* **Free Parking:** A safe space where you can collect any "Free Parking" money (rules vary).

**Variations:**

Many variations of Monopoly exist, with different rules and themes. Be sure to clarify the specific rules being used before starting a game!

Let me know if you'd like more detail on any specific aspect of the rules!

===========================================

Now let's make things harder: a question in Italian asking for the differences between the two games. The question is correctly interpreted even though it is in Italian, and the answer does indeed summarize the differences between the two games

===========================================

Answer: Here are the key differences between Monopoly and Ticket to Ride:

**Monopoly:**

* **Goal:** Become the richest player by buying properties, developing them, and charging rent to other players.
* **Gameplay:** Players roll dice, move around a board, buy properties, build houses and hotels, and collect rent from opponents who land on their spaces.
* **Emphasis:** Strategic property acquisition, negotiation, and bankrupting opponents.

**Ticket to Ride:**

* **Goal:** Score the most points by claiming railway routes connecting cities across a map.
* **Gameplay:** Players draw colored train cards and use them to claim routes. Longer routes earn more points, and completing destination tickets (secret routes) awards bonus points.
* **Emphasis:** Route building, strategic card management, and fulfilling destination goals.

**In short:** Monopoly is about accumulating wealth and bankrupting opponents, while Ticket to Ride is about building railway networks and strategically claiming routes.
