30-Day DSPy Challenge: Day 17: Optimizing Your RAG Pipeline

Day 17 of the 30-day DSPy challenge focuses on systematically improving an existing Retrieval-Augmented Generation (RAG) pipeline. I already implemented a RAG pipeline yesterday. Today I try out how a DSPy optimizer can be used to automatically improve the quality of the generated answers.

The Role of Optimizers in DSPy

A DSPy optimizer, also called a teleprompter, is an algorithm that optimizes prompts automatically. Instead of refining prompts by hand (prompt engineering), the optimizer learns from data how the instructions to the language model (LLM) need to be phrased to achieve the best possible performance.

The process works as follows:

Training dataset
The optimizer needs a set of example questions together with the corresponding high-quality answers.

Evaluation metric
A metric is required to assess the quality of the answers generated by the LLM. It serves as the optimizer's success criterion.

Compilation
The optimizer runs the program on the training data with different prompt strategies, scores the results with the metric, and "compiles" the most successful strategy into a new, optimized program.
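The three ingredients above can be sketched in plain Python without DSPy. This is only a simplified illustration of the idea, not how DSPy implements its optimizers: `toy_program`, `exact_match`, `compile_best` and the candidate instructions are hypothetical stand-ins, and the "LLM" is a stub function.

```python
# Toy illustration of the optimizer loop: try candidate prompt strategies,
# score each one on the training set with a metric, and keep the best.
# Everything here is a hypothetical stand-in; no LLM is called.

trainset = [
    {"question": "2+2", "answer": "4"},
    {"question": "3+5", "answer": "8"},
]

def toy_program(instruction, question):
    """Stub 'LLM': only returns the bare sum if the instruction asks for a number."""
    a, b = question.split("+")
    if "number" in instruction:
        return str(int(a) + int(b))
    return f"The result of {question} is unclear."

def exact_match(example, prediction):
    """Metric: 1.0 if the prediction equals the gold answer, else 0.0."""
    return float(prediction == example["answer"])

def compile_best(candidates, trainset, metric):
    """Return the candidate instruction with the highest average metric score."""
    scores = {}
    for instruction in candidates:
        results = [metric(ex, toy_program(instruction, ex["question"])) for ex in trainset]
        scores[instruction] = sum(results) / len(results)
    best = max(scores, key=scores.get)
    return best, scores

candidates = ["Answer the question.", "Answer with a single number."]
best_instruction, scores = compile_best(candidates, trainset, exact_match)
print(best_instruction, scores)
```

The point of the sketch is the division of labor: the training set supplies the cases, the metric decides what counts as success, and the "compile" step is just a search over strategies guided by that metric.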

For RAG systems this could be particularly effective if the optimizer learns how the LLM can best use the retrieved context to formulate precise, fact-based answers. Spoiler: unfortunately, that is not the case in this example 🙁

Implementing the RAG Optimization

The practical implementation consists of four steps:

Setting up the baseline RAG pipeline.
Creating a training dataset and an evaluation metric.
Applying the BootstrapFewShot optimizer.
Comparing the results before and after optimization.

The starting point is a RAG pipeline that combines vector retrieval from a Qdrant database with a generator module. As in day16.ipynb, the RAG pipeline is built from a Hugging Face dataset.

DSPy Configuration of the Models

import dspy
from dspy.evaluate.evaluate import Evaluate
from dspy.teleprompt import BootstrapFewShot
from datasets import load_dataset
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Initialize and configure the models
# Assumption: local servers for the embedding model and the language model are running.
embedder = dspy.Embedder(
    "openai/embeddinggemma-300M-Q8_0.gguf", 
    api_base="http://localhost:8081/v1", 
    api_key="no_key_needed"
)

local_llm = dspy.LM(
    "openai/gemma-3-4b-it-Q4_K_M.gguf", 
    api_base="http://localhost:8080/v1", 
    api_key="no_key_needed",
    temperature=0.1,
    cache=False
)

dspy.configure(lm=local_llm, embedder=embedder)

# Qdrant client and collection name used by the RAG components
client = QdrantClient(host="localhost", port=6333)
collection_name = "illuin-conteb-geography"

Loading the "illuin-conteb/geography" Dataset and Storing It in Qdrant

# Load the dataset (load_dataset was already imported above)
dataset = load_dataset("illuin-conteb/geography", 'documents', split="train")

# Each record stores its text chunk in the 'og_chunk' field
documents = [item['og_chunk'] for item in dataset]

# Create the Qdrant collection (only if it does not exist yet).
# QdrantClient, Distance and VectorParams were imported above;
# `client` and `collection_name` were defined in the configuration step.
embedding_dim = 768  # must match the output dimension of the embedding model

# Check whether the collection exists
if not client.collection_exists(collection_name):
    client.create_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(
            size=embedding_dim,
            distance=Distance.COSINE
        )
    )
    print(f"Collection '{collection_name}' created successfully.")
else:
    print(f"Collection '{collection_name}' already exists.")

# Embed the documents
embeddings = embedder(documents)
len(embeddings)

# One point per chunk: the list index is the ID, the chunk text is the payload.
# Fix: iterate over `documents` (the original referenced an undefined `all_chunks`).
points = [
    PointStruct(id=i, vector=vec, payload={"text": chunk})
    for i, (chunk, vec) in enumerate(zip(documents, embeddings))
]

# Upload the points to Qdrant (in batches, for large datasets)
batch_size = 100
for i in range(0, len(points), batch_size):
    batch = points[i:i+batch_size]
    client.upsert(
        collection_name=collection_name,
        points=batch
    )

print(f"{len(points)} chunks successfully stored in Qdrant.")

Building the RAG Pipeline

class QdrantRetriever(dspy.Retrieve):
    def __init__(self, client, collection_name, embedder, k=3):
        self._client = client
        self._collection_name = collection_name
        self._embedder = embedder
        self._k = k
        super().__init__()

    def forward(self, query_or_queries, k=None):
        k = k if k is not None else self._k
        # Normalize to a list so a single question and a batch are handled alike
        if isinstance(query_or_queries, str):
            queries = [query_or_queries]
        else:
            queries = list(query_or_queries)
        query_embeddings = self._embedder(queries)

        # One search per query vector, using the collection passed to __init__
        passages = []
        for emb in query_embeddings:
            result = self._client.query_points(
                collection_name=self._collection_name,
                query=emb.tolist() if hasattr(emb, "tolist") else emb,
                limit=k,
            )
            passages.extend(p.payload["text"] for p in result.points)
        return passages

class GenerateAnswer(dspy.Signature):
    """Beantworte die Frage basierend auf dem bereitgestellten Kontext."""
    context = dspy.InputField(desc="Relevante Fakten zur Beantwortung der Frage.")
    question = dspy.InputField(desc="Die ursprüngliche Nutzerfrage.")
    answer = dspy.OutputField(desc="Eine prägnante und faktenbasierte Antwort.")

class RAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retriever = QdrantRetriever(client, collection_name, embedder, k=3)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
        
    def forward(self, question):
        context = self.retriever(question)
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
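To make clearer what the retriever asks Qdrant to do, here is a minimal pure-Python sketch of a cosine-similarity top-k search. It is only an illustration under simplifying assumptions: the three-dimensional vectors and passage texts are made up, `cosine_similarity` and `top_k` are hypothetical helpers, and a real vector database uses an approximate index rather than a full sort.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, indexed, k=3):
    """Return the texts of the k entries most similar to the query vector.

    `indexed` is a list of (text, vector) pairs.
    """
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Made-up toy index; real embeddings in this post have 768 dimensions.
indexed = [
    ("passage about Kobe", [1.0, 0.0, 0.0]),
    ("passage about Belfast", [0.0, 1.0, 0.0]),
    ("passage about Changsha", [0.7, 0.7, 0.0]),
]
query_vec = [0.9, 0.1, 0.0]
passages = top_k(query_vec, indexed, k=2)
print(passages)
```

Note that a top-k search always returns k passages, whether or not they are relevant. That matches the prompts shown later in this post, where only the first of the three retrieved passages relates to the question.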

Next, create example data consisting of pairs of questions and ideal answers (dspy.Example) and define a metric that measures success.

# Create a small training dataset
train_data = [
    {
        'question': "What industries have contributed to the growth of Belfast's services sector?", 
        'answer': "The industries that have contributed to the growth of Belfast's services sector are financial technology (fintech), tourism, and film."},
    {
        'question': "What is the population of Changsha, and how does it rank in terms of livability in China?",  
        'answer': "Changsha has a population of 10,513,100 and is considered the most livable city in China."},
    {
        'question': "When was Kobe founded, and how did it get its name?", 
        'answer': "Kobe was founded in 1889, and its name comes from Kanbe, an archaic title for supporters of the city's Ikuta Shrine."}
]
trainset = [dspy.Example(**x).with_inputs('question') for x in train_data]

# Definition of the validation metric
# The metric checks whether the generated answer contains the gold answer verbatim (case-insensitive).
def validate_answer(example, pred, trace=None):
    return example.answer.lower() in pred.answer.lower()
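A substring metric like this is strict: a correct answer that is phrased differently from the gold answer fails. The sketch below shows this behaviour on made-up strings, using `SimpleNamespace` objects as stand-ins for `dspy.Example` and the prediction. The `token_recall` function is a hypothetical, more lenient alternative added only for comparison; it is not part of the original experiment.

```python
import re
from types import SimpleNamespace

def validate_answer(example, pred, trace=None):
    """Metric from the post: gold answer must appear verbatim in the prediction."""
    return example.answer.lower() in pred.answer.lower()

def token_recall(example, pred, trace=None, threshold=0.8):
    """Hypothetical alternative: share of gold tokens that occur in the prediction."""
    gold = set(re.findall(r"\w+", example.answer.lower()))
    found = set(re.findall(r"\w+", pred.answer.lower()))
    return len(gold & found) / len(gold) >= threshold

# Stand-ins for dspy.Example / dspy.Prediction (only the .answer attribute is used)
example = SimpleNamespace(answer="Kobe was founded in 1889")
verbatim = SimpleNamespace(answer="Kobe was founded in 1889, according to the context.")
paraphrase = SimpleNamespace(answer="In 1889 Kobe was founded")

print(validate_answer(example, verbatim))    # substring match succeeds
print(validate_answer(example, paraphrase))  # same facts, different order: fails
print(token_recall(example, paraphrase))     # token-based check accepts it
```

With full-sentence gold answers such as the ones in `train_data`, the substring check will probably pass only when the model reproduces the gold sentence almost word for word.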

The BootstrapFewShot optimizer is now used to equip the generate_answer component with few-shot examples. It generates demonstrations from the training data and integrates them into the prompt.

teleprompter = BootstrapFewShot(metric=validate_answer, max_bootstrapped_demos=2, max_labeled_demos=2)
# Compile the RAG module
optimized_rag = teleprompter.compile(RAG(), trainset=trainset)
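The idea behind bootstrapping can be sketched roughly as follows. This is a simplified approximation in plain Python, not DSPy's actual implementation: `bootstrap_demos`, `stub_program` and `contains_gold` are hypothetical names, the program is a stub instead of an LLM call, and the way the two limits interact is an assumption made for illustration.

```python
def bootstrap_demos(program, trainset, metric,
                    max_bootstrapped_demos=2, max_labeled_demos=2):
    """Simplified idea: program outputs that pass the metric become demos
    (including their reasoning); remaining slots are filled with plain
    labeled examples that carry no reasoning."""
    bootstrapped, remaining = [], []
    for example in trainset:
        prediction = program(example["question"])
        if len(bootstrapped) < max_bootstrapped_demos and metric(example, prediction):
            bootstrapped.append({**example, "reasoning": prediction["reasoning"]})
        else:
            remaining.append(example)
    free_slots = max(0, max_labeled_demos - len(bootstrapped))
    return bootstrapped, remaining[:free_slots]

# Made-up training data and a stub program that only "knows" one answer
trainset = [
    {"question": "Capital of France?", "answer": "Paris"},
    {"question": "Capital of Japan?", "answer": "Tokyo"},
    {"question": "Capital of Egypt?", "answer": "Cairo"},
]

def stub_program(question):
    known = {"Capital of France?": "Paris is the capital."}
    answer = known.get(question, "I do not know.")
    return {"reasoning": f"Looked up: {question}", "answer": answer}

def contains_gold(example, prediction):
    return example["answer"].lower() in prediction["answer"].lower()

bootstrapped, labeled = bootstrap_demos(stub_program, trainset, contains_gold)
print(bootstrapped)
print(labeled)
```

In the optimized prompt shown further down, both demos have the reasoning "Not supplied for this particular example". That would be consistent with them being plain labeled examples rather than bootstrapped ones, possibly because no generated answer passed the strict metric, although the post does not verify this.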

Finally, the performance of the optimized pipeline is compared with the unoptimized original.

# Test question
question = "Which groups have influenced the city's history, and what was its role in the Kingdom of Hungary?"

# Run the unoptimized pipeline
unoptimized_rag = RAG()
prediction_unoptimized = unoptimized_rag(question)
print(f"Question: {question}")
print(f"Answer (unoptimized): {prediction_unoptimized.answer}")
print("\n--- Prompt (unoptimized) ---")

local_llm.inspect_history(n=1)

The generated prompt shows that the QdrantRetriever returned three passages, the first of which contains the relevant facts, and that the LLM was able to derive an answer from them.

System message:

Your input fields are:
1. `context` (str): Relevante Fakten zur Beantwortung der Frage.
2. `question` (str): Die ursprüngliche Nutzerfrage.
Your output fields are:
1. `reasoning` (str): 
2. `answer` (str): Eine prägnante und faktenbasierte Antwort.
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## context ## ]]
{context}

[[ ## question ## ]]
{question}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Beantworte die Frage basierend auf dem bereitgestellten Kontext.


User message:

[[ ## context ## ]]
[1] «The city's history has been influenced by people of many nations and religions, including Austrians, Bulgarians, Croats, Czechs, Germans, Hungarians, Jews and Slovaks. It was the coronation site and legislative center and capital of the Kingdom of Hungary from 1536 to 1783; eleven Hungarian kings and eight queens were crowned in St Martin's Cathedral. Most Hungarian parliament assemblies were held here from the 17th century until the Hungarian Reform Era, and the city has been home to many Hungarian, German»
[2] « (1872), and power plant (1882). Yokohama developed rapidly as Japan's prominent port city following the end of Japan's relative isolation in the mid-19th century and is today one of its major ports along with Kobe, Osaka, Nagoya, Fukuoka, Tokyo and Chiba.»
[3] «Alexandria ( AL-ig-ZA(H)N-dree-ə; Arabic: الإسكندرية; Ancient Greek: Ἀλεξάνδρεια, Coptic: Ⲣⲁⲕⲟϯ - Rakoti or ⲁⲗⲉⲝⲁⲛⲇⲣⲓⲁ) is the  second largest city in Egypt and the largest city on the Mediterranean coast. It lies at the western edge of the Nile River delta. Founded in c. 331 BC by Alexander the Great, Alexandria grew rapidly and became a major centre of Hellenic civilisation, eventually replacing Memphis, in present-day Greater Cairo, as Egypt's capital. Called the "Bride of the Mediterranean" internationa»

[[ ## question ## ]]
Which groups have influenced the city's history, and what was its role in the Kingdom of Hungary?

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.

Running the code again with the optimized RAG pipeline shows that the prompt now also contains examples added by the optimizer.

# Run the optimized pipeline
prediction_optimized = optimized_rag(question)
print(f"\nQuestion: {question}")
print(f"Answer (optimized): {prediction_optimized.answer}")
print("\n--- Prompt (optimized) ---")

# Print the last prompt, as for the unoptimized run
local_llm.inspect_history(n=1)

Prompt:

System message:

Your input fields are:
1. `context` (str): Relevante Fakten zur Beantwortung der Frage.
2. `question` (str): Die ursprüngliche Nutzerfrage.
Your output fields are:
1. `reasoning` (str): 
2. `answer` (str): Eine prägnante und faktenbasierte Antwort.
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## context ## ]]
{context}

[[ ## question ## ]]
{question}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Beantworte die Frage basierend auf dem bereitgestellten Kontext.


User message:

This is an example of the task, though some input or output fields are not supplied.

[[ ## question ## ]]
When was Kobe founded, and how did it get its name?


Assistant message:

[[ ## reasoning ## ]]
Not supplied for this particular example. 

[[ ## answer ## ]]
Kobe was founded in 1889, and its name comes from Kanbe, an archaic title for supporters of the city's Ikuta Shrine.

[[ ## completed ## ]]


User message:

This is an example of the task, though some input or output fields are not supplied.

[[ ## question ## ]]
What is the population of Changsha, and how does it rank in terms of livability in China?


Assistant message:

[[ ## reasoning ## ]]
Not supplied for this particular example. 

[[ ## answer ## ]]
Changsha has a population of 10,513,100 and is considered the most livable city in China.

[[ ## completed ## ]]


User message:

[[ ## context ## ]]
[1] «The city's history has been influenced by people of many nations and religions, including Austrians, Bulgarians, Croats, Czechs, Germans, Hungarians, Jews and Slovaks. It was the coronation site and legislative center and capital of the Kingdom of Hungary from 1536 to 1783; eleven Hungarian kings and eight queens were crowned in St Martin's Cathedral. Most Hungarian parliament assemblies were held here from the 17th century until the Hungarian Reform Era, and the city has been home to many Hungarian, German»
[2] « (1872), and power plant (1882). Yokohama developed rapidly as Japan's prominent port city following the end of Japan's relative isolation in the mid-19th century and is today one of its major ports along with Kobe, Osaka, Nagoya, Fukuoka, Tokyo and Chiba.»
[3] «Alexandria ( AL-ig-ZA(H)N-dree-ə; Arabic: الإسكندرية; Ancient Greek: Ἀλεξάνδρεια, Coptic: Ⲣⲁⲕⲟϯ - Rakoti or ⲁⲗⲉⲝⲁⲛⲇⲣⲓⲁ) is the  second largest city in Egypt and the largest city on the Mediterranean coast. It lies at the western edge of the Nile River delta. Founded in c. 331 BC by Alexander the Great, Alexandria grew rapidly and became a major centre of Hellenic civilisation, eventually replacing Memphis, in present-day Greater Cairo, as Egypt's capital. Called the "Bride of the Mediterranean" internationa»

[[ ## question ## ]]
Which groups have influenced the city's history, and what was its role in the Kingdom of Hungary?

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.

The result, however, is nearly identical in both cases, so the model did not need any examples to answer the question from the context. In this specific case, optimization makes no difference. What matters here is that the retriever loads the relevant information from the Qdrant vector store and passes it into the model's context.
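A single test question gives only an anecdotal comparison. A more systematic check would average the metric over a held-out set of questions, which is what DSPy's `Evaluate` class (imported at the top but not used in this post) is intended for. The sketch below shows the idea in plain Python. Everything in it is hypothetical: `average_score` is an illustrative helper, the two programs are stubs, and the made-up scores say nothing about the real pipeline.

```python
from types import SimpleNamespace

def validate_answer(example, pred, trace=None):
    """Same substring metric as in the post."""
    return example.answer.lower() in pred.answer.lower()

def average_score(program, devset, metric):
    """Run the program on every example and return the fraction that passes."""
    hits = [bool(metric(ex, program(ex.question))) for ex in devset]
    return sum(hits) / len(hits)

# Made-up held-out set; stand-ins for dspy.Example objects
devset = [
    SimpleNamespace(question="q1", answer="alpha"),
    SimpleNamespace(question="q2", answer="beta"),
    SimpleNamespace(question="q3", answer="gamma"),
    SimpleNamespace(question="q4", answer="delta"),
]

# Stub programs with hard-coded answers, standing in for the two RAG pipelines
def baseline_stub(question):
    answers = {"q1": "alpha", "q2": "beta", "q3": "unknown", "q4": "unknown"}
    return SimpleNamespace(answer=answers[question])

def optimized_stub(question):
    answers = {"q1": "alpha", "q2": "beta", "q3": "gamma", "q4": "unknown"}
    return SimpleNamespace(answer=answers[question])

baseline_score = average_score(baseline_stub, devset, validate_answer)
optimized_score = average_score(optimized_stub, devset, validate_answer)
print(f"baseline: {baseline_score:.2f}, optimized: {optimized_score:.2f}")
```

Applied to the real pipelines, the same pattern would mean passing `unoptimized_rag` and `optimized_rag` as the programs and a list of held-out `dspy.Example` objects as the dev set.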

Summary

Day 17 demonstrates that, in this specific example, using an optimizer did not improve the performance of a RAG pipeline in DSPy.