30-Day DSPy Challenge – Day 22: Agents with dspy.ReAct

The previous lessons focused on optimizing prompts and on retrieving information (RAG). On day 22, I turn to building an agent that can actively use tools.

The ReAct Concept: Reasoning and Acting

Large language models (LLMs) are strong at generating text, but they are weak at exact calculations and have no access to real-time data. The ReAct paradigm ("Reasoning and Acting") was developed to close this gap.

The concept combines two processes:

Reasoning
The model analyzes the input and plans the next steps (similar to Chain-of-Thought).

Acting
The model decides to call an external tool to obtain specific information or to perform calculations.

Workflow

Thought –> What needs to be done?
Action –> Call a tool with specific parameters.
Observation –> The tool's result is passed back to the model.
Answer –> Thought, action, and observation repeat until enough information is available; then the model generates the final answer.
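
The loop above can be sketched in plain Python without DSPy. This is a minimal illustration under stated assumptions, not the dspy.ReAct implementation: `scripted_policy` is a hypothetical stand-in for the language model that returns a fixed thought, tool name, and arguments per step, `calculator` is a hypothetical toy tool, and the trajectory keys (`thought_0`, `tool_name_0`, ...) simply mirror the ones visible in the prompt log later in this post.

```python
from typing import Any, Callable


# Hypothetical toy tool; eval() with empty builtins is only acceptable for this sketch.
def calculator(expression: str) -> str:
    return str(eval(expression, {"__builtins__": {}}))


# Hypothetical stand-in for the LLM: returns (thought, tool_name, tool_args).
def scripted_policy(question: str, trajectory: dict[str, Any]) -> tuple[str, str, dict[str, Any]]:
    if not trajectory:
        return ("I need to compute 35 * 12 / 4.", "calculator", {"expression": "35 * 12 / 4"})
    return ("The observation answers the question.", "finish", {})


def react_loop(
    question: str,
    policy: Callable[[str, dict[str, Any]], tuple[str, str, dict[str, Any]]],
    tools: dict[str, Callable[..., str]],
    max_iters: int = 5,
) -> dict[str, Any]:
    trajectory: dict[str, Any] = {}
    for i in range(max_iters):
        # Thought + Action: ask the policy what to do next
        thought, tool_name, tool_args = policy(question, trajectory)
        trajectory[f"thought_{i}"] = thought
        trajectory[f"tool_name_{i}"] = tool_name
        trajectory[f"tool_args_{i}"] = tool_args
        if tool_name == "finish":
            # "finish" ends the loop without calling a real tool
            trajectory[f"observation_{i}"] = "Completed."
            break
        # Observation: run the tool and feed its result back into the trajectory
        try:
            trajectory[f"observation_{i}"] = tools[tool_name](**tool_args)
        except Exception as e:
            trajectory[f"observation_{i}"] = f"Error: {e}"
    return trajectory


# Answer: stand-in for the final step; returns the last real tool observation.
def extract_answer(trajectory: dict[str, Any]) -> str:
    observations = [
        value for key, value in trajectory.items()
        if key.startswith("observation_") and value != "Completed."
    ]
    return observations[-1] if observations else ""


trajectory = react_loop(
    "Was ist das Ergebnis von 35 mal 12 geteilt durch 4?",
    scripted_policy,
    {"calculator": calculator},
)
print(extract_answer(trajectory))  # 105.0
```

In dspy.ReAct the final answer comes from an additional LM call over the whole trajectory; `extract_answer` only stands in for that step. `max_iters` caps the number of loop iterations so that a model that never chooses `finish` cannot loop forever; dspy.ReAct accepts a parameter of the same name.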

DSPy encapsulates this loop in the dspy.ReAct module.

Defining the Tool

Before a ReAct agent can be created, the tool it is supposed to use has to be defined. In this example, a simple Python function serves as a calculator.

For DSPy, it is essential that the function has a meaningful docstring. The docstring tells the language model what the tool is for and how to format its input parameters. The type hints matter as well: DSPy derives the argument schema from them, which appears in the prompt log further down as {'ausdruck': {'type': 'string'}}.
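
Which information a function exposes for this purpose can be inspected with Python's standard library. The following sketch uses `inspect.getdoc` and `typing.get_type_hints` to assemble a name, a description, and a JSON-schema-like argument map. It is an illustration, not DSPy's internal code; the helper `describe_tool` and its small type mapping are assumptions.

```python
import inspect
from typing import Any, get_type_hints


def taschenrechner(ausdruck: str) -> str:
    """
    Berechnet das Ergebnis eines mathematischen Ausdrucks.
    Nimmt einen String entgegen, der eine mathematische Formel enthält (z.B. "5 * 5 + 2").
    Gibt das Ergebnis als String zurück.
    """
    return str(eval(ausdruck))


# Hypothetical mapping from a few Python types to JSON-schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func) -> dict[str, Any]:
    hints = get_type_hints(func)
    hints.pop("return", None)  # only the parameters form the argument schema
    args = {name: {"type": _JSON_TYPES.get(tp, "object")} for name, tp in hints.items()}
    # getdoc() strips the common indentation and leading/trailing blank lines
    return {"name": func.__name__, "desc": inspect.getdoc(func), "args": args}


info = describe_tool(taschenrechner)
print(info["name"])  # taschenrechner
print(info["args"])  # {'ausdruck': {'type': 'string'}}
```

The name, the docstring, and the annotated parameter are exactly the pieces that later show up in the tool list of the system prompt, which is why a vague docstring or a missing type hint directly weakens the model's tool selection.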

import dspy

# Configure the local LLM (e.g. Qwen or Llama 3 served by Ollama via its OpenAI-compatible endpoint)
local_llm = dspy.LM(
    "openai/qwen3:30b",
    api_base="http://localhost:11434/v1",
    api_key="no_key_needed"  # Ollama ignores the key, but the OpenAI client expects one
)

dspy.configure(lm=local_llm)

def taschenrechner(ausdruck: str) -> str:
    """
    Berechnet das Ergebnis eines mathematischen Ausdrucks.
    Nimmt einen String entgegen, der eine mathematische Formel enthält (z.B. "5 * 5 + 2").
    Gibt das Ergebnis als String zurück.
    """
    try:
        # Note: eval() executes arbitrary Python code and must not be used
        # on untrusted input in production. It is sufficient for this example.
        return str(eval(ausdruck))
    except Exception as e:
        # Errors are returned as text so the agent sees them as an observation
        return f"Fehler bei der Berechnung: {e}"
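
Because eval() executes arbitrary Python code, a model (or a prompt injection) could make taschenrechner do far more than arithmetic. One common alternative is to parse the expression with Python's ast module and evaluate only whitelisted arithmetic nodes. The following is a sketch under that assumption: the name `sicherer_taschenrechner` is hypothetical, it keeps the original parameter name and error-message format so it could be passed to dspy.ReAct in place of taschenrechner, and exponentiation is deliberately omitted because expressions like 9**9**9 can exhaust CPU and memory.

```python
import ast
import operator

# Whitelisted operators; anything else in the expression is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node: ast.AST) -> float:
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    # Numeric literals only (bool is a subclass of int, so exclude it explicitly)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError(f"Unsupported expression element: {type(node).__name__}")


def sicherer_taschenrechner(ausdruck: str) -> str:
    """
    Evaluates an arithmetic expression (e.g. "5 * 5 + 2") with +, -, *, /, //, %
    and parentheses. Returns the result as a string.
    """
    try:
        return str(_evaluate(ast.parse(ausdruck, mode="eval")))
    except Exception as e:
        return f"Fehler bei der Berechnung: {e}"


print(sicherer_taschenrechner("35 * 12 / 4"))       # 105.0
print(sicherer_taschenrechner("__import__('os')"))  # rejected with an error message
```

Function calls, attribute access, and names never reach an evaluator, so the only remaining failure modes are ordinary arithmetic errors such as division by zero, which are returned to the agent as an observation just like in the original tool.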

Defining the Signature

As with other DSPy modules, a signature forms the basis of the interaction. A calculation task only needs a simple structure that takes a question and returns an answer.

class MatheAufgabe(dspy.Signature):
    """Beantwortet Fragen, die Berechnungen erfordern."""

    question = dspy.InputField(desc="Die mathematische Frage oder Aufgabe.")
    answer = dspy.OutputField(desc="Die berechnete Antwort.")

Implementing the ReAct Agent

The dspy.ReAct module orchestrates the interaction between the language model and the defined tools. On initialization, it receives the signature and a list of the available tools.

# Initialize the ReAct module with the signature and the tool
agent = dspy.ReAct(MatheAufgabe, tools=[taschenrechner])

# Run a calculation task that combines two operations
frage = "Was ist das Ergebnis von 35 mal 12 geteilt durch 4?"
resultat = agent(question=frage)

print(f"Question: {frage}")
print(f"Answer: {resultat.answer}")

Analyzing the Execution

To understand how dspy.ReAct works, it is again worth using inspect_history to look at the internal steps the model went through.

print("--- ANALYSE ---\n")
# Show up to the last 5 LM calls; this run made three
# (two tool-selection steps and one answer-extraction step)
local_llm.inspect_history(n=5)

The output looks like this:

--- ANALYSE ---

System message:

Your input fields are:
1. `question` (str): Die mathematische Frage oder Aufgabe.
2. `trajectory` (str):
Your output fields are:
1. `next_thought` (str): 
2. `next_tool_name` (Literal['taschenrechner', 'finish']): 
3. `next_tool_args` (dict[str, Any]):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## next_thought ## ]]
{next_thought}

[[ ## next_tool_name ## ]]
{next_tool_name}        # note: the value you produce must exactly match (no extra characters) one of: taschenrechner; finish

[[ ## next_tool_args ## ]]
{next_tool_args}        # note: the value you produce must adhere to the JSON schema: {"type": "object", "additionalProperties": true}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Beantwortet Fragen, die Berechnungen erfordern.
        
        You are an Agent. In each episode, you will be given the fields `question` as input. And you can see your past trajectory so far.
        Your goal is to use one or more of the supplied tools to collect any necessary information for producing `answer`.
        
        To do this, you will interleave next_thought, next_tool_name, and next_tool_args in each turn, and also when finishing the task.
        After each tool call, you receive a resulting observation, which gets appended to your trajectory.
        
        When writing next_thought, you may reason about the current situation and plan for future steps.
        When selecting the next_tool_name and its next_tool_args, the tool must be one of:
        
        (1) taschenrechner, whose description is <desc>      Berechnet das Ergebnis eines mathematischen Ausdrucks.      Nimmt einen String entgegen, der eine mathematische Formel enthält (z.B. "5 * 5 + 2").      Gibt das Ergebnis als String zurück.      </desc>. It takes arguments {'ausdruck': {'type': 'string'}}.
        (2) finish, whose description is <desc>Marks the task as complete. That is, signals that all information for producing the outputs, i.e. `answer`, are now available to be extracted.</desc>. It takes arguments {}.
        When providing `next_tool_args`, the value inside the field must be in JSON format


User message:

[[ ## question ## ]]
Was ist das Ergebnis von 35 mal 12 geteilt durch 4?

[[ ## trajectory ## ]]


Respond with the corresponding output fields, starting with the field `[[ ## next_thought ## ]]`, then `[[ ## next_tool_name ## ]]` (must be formatted as a valid Python Literal['taschenrechner', 'finish']), then `[[ ## next_tool_args ## ]]` (must be formatted as a valid Python dict[str, Any]), and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## next_thought ## ]]
The question requires calculating 35 multiplied by 12 and then divided by 4. To resolve this, I'll use the calculator tool with the expression "35 * 12 / 4" to get the precise result.

[[ ## next_tool_name ## ]]
taschenrechner

[[ ## next_tool_args ## ]]
{"ausdruck": "35 * 12 / 4"}

[[ ## completed ## ]]

System message:

Your input fields are:
1. `question` (str): Die mathematische Frage oder Aufgabe.
2. `trajectory` (str):
Your output fields are:
1. `next_thought` (str): 
2. `next_tool_name` (Literal['taschenrechner', 'finish']): 
3. `next_tool_args` (dict[str, Any]):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## next_thought ## ]]
{next_thought}

[[ ## next_tool_name ## ]]
{next_tool_name}        # note: the value you produce must exactly match (no extra characters) one of: taschenrechner; finish

[[ ## next_tool_args ## ]]
{next_tool_args}        # note: the value you produce must adhere to the JSON schema: {"type": "object", "additionalProperties": true}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Beantwortet Fragen, die Berechnungen erfordern.
        
        You are an Agent. In each episode, you will be given the fields `question` as input. And you can see your past trajectory so far.
        Your goal is to use one or more of the supplied tools to collect any necessary information for producing `answer`.
        
        To do this, you will interleave next_thought, next_tool_name, and next_tool_args in each turn, and also when finishing the task.
        After each tool call, you receive a resulting observation, which gets appended to your trajectory.
        
        When writing next_thought, you may reason about the current situation and plan for future steps.
        When selecting the next_tool_name and its next_tool_args, the tool must be one of:
        
        (1) taschenrechner, whose description is <desc>      Berechnet das Ergebnis eines mathematischen Ausdrucks.      Nimmt einen String entgegen, der eine mathematische Formel enthält (z.B. "5 * 5 + 2").      Gibt das Ergebnis als String zurück.      </desc>. It takes arguments {'ausdruck': {'type': 'string'}}.
        (2) finish, whose description is <desc>Marks the task as complete. That is, signals that all information for producing the outputs, i.e. `answer`, are now available to be extracted.</desc>. It takes arguments {}.
        When providing `next_tool_args`, the value inside the field must be in JSON format


User message:

[[ ## question ## ]]
Was ist das Ergebnis von 35 mal 12 geteilt durch 4?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
The question requires calculating 35 multiplied by 12 and then divided by 4. To resolve this, I'll use the calculator tool with the expression "35 * 12 / 4" to get the precise result.

[[ ## tool_name_0 ## ]]
taschenrechner

[[ ## tool_args_0 ## ]]
{"ausdruck": "35 * 12 / 4"}

[[ ## observation_0 ## ]]
105.0

Respond with the corresponding output fields, starting with the field `[[ ## next_thought ## ]]`, then `[[ ## next_tool_name ## ]]` (must be formatted as a valid Python Literal['taschenrechner', 'finish']), then `[[ ## next_tool_args ## ]]` (must be formatted as a valid Python dict[str, Any]), and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## next_thought ## ]]
The calculation 35 * 12 / 4 has been performed and yielded 105.0. This is the correct result, so the task is complete.

[[ ## next_tool_name ## ]]
finish

[[ ## next_tool_args ## ]]
{}

[[ ## completed ## ]]

System message:

Your input fields are:
1. `question` (str): Die mathematische Frage oder Aufgabe.
2. `trajectory` (str):
Your output fields are:
1. `reasoning` (str): 
2. `answer` (str): Die berechnete Antwort.
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Beantwortet Fragen, die Berechnungen erfordern.


User message:

[[ ## question ## ]]
Was ist das Ergebnis von 35 mal 12 geteilt durch 4?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
The question requires calculating 35 multiplied by 12 and then divided by 4. To resolve this, I'll use the calculator tool with the expression "35 * 12 / 4" to get the precise result.

[[ ## tool_name_0 ## ]]
taschenrechner

[[ ## tool_args_0 ## ]]
{"ausdruck": "35 * 12 / 4"}

[[ ## observation_0 ## ]]
105.0

[[ ## thought_1 ## ]]
The calculation 35 * 12 / 4 has been performed and yielded 105.0. This is the correct result, so the task is complete.

[[ ## tool_name_1 ## ]]
finish

[[ ## tool_args_1 ## ]]
{}

[[ ## observation_1 ## ]]
Completed.

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## reasoning ## ]]
Die Berechnung folgt der Reihenfolge der Operationen: Zuerst wird 35 mal 12 gerechnet, dann das Ergebnis durch 4 geteilt.  
35 × 12 = 420  
420 ÷ 4 = 105  

Der berechnete Wert stimmt mit der Ausgabe des Taschenrechners (105.0) überein.

[[ ## answer ## ]]
105

[[ ## completed ## ]]

These logs document how an AI agent working according to the ReAct paradigm (Reasoning and Acting) solves a problem. The process starts with a natural-language request and ends with a final answer backed by the tool result, and the agent decides on its own when and how to use external tools.

Phase 1: Initialization and Action Planning

In the first interaction step, the agent is asked to compute "35 mal 12 geteilt durch 4" (35 times 12 divided by 4). At this point, the trajectory field (the history of previous actions) is still empty. The system instructs the model to formulate a thought (next_thought) based on the input and then to select a tool (next_tool_name) together with its arguments (next_tool_args).

The agent analyzes the input and correctly recognizes that a mathematical operation has to be solved. Instead of hallucinating or estimating the result, the model plans to use the provided taschenrechner tool. It translates the natural-language instruction into a mathematical expression ("35 * 12 / 4") that serves as the tool's argument. This step demonstrates the model's ability to transform unstructured text into structured commands for API or function calls.
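
How such a structured call can be checked before it is executed is shown in the following sketch. It assumes the model's raw output for next_tool_name and next_tool_args arrives as two strings, as in the log; the helper `parse_tool_call` is hypothetical and only mirrors the two constraints the prompt states (the name must match one of the allowed tools exactly, the arguments must be a JSON object). It is not DSPy's parsing code.

```python
import json
from typing import Any


def parse_tool_call(raw_name: str, raw_args: str, allowed: set[str]) -> tuple[str, dict[str, Any]]:
    # Surrounding whitespace/newlines from the field layout are tolerated
    name = raw_name.strip()
    if name not in allowed:
        raise ValueError(f"Unknown tool {name!r}; expected one of {sorted(allowed)}")
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed input
    args = json.loads(raw_args)
    if not isinstance(args, dict):
        raise ValueError("Tool arguments must be a JSON object")
    return name, args


name, args = parse_tool_call(
    "taschenrechner\n", '{"ausdruck": "35 * 12 / 4"}', {"taschenrechner", "finish"}
)
print(name, args)  # taschenrechner {'ausdruck': '35 * 12 / 4'}

# The parsed dict maps directly onto keyword arguments of the Python function.
tools = {"taschenrechner": lambda ausdruck: str(eval(ausdruck))}
print(tools[name](**args))  # 105.0
```

Rejecting a malformed call early gives the framework a chance to report the problem back to the model instead of crashing inside the tool.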

Phase 2: Observation and Decision to Finish

The second log entry shows the state after the tool has run. The system executed the function call requested by the agent and appended the result (105.0) to the history (trajectory) as observation_0. The agent now receives the entire previous context: its original thought, the tool call, and the resulting observation.
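
The way the trajectory grows can be reproduced approximately with a small formatter. This sketch assumes the trajectory is a dict whose keys keep their insertion order (thought_0, tool_name_0, tool_args_0, observation_0, ...) and renders it in the [[ ## key ## ]] layout seen in the log. `format_trajectory` is a hypothetical helper, not DSPy's adapter, whose exact formatting may differ.

```python
import json
from typing import Any


def format_trajectory(trajectory: dict[str, Any]) -> str:
    parts = []
    for key, value in trajectory.items():
        # Tool arguments are dicts and are shown as JSON, everything else as plain text
        text = json.dumps(value, ensure_ascii=False) if isinstance(value, dict) else str(value)
        parts.append(f"[[ ## {key} ## ]]\n{text}")
    return "\n\n".join(parts)


trajectory = {
    "thought_0": "Use the calculator.",
    "tool_name_0": "taschenrechner",
    "tool_args_0": {"ausdruck": "35 * 12 / 4"},
    "observation_0": "105.0",
}
print(format_trajectory(trajectory))
```

Each iteration appends four entries, so the prompt for the next step always contains the complete history; this is also why long agent runs make the prompts, and thus the LM calls, steadily larger.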

In this phase, the model evaluates the new information. It determines that the calculator's output is the requested result and that no further calculation steps are needed. Based on this, the agent selects the special finish tool. This signals to the controlling framework that information gathering is complete and that the agent is ready to generate the final answer. The arguments for this tool remain empty ({}), since it only serves as the termination condition for the loop.

Phase 3: Synthesis and Answer Generation

The last step marks the transition from the acting phase to the answer phase. After the finish signal has been received, the system changes the task for the model: it no longer asks for the next tool, but for the final reasoning (reasoning) and the answer (answer). As the third system message shows, this is a separate call with its own signature, whose output fields are reasoning and answer instead of next_thought, next_tool_name, and next_tool_args.

The agent uses the complete history to derive the answer. Notably, in the reasoning field the model does not just repeat the result but reconstructs the calculation (35 times 12 equals 420, divided by 4 equals 105) and checks it against the calculator's output. This makes the answer transparent and verifiable. Finally, the agent extracts the value 105 as the final answer; the tool itself returned 105.0 because / in Python is true division and always yields a float. This last step illustrates the advantage of ReAct: the answer is grounded in the deterministic output of the external tool rather than in the model's own arithmetic, and the model only puts that result into words.

Summary

dspy.ReAct makes it possible to combine the semantic capabilities of LLMs with the precision of functional code. By defining signatures and tools, you can build agents that do not just generate text but actively solve problems using external resources. This forms the basis for more complex applications that require API access, database queries, or mathematical operations.