Knowledge Integration

Agentle provides a powerful knowledge integration system that allows agents to leverage information from various sources when generating responses. This feature is particularly useful for building specialized agents that need domain-specific knowledge beyond their pre-trained capabilities.

Basic Knowledge Integration

Here’s a simple example of integrating knowledge with an agent:

from agentle.agents.agent import Agent
from agentle.agents.knowledge.static_knowledge import StaticKnowledge
from agentle.generations.providers.google.google_genai_generation_provider import GoogleGenaiGenerationProvider

# Create an agent with static knowledge
travel_expert = Agent(
    name="Japan Travel Expert",
    generation_provider=GoogleGenaiGenerationProvider(),
    model="gemini-2.0-flash",
    instructions="You are a Japan travel expert who provides detailed information about Japanese destinations.",
    # Provide static knowledge from multiple sources
    static_knowledge=[
        # Include knowledge from local documents
        StaticKnowledge(content="data/japan_travel_guide.pdf", cache=3600),  # Cache for 1 hour
        # Include knowledge from websites
        StaticKnowledge(content="https://www.japan-guide.com/", cache="infinite"),  # Cache indefinitely
        # Include direct text knowledge
        "Tokyo is the capital of Japan and one of the most populous cities in the world."
    ]
)

# The agent will incorporate the knowledge when answering
response = travel_expert.run("What should I know about visiting Tokyo in cherry blossom season?")
print(response.text)

Knowledge Source Types

The framework supports multiple knowledge source types through the StaticKnowledge class:

  • Documents: PDF, DOCX, TXT, PPTX, and other document formats

  • URLs: Web pages and online resources

  • Raw Text: Direct text snippets

You can provide knowledge in several ways:

  1. As strings: The framework will automatically convert them to StaticKnowledge objects

    agent = Agent(
        # ... other agent settings ...
        static_knowledge=[
            # URLs as strings
            "https://example.com/research-paper.html",
            # Local file paths as strings
            "data/report.pdf",
            "references/definitions.txt",
        ]
    )
    
  2. As StaticKnowledge objects: For maximum control, especially with caching

    from agentle.agents.knowledge.static_knowledge import StaticKnowledge
    
    agent = Agent(
        # ... other agent settings ...
        static_knowledge=[
            # Document with 1 hour cache
            StaticKnowledge(content="data/report.pdf", cache=3600),
            # URL with infinite cache
            StaticKnowledge(content="https://example.com/api-docs", cache="infinite"),
            # Raw text with no cache
            StaticKnowledge(content="This is raw knowledge text", cache=None),
        ]
    )
    

Caching Behavior

The caching system works as follows:

  1. If cache is not specified or is None, content is parsed fresh each time

  2. If cache is an integer, the content is cached for that many seconds (requires aiocache)

  3. If cache is the string “infinite”, the content is cached indefinitely until the process ends (requires aiocache)

To enable caching, you’ll need to install the optional aiocache package:

pip install aiocache

Caching is particularly useful for large documents or URLs that are expensive to parse repeatedly.

How Knowledge Integration Works

When you provide static knowledge to an agent:

  1. The agent uses appropriate document parsers to extract content from each knowledge source

  2. If caching is enabled and the aiocache package is installed, parsed content is cached for the specified duration

  3. The parsed content is organized into a structured knowledge base format

  4. This knowledge base is appended to the agent’s instructions

  5. When the agent responds to queries, it can leverage this knowledge base

Custom Document Parsers

For specialized knowledge extraction needs, you can provide a custom document parser to the agent:

from agentle.agents.agent import Agent
from agentle.agents.knowledge.static_knowledge import StaticKnowledge
from agentle.parsing.parsers.file_parser import FileParser

# Create a custom document parser with specialized settings
custom_parser = FileParser(
    strategy="high",  # Use high-detail parsing
    visual_description_agent=your_custom_vision_agent  # Customize image analysis
)

# Create an agent with the custom parser
research_agent = Agent(
    # ... other agent settings ...
    static_knowledge=[
        StaticKnowledge(content="research_papers/paper.pdf", cache=3600),
        # ... other knowledge sources ...
    ],
    document_parser=custom_parser
)

You can also create completely custom document parsers by implementing the DocumentParser abstract base class:

from typing import override
from pathlib import Path
from agentle.parsing.document_parser import DocumentParser
from agentle.parsing.parsed_document import ParsedDocument
from agentle.parsing.section_content import SectionContent

# Create a custom parser
class CustomParser(DocumentParser):
    """Custom document parser implementation"""

    @override
    async def parse_async(self, document_path: str) -> ParsedDocument:
        # Implement your custom parsing logic here
        path = Path(document_path)

        # For this example, we'll just use a placeholder
        parsed_content = f"Content from {path.name} would be parsed with custom logic"

        # Return in the standard ParsedDocument format
        return ParsedDocument(
            name=path.name,
            sections=[
                SectionContent(
                    number=1,
                    text=parsed_content,
                    md=parsed_content
                )
            ]
        )

# Use the custom parser with an agent
agent = Agent(
    name="Document Expert",
    # ... other agent settings ...
    static_knowledge=[
        StaticKnowledge(content="documents/report.pdf", cache="infinite")
    ],
    # Pass your custom parser to the agent
    document_parser=CustomParser()
)

Best Practices

  1. Caching Strategy: Use appropriate caching based on how frequently the source changes

  2. Knowledge Organization: Organize knowledge into related documents rather than one large document

  3. Quality over Quantity: Provide high-quality, relevant knowledge rather than overwhelming the agent

  4. Test Different Sources: Experiment with different knowledge sources to find the best combination

  5. Update Regularly: Keep knowledge sources updated, especially for domains that change frequently