Build a versatile query agent with RAG, LlamaIndex, and Google Gemini
Full Stack Engineer

As a developer, you often face the challenge of retrieving information from multiple sources with different structures. What if you could create a single interface that automatically routes queries to the right data source?
Imagine your application needing to answer both “What’s the population of California?” and “What are popular attractions in Hawaii?”. The first question requires precise data from a structured database, while the second benefits from searching through descriptive documents.
In this tutorial, you will build a versatile query routing agent that intelligently directs questions to the appropriate data source: a SQL database, a document repository, or both.
Your application will provide users with a seamless interface where they can ask natural language questions about U.S. states. They will receive accurate and comprehensive answers regardless of the storage location.
You will build a U.S. state information application that:
- Automatically routes queries to the optimal data source
- Queries a SQL database for demographic information (population, land size, etc.)
- Searches document repositories for descriptions, history, and attractions
- Presents results through a clean Streamlit chat interface
Your finished application will provide a conversational interface for users to access both structured and unstructured state information.
Prerequisites
Before you begin, make sure you have:
- Python 3.9 or higher installed on your system
- Basic understanding of Python programming
- Familiarity with SQL databases and queries
- Basic understanding of LLMs and RAG (Retrieval-Augmented Generation) systems
- Google Gemini API account with an API key
- LlamaCloud Account with an API key
- CircleCI Account for deployment automation
- Docker for containerization
- Google Cloud Platform Account for hosting your application
Take a moment to verify you have access to all these accounts and API keys before proceeding.
Setting up your environment
First, clone the repository with the project code:
git clone https://github.com/CIRCLECI-GWP/circleci-llamaindex-sql-rag-agent.git
cd circleci-llamaindex-sql-rag-agent
Next, create and activate a virtual environment to isolate your project dependencies:
python -m venv sql-rag-env
source sql-rag-env/bin/activate # On Windows: sql-rag-env\Scripts\activate
pip install -r requirements.txt
The requirements.txt file contains all the necessary packages:
llama-index
llama_index.llms.google_genai
google-generativeai
sqlalchemy
requests
llama_hub
Wikipedia
llama-index-readers-web
pypdf2
streamlit
python-dotenv
pytest
Here’s what each package will be used for in your application:
- llama-index: Core framework for building LLM applications with data indexing and retrieval
- llama_index.llms.google_genai: Connects LlamaIndex with Google’s Gemini models
- google-generativeai: Official Python SDK for accessing Gemini models
- sqlalchemy: Handles database operations and structured data access
- requests: Makes HTTP requests to external APIs and web services
- llama_hub: Provides connectors for various data sources
- Wikipedia: Programmatically accesses Wikipedia content
- llama-index-readers-web: Extracts and processes web page content
- pypdf2: Processes PDF documents
- streamlit: Creates the interactive web interface
- python-dotenv: Securely manages environment variables and API keys
- pytest: Enables testing of your application components
Finally, set up your environment variables by creating a .env file with your API keys:
GOOGLE_API_KEY="your key here"
LLAMA_CLOUD_API_KEY="your key here"
LLAMA_CLOUD_ORG_ID="your org ID here"
LLAMA_CLOUD_PROJECT_NAME="your_project_name_here"
LLAMA_CLOUD_INDEX_NAME="your_index_name"
Save this file as .env in your project root. The contents of this file will load automatically when your application runs.
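To confirm the keys are picked up, you can run a quick check with python-dotenv. This is a minimal sketch rather than part of the project code; it mirrors how the application is expected to read the variables:
# Minimal check that the .env values are readable (illustrative; not part of the repo)
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory into the process environment
assert os.getenv("GOOGLE_API_KEY"), "GOOGLE_API_KEY is missing - check your .env file"
assert os.getenv("LLAMA_CLOUD_API_KEY"), "LLAMA_CLOUD_API_KEY is missing - check your .env file"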
Creating the SQL database
For this project, you’ll use a SQLite database containing demographic data about U.S. states. This structured information is perfect for answering factual questions about population, land area, capitals, and more.
The database is pre-created in the repo as states.db. If you'd like to recreate it from scratch, delete the file and then run:
python db_creator.py
This script loads state data from the states.json file and populates the database. The data comes from Back4app and contains comprehensive information about each state. Here is a sample of the data structure for Alabama:
{
"objectId": "B9IfALchYP",
"name": "Alabama",
"flag": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/5c/Flag_of_Alabama.svg/23px-Flag_of_Alabama.svg.png",
"link": "/wiki/Alabama",
"postalAbreviation": "AL",
"capital": "Montgomery",
"largestCity": "Birmingham",
"established": "Dec 14, 1819",
"population": 4874747,
"totalAreaSquareMiles": 52420,
"totalAreaSquareKilometers": 135767,
"landAreaSquareMiles": 50645,
"... and more"
}
The states.json file is located in your project root.
The db_creator.py script defines a SQLAlchemy model that maps these attributes to database columns:
# Imports shown for completeness
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class State(Base):
    __tablename__ = 'states'

    id = sa.Column(sa.Integer, primary_key=True)
    object_id = sa.Column(sa.String(50), unique=True)
    name = sa.Column(sa.String(50), unique=True, nullable=False)
    flag_url = sa.Column(sa.String(255))
    link = sa.Column(sa.String(100))
    postal_abbreviation = sa.Column(sa.String(2), unique=True)
    capital = sa.Column(sa.String(50))
    largest_city = sa.Column(sa.String(50))
    established = sa.Column(sa.String(50))
    population = sa.Column(sa.Integer)
    # Other demographic columns...
Quantitative fields like population and total_area_square_miles are the most important for this application. The app will generate SQL queries against these fields automatically.
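If you want to verify the database contents before building the query engine, a quick SQLAlchemy check works. This snippet is illustrative and not one of the project scripts; the expected values come from the Alabama sample record above:
# Sanity-check states.db directly (illustrative helper, not part of the project)
import sqlalchemy as sa

engine = sa.create_engine("sqlite:///states.db")
with engine.connect() as conn:
    row = conn.execute(
        sa.text("SELECT name, population, total_area_square_miles FROM states WHERE name = :name"),
        {"name": "Alabama"},
    ).one()
    print(row)  # expected: ('Alabama', 4874747, 52420) based on the sample record above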
Setting up the SQL query engine
Now, you will create a natural language to SQL query engine that lets users query the database in plain English. This component handles the structured data portion of your application.
Here is how to initialize the SQL query engine:
# ...
# Create the SQL query engine

# Initialize the LLM
llm = GoogleGenAI(
    model="models/gemini-2.5-pro-exp-03-25",
    google_api_key=os.getenv("GOOGLE_API_KEY"),
    temperature=0.3
)
Settings.llm = llm

# Connect to the database
engine = sa.create_engine('sqlite:///states.db', future=True)

# Create the SQL database wrapper
sql_database = SQLDatabase(engine, include_tables=["states"])

# Create the natural language to SQL query engine with more context
# (table_schema_str is a schema description string defined earlier in the script)
sql_query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["states"],
    # sample_rows_in_table_info=2,  # Include sample rows to provide better context
    llm=llm,
    embed_model=llm,
    synthesize_response=True,
    table_schema_str=table_schema_str,
    verbose=True  # This enables debug output to see the generated SQL
)
Here’s what this code does:
- Initializes the Google Gemini LLM with a low temperature (0.3) for more deterministic responses
- Connects to your SQLite database through SQLAlchemy
- Creates a SQL database wrapper that focuses only on the “states” table
- Builds the natural language SQL query engine with several important parameters:
  - synthesize_response=True: Transforms raw SQL results into natural language answers
  - table_schema_str: Provides additional context about the database schema
  - verbose=True: Displays the generated SQL for debugging
This engine will translate questions like “What’s the capital of Texas?” into proper SQL queries and execute them against your database.
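You can exercise the engine on its own before adding routing. The snippet below is illustrative usage; in recent LlamaIndex versions the generated SQL is typically exposed in the response metadata under the sql_query key, which pairs well with verbose=True for debugging:
# Try the SQL query engine directly (illustrative usage)
response = sql_query_engine.query("What is the capital of Texas?")
print(response)  # natural-language answer synthesized from the SQL result
print(response.metadata.get("sql_query"))  # the generated SQL, if exposed by your version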
Configuring LlamaCloud for document retrieval
For questions about state history, attractions, and other descriptive information, you will need a document-based retrieval system. You will use LlamaCloud to index and search through Wikipedia articles about U.S. states.
Getting the Wikipedia content
First, you need to download the Wikipedia pages for a sample of states as PDFs. Run the included script:
python wikipedia_scrapper.py
This creates a state_pdfs folder containing PDFs of the Wikipedia pages for five sample states. You can later scale the project to scrape data for all 50 states. The sample states are defined in wikipedia_scrapper.py as follows:
states = ["Alabama", "Alaska", "Arizona", "California", "New York"]
The scraper uses Wikipedia's REST API. Here is a snippet of its key functionality:
def download_wikipedia_pdf(title, output_path):
    """Download a Wikipedia page as PDF using Wikipedia's PDF export feature"""
    # Wikipedia page titles use underscores instead of spaces
    encoded_title = title.replace(' ', '_')

    # Wikipedia's PDF export URL
    url = f"https://en.wikipedia.org/api/rest_v1/page/pdf/{encoded_title}"

    # Add a user agent to avoid being blocked
    headers = {
        'User-Agent': 'Wikipedia PDF Downloader for Research (contact@example.com)'
    }

    # Download the PDF
    response = requests.get(url, headers=headers)

    if response.status_code == 200:
        # Save the PDF
        with open(output_path, 'wb') as f:
            f.write(response.content)
        return True
    else:
        print(f"Error downloading {title}: {response.status_code}")
        return False
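The script's driver loop then calls this helper for each sample state. The loop below is a sketch of that flow rather than the exact code in wikipedia_scrapper.py:
# Sketch of the driver loop (the exact code in wikipedia_scrapper.py may differ)
import os

states = ["Alabama", "Alaska", "Arizona", "California", "New York"]
os.makedirs("state_pdfs", exist_ok=True)

for state in states:
    output_path = os.path.join("state_pdfs", f"{state.replace(' ', '_')}.pdf")
    if download_wikipedia_pdf(state, output_path):
        print(f"Saved {output_path}")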
Creating an index on LlamaCloud
Now that you have the PDFs ready, create a LlamaCloud index to make this content searchable. Follow these steps:
- Log in to your LlamaCloud account. LlamaCloud provides a Default organization and a Default project.
- Edit the Default project name:
  - Go to Organization settings.
  - Click Edit project.
  - Name it sql_rag or a name you prefer.
- Create a new index:
  - Inside your project, click Create Index.
  - Name it US-States-Wiki.
  - Upload your PDFs.
- Select an embedding model and chunking strategy:
  - Choose an appropriate embedding model (for this project you can use Gemini).
  - Paste your Gemini key and validate it.
  - Enable multimodal indexing.
- Deploy your index:
  - Click Deploy index and wait for the indexing to complete.
Once indexing is complete, you will see a confirmation screen like this:
You can view your index name, project name, and organization ID. Copy and store the organization ID for future use.
LlamaCloud provides an API key for connecting to your index. To create your API key:
- Click API Key in the bottom-left corner of the dashboard.
- Click Generate New Key and enter a descriptive name.
- Copy and store your key immediately; you will not be able to access it later. If you misplace it, you can revoke the old key and generate a new one at any time.
You will use the API key to create a query engine for document-based searches:
import os
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

# Initialize the LlamaCloud index
index = LlamaCloudIndex(
    name=os.getenv("LLAMA_CLOUD_INDEX_NAME"),
    project_name=os.getenv("LLAMA_CLOUD_PROJECT_NAME"),
    organization_id=os.getenv("LLAMA_CLOUD_ORG_ID"),
    api_key=os.getenv("LLAMA_CLOUD_API_KEY")
)

# Create a query engine for the `US-States-Wiki` index
llama_cloud_query_engine = index.as_query_engine()
This code creates a connection to your LlamaCloud index and initializes a query engine that can search through the Wikipedia content about states.
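A quick smoke test confirms the index is reachable before you wire it into the agent. This is illustrative usage; any question covered by the indexed Wikipedia pages will do:
# Smoke-test the LlamaCloud query engine (illustrative usage)
response = llama_cloud_query_engine.query("What is Alabama known for historically?")
print(response)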
Creating query engine tools
Now that you have both your SQL query engine and LlamaCloud query engine, you need to create tools that your agent will use to route queries to the appropriate data source.
from llama_index.core.tools import QueryEngineTool

# Create a tool for SQL queries
sql_tool = QueryEngineTool.from_defaults(
    query_engine=sql_query_engine,
    description=(
        "Useful for answering factual questions about US states like population, "
        "capital, land size, postal abbreviation, and other demographic statistics "
        "stored in a structured database."
    ),
    name="sql_tool"
)

# Create a tool for document-based queries
llama_cloud_tool = QueryEngineTool.from_defaults(
    query_engine=llama_cloud_query_engine,
    description=(
        "Useful for answering questions about US states' history, attractions, culture, "
        "geography, and other information that requires searching through documents. "
        "This tool contains Wikipedia information about all US states."
    ),
    name="llama_cloud_tool"
)
These tools wrap your query engines with descriptive information that helps the agent understand when to use each one. The descriptions matter: they tell the agent what information each tool can access and which kinds of questions it is suited to answer.
Building the custom router ReAct agent workflow
The heart of your application is the ReAct agent that intelligently routes queries to the appropriate tools. The ReAct (Reasoning + Acting) pattern combines tool usage with self-reflection, making it powerful for query routing.
Understanding the ReAct pattern
ReAct is an advanced agentic pattern for LLM applications. It works through a loop of:
- Observation: The agent examines the user’s query and available tools
- Thought: The agent reasons about which tool(s) would best answer the query
- Action: The agent uses the selected tool(s) to retrieve information
- Reflection: The agent evaluates the results and determines if they fully answer the query
This cycle allows the agent to make informed decisions when handling each question. Here is a diagram illustrating the ReAct pattern:
The power of ReAct comes from its ability to:
- Break complex problems into steps
- Choose appropriate tools for each step
- Reflect on intermediate results
- Combine information from multiple sources
- Generate coherent responses from diverse data
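To make the loop described above concrete, here is a purely conceptual sketch of a ReAct-style loop. It is not LlamaIndex's internal implementation; llm_think, select_tool, is_answered, and summarize are hypothetical helpers that stand in for the LLM's reasoning steps:
# Conceptual ReAct-style loop (illustration only; not LlamaIndex internals).
# llm_think, select_tool, is_answered, and summarize are hypothetical helpers.
def react_loop(query, tools, max_steps=5):
    scratchpad = []  # running record of thoughts, actions, and observations
    for _ in range(max_steps):
        thought = llm_think(query, tools, scratchpad)   # Thought: which tool, and why
        tool, tool_input = select_tool(thought, tools)  # Action: choose a tool and its input
        observation = tool.query(tool_input)            # Observation: the tool's result
        scratchpad.append((thought, tool.name, observation))
        if is_answered(query, scratchpad):              # Reflection: does this answer the query?
            break
    return summarize(query, scratchpad)                 # combine results into a final answer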
Implementing the ReAct agent
Here’s how to implement the ReAct agent with your tools:
from llama_index.core.agent import ReActAgent

# Define the tools the agent can use
tools = [sql_tool, llama_cloud_tool]

# Create the agent with tools
agent = ReActAgent.from_tools(
    tools,
    llm=llm,
    verbose=True,
    system_prompt=(
        "You are an expert US States information system. "
        "You have access to two sources of information:\n\n"
        "1. A SQLite database with factual demographic data about US states in the 'states' table "
        "containing fields: object_id, name, flag_url, link, postal_abbreviation, capital, largest_city, "
        "established, population, total_area_square_miles, total_area_square_kilometers, land_area_square_miles, "
        "land_area_square_kilometers, water_area_square_miles, "
        "water_area_square_kilometers, number_representatives, created_at, updated_at\n\n"
        "2. Document retrieval for detailed information about history, attractions, and more\n\n"
        "Choose the appropriate tool based on the user's question. "
        "For the SQL tool, formulate clear SQL queries that match the database schema. "
        "Use the SQL tool for factual queries about population, area, capitals, etc. "
        "Use the document tool for questions about history, attractions, culture, and detailed information. "
        "If needed, you can use both tools and combine the information."
    )
)
The system prompt is important as it:
- Defines the agent’s role and capabilities
- Describes available data sources in detail
- Provides guidance on when to use each tool
- Explains how to handle complex queries requiring multiple tools
Create a function to handle user queries:
def answer_state_question(query):
    """
    Answer questions about US states using the appropriate tools.

    Args:
        query (str): User's natural language query

    Returns:
        str: Response to the user's query
    """
    try:
        # Use the agent to answer the query
        response = agent.query(query)
        return response
    except Exception as e:
        return f"Sorry, I couldn't process your query. Error: {str(e)}"
This function passes the user’s query to the ReAct agent, which handles tool selection, query execution, and response generation.
Testing the query router
Now test the query router with different types of questions to see how it routes them to appropriate tools. Ensure that you are working in your virtual environment.
Testing factual queries (SQL tool)
Start with a straightforward factual question that should use the SQL tool:
query = ("What is the population of California?")
print(f"\nQuestion: {query}")
print(f"Answer: {answer_state_question(query)}")
Running this code reveals the ReAct agent’s reasoning process:
The agent:
- Recognizes this is a factual question about population
- Selects the SQL tool
- Generates and executes the appropriate SQL query
- Formats the response in natural language
Testing semantic queries (LlamaCloud tool)
Now let us try a question that requires searching through documents:
query = ("Tell me about the history of New York.")
print(f"\nQuestion: {query}")
print(f"Answer: {answer_state_question(query)}")
The ReAct agent correctly routes this to the LlamaCloud tool.
Testing complex queries (multiple tools)
Finally, let us test a question that requires both factual and contextual information:
query = ("Tell me about the availability of water in Arizona. What is the water area (km2) in Arizona?")
print(f"\nQuestion: {query}")
print(f"Answer: {answer_state_question(query)}")
The ReAct agent uses both tools, addressing the factual and the semantic parts of the question.
The agent:
- Recognizes that this question has multiple parts
- Uses the SQL tool for the factual water area data
- Uses the LlamaCloud tool for contextual information about water availability
- Combines the results into a comprehensive answer
This demonstrates the power of the ReAct pattern. The agent can intelligently combine information from multiple sources to provide complete answers.
Building your Streamlit user interface
Now that your query router is working correctly, you will create a user-friendly Streamlit interface where users can interact with your application. This will provide a chat-like experience for users to ask questions about states.
Creating main app layout
Start with the page configuration and header:
import streamlit as st

st.set_page_config(
    page_title="US States Information Center",
    page_icon="🇺🇸",
    layout="centered"
)

# App header with logo
st.image("us-flag.png", width=200)
st.title("US States Information Center")
This sets up the basic page structure with a title and logo.
Next, add a brief introduction to help users understand what the app can do:
st.markdown("""
Ask me anything about US states! I can help with:
- Demographic data (population, land area, etc.)
- State capitals and largest cities
- Popular attractions and landmarks
- State history and culture
- And much more!
""")
This clear introduction helps users understand the types of questions they can ask.
Designing query input and result display
To create a chat-like experience, you need to manage conversation history and display messages:
# Initialize chat history
if "chat_history" not in st.session_state:
    st.session_state.chat_history = []

# Display chat history
for message in st.session_state.chat_history:
    with st.chat_message(message["role"]):
        st.write(message["content"])

# Chat input
if prompt := st.chat_input("Ask a question about any US state..."):
    # Display user message
    with st.chat_message("user"):
        st.write(prompt)

    # Add to history
    st.session_state.chat_history.append({"role": "user", "content": prompt})

    # Process the query with spinner
    with st.spinner("Searching for information..."):
        response = answer_state_question(prompt)

    # Display assistant response
    with st.chat_message("assistant"):
        st.write(response)

    # Add to history
    st.session_state.chat_history.append({"role": "assistant", "content": response})
This code:
- Initializes and maintains a chat history in the session state
- Displays previous messages in the conversation
- Provides an input field for new questions
- Shows a loading spinner while processing queries
- Displays the agent’s response and adds it to the history
Adding example questions
Add a sidebar with example questions to help users get started:
# Sidebar with example questions
with st.sidebar:
    st.header("Example Questions")

    example_questions = [
        "What are popular tourist attractions in Hawaii?",
        "Which state has the largest land area?",
        "Tell me about the history of New York.",
        "Which states border Florida?"
    ]

    st.write("Try asking:")
    for question in example_questions:
        if st.button(question):
            st.session_state["example_question"] = question
When users click an example question, it is stored in the session state so it can be processed in the next interaction, as shown in the sketch below.
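One way to handle the stored question on the next rerun is to pop it from the session state and push it through the same chat flow. This is a sketch built on the code above, not a snippet from the repository:
# Process a clicked example question on the next rerun (sketch; not from the repo)
if example := st.session_state.pop("example_question", None):
    with st.chat_message("user"):
        st.write(example)
    st.session_state.chat_history.append({"role": "user", "content": example})

    with st.spinner("Searching for information..."):
        response = answer_state_question(example)

    with st.chat_message("assistant"):
        st.write(response)
    st.session_state.chat_history.append({"role": "assistant", "content": response})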
Running the Streamlit app
Launch the Streamlit app:
streamlit run app.py
Go to http://localhost:8501/ to see your app running:
Try asking different questions about states and observe how the application intelligently routes your queries to the appropriate data sources and provides comprehensive answers.
Developing your deployment workflow
Now that your application is working locally, let’s deploy it to the cloud so others can access it. You will containerize the app with Docker and set up an automated CI/CD pipeline with CircleCI to deploy to Google Cloud Run.
Containerizing with Docker
You will now containerize your application for better portability and cloud deployment.
Create a Dockerfile in your project root:
FROM python:3.12.5
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
COPY states.db .
COPY tests/ ./tests/
COPY us-flag.png .
EXPOSE 8080
# Create a non-root user
RUN useradd -m appuser && chown -R appuser /app
USER appuser
# Run app.py when the container launches
CMD ["streamlit", "run", "app.py", "--server.port", "8080"]
This Dockerfile:
- Uses a Python 3.12.5 base image
- Installs your application dependencies
- Copies your application files and database
- Exposes port 8080 for web traffic
- Creates a non-root user for security
- Configures Streamlit to run on port 8080
To build the Docker image, execute:
docker build -t sql-rag-agent:latest .
To run the container, execute:
docker run -p 8501:8080 \
-e GOOGLE_API_KEY=your_api_key_here \
-e LLAMA_CLOUD_API_KEY=your_llama_key_here \
-e LLAMA_CLOUD_ORG_ID=your_llama_cloud_org_id \
-e LLAMA_CLOUD_PROJECT_NAME=your_llama_project_name \
-e LLAMA_CLOUD_INDEX_NAME=your_llama_index_name \
sql-rag-agent:latest
Make sure to replace the environment variables with actual values before running the command. Go to http://localhost:8501 to verify that your containerized app works correctly.
Creating a CI/CD pipeline with CircleCI
CircleCI will automate the testing, building, and deployment of your application to Google Cloud Run. Before setting up the pipeline, you need to:
- Create a CircleCI account if you don’t have one
- Connect your GitHub or Bitbucket repository to CircleCI
- Set up a Google Cloud project with the necessary APIs enabled
- Create a service account with appropriate permissions
Setting up Google Cloud
At this point, you need to have your Google Cloud project set up with a service account. You will also need to enable important permissions and necessary APIs (like Cloud Run API, Artifact Registry API, IAM API). If you have not done so, refer to the official Google documentation on creating and managing service accounts for a guide.
For Google Cloud setup:
- Create or select a Google Cloud project
- Enable these APIs:
- Cloud Run Admin API
- Artifact Registry API
- IAM API
- Create a service account with these roles:
  - Artifact Registry Admin (roles/artifactregistry.admin)
  - Service Account User (roles/iam.serviceAccountUser)
  - Cloud Run Admin (roles/run.admin)
- Download the service account key JSON file and base64-encode it (the sketch after this list shows one way to produce the base64 string)
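The base64 string can be produced however you prefer; here is one small Python sketch. The key file name is an assumption: use whatever file you downloaded from Google Cloud.
# Print the base64-encoded service account key for the CircleCI context (sketch)
import base64

with open("gcloud-service-key.json", "rb") as f:  # assumption: your downloaded key file
    print(base64.b64encode(f.read()).decode())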
Setting up CircleCI
CircleCI offers Contexts for advanced environment variable management. You can also set environment variables for each project. Read more on setting environment variables in CircleCI. Contexts allow you to create reusable sets of environment variables that can be applied to specific workflows.
To create and use contexts:
- Go to your CircleCI dashboard
- Click Organization Settings
- Select Contexts
- Click Create Context
- Give your context a name, for example gcp_deploy
- Add environment variables to this context:
  - GOOGLE_CLOUD_KEYFILE_JSON: Your base64-encoded service account key
  - GOOGLE_CLOUD_PROJECT: Your GCP project ID
  - GOOGLE_API_KEY: Your Google API key
  - LLAMA_CLOUD_API_KEY: Your LlamaCloud API key
  - LLAMA_CLOUD_ORG_ID: Your LlamaCloud organization ID
  - LLAMA_CLOUD_PROJECT_NAME: Your LlamaCloud project name
  - LLAMA_CLOUD_INDEX_NAME: Your LlamaCloud index name
Then, in your .circleci/config.yml file, specify which context to use:
workflows:
  build-deploy:
    jobs:
      - build-and-deploy:
          context:
            - gcp_deploy
Creating the CircleCI configuration
Create a .circleci/config.yml file in your repository:
version: 2.1

jobs:
  build-and-deploy:
    docker:
      - image: cimg/python:3.12.5
    steps:
      - checkout

      # Enable Docker support
      - setup_remote_docker:
          docker_layer_caching: true

      - run:
          name: Install Google Cloud SDK
          command: |
            curl https://sdk.cloud.google.com | bash > /dev/null 2>&1
            source $HOME/google-cloud-sdk/path.bash.inc

      # Install dependencies and run tests
      - restore_cache:
          keys:
            - v1-dependencies-{{ checksum "requirements.txt" }}

      - run:
          name: Install Dependencies
          command: |
            python -m venv venv
            . venv/bin/activate
            pip install --no-cache-dir -r requirements.txt

      - save_cache:
          paths:
            - ./venv
          key: v1-dependencies-{{ checksum "requirements.txt" }}

      - run:
          name: Run Tests
          command: |
            . venv/bin/activate
            pytest tests/ -v

      # Create the repo and build the container in Artifact Registry
      - run:
          name: Authenticate Google Cloud
          command: |
            export PATH=$HOME/google-cloud-sdk/bin:$PATH
            echo $GOOGLE_CLOUD_KEYFILE_JSON | base64 -d > ${HOME}/gcloud-service-key.json
            gcloud auth activate-service-account --key-file=${HOME}/gcloud-service-key.json
            gcloud config set project $GOOGLE_CLOUD_PROJECT
            gcloud auth configure-docker us-docker.pkg.dev

      - run:
          name: Create Artifact Registry Repository
          command: |
            export PATH=$HOME/google-cloud-sdk/bin:$PATH
            if ! gcloud artifacts repositories describe images --location=us-central1 --project=$GOOGLE_CLOUD_PROJECT > /dev/null 2>&1; then
              gcloud artifacts repositories create images \
                --repository-format=docker \
                --location=us-central1 \
                --project=$GOOGLE_CLOUD_PROJECT
            fi

      - run:
          name: Build Docker Image
          command: |
            docker build -t us-central1-docker.pkg.dev/$GOOGLE_CLOUD_PROJECT/images/sql-rag-agent:latest .

      - run:
          name: Docker Login
          command: |
            export PATH=$HOME/google-cloud-sdk/bin:$PATH
            docker login -u _json_key -p "$(cat ${HOME}/gcloud-service-key.json)" us-central1-docker.pkg.dev

      - run:
          name: Push Docker Image to Artifact Registry
          command: |
            export PATH=$HOME/google-cloud-sdk/bin:$PATH
            docker push us-central1-docker.pkg.dev/$GOOGLE_CLOUD_PROJECT/images/sql-rag-agent:latest

      - run:
          name: Deploy to Google Cloud Run
          command: |
            export PATH=$HOME/google-cloud-sdk/bin:$PATH
            gcloud run deploy sql-rag-agent \
              --image=us-central1-docker.pkg.dev/$GOOGLE_CLOUD_PROJECT/images/sql-rag-agent:latest \
              --platform=managed \
              --region=us-central1 \
              --allow-unauthenticated \
              --update-env-vars GOOGLE_API_KEY=$GOOGLE_API_KEY,LLAMA_CLOUD_API_KEY=$LLAMA_CLOUD_API_KEY,LLAMA_CLOUD_ORG_ID=$LLAMA_CLOUD_ORG_ID,LLAMA_CLOUD_PROJECT_NAME=$LLAMA_CLOUD_PROJECT_NAME,LLAMA_CLOUD_INDEX_NAME=$LLAMA_CLOUD_INDEX_NAME

workflows:
  build-deploy:
    jobs:
      - build-and-deploy:
          context:
            - gcp_deploy
This configuration:
- Sets up a Python environment
- Runs your tests
- Configures the Google Cloud SDK
- Builds and pushes your Docker image
- Deploys your application to Google Cloud Run
Triggering the CI/CD pipeline
To trigger the pipeline, push your code to your repository:
git add .
git commit -m "Initial implementation of query routing agent"
git push origin main
CircleCI will detect the push and automatically start the build and deployment process. You can monitor the progress on the CircleCI dashboard:
Once deployment completes, go to the Google Cloud Console, navigate to Cloud Run, and select your sql-rag-agent service. You'll find a URL at the top (e.g., https://sql-rag-agent-123456.us-central1.run.app/) where your application is now running and accessible to users.
If you encounter any issues during deployment:
- Check your CircleCI build logs for specific errors
- Verify that all environment variables are correctly set in the CircleCI context
- Ensure your service account has the necessary permissions
- Check that all required Google Cloud APIs are enabled.
Monitoring and observability
After deploying your application, it’s important to monitor its performance and usage. Google Cloud Run provides built-in monitoring capabilities that make this easy.
Using Google Cloud Run’s built-in monitoring dashboard
Google Cloud Run automatically collects metrics about your application’s performance without requiring any additional code. To access the monitoring dashboard:
- Go to the Google Cloud Console
- Navigate to Cloud Run
- Select your service (for example, sql-rag-agent)
- Click the Metrics tab
The dashboard provides visibility into several important metrics:
- Request metrics
- Resource usage
- Container health
Request metrics
- Request count: How many queries your application is handling
- Latency: How long requests take to process (particularly important for LLM operations)
- Error rates: Percentage of requests that fail
Resource usage
- CPU utilization: How much processing power your app is using
- Memory consumption: How much RAM your application requires
- Instance count: How many instances are running to handle load
- Billable time: How much time you’re being charged for
Container health
- Startup latency: How long it takes for new instances to start
- Instance concurrency: How many requests each instance handles simultaneously
These metrics are invaluable for understanding your application’s performance, especially since LLM operations and database queries can cause resource spikes.
Configuring alerts
You can set up alerts to notify you when critical conditions occur:
- In the Cloud Run console, click Create Alert
- Define conditions based on metrics like:
- Error rate > 5%
- P95 latency > 5 seconds
- Memory usage > 85%
- Sustained CPU usage > 80%
- Configure notification channels (email, Slack, PagerDuty, etc.)
- Add appropriate documentation for responders
For more details on setting up alerts, check the Google Cloud Run documentation.
Using logs
Cloud Run also provides comprehensive logging for your application:
- Access logs through the “Logs” tab in the Cloud Run service
- Use Cloud Logging’s Log Explorer for advanced queries
- Set up log-based alerts for specific error conditions
The built-in monitoring dashboard gives you full visibility into your application’s performance without writing additional monitoring code.
Security and best practices
When working with LLMs and databases, security is crucial. Here are some best practices to implement in your application:
- Never hardcode API keys in your application
- Use environment variables or a secure key management system
- Validate inputs and restrict database permissions to prevent SQL injection risks
- Implement rate limiting in your application to prevent abuse (a minimal sketch follows this list)
- Set up logging and monitoring to detect unusual patterns that might indicate security issues.
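For the rate-limiting point, a simple per-session limiter in Streamlit can go a long way. This is a minimal sketch under the assumption that you gate calls to answer_state_question with it; tune the limits to your traffic:
# Minimal per-session rate limit for the Streamlit app (sketch; adjust limits as needed)
import time
import streamlit as st

MAX_QUERIES = 10     # allowed queries...
WINDOW_SECONDS = 60  # ...per rolling window

def allow_query() -> bool:
    now = time.time()
    if "query_times" not in st.session_state:
        st.session_state.query_times = []
    history = st.session_state.query_times
    history[:] = [t for t in history if now - t < WINDOW_SECONDS]  # drop old timestamps
    if len(history) >= MAX_QUERIES:
        return False
    history.append(now)
    return True

# Usage inside the chat handler:
# if allow_query():
#     response = answer_state_question(prompt)
# else:
#     response = "You're sending questions too quickly - please wait a moment."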
Conclusion
You have now built a powerful query routing agent that can intelligently direct questions to either a SQL database or a document index. This approach combines the strengths of structured and unstructured data retrieval for a seamless user experience.
This architecture is highly flexible, and you can extend it by:
- Adding more data sources and tools
- Implementing more sophisticated routing logic
- Scaling up your document index to cover more topics (for example, health)
- Adding caching to improve performance
Ready to automate?
Now that you have seen how to build a versatile query agent and automated testing and deployment, put it into practice and accelerate your own development pipeline.
New to CircleCI? Sign up for a free account and start building faster today.