Build a versatile query agent with RAG, LlamaIndex, and Google Gemini
Full Stack Engineer

As a developer, you often face the challenge of retrieving information from multiple sources with different structures. What if you could create a single interface that automatically routes queries to the right data source?
Imagine your application needing to answer both “What’s the population of California?” and “What are popular attractions in Hawaii?”. The first question requires precise data from a structured database, while the second benefits from searching through descriptive documents.
In this tutorial, you will build a versatile query routing agent that intelligently directs questions to the appropriate data source: a SQL database, a document repository, or both.
Your application will provide users with a seamless interface where they can ask natural language questions about U.S. states. They will receive accurate and comprehensive answers regardless of the storage location.
You will build a U.S. state information application that:
- Automatically routes queries to the optimal data source
- Queries a SQL database for demographic information (population, land size, etc.)
- Searches document repositories for descriptions, history, and attractions
- Presents results through a clean Streamlit chat interface
Your finished application will provide a conversational interface for users to access both structured and unstructured state information.
Prerequisites
Before you begin, make sure you have:
- Python 3.9 or higher installed on your system
- Basic understanding of Python programming
- Familiarity with SQL databases and queries
- Basic understanding of LLMs and RAG (Retrieval-Augmented Generation) systems
- Google Gemini API account with an API key
- LlamaCloud Account with an API key
- CircleCI Account for deployment automation
- Docker for containerization
- Google Cloud Platform Account for hosting your application
Take a moment to verify you have access to all these accounts and API keys before proceeding.
Setting up your environment
First, clone the repository with the project code:
git clone https://github.com/CIRCLECI-GWP/circleci-llamaindex-sql-rag-agent.git
cd circleci-llamaindex-sql-rag-agent
Next, create and activate a virtual environment to isolate your project dependencies:
python -m venv sql-rag-env
source sql-rag-env/bin/activate # On Windows: sql-rag-env\Scripts\activate
pip install -r requirements.txt
The requirements.txt file contains all the necessary packages:
llama-index
llama_index.llms.google_genai
google-generativeai
sqlalchemy
requests
llama_hub
Wikipedia
llama-index-readers-web
pypdf2
streamlit
python-dotenv
pytest
Here’s what each package will be used for in your application:
- llama-index: Core framework for building LLM applications with data indexing and retrieval
- llama_index.llms.google_genai: Connects LlamaIndex with Google’s Gemini models
- google-generativeai: Official Python SDK for accessing Gemini models
- sqlalchemy: Handles database operations and structured data access
- requests: Makes HTTP requests to external APIs and web services
- llama_hub: Provides connectors for various data sources
- Wikipedia: Programmatically accesses Wikipedia content
- llama-index-readers-web: Extracts and processes web page content
- pypdf2: Processes PDF documents
- streamlit: Creates the interactive web interface
- python-dotenv: Securely manages environment variables and API keys
- pytest: Enables testing of your application components
Finally, set up your environment variables by creating a .env file with your API keys:
GOOGLE_API_KEY="your key here"
LLAMA_CLOUD_API_KEY="your key here"
LLAMA_CLOUD_ORG_ID="your org ID here"
LLAMA_CLOUD_PROJECT_NAME="your_project_name_here"
LLAMA_CLOUD_INDEX_NAME="your_index_name"
Save this file as .env in your project root. The contents of this file will load automatically when your application runs.
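To confirm the keys are picked up, you can run a quick check with python-dotenv. This is a minimal sketch rather than part of the project code; it mirrors how the application is expected to read the variables:
# Minimal check that the .env values are readable (illustrative; not part of the repo)
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory into the process environment
assert os.getenv("GOOGLE_API_KEY"), "GOOGLE_API_KEY is missing - check your .env file"
assert os.getenv("LLAMA_CLOUD_API_KEY"), "LLAMA_CLOUD_API_KEY is missing - check your .env file"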
Creating the SQL database
For this project, you’ll use a SQLite database containing demographic data about U.S. states. This structured information is perfect for answering factual questions about population, land area, capitals, and more.
The database is pre-created in the repo as states.db. If you'd like to recreate it from scratch, delete the file and then run:
python db_creator.py
This script loads state data from the states.json file and populates the database. The data comes from Back4app and contains comprehensive information about each state. Here is a sample of the data structure for Alabama:
{
"objectId": "B9IfALchYP",
"name": "Alabama",
"flag": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/5c/Flag_of_Alabama.svg/23px-Flag_of_Alabama.svg.png",
"link": "/wiki/Alabama",
"postalAbreviation": "AL",
"capital": "Montgomery",
"largestCity": "Birmingham",
"established": "Dec 14, 1819",
"population": 4874747,
"totalAreaSquareMiles": 52420,
"totalAreaSquareKilometers": 135767,
"landAreaSquareMiles": 50645,
"... and more"
}
The states.json file is located in your project root.
The db_creator.py script defines a SQLAlchemy model that maps these attributes to database columns:
# Imports shown for completeness
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class State(Base):
    __tablename__ = 'states'

    id = sa.Column(sa.Integer, primary_key=True)
    object_id = sa.Column(sa.String(50), unique=True)
    name = sa.Column(sa.String(50), unique=True, nullable=False)
    flag_url = sa.Column(sa.String(255))
    link = sa.Column(sa.String(100))
    postal_abbreviation = sa.Column(sa.String(2), unique=True)
    capital = sa.Column(sa.String(50))
    largest_city = sa.Column(sa.String(50))
    established = sa.Column(sa.String(50))
    population = sa.Column(sa.Integer)
    # Other demographic columns...
Quantitative fields like population and total_area_square_miles are the most important for this application. The app will generate SQL queries against these fields automatically.
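If you want to verify the database contents before building the query engine, a quick SQLAlchemy check works. This snippet is illustrative and not one of the project scripts; the expected values come from the Alabama sample record above:
# Sanity-check states.db directly (illustrative helper, not part of the project)
import sqlalchemy as sa

engine = sa.create_engine("sqlite:///states.db")
with engine.connect() as conn:
    row = conn.execute(
        sa.text("SELECT name, population, total_area_square_miles FROM states WHERE name = :name"),
        {"name": "Alabama"},
    ).one()
    print(row)  # expected: ('Alabama', 4874747, 52420) based on the sample record above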
Setting up the SQL query engine
Now, you will create a natural language to SQL query engine that lets users query the database in plain English. This component handles the structured data portion of your application.
Here is how to initialize the SQL query engine:
# ...
# Create the SQL query engine

# Initialize the LLM
llm = GoogleGenAI(
    model="models/gemini-2.5-pro-exp-03-25",
    google_api_key=os.getenv("GOOGLE_API_KEY"),
    temperature=0.3
)
Settings.llm = llm

# Connect to the database
engine = sa.create_engine('sqlite:///states.db', future=True)

# Create the SQL database wrapper
sql_database = SQLDatabase(engine, include_tables=["states"])

# Create the natural language to SQL query engine with more context
# (table_schema_str is a schema description string defined earlier in the script)
sql_query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["states"],
    # sample_rows_in_table_info=2,  # Include sample rows to provide better context
    llm=llm,
    embed_model=llm,
    synthesize_response=True,
    table_schema_str=table_schema_str,
    verbose=True  # This enables debug output to see the generated SQL
)
Here’s what this code does:
- Initializes the Google Gemini LLM with a low temperature (0.3) for more deterministic responses
- Connects to your SQLite database through SQLAlchemy
- Creates a SQL database wrapper that focuses only on the “states” table
- Builds the natural language SQL query engine with several important parameters:
  - synthesize_response=True: Transforms raw SQL results into natural language answers
  - table_schema_str: Provides additional context about the database schema
  - verbose=True: Displays the generated SQL for debugging
This engine will translate questions like “What’s the capital of Texas?” into proper SQL queries and execute them against your database.
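You can exercise the engine on its own before adding routing. The snippet below is illustrative usage; in recent LlamaIndex versions the generated SQL is typically exposed in the response metadata under the sql_query key, which pairs well with verbose=True for debugging:
# Try the SQL query engine directly (illustrative usage)
response = sql_query_engine.query("What is the capital of Texas?")
print(response)  # natural-language answer synthesized from the SQL result
print(response.metadata.get("sql_query"))  # the generated SQL, if exposed by your version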
Configuring LlamaCloud for document retrieval
For questions about state history, attractions, and other descriptive information, you will need a document-based retrieval system. You will use LlamaCloud to index and search through Wikipedia articles about U.S. states.
Getting the Wikipedia content
First, you need to download the Wikipedia pages for a sample of states as PDFs. Run the included script:
python wikipedia_scrapper.py
This creates a state_pdfs folder containing PDFs of the Wikipedia pages for five sample states. You can later scale the project to scrape data for all 50 states. The sample states are defined in wikipedia_scrapper.py as follows:
states = ["Alabama", "Alaska", "Arizona", "California", "New York"]
The scraper uses Wikipedia's REST API. Here is a snippet of its key functionality:
def download_wikipedia_pdf(title, output_path):
    """Download a Wikipedia page as PDF using Wikipedia's PDF export feature"""
    # Wikipedia page titles use underscores instead of spaces
    encoded_title = title.replace(' ', '_')

    # Wikipedia's PDF export URL
    url = f"https://en.wikipedia.org/api/rest_v1/page/pdf/{encoded_title}"

    # Add a user agent to avoid being blocked
    headers = {
        'User-Agent': 'Wikipedia PDF Downloader for Research (contact@example.com)'
    }

    # Download the PDF
    response = requests.get(url, headers=headers)

    if response.status_code == 200:
        # Save the PDF
        with open(output_path, 'wb') as f:
            f.write(response.content)
        return True
    else:
        print(f"Error downloading {title}: {response.status_code}")
        return False
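The script's driver loop then calls this helper for each sample state. The loop below is a sketch of that flow rather than the exact code in wikipedia_scrapper.py:
# Sketch of the driver loop (the exact code in wikipedia_scrapper.py may differ)
import os

states = ["Alabama", "Alaska", "Arizona", "California", "New York"]
os.makedirs("state_pdfs", exist_ok=True)

for state in states:
    output_path = os.path.join("state_pdfs", f"{state.replace(' ', '_')}.pdf")
    if download_wikipedia_pdf(state, output_path):
        print(f"Saved {output_path}")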
Creating an index on LlamaCloud
Now that you have the PDFs ready, create a LlamaCloud index to make this content searchable. Follow these steps:
- Log in to your LlamaCloud account. LlamaCloud provides a Default organization and a Default project.
- Edit the Default project name:
  - Go to Organization settings.
  - Click Edit project.
  - Name it sql_rag or a name you prefer.
- Create a new index:
  - Inside your project, click Create Index.
  - Name it US-States-Wiki.
  - Upload your PDFs.
- Select an embedding model and chunking strategy:
  - Choose an appropriate embedding model (for this project you can use Gemini).
  - Paste your Gemini key and validate it.
  - Enable multimodal indexing.
- Deploy your index:
  - Click Deploy index and wait for the indexing to complete.
Once indexing is complete, you will see a confirmation screen like this:
You can view your index name, project name, and organization ID. Copy and store the organization ID for future use.
LlamaCloud provides an API key for connecting to your index. To create your API key:
- Click API Key in the bottom-left corner of the dashboard.
- Click Generate New Key and enter a descriptive name.
- Copy and store your key immediately; you will not be able to access it later. If you misplace it, you can revoke the old key and generate a new one at any time.
You will use the API key to create a query engine for document-based searches:
import os
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

# Initialize the LlamaCloud index
index = LlamaCloudIndex(
    name=os.getenv("LLAMA_CLOUD_INDEX_NAME"),
    project_name=os.getenv("LLAMA_CLOUD_PROJECT_NAME"),
    organization_id=os.getenv("LLAMA_CLOUD_ORG_ID"),
    api_key=os.getenv("LLAMA_CLOUD_API_KEY")
)

# Create a query engine for the `US-States-Wiki` index
llama_cloud_query_engine = index.as_query_engine()
This code creates a connection to your LlamaCloud index and initializes a query engine that can search through the Wikipedia content about states.
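A quick smoke test confirms the index is reachable before you wire it into the agent. This is illustrative usage; any question covered by the indexed Wikipedia pages will do:
# Smoke-test the LlamaCloud query engine (illustrative usage)
response = llama_cloud_query_engine.query("What is Alabama known for historically?")
print(response)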
Creating query engine tools
Now that you have both your SQL query engine and LlamaCloud query engine, you need to create tools that your agent will use to route queries to the appropriate data source.
from llama_index.core.tools import QueryEngineTool

# Create a tool for SQL queries
sql_tool = QueryEngineTool.from_defaults(
    query_engine=sql_query_engine,
    description=(
        "Useful for answering factual questions about US states like population, "
        "capital, land size, postal abbreviation, and other demographic statistics "
        "stored in a structured database."
    ),
    name="sql_tool"
)

# Create a tool for document-based queries
llama_cloud_tool = QueryEngineTool.from_defaults(
    query_engine=llama_cloud_query_engine,
    description=(
        "Useful for answering questions about US states' history, attractions, culture, "
        "geography, and other information that requires searching through documents. "
        "This tool contains Wikipedia information about all US states."
    ),
    name="llama_cloud_tool"
)
These tools wrap your query engines with descriptive information that helps the agent understand when to use each one. The descriptions matter: they tell the agent what information each tool can access and which kinds of questions it is suited to answer.
Building the custom router ReAct agent workflow
The heart of your application is the ReAct agent that intelligently routes queries to the appropriate tools. The ReAct (Reasoning + Acting) pattern combines tool usage with self-reflection, making it powerful for query routing.
Understanding the ReAct pattern
ReAct is an advanced agentic pattern for LLM applications. It works through a loop of:
- Observation: The agent examines the user’s query and available tools
- Thought: The agent reasons about which tool(s) would best answer the query
- Action: The agent uses the selected tool(s) to retrieve information
- Reflection: The agent evaluates the results and determines if they fully answer the query
This cycle allows the agent to make informed decisions when handling each question. Here is a diagram illustrating the ReAct pattern:
The power of ReAct comes from its ability to:
- Break complex problems into steps
- Choose appropriate tools for each step
- Reflect on intermediate results
- Combine information from multiple sources
- Generate coherent responses from diverse data
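To make the loop described above concrete, here is a purely conceptual sketch of a ReAct-style loop. It is not LlamaIndex's internal implementation; llm_think, select_tool, is_answered, and summarize are hypothetical helpers that stand in for the LLM's reasoning steps:
# Conceptual ReAct-style loop (illustration only; not LlamaIndex internals).
# llm_think, select_tool, is_answered, and summarize are hypothetical helpers.
def react_loop(query, tools, max_steps=5):
    scratchpad = []  # running record of thoughts, actions, and observations
    for _ in range(max_steps):
        thought = llm_think(query, tools, scratchpad)   # Thought: which tool, and why
        tool, tool_input = select_tool(thought, tools)  # Action: choose a tool and its input
        observation = tool.query(tool_input)            # Observation: the tool's result
        scratchpad.append((thought, tool.name, observation))
        if is_answered(query, scratchpad):              # Reflection: does this answer the query?
            break
    return summarize(query, scratchpad)                 # combine results into a final answer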
Implementing the ReAct agent
Here’s how to implement the ReAct agent with your tools:
from llama_index.core.agent import ReActAgent

# Define the tools the agent can use
tools = [sql_tool, llama_cloud_tool]

# Create the agent with tools
agent = ReActAgent.from_tools(
    tools,
    llm=llm,
    verbose=True,
    system_prompt=(
        "You are an expert US States information system. "
        "You have access to two sources of information:\n\n"
        "1. A SQLite database with factual demographic data about US states in the 'states' table "
        "containing fields: object_id, name, flag_url, link, postal_abbreviation, capital, largest_city, "
        "established, population, total_area_square_miles, total_area_square_kilometers, land_area_square_miles, "
        "land_area_square_kilometers, water_area_square_miles, "
        "water_area_square_kilometers, number_representatives, created_at, updated_at\n\n"
        "2. Document retrieval for detailed information about history, attractions, and more\n\n"
        "Choose the appropriate tool based on the user's question. "
        "For the SQL tool, formulate clear SQL queries that match the database schema. "
        "Use the SQL tool for factual queries about population, area, capitals, etc. "
        "Use the document tool for questions about history, attractions, culture, and detailed information. "
        "If needed, you can use both tools and combine the information."
    )
)
The system prompt is important as it:
- Defines the agent’s role and capabilities
- Describes available data sources in detail
- Provides guidance on when to use each tool
- Explains how to handle complex queries requiring multiple tools
Create a function to handle user queries:
def answer_state_question(query):
    """
    Answer questions about US states using the appropriate tools.

    Args:
        query (str): User's natural language query

    Returns:
        str: Response to the user's query
    """
    try:
        # Use the agent to answer the query
        response = agent.query(query)
        return response
    except Exception as e:
        return f"Sorry, I couldn't process your query. Error: {str(e)}"
This function passes the user’s query to the ReAct agent, which handles tool selection, query execution, and response generation.
Testing the query router
Now test the query router with different types of questions to see how it routes them to appropriate tools. Ensure that you are working in your virtual environment.
Testing factual queries (SQL tool)
Start with a straightforward factual question that should use the SQL tool:
query = ("What is the population of California?")
print(f"\nQuestion: {query}")
print(f"Answer: {answer_state_question(query)}")
Running this code reveals the ReAct agent’s reasoning process:
The agent:
- Recognizes this is a factual question about population
- Selects the SQL tool
- Generates and executes the appropriate SQL query
- Formats the response in natural language
Testing semantic queries (LlamaCloud tool)
Now let us try a question that requires searching through documents:
query = ("Tell me about the history of New York.")
print(f"\nQuestion: {query}")
print(f"Answer: {answer_state_question(query)}")
The ReAct agent correctly routes this to the LlamaCloud tool.
Testing complex queries (multiple tools)
Finally, let us test a question that requires both factual and contextual information:
query = ("Tell me about the availability of water in Arizona. What is the water area (km2) in Arizona?")
print(f"\nQuestion: {query}")
print(f"Answer: {answer_state_question(query)}")
The ReAct agent uses both tools, addressing the factual and the semantic parts of the question.
The agent:
- Recognizes that this question has multiple parts
- Uses the SQL tool for the factual water area data
- Uses the LlamaCloud tool for contextual information about water availability
- Combines the results into a comprehensive answer
This demonstrates the power of the ReAct pattern. The agent can intelligently combine information from multiple sources to provide complete answers.
Building your Streamlit user interface
Now that your query router is working correctly, you will create a user-friendly Streamlit interface where users can interact with your application. This will provide a chat-like experience for users to ask questions about states.
Creating main app layout
Start with the page configuration and header:
import streamlit as st

st.set_page_config(
    page_title="US States Information Center",
    page_icon="🇺🇸",
    layout="centered"
)

# App header with logo
st.image("us-flag.png", width=200)
st.title("US States Information Center")
This sets up the basic page structure with a title and logo.
Next, add a brief introduction to help users understand what the app can do:
st.markdown("""
Ask me anything about US states! I can help with:
- Demographic data (population, land area, etc.)
- State capitals and largest cities
- Popular attractions and landmarks
- State history and culture
- And much more!
""")
This clear introduction helps users understand the types of questions they can ask.
Designing query input and result display
To create a chat-like experience, you need to manage conversation history and display messages:
# Initialize chat history
if "chat_history" not in st.session_state:
    st.session_state.chat_history = []

# Display chat history
for message in st.session_state.chat_history:
    with st.chat_message(message["role"]):
        st.write(message["content"])

# Chat input
if prompt := st.chat_input("Ask a question about any US state..."):
    # Display user message
    with st.chat_message("user"):
        st.write(prompt)

    # Add to history
    st.session_state.chat_history.append({"role": "user", "content": prompt})

    # Process the query with spinner
    with st.spinner("Searching for information..."):
        response = answer_state_question(prompt)

    # Display assistant response
    with st.chat_message("assistant"):
        st.write(response)

    # Add to history
    st.session_state.chat_history.append({"role": "assistant", "content": response})
This code:
- Initializes and maintains a chat history in the session state
- Displays previous messages in the conversation
- Provides an input field for new questions
- Shows a loading spinner while processing queries
- Displays the agent’s response and adds it to the history
Adding example questions
Add a sidebar with example questions to help users get started:
# Sidebar with example questions
with st.sidebar:
    st.header("Example Questions")

    example_questions = [
        "What are popular tourist attractions in Hawaii?",
        "Which state has the largest land area?",
        "Tell me about the history of New York.",
        "Which states border Florida?"
    ]

    st.write("Try asking:")
    for question in example_questions:
        if st.button(question):
            st.session_state["example_question"] = question
When users click an example question, it is stored in the session state so it can be processed in the next interaction, as shown in the sketch below.
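One way to handle the stored question on the next rerun is to pop it from the session state and push it through the same chat flow. This is a sketch built on the code above, not a snippet from the repository:
# Process a clicked example question on the next rerun (sketch; not from the repo)
if example := st.session_state.pop("example_question", None):
    with st.chat_message("user"):
        st.write(example)
    st.session_state.chat_history.append({"role": "user", "content": example})

    with st.spinner("Searching for information..."):
        response = answer_state_question(example)

    with st.chat_message("assistant"):
        st.write(response)
    st.session_state.chat_history.append({"role": "assistant", "content": response})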
Running the Streamlit app
Launch the Streamlit app:
streamlit run app.py
Go to http://localhost:8501/ to see your app running:
Try asking different questions about states and observe how the application intelligently routes your queries to the appropriate data sources and provides comprehensive answers.
Developing your deployment workflow
Now that your application is working locally, let’s deploy it to the cloud so others can access it. You will containerize the app with Docker and set up an automated CI/CD pipeline with CircleCI to deploy to Google Cloud Run.
Containerizing with Docker
You will now containerize your application for better portability and cloud deployment.
Create a Dockerfile in your project root:
FROM python:3.12.5
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
COPY states.db .
COPY tests/ ./tests/
COPY us-flag.png .
EXPOSE 8080
# Create a non-root user
RUN useradd -m appuser && chown -R appuser /app
USER appuser
# Run app.py when the container launches
CMD ["streamlit", "run", "app.py", "--server.port", "8080"]
This Dockerfile:
- Uses a Python 3.12.5 base image
- Installs your application dependencies
- Copies your application files and database
- Exposes port 8080 for web traffic
- Creates a non-root user for security
- Configures Streamlit to run on port 8080
To build the Docker image, execute:
docker build -t sql-rag-agent:latest .
To run the container, execute:
docker run -p 8501:8080 \
-e GOOGLE_API_KEY=your_api_key_here \
-e LLAMA_CLOUD_API_KEY=your_llama_key_here \
-e LLAMA_CLOUD_ORG_ID=your_llama_cloud_org_id \
-e LLAMA_CLOUD_PROJECT_NAME=your_llama_project_name \
-e LLAMA_CLOUD_INDEX_NAME=your_llama_index_name \
sql-rag-agent:latest
Make sure to replace the environment variables with actual values before running the command. Go to http://localhost:8501 to verify that your containerized app works correctly.
Creating a CI/CD pipeline with CircleCI
CircleCI will automate the testing, building, and deployment of your application to Google Cloud Run. Before setting up the pipeline, you need to:
- Create a CircleCI account if you don’t have one
- Connect your GitHub or Bitbucket repository to CircleCI
- Set up a Google Cloud project with the necessary APIs enabled
- Create a service account with appropriate permissions
Setting up Google Cloud
At this point, you need to have your Google Cloud project set up with a service account. You will also need to enable important permissions and necessary APIs (like Cloud Run API, Artifact Registry API, IAM API). If you have not done so, refer to the official Google documentation on creating and managing service accounts for a guide.
For Google Cloud setup:
- Create or select a Google Cloud project
- Enable these APIs:
- Cloud Run Admin API
- Artifact Registry API
- IAM API
- Create a service account with these roles:
  - Artifact Registry Admin (roles/artifactregistry.admin)
  - Service Account User (roles/iam.serviceAccountUser)
  - Cloud Run Admin (roles/run.admin)
- Download the service account key JSON file and base64-encode it (the sketch after this list shows one way to produce the base64 string)
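The base64 string can be produced however you prefer; here is one small Python sketch. The key file name is an assumption: use whatever file you downloaded from Google Cloud.
# Print the base64-encoded service account key for the CircleCI context (sketch)
import base64

with open("gcloud-service-key.json", "rb") as f:  # assumption: your downloaded key file
    print(base64.b64encode(f.read()).decode())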
Setting up CircleCI
CircleCI offers Contexts for advanced environment variable management. You can also set environment variables for each project. Read more on setting environment variables in CircleCI. Contexts allow you to create reusable sets of environment variables that can be applied to specific workflows.
To create and use contexts:
- Go to your CircleCI dashboard
- Click Organization Settings
- Select Contexts
- Click Create Context
- Give your context a name, for example gcp_deploy
- Add environment variables to this context:
  - GOOGLE_CLOUD_KEYFILE_JSON: Your base64-encoded service account key
  - GOOGLE_CLOUD_PROJECT: Your GCP project ID
  - GOOGLE_API_KEY: Your Google API key
  - LLAMA_CLOUD_API_KEY: Your LlamaCloud API key
  - LLAMA_CLOUD_ORG_ID: Your LlamaCloud organization ID
  - LLAMA_CLOUD_PROJECT_NAME: Your LlamaCloud project name
  - LLAMA_CLOUD_INDEX_NAME: Your LlamaCloud index name
Then, in your .circleci/config.yml file, specify which context to use:
workflows:
  build-deploy:
    jobs:
      - build-and-deploy:
          context:
            - gcp_deploy
Creating the CircleCI configuration
Create a .circleci/config.yml file in your repository:
version: 2.1

jobs:
  build-and-deploy:
    docker:
      - image: cimg/python:3.12.5
    steps:
      - checkout

      # Enable Docker support
      - setup_remote_docker:
          docker_layer_caching: true

      - run:
          name: Install Google Cloud SDK
          command: |
            curl https://sdk.cloud.google.com | bash > /dev/null 2>&1
            source $HOME/google-cloud-sdk/path.bash.inc

      # Install dependencies and run tests
      - restore_cache:
          keys:
            - v1-dependencies-{{ checksum "requirements.txt" }}

      - run:
          name: Install Dependencies
          command: |
            python -m venv venv
            . venv/bin/activate
            pip install --no-cache-dir -r requirements.txt

      - save_cache:
          paths:
            - ./venv
          key: v1-dependencies-{{ checksum "requirements.txt" }}

      - run:
          name: Run Tests
          command: |
            . venv/bin/activate
            pytest tests/ -v

      # Create the repo and build the container in Artifact Registry
      - run:
          name: Authenticate Google Cloud
          command: |
            export PATH=$HOME/google-cloud-sdk/bin:$PATH
            echo $GOOGLE_CLOUD_KEYFILE_JSON | base64 -d > ${HOME}/gcloud-service-key.json
            gcloud auth activate-service-account --key-file=${HOME}/gcloud-service-key.json
            gcloud config set project $GOOGLE_CLOUD_PROJECT
            gcloud auth configure-docker us-docker.pkg.dev

      - run:
          name: Create Artifact Registry Repository
          command: |
            export PATH=$HOME/google-cloud-sdk/bin:$PATH
            if ! gcloud artifacts repositories describe images --location=us-central1 --project=$GOOGLE_CLOUD_PROJECT > /dev/null 2>&1; then
              gcloud artifacts repositories create images \
                --repository-format=docker \
                --location=us-central1 \
                --project=$GOOGLE_CLOUD_PROJECT
            fi

      - run:
          name: Build Docker Image
          command: |
            docker build -t us-central1-docker.pkg.dev/$GOOGLE_CLOUD_PROJECT/images/sql-rag-agent:latest .

      - run:
          name: Docker Login
          command: |
            export PATH=$HOME/google-cloud-sdk/bin:$PATH
            docker login -u _json_key -p "$(cat ${HOME}/gcloud-service-key.json)" us-central1-docker.pkg.dev

      - run:
          name: Push Docker Image to Artifact Registry
          command: |
            export PATH=$HOME/google-cloud-sdk/bin:$PATH
            docker push us-central1-docker.pkg.dev/$GOOGLE_CLOUD_PROJECT/images/sql-rag-agent:latest

      - run:
          name: Deploy to Google Cloud Run
          command: |
            export PATH=$HOME/google-cloud-sdk/bin:$PATH
            gcloud run deploy sql-rag-agent \
              --image=us-central1-docker.pkg.dev/$GOOGLE_CLOUD_PROJECT/images/sql-rag-agent:latest \
              --platform=managed \
              --region=us-central1 \
              --allow-unauthenticated \
              --update-env-vars GOOGLE_API_KEY=$GOOGLE_API_KEY,LLAMA_CLOUD_API_KEY=$LLAMA_CLOUD_API_KEY,LLAMA_CLOUD_ORG_ID=$LLAMA_CLOUD_ORG_ID,LLAMA_CLOUD_PROJECT_NAME=$LLAMA_CLOUD_PROJECT_NAME,LLAMA_CLOUD_INDEX_NAME=$LLAMA_CLOUD_INDEX_NAME

workflows:
  build-deploy:
    jobs:
      - build-and-deploy:
          context:
            - gcp_deploy
This configuration:
- Sets up a Python environment
- Runs your tests
- Configures the Google Cloud SDK
- Builds and pushes your Docker image
- Deploys your application to Google Cloud Run
Triggering the CI/CD pipeline
To trigger the pipeline, push your code to your repository:
git add .
git commit -m "Initial implementation of query routing agent"
git push origin main
CircleCI will detect the push and automatically start the build and deployment process. You can monitor the progress on the CircleCI dashboard:
Once deployment completes, go to the Google Cloud Console, navigate to Cloud Run, and select your sql-rag-agent service. You'll find a URL at the top (e.g., https://sql-rag-agent-123456.us-central1.run.app/) where your application is now running and accessible to users.
If you encounter any issues during deployment:
- Check your CircleCI build logs for specific errors
- Verify that all environment variables are correctly set in the CircleCI context
- Ensure your service account has the necessary permissions
- Check that all required Google Cloud APIs are enabled.
Monitoring and observability
After deploying your application, it’s important to monitor its performance and usage. Google Cloud Run provides built-in monitoring capabilities that make this easy.
Using Google Cloud Run’s built-in monitoring dashboard
Google Cloud Run automatically collects metrics about your application’s performance without requiring any additional code. To access the monitoring dashboard:
- Go to the Google Cloud Console
- Navigate to Cloud Run
- Select your service (for example, sql-rag-agent)
- Click the Metrics tab
The dashboard provides visibility into several important metrics:
- Request metrics
- Resource usage
- Container health
Request metrics
- Request count: How many queries your application is handling
- Latency: How long requests take to process (particularly important for LLM operations)
- Error rates: Percentage of requests that fail
Resource usage
- CPU utilization: How much processing power your app is using
- Memory consumption: How much RAM your application requires
- Instance count: How many instances are running to handle load
- Billable time: How much time you’re being charged for
Container health
- Startup latency: How long it takes for new instances to start
- Instance concurrency: How many requests each instance handles simultaneously
These metrics are invaluable for understanding your application’s performance, especially since LLM operations and database queries can cause resource spikes.
Configuring alerts
You can set up alerts to notify you when critical conditions occur:
- In the Cloud Run console, click Create Alert
- Define conditions based on metrics like:
- Error rate > 5%
- P95 latency > 5 seconds
- Memory usage > 85%
- Sustained CPU usage > 80%
- Configure notification channels (email, Slack, PagerDuty, etc.)
- Add appropriate documentation for responders
For more details on setting up alerts, check the Google Cloud Run documentation.
Using logs
Cloud Run also provides comprehensive logging for your application:
- Access logs through the “Logs” tab in the Cloud Run service
- Use Cloud Logging’s Log Explorer for advanced queries
- Set up log-based alerts for specific error conditions
The built-in monitoring dashboard gives you full visibility into your application’s performance without writing additional monitoring code.
Security and best practices
When working with LLMs and databases, security is crucial. Here are some best practices to implement in your application:
- Never hardcode API keys in your application
- Use environment variables or a secure key management system
- Validate inputs and restrict database permissions to prevent SQL injection risks
- Implement rate limiting in your application to prevent abuse (a minimal sketch follows this list)
- Set up logging and monitoring to detect unusual patterns that might indicate security issues.
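For the rate-limiting point, a simple per-session limiter in Streamlit can go a long way. This is a minimal sketch under the assumption that you gate calls to answer_state_question with it; tune the limits to your traffic:
# Minimal per-session rate limit for the Streamlit app (sketch; adjust limits as needed)
import time
import streamlit as st

MAX_QUERIES = 10     # allowed queries...
WINDOW_SECONDS = 60  # ...per rolling window

def allow_query() -> bool:
    now = time.time()
    if "query_times" not in st.session_state:
        st.session_state.query_times = []
    history = st.session_state.query_times
    history[:] = [t for t in history if now - t < WINDOW_SECONDS]  # drop old timestamps
    if len(history) >= MAX_QUERIES:
        return False
    history.append(now)
    return True

# Usage inside the chat handler:
# if allow_query():
#     response = answer_state_question(prompt)
# else:
#     response = "You're sending questions too quickly - please wait a moment."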
Conclusion
You have now built a powerful query routing agent that can intelligently direct questions to either a SQL database or a document index. This approach combines the strengths of structured and unstructured data retrieval for a seamless user experience.
This architecture is highly flexible, and you can extend it by:
- Adding more data sources and tools
- Implementing more sophisticated routing logic
- Scaling up your document index to cover more topics (for example, health)
- Adding caching to improve performance
Ready to automate?
Now that you have seen how to build a versatile query agent and automated testing and deployment, put it into practice and accelerate your own development pipeline.
New to CircleCI? Sign up for a free account and start building faster today.