β οΈ Educational Workshop: This repository contains demonstration code for AWS re:Invent 2025. Not intended for production deployment without proper security hardening and testing.
Workshop Duration: 2 hours | Hands-on: Parts 1 & 3 (50 min) | Guided Demo: Part 2 (20 min) | Optional: Part 4 (Self-paced)
Build enterprise-grade agentic AI applications with semantic search, multi-agent orchestration, and Model Context Protocol integration. Leverage Amazon Aurora PostgreSQL 17.5 with pgvector 0.8.0, Amazon Bedrock (Claude Sonnet 4 + Titan Text Embeddings v2), and modern full-stack technologies.
start-backend # Terminal 1: FastAPI backend (port 8000) start-frontend # Terminal 2: React frontend (port 5173)Access Points:
- π Frontend:
<CloudFront-URL>/ports/5173/ - π API Docs:
<CloudFront-URL>/ports/8000/docs - π Health:
<CloudFront-URL>/ports/8000/api/health
βββ notebooks/ # Workshop Notebooks (Parts 1-4) β βββ Part_1_Semantic_Search_Foundations_Exercises.ipynb β βββ Part_1_Semantic_Search_Foundations_Solutions.ipynb β βββ Part_2_Context_Management_Custom_Tools_Exercises.ipynb β βββ Part_2_Context_Management_Custom_Tools_Solutions.ipynb β βββ Part_3_Multi_Agent_Orchestration_Exercises.ipynb β βββ Part_3_Multi_Agent_Orchestration_Solutions.ipynb β βββ Part_4_Advanced_Topics_Production_Patterns.ipynb β βββ requirements.txt βββ blaize-bazaar/ # Full-Stack Demo Application β βββ backend/ # FastAPI + Multi-Agent System β β βββ agents/ # Orchestrator, Inventory, Pricing, Recommendation β β βββ services/ # Search, MCP, Bedrock integration β β βββ models/ # Pydantic data models β β βββ app.py # FastAPI application β βββ frontend/ # React + TypeScript UI β β βββ src/ # Components, hooks, services β βββ config/ # MCP server configuration β βββ start-backend.sh β βββ start-frontend.sh βββ data/ # Product catalog datasets β βββ amazon-products-sample.csv βββ scripts/ # Setup & bootstrap scripts βββ bootstrap-environment.sh βββ bootstrap-labs.sh βββ load-database-fast.sh Building semantic search with pgvector 0.8.0 and Aurora PostgreSQL
- Vector embeddings with Amazon Titan Text Embeddings v2 (1024 dimensions)
- HNSW indexing for production-scale similarity search
- Enterprise-tuned indexes (M=16, ef_construction=64)
- Automatic iterative scanning for guaranteed recall
- Session state management with Aurora PostgreSQL
Building custom tools for Aurora PostgreSQL data access with MCP
- Custom tool creation with
@tooldecorator patterns - Trending products, inventory analytics, pricing insights
- Intelligent token counting and context optimization
- Model Context Protocol integration with Strands SDK
Agents as Tools pattern with Strands SDK
- Orchestrator + specialist agents (Inventory, Pricing, Recommendation)
- Claude Sonnet 4 for intelligent query routing and agent coordination
- Agent routing, coordination, and tool selection
- OpenTelemetry distributed tracing
Production deployment patterns and optimization
- Session management at enterprise scale
- Vector quantization strategies (binary, scalar)
- Resilience patterns and error handling
- Cost optimization and performance tuning
Automatic Iterative Scanning eliminates manual tuning and guarantees complete results:
Before (pgvector 0.7.x):
SET hnsw.ef_search = 40; -- Manual tuning required for each query -- Risk: May miss relevant results with strict filters -- Challenge: Different ef_search values needed per use caseAfter (pgvector 0.8.0):
SET hnsw.iterative_scan = 'relaxed_order'; -- Automatically finds all matching results with minimal latency -- Guarantees 100% recall across all queries regardless of filters -- No manual tuning needed for production deployment| Traditional Monolithic Approach | Agents as Tools Pattern |
|---|---|
| Single agent handles all tasks | Orchestrator + specialized agents |
| All capabilities in one codebase | Focused expertise per agent domain |
| Hard to maintain and debug | Independent testing and updates |
| Sequential execution only | Parallel execution possible |
| Difficult to scale | Horizontal scaling per agent type |
Benefits:
- π― Domain expertise - Each agent masters specific capabilities
- π Easy maintenance - Update agents independently
- β‘ Better performance - Optimized per agent type
- π Scalable architecture - Add new agents without refactoring
- π§ͺ Testability - Unit test agents in isolation
Full-stack e-commerce platform demonstrating enterprise-grade agentic AI
Step 1: Split terminal into two panes (side-by-side)
Step 2: Navigate to blaize-bazaar directory in both panes
blaize-bazaarStep 3: Start backend (Left Pane)
start-backend # FastAPI server starts on port 8000 # Wait for "Application startup complete" messageStep 4: Start frontend (Right Pane)
start-frontend # React dev server starts on port 5173 # Opens automatically in browserReact Frontend (TypeScript + Tailwind CSS) β FastAPI Backend (Python 3.13) β β Orchestrator β Specialist Agents β β β Inventory Pricing Recommendation ββββββββββββββ΄βββββββββββββ β Aurora PostgreSQL + pgvector - β¨ Semantic Search: Vector similarity with pgvector 0.8.0 HNSW indexes for natural language queries
- π¬ Conversational AI: Claude Sonnet 4 for intelligent query understanding and agent routing
- π§ MCP Context Manager: Custom tools for Aurora PostgreSQL data access
- π€ Multi-Agent System: Orchestrator + 3 specialist agents (Agents as Tools)
- π Smart Filters: Category, price, rating with real-time filtering
- β‘ Real-time: Autocomplete and quick search results
- π Agent Traces: OpenTelemetry observability for multi-agent workflows
- π― Enterprise-Ready: Cost analysis, security patterns, and monitoring
Table: bedrock_integration.product_catalog
| Column | Type | Index | Description |
|---|---|---|---|
productId | CHAR(10) | PRIMARY KEY | Unique product identifier |
product_description | VARCHAR(500) | GIN | Full product details for text search |
imgUrl | VARCHAR(70) | β | Product image URL |
productURL | VARCHAR(40) | β | Product page URL |
stars | NUMERIC(2,1) | Partial | Rating (1.0-5.0) |
reviews | INTEGER | β | Customer review count |
price | NUMERIC(8,2) | Partial | Price in USD |
category_id | SMALLINT | β | Category identifier |
isBestSeller | BOOLEAN | Partial | Bestseller flag |
boughtInLastMonth | INTEGER | β | Recent purchase count |
category_name | VARCHAR(50) | B-tree | Product category |
quantity | SMALLINT | β | Available stock (0-1000) |
embedding | VECTOR(1024) | HNSW | Titan v2 semantic vector embedding |
-- Vector similarity search (HNSW optimized for 21,704 products) CREATE INDEX idx_product_embedding_hnsw ON product_catalog USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 128); -- Full-text search (GIN for keyword matching) CREATE INDEX idx_product_fts ON product_catalog USING GIN (to_tsvector('english', product_description)); -- Category and price filters CREATE INDEX idx_product_category_name ON product_catalog(category_name); CREATE INDEX idx_product_price ON product_catalog(price) WHERE price > 0; -- Partial indexes for common filters CREATE INDEX idx_product_stars ON product_catalog(stars) WHERE stars >= 4.0; CREATE INDEX idx_product_bestseller ON product_catalog("isBestSeller") WHERE "isBestSeller" = TRUE; -- Composite index for category + price queries CREATE INDEX idx_product_category_price ON product_catalog(category_name, price) WHERE price > 0 AND quantity > 0;POST /api/search Content-Type: application/json { "query": "wireless gaming headphones noise cancellation", "limit": 10, "min_similarity": 0.3, "filters": { "category": "Electronics", "min_price": 50, "max_price": 200, "min_stars": 4.0 } }Response:
{ "results": [ { "productId": "B08XYZ", "product_description": "Premium wireless gaming headset...", "price": 149.99, "stars": 4.5, "reviews": 1243, "similarity": 0.87 } ], "total": 10, "query_time_ms": 45 }Custom tools built with Strands SDK for Aurora PostgreSQL agent integration, enabling intelligent database access and business logic execution.
Custom Tools Implemented:
get_trending_products- Top products by popularity metricscheck_inventory- Real-time stock availability queriesanalyze_pricing- Price trend analysis and insightsget_recommendations- Semantic similarity-based suggestions
Architecture Benefits:
- π Standardized tool interface via MCP specification
- π Reusable across multiple agents
- π Built-in token counting and context management
- β‘ Direct database access with connection pooling
π§ Framework Agnostic Concepts: While this workshop uses Strands SDK for hands-on implementation, the multi-agent patterns and architectural concepts (Agents as Tools, orchestration, specialist agents) apply equally to other frameworks like LangGraph, LangChain, CrewAI, AutoGen, and more. Focus on understanding the patterns - the implementation details are transferable.
Capabilities:
- π§ Intelligent query routing and agent coordination (supports extended thinking with interleaved mode for complex multi-step analysis)
- π Adaptive task routing based on tool responses and context
- π Context-aware agent selection and coordination
- π― Dynamic workflow orchestration
1. Inventory Agent
β Real-time stock monitoring across catalog β Low inventory alerts (threshold: <10 units) β Restocking recommendations with priority levels β Stock availability forecasting2. Recommendation Agent
β Personalized product suggestions via semantic search β Feature-based matching and similarity analysis β Budget-conscious alternatives with price awareness β Cross-category recommendations3. Pricing Agent
β Price trend analysis and historical patterns β Deal identification (discount threshold: >20% off) β Value-for-money rankings and comparisons β Competitive pricing insights| Service | Usage | Estimated Cost |
|---|---|---|
| Amazon Bedrock | ||
| Titan Text Embeddings v2 | ~10K tokens (initial load) | $0.10 |
| Claude Sonnet 4 | ~50K tokens (agent queries) | $1.50 |
| Aurora PostgreSQL | ||
| Storage (10K vectors) | 100 MB | $0.00* |
| I/O Operations | ~1K reads | $0.00* |
*Included in pre-provisioned workshop environment
| Component | Monthly Cost Range | Notes |
|---|---|---|
| Aurora PostgreSQL | $150-600 | Depends on instance family, size, and I/O configuration |
| Bedrock Embeddings | $100 | 100M tokens @ $0.001/1K tokens |
| Bedrock Claude Sonnet 4 | $300 | 100M tokens @ $0.003/1K tokens |
| Data Transfer | $50 | 500 GB outbound from AWS |
| Total | $600-1,050 | Varies based on Aurora configuration |
For Read-Heavy Workloads (Recommended):
- Aurora I/O-Optimized - Zero I/O charges, predictable monthly costs
- Optimized Reads (NVMe-SSD) - Faster query performance with local caching
- Read Replicas - Distribute read load across multiple instances (up to 15)
Cost Optimization Benefits:
- I/O-Optimized eliminates per-request I/O charges (typical savings: 20-40%)
- Optimized Reads reduce network I/O by caching frequently accessed data locally
- Combined approach ideal for vector search workloads with high read volume
Scaling Guidance:
- Start with smaller instances and scale based on actual metrics
- Monitor
ReadLatency,CPUUtilization, andDatabaseConnections - Use Aurora Serverless v2 for variable or unpredictable workloads
- Consider Aurora Global Database for multi-region deployments
- Cache embeddings - Reduce Bedrock calls by 80% with semantic caching
- Aurora Serverless v2 - Auto-scaling for variable workloads (0.5-16 ACU)
- Query result caching - Redis/ElastiCache for frequently accessed data
- Batch processing - Generate embeddings during off-peak hours
- Read replicas - Distribute query load across multiple Aurora instances
β Enable encryption at rest (AES-256 for all data) β Use IAM database authentication (no password rotation needed) β Restrict security groups to application subnets only β Enable automated backups (7-35 day retention period) β Use AWS Secrets Manager for credential management β Enable VPC endpoints for private connectivityβ Input validation on all user queries and API endpoints β SQL injection prevention (parameterized queries only) β Rate limiting per user/IP (default: 100 requests/minute) β API authentication (JWT tokens with expiration) β CORS configuration for production domains β Content Security Policy (CSP) headersβ Bedrock Guardrails for content filtering and safety β PII detection and redaction in user queries β Audit logging for all AI interactions (CloudTrail) β Model access controls via IAM policies β Prompt injection prevention and validation β Token usage monitoring and anomaly detectionBuilt-in distributed tracing for multi-agent workflows:
# Automatic trace capture with context propagation β¨ Agent: Orchestrator Duration: 245ms Tokens: 215 (input: 150, output: 65) Status: Success π€ LLM Call: claude-sonnet-4 Duration: 180ms Model: anthropic.claude-sonnet-4-20250514-v1:0 Temperature: 0.7 π§ Tool: get_trending_products Duration: 45ms Result: 10 products Query: SELECT * FROM product_catalog...Database Metrics:
DatabaseConnections- Active connection countReadLatency/WriteLatency- Query performance (milliseconds)CPUUtilization- Compute resource usage (%)FreeableMemory- Available RAM for caching (GB)VolumeReadIOPs/VolumeWriteIOPs- Disk operations
Application Metrics:
SearchLatency- End-to-end query processing timeAgentInvocations- Agent usage patterns and frequencyBedrockTokens- Token consumption and costsErrorRate- Failed requests and exceptionsCacheHitRate- Embedding cache effectiveness
Custom Dashboards:
# Key Performance Indicators (KPIs) - P50/P95/P99 search latency percentiles - Agent routing accuracy and success rate - Cache hit rate and memory efficiency - Cost per query and daily spend tracking| Alert | Threshold | Action |
|---|---|---|
| High Latency | P95 > 2s | Scale Aurora read replicas |
| Error Rate | > 5% | Page on-call engineer immediately |
| Token Spike | > 2x baseline | Investigate potential abuse or bugs |
| DB Connections | > 80% max | Check for connection leaks |
| Cost Anomaly | > 150% daily budget | Review usage patterns |
# Context-rich structured logging for debugging logger.info( "search_query_executed", query=query, user_id=user_id, latency_ms=latency, results_count=len(results), trace_id=trace_id, similarity_threshold=min_similarity, filters=filters )| Layer | Technologies |
|---|---|
| Database | Aurora PostgreSQL 17.5 β’ pgvector 0.8.0 (HNSW) |
| AI/ML | Amazon Bedrock (Titan Text Embeddings v2, Claude Sonnet 4) |
| Backend | FastAPI β’ Python 3.13 β’ psycopg3 β’ boto3 β’ Pydantic v2 |
| Frontend | React 18 β’ TypeScript 5 β’ Tailwind CSS β’ Vite β’ Lucide Icons |
| Search | HNSW vector indexes β’ Trigram text indexes β’ Cosine similarity |
| Agent Framework | Strands SDK β’ Agents as Tools pattern β’ MCP integration |
| Observability | OpenTelemetry β’ CloudWatch β’ Structured logging |
Database Layer:
- Aurora read replicas for search queries (up to 15 replicas)
- Multi-AZ deployment for high availability
- Cross-region read replicas for global applications
Application Layer:
- Application Load Balancer (ALB) for FastAPI instances
- Auto Scaling Groups (ASG) based on CPU/memory
- CloudFront CDN for React frontend static assets
Database Layer:
- Aurora read replicas for search queries (up to 15 replicas)
- Multi-AZ deployment for high availability
- Cross-region read replicas for global applications
Application Layer:
- Application Load Balancer (ALB) for FastAPI instances
- Auto Scaling Groups (ASG) based on CPU/memory metrics
- CloudFront CDN for React frontend static assets
General Guidance:
- Start with smaller instance sizes and scale based on actual performance metrics
- Monitor key metrics:
ReadLatency,CPUUtilization,DatabaseConnections,FreeableMemory - Scale vertically when consistently hitting >70% CPU or memory utilization
- Consider Aurora Serverless v2 for workloads with variable or unpredictable patterns
Performance Indicators:
- ReadLatency consistently >50ms β Consider larger instance or read replicas
- CPUUtilization sustained >70% β Scale to larger instance size
- DatabaseConnections approaching max β Review connection pooling or scale up
- FreeableMemory <20% of total β Increase instance size for better caching
# Auto-scaling configuration for variable workloads MinCapacity: 0.5 ACU (1 GB RAM) MaxCapacity: 16 ACU (32 GB RAM) AutoPause: true (after 5 minutes of inactivity) ScaleIncrement: 0.5 ACU per scaling stepBenefits:
- Pay only for resources used (per-second billing)
- Automatic scaling based on workload
- Zero infrastructure management overhead
- Aurora PostgreSQL User Guide - Complete reference for Aurora configuration
- Amazon Bedrock Documentation - Foundation models and API reference
- pgvector 0.8.0 Performance Blog - Deep dive into 0.8.0 features
- pgvector GitHub - Open-source vector similarity search extension
- Model Context Protocol (MCP) - Protocol specification and documentation
- AWS Labs MCP Servers - AWS-maintained MCP server implementations
- Strands SDK Documentation - Agent framework and patterns
- DAT409: Implement hybrid search with Aurora PostgreSQL for MCP retrieval [REPEAT]
- DAT428: Build a cost-effective RAG-based gen AI application with Amazon Aurora [REPEAT]
- DAT403: Build a multi-agent AI solution with Amazon Aurora & Bedrock AgentCore
- HNSW Algorithm Paper - Efficient and robust approximate nearest neighbor search
- Agents as Tools Pattern - Multi-agent architecture best practices
If you find this helpful:
- β Star this repository to show support and help others discover it
- π± Fork it to customize for your specific use cases
- π Report issues to help improve the workshop
- π’ Share it with your community and colleagues
- π¬ Contribute - Pull requests welcome for improvements
- Workshop Issues: GitHub Issues
- AWS Support: AWS Support Center
- Community: AWS Database Blog
This library is licensed under the MIT-0 License. See the LICENSE file for details.
Workshop Developed and Tested By:
- Shayon Sanyal - Principal Solutions Architect, AWS | Email: shayons@amazon.com
- AWS Database Specialists - Workshop support team
Special Thanks:
- pgvector community for the amazing open-source extension
- Anthropic for Claude Sonnet 4 capabilities
- AWS Workshop Studio team for platform support