Services / AI Agents & MCP

Autonomous AI agents and MCP servers built for production, not demos

CodeWheel builds AI agents and Model Context Protocol (MCP) servers with authentication boundaries, audit logging, and security testing from day one. Agents that reason deeply, call tools safely, and survive investor diligence—not just work in demos.

We focus on single agents that reason deeply, call tools safely, and integrate into your platform. When a prompt is enough we'll say so—when agents are required, we build them to withstand audits.

Review AI security baseline

Fit check

You’re a fit if…

Production stakes

Agents will touch real systems (billing, support, data pipelines) and must be auditable with rollback and RBAC.

Retrieval + tools

You need semantic search/RAG plus tool calls in one flow, with clear tenant filters and schema validation.

One owner

You want architecture diagrams, MCP server code, and security testing handled by the same person.

What the diagram covers

Agent → MCP server → tools/APIs → RAG/semantic search → response, with checkpoints for auth scopes, tenant filters, logging, and human review. You get the diagram and the implementation notes.

Ready to explore agent feasibility?

Map your agent workflow

Share your workflow in a 30-minute session. We'll map agent architecture, tool requirements, guardrails, and platform touchpoints so you know exactly what to build (or if a prompt will do).

Review broader platform build

What are AI agents?

Practical explanation for business buyers

AI agents are LLM-powered workflows that keep state, call tools, and make decisions across multiple steps (schedule, summarize, update systems). They aren't always the right fit-simple prompts often work. Agents make sense when you need long-running automation, structured tool usage, and human approvals. I keep agents grounded in real objectives, RAG knowledge bases, and clear guardrails so they behave in production.

When you actually need agents

You have multi-step workflows where a single prompt can't manage state or secure MCP tool usage.
You need human-in-the-loop approvals with audit trails.
You must chain RAG retrieval, tool execution, and policy checks without brittle scripts.
You require long-running processes (hours/days) orchestrated across tasks.

Use cases I build

Common use cases for custom AI agents including automated workflows, data analysis, customer support, and system integrations.

Customer support automation

Agents triage tickets, summarize context, and draft responses with human approval.

Document analysis & processing

Autonomous workflows digest PDFs, extract structured data, and push into CRMs or ERPs.

Research & data gathering

Agents crawl knowledge bases, run retrieval, and compile briefings for internal teams.

Workflow automation

Connect Jira, GitHub, Stripe, or HubSpot so agents update systems after reasoning steps.

Integrate agents into the full platform

Agents rarely live alone. They rely on RAG knowledge bases, traditional SaaS features, and secure identity/billing layers. I build the connective tissue so agents are part of your AI platform, not a sidecar demo.

RAG & knowledge base integration

Agents tap into multi-tenant RAG pipelines, retrieval APIs, and eval harnesses that we already deploy. Simplifies agentic RAG and keeps citations accurate.

Tooling & orchestration stack

LangChain, Anthropic Claude, OpenAI GPT, custom orchestrators, Inngest/Temporal, and MCP servers all wired with RBAC, logging, and guardrails described above.

Next.js development with secure MCP integration

AI agents need authentication architecture, not just UI polish. I implement Clerk orgs, OAuth scopes, and per-tool API keys so agents respect tenant boundaries and survive security reviews.

Next.js authentication patterns

Clerk orgs with RBAC per MCP tool, JWT validation in agent middleware, session management for long-running tasks, OAuth refresh flows for third-party APIs.

Why it matters

Most Next.js shops treat AI integration as an afterthought. I design auth architecture first so MCP servers plug into the platform securely.

Vercel Hosting & Deployment for Production AI Agents

Vercel hosting handles MCP servers well-but agents need deployment patterns tuned for external tool calls, streaming responses, and sensitive data.

Vercel deployment details

Separate projects for agent APIs vs dashboards, env var management for 20+ LLM keys, build-time validation of MCP tool schemas, custom deployment hooks testing prompt injection defenses.

Why most deployments fail

Agents introduce new runtime considerations—prompt injection defenses, rate limiting, API key scoping, audit logging, and sandboxed execution. I design deployment pipelines that account for these constraints so launches don't get blocked at security review.

Case studies

Real projects: What we're building and what we've learned

We're an early-stage consulting practice focused on production-grade AI platforms with security testing from day one. These case studies show actual projects-including in-progress work-so you can see how we approach architecture, security, and deployment.

Completed · MCP Platform

Multi-tenant MCP API platform

Project overview: Built a multi-tenant SaaS platform with MCP server integration, Clerk authentication, rate limiting, and production infrastructure on Vercel with comprehensive security testing.

Technical challenges

Row-level security for multi-tenant data isolation
Redis rate limiting per tenant + per tool
MCP server integration with typed auth boundaries
2,000+ automated tests from the first sprint

Architecture decisions

Next.js App Router with TypeScript strict mode
Clerk for org-level authentication and RBAC
Postgres + pgvector for semantic search
Docker sandboxes for untrusted tool execution

Key outcomes

Production deployment with zero downtime
Auth boundaries prevent tenant data leakage
Rate limiting handles API abuse gracefully
Preview environments for every pull request

8 weeks

Build time

2,000+

Security tests

Multi-tenant

Architecture

Building in public: Demonstrates how I handle multi-tenant architecture with security testing from day one.

In progress · RAG

AI-powered SEO analysis platform

Project overview: Real-time keyword research platform using Claude and OpenAI, built with a Next.js frontend and Python API backend. Currently in active development.

Technical challenges

Live sitemap crawling and keyword extraction
Multi-LLM inference orchestration
Async Python workers for crawling + scoring
Intelligent caching to reduce API costs

Architecture decisions

Next.js 14 UI with RSC
Python FastAPI backend for crawling logic
Playwright for automated testing
Vercel deployment with edge functions

Current status

Core keyword analysis features deployed
Sitemap scanning in production preview
Multi-LLM inference pipeline operational
Security testing integrated into CI/CD

Building in public: Shows my approach to multi-LLM orchestration and real-time data processing.

Internal project · Modernization

Rails 5.2 → 7.1 platform modernization

Project overview: Internal capability project to modernize a legacy Rails 5.2 platform. Project was paused before production launch when priorities shifted, but delivered a complete upgrade path, schema cleanup, and working prototype demonstrating modern Rails best practices.

Technical challenges

Legacy Rails 5.2 codebase with years of technical debt
Database schema inconsistencies accumulated over multiple versions
No test coverage for critical business logic
Outdated dependencies blocking security patches

Modernization approach

Upgraded Rails 5.2 → 7.1 with incremental version jumps
Normalized schema and added missing foreign keys
Built RSpec suite covering auth and core workflows
Implemented Postgres row-level security for tenant isolation
Modernized deployment pipeline with CI/CD automation

Deliverables

Working Rails 7.1 prototype with clean schema
Comprehensive upgrade documentation for future launch
Security improvements including row-level access controls
Automated tests enabling confident refactoring

Rails 7.1

Final version

Schema

Normalized

RSpec

Test suite

Rails 5.2 → 7.1 PostgreSQL RSpec Row-level security CI/CD

Project status: Paused before production launch when priorities shifted. Demonstrates our Rails modernization expertise and provides a proven upgrade path for similar legacy platforms.

Want similar results? Let's talk

Transparency note: We're an early-stage consulting practice building in public. These case studies represent real projects-including in-progress work-to show our technical approach and lessons learned.

Engagement flow

1. Workflow mapping. Identify user journeys, desired automation, and human-in-the-loop steps.
2. Tool surface design. Define MCP tools/functions, schemas, auth scopes, and failure modes.
3. Build & guard. Implement agents, prompts, tool servers, and guardrails. Wire logging + alerts.
4. Hardening & rollout. Prompt-injection testing, pen tests, telemetry dashboards, and staged rollout.

Use cases I deliver

Customer support copilots with CRM integrations
DevOps + SRE assistants (runbooks, PagerDuty, GitHub)
AI-accelerated dev pipelines (Cursor/VSCode plugins, tool APIs)
Billing + finance agents (Stripe, Netsuite, QuickBooks)
Internal knowledge copilots with RAG + workflows

MCP Server Architecture & Tool Development

Typed tools, rate limiting per tenant, runtime sandboxes, and audit logging make Model Context Protocol servers production-ready.

Technical stack

Next.js 16+ with App Router, Vercel Edge functions for low-latency routing, Supabase Postgres + pgvector for agent memories, Redis/Upstash for queues, Docker sandboxes for code execution, PostHog instrumentation for every tool invocation.

Security testing

Our RAG security testing approach applies to MCP servers. We attack tool definitions with malicious prompts, test privilege escalation, and validate audit logs.

AI Agent development faq

Common questions

Honest answers to the questions founders ask before building agents and MCP servers.

How do AI agents differ from simple prompts?

Prompts are great for single-turn responses. Agents maintain state, call tools, log actions, and support human approvals. I recommend agents only when workflows truly need those behaviors.

What is the typical timeline for agent development?

Most production-ready agent builds take 6-8 weeks including discovery, tool design, guardrails, observability, and penetration testing. Smaller pilots can ship faster if prerequisites exist.

How do you secure agent tool access?

Every MCP tool has RBAC, OAuth scopes, per-tool API keys, rate limiting, prompt-injection defenses, and logging to SIEM/PostHog. We also add runtime sandboxes when tools execute code.

Do you build custom MCP servers or use existing ones?

Both. I build custom MCP servers when you need typed schemas, tenant-aware logic, or internal APIs. I also integrate Anthropic Claude MCP or LangChain routers when they fit.

Launch secure AI agents

Ready to build autonomous agents that survive production? Book a call to map use cases, tool requirements, and guardrail expectations.

Build or optimize agents Review broader platform

For a deeper look at runtime patterns and guardrails, see our complete guide to AI agent architecture.

Serving companies across the San Francisco Bay Area, Silicon Valley, and remote teams worldwide.