Observing LLM requests from Claude Code and Cursor using LiteLLM as a proxy
Why Inspect LLM Requests
This project started with a simple question: “Does Claude Code load CLAUDE.md as a system prompt or a user prompt?” Online answers hinted it was added as a user prompt, but I wanted proof—not assumptions.
With agent-based coding tools, context really matters. The system prompt, supplemental context, and the way the application wraps messages can significantly affect how the model behaves. Understanding how an app structures requests is one of the most reliable ways to diagnose odd behavior or improve prompt design.
The most practical way to inspect those requests is to sit between the app and the LLM provider. A proxy lets you log both requests and responses in a controlled way.
This article explains how I built such a proxy and how you can use it to observe the request structure used by Claude Code and Cursor.
Requirements for the Proxy
Claude Code allows configuring a custom endpoint via ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY. The endpoint must implement the Anthropic Messages API.
Cursor works in a similar way: you can set an OpenAI API key and override the Base URL. However, it only supports the OpenAI Chat Completions API and does not allow localhost endpoints—requests must go through Cursor’s servers and point to a public URL.
So the proxy needs to meet three requirements:
Support the Anthropic Messages API
Support the OpenAI Chat Completions API
Be hosted online (not local)
LiteLLM was an ideal fit. It supports multiple providers and exposes both API formats with minimal configuration.
For hosting, I chose Railway because deploying simple stateless services there is easy, and its free tier is good enough for a small experiment.
As for the backend LLM provider, I used OpenRouter because it supports many models through a single interface—though LiteLLM can work with practically any provider.

Building and Deploying the Proxy
If you only want to run the proxy, you can find the repository with deployment instructions here: https://github.com/Dragnalith/llm-proxy-logger.
The implementation consists of three files:
config.yaml
logger.py
run_proxy.py
Configuration File: config.yaml
Before the proxy can forward requests, LiteLLM needs to know which incoming model names map to which real model providers. The config.yaml file defines that routing, the API credentials, and the callback used to log requests.
model_list:
  - model_name: claude-sonnet-4-5-20250929 # For Claude Code
    litellm_params:
      model: openrouter/anthropic/claude-sonnet-4.5
      api_key: os.environ/OPENAI_API_KEY
      base_url: https://openrouter.ai/api/v1
  - model_name: cld-s45 # For Cursor
    litellm_params:
      model: openrouter/anthropic/claude-sonnet-4.5
      api_key: os.environ/OPENAI_API_KEY
      base_url: https://openrouter.ai/api/v1
litellm_settings:
  success_callback: ["logger.log_request"]
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
The most important part of this configuration is the model_list. Each entry represents a model name that external tools (Claude CLI or Cursor) will reference. LiteLLM uses this name to determine which real LLM and provider URL to call. The value in model_name must exactly match what the client application will send — otherwise requests will fail silently.
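Conceptually, the resolution LiteLLM performs can be sketched as a lookup over model_list. The sketch below is illustrative, not LiteLLM's actual implementation; the entries mirror this article's config.yaml:

```python
# Illustrative sketch of model-name routing (not LiteLLM's real code).
# Entries mirror the model_list in config.yaml above.
MODEL_LIST = [
    {"model_name": "claude-sonnet-4-5-20250929",
     "litellm_params": {"model": "openrouter/anthropic/claude-sonnet-4.5"}},
    {"model_name": "cld-s45",
     "litellm_params": {"model": "openrouter/anthropic/claude-sonnet-4.5"}},
]

def resolve(requested):
    # The client's `model` field must match a model_name exactly;
    # an unknown name has no route, so the request cannot be forwarded.
    for entry in MODEL_LIST:
        if entry["model_name"] == requested:
            return entry["litellm_params"]
    return None

print(resolve("cld-s45"))            # routes to the OpenRouter model
print(resolve("claude-sonnet-4-5"))  # None: the generic alias is not listed
```

This is why Claude Code's timestamped identifier must appear verbatim in the config: the alias claude-sonnet-4-5 would not resolve.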
Claude Code requires using the timestamped model identifier (e.g., claude-sonnet-4-5-20250929) rather than a generic alias like claude-sonnet-4-5. Cursor, meanwhile, needs a custom model entry that mirrors the name defined in its settings. During testing, model names containing "claude" caused Cursor to switch its protocol behavior, so a neutral name such as cld-s45 is safer.
Next, the success_callback points to the function in logger.py that receives metadata about each successful LLM request. This is what enables logging.
Finally, master_key defines authentication for the proxy. Without it, anyone with the URL could send requests and consume your upstream API quota. Even though this setup is experimental, enabling authentication prevents accidental exposure of valid credentials.
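Clients authenticate by sending the master key as a bearer token. The check below is a conceptual sketch of that scheme (the key value is made up; LiteLLM's actual middleware is more involved):

```python
# Conceptual sketch of bearer-token authentication against the master key.
# The key value is a placeholder, not a real credential.
MASTER_KEY = "sk-my-master-key"  # would come from LITELLM_MASTER_KEY

def is_authorized(headers):
    # Clients send: Authorization: Bearer <master key>
    auth = headers.get("Authorization", "")
    return auth == f"Bearer {MASTER_KEY}"

print(is_authorized({"Authorization": "Bearer sk-my-master-key"}))  # True
print(is_authorized({}))                                            # False
```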
Logging Requests: logger.py
Once the proxy can forward requests, the next step is capturing what those requests look like. The logger.py file defines the callback function referenced in the configuration. LiteLLM invokes this function after each successful request, passing details such as the input messages, the model response, and timing metadata.
import json
from datetime import datetime

async def log_request(kwargs, response_obj, start_time, end_time):
    with open('llm-proxy-logger.log', 'a') as f:
        log_entry = {
            'timestamp': datetime.now().isoformat(),
            'messages': kwargs.get('input')
        }
        f.write(json.dumps(log_entry, indent=2) + ',\n')
The callback determines what gets written to the log. I chose a JSON format because it's readable and easy to parse later, but you can tailor the structure to your needs.
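Because each entry is appended as a JSON object followed by a comma, the file is almost, but not quite, a valid JSON array. A small helper (hypothetical, not part of the repo) can load it back:

```python
import json

def parse_log(text):
    # Entries are written as `{...},\n`; wrapping them in brackets and
    # stripping the trailing comma yields a valid JSON array.
    body = text.strip().rstrip(',')
    return json.loads('[' + body + ']')

# A sample in the same shape the callback writes (contents made up).
sample = (
    '{\n'
    '  "timestamp": "2025-01-01T00:00:00",\n'
    '  "messages": [{"role": "user", "content": "hi"}]\n'
    '},\n'
)
entries = parse_log(sample)
print(entries[0]['messages'][0]['role'])  # → user
```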
Running the Server: run_proxy.py
With configuration and logging in place, the final step is running the proxy. LiteLLM already provides a command-line interface to launch a proxy directly from a configuration file, and in many cases that would be sufficient. However, creating a custom runner offers additional flexibility—for example, exposing helper endpoints or controlling initialization behavior.
import asyncio

import uvicorn
from fastapi.responses import FileResponse
from litellm.proxy.proxy_server import app, initialize

@app.get("/logs")
def serve_logs():
    return FileResponse('llm-proxy-logger.log')

async def startup_event():
    await initialize(config='config.yaml')

if __name__ == "__main__":
    asyncio.run(startup_event())
    uvicorn.run(app, host="0.0.0.0", port=4000, lifespan="on")
In this version, the custom runner adds a /logs endpoint that serves the log file via HTTP. This is particularly useful when deploying to platforms like Railway, where filesystem access is not exposed and log files cannot be retrieved through the UI.
The script initializes LiteLLM using the shared configuration file, then starts a lightweight FastAPI server powered by Uvicorn. Because the proxy is stateless, this setup scales easily and remains inexpensive to run.
Deploy to Railway
Once the proxy files are ready, you can deploy them to Railway. Create a Railway service, set the required environment variables (OPENAI_API_KEY, LITELLM_MASTER_KEY and PORT=4000), then deploy.
railway login
railway init
railway add
railway domain
railway up
The railway domain command assigns a public URL similar to https://<your-service-name>.up.railway.app, which becomes your proxy endpoint.
Connecting Claude Code to the Proxy
Before using Claude Code, set the following environment variables:
ANTHROPIC_BASE_URL=https://<your-service-name>.up.railway.app
ANTHROPIC_API_KEY=<your-proxy-master-key>
After that, every Claude request will flow through your proxy.
Connecting Cursor to the Proxy
In the Models section of Cursor settings:
Enable OpenAI API Key and paste your proxy master key.
Enable Override OpenAI Base URL and set: https://<your-service-name>.up.railway.app
Add a new custom model name matching one from your config.yaml (example: cld-s45)
Cursor will now send requests through the proxy when you select your custom model.
Insights from the Logged Requests
Once connected, you can inspect traffic by visiting: https://<your-service-name>.up.railway.app/logs
After logging requests from Claude Code and Cursor, I confirmed:
CLAUDE.md is added to the first message of the conversation, not to the system prompt.
Cursor injects rules content into the first message using <rules></rules> tags.
The user prompt appears in a second message, wrapped in <user_query></user_query>.
Before that prompt, in the second message, Cursor often includes <additional_data></additional_data> describing recent file state or edits.
Contrary to what I thought, Cursor agent modes such as Plan or Ask are implemented through a <system_reminder></system_reminder> block that modifies instructions after the user query, not through the system prompt.
System prompts differ only slightly between modes, mostly in tool guidance or small behavioral nudges.
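Put together, a simplified reconstruction of the message layout Cursor sends looks like the sketch below. All contents are paraphrased placeholders; only the tag structure reflects what the logs showed:

```python
# Paraphrased reconstruction of a logged Cursor request.
# Only the tag layout mirrors the logs; the text is made up.
cursor_messages = [
    {"role": "system", "content": "You are an AI coding assistant..."},
    {"role": "user", "content": "<rules>Project rules content...</rules>"},
    {"role": "user", "content": (
        "<additional_data>Recently viewed files and edits...</additional_data>\n"
        "<user_query>Refactor this function</user_query>\n"
        "<system_reminder>You are in Plan mode; do not edit files.</system_reminder>"
    )},
]

# The mode-specific instructions sit in <system_reminder>, after the
# user query, rather than in the system prompt.
print("<system_reminder>" in cursor_messages[2]["content"])  # True
print("<rules>" in cursor_messages[1]["content"])            # True
```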