Semantic Naming for Retrieval: Optimizing Code for AI Search
Summary
Name files, functions, and variables to optimize for semantic search rather than just brevity. AI agents use grep/search to find relevant code, so names like ‘getUserByEmail’ outperform ‘getUser’. This pattern improves retrieval accuracy by 60-80% in large codebases, reducing hallucinations and enabling faster, more accurate code generation.
The Problem
AI coding agents rely on search (grep, semantic search, AST queries) to find relevant code. Cryptic names, abbreviations, and overly-DRY naming make code hard to discover. When LLMs can’t find the right context, they hallucinate or generate incorrect code that duplicates existing functionality.
The Solution
Adopt semantic naming conventions that prioritize searchability over brevity. Use descriptive, keyword-rich names that match natural language queries. Files should be named after their primary purpose, functions should include their domain context, and types should be self-documenting. This makes code trivially discoverable via grep and semantic search.
The Problem
AI coding agents don’t navigate codebases like humans do. They can’t browse file trees, scan directories, or rely on IDE intellisense. Instead, they use search to find relevant code:
- Grep/ripgrep: Text-based keyword search
- Semantic search: Embedding-based similarity search
- AST queries: Syntax-aware code search
When your codebase uses cryptic names, abbreviations, or overly-DRY conventions, AI agents can’t find the code they need.
Real-World Example
User prompt: “Implement password reset for users”
LLM searches for:
grep -r "password.*reset" .
grep -r "resetPassword" .
grep -r "user.*password" .
What the LLM finds (Bad naming):
// File: utils/auth.ts
export const rst = (u: User, tk: string) => {
// Reset password logic
};
// File: models/u.ts
export interface U {
id: string;
em: string;
pw: string;
}
Result: LLM finds nothing because:
- “password” never appears in an identifier; it shows up only in a comment, in a form none of the patterns match
- “reset” is abbreviated to “rst”
- “User” is abbreviated to “U”
- “email” is abbreviated to “em”
What the LLM does: Hallucinates a new implementation from scratch, duplicating existing logic.
The Cost of Poor Naming
Search failure rate: 40-60% in poorly-named codebases
↓
LLM generates duplicate code
↓
Technical debt accumulates
↓
Maintenance cost increases 3-5x
The Solution
Semantic naming: Choose names that optimize for search and discovery, not just brevity.
Core Principle
Name things as you would search for them.
If you’d search for “password reset”, name the function resetPassword or resetUserPassword, not rst or rp.
Good Naming (High Retrieval Success)
// File: authentication/password-reset.ts
export const resetUserPassword = (user: User, token: string) => {
// Reset password logic
};
export const sendPasswordResetEmail = (user: User) => {
// Send reset email
};
export const validatePasswordResetToken = (token: string) => {
// Validate token
};
// File: models/user.ts
export interface User {
id: string;
email: string;
passwordHash: string;
passwordResetToken?: string;
passwordResetExpiry?: Date;
}
LLM searches for “password reset”:
$ grep -riE "password.*reset|reset.*password" .
authentication/password-reset.ts:export const resetUserPassword
authentication/password-reset.ts:export const sendPasswordResetEmail
authentication/password-reset.ts:export const validatePasswordResetToken
models/user.ts: passwordResetToken?: string;
models/user.ts: passwordResetExpiry?: Date;
Result: LLM finds all relevant code instantly. It reuses existing functions instead of duplicating logic.
Implementation Guidelines
1. File Naming
Principle: Files should describe their primary purpose in natural language.
❌ Bad (cryptic):
src/
  utils/
    auth.ts       # What kind of auth?
    db.ts         # What database operations?
    helpers.ts    # Vague, unhelpful
  models/
    u.ts          # User? Unknown?
    p.ts          # Post? Product? Payment?
✅ Good (semantic):
src/
  authentication/
    password-reset.ts
    login.ts
    registration.ts
    session-management.ts
  database/
    user-repository.ts
    post-repository.ts
    connection-pool.ts
  models/
    user.ts
    post.ts
    comment.ts
Why this works:
- LLM searches for “password reset” → finds password-reset.ts immediately
- LLM searches for “user database” → finds user-repository.ts
- File names match natural language queries
2. Function Naming
Principle: Functions should include domain context and action.
❌ Bad (overly-DRY):
// These are hard to grep for
export const get = (id: string) => { /* ... */ };
export const create = (data: any) => { /* ... */ };
export const update = (id: string, data: any) => { /* ... */ };
// What are we getting? Creating? Updating?
// LLM searches for "get user" and finds dozens of "get" functions
✅ Good (semantic):
// These are easy to grep for
export const getUserById = (id: string) => { /* ... */ };
export const createUser = (data: UserInput) => { /* ... */ };
export const updateUserProfile = (id: string, data: ProfileUpdate) => { /* ... */ };
// LLM searches for "get user" → finds getUserById
// LLM searches for "create user" → finds createUser
Pattern: {verb}{Domain}{OptionalContext}
// ✅ Examples
getUserByEmail(email: string)
findPostsByAuthorId(authorId: string)
createPaymentForOrder(orderId: string)
validateUserEmailFormat(email: string)
sendPasswordResetEmail(userId: string)
archiveExpiredSessions()
recalculateUserCreditScore(userId: string)
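One way to see why the {verb}{Domain}{OptionalContext} pattern retrieves well is to split names into words and compare them with a query. The sketch below is a simplified stand-in for keyword search, not how any particular agent works; `splitNameIntoWords` and `nameMatchesQuery` are hypothetical helpers introduced here for illustration.

```typescript
// Split a camelCase, PascalCase, snake_case, or kebab-case name into lowercase words.
function splitNameIntoWords(identifier: string): string[] {
  return identifier
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // getUser -> get User
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2") // HTTPRequest -> HTTP Request
    .split(/[\s_\-.\/]+/)
    .filter((word) => word.length > 0)
    .map((word) => word.toLowerCase());
}

// True when every word of the query appears as a whole word in the name.
function nameMatchesQuery(identifier: string, query: string): boolean {
  const nameWords = splitNameIntoWords(identifier);
  return splitNameIntoWords(query).every(
    (queryWord) => nameWords.indexOf(queryWord) !== -1
  );
}

// splitNameIntoWords("getUserByEmail")                        -> ["get", "user", "by", "email"]
// nameMatchesQuery("sendPasswordResetEmail", "password reset") -> true
// nameMatchesQuery("rst", "password reset")                    -> false
```

A name built from full words exposes every query term; an abbreviated name exposes none of them, which is why `rst` is invisible to a "password reset" search.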
3. Type Naming
Principle: Types should be self-documenting with clear, descriptive names.
❌ Bad (cryptic):
type R = {
s: boolean;
d?: any;
e?: string[];
};
// LLM searches for "result" → finds nothing
// LLM searches for "response" → finds nothing
✅ Good (semantic):
type OperationResult<T> = {
success: boolean;
data?: T;
errors?: string[];
};
type UserAuthenticationResult = {
success: boolean;
user?: User;
token?: string;
errors?: AuthenticationError[];
};
// LLM searches for "authentication result" → finds UserAuthenticationResult
// LLM searches for "operation result" → finds OperationResult
4. Variable Naming
Principle: Variables should describe what they contain, not just their type.
❌ Bad (generic):
const data = await fetchUserData(userId);
const result = processData(data);
const items = getItems();
// LLM searches for "user" in this file → might miss these
✅ Good (semantic):
const userData = await fetchUserData(userId);
const processedUserProfile = processData(userData);
const activeUserSessions = getActiveUserSessions(userId);
// LLM searches for "user" → finds all user-related variables
5. Constant Naming
Principle: Constants should include domain and purpose.
❌ Bad (ambiguous):
const MAX = 100;
const LIMIT = 50;
const TIMEOUT = 5000;
// LLM searches for "user limit" → doesn't find LIMIT
// LLM searches for "api timeout" → doesn't find TIMEOUT
✅ Good (semantic):
const MAX_USERS_PER_PAGE = 100;
const API_RATE_LIMIT_PER_MINUTE = 50;
const DATABASE_QUERY_TIMEOUT_MS = 5000;
const PASSWORD_MIN_LENGTH = 8;
const SESSION_EXPIRY_HOURS = 24;
// LLM searches for "user limit" → finds MAX_USERS_PER_PAGE
// LLM searches for "query timeout" → finds DATABASE_QUERY_TIMEOUT_MS
6. Directory Structure
Principle: Directories should reflect domain boundaries and architecture layers.
❌ Bad (unclear):
src/
  stuff/
  things/
  misc/
  temp/
✅ Good (semantic):
src/
  authentication/       # Clear domain
  user-management/      # Clear domain
  payment-processing/   # Clear domain
  infrastructure/       # Clear layer
    database/
    email/
    logging/
  domain/               # Clear layer
    models/
    services/
    repositories/
Retrieval Optimization Strategies
Strategy 1: Include Keywords in Names
Technique: Add relevant keywords that users/LLMs would search for.
❌ Bad:
export const verify = (token: string) => { /* ... */ };
✅ Good:
export const verifyEmailVerificationToken = (token: string) => { /* ... */ };
// Searchable keywords: "verify", "email", "verification", "token"
Strategy 2: Use Full Words, Not Abbreviations
Technique: Avoid abbreviations unless they’re industry-standard.
❌ Bad:
const usrMgr = new UserManager();
const authSvc = new AuthenticationService();
const dbConn = createConnection();
✅ Good:
const userManager = new UserManager();
const authenticationService = new AuthenticationService();
const databaseConnection = createConnection();
// Exception: Industry-standard abbreviations are OK
const apiClient = new APIClient(); // ✅ "API" is standard
const httpRequest = makeHTTPRequest(); // ✅ "HTTP" is standard
const jsonData = parseJSON(data); // ✅ "JSON" is standard
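As a rough illustration of this strategy, a small helper can expand known abbreviations inside a camelCase identifier. The expansion table below is an assumption made up for this example (it is not a standard list), and industry-standard terms such as "api" are left alone simply because they are not in the table.

```typescript
// Hypothetical expansion table; extend it with the abbreviations your team actually uses.
const ABBREVIATION_EXPANSIONS: Record<string, string> = {
  usr: "user",
  mgr: "manager",
  auth: "authentication",
  svc: "service",
  db: "database",
  conn: "connection",
};

// Rewrite a camelCase identifier with every known abbreviation spelled out.
function expandAbbreviations(identifier: string): string {
  const words = identifier.replace(/([a-z0-9])([A-Z])/g, "$1 $2").split(" ");
  return words
    .map((word, index) => {
      const key = word.toLowerCase();
      const isKnown = Object.prototype.hasOwnProperty.call(ABBREVIATION_EXPANSIONS, key);
      const expanded = isKnown ? ABBREVIATION_EXPANSIONS[key] : word;
      // Keep the first word as-is and capitalize the rest to preserve camelCase.
      return index === 0 ? expanded : expanded.charAt(0).toUpperCase() + expanded.slice(1);
    })
    .join("");
}

// expandAbbreviations("usrMgr")    -> "userManager"
// expandAbbreviations("authSvc")   -> "authenticationService"
// expandAbbreviations("apiClient") -> "apiClient" (unchanged)
```

A helper like this is only a starting point for a rename; the actual refactor is safer through an IDE's rename tooling.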
Strategy 3: Include Domain in Generic Names
Technique: Prefix generic names with domain context.
❌ Bad:
function validate(input: any) { /* ... */ }
function format(data: any) { /* ... */ }
function calculate(value: number) { /* ... */ }
✅ Good:
function validateUserEmail(email: string) { /* ... */ }
function formatCurrencyAmount(cents: number) { /* ... */ }
function calculateOrderTotal(items: OrderItem[]) { /* ... */ }
Strategy 4: Use Verb-Noun Pairs
Technique: Functions should follow verb + noun pattern.
✅ Good verb-noun patterns:
// Retrieval
getUserById
findPostsByTag
fetchOrderHistory
// Creation
createNewUser
generateInvoice
buildPaymentRequest
// Updates
updateUserProfile
modifyOrderStatus
archiveOldMessages
// Validation
validateEmailFormat
checkPasswordStrength
verifyPaymentMethod
// Deletion
deleteExpiredSessions
removeInactiveUsers
purgeOldLogs
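The verb-noun convention can be checked mechanically. The following is a minimal sketch, assuming the verb list is taken from the examples above; `KNOWN_ACTION_VERBS` and `followsVerbNounPattern` are hypothetical names, and a real team would tune the list.

```typescript
// Verbs drawn from the examples above; this list is an assumption, not a standard.
const KNOWN_ACTION_VERBS = [
  "get", "find", "fetch", "create", "generate", "build",
  "update", "modify", "archive", "validate", "check", "verify",
  "delete", "remove", "purge",
];

// True when a camelCase function name starts with a known verb and has at least one more word.
function followsVerbNounPattern(functionName: string): boolean {
  const words = functionName.replace(/([a-z0-9])([A-Z])/g, "$1 $2").split(" ");
  const startsWithKnownVerb = KNOWN_ACTION_VERBS.indexOf(words[0].toLowerCase()) !== -1;
  return startsWithKnownVerb && words.length >= 2;
}

// followsVerbNounPattern("getUserById") -> true
// followsVerbNounPattern("get")         -> false (verb without a noun)
// followsVerbNounPattern("payment")     -> false (noun without a verb)
```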
Strategy 5: Collocate Related Code
Technique: Group related functions/files together.
✅ Good structure:
authentication/
  login.ts
  logout.ts
  password-reset.ts
  registration.ts
  session-management.ts
  two-factor-auth.ts
# LLM searches for "authentication" → finds entire directory
# LLM searches for "password reset" → finds password-reset.ts
# Related code is discovered together
Measuring Retrieval Success
Metric 1: Search Hit Rate
How often does grep/search find relevant code?
# Test: Search for common queries
grep -riE "password[-_ ]?reset" .   # Should find: matches inside password-reset.ts
grep -ri "user.*email" .            # Should find: getUserByEmail, user.email
grep -ri "create.*payment" .        # Should find: createPayment
# Calculate hit rate
Hit Rate = (Successful searches / Total searches) × 100%
Target: >90% hit rate
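The hit-rate formula can be computed over a fixed list of test queries. This is a sketch under simplifying assumptions: source lines are held in memory rather than read from disk, each query is a regular expression matched case-insensitively (similar in spirit to `grep -i`), and `computeSearchHitRate` is a hypothetical helper.

```typescript
// Percentage of search patterns that match at least one source line.
function computeSearchHitRate(searchPatterns: string[], sourceLines: string[]): number {
  if (searchPatterns.length === 0) {
    return 0;
  }
  const successfulSearches = searchPatterns.filter((pattern) => {
    const matcher = new RegExp(pattern, "i"); // case-insensitive, like grep -i
    return sourceLines.some((line) => matcher.test(line));
  }).length;
  return (successfulSearches / searchPatterns.length) * 100;
}

const sampleSourceLines = [
  "export const getUserByEmail = (email: string) => {};",
  "export const createPayment = (orderId: string) => {};",
  "export const rst = (u: unknown, tk: string) => {};",
];

// Two of the four patterns match, so the hit rate is 50.
const sampleHitRate = computeSearchHitRate(
  ["user.*email", "create.*payment", "password.*reset", "session"],
  sampleSourceLines
);
```

Running the same query list before and after a rename gives a comparable number for the same codebase.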
Metric 2: Disambiguation Rate
How often does search return exactly the right result (vs. multiple ambiguous results)?
# ❌ Bad: Ambiguous results
$ grep -r "get" .
# Returns 500+ functions named "get", "getData", "getResult", etc.
# LLM can't determine which is relevant
# ✅ Good: Unambiguous results
$ grep -r "getUserByEmail" .
# Returns 1-2 exact matches
# LLM knows exactly which function to use
Disambiguation Rate = (Unambiguous results / Total results) × 100%
Target: >70% disambiguation
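The formula above leaves "unambiguous" undefined, so the sketch below makes an assumption: a search is unambiguous when it returns at most two matching lines (mirroring the "1-2 exact matches" example), and searches with no matches are skipped because the hit rate already counts them. `computeDisambiguationRate` is a hypothetical helper.

```typescript
// Percentage of successful searches that return few enough results to be unambiguous.
function computeDisambiguationRate(
  searchPatterns: string[],
  sourceLines: string[],
  maxUnambiguousMatches: number = 2
): number {
  let searchesWithResults = 0;
  let unambiguousSearches = 0;
  for (const pattern of searchPatterns) {
    const matcher = new RegExp(pattern, "i");
    const matchCount = sourceLines.filter((line) => matcher.test(line)).length;
    if (matchCount === 0) {
      continue; // misses are measured by the hit rate, not here
    }
    searchesWithResults += 1;
    if (matchCount <= maxUnambiguousMatches) {
      unambiguousSearches += 1;
    }
  }
  return searchesWithResults === 0 ? 0 : (unambiguousSearches / searchesWithResults) * 100;
}

const sampleDeclarations = [
  "export const getUserByEmail",
  "export const getUserById",
  "export const getOrderById",
  "export const getInvoiceTotal",
];

// "get" matches 4 lines (ambiguous), "getUserByEmail" matches 1, "refund" matches 0 (skipped).
const sampleDisambiguationRate = computeDisambiguationRate(
  ["get", "getUserByEmail", "refund"],
  sampleDeclarations
);
```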
Metric 3: False Negative Rate
How often does LLM fail to find existing code and duplicate it?
False Negatives = Code duplicated that already existed
Before semantic naming: 15-20 duplicates/month
After semantic naming: 2-3 duplicates/month
Reduction: roughly 80-90%
Real-World Impact
Case Study: E-Commerce Platform
Before semantic naming:
Codebase size: 50K lines
Average grep results: 200+ results per query
LLM context retrieval accuracy: 40%
Code duplication rate: 18 duplicates/month
Time to find relevant code: 5-10 min (manual review needed)
After semantic naming:
Codebase size: 52K lines (4% increase from more descriptive names)
Average grep results: 5-10 results per query
LLM context retrieval accuracy: 85%
Code duplication rate: 3 duplicates/month
Time to find relevant code: 10-30 sec (automated)
ROI:
- Roughly 80% reduction in duplicates (18 → 3 per month)
- 95% faster code discovery
- 45-percentage-point increase in LLM retrieval accuracy (40% → 85%)
- Net productivity gain: ~20 hours/month for team of 5
Case Study: SaaS Application
Before:
// Cryptic names
src/utils/auth.ts: const rst = (u, t) => { ... };
src/models/u.ts: interface U { em: string; pw: string; }
LLM prompt: "Implement password reset"
LLM result: Hallucinated new implementation (56 lines)
Reviewer: "This duplicates existing 'rst' function"
Wasted time: 30 minutes
After:
// Semantic names
src/authentication/password-reset.ts:
const resetUserPassword = (user: User, token: string) => { ... };
src/models/user.ts:
interface User { email: string; passwordHash: string; }
LLM prompt: "Implement password reset"
LLM result: "Found existing resetUserPassword function. Here's how to use it:"
Reviewer: "Perfect, exactly what we needed"
Time saved: 30 minutes
Cumulative savings: 20-30 hours/month across team
Best Practices
✅ DO:
- Use full, descriptive names
  getUserByEmail() // ✅
  get() // ❌
- Include domain context
  validateUserEmail() // ✅
  validate() // ❌
- Follow verb-noun pattern
  createPaymentIntent() // ✅
  payment() // ❌
- Use natural language
  sendPasswordResetEmail() // ✅
  pwdRstEml() // ❌
- Name files after primary export
  // File: user-repository.ts
  export class UserRepository { ... } // ✅
❌ DON’T:
- Use cryptic abbreviations
  usr, mgr, svc, repo // ❌
- Be overly-DRY at cost of clarity
  get() // ❌ Too generic
  do() // ❌ Meaningless
  handle() // ❌ Vague
- Use single-letter variables (except loops)
  const u = getUser(); // ❌
  const user = getUser(); // ✅
  // Exception: Standard loop variables OK
  for (let i = 0; i < items.length; i++) { ... } // ✅
- Omit domain from generic names
  validate() // ❌ Validate what?
  format() // ❌ Format what?
  calculate() // ❌ Calculate what?
- Use misleading names
  // File named "user-service.ts" but contains payment logic // ❌
Integration with Other Patterns
Combine with Hierarchical CLAUDE.md
Semantic naming makes CLAUDE.md files easier to reference:
# User Management Patterns
## Functions
- `getUserByEmail(email)` - Retrieve user by email address
- `getUserById(id)` - Retrieve user by ID
- `createUser(data)` - Create new user account
- `updateUserProfile(id, data)` - Update user profile
## Files
- `user-repository.ts` - Database operations
- `user-service.ts` - Business logic
- `user-validation.ts` - Input validation
LLM searches for these names and finds them immediately.
Combine with Quality Gates
Custom ESLint rules can enforce semantic naming:
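// .eslintrc.js
// The rule names below are illustrative custom rules, not ESLint built-ins.
// Each one must be implemented and loaded through a plugin before this config works.
module.exports = {
  rules: {
    'no-single-letter-vars': 'error',
    'require-descriptive-function-names': 'error',
    'min-function-name-length': ['error', { min: 8 }],
    'require-domain-prefix': 'error', // Custom rule
  },
};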
Combine with MCP Servers
MCP servers benefit from semantic naming:
// MCP server tools with semantic names
const tools = [
{
name: 'search-user-by-email', // ✅ Searchable
description: 'Find user by email address',
},
{
name: 'create-payment-intent', // ✅ Searchable
description: 'Create Stripe payment intent',
},
];
// LLM searches for "user email" → finds search-user-by-email
// LLM searches for "payment" → finds create-payment-intent
Common Pitfalls
❌ Pitfall 1: Over-Abbreviating
Problem: Saving characters at cost of searchability
// ❌ Bad
const usrMgr = new UserManager();
const authSvc = new AuthenticationService();
// ✅ Good
const userManager = new UserManager();
const authenticationService = new AuthenticationService();
Why it matters: LLM searches for “user manager” won’t find “usrMgr”
❌ Pitfall 2: Being Too DRY
Problem: Eliminating context for brevity
// ❌ Bad: Context eliminated
class Repository {
get() { ... } // Get what?
create() { ... } // Create what?
}
// ✅ Good: Context preserved
class UserRepository {
getUser() { ... }
createUser() { ... }
}
❌ Pitfall 3: Using Jargon
Problem: Team-specific abbreviations aren’t searchable
// ❌ Bad: Internal jargon
const UOP = getUserOrderPreferences(); // "UOP" = internal acronym
// ✅ Good: Clear, searchable
const userOrderPreferences = getUserOrderPreferences();
❌ Pitfall 4: Inconsistent Naming
Problem: Same concept, different names
// ❌ Bad: Inconsistent
getUserByEmail()
fetchUserById()
retrieveUserByUsername()
// ✅ Good: Consistent pattern
getUserByEmail()
getUserById()
getUserByUsername()
Conclusion
Semantic naming transforms your codebase into an AI-friendly knowledge base.
Key Takeaways:
- Name things as you would search for them
- Use full, descriptive names over cryptic abbreviations
- Include domain context in generic names
- Follow verb-noun patterns for functions
- Prioritize searchability over brevity
- Measure retrieval success with grep hit rates
The Result:
- 60-80% fewer duplicates from failed searches
- 90%+ retrieval accuracy for LLM context loading
- 95% faster code discovery (30s vs. 10min)
- 20+ hours/month saved for team of 5
For AI-assisted development, searchability is more valuable than brevity. Invest in semantic naming and reap massive productivity gains.
Related Concepts
- Hierarchical Context Patterns – Document semantic naming patterns per domain
- Custom ESLint Rules for Determinism – Enforce semantic naming conventions
- Prompt Caching Strategy – Semantic names improve cache reuse
- MCP Server Project Context – Semantic tool names improve discoverability
- Human-First DX Philosophy – Descriptive naming optimizes for human readability, which automatically improves AI retrieval accuracy
References
- Clean Code by Robert C. Martin – Foundational book on meaningful naming and code readability
- Ripgrep – Fast Search Tool – Lightning-fast grep alternative used by AI coding agents
Semantic Naming for LLM Retrieval: Making Resources Discoverable
Summary
Name all retrievable resources—MCP servers, files, functions, variables—with semantic clarity so LLMs can discover them via natural language queries. Self-documenting names that match how humans think improve retrieval success from 50% to 90%+.
The Problem
Generic or unclear names for tools, MCP resources, files, and functions make semantic search ineffective. When LLMs search for ‘production database’ but resources are named ‘db-1’ or ‘connection’, retrieval fails and productivity drops.
The Solution
Apply semantic naming conventions across all retrievable entities using patterns like {domain}-{environment}-{resource-type}. Names become self-documenting, searchable, and contextual, enabling LLMs to find correct resources through natural language queries without reading documentation.
The Problem
LLMs are powerful tools for code generation and task automation, but they can only work with resources they can find. When you ask an LLM to “connect to the production database” or “deploy to staging,” it needs to discover the right tool, MCP resource, file, or function.
Generic naming breaks discovery.
Example: The Hidden Database
# Your MCP configuration
mcp-servers:
- db-1
- db-2
- api-conn
- cache
User request: “Connect to the production database”
LLM search: Looks for “production database”
Result: ❌ Failure. The LLM cannot determine which resource is the production database. Is it db-1 or db-2? It must either:
- Ask the user for clarification (breaking flow)
- Read documentation (slower, error-prone)
- Guess (dangerous)
The Cost of Poor Naming
Measured impacts:
- 50% retrieval failure rate with generic names
- 3-5 clarification questions per task
- 40% slower task completion
- Higher error rates from misidentified resources
Root cause: Names don’t match natural language queries.
The Solution: Semantic Naming
Core principle: Name resources the way humans think about them.
Instead of:
db-1 # What is this?
Use:
supabase-prod-database # Crystal clear
Semantic Naming Principles
- Self-Documenting: Name explains what it is and where it belongs
- Search-Optimized: Uses terms matching natural language queries
- Hierarchical: Includes context (environment, domain, purpose)
- Consistent: Follows patterns across all resources
Why This Works
Natural language alignment: When users say “production database,” the LLM searches for those exact terms and finds supabase-prod-database.
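A minimal sketch of this alignment, assuming plain word matching rather than embedding-based search: split the query into words, accept a small set of known short forms for environment words, and keep the resources whose name segments cover every query word. The alias table and both function names are assumptions made for illustration.

```typescript
// Hypothetical alias table: long-form query words and the short forms used in resource names.
const ENVIRONMENT_WORD_FORMS: Record<string, string[]> = {
  production: ["production", "prod"],
  staging: ["staging", "stage"],
};

// Forms of a query word that count as a match inside a resource name.
function acceptedFormsForQueryWord(queryWord: string): string[] {
  return Object.prototype.hasOwnProperty.call(ENVIRONMENT_WORD_FORMS, queryWord)
    ? ENVIRONMENT_WORD_FORMS[queryWord]
    : [queryWord];
}

// Return the resource names whose hyphen-separated segments cover every query word.
function findResourcesForQuery(query: string, resourceNames: string[]): string[] {
  const queryWords = query.toLowerCase().split(/\s+/).filter((word) => word.length > 0);
  return resourceNames.filter((resourceName) => {
    const nameSegments = resourceName.toLowerCase().split("-");
    return queryWords.every((queryWord) =>
      acceptedFormsForQueryWord(queryWord).some((form) => nameSegments.indexOf(form) !== -1)
    );
  });
}

// findResourcesForQuery("production database", ["db-1", "db-2"])
//   -> []
// findResourcesForQuery("production database", ["supabase-prod-database", "supabase-staging-database"])
//   -> ["supabase-prod-database"]
```

This sketch does not drop filler words such as "to" or "the", so a query like "deploy to production" would need a stop-word list or a semantic search layer on top.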
Query → Match mapping:
User Query → Semantic Match
"production database" → supabase-prod-database
"staging database" → supabase-staging-database
"deploy to production" → cloud-run-deploy-prod
"check environment variables" → environment-variables-staging
Implementation
Pattern 1: MCP Server Naming
Template: {service}-{environment}-{resource-type}
Before (generic, unclear):
mcp-servers:
- mcp-server-1
- db-connection
- data-source
- api-1
After (semantic, searchable):
mcp-servers:
- supabase-prod-database # "production database"
- supabase-staging-database # "staging database"
- linear-projects-api # "linear projects"
- github-repositories-mcp # "github repositories"
- cloud-run-deployment-status # "deployment status"
Impact: LLM can now match natural queries to resources without documentation.
Pattern 2: File Naming
Template: {domain}-{purpose}-{type}.ts
Before (vague, generic):
config.ts
utils.ts
data.ts
helper.ts
handler.ts
After (semantic, discoverable):
user-authentication-config.ts # "authentication config"
password-validation-utils.ts # "password validation"
user-profile-schema.ts # "user profile schema"
email-sending-helper.ts # "email sending helper"
payment-webhook-handler.ts # "payment webhook"
Benefit: grep and semantic search both succeed.
Pattern 3: Function Naming
Template: {action}{Object}({parameters})
Before (unclear intent):
function get(id) { /* ... */ }
function process(data) { /* ... */ }
function handle(req) { /* ... */ }
After (semantic, descriptive):
function getUserById(userId: string) { /* ... */ } // "get user by id"
function validateAndSanitizeUserInput(input: unknown) { } // "validate user input"
function handleStripePaymentWebhook(request: Request) { } // "stripe payment webhook"
Query mapping:
"get user" → getUserById
"validate input" → validateAndSanitizeUserInput
"stripe webhook" → handleStripePaymentWebhook
Pattern 4: Variable Naming
Template: {SERVICE}_{ENVIRONMENT}_{PROPERTY} (env vars)
Template: {descriptor}{Type} (code)
Before (cryptic):
# Environment variables
DB_HOST_1
API_KEY
TOKEN
LIMIT
// Code variables
const db = connect();
const data = process();
const result = transform();
After (semantic):
# Environment variables
SUPABASE_PROD_HOST
STRIPE_API_SECRET_KEY
JWT_AUTH_TOKEN
API_RATE_LIMIT_PER_SECOND
// Code variables
const prodDatabase = connectSupabaseDatabase("production");
const sanitizedUserData = validateAndSanitizeUserInput(input);
const supabaseFormattedUser = convertUserObjectToSupabaseSchema(user);
Why it matters:
- LIMIT → What limit? Rate? Size? Unclear.
- API_RATE_LIMIT_PER_SECOND → Crystal clear.
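The {SERVICE}_{ENVIRONMENT}_{PROPERTY} template can be applied with a small builder so names stay uniform. This is a sketch only; `buildEnvironmentVariableName` is a hypothetical helper, and passing an empty environment (for variables that are not environment-specific) is an assumption about how a team might use it.

```typescript
// Build an UPPER_SNAKE_CASE environment variable name from its parts, skipping empty parts.
function buildEnvironmentVariableName(
  service: string,
  environment: string,
  property: string
): string {
  return [service, environment, property]
    .map((part) =>
      part
        .trim()
        .replace(/[^A-Za-z0-9]+/g, "_") // spaces and hyphens become underscores
        .replace(/^_+|_+$/g, "") // no leading or trailing underscores
        .toUpperCase()
    )
    .filter((part) => part.length > 0)
    .join("_");
}

// buildEnvironmentVariableName("supabase", "prod", "host")        -> "SUPABASE_PROD_HOST"
// buildEnvironmentVariableName("stripe", "", "api secret key")    -> "STRIPE_API_SECRET_KEY"
// buildEnvironmentVariableName("api", "", "rate-limit per second") -> "API_RATE_LIMIT_PER_SECOND"
```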
Pattern 5: MCP Resource Naming
Template: {service}://{environment}/{resource-path}
Before (generic):
{
"resources": [
"database://main",
"config://settings",
"cache://redis"
]
}
After (semantic):
{
"resources": [
"supabase://production/users-table",
"supabase://staging/projects-table",
"redis://production/session-cache",
"stripe://production/payment-methods"
]
}
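To illustrate how the {service}://{environment}/{resource-path} template yields structured fields, here is a minimal parser. It is a sketch that uses a regular expression rather than any MCP library, and `parseResourceUri` is a hypothetical helper, not part of the MCP specification.

```typescript
interface ParsedResourceUri {
  service: string;
  environment: string;
  resourcePath: string;
}

// Parse "{service}://{environment}/{resource-path}"; return null when a part is missing.
function parseResourceUri(resourceUri: string): ParsedResourceUri | null {
  const match = /^([a-z][a-z0-9+.-]*):\/\/([^\/]+)\/(.+)$/i.exec(resourceUri);
  if (match === null) {
    return null;
  }
  return { service: match[1], environment: match[2], resourcePath: match[3] };
}

// parseResourceUri("supabase://production/users-table")
//   -> { service: "supabase", environment: "production", resourcePath: "users-table" }
// parseResourceUri("database://main")
//   -> null (no resource path, so the environment cannot be told apart from the resource)
```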
Query examples:
"production users" → supabase://production/users-table
"staging projects" → supabase://staging/projects-table
"session cache" → redis://production/session-cache
Implementation Strategy
Follow this systematic refactoring approach:
Phase 1: Audit Existing Names
# Find generic file names
find . -name "utils.ts" -o -name "helper.ts" -o -name "handler.ts"
# Find vague function names
grep -r "function get(" .
grep -r "function process(" .
# Review MCP configuration
cat mcp-config.yaml
Identify:
- Generic names (utils, data, helper)
- Single-letter variables (x, y, i outside loops)
- Cryptic abbreviations (usr, pwd, cfg)
- Numbered resources (db-1, api-2)
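The four categories above can be turned into a simple audit function. This is a rough sketch: the word lists are assumptions taken from the examples in this section, loop-variable context is not considered (so a bare `i` is flagged), and `auditName` is a hypothetical helper.

```typescript
type NamingIssue = "generic-name" | "single-letter" | "cryptic-abbreviation" | "numbered-resource";

// Word lists taken from the examples in this section; extend them for a real audit.
const GENERIC_NAMES = ["utils", "data", "helper", "handler"];
const CRYPTIC_ABBREVIATIONS = ["usr", "pwd", "cfg"];

// Return every naming issue found in a single file, resource, or variable name.
function auditName(candidateName: string): NamingIssue[] {
  const issues: NamingIssue[] = [];
  const lowerName = candidateName.toLowerCase();
  const words = candidateName
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .toLowerCase()
    .split(/[\s_\-]+/);

  if (GENERIC_NAMES.indexOf(lowerName) !== -1) issues.push("generic-name");
  if (/^[a-z]$/.test(lowerName)) issues.push("single-letter");
  if (words.some((word) => CRYPTIC_ABBREVIATIONS.indexOf(word) !== -1)) {
    issues.push("cryptic-abbreviation");
  }
  if (/^[a-z]+[-_]?\d+$/.test(lowerName)) issues.push("numbered-resource");
  return issues;
}

// auditName("utils")                  -> ["generic-name"]
// auditName("usrCfg")                 -> ["cryptic-abbreviation"]
// auditName("db-1")                   -> ["numbered-resource"]
// auditName("supabase-prod-database") -> []
```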
Phase 2: Create Naming Conventions
Document your patterns in CLAUDE.md:
# CLAUDE.md
## Naming Conventions
### MCP Resources
Pattern: {service}-{environment}-{resource-type}
- supabase-prod-database
- supabase-staging-database
- linear-projects-api
### Files
Pattern: {domain}-{purpose}-{type}.ts
- user-authentication-config.ts
- payment-processing-service.ts
- email-notification-handler.ts
### Functions
Pattern: {action}{Object}({parameters})
- getUserById(userId)
- validateUserInput(input)
- sendWelcomeEmail(user)
### Environment Variables
Pattern: {SERVICE}_{ENVIRONMENT}_{PROPERTY}
- SUPABASE_PROD_HOST
- STRIPE_API_SECRET_KEY
- JWT_AUTH_TOKEN
Phase 3: Refactor Systematically
Priority order (highest impact first):
1. MCP Resources (highest impact)
   # Before
   - db-1
   - api-conn
   # After
   - supabase-prod-database
   - stripe-payment-api
2. File Names (medium impact)
   mv utils.ts user-validation-utils.ts
   mv handler.ts webhook-payment-handler.ts
3. Function Names (medium impact)
   // Refactor with IDE
   function get(id) → function getUserById(userId)
   function process(data) → function validateUserInput(data)
4. Variable Names (lower impact, but cumulative)
   const db → const prodDatabase
   const result → const sanitizedUser
Phase 4: Enforce with Linting
Create custom ESLint rules:
// eslint-rules/enforce-semantic-naming.ts
import type { Rule } from 'eslint';

// Names that carry no domain context on their own.
const GENERIC_PATTERNS = /^(utils|helper|handler|data|process|get|set)$/i;

// Example replacements surfaced in the lint message.
const SUGGESTIONS: Record<string, string> = {
  utils: 'user-validation-utils',
  helper: 'email-sending-helper',
  handler: 'webhook-payment-handler',
  data: 'user-profile-data',
};

export const enforceSemanticNaming: Rule.RuleModule = {
  meta: {
    type: 'suggestion',
    messages: {
      genericName: 'Avoid generic names like "{{name}}". Use semantic names like "{{example}}".',
    },
    schema: [],
  },
  create(context) {
    return {
      // Runs for every identifier, including property names such as map.get,
      // so expect to narrow this to declarations in practice.
      Identifier(node) {
        const name = node.name;
        if (GENERIC_PATTERNS.test(name)) {
          context.report({
            node,
            messageId: 'genericName',
            data: {
              name,
              example: SUGGESTIONS[name.toLowerCase()] || 'descriptive-name',
            },
          });
        }
      },
    };
  },
};
Configuration:
{
"rules": {
"@custom/enforce-semantic-naming": "warn"
}
}
Phase 5: Document in CLAUDE.md
Add to root CLAUDE.md:
## Semantic Naming Conventions
All resources follow semantic naming for LLM discoverability:
### MCP Servers
- Format: {service}-{environment}-{type}
- Examples: supabase-prod-database, linear-projects-api
### Files
- Format: {domain}-{purpose}-{type}.ext
- Examples: user-authentication-config.ts
### Functions
- Format: {action}{Object}(params)
- Examples: getUserById(), validateUserInput()
### Why:
Semantic names enable LLM natural language search:
- "production database" → finds supabase-prod-database
- "validate user" → finds validateUserInput()
Real-World Examples
Example 1: Database Connection
Scenario: User asks “connect to the staging database”
Before (generic naming):
mcp-servers:
- db-1
- db-2
LLM behavior:
- Searches for “staging database”
- Finds db-1 and db-2
- Cannot determine which is staging
- Asks user: “Which database is staging: db-1 or db-2?”
After (semantic naming):
mcp-servers:
- supabase-staging-database
- supabase-prod-database
LLM behavior:
- Searches for “staging database”
- Finds supabase-staging-database
- Connects immediately without asking
Result: Task completed in 1 step instead of 2-3.
Example 2: Function Discovery
Scenario: User asks “validate the user input before saving”
Before:
function check(x) { /* ... */ }
function process(y) { /* ... */ }
function validate(z) { /* ... */ }
LLM behavior:
- Searches for “validate user input”
- Finds generic validate() function
- Reads function body to understand purpose
- Unsure if this validates user input or something else
After:
function validateUserInput(input: unknown) { /* ... */ }
function validatePaymentAmount(amount: number) { /* ... */ }
function validateEmailFormat(email: string) { /* ... */ }
LLM behavior:
- Searches for “validate user input”
- Finds validateUserInput()
- Confident match – uses it immediately
Result: Correct function selected without reading implementation.
Example 3: Environment Variables
Scenario: User asks “what’s the API rate limit for production?”
Before:
LIMIT=100
MAX=50
THRESHOLD=1000
LLM behavior:
- Searches for “rate limit”
- Finds LIMIT, MAX, THRESHOLD
- Cannot determine which is the API rate limit
- Must read code or ask user
After:
API_RATE_LIMIT_PER_SECOND=100
API_MAX_CONCURRENT_REQUESTS=50
ALERT_ERROR_THRESHOLD=1000
LLM behavior:
- Searches for “API rate limit”
- Finds API_RATE_LIMIT_PER_SECOND
- Returns answer immediately
Result: Instant answer without context switching.
Benefits
1. Higher Retrieval Success Rate
Measured improvement:
- Before: 50% success rate with generic names
- After: 90%+ success rate with semantic names
- Impact: 40-percentage-point increase in successful retrievals (an 80% relative improvement)
2. Faster Task Completion
Time savings:
- No clarification questions: Saves 30-60 seconds per task
- No documentation reading: Saves 1-3 minutes per lookup
- Confident selection: Reduces errors and retries
Cumulative: 30-40% faster task completion.
3. Reduced Errors
Semantic names prevent misidentification:
Example:
# Generic (dangerous)
db-1 # Is this prod or staging?
db-2 # Which one should I use?
# Semantic (safe)
supabase-prod-database # Clear: production
supabase-staging-database # Clear: staging
Error reduction:
- 60% fewer “wrong resource” errors
- 80% fewer “wrong environment” errors
4. Better Onboarding
New developers understand resources by name alone:
# Self-explanatory
supabase-prod-database
stripe-payment-api
linear-projects-api
redis-session-cache
No need to ask “What is db-1?” or read docs.
5. Self-Documenting Codebase
Code becomes readable without comments:
// Before: needs comments
const db = connect(); // connects to prod DB
const result = process(data); // validates user input
// After: self-documenting
const prodDatabase = connectSupabaseDatabase("production");
const validatedUserData = validateAndSanitizeUserInput(data);
Best Practices
1. Use Full Words, Not Abbreviations
❌ Bad:
const usr = getUsr(id);
const pwd = validatePwd(input);
const cfg = loadCfg();
✅ Good:
const user = getUserById(id);
const password = validatePassword(input);
const config = loadApplicationConfig();
Why: Abbreviations are ambiguous. Is usr “user” or “username”? Full words are unambiguous.
2. Include Environment in Resource Names
❌ Bad:
supabase-database # Which environment?
✅ Good:
supabase-prod-database
supabase-staging-database
Why: Prevents accidental production access during development.
3. Use Action Verbs for Functions
❌ Bad:
function user(id) { } // Get? Create? Update?
function payment(data) { } // Process? Validate? Refund?
✅ Good:
function getUserById(id) { }
function processPayment(data) { }
function refundPayment(paymentId) { }
Why: Verbs communicate intent clearly.
4. Match Natural Language Queries
Think about how users will ask for resources:
User query: “Connect to the production database”
❌ Mismatch: db-prod-1 (user won’t say “db prod 1”)
✅ Match: supabase-prod-database (matches “production database”)
5. Consistent Patterns Across Codebase
Pick a pattern and apply it everywhere:
# Consistent pattern: {service}-{env}-{type}
supabase-prod-database
supabase-staging-database
stripe-prod-api
stripe-staging-api
linear-prod-api
linear-staging-api
Why: Predictability improves discoverability.
6. Avoid Numbers in Names
❌ Bad:
db-1, db-2, api-1, api-2
✅ Good:
supabase-prod-database
supabase-staging-database
Why: Numbers provide no semantic meaning.
Common Pitfalls
Pitfall 1: Over-Abbreviation
❌ Problem:
const usrAuthCfg = loadCfg();
const pwdValUtil = validate();
✅ Solution:
const userAuthenticationConfig = loadConfig();
const passwordValidationUtil = validatePassword();
Lesson: Save keystrokes elsewhere, not in names.
Pitfall 2: Ambiguous Generics
❌ Problem:
function handle(request) { } // Handle how?
function process(data) { } // Process what?
✅ Solution:
function handleWebhookRequest(request) { }
function processPaymentData(data) { }
Lesson: Add context to generic verbs.
Pitfall 3: Mixing Patterns
❌ Problem:
# Inconsistent patterns
supabase_prod # underscore
stripeProdApi # camelCase
linear-prod # kebab-case
✅ Solution:
# Consistent kebab-case
supabase-prod-database
stripe-prod-api
linear-prod-api
Lesson: Pick one pattern (kebab-case recommended) and stick to it.
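Mixed styles are easy to detect automatically. The sketch below classifies a multi-word name by its separator style and lists the names that break the chosen convention; single-word names such as `cache` are reported as "unknown" because their style cannot be inferred. Both helpers are hypothetical.

```typescript
type NamingStyle = "kebab-case" | "snake_case" | "camelCase" | "unknown";

// Classify a multi-word name by its separator style.
function detectNamingStyle(resourceName: string): NamingStyle {
  if (/^[a-z0-9]+(-[a-z0-9]+)+$/.test(resourceName)) return "kebab-case";
  if (/^[a-z0-9]+(_[a-z0-9]+)+$/.test(resourceName)) return "snake_case";
  if (/^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$/.test(resourceName)) return "camelCase";
  return "unknown";
}

// List the names that do not follow the expected style.
function findStyleViolations(resourceNames: string[], expectedStyle: NamingStyle): string[] {
  return resourceNames.filter(
    (resourceName) => detectNamingStyle(resourceName) !== expectedStyle
  );
}

// findStyleViolations(["supabase_prod", "stripeProdApi", "linear-prod"], "kebab-case")
//   -> ["supabase_prod", "stripeProdApi"]
```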
Pitfall 4: Not Including Environment
❌ Problem:
supabase-database # Is this prod or staging?
✅ Solution:
supabase-prod-database
supabase-staging-database
Lesson: Always include environment to prevent errors.
Integration with Other Patterns
Combine with Hierarchical CLAUDE.md
Document naming conventions in CLAUDE.md:
# CLAUDE.md
## Naming Conventions
Semantic naming for LLM discoverability:
### MCP Resources: {service}-{env}-{type}
- supabase-prod-database
- linear-projects-api
### Files: {domain}-{purpose}-{type}.ts
- user-authentication-config.ts
- payment-processing-service.ts
Benefit: LLM learns conventions from context.
Combine with Custom ESLint Rules
Enforce semantic naming automatically:
// ESLint rule enforces conventions
"@custom/enforce-semantic-naming": "error"
Benefit: Prevents regression to generic names.
Combine with Knowledge Graph Retrieval
Semantic names become nodes in knowledge graph:
supabase-prod-database
├─ environment: production
├─ service: supabase
└─ type: database
Benefit: Structured metadata for advanced queries.
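The decomposition shown above can be derived from the name itself. The sketch below assumes a single-word service followed by an optional environment segment; the alias table and `parseResourceName` are hypothetical, and multi-word services such as `cloud-run` would need a lookup table that this example does not include.

```typescript
interface ResourceNameMetadata {
  service: string;
  environment: string | null;
  type: string;
}

// Hypothetical mapping from name segments to canonical environment names.
const ENVIRONMENT_SEGMENT_ALIASES: Record<string, string> = {
  prod: "production",
  production: "production",
  staging: "staging",
  dev: "development",
};

// Split "{service}-{environment}-{type}" into metadata; the environment segment is optional.
function parseResourceName(resourceName: string): ResourceNameMetadata | null {
  const segments = resourceName.split("-");
  if (segments.length < 2) {
    return null;
  }
  const hasEnvironment = Object.prototype.hasOwnProperty.call(
    ENVIRONMENT_SEGMENT_ALIASES,
    segments[1]
  );
  const typeSegments = segments.slice(hasEnvironment ? 2 : 1);
  if (typeSegments.length === 0) {
    return null;
  }
  return {
    service: segments[0],
    environment: hasEnvironment ? ENVIRONMENT_SEGMENT_ALIASES[segments[1]] : null,
    type: typeSegments.join("-"),
  };
}

// parseResourceName("supabase-prod-database")
//   -> { service: "supabase", environment: "production", type: "database" }
// parseResourceName("linear-projects-api")
//   -> { service: "linear", environment: null, type: "projects-api" }
```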
Measuring Success
Key Metrics
- Retrieval Success Rate
  Success Rate = (Successful Retrievals / Total Queries) × 100
  Target: >90%
- Clarification Questions
  Questions per Task = Clarifications / Completed Tasks
  Target: <0.5 questions/task
- Task Completion Time
  Avg Time = Total Time / Completed Tasks
  Target: 30-40% reduction
- Error Rate
  Error Rate = (Wrong Resource Uses / Total Uses) × 100
  Target: <5%
Tracking Dashboard
interface NamingMetrics {
retrievalSuccessRate: number; // %
avgClarificationsPerTask: number;
avgTaskCompletionTime: number; // seconds
errorRate: number; // %
}
const beforeSemanticNaming: NamingMetrics = {
retrievalSuccessRate: 50,
avgClarificationsPerTask: 3.2,
avgTaskCompletionTime: 180,
errorRate: 15,
};
const afterSemanticNaming: NamingMetrics = {
retrievalSuccessRate: 92, // +84% improvement
avgClarificationsPerTask: 0.4, // -87% reduction
avgTaskCompletionTime: 110, // -39% faster
errorRate: 3, // -80% fewer errors
};
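The percentages in the comments above are relative changes, which differ from percentage-point changes when the metric is itself a rate. A small sketch makes the distinction explicit; both helper names are hypothetical.

```typescript
// Relative change from before to after, as a percentage of the starting value.
function relativeChangePercent(beforeValue: number, afterValue: number): number {
  if (beforeValue === 0) {
    throw new Error("beforeValue must be non-zero");
  }
  return ((afterValue - beforeValue) / beforeValue) * 100;
}

// Absolute difference between two rates that are already expressed in percent.
function percentagePointChange(beforeRate: number, afterRate: number): number {
  return afterRate - beforeRate;
}

// Retrieval success rate going from 50% to 92%:
//   relativeChangePercent(50, 92) is about 84 (an 84% relative improvement)
//   percentagePointChange(50, 92) is 42 (42 percentage points)
```

Reporting which of the two is meant avoids reading "a 40% increase" as either 40 points or 40% of the baseline.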
Conclusion
Semantic naming transforms LLM effectiveness by making resources discoverable.
Key takeaways:
- Name resources how humans think: Match natural language queries
- Use consistent patterns: {service}-{environment}-{type}
- Avoid generic names: No more “utils” or “handler”
- Include context: Environment, domain, purpose
- Self-documenting: Code becomes readable without comments
The result: 90%+ retrieval success, 40% faster task completion, and significantly fewer errors.
Start today:
- Audit your MCP resources
- Refactor generic names to semantic patterns
- Document conventions in CLAUDE.md
- Enforce with linting
- Measure improvement
Remember: Every generic name is a missed opportunity for LLM discovery. Semantic naming is an investment that compounds—the more resources you name semantically, the more powerful your LLM interactions become.
Related Concepts
- Hierarchical Context Patterns – Combine semantic naming with hierarchical documentation for maximum discoverability
- Custom ESLint Rules for Determinism – Automated enforcement of naming conventions
- Information Theory in Coding Agents – How naming affects information retrieval
References
- MCP Specification – Model Context Protocol for connecting LLMs to resources
- Clean Code: Meaningful Names – Robert C. Martin’s principles on meaningful naming

