AST-Based Code Search: Precision Over False Positives

James Phoenix

The Problem with Text-Based Search

Scenario: You need to find all places where fetchUserData() is called in your codebase.

Using traditional grep:

$ grep -r "fetchUserData" .

./src/api/users.ts:  const userData = await fetchUserData(userId);
./src/api/users.ts:  // TODO: fetchUserData should handle errors better
./src/api/users.ts:  console.log("Calling fetchUserData");
./README.md:The `fetchUserData` function retrieves user data from the API.
./tests/mocks.ts:  fetchUserData: jest.fn(),
./utils/logger.ts:  logger.debug('fetchUserData called', { userId });

The problem, numbering the six matches above from top to bottom:

Line 2: Comment mention (not a call)
Line 3: String literal (not a call)
Line 4: Documentation (not code)
Line 5: Mock definition (not a real call)
Line 6: String in logger (not a call)

Result: Only 1 of 6 matches is the actual function call you’re looking for.

Why Text Search Fails

Text-based tools like grep, ripgrep, and IDE search treat code as plain text, not structured syntax. They can’t distinguish:

Code vs comments: // fetchUserData() matches just like fetchUserData()
Strings vs identifiers: "fetchUserData" matches like fetchUserData
Function calls vs definitions: function fetchUserData() matches like fetchUserData()
Similar names: fetchUserDataById matches a search for fetchUserData
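
A minimal sketch of that difference, using Python's built-in ast module as a stand-in for the tree-sitter parsers discussed later. The sample source and its snake_case function names are illustrative assumptions, not taken from a real codebase.

```python
import ast
import re

# Hypothetical sample: one real call surrounded by look-alikes.
SOURCE = '''
def fetch_user_data(user_id):          # definition, not a call
    return {"id": user_id}

# TODO: fetch_user_data should handle errors better
print("Calling fetch_user_data")
user = fetch_user_data(42)
other = fetch_user_data_by_id(42)      # similar name, different function
'''

def text_matches(source: str, name: str) -> list[int]:
    """Line numbers where the name appears anywhere, as grep would report."""
    return [i for i, line in enumerate(source.splitlines(), start=1)
            if re.search(re.escape(name), line)]

def call_matches(source: str, name: str) -> list[int]:
    """Line numbers of real calls: Call nodes whose callee is exactly `name`."""
    return [node.lineno for node in ast.walk(ast.parse(source))
            if isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == name]

print(text_matches(SOURCE, "fetch_user_data"))  # every textual mention
print(call_matches(SOURCE, "fetch_user_data"))  # only the call site
```

The text search reports five lines; the AST search reports only the line containing the call.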

The Cost of False Positives

Time waste:

Manually filtering 10-50% false positives
Re-running searches with more specific patterns
Validating each match individually

Risk of errors:

Overlooking genuine matches hidden among noise
Acting on false positives (e.g., refactoring comments)
Missing edge cases due to cognitive overload

Poor LLM context:

AI coding agents fetch irrelevant snippets
Context window filled with documentation instead of code
Lower quality suggestions due to noisy input

The Solution: AST-Based Search

AST (Abstract Syntax Tree) is the structural representation of code that compilers use. Instead of treating code as text, AST-based search tools parse it into a tree of syntax nodes.

How AST Search Works

Step 1: Code is parsed into an AST

const user = await fetchUserData(userId);

Becomes:

VariableDeclaration
├─ VariableDeclarator
│  ├─ Identifier: "user"
│  └─ AwaitExpression
│     └─ CallExpression
│        ├─ Identifier: "fetchUserData"
│        └─ Arguments
│           └─ Identifier: "userId"
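
To inspect a tree like this yourself, Python's ast module can print one for an equivalent Python statement. This is only an analogy: Python's node names differ (Assign instead of VariableDeclaration, Call instead of CallExpression), and the await is dropped so the statement is valid at module level.

```python
import ast

# Python analog of the statement above (hypothetical snake_case names).
tree = ast.parse("user = fetch_user_data(user_id)")

# Pretty-print the assignment node and everything under it.
print(ast.dump(tree.body[0], indent=2))

# The node types found anywhere in the tree.
kinds = [type(node).__name__ for node in ast.walk(tree)]
print(kinds)
```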

Step 2: Search queries match AST patterns

Instead of searching for the text “fetchUserData”, you search for:

CallExpression with callee.name === "fetchUserData"

This matches only function calls, ignoring comments, strings, and other non-code mentions.

Meet ast-grep

ast-grep is a fast, syntax-aware code search and refactoring tool built on tree-sitter parsers.

Installation:

# macOS
brew install ast-grep

# Any platform with a Rust toolchain (cargo)
cargo install ast-grep --locked

# npm
npm install -g @ast-grep/cli

Basic usage:

# Find all calls to fetchUserData
ast-grep --pattern 'fetchUserData($$$)'

# Find all class definitions
ast-grep --pattern 'class $NAME { $$$ }'

# Find all React components with useState
ast-grep --pattern 'const [$STATE, $SETTER] = useState($$$)'

Pattern Syntax

Metavariables

Single metavariable ($VAR): Matches a single AST node

# Find all function calls with any name
ast-grep --pattern '$FUNC($$$)'

Multi metavariable ($$$): Matches zero or more nodes

# Find all fetchUserData calls regardless of arguments
ast-grep --pattern 'fetchUserData($$$)'
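
To build intuition for how metavariables bind, here is a toy matcher over Python's ast module. It is not how ast-grep is implemented. The conventions are assumptions made for this sketch: a name starting with META_ plays the role of $VAR, and a trailing *META_X argument plays the role of $$$. Keyword arguments are ignored for brevity.

```python
import ast

def is_meta(node: ast.AST) -> bool:
    """Sketch convention: names starting with META_ act as metavariables."""
    return isinstance(node, ast.Name) and node.id.startswith("META_")

def match(pattern: ast.AST, node: ast.AST, bound: dict[str, str]) -> bool:
    """Return True if `node` has the shape of `pattern`, recording bindings."""
    if is_meta(pattern):  # like $VAR: binds any single node
        bound[pattern.id] = ast.unparse(node)
        return True
    if isinstance(pattern, ast.Call) and isinstance(node, ast.Call):
        if not match(pattern.func, node.func, bound):
            return False
        p_args = pattern.args
        if p_args and isinstance(p_args[-1], ast.Starred) and is_meta(p_args[-1].value):
            # like $$$: a trailing *META_X absorbs zero or more remaining arguments
            fixed = p_args[:-1]
            if len(node.args) < len(fixed):
                return False
            rest = node.args[len(fixed):]
            bound[p_args[-1].value.id] = ", ".join(ast.unparse(a) for a in rest)
            return all(match(p, a, bound) for p, a in zip(fixed, node.args))
        return len(p_args) == len(node.args) and all(
            match(p, a, bound) for p, a in zip(p_args, node.args))
    # Anything else must be structurally identical.
    return ast.dump(pattern) == ast.dump(node)

def find(pattern_src: str, source: str) -> list[dict[str, str]]:
    """Bindings for every expression in `source` that matches the pattern."""
    pattern = ast.parse(pattern_src, mode="eval").body
    results = []
    for node in ast.walk(ast.parse(source)):
        bound: dict[str, str] = {}
        if isinstance(node, ast.expr) and match(pattern, node, bound):
            results.append(bound)
    return results

code = "a = fetch_user_data(uid, 3)\nb = load(1, 2)\n"
print(find("fetch_user_data(*META_ARGS)", code))
print(find("META_FUNC(*META_ARGS)", code))
```

The first query binds only the arguments of the one fetch_user_data call; the second matches both calls and also binds the callee name.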

Examples by Language

TypeScript/JavaScript

Find all async function definitions:

ast-grep --pattern 'async function $NAME($$$) { $$$ }'

Find all destructured useState calls:

ast-grep --pattern 'const [$STATE, $SETTER] = useState($$$)'

Find all try-catch blocks:

ast-grep --pattern 'try { $$$ } catch ($ERR) { $$$ }'

Find all Express route handlers:

ast-grep --pattern 'app.$METHOD($PATH, async ($REQ, $RES) => { $$$ })'

Python

Find all class definitions:

ast-grep --pattern 'class $NAME: $$$'

Find all function calls to a specific function:

ast-grep --pattern 'calculate_total($$$)'

Find all list comprehensions:

ast-grep --pattern '[$EXPR for $VAR in $ITER]'

Rust

Find all match expressions:

ast-grep --pattern 'match $EXPR { $$$ }'

Find all unwrap() calls:

ast-grep --pattern '$EXPR.unwrap()'

Practical Use Cases

Use Case 1: Refactoring Function Calls

Problem: You need to rename fetchUserData to getUserData everywhere it’s called (not in comments or docs).

Text-based approach (error-prone):

# Find all mentions
grep -r "fetchUserData" .

# Manually verify each match
# Replace only the real function calls
# Hope you didn't miss any or change comments by mistake

AST-based approach (precise):

# Find all function calls
ast-grep --pattern 'fetchUserData($$$)' --json

# Rewrite all matches (name the multi metavariable so the arguments carry over)
ast-grep --pattern 'fetchUserData($$$ARGS)' \
  --rewrite 'getUserData($$$ARGS)' \
  --update-all

Result: Only actual function calls are renamed. Comments, strings, and documentation are untouched.
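
The same idea can be sketched in a few lines of Python: locate call sites through the AST, then edit the original text at the reported offsets so that comments and strings are never touched. The function and sample names are hypothetical. The sketch assumes ASCII source, because the ast module reports column offsets in UTF-8 bytes.

```python
import ast

def rename_calls(source: str, old: str, new: str) -> str:
    """Rename only direct calls to `old`, editing text at AST-reported offsets."""
    lines = source.splitlines(keepends=True)
    sites = [
        node.func for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == old
    ]
    # Apply edits from the end of the file so earlier offsets stay valid.
    for name in sorted(sites, key=lambda n: (n.lineno, n.col_offset), reverse=True):
        row = name.lineno - 1
        line = lines[row]
        lines[row] = line[:name.col_offset] + new + line[name.end_col_offset:]
    return "".join(lines)

before = (
    "# fetch_user_data is slow\n"
    "log('fetch_user_data called')\n"
    "user = fetch_user_data(fetch_user_data(1))\n"
)
after = rename_calls(before, "fetch_user_data", "get_user_data")
print(after)
```

Like the call pattern above, this leaves the function's definition and any imports alone, so those need a separate rename.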

Use Case 2: Finding Unsafe Patterns

Find all .unwrap() calls in Rust (which can panic):

ast-grep --pattern '$EXPR.unwrap()'

Output:

./src/main.rs:15:  let user = user_result.unwrap();
./src/api.rs:42:   let data = response.json().unwrap();
./src/db.rs:88:    let conn = pool.get().unwrap();

No false positives from:

Comments mentioning “unwrap”
String literals containing “unwrap”
Function definitions named unwrap()
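
A rough Python analog of the `$EXPR.unwrap()` pattern, for readers who want to see what "method call with no arguments" means at the node level. The Box class and variable names are made up for the example.

```python
import ast

def zero_arg_method_calls(source: str, method: str) -> list[tuple[int, str]]:
    """(line, receiver) for every `<expr>.method()` call with no arguments."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == method
                and not node.args and not node.keywords):
            hits.append((node.lineno, ast.unparse(node.func.value)))
    return sorted(hits)

SAMPLE = '''
class Box:
    def unwrap(self):              # a definition named unwrap
        return self.value

note = "never call unwrap() here"  # a string literal
user = user_result.unwrap()
data = response.json().unwrap()
'''
print(zero_arg_method_calls(SAMPLE, "unwrap"))
```

Only the two real calls are reported; the definition, the comment, and the string are skipped.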

Use Case 3: AI Coding Agent Context

LLM prompt: “Find all places where we make API calls to fetch user data”

Bad (text search):

grep -r "fetch.*user" .

Returns 50+ matches including comments, docs, variable names, etc.

Good (AST search):

# Find all function calls starting with 'fetch' and containing 'user'
ast-grep --pattern '$FUNC($$$)' | grep -i "fetch.*user"

Returns only actual function calls, giving the LLM high-signal context.

Use Case 4: Code Quality Audits

Find all console.log statements (to remove before production):

ast-grep --pattern 'console.log($$$)'

Find all TODO comments (different tool, but conceptually related):

# ast-grep focuses on code structure, not comments
# Use grep for comments, ast-grep for code
grep -r "TODO" .

Find all uses of any type in TypeScript:

ast-grep --pattern '$VAR: any'

Integration with AI Coding Workflows

Pattern 1: Precise Context Retrieval

When an LLM needs to understand how a function is used:

# Instead of:
grep -r "processPayment" .

# Use:
ast-grep --pattern 'processPayment($$$)' --json | jq

This gives the LLM only actual usages, not documentation or comments.
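
If you post-process the JSON in a script rather than with jq, something like the following could turn matches into compact prompt context. The sample below is hand-written for illustration, not captured from a real run. The field names (file, text, range.start.line) and the 0-based line numbering are assumptions about the output shape that you should check against your ast-grep version.

```python
import json

# Hand-written sample in the assumed shape of `ast-grep --json` output.
SAMPLE_OUTPUT = '''
[
  {"file": "src/billing.ts", "text": "processPayment(order, card)",
   "range": {"start": {"line": 41, "column": 8}}},
  {"file": "src/retry.ts", "text": "processPayment(job.order, job.card)",
   "range": {"start": {"line": 9, "column": 2}}}
]
'''

def to_context(raw_json: str) -> str:
    """Turn match objects into compact `file:line: code` lines for a prompt."""
    rows = []
    for m in json.loads(raw_json):
        line = m["range"]["start"]["line"] + 1  # assumed 0-based in the JSON
        rows.append(f'{m["file"]}:{line}: {m["text"]}')
    return "\n".join(rows)

print(to_context(SAMPLE_OUTPUT))
```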

Pattern 2: Pre-Refactoring Validation

Before asking an LLM to refactor:

# Find all actual call sites
ast-grep --pattern 'oldFunctionName($$$)' > call_sites.txt

# Provide to LLM as context
LLM_CONTEXT=$(cat call_sites.txt)
llm "Refactor these call sites to use newFunctionName: $LLM_CONTEXT"

Pattern 3: Codebase Understanding

Generate a map of how functions are used:

# Find all function definitions
ast-grep --pattern 'function $NAME($$$) { $$$ }' --json > functions.json

# Find all function calls
ast-grep --pattern '$FUNC($$$)' --json > calls.json

# Analyze with LLM (pass both files into the prompt)
llm "Given these functions and calls, create a dependency graph: $(cat functions.json calls.json)"
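
For small codebases you may not need the LLM step at all. As a sketch of what "a map of how functions are used" means, this Python snippet builds a caller-to-callee map for one module with the ast module. It only handles top-level functions and direct calls by name, and the sample module is hypothetical.

```python
import ast

def call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the names it calls directly."""
    graph: dict[str, set[str]] = {}
    for func in ast.parse(source).body:
        if isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            calls = graph.setdefault(func.name, set())
            for node in ast.walk(func):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    calls.add(node.func.id)
    return graph

MODULE = '''
def load(uid):
    return fetch(uid)

def fetch(uid):
    return {"id": uid}

def main():
    print(load(1))
'''
print(call_graph(MODULE))
```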

Advanced Patterns

Combining Filters

Find all async functions that use try-catch:

ast-grep --pattern 'async function $NAME($$$) { $$$ try { $$$ } catch { $$$ } $$$ }'
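
Combining two structural conditions ("is an async function" and "contains a try block") looks like this in a Python sketch using the ast module. The function names in the sample are invented. A try inside a nested function also counts here, which may or may not be what you want.

```python
import ast

def async_functions_with_try(source: str) -> list[str]:
    """Names of `async def` functions whose body contains a try statement."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.AsyncFunctionDef) and any(
                isinstance(child, ast.Try) for child in ast.walk(node)):
            names.append(node.name)
    return sorted(names)

SAMPLE = '''
async def safe(uid):
    try:
        return await fetch(uid)
    except TimeoutError:
        return None

async def unsafe(uid):
    return await fetch(uid)

def sync_with_try():
    try:
        pass
    except Exception:
        pass
'''
print(async_functions_with_try(SAMPLE))
```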

Language-Specific Queries

React: Find all components using useEffect with empty deps:

ast-grep --pattern 'useEffect($$$, [])'

Python: Find all functions with type hints:

ast-grep --pattern 'def $NAME($$$) -> $TYPE: $$$'

Custom Rules

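Create reusable rules as YAML files in a directory that sgconfig.yml lists under ruleDirs:

# sgconfig.yml (project root)
ruleDirs:
  - rules

# rules/code-quality.yml (multiple rules separated by ---)
id: no-console-log
language: TypeScript
rule:
  pattern: console.log($$$)
message: Remove console.log before production
severity: warning
---
id: prefer-const
language: TypeScript
rule:
  pattern: let $VAR = $VALUE
message: Use const instead of let for immutable values
severity: info

Run all rules:

ast-grep scan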

Performance Comparison

Benchmark: Search for function calls in a 100k LOC TypeScript project

Tool	Time	False Positives	True Positives
grep	0.5s	45%	100%
ripgrep	0.2s	45%	100%
ast-grep	1.2s	0%	100%

Tradeoff: ast-grep is slower (parsing overhead) but eliminates false positives.

When to use each:

grep/ripgrep: Quick, fuzzy searches; searching strings/comments
ast-grep: Precise code structure queries; refactoring; AI context
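
The two approaches also combine well: a cheap text pass narrows the candidate files, and the slower parse runs only on those. A minimal Python sketch of that two-stage idea, with an in-memory dictionary standing in for files on disk (the file names and contents are made up):

```python
import ast

def find_calls(files: dict[str, str], name: str) -> dict[str, list[int]]:
    """Cheap substring prefilter first, AST confirmation only on candidates."""
    results = {}
    for path, source in files.items():
        if name not in source:  # text pass: fast, may over-match
            continue
        lines = [n.lineno for n in ast.walk(ast.parse(source))  # AST pass: precise
                 if isinstance(n, ast.Call)
                 and isinstance(n.func, ast.Name) and n.func.id == name]
        if lines:
            results[path] = sorted(lines)
    return results

FILES = {
    "api.py": "user = fetch_user_data(1)\n",
    "notes.py": "# fetch_user_data is documented elsewhere\n",
    "math.py": "total = sum([1, 2])\n",
}
print(find_calls(FILES, "fetch_user_data"))
```

Here math.py is never parsed, and notes.py is parsed but produces no match.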

Best Practices

1. Start Simple, Refine Iteratively

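# Start broad: any function call (a metavariable must stand for a whole node,
# so a partial name like fetch$$$ is not a valid pattern)
ast-grep --pattern '$FUNC($$$)'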

# Refine to specific pattern
ast-grep --pattern 'fetchUserData($$$)'

# Further refine with filters
ast-grep --pattern 'await fetchUserData($$$)'

2. Use JSON Output for Programmatic Processing

ast-grep --pattern 'useState($$$)' --json | jq '.[] | .file'

3. Combine with Traditional Tools

# Count fetch() calls per file under src/api/
ast-grep --pattern 'fetch($$$)' src/api/ --json | jq -r '.[].file' | sort | uniq -c

4. Test Patterns on Small Files First

Before running on entire codebase:

ast-grep --pattern 'your-pattern' src/example.ts

5. Document Common Patterns

Create a team wiki or README with common ast-grep patterns:

## Common AST Patterns

### Find all API calls
`ast-grep --pattern 'fetch($$$)'`

### Find calls to a specific React hook
`ast-grep --pattern 'useEffect($$$)'`

### Find all async functions
`ast-grep --pattern 'async function $NAME($$$) { $$$ }'`

Limitations

1. Requires Valid Syntax

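ast-grep depends on a clean parse. Its tree-sitter parsers recover from syntax errors instead of aborting, so a search on a broken file still runs, but code in or near the broken region can end up inside error nodes and silently fail to match:

$ ast-grep --pattern 'fetchUserData($$$)' broken-file.ts
(call sites near the syntax error may be missing from the output)

Workaround: Fix syntax errors first, or use text search as fallback.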
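
The fallback strategy is easy to show in Python, whose built-in ast.parse raises SyntaxError on invalid input. This is a sketch of the idea rather than anything ast-grep does itself, and the function names are illustrative.

```python
import ast
import re

def find_calls_with_fallback(source: str, name: str) -> tuple[str, list[int]]:
    """Prefer AST matching; fall back to a text heuristic if parsing fails."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Heuristic: the name followed by an opening parenthesis.
        pattern = re.compile(rf"\b{re.escape(name)}\s*\(")
        return "text", [i for i, line in enumerate(source.splitlines(), 1)
                        if pattern.search(line)]
    return "ast", sorted(n.lineno for n in ast.walk(tree)
                         if isinstance(n, ast.Call)
                         and isinstance(n.func, ast.Name) and n.func.id == name)

good = "x = fetch_user_data(1)\n"
broken = "x = fetch_user_data(1)\ndef oops(:\n"

print(find_calls_with_fallback(good, "fetch_user_data"))
print(find_calls_with_fallback(broken, "fetch_user_data"))
```

Returning the mode alongside the matches lets the caller treat text-mode results as lower confidence.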

2. Language Support

Supported via tree-sitter:

JavaScript/TypeScript
Python
Rust
Go
C/C++
Ruby
Java
And others (see the list linked below)

Check: https://github.com/ast-grep/ast-grep#supported-languages

3. Learning Curve

Pattern syntax takes time to learn:

Easy: fetchUserData($$$) (basic call)
Medium: const [$A, $B] = useState($$$) (destructuring)
Hard: Complex nested patterns

Tip: Start with simple patterns, gradually increase complexity.

Alternatives

Semgrep

Similar tool with more focus on security/linting:

semgrep --pattern 'fetchUserData(...)' --lang ts .

Comparison:

Semgrep: Better for security rules, multi-language support
ast-grep: Faster, simpler syntax, better for refactoring

Comby

Structural code rewriting:

comby 'fetchUserData(:[args])' 'getUserData(:[args])' .ts

Comparison:

Comby: Language-agnostic (templates that respect balanced delimiters, without a full parse)
ast-grep: Language-aware (AST-based)

IDE Structural Search

IntelliJ IDEA and some VS Code extensions offer structural search:

Pros: Integrated into IDE, visual interface
Cons: Not scriptable, not usable in CI/CD

Measuring Success

Key metrics:

False positive rate: Should be 0% for well-formed patterns
Time saved: Compare manual filtering vs. ast-grep precision
Refactoring confidence: Can you trust the results for automated rewrites?
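
To put a number on the first metric, label a sample of matches by hand and compute the share that are real. A small sketch, using the six grep matches from the opening example (labeled m1 to m6 here for convenience), of which only the first is a real call:

```python
def precision(reported: set[str], relevant: set[str]) -> float:
    """Share of reported matches that are real call sites."""
    return len(reported & relevant) / len(reported) if reported else 1.0

def false_positive_share(reported: set[str], relevant: set[str]) -> float:
    """Share of reported matches that are noise (1 - precision)."""
    return 1.0 - precision(reported, relevant)

# The six grep matches from the opening example; only m1 is a real call.
grep_matches = {"m1", "m2", "m3", "m4", "m5", "m6"}
real_calls = {"m1"}

print(round(precision(grep_matches, real_calls), 2))
print(round(false_positive_share(grep_matches, real_calls), 2))
```

For that example, five of six matches are noise, a false positive share of about 83%.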

Before ast-grep:

Search time: 30 seconds
Manual filtering: 5 minutes
False positives: 20-40%
Confidence: Low (manual verification required)

After ast-grep:

Search time: 2 minutes (parsing overhead)
Manual filtering: 0 seconds
False positives: 0%
Confidence: High (safe for automated refactoring)

Conclusion

Text-based search tools are fast but imprecise. AST-based search trades a bit of speed for syntactic precision.

Use ast-grep when you need:

Zero false positives
Automated refactoring
High-quality LLM context
Structural code analysis

Use grep/ripgrep when you need:

Speed over precision
Fuzzy matching
Searching comments/docs
Quick exploration

The best approach: Use both. Start with grep for exploration, refine with ast-grep for precision.

Next steps:

Install ast-grep: brew install ast-grep
Try a simple pattern: ast-grep --pattern 'console.log($$$)'
Create an sgconfig.yml and a rules directory for your team’s common patterns
Integrate into CI/CD for code quality checks

By understanding code structure instead of treating it as text, AST-based search eliminates false positives and enables confident, automated refactoring—exactly what you need when working with AI coding agents.

Related Concepts

Custom ESLint Rules for AI Determinism – AST-based linting rules that enforce architectural patterns
Playwright Script Loop – Generate validation scripts for faster feedback cycles
Agentic Tool Detection – Detect tool availability before workflows
Evaluation-Driven Development – Self-healing test loops with AI vision
Test Custom Infrastructure – Avoid the house on stilts by testing tooling
Semantic Naming for Retrieval – Name patterns for agentic retrieval
Integration Testing Patterns – High-signal tests for LLM-generated code
