AST-Based Code Search: Precision Over False Positives

James Phoenix

The Problem with Text-Based Search

Scenario: You need to find all places where fetchUserData() is called in your codebase.

Using traditional grep:

$ grep -r "fetchUserData" .

./src/api/users.ts:  const userData = await fetchUserData(userId);
./src/api/users.ts:  // TODO: fetchUserData should handle errors better
./src/api/users.ts:  console.log("Calling fetchUserData");
./README.md:The `fetchUserData` function retrieves user data from the API.
./tests/mocks.ts:  fetchUserData: jest.fn(),
./utils/logger.ts:  logger.debug('fetchUserData called', { userId });

The problem:

  • Line 2: Comment mention (not a call)
  • Line 3: String literal (not a call)
  • Line 4: Documentation (not code)
  • Line 5: Mock definition (not a real call)
  • Line 6: String in logger (not a call)

Result: Only 1 of 6 matches is the actual function call you’re looking for.

Why Text Search Fails

Text-based tools like grep, ripgrep, and IDE search treat code as plain text, not structured syntax. They can’t distinguish:

  • Code vs comments: // fetchUserData() matches just like fetchUserData()
  • Strings vs identifiers: "fetchUserData" matches like fetchUserData
  • Function calls vs definitions: function fetchUserData() matches like fetchUserData()
  • Similar names: fetchUserDataById matches a search for fetchUserData
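To make the distinction concrete, here is a minimal Python sketch. It uses the standard `tokenize` module as a stand-in for a real parser, and both the sample source and the name `fetch_user_data` are illustrative assumptions. A substring search sees three identical hits, while the tokenizer reports which kind of token each hit falls in.

```python
import io
import tokenize

# Hypothetical sample source, used only for illustration.
SOURCE = '''\
user = fetch_user_data(user_id)
# TODO: fetch_user_data should handle errors better
print("Calling fetch_user_data")
'''

def classify_mentions(source: str, name: str) -> list[tuple[int, str]]:
    """Return (line number, token kind) for every token that mentions `name`."""
    mentions = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if name in tok.string:
            mentions.append((tok.start[0], tokenize.tok_name[tok.type]))
    return mentions

# A substring search cannot tell the three hits apart...
text_hits = [
    lineno
    for lineno, line in enumerate(SOURCE.splitlines(), 1)
    if "fetch_user_data" in line
]
# ...while token kinds separate the identifier from the comment and the string.
mentions = classify_mentions(SOURCE, "fetch_user_data")
print(text_hits)
print(mentions)
```

Only the hit classified as `NAME` is code; the `COMMENT` and `STRING` hits are the false positives described above.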

The Cost of False Positives

Time waste:

  • Manually filtering 10-50% false positives
  • Re-running searches with more specific patterns
  • Validating each match individually

Risk of errors:

  • Overlooking genuine matches hidden among noise
  • Acting on false positives (e.g., refactoring comments)
  • Missing edge cases due to cognitive overload

Poor LLM context:

  • AI coding agents fetch irrelevant snippets
  • Context window filled with documentation instead of code
  • Lower quality suggestions due to noisy input

The Solution: AST-Based Search

An AST (Abstract Syntax Tree) is the tree-shaped representation of code that compilers and other language tools build when they parse source text. Instead of treating code as text, AST-based search tools parse it into a tree of syntax nodes and match queries against that tree.

How AST Search Works

Step 1: Code is parsed into an AST

const user = await fetchUserData(userId);

Becomes:

VariableDeclaration
├─ VariableDeclarator
│  ├─ Identifier: "user"
│  └─ AwaitExpression
│     └─ CallExpression
│        ├─ Identifier: "fetchUserData"
│        └─ Arguments
│           └─ Identifier: "userId"
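The parsing step can be tried without installing anything by using Python's built-in `ast` module on a Python equivalent of the line above. This is only a sketch: Python's node names differ from the JavaScript tree shown (for example `Assign` instead of `VariableDeclaration`), the `await` is dropped for simplicity, and `fetch_user_data` and `user_id` are illustrative names.

```python
import ast

# Python equivalent of the line above; the names are illustrative.
tree = ast.parse("user = fetch_user_data(user_id)")

# Collect the type of every node, skipping Load/Store context markers.
node_types = [
    type(node).__name__
    for node in ast.walk(tree)
    if not isinstance(node, ast.expr_context)
]
print(node_types)

# Pretty-print the full tree (the indent argument needs Python 3.9+).
print(ast.dump(tree, indent=2))
```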

Step 2: Search queries match AST patterns

Instead of searching for the text “fetchUserData”, you search for:

CallExpression with callee.name === "fetchUserData"

This matches only function calls, ignoring comments, strings, and other non-code mentions.
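A query like this is short to express once the code is a tree. The sketch below uses Python's standard `ast` module rather than ast-grep itself; the sample source and the helper name `find_calls` are assumptions made for illustration.

```python
import ast

# Hypothetical sample source, used only for illustration.
SOURCE = '''\
user = fetch_user_data(user_id)
# TODO: fetch_user_data should handle errors better
print("Calling fetch_user_data")
mock = {"fetch_user_data": None}
'''

def find_calls(source: str, name: str) -> list[int]:
    """Return the line numbers where `name` is called as a plain function."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        # Equivalent of: CallExpression with callee.name == name
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == name
        ):
            lines.append(node.lineno)
    return sorted(lines)

call_lines = find_calls(SOURCE, "fetch_user_data")
print(call_lines)
```

The comment, the string, and the dictionary key never become `Call` nodes, so they cannot match. Method calls such as `api.fetch_user_data()` have an attribute as their callee and would need a second check in this sketch.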

Meet ast-grep

ast-grep is a fast, syntax-aware code search and refactoring tool built on tree-sitter parsers.

Installation:

# macOS
brew install ast-grep

# Any platform with a Rust toolchain (cargo)
cargo install ast-grep --locked

# npm
npm install -g @ast-grep/cli

Basic usage:

# Find all calls to fetchUserData
ast-grep --pattern 'fetchUserData($$$)'

# Find all class definitions
ast-grep --pattern 'class $NAME { $$$ }'

# Find all React components with useState
ast-grep --pattern 'const [$STATE, $SETTER] = useState($$$)'

Pattern Syntax

Metavariables

Single metavariable ($VAR): Matches a single AST node

# Find all function calls with any name
ast-grep --pattern '$FUNC($$$)'

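Multi-node metavariable ($$$): Matches zero or more nodes, such as arguments or statements. Give it a name ($$$ARGS) when the matched nodes must be reused in a rewrite.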

# Find all fetchUserData calls regardless of arguments
ast-grep --pattern 'fetchUserData($$$)'
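The idea behind a single metavariable can be sketched in a few lines of Python. This is not how ast-grep is implemented; it is a toy matcher over Python's own `ast` trees, and the `META_` prefix is a hypothetical stand-in for `$` (which is not valid in a Python identifier). The multi-node `$$$` form is deliberately left out.

```python
import ast

def matches(pattern, node, captures: dict) -> bool:
    """Compare two trees structurally; META_* names capture any single node."""
    if isinstance(pattern, ast.Name) and pattern.id.startswith("META_"):
        captures[pattern.id] = node
        return True
    if isinstance(pattern, list):
        return (
            isinstance(node, list)
            and len(pattern) == len(node)
            and all(matches(p, n, captures) for p, n in zip(pattern, node))
        )
    if isinstance(pattern, ast.AST):
        return type(pattern) is type(node) and all(
            matches(getattr(pattern, f, None), getattr(node, f, None), captures)
            for f in pattern._fields
        )
    # Plain values such as identifiers and constants must be equal.
    return pattern == node

def parse_expr(code: str) -> ast.AST:
    """Parse a single expression and return its root node."""
    return ast.parse(code, mode="eval").body

pattern = parse_expr("META_FUNC(META_ARG)")
one_arg, two_args = {}, {}
ok_one = matches(pattern, parse_expr("load(user_id)"), one_arg)
ok_two = matches(pattern, parse_expr("load(user_id, True)"), two_args)
print(ok_one, ast.unparse(one_arg["META_FUNC"]), ast.unparse(one_arg["META_ARG"]))
print(ok_two)
```

The second call does not match because one metavariable stands for exactly one node, which is why a separate multi-node form exists for argument lists of any length.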

Examples by Language

TypeScript/JavaScript

Find all async function definitions:

ast-grep --pattern 'async function $NAME($$$) { $$$ }'

Find all destructured useState calls:

ast-grep --pattern 'const [$STATE, $SETTER] = useState($$$)'

Find all try-catch blocks:

ast-grep --pattern 'try { $$$ } catch ($ERR) { $$$ }'

Find all Express route handlers:

ast-grep --pattern 'app.$METHOD($PATH, async ($REQ, $RES) => { $$$ })'

Python

Find all class definitions:

ast-grep --pattern 'class $NAME: $$$'

Find all function calls to a specific function:

ast-grep --pattern 'calculate_total($$$)'

Find all list comprehensions:

ast-grep --pattern '[$EXPR for $VAR in $ITER]'

Rust

Find all match expressions:

ast-grep --pattern 'match $EXPR { $$$ }'

Find all unwrap() calls:

ast-grep --pattern '$EXPR.unwrap()'

Practical Use Cases

Use Case 1: Refactoring Function Calls

Problem: You need to rename fetchUserData to getUserData everywhere it’s called (not in comments or docs).

Text-based approach (error-prone):

# Find all mentions
grep -r "fetchUserData" .

# Manually verify each match
# Replace only the real function calls
# Hope you didn't miss any or change comments by mistake

AST-based approach (precise):

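# Find all function calls
ast-grep --pattern 'fetchUserData($$$)' --json

# Rewrite all matches. The arguments are captured under a name ($$$ARGS)
# so that they can be reused in the replacement.
ast-grep --pattern 'fetchUserData($$$ARGS)' \
  --rewrite 'getUserData($$$ARGS)' \
  --update-all

Result: Only actual function calls are renamed. Comments, strings, and documentation are untouched. The function's definition and its imports/exports are not calls, so they need their own patterns.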
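The mechanics of a structural rewrite can be sketched with Python's `ast` module: find the call nodes, then splice the new name into the original text at the recorded positions so that everything else stays byte-for-byte identical. The helper `rename_calls` and the sample source are hypothetical, and the sketch handles only direct calls by name.

```python
import ast

def rename_calls(source: str, old: str, new: str) -> str:
    """Rename direct calls to `old`, leaving comments and strings untouched."""
    spans = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == old
        ):
            callee = node.func
            spans.append((callee.lineno, callee.col_offset, callee.end_col_offset))

    # ast column offsets are UTF-8 byte offsets, so edit the encoded lines.
    lines = source.encode("utf-8").splitlines(keepends=True)
    # Apply edits right-to-left so earlier offsets on a line stay valid.
    for lineno, start, end in sorted(spans, reverse=True):
        line = lines[lineno - 1]
        lines[lineno - 1] = line[:start] + new.encode("utf-8") + line[end:]
    return b"".join(lines).decode("utf-8")

# Hypothetical sample source, used only for illustration.
SOURCE = '''\
# fetch_user_data is documented in the README
user = fetch_user_data(user_id)
print("Calling fetch_user_data")
'''
renamed = rename_calls(SOURCE, "fetch_user_data", "get_user_data")
print(renamed)
```

Editing by position, rather than regenerating code from the tree, is what keeps comments and formatting intact.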

Use Case 2: Finding Unsafe Patterns

Find all .unwrap() calls in Rust (which can panic):

ast-grep --pattern '$EXPR.unwrap()'

Output:

./src/main.rs:15:  let user = user_result.unwrap();
./src/api.rs:42:   let data = response.json().unwrap();
./src/db.rs:88:    let conn = pool.get().unwrap();

No false positives from:

  • Comments mentioning “unwrap”
  • String literals containing “unwrap”
  • Function definitions named unwrap()

Use Case 3: AI Coding Agent Context

LLM prompt: “Find all places where we make API calls to fetch user data”

Bad (text search):

grep -r "fetch.*user" .

Returns 50+ matches including comments, docs, variable names, etc.

Good (AST search):

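# Find all function calls, then keep lines mentioning 'fetch' followed by 'user'
ast-grep --pattern '$FUNC($$$)' | grep -i "fetch.*user"

Every match is a real call expression, which gives the LLM higher-signal context. The grep stage still filters on line text, so for strict results put a regex constraint on $FUNC in a rule file instead.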
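Applying the regex to the callee name, rather than to the whole line, can be sketched as follows. This uses Python's `ast` module, and the sample source and the helper `calls_matching` are illustrative assumptions.

```python
import ast
import re

# Hypothetical sample source, used only for illustration.
SOURCE = '''\
profile = fetchUserProfile(user_id)
logger.debug("fetch user failed", user_id)
orders = load_orders(user_id)  # fetch user orders later
'''

def calls_matching(source: str, name_regex: str) -> list[tuple[int, str]]:
    """Return (line, callee) for calls whose callee name matches the regex."""
    rx = re.compile(name_regex, re.IGNORECASE)
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        callee = ast.unparse(node.func)  # e.g. "logger.debug" (Python 3.9+)
        if rx.search(callee):
            found.append((node.lineno, callee))
    return sorted(found)

# Line-based filtering matches all three lines...
line_hits = [
    lineno
    for lineno, line in enumerate(SOURCE.splitlines(), 1)
    if re.search(r"fetch.*user", line, re.IGNORECASE)
]
# ...while constraining the callee keeps only the real fetch call.
hits = calls_matching(SOURCE, r"fetch.*user")
print(line_hits)
print(hits)
```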

Use Case 4: Code Quality Audits

Find all console.log statements (to remove before production):

ast-grep --pattern 'console.log($$$)'

Find all TODO comments (different tool, but conceptually related):

# ast-grep focuses on code structure, not comments
# Use grep for comments, ast-grep for code
grep -r "TODO" .

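Find all uses of the any type in TypeScript:

# Note: the fragment '$VAR: any' may be parsed as a labeled statement rather
# than a type annotation. If it returns nothing, match a full declaration
# such as 'const $VAR: any = $VALUE'.
ast-grep --pattern '$VAR: any'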

Integration with AI Coding Workflows

Pattern 1: Precise Context Retrieval

When an LLM needs to understand how a function is used:

# Instead of:
grep -r "processPayment" .

# Use:
ast-grep --pattern 'processPayment($$$)' --json | jq

This gives the LLM only actual usages, not documentation or comments.

Pattern 2: Pre-Refactoring Validation

Before asking an LLM to refactor:

# Find all actual call sites
ast-grep --pattern 'oldFunctionName($$$)' > call_sites.txt

# Provide to LLM as context
LLM_CONTEXT=$(cat call_sites.txt)
llm "Refactor these call sites to use newFunctionName: $LLM_CONTEXT"

Pattern 3: Codebase Understanding

Generate a map of how functions are used:

# Find all function definitions
ast-grep --pattern 'function $NAME($$$) { $$$ }' --json > functions.json

# Find all function calls
ast-grep --pattern '$FUNC($$$)' --json > calls.json

# Analyze with an LLM, passing both JSON files as context
llm "Given these functions and calls, create a dependency graph: $(cat functions.json calls.json)"
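For a sense of what such a map contains, here is a small Python sketch that builds a caller-to-callee table directly from an AST. It only covers top-level functions and calls by plain name, and the sample source is hypothetical.

```python
import ast
from collections import defaultdict

# Hypothetical sample source, used only for illustration.
SOURCE = '''\
def load_user(user_id):
    return fetch_user_data(user_id)

def fetch_user_data(user_id):
    return query("users", user_id)

def main():
    print(load_user(42))
'''

def call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain function names it calls."""
    graph = defaultdict(set)
    for func in ast.parse(source).body:
        if isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for node in ast.walk(func):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    graph[func.name].add(node.func.id)
    return dict(graph)

graph = call_graph(SOURCE)
for caller, callees in sorted(graph.items()):
    print(caller, "->", sorted(callees))
```

A compact table like this is often a better prompt input than two raw JSON dumps, because the relationships are already resolved.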

Advanced Patterns

Combining Filters

Find all async functions that use try-catch:

ast-grep --pattern 'async function $NAME($$$) { $$$ try { $$$ } catch { $$$ } $$$ }'

Language-Specific Queries

React: Find all components using useEffect with empty deps:

ast-grep --pattern 'useEffect($CALLBACK, [])'

Python: Find all functions with type hints:

ast-grep --pattern 'def $NAME($$$) -> $TYPE: $$$'

Custom Rules

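Create a project config plus one YAML file per rule for reusable patterns:

# sgconfig.yml (project root): tells ast-grep where rule files live
ruleDirs:
  - rules

# rules/no-console-log.yml
id: no-console-log
language: TypeScript
severity: warning
message: Remove console.log before production
rule:
  pattern: console.log($$$)

# rules/prefer-const.yml
id: prefer-const
language: TypeScript
severity: info
message: Use const instead of let for immutable values
rule:
  pattern: let $VAR = $VALUE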

Run all rules:

ast-grep scan
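Conceptually, a scan walks every syntax node and tests it against each rule. The Python sketch below illustrates that loop with a `print` rule standing in for `console.log`; the `Rule` class, the `scan` function, and the output format are hypothetical and do not mirror ast-grep's internals or its report format.

```python
import ast
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    id: str
    message: str
    severity: str
    test: Callable[[ast.AST], bool]  # returns True when the node violates the rule

def is_print_call(node: ast.AST) -> bool:
    """Python analogue of the pattern console.log($$$)."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "print"
    )

RULES = [Rule("no-print", "Remove print before production", "warning", is_print_call)]

def scan(source: str, rules: list[Rule]) -> list[str]:
    """Check every node against every rule and collect formatted findings."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        for rule in rules:
            if rule.test(node):
                findings.append(
                    f"{node.lineno}: {rule.severity}[{rule.id}] {rule.message}"
                )
    return findings

findings = scan("total = 1\nprint(total)\n# print(debug)\n", RULES)
print(findings)
```

The commented-out `print` on the last line is not reported, because comments never become nodes in the tree.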

Performance Comparison

Benchmark: Search for function calls in a 100k LOC TypeScript project

Tool       Time   False Positives   True Positives
grep       0.5s   45%               100%
ripgrep    0.2s   45%               100%
ast-grep   1.2s   0%                100%

Tradeoff: ast-grep is slower (parsing overhead) but eliminates false positives.

When to use each:

  • grep/ripgrep: Quick, fuzzy searches; searching strings/comments
  • ast-grep: Precise code structure queries; refactoring; AI context

Best Practices

1. Start Simple, Refine Iteratively

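# Start broad: every function call
# (a metavariable stands for a whole node, not part of a name)
ast-grep --pattern '$FUNC($$$)'

# Refine to a specific function
ast-grep --pattern 'fetchUserData($$$)'

# Refine further by adding surrounding syntax
ast-grep --pattern 'await fetchUserData($$$)'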

2. Use JSON Output for Programmatic Processing

ast-grep --pattern 'useState($$$)' --json | jq '.[] | .file'
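When jq is not enough, the same JSON can be post-processed in a script. The Python sketch below works on a hard-coded, hypothetical excerpt: the field names (`file`, `text`, `range.start.line`) and the zero-based line numbers are assumptions about the output format and should be checked against the version you have installed.

```python
import json
from collections import Counter

# Hypothetical excerpt of `ast-grep --json` output. The field names and the
# zero-based line numbers are assumptions, not captured tool output.
SAMPLE = """
[
  {"file": "src/App.tsx", "text": "useState(0)",
   "range": {"start": {"line": 11, "column": 26}}},
  {"file": "src/App.tsx", "text": "useState(null)",
   "range": {"start": {"line": 12, "column": 24}}},
  {"file": "src/Form.tsx", "text": "useState(false)",
   "range": {"start": {"line": 7, "column": 30}}}
]
"""

matches = json.loads(SAMPLE)

# Count matches per file to see where the pattern is concentrated.
per_file = Counter(m["file"] for m in matches)

# Build compact "file:line: text" lines, adding 1 to the assumed
# zero-based line numbers for display.
context = [
    f'{m["file"]}:{m["range"]["start"]["line"] + 1}: {m["text"]}'
    for m in matches
]
print(per_file)
print("\n".join(context))
```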

3. Combine with Traditional Tools

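# Let ripgrep shortlist the files in src/api/ that mention fetch,
# then match structurally within those files
rg -l 'fetch' src/api/ | xargs ast-grep --pattern 'fetch($$$)' --json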

4. Test Patterns on Small Files First

Before running on entire codebase:

ast-grep --pattern 'your-pattern' src/example.ts

5. Document Common Patterns

Create a team wiki or README with common ast-grep patterns:

## Common AST Patterns

### Find all API calls
`ast-grep --pattern 'fetch($$$)'`

### Find all useEffect calls
`ast-grep --pattern 'useEffect($$$)'`

### Find all async functions
`ast-grep --pattern 'async function $NAME($$$) { $$$ }'`

Limitations

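1. Works Best on Valid Syntax

The tree-sitter parsers behind ast-grep recover from syntax errors instead of aborting, but code near a broken region can end up inside error nodes, so patterns may silently miss matches there:

$ ast-grep --pattern 'fetchUserData($$$)' broken-file.ts
# May report fewer matches than expected, without a parse error

Workaround: Fix syntax errors first, or use text search as fallback.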
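The text-search fallback can be sketched in Python. Python's `ast.parse` raises `SyntaxError` on any broken file, which makes the fallback path easy to demonstrate; the helper `find_call_lines` and its return format are hypothetical.

```python
import ast
import re

def find_call_lines(source: str, name: str) -> tuple[str, list[int]]:
    """Try a structural search first; fall back to text search on parse errors."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Fallback: text search for "name(", which may include false positives.
        rx = re.compile(rf"\b{re.escape(name)}\s*\(")
        lines = [
            lineno
            for lineno, line in enumerate(source.splitlines(), 1)
            if rx.search(line)
        ]
        return "text", lines
    lines = sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    )
    return "ast", lines

good = find_call_lines("x = fetch_user_data(1)\n", "fetch_user_data")
broken = find_call_lines("x = fetch_user_data(1)\ndef oops(:\n", "fetch_user_data")
print(good)
print(broken)
```

Returning the search mode alongside the results lets the caller treat text-mode matches with more suspicion.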

2. Language Support

Supported via tree-sitter:

  • JavaScript/TypeScript
  • Python
  • Rust
  • Go
  • C/C++
  • Ruby
  • Java
  • And more (see the list linked below; other tree-sitter grammars can be registered as custom languages)

Check: https://github.com/ast-grep/ast-grep#supported-languages

3. Learning Curve

Pattern syntax takes time to learn:

  • Easy: fetchUserData($$$) (basic call)
  • Medium: const [$A, $B] = useState($$$) (destructuring)
  • Hard: Complex nested patterns

Tip: Start with simple patterns, gradually increase complexity.

Alternatives

Semgrep

Similar tool with more focus on security/linting:

semgrep --pattern 'fetchUserData(...)' --lang typescript .

Comparison:

  • Semgrep: Better for security rules, multi-language support
  • ast-grep: Faster, simpler syntax, better for refactoring

Comby

Structural code rewriting:

comby 'fetchUserData(:[args])' 'getUserData(:[args])' .ts

Comparison:

  • Comby: Language-agnostic (matches balanced delimiters without a full parse)
  • ast-grep: Language-aware (AST-based)

IDE Structural Search

IntelliJ IDEA and some VS Code extensions offer structural search:

Pros: Integrated into IDE, visual interface
Cons: Not scriptable, not usable in CI/CD

Measuring Success

Key metrics:

  1. False positive rate: Should be 0% for well-formed patterns
  2. Time saved: Compare manual filtering vs. ast-grep precision
  3. Refactoring confidence: Can you trust the results for automated rewrites?
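The first metric is simple arithmetic: the share of returned matches that were not what you searched for. A short Python sketch, using the counts from the opening grep example (6 matches, 1 real call), with a hypothetical helper name:

```python
def false_positive_rate(total_matches: int, true_matches: int) -> float:
    """Share of returned matches that are not what was searched for."""
    if total_matches == 0:
        return 0.0
    return (total_matches - true_matches) / total_matches

# Opening grep example: 6 matches, of which 1 is a real call.
grep_rate = false_positive_rate(6, 1)
# A structural search on the same code: 1 match, 1 real call.
ast_rate = false_positive_rate(1, 1)
print(f"grep: {grep_rate:.0%}, AST search: {ast_rate:.0%}")
```

Tracking this number on a handful of real searches in your own codebase gives a more honest picture than any generic benchmark.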

Before ast-grep:

Search time: 30 seconds
Manual filtering: 5 minutes
False positives: 20-40%
Confidence: Low (manual verification required)

After ast-grep:

Search time: 2 minutes (parsing overhead)
Manual filtering: 0 seconds
False positives: 0%
Confidence: High (safe for automated refactoring)

Conclusion

Text-based search tools are fast but imprecise. AST-based search trades a bit of speed for structural precision.

Use ast-grep when you need:

  • Zero false positives
  • Automated refactoring
  • High-quality LLM context
  • Structural code analysis

Use grep/ripgrep when you need:

  • Speed over precision
  • Fuzzy matching
  • Searching comments/docs
  • Quick exploration

The best approach: Use both. Start with grep for exploration, refine with ast-grep for precision.

Next steps:

  1. Install ast-grep: brew install ast-grep
  2. Try a simple pattern: ast-grep --pattern 'console.log($$$)'
  3. Create an sgconfig.yml and a rules directory for your team’s common patterns
  4. Integrate into CI/CD for code quality checks

By understanding code structure instead of treating it as text, AST-based search eliminates false positives and enables confident, automated refactoring, which is exactly what you need when working with AI coding agents.
