The Problem with Text-Based Search
Scenario: You need to find all places where fetchUserData() is called in your codebase.
Using traditional grep:
$ grep -r "fetchUserData" .
./src/api/users.ts: const userData = await fetchUserData(userId);
./src/api/users.ts: // TODO: fetchUserData should handle errors better
./src/api/users.ts: console.log("Calling fetchUserData");
./README.md:The `fetchUserData` function retrieves user data from the API.
./tests/mocks.ts: fetchUserData: jest.fn(),
./utils/logger.ts: logger.debug('fetchUserData called', { userId });
The problem:
- Line 2: Comment mention (not a call)
- Line 3: String literal (not a call)
- Line 4: Documentation (not code)
- Line 5: Mock definition (not a real call)
- Line 6: String in logger (not a call)
Result: Only 1 of 6 matches is the actual function call you’re looking for.
Why Text Search Fails
Text-based tools like grep, ripgrep, and IDE search treat code as plain text, not structured syntax. They can’t distinguish:
- Code vs comments: // fetchUserData() matches just like fetchUserData()
- Strings vs identifiers: "fetchUserData" matches like fetchUserData
- Function calls vs definitions: function fetchUserData() matches like fetchUserData()
- Similar names: fetchUserDataById matches a search for fetchUserData
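The four failure modes above can be reproduced in a few lines. This is a minimal sketch in Python; the sample lines and the snake_case name fetch_user_data are hypothetical stand-ins for the TypeScript examples, and text_search only imitates what grep does with a fixed string.

```python
# Hypothetical lines, written in Python style to keep the sketch self-contained.
LINES = [
    "user = fetch_user_data(user_id)",           # real call
    "# fetch_user_data() should handle errors",  # comment
    'print("Calling fetch_user_data")',          # string literal
    "def fetch_user_data(user_id): ...",         # definition
    "user = fetch_user_data_by_id(user_id)",     # similar name
]


def text_search(needle: str, lines: list[str]) -> list[str]:
    """Return every line containing the needle, the way a fixed-string grep would."""
    return [line for line in lines if needle in line]


hits = text_search("fetch_user_data", LINES)
print(f"{len(hits)} of {len(LINES)} lines match, but only 1 is a call")
```

Every line matches because a substring test has no notion of comments, strings, definitions, or identifier boundaries.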
The Cost of False Positives
Time waste:
- Manually filtering 10-50% false positives
- Re-running searches with more specific patterns
- Validating each match individually
Risk of errors:
- Overlooking genuine matches hidden among noise
- Acting on false positives (e.g., refactoring comments)
- Missing edge cases due to cognitive overload
Poor LLM context:
- AI coding agents fetch irrelevant snippets
- Context window filled with documentation instead of code
- Lower quality suggestions due to noisy input
The Solution: AST-Based Search
An AST (abstract syntax tree) is the structural representation of code that compilers and other language tools build after parsing. Instead of treating code as text, AST-based search tools parse it into a tree of syntax nodes and match against that tree.
How AST Search Works
Step 1: Code is parsed into an AST
const user = await fetchUserData(userId);
Becomes:
VariableDeclaration
├─ VariableDeclarator
│  ├─ Identifier: "user"
│  └─ AwaitExpression
│     └─ CallExpression
│        ├─ Identifier: "fetchUserData"
│        └─ Arguments
│           └─ Identifier: "userId"
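To inspect a real tree without installing anything, Python's standard ast module can dump the tree for an equivalent Python statement. This is only an illustration: the node names differ from the JavaScript-style tree above, and the identifiers are hypothetical.

```python
import ast

# Python equivalent of the statement above; the names are illustrative.
source = "user = fetch_user_data(user_id)"

tree = ast.parse(source)
# The indent= argument requires Python 3.9 or later.
print(ast.dump(tree, indent=2))

# The call is its own node type, nested under the assignment.
assign = tree.body[0]
print(type(assign).__name__, "->", type(assign.value).__name__)
```

The last line prints the two node types involved, which is the structural information a text search never sees.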
Step 2: Search queries match AST patterns
Instead of searching for the text “fetchUserData”, you search for:
CallExpression with callee.name === "fetchUserData"
This matches only function calls, ignoring comments, strings, and other non-code mentions.
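The same idea can be sketched with Python's standard ast module. This is not how ast-grep is implemented (it uses tree-sitter); it is a minimal stand-in that shows a "call expression whose callee is named X" query skipping a comment and a string. The function find_calls and the sample source are hypothetical.

```python
import ast

SOURCE = '''
def load(user_id):
    # TODO: fetch_user_data should handle errors better
    print("Calling fetch_user_data")
    return fetch_user_data(user_id)
'''


def find_calls(source: str, name: str) -> list[int]:
    """Return line numbers of calls whose callee is the bare identifier `name`."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == name
        ):
            lines.append(node.lineno)
    return sorted(lines)


print(find_calls(SOURCE, "fetch_user_data"))  # [5]
```

The comment on line 3 and the string on line 4 both contain the name, but neither is a Call node with that callee, so only line 5 is reported.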
Meet ast-grep
ast-grep is a fast, syntax-aware code search and refactoring tool built on tree-sitter parsers.
Installation:
# macOS
brew install ast-grep
# Linux/macOS (cargo)
cargo install ast-grep
# npm
npm install -g @ast-grep/cli
Basic usage:
# Find all calls to fetchUserData
ast-grep --pattern 'fetchUserData($$$)'
# Find all class definitions
ast-grep --pattern 'class $NAME { $$$ }'
# Find all React components with useState
ast-grep --pattern 'const [$STATE, $SETTER] = useState($$$)'
Pattern Syntax
Metavariables
Single metavariable ($VAR): Matches a single AST node
# Find all function calls with any name
ast-grep --pattern '$FUNC($$$)'
Ellipsis ($$$): Matches zero or more nodes
# Find all fetchUserData calls regardless of arguments
ast-grep --pattern 'fetchUserData($$$)'
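As a rough analogy for what the two metavariable kinds capture, the sketch below emulates a pattern like $FUNC($$$ARGS) with Python's ast module: one node for the callee and a list of zero or more nodes for the arguments. The helper capture_calls is hypothetical, handles positional arguments only, and needs Python 3.9+ for ast.unparse.

```python
import ast


def capture_calls(source: str) -> list[dict[str, object]]:
    """Emulate $FUNC($$$ARGS): FUNC is one node, ARGS is any number of nodes."""
    captures = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            captures.append({
                "FUNC": ast.unparse(node.func),
                # Positional arguments only; keywords are ignored in this sketch.
                "ARGS": [ast.unparse(arg) for arg in node.args],
            })
    return captures


for capture in capture_calls("add(1, 2)\nnow()\n"):
    print(capture)
```

The call now() still matches, with an empty ARGS list, which is the "zero or more" behaviour described above.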
Examples by Language
TypeScript/JavaScript
Find all async function definitions:
ast-grep --pattern 'async function $NAME($$$) { $$$ }'
Find all destructured useState calls:
ast-grep --pattern 'const [$STATE, $SETTER] = useState($$$)'
Find all try-catch blocks:
ast-grep --pattern 'try { $$$ } catch ($ERR) { $$$ }'
Find all Express route handlers:
ast-grep --pattern 'app.$METHOD($PATH, async ($REQ, $RES) => { $$$ })'
Python
Find all class definitions:
ast-grep --pattern 'class $NAME: $$$'
Find all function calls to a specific function:
ast-grep --pattern 'calculate_total($$$)'
Find all list comprehensions:
ast-grep --pattern '[$EXPR for $VAR in $ITER]'
Rust
Find all match expressions:
ast-grep --pattern 'match $EXPR { $$$ }'
Find all unwrap() calls:
ast-grep --pattern '$EXPR.unwrap()'
Practical Use Cases
Use Case 1: Refactoring Function Calls
Problem: You need to rename fetchUserData to getUserData everywhere it’s called (not in comments or docs).
Text-based approach (error-prone):
# Find all mentions
grep -r "fetchUserData" .
# Manually verify each match
# Replace only the real function calls
# Hope you didn't miss any or change comments by mistake
AST-based approach (precise):
# Find all function calls
ast-grep --pattern 'fetchUserData($$$)' --json
# Rewrite all matches; the named $$$ARGS captures the arguments so the rewrite can reuse them
ast-grep --pattern 'fetchUserData($$$ARGS)' \
  --rewrite 'getUserData($$$ARGS)' \
  --update-all
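Result: Only actual function calls are renamed. Comments, strings, and documentation are untouched. The function's own definition and any import or export statements are not call expressions, so rename those separately.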
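To make the mechanics concrete, here is a minimal Python sketch of a structure-aware rename: locate the callee identifier of each matching call in the tree, then edit only those character spans in the original text. The helper rename_calls and the sample input are hypothetical, and the sketch assumes ASCII source so that the parser's column offsets line up with string indices.

```python
import ast


def rename_calls(source: str, old: str, new: str) -> str:
    """Rename `old(...)` call sites to `new(...)`, leaving all other text as is.

    Assumes ASCII source: ast column offsets count UTF-8 bytes, which equal
    string indices only for ASCII text.
    """
    lines = source.splitlines(keepends=True)
    spans = [
        (node.func.lineno, node.func.col_offset, node.func.end_col_offset)
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == old
    ]
    # Edit from the end so earlier offsets stay valid.
    for lineno, start, end in sorted(spans, reverse=True):
        line = lines[lineno - 1]
        lines[lineno - 1] = line[:start] + new + line[end:]
    return "".join(lines)


before = (
    "# fetch_user_data is slow\n"
    'label = "fetch_user_data"\n'
    "user = fetch_user_data(fetch_user_data(1))\n"
)
print(rename_calls(before, "fetch_user_data", "get_user_data"))
```

Only the two call sites on the last line change; the comment and the string literal keep the old name.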
Use Case 2: Finding Unsafe Patterns
Find all .unwrap() calls in Rust (which can panic):
ast-grep --pattern '$EXPR.unwrap()'
Output:
./src/main.rs:15: let user = user_result.unwrap();
./src/api.rs:42: let data = response.json().unwrap();
./src/db.rs:88: let conn = pool.get().unwrap();
No false positives from:
- Comments mentioning “unwrap”
- String literals containing “unwrap”
- Function definitions named unwrap()
Use Case 3: AI Coding Agent Context
LLM prompt: “Find all places where we make API calls to fetch user data”
Bad (text search):
grep -r "fetch.*user" .
Returns 50+ matches including comments, docs, variable names, etc.
Good (AST search):
# Find all function calls starting with 'fetch' and containing 'user'
ast-grep --pattern '$FUNC($$$)' | grep -i "fetch.*user"
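Returns only lines that contain a function call, giving the LLM higher-signal context. The grep stage still filters on the full matched line, so a call whose string argument mentions “fetch user” also gets through.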
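A tighter version of this filter tests the callee name itself rather than the whole line of output. The Python sketch below shows the idea with the standard ast module; calls_matching and the sample source are hypothetical, and in ast-grep the closest equivalent would be a YAML rule that constrains the metavariable with a regex.

```python
import ast
import re


def calls_matching(source: str, name_regex: str) -> list[tuple[int, str]]:
    """Return (line, callee) for calls whose callee name matches the regex."""
    pattern = re.compile(name_regex, re.IGNORECASE)
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Bare names (f(...)) and attribute access (obj.f(...)) both count.
            if isinstance(func, ast.Name):
                callee = func.id
            else:
                callee = getattr(func, "attr", None)
            if callee and pattern.search(callee):
                found.append((node.lineno, callee))
    return sorted(found)


SOURCE = '''
log("fetch user failed")   # a call, but not a fetch
profile = fetch_user_profile(uid)
orders = api.fetch_user_orders(uid)
'''
print(calls_matching(SOURCE, r"^fetch.*user"))
```

The log call is excluded even though its argument contains "fetch user", because the regex is applied to the function name only.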
Use Case 4: Code Quality Audits
Find all console.log statements (to remove before production):
ast-grep --pattern 'console.log($$$)'
Find all TODO comments (different tool, but conceptually related):
# ast-grep focuses on code structure, not comments
# Use grep for comments, ast-grep for code
grep -r "TODO" .
Find all uses of any type in TypeScript:
ast-grep --pattern '$VAR: any'
Integration with AI Coding Workflows
Pattern 1: Precise Context Retrieval
When an LLM needs to understand how a function is used:
# Instead of:
grep -r "processPayment" .
# Use:
ast-grep --pattern 'processPayment($$$)' --json | jq
This gives the LLM only actual usages, not documentation or comments.
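Raw JSON is verbose for a prompt, so it can help to reduce each match to a single file:line: code row. The Python sketch below assumes each match object carries file, lines, and a range with a 0-based start line; those field names and the sample data are assumptions to verify against the output of your ast-grep version.

```python
import json

# Assumed shape of `ast-grep --json` output, trimmed to the fields used here.
# Field names and 0-based line numbers are assumptions; check your version.
SAMPLE = '''
[
  {"file": "src/billing.ts",
   "text": "processPayment(order)",
   "lines": "  await processPayment(order);",
   "range": {"start": {"line": 41, "column": 8}}}
]
'''


def to_context(raw_json: str) -> str:
    """Format matches as compact `file:line: code` rows for a prompt."""
    rows = []
    for match in json.loads(raw_json):
        line = match["range"]["start"]["line"] + 1  # convert to 1-based
        rows.append(f'{match["file"]}:{line}: {match["lines"].strip()}')
    return "\n".join(rows)


print(to_context(SAMPLE))
```

In practice the SAMPLE string would be replaced by the command's standard output.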
Pattern 2: Pre-Refactoring Validation
Before asking an LLM to refactor:
# Find all actual call sites
ast-grep --pattern 'oldFunctionName($$$)' > call_sites.txt
# Provide to LLM as context
LLM_CONTEXT=$(cat call_sites.txt)
llm "Refactor these call sites to use newFunctionName: $LLM_CONTEXT"
Pattern 3: Codebase Understanding
Generate a map of how functions are used:
# Find all function definitions
ast-grep --pattern 'function $NAME($$$) { $$$ }' --json > functions.json
# Find all function calls
ast-grep --pattern '$FUNC($$$)' --json > calls.json
# Analyze with LLM, passing both files as input
cat functions.json calls.json | llm "Given these functions and calls, create a dependency graph"
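For a small codebase, the definition-to-call mapping can also be computed directly instead of being delegated to an LLM. This is a minimal Python sketch with the standard ast module; call_graph and the sample functions are hypothetical, and it only tracks top-level functions and bare-name calls.

```python
import ast
from collections import defaultdict


def call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the bare function names it calls."""
    graph = defaultdict(set)
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            graph[node.name]  # register functions that call nothing
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    graph[node.name].add(inner.func.id)
    return dict(graph)


SOURCE = '''
def load(uid):
    return parse(fetch(uid))

def parse(raw):
    return raw

def fetch(uid):
    return uid
'''
for caller, callees in call_graph(SOURCE).items():
    print(caller, "->", sorted(callees))
```

The resulting dictionary is a compact adjacency list that fits in a prompt far more easily than two JSON dumps.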
Advanced Patterns
Combining Filters
Find all async functions that use try-catch:
ast-grep --pattern 'async function $NAME($$$) { $$$ try { $$$ } catch { $$$ } $$$ }'
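A single pattern like this fixes where the try block sits inside the function body. The more general query is "an async function that contains a try statement at any depth", which ast-grep can express with relational operators in its YAML rules. The Python sketch below illustrates that containment check with the standard ast module; the helper and sample functions are hypothetical.

```python
import ast


def async_funcs_with_try(source: str) -> list[str]:
    """Names of async functions that contain a try statement at any depth."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.AsyncFunctionDef):
            if any(isinstance(inner, ast.Try) for inner in ast.walk(node)):
                names.append(node.name)
    return sorted(names)


SOURCE = '''
async def guarded():
    if True:
        try:
            await work()
        except Exception:
            pass

async def bare():
    await work()
'''
print(async_funcs_with_try(SOURCE))  # ['guarded']
```

The try statement in guarded is nested inside an if block, and the descendant walk still finds it.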
Language-Specific Queries
React: Find all components using useEffect with empty deps:
ast-grep --pattern 'useEffect($$$, [])'
Python: Find all functions with type hints:
ast-grep --pattern 'def $NAME($$$) -> $TYPE: $$$'
Custom Rules
Create a rule file for reusable patterns. Each rule is its own YAML document, separated by ---:
# .ast-grep/rules.yml
id: no-console-log
language: TypeScript
rule:
  pattern: console.log($$$)
message: Remove console.log before production
severity: warning
---
id: prefer-const
language: TypeScript
rule:
  pattern: let $VAR = $VALUE
message: Use const instead of let for immutable values
severity: info
Run all rules in the file:
ast-grep scan --rule .ast-grep/rules.yml
Performance Comparison
Benchmark: Search for function calls in a 100k LOC TypeScript project
| Tool | Time | False Positives | True Positives |
|---|---|---|---|
| grep | 0.5s | 45% | 100% |
| ripgrep | 0.2s | 45% | 100% |
| ast-grep | 1.2s | 0% | 100% |
Tradeoff: ast-grep is slower (parsing overhead) but eliminates false positives.
When to use each:
- grep/ripgrep: Quick, fuzzy searches; searching strings/comments
- ast-grep: Precise code structure queries; refactoring; AI context
Best Practices
1. Start Simple, Refine Iteratively
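# Start broad: every call expression
# (a metavariable must stand for a whole node, so a prefix like fetch$$$ is not a valid pattern)
ast-grep --pattern '$FUNC($$$)'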
# Refine to specific pattern
ast-grep --pattern 'fetchUserData($$$)'
# Further refine with filters
ast-grep --pattern 'await fetchUserData($$$)'
2. Use JSON Output for Programmatic Processing
ast-grep --pattern 'useState($$$)' --json | jq '.[] | .file'
3. Combine with Traditional Tools
# Shortlist candidate files in src/api/ with ripgrep, then confirm real calls with ast-grep
rg -l 'fetch' src/api/ | xargs ast-grep --pattern 'fetch($$$)' --json
4. Test Patterns on Small Files First
Before running on entire codebase:
ast-grep --pattern 'your-pattern' src/example.ts
5. Document Common Patterns
Create a team wiki or README with common ast-grep patterns:
## Common AST Patterns
### Find all API calls
`ast-grep --pattern 'fetch($$$)'`
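### Find all React hook calls
`ast-grep --pattern '$HOOK($$$)'` (then constrain `$HOOK` with the regex `^use` in a YAML rule; `use$HOOK` is not a valid pattern because a metavariable must stand for a whole identifier)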
### Find all async functions
`ast-grep --pattern 'async function $NAME($$$) { $$$ }'`
Limitations
1. Requires Valid Syntax
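ast-grep matches reliably only on code that parses cleanly. Its tree-sitter parsers recover from syntax errors rather than aborting, so a broken file is still searched, but code near the error can end up inside error nodes and be silently skipped:
$ ast-grep --pattern 'fetchUserData($$$)' broken-file.ts
# May report fewer matches than expected, or none
Workaround: Fix syntax errors first, or use text search as a fallback.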
2. Language Support
Supported via tree-sitter:
- JavaScript/TypeScript
- Python
- Rust
- Go
- C/C++
- Ruby
- Java
- And others (see the list linked below)
Check: https://github.com/ast-grep/ast-grep#supported-languages
3. Learning Curve
Pattern syntax takes time to learn:
- Easy: fetchUserData($$$) (basic call)
- Medium: const [$A, $B] = useState($$$) (destructuring)
- Hard: Complex nested patterns
Tip: Start with simple patterns, gradually increase complexity.
Alternatives
Semgrep
Similar tool with more focus on security/linting:
semgrep --pattern 'fetchUserData(...)' --lang ts .
Comparison:
- Semgrep: Better for security rules, multi-language support
- ast-grep: Faster, simpler syntax, better for refactoring
Comby
Structural code rewriting:
comby 'fetchUserData(:[args])' 'getUserData(:[args])' .
Comparison:
- Comby: Language-agnostic (regex-like)
- ast-grep: Language-aware (AST-based)
IDE Structural Search
IntelliJ IDEA and some VS Code extensions offer structural search:
Pros: Integrated into IDE, visual interface
Cons: Not scriptable, not usable in CI/CD
Measuring Success
Key metrics:
- False positive rate: Should be 0% for well-formed patterns
- Time saved: Compare manual filtering vs. ast-grep precision
- Refactoring confidence: Can you trust the results for automated rewrites?
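The first metric is simple to compute once a sample of matches has been labeled by hand. A minimal Python sketch, using "false positive rate" in this article's sense (the share of reported matches that are not real hits); the function name is hypothetical.

```python
def false_positive_rate(matches: int, real: int) -> float:
    """Share of reported matches that are not what was searched for."""
    if matches == 0:
        return 0.0
    return (matches - real) / matches


# The grep run at the top of this article: 6 matches, 1 real call.
print(f"{false_positive_rate(6, 1):.0%}")  # 83%
```

Tracking this number for the same query before and after switching tools gives a direct comparison.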
Before ast-grep:
Search time: 30 seconds
Manual filtering: 5 minutes
False positives: 20-40%
Confidence: Low (manual verification required)
After ast-grep:
Search time: 2 minutes (parsing overhead)
Manual filtering: 0 seconds
False positives: 0%
Confidence: High (safe for automated refactoring)
Conclusion
Text-based search tools are fast but imprecise. AST-based search trades a bit of speed for precision.
Use ast-grep when you need:
- Zero false positives
- Automated refactoring
- High-quality LLM context
- Structural code analysis
Use grep/ripgrep when you need:
- Speed over precision
- Fuzzy matching
- Searching comments/docs
- Quick exploration
The best approach: Use both. Start with grep for exploration, refine with ast-grep for precision.
Next steps:
- Install ast-grep: brew install ast-grep
- Try a simple pattern: ast-grep --pattern 'console.log($$$)'
- Create a .ast-grep/rules.yml for your team’s common patterns
- Integrate into CI/CD for code quality checks
By understanding code structure instead of treating it as text, AST-based search eliminates false positives and enables confident, automated refactoring, which is exactly what you need when working with AI coding agents.
Related Concepts
- Custom ESLint Rules for AI Determinism – AST-based linting rules that enforce architectural patterns
- Playwright Script Loop – Generate validation scripts for faster feedback cycles
- Agentic Tool Detection – Detect tool availability before workflows
- Evaluation-Driven Development – Self-healing test loops with AI vision
- Test Custom Infrastructure – Avoid the house on stilts by testing tooling
- Semantic Naming for Retrieval – Name patterns for agentic retrieval
- Integration Testing Patterns – High-signal tests for LLM-generated code

