Introduction
Ever wish there was a smarter way to search for tech jobs? I built an AI agent that understands natural language queries like “Find me 5 latest Flutter jobs” and returns relevant postings from public job feeds.
This isn’t a tutorial on how to build agents from scratch. Instead, I’m walking you through how this specific project works—the decisions I made, the gotchas I hit, and real code examples you can learn from.
The agent workflow is simple but practical:
- User asks: “Show me remote backend roles”
- Agent extracts keywords: ['backend', 'remote']
- Agent calls a tool that searches cached RSS feeds
- Tool returns ranked, deduplicated jobs
- Agent formats results for the user
You’ll need Node.js 18+, an OpenAI API key, and comfort reading TypeScript. We’re not reinventing the wheel here—just building on top of some solid libraries.
What You’ll Need
Before diving in, have these ready:
- Node.js 18+ and npm
- An OpenAI API key (get one at https://platform.openai.com/)
- Basic TypeScript (don’t need to be expert—just comfortable with types)
If you’ve never used Mastra before, that’s fine. I’ll explain the key pieces as we go.
The Big Picture: How It All Fits Together
Imagine you’re building a small app with AI. You don’t just ask the AI a question and hope for the best. You:
- Give it a tool it can use (in our case: “search these job feeds”)
- Give it strict instructions (in our case: “always use the tool, don’t make stuff up”)
- Cache external data so you’re not hammering RSS feeds every second
- Validate everything with schemas so the code doesn’t break
Here’s the folder structure I’m working with:
src/mastra/
├── agents/
│   └── jobs-agent.ts        ← The agent + its instructions
├── tools/
│   └── rss-tool.ts          ← The "search jobs" tool
├── workflows/
│   └── jobs-workflow.ts     ← Compose steps that use the agent
├── utils/
│   ├── keyword-extractor.ts ← Parse "Find 5 Flutter jobs"
│   ├── feed-cache.ts        ← Cache RSS data locally
│   └── feed-scheduler.ts    ← Auto-refresh feeds every 30 min
├── scorers/
│   └── jobs-scorer.ts       ← Grade agent responses
└── index.ts                 ← Wire it all together
Don’t worry about memorizing this. You’ll see it in action.
Walking Through the Code: Step by Step
Step 1: The Agent Asks a Question
When a user types a query, it goes to the jobsAgent. Here’s what that agent looks like:
export const jobsAgent = new Agent({
  name: 'Jobs Agent',
  description: 'Fetches recent remote and tech-related job listings from public RSS feeds',
  instructions: `
    You are a jobs search assistant. ALWAYS use the fetch-jobs tool to search for jobs.
    Never make up job listings. If the tool returns 0 jobs, clearly state that no matches were found.
    Format results with: Title, Company, Location, Description, Posted Date.
  `,
  model: 'openai/gpt-4o-mini',
  tools: { rssTool },
  memory: new Memory({
    storage: new LibSQLStore({
      url: 'file:../mastra.db',
    }),
  }),
});
What’s happening here:
- We’re creating an agent that uses GPT-4o-mini (a fast, cheap OpenAI model)
- We give it strict instructions to always use the fetch-jobs tool (this prevents hallucination)
- It can access a memory store to remember past conversations
The key insight: the instructions are very explicit. We don’t say “maybe search for jobs.” We say “ALWAYS use the fetch-jobs tool.” This keeps the agent honest.
Step 2: Parse the User’s Query
Before calling the tool, we need to extract what the user actually wants. If someone says “Find me 5 latest Flutter jobs,” we extract:
- Keywords: ['flutter']
- Limit: 5 (see the sketch at the end of this step)
- Location: maybe 'remote'
This happens in keyword-extractor.ts:
export function extractKeywords(input: string): string[] {
  const stopwords = new Set([
    "find", "show", "latest", "remote", "job", "jobs", "for", "me", "the",
    // ... 40+ more common words
  ]);
  const stemmer = natural.PorterStemmer;

  const words = input
    .toLowerCase()
    .replace(/[^\w\s+.#]/g, "") // keep tech symbols like + and #
    .split(/\s+/)
    .filter(word => word.length > 2 && !stopwords.has(word))
    .map(word => stemmer.stem(word)); // "jobs" → "job", "running" → "run"

  return [...new Set(words)]; // remove duplicates
}
Breaking this down:
- We remove “noise words” like “find”, “show”, “the” (these don’t help find jobs)
- We lowercase everything and run a stemmer to reduce words to their root (so "Flutter" and "FLUTTER" both become "flutter", and "running" becomes "run")
- We return only the meaningful keywords
Example:
Input: "Find 5 latest Flutter jobs"
Output: ['flutter'] // (filtered out: find, 5, latest, jobs)
This is surprisingly effective because job titles usually contain the tech you’re searching for.
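The limit and location aren't handled by extractKeywords; those come from separate lightweight parsing. Here's a minimal sketch of the limit side (extractLimit is an illustrative name, not necessarily what the project calls it):

// Hypothetical helper: treat the first standalone number in the query as the result limit.
export function extractLimit(input: string, fallback = 10): number {
  const match = input.match(/\b(\d{1,3})\b/);
  return match ? parseInt(match[1], 10) : fallback;
}

// extractLimit("Find 5 latest Flutter jobs")   → 5
// extractLimit("Show me remote backend roles") → 10 (the fallback)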
Step 3: The Tool Does the Heavy Lifting
The rssTool is where the magic happens. It’s a Mastra tool, which means it has:
- An input schema (what data it accepts)
- An output schema (what it returns)
- An execute function (what it actually does)
export const rssTool = createTool({
  id: 'fetch-jobs',
  inputSchema: z.object({
    query: z.string().describe('e.g., "Flutter developer"'),
    limit: z.number().default(10),
  }),
  outputSchema: z.object({
    jobs: z.array(jobListingSchema),
    total: z.number(),
    query: z.string(),
  }),
  execute: async ({ context }) => {
    const { query, limit } = context;
    const keywords = extractKeywords(query);

    // Load all cached jobs from RSS feeds
    let allJobs = [];
    for (const feedUrl of rssFeeds) {
      const feedJobs = await fetchFeedWithCache(feedUrl);
      allJobs = allJobs.concat(feedJobs);
    }

    // Remove duplicates (same job posted to multiple feeds)
    const uniqueJobs = deduplicateJobs(allJobs);

    // Filter: only keep jobs that match keywords
    const matchedJobs = uniqueJobs.filter(job => {
      const title = job.title.toLowerCase();
      const desc = job.description.toLowerCase();
      return keywords.some(kw => title.includes(kw) || desc.includes(kw));
    });

    // Sort by relevance (title matches beat description matches), then by date
    const relevance = (job: { title: string; description: string }) =>
      keywords.some(kw => job.title.toLowerCase().includes(kw)) ? 2
      : keywords.some(kw => job.description.toLowerCase().includes(kw)) ? 1
      : 0;
    const sorted = matchedJobs
      .sort((a, b) =>
        (relevance(b) - relevance(a)) ||
        ((Date.parse(b.pubDate ?? '') || 0) - (Date.parse(a.pubDate ?? '') || 0))
      )
      .slice(0, limit);

    return {
      jobs: sorted,
      total: sorted.length,
      query,
    };
  },
});
What this does:
- Takes your keywords and limit
- Loads all cached job listings (from .cache/jobs, refreshed every 30 minutes)
- Removes duplicates (a job might be posted to multiple feeds; see the sketch just after this list)
- Filters to only jobs matching your keywords
- Ranks by relevance (title matches are more important than description matches)
- Returns the top N results
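The deduplicateJobs helper isn't shown above. A minimal sketch of one reasonable implementation, keyed on the job link (the real project may deduplicate differently):

// Sketch: treat the job link as the identity; keep the first occurrence of each.
function deduplicateJobs(jobs: CachedJob[]): CachedJob[] {
  const seen = new Set<string>();
  return jobs.filter(job => {
    const key = job.link || job.title; // fall back to the title if the link is empty
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}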
Notice: we don’t fetch live RSS here. We use a cache. This is important because:
- RSS feeds can be slow (5-10 second timeouts)
- We’d hit rate limits if we fetched on every query
- Users expect fast responses
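One more note: rssFeeds, used in the loop above, is just an array of feed URLs (the Next Steps section points at src/mastra/data/rss-feeds.ts). Something like this, with placeholder URLs standing in for the real feeds:

// src/mastra/data/rss-feeds.ts (illustrative; swap in real public job feeds)
export const rssFeeds: string[] = [
  'https://example.com/remote-jobs.rss',
  'https://example.com/tech-jobs.rss',
];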
Step 4: Feed Caching
The cache lives in .cache/jobs and stores raw job listings for ~4 hours. Here’s how it works:
export async function fetchFeedWithCache(feedUrl: string): Promise<CachedJob[]> {
  // Try cache first
  const cached = await loadFromCache(feedUrl);
  if (cached !== null) {
    return cached; // Cache hit! Return instantly
  }

  // Cache miss: fetch live, then save
  try {
    const response = await axios.get(feedUrl, { timeout: 8000 });
    const feed = await parseFeed(response.data);
    const jobs = feed.items.map(item => ({
      title: item.title || 'No title',
      link: item.url || '',
      description: item.description?.substring(0, 500) || 'No description',
      pubDate: item.published?.toISOString(),
      source: feedUrl,
    }));

    // Save to cache for next time
    await saveToCache(feedUrl, jobs);
    return jobs;
  } catch (error) {
    console.error(`Failed to fetch ${feedUrl}:`, (error as Error).message);
    return []; // Return empty list, don't crash
  }
}
Why this matters:
- First request for a feed? We fetch and parse the RSS (takes 2-3 seconds)
- Follow-up requests within 4 hours? We serve from cache (instant)
- After 4 hours? We refresh from live feeds again
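loadFromCache and saveToCache aren't shown above either. Here's a minimal file-based sketch with the 4-hour TTL, assuming one JSON file per feed named by a hash of its URL (the project's actual storage layout may differ):

import { createHash } from 'node:crypto';
import { mkdir, readFile, writeFile } from 'node:fs/promises';
import path from 'node:path';

const CACHE_DIR = '.cache/jobs';
const TTL_MS = 4 * 60 * 60 * 1000; // 4 hours

function cachePath(feedUrl: string): string {
  const key = createHash('sha256').update(feedUrl).digest('hex');
  return path.join(CACHE_DIR, `${key}.json`);
}

export async function loadFromCache(feedUrl: string): Promise<CachedJob[] | null> {
  try {
    const raw = await readFile(cachePath(feedUrl), 'utf8');
    const { savedAt, jobs } = JSON.parse(raw);
    if (Date.now() - savedAt > TTL_MS) return null; // expired → treat as a miss
    return jobs;
  } catch {
    return null; // missing or unreadable file → cache miss
  }
}

export async function saveToCache(feedUrl: string, jobs: CachedJob[]): Promise<void> {
  await mkdir(CACHE_DIR, { recursive: true });
  await writeFile(cachePath(feedUrl), JSON.stringify({ savedAt: Date.now(), jobs }));
}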
Step 5: Auto-Refresh with a Scheduler
We don’t want stale data. So every 30 minutes, we automatically refresh all feeds:
export function startFeedScheduler(intervalMinutes: number = 30): void {
  if (isSchedulerActive) return;

  const cronExpression = `*/${intervalMinutes} * * * *`; // */30 = every 30 min
  cron.schedule(cronExpression, async () => {
    const result = await refreshAllFeeds();
    console.log(`Refreshed: ${result.refreshed} feeds, Failed: ${result.failed}`);
  });

  isSchedulerActive = true;
}
How it works:
- Uses node-cron to schedule a background job
- Every 30 minutes, it re-fetches the feeds and updates the cache, so entries never sit longer than the 4-hour TTL
- If a feed fails, we catch the error and continue (don’t crash the app)
- This happens in the background—users don’t wait for it
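refreshAllFeeds isn't shown above. A sketch of what it could look like, assuming a clearCache(feedUrl) helper that evicts a feed's cache entry (both the helper and its name are my assumption):

// Sketch: force-refresh every feed and tally successes/failures.
// clearCache is a hypothetical helper that drops a feed's cache entry.
export async function refreshAllFeeds(): Promise<{ refreshed: number; failed: number }> {
  let refreshed = 0;
  let failed = 0;
  for (const feedUrl of rssFeeds) {
    try {
      await clearCache(feedUrl);         // drop the stale entry
      await fetchFeedWithCache(feedUrl); // re-fetch and re-cache
      refreshed++;
    } catch {
      failed++; // keep going; one bad feed shouldn't stop the rest
    }
  }
  return { refreshed, failed };
}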
Step 6: Validation with Zod
Every tool and workflow uses Zod schemas. This is important because TypeScript types disappear at runtime, but Zod validates at runtime:
const jobSchema = z.object({
  title: z.string(),
  link: z.string(),
  description: z.string(),
  pubDate: z.string().optional(),
  source: z.string(),
});

const jobSearchResult = z.object({
  jobs: z.array(jobSchema),
  total: z.number(),
  query: z.string(),
});
Why this matters:
- If something returns bad data, Zod will catch it
- The agent can’t accidentally pass malformed data downstream
- You get helpful error messages instead of mysterious crashes later
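To make that concrete, here's what catching bad data looks like with safeParse (standard Zod; the malformed payload is invented for illustration):

const result = jobSearchResult.safeParse({
  jobs: [{ title: 'Flutter Dev', link: 'https://example.com/job/1', description: 'Build apps' }],
  total: '1', // wrong type: should be a number
  query: 'flutter',
});

if (!result.success) {
  console.error(result.error.issues);
  // → one issue for the missing `source` on jobs[0],
  //   one for `total` being a string instead of a number
}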
Actually Running This Thing
Let’s get it working locally first:
npm install
npm run dev
This starts a dev server on http://localhost:4111/ with a web playground. Go there, select jobsAgent from the dropdown, and try typing:
Find 5 latest Flutter jobs
You’ll see the agent:
- Extract “flutter” as the keyword
- Search the cached jobs
- Return formatted results with links
If you want to test the API directly:
curl -X POST http://localhost:4111/api/agents/jobsAgent/stream \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Show me remote backend roles"}
    ]
  }'
For production, build it:
npm run build
This creates .mastra/output with all the bundled code, and tells you how to start it:
node --import=./.mastra/output/instrumentation.mjs .mastra/output/index.mjs
Environment Setup
Create a .env file in the project root:
OPENAI_API_KEY=sk-your-key-here
That’s it. Everything else has sensible defaults.
How to Extend This Project
Add a New Tool
Say you want a tool that looks up company data. Here’s the pattern:
// src/mastra/tools/company-lookup.ts
import { createTool } from '@mastra/core/tools';
import { z } from 'zod';
export const companyLookupTool = createTool({
  id: 'lookup-company',
  description: 'Get details about a company',
  inputSchema: z.object({
    companyName: z.string(),
  }),
  outputSchema: z.object({
    name: z.string(),
    foundingYear: z.number(),
    description: z.string(),
  }),
  execute: async ({ context }) => {
    const { companyName } = context;
    // Call your API or database
    return {
      name: companyName,
      foundingYear: 2020,
      description: 'A cool company',
    };
  },
});
Then register it in src/mastra/index.ts:
import { companyLookupTool } from './tools/company-lookup.js';
// In the mastra config:
agents: {
  jobsAgent: new Agent({
    // ...
    tools: { rssTool, companyLookupTool }, // ← Add here
  }),
},
Add a New Agent
Same idea—create the agent, give it strict instructions, register it:
// src/mastra/agents/recruiter-agent.ts
export const recruiterAgent = new Agent({
  name: 'Recruiter Agent',
  instructions: `You help recruiters find candidates. Use the lookup-company and fetch-jobs tools.`,
  model: 'openai/gpt-4o-mini',
  tools: { companyLookupTool, rssTool },
  // ... memory, scorers, etc.
});
Then in index.ts:
agents: {
  jobsAgent,
  recruiterAgent, // ← Add here
},
Add Observability
The scorers are already set up to grade agent responses. But you can log what happened:
const response = await jobsAgent.generate("Find Flutter jobs");

console.log(`Agent response:`, response.text);
console.log(`Tool calls:`, response.toolResults?.length);
console.log(`Artifacts:`, response.artifacts);
What I Learned
1. Explicit instructions matter more than model size. A cheap model with crystal-clear instructions (“ALWAYS use this tool, NEVER make up jobs”) beats a fancy model with vague instructions every time.
2. Cache external data aggressively. RSS feeds are slow and unreliable. Caching with a 4-hour TTL means users get instant responses after the first request.
3. Zod schemas reduce integration bugs. Catching bad data at the tool boundary prevents cascading failures downstream.
4. Schedulers are better than webhooks. Periodic refresh via cron is simpler than trying to monitor feed changes.
5. Testing is hard with LLMs. You can’t easily unit test an agent’s text generation. Use scorers and manual testing with curl instead.
Next Steps
Now that you understand how this works, try:
- Add more feeds to src/mastra/data/rss-feeds.ts
- Improve keyword extraction using embeddings or NER instead of keyword matching (sketch below)
- Add company metadata by looking up each job’s company in a database
- Deploy to production using the build output
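For the embeddings idea, here's a sketch of that direction using OpenAI's embeddings endpoint (the model choice and scoring are my own; the project currently does none of this):

import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Score a job title against the query via cosine similarity of embeddings.
async function semanticScore(query: string, jobTitle: string): Promise<number> {
  const { data } = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: [query, jobTitle],
  });
  const [q, t] = data.map(d => d.embedding);
  const dot = q.reduce((sum, v, i) => sum + v * t[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(q) * norm(t));
}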
References
- Mastra docs: https://docs.mastra.ai
- @rowanmanning/feed-parser: https://www.npmjs.com/package/@rowanmanning/feed-parser
- natural (PorterStemmer): https://www.npmjs.com/package/natural
- Zod validation: https://zod.dev
- node-cron: https://www.npmjs.com/package/node-cron