Building an AI-Powered Lead Generation System — Architecture and Hard Lessons
How I built LeadSysGen — an automated prospect discovery, qualification, and outreach system — the technical architecture, what the LLMs are actually good at, and where they fail.
LeadSysGen automates the top of the sales funnel: find prospects, qualify them, draft outreach. I built this to solve a problem I saw repeatedly with freelancing clients — the grunt work of lead generation eats hours that should be spent closing. Here's the full technical breakdown.
The Problem With Manual Lead Generation
The math is brutal: a sales rep researching prospects by hand spends 2-3 hours building a list of 20 qualified leads. Of those, 3-5 will respond to outreach, and 1-2 will convert to calls. Conversion doesn't scale with effort — it scales with volume and targeting precision.
LeadSysGen attacks the first bottleneck: the research phase. Finding prospects, verifying they fit the ideal customer profile (ICP), enriching their data, and drafting personalized outreach — all automated.
Architecture Overview
Data Sources (LinkedIn, Clearbit, Apollo, Web)
        │
        ▼
Prospect Discovery Service (Python/FastAPI)
        │
        ▼
Enrichment Pipeline (async workers)
        │
        ├── Company enrichment (size, industry, tech stack, funding)
        ├── Contact enrichment (role, seniority, email verification)
        └── ICP scoring (LLM-based qualification)
        │
        ▼
Qualification Engine
        │
        ▼
Outreach Generator (LLM)
        │
        ▼
Campaign Manager (scheduling, tracking, A/B testing)
        │
        ▼
Analytics Dashboard (Next.js)
Prospect Discovery
The discovery layer aggregates from multiple sources and deduplicates:
class ProspectDiscoveryService:
    def __init__(self):
        self.sources = [
            LinkedInScraper(),
            ApolloAPIClient(api_key=settings.APOLLO_KEY),
            ClearbitEnrichment(api_key=settings.CLEARBIT_KEY),
        ]
        self.dedup_store = RedisBloomFilter("prospects:seen")

    async def discover(self, icp: ICPCriteria, limit: int = 100) -> list[RawProspect]:
        tasks = [source.search(icp, limit) for source in self.sources]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        prospects = []
        for result in results:
            if isinstance(result, Exception):
                logger.error("Source failed", exc_info=result)
                continue
            for p in result:
                # Bloom filter deduplication — O(1), probabilistic
                if not self.dedup_store.contains(p.email):
                    self.dedup_store.add(p.email)
                    prospects.append(p)
        return prospects[:limit]

The bloom filter prevents the same prospect from entering the pipeline twice across runs — cheap and fast, with an acceptable false-positive rate.
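The production system backs the seen-set with Redis so it survives restarts; the `RedisBloomFilter` wrapper above is internal. As a minimal, in-memory sketch of the same idea (names and parameters here are illustrative, not the real implementation):

```python
import hashlib


class BloomFilter:
    """Probabilistic set: contains() may rarely return a false positive,
    never a false negative. Constant memory, O(k) per operation."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 7):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # k independent hash positions derived from salted SHA-256
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter()
bf.add("jane@acme.io")
```

The trade-off is the one mentioned above: a false positive silently drops a genuinely new prospect, which is acceptable at this false-positive rate because the upstream sources overlap heavily anyway.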
LLM-Based ICP Qualification
Rule-based qualification ("company size > 50, industry = SaaS, location = France") misses nuance. Depending on the ICP, a 15-person fintech startup with fresh Series A funding can be a better prospect than a 200-person retail company.
I built an LLM qualification step that reasons over the full company profile:
ICP_QUALIFICATION_PROMPT = """
You are evaluating whether a company is a good fit for the following ideal customer profile (ICP):
ICP DEFINITION:
{icp_description}
COMPANY PROFILE:
Name: {company_name}
Industry: {industry}
Size: {employee_count} employees
Location: {location}
Tech stack: {tech_stack}
Recent funding: {funding}
Description: {description}
Evaluate this company against the ICP. Return JSON:
{{
"score": <0-100 integer, 100 = perfect fit>,
"tier": <"A" | "B" | "C" | "disqualified">,
"reasoning": "<2-3 sentences explaining the score>",
"flags": ["<any red flags>"],
"hooks": ["<specific talking points for outreach>"]
}}
Be rigorous. A score above 70 means high confidence this is a good prospect.
"""
async def qualify_prospect(
    company: EnrichedCompany,
    icp: ICPDefinition,
) -> QualificationResult:
    prompt = ICP_QUALIFICATION_PROMPT.format(
        icp_description=icp.description,
        **company.to_dict()
    )
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0.1,  # Low temperature for consistent scoring
    )
    return QualificationResult.model_validate_json(
        response.choices[0].message.content
    )

temperature=0.1 is critical — you want deterministic qualification, not creative variance.
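The real code parses the reply with a Pydantic model (`model_validate_json`). A stdlib-only sketch of the same contract makes explicit what the prompt's JSON must satisfy — field names mirror the prompt, the validation thresholds come from the prompt's own rules:

```python
import json
from dataclasses import dataclass, field

VALID_TIERS = {"A", "B", "C", "disqualified"}


@dataclass
class QualificationResult:
    score: int
    tier: str
    reasoning: str
    flags: list = field(default_factory=list)
    hooks: list = field(default_factory=list)

    @classmethod
    def from_json(cls, raw: str) -> "QualificationResult":
        """Parse and validate the LLM's JSON reply; raise on out-of-contract values."""
        result = cls(**json.loads(raw))
        if not 0 <= result.score <= 100:
            raise ValueError(f"score out of range: {result.score}")
        if result.tier not in VALID_TIERS:
            raise ValueError(f"unknown tier: {result.tier!r}")
        return result
```

Even with `response_format={"type": "json_object"}` guaranteeing syntactically valid JSON, the *values* still need checking — JSON mode says nothing about a score of 120 or a tier of "S".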
Personalized Outreach Generation
The most impactful feature. Generic outreach ("I noticed your company does X") has 2-3% response rates. Personalized outreach ("I saw your CTO's post about migrating to a microservices architecture and noticed you're hiring three backend engineers") gets 10-15%.
OUTREACH_PROMPT = """
Write a cold email from {sender_name} ({sender_role}) to {prospect_name} ({prospect_title}) at {company_name}.
CONTEXT ABOUT THE PROSPECT:
- Recent hooks: {hooks}
- Company situation: {company_context}
- Their likely challenges: {challenges}
SENDER'S VALUE PROPOSITION:
{value_prop}
REQUIREMENTS:
- Subject line: max 8 words, no emojis, no all-caps
- Body: 4-6 sentences maximum
- First sentence: must reference something specific to them (not generic)
- CTA: one specific, low-friction ask
- Tone: peer-to-peer, not salesy
- No phrases like: "I hope this finds you well", "I came across your profile"
Return JSON: {{"subject": "...", "body": "..."}}
"""The "no phrases like" instruction is important — LLMs default to the most common patterns in their training data, which are the most overused phrases in cold email.
Tracking and Analytics
Every email gets a unique tracking pixel and click-tracking link. The analytics show:
- Open rate by subject line variant (A/B testing)
- Reply rate by persona type and company tier
- Best time to send by industry
- Which ICP segments convert best
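The pixel itself is just a 1x1 GIF whose URL carries a per-email token; signing the token means the webhook can trust the emailId it decodes. A hedged sketch with stdlib HMAC — the secret, domain, and URL shape are illustrative, not the system's actual values:

```python
import hashlib
import hmac

SECRET = b"replace-with-a-real-secret"  # illustrative placeholder


def tracking_pixel_url(email_id: str, base: str = "https://track.example.com") -> str:
    """Build the open-tracking pixel URL embedded in one outgoing email."""
    sig = hmac.new(SECRET, email_id.encode(), hashlib.sha256).hexdigest()[:16]
    return f"{base}/open/{email_id}.gif?sig={sig}"


def verify_token(email_id: str, sig: str) -> bool:
    """Webhook side: drop events whose signature doesn't match the emailId."""
    expected = hmac.new(SECRET, email_id.encode(), hashlib.sha256).hexdigest()[:16]
    return hmac.compare_digest(expected, sig)
```

Without the signature, anyone who guesses the URL scheme can inflate a campaign's open counts; `hmac.compare_digest` avoids leaking the expected value through timing.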
import { Prisma } from "@prisma/client"

// Tracking webhook — receives open/click events
export async function POST(request: Request) {
  const event = (await request.json()) as EmailEvent

  await db.emailEvent.create({
    data: {
      emailId: event.emailId,
      type: event.type, // "open" | "click" | "reply"
      timestamp: new Date(event.timestamp),
      metadata: event.metadata,
    },
  })

  // Update campaign stats. Note: Prisma parameterizes every ${} inside a
  // $executeRaw tagged template, so interpolating an *identifier* like
  // `${event.type}_count` directly would produce invalid SQL — validate the
  // event type against an allowlist, then splice the column with Prisma.raw.
  if (!["open", "click", "reply"].includes(event.type)) {
    return new Response("unknown event type", { status: 400 })
  }
  const column = Prisma.raw(`${event.type}_count`)
  await db.$executeRaw`
    UPDATE campaign_stats
    SET ${column} = ${column} + 1,
        last_event_at = NOW()
    WHERE campaign_id = (
      SELECT campaign_id FROM emails WHERE id = ${event.emailId}
    )
  `
  return new Response(null, { status: 204 })
}

What LLMs Are Bad At
After building this system, I'm clear on where LLMs fail in production:
- Consistency at scale — the same company scored on the same ICP gives slightly different scores across runs, even at low temperature. I cache qualification results and only re-score when enrichment data changes.
- Detecting stale information — if a company was acquired last month, the LLM doesn't know. Always verify time-sensitive facts before sending.
- Cultural nuance — outreach that works for US prospects reads differently to French or Moroccan recipients. I maintain locale-specific prompt variants.
- Hallucinating "hooks" — without grounding in real data (LinkedIn posts, news, job listings), the LLM invents plausible-sounding hooks that are wrong. Always ground the context in real retrieved data.
Results
In the first two months running LeadSysGen against my own freelance pipeline:
- 340 qualified prospects discovered across target industries
- 12% email reply rate (vs 3% industry average for cold outreach)
- 4 new client conversations opened
The system doesn't replace sales judgment — it removes the manual research phase so you can spend time on what actually matters: the conversation.