Learn how to build an end-to-end agentic AI resume parsing system using LangGraph and GPT-4o-mini to extract structured, searchable data with HITL (Human in the Loop) step.
Figure: Resume Transformer Pipeline (LangGraph). Image by Alpha Iterations.
Introduction
Every recruiter knows the pain: hundreds of resumes in different formats (PDF, DOCX, TXT), inconsistent layouts, varying structures, and no easy way to search through them. Modern Applicant Tracking Systems automatically parse resumes, but parsing errors and inconsistencies are common. As a result, HR teams still spend significant time validating and correcting extracted data.
What if we could reliably transform any resume into clean, structured, searchable data?
That's exactly what we're going to build in this article: an end-to-end, agent-based workflow that combines deterministic processing with selective LLM reasoning to handle real-world resumes at scale.
Problem Statement
HR departments and recruitment teams spend 20–30 hours per week manually processing resumes. It is a tedious, error-prone task: recruiters must visually scan PDFs and documents to extract candidate information, leading to inconsistent data entry, human errors, and inefficient candidate matching.
Proposed Solution
A multi-agent AI system built with LangGraph that transforms unstructured resume data into structured, queryable information through four specialized agents working in a coordinated workflow.
Tech stack
- Agentic Framework: LangGraph
- LLM Provider: OpenAI gpt-4o-mini
- PDF Processing: PyPDF2
- OCR: pytesseract
- Database: SQLite3
What This Project Does
This is an intelligent resume processing pipeline built with LangGraph and OpenAI's GPT-4o-mini that:
1. **Parses** resumes from multiple formats (PDF, DOCX, TXT), including image-based PDFs via OCR
2. **Extracts** structured data using AI (contact info, experience, education, skills)
3. **Validates & Enriches** the data (calculates years of experience, standardizes skills)
4. **HITL** (Human in the Loop) allows a human reviewer to take the final decision
5. Stores everything in a queryable SQLite database
The result? A fully automated Applicant Tracking System (ATS) that can process resumes at scale.
parse --> extract --> validate_enrich --> human_review --> store --> END
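The five steps above boil down to a simple contract: each step receives a shared state dict, updates it, and returns it. Here is a minimal plain-Python sketch of that flow (the step bodies are illustrative stand-ins, not the real agents defined later):

```python
# Illustrative linear pipeline: each step mutates the shared state and passes it on.
# This is the same state-threading contract the LangGraph nodes in this article follow.
def parse(state):           state["messages"].append("parsed"); return state
def extract(state):         state["messages"].append("extracted"); return state
def validate_enrich(state): state["messages"].append("validated"); return state
def human_review(state):    state["messages"].append("reviewed"); return state
def store(state):           state["messages"].append("stored"); return state

state = {"messages": []}
for step in (parse, extract, validate_enrich, human_review, store):
    state = step(state)
print(state["messages"])
# → ['parsed', 'extracted', 'validated', 'reviewed', 'stored']
```

LangGraph replaces this hardcoded loop with a graph of named nodes and edges, which makes it easy to later add branching (for example, routing rejected resumes back for re-extraction).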
Letβs get started.
Please note: for simplicity and readability, this article uses the term agent for both the Python-based workflow functions and the AI-powered extraction component. Only the extraction step uses a true AI agent backed by GPT-4o-mini; the other agents are deterministic Python functions orchestrated within the workflow.
Project Setup
The complete end-to-end notebook can be accessed on my GitHub repo.
Folder Structure
Below is the folder structure for the project:
```
resume-transformer-agent/
├── README.md                       # This file - comprehensive project documentation
├── requirements.txt                # Python dependencies
├── .env                            # Environment variables (API keys) - not in repo
├── .gitignore                      # Git ignore rules
│
├── resume-transformer-agent.ipynb  # Main Jupyter notebook with complete workflow
│
├── data/                           # Sample resume files for testing
│   ├── *.pdf                       # PDF resume samples
│   ├── *.docx                      # DOCX resume samples
│   └── *.txt                       # Text resume samples
│
└── resume_ats.db                   # SQLite database (auto-generated)
    ├── candidates table            # Main candidate information
    ├── experience table            # Work history entries
    ├── education table             # Educational background
    ├── skills table                # Normalized skills
    └── certifications table        # Professional certifications
```
Requirements File
In the requirements.txt file, add the libraries below so that they can be installed.
```
# requirements.txt
# Resume Transformer Agent Dependencies

# LangGraph for workflow orchestration
langgraph>=0.0.20

# OpenAI API for LLM-powered extraction
openai>=1.0.0

# Document parsing libraries
PyPDF2>=3.0.0
python-docx>=1.0.0

# OCR capabilities (optional, for image-based PDFs)
pytesseract>=0.3.10
pillow>=10.0.0
pdf2image>=1.16.0

# Loading the .env file (used by the notebook's load_dotenv call)
python-dotenv>=1.0.0
```
Create the virtual environment
In your terminal create a virtual environment called .venv:
For Mac (Terminal):
python -m venv .venv
For Windows (Command Prompt):
python -m venv .venv
Activate the virtual environment
For Mac
source .venv/bin/activate
For Windows:
.venv\Scripts\activate
Install the dependencies
In the terminal:
pip install -r requirements.txt
Create .env File
Please add a .env file in the repo and set your OpenAI API key.
```
# .env file
OPENAI_API_KEY=your_api_key_here
```
Add resume data:
We download some resumes from Kaggle Source 1 and Kaggle Source 2 for varying layouts and categories.
The images below show the variety of the resumes:
Resume Samples. Source: Kaggle
Now we are good with the project setup. Let's get started with the notebook:
Importing the required Libraries
```python
import os
import json
import re
import sqlite3
from datetime import datetime
from typing import TypedDict, Optional, List, Dict, Any
from pathlib import Path

# Document parsing
import PyPDF2
import docx

# LangGraph
from langgraph.graph import StateGraph, END

# OpenAI
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# Access the variables
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Set up OpenAI client
# Make sure to set your OPENAI_API_KEY environment variable
client = OpenAI(api_key=OPENAI_API_KEY)

print("✓ All libraries imported successfully!")
```
Define State Schema
We now define the state schema to store the variable values in the workflow.
```python
class ResumeState(TypedDict):
    """State schema for the resume transformation workflow"""
    # Input
    file_path: str
    file_type: str
    # Step 1: Parse
    raw_text: Optional[str]
    parse_error: Optional[str]
    # Step 2: Extract
    structured_data: Optional[Dict[str, Any]]
    extract_error: Optional[str]
    # Step 3: Validate & Enrich
    validated_data: Optional[Dict[str, Any]]
    validation_error: Optional[str]
    # Step 3.5: Human Review (HITL)
    human_approved: Optional[bool]
    human_review_notes: Optional[str]
    # Step 4: Store
    database_id: Optional[int]
    store_error: Optional[str]
    # Metadata
    status: str
    messages: List[str]

print("✓ State schema defined!")
```
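For illustration, here is what an initial state for a single resume might look like before any agent has run (the file path and values are hypothetical, not from the repo):

```python
# Hypothetical initial ResumeState: only the inputs and metadata are set;
# every per-step field starts as None and gets filled in as the graph runs.
initial_state = {
    "file_path": "data/sample_resume.pdf",   # made-up path for illustration
    "file_type": "pdf",
    "raw_text": None, "parse_error": None,
    "structured_data": None, "extract_error": None,
    "validated_data": None, "validation_error": None,
    "human_approved": None, "human_review_notes": None,
    "database_id": None, "store_error": None,
    "status": "pending",
    "messages": [],
}
print(initial_state["status"])
```

Each agent reads the fields produced by the previous step and writes its own, so a failure in any step is visible downstream via the corresponding `*_error` field.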
Parsing Agent
We use the PyPDF2 library to extract text from text-based PDFs, and the pytesseract library to extract text from scanned PDFs.
```python
def parse_pdf(file_path: str) -> str:
    """Extract text from PDF file, with OCR fallback for image-based PDFs"""
    text = ""
    try:
        # First, try regular text extraction
        with open(file_path, 'rb') as file:
            pdf_reader = PyPDF2.PdfReader(file)
            for page in pdf_reader.pages:
                page_text = page.extract_text()
                if page_text:
                    text += page_text + "\n"

        # If no text was extracted, the PDF is likely image-based, try OCR
        if not text.strip():
            print("   ⚠️ No text found in PDF - attempting OCR extraction...")
            try:
                from pdf2image import convert_from_path
                import pytesseract

                # Convert PDF pages to images
                images = convert_from_path(file_path)

                # Perform OCR on each page
                for i, image in enumerate(images):
                    page_text = pytesseract.image_to_string(image)
                    text += page_text + "\n"
                    print(f"   ✓ OCR extracted {len(page_text)} characters from page {i+1}")
            except ImportError:
                raise Exception(
                    "PDF appears to be image-based but OCR libraries not available. "
                    "Install with: pip install pytesseract pdf2image pillow"
                )
            except Exception as e:
                raise Exception(f"OCR extraction failed: {str(e)}")
    except Exception as e:
        raise Exception(f"Error parsing PDF: {str(e)}")
    return text.strip()


def parse_docx(file_path: str) -> str:
    """Extract text from DOCX file"""
    try:
        doc = docx.Document(file_path)
        text = "\n".join([paragraph.text for paragraph in doc.paragraphs])
    except Exception as e:
        raise Exception(f"Error parsing DOCX: {str(e)}")
    return text.strip()


def parse_txt(file_path: str) -> str:
    """Extract text from TXT file"""
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            text = file.read()
    except Exception as e:
        raise Exception(f"Error parsing TXT: {str(e)}")
    return text.strip()


def parse_agent(state: ResumeState) -> ResumeState:
    """
    Agent 1: Parse - Extract raw text from resume file
    """
    print(f"PARSE AGENT: Processing {state['file_path']}")
    try:
        file_type = state['file_type'].lower()
        if file_type == 'pdf':
            raw_text = parse_pdf(state['file_path'])
        elif file_type == 'docx':
            raw_text = parse_docx(state['file_path'])
        elif file_type == 'txt':
            raw_text = parse_txt(state['file_path'])
        else:
            raise ValueError(f"Unsupported file type: {file_type}")

        state['raw_text'] = raw_text
        state['parse_error'] = None
        state['messages'].append(f"✓ Successfully parsed {len(raw_text)} characters from {file_type.upper()}")
        print(f"   ✓ Extracted {len(raw_text)} characters")
    except Exception as e:
        state['parse_error'] = str(e)
        state['messages'].append(f"✗ Parse error: {str(e)}")
        print(f"   ✗ Error: {str(e)}")
    return state

print("✓ Parse agent defined!")
```
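A quick way to sanity-check the TXT path: write a tiny resume to a temporary file and run it through a parse step shaped like the one above. `parse_txt` is re-declared here so the snippet runs standalone, and the resume content is invented:

```python
# Standalone sanity check of the TXT parsing path (resume content is made up).
import os
import tempfile

def parse_txt(file_path: str) -> str:
    with open(file_path, 'r', encoding='utf-8') as f:
        return f.read().strip()

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("Jane Doe\nData Engineer\nSkills: Python, SQL\n")
    path = f.name

# Mimic what parse_agent does for the 'txt' branch
state = {"file_path": path, "file_type": "txt", "messages": []}
state["raw_text"] = parse_txt(state["file_path"])
state["messages"].append(f"Parsed {len(state['raw_text'])} characters")

print(state["raw_text"].splitlines()[0])  # → Jane Doe
os.remove(path)
```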
Extract Agent
This is an LLM-based step which takes the parsed text from the resume and extracts the relevant entities: contact (name, email, phone, location, LinkedIn, GitHub), summary, experience, education, skills, certifications, and languages.
We use gpt-4o-mini here.
```python
def extract_agent(state: ResumeState) -> ResumeState:
    """
    Agent 2: Extract - Use LLM to extract structured data from raw text
    """
    print("EXTRACT AGENT: Analyzing resume with gpt-4o-mini")

    if state.get('parse_error'):
        state['extract_error'] = "Cannot extract: parsing failed"
        return state

    try:
        # Prepare the prompt for GPT-4o-mini
        system_prompt = """You are an expert resume parsing engine.

Your task is to extract structured, normalized data from raw resume text and return it as VALID JSON ONLY.

Follow these rules strictly:
- Output MUST be a single JSON object and nothing else (no markdown, no comments).
- If a field is missing or not found, use null or an empty array (never invent data).
- Normalize capitalization (Title Case for names, companies, titles).
- Deduplicate repeated skills, certifications, and languages.
- Do not infer dates or employers unless explicitly stated.
- Keep descriptions concise and factual.

Extract the following schema exactly:
{
  "contact": {
    "name": string | null,
    "email": string | null,
    "phone": string | null,
    "location": string | null,
    "linkedin": string | null,
    "github": string | null
  },
  "summary": string | null,
  "experience": [
    {
      "company": string | null,
      "title": string | null,
      "start_date": string | null,
      "end_date": string | null,
      "description": string | null,
      "responsibilities": [string]
    }
  ],
  "education": [
    {
      "institution": string | null,
      "degree": string | null,
      "field": string | null,
      "graduation_year": string | null
    }
  ],
  "skills": [string],
  "certifications": [string],
  "languages": [string]
}

Date formatting rules:
- Use "YYYY-MM" when month is available.
- Use "YYYY" if only the year is available.
- Use "Present" for ongoing roles.

Responsibilities should be concise bullet-style statements (max 1 sentence each)."""

        user_prompt = f"Resume text:\n\n{state['raw_text']}"

        # Call OpenAI API
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            temperature=0.1,
            response_format={"type": "json_object"}
        )

        # Parse the response
        extracted_text = response.choices[0].message.content
        structured_data = json.loads(extracted_text)

        state['structured_data'] = structured_data
        state['extract_error'] = None
        state['messages'].append("✓ Successfully extracted structured data using GPT-4o-mini")
        print(f"   ✓ Extracted data for: {structured_data.get('contact', {}).get('name', 'Unknown')}")
    except Exception as e:
        state['extract_error'] = str(e)
        state['messages'].append(f"✗ Extract error: {str(e)}")
        print(f"   ✗ Error: {str(e)}")

    return state

print("✓ Extract agent defined!")
```
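To make the target concrete, here is a hand-written example of the JSON shape the system prompt asks gpt-4o-mini to return. All values are invented for illustration, not real model output:

```python
# Example of the expected extraction output (values are fabricated samples).
import json

example = '''{
  "contact": {"name": "Jane Doe", "email": "jane@example.com", "phone": null,
              "location": "Pune, India", "linkedin": null, "github": null},
  "summary": "Data engineer with cloud experience.",
  "experience": [{"company": "Acme Corp", "title": "Data Engineer",
                  "start_date": "2021-03", "end_date": "Present",
                  "description": "Built ETL pipelines.",
                  "responsibilities": ["Maintained Airflow DAGs."]}],
  "education": [{"institution": "Pune University", "degree": "B.E.",
                 "field": "Computer Science", "graduation_year": "2020"}],
  "skills": ["Python", "SQL"],
  "certifications": [],
  "languages": ["English"]
}'''

structured_data = json.loads(example)
print(structured_data["contact"]["name"])            # → Jane Doe
print(structured_data["experience"][0]["end_date"])  # → Present
```

Because the prompt forbids invented data, missing fields come back as `null` (here, phone and linkedin), which the downstream validation step handles gracefully.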
Validation and Enrichment Agent
Cleans data, calculates total years of experience, and standardizes skills. We have used basic enrichment functions; these can be enhanced for more sophisticated enrichment.
```python
# Standard skills taxonomy for normalization
STANDARD_SKILLS = {
    # Programming Languages
    'python': 'Python', 'java': 'Java', 'javascript': 'JavaScript', 'js': 'JavaScript',
    'typescript': 'TypeScript', 'c++': 'C++', 'c#': 'C#', 'ruby': 'Ruby',
    'go': 'Go', 'rust': 'Rust', 'swift': 'Swift', 'kotlin': 'Kotlin', 'php': 'PHP',
    # Frameworks
    'react': 'React', 'reactjs': 'React', 'angular': 'Angular', 'vue': 'Vue.js',
    'django': 'Django', 'flask': 'Flask', 'spring': 'Spring', 'nodejs': 'Node.js',
    'express': 'Express.js', 'fastapi': 'FastAPI',
    # Databases
    'sql': 'SQL', 'mysql': 'MySQL', 'postgresql': 'PostgreSQL', 'mongodb': 'MongoDB',
    'redis': 'Redis', 'oracle': 'Oracle', 'dynamodb': 'DynamoDB',
    # Cloud & DevOps
    'aws': 'AWS', 'azure': 'Azure', 'gcp': 'Google Cloud', 'docker': 'Docker',
    'kubernetes': 'Kubernetes', 'k8s': 'Kubernetes', 'terraform': 'Terraform',
    'jenkins': 'Jenkins', 'ci/cd': 'CI/CD', 'git': 'Git',
    # Data Science & ML
    'machine learning': 'Machine Learning', 'ml': 'Machine Learning',
    'deep learning': 'Deep Learning', 'tensorflow': 'TensorFlow', 'pytorch': 'PyTorch',
    'pandas': 'Pandas', 'numpy': 'NumPy', 'scikit-learn': 'Scikit-learn', 'nlp': 'NLP',
    # Other
    'agile': 'Agile', 'scrum': 'Scrum', 'rest api': 'REST API',
    'graphql': 'GraphQL', 'microservices': 'Microservices'
}


def parse_date(date_str: str) -> Optional[datetime]:
    """Parse various date formats"""
    if not date_str or date_str.lower() in ['present', 'current', 'now']:
        return datetime.now()
    # Try different formats
    formats = ['%Y-%m', '%Y', '%m/%Y', '%B %Y', '%b %Y']
    for fmt in formats:
        try:
            return datetime.strptime(str(date_str).strip(), fmt)
        except ValueError:
            continue
    return None


def calculate_experience(experience_list: List[Dict]) -> float:
    """Calculate total years of experience"""
    total_months = 0
    for job in experience_list:
        start = parse_date(job.get('start_date', ''))
        end = parse_date(job.get('end_date', ''))
        if start and end:
            months = (end.year - start.year) * 12 + (end.month - start.month)
            total_months += max(0, months)
    return round(total_months / 12, 1)


def standardize_skills(skills: List[str]) -> List[str]:
    """Normalize skills to standard taxonomy"""
    standardized = set()
    for skill in skills:
        skill_lower = skill.lower().strip()
        # Check if skill matches standard taxonomy
        if skill_lower in STANDARD_SKILLS:
            standardized.add(STANDARD_SKILLS[skill_lower])
        else:
            # Keep original if not in taxonomy
            standardized.add(skill.strip())
    return sorted(list(standardized))


def clean_contact_info(contact: Dict) -> Dict:
    """Clean and validate contact information"""
    cleaned = {}
    # Clean email
    if contact.get('email'):
        email = contact['email'].strip().lower()
        if '@' in email:
            cleaned['email'] = email
    # Clean phone
    if contact.get('phone'):
        phone = re.sub(r'[^\d+\-() ]', '', contact['phone'])
        cleaned['phone'] = phone
    # Copy other fields
    for field in ['name', 'location', 'linkedin', 'github']:
        if contact.get(field):
            cleaned[field] = contact[field].strip()
    return cleaned


def validate_and_enrich_agent(state: ResumeState) -> ResumeState:
    """
    Agent 3: Validate & Enrich - Clean data, calculate experience, standardize skills
    """
    print("VALIDATE & ENRICH AGENT: Processing data")

    if state.get('extract_error'):
        state['validation_error'] = "Cannot validate: extraction failed"
        return state

    try:
        data = state['structured_data'].copy()

        # Clean contact info
        if data.get('contact'):
            data['contact'] = clean_contact_info(data['contact'])

        # Calculate total experience
        if data.get('experience'):
            total_experience = calculate_experience(data['experience'])
            data['total_years_experience'] = total_experience
            print(f"   ✓ Calculated {total_experience} years of experience")

        # Standardize skills
        if data.get('skills'):
            original_count = len(data['skills'])
            data['skills'] = standardize_skills(data['skills'])
            print(f"   ✓ Standardized {original_count} skills to {len(data['skills'])} unique skills")

        # Add metadata
        data['processed_date'] = datetime.now().isoformat()
        data['data_version'] = '1.0'

        state['validated_data'] = data
        state['validation_error'] = None
        state['messages'].append("✓ Successfully validated and enriched data")
    except Exception as e:
        state['validation_error'] = str(e)
        state['messages'].append(f"✗ Validation error: {str(e)}")
        print(f"   ✗ Error: {str(e)}")

    return state

print("✓ Validate & Enrich agent defined!")
```
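As a quick sanity check of the experience math, here is a compact re-declaration of `parse_date` and `calculate_experience` run on two invented stints: 18 months plus 12 months should give 2.5 years.

```python
# Standalone check of the enrichment math (job dates are invented examples).
from datetime import datetime

def parse_date(s):
    if not s or s.lower() in ('present', 'current', 'now'):
        return datetime.now()
    for fmt in ('%Y-%m', '%Y'):
        try:
            return datetime.strptime(s.strip(), fmt)
        except ValueError:
            continue
    return None

def calculate_experience(jobs):
    total_months = 0
    for job in jobs:
        start = parse_date(job.get('start_date', ''))
        end = parse_date(job.get('end_date', ''))
        if start and end:
            total_months += max(0, (end.year - start.year) * 12 + (end.month - start.month))
    return round(total_months / 12, 1)

jobs = [
    {'start_date': '2018-01', 'end_date': '2019-07'},  # 18 months
    {'start_date': '2020-01', 'end_date': '2021-01'},  # 12 months
]
print(calculate_experience(jobs))  # → 2.5
```

Note that this simple month-difference approach does not deduplicate overlapping roles, which is one obvious place to add more sophisticated enrichment.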
Human in the Loop (HITL)
It is always essential to have a HITL step in the flow. Here the HR person is provided with a UI to validate/edit the data. This step is called visual grounding.
We use a simple HTML page that shows the resume image on the left side and the extracted fields on the right side, allowing the user to validate and edit the extracted data if required.
```python
# HTML-based Human Review Interface
import webbrowser
import base64
from io import BytesIO
from pathlib import Path
import tempfile


def get_resume_image_base64(file_path: str, file_type: str) -> str:
    """Convert resume to base64 image for embedding in HTML"""
    try:
        if file_type == 'pdf':
            from pdf2image import convert_from_path
            images = convert_from_path(file_path, first_page=1, last_page=1, dpi=200)
            resume_image = images[0]
        elif file_type in ['jpg', 'jpeg', 'png']:
            from PIL import Image
            resume_image = Image.open(file_path)
        else:
            return None
        # Convert to base64
        buffered = BytesIO()
        resume_image.save(buffered, format="PNG")
        img_str = base64.b64encode(buffered.getvalue()).decode()
        return f"data:image/png;base64,{img_str}"
    except Exception as e:
        print(f"⚠️ Could not generate visual preview: {str(e)}")
        return None


def create_review_html_file(file_path: str, file_type: str, validated_data: Dict,
                            output_path: str, result_file: str) -> str:
    """Create HTML interface for human review"""
    # Get resume image
    img_data = get_resume_image_base64(file_path, file_type)

    # Extract data
    contact = validated_data.get('contact', {})
    skills = validated_data.get('skills', [])
    certifications = validated_data.get('certifications', [])
    education = validated_data.get('education', [])
    experience = validated_data.get('experience', [])

    # Build education HTML (editable)
    education_html = ""
    for idx, edu in enumerate(education):
        education_html += f"""
        <div style="margin-bottom: 15px; padding: 15px; background: #f8f9fa; border-radius: 5px; border: 1px solid #dee2e6;">
            <div style="font-weight: 600; color: #1976d2; margin-bottom: 8px;">Education #{idx + 1}</div>
            <input type="text" class="edit-input" id="edu_{idx}_degree" value="{edu.get('degree', '')}" placeholder="Degree">
            <input type="text" class="edit-input" id="edu_{idx}_field" value="{edu.get('field', '')}" placeholder="Field of Study">
            <input type="text" class="edit-input" id="edu_{idx}_institution" value="{edu.get('institution', '')}" placeholder="Institution">
            <input type="text" class="edit-input" id="edu_{idx}_year" value="{edu.get('graduation_year', '')}" placeholder="Graduation Year">
        </div>
        """

    # Build experience HTML (editable)
    experience_html = ""
    for idx, exp in enumerate(experience):
        experience_html += f"""
        <div style="margin-bottom: 15px; padding: 15px; background: #f8f9fa; border-radius: 5px; border: 1px solid #dee2e6;">
            <div style="font-weight: 600; color: #1976d2; margin-bottom: 8px;">Experience #{idx + 1}</div>
            <input type="text" class="edit-input" id="exp_{idx}_title" value="{exp.get('title', '')}" placeholder="Job Title">
            <input type="text" class="edit-input" id="exp_{idx}_company" value="{exp.get('company', '')}" placeholder="Company">
            <div style="display: flex; gap: 10px;">
                <input type="text" class="edit-input" style="flex: 1;" id="exp_{idx}_start" value="{exp.get('start_date', '')}" placeholder="Start Date (YYYY-MM)">
                <input type="text" class="edit-input" style="flex: 1;" id="exp_{idx}_end" value="{exp.get('end_date', '')}" placeholder="End Date (YYYY-MM or Present)">
            </div>
            <textarea class="edit-input" id="exp_{idx}_description" rows="3" placeholder="Description">{exp.get('description', '')}</textarea>
        </div>
        """

    # Build skills textarea (editable - one per line)
    skills_text = "\n".join(skills)
    # Build certifications textarea (editable - one per line)
    certifications_text = "\n".join(certifications)

    # Image section
    image_section = ""
    if img_data:
        image_section = f"""
        <div style="flex: 1; background: #f5f5f5; padding: 20px; overflow-y: auto; max-height: 80vh; border-right: 2px solid #e0e0e0;">
            <div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 15px;">
                <h2 style="margin: 0; color: #1976d2;">📄 Resume Document</h2>
                <div style="display: flex; gap: 8px;">
                    <button onclick="zoomIn()" style="padding: 8px 12px; border: 1px solid #1976d2; background: white; color: #1976d2; border-radius: 4px; cursor: pointer; font-weight: 600; transition: all 0.2s;" onmouseover="this.style.background='#1976d2'; this.style.color='white';" onmouseout="this.style.background='white'; this.style.color='#1976d2';">🔍+ Zoom In</button>
                    <button onclick="zoomOut()" style="padding: 8px 12px; border: 1px solid #1976d2; background: white; color: #1976d2; border-radius: 4px; cursor: pointer; font-weight: 600; transition: all 0.2s;" onmouseover="this.style.background='#1976d2'; this.style.color='white';" onmouseout="this.style.background='white'; this.style.color='#1976d2';">🔍- Zoom Out</button>
                    <button onclick="fitToWindow()" style="padding: 8px 12px; border: 1px solid #1976d2; background: white; color: #1976d2; border-radius: 4px; cursor: pointer; font-weight: 600; transition: all 0.2s;" onmouseover="this.style.background='#1976d2'; this.style.color='white';" onmouseout="this.style.background='white'; this.style.color='#1976d2';">Fit to Window</button>
                </div>
            </div>
            <div id="resume-image-container" style="text-align: center; overflow: auto;">
                <img id="resume-image" src="{img_data}" style="width: 100%; border-radius: 8px; box-shadow: 0 4px 6px rgba(0,0,0,0.1); transition: transform 0.2s;">
            </div>
        </div>
        """
    else:
        image_section = """
        <div style="flex: 1; background: #f5f5f5; padding: 20px; display: flex; align-items: center; justify-content: center; border-right: 2px solid #e0e0e0;">
            <div style="text-align: center; color: #666;">
                <div style="font-size: 48px; margin-bottom: 10px;">📄</div>
                <div>Resume preview not available</div>
            </div>
        </div>
        """

    html = f"""
    <div style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;">
    <style>
        .edit-input {{
            width: 100%; padding: 10px; margin: 8px 0;
            border: 1px solid #ddd; border-radius: 5px;
            font-size: 14px; box-sizing: border-box;
        }}
        .edit-input:focus {{
            outline: none; border-color: #1976d2;
            box-shadow: 0 0 0 2px rgba(25,118,210,0.1);
        }}
        .field-label {{
            font-weight: 600; color: #333; margin-bottom: 4px;
            font-size: 13px; text-transform: uppercase; letter-spacing: 0.5px;
        }}
        .section-title {{
            font-size: 18px; font-weight: 700; color: #1976d2;
            margin: 25px 0 15px 0; padding-bottom: 8px;
            border-bottom: 2px solid #e3f2fd;
        }}
        .btn {{
            padding: 12px 24px; border: none; border-radius: 6px;
            font-size: 15px; font-weight: 600; cursor: pointer;
            transition: all 0.2s; margin: 10px 10px 0 0;
        }}
        .btn-approve {{ background: #4caf50; color: white; }}
        .btn-approve:hover {{ background: #45a049; transform: translateY(-1px); box-shadow: 0 4px 8px rgba(0,0,0,0.2); }}
        .btn-reject {{ background: #f44336; color: white; }}
        .btn-reject:hover {{ background: #da190b; transform: translateY(-1px); box-shadow: 0 4px 8px rgba(0,0,0,0.2); }}
    </style>

    <div style="background: #1976d2; color: white; padding: 20px; border-radius: 8px 8px 0 0; margin-bottom: 0;">
        <h1 style="margin: 0; font-size: 24px;">Human Review - Visual Grounding</h1>
        <p style="margin: 8px 0 0 0; opacity: 0.9;">Review extracted data against the original resume and make corrections if needed</p>
    </div>

    <div style="display: flex; border: 2px solid #e0e0e0; border-top: none; border-radius: 0 0 8px 8px; background: white;">
        {image_section}
        <div style="flex: 1; padding: 30px; overflow-y: auto; max-height: 80vh;">
            <div class="section-title">Contact Information (Editable)</div>
            <div class="field-label">Full Name</div>
            <input type="text" class="edit-input" id="name_input" value="{contact.get('name', '')}" placeholder="Full Name">
            <div class="field-label">Email Address</div>
            <input type="email" class="edit-input" id="email_input" value="{contact.get('email', '')}" placeholder="email@example.com">
            <div class="field-label">Phone Number</div>
            <input type="tel" class="edit-input" id="phone_input" value="{contact.get('phone', '')}" placeholder="+1 (555) 123-4567">
            <div class="field-label">Location</div>
            <input type="text" class="edit-input" id="location_input" value="{contact.get('location', '')}" placeholder="City, State">
            <div class="field-label">LinkedIn Profile</div>
            <input type="url" class="edit-input" id="linkedin_input" value="{contact.get('linkedin', '')}" placeholder="linkedin.com/in/username">
            <div class="field-label">GitHub Profile</div>
            <input type="url" class="edit-input" id="github_input" value="{contact.get('github', '')}" placeholder="github.com/username">

            <div class="section-title">Professional Summary</div>
            <div style="background: #f8f9fa; padding: 15px; border-radius: 5px; margin-bottom: 15px;">
                <div style="font-size: 16px;"><strong>Total Experience:</strong> {validated_data.get('total_years_experience', 0)} years</div>
            </div>

            <div class="section-title">Skills ({len(skills)}) - Editable</div>
            <div class="field-label">Skills (one per line)</div>
            <textarea class="edit-input" id="skills_input" rows="6" placeholder="Enter skills, one per line">{skills_text}</textarea>

            <div class="section-title">Certifications ({len(certifications)}) - Editable</div>
            <div class="field-label">Certifications (one per line)</div>
            <textarea class="edit-input" id="certifications_input" rows="4" placeholder="Enter certifications, one per line">{certifications_text}</textarea>

            <div class="section-title">Education ({len(education)}) - Editable</div>
            {education_html}

            <div class="section-title">Experience ({len(experience)}) - Editable</div>
            {experience_html}

            <div class="section-title">Review Notes</div>
            <textarea class="edit-input" id="notes_input" rows="4" placeholder="Add any review notes or corrections needed..." style="resize: vertical; min-height: 80px;"></textarea>

            <div style="margin-top: 30px; padding-top: 20px; border-top: 2px solid #e0e0e0;">
                <button class="btn btn-approve" onclick="approveData()">✓ Approve & Continue</button>
                <button class="btn btn-reject" onclick="rejectData()">✗ Reject & Skip</button>
                <div id="status_message" style="margin-top: 15px; padding: 12px; border-radius: 5px; display: none;"></div>
            </div>
        </div>
    </div>

    <script>
        window.reviewState = {{ approved: null, data: {{}} }};

        // Zoom functionality
        let currentZoom = 100;
        const zoomStep = 20;
        const minZoom = 50;
        const maxZoom = 300;

        function zoomIn() {{
            if (currentZoom < maxZoom) {{ currentZoom += zoomStep; applyZoom(); }}
        }}
        function zoomOut() {{
            if (currentZoom > minZoom) {{ currentZoom -= zoomStep; applyZoom(); }}
        }}
        function fitToWindow() {{ currentZoom = 100; applyZoom(); }}
        function applyZoom() {{
            const img = document.getElementById('resume-image');
            if (img) {{ img.style.width = currentZoom + '%'; }}
        }}

        function approveData() {{
            window.reviewState.approved = true;
            // Collect contact data
            window.reviewState.data = {{
                contact: {{
                    name: document.getElementById('name_input').value,
                    email: document.getElementById('email_input').value,
                    phone: document.getElementById('phone_input').value,
                    location: document.getElementById('location_input').value,
                    linkedin: document.getElementById('linkedin_input').value,
                    github: document.getElementById('github_input').value
                }},
                skills: document.getElementById('skills_input').value.split('\\n').filter(s => s.trim() !== ''),
                certifications: document.getElementById('certifications_input').value.split('\\n').filter(s => s.trim() !== ''),
                education: [],
                experience: [],
                notes: document.getElementById('notes_input').value
            }};

            // Collect education data
            let eduIdx = 0;
            while (document.getElementById('edu_' + eduIdx + '_degree')) {{
                window.reviewState.data.education.push({{
                    degree: document.getElementById('edu_' + eduIdx + '_degree').value,
                    field: document.getElementById('edu_' + eduIdx + '_field').value,
                    institution: document.getElementById('edu_' + eduIdx + '_institution').value,
                    graduation_year: document.getElementById('edu_' + eduIdx + '_year').value
                }});
                eduIdx++;
            }}

            // Collect experience data
            let expIdx = 0;
            while (document.getElementById('exp_' + expIdx + '_title')) {{
                window.reviewState.data.experience.push({{
                    title: document.getElementById('exp_' + expIdx + '_title').value,
                    company: document.getElementById('exp_' + expIdx + '_company').value,
                    start_date: document.getElementById('exp_' + expIdx + '_start').value,
                    end_date: document.getElementById('exp_' + expIdx + '_end').value,
                    description: document.getElementById('exp_' + expIdx + '_description').value
                }});
                expIdx++;
            }}

            var msg = document.getElementById('status_message');
            msg.style.display = 'block';
            msg.style.background = '#d4edda';
            msg.style.color = '#155724';
            msg.style.border = '1px solid #c3e6cb';
            msg.innerHTML = '✓ <strong>Data Approved!</strong> Please click Continue below the interface.';
        }}

        function rejectData() {{
            window.reviewState.approved = false;
            window.reviewState.data = {{ notes: document.getElementById('notes_input').value }};
            var msg = document.getElementById('status_message');
            msg.style.display = 'block';
            msg.style.background = '#f8d7da';
            msg.style.color = '#721c24';
            msg.style.border = '1px solid #f5c6cb';
            msg.innerHTML = '✗ <strong>Data Rejected.</strong> This record will NOT be stored.<br><br>Click the button below to save your decision.';
        }}

        function saveDecision() {{
            // Save decision to a JSON file that Python can read
            var data = JSON.stringify(window.reviewState);
            var blob = new Blob([data], {{type: 'application/json'}});
            var url = URL.createObjectURL(blob);
            var a = document.createElement('a');
            a.href = url;
            a.download = '{result_file}';
            a.click();
            // Also write to localStorage as backup
            localStorage.setItem('reviewState', data);
            // Update message
            var statusMsg = document.getElementById('status_message');
            if (statusMsg.style.display !== 'none') {{
                statusMsg.innerHTML += '<br><br><strong>File Downloaded!</strong> Return to the notebook and press Enter to continue.';
            }}
        }}
    </script>

    <div style="text-align: center; padding: 20px; background: #f8f9fa; border-top: 2px solid #e0e0e0;">
        <button class="btn" style="background: #2196f3; color: white; font-size: 16px;" onclick="saveDecision()">Save Decision & Continue</button>
        <p style="margin-top: 10px; color: #666; font-size: 13px;">After clicking above, return to your notebook and press Enter</p>
    </div>
    </div>
    """

    # Write HTML to file
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(html)
    return output_path


def human_review_agent(state: ResumeState) -> ResumeState:
    """
    Agent 3.5: Human Review - Display resume and allow corrections (HITL)
    """
    print("HUMAN REVIEW AGENT: Visual grounding and verification")

    if state.get('validation_error'):
        print("   ⚠️ Skipping review due to validation error")
        state['human_approved'] = False
        return state

    print("\n" + "="*80)
    print("VISUAL GROUNDING: Review extracted data against original resume")
    print("="*80 + "\n")

    # Create temporary files for HTML and result
    temp_dir = Path(tempfile.gettempdir()) / "resume_review"
    temp_dir.mkdir(exist_ok=True)
    html_file = temp_dir / "review.html"
    result_file = temp_dir / "review_result.json"

    # Clean up old result file if it exists
    if result_file.exists():
        result_file.unlink()

    # Create HTML file
    print("Generating review interface...")
    create_review_html_file(
        state['file_path'],
        state['file_type'],
        state['validated_data'],
        str(html_file),
        result_file.name
    )

    # Open in browser
    print(f"Opening review interface in browser: {html_file}")
    webbrowser.open(f"file://{html_file}")

    # Wait for user to complete review
    print("\nWaiting for your decision in the browser...")
    print("After clicking 'Approve' or 'Reject', save the downloaded file and press Enter here: ")
    input()

    # Try to read result from file
    import time
    max_attempts = 3
    attempt = 0
    approval_result = None

    # Fallback to manual input with validation
    print("\n" + "="*60)
    decision = ""
    while decision not in ['approve', 'reject']:
        decision = input("Enter your decision (approve/reject): ").lower().strip()
        if decision not in ['approve', 'reject']:
            print("⚠️ Invalid input. Please type 'approve' or 'reject' (without quotes)")

    if decision == 'approve':
        state['human_approved'] = True
        # Ask if user wants to input edited data
        has_edits = input("Did you make any edits? (yes/no): ").lower().strip()
        if has_edits == 'yes':
            print("\nPlease paste the edited data JSON (or press Enter to skip):")
            try:
                import json
                json_input = input()
                if json_input.strip():
                    edited_data = json.loads(json_input)
                    # Merge edited contact data
                    if 'contact' in edited_data:
                        state['validated_data']['contact'].update(edited_data['contact'])
                    # Update skills
                    if 'skills' in edited_data:
                        state['validated_data']['skills'] = edited_data['skills']
                    # Update certifications
                    if 'certifications' in edited_data:
                        state['validated_data']['certifications'] = edited_data['certifications']
                    # Update education
                    if 'education' in edited_data:
                        state['validated_data']['education'] = edited_data['education']
                    # Update experience
                    if 'experience' in edited_data:
                        state['validated_data']['experience'] = edited_data['experience']
                    print("   ✓ Edits applied to validated data")
            except Exception as e:
                print(f"   ⚠️ Could not parse edits: {e}")
        state['human_review_notes'] = input("Review notes (optional): ")
        state['messages'].append("✓ Human review: APPROVED (manual)")
        print("\n   ✓ Human review completed - APPROVED")
    else:
        state['human_approved'] = False
        state['human_review_notes'] = input("Rejection reason: ")
        state['messages'].append("✗ Human review: REJECTED")
        print("\n   ✗ Human review completed - REJECTED")

    return state
```
HTML page design for the Human-in-the-Loop review step
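When the reviewer chooses to apply edits in the manual fallback, the agent expects a pasted JSON snippet whose keys mirror the validated data structure. A minimal sketch of that merge behavior, with hypothetical candidate values:

```python
import json

# A cut-down validated_data record, as produced by the validation agent
# (field names follow the article's schema; values are made up)
validated_data = {
    "contact": {"name": "Jane Doe", "email": "jane@example.com", "phone": None},
    "skills": ["Python", "SQL"],
    "certifications": [],
}

# The reviewer corrects the phone number and adds a missing skill
pasted = '{"contact": {"phone": "+1-555-0100"}, "skills": ["Python", "SQL", "LangGraph"]}'
edited_data = json.loads(pasted)

# Same merge rules as human_review_agent: contact is updated key-by-key,
# list sections are replaced wholesale
if "contact" in edited_data:
    validated_data["contact"].update(edited_data["contact"])
for section in ("skills", "certifications"):
    if section in edited_data:
        validated_data[section] = edited_data[section]

print(validated_data["contact"])  # name and email survive, phone is corrected
print(validated_data["skills"])   # ['Python', 'SQL', 'LangGraph']
```

Note the asymmetry: merging `contact` key-by-key means the reviewer only needs to paste the fields they changed, while list sections must be pasted in full because they replace the originals.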
Store Agent
This step formats the data into SQL and inserts it into the database using sqlite3. There are five tables: candidates, experience, education, skills, and certifications.
def initialize_database(db_path: str = "resume_ats.db"):
    """Initialize the ATS database with required tables"""
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Create candidates table
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS candidates (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL,
            email TEXT,
            phone TEXT,
            location TEXT,
            linkedin TEXT,
            github TEXT,
            summary TEXT,
            total_years_experience REAL,
            processed_date TEXT,
            data_version TEXT,
            human_approved BOOLEAN,
            human_review_notes TEXT,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    ''')

    # Add missing columns to existing table (for backward compatibility)
    try:
        cursor.execute("ALTER TABLE candidates ADD COLUMN human_approved BOOLEAN")
        print("   ✅ Added human_approved column to existing database")
    except sqlite3.OperationalError:
        pass  # Column already exists

    try:
        cursor.execute("ALTER TABLE candidates ADD COLUMN human_review_notes TEXT")
        print("   ✅ Added human_review_notes column to existing database")
    except sqlite3.OperationalError:
        pass  # Column already exists

    # Create experience table
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS experience (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            candidate_id INTEGER,
            company TEXT,
            title TEXT,
            start_date TEXT,
            end_date TEXT,
            description TEXT,
            FOREIGN KEY (candidate_id) REFERENCES candidates (id)
        )
    ''')

    # Create education table
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS education (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            candidate_id INTEGER,
            institution TEXT,
            degree TEXT,
            field TEXT,
            graduation_year TEXT,
            FOREIGN KEY (candidate_id) REFERENCES candidates (id)
        )
    ''')

    # Create skills table
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS skills (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            candidate_id INTEGER,
            skill TEXT,
            FOREIGN KEY (candidate_id) REFERENCES candidates (id)
        )
    ''')

    # Create certifications table
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS certifications (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            candidate_id INTEGER,
            certification TEXT,
            FOREIGN KEY (candidate_id) REFERENCES candidates (id)
        )
    ''')

    conn.commit()
    conn.close()


def store_agent(state: ResumeState, db_path: str = "resume_ats.db") -> ResumeState:
    """
    Agent 4: Store - Insert structured data into database
    """
    print("💾 STORE AGENT: Saving to database")

    if state.get('validation_error'):
        state['store_error'] = "Cannot store: validation failed"
        return state

    # Check human approval status (False means explicitly rejected;
    # None means the review step was skipped)
    if state.get('human_approved') is False:
        state['store_error'] = "Cannot store: rejected by human reviewer"
        state['status'] = 'rejected'
        print("   ⚠️ Skipping storage - rejected by human reviewer")
        return state

    try:
        # Initialize database
        initialize_database(db_path)
        conn = sqlite3.connect(db_path)
        cursor = conn.cursor()

        data = state['validated_data']
        contact = data.get('contact', {})

        # Insert candidate with human review data
        cursor.execute('''
            INSERT INTO candidates
            (name, email, phone, location, linkedin, github, summary,
             total_years_experience, processed_date, data_version,
             human_approved, human_review_notes)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        ''', (
            contact.get('name'),
            contact.get('email'),
            contact.get('phone'),
            contact.get('location'),
            contact.get('linkedin'),
            contact.get('github'),
            data.get('summary'),
            data.get('total_years_experience'),
            data.get('processed_date'),
            data.get('data_version'),
            state.get('human_approved'),
            state.get('human_review_notes')
        ))
        candidate_id = cursor.lastrowid

        # Insert experience
        for exp in data.get('experience', []):
            cursor.execute('''
                INSERT INTO experience (candidate_id, company, title, start_date, end_date, description)
                VALUES (?, ?, ?, ?, ?, ?)
            ''', (
                candidate_id,
                exp.get('company'),
                exp.get('title'),
                exp.get('start_date'),
                exp.get('end_date'),
                exp.get('description')
            ))

        # Insert education
        for edu in data.get('education', []):
            cursor.execute('''
                INSERT INTO education (candidate_id, institution, degree, field, graduation_year)
                VALUES (?, ?, ?, ?, ?)
            ''', (
                candidate_id,
                edu.get('institution'),
                edu.get('degree'),
                edu.get('field'),
                edu.get('graduation_year')
            ))

        # Insert skills
        for skill in data.get('skills', []):
            cursor.execute('''
                INSERT INTO skills (candidate_id, skill)
                VALUES (?, ?)
            ''', (candidate_id, skill))

        # Insert certifications
        for cert in data.get('certifications', []):
            cursor.execute('''
                INSERT INTO certifications (candidate_id, certification)
                VALUES (?, ?)
            ''', (candidate_id, cert))

        conn.commit()
        conn.close()

        state['database_id'] = candidate_id
        state['store_error'] = None
        state['status'] = 'completed'
        state['messages'].append(f"✅ Successfully stored candidate with ID {candidate_id}")
        print(f"   ✅ Stored candidate with ID: {candidate_id}")

    except Exception as e:
        state['store_error'] = str(e)
        state['status'] = 'failed'
        state['messages'].append(f"❌ Store error: {str(e)}")
        print(f"   ❌ Error: {str(e)}")

    return state


print("✅ Store agent defined!")
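Once records land in these tables, the payoff of the normalized schema is that candidate search becomes a plain SQL join. A quick sanity check, run here against an in-memory database with a cut-down version of the schema above and one made-up candidate:

```python
import sqlite3

# In-memory database with a reduced version of the article's schema
# (only the columns needed for this query; the candidate is hypothetical)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE candidates (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        total_years_experience REAL
    )
""")
cur.execute("""
    CREATE TABLE skills (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        candidate_id INTEGER,
        skill TEXT,
        FOREIGN KEY (candidate_id) REFERENCES candidates (id)
    )
""")
cur.execute("INSERT INTO candidates (name, total_years_experience) VALUES (?, ?)",
            ("Jane Doe", 6.5))
cand_id = cur.lastrowid
for s in ("Python", "SQL", "LangGraph"):
    cur.execute("INSERT INTO skills (candidate_id, skill) VALUES (?, ?)", (cand_id, s))
conn.commit()

# Find candidates with a given skill and at least N years of experience
rows = cur.execute("""
    SELECT c.name, c.total_years_experience
    FROM candidates c
    JOIN skills s ON s.candidate_id = c.id
    WHERE s.skill = ? AND c.total_years_experience >= ?
""", ("LangGraph", 5)).fetchall()
print(rows)  # [('Jane Doe', 6.5)]
conn.close()
```

Because each skill is its own row rather than a blob of text, queries like "Python AND LangGraph" or skill-frequency reports need no string parsing at query time.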
Building the LangGraph Workflow
Now we stitch the pieces together to create a complete workflow using LangGraph nodes and edges.
parse --> extract --> validate_enrich --> human_review --> store --> END
def should_continue(state: ResumeState) -> str:
    """
    Conditional edge to determine if workflow should continue or end
    """
    # Check if any previous step failed
    if state.get('parse_error'):
        return END
    if state.get('e