Learn how to build an end-to-end agentic AI resume parsing system using LangGraph and GPT-4o-mini to extract structured, searchable data with HITL (Human in the Loop) step.
Figure: Resume Transformer Pipeline (LangGraph). Image by Alpha Iterations.
Introduction
Every recruiter knows the pain: hundreds of resumes in different formats (PDF, DOCX, TXT), inconsistent layouts, varying structures, and no easy way to search through them. Modern Applicant Tracking Systems automatically parse resumes, but parsing errors and inconsistencies are common. As a result, HR teams still spend significant time validating and correcting extracted data.
What if we could reliably transform any resume into clean, structured, searchable data?
That's exactly what we're going to build in this article: an end-to-end, agent-based workflow that combines deterministic processing with selective LLM reasoning to handle real-world resumes at scale.
Problem Statement
HR departments and recruitment teams spend 20–30 hours per week manually processing resumes. It is a tedious, error-prone task: recruiters must visually scan PDFs and documents to extract candidate information, leading to inconsistent data entry, human errors, and inefficient candidate matching.
Proposed Solution
A multi-agent AI system built with LangGraph that transforms unstructured resume data into structured, queryable information through four specialized agents working in a coordinated workflow.
Tech stack
- Agentic Framework: LangGraph
- LLM Provider: OpenAI gpt-4o-mini
- PDF Processing: PyPDF2
- OCR: pytesseract
- Database: SQLite3
What This Project Does
This is an intelligent resume processing pipeline built with LangGraph and OpenAI's GPT-4o-mini that:
1. **Parses** resumes from multiple formats (PDF, DOCX, TXT), including image-based PDFs via OCR
2. **Extracts** structured data using AI (contact info, experience, education, skills)
3. **Validates & Enriches** the data (calculates years of experience, standardizes skills)
4. **HITL** (Human in the Loop) allows a human reviewer to take the final decision
5. Stores everything in a queryable SQLite database
The result? A fully automated Applicant Tracking System (ATS) that can process resumes at scale.
parse --> extract --> validate_enrich --> human_review --> store --> END
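The five steps above boil down to a simple contract: each step receives a shared state dict, updates it, and returns it. Here is a minimal plain-Python sketch of that flow (the step bodies are illustrative stand-ins, not the real agents defined later):

```python
# Illustrative linear pipeline: each step mutates the shared state and passes it on.
# This is the same state-threading contract the LangGraph nodes in this article follow.
def parse(state):           state["messages"].append("parsed"); return state
def extract(state):         state["messages"].append("extracted"); return state
def validate_enrich(state): state["messages"].append("validated"); return state
def human_review(state):    state["messages"].append("reviewed"); return state
def store(state):           state["messages"].append("stored"); return state

state = {"messages": []}
for step in (parse, extract, validate_enrich, human_review, store):
    state = step(state)
print(state["messages"])
# → ['parsed', 'extracted', 'validated', 'reviewed', 'stored']
```

LangGraph replaces this hardcoded loop with a graph of named nodes and edges, which makes it easy to later add branching (for example, routing rejected resumes back for re-extraction).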
Letβs get started.
Please note: for simplicity and readability, this article uses the term agent for both the Python-based workflow functions and the AI-powered extraction component. Only the extraction step uses a true AI agent backed by GPT-4o-mini; the other agents are deterministic Python functions orchestrated within the workflow.
Project Setup
The complete end-to-end notebook can be accessed on my GitHub repo.
Folder Structure
Below is the folder structure for the project:
```
resume-transformer-agent/
├── README.md                       # This file - comprehensive project documentation
├── requirements.txt                # Python dependencies
├── .env                            # Environment variables (API keys) - not in repo
├── .gitignore                      # Git ignore rules
│
├── resume-transformer-agent.ipynb  # Main Jupyter notebook with complete workflow
│
├── data/                           # Sample resume files for testing
│   ├── *.pdf                       # PDF resume samples
│   ├── *.docx                      # DOCX resume samples
│   └── *.txt                       # Text resume samples
│
└── resume_ats.db                   # SQLite database (auto-generated)
    ├── candidates table            # Main candidate information
    ├── experience table            # Work history entries
    ├── education table             # Educational background
    ├── skills table                # Normalized skills
    └── certifications table        # Professional certifications
```
Requirements File
In the requirements.txt file, add the libraries below so that they can be installed.
```
# requirements.txt
# Resume Transformer Agent Dependencies

# LangGraph for workflow orchestration
langgraph>=0.0.20

# OpenAI API for LLM-powered extraction
openai>=1.0.0

# Document parsing libraries
PyPDF2>=3.0.0
python-docx>=1.0.0

# OCR capabilities (optional, for image-based PDFs)
pytesseract>=0.3.10
pillow>=10.0.0
pdf2image>=1.16.0

# Loading the .env file (used by the notebook's load_dotenv call)
python-dotenv>=1.0.0
```
Create the virtual environment
In your terminal create a virtual environment called .venv:
For Mac (Terminal):
python -m venv .venv
For Windows (Command Prompt):
python -m venv .venv
Activate the virtual environment
For Mac
source .venv/bin/activate
For Windows:
.venv\Scripts\activate
Install the dependencies
In the terminal:
pip install -r requirements.txt
Create .env File
Please add a .env file in the repo and set your OpenAI API key.
```
# .env file
OPENAI_API_KEY=your_api_key_here
```
Add resume data:
We download some resumes from Kaggle Source 1 and Kaggle Source 2 for varying layouts and categories.
The images below show the variety of the resumes:
Resume Samples. Source: Kaggle
Now we are good with the project setup. Let's get started with the notebook:
Importing the required Libraries
```python
import os
import json
import re
import sqlite3
from datetime import datetime
from typing import TypedDict, Optional, List, Dict, Any
from pathlib import Path

# Document parsing
import PyPDF2
import docx

# LangGraph
from langgraph.graph import StateGraph, END

# OpenAI
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# Access the variables
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Set up OpenAI client
# Make sure to set your OPENAI_API_KEY environment variable
client = OpenAI(api_key=OPENAI_API_KEY)

print("✓ All libraries imported successfully!")
```
Define State Schema
We now define the state schema to store the variable values in the workflow.
```python
class ResumeState(TypedDict):
    """State schema for the resume transformation workflow"""
    # Input
    file_path: str
    file_type: str
    # Step 1: Parse
    raw_text: Optional[str]
    parse_error: Optional[str]
    # Step 2: Extract
    structured_data: Optional[Dict[str, Any]]
    extract_error: Optional[str]
    # Step 3: Validate & Enrich
    validated_data: Optional[Dict[str, Any]]
    validation_error: Optional[str]
    # Step 3.5: Human Review (HITL)
    human_approved: Optional[bool]
    human_review_notes: Optional[str]
    # Step 4: Store
    database_id: Optional[int]
    store_error: Optional[str]
    # Metadata
    status: str
    messages: List[str]

print("✓ State schema defined!")
```
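For illustration, here is what an initial state for a single resume might look like before any agent has run (the file path and values are hypothetical, not from the repo):

```python
# Hypothetical initial ResumeState: only the inputs and metadata are set;
# every per-step field starts as None and gets filled in as the graph runs.
initial_state = {
    "file_path": "data/sample_resume.pdf",   # made-up path for illustration
    "file_type": "pdf",
    "raw_text": None, "parse_error": None,
    "structured_data": None, "extract_error": None,
    "validated_data": None, "validation_error": None,
    "human_approved": None, "human_review_notes": None,
    "database_id": None, "store_error": None,
    "status": "pending",
    "messages": [],
}
print(initial_state["status"])
```

Each agent reads the fields produced by the previous step and writes its own, so a failure in any step is visible downstream via the corresponding `*_error` field.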
Parsing Agent
We use the PyPDF2 library to extract text from text-based PDFs, and the pytesseract library to extract text from scanned PDFs.
```python
def parse_pdf(file_path: str) -> str:
    """Extract text from PDF file, with OCR fallback for image-based PDFs"""
    text = ""
    try:
        # First, try regular text extraction
        with open(file_path, 'rb') as file:
            pdf_reader = PyPDF2.PdfReader(file)
            for page in pdf_reader.pages:
                page_text = page.extract_text()
                if page_text:
                    text += page_text + "\n"

        # If no text was extracted, the PDF is likely image-based, try OCR
        if not text.strip():
            print("   ⚠️ No text found in PDF - attempting OCR extraction...")
            try:
                from pdf2image import convert_from_path
                import pytesseract

                # Convert PDF pages to images
                images = convert_from_path(file_path)

                # Perform OCR on each page
                for i, image in enumerate(images):
                    page_text = pytesseract.image_to_string(image)
                    text += page_text + "\n"
                    print(f"   ✓ OCR extracted {len(page_text)} characters from page {i+1}")
            except ImportError:
                raise Exception(
                    "PDF appears to be image-based but OCR libraries not available. "
                    "Install with: pip install pytesseract pdf2image pillow"
                )
            except Exception as e:
                raise Exception(f"OCR extraction failed: {str(e)}")
    except Exception as e:
        raise Exception(f"Error parsing PDF: {str(e)}")
    return text.strip()


def parse_docx(file_path: str) -> str:
    """Extract text from DOCX file"""
    try:
        doc = docx.Document(file_path)
        text = "\n".join([paragraph.text for paragraph in doc.paragraphs])
    except Exception as e:
        raise Exception(f"Error parsing DOCX: {str(e)}")
    return text.strip()


def parse_txt(file_path: str) -> str:
    """Extract text from TXT file"""
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            text = file.read()
    except Exception as e:
        raise Exception(f"Error parsing TXT: {str(e)}")
    return text.strip()


def parse_agent(state: ResumeState) -> ResumeState:
    """
    Agent 1: Parse - Extract raw text from resume file
    """
    print(f"PARSE AGENT: Processing {state['file_path']}")
    try:
        file_type = state['file_type'].lower()
        if file_type == 'pdf':
            raw_text = parse_pdf(state['file_path'])
        elif file_type == 'docx':
            raw_text = parse_docx(state['file_path'])
        elif file_type == 'txt':
            raw_text = parse_txt(state['file_path'])
        else:
            raise ValueError(f"Unsupported file type: {file_type}")

        state['raw_text'] = raw_text
        state['parse_error'] = None
        state['messages'].append(f"✓ Successfully parsed {len(raw_text)} characters from {file_type.upper()}")
        print(f"   ✓ Extracted {len(raw_text)} characters")
    except Exception as e:
        state['parse_error'] = str(e)
        state['messages'].append(f"✗ Parse error: {str(e)}")
        print(f"   ✗ Error: {str(e)}")
    return state

print("✓ Parse agent defined!")
```
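A quick way to sanity-check the TXT path: write a tiny resume to a temporary file and run it through a parse step shaped like the one above. `parse_txt` is re-declared here so the snippet runs standalone, and the resume content is invented:

```python
# Standalone sanity check of the TXT parsing path (resume content is made up).
import os
import tempfile

def parse_txt(file_path: str) -> str:
    with open(file_path, 'r', encoding='utf-8') as f:
        return f.read().strip()

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("Jane Doe\nData Engineer\nSkills: Python, SQL\n")
    path = f.name

# Mimic what parse_agent does for the 'txt' branch
state = {"file_path": path, "file_type": "txt", "messages": []}
state["raw_text"] = parse_txt(state["file_path"])
state["messages"].append(f"Parsed {len(state['raw_text'])} characters")

print(state["raw_text"].splitlines()[0])  # → Jane Doe
os.remove(path)
```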
Extract Agent
This is an LLM-based step which takes the parsed text from the resume and extracts the relevant entities: contact (name, email, phone, location, LinkedIn, GitHub), summary, experience, education, skills, certifications, and languages.
We use gpt-4o-mini here.
```python
def extract_agent(state: ResumeState) -> ResumeState:
    """
    Agent 2: Extract - Use LLM to extract structured data from raw text
    """
    print("EXTRACT AGENT: Analyzing resume with gpt-4o-mini")

    if state.get('parse_error'):
        state['extract_error'] = "Cannot extract: parsing failed"
        return state

    try:
        # Prepare the prompt for GPT-4o-mini
        system_prompt = """You are an expert resume parsing engine.

Your task is to extract structured, normalized data from raw resume text and return it as VALID JSON ONLY.

Follow these rules strictly:
- Output MUST be a single JSON object and nothing else (no markdown, no comments).
- If a field is missing or not found, use null or an empty array (never invent data).
- Normalize capitalization (Title Case for names, companies, titles).
- Deduplicate repeated skills, certifications, and languages.
- Do not infer dates or employers unless explicitly stated.
- Keep descriptions concise and factual.

Extract the following schema exactly:
{
  "contact": {
    "name": string | null,
    "email": string | null,
    "phone": string | null,
    "location": string | null,
    "linkedin": string | null,
    "github": string | null
  },
  "summary": string | null,
  "experience": [
    {
      "company": string | null,
      "title": string | null,
      "start_date": string | null,
      "end_date": string | null,
      "description": string | null,
      "responsibilities": [string]
    }
  ],
  "education": [
    {
      "institution": string | null,
      "degree": string | null,
      "field": string | null,
      "graduation_year": string | null
    }
  ],
  "skills": [string],
  "certifications": [string],
  "languages": [string]
}

Date formatting rules:
- Use "YYYY-MM" when month is available.
- Use "YYYY" if only the year is available.
- Use "Present" for ongoing roles.

Responsibilities should be concise bullet-style statements (max 1 sentence each)."""

        user_prompt = f"Resume text:\n\n{state['raw_text']}"

        # Call OpenAI API
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            temperature=0.1,
            response_format={"type": "json_object"}
        )

        # Parse the response
        extracted_text = response.choices[0].message.content
        structured_data = json.loads(extracted_text)

        state['structured_data'] = structured_data
        state['extract_error'] = None
        state['messages'].append("✓ Successfully extracted structured data using GPT-4o-mini")
        print(f"   ✓ Extracted data for: {structured_data.get('contact', {}).get('name', 'Unknown')}")
    except Exception as e:
        state['extract_error'] = str(e)
        state['messages'].append(f"✗ Extract error: {str(e)}")
        print(f"   ✗ Error: {str(e)}")

    return state

print("✓ Extract agent defined!")
```
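To make the target concrete, here is a hand-written example of the JSON shape the system prompt asks gpt-4o-mini to return. All values are invented for illustration, not real model output:

```python
# Example of the expected extraction output (values are fabricated samples).
import json

example = '''{
  "contact": {"name": "Jane Doe", "email": "jane@example.com", "phone": null,
              "location": "Pune, India", "linkedin": null, "github": null},
  "summary": "Data engineer with cloud experience.",
  "experience": [{"company": "Acme Corp", "title": "Data Engineer",
                  "start_date": "2021-03", "end_date": "Present",
                  "description": "Built ETL pipelines.",
                  "responsibilities": ["Maintained Airflow DAGs."]}],
  "education": [{"institution": "Pune University", "degree": "B.E.",
                 "field": "Computer Science", "graduation_year": "2020"}],
  "skills": ["Python", "SQL"],
  "certifications": [],
  "languages": ["English"]
}'''

structured_data = json.loads(example)
print(structured_data["contact"]["name"])            # → Jane Doe
print(structured_data["experience"][0]["end_date"])  # → Present
```

Because the prompt forbids invented data, missing fields come back as `null` (here, phone and linkedin), which the downstream validation step handles gracefully.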
Validation and Enrichment Agent
Cleans data, calculates total years of experience, and standardizes skills. We have used basic enrichment functions; these can be enhanced for more sophisticated enrichment.
```python
# Standard skills taxonomy for normalization
STANDARD_SKILLS = {
    # Programming Languages
    'python': 'Python', 'java': 'Java', 'javascript': 'JavaScript', 'js': 'JavaScript',
    'typescript': 'TypeScript', 'c++': 'C++', 'c#': 'C#', 'ruby': 'Ruby',
    'go': 'Go', 'rust': 'Rust', 'swift': 'Swift', 'kotlin': 'Kotlin', 'php': 'PHP',
    # Frameworks
    'react': 'React', 'reactjs': 'React', 'angular': 'Angular', 'vue': 'Vue.js',
    'django': 'Django', 'flask': 'Flask', 'spring': 'Spring', 'nodejs': 'Node.js',
    'express': 'Express.js', 'fastapi': 'FastAPI',
    # Databases
    'sql': 'SQL', 'mysql': 'MySQL', 'postgresql': 'PostgreSQL', 'mongodb': 'MongoDB',
    'redis': 'Redis', 'oracle': 'Oracle', 'dynamodb': 'DynamoDB',
    # Cloud & DevOps
    'aws': 'AWS', 'azure': 'Azure', 'gcp': 'Google Cloud', 'docker': 'Docker',
    'kubernetes': 'Kubernetes', 'k8s': 'Kubernetes', 'terraform': 'Terraform',
    'jenkins': 'Jenkins', 'ci/cd': 'CI/CD', 'git': 'Git',
    # Data Science & ML
    'machine learning': 'Machine Learning', 'ml': 'Machine Learning',
    'deep learning': 'Deep Learning', 'tensorflow': 'TensorFlow', 'pytorch': 'PyTorch',
    'pandas': 'Pandas', 'numpy': 'NumPy', 'scikit-learn': 'Scikit-learn', 'nlp': 'NLP',
    # Other
    'agile': 'Agile', 'scrum': 'Scrum', 'rest api': 'REST API',
    'graphql': 'GraphQL', 'microservices': 'Microservices'
}


def parse_date(date_str: str) -> Optional[datetime]:
    """Parse various date formats"""
    if not date_str or date_str.lower() in ['present', 'current', 'now']:
        return datetime.now()
    # Try different formats
    formats = ['%Y-%m', '%Y', '%m/%Y', '%B %Y', '%b %Y']
    for fmt in formats:
        try:
            return datetime.strptime(str(date_str).strip(), fmt)
        except ValueError:
            continue
    return None


def calculate_experience(experience_list: List[Dict]) -> float:
    """Calculate total years of experience"""
    total_months = 0
    for job in experience_list:
        start = parse_date(job.get('start_date', ''))
        end = parse_date(job.get('end_date', ''))
        if start and end:
            months = (end.year - start.year) * 12 + (end.month - start.month)
            total_months += max(0, months)
    return round(total_months / 12, 1)


def standardize_skills(skills: List[str]) -> List[str]:
    """Normalize skills to standard taxonomy"""
    standardized = set()
    for skill in skills:
        skill_lower = skill.lower().strip()
        # Check if skill matches standard taxonomy
        if skill_lower in STANDARD_SKILLS:
            standardized.add(STANDARD_SKILLS[skill_lower])
        else:
            # Keep original if not in taxonomy
            standardized.add(skill.strip())
    return sorted(list(standardized))


def clean_contact_info(contact: Dict) -> Dict:
    """Clean and validate contact information"""
    cleaned = {}
    # Clean email
    if contact.get('email'):
        email = contact['email'].strip().lower()
        if '@' in email:
            cleaned['email'] = email
    # Clean phone
    if contact.get('phone'):
        phone = re.sub(r'[^\d+\-() ]', '', contact['phone'])
        cleaned['phone'] = phone
    # Copy other fields
    for field in ['name', 'location', 'linkedin', 'github']:
        if contact.get(field):
            cleaned[field] = contact[field].strip()
    return cleaned


def validate_and_enrich_agent(state: ResumeState) -> ResumeState:
    """
    Agent 3: Validate & Enrich - Clean data, calculate experience, standardize skills
    """
    print("VALIDATE & ENRICH AGENT: Processing data")

    if state.get('extract_error'):
        state['validation_error'] = "Cannot validate: extraction failed"
        return state

    try:
        data = state['structured_data'].copy()

        # Clean contact info
        if data.get('contact'):
            data['contact'] = clean_contact_info(data['contact'])

        # Calculate total experience
        if data.get('experience'):
            total_experience = calculate_experience(data['experience'])
            data['total_years_experience'] = total_experience
            print(f"   ✓ Calculated {total_experience} years of experience")

        # Standardize skills
        if data.get('skills'):
            original_count = len(data['skills'])
            data['skills'] = standardize_skills(data['skills'])
            print(f"   ✓ Standardized {original_count} skills to {len(data['skills'])} unique skills")

        # Add metadata
        data['processed_date'] = datetime.now().isoformat()
        data['data_version'] = '1.0'

        state['validated_data'] = data
        state['validation_error'] = None
        state['messages'].append("✓ Successfully validated and enriched data")
    except Exception as e:
        state['validation_error'] = str(e)
        state['messages'].append(f"✗ Validation error: {str(e)}")
        print(f"   ✗ Error: {str(e)}")

    return state

print("✓ Validate & Enrich agent defined!")
```
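As a quick sanity check of the experience math, here is a compact re-declaration of `parse_date` and `calculate_experience` run on two invented stints: 18 months plus 12 months should give 2.5 years.

```python
# Standalone check of the enrichment math (job dates are invented examples).
from datetime import datetime

def parse_date(s):
    if not s or s.lower() in ('present', 'current', 'now'):
        return datetime.now()
    for fmt in ('%Y-%m', '%Y'):
        try:
            return datetime.strptime(s.strip(), fmt)
        except ValueError:
            continue
    return None

def calculate_experience(jobs):
    total_months = 0
    for job in jobs:
        start = parse_date(job.get('start_date', ''))
        end = parse_date(job.get('end_date', ''))
        if start and end:
            total_months += max(0, (end.year - start.year) * 12 + (end.month - start.month))
    return round(total_months / 12, 1)

jobs = [
    {'start_date': '2018-01', 'end_date': '2019-07'},  # 18 months
    {'start_date': '2020-01', 'end_date': '2021-01'},  # 12 months
]
print(calculate_experience(jobs))  # → 2.5
```

Note that this simple month-difference approach does not deduplicate overlapping roles, which is one obvious place to add more sophisticated enrichment.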
Human in the Loop (HITL)
It is always essential to have a HITL step in the flow. Here the HR person is provided with a UI to validate/edit the data. This step is called visual grounding.
We use a simple HTML page that shows the resume image on the left side and the extracted fields on the right side, allowing the user to validate and edit the extracted data if required.
```python
# HTML-based Human Review Interface
import webbrowser
import base64
from io import BytesIO
from pathlib import Path
import tempfile


def get_resume_image_base64(file_path: str, file_type: str) -> str:
    """Convert resume to base64 image for embedding in HTML"""
    try:
        if file_type == 'pdf':
            from pdf2image import convert_from_path
            images = convert_from_path(file_path, first_page=1, last_page=1, dpi=200)
            resume_image = images[0]
        elif file_type in ['jpg', 'jpeg', 'png']:
            from PIL import Image
            resume_image = Image.open(file_path)
        else:
            return None
        # Convert to base64
        buffered = BytesIO()
        resume_image.save(buffered, format="PNG")
        img_str = base64.b64encode(buffered.getvalue()).decode()
        return f"data:image/png;base64,{img_str}"
    except Exception as e:
        print(f"⚠️ Could not generate visual preview: {str(e)}")
        return None


def create_review_html_file(file_path: str, file_type: str, validated_data: Dict,
                            output_path: str, result_file: str) -> str:
    """Create HTML interface for human review"""
    # Get resume image
    img_data = get_resume_image_base64(file_path, file_type)

    # Extract data
    contact = validated_data.get('contact', {})
    skills = validated_data.get('skills', [])
    certifications = validated_data.get('certifications', [])
    education = validated_data.get('education', [])
    experience = validated_data.get('experience', [])

    # Build education HTML (editable)
    education_html = ""
    for idx, edu in enumerate(education):
        education_html += f"""
        <div style="margin-bottom: 15px; padding: 15px; background: #f8f9fa; border-radius: 5px; border: 1px solid #dee2e6;">
            <div style="font-weight: 600; color: #1976d2; margin-bottom: 8px;">Education #{idx + 1}</div>
            <input type="text" class="edit-input" id="edu_{idx}_degree" value="{edu.get('degree', '')}" placeholder="Degree">
            <input type="text" class="edit-input" id="edu_{idx}_field" value="{edu.get('field', '')}" placeholder="Field of Study">
            <input type="text" class="edit-input" id="edu_{idx}_institution" value="{edu.get('institution', '')}" placeholder="Institution">
            <input type="text" class="edit-input" id="edu_{idx}_year" value="{edu.get('graduation_year', '')}" placeholder="Graduation Year">
        </div>
        """

    # Build experience HTML (editable)
    experience_html = ""
    for idx, exp in enumerate(experience):
        experience_html += f"""
        <div style="margin-bottom: 15px; padding: 15px; background: #f8f9fa; border-radius: 5px; border: 1px solid #dee2e6;">
            <div style="font-weight: 600; color: #1976d2; margin-bottom: 8px;">Experience #{idx + 1}</div>
            <input type="text" class="edit-input" id="exp_{idx}_title" value="{exp.get('title', '')}" placeholder="Job Title">
            <input type="text" class="edit-input" id="exp_{idx}_company" value="{exp.get('company', '')}" placeholder="Company">
            <div style="display: flex; gap: 10px;">
                <input type="text" class="edit-input" style="flex: 1;" id="exp_{idx}_start" value="{exp.get('start_date', '')}" placeholder="Start Date (YYYY-MM)">
                <input type="text" class="edit-input" style="flex: 1;" id="exp_{idx}_end" value="{exp.get('end_date', '')}" placeholder="End Date (YYYY-MM or Present)">
            </div>
            <textarea class="edit-input" id="exp_{idx}_description" rows="3" placeholder="Description">{exp.get('description', '')}</textarea>
        </div>
        """

    # Build skills textarea (editable - one per line)
    skills_text = "\n".join(skills)
    # Build certifications textarea (editable - one per line)
    certifications_text = "\n".join(certifications)

    # Image section
    image_section = ""
    if img_data:
        image_section = f"""
        <div style="flex: 1; background: #f5f5f5; padding: 20px; overflow-y: auto; max-height: 80vh; border-right: 2px solid #e0e0e0;">
            <div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 15px;">
                <h2 style="margin: 0; color: #1976d2;">📄 Resume Document</h2>
                <div style="display: flex; gap: 8px;">
                    <button onclick="zoomIn()" style="padding: 8px 12px; border: 1px solid #1976d2; background: white; color: #1976d2; border-radius: 4px; cursor: pointer; font-weight: 600; transition: all 0.2s;" onmouseover="this.style.background='#1976d2'; this.style.color='white';" onmouseout="this.style.background='white'; this.style.color='#1976d2';">🔍+ Zoom In</button>
                    <button onclick="zoomOut()" style="padding: 8px 12px; border: 1px solid #1976d2; background: white; color: #1976d2; border-radius: 4px; cursor: pointer; font-weight: 600; transition: all 0.2s;" onmouseover="this.style.background='#1976d2'; this.style.color='white';" onmouseout="this.style.background='white'; this.style.color='#1976d2';">🔍- Zoom Out</button>
                    <button onclick="fitToWindow()" style="padding: 8px 12px; border: 1px solid #1976d2; background: white; color: #1976d2; border-radius: 4px; cursor: pointer; font-weight: 600; transition: all 0.2s;" onmouseover="this.style.background='#1976d2'; this.style.color='white';" onmouseout="this.style.background='white'; this.style.color='#1976d2';">Fit to Window</button>
                </div>
            </div>
            <div id="resume-image-container" style="text-align: center; overflow: auto;">
                <img id="resume-image" src="{img_data}" style="width: 100%; border-radius: 8px; box-shadow: 0 4px 6px rgba(0,0,0,0.1); transition: transform 0.2s;">
            </div>
        </div>
        """
    else:
        image_section = """
        <div style="flex: 1; background: #f5f5f5; padding: 20px; display: flex; align-items: center; justify-content: center; border-right: 2px solid #e0e0e0;">
            <div style="text-align: center; color: #666;">
                <div style="font-size: 48px; margin-bottom: 10px;">📄</div>
                <div>Resume preview not available</div>
            </div>
        </div>
        """

    html = f"""
    <div style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;">
    <style>
        .edit-input {{
            width: 100%; padding: 10px; margin: 8px 0;
            border: 1px solid #ddd; border-radius: 5px;
            font-size: 14px; box-sizing: border-box;
        }}
        .edit-input:focus {{
            outline: none; border-color: #1976d2;
            box-shadow: 0 0 0 2px rgba(25,118,210,0.1);
        }}
        .field-label {{
            font-weight: 600; color: #333; margin-bottom: 4px;
            font-size: 13px; text-transform: uppercase; letter-spacing: 0.5px;
        }}
        .section-title {{
            font-size: 18px; font-weight: 700; color: #1976d2;
            margin: 25px 0 15px 0; padding-bottom: 8px;
            border-bottom: 2px solid #e3f2fd;
        }}
        .btn {{
            padding: 12px 24px; border: none; border-radius: 6px;
            font-size: 15px; font-weight: 600; cursor: pointer;
            transition: all 0.2s; margin: 10px 10px 0 0;
        }}
        .btn-approve {{ background: #4caf50; color: white; }}
        .btn-approve:hover {{ background: #45a049; transform: translateY(-1px); box-shadow: 0 4px 8px rgba(0,0,0,0.2); }}
        .btn-reject {{ background: #f44336; color: white; }}
        .btn-reject:hover {{ background: #da190b; transform: translateY(-1px); box-shadow: 0 4px 8px rgba(0,0,0,0.2); }}
    </style>

    <div style="background: #1976d2; color: white; padding: 20px; border-radius: 8px 8px 0 0; margin-bottom: 0;">
        <h1 style="margin: 0; font-size: 24px;">Human Review - Visual Grounding</h1>
        <p style="margin: 8px 0 0 0; opacity: 0.9;">Review extracted data against the original resume and make corrections if needed</p>
    </div>

    <div style="display: flex; border: 2px solid #e0e0e0; border-top: none; border-radius: 0 0 8px 8px; background: white;">
        {image_section}
        <div style="flex: 1; padding: 30px; overflow-y: auto; max-height: 80vh;">
            <div class="section-title">Contact Information (Editable)</div>
            <div class="field-label">Full Name</div>
            <input type="text" class="edit-input" id="name_input" value="{contact.get('name', '')}" placeholder="Full Name">
            <div class="field-label">Email Address</div>
            <input type="email" class="edit-input" id="email_input" value="{contact.get('email', '')}" placeholder="email@example.com">
            <div class="field-label">Phone Number</div>
            <input type="tel" class="edit-input" id="phone_input" value="{contact.get('phone', '')}" placeholder="+1 (555) 123-4567">
            <div class="field-label">Location</div>
            <input type="text" class="edit-input" id="location_input" value="{contact.get('location', '')}" placeholder="City, State">
            <div class="field-label">LinkedIn Profile</div>
            <input type="url" class="edit-input" id="linkedin_input" value="{contact.get('linkedin', '')}" placeholder="linkedin.com/in/username">
            <div class="field-label">GitHub Profile</div>
            <input type="url" class="edit-input" id="github_input" value="{contact.get('github', '')}" placeholder="github.com/username">

            <div class="section-title">Professional Summary</div>
            <div style="background: #f8f9fa; padding: 15px; border-radius: 5px; margin-bottom: 15px;">
                <div style="font-size: 16px;"><strong>Total Experience:</strong> {validated_data.get('total_years_experience', 0)} years</div>
            </div>

            <div class="section-title">Skills ({len(skills)}) - Editable</div>
            <div class="field-label">Skills (one per line)</div>
            <textarea class="edit-input" id="skills_input" rows="6" placeholder="Enter skills, one per line">{skills_text}</textarea>

            <div class="section-title">Certifications ({len(certifications)}) - Editable</div>
            <div class="field-label">Certifications (one per line)</div>
            <textarea class="edit-input" id="certifications_input" rows="4" placeholder="Enter certifications, one per line">{certifications_text}</textarea>

            <div class="section-title">Education ({len(education)}) - Editable</div>
            {education_html}

            <div class="section-title">Experience ({len(experience)}) - Editable</div>
            {experience_html}

            <div class="section-title">Review Notes</div>
            <textarea class="edit-input" id="notes_input" rows="4" placeholder="Add any review notes or corrections needed..." style="resize: vertical; min-height: 80px;"></textarea>

            <div style="margin-top: 30px; padding-top: 20px; border-top: 2px solid #e0e0e0;">
                <button class="btn btn-approve" onclick="approveData()">✓ Approve & Continue</button>
                <button class="btn btn-reject" onclick="rejectData()">✗ Reject & Skip</button>
                <div id="status_message" style="margin-top: 15px; padding: 12px; border-radius: 5px; display: none;"></div>
            </div>
        </div>
    </div>

    <script>
        window.reviewState = {{ approved: null, data: {{}} }};

        // Zoom functionality
        let currentZoom = 100;
        const zoomStep = 20;
        const minZoom = 50;
        const maxZoom = 300;

        function zoomIn() {{
            if (currentZoom < maxZoom) {{ currentZoom += zoomStep; applyZoom(); }}
        }}
        function zoomOut() {{
            if (currentZoom > minZoom) {{ currentZoom -= zoomStep; applyZoom(); }}
        }}
        function fitToWindow() {{ currentZoom = 100; applyZoom(); }}
        function applyZoom() {{
            const img = document.getElementById('resume-image');
            if (img) {{ img.style.width = currentZoom + '%'; }}
        }}

        function approveData() {{
            window.reviewState.approved = true;
            // Collect contact data
            window.reviewState.data = {{
                contact: {{
                    name: document.getElementById('name_input').value,
                    email: document.getElementById('email_input').value,
                    phone: document.getElementById('phone_input').value,
                    location: document.getElementById('location_input').value,
                    linkedin: document.getElementById('linkedin_input').value,
                    github: document.getElementById('github_input').value
                }},
                skills: document.getElementById('skills_input').value.split('\\n').filter(s => s.trim() !== ''),
                certifications: document.getElementById('certifications_input').value.split('\\n').filter(s => s.trim() !== ''),
                education: [],
                experience: [],
                notes: document.getElementById('notes_input').value
            }};

            // Collect education data
            let eduIdx = 0;
            while (document.getElementById('edu_' + eduIdx + '_degree')) {{
                window.reviewState.data.education.push({{
                    degree: document.getElementById('edu_' + eduIdx + '_degree').value,
                    field: document.getElementById('edu_' + eduIdx + '_field').value,
                    institution: document.getElementById('edu_' + eduIdx + '_institution').value,
                    graduation_year: document.getElementById('edu_' + eduIdx + '_year').value
                }});
                eduIdx++;
            }}

            // Collect experience data
            let expIdx = 0;
            while (document.getElementById('exp_' + expIdx + '_title')) {{
                window.reviewState.data.experience.push({{
                    title: document.getElementById('exp_' + expIdx + '_title').value,
                    company: document.getElementById('exp_' + expIdx + '_company').value,
                    start_date: document.getElementById('exp_' + expIdx + '_start').value,
                    end_date: document.getElementById('exp_' + expIdx + '_end').value,
                    description: document.getElementById('exp_' + expIdx + '_description').value
                }});
                expIdx++;
            }}

            var msg = document.getElementById('status_message');
            msg.style.display = 'block';
            msg.style.background = '#d4edda';
            msg.style.color = '#155724';
            msg.style.border = '1px solid #c3e6cb';
            msg.innerHTML = '✓ <strong>Data Approved!</strong> Please click Continue below the interface.';
        }}

        function rejectData() {{
            window.reviewState.approved = false;
            window.reviewState.data = {{ notes: document.getElementById('notes_input').value }};
            var msg = document.getElementById('status_message');
            msg.style.display = 'block';
            msg.style.background = '#f8d7da';
            msg.style.color = '#721c24';
            msg.style.border = '1px solid #f5c6cb';
            msg.innerHTML = '✗ <strong>Data Rejected.</strong> This record will NOT be stored.<br><br>Click the button below to save your decision.';
        }}

        function saveDecision() {{
            // Save decision to a JSON file that Python can read
            var data = JSON.stringify(window.reviewState);
            var blob = new Blob([data], {{type: 'application/json'}});
            var url = URL.createObjectURL(blob);
            var a = document.createElement('a');
            a.href = url;
            a.download = '{result_file}';
            a.click();
            // Also write to localStorage as backup
            localStorage.setItem('reviewState', data);
            // Update message
            var statusMsg = document.getElementById('status_message');
            if (statusMsg.style.display !== 'none') {{
                statusMsg.innerHTML += '<br><br><strong>File Downloaded!</strong> Return to the notebook and press Enter to continue.';
            }}
        }}
    </script>

    <div style="text-align: center; padding: 20px; background: #f8f9fa; border-top: 2px solid #e0e0e0;">
        <button class="btn" style="background: #2196f3; color: white; font-size: 16px;" onclick="saveDecision()">Save Decision & Continue</button>
        <p style="margin-top: 10px; color: #666; font-size: 13px;">After clicking above, return to your notebook and press Enter</p>
    </div>
    </div>
    """

    # Write HTML to file
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(html)
    return output_path


def human_review_agent(state: ResumeState) -> ResumeState:
    """
    Agent 3.5: Human Review - Display resume and allow corrections (HITL)
    """
    print("HUMAN REVIEW AGENT: Visual grounding and verification")

    if state.get('validation_error'):
        print("   ⚠️ Skipping review due to validation error")
        state['human_approved'] = False
        return state

    print("\n" + "="*80)
    print("VISUAL GROUNDING: Review extracted data against original resume")
    print("="*80 + "\n")

    # Create temporary files for HTML and result
    temp_dir = Path(tempfile.gettempdir()) / "resume_review"
    temp_dir.mkdir(exist_ok=True)
    html_file = temp_dir / "review.html"
    result_file = temp_dir / "review_result.json"

    # Clean up old result file if it exists
    if result_file.exists():
        result_file.unlink()

    # Create HTML file
    print("Generating review interface...")
    create_review_html_file(
        state['file_path'],
        state['file_type'],
        state['validated_data'],
        str(html_file),
        result_file.name
    )

    # Open in browser
    print(f"Opening review interface in browser: {html_file}")
    webbrowser.open(f"file://{html_file}")

    # Wait for user to complete review
    print("\nWaiting for your decision in the browser...")
    print("After clicking 'Approve' or 'Reject', save the downloaded file and press Enter here: ")
    input()

    # Try to read result from file
    import time
    max_attempts = 3
    attempt = 0
    approval_result = None

    # Fallback to manual input with validation
    print("\n" + "="*60)
    decision = ""
    while decision not in ['approve', 'reject']:
        decision = input("Enter your decision (approve/reject): ").lower().strip()
        if decision not in ['approve', 'reject']:
            print("⚠️ Invalid input. Please type 'approve' or 'reject' (without quotes)")

    if decision == 'approve':
        state['human_approved'] = True
        # Ask if user wants to input edited data
        has_edits = input("Did you make any edits? (yes/no): ").lower().strip()
        if has_edits == 'yes':
            print("\nPlease paste the edited data JSON (or press Enter to skip):")
            try:
                import json
                json_input = input()
                if json_input.strip():
                    edited_data = json.loads(json_input)
                    # Merge edited contact data
                    if 'contact' in edited_data:
                        state['validated_data']['contact'].update(edited_data['contact'])
                    # Update skills
                    if 'skills' in edited_data:
                        state['validated_data']['skills'] = edited_data['skills']
                    # Update certifications
                    if 'certifications' in edited_data:
                        state['validated_data']['certifications'] = edited_data['certifications']
                    # Update education
                    if 'education' in edited_data:
                        state['validated_data']['education'] = edited_data['education']
                    # Update experience
                    if 'experience' in edited_data:
                        state['validated_data']['experience'] = edited_data['experience']
                    print("   ✓ Edits applied to validated data")
            except Exception as e:
                print(f"   ⚠️ Could not parse edits: {e}")
        state['human_review_notes'] = input("Review notes (optional): ")
        state['messages'].append("✓ Human review: APPROVED (manual)")
        print("\n   ✓ Human review completed - APPROVED")
    else:
        state['human_approved'] = False
        state['human_review_notes'] = input("Rejection reason: ")
        state['messages'].append("✗ Human review: REJECTED")
        print("\n   ✗ Human review completed - REJECTED")

    return state
```
HTML page design for the Human-in-the-Loop review step
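When the reviewer chooses to apply edits in the manual fallback, the agent expects a pasted JSON snippet whose keys mirror the validated data structure. A minimal sketch of that merge behavior, with hypothetical candidate values:

```python
import json

# A cut-down validated_data record, as produced by the validation agent
# (field names follow the article's schema; values are made up)
validated_data = {
    "contact": {"name": "Jane Doe", "email": "jane@example.com", "phone": None},
    "skills": ["Python", "SQL"],
    "certifications": [],
}

# The reviewer corrects the phone number and adds a missing skill
pasted = '{"contact": {"phone": "+1-555-0100"}, "skills": ["Python", "SQL", "LangGraph"]}'
edited_data = json.loads(pasted)

# Same merge rules as human_review_agent: contact is updated key-by-key,
# list sections are replaced wholesale
if "contact" in edited_data:
    validated_data["contact"].update(edited_data["contact"])
for section in ("skills", "certifications"):
    if section in edited_data:
        validated_data[section] = edited_data[section]

print(validated_data["contact"])  # name and email survive, phone is corrected
print(validated_data["skills"])   # ['Python', 'SQL', 'LangGraph']
```

Note the asymmetry: merging `contact` key-by-key means the reviewer only needs to paste the fields they changed, while list sections must be pasted in full because they replace the originals.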
Store Agent
This step formats the data into SQL and inserts it into the database using sqlite3. There are five tables: candidates, experience, education, skills, and certifications.
def initialize_database(db_path: str = "resume_ats.db"):
    """Initialize the ATS database with required tables"""
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Create candidates table
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS candidates (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL,
            email TEXT,
            phone TEXT,
            location TEXT,
            linkedin TEXT,
            github TEXT,
            summary TEXT,
            total_years_experience REAL,
            processed_date TEXT,
            data_version TEXT,
            human_approved BOOLEAN,
            human_review_notes TEXT,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    ''')

    # Add missing columns to existing table (for backward compatibility)
    try:
        cursor.execute("ALTER TABLE candidates ADD COLUMN human_approved BOOLEAN")
        print("   ✅ Added human_approved column to existing database")
    except sqlite3.OperationalError:
        pass  # Column already exists

    try:
        cursor.execute("ALTER TABLE candidates ADD COLUMN human_review_notes TEXT")
        print("   ✅ Added human_review_notes column to existing database")
    except sqlite3.OperationalError:
        pass  # Column already exists

    # Create experience table
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS experience (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            candidate_id INTEGER,
            company TEXT,
            title TEXT,
            start_date TEXT,
            end_date TEXT,
            description TEXT,
            FOREIGN KEY (candidate_id) REFERENCES candidates (id)
        )
    ''')

    # Create education table
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS education (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            candidate_id INTEGER,
            institution TEXT,
            degree TEXT,
            field TEXT,
            graduation_year TEXT,
            FOREIGN KEY (candidate_id) REFERENCES candidates (id)
        )
    ''')

    # Create skills table
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS skills (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            candidate_id INTEGER,
            skill TEXT,
            FOREIGN KEY (candidate_id) REFERENCES candidates (id)
        )
    ''')

    # Create certifications table
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS certifications (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            candidate_id INTEGER,
            certification TEXT,
            FOREIGN KEY (candidate_id) REFERENCES candidates (id)
        )
    ''')

    conn.commit()
    conn.close()


def store_agent(state: ResumeState, db_path: str = "resume_ats.db") -> ResumeState:
    """
    Agent 4: Store - Insert structured data into database
    """
    print("💾 STORE AGENT: Saving to database")

    if state.get('validation_error'):
        state['store_error'] = "Cannot store: validation failed"
        return state

    # Check human approval status (False means explicitly rejected;
    # None means the review step was skipped)
    if state.get('human_approved') is False:
        state['store_error'] = "Cannot store: rejected by human reviewer"
        state['status'] = 'rejected'
        print("   ⚠️ Skipping storage - rejected by human reviewer")
        return state

    try:
        # Initialize database
        initialize_database(db_path)
        conn = sqlite3.connect(db_path)
        cursor = conn.cursor()

        data = state['validated_data']
        contact = data.get('contact', {})

        # Insert candidate with human review data
        cursor.execute('''
            INSERT INTO candidates
            (name, email, phone, location, linkedin, github, summary,
             total_years_experience, processed_date, data_version,
             human_approved, human_review_notes)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        ''', (
            contact.get('name'),
            contact.get('email'),
            contact.get('phone'),
            contact.get('location'),
            contact.get('linkedin'),
            contact.get('github'),
            data.get('summary'),
            data.get('total_years_experience'),
            data.get('processed_date'),
            data.get('data_version'),
            state.get('human_approved'),
            state.get('human_review_notes')
        ))
        candidate_id = cursor.lastrowid

        # Insert experience
        for exp in data.get('experience', []):
            cursor.execute('''
                INSERT INTO experience (candidate_id, company, title, start_date, end_date, description)
                VALUES (?, ?, ?, ?, ?, ?)
            ''', (
                candidate_id,
                exp.get('company'),
                exp.get('title'),
                exp.get('start_date'),
                exp.get('end_date'),
                exp.get('description')
            ))

        # Insert education
        for edu in data.get('education', []):
            cursor.execute('''
                INSERT INTO education (candidate_id, institution, degree, field, graduation_year)
                VALUES (?, ?, ?, ?, ?)
            ''', (
                candidate_id,
                edu.get('institution'),
                edu.get('degree'),
                edu.get('field'),
                edu.get('graduation_year')
            ))

        # Insert skills
        for skill in data.get('skills', []):
            cursor.execute('''
                INSERT INTO skills (candidate_id, skill)
                VALUES (?, ?)
            ''', (candidate_id, skill))

        # Insert certifications
        for cert in data.get('certifications', []):
            cursor.execute('''
                INSERT INTO certifications (candidate_id, certification)
                VALUES (?, ?)
            ''', (candidate_id, cert))

        conn.commit()
        conn.close()

        state['database_id'] = candidate_id
        state['store_error'] = None
        state['status'] = 'completed'
        state['messages'].append(f"✅ Successfully stored candidate with ID {candidate_id}")
        print(f"   ✅ Stored candidate with ID: {candidate_id}")

    except Exception as e:
        state['store_error'] = str(e)
        state['status'] = 'failed'
        state['messages'].append(f"❌ Store error: {str(e)}")
        print(f"   ❌ Error: {str(e)}")

    return state


print("✅ Store agent defined!")
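Once records land in these tables, the payoff of the normalized schema is that candidate search becomes a plain SQL join. A quick sanity check, run here against an in-memory database with a cut-down version of the schema above and one made-up candidate:

```python
import sqlite3

# In-memory database with a reduced version of the article's schema
# (only the columns needed for this query; the candidate is hypothetical)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE candidates (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        total_years_experience REAL
    )
""")
cur.execute("""
    CREATE TABLE skills (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        candidate_id INTEGER,
        skill TEXT,
        FOREIGN KEY (candidate_id) REFERENCES candidates (id)
    )
""")
cur.execute("INSERT INTO candidates (name, total_years_experience) VALUES (?, ?)",
            ("Jane Doe", 6.5))
cand_id = cur.lastrowid
for s in ("Python", "SQL", "LangGraph"):
    cur.execute("INSERT INTO skills (candidate_id, skill) VALUES (?, ?)", (cand_id, s))
conn.commit()

# Find candidates with a given skill and at least N years of experience
rows = cur.execute("""
    SELECT c.name, c.total_years_experience
    FROM candidates c
    JOIN skills s ON s.candidate_id = c.id
    WHERE s.skill = ? AND c.total_years_experience >= ?
""", ("LangGraph", 5)).fetchall()
print(rows)  # [('Jane Doe', 6.5)]
conn.close()
```

Because each skill is its own row rather than a blob of text, queries like "Python AND LangGraph" or skill-frequency reports need no string parsing at query time.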
Building the LangGraph Workflow
Now we stitch the pieces together to create a complete workflow using LangGraph nodes and edges.
parse --> extract --> validate_enrich --> human_review --> store --> END
def should_continue(state: ResumeState) -> str:
    """
    Conditional edge to determine if workflow should continue or end
    """
    # Check if any previous step failed
    if state.get('parse_error'):
        return END
    if state.get('e