Skip to content

A sophisticated Python application that streamlines your job search by intelligently scraping and filtering LinkedIn job postings.

Notifications You must be signed in to change notification settings

bigdata5911/Linked-in-Scraping

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

45 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

LinkedIn Job Scraper

Latest Update: August 2023 - Added OpenAI integration for automated cover letter generation

A sophisticated Python application that streamlines your job search by intelligently scraping and filtering LinkedIn job postings. Features a web-based dashboard for job management with AI-powered cover letter generation capabilities.

Application Screenshot

🎯 Overview

LinkedIn Job Scraper addresses the common frustrations of job searching on LinkedIn by providing:

  • Intelligent Filtering: Remove irrelevant job postings based on keywords in titles and descriptions
  • Duplicate Prevention: Automatic detection and removal of duplicate listings
  • Smart Sorting: Jobs sorted by actual posting date, not LinkedIn's relevance algorithm
  • No Sponsored Content: Focus only on genuine job postings
  • AI-Powered Cover Letters: Automated cover letter generation using OpenAI
  • Web Dashboard: Intuitive interface for job management and tracking

⚠️ Important Notice

Disclaimer: This application scrapes LinkedIn's website, which may violate their Terms of Service. Use at your own risk and consider implementing proxy servers to avoid potential IP blocking.

πŸš€ Features

Core Functionality

  • Automated Job Scraping: Multi-threaded scraping with configurable search parameters
  • Advanced Filtering: Filter by keywords, company names, job types, and languages
  • Database Storage: SQLite-based storage with efficient querying
  • Web Interface: Flask-based dashboard for job management
  • Status Tracking: Mark jobs as applied, rejected, interview, or hidden

AI Integration

  • Cover Letter Generation: OpenAI-powered automated cover letter creation
  • Resume Analysis: PDF resume parsing for personalized content
  • Smart Matching: AI-driven job-resume compatibility assessment

πŸ“‹ Prerequisites

  • Python 3.6 or higher
  • Flask
  • Requests
  • BeautifulSoup
  • Pandas
  • SQLite3
  • Pysocks

πŸ› οΈ Installation

  1. Clone the repository

    git clone https://github.com/bigdata5911/Linked-in-Scraping.git
    cd Linked-in-Scraping
  2. Install dependencies

    pip install -r requirements.txt
  3. Configure the application

    • Copy config_example.json to config.json
    • Update configuration parameters (see Configuration section below)
  4. Initialize the database

    python main.py
  5. Launch the web interface

    python app.py
  6. Access the dashboard Open your browser and navigate to http://127.0.0.1:5000

πŸ“– Usage

Job Scraper (main.py)

The scraper component handles LinkedIn job data extraction:

python main.py

Key Features:

  • Configurable search queries and filters
  • Duplicate detection and removal
  • Multi-round scraping for comprehensive coverage
  • Proxy support for enhanced reliability

Web Dashboard (app.py)

The Flask-based web interface provides job management capabilities:

python app.py

Dashboard Features:

  • Job Status Management: Mark jobs as applied (blue), rejected (red), interview (green), or hidden
  • Real-time Updates: Immediate database updates for all actions
  • Filtered Views: Focus on relevant job postings
  • Status Persistence: All changes saved to SQLite database

βš™οΈ Configuration

The config.json file controls all application behavior:

Network Configuration

{
  "proxies": {
    "http": "http://proxy-server:port",
    "https": "https://proxy-server:port"
  },
  "headers": {
    "User-Agent": "Your User Agent String"
  }
}

OpenAI Integration

{
  "OpenAI_API_KEY": "your-openai-api-key",
  "OpenAI_Model": "gpt-4",
  "resume_path": "/path/to/your/resume.pdf"
}

Search Configuration

{
  "search_queries": [
    {
      "keywords": "software engineer",
      "location": "San Francisco, CA",
      "f_WT": "2"
    }
  ]
}

Filtering Options

  • desc_words: Keywords to exclude from job descriptions
  • title_include: Required keywords in job titles
  • title_exclude: Keywords to exclude from job titles
  • company_exclude: Companies to filter out
  • languages: Allowed job posting languages (e.g., ["en", "de"])

Scraping Parameters

  • timespan: Time range for job postings
    • "r604800": Past week
    • "r86400": Last 24 hours
  • pages_to_scrape: Number of pages per search query
  • rounds: Number of scraping iterations
  • days_toscrape: Maximum age of job postings to scrape

Database Configuration

  • jobs_tablename: Table for raw job data
  • filtered_jobs_tablename: Table for filtered job data
  • db_path: SQLite database file path

🎨 Job Type Filters

Value Description
0 On-site positions
1 Hybrid positions
2 Remote positions
"" Any position type

πŸ”§ Advanced Features

Proxy Configuration

For enhanced reliability, configure proxy servers in your config.json:

{
  "proxies": {
    "http": "http://username:password@proxy-server:port",
    "https": "https://username:password@proxy-server:port"
  }
}

Automated Scheduling

Set up cron jobs for regular scraping:

# Run every hour during business days
0 9-17 * * 1-5 cd /path/to/Linked-in-Scraping && python main.py

🚧 Roadmap

Planned Features

  • Job Status Reversal: Add functionality to unhide and un-apply jobs
  • Enhanced Sorting: Sort by database entry date for better job discovery
  • Web Configuration: Frontend interface for search configuration
  • Export Functionality: Export job data to various formats
  • Advanced Analytics: Job application tracking and analytics

Known Limitations

  • Some job postings (~1-5%) may not appear in search results immediately due to LinkedIn's indexing delays
  • Manual database modification required to reverse job status changes
  • Configuration currently limited to JSON file editing

🀝 Contributing

We welcome contributions! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

For major changes, please open an issue first to discuss the proposed changes.

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ‘¨β€πŸ’» Author

bigdata5911


Built with ❀️ for job seekers everywhere

About

A sophisticated Python application that streamlines your job search by intelligently scraping and filtering LinkedIn job postings.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published