Latest Update: August 2023 - Added OpenAI integration for automated cover letter generation
A sophisticated Python application that streamlines your job search by intelligently scraping and filtering LinkedIn job postings. Features a web-based dashboard for job management with AI-powered cover letter generation capabilities.
LinkedIn Job Scraper addresses the common frustrations of job searching on LinkedIn by providing:
- Intelligent Filtering: Remove irrelevant job postings based on keywords in titles and descriptions
- Duplicate Prevention: Automatic detection and removal of duplicate listings
- Smart Sorting: Jobs sorted by actual posting date, not LinkedIn's relevance algorithm
- No Sponsored Content: Focus only on genuine job postings
- AI-Powered Cover Letters: Automated cover letter generation using OpenAI
- Web Dashboard: Intuitive interface for job management and tracking
Disclaimer: This application scrapes LinkedIn's website, which may violate their Terms of Service. Use at your own risk and consider implementing proxy servers to avoid potential IP blocking.
- Automated Job Scraping: Multi-threaded scraping with configurable search parameters
- Advanced Filtering: Filter by keywords, company names, job types, and languages
- Database Storage: SQLite-based storage with efficient querying
- Web Interface: Flask-based dashboard for job management
- Status Tracking: Mark jobs as applied, rejected, interview, or hidden
- Cover Letter Generation: OpenAI-powered automated cover letter creation
- Resume Analysis: PDF resume parsing for personalized content
- Smart Matching: AI-driven job-resume compatibility assessment
- Python 3.6 or higher
- Flask
- Requests
- BeautifulSoup
- Pandas
- SQLite3
- PySocks
1. Clone the repository

   ```bash
   git clone https://github.com/bigdata5911/Linked-in-Scraping.git
   cd Linked-in-Scraping
   ```

2. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

3. Configure the application

   - Copy `config_example.json` to `config.json`
   - Update the configuration parameters (see the Configuration section below)

4. Initialize the database

   ```bash
   python main.py
   ```

5. Launch the web interface

   ```bash
   python app.py
   ```

6. Access the dashboard: open your browser and navigate to `http://127.0.0.1:5000`
The scraper component handles LinkedIn job data extraction:

```bash
python main.py
```
Key Features:
- Configurable search queries and filters
- Duplicate detection and removal (see the sketch after this list)
- Multi-round scraping for comprehensive coverage
- Proxy support for enhanced reliability
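How the scraper deduplicates isn't spelled out above; as a rough illustration, duplicate removal against the SQLite store could look like the minimal sketch below, where the `id` primary key and `job_url` column are assumed names, not taken from the repo:

```python
import sqlite3

def remove_duplicates(db_path: str, table: str = "jobs") -> int:
    """Delete rows whose job_url already appeared in an earlier row.

    Assumes an integer primary key `id` and a `job_url` column
    (hypothetical names; adjust to the actual schema).
    """
    with sqlite3.connect(db_path) as conn:
        cur = conn.execute(
            f"""
            DELETE FROM {table}
            WHERE id NOT IN (
                SELECT MIN(id) FROM {table} GROUP BY job_url
            )
            """
        )
        return cur.rowcount  # number of duplicate rows removed
```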
The Flask-based web interface provides job management capabilities:

```bash
python app.py
```
Dashboard Features:
- Job Status Management: Mark jobs as applied (blue), rejected (red), interview (green), or hidden
- Real-time Updates: Immediate database updates for all actions
- Filtered Views: Focus on relevant job postings
- Status Persistence: All changes saved to the SQLite database (see the sketch after this list)
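The dashboard's route layout isn't documented here; below is a minimal sketch of the kind of Flask endpoint that could persist a status change, with the route, table, and column names all assumed for illustration:

```python
import sqlite3
from flask import Flask, jsonify, request

app = Flask(__name__)
DB_PATH = "jobs.db"  # assumed path; match your config's db_path

@app.route("/job/<int:job_id>/status", methods=["POST"])
def set_status(job_id: int):
    # Accept only the statuses the dashboard uses.
    status = request.form.get("status")
    if status not in {"applied", "rejected", "interview", "hidden"}:
        return jsonify({"error": "invalid status"}), 400
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            "UPDATE jobs SET status = ? WHERE id = ?", (status, job_id)
        )
    return jsonify({"id": job_id, "status": status})

if __name__ == "__main__":
    app.run()
```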
The `config.json` file controls all application behavior.

Network settings (proxies and request headers):

```json
{
  "proxies": {
    "http": "http://proxy-server:port",
    "https": "https://proxy-server:port"
  },
  "headers": {
    "User-Agent": "Your User Agent String"
  }
}
```
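For context, here is a minimal sketch of how an application like this might load `config.json` and reuse the configured proxies and headers on every request; the URL below is illustrative only:

```python
import json
import requests

def load_config(path: str = "config.json") -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)

config = load_config()

# Every scraping request can reuse the configured proxies and headers.
response = requests.get(
    "https://www.linkedin.com/jobs/search/",  # illustrative URL only
    proxies=config.get("proxies"),
    headers=config.get("headers"),
    timeout=30,
)
response.raise_for_status()
```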
OpenAI integration settings:

```json
{
  "OpenAI_API_KEY": "your-openai-api-key",
  "OpenAI_Model": "gpt-4",
  "resume_path": "/path/to/your/resume.pdf"
}
```
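The project's actual prompt and client code aren't shown in this README; the sketch below assumes the `openai` v1 Python client and `pypdf` for resume text extraction (pypdf is not in the requirements list above, so substitute your own PDF parser if needed):

```python
import json

from openai import OpenAI  # assumes the openai>=1.0 client
from pypdf import PdfReader  # assumed parser; not in the requirements above

with open("config.json", encoding="utf-8") as f:
    config = json.load(f)

client = OpenAI(api_key=config["OpenAI_API_KEY"])

# Pull plain text out of the configured resume PDF.
reader = PdfReader(config["resume_path"])
resume_text = "\n".join(page.extract_text() or "" for page in reader.pages)

def generate_cover_letter(job_description: str) -> str:
    """Ask the configured model for a cover letter tailored to one posting."""
    response = client.chat.completions.create(
        model=config["OpenAI_Model"],
        messages=[
            {"role": "system",
             "content": "You write concise, tailored cover letters."},
            {"role": "user",
             "content": (f"Resume:\n{resume_text}\n\n"
                         f"Job posting:\n{job_description}\n\n"
                         "Write a cover letter.")},
        ],
    )
    return response.choices[0].message.content
```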
Search query definitions:

```json
{
  "search_queries": [
    {
      "keywords": "software engineer",
      "location": "San Francisco, CA",
      "f_WT": "2"
    }
  ]
}
```
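How these fields map onto a LinkedIn request isn't documented here; purely as an illustration, a query object could be turned into URL parameters like so (the endpoint is an assumption, not taken from this repo):

```python
from urllib.parse import urlencode

def build_search_url(query: dict, start: int = 0) -> str:
    # f_WT selects the work type; see the table below.
    params = {
        "keywords": query["keywords"],
        "location": query["location"],
        "f_WT": query.get("f_WT", ""),
        "start": start,  # pagination offset
    }
    base = "https://www.linkedin.com/jobs/search/"  # assumed endpoint
    return f"{base}?{urlencode(params)}"

print(build_search_url({"keywords": "software engineer",
                        "location": "San Francisco, CA",
                        "f_WT": "2"}))
```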
Additional configuration keys:

- `desc_words`: Keywords to exclude from job descriptions
- `title_include`: Required keywords in job titles
- `title_exclude`: Keywords to exclude from job titles
- `company_exclude`: Companies to filter out
- `languages`: Allowed job posting languages (e.g., `["en", "de"]`)
- `timespan`: Time range for job postings (`"r604800"`: past week; `"r86400"`: last 24 hours)
- `pages_to_scrape`: Number of pages per search query
- `rounds`: Number of scraping iterations
- `days_toscrape`: Maximum age of job postings to scrape
- `jobs_tablename`: Table for raw job data
- `filtered_jobs_tablename`: Table for filtered job data
- `db_path`: SQLite database file path
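As a rough illustration of how the keyword filters above might be applied, here is a minimal pandas sketch; the `title`, `description`, and `company` column names are assumptions:

```python
import pandas as pd

def apply_filters(jobs: pd.DataFrame, cfg: dict) -> pd.DataFrame:
    """Drop rows that violate the keyword filters from config.json.

    Column names (title, description, company) are assumed, not taken
    from the actual schema.
    """
    title = jobs["title"].fillna("").str.lower()
    desc = jobs["description"].fillna("").str.lower()

    keep = pd.Series(True, index=jobs.index)
    for word in cfg.get("desc_words", []):      # exclude by description
        keep &= ~desc.str.contains(word.lower(), regex=False)
    for word in cfg.get("title_exclude", []):   # exclude by title
        keep &= ~title.str.contains(word.lower(), regex=False)
    include = cfg.get("title_include", [])
    if include:                                 # require at least one match
        keep &= title.apply(lambda t: any(w.lower() in t for w in include))
    keep &= ~jobs["company"].isin(cfg.get("company_exclude", []))
    return jobs[keep]
```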
The `f_WT` search parameter controls the work type:

| Value | Description |
|-------|-------------|
| `0` | On-site positions |
| `1` | Hybrid positions |
| `2` | Remote positions |
| `""` | Any position type |
For enhanced reliability, configure proxy servers in your `config.json`:

```json
{
  "proxies": {
    "http": "http://username:password@proxy-server:port",
    "https": "https://username:password@proxy-server:port"
  }
}
```
Set up cron jobs for regular scraping:

```
# Run at the top of every hour, 9:00-17:00, Monday through Friday
0 9-17 * * 1-5 cd /path/to/Linked-in-Scraping && python main.py
```
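To keep a record of each run, redirect the output to a log file (the log path is illustrative):

```
0 9-17 * * 1-5 cd /path/to/Linked-in-Scraping && python main.py >> scraper.log 2>&1
```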
- Job Status Reversal: Add functionality to unhide and un-apply jobs
- Enhanced Sorting: Sort by database entry date for better job discovery
- Web Configuration: Frontend interface for search configuration
- Export Functionality: Export job data to various formats
- Advanced Analytics: Job application tracking and analytics
- Some job postings (~1-5%) may not appear in search results immediately due to LinkedIn's indexing delays
- Manual database modification required to reverse job status changes
- Configuration currently limited to JSON file editing
We welcome contributions! Please follow these guidelines:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
For major changes, please open an issue first to discuss the proposed changes.
This project is licensed under the MIT License - see the LICENSE file for details.
bigdata5911
Built with ❤️ for job seekers everywhere