Auto ML - Automated Machine Learning Platform

Auto ML is an automated machine learning platform that exposes data analysis, preprocessing, model selection, and hyperparameter tuning as Model Context Protocol (MCP) tools.

🚀 Features

📊 Data Analysis & Exploration

Data Information: Get comprehensive dataset statistics including shape, memory usage, data types, and missing values
CSV Reading: Efficient CSV file reading with pandas and pyarrow support
Correlation Analysis: Visualize correlation matrices for numerical and categorical variables
Outlier Detection: Identify and visualize outliers in your datasets
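One common way to flag outliers is the interquartile range (IQR) rule. The sketch below uses only the standard library and is meant as an illustration; whether the project's outlier tools apply this exact rule is an assumption, and `iqr_outliers` is a hypothetical helper name.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up sample: one value sits far from the rest.
ages = [29, 31, 33, 35, 36, 38, 40, 41, 120]
print(iqr_outliers(ages))
```

Raising `k` makes the rule more tolerant; 1.5 is the conventional default.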

🔧 Data Preprocessing

Automated Preprocessing: Handle missing values, encode categorical variables, and scale numerical features
Feature Engineering: Prepare features for both regression and classification problems
Data Validation: Check for duplicates and data quality issues
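The preprocessing steps listed above (filling missing values, encoding categorical variables, scaling numerical features) can be sketched without any dependencies. This is a simplified stand-in, not the code in utils/preprocessing.py: the `preprocess` function, the median imputation choice, and the row format are all assumptions made for illustration, and missing categorical values are not handled.

```python
import statistics

def preprocess(rows, numeric_cols, categorical_cols):
    """Impute, standardize, and one-hot encode a list of dict rows (illustrative)."""
    out = [dict() for _ in rows]
    for col in numeric_cols:
        present = [r[col] for r in rows if r[col] is not None]
        median = statistics.median(present)
        # Fill missing entries with the column median.
        filled = [r[col] if r[col] is not None else median for r in rows]
        mean = statistics.fmean(filled)
        std = statistics.pstdev(filled) or 1.0  # guard against constant columns
        for o, v in zip(out, filled):
            o[col] = (v - mean) / std  # standard (z-score) scaling
    for col in categorical_cols:
        categories = sorted({r[col] for r in rows})
        for o, r in zip(out, rows):
            for cat in categories:
                o[f"{col}_{cat}"] = 1 if r[col] == cat else 0  # one-hot column
    return out

# Made-up rows with one missing age.
rows = [
    {"age": 30, "sex": "F"},
    {"age": None, "sex": "M"},
    {"age": 50, "sex": "F"},
]
clean = preprocess(rows, ["age"], ["sex"])
print(clean[1])
```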

🤖 Machine Learning Models

Multiple Algorithms: Support for various ML algorithms including:
- Regression: Linear Regression, Ridge, Lasso, ElasticNet, Random Forest, XGBoost, SVR, KNN, CatBoost
- Classification: Logistic Regression, Ridge Classifier, Random Forest, XGBoost, SVM, KNN, Decision Tree, Naive Bayes, CatBoost

📈 Model Evaluation & Visualization

Performance Metrics:
- Regression: R², MAE, MSE
- Classification: Accuracy, F1-Score
Confusion Matrix Visualization: For classification problems
Model Comparison: Compare multiple models side-by-side
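For reference, the metrics named above follow their standard definitions. The sketch below computes them in plain Python so the formulas are visible; the function names are hypothetical, and the F1 score is shown for the binary case only.

```python
def regression_metrics(y_true, y_pred):
    """R², MAE, and MSE from their textbook definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    return {"r2": 1 - ss_res / ss_tot, "mae": mae, "mse": mse}

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy and binary F1 for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"accuracy": accuracy, "f1": f1}

# Made-up predictions for illustration.
reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
clf = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(reg, clf)
```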

⚙️ Hyperparameter Tuning

Automated Tuning: Optimize model hyperparameters using advanced search algorithms
Customizable Scoring: Choose from various evaluation metrics
Trial Management: Control the number of optimization trials
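To make the roles of the trial count, scoring function, and random seed concrete, here is a minimal sketch of a tuning loop. The search algorithm the project uses is not specified in this README, so plain random search is swapped in as the simplest stand-in; `random_search` and `toy_score` are hypothetical names, and the toy objective replaces a real cross-validated score.

```python
import random

def random_search(objective, space, n_trials=20, random_state=42):
    """Sample n_trials configurations and keep the best-scoring one."""
    rng = random.Random(random_state)  # seeded for reproducible trials
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    """Stand-in for a model score; it peaks at alpha = 1.0."""
    return -(params["alpha"] - 1.0) ** 2

best, score = random_search(toy_score, {"alpha": (0.0, 2.0)}, n_trials=50)
print(best, score)
```

With a fixed `random_state`, rerunning the search gives the same result, and raising `n_trials` can only keep or improve the best score found.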

📁 Project Structure

AutoML/
├── data/                   # Sample datasets
│   ├── Ai.csv
│   ├── Calories.csv
│   ├── Cost.csv
│   ├── Digital.csv
│   ├── Electricity.csv
│   ├── ford.csv
│   ├── Habits.csv
│   ├── heart.csv
│   ├── Lifestyle.csv
│   ├── Mobiles.csv
│   ├── Personality.csv
│   ├── Salaries.csv
│   ├── Shopper.csv
│   ├── Sleep.csv
│   ├── cat.csv
│   ├── test.csv
│   └── train.csv
├── tools/
│   └── all_tools.py       # MCP tool definitions
├── utils/
│   ├── before_model.py        # Feature preparation
│   ├── details.py             # Data information
│   ├── external_test.py       # External data test with XGBoost
│   ├── feature_importance.py  # Feature importance analysis
│   ├── hyperparameter.py      # Hyperparameter tuning
│   ├── model_selection.py     # Model selection and evaluation
│   ├── prediction.py          # Prediction utilities
│   ├── preprocessing.py       # Data preprocessing
│   ├── read_csv_file.py       # CSV reading utilities
│   └── visualize_data.py      # Visualization functions
├── main.py                # Application entry point
├── server.py              # MCP server configuration
├── requirements.txt       # Python dependencies
└── README.md             # This file

🛠️ Installation

Prerequisites

Python 3.8 or higher
pip or uv package manager

Setup

Clone the repository

git clone https://github.com/emircansoftware/AutoML.git
cd AutoML

Install dependencies

# Using pip
pip install -r requirements.txt
# Install uv, which the Claude Desktop configuration below uses to launch the server
pip install uv

Using with Claude Desktop

1. Data Path Setting

In utils/read_csv_file.py, update the path variable to match your own project directory on your computer:

# Example:
path = r"C:\YOUR\PROJECT\PATH\AutoML\data"
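Windows paths are easy to get wrong in Python string literals. The short sketch below shows that a raw string with single backslashes and a normal string with doubled backslashes spell the same path, and how `pathlib` can join a file name onto it. The placeholder directory is the one from the example above; `heart.csv` is one of the bundled sample files.

```python
from pathlib import PureWindowsPath

raw = r"C:\YOUR\PROJECT\PATH\AutoML\data"          # raw string: backslashes are literal
escaped = "C:\\YOUR\\PROJECT\\PATH\\AutoML\\data"  # normal string: \\ means one backslash
same = raw == escaped                              # both spell the same path

# PureWindowsPath handles Windows separators on any operating system.
data_path = PureWindowsPath(raw)
csv_path = data_path / "heart.csv"
print(same, csv_path)
```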

2. Claude Desktop Configuration

In Claude Desktop, add the following block to your claude_desktop_config.json file and adjust the paths to match your own system:

{
  "mcpServers": {
    "AutoML": {
      "command": "uv",
      "args": [
        "--directory",
        "C:\\YOUR\\PROJECT\\PATH\\AutoML",
        "run",
        "main.py"
      ]
    }
  }
}

You can now start your project from Claude Desktop.
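If you prefer to generate the configuration block rather than type it by hand, a small script can build it and let `json.dumps` take care of escaping the backslashes in Windows paths. This is an optional sketch: `claude_config` is a hypothetical helper, and the project directory is the same placeholder used above.

```python
import json

def claude_config(project_dir):
    """Build the mcpServers entry for a given project directory."""
    return {
        "mcpServers": {
            "AutoML": {
                "command": "uv",
                "args": ["--directory", project_dir, "run", "main.py"],
            }
        }
    }

config = claude_config(r"C:\YOUR\PROJECT\PATH\AutoML")
# The printed JSON contains doubled backslashes, as JSON requires.
print(json.dumps(config, indent=2))
```

Copy the printed output into claude_desktop_config.json, merging it with any servers already listed there.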

📋 Dependencies

MCP Framework: mcp[cli]>=1.9.4 - Model Context Protocol for tool integration
Data Processing: pandas>=2.3.0, pyarrow>=20.0.0, numpy>=2.3.1
Machine Learning: scikit-learn>=1.3.0, xgboost>=2.0.0, lightgbm>=4.3.0
Additional ML: catboost (for CatBoost models)

🎯 Usage

Starting the MCP Server

from server import mcp

# Run the server
mcp.run()

Available Tools

The platform provides the following MCP tools:

Data Analysis Tools

information_about_data(file_name): Return detailed information about the data
reading_csv(file_name): Read the CSV file
visualize_correlation_num(file_name): Visualize the correlation matrix for numerical columns
visualize_correlation_cat(file_name): Visualize the correlation matrix for categorical columns
visualize_correlation_final(file_name, target_column): Visualize the correlation matrix after preprocessing
visualize_outliers(file_name): Visualize outliers in the data
visualize_outliers_final(file_name, target_column): Visualize outliers after preprocessing

Preprocessing Tools

preprocessing_data(file_name, target_column): Preprocess the data (remove outliers, fill nulls, etc.)
prepare_data(file_name, target_column, problem_type): Prepare the data for models (encoding, scaling, etc.)

Model Training & Evaluation

models(problem_type, file_name, target_column): Select and evaluate models based on problem type
visualize_accuracy_matrix(file_name, target_column, problem_type): Visualize the confusion matrix for predictions
best_model_hyperparameter(model_name, file_name, target_column, problem_type, n_trials, scoring, random_state): Tune the hyperparameters of the best model
test_external_data(main_file_name, target_column, problem_type, test_file_name): Test external data with the best model and return predictions
predict_value(model_name, file_name, target_column, problem_type, n_trials, scoring, random_state, input): Predict the value of the target column for new input
feature_importance_analysis(file_name, target_column, problem_type): Analyze the feature importance of the data using XGBoost

Example Workflow

# 1. Analyze your data
info = information_about_data("data/heart.csv")

# 2. Preprocess the data
preprocessed = preprocessing_data("data/heart.csv", "target")

# 3. Prepare features for classification
features = prepare_data("data/heart.csv", "target", "classification")

# 4. Train and evaluate models
results = models("classification", "data/heart.csv", "target")

# 5. Visualize results
confusion_matrix = visualize_accuracy_matrix("data/heart.csv", "target", "classification")

# 6. Optimize best model
best_model = best_model_hyperparameter("RandomForestClassifier", "data/heart.csv", "target", "classification", 100, "accuracy", 42)

📊 Sample Datasets (All CSV datasets are from Kaggle.)

The data/ folder includes sample datasets for testing, among them:

heart.csv: Heart disease prediction dataset
Salaries.csv: Salary prediction dataset
Calories.csv: Calorie prediction dataset
Personality.csv: Personality analysis dataset
Digital.csv: Digital behavior dataset
Lifestyle.csv: Lifestyle analysis dataset
Mobiles.csv: Mobile phone dataset
Habits.csv: Habit analysis dataset
Sleep.csv: Sleep pattern dataset
Cost.csv: Cost analysis dataset
ford.csv: Ford car dataset
Ai.csv: AI-related dataset
cat.csv: Cat-related dataset

🔧 Configuration

Environment Variables

Set your preferred random seed for reproducible results
Configure MCP server settings in server.py
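One way to make the random seed configurable is to read it from an environment variable with a fallback. This README does not name such a variable, so `AUTOML_RANDOM_SEED` and `get_seed` below are hypothetical and shown only as a sketch of the idea.

```python
import os
import random

def get_seed(env=None, default=42):
    """Read the seed from AUTOML_RANDOM_SEED (hypothetical name), else use default."""
    env = os.environ if env is None else env
    return int(env.get("AUTOML_RANDOM_SEED", default))

seed = get_seed()
random.seed(seed)  # seed Python's own generator for reproducible runs
print(seed)
```

The same value could then be passed as the `random_state` argument of tools that accept one.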

Customization

Add new ML algorithms in utils/model_selection.py
Extend preprocessing steps in utils/preprocessing.py
Create custom visualization functions in utils/visualize_data.py

🤝 Contributing

We welcome contributions! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Contributing Guidelines

Fork the repository
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Model Context Protocol for the MCP framework
scikit-learn for machine learning algorithms
XGBoost for gradient boosting
CatBoost for categorical boosting
pandas for data manipulation

📞 Support

If you encounter any issues or have questions:

Check the Issues page
Create a new issue with detailed information
Contact the maintainers
