
This repository contains various datasets for data analysis, machine learning, and educational purposes


lovnishverma/datasets

My Datasets Repository

This repository contains various datasets for data analysis, machine learning, and educational purposes. Below is a brief description of each dataset available in this repository.

Available Datasets

1. BMI_Data.csv

  • Contains Body Mass Index (BMI) data.
  • Useful for health and fitness analysis.

2. departments.csv

  • Contains department-related information.
  • Useful for organizational data processing.

3. employees.csv

  • Contains employee details.
  • Can be used for HR analytics and workforce management.

4. iris.csv

  • Classic Iris dataset for machine learning.
  • Contains different species of iris flowers with their measurements.

5. item_similarity_df.csv

  • Contains item similarity data.
  • Useful for recommendation system development.

6. movies.csv

  • Dataset containing information about movies.
  • Useful for movie recommendation models.

7. music_genre.csv

  • Contains music genre classification data.
  • Can be used for genre prediction models.

8. nielit.patt

  • Not a dataset; this is a pattern file for an AVR custom marker.

9. pandas.csv

  • Sample dataset for practicing pandas library operations.
  • Useful for learning data manipulation.

10. pandas_tutorial1.csv

  • Another dataset for pandas tutorials.
  • Contains structured data for training purposes.

11. ratings.csv

  • Contains user ratings for various items.
  • Useful for collaborative filtering and recommendation systems.

12. sample.csv

  • A sample dataset.
  • Can be used for testing and learning purposes.

13. test.csv

  • A test dataset.
  • Used for validation and experimentation.
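The files listed above can usually be read straight from GitHub without cloning. The following is a minimal sketch, assuming the default branch is named `main` and the files sit at the repository root; neither is stated in this README, and `raw_url` is a hypothetical helper name used only for illustration.

```python
from urllib.parse import quote

# Assumptions (not confirmed by this README): the default branch is "main"
# and every dataset file sits at the repository root.
REPO = "lovnishverma/datasets"


def raw_url(filename: str, branch: str = "main") -> str:
    """Build a raw.githubusercontent.com URL for a file in this repository."""
    return (
        "https://raw.githubusercontent.com/"
        f"{REPO}/{quote(branch)}/{quote(filename)}"
    )


if __name__ == "__main__":
    # Print the URL that a CSV reader could be pointed at.
    print(raw_url("iris.csv"))
```

With pandas installed, `pandas.read_csv(raw_url("iris.csv"))` should then load the file directly, provided the branch assumption holds. If it does not, pass the actual branch name via `branch=`.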

Usage

These datasets can be used for:

  • Machine learning projects
  • Data analysis and visualization
  • Educational and tutorial purposes
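As one illustration of the machine learning use case, user ratings can be turned into an item-to-item similarity table, which is the kind of data the recommendation entries above describe. This is a sketch under stated assumptions: the `(user, item, rating)` row layout and the sample values are made up for the example and are not taken from `ratings.csv` or `item_similarity_df.csv`, and cosine similarity is one common choice rather than the method used to build those files.

```python
from collections import defaultdict
from math import sqrt


def item_cosine_similarity(ratings):
    """Return {(item_a, item_b): cosine similarity} from (user, item, rating) rows.

    Ratings a user never gave are treated as zero.
    """
    # Group ratings by item, keyed by user.
    by_item = defaultdict(dict)
    for user, item, rating in ratings:
        by_item[item][user] = float(rating)

    sims = {}
    items = sorted(by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            # Only users who rated both items contribute to the dot product.
            shared = by_item[a].keys() & by_item[b].keys()
            dot = sum(by_item[a][u] * by_item[b][u] for u in shared)
            norm_a = sqrt(sum(v * v for v in by_item[a].values()))
            norm_b = sqrt(sum(v * v for v in by_item[b].values()))
            sims[(a, b)] = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
    return sims


# Made-up sample rows for illustration only.
sample = [
    ("u1", "A", 5), ("u1", "B", 5),
    ("u2", "A", 1), ("u2", "B", 1),
    ("u3", "C", 4),
]
sims = item_cosine_similarity(sample)
```

Items A and B are rated identically by the same users, so their similarity comes out at (approximately) 1, while C shares no raters with either and scores 0.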

How to Contribute

If you have additional datasets to contribute, feel free to upload them and update this README with the necessary descriptions.

License

These datasets are provided for educational and research purposes. Please check individual datasets for any specific license information.


For any questions or suggestions, feel free to raise an issue or contact Lovnish Verma.

📊 Machine Learning Dataset Sources

A list of public datasets for machine learning, AI, data science, and analytics projects.


🔹 General-Purpose ML Repositories


🔹 Government & Open Data Portals


🔹 Domain-Specific Datasets

🖼️ Computer Vision

🌐 Web & NLP

🧬 Bio, Medical & Health

🗣️ Speech & Audio

  • OpenSLR – Speech recognition datasets.
  • LibriSpeech ASR – Audiobook dataset for speech recognition.

🗺️ Maps & Geospatial


✅ Quick Access Table

| Name | Domain | Link |
|---|---|---|
| UCI ML Repo | General | Link |
| Kaggle | General | Link |
| IndiaAI | Govt (India) | Link |
| Data.gov.in | Govt (India) | Link |
| Data.gov | Govt (USA) | Link |
| Data World | General | Link |
| Hugging Face | NLP/ML | Link |
| Papers with Code | Benchmarks | Link |
| Zenodo | Research | Link |

📌 Tip

For code integration and automatic downloads, you can often use Python libraries such as:

from datasets import load_dataset

dataset = load_dataset("imdb")  # Hugging Face example

You can also automate downloads from Kaggle via API:

kaggle datasets download -d username/dataset-name
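The same Kaggle command can be driven from a Python script. This is a minimal sketch, assuming the `kaggle` CLI is installed and an API token is already configured; the helper names `kaggle_download_cmd` and `download` and the `data` target folder are hypothetical, and `username/dataset-name` remains a placeholder.

```python
import subprocess


def kaggle_download_cmd(dataset: str, dest: str = "data", unzip: bool = True) -> list[str]:
    """Build the argument list for `kaggle datasets download`."""
    # -d selects the dataset slug, -p the target directory.
    cmd = ["kaggle", "datasets", "download", "-d", dataset, "-p", dest]
    if unzip:
        cmd.append("--unzip")
    return cmd


def download(dataset: str, dest: str = "data") -> None:
    """Run the download; requires the kaggle CLI on PATH and a configured token."""
    subprocess.run(kaggle_download_cmd(dataset, dest), check=True)
```

Calling `download("username/dataset-name")` would run the CLI and raise `subprocess.CalledProcessError` if the command fails. Keeping the command builder separate makes it easy to inspect the arguments without touching the network.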

Feel free to contribute more sources via pull request!
