This project analyzes and forecasts global Monkeypox (MPOX) cases from the 2022-2024 outbreak. It uses historical case data and the ARIMA and SARIMA time series models to predict future case trends. The results are intended to describe the outbreak's trajectory and inform public health strategies.
- Introduction
- Project Objectives
- Data Sources
- Methodology
- Results and Visualizations
- Project Structure
- How to Run the Project
- Dependencies
- Limitations and Future Work
- Contributing
- License
Monkeypox (MPOX) is an emerging zoonotic disease that has seen a resurgence in human cases globally since 2022. Accurate forecasting of Monkeypox incidence is crucial for public health planning and intervention. This project uses historical case data, explores key epidemiological trends, and applies forecasting techniques to model future cases.
- Analyze Historical Data: To identify key trends and patterns in global Monkeypox cases.
- Develop Predictive Models: Using ARIMA and SARIMA models to forecast future case numbers.
- Visualize Findings: To present complex data and results in an intuitive format using plots and visualizations.
- Provide Recommendations: Based on the forecast results, suggest actionable strategies for public health authorities.
The project utilizes publicly available data from reputable sources:
- World Health Organization (WHO): Monkeypox Situation Dashboard
- Centers for Disease Control and Prevention (CDC): Monkeypox Cases and Data
- Global Health Observatory (GHO): GHO Data Repository
The project employs a multi-step approach involving:
- Data Collection and Preprocessing: Importing datasets, handling missing values, and normalizing case numbers.
- Time Series Analysis: Using decomposition to identify trend, seasonality, and residual components.
- Model Selection and Training: Implementing ARIMA and SARIMA models to forecast future case trends.
- Model Evaluation: Using metrics like Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) to assess model accuracy.
- Visualization: Creating informative plots to present key insights and results.
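The evaluation step above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the weekly case counts, and the 6/2 train/test split are all assumptions made for the example. It scores a naive "repeat the last value" baseline with RMSE and MAPE; forecasts from the ARIMA and SARIMA models could be scored the same way and compared against this baseline.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Points where the actual value is 0 are skipped, because the
    percentage error is undefined there (a choice made for this sketch).
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is 0")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def naive_forecast(train, horizon):
    """Baseline forecast: repeat the last observed value `horizon` times."""
    return [train[-1]] * horizon


# Hypothetical weekly case counts, invented for illustration only.
weekly_cases = [120, 150, 180, 210, 200, 190, 160, 140]

# Hold out the last two weeks; never shuffle a time series before splitting.
train, test = weekly_cases[:6], weekly_cases[6:]
baseline = naive_forecast(train, len(test))

print(f"RMSE: {rmse(test, baseline):.2f}")
print(f"MAPE: {mape(test, baseline):.2f}%")
```

RMSE is in the same units as the case counts and penalizes large misses more heavily, while MAPE is scale-free but becomes unstable when actual counts are near zero, which is common late in an outbreak.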
- Global Monkeypox Cases and Deaths Over Time: Visualizing historical trends.
- Top 10 Locations by Total Cases and Cases per Million: Identifying regions with the highest burden.
- Case Fatality Rate Analysis: Understanding disease severity.
- Forecasting Plots: Predictions using ARIMA and SARIMA models, including next-year projections.
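The per-location figures above rest on two simple ratios. The sketch below shows one way to compute them and to rank locations; the record fields (`location`, `cases`, `deaths`, `population`), the helper names, and the numbers are hypothetical and are not taken from the project's dataset.

```python
def case_fatality_rate(deaths, cases):
    """Crude case fatality rate in percent: deaths / confirmed cases * 100."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


def cases_per_million(cases, population):
    """Cumulative cases per one million inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 1_000_000 / population


def top_locations(records, key, n=10):
    """Return the n records with the largest value of key(record)."""
    return sorted(records, key=key, reverse=True)[:n]


# Hypothetical records, invented for illustration only.
records = [
    {"location": "A", "cases": 5000, "deaths": 10, "population": 50_000_000},
    {"location": "B", "cases": 1200, "deaths": 6, "population": 2_000_000},
    {"location": "C", "cases": 300, "deaths": 0, "population": 10_000_000},
]

by_total = top_locations(records, key=lambda r: r["cases"])
by_rate = top_locations(
    records, key=lambda r: cases_per_million(r["cases"], r["population"])
)

print([r["location"] for r in by_total])
print([r["location"] for r in by_rate])
for r in records:
    print(r["location"], f"CFR {case_fatality_rate(r['deaths'], r['cases']):.2f}%")
```

Ranking by total cases and ranking by cases per million can give different orderings, which is why both views are useful. A crude CFR computed during an ongoing outbreak should be read with caution, since deaths are typically reported later than the cases they belong to.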
The repository is structured as follows:
├── data/                          # Directory for datasets (raw and processed)
│   ├── mpox_cases_global.csv
│   └── additional_data_sources/
├── notebooks/                     # Jupyter notebooks for exploratory analysis and model building
│   ├── data_analysis.ipynb
│   └── forecasting_models.ipynb
├── plots/                         # Directory for generated plots and visualizations
│   ├── global_cases_trend.png
│   └── sarima_forecast.png
├── README.md                      # Project description and overview
├── requirements.txt               # Python dependencies and libraries
└── src/                           # Source code for data processing and modeling
    ├── data_preprocessing.py
    ├── arima_model.py
    └── sarima_model.py
To run this project locally, follow these steps:
- Clone the Repository:
    git clone https://github.com/your-username/monkeypox-forecasting.git
    cd monkeypox-forecasting
- Create a Virtual Environment (Optional):
    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
- Install Dependencies: Install the required packages using the requirements.txt file:
    pip install -r requirements.txt
- Run the Jupyter Notebooks: Open the project’s Jupyter notebooks to run the analysis and model building scripts:
    jupyter notebook notebooks/data_analysis.ipynb
- View Plots and Results: Once the analysis is complete, navigate to the plots/ directory to view generated visualizations.
The following Python libraries are required for this project:
pandas
numpy
matplotlib
seaborn
statsmodels
scikit-learn
jupyter
To install them, run:
pip install -r requirements.txt
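After installing, a quick check that each library is importable can save a failed notebook run. The snippet below is a small sketch and not part of the repository. The mapping from pip package name to import name is an assumption worth checking: `scikit-learn` is imported as `sklearn`, and `jupyter` is checked here through `jupyter_core`, which is assumed to be installed alongside it.

```python
from importlib.util import find_spec

# pip package name -> top-level module name used for the import check.
# The jupyter -> jupyter_core entry is an assumption of this sketch.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "statsmodels": "statsmodels",
    "scikit-learn": "sklearn",
    "jupyter": "jupyter_core",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy libraries.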
While the project offers valuable insights, it has certain limitations:
- Data Quality: The accuracy of the models is dependent on the quality and completeness of the underlying data.
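- Model Complexity: ARIMA and SARIMA forecast a series from its own past values. They do not represent transmission dynamics, interventions such as vaccination, or changes in reporting, so they may miss abrupt shifts in the outbreak.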
- Future Enhancements: Incorporating additional forecasting approaches, such as LSTM networks or Prophet, could improve predictive accuracy.
Contributions are welcome! If you'd like to contribute to this project, please open an issue or submit a pull request. Ensure that your code adheres to the existing style and is well-documented.
This project is licensed under the MIT License. See the LICENSE file for details.