GitHub - General-Cow/Classifying-World-Economies: Verifying IMF Economic Classification of Countries

Classifying World Economies

Version 1.0 Updated: June 09, 2024

README CONTENTS

0.1 Contact Information

1.0 Release Contents
1.1 Introduction
1.2 Acknowledgements
1.3 Rights Information
1.4 Using this Database
1.5 Revision History

2.0 Data Tables
2.1 Original IMF Format
2.2 Processed Data Set Features
2.3 Target
2.4 Supervised Algorithms
2.5 Clustering

3.0 Results
3.1 Supervised Results
3.2 Clustering
3.3 Conclusion

4.0 Appendix
4.1 Advanced Countries List
4.2  Developing Countries List

0.1 Contact Information

Name: Email:

David Blankenship: [email protected]

Jai Vaidya: [email protected]

1.0 Release Contents

The contents of this dataset is listed below

  readme.txt     
  WEOOct2023allfix.xlsx
  Classifying_World_Economies_Notebook.ipnyb      
  Classifying World Economies.pptx

1.1 Introduction

Our goal for this project was to see if we could verify the classification of different countries as Advanced or Developing using real data and see if there are other potential categories being missed. We approached this using two methods:

Supervised, predictive classification of whether a country is a developing/emerging or advanced economy.

Unsupervised clustering to determine if there is another classification schema that could reasonably describe the data or see if the advanced and developing/emerging paradigm appears spontaneously from the data.

1.2 Acknowledgements

This project was created thanks to the collaborative efforts of David Blankenship and Jai Vaidya. This was originally created for their DSCI 631 class project at Drexel University in the Spring 2024 class.

1.3 Rights Information

International Monetary Fund

The original data source was the World Economic Outlook Oct 2023 dataset made by the IMF. It is available at their website listed here:

https://www.imf.org/en/publications/weo

1.4 Using this Database

This version of the database is available in a comma-separated value format. This data was put together in a Python 3 environment. Specifically, the Anaconda and Jupyter Notebook environment using the following packages:

Pandas
Numpy
Sci-kit learn
matplotlib
seaborn
joblib

It is recommended to import the data into your environment from the provided data files and begin analysis from there. For Python 3, we recommend importing the .xlsx file in as a pandas Dataframes using Panda's read_excel() function as outlined in the Jupyter Notebook and along with the functions defined in our EDA section to manipulate the data into the appropriate shape.

1.5 Revision History

 Version      Date            Comments
   1.0        June 2024       Initial release for DSCI 631 class project

2.0 Data Tables

The dataset that we used is the International Monetary Fund (IMF) World Economic Outlook (WEO) dataset.

It is a biannual dataset released by the IMF that is used to project economic forecasts in the near and medium term. In addition to the forecast, it's also a great collection of past economic data collected by the IMF.

It has data past data from 1980 up to 2021 and continues with predictions from there. It contains 44 features for 196 countries.

2.1 Original IMF Format

Initially all the data is shaped such that the rows are the features for every country stacked on top of one another while the columns are the years and various columns for the features like units, descriptions and so on.

2.2 Processed Data Set Features

We preprocess the dataset to remove employment in persons, as it is only collected by advanced economies and two columns that would be duplicated after coversion of national currencies to US dollars.

0   Gross domestic product, constant prices in US dollars
1   Gross domestic product, constant prices in Percent change
2   Gross domestic product, current prices in U.S. dollars
3   Gross domestic product, current prices in Purchasing power parity;
    international dollars
4   Gross domestic product, deflator in Index
5   Gross domestic product per capita, constant prices in US dollars
6   Gross domestic product per capita, constant prices in Purchasing power
    parity; 2017 international dollar
7   Gross domestic product per capita, current prices in U.S. dollars
8   Gross domestic product per capita, current prices in Purchasing power
    parity; international dollars
9   Gross domestic product based on purchasing-power-parity (PPP) share of
    world total in Percent 
10  Implied PPP conversion rate in National currency per current international
    dollar
11  Total investment in Percent of GDP
12  Gross national savings in Percent of GDP
13  Inflation, average consumer prices in Index
14  Inflation, average consumer prices in Percent change
15  Inflation, end of period consumer prices in Index
16  Inflation, end of period consumer prices in Percent change
17  Volume of imports of goods and services in Percent change
18  Volume of Imports of goods in Percent change
19  Volume of exports of goods and services in Percent change
20  Volume of exports of goods in Percent change
21  Unemployment rate in Percent of total labor force
22  Population in Persons
23  General government revenue in US dollars
24  General government revenue in Percent of GDP
25  General government total expenditure in US dollars
26  General government total expenditure in Percent of GDP
27  General government net lending/borrowing in US dollars
28  General government net lending/borrowing in Percent of GDP
29  General government structural balance in US dollars
30  General government structural balance in Percent of potential GDP
31  General government primary net lending/borrowing in US dollars
32  General government primary net lending/borrowing in Percent of GDP
33  General government net debt in US dollars
34  General government net debt in Percent of GDP
35  General government gross debt in US dollars
36  General government gross debt in Percent of GDP
37  Gross domestic product corresponding to fiscal year, current prices in US
    dollars
38  Current account balance in U.S. dollars
39  Current account balance in Percent of GDP

2.3 Target

We have a target variable set to 0 or 1 depending on whether the country is classified by the IMF as Developing or Advanced economies respectively.

See appendices for lists of the different countries in each category.

2.4 Supervised Algorithms

We implement a 5-fold cross-validation and a test split of 20%.

All models used K Nearest Neighbors = 3 for imputation of missing data points and recursive feature elimination for feature selection.

We used the following algorithms for the supervised learning with the selected hyperparameters:

Logistic Regression

Recursive Feature Elimination

Number of features to select: 1.0
Steps: 0.1

Model

Penalty: L1
C: 100

Decision Tree

Recursive Feature Elimination

Number of features to select: 0.1
Steps: 0.5

Model

Max Depth: 10
Minimum Samples Split: 2
Minimum Samples Leaf: 1

Random Forest

Recursive Feature Elimination

Number of features to select: 0.75
Steps: 1

Model

Max Depth: 10
Minimum Samples Split: 2
Minimum Samples Leaf: 1

2.5 Clustering

Before model building, K Nearest Neighbors = 3 is used for imputation of missing data points. PCA was then performed and 1st 2 components were fed into the models.

We used the following algorithms:

K Means Clustering

Elbow Plot and Silhouette Score Plot to identify optimal K

Score for k = 2: -11803.636634895442
Score for k = 3: -3640.8314135269616

DBSCAN

min_samples = 5
eps = 0.1, 0.2...0.8

Agglomerative Clustering

n_clusters = 2
Linkages used = "ward", "average", "complete", "single"

Gausian Mixture

n_components=2
covariance types = "full", "spherical", "diag", "tied"
AIC/BIC plot to identify optimal K
Best k: 5
Best Covariance Type: full

3.0 Results

This section provides a summary of the results for each analysis, the process to achieve these results, and a description of the challenges for each.

3.1 Supervised Results

Test Scores for Logistic Regression Model:

F1 Score (Test): 0.8311688311688312
Precision (Test): 0.8888888888888888
Recall (Test): 0.7804878048780488

Test Scores for Decision Tree Model:

F1 Score (Test): 0.9487179487179488
Precision (Test): 1.0
Recall (Test): 0.9024390243902439

Test Scores for Random Forest Model:

F1 Score (Test): 0.975609756097561
Precision (Test): 0.975609756097561
Recall (Test): 0.975609756097561

3.2 Clustering

K means (k=2)

normalized_mutual_info_score: 0.3955383189940556
adjusted_rand_score: 0.5777688363613122

DBSCAN

normalized_mutual_info_score: 0.05236904420828541
adjusted_rand_score: 0.09107931503444604

Agglomerative Clustering

normalized_mutual_info_score: 0.00743805187377737
adjusted_rand_score: 0.020076125670635537

Gaussian Mixtures

normalized_mutual_info_score: 0.17598207568900548
adjusted_rand_score: 0.3344522403257472

3.3 Conclusion

Overall, we are very satisfied with the results of our project. We set out to verify if the advanced/developing dichotomy was verifiable by the data and we were to find exactly this to a very high degree of fidelity with the supervised models. For the clustering models we saw that the clustering performance varies significantly across different algorithms. K-Means achieved the highest scores, indicating moderate alignment with the ground truth labels. Gaussian Mixtures performed moderately well. DBSCAN and Agglomerative Clustering showed poor performance, with particularly low scores for Agglomerative Clustering suggesting its clusters are almost random compared to the true labels. Overall, K-Means proved to be the most effective clustering method for this dataset.

4.0 Appendix

The appendix contains extra information that may be valuable for understanding the dataset.

4.1 Advanced Countries List

Andorra
Australia
Austria
Belgium
Canada
Croatia
Cyprus
Czech Republic
Denmark
Estonia
Finland
France
Germany
Greece
Hong Kong SAR
Iceland
Ireland
Israel
Italy
Japan
Korea
Latvia
Lithuania
Luxembourg
Macao SAR
Malta
Netherlands
New Zealand
Norway
Portugal
Puerto Rico
San Marino
Singapore
Slovak Republic
Slovenia
Spain
Sweden
Switzerland
Taiwan Province of China
United Kingdom
United States

4.2 Developing Countries List

Afghanistan
Albania
Algeria
Angola
Antigua and Barbuda
Argentina
Armenia
Aruba
Azerbaijan
The Bahamas
Bahrain
Bangladesh
Barbados
Belarus
Belize
Benin
Bhutan
Bolivia
Bosnia and Herzegovina
Botswana
Brazil
Brunei Darussalam
Bulgaria
Burkina Faso
Burundi
Cabo Verde
Cambodia
Cameroon
Central African Republic
Chad
Chile
China 
Colombia
Comoros
Democratic Republic of the Congo 
Republic of Congo
Costa Rica
Côte d'Ivoire
Djibouti
Dominica
Dominican Republic
Ecuador
Egypt
El Salvador
Equatorial Guinea
Eritrea
Eswatini
Ethiopia
Fiji
Gabon
The Gambia
Georgia
Ghana
Grenada
Guatemala
Guinea
Guinea-Bissau
Guyana
Haiti
Honduras
Hungary
India
Indonesia
Islamic Republic of Iran
Iraq
Jamaica
Jordan
Kazakhstan
Kenya
Kiribati
Kosovo
Kuwait
Kyrgyz Republic
Lao P.D.R.
Lebanon
Lesotho
Liberia
Libya
Madagascar
Malawi
Malaysia
Maldives
Mali
Marshall Islands
Mauritania
Mauritius
Mexico
Micronesia
Moldova
Mongolia
Montenegro
Morocco
Mozambique
Myanmar
Namibia
Nauru
Nepal
Nicaragua
Niger
Nigeria
North Macedonia
Oman
Pakistan
Palau
Panama
Papua New Guinea
Paraguay
Peru
Philippines
Poland
Qatar
Romania
Russia
Rwanda
Samoa
São Tomé and Príncipe
Saudi Arabia
Senegal
Serbia
Seychelles
Sierra Leone
Solomon Islands
Somalia
South Africa
South Sudan
Sri Lanka
St. Kitts and Nevis
St. Lucia
St. Vincent and the Grenadines
Sudan
Suriname
Syria - Of note Syria is dropped in our dataset due to no data being collected
        since 2011 due to the ongoing Syrian Civil War
Tajikistan
Tanzania
Thailand
Timor-Leste
Togo
Tonga
Trinidad and Tobago
Tunisia
Türkiye
Turkmenistan
Tuvalu
Uganda
Ukraine
United Arab Emirates
Uruguay
Uzbekistan
Vanuatu
Venezuela	
Vietnam	
West Bank and Gaza	
Yemen
Zambia
Zimbabwe

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
Classifying World Economies_Final.pptx		Classifying World Economies_Final.pptx
Classifying_World_Economies_Notebook.ipynb		Classifying_World_Economies_Notebook.ipynb
ExplainingWorldEconomies_DavidBlankenship.pdf		ExplainingWorldEconomies_DavidBlankenship.pdf
LICENSE		LICENSE
README.md		README.md
WEOOct2023allfix.xlsx		WEOOct2023allfix.xlsx
XAI Followup.ipynb		XAI Followup.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

About

Uh oh!

Releases

Packages

Languages

License

General-Cow/Classifying-World-Economies

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages