Repository: https://github.com/chirindaopensource/election_polarization_analysis
Owner: 2025 Craig Chirinda (Open Source Projects)
- Introduction
- Theoretical Background
- Features
- Methodology Implemented
- Core Components (Notebook Structure)
- Key Callable: `analyze_electoral_polarization`
- Prerequisites
- Input Data Structure
- Usage
- Output Structure
- Customization
- License
- Citation
- Contributing
This project provides a Python implementation of the methodologies presented in the paper "Comparing Electoral Polarization Levels" by Boris Ginzburg (arXiv:2411.04072, 2024). The core of this repository is the IPython notebook `comparing_election_polarization_levels_draft.ipynb`, which contains a suite of functions for analyzing the ideological polarization of an electorate around a particular central point.
The framework introduced by Ginzburg allows for flexibility in defining the central point and does not assume specific boundaries for the "center." This implementation enables researchers to:
- Establish whether polarization is occurring.
- Identify the position around which polarization is happening.
- Relate ideological polarization to affective polarization and the increased salience of divisive issues.
This codebase is intended for researchers and students in political science, economics, data science, and related fields who require robust tools for quantitative polarization analysis.
The implemented methods are grounded in the theoretical constructs proposed by Ginzburg (2024):
- Ideological Polarization (Definition 1): Defines an electorate as more polarized around a chosen $x^*$ if the share of voters belonging to any interval that includes $x^*$ is weakly smaller in one distribution ($\hat{F}$) compared to another ($F$).
- Condition for Polarization (Proposition 1): Provides a necessary and sufficient condition for comparing polarization based on the difference between two cumulative distribution functions (CDFs), $\hat{F}(x) - F(x)$, relative to $x^*$.
- Affective Polarization (Proposition 2): Links ideological polarization to affective polarization (dislike towards opposing groups). It models animosity as $g(|x - m_j|)$, where $m_j$ is the mean of the opposing group and $g(\cdot)$ is an increasing function. Increased ideological polarization around $x^*$ (a group boundary) implies increased average animosity.
- Issue Salience (Proposition 3): Explores how increased salience ($\alpha$) of a "divisive" issue $d$ relative to a "common-value" issue $c$ (where voter position $x = (1-\alpha)c + \alpha d$) can increase ideological and, consequently, affective polarization.
- Continuous Polarization Measure: Proposes a continuous measure $P(F,x^{*}) = a\left[\int_{x<x^{*}}F(x)\,dx\right] - b\left[\int_{x>x^{*}}F(x)\,dx\right]$, where $a(\cdot)$ and $b(\cdot)$ are strictly increasing functions, to provide a complete ordering of distributions.
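The Proposition 1 condition can be checked mechanically once two CDFs are available on a common grid. The following is a minimal sketch rather than the notebook's implementation: the function names `cdf_from_frequencies` and `prop1_candidates`, the float tolerance, and the example frequencies are assumptions made for illustration. It applies the sign condition as this README states it ($\Delta(x) \ge 0$ for $x \le x^*$ and $\Delta(x) \le 0$ for $x \ge x^*$, with $\Delta = \hat{F} - F$).

```python
from itertools import accumulate


def cdf_from_frequencies(freqs):
    """Cumulative sums of relative frequencies on a unit-spaced grid."""
    return list(accumulate(freqs))


def prop1_candidates(cdf_old, cdf_new, grid, tol=1e-9):
    """Return grid points x* where delta = cdf_new - cdf_old is >= 0
    for all x <= x* and <= 0 for all x >= x* (up to a float tolerance)."""
    delta = [new - old for old, new in zip(cdf_old, cdf_new)]
    candidates = []
    for i, x_star in enumerate(grid):
        left_ok = all(d >= -tol for d in delta[: i + 1])
        right_ok = all(d <= tol for d in delta[i:])
        if left_ok and right_ok:
            candidates.append(x_star)
    return candidates


# Hypothetical relative frequencies on a 0-4 scale (illustrative only).
grid = [0, 1, 2, 3, 4]
f_old = [0.1, 0.2, 0.4, 0.2, 0.1]  # earlier distribution F
f_new = [0.2, 0.2, 0.3, 0.1, 0.2]  # later distribution F-hat, more mass in the tails

cdf_old = cdf_from_frequencies(f_old)
cdf_new = cdf_from_frequencies(f_new)
candidates = prop1_candidates(cdf_old, cdf_new, grid)
print(candidates)
```

An empty result means no grid point satisfies the condition, so under this sketch the two distributions would not be ranked by Proposition 1.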
The provided IPython notebook (`comparing_election_polarization_levels_draft.ipynb`) implements a full pipeline for polarization analysis, including:
- Input Validation: Rigorous checks for input data schema, parameter types, and value ranges.
- Data Preparation: Cleaning of survey data, filtering by year, calculation of relative frequency distributions, and Cumulative Distribution Functions (CDFs) for voter ideological positions.
- Central Point ($x^*$) Definition: Flexible calculation of the central point $x^*$ based on user-defined criteria (mean, median, mode, or a fixed numeric value) for each distribution.
- Ideological Polarization Comparison (Definition 1): Systematic comparison of voter shares in intervals around $x^*$ between two distributions.
- Ideological Polarization Comparison (Proposition 1): Application of the CDF-difference conditions to assess polarization changes.
- Affective Polarization Measurement: Calculation of affective polarization scores using a customizable animosity function $g(\cdot)$.
- Issue Salience Impact Evaluation: Simulation of varying issue salience ($\alpha$) and its effect on synthetic voter distributions and their polarization levels.
- Continuous Polarization Measure Calculation: Computation of the $P(F, x^*)$ score using customizable $a(\cdot)$ and $b(\cdot)$ functions.
- Comprehensive Reporting: Generation of a structured dictionary containing all inputs, configurations, intermediate results, final metrics, and metadata.
- Customizable Functions: Support for user-defined functions for animosity ($g(\cdot)$) and for the continuous measure ($a(\cdot), b(\cdot)$).
- Synthetic Data Generation: Utility to create synthetic datasets for testing and demonstration.
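The data preparation and central point features listed above can be illustrated with the standard library alone. This is a minimal sketch under stated assumptions: `relative_frequencies` and `central_point` are hypothetical helper names (not the notebook's functions), the scores are made-up survey answers, and the notebook itself relies on pandas, NumPy, and SciPy rather than the `statistics` module used here.

```python
from collections import Counter
from itertools import accumulate
from statistics import mean, median, mode


def relative_frequencies(scores, scale):
    """Share of respondents at each point of the discrete scale."""
    counts = Counter(scores)
    n = len(scores)
    return [counts.get(x, 0) / n for x in scale]


def central_point(scores, method):
    """x* as the mean, median, or mode of the scores, or a fixed number."""
    if isinstance(method, (int, float)):
        return float(method)
    rules = {"mean": mean, "median": median, "mode": mode}
    return float(rules[method](scores))


# Hypothetical survey answers on a 0-10 scale (illustrative only).
scale = list(range(11))
scores = [2, 3, 3, 5, 5, 5, 6, 7, 8, 10]

freqs = relative_frequencies(scores, scale)
cdf = list(accumulate(freqs))  # F(x_i): cumulative share up to score x_i
x_star_mean = central_point(scores, "mean")
x_star_median = central_point(scores, "median")
x_star_mode = central_point(scores, "mode")
x_star_fixed = central_point(scores, 5)
```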
The core analytical steps directly implement the definitions and propositions from Ginzburg (2024):
- Distribution Representation: Voter ideological positions are represented by empirical Cumulative Distribution Functions (CDFs), $F(x)$, derived from survey data. For discrete scales (e.g., 0-10), $F(x_i)$ is the sum of relative frequencies up to score $x_i$.
- Central Point $x^*$: Calculated for each distribution (issue/year) as specified by the user (mean, median, mode of $F(x)$, or a fixed value).
- Definition 1 Comparison:
  - For a reference $x^*$ (typically from the earlier distribution $F$), intervals $[\underline{x}, \overline{x}]$ containing $x^*$ are generated (e.g., $[x^*-k, x^*+k]$).
  - The share of voters $S_F = F(\overline{x}) - F(\underline{x})$ is calculated. For discrete data, this is $F(\lfloor\overline{x}\rfloor) - F(\lceil\underline{x}\rceil-1)$.
  - Polarization is higher under $\hat{F}$ (later distribution) than $F$ if $S_{\hat{F}} \le S_F$ for all such intervals.
- Proposition 1 Comparison:
  - The difference $\Delta(x) = \hat{F}(x) - F(x)$ is calculated.
  - Polarization is higher under $\hat{F}$ if an $x^*$ exists such that:
    - $\Delta(x) \ge 0$ for all $x \le x^*$
    - $\Delta(x) \le 0$ for all $x \ge x^*$
- Affective Polarization (Proposition 2):
  - Voters are partitioned into Left ($x < x^*$) and Right ($x > x^*$) groups.
  - Mean positions $m_L$ and $m_R$ are calculated for these groups.
  - Affective polarization $A(F) = \sum_{x<x^{*}} g(|x - m_R|) f(x) + \sum_{x>x^{*}} g(|x - m_L|) f(x)$, where $f(x)$ is the relative frequency at $x$, and $g(\cdot)$ is a user-supplied increasing animosity function.
- Issue Salience (Proposition 3):
  - Synthetic voter positions $x_{\alpha} = (1-\alpha)c + \alpha d$ are generated for common respondents across a "common-value" issue ($c$) and a "divisive" issue ($d$) for various salience weights $\alpha$.
  - The resulting distributions $F_\alpha(x)$ are analyzed for changes in polarization (using Def. 1 / Prop. 1) as $\alpha$ varies.
- Continuous Polarization Measure:
  - $P(F,x^{*}) = a\left[\sum_{x_i<x^{*}}F(x_i)\right] - b\left[\sum_{x_i>x^{*}}F(x_i)\right]$ (using sums for discrete unit-spaced CDFs).
  - $a(\cdot)$ and $b(\cdot)$ are user-supplied strictly increasing functions.
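Two of the steps above, the Definition 1 interval comparison and the discrete continuous measure, can be sketched in a few lines. This is an illustrative sketch, not the notebook's code: the helper names, the tolerance, the choice to enumerate every grid interval containing $x^*$, and the example CDFs are assumptions. The default identity functions for $a(\cdot)$ and $b(\cdot)$ correspond to $f(y)=y$.

```python
import math
from itertools import accumulate


def cdf_at(cdf, x):
    """Evaluate a CDF stored as a list over the integer grid 0..len(cdf)-1."""
    if x < 0:
        return 0.0
    return cdf[min(x, len(cdf) - 1)]


def interval_share(cdf, lower, upper):
    """Share of voters in [lower, upper]: F(floor(upper)) - F(ceil(lower) - 1)."""
    return cdf_at(cdf, math.floor(upper)) - cdf_at(cdf, math.ceil(lower) - 1)


def more_polarized_def1(cdf_old, cdf_new, x_star, tol=1e-9):
    """True if every grid interval containing x* holds a weakly smaller
    share under cdf_new than under cdf_old."""
    top = len(cdf_old) - 1
    lowers = range(0, math.floor(x_star) + 1)
    uppers = range(math.ceil(x_star), top + 1)
    return all(
        interval_share(cdf_new, lo, hi) <= interval_share(cdf_old, lo, hi) + tol
        for lo in lowers
        for hi in uppers
    )


def continuous_measure(cdf, x_star, a=lambda y: y, b=lambda y: y):
    """P(F, x*) = a(sum of F below x*) - b(sum of F above x*)."""
    left = sum(value for x, value in enumerate(cdf) if x < x_star)
    right = sum(value for x, value in enumerate(cdf) if x > x_star)
    return a(left) - b(right)


# Hypothetical relative frequencies on a 0-4 scale (illustrative only).
cdf_old = list(accumulate([0.1, 0.2, 0.4, 0.2, 0.1]))  # earlier F
cdf_new = list(accumulate([0.2, 0.2, 0.3, 0.1, 0.2]))  # later F-hat

def1_result = more_polarized_def1(cdf_old, cdf_new, x_star=2)
p_old = continuous_measure(cdf_old, x_star=2)
p_new = continuous_measure(cdf_new, x_star=2)
```

In this example the later distribution is more polarized under Definition 1 and also receives the larger $P(F, x^*)$ score, which is the agreement one would expect when the binary comparison holds.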
The IPython notebook `comparing_election_polarization_levels_draft.ipynb` is structured to follow a logical pipeline, orchestrated by the main `analyze_electoral_polarization` function. The key functional blocks within the notebook include:
- Type Definitions and Configuration Constants: Defines all custom types and default parameters.
- Input Validation Utilities: Functions to validate the schema and content of all input parameters. (`_validate_...` functions, `validate_all_input_parameters`)
- Data Preparation: Functions for cleaning data, filtering by year, and calculating frequency distributions and CDFs. (`_clean_and_filter_single_dataframe`, `_calculate_frequencies_and_cdfs_for_issue`, `prepare_voter_position_data`)
- Central Point $x^*$ Definition: Functions to calculate $x^*$ based on specified methods. (`_calculate_x_star_for_single_year`, `define_central_points`)
- Definition 1 Implementation: Functions to generate intervals and compare voter shares. (`_generate_intervals_around_x_star`, `_calculate_share_in_interval`, `compare_polarization_definition1`)
- Proposition 1 Implementation: Functions to analyze CDF differences and check conditions. (`_find_proposition1_x_star_and_crossings`, `apply_proposition1_conditions`)
- Affective Polarization Measurement: Functions for group statistics and calculating $A(F)$. (`_calculate_group_statistics`, `measure_affective_polarization`)
- Issue Salience Evaluation: Functions to generate synthetic distributions for $x_\alpha$ and analyze their polarization. (`_generate_synthetic_distribution`, `evaluate_issue_salience_impact`)
- Continuous Polarization Measure Calculation: Function to compute $P(F, x^*)$. (`calculate_continuous_polarization_measure`)
- Comprehensive Reporting: Function to aggregate all results. (`generate_comprehensive_report`)
- Main Orchestrator: The `analyze_electoral_polarization` function that calls all the above steps.
- Custom Function Examples: Implementations of example animosity functions ($g(\cdot)$) and intensity functions ($a(\cdot), b(\cdot)$). (`linear_animosity_function`, `quadratic_animosity_function`, `identity_function_for_continuous_measure`, `cubic_function_for_continuous_measure`)
- Synthetic Data Generation: Utility to create sample data. (`create_synthetic_data`)
- Usage Example: A script demonstrating how to run the pipeline. (`run_example_analysis_with_custom_functions`)
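As one illustration of these blocks, the issue salience step combines each respondent's position on a common-value issue and a divisive issue as $x_\alpha = (1-\alpha)c + \alpha d$. The sketch below is not the notebook's `_generate_synthetic_distribution`; the names `mix_positions` and `share_within`, the dictionary-based input, and the scores are hypothetical and chosen only to show the mechanics.

```python
def mix_positions(common, divisive, alpha):
    """x_alpha = (1 - alpha) * c + alpha * d for respondents present in both issues."""
    shared = sorted(set(common) & set(divisive))
    return {rid: (1 - alpha) * common[rid] + alpha * divisive[rid] for rid in shared}


def share_within(positions, x_star, k):
    """Share of respondents whose position lies in [x* - k, x* + k]."""
    inside = sum(1 for x in positions.values() if x_star - k <= x <= x_star + k)
    return inside / len(positions)


# Hypothetical scores on a 0-10 scale keyed by respondent_id (illustrative only).
# r5 and r6 answer only one issue, so they drop out of the common sample.
common = {"r1": 5, "r2": 5, "r3": 6, "r4": 4, "r5": 5}
divisive = {"r1": 0, "r2": 10, "r3": 9, "r4": 1, "r6": 5}

shares = {}
for alpha in (0.0, 0.25, 1.0):
    positions = mix_positions(common, divisive, alpha)
    shares[alpha] = share_within(positions, x_star=5, k=1.5)
```

In this made-up sample the share of voters near $x^* = 5$ falls as $\alpha$ rises, which is the pattern Definition 1 reads as increasing polarization; real data need not behave this way.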
The central function in this project is `analyze_electoral_polarization`. It orchestrates the entire analytical workflow.
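from typing import Dict, List
import pandas as pd

# CentralPositionInputType and ComprehensiveReportType are custom types
# defined in the notebook's type definitions cell.
def analyze_electoral_polarization(
    data_dict: Dict[str, pd.DataFrame],
    x_star_param: CentralPositionInputType,
    years_for_analysis: List[int],
    perform_salience_analysis: bool = False,
    # ... (other parameters as defined in the notebook)
) -> ComprehensiveReportType:
    # ... (implementation)
    ...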
This function takes the raw data and configuration parameters, performs all analytical steps, and returns a comprehensive report dictionary. Refer to its docstring in the notebook for detailed parameter descriptions.
- Python 3.8+ (due to extensive use of `typing` features including `TypedDict`; for Python < 3.8, the `typing_extensions` library might be required, as noted in the type definitions cell).
- pandas: For data manipulation and DataFrame structures.
- NumPy: For numerical operations, especially array manipulations.
- SciPy: Specifically `scipy.stats.mode` for modal calculation of $x^*$.
To install dependencies (assuming you have Python and pip):
pip install pandas numpy scipy
# If using Python < 3.8 and type hints cause issues:
# pip install typing_extensions
The primary data input for the `analyze_electoral_polarization` function is a dictionary of pandas DataFrames (`data_dict`):
- Type: `Dict[str, pd.DataFrame]`
- Keys: Strings representing issue names (e.g., "Economic_Policy", "ANES_LibCon_Scale").
- Values: Pandas DataFrames, where each DataFrame corresponds to an issue and must contain the following columns:
  - `respondent_id`: Unique identifier for each respondent (str, int, or object).
  - `year`: Integer representing the survey year.
  - `ideology_score`: Numeric value (int or float) representing the respondent's self-reported ideological position on a defined scale (e.g., 0-10). Values must be within `ideology_scale_bounds` and free of NaNs/Infs after initial cleaning.
  - `group_label` (Optional): String or object for pre-defined group affiliations, if any. Not directly used by all core polarization measures but can be useful for contextual analysis.
Example `data_dict` entry:
import pandas as pd

# Each `...` below stands for further rows elided from the example.
data_dict = {
    "Issue1": pd.DataFrame({
        'respondent_id': ['resp_1', 'resp_2', ...],
        'year': [2000, 2000, ...],
        'ideology_score': [3, 7, ...]
    }),
    # ... other issues
}
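Survey extracts often arrive as a single long table rather than one table per issue. The sketch below shows one possible way to reshape such a table into the `data_dict` layout and check the required columns. It is an assumption-laden illustration: the `issue` column name, the `build_data_dict` helper, and the sample rows are hypothetical and not part of the notebook.

```python
import pandas as pd

REQUIRED_COLUMNS = {"respondent_id", "year", "ideology_score"}


def build_data_dict(long_df, issue_column="issue"):
    """Split a long-format survey table into one DataFrame per issue."""
    data_dict = {}
    for issue, frame in long_df.groupby(issue_column):
        frame = frame.drop(columns=[issue_column]).reset_index(drop=True)
        missing = REQUIRED_COLUMNS - set(frame.columns)
        if missing:
            raise ValueError(f"{issue}: missing columns {sorted(missing)}")
        data_dict[str(issue)] = frame
    return data_dict


# Hypothetical long-format survey extract (illustrative only).
long_df = pd.DataFrame({
    "issue": ["Economic_Policy", "Economic_Policy", "ANES_LibCon_Scale"],
    "respondent_id": ["resp_1", "resp_2", "resp_1"],
    "year": [2000, 2000, 2000],
    "ideology_score": [3, 7, 5],
})

data_dict = build_data_dict(long_df)
```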
- Clone the Repository:
      git clone https://github.com/chirindaopensource/election_polarization_analysis.git
      cd election_polarization_analysis
- Ensure Prerequisites are Installed: See Prerequisites.
- Open and Run the Notebook: Open `comparing_election_polarization_levels_draft.ipynb` in a Jupyter Notebook or JupyterLab environment.
- Execute Cells: Run the cells in order. The notebook is structured to define all helper functions and type definitions before they are used.
- Examine the Usage Example: The final cells of the notebook, particularly the `run_example_analysis_with_custom_functions()` function call, demonstrate how to:
  - Generate synthetic data (for demonstration).
  - Configure various analysis parameters.
  - Pass custom functions for animosity and continuous measure calculations.
  - Execute the main `analyze_electoral_polarization` pipeline.
  - Inspect the output report.
- Adapt for Real Data:
  - Replace the synthetic data generation step (`create_synthetic_data`) with your own data loading and preprocessing logic. Ensure your data is transformed into the `data_dict` format described in Input Data Structure.
  - Adjust parameters in the `analyze_electoral_polarization` call (e.g., `years_for_analysis`, `x_star_param`, custom function choices) to suit your research question and dataset.
  - Carefully review the `DataQualityReportType` section of the output to ensure data integrity and sufficient sample sizes.
The `analyze_electoral_polarization` function returns a `ComprehensiveReportType`, which is a nested Python dictionary. This report contains:
- `metadata`: Timestamps, Python/library versions, custom notes.
- `input_parameters_summary`: A record of the main configurations used for the run.
- `data_quality_and_preprocessing`: Detailed metrics from data cleaning for each issue.
- `central_point_x_star_details`: Calculated $x^*$ values, methods used, and any metadata (e.g., warnings during calculation) for each issue/year.
- `polarization_binary_comparisons`:
  - `definition1`: Summary outcomes and detailed interval checks.
  - `proposition1`: Summary outcomes and detailed condition checks, including the identified $x^*$ for Proposition 1.
- `affective_polarization`: Affective polarization scores ($A(F)$), group means ($m_L, m_R$), and group sizes for each issue/year.
- `issue_salience_effects` (if performed): DataFrames showing how polarization metrics change with varying issue salience ($\alpha$).
- `continuous_polarization_measures`: Calculated $P(F, x^*)$ scores for each issue/year.
- `computational_diagnostics`: Notes on numerical stability or issues encountered.
- `pipeline_status`: 'Success' or 'Failed'.
- `error_message`: Details if the pipeline failed.
This dictionary can be easily inspected, saved to JSON, or used to generate further tables and visualizations.
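Because the report may hold DataFrames, NumPy values, and timestamps, a direct `json.dumps` call can fail on it. The following is a minimal sketch of one way to convert such a nested dictionary first. The `to_jsonable` helper is hypothetical, and the sample `report` only borrows a few top-level key names from the list above; its inner layout and values are invented for illustration.

```python
import json
from datetime import datetime


def to_jsonable(value):
    """Recursively convert a report into JSON-friendly built-in types."""
    if isinstance(value, dict):
        return {str(key): to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    if hasattr(value, "to_dict"):   # e.g. a pandas DataFrame or Series
        return to_jsonable(value.to_dict())
    if hasattr(value, "tolist"):    # e.g. a NumPy array or scalar
        return to_jsonable(value.tolist())
    if isinstance(value, (str, int, float, bool)) or value is None:
        return value
    return str(value)               # fallback, e.g. timestamps


# Hypothetical report fragment (inner structure is an assumption).
report = {
    "metadata": {"generated_at": datetime(2025, 1, 1, 12, 0)},
    "pipeline_status": "Success",
    "continuous_polarization_measures": {"Issue1": {2000: -1.5, 2004: -1.2}},
    "error_message": None,
}

text = json.dumps(to_jsonable(report), indent=2)
loaded = json.loads(text)
```

Note that JSON object keys are always strings, so integer keys such as years come back as `"2000"` after a round trip.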
The pipeline offers flexibility through several key parameters, notably:
- Choice of $x^*$: The `x_star_param` allows selection of mean, median, mode, or a fixed value.
- Animosity Function $g(\cdot)$: The `g_func_affective_pol` parameter accepts any Python callable that takes a distance (float or NumPy array) and returns an animosity value. The notebook provides `linear_animosity_function` ($g(d)=d$) and `quadratic_animosity_function` ($g(d)=d^2$) as examples. Users must also provide a string name for reporting via `g_func_name_affective_pol`.
- Continuous Measure Functions $a(\cdot), b(\cdot)$: The `a_func_continuous_measure` and `b_func_continuous_measure` parameters accept Python callables that take a float (sum of CDF values) and return a float. These functions must be strictly increasing. The notebook provides `identity_function_for_continuous_measure` ($f(y)=y$) and `cubic_function_for_continuous_measure` ($f(y)=y^3$) as examples. Users must also provide string names for reporting via `a_func_name_continuous_measure` and `b_func_name_continuous_measure`.
By defining and passing their own functions for these parameters, users can tailor the analysis to specific theoretical assumptions or empirical needs.
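To make this concrete, the sketch below defines a custom animosity function and evaluates the affective polarization sum $A(F) = \sum_{x<x^{*}} g(|x - m_R|) f(x) + \sum_{x>x^{*}} g(|x - m_L|) f(x)$ on a small discrete distribution. It is a standalone illustration, not the notebook's `measure_affective_polarization`: `sqrt_animosity`, `group_mean`, `affective_polarization`, `is_strictly_increasing`, and the frequencies are all hypothetical.

```python
def sqrt_animosity(distance):
    """Hypothetical concave animosity g(d) = d ** 0.5 (floats or NumPy arrays)."""
    return distance ** 0.5


def group_mean(freqs, scale, keep):
    """Frequency-weighted mean position of the grid points selected by keep(x)."""
    mass = sum(f for x, f in zip(scale, freqs) if keep(x))
    return sum(x * f for x, f in zip(scale, freqs) if keep(x)) / mass


def affective_polarization(freqs, scale, x_star, g):
    """A(F): animosity of left voters toward m_R plus right voters toward m_L."""
    m_left = group_mean(freqs, scale, lambda x: x < x_star)
    m_right = group_mean(freqs, scale, lambda x: x > x_star)
    left_term = sum(g(abs(x - m_right)) * f for x, f in zip(scale, freqs) if x < x_star)
    right_term = sum(g(abs(x - m_left)) * f for x, f in zip(scale, freqs) if x > x_star)
    return left_term + right_term


def is_strictly_increasing(func, points):
    """Spot-check monotonicity on a finite grid of points (not a proof)."""
    values = [func(p) for p in points]
    return all(lo < hi for lo, hi in zip(values, values[1:]))


# Hypothetical relative frequencies on a 0-4 scale (illustrative only).
scale = [0, 1, 2, 3, 4]
f_old = [0.1, 0.2, 0.4, 0.2, 0.1]
f_new = [0.2, 0.2, 0.3, 0.1, 0.2]

a_old_linear = affective_polarization(f_old, scale, 2, lambda d: d)
a_new_linear = affective_polarization(f_new, scale, 2, lambda d: d)
a_old_sqrt = affective_polarization(f_old, scale, 2, sqrt_animosity)
a_new_sqrt = affective_polarization(f_new, scale, 2, sqrt_animosity)
```

A quick check such as `is_strictly_increasing(sqrt_animosity, [0, 1, 2, 3])` can catch obviously unsuitable candidates for $g(\cdot)$, $a(\cdot)$, or $b(\cdot)$ before they are passed to the pipeline, although it only tests the points supplied.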
This project is licensed under the MIT License. See the `LICENSE` file (not included here, but should be created in the repository root) for details. The text of the MIT license is as follows:
MIT License
Copyright (c) 2025 Craig Chirinda (Open Source Projects)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
If you use this codebase or the methodologies it implements in your research, please cite the original paper:
- Ginzburg, B. (2024). Comparing Electoral Polarization Levels. arXiv preprint arXiv:2411.04072. Available at: https://arxiv.org/abs/2411.04072
Consider also acknowledging this GitHub repository if the implementation itself was significantly helpful.
Contributions to this project are welcome. Please consider the following guidelines:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Ensure your code adheres to PEP-8 standards and includes thorough type hinting and docstrings.
- Write unit tests for new functionality.
- Submit a pull request with a clear description of your changes.
(Further details on development setup, testing framework, and specific contribution areas can be added as the project matures.)
--
This README was generated based on the structure and content of `comparing_election_polarization_levels_draft.ipynb`.