Problem saving model data (NetCDF) with pandas' integer type #1718

jsnyde0 · 2025-05-27T14:03:29Z

I get a TypeError when trying to save my model (mmm.save("model.nc")). The problem seems to occur when my data uses pandas' nullable integer type, Int64Dtype.

What I'm doing:

I'm building a Marketing Mix Model (MMM). In my model, the main thing I'm trying to predict (my y variable) is the number of units sold. These are naturally whole numbers (integers). Pandas seems to use Int64Dtype for this column.

When ArviZ tries to save the InferenceData (which includes my sales data), it gives a TypeError if that Int64Dtype is present. It looks like the saving process doesn't quite know how to handle this specific pandas integer type.
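It helps to check which kind of dtype a column actually has. Here is a minimal pandas-only sketch. `Int64` (capital I) is pandas' nullable integer extension dtype and is not a NumPy dtype. `int64` (lowercase) is a plain NumPy dtype. A plausible, but unconfirmed, explanation for the TypeError is that the NetCDF writing path expects NumPy dtypes. The helper name `describe_dtype` is hypothetical.

```python
import numpy as np
import pandas as pd


def describe_dtype(series: pd.Series) -> str:
    """Hypothetical helper: report whether a Series uses a NumPy or a pandas extension dtype."""
    if isinstance(series.dtype, np.dtype):
        return f"NumPy dtype: {series.dtype}"
    return f"pandas extension dtype: {series.dtype}"


values = [100, 150, 20, 200, 120]
nullable = pd.Series(values, dtype="Int64")  # capital I: pandas nullable integer
plain = pd.Series(values, dtype="int64")     # lowercase: NumPy int64

print(describe_dtype(nullable))  # pandas extension dtype: Int64
print(describe_dtype(plain))     # NumPy dtype: int64
print(isinstance(nullable.dtype, pd.Int64Dtype))  # True
```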

Here's a simple code example that shows the problem:

```python
import arviz as az
import numpy as np
import pandas as pd
import xarray as xr


def run_simplified_reproducible_example():
    print(
        f"Running with: pandas {pd.__version__}, xarray {xr.__version__}, "
        f"arviz {az.__version__}, numpy {np.__version__}"
    )

    # 1. Simulate sales data: whole numbers stored with pandas' nullable "Int64" dtype
    sales_data = pd.Series(
        [100, 150, 20, 200, 120], dtype="Int64", name="units_sold"
    )

    # 2. Put it into an xarray.DataArray
    sales_data_array = xr.DataArray(
        sales_data,
        dims=["time_period"],
        coords={"time_period": np.arange(len(sales_data))},
        name="units_sold_observed",
    )

    # 3. Create an arviz.InferenceData object (similar to what my MMM produces)
    model_dataset = xr.Dataset({sales_data_array.name: sales_data_array})
    inference_data_to_save = az.InferenceData(observed_data=model_dataset)

    # 4. Try to save it (this is where the error usually happens)
    output_filename = "test_sales_model_save.nc"
    print(f"\nTrying to save to '{output_filename}'...")
    try:
        inference_data_to_save.to_netcdf(output_filename)
        print(f"Saved '{output_filename}' successfully (UNEXPECTED if the issue exists).")
    except TypeError as e:
        print("\n--- EXPECTED TypeError ---")
        print(f"Could not save '{output_filename}'. Error: {e}")
        print("This is the TypeError I'm seeing due to the Int64Dtype.")
        print("--- END OF TypeError ---")
    except Exception as e:
        print("\n--- Some Other Error ---")
        print(f"A different error happened: {e}")
        print("--- END OF Other Error ---")


if __name__ == "__main__":
    run_simplified_reproducible_example()
```

My own versions:

pandas version: 2.2.3
xarray version: 2025.4.0
arviz version: 0.21.0
numpy version: 2.2.6

Solution

I'm not sure whether this should be solved here, by converting data types before saving (which is what I'm currently doing), or whether it should be moved over to ArviZ.
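The conversion can also be done upstream, before the data reaches the model. The sketch below is a workaround, not a fix in ArviZ or xarray. It assumes the input is a pandas DataFrame, and the helper name `to_numpy_dtypes` is hypothetical. Nullable integer columns with no missing values become the matching NumPy integer dtype. Columns that do have missing values become `float64` with NaN, because NumPy integer arrays cannot represent a missing value.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_extension_array_dtype, is_integer_dtype


def to_numpy_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace pandas nullable integer columns with NumPy dtypes.

    - no missing values  -> the matching NumPy integer dtype (e.g. Int64 -> int64)
    - any missing values -> float64, with pd.NA mapped to NaN
    All other columns are left untouched; the input DataFrame is not modified.
    """
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        # True for nullable integer dtypes such as Int64, False for NumPy int64
        if is_integer_dtype(dtype) and is_extension_array_dtype(dtype):
            if out[col].isna().any():
                out[col] = out[col].to_numpy(dtype="float64", na_value=np.nan)
            else:
                # dtype.type is the scalar type, e.g. np.int64 for Int64
                out[col] = out[col].to_numpy(dtype=np.dtype(dtype.type))
    return out


# Illustrative data: the column names "returns" and "spend" are made up for this example
df = pd.DataFrame(
    {
        "units_sold": pd.array([100, 150, 20, 200, 120], dtype="Int64"),
        "returns": pd.array([1, None, 0, 2, 1], dtype="Int64"),
        "spend": [10.5, 20.0, 5.25, 30.0, 12.0],
    }
)
clean = to_numpy_dtypes(df)
print(clean.dtypes)
```

Keeping integers as `int64` when no values are missing avoids turning counts into floats unnecessarily. Falling back to `float64` is the usual NumPy way to carry missing values.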


williambdean · 2025-05-27T15:41:36Z

Yeah, I would raise this with the ArviZ team.

williambdean · 2025-05-27T15:42:25Z

Are you able to convert your data to floats?
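For count data, converting to floats is usually lossless. `float64` represents every integer up to 2**53 exactly, and `astype("float64")` turns `pd.NA` into NaN. The minimal pandas-only sketch below also converts back to the nullable type after loading, assuming you want `Int64` again. That step only works if every non-missing value is a whole number.

```python
import pandas as pd

units = pd.Series([100, 150, None, 200, 120], dtype="Int64", name="units_sold")

# Int64 -> float64: pd.NA becomes NaN, integer values are kept exactly
as_float = units.astype("float64")
print(as_float.dtype)            # float64
print(as_float.isna().tolist())  # [False, False, True, False, False]

# float64 -> Int64: NaN becomes pd.NA again
restored = as_float.astype("Int64")
print(restored.equals(units))    # True

# Exactness bound: integers above 2**53 may not survive a float64 round trip
print(float(2**53) == 2**53, float(2**53 + 1) == 2**53 + 1)  # True False
```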

jsnyde0 · 2025-05-28T07:39:05Z

Yeah, this helped:

```python
import pandas as pd
import xarray as xr


def convert_int64_to_float(dataset, var_name="y"):
    """Convert `var_name` in `dataset` to float64 if it uses pandas' Int64Dtype."""
    if dataset is not None and var_name in dataset.data_vars:
        data_array = dataset[var_name]
        if isinstance(data_array.dtype, pd.Int64Dtype):
            print(f"  Targeted: Converting '{var_name}' in group to float64.")
            # Flatten, convert via pandas (pd.NA -> NaN), then restore the original shape
            converted_values = (
                pd.Series(data_array.data.ravel())
                .astype(float)
                .to_numpy()
                .reshape(data_array.shape)
            )

            # Rebuild the DataArray with the same coords, dims, name and attrs
            new_da = xr.DataArray(
                converted_values,
                coords=data_array.coords,
                dims=data_array.dims,
                name=data_array.name,
                attrs=data_array.attrs,
            )
            return dataset.assign({var_name: new_da})
    return dataset  # Return original if no conversion needed or var not found


# `mmm` is the fitted MMM from the issue; its InferenceData lives in `mmm.idata`
if mmm.idata is not None:
    print("Applying Int64Dtype conversion for 'y' variable...")

    # Target 'fit_data'
    if hasattr(mmm.idata, "fit_data"):
        mmm.idata.fit_data = convert_int64_to_float(mmm.idata.fit_data, "y")

    # Target 'observed_data'
    if hasattr(mmm.idata, "observed_data"):
        mmm.idata.observed_data = convert_int64_to_float(mmm.idata.observed_data, "y")

    print("Minimal targeted conversion finished.")
else:
    print("mmm.idata is None, skipping conversion.")
```

github-actions bot added the Needs Triage label May 27, 2025