I get a TypeError when trying to save my model (mmm.save("model.nc")). The save seems to fail when my data uses the nullable integer type from pandas, Int64Dtype.
What I'm doing:
I'm building a Marketing Mix Model (MMM). In my model, the main thing I'm trying to predict (my y variable) is the number of units sold. These are naturally whole numbers (integers). Pandas seems to use Int64Dtype for this column.
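To confirm which columns actually carry a pandas extension dtype, a quick check along these lines may help. This is only a sketch: the DataFrame `df`, its column names, and the helper `nullable_columns` are hypothetical stand-ins, not part of my real model code.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the real MMM input data.
df = pd.DataFrame(
    {
        "units_sold": pd.Series([100, 150, 20], dtype="Int64"),
        "spend": [1.5, 2.0, 0.5],
    }
)


def nullable_columns(frame):
    """List columns whose dtype is a pandas extension dtype, not a NumPy dtype."""
    return [col for col in frame.columns if not isinstance(frame[col].dtype, np.dtype)]


print(nullable_columns(df))
```

Any column reported here is backed by a pandas extension array rather than a plain NumPy array, which is the case that appears to trigger the error on save.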
When ArviZ tries to save the InferenceData (which includes my sales data), it gives a TypeError if that Int64Dtype is present. It looks like the saving process doesn't quite know how to handle this specific pandas integer type.
Here's a simple code example that shows the problem:
import arviz as az
import numpy as np
import pandas as pd
import xarray as xr


def run_simplified_reproducible_example():
    print(f"Running with: pandas {pd.__version__}, xarray {xr.__version__}, arviz {az.__version__}, numpy {np.__version__}")

    # 1. Simulate sales data (integers stored with the nullable Int64 dtype)
    sales_data_with_na = pd.Series(
        [100, 150, 20, 200, 120], dtype="Int64", name="units_sold"
    )

    # 2. Put it into an xarray.DataArray
    sales_data_array = xr.DataArray(
        sales_data_with_na,
        dims=["time_period"],
        coords={"time_period": np.arange(len(sales_data_with_na))},
        name="units_sold_observed",
    )

    # 3. Create an arviz.InferenceData object (like what my MMM produces)
    model_dataset = xr.Dataset({sales_data_array.name: sales_data_array})
    inference_data_to_save = az.InferenceData(observed_data=model_dataset)

    # 4. Try to save it (this is where the error usually happens)
    output_filename = "test_sales_model_save.nc"
    print(f"\nTrying to save to '{output_filename}'...")
    try:
        inference_data_to_save.to_netcdf(output_filename)
        print(f"Saved '{output_filename}' successfully (This is UNEXPECTED if the issue exists).")
    except TypeError as e:
        print(f"\n--- EXPECTED TypeError ---")
        print(f"Oops, couldn't save '{output_filename}'. Error: {e}")
        print("This is the TypeError I'm seeing due to the Int64Dtype.")
        print(f"--- END OF TypeError ---")
    except Exception as e:
        print(f"\n--- Some Other Error ---")
        print(f"A different error happened: {e}")
        print(f"--- END OF Other Error ---")


if __name__ == "__main__":
    run_simplified_reproducible_example()
My own versions:
pandas version: 2.2.3
xarray version: 2025.4.0
arviz version: 0.21.0
numpy version: 2.2.6
Solution
I'm not sure whether this should be solved here, by converting data types before saving (which is what I'm doing currently), or moved over to ArviZ.
import pandas as pd
import xarray as xr


def convert_int64_to_float(dataset, var_name='y'):
    """Directly converts var_name in dataset if it's Int64Dtype."""
    if dataset is not None and var_name in dataset.data_vars:
        data_array = dataset[var_name]
        if isinstance(data_array.dtype, pd.Int64Dtype):
            print(f"  Targeted: Converting '{var_name}' in group to float64.")
            # Simplified conversion for Series-like data in DataArray
            converted_values = pd.Series(data_array.data.ravel()).astype(float).to_numpy().reshape(data_array.shape)
            new_da = xr.DataArray(
                converted_values,
                coords=data_array.coords,
                dims=data_array.dims,
                name=data_array.name,
                attrs=data_array.attrs
            )
            return dataset.assign({var_name: new_da})
    return dataset  # Return original if no conversion needed or var not found


# `mmm` is the already-fitted MMM instance from my own code.
if mmm.idata is not None:
    print("Applying Int64Dtype conversion for 'y' variable...")
    # Target 'fit_data'
    if hasattr(mmm.idata, 'fit_data'):
        mmm.idata.fit_data = convert_int64_to_float(mmm.idata.fit_data, 'y')
    # Target 'observed_data'
    if hasattr(mmm.idata, 'observed_data'):
        mmm.idata.observed_data = convert_int64_to_float(mmm.idata.observed_data, 'y')
    print("Minimal targeted conversion finished.")
else:
    print("mmm.idata is None, skipping conversion.")
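The workaround above always converts to float64. A possible variation, sketched here under my own assumptions, keeps plain int64 when there are no missing values and falls back to float64 with NaN only when there are. The helper name `to_numpy_safe` is hypothetical and operates on a pandas Series, not on the xarray objects directly.

```python
import numpy as np
import pandas as pd


def to_numpy_safe(series: pd.Series) -> np.ndarray:
    """Return a plain NumPy array for a Series that may use a nullable integer dtype."""
    is_extension = isinstance(series.dtype, pd.api.extensions.ExtensionDtype)
    if is_extension and pd.api.types.is_integer_dtype(series.dtype):
        if series.isna().any():
            # Integers cannot hold NaN, so missing values force a float array.
            return series.to_numpy(dtype="float64", na_value=np.nan)
        # No missing values: a plain NumPy int64 array preserves the counts exactly.
        return series.to_numpy(dtype="int64")
    # Anything else is returned via the default conversion.
    return series.to_numpy()


complete = to_numpy_safe(pd.Series([100, 150, 20], dtype="Int64"))
with_gap = to_numpy_safe(pd.Series([100, None, 20], dtype="Int64"))
print(complete.dtype, with_gap.dtype)
```

Keeping int64 where possible avoids silently turning unit counts into floats; whether that matters downstream depends on how the model uses the observed data.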