Description
Describe the bug
When using `s3.to_parquet` with `mode="overwrite_partitions"` to update a Parquet dataset that is partitioned by a time interval or a timestamp attribute (such as year, month, or hour), the call fails: in this mode the implementation assumes that the values of `partition_cols` are names of the Parquet/table columns, so it does not find a transform expression such as `hour(column)` among the dataframe columns.
I think the problem is this line, which calls `delete_from_iceberg_table`, a function that expects plain column names.
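For reference, a minimal sketch of how that function is called on its own, assuming it is the public `wr.athena.delete_from_iceberg_table` (database and table names below are placeholders); its `merge_cols` argument takes plain column names, so a transform expression like `hour(ts)` has nothing to match:

```python
import pandas as pd
import awswrangler as wr

# Rows of the Iceberg table matching these keys are deleted.
keys = pd.DataFrame({"ts": pd.to_datetime(["2024-01-01 01:15"])})

wr.athena.delete_from_iceberg_table(
    df=keys,
    database="my_database",  # placeholder
    table="my_table",        # placeholder
    merge_cols=["ts"],       # plain column names only; "hour(ts)" would not resolve
)
```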
How to Reproduce
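A minimal sketch of the pattern I am using (bucket, database, table, and column names are placeholders; `hour(ts)` mirrors the transform-style partition column from the description):

```python
import pandas as pd
import awswrangler as wr

df = pd.DataFrame(
    {
        "value": [1, 2, 3],
        "ts": pd.to_datetime(
            ["2024-01-01 00:30", "2024-01-01 01:15", "2024-01-01 02:45"]
        ),
    }
)

# Initial write: a transform-style partition column is accepted here.
wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/my-table/",  # placeholder bucket/prefix
    dataset=True,
    database="my_database",  # placeholder Glue database
    table="my_table",        # placeholder table name
    partition_cols=["hour(ts)"],
    mode="overwrite",
)

# Update fails: overwrite_partitions assumes the entries of partition_cols
# are plain column names, and "hour(ts)" is not a column of df.
wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/my-table/",
    dataset=True,
    database="my_database",
    table="my_table",
    partition_cols=["hour(ts)"],
    mode="overwrite_partitions",
)
```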
Expected behavior
I expect the `partition_cols` option to accept anything that can be used to partition a Parquet dataset. In particular, it should accept anything that is already accepted when the `mode` argument is `append` or `overwrite` instead of `overwrite_partitions`.
Your project
No response
Screenshots
No response
OS
Ubuntu 22.04
Python version
3.10
AWS SDK for pandas version
3.7.3
Additional context
No response