Description
Environment
pip deltalake==0.14.0
Environment:
- Cloud provider: /
- OS: Linux
Bug
What happened:
I wanted to read the schema of a Delta table as a pyarrow schema via:
from deltalake import DeltaTable
file_uri = "..."
deltaTable = DeltaTable(file_uri, storage_options=...)
pyarrow_schema = deltaTable.schema().to_pyarrow()
It failed with the following error:
Traceback (most recent call last):
[...]
File "my_python_file.py", line 63, in __
pyarrow_schema = deltaTable.schema().to_pyarrow()
Exception: Schema error: Invalid data type for Arrow: void
What you expected to happen:
pyarrow has a null data type. For a column of void type in Delta,
I would expect the pyarrow schema to return that column with the
null data type instead of raising an exception.
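
To help narrow down which columns trigger the error, here is a minimal sketch that lists the void-typed fields in a Delta schema given as JSON. It uses only the standard library. It assumes the schema JSON follows the Delta struct layout (`type`, `fields`, `elementType`, `keyType`, `valueType`) and that the JSON can be obtained from the table, for example via `deltaTable.schema().to_json()`. I have not verified that this call succeeds on a table with void columns. `find_void_columns` and `example_schema` are hypothetical names used for illustration only.

```python
import json


def find_void_columns(schema_json):
    """Return dotted paths of all fields whose Delta type is "void"."""

    def walk(dtype, path):
        # Primitive types are plain strings in the Delta schema JSON.
        if dtype == "void":
            return [path]
        if isinstance(dtype, dict):
            kind = dtype.get("type")
            if kind == "struct":
                found = []
                for field in dtype.get("fields", []):
                    child = f"{path}.{field['name']}" if path else field["name"]
                    found += walk(field["type"], child)
                return found
            if kind == "array":
                return walk(dtype["elementType"], f"{path}.element")
            if kind == "map":
                return (walk(dtype["keyType"], f"{path}.key")
                        + walk(dtype["valueType"], f"{path}.value"))
        return []

    return walk(json.loads(schema_json), "")


# Hypothetical schema for illustration; not taken from the failing table.
example_schema = json.dumps({
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
        {"name": "note", "type": "void", "nullable": True, "metadata": {}},
        {"name": "meta", "nullable": True, "metadata": {}, "type": {
            "type": "struct",
            "fields": [
                {"name": "tag", "type": "void", "nullable": True, "metadata": {}},
            ],
        }},
    ],
})

void_columns = find_void_columns(example_schema)
print(void_columns)
```

The fields reported here are the ones I would expect `to_pyarrow()` to map to `pyarrow.null()`.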