Description
Is your feature request related to a problem or challenge?
- part of [DISCUSSION] Make it easier and faster to query remote files (S3, iceberg, etc) #13456
- related to Support `datafusion-cli` access to public S3 buckets that do not require authentication #16299
I would like querying files from remote stores to be easy and a great experience in DataFusion, and in `datafusion-cli` in particular.
While testing #16300, I tried this command:
datafusion-cli
> CREATE EXTERNAL TABLE nyc_taxi_rides
STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet';
Object Store error: Object at location nyc_taxi_rides/data/tripdata_parquet not found: Error performing HEAD https://s3.us-east-1.amazonaws.com/altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet in 142.679833ms - Server returned non-2xx status code: 404 Not Found:
This confused me for quite a while, as that is a valid URL (prefix).
The issue is that the URL 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet' does not end in a `/`. If you add a `/`, it works great:
> CREATE EXTERNAL TABLE nyc_taxi_rides
STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/';
0 row(s) fetched.
Elapsed 1.624 seconds.
BTW, this is inconsistent with a local file system, where selecting from a directory whose path doesn't end in a `/` works just fine:
-- Write data to `foo` directory:
> copy (values(1)) to 'foo/1.parquet';
+-------+
| count |
+-------+
| 1 |
+-------+
1 row(s) fetched.
Elapsed 0.044 seconds.
-- Note the location doesn't end in `/` but it works fine
> create external table foo stored as parquet location 'foo';
0 row(s) fetched.
Elapsed 0.022 seconds.
> select * from foo;
+---------+
| column1 |
+---------+
| 1 |
+---------+
1 row(s) fetched.
Elapsed 0.132 seconds.
Describe the solution you'd like
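I would like this to be less confusing: a remote location that names a directory (prefix) should either work without the trailing `/` or fail with a message that explains what to change.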
Describe alternatives you've considered
Alternate 1: Better Error Message
At the very least, we can make the message more explicit ("Not found. Hint: if it is a directory the path should end with `/`")
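As a rough illustration of the hint, the sketch below adds it to a not-found message only when the location lacks a trailing `/`. The function name `not_found_with_hint` and the exact message wording are hypothetical and are not an existing DataFusion or `object_store` API.

```rust
/// Hypothetical helper (not an existing DataFusion API): build a not-found
/// message, adding a hint when the location does not end with `/` and so
/// may have been intended as a directory.
fn not_found_with_hint(location: &str) -> String {
    let mut msg = format!("Object at location {} not found.", location);
    if !location.ends_with('/') {
        // Only suggest the fix when it could actually apply.
        msg.push_str(" Hint: if it is a directory the path should end with /");
    }
    msg
}
```

With this shape, a location such as 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet' would get the hint, while the same location with a trailing `/` would not.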
Alternate 2: Preferred
It would be even better to automatically add a `/` to the path if the first one was not found and try again.
I think the trick will be to figure out at what level we should try to add the `/` (probably when first creating the ListingTable?)
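The retry idea could look something like the following minimal sketch. It assumes a hypothetical `resolve_location` helper and an `exists` callback standing in for whatever object-store lookup (HEAD or LIST) the real code would perform; neither is an existing DataFusion API, and where this logic should live is still the open question above.

```rust
/// Hypothetical helper: resolve a table location, retrying with a trailing
/// `/` when the exact path is not found. `exists` is a stand-in for the
/// real object-store lookup.
fn resolve_location<F>(location: &str, exists: F) -> Option<String>
where
    F: Fn(&str) -> bool,
{
    // First try the location exactly as the user wrote it.
    if exists(location) {
        return Some(location.to_string());
    }
    // Retry only when the path could be a directory-style prefix.
    if !location.ends_with('/') {
        let with_slash = format!("{}/", location);
        if exists(&with_slash) {
            return Some(with_slash);
        }
    }
    // Neither form was found; the caller reports the original error.
    None
}
```

One trade-off of this approach is an extra request to the remote store on the failure path, which only occurs when the first lookup has already failed.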
Additional context
No response