Skip to content

OFFSET and LIMIT combination gives inconsistent output #25071

Open
@pratyakshsharma

Description

@pratyakshsharma

When running a query that uses OFFSET and LIMIT clauses together, we see output is incorrect at times and is inconsistent across different runs. If only OFFSET is used, the results are correct in all runs I tried.

Your Environment

  • Presto version used: Latest master
  • Storage (HDFS/S3/GCS..): Local filesystem
  • Data source and connector used: hive
  • Deployment (Cloud or On-prem):
  • Pastebin link to the complete debug logs:

Expected Behavior

The OFFSET clause should skip the required rows whether or not LIMIT clause is present in the query.

Current Behavior

When a combination of OFFSET and LIMIT clause are used, query returns inconsistent results

Possible Solution

Steps to Reproduce

  1. Create an external table on local with below DDL -
CREATE TABLE hive.default.stutos ( "accountid_key" varchar, "accountid" varchar, "applicationid" varchar, "app_name" varchar, "app_environment" varchar, "endpoint_name" varchar, "endpointid" varchar, "type" varchar, "linkedtoidentity" varchar, "status" varchar, "email" varchar, "gecos" varchar, "last_revalidation_by_serial" varchar, "last_revalidation_date" varchar, "last_access_date" varchar, "rec_status" varchar, "rec_source" varchar, "rec_created_date" varchar, "rec_updated_date" varchar, "rec_deleted_date" varchar, "active_riskscore" varchar, "transactional_rule" varchar, "priority" varchar, "multiple_infractions" varchar, "server_id" varchar, "account_name" varchar, "is_password_less" varchar, "is_mfa_enabled" varchar, "failedlogincount" varchar, "changepwdcount" varchar, "lastpwdchangetimestamp" varchar, "pwdexpirationdate" varchar, "usage" varchar, "description" varchar, "certificationtimestamp" varchar, "lastrevalidationdate" varchar, "functional_role" varchar, "is_privileged_acct" varchar, "is_functional_acct" varchar, "activation_date" varchar, "deactivation_date" varchar, "expiration_date" varchar, "lastlogintimestamp" varchar, "acct_owner" varchar, "rplex" varchar, "system_name" varchar, "ip_address" varchar, "pwd_change_date" varchar, "phr_change_date" varchar, "acct_owner_email" varchar, "revoke_date" varchar, "last_acc_login_date" varchar, "log_date" varchar ) WITH ( format = 'CSV', external_location = 'file:///Users/pratyakshsharma/Downloads/offset_test' );
  1. Use the attached file as data source
  2. Run below queries -
select accountid from hive.default.stutos ORDER BY accountid OFFSET 0 limit 1000; - this will give inconsistent results as shown in the screenshots attached.

select accountid from hive.default.stutos ORDER BY accountid OFFSET <n>; replace n with any number and the results are always correct when LIMIT clause is absent

Screenshots (if appropriate)

Image Image Image Image Image Image

Context

Incorrect results can affect production workloads and reports where presto is being used.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

Status

👀 Review

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions