Commit 6744258

zhaohehuhu authored and pan3793 committed
[KYUUBI #7077] Spark 3.5: Enhance MaxScanStrategy for DSv2
### Why are the changes needed?

Enhance the MaxScanStrategy for Spark's DSv2 so that it applies only to relations that support statistics reporting. This prevents Spark from returning a default value of Long.MaxValue, which leads to some queries failing or behaving unexpectedly.

### How was this patch tested?

Tested locally.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #7077 from zhaohehuhu/dev-0527.

Closes #7077

64001c9 [zhaohehuhu] fix MaxScanStrategy for datasource v2

Authored-by: zhaohehuhu <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
(cherry picked from commit bcaff5a)
Signed-off-by: Cheng Pan <[email protected]>
1 parent cbded32 commit 6744258

File tree

1 file changed, +2 -1 lines changed
  • extensions/spark/kyuubi-extension-spark-3-5/src/main/scala/org/apache/kyuubi/sql/watchdog

extensions/spark/kyuubi-extension-spark-3-5/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala

Lines changed: 2 additions & 1 deletion
@@ -23,6 +23,7 @@ import org.apache.spark.sql.catalyst.SQLConfHelper
 import org.apache.spark.sql.catalyst.catalog.{CatalogTable, HiveTableRelation}
 import org.apache.spark.sql.catalyst.planning.ScanOperation
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.connector.read.SupportsReportStatistics
 import org.apache.spark.sql.execution.SparkPlan
 import org.apache.spark.sql.execution.datasources.{CatalogFileIndex, HadoopFsRelation, InMemoryFileIndex, LogicalRelation}
 import org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanRelation
@@ -237,7 +238,7 @@ case class MaxScanStrategy(session: SparkSession)
 _,
 _,
 _,
-relation @ DataSourceV2ScanRelation(_, _, _, _, _)) =>
+relation @ DataSourceV2ScanRelation(_, _: SupportsReportStatistics, _, _, _)) =>
 val table = relation.relation.table
 if (table.partitioning().nonEmpty) {
 val partitionColumnNames = table.partitioning().map(_.describe())
