Commit f62ddc5

zjffdu authored and Felix Cheung committed
[SPARK-17210][SPARKR] sparkr.zip is not distributed to executors when running sparkr in RStudio
## What changes were proposed in this pull request?

Spark adds sparkr.zip to the archives only in YARN mode (SparkSubmit.scala):

```scala
if (args.isR && clusterManager == YARN) {
  val sparkRPackagePath = RUtils.localSparkRPackagePath
  if (sparkRPackagePath.isEmpty) {
    printErrorAndExit("SPARK_HOME does not exist for R application in YARN mode.")
  }
  val sparkRPackageFile = new File(sparkRPackagePath.get, SPARKR_PACKAGE_ARCHIVE)
  if (!sparkRPackageFile.exists()) {
    printErrorAndExit(s"$SPARKR_PACKAGE_ARCHIVE does not exist for R application in YARN mode.")
  }
  val sparkRPackageURI = Utils.resolveURI(sparkRPackageFile.getAbsolutePath).toString

  // Distribute the SparkR package.
  // Assigns a symbol link name "sparkr" to the shipped package.
  args.archives = mergeFileLists(args.archives, sparkRPackageURI + "#sparkr")

  // Distribute the R package archive containing all the built R packages.
  if (!RUtils.rPackages.isEmpty) {
    val rPackageFile = RPackageUtils.zipRLibraries(new File(RUtils.rPackages.get), R_PACKAGE_ARCHIVE)
    if (!rPackageFile.exists()) {
      printErrorAndExit("Failed to zip all the built R packages.")
    }
    val rPackageURI = Utils.resolveURI(rPackageFile.getAbsolutePath).toString
    // Assigns a symbol link name "rpkg" to the shipped package.
    args.archives = mergeFileLists(args.archives, rPackageURI + "#rpkg")
  }
}
```

It is therefore necessary to pass spark.master from the R process to the JVM; otherwise sparkr.zip is not distributed to the executors. In addition, spark.yarn.keytab and spark.yarn.principal are passed to the Spark side, because the JVM process needs them to access a secured cluster.

## How was this patch tested?

Verified manually in RStudio using the following code:

```r
Sys.setenv(SPARK_HOME = "/Users/jzhang/github/spark")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sparkR.session(master = "yarn-client", sparkConfig = list(spark.executor.instances = "1"))
df <- as.DataFrame(mtcars)
head(df)
```

Author: Jeff Zhang <[email protected]>

Closes apache#14784 from zjffdu/SPARK-17210.
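The message notes that spark.yarn.keytab and spark.yarn.principal are now forwarded for secured clusters, but only demonstrates the plain YARN case. Below is a minimal hedged sketch of such a session; the keytab path and principal are placeholders, not values from this patch:

```r
library(SparkR)

# Minimal sketch for a Kerberos-secured YARN cluster. With this patch, the
# two spark.yarn.* properties below are translated into --keytab/--principal
# for the launched JVM. The path and principal are placeholders.
sparkR.session(
  master = "yarn-client",
  sparkConfig = list(
    spark.yarn.keytab = "/path/to/user.keytab",  # placeholder keytab path
    spark.yarn.principal = "user@EXAMPLE.COM"    # placeholder principal
  )
)
```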
1 parent f89808b commit f62ddc5

File tree

2 files changed (+19, −0)

R/pkg/R/sparkR.R

Lines changed: 4 additions & 0 deletions

```diff
@@ -491,6 +491,10 @@ sparkConfToSubmitOps[["spark.driver.memory"]] <- "--driver-memory"
 sparkConfToSubmitOps[["spark.driver.extraClassPath"]] <- "--driver-class-path"
 sparkConfToSubmitOps[["spark.driver.extraJavaOptions"]] <- "--driver-java-options"
 sparkConfToSubmitOps[["spark.driver.extraLibraryPath"]] <- "--driver-library-path"
+sparkConfToSubmitOps[["spark.master"]] <- "--master"
+sparkConfToSubmitOps[["spark.yarn.keytab"]] <- "--keytab"
+sparkConfToSubmitOps[["spark.yarn.principal"]] <- "--principal"
+
 
 # Utility function that returns Spark Submit arguments as a string
 #
```
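To make the mapping concrete, here is a minimal sketch (not the exact SparkR internals) of how a lookup table like `sparkConfToSubmitOps` turns `sparkConfig` entries into `spark-submit` flags. `buildSubmitOpts` and the trimmed-down table are hypothetical names for illustration only:

```r
# Hypothetical, trimmed-down version of the lookup table added above.
sparkConfToSubmitOps <- list(
  spark.master = "--master",
  spark.yarn.keytab = "--keytab",
  spark.yarn.principal = "--principal"
)

# Hypothetical helper: prepend the spark-submit flag for every recognized
# property found in sparkConfig, quoting values in case they contain spaces.
buildSubmitOpts <- function(sparkConfig, submitOps = "sparkr-shell") {
  for (conf in names(sparkConfToSubmitOps)) {
    value <- sparkConfig[[conf]]
    if (!is.null(value)) {
      submitOps <- paste0(sparkConfToSubmitOps[[conf]], " \"", value, "\" ", submitOps)
    }
  }
  submitOps
}

buildSubmitOpts(list(spark.master = "yarn-client"))
# [1] "--master \"yarn-client\" sparkr-shell"
```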

docs/sparkr.md

Lines changed: 15 additions & 0 deletions

```diff
@@ -62,6 +62,21 @@ The following Spark driver properties can be set in `sparkConfig` with `sparkR.session`
 
 <table class="table">
   <tr><th>Property Name</th><th>Property group</th><th><code>spark-submit</code> equivalent</th></tr>
+  <tr>
+    <td><code>spark.master</code></td>
+    <td>Application Properties</td>
+    <td><code>--master</code></td>
+  </tr>
+  <tr>
+    <td><code>spark.yarn.keytab</code></td>
+    <td>Application Properties</td>
+    <td><code>--keytab</code></td>
+  </tr>
+  <tr>
+    <td><code>spark.yarn.principal</code></td>
+    <td>Application Properties</td>
+    <td><code>--principal</code></td>
+  </tr>
   <tr>
     <td><code>spark.driver.memory</code></td>
     <td>Application Properties</td>
```
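A hedged usage note for the table above: with these rows, `spark.master` can be set through `sparkConfig` from an interactive session and is forwarded as `--master`. The values below are illustrative:

```r
library(SparkR)

# Illustrative values: spark.master supplied via sparkConfig is translated
# to --master when SparkR launches the JVM, so an RStudio session can pick
# its cluster manager without calling spark-submit directly.
sparkR.session(sparkConfig = list(
  spark.master = "local[2]",       # e.g. "yarn-client" to target YARN
  spark.driver.memory = "1g"
))
```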
