Add initial integration test code #3

Merged: 22 commits from add-test-code merged into apache-spark-on-k8s:master on Dec 20, 2017

Conversation

kimoonkim
Member

Closes #2.

Add a small number of basic test functions that work with both the main Spark repo and our fork. They use SparkPi and support only a minimal set of features: Scala only, no resource staging support, etc.

The project no longer needs Spark jars as Maven dependencies. Instead, we pass a distro tarball and a dockerfiles directory to Maven, like:

$ mvn clean integration-test  \
    -Dspark-distro-tgz=/tmp/spark-2.3.0-SNAPSHOT-bin-20171216-0c8fca4608.tgz  \
    -Dspark-dockerfiles-dir=.../spark/resource-managers/kubernetes/docker/src/main/dockerfiles

Maven then builds Docker images from the distro tarball, and the test code launches Spark jobs using the spark-submit CLI.

@foxish @liyinan926 @mccheah @ssuchter

import org.apache.spark.deploy.k8s.integrationtest.constants.SPARK_DISTRO_PATH

private[spark] class KubernetesSuite extends FunSuite with BeforeAndAfterAll with BeforeAndAfter {
import KubernetesSuite._
Member

Add an empty line before the import.


before {
sparkAppConf = kubernetesTestComponents.newSparkAppConf()
.set("spark.kubernetes.initcontainer.docker.image", "spark-init:latest")
Member

The property names for images have been changed in apache/spark#19995. They need to be updated once that PR is merged.

Member Author

Ah, good to know. I'll leave a NOTE. Also, let me delete the initcontainer lines since we don't use them here yet.

before {
sparkAppConf = kubernetesTestComponents.newSparkAppConf()
.set("spark.kubernetes.initcontainer.docker.image", "spark-init:latest")
.set("spark.kubernetes.driver.docker.image", "spark-driver:latest")
Member

It seems newSparkAppConf already sets spark.kubernetes.driver.docker.image.


import io.fabric8.kubernetes.client.DefaultKubernetesClient
import org.scalatest.concurrent.Eventually
import scala.collection.mutable
Member

Scala packages should be in a group after java packages.
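For reference, the suggested grouping would look roughly like this (a sketch; the java.io.File import is only a placeholder for whatever java imports the file actually needs):

// Illustrative ordering: the java group, then the scala group, then third-party packages.
import java.io.File

import scala.collection.mutable

import io.fabric8.kubernetes.client.DefaultKubernetesClient
import org.scalatest.concurrent.Eventually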

val outputLines = new ArrayBuffer[String]

Utils.tryWithResource(new InputStreamReader(proc.getInputStream)) { procOutput =>
Utils.tryWithResource(new BufferedReader(procOutput)) { (bufferedOutput: BufferedReader) =>
Member

BufferedReader and InputStreamReader can be combined into a single resource: `new BufferedReader(new InputStreamReader(proc.getInputStream))`.

Member Author

Turning to Source.fromInputStream to avoid dealing with this ourselves.

var line: String = null
do {
line = bufferedOutput.readLine()
if (line != null) {
Member

This check and the condition of the while loop are redundant.

Member Author

Ditto. Source.fromInputStream replaces this code.


private val MINIKUBE_STARTUP_TIMEOUT_SECONDS = 60

def startMinikube(): Unit = synchronized {
Member

Why are this and the following methods synchronized? Can executeMinikube not be called concurrently?

Member Author

@mccheah would know better. I guess we want to prevent deleteMinikube from destroying the Minikube VM while other methods try to use it. Maybe such a race condition could corrupt the VM or some VM provisioning tool like VirtualBox?

Does this explanation make sense? If yes, I think we can leave a NOTE.

Member

It makes sense to me.


private val originalDockerUri = URI.create(dockerHost)
private val httpsDockerUri = new URIBuilder()
.setHost(originalDockerUri.getHost)
Member

Indentation.

private val dockerClient = new DefaultDockerClient.Builder()
.uri(httpsDockerUri)
.dockerCertificates(DockerCertificates
.builder()
Member

Ditto, indentation.

override def initialize(): Unit = {
var k8ConfBuilder = new ConfigBuilder()
.withApiVersion("v1")
.withMasterUrl(master.replaceFirst("k8s://", ""))
Member Author

Yes, it seems better to have the code here. Do we think copying is the right way, rather than somehow depending on a jar as a Maven library? Copying is easy, at least for now. Just curious what others think.

@@ -56,6 +57,8 @@ private[spark] class KubernetesTestComponents(defaultClient: DefaultKubernetesCl
new SparkAppConf()
.set("spark.master", s"k8s://${kubernetesClient.getMasterUrl}")
.set("spark.kubernetes.namespace", namespace)
// TODO: apache/spark#19995 is changing docker.image to container.image in these properties.
Member

FYI: the PR has been merged.

Member Author

I see. I'll have to update my distro, make this change and test it again.

Member Author

This is done by commit 989a371.
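For reference, the rename amounts to switching the image keys from docker.image to container.image, roughly as below (a sketch based on the TODO above; the executor key is assumed by analogy and should be checked against upstream):

// Sketch of the renamed image properties after apache/spark#19995 (docker.image -> container.image).
sparkAppConf
  .set("spark.kubernetes.driver.container.image", "spark-driver:latest")
  .set("spark.kubernetes.executor.container.image", "spark-executor:latest")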

}
}{
output => output.close()
Member

Is calling close necessary?

Member Author

Yes, it is because output is not a subclass of Closeable. See the extra version of Utils.tryWithResource that I added for details.
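For context, such an overload presumably looks something like the following (a minimal sketch, not the actual code in the PR): a variant that takes an explicit close function for resources that do not implement Closeable, matching the call shape in the excerpt above.

object Utils {
  // Hypothetical sketch: a tryWithResource variant for resources that are not
  // java.io.Closeable, where the caller supplies the cleanup function explicitly.
  def tryWithResource[R, T](createResource: => R)(f: R => T)(close: R => Unit): T = {
    val resource = createResource
    try f(resource) finally close(resource)
  }
}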

Member

It seems Source is Closeable (https://www.scala-lang.org/api/current/scala/io/Source.html), so you don't need to call close explicitly.

Member Author

Ah, that is interesting. I think it became Closeable in Scala 2.12; Source in Scala 2.11 is not Closeable. From https://github.com/scala/scala/blob/v2.11.8/src/library/scala/io/Source.scala#L190:

abstract class Source extends Iterator[Char] {

Member

Ah, OK.

Member Author

I actually managed to get rid of the close call by using the input stream as resource. PTAL.

Member

LGTM.
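Pieced together from this thread, the end state is presumably along these lines (a sketch; proc and Utils.tryWithResource refer to the code in the excerpts above, and the details are assumptions):

import scala.io.Source

// The InputStream itself is the Closeable resource; Source.fromInputStream does the
// line reading, and toList materializes the output before the stream is closed.
val outputLines: Seq[String] =
  Utils.tryWithResource(proc.getInputStream) { stream =>
    Source.fromInputStream(stream).getLines().toList
  }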

* completes within timeout seconds.
*/
def executeProcess(fullCommand: Array[String], timeout: Long): Seq[String] = {
val pb = new ProcessBuilder().command(fullCommand: _*)
Member

Why not launch a pod to kick off the spark-submit command?

Member Author

People launch spark-submit from outside their k8s clusters. I think launching spark-submit itself inside a pod would deviate from how people use Spark on k8s. Besides, I don't even know whether running spark-submit inside a pod works :-)

Member

I believe decoupling the spark-submit portion from the integration tests is better, so that issues with the host machine don't cause the integration tests to fail when they shouldn't (we have enough assumptions already with Python and R). Thoughts? I can send up a PR to add launching from a pod if we believe it to be better than using ProcessBuilder, which I am trying to avoid.

Contributor

Launching from a pod would add a second layer of indirection when trying to debug issues with the spark-submit runtime itself. Making assumptions about the host machine should be fine; isn't that what the Spark R and Python unit tests do?

Member Author

Also it would complicate auth issues in the future when we add back custom client credentials etc. See DriverKubernetesCredentialsStep.
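For context, the host-side approach being discussed is roughly the following (a sketch reconstructed from the excerpt above; the wrapper object, error handling, and timeout behavior are assumptions, not the PR's exact implementation):

import java.util.concurrent.TimeUnit
import scala.io.Source

object ProcessUtils {
  // Run a command on the host via ProcessBuilder, capture stdout (with stderr merged in),
  // and fail if the process does not finish within the timeout. Note that the read blocks
  // until the process closes its output, so the timeout only guards the final wait.
  def executeProcess(fullCommand: Array[String], timeout: Long): Seq[String] = {
    val pb = new ProcessBuilder().command(fullCommand: _*).redirectErrorStream(true)
    val proc = pb.start()
    val stream = proc.getInputStream
    val lines = try Source.fromInputStream(stream).getLines().toList finally stream.close()
    if (!proc.waitFor(timeout, TimeUnit.SECONDS)) {
      proc.destroy()
      throw new IllegalStateException(s"Process did not terminate within $timeout seconds")
    }
    lines
  }
}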

</execution>
<execution>
<!-- TODO: Remove this hack once upstream is fixed by SPARK-22777 -->
<id>set-exec-bit-on-docker-entrypoint-sh</id>
Contributor

Just set it in the Dockerfile using `RUN chmod +x /opt/entrypoint.sh`.

Member Author

I think that's better done in the upstream code. It's hard for this integration code to surgically edit the Dockerfile in place.

Contributor

+1

Member

The PR has merged now.

<version>1.3.0</version>
<executions>
<execution>
<id>download-minikube-linux</id>
Contributor

We shouldn't be downloading here, we should be using the built-in Minikube binary.

Member Author

I agree that's what we want, but I suggest we address it in a future PR. I think we'd like to start testing upstream sooner rather than later.

Contributor
@mccheah commented Dec 18, 2017

I think the Riselab nodes where we are running this will have Minikube installed already, since that setup assumes the Minikube instance is being re-used.

Contributor

So it should be just as easy to use the pre-installed minikube as it would be to download a new one every time. That is, the total time from now to when the tests start running against upstream should be equivalent regardless of which branch we use - so we should use the more robust mode from the start.

Member Author

Copying my reply from below:

I am not familiar with #521 yet. So I don't know how much extra time it would need to address the code and potential review comments that the new code would invite. And I personally don't have a lot of extra time to spend on this beyond what I spent already.

I also don't know much about the riselab setup. I was hoping to delay that until @ssuchter comes back. We can use Pepperdata jenkins in the meantime.

So I really hope that we can keep this PR simple and address gaps in future PRs.


import org.apache.spark.deploy.k8s.integrationtest.backend.IntegrationTestBackend
import org.apache.spark.deploy.k8s.integrationtest.constants.MINIKUBE_TEST_BACKEND
import org.apache.spark.deploy.k8s.integrationtest.docker.SparkDockerImageBuilder
Contributor

I believe this is using the incorrect branch - shouldn't this be based on apache-spark-on-k8s/spark#521 which in turn assumes that Minikube is pre-installed on the box? The code here should reflect what's in the integration-tests-reuse-minikube branch in apache-spark-on-k8s.

Member Author

Yeah, I tried to start with apache-spark-on-k8s/spark#521, but I had to give up after I realized there are too many things to do even with what's already in the fork. Decoupling the Maven dependencies alone took me two full days, including a Saturday afternoon. A lot of fun :-)

I think we want to add #521 later in a follow-up PR. Maybe others can chip in once the basic overhaul is done by this PR.

Contributor

Hm, I'm not sure what's in #521 that's specific to the fork that isn't in this PR?

Contributor

Specifically, looking at the diff in the PR, it's entirely code level. There aren't any new Maven constructs added or taken away. So we should be able to apply the diff here as well without needing to worry about Maven at all?

Member Author

I am not familiar with #521 yet. So I don't know how much extra time it would need to address the code and potential review comments that the new code would invite. And I personally don't have a lot of extra time to spend on this beyond what I spent already.

I also don't know much about the riselab setup. I was hoping to delay that until @ssuchter comes back. We can use Pepperdata jenkins in the meantime.

So I really hope that we can keep this PR simple and address gaps in future PRs.

Contributor

Followed up offline - we can introduce the usage of a pre-installed Minikube later, and in fact we can't use a pre-installed Minikube right now since our first iteration of this will deploy the tests on Pepperdata's Jenkins nodes. The Jenkins jobs will be serialized with builds from apache-spark-on-k8s, so we shouldn't have multiple test runs colliding on managing the Minikube VM.

There needs to be a specific sequencing of events to switch to using a pre-installed Minikube instance on the Riselab jenkins nodes instead. I'll follow up on what that specific sequencing is in an issue.

<dependency>
<groupId>com.spotify</groupId>
<artifactId>docker-client</artifactId>
<version>5.0.2</version>
Contributor

I'm not sure if we should have a universal standard of having these versions defined at the top. I'm not strongly opinionated here, but it's something to consider.

Contributor

It's just strange that some versions are hardcoded here while others are in constants.

Member Author

Sure. I can put all of them in properties.

@mccheah
Contributor

mccheah commented Dec 18, 2017

Actually, wouldn't this repository also need to define the script that runs make-distribution to create the Spark tarball, given a remote repository location? I'd imagine this repository's code would want to include that script, and the Jenkins job would just run it.

@kimoonkim
Member Author

I have been running make-distribution.sh manually and passing the output tarball to the integration test. For example:

$ export DATE=`date "+%Y%m%d"`
$ export REVISION=`git rev-parse --short HEAD`
$ ./dev/make-distribution.sh --name ${DATE}-${REVISION} --tgz -Phadoop-2.6 -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver

We can configure the Jenkins job(s) to include this, or we can check a script into our repo. I don't have a strong opinion either way.

There is a minor issue with checking in the script. There can be some variation in how we build the distribution tarballs, like hadoop-2.6 vs hadoop-2.7, including Python vs R profiles, etc. And it may take some time before we figure out how many different options we want to test. So checking in the script may slow us down.

@mccheah
Contributor

mccheah commented Dec 18, 2017

I'm of the opinion that the repository should define as much of the logic as possible, so that it's visible to everyone contributing to the project without needing to inspect the Jenkins node. So generally I'd expect one to be able to just run bin/run-tests.sh and have it do all of the following:

  • git checkout the latest master or maybe some git hash/tag with a remote URL. (Edit: Passing in the given remote URL and git hash as parameters is fine, can use apache/spark and the head of master as defaults)
  • make-distribution.sh
  • Run tests using Maven

@mccheah
Contributor

mccheah commented Dec 18, 2017

For the make-distribution we can use whatever upstream uses to publish tarballs for releases. I don't foresee us being opinionated about the Hadoop version until we add HDFS support.

@kimoonkim
Member Author

> For the make-distribution we can use whatever upstream uses to publish tarballs for releases.

But if we want to support multiple upstream versions, say 2.3, 2.4, etc., the upstream instructions themselves may change over time. For instance, 2.4 may drop Hadoop 2.6 support, or include a new subproject profile that did not exist in 2.3. It's a moving target.

I imagine we can include multiple scripts in that case. My point is that there is a downside to doing too much end-to-end wiring, especially at the start when things are in flux. It tends to lead to more work.

@mccheah
Contributor

mccheah commented Dec 18, 2017

I would expect the addition and modification of these flags to be the exception, not the norm. And our tests will usually not be opinionated about new modules. For example, our tests don't deal with streaming at all. If we expand our test coverage in the future to include such modules, we'll have to keep this dependency footprint and reliance on profiles in mind.

But this decoupling also gives us the flexibility to build with only the minimal set of profiles required to get our tests to work. We could, for example, build a distribution without Mesos and streaming support; we just have to pass the minimal set of flags to make-distribution.sh. The disadvantage there is that we don't test against the full classpath that spark-submit will run with, so we might miss classpath issues in the process.

In the longer term, we can propose that upstream publish nightly builds built with the same profiles as the full releases.

@kimoonkim
Member Author

The last discussion is actually a nice segue into the next topic: how exactly we should design the Jenkins job(s). Let's move it to issue #2. I have a few questions to ask there.

@kimoonkim
Member Author

@mccheah Thanks for the review so far. Please take a look at https://github.com/apache-spark-on-k8s/spark-integration/issues/2#issuecomment-352591170. I think this has implications for whether or not we want to include the distribution build script in this repo.

@liyinan926, I think I addressed your comments so far. Can you please take another look? Thanks.

@@ -0,0 +1 @@
Contents
Member

Is this file being used?

Member Author

No, it's not used. Do we want to delete the file?

@kimoonkim
Member Author

OK. I just hooked this PR up to Jenkins, and it is passing now: http://spark-k8s-jenkins.pepperdata.org:8080/view/upstream%20spark%20repo/job/spark-integration/22/console

[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ spark-kubernetes-integration-tests_2.11 ---
Discovery starting.
Discovery completed in 142 milliseconds.
Run starting. Expected test count is: 2
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
Run completed in 4 minutes, 38 seconds.
Total number of tests run: 2
Suites: completed 2, aborted 0
Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0
All tests passed.

This Jenkins job copies the distro tarball from the other Jenkins job: http://spark-k8s-jenkins.pepperdata.org:8080/view/upstream%20spark%20repo/job/build-spark-distribution/

README.md Outdated
Note that currently the integration tests only run with Java 8.

Running the integration tests requires a Spark distribution tarball. It also
needs a local path to the directory that contains `Dockerimage` files.
Member

This should say dockerfiles? We're shipping it under $DISTDIR/kubernetes/dockerfiles in upstream.

README.md Outdated
needs a local path to the directory that contains `Dockerimage` files.

Once you prepare the inputs, the integration tests can be executed with Maven or
your IDE. Note that when running tests from an IDE, the `pre-integration-test`
Member

Can we add a line explaining what each phase does?

The integration tests make use of
[Minikube](https://github.com/kubernetes/minikube), which fires up a virtual
machine and setup a single-node kubernetes cluster within it. By default the vm
is destroyed after the tests are finished. If you want to preserve the vm, e.g.
Member

Can we change this default behavior now? I think we wanted to remove the minikube lifecycle management from these tests. cc/ @mccheah

Member Author

We can. But the existing Jenkins jobs require Minikube to be cleaned up when they are done, so they need to set this flag to true. I am not sure it's worth the effort now, given that we are going to incorporate apache-spark-on-k8s/spark#521 in the near future.

Member

Fair enough. Thanks for clarifying.

Member
@liyinan926 left a comment

LGTM. Thanks for the work!

@kimoonkim
Member Author

@liyinan926 Thanks for the review.

@foxish I think I have addressed your comments so far. Please take another look.

@foxish
Member

foxish commented Dec 20, 2017

LGTM, we can iterate from this point forward. Thanks! Merging now.

@foxish merged commit 55bf2ca into apache-spark-on-k8s:master on Dec 20, 2017
@kimoonkim deleted the add-test-code branch on December 22, 2017 19:01