Add initial integration test code #3

Merged: 22 commits from add-test-code merged into apache-spark-on-k8s:master on Dec 20, 2017

Conversation

kimoonkim
Member

Closes #2.

Add a small number of basic test functions that work with both the main Spark repo and our fork. They use SparkPi and support only a minimal set of features: Scala only, no resource staging support, etc.

The project no longer needs Spark jars as Maven dependencies. Instead, we pass a distro tarball and a dockerfiles directory to Maven, like:

$ mvn clean integration-test  \
    -Dspark-distro-tgz=/tmp/spark-2.3.0-SNAPSHOT-bin-20171216-0c8fca4608.tgz  \
    -Dspark-dockerfiles-dir=.../spark/resource-managers/kubernetes/docker/src/main/dockerfiles

Maven then builds Docker images from the distro tarball, and the test code launches Spark jobs using the spark-submit CLI.

@foxish @liyinan926 @mccheah @ssuchter

import org.apache.spark.deploy.k8s.integrationtest.constants.SPARK_DISTRO_PATH

private[spark] class KubernetesSuite extends FunSuite with BeforeAndAfterAll with BeforeAndAfter {
import KubernetesSuite._
Member

Add an empty line before the import.


before {
sparkAppConf = kubernetesTestComponents.newSparkAppConf()
.set("spark.kubernetes.initcontainer.docker.image", "spark-init:latest")
Member

The property names for images have been changed in apache/spark#19995. They need to be updated once that PR is merged.

Member Author

Ah, good to know. I'll leave a NOTE. Also, let me delete the initcontainer lines since we don't use them here yet.

before {
sparkAppConf = kubernetesTestComponents.newSparkAppConf()
.set("spark.kubernetes.initcontainer.docker.image", "spark-init:latest")
.set("spark.kubernetes.driver.docker.image", "spark-driver:latest")
Member

It seems newSparkAppConf already sets spark.kubernetes.driver.docker.image.


import io.fabric8.kubernetes.client.DefaultKubernetesClient
import org.scalatest.concurrent.Eventually
import scala.collection.mutable
Member

Scala packages should be in a group after java packages.
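For reference, the suggested grouping would look roughly like this (a sketch; the java.io.File import is only a placeholder for whatever java imports the file actually needs):

// Illustrative ordering: the java group, then the scala group, then third-party packages.
import java.io.File

import scala.collection.mutable

import io.fabric8.kubernetes.client.DefaultKubernetesClient
import org.scalatest.concurrent.Eventually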

val outputLines = new ArrayBuffer[String]

Utils.tryWithResource(new InputStreamReader(proc.getInputStream)) { procOutput =>
Utils.tryWithResource(new BufferedReader(procOutput)) { (bufferedOutput: BufferedReader) =>
Member

BufferedReader and InputStreamReader can be combined into a single resource: `new BufferedReader(new InputStreamReader(proc.getInputStream))`.

Member Author

Turning to Source.fromInputStream to avoid dealing with this ourselves.

var line: String = null
do {
line = bufferedOutput.readLine()
if (line != null) {
Member

This check and the condition of the while loop are redundant.

Member Author

Ditto. Source.fromInputStream replaces this code.


private val MINIKUBE_STARTUP_TIMEOUT_SECONDS = 60

def startMinikube(): Unit = synchronized {
Member

Why are this and the following methods synchronized? Can executeMinikube not be called concurrently?

Member Author

@mccheah would know better. I guess we want to prevent deleteMinikube from destroying the Minikube VM while other methods try to use it. Maybe such a race condition could corrupt the VM or some VM provisioning tool like VirtualBox?

Does this explanation make sense? If yes, I think we can leave a NOTE.

Member

It makes sense to me.


private val originalDockerUri = URI.create(dockerHost)
private val httpsDockerUri = new URIBuilder()
.setHost(originalDockerUri.getHost)
Member

Indentation.

private val dockerClient = new DefaultDockerClient.Builder()
.uri(httpsDockerUri)
.dockerCertificates(DockerCertificates
.builder()
Member

Ditto, indentation.

override def initialize(): Unit = {
var k8ConfBuilder = new ConfigBuilder()
.withApiVersion("v1")
.withMasterUrl(master.replaceFirst("k8s://", ""))
Member Author

Yes, it seems better to have the code here. Do we think copying is the right way, rather than somehow depending on a jar as a Maven library? Copying is easy, at least for now. Just curious what others think.

@@ -56,6 +57,8 @@ private[spark] class KubernetesTestComponents(defaultClient: DefaultKubernetesCl
new SparkAppConf()
.set("spark.master", s"k8s://${kubernetesClient.getMasterUrl}")
.set("spark.kubernetes.namespace", namespace)
// TODO: apache/spark#19995 is changing docker.image to container.image in these properties.
Member

FYI: the PR has been merged.

Member Author

I see. I'll have to update my distro, make this change and test it again.

Member Author

This is done by commit 989a371.
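For reference, the rename amounts to switching the image keys from docker.image to container.image, roughly as below (a sketch based on the TODO above; the executor key is assumed by analogy and should be checked against upstream):

// Sketch of the renamed image properties after apache/spark#19995 (docker.image -> container.image).
sparkAppConf
  .set("spark.kubernetes.driver.container.image", "spark-driver:latest")
  .set("spark.kubernetes.executor.container.image", "spark-executor:latest")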

}
}{
output => output.close()
Member

Is calling close necessary?

Member Author

Yes, it is because output is not a subclass of Closeable. See the extra version of Utils.tryWithResource that I added for details.
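For context, such an overload presumably looks something like the following (a minimal sketch, not the actual code in the PR): a variant that takes an explicit close function for resources that do not implement Closeable, matching the call shape in the excerpt above.

object Utils {
  // Hypothetical sketch: a tryWithResource variant for resources that are not
  // java.io.Closeable, where the caller supplies the cleanup function explicitly.
  def tryWithResource[R, T](createResource: => R)(f: R => T)(close: R => Unit): T = {
    val resource = createResource
    try f(resource) finally close(resource)
  }
}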

Member

It seems Source is Closeable (https://www.scala-lang.org/api/current/scala/io/Source.html), so you don't need to call close explicitly.

Member Author

Ah, that is interesting. I think it became Closeable in Scala 2.12; Source in Scala 2.11 is not Closeable. From https://github.com/scala/scala/blob/v2.11.8/src/library/scala/io/Source.scala#L190:

abstract class Source extends Iterator[Char] {

Member

Ah, OK.

Member Author

I actually managed to get rid of the close call by using the input stream as resource. PTAL.

Member

LGTM.
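Pieced together from this thread, the end state is presumably along these lines (a sketch; proc and Utils.tryWithResource refer to the code in the excerpts above, and the details are assumptions):

import scala.io.Source

// The InputStream itself is the Closeable resource; Source.fromInputStream does the
// line reading, and toList materializes the output before the stream is closed.
val outputLines: Seq[String] =
  Utils.tryWithResource(proc.getInputStream) { stream =>
    Source.fromInputStream(stream).getLines().toList
  }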

* completes within timeout seconds.
*/
def executeProcess(fullCommand: Array[String], timeout: Long): Seq[String] = {
val pb = new ProcessBuilder().command(fullCommand: _*)
Member

Why not launch a pod to kick off the spark-submit command?

Member Author

People launch spark-submit from outside their k8s clusters. I think launching spark-submit itself inside a pod would deviate from how people use Spark on k8s. Besides, I don't even know whether running spark-submit inside a pod works :-)

Member

I believe decoupling the spark-submit portion from the integration tests is better, so that issues with the host machine don't cause the integration tests to fail when they shouldn't (we have enough assumptions already with Python and R). Thoughts? I can send up a PR to add launching from a pod if we believe it to be better than using ProcessBuilder, which I am trying to avoid.

Contributor

Launching from a pod would add a second layer of indirection when trying to debug issues with the spark-submit runtime itself. Making assumptions about the host machine should be fine; isn't that what the Spark R and Python unit tests do?

Member Author

Also it would complicate auth issues in the future when we add back custom client credentials etc. See DriverKubernetesCredentialsStep.
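For context, the host-side approach being discussed is roughly the following (a sketch reconstructed from the excerpt above; the wrapper object, error handling, and timeout behavior are assumptions, not the PR's exact implementation):

import java.util.concurrent.TimeUnit
import scala.io.Source

object ProcessUtils {
  // Run a command on the host via ProcessBuilder, capture stdout (with stderr merged in),
  // and fail if the process does not finish within the timeout. Note that the read blocks
  // until the process closes its output, so the timeout only guards the final wait.
  def executeProcess(fullCommand: Array[String], timeout: Long): Seq[String] = {
    val pb = new ProcessBuilder().command(fullCommand: _*).redirectErrorStream(true)
    val proc = pb.start()
    val stream = proc.getInputStream
    val lines = try Source.fromInputStream(stream).getLines().toList finally stream.close()
    if (!proc.waitFor(timeout, TimeUnit.SECONDS)) {
      proc.destroy()
      throw new IllegalStateException(s"Process did not terminate within $timeout seconds")
    }
    lines
  }
}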

</execution>
<execution>
<!-- TODO: Remove this hack once upstream is fixed by SPARK-22777 -->
<id>set-exec-bit-on-docker-entrypoint-sh</id>
Contributor

Just set it in the Dockerfile using `RUN chmod +x /opt/entrypoint.sh`.

Member Author

I think that's better done in the upstream code. It's hard for this integration code to surgically edit the Dockerfile in place.

Contributor

+1

Member

The PR has merged now.

<version>1.3.0</version>
<executions>
<execution>
<id>download-minikube-linux</id>
Contributor

We shouldn't be downloading here, we should be using the built-in Minikube binary.

Member Author

I agree that's what we want, but I suggest we address it in a future PR. I think we'd like to start testing upstream sooner rather than later.

Contributor
@mccheah commented Dec 18, 2017

I think the Riselab nodes where we are running this will have Minikube installed already, since that setup assumes the Minikube instance is being re-used.

Contributor

So it should be just as easy to use the pre-installed minikube as it would be to download a new one every time. That is, the total time from now to when the tests start running against upstream should be equivalent regardless of which branch we use - so we should use the more robust mode from the start.

Member Author

Copying my reply from below:

I am not familiar with #521 yet. So I don't know how much extra time it would need to address the code and potential review comments that the new code would invite. And I personally don't have a lot of extra time to spend on this beyond what I spent already.

I also don't know much about the riselab setup. I was hoping to delay that until @ssuchter comes back. We can use Pepperdata jenkins in the meantime.

So I really hope that we can keep this PR simple and address gaps in future PRs.


import org.apache.spark.deploy.k8s.integrationtest.backend.IntegrationTestBackend
import org.apache.spark.deploy.k8s.integrationtest.constants.MINIKUBE_TEST_BACKEND
import org.apache.spark.deploy.k8s.integrationtest.docker.SparkDockerImageBuilder
Contributor

I believe this is using the incorrect branch - shouldn't this be based on apache-spark-on-k8s/spark#521 which in turn assumes that Minikube is pre-installed on the box? The code here should reflect what's in the integration-tests-reuse-minikube branch in apache-spark-on-k8s.

Member Author

Yeah, I tried to start with apache-spark-on-k8s/spark#521, but I had to give up after I realized there are too many things to do even with what's already in the fork. Decoupling the Maven dependencies alone took me two full days, including a Saturday afternoon. A lot of fun :-)

I think we want to add #521 later in a follow-up PR. Maybe others can chip in once the basic overhaul is done by this PR.

Contributor

Hm, I'm not sure what's in #521 that's specific to the fork that isn't in this PR?

Contributor

Specifically, looking at the diff in the PR, it's entirely code level. There aren't any new Maven constructs added or taken away. So we should be able to apply the diff here as well without needing to worry about Maven at all?

Member Author

I am not familiar with #521 yet. So I don't know how much extra time it would need to address the code and potential review comments that the new code would invite. And I personally don't have a lot of extra time to spend on this beyond what I spent already.

I also don't know much about the riselab setup. I was hoping to delay that until @ssuchter comes back. We can use Pepperdata jenkins in the meantime.

So I really hope that we can keep this PR simple and address gaps in future PRs.

Contributor

Followed up offline - we can introduce the usage of a pre-installed Minikube later, and in fact we can't use a pre-installed Minikube right now since our first iteration of this will deploy the tests on Pepperdata's Jenkins nodes. The Jenkins jobs will be serialized with builds from apache-spark-on-k8s, so we shouldn't have multiple test runs colliding on managing the Minikube VM.

There needs to be a specific sequencing of events to switch to using a pre-installed Minikube instance on the Riselab jenkins nodes instead. I'll follow up on what that specific sequencing is in an issue.

<dependency>
<groupId>com.spotify</groupId>
<artifactId>docker-client</artifactId>
<version>5.0.2</version>
Contributor

I'm not sure if we should have a universal standard of having these versions defined at the top. I'm not strongly opinionated here, but it's something to consider.

Contributor

It's just strange that some versions are hardcoded here while others are in constants.

Member Author

Sure. I can put all of them in properties.

@mccheah
Contributor

mccheah commented Dec 18, 2017

Actually, wouldn't this repository also need to define the script that runs make-distribution to create the Spark tarball, given a remote repository location? I'd imagine this repository's code would want to include that script, and the Jenkins job would just run it.

@kimoonkim
Member Author

I have been running make-distribution.sh manually and passing the output tarball to the integration test. For example:

$ export DATE=`date "+%Y%m%d"`
$ export REVISION=`git rev-parse --short HEAD`
$ ./dev/make-distribution.sh --name ${DATE}-${REVISION} --tgz -Phadoop-2.6 -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver

We can configure the Jenkins job(s) to include this, or we can check a script into our repo. I don't have a strong opinion either way.

There is a minor issue with checking in the script. There can be some variation in how we build the distribution tarballs, like hadoop-2.6 vs hadoop-2.7, including Python vs R profiles, etc. And it may take some time before we figure out how many different options we want to test. So checking in the script may slow us down.

@mccheah
Contributor

mccheah commented Dec 18, 2017

I'm of the opinion that the repository should define as much of the logic as possible, so that it's visible to everyone contributing to the project without needing to inspect the Jenkins node. So generally I'd expect one to be able to just run bin/run-tests.sh and have it do all of the following:

  • git checkout the latest master or maybe some git hash/tag with a remote URL. (Edit: Passing in the given remote URL and git hash as parameters is fine, can use apache/spark and the head of master as defaults)
  • make-distribution.sh
  • Run tests using Maven

@mccheah
Contributor

mccheah commented Dec 18, 2017

For the make-distribution we can use whatever upstream uses to publish tarballs for releases. I don't foresee us being opinionated about the Hadoop version until we add HDFS support.

@kimoonkim
Member Author

> For the make-distribution we can use whatever upstream uses to publish tarballs for releases.

But if we want to support multiple upstream versions, say 2.3, 2.4, etc., the upstream instructions themselves may change over time. For instance, 2.4 may drop Hadoop 2.6 support, or include a new subproject profile that did not exist in 2.3. It's a moving target.

I imagine we can include multiple scripts in that case. My point is that there is a downside to doing too much end-to-end wiring, especially at the start when things are in flux. It tends to lead to more work.

@mccheah
Contributor

mccheah commented Dec 18, 2017

I would expect the addition and modification of these flags to be the exception, not the norm. And our tests will usually not be opinionated about new modules. For example, our tests don't deal with streaming at all. If we expand our test coverage in the future to include such modules, we'll have to keep this dependency footprint and reliance on profiles in mind.

But this decoupling also gives us the flexibility to build with only the minimal set of profiles required to get our tests to work. We could, for example, build a distribution without Mesos and streaming support; we just have to pass the minimal set of flags to make-distribution.sh. The disadvantage there is that we don't test against the full classpath that spark-submit will run with, so we might miss classpath issues in the process.

In the longer term, we can propose that upstream publish nightly builds built with the same profiles as the full releases.

@kimoonkim
Member Author

The last discussion is actually a nice segue into the next topic: how exactly we should design the Jenkins job(s). Let's move it to issue #2. I have a few questions to ask there.

@kimoonkim
Member Author

@mccheah Thanks for the review so far. Please take a look at https://github.com/apache-spark-on-k8s/spark-integration/issues/2#issuecomment-352591170. I think this has implications for whether or not we want to include the distribution build script in this repo.

@liyinan926, I think I addressed your comments so far. Can you please take another look? Thanks.

@@ -0,0 +1 @@
Contents
Member

Is this file being used?

Member Author

No, it's not used. Do we want to delete the file?

@kimoonkim
Member Author

OK. I just hooked this PR up to Jenkins, and it is passing now: http://spark-k8s-jenkins.pepperdata.org:8080/view/upstream%20spark%20repo/job/spark-integration/22/console

[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ spark-kubernetes-integration-tests_2.11 ---
Discovery starting.
Discovery completed in 142 milliseconds.
Run starting. Expected test count is: 2
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
Run completed in 4 minutes, 38 seconds.
Total number of tests run: 2
Suites: completed 2, aborted 0
Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0
All tests passed.

This Jenkins job copies the distro tarball from the other Jenkins job: http://spark-k8s-jenkins.pepperdata.org:8080/view/upstream%20spark%20repo/job/build-spark-distribution/

README.md Outdated
Note that currently the integration tests only run with Java 8.

Running the integration tests requires a Spark distribution tarball. It also
needs a local path to the directory that contains `Dockerimage` files.
Member

This should say dockerfiles? We're shipping it under $DISTDIR/kubernetes/dockerfiles in upstream.

README.md Outdated
needs a local path to the directory that contains `Dockerimage` files.

Once you prepare the inputs, the integration tests can be executed with Maven or
your IDE. Note that when running tests from an IDE, the `pre-integration-test`
Member

Can we add a line explaining what each phase does?

The integration tests make use of
[Minikube](https://github.com/kubernetes/minikube), which fires up a virtual
machine and setup a single-node kubernetes cluster within it. By default the vm
is destroyed after the tests are finished. If you want to preserve the vm, e.g.
Member

Can we change this default behavior now? I think we wanted to remove the minikube lifecycle management from these tests. cc/ @mccheah

Member Author

We can. But the existing Jenkins jobs require Minikube to be cleaned up when they are done, so they need to set this flag to true. I am not sure it's worth the effort now, given that we are going to incorporate apache-spark-on-k8s/spark#521 in the near future.

Member

Fair enough. Thanks for clarifying.

Member
@liyinan926 left a comment

LGTM. Thanks for the work!

@kimoonkim
Member Author

@liyinan926 Thanks for the review.

@foxish I think I have addressed your comments so far. Please take another look.

@foxish
Member

foxish commented Dec 20, 2017

LGTM, we can iterate from this point forward. Thanks! Merging now.

@foxish merged commit 55bf2ca into apache-spark-on-k8s:master on Dec 20, 2017
@kimoonkim deleted the add-test-code branch on December 22, 2017 19:01