Add initial integration test code #3
Conversation
import org.apache.spark.deploy.k8s.integrationtest.constants.SPARK_DISTRO_PATH

private[spark] class KubernetesSuite extends FunSuite with BeforeAndAfterAll with BeforeAndAfter {
  import KubernetesSuite._
Add an empty line before the import.
before {
  sparkAppConf = kubernetesTestComponents.newSparkAppConf()
    .set("spark.kubernetes.initcontainer.docker.image", "spark-init:latest")
The property names for images are being changed in apache/spark#19995. They will need to be updated here once that PR is merged.
Ah, good to know. I'll leave a NOTE. Also, let me delete the initcontainer lines, since we don't use the init-container here yet.
before {
  sparkAppConf = kubernetesTestComponents.newSparkAppConf()
    .set("spark.kubernetes.initcontainer.docker.image", "spark-init:latest")
    .set("spark.kubernetes.driver.docker.image", "spark-driver:latest")
It seems newSparkAppConf already sets spark.kubernetes.driver.docker.image.
import io.fabric8.kubernetes.client.DefaultKubernetesClient
import org.scalatest.concurrent.Eventually
import scala.collection.mutable
Scala packages should be in a group after Java packages.
val outputLines = new ArrayBuffer[String]

Utils.tryWithResource(new InputStreamReader(proc.getInputStream)) { procOutput =>
  Utils.tryWithResource(new BufferedReader(procOutput)) { (bufferedOutput: BufferedReader) =>
BufferedReader and InputStreamReader can be combined into a single resource: new BufferedReader(new InputStreamReader(proc.getInputStream)).
Turning to Source.fromInputStream to avoid dealing with this ourselves.
var line: String = null
do {
  line = bufferedOutput.readLine()
  if (line != null) {
This check and the condition of the while loop are redundant.
Ditto. Source.fromInputStream replaces this code.
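For reference, a minimal sketch of the Source.fromInputStream approach might look like the following (the echo command is only a stand-in for the real process under test):

```scala
import scala.io.Source

object ReadProcessOutput {
  def main(args: Array[String]): Unit = {
    // "echo" is only a stand-in for the real command being launched.
    val proc = new ProcessBuilder("echo", "hello").start()
    // Source.fromInputStream replaces the InputStreamReader/BufferedReader pair
    // and the manual readLine() null-check loop with a single call.
    val outputLines: Seq[String] =
      Source.fromInputStream(proc.getInputStream).getLines().toList
    proc.waitFor()
    outputLines.foreach(println)
  }
}
```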
private val MINIKUBE_STARTUP_TIMEOUT_SECONDS = 60

def startMinikube(): Unit = synchronized {
Why are this and the following methods synchronized? Is it that executeMinikube cannot be called concurrently?
@mccheah would know better. I guess we want to prevent deleteMinikube from destroying the Minikube VM while other methods try to use the VM. Maybe such a race condition could corrupt the VM or confuse VM provisioning tools like VirtualBox?
Does this explanation make sense? If yes, I think we can leave a NOTE.
It makes sense to me.
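Concretely, the locking pattern being discussed is roughly the following (a sketch only; the object name and method bodies are placeholders, not the actual Minikube code):

```scala
// Every minikube-related method synchronizes on the same object, so deleteMinikube
// cannot tear down the VM while startMinikube or getMinikubeStatus is still using it.
object MinikubeLifecycle {
  def startMinikube(): Unit = synchronized {
    // would shell out to `minikube start` here
  }

  def getMinikubeStatus(): String = synchronized {
    // would shell out to `minikube status` here
    "Running"
  }

  def deleteMinikube(): Unit = synchronized {
    // would shell out to `minikube delete` here, serialized with the methods above
  }
}
```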
private val originalDockerUri = URI.create(dockerHost)
private val httpsDockerUri = new URIBuilder()
  .setHost(originalDockerUri.getHost)
Indentation.
private val dockerClient = new DefaultDockerClient.Builder()
  .uri(httpsDockerUri)
  .dockerCertificates(DockerCertificates
    .builder()
Ditto, indentation.
override def initialize(): Unit = {
  var k8ConfBuilder = new ConfigBuilder()
    .withApiVersion("v1")
    .withMasterUrl(master.replaceFirst("k8s://", ""))
Should we copy the implementation of https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L2763 for parsing k8s master URL?
Yes, it seems better to have that code. Do we think copying is the right way, instead of somehow depending on it as a Maven library? Copying is easy, at least for now. Just curious what others think.
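If we do copy it, the relevant logic is roughly the following (a simplified sketch, not the verbatim upstream Utils implementation):

```scala
object K8sMasterUrl {
  // Strip the k8s:// prefix and default to https when no scheme is given, so the
  // result can be handed to the fabric8 ConfigBuilder. Simplified for illustration.
  def checkAndGetK8sMasterUrl(rawMasterUrl: String): String = {
    require(rawMasterUrl.startsWith("k8s://"),
      s"Kubernetes master URL must start with k8s://, got: $rawMasterUrl")
    val withoutPrefix = rawMasterUrl.substring("k8s://".length)
    if (withoutPrefix.startsWith("http://") || withoutPrefix.startsWith("https://")) {
      withoutPrefix
    } else {
      s"https://$withoutPrefix"
    }
  }
}
```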
@@ -56,6 +57,8 @@ private[spark] class KubernetesTestComponents(defaultClient: DefaultKubernetesCl
new SparkAppConf()
  .set("spark.master", s"k8s://${kubernetesClient.getMasterUrl}")
  .set("spark.kubernetes.namespace", namespace)
  // TODO: apache/spark#19995 is changing docker.image to container.image in these properties.
FYI: the PR has been merged.
I see. I'll have to update my distro, make this change and test it again.
This is done by commit 989a371.
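For readers following along, the rename amounts to the following mapping of property names (a sketch; only the driver and executor keys from this discussion are listed):

```scala
// Property names before and after apache/spark#19995 (docker.image -> container.image).
object RenamedImageProperties {
  val renamed: Map[String, String] = Map(
    "spark.kubernetes.driver.docker.image"   -> "spark.kubernetes.driver.container.image",
    "spark.kubernetes.executor.docker.image" -> "spark.kubernetes.executor.container.image"
  )

  def main(args: Array[String]): Unit =
    renamed.foreach { case (oldKey, newKey) => println(s"$oldKey -> $newKey") }
}
```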
  }
}{
  output => output.close()
Is calling close necessary?
Yes, it is, because output is not a subclass of Closeable. See the extra version of Utils.tryWithResource that I added for details.
It seems Source is Closeable: https://www.scala-lang.org/api/current/scala/io/Source.html. So you don't need to call close explicitly.
Ah, that is interesting. I think it became Closeable in Scala 2.12; Source in Scala 2.11 is not Closeable. From https://github.com/scala/scala/blob/v2.11.8/src/library/scala/io/Source.scala#L190:
abstract class Source extends Iterator[Char] {
Ah, OK.
I actually managed to get rid of the close call by using the input stream as the resource. PTAL.
LGTM.
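For context, the extra version of Utils.tryWithResource mentioned earlier was presumably something along these lines (a sketch under that assumption, not the exact code from the PR):

```scala
object ResourceUtils {
  // A tryWithResource variant for resources that are not java.io.Closeable
  // (e.g. scala.io.Source on Scala 2.11): the caller supplies the cleanup
  // function in a separate parameter list.
  def tryWithResource[R, T](createResource: => R)(f: R => T)(closeResource: R => Unit): T = {
    val resource = createResource
    try {
      f(resource)
    } finally {
      closeResource(resource)
    }
  }
}

// Example usage:
//   ResourceUtils.tryWithResource(scala.io.Source.fromFile("input.txt"))(_.getLines().toList)(_.close())
```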
 * completes within timeout seconds.
 */
def executeProcess(fullCommand: Array[String], timeout: Long): Seq[String] = {
  val pb = new ProcessBuilder().command(fullCommand: _*)
Why not launch a pod to kick off the spark-submit command?
People launch spark-submit from outside their k8s clusters. I think launching spark-submit itself inside a pod would deviate from how people use Spark on k8s. Besides, I don't even know whether running spark-submit inside a pod works :-)
I believe decoupling the spark-submit portion from the integration tests is better, so that issues with the host machine don't cause the integration tests to fail when they shouldn't (we have enough assumptions already with Python and R). Thoughts? I can send up a PR that adds launching from a pod if we believe it to be better than using ProcessBuilder, which I am trying to avoid.
Launching from a pod would add a second layer of indirection when trying to debug issues with the spark-submit runtime itself. Making assumptions about the host machine should be fine; isn't that what the Spark R and Python unit tests do?
Also, it would complicate auth in the future when we add back custom client credentials, etc. See DriverKubernetesCredentialsStep.
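For context on the ProcessBuilder approach, running spark-submit as a local host process with a timeout looks roughly like this (an illustrative sketch, not the exact PR code; the command and timeout in main are placeholders):

```scala
import java.util.concurrent.TimeUnit

import scala.io.Source

object LocalProcessRunner {
  // Launch a command (e.g. spark-submit) on the host, capture its combined output,
  // and fail if it does not finish within the timeout.
  def executeProcess(fullCommand: Array[String], timeoutSeconds: Long): Seq[String] = {
    val pb = new ProcessBuilder().command(fullCommand: _*)
    pb.redirectErrorStream(true) // merge stderr into stdout for simpler capture
    val proc = pb.start()
    val outputLines = Source.fromInputStream(proc.getInputStream).getLines().toList
    assert(proc.waitFor(timeoutSeconds, TimeUnit.SECONDS),
      s"Timed out while executing: ${fullCommand.mkString(" ")}")
    outputLines
  }

  def main(args: Array[String]): Unit =
    executeProcess(Array("echo", "hello from a local process"), timeoutSeconds = 10)
      .foreach(println)
}
```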
</execution>
<execution>
  <!-- TODO: Remove this hack once upstream is fixed by SPARK-22777 -->
  <id>set-exec-bit-on-docker-entrypoint-sh</id>
Just set it in the Dockerfile using: RUN chmod +x /opt/entrypoint.sh.
I think that's better done by the upstream code. It's hard for this integration code to surgically do an in-place edit of the Dockerfile.
+1
The PR has merged now.
integration-test/pom.xml
<version>1.3.0</version>
<executions>
  <execution>
    <id>download-minikube-linux</id>
We shouldn't be downloading here; we should be using the built-in Minikube binary.
I agree that's what we want, but I suggest we address this in a future PR. I think we'd like to start testing the upstream sooner rather than later.
I think the Riselab nodes where we are running this will have Minikube installed already, since that setup assumes the Minikube instance is being re-used.
So it should be just as easy to use the pre-installed minikube as it would be to download a new one every time. That is, the total time from now to when the tests start running against upstream should be equivalent regardless of which branch we use - so we should use the more robust mode from the start.
Copying my reply from below:
I am not familiar with #521 yet. So I don't know how much extra time it would need to address the code and potential review comments that the new code would invite. And I personally don't have a lot of extra time to spend on this beyond what I spent already.
I also don't know much about the riselab setup. I was hoping to delay that until @ssuchter comes back. We can use Pepperdata jenkins in the meantime.
So I really hope that we can keep this PR simple and address gaps in future PRs.
import org.apache.spark.deploy.k8s.integrationtest.backend.IntegrationTestBackend
import org.apache.spark.deploy.k8s.integrationtest.constants.MINIKUBE_TEST_BACKEND
import org.apache.spark.deploy.k8s.integrationtest.docker.SparkDockerImageBuilder
I believe this is using the incorrect branch - shouldn't this be based on apache-spark-on-k8s/spark#521, which in turn assumes that Minikube is pre-installed on the box? The code here should reflect what's in the integration-tests-reuse-minikube branch in apache-spark-on-k8s.
Yeah, I tried to start with apache-spark-on-k8s/spark#521, but I had to give up after I realized there are too many things to do even with what's already in the fork. Decoupling the Maven dependencies alone took me 2 full days, including a Saturday afternoon. A lot of fun :-)
I think we want to add #521 later in a follow-up PR. Maybe others can chip in once the basic overhaul is done by this PR.
Hm, I'm not sure what's in #521 that's specific to the fork that isn't in this PR?
Specifically, looking at the diff in the PR, it's entirely code level. There aren't any new Maven constructs added or taken away. So we should be able to apply the diff here as well without needing to worry about Maven at all?
I am not familiar with #521 yet. So I don't know how much extra time it would need to address the code and potential review comments that the new code would invite. And I personally don't have a lot of extra time to spend on this beyond what I spent already.
I also don't know much about the riselab setup. I was hoping to delay that until @ssuchter comes back. We can use Pepperdata jenkins in the meantime.
So I really hope that we can keep this PR simple and address gaps in future PRs.
Followed up offline - we can introduce the usage of a pre-installed Minikube later, and in fact we can't use a pre-installed Minikube right now since our first iteration of this will be deploying the tests on Pepperdata's Jenkins nodes. The jenkins jobs will be serialized with builds from apache-spark-on-k8s, so we shouldn't have multiple test runs colliding on managing the Minikube VM.
There needs to be a specific sequencing of events to switch to using a pre-installed Minikube instance on the Riselab jenkins nodes instead. I'll follow up on what that specific sequencing is in an issue.
integration-test/pom.xml
<dependency>
  <groupId>com.spotify</groupId>
  <artifactId>docker-client</artifactId>
  <version>5.0.2</version>
I'm not sure if we should have a universal standard of having these versions defined at the top. I'm not strongly opinionated here, but it's something to consider.
It's just strange that some versions are hardcoded here while others are in constants.
Sure. I can put all of them in properties.
Actually wouldn't this repository also need to define the script that runs the
I have been running
We can configure Jenkins job(s) to include this or we can check in a script into our repo. I don't have a strong opinion either way. There is a minor issue with checking in the script: there can be some variations in how we build the distribution tarballs, like hadoop-2.6 vs hadoop-2.7, including Python vs R profiles, etc., and it may take some time before we figure out how many different options we want to test. So checking in the script may slow us down.
I'm of the opinion that the repository should define as much of the logic as possible so that it's visible to all that will contribute to the project without needing to inspect the Jenkins node. So generally I'd expect one to be able to just run
For the make-distribution we can use whatever upstream uses to publish tarballs for releases. I don't foresee us being opinionated about the Hadoop version until we add HDFS support.
But if we want to support multiple upstream versions, say 2.3, 2.4, etc., the upstream instructions themselves may change over time. For instance, 2.4 may drop Hadoop 2.6 support, or include a new subproject profile that did not exist in 2.3. It's a moving target over time. I imagine we can include multiple scripts in that case. My point is that there is a downside to doing too much wiring end-to-end, especially at the start when things are in flux. It tends to lead to more work.
I would expect the addition and modification of these flags to be the exception, not the norm. And our tests will usually not be opinionated towards new modules. For example, our tests don't deal with streaming at all. As we expand our test coverage in the future to include such modules, we'll have to keep this dependency footprint and reliance on profiles in mind. But this decoupling also gives us the flexibility to build with the minimal set of profiles required to get our tests to work. We could, for example, build a distribution without Mesos and streaming support - we just have to pass the minimal set of flags to the make-distribution script. In the longer-term scheme of things we can propose the notion of upstream publishing nightly builds that would be built with the profiles that the full releases are built with.
The last discussion is actually a nice segue into the next topic: how exactly we should design the Jenkins job(s). Let's move it to issue #2. I have a few questions to ask there.
@mccheah Thanks for the review so far. Please take a look at https://github.com/apache-spark-on-k8s/spark-integration/issues/2#issuecomment-352591170. I think this has implications for whether or not we want to include the distribution build script in this repo. @liyinan926, I think I addressed your comments so far. Can you please take another look? Thanks.
integration-test/test-data/input.txt
@@ -0,0 +1 @@
Contents
Is this file being used?
No, it's not used. Do we want to delete the file?
OK. I just hooked this PR up in Jenkins, and it is passing now: http://spark-k8s-jenkins.pepperdata.org:8080/view/upstream%20spark%20repo/job/spark-integration/22/console
This Jenkins job copies the distro tarball from the other Jenkins job: http://spark-k8s-jenkins.pepperdata.org:8080/view/upstream%20spark%20repo/job/build-spark-distribution/
README.md
Note that currently the integration tests only run with Java 8.

Running the integration tests requires a Spark distribution tarball. It also
needs a local path to the directory that contains `Dockerimage` files.
This should say dockerfiles? We're shipping it under $DISTDIR/kubernetes/dockerfiles in upstream.
README.md
needs a local path to the directory that contains `Dockerimage` files.

Once you prepare the inputs, the integration tests can be executed with Maven or
your IDE. Note that when running tests from an IDE, the `pre-integration-test`
Can we add a line explaining what each phase does?
The integration tests make use of
[Minikube](https://github.com/kubernetes/minikube), which fires up a virtual
machine and setup a single-node kubernetes cluster within it. By default the vm
is destroyed after the tests are finished. If you want to preserve the vm, e.g.
Can we change this default behavior now? I think we wanted to remove the minikube lifecycle management from these tests. cc/ @mccheah
We can. But the existing Jenkins jobs require Minikube to be cleaned up when they are done, so they need to set this flag to true. I am not sure it's worth the effort now, given that we are going to incorporate apache-spark-on-k8s/spark#521 in the near future.
Fair enough. Thanks for clarifying.
LGTM. Thanks for the work!
@liyinan926 Thanks for the review. @foxish I think I have addressed your comments so far. Please take another look.
LGTM, we can iterate from this point forward. Thanks! Merging now.
Closes #2.
Add a small number of basic test functions that work with both the main Spark repo and our fork. They use SparkPi and support a minimal set of features: only Scala, no resource staging support, etc.
The project no longer needs Spark jars as Maven dependencies. Instead, we pass the paths to a distro tarball and a dockerfiles directory to Maven on the command line.
Then Maven builds Docker images from the distro tarball. The test code launches Spark jobs using the spark-submit CLI.
@foxish @liyinan926 @mccheah @ssuchter