Updates

rnett · rnett · commit 95c32e44451c · 2021-02-01T22:18:10.000-08:00
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -1,57 +1,99 @@
-# Building and contributing to TensorFlow Java
+# Building and Contributing to TensorFlow Java
 
 ## Building
 
-To build all the artifacts, simply invoke the command `mvn install` at the root of this repository (or
-the Maven command of your choice). It is also possible to build artifacts with support for MKL enabled with
+To build all the artifacts, simply invoke the command `mvn install` at the root of this repository (or the Maven command of your choice). It is also
+possible to build artifacts with support for MKL enabled with
 `mvn install -Djavacpp.platform.extension=-mkl` or CUDA with `mvn install -Djavacpp.platform.extension=-gpu`
 or both with `mvn install -Djavacpp.platform.extension=-mkl-gpu`.
 
 When building this project for the first time in a given workspace, the script will attempt to download
-the [TensorFlow runtime library sources](https://github.com/tensorflow/tensorflow) and build of all the native code
-for your platform. This requires a valid environment for building TensorFlow, including the [bazel](https://bazel.build/)
+the [TensorFlow runtime library sources](https://github.com/tensorflow/tensorflow) and build all the native code for your platform. This requires a
+valid environment for building TensorFlow, including the [bazel](https://bazel.build/)
 build tool and a few Python dependencies (please read [TensorFlow documentation](https://www.tensorflow.org/install/source)
 for more details).
 
-This step can take multiple hours on a regular laptop. It is possible though to skip completely the native build if you are
-working on a version that already has pre-compiled native artifacts for your platform [available on Sonatype OSS Nexus repository](#Snapshots).
-You just need to activate the `dev` profile in your Maven command to use those artifacts instead of building them from scratch
+This step can take multiple hours on a regular laptop. However, you can skip the native build completely if you are working on a version that
+already has pre-compiled native artifacts for your platform [available on Sonatype OSS Nexus repository](#Snapshots). You just need to activate
+the `dev` profile in your Maven command to use those artifacts instead of building them from scratch
 (e.g. `mvn install -Pdev`).
 
-Note that modifying any source files under `tensorflow-core` may impact the low-level TensorFlow bindings, in which case a
-complete build could be required to reflect the changes.
+Modifying the native op generation code (not the annotation processor) or the JavaCPP configuration (not the abstract Pointers) requires a
+complete build to reflect the changes; otherwise `-Pdev` should be fine.
 
+### GPU Support
+
+Currently, due to build time constraints, the GPU binaries only support compute capabilities 3.5 and 7.0.  
+To use an unsupported GPU, change the value [here](tensorflow-core/tensorflow-core-api/build.sh#L27) and build the binaries yourself. While this
+is far from ideal, we are working on getting more build resources, and for now this is the best option.
+
+To build for GPU, pass `-Djavacpp.platform.extension=-gpu` to Maven. By default, the CI options are used for the bazel build.  
+To override these options (such as CUDA locations), run TensorFlow's configure script and copy the resulting `.tf_configure.bazelrc` to
+`tensorflow-core-api`. See the [Working with Bazel generation](#working-with-bazel-generation) section for details. If you do this, make sure
+the `TF_CUDA_COMPUTE_CAPABILITIES` value in your `.tf_configure.bazelrc` matches the value set in `build.sh`.
 
 ## Running Tests
 
-`ndarray` can be tested using the maven `test` target.  `tensorflow-core` and `tensorflow-framework`, however, 
-should be tested using the `integration-test` target, due to the need to include native binaries.
-It will **not** be ran when using the `test` target of parent projects, but will be ran by `install` or `integration-test`.
+`ndarray` can be tested using the Maven `test` target.  `tensorflow-core` and `tensorflow-framework`, however, should be tested using
+the `integration-test` target, due to the need to include native binaries. These tests will **not** be run when using the `test` target of parent
+projects, but will be run by `install` or `integration-test`. If you see a `no jnitensorflow in java.library.path` error from tests, it is likely
+because you're running the wrong test target.
+
+### Native Crashes
+
+Occasionally tests will fail with a message like:
+
+```
+Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.0:test(default-test)on project tensorflow-core-api:There are test failures.
+
+    Please refer to C:\mpicbg\workspace\tensorflow\java\tensorflow-core\tensorflow-core-api\target\surefire-reports for the individual test results.
+    Please refer to dump files(if any exist)[date]-jvmRun[N].dump,[date].dumpstream and[date]-jvmRun[N].dumpstream.
+    The forked VM terminated without properly saying goodbye.VM crash or System.exit called?
+    Command was cmd.exe/X/C"C:\Users\me\.jdks\adopt-openj9-1.8.0_275\jre\bin\java -jar C:\Users\me\AppData\Local\Temp\surefire236563113746082396\surefirebooter5751859365434514212.jar C:\Users\me\AppData\Local\Temp\surefire236563113746082396 2020-12-18T13-57-26_766-jvmRun1 surefire2445852067572510918tmp surefire_05950149004635894208tmp"
+    Error occurred in starting fork,check output in log
+    Process Exit Code:-1
+    Crashed tests:
+    org.tensorflow.TensorFlowTest
+    org.apache.maven.surefire.booter.SurefireBooterForkException:The forked VM terminated without properly saying goodbye.VM crash or System.exit called?
+    Command was cmd.exe/X/C"C:\Users\me\.jdks\adopt-openj9-1.8.0_275\jre\bin\java -jar C:\Users\me\AppData\Local\Temp\surefire236563113746082396\surefirebooter5751859365434514212.jar C:\Users\me\AppData\Local\Temp\surefire236563113746082396 2020-12-18T13-57-26_766-jvmRun1 surefire2445852067572510918tmp surefire_05950149004635894208tmp"
+    Error occurred in starting fork,check output in log
+    Process Exit Code:-1
+    Crashed tests:
+    org.tensorflow.TensorFlowTest
+    at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:671)
+    at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:533)
+    at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:278)
+    at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:244)
+```
+
+This happens when the native code crashes (e.g. because of a segfault). It should have created a dump file somewhere in the project that you can use
+to tell what caused the issue.
 
 ## Contributing
 
 ### Formatting
 
-Java sources should be formatted according to the [Google style guide](https://google.github.io/styleguide/javaguide.html).
-It can be included in [IntelliJ](https://github.com/google/styleguide/blob/gh-pages/intellij-java-google-style.xml) and 
+Java sources should be formatted according to the [Google style guide](https://google.github.io/styleguide/javaguide.html). It can be included
+in [IntelliJ](https://github.com/google/styleguide/blob/gh-pages/intellij-java-google-style.xml) and
 [Eclipse](https://github.com/google/styleguide/blob/gh-pages/eclipse-java-google-style.xml).
 [Google's C++ style guide](https://google.github.io/styleguide/cppguide.html) should also be used for C++ code.
 
 ### Code generation
 
-Code generation for `Ops` and related classes is done during `tensorflow-core-api`'s `install`, using the annotation processor in 
-`tensorflow-core-generator`. If you change or add any operator classes (annotated with `org.tensorflow.op.annotation.Operator`), 
-endpoint methods (annotated with `org.tensorflow.op.annotation.Endpoint`), or change the annotation processor, be sure to re-run a 
-full `mvn install` in `tensorflow-core-api`.
+Code generation for `Ops` and related classes is done during `tensorflow-core-api`'s `compile` phase, using the annotation processor in
+`tensorflow-core-generator`. If you change or add any operator classes (annotated with `org.tensorflow.op.annotation.Operator`) or endpoint methods
+(annotated with `org.tensorflow.op.annotation.Endpoint`), or change the annotation processor, be sure to re-run
+`mvn install` in `tensorflow-core-api` (`-Pdev` is fine for this, since it only needs to run the annotation processor).
 
 ### Working with Bazel generation
 
-`tensorflow-core-api` uses Bazel-built C++ code generation to generate most of the `@Operator` classes.  To get it to build, you will likely need to 
-clone the [tensorflow](https://github.com/tensorflow/tensorflow) project, run its configuration script (`./configure`), and copy the resulting 
+`tensorflow-core-api` uses Bazel-built C++ code generation to generate most of the `@Operator` classes.  
+By default, the bazel build is configured for the [CI](.github/workflows/ci.yml), so if you're building locally, you may need to clone
+the [tensorflow](https://github.com/tensorflow/tensorflow) project, run its configuration script (`./configure`), and copy the resulting
 `.tf_configure.bazelrc` to `tensorflow-core-api`.
 
-To run the code generation, use the `//:java_op_generator` target.  The resulting binary has good help text (viewable in 
-[op_gen_main.cc](tensorflow-core/tensorflow-core-api/src/bazel/op_generator/op_gen_main.cc#L31-L48)).
-Genrally, it should be called with arguments that are something like `bazel-out/k8-opt/bin/external/org_tensorflow/tensorflow/libtensorflow_cc.so 
---output_dir=src/gen/java --api_dirs=bazel-tensorflow-core-api/external/org_tensorflow/tensorflow/core/api_def/base_api,src/bazel/api_def` 
+To run the code generation, use the `//:java_op_generator` target. The resulting binary has good help text (viewable in
+[op_gen_main.cc](tensorflow-core/tensorflow-core-api/src/bazel/op_generator/op_gen_main.cc#L31-L48)). Generally, it should be called with arguments
+that are something
+like `bazel-out/k8-opt/bin/external/org_tensorflow/tensorflow/libtensorflow_cc.so --output_dir=src/gen/java --api_dirs=bazel-tensorflow-core-api/external/org_tensorflow/tensorflow/core/api_def/base_api,src/bazel/api_def`
 (from `tensorflow-core-api`).
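
Regarding the Native Crashes section added in this diff: the following is a minimal sketch of how one might search for dump files after a crashed test run. The function name `find_crash_dumps` is hypothetical and not part of the repository. The `*.dump` and `*.dumpstream` patterns come from the surefire message quoted in that section; `hs_err_pid*.log` is the fatal error log written by HotSpot JVMs. Other JVMs (such as OpenJ9) may use different file names, so treat the pattern list as an assumption to adjust.

```shell
# Hypothetical helper, not part of TensorFlow Java.
# Lists candidate crash dump files under a directory (default: current directory).
find_crash_dumps() {
  find "${1:-.}" -type f \( \
    -name '*.dump' -o \
    -name '*.dumpstream' -o \
    -name 'hs_err_pid*.log' \
  \)
}
```

It could be run from the repository root, for example as `find_crash_dumps tensorflow-core/tensorflow-core-api`.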