
failures during experimental feature parallel compile #6250

Open
@vladmandic

Description

Testing the new experimental feature from PR #5826 (Add functions for parallel compilation),
which was recently merged into the main branch.

I'm loading a number of small models and attempting to run a pre-compile pass, and I'm getting errors on all attempts.

Here I've documented three different failures:

  • compilation fails on some models (while it works fine on others) with a seemingly random message such as:

    Uncaught (in promise) Error: Pass at least one tensor to tf.stack

  • compilation completes without errors, but actual code execution in JS later fails
    (the same code works just fine when there is no pre-compile):

    Uncaught (in promise) TypeError: Cannot read properties of null (reading 'A')
      at tfjs.esm.js:47772:27
      at Array.forEach (<anonymous>)
      at runProgram (tfjs.esm.js:47770:10)
      at _MathBackendWebGL.runWebGLProgram (tfjs.esm.js:49796:7)
      at _MathBackendWebGL.uploadToGPU (tfjs.esm.js:49916:40)
    

    This happens in a trivial function that runs tf.image.resizeBilinear followed by tf.div to normalize the input tensor
    (a sketch of such a function is shown right after this list).

  • compilation completes without errors, but model inference later fails with the same error as above.
    The actual backtrace shows that it happens during the execute call, and the kernel op in the model that triggers the error is a simple sub
    (the same model executes without issues when there is no pre-compile).

My function that runs the pre-compile pass on all models is:

import * as tf from '@tensorflow/tfjs';
import type { GraphModel } from '@tensorflow/tfjs';

const log = (...msg: unknown[]) => console.log(...msg); // stand-in for the app logger

type Models = Record<string, GraphModel>;

async function runCompile(allModels: Models) {
  const backendType = tf.getBackend();
  const webGLBackend = tf.backend();
  if ((backendType !== 'webgl') || (!webGLBackend || !webGLBackend.checkCompileCompletion)) {
    log('compile pass: skip');
    return;
  }
  const models = Object.values(allModels).filter((m) => m !== null) as GraphModel[];
  tf.env().set('ENGINE_COMPILE_ONLY', true);
  const numTensorsStart = tf.engine().state.numTensors;
  for (const model of models) {
    const shape = (model.inputs && model.inputs[0] && model.inputs[0].shape) ? [...model.inputs[0].shape] : [1, 64, 64, 3];
    const dtype = (model.inputs && model.inputs[0] && model.inputs[0].dtype) ? model.inputs[0].dtype : 'float32';
    for (let dim = 0; dim < shape.length; dim++) {
      if (shape[dim] === -1) shape[dim] = dim === 0 ? 1 : 64; // override batch number and any dynamic dimensions
    }
    const tensor = tf.zeros(shape, dtype);
    const res = await model.executeAsync(tensor);
    if (Array.isArray(res)) res.forEach((t) => tf.dispose(t));
    else tf.dispose(res);
    tf.dispose(tensor);
  }
  const kernels = await webGLBackend.checkCompileCompletionAsync(); // same errors if check is moved inside per-model loop
  webGLBackend.getUniformLocations();
  log('compile pass kernels:', kernels.length); // getting a reasonable value here
  tf.env().set('ENGINE_COMPILE_ONLY', false);
  const numTensorsEnd = tf.engine().state.numTensors;
  if ((numTensorsEnd - numTensorsStart) > 0) log('tensor leak:', numTensorsEnd - numTensorsStart); // no leaks
}
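
For context, the compile pass is invoked once, right after the models are loaded and before any inference, roughly like this. This is a minimal sketch; the model names and URLs are placeholders, not the actual ones.

import * as tf from '@tensorflow/tfjs';
import type { GraphModel } from '@tensorflow/tfjs';

// placeholder model URLs, for illustration only
const modelUrls: Record<string, string> = {
  detect: '/models/detect/model.json',
  segment: '/models/segment/model.json',
};

async function main() {
  await tf.setBackend('webgl');
  await tf.ready();
  const allModels: Record<string, GraphModel> = {};
  for (const [name, url] of Object.entries(modelUrls)) {
    allModels[name] = await tf.loadGraphModel(url);
  }
  await runCompile(allModels); // pre-compile pass shown above
  // normal preprocessing and inference follow; with the pre-compile pass enabled
  // they fail as described above, without it everything works
}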
