Different mul_mat results between METAL and CPU backends #1230
You should be looking at the relative difference, not the absolute. The absolute diff grows unbounded with the matrix sizes, so this check does not give you any information. |
Thank you for pointing out the need to use a relative-difference check. I've updated my validation code accordingly (see below), but I still observe a discrepancy when running with ggml. To rule out a hardware issue, I wrote a minimal PyTorch script that performs the same 512×512 matrix multiplication on both CPU and MPS backends using identical inputs and compares both absolute and relative errors. In that PyTorch test, the CPU and MPS results match exactly, with no measurable difference. I don't understand why ggml shows a relative error for this same operation while PyTorch does not.

GGML output
PyTorch output
Modified tolerance calculation code
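(The exact code is collapsed in the original thread; a minimal sketch of such a relative-difference check, with illustrative tolerance values, might look like this:)

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// compare two FP32 buffers with a combined absolute/relative tolerance,
// similar in spirit to torch.allclose: |a - b| <= atol + rtol * |b|
static bool all_close(const float * a, const float * b, int n,
                      float rtol = 1e-4f, float atol = 1e-6f) {
    float max_rel = 0.0f;
    bool  ok      = true;
    for (int i = 0; i < n; ++i) {
        const float diff = std::fabs(a[i] - b[i]);
        if (diff > atol + rtol * std::fabs(b[i])) {
            ok = false;
        }
        if (b[i] != 0.0f) {
            max_rel = std::max(max_rel, diff / std::fabs(b[i]));
        }
    }
    std::printf("max relative error: %g\n", max_rel);
    return ok;
}
```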
PyTorch Matrix Multiplication (MPS and CPU)
|
I also encountered inconsistent results between the GPU and CPU backends when performing matrix multiplication. The first matrix has dimensions (2048, 7168) with all values set to 1, and the second has dimensions (10, 7168) with all values set to 0.1. All inputs are FP32. The maximum relative error between the results is about 2.35e-04. Is this expected? (A sketch of the setup follows.)
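(A condensed sketch of this test case, assuming a ggml revision that still provides the single-context CPU path ggml_graph_compute_with_ctx; the memory size and thread count are arbitrary:)

```cpp
#include "ggml.h"
#include <cstdio>

int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 256 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // in ggml, ne[0] is the contiguous row length, so the shared
    // dimension 7168 comes first
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 7168, 2048);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 7168, 10);

    ggml_set_f32(a, 1.0f); // all ones
    ggml_set_f32(b, 0.1f); // all 0.1 -- not exactly representable in FP32

    // each output element is a 7168-term dot product: exactly 716.8 in
    // real arithmetic, but the FP32 summation order differs per backend
    struct ggml_tensor * c = ggml_mul_mat(ctx, a, b); // result: (2048, 10)

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/ 4);

    printf("c[0] = %f\n", ggml_get_f32_1d(c, 0));

    ggml_free(ctx);
    return 0;
}
```
|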
I ran the same test on ggml-easy, but there is no difference between CPU / Metal: https://github.com/ngxson/ggml-easy/blob/0d49b461053035205ca1ab5bc783f849baa2fa0b/demo/random.cpp#L103-L140 Also, looking at your code, there is no call to ggml_set_input/ggml_set_output on the graph's input and output tensors. (A sketch follows.)
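(For reference, a minimal sketch of marking graph inputs/outputs; the tensor shapes and names are illustrative, not taken from the issue:)

```cpp
// mark graph inputs/outputs so the backend scheduler does not
// reuse or overwrite their buffers during graph allocation
struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 512, 512);
struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 512, 512);
ggml_set_input(a);
ggml_set_input(b);

struct ggml_tensor * out = ggml_mul_mat(ctx, a, b);
ggml_set_output(out);
```
|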
I added a call to ggml_set_input/ggml_set_output when creating the graph, but no luck there. Then I noticed that, in the example you provided, you insert a ggml_scale node between the randomly initialized input data and the matrix-multiply operation. I copied that pattern in my own code (modified version below).

Modified code
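(Roughly, the pattern looks like this; the scale factor and tensor names are placeholders:)

```cpp
// pattern from the ggml-easy demo: pass the input through a ggml_scale
// node before the mul_mat (a factor of 1.0f leaves the values unchanged)
struct ggml_tensor * a_scaled = ggml_scale(ctx, a, 1.0f);
struct ggml_tensor * result   = ggml_mul_mat(ctx, a_scaled, b);
```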
|
Hi,
Thank you for the amazing work!
I'm seeing inconsistent results from the METAL and CPU backends when performing a matrix multiplication. I took the simple-backend.cpp example (code below), adapted it to run on both backends, and had it perform a single forward pass: each backend computes the same multiplication, and then the outputs are compared.
After debugging, I confirmed that both the CPU and METAL kernels use 32-bit floats. While some small floating-point discrepancies are expected, the difference I'm observing after just one pass is far larger than any reasonable tolerance, and it grows as the array sizes increase or when the inputs are small random values. I'd appreciate your help identifying why the METAL backend is producing such a large error.
Thank you!
Result
simple-backend.cpp
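(The full adapted example is collapsed in the original issue; below is a condensed sketch of the kind of harness described, assuming a ggml revision that provides ggml_backend_metal_init(). Buffer sizes, matrix contents, and the matrix size n are illustrative.)

```cpp
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"
#include "ggml-metal.h"

#include <cmath>
#include <cstdio>
#include <vector>

// run one mul_mat on the given backend and copy the result back to host
static std::vector<float> run_once(ggml_backend_t backend,
                                   const std::vector<float> & a_data,
                                   const std::vector<float> & b_data, int n) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead() * 8 + ggml_graph_overhead(),
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true, // tensor data lives in backend buffers
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n, n);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n, n);
    struct ggml_tensor * c = ggml_mul_mat(ctx, a, b);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);

    // allocate every tensor of the context in backend memory, then upload
    ggml_backend_buffer_t buf = ggml_backend_alloc_ctx_tensors(ctx, backend);
    ggml_backend_tensor_set(a, a_data.data(), 0, ggml_nbytes(a));
    ggml_backend_tensor_set(b, b_data.data(), 0, ggml_nbytes(b));

    ggml_backend_graph_compute(backend, gf);

    std::vector<float> out(n * n);
    ggml_backend_tensor_get(c, out.data(), 0, ggml_nbytes(c));

    ggml_backend_buffer_free(buf);
    ggml_free(ctx);
    return out;
}

int main() {
    const int n = 512;
    std::vector<float> a(n * n, 1.0f), b(n * n, 0.1f); // identical inputs for both runs

    ggml_backend_t cpu   = ggml_backend_cpu_init();
    ggml_backend_t metal = ggml_backend_metal_init();

    const std::vector<float> r_cpu   = run_once(cpu,   a, b, n);
    const std::vector<float> r_metal = run_once(metal, a, b, n);

    // report the maximum relative error, using the CPU result as reference
    float max_rel = 0.0f;
    for (int i = 0; i < n * n; ++i) {
        const float ref = std::fabs(r_cpu[i]);
        if (ref > 0.0f) {
            const float rel = std::fabs(r_cpu[i] - r_metal[i]) / ref;
            if (rel > max_rel) max_rel = rel;
        }
    }
    std::printf("max relative error CPU vs Metal: %g\n", max_rel);

    ggml_backend_free(metal);
    ggml_backend_free(cpu);
    return 0;
}
```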