Open
Description
I noticed odd behavior in leave_k_out_split: under certain circumstances (many rows with only one value?), the value returned for a withheld test set entry differs from the actual value in the original matrix. As a result, train + test fails to reconstruct the original matrix.
Script to reproduce:
import implicit
implicit.__version__
#> '0.6.2'
from implicit.evaluation import leave_k_out_split
from scipy import sparse
ratings = sparse.csr_matrix(
[[3, 2, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 1, 1]]
)
train, test = leave_k_out_split(ratings, K=1, random_state=42)
diff = (train + test) - ratings
diff.nnz
#> 1
train.todense()
#> matrix([[3, 2, 0, 0],
#> [1, 0, 0, 0],
#> [0, 1, 0, 0],
#> [0, 0, 1, 0],
#> [0, 1, 0, 1]])
test.todense()
#> matrix([[0, 0, 4, 0],
#> [0, 0, 0, 0],
#> [0, 0, 0, 0],
#> [0, 0, 0, 0],
#> [0, 0, 1, 0]])
diff.todense()
#> matrix([[0, 0, 3, 0],
#> [0, 0, 0, 0],
#> [0, 0, 0, 0],
#> [0, 0, 0, 0],
#> [0, 0, 0, 0]])
Created at 2022-12-23 13:59:28 CST by reprexlite v0.5.0
I believe the issue is in _take_tails: the returned test_idx array has multiple copies of the first user index returned, so we end up with a test set value that is copied multiple times.
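To illustrate the suspected mechanism, here is a minimal sketch that does not use implicit's internals. It assumes the test matrix is built from (data, (row, col)) triplets selected by an index array, and the test_idx values below are hypothetical, chosen by hand to mimic the output above. SciPy sums duplicate coordinates when building a CSR matrix from triplets, so a repeated index inflates the held-out value.

```python
import numpy as np
from scipy import sparse

ratings = sparse.csr_matrix(
    [[3, 2, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 1, 1]]
)
coo = ratings.tocoo()

# Hypothetical index array (an assumption, not taken from _take_tails):
# position 2 is entry (0, 2) and position 7 is entry (4, 2).
test_idx = np.array([2, 2, 2, 2, 7])


def build(idx):
    """Build a CSR matrix from the COO entries selected by idx."""
    return sparse.csr_matrix(
        (coo.data[idx], (coo.row[idx], coo.col[idx])), shape=coo.shape
    )


# Duplicate coordinates are summed, so the value 1 at (0, 2) is counted 4 times.
buggy_test = build(test_idx)
print(buggy_test.toarray()[0, 2])  # 4

# Deduplicating the indices restores the original value.
unique_idx = np.unique(test_idx)
fixed_test = build(unique_idx)
print(fixed_test.toarray()[0, 2])  # 1

# With unique test indices, train + test reconstructs the original matrix.
train_idx = np.setdiff1d(np.arange(coo.nnz), unique_idx)
train = build(train_idx)
print(np.array_equal((train + fixed_test).toarray(), ratings.toarray()))  # True
```

Deduplicating only hides the symptom in this sketch; the actual fix would presumably need to stop _take_tails from selecting the same index more than once.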
(As an aside, the call to _take_tails when shuffled=True does not pass on the rng, so the random state cannot be maintained.)
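As a rough sketch of what passing the random state through could look like: the helper below, shuffle_within_groups, is hypothetical and is not part of implicit. It assumes that shuffling is done by adding noise to integer group ids, and it takes an explicit rng argument so that the same seed always gives the same ordering.

```python
import numpy as np


def shuffle_within_groups(groups, rng=None):
    """Hypothetical helper: return indices that sort `groups` by group id,
    with a random order inside each group.

    `rng` may be None, an integer seed, or a numpy Generator.
    """
    rng = np.random.default_rng(rng)
    # Noise in [0, 1) keeps integer group ids together but shuffles within them.
    noisy = groups + rng.random(groups.shape)
    return noisy.argsort(kind="mergesort")


users = np.array([0, 0, 0, 1, 4, 4, 4])

# The same seed reproduces the same ordering.
first = shuffle_within_groups(users, rng=42)
second = shuffle_within_groups(users, rng=42)
print(np.array_equal(first, second))  # True
```

Drawing the noise from a caller-supplied generator rather than the global numpy state is what makes the split repeatable for a given random_state.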