examples/optimized-embeddings.ipynb (+12 −3: 12 additions & 3 deletions)
@@ -75,7 +75,16 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-  "Bi-encoders are implemented as two classes, one encoding the documents and the other encoding the queries. We load our quantized embedding model for both:"
+  "Bi-encoders are implemented as two classes, one encoding the documents and the other encoding the queries.\n",
+  "Embedding performance on Intel hardware depends on the data input strategy: it is recommended to calibrate the batch size and padding strategy to minimize latency or maximize throughput when embedding.\n",
+  "\n",
+  "If your sequences are shorter than the model's maximum length (for example, shorter than 512 tokens for BGE), it is recommended to truncate them (via the `max_sequence_length` argument) to speed up encoding.\n",
+  "Padding can be set to `True`, so that each batch is padded to the length of its longest sequence (which can vary between batches), or to `max_length`, which pads every batch to the configured maximum length.\n",
+  "Varying the batch size with `padding=True` affects the throughput of the embedding model: larger batches are more likely to contain a long sequence and be padded to greater lengths, while smaller batches produce many batches of varying sizes.\n",
+  "\n",
+  "Experimentation on your data is key to maximizing performance!\n",