server : documentation of JSON return value of /completion endpoint (#3632)
* Added documentation of JSON return value of /completion endpoint
* Update examples/server/README.md
---------
Co-authored-by: Georgi Gerganov <[email protected]>
examples/server/README.md
Lines changed: 36 additions & 6 deletions
```diff
@@ -106,25 +106,25 @@ node index.js
 
 ## API Endpoints
 
-- **POST** `/completion`: Given a prompt, it returns the predicted completion.
+- **POST** `/completion`: Given a `prompt`, it returns the predicted completion.
 
 *Options:*
 
+`prompt`: Provide the prompt for this completion as a string or as an array of strings or numbers representing tokens. Internally, the prompt is compared to the previous completion and only the "unseen" suffix is evaluated. If the prompt is a string or an array with the first element given as a string, a `bos` token is inserted in the front like `main` does.
+
```
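The new `prompt` description distinguishes prompt shapes by type. The rule for when a `bos` token is prepended can be sketched in Python as follows. `prepends_bos` is a hypothetical helper that only restates the documented rule (it is not server code), the token ids are placeholders, and the empty-array case is an assumption because the README does not cover it.

```python
import json
from typing import List, Union

# A prompt is either a plain string or an array mixing strings and token ids.
Prompt = Union[str, List[Union[str, int]]]


def prepends_bos(prompt: Prompt) -> bool:
    """Hypothetical helper restating the documented rule: a BOS token is
    inserted when the prompt is a string, or an array whose first element
    is a string."""
    if isinstance(prompt, str):
        return True
    # Assumption: an empty array gets no BOS (the README does not say).
    return len(prompt) > 0 and isinstance(prompt[0], str)


# Example request bodies for POST /completion; token ids are placeholders.
bodies = [
    {"prompt": "Hello", "n_predict": 16},
    {"prompt": ["System text", 1234, 5678]},
    {"prompt": [1234, 5678, "trailing text"]},
]
for body in bodies:
    print(json.dumps(body), "-> BOS:", prepends_bos(body["prompt"]))
```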
```diff
 `temperature`: Adjust the randomness of the generated text (default: 0.8).
 
 `top_k`: Limit the next token selection to the K most probable tokens (default: 40).
 
 `top_p`: Limit the next token selection to a subset of tokens with a cumulative probability above a threshold P (default: 0.95).
 
```
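To make the three sampling options concrete, here is a simplified sketch of how `top_k`, `top_p` and `temperature` might narrow a next-token distribution, using the documented defaults. It assumes top-k filtering first, then top-p read as the usual nucleus rule (keep the smallest set of most likely tokens whose cumulative probability reaches P), then temperature scaling of the survivors. That ordering and the greedy fallback for a temperature of 0 are assumptions; the server's sampler may differ in detail.

```python
import math
import random
from typing import Dict, Optional


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert logits to probabilities (subtracting the max for stability)."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def filter_candidates(
    logits: Dict[str, float], top_k: int = 40, top_p: float = 0.95
) -> Dict[str, float]:
    """Keep the top_k highest logits (top_k <= 0 keeps all), then the smallest
    prefix of those whose renormalized probability mass reaches top_p."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    probs = softmax(dict(ranked))
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, logit in ranked:
        kept[tok] = logit
        cumulative += probs[tok]
        if cumulative >= top_p:
            break
    return kept


def sample(
    logits: Dict[str, float],
    temperature: float = 0.8,
    top_k: int = 40,
    top_p: float = 0.95,
    rng: Optional[random.Random] = None,
) -> str:
    """Filter with top-k/top-p, then draw one token after temperature scaling."""
    rng = rng or random.Random()
    kept = filter_candidates(logits, top_k, top_p)
    if temperature <= 0:
        # Assumption for this sketch: a non-positive temperature means greedy.
        return max(kept, key=kept.get)
    probs = softmax({tok: v / temperature for tok, v in kept.items()})
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


logits = {"Paris": 2.0, "Lyon": 1.0, "Nice": 0.1, "Rome": -1.0}
print(filter_candidates(logits, top_k=2, top_p=0.95))  # {'Paris': 2.0, 'Lyon': 1.0}
print(sample(logits, rng=random.Random(0)))
```

With the defaults, "Rome" falls outside the 0.95 probability mass and can never be drawn, while a lower `temperature` sharpens the choice among the remaining tokens.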
```diff
-`n_predict`: Set the number of tokens to predict when generating text. **Note:** May exceed the set limit slightly if the last token is a partial multibyte character. When 0, no tokens will be generated but the prompt is evaluated into the cache. (default: -1, -1 = infinity).
+`n_predict`: Set the maximum number of tokens to predict when generating text. **Note:** May exceed the set limit slightly if the last token is a partial multibyte character. When 0, no tokens will be generated but the prompt is evaluated into the cache. (default: -1, -1 = infinity).
 
-`n_keep`: Specify the number of tokens from the initial prompt to retain when the model resets its internal context.
-By default, this value is set to 0 (meaning no tokens are kept). Use `-1` to retain all tokens from the initial prompt.
+`n_keep`: Specify the number of tokens from the prompt to retain when the context size is exceeded and tokens need to be discarded.
+By default, this value is set to 0 (meaning no tokens are kept). Use `-1` to retain all tokens from the prompt.
 
 `stream`: It allows receiving each predicted token in real-time instead of waiting for the completion to finish. To enable this, set to `true`.
 
-`prompt`: Provide a prompt as a string, or as an array of strings and numbers representing tokens. Internally, the prompt is compared, and it detects if a part has already been evaluated, and the remaining part will be evaluate. If the prompt is a string, or an array with the first element given as a string, a space is inserted in the front like main.cpp does.
-
 `stop`: Specify a JSON array of stopping strings.
 These words will not be included in the completion, so make sure to add them to the prompt for the next iteration (default: []).
 
```
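Because stop strings are left out of the completion, a client that continues a conversation has to add them back itself. The sketch below, assuming the earliest match in the generated text wins, shows that truncation and returns the matched string the way the `stopping_word` field of the result JSON does. `apply_stop` is a hypothetical helper, not part of the server.

```python
from typing import List, Tuple


def apply_stop(text: str, stop: List[str]) -> Tuple[str, str]:
    """Hypothetical helper: cut `text` at the earliest occurrence of any stop
    string. Returns (content, stopping_word); stopping_word is "" when no stop
    string matched, mirroring the `stopping_word` result field."""
    best_pos, best_word = -1, ""
    for word in stop:
        if not word:
            continue  # ignore empty stop strings
        pos = text.find(word)
        if pos != -1 and (best_pos == -1 or pos < best_pos):
            best_pos, best_word = pos, word
    if best_pos == -1:
        return text, ""
    return text[:best_pos], best_word


prompt = "User: Capital of France?\nAssistant:"
generated = " Paris.\nUser: And Germany?"
content, stopping_word = apply_stop(generated, ["\nUser:", "</s>"])
print(repr(content), repr(stopping_word))  # ' Paris.' '\nUser:'

# The stop word is not part of `content`, so re-append it when building the
# prompt for the next iteration, as the option description recommends.
next_prompt = prompt + content + stopping_word + " And Germany?\nAssistant:"
```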
```diff
@@ -158,6 +158,36 @@ node index.js
 
 `n_probs`: If greater than 0, the response also contains the probabilities of top N tokens for each generated token (default: 0)
 
+*Result JSON:*
+
+Note: When using streaming mode (`stream`) only `content` and `stop` will be returned until end of completion.
+
+`content`: Completion result as a string (excluding `stopping_word` if any). In case of streaming mode, will contain the next token as a string.
+
+`stop`: Boolean for use with `stream` to check whether the generation has stopped (Note: This is not related to stopping words array `stop` from input options)
+
```
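With `stream` enabled, a client keeps appending `content` until a chunk reports `stop: true`. The following is a minimal sketch of that loop. It assumes server-sent-event framing (each chunk on a line starting with `data: ` followed by one JSON object), which should be checked against the server version in use. It accepts any iterable of lines so it runs without a network connection, and the events in the example are made up.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Dict[str, Any]]:
    """Join streamed `content` pieces until a chunk reports `stop: true`.

    Assumes each event is a line "data: <json>"; other lines (blank
    separators, comments) are skipped. Returns the joined text and the last
    chunk, where any end-of-completion fields would appear."""
    pieces: List[str] = []
    last: Dict[str, Any] = {}
    for line in lines:
        if not line.startswith("data: "):
            continue
        last = json.loads(line[len("data: "):])
        pieces.append(last.get("content", ""))
        if last.get("stop"):
            break
    return "".join(pieces), last


# Made-up events for illustration only.
events = [
    'data: {"content": "Hello", "stop": false}',
    "",
    'data: {"content": " world", "stop": false}',
    "",
    'data: {"content": "", "stop": true, "stopped_eos": true}',
]
text, final = collect_stream(events)
print(repr(text), final.get("stopped_eos"))  # 'Hello world' True
```

Against a live server, the same function could be fed decoded response lines, for example from `requests`' `Response.iter_lines(decode_unicode=True)`.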
```diff
+`generation_settings`: The provided options above excluding `prompt` but including `n_ctx`, `model`
+
+`model`: The path to the model loaded with `-m`
+
+`prompt`: The provided `prompt`
+
+`stopped_eos`: Indicating whether the completion has stopped because it encountered the EOS token
+
+`stopped_limit`: Indicating whether the completion stopped because `n_predict` tokens were generated before stop words or EOS was encountered
+
+`stopped_word`: Indicating whether the completion stopped due to encountering a stopping word from `stop` JSON array provided
+
+`stopping_word`: The stopping word encountered which stopped the generation (or "" if not stopped due to a stopping word)
+
+`timings`: Hash of timing information about the completion such as the number of tokens `predicted_per_second`
+
+`tokens_cached`: Number of tokens from the prompt which could be re-used from previous completion (`n_past`)
+
+`tokens_evaluated`: Number of tokens evaluated in total from the prompt
+
+`truncated`: Boolean indicating if the context size was exceeded during generation, i.e. the number of tokens provided in the prompt (`tokens_evaluated`) plus tokens generated (`tokens predicted`) exceeded the context size (`n_ctx`)
```
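The three `stopped_*` flags and `truncated` are easiest to read together. A small sketch that turns a non-streaming result into a one-line summary follows. The example values are invented for illustration, and the check order assumes at most one `stopped_*` flag is true.

```python
from typing import Any, Dict


def stop_reason(result: Dict[str, Any]) -> str:
    """Map the documented stop flags of a /completion result to a label."""
    if result.get("stopped_eos"):
        return "eos"
    if result.get("stopped_word"):
        return "word:" + result.get("stopping_word", "")
    if result.get("stopped_limit"):
        return "limit"
    return "unknown"


def describe(result: Dict[str, Any]) -> str:
    """One-line summary: stop reason, prompt cache reuse, context overflow."""
    parts = [
        "reason=" + stop_reason(result),
        f"cached={result.get('tokens_cached', 0)}",
        f"evaluated={result.get('tokens_evaluated', 0)}",
    ]
    if result.get("truncated"):
        parts.append("context exceeded (truncated)")
    return ", ".join(parts)


# Invented values, shaped like the fields listed above.
example_result = {
    "content": " Paris.",
    "stop": True,
    "stopped_eos": False,
    "stopped_limit": False,
    "stopped_word": True,
    "stopping_word": "###",
    "tokens_cached": 12,
    "tokens_evaluated": 20,
    "truncated": False,
}
print(describe(example_result))  # reason=word:###, cached=12, evaluated=20
```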