Description
Discussed in #614
Originally posted by talhalatifkhan August 16, 2023
I am trying to make sure that my output follows a JSON format every time. I stumbled upon jsonformer, and from there upon grammar-based sampling. I used json-schema-to-grammar.py to convert my JSON schema into a grammar.
I want to know whether grammar-based sampling is meant for this specific purpose and, if so, how I should use it.
Json schema
json_schema = {
    "type": "object",
    "properties": {
        "Stage": {
            "type": "string",
            "enum": ["first", "second"]
        },
        "Task Finished": {"type": "boolean"},
        "Statement": {"type": "string"},
        "Assistant": {"type": "string"}
    }
}
Llama grammar
space ::= " "?
string ::= "\"" (
[^"\\] |
"\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
)* "\"" space
Stage ::= "\"first\"" | "\"second\""
boolean ::= ("true" | "false") space
root ::= "{" space "\"Assistant\"" space ":" space string "," space "\"Stage\"" space ":" space Stage "," space "\"Statement\"" space ":" space string "," space "\"Task Finished\"" space ":" space boolean "}" space
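For reference, the `root` rule above makes all four keys mandatory and fixes their order (Assistant, Stage, Statement, Task Finished). The following is a minimal sketch of a string that this grammar should accept, checked with Python's standard `json` module. The field values are invented for illustration and are not real model output.

```python
import json

# Hypothetical model output; the values are made up for illustration.
sample_output = (
    '{"Assistant": "Which option did you mean?", "Stage": "first", '
    '"Statement": "sample statement", "Task Finished": false}'
)

parsed = json.loads(sample_output)

# The root rule lists the keys in this fixed order.
expected_keys = ["Assistant", "Stage", "Statement", "Task Finished"]
assert list(parsed.keys()) == expected_keys

# "Stage" is restricted to the two enum values, "Task Finished" to a boolean.
assert parsed["Stage"] in ("first", "second")
assert isinstance(parsed["Task Finished"], bool)

print(parsed)
```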
Here is my code
from llama_cpp import Llama, LlamaGrammar
fs_template = """
You are a precise AI comparer. Your task is to match the user's intent to the statements in the context and confirm if the identified intent is correct.
Your responses should strictly follow the format below:
Stage: [print 'first']
User Intent: [insert user intent statement here]
Task Finished: [insert boolean value based on whether user intent is confirmed]
Assistant: [insert Assistant response here]
Adhere to the following instructions to complete the task:
1. Start by trying to match the user's question to the statements in the context.
2. If you identify the matching statement to the user's question then confirm it from the user.
3. If the user's intent is unclear or doesn't match the context, ask follow-up questions by providing the options in the context.
4. Once you have confirmed the user intent, set "Task Finished: True" and proceed with your response.
5. You will fail your task if the output generated does not follow the format mentioned above.
Context: (only knowledge base you have)
------------
sample context
-----------
"""
schema = '''
space ::= " "?
string ::= "\"" (
[^"\\] |
"\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
)* "\"" space
Stage ::= "\"first\"" | "\"second\""
boolean ::= ("true" | "false") space
root ::= "{" space "\"Assistant\"" space ":" space string "," space "\"Stage\"" space ":" space Stage "," space "\"Statement\"" space ":" space string "," space "\"Task Finished\"" space ":" space boolean "}" space
'''
def get_prompt(question: str, chat_history: list,
               system_prompt: str) -> str:
    texts = [f'[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n']
    for user_input, response in chat_history:
        texts.append(f'{user_input.strip()} [/INST] {response.strip()} </s><s> [INST] ')
    texts.append(f'{question.strip()} [/INST]')
    return ''.join(texts)
history = []
prompt = get_prompt("user query", history, fs_template)
grammar = LlamaGrammar.from_string(grammar=schema, verbose=True)
print(grammar)
client = Llama(
    model_path="model/llama-2-13b-chat.ggmlv3.q8_0.bin",
    n_ctx=4098,
    n_threads=16,
    last_n_tokens_size=70,
)
answer = client(
    prompt,
    grammar=grammar,
    stream=False,
    temperature=0.0,
    top_p=0.95,
    top_k=50,
    repeat_penalty=1.3,
    max_tokens=4000,
)
print(answer)
This is the error I am getting:
parse: error parsing grammar: expecting newline or end at \] |
"\" (["\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
)* """ space
Stage ::= ""first"" | ""second""
boolean ::= ("true" | "false") space
root ::= "{" space ""Assistant"" space ":" space string "," space ""Stage"" space ":" space Stage "," space ""Statement"" space ":" space string "," space ""Task Finished"" space ":" space boolean "}" space
Traceback (most recent call last):
  File "/home/talha/CloudWhisper/jformer.py", line 49, in <module>
    grammar = LlamaGrammar.from_string(grammar=schema,verbose=True)
  File "/home/talha/.local/lib/python3.10/site-packages/llama_cpp/llama_grammar.py", line 66, in from_string
    raise ValueError(
ValueError: from_string: error parsing grammar file: parsed_grammar.rules is empty
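The grammar echoed in the error message has lost its backslashes (`""first""` instead of `"\"first\""`). This suggests that Python interpreted the escape sequences in the regular triple-quoted string before the text reached the grammar parser. Below is a minimal sketch of the difference, using a single rule from the grammar above. Treating a raw string (the `r` prefix) as the fix is an assumption drawn from the error output, not something verified against llama-cpp-python.

```python
# Regular triple-quoted string: Python consumes the backslash escapes,
# so \" becomes a bare " before the grammar parser ever sees it.
plain = '''
Stage ::= "\"first\"" | "\"second\""
'''

# Raw string: backslashes are passed through unchanged.
raw = r'''
Stage ::= "\"first\"" | "\"second\""
'''

print(plain.strip())  # Stage ::= ""first"" | ""second""
print(raw.strip())    # Stage ::= "\"first\"" | "\"second\""
```

Applied to the code above, this would mean declaring the grammar as `schema = r'''...'''`, or loading it from a separate grammar file so that no Python escaping is involved.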