Ollama bug and improvements #686

Open
@easydatawarehousing

Description

Temperature and seed parameters should be part of 'options'

According to the Ollama docs, temperature and seed should be passed inside options:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "options": {
    "seed": 101,
    "temperature": 0
  }
}'

In the current implementation they are passed at the same level as top-level parameters like 'model'.

Changing the code of Langchain::LLM::Ollama like this works, but this is probably not the best place to implement it.

def chat(messages:, model: nil, **params, &block)
  parameters = chat_parameters.to_params(params.merge(messages:, model:, stream: block.present?))

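  # Move seed and temperature from the top level into the nested options hash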
  if parameters.key?(:seed) || parameters.key?(:temperature)
    parameters[:options] = {}

    if parameters.key?(:seed)
      parameters[:options][:seed] = parameters.delete(:seed)
    end

    if parameters.key?(:temperature)
      parameters[:options][:temperature] = parameters.delete(:temperature)
    end
  end

  # ...
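
For illustration, the same remapping could be written a bit more compactly (still just a sketch, assuming Ruby 3.0+ for Hash#except and that nothing else populates :options):

def chat(messages:, model: nil, **params, &block)
  parameters = chat_parameters.to_params(params.merge(messages:, model:, stream: block.present?))

  # Move Ollama model options from the top level into the nested options hash
  option_keys = %i[seed temperature]
  options     = parameters.slice(*option_keys)
  parameters  = parameters.except(*option_keys).merge(options: options) unless options.empty?

  # ...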

Non-streaming response chunks should be joined before parsing?

I am using Ollama 0.1.45. When requesting a non-streaming response (i.e. not passing a block to the chat method) and the response is large (more than ~4000 characters), Ollama sends the data in multiple chunks.

In the current implementation each chunk is JSON.parse'd separately. For smaller responses that fit in a single chunk this is obviously not a problem, but with multiple chunks all chunks first have to be joined and then parsed as one JSON document.
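
To illustrate why joining matters: JSON.parse raises on a truncated chunk of a large document, but succeeds on the joined chunks (standalone example, chunk size chosen to mimic the ~4000 character chunks observed above):

require "json"

full_response = { "message" => { "role" => "assistant", "content" => "x" * 5000 } }.to_json
first_chunk   = full_response[0, 4000] # roughly what a single on_data call delivers

JSON.parse(first_chunk)   # raises JSON::ParserError
JSON.parse(full_response) # parses fine once all chunks are joined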

Changing the code of Langchain::LLM::Ollama like this works for me.

def chat(messages:, model: nil, **params, &block)
  parameters = chat_parameters.to_params(params.merge(messages:, model:, stream: block.present?))
  responses_stream = []

  if parameters[:stream]
    # Existing code
    client.post("api/chat", parameters) do |req|
      req.options.on_data = json_responses_chunk_handler do |parsed_chunk|
        responses_stream << parsed_chunk

        block&.call(OllamaResponse.new(parsed_chunk, model: parameters[:model]))
      end
    end

    generate_final_chat_completion_response(responses_stream, parameters)
    # /Existing code
  else
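    # Non-streaming: collect the raw chunks and parse them as one JSON document below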
    client.post("api/chat", parameters) do |req|
      req.options.on_data = proc do |chunk, _size, _env|
        puts "RECEIVED #{_size} CHARS, LAST CHAR IS: '#{chunk[-1]}'" # DEBUG
        responses_stream << chunk
      end
    end

    OllamaResponse.new(
      {
        "message" => {
          "role"    => "assistant",
          "content" => JSON.parse(responses_stream.join).dig("message", "content")
        }
      },
      model: parameters[:model]
    )
  end
end

The Ollama docs say nothing about this behavior; it might be a bug in Ollama, or a feature.
It happens at least with the llama3-8b-q8 and phi3-14b-q5 models.
Should langchainrb code around this, e.g. by checking whether the received response chunks form a complete JSON document?
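
One way to code around it (a sketch only, reusing the client.post / on_data pattern from above and not tied to langchainrb's internals) would be to buffer the raw chunks and treat the response as complete only once the buffer parses as JSON:

buffer = +""
final_response = nil

client.post("api/chat", parameters) do |req|
  req.options.on_data = proc do |chunk, _size, _env|
    buffer << chunk
    begin
      # Succeeds only once the buffered chunks form a complete JSON document
      final_response = JSON.parse(buffer)
    rescue JSON::ParserError
      # Partial JSON so far; keep buffering
    end
  end
end

OllamaResponse.new(final_response, model: parameters[:model])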

Inherit from Langchain::LLM::OpenAI?

Since Ollama is compatible with OpenAI's API, wouldn't it be easier to let Langchain::LLM::Ollama inherit from Langchain::LLM::OpenAI, overriding default values where needed?
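
For example, something along these lines (purely a hypothetical sketch: the constructor arguments, the chat_model option key and the assumption that llm_options / uri_base are forwarded to the underlying OpenAI client are all unverified):

module Langchain
  module LLM
    class Ollama < OpenAI
      def initialize(url: "http://localhost:11434/v1", default_options: {})
        # Ollama's OpenAI-compatible endpoint ignores the API key,
        # but the OpenAI client still expects one to be set.
        super(
          api_key: "ollama",
          llm_options: { uri_base: url },
          default_options: { chat_model: "llama3" }.merge(default_options)
        )
      end
    end
  end
end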

Metadata

Labels: enhancement (New feature or request)