Ollama bug and improvements #686

Open
@easydatawarehousing

Description

Temperature and seed parameters should be part of 'options'

According to the Ollama docs, temperature and seed should be passed inside options:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "options": {
    "seed": 101,
    "temperature": 0
  }
}'

In the current implementation they are passed at the same level as top-level parameters like 'model'.

Changing the code of Langchain::LLM::Ollama like this works, but this is probably not the best place to implement it.

def chat(messages:, model: nil, **params, &block)
  parameters = chat_parameters.to_params(params.merge(messages:, model:, stream: block.present?))

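  # Move seed and temperature from the top level into the nested options hash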
  if parameters.key?(:seed) || parameters.key?(:temperature)
    parameters[:options] = {}

    if parameters.key?(:seed)
      parameters[:options][:seed] = parameters.delete(:seed)
    end

    if parameters.key?(:temperature)
      parameters[:options][:temperature] = parameters.delete(:temperature)
    end
  end

  # ...
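
For illustration, the same remapping could be written a bit more compactly (still just a sketch, assuming Ruby 3.0+ for Hash#except and that nothing else populates :options):

def chat(messages:, model: nil, **params, &block)
  parameters = chat_parameters.to_params(params.merge(messages:, model:, stream: block.present?))

  # Move Ollama model options from the top level into the nested options hash
  option_keys = %i[seed temperature]
  options     = parameters.slice(*option_keys)
  parameters  = parameters.except(*option_keys).merge(options: options) unless options.empty?

  # ...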

Non-streaming response chunks should be joined before parsing?

I am using Ollama 0.1.45. When requesting a non-streaming response (i.e. not passing a block to the chat method) and the response is large (more than ~4000 characters), Ollama sends the data in multiple chunks.

In the current implementation each chunk is JSON.parse'd separately. For smaller responses that fit in a single chunk this is obviously not a problem, but with multiple chunks all chunks first have to be joined and then parsed as one JSON document.
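
To illustrate why joining matters: JSON.parse raises on a truncated chunk of a large document, but succeeds on the joined chunks (standalone example, chunk size chosen to mimic the ~4000 character chunks observed above):

require "json"

full_response = { "message" => { "role" => "assistant", "content" => "x" * 5000 } }.to_json
first_chunk   = full_response[0, 4000] # roughly what a single on_data call delivers

JSON.parse(first_chunk)   # raises JSON::ParserError
JSON.parse(full_response) # parses fine once all chunks are joined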

Changing the code of Langchain::LLM::Ollama like this works for me.

def chat(messages:, model: nil, **params, &block)
  parameters = chat_parameters.to_params(params.merge(messages:, model:, stream: block.present?))
  responses_stream = []

  if parameters[:stream]
    # Existing code
    client.post("api/chat", parameters) do |req|
      req.options.on_data = json_responses_chunk_handler do |parsed_chunk|
        responses_stream << parsed_chunk

        block&.call(OllamaResponse.new(parsed_chunk, model: parameters[:model]))
      end
    end

    generate_final_chat_completion_response(responses_stream, parameters)
    # /Existing code
  else
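    # Non-streaming: collect the raw chunks and parse them as one JSON document below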
    client.post("api/chat", parameters) do |req|
      req.options.on_data = proc do |chunk, _size, _env|
        puts "RECEIVED #{_size} CHARS, LAST CHAR IS: '#{chunk[-1]}'" # DEBUG
        responses_stream << chunk
      end
    end

    OllamaResponse.new(
      {
        "message" => {
          "role"    => "assistant",
          "content" => JSON.parse(responses_stream.join).dig("message", "content")
        }
      },
      model: parameters[:model]
    )
  end
end

The Ollama docs say nothing about this behavior; it might be a bug in Ollama, or a feature.
It happens at least with the llama3-8b-q8 and phi3-14b-q5 models.
Should langchainrb code around this, e.g. by checking whether the received response chunks form a complete JSON document?
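
One way to code around it (a sketch only, reusing the client.post / on_data pattern from above and not tied to langchainrb's internals) would be to buffer the raw chunks and treat the response as complete only once the buffer parses as JSON:

buffer = +""
final_response = nil

client.post("api/chat", parameters) do |req|
  req.options.on_data = proc do |chunk, _size, _env|
    buffer << chunk
    begin
      # Succeeds only once the buffered chunks form a complete JSON document
      final_response = JSON.parse(buffer)
    rescue JSON::ParserError
      # Partial JSON so far; keep buffering
    end
  end
end

OllamaResponse.new(final_response, model: parameters[:model])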

Inherit from Langchain::LLM::OpenAI?

Since Ollama is compatible with OpenAI's API, wouldn't it be easier to let Langchain::LLM::Ollama inherit from Langchain::LLM::OpenAI, overriding default values where needed?
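
For example, something along these lines (purely a hypothetical sketch: the constructor arguments, the chat_model option key and the assumption that llm_options / uri_base are forwarded to the underlying OpenAI client are all unverified):

module Langchain
  module LLM
    class Ollama < OpenAI
      def initialize(url: "http://localhost:11434/v1", default_options: {})
        # Ollama's OpenAI-compatible endpoint ignores the API key,
        # but the OpenAI client still expects one to be set.
        super(
          api_key: "ollama",
          llm_options: { uri_base: url },
          default_options: { chat_model: "llama3" }.merge(default_options)
        )
      end
    end
  end
end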

Metadata

Labels: enhancement (New feature or request)