Failing single underscore nested delimiter in BaseSettings with submodules #51

Closed
6 of 15 tasks
petroslamb opened this issue May 10, 2023 · 16 comments

Comments

@petroslamb

Initial Checks

  • I have searched GitHub for a duplicate issue and I'm sure this is something new
  • I have searched Google & StackOverflow for a solution and couldn't find anything
  • I have read and followed the docs and still think this is a bug
  • I am confident that the issue is with pydantic (not my code, or another library in the ecosystem like FastAPI or mypy)

Description

Hi,

I am following the example for nesting a BaseSettings class with BaseModels and I am using env_nested_delimiter = '_'.

Running the example code I get:

pydantic.error_wrappers.ValidationError: 4 validation errors for Settings
sub_model -> v1
  field required (type=value_error.missing)
sub_model -> v2
  field required (type=value_error.missing)
sub_model -> v3
  field required (type=value_error.missing)
sub_model -> deep
  field required (type=value_error.missing)

While it looks possible to use the single underscore delimiter, the values are not picked up and I am not sure which ones the settings object is looking for in this example.

Example Code

import os
from unittest import TestCase

from pydantic import BaseModel, BaseSettings


class DeepSubModel(BaseModel):
    v4: str


class SubModel(BaseModel):
    v1: str
    v2: bytes
    v3: int
    deep: DeepSubModel


class Settings(BaseSettings):
    v0: str
    sub_model: SubModel

    class Config:
        env_nested_delimiter = '_'
        env_prefix = 'TEST_'


class TestConfig(TestCase):

    def setUp(self) -> None:
        self.original_env = os.environ.copy()

    def tearDown(self) -> None:
        os.environ.clear()
        os.environ.update(self.original_env)

    def test_nested_delimiter(self):
   
        os.environ['TEST_V0'] = 'v0'
        os.environ['TEST_SUB_MODEL_V1'] = 'v1'
        os.environ['TEST_SUB_MODEL_V2'] = 'v2'
        os.environ['TEST_SUB_MODEL_V3'] = '3'
        os.environ['TEST_SUB_MODEL_DEEP_V4'] = 'v4'

        config = Settings()

        self.assertEqual(config.v0, 'v0')
        self.assertEqual(config.sub_model.v1, 'v1')
        self.assertEqual(config.sub_model.v2, b'v2')
        self.assertEqual(config.sub_model.v3, 3)
        self.assertEqual(config.sub_model.deep.v4, 'v4')

Python, Pydantic & OS Version

pydantic version: 1.10.2
            pydantic compiled: True
                 install path: /Users/petroslabropoulos/Projects/workable/ml-utils/.venv/lib/python3.10/site-packages/pydantic
               python version: 3.10.5 (main, Jul 20 2022, 15:20:07) [Clang 13.1.6 (clang-1316.0.21.2.5)]
                     platform: macOS-13.3.1-x86_64-i386-64bit
     optional deps. installed: ['dotenv', 'typing-extensions']

Affected Components

@hramezani transferred this issue from pydantic/pydantic on May 10, 2023
@petroslamb
Author

Note that in my real use case, my classes had default values, which hid the error and made the failure completely silent. I could not find a way to print the env var names the settings object expects, which is why I asked this question.

Also, after experimentation I found out that if the attributes/submodels have no underscores of their own (here sub_model has one), the variables get picked up correctly.

@hramezani
Member

Also, after experimentation I found out that if the attributes/submodels have no underscores of their own (here sub_model has one), the variables get picked up correctly.

Yes, the problem was the _ in one of your field names (sub_model).

I've tested your example in the new pydantic-settings and there is a problem in parsing bytes. I will prepare a patch to fix that problem.

As you may know, we are preparing Pydantic V2, and settings management will be a separate package (pydantic-settings). pydantic-settings is in alpha state and you can install it with pip install pydantic-settings --pre

@petroslamb
Author

petroslamb commented May 10, 2023

Hi @hramezani, thanks for your answer.

So I should not expect a fix in V1, and should prepare to migrate to V2 instead?

@hramezani
Member

hramezani commented May 10, 2023

Hi @petroslamb, I don't think it's a bug. It is not working because you have a field with _ in its name in your model.

BTW, V1 only receives security and bug-fix patches.

@petroslamb
Author

petroslamb commented May 10, 2023

@hramezani I very much appreciate your prompt responses.

If I may comment on this, I could not find this caveat in the documentation, which made the failure silent and hard for me to debug.

Also, I suspect this will happen for every delimiter that also appears in the attribute names. And the single underscore is as common a case as they come.

@hramezani
Member

I think there is no way to fix this. Actually, it's not a bug, and the user has to take care when choosing the env_nested_delimiter config.

Also, I suspect this will happen for every delimiter that also appears in the attribute names. And the single underscore is as common a case as they come.

Yeah, environment variable names are just strings, and pydantic-settings splits them by env_nested_delimiter if it is set. So I think there is no way to raise an error.

Maybe we can say something about this common problem in the documentation. @petroslamb, you can open a PR explaining the problem in the docs if you have time.
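
To make the splitting described above concrete, here is a minimal pure-Python sketch. It is not the pydantic-settings implementation: `explode_env` is a hypothetical helper written only for illustration, which strips the prefix and splits the rest of the variable name on the delimiter. It suggests why a `_` delimiter cannot coexist with a field named sub_model, while `__` can.

```python
from typing import Any, Dict


def explode_env(env: Dict[str, str], prefix: str, delimiter: str) -> Dict[str, Any]:
    """Hypothetical helper: build a nested dict by splitting env var names.

    Illustrative only; this is not the pydantic-settings source.
    """
    result: Dict[str, Any] = {}
    for name, value in env.items():
        if not name.startswith(prefix):
            continue  # ignore variables outside the prefix
        # Everything before the last delimiter is treated as a path of parent keys.
        *parents, leaf = name[len(prefix):].lower().split(delimiter)
        node = result
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return result


single = explode_env({"TEST_SUB_MODEL_V1": "v1"}, prefix="TEST_", delimiter="_")
double = explode_env({"TEST_SUB_MODEL__V1": "v1"}, prefix="TEST_", delimiter="__")

print(single)  # {'sub': {'model': {'v1': 'v1'}}}
print(double)  # {'sub_model': {'v1': 'v1'}}
```

With the single underscore, the value ends up under sub -> model -> v1, which matches no field of Settings, so sub_model.v1 is reported as missing (or silently keeps its default).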

@antonakospanos

antonakospanos commented May 10, 2023

The env_nested_delimiter does not work with single underscore _. This is not expected based on the docs where we read that nested configs are supported using any configurable env_nested_delimiter:

Another way to populate nested complex variables is to configure your model with the env_nested_delimiter config setting, then use an env variable with a name pointing to the nested module fields. What it does is simply explodes your variable into nested models or dicts.
...
env_nested_delimiter can be configured via the Config class as shown above, or via the _env_nested_delimiter keyword argument on instantiation.

Python uses underscores in variable names a lot, so it's highly problematic if env_nested_delimiter fails whenever a property name includes an _. I would prefer that we flag nested configs as beta or something similar until this issue is fixed. If no fix should be expected, let's at least document _ as not supported in the docs.

Example

This is our config that fails to be (fully) loaded from the environment:

class LLMConfig(BaseSettings):
    provider: str = "openai"
    chat: bool = True
    model_name: str = "gpt-3.5-turbo"
    api_key: str = ""
    api_type: Optional[str] = "azure"
    api_version: Optional[str] = "2023-03-15-preview"
    deployment_id: Optional[str] = None
    api_base: Optional[str] = None

    class Config(BaseSettings.Config):
        env_prefix = "GENERATION_LLM_"


class GenerationConfig(BaseSettings):
    llm: LLMConfig = LLMConfig()
    ..

    class Config(BaseSettings.Config):
        env_nested_delimiter = "_"
        env_prefix = "GENERATION_"

The fields without an _ are loaded correctly from the environment:

config = GenerationConfig()
config.llm.provider # loaded with GENERATION_LLM_PROVIDER env var
config.llm.chat # loaded with GENERATION_LLM_CHAT env var 

The rest of them, having an _, do not though:

config = GenerationConfig()
config.llm.api_key # not loaded from GENERATION_LLM_API_KEY
config.llm.api_type # not loaded from GENERATION_LLM_API_TYPE
...

@hramezani
Member

@antonakospanos it does work with _ and any other string.

pydantic-settings splits env variable names by env_nested_delimiter and tries to build the model. So if you set env_nested_delimiter to _, you need to take care with the field names. It is really a misuse or misconfiguration of pydantic-settings.

As I mentioned above, we can add a note to the documentation about this.

@antonakospanos

antonakospanos commented May 10, 2023

Properties like GENERATION_LLM__API_KEY are error prone. Why can't we map the env vars (i.e. GENERATION_LLM_API_KEY) to the properties (i.e. api_key) of a nested config as follows?

class GenerationConfig(BaseSettings):
    llm: LLMConfig = LLMConfig()
    ..

    class Config(BaseSettings.Config):
        env_nested_delimiter = "_"
        env_prefix = "GENERATION_"

If we concatenate the env_prefix, the field name llm in GenerationConfig, and the env_nested_delimiter, we get the GENERATION_LLM_ prefix, which is the prefix we expect on the env vars mapped to LLMConfig's properties.

It seems like something that can be supported by adjusting the implementation to check all the information provided (env_prefix, nested config name, env_nested_delimiter).

@hramezani
Member

The config name is env_nested_delimiter: it's a delimiter for splitting the environment variable names. Also, env_prefix tells pydantic-settings to only look at env variables that start with it.

I still think the behavior of pydantic-settings here is correct, but let's see what @samuelcolvin thinks.

@antonakospanos

@samuelcolvin your input on this?

@antonakospanos

Is this fixed or planned to be fixed in the context of v2? Any update please?

@samuelcolvin
Member

Unless I'm missing something, if you set the delimiter to _, then include a field name like sub_model, sure stuff is going to break.

I don't think it's reasonable to expect Pydantic V2 to know not to split sub_model but to split foo_bar.
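
The ambiguity can be shown with a short sketch. `candidate_paths` is a hypothetical helper, not part of Pydantic: it enumerates every way the underscore-separated tokens of a variable name could be grouped into a field path. Looking at the name alone, nothing says which grouping is the intended one.

```python
from typing import List


def candidate_paths(name: str, delimiter: str = "_") -> List[List[str]]:
    """Hypothetical helper: every way to group the tokens of `name` into a field path."""
    tokens = name.lower().split(delimiter)
    paths: List[List[str]] = []
    # Each gap between tokens is either a nesting boundary or part of a field name,
    # so there are 2 ** (number of gaps) possible readings.
    for mask in range(2 ** (len(tokens) - 1)):
        path = [tokens[0]]
        for i, token in enumerate(tokens[1:]):
            if mask & (1 << i):
                path.append(token)  # gap read as a nesting boundary
            else:
                path[-1] += delimiter + token  # gap read as part of the field name
        paths.append(path)
    return paths


for path in candidate_paths("SUB_MODEL_V1"):
    print(" -> ".join(path))
```

For SUB_MODEL_V1 this yields four readings, including sub_model -> v1 (the one the issue expects) and sub -> model -> v1 (the one a plain split produces).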

@jcarbelbide

It would be really nice if it worked, because the current behavior is a bit inconvenient to work around. I want to start using pydantic-settings, but I don't want to have to change the names of my existing env vars, and I like the elegant solution that pydantic-settings offers.

If I understand correctly, pydantic-settings starts with the env vars, splits them by the delimiter, and looks for matching fields in the settings class. Would the problem be solved if it worked backwards instead? Starting from the class, grab the field names, and look for env vars that match the field_name+delimiter+nested_field_name combination? I'm sure I'm missing something obvious, but I wanted to throw the question out there because, on the surface, it does seem possible to support behavior like this.
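
A rough sketch of that "backwards" lookup, under stated assumptions: a plain nested dict stands in for the Settings, SubModel and DeepSubModel classes from the issue, `load_from_schema` is a hypothetical helper, and values are returned as raw strings without validation. It is meant only to show the direction of the lookup, not how pydantic-settings works or should work.

```python
from typing import Any, Dict


def load_from_schema(schema: Dict[str, Any], env: Dict[str, str],
                     prefix: str, delimiter: str = "_") -> Dict[str, Any]:
    """Hypothetical helper: walk the field names and look up the matching env vars."""
    values: Dict[str, Any] = {}
    for field, sub_schema in schema.items():
        env_name = f"{prefix}{field}".upper()
        if isinstance(sub_schema, dict):
            # Nested model: recurse with the field name folded into the prefix.
            nested = load_from_schema(sub_schema, env, env_name + delimiter, delimiter)
            if nested:
                values[field] = nested
        elif env_name in env:
            values[field] = env[env_name]  # raw string, no type coercion here
    return values


# Stand-in for the model classes in the issue (an assumption for illustration).
schema = {"v0": str, "sub_model": {"v1": str, "v2": bytes, "v3": int, "deep": {"v4": str}}}
env = {
    "TEST_V0": "v0",
    "TEST_SUB_MODEL_V1": "v1",
    "TEST_SUB_MODEL_V2": "v2",
    "TEST_SUB_MODEL_V3": "3",
    "TEST_SUB_MODEL_DEEP_V4": "v4",
}

loaded = load_from_schema(schema, env, prefix="TEST_")
print(loaded)
```

Because the field names drive the lookup, sub_model is never split. The sketch does not cover fields typed as free-form dicts, whose keys cannot be known from the class, and two different field paths can still map to the same variable name (for example sub -> model_v1 and sub_model -> v1).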

@gsakkis
Contributor

gsakkis commented Feb 7, 2025

Any workaround for this that doesn't require picking __ or another delimiter?

In addition to @jcarbelbide's suggestion, another potentially simpler and less invasive option that would cover most (even if not all) use cases would be to add a new env_nested_delimiter_depth: int | None = None parameter that, if not None, splits on the first N delimiters and leaves the rest as is. So for most practical purposes one would set env_nested_delimiter="_", env_nested_delimiter_depth=1.

EDIT: digging into the source, it looks like changing just one line does the trick for depth=1:

-            _, *keys, last_key = env_name_without_prefix.split(self.env_nested_delimiter)
+            _, *keys, last_key = env_name_without_prefix.split(self.env_nested_delimiter, 1)

@hramezani what do you think? I could submit a PR for adding env_nested_delimiter_depth if you agree with the idea.
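
The effect of limiting the split can be sketched with plain `str.split` and its `maxsplit` argument. `split_env_name` is a hypothetical helper, and its `depth` parameter mirrors the env_nested_delimiter_depth option proposed above; this sketch does not assume that option exists in pydantic-settings.

```python
from typing import List, Optional


def split_env_name(name: str, delimiter: str = "_", depth: Optional[int] = None) -> List[str]:
    """Hypothetical helper: split on the first `depth` delimiters, or on all if depth is None."""
    lowered = name.lower()
    if depth is None:
        return lowered.split(delimiter)
    return lowered.split(delimiter, depth)  # second argument is str.split's maxsplit


print(split_env_name("LLM_API_KEY"))            # ['llm', 'api', 'key']
print(split_env_name("LLM_API_KEY", depth=1))   # ['llm', 'api_key']
print(split_env_name("SUB_MODEL_V1", depth=1))  # ['sub', 'model_v1']
```

With depth=1 the nested field api_key survives intact, which covers the LLMConfig example earlier in the thread. The last call shows the remaining gap: a top-level field whose own name contains the delimiter, such as sub_model, is still split in the wrong place.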

@hramezani
Member

@gsakkis, I will be happy to review a simple change that fixes your problem. Please remember to include proper tests and docs.
