Skip to content

Multiple endpoints for connection. #310

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jul 10, 2023

Conversation

MikhailBurdukov
Copy link
Contributor

Specify multiple hosts for connection, to provide fault tolerance.

.SetHosts({ getEnvOrDefault("CLICKHOUSE_HOST",      "asasdasd"), 
                            getEnvOrDefault("CLICKHOUSE_HOST",      "localhost"),
                            getEnvOrDefault("CLICKHOUSE_HOST",      "noalocalhost"), 
                            getEnvOrDefault("CLICKHOUSE_HOST",      "localhost"), 
                           })

The endpoint is selecting according to EndpointsIterationAlgorithm, rn only RoundRobin approach is implemented.

Ref #139

@@ -194,30 +207,55 @@ class Client::Impl {
std::unique_ptr<InputStream> input_;
std::unique_ptr<OutputStream> output_;
std::unique_ptr<SocketBase> socket_;
std::shared_ptr<EndpointsIteratorBase> endpoints_iterator;
Copy link

@aalexfvk aalexfvk May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to store iterator here ? Or we can create it as needed. It looks pretty lightweight

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the Round Robin algo, we have to keep the same instance of the object, because otherwise current_index will be reset to zero every time the object is created.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Vasily and believe endpoints iteration-related code could be simplified. Having a base class overridden by a specific algorithm is a good general solution, but in case only one algorithm is implemented, I think it's better to have a dead simple RoundRobinEndpointsIterator endpoints_iterator without inheritance and smart pointers.

If you still want a general solution, consider using std::unique_ptr as we seem to have only one user of this field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO general solution with base class looks pretty good here, fixed shared_ptr to unique_ptr by now. @Enmk what do you think?

@MikhailBurdukov MikhailBurdukov marked this pull request as ready for review May 25, 2023 14:21
@@ -194,30 +207,55 @@ class Client::Impl {
std::unique_ptr<InputStream> input_;
std::unique_ptr<OutputStream> output_;
std::unique_ptr<SocketBase> socket_;
std::shared_ptr<EndpointsIteratorBase> endpoints_iterator;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Vasily and believe endpoints iteration-related code could be simplified. Having a base class overridden by a specific algorithm is a good general solution, but in case only one algorithm is implemented, I think it's better to have a dead simple RoundRobinEndpointsIterator endpoints_iterator without inheritance and smart pointers.

If you still want a general solution, consider using std::unique_ptr as we seem to have only one user of this field.

Copy link
Contributor

@Enmk Enmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requires changes, please take a look at previous attempt, since this new one already repeats some of the errors and comments from the old one:
#139

@MikhailBurdukov
Copy link
Contributor Author

In the last commit, a behavior was added that allows users to reset the connection to a connected endpoint, which is described here. I also checked manually that the client is trying to reconnect to the same endpoint using options_.send_retries, but I don't know how to implement this in the test, because as far as I understand, there are no tests that allow you to manipulate the connection or the `ClickHouse-server' directly. Does anyone have some ideas how to test such situation?

@@ -44,6 +44,19 @@ enum class CompressionMethod {
LZ4 = 1,
};

struct Endpoint {
std::string host;
unsigned int port = 9000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsigned int port = 9000;
uint16_t port = 9000;

Just to make sure that nobody tires to connect to port >65535

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I think this is a good idea, even though this alters public interface

@@ -135,7 +136,7 @@ class NonSecureSocketFactory : public SocketFactory {
public:
~NonSecureSocketFactory() override;

std::unique_ptr<SocketBase> connect(const ClientOptions& opts) override;
std::unique_ptr<SocketBase> connect(const ClientOptions& opts, const std::string& host, const std::string& port) override;
Copy link
Contributor

@Enmk Enmk Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::unique_ptr<SocketBase> connect(const ClientOptions& opts, const std::string& host, const std::string& port) override;
std::unique_ptr<SocketBase> connect(const ClientOptions& opts, const Endpoint & endpoint) override;

Also please modify the call site.

@@ -88,7 +89,7 @@ class SocketFactory {

// TODO: move connection-related options to ConnectionOptions structure.

virtual std::unique_ptr<SocketBase> connect(const ClientOptions& opts) = 0;
virtual std::unique_ptr<SocketBase> connect(const ClientOptions& opts, const std::string& host, const std::string& port) = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
virtual std::unique_ptr<SocketBase> connect(const ClientOptions& opts, const std::string& host, const std::string& port) = 0;
std::unique_ptr<SocketBase> connect(const ClientOptions& opts, const Endpoint & endpoint) = 0;

Comment on lines 18 to 27
virtual void Next() = 0;
// Get the address of current endpoint.
virtual std::string GetHostAddr() const = 0;

// Get the port of current endpoint.
virtual uint16_t GetPort() const = 0;

// Reset iterations.
virtual void ResetIterations() = 0;
virtual bool NextIsExist() const = 0;
Copy link
Contributor

@Enmk Enmk Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
virtual void Next() = 0;
// Get the address of current endpoint.
virtual std::string GetHostAddr() const = 0;
// Get the port of current endpoint.
virtual uint16_t GetPort() const = 0;
// Reset iterations.
virtual void ResetIterations() = 0;
virtual bool NextIsExist() const = 0;
virtual Endpoint Next() = 0;

Cleaner and much simpler API. We don't need to reset iteration or to check if there are more endpoints left. Consider EndpointsIteratorBase to be an endless generator of endpoints.

If some new iteration strategy pops up later, that would require changes, we can update the interface accordingly, since it is not part of public API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but ResetIterations is needed in order not to create an infinite loop for endpoint choosing.
Otherwise we have to check it thought opts.endpoints.size(),which doesn't look like a good solution. Any ideas how to do so?

if (opts.host.empty())
return opts;

Endpoint endpoint_single({opts.host, opts.port});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Endpoint endpoint_single({opts.host, opts.port});
Endpoint default_endpoint({opts.host, opts.port});

return opts;

Endpoint endpoint_single({opts.host, opts.port});
if (std::find(opts.endpoints.begin(), opts.endpoints.end(), endpoint_single) == std::end(opts.endpoints)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it is best to make default_endpoint first endpoint unconditionally, so it is simple, easy and predictable

@@ -64,8 +64,12 @@ struct ClientInfo {
};

std::ostream& operator<<(std::ostream& os, const ClientOptions& opt) {
os << "Client(" << opt.user << '@' << opt.host << ":" << opt.port
<< " ping_before_query:" << opt.ping_before_query
os << "Client(";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That ignores opt.host and opt.port completely, and this operator<< might be used on unmodified ClientOptions (by tests or users), making output confusing.

@@ -111,6 +115,10 @@ std::unique_ptr<SocketFactory> GetSocketFactory(const ClientOptions& opts) {
return std::make_unique<NonSecureSocketFactory>();
}

std::unique_ptr<EndpointsIteratorBase> GetEndpointsIterator(const std::vector<Endpoint>& endpoints) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::unique_ptr<EndpointsIteratorBase> GetEndpointsIterator(const std::vector<Endpoint>& endpoints) {
std::unique_ptr<EndpointsIteratorBase> GetEndpointsIterator(const ClientOptions& options) {

@MikhailBurdukov
Copy link
Contributor Author

MikhailBurdukov commented Jun 27, 2023

Discussed in direct messages:

  • Connection format: N times to all hosts. Example with 3 endpoints and N = 2 : 1- 2 -3 -1 - 2 -3.
  • Iterator must have only one method: virtual Endpoint Next() = 0; and it is an endless generator of endpoints.
  • The logic of connection control in Client Impl. Number of connection attempts = endpoints.size() * send_retry
  • If the host has become unavailable, we first try to recreate the connection with it, if it did not work, then we try to find a new one.

All this is done in the last commit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants