Skip to content

[WIP] Remove ESPAsyncTCP dependency of OTA code, SSL fixes, better parsing #1716

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

Niek
Copy link
Contributor

@Niek Niek commented May 13, 2019

This PR changes the following:

  • Update GitHub SSL fingerprint
  • Remove ESPAsyncTCP dependency in OTA module
  • OTA uses SSL with AxTLS on core < 2.5.0 and BearSSL on 2.5.0
  • Better OTA response header parsing

Please test carefully, but I had no issues on core 2.4.2 and 2.5.0 on my setup.

@mcspr mcspr self-requested a review May 13, 2019 19:58

WiFiClient _ota_client;

#if ASYNC_TCP_SSL_ENABLED
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTA_SECURE_ENABLED?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's even better to use a generic SUPPORT_SSL flag (or similar)? We can use that in MQTT and other modules as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per pr - we no longer use async client, so why depend on it?

We can use that in MQTT and other modules as well.

Yep, no reason not to enable all secure things at the same time (since they use the same library and we bundle it anyways).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added SSL_ENABLED now - if it looks good I'll make the changes across the whole project in a separate PR

#define USING_AXTLS // do not use BearSSL
#endif

WiFiClient _ota_client;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using single _ota_client object instead of separate non-secure and secure ?
Minor question - does this create _ota_client when secure one is used? (i.e. is it in memory at this point?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it but it clutters the code a lot if you're using a separate _secure object: all subsequent _ota_client calls need to be surrounded with if-checks. But maybe this is not the nicest solution. Now that I think about it - not sure what happens when connecting to a HTTPS host first and to a HTTP one later. Do you have any suggestions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad not looking at this closer. Secure client will not fallback to basic one :(

Essentially, what my thinking is

  • use separate methods for secure and non-secure update flow
  • remove global client objects, create them inplace. We are not async, we expect them to live up until the function ends.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And mqtt /ota handler needs to be aware of that too. All async-mqtt-client callbacks are inside SYS context - we cannot use Core's sync clients inside of it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed some changes, the client objects are created inside the function now. I have a strange crash with SSL requests though (alloc issues in SSL) - can you check on your end as well?

Copy link
Collaborator

@mcspr mcspr May 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something that I tried yesterday with HTTPClient & httpUpdate was requiring ~20seconds for connection to happen.

In general, I think we would want to avoid doing connection twice. HTTPClient, for example, wraps around that internally (which btw can be adapted here or straight up used as-is)

Reviewing HTTPClient I'd also noticed that it cleverly wraps around fingerprinting (see Traits classes). But I did rewrite bunch of things to stop axtls thing from happening so no idea about that, but in the end it did actually start connecting.

Another problematic part is shadowing _ota_client instead of having separate methods (which sometimes caused compiler errors for me. might be a false flag, but can be easily separated instead of creating it)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remedy for 20 second wait was turning off active webserver (that was not ok...)

if (getSetting("webEnable", 1).toInt() == 0) return;

In webSetup(), wsSetup() and debugWebSetup()

At least now I have a reproducible "conflict" mentioned in the comments (which I did not remember having earlier :/). I wonder if async webserver polling method is broken somehow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion it would be better to remove ESPAsyncTCP/ESPAsyncWebServer in the whole project - it's not maintained anymore and you'll see more and more compatibility issues with future core versions. Rewriting the telnet server is not too much work (I'm testing it now). For web support: the standard ESP8266WebServer is pretty good, you'll only need something like arduinoWebSockets for WS(S) support.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely thing :( Web server part is problematic for some parts though - alexa triggers header parsing bugs, some credentials stuff works differently... Which again puts the Core version as a dependency on some bug or feature. (OT: we also can switch to espressif IDF 🙄)
Websocket is not a huge problem, since it is both an external lib and pretty widely used

Copy link
Collaborator

@mcspr mcspr May 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another heads up regarding slightly getting distracted and dependencies in general.
What we can do instead of removing the module straight away, is implement another one side by side (just another file) and make them incompatible with each other. That way we can already use new one, and don't need to think about dependencies in a strange way.
edit: ...and if I don't sleep too much tomorrow, I'll prepare bssl client as a pr

#if !defined(ARDUINO_ESP8266_RELEASE_2_5_0) && !defined(ARDUINO_ESP8266_RELEASE_2_5_1) && !defined(ARDUINO_ESP8266_RELEASE_2_5_2)
#define USING_AXTLS // do not use BearSSL on core < 2.5.0
#endif

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a way to check this in a better way - maybe ESP.getCoreVersion()?

This comment was marked as outdated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That way bearssl is also off with git versions of the Core.
2.5.0 and up introduced https://github.com/esp8266/Arduino/blob/master/cores/esp8266/core_esp8266_version.h
tag with proper comparison, but we again are hit by multi-Core support :/

In theory, one can depend on some behaviour / implementation instead. Kind of dangerous though, if it changes. For example, see:
dev...mcspr:utils/fragmentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use ARDUINO_ESP8266_GIT_VER instead? The type_traits stuff looks pretty rough (not to mention badly readable).

Copy link
Collaborator

@mcspr mcspr May 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ARDUINO_ESP8266_GIT_VER has the same problem as RELEASE defines. BTW I think we are carried away. Does this work? Shouldn't it be global -D... instead, so framework gets this too? IMO documentation would solve this better. If we sure this breaks horribly, static_assert(condition, "message") can also do this.

Regarding other branch, you'd say that this is a clearer intent?

#include <cont.h>
#if defined(ARDUINO_ESP8266_RELEASE_2_3_0) \
|| defined(ARDUINO_ESP8266_RELEASE_2_4_0) \
|| defined(ARDUINO_ESP8266_RELEASE_2_4_1)
extern cont_t g_cont;
#define getFreeStack() cont_get_free_stack(&g_cont)
#elif defined(ARDUINO_ESP8266_RELEASE_2_4_2)
extern cont_t* g_pcont;
#define getFreeStack() cont_get_free_stack(g_pcont)
#else
#define getFreeStack() ESP.getFreeContStack()
#endif

https://en.cppreference.com/w/cpp/language/decltype
https://en.cppreference.com/w/cpp/types/enable_if
It is just an overload detection. We force compiler to pick specific 'detect()' method and store bool based on the one it found (either internal or external return type)
From my limited understanding, to make this shorter 'getHeapStats' function would need different return types (or different set of args)

@Niek Niek changed the title Remove ESPAsyncTCP dependency of OTA code, SSL fixes, better parsing [WIP] Remove ESPAsyncTCP dependency of OTA code, SSL fixes, better parsing May 23, 2019
@mcspr mcspr self-requested a review May 23, 2019 22:23
@mcspr
Copy link
Collaborator

mcspr commented May 28, 2019

For some of long-stack-trace-crashes: esp8266/Arduino@d83eabe, esp8266/Arduino#6153, esp8266/Arduino#6156
Turns out bearssl client silently corrupted memory.
Also interesting advice from the linked issue - put WiFiClient object into heap (see unique_ptr example in HTTPClient https example). Haven't checked the exact size, but it may be important when we are using multiple levels of functions.

Still no clue what exactly webserver does differently from telnet server :/
Timeout problem (15000ms is the default) would also be explained by that, since we may never receive esp_schedule() in time to get back to the client. Something is blocking the process.

@mcspr
Copy link
Collaborator

mcspr commented May 28, 2019

Was testing with 2.5.2 release. Running latest GIT Core (esp8266/Arduino@dddc8d2) I no longer see the client hanging when ESPAsyncWebServer is listening for connections.
Must've been the latest scheduled_functions changes (yield / no yield after loop etc.), not related to wifi client itself.

edit: Certs, fingerprinting works. Certs do need MFLN thing, OOM crashing without it
~31k heap at the start, just before https.begin() heap is 21600 (20096 for real, 7% fragmentation)
(haven't checked where exactly alloc is crashing. maybe something fixable)

@Niek
Copy link
Contributor Author

Niek commented Jun 5, 2019

Closing in favor of #1751

@Niek Niek closed this Jun 5, 2019
@mcspr mcspr mentioned this pull request Jun 13, 2019
@Niek Niek deleted the ota-without-espasynctcp branch July 15, 2019 08:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants