Description
Basic Infos
- This issue complies with the issue POLICY doc.
- I have read the documentation at readthedocs and the issue is not addressed there.
- I have tested that the issue is present in current master branch (aka latest git).
- I have searched the issue tracker for a similar issue.
- If there is a stack dump, I have decoded it.
- I have filled out all fields below.
Platform
- Hardware: [ESP-07]
- Core Version: [latest git hash or date]
- Development Env: [Platformio]
- Operating System: [Windows]
Settings in IDE
- Module: [Generic ESP8266 Module]
- Flash Mode: [dio]
- Flash Size: [4MB]
- lwip Variant: [v2 Lower Memory|Higher Bandwidth]
- Reset Method: [nodemcu]
- Flash Frequency: [40MHz]
- CPU Frequency: [80MHz]
- Upload Using: [OTA|SERIAL]
- Upload Speed: [115200] (serial upload only)
Problem Description
Over the last few months, I've encountered a lot of unexplained issues on a specific project.
An update runs fine on my side, but when other nodes are updated we see all kinds of random issues.
A few weeks ago we found out that when those nodes are (re)flashed via serial, the nodes run just fine.
The boards we made all have an ESP-07S on board, which were pre-flashed by the seller with our image.
However, the initial image was based on core 2.5.x and we're now using core 2.7.4.
So I wonder whether the bootloader part (or whatever it is called) is perhaps not overwritten by an OTA update, and whether this part may use different flash settings?
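One way to check this would be to compare the flash settings stored in the image header at flash offset 0 on a serial-flashed unit and on an OTA-updated unit. Below is a minimal sketch of decoding that header, assuming the usual ESP8266 image layout (byte 0 is the magic 0xE9, byte 2 is the flash mode, byte 3 holds the size code in the high nibble and the frequency code in the low nibble). The names `FlashHeader`, `decodeFlashHeader`, `flashModeName` and `flashFreqMHz` are made up for this illustration and are not part of the core.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Flash settings as stored in the first four bytes of an ESP8266 image.
// Assumed layout: [0] magic 0xE9, [1] segment count, [2] flash mode,
// [3] size code (high nibble) and frequency code (low nibble).
struct FlashHeader {
    bool valid;        // true when byte 0 is the 0xE9 magic
    uint8_t mode;      // 0 = QIO, 1 = QOUT, 2 = DIO, 3 = DOUT
    uint8_t sizeCode;  // high nibble of byte 3
    uint8_t freqCode;  // low nibble of byte 3
};

// Hypothetical helper: decode the header from the first bytes of a flash dump.
FlashHeader decodeFlashHeader(const uint8_t *buf, size_t len) {
    assert(buf != nullptr);
    FlashHeader h{false, 0, 0, 0};
    if (len < 4 || buf[0] != 0xE9) {
        return h;  // too short, or not an image header
    }
    h.valid = true;
    h.mode = buf[2];
    h.sizeCode = static_cast<uint8_t>(buf[3] >> 4);
    h.freqCode = static_cast<uint8_t>(buf[3] & 0x0F);
    return h;
}

// Human-readable name for the flash mode byte.
const char *flashModeName(uint8_t mode) {
    switch (mode) {
        case 0: return "QIO";
        case 1: return "QOUT";
        case 2: return "DIO";
        case 3: return "DOUT";
        default: return "unknown";
    }
}

// Flash frequency in MHz for the frequency code, or -1 if the code is unknown.
int flashFreqMHz(uint8_t freqCode) {
    switch (freqCode) {
        case 0x0: return 40;
        case 0x1: return 26;
        case 0x2: return 20;
        case 0xF: return 80;
        default: return -1;
    }
}
```

The four bytes could come from a dump of flash offset 0 made over serial, or from a read on the device itself. If the decoded mode or frequency differs between a unit that was only ever updated via OTA and one that was reflashed over serial, that would support the idea that the first sector keeps its old settings.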
The issues we encountered were so random that I'm really starting to think the flash was either not written correctly, or is maybe not always read correctly.
I know for sure that the power supply is not to blame here, as I designed the boards myself.
The ESP is powered by its own AMS1117, has the capacitors described by Espressif's design guides and the boards run just fine after being flashed over serial.
Could this be somehow related to the changes made a while back regarding the driving voltage of XMC flash?
N.B. not all units seem to have the same flash brand. Some report as XMC, but not all, and I am not entirely certain that all boards with stability issues are using XMC flash.
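To correlate the unstable boards with the flash brand, the raw flash ID of each unit could be logged and decoded. Below is a minimal sketch, assuming the ID has the manufacturer in the low byte, the memory type in the next byte and the capacity code in the third byte (the layout I would expect from `ESP.getFlashChipId()`), and assuming 0x20 is the vendor byte reported by XMC parts. `FlashId`, `decodeFlashId`, `isXmcVendor` and `capacityBytes` are hypothetical names for this example.

```cpp
#include <cstdint>

// Fields of a raw SPI flash ID, assuming the manufacturer is in the low byte.
struct FlashId {
    uint8_t vendor;        // manufacturer byte
    uint8_t type;          // memory type byte
    uint8_t capacityCode;  // usually log2 of the capacity in bytes
};

// Hypothetical helper: split the raw 24-bit ID into its three fields.
constexpr FlashId decodeFlashId(uint32_t id) {
    return FlashId{static_cast<uint8_t>(id & 0xFF),
                   static_cast<uint8_t>((id >> 8) & 0xFF),
                   static_cast<uint8_t>((id >> 16) & 0xFF)};
}

// Assumption: XMC parts report 0x20 as their vendor byte.
constexpr bool isXmcVendor(const FlashId &f) {
    return f.vendor == 0x20;
}

// Capacity in bytes, assuming the common "power of two" encoding; 0 if the
// code is out of range for a 32-bit size.
constexpr uint32_t capacityBytes(const FlashId &f) {
    return f.capacityCode < 32 ? (uint32_t{1} << f.capacityCode) : 0;
}
```

Under these assumptions, an ID of 0x164020 decodes to vendor 0x20 with a capacity code of 0x16, which is 4 MB. Logging the decoded vendor next to each stability report would show whether the problem really follows one flash brand.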
Could it also be a write speed issue? No idea how quickly the flash is written via OTA.
When flashing using serial, there does not seem to be any difference when flashing at 115200 baud vs. 4 times that rate.
I also tested on some boards by flashing an older image (of the entire flash) and then performing an OTA update, and found that it would fail repeatedly.
Strangely enough, another build with only minor code changes, flashed via OTA, could run stably.
So this does sound like we may have some very specific bit pattern which is harder to flash?
I assume the OTA flashed data is verified using some checksum, but is it possible this checksum is calculated when the ESP is running "older" code to set the flash parameters?
Or is there maybe a timing issue possible where reading the flash sequentially is different from normal use, so that a post-OTA check may be successful, but reads may still fail when running in normal mode?
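A possible way to test the "reads are not always correct" theory is to read the same flash region several times and compare a checksum of each pass against the first one. Below is a minimal sketch of that idea. It is not how the core verifies an OTA image; the reader callback, `crc32Update` and `countUnstablePasses` are hypothetical names for this example. On the device the callback would wrap whatever flash read call is available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Reads `len` bytes starting at flash address `addr` into `out`.
// Returns false if the read itself failed.
using FlashReader = std::function<bool(uint32_t addr, uint8_t *out, size_t len)>;

// Standard CRC-32 (reflected polynomial 0xEDB88320), usable incrementally by
// passing the previous result as `crc`.
uint32_t crc32Update(const uint8_t *data, size_t len, uint32_t crc = 0) {
    crc = ~crc;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Reads [start, start + length) `passes` times in chunks and returns how many
// passes produced a CRC different from the first pass, or -1 if a read failed.
int countUnstablePasses(const FlashReader &read, uint32_t start, size_t length,
                        int passes, size_t chunk = 256) {
    assert(passes > 0 && chunk > 0);
    std::vector<uint8_t> buf(chunk);
    uint32_t reference = 0;
    int mismatches = 0;
    for (int p = 0; p < passes; ++p) {
        uint32_t crc = 0;
        for (size_t off = 0; off < length; off += chunk) {
            const size_t n = (length - off < chunk) ? (length - off) : chunk;
            if (!read(static_cast<uint32_t>(start + off), buf.data(), n)) {
                return -1;
            }
            crc = crc32Update(buf.data(), n, crc);
        }
        if (p == 0) {
            reference = crc;       // first pass is the baseline
        } else if (crc != reference) {
            ++mismatches;          // same region, different content
        }
    }
    return mismatches;
}
```

A non-zero result on an OTA-updated unit, with zero on a serial-flashed unit running the same build, would point at unstable reads rather than a bad write. Running it both right after the update and later during normal operation might also show whether the two situations behave differently.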