Skip to content

LWIP memory leak? #3419

Closed
Closed
@kovikti

Description

@kovikti

Basic Infos

Main question: can there be a situation during a tcp communication session, where lwip crashes or memory is not freed up? If yes, when, how to avoid it?
Can anything cause this on the server/network side?

Hardware

Hardware: ESP-12E
Core Version: most recent git master, also tried git sdk 2.1.0

Description

I'm experiencing some strange issues with (assuming) the TCP/IP library.
It is very difficult to debug and and I'm still investigating it, however it already took more time than I wanted. Sorry in advance...
So I have a rather large piece of software, POSTing data to a server regularly, every 10 sec.
Everything was working fine (weeks without any issues), now I changed the server to a private one instead of a shared one at a hosting company. PC to his server connection is fine.
Now what I see is that sometimes (quite random, from every few requests to one out of several hours), data is not sent by WiFiClient (debug does not show ":sent"), and the buffer seems to be not freed. client.stop() s called after a timeout period. So slowly memory runs out and no more request can be made. The system is still running (more or less), but useless.
I debugged my heap usage, I free up everything that I allocated, I'm pretty sure the problem is in LWIP or close to it.
On the server side (Apache) I see missing requests (nothing appears in the access log) or request timeouts as no data arrived to the server (such as [12/Jul/2017:12:22:41 +0200] "-" 408 0 "-" "-").
At the first case the connection did not build up successfully, at the second, connection is ok, but packets were lost. Normally this shouldn't happen with tcp.
I did tcp dumps on the server that confirms this, some packets didn't reach the server. (However providing an ssh tunnel to the server from a localhost provided better results - still investigating).
I added delay after the client.write() function, it seemed to help, sending smaller chucks to write, also helped, but the problem still persists.
It occures pretty random, so not easy to tell.
According to wireshark if the connection was built up and the problem arises, I saw several duped ACKs, overlapping ACKs, strange things, but simetimes the tcp stream is just cut off during the session:
Server side: SYN, SYN ACK, ACK, PACKET, ACK, and nothing more, however an other packet should have been sent by the ESP.

Earlier I assumed WiFiClient.write is blocking, until ack received, so I assumed the missing packes were sent from the ESP but it looks like not. I also had some suspicion of the dynamic memory allocation for ClientContext in WiFiClient, maybe it is not freed up?
I also tried switching to the async tcp library, in order to have better control over the packets, which seemes great, but the problem seems to persist. For now it seems much more rare, but difficult to tell. I haven't yet seen memory leaks here, only crashes.
See the log at the bottom, an ack should have arrived.

I assume there might be some strange constellation between some kind of a network error (?) and a bug in lwip: not handling retransmission or acks in some strange situation.

Still investigating, but I wanted to know if anyone has experienced anything like this.

EDIT: I managed to reproduce the problem with a simple sketch. See below.
Try changing host and chunksize.
With google.com only small chunk size works (<=1460), but with for example httpbin.org size 2000 works without problems.
However, if I connect the ESP to mobile hotspot, it works!
I'm really confused now.

Settings in IDE

IDE: VMICRO+Visual Studio
Module: Generic ESP8266 Module
Flash Size: 4MB
CPU Frequency: 80Mhz
Flash Mode: qio
Flash Frequency: 40Mhz
Upload Using: SERIAL
Reset Method: nodemcu

Sketch

#include <ESP8266WiFi.h>

const char* ssid = "...";
const char* password = "...";

//const char* host = "httpbin.org";
const char* host = "google.com";

int chunksize = 1000;// 2000;

int contentLength = 2000;

void setup() {
	Serial.begin(115200);
	delay(10);

	// We start by connecting to a WiFi network

	Serial.println();
	Serial.println();
	Serial.print("Connecting to ");
	Serial.println(ssid);

	WiFi.mode(WIFI_STA);
	WiFi.begin(ssid, password);

	while (WiFi.status() != WL_CONNECTED) {
		delay(500);
		Serial.print(".");
	}

	Serial.println("");
	Serial.println("WiFi connected");
	Serial.println("IP address: ");
	Serial.println(WiFi.localIP());
}

WiFiClient client;

void loop() {
	//client.setNoDelay(true);
	delay(2000);
	Serial.printf("\nHeap: %d\n", ESP.getFreeHeap());
	++value;
	
	Serial.printf("Time: %d\n", millis()/1000);

	Serial.print("connecting to ");
	Serial.println(host);

	const int httpPort = 80;
	if (!client.connect(host, httpPort)) {
		Serial.println("connection failed");
		return;
	}
	Serial.println("Connected.");
	
	String url = "/";

	Serial.print("Requesting URL: ");
	Serial.println(url);

	
	String s = String("POST ") + url + " HTTP/1.1\r\n" +
		"Host: " + host + "\r\n" +
		"Content-type: application/octet-stream\r\n" +
		"Connection: close" + "\r\nContent-length: " + (String)contentLength + "\r\n\r\n";

	int dataLen = contentLength + s.length();
	uint8_t data[dataLen];

	memcpy(data, s.c_str(), s.length());

	int pos = 0;

	auto time1 = millis();
	while (true){
		int len = dataLen - pos;
		if (len == 0){
			break;
		}
		if (len > chunksize){ len = chunksize; }
		client.write(data+pos, len);
		pos += len;
		yield();
	}
	auto time2 = millis();

	Serial.printf("send time: %d\n", time2-time1);

	unsigned long timeout = millis();
	while (client.available() == 0) {
		if (millis() - timeout > 5000) {
			Serial.println(">>> Client Timeout !");
			client.stop();
			return;
		}
	}

	// Read all the lines of the reply from server and print them to Serial
	while (client.available()){
		String line = client.readStringUntil('\r');
		Serial.print(line);
	}

	Serial.println();
	Serial.println("closing connection");
	client.stop();
}

Debug Messages

Async Httpclient start connecting
Async Httpclient connected
Async Httpclient write len: 1400
Async Httpclient ACK len: 1400 time: 58
Async Httpclient write len: 748
Async Httpclient send finished
mac 985

ets Jan 8 2013,rst cause:4, boot mode:(1,6)

wdt reset

(second ACK should have arrived!)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions