Will Continuous Top-N leak space forever?

I'm looking at this query:

https://github.com/ververica/flink-sql-cookbook/blob/aa0bd978b1116a073e094ceea44ecd659ae3cefd/aggregations-and-analytics/05_top_n/05_top_n.md#L30-L47

I'm wondering if this will leak space forever?

## An example about data size

Excluding the `GROUP BY` bit, and simplifying:

```sql
SELECT wizard, spell, times_cast
FROM (
    SELECT *,
    ROW_NUMBER() OVER (PARTITION BY wizard ORDER BY times_cast DESC) AS row_num
    FROM (SELECT wizard, spell, times_cast FROM spells_cast_source)
)
WHERE row_num <= 2; 
```

From a logical PoV, my brain tells me that Flink will have to forever keep a map of seen identifiers (`wizard`) somewhere, in order to track `row_num` continuously?

What happens if I have 300Gb of data? Will this leak that space? 

## Windowed aggregations and continuous streaming?

Looking at upstream docs for Flink 1.20:

https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/sql/queries/window-agg/

> Unlike other aggregations on continuous tables, window aggregation do not emit intermediate results but only a final result, the total aggregation at the end of the window. Moreover, window aggregations purge all intermediate state when no longer needed.

So the tradeoff here is either "continuously, but leaking space", or "when window ends, but cleaning up after a window".

Is there a way to obtain continuous streaming over a window, whilst still cleaning up after a window is done? 

	```sql
	CREATE TABLE spells_cast (
	wizard STRING,
	spell STRING
	) WITH (
	'connector' = 'faker',
	'fields.wizard.expression' = '#{harry_potter.characters}',
	'fields.spell.expression' = '#{harry_potter.spells}'
	);

	SELECT wizard, spell, times_cast
	FROM (
	SELECT *,
	ROW_NUMBER() OVER (PARTITION BY wizard ORDER BY times_cast DESC) AS row_num
	FROM (SELECT wizard, spell, COUNT(*) AS times_cast FROM spells_cast GROUP BY wizard, spell)
	)
	WHERE row_num <= 2;
	```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Will Continuous Top-N leak space forever? #68

An example about data size

Windowed aggregations and continuous streaming?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Will Continuous Top-N leak space forever? #68

Description

An example about data size

Windowed aggregations and continuous streaming?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions