Skip to content
This repository was archived by the owner on Nov 25, 2024. It is now read-only.

[roomserver]Store transaction ID and prevent reprocessing of events #446

Merged

Conversation

APwhitehat
Copy link
Contributor

Reconfiguration of postgresql is required to test this.

Signed-off-by: Anant Prakash <[email protected]>

Copy link
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely the right direction.

auth_event_nids BIGINT[] NOT NULL
auth_event_nids BIGINT[] NOT NULL,
-- Apply uniqueness constraint on event_id and transaction_id
CONSTRAINT roomserver_event_id_transaction_id_unique UNIQUE (event_id, transaction_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is saying that we want the tuple (event_id, txn_id) to be unique, which would allow us to have duplicated event_ids, which is not what we want.

// Check if an transaction ID exists
// This is used to determine if the room event is processed/processing already
CheckTransactionID(
ctx context.Context, TransactionID string,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A transaction ID is only unique up to a user and device, so we'll need to also pass in the sender and device_id.

event_id TEXT NOT NULL,
-- Transaction ID for this event
-- Required to ensure that an event is not processed again
transaction_id TEXT NOT NULL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll also need to store at least the device_id and probably the sender (though the sender can be computed from the json)

const checkExistsTransactionIDSQL = "" +
"SELECT EXISTS(" +
"SELECT 1 FROM roomserver_events WHERE transaction_id = $1" +
")"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably remove the SELECT EXISTS (..) portion and simply count the number of rows returned. Another option is SELECT COUNT(*) FROM roomserver_events WHERE transaction_id = $1. In general its usually clearer to try and avoid using subqueries where possible.

@erikjohnston
Copy link
Member

Unfortunately, I forgot two things that makes things a bit more annoying:

  1. Transactions IDs are scoped to a particular user's device. This means that the same transaction ID may be used across different devices, so all queries for transaction IDs need to take that into account.
  2. We need to make sure the client API returns the right event ID, i.e. if we have already processed the transaction we need to return the existing event's ID, rather than the ID for the event we are trying to persist. This means that the API the clientapi uses to send events needs to return an event ID.

The second point should be relatively easy to do. However the first point is more annoying, since the events table currently stores neither the sender user_id nor device ID. Given that, I think I've changed my mind and we should probably have it as a a separate table (like you originally suggested).

@erikjohnston
Copy link
Member

Oh, and also: whenever you add new queries (especially when using new columns), its a good idea to stop and consider whether it needs a new index to be added.

For example, currently in this PR we're adding a query to select based on transaction_id, which is not indexed. This means that the database would need to scan the entire table to find the row, which isn't ideal, so we should in that case add an index on transaction_id.

@APwhitehat APwhitehat force-pushed the roomserver_txnID_idempotent branch 2 times, most recently from 6f4b3f5 to ac6a910 Compare May 24, 2018 15:23
"INSERT INTO roomserver_transactions (transaction_id, device_id, user_id, event_id)" +
" VALUES ($1, $2, $3, $4)" +
" ON CONFLICT ON CONSTRAINT roomserver_transaction_unique" +
" DO NOTHING"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to noop if the row already exists?

@erikjohnston erikjohnston merged commit 60e7795 into matrix-org:master May 26, 2018
@APwhitehat APwhitehat deleted the roomserver_txnID_idempotent branch May 26, 2018 11:04
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants