Description
As per #26, the solution of atomically swapping the tables is unsafe in the event of death of connections.
I've been considering some other solutions. I've reached a dead end with all of them, but hopefully someone else can pick up where I stopped and find a solution.
An idea based on views, derived from my earlier work on http://code.openark.org/blog/mysql/auto-caching-tables is as follows:
Views & SLEEP(), dead-end on the SLEEP() part
- We assume we want to alter table `tbl`
- Ghost table is called `ghost`
- We issue: `create or replace view ghost_view as select * from ghost where sleep(600) >= 0` (this view is updatable, and can receive `INSERT`, `UPDATE`, `DELETE`)
- To begin the swap, we issue: `rename table tbl to tbl_old, ghost_view to tbl`. `RENAME` works just as well on views as it does on tables.
- Queries are now executing on the view, but are stalling for a long enough period. We now have time to apply changes onto `ghost`.
- Problem/Dead-end: we would like to `rename table tbl to tbl_old_view, ghost to tbl`, but the `rename` is blocked until all those hanging queries are complete. There's no way to migrate those already-running queries onto the renamed ghost table.
- I tried two levels of views (a view calling a view calling the table) and then swapping the mid-layered view. Still does not work. The `rename` hangs until the queries complete, which is not what we want.
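For readability, the SLEEP()-based sequence above can be sketched end to end (names `tbl`, `ghost`, `ghost_view` as used in this issue; the 600-second timeout is arbitrary):

```sql
-- Gate all access to the ghost table behind a long SLEEP().
-- The view is updatable, so INSERT/UPDATE/DELETE pass through too.
create or replace view ghost_view as
  select * from ghost where sleep(600) >= 0;

-- Atomically put the stalling view in place of the original table;
-- new queries now hit the view and hang on SLEEP().
rename table tbl to tbl_old, ghost_view to tbl;

-- ...apply the remaining changes onto `ghost` here...

-- Dead-end: this rename waits until every query currently
-- stalled inside the view has completed.
rename table tbl to tbl_old_view, ghost to tbl;
```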
Any ideas?
Views & GET_LOCK(), spaghetti on GET_LOCK()
Before I realized the spaghetti was there, I found this solution to be appealing:
- True, it uses `GET_LOCK`, so the same "what if the connection dies" problem is still there
- However I reduce the number of risky connections from 2 to 1
- And then make it possible to have n connections who will share the risk: it would take the death of all n connections to cause a premature `rename` (as opposed to the death of any single connection)
And then I found the spaghetti.
The solution is similar to the above, but:
- Connection #1 issues `GET_LOCK('ding', 0)` and succeeds
- We `create or replace view ghost_view as select * from ghost where get_lock('ding', 600) >= 0`
- We `rename table tbl to tbl_old, ghost_view to tbl`; queries are now blocked on `tbl` (which is now a `view`)
- We complete writing events onto `ghost`
- We `rename table tbl to tbl_old_view, ghost to tbl` - this blocks due to the already existing queries
- We `RELEASE_LOCK('ding')` in connection #1. Problem: the stalled queries are released, but are now blocking each other!!!
- The `rename` unblocks and we have our migrated table in place.
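The GET_LOCK()-based flow, sketched in one place (same names as above; the `-- (conn #N)` comments mark which connection runs each statement):

```sql
-- (conn #1) Take the lock up front, with zero wait; returns 1 on success.
SELECT GET_LOCK('ding', 0);

-- (migration conn) Gate the ghost table behind the lock:
-- any query on the view waits up to 600s trying to acquire 'ding'.
create or replace view ghost_view as
  select * from ghost where get_lock('ding', 600) >= 0;

-- (migration conn) Swap the view in place of the original table.
rename table tbl to tbl_old, ghost_view to tbl;

-- ...complete writing events onto `ghost`...

-- (migration conn) Blocks behind the queries already stalled in the view.
rename table tbl to tbl_old_view, ghost to tbl;

-- (conn #1) Release; stalled queries resume, then the rename completes.
SELECT RELEASE_LOCK('ding');
```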
Before discussing the problem, note this variant:

- Have connections #1 .. #n issue `SELECT GET_LOCK('ding#1', 600)` through `SELECT GET_LOCK('ding#n', 600)` (we take n locks)
- Vary the view: `create or replace view ghost_view as select * from ghost where get_lock('ding#1',600) + get_lock('ding#2',600) + ... + get_lock('ding#n',600) >= 0`

Such that it would take the death of all n connections to cause a premature `rename`.
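For a concrete n=3, the variant would look like this (lock names as above; purely illustrative):

```sql
-- Each of three guard connections holds one lock:
-- (conn #1) SELECT GET_LOCK('ding#1', 600);
-- (conn #2) SELECT GET_LOCK('ding#2', 600);
-- (conn #3) SELECT GET_LOCK('ding#3', 600);

-- A query on the view only proceeds once it can acquire all three,
-- i.e. only after all three guard connections have died or released.
create or replace view ghost_view as
  select * from ghost
  where get_lock('ding#1', 600)
      + get_lock('ding#2', 600)
      + get_lock('ding#3', 600) >= 0;
```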
So, back to the problem. Each query will issue `get_lock`, and queries will block each other, leading to a really scary workload. We can hack around this by doing crazy stuff like `create view ... select ... where release_lock(concat('ding', if(get_lock('ding',600) >= 0, '', ''))) >= 0`, which makes sure to release any acquired lock.
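Spelling that hack out as a full statement (same single lock name `'ding'`; the `concat`/`if` dance only exists to force `get_lock` to be evaluated before `release_lock`):

```sql
-- Acquire 'ding' (or time out after 600s), then immediately release it,
-- so concurrent queries don't serialize on the lock.
create or replace view ghost_view as
  select * from ghost
  where release_lock(concat('ding',
          if(get_lock('ding', 600) >= 0, '', ''))) >= 0;
```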
This still scares me because of the crazy locking contention.
Ideas?