-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[5.4] Improve article list performance for large sites #45542
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 5.4-dev
Are you sure you want to change the base?
[5.4] Improve article list performance for large sites #45542
Conversation
administrator/components/com_content/src/Model/ArticlesModel.php
Outdated
Show resolved
Hide resolved
A long time ago we had, what sounds like, a very similar issue and in that case although the indexing improved the performance on a large site it had a negative experience impact on sites with a more typical number of items. I am not saying that is the case here just that it needs to be checked |
I reviewed the affected query on small (< 100 articles) and medium size (1200 articles) sites, in both cases the indexes that are enforced by this PR have been in use anyways. Only on large scale size the optimizer decides to skip the index. |
Well, we can do like that also, a bit hacky, but should work. |
Same for me. I played around with various changes to the indexes, join types and even tested multiple mariadb version - no effect. |
The system test failure is unrelated from this PR and will be fixed in #45547 |
I have tested this item ✅ successfully on e55df52 This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/45542. |
btw, the query works faster when use filter by category, that is why the frontend does not have the same slow down effect, even though query almost identical. |
@Fedik Have you tested with MySQL? Or MariaDB? Or both? |
@richard67 MySQL |
I have tried to reproduce the actual and the expected result on diverse Ubuntu VMs with different MySQL 8.0 versions, but I was not able to get the same results. Unfortunately I don't have MariaDB available for tests. What I did: I've made a new installation of a current 5.4-dev. The I have disabled the "Content - Smart Search" and "Behaviour - Versionable" plugins as recommended in the Readme of @Fedik 's plugin https://github.com/Fedik/plg_sampledata_bigdata , and I have enabled Workflows in the integration options of com_content. Then I've installed an enabled the latest release of that plugin plg_sampledata_bigdata_1.0.0_328f117_20240927_1253.zip . Then I have run it several times to that I had more than 50k articles and workflow associations on one site, and on another I had more than 30k. In both cases the same result: Without the
As you can see, the primary key is used for the workflows table "w", in opposite to the screen shot for the actual result in the description of this PR. With the
As you can see, forcing the PRIMARY key makes other keys disappear from the list of possible keys, and in case of the workflow associations "wa" it results in the "idx_extensions" not being used anymore. The reason for that might be that in my scenario, all categories and all articles belong to the the same stage of the same workflow, and there are no other workflows and workflow stages. This fits to the first answer to the stackoverflow question here https://stackoverflow.com/questions/945925/mysql-index-being-ignored :
The citation is taken from here: https://dev.mysql.com/doc/refman/8.4/en/index-hints.html Now if you scroll a bit down on that MySQL documentation, you will find an important note:
When the "Thus, you should expect USE INDEX, FORCE INDEX, and IGNORE INDEX to be deprecated in a future release of MySQL," will happen, we will have to change our code to be different for MySQL and MariaDB, as the latter seems not to make that change, see https://mariadb.com/kb/en/force-index/ , and optimizer hints seem to work very differently to MySQL, see https://mariadb.com/kb/en/index-hints-how-to-force-query-plans/ . So I am not sure if this PR here is the way to go. @SniperSister @Fedik Maybe you can provide more details about your scenario, e.g. how workflows are used? Or could you test my scenario if you get the same results as I did? Maybe something is special with my MySQL installation on Ubuntu (however, I am not aware of having changed anything on the default configuration regarding the opimizer such stuff). |
I am testing on local Mysql 8.0.32 without any extra configurations. My test result is: After PR: I just noticed that this PR is missing And I think you have a faster setup 😉 |
@Fedik So it seems that for you the PRIMARY key of the "ws" table is also not used with the PR? |
@Fedik And could you post the complete queries without and with the PR? Or are they exactly the same as mine? Mine are with workflows enabled. |
Sure, SELECT `a`.`id`,`a`.`asset_id`,`a`.`title`,`a`.`alias`,`a`.`checked_out`,`a`.`checked_out_time`,`a`.`catid`,`a`.`state`,`a`.`access`,`a`.`created`,`a`.`created_by`,`a`.`created_by_alias`,`a`.`modified`,`a`.`ordering`,`a`.`featured`,`a`.`language`,`a`.`hits`,`a`.`publish_up`,`a`.`publish_down`,`a`.`introtext`,`a`.`fulltext`,`a`.`note`,`a`.`images`,`a`.`metakey`,`a`.`metadesc`,`a`.`metadata`,`a`.`version`,`fp`.`featured_up`,`fp`.`featured_down`,`l`.`title` AS `language_title`,`l`.`image` AS `language_image`,`uc`.`name` AS `editor`,`ag`.`title` AS `access_level`,`c`.`title` AS `category_title`,`c`.`created_user_id` AS `category_uid`,`c`.`level` AS `category_level`,`c`.`published` AS `category_published`,`parent`.`title` AS `parent_category_title`,`parent`.`id` AS `parent_category_id`,`parent`.`created_user_id` AS `parent_category_uid`,`parent`.`level` AS `parent_category_level`,`ua`.`name` AS `author_name`,`wa`.`stage_id` AS `stage_id`,`ws`.`title` AS `stage_title`,`ws`.`workflow_id` AS `workflow_id`,`w`.`title` AS `workflow_title`
FROM `bd5_content` AS `a`
LEFT JOIN `bd5_languages` AS `l` ON `l`.`lang_code` = `a`.`language`
LEFT JOIN `bd5_content_frontpage` AS `fp` ON `fp`.`content_id` = `a`.`id`
LEFT JOIN `bd5_users` AS `uc` ON `uc`.`id` = `a`.`checked_out`
LEFT JOIN `bd5_viewlevels` AS `ag` ON `ag`.`id` = `a`.`access`
LEFT JOIN `bd5_categories` AS `c` ON `c`.`id` = `a`.`catid`
LEFT JOIN `bd5_categories` AS `parent` ON `parent`.`id` = `c`.`parent_id`
LEFT JOIN `bd5_users` AS `ua` ON `ua`.`id` = `a`.`created_by`
INNER JOIN `bd5_workflow_associations` AS `wa` ON `wa`.`item_id` = `a`.`id`
INNER JOIN `bd5_workflow_stages` AS `ws` ON `ws`.`id` = `wa`.`stage_id`
INNER JOIN `bd5_workflows` AS `w` ON `w`.`id` = `ws`.`workflow_id`
WHERE `wa`.`extension` = 'com_content.article' AND `a`.`state` IN (1,0)
ORDER BY a.id desc LIMIT 20 With: SELECT `a`.`id`,`a`.`asset_id`,`a`.`title`,`a`.`alias`,`a`.`checked_out`,`a`.`checked_out_time`,`a`.`catid`,`a`.`state`,`a`.`access`,`a`.`created`,`a`.`created_by`,`a`.`created_by_alias`,`a`.`modified`,`a`.`ordering`,`a`.`featured`,`a`.`language`,`a`.`hits`,`a`.`publish_up`,`a`.`publish_down`,`a`.`introtext`,`a`.`fulltext`,`a`.`note`,`a`.`images`,`a`.`metakey`,`a`.`metadesc`,`a`.`metadata`,`a`.`version`,`fp`.`featured_up`,`fp`.`featured_down`,`l`.`title` AS `language_title`,`l`.`image` AS `language_image`,`uc`.`name` AS `editor`,`ag`.`title` AS `access_level`,`c`.`title` AS `category_title`,`c`.`created_user_id` AS `category_uid`,`c`.`level` AS `category_level`,`c`.`published` AS `category_published`,`parent`.`title` AS `parent_category_title`,`parent`.`id` AS `parent_category_id`,`parent`.`created_user_id` AS `parent_category_uid`,`parent`.`level` AS `parent_category_level`,`ua`.`name` AS `author_name`,`wa`.`stage_id` AS `stage_id`,`ws`.`title` AS `stage_title`,`ws`.`workflow_id` AS `workflow_id`,`w`.`title` AS `workflow_title`
FROM `bd5_content` AS `a`
LEFT JOIN `bd5_languages` AS `l` ON `l`.`lang_code` = `a`.`language`
LEFT JOIN `bd5_content_frontpage` AS `fp` ON `fp`.`content_id` = `a`.`id`
LEFT JOIN `bd5_users` AS `uc` FORCE INDEX(PRIMARY) ON `uc`.`id` = `a`.`checked_out`
LEFT JOIN `bd5_viewlevels` AS `ag` ON `ag`.`id` = `a`.`access`
LEFT JOIN `bd5_categories` AS `c` ON `c`.`id` = `a`.`catid`
LEFT JOIN `bd5_categories` AS `parent` ON `parent`.`id` = `c`.`parent_id`
LEFT JOIN `bd5_users` AS `ua` FORCE INDEX(PRIMARY) ON `ua`.`id` = `a`.`created_by`
INNER JOIN `bd5_workflow_associations` AS `wa` FORCE INDEX(PRIMARY) ON `wa`.`item_id` = `a`.`id`
INNER JOIN `bd5_workflow_stages` AS `ws` FORCE INDEX(PRIMARY) ON `ws`.`id` = `wa`.`stage_id`
INNER JOIN `bd5_workflows` AS `w` FORCE INDEX(PRIMARY) ON `w`.`id` = `ws`.`workflow_id`
WHERE `wa`.`extension` = 'com_content.article' AND `a`.`state` IN (1,0)
ORDER BY a.id desc LIMIT 20 The query looks the same to what you have with enabled Workflow.
yeap |
My guess, MySQL optimizer have some kind of internal performance rating for its server, and based on this doing its optimization. Or maybe some memory limit config. |
Yep, that's my assumption too. Getting the exact same EXPLAIN results is going to be hard if the query is performed on different machines / setups. |
If we could find which configuration could improve performance with large amount of data, we then probably can skip this hacks, and add "mysql configuration hint" in to joomla manual, somewhere. |
This is interesting https://stackoverflow.com/questions/50508632/mysql-not-picking-up-the-optimal-index Here is my trace the query optimiser-trace.txt But hard to read, for example for index |
Pull Request for Issue #43701
Based on the PR #44079 by @Fedik , slightly adjusted and ported to 5.4
Summary of Changes
In our company we are supporting a rather large Joomla 5.x client site with more than 50.000 articles that makes use of the workflows. For years, the performance in the backend was perfectly fine, however recently we experienced significant performance issues in the article list views in the frontend but most importantly in the backend.
During my debugging I realized that the main query for the backend article list could use indexes on all relevant joins but doesn't do so. Instead, it creates a temporary table which has a significant performance impact. During further diagnosis it became apparent that the MySQL query optimizer stops using the available keys if a certain number of articles are in the #__content table, which is a document behavior (see https://stackoverflow.com/questions/72807286/index-not-being-utilized-in-mysql-when-larger-dataset) and helpful most of the times. In Joomla's usecase however the performance impact is significant: By enforcing the index usage as suggested in this PR, the query time was reduced from 2.6s down to 48ms.
Example screenshot of an EXPLAIN output WITHOUR the PR:

No key is used, instead temporary table with filesort
Example screenshot WITH the PR:

Please note: this is an issue that only affects MariaDB/MySQL servers. Postgres is unaffected as it uses it's own query optimization logic.
Testing Instructions
Actual result BEFORE applying this Pull Request
Expected result AFTER applying this Pull Request
Link to documentations
Please select: