Skip to content

[5.4] Improve article list performance for large sites #45542

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: 5.4-dev
Choose a base branch
from

Conversation

SniperSister
Copy link
Contributor

@SniperSister SniperSister commented May 29, 2025

Pull Request for Issue #43701
Based on the PR #44079 by @Fedik , slightly adjusted and ported to 5.4

Summary of Changes

In our company we are supporting a rather large Joomla 5.x client site with more than 50.000 articles that makes use of the workflows. For years, the performance in the backend was perfectly fine, however recently we experienced significant performance issues in the article list views in the frontend but most importantly in the backend.

During my debugging I realized that the main query for the backend article list could use indexes on all relevant joins but doesn't do so. Instead, it creates a temporary table which has a significant performance impact. During further diagnosis it became apparent that the MySQL query optimizer stops using the available keys if a certain number of articles are in the #__content table, which is a document behavior (see https://stackoverflow.com/questions/72807286/index-not-being-utilized-in-mysql-when-larger-dataset) and helpful most of the times. In Joomla's usecase however the performance impact is significant: By enforcing the index usage as suggested in this PR, the query time was reduced from 2.6s down to 48ms.

Example screenshot of an EXPLAIN output WITHOUR the PR:
Bildschirmfoto 2025-05-29 um 21 42 32

No key is used, instead temporary table with filesort

Example screenshot WITH the PR:
Bildschirmfoto 2025-05-29 um 21 43 32

Please note: this is an issue that only affects MariaDB/MySQL servers. Postgres is unaffected as it uses it's own query optimization logic.

Testing Instructions

Actual result BEFORE applying this Pull Request

  • verify that no key is used for the relevant joins as shown in the screenshot above.

Expected result AFTER applying this Pull Request

  • verify that key is used for the relevant joins as shown in the screenshot above.

Link to documentations

Please select:

  • Documentation link for docs.joomla.org:
  • No documentation changes for docs.joomla.org needed
  • Pull Request link for manual.joomla.org:
  • No documentation changes for manual.joomla.org needed

@brianteeman
Copy link
Contributor

A long time ago we had, what sounds like, a very similar issue and in that case although the indexing improved the performance on a large site it had a negative experience impact on sites with a more typical number of items. I am not saying that is the case here just that it needs to be checked

@SniperSister
Copy link
Contributor Author

I reviewed the affected query on small (< 100 articles) and medium size (1200 articles) sites, in both cases the indexes that are enforced by this PR have been in use anyways. Only on large scale size the optimizer decides to skip the index.

@Fedik
Copy link
Member

Fedik commented May 30, 2025

Well, we can do like that also, a bit hacky, but should work.
I thought there is something smarter to trick the optimizer, but did not found anything.

@SniperSister
Copy link
Contributor Author

I thought there is something smarter to trick the optimizer, but did not found anything.

Same for me. I played around with various changes to the indexes, join types and even tested multiple mariadb version - no effect.

@SniperSister
Copy link
Contributor Author

The system test failure is unrelated from this PR and will be fixed in #45547

@Fedik
Copy link
Member

Fedik commented May 31, 2025

I have tested this item ✅ successfully on e55df52


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/45542.

@Fedik
Copy link
Member

Fedik commented May 31, 2025

btw, the query works faster when use filter by category, that is why the frontend does not have the same slow down effect, even though query almost identical.

@richard67
Copy link
Member

@Fedik Have you tested with MySQL? Or MariaDB? Or both?

@Fedik
Copy link
Member

Fedik commented Jun 1, 2025

@richard67 MySQL

@richard67
Copy link
Member

I have tried to reproduce the actual and the expected result on diverse Ubuntu VMs with different MySQL 8.0 versions, but I was not able to get the same results. Unfortunately I don't have MariaDB available for tests.

What I did:

I've made a new installation of a current 5.4-dev. The I have disabled the "Content - Smart Search" and "Behaviour - Versionable" plugins as recommended in the Readme of @Fedik 's plugin https://github.com/Fedik/plg_sampledata_bigdata , and I have enabled Workflows in the integration options of com_content.

Then I've installed an enabled the latest release of that plugin plg_sampledata_bigdata_1.0.0_328f117_20240927_1253.zip .

Then I have run it several times to that I had more than 50k articles and workflow associations on one site, and on another I had more than 30k.

In both cases the same result:

Without the FORCE INDEX I had only one index not being used and not 2 like shown in the screenshot of the actual result.

EXPLAIN SELECT `a`.`id`,`a`.`asset_id`,`a`.`title`,`a`.`alias`,`a`.`checked_out`,`a`.`checked_out_time`,`a`.`catid`,`a`.`state`,`a`.`access`,`a`.`created`,`a`.`created_by`,`a`.`created_by_alias`,`a`.`modified`,`a`.`ordering`,`a`.`featured`,`a`.`language`,`a`.`hits`,`a`.`publish_up`,`a`.`publish_down`,`a`.`introtext`,`a`.`fulltext`,`a`.`note`,`a`.`images`,`a`.`metakey`,`a`.`metadesc`,`a`.`metadata`,`a`.`version`,`fp`.`featured_up`,`fp`.`featured_down`,`l`.`title` AS `language_title`,`l`.`image` AS `language_image`,`uc`.`name` AS `editor`,`ag`.`title` AS `access_level`,`c`.`title` AS `category_title`,`c`.`created_user_id` AS `category_uid`,`c`.`level` AS `category_level`,`c`.`published` AS `category_published`,`parent`.`title` AS `parent_category_title`,`parent`.`id` AS `parent_category_id`,`parent`.`created_user_id` AS `parent_category_uid`,`parent`.`level` AS `parent_category_level`,`ua`.`name` AS `author_name`,`wa`.`stage_id` AS `stage_id`,`ws`.`title` AS `stage_title`,`ws`.`workflow_id` AS `workflow_id`,`w`.`title` AS `workflow_title`
FROM `j3ux0_content` AS `a`
LEFT JOIN `j3ux0_languages` AS `l` ON `l`.`lang_code` = `a`.`language`
LEFT JOIN `j3ux0_content_frontpage` AS `fp` ON `fp`.`content_id` = `a`.`id`
LEFT JOIN `j3ux0_users` AS `uc` ON `uc`.`id` = `a`.`checked_out`
LEFT JOIN `j3ux0_viewlevels` AS `ag` ON `ag`.`id` = `a`.`access`
LEFT JOIN `j3ux0_categories` AS `c` ON `c`.`id` = `a`.`catid`
LEFT JOIN `j3ux0_categories` AS `parent` ON `parent`.`id` = `c`.`parent_id`
LEFT JOIN `j3ux0_users` AS `ua` ON `ua`.`id` = `a`.`created_by`
INNER JOIN `j3ux0_workflow_associations` AS `wa` ON `wa`.`item_id` = `a`.`id`
INNER JOIN `j3ux0_workflow_stages` AS `ws` ON `ws`.`id` = `wa`.`stage_id`
INNER JOIN `j3ux0_workflows` AS `w` ON `w`.`id` = `ws`.`workflow_id`
WHERE `wa`.`extension` = 'com_content.article' AND `a`.`state` IN (1,0)
ORDER BY a.id desc LIMIT 20;

query-1

As you can see, the primary key is used for the workflows table "w", in opposite to the screen shot for the actual result in the description of this PR.

With the FORCE INDEX 2 indexes were not used, and the execution time of the query did not differ much, it was in both cases something between 350 and 390 ms.

EXPLAIN SELECT `a`.`id`,`a`.`asset_id`,`a`.`title`,`a`.`alias`,`a`.`checked_out`,`a`.`checked_out_time`,`a`.`catid`,`a`.`state`,`a`.`access`,`a`.`created`,`a`.`created_by`,`a`.`created_by_alias`,`a`.`modified`,`a`.`ordering`,`a`.`featured`,`a`.`language`,`a`.`hits`,`a`.`publish_up`,`a`.`publish_down`,`a`.`introtext`,`a`.`fulltext`,`a`.`note`,`a`.`images`,`a`.`metakey`,`a`.`metadesc`,`a`.`metadata`,`a`.`version`,`fp`.`featured_up`,`fp`.`featured_down`,`l`.`title` AS `language_title`,`l`.`image` AS `language_image`,`uc`.`name` AS `editor`,`ag`.`title` AS `access_level`,`c`.`title` AS `category_title`,`c`.`created_user_id` AS `category_uid`,`c`.`level` AS `category_level`,`c`.`published` AS `category_published`,`parent`.`title` AS `parent_category_title`,`parent`.`id` AS `parent_category_id`,`parent`.`created_user_id` AS `parent_category_uid`,`parent`.`level` AS `parent_category_level`,`ua`.`name` AS `author_name`,`wa`.`stage_id` AS `stage_id`,`ws`.`title` AS `stage_title`,`ws`.`workflow_id` AS `workflow_id`,`w`.`title` AS `workflow_title`
FROM `j3ux0_content` AS `a`
LEFT JOIN `j3ux0_languages` AS `l` ON `l`.`lang_code` = `a`.`language`
LEFT JOIN `j3ux0_content_frontpage` AS `fp` ON `fp`.`content_id` = `a`.`id`
LEFT JOIN `j3ux0_users` AS `uc` FORCE INDEX(PRIMARY) ON `uc`.`id` = `a`.`checked_out`
LEFT JOIN `j3ux0_viewlevels` AS `ag` ON `ag`.`id` = `a`.`access`
LEFT JOIN `j3ux0_categories` AS `c` ON `c`.`id` = `a`.`catid`
LEFT JOIN `j3ux0_categories` AS `parent` ON `parent`.`id` = `c`.`parent_id`
LEFT JOIN `j3ux0_users` AS `ua` FORCE INDEX(PRIMARY) ON `ua`.`id` = `a`.`created_by`
INNER JOIN `j3ux0_workflow_associations` AS `wa` FORCE INDEX(PRIMARY) ON `wa`.`item_id` = `a`.`id`
INNER JOIN `j3ux0_workflow_stages` AS `ws` FORCE INDEX(PRIMARY) ON `ws`.`id` = `wa`.`stage_id`
INNER JOIN `j3ux0_workflows` AS `w` FORCE INDEX(PRIMARY) ON `w`.`id` = `ws`.`workflow_id`
WHERE `wa`.`extension` = 'com_content.article' AND `a`.`state` IN (1,0)
ORDER BY a.id desc LIMIT 20;

query-2

As you can see, forcing the PRIMARY key makes other keys disappear from the list of possible keys, and in case of the workflow associations "wa" it results in the "idx_extensions" not being used anymore.

The reason for that might be that in my scenario, all categories and all articles belong to the the same stage of the same workflow, and there are no other workflows and workflow stages.

This fits to the first answer to the stackoverflow question here https://stackoverflow.com/questions/945925/mysql-index-being-ignored :

You can also use FORCE INDEX, which acts like USE INDEX (index_list) but with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the given indexes to find rows in the table.

Since you aren't actually "finding" any rows (you are selecting them all), a table scan is always going to be fastest, and the optimizer is smart enough to know that in spite of what you are telling them.

The citation is taken from here: https://dev.mysql.com/doc/refman/8.4/en/index-hints.html

Now if you scroll a bit down on that MySQL documentation, you will find an important note:

Note

MySQL 8.4 supports the index-level optimizer hints JOIN_INDEX, GROUP_INDEX, ORDER_INDEX, and INDEX, which are equivalent to and intended to supersede FORCE INDEX index hints, as well as the NO_JOIN_INDEX, NO_GROUP_INDEX, NO_ORDER_INDEX, and NO_INDEX optimizer hints, which are equivalent to and intended to supersede IGNORE INDEX index hints. Thus, you should expect USE INDEX, FORCE INDEX, and IGNORE INDEX to be deprecated in a future release of MySQL, and at some time thereafter to be removed altogether.

These index-level optimizer hints are supported with both single-table and multi-table DELETE statements.

For more information, see Index-Level Optimizer Hints.

When the "Thus, you should expect USE INDEX, FORCE INDEX, and IGNORE INDEX to be deprecated in a future release of MySQL," will happen, we will have to change our code to be different for MySQL and MariaDB, as the latter seems not to make that change, see https://mariadb.com/kb/en/force-index/ , and optimizer hints seem to work very differently to MySQL, see https://mariadb.com/kb/en/index-hints-how-to-force-query-plans/ .

So I am not sure if this PR here is the way to go.

@SniperSister @Fedik Maybe you can provide more details about your scenario, e.g. how workflows are used? Or could you test my scenario if you get the same results as I did? Maybe something is special with my MySQL installation on Ubuntu (however, I am not aware of having changed anything on the default configuration regarding the opimizer such stuff).

@Fedik
Copy link
Member

Fedik commented Jun 16, 2025

I am testing on local Mysql 8.0.32 without any extra configurations.
Originally I made installation and all dummy content with joomla 5.2 and then updated to 5.4. With Workflow disabled.

My test result is:
~2.5s vs ~1.5s (for full page loading)
~1.6s vs ~1s (only this query)
with this PR and ~44k dummy articles.

My explain before PR:
Screenshot 2025-06-16_11-50-52

After PR:

Screenshot 2025-06-16_11-51-11

I just noticed that this PR is missing FORCE INDEX(idx_langcode) from my original PR. With this I get few ms faster (~980ms).
But I see in your "explain" this index is already used, that is strange, may it be some server configuration difference?
Or maybe newest mysql is smarter?

And I think you have a faster setup 😉

@richard67
Copy link
Member

@Fedik So it seems that for you the PRIMARY key of the "ws" table is also not used with the PR?

@richard67
Copy link
Member

@Fedik And could you post the complete queries without and with the PR? Or are they exactly the same as mine? Mine are with workflows enabled.

@Fedik
Copy link
Member

Fedik commented Jun 16, 2025

Sure,
Without:

SELECT `a`.`id`,`a`.`asset_id`,`a`.`title`,`a`.`alias`,`a`.`checked_out`,`a`.`checked_out_time`,`a`.`catid`,`a`.`state`,`a`.`access`,`a`.`created`,`a`.`created_by`,`a`.`created_by_alias`,`a`.`modified`,`a`.`ordering`,`a`.`featured`,`a`.`language`,`a`.`hits`,`a`.`publish_up`,`a`.`publish_down`,`a`.`introtext`,`a`.`fulltext`,`a`.`note`,`a`.`images`,`a`.`metakey`,`a`.`metadesc`,`a`.`metadata`,`a`.`version`,`fp`.`featured_up`,`fp`.`featured_down`,`l`.`title` AS `language_title`,`l`.`image` AS `language_image`,`uc`.`name` AS `editor`,`ag`.`title` AS `access_level`,`c`.`title` AS `category_title`,`c`.`created_user_id` AS `category_uid`,`c`.`level` AS `category_level`,`c`.`published` AS `category_published`,`parent`.`title` AS `parent_category_title`,`parent`.`id` AS `parent_category_id`,`parent`.`created_user_id` AS `parent_category_uid`,`parent`.`level` AS `parent_category_level`,`ua`.`name` AS `author_name`,`wa`.`stage_id` AS `stage_id`,`ws`.`title` AS `stage_title`,`ws`.`workflow_id` AS `workflow_id`,`w`.`title` AS `workflow_title`
FROM `bd5_content` AS `a`
LEFT JOIN `bd5_languages` AS `l` ON `l`.`lang_code` = `a`.`language`
LEFT JOIN `bd5_content_frontpage` AS `fp` ON `fp`.`content_id` = `a`.`id`
LEFT JOIN `bd5_users` AS `uc` ON `uc`.`id` = `a`.`checked_out`
LEFT JOIN `bd5_viewlevels` AS `ag` ON `ag`.`id` = `a`.`access`
LEFT JOIN `bd5_categories` AS `c` ON `c`.`id` = `a`.`catid`
LEFT JOIN `bd5_categories` AS `parent` ON `parent`.`id` = `c`.`parent_id`
LEFT JOIN `bd5_users` AS `ua` ON `ua`.`id` = `a`.`created_by`
INNER JOIN `bd5_workflow_associations` AS `wa` ON `wa`.`item_id` = `a`.`id`
INNER JOIN `bd5_workflow_stages` AS `ws` ON `ws`.`id` = `wa`.`stage_id`
INNER JOIN `bd5_workflows` AS `w` ON `w`.`id` = `ws`.`workflow_id`
WHERE `wa`.`extension` = 'com_content.article' AND `a`.`state` IN (1,0)
ORDER BY a.id desc LIMIT 20

With:

SELECT `a`.`id`,`a`.`asset_id`,`a`.`title`,`a`.`alias`,`a`.`checked_out`,`a`.`checked_out_time`,`a`.`catid`,`a`.`state`,`a`.`access`,`a`.`created`,`a`.`created_by`,`a`.`created_by_alias`,`a`.`modified`,`a`.`ordering`,`a`.`featured`,`a`.`language`,`a`.`hits`,`a`.`publish_up`,`a`.`publish_down`,`a`.`introtext`,`a`.`fulltext`,`a`.`note`,`a`.`images`,`a`.`metakey`,`a`.`metadesc`,`a`.`metadata`,`a`.`version`,`fp`.`featured_up`,`fp`.`featured_down`,`l`.`title` AS `language_title`,`l`.`image` AS `language_image`,`uc`.`name` AS `editor`,`ag`.`title` AS `access_level`,`c`.`title` AS `category_title`,`c`.`created_user_id` AS `category_uid`,`c`.`level` AS `category_level`,`c`.`published` AS `category_published`,`parent`.`title` AS `parent_category_title`,`parent`.`id` AS `parent_category_id`,`parent`.`created_user_id` AS `parent_category_uid`,`parent`.`level` AS `parent_category_level`,`ua`.`name` AS `author_name`,`wa`.`stage_id` AS `stage_id`,`ws`.`title` AS `stage_title`,`ws`.`workflow_id` AS `workflow_id`,`w`.`title` AS `workflow_title`
FROM `bd5_content` AS `a`
LEFT JOIN `bd5_languages` AS `l` ON `l`.`lang_code` = `a`.`language`
LEFT JOIN `bd5_content_frontpage` AS `fp` ON `fp`.`content_id` = `a`.`id`
LEFT JOIN `bd5_users` AS `uc` FORCE INDEX(PRIMARY) ON `uc`.`id` = `a`.`checked_out`
LEFT JOIN `bd5_viewlevels` AS `ag` ON `ag`.`id` = `a`.`access`
LEFT JOIN `bd5_categories` AS `c` ON `c`.`id` = `a`.`catid`
LEFT JOIN `bd5_categories` AS `parent` ON `parent`.`id` = `c`.`parent_id`
LEFT JOIN `bd5_users` AS `ua` FORCE INDEX(PRIMARY) ON `ua`.`id` = `a`.`created_by`
INNER JOIN `bd5_workflow_associations` AS `wa` FORCE INDEX(PRIMARY) ON `wa`.`item_id` = `a`.`id`
INNER JOIN `bd5_workflow_stages` AS `ws` FORCE INDEX(PRIMARY) ON `ws`.`id` = `wa`.`stage_id`
INNER JOIN `bd5_workflows` AS `w` FORCE INDEX(PRIMARY) ON `w`.`id` = `ws`.`workflow_id`
WHERE `wa`.`extension` = 'com_content.article' AND `a`.`state` IN (1,0)
ORDER BY a.id desc LIMIT 20

The query looks the same to what you have with enabled Workflow.
I just tried to enable workflow, but I did not get any time difference. It stays the same 1.6s vs 1s for the query on the same data. But maybe because I do not have any real workflow in use.

So it seems that for you the PRIMARY key of the "ws" table is also not used with the PR?

yeap

@Fedik
Copy link
Member

Fedik commented Jun 16, 2025

My guess, MySQL optimizer have some kind of internal performance rating for its server, and based on this doing its optimization. Or maybe some memory limit config.
That can explain difference in our results.

@SniperSister
Copy link
Contributor Author

My guess, MySQL optimizer have some kind of internal performance rating for its server, and based on this doing its optimization. Or maybe some memory limit config.

Yep, that's my assumption too. Getting the exact same EXPLAIN results is going to be hard if the query is performed on different machines / setups.

@Fedik
Copy link
Member

Fedik commented Jun 16, 2025

If we could find which configuration could improve performance with large amount of data, we then probably can skip this hacks, and add "mysql configuration hint" in to joomla manual, somewhere.

@Fedik
Copy link
Member

Fedik commented Jun 17, 2025

This is interesting https://stackoverflow.com/questions/50508632/mysql-not-picking-up-the-optimal-index
There also a link to how to trace the optimizer https://oysteing.blogspot.com/2016/01/how-to-get-optimizer-trace-for-query.html

Here is my trace the query optimiser-trace.txt

But hard to read, for example for index idx_langcode it say chosen: true but in the end it still not used, hm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants