Description
Background
In #154, we addressed the issue of missing pages in the search index. The root cause was that some index entries lacked a title property (sdesc). We resolved this by applying the same solution used for the manual content: using the description (ldesc) as the title.
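As a rough illustration of that title fallback (not the code from #154), here is a minimal sketch. The function name resolveTitle and the array shape are hypothetical; only the sdesc and ldesc keys come from the text above.

```php
<?php
// Hypothetical helper: pick the title for a search index entry.
// Only the "sdesc" and "ldesc" keys are taken from the issue text;
// the function name and array shape are assumptions for illustration.
function resolveTitle(array $entry): string
{
    $sdesc = trim($entry['sdesc'] ?? '');

    // Fall back to the long description when the short one is missing or empty.
    return $sdesc !== '' ? $sdesc : trim($entry['ldesc'] ?? '');
}

echo resolveTitle(['sdesc' => '', 'ldesc' => 'Type']), PHP_EOL;        // Type
echo resolveTitle(['sdesc' => 'strlen', 'ldesc' => 'Get string length']), PHP_EOL; // strlen
```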
To avoid duplicating text in both the title and description fields, we now pull the description from the parent <book>. For example:
In this example, the title "Type" was taken from the page description, while the new description ("Language Reference") comes from the parent <book>. You can see the implementation here:
phd/phpdotnet/phd/Package/PHP/Web.php
Lines 244 to 258 in 673b2da
Issue
Some entries, like extension main pages (e.g. book.strings, book.zip) and top-level pages (e.g. copyright, getting-started, security), don’t have a parent <book>. In these cases, the description is being reused as the title, resulting in duplicate content:
Proposed fix
While some entries lack a parent <book>, every entry has at least one parent <set>. The root entry itself is a set called "PHP Manual".
The proposed solution is to fall back to the first <set> in the hierarchy when no <book> is found:
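To make the idea concrete, here is a minimal sketch of the fallback. It is not the pending implementation and does not use PhD's real data structures: the in-memory $index array, its element, parent and title keys, the sample hierarchy, and both function names are hypothetical. It also assumes that "first <set>" means the nearest <set> ancestor.

```php
<?php
// Hypothetical index keyed by page ID. The keys and the hierarchy below
// are illustrative only and do not mirror PhD's actual index.
$index = [
    'index'               => ['element' => 'set',         'parent' => null,      'title' => 'PHP Manual'],
    'langref'             => ['element' => 'book',        'parent' => 'index',   'title' => 'Language Reference'],
    'language.types.type' => ['element' => 'sect1',       'parent' => 'langref', 'title' => 'Type'],
    'funcref'             => ['element' => 'set',         'parent' => 'index',   'title' => 'Function Reference'],
    'book.strings'        => ['element' => 'book',        'parent' => 'funcref', 'title' => 'Strings'],
    'copyright'           => ['element' => 'legalnotice', 'parent' => 'index',   'title' => 'Copyright'],
];

// Walk up the parent chain and return the title of the nearest ancestor
// with the given element name, or null if there is none.
function findAncestorTitle(array $index, string $id, string $element): ?string
{
    $parentId = $index[$id]['parent'] ?? null;
    while ($parentId !== null && isset($index[$parentId])) {
        if ($index[$parentId]['element'] === $element) {
            return $index[$parentId]['title'];
        }
        $parentId = $index[$parentId]['parent'] ?? null;
    }
    return null;
}

// Prefer the parent <book>; fall back to the nearest <set> when no book exists.
function resolveDescription(array $index, string $id): ?string
{
    return findAncestorTitle($index, $id, 'book')
        ?? findAncestorTitle($index, $id, 'set');
}

echo resolveDescription($index, 'language.types.type'), PHP_EOL; // Language Reference
echo resolveDescription($index, 'book.strings'), PHP_EOL;        // Function Reference
echo resolveDescription($index, 'copyright'), PHP_EOL;           // PHP Manual
```

In this sketch the root set has no ancestors, so resolveDescription returns null for it and the caller would have to decide what to show in that case.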
I have a working implementation and will submit a PR soon.