Duplicated titles and descriptions in search index for chunks without parent book #159

Closed
@lhsazevedo

Background

In #154, we addressed the issue of missing pages in the search index. The root cause was that some index entries lacked a title property (sdesc). We resolved this by applying the same solution used for the manual content: using the description (ldesc) as the title.

To avoid duplicating text in both the title and description fields, we now pull the description from the parent <book>. For example:

Types / Language Reference

In this example, the title ("Types") was taken from the page description, while the new description ("Language Reference") comes from the parent <book>. You can see the implementation here:

if ($index["sdesc"] === "" && $index["ldesc"] !== "") {
    $index["sdesc"] = $index["ldesc"];
    $parentId = $index['parent_id'];
    // isset() to guard against undefined array keys, either for root
    // elements (no parent) or in case the index structure is broken.
    while (isset($this->indexes[$parentId])) {
        $parent = $this->indexes[$parentId];
        if ($parent['element'] === 'book') {
            $index["ldesc"] = Format::getLongDescription($parent['docbook_id']);
            break;
        }
        $parentId = $parent['parent_id'];
    }
}

Issue

Some entries, like extension main pages (e.g. book.strings, book.zip) and top-level pages (e.g. copyright, getting-started, security), don’t have a parent <book>. In these cases the loop never finds a <book>, so the description is left unchanged after being copied into the title, and the same text appears in both fields:

book.strings
book.zip
copyright

Proposed fix

While some entries lack a parent <book>, every entry has at least one parent <set>. The root entry itself is a set called "PHP Manual".

The proposed solution is to fall back to the first <set> in the hierarchy when no <book> is found:

book.strings
book.zip
copyright
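
A minimal sketch of how that fallback could look, not the actual implementation. It assumes the same entry shape as the snippet above ('element' and 'parent_id' keys) but, to stay self-contained, reads a hypothetical 'ldesc' key from each entry instead of calling Format::getLongDescription(). The function name findFallbackDescription and the sample data are illustrative only, and "first <set>" is read here as the nearest <set> ancestor.

```php
<?php
// Hypothetical helper: walk up the parent chain and return the description
// of the nearest <book>, or of the nearest <set> if no <book> is found.
function findFallbackDescription(array $indexes, string $parentId): ?string
{
    $firstSetDesc = null;
    $seen = [];
    // isset() guards against root entries and broken structures;
    // $seen stops the walk if the parent chain ever loops.
    while (isset($indexes[$parentId]) && !isset($seen[$parentId])) {
        $seen[$parentId] = true;
        $parent = $indexes[$parentId];
        if ($parent['element'] === 'book') {
            return $parent['ldesc'];
        }
        if ($parent['element'] === 'set' && $firstSetDesc === null) {
            // Remember the nearest set, but keep looking for a book.
            $firstSetDesc = $parent['ldesc'];
        }
        $parentId = $parent['parent_id'];
    }
    return $firstSetDesc;
}

// Illustrative data only; IDs and structure are assumptions.
$indexes = [
    'index'   => ['element' => 'set',  'parent_id' => '',      'ldesc' => 'PHP Manual'],
    'langref' => ['element' => 'book', 'parent_id' => 'index', 'ldesc' => 'Language Reference'],
];

// An entry whose parent is the "langref" book gets the book description.
$fromBook = findFallbackDescription($indexes, 'langref'); // "Language Reference"
// A top-level entry whose parent is the root set falls back to the set.
$fromSet = findFallbackDescription($indexes, 'index');    // "PHP Manual"
// An unknown parent yields null, so the caller can leave ldesc untouched.
$none = findFallbackDescription($indexes, 'missing');     // null
```

Returning null when neither a <book> nor a <set> is found lets the caller keep the current behavior for broken index structures.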

I have a working implementation and will submit a PR soon.
