Hello, I have a use case where I want to run `xml_find_all` on a nodeset but get back a list of nodesets (one element per input node) instead of a single flattened nodeset. In this use case it's important to know which result node came from which original node. In my testing, it's much faster to have this functionality inside the xml2 package, where the `lapply` can call the C search routine directly, than to call `xml_find_all` in a `for` or `foreach` loop. I can imagine a couple of ways to do this.
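A minimal sketch of the problem, using only exported xml2 functions (the toy document is made up for illustration). The nodeset method flattens all matches into one nodeset, so the grouping by parent is lost. The per-node `lapply` keeps the grouping, but it makes one R-level `xml_find_all()` call per node, which is the slow path described above:

```r
library(xml2)

# Toy document: three <a> nodes with 2, 0 and 1 <b> children
doc <- read_xml("<root><a><b>1</b><b>2</b></a><a/><a><b>3</b></a></root>")
parents <- xml_find_all(doc, "//a")

# Current behaviour: one flattened nodeset, so the parent of each <b> is lost
flat <- xml_find_all(parents, ".//b")
length(flat)        # 3

# Workaround: one xml_find_all() call per node keeps the grouping
per_node <- lapply(parents, xml_find_all, xpath = ".//b")
lengths(per_node)   # 2 0 1
```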
- Extend the `xml_find_all.xml_nodeset` method with an `unlist` argument. This would break the convention that `xml_find_all` always returns a nodeset, but it seems cleaner to me. Something like:
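```r
xml_find_all.xml_nodeset <- function(x, xpath, ns = xml_ns(x), unlist = TRUE) {
  # Empty input: keep the return type consistent with the `unlist` setting
  if (length(x) == 0)
    return(if (isTRUE(unlist)) xml_nodeset() else list())

  # Run the XPath search once per node via xml2's internal C routine
  results <- lapply(x, function(node)
    .Call(xpath_search, node$node, node$doc, xpath, ns, Inf))

  if (isTRUE(unlist)) {
    # Current behaviour: merge all matches into a single nodeset
    xml_nodeset(unlist(results, recursive = FALSE))
  } else {
    # Proposed behaviour: one nodeset per input node, in input order
    lapply(results, xml_nodeset)
  }
}
```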
- Add a separate function called `xml_find_all_list`, something like:
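```r
xml_find_all_list <- function(x, xpath, ns = xml_ns(x)) {
  # Empty input gives an empty list, matching the non-empty return type
  if (length(x) == 0)
    return(list())
  # One nodeset per input node, in the same order as `x`
  lapply(x, function(node)
    xml_nodeset(.Call(xpath_search, node$node, node$doc, xpath, ns, Inf)))
}
```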
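Until something like this is available in xml2, the second proposal's interface can be approximated outside the package. The sketch below is a hypothetical stand-in that reuses the proposed name `xml_find_all_list` but calls the exported `xml_find_all()` once per node instead of the internal `.Call(xpath_search, ...)`, so it reproduces the return shape, not the speed-up. The toy document and its `id` attributes are made up for illustration. It also shows how the per-node list lets you record which original node each match came from:

```r
library(xml2)

# Hypothetical stand-in for the proposed xml_find_all_list(), using exported
# functions only: one R-level xml_find_all() call per input node
xml_find_all_list <- function(x, xpath, ns = xml_ns(x)) {
  # Empty input gives an empty list, so the result is always a plain list
  if (length(x) == 0) return(list())
  lapply(x, function(node) xml_find_all(node, xpath, ns = ns))
}

doc <- read_xml(
  "<root><a id='x'><b>1</b><b>2</b></a><a id='y'/><a id='z'><b>3</b></a></root>"
)
parents <- xml_find_all(doc, "//a")
res <- xml_find_all_list(parents, ".//b")

# Long table: each match paired with the id of the <a> node it came from
matches <- data.frame(
  parent = rep(xml_attr(parents, "id"), lengths(res)),
  value  = unlist(lapply(res, xml_text)),
  stringsAsFactors = FALSE
)
```

Returning `list()` for an empty input keeps the result type stable, so code that iterates over the result never has to special-case an empty nodeset.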
I'm happy to make a PR; just let me know how you'd like to proceed. Thanks.