Description
The behavior of xml_url appears to have changed drastically in v1.3.0. Previously, passing base_url via read_html set the URL returned by xml_url.
These are heavily simplified examples, but they highlight the change in behavior. I can work around the change from "NA" to "<CHARSXP: NA>", but losing the URL from xml_url is a bigger problem that will require some re-architecting.
This breaks edgarWebR (currently off CRAN because its vignettes make remote API calls).
Using string input
Example 1
require(xml2)
doc <- read_html("<html/>", base_url = "http://test.com")
xml_url(doc)
On v1.2.5 the output was "http://test.com"
On v1.3.1 the output is "UTF-8"
Example 2
require(xml2)
doc <- read_html("<html/>")
xml_url(doc)
On v1.2.5 the output was "NA"
On v1.3.0 the output is "UTF-8"
Using httr response
Example 3
require(xml2)
require(httr)
href <- "https://www.sec.gov/cgi-bin/cik_lookup?company=cloudera"
res <- GET(href)
doc <- read_html(res, base_url = href)
xml_url(doc)
On v1.2.5 the output was "https://www.sec.gov/cgi-bin/cik_lookup?company=cloudera"
On v1.3.0 the output is "<CHARSXP: NA>"
Example 4
require(xml2)
require(httr)
href <- "https://www.sec.gov/cgi-bin/cik_lookup?company=cloudera"
res <- GET(href)
doc <- read_html(res)
xml_url(doc)
On v1.2.5 the output was "NA"
On v1.3.0 the output is "<CHARSXP: NA>"
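For anyone hitting the same issue, here is a rough sketch of the kind of workaround I mean by re-architecting: keep the base URL next to the parsed document instead of reading it back through xml_url. The helper name read_html_with_url and the returned list layout are hypothetical, not part of xml2, and I have only reasoned through this against string input.

```r
library(xml2)

# Hypothetical helper (not part of xml2): parse the input and carry the
# base URL alongside the document, so callers do not depend on xml_url().
read_html_with_url <- function(x, base_url, ...) {
  doc <- read_html(x, base_url = base_url, ...)
  list(doc = doc, url = base_url)
}

page <- read_html_with_url(
  "<html><body><a href='/about'>About</a></body></html>",
  base_url = "http://test.com"
)

# Resolve relative links against the stored URL rather than xml_url(page$doc)
links <- xml_attr(xml_find_all(page$doc, "//a"), "href")
url_absolute(links, page$url)
```

This avoids the changed return value entirely, at the cost of passing a list around wherever the document alone used to be enough.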