Parse search engine results
Arguments
- path
character. Either a path to a file that contains search results or a path to a directory containing search engine result files.
- engine
character. The search engine and result type to parse, e.g. "google text" as in the example below.
- selectors
Either a character string or a webbot_selectors S3 object. A character value gives the selectors version; valid choices are those listed in selectors_versions and "latest" (which selects the latest version). You can also supply your own webbot_selectors object.
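Both path and selectors accept more than one kind of value. The base-R sketch below shows how arguments like these could be resolved before parsing. It is an illustration, not webbotparseR's internal code. The helper names resolve_result_files() and resolve_selectors(), the .html file filter, and the assumption that available versions are ordered oldest to newest are all hypothetical.

```r
# Hypothetical helper (not part of webbotparseR): turn a file-or-directory
# `path` into a character vector of result files.
resolve_result_files <- function(path) {
  if (dir.exists(path)) {
    # Directory: collect the HTML files inside it (the .html filter is an assumption)
    list.files(path, pattern = "\\.html$", full.names = TRUE)
  } else if (file.exists(path)) {
    # Single file: use it as given
    path
  } else {
    stop("`path` does not exist: ", path)
  }
}

# Hypothetical helper: accept either a version string or a
# "webbot_selectors" object. `available` stands in for the valid
# versions and is assumed to be ordered oldest to newest.
resolve_selectors <- function(selectors, available) {
  if (inherits(selectors, "webbot_selectors")) {
    return(selectors)  # user-supplied selectors object, used as-is
  }
  if (!is.character(selectors) || length(selectors) != 1) {
    stop("`selectors` must be a single string or a webbot_selectors object")
  }
  if (selectors == "latest") {
    return(available[length(available)])  # newest version
  }
  if (!selectors %in% available) {
    stop("unknown selectors version: ", selectors)
  }
  selectors
}

# Example: a temporary directory with two HTML files and one other file
tmp <- file.path(tempdir(), "search_results")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(tmp, c("a.html", "b.html", "notes.txt"))))
resolve_result_files(tmp)                                   # the two .html files
resolve_selectors("latest", available = c("ver1", "ver2"))  # "ver2"
```

Checking dir.exists() before file.exists() matters here, because file.exists() also returns TRUE for directories.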
Examples
search_html <- system.file(
"www.google.com_climatechange_text_2023-03-16_08_16_11.html",
package = "webbotparseR"
)
parse_search_results(search_html, engine = "google text", selectors = "ver1")
#> # A tibble: 10 × 10
#> title link text image page position search_engine type query
#> <chr> <chr> <chr> <chr> <chr> <int> <chr> <chr> <chr>
#> 1 What Is Climate C… http… Clim… data… 1 1 www.google.c… text clim…
#> 2 Home – Climate Ch… http… Vita… data… 1 2 www.google.c… text clim…
#> 3 Vital Signs of th… http… “Cli… data… 1 3 www.google.c… text clim…
#> 4 Climate change - … http… In c… data… 1 4 www.google.c… text clim…
#> 5 IPCC — Intergover… http… The … data… 1 5 www.google.c… text clim…
#> 6 Climate Change | … http… Comp… data… 1 6 www.google.c… text clim…
#> 7 Climate change: e… http… Clim… NA 1 7 www.google.c… text clim…
#> 8 UNFCCC http… What… data… 1 8 www.google.c… text clim…
#> 9 Climate Change - … http… Clim… data… 1 9 www.google.c… text clim…
#> 10 Causes of climate… http… This… data… 1 10 www.google.c… text clim…
#> # ℹ 1 more variable: date <dttm>
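The result is a tibble with one row per search result, so ordinary data-frame operations can post-process it. As a hypothetical illustration (top_results() is not part of webbotparseR), the helper below keeps the first n results by their position column, along with title and link. It assumes a single results page, as in the example output, because position may restart on later pages.

```r
# Hypothetical helper (not part of webbotparseR): return the first `n`
# results ordered by `position`, keeping only position, title and link.
top_results <- function(results, n = 3) {
  ordered <- results[order(results$position), c("position", "title", "link"), drop = FALSE]
  head(ordered, n)
}

# Usage with the example above (requires webbotparseR and its bundled file):
# res <- parse_search_results(search_html, engine = "google text", selectors = "ver1")
# top_results(res, n = 3)
```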