Parse search engine results

Usage

parse_search_results(path, engine, selectors = "latest")

Arguments

path

character. Either a path to a single file containing search results or a path to a directory of search engine result files.

engine

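character. The search engine and result type to parse, for example "google text" (the value used in Examples).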

selectors

Either a character string or a webbot_selectors S3 object. A character string names the selectors version: valid choices are those listed in selectors_versions, plus "latest" (the default), which selects the most recent version. Alternatively, supply your own webbot_selectors object.

Value

A tibble of parsed search engine results.

Examples

search_html <- system.file(
    "www.google.com_climatechange_text_2023-03-16_08_16_11.html",
    package = "webbotparseR"
)

parse_search_results(search_html, engine = "google text", selectors = "ver1")
#> # A tibble: 10 × 10
#>    title              link  text  image page  position search_engine type  query
#>    <chr>              <chr> <chr> <chr> <chr>    <int> <chr>         <chr> <chr>
#>  1 What Is Climate C… http… Clim… data… 1            1 www.google.c… text  clim…
#>  2 Home – Climate Ch… http… Vita… data… 1            2 www.google.c… text  clim…
#>  3 Vital Signs of th… http… “Cli… data… 1            3 www.google.c… text  clim…
#>  4 Climate change - … http… In c… data… 1            4 www.google.c… text  clim…
#>  5 IPCC — Intergover… http… The … data… 1            5 www.google.c… text  clim…
#>  6 Climate Change | … http… Comp… data… 1            6 www.google.c… text  clim…
#>  7 Climate change: e… http… Clim… NA    1            7 www.google.c… text  clim…
#>  8 UNFCCC             http… What… data… 1            8 www.google.c… text  clim…
#>  9 Climate Change - … http… Clim… data… 1            9 www.google.c… text  clim…
#> 10 Causes of climate… http… This… data… 1           10 www.google.c… text  clim…
#> # ℹ 1 more variable: date <dttm>
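
The path argument is also documented as accepting a directory. The following is a minimal sketch of that usage, not an official example: it copies the bundled file into a temporary directory (the directory name search_results is arbitrary) and parses the directory with the default selectors. The helper top_results() is hypothetical and not part of webbotparseR; it only relies on the position, title and link columns shown in the output above.

```r
# Hypothetical helper (not part of webbotparseR): keep the first `n` results
# by position, with only the columns needed for a quick look.
top_results <- function(results, n = 3) {
  keep <- which(results$position <= n)  # which() drops rows with NA position
  results[keep, c("position", "title", "link")]
}

# Only run the package-dependent part when webbotparseR is installed.
if (requireNamespace("webbotparseR", quietly = TRUE)) {
  search_html <- system.file(
    "www.google.com_climatechange_text_2023-03-16_08_16_11.html",
    package = "webbotparseR"
  )

  # Copy the bundled example into its own directory so that `path`
  # points at a directory rather than a single file.
  results_dir <- file.path(tempdir(), "search_results")
  dir.create(results_dir, showWarnings = FALSE)
  file.copy(search_html, results_dir)

  # Assumption: a directory of result files is parsed the same way as a
  # single file, as the description of `path` suggests.
  parsed <- webbotparseR::parse_search_results(results_dir, engine = "google text")
  print(top_results(parsed, n = 3))
}
```

Because the return value is a tibble, ordinary data-frame subsetting such as the helper above should work without extra dependencies.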