classify_visits() categorizes visits in one of two ways: by matching the
domain or host of the visit URL exactly against a list of domains or hosts,
or by matching a list of regular expressions against the full visit URL.
Usage
classify_visits(
  wt,
  classes,
  match_by = "domain",
  regex_on = NULL,
  return_rows_by = NULL,
  return_rows_val = NULL
)
Arguments
- wt
webtrack data object.
- classes
a data frame containing classes that can be matched to visits.
- match_by
character. Whether to match list entries from classes to the domain of a visit ("domain") or the host ("host") with an exact match; or with a regular expression against the whole URL of a visit ("regex"). If set to "domain" or "host", both wt and classes need to have a column of that name. If set to "regex", the url column of wt will be used, and you need to set regex_on to the column in classes on which to do the pattern matching. Defaults to "domain".
- regex_on
character. Column in classes to use for pattern matching. Defaults to NULL.
- return_rows_by
character. A column in classes on which to subset the returned data. Defaults to NULL.
- return_rows_val
character. The value of the column specified in return_rows_by for which data should be returned. For example, if your classes data contains a column type, which has a value called "shopping", setting return_rows_by to "type" and return_rows_val to "shopping" will only return visits classified as "shopping".
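The difference between exact matching and pattern matching can be sketched in base R. This is only an illustration of the two modes described for match_by, not the package's implementation: the toy data, the names visits, class_list, by_domain and by_regex, and the choice to keep unmatched visits with NA in the exact-match case are all assumptions.

```r
# Toy stand-ins for `wt` and `classes` (values invented for illustration).
visits <- data.frame(
  url = c(
    "https://www.example.com/cart",
    "https://news.example.org/a",
    "https://other.net/"
  ),
  domain = c("example.com", "example.org", "other.net"),
  stringsAsFactors = FALSE
)
class_list <- data.frame(
  domain = c("example.com", "example.org"),
  type = c("shopping", "news"),
  stringsAsFactors = FALSE
)

# Exact matching (as with match_by = "domain" or "host"): join on the shared
# column. Keeping unmatched visits (all.x = TRUE) is an assumption here.
by_domain <- merge(visits, class_list, by = "domain", all.x = TRUE)

# Pattern matching (as with match_by = "regex"): every entry of the chosen
# column is treated as a regular expression and tested against the full URL.
hits <- lapply(seq_len(nrow(class_list)), function(i) {
  matched <- grepl(class_list$domain[i], visits$url)
  cbind(
    visits[matched, , drop = FALSE],
    type = rep(class_list$type[i], sum(matched))
  )
})
by_regex <- do.call(rbind, hits)
```

Note that a pattern such as "example.com" is a regular expression, so the dot matches any character; escape it ("example\\.com") when a literal dot is intended.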
Value
webtrack data.frame with the same columns as wt and any column
in classes except the column specified by match_by.
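As a rough illustration of how return_rows_by and return_rows_val narrow the returned rows, the following base R sketch subsets an already classified table. The data and the names classified and only_search are hypothetical, and the subsetting shown is an assumption about the behaviour, not the package's code.

```r
# Hypothetical classified visits: the columns of `wt` plus a `type` column
# that came from `classes`.
classified <- data.frame(
  url = c(
    "https://www.example.com/cart",
    "https://search.example.org/?q=r",
    "https://search.example.org/?q=data"
  ),
  domain = c("example.com", "example.org", "example.org"),
  type = c("shopping", "search", "search"),
  stringsAsFactors = FALSE
)

# Stand-ins for the two arguments.
return_rows_by <- "type"
return_rows_val <- "search"

# Keep only the rows whose value in the chosen column equals the requested value.
keep <- which(classified[[return_rows_by]] == return_rows_val)
only_search <- classified[keep, , drop = FALSE]
```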
Examples
if (FALSE) { # \dontrun{
data("testdt_tracking")
data("domain_list")
wt <- as.wt_dt(testdt_tracking)
# classify visits via domain
wt_domains <- extract_domain(wt)
wt_classes <- classify_visits(wt_domains, classes = domain_list, match_by = "domain")
# classify visits via host
# for the example, just copying the "domain" column to a "host" column
domain_list$host <- domain_list$domain
wt_hosts <- extract_host(wt)
wt_classes <- classify_visits(wt_hosts, classes = domain_list, match_by = "host")
# classify visits with pattern matching
# for the example, any value in "domain" treated as pattern
data("domain_list")
regex_list <- domain_list[type == "facebook"]
wt_classes <- classify_visits(wt[1:5000],
  classes = regex_list,
  match_by = "regex", regex_on = "domain"
)
# classify visits via domain and only return class "search"
data("domain_list")
wt_classes <- classify_visits(wt_domains,
  classes = domain_list,
  match_by = "domain", return_rows_by = "type",
  return_rows_val = "search"
)
} # }
