deduplicate() flags, drops or aggregates duplicates, which are defined as
consecutive visits to the same URL within a certain time frame.
Usage
deduplicate(
  wt,
  method = "aggregate",
  within = 1,
  duration_var = "duration",
  keep_nvisits = FALSE,
  same_day = TRUE,
  add_grpvars = NULL
)
Arguments
- wt
webtrack data object.
- method
character. One of "aggregate", "flag" or "drop". If set to "aggregate", consecutive visits (no matter the time difference) to the same URL are combined and their duration aggregated. In this case, a duration column must be specified via duration_var. If set to "flag", duplicates within a certain time frame are flagged in a new column called duplicate. In this case, the within argument must be specified. If set to "drop", duplicates are dropped. Again, the within argument must be specified. Defaults to "aggregate".
- within
numeric (seconds). If method is set to "flag" or "drop", a subsequent visit is only defined as a duplicate when it happens within this time difference. Defaults to 1 second.
- duration_var
character. Name of the duration variable. Defaults to "duration".
- keep_nvisits
boolean. If method is set to "aggregate", this determines whether the number of aggregated visits should be kept as a variable. Defaults to FALSE.
- same_day
boolean. If method is set to "aggregate", determines whether to count visits as consecutive only when they are on the same day. Defaults to TRUE.
- add_grpvars
vector. If method is set to "aggregate", determines whether any additional variables are included in the grouping of visits and therefore kept. Defaults to NULL.
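The difference between the three methods can be illustrated on a toy table in base R. This is only a sketch of the logic described above, not the package's implementation. The column names (panelist_id, url, timestamp), the output column n_visits, the toy value of within and the choice of <= at the boundary are all assumptions made for illustration, and the same_day and add_grpvars options are not modelled.

```r
# Toy visits. Column names are assumptions for this sketch.
visits <- data.frame(
  panelist_id = c("a", "a", "a", "a", "b"),
  url = c("https://x.org", "https://x.org", "https://x.org",
          "https://y.org", "https://x.org"),
  timestamp = as.POSIXct("2024-01-01 10:00:00", tz = "UTC") + c(0, 1, 30, 31, 32),
  duration = c(1, 29, 1, 1, 10),
  stringsAsFactors = FALSE
)
visits <- visits[order(visits$panelist_id, visits$timestamp), ]
n <- nrow(visits)

# Compare each row with the previous one (the first row has no predecessor)
same_person <- c(FALSE, visits$panelist_id[-1] == visits$panelist_id[-n])
same_url <- c(FALSE, visits$url[-1] == visits$url[-n])
gap <- c(Inf, as.numeric(diff(visits$timestamp), units = "secs"))

# "flag": same URL as the previous visit AND within the time window
within <- 2  # toy value; whether the boundary is inclusive is an assumption
visits$duplicate <- same_person & same_url & gap <= within

# "drop": remove the flagged rows
dropped <- visits[!visits$duplicate, ]

# "aggregate": merge runs of consecutive same-URL visits, ignoring the gap
run_id <- cumsum(!(same_person & same_url))
aggregated <- visits[!duplicated(run_id), c("panelist_id", "url", "timestamp")]
aggregated$duration <- as.numeric(tapply(visits$duration, run_id, sum))
aggregated$n_visits <- tabulate(run_id)  # hypothetical name for the visit count
```

In this toy data the third visit to the same URL comes 29 seconds after the second, so it is not flagged or dropped, but it is still merged into the same run when aggregating.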
Examples
if (FALSE) { # \dontrun{
data("testdt_tracking")
wt <- as.wt_dt(testdt_tracking)
wt <- add_duration(wt, cutoff = 300, replace_by = 300)
# Dropping duplicates with one-second default
wt_dedup <- deduplicate(wt, method = "drop")
# Flagging duplicates with one-second default
wt_dedup <- deduplicate(wt, method = "flag")
# Aggregating duplicates
wt_dedup <- deduplicate(wt[1:1000], method = "aggregate")
# Aggregating duplicates and keeping number of visits for aggregated visits
wt_dedup <- deduplicate(wt[1:1000], method = "aggregate", keep_nvisits = TRUE)
# Aggregating duplicates and keeping "domain" variable despite grouping
wt <- extract_domain(wt)
wt_dedup <- deduplicate(wt, method = "aggregate", add_grpvars = "domain")
} # }
