parse_path() parses parts of a path, i.e., anything separated by "/", "-", "_" or ".", and adds them as a new variable. Parts that do not consist of letters only, or that are not real words, can be filtered out via the argument keep.
Arguments
- wt
webtrack data object
- varname
character. Name of the column containing the URLs whose paths are parsed. Defaults to "url".
- keep
character. Defines which types of path components to keep. If set to "all", everything is kept. If "letters_only", only parts consisting of letters only are kept. If "words_only", only parts constituting English words (as defined by the Word Game Dictionary, cf. https://cran.r-project.org/web/packages/words/index.html) are kept. Support for more languages will be added in the future.
- decode
logical. Whether to decode the path (see utils::URLdecode()). Defaults to TRUE.
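
To make the splitting and filtering described above concrete, here is a minimal base R sketch. It is an illustration only and not the implementation used by parse_path(): the helper name split_path_parts() and the example path are hypothetical, it works on a single path string rather than a webtrack data object, and it assumes that "letters_only" means a part must consist solely of letters. The "words_only" option is left out because it depends on the external word list.

```r
# Hypothetical helper (not part of the package): split one URL path into
# its parts and optionally keep only the parts made up of letters.
split_path_parts <- function(path, keep = c("all", "letters_only"), decode = TRUE) {
  keep <- match.arg(keep)
  if (decode) {
    # Turn percent-encoded sequences such as "%20" back into characters
    path <- utils::URLdecode(path)
  }
  # Split on any of the four separators: "/", "-", "_" or "."
  parts <- strsplit(path, "[/_.-]")[[1]]
  # Drop empty strings produced by leading or repeated separators
  parts <- parts[nzchar(parts)]
  if (keep == "letters_only") {
    # Assumption: a part qualifies only if every character is a letter
    parts <- parts[grepl("^[[:alpha:]]+$", parts)]
  }
  parts
}

split_path_parts("/news/2021-05-03/climate_report.html", keep = "all")
# "news" "2021" "05" "03" "climate" "report" "html"

split_path_parts("/news/2021-05-03/climate_report.html", keep = "letters_only")
# "news" "climate" "report" "html"
```

In this sketch the date components are dropped under "letters_only" because they contain digits, which mirrors the kind of filtering the keep argument is described as providing.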