Use import_list() to import a list of data frames from a vector of file names or from a multi-object file (Excel workbook, .Rdata file, compressed directory in a zip file or tar archive, or HTML file).
Usage
import_list(
  file,
  setclass = getOption("rio.import.class", "data.frame"),
  which,
  rbind = FALSE,
  rbind_label = "_file",
  rbind_fill = TRUE,
  ...
)
Arguments
- file
A character string containing a single file name for a multi-object file (e.g., Excel workbook, zip file, tar archive, or HTML file), or a vector of file paths for multiple files to be imported.
- setclass
An optional character vector specifying one or more classes to set on the import. By default, the return object is always a “data.frame”. Allowed values include “tbl_df”, “tbl”, or “tibble” (if using tibble), “arrow” or “arrow_table” (if using an arrow table; the suggested package arrow must be installed), or “data.table” (if using data.table). Other values are ignored, such that a data.frame is returned. This parameter takes precedence over parameters in ... which set a different class.
- which
If file is a single file path, this specifies which objects should be extracted (passed to import()'s which argument). Ignored otherwise.
- rbind
A logical indicating whether to pass the import list of data frames through data.table::rbindlist().
- rbind_label
If rbind = TRUE, a character string specifying the name of a column to add to the data frame indicating its source file.
- rbind_fill
If rbind = TRUE, a logical indicating whether to set fill = TRUE (and fill missing columns with NA).
- ...
Additional arguments passed to import(). Behavior may be unexpected if files are of different formats.
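The which and setclass arguments can be combined on a multi-sheet workbook. The following is a minimal sketch, not taken from the package's own examples; the sheet names and the choice of "data.table" are illustrative assumptions, and it presumes the packages rio uses for Excel files are installed.

```r
library(rio)

# Build a small three-sheet workbook in a temporary file
# (the sheet names "first", "second", "third" are made up for this sketch)
xlsx_file <- tempfile(fileext = ".xlsx")
export(
  list(
    first = mtcars[1:3, ],
    second = mtcars[4:6, ],
    third = mtcars[7:9, ]
  ),
  xlsx_file
)

# Extract only the first and third sheets and ask for data.table objects
subset_list <- import_list(xlsx_file, which = c(1, 3), setclass = "data.table")

length(subset_list)
sapply(subset_list, nrow)
```

Because file is a single path here, which selects sheets; with a vector of file paths it would be ignored, as described above.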
Value
If rbind = FALSE (the default), a list of data frames. Otherwise, that list is passed to data.table::rbindlist() with fill = TRUE and the result is returned as a data frame object of the class set by the setclass argument; if this operation fails, the list is returned.
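The two return shapes can be compared side by side. This is a hedged sketch assuming two temporary CSV files with partly different columns; the column subsets and the label name "source" are chosen only for illustration.

```r
library(rio)

# Two temporary CSV files that share "mpg" but differ in their second column
csv_files <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
export(mtcars[1:5, c("mpg", "cyl")], csv_files[1])
export(mtcars[6:10, c("mpg", "hp")], csv_files[2])

# Default: a list with one data frame per file
as_list <- import_list(csv_files)
class(as_list)

# rbind = TRUE: a single data frame; columns missing from a file are filled with NA
stacked <- import_list(csv_files, rbind = TRUE, rbind_label = "source")
nrow(stacked)
names(stacked)
```

The extra column named by rbind_label records where each row came from.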
Details
When file is a vector of file paths and any files are missing, those files are ignored (with warnings) and this function will not raise an error. For compressed files, the file name must also contain information about the file format of all compressed files (e.g. files.csv.zip) for this function to work.
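The handling of missing files can be sketched as follows. The file names are hypothetical, and how the skipped entry appears in the returned list may depend on the rio version, so the sketch only inspects the files that do exist.

```r
library(rio)

# Two real CSV files in a temporary directory (names are illustrative)
tmp_dir <- tempfile("import_list_demo")
dir.create(tmp_dir)
csv_a <- file.path(tmp_dir, "cars_a.csv")
csv_b <- file.path(tmp_dir, "cars_b.csv")
export(mtcars[1:5, ], csv_a)
export(mtcars[6:10, ], csv_b)

# A path that was never written; it should be skipped with a warning, not an error
missing_csv <- file.path(tmp_dir, "does_not_exist.csv")

res <- import_list(c(csv_a, csv_b, missing_csv))

# The existing files are still imported
nrow(res[[1]])
nrow(res[[2]])
```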
Trust
For serialization formats (.R, .RDS, and .RData), please note that you should only load these files from trusted sources. This is because these formats are not necessarily used for storing rectangular data and can also store many other things, e.g. code. Importing these files could lead to arbitrary code execution. Please read the security principles by the R Project (Plummer, 2024). When importing these files via rio, you should affirm that you trust these files, i.e. trust = TRUE. See example below. If this affirmation is missing, the current version assumes trust to be true for backward compatibility and a deprecation notice will be printed. In the next major release (2.0.0), you must explicitly affirm your trust when importing these files.
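A minimal sketch of affirming trust, assuming a version of rio that accepts trust and forwards it through ... to the underlying reader. The object names are made up, and the file is created locally with base R's save() so that it is known to be safe.

```r
library(rio)

# Save two objects into a temporary .RData file using base R
rdata_file <- tempfile(fileext = ".RData")
cars_head <- head(mtcars)
iris_head <- head(iris)
save(cars_head, iris_head, file = rdata_file)

# Explicitly affirm that this file comes from a trusted source
objs <- import_list(rdata_file, trust = TRUE)

# One list element per object stored in the file
sort(names(objs))
```

Only pass trust = TRUE for files whose origin you control or have verified.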
Which
For compressed archives (zip and tar, where a compressed file can contain multiple files), the parameter which can end up being used to indicate two different concepts. For example, it is unclear for .xlsx.zip whether which refers to the selection of an exact file in the archive or the selection of an exact sheet in the decompressed Excel file. In these cases, rio assumes that which is only used for the selection of the file. After the file is selected with which, rio will return the first item, e.g. the first sheet.
Please note, however, that .gz and .bz2 (e.g. .xlsx.gz) are compressed formats, but not archive formats. In those cases, which is used the same way as for the non-compressed format, e.g. selection of a sheet for Excel.
References
Plummer, M. (2024). Statement on CVE-2024-27322. https://blog.r-project.org/2024/05/10/statement-on-cve-2024-27322/
Examples
## For demo, a temp. file path is created with the file extension .xlsx
xlsx_file <- tempfile(fileext = ".xlsx")
export(
  list(
    mtcars1 = mtcars[1:10, ],
    mtcars2 = mtcars[11:20, ],
    mtcars3 = mtcars[21:32, ]
  ),
  xlsx_file
)
# import a single sheet from a multi-object workbook
import(xlsx_file, sheet = "mtcars1")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
# import all worksheets, the return value is a list
import_list(xlsx_file)
#> $mtcars1
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#>
#> $mtcars2
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> 2 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> 3 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> 4 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> 5 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> 6 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> 7 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> 8 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 9 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> 10 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#>
#> $mtcars3
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#> 2 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#> 3 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#> 4 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> 5 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#> 6 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#> 7 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> 8 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#> 9 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> 10 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> 11 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#> 12 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#>
# import and rbind all worksheets, the return value is a data frame
import_list(xlsx_file, rbind = TRUE)
#> mpg cyl disp hp drat wt qsec vs am gear carb _file
#> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 1
#> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 1
#> 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 1
#> 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 1
#> 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 1
#> 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 1
#> 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 1
#> 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 1
#> 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 1
#> 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 1
#> 11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 2
#> 12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 2
#> 13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 2
#> 14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 2
#> 15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 2
#> 16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 2
#> 17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 2
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 2
#> 19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 2
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 2
#> 21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 3
#> 22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 3
#> 23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 3
#> 24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 3
#> 25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 3
#> 26 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 3
#> 27 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 3
#> 28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 3
#> 29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 3
#> 30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 3
#> 31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 3
#> 32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 3