Post

DP02 Stacking Excel with Mutate Cross

Description

This post depicts situations when data is collected in separate fragments, such as different time periods, places, or other subsets, and needs to be consolidated into a unified table for comprehensive analysis. This process is also known as “stacking” tables. Combining these individual data parts is essential to perform in-depth data processing and obtain insights that encompass the entirety of the original files.

The script automates the stacking process, resulting in significant time savings and error prevention, particularly when managing a substantial number of files.

The provided R script begins by collecting Excel files with the same structure (fields) from a specific folder. The script also enhances data integrity by modifying column names and introducing a “PPL” column that indicates the period for each record. The process concludes by exporting the consolidated dataset for further utilization in a CSV format.

R Script - Merge Excel Files with Mutate Across

Graphical Description of the Process


flowchart LR
Excel_File_01-- Stacking -->id1((R_Script))
Excel_File_02-- Stacking -->id1((R_Script))
Excel_File_03-- Stacking -->id1((R_Script))
Excel_File_N-- Stacking  -->id1((R_Script))
id1((R_Script))-->Resulting_DataFrame_CSV_file

Merging Excel files with equal structure with mutate across

In some situations when the data is imported, and the fields contain no values, R mistakenly assumes that they are of ‘logical’ type instead of ‘character’, causing a conflict when merging the files. To avoid this, the ‘where - across’ function is used to check all the fields and convert any logical types into character types. The result is a vector containing all the types.

Create a vector with the field types.

1
2
3
4
5
6
7
8
vectortypes <- read_excel(filenames_list[1]) %>% 
  mutate(across(where(is.logical), as.character)) %>%   # as.character
  summarise_all(class) %>% slice(1) %>% unlist(., use.names=FALSE)
vectortypes
# Convert character and logical to text
vectortypes[vectortypes %in% c("character", "logical")] <- "text"
#vectortypes[vectortypes == "character" | vectortypes == "logical"] <- "text"  # Option
vectortypes

Function Definition

The vector of types is applied in the function.

1
2
3
4
5
6
7
8
9
10
11

fx_readfiles <- function(filename){
  print(paste("Merging",filename,sep = " "))
  xlfile <- readxl::read_excel(filename, col_types = vectortypes)
  print(paste("Number of records in ",
              filename," = ",nrow(xlfile),
              " ; columns = ",length(xlfile),sep=""))
  dfXLfile <- data.frame(xlfile)
  dfXLfile 
  }

__

End of Post

This post is licensed under CC BY 4.0 by the author.