Post

DP12 Imputation using LeftJoin and Coalesce

Description

Another example of imputation of missing values but in this case using the information located in the same data table, for the same MID but in other periods. This was used on demographic values as for example race, DOB, that we can assume don’t change for the same individual. A table without missing values is created using dplyr::group_by. Then the original data is merged with dplyr::left_join(). Finally the replacing value is taken with coalesce.

R Script - Inputation using LeftJoin and Coalesce

Initial data with missing values

12 Result Initial data with missing values

Code to create a new data frame by excluding the rows with missing (NA) values:

1
2
3
4
df %>% dplyr::arrange(Period) %>% 
  filter(!is.na(Type)) %>% select(MID, Type) %>% 
  group_by(MID, Type) %>% tally() %>% 
  arrange(MID, desc(n)) %>% rename(Type_wo_NA = Type)
MIDType_wo_NAn
2051RD3
7004RA3
7443NB3
9265TC3

Table from Data Excluding NAs

Final result

The previous data table is joined with the original data, adding the ‘Type_wo_NA’ field.

To populate the resulting ‘Type2’ field, the function takes the value from the ‘Type’ field unless it’s an NA error. This evaluation is made possible using the ‘Coalesce’ function from the Dplyr package.

1
2
3
4
5
6

# Left Join Table of Unique Type + Coalesce
df <- df %>% dplyr::left_join(df_noNA[,c(1:2)], by = c('MID'))

df$Type2 <- dplyr::coalesce(df$Type, df$Type_wo_NA)

The final results can be seen in the Type2 field.

12 Result Result of the Script

__

End of Post

This post is licensed under CC BY 4.0 by the author.