Post

DP20 Getting Results from Sparse Matrix

Description

This script is designed to process a large matrix of periods and medical conditions, allowing for the export of summarized values to Power BI for dynamic visualization.

The script includes a function to convert the matrix of periods and conditions into a list format using reshape2::melt. Lastly, it utilizes Dcast or group_by operations to produce the final results.

R Script - Recap of Sparse Matrix

Initial Source. View of the CSV file in Excel.

The following is a preview of the initial CSV data presented as an Excel spreadsheet:

20 Data Source Matrix of Patients, Demographic info, Medical Conditions and Last Date of Visit.

Function to Convert the matrix into a list

Use of data.table::melt reshaping tool.

All the medical conditions fields will be listed in a single column “variable”. The variable “patients” is the total number of enrollees that will serve as the denominator when creating percentage parameters.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# Function Melt Each Group of Conditions by Month and Practices
fx_melttables <- function(vectorconditions, dfdata){
  dffinal <- data.frame()
  dfdata <- data.table(dfdata)
  for(i in seq_along(vectorconditions)){
    print(vectorconditions[i])
    melted1 <- data.table::melt(dfdata, id.vars = c("MIDIndex", "Month", "Practice_Name"), 
                    measure.vars = vectorconditions[i]) 
    dffinal <- bind_rows(dffinal, melted1)
  }
  dffinal
}

### Applying the fx and summarizing the results to data 
### "dtRECAP" and a group of medical conditions in vectorcond. 
results <- fx_melttables(vectorcond, dtRECAP)

Melted Data: Head

The resulting data frame has 157,980 rows resulting from 7899 rows from data multiplied by 20 variables.

20 Result of Melt Data

View of some of the 157,980 records.

Melted Data: View of a random sample of 30 results.

Most of the results have a value equal to zero representing the blank cells in the Excel view.

20 Melted Sample 30results

View of some random sample 30 results.

Aggregation of All Variables for Two Practices Example

Using reshape2::dcast

1
2
3
# Aggregation with dplyr::dcast
reshape2::dcast(data = results, formula = variable ~ Practice_Name[c(1,4)], 
                value.var = "value", fun.aggregate = sum)

Result: Data Table - Recap of all variables - Two Practices example

20 Example Line Chart

Aggregation All Variables - Two Practices Example

Aggregation for All variables - All Months

1
2
3
# Aggregation: All variables - All Months
reshape2::dcast(data = results, formula = variable ~ Month, 
                value.var = "value", fun.aggregate = sum)

Result: Data Table - Recap of Patients per Month 20 Example Line Chart

All Variables - All variables ~ Months

Line Plot Patients with Hypertension

This code shows a ggplot geom_line as an example of patients with hypertension by month.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# LinePlot (ggplot2) for one Condition (Hypertension)
dfHypertension <- results %>% filter(variable=="Hypertension") %>% 
  group_by(Month, variable) %>% 
   summarize(ptsHTN = sum(value), .groups = 'drop')

# Basic Line
ln <- ggplot2::ggplot(data=dfHypertension, 
             aes(x=Month, y=ptsHTN, group=1, label=ptsHTN)) +
  geom_line(color="blue",linewidth=1) + 
  geom_point()+ geom_text(nudge_y = 5) + 
  ggtitle("Patients with Hypertesion - All Practices")

# y-axis limits
ln +  ylim(0,110)+
theme_light() + labs(x="Enrollment Month",y="Patients")

Line chart: Patients with Hypertension

20 Example Line Chart

Number of Patients with Hypertension - All Practices.

Aggregation All Variables - Months - Practices

This is the table to export that will serve as a source for a PowerBI dashboard:

1
2
3
totaltable <- results %>% group_by(variable, Month, Practice_Name) %>%
  summarize(Enrollees = sum(value), .groups = 'drop')
print(totaltable[sample(3840,30),],n=30)

Results to export to PowerBI 20 Results PowerBI

All Variables - All Months - All Practices

Data Mining Patients with specific conditions

The next shows the results looking for patients filtered by medical conditions, months, and practice.

1
2
3
4
5
# * 6.1 Data Mining List of Patients with Hypertension
results %>% filter(variable== "Hypertension", 
                   Month== "202105",
                   value == 1, 
                   Practice_Name == "HOKE PRIMARY CARE")

Results: ID of 8 patients 20 Results PowerBI

Detail of 8 patients

__

End of Post

This post is licensed under CC BY 4.0 by the author.