RStudio Connect - Automated Report Failsafe
I can’t say enough good things about RStudio Connect! I use it on a daily basis for dashboards, applications, markdown docs, and automated internal and external (client) reports. Most recently, the team has also been using it with Python (via reticulate), which comes in handy when an R package falls short for whatever reason.
This post provides a quick-and-dirty failsafe that prevents a report from being sent when a condition is not met, since RStudio Connect does not yet have a built-in solution for this.
Context: When sending reports to clients, I want to ensure accuracy and data integrity, which means I don’t want a report to go out if, for example, there is no data at all or the data have the wrong timestamp.
The example below has been working reliably for over a year now. If a condition is not met, I get an error message from the RStudio Connect server, so I can investigate without the report going out to the client.
```r
library(rlang) # is_empty(), is_false()
library(RCurl) # getCurlHandle(), ftpUpload()

# Data processing, etc. For the example, assume that there is an R data frame
# with your data to be sent to the client.
if (is_empty(your_data_frame) || nrow(your_data_frame) == 0 ||
    is_false(max(your_data_frame$date, na.rm = TRUE) == Sys.Date() - 1)) {
  stop("Data error! Investigate!")
} else {
  # Further processing
  currentDate <- Sys.Date()

  # Write the data to a gzip-compressed, pipe-delimited file in a temp directory
  tmp_dir <- tempdir()
  outputFileName <- paste0("client_file_", currentDate, ".txt.gz")
  client_file <- file.path(tmp_dir, outputFileName)
  write.table(your_data_frame, file = gzfile(client_file), sep = "|", row.names = FALSE)

  # Check file size in MB (message() so it is shown from inside the if/else)
  file_size <- file.info(client_file)$size / 1024 / 1024
  message(round(file_size, 2), " MB")

  # Set up FTP drop
  handle <- getCurlHandle()
  # 'ftp_url' is a hypothetical environment variable holding the base FTP URL
  # (e.g. "ftp://host/dir/"); ftpUpload() expects a full URL in 'to'
  ftpUpload(what = client_file,
            to = paste0(Sys.getenv("ftp_url"), outputFileName),
            userpwd = Sys.getenv("user_pwd"), curl = handle)
}
```
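A possible tightening of the file-writing step: read the compressed file back and compare its shape with the data frame before anything is uploaded, so a broken file also stops the job. This is a minimal sketch in base R; `write_and_verify()` is a hypothetical helper, not part of any package, and it only checks row and column counts, not cell values.

```r
# Sketch: write the pipe-delimited, gzip-compressed file, then read it back
# and compare dimensions before handing it to the FTP upload step.
# write_and_verify() is a hypothetical helper, not part of any package.
write_and_verify <- function(df, path) {
  # write.table() opens and closes the gzfile() connection itself
  write.table(df, file = gzfile(path), sep = "|", row.names = FALSE)

  # Read the file back through a gzip connection with matching settings
  check <- read.table(gzfile(path), sep = "|", header = TRUE,
                      quote = "\"", comment.char = "",
                      stringsAsFactors = FALSE)
  if (nrow(check) != nrow(df) || ncol(check) != ncol(df)) {
    stop("Written file does not match the data frame! Investigate!")
  }

  # Size in MB, same arithmetic as in the example above
  size_mb <- file.info(path)$size / 1024 / 1024
  message(sprintf("Wrote %d rows (%.2f MB) to %s", nrow(df), size_mb, path))
  invisible(path)
}
```

In the example above, the `write.table()` call in the `else` branch could be replaced by `write_and_verify(your_data_frame, client_file)`; if the check fails, `stop()` triggers the same error email as the data checks.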
Quick walkthrough:
- To set up the conditional logic, I use two functions from the `rlang` package: `is_empty()`, which checks whether there is any data at all in my data frame, and `is_false()`, which checks whether the latest date, `max(date)`, in my data frame is in fact yesterday (`Sys.Date() - 1`). The date check matters when your database is a day behind. Note that `is_empty()` only tests `length(x) == 0`, and the length of a data frame is its number of columns, so a result with columns but zero rows does not count as empty.
- I then use the `stop()` function to kill the script with a message. When the script runs as a scheduled job on RStudio Connect, it stops and sends an error email to my account, and the stop/error message shows up in the log of the scheduled R Markdown document.
- The `else` branch contains all further processing and runs only when the checks pass. Depending on your setup, that could be sending an email or, as in this case, uploading a file.
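The same checks can also be written in base R with a separate message per failure, so the error email says which check tripped. This is a sketch under assumptions: `check_report_data()` is a hypothetical helper, the rlang calls are swapped for base R equivalents (`nrow()`, `isTRUE()`), and the `date` column name and "yesterday" rule simply mirror the example above.

```r
# Sketch: base R version of the data checks, one stop() message per failure.
# check_report_data() is a hypothetical helper; 'date' and "yesterday" mirror
# the example above and are assumptions about your data.
check_report_data <- function(df, date_col = "date", expected = Sys.Date() - 1) {
  # nrow() catches a zero-row result that still has columns,
  # which rlang::is_empty() (length(df) == 0) would let through
  if (is.null(df) || nrow(df) == 0) {
    stop("Data error! No rows returned. Investigate!")
  }
  dates <- df[[date_col]]
  if (is.null(dates) || all(is.na(dates))) {
    stop("Data error! Column '", date_col, "' is missing or all NA. Investigate!")
  }
  latest <- max(dates, na.rm = TRUE)
  # isTRUE() treats anything other than a single TRUE as a failed check
  if (!isTRUE(latest == expected)) {
    stop("Data error! Latest date is ", format(latest),
         ", expected ", format(expected), ". Investigate!")
  }
  invisible(TRUE)
}
```

Calling `check_report_data(your_data_frame)` at the top of the script would then stand in for the `if` condition, with the processing from the `else` branch following it unconditionally.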