A Guide For Maintaining RCM Automations
Welcome back to Tarpon Health's Automation Lifecycle series, where we share best practices for developing Revenue Cycle Management (RCM) automations. This blog delves into the best practices for creating efficient and reliable automations in the fifth and final phase of The Automation Lifecycle: Maintain.
At the heart of automation lies the promise of achieving more with less. However, as organizations embrace automation at scale, a maintenance strategy is required. Despite its importance, maintenance often remains overlooked in the excitement of deploying new automations. Yet, it's critical to program success: the journey doesn't end once an automation goes live.
Beyond the initial deployment, continuous monitoring, logic adjustments, troubleshooting, and enhancements are essential to keep automations running smoothly and adapting to evolving digital landscapes. While this may initially seem daunting, with the right tools and processes, maintenance becomes not only manageable but routine.
Why it's important
Regular maintenance ensures automations are reliable and efficient and that they meet the needs of the business.
As automation programs mature, they will require more maintenance.
A good maintenance process can help to mitigate this, freeing up developers to focus on building new automations.
Automation solutions can become outdated with the development of new technologies and systems.
By conducting regular maintenance, you can extend the life of your automation solutions and make them more future-proof.
What steps are involved?
Ensuring the smooth operation and longevity of Robotic Process Automation (RPA) solutions hinges on collaboration between the operations and automation teams. When an automation undergoes maintenance, its unavailability to operations necessitates a coordinated effort to mitigate disruption. The operations team must have a plan to manage accounts manually while the automation team rectifies any issues. Moreover, the expertise of the operations team is often indispensable; the automation team frequently seeks their input to verify changes and ensure accuracy before implementation.
Here is a breakdown of the main activities that are involved in maintaining an automation solution.
Documentation: Thorough documentation throughout the automation design, build, test, and deploy phases is needed to facilitate understanding and future maintenance by other team members. Clear, concise documentation ensures continuity and minimizes downtime during transitions or updates.
Automation Metrics: Establishing metrics to gauge automation efficacy is essential for monitoring performance and identifying areas for improvement. These metrics serve as benchmarks for evaluating the impact and efficiency of automation initiatives.
Ticketing: Integrating the automation into the ticketing and support system streamlines communication and facilitates issue resolution. This integration ensures that any maintenance or downtime is communicated effectively and resolved efficiently.
Downtime: Developing processes to manage both planned and unplanned downtime is critical for minimizing disruption to operations. By implementing proactive measures and downtime plans, organizations can mitigate the impact of downtime on productivity and service delivery.
We explore each activity in more detail below.
Documentation
As a general principle, organizations should approach documentation with the mindset that the original team responsible for building the automation may not be available for its maintenance in the future. Just as the digital landscape undergoes frequent changes, so too does the composition of your team.
Documentation comes in various forms, but two components stand out: design documentation and the ReadMe.
Design documentation: Design documentation is typically overseen by the solution architect (the individual who designs the automation) and serves as the blueprint guiding developers during the initial build. Unfortunately, organizations often neglect to maintain this documentation beyond the initial build phase, which is a critical oversight. This documentation captures crucial elements such as how data inputs drive automation steps, the business logic governing decision-making, and the key components that address the underlying business challenge. Whenever developers update the code, the solution architect should promptly revise the design documentation to reflect the change. This ensures that future developers have an accurate roadmap for understanding and modifying the automation as needed.
ReadMe: A ReadMe document, authored by the development team for each object or node within the automation, serves as a vital resource for understanding automation intricacies. The ReadMe should encompass a range of information, including any issues encountered during the build process, testing directions, and significant decisions made regarding architecture and logic. By meticulously documenting these details, future developers—particularly those who were not involved in the original build—can effectively troubleshoot the automation when maintenance is required.
By prioritizing comprehensive design documentation and detailed ReadMe documents, organizations can fortify their automation initiatives with the necessary resources for long-term sustainability and adaptability.
Metrics
Organizations will need to monitor the health of their automations through a comprehensive set of metrics. After all, as the adage goes, "one cannot address what is not measured."
Below are some of the metrics that enable teams to assess the health of their automations:
Skip rate
This metric quantifies the percentage of skipped accounts relative to the total number of queued accounts. A skip occurs when the automation opts not to process an account due to insufficient information. Monitoring the skip rate provides visibility into the prevalence of incomplete or inadequate data, which may necessitate adjustments to data sourcing or preprocessing workflows.
Error rate
The error rate metric gauges the percentage of unsuccessful account processes compared to the total number of accounts processed. An account is considered processed if the automation initiates its handling, excluding instances where accounts are skipped due to data deficiencies. Tracking the error rate helps pinpoint areas of workflow instability or logic errors, facilitating targeted troubleshooting and refinement efforts.
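The skip rate and error rate use different denominators: skips are measured against everything queued, while errors are measured only against accounts the automation actually attempted. A minimal sketch of that distinction follows, assuming a hypothetical RunCounts record whose field names are illustrative and not taken from any particular RPA platform.

```python
from dataclasses import dataclass


@dataclass
class RunCounts:
    """Hypothetical per-run tallies; the field names are illustrative only."""
    queued: int      # accounts placed in the automation's queue
    skipped: int     # accounts not attempted because data was insufficient
    failed: int      # accounts attempted but not completed successfully
    succeeded: int   # accounts attempted and completed successfully


def skip_rate(c: RunCounts) -> float:
    """Skipped accounts as a percentage of all queued accounts."""
    return 100.0 * c.skipped / c.queued if c.queued else 0.0


def error_rate(c: RunCounts) -> float:
    """Failed accounts as a percentage of processed (attempted) accounts."""
    processed = c.failed + c.succeeded  # skipped accounts are excluded
    return 100.0 * c.failed / processed if processed else 0.0


# Made-up sample numbers for illustration.
counts = RunCounts(queued=200, skipped=20, failed=9, succeeded=171)
print(skip_rate(counts))   # 10.0
print(error_rate(counts))  # 5.0
```

Keeping skips out of the error-rate denominator prevents a data-quality problem from masking, or being mistaken for, a logic problem.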
Execution time
This metric evaluates the average and median duration required to process an account from initiation to completion. By monitoring execution time, teams can identify bottlenecks or inefficiencies within the automation workflow, guiding optimization initiatives to streamline processing and enhance overall efficiency.
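Tracking both the average and the median matters because a single slow account can pull the average up while the median stays put. The sketch below assumes each processed account is logged as a (started, finished) timestamp pair; the sample timestamps are made up for illustration.

```python
from datetime import datetime
from statistics import mean, median


def execution_times(records):
    """Return per-account durations in seconds from (started, finished) pairs."""
    return [(finished - started).total_seconds() for started, finished in records]


# Made-up sample: three typical accounts and one slow outlier.
records = [
    (datetime(2024, 1, 8, 6, 0, 0), datetime(2024, 1, 8, 6, 0, 40)),
    (datetime(2024, 1, 8, 6, 1, 0), datetime(2024, 1, 8, 6, 1, 50)),
    (datetime(2024, 1, 8, 6, 2, 0), datetime(2024, 1, 8, 6, 2, 45)),
    (datetime(2024, 1, 8, 6, 3, 0), datetime(2024, 1, 8, 6, 8, 5)),
]

seconds = execution_times(records)
print(mean(seconds))    # 110.0
print(median(seconds))  # 47.5
```

A wide gap between the two figures, as in this sample, suggests a few outlier accounts rather than a uniformly slow workflow.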
Queue consumption rate
Queue consumption rate measures the percentage of accounts processed out of the total number of accounts queued. It reflects an automation's ability to consume its anticipated account volume within its designated run time. If the automation is not consistently processing all of the queued accounts, either its execution time or its designated run time will need to be adjusted.
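One way to turn "not consistently processing all the queued accounts" into a check is to look at the most recent runs together. The sketch below is only illustrative: the three-run window, the treatment of an empty queue as fully consumed, and the function names are all assumptions.

```python
def queue_consumption_rate(processed: int, queued: int) -> float:
    """Processed accounts as a percentage of queued accounts.

    Assumption: an empty queue counts as fully consumed (100%).
    """
    return 100.0 * processed / queued if queued else 100.0


def consistently_behind(rates, threshold=100.0, min_runs=3):
    """True if each of the last `min_runs` runs fell below `threshold` percent."""
    recent = rates[-min_runs:]
    return len(recent) == min_runs and all(r < threshold for r in recent)


# Made-up (processed, queued) totals for four consecutive runs.
runs = [(500, 500), (450, 500), (400, 500), (425, 500)]
rates = [queue_consumption_rate(p, q) for p, q in runs]
print(rates)                       # [100.0, 90.0, 80.0, 85.0]
print(consistently_behind(rates))  # True
```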
Error-free days
Error-free days measure the percentage of operational days during which the automation functions within an acceptable error rate threshold, such as 5%. This metric provides an overarching view of the automation's reliability and consistency over time, highlighting periods of sustained performance and potential trends in error occurrence.
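As a rough sketch, this metric can be derived from a list of daily error rates. The 5% threshold comes from the example above; treating a day exactly at the threshold as acceptable is an assumption, and the sample values are made up.

```python
def error_free_days(daily_error_rates, threshold=5.0):
    """Percentage of operational days whose error rate is at or below `threshold`."""
    if not daily_error_rates:
        return 0.0
    acceptable = sum(1 for rate in daily_error_rates if rate <= threshold)
    return 100.0 * acceptable / len(daily_error_rates)


# Made-up daily error rates (percent) for ten operational days.
rates = [1.2, 0.0, 7.5, 4.9, 5.0, 12.0, 2.3, 3.1, 0.4, 6.0]
print(error_free_days(rates))  # 70.0
```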
Ticketing
Incorporating automations into existing IT ticketing systems streamlines the tracking and reporting of issues, makes resolution time easier to monitor, and facilitates coordination across departments. Teams can more effectively manage maintenance tasks when the ticketing system is consistently used for logging and tracking all automation-related issues. Furthermore, by capturing pertinent data within the ticketing system, organizations can identify recurring issues, trends, and optimization opportunities. This data-driven approach enables continuous improvement and refinement of automation processes, ultimately enhancing performance.
Ticketing systems also play an important role in communication and professionalism surrounding automation programs. By organizing issues based on priority and resolution time, ticketing systems enable teams to make informed decisions and allocate resources effectively. For instance, the ability to categorize issues by priority allows teams to discern urgent tasks requiring immediate attention from those that can be addressed later. Moreover, ticketing systems enable teams to navigate trade-offs effectively. For example, the automation team may temporarily pause new-build work to address a high-priority maintenance issue causing an unmanageable backlog for the operations team. By providing visibility into task dependencies and resource constraints, ticketing systems empower teams to make informed decisions and optimize workflows in line with organizational priorities.
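To illustrate how ticket data can support both prioritization and trend spotting, here is a small sketch. It does not use any real ticketing system's API: the Ticket fields, the automation names, and the category labels are all hypothetical stand-ins for whatever your system exports.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Ticket:
    """Hypothetical export of one automation-related ticket."""
    automation: str
    category: str      # free-form issue type, e.g. "selector change"
    priority: int      # 1 = most urgent
    hours_open: float


def work_order(tickets):
    """Most urgent first; within a priority, longest-open first."""
    return sorted(tickets, key=lambda t: (t.priority, -t.hours_open))


def recurring_categories(tickets, min_count=2):
    """Issue categories seen at least `min_count` times, most frequent first."""
    counts = Counter(t.category for t in tickets)
    return [(cat, n) for cat, n in counts.most_common() if n >= min_count]


# Made-up sample tickets.
tickets = [
    Ticket("claim-status", "selector change", 2, 30.0),
    Ticket("eligibility-check", "login failure", 1, 4.0),
    Ticket("claim-status", "selector change", 2, 52.0),
    Ticket("denials-followup", "timeout", 3, 10.0),
]

print([t.automation for t in work_order(tickets)])
print(recurring_categories(tickets))
```

In this sample, the priority 1 login failure sorts ahead of older but less urgent tickets, and "selector change" surfaces as the one recurring category.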
Downtime
Organizations need to prepare for both anticipated and unanticipated downtime. Anticipated downtime includes events such as EMR and major application updates, while unanticipated downtime encompasses occurrences like a graphical user interface change on a website. In both cases, the automation team will need to inform the operations team that the automation will be unavailable, triage the issue, and fix the underlying problem.
Below is a list of activities that every organization should deploy regardless of whether the downtime is anticipated or unanticipated.
Our number one rule: Automation teams should set the expectation with operational teams that automations will experience downtime and that the process being automated remains the operational team's responsibility. Automations should not be built without this alignment.
Workqueue access: We recommend separating accounts that humans will work on from those that robots will handle. However, operational teams should have access to the automation work queues so that no administrative work is required for operational teams to intervene when necessary.
Automated emails: Automation teams should establish automated emails to operations triggered when the automation fails to run.
Triage system: Automation teams should establish a triage level rating for the automation based on how long it will take to fix the issue. Operational leaders are responsible for writing clear downtime procedures that align with the triage level assigned by the automation team.
Documentation: Downtime plans should be documented in a place accessible to both automation and operational teams, such as an intranet site.
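The automated email and triage activities above can be combined so that the failure notice already carries the triage level. The sketch below uses Python's standard email library to build the message. The hour thresholds, level numbers, automation name, and addresses are all assumptions for illustration; your own triage scale should match the downtime procedures your operational leaders have written.

```python
from email.message import EmailMessage


def triage_level(estimated_fix_hours: float) -> int:
    """Map an estimated fix time to a triage level (thresholds are assumptions)."""
    if estimated_fix_hours <= 4:
        return 1   # same-day fix
    if estimated_fix_hours <= 24:
        return 2   # fix expected within a day
    return 3       # extended outage


def build_downtime_email(automation, estimated_fix_hours, sender, recipient):
    """Build (but do not send) a failure notice for the operations team."""
    level = triage_level(estimated_fix_hours)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[Triage {level}] {automation} did not run"
    msg.set_content(
        f"The {automation} automation failed to run.\n"
        f"Triage level: {level}\n"
        f"Estimated time to fix: {estimated_fix_hours} hours\n"
        "Please follow the documented downtime procedure for this triage level."
    )
    return msg


# Hypothetical automation name and placeholder addresses.
msg = build_downtime_email(
    "claim-status", 8, "automation@example.com", "operations@example.com"
)
print(msg["Subject"])

# Sending is left out of the sketch; with a reachable mail server it could be
# done with smtplib.SMTP(host).send_message(msg).
```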
For known upgrades and enhancements, organizations should take additional steps and preventative measures. As a general policy, the automation team should block off the two weeks before and after an application/EMR upgrade to solely focus on reducing downtime. That means that new builds, enhancements, etc., should be deprioritized.
Additionally, have a representative from the automation team join application/EMR meetings to learn about the specifics of the forthcoming upgrade/enhancement. Having access to the release notes is also important. To the extent possible, get your automation team early access to the application/EMR test environment. This will allow the team to view, test, and understand (with precision) how upgrades will impact your automations as soon as possible.
The maintenance phase is the last and most overlooked phase in the Automation Lifecycle. Many tools, processes, and responsibilities are necessary to keep your automations running smoothly. By following the advice and tactics above, any organization can limit downtime, stay coordinated, and sustain its automations. The results will follow.