Abstract: There is disclosed a method of training an RPA robot to use a GUI. The method comprises capturing video of the GUI as an operator uses the GUI to carry out a process; capturing a sequence of events triggered as the operator uses the GUI to carry out said process; and analyzing said video and said sequence of events to thereby generate a workflow. The workflow, when executed by an RPA robot, causes the RPA robot to carry out said process using the GUI.
Field of the invention
The present invention relates to systems and methods for robotic process
automation and, in particular, automatic training of robotic process automation
robots.
Background of the invention
Human-guided computer processes are ubiquitous across many fields of
technology and endeavour. Modern graphical user interfaces (GUIs) have proven
invaluable in allowing human operators to use computer systems to carry out
often complex data processing and/or systems control tasks. However, whilst
GUIs often allow human operators to quickly become accustomed to performing
new tasks, they present a high barrier to any further automation of tasks.
Traditional workflow automation aims to take tasks usually performed by
operators using GUIs and automate them so that a computer system may carry
out the same task without significant re-engineering of the underlying software
being used to perform the task. Initially, this required exposing application
programming interfaces (APIs) of the software so that scripts may be manually
devised to execute the required functionality of the software so as to perform the
required task.
Robotic process automation (RPA) systems represent an evolution of this
approach and use software agents (referred to as RPA robots) to interact with
computer systems via the existing graphical user interfaces (GUIs). RPA robots
can then generate the appropriate input commands for the GUI to cause a given
process to be carried out by the computer system. This enables the automation
of processes, turning attended processes into unattended processes. The
advantages of such an approach are manifold and include greater scalability,
allowing multiple RPA robots to perform the same task across multiple computer
systems, along with greater repeatability, as the possibility for human error in a
given process is reduced or even eliminated.
However, the process of training an RPA robot to perform a particular task
can be cumbersome and requires a human operator to use the RPA system itself
to program in the particular process, specifically identifying each individual step
using the RPA system. The human operator is also required to identify the
particular portions of the GUI to be interacted with, and to build a workflow for
the RPA robot to use.
Summary of the invention
The invention provides a method of training an RPA robot to perform a task
using a GUI based solely on analysis of video of an operator using the GUI and the
events (or inputs) triggered by the operator when carrying out the process. In this
way the above problems of the prior art regarding the training of RPA robots may
be obviated.
In a first aspect there is provided a method of training an RPA robot (or
script or system) to use a GUI. The method comprises steps of capturing video of
the GUI as an operator (or user) uses the GUI to carry out a process (or task);
capturing a sequence of events triggered as the operator uses the GUI to
carry out said process; and analyzing said video and said sequence of events to
thereby generate a workflow. The workflow is such that, when executed by an
RPA robot, it causes the RPA robot to carry out said process using the GUI. The
steps of capturing may be carried out by a remote desktop system.
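By way of a purely illustrative sketch, the two capturing steps might be implemented as follows (the third-party mss screen-capture and pynput input-hook libraries are assumptions of this sketch, not part of the method; as noted, the capturing may equally be carried out by a remote desktop system):

```python
# Illustrative capture of video frames and the triggered event sequence.
import time
import mss                           # third-party screen-capture library
from pynput import mouse, keyboard   # third-party input-hook library

events = []  # the captured sequence of events, in trigger order

def on_click(x, y, button, pressed):
    if pressed:
        events.append({"type": "click", "pos": (x, y), "time": time.time()})

def on_press(key):
    events.append({"type": "keypress", "key": str(key), "time": time.time()})

mouse.Listener(on_click=on_click).start()
keyboard.Listener(on_press=on_press).start()

frames = []  # the captured video of the GUI, as timestamped frames
with mss.mss() as screen:
    for _ in range(600):  # e.g. roughly a minute at 10 frames per second
        frames.append({"image": screen.grab(screen.monitors[1]),
                       "time": time.time()})
        time.sleep(0.1)
```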
The step of analyzing may further comprise steps of identifying one or
more interactive elements of the GUI from said video and matching at least one
of the events in the sequence of events as corresponding to at least one of the
interactive elements. An interactive element may be any typical GUI element
such as (but not limited to) a text box, a button, a context menu, a tab, a radio
button (or array thereof), a checkbox (or array thereof), etc. The step of identifying
an interactive element may be carried out by applying a trained machine learning
algorithm to at least part of the video.
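A minimal sketch of this identification step follows; ElementDetector stands in for whatever trained machine learning algorithm is used, and its detect method (returning element kinds and bounding boxes per frame) is a hypothetical interface rather than a disclosed one:

```python
# Illustrative identification of interactive elements from video frames.
from dataclasses import dataclass

@dataclass
class InteractiveElement:
    kind: str          # e.g. "text_box", "button", "checkbox"
    bbox: tuple        # (left, top, right, bottom) in screen pixels
    frame_time: float  # timestamp of the frame the element was seen in

def identify_elements(frames, detector):
    """Apply a trained detector to each frame of the captured video."""
    elements = []
    for frame in frames:
        for kind, bbox in detector.detect(frame["image"]):
            elements.append(InteractiveElement(kind, bbox, frame["time"]))
    return elements
```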
Identifying an interactive element may comprise identifying positions of
one or more anchor elements in the GUI relative to said interactive element. For
example, a machine learning algorithm (such as a graph neural network) may be
used to identify the one or more anchor elements based on one or more predetermined
feature values. Said feature values may also be determined via
training of the machine learning algorithm.
Said feature values may include any one or more of: distance between
elements; orientation of an element; and whether elements are in the same
window.
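Purely for illustration, the three feature values listed above might be computed as follows for a candidate anchor element relative to an interactive element (the window_id attribute is an assumed per-element window identifier, not part of the disclosure):

```python
# Illustrative computation of the three anchor feature values named above.
import math

def centre(bbox):
    left, top, right, bottom = bbox
    return ((left + right) / 2, (top + bottom) / 2)

def anchor_features(interactive, candidate):
    (x1, y1), (x2, y2) = centre(interactive.bbox), centre(candidate.bbox)
    return {
        "distance": math.hypot(x2 - x1, y2 - y1),     # pixel distance
        "orientation": math.atan2(y2 - y1, x2 - x1),  # angle in radians
        "same_window": interactive.window_id == candidate.window_id,
    }
```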
The sequence of events may comprise any one or more of: a keypress
event; a click event (such as a single click, or multiples thereof); a drag event;
and a gesture event. Inferred events (such as a hoverover event) based on the
video may also be included in the sequence of events. Typically, a hover event
may be inferred based on one or more interface elements becoming visible in the
GUI.
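The inference of a hover event from the video might be sketched as follows; the half-second window used to decide whether a captured event already explains the change, and the detector interface, are assumptions of this illustration:

```python
# Illustrative inference of hover events: an element becoming visible
# with no captured event to explain it suggests the pointer hovered.
def infer_hover_events(frames, detector, events):
    inferred = []
    previous = None
    for frame in frames:
        visible = {bbox for _, bbox in detector.detect(frame["image"])}
        if previous is not None:
            newly_visible = visible - previous
            explained = any(abs(e["time"] - frame["time"]) < 0.5
                            for e in events)
            if newly_visible and not explained:
                inferred.append({"type": "hover", "time": frame["time"]})
        previous = visible
    return inferred
```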
The step of analyzing may further comprise identifying a sequence of
sub-processes of said process. In a sequence of sub-processes, a process output of
one of the sub-processes of the sequence may be used by the RPA robot as a
process input to another sub-process of the sequence.
The generated workflow may be editable by a user to enable the inclusion
of a portion of a previously generated workflow corresponding to a further
sub-process, such that said edited workflow, when executed by an RPA robot, causes
the RPA robot to carry out a version of said process using the GUI, the version of
said process including the further sub-process. The version of said process may
include the further sub-process in place of an existing sub-process of said
process.
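A hedged sketch of such an editable workflow of sub-processes is given below, with the process output of each sub-process fed as the process input to the next; the class and method names are illustrative rather than a disclosed format:

```python
# Illustrative editable workflow built from a sequence of sub-processes.
from dataclasses import dataclass, field

@dataclass
class SubProcess:
    name: str
    actions: list  # the GUI interactions making up this sub-process

@dataclass
class Workflow:
    sub_processes: list = field(default_factory=list)

    def replace(self, old_name, replacement):
        """Swap in a sub-process taken from a previously generated workflow."""
        self.sub_processes = [replacement if sp.name == old_name else sp
                              for sp in self.sub_processes]

def run(workflow, robot):
    """Chain sub-processes, feeding each output into the next input."""
    output = None
    for sp in workflow.sub_processes:
        output = robot.execute(sp.actions, process_input=output)
    return output
```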
In a second aspect there are provided methods of carrying out a process
using a GUI using an RPA robot trained by the methods according to the first
aspect above. In particular, said method may comprise the RPA robot
re-identifying one or more interactive elements in the GUI based on respective
anchor elements specified in a workflow. A machine learning algorithm (such as a
graph neural network) may be used to re-identify the one or more interactive
elements based on one or more pre-determined feature values (such as those
determined as part of methods of the first aspect).
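One simplified way to picture this anchor-based re-identification is sketched below; it reuses the anchor_features helper sketched earlier, and scoring candidates on recorded anchor distances alone is a deliberate simplification of whatever trained algorithm is actually used:

```python
# Illustrative re-identification: the candidate whose surrounding anchors
# best reproduce the recorded anchor distances is taken to be the element.
def re_identify(recorded, candidates, current_elements):
    def mismatch(candidate):
        observed = [anchor_features(candidate, other)["distance"]
                    for other in current_elements if other is not candidate]
        return sum(min((abs(o - r["distance"]) for o in observed),
                       default=float("inf"))
                   for r in recorded["anchors"])
    return min(candidates, key=mismatch)
```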
There are also provided systems and apparatus arranged to carry out any of
the methods set out above. For example, there is provided a system for training
an RPA robot (or script or system) to use a GUI. The system is arranged to
capture video of the GUI as an operator (or user) uses the GUI to carry out a
process (or task) and capture a sequence of events triggered as the operator
uses the GUI to carry out said process. The system further comprises a workflow
generation module arranged to analyze said video and said sequence of events
to thereby generate a workflow.
The invention also provides one or more computer programs suitable for
execution by one or more processors, such computer program(s) being arranged
to put into effect the methods outlined above and described herein. The invention
also provides one or more computer readable media, and/or data signals carried
over a network, which comprise (or have stored thereon) such one or more computer
programs.
Brief description of the drawings
Embodiments of the invention will now be described, by way of example
only, with reference to the accompanying drawings, in which:
Figure 1 schematically illustrates an example of a computer system;
Figure 2 schematically illustrates a system for robotic process automation
(RPA);
Figure 3a is a flow diagram schematically illustrating an example method
for training an RPA robot;
Figure 3b is a flow diagram schematically illustrating an example method
of an RPA robot of an RPA system executing a workflow to carry out a process;
Figure 4 schematically illustrates an example workflow analysis module of
an RPA system, such as the RPA system of figure 2;
Figure 5 schematically illustrates a computer vision module such as may
be used with the RPA system of figures 2 and 4;
Figure 6 schematically illustrates an action identification module such as
may be used with the RPA system of figures 2 and 4;
Figure 7 schematically illustrates an example of a workflow and an edited
version of the workflow;
Figure 8 schematically illustrates an example execution module of an RPA
system, such as the RPA system described in figure 2;
Figure 9a shows an image from a video of a GUI;
Figure 9b shows a further image from a video of a GUI having undergone
a re-identification process.
Detailed description of embodiments of the invention
In the description that follows and in the figures, certain embodiments of
the invention are described. However, it will be appreciated that the invention is
not limited to the embodiments that are described and that some embodiments
may not include all of the features that are described below. It will be evident,
however, that various modifications and changes may be made herein without
departing from the broader spirit and scope of the invention as set forth in the
appended claims.
Figure 1 schematically illustrates an example of a computer system 100.
The system 100 comprises a computer 102. The computer 102 comprises: a
storage medium 104, a memory 106, a processor 108, an interface 110, a user
output interface 112, a user input interface 114 and a network interface 116,
which are all linked together over one or more communication buses 118.
The storage medium 104 may be any form of non-volatile data storage
device such as one or more of a hard disk drive, a magnetic disc, an optical disc,
a ROM, etc. The storage medium 104 may store an operating system for the
processor 108 to execute in order for the computer 102 to function. The storage
medium 104 may also store one or more computer programs (or software or
instructions or code).
The memory 106 may be any random access memory (storage unit or
volatile storage medium) suitable for storing data and/or computer programs (or
software or instructions or code).
The processor 108 may be any data processing unit suitable for executing
one or more computer programs (such as those stored on the storage medium
104 and/or in the memory 106), some of which may be computer programs
according to embodiments of the invention or computer programs that, when
executed by the processor 108, cause the processor 108 to carry out a method
according to an embodiment of the invention and configure the system 100 to be
a system according to an embodiment of the invention. The processor 108 may
comprise a single data processing unit or multiple data processing units operating
in parallel or in cooperation with each other. The processor 108, in carrying out
data processing operations for embodiments of the invention, may store data to
and/or read data from the storage medium 104 and/or the memory 106.
The interface 110 may be any unit for providing an interface to a device
122 external to, or removable from, the computer 102. The device 122 may be a
data storage device, for example, one or more of an optical disc, a magnetic disc,
a solid-state-storage device, etc. The device 122 may have processing
capabilities; for example, the device may be a smart card. The interface 110
may therefore access data from, or provide data to, or interface with, the device
122 in accordance with one or more commands that it receives from the
processor 108.
The user input interface 114 is arranged to receive input from a user, or
operator, of the system 100. The user may provide this input via one or more
input devices of the system 100, such as a mouse (or other pointing device) 126
and/or a keyboard 124, that are connected to, or in communication with, the user
input interface 114. However, it will be appreciated that the user may provide
input to the computer 102 via one or more additional or alternative input devices
(such as a touch screen). The computer 102 may store the input received from
the input devices via the user input interface 114 in the memory 106 for the
processor 108 to subsequently access and process, or may pass it straight to the
processor 108, so that the processor 108 can respond to the user input
accordingly.
The user output interface 112 is arranged to provide a graphical/visual
and/or audio output to a user, or operator, of the system 100. As such, the
processor 108 may be arranged to instruct the user output interface 112 to form
an image/video signal representing a desired graphical output, and to provide this
signal to a monitor (or screen or display unit) 120 of the system 100 that is
connected to the user output interface 112. Additionally, or alternatively, the
processor 108 may be arranged to instruct the user output interface 112 to form
an audio signal representing a desired audio output, and to provide this signal to
one or more speakers 121 of the system 100 that are connected to the user output
interface 112.
Finally, the network interface 116 provides functionality for the computer
102 to download data from and/or upload data to one or more data
communication networks.
It will be appreciated that the architecture of the system 100 illustrated in
figure 1 and described above is merely exemplary and that other computer
systems 100 with different architectures (for example with fewer components
than shown in figure 1 or with additional and/or alternative components than
shown in figure 1) may be used in embodiments of the invention. As examples,
the computer system 100 could comprise one or more of: a personal computer; a
server computer; a mobile telephone; a tablet; a laptop; a television set; a set top
box; a games console; other mobile devices or consumer electronics devices;
etc.
Figure 2 schematically illustrates a system for robotic process automation
(RPA). As depicted in Figure 2, there is a computer system 200 (such as the
computer system 100 described above) operated by an operator (or a user) 201.
The computer system 200 is communicatively coupled to an RPA system 230.
The operator 201 interacts with the computer system 200 to cause the
computer system 200 to carry out a process (or function or activity). Typically, the
process carried out on the computer system 200 is carried out by one or more
applications (or programs or other software). Such programs may be carried out
or executed directly on the system 200 or may be carried out elsewhere (such as
on a remote or cloud computing platform) and controlled and/or triggered by the
computer system 200. The operator 201 interacts with the computer system 200
via a graphical user interface (GUI) 210 which displays one or more interactive
elements to the operator 201. The operator 201 is able to interact with said
interactive elements via a user input interface of the computer system 200 (such
as the user input interface 114 described above). It will be appreciated that, as the
operator 201 interacts with the GUI 210, the GUI 210 as displayed to the operator
201 typically changes to reflect the operator interaction. For example, as the operator
inputs text into a textbox in the GUI 210, the GUI 210 will display the text entered into the
text box. Similarly, as the operator moves a cursor across the GUI 210 using a
pointing device (such as a mouse 126) the pointer is shown as moving in the GUI
210.
The RPA system 230 is arranged to receive video 215 of the GUI 210. The
video 215 of the GUI 210 shows (or visually depicts or records) the GUI 210
displayed to the operator 201 as the operator 201 uses the GUI 210 to carry out
the process. The RPA system 230 is also arranged to receive (or capture) a
sequence of events 217 triggered in relation to the GUI by the operator using the
GUI to carry out the process. Such events may include individual key presses
made by the operator 201, clicks (or other pointer interaction events) made by the
operator 201, events generated by the GUI itself (such as on click events relating
to particular elements, changes of focus of particular windows in the GUI, etc.).
A workflow analysis module 240 of the RPA system 230 is arranged to
analyse the video of the GUI 210 and the sequence of events 217 to thereby
generate a workflow (or a script) for carrying out said process using the GUI 210.
Workflows are described in further detail below. However, it will be
appreciated that a workflow 250 typically defines a sequence of interactions (or
actions) with the GUI 210. The interactions may be inputs to be carried out on or
in relation to particular identified elements of the GUI such that when the
sequence of interactions is carried out on the GUI, the system 200 on which the
GUI is operating carries out said process. As such, a workflow 250 may be
thought of as being (or representing) a set of instructions for carrying out a
process using a GUI.
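As a purely illustrative example of such a set of instructions, a workflow 250 might be pictured as an ordered list of interactions, each naming a target element (together with its anchors) and the input to apply; the field names below are assumptions, not a disclosed format:

```python
# Illustrative workflow: an ordered sequence of GUI interactions.
workflow = [
    {"action": "click", "target": {"kind": "button",
                                   "anchors": ["label:Search"]}},
    {"action": "type",  "target": {"kind": "text_box",
                                   "anchors": ["label:Name"]},
     "text": "Jane Doe"},
    {"action": "click", "target": {"kind": "button",
                                   "anchors": ["label:Submit"]}},
]
```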
An execution module 270 of the RPA system 230 is arranged to cause the
workflow 250 to be carried out on the respective GUIs 210-1; 210-2; ... of one or
more further computer systems 200-1; 200-2; .... In particular, the execution
module 270 is arranged to receive video of the respective GUI 210-1; 210-2; ... on
the further computing systems 200-1; 200-2; .... The execution module 270 is also
arranged to provide input 275 to the further computer systems 200-1; 200-2; ...
emulating input that an operator 201 would provide. By analysing the video of the
respective GUIs the execution module is able to identify (or re-identify) the GUI
elements present in the workflow 250 and provide inputs to the further GUIs in
accordance with the workflow 250. In this way the execution module may be
considered to be an RPA robot (or software agent) operating a further system
200-1, via the respective GUI 210-1, to carry out the process. It will be
appreciated that the further systems 200-1; 200-2; ... may be systems such as the
system 200, for example the computer system 100 described above. Alternatively,
one or more of the further computing systems 200-1; 200-2; ... may be virtualized
computer systems. It will be appreciated that multiple instances of the execution
module 270 (or RPA robot) may be instantiated by the RPA system 230 in
parallel (or substantially in parallel), allowing multiple instances of the process to
be carried out substantially at the same time on respective further computing
systems 200-1; 200-2; ....
Figure 3a is a flow diagram schematically illustrating an example method
300 for training an RPA robot using the RPA system 230 of figure 2.
At a step 310 video 215 of a GUI 210 as an operator 201 uses the GUI
210 to carry out a process is captured.
At a step 320 a sequence of events 217 triggered as the operator 201
uses the GUI 210 to carry out said process is captured.
At a step 330 a workflow is generated based on the video 215 and the
sequence of events 217. In particular, the video 215 and the sequence of events
217 are analyzed to thereby generate the workflow which, when executed by an RPA
robot, causes the RPA robot to carry out said process using the GUI. The video
215 and the sequence of events 217 may be analyzed using one or more trained
machine learning algorithms. The step 330 may comprise identifying one or more
interactive elements of the GUI from said video and matching at least one of the
events in the sequence of events as corresponding to at least one of the
interactive elements. In this way the step 330 may comprise identifying a
sequence of interactions for the workflow.
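The matching performed at the step 330 might, for illustration, locate the interactive element whose bounding box contains a click position in the frames nearest the event's timestamp; the half-second window is an assumption, and InteractiveElement is the structure sketched earlier:

```python
# Illustrative matching of a captured click event to an identified element.
def match_event(event, elements):
    if event["type"] != "click":
        return None
    x, y = event["pos"]
    nearby = [el for el in elements
              if abs(el.frame_time - event["time"]) < 0.5]
    for el in nearby:
        left, top, right, bottom = el.bbox
        if left <= x <= right and top <= y <= bottom:
            return el
    return None  # no element found under the click position
```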
Figure 3b is a flow diagram schematically illustrating an example method
350 of an RPA robot of an RPA system 230 executing a workflow 250 to carry out
a process. The RPA system 230 may be an RPA system as described above in
relation to figure 2.
At a step 360 video of a GUI 210-1 on a computing system 200-1 is
received.
At a step 370 the workflow 250 to be executed is obtained.
At a step 380 input 275 is provided to the computer system 200-1 based
on the workflow 250. The step 380 may comprise analysing the video of the GUI
to identify (or re-identify) the GUI elements present in the workflow 250 and providing input to
the GUI in accordance with the workflow 250. In this way the step 380 may
operate a further system 200-1, via the GUI, to carry out the process.
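A minimal sketch of the step 380 follows; grab_frame, detect_elements, re_identify and send_input are hypothetical stand-ins for the video feed, the computer vision analysis, the anchor-based matching and the emulated operator input respectively:

```python
# Illustrative execution loop for carrying out a workflow on a live GUI.
def execute_workflow(workflow, grab_frame, detect_elements,
                     re_identify, send_input):
    for step in workflow:
        frame = grab_frame()               # current frame of the GUI video
        elements = detect_elements(frame)  # elements visible right now
        target = re_identify(step["target"], elements, elements)
        send_input(step["action"], target, step.get("text"))
```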
Figure 4 schematically illustrates an example workflow analysis module of
an RPA system, such as the RPA system 230 described above in relation to
figure 2.
The workflow analysis module 240 shown in figure 4 comprises a video
receiver module 410, an event receiver module 420, a computer vision module
430, an action identification module 440 and a workflow generation module 450.
Also shown in figure 4 is an operator 201 interacting with a computer system 200
by way of a GUI 210, as described above in relation to figure 2.
CLAIMS
1. A method of training a robotic process automation, RPA, robot to use a
GUI, the method comprising:
capturing video of the GUI as an operator uses the GUI to carry out a
process;
capturing a sequence of events triggered as the operator uses the GUI to
carry out said process;
analyzing said video and said sequence of events to thereby generate a
workflow which, when executed by an RPA robot, causes the RPA robot to carry
out said process using the GUI;
wherein said analyzing further comprises:
identifying an interactive element of the GUI from said video; and
matching at least one of the events in the sequence of events as
corresponding to the interactive element,
wherein identifying an interactive element comprises identifying
positions of one or more anchor elements in the GUI relative to said
interactive element, and wherein the one or more anchor elements are
identified by:
identifying a predetermined number of nearest elements as
anchor elements;
identifying nearest elements in one or more predetermined
directions as anchor elements; or
identifying all elements within a predefined region of the
interactive element as anchor elements.
2. The method of claim 1 wherein identifying an interactive element is carried
out by applying a trained machine learning algorithm to at least part of the video.
3. The method of any preceding claim wherein a machine learning algorithm
is used to identify the one or more anchor elements based on one or more predetermined
feature values.
4. The method of claim 3 wherein the feature values are determined via
training of the machine learning algorithm.
5. The method of claim 4 wherein the machine learning algorithm comprises
a graph neural network.
6. The method of any one of claims 3 to 5 wherein the feature values include
any one or more of:
distance between interactive elements,
orientation of an interactive element; and
whether interactive elements are in the same window.
7. The method of any preceding claim wherein the sequence of events
comprises any one or more of:
a keypress event;
a hoverover event;
a click event;
a drag event; and
a gesture event.
8. The method of any preceding claim comprising including, based on the
video, one or more inferred events in the sequence of events.
9. The method of claim 8 wherein a hover event is inferred based on one or
more interface elements becoming visible in the GUI.
10. The method of any preceding claim wherein the step of analyzing
comprises:
identifying a sequence of sub-processes of said process.
11. The method of claim 10 wherein a process output of one of the
sub-processes of the sequence is used by the RPA robot as a process input to
another sub-process of the sequence.
12. The method of claim 10 or claim 11 further comprising editing the
generated workflow to include a portion of a previously generated workflow
corresponding to a further sub-process, such that said edited workflow, when
executed by an RPA robot, causes the RPA robot to carry out a version of said
process using the GUI, the version of said process including the further
sub-process.
13. The method of claim 12 wherein the version of said process includes the
further sub-process in place of an existing sub-process of said process.
14. The method of any preceding claim wherein the video and/or the
sequence of events are captured using a remote desktop system.
15. A method of carrying out a process using a GUI using an RPA robot
trained by the method according to claim 1, the method further comprising the
RPA robot re-identifying an interactive element in the GUI based on associated
anchor elements in the GUI specified in a workflow,
wherein the one or more anchor elements are:
a predetermined number of nearest elements;
nearest elements in one or more predetermined directions;
or
all elements within a predefined region of the interactive
element.