Abstract: In one implementation, the present invention provides a system and method for scalable storage by consistently resolving one database server among a cluster of independent database servers by applying a configured database server selection rule, such as hashing the PROCESSINSTANCEID, choosing the data source/connection corresponding to the resolved database server out of a pool of data sources/connections, each connected to one of the database servers, and performing the Create, Retrieve, Update, Delete (CRUD) operations with a single transaction created over the resolved data source/connection. Further, in order to be able to perform CRUD operations involving the process entities that are related to the specific process instance in hand, all the related process entities are ensured to be present in the same database server to which the resolved data source/connection is connected. (TO BE PUBLISHED WITH FIGURES 3 & 4)
Claims:
1. A system (500) for storing a relational data, the system comprising:
a processor (502);
a memory (506) coupled to the processor for executing a plurality of modules present in the memory, the plurality of modules comprising:
a shard identification module (508) configured to identify a plurality of database shards operating on a plurality of database servers;
a query module (510) configured to receive the data in a form of at least one process instance having at least one process entity associated thereto from at least one client device;
an identification generator (512) configured to generate at least one process instance identification (PID) associated with the process instance received;
an execution engine configured to execute the process entity associated with the process instance;
a shard id embed module configured to embed the process instance identification (PID) generated in the process entity executed;
a storage module configured to store the data associated with the PID in at least one database shard identified, wherein the database shard is selected from the plurality of database shards and is identified, by the shard identification module, by applying at least one sharding rule based on the PID.
2. The system as claimed in claim 1, wherein the identification generator is further configured to generate at least one process entity identification for the process entity.
3. The system as claimed in claim 1, wherein the data having the process instance with the PID and the process entity identifications with the PID embedded therein are stored in a single database shard.
4. The system as claimed in claims 2 and 3, wherein the identification generator is further configured to link all the process entity identifications having a common PID.
5. The system as claimed in claim 1, wherein the database shard is selected based on at least one sharding algorithm on the PID.
6. The system as claimed in claim 1, wherein the process entity is linked to the process instance by embedding the PID and a separator.
7. The system as claimed in claim 1, wherein, before applying the sharding rule, the PID is extracted using a ShardingKeyFinder mechanism, the ShardingKeyFinder mechanism enabling a sharding key (PID) to be found from the given ID.
8. The system as claimed in claim 1, wherein the sharding rule preferably comprises a hash value of the PID to identify the data source.
9. The system as claimed in claim 1, wherein the sharding rule is configurable and re-configurable.
10. The system as claimed in claim 1, comprising at least one database shard selected from the plurality of database shards storing runtime data associated with the data, and at least one database shard selected from the plurality of database shards storing static and runtime data.
11. A method (700) for storing a relational data, the method comprising:
identifying (702) a plurality of database shards operating on a plurality of database servers;
receiving (704), from at least one client device, the data in a form of at least one process instance having at least one process entity associated thereto;
generating (706) at least one process instance identification (PID) associated with the process instance received;
executing (708) the process entity associated with the process instance;
embedding (710) the process instance identification (PID) generated in the process entity executed;
identifying (712), by applying at least one sharding rule based on the PID, at least one database shard selected from a plurality of database shards;
storing (714) the data associated with the PID in the database shard identified.
12. The method as claimed in claim 11, further comprising generating at least one process entity identification for the process entity.
13. The method as claimed in claim 11, further comprising storing the data having the process instance with the PID and the process entity identifications with the PID embedded therein in a single database shard.
14. The method as claimed in claims 12 and 13, further comprising linking all the process entity identifications having a common PID.
15. The method as claimed in claim 11, further comprising selecting the database shard based on at least one sharding algorithm on the PID.
16. The method as claimed in claim 11, further comprising extracting the PID, before applying the sharding rule, using a ShardingKeyFinder mechanism, the ShardingKeyFinder mechanism enabling a sharding key (PID) to be found from the given ID.
17. The method as claimed in claim 11, wherein the sharding rule preferably comprises a hash value of the PID to identify the data source.
18. The method as claimed in claim 11, wherein the sharding rule is configurable and re-configurable.
19. The method as claimed in claim 11, further comprising linking the process entity to the process instance by embedding the PID and a separator.
Description:
TECHNICAL FIELD
The present subject matter described herein, in general, relates to data optimization and databases, and more particularly, to BPM systems and methods for scalable storage of relational process data in databases.
BACKGROUND
Business process management (BPM) is a field in operations management that focuses on improving corporate performance by managing and optimizing a company's business processes. It can therefore be described as a "process optimization process." The BPM may be achieved using a specific BPM platform implementation. BPM places a significant emphasis on business processes within the enterprise both in terms of streamlining process logic to improve efficiency and also to establish processes that are adaptable and extensible so that they can be augmented in response to business change. The business process layer represents a core part of any service-oriented architecture. From a composition perspective, it usually assumes the role of the parent service composition controller.
Business processes are visually modeled using modeling tools and are preferably represented as extensible markup language (XML) files complying with the BPMN 2.0 schema. When the processes (XML files) are deployed into the BPM, they are stored as process definitions (PDs) in the database and managed by the BPM. The process definitions and a set of configuration data are considered static data, which does not grow or get modified very frequently at runtime.
Process Instances (PIs) are created and orchestrated following the process definitions as the blue-prints. As well known in the art, the process instance represents one specific instance of a process that is currently executing. Whenever a process is started, a process instance is created that represents that specific instance that was started. It contains all runtime information related to that instance. Multiple process instances of the same process can be executed simultaneously. For example, consider a process definition that describes how to process a purchase order. Whenever a new purchase order comes in, a new process instance will be created for that purchase order. Multiple process instances (one for each purchase order) can coexist. A process instance is uniquely identified by an id. This ID generation can be customized by applications.
When the process instances are created and started, the BPM inserts data corresponding to the Process Instances and associated process inputs/variables and other related data into database. While the process instances are executed, their state data, execution traces, and other relevant runtime data are also persisted or updated into database (DB).
For example, a simple leave management process is as shown in figure 1. In this sample example, the “business process” refers to the entire vacation request workflow, beginning when an employee asks for vacation, and ending with the approval and reporting of that vacation. Consequently, the term “process instance” refers to that employee’s single request for a leave of absence, and “case management” would refer to the management of each vacation request. When an employee makes a new vacation request, that request generates a new case (process instance) in the BPM system that subsequently moves through the workflow (business process) according to the workflow design.
Here, as shown in figure 2, the BPM may need to save two types of data, i.e., the static data and the dynamic (runtime) data. The static data may include, but is not limited to, process definitions (PDs), configurations, cluster node details, and organization details such as user, role, group, etc. The runtime data may include, but is not limited to, process instance (PI) details such as the PI's current state, process variables/inputs/properties, task/activity variables/inputs/properties, process execution trace records, human task information, timer(s) (deadline/alarm), etc.
When the BPM is running, the static data remains almost fixed and its growth is minimal. The runtime data, however, grows continuously: for each process instance created, data keeps accumulating and adds overhead to the DB operations. The data pertaining to a PI and its related entities are relational in nature, i.e., they form a parent-child hierarchical relation among them. This means that the related process entities (data) need to be collocated for performing more useful relational DB queries and updates. An application may perform a query on the runtime data based on various process entities, i.e., based on the process instance (the top-most parent entity), or a human task, or a variable, or any other child entity; each of them is uniquely identified by its own primary key, and the entities pertaining to a single process instance are related to one another by foreign key relationships.
For example, an application may perform a query that involves access to multiple tables (each representing an entity in the hierarchical process entities) in a DB, such as displaying the number of days and the reason for a leave application. This may require the BPM to accumulate data from various tables. This way of relating and querying the related data is conventionally used, and it performs well when the entire process data can be stored and managed in a single database. In large scale deployments, where the data growth is very rapid and all the data cannot be accommodated in a single database, a scalable and economic solution is required that performs equally well compared with a standalone (non-distributed) database system.
Conventionally, in order to solve the above issues, the data is stored in a centralized database and the BPM cluster nodes access the data from this centralized database. The users/applications can query any type of process data from the appropriate table by knowing the PROCESSINSTANCEID (ProcessInstanceId) of the process instance to which the required process entities, such as human tasks, process variables, etc., pertain. This requires all the tables to have the PROCESSINSTANCEID as a foreign key to create a link among these data. However, when the number of process instances grows beyond what a single database can accommodate, storing them in the DB becomes a bottleneck, which in turn reduces scalability and increases the DB overhead.
The application is able to add more BPM engine nodes into the cluster, but it is not able to scale because the DB is centralized and its capacity is limited. Further, due to the increase in DB overhead, the performance of the DB degrades substantially.
SUMMARY
This summary is provided to introduce concepts related to systems and methods of scalable storage of relational process data in databases, which are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining or limiting the scope of the claimed subject matter.
A main objective of the present invention is to be able to scale-out the database compute capability and storage capacity by distributing the data and data access over multiple databases without introducing any performance overhead that is normally expected in any distributed storage solutions.
Another objective of the present invention is to ensure that the existing mode of deployment, i.e., cluster of BPM engines using a single centralized database, works absolutely fine with no or very minimal configuration-only changes, hence the solution is expected to be fully backward compatible with the prior solutions.
Yet another objective of the present invention is to provide a scalable and high-performance database storage mechanism in large scale BPM deployments where a cluster of BPM process applications handles a huge volume of concurrent business operations.
In order to provide a technical solution to the above-mentioned technical problems, the present invention provides a mechanism for scalable storage of relational process data in BPM systems. To achieve the scalable storage, the system requires the BPM to be able to:
• consistently resolve one database server among the cluster of independent database servers by applying the configured database server selection rule, such as hashing the PROCESSINSTANCEID,
• choose the datasource/connection corresponding to the resolved database server out of a pool of datasources/connections, each connected to one of the database servers, and
• perform the Create, Retrieve, Update, Delete (CRUD) operations with a single transaction created over the resolved datasource/connection. Further, in order to be able to perform CRUD operations involving the process entities that are related to the specific process instance in hand, all the related process entities are ensured to be present in the same database server to which the resolved datasource/connection is connected.
Accordingly, in one implementation, the present invention provides a system for storing a relational data. The system comprises a processor and a memory coupled to the processor for executing a plurality of modules present in the memory. The plurality of modules comprises a shard identification module, a query module, an identification generator, an execution engine, a shard id embed module, and a storage module. The shard identification module is configured to identify a plurality of database shards operating on a plurality of database servers. The query module is designed in such a way that it receives the data in a form of at least one process instance having at least one process entity associated thereto from at least one client device. The identification generator is configured to generate at least one process instance identification (PID) associated with the process instance received. The execution engine is configured to execute the process entity associated with the process instance. The shard id embed module is configured to embed the process instance identification (PID) generated in at least one identification of the process entity executed. The storage module is configured to store the data associated with the PID in at least one database shard identified, wherein the database shard is selected from a plurality of database shards and is identified, by the shard identification module, by applying at least one sharding rule based on the PID and the data sources configured.
In one implementation, the present invention provides a method for storing a relational data. The method comprises:
• identifying a plurality of database shards operating on a plurality of database servers;
• receiving, from at least one client device, the data in a form of at least one process instance having at least one process entity associated thereto;
• generating at least one process instance identification (PID) associated with the process instance received;
• executing the process entity associated with the process instance;
• embedding the process instance identification (PID) generated in at least one identification of the process entity executed;
• identifying, by applying at least one sharding rule based on the PID and the data sources configured, at least one database shard selected from a plurality of database shards;
• storing the data associated with the PID in the database shard identified.
In contrast to the available prior-art techniques, the present invention avoids placing related objects in different shards by generating IDs for the related objects in such a way that, when a shard is determined based on a part (the common parent key prefix) of the IDs of these related objects, the determined shard for all these related objects is automatically ensured to be the same. The present invention reduces the extra intelligence (manual effort as well as computing effort) needed to handle relationships between objects that might otherwise be placed in completely separate shards, as the related objects will always be collocated in the same shard. Furthermore, the present invention establishes the relation among the related objects before they are inserted into the DB by generating IDs in such a way that the shard calculation logic automatically resolves the same shard for all these related objects; the related objects are then inserted into that shard so that they are collocated in a single shard. This way, a query that needs to interact with multiple tables representing related objects does not require access to multiple shards/database servers.
In one implementation, the present invention achieves a technical advancement over the conventional knowledge when the common parent key is embedded within the other related process elements' IDs and the shard is then found based on the common parent key. This ensures that all the related process data are collocated in the same shard. However, it may be understood by a person skilled in the art that every step is required to achieve the single-shard CRUD operations.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
Figure 1 illustrates a sample BPM process for leave management system.
Figure 2 illustrates the prior system associated with the BPM process storing two types of data i.e., the static data and the dynamic (runtime) data.
Figure 3 illustrates a deployment for a BPM process, in accordance with an embodiment of the present subject matter.
Figure 4 illustrates an overall deployment of the present invention, in accordance with an embodiment of the present subject matter.
Figure 5 illustrates a process of storing/updating the process data using the BPM, in accordance with an embodiment of the present subject matter.
Figure 6 illustrates a system for storing a relational data, in accordance with an embodiment of the present subject matter.
Figure 7 illustrates a relation between each ID, in accordance with an embodiment of the present subject matter.
Figure 8 illustrates a method for storing a relational data, in accordance with an embodiment of the present subject matter.
Figure 9 illustrates a timeline diagram for storing the relational data, in accordance with an embodiment of the present subject matter.
Figure 10 illustrates a timeline diagram for execution of a join query on the stored relational data, in accordance with an embodiment of the present subject matter.
Figure 11 illustrates a timeline diagram for fetching a variable value from the stored relational data, in accordance with an embodiment of the present subject matter.
Figure 12 illustrates a timeline diagram for an application to directly specify the desired shard/data source in a thread of execution, in accordance with an embodiment of the present subject matter.
Figure 13 illustrates different ID generators, in accordance with an embodiment of the present subject matter.
Figure 14 illustrates a timeline diagram for MainPIBasedIdGenerator implementation, in accordance with an embodiment of the present subject matter.
Figure 15 illustrates a timeline diagram in which the PID is extracted using string parsing, in accordance with an embodiment of the present subject matter.
It is to be understood that the attached drawings are for purposes of illustrating the concepts of the invention and may not be to scale.
DETAILED DESCRIPTION OF THE PRESENT INVENTION
The following clearly describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely a part rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
The invention can be implemented in numerous ways, including as a process, an apparatus, a system, a composition of matter, a computer readable medium such as a computer readable storage medium or a computer network wherein program instructions are sent over optical or electronic communication links. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
Systems and methods for scalable storage of relational process data in databases are disclosed.
While aspects are described for systems and methods for scalable storage of relational process data in databases, and while the present invention may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary systems, apparatus, and methods.
In one implementation, the present invention enables the scalable storage of relational process data in a BPM system. The present invention enables the BPM system to consistently resolve one database server among the cluster of independent database servers by applying a configured database server selection rule, such as hashing the PROCESSINSTANCEID, and to choose the data source/connection corresponding to the resolved database server out of a pool of data sources/connections, each connected to one of the database servers. Further, the present invention performs the required CRUD operations with a single transaction created over the resolved data source/connection. To be able to perform CRUD operations involving the process entities that are related to the specific process instance in hand, all the related process entities are ensured to be present in the same database server to which the resolved data source/connection is connected.
In one implementation, the present invention enables the system to provide an ability to tag the related process entities/data pertaining to a specific process instance and store the related data consistently in one of the cluster of database servers based on a pluggable rule.
In one implementation, the present invention enables the system to provide an ability to query individual process entity data identified by its ID.
In one implementation, the present invention enables the system to provide an ability to query the data with join of the tables even when using multiple database servers.
In one implementation, the present invention enables the system to provide an ability to store the application data along with BPM data for more advanced join queries.
Referring now to figure 2, an existing system associated with the BPM process is shown. As shown in figure 2, multiple BPM engine nodes in the cluster are configured to connect to the same centralized database. Runtime and static data are stored in the same database; there is no separation of data, and all the data reside in the same DB. This has the limitation that the DB cannot be scaled out even if the BPM engines can scale horizontally. Also, this deployment has a single point of failure for the database.
Referring now to figure 3, a deployment for a BPM process, in accordance with an embodiment of the present subject matter, is disclosed. In one implementation, the various internal modules involved in the implementation of this invention are as shown in figure 3. Multiple datasources are configured in the engine; a request from an application is processed by multiple layers, and in the DAO layer the BPM engine identifies the appropriate datasource using the sharding technique and processes the request using that datasource.
Referring now to figure 4, an overall deployment of the present invention for a BPM process, in accordance with an embodiment of the present subject matter, is disclosed. In one implementation, an overall deployment of the present invention may look as shown in figure 4. As shown in figure 4, one of the DBs may store the static and runtime (dynamic) data while the other DBs store the runtime data. In the relational model, the collection of standalone databases or shards may be logically viewed as a single distributed database.
Referring now to figure 5, a process of storing/updating the process data using the BPM is disclosed. In one implementation, as shown in figure 5, when storing/updating the process data into any database, the target database is determined by performing the sharding based on a process ID. This sharding based on the process ID ensures that all the data related to a certain process instance are stored or updated in the same database.
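By way of illustration only, a minimal sketch of such a shard-aware store operation in a DAO layer may look like the following Java code. The class name, the direct use of JDBC, and the inline hash-based resolution are assumptions for illustration; an actual implementation may instead use the pluggable ShardingKeyFinder and ShardingRule described later in this description.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.Map;
import javax.sql.DataSource;

// Illustrative sketch only: the target shard is resolved from the PID embedded in the
// entity ID, and the store/update runs in a single transaction on that shard.
public class ShardAwareProcessDao {

    private final List<String> dataSourceNames;         // configured shard/data source names
    private final Map<String, DataSource> dataSources;  // one data source per shard

    public ShardAwareProcessDao(List<String> dataSourceNames,
                                Map<String, DataSource> dataSources) {
        this.dataSourceNames = dataSourceNames;
        this.dataSources = dataSources;
    }

    public void saveEntity(String entityId, String sql, Object... params) throws SQLException {
        // 1. Find the common parent key (PID): the text preceding the "$$$" token, if any.
        int sep = entityId.indexOf("$$$");
        String pid = (sep >= 0) ? entityId.substring(0, sep) : entityId;

        // 2. Apply a hash-based rule so that the same PID always resolves to the same shard.
        int idx = Math.abs(pid.hashCode() % dataSourceNames.size());
        DataSource ds = dataSources.get(dataSourceNames.get(idx));

        // 3. Perform the CRUD operation with a single transaction on the resolved shard.
        try (Connection con = ds.getConnection()) {
            con.setAutoCommit(false);
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                for (int i = 0; i < params.length; i++) {
                    ps.setObject(i + 1, params[i]);
                }
                ps.executeUpdate();
                con.commit();
            } catch (SQLException e) {
                con.rollback();
                throw e;
            }
        }
    }
}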
Referring now to figure 6, a system for storing a relational data, in accordance with an embodiment of the present subject matter is disclosed. In one implementation, a system (500) for storing a relational data is disclosed. The system comprises a processor (502) and a memory (506) coupled to the processor (502) for executing a plurality of modules present in the memory.
Although the present subject matter is explained considering that the storing of relational data is achieved by the system (500), it may be understood that the system (500) may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, and the like. It will be understood that the system (500) may be accessed by multiple users through one or more user devices (not shown) or applications residing on those devices (not shown). Examples of the system (500) may include, but are not limited to, a portable computer and a personal digital assistant. The system (500) may be communicatively coupled to other devices through a network (not shown).
In one implementation, the network may be a wireless network, a wired network or a combination thereof. The network can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further the network may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
In one implementation, the system (500) may include the processor (502), an interface (504), and the memory (506). The processor (502) may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor (502) is configured to fetch and execute computer-readable instructions stored in the memory (506).
The interface (504) may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The interface (504) may allow the system (500) to interact with a user directly or through the client devices. Further, the interface (504) may enable the system (500) to communicate with other computing devices, such as web servers and external data servers (not shown). The interface (504) can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The interface (504) may include one or more ports for connecting a number of devices to one another or to another server.
The memory (506) may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory (506) may include plurality of modules. The modules include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. In one implementation, the plurality of modules comprises a shard identification module (508), a query module (510), an identification generator (512), an execution engine (514), a shard id embed module (516), and a storage module (518).
In one implementation, the shard identification module (508) may be configured to identify a plurality of database shards operating on a plurality of database servers.
In one implementation, the query module (510) may be configured to receive the data in a form of at least one process instance having at least one process entity associated thereto from at least one client device.
In one implementation, the identification generator (512) may be configured to generate at least one process instance identification (PID) associated with the process instance received.
In one implementation, the execution engine (514) may be configured to execute the process entity associated with the process instance.
In one implementation, the shard id embed module (516) may be configured to embed the process instance identification (PID) generated in at least one identification of the process entity executed.
In one implementation, the storage module (518) may be configured to store the data associated with the PID in at least one database shard identified, wherein the database shard is selected from a plurality of database shards and is identified, by the shard identification module, by applying at least one sharding rule based on the PID.
In one implementation, the identification generator (512) is further configured to generate at least one process entity identification for the process entity.
In one implementation, the data having the process instance with the PID and the process entity identifications with the PID embedded therein are stored in a single database shard.
In one implementation, the identification generator is further configured to link all the process entity identifications having a common PID.
In one implementation, the database shard is selected based on at least one sharding algorithm on the PID.
In one implementation, the process entity is linked to the process instance by embedding the PID and a separator.
In one implementation, before applying the sharding rule, the PID is extracted using a ShardingKeyFinder command. It may be understood by a person skilled in the art that the command may invoke a particular mechanism or algorithm for the extraction of the PID. In one implementation, the ShardingKeyFinder mechanism is a new technique introduced to find the sharding key based on the input. Here, string parsing is performed to find the sharding key (PID) from the given ID: basically, the string preceding the $$$ token in the whole ID is extracted.
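By way of illustration only, a minimal sketch of such a finder may look like the following; the class name and method signature are assumptions for illustration, the only behavior taken from this description being the extraction of the substring preceding the $$$ token.

// Illustrative ShardingKeyFinder sketch: the sharding key (PID) is the substring that
// precedes the first "$$$" token; an ID without the token is treated as the PID itself.
public class DefaultShardingKeyFinder {

    private static final String SEPARATOR = "$$$";

    public String findShardingKey(String id) {
        if (id == null) {
            return null;
        }
        int pos = id.indexOf(SEPARATOR);
        return (pos < 0) ? id : id.substring(0, pos);
    }
}

For example, findShardingKey("PI_001$$$TASK_01$$$VAR_01") returns "PI_001", which is then used for the shard resolution.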
In one implementation, the sharding rule preferably comprises a hash value of the PID to identify the data source.
In one implementation, the sharding rule is configurable and re-configurable for more customized usage.
In one implementation, the present invention comprises at least one database shard selected from the plurality of database shards storing runtime data associated with the data, and at least one database shard selected from the plurality of database shards storing static and runtime data.
Referring now to figure 7, a relation between each ID, in accordance with an embodiment of the present subject matter, is disclosed. In one example, as the BPM may store the data onto multiple databases and may support querying (such as a ‘select’ query) any type of process entity based on any type of process entity (such as a ‘where’ clause), all process element IDs (such as primary keys) are defined such that they embed the common parent key (which may be a PID in this case) that identifies the parent process entity, i.e., the process instance, to which all other types of process entities pertain.
In one implementation, when storing the data related to a specific process instance, all the data that are related to that specific process instance are also stored in the same database. For this, from the ID of the entity, the common parent key (PID) is found as shown in figure 7.
In one implementation, while generating the key, the present invention uses the same pattern so that the customizer may split out the common parent key (PID). For example (a short sketch of this key composition is provided after the table below):
Key Description              Current Key    Updated Key
Process Instance ID (PID)    PI_001         PI_001
Human Task ID                TASK_01        PI_001$$$TASK_01
Human Task Variable ID       VAR_01         PI_001$$$TASK_01$$$VAR_01
Timer ID                     TIMER_01       PI_001$$$TIMER_01
Process Variable ID          VAR_02         PI_001$$$VAR_02
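By way of illustration only, a minimal sketch of composing such updated keys, consistent with the table above, may look like the following; the helper class and method names are assumptions for illustration.

// Illustrative key composition: each related entity's key is prefixed with its parent
// key and the "$$$" separator, so the common parent key (PID) leads every related ID.
public final class ProcessKeys {

    private static final String SEPARATOR = "$$$";

    private ProcessKeys() {
    }

    public static String childKey(String parentKey, String localKey) {
        return parentKey + SEPARATOR + localKey;
    }

    public static void main(String[] args) {
        String pid = "PI_001";
        String taskId = childKey(pid, "TASK_01");        // PI_001$$$TASK_01
        String taskVarId = childKey(taskId, "VAR_01");   // PI_001$$$TASK_01$$$VAR_01
        String timerId = childKey(pid, "TIMER_01");      // PI_001$$$TIMER_01
        System.out.println(taskId + " " + taskVarId + " " + timerId);
    }
}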
In one implementation, when any process entity is to be stored into the database, the BPM/system identifies the appropriate data source based on the process instance to which this element pertains. All the elements that pertain to the process instance are stored in the same database.
In one implementation, before storing any entity, the system/BPM resolves the common parent key (the PID in this case) and, based on it, determines the data source by applying the rule. For more flexibility the rule is made pluggable, so that an application can plug in its own rules and control the database selection.
In one implementation, the BPM may use the connected databases in a shared-nothing fashion; that is, from the perspective of performing the DB operation for the specific process instance in hand, the system does not have to behave specially or differently than usual, irrespective of whether the DB being interacted with is a single centralized DB or a DB that is part of a cluster of DBs forming the distributed database system.
In one implementation, by implementing the present invention, the system/BPM, in order to fulfill a DB CRUD operation, does not have to deal with multiple databases or be aware of the fact that there are multiple database servers running, except for the data source resolution based on the PID before performing the CRUD. This enables achieving the same CRUD performance as if the CRUD operation were conducted against a single centralized database.
It may be understood by a person skilled in the art that the process instance and the process element instances in the examples discussed in this invention are related entities. All the process element instances created while executing a process instance pertain to that process instance. They are linked (or tagged) by prefixing the PID and the $$$ separator to the IDs of the related process element instances. Like process element instances, there are other entities such as task, timer, trace, etc., and they are also linked in the same way.
In one implementation, the process element IDs (primary keys) are defined to embed the common parent key (which is the PID in this case) that identifies the parent process entity, i.e., by prefixing the common parent key (PID) and $$$ to the primary keys of the related process entities.
In one implementation, the common parent key (PID) may be found as follows: every time a process instance is created, a unique ID is generated either by the default ID generator or by a custom ID generator implemented and plugged in by the application developer. Later, when the other related process entities such as process element instances, traces, tasks, timers, etc. are created, they are created under the process execution context that has the details about the process instance currently being executed or managed, including its ID. So, the ID generator for the other types of process entities obtains the common parent key (PID) from the context.
For example, the IDs are generated either by the default ID generator or by a custom ID generator implemented and plugged in by the application developer. This ID generator does not use a DB sequence, but it uses the DB to synchronize among the BPM engine nodes in the cluster to ensure that the IDs generated across the engines are unique. For example, at the beginning, the table has the LAST_GIVEN_ID as 0. Each node, at any point of time, will assume exclusive ownership of a range of IDs. When the first node is started, it will update the table with a value equal to the existing value of LAST_GIVEN_ID in the table plus 500; thus the value will be 500. Now the first node creates IDs in the range of 1 to 500. When the second node starts, it sees 500 as the LAST_GIVEN_ID, hence it takes 501 as the starting ID and 1000 as the last ID, and updates the table accordingly, i.e., with the value 1000. In this way all the nodes assume a bucket of IDs. When a bucket of IDs is used up, the node re-obtains the next bucket of IDs following the same process. As the LAST_GIVEN_ID is persisted in the DB, the IDs will not duplicate even if the BPM engines or the DB are restarted. The bucket size (500 in this example) is configurable. When the nodes obtain and update the LAST_GIVEN_ID, the table row is locked so that there is no concurrent access to this data, hence no duplicates due to concurrent access.
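By way of illustration only, a minimal sketch of this bucket-based allocation may look like the following; the single-row table ID_ALLOCATION with column LAST_GIVEN_ID, the SELECT ... FOR UPDATE row lock, and the class name are assumptions for illustration.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

// Illustrative sketch of bucket-based ID allocation: each engine node reserves a range of
// IDs by advancing LAST_GIVEN_ID under a row lock, then hands out IDs from its local
// bucket without touching the DB again until the bucket is used up.
public class BucketIdAllocator {

    private final DataSource dataSource;
    private final int bucketSize;       // e.g. 500, configurable
    private long nextId;                // next ID to hand out from the current bucket
    private long lastIdInBucket;        // last ID owned by this node

    public BucketIdAllocator(DataSource dataSource, int bucketSize) {
        this.dataSource = dataSource;
        this.bucketSize = bucketSize;
        this.nextId = 1;
        this.lastIdInBucket = 0;        // forces a bucket reservation on first use
    }

    public synchronized long nextId() throws SQLException {
        if (nextId > lastIdInBucket) {
            reserveNextBucket();
        }
        return nextId++;
    }

    private void reserveNextBucket() throws SQLException {
        try (Connection con = dataSource.getConnection()) {
            con.setAutoCommit(false);
            // Lock the allocation row so that concurrent nodes cannot read the same value.
            try (PreparedStatement select = con.prepareStatement(
                         "SELECT LAST_GIVEN_ID FROM ID_ALLOCATION FOR UPDATE");
                 ResultSet rs = select.executeQuery()) {
                if (!rs.next()) {
                    throw new SQLException("ID_ALLOCATION row is missing");
                }
                long lastGiven = rs.getLong(1);              // e.g. 0, then 500, then 1000 ...
                long newLastGiven = lastGiven + bucketSize;
                try (PreparedStatement update = con.prepareStatement(
                             "UPDATE ID_ALLOCATION SET LAST_GIVEN_ID = ?")) {
                    update.setLong(1, newLastGiven);
                    update.executeUpdate();
                }
                con.commit();
                nextId = lastGiven + 1;                      // this node now owns (lastGiven, newLastGiven]
                lastIdInBucket = newLastGiven;
            } catch (SQLException e) {
                con.rollback();
                throw e;
            }
        }
    }
}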
Figure 8 illustrates a method for storing a relational data, in accordance with an embodiment of the present subject matter. The method may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method or alternate methods. Additionally, individual blocks may be deleted from the method without departing from the protection scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method may be considered to be implemented in the above described system (500).
At block 702, a plurality of database shards operating on a plurality of database servers are identified.
At block 704, the data in a form of at least one process instance having at least one process entity associated thereto is received from at least one client device.
At block 706, at least one process instance identification (PID) associated with the process instance received is generated.
At block 708, the process entity associated with the process instance is executed.
At block 710, the process instance identification (PID) generated is embedded in the process entity executed.
At block 712, at least one database shard is identified by applying at least one sharding rule based on the PID. The shard is selected from a plurality of database shards. In one implementation, when an element ID is given (for example, say LMS_PI_0734$$$TASK_0975), for an insert or query operation the PID is first extracted using the ShardingKeyFinder (in the case of this example, the PID is LMS_PI_0734). Once the PID is found, it is passed to the ShardingRule interface along with the configured list of data source names (bpmdasDataSource1, bpmdasDataSource2, bpmdasDataSource3). The sharding rule (a Java implementation that decides the data source based on the PID and the data sources passed) will find the appropriate data source name using any algorithm. The BPM engine provides one implementation which uses a hashing mechanism to ensure that the same data source name is used for the same PID:
shard = pid.hashCode() % dataSourceNames.size();
In another application scenario, a person skilled in the art may use a round-robin method for allocating the datasource; this allocated datasource name is embedded in the PID itself (using a process ID customizer, e.g., LMS_PI_0734#bpmdasDataSource1) and is used during a custom sharding rule implementation to determine the shard. For example, with PID = LMS_PI_0734#bpmdasDataSource1 and Task ID = LMS_PI_0734#bpmdasDataSource1$$$TASK_0975, the sharding key finder finds "LMS_PI_0734#bpmdasDataSource1" as the PID, which is the common parent key, and the custom sharding rule extracts "bpmdasDataSource1" as the shard/data source name from the PID itself.
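By way of illustration only, a minimal sketch of such a custom sharding rule, which reads the data source name embedded in the PID after the '#' character, may look like the following; the class name and the fallback behavior are assumptions for illustration.

import java.util.List;

// Illustrative custom ShardingRule for the scenario above: the data source name was
// embedded in the PID at creation time (e.g. "LMS_PI_0734#bpmdasDataSource1"), so the
// rule reads it back from the PID instead of hashing.
public class EmbeddedNameShardingRule implements ShardingRule {

    @Override
    public String getDataSourceBeanName(String pid,
                                        List dataSourceBeanNames,
                                        String defaultDataSourceBeanName) {
        int pos = (pid == null) ? -1 : pid.indexOf('#');
        if (pos >= 0) {
            String name = pid.substring(pos + 1);    // e.g. "bpmdasDataSource1"
            if (dataSourceBeanNames.contains(name)) {
                return name;
            }
        }
        return defaultDataSourceBeanName;            // fall back when nothing is embedded
    }
}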
At block 714, the data associated with the PID is stored in the database shard identified.
In one implementation, the data having the process instance with the PID and the process entity identifications with the PID embedded therein are stored in a single database shard.
In one implementation, the identification generator is further configured to link all the process entity identifications having a common PID.
In one implementation, the database shard is selected based on at least one sharding algorithm on the PID.
In one implementation, the process entity is linked to the process instance by embedding the PID and a separator.
In one implementation, before applying the sharding rule, the PID is extracted using a ShardingKeyFinder command.
In one implementation, the sharding rule preferably comprises a hash value of the PID to identify the data source.
In one implementation, the sharding rule is configurable and re-configurable.
In one implementation, the present invention comprises at least one database shard selected from the plurality of database shards storing runtime data associated with the data, and at least one database shard selected from the plurality of database shards storing static and runtime data.
Figure 9 illustrates a timeline diagram for storing the relational data, in accordance with an embodiment of the present subject matter. As shown in figure 9, when storing/updating the process data, the target database may be determined by performing the sharding based on PID. This ensures that all the data that are related to certain process instance are stored or updated in the same database.
Figure 10 illustrates a timeline diagram for execution of a join query on the stored relational data, in accordance with an embodiment of the present subject matter. As shown in figure 10, suppose the user/application retrieves all human tasks associated with a process instance; the BPM may find the target database based on the PID and execute the join query (a query joining the two tables PI and Task) against that DB.
Figure 11 illustrates a timeline diagram for fetching a variable value from the stored relational data, in accordance with an embodiment of the present subject matter. As shown in figure 11, when the application wants to fetch a variable value, the PID may be determined by parsing the variable ID. The database may then be determined based on the PID, and the query may be executed against the target database.
In one implementation, by using the present invention, any user or application may directly query an entity. Using the provided key, the database in which the data is stored may be determined by applying the sharding algorithm on the PID associated with the key. Once the database is determined, the query may be performed on that DB directly.
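By way of illustration only, a minimal sketch of such a direct lookup may look like the following; the table and column names (HUMAN_TASK, TASK_ID, TASK_NAME) and the inline hash-based resolution are assumptions for illustration.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;
import java.util.Map;
import javax.sql.DataSource;

// Illustrative direct lookup: given a human task ID, derive the PID, resolve the shard
// with the default hash rule, and run the query against that single shard only.
public class DirectEntityLookup {

    public String findTaskName(String taskId,
                               List<String> dataSourceNames,
                               Map<String, DataSource> dataSources) throws SQLException {
        int sep = taskId.indexOf("$$$");
        String pid = (sep >= 0) ? taskId.substring(0, sep) : taskId;
        int idx = Math.abs(pid.hashCode() % dataSourceNames.size());
        try (Connection con = dataSources.get(dataSourceNames.get(idx)).getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "SELECT TASK_NAME FROM HUMAN_TASK WHERE TASK_ID = ?")) {
            ps.setString(1, taskId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}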
In one implementation, by using the present invention, as all the process data is present in the same database, the user may perform a joined query of the process and its related data.
In one implementation, by using the present invention, the ID may be customized, and the user may use an application-specific sharding key in the PID.
In one implementation, the existing BPM may also use the database selected by application logic of the present invention.
In one implementation, applications use the BPM for creating and managing business processes that deal with business data. Typically, applications may need to perform CRUD operations involving both application-specific business data and the data populated and managed by the BPM. This requires that both the application data and the BPM data that are related to each other be present in the same database. The BPM supports this by letting the application specify which data source to use in the BPM API invocation context (by using a Java thread local), and the BPM will use this information, if present, to perform the DB CRUD operations within the same transaction as the application. Figure 12 illustrates a timeline diagram for creating and managing business processes that deal with business data, in accordance with an embodiment of the present subject matter. As shown in figure 12, this provides extension and flexibility to store the data along with the application data, so that the application may perform a join query with the BPM process data (as both sets of data are stored in the same database).
In one implementation, figure 12 represents another option for the application to directly specify the desired shard/data source in a thread of execution, so that the BPM also persists all its data along with the associated application data in the same shard. This can also be considered a differentiator when compared with other prior arts.
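By way of illustration only, a minimal sketch of such a thread-local hook on the application side may look like the following; the holder class and its consultation by the BPM DAO layer are assumptions for illustration and not the actual BPM API.

// Illustrative thread-local holder: the application pins a data source name for the
// current thread; shard resolution then prefers this value over the sharding rule, so
// the BPM data lands in the same database (and transaction) as the application data.
public final class ShardContext {

    private static final ThreadLocal<String> CURRENT_DATA_SOURCE = new ThreadLocal<>();

    private ShardContext() {
    }

    public static void setDataSource(String dataSourceName) {
        CURRENT_DATA_SOURCE.set(dataSourceName);
    }

    public static String getDataSource() {
        return CURRENT_DATA_SOURCE.get();
    }

    public static void clear() {
        CURRENT_DATA_SOURCE.remove();
    }
}

A hypothetical application-side usage would be to call ShardContext.setDataSource("bpmdasDataSource2") before performing the combined application and BPM operations, and ShardContext.clear() in a finally block afterwards.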
Apart from what is explained above, the present invention also includes the below-mentioned advantages:
• Customizing the sharding rule: Based on user and system requirements, applications can customize the sharding rule, which enables applications to use any type of sharding algorithm(s).
• Customizing the ID generator: Based on user and system requirements, applications can plug in their own ID generator, using which data distribution can be controlled easily.
• Highly scalable: In the relational model, the collection of standalone databases or shards can be logically viewed as a single distributed database. Since the database is distributed and elastically scalable, BPM nodes can also be scaled elastically, virtually without any limits. This enables the application(s) to support more process load.
• Highly available: Since the database is distributed, if one database server goes down, only the process data present in that database will be unavailable, and the system can continue to run fine for the other processes. With the prior solution, if the single centralized database goes down, the entire system becomes unavailable.
Apart from what is explained above, the present invention also includes the below-mentioned effects:
• The present invention greatly improves the application's scalability and performance.
• The applications can customize the database selection algorithm/sharding technique as per their logic.
• The applications ensure that the BPM data is stored along with the application data in the same database; hence CRUD operations involving both application and BPM data will perform better without having to interact with multiple database servers.
One alternative solution is to implement scalable storage using database clusters provided by database vendors, such as Oracle Real Application Cluster (RAC), which is very expensive due to its requirements for SAN/NAS storage architectures. With Oracle RAC, the cluster may run multiple database server processes, but they all connect to and operate on a shared storage. This uses the shared-everything concept, which would cause a bottleneck in disk IO operations; this is overcome by having SAN/NAS storage architectures, in which a large number of hard disks are interconnected.
Using NoSQL technologies is another alternative solution for scalable storage, but with limitations such as a lack of transactional integrity and ACID capabilities.
As compared to the existing prior-art techniques, the present invention is advanced by linking the related process data using the same common parent key and using that key for DB CRUD operations.
As compared to the existing prior-art techniques, the present invention is advanced by storing the related BPM and business/application data of a given process instance in the same database and being able to perform join queries in distributed storage.
As compared to the existing prior-art techniques, the present invention is advanced by storing the BPM data in a distributed database for a highly scalable and high-performance application, without using database cluster solutions such as Oracle RAC and/or NoSQL technologies.
WORKING EXAMPLE of creating and executing a PI with a sample: The user may create the process definition (PD) from an IDE and deploy it to the BPM engine. After deploying that PD, the user will create a process instance (PI) using the API exposed by the BPM engine.
For example,
public String createAndStartProcessInstance(String processDefinitionId, Map variables, String creatorId);
This API may first create a process instance (PI) object, generate a unique ID for the PI, and return that ID to the user.
An end user "John" applies for leave using this LMS; one leave flow (one PI) is then created by the BPM engine.
The LMS code may look like:
Map variable = new HashMap();
variable.put("NumberOfDays", 2);
variable.put("Reason", "Friends marriage");
String applicationId = bpmEngine.getExecutionService().
        createAndStartProcessInstance("LeaveApplication", variable, "John");
In this sample, the applicationId is the Process Instance ID (PID). The BPM engine may internally create this PID using the ID generator, which generates unique IDs.
After the unique id is generated, BPM engine may give a callback to the listener
com.soa.foundation.bpm.base.idgenerator.ProcessIDCustomizer
A sample implementation of this may look like the following: in the callback, the application may get the auto-generated ID using "context.getCurrentId();" and, after obtaining it, the application can customize that ID.
public class SampleProcessIDCustomizerImpl implements ProcessIDCustomizer
{
    public String customize(ProcessIDCustomizationContext context)
    {
        String currentId = context.getCurrentId(); // Say PI_01
        String customizedId = "LMS_" + currentId;
        return customizedId;
    }
}
The application may customize the PID as per its convenience. In this sample it would be LMS_PI_01.
Now, when the process is executing, the BPM engine will create elements and tasks based on the process, and for each process element it will create a unique ID internally.
For creating the IDs, the BPM engine will use the ModuleIDGenerator (interface) as below. Depending on the element type, the corresponding implementation will be used by the BPM engine for generating the IDs.
The BPM engine supports this pluggable ID generator for each module as shown in figure 13. When the distributed data source is used, all the ID generator implementations may be wrapped using the MainPIBasedIdGenerator implementation, which will link the main PID and the element ID together.
When any ID has to be generated, BPM engine may invoke the following API in ModuleIDGenerator
public String generateID(BaseContext paramBaseContext);
The MainPIBasedIdGenerator implementation may use the base context and link the element ID with the PID (a sketch of such a wrapper is provided below).
In the above sample, as shown in figure 14, when the process is started, it may first create one process element for the SecretaryVerification element using the ElementIdGenerator; the generated ID will be like EI_01. When the application is using the distributed database, the MainPIBasedIdGenerator will link it to the PID, and the ID generated will be like LMS_PI_01$$$EI_01.
In similar lines, the other ID generators will also be wrapped, and the generated IDs will be linked with the main process instance ID (PID).
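By way of illustration only, a minimal sketch of such a wrapping generator may look like the following; the accessor getProcessInstanceId() on BaseContext is an assumption for illustration, the only behavior taken from this description being the prefixing of the PID and the $$$ separator.

// Illustrative sketch of MainPIBasedIdGenerator: it delegates to the module's own ID
// generator and prefixes the result with the PID taken from the execution context,
// producing IDs like "LMS_PI_01$$$EI_01".
public class MainPIBasedIdGenerator implements ModuleIDGenerator {

    private static final String SEPARATOR = "$$$";

    private final ModuleIDGenerator delegate;    // e.g. the ElementIdGenerator

    public MainPIBasedIdGenerator(ModuleIDGenerator delegate) {
        this.delegate = delegate;
    }

    @Override
    public String generateID(BaseContext context) {
        String localId = delegate.generateID(context);    // e.g. "EI_01"
        String pid = context.getProcessInstanceId();      // assumed accessor, e.g. "LMS_PI_01"
        return (pid == null) ? localId : pid + SEPARATOR + localId;
    }
}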
WORKING EXAMPLE of selecting the data source for storage: In the distributed storage case, the available data sources will be configured by the application as below:
bpm.distributed.datasources.bean.names=bpmdasDataSource1;
bpmdasDataSource2;bpmdasDataSource3
When the BPM engine needs to save the data to the DB, as shown in figure 15, it will invoke the ShardingRule to find the datasource to be used for this process instance (PI).
The API may be like
public interface ShardingRule
{
public String getDataSourceBeanName(String PID, List dataSourceBeanNames, String defaultDataSourceBeanName);
}
The BPM engine may use the data source that is returned from this implementation. Since all the elements are linked with the process instance ID in the format (PID$$$ID), before invoking the ShardingRule the PID will be extracted using the ShardingKeyFinder.
In the above example, from the element ID (LMS_PI_01$$$EI_01) the PID will be extracted using string parsing.
WORKING EXAMPLE of Sharding Rule customization: By default, the BPM engine provides one sharding rule implementation which works based on the hash value of the PID:
int idx = shardingKey.hashCode() % dataSourceBeanNames.size();
idx = Math.abs(idx);
return dataSourceBeanNames.get(idx);
The application can provide its own ShardingRule and configure it using the configuration:
bpm.distributed.datasources.shardingrule.bean.id=LMS_ShardingRule
For example:
LMS can have their own ShardingRule like
public class LMSShardingRule implements ShardingRule
{
    @Override
    public String getDataSourceBeanName(
            String shardingKey,
            List dataSourceBeanNames,
            String defaultDataSourceBeanName)
    {
        String dataSource = defaultDataSourceBeanName; // custom logic to choose the data source goes here
        return dataSource;
    }
}
In one implementation, based on the above example, step-by-step details of the present invention are provided below:
Before starting the system, the user has to perform the following configuration changes:
Step : 1 Configure the data sources that have to be used by using the configuration
bpm.distributed.datasources.bean.names=bpmdasDataSource1;bpmdasDataSource2;bpmdasDataSource3
Step : 2 Application may configure the custom sharding rule (Optional)
bpm.distributed.datasources.shardingrule.bean.id=LMS_ShardingRule
Step : 3 During system startup, BPM engine will identify that multi data source has to be used and will wrap all the id generators with MainPIBasedIdGenerator
Step : 4 Based on the configuration, ShardingRule will be initialized.
Step by step details for the Leave application flow execution:
Step : 1 End user creates the leave application in the Leave Management System (say the Leave Management System (LMS) is an application developed using BPM)
Step : 2 The application (LMS) invokes createAndStartProcessInstance with the Process definition ID and the variables that have to be used.
Step : 3 The BPM engine will create a unique PID (using the BPM IDService for generating the unique id). (In our sample it will be PI_01)
Step : 4 The BPM engine will invoke ProcessIDCustomizer for customizing the PID. (Optional) (In this step the ID will become LMS_PI_01)
Step : 5 Once the PI is started, the process element for SecretaryVerification will be created.
Step : 6 As part of the process element creation, the BPM engine will invoke the ModuleIDGenerator to generate a unique id for this element.
a. Here the element Id generator will first create the ID like (EI_01)
b. It will be linked with the main process instance id (like LMS_PI_01$$$EI_01)
Step : 7 Element execution will be initiated and application logic will be executed.
Step : 8 After the execution, the BPM engine will save the data to the database.
Step : 9 The BPM engine will invoke the ShardingKeyFinder by passing the element Id (In our sample, the ID passed will be LMS_PI_01$$$EI_01)
Step : 10 The ShardingKeyFinder will extract the PID and return it (it will return LMS_PI_01). (This step is the inverse of what is done by MainPIBasedIdGenerator in Step 6.)
Step : 11 Using this PID, the ShardingRule will be invoked
Step : 12 The DB operation will be performed on the DB that is returned by the ShardingRule, as sketched below.
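As an illustration of Steps 9 to 12 only, the save path could be wired together as in the sketch below; the use of a Map from bean name to javax.sql.DataSource, the class name, and the resolve() method are assumptions made for this sketch.
import java.util.List;
import java.util.Map;
import javax.sql.DataSource;

// Illustrative sketch only of Steps 9 to 12: extract the PID from the element
// id, resolve the data source bean name through the configured ShardingRule,
// and look up the corresponding data source on which the DB operation will be
// performed. The Map from bean name to javax.sql.DataSource and the method
// names of this class are assumptions made for this sketch.
public class DistributedStorageResolver
{
    private static final String SEPARATOR = "$$$";

    private final ShardingRule shardingRule;
    private final List<String> dataSourceBeanNames;
    private final String defaultDataSourceBeanName;
    private final Map<String, DataSource> dataSourcesByBeanName;

    public DistributedStorageResolver(ShardingRule shardingRule,
                                      List<String> dataSourceBeanNames,
                                      String defaultDataSourceBeanName,
                                      Map<String, DataSource> dataSourcesByBeanName)
    {
        this.shardingRule = shardingRule;
        this.dataSourceBeanNames = dataSourceBeanNames;
        this.defaultDataSourceBeanName = defaultDataSourceBeanName;
        this.dataSourcesByBeanName = dataSourcesByBeanName;
    }

    public DataSource resolve(String elementId)
    {
        // Steps 9 and 10: LMS_PI_01$$$EI_01 -> LMS_PI_01
        int idx = elementId.indexOf(SEPARATOR);
        String pid = (idx < 0) ? elementId : elementId.substring(0, idx);
        // Step 11: resolve the data source bean name for this PID
        String beanName = shardingRule.getDataSourceBeanName(
            pid, dataSourceBeanNames, defaultDataSourceBeanName);
        // Step 12: the DB operation is performed against this data source
        return dataSourcesByBeanName.get(beanName);
    }
}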
A person of ordinary skill in the art may be aware that in combination with the examples described in the embodiments disclosed in this specification, units and algorithm steps may be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether the functions are performed by hardware or software depends on the particular applications and design constraint conditions of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but it should not be considered that the implementation goes beyond the scope of the present invention.
It may be clearly understood by a person skilled in the art that for the purpose of convenient and brief description, for a detailed working process of the foregoing system, apparatus, and unit, reference may be made to a corresponding process in the foregoing method embodiments, and details are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiment is merely exemplary. For example, the unit division is merely logical function division and may be other division in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
When the functions are implemented in a form of a software functional unit and sold or used as an independent product, the functions may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or a part of the technical solutions may be implemented in a form of a software product. The computer software product is stored in a storage medium, and includes several instructions for instructing a computer device (which may be a personal computer, a server, or a network device) to perform all or a part of the steps of the methods described in the embodiment of the present invention. The foregoing storage medium includes: any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
Although implementations for systems and methods for scalable storage of relational process data in databases have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations of the systems and methods for scalable storage of relational process data in databases.