
Parallel Recovery For Shared Disk Databases

Abstract: In the online replay process, WAL records are replayed in ascending GLSN order, but it is possible that one or more WAL senders lag behind. Moreover, the GLSN of successive WAL records is not strictly incremented by 1 and may contain gaps; such gaps are skipped once they occur. If the lagging WAL sender can be identified, data recovery can proceed in parallel. The present invention provides a mechanism to achieve parallel data recovery for shared-disk databases. According to the present invention, for every page transfer between two nodes, a special WAL record is inserted at the receiver, acting as a holding point (condition). When a data recovery thread encounters this particular log, it looks at the other node processing data recovery and checks whether the corresponding log has been replayed. If yes, it proceeds to commit; otherwise, it continues to wait.


Patent Information

Application #
201747006912
Filing Date
27 February 2017
Publication Number
31/2017
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
abhishek.sen@majumdarip.com
Parent Application
Patent Number
302533
Legal Status
Grant Date
2018-10-29
Renewal Date

Applicants

HUAWEI TECHNOLOGIES INDIA PVT. LTD
Company of Syno 37, 46, 45/3, 45/4 Etc. Kno 1540, Kundalahalli Village, Bengaluru, Karnataka 560037, India.

Inventors

1. SREEKANTAIAH, Nirmala
SYNO 37, 46, 45/3, 45/4 ETC., KNO 1540, Kundalahalli Village, Bengaluru, Karnataka 560037, India.
2. NIE, Yuanyuan
SYNO 37, 46, 45/3, 45/4 ETC., KNO 1540, Kundalahalli Village, Bengaluru, Karnataka 560037, India.
3. LI, Haifeng
SYNO 37, 46, 45/3, 45/4 ETC., KNO 1540, Kundalahalli Village, Bengaluru, Karnataka 560037, India.

Specification

TECHNICAL FIELD
The present subject matter described herein, in general, relates to database technologies, and more particularly, to parallel recovery for shared-disk databases.
BACKGROUND
As conventionally known, a database system provides a high-level view of data, but ultimately the data have to be stored as bits on one or more storage nodes. A vast majority of databases today store data on magnetic disk (and, increasingly, on flash storage) and fetch data into main memory for processing, or copy data onto tapes and other backup nodes for archival storage. The physical characteristics of storage nodes play a major role in the way data are stored, in particular because access to a random piece of data on disk is much slower than memory access: Disk access takes tens of milliseconds, whereas memory access takes a tenth of a microsecond. The database system can be a distributed database system, wherein the database is distributed over multiple disparate computers or nodes. Shared-disk databases fall into a general category where multiple database instances share some physical storage resource. With a shared-disk architecture, multiple nodes coordinate access to a shared storage system at a block level.
A database management system (DBMS) is generally system software for creating and managing databases. The DBMS provides users and programmers with a systematic way to create, retrieve, update and manage data. The DBMS is a collection of programs that enables users to store, modify, and extract information from a database. DBMSs have had crash recovery for many years. In a DBMS, ACID (Atomicity, Consistency, Isolation, and Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction. High availability features in a DBMS are about ensuring that a database system remains operational during both planned and unplanned outages, such as maintenance operations, hardware/network failures, and the like. Further, database replication is the process of ensuring that a copy of the data exists on a different machine, to improve reliability, fault tolerance and availability.
As conventionally known, database replication is the frequent electronic copying of data from a database on one computer or server to a database on another, so that all users can access the information in the event of failure of the place where the original data modification took place. The place where the change originates is termed the master, and the place where the data is replicated to is termed the standby. Database replication can either be physical, i.e., log-shipping, or logical, i.e., command-shipping. Database replication can also be synchronous, wherein an application's wait time includes the changes in the originator node and the time to safely commit in the replica, or asynchronous, wherein an application gets a response immediately after the data is safely committed in the originator node, and the originator takes responsibility for asynchronously committing the data on the standby. Replication can also be done on distributed database clusters, ensuring availability of a fully functional standby cluster in the event of failure of the master cluster. Each of these clusters can be made up of one or more computers/servers/devices/nodes and can host single/multiple databases. Additionally, these clusters can also have a centralized coordinator, which coordinates all the activities related to data management.
Failover is a scenario where the master, or the node that was transferring the logs, is not available to the application (say, the master has crashed) and the standby has to take over the role of the master. Switchover is a scenario where the application (or coordinator) instructs the master to become the standby, and the existing standby to become the new master.

However, all the existing shared-disk replication solutions which aim at providing high availability insist that only one node of the standby cluster handle recovery during switchover/failover operations before the cluster can become fully functional. This node from the standby cluster is responsible for replaying all the redo logs that are shipped from the erstwhile master. The time taken by this single node to recover the cluster is proportional to the size of the redo logs that are yet to be applied. Once the recovery is completed by this single node for the whole cluster, all the other nodes can come up and handle applications in parallel. As the whole of the cluster recovery is handled by a single node, a considerable amount of time is taken to recover the cluster, and the availability of the cluster suffers when a huge amount of redo logs must be replayed during recovery.
SUMMARY
This summary is provided to introduce concepts related to parallel recovery for shared-disk databases, and the same are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter nor is it intended for use in determining or limiting the scope of the claimed subject matter.
A main objective of the present invention is to provide a system and methods for faster data recovery in shared-disk databases.
The present invention also provides a system and method for fast data recovery during switchover/failover where the clusters are present across geographical locations.
The present invention also provides a system and method for fast data recovery during switchover/failover to reduce the data recovery time, so that the database can be on-line in much less time.

In one implementation, according to the present invention, at least two data recovery nodes/threads wait for each other if they share a page between them. When this information is also logged as part of the WAL, the nodes/threads decide which nodes/threads go ahead and which need to wait. For every page transfer between two nodes, a special log record (WAL) is inserted at the receiver, acting as a holding point; the holding point represents a condition, meaning: wait until node X has replayed a WAL record whose GLSN >= the page's GLSN. When a data recovery node/thread encounters this particular special log record, it looks at the other node/thread processing data recovery, to check if the corresponding log is replayed. If yes, it will go ahead and commit; otherwise, it will continue to wait. Thus, multiple nodes/threads can participate in data recovery during switchover/failover, thereby sharing the load and reducing the data recovery time.
Accordingly, in one implementation, the present invention provides a database system for data recovery in at least one shared-disk database. The database system comprises a master cluster comprising a first device, and a standby cluster comprising a second device. The first device of the master cluster is adapted to transmit at least a log to the second device of the standby cluster, wherein the log contains at least information for modifying the shared-disk database and at least a pre-defined condition for recovery during switchover/failover. The second device of the standby cluster is adapted to perform recovery based on the log received from the first device.
In one implementation, the present invention provides a first device for data recovery in at least one shared-disk database. The first device includes a processor, and a memory coupled to the processor for executing a plurality of modules present in the memory. The plurality of modules includes a log generation module and a transmitting module. The log generation module is configured to generate at least one log containing at least information for modifying the shared-disk database and at least one pre-defined condition for recovery during switchover/failover. The transmitting module is configured to transmit the at least one log to at least one device of a standby cluster.

In one implementation, the present invention provides a second device for data recovery in at least one shared-disk database. The second device includes a processor, and a memory coupled to the processor for executing a plurality of modules present in the memory. The plurality of modules includes a receiving module, a checking module, and an execution module. The receiving module is configured to receive at least a log from at least one device, the log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover. The checking module is configured to check if the information received in the log to modify the shared-disk database has been replayed. The execution module is configured to commit the information to modify the shared-disk database, or hold the commit based on the pre-defined condition for data recovery.
In one implementation, the present invention provides a method for data recovery in at least one shared-disk database. The method comprises transmitting, by a first device comprised in a master cluster, at least a log to at least one second device comprised in a standby cluster, the log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover; and performing, by the second device, the data recovery based on the log received from the first device.
In one implementation, a method, performed by a first device, to achieve data recovery in at least one shared-disk database is disclosed. The method includes generating at least a log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover; and transmitting the log generated to at least one second device to perform data recovery based on the log received.
In one implementation, a method, performed by a second device, for data recovery in at least one shared-disk database is disclosed. The method includes receiving at least a log from at least one first device, the log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover; checking if the information received in the log to modify the shared-disk database has been replayed; and committing the information to modify the shared-disk database, or holding the commit based on the pre-defined condition for data recovery. The pre-defined condition comprises a data recovery condition indicating that the second device holds the data recovery of the log received from the first device, upon determining that the first device of the master cluster lags in transmitting the log, until at least one further log is obtained from the first device; the second device holds until the data recovery is completed at the first device.
In contrast to the prior-art techniques, the present invention enables the usage of multiple nodes in a cluster for data recovery during switchover/failover, thereby sharing the load and reducing the data recovery time. Further, the present invention assists in disaster data recovery scenarios where the clusters are present across geographical locations.

The various options and preferred embodiments referred to above in relation to the first implementation are also applicable in relation to the other implementations.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
Figure 1 illustrates a first device from a master cluster of at least two or more devices to transmit a log, in accordance with an embodiment of the present subject matter.

Figure 2 illustrates a second device from a standby cluster of at least two or more devices to achieve data recovery in at least one shared-disk database, in accordance with an embodiment of the present subject matter.
Figure 3 illustrates a method comprising a first device of the master cluster and a second device of the standby cluster to provide availability, fault tolerance and reliability.
Figure 4 illustrates a method, performed by a second device, from a standby cluster of at least two or more devices, for data recovery in at least one shared-disk database, in accordance with an embodiment of the present subject matter.
Figure 5 illustrates an overall processing of multiple nodes which can participate in data recovery during switchover/failover, in accordance with an embodiment of the present subject matter.
It is to be understood that the attached drawings are for purposes of illustrating the concepts of the invention and may not be to scale.
DETAILED DESCRIPTION OF THE PRESENT INVENTION
The following clearly describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. The described embodiments are merely a part rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
The invention can be implemented in numerous ways, as a process, an apparatus, a system, a composition of matter, a computer readable medium such as a computer readable storage medium, or a computer network wherein program instructions are sent over optical or electronic communication links. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims, and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example, and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
It may be noted by the person skilled in the art that shared-disk databases fall into a general category where multiple database instances share the same physical storage resource. With a shared-disk architecture, multiple nodes coordinate access to a shared storage system at a block level. During processing in the shared-disk architecture, redo logs are generated in shared-disk clusters. It may be understood that the redo logs are in a proprietary format which logs a history of all changes made to the database. Each redo log file consists of redo records. A redo record, also called a redo entry, holds a group of change vectors, each of which describes or represents a change made to a single block in the database. When a transaction is committed, the transaction's details in the redo log buffer are written to a redo log file.

Further, write-ahead logging (WAL) is a family of techniques for providing atomicity and durability (two of the ACID properties) in database systems. In a system using WAL, all modifications are written to a log before they are applied.
In a master/standby kind of replication, the WAL has to be applied in order on the standby node of the database to maintain consistency of the data that was changed as part of a transaction on the master. In order to facilitate this, each WAL record contains an LSN (Log Sequence Number). A WAL record with LSN 'x' has to be applied on the database before a WAL record with LSN 'x'+'y' can be applied, wherein both 'x' and 'y' are positive integers. For shared-disk clusters, the LSN is replaced by a GLSN (Global Log Sequence Number), the uses of which are described in further sections.
The redo logs (WAL) generated in shared-disk clusters are ordered by a particular number, the GLSN. The GLSN is global across all nodes of the cluster. The GLSN may be used to define the data recovery point for a restore operation. Generally, the GLSNs are used internally during a RESTORE sequence to track the point in time to which data has been restored. Every record in the transaction log is uniquely identified by the GLSN. GLSNs are ordered such that if GLSN 2 is greater than GLSN 1, the change described by the log record referred to by GLSN 2 occurred after the change described by the log record referred to by GLSN 1. The logs generated in the cluster are ordered using the GLSN.
The WAL with GLSN can be sent from the master to the standby by a single dedicated node, or alternatively each node of the master cluster can take the responsibility of transmitting the WAL generated by that node. The entities which send WAL may be termed WAL senders. In general, a cluster has multiple WAL senders for load balancing and efficiency. Consequently, on the standby cluster, there are nodes which will receive WAL logs and write them to shared storage. These entities are termed WAL receivers. There can be multiple WAL receivers in a standby cluster. Generally, there is a one-to-one mapping between WAL senders and WAL receivers; however, this is not mandatory.

The WAL logs received by the WAL receivers may be applied to the database to bring it to the same state as the master cluster. This process is termed data recovery.
The data recovery may be of two types: a cold data recovery (where the logs are already present, as in the case of failover), and an online data recovery (where the logs are being continuously replayed from multiple nodes in the master to the standby, as in the case of switchover). In the cold data recovery process, the WAL records to be replayed are already present in the standby shared-disk. Across the set of WAL files, the replay order may be determined by a simple sort-merge algorithm, according to the ascending GLSN order of the log records. This is the procedure followed during failover operations.
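For illustration, the cold-recovery replay order described above can be produced with a standard k-way sort-merge over the per-sender WAL files; the sketch below is an assumption about the stream shape (sorted (glsn, record) tuples), not the disclosed on-disk format, and uses Python's heapq.merge for the merge step.

```python
import heapq

def cold_recovery_order(wal_streams):
    """k-way sort-merge of WAL streams, each already sorted by ascending
    GLSN, yielding the single global replay order.

    wal_streams: iterable of iterables of (glsn, record) tuples.
    """
    # heapq.merge lazily merges the sorted inputs without loading them all.
    for glsn, record in heapq.merge(*wal_streams, key=lambda r: r[0]):
        yield glsn, record

# Example: two WAL files with GLSN gaps; the merged replay order is 3,5,8,9,12.
a = [(3, "a1"), (8, "a2"), (12, "a3")]
b = [(5, "b1"), (9, "b2")]
assert [g for g, _ in cold_recovery_order([a, b])] == [3, 5, 8, 9, 12]
```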
During a switchover operation, the WAL is being continuously transmitted from the sender process. In the WAL replay process, the replay order is according to the ascending order of the GLSN, but it may be possible that one or more WAL sender(s) lag behind. Also, the GLSN of each WAL record may not be strictly incremented by 1; there might be gaps. The gaps, once they occur, may not be skipped; instead, the system may have to wait until the WAL sender that is lagging behind catches up.
Hence, it becomes of critical importance to know which WAL sender lags behind so that a parallel data recovery is possible. The present invention solves the above issues, thereby making data recovery faster.
Accordingly, systems and methods for fast data recovery during switchover/failover are disclosed.
While aspects are described for systems and methods for fast data recovery during switchover/failover, the present invention may be implemented in any number of different computing systems, environments, and/or configurations; the embodiments are described in the context of the following exemplary systems, apparatus, and methods.
In one implementation, according to the present invention, if two transactions make changes to the same page in the database on the master, parallel data recovery threads on the standby may have to wait for one another. Further, if this information is also logged as part of the WAL, the threads may decide which data recovery threads can go ahead and which need to wait, thereby identifying the lagging WAL sender and holding data recovery for that WAL sender until further logs from that sender are obtained.
In one implementation, a database system for data recovery in at least one shared-disk database is disclosed. Although the present subject matter is explained considering that the present invention is implemented in a database system, it may be understood that the present invention may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, and the like. It will be understood that the database system may be accessed by multiple users, or applications residing on the database system. Examples of the database system may include, but are not limited to, a portable computer, a personal digital assistant, a handheld node, sensors, routers, gateways and a workstation. The database system is communicatively coupled to other nodes or apparatuses to form a network (not shown).
In one implementation, the database system may comprise at least one first device 100 and at least one second device 200. Although the present subject matter is explained considering that the present invention is implemented using the first device 100 and the second device 200, it may be understood that the present invention may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, and the like. It will be understood that the first device 100 and the second device 200 may be accessed by multiple users, or applications residing on the database system. Examples of the first device 100 and the second device 200 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld node, sensors, routers, gateways and a workstation. The first device 100 and the second device 200 may be communicatively coupled to each other and/or to other nodes or apparatuses to form a network (not shown).
The database system, the first device 100 and the second device 200 are communicatively coupled to each other and/or to other nodes or apparatuses to form a network (not shown). In one implementation, the network (not shown) may be a wireless network, a wired network or a combination thereof. The network can be implemented as one of the different types of networks, such as GSM, CDMA, LTE, UMTS, intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further, the network may include a variety of network nodes, including routers, bridges, servers, computing nodes, storage nodes, and the like.
The database system, the first device 100 and the second device 200 may each include one or more processors, an interface, and a memory. The processor may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any nodes that manipulate signals based on operational instructions. Among other capabilities, the at least one processor is configured to fetch and execute computer-readable instructions or modules stored in the memory.
The interface (I/O interface), for example interface 104/204, may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface may allow the database system, the first device 100 and the second device 200 to interact with a user directly. Further, the I/O interface may enable the database system, the first device 100 and the second device 200 to communicate with other computing nodes, such as web servers and external data servers (not shown). The I/O interface can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, GSM, CDMA, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface may include one or more ports for connecting a number of nodes to one another or to another server. The I/O interface may provide interaction between the user and the database system, the first device 100 and the second device 200 via a screen provided for the interface.
The memory may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory may include a plurality of instructions or modules or applications to perform various functionalities. The memory includes routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types.
In one implementation, each of the clusters (master cluster or standby cluster) can be made up of one or more computers/servers/devices/nodes and can host single/multiple databases. Additionally, these clusters can also have a centralized coordinator, which coordinates all the activities related to data management.

Accordingly, in one implementation, the present invention provides a database system for data recovery in at least one shared-disk database. The database system comprises at least one cluster comprising at least two or more devices. At least one first device 100 from the two or more devices is adapted to transmit at least a log to at least one second device 200 from the two or more devices, the log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover. The second device 200 is adapted to perform data recovery based on the log received from the first device 100.
In one implementation, the present invention provides a database system for data recovery in at least one shared-disk database. The database system comprises a master cluster comprising a first device 100, and a standby cluster comprising a second device 200. The first device 100 of the master cluster is adapted to transmit at least a log to the second device 200 of the standby cluster, wherein the log contains at least information for modifying the shared-disk database and at least a pre-defined condition for recovery during switchover/failover. The second device of the standby cluster is adapted to perform recovery based on the log received from the first device.
The pre-defined condition may include a data recovery condition indicating that the second device holds the data recovery of the log received from the first device, upon determining that the first device of the master cluster lags in transmitting the log, until at least one further log is obtained from the first device; the second device holds until the data recovery is completed at the first device.
In one implementation, the log is at least a write-ahead logging (WAL) pre-arranged using at least a Global Log Sequence Number (GLSN).

In one implementation, the pre-defined condition for data recovery may include checking if the information received in the log to modify the shared-disk database is replayed, based on the GLSN.
In one implementation, the second device is adapted to commit the information to modify the shared-disk database or hold the commit based on the pre-defined condition for data recovery.
In one implementation, the second device is adapted to utilize one or more data recovery threads/nodes to replay the information received in the log to modify the shared-disk database; commit the information to modify the shared-disk database; or hold the commit based on the pre-defined condition for data recovery.
In one implementation, the database system is characterized by a parallel data recovery in at least one shared-disk database.
Referring now to figure 1, a first device 100 from a master cluster of at least two or more devices adapted to transmit a log for data recovery in at least one shared-disk database is illustrated, in accordance with an embodiment of the present subject matter. The first device 100 from the master cluster comprises a processor 102, and a memory 106 coupled to the processor 102 for executing a plurality of modules present in the memory 106. The plurality of modules may include a log generation module 110 and a transmitting module 112. The memory 106 may further include a database storage 108 configured to store database information. The log generation module 110 is configured to generate at least a log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover. The transmitting module 112 is configured to transmit the log to at least one second device from a standby cluster having two or more devices.
The pre-defined condition may include a data recovery condition indicating that the second device holds the data recovery of the log received from the first device, upon determining that the first device of the master cluster lags in transmitting the log, until at least one further log is obtained from the first device; the second device holds until the data recovery is completed at the first device.
In one implementation, the log is at least a write-ahead logging (WAL) pre-arranged using at least a Global Log Sequence Number (GLSN).
In one implementation, the pre-defined condition for data recovery comprises: checking if the information received in the log to modify the shared-disk database is replayed based on the GLSN.
Referring now to figure 2, a second device 200 from a standby cluster of at least two or more devices for data recovery in at least one shared-disk database is illustrated, in accordance with an embodiment of the present subject matter. The second device 200 of the standby cluster comprises a processor 202 and a memory 206 coupled to the processor 202 for executing a plurality of modules present in the memory 206. The plurality of modules comprises a receiving module 210, a checking module 212, and an execution module 214. The receiving module 210 is configured to receive at least a log from at least one first device from a master cluster comprising the two or more devices, the log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover. The checking module 212 is configured to check if the information received in the log to modify the shared-disk database has been replayed in the database information stored in database storage 208. The execution module 214 is configured to commit the information to modify the shared-disk database, or hold the commit based on the pre-defined condition for data recovery.
The pre-defined condition may include a data recovery condition indicating that the second device holds the data recovery of the log received from the first device, upon determining that the first device of the master cluster lags in transmitting the log, until at least one further log is obtained from the first device; the second device holds until the data recovery is completed at the first device.
In one implementation, the log is at least a write-ahead logging (WAL) pre-arranged using at least a Global Log Sequence Number (GLSN).
In one implementation, the pre-defined condition for data recovery may include checking if the information received in the log to modify the shared-disk database is replayed based on the GLSN.
In one implementation, the second device 200 may further utilize one or more data recovery threads/nodes to replay the information received in the log to modify the shared-disk database, commit the information to modify the shared-disk database, or hold the commit based on the pre-defined condition for data recovery.
Referring now to figure 3, a method comprising a first device of the master cluster and a second device of the standby cluster to provide availability, fault tolerance and reliability is illustrated. In one implementation, as shown in figure 3, replication can be done on distributed database clusters, ensuring availability of a fully functional standby cluster in the event of failure of the master cluster. Each of these clusters can be made up of one or more computers/servers and can host single/multiple databases. Additionally, these clusters can also have a centralized coordinator, which coordinates all the activities related to data management.
Referring now to figure 4, a method, performed by a second device, from a standby cluster of at least two or more devices, for data recovery in at least one shared-disk database is disclosed, in accordance with an embodiment of the present subject matter. The method may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method or alternate methods. Additionally, individual blocks may be deleted from the method without departing from the protection scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method may be considered to be implemented in the above described database system, the first device 100 and/or the second device 200.
In one implementation, for every page transfer between two nodes, a special WAL record has to be inserted at the receiver, acting as a holding point; the holding point represents a condition, meaning: wait until node X has replayed a WAL record whose GLSN >= the page's GLSN. When a data recovery thread encounters this particular log, it will look at the other node processing data recovery, to see if the corresponding log is replayed. If yes, it will go ahead to commit; otherwise, it will continue to wait.
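For illustration only, the receiver-side insertion of the holding point might be sketched as below; the record layout, the Page class and the on_page_received hook are assumptions for the sketch, not the disclosed format.

```python
from dataclasses import dataclass

@dataclass
class Page:
    page_id: int
    glsn: int  # GLSN of the last change applied to this page (hypothetical field)

def on_page_received(wal_log: list, sender_node: str, page: Page) -> None:
    """Hypothetical receiver-side hook: when a page arrives from another node,
    insert a special holding-point WAL record that encodes the condition
    'wait until sender_node has replayed WAL with GLSN >= page.glsn'."""
    wal_log.append({
        "type": "HOLDING_POINT",
        "wait_for_node": sender_node,  # node X in the condition
        "wait_for_glsn": page.glsn,    # the page's GLSN at transfer time
    })
```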
Referring now to figure 4, a method, performed by the second device from a standby cluster of at least two or more devices, for data recovery in at least one shared-disk database is disclosed.
At block 401, a node/thread starts out by reading a WAL from the WAL file.
At block 402, the WAL is checked to see if it contains the holding point.

If the WAL does not contain a holding point, at block 403, the changes mentioned in the WAL are committed to the database. The holding point may include a data recovery condition indicating that the second device holds the data recovery of the log received from the first device, upon determining that the first device of the master cluster lags in transmitting the log, until at least one further log is obtained from the first device; the second device holds until the data recovery is completed at the first device.
If the WAL contains a holding point, at block 404, the node checks to see if the GLSN mentioned in the holding point has been applied to the page or not. If it is already applied, then at block 403 the WAL is committed; otherwise, the node waits at block 405 for the WAL to be applied.
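The flow of blocks 401-405 could be sketched as the loop below; this is a minimal illustration under assumed names, where replayed_glsn is a shared mapping from node to the highest GLSN already replayed there, rec is a dict-shaped WAL record, and db is any object with a commit method.

```python
import time

def recovery_loop(wal_records, replayed_glsn, node_id, db):
    """Illustrative replay loop for one recovery node/thread (blocks 401-405)."""
    for rec in wal_records:                          # block 401: read a WAL record
        hp = rec.get("holding_point")                # block 402: holding point present?
        if hp is not None:
            # block 404: check whether node X has already replayed GLSN >= the
            # page's GLSN; block 405: otherwise wait and re-check.
            while replayed_glsn.get(hp["wait_for_node"], -1) < hp["wait_for_glsn"]:
                time.sleep(0.01)
        db.commit(rec)                               # block 403: commit the changes
        replayed_glsn[node_id] = rec["glsn"]         # publish this node's progress
```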
In one implementation, a method for data recovery in at least one shared-disk database is disclosed. The method comprises transmitting, by at least one first device from at least one master cluster comprising at least two or more devices, a log to at least a second device from a standby cluster of at least two or more devices; the log generated by the first device of the master cluster contains at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover. The method further comprises performing the data recovery in the shared-disk database, by the second device, based on the log received from the first device of the master cluster.
Apart from what is discussed above, the present invention has some additional advantages, as provided below:
• The present invention reduces the overall data recovery time, which enables the database to be on-line in much less time.
• The present invention is beneficial in disaster data recovery scenarios where the clusters are present across geographical locations.

• The present invention allows multiple nodes to participate in data recovery during switchover/failover, thereby sharing the load and reducing the data recovery time.
Referring now to figure 5, an overall processing of multiple nodes/devices participating in data recovery during switchover/failover as a part of the standby cluster is illustrated, in accordance with an embodiment of the present subject matter. In one implementation, as shown in figure 5, for every page transfer between two nodes (for example, node 1 and node 2) in the master cluster 100, a special WAL record is inserted. At the receiver, the special WAL record acts as a holding point. The holding point represents a condition, meaning: wait until node X has replayed a WAL record whose GLSN >= the page's GLSN. In this case, the first holding point between Node 1 and Node 2 means: hold the data recovery for the WAL from Node 2 until a WAL record with GLSN >= 64994 is replayed on Node 1.
In one implementation, when the data recovery thread encounters the special WAL record, it determines if the other data recovery threads are also processing data recovery on this page, and checks if the corresponding log is replayed. If the corresponding log is replayed, it will proceed to commit with the application of the WAL; otherwise, the receiver will continue to wait.
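The coordination between the two recovery threads could, for example, be expressed with a condition variable instead of polling; this is an illustrative sketch, where the threshold 64994 echoes the figure's example and the ReplayProgress class and node names are assumptions.

```python
import threading

class ReplayProgress:
    """Shared replay progress; recovery threads block on a condition variable
    until the required GLSN has been replayed on the named node."""
    def __init__(self):
        self._glsn = {}                      # node -> highest replayed GLSN
        self._cond = threading.Condition()

    def publish(self, node, glsn):
        with self._cond:
            self._glsn[node] = glsn
            self._cond.notify_all()          # wake any waiting recovery thread

    def wait_for(self, node, glsn):
        """Hold until `node` has replayed a WAL record with GLSN >= glsn."""
        with self._cond:
            self._cond.wait_for(lambda: self._glsn.get(node, -1) >= glsn)

# Figure 5's first holding point: Node 2's recovery thread holds until a WAL
# record with GLSN >= 64994 has been replayed on Node 1.
progress = ReplayProgress()
t = threading.Thread(target=lambda: progress.wait_for("node1", 64994))
t.start()
progress.publish("node1", 64994)   # Node 1 catches up; Node 2's thread proceeds
t.join()
```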
It may be noted from the above that multiple nodes can participate in data recovery during switchover/failover thereby sharing the load and reducing the data recovery time.
A person skilled in the art may understand that any known or new algorithms may be used for the implementation of the present invention. However, it is to be noted that the present invention provides a method to be used during a data recovery operation in shared-disk databases to achieve the above-mentioned benefits and technical advancement, irrespective of using any known or new algorithms.

A person of ordinary skill in the art may be aware that in combination with the examples described in the embodiments disclosed in this specification, units and algorithm steps may be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether the functions are performed by hardware or software depends on the particular applications and design constraint conditions of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but it should not be considered that the implementation goes beyond the scope of the present invention.
It may be clearly understood by a person skilled in the art that for the purpose of convenient and brief description, for a detailed working process of the foregoing system, apparatus, and unit, reference may be made to a corresponding process in the foregoing method embodiments, and details are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiment is merely exemplary. For example, the unit division is merely logical function division and may be other division in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
When the functions are implemented in a form of a software functional unit and sold or used as an independent product, the functions may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or a part of the technical solutions, may be implemented in a form of a software product. The computer software product is stored in a storage medium, and includes several instructions for instructing a computer node (which may be a personal computer, a server, or a network node) to perform all or a part of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes: any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Although implementations for parallel recovery for shared-disk databases have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations of the parallel recovery for shared-disk databases.

WE CLAIM:
1. A database system for data recovery in at least one shared-disk database, the database system comprising:
a master cluster comprising a first device, and a standby cluster comprising a second device, wherein:
the first device of the master cluster is adapted to transmit at least a log to the second device of the standby cluster, wherein the log contains at least information for modifying the shared-disk database and at least a pre-defined condition for recovery during switchover/failover; and
the second device of the standby cluster is adapted to perform recovery based on the log received from the first device.
2. The database system as claimed in claim 1, wherein the log is at least a write-ahead logging (WAL) pre-arranged using at least a Global Log Sequence Number (GLSN) and a holding point based on a page transfer between the first device and the second device, wherein the page comprises a page GLSN.
3. The database system as claimed in claims 1 or 2, wherein the pre-defined condition comprises a data recovery condition indicating that the second device holds the data recovery of the log received from the first device, upon determining that the first device of the master cluster lags in transmitting the log, until at least one further log is obtained from the first device; the second device holds until the data recovery is completed at the first device.
4. The database system as claimed in claim 1, wherein the second device is further adapted to commit the information to modify the shared-disk database or hold the commit based on the pre-defined condition for recovery.
5. The database system as claimed in claim 1, wherein the second device is further adapted to:
utilize one or more recovery threads to replay the information received in the log to modify the shared-disk database, and perform one of the steps selected from:
commit the information to modify the shared-disk database; or
hold the commit based on the pre-defined condition for recovery.
6. The database system as claimed in claim 1, wherein the standby cluster is adapted to:
utilize one or more nodes, and one or more recovery threads, to replay the information in the received log to modify the shared-disk database, and perform one of the steps selected from:
commit the information to modify the shared-disk database; or
hold the commit based on the pre-defined condition for recovery.
7. The database system as claimed in claim 1 is characterized to perform a parallel recovery in at least one shared-disk database.
8. A first device for data recovery in at least one shared-disk database, the first device comprising:
a processor;
a memory coupled to the processor for executing a plurality of modules present in the memory, the plurality of modules comprising:
a log generation module configured to generate at least one log containing at least information for modifying the shared-disk database and at least one pre-defined condition for recovery during switchover/failover;
a transmitting module configured to transmit the at least one log to at least one device of a standby cluster.
9. The first device as claimed in claim 8, wherein the log is at least a write-ahead logging (WAL) pre-arranged using at least a Global Log Sequence Number (GLSN) and a holding point based on a page transfer between the first device and the at least one device of a standby cluster, the page comprising a page GLSN.

10. The first device as claimed in claim 8, wherein the pre-defined condition comprises a data recovery condition indicating that the second device holds the data recovery of the log received from the first device, upon determining that the first device of the master cluster lags in transmitting the log, until at least one further log is obtained from the first device; the second device holds until the data recovery is completed at the first device.
11. A second device for data recovery in at least one shared-disk database, the second device comprising:
a processor;
a memory coupled to the processor for executing a plurality of modules present in the memory, the plurality of modules comprising:
a receiving module configured to receive at least a log from at least one device, the log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover;
a checking module configured to check if the information received in the log to modify the shared-disk database is replayed;
an execution module configured to commit the information to modify the shared-disk database, or hold the commit based on the pre-defined condition for data recovery.
12. The second device as claimed in claim 11, wherein the log is at least a write-ahead logging (WAL) pre-arranged using at least a Global Log Sequence Number (GLSN) and a holding point based on a page transfer between the at least one device and the second device, the page comprising a page GLSN.
13. The second device as claimed in claims 11 or 12, wherein the pre-defined condition comprises a data recovery condition indicating that the second device holds the data recovery of the log received from the first device, upon determining that the first device of the master cluster lags in transmitting the log, until at least one further log is obtained from the first device; the second device holds until the data recovery is completed at the first device.
14. The second device as claimed in claim 11 is further configured to:
utilize one or more data recovery threads to replay the information received in the log to modify the shared-disk database, and perform one of the steps selected from:
commit the information to modify the shared-disk database; or
hold the commit based on the pre-defined condition for data recovery.
15. The database system as claimed in claim 11, wherein the standby cluster is adapted to:
utilize one or more data recovery threads to replay the information received in the log to modify the shared-disk database, and perform one of the steps selected from:
commit the information to modify the shared-disk database; or
hold the commit based on the pre-defined condition for data recovery.
16. A method for data recovery in at least one shared-disk database, the method comprising:
transmitting, by a first device comprised in a master cluster, at least a log to at least one second device comprised in a standby cluster, the log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover;
performing, by the second device, the data recovery based on the log received from the first device.
17. The method as claimed in claim 15, wherein the log is at least a write-ahead logging (WAL) pre-arranged using at least a Global Log Sequence Number (GLSN) and a holding point based on a page transfer between the first device and the second device, the page comprising a page GLSN.

18. The method as claimed in claims 15 or 16, wherein the pre-defined condition comprises a data recovery condition indicating that the second device holds the data recovery of the log received from the first device, upon determining that the first device of the master cluster lags in transmitting the log, until at least one further log is obtained from the first device; the second device holds until the data recovery is completed at the first device.
19. The method as claimed in claim 15 further comprises: committing the information, by the second device, to modify the shared-disk database, or holding the commit based on the pre-defined condition for data recovery.
20. The method as claimed in claim 15 further comprises:
utilizing, by the second device, one or more data recovery threads to replay the information received in the log to modify the shared-disk database, and performing one of the steps selected from:
committing, by the second device of the standby cluster, the information to modify the shared-disk database; or
holding, by the second device, the commit based on the pre-defined condition for data recovery.
21. A method, performed by a first device, to achieve data recovery in at least one shared-disk database, the method comprising:
generating at least a log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover;
transmitting the log generated to at least one second device to perform data recovery based on the log received.
22. A method, performed by a second device, for data recovery in at least one shared-disk database, the method comprising:

receiving at least a log from at least one first device, the log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover;
checking if the information received in the log to modify the shared-disk database is replayed; and
committing the information to modify the shared-disk database, or holding the commit based on the pre-defined condition for data recovery, the pre-defined condition comprising a data recovery condition indicating that the second device holds the data recovery of the log received from the first device, upon determining that the first device of the master cluster lags in transmitting the log, until at least one further log is obtained from the first device; the second device holds until the data recovery is completed at the first device.
Dated this 27th day of February 2017
DR. SANCHITA GANGULI
of S. MAJUMDAR & CO.
Applicant's Agent

Documents

Orders

Section Controller Decision Date
15 grant Subhra banerjee 2018-06-27
15 grant Subhra banerjee 2018-10-29

Application Documents

# Name Date
1 Power of Attorney [27-02-2017(online)].pdf 2017-02-27
2 Form 5 [27-02-2017(online)].pdf 2017-02-27
3 Form 3 [27-02-2017(online)].pdf 2017-02-27
4 Drawing [27-02-2017(online)].pdf 2017-02-27
5 Description(Complete) [27-02-2017(online)].pdf 2017-02-27
6 Description(Complete) [27-02-2017(online)].pdf_384.pdf 2017-02-27
7 Other Patent Document [18-05-2017(online)].pdf 2017-05-18
8 PROOF OF RIGHT [27-06-2017(online)].pdf 2017-06-27
9 Correspondence by Agent_Form-1_06-07-2017.pdf 2017-07-06
10 abstract 201747006912 .jpg 2017-07-07
11 201747006912-FORM-9 [27-07-2017(online)].pdf 2017-07-27
12 201747006912-FORM 18A [28-07-2017(online)].pdf 2017-07-28
13 201747006912-FER.pdf 2017-11-27
14 201747006912-FORM 3 [05-02-2018(online)].pdf 2018-02-05
15 201747006912-OTHERS [12-02-2018(online)].pdf 2018-02-12
16 201747006912-FER_SER_REPLY [12-02-2018(online)].pdf 2018-02-12
17 201747006912-CLAIMS [12-02-2018(online)].pdf 2018-02-12
18 201747006912-FORM 3 [28-02-2018(online)].pdf 2018-02-28
19 201747006912-HearingNoticeLetter.pdf 2018-03-27
20 201747006912-8(i)-Substitution-Change Of Applicant - Form 6 [06-04-2018(online)].pdf 2018-04-06
21 201747006912-ASSIGNMENT DOCUMENTS [06-04-2018(online)].pdf 2018-04-06
22 201747006912-PA [06-04-2018(online)].pdf 2018-04-06
23 Correspondence by Agent_Deed Of Assignment_17-04-2018.pdf 2018-04-17
24 201747006912-Written submissions and relevant documents (MANDATORY) [08-05-2018(online)].pdf 2018-05-08
25 201747006912-Response to office action (Mandatory) [27-06-2018(online)].pdf 2018-06-27
26 201747006912-PatentCertificate29-10-2018.pdf 2018-10-29
27 Abstract_Granted 302533_29-10-2018.pdf 2018-10-29
28 Claims_Granted 302533_29-10-2018.pdf 2018-10-29
29 Description_Granted 302533_29-10-2018.pdf 2018-10-29
30 Drawings_Granted 302533_29-10-2018.pdf 2018-10-29
31 Marked Up Claims_Granted 302533_29-10-2018.pdf 2018-10-29
32 201747006912-RELEVANT DOCUMENTS [21-03-2019(online)].pdf 2019-03-21
33 201747006912-RELEVANT DOCUMENTS [23-03-2020(online)].pdf 2020-03-23
34 201747006912-RELEVANT DOCUMENTS [17-09-2021(online)].pdf 2021-09-17
35 201747006912-ASSIGNMENT WITH VERIFIED COPY [04-03-2022(online)].pdf 2022-03-04
36 201747006912-FORM-16 [04-03-2022(online)].pdf 2022-03-04
37 201747006912-POWER OF AUTHORITY [04-03-2022(online)].pdf 2022-03-04
38 201747006912-RELEVANT DOCUMENTS [09-09-2022(online)].pdf 2022-09-09
39 201747006912-RELEVANT DOCUMENTS [14-09-2023(online)].pdf 2023-09-14

Search Strategy

1 search_17-11-2017.pdf

ERegister / Renewals

3rd: 15 Jan 2019 (From 25/08/2018 to 25/08/2019)
4th: 15 Jul 2019 (From 25/08/2019 to 25/08/2020)
5th: 09 Jul 2020 (From 25/08/2020 to 25/08/2021)
6th: 07 Jul 2021 (From 25/08/2021 to 25/08/2022)
7th: 10 Jul 2022 (From 25/08/2022 to 25/08/2023)
8th: 11 Jul 2023 (From 25/08/2023 to 25/08/2024)
9th: 19 Jul 2024 (From 25/08/2024 to 25/08/2025)
10th: 08 Jul 2025 (From 25/08/2025 to 25/08/2026)