Abstract: In one implementation, when used during a backup operation, the present invention processes the contents of the database file by file, removing the dead tuples in the pages and ensuring that each page is contiguous. The data present in multiple places in the page is aligned after the page header, so that the maximum usable content of the page is available as a single chunk. Further, a hint on the length of the data is provided in the page header while the backup content of the changed pages is written. When used on offline data, the present invention may remove dead tuples in the pages in the same way as explained above; when used during a restore operation, it checks whether the pages have already been de-fragmented and, if so, writes only the portions of each page that contain valid data. (TO BE PUBLISHED WITH FIGURE 4 & 5(b))
Claims:
1. A method utilizing at least a computer system for backing up data contained in a database, wherein the database is constructed of pages that maintain the data in small segments, the method comprising:
fetching at least one page, selected from the pages, containing at least one page header and data available in multiple places;
de-fragmenting the page by removing at least dead tuples present on the page, and aligning the data present in multiple places after the at least one page header;
maintaining metadata indicating the free space available for data in the page after the dead tuples are removed, wherein the metadata is maintained in the page header; and
backing up the data aligned in the page after the defragmentation.
2. The method as claimed in claim 1, further comprising: storing the backed-up data in at least one storage medium.
3. The method as claimed in claim 1, wherein, after the defragmenting, the data available in the page is preferably in the form of a single chunk in the page.
4. The method as claimed in claim 1, wherein, during a restore operation of the database, the method further comprises writing the data aligned in the page after the data aligned in the page has been backed up.
5. A system for backing up data contained in a database, wherein the database is constructed of pages that maintain the data in small segments, the system comprising:
a processor;
a memory for storing a database; and
the memory coupled to the processor for executing a plurality of modules present in the memory, the plurality of modules comprising:
a fetching module configured to fetch at least one page, selected from the pages, containing at least a page header and data available in multiple places;
a defragmentation module configured to de-fragment the page by removing at least dead tuples present on the page, and aligning the data present in multiple places after the at least one page header;
a maintenance module configured to maintain metadata, wherein the metadata indicates the free space available for data in the page after the dead tuples are removed, and the metadata is maintained in the page header; and
a backup module configured to back up the data aligned in the page after the defragmentation.
6. The system as claimed in claim 5, wherein the backed-up data is stored in at least the memory.
7. The system as claimed in claim 5, wherein, after the defragmentation, the data available in the page is preferably in the form of a single chunk in the page.
8. The system as claimed in claim 5, further comprising a restore module configured to write the data aligned in the page after the data aligned in the page has been backed up.
Description:
TECHNICAL FIELD
The present subject matter described herein, in general, relates to databases, and more particularly, to systems and methods for performing efficient dead tuple removal and compaction during backup and restore operations.
BACKGROUND
Databases are widely used technical means for storing and handling large amounts of data. Due to the increasing complexity of databases, their efficiency and reliability are of prime importance. An unexpected crash or inconsistency in a database can result in enormous losses, not only because of the time needed to restore the database from a backup but, in the worst case, also because of the loss of data.
The backup operation creates a copy of the database, or of the database cluster, so that the database can be reconstructed in case of a disaster. When a physical backup of the database is performed, the data stored in the database is copied in its fragmented form. Hence, during a restore operation, i.e., when the database is reconstructed from a previous backup, the data is also fragmented, and after the restore operation the database has to run a vacuum operation to defragment the data. The vacuum operation reclaims storage occupied by dead tuples. Moreover, when data compression is performed during the backup operation, its efficiency, in terms of occupied size, is lower because of the fragmentation.
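By way of a non-limiting illustration, the effect of dead tuples on backup compression can be modelled with the following Python sketch. It rests on simplifying assumptions: a hypothetical 24-byte header, random bytes standing in for tuple contents (so that neither live nor dead data compresses), and `zlib` as the compressor. In the fragmented case, the dead tuples still on the page are copied and compressed along with the live data; in the compacted case, only live data precedes the empty remainder of the page.

```python
import random
import zlib

PAGE_SIZE = 8192
HEADER_SIZE = 24  # hypothetical header size

def build_page(live: list[bytes], dead: list[bytes]) -> bytes:
    """Lay out tuples after the header; the rest of the page stays zero-filled."""
    page = bytearray(PAGE_SIZE)
    cursor = HEADER_SIZE
    for t in live + dead:
        page[cursor:cursor + len(t)] = t
        cursor += len(t)
    return bytes(page)

rng = random.Random(0)  # fixed seed so the model is repeatable
live = [rng.randbytes(1000) for _ in range(2)]  # stand-in live tuple contents
dead = [rng.randbytes(1500) for _ in range(2)]  # stand-in dead tuple contents

fragmented = zlib.compress(build_page(live, dead))  # dead tuples still on the page
compacted = zlib.compress(build_page(live, []))     # dead tuples removed before backup
print(len(fragmented), len(compacted))
```

Under these assumptions the compressed fragmented page is larger by roughly the size of the dead tuples, since those bytes are carried into the backup.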
Various techniques have been proposed in the prior art to address this problem, some of which are listed below:
• Perform the vacuum operation after the restore operation and recovery, i.e., after bringing the database to a consistent point.
• Back up the database as is.
• Restore the database as is from the backup.
• Allocate fresh space after recovery.
In summary, according to the conventional technique, a physical backup of the database is performed by copying the files of the database system. When the database system is corrupted, the backed-up files are restored and the database is recovered. Once recovery succeeds, the database resumes service. The overall conventional process of database recovery is shown in figure 1.
In spite of the above-mentioned solutions available in the prior art, the conventional process of database recovery has certain drawbacks. Database downtime is unacceptable because of availability requirements, i.e., a user wants to connect to the database as soon as possible, so the database has to be restored as fast as possible. Also, once the restore operation is done, the user does not want any restriction in accessing the tables of the database. Further, during recovery, if the database is fragmented, CPU cycles are spent on a vacuum after the restore, and during this time the tables are unusable because of locking. Even after the vacuum, more disk space has to be allocated for subsequent insert operations, so the space in the holes remains unused.
To summarize the technical problems of the prior art, there is a need to reduce the backup time during the backup operation, reduce the restore time during the restore operation, and provide better space utilization after the restore operation, thereby performing recovery of the database efficiently and effectively while reducing the overall time and effort required by the CPU of the system.
SUMMARY
This summary is provided to introduce concepts related to a system and method for page defragmentation and data compaction during data backup, and the same are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining or limiting the scope of the claimed subject matter.
The main objective of the present invention is to solve the technical problem recited above by providing a system and method for reducing the backup time during the backup operation, reducing the restore time during the restore operation, and providing better space utilization after the restore operation, thereby performing recovery of the database efficiently and effectively while reducing the overall time and effort required by the CPU of the system.
Another objective of the present invention is to provide a system and a method, to be used during the backup operation, that reduce the backup time during the backup operation, reduce the restore time during the restore operation, and provide better space utilization after the restore operation, thereby performing recovery of the database efficiently and effectively while reducing the overall time and effort required by the CPU of the system.
Another objective of the present invention is to provide a system and a method, to be used on already backed-up data, that reduce the backup time during the backup operation, reduce the restore time during the restore operation, and provide better space utilization after the restore operation, thereby performing recovery of the database efficiently and effectively while reducing the overall time and effort required by the CPU of the system.
Another objective of the present invention is to provide a system and a method to achieve de-fragmentation of a page during backup.
Another objective of the present invention is to provide a system and a method to reduce the input/output (I/O) for a page write during backup.
Another objective of the present invention is to provide a system and a method to achieve offline de-fragmentation of a backup database.
Yet another objective of the present invention is to provide a system and a method to achieve removal of dead tuples during restore.
In order to provide a technical solution to the technical problems mentioned in the preceding section, the present invention provides a mechanism that can be used during the backup process, during the restore process, optionally during both processes, or on offline data.
Accordingly, in one implementation, the present invention provides a system for backing up data contained in a database constructed of pages that maintain the data in small segments. The system comprises a processor, a memory for storing a database, and the memory coupled to the processor for executing a plurality of modules present in the memory. The plurality of modules comprises: a fetching module configured to fetch at least one page, selected from the pages, containing at least a page header and data available in multiple places; a defragmentation module configured to de-fragment the page by removing at least the dead tuples present on the page and aligning the data present in multiple places after the at least one page header; a maintenance module configured to maintain metadata in the page header, the metadata indicating the free space available for data in the page after the dead tuples are removed; and a backup module configured to back up the data aligned in the page after defragmentation.
In one implementation, the present invention provides a method utilizing at least a computer system for backing up data contained in a database constructed of pages that maintain the data in small segments. The method comprises:
• fetching at least one page, selected from the pages, containing at least a page header and data available in multiple places;
• de-fragmenting the page by removing at least the dead tuples present on the page, and aligning the data present in multiple places after the at least one page header;
• maintaining metadata in the page header, the metadata indicating the free space available for data in the page after the dead tuples are removed; and
• backing up the data aligned in the page after defragmentation.
In contrast to the available prior-art techniques, when used during the backup process, the present invention processes the contents of the database file by file, removing the dead tuples in the pages, ensuring that each page is contiguous, and providing a hint on the length of the data while writing the backup content of the changed pages. Further, when the present invention is applied to offline data, the dead tuples in the pages can be removed in the same way. Furthermore, when used during the restore process, the present invention checks whether the pages have already been de-fragmented and, if so, writes only the portions of each page that contain valid data.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
Figure 1 illustrates a database recovery process as implemented in the prior-art.
Figure 2 illustrates an overall process of data backup and restore operations, in accordance with an embodiment of the present subject matter.
Figure 3 illustrates a system for backing up data contained in a database, in accordance with an embodiment of the present subject matter.
Figure 4 illustrates a comparative flow chart for the backup process as available in the prior-art and as disclosed in accordance with an embodiment of the present subject matter.
Figure 5 (a) illustrates the disconnected chunks of data in the data page obtained after the deletion of some data from the database/ data page, in accordance with an embodiment of the present subject matter.
Figure 5 (b) illustrates the updated page after the defragmentation, in accordance with an embodiment of the present subject matter.
Figure 6 illustrates a method utilizing at least a computer system for backing up data contained in a database, in accordance with an embodiment of the present subject matter.
It is to be understood that the attached drawings are for purposes of illustrating the concepts of the invention and may not be to scale.
DETAILED DESCRIPTION OF THE PRESENT INVENTION
The following clearly describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Evidently, the described embodiments are merely a part rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
The invention can be implemented in numerous ways, as a process, an apparatus, a system, a composition of matter, a computer readable medium such as a computer readable storage medium or a computer network wherein program instructions are sent over optical or electronic communication links. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
Systems and methods for page defragmentation and data compaction during data backup are disclosed.
While aspects of the described system and method for page defragmentation and data compaction during data backup may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary systems, apparatus, and methods.
In one implementation, the present invention provides a mechanism that can be used during the backup process, during the restore process, optionally during both processes, or on offline data.
Figure 2 illustrates an overall process of data backup and restore operations, in accordance with an embodiment of the present subject matter. As shown in figure 2, when used during the backup process, the present invention may process the contents of the database file by file, removing the dead tuples in the pages and ensuring that each page is contiguous. A person skilled in the art will understand that the data content of a page may be present in multiple places in the page. According to the present invention, the data present in multiple places in the page is aligned after the page header. The technical benefit of this alignment is that the maximum usable content of the page is available as a single chunk. The present invention also provides a hint on the length of the data while writing the backup content of the changed pages. The technical benefit of this hint is that I/O needs to be performed only for the valid data rather than for the whole page. For example, if the contiguous data in an 8 KB page amounts to only 2 KB, the present invention performs I/O of only 2 KB instead of the whole 8 KB.
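By way of a non-limiting illustration, the alignment step and the length hint may be sketched in Python as follows. The sketch assumes an 8 KB page, a hypothetical 24-byte header whose contents are not modelled, and tuples described by their current offset, contents, and a dead flag; the names `PageTuple`, `defragment`, and `bytes_to_write` are illustrative only and do not correspond to any particular database.

```python
from dataclasses import dataclass

PAGE_SIZE = 8192   # assumed 8 KB page
HEADER_SIZE = 24   # hypothetical fixed header size; header contents are not modelled

@dataclass
class PageTuple:
    offset: int     # current position of the tuple in the page
    payload: bytes  # tuple contents
    dead: bool      # True if deleted/obsolete but not yet physically removed

@dataclass
class CompactedPage:
    image: bytearray  # page image with live data packed right after the header
    data_length: int  # length hint: bytes of valid data following the header
    free_space: int   # contiguous free bytes left at the end of the page

def defragment(tuples: list[PageTuple]) -> CompactedPage:
    """Drop dead tuples and pack the live ones contiguously after the header."""
    image = bytearray(PAGE_SIZE)
    cursor = HEADER_SIZE
    # Preserve the original relative order of the live tuples.
    for t in sorted(tuples, key=lambda item: item.offset):
        if t.dead:
            continue
        image[cursor:cursor + len(t.payload)] = t.payload
        cursor += len(t.payload)
    return CompactedPage(image, cursor - HEADER_SIZE, PAGE_SIZE - cursor)

def bytes_to_write(page: CompactedPage) -> bytes:
    """Only the header and the valid data need to be written, not the full page."""
    return bytes(page.image[:HEADER_SIZE + page.data_length])
```

For instance, with two live tuples of 1,000 and 1,048 bytes and one dead tuple of 3,000 bytes, the compacted page carries 2,048 bytes of valid data, and `bytes_to_write` returns 2,072 bytes (header plus data) instead of the full 8,192-byte page.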
In one implementation, when the present invention is used on offline data, dead tuples in the pages may be removed in the same way as explained above. The present invention thus effectively increases the free space in each page, but may not reduce the total number of pages, since that would involve more changes to the metadata. A person skilled in the art will understand that the metadata is maintained in the page header and may broadly be considered to be the page header itself. According to the present invention, upon vacuum and defragmentation, the metadata, i.e., the page header, is updated and may indicate the free space available on the page. Further, the present invention provides the flexibility to use any of the existing page header structures to achieve this task.
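Because any existing page header structure may be used, the following Python sketch shows only one hypothetical way of recording the metadata after vacuum and defragmentation. The header layout (four little-endian 16-bit fields: flags, data length, free space, tuple count), the flag value, and the function names are assumptions made for illustration, not the header format of any particular database.

```python
import struct

PAGE_SIZE = 8192
# Hypothetical header: flags, data_length, free_space, tuple_count (little-endian uint16 each).
HEADER_FMT = "<HHHH"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 8 bytes
FLAG_DEFRAGMENTED = 0x1                    # hypothetical "page is compacted" flag

def update_header(page: bytearray, data_length: int, tuple_count: int) -> None:
    """After vacuum and defragmentation, record the valid data length and free space."""
    free_space = PAGE_SIZE - HEADER_SIZE - data_length
    struct.pack_into(HEADER_FMT, page, 0,
                     FLAG_DEFRAGMENTED, data_length, free_space, tuple_count)

def read_header(page: bytes) -> dict:
    """Decode the header fields that a backup or restore step would consult."""
    flags, data_length, free_space, tuple_count = struct.unpack_from(HEADER_FMT, page, 0)
    return {
        "defragmented": bool(flags & FLAG_DEFRAGMENTED),
        "data_length": data_length,
        "free_space": free_space,
        "tuple_count": tuple_count,
    }
```

Storing both the data length and the free space is redundant for a fixed page size; the sketch keeps both only to mirror the length hint and the free-space indication named in the description.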
In one implementation, as shown in figure 2, when used during a restore operation, the present invention may check whether the pages have already been de-fragmented. If they have, the present invention may write only the portions of each page that contain valid data. If the pages have not been de-fragmented, the present invention may, based on a user option or automatically, de-fragment the pages and load them into the database, so that a further vacuum is not necessary.
Referring now to figure 3, a system for backing up data contained in a database is illustrated, in accordance with an embodiment of the present subject matter. In one implementation, the present invention provides a system 300 for backing up data contained in a database 316, the system comprising a processor 302 and a memory 306 coupled to the processor for executing a plurality of modules stored in the memory 306.
Although the present subject matter is explained considering that the present invention is implemented as system 300, it may be understood that the system 300 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, and the like. It will be understood that the system 300 may be accessed by multiple users or by applications residing on the system 300. Examples of the system 300 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, sensors, routers, gateways, and a workstation. The system 300 is communicatively coupled to other devices, nodes, or apparatuses to form a network (not shown).
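One possible shape of the restore-side check is sketched below in Python. It assumes a hypothetical 8-byte page header carrying a "defragmented" flag and a data-length hint; `restore_page` and its optional `defragment` parameter are illustrative names, and the compaction routine itself is supplied by the caller, since the description leaves the specific technique open.

```python
import struct
from typing import Callable, Optional

PAGE_SIZE = 8192
HEADER_FMT = "<HHHH"  # hypothetical: flags, data_length, free_space, tuple_count
HEADER_SIZE = struct.calcsize(HEADER_FMT)
FLAG_DEFRAGMENTED = 0x1

def restore_page(backup_page: bytes,
                 defragment: Optional[Callable[[bytes], bytes]] = None) -> bytes:
    """Return the bytes that actually need to be written for one page on restore."""
    flags, data_length, _free, _count = struct.unpack_from(HEADER_FMT, backup_page, 0)
    if flags & FLAG_DEFRAGMENTED:
        # Valid data is contiguous after the header, so only that prefix is written;
        # the remainder of the page is free space.
        return backup_page[:HEADER_SIZE + data_length]
    if defragment is not None:
        # Page was backed up fragmented: compact it now (automatically or on user
        # option), then apply the same check once more without a compaction routine.
        return restore_page(defragment(backup_page))
    # No compaction requested: fall back to writing the whole page as-is.
    return backup_page
```

A page that arrives already compacted costs only its header plus valid data in write I/O, whereas a fragmented page is either compacted first or written in full.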
In one implementation, the network (not shown) may be a wireless network, a wired network or a combination thereof. The network can be implemented as one of the different types of networks, such as GSM, CDMA, LTE, UMTS, intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further the network may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
The system 300 as illustrated in accordance with an embodiment of the present subject matter may include at least one processor 302, an interface 304, and a memory 306. The at least one processor 302 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor 302 is configured to fetch and execute computer-readable instructions or modules stored in the memory 306.
The interface 304 (I/O interface) may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 304 may allow the system 300 to interact with a user directly. Further, the I/O interface 304 may enable the system 300 to communicate with other devices or nodes, and with computing devices such as web servers and external data servers (not shown). The I/O interface 304 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular (e.g., GSM, CDMA), or satellite. The I/O interface 304 may include one or more ports for connecting a number of devices to one another or to another server. The I/O interface 304 may provide interaction between the user and the system 300 via a screen provided for the interface 304.
The memory 306 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 306 may include a plurality of instructions, modules, or applications to perform various functionalities. The memory 306 includes routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types.
In one implementation, the plurality of modules may include, but are not limited to, at least a fetching module 308, a defragmentation module 310, a maintenance module 314, and a backup module 316.
In one implementation, the system 300 backs up data contained in a database constructed of pages that maintain the data in small segments. The system 300 comprises at least a fetching module 308, a defragmentation module 310, a maintenance module 314, and a backup module 316. The fetching module is configured to fetch at least one page, selected from the pages, containing at least a page header and data present in multiple places. The defragmentation module is configured to de-fragment the page by removing at least the dead tuples present on the page and thereafter aligning the data present in multiple places after the at least one page header. The maintenance module is configured to maintain metadata in the page header, the metadata indicating the free space available for data in the page after the dead tuples are removed. The backup module is configured to back up the data aligned in the page after defragmentation.
In one implementation, the backed-up data is stored in at least the memory.
In one implementation, after defragmentation, the data available in the page is preferably in the form of a single chunk in the page.
In one implementation, the system may also comprise a restore module configured to write the data aligned in the page after the data aligned in the page has been backed up.
Referring now to figure 4, a comparative flow chart for the backup process as available in the prior art and as disclosed in accordance with an embodiment of the present subject matter is illustrated. A person skilled in the art will understand that, when a physical backup of the database is performed, the data of each file is copied to memory in chunks called pages (of the same size as the database page). This data resides in memory until it is written to the backup location, which may be another disk location, and writing to the backup location is also carried out in multiples of the page size. In contrast with this conventional mechanism, the present invention, as shown in figure 4, uses the data in memory for vacuum operations. Because processing this in-memory data may not incur the cost of disk input/output (I/O) and is not bound to any transaction, it is used to perform the vacuum operation (removing dead tuples) and to defragment the data (reclaiming contiguous memory locations for new data). A person skilled in the art will understand that the removal of dead tuples and the reclaiming of their memory locations may be achieved by any known or new technique. To avoid complicating the description of the present invention, specific techniques for removing dead tuples and reclaiming their memory locations are not described in detail.
Further, as is known, tuples that are deleted or made obsolete by update/delete operations in a database are not physically removed from their tables and remain in the database until cleaned up. Each page may already be fragmented, and the current vacuum operation also leaves the page fragmented. As a result of this fragmentation, a contiguous block of space is not available for new data after the restore operation. The present invention runs the vacuum operation on these dead tuples on the pages held in the memory of the backup operation. Further, because the page is outside the database process space and no transactions are involved, the present invention also defragments this page data. After defragmentation, each page header is updated with the data location changes, and the page is ready to be written to the backup location.
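The backup loop of figure 4 can be sketched in Python as a non-limiting example. The sketch assumes a hypothetical 8-byte page header whose second field carries the data-length hint; the in-memory vacuum and defragmentation routine is passed in as `vacuum_page` (an illustrative name), since the specific dead-tuple removal technique is left open by the present description.

```python
import struct
from typing import BinaryIO, Callable

PAGE_SIZE = 8192
HEADER_FMT = "<HHHH"  # hypothetical: flags, data_length, free_space, tuple_count
HEADER_SIZE = struct.calcsize(HEADER_FMT)

def backup_file(src: BinaryIO, dst: BinaryIO,
                vacuum_page: Callable[[bytearray], None]) -> int:
    """Copy one database file page by page, vacuuming each page in the backup buffer.

    Returns the number of bytes written to the backup stream.
    """
    written = 0
    while True:
        chunk = src.read(PAGE_SIZE)
        if not chunk:
            break
        page = bytearray(chunk)
        # The buffer is private to the backup process: no locks or transactions needed.
        vacuum_page(page)  # removes dead tuples, compacts, and updates the header in place
        _flags, data_length, _free, _count = struct.unpack_from(HEADER_FMT, page, 0)
        # Length hint: write the header plus the valid data only, not the full page.
        out = page[:HEADER_SIZE + data_length]
        dst.write(out)
        written += len(out)
    return written
```

Because only the header and the valid data of each page are written, the backup stream in this sketch holds variable-length page records; a reader would parse each header first to learn how many data bytes follow.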
Referring now to figure 5 (a), the figure illustrates the disconnected chunks of data in a data page obtained after the deletion of some data from the database/data page, in accordance with an embodiment of the present subject matter. As shown in figure 5 (a), the database maintains the data in small segments of data files called pages. Whenever data is changed by user operations such as Update/Delete, the data is not physically deleted but is marked as deleted. Over time, the deleted data sections leave many disconnected chunks of data, as shown in figure 5 (a).
If a user's “Insert” operation requests a larger chunk of space, the data may not be accommodated in the current page because contiguous space is unavailable. The database performs automatic defragmentation (removing dead tuples) when such a scenario occurs; this is called “Vacuum”. The updated page is as shown in figure 5 (b), in accordance with an embodiment of the present invention.
Figure 6 illustrates a method utilizing at least a computer system for backing up data contained in a database, in accordance with an embodiment of the present subject matter. The method may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
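To make the situation of figures 5 (a) and 5 (b) concrete, the following Python sketch computes the largest contiguous free region of a page from the (offset, length) extents occupied by the header and the tuples. The extents and the 24-byte header are hypothetical values chosen only for illustration.

```python
PAGE_SIZE = 8192
HEADER_SIZE = 24  # hypothetical header size

def largest_free_run(used: list[tuple[int, int]], page_size: int = PAGE_SIZE) -> int:
    """Largest contiguous free region, given the (offset, length) extents in use."""
    largest, cursor = 0, 0
    for offset, length in sorted(used):
        largest = max(largest, offset - cursor)  # gap before this extent
        cursor = max(cursor, offset + length)    # end of data seen so far
    return max(largest, page_size - cursor)      # gap after the last extent

def with_header(tuples: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Fragmented layout: header at offset 0, tuples where they currently sit."""
    return [(0, HEADER_SIZE)] + tuples

def compacted(tuples: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Defragmented layout: all tuple data aligned directly after the header."""
    return [(0, HEADER_SIZE + sum(length for _, length in tuples))]

# Four live tuples separated by holes left behind by removed data.
tuples = [(1000, 500), (3000, 500), (5000, 500), (7000, 500)]
print(largest_free_run(with_header(tuples)), largest_free_run(compacted(tuples)))
```

In the fragmented layout, 6,168 bytes are free in total, yet the largest hole is only 1,500 bytes, so an insert of, say, 4,000 bytes does not fit; after compaction the same 6,168 bytes form a single region and the insert fits.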
The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method or alternate methods. Additionally, individual blocks may be deleted from the method without departing from the protection scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method may be considered to be implemented in the above described system 300.
At block 602, at least one page, selected from the pages and containing at least a page header and data present in multiple places, is fetched.
At block 604, the page is defragmented by removing at least the dead tuples present on the page.
At block 606, the data present in multiple places is aligned after the at least one page header.
At block 608, metadata is maintained in the page header, the metadata indicating the free space available for data in the page after the dead tuples are removed.
At block 610, the data aligned in the page after defragmentation is backed up.
In one implementation, the method further comprises storing the backed-up data in at least one storage medium.
In one implementation, after defragmenting, the data available in the page is preferably in the form of a single chunk in the page.
In one implementation, during a restore operation of the database, the method further comprises writing the data aligned in the page after the data aligned in the page has been backed up.
Apart from what is explained above, the present invention also provides the following advantages:
• The present invention reduces the size and time of the backup.
• The present invention reduces the restore time.
• The present invention improves the space utilization of the data after restore, so fewer page allocations are needed after restore.
• The present invention enables de-fragmentation of a page during backup.
• The present invention reduces I/O for a page write during backup.
• The present invention enables offline de-fragmentation of a backup database.
• The present invention enables removal of dead tuples during restore.
A person skilled in the art will understand that any known or new algorithm may be used to implement the present invention. It is to be noted, however, that the present invention provides a method to be used during the backup operation to achieve the above-mentioned benefits and technical advancement irrespective of the particular algorithm used.
A person of ordinary skill in the art may be aware that in combination with the examples described in the embodiments disclosed in this specification, units and algorithm steps may be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether the functions are performed by hardware or software depends on the particular applications and design constraint conditions of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but it should not be considered that the implementation goes beyond the scope of the present invention.
It may be clearly understood by a person skilled in the art that for the purpose of convenient and brief description, for a detailed working process of the foregoing system, apparatus, and unit, reference may be made to a corresponding process in the foregoing method embodiments, and details are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiment is merely exemplary. For example, the unit division is merely logical function division and may be other division in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
When the functions are implemented in a form of a software functional unit and sold or used as an independent product, the functions may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or a part of the technical solutions may be implemented in a form of a software product. The computer software product is stored in a storage medium, and includes several instructions for instructing a computer device (which may be a personal computer, a server, or a network device) to perform all or a part of the steps of the methods described in the embodiment of the present invention. The foregoing storage medium includes: any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
Although implementations for system and method for page defragmentation and data compaction during data backup have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations of the system and method for page defragmentation and data compaction during data backup.