Abstract: A system, method and apparatus for faster query execution by optimizing index in a database management system (DBMS) are disclosed. The present invention provides a mechanism to optimize the ttree for duplicate key searches, wherein during data insertion it detects duplication and records as metadata. The same knowledge of duplicated records is used in future search to optimize the selection and projection of same instead of comparing record value for each record. The present invention, improves performance of any database query containing the scan on the TTree Index table, which has largely duplicated key values. (To be published with figure 3)
CLIAMS:
1. An apparatus for optimizing the processing of a query in a system characterized in that said apparatus configured to contain an duplicate key search optimized index expressed in the format of an optimized ttree data structure and thereby enable optimization of at least one database query performance by utilizing the said optimized index for data containing large amount of duplicate records.
2. In a system having an apparatus, and the system including means for logically ordering the records by an index, said index comprising a tree structure storing key values derived from values stored by said at least one field of each record, said index including a plurality of nodes, each node comprising a particular key value together with a record identifier for indicating a particular record which stores said particular key value, said nodes being linked via. pointers, the apparatus implemented method for optimizing the processing of a query comprising:
creating at least one index, in response to the query received specifying at least one condition (key value) for selecting particular records, wherein the index stores the key values and an additional memory in the nodes for storing at least one metadata generated;
generating, during the creation of the index, the metadata for at least one duplicate key value by checking duplicacy in the key value specified in the query received, and thereby storing the metadata generated in the additional memory;
performing at least one operation based on the query received, and thereby updating the metadata stored in the nodes based on the operation performed, wherein the operation is configured to change the records;
performing a condition search for retrieving the record based on the query received;
finding, based on the condition, a range (minimum to maximum) to generate a subtree from the tree, and thereby locating the key value in the nodes, and isolating the subtree comprising the key value;
finding, based on the condition, a node match in the subtree generated to find the key value, wherein the node match is at least one of a full node match or a partial node match, and thereby selecting, by using the metadata stored, the duplicate key values to determine the duplicacy in the index; and
displaying the record (result of execution) retrieved based on the query received.
3. The method as claimed in claim 2, wherein the query is received in a format comprising CREATE INDEX query along with [OPTIMIZED FOR DUPLICATE] option.
4. The method as claimed in claim 2, wherein the metadata is a bit-based metadata (bitmap) for each node in the tree, wherein for each record having duplicates, end bits (tail) of a duplicate list is marked with “0”, while the remaining bits of the duplicate list is marked as “1”.
5. The method as claimed in claims 2 and 4 comprises the bit-based metadata created by:
generating a map of possible matches; and
performing a Bit-wise AND of the map of possible matches generated with the bitmap of the node of the tree and thereby generating a result with an exact match of the key values and then restricting the search scope within the metadata bitmap.
6. The method as claimed in claim 2, wherein duplicacy in the key value specified in the query received is checked during insertion in the tree during which first the node is searched for insertion and then position within the node, and if a record with the same key value is found during search, then new record is add to the end of the duplicates identified, and if duplicate meta-data bit field updated.
7. The method as claimed in claim 2, wherein the operation to be performed is selected from a group of SQL operations comprising insert, update, delete, record on table, or any combination thereof.
8. The method as claimed in claim 2, wherein the key value is located by a range search technique using a range key pair i.e., minimum value and maximum value.
9. The method as claimed in claim 2, wherein the full node match indicates all nodes in subtree are same and/or matching the condition (key value in the condition), and if found, extracting the tuple id’s along with the data values from the nodes without reading key values stored at nodes.
10. The method as claimed in claim 2, wherein the partial node match indicates at least one node but not all nodes in the sub-tree are same and/or matching the condition (key value in the condition), if found, using the metadata (the key color/mark) with duplicity information to identify the duplicates.
11. The method as claimed in claim 2 comprises coloring/marking the bitmap metadata based on duplicacy identified during record insertion.
12. An apparatus, to be used in a system, the system including means for logically ordering the records by an index, said index comprising a tree structure storing key values derived from values stored by said at least one field of each record, said index including a plurality of nodes, each node comprising a particular key value together with a record identifier for indicating a particular record which stores said particular key value, said nodes being linked via. pointers, the apparatus for optimizing the processing of a query comprising:
a memory;
a processor coupled to the memory for executing a plurality of instructions present in the memory, when executed, the processor is operable to:
create at least one index, in response to the query received specifying at least one condition (key value) for selecting particular records, wherein the index stores the key values and an additional memory in the nodes to store at least one metadata generated;
generate, during the creation of the index, the metadata for atleast one duplicate key value by checking duplicacy in the key value specified in the query received, and thereby store the metadata generated in the additional memory;
perform at least one operation based on the query received, and thereby update the metadata stored in the nodes based on the operation performed, wherein the operation is configured to change the records;
perform a condition search for retrieving the record based on the query received;
find, based on the condition, a range (minimum to maximum) to generate a subtree from the tree, and thereby locate the key value in the nodes, and isolate the subtree comprising the key value;
find, based on the condition, a node match in the subtree generated to find the key value, wherein the node match is at least one of a full node match or a partial node match, and thereby select, by using the metadata stored, the duplicate key values to determine the duplicacy in the index; and
display the record (result of execution) retrieved based on the query received.
13. The apparatus as claimed in claim 12, wherein the query is received in a format comprising CREATE INDEX query along with [OPTIMIZED FOR DUPLICATE] option.
14. The apparatus as claimed in claim 12, wherein the metadata is a bit-based metadata for each node in the tree, wherein for each record having duplicates, end bits (tail) of a duplicate list is marked with “0”, while the remaining bits of the duplicate list is marked as “1”.
15. The apparatus as claimed in claims 12, wherein the Bit-based metadata after locating the key value is generated by:
generating a map of possible matches; and
performing a Bit-wise AND of the map generated with the bitmap of the node of the tree and thereby generating a result with an exact match of the key values and then restricting the search scope within the metadata bitmap.
16. The apparatus as claimed in claim 12, wherein duplicacy in the key value specified in the query received is checked during insertion in the tree during which first the node is searched for insertion and then position within the node, and if a record with the same key value is found during search, then new record is add to the end of the duplicates identified, and if duplicate meta-data bit field updated.
17. The apparatus as claimed in claim 12, wherein the operation to be performed on the tree is selected from a group of SQL operations comprising insert, update, delete, record on table, or any combination thereof.
18. The apparatus as claimed in claim 12, wherein the key value is located by a range search technique usinga range key pair i.e., minimum value and maximum value.
19. The apparatus as claimed in claim 12, wherein the full node match indicates all nodes in subtree are same and/or matching the condition (key value in the condition), and if found, extracting the tuple id’s along with the data values from the nodes without reading key values stored at nodes.
20. The apparatus as claimed in claim 12, wherein the partial node match indicates at least one node but not all nodes in the sub-tree are same and/or matching the condition (key value in the condition), if found, using the metadata (the key color/mark) with duplicity information to identify the duplicates.
21. The apparatus as claimed in claim 13 is configured to color/mark the bitmap metadata based on duplicacy identified during record insertion.
,TagSPECI:
TECHNIAL FIELD
The present subject matter described herein, in general, relates to database management system and more particularly to a system, method and apparatus for faster query execution by optimizing index in a database management system (DBMS).
BACKGROUND
A database system is generally used to answer queries requesting information from the database stored. A query may be defined as a logical expression over the data and the data relationships expressed in the database, and results in the identification of a subset of the database. In the recent advancements, database systems enable a single query execution to be run in parallel.
Normally as per conventional query execution techniques, in order to execute any query, the database management systems (DBMS) first converts the query to a parse tree, then the parse tree is converted into a plan tree. Based on the plan tree generated, an executor tree is created. Finally, a database executor executes each node of the execution tree to yield the final result.
The DBMS may involve various stages for the execution of a query received. The exemplary stages involved in any sequential query language (SQL) statement processed by the RDBMS is given below:
Stage 1: Parse (Syntax Check): It parses the SQL statement syntax and decides whether it conform to the standard
Stage 2: Analyze: It checks whether the objects (tables, columns, etc.) used in SQL statement exist in Database. This phase will extract any bind variable if there are any.
Stage 3: Optimize: It chooses the best plan for query execution based on cost.
Stage 4: Execute: It executes the best plan generates in previous step and returns the result.
The execution tree generated during the execution process contains leaf and internal nodes, where leaf node may be a scan node (including a sequence scan or an index scan) based on some Qual (i.e., qualification, if any) on base table. The internal nodes take result of leaf node and certain operations (known to person skilled in the art) of join or sort to yield intermediate result, which finally apply some constraint on top (e.g. join or any clause) or at root node to yield the final result.
In Index Scan, an index is a special data stores which organizes the references to the data in a table in specific order when insert is performed on the table, such that, retrieval or search of data on the table is faster and efficient.
A TTree (T-tree, or ttree, or t-tree) is one of the popular choices of data structure for maintaining index for In-memory database management system. The T-tree is a type of binary tree (B-tree) data structure that is used by main-memory databases, such as Datablitz, EXtremeDB, MySQL Cluster, Oracle TimesTen and MobileLite.As per the data available on Wikipedia, the T-tree is a balanced index tree data structure optimized for cases where both the index and the actual data are fully kept in memory. T-trees seek to gain the performance benefits of in-memory tree structures such as AVL trees while avoiding the large storage space overhead which is common to them.T-trees do not keep copies of the indexed data fields within the index tree nodes themselves. Instead, they take advantage of the fact that the actual data is always in main memory together with the index so that they just contain pointers to the actual data fields.
Searching on TTree is done using a user’s condition, received from the query fired on DBMS, for which Index are to be created and remaining conditions (if any, in the query) are used as Qual (Qualification node) in the execution tree. During the searching operation, the conditions are first converted into a range(min to max) and ttree scan is done for min first.Once record is found, a simple Tree traversal till max is found and all the matching records are projected to the upper node of the execution tree.
Currently in-memory, database systems uses TTree based index for performing range queries. Theindex may find (search) a record, i.e., time take for searching at least one record is in O(log n) time. The Tree indexes are optimized for range queries. The TTree indexes may be optimized for space too. They store data in each node (not only in the leaflike in B-Tree). They also store only row-ids and do not store the value of the key in the index. However, for each value comparison (record searching), the DBMS may have to access the page memory, which generally causes a cache miss. To avoid the cache miss problem, some users maintain records because of which the index key is largely duplicated. This duplicate key value is stored in order of record id within a TTree node.
Conventionally, the duplicate record search problem in TTree is handled by firstly finding the left most nodes which contains the record, then in the node performs binary search to find the matching record. Considering the presence of duplicates in the TTree, the record matched in Mid as result of Binary Search may have left as well as right adjacent record/s also matching the key. For search to be proper a linear search left is performed till all matching in current node is found and then right linear search is performed till all matching record is found.
SUMMARY
This summary is provided to introduce concepts related to a system, method and apparatus for faster query execution by optimizing index in a database management system (DBMS) are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter nor is it intended for use in determining or limiting the scope of the claimed subject matter.
TECHNICAL PROBLEM: As per the convention technologies, Tree indexes are not highly optimized for searches of records where there are a lot of duplicates. Due to presence of duplicates the tree index search reduces to O(n) time i.e., search cost increases. Along with increase in search cost creates a scenario where the DBMS have to access more random memory. Along with more access of random memory the cache miss ratio also tends to go up. In view of this technical problem, there is a need to provide a mechanism which can optimize a search of duplicated key values in the index (in the execution tree) during the execution of any database query in DBMS to optimize the execution process, efficiently search/ retrieve the intended result from the database, and thereby increase the overall performance of the machine (CPU) to achieve the higher speed.
TECHNICAL SOLUTION: For solving the above mentioned problems and the other problems available in the prior-art a mechanism that can optimize a search of duplicated key values in the index (in the execution tree) during the execution of any database query in DBMS to optimize the execution process, efficiently search/ retrieve the intended result from the database, and thereby increase the overall performance of the machine (CPU) to achieve the higher speed is provided in the present invention. Further, the present invention provides a mechanism to improve efficiency of search on data indexed with tree (TTree) having high number of duplicate key values, by reducing the number of key comparisons to select the desired records
The plurality of aspects provides a system, method and apparatus for faster query execution by optimizing index in a database management system (DBMS). The technical solutions are as follows:
Accordingly, in one aspect of the present invention, a system having an apparatus, and the system including means for logically ordering the records by an index, said index comprising a tree structure storing key values derived from values stored by said at least one field of each record, said index including a plurality of nodes, each node comprising a particular key value together with a record identifier for indicating a particular record which stores said particular key value, said nodes being linked via. Pointers are disclosed. The apparatus implemented method for optimizing the processing of a query comprising:
• creating at least one index, in response to the query received specifying at least one condition (key value) for selecting particular records, wherein the index stores the key values and an additional memory in the nodes for storing at least one metadata generated;
• generating, during the creation of the index, the metadata for at least one duplicate key value by checking duplicacy in the key value specified in the query received, and thereby storing the metadata generated in the additional memory;
• performing at least one operation based on the query received, and thereby updating the metadata stored in the nodes based on the operation performed, wherein the operation is configured to change the records;
• performing a condition search for retrieving the record based on the query received;
• finding, based on the condition, a range (minimum to maximum) to generate a subtree from the tree, and thereby locating the key value in the nodes, and isolating the subtree comprising the key value;
• finding, based on the condition, a node match in the subtree generated to find the key value, wherein the node match is at least one of a full node match or a partial node match, and thereby selecting, by using the metadata stored, the duplicate key values to determine the duplicacy in the index; and
• displaying the record (result of execution) retrieved based on the query received.
In one aspect of the present invention, an apparatus to be used in a system, and the system including means for logically ordering the records by an index, said index comprising a tree structure storing key values derived from values stored by said at least one field of each record, said index including a plurality of nodes, each node comprising a particular key value together with a record identifier for indicating a particular record which stores said particular key value, said nodes being linked via. Pointers are disclosed. The apparatus for optimizing the processing of a query comprises a memory, and a processor coupled to the memory for executing a plurality of instructions present in the memory. When the plurality of instructions present in the memory executed, the processor is operable to:
• create at least one index, in response to the query received specifying at least one condition (key value) for selecting particular records, wherein the index stores the key values and an additional memory in the nodes to store at least one metadata generated;
• generate,during the creation of the index, the metadata for at least one duplicate key value by checking duplicacy in the key value specified in the query received, and thereby store the metadata generated in the additional memory;
• perform at least one operation based on the query received, and thereby update the metadata stored in the nodes based on the operation performed, wherein the operation is configured to change the records;
• perform a condition search for retrieving the record based on the query received;
• find, based on the condition, a range (minimum to maximum) to generate a subtree from the tree, and thereby locate the key value in the nodes, and isolate the subtree comprising the key value;
• find, based on the condition, a node match in the subtree generated to find the key value, wherein the node match is at least one of a full node match or a partial node match, and thereby select, by using the metadata stored, the duplicate key values to determine the duplicacy in the index; and
• display the record (result of execution) retrieved based on the query received.
In implementation, the present invention enables to mark duplicate keys during index modification phase so as to use the knowledge (marking) to optimize search.
In one implementation, the present invention enables to mark duplicate keys inside the ttree so as to avoid cache misses while executing searches.
In one implementation, the present invention enables to identify a complete ttree node containing the same key so as to complete avoid key search in this node.
In one implementation, the present invention enables to limit the size of tree metadata so as to improve the space usage of the database.
In one implementation, by the use of the above mentioned technical solution, the present invention provides the mechanism to optimize the ttree for duplicate key searches, wherein, during data insertion the mechanism detects duplication and records as metadata. Further, the same knowledge of duplicated records is used in future search to optimize the selection and projection of same instead of comparing record value for each record.
In one implementation, a data definition language or data description language (DDL) enhancement to the user is provided so that the user can indicate the preference to optimize an index for duplicate searches. In the present invention, an extension to the existing DDL statement to achieve the technical solution provided in the present invention is provided. The new DDL statement (query is received in the form/format) the query is received in a format comprising CREATE INDEX query along with [OPTIMIZED FOR DUPLICATE] option.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer like features and components.
Figure 1illustratesan apparatus, to be used in a system, for optimizing the processing of a query, in accordance with an embodiment of the present subject matter.
Figure 2illustrates a method for optimizing the processing of a query, in accordance with an embodiment of the present subject matter.
Figure 3illustratesthe steps involved in the present invention, in accordance with an embodiment of the present subject matter.
Figure 4(a) illustrates an example of a scan for key with value = 5 on a tree, in accordance with an embodiment of the present subject matter.
Figure 4(b) illustrates an example of an isolation of a sub tree with the locality of a key with value = 5 on a tree, in accordance with an embodiment of the present subject matter.
Figure 4(c) illustrates an example of a full node match, in accordance with an embodiment of the present subject matter.
Figure 4(d) illustrates an example of a bit-based metadata maintained for each node, in accordance with an embodiment of the present subject matter.
Figure 4(e) illustrates an example of determine all the duplicates when matching record in the tree, in accordance with an embodiment of the present subject matter.
Figure 5 illustrates an overall system comprising an apparatus anda server storing database, for optimizing the processing of a query, in accordance with an embodiment of the present subject matter.
DETAILED DESCRIPTION OF THE PRESENT INVENTION
The following clearly describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely a part rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
The invention can be implemented in numerous ways, including as a process, an apparatus, a system, a composition of matter, a computer readable medium such as a computer readable storage medium or a computer network wherein program instructions are sent over optical or electronic communication links. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
Systems, methods and apparatus for faster query execution by optimizing index in a database management system (DBMS) are disclosed.
While aspects are described for faster query execution by optimizing index in a database management system (DBMS) may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary systems, apparatus, and methods.
Referring now to figure 1,an apparatus (100) is illustrated, to be used in a system including means for logically ordering the records by an index, said index comprising a tree structure storing key values derived from values stored by said at least one field of each record, said index including a plurality of nodes, each node comprising a particular key value together with a record identifier for indicating a particular record which stores said particular key value, said nodes being linked via. Pointers. The apparatus (100) for optimizing the processing of a query comprises a memory (108), and a processor (104) coupled to the memory (108) for executing a plurality of instructions present in the memory. When the plurality of instructions present in the memory (108) executed, the processor (104) is operable to:
• Create (112) at least one index, in response to the query received specifying at least one condition (key value) for selecting particular records, wherein the index stores the key values and an additional memory in the nodes to store at least one metadata generated;
• Generate (114),during the creation of the index, the metadata for at least one duplicate key value by checking duplicacy in the key value specified in the query received, and thereby store the metadata generated in the additional memory;
• Perform (116) at least one operation based on the query received, and thereby update the metadata stored in the nodes based on the operation performed, wherein the operation is configured to change the records;
• Perform (118) a condition search for retrieving the record based on the query received;
• find, based on the condition, a range (minimum to maximum) to generate a subtree from the tree, and thereby locate the key value in the nodes, and isolate (120) the subtree comprising the key value;
• find (122), based on the condition, a node match in the subtree generated to find the key value, wherein the node match is at least one of a full node match or a partial node match, and thereby select, by using the metadata stored, the duplicate key values to determine the duplicacy in the index (124); and
• display (128) the record (result of execution) retrieved based on the query received.
In one implementation, the apparatus (100) comprises a database (110) and an interface (106)
In one implementation, the apparatus (100)is communicably coupled with the user devices / database client systems (102). Although the present subject matter is explained considering that the apparatus (100)is implemented as a separate computing unit it may be understood that the apparatus (100)may also be implemented on a server, in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, and the like. It will be understood that the apparatus (100) may be accessed by multiple users through one or more user devices/client systems 102-1, 102-2…102-N, collectively referred to as user (102) hereinafter, or applications residing on the user devices (102). Examples of the user devices (102) may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation. The user devices (102) are communicatively coupled to the apparatus (100) through a network (not shown).
In one implementation, the network may be a wireless network, a wired network or a combination thereof. The network can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further the network may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
In one implementation, the processor (104) may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor (104) is configured to fetch and execute computer-readable instructions stored in the memory (108).
The interface (106) may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The interface (106) may allow the client systems/users (102) to interact with a user directly or through the apparatus (100). Further, the interface (106) may enable the apparatus (100)to communicate with other computing devices, such as web servers and external data servers (not shown). The interface (106) can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The interface (106) may include one or more ports for connecting a number of devices to one another or to another server.
The memory (108) may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory (108) may include at least one query compiler configured to prepare an execution plan in a tree structure, with a plurality of plan nodes, for the database query received. It shall be noted that the query compiler is a conventional compiler and the execution plan generation done in the tradition / convention approaches as available in the prior-art.
Figure 5 illustrates an overall system comprising an apparatus (100) and a server (500) storing database (110), for optimizing the processing of a query, in accordance with an embodiment of the present subject matter.
In one implementation, the query is received in a format comprising CREATE INDEX query along with [OPTIMIZED FOR DUPLICATE] option.
In one implementation, the metadata is a bit-based metadata for each node in the tree, wherein for each record having duplicates, end bits (tail) of a duplicate list is marked with “0”, while the remaining bits of the duplicate list is marked as “1”.
In one implementation, the operation to be performed on the tree is selected from a group of SQL operations comprising insert, update, delete, record on table, or any combination thereof.
In one implementation, the full node match indicates all nodes in subtree are same and/or matching the condition (key value in the condition), and if found, extracting the tuple id’s along with the data values from the nodes without reading key values stored at nodes.
In one implementation, the partial node match indicates at least one node but not all nodes in the sub-tree are same and/or matching the condition (key value in the condition), if found, using the metadata (the key color/mark) with duplicity information to identify the duplicates.
In one implementation, the bit-based metadata after locating the key value is generated by firstly, generating a map of possible matches; and then performing a Bit-wise AND of the map generated with the bitmap of the node of the tree and thereby generating a result with an exact match of the key values and then restricting the search scope within the metadata bitmap.
In one implementation, the duplicacy in the key value specified in the query received is checked during insertion in the tree during which first the node is searched for insertion and then position within the node, and if a record with the same key value is found during search, then new record is add to the end of the duplicates identified, and if duplicate meta-data bit field updated.
In one implementation, the operation to be performed on the tree is selected from a group of SQL operations comprising insert, update, delete, record on table, or any combination thereof.
In one implementation, the key value is located by a range search technique using a range key pair i.e., minimum value and maximum value.
In one implementation, the full node match indicates all nodes in subtree are same and/or matching the condition (key value in the condition), and if found, extracting the tuple id’s along with the data values from the nodes without reading key values stored at nodes.
Figure 2 illustrates a method performed by an apparatus (100)in a system, and the system including means for logically ordering the records by an index, said index comprising a tree structure storing key values derived from values stored by said at least one field of each record, said index including a plurality of nodes, each node comprising a particular key value together with a record identifier for indicating a particular record which stores said particular key value, said nodes being linked via. pointers, in accordance with an embodiment of the present subject matter. The method may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method or alternate methods. Additionally, individual blocks may be deleted from the method without departing from the scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method may be considered to be implemented in the above described apparatus (100).
At block 202, the database query is received by the apparatus (100). The create at least one index, in response to the query received specifying at least one condition (key value) for selecting particular records, wherein the index stores the key values and an additional memory in the nodes to store at least one metadata generated. The query is received in a format comprising CREATE INDEX query along with [OPTIMIZED FOR DUPLICATE] option.
At block 204, the metadata for at least one duplicate key value by checking duplicacy in the key value specified in the query received is generated. The metadata generated is stored in the additional memory of the nodes of Index. The metadata is a bit-based metadata for each node in the tree, wherein for each record having duplicates, end bits (tail) of a duplicate list is marked with “0”, while the remaining bits of the duplicate list is marked as “1”.
At block 206,at least one operation is performed based on the query received, and the metadata stored in the nodes is updated based on the operation performed, wherein the operation is configured to change the records. The operation to be performed on the tree is selected from a group of SQL operations comprising insert, update, delete, record on table, or any combination thereof
At block 208,a condition search for retrieving the record based on the query received is performed.
At block 210,a range (minimum to maximum) is found based on the condition received to generate a subtree from the tree, and thereby locate the key value in the nodes, and isolate the subtree comprising the key value.
At block 212, a node match is found in the subtree generated to find the key value, wherein the node match is at least one of a full node match or a partial node match. The full node match indicates all nodes in subtree are same and/or matching the condition (key value in the condition), and if found, extracting the tuple id’s along with the data values from the nodes without reading key values stored at nodes. The partial node match indicates at least one node but not all nodes in the sub-tree are same and/or matching the condition (key value in the condition), if found, using the metadata (the key color/mark) with duplicity information to identify the duplicates.
At block 214, the duplicate key values to determine the duplicacy in the index is selected by using the metadata stored.
At block 216, the record (result of execution) retrieved based on the query received is displayed using the interface to the user.
Figure 3 illustrates the steps involved in the present invention, in accordance with an embodiment of the present subject matter. In one implementation, the present invention discloses a method to optimize the ttree for duplicate key searches. Wherein, during data insertion it detects duplication and records as metadata. The same knowledge of duplicated records is used in future search to optimize the selection and projection of same instead of comparing record value for each record.
Using the technical solution provided, the present invention optimizes the ttree for duplicate key searches.
Using the technical solution provided, the present invention optimizes the ttree for duplicate key searches by maintaining certain metadata which increases. It may be understood by the person skilled in the art that, maintenance of the metadata is not software code free, and hence may increase the cost of writes operation (Insert/Update/Delete)to the index and ultimately increasing the space consumed by the index.
Using the technical solution provided, the present invention offers a DDL enhancement to the user so that the user can indicate the preference to optimize an index for duplicate searches.
The steps provided in the Figure 3 is best explained by the below mentioned example. It may be understood by the person skilled in the art that the example is just for understanding purpose and shall not limit the scope of the present invention.
Consider a Table T1 with 4 columns T1(C1, C2, C3, C4)
STEP 1:
Receive query in the form CREATE INDEX idx ON t1(c1, c2) OPTIMIZED FOR DUPLICATE
a. The TTree nodes for the given key condition is identified in the first step (Specifically, create a TTree index for Table T1 using columns C1;)
b. This special type of TTree index reserves an additional memory to hold the metadata for duplicacy in Table data.
c. As shown in Figure 4(a) an example of a scan for key with value = 5 on a tree, in accordance with an embodiment of the present subject matter is disclosed. The figure 4(a) shows the Ttree nodes for the given key condition (key = 5 in this case).
STEP 2:
Generate metadata for every data in table into the ttree nodes
a. If data exists in Table T1; read the same and build TTree index for the same using C1 Column as specified in the Create Index Command in Step 1.
b. During this building of index, we check for duplicacy in the key values (i.e. C1 column in this case); result for the same is stored in the additional metadata space reserved in each node of TTree.
c. The figure 4(a) shows the Ttree nodes for the given key condition (key = 5 in this case)and storing metadata at its tail.
STEP 3:
Perform Insert/Update/Delete operation on TTree
a. This step is indicative of users write operation on table (like insert, update, delete or records in the table). This is a normal processing during convention query execution.
STEP 4:
Update metadata
a. With each Table operation as explain in Step 3; TTree index on the table is also in modified according to the modification of the data in table.
b. At this point the duplicacy check for the each record inserted/deleted/modified with the operation is also carried out and result is stored in meta-data.
STEP 5:
Perform point search on TTree
a. This step is indicative of users read operation on the table data (i.e. records). The same is given in form of selection query with a search condition, as exemplified below: -
SELECT c1, c2, c3, c4 FROM t1 WHERE c1 = 5
b. This is a normal processing during convention query execution.
STEP 6:
Find Min Sub-TTree from Condition Key Range i.e., isolate the sub-Ttree with locality of key value
a. Applying standard logic for TTree search i.e., First a sub tree where the search condition is located is identified, and then isolated.
b. The isolation of a sub tree based on the locality of the key value is shown in Figure 4 (b).
c. The key value is located based by a range search technique using a range key pair i.e., minimum value and maximum value. Search in a TTree is by range key pair; i.e. minimum value andmaximum value;
1. Point search: when minimum key = maximum key value (eg. Colum1 = Value1)
2. Range Search: When minimum key < maximum key value (eg. Colum1 > minVal and Column1 < maxval)
3. Infinity Search: When min = -ve infinity and max = +ve infinity (eg. ORDER BY Column1)
4. Partial Infinity: When either min or max is defined and other end of the range is infinity. E.g.:
a. Column1 < value1;
b. Column1 > value 1;
5. Null Range: when range specified is self contradicting and will not produce any records. E.g:
a. Column1 =1 and Column1 =2
b. Column1 >5 and column 1 < 5
STEP 7:
Find Full Node Match and Add to selection
a. In case all records in sub-ttree are matching the condition range; then select all records without key compare as all records are same and matching the key value.
b. All records from this node are extracted without any search on the node.
c. If satisfies the condition, the row-ids of this node are extracted without any search on the node.
d. Continuing the previous example we see that “Node-3” completely satisfies the condition. All records from this node are extracted without any search on the node is as shown in Figure 4(c).
STEP 8:
Node Partial Match: Find the first matching record
a. In case some records in sub-TTree are matching the condition range, use the metadata (the key color) with duplicity information to identify the duplicates.
b. In case some records in sub-ttree are matching the condition range, use the key color to identify the duplicates.
c. In the example as shown in figure 4(d)it is seen that the first 5 records meet the criteria. So select first Node 2 matching records and just start adding all records for which color indicates next record is of same color.
STEP 9:
Select duplicate by using stored meta-data
a. As shown in Figure 4(e), a bit-based metadata for each node is maintained. The bit represents the position of each record in the node.
b. For each record which has duplicate, the tail of the duplicate list is marked with “0”, while the complete list is marked as “1”.
c. Once we find one matching record using binary search, using a specific bit map can determine all the duplicates.
d. The figure 4(e) shows two examples for the following data with 9 positions and search for 20.
STEP 10:
Project and Transfer to user, display results.
a. This step is indicative of returning of the result to user as response to the search query in the format specified by user in the query-
SELECT c1, c2, c3, c4 FROM t1 WHERE c1 = 5
Exemplary embodiments discussed above may provide certain advantages. Though not required to practice aspects of the disclosure, these advantages may include:
1. The mechanism disclosed in the present invention improves performance of any database query containing the scan on the TTree Index table, which has largely duplicated key values.
2. The mechanism allows indicating an index is a duplicate search optimized index in an in-memory database.
3. The mechanism maintain duplication related metadata during writes to an ttree index
4. The mechanism uses the metadata for point searches to find duplicates in a very fast manner.
5. The mechanism uses the metadata for range searches to find duplicates for the range extremities.
6. The mechanism create column with data-type as TAG to store the variable number of elements fields.
7. The mechanism avoids comparison of data from data pages in a ttree storing only rowed.
8. The mechanism allows mapping a whole ttree node to the search condition to avoid search on the node.
Although implementations for a system, method and apparatus for faster query execution by optimizing index in a database management system (DBMS)have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations of a system, method and apparatus for faster query execution by optimizing index in a database management system (DBMS).
| # | Name | Date |
|---|---|---|
| 1 | 1776-CHE-2015-Response to office action [14-02-2025(online)].pdf | 2025-02-14 |
| 1 | GPA of Huawei Technologies India Pvt. Ltd..pdf | 2015-04-13 |
| 2 | 1776-CHE-2015-WithDrawalLetter.pdf | 2017-09-05 |
| 2 | FORM 3.pdf | 2015-04-13 |
| 3 | 1776-CHE-2015-RELEVANT DOCUMENTS [29-08-2017(online)].pdf | 2017-08-29 |
| 3 | FORM 2 & Complete Specification.pdf | 2015-04-13 |
| 4 | abstract 1776-CHE-2015.jpg | 2015-09-01 |
| 4 | Drawings.pdf | 2015-04-13 |
| 5 | abstract 1776-CHE-2015.jpg | 2015-09-01 |
| 5 | Drawings.pdf | 2015-04-13 |
| 6 | 1776-CHE-2015-RELEVANT DOCUMENTS [29-08-2017(online)].pdf | 2017-08-29 |
| 6 | FORM 2 & Complete Specification.pdf | 2015-04-13 |
| 7 | 1776-CHE-2015-WithDrawalLetter.pdf | 2017-09-05 |
| 7 | FORM 3.pdf | 2015-04-13 |
| 8 | 1776-CHE-2015-Response to office action [14-02-2025(online)].pdf | 2025-02-14 |
| 8 | GPA of Huawei Technologies India Pvt. Ltd..pdf | 2015-04-13 |