Informed Search Space Optimization For Optimal Denial Constraints

< Back

Informed Search Space Optimization For Optimal Denial Constraints Discovery

Abstract: ABSTRACT INFORMED SEARCH SPACE OPTIMIZATION FOR OPTIMAL DENIAL CONSTRAINTS DISCOVERY Data quality management deals with discovery of Denial Constraints (DCs). The process of identifying and defining DCs is largely manual and demands clear understanding of domain and the database design know-how making the whole process tedious, complex, and human error prone. Embodiments herein provide a method and system for applying a simulated annealing for search space optimization for denial constraints discovery. This disclosure aims at identifying DCs by applying AI and ML techniques to reduce the complexity in the search space and focuses on discovering an approximate set of DCs from the complete possible set of DCs. The system combines AI and ML techniques to minimize the predicate search space by employing appropriate search space optimization techniques, greedy algorithms, back-tracing, and search space reduction technique employing clustering of predicates. Thus, number of computation operations required for DC discovery are many folds lesser than a brute force approach. [To be published with FIG. 3]

Get Free WhatsApp Updates!
Notices, Deadlines & Correspondence

Patent Information

Application #

Filing Date

13 January 2023

Publication Number

29/2024

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Applicants

Tata Consultancy Services Limited

Nirmal Building, 9th floor, Nariman point, Mumbai 400021, Maharashtra, India

Inventors

1. MATHEW, Antony John

Tata Consultancy Services Limited, TCS Centre-SEZ, Tata Consultancy Services Limited, Infopark Special Economic Zone, Kochi 682042, Kerala, India

2. MALIAKAL ANTONY, Roopa

Tata Consultancy Services Limited, Electronic Technology Park SEZ I Technopark Campus, Karyavattom P.O, Trivandrum 695581, Kerala, India

3. SAJ, Teena Elsa

Tata Consultancy Services Limited, TCS Centre-SEZ, Tata Consultancy Services Limited, Infopark Special Economic Zone, Kochi 682042, Kerala, India

Specification

Description:FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:
INFORMED SEARCH SPACE OPTIMIZATION FOR OPTIMAL DENIAL CONSTRAINTS DISCOVERY

Applicant
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India

Preamble to the description:

The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
[001] The disclosure herein generally relates to the field of data quality management and more particularly, to a method and system for an informed search space optimization for an optimal denial constraints discovery leveraging artificial intelligence (AI)/ machine learning (ML).

BACKGROUND
[002] Data Quality is one of the major challenges faced in industry. Data quality can reduce due to various issues in the data related to completeness, accuracy, validity, consistency, redundancy, and availability. The first step in improving the quality is to detect these issues, for which appropriate rules are to be defined at the technical and business level. Such rules represent the constraints that may be imposed on the data to ensure that the quality of data is intact. Denial Constraints (DCs) is an amalgamation of a variety of such constraints like Integrity Constraints (IC) applied on datasets such as key constraints, Functional Dependency (FD) and Order Dependencies (OD). There are a few techniques and algorithms available for identifying DCs.
[003] The major problem with these techniques and algorithms is the inefficiency in addressing the increasing computing complexity that arises with the increasing search space of predicates while deriving DCs for a given dataset. The process of identifying and defining DCs is largely manual and demands clear understanding of domain and the database design know-how making the whole process tedious, complex, and human error prone. To reduce the human effort, a system driven DC discovery is preferable, where search and discover within a given database instance is performed for all valid denial constraints. For such a discovery system, the system needs to compute and search through a large search space comprising of tuple pairs of each record in the given dataset and predicates formed based on columns in the given dataset. This makes the problem of DC discovery very expensive in both computation space and time.
[004] Current process followed in the generating of DCs is to apply deterministic search techniques like Depth First Search/ Breadth First Search in the predicate space. Brute force computation of predicate combination and its evaluation is another option. However, these deterministic approaches tend to make the computation time increase many folds as each combination or at the least each tree in the DFS needs to be evaluated.

SUMMARY
[005] Embodiments of the disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method and system for an informed search space optimization for an optimal denial constraints discovery leveraging artificial intelligence (AI)/ machine learning (ML) is provided.
[006] In one aspect, a processor-implemented method for an informed search space optimization for an optimal denial constraints discovery leveraging AI/ML is provided. The processor-implemented method comprising receiving, via an input/output interface, a plurality of records of a predefined dataset and generating a plurality of pairs of the received each of the plurality of records. Further, the processor-implemented method includes extracting, via the one or more hardware processors, one or more predicates from the plurality of pairs based on a metadata of one or more columns in the predefined dataset and clustering the extracted one or more predicates based on a combination of considerations. Furthermore, the processor-implemented method includes generating, via the one or more hardware processors, at least one predicate combination within the one or more clusters of the predicates and reducing size of the combination of the considerations based on a predicate interaction behavior.
[007] Furthermore, the processor-implemented method includes regenerating, via the one or more hardware processors, combination of the considerations based on the predicate interaction behavior to simulate a starting position of combination for annealing. A combination of the clustered predicates is iteratively annealed by removing one predicate from the combination in each iteration such that the combination retains the minimum coverage needed to form the minimal set cover. Furthermore, the processor-implemented method includes identifying, via the one or more hardware processors, a final state of the combination that remains after the simulated annealing iterations. Finally, the processor-implemented method includes optimizing, via the one or more hardware processors, the predicate search space and thereby reducing computation and time demands in a denial constraints discovery process.
[008] In another aspect, a system for an informed search space optimization for an optimal denial constraints discovery leveraging AI/ML is provided. The system includes at least one memory storing programmed instructions, one or more Input /Output (I/O) interfaces, and one or more hardware processors operatively coupled to the at least one memory, wherein the one or more hardware processors are configured by the programmed instructions to receive a plurality of records of a predefined dataset. Further, the one or more hardware processors are configured by the programmed instructions to form a plurality of pairs of the received each of the plurality of records to extract one or more predicates from the plurality of pairs based on a metadata of one or more columns in the predefined dataset.
[009] Further, the one or more hardware processors are configured by the programmed instructions to cluster the extracted one or more predicates based on a combination of considerations to form at least one predicate combination within the one or more clusters of the predicates. Furthermore, the one or more hardware processors are configured by the programmed instructions to reduce size of the combination of the considerations based on a predicate interaction behavior and to regenerate combination of the considerations based on the predicate interaction behavior to simulate a starting position of combination for annealing. A combination of the clustered predicates is iteratively annealed by removing one predicate from the combination in each iteration such that the combination retains the minimum coverage needed to form the minimal set cover. Further, the one or more hardware processors are configured by the programmed instructions to identify a final state of the combination that remains after the simulated annealing iterations. Finally, the one or more hardware processors are configured by the programmed instructions to optimize the predicate search space and thereby reducing computation and time demands in a denial constraints discovery process.
[010] In yet another aspect, one or more non-transitory machine-readable information storage mediums are provided comprising one or more instructions, which when executed by one or more hardware processors causes a method for an informed search space optimization for an optimal denial constraints discovery leveraging AI/ML. The processor-implemented method comprising receiving, via an input/output interface, a plurality of records of a predefined dataset and generating a plurality of pairs of the received each of the plurality of records. Further, the processor-implemented method includes extracting, via the one or more hardware processors, one or more predicates from the plurality of pairs based on a metadata of one or more columns in the predefined dataset and clustering the extracted one or more predicates based on a combination of considerations. Furthermore, the processor-implemented method includes generating, via the one or more hardware processors, at least one predicate combination within the one or more clusters of the predicates and reducing size of the combination of the considerations based on a predicate interaction behavior.
[011] Furthermore, the processor-implemented method includes regenerating, via the one or more hardware processors, combination of the considerations based on the predicate interaction behavior to simulate a starting position of combination for annealing. A combination of the clustered predicates is iteratively annealed by removing one predicate from the combination in each iteration such that the combination retains the minimum coverage needed to form the minimal set cover. Furthermore, the processor-implemented method includes identifying, via the one or more hardware processors, a final state of the combination that remains after the simulated annealing iterations. Finally, the processor-implemented method includes optimizing, via the one or more hardware processors, the predicate search space and thereby reducing computation and time demands in a denial constraints discovery process.
[012] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS
[013] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[014] FIG. 1 illustrates a network diagram of a system for an informed search space optimization for an optimal denial constraints discovery, in accordance with some embodiments of the present disclosure.
[015] FIG. 2 is an exemplary flow diagram illustrating a method for an informed search space optimization for an optimal denial constraints discovery, in accordance with some embodiments of the present disclosure.
[016] FIG. 3 is a functional block diagram to illustrate search space reduction and optimization process in the discovery of denial constraints, in accordance with some embodiments of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS
[017] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[018] The data quality can reduce due to issues present in data related to completeness, accuracy, validity, consistency, redundancy, and availability. The first step in improving the quality is to detect these issues, for which appropriate rules are to be defined at technical and business level. Such rules represent the constraints that may be imposed on the data to ensure that the quality of data is intact. Denial Constraints (DCs) is an amalgamation of a variety of constraints like Integrity Constraints (IC) applied on datasets such as key constraints, Functional Dependency (FD) and Order Dependencies (OD). Denial Constraints discovery uses search techniques like Depth First or Breadth First which consumes considerable time in finding the minimal covers needed for identifying the final DCs. The time taken is attributed largely due to the vastness of the search space attributed to the count of predicates applicable to the dataset. The common approaches to keep the time taken within limits are to increase the computation power and employ distributed processing. However, with the non-linear increase in the complexity factor with increase in predicate space, the approach of increasing computation power may not be viable for larger search space.
[019] The process of identifying and defining DCs is largely manual and demands clear understanding of domain and the database design know-how making the whole process tedious, complex, and human error prone. To reduce the human effort, a system driven DC discovery is preferable, where search and discover within a given database instance is performed for all valid denial constraints. For such a discovery system, the system needs to compute and search through a large search space comprising of tuple pairs of each record in the given dataset and predicates formed based on columns in the given dataset. This makes the problem of DC discovery very expensive in both computation space and time.
[020] Current process followed in the generating of DCs is to apply deterministic search techniques like Depth First Search/ Breadth First Search in the predicate space. Brute force computation of predicate combination and its evaluation is another option. However, these deterministic approaches tend to make the computation time increase many folds as each combination or at the least each tree in the DFS needs to be evaluated.
[021] Existing solutions of an automated DC discovery leads to a system that is expensive in both computation and time dimensions. This complexity is largely attributed to predicate search space and tuple pair size which increases in a non-linear fashion with an increase in the records size and/ or column count in the given dataset. The proposed new approach overcomes the runtime complexity in the search space, attributed to the number of predicates, in the DC discovery. With the new approach runtime grows only linearly with increase in the number of predicates in the search space resulting in a reduction in overall time taken in the discovery by orders of magnitude. This approach can discover results in an acceptable time frame that otherwise took hours to compute in a brute force manner.
[022] Therefore, the embodiments herein provide a method and system for an informed search space optimization for an optimal denial constraints discovery leveraging artificial intelligence and machine learning. This disclosure aims at identifying DCs by applying AI and ML techniques to reduce the complexity in the search space and focuses on discovering an approximate set of DCs from the complete possible set of DCs. The system combines AI and ML techniques to minimize the predicate search space by employing appropriate search space optimization techniques, greedy algorithms, back-tracing, and search space reduction technique employing clustering of predicates. Thus, number of computation operations required for DC discovery are many folds lesser than a brute force approach. For instance, a 30 predicates dataset would demand a max computation of 14 million whereas employing an appropriate search space optimization can reduce this computation to not more than 65 thousand indicating a 99% reduction in computation demands. Employing clustering techniques further reduces this computation demand, but at the cost of reduced count of DCs discovered.
[023] Referring now to the drawings, and more particularly to FIG. 1 through FIG. 3, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[024] FIG. 1 illustrates a network diagram of a system 100 for an informed search space optimization for an optimal denial constraints discovery leveraging AI/ML. Although the present disclosure is explained considering that the system 100 is implemented on a server, it may also be present elsewhere such as a local machine. It may be understood that the system 100 comprises one or more computing devices 102, such as a laptop computer, a desktop computer, a notebook, a workstation, a cloud-based computing environment and the like. It will be understood that the system 100 may be accessed through one or more input/output interfaces 104-1, 104-2... 104-N, collectively referred to as I/O interface 104. Examples of the I/O interface 104 may include, but are not limited to, a user interface, a portable computer, a personal digital assistant, a handheld device, a smartphone, a tablet computer, a workstation and the like. The I/O interface 104 are communicatively coupled to the system 100 through a network 106.
[025] In an embodiment, the network 106 may be a wireless or a wired network, or a combination thereof. In an example, the network 106 can be implemented as a computer network, as one of the different types of networks, such as virtual private network (VPN), intranet, local area network (LAN), wide area network (WAN), the internet, and such. The network 106 may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), and Wireless Application Protocol (WAP), to communicate with each other. Further, the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices. The network devices within the network 106 may interact with the system 100 through communication links.
[026] The system 100 may be implemented in a workstation, a mainframe computer, a server, and a network server. In an embodiment, the computing device 102 further comprises one or more hardware processors 108, one or more memory 110, hereinafter referred as a memory 110 and a data repository 112, for example, a repository 112. The data repository 112 may also be referred as a dynamic knowledge base 112 or a knowledge base 112. The memory 110 is in communication with the one or more hardware processors 108, wherein the one or more hardware processors 108 are configured to execute programmed instructions stored in the memory 110, to perform various functions as explained in the later part of the disclosure. The repository 112 may store data processed, received, and generated by the system 100. The memory 110 further comprises a plurality of modules. The plurality of modules is configured to perform various functions.
[027] The system 100 supports various connectivity options such as BLUETOOTH®, USB, ZigBee and other cellular services. The network environment enables connection of various components of the system 100 using any communication link including Internet, WAN, MAN, and so on. In an exemplary embodiment, the system 100 is implemented to operate as a stand-alone device. In another embodiment, the system 100 may be implemented to work as a loosely coupled device to a smart computing environment. The components and functionalities of the system 100 are described further in detail.
[028] FIG. 2 is an exemplary flow diagrams illustrating a processor-implemented method 200 for an informed search space optimization for an optimal denial constraints discovery leveraging AI/ML implemented by the system 100 of FIG.1, according to some embodiments of the present disclosure.
[029] Initially at step 202 of the method 200, the one or more I/O interfaces 104 of the system 100 are configured to receive a plurality of records of a predefined dataset, for e.g., salary dataset (T) which contains records on salary given to employees in a company.
[030] At the next step 204 of the method 200, the one or more hardware processors 108 are configured by the programmed instructions to form a plurality of pairs of the received each of the plurality of records. Herein, each of the plurality of records are paired with another record in the predefined dataset. For e.g., salary record of first employee (x) is paired with every other salary record (y) present in the dataset.
[031] At the next step 206 of the method 200, the one or more hardware processors 108 are configured by the programmed instructions to extract one or more predicates from the plurality of pairs based on metadata of one or more columns, like column data type, order of magnitude of the column values, in the predefined dataset, wherein one or more column combinations are identified as joinable or comparable.
[032] In one example, wherein in the salary dataset (T), salary is compared with tax to form 6 different predicates generating one predicate group (Pg1) – T.Salary(x) = T.Tax(x) (P1), T.Salary(x) != T.Tax(x) (P2), T.Salary(x) >= T.Tax(x) (P3), T.Salary(x) <= T.Tax(x) (P4), T.Salary(x) < T.Tax(x) (P5) and T.Salary(x) > T.Tax(x) (P6). Similarly, Salary of one record can be compared with another salary record to form 6 different predicates generating another predicate group (Pg2) - T.Salary(x) = T. Salary(y) (P7), T.Salary(x) != T. Salary(y) (P8), T.Salary(x) >= T. Salary(y) (P9), T.Salary(x) <= T. Salary(y) (P10), T.Salary(x) < T. Salary(y) (P11) and T.Salary(x) > T. Salary(y) (P12).
[033] At the next step 208 of the method 200, the one or more hardware processors 108 are configured by the programmed instructions to cluster the extracted one or more predicates based on a combination of considerations. The combination of considerations includes a predicate group, and an overlap of predicates. Wherein, overlap of tuple pairs satisfied by two predicates and the interaction behavior of pairs of predicates thereby reducing the search space of predicates into separate clusters of predicates. Hierarchical clustering technique performs best when clustering predicates. However, due to inherent data size restriction of hierarchical clustering, Kmeans clustering technique could be leveraged.
[034] At the next step 210 of the method 200, the one or more hardware processors 108 are configured by the programmed instructions to form at least one predicate combination within the one or more clusters of the predicates. The predicates are taken through a sorting step based on the predicate group that each predicate is part of. All combinations of predicates are then formed based on these groups thus formed.
[035] In another example, wherein predicates from predicate groups Pg1 and Pg2 mentioned above in paragraph [032] could form combinations (P1,P7), (P1,P8), (P1,P9), (P1,P10), (P1,P11), (P1,P12), (P1,P13), (P2,P7), (P2,P8), (P2,P9), (P2,P10), (P2,P11), (P2,P12), (P2,P13), (P3,P7), (P3,P8), (P3,P9), (P3,P10), (P3,P11), (P3,P12), (P3,P13), (P4,P7), (P4,P8), (P4,P9), (P4,P10), (P4,P11), (P4,P12), (P4,P13), (P5,P7), (P5,P8), (P5,P9), (P5,P10), (P5,P11), (P5,P12), (P5,P13), (P6,P7), (P6,P8), (P6,P9), (P6,P10), (P6,P11), (P6,P12), (P6,P13).
[036] At the next step 212 of the method 200, the one or more hardware processors 108 are configured by the programmed instructions to reduce size of the combination of the considerations based on a predicate interaction behavior. For e.g., if the given combination contains predicates which are identified as subset predicate of a primary predicate, then the combination may be split into one or more combination such that the subset predicate does not appear in the same combination as primary predicate. The proposed approach leverages a greedy algorithm to reduce the combination size. The greedy algorithm uses individual predicate coverage as the consideration to identify the predicate to be removed. The predicate with lower individual coverage value is identified in each annealing iteration to be removed from the combination.
[037] Referring FIG. 3, a functional block diagram 300 to illustrate search space reduction and optimization process in the discovery of denial constraints, according to some embodiments of the present disclosure. Herein, the predicates are grouped into overlapping clusters and possible predicate combinations are formed within the predicate clusters. Each combination is looped through one or more iterations based on size of the predicate group in that combination. Each combination satisfies a predefined condition that the coverage of the combination is equal to or greater than the minimum coverage required to form a minimal set cover in the formation process of a denial constraints. The condition for a predicate combination to become the minimal set cover is that any of the individual predicates in the combination should be satisfied by any given record pair.
[038] In one illustration, wherein combination is looped through N iterations where N is the size of predicate groups. In each iteration loop each combination is re-evaluated after one of the predicates are identified through a greedy algorithm is removed from the combination, and the coverage is re-computed. If the re-computed coverage is seen to be more than acceptable coverage, then the parent combination is discarded and the annealed combination is taken forward for next iteration loop.
[039] At the next step 214 of the method 200, the one or more hardware processors 108 are configured by the programmed instructions to regenerate combination of the considerations based on the predicate interaction behavior to simulate a starting position of combination for annealing. A lesser greedy algorithm could retain the parent combination in addition to annealed combination to check possibility of alternate solutions formation. However, this could lead to increased computation complexities. If the re-computed coverage is seen to be lesser than acceptable coverage, then the annealed combination is discarded, and back-tracing is employed to fall back to parent combination.
[040] At the next step 216 of the method 200, the one or more hardware processors 108 are configured by the programmed instructions for annealing iteratively a combination of the clustered predicates by removing one predicate from the combination in each iteration such that the combination retains the minimum coverage needed to form the minimal set cover. An exit criterion is defined where the system stops annealing once it reaches either the iteration loop reaches the maximum possible value of N or there are no combinations which can be annealed further. Each of the one or more predicates with lower individual coverage value is identified, leveraging greedy algorithm, in each annealing iteration to be removed from the combination.
[041] At the next step 218 of the method 200, the one or more hardware processors 108 are configured by the programmed instructions to identify a final state of the combination that remains after the simulated annealing iterations. Each of the one or more predicates are backtracked to previous predicate combination state if the newly formed predicate combinations lack the minimum coverage that is needed to form a minimal set cover.
[042] Finally at the last step 220 of the method 200, the one or more hardware processors 108 are configured by the programmed instructions to optimizes the predicate search space and thereby reducing computation and time demands in a denial constraints discovery process.
[043] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[044] The embodiments of present disclosure herein address the problem of denial constraints discovery that uses search techniques like Depth First or Breadth First which consumes considerable time in finding the minimal covers needed for identifying the final DCs. The time taken is attributed largely due to the vastness of the search space attributed to the count of predicates applicable to the dataset. Therefore, embodiments herein provide a method and system for an informed search space optimization for an optimal denial constraints discovery leveraging AI/ML. This disclosure aims at identifying DCs by applying AI and ML techniques to reduce the complexity in the search space and focuses on discovering an approximate set of DCs from the complete possible set of DCs. The system combines AI and ML techniques to minimize the predicate search space by employing appropriate search space optimization techniques, greedy algorithms, back-tracing, and search space reduction technique employing clustering of predicates. Thus, number of computation operations required for DC discovery are many folds lesser than a brute force approach.
[045] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means, and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs, GPUs etc.
[046] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[047] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[048] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[049] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
, Claims:WE CLAIM:
1. A processor-implemented method (200) comprising steps of:
receiving (202), via an input/output interface, a plurality of records of a predefined dataset;
generating (204), via one or more hardware processors, a plurality of pairs of the received each of the plurality of records, wherein each of the plurality of records are paired with another record in the predefined dataset;
extracting (206), via the one or more hardware processors, one or more predicates from the plurality of pairs based on a metadata of one or more columns in the predefined dataset, wherein one or more column combinations are identified as one of a) joinable and b) comparable;
clustering (208), via the one or more hardware processors, the extracted one or more predicates based on a combination of considerations, wherein the combination of considerations include a predicate group, and an overlap of predicates;
generating (210), via one or more hardware processors, at least one predicate combination within the one or more clusters of the predicates, wherein the predicates are taken through a sorting step based on the predicate group that each predicate is part of;
reducing (212), via one or more hardware processors, size of the generated at least one predicate combination based on a predicate interaction behavior;
regenerating (214), via one or more hardware processors, combination of the considerations based on the predicate interaction behavior to simulate a starting position of combination for annealing;
annealing (216), via the one or more hardware processors, iteratively a combination of the clustered predicates by removing one predicate from the combination in each iteration such that the combination retains the minimum coverage needed to form the minimal set cover;
identifying (218), via the one or more hardware processors, a final state of the combination that remains after the simulated annealing iterations; and
optimizing (220), via the one or more hardware processors, the predicate search space and thereby reducing computation and time demands in a denial constraints discovery process.
2. The processor-implemented method (200) as claimed in claim 1, wherein overlap of tuple pairs satisfied by two predicates and the interaction behavior of pairs of predicates thereby reducing the search space of predicates into separate clusters of predicates.
3. The processor-implemented method (200) as claimed in claim 1, wherein each combination is looped through one or more iterations based on size of the predicates in that combination.
4. The processor-implemented method (200) as claimed in claim 1, wherein each combination satisfies a predefined condition that the coverage of the combination is equal to or greater than the minimum coverage required to form a minimal set cover in the formation process of a denial constraints.
5. The processor-implemented method (200) as claimed in claim 1, wherein each of the one or more predicates with lower individual coverage value is identified, leveraging greedy algorithm, in each annealing iteration to be removed from the combination.
6. The processor-implemented method (200) as claimed in claim 1, wherein each of the one or more predicates are backtracked to previous predicate combination state if the newly formed predicate combinations lack the minimum coverage that is needed to form a minimal set cover.
7. A system (100) comprising:
an input/output interface (104) to receive records in a given dataset;
a memory (110) in communication with the one or more hardware processors (108), wherein the one or more hardware processors (108) are configured to execute programmed instructions stored in the memory (110) to;
generate a plurality of pairs of the received each of the plurality of records, wherein each of the plurality of records are paired with another record in the predefined dataset;
extract one or more predicates from the plurality of pairs based on a metadata of one or more columns in the predefined dataset, wherein one or more column combinations are identified as one of a) joinable and b) comparable;
cluster the extracted one or more predicates based on a combination of considerations, wherein the combination of considerations include a predicate group, and an overlap of predicates;
generate at least one predicate combination within the one or more clusters of the predicates, wherein the predicates are taken through a sorting step based on the predicate group that each predicate is part of;
reduce size of the generated at least one predicate combination based on a predicate interaction behavior;
regenerate combination of the considerations based on the predicate interaction behavior to simulate a starting position of combination for annealing;
annealing iteratively a combination of the clustered predicates by removing one predicate from the combination in each iteration such that the combination retains the minimum coverage needed to form the minimal set cover;
identify a final state of the combination that remains after the simulated annealing iterations; and
optimize the predicate search space and thereby reducing computation and time demands in a denial constraints discovery process.
8. The system (100) as claimed in claim 7, wherein overlap of tuple pairs satisfied by two predicates and the interaction behavior of pairs of predicates thereby reducing the search space of predicates into separate clusters of predicates.
9. The system (100) as claimed in claim 7, wherein each combination is looped through one or more iterations based on size of the predicates in that combination.
10. The system (100) as claimed in claim 7, wherein each combination satisfies a predefined condition that the coverage of the combination is equal to or greater than the minimum coverage required to form a minimal set cover in the formation process of a denial constraints.
11. The system (100) as claimed in claim 7, wherein each of the one or more predicates with lower individual coverage value is identified in each annealing iteration to be removed from the combination.
12. The system (100) as claimed in claim 7, wherein each of the one or more predicates are backtracked to previous predicate combination state if the newly formed predicate combinations lack the minimum coverage that is needed to form a minimal set cover.

Dated this 13th Day of January 2023
Tata Consultancy Services Limited
By their Agent & Attorney

(Adheesh Nargolkar)
of Khaitan & Co
Reg No IN-PA-1086

Documents

Application Documents

#	Name	Date
1	202321002861-STATEMENT OF UNDERTAKING (FORM 3) [13-01-2023(online)].pdf	2023-01-13
2	202321002861-REQUEST FOR EXAMINATION (FORM-18) [13-01-2023(online)].pdf	2023-01-13
3	202321002861-PROOF OF RIGHT [13-01-2023(online)].pdf	2023-01-13
4	202321002861-FORM 18 [13-01-2023(online)].pdf	2023-01-13
5	202321002861-FORM 1 [13-01-2023(online)].pdf	2023-01-13
6	202321002861-FIGURE OF ABSTRACT [13-01-2023(online)].pdf	2023-01-13
7	202321002861-DRAWINGS [13-01-2023(online)].pdf	2023-01-13
8	202321002861-DECLARATION OF INVENTORSHIP (FORM 5) [13-01-2023(online)].pdf	2023-01-13
9	202321002861-COMPLETE SPECIFICATION [13-01-2023(online)].pdf	2023-01-13
10	202321002861-FORM-26 [15-02-2023(online)].pdf	2023-02-15
11	Abstract1.jpg	2023-03-09
12	202321002861-FER.pdf	2025-08-07
13	202321002861-FORM-26 [05-11-2025(online)].pdf	2025-11-05

Search Strategy

1	202321002861_SearchStrategyNew_E_SearchHistory(84)E_27-03-2025.pdf