Abstract: The present subject matter relates to a method to assess a data quality index (DQI) at a final level of abstraction for data stored in a data repository. The method includes computing a plurality of intermediary DQIs for the data, where each of the plurality of intermediary DQIs corresponds to at least one of a data quality parameter, a level of abstraction, and a data quality factor. The method further includes identifying a weighing factor associated with each of the plurality of intermediary DQIs based on a level of abstraction, and determining a final DQI at the final level of abstraction based on a weighted average of the plurality of intermediary DQIs. Furthermore, the weighted average is calculated based on the weighing factors.
FORM 2
THE PATENTS ACT, 1970 (39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See section 10, rule 13)
1. Title of the invention: DATA QUALITY ANALYSIS
2. Applicant(s)
NAME | NATIONALITY | ADDRESS
TATA CONSULTANCY SERVICES LIMITED | Indian | Nirmal Building, 9th Floor, Nariman Point, Maharashtra 400021, India
3. Preamble to the description
COMPLETE SPECIFICATION
The following specification particularly describes the invention and the manner in which it
is to be performed.
DATA QUALITY ANALYSIS
TECHNICAL FIELD
[0001] The present subject matter relates, in general, to data quality in a computing environment and, in particular but not exclusively, to a method and system for determining a data quality index in the computing environment.
BACKGROUND
[0002] We live in a digitized Data-Age. Generally, digitized data, such as market data, statistical data, economic data or financial data, are relied upon and utilized by enterprises and organizations for various processes, such as data migration, data reconciliation and decision making. Before the data is utilized, subjective measurements of the quality of the data may be made to ensure the accuracy of further processing. The quality of data may be computed against several assessment parameters, and the resulting values may be stored, for reference and comparison, in indices known as Data Quality Indices (DQIs).
[0003] The assessment parameters are generally selected based on the type of information stored in the data. For example, the assessment parameter for a customer passport number may be uniqueness. That is to say, a data repository will be checked for uniqueness of all passport number entries. The overall measure of uniqueness will be the data quality for the customer passport number. Similarly, data quality can be measured for any other type of data stored in the data repository. All these measures will result in different DQIs of all the different types of data, measured against the different assessment parameters.
SUMMARY
[0004] This summary is provided to introduce concepts related to data quality analysis, and the concepts are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter nor is it intended for use in determining or limiting the scope of the claimed subject matter.
[0005] In one implementation, a method to assess a data quality index (DQI) at a final level of abstraction for data stored in a data repository is provided. The method includes computing a plurality of intermediary DQIs for the data, where each of the plurality of intermediary DQIs corresponds to at least one of a data quality parameter, a level of abstraction, and a data quality factor. The method further includes identifying a weighing factor associated with each of the plurality of intermediary DQIs based on the level of abstraction, and determining a final DQI at the final level of abstraction based on a weighted average of the plurality of intermediary DQIs. The weighted average is calculated based on the weighing factors.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] The present subject matter and other features and advantages thereof will become apparent and may be better understood from the following drawings. The components of the figures are not necessarily to scale, emphasis instead being placed on better illustration of the underlying principle of the subject matter. Different numeral references on figures designate corresponding elements throughout different views. In the figure(s), the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to reference like features and components. The detailed description is described with reference to the accompanying figure(s).
[0007] Fig. 1 illustrates a computing system for data quality analysis, in accordance with an implementation of the present subject matter.
[0008] Fig. 2 illustrates a method for computing a data quality index in accordance with an implementation of the present subject matter.
[0009] Fig. 3 illustrates a schematic block diagram representation for computing a data quality index in accordance with an implementation of the present subject matter.
DETAILED DESCRIPTION
[00010] Systems and methods for data quality analysis are described herein. The systems and methods can be implemented in a variety of computing devices, such as laptops, desktops, workstations, tablet-PCs, smart phones, notebooks or portable computers, tablet computers, mainframe computers, mobile computing devices, entertainment devices, computing platforms, internet appliances and similar systems. However, a person skilled in the art will comprehend that the embodiments of the present subject matter are not limited to any particular computing system, architecture or application device, as they may be adapted to take advantage of new computing systems and platforms as they become accessible.
[00011] Typically, quality of data is computed by measuring individual data elements against a plurality of assessment parameters, which are selected based on the type of data. For example, the assessment parameters may be a referential integrity parameter or a population parameter. The data is then measured against these parameters to result in multiple data quality indices (DQIs) based on such individual parameters. Therefore, a plurality of DQIs is presented to the user against each of the assessment parameters. This may be cumbersome to a user who wishes to obtain an overall measure of data quality, for example, across a system, or at a higher level of abstraction as compared to the quality of individual data elements. For example, a user may wish to obtain a DQI for ‘transaction data’. The ‘transaction data’ may be composed of individual data elements, such as ‘credit data’, ‘cash data’ and ‘liquidity data’. Each of these data elements is generally measured against corresponding assessment parameters, which in turn result in separate DQIs for each of the data elements. This may confuse the user in dealing with the multiple DQIs at the different levels of abstraction for a system. The level of abstraction may be a level at which a determination of a data quality is performed. In an example, a system may include a plurality of entities, each of the entities may include a plurality of elements, and in turn each of the elements may be associated with a plurality of assessment parameters.
[00012] The present subject matter describes systems and methods for data quality analysis. In one implementation, a DQI for data stored in a data repository is assessed. The assessed DQI provides a holistic value of data quality for a plurality of types of data in the data repository. In an example, a user may require data quality of customer data, for example where the customer data may include customer data and customer detail entities. In turn, the customer detail entity may include data elements, such as, customer names, customer addresses, customer passport numbers, and customer phone numbers. Each of the data elements may be measured against one or more relevant data quality factors and/or parameters to determine an actual value of data quality of the corresponding data element. For example, customer names may be measured against a data quality factor, such as, a sufficiency factor in order to determine an actual value of data quality for the sufficiency factor. For the sufficiency factor, a customer name column across the data repository may be assessed for missing data or attributes. In an example, a customer last name may be provided as mandatory, and entries missing in the customer last name will result in bad data quality for that entry. Similarly, each of the data elements may be measured against the at least one data quality factor in order to determine the actual values of data quality.
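The sufficiency check described above can be sketched in code. The following is a minimal illustration only, assuming that the actual value for the sufficiency factor is taken as the percentage of populated entries in a column; the function name `percent_populated` and the sample column are hypothetical and do not appear in the specification.

```python
def percent_populated(values):
    """Return the percentage of entries that are not missing (None or blank)."""
    if not values:
        raise ValueError("column is empty")
    populated = sum(1 for v in values if v is not None and str(v).strip() != "")
    return 100.0 * populated / len(values)


# Hypothetical 'customer last name' column; two of the five entries are missing.
last_names = ["Rao", "", "Mehta", None, "Iyer"]
sufficiency_actual_value = percent_populated(last_names)
print(sufficiency_actual_value)  # 60.0
```

Treating blank strings as missing is a design choice of this sketch; a repository may define missing data differently.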
[00013] According to an implementation of the present subject matter, a weighing factor may be identified and assigned to each of the actual values of the data elements. The weighing factor may be configured depending on a weightage or importance of the data quality factor or parameter, and an importance of the data element. In an example, a customer name column in a data repository may be measured against at least two data quality factors, such as sufficiency and consistency. A weighing factor for each of the data quality factors may be assigned, depending on the importance of the data quality factors in the data repository. In said example, the sufficiency data quality factor and the consistency data quality factor may be provided with a 70:30 weightage, i.e., the sufficiency data quality factor counts for 70 % by weight and the consistency data quality factor counts for 30 % by weight for the customer name element. In an example, the sufficiency factor may be determined based on a “% Population” data quality metric and the consistency factor may be determined based on a “% Checksum variance” data quality metric. Therefore, to determine the data quality at an element level of abstraction, i.e., for the element ‘customer name’, a weighted average of the sufficiency and the consistency data quality factors may be computed.
[00014] Similarly, according to an implementation, a further data element, such as ‘customer passport number’, may be measured against one or more of the data quality factors as described above. Moreover, the data quality for the data element ‘customer passport number’ may be determined by assigning weighing factors to the data quality factors against which the customer passport number is measured, and determining the weighted average. At this stage, at the element level of abstraction, two DQIs are present, i.e., the DQI for ‘customer name’ and the DQI for ‘customer passport number’. In order to determine a DQI at an entity level of abstraction, for example, including the two data elements ‘customer name’ and ‘customer passport number’, the weighing factors may be obtained and assigned to the DQIs at the element level of abstraction, and the weighted average determined to obtain a single DQI at the entity level of abstraction. In said example, the entity level of abstraction refers to ‘customer details’, which includes the two data elements ‘customer name’ and ‘customer passport number’. In an example, the entity level of abstraction may include any number of entities, such as ‘customer details’, ‘customer preferences’, and ‘customer transactions’, where each of the entities may include any number of elements relevant to data quality measurements.
[00015] According to an implementation of the present subject matter, to determine a single DQI at a system level of abstraction, where the system level of abstraction includes any number of entities, the DQIs at the entity level may be assigned weighing factors in the manner described above and a weighted average may be determined to obtain the single DQI at the system level of abstraction. The single DQI at the system level of abstraction may be a holistic value of data quality of the entire data repository. In this manner, the user may be provided a DQI at any level of abstraction. Furthermore, it is to be understood that the method described above is not limited to the system level of abstraction, but may extend to any number of subsequent levels of abstraction.
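The repeated weighted averaging across levels of abstraction can be pictured as a recursive roll-up over a tree. The sketch below is illustrative only: the tree layout, the function names `weighted_average` and `roll_up`, and all numeric values are assumptions made for this example and are not taken from the specification.

```python
def weighted_average(pairs):
    """Weighted average of (value, weight) pairs; weights need not sum to 100."""
    pairs = list(pairs)
    total_weight = sum(weight for _, weight in pairs)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(value * weight for value, weight in pairs) / total_weight


def roll_up(node):
    """Return the DQI of a node at any level of abstraction.

    A leaf is {"value": actual_value}; any other node is
    {"children": [(weight, child_node), ...]}.
    """
    if "value" in node:
        return node["value"]
    return weighted_average((roll_up(child), weight) for weight, child in node["children"])


# Hypothetical system: two equally weighted entities.
entity_a = {"children": [(50, {"value": 80}), (50, {"value": 90})]}  # rolls up to 85.0
entity_b = {"children": [(100, {"value": 95})]}                      # rolls up to 95.0
system = {"children": [(50, entity_a), (50, entity_b)]}
print(roll_up(system))  # 90.0
```

Because the same function is applied at every node, adding a further level (for example, an enterprise level above the system level) only requires one more layer of nesting.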
[00016] In the example described above, the present subject matter provides a single DQI, which is indicative of the quality of the customer data in its entirety, rather than single actual values of data quality for each of the data measured against the data quality factors.
[00017] These and other advantages of the present subject matter would be described in greater detail in conjunction with the following figures. While aspects of described systems and methods for the determination of the DQI can be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system(s).
[00018] Fig. 1 illustrates a computing system 100 for data quality analysis, in accordance with an implementation of the present subject matter. In said implementation, the computing system 100 includes one or more processor(s) 102, interface(s) 104, and a memory 106 coupled to the processor 102. The processor 102 can be a single processing unit or a number of units, all of which could also include multiple computing units. The processor 102 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor 102 is configured to fetch and execute computer-readable instructions and data stored in the memory 106.
[00019] The interfaces 104 may include a variety of software and hardware interfaces, for example, interfaces for peripheral device(s), such as a keyboard, a mouse, an external memory, and a printer. Further, the interfaces 104 may enable the computing system 100 to communicate with other computing devices, such as web servers and external data repositories in the communication network (not shown in the figure). The interfaces 104 may facilitate multiple communications within a wide variety of protocols and networks, including wired networks, e.g., LAN, cable, etc., and wireless networks, e.g., WLAN, cellular, satellite, etc. The interfaces 104 may include one or more ports for connecting the computing system 100 to a number of computing devices.
[00020] The memory 106 may include any computer-readable medium known in the art including, for example, volatile memory such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 106 also includes module(s) 108 and data 110.
[00021] The module(s) 108 include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. In one implementation, the module(s) 108 includes a primary computation module 112, an identification module 114, a secondary computation module 116, and other module(s) 118. The other module(s) 118 may include programs or coded instructions that supplement applications and functions of the computing system 100.
[00022] On the other hand, the data 110, inter alia serves as a repository for storing data processed, received, and generated by one or more of the module(s) 108. The data 110 includes for example, primary computation data 120, identification data 122, secondary computation data 124, and other data 126. The other data 126 includes data generated as a result of the execution of one or more modules in the module(s) 108.
[00023] In one implementation, the computing system 100 determines a Data Quality Index (DQI), which is indicative of a holistic quality of data at a given level of abstraction. The level of abstraction may be a level at which a determination of a data quality is performed. In an example, a system may include a plurality of entities, and in turn each of the entities may include a plurality of elements. The system structure therefore may be said to include three levels of abstraction, i.e., an element level, an entity level, and a system level of abstraction. It may be noted that the present subject matter is not limited to these three levels of abstraction, but may extend to any number of levels of abstraction as would be appreciated by a person skilled in the art. Therefore, according to the present subject matter, a measure of data quality at any of the levels of abstraction in the data repository may be provided.
[00024] In one implementation, the primary computation module 112 computes a data quality at one of the levels of abstraction based on a data quality parameter or factor associated with the data to be analyzed. For example, the data quality parameter may be selected from individual metrics, such as, % Checksum Variance, % Reference Integrity Failures, % Duplication, % Junk or Accurate Data, % Data Population, or any other metric conventionally known in the art. In another example, the data quality parameter may also be used to derive one or more data quality factors, such as, Latency, Uniqueness, Consistency, Accuracy, and Sufficiency (LUCAS) factor. For example, for a ‘customer passport number’ data element, said data element may be measured against the % Duplication and say a % Population metric. The results obtained thereof may be used to derive the uniqueness and the sufficiency data quality factors for the ‘customer passport number’, and this in turn may be utilized to determine data quality at an element level. In said example, in measuring the ‘customer passport number’ data element against the uniqueness factor, each entry in the ‘customer passport number’ data element is compared across the data repository against a reference. Passport numbers resulting in a mismatch against the reference number will produce a ‘bad’ data quality. Furthermore, the sufficiency factor may be judged by missing data or missing attributes in the individual data entries. In yet another example, the level of abstraction may be utilized as a data quality parameter for the computation of the DQI. In said example, at the element level of abstraction, the computation of DQI will involve utilizing the actual values as well as identifying the weighing factors. However, at the entity and the system levels of abstraction, the actual values will not be determined, but the weighing factors may be identified in order to compute the DQI. 
Further, at levels higher than the element level, one or more of the data quality factors may also be considered.
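As an illustration of deriving a data quality factor from a metric, the sketch below computes a % Duplication metric for a column and derives a uniqueness value from it. The derivation used here (uniqueness = 100 minus % duplication) is an assumption made for the example, as are the function names and the sample passport numbers; the specification does not fix a particular formula.

```python
from collections import Counter


def percent_duplication(values):
    """Percentage of entries whose value occurs more than once in the column."""
    if not values:
        raise ValueError("column is empty")
    counts = Counter(values)
    duplicated = sum(n for n in counts.values() if n > 1)
    return 100.0 * duplicated / len(values)


def uniqueness_factor(values):
    """Assumed derivation: uniqueness = 100 - % duplication."""
    return 100.0 - percent_duplication(values)


# Hypothetical 'customer passport number' column: "P2" appears twice in ten entries.
passports = ["P1", "P2", "P3", "P2", "P4", "P5", "P6", "P7", "P8", "P9"]
print(percent_duplication(passports))  # 20.0
print(uniqueness_factor(passports))    # 80.0
```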
[00025] In said example, at the element level of abstraction, an actual value corresponding to each of the plurality of data quality parameters may be computed by the primary computation module 112. In the above described example of ‘customer passport number’, the actual value corresponding to the uniqueness and the sufficiency factors may be computed by the primary computation module 112. This will yield two separate actual values, for example 90 and 95, indicative of the uniqueness and the sufficiency factors of the ‘customer passport number’ data element.
[00026] In one implementation, the actual values computed by the primary computation module 112 are stored in the primary computation data 120.
[00027] According to an implementation of the present subject matter, in furtherance to the above computation by the primary computation module 112, the identification module 114 identifies a weighing factor associated with each of the actual values depending on an importance or weightage of each of the data quality parameters or the data element. In the above example of ‘customer passport number’ being measured against the uniqueness and the sufficiency factors, the identification module 114 may identify and assign a weightage to the resultant actual values, such as, 70:30. That is to say, that the actual value measured against the uniqueness factor holds 70 % by weight, and the actual value measured against the sufficiency factor holds 30 % by weight. In an implementation, the weighing factors assigned by, for example, a user, may form weightage data, which may be stored in the identification data 122. The weightage data stored thus may be subsequently identified by the identification module 114.
[00028] In an implementation, upon identification of the weighing factors, the secondary computation module 116 determines a single DQI at the element level of abstraction based on a weighted average of the actual values described above. In the above example of the ‘customer passport number’ a weighted average of the actual values measured against the uniqueness and sufficiency factors yields a single DQI indicative of the quality of data of the ‘customer passport number’.
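Using the figures given above for ‘customer passport number’ (actual values of 90 for uniqueness and 95 for sufficiency, weighted 70:30), the element-level computation can be sketched as follows. The function name `element_dqi` and the dictionary layout are illustrative assumptions, not part of the described modules.

```python
def element_dqi(actual_values, weights):
    """Weighted average of per-factor actual values for one data element."""
    total_weight = sum(weights[factor] for factor in actual_values)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(actual_values[f] * weights[f] for f in actual_values)
    return weighted_sum / total_weight


passport_actuals = {"uniqueness": 90, "sufficiency": 95}
passport_weights = {"uniqueness": 70, "sufficiency": 30}
passport_dqi = element_dqi(passport_actuals, passport_weights)
print(passport_dqi)  # 91.5, i.e. (90 * 70 + 95 * 30) / 100
```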
[00029] In one implementation, the single DQI determined by the secondary computation module 116 is stored in the secondary computation data 124. The secondary computation data 124 further includes rules regarding the computation of the single DQI as described earlier.
[00030] In an example, as described earlier, there may be any number of data elements in the data repository. For example, in addition to the ‘customer passport number’, there may be additional data elements such as ‘customer phone number’. The single DQI for the ‘customer phone number’ data element may be computed in a manner similar to the method described above for the ‘customer passport number’. However, as described above, the data quality parameters may be selected depending on the data element. For example, for the ‘customer phone number’, the actual values may be determined against a sufficiency factor and a consistency factor. The consistency factor may be assessed by comparing an entry of the ‘customer phone number’ data element with a reference format, such as, nnnn-nnn-nnn. Once the single DQI for the ‘customer phone number’ data element has been determined, the single DQI may be stored in the secondary computation data 124. In this manner, according to the present subject matter, the single DQI can be determined for any number of elements in a data repository to provide a DQI at the element level of abstraction.
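The comparison against a reference format can be sketched with a regular expression. In the example below, the pattern encodes the nnnn-nnn-nnn format with n read as a decimal digit, and the actual value is taken as the percentage of entries matching the format; both readings, together with the function name and the sample entries, are assumptions made for illustration.

```python
import re

# Reference format from the text: nnnn-nnn-nnn, with n assumed to be a digit.
PHONE_FORMAT = re.compile(r"\d{4}-\d{3}-\d{3}")


def percent_consistent(values, pattern=PHONE_FORMAT):
    """Percentage of entries that match the reference format exactly."""
    if not values:
        raise ValueError("column is empty")
    matching = sum(1 for v in values if isinstance(v, str) and pattern.fullmatch(v))
    return 100.0 * matching / len(values)


# Hypothetical 'customer phone number' entries; only the first and last match.
phones = ["0224-567-890", "224-567-890", "0224567890", "0221-000-111"]
print(percent_consistent(phones))  # 50.0
```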
[00031] In an implementation, the entities of the system may include a plurality of data elements as provided earlier. The single DQI is determined for each of the data elements as described above and the single DQIs stored in the secondary computation data 124. Furthermore, in said implementation, according to the present subject matter, the computing system 100 provides a single DQI at the entity level.
[00032] In an implementation, at the entity level of abstraction, the primary computation module 112 is configured to determine the DQIs based on the data quality parameter being the level of abstraction. In said implementation, the DQIs that are determined at the element level, for example, the single DQIs of the ‘customer passport number’ and the ‘customer phone number’ are assigned with weighing factors that may be identified from the identification data 122, by the identification module 114 in order to determine a single DQI for the entity ‘customer data’, which as mentioned earlier, includes the data elements ‘customer passport number’ and ‘customer phone number’. Furthermore, in an implementation, the secondary computation module 116 determines a single DQI for the entity, based on the weighted average of the single DQIs determined at the element level.
[00033] Thus, the computing system 100 is configured to determine a single DQI at any level of abstraction. Moreover, in an implementation, there may be a plurality of data entities, each of which includes a plurality of the data elements as described above.
[00034] In an implementation, at the system level of abstraction, the primary computation module 112 is configured to utilize the single DQIs determined at the entity level of abstraction. In an example, where in addition to the ‘customer data’ entity, there is also a ‘customer preferences’ entity, the identification module 114 is configured to identify a weighing factor associated with the single DQI of the ‘customer data’ entity as well as the ‘customer preferences’ entity. Finally, the secondary computation module 116 determines a single DQI for the system, based on a weighted average of the single DQIs of the data entities.
[00035] In one implementation, the computing system 100 provides an intermediary DQI. In an example, the single DQIs obtained at the element level and the entity level may be referred to as the intermediary DQIs. The intermediary DQIs may be utilized to determine the single DQI at the system level. In another example, the single DQI obtained at the system level, i.e., the holistic data quality of the system in its entirety, may be referred to as a final DQI. Further, a level of abstraction at which the final DQI is to be obtained may be understood as the final level of abstraction. It will be understood that the final DQI for one level of abstraction may become an intermediary DQI for another level of abstraction. For example, the final DQI obtained at the element level of abstraction may be utilized as the intermediary DQI for the determination of the final DQI at the entity level of abstraction, and similarly, the final DQI at the entity level of abstraction may be used as the intermediary DQI for the determination of the final DQI at the system level of abstraction.
[00036] In an example, where the DQI is required only at the element level and the entity level, the actual values of the data elements measured against the data quality parameters at the element level of abstraction, as described earlier, may be referred to as primary intermediary DQIs, and the single DQI computed based on the weighted average of the actual values may be referred to as secondary intermediary DQIs.
[00037] For the purpose of explanation and not as a limitation, the present subject matter has been described in detail with respect to tables 1 and 2. The examples provided therein are with reference to a three tier level of abstraction, i.e., element, entity and system. Table 1 provides an example where the system includes two entities ‘customer details’ and ‘customer preferences’. Furthermore, the ‘customer details’ entity includes two elements ‘customer passport number’ and ‘customer phone number’. In the present example, the ‘customer passport number’ element is measured against the sufficiency and the uniqueness factor, and the ‘customer phone number’ is measured against the sufficiency and the consistency factors. The actual values for various data elements may be computed with respect to the LUCAS factors and the data quality metrics.
Table 2 provides details on the manner in which the final DQI at the system level is computed according to an implementation of the present subject matter.
Table 1
Table.Column | Rule | Factor | Metric | Actual Value | Target Value | Weight for DQ Factor/Metric | Weight at element | Weight at entity | Weight at system
Customer Details.Passport Number | Field should have values | Sufficiency | %population | 95 | 100 | 30 | 70 | 60 | 60
Customer Details.Passport Number | Data should be unique | Uniqueness | %uniqueness | 90 | 100 | 70 | 70 | 60 | 60
Customer Details.Phone number | Field should have values | Sufficiency | %population | 70 | 90 | 50 | 30 | 60 | 60
Customer Details.Phone number | Format should be xxx-xxx-xxx | Consistency | %checksum variance | 92 | 100 | 50 | 30 | 60 | 60
Customer Preferences.Comments | Data should be populated | Sufficiency | %population | 88 | 99 | 100 | 100 | 40 | 60
Table 2
[00038] In the tables given above, at the element level of abstraction, the actual values of the ‘customer passport number’ are determined by the primary computation module 112, to be 95 and 90, as measured against the sufficiency and the uniqueness factors respectively. The sufficiency factor may be derived from a % Population metric and the uniqueness factor may be derived from a % Uniqueness data quality metric. Similarly, the actual values of the ‘customer phone number’ are determined by the primary computation module 112, to be 70 and 92, as measured against the sufficiency and the consistency factors respectively. The consistency factor may be derived from a % Checksum variance data quality metric. Furthermore, an actual value of a further data element, viz., ‘customer comments’ is determined by the primary computation module 112, to be 88, as measured against a sufficiency factor.
[00039] As described above, in order to determine the plurality of intermediary DQIs at the element level, the identification module 114 identifies weighing factors to be associated with each of the actual values of the data elements, such as, from the identification data 122. The weighing factors thus identified are listed in the table 1. In table 1, for the data element ‘customer details passport number’ sufficiency factor has 30 % weightage and uniqueness factor has 70 % weightage. Furthermore, at the element level, a weighted average of the element level actual values is computed by the secondary computation module 116, in order to determine the intermediary DQIs at the element level. Similarly, at the entity level, for the two entities, viz., ‘customer details’ and ‘customer preferences’, weighing factors are identified by the identification module 114, and the intermediary DQIs at the entity level are computed by the secondary computation module 116, based on the weighted average of the intermediary DQIs at the element level. Similarly, at the system level, the identification module 114 identifies the weighing factors to be associated with the intermediary DQIs at the entity level and in turn the secondary computation module 116 computes the final DQI, indicative of the data quality of the entire system, i.e., ‘customer data’ in its entirety.
[00040] Table 2 illustrates a flow by which the final DQI of the system may be computed, beginning with the determination of the actual values of the data elements and the computation of the intermediary DQIs at the element level of abstraction. This is followed by the identification of the weighing factors associated with the intermediary DQIs at the element level of abstraction, and the computation of the intermediary DQIs at the entity level of abstraction. Finally, the weighing factors at the entity level are identified and assigned to the intermediary DQIs at the entity level of abstraction, and subsequently, the final DQI of 88.21 is computed at the system level of abstraction, which is indicative of the data quality across the entire system, representing ‘customer data’. Furthermore, in an example, the LUCAS factors may also be associated with the intermediary DQIs at the entity level of abstraction. As illustrated, at the entity level, the DQI of ‘customer details’ with respect to the sufficiency factor is 87.5 and the DQI of ‘customer details’ with respect to the uniqueness factor is 90.
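The flow described above can be traced numerically from the values in Table 1. The sketch below is one reading of that table, under two stated assumptions: the ‘Weight at element’ column is taken as the weight of an element within its entity, and the ‘Weight at entity’ column as the weight of an entity within the system (the ‘Weight at system’ column is not needed for a single system). Under this reading the code reproduces the figures quoted in the text: 88.21 at the system level, and 87.5 and 90 for ‘customer details’ by sufficiency and uniqueness. All function names are hypothetical.

```python
# Rows of Table 1: (entity, element, factor, actual value,
#                   factor weight, element weight, entity weight)
ROWS = [
    ("Customer Details", "Passport Number", "Sufficiency", 95, 30, 70, 60),
    ("Customer Details", "Passport Number", "Uniqueness", 90, 70, 70, 60),
    ("Customer Details", "Phone Number", "Sufficiency", 70, 50, 30, 60),
    ("Customer Details", "Phone Number", "Consistency", 92, 50, 30, 60),
    ("Customer Preferences", "Comments", "Sufficiency", 88, 100, 100, 40),
]


def weighted_average(pairs):
    """Weighted average of (value, weight) pairs."""
    pairs = list(pairs)
    total = sum(weight for _, weight in pairs)
    return sum(value * weight for value, weight in pairs) / total


def element_dqis(rows):
    """Map (entity, element) -> (intermediary DQI, element weight)."""
    grouped = {}
    for entity, element, _factor, actual, f_w, el_w, _en_w in rows:
        group = grouped.setdefault((entity, element), {"pairs": [], "weight": el_w})
        group["pairs"].append((actual, f_w))
    return {k: (weighted_average(g["pairs"]), g["weight"]) for k, g in grouped.items()}


def entity_dqis(rows):
    """Map entity -> (intermediary DQI, entity weight)."""
    entity_weight = {row[0]: row[6] for row in rows}
    grouped = {}
    for (entity, _element), (dqi, el_w) in element_dqis(rows).items():
        grouped.setdefault(entity, []).append((dqi, el_w))
    return {e: (weighted_average(p), entity_weight[e]) for e, p in grouped.items()}


def system_dqi(rows):
    """Final DQI: weighted average of the entity-level DQIs."""
    return weighted_average(entity_dqis(rows).values())


def entity_dqi_by_factor(rows, entity, factor):
    """Entity DQI restricted to one factor, weighted by element weight."""
    return weighted_average(
        (actual, el_w)
        for en, _el, f, actual, _f_w, el_w, _en_w in rows
        if en == entity and f == factor
    )


print(round(system_dqi(ROWS), 2))                                    # 88.21
print(entity_dqi_by_factor(ROWS, "Customer Details", "Sufficiency"))  # 87.5
print(entity_dqi_by_factor(ROWS, "Customer Details", "Uniqueness"))   # 90.0
```

The intermediate results under this reading are 91.5 for the passport number, 81 for the phone number and 88 for the comments at the element level, and 88.35 and 88 for the two entities.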
[00041] Furthermore, in one implementation of the present subject matter, based on the level of abstraction, a DQI based on the data quality parameter may be determined. In table 2, the columns provided under ‘entity by factor’ and ‘system by factor’, show the values that are indicative of a data quality as against the corresponding data quality parameters.
[00042] According to the method of the present subject matter, a root cause for a low or poor level of data quality may also be ascertained. In an example, if the final DQI at the system level of abstraction is 80, whereas a threshold level of data quality at the system level of abstraction is 90, the root cause for the low DQI may be ascertained. In said example, the data quality factor that is contributing to said low DQI may be investigated and corrected accordingly. For example, it may be ascertained that the latency in the ‘customer details’ entity at the entity level is influencing the data quality in a negative way.
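One simple way to picture such a root-cause search is to list the per-factor DQIs at a lower level of abstraction that fall below the threshold, worst first. This is a sketch of one possible approach rather than the method of the specification; the function name `find_root_causes` and every value in the example are hypothetical.

```python
def find_root_causes(contributions, threshold):
    """Return (label, dqi) pairs whose DQI is below the threshold, worst first.

    `contributions` maps a label, e.g. (entity, factor), to its DQI.
    """
    below = [(label, dqi) for label, dqi in contributions.items() if dqi < threshold]
    return sorted(below, key=lambda item: item[1])


# Hypothetical entity-by-factor DQIs for a system that misses a threshold of 90.
by_factor = {
    ("customer details", "latency"): 62.0,
    ("customer details", "sufficiency"): 87.5,
    ("customer details", "uniqueness"): 93.0,
    ("customer preferences", "sufficiency"): 91.0,
}
causes = find_root_causes(by_factor, threshold=90)
for (entity, factor), dqi in causes:
    print(f"{entity} / {factor}: {dqi}")
```

In this made-up data the latency of ‘customer details’ is listed first, which mirrors the kind of finding described in the example above.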
[00043] By way of a further example, the present subject matter has been described in detail with respect to the following equation:
where DQIs is the data quality index of the system;
C1 to C3 are table-specific columns,
M1 to M3 are metrics used for qualifying quality of data,
W1 to W5 are weighing factors for a metric-column combination for a level of abstraction,
Wc1, Wc2, Wc3 are column specific weighing factors in a table
Wt1 and Wt2 are table specific weighing factors, and
In the above shown equation, the table-specific columns C1 to C4, such as customer passport number and customer phone number, may be associated with at least one of the data quality parameters, in this case, metrics M1 to M3. Furthermore, each of the table-specific columns C1 to C4 may be associated with a weighing factor, such as the weighing factors W1 to W5, at each level of abstraction as shown in table 1. As a result, the intermediary DQIs as explained earlier may be obtained at each of the levels of abstraction. In an example, this primary level of abstraction may be the element level of abstraction. Furthermore, at a higher level of abstraction, such as the entity level of abstraction, a further weighing factor may be associated with the intermediary DQI. In the above equation, column specific weighing factors Wc1 to Wc3 may be associated with the intermediary DQIs. Further, the table specific weighing factors Wt1 and Wt2 are multiplied with the intermediary DQIs obtained at the entity level of abstraction, and the final DQI at the system level of abstraction, viz. DQIs, may be obtained. Moreover, since the method according to the present subject matter is not limited to any level of abstraction, in an example where there is a level of abstraction higher than the system level of abstraction, such as an enterprise level of abstraction, a further weighing factor may be associated with a cluster of the DQIs obtained at the system level, and a final DQI may be obtained.
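As a sketch only, a nested weighted average consistent with the symbol definitions above could take the following general form. The index sets, the notation $W_{m,c}$ for the weighing factor of a metric-column combination, the notation $M_m(C_c)$ for the value of metric $m$ measured on column $c$, and the normalization by the sum of the weighing factors are assumptions made for illustration.

```latex
\begin{aligned}
\mathrm{DQI}_{c} &= \frac{\sum_{m} W_{m,c}\, M_{m}(C_{c})}{\sum_{m} W_{m,c}}
  && \text{(column, i.e. element level)} \\
\mathrm{DQI}_{t} &= \frac{\sum_{c \in t} W_{c}\, \mathrm{DQI}_{c}}{\sum_{c \in t} W_{c}}
  && \text{(table, i.e. entity level)} \\
\mathrm{DQI}_{s} &= \frac{\sum_{t} W_{t}\, \mathrm{DQI}_{t}}{\sum_{t} W_{t}}
  && \text{(system level)}
\end{aligned}
```

Here $W_{c}$ stands for the column specific weighing factors (Wc1 to Wc3) and $W_{t}$ for the table specific weighing factors (Wt1 and Wt2).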
[00044] Fig. 2 illustrates a method 200 for data quality analysis, according to one embodiment of the present subject matter. The method 200 may be implemented in a variety of computing systems, mentioned in the description of Fig. 1, in several different ways. For example, the method 200, described herein, may be implemented using the computing system 100, as described above.
[00045] The method 200, completely or partially, may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. A person skilled in the art will readily recognize that steps of the method can be performed by programmed computers. Herein, some embodiments are also intended to cover program storage devices, e.g., digital data storage media, which are machine or computer readable and encode machine-executable or computer-executable programs of instructions, wherein said instructions perform some or all of the steps of the described method 200.
[00046] The order in which the method 200 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method, or an alternative method. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the subject matter described herein. Furthermore, the methods can be implemented in any suitable hardware, software, firmware, or combination thereof. It will be understood that even though the method 200 is described with reference to the computing system 100, the description may be extended to other systems as well.
[00047] At block 202, data for which a quality is to be determined is retrieved from a data repository. The quality of the data may be analyzed at any level of abstraction. In one example, the levels of abstraction may include the element level, the entity level, and the system level of abstraction.
[00048] At block 204, intermediary DQIs associated with the data are computed. Each of the intermediary DQIs corresponds to a data quality parameter or factor. Examples of data quality parameters include, but are not limited to, a data quality metric and the level of abstraction. In an example, at the element level of abstraction, actual values are computed by a direct measurement against the data quality parameters, which could be, for instance, the data quality metrics. In said example, the actual value of the data element corresponding to a given data quality parameter may be understood as the intermediary DQI. In an example, the data quality parameters, such as the data quality metrics, may be used to derive at least one data quality factor of the Latency, Uniqueness, Consistency, Accuracy, and Sufficiency (LUCAS) factors. In said example, the data quality factors may be determined for levels higher than the element level, that is, the lowermost level. For example, the latency factor for customer address can be derived based on the accuracy parameter and the sufficiency parameter. Further, based on the data quality parameters, the data quality factors, or both, intermediary DQIs may be generated for levels higher than the element level.
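The direct measurement of actual values at the element level can be illustrated with a short Python sketch. The two metrics below (`non_blank_percentage` and `distinct_percentage`) and the sample passport numbers are hypothetical stand-ins chosen for illustration; the specification does not define the metrics this way.

```python
def _present(values):
    """Return the entries that are neither None nor blank."""
    return [v for v in values if v is not None and str(v).strip() != ""]


def non_blank_percentage(values):
    """Hypothetical metric: percentage of entries that are present."""
    if not values:
        return 0.0
    return 100.0 * len(_present(values)) / len(values)


def distinct_percentage(values):
    """Hypothetical metric: percentage of present entries that are distinct."""
    present = _present(values)
    if not present:
        return 0.0
    return 100.0 * len(set(present)) / len(present)


# Hypothetical values of one data element (a customer passport number column).
passport_numbers = ["P100", "P200", None, "P200", "P300", "", "P400", "P500"]

# Actual values of the element, one per data quality metric.
actual_values = {
    "non_blank": non_blank_percentage(passport_numbers),
    "distinct": distinct_percentage(passport_numbers),
}
```

Each entry of `actual_values` plays the role of an actual value measured against one data quality parameter, which may then be weighted and combined at the element level.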
[00049] At block 206, weighing factors are identified and associated with the plurality of intermediary DQIs based on the level of abstraction. The weighing factors are configured based on the degree of importance of the particular data at a given level.
[00050] At block 208, the final DQI is determined for the level of abstraction. The desired single DQI at any of the levels of abstraction may be referred to as the final DQI. In an example, if the user desires a holistic data quality, the single DQI at the element level of abstraction will be referred to as the final DQI.
[00051] In one implementation, computing the intermediary DQIs includes identifying the weighing factor associated with each of a plurality of primary intermediary DQIs obtained at a primary level of abstraction, and determining the weighted average of the primary intermediary DQIs at the primary level of abstraction to obtain a plurality of secondary intermediary DQIs for a secondary level of abstraction. In said implementation, the primary intermediary DQIs are those intermediary DQIs which are utilized in a subsequent level of abstraction for determining a subsequent intermediary DQI. The subsequent intermediary DQIs, in this case, may be referred to as the secondary intermediary DQIs. Similarly, the primary level of abstraction may be that level of abstraction at which initial computations are performed in order to determine the primary intermediary DQIs, and the secondary level of abstraction may be that level of abstraction at which further computations are performed in order to determine the secondary intermediary DQIs.
[00052] In one example, while assessing DQI at the entity level, the DQIs determined at the element level may be the primary intermediary DQIs and the element level of abstraction may be referred to as the primary level of abstraction. Similarly, the intermediary DQIs determined at the entity level may be the secondary intermediary DQIs and the entity level of abstraction may be referred to as the secondary level of abstraction. Likewise, it will be understood that the primary level of abstraction may be that level of abstraction at which initial computations are performed in order to determine the primary intermediary DQIs, and the secondary level of abstraction may be that level of abstraction at which further computations are performed in order to determine the secondary intermediary DQIs.
[00053] Fig. 3 illustrates a schematic block diagram representation for computing a DQI in accordance with an implementation of the present subject matter. In said implementation, the DQI is computed across four levels of abstraction, viz., the element 304, entity 306, system 308, and enterprise 310 levels of abstraction, with respect to data quality parameters. In an example, the data quality parameter may be at least one data quality metric. In another example, the data quality parameter may also be used to derive one of the LUCAS factors. In the figure, the LUCAS factors, that is, the data quality factors, may be utilized at levels of abstraction higher than the element level of abstraction.
[00054] In said implementation, at the element level of abstraction 304, at least one data quality parameter 302 may be associated with each of the elements to obtain a data quality. In one example, if the DQI at the entity level of abstraction 306 is to be determined, the DQI obtained at the element level of abstraction may be considered to be the intermediary DQI as explained earlier. Furthermore, weighing factors may be associated with each of the data quality parameters 302 as explained earlier. It will be understood that the DQI at the entity level and the higher levels may be computed based on the data quality parameters, the data quality factors, or both. In an example, different weighing factors may be associated at the different levels of abstraction. In one example, the different weighing factors may be associated with individual elements, entities, and/or systems, and in another example, the different weighing factors may be associated with the cluster of elements, entities, and/or systems.
[00055] As illustrated, the entity level 306 may include a cluster of elements 312. Further, at the entity level of abstraction 306, the data quality factor, such as the at least one LUCAS factor 318, may be associated with either the entity or the cluster of elements 312 that forms the entity, or part of the entity. In one example, the cluster of elements 312 can form a function or process that is part of the entity.
[00056] Similar to the above described manner, the process of computing the DQI is repeated at the system level of abstraction 308 and the enterprise level of abstraction 310. Furthermore, the system may be composed entirely or partly of a cluster of entities 314, and the enterprise may be composed entirely or partly of a cluster of systems 316. Furthermore, the cluster of entities 314 can form a function or process that is part of the system, and the cluster of systems 316 can form a function or process that is part of the enterprise. A data quality factor, such as the LUCAS factor 320, may be associated at the system level of abstraction 308, or with the cluster of entities 314. In one example, as described earlier, a weighted average may be computed for each cluster of entities 314. Similarly, the LUCAS factor 322 may be associated at the enterprise level of abstraction 310, or with the cluster of systems 316.
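Because the same computation repeats at each level of abstraction, the hierarchy of Fig. 3 lends itself to a recursive roll-up. The following Python sketch is an assumption for illustration: the `Node` class, the `roll_up` function, the tree shape, and every weight and DQI value are hypothetical, and a cluster is modeled simply as another node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """An element, entity, system, enterprise, or a cluster of any of these."""
    name: str
    weight: float = 1.0            # weighing factor used by the parent level
    dqi: Optional[float] = None    # set directly only for leaf (element) nodes
    children: List["Node"] = field(default_factory=list)


def roll_up(node: Node) -> float:
    """Return a node's DQI as the weighted average of its children's DQIs."""
    if not node.children:
        if node.dqi is None:
            raise ValueError(f"leaf node {node.name!r} has no DQI")
        return node.dqi
    total_weight = sum(child.weight for child in node.children)
    if total_weight <= 0:
        raise ValueError(f"children of {node.name!r} have no positive weight")
    weighted_sum = sum(roll_up(child) * child.weight for child in node.children)
    return weighted_sum / total_weight


# Hypothetical four-level hierarchy: enterprise -> systems -> entities -> elements.
enterprise = Node("enterprise", children=[
    Node("system A", weight=3, children=[
        Node("entity 1", weight=1, children=[
            Node("element 1", weight=1, dqi=80.0),
            Node("element 2", weight=1, dqi=100.0),
        ]),
        Node("entity 2", weight=1, children=[
            Node("element 3", weight=1, dqi=70.0),
        ]),
    ]),
    Node("system B", weight=1, children=[
        Node("entity 3", weight=1, children=[
            Node("element 4", weight=1, dqi=60.0),
        ]),
    ]),
])

enterprise_dqi = roll_up(enterprise)
```

Calling `roll_up` on any intermediate node returns the DQI at that level, so the same function can serve whichever level is chosen as the final level of abstraction.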
[00057] Although implementations of DQI analysis have been described in language specific to structural features and/or methods, it is to be understood that the present subject matter is not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as implementations for DQI analysis.
I/We claim:
1. A computer implemented method to assess data quality index (DQI) at a final level of abstraction for data stored in a data repository, the method comprising:
computing a plurality of intermediary DQIs for the data, wherein each of the plurality of intermediary DQIs corresponds to at least one of a data quality parameter, a level of abstraction, and a data quality factor;
identifying a weighing factor associated with each of the plurality of intermediary DQIs based on the level of abstraction; and
determining a final DQI at the final level of abstraction based on a weighted average of the plurality of intermediary DQIs, wherein the weighted average is calculated based on the weighing factor associated with each of the plurality of intermediary DQIs.
2. The method as claimed in claim 1, wherein the level of abstraction is one of an element level, a cluster of elements level, an entity level, a cluster of entities level, a system level, a cluster of systems level, and an enterprise level.
3. The method as claimed in claim 1, wherein the data quality factor is a Latency Uniqueness Consistency Accuracy Sufficiency (LUCAS) factor.
4. The method as claimed in claim 2, wherein the computing the intermediary DQIs at the element level comprises:
computing an actual value of a data element against each of the plurality of data quality parameters;
identifying a weighing factor associated with the actual value; and
determining a weighted average of the actual values to obtain the intermediary DQIs at the element level.
5. The method as claimed in claim 1, wherein the computing the intermediary DQIs comprises:
identifying a weighing factor associated with each of a plurality of primary intermediary DQIs obtained at a primary level of abstraction; and
determining a weighted average of the primary intermediary DQIs at the primary level of abstraction to obtain a plurality of secondary intermediary DQIs for a secondary level of abstraction based on the weighted average.
6. A computing system (100) for assessing a data quality index (DQI) at a final level of abstraction for data stored in a repository, the system (100) comprising:
a processor (102); and
a memory (106) coupled to the processor (102), the memory (106) comprising:
a primary computation module (112), configured to obtain a plurality of intermediary DQIs for the data, wherein the plurality of intermediary DQI corresponds to at least one of a data quality parameter, a level of abstraction, and a data quality factor;
an identification module (114), configured to identify a weighing factor associated with each of the plurality of intermediary DQIs based on the level of abstraction; and
a secondary computation module (116), configured to compute a final DQI at the final level of abstraction based on a weighted average of the plurality of intermediary DQIs, wherein the weighted average is calculated based on the weighing factor associated with each of the plurality of intermediary DQIs.
7. The computing system (100) as claimed in claim 6, wherein the level of abstraction is one of an element level, a cluster of elements level, an entity level, a cluster of entities level, a system level, a cluster of systems level, and an enterprise level.
8. The computing system (100) as claimed in claim 7, wherein the primary computation module (112) is further configured to compute at the element level, at least one actual value corresponding to at least one of each of the plurality of data quality parameters and the data quality factor.
9. The computing system (100) as claimed in claim 8, wherein the identification module (114) is further configured to identify a weighing factor associated with the at least one actual value.
10. The computing system (100) as claimed in claim 8, wherein the secondary computation module (116) is further configured to determine a weighted average of the actual values to obtain the intermediary DQIs at the element level.
11. The computing system (100) as claimed in claim 6, wherein the data quality factor is a Latency Uniqueness Consistency Accuracy Sufficiency (LUCAS) factor.
12. A computer-readable medium having embodied thereon a computer program for executing a method comprising:
computing a plurality of intermediary DQIs at a level of abstraction for data stored in a data repository, wherein each of the plurality of intermediary DQIs corresponds to at least one of a data quality parameter, the level of abstraction, and a data quality factor;
identifying a weighing factor associated with each of the plurality of intermediary DQIs based on the level of abstraction; and
determining a final DQI at a final level of abstraction based on a weighted average of the plurality of intermediary DQIs, wherein the weighted average is calculated based on the weighing factor associated with each of the plurality of intermediary DQIs.
| # | Name | Date |
|---|---|---|
| 1 | 2432-MUM-2011-RELEVANT DOCUMENTS [26-09-2023(online)].pdf | 2023-09-26 |
| 1 | Other Document [03-07-2017(online)].pdf | 2017-07-03 |
| 2 | 2432-MUM-2011-RELEVANT DOCUMENTS [27-09-2022(online)].pdf | 2022-09-27 |
| 2 | Examination Report Reply Recieved [03-07-2017(online)].pdf | 2017-07-03 |
| 3 | Description(Complete) [03-07-2017(online)].pdf_560.pdf | 2017-07-03 |
| 3 | 2432-MUM-2011-RELEVANT DOCUMENTS [28-09-2021(online)].pdf | 2021-09-28 |
| 4 | Description(Complete) [03-07-2017(online)].pdf | 2017-07-03 |
| 4 | 2432-MUM-2011-RELEVANT DOCUMENTS [29-03-2020(online)].pdf | 2020-03-29 |
| 5 | Correspondence [03-07-2017(online)].pdf | 2017-07-03 |
| 5 | 2432-MUM-2011-RELEVANT DOCUMENTS [22-03-2019(online)].pdf | 2019-03-22 |
| 6 | Claims [03-07-2017(online)].pdf | 2017-07-03 |
| 6 | 2432-MUM-2011-CORRESPONDENCE(5-9-2011).pdf | 2018-08-10 |
| 7 | 2432-MUM-2011-FORM-26 [23-08-2017(online)].pdf | 2017-08-23 |
| 7 | 2432-MUM-2011-CORRESPONDENCE(8-11-2011).pdf | 2018-08-10 |
| 8 | 2432-MUM-2011-CORRESPONDENCE(8-9-2011).pdf | 2018-08-10 |
| 8 | 2432-MUM-2011-Correspondence to notify the Controller (Mandatory) [24-08-2017(online)].pdf | 2017-08-24 |
| 9 | 2432-MUM-2011-FER.pdf | 2018-08-10 |
| 9 | 2432-MUM-2011-Written submissions and relevant documents (MANDATORY) [14-09-2017(online)].pdf | 2017-09-14 |
| 10 | 2432-MUM-2011-FORM 1(8-9-2011).pdf | 2018-08-10 |
| 10 | 2432-MUM-2011-PatentCertificate25-09-2017.pdf | 2017-09-25 |
| 11 | 2432-MUM-2011-FORM 18(5-9-2011).pdf | 2018-08-10 |
| 11 | 2432-MUM-2011-IntimationOfGrant25-09-2017.pdf | 2017-09-25 |
| 12 | 2432-MUM-2011-FORM 26(8-11-2011).pdf | 2018-08-10 |
| 12 | 2432-MUM-2011-RELEVANT DOCUMENTS [31-03-2018(online)].pdf | 2018-03-31 |
| 13 | 2432-MUM-2011-HearingNoticeLetter.pdf | 2018-08-10 |
| 13 | Form-3.pdf | 2018-08-10 |
| 14 | 2432-MUM-2011-ORIGINAL UNDER RULE 6 (1A)-280817.pdf | 2018-08-10 |
| 14 | Form-1.pdf | 2018-08-10 |
| 15 | ABSTRACT1.jpg | 2018-08-10 |
| 15 | Drawings.pdf | 2018-08-10 |
| 1 | 2432_search_28-11-2016.pdf |