Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media for assessing data quality in multi-stage, multi-source batch processes that do not require validation of input data prior to processing. Embodiments of the present disclosure are further capable of identifying or predicting potential data quality issues, assessing their impact (if any) on the batch process, and providing recommendations for preventing or resolving the identified or predicted data quality issues.
CLAIMS: We claim:
1. A method for assessing data quality in a multi-stage, multi-source batch process, the batch process including one or more batch jobs being concurrently executed by one or more hardware processors, the method comprising:
determining, by one or more hardware processors, a performance parameter associated with the one or more batch jobs from a set of batch process parameters based on metadata associated with the batch process;
monitoring a real-time value associated with the performance parameter during execution of the batch process;
calculating a deviation of the monitored real-time value associated with the performance parameter from a threshold value associated with the performance parameter;
predicting, by one or more hardware processors, that one or more data quality issues are present and a magnitude of the one or more data quality issues based on the calculated deviation and a correlation between the calculated deviation and one or more previously identified potential data quality issues;
predicting, by one or more hardware processors, a magnitude of an impact of the one or more predicted data quality issues on the batch process; and
providing, by one or more hardware processors, a recommendation to resolve the one or more predicted data quality issues.
2. The method according to claim 1, wherein the set of batch process parameters includes at least one of: a frequency or number of transactions processed in a logical path within a batch job from among the one or more batch jobs; a number of read/write operations performed by a batch job from among the one or more batch jobs on a dataset; time taken to execute a step within a batch job from among the one or more batch jobs; or a frequency or number of failed transactions within a batch job from among the one or more batch jobs.
3. The method according to claim 1, wherein:
the performance parameter comprises a vector of two or more performance parameters associated with the one or more batch jobs,
monitoring the real-time value associated with the performance parameter during execution of the batch process comprises determining a vector of real-time values associated with the two or more performance parameters, and
calculating a deviation of the monitored real-time value comprises calculating a vector difference between the vector of real-time values and a vector of threshold values associated with the performance parameter.
4. The method according to claim 3, wherein predicting that one or more data quality issues are present comprises making the prediction based on the vector difference and a correlation between the vector difference and one or more previously identified data quality issues.
5. The method according to claim 1, wherein the method further comprises calibrating the threshold value associated with the performance parameter.
6. The method according to claim 5, wherein the method further comprises calibrating the correlation between the calculated deviation and the one or more previously identified data quality issues.
7. The method according to claim 5, wherein calibration occurs when performance of the batch process does not match an expected performance of the batch process.
8. The method according to claim 1, wherein the method further comprises providing an assessment of impacts on the batch process based on the one or more predicted data quality issues and metadata associated with the batch process.
9. The method according to claim 1, further comprising:
receiving, from an authenticated user, at least one of: the set of batch process parameters, the threshold value associated with the performance parameter, or the correlation between the calculated deviation and one or more previously identified potential data quality issues.
10. A system for assessing data quality in a multi-stage, multi-source batch process comprising:
one or more hardware processors; and
a computer-readable medium storing instructions that, when executed by the one or more hardware processors, cause the one or more hardware processors to perform operations comprising:
determining a performance parameter associated with the one or more batch jobs from a set of batch process parameters based on metadata associated with the batch process;
monitoring a real-time value associated with the performance parameter during execution of the batch process;
calculating a deviation of the monitored real-time value associated with the performance parameter from a threshold value associated with the performance parameter;
predicting that one or more data quality issues are present and a magnitude of the one or more data quality issues based on the calculated deviation and a correlation between the calculated deviation and one or more previously identified potential data quality issues;
predicting, by the one or more hardware processors, a magnitude of an impact of the one or more predicted data quality issues on the batch process; and
providing a recommendation to resolve the one or more predicted data quality issues.
11. The system according to claim 10, wherein the set of batch process parameters includes at least one of: a frequency or number of transactions processed in a logical path within a batch job from among the one or more batch jobs; a number of read/write operations performed by a batch job from among the one or more batch jobs on a dataset; time taken to execute a step within a batch job from among the one or more batch jobs; or a frequency or number of failed transactions within a batch job from among the one or more batch jobs.
12. The system according to claim 10, wherein:
the performance parameter comprises a vector of two or more performance parameters associated with the one or more batch jobs,
monitoring the real-time value associated with the performance parameter during execution of the batch process comprises determining a vector of real-time values associated with the two or more performance parameters, and
calculating a deviation of the monitored real-time value comprises calculating a vector difference between the vector of real-time values and a vector of threshold values associated with the performance parameter.
13. The system according to claim 12, wherein predicting that one or more data quality issues are present comprises making the prediction based on the vector difference and a correlation between the vector difference and one or more previously identified data quality issues.
14. The system according to claim 10, wherein the operations further comprise calibrating the threshold value associated with the performance parameter.
15. The system according to claim 14, wherein the operations further comprise calibrating the correlation between the calculated deviation and the one or more previously identified data quality issues.
16. The system according to claim 14, wherein calibration occurs when performance of the batch process does not match an expected performance of the batch process.
17. The system according to claim 10, wherein the operations further comprise providing an assessment of impacts on the batch process based on the one or more predicted data quality issues and metadata associated with the batch process.
18. A non-transitory computer-readable medium storing instructions for assessing data quality in a multi-stage, multi-source batch process, wherein upon execution of the instructions by one or more hardware processors, the hardware processors perform operations comprising:
determining a performance parameter associated with the one or more batch jobs from a set of batch process parameters based on metadata associated with the batch process;
monitoring a real-time value associated with the performance parameter during execution of the batch process;
calculating a deviation of the monitored real-time value associated with the performance parameter from a threshold value associated with the performance parameter;
predicting that one or more data quality issues are present and a magnitude of the one or more data quality issues based on the calculated deviation and a correlation between the calculated deviation and one or more previously identified potential data quality issues;
predicting, by the one or more hardware processors, a magnitude of an impact of the one or more predicted data quality issues on the batch process; and
providing a recommendation to resolve the one or more predicted data quality issues.
Dated this 25th day of March 2014
R Ramya Rao
Of K&S Partners
Agent for the Applicant
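The monitoring, deviation, and prediction steps recited in claims 1, 3, and 4 can be illustrated with a minimal sketch. It is not the applicant's implementation: the `KnownIssue` structure, the use of cosine similarity as a stand-in for the claimed "correlation", the Euclidean norm as the "magnitude", and the `min_similarity` cutoff are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class KnownIssue:
    """A previously identified data quality issue (hypothetical structure)."""
    name: str
    signature: list          # deviation pattern assumed to accompany this issue
    recommendation: str


def deviation(observed, thresholds):
    """Vector difference between monitored values and their thresholds."""
    return [o - t for o, t in zip(observed, thresholds)]


def cosine(a, b):
    """Cosine similarity, used here as an assumed stand-in for the correlation."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def predict(observed, thresholds, known_issues, min_similarity=0.8):
    """Return (issue name, magnitude, recommendation) for each matching issue."""
    dev = deviation(observed, thresholds)
    # Assumption: the issue magnitude is taken as the length of the deviation vector.
    magnitude = math.sqrt(sum(d * d for d in dev))
    return [
        (issue.name, magnitude, issue.recommendation)
        for issue in known_issues
        if cosine(dev, issue.signature) >= min_similarity
    ]


# Hypothetical parameters: transactions per path, step time, failed transactions.
issues = [
    KnownIssue("duplicate input records", [1.0, 0.0, 1.0], "deduplicate the source feed"),
    KnownIssue("slow storage", [0.0, 1.0, 0.0], "check the database volume"),
]
predictions = predict([120.0, 45.0, 30.0], [100.0, 45.0, 5.0], issues)
```

With these made-up inputs only the first issue matches, because the deviation is concentrated in the transaction count and failed-transaction components. A real system would presumably learn the signatures and thresholds from historical runs, which is what the calibration claims 5 to 7 point at.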
BACKGROUND
Batch processes are used by many large enterprises to efficiently handle a variety of data transactions often critical for business or regulatory purposes. Batch processes may be organized as a collection of batch jobs that perform a set of operations on discrete data sets to yield processed results. For example, a batch process for closing a financial cycle for a given business may require processing of numerous account payable transactions spread across different departmental units. The batch process for closing the financial cycle may include a batch job for each departmental unit handling the account payable transactions in the departmental unit. Each batch job processing account payable transactions may be further broken into steps that include reading the input account payable transaction from a database, processing the account payable transaction, and storing the processed account payable transaction in the same database or a different database. Upon completion of the batch jobs for the departmental units, the batch process may comprise another batch job that collects the processed account payable transactions from each departmental unit and produces an account summary that may be posted into a general ledger to close the financial cycle.
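The structure of the financial-cycle example above can be sketched as follows. This is only an illustration: the function names, the dictionary-based transaction records, and the rounding used as placeholder "processing" are assumptions, and in-memory lists stand in for the databases.

```python
def run_department_job(transactions):
    """One batch job: read, process, and store a department's transactions."""
    processed = []
    for txn in transactions:                       # step 1: read the input transaction
        amount = round(txn["amount"], 2)           # step 2: process (placeholder logic)
        processed.append({"id": txn["id"], "amount": amount})  # step 3: store the result
    return processed


def run_summary_job(processed_by_department):
    """Final batch job: collect processed transactions into an account summary."""
    return {
        dept: sum(t["amount"] for t in txns)
        for dept, txns in processed_by_department.items()
    }


def run_batch_process(inputs_by_department):
    """Run one job per departmental unit, then the summary job."""
    processed = {
        dept: run_department_job(txns)
        for dept, txns in inputs_by_department.items()
    }
    return run_summary_job(processed)


# Hypothetical input: two departmental units with accounts payable transactions.
inputs = {
    "sales": [{"id": 1, "amount": 100.0}, {"id": 2, "amount": 50.5}],
    "ops": [{"id": 3, "amount": 20.0}],
}
summary = run_batch_process(inputs)
```

The returned summary corresponds to the account summary that would be posted to the general ledger. Note that nothing here validates the input transactions before processing, which is the situation the disclosure targets.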
| # | Name | Date |
|---|---|---|
| 1 | Form-9(Online).pdf | 2014-03-28 |
| 2 | IP26766-spec.pdf | 2014-04-02 |
| 3 | IP26766-drawings.pdf | 2014-04-02 |
| 4 | FORM 5.pdf | 2014-04-02 |
| 5 | FORM 3.pdf | 2014-04-02 |
| 6 | 1586CHE2014.pdf | 2014-04-02 |
| 7 | 1586-CHE-2014 POWER OF ATTORNEY 26-08-2014.pdf | 2014-08-26 |
| 8 | 1586-CHE-2014 FORM-1 26-08-2014.pdf | 2014-08-26 |
| 9 | 1586-CHE-2014 CORRESPONDENCE OTHERS 26-08-2014.pdf | 2014-08-26 |
| 10 | FORM-18.pdf | 2014-11-03 |
| 11 | 1586-CHE-2014-FER.pdf | 2019-10-11 |
| 1 | SearchStrategyMatrix_11-10-2019.pdf | |