
Configuration Driven Nested Iterative And Interpreted Calculation Framework For Data Science And Machine Learning Systems

Abstract: The present embodiment provides a system and a computer-implemented method for real-time generation of complex aggregate functions for use in feature enrichment, de-noising, and content classification and scoring. The system includes a configuration interpreter module, a driver layer module and an iterative function interpreter module. The configuration interpreter module is configured to interpret, evaluate and execute functions stored as text statements (105) in configuration tabular data. The driver layer module is configured to generate complex aggregate functions as new features. The iterative function interpreter module is configured to process and generate a complex aggregate function output in the data frame through an iterative and nested process. (Reference: Figure 1)


Patent Information

Application #
Filing Date
14 December 2020
Publication Number
51/2020
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
jalanastha64@gmail.com
Parent Application

Applicants

SUMYAG DATA SCIENCES PVT LTD
D603, Mantri Serenity, Doddakallasandra, Bangalore, 560062, Karnataka, India

Inventors

1. VISHWANATH RAMDAS
D603 Mantri Serenity Apts, Doddakallasandra, Bangalore, Karnataka, 560062, India
2. CHANDRA MAHENDRA VIKRAM SINGH
Flat E-102, Wing 1, Saravana Tranquil Heights, Vinayak Nagar, Vidyaranyapura, Bengaluru, 560097, Karnataka, India.

Specification

Claims:
1. A system for real-time generation of complex aggregate functions for use in feature enrichment, de-noising, and content classification and scoring, the system comprising:
a configuration interpreter module configured to interpret, evaluate and execute functions stored as text statements (105) in configuration tabular data in real time;
a driver layer module configured to generate complex aggregate functions as new features and scores in the data frame; and
an iterative function interpreter module configured to process and generate a complex aggregate function output in the data frame through an iterative and a nested process, wherein the iterative function interpreter module is capable of generating the output in the data frame (135) by processing the aggregate of weights (130) and generated functions as stored in the configuration tabular data,
wherein the system is deployed anywhere in the data science pipeline to generate complex features based on the business requirements.

2. The system as claimed in claim 1, further comprising processing the configuration tabular data as a complex aggregate function using features in the input data frame.

3. A computer-implemented method for real-time feature enrichment, de-noising, content classification and scoring, the method comprising:
interpretation of a configuration tabular data as a text statement, wherein the configuration tabular data includes a plurality of features (110);
standardization of the plurality of features (110) in an input data frame (115) with a standardized range as an input to the final aggregate function;
processing of the final function as an aggregate of a weight and a function from the configuration tabular data in the output data frame (135), wherein the output data frame (135) is processed iteratively and in a nested fashion; and
displaying the output data frame (135) with the final function.

4. The computer-implemented method as claimed in claim 3, wherein the weight is added to the plurality of features.

5. The computer-implemented method as claimed in claim 3, wherein the configuration tabular data includes a document such as a spreadsheet or a text file.

6. The computer-implemented method as claimed in claim 3, wherein the standardized range is the range for standardizing the output data frame (135).

7. The computer-implemented method as claimed in claim 3, wherein the final function is processed as an aggregate sum and product of the weight and the function from the configuration tabular data.

8. The computer-implemented method as claimed in claim 3, wherein a new complex aggregate function is added with a change in the configuration tabular data.

9. The computer-implemented method as claimed in claim 3, wherein the processing of the complex aggregate function does not require changing existing code or deploying new code.
Description:
FIELD OF INVENTION
The present embodiment relates to the field of computational systems used in Data Science, AI and Machine Learning, and more particularly relates to a computer-implemented method and system for feature enrichment, de-noising, final classification and scoring.
BACKGROUND OF THE INVENTION
In the field of data science there is a continuous need for feature enrichment, feature selection, de-noising, final scoring and output generation, depending on the area of application. These features and outputs are typically the outcome of complex non-linear computations that are aggregated with differing strategies, such as a sum function or a product function. However, the complexity of this process makes it difficult to deploy such functions in real time without involving code management, code deployment and testing. Hard-coding computation approaches into the code base limits the capabilities of the analytics system.
Further, generating functionally relevant features as inputs for analytics models through coded functions, classes and methods has been prevalent for many years. These methodologies, implemented on various coding platforms such as Java, C, Python and others, are generally deployed as hardened functions embedded in the code, rendering the processes inflexible. This type of deployment has several disadvantages: it requires code changes for every complex computation, it cannot deploy ad-hoc computational functions into the system, it cannot deploy complex higher-level aggregate functions as either sum arrays or product arrays, and it lacks a mechanism for controlling these functions flexibly through configurations placed outside the code base to manage the outputs independently.
Furthermore, most implementations of aggregate function enrichment are tightly coupled: they are coded as independent programs deployed in the code base, and further code components and modules are then used to embody the higher aggregate functions. No clear mechanism combines all three features, namely the iterative aggregate function, the implementation of custom feature functions and computations, and the configuration mechanism to control the functions, in a single process or system.
Therefore, there is a need for a system and a computer-implemented method that combine complex computations and complex aggregate functions with a configuration schema in a single system, so that computations can be deployed in data science without code management. There is also a need for a system and a method for analyzing data quickly and flexibly by generating complex, higher-dimensional features using complex computations and functions.
SUMMARY OF THE INVENTION
As mentioned, there is a need for a system and a method for flexible analysis of data in areas such as feature enrichment, feature selection, de-noising, final scoring and output generation.
In an aspect, a system for real-time generation of complex aggregate functions for use in feature enrichment, de-noising, and content classification and scoring is provided. The system includes a configuration interpreter module, a driver layer module and an iterative function interpreter module. The configuration interpreter module is configured to interpret, evaluate and execute functions stored as text statements in configuration tabular data in real time. The driver layer module is configured to generate complex aggregate functions as new features and scores in the data frame. The iterative function interpreter module is configured to process and generate a complex aggregate function output in the data frame through an iterative and nested process, wherein the iterative function interpreter module is capable of generating the output in the data frame by processing the aggregate of weights and generated functions as stored in the configuration tabular data. The system can be deployed anywhere in the data science pipeline to generate complex features based on business requirements.
In another aspect, a computer-implemented method for real-time feature enrichment, de-noising, content classification and scoring is provided. The computer-implemented method includes the following steps: interpretation of configuration tabular data as a text statement, wherein the configuration tabular data includes a plurality of features; standardization of the plurality of features in an input data frame with a standardized range as an input to the final aggregate function; processing of the final function as an aggregate of a weight and a function from the configuration tabular data in the output data frame, wherein the output data frame is processed iteratively and in a nested fashion; and displaying the output data frame with the final function.
The preceding is a simplified summary to provide an understanding of some aspects of embodiments of the present invention. This summary is neither an extensive nor exhaustive overview of the present invention and its various embodiments. The summary presents selected concepts of the embodiments of the present invention in a simplified form as an introduction to the more detailed description presented below. As will be appreciated, other embodiments of the present invention are possible utilizing, alone or in combination, one or more of the features set forth above or described in detail below.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and still further features and advantages of embodiments of the present invention will become apparent upon consideration of the following detailed description of embodiments thereof, especially when taken in conjunction with the accompanying drawings, and wherein:
Figure 1 illustrates a block diagram of a system for real-time generation of complex aggregate functions, according to an embodiment herein;
Figure 2 illustrates a block diagram of a computer-implemented method for real-time generation of complex aggregate functions, according to an embodiment herein; and
Figure 3 illustrates a block diagram of a configuration storage file system, according to an embodiment herein.
To facilitate understanding, like reference numerals have been used, where possible, to designate like elements common to the figures.
DETAILED DESCRIPTION
As used throughout this application, the word "may" is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include”, “including”, and “includes” mean including but not limited to.
The phrases “at least one”, “one or more”, and “and/or” are open-ended expressions that are both conjunctive and disjunctive in operation. For example, each of the expressions “at least one of A, B and C”, “at least one of A, B, or C”, “one or more of A, B, and C”, “one or more of A, B, or C” and “A, B, and/or C” means A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B and C together.
The term “a” or “an” entity refers to one or more of that entity. As such, the terms “a” (or “an”), “one or more” and “at least one” can be used interchangeably herein. It is also to be noted that the terms “comprising”, “including”, and “having” can be used interchangeably.
Figure 1 illustrates a block diagram of the system for real-time generation of complex aggregate functions. The system is deployed in a variety of applications in data science such as, but not limited to, feature enrichment, de-noising, and content classification and scoring. The system includes a configuration interpreter module, a driver layer module and an iterative function interpreter module.
In an embodiment, the system is capable of evaluating textual statements of instructions, interpreting them into programmable and executable form, and then executing them iteratively through a nested process to obtain a complex computation graph.
The configuration interpreter module interprets, evaluates and executes a complex aggregate function stored as a text statement (105) in configuration tabular data in real time. In an embodiment, the configuration tabular data includes a document such as a spreadsheet or a text file. In an embodiment, the configuration interpreter module generates an ad-hoc computation in real time.
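As an illustration only, a configuration interpreter of this kind could be sketched in Python as below. It assumes that the configuration table is CSV text with a hypothetical `name` column and a `statement` column holding an arithmetic expression over feature names. The whitelist (`ALLOWED_FUNCS`, `ALLOWED_NODES`) and the helper names are assumptions, not part of the disclosure; checking the parsed syntax tree before calling `eval` is one common way to keep configuration-supplied text from executing arbitrary code.

```python
import ast
import csv
import io
import math

# Hypothetical whitelist of functions a configuration statement may call.
ALLOWED_FUNCS = {"sin": math.sin, "cos": math.cos, "log": math.log,
                 "exp": math.exp, "sqrt": math.sqrt, "abs": abs}
# Only plain arithmetic, names, numeric constants and calls are accepted.
ALLOWED_NODES = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Call, ast.Name,
                 ast.Load, ast.Constant, ast.Add, ast.Sub, ast.Mult, ast.Div,
                 ast.Pow, ast.USub, ast.UAdd)


def compile_statement(text):
    """Parse a text statement and reject any syntax outside the whitelist."""
    tree = ast.parse(text, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Call) and not (
                isinstance(node.func, ast.Name) and node.func.id in ALLOWED_FUNCS):
            raise ValueError("only whitelisted functions may be called")
    return compile(tree, "<config>", "eval")


def evaluate(code, record):
    """Evaluate a compiled statement against one record (feature name -> value)."""
    scope = dict(ALLOWED_FUNCS)
    scope.update(record)
    return eval(code, {"__builtins__": {}}, scope)


# Hypothetical configuration tabular data stored as text.
CONFIG = """name,statement
ratio,clicks / views
wave,sin(angle) ** 2
"""

rows = list(csv.DictReader(io.StringIO(CONFIG)))
compiled = {r["name"]: compile_statement(r["statement"]) for r in rows}
record = {"clicks": 5.0, "views": 20.0, "angle": math.pi / 2}
result = {name: evaluate(code, record) for name, code in compiled.items()}
print(result)
```

Editing a row of `CONFIG` changes what is computed without touching the interpreter itself, which is the property the configuration interpreter module is described as providing.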
The driver layer module is configured to generate the complex aggregate function as a new feature and a score in the data frame. In an embodiment, the input consists of a DataFrame (115) that is capable of storing a plurality of enriched features (110). In an embodiment, the plurality of enriched features (110) is employed for generating the complex computation. In an embodiment, the DataFrame (115) is a tabular data structure. In an embodiment, related or unrelated enriched features are combined to generate a classification output, which can represent a final output or an intermediary output such as an enriched feature or a flag for de-noising or classification.
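A minimal sketch of how a driver layer might append generated features to a table and then combine them into a flag is shown below. It assumes a dict of equal-length lists as a stand-in for the DataFrame; the `link_density` feature and the 0.1 noise threshold are hypothetical values, not taken from the specification.

```python
from typing import Callable, Dict, List

# Column name -> values; a simple stand-in for a DataFrame.
Table = Dict[str, List[float]]


def add_feature(table: Table, name: str, fn: Callable[[dict], float]) -> Table:
    """Evaluate fn on every row record and append the results as a new column."""
    n_rows = len(next(iter(table.values())))
    records = [{col: vals[i] for col, vals in table.items()} for i in range(n_rows)]
    table[name] = [fn(rec) for rec in records]
    return table


table: Table = {"length": [120.0, 8.0, 300.0], "links": [2.0, 6.0, 1.0]}
# An enriched feature generated from existing columns.
add_feature(table, "link_density", lambda r: r["links"] / r["length"])
# Enriched features combined into a hypothetical de-noising flag (1 = likely noise).
add_feature(table, "noise_flag", lambda r: 1.0 if r["link_density"] > 0.1 else 0.0)
print(table["noise_flag"])  # → [0.0, 1.0, 0.0]
```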
Figure 3 illustrates two-dimensional, heterogeneous tabular data (300) having a plurality of rows and a plurality of columns. In an embodiment, the plurality of columns represents features (305) in the DataFrame. In another embodiment, the plurality of rows represents records (310) in the DataFrame.
In an embodiment, the DataFrame can represent a document vector. The document vector is a serialized representation of the content in a document, with a plurality of columns that provide properties/features of the content across a plurality of row records. The document vector is stored and processed through a DataFrame object by the software/analytics pipeline.
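A document vector of this kind might look like the following minimal Python sketch, in which a dict of equal-length lists stands in for the DataFrame object and each text segment becomes one row record. The three feature columns (`n_words`, `n_digits`, `upper_ratio`) are hypothetical examples chosen for illustration; the specification does not name specific document features.

```python
from typing import Dict, List


def document_vector(segments: List[str]) -> Dict[str, List[float]]:
    """Serialize a document into a column-oriented table: one row per segment,
    one column per (hypothetical) content feature."""
    table: Dict[str, List[float]] = {"n_words": [], "n_digits": [], "upper_ratio": []}
    for seg in segments:
        letters = [c for c in seg if c.isalpha()]
        table["n_words"].append(float(len(seg.split())))
        table["n_digits"].append(float(sum(c.isdigit() for c in seg)))
        # Share of uppercase letters; 0.0 for segments with no letters.
        table["upper_ratio"].append(
            sum(c.isupper() for c in letters) / len(letters) if letters else 0.0)
    return table


doc = ["Invoice 2020 total", "PAY NOW", ""]
vec = document_vector(doc)
print(vec)
```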
Each of the plurality of features in the driver layer has a nested list of complex functions that are embodied as a list of complex computational statements. These functions are evaluated to generate intermediary outputs/features (125). In an embodiment, the driver layer module is capable of standardizing the plurality of features (110) with a standardization range. In an embodiment, the standardization range is a positive number taken from the configuration file for each intermediary output/feature. In another embodiment, the standardization range is a negative number taken from the configuration file for each intermediary output/feature. In an embodiment, the plurality of features (110) is customised with a clip range. In an embodiment, the intermediary feature vector (125) is generated as configured in the line items of the configuration file, following the custom computation, standardization and range-limiting steps.
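The function chain, standardization and clip-range steps could be sketched as follows. Min-max rescaling is an assumption, since the specification does not name a standardization method, and the function chain, `std_range` and `clip_range` values are hypothetical configuration values. A negative lower bound is used to show a range that extends below zero.

```python
import math
from typing import Callable, List, Sequence, Tuple


def min_max(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Rescale values linearly onto [lo, hi] (min-max scaling is an assumed choice)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def clip(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Limit every value to the closed interval [lo, hi]."""
    return [min(max(v, lo), hi) for v in values]


def intermediary_feature(values: Sequence[float],
                         functions: List[Callable[[float], float]],
                         std_range: Tuple[float, float],
                         clip_range: Tuple[float, float]) -> List[float]:
    """Apply a feature's chain of functions in order, then standardize, then clip."""
    out = list(values)
    for fn in functions:
        out = [fn(v) for v in out]
    out = min_max(out, *std_range)
    return clip(out, *clip_range)


raw = [1.0, 10.0, 100.0, 1000.0]
feat = intermediary_feature(raw, [math.log10], std_range=(-1.0, 1.0),
                            clip_range=(-0.5, 1.0))
print(feat)
```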
The iterative function interpreter module is configured to process and generate a complex aggregate function output in the data frame through an iterative and a nested process. In an embodiment, enriched features are nested into another aggregate function and are executed to generate a final score output. In an embodiment, multiple nested loops are executed to achieve the complex computation function embodied through the configuration files (345).
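One way to read the iterative and nested process is as an ordered list of configuration rows, each defining a new feature as an aggregate of earlier outputs. The sketch below assumes that reading; the row layout, the feature names (`quality`, `risk`, `final_score`) and the `sum`/`product` keywords are hypothetical. The outer loop walks the configuration rows and the inner loop walks each row's weighted terms, so an aggregate can nest the result of any aggregate defined above it.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical config: (output name, aggregate kind, [(weight, input name), ...]).
# Inputs may be raw features or outputs of earlier rows, which gives the nesting.
CONFIG: List[Tuple[str, str, List[Tuple[float, str]]]] = [
    ("quality", "sum", [(0.6, "readability"), (0.4, "originality")]),
    ("risk", "product", [(1.0, "spam_prob"), (0.5, "link_density")]),
    ("final_score", "sum", [(1.0, "quality"), (-1.0, "risk")]),
]


def aggregate(kind: str, terms: List[Tuple[float, str]], record: Dict[str, float]) -> float:
    """Inner loop: combine the weighted terms of one configuration row."""
    if kind == "sum":
        return sum(w * record[name] for w, name in terms)
    if kind == "product":
        return math.prod(record[name] ** w for w, name in terms)
    raise ValueError(f"unknown aggregate: {kind}")


def run(config, record: Dict[str, float]) -> Dict[str, float]:
    """Outer loop: evaluate rows in order, writing each output back into the record."""
    out = dict(record)
    for name, kind, terms in config:
        out[name] = aggregate(kind, terms, out)
    return out


out = run(CONFIG, {"readability": 0.8, "originality": 0.5,
                   "spam_prob": 0.25, "link_density": 0.04})
print(out["final_score"])
```

Because each row only reads values already present in `out`, the rows must be listed in dependency order; a fuller version would likely sort them topologically or reject cycles.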
In an embodiment, weights (130) are added to the aggregate function. In an embodiment, the functions include trigonometric, polynomial, linear and other computational transforms. In an embodiment, the complex computation is embodied as a serialized computation. In another embodiment, the complex computation is embodied as a vectorized computation. In an embodiment, the serialized computation is a sum aggregate. In an embodiment, the sum aggregate is represented by the following equation:
$$F(x) = w_1 f(x_1) + w_2 f(x_2) + \cdots + w_i f(x_i)$$
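The sum aggregate maps directly onto a weighted sum. A minimal sketch, assuming a single transform `f` shared by all terms as in the equation (the function name `sum_aggregate` and the sample values are illustrative):

```python
import math
from typing import Callable, Sequence


def sum_aggregate(weights: Sequence[float], f: Callable[[float], float],
                  xs: Sequence[float]) -> float:
    """F(x) = w1*f(x1) + w2*f(x2) + ... + wi*f(xi), computed term by term."""
    if len(weights) != len(xs):
        raise ValueError("need one weight per input")
    return sum(w * f(x) for w, x in zip(weights, xs))


result = sum_aggregate([0.5, 0.3, 0.2], math.sqrt, [4.0, 9.0, 16.0])
print(result)
```

With NumPy available and a NumPy-aware `f` such as `np.sqrt`, the same quantity could be computed as `np.dot(weights, f(xs))`, which corresponds to the vectorized embodiment.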
In another embodiment, the serialized computation is a product aggregate. In an embodiment, the product aggregate is represented by the following equation:
$$F(x) = f(x_1)^{w_1} \times f(x_2)^{w_2} \times \cdots \times f(x_i)^{w_i}$$
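The product aggregate can be sketched the same way. The second function computes the identical value in log space, using log F(x) = Σ w log f(x). That is a standard numerical technique, not something the specification states, for avoiding overflow or underflow when many factors are multiplied, and it is valid only when every f(x) is positive. Function names and sample values are illustrative.

```python
import math
from typing import Callable, Sequence


def product_aggregate(weights: Sequence[float], f: Callable[[float], float],
                      xs: Sequence[float]) -> float:
    """F(x) = f(x1)**w1 * f(x2)**w2 * ... * f(xi)**wi."""
    if len(weights) != len(xs):
        raise ValueError("need one weight per input")
    return math.prod(f(x) ** w for w, x in zip(weights, xs))


def product_aggregate_log(weights: Sequence[float], f: Callable[[float], float],
                          xs: Sequence[float]) -> float:
    """Same value via exp(sum(w * log f(x))); requires every f(x) > 0."""
    return math.exp(sum(w * math.log(f(x)) for w, x in zip(weights, xs)))


def identity(x: float) -> float:
    return x


weights, xs = [2.0, 0.5], [3.0, 16.0]
print(product_aggregate(weights, identity, xs))  # → 36.0  (3**2 * 16**0.5)
```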
The intermediary features and weights are aggregated by sum or product and processed in multiple iterations to yield the output/classification vector. The output/classification vector is limited based on the ranges in the configuration file (345). This output vector (135) is added to the DataFrame, generating a new feature/output/classification. In an embodiment, the output vector/classification is computed from a plurality of input features numbering from 1 to n. In a preferred embodiment, the number of input features lies between 1 and 20.
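Limiting the output vector to a configured range and appending it to the table might be sketched as below, again with a dict of lists standing in for the DataFrame. The choice of a sum aggregate, the column names and the `out_range` of 0 to 1 are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Table = Dict[str, List[float]]  # column name -> values; a stand-in for a DataFrame


def append_output(table: Table, name: str, weights: Sequence[float],
                  inputs: Sequence[str], out_range: Tuple[float, float]) -> Table:
    """Sum-aggregate the named intermediary features per row, limit the result to
    the configured range, and append it to the table as a new column."""
    lo, hi = out_range
    n_rows = len(table[inputs[0]])
    scores = []
    for i in range(n_rows):
        s = sum(w * table[col][i] for w, col in zip(weights, inputs))
        scores.append(min(max(s, lo), hi))  # range limit from the configuration
    table[name] = scores
    return table


table: Table = {"f1": [0.2, 0.9, -0.4], "f2": [0.5, 0.8, 0.1]}
append_output(table, "score", weights=[1.0, 1.0], inputs=["f1", "f2"],
              out_range=(0.0, 1.0))
print(table["score"])
```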
The iterative function interpreter module handles evaluation of computational statements, feature normalization, feature standardization and finalization of the plurality of features.
Figure 2 illustrates the computer-implemented method for real-time feature enrichment, de-noising, content classification and scoring. The computer-implemented method processes the complex computations for real-time feature enrichment, de-noising, content classification and scoring. The computer-implemented method includes the following steps:
The first step of the computer-implemented method involves interpreting the configuration tabular data as a text statement. In an embodiment, the configuration tabular data includes the plurality of features. In an embodiment, the configuration tabular data includes a document such as a spreadsheet or a text file. In an embodiment, ad-hoc computations are generated in real time.
The second step of the computer-implemented method involves the standardization of the plurality of features (110) in the input data frame with a standardized range as an input to the final aggregate function. In an embodiment, the standardization range is a positive number from the configuration file per intermediary outputs/features. In another embodiment, the standardization range is a negative number from the configuration file per intermediary outputs/features.
Each of the plurality of features in the input data frame has a nested list of complex functions that are embodied as a list of complex computational statements. These functions are evaluated to generate intermediary outputs/features.
The third step of the computer-implemented method involves the processing of the final function as an aggregate of a weight and a function from the configuration tabular data in the output data frame. In an embodiment, the output data frame is processed iteratively and in a nested fashion.
In an embodiment, enriched features are nested into another aggregate function and are executed to generate a final score output. In an embodiment, multiple nested loops are executed to achieve the complex computation function embodied through the configuration files (345).
In an embodiment, weights are added to the aggregate function. In an embodiment, the functions include trigonometric, polynomial, linear and other computational transforms. In an embodiment, the complex computation is embodied as a serialized computation. In an embodiment, the serialized computation is a sum aggregate. In an embodiment, the sum aggregate is represented by the following equation:
$$F(x) = w_1 f(x_1) + w_2 f(x_2) + \cdots + w_i f(x_i)$$
In another embodiment, the serialized computation is a product aggregate. In an embodiment, the product aggregate is represented by the following equation:
$$F(x) = f(x_1)^{w_1} \times f(x_2)^{w_2} \times \cdots \times f(x_i)^{w_i}$$
The intermediary features and weights are aggregated by sum or product and processed in multiple iterations to yield the output/classification vector (135). The output/classification vector is limited based on the ranges in the configuration file. This output vector is added to the DataFrame, generating a new feature/output/classification. In an embodiment, the output vector/classification is computed from a plurality of input features numbering from 1 to n. In a preferred embodiment, the number of input features lies between 1 and 20.
The fourth step of the computer-implemented method involves displaying the output DataFrame with the output vector/classification.
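The four method steps can be strung together in one minimal sketch. It assumes a CSV configuration whose hypothetical columns (`feature`, `function`, `weight`, `lo`, `hi`) name a whitelisted transform instead of holding free-form code, uses min-max rescaling as the standardization method, and uses a sum aggregate; none of these choices is prescribed by the specification.

```python
import csv
import io
import math

# Hypothetical whitelist: configuration rows name a function instead of holding code.
FUNCTIONS = {"identity": lambda v: v, "log1p": math.log1p, "sqrt": math.sqrt}

CONFIG = """feature,function,weight,lo,hi
views,log1p,0.7,0,1
likes,sqrt,0.3,0,1
"""


def standardize(values, lo, hi):
    """Min-max rescale onto [lo, hi] (an assumed standardization method)."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0  # constant columns map to lo
    return [lo + (v - vmin) * (hi - lo) / span for v in values]


def run_method(config_text, frame):
    # Step 1: interpret the configuration tabular data.
    rows = list(csv.DictReader(io.StringIO(config_text)))
    # Step 2: transform and standardize each configured feature.
    terms = []
    for r in rows:
        fn = FUNCTIONS[r["function"]]
        transformed = [fn(v) for v in frame[r["feature"]]]
        terms.append((float(r["weight"]),
                      standardize(transformed, float(r["lo"]), float(r["hi"]))))
    # Step 3: aggregate weight * function per record into a new output column.
    n_rows = len(next(iter(frame.values())))
    out = dict(frame)  # shallow copy; the input frame is left unchanged
    out["score"] = [sum(w * col[i] for w, col in terms) for i in range(n_rows)]
    return out


def display(frame):
    # Step 4: display the output data frame as tab-separated text.
    cols = list(frame)
    print("\t".join(cols))
    for i in range(len(frame[cols[0]])):
        print("\t".join(f"{frame[c][i]:.3f}" for c in cols))


frame = {"views": [10.0, 1000.0, 100.0], "likes": [0.0, 16.0, 4.0]}
result = run_method(CONFIG, frame)
display(result)
```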
The present system and method allow a new complex aggregate function to be added by changing only the configuration tabular data. The present system and method do not require changing existing code or deploying new code to process the complex aggregate function.
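The point that a new aggregate function needs only a configuration change can be illustrated with a tiny rule parser. The line syntax `name = kind: w*feature; w*feature` and the helper names are invented for this sketch. The same `apply_rules` code handles both configurations below; only the configuration text differs.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, str, List[Tuple[float, str]]]


def parse_config(text: str) -> List[Rule]:
    """Parse lines of the hypothetical form 'name = kind: w1*feat1; w2*feat2'."""
    rules = []
    for line in text.strip().splitlines():
        name, rhs = (p.strip() for p in line.split("=", 1))
        kind, body = (p.strip() for p in rhs.split(":", 1))
        terms = []
        for term in body.split(";"):
            w, feat = term.split("*", 1)
            terms.append((float(w), feat.strip()))
        rules.append((name, kind, terms))
    return rules


def apply_rules(rules: List[Rule], record: Dict[str, float]) -> Dict[str, float]:
    """Evaluate each rule in order; later rules may use earlier outputs."""
    out = dict(record)
    for name, kind, terms in rules:
        if kind == "sum":
            out[name] = sum(w * out[f] for w, f in terms)
        elif kind == "product":
            value = 1.0
            for w, f in terms:
                value *= out[f] ** w
            out[name] = value
        else:
            raise ValueError(f"unknown aggregate kind: {kind}")
    return out


record = {"a": 2.0, "b": 3.0}
v1 = "s = sum: 1*a; 1*b"
v2 = v1 + "\np = product: 2*a; 1*s"  # new function added purely in configuration
r1 = apply_rules(parse_config(v1), record)
r2 = apply_rules(parse_config(v2), record)
print(r1)  # → {'a': 2.0, 'b': 3.0, 's': 5.0}
print(r2)  # → {'a': 2.0, 'b': 3.0, 's': 5.0, 'p': 20.0}
```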
The present system and method are deployed in a data science pipeline that processes data in tabular structures through a sequence of feature enrichment, feature selection, model application, analytics functions and other data transforms applied to the input DataFrame, resulting in an output DataFrame that can be used for business applications.
Moreover, though the description of the present invention has included description of one or more embodiments, configurations, or aspects and certain variations and modifications, other variations, combinations, and modifications are within the scope of the present invention, e.g., as may be within the skill and knowledge of those in the art, after understanding the present disclosure. It is intended to obtain rights which include alternative embodiments, configurations, or aspects to the extent permitted, including alternate, interchangeable and/or equivalent structures, functions, ranges or steps to those claimed, whether or not such alternate, interchangeable and/or equivalent structures, functions, ranges or steps are disclosed herein, and without intending to publicly dedicate any patentable subject matter.

Documents

Application Documents

# Name Date
1 202041054194-STATEMENT OF UNDERTAKING (FORM 3) [14-12-2020(online)].pdf 2020-12-14
2 202041054194-STARTUP [14-12-2020(online)].pdf 2020-12-14
3 202041054194-PROOF OF RIGHT [14-12-2020(online)].pdf 2020-12-14
4 202041054194-POWER OF AUTHORITY [14-12-2020(online)].pdf 2020-12-14
5 202041054194-FORM28 [14-12-2020(online)].pdf 2020-12-14
6 202041054194-FORM-9 [14-12-2020(online)].pdf 2020-12-14
7 202041054194-FORM FOR STARTUP [14-12-2020(online)].pdf 2020-12-14
8 202041054194-FORM FOR SMALL ENTITY(FORM-28) [14-12-2020(online)].pdf 2020-12-14
9 202041054194-FORM 18A [14-12-2020(online)].pdf 2020-12-14
10 202041054194-FORM 1 [14-12-2020(online)].pdf 2020-12-14
11 202041054194-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [14-12-2020(online)].pdf 2020-12-14
12 202041054194-EVIDENCE FOR REGISTRATION UNDER SSI [14-12-2020(online)].pdf 2020-12-14
13 202041054194-DRAWINGS [14-12-2020(online)].pdf 2020-12-14
14 202041054194-DECLARATION OF INVENTORSHIP (FORM 5) [14-12-2020(online)].pdf 2020-12-14
15 202041054194-COMPLETE SPECIFICATION [14-12-2020(online)].pdf 2020-12-14
16 202041054194-OTHERS [02-03-2021(online)].pdf 2021-03-02
17 202041054194-MARKED COPIES OF AMENDEMENTS [02-03-2021(online)].pdf 2021-03-02
18 202041054194-FORM 13 [02-03-2021(online)].pdf 2021-03-02
19 202041054194-FER_SER_REPLY [02-03-2021(online)].pdf 2021-03-02
20 202041054194-DRAWING [02-03-2021(online)].pdf 2021-03-02
21 202041054194-COMPLETE SPECIFICATION [02-03-2021(online)].pdf 2021-03-02
22 202041054194-CLAIMS [02-03-2021(online)].pdf 2021-03-02
23 202041054194-AMMENDED DOCUMENTS [02-03-2021(online)].pdf 2021-03-02
24 202041054194-ABSTRACT [02-03-2021(online)].pdf 2021-03-02
25 202041054194-FER.pdf 2021-10-18
26 202041054194-US(14)-HearingNotice-(HearingDate-02-02-2024).pdf 2024-01-22
27 202041054194-Correspondence to notify the Controller [01-02-2024(online)].pdf 2024-02-01

Search Strategy

1 search_expAE_20-08-2021.pdf
2 searchE_15-01-2021.pdf