Abstract: Enhancements or changes to an application over different releases demand an optimized set of test cases for testing the latest version of the application. Approaches used by existing systems, such as source code based approaches, are not programming language agnostic or may be complex. The disclosure herein generally relates to application testing, and, more particularly, to a method and system for release notes processing based test suite optimization. By processing release notes and a feature dependency map, the system identifies all changed and impacted features, and further identifies test cases that are relevant to the changed and impacted features. Further, highly relevant test cases are segregated and are used to form an optimized test suite. [To be published with FIG. 2]
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention: METHOD AND SYSTEM FOR TEST SUITE OPTIMIZATION
Applicant
Tata Consultancy Services Limited A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD [001] The disclosure herein generally relates to application testing, and, more particularly, to a method and system for test suite optimization.
BACKGROUND
[002] Applications typically have different versions. When an application is launched for the first time, it is possible that there are some bugs. In an effort to fix the bugs and/or to add more features and/or to remove certain existing features, developers of the applications may continue working on the applications, and subsequently new versions of the applications get released. As part of quality check, before releasing any version of an application, testing is performed to ensure that the application is functioning as intended. To test any application, a plurality of testcases are required, directed to different features (of the application) being tested. The testcases used to test all features of an application are collectively referred to as ‘test suite’.
[003] Every addition, deletion, or update of features would demand a change in the previously used test suite, as some testcases may become irrelevant, some may have to be newly added to test the added/changed features, and so on. This change in the test suite to address the changes made to the application is referred to as test suite optimization. One way to perform the test suite optimization is by manually selecting the required test cases. However, large applications may have quite a lot of features, and to test each feature n number of test cases may be required. As a result, manual selection may be a cumbersome task, which is also prone to human errors. Such errors may result in unattended bugs, and later, when the application is released, may result in the application malfunctioning/failing and being less efficient.
[004] There exist some systems that at least partially automate the test suite optimization. Such systems may be employing different methods for the test suite optimization. For example, some of the existing systems being used for test suite optimization rely on source code analysis. A disadvantage of this approach is that, as the code may be lengthy and complex, the systems may sometimes fail to detect the changes efficiently. In addition, as different applications may have been built using different coding languages, a system handling one particular type of code may fail to work with a different one. Having different systems to handle applications built on different types of code may be effort intensive and costly.
SUMMARY [005] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a processor implemented method of generating a data model for test suite optimization is provided. The method includes initially collecting, via one or more hardware processors, a training data comprising plurality of release notes and a feature dependency map of a plurality of reference applications, wherein the plurality of release notes comprises change information across different versions of each of the plurality of reference applications. Further, an impact analysis is performed to identify all features of each of the plurality of reference applications that have been modified across the different versions, via the one or more hardware processors, wherein the impact analysis comprises processing the plurality of release notes and the feature dependency map of each of the plurality of reference applications. Further, one or more test cases relevant to each of the features identified as modified across the different versions are identified, via the one or more hardware processors. Identifying the one or more test cases relevant to each of the features identified as modified across the different versions involves the following steps. Initially, a plurality of data points are captured from a historical data of a plurality of test cases. Further, each of the plurality of data points is transformed to a corresponding numerical form to generate a plurality of data point values. Further, the one or more test cases relevant to each of the features are identified based on the generated data point values. After identifying the one or more test cases relevant to each of the features identified as modified across the different versions, the data model is generated via the one or more hardware
processors, based on the features identified by performing the impact analysis, and the identified one or more test cases relevant to each of the features.
[006] In another aspect, the generated data model is used to perform the test suite optimization for an Application Under Test (AUT), which involves the following steps. Initially, a plurality of release notes of the AUT are fetched. Further, one or more changed features of the AUT are identified by processing the plurality of release notes. Further, a plurality of test cases relevant to the identified one or more changed features are identified. Further, the identified plurality of test cases are processed using the data model, which further involves the following steps. Initially, a relative importance of each of the plurality of test cases is determined in comparison with one another. Further, the plurality of test cases are segregated as highly relevant test cases and less relevant test cases. Further, an optimized test suite is generated using the plurality of test cases segregated as the highly relevant test cases.
[007] In yet another aspect, a system for generating a data model for test suite optimization is provided. The system includes one or more hardware processors, a communication interface, and a memory storing a plurality of instructions, wherein the plurality of instructions when executed, configure the one or more hardware processors to initially collect a training data comprising plurality of release notes and a feature dependency map of a plurality of reference applications, wherein the plurality of release notes comprises change information across different versions of each of the plurality of reference applications. Further, an impact analysis is performed to identify all features of each of the plurality of reference applications that have been modified across the different versions, via the one or more hardware processors, wherein the impact analysis comprises processing the plurality of release notes and the feature dependency map of each of the plurality of reference applications. Further, one or more test cases relevant to each of the features identified as modified across the different versions are identified, via the one or more hardware processors. Identifying the one or more test cases relevant to each of the features identified as modified across the different versions involves the following steps. Initially, a plurality of data points are
captured from a historical data of a plurality of test cases. Further, each of the plurality of data points is transformed to a corresponding numerical form to generate a plurality of data point values. Further, the one or more test cases relevant to each of the features are identified based on the generated data point values. After identifying the one or more test cases relevant to each of the features identified as modified across the different versions, the data model is generated via the one or more hardware processors, based on the features identified by performing the impact analysis, and the identified one or more test cases relevant to each of the features.
[008] In yet another aspect, the one or more hardware processors in the system are configured to perform the test suite optimization for an Application Under Test (AUT), using the generated data model by executing the following steps. Initially, a plurality of release notes of the AUT are fetched. Further, one or more changed features of the AUT are identified by processing the plurality of release notes. Further, a plurality of test cases relevant to the identified one or more changed features are identified. Further, the identified plurality of test cases are processed using the data model, which includes the following steps. Initially, a relative importance of each of the plurality of test cases in comparison with one another is determined. Further, the plurality of test cases are segregated as highly relevant test cases and less relevant test cases. Further, an optimized test suite is generated using the plurality of test cases segregated as the highly relevant test cases.
[009] In yet another aspect, a non-transitory computer readable medium for generating a data model for test suite optimization is provided. The non-transitory computer readable medium includes a plurality of instructions, which when executed, cause one or more hardware processors to initially collect a training data comprising plurality of release notes and a feature dependency map of a plurality of reference applications, wherein the plurality of release notes comprises change information across different versions of each of the plurality of reference applications. Further, an impact analysis is performed to identify all features of each of the plurality of reference applications that have been modified across the
different versions, via the one or more hardware processors, wherein the impact analysis comprises processing the plurality of release notes and the feature dependency map of each of the plurality of reference applications. Further, one or more test cases relevant to each of the features identified as modified across the different versions are identified, via the one or more hardware processors. Identifying the one or more test cases relevant to each of the features identified as modified across the different versions involves the following steps. Initially, a plurality of data points are captured from a historical data of a plurality of test cases. Further, each of the plurality of data points is transformed to a corresponding numerical form to generate a plurality of data point values. Further, the one or more test cases relevant to each of the features are identified based on the generated data point values. After identifying the one or more test cases relevant to each of the features identified as modified across the different versions, the data model is generated via the one or more hardware processors, based on the features identified by performing the impact analysis, and the identified one or more test cases relevant to each of the features.
[010] In yet another aspect, the non-transitory computer readable medium causes the one or more hardware processors to perform the test suite optimization for an Application Under Test (AUT), using the generated data model by executing the following steps. Initially, a plurality of release notes of the AUT are fetched. Further, one or more changed features of the AUT are identified by processing the plurality of release notes. Further, a plurality of test cases relevant to the identified one or more changed features are identified. Further, the identified plurality of test cases are processed using the data model, which includes the following steps. Initially, a relative importance of each of the plurality of test cases in comparison with one another is determined. Further, the plurality of test cases are segregated as highly relevant test cases and less relevant test cases. Further, an optimized test suite is generated using the plurality of test cases segregated as the highly relevant test cases.
[011] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[012] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[013] FIG. 1 illustrates an exemplary system for test suite optimization, according to some embodiments of the present disclosure.
[014] FIGS. 2A and 2B (collectively referred to as FIG. 2) is a flow diagram depicting steps involved in the process of generating a data model for test suite optimization, by the system of FIG. 1, according to some embodiments of the present disclosure.
[015] FIG. 3 illustrates a flow diagram depicting steps involved in the process of performing a test suite optimization for an Application Under Test (AUT), using the data model, by the system of FIG. 1, in accordance with some embodiments of the present disclosure.
[016] FIG. 4 depicts an example neural network used by the system of FIG. 1 to generate a data model for test suite optimization, in accordance with some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS [017] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[018] Existing systems for automating test suite optimization may be employing different methods for the test suite optimization. For example, some of the existing systems being used for test suite optimization rely on source code analysis. A disadvantage of this approach is that, as the code may be lengthy and complex, the systems may sometimes fail to detect the changes efficiently. In addition, as different applications may have been built using different coding languages, a system handling one particular type of code may fail to work with a different one. Having different systems to handle applications built on different types of code may be effort intensive and costly. In order to address these concerns, embodiments of the present disclosure provide a method and system for release notes based test suite optimization. The system processes release notes of an application to determine features that have been changed/added/removed in at least the latest release/version of the application. Further, information on the changed/added/removed features is processed with a feature dependency map of the application to identify all features that have been impacted. Further, for the features that have been identified as impacted, the system identifies corresponding test cases, and in turn performs test suite optimization.
[019] Referring now to the drawings, and more particularly to FIG. 1 through FIG. 4, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[020] FIG. 1 illustrates an exemplary system for test suite optimization, according to some embodiments of the present disclosure. The system 100 includes or is otherwise in communication with hardware processors 102, at least one memory such as a memory 104, and an I/O interface 112. The hardware processors 102, memory 104, and the Input/Output (I/O) interface 112 may be coupled by a system bus such as a system bus 108 or a similar mechanism. In an embodiment, the hardware processors 102 can be one or more hardware processors.
[021] The I/O interface 112 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and interfaces for peripheral device(s), such as a keyboard, a mouse, an external memory, a printer, and the like. Further, the I/O interface 112 may enable the system 100 to communicate with other devices, such as web servers, and external databases.
[022] The I/O interface 112 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as Wireless LAN (WLAN), cellular, or satellite. For this purpose, the I/O interface 112 may include one or more ports for connecting several computing systems or devices to one another or to another server.
[023] The one or more hardware processors 102 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, node machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 102 is configured to fetch and execute computer-readable instructions stored in the memory 104.
[024] The memory 104 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, the memory 104 includes a plurality of modules 106.
[025] The plurality of modules 106 include programs or coded instructions that supplement applications or functions performed by the system 100 for executing different steps involved in generating the training data and in turn a trained data model for test suite optimization. The plurality of modules 106, amongst other things, can include routines, programs, objects, components, and data structures, which perform particular tasks or implement particular abstract data types. The plurality of modules 106 may also be implemented as signal processor(s), node machine(s), logic circuitries, and/or any other device or component that manipulates signals based on operational instructions. Further, the plurality of modules 106 can be implemented in hardware, by computer-readable instructions executed by the one or more hardware processors 102, or by a combination thereof. The plurality of modules 106 can include various sub-modules (not shown). The plurality of modules 106 may include computer-readable instructions that supplement applications or functions performed by the system 100 for the test suite optimization.
[026] The data repository (or repository) 110 may include a plurality of abstracted pieces of code for refinement and data that is processed, received, or generated as a result of the execution of the plurality of modules in the module(s) 106.
[027] Although the data repository 110 is shown internal to the system 100, it will be noted that, in alternate embodiments, the data repository 110 can also be implemented external to the system 100, where the data repository 110 may be stored within a database (repository 110) communicatively coupled to the system 100. The data contained within such external database may be periodically updated. For example, new data may be added into the database (not shown in FIG. 1) and/or existing data may be modified and/or non-useful data may be deleted from the database. In one example, the data may be stored in an external system, such as a Lightweight Directory Access Protocol (LDAP) directory and a Relational Database Management System (RDBMS). Functions of the components of the system 100 are now explained with reference to steps in flow diagrams in FIG. 2 and FIG. 3.
[028] FIGS. 2A and 2B (collectively referred to as FIG. 2) is a flow diagram depicting steps of a method 200 involved in the process of generating a data model for test suite optimization, by the system of FIG. 1, according to some embodiments of the present disclosure.
[029] In an embodiment, the system 100 comprises one or more data storage devices or the memory 104 operatively coupled to the processor(s) 102 and is configured to store instructions for execution of steps of the method 200 and a
method 300 by the processor(s) or one or more hardware processors 102. The steps of the method 200 and method 300 of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in FIG. 1 and the steps of flow diagrams as depicted in FIG. 2 and FIG. 3. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods, and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
[030] At step 202, the system 100 collects a training data comprising a plurality of release notes and a feature dependency map of each of a plurality of reference applications, via one or more hardware processors, wherein the plurality of release notes comprises change information across different versions of each of the plurality of reference applications. The applications being used to form the training data at a time may be from the same domain or from multiple domains. The system 100 may collect the data in different ways, from various Application Lifecycle Management (ALM) tools, and some examples of the approaches for data collection are given below (an illustrative collection sketch follows the list):
• Data imported from integrated Tools accessed through DB or REST APIs
• Bulk upload of the data using CSV and XLS
• Using adaptor for Data discovery (JIRA/ALM-Octane/ALM/Rally/Service-Now)
• By having capabilities to continuously collect and refresh information on various applications
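A minimal sketch of such data collection is given below, assuming a hypothetical REST route and a CSV export as the two inputs; the endpoint path, field names, and token handling are illustrative assumptions rather than part of the disclosure, since each ALM tool (JIRA, ALM Octane, Rally, ServiceNow) exposes its own API schema and adaptors.

```python
import csv

import requests  # widely used HTTP client; any equivalent client would do


def fetch_release_notes_rest(base_url: str, project: str, token: str) -> list:
    """Pull release-note records from a hypothetical ALM REST endpoint."""
    response = requests.get(
        f"{base_url}/api/projects/{project}/release-notes",  # hypothetical route
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()  # assumed to return a list of release-note records


def bulk_upload_csv(path: str) -> list:
    """Fallback: bulk-load the same records from a CSV/XLS export saved as CSV."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```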
[031] The release notes of each application indicate changes (addition/removal/updates) to the application in different releases. The feature dependency map of each application indicates/captures the dependency of each of the features of the application with one or more other features of the application. For example, if the application is a banking application, then a cash transaction feature has a dependency on the authentication feature, i.e., the user is allowed to make a transaction only after successful authentication. In a similar way, other features also may have such dependencies. In addition to the release notes and the feature dependency map, the system 100 may collect some additional data as inputs. In another embodiment, these additional data may be present in the release notes and may be extracted by the system 100. Examples of the additional data are, but not limited to:
• Release :- Indicates release version
• Requirements :- Requirements here refers to new features or enhancements of an application.
• Test Set :- Test set refers to logical grouping of Test cases like regression, sanity etc.
• Testcase :- Testcases used in previous versions/releases
• Defect :- historical data on defects identified with previous versions
• Test Execution :- results of testcases executed in previous versions
• Release Notes :- release notes indicating changes made in at least latest version release
[032] Further, at step 204 of the method 200, the system 100 performs an impact analysis to identify all features that have been modified across the different versions, via the one or more hardware processors, wherein the impact analysis comprises processing the plurality of release notes and the feature dependency map of each of the plurality of reference applications.
[033] In an embodiment, as the system 100 has to perform the test suite optimization for identifying testcases to test only the latest release of the application, the system may process only the latest version of the release notes so that the most recent changes to the application can be identified. The latest release note summarizes the changes, enhancements, and bug fixes in the latest release. One sample release note may include the following sections:
• Header – Document Name (i.e., Release Notes), product name, release number, release date, note date, note version, etc.
• Overview - A brief overview of the product and changes, in the absence of other formal documentation.
• Purpose - A brief overview of the purpose of the release note with a listing of what is new in this release, including bug fixes and new features.
• Issue Summary - A short description of the bug or the enhancement in the release and its associated module name.
• Steps to Reproduce - The steps that were followed when the bug was encountered.
• Resolution - A short description of the modification/enhancement that was made to fix the bug.
• End-User Impact - What different actions are needed by the end-users of the application. This should include whether other functionality is impacted by these changes.
[034] During the impact analysis, the system 100 may use a suitable Natural Language Processing (NLP) technique to process the release note(s) and identify the changes to the application in the latest release.
[035] During the NLP processing of the documents, an NLP engine (not shown) of the system 100 may initially read the release notes collected as input and may perform a keyword based exhaustive search. The keywords act as a knowledge base, and as the search for different domains can be different, the search string cannot be generalized; the system 100 therefore performs pre-processing of the documents. During the pre-processing, the system 100 tokenizes and converts the contents of the input documents, i.e., the release notes, into lowercase. The system 100 may further perform stemming on the words in the documents to remove all the affixes. The NLP engine further identifies the format and the language of each of the documents. The NLP engine may further perform one of an exhaustive search or a fuzzy search to interpret the documents and extract data.
• Exhaustive Search: An exhaustive search on the whole document or a specific part of the document is performed to identify text which corresponds to certain keywords fed as input to the system 100. Depending on requirements, separate keywords can be configured:
o Search for exact keywords
o Search for keywords with spaces
o Case insensitive search
o Search for keywords with special characters
• Fuzzy Search:
Consider that the release note contains the following contents, written by three different users (User 1, User 2, and User 3):
User 1: "The defect-Id1224 has been fixed which is reported for the crashing of user Login Page."
User 2: "The Bug-ID1224 found in User Login Page has been fixed successfully"
User 3: "The Bugs Ids reported are mentioned in ascending order in the release notes"
After performing lowercase transformation, stop words removal, tokenization and stemming, the sentences are as follows:
User 1: "defect-id1224 fix report crash user login page"
User 2: "bug-id1224 user login page fix"
User 3: "bug ids report mention ascend order release notes"
To perform the search for defects in the release notes, consider that the keywords being searched by the system 100 are:
Defect: defectid*/bugid*/ticketid*
[036] The system 100 identifies the defect IDs in the sentences by both User 1 and User 2 and reports them for impact analysis. However, the sentence by User 3 is not identified as a defect, as the system 100 treats Bug-ID and bug ids differently. The presence of wild cards boosts the scope of the search in a document. In a similar way, requirements can also be identified.
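The pre-processing and keyword search described above can be sketched as follows. This is an illustration only, assuming a simple regular-expression tokenizer, NLTK's Porter stemmer, a hand-picked stop word list, and wildcard matching after stripping special characters (so that "defect-id1224" matches "defectid*"); the exact normalization used by the system 100 is not limited to this sketch.

```python
import re
from fnmatch import fnmatch

from nltk.stem import PorterStemmer  # the Porter stemmer works offline, no corpus download needed

stemmer = PorterStemmer()
STOP_WORDS = {"the", "has", "been", "which", "is", "for", "of", "in", "are"}  # assumed list
KEYWORD_PATTERNS = {"defect": ["defectid*", "bugid*", "ticketid*"]}


def preprocess(sentence: str) -> list:
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z0-9\-]+", sentence.lower())
    return [stemmer.stem(token) for token in tokens if token not in STOP_WORDS]


def matches(token: str, patterns: list) -> bool:
    """Wildcard match after stripping special characters, so 'defect-id1224' hits 'defectid*'."""
    normalised = re.sub(r"[^a-z0-9*]", "", token)
    return any(fnmatch(normalised, pattern) for pattern in patterns)


sentences = {
    "User 1": "The defect-Id1224 has been fixed which is reported for the crashing of user Login Page.",
    "User 2": "The Bug-ID1224 found in User Login Page has been fixed successfully",
    "User 3": "The Bugs Ids reported are mentioned in ascending order in the release notes",
}
for user, text in sentences.items():
    hits = [t for t in preprocess(text) if matches(t, KEYWORD_PATTERNS["defect"])]
    print(user, "->", hits)  # User 1 and User 2 yield defect ids; User 3 yields none
```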
[037] Due to the dependency of the features, the changes to one or more of the features during the release may have impacted one or more other features in the application. The system 100 processes information on the changes identified by processing the release note(s) along with the feature dependency map, to identify
impact, wherein ‘impact’ in this context is defined in terms of all features that have been impacted/modified in latest release of the application.
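A minimal sketch of this impact analysis is shown below, assuming the feature dependency map is represented as an adjacency mapping from each feature to the features that depend on it; the feature names are illustrative only.

```python
from collections import deque


def impacted_features(changed_features: set, dependency_map: dict) -> set:
    """Return the changed features together with every feature reachable through dependencies."""
    impacted = set(changed_features)
    queue = deque(changed_features)
    while queue:
        feature = queue.popleft()
        for dependant in dependency_map.get(feature, ()):
            if dependant not in impacted:  # avoid revisiting shared dependants
                impacted.add(dependant)
                queue.append(dependant)
    return impacted


# e.g. a banking application where transaction features depend on authentication
dependency_map = {
    "authentication": {"cash transaction", "account summary"},
    "cash transaction": {"transaction history"},
}
print(impacted_features({"authentication"}, dependency_map))
# -> authentication, cash transaction, account summary, transaction history
```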
[038] Further, at step 206 of the method 200, the system 100 identifies, via the one or more hardware processors, one or more test cases relevant to each of the features identified as modified in at least the latest release. This in turn helps the system 100 pick/select the right test cases for the test suite optimization.
[039] Sub-steps in the process of identifying the one or more test cases relevant to each of the features identified as modified in at least the latest release are depicted in steps 206a through 206c.
[040] Selection of test cases may be based on various factors such as, but not limited to:
1) the number of defects associated with a testcase (the higher, the more relevant),
2) the number of associated requirements,
3) the number of executions associated with the testcase, and
4) the age of the testcase.
[041] Applications are developed by following an application lifecycle including the phases of Plan, Develop, Test, Deploy, and Maintain. Each phase of this process leads to creation of artifacts to catalogue the steps and activities performed. At step 206a, the system 100 captures application data for defects, testcases, release, test set, requirement, and test execution, referenced respectively as follows in the solution.
• Release Datapoint:- having release details of application
• Requirement Datapoint:- having all the requirements available for the application
• Test Set Datapoint:- having the test sets created for the application
• Test case Datapoint:- having all the test cases written for the application
• Defects Datapoint:- having all the logged defects for the application
• Test Execution Datapoint:- having all the execution records of the test cases for the application.
[042] These listed datapoints (Release, Requirements, Test set, Testcase, Defect, Test Execution) are standard entities that any ALM tool (JIRA/ALM-Octane/ALM/Rally/Service-Now) captures, and the details of each of these datapoints refer to the fields associated with each of these entities.
[043] Further, at step 206b, the system 100 transforms each of the plurality of data points to a corresponding numerical form to generate a plurality of data point values. Transforming to the numerical format may be done by the system 100 with or without keeping certain parameters as a base value. For example, the system 100 may keep release age (alternately referred to as 'age of release') as the base value. The age of a release is computed based on the release date. The values range between 1 and n. The system 100 may be configured to consider the age of the latest release as 1 and that of the oldest release as n, i.e., in reverse order. Further, the other data points are transformed to the corresponding numerical value as:
1. Number of Requirements linked:- Defines the absolute count of the requirements that a test case is linked to, and the value can be between 0 and n. The higher the number of requirements associated with a testcase, the higher its importance; hence this is an important measure considered for selection of testcases.
2. Test case Priority: Represents priority of a testcase, and value is defined between 1 and 3, wherein 1 indicates High Priority, 2 indicates Medium Priority, and 3 indicates Low Priority.
3. Test Execution Age: Represents minimum execution age of a testcase i.e., the age of the latest release of a test execution corresponding to a test case, and value can be between 1 and n.
4. Defect Age: Represents minimum age of a defect i.e., the age of the latest release of a defect, that corresponds to a failed test execution of a test case. The value can be between 1 and n.
5. Defect Validity: It is calculated whether the defects associated with test case are valid or not. If valid defects are associated, then value is 1 else 0.
6. Requirement Age: Represents minimum age of a requirement i.e., the age of the latest release of a requirement to which a test case is linked to. The value can be between 1 and n.
7. Requirement Coverage: Represents the extent of coverage of a test case. The value is calculated depending on whether or not a testcase covers more than one requirement. If the testcase covers more than one requirement, then the value is assigned as 1, else it is 0.
For each parameter, the maximum value of n is the Release Age for the application.
[044] Here, the system 100 may calculate the parameters “Test Execution Age, Defect Age and Requirement Age” by keeping the release age as the base value i.e., their age is same as that of the release to which these parameters are linked with. Based on the values of the data points, the system 100, at step 206c, identifies one or more test cases relevant to each of the features of the application being considered. At this stage, after calculating values of the different data points for each of the test cases, the system 100 computes a weighted average score for each test case. The system 100 computes the weighted average score based on values of the different data points and a weightage value assigned to each of the data points. For example, the weighted average score is computed as:
weighted average score $= (a_i x) + (b_i y) + (c_i z)$ --- (1)
where $a_i$, $b_i$, and $c_i$ are different data points, and $x$, $y$, and $z$ represent the corresponding weightage values.
[045] It is to be noted that equation (1) shows only 3 data points as an example; however, this is not intended to restrict the scope of embodiments disclosed herein. Depending on the number of data points being considered, equation (1) can be expanded in a similar way. The calculated weighted average is then compared with a threshold by the system 100. All the test cases for which the computed weighted average exceeds the threshold are identified as relevant and are selected by the system 100 at step 206c.
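Steps 206b and 206c can be sketched as below, assuming the data point values listed above have already been derived for each test case; the weightage values and the threshold are illustrative assumptions, not values fixed by the disclosure.

```python
def weighted_average_score(data_points: dict, weights: dict) -> float:
    """Equation (1) generalised to any number of data points."""
    return sum(data_points[name] * weights[name] for name in weights)


def select_relevant(test_cases: dict, weights: dict, threshold: float) -> list:
    """Keep only the test cases whose weighted average score exceeds the threshold."""
    return [
        case_id
        for case_id, points in test_cases.items()
        if weighted_average_score(points, weights) > threshold
    ]


test_cases = {
    "TC_31": {"requirements_linked": 1, "priority": 2, "defect_validity": 1},
    "TC_04": {"requirements_linked": 1, "priority": 1, "defect_validity": 0},
}
weights = {"requirements_linked": 0.5, "priority": 0.2, "defect_validity": 0.3}  # assumed weightages
print(select_relevant(test_cases, weights, threshold=1.0))  # ['TC_31'] for these illustrative values
```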
[046] After identifying the one or more test cases relevant to each of the features identified as modified across the different versions at step 206, the system 100, at step 208 of the method 200, generates, via the one or more hardware processors, a data model using information on the features identified by performing the impact analysis, and the identified one or more test cases relevant to each of the features, as training data. The data model may then be used by the system 100 to process real-time data and perform test suite optimization. In an embodiment, data learned by the system 100 during the test suite optimization of real-time data may be used by the system 100 for retraining and updating the data model.
[047] In an embodiment, the system generates the data model by training a neural network using the training data at step 208. Training of the neural network by the system 100 is further explained below.
[048] The system 100 may use a multi-layer artificial neural network for generating the data model. An example neural network architecture considered can have seven inputs with an input layer comprising seven neurons, a hidden layer having eight neurons, and an output layer having a single neuron for classification. The number of neurons in the hidden layer is a hyperparameter which is set to eight by analyzing various values. A sigmoid function is applied on the output layer and a threshold of 0.8 is considered. The neural network works as a binary classifier and outputs 1 if the testcase needs to be selected in the regression test suite and 0 if not. This example neural network architecture is depicted in FIG. 4.
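A sketch of the 7-8-1 architecture described above is given below using NumPy; the weight shapes and the 0.8 decision threshold follow the description, while the random seed and the example input are assumptions made only to keep the sketch reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((7, 8))  # input layer (seven parameters) -> hidden layer (eight neurons)
W2 = rng.standard_normal((8, 1))  # hidden layer -> single output neuron


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def select_testcase(x, threshold=0.8):
    """Binary decision: 1 -> include the testcase in the regression suite, 0 -> exclude it."""
    hidden = sigmoid(x @ W1)  # shape (8,), hidden layer activations
    z = sigmoid(hidden @ W2)  # shape (1,), predicted value
    return int(z.item() > threshold)


x = np.array([1, 2, 2, 2, 2, 1, 0], dtype=float)  # the seven data point values of one testcase
print(select_testcase(x))
```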
[049] In an embodiment, a supervised learning approach is used for training the neural network. In this process, a subset of the training data with actual output is provided to the neural network as input. Each of a plurality of neurons of an input layer of the neural network corresponds to each of the input data, i.e., the training data. These inputs are fed forward through the neural network. The inputs and the sigmoid function are used to calculate different weights and biases, and optimization is achieved by utilizing a gradient method.
[050] While starting a training process, values for initial weights W1 and W2 are assigned randomly. An iterative approach is followed by the system 100 to constantly update the weights to minimize loss and achieve a global optimum point.
[051] For explanation purpose, it is considered that W1 and W2 are arrays of size [7,8] and [8,1] respectively.
[052] Firstly, the inputs are passed through a hidden layer of the neural network and each neuron processes the input using the sigmoid function. Consider that the value for one parameter is x1 and the randomly assigned weights are [w1 w2 w3 w4 w5 w6 w7 w8].
[053] Output of a first neuron in the hidden layer is calculated as:
$h_t = \sum_{i=1}^{8} x_1 w_i$ --- (2)
[054] Afterwards, the sigmoid function is applied to this computed output to obtain the final output as follows:
$\sigma(h_t) = \frac{1}{1 + e^{-h_t}}$ --- (3)
[055] Outputs for all neurons of the hidden layer are calculated in a similar manner and are passed on to a final output layer. The computation on the final output layer is represented as:
$Z = \sigma(\epsilon)$ --- (4)
where $\epsilon = \sum_{i=1}^{8} W2_i \, h_i$.
[056] Here, $w_{ij}$ represents the weight assigned to the $j$-th neuron of the $i$-th layer, and $h_i$ represents the $i$-th neuron of the hidden layer.
[057] The computed Z i.e., the predicted value by the network is then compared with the actual output Y for the testcase and the loss is computed with the following cross entropy function:
$H(y) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p(y_i)) + (1 - y_i) \log(1 - p(y_i)) \right]$ --- (5)
where $y_i$ is the actual value and $p(y_i)$ is the predicted value.
[058] After computing the loss, the system 100 uses a backpropagation algorithm for computing gradients and updating the weights. The process is done repetitively until a global minimum is achieved. The weights computed by the system 100 are saved in the system 100 to be leveraged for production data.
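The training loop described in paragraphs [049] to [058] can be sketched as follows, assuming batched inputs X of shape (N, 7) and labels Y of shape (N, 1); the toy data, learning rate, and epoch count are illustrative only and do not reflect the experimental settings.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(32, 7)).astype(float)    # toy data point values
Y = (X.sum(axis=1, keepdims=True) > 7).astype(float)  # toy labels standing in for analyst decisions
W1 = rng.standard_normal((7, 8))                      # randomly assigned initial weights
W2 = rng.standard_normal((8, 1))
learning_rate, n = 0.1, X.shape[0]

for epoch in range(500):
    H = sigmoid(X @ W1)                                # hidden activations, equations (2)-(3)
    Z = sigmoid(H @ W2)                                # predicted values, equation (4)
    Zc = np.clip(Z, 1e-7, 1 - 1e-7)                    # numerical safety for the logarithm
    loss = -np.mean(Y * np.log(Zc) + (1 - Y) * np.log(1 - Zc))  # cross entropy, equation (5)
    dZ = (Z - Y) / n                                   # gradient w.r.t. the output pre-activation
    dW2 = H.T @ dZ
    dH = (dZ @ W2.T) * H * (1 - H)                     # backpropagate through the hidden sigmoid
    dW1 = X.T @ dH
    W1, W2 = W1 - learning_rate * dW1, W2 - learning_rate * dW2  # gradient descent update

print(round(float(loss), 4))  # the loss decreases as the weights move towards a minimum
```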
[059] An example scenario of the system 100 using the data model for performing the test suite optimization is depicted in FIG. 3. At step 302 of the method 300 depicted in FIG.3, the system 100 fetches a plurality of release notes and a feature dependency map of an Application Under Test (AUT). At step 304, the system 100 identifies one or more changed features of the AUT and impacted features by processing the plurality of release notes and the feature dependency
map. At step 306, the system 100 identifies a plurality of test cases relevant to the identified one or more changed features and the corresponding impacted features, by executing steps similar to that in step 206 of the method 200. Further, at step 308, the system 100 processes the test cases identified as relevant to the one or more changed and impacted features, using the data model. The data model, at step 308a of the step 308, determines relative importance of each of the plurality of test cases in comparison with one another. Further, at step 308b, the system segregates the plurality of test cases as highly relevant test cases and less relevant test cases, based on the determined relative importance. The system 100 may compare the determined relative importance with a threshold of importance, and all test cases having the determined relative importance exceeding the threshold of importance are identified as highly relevant, and the test cases having the relative importance below the threshold of importance are identified as less relevant by the system 100. Further, at step 308c, the system 100 generates an optimized test suite, using the test cases segregated as highly relevant test cases. The system 100 may then recommend the optimized test suite to a user, via a suitable interface.
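An end-to-end sketch of the method 300 flow is given below, under the assumption that steps 302 to 306 (release notes processing, impact analysis, and data point extraction) already yield the candidate test cases with their data point values; the relevance threshold and the stand-in scoring model are illustrative only.

```python
import numpy as np


def optimize_test_suite(candidate_cases: dict, model, threshold: float = 0.8) -> list:
    """Score each candidate test case with the trained data model and keep only the
    highly relevant ones (score above the threshold) in the optimized test suite."""
    scores = {case_id: float(model(features)) for case_id, features in candidate_cases.items()}
    highly_relevant = [case_id for case_id, score in scores.items() if score > threshold]
    return sorted(highly_relevant, key=lambda case_id: -scores[case_id])  # most relevant first


# Usage with any callable returning a relevance score in [0, 1]; here a stand-in model.
def stand_in_model(features):
    return 1.0 / (1.0 + np.exp(-(features.sum() - 6.0)))


candidates = {
    "TD1": np.array([1, 1, 3, 2, 2, 1, 1], dtype=float),
    "TD2": np.array([1, 1, 3, 2, 0, 0, 1], dtype=float),
}
print(optimize_test_suite(candidates, stand_in_model))  # e.g. ['TD1', 'TD2']
```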
Experimental Results:
[060] The experimental data used has the following keywords.
Table 1: Keywords Considered for Release Notes Processing
| Reference | Keyword |
|---|---|
| Defects | VOWIFI, ANALOG, TEM |
| Requirements | COMMAND, LOGIN, CODE |
[061] Using the exhaustive search and matching for the document with the above keywords, output is obtained in the following format:
Table 2: Sample Output for the Impacted Test Cases from Processed Release Notes
| Requirement/Defect ID | Testcase Name | Module |
|---|---|---|
| TD1 | VoWI-FI check for access issue | Platform |
| TD2 | COMMANDS_Injection_test_vulnerable | Settings |
| TD4 | Input LOG-IN access check | Behaviour |
Table 3: Release Datapoint
| Release Id | Release Name | Release Date |
|---|---|---|
| R1 | Alpha | 20-12-2015 |
| R2 | Beta | 20-10-2015 |
| R3 | Gamma | 20-08-2015 |
| R4 | Delta | 20-06-2015 |
Table 4: Requirement Datapoint
| Requirement ID | Release ID |
|---|---|
| Req1 | R1 |
| Req2 | R1 |
| Req3 | R2 |
| Req4 | R2 |
| Req5 | R3 |
| Req6 | R3 |
| Req7 | R4 |
| Req8 | R4 |
| Req9 | R1 |
| Req10 | R2 |
Table 5: Testcase Datapoint
| Testcase Id | Requirement Id | Priority | Test set Id |
|---|---|---|---|
| TC 04 | R2 | Low | TS1 |
| TC 09 | R1 | Low | TS1 |
| TC 88 | R3 | Low | TS1 |
| TC 111 | R4 | Low | TS1 |
| TC 12 | R5 | Low | TS1 |
| TC 20 | R6 | Low | TS1 |
| TC 26 | R7 | Low | TS1 |
| TC 29 | R8 | Low | TS1 |
| TC 30 | R9 | Medium | TS1 |
| TC 31 | R10 | Medium | TS1 |
| TC 34 | R11 | Medium | TS1 |
| TC 39 | R12 | Low | TS1 |
| TC 44 | R13 | Low | TS1 |
| TC 19 | R14 | High | TS1 |
| TC 24 | R15 | High | TS1 |
| TC 25 | R16 | High | TS1 |
Table 6: Test Execution Datapoint
| Test Exe Id | Testcase Id | Defect Id | Status |
|---|---|---|---|
| TE1 | TC 04 | | Passed |
| TE2 | TC 09 | | Passed |
| TE3 | TC 88 | | Passed |
| TE4 | TC 111 | | Passed |
| TE5 | TC 12 | D1 | Failed |
| TE6 | TC 20 | D2 | Failed |
| TE7 | TC 26 | D3 | Failed |
| TE8 | TC 29 | D4 | Failed |
| TE9 | TC 30 | D5 | Failed |
| TE10 | TC 31 | D6 | Failed |
| TE11 | TC 34 | D7 | Failed |
Table 7: Defect Datapoint
| Defect ID | Status |
|---|---|
| D1 | Open |
| D2 | Fixed |
| D3 | Open |
| D4 | Fixed |
| D5 | Duplicate |
| D6 | Fixed |
| D7 | Fixed |
Table 8: Test Set Datapoint
| Test Set Id | Test set type |
|---|---|
| TS1 | Regression Testing |
[062] For each of the aforementioned data points, the system 100 computes parameter values, i.e., the data points are transformed to corresponding numerical values. As shown in the release datapoint, the release age was computed as [1,2,3,4]. The release age was utilized by the rest of the datapoints to compute the parameters in the following format:
• Number of Requirements linked: The given data contained one requirement linked to each testcase, so the value is 1
• Test case Priority: The values were defined between 1 and 3. 1 being High Priority, 2 for Medium Priority and 3 for Low Priority.
• Test Execution Age: The minimum execution age of a testcase i.e., the age of the latest release of a test execution corresponding to a test case was calculated. Here the value is between 1 and 4 depending on the testcase execution. For example, TC-04 is associated with R2, whose release age is 2. Hence, the Test Execution Age for it is 2.
• Defect Age: The minimum age of a defect i.e., the age of the latest release of a defect, that corresponds to a failed test execution of a test case was calculated. The value was between 1 and 4 in this example. Hence for TC-31 it is 2 as it is associated with R2.
• Defect Validity: It was calculated whether the defects associated with test case are valid or not. If valid defects are associated, then value is 1 else 0. Hence, for TC-31 the value is 1 as valid defect is associated with it.
• Requirement Age: The minimum age of a requirement i.e., the age of the latest release of a requirement to which a test case is linked to, was calculated. The value was between 1 and 4 in this example. The requirement age associated with TC-31 is 2 due to R2.
• Requirement Coverage: It was calculated depending on whether a testcase covers more than one requirement. If it does, its value is 1, else it is 0. Here there is a one-to-one mapping between requirement and testcase, so the value is 0. If a many-to-one mapping exists, then it will be 1.
Table 9: Computed Parameter Values for Sample Testcases
| Test_case_id | Requirements linked | Testcase priority | Requirement_age | Execution details | Defect found | Defect validity | Requirements coverage |
|---|---|---|---|---|---|---|---|
| TC_04 | 1 | 1 | 2 | 2 | 0 | 0 | 0 |
| TC_09 | 1 | 1 | 2 | 2 | 0 | 0 | 0 |
| TC_88 | 1 | 1 | 2 | 2 | 0 | 0 | 0 |
| TC_111 | 1 | 1 | 2 | 2 | 0 | 0 | 0 |
| TC_12 | 1 | 1 | 2 | 2 | 0 | 0 | 0 |
| TC_20 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| TC_26 | 1 | 1 | 2 | 2 | 0 | 0 | 0 |
| TC_29 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| TC_30 | 1 | 2 | 2 | 2 | 0 | 0 | 0 |
| TC_31 | 1 | 2 | 2 | 2 | 2 | 1 | 0 |
| TC_34 | 1 | 2 | 2 | 2 | 0 | 0 | 0 |
[063] These values were used as a training data for the neural network.
[064] The following table shows the evaluation metrics obtained after training the neural network.
Table 10: Results on Evaluation Metrics for the Trained Neural Network
| Metrics | Value |
|---|---|
| Accuracy | 82.75 |
| Precision | 0.750000 |
| Recall | 0.923077 |
| F1 score | 0.827586 |
| Cohen's kappa | 0.658824 |
Creating the Test Suite from the Trained Model and Release Notes Processing:
[065] The created neural network was used as a baseline and was utilized for generating the optimized final suite. The Release Notes processing can output a set of testcases or a set of impacted features. It was considered at this stage that a set of impacted testcases was given as output by the NLP engine. The testcases which were impacted were extracted from the database and were passed to the baselined neural network for the automated decision of whether or not each testcase is to be selected.
| Testcase ID | Requirements linked | Test_case_priority | Requirement age | Execution details | Defect_found | Defect validity | Requirements coverage | Requirements linked | Modules | Output |
|---|---|---|---|---|---|---|---|---|---|---|
| TD-1 | 1 | 1 | 3 | 2 | 2 | 1 | 1 | 0 | UI Behaviour | Yes |
| TD2 | 1 | 1 | 3 | 2 | 0 | 0 | 1 | 0 | Platform | Yes |
[066] To calculate the impact of the recommended test suite suggested by the neural network, the following metrics were computed (an illustrative computation follows the list):
• Optimization Percentage: Computed as the percentage of recommended testcases with respect to the total number of testcases. The higher the percentage, the better the suggested deck.
• Requirements Coverage: Total number of requirements covered by the recommended testcases. The higher the coverage, the better the recommended deck.
• Defect Coverage: Total number of defects covered by the recommended testcases. The higher the coverage, the better the recommended deck.
• Module Coverage: Total number of impacted modules covered by the recommended testcases. The higher the coverage, the better the recommended deck.
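A minimal computation of these metrics is sketched below; the linkage fields attached to each recommended test case (requirements, defects, modules) are assumed inputs used only for illustration.

```python
def coverage_metrics(recommended: dict, total_testcases: int,
                     all_requirements: set, all_defects: set, impacted_modules: set) -> dict:
    """Compute the optimization percentage and the requirement/defect/module coverage."""
    covered_reqs = set().union(*(tc["requirements"] for tc in recommended.values()))
    covered_defects = set().union(*(tc["defects"] for tc in recommended.values()))
    covered_modules = set().union(*(tc["modules"] for tc in recommended.values()))
    return {
        "optimization_pct": 100.0 * len(recommended) / total_testcases,
        "requirements_coverage": len(covered_reqs & all_requirements),
        "defect_coverage": len(covered_defects & all_defects),
        "module_coverage": len(covered_modules & impacted_modules),
    }


recommended = {
    "TD1": {"requirements": {"Req1"}, "defects": {"D1"}, "modules": {"UI Behaviour"}},
    "TD2": {"requirements": {"Req2"}, "defects": set(), "modules": {"Platform"}},
}
print(coverage_metrics(recommended, total_testcases=16,
                       all_requirements={"Req1", "Req2"}, all_defects={"D1"},
                       impacted_modules={"UI Behaviour", "Platform"}))
```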
[067] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[068] The embodiments of the present disclosure herein address the unresolved problem of test suite optimization. The embodiments thus provide a mechanism for determining changed and impacted features of an application in the latest release, by processing release notes and a feature dependency map. Moreover, the embodiments herein further provide a mechanism to identify test cases relevant to the determined changed and impacted features, and to further determine highly relevant test cases which are used to form an optimized test suite.
[069] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
[070] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[071] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[072] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or
stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[073] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
We Claim:
1. A processor implemented method (200) of generating a data model for test suite optimization, comprising:
collecting, via one or more hardware processors, a training data comprising plurality of release notes and a feature dependency map of a plurality of reference applications, wherein the plurality of release notes comprises change information across different versions of each of the plurality of reference applications (202); performing an impact analysis to identify all features of each of the plurality of reference applications that have been modified across the different versions, via the one or more hardware processors, wherein the impact analysis comprises processing the plurality of release notes and the feature dependency map of each of the plurality of reference applications (204);
identifying, via the one or more hardware processors, one or more test cases relevant to each of the features identified as modified across the different versions (206), comprising:
capturing a plurality of data points from a historical data of
a plurality of test cases (206a);
transforming each of the plurality of data points to a
corresponding numerical form to generate a plurality of data
point values (206b); and
identifying the one or more test cases relevant to each of the
features, based on the generated data point values (206c);
and generating, via the one or more hardware processors, the data model based on the features identified by performing the impact analysis, and the identified one or more test cases relevant to each of the features (208).
2. The method as claimed in claim 1, wherein the plurality of release notes and the feature dependency map are processed using at least one Natural Language Processing (NLP) technique during the impact analysis.
3. The method as claimed in claim 1, wherein the plurality of data points comprise one or more of a release datapoint, a requirement datapoint, a test set datapoint, a test case datapoint, a defects datapoint, and a test execution datapoint.
4. The method as claimed in claim 1, wherein the generated data model is used to perform the test suite optimization for an Application Under Test (AUT), comprising:
fetching a plurality of release notes of the AUT (302);
identifying one or more changed features of the AUT by processing the
plurality of release notes (304);
identifying the plurality of test cases relevant to the identified one or more
changed features (306); and
processing the identified plurality of test cases using the data model (308),
comprising:
determining a relative importance of each of the plurality of test
cases in comparison with one another (308a);
segregating the plurality of test cases as highly relevant test cases
and less relevant test cases (308b); and
generating an optimized test suite using the plurality of test cases
segregated as the highly relevant test cases (308c).
5. A system (100) for generating a data model for test suite optimization,
comprising:
one or more hardware processors (102); a communication interface (112); and
a memory (104) storing a plurality of instructions, wherein the plurality of instructions when executed, configure the one or more hardware processors to:
collect a training data comprising plurality of release notes and a feature dependency map of a plurality of reference applications, wherein the plurality of release notes comprises change information across different versions of each of the plurality of reference applications;
perform an impact analysis to identify all features of each of the plurality of reference applications that have been modified across the different versions, wherein the impact analysis comprises processing the plurality of release notes and the feature dependency map of each of the plurality of reference applications;
identify one or more test cases relevant to each of the features identified as modified across the different versions, by:
capturing a plurality of data points from a historical
data of a plurality of test cases;
transforming each of the plurality of data points to a
corresponding numerical form to generate a plurality
of data point values; and
identifying the one or more test cases relevant to each
of the features, based on the generated data point
values; and generate the data model based on the features identified by performing the impact analysis, and the identified one or more test cases relevant to each of the features.
6. The system as claimed in claim 5, wherein the one or more hardware processors are configured to process the plurality of release notes and the
feature dependency map using at least one Natural Language Processing (NLP) technique during the impact analysis.
7. The system as claimed in claim 5, wherein the plurality of data points comprise one or more of a release datapoint, a requirement datapoint, a test set datapoint, a test case datapoint, a defects datapoint, and a test execution datapoint.
8. The system as claimed in claim 5, wherein the one or more hardware processors are configured to perform the test suite optimization for an Application Under Test (AUT), using the generated data model, by:
fetching a plurality of release notes of the AUT;
identifying one or more changed features of the AUT by processing the
plurality of release notes;
identifying the plurality of test cases relevant to the identified one or
more changed features; and
processing the identified plurality of test cases using the data model,
comprising:
determining a relative importance of each of the plurality of test
cases in comparison with one another;
segregating the plurality of test cases as highly relevant test cases
and less relevant test cases; and
generating an optimized test suite using the plurality of test cases
segregated as the highly relevant test cases.