Usage Of Program Traces For Model Checking Of Code

Abstract: USAGE OF PROGRAM TRACES FOR MODEL CHECKING OF C CODE The present invention illustrates an optimized approach for model checking of computer executable code. The system generates summaries by using trace inferences which are further used during model checking. Spurious counter-examples lead to newer traces and are used to improve summaries. The system considers summaries as a property of the computer executable code and uses it for abstraction during model checking process. The system provides an output confirming proper functionality of the computer executable code post the model checking process.

Patent Information

Application #

Filing Date

22 February 2012

Publication Number

12/2014

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Email

Parent Application

Applicants

TATA CONSULTANCY SERVICES LIMITED

NIRMAL BUILDING, 9TH FLOOR, NARIMAN POINT, MUMBAI 400021, MAHARASHTRA, INDIA.

Inventors

1. YEOLEKAR, ANAND V.

TATA CONSULTANCY SERVICES LIMITED, PROCESS ENG. INNOVATION LAB, TRDDC, 54B, HADAPSAR INDUSTRIAL ESTATE, PUNE 411103, MAHARASHTRA, INDIA

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
TITLE OF INVENTION: "USAGE OF PROGRAM TRACES FOR MODEL CHECKING OF CODE"
Applicant TATA Consultancy Services Limited
A company Incorporated in India under The Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is performed.

FIELD OF THE INVENTION
The present invention in general relates to the field of software verification and specifically relates to model checking of software code.
BACKGROUND OF THE INVENTION
Software model checking focuses on providing guarantee that properties (or assertions) remain consistent over all runs of the software. Such guarantee is particularly required in certain safety-critical embedded software that essentially involves human interaction e.g. automotive control software. A typical model checking algorithms is disclosed in US7146605 issued to International Business Machines Corporation, wherein the document teaches a method to symbolically analyze code to detect control-flow based property violations, while employing symbolical analysis technique, though facing the code summarization scaling challenge A lack of precise scaling, upon summarization, to the size of industry code further leads to an exponential increase into the computational complexity of the algorithm as a number of lines of code increases. Model checking large size code faces the well-known state explosion problem i.e. potentially exponential computational complexity arising out of large no. of program variables, statements, branching and loops. State-of-the-art softwares are known not to scale beyond 500KLoC.Model checking tools (like CBMC) find it difficult to analyze code with size greater than 500 KLoC of code.
A variety of abstractions have been proposed to scale model checking. One solution is to use compact summaries of code fragments that approximate program state computed by the fragment. Summaries can be computed both statically (by symbolic analysis and dynamically as illustrated in US7962901. The patent contributes to improving model checking in the presence of pointers. However, the challenge of generation of summaries by observing program execution using program traces still has not been addressed.
The overall model checking effort can be reduced by providing compact summaries thus allowing us to analyze larger programs within the given resources of memory, time and computing power. Summaries are generally computed statically i.e. through symbolic analysis of the code without execution.

In the light of foregoing problem, there is a need to scale up the model checking process for analyzing huge lines of code and at the same time maintain the integrity of the code. The following invention proposes to utilize summaries constructed dynamically i.e. by observing program traces to control the state space explosion.
OBJECTIVES OF THE INVENTION
The principle object of the present invention is to obtain summaries using a plurality of trace of variables for each code segments.
Another significant object of the invention is to the application code (containing summaries for certain code fragments) for property or assertion violations.
It is another object of the present invention to confirm the assertion violations by validating the counter example.
Yet another object of the invention is to eliminate spurious counter examples to generate a precise summary.
Another object of the invention is to report a bug for property violations occurred in the model checking process.
SUMMARY
In one of the preferred embodiments of the present invention, a method and system for dynamically obtaining summaries from the information of program state reached during execution of some program fragments is utilized to simplify software for subsequent analysis by using state of the art model checking tools. The system is configured to check the integrity of the dynamically generated summaries thereby ensuring that property is either safe or violated. The dynamically generated summaries are characterized as a property of the program.

The system comprises of a trace generation module, an inference engine, a model checking engine, and a validation module The system is adapted to compare predetermined potential invariants with the values obtained from the trace generation module for receiving successful invariants. In one of the embodiment of the invention, the model checking engine optimizes the checking process by envisaging the summaries generated from the inference engine. The model checking engine may report a counter example that is validated by replaying on the original code to see if it indeed leads to assertion violation.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. These and other features of the present invention will become more fully apparent from the following description, or may be learned by the practice of the invention as set forth hereinafter.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing summary, as well as the following detailed description of preferred embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there is shown in the drawings example constructions of the invention; however, the invention is not limited to the specific system and method disclosed in the drawings:
Figure 1 depicts the logical flow of the process and the various stages involved in the same. Figure 2 depicts the system diagram of model checking of computer executable code.

DETAILED DESCRIPTION OF THE INVENTION
Some embodiments of this invention, illustrating all its features, will now be discussed in detail. The words "comprising," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items.
It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present invention, the preferred, systems and methods are now described.
The present invention describes a system for abstraction to scale up model checking process. The invention provides a unique solution for abstraction by generating summaries from a trace generation module 104 in conjugation with an inference engine 106.
Referring to figure 1, the diagram illustrates the various stages involved in the method for abstraction. An input module 102 provides input in the form of a computer executable code with specified properties to the trace generation module 104. Properties can be expressed in the form of assert statements in the code.
The trace generation module 104 generates traces to be used by an inference engine 106. Trace generation steps include slicing the code with respect to the given property, identifying code inputs via a preliminary static analysis, code instrumentation, compilation and execution with inputs picked from a database. The code inputs database may be initialized with random values. During execution, code instrumentation produces a trace i.e. values of variables of interest at certain program points are recorded for analysis.

Relevant program points and variables are identified in the following way: suppose we choose functions in the code as potential abstraction units. Let G be the call graph of the code, with root r indicating the code entry point and L be the list of functions we wish to abstract away. Initially, L = empty. Let t be the function containing the assertion to be checked. G, r, t are easily identified by a preliminary static analysis of the code. Start a breadth-first-search on G. A node n encountered during the BFS is added to L if there is no path from n to t in G, in which case the BFS proceeds without processing the child nodes of n. This algorithm produces a list of functions who do not invoke t either directly or indirectly i.e. through their child functions.
Upon identification of the candidate functions to be abstracted, the corresponding program points are (i) function entry and (ii) function exit(s). The variables of interest are (i) arguments to the function, (ii) return variables of the function, (iii) global variables of the program, etc. In an exemplary embodiment of the invention, the relevant program points can also be identified as the start and end of a code fragment other than a function body e.g. a loop body, a block of statements doing some complex computation, etc. For such code fragments, variables of interest are subsequently identified.
The inference engine 106 is used to generate summaries based on the traces generated by the trace generation module 104. The inference engine 106 analyzes trace data to discover relations between variables. In accordance with an exemplary embodiment the inference engine checks if a pair of variables can be related by the inequality "x > y", or a variable is always greater than zero "x > 0". The inference engine 106 can be equipped with template of patterns that is instantiated for the given set of variables in the trace. The template may contain but is not limited to Boolean, linear, non-linear, domain-specific, or even user-provided relation patterns. These patterns may also relate arrays, lists, trees, and the like data structures in the program.
Upon obtaining the valid relations on variables, the function body is replaced with the corresponding summary. Every updated global and parametric variable is replaced with a non-deterministic assignment to indicate that it could take any possible value in its type range. Return variable is similarly replaced. Variable definitions are adjusted to correctly define the variables used in the new function body. This is followed by relations on variables placed as special

instructions recognizable by model-checkers (the model-checker honors these relations during symbolic analysis of the code.) A simplified source-level transformation of the function body that consists of a series of non-deterministic assignments is obtained following the variables relations extraction.
The model checking engine 110 generates counter examples 111 to be checked by the validation engine 112. A counter-example 111 is a program trace produced by the model-checker claiming an assertion violation. The counter example 111 reported by the model checker needs to be validated. Summaries can give incorrect assignments to program variables due to the non-deterministic assignments and relations formed out of observing limited runs of the program thereby violating the desired program behavior
The validation engine 112 checks whether the counter example 111 is spurious or correct by executing on the original computer implemented code. The validation engine in an exemplary embodiment reports additional information relating to violations of summaries. Depending upon the result of validation the system is characterized to generate a bug report 120, or the counterexample is added to the database of code inputs and summaries are regenerated. The refined summaries are expected to improve over the old ones.
In an exemplary embodiment when the model-checking engine reports null counter-examples, it then becomes necessary to check for the integrity of the summaries. This can be achieved by casting the summaries as assertions over the code fragment replaced and verifying the correctness by the model-checking engine. The entire process shown in figure 1 can be repeated for these summary-assertions. Non violation of the summary assertions leads to the conclusion of safety of the property with respect to the original program. In case of summary-assertion violations, the system is configured to add the counterexamples to the database. The summary-checking and validation loops may thus execute in interleaving till some output is reached or iterative attempts have been made, in which case the effort can be said to be non-conclusive with respect to checking the original property. In a preferred embodiment, summaries represent program behavior at well-defined computer executable code points (traces) or behavior of computer executable code fragments e.g. functions. Dynamically generated summaries refer to

summaries constructed from computer executable code executions instead of static analysis of the computer executable code.
Preferably, the system uses state of art tool Daikon and the like to generate traces for abstraction. The computer executable code is executed multiple number of times with random inputs to generate traces. The inference engine 106 compares the predetermined potential invariants with the values obtained from the trace generation module 104.
Potential invariants are generally of the form of linear relations between program variables e.g. "x>0", "x>y", "elements of a[] > constant", etc. Daikon reports successful invariants which are suitably aggregated to represent summaries of functions. In accordance with an embodiment of the invention, the inference engine 106 generates traces to establish relations between potential invariants and successful invariants.
In a preferred embodiment the model checking module 110 reports a counter example i.e. property or assertion violation, or safety of property. The counterexample (CE) contains values to computer executable code inputs that may be used to execute the code.
In a preferred embodiment, the validation engine 112 performs validation of the computer executable code as the summaries generated are an approximation of the original computer executable code thereby possibly deviating from the original behavior of the computer executable code. The validation engine 112 verifies whether the counter example is spurious or correct.
The output generating module 900 preferably, report bugs containing property violation with values to computer executable code inputs. If no property violation is retrieved a message box portraying completion of model checking process and ensuring 'property is safe' is displayed.
Now referring to the system diagram as illustrated in figure 2, the system comprises of a cache 400, a processing module 500, a memory device 600 and a test database 700. The processing module further comprises of a trace generation module 104, an inference engine 106, a model

checking module and a validation engine 112. The input module 102 accepts computer executable code in the form of input and stores it in the cache for further processing. The trace generation module uses predetermined potential invariants to compare with the successful invariants. The model checking module 110, uses the summaries generated by the inference engine 106, to check for counter examples if any and accordingly perform validation in assistance with the validation engine 112.
The memory device 600, is configured to store the bug reports generated by the validation engine thereby creating the test database 700. Conditional checking is done by the validation engine. Spurious counter examples are recorded in the form of bug reports in the memory device and help in generation of test database. Such spurious counter examples are revised again for model checking based on the refined summaries. The cycle is recursive until the property of the computer executable code is ensured as safe. The output generating module 900 ensures consistency in the original working behavior of the computer executable code and generates output in the form of a message box "Property is safe" thus concluding the model checking process.
Although the present invention is not limited to any particular tracing tool or methodology, inference engine, model checking tool, or language of the code, we use Daikon to generate summaries and CBMC for model-checking C code.
The advantages associated with the mentioned process of model checking involve reducing overall model checking effort, thus allowing analyzing larger programs within the given resources of memory, time and computing power.

Working Example
Consider illustration 1(a), showing a computer executable code with function "sort" that is meant to sort an array of integers. Trace inference detects that at the end of the function, invariant "array a is sorted on <=" holds. Consequently, we construct a summary with non-deterministic assignments to the array members and add a loop indicating the sorted relation. Illustrationl(b) shows how the summary replaces the function body. Note that this summary is an over-approximation of the original code segment.
Illustration 1: function body summarized with non-deterministic assignments and "assume" statements

CLAIMS
1. A method for abstraction to scale up model checking process, the method comprising of
processor implemented steps of:
receiving a computer executable code into memory and fragmenting the code into a
plurality of identifiable code segments;
generating a plurality of trace of variables for the each code segment by recursive
random inputs;
obtaining, in the code segment, values of relevant variables associated with the each trace
at a program point corresponding to the code segment;
comparing predetermined potential invariants with the values of relevant variables to
retrieve a successful invariant for each code segment;
aggregating the each retrieved successful invariants to form a summary;
analyzing the each code segment with the corresponding summary to generate a counter
example;
checking violations in the property and/or summary by validating the generated counter
example;
generating a bug report for property violation and generating an output ensuring non
violation of summaries thereby concluding the model checking process.
2. The method of claim 1, wherein the trace generation further comprises of slicing the code according to the given property into a plurality of code segments, identifying code inputs via a preliminary static analysis, code instrumentation, compiling and executingthe each sliced code segment coupled with inputs picked from a database.
3. The method of claim 1, wherein the each identifiable code segment forms a subset of the said computer executable code and the code instrumentation comprises of trace production of values of variables of interest at certain program points.

4. The method of claim 1, wherein the trace of variable comprises a string argument containing a plurality of variable arguments to comply with predetermined format.
5. The method of claim 1, wherein the relevant variables comprise of parameters to the code segment, updated non-local variables within the code segment and variables returned by the code segment.
6. The method of claim 1, wherein the potential invariants are relations between but are not limited to relevant variables such as predicate, linear, containment, non-linear including polynomial and the like.
7. The method of claim 1, wherein the spurious counter example provides input leading to execution of trace that does not violate property of the potential invariants.
8. The method of claim 1, wherein the summary is concise form of the code segment.
9. The method of claim 1, wherein the counter example contains values of program inputs that are used to execute the code segment.

10. The method of claim 1, wherein the bug report contains property violations with values to program inputs.
11. The method of claim 1, wherein the output is generated as a message box portraying that property is not violated for any run of the computer executable code.
12. A system for abstraction to scale up a model checking process, the system comprising of processor implemented steps of:
an input module comprising of a computer executable code to fragment the code into a plurality of identifiable code segments;
a compiler to prepare the computer executable code to trace variables at suitable program points based on the input module thereby compiling the code;

a trace generation module to obtain values of relevant variables associated with each
trace at the program points by executing the computer executable code recursively with
random inputs fetched from a code input database;
an inference engine adapted to compare predetermined potential invariants with the
values obtained from the trace generation module to retrieve successful invariants;
a model checking engine adapted to optimize checking process by utilizing the generated
summaries via the inference engine;
a validation module configured to generate a counter example for verifying violations in
the summary;
an output generating module to report bugs in a test database and ensuring consistency in
the original working behavior.
13. The system of claim 12, wherein the identifiable code segments forms a subset of the C program, such as a function.
14. The system of claim 12, wherein the trace of variable contains a string argument that can contain a plurality of variable arguments to comply with predetermined format.
15. The system of claim 12, wherein the relevant variables comprise of parameters to the code segment, updated non-local variables within the code segment and variables returned by the code segment.
16. The system of claim 12, wherein the potential invariants are relations between but are not limited to relevant variables such as predicate, linear, containment, non-linear including polynomial and the like.
17. The system of claim 12, wherein the spurious counter example detects the execution trace that violates property of the variables.
18. The system of claim 12 wherein, the summary is concise form of the code segment to be replaced.

19. The system of claim 12 wherein, the counter example contains values of program inputs that may be used to execute the code segment.
20. The system of claim 12, wherein the precise summary refers to integrity of the computed summaries and furthermore each code segment is adapted to ensure consistency with the original working behavior.
21. The system of claim 12, wherein the bug report contains property violations with values to program inputs.
22. The system of claim 12, wherein the output is generated as a message box portraying that property is not violated for any run of the computer executable code.

Documents

Application Documents

#	Name	Date
1	ABSTRACT1.jpg	2018-08-11
2	480-MUM-2012-FORM 3.pdf	2018-08-11
3	480-MUM-2012-FORM 26(26-3-2012).pdf	2018-08-11
4	480-MUM-2012-FORM 2.pdf	2018-08-11
5	480-MUM-2012-FORM 2(TITLE PAGE).pdf	2018-08-11
6	480-MUM-2012-FORM 18.pdf	2018-08-11
7	480-MUM-2012-FORM 1.pdf	2018-08-11
8	480-MUM-2012-FORM 1(19-3-2012).pdf	2018-08-11
9	480-MUM-2012-FER.pdf	2018-08-11
10	480-MUM-2012-DRAWING.pdf	2018-08-11
11	480-MUM-2012-DESCRIPTION(COMPLETE).pdf	2018-08-11
12	480-MUM-2012-CORRESPONDENCE.pdf	2018-08-11
13	480-MUM-2012-CORRESPONDENCE(26-3-2012).pdf	2018-08-11
14	480-MUM-2012-CORRESPONDENCE(19-3-2012).pdf	2018-08-11
15	480-MUM-2012-CLAIMS.pdf	2018-08-11
16	480-MUM-2012-ABSTRACT.pdf	2018-08-11
17	480-MUM-2012-OTHERS [10-10-2018(online)].pdf	2018-10-10
18	480-MUM-2012-FER_SER_REPLY [10-10-2018(online)].pdf	2018-10-10
19	480-MUM-2012-COMPLETE SPECIFICATION [10-10-2018(online)].pdf	2018-10-10
20	480-MUM-2012-CLAIMS [10-10-2018(online)].pdf	2018-10-10
21	480-MUM-2012-Response to office action [28-11-2020(online)].pdf	2020-11-28
22	480-MUM-2012-FORM-26 [28-11-2020(online)].pdf	2020-11-28
23	480-MUM-2012-Correspondence to notify the Controller [28-11-2020(online)].pdf	2020-11-28
24	480-MUM-2012-Written submissions and relevant documents [15-12-2020(online)].pdf	2020-12-15
25	480-MUM-2012-US(14)-HearingNotice-(HearingDate-01-12-2020).pdf	2021-10-03

Search Strategy

1	SEARCH_STRATEGY_480mum2012_10-04-2018.pdf