Abstract: The invention relates to a method, one or more computer-readable media and a system for analyzing a computer program, wherein the method comprises defining a first condition, wherein said first condition specifies a cause of an error, defining a second condition, wherein said second condition specifies a defect for said computer program, wherein said defect is a result of said first condition, wherein said first condition specifies acquisition of a property by a variable that leads to said error responsive to said defect specified by said second condition and said defect specified by said second condition is an operation that is forbidden based on said property acquired by said variable, computing path information from a statement of said computer program satisfying said first condition to another statement of said computer program satisfying said second condition and providing a visual representation of the path.
Description
Method of analyzing a computer program
The present invention relates to a method, one or more
computer-readable media and a system for analyzing a
computer program.
A compiler produces an object program from the source code
of a computer program. To infer properties of the source
code, various dataflow analyses have been proposed.
Typically, the dataflow analyses are performed to compute
properties of variables. If the dataflow analysis is to be
applied to only one procedure, then, an intra-procedural
dataflow analyses is used. On the contrary, if the dataflow
analysis is to be applied inter-procedurally, then, an
inter-procedural dataflow analysis is used.
In case of a defect in the computer program the defect is
notified to the user.
However, to identify the cause for the defect, the user is
required to spend substantial amount of time. Complex
computer programs have sources code which may run into
thousands of statements. Identifying the causes for each of
the defect in the source program is a time consuming
process and requires good amount of man hours, thus,
escalating the cost of the product development.
It is an object of the invention to eliminate or at least
minimize the above mentioned problems.
The above object is achieved by a method and one or more
computer-readable media for analyzing a computer program,
wherein the method comprises, defining a first condition
with respect -to which an analysis of the computer program
is to be performed, wherein the first condition specifies a
cause of an error, defining a second condition, wherein the
second condition specifies a defect for the computer
program, wherein the defect is a result of the first
condition, wherein the first condition specifies
acquisition of a property by a variable that leads to the
error responsive to the defect specified by the second
condition and the defect specified by the second condition
is an operation that is forbidden based on the property
acquired by the variable, computing path information from a
statement of the computer program satisfying the first
condition to another statement of the computer program
satisfying the second condition and providing a visual
representation of a path from the cause of the error to the
defect using the path information.
Thereby, enabling an individual to easily correlate a
defect to the possible cause for the defect in a computer
program.
According to an embodiment, the computing of the path
information from the statement of the computer program
satisfying the first condition to another statement of the
computer program satisfying the second condition comprises
generating a control flow graph (CFG) from a source code of
the computer program, wherein the CFG comprises at least
one basic block comprising at least one statement of the
source code.
According to yet another embodiment, the method further
comprises performing a dataflow analysis upon each of the
basic blocks of the CFG.
According to yet another embodiment, the performing of the
dataflow analysis each of the basic blocks of the CFG
includes performing an intra-procedural dataflow analysis
upon the basic blocks of the CFG.
Thereby, enabling the dataflow analysis within a procedure.
According to yet another embodiment, the performing of the
dataflow analysis each of the basic blocks of the CFG
includes performing an inter-procedural dataflow analysis
upon the basic blocks of the CFG.
Thereby, enabling the dataflow analysis to be applied
inter-procedurally.
According to yet another embodiment, the performing of the
dataflow analysis upon each of the basic blocks of the CFG
comprises computing a property of a variable in each of the
basic blocks of the CFG, determining if the variable having
the property satisfies the first condition and computing a
list of paths from a respective the basic block comprising
the variable to another the basic block satisfying the
second condition if the variable having the property
satisfies the first condition.
Thereby, computing a property of a variable along with path
information.
According to yet another embodiment, the list of paths
comprises identification information of the corresponding
statements of the source code from the block to the another
block.
According to yet another embodiment, the computation of the
property of the variable and the list of paths are derived
using a transfer function, wherein the transfer function
comprises a function to compute the list of paths.
According to yet another embodiment, the computation of the
property of the variable derived using the transfer
function includes solving iteratively the transfer function
until successive iterations do not change the property of
the respective variable.
According to yet another embodiment, the method further
comprises ceasing to emit repetitive identification
information of the corresponding statements of the source
code to the list of paths.
According to yet another embodiment, the cessation of
emission of repetitive identification information of the
corresponding statements of the source code from the list
of paths is performed using a concatenation operation.
According to yet another embodiment, the method further
comprises maintaining the property and the list of paths
for the variable from the basic block to the another block
on a memory.
According to yet another embodiment, the providing of the
visual representation of path from the cause of the error
to the defect using the path information includes providing
a visual indication of the path from the statement to the
another statement responsive to an input received from a
user over a warning message displayed corresponding to the
defect in the another statement.
According to yet another embodiment, the providing of the
visual representation of path from the cause of the error
to the defect using the path information includes
indicating a severity level of the statement based on
number of the errors caused by the statement by
representing the statement distinguishably.
Another embodiment includes, a system for analyzing a
computer program, wherein the system comprises a processor
configured to process a first condition, wherein the first
condition specifies a cause of an error, process a second
condition, wherein the second condition specifies a defect
for the computer program, wherein the defect is a result of
the first condition, wherein the first condition specifies
acquisition of a property by a variable that leads to the
error responsive to the defect specified by the second
condition and the defect specified by the second condition
is an operation that is forbidden based on the property
acquired by the variable, compute path information from a
statement of the computer program satisfying the first
condition to another statement of the computer program
satisfying the second condition and a display to provide a
visual representation of path from the cause of the error
to the defect using the path information.
The present invention is further described hereinafter with
reference to illustrated embodiments shown in the
accompanying drawings, in which:
FIG 1 illustrates a representative hardware environment
for practicing the embodiments herein,
FIG 2 illustrates a schematic block diagram of a system
for computing path information from a statement
responsible for a cause of an error to another
statement having a defect for a computer program
according to an embodiment herein,
FIG 3 illustrates an example of a control flow graph
comprising a plurality of nodes,
FIG 4 illustrates an example of a lattice reflecting
partial order among elements in a base domain for
null pointer analysis,
FIG 5 illustrates an example of a computer program
comprising a loop,
FIG 6a illustrates a path obtained by applying Transpath
at statement 5 in the computer program of FIG 5
according to an embodiment herein,
FIG 6b illustrates a path obtained by applying Transpath
at statement 7 in the computer program of FIG 5
according to an embodiment herein,
FIG 7 illustrates highlighting of a cause and an effect
statement of a computer program according to an
embodiment herein,
FIG 8 illustrates highlighting of a cause statement, an
effect statement and intermediate statements
between the cause and the effect statement
according to an embodiment herein,
FIG 9 is a flow diagram illustrating a method of
computing properties of variables along with path
information according to an embodiment herein,
FIG 10 is a flow diagram illustrating a method of
analyzing a computer program according to an
embodiment herein,
FIG 11 illustrates an example of a computer program,
FIG 12 illustrates an example of a computer program,
FIG 13 illustrates an example of a computer program,
FIG 14 illustrates a CFG for the computer program of FIG
13, and
FIG 15 illustrates an example of a computer program.
Various embodiments are described with reference to the
drawings, wherein like reference numerals are used to refer
to like elements throughout. In the following description,
for purpose of explanation, numerous specific details are
set forth in order to provide a thorough understanding of
one or more embodiments. It may be evident that such
embodiments may be practiced without these specific
details.
Fig. 1 depicts a representative hardware environment for
practicing the embodiments herein. This schematic drawing
illustrates a hardware configuration of an information
handling/computer system 100 in accordance with the
embodiments herein. The system comprises at least one
processor or central processing unit (CPU) 101. The CPU 101
is interconnected via bus 102 to various devices such as a
memory 103, input/output (I/O) controller 104, and user
interface controller 105. Depending on the type and
configuration of the system 100, the memory 103 may be
volatile (such as random access memory (RAM) etc., non-
volatile (read only memory (ROM), flash memory devices
etc.,) or a combination of the two. The.memory 103 is used
to store instructions and data for use by the CPU 101. The
I/O controller 104 can connect to peripheral devices, such
as CD drives 107 and hard drives 108, or other program
storage devices that are readable by the system. Typically,
an operating system for the computer system 100 as well as
an application program is stored onto the hard drive 108.
The operating system runs on the CPU 101 and is used to
coordinate and provide control of various components within
system 100. The system 100 can read the inventive
instructions on the hard drive 108 and load them onto the
memory 103 for execution by the CPU 101. The user interface
controller 105 can connect to a keyboard 109, mouse 110,
speaker 111, microphone 112, display device 113 and/or
other user interface devices such as a touch screen device
(not shown) to the bus 102 to gather user input and also to
provide system output to the user.
FIG 2 illustrates a schematic block diagram of a system for
computing path information from a statement responsible for
a cause of an error to another statement having a defect
for a computer program. The system 200 comprises a static
analysis module 202 and a visualization module 208. The
static analysis module 202 comprises a control flow graph
(CFG) generator 204 and a dataflow analyzer 206. The CFG
generator generates a CFG of the computer program from a
source code of the computer program. Each node in a CFG is
a basic block comprising of at least one statement.
FIG 3 illustrates an example of a CFG comprising a
plurality of nodes 302. Each of the nodes 302 of the CFG is
a basic block 304a through 304g. Typically, a basic block
in a CFG represents a set of statements with no branching
statements. In the shown example of FIG 3, each statement
is treated as a basic block, and thus, each basic block
comprises only one statement.
Referring again to FIG 2, the CFG generated by the CFG
generator 204 is provided to a dataflow analyzer 206. The
dataflow analyzer 206 analyzes the CFG to compute
properties of variables along with path information in each
of the basic blocks 304a through 304g. Depending on the
requirement, either of the dataflow analyses such as intra-
procedural dataflow analysis or inter-procedural dataflow
analysis may be performed. The type of analysis performed
by the dataflow analyzer includes an analysis to detect
defects of a computer program statically. For example, the
analysis includes, but is not limited to, a null pointer
analysis and an array index out of bounds analysis.
To perform the static analysis, a base domain for the
analysis is selected depending on the type of analysis
being performed. For example, for null pointer analysis the
selected base domain is:
{null, nonnull, maybenull, undefined}
An example of a lattice reflecting partial order among
elements in the base domain for null pointer analysis is
illustrated in FIG 4. The lattice illustrates the elements
undefined, null, nonnull and maybenull of the base domain
for null pointer analysis.
In an embodiment, dataflow analysis to compute the
properties of the variables and the path information may be
performed using the following algorithm:
1. Initialization: For each block (statement)
a .Initialize an abstract property for each
variable with an element from base domain and
associate each variable with empty path
2. change = true
3. while change
Do
a. Select a node "n" in control flow graph (CFG)
b. Repeat
i. Compute IN (or OUT) set for n from the
immediate predecessor (or successors) of n
ii. Compute OUT (or IN) set of n using
Transfer function and update path
information
iv. n <- Select predecessor (or successor)
node of n
c. until (all the nodes in control flow graph are
visited)
Done
The analyzer analyses each of the basic blocks of the CFG
and computes IN, OUT, Gen, and Kill sets. In embodiments
where the basic blocks are within a loop, the analyzer
computes the IN, OUT, Gen, and Kill sets iteratively until
the abstract property of the variables do not change in
successive iterations.
Path information is to be computed from a statement in a
basic block responsible for causing an error to another
statement in a basic block having the defect. The analyzer
determines if a statement is either a cause statement or a
defect statement based on conditions defined by the user.
The type of analysis to be performed, for example, null
pointer analysis, array index out of bounds analysis,
typically, determines the conditions to be specified for a
cause. In an embodiment, the conditions specifying the
causes specify acquisition of a property by a variable that
leads to an error responsive to a corresponding defect
specified by the conditions specifying the defects. The
conditions specifying the defects specify an operation that
is forbidden based on the property acquired by a variable.
To compute the path information, the transfer function of
the analyzer is augmented with a function. Typically, the
path information comprises a list of paths from the
statements responsible for causing errors to the statements
having defects.
Accordingly, the dataflow analyzer while performing the
analysis computes the abstract values of properties of
variables along with the path information.
Referring again to Fig 4, the illustrated lattice also
represents a Join operation. The Join operation applied
depends on the direction of flow of the analyzer, i.e.,
forward or backward and the type of Join function. In an
embodiment, the Join operation is:
Where
INP is an input comprising of variable
property and set of paths at a basic block p
abstrp(v) is abstract value of v in base
domain at block p.
pathp(v) is path information for variable v
at block p.
Join (abstrp(v)) is computed from the base
domain.
In the aforementioned embodiment, the JOIN operation
comprises an operation for collecting path information in
addition to abstract values for variables.
In an embodiment the properties of variables along with
path information are computed using the transfer function:
Where, OUT is the output of a basic block, Transpath is a
function to compute the path information and properties of
variables, IN is the input to the basic block.
Gen is a function that returns a variable property pair and
a set for gathering path information from a statement
responsible for a cause to other statements visited by the
analyzer. For example, if a statement is responsible for
causing an error, then Gen set for the statement is [v,
abstr (v) , {""}], where v is a variable involved in a
statement, abstr(v), represents abstract value of v in base
domain and "" is an empty path string.
Kill set comprises of variables identified based on base
domain. Additionally, if a statement is a cause statement
with a variable v, then previous abstract properties and
paths corresponding to v are deleted for this statement.
Hence v is added to the Kill set. For example, if a
variable is assigned a property which is nonnull, then the
null path for that variable is assigned as empty set for
the statement.
In an embodiment, the function Transpath is performed using
the following algorithm:
Transpath (S)
// S represents state
for each variable v appearing in S do
if paths (v) ! =empty
for each p є paths (v) do
In the aforementioned embodiment, paths(v) returns path
information of the variable v and statement_id is a
statement identifier in the source code. The statement
identifier, in an embodiment, is the statement number of a
statement of the source code. The path information is
computed for the variable having the path set p as
nonempty. The path set p being nonempty implies that a
cause statement, i.e., a statement in which the variable v
is assigned a property null has been encountered by the
analyzer as the analyzer traverses the basic blocks of the
CFG.
Initially, for each variable v є V, where V is a set of all
variables, the abstract property and path are assigned as
follows:
(1) d: Depending on strategy, initial value of d
(where d є D) can be assigned top or bottom element of
the lattice corresponding to abstract domain, where d
is an abstract property of a variable and D is the
base domain.
(2) p: Initial value of path set p (where p є P) is
assigned as an empty set, where p is the path set and
P is the path information.
In an embodiment, repetitive identification information of
the corresponding statements of the source code is removed
from the list of paths in case a loop is encountered by the
analyzer. This enables in saving computation time and
memory space. As described in embodiments herein, the
function Transpath computes the list of paths whenever it
visits a statement. Such an operation will result in a path
with multiple cycles in presence of loops. The multiple
cycles of a path are not essential for visualization of the
cause of the defects in the computer program as
visualization may be provided using a single cycle of the
path. Accordingly, multiple cycles having repetitive
identification information of the corresponding statements
of the source code are may not emitted into the list of
paths, instead only one cycle of statements may be emitted
into the list of paths.
In an embodiment, to remove the list of paths of multiple
cycles and retain only the path information of a single
cycle in case of a loop, a concatenation operation used in
the transfer function. On detection of a cycle, the
concatenation operation emits a subpath which constitutes
only one cycle.
Referring now to FIG 5, an example of a computer program
comprising a loop is illustrated. In the shown example of
FIG 5, statement 4 and statement 8 are cause statements as
in statement 4 a variable p is assigned a property null and
in statement 8 a variable q is assigned a property null.
Applying Transpath at statement 5 for paths coming from 11,
the path 4 5 6 7 8 9 10 11, as illustrated in FIG 6a, is
obtained. At the end of the path 4 5 6 7 8 9 10 11, the
flow of the analyzer may either move to the statement 5 or
12. For example, the flow shall move to the statement 5 in
case the condition of the loop prevails. If the condition
of the loop does not prevail, the flow moves to the
statement 12. Concatenation of the path 4 5 6 7 8 9 10 11
with the path 5 shall emit 4 5 instead of 4 5 6 7 8 9 10 11
5 as the path 5 is already in the list.
Statement 7 is an effect statement as the statement has a
defect due to the null property of variable q. The path
from the cause statement, i.e., from statement 8, to the
effect statement at statement 7 is 8 9 10 11 5 6 7, as
illustrated in FIG 6b. The path 12 is encountered only if
the condition of the loop does not prevail.
The computed properties of the variables and path
information are provided to a graphical user interface
(GUI) 208. The GUI 208 provides a visual representation of
the path from the cause of a respective error to a defect
in a computer program on the display device 113 of the
computer system 100. Additionally, the visual
representation provided may be in accordance with a
visualization mode selected by a user.
In an embodiment, after the analysis of the computer
program, the GUI 208 displays warning messages referring to
different defects in the source code of the computer
program. On detecting an input over a warning message using
the GUI, the corresponding statement where the defect is
present is highlighted.
In another embodiment, visual indication of the cause and
the effect statement may be provided to the user. In the
present embodiment, all cause nodes for a statement having
a defect may be highlighted on receiving an input from the
user. FIG 7 illustrates highlighting of only the cause and
the effect statement of a computer program. In the shown
example of FIG 7, the highlighted statement 702 is a cause
statement as the variable p is assigned a property null.
The highlighted statement 704 is the effect statement,
i.e., the statement having the defect due to the assignment
of the value null to p at the highlighted statement 702.
The highlighted statements 702 and 704 may be highlighted
in colours, such that, they are distinguishable. Similarly,
all effects statements for a cause statement may be
highlighted on reception of an input for the same.
In yet another embodiment, visual representation of all
cause statements for an effect statement, i.e., the
statement having the defect, the effect statement and the
intermediate statements between the path of the cause and
the effect statement may be provided. The user may be
provided with an option such that the paths may be
displayed one by one. Moreover, a user may access the
abstract property of a variable when the respective
statement is being highlighted. FIG 8 illustrates
highlighting of the cause statement, effect statement and
the intermediate statements between the cause and the
effect statement. The highlighted statement 802 and 804 are
cause statements as the variable p is assigned to a
property null in the highlighted statements 802 and 804.
The highlighted statement 806 is an effect statement as the
statement has a defect due to use of the variable p having
a property null assigned at statement 802. The highlighted
statements 808 from the cause statement 802 to the effect
statement 806 are the intermediate statements.
Additionally, the highlighted statement 810 is an effect
statement due to use of the variable p having a property
null assigned at statement 804. The arrows represent the
flow of the paths.
Additionally, each path displayed may be highlighted to be
represented distinguishably. This will enable easy
distinction of each path.
In yet another embodiment, a cause statement may be
represented distinctly based on the number of statements
for which the cause statement is responsible for error.
This will provide a representation of the severity of a
cause statement and the user may prioritize to fix the bugs
accordingly.
In yet another embodiment, a distance from a cause
statement to an effect statement may be displayed. The
distance from the cause statement to the effect statement
may be computed as length of path from the cause statement
to the effect statement.
Typically, the path information is computed from a basic
block comprising a statement that is responsible for
causing an error to another basic block comprising another
statement having the defect. The conditions responsible for
the errors and defects of the computer program may be
defined to specify the causes and the defects. For example,
a first condition may be defined to specify the cause of an
error and another condition may be defined to specify the
defects. The defects specified by the second condition
result from the errors specified in the first condition.
Typically, the first condition specifies acquisition of a
property by a variable in a statement of the computer
program that leads to the error responsive to the defect
specified by the second condition. For example, the defect
specified by the second condition is an operation that is
forbidden based on the property acquired by the variable.
As the first condition specifies the cause of an error, it
may be inferred that the first condition defines a cause
statement. Similarly, as the second condition specifies the
defect in the computer program, it may be inferred that the
second condition defines an effect statement, i.e. the
statement where the variable having a property that may
result into an error has effect. For example, a variable
may acquire a property by way of a value being assigned to
the variable.
FIG 9 is a flow diagram illustrating a method of computing
properties of variables along with path information
according to an embodiment herein. At block 902, a CFG
comprising at least one basic block is generated from a
source code of a computer program. Next at block 904, a
dataflow analysis is performed over each of the basic
blocks of the CFG. The dataflow analysis may be performed
by using an intra-procedural dataflow analysis or an inter-
procedural dataflow analysis.
At block 906, properties of variables in each of the basic
blocks of the control flow graph are computed. Next, at
block 908 it is determined if a respective variable having
a respective property satisfies the first condition, i.e.,
if a variable is assigned a property that leads to an
error. If the respective variable having the respective
property satisfies the first condition, then, at block 910,
a list of paths from a basic block comprising the
respective variable to another basic block satisfying the
second condition is computed.
In an embodiment, the computation of properties of the
variable and the computation of the list of paths are
performed using a transfer function comprising a function
to compute the list of paths.
In an embodiment, for statements which are within a loop,
the transfer function is iteratively solved until
successive iterations do not change the properties of the
respective variables. This enables the property of a
variable to be converged to a fixed point
Moving next to block 912, properties and the list of paths
for each of the variables from the basic block to the
another basic block are maintained in the memory 103 of FIG
1 of the computer system 100 of FIG 1. In an embodiment,
the list of paths comprises identification information of
the corresponding statements of the source code from the
basic block to the another basic block. This enables is
computing a list of paths to identify statements of a
source code from a statement responsible for causing an
error to another having the defect responsive to the
property assigned to the variable in the statement causing
the error.
FIG 10 is a flow diagram illustrating a method of analyzing
a computer program according to an embodiment herein. At
block 1002, a first condition specifying a cause of an
error is defined. Next, at block 1004, a second condition
specifying a defect for the computer program is defined.
At block 1006, path information from a statement of the
computer program satisfying the first condition to another
statement of the computer program satisfying the second
condition is computed. The statement satisfying the first
condition is the statement in which a variable is assigned
a property that leads to the defect in another statement,
for example, an operation forbidden by the property
acquired by the variable. The defect in the another
statement is the defect specified by the second condition
and thus a statement having a defect as specified in the
second condition satisfies the second condition.
At block 1008, a visual representation of a path from the
cause of the error to the defect using the path information
in provided.
EXAMPLE: 1 - Null Pointer Analysis
Null pointer analysis is typically an analysis wherein a
dereferenced pointer variable having a property null is
identified, i.e., the property of a variable is null. In
computer programming pointers provide value at an address
of a memory identified by the variable dereferenced. A
dereferenced variable having a property, for example null,
may lead to an error as the dereference operation is
forbidden due to invalid memory address.
Therefore, to compute path information from a basic block
having a statement which assigns a value null to a value as
being the cause to the basic block having the statement
where the variable is dereferenced, i.e., the statement
having the defect. The path information is computed along
with the computation of abstract properties for each of the
variables as the analyzer traverses the basic blocks of the
CFG according to the embodiments described herein.
Referring now to FIG 11, an example of a computer program
1100 to be analyzed is shown. In the shown example of the
computer program in FIG 11, each statement of the computer
program is identified with the statement number
corresponding to it. The statement numbers act as statement
identifiers for the computer program of the present
example.
To perform a null pointer analysis of the computer program
1100 shown in FIG 11, a CFG is generated from the source
code of the computer program 1100 through the CFG
generator. The CFG for the computer program is illustrated
in FIG 3. Referring now to FIG 3, it can be observed that
each basic block 304a through 304g comprises at least one
statement of the source code. The basic block 304a
corresponds to statement having the statement number 1 in
the computer program 1100. Similarly basic blocks 304b
through 304g correspond to statements 3, 4, 5, 6, 8 and 9
respectively of the computer program 1100. The dataflow
analyzer analyses each of the basic blocks and computes the
property and path information for the variables in each of
the basic blocks.
Initially, abstract property and path information for all
the variables is undefined and empty respectively. As the
analyzer traverses the basic blocks abstract values and
path information are computed in the following manner:
It can be seen that for the block 304f, the analyzer
computes the property of the variable as null as well as
the path information by applying the transfer function
according to the embodiments described herein. The path
information computed herein is 8, i.e., the statement
number of the corresponding statement represented by the
basic block. In the present example, the statement number
is used to identify the corresponding statement of the
source code of a particular basic block.
For the basic block 304g, IN is a Join operation as the
basic block 304g receives the output of basic blocks 304e
and 304f as input. Therefore, the IN state for the basic
block 304g is the Join operation of the OUT of the basic
blocks 304e and 304f. OUT of basic block 304g is derived by
applying the transfer function according to the embodiments
described herein.
The path list is amended by inserting the identification
number of the statement corresponding to the basic block
304g. Thus, the path {89} represents the cause and effect
path for the defect in statement 9 of the source code. The
statement number 8 identifies the statement where a
property null is assigned to the variable a as the cause
statement. The next statement, i.e., 9 where the variable a
is dereferenced is identified as the effect statement.
The computed path may be provided to a GUI to be visually
represented to the user.
EXAMPLE: 2 - Array index out of bounds analysis
In the present example, the techniques described herein are
implemented for the analysis of array index out of bounds.
The property computed by the dataflow analyzer in the
implementation of the array index out of bounds analysis is
lower and upper limit for each variable. As the set of all
possible intervals is infinite in size, a finite base
domain is selected for such analysis which comprises of set
of all possible intervals with integer constants in each of
the basic blocks and with interval . Each integer
variable is associated with an interval. The IN and OUT
state consists of variable value pair as described in the
embodiments herein. Similarly, Gen and Kill sets are
nonempty for assignment statements and empty for other
statements. Effect of different statements on abstract
interval of a variable is as follows:
Where a is a variable involved in the statement, abstr(a)
is abstract value of a. In an embodiment, to obtain a fixed
point, every abstract interval is mapped to a value in the
base domain using widening and narrowing techniques, known
per se. The upper and lower limit of an interval is mapped
to a value in the base domain using the widening operation.
The narrowing operation is used to refine the outcome of
the widening operation.
The first condition for the array index out of bounds is
defined as assignment of value to a variable where lower or
upper limit of the variable goes beyond a prescribed limit.
The prescribed limit of a variable is based on size of an
array for which the variable is used as a subscript.
The second condition is defined as the use of a variable as
an array subscript for which the interval value is beyond
the prescribed limit. Thus, the second condition specifies
the defect for a statement.
Referring now to FIG 12, an example of a computer program
1200 to be analyzed is shown. In the shown example of the
computer program in FIG 12, each statement of the computer
program is identified with the statement number
corresponding to it. The statement numbers act as statement
identifiers for the computer program of the present
example.
To perform an array index out of bounds analysis for the
computer program 1200 of FIG 12, a CFG is generated from
the source code of the computer program 1200 through the
CFG generator. The CFG for the computer program 1200 is not
illustrated herein. However, each basic block of the CFG is
considered to comprise one statement each of the source
code. The interval values computed for integer variables
for the computer program of above are as follows:
Computation of Gen, Kill, IN and OUT sets are not shown as
the computation of the same are performed in a manner
similar to the computation described under example 1.
Further, effect of statements on abstract values is shown.
While the analyzer traverses each of the basic blocks, each
comprising one statement, it is determined whether the
statement traversed by the analyzer is a cause statement or
not. Accordingly, if the statement is not a cause statement
the path set is empty and if the statement is a cause
statement the path set comprises the statement identifier
for the respective statement. For the analysis of statement
11 it can be seen that the path set comprises the statement
identifiers {8, 11}. This is because the variable i is used
as a subscript for the array x at statement 11 and at
statement 8 I is assigned an interval which is beyond the
prescribed limit. Thus statement 8 becomes a cause
statement as the variable i is assigned an interval which
is beyond the prescribed limit. As the variable i is used
as a subscript at statement 11 it leads to a defect. Thus
statement 11 may be referred to as the effect statement.
Thus, the path set for statement 11 comprises the list of
paths from the cause statement to the effect statement and
a user will be required to debug only the path 8, 11.
Since the cause statement is defined in the first condition
as assignment of value to a variable where lower or upper
limit of the variable goes beyond a prescribed limit, the
analyzer will identify statement 8 as a cause statement and
will start collection path information from thereon
EXAMPLE: 3 - Liveness of Variables
The techniques proposed herein may also be used for
visualizing information about properties such as liveness
of variables, busyness of an expression, etc.
To compute and visualize information about a liveness
property of a variable a backward dataflow analysis is
used. The embodiments described herein may be augmented
with backward dataflow analyzer, such as, live variable
analyzer. Live variable analysis computes a set of live
variables at each program point. For example, a variable v
is called live at a program point p if value of v is used
by at least one statement following p without being re-
defined.
Referring now to FIG 13 and FIG 14, an example of a
computer program is shown in FIG 13 and the corresponding
CFG is illustrated in FIG 14. As the live variable analyzer
is a backward dataflow flow analyzer, the analysis of the
CFG illustrated in FIG 14 is performed backward. In the
shown CFG of FIG 14, after the block 1412, variables r4 and
r5 are live as they are used in the block 1412. After block
1410 variables r2 and r3 are also live as they have been
used in block 1410. Thus, after the analysis of block 1410
r2, r3, r4 and r5 are live variables as the variables have
been used at least once without being re-defined. Moving
backward, the flow encounters a branch and has two flow
paths, i.e., to block 1408 and 1404. After, the analysis of
blocks 1408 and 1406 r4 is killed from the live variable
set as r4 is re-defined at block 1406. After, the analysis
of blocks 1404 and 1402, the live variable set comprises of
r2, r3 and r5, as no variable is redefined at blocks 1404
and 1402.
Accordingly, to compute and visualize liveness property of
a variable, the transfer function of live variable analysis
maps OUT set to IN set. Dataflow equations for a basic
block b are as follows:
Genb = Set of all variables which are used in the block
without being redefined.
Killb = Set of all variables which are redefined.
The first condition is defined such that a statement is
responsible for cause of an error if the statement uses a
variable that is not defined in the basic block comprising
the statement. Thus, the first condition specifies that a
statement is a cause statement if it uses a variable that
is not defined in the basic block comprising the statement.
The second condition defines an assignment of a property to
a variable. Thus, the statement where a variable is
defined is an effect statement.
As the analyzer performs a backward analysis, the analyzer
starts gathering path information is backward direction for
each variable (such as r2, r3, r4, and r5) unless it
reaches a statement which redefines the variable. Statement
7 is an effect statement as it defines r4 and cause
statement are all those statements where this definition is
used. In the shown example, one of the cause statements is
7, hence path for variable r4 from cause statement 11 is
[11, 10, 7}. This implies r4 is live along the path 11 10
7.
The proposed analysis herein computes all possible defined-
use chain for a variable.
Example: 4 - Detecting Infeasible Paths
Additionally, the techniques proposed herein may be used to
detect infeasible paths while computing the path
information from a statement responsible for an error to
the statement having the defect. Referring now to FIG 15,
an example of a computer program is shown. Analyzing the
example program of FIG 15 for infeasible paths, it can be
seen that abstract property and path information of
variable p at statement 3 is [null, {1, 2, 3}]. This
implies that null path corresponding to p till statement 3
is 1 2 3. Since statement 3 is a predicate which compares p
directly with null, the else branch is infeasible, hence 1
2 3 5 6 7 8 is infeasible path.
The embodiments described herein enable providing a visual
representation of a path from a cause of an error to a
defect in a computer program. The visual representation of
the path from the cause to the defect, enable an individual
to easily correlate a defect to the possible cause for the
defect. This enables easy and fast debugging of computer
programs, and thus achieve reduction in the development
cost of a product.
While this invention has been described in detail with
reference to certain preferred embodiments, it should be
appreciated that the present invention is not limited to
those precise embodiments. Rather, in view of the present
disclosure which describes the current best mode for
practicing the invention, many modifications and variations
would present themselves, to those of skill in the art
without departing from the scope and spirit of this
invention. The scope of the invention is, therefore,
indicated by the following claims rather than by the
foregoing description. All changes, modifications, and
variations coming within the meaning and range of
equivalency of the claims are to be considered within their
scope.
We claim:
1. A method of analyzing a computer program, the method
comprising:
- defining a first condition with respect to which an
analysis of said computer program is to be performed,
wherein said first condition specifies a cause of an
error,
- defining a second condition, wherein said second
condition specifies a defect for said computer program,
wherein said defect is a result of said first condition,
wherein said first condition specifies acquisition of a
property by a variable that leads to said error
responsive to said defect specified by said second
condition and said defect specified by said second
condition is an operation that is forbidden based on said
property acquired by said variable,
- computing path information from a statement of said
computer program satisfying said first condition to
another statement of said computer program satisfying
said second condition, and
- providing a visual representation of a path from said
cause of said error to said defect using said path
information.
2. The method according to claim 1, wherein the computing
of said path information from said statement of said
computer program satisfying said first condition to
another statement of said computer program satisfying
said second condition comprises generating a control flow
graph from a source code of said computer program,
wherein said control flow graph comprises at least one
basic block comprising at least one statement of said
source code.
3. The method according to claim 2, further comprising
performing a dataflow analysis upon each of said basic
blocks of said control flow graph.
4. The method according to claim 3, wherein the
performing of said dataflow analysis each of said basic
blocks of said control flow graph includes performing an
intra-procedural dataflow analysis upon said basic blocks
of said control flow graph.
5. The method according to claim 3, wherein the
performing of said dataflow analysis each of said basic
blocks of said control flow graph includes performing an
inter-procedural dataflow analysis upon said basic blocks
of said control flow graph.
6. The method according to claim 3, wherein the
performing of the dataflow analysis upon each of said
basic blocks of said control flow graph comprises:
- computing a property of a variable in each of said
basic blocks of said control flow graph,
- determining if said variable having said property
satisfies said first condition, and
- computing a list of paths from a respective said basic
block comprising said variable to another said basic
block satisfying said second condition if said variable
having said property satisfies said first condition.
7. The method according to claim 6, wherein said list of
paths comprises identification information of said
corresponding statements of said source code from said
block to said another said block.
8. The method according to claim 6, wherein the
computation of said property of said variable and said
list of paths are derived using a transfer function,
wherein said transfer function comprises a function to
compute said list of paths.
9. The method according to claim 8, wherein the
computation of said property of said variable derived
using said transfer function includes solving iteratively
said transfer function until successive iterations do not
change said property of said respective variable.
10. The method according to claim 6, further comprising
ceasing to emit repetitive identification information of
said corresponding statements of said source code to said
list of paths.
11. The method according to claim 10, wherein the
cessation of emission of repetitive identification
information of said corresponding statements of said
source code from said list of paths is performed using a
concatenation operation.
12. The method according to claim 6, further comprising
maintaining said property and said list of paths for said
variable from said basic block to said another said block
on a memory.
13. The method according to claim 1, wherein the
providing of said visual representation of path from said
cause of said error to said defect using said path
information includes providing a visual indication of
said path from said statement to said another statement
responsive to an input received from a user over a
warning message displayed corresponding to said defect in
said another statement.
14. The method according to claim 1, wherein the
providing of said visual representation of path from said
cause of said error to said defect using said path
information includes indicating a severity level of said
statement based on number of said errors caused by said
statement by representing said statement distinguishably.
15. One or more computer-readable media containing a
computer program that is executable by a processor to
perform the method recited in claims 1 to 14.
16. A system for analyzing a computer program, wherein
the system comprises a processor configured to:
- process a first condition, wherein said first condition
specifies a cause of an error,
- process a second condition, wherein said second
condition specifies a defect for said computer program,
wherein said defect is a result of said first condition,
wherein said first condition specifies acquisition of a
property by a variable that leads to said error
responsive to said defect specified by said second
condition and said defect specified by said second
condition is an operation that is forbidden based on said
property acquired by said variable,
- compute path information from a statement of said
computer program satisfying said first condition to
another statement of said computer program satisfying
said second condition, and
- a display to provide a visual representation of path
from said cause of said error to said defect using said
path information.
17. The system according to claim 16, wherein said
processor is configured to generate a control flow graph
from a source code of said computer program, wherein said
control flow graph comprises at least one basic block
comprising at least one statement of said source code.
18. The method according to claim 17, wherein said
processor is further configured to perform a dataflow
analysis upon each of said basic blocks of said control
flow graph.
19. The method according to claim 18, wherein the
processor is configured to perform an intra-procedural
dataflow analysis upon said basic blocks of said control
flow graph.
20. The method according to claim 18, wherein the
processor is configured to perform an inter-procedural
dataflow analysis upon said basic blocks of said control
flow graph.
21. The method according to claim 18, wherein the
processor is configured to:
- compute a property of a variable in each of said basic
blocks of said control flow graph,
-determine if a respective said variable having a
respective said property satisfies said first condition,
and
- compute a list of paths from a respective said basic
block comprising said respective said variable to another
said basic block satisfying said second condition if said
respective said variable having said respective said
property satisfies said first condition.
22. The method according to claim 21, wherein said list
of paths comprises identification information of said
corresponding statements of said source code from said
block to said another said block.
23. The method according to claim 21, wherein said
processor computes said property of said variable and
said list of paths using a transfer function, wherein
said transfer function comprises a function to compute
said list of paths.
24. The method according to claim 23, wherein the
processor solves iteratively said transfer function until
successive iterations do not change said property of said
respective variable.
25. The method according to claim 21, wherein said
processor is further configured to cease emitting
repetitive identification information of said
corresponding statements of said source code to said list
of paths.
26. The method according to claim 25, wherein said
processor performs a concatenation operation to cease
emission of repetitive identification information of said
corresponding statements of said source code from said
list of paths.
27. The method according to claim 21, wherein said
processor is further configured to maintain said property
and said list of paths for said variable from said basic
block to said another said block on a memory.
28. The method according to claim 1, wherein the display
provides said visual representation of said path from
said statement to said another statement responsive to an
input received from a user over a warning message
displayed corresponding to said defect in said another
statement.
29. The method according to claim 1, wherein the display
provides an indication of a severity level of said
statement based on number of said errors caused by said
statement by representing said statement distinguishable.
The invention relates to a method, one or more computer-readable media and a system for analyzing a computer program, wherein the method comprises defining a first condition, wherein said first condition specifies a cause
of an error, defining a second condition, wherein said second condition specifies a defect for said computer program, wherein said defect is a result of said first condition, wherein said first condition specifies acquisition of a property by a variable that leads to said error responsive to said defect specified by said second condition and said defect specified by said second condition is an operation that is forbidden based on said property acquired by said variable, computing path information from a statement of said computer program satisfying said first condition to another statement of
said computer program satisfying said second condition and providing a visual representation of the path.
| Section | Controller | Decision Date |
|---|---|---|
| # | Name | Date |
|---|---|---|
| 1 | 289-kol-2009-specification.pdf | 2011-10-06 |
| 1 | 289-KOL-2009-Written submissions and relevant documents (MANDATORY) [04-06-2019(online)].pdf | 2019-06-04 |
| 2 | 289-kol-2009-gpa.pdf | 2011-10-06 |
| 2 | 289-KOL-2009-HearingNoticeLetter.pdf | 2019-05-30 |
| 3 | 289-KOL-2009_EXAMREPORT.pdf | 2016-06-30 |
| 3 | 289-kol-2009-form 3.pdf | 2011-10-06 |
| 4 | 289-kol-2009-form 2.pdf | 2011-10-06 |
| 4 | 289-KOL-2009-(27-01-2015)-CLAIMS.pdf | 2015-01-27 |
| 5 | 289-KOL-2009-FORM 18.pdf | 2011-10-06 |
| 5 | 289-KOL-2009-(27-01-2015)-CORRESPONDENCE.pdf | 2015-01-27 |
| 6 | 289-kol-2009-form 1.pdf | 2011-10-06 |
| 6 | 289-KOL-2009-(27-01-2015)-DESCRIPTION (COMPLETE).pdf | 2015-01-27 |
| 7 | 289-KOL-2009-FORM 1-1.1.pdf | 2011-10-06 |
| 7 | 289-KOL-2009-289-KOL-2009.pdf | 2011-10-06 |
| 8 | 289-kol-2009-drawings.pdf | 2011-10-06 |
| 8 | 289-kol-2009-abstract.pdf | 2011-10-06 |
| 9 | 289-kol-2009-claims.pdf | 2011-10-06 |
| 9 | 289-kol-2009-description (complete).pdf | 2011-10-06 |
| 10 | 289-KOL-2009-CORRESPONDENCE-1.1.pdf | 2011-10-06 |
| 10 | 289-kol-2009-correspondence.pdf | 2011-10-06 |
| 11 | 289-KOL-2009-CORRESPONDENCE-1.1.pdf | 2011-10-06 |
| 11 | 289-kol-2009-correspondence.pdf | 2011-10-06 |
| 12 | 289-kol-2009-claims.pdf | 2011-10-06 |
| 12 | 289-kol-2009-description (complete).pdf | 2011-10-06 |
| 13 | 289-kol-2009-abstract.pdf | 2011-10-06 |
| 13 | 289-kol-2009-drawings.pdf | 2011-10-06 |
| 14 | 289-KOL-2009-289-KOL-2009.pdf | 2011-10-06 |
| 14 | 289-KOL-2009-FORM 1-1.1.pdf | 2011-10-06 |
| 15 | 289-KOL-2009-(27-01-2015)-DESCRIPTION (COMPLETE).pdf | 2015-01-27 |
| 15 | 289-kol-2009-form 1.pdf | 2011-10-06 |
| 16 | 289-KOL-2009-(27-01-2015)-CORRESPONDENCE.pdf | 2015-01-27 |
| 16 | 289-KOL-2009-FORM 18.pdf | 2011-10-06 |
| 17 | 289-KOL-2009-(27-01-2015)-CLAIMS.pdf | 2015-01-27 |
| 17 | 289-kol-2009-form 2.pdf | 2011-10-06 |
| 18 | 289-KOL-2009_EXAMREPORT.pdf | 2016-06-30 |
| 18 | 289-kol-2009-form 3.pdf | 2011-10-06 |
| 19 | 289-KOL-2009-HearingNoticeLetter.pdf | 2019-05-30 |
| 19 | 289-kol-2009-gpa.pdf | 2011-10-06 |
| 20 | 289-KOL-2009-Written submissions and relevant documents (MANDATORY) [04-06-2019(online)].pdf | 2019-06-04 |
| 20 | 289-kol-2009-specification.pdf | 2011-10-06 |