
Source Code Similarity Evaluation Method And Source Code Similarity Evaluation Apparatus

Abstract: A source code similarity evaluation apparatus (10) comprises: a development result products analysis unit (21) which extracts a source code list showing the composition of the software; a correspondence analysis unit (22) which compares the comparison source code list (44a) with the comparison destination source code list (44b) to analyze the correspondence between them; a comparison object specifying unit (23) which, based on the correspondence, specifies from the comparison destination source code list (44b) the comparison destination source code (41b) to be compared with each comparison source code (41a) contained in the comparison source code list (44a); a similarity calculation unit (24) which calculates the similarity between the comparison source code (41a) and the comparison destination source code (41b) specified by the comparison object specifying unit (23); and an output unit (15) which outputs each comparison source code (41a) together with its similarity.


Patent Information

Application #
Filing Date
05 August 2013
Publication Number
08/2015
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

HITACHI, LTD.
6-6, MARUNOUCHI 1-CHOME, CHIYODA-KU, TOKYO, JAPAN

Inventors

1. YOSHIMURA KENTARO
C/O HITACHI, LTD., INTELLECTUAL PROPERTY GROUP, 12TH FLOOR, MARUNOUCHI CENTER BUILDING, 6-1, MARUNOUCHI 1-CHOME, CHIYODA-KU, TOKYO 100-8220, JAPAN
2. HASHIMOTO YASUNORI
C/O HITACHI, LTD., INTELLECTUAL PROPERTY GROUP, 12TH FLOOR, MARUNOUCHI CENTER BUILDING, 6-1, MARUNOUCHI 1-CHOME, CHIYODA-KU, TOKYO 100-8220, JAPAN
3. MIBE RYOTA
C/O HITACHI, LTD., INTELLECTUAL PROPERTY GROUP, 12TH FLOOR, MARUNOUCHI CENTER BUILDING, 6-1, MARUNOUCHI 1-CHOME, CHIYODA-KU, TOKYO 100-8220, JAPAN

Specification

W6847
BACKGROUND OF THE INVENTION
The present invention relates to a source code similarity evaluation method for evaluating the similarity of a source code group composing software, and to a source code similarity evaluation apparatus for implementing the method.
In recent years, as the range of applications of software has expanded, so-called legacy software has become widespread. Legacy software is software that has grown in scale and complexity through repeated functional additions and modifications made in response to new requirements on a system. As a result, it is difficult and costly to maintain.
One cause of the increased maintenance cost of legacy software is the presence of similar code strings, so-called code clones. A code clone is one of a plurality of identical or similar code strings contained in source code, and code clones arise mainly from reusing (diverting) existing source code. Software containing many code clones has a larger overall code size. In addition, when one of a plurality of identical or similar code strings is changed, the same change must in many cases be made at every other location, which increases maintenance cost.
Code clones are said to arise mostly during maintenance development in which functions are added to legacy software. In such development, software realizing a function similar to the one to be added is often used as base software, and the source code group composing that base software is copied and then altered. Ideally, the functionally similar parts of the base software should be made common as software parts. However, strong demands to shorten development periods and reduce development costs mean that code clones are produced in many software development fields.
To reduce software maintenance cost, the software maintenance field has called for these code clones to be detected and made common as software parts. However, finding code clones by visual code review is not efficient for large-scale software.
In view of this problem, technologies for detecting code clones in the source code composing software have been disclosed in recent years.
The [PROBLEM TO BE SOLVED] section of JP-A-2006-18693 states that "it is an object of the present invention to provide a similar source code extraction program, a similar source code extraction apparatus and a similar source code extraction method, which enable a similar source code fragment to be extracted at high speed". Its [MEANS FOR SOLVING THE PROBLEM] section states that "the comparison source code fragment designation part 210 receives designation of a source code fragment serving as the reference for comparison, extracts source code fragments similar to this source code fragment from the source code group whose designation was received by the correspondence analysis unit 220, and outputs them from the result output part 290. The comparison destination source code fragment extraction part 270 extracts, from the source code group, the source code whose similarity to the comparison source code fragment is to be compared, by referring to a syntax tree prepared from the comparison source code fragment and a syntax tree prepared from the source code group."
J. W. Hunt and M. D. McIlroy, "An Algorithm for Differential File Comparison", Bell Telephone Laboratories Computing Science Technical Report #41, July 1976, describes in detail the algorithm of "diff", a program for comparing two files.
Rainer Koschke, "Survey of Research on Software Clones", Dagstuhl Seminar Proceedings 06301, 19.04.2007, describes research on software clone detection.
SUMMARY OF THE INVENTION
The technologies described in J. W. Hunt and M. D. McIlroy, "An Algorithm for Differential File Comparison", Bell Telephone Laboratories Computing Science Technical Report #41, July 1976, and in Rainer Koschke, "Survey of Research on Software Clones", Dagstuhl Seminar Proceedings 06301, 19.04.2007, share a problem when applied to large-scale software composed of a large number of source code files (for example, several thousand files). Because the number of analysis results grows with the square of the number of files, understanding the similarity relationships takes a great deal of time.
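The following Python sketch makes this scaling concrete by counting file-to-file comparisons. It assumes, hypothetically, that one comparison is made per pair of files. `all_pairs_comparisons` models exhaustive pairwise analysis. `corresponded_comparisons` models the scheme of the embodiment described later (Fig. 7): a comparison source file that has a corresponding destination file is compared only with that file, and an unmatched file is compared with every destination file. Both function names and the file counts are illustrative and are not taken from the cited works.

```python
def all_pairs_comparisons(n_source: int, n_destination: int) -> int:
    """Exhaustive analysis: every comparison source file against every destination file."""
    return n_source * n_destination


def corresponded_comparisons(n_source: int, n_destination: int, n_matched: int) -> int:
    """Hypothetical cost model: matched source files need one comparison each,
    and unmatched source files are compared with every destination file."""
    if not 0 <= n_matched <= min(n_source, n_destination):
        raise ValueError("n_matched must lie between 0 and the smaller file count")
    return n_matched + (n_source - n_matched) * n_destination


if __name__ == "__main__":
    # Hypothetical sizes: 5,000 files on each side, 4,900 of which correspond.
    print(all_pairs_comparisons(5_000, 5_000))            # 25000000
    print(corresponded_comparisons(5_000, 5_000, 4_900))  # 504900
```

Under these hypothetical counts, the saving depends entirely on how many files can be paired by correspondence. With no correspondences at all, the two counts are equal.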
The technology described in JP-A-2006-18693 is not sufficient for efficiently analyzing large-scale software composed of a large number of source code files, because it requires the user to specify the comparison source code. For example, when software is composed of several tens of thousands of source code files, it is not realistic for the user to designate every comparison source code file.
It is an object of the present invention to provide a source code similarity evaluation method for evaluating the similarity of two sets of software, each composed of a plurality of source code files, and a source code similarity evaluation apparatus for implementing the method.
To solve the above problem, the invention relating to the source code similarity evaluation apparatus according to claim 1 is characterized by comprising:
a source code list extraction unit which extracts a comparison source code list showing the composition of comparison source software, and a comparison destination source code list showing the composition of comparison destination software;
a correspondence analysis unit which compares the comparison source code list with the comparison destination source code list to analyze the correspondence between them;
a comparison object specifying unit which, based on the correspondence, specifies from the comparison destination source code list a comparison destination source code to be compared with each comparison source code contained in the comparison source code list;
a similarity calculation unit which calculates the similarity between the comparison source code and the comparison destination source code specified by the comparison object specifying unit;
a similarity evaluation unit which judges, as a similar source code, the comparison destination source code having the highest similarity among those specified by the comparison object specifying unit; and
an output unit which outputs each comparison source code in association with the similarity between that comparison source code and its similar source code.
The other units are explained in the description of the embodiments.
(Advantageous Effects)
According to the present invention, it is possible to provide a source code similarity evaluation method for evaluating the similarity of two kinds of software, each composed of a plurality of source code files, and a source code similarity evaluation apparatus for performing the method.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a schematic configuration drawing showing a source code similarity evaluation apparatus in the first embodiment.
Fig. 2 is a drawing showing the action of a source code similarity evaluation unit in the first embodiment.
Fig. 3A is a flow chart showing the overall processing of a development result products analysis unit in the first embodiment.
Fig. 3B is a flow chart showing the processing of the development result products analysis unit for each source code storage part in the first embodiment.
Fig. 4 is a drawing showing a comparison source code list and a comparison destination source code list in the first embodiment.
Fig. 5 is a flow chart showing the processing of a correspondence analysis unit in the first embodiment.
Fig. 6 is a drawing showing a correspondence list in the first embodiment.
Fig. 7 is a flow chart showing the processing of a comparison object specifying unit in the first embodiment.
Fig. 8 is a drawing showing a logical line definition DB in the first embodiment.
Fig. 9 is a drawing showing the finite difference analysis processing of a similarity calculation unit in the first embodiment.
Fig. 10 is a flow chart showing the processing of the similarity calculation unit in the first embodiment.
Fig. 11 is a drawing showing the similarity evaluation results in the first embodiment and in a comparative example.
Fig. 12 is a flow chart showing the processing of a similarity evaluation unit in the first embodiment.
Fig. 13 is a drawing showing the similarity-related information of a source code in the first embodiment.
Fig. 14 is a similarity relationship diagram in the first embodiment.
Fig. 15 is a flow chart showing the processing of a similar specifications analysis unit in the first embodiment.
Fig. 16 is a drawing showing the similar specifications information in the first embodiment.
DESCRIPTION OF THE EMBODIMENTS
Modes for implementing the present invention are explained in detail below with reference to the drawings.
(Configuration of the first embodiment)
Fig. 1 is a schematic configuration drawing showing a source code similarity
evaluation apparatus in the first embodiment.
A source code similarity evaluation apparatus 10 comprises:
- a memory 11 for storing executable programs and calculation results;
- a file database 12 for storing files electromagnetically;
- a processor 13 for executing arithmetic processing;
- an input unit 14 for receiving input from a user;
- an output unit 15 for outputting information to the user;
- a memory unit 16 for storing a source code similarity evaluation program 17; and
- a bus 18 connecting the memory 11, the file database 12, the processor 13, the input unit 14, the output unit 15 and the memory unit 16 so that they can communicate with one another.
The memory 11 is composed of, for example, a RAM (Random Access Memory). It holds a source code similarity evaluation unit 20, which evaluates the similarity between the source code groups composing two pieces of software, and a similar specifications analysis unit 30. The similar specifications analysis unit 30 extracts the detailed design specifications 42b (Fig. 2 and Fig. 4(b)) and the test specifications 43b (Fig. 2) relating to a similar comparison destination source code.
Further, the source code similarity evaluation unit 20 comprises:
- a development result products analysis unit 21 (the source code list extraction unit), which extracts a source code list showing the composition of each piece of software;
- a correspondence analysis unit 22, which compares the comparison source code list with the comparison destination source code list and analyzes and judges the correspondence between them;
- a comparison object specifying unit 23, which, based on this correspondence, specifies from the comparison destination source code list the comparison destination source code to be compared with each comparison source code contained in the comparison source code list;
- a similarity calculation unit 24, which calculates the similarity between the comparison source code and the comparison destination source code specified by the comparison object specifying unit 23; and
- a similarity evaluation unit 25, which judges the similar source code having the highest similarity to each comparison source code.
The source code similarity evaluation program 17 stored in the memory unit 16 is read into the memory 11 and executed by the processor 13. This embodies the source code similarity evaluation unit 20 and the similar specifications analysis unit 30.
The file database 12 is a storage apparatus such as an HDD (Hard Disk Drive), and stores files electromagnetically.
The processor 13 is, for example, a CPU (Central Processing Unit). It embodies the source code similarity evaluation unit 20 and the similar specifications analysis unit 30 by executing program files read into the memory 11, and it controls the source code similarity evaluation apparatus 10.
The input unit 14 is, for example, a mouse, a keyboard or a tablet. The user operates it to give instructions to the source code similarity evaluation apparatus 10.
The output unit 15 is, for example, a liquid crystal display apparatus or a printer. It outputs and displays operation guidance and processing results of the source code similarity evaluation apparatus 10 to the user. In the present embodiment, the output unit 15 outputs each comparison source code together with its similarity.
The memory unit 16 is a storage apparatus such as an HDD, and stores files electromagnetically.
The source code similarity evaluation apparatus 10 is connected to an external network 100 via a network interface (not illustrated), and can also access an external file database 110. In the remainder of the present embodiment, the source code similarity evaluation apparatus 10 is described as analyzing software (projects) present in the file database 12.
(Action of the first embodiment)
Fig. 2 is a drawing showing the action of a source code similarity evaluation unit in the first embodiment.
On receiving an instruction from a user 200 via a user interface 19, the source code similarity evaluation unit 20 evaluates the similarity between the software stored in a comparison source software storage part 40a and the software stored in a comparison destination software storage part 40b. The source code similarity evaluation unit 20 then displays similarity-related information 48 to the user 200 via the user interface 19.
The comparison source software storage part 40a and the comparison destination software storage part 40b are stored in the file database 12 or in the external file database 110 (Fig. 1). The comparison source software storage part 40a holds, for example, an altered version of existing software. It is organized as a folder hierarchy under a predetermined path of the file database 12 and contains three kinds of files:
- comparison source codes 41a, which are files containing code written in a predetermined programming language;
- detailed design specifications 42a, which describe the design of the comparison source code 41a; and
- test specifications 43a, which describe the tests of the comparison source code 41a.
There are usually many comparison source codes 41a, in some cases on the order of several thousand to several tens of thousands of files.
The comparison destination software storage part 40b holds existing software, for example the software from which the comparison source software was altered. It is likewise organized as a folder hierarchy under a predetermined path of the file database 12 and contains three kinds of files:
- comparison destination source codes 41b, which are files containing code written in the same programming language as the comparison source code 41a;
- detailed design specifications 42b, which describe the design of the comparison destination source code 41b; and
- test specifications 43b, which describe the tests of the comparison destination source code 41b.
Hereafter, when the comparison source code 41a and the comparison destination source code 41b need not be distinguished, they are referred to simply as the "source code 41". Likewise, the detailed design specifications 42a and 42b are referred to as the "detailed design specifications 42", and the test specifications 43a and 43b as the "test specifications 43". In addition, existing software is referred to as "base software", and software changed or altered from it is referred to as "altered software".
A logical line definition DB (database) 26 is a database in which the logical line of the source code composing software is defined for each file extension.
The source code similarity evaluation unit 20 refers to the logical line definition DB 26 to obtain the definition of a logical line for a source code, based on that source code's file extension. Using this definition, it extracts the logical lines of the comparison source code 41a, the comparison destination source code 41b, and so on. The logical line definition DB 26 is stored in the memory unit 16 (Fig. 1) and is referenced by the source code similarity evaluation unit 20.
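The specification does not give the contents of the logical line definition DB 26. As a minimal sketch, assume hypothetically that each definition records the comment syntax for one extension, and that a logical line is a non-blank line that is not purely a comment. `LOGICAL_LINE_DB` and `logical_lines` are illustrative names. The rules recognize comments only at the start of a line, so this is a simplification rather than a full lexer.

```python
from pathlib import PurePath

# Hypothetical stand-in for the logical line definition DB 26:
# file extension -> comment syntax used to discard non-logical lines.
LOGICAL_LINE_DB = {
    ".c": {"line_comment": "//", "block_open": "/*", "block_close": "*/"},
    ".h": {"line_comment": "//", "block_open": "/*", "block_close": "*/"},
    ".py": {"line_comment": "#", "block_open": None, "block_close": None},
}


def logical_lines(file_name: str, text: str) -> list[str]:
    """Return the logical lines of a file under the definition for its extension.
    Raises KeyError for an extension that has no definition."""
    rule = LOGICAL_LINE_DB[PurePath(file_name).suffix.lower()]
    result: list[str] = []
    in_block = False
    for raw in text.splitlines():
        line = raw.strip()
        if in_block:
            # Inside a block comment: skip until it closes, keep any code after it.
            end = line.find(rule["block_close"])
            if end < 0:
                continue
            line = line[end + len(rule["block_close"]):].strip()
            in_block = False
        if rule["block_open"] and line.startswith(rule["block_open"]):
            end = line.find(rule["block_close"], len(rule["block_open"]))
            if end < 0:
                in_block = True  # the block comment continues on later lines
                continue
            line = line[end + len(rule["block_close"]):].strip()
        if not line or line.startswith(rule["line_comment"]):
            continue  # blank line or comment-only line
        result.append(line)
    return result
```

Stripping surrounding whitespace also means that differences in indentation are ignored when logical lines are later compared. Whether the actual DB 26 behaves this way is not stated in the specification.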
The development result products analysis unit 21 analyzes the comparison source software storage part 40a and extracts a comparison source code list 44a showing the composition of the comparison source software. It also analyzes the comparison destination software storage part 40b and extracts a comparison destination source code list 44b showing the composition of the comparison destination software. The comparison source code list 44a lists the comparison source codes 41a composing the comparison source software. The comparison destination source code list 44b lists the comparison destination source codes 41b composing the comparison destination software.
The correspondence analysis unit 22 compares the comparison source code list 44a with the comparison destination source code list 44b, and analyzes and judges the correspondence between them. Based on these two lists, it judges which comparison destination source code 41b corresponds to each comparison source code 41a in the comparison source code list 44a, and outputs a correspondence list 45.
The correspondence list 45 is a list of combinations, each pairing a comparison source code 41a with the comparison destination source code 41b that corresponds to it.
Based on the correspondence list 45, the comparison object specifying unit 23 specifies, from the comparison destination source code list 44b, the comparison object of each comparison source code 41a contained in the comparison source code list 44a. It then outputs the similarity evaluation results 47 calculated by the similarity calculation unit 24.
The similarity calculation unit 24 calculates the similarity between the comparison source code 41a and the comparison destination source code 41b specified by the comparison object specifying unit 23.
The similarity evaluation results 47 are the calculated similarities between the comparison source codes 41a contained in the comparison source code list 44a and the comparison destination source codes 41b contained in the comparison destination source code list 44b.
The similarity evaluation unit 25 judges, as the similar source code, the comparison destination source code 41b having the highest similarity among those specified by the comparison object specifying unit 23, and outputs the similarity-related information 48. The similarity-related information 48 stores each comparison source code 41a contained in the comparison source code list 44a, its similar source code, and the similarity between the two.
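The cooperation of the comparison object specifying unit 23, the similarity calculation unit 24 and the similarity evaluation unit 25 can be sketched as follows, using the branch of Fig. 7. A comparison source code that has a corresponding destination file is compared only with that file; otherwise it is compared with every destination file. The highest-scoring candidate becomes the similar source code. The similarity measure is an assumption: Python's `difflib.SequenceMatcher.ratio()` over logical lines is used as a stand-in. It is neither the finite difference analysis of Figs. 9 and 10 nor the Hunt-McIlroy algorithm. The dictionary layouts (ID to list of logical lines, and source ID to destination ID) are also hypothetical.

```python
from difflib import SequenceMatcher


def similarity(src_lines: list[str], dst_lines: list[str]) -> float:
    """Stand-in similarity in [0, 1]: difflib's matching ratio over logical lines.
    (Assumption: not the patent's finite difference analysis.)"""
    return SequenceMatcher(None, src_lines, dst_lines, autojunk=False).ratio()


def evaluate(src_files: dict[int, list[str]],
             dst_files: dict[int, list[str]],
             correspondence: dict[int, int]) -> dict[int, tuple[int, float]]:
    """Return {comparison source ID: (similar destination ID, similarity)}."""
    results: dict[int, tuple[int, float]] = {}
    for i, src_lines in src_files.items():
        # Comparison object specifying: the corresponding file if there is one,
        # otherwise every comparison destination file.
        candidates = [correspondence[i]] if i in correspondence else list(dst_files)
        if not candidates:
            continue  # no destination files to compare against
        # Similarity calculation for each candidate ...
        scored = [(j, similarity(src_lines, dst_files[j])) for j in candidates]
        # ... then similarity evaluation: keep the candidate with the highest similarity.
        results[i] = max(scored, key=lambda pair: pair[1])
    return results
```

Take a hypothetical source file with no correspondence whose lines `["x", "y"]` are compared with destination lines `["x", "y", "w"]`. `SequenceMatcher` scores this pair 0.8 (2 × 2 matched lines divided by 5 lines in total), so that destination is reported if no other candidate scores higher.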
The user interface 19 receives input from the user 200 through the input unit 14, and provides information to the user 200 by outputting it through the output unit 15. The output unit 15 outputs each comparison source code 41a in association with the similarity between it and its similar source code.
When no specifications are associated with a comparison source code 41a listed in the comparison source code list 44a, the similar specifications analysis unit 30 presents the specifications relating to its similar source code as similar specifications. Based on the comparison destination source code 41b that serves as the diversion (reuse) source, the similar specifications analysis unit 30 specifies the detailed design specifications 42b and the test specifications 43b as diversion candidates, and outputs similar specifications information 49.
Figs. 3A and 3B are flow charts showing the processing of the development result products analysis unit in the first embodiment.
Fig. 3A is a flow chart showing the overall processing of the development result products analysis unit.
When the processing starts, in step S10, the development result products analysis unit 21 designates the root folder of the comparison source software storage part 40a and performs the development result products analysis processing (Fig. 3B).
In step S11, the development result products analysis unit 21 designates the root folder of the comparison destination software storage part 40b and performs the development result products analysis processing (Fig. 3B). When step S11 ends, the whole processing of Fig. 3A ends.
Fig. 3B is a flow chart showing the processing that the development result products analysis unit 21 performs for each source code storage part.
When the processing starts, in step S20, the development result products analysis unit 21 searches the folder specified as the source code storage part for project files (for example, a "makefile"). Here, a "project file" means an administrative file of the software, that is, a file describing the rules for building an executable file from the source files of the software.
In steps S21 to S25, the development result products analysis unit 21 repeats the processing for all project files.
In step S22, the development result products analysis unit 21 extracts the source codes 41 (file names and relative path names) relating to the project file.
In step S23, the development result products analysis unit 21 extracts the detailed design specifications 42 of each source code 41 relating to the project file. It searches the relative path of the source code 41 and extracts the detailed design specifications 42 whose file name has a predetermined relationship to the file name of the source code 41.
In step S24, the development result products analysis unit 21 extracts the test specifications 43 of each source code 41 relating to the project file. It searches the relative path of the source code 41 and extracts the test specifications 43 whose file name has a predetermined relationship to the file name of the source code 41.
In step S25, the development result products analysis unit 21 judges whether the processing has been repeated for all project files. If it has not, the unit returns to step S21.
In step S26, the development result products analysis unit 21 searches for the sub-folders of the folder.
In steps S27 to S29, the development result products analysis unit 21 repeats the processing for all sub-folders.
In step S28, the development result products analysis unit 21 recursively performs the development result products analysis processing (Fig. 3B) for each sub-folder.
In step S29, the development result products analysis unit 21 judges whether the processing has been repeated for all sub-folders. If it has not, the unit returns to step S27; if it has, the unit completes the processing of Fig. 3B.
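The recursive traversal of Figs. 3A and 3B might be sketched as below. The specification does not state how a project file lists its sources, so this sketch makes two hypothetical assumptions. First, a folder containing a file named `makefile` (in any letter case) is a project whose sources are the `.c` files in that same folder. Second, the "predetermined relationship" between file names is the `<stem>_spec.doc` / `<stem>_test.doc` pattern suggested by the example in Fig. 4(a). The logical line number column is omitted, and all function and key names are illustrative.

```python
from pathlib import Path

SOURCE_SUFFIXES = {".c"}        # assumption: C sources, as in the Fig. 4 examples
PROJECT_FILE_NAME = "makefile"  # assumption: project file name, compared case-insensitively


def _related_file(folder: Path, stem: str, suffix: str) -> str:
    """Return the related specification file name, or "N/A" if it does not exist."""
    candidate = folder / f"{stem}{suffix}"
    return candidate.name if candidate.is_file() else "N/A"


def analyze_folder(folder: Path, root: Path, entries: list[dict]) -> None:
    """Fig. 3B sketch for one folder, recursing into its sub-folders."""
    children = sorted(folder.iterdir())
    # S20: look for a project file. Under the same-folder assumption, several
    # project files would name the same sources, so S21-S25 collapse to one pass.
    if any(p.is_file() and p.name.lower() == PROJECT_FILE_NAME for p in children):
        rel = folder.relative_to(root).as_posix()
        rel_path = "/" if rel == "." else "/" + rel
        for src in children:
            if src.is_file() and src.suffix in SOURCE_SUFFIXES:
                entries.append({
                    "file_name": src.name,          # S22
                    "relative_path": rel_path,      # S22
                    "detailed_design_spec": _related_file(folder, src.stem, "_spec.doc"),  # S23
                    "test_spec": _related_file(folder, src.stem, "_test.doc"),             # S24
                })
    # S26-S29: recurse into every sub-folder.
    for sub in children:
        if sub.is_dir():
            analyze_folder(sub, root, entries)


def extract_source_code_list(root: Path) -> list[dict]:
    """Fig. 3A sketch: analyze one software storage part and number the entries from 1."""
    entries: list[dict] = []
    analyze_folder(root, root, entries)
    for number, entry in enumerate(entries, start=1):
        entry["id"] = number
    return entries
```

Under these assumptions, steps S10 and S11 of Fig. 3A would become two calls, one with the root folder of each storage part, such as the hypothetical `extract_source_code_list(Path("comparison_source"))`.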
Figs. 4(a) and 4(b) are drawings showing a comparison source code list and a comparison destination source code list in the first embodiment.
Fig. 4(a) is a drawing showing the comparison source code list 44a.
The comparison source code list 44a has the following columns:
- an ID column 44-1, identifying each comparison source code 41a;
- a file name column 44-2, storing the file name of each comparison source code 41a;
- a relative path name column 44-3, storing the relative path in which each comparison source code 41a is stored;
- a logical line number column 44-4, storing the SLOC (source lines of code), i.e., the number of logical lines, of each comparison source code 41a;
- a detailed design specifications column 44-5, storing the file name of the detailed design specifications 42a of each comparison source code 41a; and
- a test specifications column 44-6, storing the file name of the test specifications 43a of each comparison source code 41a.
For example, the comparison source code 41a with ID 1 is the file "F01.c", stored in the relative path "/D01" and containing 300 logical lines. The file name of its detailed design specifications 42a is "F01_spec.doc", and that of its test specifications 43a is "F01_test.doc".
As another example, the comparison source code 41a with ID 4 is the file "F06.c", stored in the relative path "/D02" and containing 500 logical lines. Because its detailed design specifications 42a and test specifications 43a do not exist, "N/A" (Not Available) is stored in the detailed design specifications column 44-5 and in the test specifications column 44-6 to indicate that no such file exists.
Fig. 4(b) is a drawing showing the comparison destination source code list 44b.
The comparison destination source code list 44b has the same structure as the comparison source code list 44a.
Fig. 5 is a flow chart showing the processing of a correspondence analysis unit in
the first embodiment.
When the processing starts, in step S30, the correspondence analysis unit 22 initializes a variable i to 1. The variable i indicates the ID of the comparison source code 41a in the comparison source code list 44a. Because i = 1 at this point, the file name of the comparison source code Fi (41a) is "F01.c".
In step S31, the correspondence analysis unit 22 initializes a variable j to 1. The variable j indicates the ID of the comparison destination source code 41b in the comparison destination source code list 44b.
In step S32, the correspondence analysis unit 22 judges whether the relative path name of the comparison source code Fi (41a) is identical to the relative path name of the comparison destination source code Fj (41b). If it is (Yes), the unit performs step S33; if it is not (No), the unit performs step S35.
In step S33, the correspondence analysis unit 22 judges whether the file name of the comparison source code Fi (41a) is identical to the file name of the comparison destination source code Fj (41b). If it is (Yes), the unit performs step S34; if it is not (No), the unit proceeds to step S35.
In step S34, the correspondence analysis unit 22 judges that the comparison source code Fi (41a) and the comparison destination source code Fj (41b) correspond, and records this pair in the correspondence list 45.
In step S35, the correspondence analysis unit 22 judges whether the variable j is equal to or larger than the maximum ID jmax of the comparison destination source code list 44b (5 in the present embodiment). If it is (Yes), the unit performs step S37; if it is not (No), the unit performs step S36.
In step S36, the correspondence analysis unit 22 adds 1 to the variable j and returns to step S32. In this way, the correspondence analysis unit 22 performs steps S32 to S34 for every comparison destination source code 41b in the comparison destination source code list 44b.
In step S37, the correspondence analysis unit 22 judges whether the variable i is equal to or larger than the maximum ID imax of the comparison source code list 44a (5 in the present embodiment). If it is (Yes), the unit ends the processing of Fig. 5; if it is not (No), the unit performs step S38.
In step S38, the correspondence analysis unit 22 adds 1 to the variable i and returns to step S31. In this way, the correspondence analysis unit 22 performs steps S31 to S35 for every comparison source code 41a in the comparison source code list 44a.
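The nested loops of Fig. 5 can be sketched as follows. Each list entry is assumed, hypothetically, to be a dictionary with `id`, `file_name` and `relative_path` keys, mirroring columns 44-1 to 44-3 of Fig. 4. The function name is illustrative. A pair is recorded only when both the relative path (step S32) and the file name (step S33) match.

```python
def analyze_correspondence(src_list: list[dict], dst_list: list[dict]) -> list[tuple[int, int]]:
    """Fig. 5 sketch: return (comparison source ID, comparison destination ID) pairs."""
    correspondence: list[tuple[int, int]] = []
    for src in src_list:          # outer loop over i (steps S30, S37, S38)
        for dst in dst_list:      # inner loop over j (steps S31, S35, S36)
            if (src["relative_path"] == dst["relative_path"]      # step S32
                    and src["file_name"] == dst["file_name"]):    # step S33
                correspondence.append((src["id"], dst["id"]))     # step S34
    return correspondence
```

The nested loops perform len(src_list) × len(dst_list) entry comparisons. Because the test is an exact match on the (relative path, file name) pair, indexing the destination list in a dictionary keyed by that pair would give the same result in roughly linear time, provided each pair is unique within a list.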
Fig. 6 is a drawing showing the correspondence list in the first embodiment.
The correspondence list 45 describes the correspondence between the ID of each comparison source code 41a and the ID of a comparison destination source code 41b. The comparison source code 41a with ID=1 corresponds to the comparison destination source code 41b with ID=1, the comparison source code 41a with ID=2 corresponds to the comparison destination source code 41b with ID=2, and the comparison source code 41a with ID=3 corresponds to the comparison destination source code 41b with ID=4.
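A minimal Python sketch of the correspondence analysis of Fig. 5 follows. It assumes, as in effect (C) and claim 3, that two source codes correspond exactly when both the relative path name and the file name coincide. A dictionary lookup stands in for the nested i/j loop of the steps S31 to S38 and yields the same pairs as long as each relative path is unique. The file lists are hypothetical: the comparison source names follow Fig. 14, while the destination list is invented so that the output matches the correspondence list 45 of Fig. 6.

```python
from typing import Dict, List, Tuple

# Each entry is (relative path name, file name); list position + 1 is the ID.
SourceList = List[Tuple[str, str]]

def analyze_correspondence(src: SourceList, dst: SourceList) -> Dict[int, int]:
    """Return {source ID: destination ID} for files whose relative path
    name and file name both coincide (1-based IDs, as in Fig. 6)."""
    dst_index = {entry: j for j, entry in enumerate(dst, start=1)}
    correspondence = {}
    for i, entry in enumerate(src, start=1):
        j = dst_index.get(entry)
        if j is not None:          # a coincident destination exists
            correspondence[i] = j
    return correspondence

# Hypothetical lists 44a and 44b (five files each, so imax = jmax = 5).
list_44a = [("D01", "F01.c"), ("D01", "F02.c"), ("D02", "F04.c"),
            ("D02", "F06.c"), ("D02", "F07.c")]
list_44b = [("D01", "F01.c"), ("D01", "F02.c"), ("D01", "F03.c"),
            ("D02", "F04.c"), ("D02", "F05.c")]

print(analyze_correspondence(list_44a, list_44b))  # {1: 1, 2: 2, 3: 4}
```

Keying on the (path, name) pair makes each lookup constant-time, so the analysis scales with the total number of files rather than with the product of the two list sizes.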
Fig. 7 is a flow chart showing processing of a comparison object specifying unit in the first embodiment.
When the processing is started, in the step S40, the comparison object specifying unit 23 initializes a variable i with 1. The variable i shows the ID of the comparison source code Fi (41a) included in the comparison source code list 44a.
In the step S41, the comparison object specifying unit 23 judges whether a comparison destination source code Fj (41b) corresponding to the comparison source code Fi (41a) is present or not. The comparison object specifying unit 23 performs the processing of the step S42 if this judgment condition is satisfied (Yes), and the processing of the step S44 if it is not satisfied (No).
In the step S42, the comparison object specifying unit 23 sets the ID of the comparison destination source code Fj (41b) corresponding to the comparison source code Fi (41a) as the variable j, based on the correspondence list 45.
In the step S43, the comparison object specifying unit 23 performs the similarity calculation processing of the comparison source code Fi (41a) and the comparison destination source code Fj (41b), and then proceeds to the processing of the step S48.
In the step S44, the comparison object specifying unit 23 initializes a variable j
with 1. The variable j shows the ID of the comparison destination source code 41b included in
the comparison destination source code list 44b.
In the step S45, the comparison object specifying unit 23 performs the similarity calculation processing of the comparison source code Fi (41a) and the comparison destination source code Fj (41b). The similarity calculation processing will be explained in detail with reference to Fig. 10, described later.
In the step S46, the comparison object specifying unit 23 judges whether the variable j is equal to or larger than the maximum value jmax (5 in the present embodiment) of the ID of the comparison destination source code list 44b, or not. The comparison object specifying unit 23 performs the processing of the step S48 if this judgment condition is satisfied (Yes), and the processing of the step S47 if it is not satisfied (No).
In the step S47, the comparison object specifying unit 23 adds 1 to the variable j and returns to the processing of the step S45.
In the step S48, the comparison object specifying unit 23 judges whether the variable i is equal to or larger than the maximum value imax (5 in the present embodiment) of the ID of the comparison source code list 44a, or not. The comparison object specifying unit 23 ends the processing of Fig. 7 if this judgment condition is satisfied (Yes), and performs the processing of the step S49 if it is not satisfied (No).
In the step S49, the comparison object specifying unit 23 adds 1 to the variable i and returns to the processing of the step S41.
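The loop of Fig. 7 reduces to the following sketch, which only lists the ID pairs handed to the similarity calculation; the similarity itself is left out. The function name and the representation of the correspondence list 45 as a {source ID: destination ID} dictionary are illustrative assumptions.

```python
from typing import Dict, List, Tuple

def comparison_pairs(imax: int, jmax: int,
                     correspondence: Dict[int, int]) -> List[Tuple[int, int]]:
    """List the (i, j) ID pairs whose similarity is calculated (Fig. 7).

    A source code with a corresponding destination is compared with that
    destination only (steps S41 to S43); otherwise it is compared with every
    destination ID 1..jmax (steps S44 to S47).
    """
    pairs = []
    for i in range(1, imax + 1):                      # steps S40, S48, S49
        if i in correspondence:                       # step S41
            pairs.append((i, correspondence[i]))      # steps S42, S43
        else:
            pairs.extend((i, j) for j in range(1, jmax + 1))  # S44 to S47
    return pairs

pairs = comparison_pairs(5, 5, {1: 1, 2: 2, 3: 4})
print(len(pairs))  # 13
```

With the correspondence list 45 of Fig. 6 (IDs 1, 2 and 3 corresponding) and imax = jmax = 5, this gives 3 + 2 × 5 = 13 similarity calculations instead of the 5 × 5 = 25 needed when every pair is compared.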
Fig. 8 is a drawing showing a logical line definition DB in the first embodiment.
The logical line definition DB 26 has an extension column 26-1, a logical line delimiter column 26-2, a comment start column 26-3 and a comment end column 26-4.
The extension column 26-1 shows information on the extension part of a file name. Here, the extension means the part of the file name from its last period to the end. The extension part of the file name of a source code indicates the computer language in which that source code is written.
The delimiter column 26-2 shows the delimiter rule of a logical line in the computer language relating to the relevant extension. The source code similarity evaluation unit 20 counts the number of logical lines of a source code based on this delimiter rule.
The comment start column 26-3 shows the rule for the start of a comment in the computer language relating to the relevant extension.
The comment end column 26-4 shows the rule for the end of a comment in the computer language relating to the relevant extension. In counting the logical line number of a source code, the source code similarity evaluation unit 20 ignores all comments, based on the comment start column 26-3 and the comment end column 26-4.
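The logical line counting of Fig. 8 can be sketched as below. The table row for ".c" (semicolon delimiter, "/*" and "*/" comment markers) is a hypothetical example, since the contents of the logical line definition DB 26 are not listed here, and the helper names are illustrative. The comment stripping is deliberately naive: it does not recognize comment markers inside string literals or "//" line comments.

```python
import os
from typing import Dict, List

# Hypothetical row of the logical line definition DB 26 (Fig. 8).
LOGICAL_LINE_DB: Dict[str, Dict[str, str]] = {
    ".c": {"delimiter": ";", "comment_start": "/*", "comment_end": "*/"},
}

def strip_comments(text: str, start: str, end: str) -> str:
    """Delete every comment running from `start` to the next `end`."""
    kept, pos = [], 0
    while True:
        s = text.find(start, pos)
        if s < 0:                        # no further comment
            kept.append(text[pos:])
            return "".join(kept)
        kept.append(text[pos:s])
        e = text.find(end, s + len(start))
        if e < 0:                        # unterminated comment: drop the rest
            return "".join(kept)
        pos = e + len(end)

def logical_lines(file_name: str, text: str) -> List[str]:
    """Split a source file into logical lines using the DB row that matches
    its extension (the part of the file name from the last period on)."""
    rule = LOGICAL_LINE_DB[os.path.splitext(file_name)[1]]
    body = strip_comments(text, rule["comment_start"], rule["comment_end"])
    pieces = (p.strip() for p in body.split(rule["delimiter"]))
    return [p for p in pieces if p]      # empty pieces are not logical lines

src = "int a = 1; /* note; ignored */\nint b = 2;\n"
print(logical_lines("F01.c", src))  # ['int a = 1', 'int b = 2']
```

Here the logical line number L(Fi) of the sample is 2, even though the comment contains a semicolon that would otherwise split a line.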
Fig. 9 is a drawing showing finite difference analysis processing of a similarity
calculation unit in the first embodiment.
Here, the explanation uses the similarity relation between the comparison source code Fi (41a) and the comparison destination source code Fj (41b) as an example. The comparison source code Fi (41a) represents the comparison source code 41a in logical lines. The logical line number of the comparison source code Fi (41a) is denoted by L(Fi).
The comparison destination source code Fj (41b) represents the comparison destination source code 41b in logical lines. The logical line number of the comparison destination source code Fj (41b) is denoted by L(Fj).
The common lines 41c of the comparison source code Fi (41a) and the common lines 41d of the comparison destination source code Fj (41b) have identical content. The number of the common lines 41c and 41d is denoted by the common line number L(Fi∧Fj).
In the finite difference analysis processing S51, the finite difference between the comparison source code Fi (41a) and the comparison destination source code Fj (41b) is analyzed. The finite difference analysis processing S51 is described in detail in J. W. Hunt and M. D. McIlroy, "An Algorithm for Differential File Comparison", Bell Telephone Laboratories Computing Science Technical Report #41, July 1976.
As the result of the finite difference analysis processing S51, the finite difference Dij (46) between source codes is output. The first character of each line of the finite difference Dij (46) indicates the kind of difference: a line beginning with "<" is included only in the comparison source code Fi (41a), and a line beginning with ">" is included only in the comparison destination source code Fj (41b). Here, the total number of lines of the finite difference Dij (46) between source codes is denoted by L(Dij).
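A sketch of the finite difference analysis S51 in the "<" / ">" notation described above follows. It is a stand-in, not the Hunt and McIlroy procedure cited above: it fills a plain O(m·n) longest-common-subsequence table and walks it. Because every longest common subsequence has the same length, the number of lines it reports matches any exact LCS-based diff, although the specific lines chosen may differ when several alignments are equally long. It emits no hunk headers such as "2c2", so L(Dij) is taken to be the length of the returned list; that reading of "total number of lines" is an assumption.

```python
from typing import List

def lcs_table(a: List[str], b: List[str]) -> List[List[int]]:
    """t[i][j] = length of a longest common subsequence of a[i:] and b[j:]."""
    t = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t

def line_difference(fi: List[str], fj: List[str]) -> List[str]:
    """Return Dij: lines only in Fi prefixed "<", lines only in Fj prefixed ">"."""
    t = lcs_table(fi, fj)
    i = j = 0
    dij = []
    while i < len(fi) and j < len(fj):
        if fi[i] == fj[j]:                  # common line: not part of Dij
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:    # dropping fi[i] keeps the LCS
            dij.append("< " + fi[i])
            i += 1
        else:                               # dropping fj[j] keeps the LCS
            dij.append("> " + fj[j])
            j += 1
    dij += ["< " + line for line in fi[i:]]   # leftovers exist in one file only
    dij += ["> " + line for line in fj[j:]]
    return dij

print(line_difference(["x", "y", "z"], ["x", "z", "w"]))  # ['< y', '> w']
```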
A calculation formula of the similarity Sij between the comparison source code Fi (41a) and the comparison destination source code Fj (41b) is shown in (Expression 1).
$$ S_{ij} = \frac{L(F_i \wedge F_j)}{L(F_i \vee F_j)} \qquad \text{(Expression 1)} $$
A calculation formula of the common line number L(Fi∧Fj) is shown in (Expression 2).
$$ L(F_i \wedge F_j) = \frac{L(F_i) + L(F_j) - L(D_{ij})}{2} \qquad \text{(Expression 2)} $$
A calculation formula of the independent line number L(Fi∨Fj) is shown in (Expression 3).
$$ L(F_i \vee F_j) = L(F_i) + L(F_j) - L(F_i \wedge F_j) \qquad \text{(Expression 3)} $$
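Expression 2 holds because L(Fi) + L(Fj) counts every common line twice and every line of Dij once, so subtracting L(Dij) and halving leaves the common line number. The small function below evaluates Expressions 1 to 3 from the three line counts; its name is hypothetical, and returning 1.0 when both files are empty is an assumption the text does not address.

```python
def similarity(l_fi: int, l_fj: int, l_dij: int) -> float:
    """Similarity Sij from L(Fi), L(Fj) and L(Dij) (Expressions 1 to 3)."""
    common = (l_fi + l_fj - l_dij) / 2   # Expression 2: L(Fi∧Fj)
    union = l_fi + l_fj - common         # Expression 3: L(Fi∨Fj)
    # Two empty files are treated as identical (an assumption).
    return common / union if union else 1.0   # Expression 1

print(similarity(3, 3, 2))       # 0.5  (2 common lines / 4 lines in the union)
print(similarity(100, 110, 30))  # 0.75 (90 / 120)
```

Substituting Expression 2 into Expression 3 also gives the closed form Sij = (L(Fi) + L(Fj) - L(Dij)) / (L(Fi) + L(Fj) + L(Dij)); for the second call this is 180 / 240 = 0.75.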
Fig. 10 is a drawing showing the processing of a similarity calculation unit in the first embodiment.
The similarity calculation unit 24 takes two inputs: the comparison source code Fi (41a) and the comparison destination source code Fj (41b).
In the logical line extraction processing S50, the similarity calculation unit 24 extracts the logical lines of the comparison source code Fi (41a) and of the comparison destination source code Fj (41b), respectively, by referring to the logical line definition DB 26.
In the finite difference analysis processing S51, the similarity calculation unit 24 analyzes the finite difference Dij between the comparison source code Fi (41a) and the comparison destination source code Fj (41b).
In the finite difference line number measurement processing S52, the similarity calculation unit 24 calculates the logical line number L(Dij) of the finite difference Dij between source codes.
In the common line number calculation processing S53, the similarity calculation unit 24 calculates the common line number L(Fi∧Fj) between the source codes, based on (Expression 2).
In the independent line number calculation processing S54, the similarity calculation unit 24 calculates the independent line number L(Fi∨Fj) between the source codes, based on (Expression 3).
In the between-source-codes similarity calculation processing S55, the similarity calculation unit 24 calculates the similarity Sij between source codes by dividing the common line number L(Fi∧Fj) by the independent line number L(Fi∨Fj), based on (Expression 1).
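The whole flow of Fig. 10 is condensed into the sketch below. To stay short it makes two substitutions that are assumptions, not the embodiment's method: each non-blank physical line is treated as a logical line (instead of applying the rules of the DB 26), and Python's `difflib.SequenceMatcher` replaces the Hunt and McIlroy algorithm for S51. `SequenceMatcher` does not always find a longest common subsequence, so L(Dij), and hence Sij, can differ slightly from a diff-based count on some inputs. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List

def extract_logical_lines(text: str) -> List[str]:
    """S50 stand-in: one logical line per non-blank physical line."""
    return [ln.strip() for ln in text.splitlines() if ln.strip()]

def source_similarity(text_i: str, text_j: str) -> float:
    fi, fj = extract_logical_lines(text_i), extract_logical_lines(text_j)  # S50
    ops = SequenceMatcher(None, fi, fj, autojunk=False).get_opcodes()     # S51
    # S52: every non-equal opcode contributes its lines on both sides.
    l_dij = sum((i2 - i1) + (j2 - j1)
                for tag, i1, i2, j1, j2 in ops if tag != "equal")
    common = (len(fi) + len(fj) - l_dij) / 2        # S53, Expression 2
    union = len(fi) + len(fj) - common              # S54, Expression 3
    return common / union if union else 1.0         # S55, Expression 1

print(source_similarity("x\ny\nz\n", "x\nz\nw\n"))  # 0.5
```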
Figs. 11(a) and 11(b) are drawings showing similarity evaluation results in the first embodiment and in a Comparative Example, respectively.
Fig. 11(a) is a drawing showing the similarity evaluation results 47 in the first embodiment.
Each row of the similarity evaluation results 47 corresponds to a file of the comparison source code list 44a, and each column corresponds to a file of the comparison destination source code list 44b. Each element of the matrix stores the calculated similarity Sij between the source codes.
Here, each of F01.c, F02.c and F04.c of the comparison source code list 44a has a corresponding comparison destination source code 41b in the comparison destination source code list 44b. Therefore, for each of these files, only the similarity to the corresponding comparison destination source code 41b is calculated, and the similarity to the other source codes 41 without that correspondence is not calculated.
When no source code in the comparison destination source code list 44b corresponds to a comparison source code 41a of the comparison source code list 44a, the similarity of that comparison source code 41a to all comparison destination source codes 41b of the comparison destination source code list 44b is calculated.
The (b) of the Fig. 11 is a drawing showing the similarity evaluation result 47c in
Comparative Example.
Here, the source code similarity evaluation apparatus of the Comparative Example is configured to calculate the similarity between every comparison source code Fi (41a) of the comparison source code list 44a and every comparison destination source code Fj (41b) of the comparison destination source code list 44b.
The source code similarity evaluation apparatus 10 of the present embodiment investigates the correspondence between the source codes in advance and acquires the similarity evaluation results 47 by calculating, for each source code having correspondence, only the similarity to its corresponding source code. In this way, the source code similarity evaluation apparatus 10 of the present embodiment reduces the amount of calculation and can therefore obtain the similarity evaluation results 47 in a shorter time than the Comparative Example.
Fig. 12 is a flow chart showing processing of a similarity evaluation unit in the first embodiment.
When the processing is started, in the step S60, the similarity evaluation unit 25
initializes a variable i with 1. The variable i shows the ID of the comparison source code Fi
(41a) included in the comparison source code list 44a.
In the step S61, the similarity evaluation unit 25 judges whether a correspondence destination exists for the comparison source code Fi (41a) or not, based on the correspondence list 45. The similarity evaluation unit 25 performs the processing of the step S62 if this judgment condition is satisfied (Yes), and the processing of the step S63 if it is not satisfied (No).
In the step S62, the similarity evaluation unit 25 sets the ID of the comparison destination source code Fj (41b) corresponding to the comparison source code Fi (41a) as the variable j and proceeds to the processing of the step S64. Here, the comparison destination source code Fj (41b) indicated by the variable j is the similar source code used for the similarity of the comparison source code Fi (41a) indicated by the variable i.
In the step S63, the similarity evaluation unit 25 sets the ID of the comparison destination source code Fj (41b) having the highest similarity to the comparison source code Fi (41a) as the variable j and proceeds to the processing of the step S64. Here, the comparison destination source code Fj (41b) indicated by the variable j is the "similar source code" used for the similarity of the comparison source code Fi (41a) indicated by the variable i.
In the step S64, the similarity evaluation unit 25 sets the file name of the comparison destination source code Fj (41b) as the similar file name of the comparison source code Fi (41a).
In the step S65, the similarity evaluation unit 25 sets the relative path name of the comparison destination source code Fj (41b) as the similar relative path name of the comparison source code Fi (41a).
In the step S66, the similarity evaluation unit 25 sets the similarity between the comparison source code Fi (41a) and the comparison destination source code Fj (41b) as the similarity of the comparison source code Fi (41a).
In the step S67, the similarity evaluation unit 25 judges whether the variable i is equal to or larger than the maximum value (5 in the present embodiment) of the ID of the comparison source code list 44a, or not. The similarity evaluation unit 25 ends the processing of Fig. 12 if this judgment condition is satisfied (Yes), and performs the processing of the step S68 if it is not satisfied (No).
In the step S68, the similarity evaluation unit 25 adds 1 to the variable i and returns to the processing of the step S61.
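The selection made in Fig. 12 can be sketched as follows. It assumes that the similarities calculated during the processing of Fig. 7 are available as a dictionary keyed by the (i, j) ID pair; the function and parameter names are hypothetical. Following effect (D), a source code without a correspondence destination takes the destination with the highest similarity.

```python
from typing import Dict, Tuple

def select_similar(imax: int, correspondence: Dict[int, int],
                   sim: Dict[Tuple[int, int], float]) -> Dict[int, Tuple[int, float]]:
    """For each source ID i, return (j, Sij) of its similar source code."""
    result = {}
    for i in range(1, imax + 1):                     # S60, S67, S68
        if i in correspondence:                      # S61
            j = correspondence[i]                    # S62
        else:                                        # S63: highest similarity
            j = max((jj for (ii, jj) in sim if ii == i),
                    key=lambda jj: sim[(i, jj)])
        result[i] = (j, sim[(i, j)])                 # S64 to S66
    return result

sims = {(1, 1): 0.9, (2, 1): 0.2, (2, 2): 0.7, (2, 3): 0.4}
print(select_similar(2, {1: 1}, sims))  # {1: (1, 0.9), 2: (2, 0.7)}
```

`max` raises `ValueError` if a source code has neither a correspondence nor any calculated similarity; under Fig. 7 that cannot happen, because such a source code is compared with every destination.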
Fig. 13 is a drawing showing the similarity-related information of a source code
in the first embodiment.
The similarity-related information 48 has an ID column 48-1 for identifying each comparison source code 41a, a file name column 48-2 for storing the file name information of each comparison source code 41a, a relative path name column 48-3 for storing the relative path name information of the location where each comparison source code 41a is stored, a logical line number column 48-4 for storing the logical line number (SLOC) of each comparison source code 41a, a similar file name column 48-5 for storing the file name information of the comparison destination source code 41b corresponding to the comparison source code 41a, a similar relative path name column 48-6 for storing the relative path name information of the comparison destination source code 41b corresponding to the comparison source code 41a, and a similarity column 48-7 for storing the similarity information between the comparison source code 41a and the comparison destination source code 41b.
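One row of the similarity-related information 48 of Fig. 13 maps naturally onto a record type. The field names, the 0-to-1 similarity scale and the sample values (including the 120-line size) are illustrative assumptions; only the seven columns come from the text.

```python
from dataclasses import dataclass

@dataclass
class SimilarityInfo:
    """One row of the similarity-related information 48 (Fig. 13)."""
    source_id: int               # column 48-1
    file_name: str               # column 48-2
    relative_path: str           # column 48-3
    sloc: int                    # column 48-4, logical line number
    similar_file_name: str       # column 48-5
    similar_relative_path: str   # column 48-6
    similarity: float            # column 48-7, assumed to lie in [0, 1]

row = SimilarityInfo(1, "F01.c", "D01", 120, "F01.c", "D01", 0.95)
print(f"{row.file_name}: {row.similarity:.0%} similar to "
      f"{row.similar_relative_path}/{row.similar_file_name}")
# F01.c: 95% similar to D01/F01.c
```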
Fig. 14 is a similarity relation drawing in the first embodiment.
The similarity relation drawing 60 displays the similarity-related information 48 on the output unit 15 of the user interface 19. The similarity relation drawing 60 is represented by a tree structure in which the root directory is the root node, each directory is an intermediate node, and each file is a leaf node.
In the similarity relation drawing 60, a root 61 labeled "root" is represented as the root node. From the root 61, which is the root node, a folder 62-1 labeled "D01" and a folder 62-2 labeled "D02" branch to the right as intermediate nodes.
From the folder 62-1, which is an intermediate node, a file 63-1 labeled "F01.c" and a file 63-2 labeled "F02.c" branch to the right as leaf nodes.
From the folder 62-2, which is an intermediate node, a file 63-3 labeled "F04.c", a file 63-4 labeled "F06.c" and a file 63-5 labeled "F07.c" branch to the right as leaf nodes. Hereafter, when the files 63-1 to 63-5 are not particularly distinguished, they are referred to simply as the file 63.
To the right of the files 63-1 to 63-5, which are the leaf nodes, rotated bar graphs 64-1 to 64-5 are shown, each showing the logical line number and the similarity of the corresponding source code. Hereafter, when the rotated bar graphs 64-1 to 64-5 are not particularly distinguished, they are referred to simply as the rotated bar graph 64.
A legend 65 is shown at the bottom of the similarity relation drawing 60. The legend 65 shows a "clone part" in white and an "original part" in gray.
The length of the rotated bar graph 64 represents the logical line number of the source code of the corresponding file 63. Each rotated bar graph 64 is color-coded in two colors, white and gray. However, the rotated bar graph 64 is not limited to this and may be color-coded by any combination of two or more colors.
The similarity between the file 63 and the comparison destination source code 41b that most resembles that file 63 is shown by the area ratio of the "clone part" to the whole area of the rotated bar graph 64. For example, when the similarity of the file 63-1 is 95%, the area ratio of the "clone part" to the whole area of the rotated bar graph 64-1 is 95%.
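A plain-text rendering of the similarity relation drawing 60 illustrates how Fig. 14 combines the folder tree with the bar graphs. This is a hypothetical sketch: "#" stands for the white clone part, "." for the gray original part, one character represents `scale` logical lines, and deeper folder paths are shown as a single folder label. The sample sizes and similarities are invented.

```python
from typing import List, Tuple

def render_tree(rows: List[Tuple[str, int, float]], scale: int = 10) -> str:
    """Render (relative path, logical line number, similarity) rows as a tree.

    Bar length is sloc // scale characters; the "#" share of the bar
    approximates the similarity (clone part), "." is the original part.
    """
    lines = ["root"]                                   # root node
    current = None
    for path, sloc, sim in sorted(rows):
        folder, _, name = path.rpartition("/")
        if folder != current:                          # intermediate node
            lines.append("  " + (folder or "."))
            current = folder
        width = sloc // scale
        clone = round(width * sim)
        lines.append(f"    {name:<8} {'#' * clone}{'.' * (width - clone)}")  # leaf
    return "\n".join(lines)

print(render_tree([("D01/F01.c", 200, 0.95), ("D01/F02.c", 100, 0.5),
                   ("D02/F04.c", 50, 0.0)]))
```

In this output the bar for F01.c is 20 characters long with 19 of them "#", mirroring the 95% example above.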
In this way, a user can easily get an overview of the similarity between two pieces of software, each composed of a plurality of source code groups, and can therefore easily discover software that is a code clone. Furthermore, the user can easily discover a source code having many altered places and cope with troubles accompanying those altered places.
Fig. 15 is a flow chart showing the processing of a similar specifications analysis unit in the first embodiment.
When the processing is started, in the step S70, the similar specifications analysis unit 30 initializes a variable i with 1. The variable i shows the ID of the comparison source code Fi (41a) included in the comparison source code list 44a.
In the step S71, the similar specifications analysis unit 30 judges whether specifications are present for the comparison source code Fi (41a) or not. In the present embodiment, the similar specifications analysis unit 30 judges the presence of specifications by whether specifications whose file name has a predetermined relationship to the file name of the comparison source code Fi (41a) are present in the relative path of the comparison source code Fi (41a). The similar specifications analysis unit 30 performs the processing of the step S78 if this judgment condition is satisfied (Yes), and the processing of the step S72 if it is not satisfied (No).
In the step S72, the similar specifications analysis unit 30 judges whether a correspondence destination exists for the comparison source code Fi (41a) or not, based on the correspondence list 45. The similar specifications analysis unit 30 performs the processing of the step S73 if this judgment condition is satisfied (Yes), and the processing of the step S74 if it is not satisfied (No).
In the step S73, the similar specifications analysis unit 30 sets the ID of the comparison destination source code Fj (41b) corresponding to the comparison source code Fi (41a) as the variable j and proceeds to the processing of the step S75.
In the step S74, the similar specifications analysis unit 30 sets the ID of the comparison destination source code Fj (41b) having the highest similarity to the comparison source code Fi (41a) as the variable j and proceeds to the processing of the step S75.
In the step S75, the similar specifications analysis unit 30 registers the file name and similarity of the comparison destination source code Fj (41b) as the similar file name and similarity of the comparison source code Fi (41a).
In the step S76, the similar specifications analysis unit 30 registers the detailed design specifications 42b of the comparison destination source code Fj (41b) as the similar detailed design specifications of the comparison source code Fi (41a).
In the step S77, the similar specifications analysis unit 30 registers the test specifications 43b of the comparison destination source code Fj (41b) as the similar test specifications of the comparison source code Fi (41a).
In the step S78, the similar specifications analysis unit 30 judges whether the variable i is equal to or larger than the maximum value imax (5 in the present embodiment) of the ID of the comparison source code list 44a, or not. The similar specifications analysis unit 30 ends the processing of Fig. 15 if this judgment condition is satisfied (Yes), and performs the processing of the step S79 if it is not satisfied (No).
In the step S79, the similar specifications analysis unit 30 adds 1 to the variable i and returns to the processing of the step S71.
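The specification lookup of Fig. 15 is sketched below under several assumptions: whether a source code already has its own specifications (step S71) is passed in as a precomputed flag instead of being derived from file names in its relative path, similarities are given as a dictionary keyed by the (i, j) ID pair, and the specification file names are invented placeholders. The function and parameter names are hypothetical.

```python
from typing import Dict, Tuple

def similar_specifications(
    imax: int,
    has_own_specs: Dict[int, bool],
    correspondence: Dict[int, int],
    sim: Dict[Tuple[int, int], float],
    dst_specs: Dict[int, Tuple[str, str]],
) -> Dict[int, Tuple[int, float, str, str]]:
    """For each source ID i without its own specifications, return
    (j, Sij, detailed design spec, test spec) borrowed from destination j."""
    result = {}
    for i in range(1, imax + 1):                      # S70, S78, S79
        if has_own_specs.get(i, False):               # S71: Yes, go to S78
            continue
        if i in correspondence:                       # S72
            j = correspondence[i]                     # S73
        else:                                         # S74: most similar
            j = max((jj for (ii, jj) in sim if ii == i),
                    key=lambda jj: sim[(i, jj)])
        design_spec, test_spec = dst_specs[j]
        result[i] = (j, sim[(i, j)], design_spec, test_spec)  # S75 to S77
    return result

specs = {1: ("design_F01.txt", "test_F01.txt"),       # hypothetical file names
         2: ("design_F02.txt", "test_F02.txt")}
print(similar_specifications(2, {1: True}, {}, {(2, 1): 0.3, (2, 2): 0.8}, specs))
# {2: (2, 0.8, 'design_F02.txt', 'test_F02.txt')}
```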
Fig. 16 is a drawing showing information on similar specifications in the first embodiment.
The similar specifications information 49 has an ID column 49-1 for identifying the comparison source code Fi (41a), a file name column 49-2 for storing the file name information of the comparison source code Fi (41a), a detailed design specifications column 49-3 for storing the file name of the detailed design specifications 42a, a test specifications column 49-4 for storing the file name of the test specifications 43a, and a similar file name column 49-5 for storing the file name information of the most similar comparison destination source code Fj (41b).
When no detailed design specifications 42a have been associated with the comparison source code Fi (41a), the file name of the detailed design specifications 42b of the comparison destination source code Fj (41b) most similar to that comparison source code Fi (41a) is given in parentheses in the detailed design specifications column 49-3.
When no test specifications 43a have been associated with the comparison source code Fi (41a), the file name of the test specifications 43b of the comparison destination source code Fj (41b) most similar to that comparison source code Fi (41a) is given in parentheses in the test specifications column 49-4.
In the similar file name column 49-5, the similarity is additionally given in parentheses.
By means of this similar specifications information 49, the user 200 can easily acquire the detailed design specifications 42b and the test specifications 43b of the comparison destination source codes 41b contained in the comparison destination source code list 44b, together with the similarity of each comparison destination source code 41b.
(Effect of the first embodiment)
The first embodiment explained above has the following effects (A) to (G).
(A) The source code similarity evaluation unit 20 prepares the similarity relation drawing 60 by evaluating the relation between two kinds of software without requiring the correspondence of individual source codes to be designated. In this way, a user 200 can easily get an overview of the similarity between two pieces of software, each composed of a plurality of source code groups, and can therefore easily discover software that is a code clone.
(B) When the comparison source code 41a has been judged to correspond to one of the comparison destination source codes 41b of the comparison destination source code list 44b, the comparison object specifying unit 23 specifies that corresponding comparison destination source code 41b as the comparison object of the comparison source code 41a and evaluates their similarity. Because the similarity of a comparison source code 41a having correspondence needs to be evaluated only once, the amount of calculation is reduced, and the similarity evaluation of even large-scale software can be completed in a short time.
(C) When the relative path name and the file name of the comparison source code 41a coincide with the relative path name and the file name of one of the comparison destination source codes 41b of the comparison destination source code list 44b, the correspondence analysis unit 22 judges that the coincident comparison destination source code 41b corresponds to that comparison source code 41a. Because the correspondence between the comparison source code 41a and the comparison destination source code 41b is judged only from information obtained by searching the relevant folders, the amount of calculation is reduced, and the similarity evaluation of even large-scale software can be completed in a short time.
(D) When the comparison source code 41a does not correspond to any comparison destination source code 41b of the comparison destination source code list 44b, the output unit 15 displays the highest of the similarities to all comparison destination source codes 41b of the comparison destination source code list 44b. In this way, even when the file name of the comparison source code 41a, which is altered software, has been changed, the similarity can be displayed by selecting the most appropriate comparison destination source code 41b.
(E) The output unit 15 displays a tree diagram in which the root folder of the comparison source software storage part 40a, where the comparison source code 41a is stored, is the root node; each folder composing the relative path of the comparison source code 41a is an internal node; and each similarity of the comparison source code 41a is a leaf node. In this way, the relation between the folder composition of the comparison source software storage part 40a and the comparison destination of each source code file contained in each folder can be displayed in a way that is easy for the user 200 to understand.
(F) In the similarity relation drawing 60, the source code similarity evaluation unit 20 displays the logical line number of the comparison source code 41a by the area of the rotated bar graph 64, and displays the similarity between the comparison source code 41a and the comparison destination source code 41b having the highest similarity by the area ratio of one of the colors of the color-coded rotated bar graph 64. In this way, the user 200 can easily discover a comparison source code 41a having many altered places and can cope with troubles accompanying those altered places.
(G) The similar specifications analysis unit 30 prepares the similar specifications information 49 automatically. In this way, the user 200 can easily acquire the detailed design specifications 42b and the test specifications 43b of the comparison destination source codes 41b contained in the comparison destination source code list 44b, together with the similarity of each comparison destination source code 41b. From the similarity of the comparison destination source code 41b, the user 200 can further judge to what extent the detailed design specifications 42b and the test specifications 43b can be reused.
(Modified Examples)
The present invention is not limited to the above embodiment and may be implemented with alterations within a range not departing from the gist of the present invention, including, for example, the following (a) to (e).
(a) The source code similarity evaluation apparatus 10 of the present embodiment analyzes software (projects) present in the file database 12. However, without being limited to this, the source code similarity evaluation apparatus 10 may analyze software present in the external file database 110, or may compare and analyze software present in the file database 12 with software present in the external file database 110.
(b) The source code similarity evaluation apparatus 10 of the present embodiment analyzes two kinds of software (projects) present respectively in two designated folders. However, without being limited to this, the source code similarity evaluation apparatus 10 may analyze two kinds of software (projects) relating to two designated project files.
(c) The source code similarity evaluation apparatus 10 of the present embodiment judges the presence of specifications by whether specifications whose file name has a predetermined relationship to the file name of the comparison source code Fi (41a) are present in the relative path of the comparison source code Fi (41a). However, without being limited to this, the source code similarity evaluation apparatus 10 may judge the presence of specifications by whether specifications that have been associated with the comparison source code Fi (41a) are present in the project file.
(d) The output unit 15 of the source code similarity evaluation apparatus 10 represents the similarity between the comparison source code 41a and the similar source code by the area ratio of one of the colors of the color-coded rotated bar graph 64. However, without being limited to this, the output unit 15 may represent this similarity by a color, by white/black shading or by a pattern of the rotated bar graph 64, and may also represent it by any graph type other than the rotated bar graph 64 (a pie chart, scatter diagram, radar chart, bubble chart or the like).
(e) The source code similarity evaluation apparatus 10 of the present invention
may be realized as a computer application program, and the relevant application program may be
recorded in a computer readable memory medium so as to provide it to a user.

CLAIMS:
1. A source code similarity evaluation apparatus (10), comprising:
a source code list extraction unit (21) which extracts a comparison source code
list (44a) showing composition of comparison source software, and a comparison destination
source code list (44b) showing composition of comparison destination software;
a correspondence analysis unit (22) which compares the comparison source code
list (44a) and the comparison destination source code list (44b) to analyze correspondence (45)
of both;
a comparison object specifying unit (23) which specifies a comparison destination
source code (41b), which is a comparison object of each comparison source code (41a) contained
in the comparison source code list (44a), from the comparison destination source code list (44b),
based on the correspondence (45);
a similarity calculation unit (24) which calculates a similarity between the
comparison source code (41a) and the comparison destination source code (41b) specified by the
comparison object specifying unit (23);
a similarity evaluation unit (25) which judges a similar source code (41)
possessing the highest similarity among the comparison destination source codes (41b) specified
by the comparison object specifying unit (23); and
an output unit (15) which outputs the comparison source code (41a) and the similarity
between the comparison source code (41a) and the similar source code (41) in association
with each other.
2. The source code similarity evaluation apparatus (10) according to claim 1,
wherein:
the comparison object specifying unit (23), in the case where the comparison
source code (41a) has been judged to have the correspondence to any of the comparison
destination source codes (41b) of the comparison destination source code list (44b), specifies the
comparison destination source code (41b) having the relevant correspondence as a comparison
object, and, in the case where the comparison source code (41a) has been judged not to have the
correspondence to any of the comparison destination source codes (41b) of the comparison
destination source code list (44b), specifies all of the comparison destination source codes (41b)
of the comparison destination source code list (44b) as comparison objects.
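The selection rule of claim 2 can be sketched as follows, assuming the correspondence (45) is held as a dictionary from a comparison source path to its corresponding comparison destination path; this representation and the function name are hypothetical.

```python
from typing import Dict, List


def specify_comparison_objects(source_path: str,
                               destination_list: List[str],
                               correspondence: Dict[str, str]) -> List[str]:
    """Select the comparison destination source codes for one source file.

    `correspondence` is an assumed stand-in for the correspondence (45):
    comparison source path -> corresponding comparison destination path.
    """
    if source_path in correspondence:
        # A correspondence exists: compare only against that one file.
        return [correspondence[source_path]]
    # No correspondence: every destination file becomes a comparison object.
    return list(destination_list)
```

Falling back to the whole destination list means a file that was renamed or moved is still compared against every candidate, at the cost of more similarity calculations.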
3. The source code similarity evaluation apparatus (10) according to claim 2,
wherein:
the source code list extraction unit (21) further extracts the comparison source
code list (44a) by acquiring a relative path name and a file name of each comparison source code
(41a) composing the comparison source software, and extracts the comparison destination
source code list (44b) by acquiring a relative path name and a file name of each comparison
destination source code (41b) composing the comparison destination software; and
the correspondence analysis unit (22), in the case where the relative path name and the file name
of a comparison source code (41a) coincide with the relative path name and the file name of
any of the comparison destination source codes (41b) of the comparison destination source code
list (44b), judges that the coincident comparison destination source code (41b) has the
correspondence to the relevant comparison source code (41a).
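A minimal sketch of claim 3, assuming source files are found by walking a directory tree and recognized by file extension (the `.c`/`.h` filter is an assumption; the claim does not say how source files are identified). Each list entry is a (relative path name, file name) pair, and a correspondence is recorded only when both parts coincide. Function names are hypothetical.

```python
import os
from typing import Dict, List, Tuple

# A list entry: (relative path name of the containing folder, file name).
Entry = Tuple[str, str]


def extract_source_code_list(root: str,
                             extensions: Tuple[str, ...] = (".c", ".h")
                             ) -> List[Entry]:
    """Record the relative path name and file name of each source file."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Folder path relative to the software's root folder ("." at the top).
        rel_dir = os.path.relpath(dirpath, root)
        for name in filenames:
            if name.endswith(extensions):  # assumed source-file filter
                entries.append((rel_dir, name))
    return sorted(entries)


def analyze_correspondence(source_list: List[Entry],
                           destination_list: List[Entry]) -> Dict[Entry, Entry]:
    """Pair entries whose relative path name and file name both coincide."""
    destination = set(destination_list)
    return {entry: entry for entry in source_list if entry in destination}
```

A file moved to another folder therefore has no correspondence under this rule, and claim 2 then compares it against all destination files.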
4. The source code similarity evaluation apparatus (10) according to claim 1,
wherein:
in the case where the comparison source code (41a) has the correspondence to any of the
comparison destination source codes (41b) of the comparison destination source code list (44b),
the similarity output by the output unit (15) is the similarity with the comparison destination
source code (41b) having the correspondence.
5. The source code similarity evaluation apparatus (10) according to claim 1,
wherein:
in the case where the comparison source code (41a) does not have the correspondence to any
of the comparison destination source codes (41b) of the comparison destination source code
list (44b), the similarity output by the output unit (15) is the highest of the similarities with all
of the comparison destination source codes (41b) of the comparison destination source code
list (44b).
6. The source code similarity evaluation apparatus (10) according to claim 1,
wherein:
the output unit (15) displays a tree diagram in which a root folder (62) of the
comparison source software is the root node; each folder located in the relative path of
the comparison source code (41a) is an inner node; and the comparison source code (41a),
together with the similarity between the comparison source code (41a) and the similar source
code (41), is a leaf node.
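The tree diagram of claim 6 could be rendered as indented text along these lines. This sketch assumes `/`-separated relative file paths and shows each similarity as a percentage next to the file name; the function names and the text layout are illustrative only.

```python
from typing import Dict, List, Tuple


def build_tree(results: List[Tuple[str, float]]) -> Dict:
    """Nest (relative file path, similarity) pairs into a folder tree."""
    tree: Dict = {}
    for rel_path, score in results:
        *folders, file_name = rel_path.split("/")  # assumed "/" separator
        node = tree
        for folder in folders:
            # Each folder on the relative path becomes an inner node.
            node = node.setdefault(folder + "/", {})
        # Leaf node: the file name together with its similarity.
        node[f"{file_name} ({score:.0%})"] = None
    return tree


def render(tree: Dict, root_folder: str = "root/") -> List[str]:
    """Return the tree as indented lines, with the root folder as root node."""
    lines = [root_folder]

    def walk(node: Dict, depth: int) -> None:
        for label, child in node.items():
            lines.append("  " * depth + label)
            if child is not None:  # None marks a leaf
                walk(child, depth + 1)

    walk(tree, 1)
    return lines
```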
7. The source code similarity evaluation apparatus (10) according to claim 1,
wherein:
the output unit (15) displays the similarity between the comparison source code
(41a) and the similar source code (41) as the area ratio of one color in a color-coded bar graph,
and displays the number of logical lines of the comparison source code (41a) as the total area
of the bar graph.
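Claim 7 encodes two quantities in one bar. A text-mode approximation, assuming `#` stands for the color of the similar portion, `.` for the other color, and a hypothetical scale of ten logical lines per cell; the claims do not state how logical lines are counted, so the count is taken as an input.

```python
def similarity_bar(logical_lines: int, similarity: float,
                   lines_per_cell: int = 10) -> str:
    """Bar length tracks the logical line count; '#' share tracks similarity.

    `lines_per_cell` is an assumed display scale; at least one cell is drawn.
    """
    total = max(1, round(logical_lines / lines_per_cell))
    similar = round(total * similarity)  # cells drawn in the "similar" color
    return "#" * similar + "." * (total - similar)
```

For example, `similarity_bar(100, 0.5)` gives `#####.....`: ten cells for 100 logical lines, half of them in the similar color.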
8. The source code similarity evaluation apparatus (10) according to claim 1, further
comprising a similar specifications analysis unit (30) which, in the case where no specifications
are associated with the comparison source code (41a), provides the specifications of the similar
source code (41) as similar specifications.
9. A source code similarity evaluation method, comprising:
a step which extracts a comparison source code list (44a) showing composition of
comparison source software, and a comparison destination source code list (44b) showing
composition of comparison destination software;
a step which compares the comparison source code list (44a) and the comparison
destination source code list (44b) to analyze the correspondence between them;
a step which specifies a comparison destination source code (41b), which is a
comparison object of each comparison source code (41a) contained in the comparison source
code list (44a), from the comparison destination source code list (44b), based on the
correspondence;
a step which calculates a similarity between the comparison source code (41a) and
the comparison destination source code (41b) specified by the comparison object specifying unit
(23);
a step which judges a similar source code having the highest similarity among the
comparison destination source codes (41b) specified by the comparison object specifying unit
(23); and
a step which outputs the comparison source code (41a) and the similarity between the
comparison source code (41a) and the similar source code in association with each other.
10. A source code similarity evaluation apparatus, substantially as herein described
with reference to the accompanying drawings and examples.
11. A source code similarity evaluation method, substantially as herein described with
reference to the accompanying drawings and examples.
