Abstract: The present invention relates to a process analysis device for analyzing a process executed by an information processing device and extracting cipher logic, such as a cipher function or decoding function, utilized in the process. The process analysis device is provided with: an execution trace acquisition unit for acquiring the execution trace of the process to be analyzed; a block extraction unit for extracting from the execution trace a block that is a processing unit indicating a loop structure; a block information extraction unit for extracting block information including input information and output information; and a block information analysis unit for generating characteristic determination information for determining the characteristic of the input/output relation of a block by using the input information or output information of the block information, analyzing the input/output relation of the block by using the characteristic determination information, and determining that a block indicating the characteristic of the input/output relation of the cipher function or decoding function is cipher logic.
DESCRIPTION
Title of Invention
PROCESS ANALYSIS APPARATUS, PROCESS ANALYSIS METHOD, AND PROCESS ANALYSIS PROGRAM
Technical Field
[0001] The present invention relates to a process analysis apparatus which analyzes a process executed in an information processing unit and extracts encryption logic, such as an encryption function or a decryption function, used in the process.
Background Art
[0002] A "targeted attack", called Advanced Persistent Threat (APT), has recently become noticeable as a new security threat that targets a specific organization and attacks it persistently. An APT infects a terminal of the targeted organization with malware through email, and the infecting malware communicates with the attacker's external server to download new attack programs or to transmit confidential information held in the organization's system. To detect such a security incident at an early stage and prevent damage from spreading, a "Security Operation Center" (SOC) service is needed to monitor various logs in network devices and detect suspicious signs. If an incident is detected, the organization has to carry out an incident response including investigation into the cause of the incident and the damage, study of countermeasures, restoration of the service, implementation of measures to prevent recurrence, and the like. Furthermore, depending on its clients or business partners, the organization also needs to clarify which confidential information has been leaked and which has not.
[0003] Network forensics plays an important role when the organization investigates the cause of the incident and the damage. Network forensics analyzes a log generated by a personal computer, a server, a network device or the like, or packets recorded on a network, and investigates the intrusion route of malware, infected terminals, accessed information, the attacker's commands, information transmitted outside, and the like. These days, however, malware uses cryptographic technologies to keep its communications secret. It has therefore become difficult to trace and identify the commands transmitted from an attacker and the information transmitted outside, even if the organization implements network forensics.
[0004] To address this issue, the encryption logic and key that the malware has used to keep the communication secret need to be identified so that the encrypted communication can be decrypted. Usually, this requires analyzing the binary of the malware program. Existing encryption logic extraction methods mostly specify the encryption logic and key by searching the execution trace obtained when the malware is executed for a typical characteristic of encryption logic, as in the malware analysis system disclosed in Patent Document 1, for example. Among binary analysis technologies for malware programs, the technologies disclosed in Non-Patent Documents 1 through 9 are known.
Citation List
Patent Literature
[0005] Patent Document 1: JP 2013-114637 A
Non-Patent Literature
[0006] Non-Patent Document 1: Noe Lutz, Towards Revealing Attacker's Intent by Automatically Decrypting Network Traffic, Master Thesis MA-2008-08.
Non-Patent Document 2: Zhi Wang, Xuxian Jiang, Weidong Cui, Xinyuan Wang and Mike Grace, ReFormat: automatic reverse engineering of encrypted messages, Proceedings of the 14th European Conference on Research in Computer Security.
Non-Patent Document 3: Felix Matenaar, Andre Wichmann, Felix Leder and Elmar Gerhards-Padilla, CIS: The Crypto Intelligence System for Automatic Detection and Localization of Cryptographic Functions in Current Malware, Proceedings of the 7th International Conference on Malicious and Unwanted Software (Malware 2012).
Non-Patent Document 4: Xin Li, Xinyuan Wang, Wentao Chang, CipherXRay: Exposing Cryptographic Operations and Transient Secrets from Monitored Binary Execution, IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (preprint) 2012.
Non-Patent Document 5: Felix Grobert, Carsten Willems, and Thorsten Holz, Automated Identification of Cryptographic Primitives in Binary Programs, Proceedings of the 14th International Conference on Recent Advances in Intrusion Detection.
Non-Patent Document 6: Joan Calvet, Jose M. Fernandez, Jean-Yves Marion, Aligot: Cryptographic Function Identification in Obfuscated Binary Programs, Proceedings of the 19th ACM Conference on Computer and Communications Security, CCS 2012.
Non-Patent Document 7: Intel, Pin - A Dynamic Binary Instrumentation Tool, https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool
Non-Patent Document 8: Bitblaze, TEMU: The BitBlaze Dynamic Analysis Component, http://bitblaze.cs.berkeley.edu/temu.html
Non-Patent Document 9: Jordi Tubella and Antonio Gonzalez, Control Speculation in Multithreaded Processors through Dynamic Loop Detection, In Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, pp.14-23, 1998.
Summary of Invention
Technical Problem
[0007] In the conventional technology typified by Patent Document 1, many irrelevant types of logic are extracted as encryption logic candidates. The problem is that the malware analyst has to weed out the irrelevant logic manually, which requires a great deal of time and effort. There is therefore a need for a highly accurate encryption logic extraction technology that suppresses the extraction of irrelevant logic.
[0008] The present invention is directed to solving problems such as that described above. An objective of the invention is to specify, with accuracy, encryption logic used by malware, by analyzing the execution trace of the malware, based on the characteristic of the encryption logic used by the malware for encrypting files and communications.
Solution to Problem
[0009] To solve the problems described above, a process analysis apparatus of the present invention may include:
an execution trace acquisition section to acquire an execution trace of a process to be analyzed;
a block extraction section to extract, from the execution trace, a block that is a processing unit indicating a loop structure;
a block information extraction section to extract, from the block, block information including input information and output information; and
a block information analysis section to:
generate characteristic determination information for determining a characteristic of an input/output relation of the block, using one of the input information and the output information of the block information;
analyze the input/output relation of the block, using the characteristic determination information; and
determine the block which indicates a characteristic of an input/output relation of one of an encryption function and a decryption function, as encryption logic.
Advantageous Effects of Invention
[0010] The present invention has an advantageous effect of specifying, with accuracy, encryption logic used by malware, by generating characteristic determination information for determining the characteristic of the input/output relation of a block extracted from the execution trace; analyzing the input/output relation of the block, using the characteristic determination information; and determining a block indicating the characteristic of an input/output relation of an encryption function or a decryption function, as encryption logic.
Brief Description of Drawings
[0011] Fig. 1 is a configuration diagram illustrating an example of the configuration of a process analysis apparatus according to a first embodiment.
Fig. 2 is a flow chart illustrating a flow of process analysis by the process analysis apparatus according to the first embodiment.
Fig. 3 is a diagram illustrating an example of a definition list 105.
Fig. 4 is a diagram illustrating an example of a data format in which input information and output information are stored.
Fig. 5 is a diagram illustrating a characteristic of a printable character string included in input/output information of an encryption function.
Fig. 6 is a configuration diagram illustrating an example of the configuration of a process analysis apparatus according to a first working example.
Fig. 7 is a flow chart illustrating a flow of character-string rate determination by a character-string rate determination section 160 according to the first working example.
Fig. 8 is a diagram illustrating an example of a character code table stored in a character code table DB 162.
Fig. 9 is a diagram illustrating an example (1) of how to use an encryption function of malware.
Fig. 10 is a configuration diagram illustrating an example of the configuration of a process analysis apparatus according to a second working example.
Fig. 11 is a flow chart illustrating a flow of decoding determination by a data decoding section 170 according to the second working example.
Fig. 12 is a diagram illustrating an example (2) of how to use an encryption function of malware.
Fig. 13 is a configuration diagram illustrating an example of the configuration of a process analysis apparatus according to a third working example.
Fig. 14 is a flow chart illustrating a flow of data decompression determination by a data decompression section 180 according to the third working example.
Fig. 15 is a diagram illustrating a basic definition of cryptography.
Fig. 16 is a configuration diagram illustrating an example of the configuration of a process analysis apparatus according to a fourth working example.
Fig. 17 is a flow chart illustrating a flow of virtual execution determination by a virtual execution section 190 according to the fourth working example.
Fig. 18 is a flow chart illustrating a flow (first half) of virtual execution analysis by the virtual execution section 190 according to the fourth working example.
Fig. 19 is a flow chart illustrating a flow (last half) of the virtual execution analysis by the virtual execution section 190 according to the fourth working example.
Description of Embodiments
[0012] First Embodiment
Fig. 1 is a configuration diagram illustrating an example of the configuration of a process analysis apparatus according to a first embodiment.
Referring to Fig. 1, a process analysis apparatus 100 includes an execution trace acquisition section 110, a block extraction section 120, a block information extraction section 130, a block information analysis section 140 and an analysis result output section 150.
[0013] The process analysis apparatus 100 is a device for binary analysis of a malware program. The process analysis apparatus 100 is, for example, a computer in which a CPU (Central Processing Unit) is connected via a bus to hardware devices such as a ROM, a RAM, a communication board, a display, a keyboard, a mouse, a magnetic disk device, and the like. The process analysis apparatus 100 is also provided with a virtual machine on the CPU, which provides an execution environment for executing a malware program.
[0014] The execution trace acquisition section 110 executes an execution file 101 to be analyzed in the execution environment of the virtual machine, and acquires an execution trace 102, which is the log information of the executed process, and process information 103, which is various types of information on the executed process.
[0015] The block extraction section 120 extracts blocks, which are basic composition elements of a program, from the execution trace 102 acquired by the execution trace acquisition section 110, and outputs a block list 104 which is a list of the extracted blocks. The block extraction section 120 also extracts information required for block information analysis, which will be described later, from each row of the execution trace 102, and outputs a definition list 105.
[0016] The block information extraction section 130 extracts block information, including the input/output information handled in the block, from the execution trace 102, the block list 104 and the definition list 105, and then outputs a block information list 106.
[0017] The block information analysis section 140 analyzes whether or not the block to be analyzed is encryption logic, using the block information list 106 outputted by the block information extraction section 130, and outputs an analysis result list 107.
[0018] The analysis result output section 150 outputs the content of the analysis result list 107 obtained through analysis by the block information analysis section 140 to a display, for example, for the analyst to see.
[0019] Process analysis by the process analysis apparatus 100 is discussed below with reference to the flow chart of Fig. 2.
Fig. 2 is a flow chart illustrating a flow of process analysis by the process analysis apparatus according to the first embodiment.
[0020] First, in Step S110, the execution trace acquisition section 110 executes the program of the execution file 101 to be analyzed, in the analysis environment on the virtual machine or the like. The execution trace acquisition section 110 monitors the process of the executed program, and records the execution trace 102 of the process. The following pieces of information are recorded in the execution trace 102, for example:
- Address of the executed command;
- Executed command (operation code, operand);
- Accessed register and the value thereof; and
- Address, value and mode (READ/WRITE) of the accessed memory.
[0021] Methods for acquiring the execution trace include, for example, a method using a dynamic instrumentation tool such as Pin, described in Non-Patent Document 7, and a method using an emulator such as TEMU, described in Non-Patent Document 8. The execution trace acquisition section 110 acquires the execution trace based on any of those existing methods.
[0022] At the same time as acquiring the execution trace 102, the execution trace acquisition section 110 extracts, as the process information 103, information about the DLLs and functions that have been loaded into the process for which the execution trace 102 has been acquired. The following pieces of information are recorded in the process information 103, for example:
- Base address of the process;
- Name, address and size of each DLL that has been loaded into the process; and
- Name and address of each API that has been exported from a DLL.
A typical and practical example of the process information 103 is the PE header of the process that has been loaded into the memory.
[0023] In Step S120, the block extraction section 120 extracts, from the execution trace 102, a block which is a basic composition element of the program. Herein, the block is a function, a loop, loops in concatenation, or the like, and is provided with the following pieces of information:
- Block ID;
- Block type;
- Block beginning address;
- Block end address; and
- In-block command string (acquired from a memory image of the process).
The block extraction section 120 manages the above block information for the extracted blocks, as the block list 104.
[0024] Each piece of the information representing the block is described below.
As the block ID, a value that is unique within the block list is set. As the block type, the outermost logic (function, loop, loops in concatenation) making up the block is set. The block beginning address indicates the number identifying the location, in the memory used by the process, at which the block starts. The block end address indicates the number identifying the location, in the memory used by the process, at which the block ends. The in-block command string is the command sequence within the range from the beginning address to the end address in the memory used by the process.
[0025] The block extraction section 120 specifies a function by tracing, in the execution trace 102, the relation between a function call command such as "call" and a return command such as "ret". The block extraction section 120 also specifies a loop by tracing, in the execution trace, repetitions of a command pattern and backward jumps. The block extraction section 120 also specifies loops in concatenation by tracing, in the execution trace, the input/output relation between loops. With regard to the extraction of the block list 104, the technology disclosed in Non-Patent Document 5, 6 or 9 may be used, for example.
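As a rough illustration of the loop-tracing idea in paragraph [0025], the following Python sketch flags a loop candidate whenever the trace jumps back to an address at or below the current one, and the same jump repeats. The trace is simplified to a plain list of executed command addresses, the names `find_backward_jumps` and `loop_candidates` are hypothetical, and call/return transitions are assumed to have been filtered out beforehand. The methods of Non-Patent Documents 5, 6 and 9 are considerably more elaborate.

```python
from collections import Counter


def find_backward_jumps(addresses):
    """Count backward jumps in a list of executed command addresses.

    A transition counts as a backward jump when the next executed address
    is not higher than the current one. Keys are (target, source) pairs.
    Assumption: call/ret transitions were removed from the list beforehand.
    """
    jumps = Counter()
    for current, following in zip(addresses, addresses[1:]):
        if following <= current:
            jumps[(following, current)] += 1
    return jumps


def loop_candidates(addresses, min_repetitions=2):
    """Return (begin, end) address pairs whose backward jump repeated
    at least min_repetitions times (an illustrative threshold)."""
    jumps = find_backward_jumps(addresses)
    return sorted(pair for pair, count in jumps.items() if count >= min_repetitions)


if __name__ == "__main__":
    # Three passes over the commands at 0x10, 0x12, 0x14, then fall-through.
    trace = [0x10, 0x12, 0x14, 0x10, 0x12, 0x14, 0x10, 0x12, 0x14, 0x16]
    for begin, end in loop_candidates(trace):
        print(f"loop candidate: {begin:#x}..{end:#x}")
```

The pair returned for each candidate corresponds to the block beginning address and block end address of paragraph [0024].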
[0026] Further, in Step S120, the definition list 105 is generated as the information required in the steps discussed later.
Fig. 3 is a diagram illustrating an example of the definition list 105.
Referring to Fig. 3, the definition list 105 is a table of the following pieces of information recorded while the block extraction section 120 is reading the execution trace 102 row by row:
- Row number of the execution trace;
- Address, in the same row, at which a command has been executed;
- Storage area (register, memory), in the same row, in which a change has been made;
- New value; and
- Value size.
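The five columns listed above can be pictured as a simple record type. The following Python sketch is only an assumed in-memory representation: the `Definition` class, the `build_definition_list` function, and the simplified trace format (one `(address, writes)` pair per row) are hypothetical and are not taken from Fig. 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    row: int        # row number of the execution trace
    address: int    # address at which the command in that row was executed
    storage: str    # changed storage area, e.g. "eax" or "mem:0x1000"
    value: bytes    # new value
    size: int       # value size in bytes


def build_definition_list(trace):
    """Read the trace row by row and record every change to a storage area.

    trace: iterable of (address, writes), where writes is a list of
    (storage, new_value_bytes) pairs. This input format is an assumption.
    """
    definitions = []
    for row, (address, writes) in enumerate(trace, start=1):
        for storage, value in writes:
            definitions.append(Definition(row, address, storage, value, len(value)))
    return definitions


if __name__ == "__main__":
    trace = [
        (0x401000, [("eax", b"\x01\x00\x00\x00")]),
        (0x401002, []),                      # a row that changes nothing
        (0x401004, [("mem:0x1000", b"A")]),
    ]
    for definition in build_definition_list(trace):
        print(definition)
```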
[0027] In Step S130, the block information extraction section 130 extracts block information from the execution trace 102 and the block (block list 104, definition list 105), and outputs the block information list 106. In the block information list 106, herein, block information including the following pieces of information is registered as an element:
- Block ID;
- Input information;
- Output information; and
- Context.
[0028] Each piece of the information representing the block information is described below.
The block ID is the information for association with a block registered in the block list 104.
The input information is the information that satisfies the following conditions, in the execution trace 102:
- Information defined prior to execution of a block; and
- Information read prior to overwriting, during execution of the block.
The output information is the information that satisfies the following condition, in the execution trace 102:
- Last information written into the storage area (register or memory) during execution of the block.
The context is the information for indicating the timing at which a block has been executed in the execution trace 102.
[0029] Extraction of the input information, the output information, and the context is described below in detail.
[0030] First, the input information is extracted as follows.
The block information extraction section 130 analyzes the execution trace 102 row by row. Assume that the command address of the row of the execution trace 102 being examined falls within the range from the beginning address to the end address of a block B1 registered in the block list 104. The block information extraction section 130 continues to analyze the execution trace 102. Assume that a command has been executed at an address X within the range of the block B1 and that the command has read a specific storage area (READ). The block information extraction section 130 analyzes the definition list 105 to confirm whether the specific storage area has been written (WRITE) by a command executed at an address before the address X within the range of the block B1. When no such WRITE has been executed, the block information extraction section 130 determines the specific storage area as the input information.
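The read-before-write rule of paragraph [0030] can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the accesses of one block execution are given as an ordered list of `(mode, storage, value)` tuples, a write "before address X" is approximated by a write that occurs earlier in that list, and the name `extract_inputs` is hypothetical.

```python
def extract_inputs(accesses):
    """Return the storage areas that a block reads before writing them.

    accesses: ordered list of (mode, storage, value) for one execution of
    the block, where mode is "R" (READ) or "W" (WRITE). Assumed format.
    """
    written = set()   # storage areas already written inside the block
    inputs = {}       # storage area -> value seen at the first read
    for mode, storage, value in accesses:
        if mode == "W":
            written.add(storage)
        elif mode == "R" and storage not in written and storage not in inputs:
            # Read without a prior in-block write: defined before the block.
            inputs[storage] = value
    return inputs


if __name__ == "__main__":
    accesses = [
        ("R", "mem:0x1000", 0x48),  # input: never written inside the block
        ("W", "eax", 0x48),
        ("R", "eax", 0x48),         # not an input: written just above
        ("W", "mem:0x2000", 0x9A),
        ("R", "mem:0x1001", 0x65),  # input
    ]
    print(extract_inputs(accesses))
```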
[0031] When READ has been executed for an adjacent memory area by the same command at the same address, in the execution trace 102, it is highly likely that the adjacent memory area has been accessed as a buffer, and therefore the adjacent memory area is also determined as the input information. In other words, the input information includes the beginning address and size of the adjacent memory area, and the byte sequence stored in the adjacent memory area. The type of the input information, "buffer", is also recorded. With regard to the input information which is obtained by executing READ by the command at the same address, and whose value has been incremented or decremented, the type of the input information is determined as "counter". The type of the input information which is used as a loop-end condition in a loop, or used as an initial value of a counter, is determined as "end condition".
[0032] The output information is extracted as follows.
Assume that the block information extraction section 130 is analyzing the execution trace 102 of a block B1 and that, as the analysis proceeds, the trace goes beyond the range of the block B1. In this case, the block information extraction section 130 determines, by analyzing the definition list 105, the information written (WRITE) by commands within the range of the block as the output information. When WRITE has been executed multiple times for the same storage area, the one with the greatest row number in the execution trace 102, that is, the latest one, is determined as the output information.
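The "latest write wins" rule of paragraph [0032] might be sketched as follows in Python. The input is assumed to be the definition-list entries that fall inside the block, reduced to `(row, storage, value)` tuples; the function name `extract_outputs` is hypothetical.

```python
def extract_outputs(definitions):
    """Keep, for each storage area, the value of the write with the
    greatest row number in the execution trace.

    definitions: iterable of (row, storage, value) tuples for writes made
    by commands within the range of the block. Assumed format.
    """
    latest = {}  # storage area -> (row, value) of the latest write so far
    for row, storage, value in definitions:
        if storage not in latest or row > latest[storage][0]:
            latest[storage] = (row, value)
    return {storage: value for storage, (_row, value) in latest.items()}


if __name__ == "__main__":
    definitions = [
        (10, "mem:0x2000", 0x00),
        (12, "eax", 0x05),
        (15, "mem:0x2000", 0x9A),  # overwrites the value written in row 10
    ]
    print(extract_outputs(definitions))
```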
[0033] Similarly to the input information, when an adjacent memory area has been written by executing WRITE by the same command at the same address, in the execution trace 102, it is highly likely that the adjacent memory area has been accessed as a buffer. Therefore, the adjacent memory area is also determined as the output information. In other words, the output information includes the beginning address and size of the adjacent memory area, and the byte sequence stored in the adjacent memory area. The type of the output information, "buffer", is also recorded. With regard to the output information which is written by executing WRITE by the command at the same address, and whose value has been incremented or decremented, the type of the output information is determined as "counter". The type of the output information that is used as a loop-end condition in a loop, or used as the initial value of a counter, is determined as "end condition".
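The grouping of adjacent memory accesses into a "buffer" applies to input and output information alike. A minimal Python sketch is given below, assuming that the bytes accessed by one command have already been collected into a mapping from memory address to byte value; the function name `group_buffers` and this input format are hypothetical.

```python
def group_buffers(byte_accesses):
    """Merge byte accesses at consecutive addresses into buffers.

    byte_accesses: dict mapping a memory address to the byte value (0-255)
    accessed there by one and the same command. Assumed format.
    Returns a list of (beginning_address, size, byte_sequence) tuples.
    """
    runs = []  # list of (beginning_address, bytearray)
    for address in sorted(byte_accesses):
        if runs and address == runs[-1][0] + len(runs[-1][1]):
            runs[-1][1].append(byte_accesses[address])  # extends the last run
        else:
            runs.append((address, bytearray([byte_accesses[address]])))
    return [(begin, len(data), bytes(data)) for begin, data in runs]


if __name__ == "__main__":
    accesses = {0x1000: 0x48, 0x1001: 0x65, 0x1002: 0x6C, 0x2000: 0x01}
    for begin, size, data in group_buffers(accesses):
        print(f"{begin:#x} size={size} value={data!r}")
```

Each returned tuple carries the beginning address, the size, and the byte sequence, which are the items stored for a piece of input or output information.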
[0034] A data format in which the input information and the output information are stored is described below.
Fig. 4 is a diagram illustrating an example of the data format in which the input information and the output information are stored.
Referring to Fig. 4, the storage area (beginning address), the value (byte sequence), the size (bytes) and the information type are stored as information relating to the input information or the output information.
[0035] The context is extracted as follows.
The context represents the call relation (nest relation) between blocks. Assume that B1, B2, B3, B4, B5, B6, B7 and B8 are blocks, for example. Then, assume that after the execution of B1 is completed, B2 is executed; B3 and B4 are executed within B2; and B5 is executed after the execution of B2 is completed. Further assume that B6 is executed within B5, and B7 is executed within B6; and then B8 is executed after the execution of B6 and the execution of B7 are completed.
[0036] In the call relation between blocks, the contexts of B1, B2, B5 and B8 are expressed by 1, 2, 3 and 4, respectively. The contexts of B3 and B4 executed within B2 are expressed by 2.1 and 2.2, respectively. Likewise, the context of B6 executed within B5 is expressed by 3.1. The context of B7 executed within B6 is expressed by 3.1.1. Expressing the contexts as described allows calls for the same block (same block ID) to be distinguished according to the place of call.
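The dotted numbering in the example above can be reproduced with one counter per nesting depth. The following Python sketch assumes that block starts and ends have already been reduced to a list of `("enter", id)` and `("exit", id)` events; the function name `assign_contexts` and this event format are hypothetical.

```python
def assign_contexts(events):
    """Assign a dotted context such as "3.1.1" to every block execution.

    events: ordered list of ("enter", block_id) / ("exit", block_id).
    Returns a list of (block_id, context) in order of entry.
    """
    counters = [0]  # counters[d]: blocks started so far at nesting depth d
    path = []       # context numbers of the blocks that are currently open
    result = []
    for kind, block_id in events:
        if kind == "enter":
            counters[-1] += 1
            path.append(counters[-1])
            counters.append(0)  # fresh counter for blocks nested inside
            result.append((block_id, ".".join(str(n) for n in path)))
        else:  # "exit"
            counters.pop()
            path.pop()
    return result


if __name__ == "__main__":
    events = [
        ("enter", "B1"), ("exit", "B1"),
        ("enter", "B2"),
        ("enter", "B3"), ("exit", "B3"),
        ("enter", "B4"), ("exit", "B4"),
        ("exit", "B2"),
        ("enter", "B5"), ("enter", "B6"), ("enter", "B7"),
        ("exit", "B7"), ("exit", "B6"), ("exit", "B5"),
        ("enter", "B8"), ("exit", "B8"),
    ]
    for block_id, context in assign_contexts(events):
        print(block_id, context)
```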
[0037] Alternatively, the context may be expressed in any other format as long as the call relation (nest relation) between blocks can be expressed.
[0038] The block information extraction section 130 determines, while analyzing the execution trace 102, whether the block has come to an end or a new block has been called within the block, as follows:
- When a command other than a function call (e.g., "jmp", "jne" or "ret") causes a jump outside the range of the block while the execution trace 102 of the block is being analyzed, the block is determined to have come to an end.
- When a function call (e.g., "call" or "enter") causes a jump outside the range of the block while the execution trace 102 of the block is being analyzed, a new block is considered to have been called within the block, and the block is not ended.
- In any other case where the execution trace 102 goes beyond the range of a block while the execution trace 102 of that block is being analyzed, the block is determined to have come to an end.
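The three cases above reduce to a small decision function. The Python sketch below is illustrative only: the name `classify_transition`, its return labels, and the use of mnemonic strings are assumptions, and the set of function-call commands simply mirrors the examples given in the text.

```python
# Commands treated as function calls, following the examples in the text.
CALL_COMMANDS = {"call", "enter"}


def classify_transition(command, next_address, begin, end):
    """Decide what a step of the trace means for the block [begin, end].

    command: mnemonic of the command just executed inside the block.
    next_address: address of the command executed next in the trace.
    Returns "continue", "nested" (a new block is called within the block)
    or "end" (the block has come to an end).
    """
    if begin <= next_address <= end:
        return "continue"  # still inside the range of the block
    if command in CALL_COMMANDS:
        return "nested"    # left the range through a function call
    return "end"           # left through jmp/jne/ret or in any other way


if __name__ == "__main__":
    block = (0x401000, 0x401040)
    print(classify_transition("mov", 0x401004, *block))   # continue
    print(classify_transition("call", 0x402000, *block))  # nested
    print(classify_transition("ret", 0x400F00, *block))   # end
```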
[0039] The description now returns to the flow chart of Fig. 2.
In Step S140, the block information analysis section 140 analyzes block information in the block information list 106, and specifies encryption logic. In the block information analysis, the block information analysis section 140 generates characteristic determination information for determining a characteristic of the input/output relation of blocks, using the input information or output information of the block information, analyzes the input/output relation of the block, using this characteristic determination information, and determines a block indicating the characteristic of the input/output relation of an encryption function or a decryption function, as encryption logic. The characteristic determination information generated by the block information analysis section 140 and the method for determining encryption logic using the characteristic determination information will be described in detail in a first working example through a fourth working example, discussed later. As a result of analysis of the block information, the block information analysis section 140 outputs the analysis result list 107 including the determination result of encryption logic.
[0040] Finally, in Step S150, the analysis result output section 150 organizes an analysis result based on the block list 104, the block information list 106 and the analysis result list 107, and outputs the following pieces of information:
- Beginning address of encryption logic;
- Input information (storage area, value, size);
- Output information (storage area, value, size); and
- Beginning address of decryption logic.
[0041] Thus, the invention according to the first embodiment has an advantageous effect of specifying, with accuracy, encryption logic used by malware, by generating the characteristic determination information for determining the characteristic of the input/output relation of the block extracted from the execution trace; analyzing the input/output relation of the block, using the characteristic determination information; and determining the block indicating the characteristic of the input/output relation of the encryption function or the decryption function, as encryption logic.
[0042] The first through fourth working examples, which implement the first embodiment concretely, are discussed below.
[0043] First working example
Usually, an input to an encryption function is plaintext and an output from the encryption function is a random byte sequence. Given this fact, the following characteristics can be seen: there is a high rate of printable character strings in an input to the encryption function and there is a low rate of printable character strings in an output from the encryption function.
Fig. 5 is a diagram illustrating the characteristics of printable character strings included in the input/output information of an encryption function.
Referring to Fig. 5, the input "Hello" is a printable character string, while in the outputted information "A" is a printable character and "■" denotes an unprintable character. The first working example describes an example of specifying encryption logic by utilizing, as characteristic determination information, the characteristic of the printable character strings included in the input/output information of the encryption function described above.
[0044] Fig. 6 is a configuration diagram illustrating an example of the configuration of the process analysis apparatus according to the first working example.
Referring to Fig. 6, the block information analysis section 140 includes a character-string rate determination section 160, a character code determination algorithm database (hereafter, the database is denoted by DB) 161, and a character code table DB 162. The block information analysis section 140 receives the block information list 106, performs a character-string rate determination process, and outputs the analysis result list 107. The character-string rate determination section 160 determines a printable character-string rate of the input information of the block information inputted from the block information analysis section 140 to specify encryption logic, and outputs an encryption logic list 1 (163) including the specified encryption logic, to the block information analysis section 140.
[0045] Alternatively, the character-string rate determination section 160, the character code determination algorithm DB 161, and the character code table DB 162 may be included in the block information analysis section 140.
[0046] A flow of the character-string rate determination process of the first working example is discussed below with reference to Fig. 7.
Fig. 7 is a flow chart illustrating a flow of a character-string rate determination process of the character-string rate determination section 160 according to the first working example.
[0047] First, in Step S1601, the character-string rate determination section 160 initializes the encryption logic list 1 (163). The encryption logic list 1 (163) has the same form as the block information list 106 and stores the block information that is determined as an encryption logic candidate by the character-string rate determination section 160.
[0048] In Step S1602, the character-string rate determination section 160 confirms whether there is a next element (block information) in the block information list 106. If there is no next element, the process comes to an end, proceeding through the branch of "No". If there is a next element, the next element Bi is selected in Step S1603.
[0049] In Step SI604, the character-string rate determination section 160 determines the rate of the printable character strings of the input information of the block information. The printable character string here is a printable
15 character string of a chain of c letters ending with a new-line character or a null character. The character-string rate determination section 160 determines the character code of the byte sequence set to the value, for the input information whose information type is set to "buffer". This character code determination is executed by utilizing algorithms registered in the
20 character-code determination algorithm DB 161. When the character code is determined, a corresponding character code table can be obtained from the character code table DB 162, and thereby the printable character can be confirmed.
Fig. 8 is a diagram illustrating an example of the character code table stored in the character code table DB 162.
Fig. 8 shows an example where character codes are stored in association with Japanese "hiragana" characters.
[0050] The character-string rate determination section 160 calculates the printable character-string rate by dividing the total sum of the printable character-string lengths of the printable character strings obtained from a byte sequence of the input information, by the length of the same byte sequence. Note that the character-string length is calculated with a multi-byte character counted as 2 bytes.
[0051] In Step S1605, the character-string rate determination section 160 determines the rate of the printable character strings of the output information in the block information. The procedure here is the same as that of Step S1604.
[0052] In Step S1606, the character-string rate determination section 160 calculates "'printable character-string rate of input' - 'printable character-string rate of output'", as a difference between the printable character-string rate of an input and the printable character-string rate of an output.
[0053] In Step S1607, when the difference calculated in Step S1606 is at or above a threshold θ, the character-string rate determination section 160 adds the same block information Bi to the encryption logic list 1 (163). Note that c and θ given above are adjustable parameters.
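Steps S1604 to S1607 can be illustrated with a minimal Python sketch. It is not the implementation of the apparatus: it assumes ASCII only (the character code determination and the character code tables are omitted), it reads "a chain of c letters" as a run of at least c printable bytes terminated by a new-line or null character, and the names printable_string_rate, is_encryption_candidate, and theta (the threshold) are hypothetical.

```python
import re


def printable_string_rate(buf: bytes, c: int = 4) -> float:
    """Return the share of bytes in buf that belong to printable strings.

    Assumption: a printable string is a run of at least c printable ASCII
    bytes that is immediately followed by a new-line or a null byte.
    """
    if not buf:
        return 0.0
    pattern = re.compile(rb"[\x20-\x7e]{%d,}(?=[\n\x00])" % c)
    total = sum(len(m.group()) for m in pattern.finditer(buf))
    return total / len(buf)


def is_encryption_candidate(inp: bytes, out: bytes,
                            c: int = 4, theta: float = 0.5) -> bool:
    """Flag a block whose input is mostly text and whose output is not."""
    diff = printable_string_rate(inp, c) - printable_string_rate(out, c)
    return diff >= theta
```

With these assumptions, a block that turns the text "hello world" into non-printable bytes is flagged, whereas a block whose input and output are both binary yields a difference of zero and is not.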
[0054] Alternatively, in Step S1604, a file-type examination of the input information may also be performed. If the input is a file in a known file format such as WORD or PDF, text information is extracted according to that specific file format, and the printable character-string rate is calculated just for that text information. This allows the printable character-string rate of the input information to be calculated appropriately even if the input information is a file, such as a WORD file or a PDF file, obtained by encoding text into a special format. The file-type examination can be performed by utilizing a known tool.
[0055] Second working example
Malicious programs such as malware can encode (e.g., Base64 encoding) encrypted data into printable data and then transmit the printable data on the Internet.
Fig. 9 is a diagram illustrating an example (1) of how malware uses an encryption function.
Referring to the example shown in Fig. 9, malware encodes, through Base64 encoding, ciphertext obtained by encrypting a message through an encryption function, and transmits the encoded ciphertext on the Internet through HTTP transmission. Given this fact, the output information of a block can be decoded by a known decoder (e.g., a Base64 decoder), and when the decoding succeeds, the block whose output information is equivalent to the decoded value can be determined as an encryption logic candidate. A second working example describes a working example of specifying encryption logic by utilizing a characteristic of encoding included in the input/output information of an encryption function like the one described above, as the characteristic determination information.
[0056] Fig. 10 is a configuration diagram illustrating an example of the configuration of a process analysis apparatus according to the second working example.
Referring to Fig. 10, the block information analysis section 140 includes a data decoding section 170 and an encoding/decoding algorithm DB 171. The block information analysis section 140 receives the block information list 106, performs decoding determination, and outputs the analysis result list 107. The data decoding section 170 decodes the output information of the block information inputted from the block information analysis section 140, and when the decoding succeeds, outputs an encryption logic list 2 (172) including an encryption logic candidate which is a block whose output information is equivalent to the decoded value, to the block information analysis section 140.
[0057] Alternatively, the data decoding section 170 and the encoding/decoding algorithm DB 171 may be included in the block information analysis section 140.
[0058] A flow of a decoding determination process of the second working example is discussed below with reference to Fig. 11.
Fig. 11 is a flow chart illustrating a flow of the decoding determination process of the data decoding section 170 according to the second working example.
[0059] First, in Step S1701, the data decoding section 170 initializes the encryption logic list 2 (172). The encryption logic list 2 (172) is the block information list 106 that stores the block information determined as an encryption logic candidate in the data decoding section 170.
[0060] In Step S1702, the data decoding section 170 confirms whether there is the next element (block information) in the block information list 106. When there is no block information of the next element, the process comes to an end, proceeding through the branch of No. When there is the block information of the next element, the next element Bi is selected in Step S1703.
[0061] In Step S1704, the data decoding section 170 decodes the output information of the block information by utilizing a known decoding algorithm. Known decoding algorithms are stored in the encoding/decoding algorithm DB 171. The data decoding section 170 decodes the output information whose information type is "buffer".
[0062] In Step S1705, the data decoding section 170 determines whether or not the decoding has succeeded. When the decoding succeeds with one of the decoding algorithms stored in the encoding/decoding algorithm DB 171, the data decoding section 170 holds the decoding result and then proceeds to Step S1707 through the branch of Yes.
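The decoding attempt of Steps S1704 and S1705, together with the subsequent search for a block whose output information matches the decoded value, might be sketched as follows. The sketch assumes Base64 is the only registered decoder and models block information as hypothetical (block ID, output bytes) pairs; try_base64_decode and find_encoded_ciphertext_sources are illustrative names, not part of the apparatus.

```python
import base64
import binascii
from typing import List, Optional, Tuple


def try_base64_decode(buf: bytes) -> Optional[bytes]:
    """Return the decoded bytes, or None when buf is not valid Base64."""
    if not buf:
        return None
    try:
        return base64.b64decode(buf, validate=True)
    except (binascii.Error, ValueError):
        return None


def find_encoded_ciphertext_sources(
        blocks: List[Tuple[str, bytes]]) -> List[str]:
    """Return IDs of blocks whose output equals another block's decoded output."""
    candidates = []
    for i, (_, out_i) in enumerate(blocks):
        decoded = try_base64_decode(out_i)
        if decoded is None:
            continue  # decoding failed: move on to the next block
        for j, (block_id_j, out_j) in enumerate(blocks):
            if j != i and out_j == decoded:
                candidates.append(block_id_j)
    return candidates
```

Strict validation is used here so that arbitrary binary output is not mistaken for Base64; a real decoder table would hold several algorithms and try each in turn.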
[0063] In Step S1707, the data decoding section 170 searches the block information list 106 for the block whose output information matches the decoding result held in Step S1705. Alternatively, the search may be limited to blocks whose context is older than that of the block Bi, for efficient processing. When detecting, as a result of searching the block information list 106, the block Bj (i ≠ j) whose output information matches the decoding result, the data decoding section 170 adds the block information of the block Bj to the encryption logic list 2 (172), in Step S1708.
[0064] Third working example
Malicious programs such as malware can compress data before the data is encrypted.
Fig. 12 is a diagram illustrating an example (2) of how malware uses an encryption function.
Referring to the example shown in Fig. 12, malware compresses a message through a compression function before the message is inputted to an encryption function, and transmits ciphertext obtained by encrypting this compressed data through the encryption function, on the Internet through HTTP transmission. Given this fact, it is possible to try decompressing the input information of the block with a known decompression algorithm (e.g., zip, lzh), and when the decompression succeeds, to determine the block as an encryption logic candidate. A third working example describes a working example of specifying encryption logic by utilizing the characteristic of the compression process included in the input/output information of an encryption function, as the characteristic determination information.
[0065] Fig. 13 is a configuration diagram illustrating an example of the configuration of a process analysis apparatus according to the third working example.
Referring to Fig. 13, the block information analysis section 140 includes a data decompression section 180 and a compression/decompression algorithm DB 181. The block information analysis section 140 receives the block information list 106, performs data-decompression determination, and outputs the analysis result list 107. The data decompression section 180 decompresses the input information of the block information inputted from the block information analysis section 140. When the decompression succeeds, the data decompression section 180 determines the block as an encryption logic candidate, and outputs an encryption logic list 3 (182) including the encryption logic candidate to the block information analysis section 140.
[0066] Alternatively, the data decompression section 180 and the
compression/decompression algorithm DB 181 may be included in the block
information analysis section 140.
[0067] A flow of a data decompression determination process according to the third working example is discussed with reference to Fig. 14.
Fig. 14 is a flow chart illustrating a flow of a decompression determination process of the third working example.
[0068] First, in Step S1801, the data decompression section 180 initializes the encryption logic list 3 (182). The encryption logic list 3 (182) is the block information list 106 that stores the block information determined as an encryption logic candidate in the data decompression section 180.
[0069] In Step S1802, the data decompression section 180 confirms whether there is the next element (block information) in the block information list 106. When there is no block information of the next element, the process comes to an end, proceeding through the branch of No. When there is the block information of the next element, the next element Bi is selected in Step S1803.
[0070] In Step S1804, the data decompression section 180 decompresses the input information of the block information by utilizing a known decompression algorithm. Known decompression algorithms are stored in the compression/decompression algorithm DB 181. The data decompression section 180 decompresses the input information whose information type is "buffer".
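As a rough illustration of the decompression attempt, the sketch below tries a small table of decompressors on a block's input buffer and reports the first one that succeeds. The table stands in for the compression/decompression algorithm DB 181; zlib, bz2, and lzma from the Python standard library are used purely as stand-ins and are an assumption, since the text names zip-style formats only as examples. The function name try_decompress is hypothetical.

```python
import bz2
import lzma
import zlib
from typing import Optional

# Hypothetical stand-in for the compression/decompression algorithm DB.
DECOMPRESSORS = {
    "zlib": zlib.decompress,
    "bz2": bz2.decompress,
    "lzma": lzma.decompress,
}


def try_decompress(buf: bytes) -> Optional[str]:
    """Return the name of the first algorithm that decompresses buf, else None."""
    if not buf:
        return None  # an empty buffer carries no evidence of compression
    for name, decompress in DECOMPRESSORS.items():
        try:
            decompress(buf)
        except (zlib.error, lzma.LZMAError, OSError, ValueError, EOFError):
            continue  # not this format: try the next registered algorithm
        return name
    return None
```

A block whose "buffer"-type input makes try_decompress return a name would be treated as an encryption logic candidate under this reading.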
[0071] In Step S1805, the data decompression section 180 determines whether or not the decompression has succeeded. When the decompression succeeds with one of the decompression algorithms stored in the compression/decompression algorithm DB 181, the data decompression section 180 proceeds to Step S1806 through the branch of Yes.
[0072] In Step S1806, the data decompression section 180 adds the block information of the block Bi to the encryption logic list 3 (182).
[0073] Fourth working example
In accordance with the basic definition of cryptography, ciphertext obtained by encrypting a message (plaintext) with a key can be decrypted with the same key to obtain the original message. Therefore, m = Dec(k, Enc(k, m)) is satisfied, where "Enc", "Dec", "k", and "m" denote an encryption function, a decryption function, a key, and plaintext, respectively.
Fig. 15 is a diagram illustrating the basic definition of cryptography.
Referring to Fig. 15, when plaintext "Hello" is encrypted with a key to obtain ciphertext, the plaintext "Hello" of the original message can be obtained by decrypting the ciphertext with the same key.
In accordance with the basic definition of cryptography described above, when part (presumably ciphertext) of an output of a block "f" is used as part of an input of another block "g", and then "g" is processed, if the output of "g" matches the input (presumably plaintext) of "f", then it is highly likely that "f" is an encryption function and "g" is a decryption function.
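The relation between "f" and "g" can be made concrete with a toy example. The sketch below uses a repeating-key XOR cipher only because it is short and self-inverse; it is an assumption for illustration and says nothing about the ciphers that real malware uses. The names xor_cipher and looks_like_enc_dec_pair are hypothetical.

```python
from typing import Callable


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR each byte with a repeating key (Enc and Dec coincide)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def looks_like_enc_dec_pair(f_input: bytes, f_output: bytes,
                            g: Callable[[bytes], bytes]) -> bool:
    """Feed the recorded output of f into g and compare with f's recorded input."""
    return g(f_output) == f_input


key = b"k3y"
message = b"Hello"
ciphertext = xor_cipher(key, message)  # recorded output of block f
```

Here looks_like_enc_dec_pair(message, ciphertext, lambda c: xor_cipher(key, c)) holds, which mirrors m = Dec(k, Enc(k, m)), while an unrelated "g" such as byte reversal fails the check.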
Given this fact, if a pair of blocks is selected, and the input information and the output information of those blocks are processed based on the basic definition of cryptography, then the pair of blocks can be determined as an encryption logic candidate. A fourth working example describes a working example of specifying encryption logic by finding out the pair of blocks that satisfies the relation of the basic definition of cryptography, and then utilizing the characteristic of the input/output information based on the basic definition of cryptography, as the characteristic determination information.
[0074] Fig. 16 is a configuration diagram illustrating an example of the configuration of a process analysis apparatus according to the fourth working example.
Referring to Fig. 16, the block information analysis section 140 includes a virtual execution section 190.
The block information analysis section 140 receives the block information list 106, performs a virtual execution determination process, and outputs the analysis result list 107. The virtual execution section 190 performs virtual execution on a pair of blocks, based on the basic definition of cryptography, by utilizing the input information and output information of the block information inputted from the block information analysis section 140. When the virtual execution succeeds, the virtual execution section 190 determines the pair of blocks as an encryption/decryption function pair candidate, and outputs an encryption/decryption function pair list 191 including the encryption/decryption function pair candidate, to the block information analysis section 140.
[0075] Alternatively, the virtual execution section 190 may be included in the block information analysis section 140.
[0076] A flow of a virtual execution determination process according to the fourth working example is discussed below with reference to Fig. 17.
Fig. 17 is a flow chart illustrating a flow of a virtual execution determination process of the virtual execution section 190 according to the fourth working example.
[0077] First, in Step S1901, the virtual execution section 190 merges previously extracted encryption logic candidates to generate an encryption logic list 4. As the previously extracted encryption logic candidates, the encryption logic lists 1 to 3, which are the encryption logic candidates determined in the first working example to the third working example, are used, for example. Further, in generating the encryption logic list 4, if an encryption logic candidate is duplicated, the duplicated logic candidates are unified.
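The merge of Step S1901 amounts to an order-preserving union. A minimal sketch follows, assuming each piece of block information is a dictionary carrying a hypothetical "block_id" key by which duplicates are recognized; merge_candidate_lists is an illustrative name.

```python
from typing import Dict, List


def merge_candidate_lists(*lists: List[Dict]) -> List[Dict]:
    """Merge candidate lists, keeping the first occurrence of each block ID."""
    merged: List[Dict] = []
    seen = set()
    for candidates in lists:
        for info in candidates:
            if info["block_id"] not in seen:
                seen.add(info["block_id"])
                merged.append(info)
    return merged


list1 = [{"block_id": 3}, {"block_id": 7}]
list2 = [{"block_id": 7}, {"block_id": 9}]
list3 = [{"block_id": 3}]
list4 = merge_candidate_lists(list1, list2, list3)
```

In this example list4 holds the blocks 3, 7, and 9 once each, in order of first appearance.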
[0078] In Step S1902, the virtual execution section 190 initializes the analysis result list 107. The analysis result list 107 is a list of pairs of pieces of the block information determined as a pair of encryption logic and decryption logic, in the virtual execution section 190.
[0079] In Step S1903, the virtual execution section 190 confirms whether there is the next element (block information) in the encryption logic list 4. When there is no block information of the next element, the process comes to an end, proceeding through the branch of No. When there is the block information of the next element, the next element Bi is selected in Step S1904.
[0080] In Step S1905, the virtual execution section 190 performs virtual execution analysis using the basic definition of cryptography by utilizing the output information of the next element Bi. The process of virtual execution analysis will be described later in detail.
[0081] In Step S1906, the virtual execution section 190 determines whether or not the virtual execution analysis result is Null. When the virtual execution analysis result is not Null, the process proceeds to Step S1907 through the branch of No.
[0082] In Step S1907, the virtual execution section 190 registers the virtual execution analysis result in the analysis result list 107.
[0083] A flow of virtual execution analysis in Step S1905 is discussed in detail with reference to Fig. 18 and Fig. 19.
Fig. 18 is a flow chart illustrating a flow (first half) of virtual execution analysis of the virtual execution section 190 according to the fourth working example.
Fig. 19 is a flow chart illustrating a flow (last half) of the virtual execution analysis of the virtual execution section 190 according to the fourth working example.
[0084] In the virtual execution analysis of Step S1905, the virtual execution section 190 is provided with the block information Bi and an execution file to be analyzed, as parameters.
[0085] First, in Step S201, the virtual execution section 190 initializes the encryption/decryption function pair list.
In Step S202, the virtual execution section 190 executes the execution file 101 to be analyzed, in a virtual environment, to start the process, and then suspends the process after a certain period of time.
In Step S203, the virtual execution section 190 generates a snapshot of the process, which is called Snapshot 1.
[0086] In Step S204, the virtual execution section 190 confirms whether there is the next element (block information) in the block information list 106. When there is no block information of the next element, the process proceeds to Step S222, through the branch of No, where the encryption/decryption function pair list is returned, and the process ends. When there is the block information of the next element, the next element Bj is selected in Step S205. Alternatively, it is also possible to confirm the contexts of the block Bi and the block Bj, and then select the block Bj which is not in a nest relation.
[0087] In Step S206, the virtual execution section 190 restores the Snapshot 1 of the process.
In Step S207, the virtual execution section 190 injects the process with a command string making up the block Bj. More specifically, the virtual execution section 190 searches the block list 104 for the element corresponding to the block ID of the block information Bj, and acquires the command string of the block and the beginning address. Then, the virtual execution section 190 injects the command string at the beginning address of the process.
In Step S208, the virtual execution section 190 generates the snapshot of that process, which is called Snapshot 2.
In Step S209, the virtual execution section 190 acquires the input information of Bj.
[0088] In Step S210, the virtual execution section 190 generates an input snapshot Iss, based on the input information of Bj and the output information of Bi. The input snapshot is the information that is an input of the block to be executed. The input snapshot Iss is generated as follows. Assuming that the block Bi has n pieces of the output information, the output information of the block Bi is expressed as O = {O1, ..., On}. Assuming that the block Bj has m pieces of the input information, the input information of the block Bj is expressed as I = {I1, ..., Im}. Assuming that Oi ∈ O, Iss is the input information obtained by replacing the j-th element of I by Oi. The replacement is performed between pieces of information whose types of input/output information are "buffer". Alternatively, it is also possible to select pieces of information whose sizes are similar to each other on a priority basis for replacement. The replacement is performed with respect to the value and the size. Further, the replacement is performed so that the same Iss is not duplicated.
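The generation of candidate input snapshots can be sketched as an enumeration over "buffer"-type inputs of Bj and "buffer"-type outputs of Bi. The Info record and the function name generate_input_snapshots are hypothetical, and the ordering by size difference is one possible reading of the optional size-priority rule.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Info:
    kind: str     # information type, e.g. "buffer" or "register"
    value: bytes  # recorded value
    size: int     # recorded size in bytes


def generate_input_snapshots(inputs_bj: List[Info],
                             outputs_bi: List[Info]) -> Iterator[Tuple[Info, ...]]:
    """Yield each distinct Iss: inputs of Bj with one buffer replaced by an output of Bi."""
    seen = set()
    for j, inp in enumerate(inputs_bj):
        if inp.kind != "buffer":
            continue
        # Optional priority rule: try outputs of similar size first.
        for out in sorted(outputs_bi, key=lambda o: abs(o.size - inp.size)):
            if out.kind != "buffer":
                continue
            replaced = Info("buffer", out.value, out.size)  # value and size both replaced
            iss = tuple(inputs_bj[:j]) + (replaced,) + tuple(inputs_bj[j + 1:])
            if iss not in seen:  # never produce the same Iss twice
                seen.add(iss)
                yield iss
```

Each yielded tuple would then be written into the registers and memory of the restored process before the block Bj is resumed.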
[0089] In Step S211, the virtual execution section 190 determines whether a new Iss has been generated. When there is no new Iss generated, the process proceeds to Step S204, through the branch of No. When there is a new Iss generated, the process proceeds to Step S212 where the virtual execution section 190 restores the Snapshot 2 of the process, and reflects the Iss on the same process in Step S213. In the reflection of the Iss, the values of all the pieces of the input information in the Iss are set in an appropriate storage area (register, memory).
[0090] In Step S214, the virtual execution section 190 sets, in an instruction register, the beginning address of the injected command string, and resumes the process in Step S215.
[0091] In Step S216, the virtual execution section 190 monitors the execution address of the process and confirms whether or not the execution address goes beyond the range of the block Bj.
In Step S217, the virtual execution section 190 determines whether or not the execution of the block has come to an end. When the execution address being monitored goes beyond the range of the block Bj, the virtual execution section 190 determines that the execution of the block Bj has come to an end, and suspends the process in Step S218.
[0092] In Step S219, the virtual execution section 190 compares the output information of the executed block Bj with the input information of the block Bi. The output information obtained by executing the block Bj is extracted from the memory of the process being suspended, based on the beginning address of the output information of the block Bj.
[0093] In Step S220, the virtual execution section 190 determines whether the output information of the block Bj matches the input information of the block Bi. When they match, the process proceeds through the branch of Yes to Step S221 where the virtual execution section 190 registers the block Bi and the block Bj in the encryption/decryption function pair list, as a pair of encryption logic and decryption logic.
[0094] Alternatively, in Step S207, the process may be continued until execution reaches the beginning address of the block, instead of injecting the command string of the block. In that case, the processing of Step S214 is skipped, and the process is resumed at Step S215.
[0095] Thus, the invention discussed in the first to fourth working examples has the advantageous effect of specifying, with accuracy, encryption logic used by malware, by generating the characteristic determination information for determining the characteristic of the input/output relation of a block extracted from the execution trace; analyzing the input/output relation of the block, using this characteristic determination information; and determining a block indicating the characteristic of the input/output relation of an encryption function or a decryption function, as encryption logic.
Reference Signs List
[0096]
100 process analysis apparatus
101 execution file
102 execution trace
103 process information
104 block list
105 definition list
106 block information list
107 analysis result list
110 execution trace acquisition section
120 block extraction section
130 block information extraction section
140 block information analysis section
150 analysis result output section
160 character-string rate determination section
161 character code determination algorithm DB
162 character code table DB
163 encryption logic list 1
170 data decoding section
171 encoding/decoding algorithm DB
172 encryption logic list 2
180 data decompression section
181 compression/decompression algorithm DB
182 encryption logic list 3
190 virtual execution section
191 encryption/decryption function pair list
CLAIMS
1. A process analysis apparatus comprising:
an execution trace acquisition section to acquire an execution trace of a process to be analyzed;
a block extraction section to extract, from the execution trace, a block that is a processing unit indicating a loop structure;
a block information extraction section to extract, from the block, block information including input information and output information; and
a block information analysis section to:
generate characteristic determination information for determining a characteristic of an input/output relation of the block, using one of the input information and the output information of the block information,
analyze the input/output relation of the block, using the characteristic determination information, and
determine the block which indicates a characteristic of an input/output relation of one of an encryption function and a decryption function, as encryption logic.
2. The process analysis apparatus of claim 1, comprising a character-string rate determination section to determine a printable character-string rate which is a rate of printable character strings in one of the input information and the output information of the block information;
wherein the block information analysis section:
calculates, as the characteristic determination information, a difference between a first printable character-string rate of the input information and a second printable character-string rate of the output information, which have been determined by the character-string rate determination section, and
when the difference is at or above a predetermined threshold, determines the block as the encryption logic.
3. The process analysis apparatus of claim 1, comprising a data decoding section to decode the output information of the block information;
wherein the block information analysis section:
generates, as the characteristic determination information, a decoding result obtained by decoding the output information of the block information by the data decoding section, and
determines the block which has the output information that matches the decoding result, as encryption logic.
4. The process analysis apparatus of claim 1, comprising a data decompression section to decompress the output information of the block information;
wherein the block information analysis section:
generates, as the characteristic determination information, a decoding result obtained by decompressing the output information of the block information by the data decompression section, and
determines the block which has the output information that matches the decoding result, as the encryption logic.
5. The process analysis apparatus of claim 1, comprising a virtual execution section to input the output information of the block information to another block, and execute a process of the another block;
wherein the block information analysis section:
generates, as the characteristic determination information, an execution result obtained by executing the process of the another block by the virtual execution section, and
determines a block which has the input information that matches the execution result, as the encryption logic.
6. A process analysis method of a process analysis apparatus to analyze a process to be analyzed and determine encryption logic, the process analysis method comprising:
an execution trace acquisition step in which an execution trace acquisition section acquires an execution trace of the process to be analyzed;
a block extraction step in which a block extraction section extracts, from the execution trace, a block that is a processing unit indicating a loop structure;
a block information extraction step in which a block information extraction section extracts, from the block, block information including input information and output information; and
a block information analysis step in which a block information analysis section:
generates characteristic determination information for determining a characteristic of an input/output relation of the block, using one of the input information and the output information of the block information,
analyzes the input/output relation of the block, using the characteristic determination information, and
determines the block which indicates a characteristic of an input/output relation of one of an encryption function and a decryption function, as the encryption logic.
7. A process analysis program, causing a computer to execute:
an execution trace acquisition step to acquire an execution trace of a process to be analyzed;
a block extraction step to extract, from the execution trace, a block that is a processing unit indicating a loop structure;
a block information extraction step to extract, from the block, block information including input information and output information; and
a block information analysis step to:
generate characteristic determination information for determining a characteristic of an input/output relation of the block, using one of the input information and the output information of the block information,
analyze the input/output relation of the block, using the characteristic determination information, and
determine the block which indicates a characteristic of an input/output relation of one of an encryption function and a decryption function, as the encryption logic.