
A Method For Automatic Compiling A Source Program Into An Object Program And An Apparatus

A method for automatic compiling a source program into an object program, comprising: locating at least one predetermined sequence within said source program, said sequence being a parallel construct to be executed by multiple threads in parallel; wherein for each located parallel construct:
(a) a start code is inserted into said object program prior to a first instruction of the parallel construct, said start code being a new threaded entry code indicating the beginning of the parallel construct;
(b) an invocation code is inserted into said object program prior to said start code, said invocation code for addressing said start code and for transferring the parallel construct identified by the corresponding new threaded entry code to a multi-threaded run-time system for parallel execution; and
(c) a stop code is inserted into said object program after the last instruction of said parallel construct, said stop code being a threaded return instruction signaling to said multi-threaded run-time system to stop execution of said parallel construct and to return.


Patent Information

Application #
Filing Date
28 November 2002
Publication Number
Publication Type
Invention Field
GENERAL ENGINEERING
Status
Parent Application
Patent Number
Legal Status
Grant Date
2005-09-30
Renewal Date

Applicants

INTEL CORPORATION
2200 MISSION COLLEGE BOULEVARD, SANTA CLARA, CALIFORNIA 95052, UNITED STATES OF AMERICA.

Inventors

1. KNUD KIRKEGAARD
56 PALMWELL WAY, SAN JOSE, CALIFORNIA 95138, USA.
2. MILIND GIRKAR
1049 WEST OLIVE, #3. SUNNYVALE, CALIFORNIA 94086, USA.
3. PAUL GREY
4190 FEAFEL DRIVE, SAN JOSE, CALIFORNIA 95134, USA.
4. XINMIN TIAN
4291 NORWALK DRIVE, V102, SAN JOSE, CALIFORNIA 95129, USA.

Specification

FORM 2
THE PATENTS ACT 1970
[39 OF 1970]
COMPLETE SPECIFICATION
[See Section 10; rule 13]
"A METHOD FOR AUTOMATIC COMPILING A SOURCE PROGRAM
INTO AN OBJECT PROGRAM AND AN APPARATUS"
INTEL CORPORATION, a Delaware Corporation, 2200 Mission College Boulevard, Santa Clara, California 95052, United States of America
The following specification particularly describes the nature of the invention and the manner in which it is to be performed :-

The present invention relates to a method for automatic compiling a source program into an object program and an apparatus.
The present invention relates generally to compiler optimization
techniques and, more specifically, to a multi-entry threading method and
apparatus for automatic and directive-guided parallelization of a source program.
BACKGROUND OF THE INVENTION
The ever-increasing complexity of computational problems that require a solution is reflected in the increase in the speed and performance of computers designed to solve such problems. As CPUs become faster and multiprocessor system configurations multiply, program performance must be improved and efficient methods for compiling and executing source programs must be used.
Parallel processing is rapidly becoming mainstream technology influencing architecture and software design from the home computer market to business applications. Parallel applications are executed by a multiple processor computer system which includes a plurality of processors interconnected so as to exchange data with each other.
Figure 1A is a block diagram of a distributed-memory multiple processor computer system. As illustrated in Figure 1A, a computer system 100 includes multiple processing modules 120. Each processing module 120 includes a

processor 122 and a memory 124. In the computer system 100, any number of processing modules can be interconnected as shown.
Figure 1B is a block diagram of a shared-memory multiple processor computer system. As illustrated in Figure 1B, a computer system 150 includes multiple processors 160 connected to a shared memory 170. In one embodiment, memory 170 includes exclusive areas occupied by each processor 160 and a common area accessed by all processors. In the computer system 150, only a limited number of processors 160 may be interconnected, due to the limitations imposed by the shared memory 170.
Parallel processing methods use automatic tools, such as automatic parallelizing compilers, which compile the source programs and facilitate parallel processing of the programs. A compiler looks at the entire source program, collects and reorganizes the instructions, and translates the source program into object code executable by the computer.
One compiler technique involves use of outlining technology, which transforms selected regions of a program into outlined or separate subroutines. Each outlined subroutine is then sent to one thread in a processor for parallel execution. Parallelization using outlining technology is described in detail in Jyh-Herng Chow et al., Automatic Parallelization for Symmetric Shared-Memory Multiprocessors, Proceedings of CASCON '96, Toronto, Canada, 12-14 November, 1996. However, parallelization of a source program using outlining technology increases the complexity of the compiler generating multi-threaded code. Because the original code is split into separate subroutines, a number of scalar optimizations, which were originally applied to one single subroutine, have to be invoked for several different subroutines, creating a procedure that generates less efficient code and is time consuming.

BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
Figure 1A is a block diagram of one embodiment for a distributed-memory multiple processor computer system.
Figure 1B is a block diagram of one embodiment for a shared-memory multiple processor computer system.
Figure 2 is a block diagram of one embodiment for a computer system.
Figure 3A is a block diagram of one embodiment for a process of obtaining an executable program in a computer system.
Figure 3B is a block diagram of one embodiment for a process of obtaining a parallel executable program in a computer system.
Figure 4 is a flow diagram of one embodiment for a multi-entry threading method for automatic and directive-guided parallelization of a source program.

DETAILED DESCRIPTION
In the following detailed description of embodiments of the invention, reference is made to the accompanying drawings in which like references indicate similar elements, and in which are shown by way of illustration specific embodiments in which the invention may be practiced.
Numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that logical, mechanical, electrical and other changes may be made without departing from the scope of the present invention.
Some portions of the detailed descriptions that follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of acts leading to a desired result. The acts are those requiring physical manipulations of physical

quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
The present invention also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer, selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.
The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method. For example, any of the methods according to the present invention can be implemented in hard-wired circuitry, by programming a general purpose processor or by any combination of hardware and software. One of skill in the art will immediately appreciate that the invention can be practiced with computer system configurations other than those described below, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. The invention can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. The required structure for a variety of these systems will appear from the description below.
The methods of the invention are described in terms of computer software. If written in a programming language conforming to a recognized standard, sequences of instructions designed to implement the methods can be compiled for

execution on a variety of hardware platforms and for interface to a variety of operating systems. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein. Furthermore, it is common in the art to speak of software, in one form or another (e.g., program, procedure, application...), as taking an action or causing a result. Such expressions are merely a shorthand way of saying that execution of the software by a computer causes the processor of the computer to perform an action or produce a result.
Figure 2 is a block diagram of one embodiment for a computer system 200. Computer system 200 includes a system bus 201, or other communications module similar to the system bus, for communicating information, and a processing module, such as processor 202, connected to bus 201 for processing information. Computer system 200 further includes a main memory 204, such as a random access memory (RAM) or other dynamic storage device, connected to bus 201, for storing information and instructions to be executed by processor 202. Main memory 204 may also be used for storing temporary variables or other intermediate information during execution of instructions by processor 202. Computer system 200 also includes a read only memory (ROM) 206, and/or other similar static storage device, connected to bus 201, for storing static information and instructions for processor 202.

An optional data storage device 207, such as a magnetic disk or optical disk, and its corresponding drive may also be connected to computer system 200 for storing information and instructions. System bus 201 is connected to an external bus 210, which connects computer system 200 to other devices. Computer system 200 may also be connected via bus 210 to a display device 221, such as a cathode ray tube (CRT) or a liquid crystal display (LCD), for displaying information to a computer user. For example, graphical or textual information may be presented to the user on display device 221. Typically, an alphanumeric input device 222, such as a keyboard including alphanumeric and other keys, is connected to bus 210 for communicating information and/or command selections to processor 202. Another type of user input device is cursor control device 223, such as a conventional mouse, touch mouse, trackball, or other type of cursor direction keys, for communicating direction information and command selection to processor 202 and for controlling cursor movement on display 221. A fully loaded computer system may optionally include video, camera, speakers, sound card, and many other similar conventional options.
A communication device 224 is also connected to bus 210 for accessing remote computers or servers, via the Internet, for example. The communication device 224 may include a modem, a network interface card, or other well known interface devices, such as those used for interfacing with Ethernet, Token-ring, or other types of networks. In any event, in this manner, the computer system 200

may be connected to a number of servers via a conventional network infrastructure.
Figure 3A is a block diagram of one embodiment for a process of obtaining an executable program in a computer system. According to Figure 3A, source file 310 includes source code written by programmers in high-level languages, for example FORTRAN or C. The source code instructions must be translated into machine language. The translation process involves several processing steps and includes compilation of the source code instructions.
In one embodiment, the high-level language source code instructions within source file 310 are passed through a compiler (not shown). The compiler translates the high-level instructions into object code stored within object files 320. In one embodiment, considering a parallel computer system, the compiler needs to generate the object code in a form suitable for parallel execution. In one embodiment, the object code includes multiple modules, each module incorporating one or more subroutines. Some modules may be stored in runtime library 340.
Finally, the object code is passed through a linker 330. The linker 330 combines the modules and gives real values to addresses within the modules, thereby producing an executable program 350.
Figure 3B is a block diagram of one embodiment for a process of obtaining a parallel executable program in a computer system. As shown in Figure 3B, a serial source program 360 and a serial source program with OpenMP directives

365 are compiled by a compiler (not shown), which creates parallel executable code 370. The parallel executable code 370 is then linked to a parallel runtime library 380. The runtime library 380 creates multiple thread entries 385, which are then sent to a SMP machine 390 for execution.
Figure 4 is a flow diagram of one embodiment for a multi-entry threading method for automatic and directive-guided parallelization of a source program.
In one embodiment, a source program to be compiled and executed by a multi-processor computer system needs to be parallelized in order to fully take advantage of the system's resources. Therefore, depending on the number of processors, multiple threads must be generated to run the source program in parallel.
The source program includes multiple loops of code, also known as parallel regions. A parallel region or loop is defined as a code block of the program that is to be executed by the multiple threads in parallel. One example of a source program including multiple parallel regions or loops is as follows:



Each thread receives a portion of the loop and executes the portion in parallel with other threads. Parallel regions or loops are sequences of the code representing the fundamental parallel constructs that indicate code to be executed in parallel.
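How a thread's "portion" of a loop might be computed can be sketched with a block (static) partitioning scheme. This is one common policy, assumed here for illustration; the specification does not state which schedule the run-time system uses.

```c
/* Block (static) partitioning: thread t of nthreads gets iterations
   [*lo, *hi) of a loop with n iterations. The first n % nthreads threads
   receive one extra iteration, so thread workloads differ by at most one. */
void chunk(int n, int nthreads, int t, int *lo, int *hi)
{
    int base = n / nthreads;   /* iterations every thread gets      */
    int rem  = n % nthreads;   /* leftover iterations to distribute */

    *lo = t * base + (t < rem ? t : rem);
    *hi = *lo + base + (t < rem ? 1 : 0);
}
```

For example, a 10-iteration loop on 3 threads gives thread 0 iterations 0-3, thread 1 iterations 4-6, and thread 2 iterations 7-9.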
Referring to Figure 4, at processing block 410, the source program or source code is received and read by the compiler. At processing block 420, a first parallel construct within the routine to be executed in parallel is located by the compiler.
At processing block 430, a start code is generated by the compiler. In one embodiment, the start code is a new threaded entry code indicating the beginning of a parallel construct. At processing block 440, an invocation code is generated by the compiler. In one embodiment, the invocation code is an invocation instruction, which passes the threaded entry identified by the new threaded entry code to the multi-threaded run-time system for parallel execution on multiprocessor systems.
At processing block 450, the new threaded entry code is inserted before the parallel construct in the source program. In one embodiment, the new entry code is inserted prior to a first instruction of the parallel construct. At processing block

460, the invocation instruction is inserted before the new threaded entry code in the source program.
At processing block 470, a stop code is inserted after the parallel construct in the source program. In one embodiment, the stop code is a threaded return instruction, which is inserted after a last instruction of the parallel construct. The threaded return instruction signals the run-time system to perform the synchronization and return to the main program.
At processing block 480, a new location instruction is generated by the compiler. In one embodiment, the location instruction is a label instruction indicating the next instruction to be executed by the multiprocessor system. At processing block 485, the location instruction is inserted after the threaded return instruction.
At processing block 490, a jump instruction is generated and is inserted before the new threaded entry to direct the system to continue execution of the source program at the location instruction. In one embodiment, the jump instruction is inserted subsequent to the invocation instruction and prior to the new threaded entry code.
At processing block 495, a decision is made whether the routine contains any new parallel constructs. If the routine contains another parallel construct, then blocks 420 through 495 are processed again with respect to the new parallel construct. Otherwise, if the routine does not contain any new parallel constructs, the procedure stops.

In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

WE CLAIM:
1. A method for automatic compiling a source program into an object program, comprising: locating at least one predetermined sequence within said source program, said sequence being a parallel construct to be executed by multiple threads in parallel; wherein for each located parallel construct:
(a) a start code is inserted into said object program prior to a first instruction of the parallel construct, said start code being a new threaded entry code indicating the beginning of the parallel construct;
(b) an invocation code is inserted into said object program prior to said start code, said invocation code for addressing said start code and for transferring the parallel construct identified by the corresponding new threaded entry code to a multi-threaded run-time system for parallel execution; and
(c) a stop code is inserted into said object program after the last instruction of said parallel construct, said stop code being a threaded return instruction signaling to said multi-threaded run-time system to stop execution of said parallel construct and to return.
2. The method as claimed in claim 1, wherein:

(a) a location instruction is inserted after said stop code, said location instruction being a label instruction indicating the next instruction to be executed after the parallel construct; and
(b) a jump instruction is inserted prior to said start code and after said invocation code, said jump instruction for directing the run-time system to continue the execution at the location instruction.
3. The method as claimed in claim 1, comprising receiving said source program; and reading said source program.
4. The method as claimed in claim 1, wherein each sequence of said plurality of predetermined sequences is a parallel construct.
5. The method as claimed in claim 1, wherein said system is a multi-threaded run-time system capable of executing said each sequence in parallel.
6. The method as claimed in claim 1, wherein each sequence of said plurality of predetermined sequences is an OpenMP parallel construct.
7. The method as claimed in claim 1, wherein inserting said start code comprises generating said start code for insertion.

8. The method as claimed in claim 1, wherein inserting said invocation code comprises generating said invocation code for insertion.
9. The method as claimed in claim 2, wherein inserting said location instruction comprises generating said location instruction for insertion.
10. An apparatus comprising:
a system bus (201) for communicating information;
a memory (204) to store a source program;
a read only memory (206) for storing static information and instructions;
a display device (221) for displaying information to a user;
a processor (202) coupled to said memory; and
an alphanumeric input device (222) for communicating information to the processor;
wherein the said processor:
locates a plurality of predetermined sequences within said source program;
inserts a start code in said source program prior to a first instruction of each sequence of said plurality of predetermined sequences;
inserts an invocation code in said source program prior to said start code, said invocation code addressing said start code and transferring said each sequence to a system for execution; and
inserts a stop code in said source program after a last instruction of said each sequence of said plurality of sequences, said stop code signaling to said system to stop execution of said each sequence;
optionally comprising a data storage device (207) for storing information and instructions; and
a cursor control device (223) for communicating direction information to the processor.
11. The apparatus as claimed in claim 10, wherein the said processor inserts a location instruction after said stop code; generates a jump instruction to start execution of said source program at said location instruction; and inserts said jump instruction prior to said start code and subsequent to said invocation code.
12. The apparatus as claimed in claim 10, wherein the said processor receives said source program and reads said source program.
13. The apparatus as claimed in claim 10, wherein each sequence of said plurality of predetermined sequences is a parallel construct.
14. The apparatus as claimed in claim 10, wherein said system is a multi-threaded run-time system capable of executing said each sequence in parallel.

15. The apparatus as claimed in claim 10, wherein each sequence of said plurality of predetermined sequences is an OpenMP parallel construct.
16. The apparatus as claimed in claim 10, wherein prior to inserting said start code, said processor generates said start code for insertion.
17. The apparatus as claimed in claim 10, wherein prior to inserting said invocation code, said processor generates said invocation code for insertion.
18. The apparatus as claimed in claim 11, wherein prior to inserting said location instruction, said processor generates said location instruction for insertion.
Dated this 28th day of November, 2002.

[RITUSHKA NEGI]
OF REMFRY & SAGAR
ATTORNEY FOR THE APPLICANTS

Documents

Application Documents

# Name Date
1 in-pct-2002-01701-mum-power of authority(03-08-2001).pdf 2001-08-03
2 in-pct-2002-01701-mum-form 1(25-11-2002).pdf 2002-11-25
3 IN-PCT-2002-01701-MUM-WO INTERNATIONAL PUBLICATION REPORT(28-11-2002).pdf 2002-11-28
4 IN-PCT-2002-01701-MUM-POWER OF AUTHORITY(28-11-2002).pdf 2002-11-28
5 in-pct-2002-01701-mum-form-pct-ipea-409(28-11-2002).pdf 2002-11-28
6 in-pct-2002-01701-mum-form 5(28-11-2002).pdf 2002-11-28
7 in-pct-2002-01701-mum-form 3(28-11-2002).pdf 2002-11-28
8 IN-PCT-2002-01701-MUM-FORM 2(TITLE PAGE)-(COMPLETE)-(28-11-2002).pdf 2002-11-28
9 IN-PCT-2002-01701-MUM-FORM 2(COMPLETE)-(28-11-2002).pdf 2002-11-28
10 IN-PCT-2002-01701-MUM-FORM 1(28-11-2002).pdf 2002-11-28
11 IN-PCT-2002-01701-MUM-DRAWING(28-11-2002).pdf 2002-11-28
12 IN-PCT-2002-01701-MUM-DESCRIPTION(COMPLETE)-(28-11-2002).pdf 2002-11-28
13 IN-PCT-2002-01701-MUM-CLAIMS(28-11-2002).pdf 2002-11-28
14 IN-PCT-2002-01701-MUM-ABSTRACT(28-11-2002).pdf 2002-11-28
15 in-pct-2002-01701-mum-form 19(18-03-2004).pdf 2004-03-18
16 in-pct-2002-01701-mum-power of authority(16-05-2005).pdf 2005-05-16
17 in-pct-2002-01701-mum-form 2(granted)-(16-05-2005).pdf 2005-05-16
19 in-pct-2002-01701-mum-drawing(16-05-2005).pdf 2005-05-16
20 in-pct-2002-01701-mum-correspondence(16-05-2005).pdf 2005-05-16
21 in-pct-2002-01701-mum-claims(granted)-(16-05-2005).pdf 2005-05-16
23 in-pct-2002-01701-mum-cancelled pages(16-05-2005).pdf 2005-05-16
24 in-pct-2002-01701-mum-abstract(16-05-2005).pdf 2005-05-16
26 in-pct-2002-01701-mum-petition under rule 138(20-05-2005).pdf 2005-05-20
27 in-pct-2002-01701-mum-correspondence(ipo)-(30-09-2005).pdf 2005-09-30
28 IN-PCT-2002-01701-MUM-CORRESPONDENCE(IPO)-(14-11-2005).pdf 2005-11-14
29 IN-PCT-2002-01701-MUM-SPECIFICATION(AMENDED)-(16-5-2005).pdf 2018-08-08
30 IN-PCT-2002-01701-MUM-FORM 2(TITLE PAGE)-(GRANTED)-(30-9-2005).pdf 2018-08-08
31 IN-PCT-2002-01701-MUM-FORM 2(GRANTED)-(30-9-2005).pdf 2018-08-08
32 IN-PCT-2002-01701-MUM-DRAWING(GRANTED)-(30-9-2005).pdf 2018-08-08
33 IN-PCT-2002-01701-MUM-DESCRIPTION(GRANTED)-(30-9-2005).pdf 2018-08-08
34 IN-PCT-2002-01701-MUM-CORRESPONDENCE(20-5-2005).pdf 2018-08-08
35 IN-PCT-2002-01701-MUM-CLAIMS(GRANTED)-(30-9-2005).pdf 2018-08-08
36 IN-PCT-2002-01701-MUM-CLAIMS(AMENDED)-(20-5-2005).pdf 2018-08-08
37 IN-PCT-2002-01701-MUM-CANCELLED PAGES(20-5-2005).pdf 2018-08-08
38 IN-PCT-2002-01701-MUM-ABSTRACT(GRANTED)-(30-9-2005).pdf 2018-08-08

E-Register / Renewals

3rd: 23 Dec 2005

From 08/06/2003 - To 08/06/2006

4th: 23 Dec 2005

From 08/06/2004 - To 08/06/2007

5th: 23 Dec 2005

From 08/06/2005 - To 08/06/2008

6th: 07 Jun 2006

From 08/06/2006 - To 08/06/2007

7th: 30 May 2007

From 08/06/2007 - To 08/06/2008

8th: 29 May 2008

From 08/06/2008 - To 08/06/2009