2200 MISSION COLLEGE BOULEVARD,
SANTA CLARA,
CALIFORNIA 95052
Inventors
1. ADAM B. STRAUSS
1280 TAMARACK AVENUE,
BREA, CALIFORNIA 92821
2. ANURAG BIST
40 ALCOBA, IRVINE,
CALIFORNIA 91765
3. STAN HSIEH
1535 KIOW CREST DRIVE,
DIAMOND BAR,
CALIFORNIA 91765
4. ZHEN ZHU
39 LA RONDA, IRVINE,
CALIFORNIA 92606
5. RAGHAVENDRA S. PRABHU
2700 PETERSON PLACE,
#10H COSTA MESA,
CALIFORNIA 92626
Specification
FORM 2
THE PATENTS ACT 1970
[39 OF 1970]
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
[See Section 10; rule 13]
A METHOD FOR IMPLEMENTING TONE DETECTION IN A TONE DETECTION PROCESSOR
INTEL CORPORATION, a corporation incorporated in the State of Delaware, of 2200 Mission College Boulevard, Santa Clara, California 95052, United States of America.
The following specification particularly describes the invention and the manner in which it is to be performed:
TONE DETECTION
FOR
INTEGRATED TELECOMMUNICATIONS PROCESSING
RELATED APPLICATION
This application claims the benefit of U.S. Provisional Patent Application No. 60/231,090 filed on September 8, 2000.
FIELD OF THE INVENTION
This invention relates generally to signal processors. More particularly, the invention relates to telephone signal processors and tone detection for integrated telecommunications processing.
BACKGROUND OF THE INVENTION
Single chip digital signal processing devices (DSP) are relatively well known. DSPs generally are distinguished from general purpose microprocessors in that DSPs typically support accelerated arithmetic operations by including a dedicated multiplier and accumulator (MAC) for performing multiplication of digital numbers. The instruction set for a typical DSP device usually includes a MAC instruction for performing multiplication of new operands and addition with a prior accumulated value stored within an accumulator register. A MAC instruction is typically the only instruction provided in prior art digital signal processors where two DSP operations, multiply followed by add, are performed by the execution of one instruction. However, when performing signal processing functions on data it is often desirable to perform other DSP operations in varying combinations.
An area where DSPs may be utilized is in telecommunication systems. One use of DSPs in telecommunication systems is digital filtering. In this case a DSP is typically programmed with instructions to implement some filter function in the digital or time domain. The mathematical algorithm for a typical finite impulse response (FIR) filter may look like the equation Y_n = h_0 X_0 + h_1 X_1 + h_2 X_2 + ... + h_N X_N, where h_n are fixed filter coefficients numbering from 1 to N and X_n are the data samples. The equation Y_n may be evaluated by using a software program. However, in some applications, it is necessary that the equation be evaluated as fast as possible. One way to do this is to perform the computations using hardware components such as a DSP device programmed to compute the equation Y_n. In order to further speed the process, it is desirable to vectorize the equation and distribute the computation amongst multiple DSPs such that the final result is obtained more quickly. The multiple DSPs operate in parallel to speed the computation process. In this case, the multiplication of terms is spread across the multipliers of the DSPs equally for simultaneous computations of terms. The adding of terms is similarly spread equally across the adders of the DSPs for simultaneous computations. In vectorized processing, the order of processing terms is unimportant since the combination is associative. If the processing order of the terms is altered, it has no effect on the final result expected in a vectorized processing of a function.
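The vectorized evaluation described above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions: the function names and the `units` parameter (standing in for a number of DSPs) are hypothetical and do not appear in the specification, and the partial sums are formed one after another here rather than truly in parallel.

```python
def fir_output(h, x):
    """Sequential evaluation: Y_n = h_0*X_0 + h_1*X_1 + ... + h_N*X_N."""
    return sum(hk * xk for hk, xk in zip(h, x))


def fir_output_vectorized(h, x, units=4):
    """Spread the product terms across `units` hypothetical DSPs, then combine.

    Unit u handles terms u, u + units, u + 2*units, ...; each unit forms its
    own partial sum, and the partial sums are added together at the end.
    """
    partial_sums = [
        sum(hk * xk for hk, xk in zip(h[u::units], x[u::units]))
        for u in range(units)
    ]
    return sum(partial_sums)
```

With integer or fixed-point operands both functions return the same value regardless of how the terms are grouped. With floating-point operands a regrouped sum may differ in the last few bits because of rounding.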
One area where finite impulse response filters are applied is in echo cancellation for telephony processing. Echo cancellation is used to cancel echoes over full duplex telephone communication channels. The echo-cancellation process isolates and filters the unwanted signals caused by echoes from the main transmitted signal in a two-way transmission. Single or multiple DSP chips can be used to implement an echo canceller having a finite impulse response filter to provide echo cancellation. However, echo cancellation is only one part of telecommunication processing. Typically, telephone processing functions are spread over multiple devices, components or boards in a telephone communication system.
Referring now to Figure 8, a typical prior art telephone communication system is illustrated. A telephone, fax, or data modem couples to a local subscriber loop 802 at one end and another local subscriber loop 802' at an opposite end. Each of the local subscriber loops 802 and 802' couples to 2-wire/4-wire hybrid circuits 804 and 804'. Hybrid circuits 804 are composed of resistor networks, capacitors, and ferrite-core transformers. Hybrid circuits 804 convert 4-wire telephone trunk lines 806 (a pair in each direction) running between telephone exchanges of the PSTN 812 to each of the 2-wire local subscriber loops 802 and 802'. The hybrid circuits 804 are intended to direct all the energy from a talker on the 4-wire trunk 806 at a far-end to a listener on a 2-wire local subscriber loop 802 at a near end.
Echoes 810 are often formed when a speech signal from a far end talker leaves a far end hybrid 804' on a pair of the four wires 806', and arrives at the near end after traversing the PSTN 812, and may be heard by the listener at the near side. In traditional telephone networks, an echo canceller is placed at each end of the PSTN in order to reduce and attempt to eliminate this echo.
Referring now to Figure 9, a typical prior art digital echo canceller 900 is illustrated. The prior art digital echo canceller 900 couples between the hybrid circuit 804 and the public switched telephone network (PSTN) 902 on the telephone trunk lines. The governing specification for digital echo cancellers is the ITU-T recommendation G.168, Digital network echo cancellers. The following terms from ITU-T document G.168 are used herein and are illustrated in Figure 9. The end or side of the connection towards the local handset is referred to as the near end, near side or send side 910. The end or side of the connection towards the distant handset is referred to as the far end, far side or receive side 920. The part of the circuit from the near end 910 to the far end 920 is the send path 930. The part of the circuit from the far end to the near end is the receive path 935. The part of the circuit (i.e. copper wire, hybrid) in the local loop 802, between the end system subscriber or telephone system 108 and the central-office termination of the hybrid 804, is the end path. Speech signals entering the echo canceller 900 from the near end 910 are the send input S_in. Speech signals entering the echo canceller from the far end 920 are the received input R_in. Speech signals output from the echo canceller 900 to the far end 920 are the send output S_out. Speech signals exiting the echo canceller to the near end 910 are the received output R_out.
The typical prior art digital echo canceller 900 includes the basic components of an echo estimator 902, a digital subtractor 904, and a non-linear processor 906. Typically, the echo-cancellation process in the typical prior art digital echo canceller 900 begins by eliminating impedance mismatches. In order to do so, the typical digital echo canceller 900 taps the receive-side input signal (R_in). R_in is processed to generate an estimate of S_in in the echo estimator 902. S_in serves as the reference signal for the echo cancellation process. R_in is also passed through to the near end 910 without change as the R_out signal. The echo estimator 902 is a linear finite impulse response (FIR) convolution filter implemented in a DSP. The estimator 902 accepts successive samples of voice on R_in (typically a 16 bit sample every 125 microseconds). The voice samples are multiplied with a set of filter coefficients approximating the impulse response of circuitry in the end path to generate an echo estimation. Over time, the set of filter coefficients is changed (i.e. adapted) until it accurately represents the desired impulse response to form an accurate echo estimation. The echo estimation is coupled into the subtractor 904. If the echo estimation is accurate, it is substantially equivalent to the actual echo on S_in and the output from the subtractor 904 into the non-linear processor has linear echoes substantially removed. The non-linear processor 906 is used to remove non-linear echo sources.
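The estimator and subtractor stages described above may be sketched as follows. The specification does not state which adaptation rule the prior art canceller uses; the normalized least-mean-squares (NLMS) update shown here is an assumption chosen for illustration, and the function name, tap count, step size `mu`, and regularizer `eps` are all hypothetical. The non-linear processor 906 is not modelled.

```python
def nlms_echo_canceller(r_in, s_in, taps=8, mu=0.5, eps=1e-6):
    """Adaptive FIR echo estimator plus subtractor (NLMS update is an assumption).

    r_in: far-end samples (R_in); s_in: near-end samples containing echo (S_in).
    Returns the subtractor output samples and the final filter coefficients.
    """
    h = [0.0] * taps            # adaptive filter coefficients
    history = [0.0] * taps      # most recent R_in samples, newest first
    s_out = []
    for r, s in zip(r_in, s_in):
        history = [r] + history[:-1]
        # Echo estimation: FIR convolution of R_in with the coefficients.
        echo_estimate = sum(hk * xk for hk, xk in zip(h, history))
        # Subtractor: remove the estimated echo from S_in.
        residual = s - echo_estimate
        # Adapt the coefficients toward the impulse response of the end path.
        power = sum(xk * xk for xk in history) + eps
        h = [hk + mu * residual * xk / power for hk, xk in zip(h, history)]
        s_out.append(residual)
    return s_out, h
```

When S_in contains only a linear echo of R_in, the coefficients in this sketch approach the impulse response of the echo path and the residual approaches zero.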
With growing interest in providing telephony communication channels over packet networks such as the Internet or Asynchronous Transfer Mode (ATM), telephony processing has become more complicated.
BRIEF DESCRIPTIONS OF THE DRAWINGS
Figure 1A is a block diagram of a system utilizing the present invention.
Figure 1B is a block diagram of a printed circuit board utilizing the present invention within the gateways of the system in Figure 1A.
Figure 2 is a block diagram of the Application Specific Signal Processor (ASSP) of the present invention.
Figure 3 is a block diagram of an instance of the core processors within the ASSP of the present invention.
Figure 4 is a block diagram of the RISC processing unit within the core processors of Figure 3.
Figure 5A is a block diagram of an instance of the signal processing units within
the core processors of Figure 3.
Figure 5B is a more detailed block diagram of Figure 5A illustrating the bus
structure of the signal processing unit.
Figure 6A is an exemplary instruction sequence illustrating a program model for DSP algorithms employing the instruction set architecture of the present invention.
Figure 6B is a chart illustrating the permutations of the dyadic DSP instructions.
Figure 6C is an exemplary bitmap for a control extended dyadic DSP instruction.
Figure 6D is an exemplary bitmap for a non-extended dyadic DSP instruction.
Figures 6E and 6F list the set of 20-bit instructions for the ISA of the present invention.
Figure 6G lists the set of extended control instructions for the ISA of the present invention.
Figure 6H lists the set of 40-bit DSP instructions for the ISA of the present invention.
Figure 6I lists the set of addressing instructions for the ISA of the present invention.
Figure 7 is a block diagram illustrating the instruction decoding and configuration of the functional blocks of the signal processing units.
Figure 8 is a prior art block diagram illustrating a PSTN telephone network and
echoes therein.
Figure 9 is a prior art block diagram illustrating a typical prior art echo canceller for a PSTN telephone network.
Figure 10 is a block diagram of a packet network system incorporating the integrated telecommunications processor of the present invention.
Figure 11A is a block diagram of the firmware telecommunication processing modules of the integrated telecommunications processor for one of multiple full duplex channels.
Figure 11B illustrates a process for tone detection that can be implemented by a tone detection processor/module according to one embodiment of the invention.
Figure 11C illustrates a table of common frequencies used in the telecommunications industry and associated exemplary coefficients for a Goertzel filter used in conjunction with the process of Figure 11B according to one embodiment of the invention.
Figure 11D illustrates a partial dictionary of exemplary call progress tones used in conjunction with the process of Figure 11B according to one embodiment of the invention.
Figure 11E illustrates another process for tone detection that can be implemented by a tone detection processor/module according to another embodiment of the invention.
Figure 11F illustrates an efficient DFII structure for implementing elliptic IIR filters used in conjunction with the process of Figure 11E according to one embodiment of the invention.
Figure 11G illustrates a sub-process for phase reversal detection used in conjunction with the process of Figure 11E according to one embodiment of the invention.
Figure 11H illustrates a sub-process for FAX V.21 detection used in conjunction with the process of Figure 11E according to one embodiment of the invention.
Figure 12 is a flow chart of telecommunication processing from the near end to
the packet network.
Figure 13 is a flow chart of the telecommunication processing of a packet from the network into the integrated telecommunications processor into TDM signals at the near end.
Figure 14 is a block diagram of the data flows and interaction between exemplary functional blocks of the integrated telecommunications processor 150 for telephony processing.
Figure 15 is a block diagram of exemplary memory maps into the memories of the integrated telecommunications processor 150.
Figure 16 is a block diagram of an exemplary memory map for the global buffer memory of the integrated telecommunications processor 150.
Figure 17 is an exemplary time line diagram of reception and processing time for frames of data.
Figure 18 is an exemplary time line diagram of how core processors of the integrated telecommunications processor 150 process frames of data for multiple communication channels.
Like reference numbers and designations in the drawings indicate like elements providing similar functionality. A letter or prime after a reference designator number represents an instance of an element having the reference designator number.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
In the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without these specific details. In other instances well known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the present invention. Furthermore, the present invention will be described in particular embodiments but may be implemented in hardware, software, firmware or a combination thereof.
Multiple application specific signal processors (ASSPs) having the instruction set architecture of the present invention, including dyadic DSP instructions, are provided within gateways in communication systems to provide improved voice and data communication over a packetized network. Each ASSP includes a serial interface, a host interface, a buffer memory and four core processors in order to simultaneously process multiple channels of voice or data. Each core processor preferably includes a reduced instruction set computer (RISC) processor and four signal processing units (SPs). Each SP includes multiple arithmetic blocks to simultaneously process multiple voice and data communication signal samples for communication over IP, ATM, Frame Relay, or other packetized network. The four signal processing units can execute digital signal processing algorithms in parallel. Each ASSP is flexible and can be programmed to perform many network functions or data/voice processing functions, including voice and data compression/decompression in telecommunication systems (such as CODECs), particularly packetized telecommunication networks, simply by altering the software program controlling the commands executed by the ASSP.
An instruction set architecture for the ASSP is tailored to digital signal processing applications including audio and speech processing such as compression/decompression and echo cancellation. The instruction set architecture implemented with the ASSP is adapted to DSP algorithmic structures. This adaptation of the ISA of the present invention to DSP algorithmic structures balances the ease of implementation, processing efficiency, and programmability of DSP algorithms. The instruction set architecture may be viewed as being two component parts, one (RISC ISA) corresponding to the RISC control unit and another (DSP ISA) to the DSP datapaths of the signal processing units 300. The RISC ISA is a register based architecture including 16 registers within the register file 413, while the DSP ISA is a memory based architecture with efficient digital signal processing instructions. The instruction word for the ASSP is typically 20 bits but can be expanded to 40 bits to control two instructions to be executed in series or parallel, such as two RISC control instructions and extended DSP instructions. The instruction set architecture of the ASSP has four distinct types of instructions to optimize the DSP operational mix. These are (1) a 20-bit DSP instruction that uses mode bits in control registers (i.e. mode registers), (2) a 40-bit DSP instruction having control extensions that can override mode registers, (3) a 20-bit dyadic DSP instruction, and (4) a 40-bit dyadic DSP instruction. These instructions are for accelerating calculations within the core processor of the type where D = [(A op1 B) op2 C] and each of "op1" and "op2" can be a multiply, add or extremum (min/max) class of operation on the three operands A, B, and C. The ISA of the ASSP which accelerates these calculations allows efficient chaining of different combinations of operations.
All DSP instructions of the instruction set architecture of the ASSP are dyadic DSP instructions to execute two operations in one instruction with one cycle throughput. A dyadic DSP instruction is a combination of two DSP instructions or operations in one instruction and includes a main DSP operation (MAIN OP) and a sub DSP operation (SUB OP). Generally, the instruction set architecture of the present invention can be generalized to combining any pair of basic DSP operations to provide very powerful dyadic instruction combinations. The DSP arithmetic operations in the preferred embodiment include a multiply instruction (MULT), an addition instruction (ADD), a minimize/maximize instruction (MIN/MAX) also referred to as an extrema instruction, and a no operation instruction (NOP) each having an associated operation code ("opcode").
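The chained calculation D = [(A op1 B) op2 C] can be modelled in software with a short Python sketch. It is illustrative only and says nothing about instruction encoding or hardware timing. The `OPS` table and the `dyadic` function are hypothetical names, the sketch does not assume which of op1 and op2 corresponds to MAIN OP or SUB OP, and treating NOP as passing its left operand through unchanged is an assumption made for this example.

```python
import operator

# Software stand-ins for the basic DSP operations named in the text.
OPS = {
    "MULT": operator.mul,
    "ADD": operator.add,
    "MIN": min,
    "MAX": max,
    "NOP": lambda left, right: left,  # assumption: pass the left operand through
}


def dyadic(op1, op2, a, b, c):
    """Return D = [(A op1 B) op2 C] for the named pair of operations."""
    return OPS[op2](OPS[op1](a, b), c)
```

For example, `dyadic("MULT", "ADD", a, b, c)` reproduces the familiar multiply-accumulate, while `dyadic("ADD", "MAX", a, b, c)` chains an addition with an extremum operation.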
The present invention efficiently executes these dyadic DSP instructions by means of the instruction set architecture and the hardware architecture of the application specific signal processor.
Moreover, embodiments of the present invention relate to an integrated tone detection processor for discriminating between tone and voice signals and determining the tones. The integrated tone detection processor includes a semiconductor integrated circuit having at least one signal processing unit to perform tone detection. Further, a processor readable storage means/machine-readable medium (e.g. a storage device, such as memory) stores signal processing instructions for execution by the at least one signal processing unit to perform the functions of the tone detection processor. The tone detection processor performs automatic gain control (AGC) to normalize the power of the tone or voice signal. Further, the energy of the tone or voice signals is determined at specific frequencies utilizing a Goertzel Filter process which implements a plurality of Goertzel filters. The tone detection processor determines whether or not a tone is present, and if a tone exists, determines the type of tone.
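A Goertzel filter of the kind referred to above can be sketched in Python as follows. This is the standard textbook second-order recursion, not necessarily the exact form used in the embodiments. The function names are hypothetical, the 8000 Hz default sample rate is an assumption, and automatic gain control is not modelled.

```python
import math


def goertzel_energy(samples, freq_hz, sample_rate_hz=8000.0):
    """Return the squared magnitude of `samples` at `freq_hz` (Goertzel recursion)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s1 = s2 = 0.0  # the two most recent filter states
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def filter_bank(samples, freqs_hz, sample_rate_hz=8000.0):
    """Run one Goertzel filter per frequency of interest over the same frame."""
    return [goertzel_energy(samples, f, sample_rate_hz) for f in freqs_hz]
```

Each filter needs only one precomputed coefficient and two state variables per frequency, which is why a bank of such filters is a common choice when energy is required at a small set of known frequencies rather than across the whole spectrum.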
In one embodiment, the tone detection processor determines whether the tone is one of a dial tone, a busy tone, a fast busy tone, a ringing tone, or a fax tone. However, the tone detection processor can also determine many other types of tones. Also, the Goertzel filters can compute the energy levels of tone or voice signals at user-defined specific frequencies, for example at 16 user-defined frequencies. Based upon determining the two maximum energy levels of the Goertzel filtered tone, whether the tone is a single tone, dual tone, silence, or other (e.g. speech) can be discriminated. The tone can then be identified by a user-defined dictionary of tones. Based upon various ON and OFF cadence checks in combination with the use of TONE ON and TONE OFF counters, tones can be declared. Further, by utilizing four signal processors, simultaneously, according to an architecture of one embodiment of the present invention, very robust and efficient tone detection is provided.
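The discrimination step based on the two maximum energy levels, followed by a dictionary lookup, might be sketched as below. The decision rules and every threshold (`silence_floor`, `share`, `twist`) are hypothetical values chosen for illustration and are not taken from the specification. The filter energies are assumed to be expressed on the same scale as `frame_energy`, and the cadence checks and TONE ON / TONE OFF counters are not modelled.

```python
def classify_frame(energies, frame_energy, silence_floor=1e-3, share=0.8, twist=0.1):
    """Label one frame 'silence', 'single', 'dual' or 'other'.

    Returns (label, indices of the dominant filters). All thresholds are
    hypothetical tuning values.
    """
    if frame_energy < silence_floor:
        return "silence", ()
    # Rank the filters by energy and keep the two largest.
    ranked = sorted(range(len(energies)), key=lambda i: energies[i], reverse=True)
    first, second = ranked[0], ranked[1]
    e1, e2 = energies[first], energies[second]
    if e1 >= share * frame_energy:
        return "single", (first,)
    if e1 + e2 >= share * frame_energy and e2 >= twist * e1:
        return "dual", tuple(sorted((first, second)))
    return "other", ()


def identify_tone(label, indices, freqs_hz, dictionary):
    """Look up the detected frequency tuple in a user-defined tone dictionary."""
    if label not in ("single", "dual"):
        return None
    return dictionary.get(tuple(freqs_hz[i] for i in indices))
```

A dictionary entry such as `{(350, 440): "dial tone"}` would, for example, map a dual tone at 350 Hz and 440 Hz to the North American dial tone.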
Also, in other embodiments of the invention, other methods and structures for tone detection are provided, including the robust and efficient detection of FAX V.21 tones and modem tones.
Referring now to Figure 1A, a voice and data communication system 100 is illustrated. The system 100 includes a network 101 which is a packetized or packet-switched network, such as IP, ATM, or frame relay. The network 101 allows the communication of voice/speech and data between endpoints in the system 100, using packets. Data may be of any type including audio, video, email, and other generic forms of data. At each end of the system 100, the voice or data requires packetization when transceived across the network 101. The system 100 includes gateways 104A and 104B in order to packetize the information received for transmission across the network 101. A gateway is a device for connecting multiple networks and devices that use different protocols. Voice and data information may be provided to a gateway 104 from a number of different sources in a variety of digital formats. In system 100, analog voice signals are transceived by a telephone 108. In system 100, digital voice signals are transceived at public branch exchanges (PBX) 112A and 112B which are coupled to multiple telephones, fax machines, or data modems. Digital voice signals are transceived between PBX 112A and PBX 112B with gateways 104A and 104B, respectively, over the packet network 101. Digital data signals may also be transceived directly between a digital modem 114 and a gateway 104A. Digital modem 114 may be a Digital Subscriber Line (DSL) modem or a cable modem. Data signals may also be coupled into system 100 by a wireless communication system by means of a mobile unit 118 transceiving digital signals or analog signals wirelessly to a base station 116. Base station 116 converts analog signals into digital signals or directly passes the digital signals to gateway 104B. Data may be transceived by means of modem signals over the plain old telephone system (POTS) 107B using a modem 110. Modem signals communicated over POTS 107B are traditionally analog in nature and are coupled into a switch 106B of the public switched telephone network (PSTN). At the switch 106B, analog signals from the POTS 107B are digitized and transceived to the gateway 104B by time division multiplexing (TDM) with each time slot representing a channel and one DS0 input to gateway 104B. At each of the gateways 104A and 104B, incoming signals are packetized for transmission across the network 101. Signals received by the gateways 104A and 104B from the network 101 are depacketized and transcoded for distribution to the appropriate destination.
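The relationship between TDM time slots and channels mentioned above can be illustrated with a small Python sketch. The framing is an assumption made for this example only: one byte per time slot, with slots interleaved round-robin and no framing or signalling bits. The function name is hypothetical.

```python
def demux_tdm(stream, channels):
    """Split an interleaved TDM byte stream into one byte string per time slot.

    Assumes one byte per slot and `channels` slots per frame, so that byte i
    of the stream belongs to channel i % channels.
    """
    if len(stream) % channels:
        raise ValueError("stream must contain a whole number of frames")
    return [bytes(stream[slot::channels]) for slot in range(channels)]
```

Each returned byte string then corresponds to the sample stream of one channel, which a gateway could go on to packetize independently of the others.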
Referring now to Figure 1B, a network interface card (NIC) 130 of a gateway 104 is illustrated. The NIC 130 includes one or more application-specific signal processors (ASSPs) 150A-150N. The number of ASSPs within a gateway is expandable to handle additional channels. Line interface devices 131 of NIC 130 provide interfaces to various devices connected to the gateway, including the network 101. In interfacing to the network 101, the line interface devices packetize data for transmission out on the network 101 and depacketize data which is to be received by the ASSP devices. Line interface devices 131 process information received by the gateway on the receive bus 134 and provide it to the ASSP devices. Information from the ASSP devices 150 is communicated on the transmit bus 132 for transmission out of the gateway. A traditional line interface device is a multi-channel serial interface or a UTOPIA device. The NIC 130 couples to a gateway backplane/network interface bus 136 within the gateway 104. Bridge logic 138 transceives information between bus 136 and NIC 130. Bridge logic 138 transceives signals between the NIC 130 and the backplane/network interface bus 136 onto the host bus 139 for communication to either one or more of the ASSP devices 150A-150N, a host processor 140, or a host memory 142. Optionally coupled to each of the one or more ASSP devices 150A through 150N (generally referred to as ASSP 150) are optional local memory 145A through 145N (generally referred to as optional local memory 145), respectively. Digital data on the receive bus 134 and transmit bus 132 is preferably communicated in bit wide fashion. While internal memory within each ASSP may be sufficiently large to be used as a scratchpad memory, optional local memory 145 may be used by each of the ASSPs 150 if additional memory space is necessary.
Each of the ASSPs 150 provides signal processing capability for the gateway. The type of signal processing provided is flexible because each ASSP may execute differing signal processing programs. Typical signal processing and related voice packetization functions for an ASSP include (a) echo cancellation; (b) video, audio, and voice/speech compression/decompression (voice/speech coding and decoding); (c) delay handling (packets, frames); (d) loss handling; (e) connectivity (LAN and WAN); (f) security (encryption/decryption); (g) telephone connectivity; (h) protocol processing (reservation and transport protocols, RSVP, TCP/IP, RTP, UDP for IP, and AAL2, AAL1, AAL5 for ATM); (i) filtering; (j) silence suppression; (k) length handling (frames, packets); and other digital signal processing functions associated with the communication of voice and data over a communication system. Each ASSP 150 can perform other functions in order to transmit voice and data to the various endpoints of the system 100 within a packet data stream over a packetized network.
Referring now to Figure 2, a block diagram of the ASSP 150 is illustrated. At the heart of the ASSP 150 are four core processors 200A-200D. Each of the core processors 200A-200D is respectively coupled to a data memory 202A-202D and a program memory 204A-204D. Each of the core processors 200A-200D communicates with outside channels through the multi-channel serial interface 206, the multi-channel memory movement engine 208, buffer memory 210, and data memory 202A-202D. The ASSP 150 further includes an external memory interface 212 to couple to the external optional local memory 145. The ASSP 150 includes an external host interface 214 for interfacing to the external host processor 140 of Figure 1B. Further included within the ASSP 150 are timers 216, clock generators and a phase-lock loop 218, miscellaneous control logic 220, and a Joint Test Action Group (JTAG) test access port 222 for boundary scan testing. The multi-channel serial interface 206 may be replaced with a UTOPIA parallel interface for some applications such as ATM. The ASSP 150 further includes a microcontroller 223 to perform process scheduling for the core processors 200A-200D and the coordination of the data movement within the ASSP as well as an interrupt controller 224 to assist in interrupt handling and the control of the ASSP 150.
Referring now to Figure 3, a block diagram of the core processor 200 is illustrated coupled to its respective data memory 202 and program memory 204. Core processor 200 is the block diagram for each of the core processors 200A-200D. Data memory 202 and program memory 204 refer to a respective instance of data memory 202A-202D and program memory 204A-204D, respectively. The core processor 200 includes four signal processing units SP0 300A, SP1 300B, SP2 300C and SP3 300D. The core processor 200 further includes a reduced instruction set computer (RISC) control unit 302 and a pipeline control unit 304. The signal processing units 300A-300D perform the signal processing tasks on data while the RISC control unit 302 and the pipeline control unit 304 perform control tasks related to the signal processing function performed by the SPs 300A-300D. The control provided by the RISC control unit 302 is coupled with the SPs 300A-300D at the pipeline level to yield a tightly integrated core processor 200 that keeps the utilization of the signal processing units 300 at a very high level.
The signal processing tasks are performed on the datapaths within the signal processing units 300A-300D. The nature of the DSP algorithms is such that they are inherently vector operations on streams of data that have minimal temporal locality (data reuse). Hence, a data cache with demand paging is not used because it would not function well and would degrade operational performance. Therefore, the signal processing units 300A-300D are allowed to access vector elements (the operands) directly from data memory 202 without the overhead of issuing a number of load and store instructions into memory, resulting in very efficient data processing. Thus, the instruction set architecture of the present invention, having a 20 bit instruction word which can be expanded to a 40 bit instruction word, achieves better efficiencies than VLIW architectures using 256-bits or higher instruction widths by adapting the ISA to DSP algorithmic structures. The adapted ISA leads to very compact and low-power hardware that can scale to higher computational requirements. The operands that the ASSP can accommodate are varied in data type and data size. The data type may be real or complex, an integer value or a fractional value, with vectors having multiple elements of different sizes. The data size in the preferred embodiment is 64 bits but larger data sizes can be accommodated with proper instruction coding.
Referring now to Figure 4, a detailed block diagram of the RISC control unit 302 is illustrated. RISC control unit 302 includes a data aligner and formatter 402, a memory address generator 404, three adders 406A-406C, an arithmetic logic unit (ALU) 408, a multiplier 410, a barrel shifter 412, and a register file 413. The register file 413 points to a starting memory location from which memory address generator 404 can generate addresses into data memory 202. The RISC control unit 302 is responsible for supplying addresses to data memory so that the proper data stream is fed to the signal processing units 300A-300D. The RISC control unit 302 is a register to register organization with load and store instructions to move data to and from data memory 202. Data memory addressing is performed by RISC control unit using a 32-bit register as a pointer that specifies the address, post-modification offset, and type and permute fields. The type field allows a variety of natural DSP data to be supported as a "first class citizen" in the architecture. For instance the complex type allows direct operations on complex data stored in memory removing a number of bookkeeping instructions. This is useful in supporting QAM demodulators in data modems very efficiently.
Referring now to Figure 5A, a block diagram of a signal processing unit 300 is illustrated which represents an instance of the SPs 300A-300D. Each of the signal processing units 300 includes a data typer and aligner 502, a first multiplier M1 504A, a compressor 506, a first adder A1 510A, a second adder A2 510B, an accumulator register 512, a third adder A3 510C, and a second multiplier M2 504B. Adders 510A-510C are similar in structure and are generally referred to as adder 510. Multipliers 504A and 504B are similar in structure and generally referred to as multiplier 504. Each of the multipliers 504A and 504B has a multiplexer 514A and 514B respectively at its input stage to multiplex different inputs from different busses into the multipliers. Each of the adders 510A, 510B, 510C also has a multiplexer 520A, 520B, and 520C respectively at its input stage to multiplex different inputs from different busses into the adders. These multiplexers and other control logic allow the adders, multipliers and other components within the signal processing units 300A-300C to be flexibly interconnected by proper selection of multiplexers. In the preferred embodiment, multiplier M1 504A, compressor 506, adder A1 510A, adder A2 510B and accumulator 512 can receive inputs directly from external data buses through the data typer and aligner 502. In the preferred embodiment, adder 510C and multiplier M2 504B receive inputs from the accumulator 512 or the outputs from the execution units multiplier M1 504A, compressor 506, adder A1 510A, and adder A2 510B.
Program memory 204 couples to the pipe control 304 which includes an instruction buffer that acts as a local loop cache. The instruction buffer in the preferred embodiment has the capability of holding four instructions. The instruction buffer of the pipe control 304 reduces the power consumed in accessing the main memories to fetch instructions during the execution of program loops.
Referring now to Figure 5B, a more detailed block diagram of the functional blocks and the bus structure of the signal processing unit is illustrated. Dyadic DSP instructions are possible because of the structure and functionality provided in each signal processing unit. Output signals are coupled out of the signal processor 300 on the Z output bus 532 through the data typer and aligner 502. Input signals are coupled into the signal processor 300 on the X input bus 531 and Y input bus 533 through the data typer and aligner 502. Internally, the data typer and aligner 502 has a different data bus to couple to each of multiplier M1 504A, compressor 506, adder A1 510A, adder A2 510B, and accumulator register AR 512. While the data typer and aligner 502 could have data busses coupling to the adder A3 510C and the multiplier M2 504B, in the preferred embodiment it does not in order to avoid extra data lines and conserve area usage of an integrated circuit. Output data is coupled from the accumulator register AR 512 into the data typer and aligner 502. Multiplier M1 504A has buses to couple its output into the inputs of the compressor 506, adder A1 510A, adder A2 510B, and the accumulator registers AR 512. Compressor 506 has buses to couple its output into the inputs of adder A1 510A and adder A2 510B. Adder A1 510A has a bus to couple its output into the accumulator registers 512. Adder A2 510B has buses to couple its output into the accumulator registers 512. Accumulator registers 512 have buses to couple their output into multiplier M2 504B, adder A3 510C, and data typer and aligner 502. Adder A3 510C has buses to couple its output into the multiplier M2 504B and the accumulator registers 512. Multiplier M2 504B has buses to couple its output into the inputs of the adder A3 510C and the accumulator registers AR 512.
INSTRUCTION SET ARCHITECTURE
The instruction set architecture of the ASSP 150 is tailored to digital signal processing applications including audio and speech processing such as compression/decompression and echo cancellation. In essence, the instruction set architecture implemented with the ASSP 150 is adapted to DSP algorithmic structures. The adaptation of the ISA of the present invention to DSP algorithmic structures is a balance between ease of implementation, processing efficiency, and programmability of DSP algorithms. The ISA of the present invention provides for data movement operations, DSP/arithmetic/logical operations, program control operations (such as function calls/returns, unconditional/conditional jumps and branches), and system operations (such as privilege, interrupt/trap/hazard handling and memory management control).
Referring now to Figure 6A, an exemplary instruction sequence 600 is illustrated for a DSP algorithm program model employing the instruction set architecture of the present invention. The instruction sequence 600 has an outer loop 601 and an inner loop 602. Because DSP algorithms tend to perform repetitive computations, instructions 605 within the inner loop 602 are executed more often than others. Instructions 603 are typically parameter setup code to set the memory pointers, provide for the setup of the outer loop 601, and other 2x20 control instructions. Instructions 607 are typically context save and function return instructions or other 2x20 control instructions. Instructions 603 and 607 are often considered overhead instructions which are typically executed infrequently. Instructions 604 typically provide the setup for the inner loop 602, other control through 2x20 control instructions, or offset extensions for pointer backup. Instructions 606 typically provide tear down of the inner loop 602, other control through 2x20 control instructions, and combining of data path results within the signal processing units. Instructions 605 within the inner loop 602 typically provide inner loop execution of DSP operations, control of the four signal processing units 300 in a single instruction multiple data execution mode, memory access for operands, dyadic DSP operations, and other DSP functionality through the 20/40 bit DSP instructions of the ISA of the present invention. Because instructions 605 are so often repeated, significant improvement in operational efficiency may be had by providing the DSP instructions, including general dyadic instructions and dyadic DSP instructions, within the ISA of the present invention.
The instruction set architecture of the ASSP 150 can be viewed as being two component parts, one (RISC ISA) corresponding to the RISC control unit and another (DSP ISA) to the DSP data paths of the signal processing units 300. The RISC ISA is a register based architecture including sixteen registers within the register file 413, while the DSP ISA is a memory based architecture with efficient digital signal processing instructions. The instruction word for the ASSP is typically 20 bits but can be expanded to 40 bits to control two RISC or DSP instructions to be executed in series or parallel, such as a RISC control instruction executed in parallel with a DSP instruction, or a 40 bit extended RISC or DSP instruction.
The instruction set architecture of the ASSP 150 has 4 distinct types of instructions to optimize the DSP operational mix. These are (1) a 20-bit DSP instruction that uses mode bits in control registers (i.e. mode registers), (2) a 40-bit DSP instruction having control extensions that can override mode registers, (3) a 20-bit dyadic DSP instruction, and (4) a 40 bit dyadic DSP instruction. These instructions are for accelerating calculations within the core processor 200 of the type where D = [ (A op1 B) op2 C ] and each of "op1" and "op2" can be a multiply, add or extremum (min/max) class of operation on the three operands A, B, and C. The ISA of the ASSP 150 which accelerates these calculations allows efficient chaining of different combinations of operations. Because these types of operations require three operands, all three must be available to the processor. However, because the device size places limits on the bus structure, bandwidth is limited to two vector reads and one vector write each cycle into and out of data memory 202. Thus one of the operands, such as B or C, needs to come from another source within the core processor 200. The third operand can be placed into one of the registers of the accumulator 512 or the RISC register file 413. In order to accomplish this within the core processor 200 there are two subclasses of the 20-bit DSP instructions which are (1) A and B specified by a 4-bit specifier, and C and D by a 1-bit specifier and (2) A and C specified by a 4-bit specifier, and B and D by a 1-bit specifier.
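As an illustration of the chained form D = (A op1 B) op2 C, the following Python sketch models the arithmetic only. The names OPS and dyadic are hypothetical and are not part of the ISA; operand sourcing, vector width, and fixed-point behavior are not modeled.

```python
from typing import Callable, Dict

# Operation classes named in the text: multiply, add, extremum (min/max).
# The dictionary keys are illustrative labels, not ISA mnemonics.
OPS: Dict[str, Callable[[float, float], float]] = {
    "mult": lambda a, b: a * b,
    "add": lambda a, b: a + b,
    "min": min,
    "max": max,
}


def dyadic(op1: str, op2: str, a: float, b: float, c: float) -> float:
    """Return D = (A op1 B) op2 C for the named operation classes."""
    return OPS[op2](OPS[op1](a, b), c)


# Multiply-accumulate written as a dyadic operation: D = (A * B) + C
mac = dyadic("mult", "add", 3, 4, 5)
# Add followed by a maximum against a third operand: D = max(A + B, C)
peak = dyadic("add", "max", 3, 4, 10)
```

In this model the third operand c plays the role of the value that, per the text, would come from the accumulator or the RISC register file rather than from data memory.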
Instructions for the ASSP are always fetched 40 bits at a time from program memory with bits 39 and 19 indicating the type of instruction. After fetching, the instruction is grouped into two sections of 20 bits each for execution of operations. In the case of 20-bit control instructions with parallel execution (bit 39=0, bit 19=0), the two 20-bit sections are control instructions that are executed simultaneously. In the case of 20-bit control instructions for serial execution (bit 39=0, bit 19=1), the two 20-bit sections are control instructions that are executed serially. In the case of 20-bit DSP instructions for serial execution (bit 39=1, bit 19=1), the two 20-bit sections are DSP instructions that are executed serially. In the case of 40-bit DSP instructions (bit 39=1, bit 19=0), the two 20-bit sections form one extended DSP instruction and are executed simultaneously.
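The four combinations of bit 39 and bit 19 described above can be summarized in a small lookup. This is a minimal Python sketch; the function names classify_fetch and split_sections and the descriptive strings are hypothetical, and only the two type bits are interpreted.

```python
from typing import Tuple

# (bit 39, bit 19) -> instruction grouping, as listed in the text above.
FETCH_TYPES = {
    (0, 0): "two 20-bit control instructions, parallel execution",
    (0, 1): "two 20-bit control instructions, serial execution",
    (1, 1): "two 20-bit DSP instructions, serial execution",
    (1, 0): "one 40-bit extended DSP instruction",
}


def classify_fetch(word: int) -> str:
    """Classify a 40-bit fetch word by its type bits 39 and 19."""
    if not 0 <= word < (1 << 40):
        raise ValueError("expected a 40-bit fetch word")
    bit39 = (word >> 39) & 1
    bit19 = (word >> 19) & 1
    return FETCH_TYPES[(bit39, bit19)]


def split_sections(word: int) -> Tuple[int, int]:
    """Split a 40-bit fetch word into its upper and lower 20-bit sections."""
    return (word >> 20) & 0xFFFFF, word & 0xFFFFF
```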
The ISA of the ASSP 150 is fully predicated providing for execution prediction. Within the 20-bit RISC control instruction word and the 40-bit extended DSP instruction word there are 2 bits of each instruction specifying one of four predicate registers within the RISC control unit 302. Depending upon the condition of the predicate register, instruction execution can conditionally change based on its contents.
In order to access operands within the data memory 202 or registers within the accumulator 512 or register file 413, a 6-bit specifier is used in the DSP extended instructions to access operands in memory and registers. Of the six bit specifier used in the extended DSP instructions, the MSB (Bit 5) indicates whether the access is a memory access or register access. In the preferred embodiment, if Bit 5 is set to logical one, it denotes a memory access for an operand. If Bit 5 is set to a logical zero, it denotes a register access for an operand. If Bit 5 is set to 1, the contents of a specified register (rX where X: 0-7) are used to obtain the effective memory address and post-modify the pointer field by one of two possible offsets specified in one of the specified rX registers. If Bit 5 is set to 0, Bit 4 determines what register set has the contents of the desired operand. If Bit 4 is set to 0, then the remaining specified bits 3:0 control access to the registers within the register file 413 or to registers within the signal processing units 300.
DSP INSTRUCTIONS
There are four major classes of DSP instructions for the ASSP 150. These are:
1) Multiply (MULT): Controls the execution of the main multiplier connected to data buses from memory.
Controls: Rounding, sign of multiply
Operates on vector data specified through type field in address register
Second operation: Add, Sub, Min, Max in vector or scalar mode
2) Add (ADD): Controls the execution of the main adder
Controls: absolute value control of the inputs, limiting the result
Second operation: Add, add-sub, mult, mac, min, max
3) Extremum (MIN/MAX): Controls the execution of the main adder
Controls: absolute value control of the inputs. Global or running max/min with T register, TR register recording control
Second operation: add, sub, mult, mac, min, max
4) Misc: type-match and permute operations.
The ASSP 150 can execute these DSP arithmetic operations in vector or scalar fashion. In scalar execution, a reduction or combining operation is performed on the vector results to yield a scalar result. It is common in DSP applications to perform scalar operations, which are efficiently performed by the ASSP 150.
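The difference between vector and scalar execution can be sketched as an element-wise operation followed, in the scalar case, by a reduction. The Python below is illustrative only; vector_op, scalar_op, and the choice of four-element lists (one element per signal processing unit) are assumptions, not details taken from the instruction set.

```python
from functools import reduce
from typing import Callable, List

BinaryOp = Callable[[float, float], float]


def vector_op(op: BinaryOp, x: List[float], y: List[float]) -> List[float]:
    """Vector execution: apply op element by element and keep every result."""
    if len(x) != len(y):
        raise ValueError("operand vectors must have the same length")
    return [op(a, b) for a, b in zip(x, y)]


def scalar_op(op: BinaryOp, x: List[float], y: List[float],
              combine: BinaryOp) -> float:
    """Scalar execution: reduce the (non-empty) vector results to one value."""
    return reduce(combine, vector_op(op, x, y))


def mul(a: float, b: float) -> float:
    return a * b


def add(a: float, b: float) -> float:
    return a + b


products = vector_op(mul, [1, 2, 3, 4], [5, 6, 7, 8])   # [5, 12, 21, 32]
dot = scalar_op(mul, [1, 2, 3, 4], [5, 6, 7, 8], add)   # 70
```

Passing max or min as the combining function gives the extremum form of the same reduction.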
The 20-bit DSP instruction words have 4-bit operand specifiers that can directly access data memory using 8 address registers (r0-r7) within the register file 413 of the RISC control unit 302. The method of addressing by the 20 bit DSP instruction word is regular indirect with the address register specifying the pointer into memory, post-modification value, type of data accessed and permutation of the data needed to execute the algorithm efficiently. All of the DSP instructions control the multipliers 504A-504B, adders 510A-510C, compressor 506 and the accumulator 512, the functional units of each signal processing unit 300A-300D.
In the 40 bit instruction word, the type of extension from the 20 bit instruction word falls into five categories:
1) Control and Specifier extensions that override the control bits in mode registers
2) Type extensions that override the type specifier in address registers
3) Permute extensions that override the permute specifier for vector data in address registers
4) Offset extensions that can replace or extend the offsets specified in the address registers
5) DSP extensions that control the lower rows of functional units within a signal processing unit 300 to accelerate block processing.
The 40-bit control instructions with the 20 bit extensions further allow a large immediate value (16 to 20 bits) to be specified in the instruction and powerful bit manipulation instructions.
Efficient DSP execution is provided with 2x20-bit DSP instructions with the first 20 bits controlling the top functional units (adders 510A and 510B, multiplier 504A, compressor 506) that interface to data buses from memory and the second 20 bits controlling the bottom functional units (adder 510C and multiplier 504B) that use internal or local data as operands. The top functional units, also referred to as main units, reduce the inner loop cycles in the inner loop 602 by parallelizing across consecutive taps or sections. The bottom functional units cut the outer loop cycles in the outer loop 601 in half by parallelizing block DSP algorithms across consecutive samples.
Efficient DSP execution is also improved by the hardware architecture of the present invention. In this case, efficiency is improved in the manner that data is supplied to and from data memory 202 to feed the four signal processing units 300 and the DSP functional units therein. The data highway is comprised of two buses, X bus 531 and Y bus 533, for X and Y source operands, and one Z bus 532 for a result write. All buses, including X bus 531, Y bus 533, and Z bus 532, are preferably 64 bits wide. The buses are uni-directional to simplify the physical design and reduce transit times of data. In the preferred embodiment, when in a 20 bit DSP mode, if the X and Y buses are both carrying operands read from memory for parallel execution in a signal processing unit 300, the parallel load field can only access registers within the register file 413 of the RISC control unit 302. Additionally, the four signal processing units 300A-300D in parallel provide four parallel MAC units (multiplier 504A, adder 510A, and accumulator 512) that can make simultaneous computations. This reduces the cycle count from 4 cycles ordinarily required to perform four MACs to only one cycle.
DYADIC DSP INSTRUCTIONS
All DSP instructions of the instruction set architecture of the ASSP 150 are dyadic DSP instructions within the 20 bit or 40 bit instruction word. A dyadic DSP instruction informs the ASSP in one instruction and one cycle to perform two operations. Referring now to Figure 6B, a chart illustrating the permutations of the dyadic DSP instructions is shown. The dyadic DSP instruction 610 includes a main DSP operation 611 (MAIN OP) and a sub DSP operation 612 (SUB OP), a combination of two DSP instructions or operations in one dyadic instruction. Generally, the instruction set architecture of the present invention can be generalized to combining any pair of basic DSP operations to provide very powerful dyadic instruction combinations. Compound DSP operational instructions can provide uniform acceleration for a wide variety of DSP algorithms, not just multiply-accumulate intensive filters. The DSP instructions or operations in the preferred embodiment include a multiply instruction (MULT), an addition instruction (ADD), a minimize/maximize instruction (MIN/MAX) also referred to as an extrema instruction, and a no operation instruction (NOP), each having an associated operation code ("opcode"). Any two DSP instructions can be combined together to form a dyadic DSP instruction. The NOP instruction is used for the MAIN OP or SUB OP when a single DSP operation is desired to be executed by the dyadic DSP instruction. There are variations of the general DSP instructions such as vector and scalar operations of multiplication or addition, positive or negative multiplication, and positive or negative addition (i.e. subtraction).
Referring now to Figure 6C and Figure 6D, bitmap syntax for an exemplary dyadic DSP instruction is illustrated. Figure 6C illustrates bitmap syntax for a control extended dyadic DSP instruction while Figure 6D illustrates bitmap syntax for a non-extended dyadic DSP instruction. In the non-extended bitmap syntax the instruction word is the twenty most significant bits of a forty bit word while the extended bitmap syntax has an instruction word of forty bits. The three most significant bits (MSBs), bits numbered 37 through 39, in each indicate the MAIN OP instruction type while the SUB OP is located near the middle or end of the instruction bits at bits numbered 20 through 22. In the preferred embodiment, the MAIN OP instruction codes are 000 for NOP, 101 for ADD, 110 for MIN/MAX, and 100 for MULT. The SUB OP code for the given DSP instruction varies according to what MAIN OP code is selected. In the case of MULT as the MAIN OP, the SUB OPs are 000 for NOP, 001 or 010 for ADD, 100 or 011 for a negative ADD or subtraction, 101 or 110 for MIN, and 111 for MAX. In the preferred embodiment, the MAIN OP and the SUB OP are not the same DSP instruction although alterations to the hardware functional blocks could accommodate it. The lower twenty bits of the control extended dyadic DSP instruction, the extended bits, control the signal processing unit to perform rounding, limiting, absolute value of inputs for SUB OP, or a global MIN/MAX operation with a register value.
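The opcode fields described above can be extracted with shifts and masks. The following Python sketch is a hypothetical decoder: the name decode_dyadic is invented, and it assumes that bit 39 (respectively bit 22) is the most significant bit of each 3-bit code. Only the SUB OP table for a MULT MAIN OP is given in the text, so other MAIN OPs return no SUB OP name.

```python
from typing import Optional, Tuple

# MAIN OP codes from bits 39..37, as listed in the text.
MAIN_OP = {0b000: "NOP", 0b101: "ADD", 0b110: "MIN/MAX", 0b100: "MULT"}

# SUB OP codes from bits 22..20 when the MAIN OP is MULT, as listed in the text.
SUB_OP_FOR_MULT = {
    0b000: "NOP",
    0b001: "ADD", 0b010: "ADD",
    0b100: "SUB", 0b011: "SUB",
    0b101: "MIN", 0b110: "MIN",
    0b111: "MAX",
}


def decode_dyadic(word: int) -> Tuple[str, Optional[str]]:
    """Return (MAIN OP, SUB OP) names for a 40-bit dyadic DSP instruction word.

    Assumes bit 39 and bit 22 are the high bits of their 3-bit fields.
    SUB OP is None when the text does not list a mapping for the MAIN OP.
    """
    main_code = (word >> 37) & 0b111
    sub_code = (word >> 20) & 0b111
    if main_code not in MAIN_OP:
        raise ValueError(f"MAIN OP code {main_code:03b} is not listed")
    main = MAIN_OP[main_code]
    sub = SUB_OP_FOR_MULT[sub_code] if main == "MULT" else None
    return main, sub
```

For example, a word with 100 in bits 39..37 and 111 in bits 22..20 decodes to a multiply MAIN OP with a maximum SUB OP.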
The bitmap syntax of the dyadic DSP instruction can be converted into text syntax for program coding. Using the multiplication or MULT non-extended instruction as an example, its text syntax is
(vmul|vmuln).(vadd|vsub|vmax|sadd|ssub|smax) da, sx, sa, sy [,(ps0)|ps1)]
The "vmul|vmuln" field refers to either positive vector multiplication or negative vector multiplication being selected as the MAIN OP. The next field, "vadd|vsub|vmax|sadd|ssub|smax", refers to either vector add, vector subtract, vector maximum, scalar add, scalar subtraction, or scalar maximum being selected as the SUB OP. The next field, "da", refers to selecting one of the registers within the accumulator for storage of results. The field "sx" refers to selecting a register within the RISC register file 413 which points to a memory location in memory as one of the sources of operands. The field "sa" refers to selecting the contents of a register within the accumulator as one of the sources of operands. The field "sy" refers to selecting a register within the RISC register file 413 which points to a memory location in memory as another one of the sources of operands. The field "[,(ps0)|ps1)]" refers to pair selection of keyword PS0 or PS1 specifying which are the source-destination pairs of a parallel-store control register.
Referring now to Figures 6E and 6F, lists of the set of 20-bit DSP and control instructions for the ISA of the present invention are illustrated. Figure 6G lists the set of extended control instructions for the ISA of the present invention. Figure 6H lists the set of 40-bit DSP instructions for the ISA of the present invention. Figure 6I lists the set of addressing instructions for the ISA of the present invention.
Referring now to Figure 7, a block diagram illustrates the instruction decoding for configuring the blocks of the signal processing unit 300. The signal processor 300 includes the final decoders 704A through 704N, and multiplexers 720A through 720N. The multiplexers 720A through 720N are representative of the multiplexers 514, 516, 520, and 522 in Figure 5B. The predecoding 702 is provided by the RISC control unit 302 and the pipe control 304. An instruction is provided to the predecoding 702 such as a dyadic DSP instruction 600. The predecoding 702 provides preliminary signals to the appropriate final decoders 704A through 704N on how the multiplexers 720A through 720N are to be selected for the given instruction. Referring back to Figure 5B, in a dyadic DSP instruction the MAIN OP generally, if not a NOP, is performed by the blocks of the multiplier M1 504A, compressor 506, adder A1 510A, and adder A2 510B. The result is stored in one of the registers within the accumulator register AR 512. In the dyadic DSP instruction the SUB OP generally, if not a NOP, is performed by the blocks of the adder A3 510C and the multiplier M2 504B. For example, if the dyadic DSP instruction is to perform an ADD and MULT, then the ADD operation of the MAIN OP is performed by the adder A1 510A and the SUB OP is performed by the multiplier M2 504B. The predecoding 702 and the final decoders 704A through 704N appropriately select the respective multiplexers 720A through 720N to select the MAIN OP to be performed by the adder A1 510A and the SUB OP to be performed by the multiplier M2 504B. In the exemplary case, multiplexer 520A selects inputs from the data typer and aligner 502 in order for adder A1 510A to perform the ADD operation, multiplexer 522 selects the output from adder 510A for accumulation in the accumulator 512, and multiplexer 514B selects outputs from the accumulator 512 as its inputs to perform the MULT SUB OP. The MAIN OP and SUB OP can be either executed sequentially (i.e. serial execution on parallel words) or in parallel (i.e. parallel execution on parallel words). If implemented sequentially, the result of the MAIN OP may be an operand of the SUB OP. The final decoders 704A through 704N have their own control logic to properly time the sequence of multiplexer selection for each element of the signal processor 300 to match the pipeline execution of how the MAIN OP and SUB OP are executed, including sequential or parallel execution. The RISC control unit 302 and the pipe control 304 in conjunction with the final decoders 704A through 704N pipeline instruction execution by pipelining the instruction itself and by providing pipelined control signals. This allows for the data path to be reconfigured by the software instructions each cycle.
TELECOMMUNICATIONS PROCESSING
Referring now to Figure 10, a detailed system block diagram of the packetized telecommunications network 100' is illustrated. In the packetized telecommunications network 100' an end system 108A is at a near end while an end system 108B is at a far end. The end systems 108A and/or 108B can be a telephone, a fax machine, a modem, wireless pager, wireless cellular telephone or other electronic device that operates over a telephone communication system. The end system 108A couples to switch 106A which couples into gateway 104A. The end system 108B couples to switch 106B which couples into gateway 104B. Gateway 104A and gateway 104B couple to the packet network 101 to communicate voice and other telecommunication data between each other using packets. Each of the gateways 104A and 104B includes network interface cards (NIC) 130A-130N, a system controller board 1010, a framer card 1012, and an Ethernet interface card 1014. The network interface cards (NIC) 130A-130N in the gateways provide telecommunication processing for multiple communication channels over the packet network 101. On one side, the NICs 130 couple packet data into and out of the system controller board 1010. The packet data is packetized and depacketized by the system controller board 1010. The system controller board 1010 couples the packets of packet data into and out of the Ethernet interface card 1014. The Ethernet interface card 1014 of the gateways transmits and receives the packets of telecommunication data over the packet network 101. On an opposite side, the NICs 130 couple time division multiplexed (TDM) data into and out of the framer card 1012. The framer card 1012 frames the data from multiple switches 106 as time division multiplexed data for coupling into the network interface cards 130. The framer card 1012 pulls data out of the framed TDM data from the network interface cards 130 for coupling into the switches 106.
Each of the network interface cards 130 includes a micro controller (cPCI controller) 140 and one or more integrated telecommunications processors 150A-150N. Each of the integrated telecommunications processors 150A-150N includes one or more RISC/DSP core processors 200, one or more data memories (DRAM) 202, one or more program memories (PRAM) 204, one or more serial TDM interface ports 206 to support multiple TDM channels, a bus controller or memory movement engine 208, a global or buffer memory 210, a host or host bus interface 214, and a microcontroller (MIPS) 223. Firmware flexibly controls the functionality of the blocks in the integrated telecommunications processor 150 which can vary for each individual channel of communication.
Referring now to Figure 11A, a block diagram of the firmware telecommunications processing modules of the application specific signal processor 150, forming the "integrated telecommunications processor" 150, for one of multiple full duplex channels is illustrated. One full duplex channel consists of two time-division multiplexed (TDM) time slots on the TDM or near side and two packet data channels on the packet network or far side, one for each direction of communication. The telecommunication processing provided by the firmware can provide telephony processing for each given channel including one or more of network echo cancellation 1103, dial tone detection 1104, a fax processor 1119, voice activity detection 1105, dual-tone multi-frequency (DTMF) signal detection 1106, dual-tone multi-frequency (DTMF) signal generation 1107, dial tone generation 1108, G.7xxx voice encoding (i.e. compression) 1109, G.7xxx voice decoding (i.e. decompression) 1110, and comfort noise generation (CNG) 1111. The firmware for each channel is flexible and can also provide GSM decoding/encoding, CDMA decoding/encoding, digital subscriber line (DSL), modem services including modulation/demodulation, fax services including modulation/demodulation and/or other functions associated with telecommunications services for one or more communication channels. While μ-Law / A-Law decoding 1101 and μ-Law / A-Law encoding 1102 can be performed using firmware, in one embodiment they are implemented in hardware circuitry in order to speed the encoding and decoding of multiple communication channels. The integrated telecommunications processor 150 couples to the host processor 140 and a packet processor 1120. The host processor 140 loads the firmware into the integrated telecommunications processor to perform the processing in a voice over packet (VoP) network system or packetized network system.
The μ-Law / A-Law decoding 1101 decodes encoded speech into linear speech data. The μ-Law / A-Law encoding 1102 encodes linear speech data into μ-Law / A-Law encoded speech. The integrated telecommunications processor 150 includes hardware G.711 μ-Law / A-Law decoders and μ-Law / A-Law encoders. The hardware conversion of A-law / μ-law encoded signals into linear PCM samples and vice versa is optional depending upon the type of signals received. Using hardware for this conversion is preferable in order to speed the conversion process and handle additional communication channels. The TDM signals at the near end are encoded speech signals. The integrated telecommunications processor 150 receives TDM signals from the near end and decodes them into pulse-code modulated (PCM) linear data samples Sin. These PCM linear data samples Sin are coupled into the network echo-cancellation module 1103. The network echo-cancellation module 1103 removes an echo estimated signal from the PCM linear data samples Sin to generate PCM linear data samples Sout. The PCM linear data samples Sout are provided to the DTMF detection module 1106 and the voice-activity detection and comfort-noise generator module 1105. The output of the Network Echo Canceller (Sout) is coupled into the Tone Detection module 1104, the DTMF Detection module 1106, and the Voice Activity Detection module 1105. Control signals from the Tone Detection module 1104 are coupled back into the Network Echo Cancellation module 1103. The decoded speech samples from the far end are PCM linear data samples Rin and are coupled into the network echo cancellation module 1103. The network echo cancellation module 1103 copies Rin for echo cancellation purposes and passes it out as PCM linear data samples Rout. The PCM linear data samples Rout are coupled into the mu-law and A-law encoding module 1102. The PCM linear data samples Rout are encoded into mu-law and A-law encoded speech and interleaved into the TDM output signals of the TDM channel output to the near end. The interleaving for framing of the data is performed after the linear to A-law/mu-law conversion by a Framer (not shown in Figure 11A) which puts the individual channel data into different time slots. For example, for T1 signaling there are 24 such time slots for each T1 frame.
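To make the decoding step concrete, the following Python sketch expands one 8-bit μ-law code word into a linear PCM sample. It follows the commonly used software form of G.711 μ-law expansion scaled to 16-bit output; that scaling, and the name ulaw_to_linear, are assumptions. The processor described here performs this conversion in hardware, which the sketch does not model.

```python
BIAS = 0x84  # 132, the bias used by the common 16-bit mu-law formulation


def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit mu-law code word to a signed linear PCM sample."""
    if not 0 <= code <= 0xFF:
        raise ValueError("mu-law code words are 8 bits")
    code = ~code & 0xFF            # code words are stored bit-inverted
    sign = code & 0x80             # top bit: set for negative samples
    exponent = (code >> 4) & 0x07  # 3-bit segment number
    mantissa = code & 0x0F         # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


silence = ulaw_to_linear(0xFF)       # 0
positive_peak = ulaw_to_linear(0x80)  # 32124
negative_peak = ulaw_to_linear(0x00)  # -32124
```

Encoding toward the TDM side is the inverse mapping; A-law uses a different segment layout and is not shown.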
The Network Echo Cancellation module 1103 has two inputs and two outputs because it has full duplex interfaces with both the TDM channels and the packet network via the VX-Bus. The network echo cancellation module 1103 cancels echoes from linear as well as non-linear sources in the communication channel. The network echo cancellation module 1103 is specifically tailored to cancel non-linear echoes associated with the packet delays/latency generated in the packetized network.
The tone detection module 1104 receives both tone and voice signals from the network echo cancellation module 1103. The tone detection module 1104 discriminates the tones from the voice signals in order to determine what the tones are signaling. The tone detection module determines whether or not the tones from the near end are call progress tones (dial tone, busy tone, fast busy tone, etc.) signaling on-hook, ringing, off-hook or busy, or a fax/modem call. If a far end is dialing the near end, the call progress tones of on-hook, ringing, off-hook or busy are translated into packet signals by the tone detection module for transmission over the packet network to the far end. If the tone detection module determines that fax/modem tones are present, indicating that the near end is initiating a fax/modem call, further voice processing is bypassed and the echo cancellation by the network echo cancellation module 1103 is disabled.
To detect tones, the tone detection module 1104 uses infinite impulse-response (IIR) filters and accompanying logic. When a FAX or modem signaling tone is detected, the signaling tone helps control the respective signaling event. The tone detection module 1104 detects the presence of several in-band tones at specific frequencies, checks their cadences, signals their presence to the echo cancellation module 1103, and prompts other modules to take appropriate actions. The tone detection module 1104 and the DTMF detection module operate in parallel with the network echo canceller 1103.
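One common way to test for a single frequency with a second-order IIR recurrence is the Goertzel algorithm. The Python sketch below uses it to illustrate the idea of an IIR filter followed by an energy threshold; it is a swapped-in, generic technique and not necessarily the filter structure used by the tone detection module 1104. The 8000 Hz sample rate, the 0.5 threshold, and all function names are assumptions, and cadence checking is not shown.

```python
import math
from typing import List, Sequence

SAMPLE_RATE = 8000.0  # Hz; assumed telephony PCM rate, not stated in the text


def goertzel_power(samples: Sequence[float], freq_hz: float,
                   sample_rate: float = SAMPLE_RATE) -> float:
    """Squared spectral magnitude at freq_hz via a second-order IIR recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2  # resonator tuned to freq_hz
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def tone_present(samples: Sequence[float], freq_hz: float,
                 ratio_threshold: float = 0.5,
                 sample_rate: float = SAMPLE_RATE) -> bool:
    """Compare in-band energy with total energy; the threshold is illustrative."""
    n = len(samples)
    total = sum(x * x for x in samples)
    if n == 0 or total == 0.0:
        return False
    # For a pure tone the ratio below is close to 1; for other signals it is lower.
    ratio = goertzel_power(samples, freq_hz, sample_rate) / (total * n / 2.0)
    return ratio >= ratio_threshold


def make_tone(freq_hz: float, n: int, amplitude: float = 1000.0,
              sample_rate: float = SAMPLE_RATE) -> List[float]:
    """Generate n samples of a sine tone for exercising the detector."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


ced_block = make_tone(2100.0, 200)   # 25 ms of a 2100 Hz tone
cng_block = make_tone(1100.0, 200)   # 25 ms of an 1100 Hz tone
```

Calling tone_present(ced_block, 2100.0) reports the tone, while tone_present(cng_block, 2100.0) does not. A practical detector would also apply the energy thresholds, noise handling, and cadence rules that the text describes.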
The tone detection module can detect true tones with signal amplitude levels from 0 dB to -40 dB in the presence of a reasonable amount of noise. The tone detection module can detect tones within a reasonable neighborhood of center frequency with detection delays within a prescribed limit. The tone detection module matches the tone cadences, as required by the tone-cadence rules defined by the ITU/TIA standards. To achieve the above properties, certain trade-offs are necessary in that the tone detection module must adjust several energy thresholds, the filter roll-off rate, and the filter stop band attenuation. Furthermore, the tone detection module is easily upgradeable to allow detection of additional tones simply by updating the firmware. The current telephony-related tones that the tone-detection module 1104 can detect are listed in the following table:
Tones the Tone-Detection Module Detects

Tone Name | Tone Description | "On" Time | "Off" Time
FAX CED | 2100 Hz | 2.6 to 4 seconds |
Echo Cancellation Disable / Modem Tones | 2100 Hz, with phase reversal every 450 ms | 2.6 to 4 seconds |
FAX CNG | 1100 Hz | 0.5 seconds | 3 seconds
FAX V.21 | 7E flags; frequency-shift keying at 1750-Hz carrier. At least three 7E flags signal the onset of a FAX signal being sent. | |
2400 Hz, 2600 Hz | In-band signaling tones and continuity check tones. G.168 Test 8 describes the performance of echo cancellation in the presence of these tones. | |
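The "on" and "off" times listed for a tone such as FAX CNG can be checked against per-frame detection results. The following is only a minimal sketch of such a cadence check, not the module's actual logic: the 8 kHz sampling rate, the tolerance ranges, and the names `run_lengths` and `matches_cadence` are all assumptions introduced for illustration.

```python
# Assumption: 120-sample frames at an 8 kHz sampling rate, i.e. 15 ms per frame.
FRAME_SECONDS = 120 / 8000.0


def run_lengths(flags):
    """Collapse per-frame detection flags into (is_on, duration_seconds) runs."""
    runs = []
    for flag in flags:
        if runs and runs[-1][0] == flag:
            runs[-1][1] += 1          # extend the current run by one frame
        else:
            runs.append([flag, 1])    # start a new run
    return [(on, count * FRAME_SECONDS) for on, count in runs]


def matches_cadence(flags, on_range, off_range):
    """Return True if an 'on' run within on_range is immediately followed
    by an 'off' run within off_range (both ranges in seconds)."""
    runs = run_lengths(flags)
    for (on_a, dur_a), (on_b, dur_b) in zip(runs, runs[1:]):
        if on_a and not on_b:
            on_ok = on_range[0] <= dur_a <= on_range[1]
            off_ok = off_range[0] <= dur_b <= off_range[1]
            if on_ok and off_ok:
                return True
    return False


# Hypothetical tolerances around the FAX CNG cadence (0.5 s on, 3 s off).
CNG_ON_RANGE = (0.4, 0.6)
CNG_OFF_RANGE = (2.7, 3.3)
```

For example, 33 consecutive "tone present" frames followed by 200 "tone absent" frames would satisfy the assumed CNG ranges, whereas a 10-frame burst would not.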
When a 2100-Hz tone with phase reversal is detected, indicating V-series modem operation, the echo canceller is shut off temporarily. When the tone detection module detects facsimile tones, the echo canceller is shut off temporarily. The tone detection module can also detect the presence of narrowband signals, which can be control signals to control the actions of the echo cancellation module 1103. The tone detection module functions both during call setup and while the call progresses through termination of the communication channel for the call. Any tone which is sent, generated, or detected before the actual call or communication channel is established is referred to as an out-of-band tone. Tones which are detected during a call, after the call has been set up, are referred to as in-band tones. The Tone Detector, in its most general form, is capable of detecting many signaling tones. The tones that are detected include the call progress tones such as a Ringing Tone, a Busy Tone, a Fast Busy Tone, a Caller ID Tone, a Dial Tone, and other signaling tones which vary from country to country. The call progress tones control the handshaking required to set up a call. Once a call is established, all the tones which are generated and detected are referred to as in-band tones. The same Tone Detector and Generator blocks are used for both in-band and out-of-band tone detection and generation.
Figure 11B illustrates a process 1121 for tone detection that can be implemented by a tone detection processor/module according to one embodiment of the invention. As previously discussed, the tone detection module 1104 receives both tone and voice signals from the network echo cancellation module and discriminates the tones from the voice signals in order to determine what the tones are signaling. The tone detection module determines whether or not the tones are call progress tones (dial tone, busy tone, fast busy tone, etc.) signaling on-hook, ringing, off-hook or busy, a fax call signal, or a modem call signal.
Upon start (block 1122), the process 1121 receives incoming tone and voice data frames. A frame is composed of N samples of the incoming tone/voice signal. In one embodiment, a frame is composed of, for example, 120 samples. Frequency resolution increases as the frame size increases, while time resolution decreases; a frame size of 120 samples was chosen to optimize both time and frequency resolution. The process 1121 operates on a frame-by-frame basis. The process 1121 first performs automatic gain control (AGC) (block 1124). The principle of operation of the AGC is based on normalizing the power of the incoming tone/voice signal to make sure that the gain is not so high that it will overflow the Goertzel filter. In doing so, the AGC computes the total energy (e.g., $\sum_n x(n)^2$).
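The energy computation and normalization step described above can be illustrated with a short sketch. This is not the AGC of the described embodiment; the `target_energy` headroom limit and the function names are hypothetical, chosen only to show how a per-frame total energy can drive a gain that keeps later filter stages from overflowing.

```python
import math

FRAME_SIZE = 120  # samples per frame, as in the embodiment described in the text


def frame_energy(frame):
    """Total energy of one frame: the sum of x(n)^2 over its samples."""
    return sum(x * x for x in frame)


def normalize_frame(frame, target_energy=1.0):
    """Scale the frame down so its total energy does not exceed target_energy.

    target_energy is a hypothetical headroom limit, not a value from the text.
    Returns (scaled_frame, gain_applied).
    """
    energy = frame_energy(frame)
    if energy <= target_energy:
        return list(frame), 1.0           # already within headroom
    gain = math.sqrt(target_energy / energy)
    return [x * gain for x in frame], gain
```

Because energy scales with the square of the gain, the gain is the square root of the energy ratio; frames already below the limit pass through unchanged.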
Next, the process 1121 utilizes a Goertzel Filter process which implements a plurality of Goertzel filters to determine the energy of the tone/voice signal at specific frequencies. The Goertzel filter is a type of discrete Fourier transform used to obtain a power spectrum, as a function of frequency, for a given signal waveform. The Goertzel filter is a type of infinite impulse-response (IIR) filter and is well known in the art. These specific frequencies can be chosen by the user of the integrated telecommunications processor 150. Figure 11C shows a table of common frequencies used in the telecommunications industry and associated exemplary coefficients for the Goertzel filter. Also, it should be appreciated that the user can define two frequencies to define dual-tone multi-frequency (DTMF) tones, as well as other combinations of frequencies, to define various tones.
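For reference, the standard Goertzel recurrence can be sketched as follows. This is the textbook form of the algorithm, not the firmware of the described processor; the 8 kHz sampling rate and the function names are assumptions, and the coefficient computed here (2 cos(2πf/fs)) is the conventional one, which may or may not match the scaling of the exemplary coefficients in Figure 11C.

```python
import math


def goertzel_power(frame, freq_hz, sample_rate_hz=8000.0):
    """Squared spectral magnitude of the frame at freq_hz via the Goertzel recurrence.

    sample_rate_hz defaults to 8 kHz, an assumption typical of telephony.
    """
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s1 = 0.0  # state s[n-1]
    s2 = 0.0  # state s[n-2]
    for x in frame:
        s0 = x + coeff * s1 - s2  # second-order IIR feedback section
        s2, s1 = s1, s0
    # Energy at the target frequency from the final two state values.
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def goertzel_bank(frame, freqs_hz, sample_rate_hz=8000.0):
    """Run one Goertzel filter per frequency and return {frequency: power}."""
    return {f: goertzel_power(frame, f, sample_rate_hz) for f in freqs_hz}
```

Applied to a 120-sample frame containing a 2100 Hz sinusoid, the bank reports a large power at 2100 Hz and a negligible one at 1100 Hz, which is the kind of per-frequency energy the subsequent detection logic can compare against thresholds.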
In one embodiment, the Goertzel filter computes the energy levels of the tone/voice signal at 16 specific frequencies. This takes advantage of the architecture of the integrated telecommunications processor 150. In one embodiment, the integrated telecommunications processor 150 includes a RISC/DSP core processor 200 that includes four signal processors 300a-d that can operate in parallel to perform four Goertzel filters simultaneously. Thus, in four cycles of the core processor 200, 16
Goertzel filters can be computed to determine the energy levels of the tone/voice si