System, Apparatus And Method For Handling Multi-Protocol Traffic In Data Link Layer Circuitry

Abstract: In one embodiment, an apparatus includes: a transaction layer circuit to output transaction layer information; and a link layer circuit coupled to the transaction layer circuit, the link layer circuit to receive and process the transaction layer information and to output link layer information to a physical circuit. The link layer circuit may include a first selection circuit to receive and direct cache memory protocol traffic to a selected one of a first logical port and a second logical port. Other embodiments are described and claimed.


Patent Information

Application #
Filing Date
16 November 2021
Publication Number
25/2022
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application

Applicants

Intel Corporation
2200 Mission College Boulevard, Santa Clara, California 95054, USA

Inventors

1. NITISH PALIWAL
6117 SE Hacienda Street Hillsboro OR 97123 USA
2. PEEYUSH PUROHIT
2413 NW Byrne Terrace Portland OR 97229 USA
3. SWADESH CHOUDHARY
227 Pettis Avenue Apartment 2 Mountain View CA 94041 USA
4. MANJULA PEDDIREDDY
3826 Eastwood Circle Santa Clara CA 95054 USA
5. MAHESH NATU
713 Sands Way Folsom CA 95630 USA
6. MAHESH WAGH
13527 NW Hogan Street Portland OR 97229 USA

Specification

Claims:
1. An apparatus comprising:
a transaction layer circuit to output transaction layer information; and
a link layer circuit coupled to the transaction layer circuit, the link layer circuit to receive and process the transaction layer information and to output link layer information to a physical circuit, the link layer circuit comprising:
a first selection circuit to receive and direct cache memory protocol traffic to a selected one of a first logical port and a second logical port.

2. The apparatus of claim 1, further comprising a control circuit coupled to the first selection circuit, the control circuit to control the first selection circuit to direct the cache memory protocol traffic to the second logical port, the second logical port mapped to a second physical port of the physical circuit.

3. The apparatus of claim 2, wherein the control circuit comprises:
a first buffer to store the cache memory protocol traffic; and
a second buffer to store memory protocol traffic.

4. The apparatus of claim 3, wherein the first selection circuit is coupled to the first buffer, the first selection circuit to direct the cache memory protocol traffic to a selected one of the first logical port or the second logical port according to control information from the control circuit.

5. The apparatus of claim 3, further comprising an address decoder coupled to the second buffer, the address decoder to direct the memory protocol traffic to at least one of the first logical port, the second logical port, a third logical port, and a fourth logical port.

6. The apparatus of claim 1, further comprising:
a second selection circuit coupled to the first logical port and a third logical port and to direct the cache memory protocol traffic and memory protocol traffic to another selection circuit; and
a third selection circuit coupled to the second logical port and a fourth logical port and to direct the cache memory protocol traffic and the memory protocol traffic to the another selection circuit.

7. The apparatus of claim 6, further comprising the another selection circuit coupled to the second selection circuit and the third selection circuit.

8. The apparatus of claim 2, further comprising at least one configuration register having a first field to store a first indicator which, when set to a first value, is to cause the first selection circuit to direct the cache memory protocol traffic to the second logical port, and a second field to store a second indicator which, when set, is to prevent an update to the first indicator.

9. The apparatus of claim 2, wherein the control circuit is to receive a configuration message based on a protocol encoding received from the physical circuit, the protocol encoding based on detection of a type of device coupled to the second physical port of the physical circuit.

10. The apparatus of claim 2, wherein the control circuit is to cause the first selection circuit to direct the cache memory protocol traffic to the second logical port when a Compute Express Link (CXL) Type 2 device is coupled to the second physical port of the physical circuit.

11. A method comprising:
receiving configuration information regarding connection of a cache-capable device to a first port of a plurality of ports of a physical circuit coupled to the cache-capable device via a data bus;
in response to the configuration information, configuring a first selection circuit of a link layer coupled to the physical circuit to cause cache memory protocol traffic to be directed to a first logical port of the link layer mapped to the first port of the physical circuit; and
directing the cache memory protocol traffic, via the first selection circuit, to the first logical port and thereafter to the first port of the physical circuit for transfer to the cache-capable device.

12. The method of claim 11, further comprising receiving the configuration information comprising one or more protocol encoding messages from the physical circuit, to indicate detection of one or more devices coupled to the plurality of ports of the physical circuit.

13. The method of claim 12, further comprising receiving the configuration information comprising a first indicator to indicate that the cache-capable device is coupled to the first port of the physical circuit.

14. The method of claim 12, further comprising directing memory protocol traffic, via an address decoder, to a second logical port of the link layer and thereafter to a second port of the plurality of ports of the physical circuit for transfer to a memory device coupled to the second port.

15. At least one computer readable storage medium having stored thereon instructions, which if performed by a machine cause the machine to perform the method of any one of claims 11 to 14.

16. An apparatus comprising means to perform a method as claimed in any one of claims 11 to 14.

17. A system comprising:
a host processor comprising one or more cores and a Compute Express Link (CXL) interface circuit, the CXL interface circuit comprising:
a transaction layer circuit to output transaction layer information;
a link layer circuit coupled to the transaction layer circuit, the link layer circuit to receive and process the transaction layer information and to output link layer information to a physical circuit, the link layer circuit comprising:
a first selection circuit to receive and direct CXL.cache protocol traffic to a selected one of a first logical port and a second logical port; and
a plurality of logical ports comprising the first logical port, the second logical port, a third logical port and a fourth logical port, wherein the first selection circuit is coupled to the first logical port and the second logical port; and
a physical layer circuit coupled to the link layer circuit, the physical layer circuit comprising a logical circuit and a physical circuit, the physical circuit comprising a plurality of physical ports comprising a first physical port, a second physical port, a third physical port and a fourth physical port;
a cache-capable device coupled to the host processor via a CXL link, the cache-capable device coupled to the second physical port, the second physical port mapped to the second logical port; and
a memory device coupled to the host processor via the CXL link, the memory device coupled to the first physical port, the first physical port mapped to the first logical port.

18. The system of claim 17, wherein the link layer circuit comprises a control circuit coupled to the first selection circuit, the control circuit to control the first selection circuit to direct the CXL.cache protocol traffic to the second logical port.

19. The system of claim 18, wherein the control circuit comprises:
a first buffer to store the CXL.cache protocol traffic; and
a second buffer to store CXL.memory protocol traffic.

20. The system of claim 19, wherein the first selection circuit is coupled to the first buffer, the first selection circuit to direct the CXL.cache protocol traffic to the second logical port according to control information from the control circuit.

21. The system of claim 20, wherein the control circuit is to generate the control information in response to a protocol encoding from the physical layer circuit, the protocol encoding to indicate detection of the cache-capable device coupled to the second physical port.

22. The system of claim 20, wherein the control circuit is to further disable cache functionality of the cache-capable device when another cache-capable device is coupled to the first physical port, the first physical port advertised as Port 0.
Description:
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims priority to U.S. Non-Provisional Patent Application No. 17/128,648, filed on December 21, 2020, entitled SYSTEM, APPARATUS AND METHOD FOR HANDLING MULTI-PROTOCOL TRAFFIC IN DATA LINK LAYER CIRCUITRY, the disclosure of which is hereby incorporated by reference.
Technical Field
[0002] Embodiments relate to multi-protocol communications via a link.
Background
[0003] Compute Express Link (CXL) is an interconnect technology that allows attachment of CXL-compliant devices to host processor systems. CXL links are implemented according to a given CXL specification such as the CXL Specification version 1.1 (published June 2019). A device may couple to such CXL links via a FlexBus (FxB) port. A single device or multiple devices may be attached on the same FxB port concurrently using link subdivision. However, there are currently restrictions as to the types of devices allowed to be connected to specific lanes of the link, which unduly limits flexibility.
Brief Description of the Drawings
[0004] FIG. 1 is a block diagram of a control circuit that may be present in a link layer in accordance with an embodiment.
[0005] FIG. 2 is a block diagram of a control circuit that may be present in a link layer in accordance with another embodiment.
[0006] FIG. 3 is a flow diagram of a method in accordance with an embodiment.
[0007] FIG. 4 is a flow diagram of a method in accordance with another embodiment.
[0008] FIG. 5 is a flow diagram of a method in accordance with a still further embodiment.
[0009] FIG. 6 is a block diagram of an interface circuit in accordance with an embodiment.
[0010] FIG. 7 is a block diagram of a system in accordance with an embodiment.
[0011] FIG. 8 is a block diagram of a system in accordance with another embodiment of the present invention.
[0012] FIG. 9 is a block diagram of an embodiment of a SoC design in accordance with an embodiment.
[0013] FIG. 10 is a block diagram of a system in accordance with another embodiment of the present invention.
[0014] FIG. 11 is a block diagram of a network architecture in accordance with an embodiment.
Detailed Description
[0015] In various embodiments, a data link layer (also “link layer” or “link layer circuit” herein) is provided that can support multi-protocol message transmission and further allow flexibility as to device types that may be attached on specific lanes of an interconnect. As a result, richer feature sets can be realized without an added cost in terms of silicon area and power.
[0016] This data link layer may be part of a protocol stack through which communications flow, where the protocol stack further includes a transaction layer and a physical layer. While flexible device attachment and link layer circuitry are described in connection with a CXL-based system, embodiments are not limited in this regard.
[0017] Further, while one example use case is for a cloud-based architecture that may communicate using interconnects and switches in accordance with a CXL specification such as the CXL 1.1 Specification or any future versions, modifications, variations or alternatives, other implementations are possible. For example, embodiments may be used in other coherent interconnect technologies such as an IBM XBus protocol, an Nvidia NVLink protocol, an AMD Infinity Fabric protocol, a cache coherent interconnect for accelerators (CCIX) protocol or an Open Coherent Accelerator Processor Interface (OpenCAPI) protocol.
[0018] In a CXL implementation, traffic flows of different communication protocols are sent along CXL interconnects. For example, there may be separate traffic flows including so-called CXL.cache, CXL.io and CXL.mem communication protocols via which traffic of different communication protocols is communicated. More generally, the interconnect may support various interconnect protocols, including a non-coherent interconnect protocol, a coherent interconnect protocol, and a memory interconnect protocol. Non-limiting examples of supported interconnect protocols may include PCI, PCIe, USB, IDI, IOSF, SMI, SMI3, SATA, CXL.io, CXL.cache, and CXL.mem, and/or the like.
[0019] With embodiments, a microarchitecture for a data link layer may include circuitry to flexibly support multiple protocols (such as both CXL.cache and CXL.mem, collectively herein "CXL.cache-mem") through a single pipeline, where protocol traffic is routed to the appropriate bifurcated physical lanes without adding extra performance overheads and hardware penalties.
[0020] In different implementations, configuration of a link layer to provide flexible port mappings can be realized using hardware-autonomous techniques that may reduce system software overhead, or using a software-based technique via system-level software. With embodiments herein, platform architects and users can realize flexible attach points for CXL Type 1 and Type 2 (cache-capable) devices without additional hardware cost. That is, link layer and other circuitry may be reused across ports instead of duplicating data main-band modules for packing communications. As such, embodiments provide flexibility with negligible power, performance, and area impact.
[0021] With embodiments, common design resources are used to realize link subdivision without logic replication, thus providing flexibility with negligible power and area impact, and no performance impact. For instance, embodiments use shared buffer resources for CXL.cache protocol traffic across ports and use multiplexing and demultiplexing schemes to route traffic to the appropriate device/port, realizing a significant reduction of bit-cells for storage.
[0022] Embodiments also operate with zero performance degradation, both in terms of latency and bandwidth, by simplifying selection circuitry to be a single logic level scheme and restricting bus routing requirements to the CXL.cache channel only. Other link-specific peripheral functions built into the controller, such as RAS and power management, may continue to operate at per-port granularity, seamlessly. Other unique host-specific implementation functions built into the controller, such as bus lock and interrupt handling, may also operate seamlessly and need not be made aware of FlexBus topology.
[0023] In embodiments a link layer may include a personality-agnostic data link layer (CXLCM) module as a controller instance (referred to herein as a "CXLCM controller") capable of transporting CXL.cache and CXL.mem protocol messages over a FxB link without any restrictions on where a cache-capable device is attached. In some cases any port may be selected as an attach point for a cache-capable device. According to current CXL architectural requirements, a cache-capable Type 1 or Type 2 device warrants support for x8 link width. To reduce routing requirements, an embodiment may enable selection of a limited number of ports as a choice of attach point for connecting an accelerator to a CPU host. Of course, embodiments are not limited in this regard, and other embodiments may extend support to all ports where support for x4 cache-capable devices is warranted.
[0024] In an embodiment, the CXLCM controller may interface with a logical physical unit (PHY) module according to a data bus in accordance with a Logical PHY Interface (LPIF) Specification, such as the LPIF Specification version 1.0 (March 2019) or any future versions, modifications or variations. To enable CXLCM controller configuration, embodiments may use LPIF-based information/encodings to identify detected devices. In an embodiment, an LPIF data bus interface is 16 lanes wide [15:0] and is shared across subdivided ports based on how a FxB link is natively subdivided. For example, for 2x8 attachments, lower 8 lanes [7:0] map to Port 0 and upper 8 lanes [15:8] map to Port 2, respectively. Similarly, for 4x4 devices, lanes [3:0], [7:4], [11:8], and [15:12] map to Port 0, Port 1, Port 2, and Port 3, respectively.
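The lane-to-port mapping described above can be modeled in a few lines. This is an illustrative sketch of the subdivision scheme as stated in [0024], not an implementation of the LPIF Specification itself; the function name is hypothetical.

```python
def lpif_lane_to_port(lane: int, subdivision: str) -> int:
    """Map a lane of the 16-lane LPIF data bus [15:0] to a FxB port.

    Illustrative model of the 2x8 and 4x4 subdivisions described in [0024].
    """
    assert 0 <= lane <= 15
    if subdivision == "2x8":
        # Lower 8 lanes [7:0] -> Port 0, upper 8 lanes [15:8] -> Port 2.
        return 0 if lane < 8 else 2
    if subdivision == "4x4":
        # Lanes [3:0], [7:4], [11:8], [15:12] -> Ports 0, 1, 2, 3.
        return lane // 4
    raise ValueError(f"unsupported subdivision: {subdivision}")
```

Note that in the 2x8 case the upper half maps to Port 2 (not Port 1), matching the native subdivision numbering above.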
[0025] With this in perspective, FIGs. 1 and 2 outline a high level block diagram of link layer circuitry for both transmit (Tx) and receive (Rx) directions, where a link is subdivided in 4x4 port configuration, and Ports 0 and 2 are annotated to support both cache and memory traffic. Of course other variations of ports and links widths are possible in other embodiments.
[0026] Referring now to FIG. 1, shown is a block diagram of a control circuit that may be present in a link layer in accordance with an embodiment. As shown in FIG. 1, a transmit pipe of a control circuit 100 such as a CXLCM controller is shown. This control circuit may be present in link layer circuitry of a CXL protocol stack. While this control circuit is for an embodiment according to a CXL implementation, understand that other implementations are possible.
[0027] As illustrated, incoming information, e.g., from a transaction layer, may be received in a given buffer 110, 115. More specifically, incoming cache memory protocol traffic may be stored in cache transmit buffers 110. In turn incoming memory protocol traffic may be stored in memory transmit buffers 115. Note that buffers 110, 115 may include or be associated with control circuitry to handle buffer management, e.g., using a credit-based mechanism. With dedicated protocol-based buffers, multi-protocol design may be simplified and various forward progress rules are enforced.
[0028] When a given cache memory data unit, such as a packet, flit or so forth is selected for output from buffer 110, it is provided to a first selection circuit 120, which may be implemented as a swizzle multiplexer. Note that communication of protocol traffic throughout the link layer may be on the basis of CXL network flits. As shown, first selection circuit 120 may direct this cache memory protocol traffic to a selected one of multiple logical ports 130.
[0029] Specifically as shown, in this example there are four logical ports 1300–1303. Each logical port may be mapped to a corresponding physical port present in a physical circuit (not shown for ease of illustration in FIG. 1). With embodiments herein, cache memory protocol traffic may be directed to a selected one of logical port 1300 and logical port 1302. Of course in other implementations, this traffic may be directed to any of the other logical ports.
[0030] Also, depending upon particular configuration, each of these ports also may be configured to receive memory protocol traffic. Address decoder 125 in turn may direct a given memory protocol traffic unit to a given one of logical ports 130. Address decoder 125 routes CXL.mem protocol messages to any of ports 1300-3, depending on the CXL device for which a memory request message is destined. In this way address decoder 125 routes a memory request transaction in one-master-to-many-subordinates fashion.
[0031] Ports 130 may be defined as logical (and potentially structural) entities that provide one-to-one mapping to a subdivided FxB link (and thus to corresponding physical ports of a physical layer). Annotation of (CM) in ports 1300 and 1302 specifies that a given port is capable of packing cache and memory transactions, whereas (M) specifies that a given port is capable of packing memory transactions only (e.g., ports 1301 and 1303). In this way, embodiments implement a silicon area optimized design where a ‘superset’ capable port is not required for all port [i] instances.
[0032] In an example each port 130 may output N bytes of data information (e.g., 16 bytes in one example). As further illustrated, the output of pairs of ports 130 may be provided to an additional level of selection circuits, namely subdivision multiplexers 140, 145. In turn, up to 2N bytes may be output from these multiplexers to another level selection circuit, namely another subdivision multiplexer 150, which may pass given traffic on particular lanes of a data bus 160. Note that in another embodiment, the various selection circuits may be implemented below port [i] hierarchies. Data bus 160 may be implemented as a LPIF data bus to couple a link layer to a physical layer circuit, that in turn couples to one or more devices via a CXL link. Understand while shown at this high level in the embodiment of FIG. 1, many variations and alternatives are possible.
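The transmit-side steering of FIG. 1 can be sketched as a small software model. This is a hypothetical illustration of the behavior described in [0028]-[0030] (swizzle multiplexer 120 for cache traffic, address decoder 125 for memory traffic) under a 4x4 subdivision where only Ports 0 and 2 are (CM)-capable; the function name and flit representation are not from the patent.

```python
def route_tx_flit(flit, cache_select_port2, addr_decode):
    """Return the logical port (0-3) a transmit flit is steered to.

    flit: dict with a "protocol" key and, for CXL.mem, an "addr" key.
    cache_select_port2: the single swizzle-mux control bit from the
        control circuit (claim 4's control information).
    addr_decode: callable mapping a memory address to a destination port.
    """
    if flit["protocol"] == "CXL.cache":
        # Swizzle multiplexer 120: one control bit picks Port 0 or Port 2,
        # the two (CM) ports capable of packing cache transactions.
        return 2 if cache_select_port2 else 0
    if flit["protocol"] == "CXL.mem":
        # Address decoder 125: one-master-to-many-subordinates routing
        # to any of the four logical ports.
        port = addr_decode(flit["addr"])
        assert port in (0, 1, 2, 3)
        return port
    raise ValueError("CXLCM pipeline carries CXL.cache/CXL.mem only")
```

The single-control-bit swizzle reflects the zero-latency claim of [0022]: cache steering is one logic level, while only memory traffic requires full address decode.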
[0033] Referring now to FIG. 2, shown is a block diagram of a control circuit that may be present in a link layer in accordance with another embodiment. As shown in FIG. 2, a receive pipe of a control circuit 200 such as a CXLCM controller is shown. This control circuit may be present in link layer circuitry of a CXL protocol stack (and may be included as part of the same circuitry as the control circuit of FIG. 1).
[0034] As illustrated, incoming information, e.g., from a data bus 260, is received via physical layer circuitry and provided via selection circuitry implemented as a subdivision demultiplexer 250 to an additional level of selection circuits, namely subdivision demultiplexers 240, 245. In turn, up to 2N bytes that are received may be output in N byte chunks to corresponding logical ports 2300–2303, each of which may be mapped to a corresponding physical port present in a physical circuit. In this receive direction, logical port 2300 and logical port 2302 may communicate via a selection circuit 220, which may be implemented as a swizzle demultiplexer, to a cache receive buffer 210. In turn, memory protocol traffic may be directed from any of ports 2300–2303 to a memory receive buffer 215. Understand while shown at this high level in the embodiment of FIG. 2, many variations and alternatives are possible.
[0035] Selection circuitry present in control circuits 100 and 200, e.g., may be used to route concurrent cache and memory traffic to various subdivided ports through a controller stack. Depending on implementation, selection control for this circuitry may be performed on a hardware-autonomous or software basis.
[0036] In a hardware-autonomous implementation, a link layer may use information presented on a LPIF control interface bus to determine whether to activate particular select lines. To this effect, physical layer (pl_protocol) encodings according to a LPIF definition provide clues about device types that are detected during training. Details about such 3-bit encodings are available in the LPIF specification, and attached devices advertise during boot-time the protocol encodings that they intend to be operational during run-time. This hardware-autonomous approach may enable a CXLCM controller to sample and latch pl_protocol encodings and implement a phased priority approach to activate selection circuitry.
[0037] Table 1 illustrates a possible approach; however, an alternate implementation may choose not to follow this behavior.
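The hardware-autonomous sampling of [0036] can be sketched as follows. This is a hypothetical model only: the encoding value used here is a placeholder, not an actual LPIF-defined pl_protocol code, and the phased priority order (Port 0 before Port 2) is an assumption consistent with the (CM)-capable ports of FIGs. 1 and 2.

```python
# Placeholder set of 3-bit pl_protocol encodings taken to advertise
# CXL.cache capability; real values are defined by the LPIF specification.
CACHE_CAPABLE_ENCODINGS = {0b011}

def latch_cache_select(pl_protocol_by_port):
    """Latch boot-time pl_protocol encodings and pick the cache attach port.

    pl_protocol_by_port: mapping from port number to its sampled 3-bit
        pl_protocol encoding.
    Returns the (CM) port whose device advertised cache capability, or
    None if no cache-capable device was detected on a (CM) port.
    """
    for port in (0, 2):  # only the (CM) ports can pack cache transactions
        if pl_protocol_by_port.get(port) in CACHE_CAPABLE_ENCODINGS:
            return port
    return None
```

The returned port would then drive the swizzle-mux select line of the transmit and receive pipes.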

Documents

Application Documents

# Name Date
1 202144052566-FORM 1 [16-11-2021(online)].pdf 2021-11-16
2 202144052566-DRAWINGS [16-11-2021(online)].pdf 2021-11-16
3 202144052566-DECLARATION OF INVENTORSHIP (FORM 5) [16-11-2021(online)].pdf 2021-11-16
4 202144052566-COMPLETE SPECIFICATION [16-11-2021(online)].pdf 2021-11-16
5 202144052566-FORM-26 [26-02-2022(online)].pdf 2022-02-26
6 202144052566-FORM 3 [13-05-2022(online)].pdf 2022-05-13
7 202144052566-FORM 3 [16-11-2022(online)].pdf 2022-11-16
8 202144052566-FORM 18 [16-12-2024(online)].pdf 2024-12-16