Abstract: Various embodiments provide a method and apparatus of providing an RE aware technique for placing slots based on redundancy across and within slot communication pairs.
METHOD AND APPARATUS FOR PROVIDING TRAFFIC RE-AWARE
SLOT PLACEMENT
TECHNICAL FIELD
The inventions relate generally to methods and apparatus for providing
slot placement.
BACKGROUND
This section introduces aspects that may be helpful in facilitating a
better understanding of the inventions. Accordingly, the statements of this
section are to be read in this light and are not to be understood as admissions
about what is in the prior art or what is not in the prior art.
In some known virtual machine placement strategies, virtual machines
are placed in data center racks based on minimizing the bandwidth
requirements of data transfers across racks.
SUMMARY OF SOME EMBODIMENTS
Various embodiments provide a method and apparatus of providing an
RE-aware technique for placing slots based on redundancy across and within
slot communication pairs.
In one embodiment, an apparatus is provided for providing slot
placement. The apparatus includes a data storage and a processor
communicatively connected to the data storage. The processor is prgrammed
to: determine a plurality of slots to be placed within a plurality of containers;
determine a plurality of redundancy parameters; and determine a placement
of the plurality of slots in the plurality of containers based on the plurality of
redundancy parameters.
In some of the embodiments, a plurality of the plurality of slots are
virtual machines.
In some of the embodiments, a plurality of the plurality of containers
are racks in a data center.
In some of the embodiments, the redundancy parameters are based on
intra-comm-redundancy and inter-comm-redundancy of at least a portion of
the plurality of slots.
In some of the embodiments, the redundancy parameters are based on
communication patterns of at least a portion of the plurality of slots.
In some of the embodiments, the determination of the placement of the
plurality of slots comprises programming the processor to: determine a
plurality of clusters; assign each of at least a portion of the plurality of slots to
at least one of the plurality of clusters; and assign each of at least a portion of
the plurality of clusters to the at least one of the plurality of containers.
In some of the embodiments, the assignment of the portion of the
plurality of slots is based on a required slot bandwidth after redundancy
elimination.
In some of the embodiments, the assignment of the portion of the
plurality of slots is further based on one or more slot placement constraints.
In some of the embodiments, the assignment of the portion of the
plurality of clusters is based on an effective-inter-container-bandwidth-usage.
In some of the embodiments, the determination of the placement of the
plurality of slots comprises further programming the processor to exchange a
first placed slot and a second placed slot based on a determination that intercontainer-
bandwidth-usage will be improved and that the exchange meets
one or more inter-container bandwidth constraints.
In a second embodiment, a switching system fof providing slot
placement. The system including: a placement controller and a plurality of
agents communicatively connected to the placement controller. The plurality
of agents are programmed to: collect a plurality of redundancy measurements
and send the plurality of redundancy measurements to the placement
controller. The placement controller is prgrammed to: receive the plurality of
redundancy measurements, determine a plurality of slots to be placed within a
plurality of containers, determine a plurality of redundancy parameters based
on the plurality of redundancy measurements, and determine a placement of
the plurality of slots in the plurality of containers based on the plurality of
redundancy parameters.
In some of the embodiments, the determination of the placement of the
plurality of slots comprises programming the placement controller to:
determine a plurality of clusters, assign each of at least a portion of the
plurality of slots to at least one of the plurality of clusters, and assign each of
at least a portion of the plurality of clusters to the at least one of the plurality of
containers.
In some of the embodiments, the assignment of the portion of the
plurality of slots is based on a required slot bandwidth after redundancy
elimination.
In some of the embodiments, the assignment of the portion of the
plurality of slots is further based on one or more slot placement constraints.
In some of the embodiments, the assignment of the portion of the
plurality of clusters is based on an effective-inter-container-bandwidth-usage.
In some of the embodiments, the determination of the placement of the
plurality of slots comprises further programming the placement controller to:
exchange a first placed slot and a second placed slot based on a
determination that inter-container-bandwidth-usage will be improved and that
the exchange meets one or more inter-container bandwidth constraints.
In a third embodiment, a method is provided for providing slot
placement. The method including: determining a plurality of slots to be placed
within a plurality of containers, determining a plurality of redundancy
parameters, and determining a placement of the plurality of slots in the
plurality of containers based on the plurality of redundancy parameters.
In some of the embodiments, the method further includes exchanging a
first placed slot and a second placed slot based on a determination that intercontainer-
bandwidth-usage will be improved and that the exchange meets
one or more inter-container bandwidth constraints.
In some of the embodiments, the step of determining the placement of
the plurality of slots includes: determining a plurality of clusters, assigning
each of at least a portion of the plurality of slots to at least one of the plurality
of clusters, and assigning each of at least a portion of the plurality of clusters
to the at least one of the plurality of containers.
In some of the embodiments, the step of assigning the portion of the
plurality of slots is based on a required slot bandwidth after redundancy
elimination.
BRIEF DESCRIPTION OF THE DRAWINGS
Various embodiments are illustrated in the accompanying drawings, in
which:
FIG. 1 illustrates an embodiment of an RE-aware data center 100
serving clients 190-1 and 190-2 via network 180;
FIG. 2 depicts a flow chart illustrating an embodiment of a method 200
for an RE-aware system (e.g., RE-aware data center 100 of FIG. 1) to perform
RE-aware placements of slots;
FIG. 3 illustrates a functional block diagram 300 of redundancy in an
RE-aware data center 100;
FIG. 4 depicts a flow chart illustrating an embodiment of a method 400
for a placement controller (e.g., placement controller 170 of FIG. 1) to place
slots as illustrated in step 270 of FIG. 2;
FIG. 5 depicts a flow chart illustrating an embodiment of a method 500
for placing slots as illustrated in step 480 of FIG. 4;
FIG. 6 depicts a flow chart illustrating an embodiment of a method 600
for placing slots as illustrated in step 480 of FIG. 4 and steps 520, 540 and
FIG. 7 depicts pseudo code illustrating an embodiment of steps 6 10 -
640 of FIG. 6;
FIG. 8 depicts pseudo code illustrating an embodiment of step 650 of
FIG. 6;
FIG. 9 depicts pseudo code illustrating an embodiment of step 670 of
FIG. 6; and
FIG. 10 schematically illustrates an embodiment of various apparatus
1000 such as one one of end hosts 120, one of ToR switches 130, one of
aggregation switches 150, or placement controller 170 of FIG. 1.
To facilitate understanding, identical reference numerals have been
used to designate elements having substantially the same or similar structure
or substantially the same or similar function.
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
The description and drawings merely illustrate the principles of the
invention. It will thus be appreciated that those skilled in the art will be able to
devise various arrangements that, although not explicitly described or shown
herein, embody the principles of the invention and are included within its
scope. Furthermore, all examples recited herein are principally intended
expressly to be only for pedagogical purposes to aid the reader in
understanding the principles of the invention and the concepts contributed by
the inventor(s) to furthering the art, and are to be construed as being without
limitation to such specifically recited examples and conditions. Additionally,
the term, "or," as used herein, refers to a non-exclusive or, unless otherwise
indicated (e.g., "or else" or "or in the alternative"). Also, the various
embodiments described herein are not necessarily mutually exclusive, as
some embodiments can be combined with one or more other embodiments to
form new embodiments.
Various embodiments provide a method and apparatus of providing an
RE-aware technique for placing slots based on redundancy across and within
slot communication pairs. Advantageously, such placement may reduce
overall bandwidth usage of inter-container (e.g., racks or data center) links.
FIG. 1 illustrates an embodiment of an RE-aware data center 100
serving clients 190-1 and 190-2 via network 180. The RE-aware data center
100 includes racks 110-1 and 110-2 (collectively, racks 110), top of the rack
switches 130-1 and 130-2 (collectively, ToR switches 130), redundancy
elimination (RE) boxes 140-1 and 140-2 (collectively, RE boxes 140),
aggregation switches 150-1 - 150-2 (collectively, aggregation switches 150)
and placement controller 170. The RE-aware data center 100 also includes
links 135-1 , 145-1 , 155-1 , 175-1 , and 185-1 (collectively, RE-aware data
center links) (remaining link labels have been omitted for the purpose of
clarity). Racks 110 are communicatively connected with network 180 via an
appropriate one of the ToR switches 130, RE boxes 140 aggregation switches
150, and appropriate RE-aware data center links. It should be appreciated
that RE-aware data center 100 may be architected in any suitable
configuration and that RE-aware data center 100 is just one exemplary
architecture being used for illustrative purposes. Placement controller 170
controls the allocation of resources (e.g., processing, memory, storage or
networking) in racks 110 within RE-aware data center 100.
Racks 110 include end hosts 120-1-EH1 - 120-1-EH5 and 120-2-EH1
- 120-2-EH5 (collectively, end hosts 120). It should be appreciated that while
2 racks are illustrated here, system 100 may include more of less racks. It
should be further appreciated that not all of the racks of system 100 may
include end hosts.
End host 120-1 -EH 1 include slots 122-1 - 122-3 (collectively, slots
122) (illustrated as hashed boxes) and agent 160-1-A1 (illustrated as
checkered box). It should be appreciated that while 3 slots are illustrated in
end host 120-1 -EH 1, each of end hosts 120 may include fewer or more slots
(remaining slots have been omitted for the purpose of clarity), including some
end hosts having no slots. It should be further appreciated that while one ( 1 )
agent is illustrated in each of end hosts 120 (e.g., agents 160-1-A1 - 160-1-
A5 and 160-2-A1 - 160-2-A5), each of end hosts 120 may include fewer of
more agents, including some end hosts having no agents.
Slots 122 are a collection of allocated resources configured to service
application instances serving clients 190-1 and 190-2 (collectively, clients
190). The term "slot" as user herein may be any suitable collection of
allocated resources such as a virtual machine comprising
processing/compute, memory or networking resources or virtualized storage.
ToR switches 130 switch data between end hosts in an associated rack
and appropriate aggregate switch(es). For example, ToR switch 130-1
switches data from end hosts in rack 110-1 to end hosts in other racks (e.g.,
rack 110-2) or network 180 via aggregate switch 150-1 or 150-2.
RE boxes 140 perform redundancy elimination. In particular, RE boxes
140 cache received data and suppress duplicate traffic. It should be
appreciated that while two (2) RE boxes are illustrated system 100 may
include fewer of more RE boxes. It should be further appreciated that though
illustrated as being placed after ToR switches in system 100, RE boxes may
be placed in any suitable portion of system 100 to perform redundancy
elimination. Moreover, though illustrated as a separate component, RE boxes
140 may include any configuration such as: (a) a separate redundancy
elimination component; (b) a software module residing on a component such
as one or more of the end hosts 120 or one or more of the ToR switches 130;
(c) or any other configuration combination.
For example, a source RE box (e.g., RE box 140-1 ) may create an
identifier associated with received data from an end host (e.g., end host 120-
1-EH1 ) and transmits the created identifier to a target RE box (e.g., RE box
140-2). In subsequent transmissions between the source and target RE box
(in either direction), a sourcing RX box (e.g., RE box 140-1 ) may perform data
de-duplication by replacing subsequent repeated byte sequences of newly
received data with the created identifier to reduce application latency and
conserve bandwidth. In this example, the receiving RE box (e.g., RE box 140-
2) uses the created identifier to retrieve the complete stored byte sequence
and forwards the retrieved byte sequence to it destination (e.g., end host 120-
2-EH1 ) .
Aggregation switches 150 switch data between an associated ToR
switch and network 180. For example, ToR switch 130-1-1 switches data from
resources in rack 110-1 to network 180 via an appropriate aggregation switch
(e.g., aggregation switch 150-1 or 150-2).
Agents 160-1-A1 - 160-1-A5, 160-2-A1 - 160-2-A5, 160-3-A1 - 160-3-
A2, and 160-4-A1 - 160-4-A2 (collectively, agents 160) measure the
redundancy in communications (e.g., slot-to-slot) and report these
measurements to placement controller 170 via one or more appropriate REaware
data center links. The term "redundancy" as used herein means any
content sequence capable of being de-duplicated by RE boxes 140 and
should be understood broadly as including any repeated content sequence. It
should be appreciated that though illustrated as residing in each of end hosts
120, ToR switches 130 and aggregation switches 150, system 100 may
include any configuration of agents. For example, agents may be deployed:
(a) on a portion or all of end hosts 120; (b) on a portion or all of end hosts
120, and a portion or all of aggregation switches 150; (c) or any other
configuration combination.
On end hosts 120, an agent may monitor slot-to-slot communications
for slots residing on the end hosts. On ToR switches 130, an agent may
monitor: (a) slot-to-slot communications from slots on different end hosts
within a rack; (b) communications among end hosts within the rack; (c)
communications among an end host within the rack and an end host on other
racks; or (d) the like. In an embodiment including a distributed data center
setting, agents on aggregation switches 150 may monitor the outgoing
communications to other data centers.
Placement controller 170 includes assigns slots (e.g., one or more of
slots 122) to containers (e.g., one of racks 110 or a data center in a
distributed data center). It should be appreciated that while one ( 1 ) placement
controller is illustrated here, system 100 may include more placement
controllers or that there could be one placement controller for a set of data
centers. In some of these embodiments, though illustrated within system 100,
placement controller 170 may be positioned outside of system 100 and
connected to system 100 via network 180.
RE-aware data center links and links 195-1 and 195-2 support
communicating over one or more communication channels such as: wireless
communications (e.g., LTE, GSM, CDMA, bluetooth); femtocell
communications (e.g., WiFi); packet network communications (e.g., IP);
broadband communications (e.g., DOCSIS and DSL); storage
communications (e.g., Fibre Channel, iSCSI) and the like. It should be
appreciated that though depicted as a single connection, links may be any
number or combinations of communication channels.
The network 180 includes any number of access and edge nodes and
network devices and any number and configuration of links (not shown for
purposes of clarity). Moreover, it should be appreciated that network 180 may
include any combination and any number of wireless, or wire line networks
including: LTE, GSM, CDMA, Local Area Network(s) (LAN), Wireless Local
Area Network(s) (WLAN), Wide Area Network (WAN), Metropolitan Area
Network (MAN), or the like.
Clients 190-1 and 190-2 may include any type of communication
device(s) capable of sending or receiving information over network 180 via an
appropriate one of links 195-1 or 195-2. For example, a communication
device may be a thin client, a smart phone (e.g., client 190-2), a personal or
laptop computer (e.g. 190-1 ) , server, network device, tablet, e-reader or the
like. Communication devices may rely on other resources within exemplary
system to perform a portion of tasks, such as processing or storage, or may
be capable of independently performing tasks. It should be appreciated that
while two clients are illustrated here, system 100 may include fewer or more
clients. Moreover, the number of clients at any one time may be dynamic as
clients may be added or subtracted from the system at various times during
operation.
In some embodiments, some or a portion of slots 122 are virtual
machines.
In some embodiments, some or a portion of slots 122 are virtualized
storage.
In some embodiments, ToR switches 130 are Ethernet switches.
In some embodiments, aggregation switches 150 are layer 2 Ethernet
switches.
In some embodiments of RE-aware data center 100, RE-aware data
center 100 is a distributed data center in a system of distributed data centers.
In some of these embodiments, RE boxes 140 may be placed after
aggregation switches 150 (e.g., between the aggregation switches 150 and
network 180) in place of or in addition to their illustrated placement after ToR
switches 130. Advantageously, if RE-aware data center 100 does not
accommodate all slots of an application instance (e.g., some slots are
allocated in other data centers), the RE-aware data center 100 may be used
to reduce the inter-data center traffic in order to more efficiently utilize the
bandwidth between these distributed data centers.
FIG. 2 depicts a flow chart illustrating an embodiment of a method 200
for an RE-aware system (e.g., RE-aware data center 100 of FIG. 1) to perform
RE-aware placements of slots. The method includes one or more agents 160-
1 - 160-N (e.g., one or more of agents 160 of FIG. 1) collecting redundancy
measurements (steps 210-1 - 2 10-N) and reporting the collected
measurements (steps 230-1 - 230-N) to the placement controller 170 (e.g.,
placement controller 170 of FIG. 1) based on a determination that a reporting
trigger has occurred (steps 220-1 - 220-N). The method further includes the
placement control 170 receiving the measurements (step 250) from agents
160-1 - 160-N and performing slot placement based on the received
measurements (step 270) based on a determination that a placement trigger
has occurred (step 260).
In the method 200, the steps 2 10-1 - 2 10-N include collecting
redundancy measurements. In particular, measurements include slot
communication patterns and the determined redundancy associated with slot
pair(s). Communication patterns may be a traffic matrix of the collected
measurements between slots and the determined redundancy may include
the redundant traffic identified between slot-to-slot communications in the
traffic matrix.
In the method 200, the steps 220-1 - 220-N include determining that a
reporting trigger has occurred. Based on the trigger determination, the method
either proceeds to one of steps 230-1 - 230-N respectively or returns to one
of steps 220-1 - 220-N respectively. The trigger may be any suitable event
signaling that collected measurements should be sent to placement controller
170. For example, the trigger event may be: (a) periodically triggered at
threshold intervals; (b) based on the measurements collected in one of steps
210-1 - 2 10-N respectively; (c) based on receipt of a request from placement
controller 170; or (d) the like. It should be appreciated that multiple trigger
events may occur at the same time.
In the method 200, the steps 230-1 - 230-N include reporting the
collected measurements to placement controller 170. In particular, collected
measurements may be reported in any suitable form such as: (a) the raw
collected measurements; (b) compressed or modified information based on
the collected measurements; (c) an aggregate of the collected measurements;
or (d) the like.
In the method 200, the step 250 includes receiving measurements from
one or more of agents 160-1 - 160-N.
It should be appreciated that the measurements may be pushed or
pulled to the apparatus performing the method.
In the method 200, the step 260 includes determining that a placement
trigger has occurred. Based on the trigger determination, the method either
proceeds to steps 270 or returns to step 260. The trigger may be any suitable
event signaling that a placement should be performed. For example, the
trigger event may be: (a) periodically triggered at threshold intervals; (b)
based on the agent measurements received in step 250; or (c) the like. It
should be appreciated that multiple trigger events may occur at the same
time. A quality threshold may be any suitable threshold such as the current
effective inter-container bandwidth being greater than a threshold limit as
compared to an estimate optimal effective inter-container bandwidth.
In the method 200, the step 270 includes placing the slots within in the
end hosts (e.g., end hosts 120 in FIG. 1) . In particular, the slots to be placed
in end hosts are determined and are placed in end hosts based on the agent
measurements received in step 250.
In some embodiments of one or more of steps 2 10-1 - 2 10-N,
redundancy measurements include measurements of redundancy in intracommunication-
redundancy or inter-communication-redundancy in slot-to-slot
communications.
The term "intra-comm-redundancy" as used herein means the
redundancy within slot-to-slot communications and should be understood
broadly as including the redundancy between any slot-to-slot communication.
For example, the redundancy between: (a) slot 122-1 and 122-3 of FIG. 1; (b)
122-1 and a slot (not shown) in end host 120-2-EH1 of FIG. 1; (c); 322-SLOT1
and 322-SLOT2 of FIG. 3 or (d) 322-SLOT1 and 322-SLOT3 of FIG. 3.
The term "inter-comm-redundancy" as used herein means the
redundancy between two slot-to-slot communications and should be
understood broadly as including the redundancy between any two slot-to-slot
communications. For example, the redundancy between: (a) slot-to-slot pair
slot 122-1 / 122-3 and slot-to-slot pair 122-1 and a slot (not shown) in end
host 120-2-EH1 of FIG. 1; or (b); slot-to-slot pair 322-SLOT1 / 322-SLOT3 of
and slot-to-slot pair 322-SLOT2 / 322-SLOT-4 of FIG. 3 is inter-commredundancy.
In some embodiments of the step 250, an API may be provided for
communication between a placement controller and agent (e.g., placement
controller 170 and one or more of agents 160 of FIG. 1) . In some of these
embodiments, a placement controller may request all of a portion of collected
redundancy measurements from one or more agents.
In some embodiments, portions of steps 260 and 270 may be
performed concurrently. For example, a placement trigger in step 260 may
involve determining whether the current effective inter-container bandwidth
meets the estimated optimal effective inter-container bandwidth. As should be
appreciated, determining the estimated optimal effective inter-container
bandwidth may require a determination of the optimal placement of virtual
machines. As such, the same determined optimal placement of virtual
machines may be used in both steps 260 and 270.
FIG. 3 illustrates a functional block diagram 300 of redundancy in a
RE-aware data center 100. Diagram 300 includes rack 3 10-1 and 310-2 (e.g.,
two of racks 110 in FIG. 1) and RE boxes 340-1 and 340-2. Rack 310-1
includes slots 322-SLOT1 and 322-SLOT2 and rack 310-2 includes slots 322-
SLOT3 and 322-SLOT4. Data transfer paths 390-1 - 390-7 illustrate data
transfer paths between components. RE boxes are positioned to de-duplicate
data over the network links illustrated by data transfer 390-4.
In some embodiments such as distributed data center embodiments,
rack 3 10-1 and 310-2 may be data centers.
In some embodiments, RE boxes 340-1 or 340-2 may include more
than one RE box. For example, separate RE boxes may be software modules
executing on one or more of the end hosts.
In the examples below, assume the following:
1) each of slots 322-SLOT1 - 322-SLOT4 may be allocated one of
four virtual machines: VM1 , VM2, VM3 and VM4;
2) there is inter-comm-redundancy between VM1 ® VM3 and VM2 ®
VM4 (i.e., there is a high similarity between data transfers that may
be de-duplicated by RE boxes 340-1 and 340-2);
3) the bandwidth requirements of the data traffic between VM1 and
VM2 is 8V;
4) the bandwidth requirements of the data traffic between VM1 and
VM3 is 10V;
5) the bandwidth requirements of the data traffic between VM3 and
VM4 is 8V; and
6) the bandwidth requirements of the data traffic between VM2 and
VM4 is 10V.
In a first example, the four virtual machines are assigned to slots using
a bandwidth-based placement strategy that places the VM pairs having higher
bandwidth traffic within the same rack. As such, VM1 and VM3 are placed in
the same rack and VM2 and VM4 are placed in the same rack. For example
the placements may be: VM1 assigned to 322-SLOT1 , VM3 assigned to 322-
SLOT2, VM2 assigned to 322-SLOT3 and VM4 assigned to 322-SLOT4. As
such the bandwidth requirements across data transfer links would be:
7) 10V between 322-SLOT1 and 322-SLOT2 via 390-1 ;
8) 8V between 322-SLOT1 and 322-SLOT3 via 390-2 ® RE box 340-
1 ® 390-4 ® RE box 340-2 and 390-5;
9) 7V between 322-SLOT2 and 322-SLOT4 via 390-3 ® RE box 340-
1 ® 390-4 ® RE box 340-2 and 390-6; and
10) 10V between 322-SLOT3 and 322-SLOT4 via 390-7.
In this example, the inter-rack bandwidth requirement (i.e., the
bandwidth over data transfer 390-4) is 15V as shown below.
8V for traffic between slot 322-SLOT1 and slot 322-SLOT3
+ 7V for traffic between slot 322-SLOT2 and slot 322-SLOT4.
In a second example, the four virtual machines are assigned to slots
using an RE-aware placement strategy as described herein. As such, VM1
and VM2 are placed in the same rack and VM3 and VM4 are placed in the
same rack. For example the placements may be: VM1 assigned to 322-
SLOT1 , VM2 assigned to 322-SLOT2, VM3 assigned to 322-SLOT3 and VM4
assigned to 322-SLOT4. As such the bandwidth requirements across data
transfer links would be:
11)8V between 322-SLOT1 and 322-SLOT2 via 390-1 ;
12) 10V between 322-SLOT1 and 322-SLOT3 via 390-2 ® RE box
340-1 ® 390-4 ® RE box 340-2 and 390-5;
13) 10V between 322-SLOT2 and 322-SLOT4 via 390-3 ® RE box
340-1 ® 390-4 ® RE box 340-2 and 390-6; and
14)7V between 322-SLOT3 and 322-SLOT4 via 390-7.
In this example, the inter-rack bandwidth requirement (i.e., the
bandwidth over data transfer 390-4) is 10V as shown below. It should be
appreciated that the inter-comm-redundancy (i.e., similarity between data
transfers 322-SLOT1 ®322-SLOT3 and 322-SLOT2®322-SLOT4) is
exploited in this example as the inter-comm-redundancy slot pairs traverse
the RE box pair 340-1 and 340-2 allowing de-duplication of data.
10V for traffic between slot 322-SLOT1 and slot 322-SLOT3
+ 10V for traffic between slot 322-SLOT2 and slot 322-SLOT4.
- 10V for data de-duplicated by RE box pair 340-1 and 340-2.
Thus, in this example, by exploiting the high similarity between data
transfers, the RE-aware placement strategy significantly reduced the
bandwidth usage of inter-rack (or inter-data center in a distributed data center)
link 390-4 by 33% over a bandwidth-based placement strategy (i.e., 10V
compared with 15V).
FIG. 4 depicts a flow chart illustrating an embodiment of a method 400
for a placement controller (e.g., placement controller 160 of FIG. 1) to place
slots in the end hosts as illustrated in step 270 of FIG. 2. The method includes
placing the slots (step 480) based on a determined set of slots to be placed
(step 420), the determined communication patterns (step 440) and the
determined slot communication redundancy (step 460).
In the method 400, the step 420 includes determining the number and
types of slots to place. In particular, the apparatus performing the method
determines the number and type of slots required to support the applications
instances serving clients (e.g., clients 190 in FIG. 1) .
In the method 400, the step 440 includes determining communication
patterns. In particular, communication patterns are derived from the
communication patterns received from one or more agents (e.g., step 250 in
FIG. 2). The communication patterns may be derived using any suitable
mechanism such as: (a) using the received communication patterns; (b)
modifying the received communication patterns; (c) aggregating the received
communication patterns; or (d) the like.
In the method 400, the step 460 includes determining communication
redundancy. In particular, communication redundancies are derived from the
communication redundancies received from one or more agents (e.g., step
250 in FIG. 2). The communication redundancies may be derived using any
suitable mechanism such as: (a) using the received communication
redundancies; (b) modifying the received communication redundancies; (c)
aggregating the received communication redundancies; or (d) the like.
In the method 400, the step 480 includes performing an RE-aware
placement strategy based on the determined required slots, the determined
communication patterns, and the determined slot redundancies. In particular,
the RE-aware placement strategy takes into account the redundancy in data
transfers for slot-to-slot communications and place slots to exploit redundancy
in data transfers while minimizing the overall inter-rack (or inter-data center)
bandwidth usage.
In some embodiments, the step 480 includes basing the placement of
slots on the intra-comm-redundancy of slot-to-slot data transfers across racks,
and the inter-comm-redundancy of two slot-to-slot transfers across racks. In
some embodiments, the intra-comm-redundancy and inter-comm-redundancy
are determined across data centers (e.g., data centers in distributed settings)
instead of or in addition to as across racks.
In some embodiments, the step 480 includes using conventional
integer linear program or classical optimization techniques to determine slot
placement. Conventional optimization techniques involve determining the
action that best achieves a desired goal or objective. An action that best
achieves a goal or objective may be determined by maximizing or minimizing
the value of an objective function. In some embodiments, the goal or metric of
the objective function may be to minimize the overall inter-container (e.g.,
inter-rack or inter-data center) bandwidth usage (e.g., a b where ua,b
a b
represents the bandwidth used between container Ca and Cb) .
The problem may be represented as:
Optimizing:
[E.1] y = f(X! , x2, x„)
Subject to:
[E.2] Gj(Xi , X2 , ..., xn) bj j = 1, 2, ... m
Where the equation E.1 is the objective function and equation E.2
constitutes the set of constraints imposed on the solution. The x , variables, x-i ,
x2, .. . , Xn, represent the set of decision variables and y = f(xi , x2, ..., xn) is the
objective function expressed in terms of these decision variables. It should be
appreciated that the objective function may be maximized or minimized.
In some of any of these embodiments, constraints may be one or more
of the following:
1. x = 1; where each slot is placed in one container and x is a
j
variable indicating the placement of slot(i) in container(j).
£ Maxj ; where each container(j) has a limit Maxj on the
maximum number of slots that the container may accommodate
(e.g., based on capacity or provisioning parameters).
j ,a,k,b ³ Xj,a + Xk,b - ; where Vj,a ,k,b represents a variable that is 1
if slot(i) is placed in container(a) and slot (k) is placed in
container(b) or otherwise zero.
Bandwidth towards usage across containers is based on
contributions of each type of traffic (i.e., unique, intra-redundant
and inter-redundant).
a. The contribution of unique data transfers towards
bandwidth usage between container Ca and Cb is
,k k
Uni j,k y j , , , W h re Unij k is the amount of
unique traffic between slots j and k . If slot(j) is placed in
container Ca and slot(k) is placed in container Cb, then it
will contribute to the bandwidth usage between container
b. The contribution of intra-redundant data transfers towards
bandwidth usage between container Ca and C b is zero. It
should be appreciated that intra-redundant data will be
suppressed by the RE mechanism.
c . The contribution of inter-redundant data transfer toward
bandwidth usage is computed as follows:
å , , « « , , , x , Where lnterj,k,,,m
represents the inter-slot-redundancy, and Zj,i,a ,k,m,b is a
variable that represents when inter-slot redundancy will
be suppressed. For example, when slot(j) and slot(k) are
placed in containers Ca and Cb respectively (i.e., Vj,a ,k,b =
1) , and slot(l) and slot(m) are also placed in containers Ca
and C b respectively (i.e., yi, a ,m ,b = 1) , the inter-slotredundant
data will be suppressed by the RE mechanism
and thusly, Zj,i, a ,k,m,b = 0. However, if Vj, a ,k ,b = 1 and yi,a ,m ,b
= 0, inter-slot-redundancy will not be suppressed and
thuSly, Zj,|, a ,k,m,b =
5. The bandwidth between container Ca and Cb is less than
available bandwidth BWa : ua,b £ BWa .
6. One or more slots are constrained to one or more identified
containers (e.g., Xj, a = 0).
FIG. 5 depicts a flow chart illustrating an embodiment of a method 500
for placing slots as illustrated in step 480 of FIG. 4. The method includes
determining a number of clusters (step 520), assign slots to available clusters
(step 540), and assign clusters to containers (step 560).
In the method 500, the step 520 includes determining a number of
clusters. In particular, the determined number of slots to be placed (e.g., step
420 in FIG. 4) are partitioned into "K" clusters, such that each partitioned
cluster may be allocated in at least one container. The term "container" as
used herein may be any suitable component capable of aggregating slots
such as a rack within a data center setting or a data center itself within a
distributed data center setting.
In some embodiments, a container is a rack.
In some embodiments, a container is a data center.
FIG. 6 depicts a flow chart illustrating an embodiment of a method 600
for placing slots as illustrated in step 480 of FIG. 4 and steps 520, 540 and
560 of FIG. 5.
In the method 600, the step 610 includes selecting a number of
clusters "K" as described herein. Where K is the number of clusters to which
the determined slots are distributed.
In the method 600, the step 620 includes assigning each of a number
of random slots to an available container. In some of these embodiments, the
number of random slots assigned is "K".
In the method 600, the steps 630 and 640 include for each remaining
slot "S" to be allocated (step 640), placing slot "S" in an available cluster
based on computed distances to each available cluster (step 630).
In the method 600, the step 650 includes assigning clusters to
containers based on an effective-inter-container-bandwidth-usage metric.
The method 600 optionally includes step 660. Step 660 includes
determining whether the effective-inter-container-bandwidth-usage
determined in step 650 meets a calculated optimal and whether the
determined placement satisfies the constraint that each inter-container traffic
(R1 ,R2) should be less than BW(R1 ,R2). Where BW is the available
bandwidth between container R 1 and R2. Determining a calculated optimal
effective-inter-container-bandwidth-usage may be done using any suitable
method such as: (a) setting an iterative count and selecting the best value; (b)
comparing the determined effective-inter-container-bandwidth-usage from
step 650 against a predetermined threshold; or (c) the like.
In the method 600, the step 670 includes optimizing the determined
slot placement. In particular, optimization of slot placement includes
determining whether moving container pairs would improve the effective-intercontainer-
bandwidth-usage. For example, exchanging S 1 in C 1 with S2 in C2
may be based on a determination that moving S 1 in container C 1 to container
C2, and moving S2 in container C2 to C 1 improves the effective-intercontainer-
bandwidth-usage.
In some embodiments of step 610, K is selected based on the number
of available slots in each container.
In some embodiments of step 6 10, K is selected randomly.
In some embodiments, step 630 includes computing the distances to
each available cluster and placing slot S in an available cluster based on the
computed distances to each available cluster. Where an available cluster is a
cluster that is not filled (i.e., has assigned slots than capacity).
In some of these embodiments of step 630, computing distances to
each candidate cluster (e.g., cluster C) is based on an effective-bandwidth
metric (M). In particular, the effective-bandwidth metric is the bandwidth
required after redundancy elimination within the data transfer from slot S to
any of the other slots in the cluster (intra-comm-redundancy) and across data
transfers slot S to any slot i , and slot S to any slot j in the cluster (inter-commredundancy).
In some of these embodiments the effective-bandwidth metric, is
computed as described in the pseudo code lines ( 1) - (3).
( 1) M_from(S) =
emand islot - S , slot - i) - Intra Re dundancyislot - S , slot - )) -
(Inter Re dundancyislot - S , slot - i, slot - S , slot - j))
slot—i,slot— ºC
(2) M_to(S) =
emand islot - i, slot - S ) - Intra Re dundancyislot - i, slot - S)) -
(Inter Re dundancyislot - i, slot - S , slot - j , slot - S))
slot—i,slot— ºC
(3) M(S) = M_from(S) + M_to(S)
Where: the function Demand(slot-i,slot-j) returns the bandwidth
demand between slots "i" and "j"; the function lntraRedundancy(slot i,slot j )
returns the intra-comm-redundancy for data transfers from slot i to slot j ; and
the function lnterRedundancy(slot i , slot j , slot k , slot I) returns the inter-commredundancy
between data transfers (slot i , slot j ) and data transfers (slot k ,
slot I).
In some embodiments of step 630, the cluster placement selection is
based on a calculated minimum average distance to cluster C (Min(C)). In
some of these embodiments, the Min(C) is computed as shown in pseudo
code lines (4) - (6).
(4) for each cluster C
(5) compute AvgDistance(C)= (1/N * 1/M(S));
// where N is the number of slots in the cluster C
(6) Min(C) = Min | AvgDistance(C) |
It should be appreciated that by taking the inverse of the effective
bandwidth metric (e.g., M(S)), and choosing the minimum (slot, container)
pair, slots with high bandwidth demands to a container will be placed in that
container, potentially reducing the traffic that would traverse across containers
(e.g., reduce the effective inter-container-bandwidth-usage).
In some embodiments, the steps 6 10 - 640 are as described in FIG. 7.
Where VM are virtual machines (e.g., a type of slot); Mindist u is the minimum
average distance to cluster C; MinVM and MinC are the respective VM and
containers associated with the Mindist u; Uniu v is the amount of unique traffic
between VMs u and v ; and lnterj,k,i,m is the inter-VM redundant traffic demands
for redundancy between data transfers from VMj ® VMk and VM| ® VMm.
In some embodiments of step 650, effective inter-container-bandwidthusage
is computed as the sum of all inter-container bandwidth demands
minus inter-container redundant traffic (including both intra-comm-redundant
traffic and inter-comm-redundant traffic). In some of these embodiments, the
effective inter-container-bandwidth-usage is computed as shown in pseudo
code is lines (7) - (10).
(7) Effective-intercontainer-bandwidth-usage = 0
(8) For each container Ci, Cj in pair
(9) Find the list of (slot i , slot j ) communications such that slot i and
slot j are in different containers;
(10) Effective-intercontainer-bandwidth-usage +=
{Demandislot - i, slot - j)) -
slot—i,slot—j
{Intra Re dundancyislot - i, slot - j)) -
slot—i,slot—j
{Inter Re dundancy{slot - i, slot - j , slot - k, slot - /)) ;
slot—i,slot— ,slot—k ,slot—I
// where slot i and slot k are in one container and
// slot j and slot k are in another container
In some embodiments, the step 650 is as described in FIG. 8. Where
VM are virtual machines (e.g., a type of slot); Uni u v is the amount of unique
traffic between VMs u and v ; and is the inter-VM redundant traffic
demands for redundancy between data transfers from VMj ® VMk and VM ®
VMm.
In some embodiments of step 670, the optimization determination is
computed as shown in pseudo code lines ( 1 1) - ( 16).
( 1 1) for each container Ci, Cj pair;
(12) for each slot k in Ci;
( 13) for each slot I in Cj, determine whether exchanging
slot k with slot I improves the
effective inter-container bandwidth usage;
(14) select the slot Imax
// where Imax is the slot I with the best calculated
// effective inter-container-bandwidth usage metric;
( 15) if slot k ¹ slot Imax &&
the exchange does not violate the
inter-container bandwidth constraints
( 16) exchange slot k with slot Imax;
In some embodiments, the step 670 is as described in FIG. 9.
In some embodiments, method 600 includes slot placement
constraint(s). In particular, for some slots, only a subset of containers is
available for slot placement. In some of these embodiments, the selection of
the subset of available containers is based on a quality of service metric such
as latency.
In some of these embodiments, a number of containers ("K") is
selected and a set of randomly chosen K slots is assignment to the K
containers using a one-to-one correspondence and ensuring that the slot
placement constraint(s) are met. In step 630, in computing the distance of a
slot to the containers, only containers meeting the slot placement constraint(s)
are considered. In step 670, only exchanges meeting slot placement
constraints are performed.
Although primarily depicted and described in a particular sequence, it
should be appreciated that the steps shown in methods 200, 400, 500 and
600 may be performed in any suitable sequence. Moreover, the steps
identified by one step may also be performed in one or more other steps in
the sequence or common actions of more than one step may be performed
only once.
It should be appreciated that steps of various above-described
methods can be performed by programmed computers. Herein, some
embodiments are also intended to cover program storage devices, e.g., data
storage media, which are machine or computer readable and encode
machine-executable or computer-executable programs of instructions,
wherein said instructions perform some or all of the steps of said abovedescribed
methods. The program storage devices may be, e.g., digital
memories, magnetic storage media such as a magnetic disks and magnetic
tapes, hard drives, or optically readable data storage media. The
embodiments are also intended to cover computers programmed to perform
said steps of the above-described methods.
FIG. 10 schematically illustrates an embodiment of various apparatus
1000 such as one one of end hosts 120, one of ToR switches 130, one of
aggregation switches 150 or placement controller 170 of FIG. 1. The
apparatus 1000 includes a processor 10 10, a data storage 101 1, and
optionally an I/O interface 1030.
The processor 1010 controls the operation of the apparatus 1000. The
processor 10 10 cooperates with the data storage 10 11.
The data storage 101 1 may store program data such as redundancy
measurements, communication patterns or the like as appropriate. The data
storage 101 1 also stores programs 1020 executable by the processor 101 0.
The processor-executable programs 1020 may include an agent
collection program 1021 , a placement controller program 1023, a placement
controller placement program 1025, or an I/O interface program 1027 as
appropriate. Processor 1010 cooperates with processor-executable programs
1020.
The I/O interface 1030 cooperates with processor 101 0 and I/O
interface program 1027 to support communications over any appropriate
communication channel(s) such as links of FIG. 1.
The agent collection 1021 performs the steps 210-1 , 220-1 and 230-1
of method 200 of FIG. 2 as described above.
The placement controller program 1023 performs the steps 250, 260,
and 270 of method 200 of 2IG. 5 as described above.
The placement controller placement program 1025 performs the steps
of method 270 of FIG. 2, the steps of FIG. 4, the steps of FIG. 5 or the steps
of FIG. 6 as described above.
In some embodiments, the processor 1010 may include resources
such as processors / CPU cores, the I/O interface 1030 may include any
suitable network interfaces, or the data storage 101 1 may include memory or
storage devices. Moreover the apparatus 1000 may be any suitable physical
hardware configuration such as: one or more server(s), blades consisting of
components such as processor, memory, network interfaces or storage
devices. In some of these embodiments, the apparatus 1000 may include
cloud network resources that are remote from each other.
In some embodiments, the apparatus 1000 may be virtual machine. In
some of these embodiments, the virtual machine may include components
from different machines or be geographically dispersed. For example, the
data storage 101 1 and the processor 1010 may be in two different physical
machines.
In some embodiments, the apparatus 1000 may be a general purpose
computer programmed to perform the methods 200, 400, 500 or 600.
When processor-executable programs 1020 are implemented on a
processor 1010, the program code segments combine with the processor to
provide a unique device that operates analogously to specific logic circuits.
Although depicted and described herein with respect to embodiments
in which, for example, programs and logic are stored within the data storage
and the memory is communicatively connected to the processor, it should be
appreciated that such information may be stored in any other suitable manner
(e.g., using any suitable number of memories, storages or databases); using
any suitable arrangement of memories, storages or databases
communicatively connected to any suitable arrangement of devices; storing
information in any suitable combination of memory(s), storage(s) or internal or
external database(s); or using any suitable number of accessible external
memories, storages or databases. As such, the term data storage referred to
herein is meant to encompass all suitable combinations of memory(s),
storage(s), and database(s).
The description and drawings merely illustrate the principles of the
invention. It will thus be appreciated that those skilled in the art will be able to
devise various arrangements that, although not explicitly described or shown
herein, embody the principles of the invention and are included within its spirit
and scope. Furthermore, all examples recited herein are principally intended
expressly to be only for pedagogical purposes to aid the reader in
understanding the principles of the invention and the concepts contributed by
the inventor(s) to furthering the art, and are to be construed as being without
limitation to such specifically recited examples and conditions. Moreover, all
statements herein reciting principles, aspects, and embodiments of the
invention, as well as specific examples thereof, are intended to encompass
equivalents thereof.
The functions of the various elements shown in the FIGs., including
any functional blocks labeled as "processors", may be provided through the
use of dedicated hardware as well as hardware capable of executing software
in association with appropriate software. When provided by a processor, the
functions may be provided by a single dedicated processor, by a single
shared processor, or by a plurality of individual processors, some of which
may be shared. Moreover, explicit use of the term "processor" or "controller"
should not be construed to refer exclusively to hardware capable of executing
software, and may implicitly include, without limitation, digital signal processor
(DSP) hardware, network processor, application specific integrated circuit
(ASIC), field programmable gate array (FPGA), read only memory (ROM) for
storing software, random access memory (RAM), and non volatile storage.
Other hardware, conventional or custom, may also be included. Similarly, any
switches shown in the FIGS are conceptual only. Their function may be
carried out through the operation of program logic, through dedicated logic,
through the interaction of program control and dedicated logic, or even
manually, the particular technique being selectable by the implementer as
more specifically understood from the context.
It should be appreciated that any block diagrams herein represent
conceptual views of illustrative circuitry embodying the principles of the
invention. Similarly, it should be appreciated that any flow charts, flow
diagrams, state transition diagrams, pseudo code, and the like represent
various processes which may be substantially represented in computer
readable medium and so executed by a computer or processor, whether or
not such computer or processor is explicitly shown.
What is claimed is:
1. An apparatus for providing slot placement, the apparatus
comprising:
a data storage; and
a processor communicatively connected to the data storage, the
processor being configured to:
determine a plurality of slots to be placed within a plurality of
containers;
determine a plurality of redundancy parameters; and
determine a placement of the plurality of slots in the plurality of
containers based on the plurality of redundancy parameters.
2. The apparatus of claim 1, wherein the redundancy parameters are
based on intra-comm-redundancy and inter-comm-redundancy of at least a
portion of the plurality of slots.
3. The apparatus of claim 2, wherein the redundancy parameters are
based on communication patterns of at least a portion of the plurality of slots.
4. The apparatus of claim 1, wherein the determination of the
placement of the plurality of slots comprises configuring the processor to:
determine a plurality of clusters;
assign each of at least a portion of the plurality of slots to at
least one of the plurality of clusters; and
assign each of at least a portion of the plurality of clusters to the
at least one of the plurality of containers.
5. The apparatus of claim 4, wherein the assignment of the portion of
the plurality of slots is based on a required slot bandwidth after redundancy
elimination.
6. The apparatus of claim 5, wherein the assignment of the portion of
the plurality of clusters is based on an effective-inter-container-bandwidthusage.
7. A method for providing slot placement, the method comprising:
at a processor communicatively connected to a data storage,
determining a plurality of slots to be placed within a plurality of containers;
determining, by the processor in cooperation with the data
storage, a plurality of redundancy parameters; and
determining, by the processor in cooperation with the data
storage, a placement of the plurality of slots in the plurality of containers
based on the plurality of redundancy parameters.
8. The method of claim 7, further comprising:
exchanging, by the processor in cooperation with the data
storage, a first placed slot and a second placed slot based on a determination
that inter-container-bandwidth-usage will be improved and that the exchange
meets one or more inter-container bandwidth constraints.
9. The method of claim 7, wherein the step of determining the
placement of the plurality of slots comprises:
determining, by the processor in cooperation with the data
storage, a plurality of clusters;
assigning, by the processor in cooperation with the data storage,
each of at least a portion of the plurality of slots to at least one of the plurality
of clusters; and
assigning, by the processor in cooperation with the data storage,
each of at least a portion of the plurality of clusters to the at least one of the
plurality of containers.
10. The method of claim 9, wherein the step of assigning the portion of
the plurality of slots is based on a required slot bandwidth after redundancy
elimination.
| # | Name | Date |
|---|---|---|
| 1 | 456-DELNP-2015.pdf | 2015-01-20 |
| 2 | PD015234IN-NP ALCATEL LUCENT_GPOA _NEW FOR USE - CHECK BEFORE USING.pdf | 2015-03-12 |
| 3 | PD015234IN-NP - SPEC FOR FILING.pdf | 2015-03-12 |
| 4 | PD015234IN-NP - FORM 5.pdf | 2015-03-12 |
| 5 | PD015234IN-NP - FORM 3.pdf | 2015-03-12 |
| 6 | 456-delnp-2015-Correspondence Others-(30-04-2015).pdf | 2015-04-30 |
| 7 | 456-delnp-2015-Assignment-(30-04-2015).pdf | 2015-04-30 |
| 8 | 456-delnp-2015-Form-3-(12-06-2015).pdf | 2015-06-12 |
| 9 | 456-delnp-2015-Correspondence Others-(12-06-2015).pdf | 2015-06-12 |
| 10 | 456-delnp-2015-Form-3-(28-10-2015).pdf | 2015-10-28 |
| 11 | 456-delnp-2015-Correspondence Others-(28-10-2015).pdf | 2015-10-28 |
| 12 | 456-delnp-2015-Form-3-(16-03-2016).pdf | 2016-03-16 |
| 13 | 456-delnp-2015-Correspondence Others-(16-03-2016).pdf | 2016-03-16 |
| 14 | Form 3 [07-06-2016(online)].pdf | 2016-06-07 |
| 15 | Form 3 [19-11-2016(online)].pdf | 2016-11-19 |
| 16 | 456-DELNP-2015-FORM 3 [11-08-2017(online)].pdf | 2017-08-11 |
| 17 | 456-DELNP-2015-FORM 3 [12-01-2018(online)].pdf | 2018-01-12 |
| 18 | 456-DELNP-2015-FORM 3 [11-06-2018(online)].pdf | 2018-06-11 |
| 19 | 456-DELNP-2015-FER.pdf | 2019-09-12 |
| 20 | 456-DELNP-2015-FORM 3 [11-03-2020(online)].pdf | 2020-03-11 |
| 21 | 456-DELNP-2015-OTHERS [12-03-2020(online)].pdf | 2020-03-12 |
| 22 | 456-DELNP-2015-FER_SER_REPLY [12-03-2020(online)].pdf | 2020-03-12 |
| 23 | 456-DELNP-2015-DRAWING [12-03-2020(online)].pdf | 2020-03-12 |
| 24 | 456-DELNP-2015-CLAIMS [12-03-2020(online)].pdf | 2020-03-12 |
| 25 | 456-DELNP-2015-Power of Attorney-160320.pdf | 2020-03-18 |
| 26 | 456-DELNP-2015-Correspondence-160320.pdf | 2020-03-18 |
| 27 | 456-DELNP-2015-Correspondence to notify the Controller [10-02-2021(online)].pdf | 2021-02-10 |
| 28 | 456-DELNP-2015-FORM 3 [02-03-2021(online)].pdf | 2021-03-02 |
| 29 | 456-DELNP-2015-Written submissions and relevant documents [17-03-2021(online)].pdf | 2021-03-17 |
| 30 | 456-DELNP-2015-PatentCertificate31-03-2021.pdf | 2021-03-31 |
| 31 | 456-DELNP-2015-IntimationOfGrant31-03-2021.pdf | 2021-03-31 |
| 32 | 456-DELNP-2015-US(14)-HearingNotice-(HearingDate-02-03-2021).pdf | 2021-10-17 |
| 33 | 456-DELNP-2015-RELEVANT DOCUMENTS [26-08-2022(online)].pdf | 2022-08-26 |
| 34 | 456-DELNP-2015-RELEVANT DOCUMENTS [03-08-2023(online)].pdf | 2023-08-03 |
| 1 | stratos_tech_report_05-09-2019.pdf |
| 2 | searchstrategy2019-09-0514-54-57_05-09-2019.pdf |