Abstract: Disclosed is a method for dynamically scaling processing resources of a processing system, the method comprising: receiving information indicative of a current state of a user in a three-dimensional (3D) environment of a metaverse; determining, from amongst a plurality of actions that are feasible in the 3D environment, a set of actions that are feasible from the current state of the user; predicting, from amongst the set of actions, an action that is likely to be performed by the user from the current state of the user, by employing a machine learning model that is dynamically trained for making said prediction; determining a tentative next state of the user which corresponds to the action that is predicted; and adjusting a scale of the processing resources according to the tentative next state of the user.
Description:
TECHNICAL FIELD
The present disclosure relates to methods for dynamically scaling processing resources of a processing system. The present disclosure also relates to systems for dynamically scaling processing resources of a processing system.
BACKGROUND
The world is increasingly relying on digital technologies, which comprise systems, devices and resources that generate, store or process data. Immersive technology is one such digital technology that is able to create engaging, realistic, and interactive experiences for a user. Immersive technology provides extended-reality (XR) experiences, which include virtual reality (VR) experiences, augmented reality (AR) experiences, mixed reality (MR) experiences, and the like, on various devices and platforms.
Despite many recent advancements in immersive technology, existing techniques and equipment for providing a fully immersive experience in XR have several limitations associated therewith. Currently, in some instances, processors of systems that provide immersive experiences are kept on (i.e., in an active working state) at all times, which increases power consumption and costs associated with provisioning of the immersive experiences. In some other instances, the processors of the systems are turned on only when there is a demand or a requirement by the user, for enabling reduction of power consumption and costs. However, this approach is associated with latency issues (for example, it may take several seconds or minutes to turn on the processors upon receiving an indication of the demand), and hence a user's experience of an immersive environment is adversely affected.
Therefore, in light of the foregoing discussion, there exists a need to overcome the aforementioned drawbacks associated with existing processors of the systems for providing the XR experience.
SUMMARY
The present disclosure seeks to provide a method for dynamically scaling processing resources of a processing system. The present disclosure also seeks to provide a system for dynamically scaling processing resources of a processing system. An aim of the present disclosure is to provide a solution that overcomes, at least partially, the problems encountered in the prior art.
In one aspect, an embodiment of the present disclosure provides a method for dynamically scaling processing resources of a processing system, the method comprising:
- receiving information indicative of a current state of a user in a three-dimensional (3D) environment of a metaverse;
- determining, from amongst a plurality of actions that are feasible in the 3D environment, a set of actions that are feasible from the current state of the user;
- predicting, from amongst the set of actions, an action that is likely to be performed by the user from the current state of the user, by employing a machine learning model that is dynamically trained for making said prediction;
- determining a tentative next state of the user which corresponds to the action that is predicted; and
- adjusting a scale of the processing resources according to the tentative next state of the user.
Optionally, the method further comprises dynamically training the machine learning model by:
- receiving information indicative of an actual next state of the user in the 3D environment, wherein the actual next state is attained when an action is actually performed by the user;
- determining whether the actual next state matches the tentative next state; and
- when it is determined that the actual next state matches the tentative next state, rewarding the machine learning model by increasing a value of a cumulative reward of the machine learning model.
Optionally, when it is determined that the actual next state does not match the tentative next state, the method further comprises penalizing the machine learning model by decreasing a value of a cumulative reward of the machine learning model.
Optionally, the cumulative reward of the machine learning model is determined using a reward function, the value of the cumulative reward being indicative of a training state of the machine learning model.
Optionally, when the actual next state of the user is attained prior to the step of adjusting the scale of the processing resources, the method further comprises adjusting the scale of the processing resources according to the actual next state of the user upon receiving the information indicative of the actual next state of the user.
Optionally, the step of predicting the action that is likely to be performed by the user comprises utilizing a predictive algorithm in the machine learning model for making said prediction.
Optionally, the predictive algorithm takes into account at least one of: a historical record of actions performed by the user at the current state, a historical record of actions performed by other users at the current state, a historical record of actions performed by the user at a state similar to the current state, a historical record of actions performed by other users at a state similar to the current state, when making said prediction.
Optionally, the step of adjusting the scale of the processing resources comprises increasing a number of the processing resources of the processing system that are allocated for the user, or decreasing a number of the processing resources of the processing system that are allocated for the user, depending on processing resource requirements of the tentative next state with respect to processing resource requirements of the current state.
Optionally, a given state of the user is one of: a given pose of the user, a given appearance of the user's avatar, a condition of the user's avatar.
Optionally, the plurality of actions comprise two or more of: selecting an option, selecting an object, manipulating an object, changing a setting, moving to a location, in the 3D environment.
In another aspect, an embodiment of the present disclosure provides a system for dynamically scaling processing resources of a processing system, the system comprising a processor configured to:
- receive information indicative of a current state of a user in a three-dimensional (3D) environment of a metaverse;
- determine, from amongst a plurality of actions that are feasible in the 3D environment, a set of actions that are feasible from the current state of the user;
- predict, from amongst the set of actions, an action that is likely to be performed by the user from the current state of the user, by employing a machine learning model that is dynamically trained for making said prediction;
- determine a tentative next state of the user which corresponds to the action that is predicted; and
- adjust a scale of the processing resources according to the tentative next state of the user.
Optionally, the processor is further configured to dynamically train the machine learning model by:
- receiving information indicative of an actual next state of the user in the 3D environment, wherein the actual next state is attained when an action is actually performed by the user;
- determining whether the actual next state matches the tentative next state; and
- when it is determined that the actual next state matches the tentative next state, rewarding the machine learning model by increasing a value of a cumulative reward of the machine learning model.
Optionally, when it is determined that the actual next state does not match the tentative next state, the processor is further configured to penalize the machine learning model by decreasing a value of a cumulative reward of the machine learning model.
Optionally, the cumulative reward of the machine learning model is determined using a reward function, the value of the cumulative reward being indicative of a training state of the machine learning model.
Optionally, when the actual next state of the user is attained prior to the step of adjusting the scale of the processing resources, the processor is further configured to adjust the scale of the processing resources according to the actual next state of the user upon receiving the information indicative of the actual next state of the user.
Optionally, when predicting the action that is likely to be performed by the user, the processor is configured to utilize a predictive algorithm in the machine learning model for making said prediction.
Optionally, the predictive algorithm takes into account at least one of: a historical record of actions performed by the user at the current state, a historical record of actions performed by other users at the current state, a historical record of actions performed by the user at a state similar to the current state, a historical record of actions performed by other users at a state similar to the current state, when making said prediction.
Optionally, when adjusting the scale of the processing resources, the processor is configured to increase a number of the processing resources of the processing system that are allocated for the user, or decrease a number of the processing resources of the processing system that are allocated for the user, depending on processing resource requirements of the tentative next state with respect to processing resource requirements of the current state.
Optionally, the system further comprises a data repository communicably coupled to the processor, wherein the data repository has stored thereon at least information indicative of a plurality of states that are achievable by the user in the 3D environment of the metaverse and one or more actions corresponding to each of the plurality of states, and information indicative of a required scale of the processing resources for each of the plurality of states.
Optionally, the processor is further configured to store, at the data repository, at least one of: the action that is likely to be performed by the user from the current state of the user, the tentative next state of the user which corresponds to the action that is predicted, a value of a cumulative reward of the machine learning model.
BRIEF DESCRIPTION OF THE DRAWINGS
The summary above, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the present disclosure, exemplary constructions of the disclosure are shown in the drawings. However, the present disclosure is not limited to specific methods and instrumentalities disclosed herein. Moreover, those skilled in the art will understand that the drawings are not to scale. Wherever possible, like elements have been indicated by identical numbers.
Embodiments of the present disclosure will now be described, by way of example only, with reference to the following diagrams wherein:
FIG. 1 illustrates steps of a method for dynamically scaling processing resources of a processing system, in accordance with an embodiment of the present disclosure;
FIGs. 2A, 2B, 2C, and 2D show exemplary perspective views of a 3D environment representing a virtual car showroom, in accordance with an embodiment of the present disclosure;
FIG. 3 illustrates an exemplary process flow for determining a tentative next state of a user, in accordance with an embodiment of the present disclosure;
FIG. 4 shows an exemplary flow diagram for dynamically training a machine learning model, in accordance with an embodiment of the present disclosure; and
FIGs. 5A and 5B illustrate block diagrams of a system for dynamically scaling processing resources of a processing system, in accordance with different embodiments of the present disclosure.
DETAILED DESCRIPTION
The following detailed description illustrates embodiments of the present disclosure and ways in which they can be implemented. Although some modes of carrying out the present disclosure have been disclosed, those skilled in the art would recognize that other embodiments for carrying out or practicing the present disclosure are also possible.
The present disclosure provides a method for dynamically scaling processing resources of a processing system, and a system for dynamically scaling processing resources of a processing system. Herein, a machine learning model is dynamically trained to reduce prediction errors while predicting an action based on a current state of a user. Beneficially, a tentative next state is determined based on this predicted action, and then the processing resources are dynamically scaled accordingly in a time-efficient manner that minimizes lag. Additionally, the dynamic scaling of processing resources reduces power consumption and costs associated with provisioning of the immersive experiences, as an amount of the processing resources that are required in future is procured or activated based on the tentative next state. Since the scaling of the processing resources based on predicted tentative next states reduces latency issues, it also enhances the user's experience of the 3D environment. The method is easy to implement and dynamically trains the machine learning model, thereby accurately determining the tentative next state of the user in the 3D environment, based on which the processing resources are efficiently and dynamically scaled.
Referring to FIG. 1, illustrated are steps of a method for dynamically scaling processing resources of a processing system, in accordance with an embodiment of the present disclosure. At step 102, information indicative of a current state of a user in a three-dimensional (3D) environment of a metaverse is received. At step 104, a set of actions that are feasible from the current state of the user are determined. The set of actions are determined from amongst a plurality of actions that are feasible in the 3D environment. At step 106, an action that is likely to be performed by the user from the current state of the user is predicted, by employing a machine learning model that is dynamically trained for making said prediction. The action is predicted from amongst the set of actions. At step 108, there is determined a tentative next state of the user which corresponds to the action that is predicted. At step 110, a scale of the processing resources is adjusted according to the tentative next state of the user.
The steps 102, 104, 106, 108, 110 are only illustrative and other alternatives can also be provided where one or more steps are added, one or more steps are removed, or one or more steps are provided in a different sequence without departing from the scope of the claims herein. It will be appreciated that the method is easy to implement, dynamically trains the machine learning model, thereby accurately determining the tentative next state of the user in the 3D environment.
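Purely as an illustrative aid, the following Python sketch shows how steps 102 to 110 could be orchestrated in one pass; the helper names (get_current_state, feasible_actions, predict_action, next_state, adjust) are hypothetical placeholders and do not appear in the disclosure.

```python
# Hypothetical sketch of steps 102-110; all helper names are placeholders.

def dynamic_scaling_step(environment, model, scaler):
    # Step 102: receive information indicative of the user's current state.
    current_state = environment.get_current_state()

    # Step 104: determine the set of actions feasible from the current state,
    # drawn from the plurality of actions feasible in the 3D environment.
    feasible = environment.feasible_actions(current_state)

    # Step 106: predict the action the user is likely to perform,
    # using the dynamically trained machine learning model.
    predicted_action = model.predict_action(current_state, feasible)

    # Step 108: determine the tentative next state corresponding to that action.
    tentative_next_state = environment.next_state(current_state, predicted_action)

    # Step 110: adjust the scale of the processing resources accordingly.
    scaler.adjust(tentative_next_state)

    return tentative_next_state
```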
Optionally, at step 102, the information indicative of the current state of the user in the 3D environment is received from at least one of: a device using which the user accesses and interacts with the metaverse, a software application of the 3D environment. The information indicative of the current state of the user is received in order to enable prediction of the tentative next state of the user, based on which the processing resources are to be dynamically scaled. The information indicative of the current state of the user is received in real time, or in near-real time.
Throughout the present disclosure, the term "metaverse" refers to an extended-reality (XR) space that provides users with experiences of one or more XR environments. The one or more XR environments are 3D environments of the metaverse, and encompass virtual reality (VR) environments, augmented reality (AR) environments, mixed reality (MR) environments, or similar. Furthermore, information related to the 3D environment may be stored in a form of a 3D polygonal mesh, a 3D point cloud, a 3D surface cloud, a 3D surflet cloud, a 3D grid, or similar. The 3D environment may include light, shadow, and contextual patterns of any virtual object in the metaverse. The 3D environment could be used for viewing in real-time or later. An example of the VR environment may be a video tour of a wildlife sanctuary, and an example of the AR environment may be a digital makeup tool wherein virtual makeup is overlaid on a video feed of a user's face and body.
Throughout the present disclosure, the term "state" refers to a condition (i.e., a situation) of the user in the 3D environment of the metaverse, at a given time. The term "current state" of the user refers to a current condition (i.e., a present condition) of the user in the 3D environment of the metaverse, at a current time. It will be appreciated that each user experiences a unique journey in the metaverse, thereby entering into different states throughout their unique journey. In order to move from one state to another in the 3D environment, the user performs an action. Furthermore, the user can be a new user (i.e., a user who is newly introduced into the 3D environment), or an existing user who is already present in the 3D environment.
Optionally, a given state of the user is one of: a given pose of the user, a given appearance of the user's avatar, a condition of the user's avatar. The given state of the user is at least one of: the current state, the tentative next state, an actual next state (as described later), of the user. Herein, the "pose" of the user refers to a position and/or an orientation of the user in the 3D environment of the metaverse. The position may be expressed as 3D cartesian coordinates, spherical coordinates, or similar. The orientation may be expressed using Euler angles, Tait-Bryan angles, orientation vectors, orientation matrices, and similar. The term "avatar" refers to a virtual representation of the user in the 3D environment of the metaverse. Furthermore, the user's avatar can also be referred to as a digital identity of the user in the metaverse. The appearance of the avatar is not constrained by any physical factors, and thus may resemble the user, or may look different from the user. The "condition" of the user's avatar refers to any circumstance or a factor which affects the avatar in the metaverse. The condition of the user's avatar may, for example, be a health condition of the user's avatar (for example, in a boxing game in the metaverse), a metaverse-specific condition of the avatar (for example, in the metaverse, an avatar could be in a normal condition or in an energized condition, wherein in the energized condition, the avatar is able to perform some specialized actions which were not possible in the normal condition), and the like.
Throughout the present disclosure, the phrase "plurality of actions that are feasible in the 3D environment" refers to all actions that can be performed by the user in the 3D environment of the metaverse. The plurality of actions can be provided in the form of a list, a table, and similar. Optionally, all possible states achievable by the user in the 3D environment are determined, and subsequently one or more actions corresponding to each of the possible states are determined. The user is able to achieve one or more next states from the current state, by performing one or more actions. Every state has its own set of actions that are feasible therefrom. Therefore, optionally, the set of actions that are feasible from the current state of the user may be determined at step 104 using a data structure (for example, a list, a tree, or similar) or information indicative of all the possible states achievable by the user in the 3D environment and their corresponding actions. Herein, the set of actions comprises one or more actions that are feasible from the current state of the user. In an instance, the set of actions may include a single action, wherein that action is predicted with complete (i.e., 100%) certainty at the next step. In another instance, the set of actions may include multiple actions. Hence, any one action from amongst the set of actions may be predicted with less than complete (i.e., less than 100%) certainty at the next step.
Optionally, the plurality of actions comprise two or more of: selecting an option, selecting an object, manipulating an object, changing a setting, moving to a location, in the 3D environment. The plurality of actions are determined as a whole for all possible states of the user that are achievable in the 3D environment. Optionally, selecting the option is performed by making a selection from a list of options available for the 3D environment, by making a selection from a set of icons indicating the options that are available, by providing a text input for selecting the option, and similar. Examples of options may include, but are not limited to, hiding an object (for example, a map), showing an object, deleting an object, changing location of an object, entering any 3D environment, exiting any 3D environment, changing emotions, changing a theme in the metaverse, applying an environmental effect in the metaverse, and similar. The term "object" refers to a virtual object or to a 3D virtual representation of a real object present in a real-world environment. The object can be selected randomly or intentionally, by the user. Optionally, manipulating the object includes at least one of: opening the object, closing the object, changing an appearance of the object, changing a position of the object, changing properties (i.e., shape, size, physical properties, material properties, and similar) of the object, replicating the object, deleting the object. The term "setting" refers to at least one of: a brightness, a contrast, a configuration, a saturation, a theme, a music, a visual effect, a lighting condition, and the like, of the 3D environment. The user can also move to the location (which can be a new location or a previously-visited location) from a current location in the 3D environment.
As a first example, the 3D environment may represent a virtual car showroom. A user present in the virtual car showroom has a plurality of actions available thereto, wherein the plurality of actions may comprise two or more of: showing a 3D map of the virtual car showroom, hiding the 3D map, switching views from a first-person view to a third-person view, changing emotions, selecting an avatar, selecting a brand of a car, entering a point of interest in the 3D environment, selecting any virtual car showroom from a list of virtual car showrooms, entering a virtual car showroom, selecting a different level within the virtual car showroom, entering an online booking zone, entering a key specifications zone, interacting with a virtual test drive object, entering an augmented reality zone, selecting a catalogue button, deselecting a catalogue button, adding a new car model, replacing an old car model, removing an existing car model, skipping cinematics, virtual test driving, restarting the virtual test drive, entering a virtual room, exiting a virtual room, entering a zone where an event is held, changing a color of the car model, and the like. When the current state of the user is that the user is standing outside the virtual car showroom, the set of actions feasible from the current state of the user are: showing the 3D map, hiding the 3D map, switching views from the first-person view to the third-person view, changing emotions, entering the virtual car showroom.
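For illustration only, the feasible-action lookup of this first example could be held in a simple mapping, as in the sketch below; the state labels and action identifiers are assumptions introduced for the example and are not part of the disclosure.

```python
# Hypothetical mapping of states to feasible actions for the virtual car showroom example.
FEASIBLE_ACTIONS = {
    "outside_showroom": [
        "show_3d_map",
        "hide_3d_map",
        "switch_to_third_person_view",
        "change_emotions",
        "enter_showroom",
    ],
    "near_car": [
        "open_car_door",
        "inspect_interior",
        "inspect_exterior",
        "sit_inside_car",
        "start_virtual_test_drive",
    ],
}

def feasible_actions(current_state: str) -> list[str]:
    # Step 104: look up the set of actions feasible from the current state.
    return FEASIBLE_ACTIONS.get(current_state, [])
```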
Referring to FIGs. 2A, 2B, 2C and 2D, there are shown exemplary perspective views of a 3D environment representing a virtual car showroom 200, in accordance with an embodiment of the present disclosure. In FIGs. 2A-2D, various examples of the current state of a user 202 in the virtual car showroom 200 are illustrated. The user 202 is shown to be inside the virtual car showroom 200. The virtual car showroom 200 is shown to have a car 204, a reception desk 206, a sofa 208 kept beside the reception desk 206, and a (virtual) receptionist 210 sitting at the reception desk 206. The user could have different current states at different times.
In FIG. 2A, the user 202 is shown to be standing in a middle of the virtual car showroom 200. This position is, for example, a current state of the user 202 at a time T1. A plurality of actions that are feasible in the virtual car showroom 200 are two or more of: walking towards the car 204, opening a door of the car 204, inspecting an interior of the car 204, inspecting an exterior of the car 204, sitting inside the car 204, driving away the car 204 for a virtual test drive, walking towards the reception desk 206, standing by the reception desk 206, talking to a receptionist sitting at the reception desk 206, getting a brochure from the reception desk 206, walking towards the sofa 208 from the current state, walking towards the sofa from the reception desk 206, standing by the sofa 208, sitting on the sofa 208, standing up from the sofa 208, changing a size of the car 204, changing a color of the car 204, changing a model of the car 204, entering a zone (not shown) where an event is held, and exiting the virtual car showroom 200. However, a set of actions that are feasible from the current state of the user as shown in FIG. 2A are at least one of: walking towards the car 204, inspecting the exterior of the car 204, walking towards the reception desk 206, talking to the receptionist 210 sitting at the reception desk 206, walking towards the sofa 208.
Alternatively, in FIG. 2B, the user 202 is shown to be standing near the car 204, and this position is, for example, the current state of the user 202 at a time T2. The set of actions feasible from the current state may be at least one of: opening the door of the car 204, inspecting the interior of the car 204, inspecting the exterior of the car 204, sitting inside the car 204, driving away the car 204 for the virtual test drive.
Alternatively, in FIG. 2C, the user 202 is shown to be standing at the reception desk 206, and this position is, for example, the current state of the user 202 at a time T3. The set of actions feasible from the current state may be at least one of: talking to the receptionist sitting at the reception desk 206, getting the brochure from the reception desk 206, walking towards the sofa 208 from the reception desk 206.
Alternatively, in FIG. 2D, the user 202 is sitting on the sofa 208, and this position is, for example, the current state of the user 202 at a time T4. The set of actions feasible from the current state may be standing up from the sofa 208.
FIGs. 2A, 2B, 2C and 2D are merely examples, which should not unduly limit the scope of the claims herein. A person skilled in the art will recognize many variations, alternatives, and modifications of embodiments of the present disclosure. For example, when an orientation of the user is also known, the orientation may also be used to determine the set of actions feasible from the current state.
The machine learning model is used at step 106 to predict one action, amongst the set of actions determined at step 104, that is likely to be performed by the user from the current state of the user. Optionally, the action that is predicted has a highest probability of occurrence from the current state, amongst probabilities of occurrence of all actions in the set of actions. In the present disclosure, the machine learning model is dynamically trained by continuously receiving input (which is in the form of pairs of (current) state and (predicted) action) and feedback into the machine learning model, and incorporating the received input and feedback for accurately predicting the action from the set of actions. Specifically, the machine learning model is dynamically trained (i.e., trained regularly over time) using reinforcement learning. Herein, in reinforcement learning, when the machine learning model predicts the action correctly, then said machine learning model is rewarded. Alternatively, when the machine learning model predicts the action incorrectly, then said machine learning model is penalized. Dynamic training of the machine learning model is described later in more detail. A technical benefit of dynamically training the machine learning model is that it makes the machine learning model highly adaptable and accurate over time, thus making it very efficient.
Optionally, the step of predicting the action that is likely to be performed by the user comprises utilizing a predictive algorithm in the machine learning model for making said prediction. Herein, the predictive algorithm is used to minimize a prediction error of the machine learning model and to accurately predict the action of the user in the 3D environment based on the current state. The predictive algorithm is based on a Markov Decision Process (MDP). The MDP provides a mathematical framework for modelling the step of predicting the action in situations where outcomes are partly random and partly under the control of the user. The predictive algorithm utilizes Quality-learning (Q-learning) to learn a value of all the possible actions in all the possible states in the 3D environment. Q-learning is a model-free reinforcement learning algorithm, i.e., it does not require a model of the metaverse. The Q-learning is performed using a Q-table which enables the predictive algorithm to predict the action likely to be performed by the user based on an expected reward (i.e., a Q-value) for each state considered as the current state, from amongst all possible state-action pairs in the 3D environment. Herein, the Q-table is a data structure which is used to calculate a maximum expected future reward (i.e., a maximum value of the Q-value) for all the possible actions performed corresponding to all the possible states of the user in the 3D environment. The Q-table includes Q-values for each pair of state and action possible in the 3D environment of the metaverse. The Q-values are initialized to 0 before initiating the machine learning model. It will be appreciated that the Q-table is continuously updated as the dynamic training of the machine learning model is performed, which means that the Q-values for each pair of the state and the action possible in the 3D environment are continuously updated. The Q-table is updated based on an equation, such as the Bellman equation, which receives two inputs, namely, a state and an action feasible in the 3D environment. The Bellman equation is given as equation (1):
Q(s_t, a_t) = E[R_(t+1) + γ·R_(t+2) + γ^2·R_(t+3) + …] [s_t, a_t]     (1)
wherein Q(s_t, a_t) represents the Q-value for a given action a_t, given a particular state s_t, E[R_(t+1) + γ·R_(t+2) + γ^2·R_(t+3) + …] represents an expected discounted cumulative reward, and [s_t, a_t] represents the possible states and their corresponding possible actions in the 3D environment.
Optionally, the predictive algorithm takes into account at least one of: a historical record of actions performed by the user at the current state, a historical record of actions performed by other users at the current state, a historical record of actions performed by the user at a state similar to the current state, a historical record of actions performed by other users at a state similar to the current state, when making said prediction. The historical record of actions may be recorded over a period of time, such as, a day, a week, a year, or similar. The 3D environment of the metaverse can be accessed by different users, including the user and the other users, over the period of time. The user and/or the other users could behave in a manner similar to each other or different from each other in the 3D environment. There can be one or more states similar to the current state within the 3D environment. The user and/or the other users could behave in a manner similar to or different than their behavior in the current state. Beneficially, the historical records of actions in the various scenarios mentioned hereinabove represent actual past interaction of the user and/or the other users with the 3D environment, thereby serving as an accurate and useful reference for updating the values in the Q-table. Hence, when a new user or an existing user enters the 3D environment of the metaverse, the Q-values in the Q-table are no longer reset to 0 as was previously described. Instead, the Q-table is prepared based on at least one of the aforesaid historical interaction data of the user and/or of the other users with the 3D environment. The Q-table is also updated based on current interaction of the user with the 3D environment. Optionally, if the user has historically visited the metaverse, the Q-values in the Q-table are a weighted average of past Q-values in the Q-table for the user.
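A minimal sketch of a Q-table seeded from such historical records, and of picking the action with the highest Q-value, is given below; the string-based state and action representation, the equal weighting used in the weighted average, and the function names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical Q-table: Q[(state, action)] -> expected reward (Q-value), defaulting to 0.
q_table: dict[tuple[str, str], float] = defaultdict(float)

def seed_from_history(history: list[tuple[str, str, float]]) -> None:
    # Illustrative seeding: initialise Q-values from historical (state, action, q) records
    # of the user and/or other users, instead of resetting them to 0.
    for state, action, past_q in history:
        # Assumed equal weighting between any existing value and the historical value.
        q_table[(state, action)] = 0.5 * q_table[(state, action)] + 0.5 * past_q

def predict_action(current_state: str, feasible: list[str]) -> str:
    # Step 106: pick the feasible action with the highest Q-value for the current state.
    return max(feasible, key=lambda action: q_table[(current_state, action)])
```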
At step 108, the tentative next state of the user corresponding to the action that is predicted, is determined. The tentative next state is an outcome which would tentatively occur when the action that has been predicted by the machine learning model (optionally utilizing the predictive algorithm) is performed. Notably, the tentative next state is named so, since its occurrence is only tentative in nature, i.e., it is likely to occur upon the likely performance of the action that is predicted.
The tentative next state is different from the current state of the user. There can be another set of actions from another plurality of actions, that are possible from the tentative next state of the user. Subsequently, the predictive algorithm can predict another action, from the another set of actions, that is likely to be performed by the user from the tentative next state of the user, and so on. Continuing with reference to the first example, when it is predicted that the user will enter the virtual car showroom, from amongst the set of actions, the tentative next state of the user will be that of the user standing inside the virtual car showroom.
Referring to FIG. 3, illustrated is an exemplary process flow 300 for determining the tentative next state of the user, in accordance with an embodiment of the present disclosure. The tentative next state of the user corresponds to the action that is predicted. In FIG. 3, the current state of the user is given by s1. A set 302 of actions (as depicted by a dashed rectangle) which are feasible from the current state s1 of the user, is determined. The set 302 of actions comprises the following actions: a1, a2, up to an, wherein n could be any finite positive number. Every action in the set 302 of actions leads to a tentative next state of the user. For example, the action a1 leads to a tentative next state s1', the action a2 leads to a tentative next state s2', and the action an leads to a tentative next state sn', and so on. From these tentative next states, a tentative superset 304 of actions (as depicted by a round dotted rectangle) comprising one or more tentative sets of actions, which would be feasible when any of the tentative next states s1', s2', and sn' would become the current state of the user, is determined. The tentative superset 304 of actions comprises the following actions: a1' up to an' for every tentative next state s1', s2', up to sn' of the user. It will be appreciated that a number of actions in the tentative sets of actions corresponding to different tentative next states could be same or different.
The current state s1 of the user, the set 302 of actions, the tentative next states, and the tentative superset 304 of actions are only illustrative, and other alternatives can also be provided where one or more steps are added, one or more steps are removed, or one or more steps are provided in a different sequence without departing from the scope of the claims herein.
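The enumeration shown in FIG. 3 could be sketched as follows, assuming hypothetical transition and feasible_actions helpers that return, respectively, the tentative next state for a state-action pair and the actions feasible from a state.

```python
# Hypothetical enumeration of tentative next states and the tentative superset of actions.

def tentative_next_states(current_state: str,
                          actions: list[str],
                          transition,
                          feasible_actions) -> dict[str, list[str]]:
    superset: dict[str, list[str]] = {}
    for action in actions:  # actions a1 ... an feasible from the current state s1
        next_state = transition(current_state, action)       # tentative next state si'
        superset[next_state] = feasible_actions(next_state)  # tentative set of actions from si'
    return superset
```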
At step 110, the scale of the processing resources of the processing system is adjusted. The scaling of the processing resources refers to increasing or decreasing an amount or types of resources that are required based on the tentative next state of the user (and the action predicted by the machine learning model). Throughout the present disclosure, the term "processing resources" broadly encompasses devices that perform processing tasks, and additionally, optionally, also encompasses other devices that are used in conjunction with the devices that perform processing tasks. For example, the processing resources may be a plurality of processors, and additionally, optionally, cache memories associated with the plurality of processors. It will be appreciated that every state of the user in the 3D environment of the metaverse has certain requirements of processing resources, and these requirements would change based on the action predicted from the current state, since the tentative next state is determined based on the action predicted from the set of actions. Beneficially, when the tentative next state is determined and then necessary changes for the scaling of the processing resources are made accordingly, there is achieved efficient and effective dynamic scaling of the processing resources. Prior to actually achieving the tentative next state, the adjustment of the processing resources is beneficially initialized, which reduces a latency in managing processing resources based on requirements (or on demand). A technical benefit of adjusting the scale of the processing resources according to the tentative next state of the user is reduced power consumption (since all processing resources need not be kept on at all times), and time-efficient adjustment of processing capacity based on predicted (i.e., tentative) requirements.
Optionally, the step of adjusting the scale of the processing resources comprises increasing a number of the processing resources of the processing system that are allocated for the user, or decreasing a number of the processing resources of the processing system that are allocated for the user, depending on processing resource requirements of the tentative next state with respect to processing resource requirements of the current state. In this regard, when the processing resource requirements of the tentative next state are higher than the processing resource requirements of the current state, the number of the processing resources of the processing system that are allocated for the user is increased, and vice versa. For example, when the action predicted by the machine learning model requires the user to access heavy-graphics content, while currently the user is accessing light-graphics content, the number of processing resources allocated for the user is increased, and vice versa. In such an example, if one processor is currently allocated for the user, then the scale of the processing resources may be increased such that two processors are allocated for the tentative next state of the user.
Alternatively, optionally, the step of adjusting the scale of the processing resources comprises adjusting an amount of cache memory assigned to the processing resources according to processing resource requirements of the tentative next state with respect to processing resource requirements of the current state. In this regard, when the processing resource requirements of the tentative next state are higher than the processing resource requirements of the current state, the amount of the cache memory assigned to the processing resources of the processing system (that are allocated for the user) is increased, and vice versa. For example, when the tentative next state requires the user to access heavy-graphics content while currently the user is accessing light-graphics content, the amount of cache memory assigned to the processing resources is increased, and vice versa. In such an example, if 8 megabytes of cache memory is currently assigned to the processing resources, then the scale of the processing resources may be increased such that 16 megabytes of cache memory is allocated for the tentative next state of the user.
Alternatively, optionally, the step of adjusting the scale of the processing resources comprises adjusting a number of processing threads being handled by each core of the processing resources according to processing resource requirements of the tentative next state with respect to processing resource requirements of the current state. In this regard, when the processing resource requirements of the tentative next state are higher than the processing resource requirements of the current state, the number of processing threads handled by each core of the processing resources of the processing system (that are allocated for the user) can be increased (i.e., scaled up), and vice versa. For example, when the tentative next state involves the user engaging in multi-tasking whereas the current state involves the user engaging in only a single task, the number of processing threads being handled by each core of the processing resources is increased, and vice versa. In such an example, if one processing thread (corresponding to the single task) is currently handled by a core, then the number of processing threads of the core may be increased to four processing threads, when the user may be predicted to perform four tasks in the tentative next state.
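The three optional adjustment strategies described above (processor count, cache memory, threads per core) could be combined as in the hedged sketch below; the per-state requirements table and the scaler interface are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class ResourceRequirements:
    processors: int        # number of processors allocated for the user
    cache_mb: int          # cache memory assigned to the processing resources
    threads_per_core: int  # processing threads handled by each core

# Hypothetical per-state requirements, keyed by a label describing the state's workload.
REQUIREMENTS = {
    "light_graphics": ResourceRequirements(processors=1, cache_mb=8, threads_per_core=1),
    "heavy_graphics": ResourceRequirements(processors=2, cache_mb=16, threads_per_core=4),
}

def adjust_scale(current_state: str, tentative_next_state: str, scaler) -> None:
    current = REQUIREMENTS[current_state]
    upcoming = REQUIREMENTS[tentative_next_state]
    # Increase or decrease each allocation depending on the tentative next state's
    # requirements with respect to the current state's requirements.
    if upcoming.processors != current.processors:
        scaler.set_processors(upcoming.processors)
    if upcoming.cache_mb != current.cache_mb:
        scaler.set_cache_mb(upcoming.cache_mb)
    if upcoming.threads_per_core != current.threads_per_core:
        scaler.set_threads_per_core(upcoming.threads_per_core)
```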
Referring to FIG. 4, there is shown an exemplary flow diagram 400 illustrating dynamic training of the machine learning model 402, in accordance with an embodiment of the present disclosure. The flow diagram 400 represents the machine learning model 402, and the three-dimensional (3D) environment 404 of the metaverse. The machine learning model 402 is dynamically trained to predict the action likely to be performed by a user. This (predicted) action would tentatively be performed in the 3D environment 404, and this feature is depicted by an arrow-head line 406a. The performance of the (predicted) action would result in a tentative next state. The user actually performs an action in the 3D environment 404, resulting in an actual next state of the user. The machine learning model 402 ingests the actual next state of the user in the 3D environment 404 and compares the actual next state with the tentative next state. This feature is depicted as an arrow-head line 406b. Based on a status of matching of the actual next state with the tentative next state, the machine learning model 402 is either rewarded or penalized (an action of rewarding or penalizing being depicted by an arrow-head line 406c), to increase or decrease a cumulative reward of the machine learning model 402.
FIG. 4 is merely an example, which should not unduly limit the scope of the claims herein. A person skilled in the art will recognize many variations, alternatives, and modifications of embodiments of the present disclosure.
Optionally, the method further comprises dynamically training the machine learning model 402 by:
- receiving information indicative of an actual next state of the user in the 3D environment 404, wherein the actual next state is attained when an action is actually performed by the user;
- determining whether the actual next state matches the tentative next state; and
- when it is determined that the actual next state matches the tentative next state, rewarding the machine learning model 402 by increasing a value of a cumulative reward of the machine learning model 402.
In this regard, the information indicative of the actual next state of the user in the 3D environment 404 is received in a manner similar to the information indicative of the current state of the user. The information indicative of the actual next state of the user is received in real time or near-real time, upon the action being actually performed by the user. This action can be same as or different from the action predicted to be likely performed by the user. Prior to updating the Q-table based on the actual next state of the user, the Q-table includes Q-values based on a given historical record of actions. Herein, the given historical record of actions comprises at least one of: the historical record of actions performed by the user at the current state, the historical record of actions performed by other users at the current state, the historical record of actions performed by the user at the state similar to the current state, the historical record of actions performed by other users at a state similar to the current state. The Q-values in the Q-table are updated based on the information indicative of the actual next state of the user. The Q-table is updated based on an equation (2) given as:
Q_new(s, a) = Q(s, a) + α[R(s, a) + γ·max_(a') Q(s', a') - Q(s, a)]     (2)
wherein Q_new(s, a) represents the updated Q-value for the action a that is actually performed by the user, given a particular state s (i.e., the current state from which the action is actually performed), Q(s, a) represents the initial Q-value for the action a and the particular state s, α is a learning rate, R(s, a) is a value of a reward for taking the action a in the particular state s, γ is a discount rate, and max_(a') Q(s', a') represents a maximum expected future reward for a future action a' that is an action performed by the user in future, given another particular state s' (i.e., a future tentative next state from which the future action a' is performed). Herein, max_(a') Q(s', a') is determined by measuring a reward, iteratively, based on the action a that is actually performed by the user in the future, and simultaneously iteratively updating the Q-table using the equation (1). Herein, the learning rate α is a numerical value representing an extent to which the actual next state of the user differs from the tentative next state of the user. Hence, when the learning rate α is 0, it means that the machine learning model 402 had correctly predicted the tentative next state of the user, and hence the actual next state matches the tentative next state. Alternatively, when the learning rate α is 1, it means that the machine learning model 402 had incorrectly predicted the tentative next state of the user, and hence the actual next state does not match the tentative next state.
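A minimal sketch of the update of equation (2), reusing the Q-table representation from the earlier sketch, is given below; the parameter names alpha and gamma correspond to the learning rate α and the discount rate γ, and the treatment of an empty set of further actions is an assumption.

```python
def update_q_value(q_table, state, action, reward, actual_next_state,
                   feasible_next_actions, alpha, gamma):
    # Equation (2): Q_new(s, a) = Q(s, a) + alpha * [R(s, a) + gamma * max_a' Q(s', a') - Q(s, a)]
    best_future = max(
        (q_table[(actual_next_state, a2)] for a2 in feasible_next_actions),
        default=0.0,  # assumption: no further feasible actions contribute 0
    )
    q_table[(state, action)] += alpha * (reward + gamma * best_future - q_table[(state, action)])
```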
Optionally, the R(s, a) value is randomly initialized based on the interaction of the user with the 3D environment 404, and is gradually determined as the machine learning model 402 is dynamically trained. The discount rate γ accounts for energy loss of the user when said user moves from the current state to the actual next state. Hence, the discount rate γ is used to determine an importance of the expected reward. Herein, when the discount rate γ approaches 0, it means that the machine learning model 402 has become short-sighted (i.e., myopic) and only considers the current rewards when making its predictions. For example, the predicted action may be one corresponding to a highest Q-value from amongst all state-action pairs of the current state of the user. Alternatively, when the discount rate γ approaches 1, it means that the machine learning model 402 strives for long-term rewards which are higher in value, in the long term, than long-term values of the current rewards when the discount rate γ approaches 0. For example, the predicted action may correspond to a third-highest Q-value from amongst all state-action pairs of the current state of the user, when the predicted action further leads to highest rewards in later stages of state-action pairs. The discount rate γ is then multiplied with the maximum discounted cumulative value of the expected reward of the tentative next state. The learning rate α is multiplied with a sum of the R(s, a) value, a product of the discount rate γ and the maximum value of the expected reward of the tentative next state, and a negative value of the Q-value present in the Q-table. Consequently, when the learning rate α is 0 (to denote matching of the actual next state with the tentative next state), Q_new(s, a) is calculated, which increases the value of the cumulative reward of the machine learning model 402. Hence, the machine learning model 402 updates the Q-table each time the information indicative of the actual next state of the user is received, thus dynamically training the machine learning model 402.
Optionally, the cumulative reward of the machine learning model 402 is initialized to 0. Hence, whenever the actual next state matches the tentative next state, the value of the cumulative reward of the machine learning model 402 is increased. With every correct outcome, the cumulative reward further increases, until all possible actions and their corresponding possible states have been considered, and the Q-table has been updated accordingly. This helps the machine learning model 402 to be accurate and relevant for the user, as well as for other users at states that are similar to states of the user. Over time, with such dynamic training, the machine learning model 402 beneficially correctly predicts the action that is likely to be performed by the user from the current state of the user.
Alternatively, optionally, when it is determined that the actual next state does not match the tentative next state, the method further comprises penalizing the machine learning model 402 by decreasing a value of a cumulative reward of the machine learning model 402. When the learning rate α is 1 (to denote non-matching of the actual next state with the tentative next state), Q_new(s, a) is calculated using equation (2), to decrease the value of the cumulative reward of the machine learning model 402, by updating the Q-values in the Q-table. This means that the machine learning model 402 failed to correctly predict the action that was likely to be performed by the user from the current state of the user. Hence, the action likely to be performed by the user (as was predicted using the machine learning model 402) and the action actually performed by the user in the 3D environment 404 are different, and the machine learning model 402 is penalized for such wrongful prediction. It will be appreciated that a penalty is a negative reward, i.e., a penalty would negatively bring down or reduce the cumulative reward.
Optionally, the cumulative reward of the machine learning model 402 is determined using a reward function, the value of the cumulative reward being indicative of a training state of the machine learning model 402. The reward function is, for example, R_(t+1) + γ·R_(t+2) + γ^2·R_(t+3) + …, wherein R_k indicates a reward corresponding to a kth time, and wherein a cumulative total of all rewards received by the machine learning model 402 at multiple times in a given time period indicates the training state of the machine learning model 402 at the end of the given time period. The reward function enables determining the cumulative reward of the machine learning model 402, thereby further indicating an importance of the state-action pair from all the possible state-action pairs. Herein, the reward function is a sparse function. While using the sparse function, when the action actually performed by the user matches the predicted action, the machine learning model 402 is rewarded by increasing a value of the cumulative reward by 1 unit. Alternatively, when the action actually performed by the user does not match the predicted action, the machine learning model 402 is penalized by keeping the cumulative reward at 0 unit. For example, when the cumulative reward of the machine learning model 402, upon performing a given action corresponding to a given state, is 0.6, it may indicate that the given state-given action pair is 60 percent important to the machine learning model 402.
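The sparse reward function and the cumulative reward described above could be realised as in the sketch below; the 1-unit reward and the simple running total are assumptions drawn from the example values given.

```python
def sparse_reward(predicted_action: str, actual_action: str) -> float:
    # 1 unit when the actually performed action matches the predicted action,
    # otherwise 0 (the penalty case of the sparse function).
    return 1.0 if actual_action == predicted_action else 0.0

class CumulativeReward:
    def __init__(self) -> None:
        self.value = 0.0  # the cumulative reward is initialised to 0

    def update(self, reward: float) -> None:
        # The running total is indicative of the training state of the model.
        self.value += reward
```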
Optionally, when the actual next state of the user is attained prior to the step of adjusting the scale of the processing resources, the method further comprises adjusting the scale of the processing resources according to the actual next state of the user upon receiving the information indicative of the actual next state of the user. In such a case, the scale of the processing resources is adjusted according to the actual next state, to meet processing resource requirements of the user with immediate effect. A technical benefit of adjusting the scale of the processing resources in such a manner is that the scaling can be performed time-efficiently when the actual next state of the user is unexpectedly achieved prior to managing of the processing resources according to the tentative next state.
Referring to FIGs. 5A and 5B, illustrated are block diagrams of a system 500 for dynamically scaling processing resources of a processing system 502, in accordance with different embodiments of the present disclosure. In FIGs. 5A and 5B, the system 500 comprises a processor 504. The processor 504 could be external to the processing system 502 (as shown in FIG. 5B), or be a part of the processing system 502 (as shown in FIG. 5A). When the processor 504 is a part of the processing system 502, the processing system 502 itself would then be capable of scaling its processing resources dynamically. In FIG. 5B, the system 500 is shown to further comprise a data repository 506 communicably coupled to the processor 504.
Optionally, the processor 504 is further configured to dynamically train the machine learning model 402 by:
- receiving information indicative of an actual next state of the user in the 3D environment 404, wherein the actual next state is attained when an action is actually performed by the user;
- determining whether the actual next state matches the tentative next state; and
- when it is determined that the actual next state matches the tentative next state, rewarding the machine learning model 402 by increasing a value of a cumulative reward of the machine learning model 402.
Alternatively, optionally, when it is determined that the actual next state does not match the tentative next state, the processor 504 is further configured to penalize the machine learning model 402 by decreasing a value of a cumulative reward of the machine learning model 402.
Optionally, the cumulative reward of the machine learning model 402 is determined using a reward function, the value of the cumulative reward being indicative of a training state of the machine learning model 402.
Optionally, when the actual next state of the user is attained prior to the step of adjusting the scale of the processing resources, the processor 504 is further configured to adjust the scale of the processing resources according to the actual next state of the user upon receiving the information indicative of the actual next state of the user.
Optionally, when predicting the action that is likely to be performed by the user, the processor 504 is configured to utilize a predictive algorithm in the machine learning model 402 for making said prediction.
Optionally, the predictive algorithm takes into account at least one of: a historical record of actions performed by the user at the current state, a historical record of actions performed by other users at the current state, a historical record of actions performed by the user at a state similar to the current state, a historical record of actions performed by other users at a state similar to the current state, when making said prediction.
Optionally, when adjusting the scale of the processing resources, the processor 504 is configured to increase a number of the processing resources of the processing system 502 that are allocated for the user, or decrease a number of the processing resources of the processing system 502 that are allocated for the user, depending on processing resource requirements of the tentative next state with respect to processing resource requirements of the current state.
Optionally, the data repository 506 has stored thereon at least information indicative of a plurality of states that are achievable by the user in the 3D environment 404 of the metaverse and one or more actions corresponding to each of the plurality of states, and information indicative of a required scale of the processing resources for each of the plurality of states. Herein, the term "data repository" refers to hardware, software, firmware, or a combination of these for storing the information in an organized (namely, structured) manner, thereby allowing for easy storage, access (namely, retrieval), updating and analysis of the information. The data repository 506 can be implemented as one or more storage devices. A technical advantage of using the data repository 506 is that it provides an ease of storage and access of the information, as well as of storage, access and further processing of outputs generated by the processor 504.
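A minimal in-memory illustration of what such a repository could hold is shown below; the state names, feasible actions, and required scales are invented for the example and are not part of the disclosure:

```python
# Hypothetical contents of the data repository: for each achievable state,
# the feasible actions and the required scale of processing resources.
DATA_REPOSITORY = {
    "lobby": {"actions": ["open shop", "enter arena"], "required_scale": 2},
    "arena": {"actions": ["pick weapon", "leave arena"], "required_scale": 8},
    "shop":  {"actions": ["buy item", "leave shop"],     "required_scale": 3},
}


def feasible_actions(state):
    return DATA_REPOSITORY[state]["actions"]


def required_scale(state):
    return DATA_REPOSITORY[state]["required_scale"]


print(feasible_actions("lobby"), required_scale("arena"))  # ['open shop', 'enter arena'] 8
```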
Optionally, the processor 504 is further configured to store, at the data repository 506, at least one of: the action that is likely to be performed by the user from the current state of the user, the tentative next state of the user which corresponds to the action that is predicted, a value of a cumulative reward of the machine learning model 402. This is beneficial in updating the Q-values in the Q-table, and thereby enables the machine learning model 402 to predict the action likely to be performed by the user or other users, in the 3D environment 404.
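Since the passage refers to Q-values in a Q-table, a standard tabular Q-learning update could look like the sketch below; the learning rate, discount factor, and state/action names are assumptions for illustration, not values disclosed herein:

```python
from collections import defaultdict

Q = defaultdict(float)  # Q-table keyed by (state, action)


def q_update(state, action, reward, next_state, next_actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update for one observed transition."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])


# The prediction was correct (reward +1) after the user chose "enter arena" in "lobby".
q_update("lobby", "enter arena", reward=1.0, next_state="arena",
         next_actions=["pick weapon", "leave arena"])
print(Q[("lobby", "enter arena")])  # 0.1
```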
Modifications to embodiments of the present disclosure described in the foregoing are possible without departing from the scope of the present disclosure as defined by the accompanying claims. Expressions such as "including", "comprising", "incorporating", "have", "is" used to describe and claim the present disclosure are intended to be construed in a non-exclusive manner, namely allowing for items, components or elements not explicitly described also to be present. Reference to the singular is also to be construed to relate to the plural.

CLAIMS
What is claimed is:
1. A method for dynamically scaling processing resources of a processing system, the method comprising:
- receiving information indicative of a current state of a user in a three-dimensional (3D) environment of a metaverse;
- determining, from amongst a plurality of actions that are feasible in the 3D environment, a set of actions that are feasible from the current state of the user;
- predicting, from amongst the set of actions, an action that is likely to be performed by the user from the current state of the user, by employing a machine learning model that is dynamically trained for making said prediction;
- determining a tentative next state of the user which corresponds to the action that is predicted; and
- adjusting a scale of the processing resources according to the tentative next state of the user.
2. A method as claimed in claim 1, further comprising dynamically training the machine learning model by:
- receiving information indicative of an actual next state of the user in the 3D environment, wherein the actual next state is attained when an action is actually performed by the user;
- determining whether the actual next state matches the tentative next state; and
- when it is determined that the actual next state matches the tentative next state, rewarding the machine learning model by increasing a value of a cumulative reward of the machine learning model.
3. A method as claimed in claim 2, wherein when it is determined that the actual next state does not match the tentative next state, the method further
comprises penalizing the machine learning model by decreasing a value of a cumulative reward of the machine learning model.
4. A method as claimed in claim 2 or 3, wherein the cumulative reward of the machine learning model is determined using a reward function, the value of the cumulative reward being indicative of a training state of the machine learning model.
5. A method as claimed in any of claims 2, 3, or 4, wherein when the actual next state of the user is attained prior to the step of adjusting the scale of the processing resources, the method further comprises adjusting the scale of the processing resources according to the actual next state of the user upon receiving the information indicative of the actual next state of the user.
6. A method as claimed in any of claims 1-5, wherein the step of predicting the action that is likely to be performed by the user comprises utilizing a predictive algorithm in the machine learning model for making said prediction.
7. A method as claimed in claim 6, wherein the predictive algorithm takes into account at least one of: a historical record of actions performed by the user at the current state, a historical record of actions performed by other users at the current state, a historical record of actions performed by the user at a state similar to the current state, a historical record of actions performed by other users at a state similar to the current state, when making said prediction.
8. A method as claimed in any of claims 1-7, wherein the step of adjusting the scale of the processing resources comprises increasing a number of the processing resources of the processing system that are allocated for the user, or decreasing a number of the processing resources of the processing system that are allocated for the user, depending on processing resource requirements of the tentative next state with respect to processing resource requirements of the current state.
9. A method as claimed in any of claims 1-8, wherein the plurality of actions comprise two or more of: selecting an option, selecting an object, manipulating an object, changing a setting, moving to a location, in the 3D environment.
10. A system for dynamically scaling processing resources of a processing system, the system comprising a processor configured to:
- receive information indicative of a current state of a user in a three-dimensional (3D) environment of a metaverse;
- determine, from amongst a plurality of actions that are feasible in the 3D environment, a set of actions that are feasible from the current state of the user;
- predict, from amongst the set of actions, an action that is likely to be performed by the user from the current state of the user, by employing a machine learning model that is dynamically trained for making said prediction;
- determine a tentative next state of the user which corresponds to the action that is predicted; and
- adjust a scale of the processing resources according to the tentative next state of the user.