Abstract: Embodiments of the present invention provide a method and a system for generating a recommendation of a plurality of stickers for a pair of users in a chat conversation. The method includes receiving user states of the pair of users that correspond to a sequence of recent chat content exchanged between the pair of users, and feature vectors corresponding to preferences of the pair of users. The method includes encoding the user states into user state vectors by a reinforcement learning (RL) agent. The user state vectors are combined using cooperative modeling and Nash equilibrium techniques. A recommendation of a plurality of stickers is generated based on the combination of the user state vectors. The method further includes receiving rewards for the RL agent upon selection of at least one sticker from the plurality of stickers by each of the users.
[0001] The present invention relates to a chat application and more particularly
relates to a method and system for recommending stickers to a pair of users
communicating via the chat application.
BACKGROUND
[0002] Generally, users communicate with other users through chat applications on
user devices such as mobile phones. The chat applications may correspond to social
network applications such as Facebook® or Twitter®, instant messaging (IM)
applications, or multimedia messaging applications such as WhatsApp®, WeChat®,
hike Messenger®, and Snapchat®. Such chat applications enable users to
communicate with other users by exchanging messages. The messages may
include text messages, voice messages, emojis, animations, stickers, GIFs, etc. Nowadays, users prefer expressing their emotions or thoughts using stickers, which
provide a richer form of visual expression than text messages in chat
conversations. Typically, different types of stickers are available on an online
application store. These different types of stickers are downloaded from the online
application store and installed in a palette associated with a chat application.
[0003] The users may select stickers from the palette and use the stickers in a chat
conversation. However, it may take time to load the stickers in the palette. In some
cases, the users may not be able to find relevant stickers for the chat conversation. For
instance, the users may be discussing a particular context, such as restaurants,
in the chat conversation. The users may search for stickers that are relevant to the
context. However, it may be a mundane task for the users to search for and select the
relevant stickers for the chat conversation, or to type text messages. Consequently, the
users' interest may decrease, and the users may discontinue or reduce engagement
in the chat conversation.
[0004] Accordingly, there is a need for a technical solution that provides stickers to
users in a chat conversation, while precluding the need to search for and select relevant
stickers for the chat conversation. More specifically, there is a need to provide relevant
stickers that are personalized and context-based for the users in the chat conversation.
SUMMARY
[0005] In view of the drawbacks and limitations of the existing chat applications, the
present invention is directed towards generating a personalized recommendation of a
plurality of stickers for a pair of users in a chat conversation. The generation of the
recommendation of the plurality of stickers is based on a context of the chat
conversation and preferences of each user of the pair of users. The user preferences
are based on personal information, such as location information, time information of
the users, weather information of the users' locations, previously opted stickers, or
the like. Further, the preferences may include a preferred position of stickers in a
chat application, such as displaying the stickers at the lower side, left side, or right side of
the chat application, or a particular kind of stickers, such as cat stickers, avatars,
emojis, etc.
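By way of a non-limiting illustration only, such preferences may be assembled into a feature vector before being supplied to the recommendation model. The sketch below assumes simple one-hot and bag-of-stickers encodings; the function name, field choices, and dimensions are illustrative assumptions and not part of the claimed method.

import numpy as np

def build_preference_features(location_id, hour_of_day, weather_id,
                              recent_sticker_ids, num_locations=100,
                              num_weather=10, num_stickers=500):
    # Hypothetical encoding of user preferences into one flat feature vector:
    # one-hot location, normalized hour of day, one-hot weather category, and
    # a bag-of-stickers count of previously opted stickers.
    loc = np.zeros(num_locations)
    loc[location_id] = 1.0
    wx = np.zeros(num_weather)
    wx[weather_id] = 1.0
    time_feat = np.array([hour_of_day / 24.0])
    sticker_hist = np.zeros(num_stickers)
    for sid in recent_sticker_ids:
        sticker_hist[sid] += 1.0
    return np.concatenate([loc, time_feat, wx, sticker_hist])

# Example: a user in location 12, chatting at 9 pm, in weather category 3,
# who has recently used stickers 5 and 7.
features = build_preference_features(12, 21, 3, [5, 7])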
[0006] In one embodiment of the present invention, the recommendation of a
plurality of stickers may be generated locally in corresponding user equipments of the
pair of users. For instance, two users are engaged in a chat conversation session using
chat applications in their respective user equipments. Each user equipment of the pair
of users is associated with a reinforcement learning (RL) agent that generates the
recommendation of a plurality of stickers using a reinforcement learning model. Each
RL agent is located at the corresponding user equipment associated with each user of
the pair of users. When one user of the pair of users opens a chat application in the
corresponding user equipment, the corresponding RL agent may obtain a sequence of
recent chat content (comprising stickers) exchanged between the pair of users for
determining context of a chat conversation.
[0007] In some embodiments, the sequence of recent chat content, exchanged
between the users, is represented as user states. Thus, the user states include state
variables representing information related to the context of the chat conversation as
well as personal information of the users, such as location of the users, time of day,
season, local weather, etc. The user states are encoded into user state vectors by the
respective RL agent associated with the instance of the chat application running on
the respective user equipment. In some example embodiments, the user states are
encoded locally by an RL agent in each of the corresponding user equipments of the pair of
users. In one example embodiment, each RL agent encodes corresponding user states
of the pair of users into user state vectors using a neural-network based contextual
encoder, such as a Recurrent Neural Network (RNN) or the like.
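A minimal sketch of such a contextual encoder is given below, assuming the recent chat content has already been tokenized into embedding vectors. The GRU-based design, dimensions, and names are illustrative assumptions rather than the claimed implementation.

import torch
import torch.nn as nn

class ContextualEncoder(nn.Module):
    # Encodes a sequence of chat-content embeddings into a single user state vector.
    def __init__(self, embed_dim=128, state_dim=256):
        super().__init__()
        self.rnn = nn.GRU(embed_dim, state_dim, batch_first=True)

    def forward(self, chat_embeddings):
        # chat_embeddings: (batch, seq_len, embed_dim), most recent messages last.
        _, last_hidden = self.rnn(chat_embeddings)
        return last_hidden.squeeze(0)  # (batch, state_dim) user state vector

# Example: encode the last 10 messages of one user into a user state vector.
encoder = ContextualEncoder()
recent_chat = torch.randn(1, 10, 128)   # stand-in for embedded chat content
user_state_vector = encoder(recent_chat)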
[0008] The user state vector of one user (i.e. the first user), generated by an RL agent
(i.e. a first RL agent) in a first user equipment of the first user, is shared with another RL
agent (i.e. a second RL agent) in a second user equipment of a second user of the pair
of users. The user state vector of the first user is shared with the second user via a
server system, and vice versa. The server system combines the user state vector of
each user (i.e. the first user and the second user) with features corresponding to a
gender, an emotion, a pose, or a preference of the corresponding user for generating a
final user state vector of the corresponding user. The server system shares the final
user state vectors of the first user and the second user with the second RL agent. The
second RL agent combines the final user state vector of the first user with the final
user state vector of the second user to recommend stickers to the second user. In some
example embodiments, the final user state vectors are combined using cooperative
modeling along with a Nash equilibrium technique. The cooperative modeling is
performed so that responses of the second user are relevant to the chat content of the
first user. The Nash equilibrium technique is applied to determine accurate response
content that is beneficial to both users of the pair. In alternate embodiments, the
user states are encoded globally by an RL agent located at a server system. The
second user selects at least one sticker from the plurality of stickers. In a similar
manner, a plurality of stickers corresponding to the context and the preferences of the
first user is generated by the corresponding first RL agent in the first user equipment
of the first user. When the users select the at least one sticker from the recommended
plurality of stickers, the corresponding RL agents are rewarded. Additionally, the RL
agents are rewarded when the pair of users continues to engage in the chat
conversation. For instance, when the pair of users continues the chat conversation for
an extended amount of time, the RL agents are rewarded.
[0009] In an example embodiment, the RL agents interact with an environment (i.e. a
dynamic messaging environment in a chat application) based on the RL model. The
RL agent may be in many states (S) of the environment, and may take actions (A) to
transition from one state (S1) to another (S2). The next state is determined by
transition probabilities between states. When the RL agent takes an action, the
environment delivers a reward as feedback. The RL agents learn and gain experiences
based on the rewards. A policy function associated with the RL agent provides a
guideline on the optimal action to take in a given state so as to maximize the
total rewards. Each state is associated with a value function predicting the
expected amount of future rewards that can be received in this state by acting
according to the corresponding policy function. In other words, the value function
quantifies how good a state is. Each of the experiences (hereinafter referred to as an RL
agent experience) of the RL agents is encoded. The encoded RL agent experiences
comprise a state, an action, a reward and a next state of corresponding RL agent that
are stored in an experience replay buffer. Each RL agent experience undergoes an
importance weighted sampling for updating the value function of the corresponding
RL agent. The policy function of each RL agent is updated based on the updated
value function. In some embodiments, policy gradient algorithms are used for
obtaining an optimal reward based on the policy function and the value function. The
RL agents receive the optimal reward based on actions taken by the RL agents under
the policy function.
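Purely as an illustrative sketch of the experience replay and importance weighted sampling described above (the data structure, priority scheme, and names are assumptions, not the claimed algorithm):

import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state"])

class ReplayBuffer:
    # Stores RL agent experiences and samples them with importance weights.
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)

    def add(self, experience, priority=1.0):
        # Each experience is (state, action, reward, next_state).
        self.buffer.append(experience)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Importance-weighted sampling: higher-priority experiences are drawn
        # more often when updating the value function.
        total = sum(self.priorities)
        weights = [p / total for p in self.priorities]
        return random.choices(list(self.buffer), weights=weights, k=batch_size)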
[0010] Further, the RL agents determine end of the chat conversation and cease the
process of generating the recommendation of a plurality of stickers. In some cases,
the end of chat conversation is determined when the pair of users is not engaged in
the chat conversation for more than a pre-defined threshold time-period. For instance,
if the pair of users is not engaged in the chat conversation for more than 20 mins or
30 mins, the RL agents cease generating the recommendation of the plurality of
stickers. In some other cases, the end of the chat conversation is determined when the last
messages in the chat conversation include information, such as ‘bye’, ‘later’, ‘see
you’, etc.
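A minimal sketch of this end-of-conversation check is shown below; the idle threshold and keyword list follow the examples above, while the function itself and its signature are assumptions for illustration.

import time

FAREWELL_KEYWORDS = {"bye", "later", "see you"}

def conversation_ended(last_message_text, last_message_timestamp,
                       idle_threshold_seconds=30 * 60):
    # Returns True when the chat appears to have ended, so that the RL agents
    # can cease generating sticker recommendations.
    idle_too_long = (time.time() - last_message_timestamp) > idle_threshold_seconds
    said_goodbye = any(k in last_message_text.lower() for k in FAREWELL_KEYWORDS)
    return idle_too_long or said_goodbye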
[0011] Accordingly, embodiments of the present invention disclose a method for
recommending a plurality of stickers for a chat conversation. The method includes
generating a first user state vector of a first user and a second user state vector of a
second user of a pair of users in the chat conversation. The method further includes
generating a final user state vector based on a combination of the first user state vector
and the second user state vector using a cooperative modeling technique. The method
includes generating a recommendation of a plurality of stickers based on the final
user state vector.
[0012] Accordingly, embodiments of the present invention disclose a system for
recommending a plurality of stickers for a chat conversation. The system comprises a
memory configured to store a set of instructions to be executed by a processor. The
processor is configured to generate a first user state vector of a first user and a second
user state vector of a second user of a pair of users in the chat conversation. The
processor is further configured to generate a final user state vector based on a
combination of the first user state vector and the second user state vector using a
cooperative modeling technique. The processor is further configured to generate a
recommendation of a plurality of stickers based on the final user state vector.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] FIG. 1 illustrates a network environment for generating a recommendation of
a plurality of stickers to a pair of users in a chat conversation, in accordance with an
example embodiment of the present invention;
[0014] FIG. 2 shows a block diagram of a server system for generating the
recommendation of the plurality of stickers, in accordance with an example
embodiment of the present invention;
[0015] FIG. 3 shows a schematic diagram of an architecture of the server system for
recommending stickers using local RL agents located in each user equipment, in
accordance with an example embodiment of the present invention;
[0016] FIG. 4 shows a method flow diagram for generating the recommendation of
the plurality of stickers for the pair of users in the chat conversation, in accordance
with an example embodiment of the present invention; and
[0017] FIG. 5 shows a user interface (UI) of a user equipment displaying the plurality
of stickers recommended for the user in the chat conversation with another user, in
accordance with an example embodiment of the present invention.
DETAILED DESCRIPTION
[0018] The drawings accompanied herein constitute a part of this disclosure.
Reference numerals refer to same parts throughout the different diagrams.
Components in the diagrams are not necessarily to scale. Some diagrams may
indicate the components of block diagrams and may not represent internal circuitry of
each component. The diagrams are provided herein for the purpose of understanding the
disclosure.
[0019] Throughout the following description, numerous references may be made
regarding servers, services, or other systems formed from computing devices. It
should be appreciated that the use of such terms is deemed to represent one or more
computing devices having at least one processor configured to or programmed to
execute software instructions stored on a computer readable tangible, non-transitory
medium or also referred to as a processor readable medium. For example, a server
system can include one or more computers operating as a web server, data source
server, or other type of computer server in a manner to fulfill described roles,
responsibilities, or functions. Within the context of this document, the disclosed
modules are also deemed to comprise computing devices having a processor and a
non-transitory memory storing instructions executable by the processor that cause the
device to control, manage, or otherwise manipulate the features of the devices or
systems.
[0020] The embodiments are described herein for illustrative purposes and are
subject to many variations. It is understood that various omissions and substitutions
of equivalents are contemplated as circumstances may suggest or render expedient
but are intended to cover the application or implementation without departing from
the spirit or the scope of the present disclosure. Further, it is to be understood that the
phraseology and terminology employed herein are for the purpose of the description
and should not be regarded as limiting. Any heading utilized within this description is
for convenience only and has no legal or limiting effect.
[0021] The term “sticker” used herein throughout the description may include a
character-driven illustration representing a message, an action, or an emotion for use in
a chat, on a messaging application, a social media application, or the like.
[0022] FIG. 1 illustrates a network environment (100) for generating
recommendation of a plurality of stickers to a pair of users, such as a user (102A) and
a user (102B) in a chat conversation, in accordance with an example embodiment of
the present invention. Each user, i.e. the user (102A) and the user (102B) is
associated with a user equipment (104A) and a user equipment (104B), respectively.
Some examples of the user equipments (104A and 104B) include, but are not
limited to, a desktop, a computer, a tablet, a phablet, and a smartphone. In an example
scenario, the pair of users (102A) and (102B) is engaged in the chat conversation
using chat applications, such as a chat application (106A) and a chat application
(106B) in the corresponding user equipments (104A) and (104B). The chat
applications (106A) and (106B) correspond to an application interface hosted by a
server system (108). For instance, the chat applications (106A and 106B) may
correspond to social networking applications such as Facebook®, Twitter®, instant
messaging (IM) applications or multimedia messaging applications, such as
WhatsApp®, WeChat®, hike messenger®, and Snapchat®.
[0023] The user equipments (104A and 104B) communicate with the server system
(108) via a network (110). The network (110) may comprise suitable logic, circuitry,
and interfaces that may be configured to provide a plurality of network ports and a
plurality of communication channels for transmission and reception of data. Each
network port may correspond to a virtual address (or a physical machine address) for
transmission and reception of the communication data. For example, the virtual
address may be an Internet Protocol Version 4 (IPv4) (or an IPv6 address) and the
physical address may be a Media Access Control (MAC) address. The network may
be associated with an application layer for implementation of communication
protocols based on one or more communication requests from at least one of the one
or more communication devices. The communication data may be transmitted or
received, via the communication protocols. Examples of such wired and wireless
communication protocols may include, but are not limited to, Transmission Control
Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), Hypertext
Transfer Protocol (HTTP), File Transfer Protocol (FTP), ZigBee, EDGE, infrared
(IR), IEEE 802.11, 802.16, cellular communication protocols, and/or Bluetooth (BT)
communication protocols.
[0024] Examples of the network (110) may include, but are not limited to, a wireless
channel, a wired channel, or a combination of wireless and wired channels thereof. The
wireless or wired channel may be associated with a network standard which may be
defined by one of a Local Area Network (LAN), a Personal Area Network (PAN), a
Wireless Local Area Network (WLAN), a Wide Area Network (WAN), a Wireless
Wide Area Network (WWAN), a Long Term Evolution (LTE) network, a plain old
telephone service (POTS), and a Metropolitan Area Network (MAN). Additionally,
the wired channel may be selected on the basis of bandwidth criteria. For example, an
optical fiber channel may be used for a high bandwidth communication. Further, a
coaxial cable-based or Ethernet-based communication channel may be used for
moderate bandwidth communication.
[0025] The user (102A) is engaged in the chat conversation with the user (102B).
However, the user (102B) may be reluctant to participate in the chat conversation due
to difficulty in finding relevant stickers for the chat conversation with the user (102A),
or due to a lack of interest in typing text messages. For example, the user (102B) may be
engaged in some other activity (such as eating or reading) while chatting with the user
(102A), and may therefore not be interested in making much effort in the conversation. This
may affect the interest of the user (102A), and the chat conversation session between the
pair of users (102A and 102B) may end. The loss of interest in the chat conversation
session may result in less engagement with the chat applications (106A and 106B).
Consequently, the chat applications (106A and 106B) may become obsolete, such
that the users (102A and 102B) may uninstall them from their user equipments (i.e. the
user equipments 104A and 104B). In order to overcome such a scenario, a plurality of
stickers that are relevant to a context of the chat conversation and preferences of each
personal information of the users (102A and 102B), such as location information,
time information, weather information, season information, history of stickers used in
previous chat conversation, etc.
[0026] In the chat conversation, the pair of users (102A and 102B) exchange chat
content comprising a sequence of textual messages, audio messages, image or video
messages, as well as stickers. The exchanged chat content may be stored in a database
(not shown in FIG. 1) of the server system (108). Additionally, the exchange of chat
content may include the personal information of the users (102A and 102B).
[0027] In such a scenario, a plurality of stickers is recommended for each user of the
pair of users (102A and 102B). The plurality of stickers is recommended based on contextual
information of the chat conversation and preferences of the pair of users (102A and
102B). This recommendation is performed using the server system (108), which is
explained next in FIG. 2.
[0028] Referring now to FIG. 2, a block diagram 200 of the server system (108) of
FIG. 1 for generating the recommendation of a plurality of stickers is illustrated,
according to some embodiments. The server system (108) comprises one or more
processors, such as a processor (202), a memory (204), and a communication
interface (206).
[0029] The processor (202) may comprise suitable logic, circuitry, and interfaces that
may be configured to execute set of instructions stored in the memory (204) for
generating the recommendation of plurality of stickers. The processor (202) may be
embodied in a number of different ways. For example, the processor (202) may be
embodied as one or more of various hardware processing means such as a
coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a
processing element with or without an accompanying DSP, or various other
processing circuitry including integrated circuits such as, for example, an ASIC
(application specific integrated circuit), an FPGA (field programmable gate array), a
microcontroller unit (MCU), a hardware accelerator, a special-purpose computer
chip, or the like. As such, in some embodiments, the processor (202) may include one
or more processing cores configured to perform independently.
[0030] Examples of the processor (202) may be an Application-Specific Integrated
Circuit (ASIC) processor, a Complex Instruction Set Computing (CISC) processor, a
central processing unit (CPU), an Explicitly Parallel Instruction Computing (EPIC)
processor, a Very Long Instruction Word (VLIW) processor, and/or other processors
or circuits. The processor (202) may implement a number of processing technologies
known in the art, such as a reinforcement learning model, a cooperative learning model, or a
machine learning model, or the like. A multi-core processor may enable multiprocessing within a single
physical package.
[0031] Additionally or alternatively, the processor (202) may include one or more
processors configured in tandem via a bus to enable independent execution of
instructions, pipelining and/or multithreading. Additionally or alternatively, the
processor (202) may include one or more processors capable of processing large volumes
of workloads and operations to provide support for big data analysis. However, in
some cases, the processor (202) may be a processor specific device (for example, a
mobile terminal or a fixed computing device) configured to employ an embodiment
of the disclosure by further configuration of the processor (202) by instructions for
performing the algorithms and/or operations described herein.
[0032] The memory (204) may comprise suitable logic, circuitry, and interfaces that
may be configured to store the set of instructions for the recommendation of a
plurality of stickers, a machine code and/or instructions executable by the processor
(202). Additionally, the set of instructions may include program codes corresponding
to reinforcement learning techniques for recommendation of a plurality of stickers.
The memory (204) may be non-transitory and may include, for example, one or more
volatile and/or non-volatile memories. For example, the memory (204) may be an
electronic storage device (for example, a computer readable storage medium)
comprising gates configured to store data (for example, bits) that may be retrievable
by a machine (for example, a computing device like the processor 202). The memory
(204) may be configured to store information, data, content, applications,
instructions, or the like, for enabling the apparatus to carry out various functions in
accordance with an example embodiment of the present invention. For example, the
memory (204) may be configured to store information including processor
instructions for generating the recommendation of the plurality of stickers to the users
(102A and 102B) in the chat conversation. Examples of implementation of the memory
(204) may include, but are not limited to, Random Access Memory (RAM), Read
Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory
(EEPROM), Hard Disk Drive (HDD), a Solid-State Drive (SSD), a CPU cache,
and/or a Secure Digital (SD) card.
[0033] The communication interface (206) may comprise input interface and output
interface for supporting communications to and from any component with which the
server system (108) may communicate. The communication interface (206) may be
any means such as a device or circuitry embodied in either hardware or a combination
of hardware and software that is configured to receive and/or transmit data to/from a
communications device in communication with the user equipments (104A and
104B). In this regard, the communication interface (206) may include, for example,
an antenna (or multiple antennas) and supporting hardware and/or software for
enabling communications with a wireless communication network. Additionally or
alternatively, the communication interface (206) may include the circuitry for
interacting with the antenna(s) to cause transmission of signals via the antenna(s) or
to handle receipt of signals received via the antenna(s). In some environments, the
communication interface (206) may alternatively or additionally support wired
communication. As such, for example, the communication interface (206) may
include a communication modem and/or other hardware and/or software for
supporting communication via cable, digital subscriber line (DSL), universal serial
bus (USB) or other mechanisms.
[0034] The processor (202) is configured to generate, using a reinforcement learning
(RL) model, the recommendation of a plurality of stickers to the pair of users (102A
and 102B), which is provided to the corresponding chat applications (i.e. the chat
applications 106A and 106B) via the communication interface (206). In some
example embodiments, the processor (202) may receive a user state of each user, i.e.
the user (102A) and the user (102B). In one embodiment, the user state includes state
variables that correspond to a sequence of exchanged chat content between the pair of
users (102A) and (102B) in the chat conversation. The sequence of exchanged
content comprises textual messages, audio messages, and stickers. Each of the
sequence of exchanged contents is represented as the user state.
[0035] The processor (202) may further encode each of the user states into user state
vectors. The user state vectors include values or weights representing contextual
information of the chat conversation and preferences of the users based on personal
information of the pair of users (102A) and (102B). In some embodiments, the
processor (202) may facilitate recommendation of stickers to users by executing
program codes corresponding to the reinforcement learning model. In some
example embodiments, the processor (202) is configured to provide an RL agent in
each user equipment, such as the user equipment (104A and 104B), when the chat
applications (106A and 106B) are installed in the user equipments (104A and 104B)
respectively. The RL agents in the user equipments (104A and 104B) locally encode
the user states. In some alternate embodiments, the processor (202) is configured to
encode the user states globally by an RL agent.
[0036] In some embodiments, the user states are encoded into the corresponding user
state vectors using a neural-network based contextual encoder (e.g. a Recurrent
Neural Network). Further, each of the user state vectors of the users (102A and 102B)
are combined using a cooperative modeling and a Nash equilibrium technique to
provide response content that is relevant and accurate to both the users (102A and
102B). Further, the processor (202) concatenates the user state vectors with features
corresponding to a gender, an emotion, a pose, or the preferences of the users (102A
and 102B) to generate a final state vector (SA’) and (SB’) of the respective users
(102A and 102B). The processor (202) generates the recommendation of a plurality
of stickers for each user (i.e., the user 102A and 102B) based on the final state
vectors.
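As a sketch under assumed feature shapes (all names here are hypothetical), the concatenation of a user state vector with the gender, emotion, pose, and preference features into a final state vector could look like the following.

import torch

def build_final_state_vector(user_state_vector, gender_feat, emotion_feat,
                             pose_feat, preference_feat):
    # Concatenates the encoded user state with additional per-user features
    # (gender, emotion, pose, preferences) to form the final state vector.
    return torch.cat([user_state_vector, gender_feat, emotion_feat,
                      pose_feat, preference_feat], dim=-1)

# Example with arbitrary feature sizes, purely for illustration.
final_state = build_final_state_vector(torch.randn(1, 256), torch.randn(1, 2),
                                       torch.randn(1, 8), torch.randn(1, 8),
                                       torch.randn(1, 32))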
[0037] Further, each RL agent receives a reward for each sticker recommendation
action given to the corresponding user of the pair of users (102A and 102B), when
one or more stickers are selected by the user from the plurality of recommended
stickers by the RL agent. Additionally, the RL agent receives a reward for every
additional time-period, such as additional minute the pair of users (102A and 102B) is
engaged in the chat conversation. The earned rewards contribute to experiences (i.e.
RL agent experience) comprising a state, an action, a reward and a next state of a
corresponding RL agent. In an example embodiment, the user state represents a
sequence of stickers shared by the user (e.g. the user 102A) in the chat conversion,
the action corresponds to a recommendation of a plurality of stickers to the user by
the RL agent, and the next state represents the future stickers to be recommended to
the user according to the chat context. The RL agent functions according to the policy
function and the value function of the RL model associated with the RL agent. The
policy function and the value function are updated based on the RL agent experience.
Each RL agent experience is stored in an experience replay buffer in the memory
(204). In an alternate embodiment, the experience replay buffer storing the RL agent
experience may also be present in a local memory of the corresponding user
equipment. The processor (202) is also configured to perform an importance weighted
sampling for each RL agent experience. The importance weighted sampling provides
an updated value function that estimates optimal actions with respect to current user
state. The updated value function is used in updating the policy function that
determines the RL agent's behavior or actions. Further, the processor (202) is configured
to determine the end of the chat conversation when the pair of users (102A and 102B) is
not engaged in the chat conversation for more than a pre-defined threshold time-period, for example, 30 minutes. Alternatively, the processor (202) determines the
end of the chat conversation based on chat contents with messages, such as ‘bye’,
‘see you’, ‘catch you later’, ‘take care’, or the like.
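An illustrative reward shaping consistent with the description above rewards the RL agent for each recommended sticker the user selects and for every additional minute the pair of users stays engaged; the function name and numeric values are assumptions, not the claimed reward function.

def compute_reward(num_recommended_stickers_selected, extra_engagement_minutes,
                   selection_reward=1.0, engagement_reward_per_minute=0.1):
    # Reward for each selected recommended sticker plus a smaller reward for
    # every additional minute of continued engagement in the chat conversation.
    return (selection_reward * num_recommended_stickers_selected
            + engagement_reward_per_minute * extra_engagement_minutes)

# Example: the user selected 2 recommended stickers and the chat lasted 5 more minutes.
reward = compute_reward(2, 5)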
[0038] In some embodiments, the user states are encoded into the state vectors locally
by an RL agent in each of the corresponding user equipments (104A and 104B), and each
RL agent recommends stickers based on a combination of the user state vectors of both the users
using cooperative modelling and Nash equilibrium, which is described next in FIG. 3.
[0039] FIG. 3 illustrates a schematic diagram (300) of an architecture of the server
system (108) for recommending stickers using local RL agents, such as an RL agent
(302A) and another RL agent (302B) located in each of the user equipment (104A)
and the user equipment (104B) respectively, in accordance with an example
embodiment of the present invention. Each of the RL agents (302A and 302B)
corresponds to a reinforcement learning model which makes a decision of actions to
be taken in a situation, i.e. the recommendation of a plurality of stickers. In one
example scenario, whenever a user (102A) opens the chat application (106A) on the
user equipment (104A), the RL agent (302A) may obtain a recent history of chat
content from the server system (108). The recent history of chat content includes a
sequence of exchanged messages comprising stickers. Additionally, the recent history
of chat content may include personal information of the user (102A), location
information of the user (102A), time-zone information of the user (102A), or the like.
[0040] The RL agent (302A) converts the sequence of exchanged messages into a
user state for the user (102A). The exchanged messages comprise messages shared by
both the users (102A and 102B) in the chat. In some example embodiments, the user
state is encoded using a neural-network based contextual encoder (e.g. a Recurrent
Neural Network) of the RL agent (302A). More specifically, the contextual encoder
may take a semantic or vector representation of the user state and convert the
semantic representation into a user state vector (SA) of the user (102A). In a similar
manner, the RL agent (302B) encodes a user state of the user (102B) into a user state
vector (SB). The RL agent (302A) further sends the user state vector of the user
(102A) to the server system (108). The server system (108) combines additional
features associated with the user (102A) with the user state vector of the user (102A) and shares a final
user state vector (SA’) of the user (102A) with the RL agent (302B) located in the user
equipment (104B). The RL agent (302B) combines the final user state vectors (SA’)
and (SB’) of the users (102A) and (102B) based on a cooperative modeling and a
Nash equilibrium technique.
[0041] The cooperative modeling is performed to provide responses of the user
(102B) that are relevant to the chat content provided by the user (102A) in the chat
conversation. After performing the cooperative modeling, the Nash equilibrium is
applied so that the recommendation of a plurality of stickers for a follow-up response
is accurate for the chat conversation. The Nash equilibrium determines behavior or
path for the RL agent (302B) that corresponds to the required recommendation for a
plurality of stickers for the follow-up responses of the user (102B). Thus, the
cooperative modeling and the Nash equilibrium technique enable the RL agent
(302B) to recommend stickers for the user (102B) that are relevant to respond to chat
content of the user (102A), and also take into consideration the personalized
relationship between the user (102A) and the user (102B).
[0042] Further, the state vectors of each user are concatenated with features of a
gender, an emotion, a pose, or the preferences of the respective user to generate the
final state vector. The RL agent (302B) combines the final state vectors (SA’) and (SB’) of
both the users. The RL agent (302B) generates a recommendation of a plurality of
stickers for the follow-up responses of the user (102B) based on the combination of
the final state vectors. The combined final state vector (SAB) may be represented as:
[0043] SAB = W (SA’ || SB’) + b     (1)
[0044] where “W” and “b” are a learnable matrix and bias for learning dependencies between the
user state (SA’) of the user (102A) and the user state (SB’) of the user (102B), and “||” is the
concatenation operation between the encoded user states, i.e. the final state vector (SA’) of the
user (102A) and the final state vector (SB’) of the user (102B).
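A minimal sketch of equation (1) is given below, combining the two final state vectors with a learnable matrix W and bias b; the additional scoring head over a sticker catalogue and the top-k selection are assumptions added purely for illustration.

import torch
import torch.nn as nn

class StateCombiner(nn.Module):
    # Implements SAB = W(SA' || SB') + b and scores candidate stickers.
    def __init__(self, state_dim=256, num_stickers=500):
        super().__init__()
        self.combine = nn.Linear(2 * state_dim, state_dim)   # W and b of equation (1)
        self.score = nn.Linear(state_dim, num_stickers)      # hypothetical scoring head

    def forward(self, final_state_a, final_state_b):
        combined = self.combine(torch.cat([final_state_a, final_state_b], dim=-1))
        return self.score(torch.relu(combined))  # one score per sticker in the catalogue

# Example: recommend the top 5 stickers for the follow-up response of the second user.
combiner = StateCombiner()
scores = combiner(torch.randn(1, 256), torch.randn(1, 256))
top5 = torch.topk(scores, k=5, dim=-1).indices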
[0045] Accordingly, the user (102B) may select at least one sticker from the plurality
of stickers for the follow-up responses. When the user (102B) selects the stickers, the
RL agent (302B) is rewarded. Additionally, the RL agent (302B) receives an
additional reward when both the users (102A and 102B) continue to engage in the
chat conversation. The additional time spent in the chat conversation may be
considered in minutes. The reward of each of the RL agents (302A and 302B) may be
represented as reward = R(aB | SA’, SB’), where aB represents the stickers selected by
the user (102B) and SA’, SB’ represent the exchanged stickers encoded in the user
states. The RL agents (302A and
302B) may stop the process of generating the recommendation of a plurality of
stickers to the users (102A and 102B), when the chat content includes messages or
stickers of ‘bye’, ‘take care’, etc.
[0046] In an example embodiment, the policy function and the value function
associated with the RL agent are also updated, based on the rewards received by the
RL agent. The RL model may comprise a transition probability function and a reward
function. The transition probability function records the probability of transitioning
from state S1 to S2 after taking action A while obtaining reward R. The reward
function predicts the next reward triggered by one action. Based on the experiences of
the RL agent while interacting with the environment, the value function and policy
function are updated so that the RL agent may reach the optimal value function and
optimal policy function. The experience of the RL agent is defined by the action
taken and rewards gained by the RL agent, i.e. transition probability function and a
reward function. The value function and the policy function guide the RL agent to
take actions and gain rewards. Accordingly, after each experience, the value function
and the policy function are updated for each RL agent present at respective user
equipment. Further, the experiences of the RL agent are stored in an experience replay
buffer (not shown). In an example embodiment, the RL model applies an importance
weighted sampling technique for updating of the value function and the policy
function.
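As an illustrative sketch of the update step described above, the following shows a simple one-step policy-gradient update with a value baseline, where the sampling weight comes from the importance weighted sampling of the experience replay buffer; the network design, names, and hyperparameters are assumptions, not the claimed update rule.

import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    # Small actor-critic head: a policy over sticker actions plus a value estimate.
    def __init__(self, state_dim=256, num_stickers=500):
        super().__init__()
        self.policy = nn.Linear(state_dim, num_stickers)
        self.value = nn.Linear(state_dim, 1)

def policy_gradient_update(net, optimizer, state, action, reward, weight=1.0):
    # One importance-weighted policy-gradient step: 'weight' is the sampling
    # weight of the experience drawn from the replay buffer.
    logits = net.policy(state)
    value = net.value(state).squeeze(-1)
    log_prob = torch.log_softmax(logits, dim=-1)[..., action]
    advantage = reward - value.detach()
    loss = weight * (-(log_prob * advantage) + (value - reward).pow(2))
    optimizer.zero_grad()
    loss.mean().backward()
    optimizer.step()

# Example usage with an arbitrary state vector and sticker action index.
net = PolicyValueNet()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
policy_gradient_update(net, optimizer, torch.randn(1, 256), action=42, reward=1.0, weight=0.8)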
[0047] In some alternate embodiments, the user states of the pair of users (102A and
102B) may be encoded globally at the server system (108). The server system (108)
encodes the user state using a global RL agent. The global RL agent may be stored in
the memory (204) of the server system (108). In an example scenario, the server
system (108) receives chat content exchanged between the pair of users (102A and
102B) via the network (110). The RL agent accesses a sequence of recent chat contents
exchanged between the pair of users (102A and 102B). The RL agent determines user
states of the pair of users (102A and 102B) from the sequence of recent chat contents.
The user states are encoded into state vectors (i.e. (SA) and (SB)) that include values
representing contextual information of the chat conversation and personal information
of the users (102A and 102B) that provide preferences of the users (102A and 102B).
The RL agent generates the state vectors on the basis of which the recommendation
of a plurality of stickers for the chat conversation is generated for both the users
(102A and 102B).
[0048] FIG. 4 shows a method flow (400) diagram for generating a recommendation
of a plurality of stickers for the pair of users (102A and 102B) in a chat conversation,
in accordance with an example embodiment of the present invention. The method
flow (400) starts at step (402). In an example embodiment, the user (102A) initiates
the chat conversation by sending a sticker, such as ‘hi’ and continues sending a text
message of ‘I’m planning for a trip’. The user (102B) may respond with another
sticker saying ‘hi’ to the chats of the user (102A). The chat content exchanged between
the pair of users (102A and 102B) comprises a sequence of recent messages shared
by both the users (102A and 102B) over the chat. The sequence of recent chat content
(comprising stickers) exchanged between the users (102A and 102B) provides user
states of the user (102A) and the user (102B). Further, personal information that provides
preferences of the users (102A and 102B) is also extracted as feature vectors.
[0049] At step (404), the user states of the users (102A and 102B) corresponding to
the sequence of recent chat content exchanged and feature vectors corresponding to
preferences of the pair of users are generated. The user states include information
representing context of the chat conversation and personal information of the users
(102A and 102B), such as location, time-zone, weather, season, previously opted
stickers, etc.
[0050] At step (406), the user states are encoded into user state vectors by an RL
agent. In some embodiments, the user states of the users (102A and 102B) are locally
encoded into user state vectors by corresponding RL agents (302A and 302B) in the
user equipments (104A and 104B). Further, one of the RL agents, such as the RL
agent (302A) sends the user state vector (SA) of the user (102A) to the server system
(108). The server system (108) combines additional features associated with the user
(such as a gender, an emotion, a user preference, or the like), with the user state
vector of the user (102A). The server system (108) shares the final user state vector
(SA’) of the user (102A) to the RL agent (302B) in the user equipment (104B). In an
alternate embodiment, the RL agent in the server system (108) globally encodes the
user states into the state vectors (SA’) and (SB’) of the users (102A and 102B). The
user states may be encoded using a neural network based contextual encoder, such as
an RNN.
[0051] At step 408, the final state vectors (SA’) and (SB’) of the users (102A and
102B) are received by the RL agent (for example the RL agent 302B) from the server
system (108). At step (410), the final user state vectors (SA’) and (SB’) of
the pair of users (102A and 102B) are combined using a cooperative modeling and a Nash
equilibrium technique.
[0052] At (412), a recommendation of a plurality of stickers is generated based on the
combination of the final user state vectors. The generated recommendation of a
plurality of stickers is provided to the users (102A and 102B) via the corresponding
chat applications (106A and 106B) in the user equipments (104A and 104B). Each of
the users (102A and 102B) selects at least one sticker from the plurality of stickers.
[0053] At (414), rewards for the RL agent are received upon selection of the at least
one sticker from the recommendation of a plurality of stickers by each of the users
(102A and 102B). Additionally, the RL agent receives the rewards when the pair of
users (102A and 102B) continues to engage in the chat conversation. The rewards
enable the RL agent (such as the RL agent 302A and the RL agent 302B) to learn and
gain experiences (i.e. RL agent experiences). An RL agent experience comprises a
collection of state, an action, a reward and a next state of corresponding RL agent.
The collection is stored in an experience replay buffer that undergoes an importance
weighted sampling for updating a value function of each of the RL agents (302A and
302B). The value function estimates values for actions of the corresponding RL agent
that are used for updating a policy function of the RL agent. The policy function
determines policies defining behavior or actions for each of the RL agents (302A and
302B). Further, the RL agent determines end of the chat conversation when the pair
of users (102A and 102B) is engaged in the chat conversation for a pre-defined
threshold time, such as 20 mins, 30 mins and/or the like. Additionally, the RL agent
determines the end of the chat conversation when there is a chat content
corresponding to ‘bye’, ‘goodbye’, ‘see you’, etc. In some embodiments, each of the
RL agents (302A and 302B) receives the rewards when the corresponding user (such
as the user 102A and the user 102B) selects at least one sticker from the plurality of
stickers. The method (400) ends at step (414). Further, the method may comprise
appending the reward based on an extension of the time duration of the chat conversation
by a pre-defined threshold time.
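For readability, steps (402) to (414) above can be summarized in the following sketch; the object and method names are placeholders corresponding to the steps, not an actual API.

def recommend_stickers_for_chat(rl_agent_a, rl_agent_b, server,
                                chat_history_a, chat_history_b):
    # Step (404): build user states from recent chat content and preference features.
    state_a = rl_agent_a.build_user_state(chat_history_a)
    state_b = rl_agent_b.build_user_state(chat_history_b)
    # Step (406): encode user states into user state vectors (local RL agents).
    vec_a = rl_agent_a.encode(state_a)
    vec_b = rl_agent_b.encode(state_b)
    # Step (408): the server appends additional features (gender, emotion, pose,
    # preferences) and returns the final state vectors (SA') and (SB').
    final_a, final_b = server.finalize_state_vectors(vec_a, vec_b)
    # Step (410): combine the final state vectors (cooperative modeling + Nash equilibrium).
    combined = rl_agent_b.combine(final_a, final_b)
    # Step (412): generate the plurality of recommended stickers.
    recommended = rl_agent_b.recommend(combined)
    # Step (414): the caller rewards the agent when a recommended sticker is selected.
    return recommended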
[0054] The recommendation of a plurality of stickers displayed to one of the users
(e.g. the user 102A) in the pair of users (102A and 102B) in the chat conversation, is
shown in FIG. 5.
[0055] FIG. 5 shows a user interface (UI) (500) of the user equipment (104B)
displaying a recommendation of a plurality of stickers (504), in accordance with an
example embodiment of the present invention. The UI (500) displays a chat
conversation between the pair of users (102A and 102B). As shown in the UI (500),
the user (102B) receives a chat content (502) comprising a sticker of ‘hi’ and a text
message of ‘I’m planning for a trip’ along with a sticker related to travel. As
mentioned earlier, user states of the pair of users (102A and 102B) in the chat conversation are
encoded into state vectors on the basis of which the recommendation of a plurality of
stickers (504) is generated and displayed in the UI (500). The plurality of stickers
(504) corresponds to context of the chat conversation as well as preferences of the
user (102B). The user (102B) selects a sticker (506) of ‘hello’ from the plurality of
stickers (504) to respond to first part of the chat content (502). The user (102B) also
selects a sticker (508) from the plurality of stickers (504) to respond to second part of
the chat content (502). The sticker (508) indicates a message ‘Where are you going?’.
[0056] When the user (102B) selects stickers, such as the sticker (506) and the sticker
(508), corresponding RL agent that generated the plurality of stickers (504) is
rewarded. For instance, the RL agent (302B) in the user equipment (104B) is
rewarded when the stickers (506) and (508) are selected. Additionally, the RL agent
(302B) (as well as the RL agent 302A) receives rewards when the pair of users (102A
and 102B) continues to engage in the chat conversation. Further, the recommendation
process for the plurality of stickers (e.g. plurality of stickers 504) may end when the
pair of users (102A and 102B) are not engaged in the chat conversation for more than
a pre-defined time threshold, e.g. 20 minutes. In some cases, the users (102A or
102B) may send chat content comprising ‘bye’, ‘goodnight’, ‘later’, etc. In such
cases, the RL agents (302A and 302B) detect the chat content with ‘bye’,
‘goodnight’, ‘later’, or the like and cease the recommendation of the plurality of
stickers.
[0057] Various embodiments of the present invention disclose a method and a system
for recommending a plurality of stickers to a pair of users in a chat conversation. The
plurality of stickers is relevant to context of the chat conversation and preferences of
the pair of users. Each user selects at least one sticker from the plurality of stickers
with ease, while precluding the need to scroll and search through a large number of stickers.
Moreover, the pair of users is encouraged to engage in the chat conversation as the
ease of finding relevant stickers is improved.
[0058] Many modifications and other embodiments of the inventions set forth herein
will come to mind of one skilled in the art to which these inventions pertain having
the benefit of the teachings presented in the foregoing descriptions and the associated
drawing. Therefore, it is to be understood that the inventions are not to be limited to
the specific embodiments disclosed and that modifications and other embodiments
are intended to be included within the scope of the present disclosure. Moreover,
although the foregoing descriptions and the associated drawing describe example
embodiments in the context of certain example combinations of elements and/or
functions, it should be appreciated that different combinations of elements and/or
functions may be provided by alternative embodiments without departing from the
scope of the present disclosure. In this regard, for example, different combinations of
elements and/or functions than those explicitly described above are also contemplated
as may be set forth in some of the present disclosure. Although specific terms are
employed herein, they are used in a generic and descriptive sense only and not for
purposes of limitation.
We Claim:
1. A method for recommending a plurality of stickers (504) for a chat conversation,
the method comprising:
generating (404) a first user state vector of a first user (102A) and a second
user state vector of a second user (102B) of a pair of users (102A and 102B) in the
chat conversation;
generating (408) a final user state vector based on combination of the first
user state vector and the second user state vector using cooperative modeling
technique; and
generating (412) a recommendation of a plurality of stickers based on the
final user state vector.
2. The method as claimed in claim 1, further comprising encoding (406) each user
state of the first user (102A) and the second user (102B) to generate the first user
state vector and the second user state vector based on a reinforcement learning (RL)
model.
3. The method as claimed in claim 1, wherein the recommendation of the plurality
of stickers is generated according to Nash equilibrium technique.
4. The method as claimed in claim 1, wherein the user state of each user of the pair
of users (102A and 102B) corresponds to a sequence of exchanged stickers in the chat
conversation by both the users (102A and 102B).
5. The method as claimed in claim 1, further comprising combining one or more additional
features associated with each user with the respective user state vector of each user of
the pair of users (102A and 102B).
6. The method as claimed in claim 5, wherein the one or more additional features
comprise at least one of a gender, an emotion, a pose, or one or more preferences of
the users (102A and 102B).
7. The method as claimed in claim 1, further comprising determining an end of the
chat conversation based on non-sharing of a sticker for a pre-determined time period.
8. The method as claimed in claim 2, further comprising:
receiving a reward for an RL agent of the RL model based on at least one
sticker selected by the pair of users (102A and 102B) from the recommendation; and
updating the RL model based on the reward received using policy gradient
algorithms.
9. A system for recommending a plurality of stickers (504) for a chat conversation,
the system comprising:
a memory (204) configured to store instructions;
at least one processor (202) configured to execute the instructions to:
generate (404) a first user state vector of a first user (102A) and a second
user state vector associated with a second user (102B) of a pair of users (102A
and 102B) in the chat conversation;
generate (408) a final user state vector based on combination of the
first user state vector and the second user state vector using cooperative
modeling technique; and
generate (412) a recommendation of a plurality of stickers based on
the final user state vector.
10. The system as claimed in claim 9, wherein the at least one processor (202) is
further configured to encode (406) each user state of the first user (102A) and the
second user (102B) to generate the first user state vector and the second user state
vector based on a reinforcement learning (RL) model.
11. The system as claimed in claim 9, wherein the recommendation of the plurality of
stickers is generated according to Nash equilibrium technique.
12. The system as claimed in claim 9, wherein the user state of each user of the pair
of users (102A and 102B) corresponds to a sequence of exchanged stickers in the chat
conversation by both the users (102A and 102B).
13. The system as claimed in claim 9, wherein the at least one processor (202) is
further configured to combine one or more additional features associated with each
user with the respective user state of each user of the pair of users (102A and 102B);
and
wherein the one or more additional features comprise at least one of a gender,
an emotion, a pose, or preferences of the users (102A and 102B).
14. The system as claimed in claim 10, the at least one processor (202) further
configured to:
receive a reward for an RL agent of the RL model based on at least one sticker
selected by each user of the pair of users (102A and 102B) from the recommendation;
and
update the RL model based on the reward received using policy gradient
algorithms.