Abstract: Advertising channels have evolved from conventional techniques to digital advertising, where users are exposed to a sequence of advertisement campaigns via various communication channels. While advertisers revisit the design of advertising campaigns to concurrently serve the requirements emerging out of new ad channels, it is also critical for advertisers to estimate the contribution from touch-points (views, clicks, conversions) on different channels, based on the sequence of customer actions. A deep recurrent neural network architecture is implemented as a causal attribution mechanism for user-personalized multi-touch attribution (MTA) in the context of observational data. More specifically, the present disclosure implements a causal recurrent network (CRN) that minimizes selection bias in channel assignment across time-steps and touchpoints. Users' pre-conversion actions are utilized to predict per-channel attribution. [To be published with FIG. 2]
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
CAUSAL ATTENTION-BASED ESTIMATION OF PER-CHANNEL
ATTRIBUTION
Applicant
Tata Consultancy Services Limited A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
Preamble to the description
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD [001] The disclosure herein generally relates to multi-touch attribution techniques, and, more particularly, to causal attention-based estimation of per-channel attribution.
BACKGROUND [002] Advertising channels have evolved from conventional print media, billboards, and radio-advertising to online digital advertising (ad), where the users are exposed to a sequence of ad campaigns via social networks, display ads, search, etc. While advertisers revisit the design of advertising campaigns to concurrently serve the requirements emerging out of new ad channels, it is also critical for advertisers to estimate the contribution from touch-points (views, clicks, conversions) on different channels, based on the sequence of customer actions. This process of contribution measurement is often referred to as multi-touch attribution (MTA). Traditionally, in the pre-internet era, advertising was carried out through different advertising channels such as print media, radio, TV, billboards, and direct mail. However, in the internet age, with the advent of digital advertising, multichannel marketing employs a combination of offline (retail, newspapers, billboards, mail order catalogs, radio, etc.) and online (websites, display ads, social media, paid search, email, mobile) media in order to better engage with the end-users. Multichannel marketing has posed new challenges in the task of determining the per-channel conversion credits, which is the value of each customer engagement (also called a customer touchpoint).
SUMMARY [003] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one aspect, there is provided a processor implemented method for causal attention-based estimation of per-channel attribution. The method comprises receiving, via one or more hardware processors, a browser history associated with one or more
users, wherein the browser history comprises one or more touchpoints, and wherein the one or more touchpoints correspond to one or more interactions of the one or more users with information comprised in one or more channels being viewed; learning one or more latent state representations for each of the one or more touchpoints; learning one or more unbiased click representations for each of the one or more touchpoints using the one or more learned latent state representations; learning, via an attention network executed by the one or more hardware processors, a per-touchpoint attribution weight for each of the one or more touchpoints, based on the one or more unbiased click representations; predicting a per-channel attribution based on the per-touchpoint attribution weight, wherein the per-channel attribution corresponds to a conversion value of the one or more channels; and estimating a conversion of the one or more users based on the per-channel attribution and the one or more unbiased click representations for each of the one or more touchpoints.
[004] In an embodiment, the step of learning one or more unbiased click representations for each of the one or more touchpoints using the one or more learned latent state representations is based on a decorrelation of the one or more contexts of the one or more users from one or more associated channel preferences at each of the one or more touchpoints.
[005] In an embodiment, the one or more contexts comprise one or more of an operating system, a user information, a website, a channel, an advertising information, a user demographic, or a touchpoint history associated with the one or more users.
[006] In an embodiment, the one or more unbiased click representations are learnt to predict an equi-propensity of each of the one or more channels, using a first classifier.
[007] In an embodiment, the method further comprises predicting, based on the one or more unbiased click representations, a probability of a click at each of the one or more touchpoints, using a second classifier.
[008] In another aspect, there is provided a system for causal attention-based estimation of per-channel attribution. The system comprises a memory
storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: receive a browser history associated with one or more users, wherein the browser history comprises one or more touchpoints, and wherein the one or more touchpoints correspond to one or more interactions of the one or more users with information comprised in one or more channels being viewed; learn one or more latent state representations for each of the one or more touchpoints; learn one or more unbiased click representations for each of the one or more touchpoints using the one or more learned latent state representations; learn, via an attention network executed by the one or more hardware processors, a per-touchpoint attribution weight for each of the one or more touchpoints, based on the one or more unbiased click representations; predict a per-channel attribution based on the per-touchpoint attribution weight, wherein the per-channel attribution corresponds to a conversion value of the one or more channels; and estimate a conversion of the one or more users based on the per-channel attribution and the one or more unbiased click representations for each of the one or more touchpoints.
[009] In an embodiment, the one or more unbiased click representations are learnt for each of the one or more touchpoints using the one or more learned latent state representations based on a decorrelation of the one or more contexts of the one or more users from one or more associated channel preferences at each of the one or more touchpoints.
[010] In an embodiment, the one or more contexts comprise one or more of an operating system, a user information, a website, a channel, an advertising information, a user demographic, or a touchpoint history associated with the one or more users.
[011] In an embodiment, the one or more unbiased click representations are learnt to predict an equi-propensity of each of the one or more channels, using a first classifier.
[012] In an embodiment, the one or more hardware processors are further configured by the instructions to predict, based on the one or more unbiased click
representations, a probability of a click at each of the one or more touchpoints, using a second classifier.
[013] In yet another aspect, there is provided a computer program product comprising a non-transitory computer readable medium having a computer readable program embodied therein, wherein the computer readable program, when executed on a computing device, causes the computing device to perform causal attention-based estimation of per-channel attribution by receiving, via one or more hardware processors, a browser history associated with one or more users, wherein the browser history comprises one or more touchpoints, and wherein the one or more touchpoints correspond to one or more interactions of the one or more users with information comprised in one or more channels being viewed; learning one or more latent state representations for each of the one or more touchpoints; learning one or more unbiased click representations for each of the one or more touchpoints using the one or more learned latent state representations; learning, via an attention network executed by the one or more hardware processors, a per-touchpoint attribution weight for each of the one or more touchpoints, based on the one or more unbiased click representations; predicting a per-channel attribution based on the per-touchpoint attribution weight, wherein the per-channel attribution corresponds to a conversion value of the one or more channels; and estimating a conversion of the one or more users based on the per-channel attribution and the one or more unbiased click representations for each of the one or more touchpoints.
[014] In an embodiment, the step of learning one or more unbiased click representations for each of the one or more touchpoints using the one or more learned latent state representations is based on a decorrelation of the one or more contexts of the one or more users from one or more associated channel preferences at each of the one or more touchpoints.
[015] In an embodiment, the one or more contexts comprise one or more of an operating system, a user information, a website, a channel, an advertising information, or a user demographic.
[016] In an embodiment, the one or more unbiased click representations are learnt to predict an equi-propensity of each of the one or more channels, using a first classifier.
[017] In an embodiment, the computer readable program, when executed on the computing device further causes the computing device to predict, based on the one or more unbiased click representations, a probability of a click at each of the one or more touchpoints, using a second classifier.
[018] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[019] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[020] FIG. 1 illustrates typical user journeys across several channels resulting in a temporal sequence of touchpoints.
[021] FIG. 2 depicts a system for causal attention-based estimation of per-channel attribution, in accordance with an embodiment of the present disclosure.
[022] FIG. 3 depicts a block diagram architecture of a causal recurrent network (CRN) as implemented by the system of FIG. 2 for causal attention-based estimation of per-channel attribution, in accordance with an embodiment of the present disclosure.
[023] FIG. 4 depicts an exemplary flow chart illustrating a method for causal attention-based estimation of per-channel attribution, using the system of FIG. 2 and the CRN of FIG. 3 respectively, in accordance with an embodiment of the present disclosure.
[024] FIG. 5 depicts a graphical representation illustrating a comparison of Log-loss for conversion (LLconv) of the method of the present disclosure and one or more baseline approaches, in accordance with an embodiment of the present disclosure.
[025] FIG. 6 depicts a graphical representation illustrating a comparison of area under conversion ROC curve (AUC) of the method of the present disclosure and one or more baseline approaches, in accordance with an embodiment of the present disclosure.
[026] FIG. 7 depicts a box plot of attribution weights for non-convert and convert sequences, in accordance with an embodiment of the present disclosure.
[027] FIG. 8 depicts an impact of the CRN of FIG. 3 as implemented by the system of FIG. 2, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS [028] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[029] Traditionally, in the pre-internet era, advertising was carried out through different advertising channels such as print media, radio, TV, billboards, and direct mail. However, in the internet age, with the advent of digital advertising, multichannel marketing employs a combination of offline (retail, newspapers, billboards, mail order catalogs, radio, etc.) and online (websites, display ads, social media, paid search, email, mobile) media in order to better engage with the end-users. Multichannel marketing has posed new challenges in the task of determining the per-channel conversion credits, which is the value of each customer engagement (also called a customer touchpoint) that finally leads to a conversion (purchase). Understanding the value of each per-channel, per-customer touchpoint aids in fair allocation of the budget to each channel, leading to more effective acquisition of new customers.
[030] Multi-touch attribution (MTA) measures the impact of each touchpoint and its contribution towards a conversion, hence determining the value
of that specific touchpoint. Data-driven MTA was developed as advertisers started to adopt digital marketing, providing opportunities to incorporate sophisticated and accurate techniques to understand and improve advertising budget allocation. For example, consider a scenario where a user intends to purchase a new laptop. After the customer logs queries on the search engines, he sees targeted ads from specific laptop manufacturing companies. First, the customer sees a display advertisement, which he ignores. Next, he sees an advertisement on his social media (e.g., Instagram®) feed that catches his attention and takes him to the laptop manufacturer's website. Finally, the launch of a new product and a promotional offer via email with a discount code leads to a conversion. Such user journeys through several advertisement channels are also depicted in FIG. 1. More specifically, FIG. 1 illustrates typical user journeys across several channels resulting in a temporal sequence of touchpoints. In the process, user-level data such as gender, age, geography, user-level events (clicks, impressions), etc. are logged. Conventional techniques such as first-touch attribution, last-touch attribution, linear attribution, etc. use such data to measure MTA. These techniques aggregate across users and do not utilize the user-level data for MTA. In order to provide a user-personalized experience by exploiting vast amounts of data, it becomes necessary to design more sophisticated data-driven machine learning techniques.
[031] Several data-driven approaches have been proposed in the literature. In one work (e.g., J. M. Robins, M. A. Hernan, and B. Brumback, "Marginal structural models and causal inference in epidemiology," 2000.), the authors proposed a logistic regression method to predict the conversion rate with respect to advertisement occurrences. The authors of another work (e.g., Y. Zhang, Y. Wei, and J. Ren, "Multi-touch attribution in online advertising with survival theory," in 2014 IEEE International Conference on Data Mining. IEEE, 2014, pp. 687–696. – also referred to as Zhang et al.) proposed data-driven MTA with survival theory, but do not personalize MTA since they neglect user characteristics. Furthermore, yet another work (e.g., W. Ji and X. Wang, "Additional multi-touch attribution for online advertising," in Thirty-First AAAI Conference on Artificial Intelligence, 2017.) used a model-based survival technique in which the impact of ad exposures is assumed to be additive and to fade with time, and a hazard rate is employed to reflect the influence of an ad exposure. However, the above-mentioned techniques do not employ data for user-personalized attribution. More recently, deep neural network-based approaches have been proposed. In one such work (e.g., K. Ren, Y. Fang, W. Zhang, S. Liu, J. Li, Y. Zhang, Y. Yu, and J. Wang, "Learning multi-touch conversion attribution with dual-attention mechanisms for online advertising," in Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 1433–1442.), the authors proposed a sequential user behavior, sequence learning model, where they learnt the attribution from the final conversion estimation. An additional modeling constraint in the online advertising scenario is that a single user is exposed to multiple channels, and hence, the data can be interpreted as being longitudinal in nature. In another work (e.g., R. Du, Y. Zhong, H. Nair, B. Cui, and R. Shou, "Causally driven incremental multi touch attribution using a recurrent neural network," arXiv preprint arXiv:1902.00215, 2019.), the authors considered multiple touchpoints as different time-steps and employed RNNs to address MTA. They proposed a user-level model for the purchase of a brand's product as a function of the user's exposure to ads, followed by a fitted model to allocate the incremental attribution. In yet another work (e.g., Diemert Eustache, Meynet Julien, P. Galland, and D. Lefortier, "Attribution modeling increases efficiency of bidding in display advertising," in Proceedings of the AdKDD and TargetAd Workshop, KDD, Halifax, NS, Canada, August 14, 2017. ACM, 2017. – also referred to as Eustache et al.), the authors proposed an LSTM-based sequential model in conjunction with the attention mechanism to capture the contextual dependency of the touchpoints. An effective approach towards MTA is employing causal inference methods to provide interpretability to a conversion. In a further work (e.g., B. Dalessandro, C. Perlich, O. Stitelman, and F. Provost, "Causally motivated attribution for online advertising," in Proceedings of the sixth international workshop on data mining for online advertising and internet economy, 2012, pp. 1–9. – also referred to as Dalessandro et al.), the problem of attribution was posed as a causal estimation problem using Shapley values. In another work (e.g., R. Singal, O. Besbes, A. Desir, V. Goyal, and G. Iyengar, "Shapley meets uniform: An axiomatic framework for attribution in online advertising," in The World Wide Web Conference, 2019, pp. 1713–1723.), the authors developed an axiomatic framework for MTA in online advertising, by proposing a novel metric for attribution called the counterfactual adjusted Shapley value. A novel interpretable deep learning model, DeepMTA, for online multi-touch attribution was developed by combining deep learning and cooperative game theory.
[032] A counterfactual analysis in the context of multi-channel attribution was proposed by Dalessandro et al., where the impact of a channel on user conversion is measured by obtaining the difference in conversion outcomes for a user when he is exposed to the channel as compared to when he is not exposed to the channel. However, since it is not possible to obtain outcomes for both these scenarios for a given user, users are randomly assigned to either group. However, in observational studies pertaining to digital advertising, assigning users to a specific channel is not always feasible, and hence, conventional methods to obtain causal parameters for measuring MTA are biased. The main issue in observational data is the presence of confounding (also referred to as selection bias), i.e., channel assignment per user depends on the user context and hence, the set of users exposed to each channel is not random. In order to abate the effect of confounding, several statistical approaches such as sub-classification, weighting, imputations, and propensity score (PS) matching for unbiased per-individual causal estimates have been proposed. Furthermore, modern deep neural network (DNN) based approaches minimize the discrepancy between the distributions of individuals receiving different treatments, in order to emulate a randomized trial (e.g., refer F. Johansson, U. Shalit, and D. Sontag, "Learning representations for counterfactual inference," in ICML, 2016, pp. 3020–3029, and A. Sharma, G. Gupta, R. Prasad, A. Chatterjee, L. Vig, and G. Shroff, "Multimbnn: Matched and balanced causal inference with neural networks," arXiv preprint arXiv:2004.13446, 2020.).
[033] Causal inference on longitudinal observational data provides an opportunity to understand how user-behavior and buying patterns evolve as a cause-
effect relationship under different channels and ad exposures, thus leading to new tools for digital advertising. In the context of time-varying confounding, estimating the effects of time-varying channel exposures has conventionally been based on Marginal Structural Models (MSMs) and the inverse probability of treatment weighting (IPTW). Recently, one research work (e.g., I. Bica, A. M. Alaa, J. Jordon, and M. van der Schaar, "Estimating counterfactual treatment outcomes over time through adversarially balanced representations," arXiv preprint arXiv:2002.04083, 2020. – also referred to as Bica et al.) proposed the counterfactual recurrent network (CRN), which integrates domain adversarial training into a sequence-to-sequence architecture for estimating treatment effects over time. Furthermore, the CRN constructs treatment-invariant representations at each time-step, thus avoiding the association between patient history and treatment assignment. However, when applied to digital advertising, the CRN by itself does not provide information regarding per-channel conversion credits or attribution.
[034] In the present disclosure, systems and methods implement a causal attribution mechanism wherein user-personalized MTA is obtained for observational data. Particularly, a deep neural network (DNN) architecture is implemented with additional functionality to obtain per-channel attribution by employing an attention layer. Some of the technical contributions of the present disclosure and its systems and methods are as follows, and such exemplary contributions shall not be construed as limiting the scope of the present disclosure:
1. The present disclosure implements a causal recurrent network architecture for MTA, which helps to compensate for time-varying confounders (representative causes of confounders are shown by crosses in FIG. 8), further leading to a reduction of selection bias in learning touchpoint credits to conversion, i.e., attribution.
2. The present disclosure implements a hierarchical network design for the prediction of outcomes, which helps to overcome the highly skewed ratio of conversions in comparison to customer touchpoints.
3. Extensive validation of the causal recurrent network (CRN) (as implemented by the system of the present disclosure) is done in terms of prediction performance, budget allocation and interpreting users' buying behaviour.
[035] While the system of the present disclosure entails the inherent benefits of counterfactual analysis, which is to identify which part of the observed profits (due to conversions) is attributable to the impact of an advertisement, modelling the data using a recurrent neural network (RNN) aids in analysing time-variation in confounders. Furthermore, since a single channel is used per time-step, an attention layer is designed and implemented to measure attribution in terms of one attention weight per channel. The present disclosure demonstrates the efficacy of the system and method described herein on a challenging real-world Criteo dataset.
[036] Referring now to the drawings, and more particularly to FIGS. 2 through 8, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[037] FIG. 2 depicts a system 100 for causal attention-based estimation of per-channel attribution, in accordance with an embodiment of the present disclosure. In an embodiment, the system 100 includes one or more hardware processors 104, communication interface device(s) or input/output (I/O) interface(s) 106 (also referred as interface(s)), and one or more data storage devices or memory 102 operatively coupled to the one or more hardware processors 104. The one or more processors 104 may be one or more software processing components and/or hardware processors. In an embodiment, the hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is/are configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 100 can be implemented in a variety of computing
systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
[038] The I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
[039] The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, a database 108 is comprised in the memory 102, wherein the database 108 comprises browser history associated with one or more users, wherein the browser history comprises one or more touchpoints, each of the one or more touchpoints correspond to one or more interactions of the one or more users with information comprised in one or more channels being viewed.
[040] The information stored in the database 108 may further comprise one or more latent state representations learnt for each of the one or more touchpoints, one or more unbiased click representations learnt for each of the one or more touchpoints using the one or more learned latent state representations, a per-touchpoint attribution weight learnt for each of the one or more touchpoints, a per-channel attribution, and the like. The information stored in the database 108 (or memory 102) may further comprise one or more classifiers, an equi-propensity of each of the one or more channels, and a probability of a click at each of the one or more touchpoints. The information stored in the database 108 further comprises one or more contexts of the one or more users, one or more associated channel preferences at each of the one or more touchpoints, and the like. The one or more contexts comprise one or more of an operating system, a user information, a website, a channel, an advertising information, a user demographic, or a touchpoint history associated with the one or more users.
[041] In an embodiment, one or more techniques, neural networks, and the like, as known in the art are comprised in the memory 102 and invoked as per the requirement to perform the methodologies described herein. For instance, the system 100 stores a Causal Recurrent Network (CRN - as depicted in FIG. 3) in the memory 102 that is invoked for execution of the method of the present disclosure. The memory 102 further comprises (or may further comprise) information pertaining to input(s)/output(s) of each step performed by the systems and methods of the present disclosure. In other words, input(s) fed at each step and output(s) generated at each step are comprised in the memory 102 and can be utilized in further processing and analysis.
[042] FIG. 3, with reference to FIG. 2, depicts a block diagram architecture of a Causal Recurrent Network (CRN) as implemented by the system 100 of FIG. 2 for causal attention-based estimation of per-channel attribution, in accordance with an embodiment of the present disclosure.
[043] FIG. 4 depicts an exemplary flow chart illustrating a method for causal attention-based estimation of per-channel attribution, using the system 100 of FIG. 2 and the CRN of FIG. 3 respectively, in accordance with an embodiment of the present disclosure. In an embodiment, the system(s) 100 comprises one or more data storage devices or the memory 102 operatively coupled to the one or more hardware processors 104 and is configured to store instructions for execution of steps of the method by the one or more processors 104. The steps of the method of the present disclosure will now be explained with reference to components of the system 100 of FIG. 2, the block diagram of FIG. 3, and the flow diagram as depicted in FIG. 4. In an embodiment, at step 202 of the present disclosure, the one or more hardware processors 104 receive a browser history associated with one or more users, wherein the browser history comprises one or more touchpoints. More specifically, the browser history is received by the CRN via the one or more hardware processors 104, in one embodiment of the present disclosure. In an embodiment, the one or more touchpoints correspond to one or more interactions
of the one or more users with information comprised in one or more channels being viewed. The above step is elaborated by way of following description.
[044] The system and method of the present disclosure consider a user browsing dataset (browser history) in which a given user (customer) u_n interacts with T_n touchpoints. Each touchpoint has a context vector X_t^n which consists of features such as user demographics, advertisement information, website, and operating system, along with the channel c_t, the binary outcome of click or non-click z_{t+1}^n, etc. Note that the click outcome z_{t+1}^n is observed as a part of the context co-variates. Each touchpoint is delivered by one of K channels given by c_t = [c_t(1), …, c_t(k), …, c_t(K)], where each entry c_t(k) is binary, i.e., c_t(k) ∈ {0,1}. In a given time-step, each touchpoint can be represented by one channel, and hence, c_t is a one-hot vector. The sequence of touchpoints leads to a conversion (1) or non-conversion (0), leading to a binary outcome y^n. Several preliminary works in the MTA literature do not use clicks for determining the attribution of channels from touchpoints to conversion (e.g., refer "S. K. Arava, C. Dong, Z. Yan, A. Pani et al., "Deep neural net with attention for multi-channel multi-touch attribution," arXiv preprint arXiv:1809.02230, 2018." – also referred to as Arava et al.). However, in the real world, the conversion rate of a user is very small in comparison to his/her interaction with online ads. This leads to the problem of class imbalance. To abate this effect, the present disclosure and its system and method use the intermediate click outcomes to learn a hierarchical mapping from touchpoints to clicks and further to conversions to boost the estimation of sparse conversion behaviour in the CRN of FIG. 3 of the present disclosure.
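By way of a non-limiting illustration, a single user sequence from the browsing dataset may be represented programmatically as follows; the field names and the number of channels are exemplary assumptions and do not form part of the dataset specification:

    from dataclasses import dataclass, field
    from typing import List

    K = 10  # number of channels considered (exemplary)

    def one_hot(k: int, num_channels: int = K) -> List[int]:
        # One-hot channel vector c_t with exactly one entry set to 1.
        v = [0] * num_channels
        v[k] = 1
        return v

    @dataclass
    class Touchpoint:
        context: List[float]   # context covariates X_t^n (demographics, ad information, website, OS, ...)
        channel: List[int]     # one-hot channel vector c_t
        click: int             # intermediate click outcome z_{t+1}^n (1 = click, 0 = no click)

    @dataclass
    class UserSequence:
        user_id: str
        touchpoints: List[Touchpoint] = field(default_factory=list)
        converted: int = 0     # final binary outcome y^n (1 = conversion, 0 = non-conversion)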
[045] The above description can be better understood by way of the following example. Consider a scenario where a user intends to purchase a new laptop. After the customer logs a query (or queries) on the search engines, he/she sees targeted advertisements (ads) from specific laptop manufacturing companies. First, the customer sees a display advertisement, which he/she ignores. This is the first touchpoint in the user journey. Next, he/she sees an advertisement on his/her social media feed (e.g., Instagram®) that catches his/her attention and takes him/her to the laptop manufacturer's website. This is the second touchpoint and has a click outcome associated with it. Finally, the launch of a new product and a promotional offer via email with a discount code leads to a conversion. In addition to this, the present disclosure considers multi-touch observational data which has an inherent issue of selection bias due to time-varying confounders. Confounders are user contextual features that impact both the outcome and the channel preference at each touchpoint. For example, the channel preference for the current touchpoint is influenced by factors like the user's click behaviour, the user's demographics, access to channels, etc. (e.g., refer FIG. 1). These factors, along with the channel selected for the current advertisement display, have a bearing on the user's current touchpoint click action, thereby accounting for a confounding effect at each touchpoint. The present disclosure addresses the issue of time-varying confoundedness using representation learning and further learns the attribution of each channel in conversion using an attention network, as described in the sequel.
[046] Referring to the steps of FIG. 4, at step 204, the one or more hardware processors 104 learn one or more latent state representations for each of the one or more touchpoints. The above step 204 is performed by the system 100 to obtain a fixed-dimensional representation of the history/past and current user context at each touchpoint. For instance, at a first touchpoint, the latent state representation has knowledge about the customer's log queries on the search engine and current user context information such as operating system type (OS), demographics, a touchpoint history associated with the one or more users, etc. Similarly, at the second touchpoint, the latent state representation has information of the search logs and the display advertisement (ad) touchpoint and the current context of the user. At the third touchpoint, the latent state representation has information of the search logs, the display touchpoint, the social media touchpoint (e.g., Instagram® touchpoint) which the customer clicks, and the current context information. The step of learning one or more unbiased click representations for each of the one or more touchpoints using the one or more learned latent state representations is based on a decorrelation of one or more contexts of the one or more users from one or more associated channel preferences at each of the one or more touchpoints. The one or more contexts comprise, but are not limited to, one or more of an operating system, a user information, a website, a channel, an advertising information, or a user demographic (e.g., country, age, race, gender, etc.).
[047] At step 206, the one or more hardware processors 104 learn one or more unbiased click representations for each of the one or more touchpoints using the one or more learned latent state representations. At each touchpoint, the latent state representation learnt in the previous step is highly predictive of the user's channel preferences. However, this introduces bias into the data. At each touchpoint, the latent state representation is therefore made invariant to the channel assignment while still being able to predict the click outcome well at that step. Such a transformed latent state representation at each touchpoint is termed an unbiased click representation. The above steps of learning the unbiased click representations for each touchpoint based on a decorrelation of context from associated channel preferences, and learning the one or more unbiased click representations for each of the one or more touchpoints using the one or more learned latent state representations, are better understood by way of the following description:
[048] In causal inference based on representation learning, the crux of the loss function is based on decorrelating context information and treatment assignment for abating selection bias. In this context, the data is sequential since touchpoints span different time-steps in a sequential fashion. The CRN decorrelates the user's history (context) from his/her channel preferences (treatment) at each touchpoint. For this, a recurrent neural network (comprised in the memory 102) is employed and executed to learn a latent state representation of the context at each touchpoint (s_t^n), which is expressed as:
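In one exemplary, non-limiting embodiment (assuming an LSTM-style recurrent cell; the exact recurrence may vary across embodiments):

    s_t^n = \mathrm{RNN}\big( s_{t-1}^n,\; [\, X_t^n,\; c_{t-1},\; z_t^n \,] \big)

where the input at touchpoint t concatenates the current context X_t^n, the previously offered channel c_{t-1}, and the observed click outcome z_t^n.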
[049] This latent state vector (s_t^n) is an L-dimensional representation of the user's ad interaction journey consisting of the channels offered [c_1, …, c_{t-1}], the user's demographics [X_1^n, …, X_{t-1}^n, X_t^n] and the click outcomes [z_2^n, …, z_{t-1}^n, z_t^n] up to touchpoint t, and hence can be termed the latent user history (s_t^n). This latent user history impacts the click outcome z_{t+1}^n and plays a significant role in personalized ad targeting algorithms for deciding the current channel c_t^n. This preferential assignment of the channel leads to selection bias. If one proceeds without compensating for the selection bias, biased estimates of the click outcome are obtained, which further impact channel attribution and conversion prediction. Unlike conventional, unitary time-step scenarios, here the impact of biased estimation tends to extend over several time-steps, making it necessary to abate the effect of selection bias at each time-step. In order to abate this effect, a channel-invariant balanced representation r_t^n is learnt using a MinMax loss such that Φ_t: s_t^n → r_t^n, where Φ_t ∈ R^{M×L} is a representation mapping that helps to minimize confounding.
[050] In an embodiment, the one or more unbiased click representations are learnt to predict an equi-propensity of each of the one or more channels, using a first classifier. More specifically, a channel-assignment classifier comprised in the memory 102 is invoked and executed to predict an equi-propensity of each of the one or more channels using (or based on) the one or more unbiased click representations. The expression 'equi-propensity' herein refers to an equal preference for each of the one or more channels. In other words, for decorrelation of the latent state representation from the channel assignment, each latent state representation is transformed to an unbiased representation which should be indiscriminative of the channel assignment. In another embodiment, based on the one or more unbiased click representations, a probability of a click at each of the one or more touchpoints is predicted using a second classifier. More specifically, a click outcome prediction classifier comprised in the memory 102 is invoked and executed to predict the probability of a click at each of the one or more touchpoints, using (or based on) the one or more unbiased click representations. In other words, alongside, at each touchpoint, the unbiased representation should remain highly predictive of the click outcome at that touchpoint, which is ensured by the click outcome prediction classifier. The above description of predicting (i) the equi-propensity of each of the one or more channels and (ii) the probability of a click at each of the one or more touchpoints is better understood by way of the following description:
[051] The present disclosure uses two classifiers, the channel-assignment classifier (C_{t,c}) and the click outcome prediction classifier (C_{t,z}), at each of the touchpoints for the purpose of learning the balanced representation r_t^n. The channel-assignment classifier (C_{t,c}) is a 2-layer multilayer perceptron (MLP) whose K-dimensional output is fed to a softmax layer for learning the propensity of each of the K channels. Similarly, C_{t,z} is a click-prediction classifier, a 2-layer MLP whose 1-dimensional output is fed to a sigmoid layer for obtaining the probability of a click at each touchpoint.
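As a non-limiting sketch of the two classifier heads operating on the balanced representation r_t^n (the deep-learning framework, layer widths and the conditioning of the click head on the current channel c_t are illustrative assumptions and not part of the disclosed specification):

    import torch
    import torch.nn as nn

    M, H, K = 64, 32, 10  # representation size, hidden width, number of channels (illustrative)

    # Channel-assignment classifier C_{t,c}: 2-layer MLP with K-dimensional output fed to softmax.
    channel_classifier = nn.Sequential(nn.Linear(M, H), nn.ReLU(), nn.Linear(H, K))

    # Click-outcome prediction classifier C_{t,z}: 2-layer MLP with 1-dimensional output fed to sigmoid.
    click_classifier = nn.Sequential(nn.Linear(M + K, H), nn.ReLU(), nn.Linear(H, 1))

    def channel_propensity(r_t: torch.Tensor) -> torch.Tensor:
        # Propensity of each of the K channels given the balanced representation r_t.
        return torch.softmax(channel_classifier(r_t), dim=-1)

    def click_probability(r_t: torch.Tensor, c_t: torch.Tensor) -> torch.Tensor:
        # Probability of a click at the current touchpoint, given r_t and the current channel c_t.
        return torch.sigmoid(click_classifier(torch.cat([r_t, c_t], dim=-1))).squeeze(-1)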
[052] Since the objective is to obtain an unbiased representation r_t^n invariant of the channel c_t, the channel-assignment classifier should learn an equi-propensity for each of the K channels.
[053] Hence, the loss function (L_{t,c}) for the channel-assignment classifier should be maximized with respect to the learnt representation, so that the representation carries no information about the channel assignment.
[054] In addition, this representation must be an accurate predictor of the click outcome. For this, the click-prediction loss (L_{t,z}) should be minimized. Hence, a channel-invariant representation is learnt that predicts click outcomes accurately at each of the touchpoints using an overall MinMax loss over Φ = [Φ_1, ..., Φ_{T_n}], where the value of the trade-off weight λ is obtained during hyperparameter tuning of the network.
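In one exemplary, non-limiting formulation consistent with the foregoing description (the exact form of the loss terms may differ across embodiments):

    L_{t,c} = -\sum_{k=1}^{K} c_t(k)\,\log \hat{p}_t(k), \qquad
    L_{t,z} = -\big[ z_{t+1}^n \log \hat{z}_{t+1}^n + (1 - z_{t+1}^n)\log(1 - \hat{z}_{t+1}^n) \big],

    \min_{\Phi}\; \sum_{t=1}^{T_n} \big( L_{t,z} - \lambda\, L_{t,c} \big)

where \hat{p}_t(k) is the propensity of channel k predicted by C_{t,c} from r_t^n, \hat{z}_{t+1}^n is the click probability predicted by C_{t,z}, and minimizing the negated, weighted channel-assignment term amounts to maximizing L_{t,c} with respect to the representation Φ.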
[055] Referring to the steps of FIG. 4, at step 208, the one or more hardware processors 104 learn, via an attention network executed by the one or more hardware processors, a per-touchpoint attribution weight for each of the one or more touchpoints, based on the one or more unbiased click representations. At each touchpoint, the unbiased click representation is passed through the attention network to learn the per-touchpoint attribution weight. This attribution weight measures the impact of each touchpoint towards the conversion. For example, it assigns an impact value of touchpoint 1, 2, 3, and so on, towards the final conversion.
[056] At step 210, the one or more hardware processors 104 predict a per-channel attribution based on the per-touchpoint attribution weight. In an embodiment, the per-channel attribution corresponds to a conversion value of the one or more channels. The attribution weight of touchpoint 1, 2, 3, and so on corresponds to the attribution weight of the channel associated with each touchpoint. In the above example, it gives the attribution weights of the display advertisement, the social media feed/ad (e.g., Instagram® ad) and the email ad towards the final conversion. The above steps 208 and 210 of learning the per-touchpoint attribution weight for each of the one or more touchpoints, and predicting the per-channel attribution based on the per-touchpoint attribution weight, are better understood by way of the following description:
[057] A click outcome probability was obtained as mentioned above, which factors in the channel-invariant representation of the user history r_t and the channel used at the current touchpoint c_t. The issue of computing the attribution of each touchpoint to the conversion, represented by ŷ, is addressed next.
[058] The attribution of each touchpoint to the conversion is computed using an attention layer placed after the representation layer (e.g., refer FIG. 3). To compute the attribution weights, a hierarchical attention mechanism as known in the art (e.g., refer 'Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy, "Hierarchical attention networks for document classification," in Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, 2016, pp. 1480–1489.') is implemented by the system 100 of the present disclosure. The click outcome of the t-th touchpoint of the causal recurrent network, given by z_t^n, is processed through a one-layer tanh MLP to obtain the hidden representation v_t^n of the click outcome, channel and representation r_t (e.g., as depicted in FIG. 3). This can be mathematically represented as:
where w_v, b_v are the trainable parameters of the MLP. The attribution weights are then obtained as the similarity of the hidden representation v_t^n with a trainable touchpoint context vector u, normalized through a softmax function:
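In one exemplary, non-limiting form consistent with the above description, the hidden representation and the attribution (attention) weights may be written as

    v_t^n = \tanh\big( w_v \,[\, r_t^n,\; c_t,\; \hat{z}_{t+1}^n \,] + b_v \big), \qquad
    \alpha_t^n = \frac{\exp\big( u^{\top} v_t^n \big)}{\sum_{t'=1}^{T_n} \exp\big( u^{\top} v_{t'}^n \big)}

so that the weights \alpha_t^n are non-negative and sum to one over the touchpoints of a sequence.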
[059] At step 212 of the present disclosure, the one or more hardware processors 104 estimate a conversion of the one or more users based on the per-channel attribution and the one or more unbiased click representations for each of the one or more touchpoints. In other words, the system 100 or the CRN predicts whether the one or more users purchase (or buy) one or more products and/or one or more services that are being viewed via the one or more channels (e.g., channels may include, but are not limited to, a search engine, social media, a website, and the like). The above step 212 is better understood by way of the following description.
[060] To obtain the predicted conversion output, firstly the hidden state representation of the entire sequence, h^n, is obtained as the attention-weighted sum of the touchpoint representations v_t^n, given by:
The sequence representation h^n is passed through a sigmoid one-layer MLP in order to obtain the conversion prediction ŷ^n, as given by:
where b_y is a trainable parameter along with W_y. Furthermore, a binary cross-entropy loss is used for the conversion prediction of the sequence, i.e.,
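In one exemplary, non-limiting embodiment, the three expressions referenced above may be written together as

    h^n = \sum_{t=1}^{T_n} \alpha_t^n\, v_t^n, \qquad
    \hat{y}^n = \sigma\big( W_y\, h^n + b_y \big), \qquad
    L_y = -\big[ y^n \log \hat{y}^n + (1 - y^n)\log(1 - \hat{y}^n) \big].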
[061] The above per-channel attribution/conversion rate prediction and the other preceding steps of FIG. 4 may be better understood by way of the following description:
[062] The present disclosure describes the CRN architecture for multi-touch attribution as depicted in FIG. 3. The network consists of three major interconnected components, as depicted in FIG. 3. First, the latent state history (s_t^n) of the user is learned at each touchpoint, which is followed by the reduction of selection bias by employing representation learning using the MinMax loss of equation (4). This component is termed the Causal Recurrent Network, which outputs a click probability to the subsequent attention layer. The attention mechanism is used for touchpoint attribution; it learns a hidden representation v_t^n of the channel, click and context representation and uses the similarity between v_t^n and a trainable context vector u for computing the attention of each touchpoint. Finally, a one-layer MLP is employed for conversion prediction. The overall loss function for each sequence is given by:
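One possible, non-limiting form of the overall per-sequence loss, combining the conversion loss with the per-touchpoint MinMax terms described above, is

    L^n = L_y + \sum_{t=1}^{T_n} \big( L_{t,z} - \lambda\, L_{t,c} \big).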
EXPERIMENTAL SETUP:
[063] The efficacy of the system of FIG. 2 and the CRN of FIG. 3 is demonstrated on the publicly available real-world live-traffic Criteo dataset (e.g., refer 'Eustache et al.' for the mentioned dataset). The dataset and the processing methodology are discussed first, followed by the evaluation metrics, baseline approaches and implementation specifics of the system and the CRN of FIGS. 2 and 3 respectively. Dataset and Data processing:
[064] The present disclosure has used the Criteo dataset for validation of the method of the present disclosure. The dataset has more than 16 million impressions (touchpoints) over 675 campaigns. Each impression is associated with 9 categorical covariates, the context of which is masked for confidentiality purposes. Impression logs are also associated with the identity of the user and a conversion identity. Each user is associated with either a single or multiple conversion identities. Hence, such user sequences are split in such a way that each sequence has at most one conversion. Additionally, sequences which consisted of more than 20 touchpoints have been omitted since those constitute less than 0.5% of all sequences in the data. Each impression consists of information regarding the advertising campaign and the total cost incurred. The 675 advertising campaigns have been considered as channels, and 10 channels have been randomly selected for analysis. Accordingly, sequences which consisted of channels other than the selected ones have been removed. For the ease of handling and analysing data, the vocabulary size of the covariates is reduced by combining categorical features based on the word frequency distribution. First order statistics of the processed dataset used for evaluation and of the original Criteo dataset are provided in Table 1. Furthermore, the processed dataset is split 60:20:20 into training, validation, and testing sets.
Table 1 (first order statistics of the processed Criteo dataset)
                              Processed Criteo dataset    Criteo dataset
No. of users                  44,370                      6,142,256
No. of channels               10                          675
No. of sequences              46,299                      6,755,770
No. of touchpoints            82,590                      16,468,027
No. of convert sequences      2,541                       806,196
No. of click touchpoints      27,782                      5,947,563
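By way of a non-limiting illustration of the processing described above (the column names, schema and splitting criterion are exemplary assumptions and do not reflect the actual Criteo log format):

    import numpy as np
    import pandas as pd

    MAX_SEQ_LEN = 20
    NUM_CHANNELS = 10

    def preprocess(df: pd.DataFrame, seed: int = 0) -> dict:
        rng = np.random.default_rng(seed)

        # Keep only impressions from a random subset of campaigns, treated as channels.
        channels = rng.choice(df["campaign"].unique(), size=NUM_CHANNELS, replace=False)
        df = df[df["campaign"].isin(channels)]

        # Split each user's log so that every sequence has at most one conversion.
        df = df.sort_values(["user_id", "timestamp"])
        df["sequence_id"] = df["user_id"].astype(str) + "_" + df["conversion_id"].astype(str)

        # Drop sequences longer than MAX_SEQ_LEN touchpoints (< 0.5% of all sequences).
        seq_len = df.groupby("sequence_id")["timestamp"].transform("size")
        df = df[seq_len <= MAX_SEQ_LEN]

        # 60:20:20 split over sequences into training, validation and testing sets.
        seq_ids = df["sequence_id"].unique()
        rng.shuffle(seq_ids)
        n = len(seq_ids)
        split = {
            "train": seq_ids[: int(0.6 * n)],
            "val": seq_ids[int(0.6 * n): int(0.8 * n)],
            "test": seq_ids[int(0.8 * n):],
        }
        return {name: df[df["sequence_id"].isin(ids)] for name, ids in split.items()}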
Evaluation Metric
[065] The method of the present disclosure has been evaluated in two (major) parts. The first part focuses on evaluating the conversion estimation and click estimation performance in terms of log-loss and the area under the conversion ROC curve (AUC). The log-loss (LL) for conversions is given by:
Similarly, log-loss for clicks is given by:
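In one exemplary, standard and non-limiting form, the two log-losses referenced above may be written as

    LL_{conv} = -\frac{1}{N} \sum_{n=1}^{N} \big[ y^n \log \hat{y}^n + (1 - y^n)\log(1 - \hat{y}^n) \big],

    LL_{click} = -\frac{1}{\sum_{n} T_n} \sum_{n=1}^{N} \sum_{t=1}^{T_n} \big[ z_{t+1}^n \log \hat{z}_{t+1}^n + (1 - z_{t+1}^n)\log(1 - \hat{z}_{t+1}^n) \big],

where N is the number of sequences in the evaluation set.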
[066] The second part focuses on attribution-guided budget allocation performance over historical data (e.g., refer ‘K. Ren, Y. Fang, W. Zhang, S. Liu, J. Li, Y. Zhang, Y. Yu, and J. Wang, “Learning multi-touch conversion attribution with dual-attention mechanisms for online advertising,” in Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 1433-1442.’ - also referred as Ren et al.). For this, the return on investment (ROI) of each channel has been computed:
where α_t^n(k) is the attention weight, cost_t^n(k) is the monetary expenditure of the k-th channel at touchpoint t corresponding to the n-th sequence in the data sample, and Π(·) is the indicator function. Subsequently, budgets are allocated across the K channels according to:
where B is the total budget to be allocated. The above budget allocation is intuitive since it is the weighted average of ROI, implying that channels with large ROI are allotted higher budgets.
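In one exemplary, non-limiting form consistent with the above description (the exact expressions follow Ren et al. and may differ in detail), the ROI and budget allocation may be written as

    ROI(k) = \frac{\sum_{n}\sum_{t} \alpha_t^n(k)\, \Pi\{ y^n = 1 \}}{\sum_{n}\sum_{t} cost_t^n(k)}, \qquad
    B(k) = B \cdot \frac{ROI(k)}{\sum_{k'=1}^{K} ROI(k')},

where B(k) is the budget allocated to the k-th channel.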
[067] These re-allocated channel budgets are then used to traverse the testing-set impressions ordered by their serving time. If there is no budget left for the channel corresponding to an impression, the entire sequence is removed from further analysis and termed a blacklist sequence. Otherwise, the current channel's cost is subtracted from the remaining budget of the channel, and the total number of conversion sequences among the non-blacklist sequences is the number of true conversions. The cost spent on all channels is the total expenditure and is used for the computation of the attribution-guided marketing budget allocation evaluation. The detailed budget re-allocation algorithm can be found in research work known in the art (e.g., refer Ren et al.).
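A non-limiting sketch of the above evaluation procedure (variable names are illustrative; the detailed algorithm is as given in Ren et al.):

    def evaluate_budget_allocation(impressions, budget_per_channel):
        # `impressions` is an iterable of (sequence_id, channel, cost, converted) tuples,
        # already ordered by serving time; names are illustrative assumptions.
        remaining = dict(budget_per_channel)
        blacklist = set()
        spent = 0.0
        converted_sequences = set()

        for seq_id, channel, cost, converted in impressions:
            if seq_id in blacklist:
                continue
            # If no budget is left for this impression's channel, blacklist the whole sequence.
            if remaining.get(channel, 0.0) < cost:
                blacklist.add(seq_id)
                converted_sequences.discard(seq_id)
                continue
            remaining[channel] -= cost
            spent += cost
            if converted:
                converted_sequences.add(seq_id)

        true_conversions = len(converted_sequences)
        cpa = spent / true_conversions if true_conversions else float("inf")
        # CVR is then true_conversions divided by the total number of testing-set sequences.
        return cpa, true_conversions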
[068] Cost per action (CPA), which is the total expenditure normalized by the number of true conversions, and conversion rate (CVR), which is the number of true conversions averaged over the number of testing set sequences, are used as metrics for validating budget allocation using the attribution obtained from the system and the CRN of FIGS. 2 and 3 respectively. Baseline Models:
[069] The following baseline approaches have been discussed for multi-touch attribution against which the method of the present disclosure is compared:
1. Additive Hazard (AH) (e.g., refer Zhang et al.) is based on survival theory for the multi-channel attribution problem in online advertising. It considers both the impact of different levels of advertising channels and the time-decaying effect.
2. Logistic regression (LR) approach presented in X. Shao and L. Li, “Data-driven multi-touch attribution models,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, 2011, pp. 258–264. (also referred as Shao et al.), where attributions of each channel are computed using logistic regression.
3. Additional Multi-touch Attribution (AMTA) (e.g., ‘W. Ji and X. Wang, “Additional multi-touch attribution for online advertising,” in Thirty-First
computing attention of each touchpoint. Finally, a one-layer MLP is employed for conversion prediction. The overall loss function for each sequence is given by:
EXPERIMENTAL SETUP:
[063] The efficacy of the system of FIG. 2 and the CRN of FIG. 3 is demonstrated on publicly available real-world live traffic dataset Criteo (e.g., refer ‘Eustache et al.’ for the mentioned dataset). The dataset and the processing methodology are discussed, followed by evaluation metrics, baseline approaches and implementation specifics of system and CRN of FIGS. 2 and 3 respectively. Dataset and Data processing:
[064] The present disclosure has used Criteo dataset for validation of the method of the present disclosure. The dataset has more than 16 million impressions (touchpoints) over 675 campaigns. Each impression is associated with 9 categorical covariates, the context of which is masked for confidentiality purposes. Impression logs are also associated with the identity of user and conversion identity. Each user is associated with either single or multiple conversion identities. Hence, such user sequences are split in such a way that each sequence has at most one conversion. Additionally, sequences which consisted of more than 20 touchpoints have been omitted since those constitute less than 0.5% of all sequences in the data. Each impression consists of information regarding the advertising campaign and total cost incurred. 675 advertising campaigns have been considered as channels, and 10 channels have been randomly selected for analysis. Accordingly, sequences which consisted of channels other than the selected ones have been removed. For the ease of handling and analysing data, vocabulary size of covariates is reduced by combining categorical features based on the word frequency distribution. First order statistics of the processed dataset used for evaluation and original Criteo dataset is provided in Table 1. Furthermore, the processed dataset is split into 60:20:20 for training, validation, and testing sets.
Table 1 (first order statistics of processed Criteo dataset)
Processed Criteo dataset Criteo dataset
No. of user 44,370 6,142,256
No. of channels 10 675
No. of sequences 46,299 6,755,770
No. of touchpoints 82,590 16,468,027
No. of convert sequence 2,541 806,196
No. of click touchpoints 27,782 5,947,563
Evaluation Metric
[065] The method of the present disclosure has been evaluated in two (major) parts. The first part focuses on evaluating the conversion estimation and click estimation performance in terms for log-loss and area under conversion ROC curve (AUC). The log-loss (LL) for conversions is given by:
Similarly, log-loss for clicks is given by:
[066] The second part focuses on attribution-guided budget allocation performance over historical data (e.g., refer ‘K. Ren, Y. Fang, W. Zhang, S. Liu, J. Li, Y. Zhang, Y. Yu, and J. Wang, “Learning multi-touch conversion attribution with dual-attention mechanisms for online advertising,” in Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 1433-1442.’ - also referred as Ren et al.). For this, the return on investment (ROI) of each channel has been computed:
where is the attention weight, is the monetary expenditure of k-th
channel, at touchpoint t corresponding to the n-th sequence in the data sample, and (.) is the indicator function. Subsequently, budgets are allocated across K channels according to:
where B is the total budget to be allocated. The above budget allocation is intuitive since it is the weighted average of ROI, implying that channels with large ROI are allotted higher budgets.
[067] These re-allocated channel budgets are then used to traverse along testing set impressions ordered by their serving time. If there is no budget left for the channel corresponding to impression, entire sequence is removed from further analysis and termed as blacklist sequence. Current channel’s cost is subtracted from the remaining budget of the channel and total number of conversion sequences from non-blacklist sequences is the number of true conversions. The cost spend on all channels is the total expenditure and is used for the computation of attribution-guided marketing budget allocation evaluation. The detailed budget re-allocation algorithm can be found in as known in the art research work (e.g., refer Ren et al.).
[068] Cost per action (CPA) is used, which is the total expenditure normalized by the number of true conversions, and conversion rate (CVR), which is the number of true conversions averaged by number of testing set sequences as metrics for validating budget allocation using attribution obtained from system and CRN of FIGS. 2 and 3 respectively. Baseline Models:
[069] The following baseline approaches have been discussed for multi-touch attribution against which the method of the present disclosure is compared:
1. Additive Hazard (AH) (e.g., refer Zhang et al.) is based on survival theory for multi-channel attribution problem in online advertising. It considers both, the impact of different levels of advertising channels and time-decaying effect.
2. Logistic regression (LR) approach presented in X. Shao and L. Li, “Data-driven multi-touch attribution models,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, 2011, pp. 258–264. (also referred as Shao et al.), where attributions of each channel are computed using logistic regression.
3. Additional Multi-touch Attribution (AMTA) (e.g., refer ‘W. Ji and X. Wang, “Additional multi-touch attribution for online advertising,” in Thirty-First AAAI Conference on Artificial Intelligence, 2017.’ – also referred to as Ji et al.), which employs mathematical tools from survival analysis. Here, the hazard rate is used to measure the influence of online advertisement exposure, based on the assumption that the effect of an advertisement exposure fades with time and that the browsing paths of users are additive.
4. Deep Neural Net with Attention Multi-touch Attribution (DNAMTA) (e.g., refer Eustache et al.) is a deep neural network incorporating an attention mechanism in addition to an LSTM network.
5. Dual-Attention Recurrent Neural Network (DARNN) (e.g., refer Ren et al.) uses dual-attention RNNs, one for impression-level data and another for clickstream data, in order to calculate the effective conversion attribution.
[070] An NVIDIA GK110BL [Tesla K40c] GPU was used by the system and method of the present disclosure for training and for hyperparameter optimization of all baseline approaches.
Implementation Specifics:
[071] An embedding representation (i.e., a continuous representation) of each of the categorical covariates was obtained prior to the first layer of the CRN of FIG. 3. Weights of the embedding layer along with the CRN weights are learnt using the loss function as given in equation (10). Note that the hyperparameters are selected based on the overall loss function given in equation (10) on the validation dataset. It is to be understood by a person having ordinary skill in the art or person skilled in the art that the selection of hyperparameters may be empirically derived or pre-defined, and the hyperparameter search space would vary based on implementation type and scenarios under consideration, and such examples of hyperparameter selection shall not be construed as limiting the scope of the present disclosure.
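A minimal sketch of the embedding step for the categorical covariates is given below, assuming PyTorch and placeholder vocabulary sizes and embedding dimension; the CRN layers and the loss of equation (10) are not reproduced here.

```python
import torch
import torch.nn as nn

class CovariateEmbedding(nn.Module):
    """Maps the 9 categorical covariates of a touchpoint to continuous vectors
    that are concatenated and fed to the first layer of the recurrent network.
    Vocabulary sizes and the embedding dimension are placeholders."""
    def __init__(self, vocab_sizes, emb_dim=16):
        super().__init__()
        self.embeddings = nn.ModuleList([nn.Embedding(v, emb_dim) for v in vocab_sizes])

    def forward(self, cats):                      # cats: (batch, seq_len, n_covariates)
        parts = [emb(cats[..., i]) for i, emb in enumerate(self.embeddings)]
        return torch.cat(parts, dim=-1)           # (batch, seq_len, n_covariates * emb_dim)

# Example: 9 covariates, each with a (reduced) vocabulary of 50 categories.
layer = CovariateEmbedding([50] * 9)
x = torch.randint(0, 50, (4, 20, 9))              # batch of 4 sequences, 20 touchpoints
print(layer(x).shape)                             # torch.Size([4, 20, 144])
```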
EXPERIMENTAL RESULTS:
[072] The experimental analysis of the method of the present disclosure has been demonstrated on the real-world Criteo dataset. Experimental evaluation has been divided into two parts:
1. Analysis of the method of the present disclosure’s prediction performance
2. Analysis of attribution guided budget allocation
[073] Subsequently, the nature of the attribution weights has been analyzed in convert and non-convert sequences, along with the channel attribution weights based on users’ buying behaviour.
Prediction performance analysis:
[074] Log-loss (for clicks and conversions) and AUC have been used as the metrics to evaluate prediction performance. It was observed that the method of the present disclosure achieved a high AUC of 0.9591, and very low values of click and conversion log-loss. An empirical analysis of the baseline approaches as compared to the method of the present disclosure is shown in Table 2.
Table 2 (CPA, CVR, and number of true conversions for budget re-allocation; values in each cell are listed in the order of budget proportions 1, 0.8, 0.6, 0.4, and 0.2 of the total testing-set budget)
| Approach | CPA | CVR | No. of true conversions |
|---|---|---|---|
| DARNN | 49.92, 48.7, 48.2, 68.4, 51.3 | 0.0315, 0.0314, 0.0319, 0.0209, 0.0252 | - |
| LR | 53.6, 49.9, 54.8, 68.7, 68.5 | 0.0305, 0.0319, 0.029, 0.0215, 0.0198 | - |
| AMTA | 74.2, 72.7, 72.5, 36.8, 36.8 | 0.0334, 0.0217, 0.0217, 0.0262, 0.0364 | - |
| AH | 32.3, 31.32, 29.63, 29.96, 29.96 | 0.0514, 0.0522, 0.0535, 0.0453, 0.0576 | - |
| DNAMTA | 9.59, 9.58, 9.53, 9.54, 14.31 | 0.1346, 0.1233, 0.1209, 0.0803, 0.0422 | 186, 163, 130, 17, 19 |
| Method of the present disclosure | 9.51, 9.56, 9.44, 9.41, 16.04 | 0.1358, 0.124, 0.1042, 0.0814, 0.0374 | 190, 170, 138, -, - |
[075] It is observed from the above Table 2 that the method of the present disclosure outperforms the baseline approaches by huge margins. In addition, the method of the present disclosure (λ = 0), i.e., the network of the system 100 without accounting for confounders, underperforms as compared to the CRN/method of the present disclosure (λ > 0), hence highlighting the importance of compensating for the selection bias due to confounding context variables. Furthermore, these results also point to the significance of using a hierarchical network of impression-click-conversion prediction, which helps to overcome the skewed representation of conversions in the data. Note that the click is used as an input covariate in all other baselines except for the CRN/method of the present disclosure (CAMTA) and DARNN, which predict clicks and use them further for the prediction of conversions and attribution.
[076] FIG. 5 and FIG. 6 compare the log-loss and AUC performance of the method of the present disclosure with the baseline schemes across epochs. It is observed that the method of the present disclosure consistently beats state-of-the-art approaches by large margins, for different metrics related to conversion prediction. More specifically, FIG. 5, with reference to FIGS. 2 through 4, depicts a graphical representation illustrating a comparison of the log-loss for conversion of the method of the present disclosure and one or more baseline approaches, in accordance with an embodiment of the present disclosure. FIG. 6, with reference to FIGS. 2 through 5, depicts a graphical representation illustrating a comparison of the area under the conversion ROC curve (AUC) of the method of the present disclosure and one or more baseline approaches, in accordance with an embodiment of the present disclosure.
Attribution weights comparison:
[077] In addition, box plots have been employed to analyze the attribution weights of convert and non-convert sequences from the testing data for all touchpoints. These box plots are shown in FIG. 7, where touchpoints are represented on the x-axis, while the y-axis represents the attribution weight for each touchpoint. More specifically, FIG. 7, with reference to FIGS. 2 through 5, depicts a box plot of attribution weights for non-convert and convert sequences, in accordance with an embodiment of the present disclosure. Box plots are a standard method to depict the data distribution based on five numbers: quartile 1, quartile 2 (median), quartile 3, minimum, and maximum. The minimum and maximum refer to quartile 1 - 1.5 × (quartile 3 - quartile 1) and quartile 3 + 1.5 × (quartile 3 - quartile 1), respectively. The dots are the outliers beyond the maximum and minimum values of the attribution weight for each touchpoint.
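For reference, per-touchpoint box plots of this kind can be produced as in the following sketch; matplotlib's default whiskers already follow the 1.5 × (quartile 3 - quartile 1) rule, and the attribution-weight arrays below are synthetic placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder attribution weights: one array per touchpoint position.
rng = np.random.default_rng(0)
weights_per_touchpoint = [rng.beta(1, 8, size=200) for _ in range(20)]

fig, ax = plt.subplots()
ax.boxplot(weights_per_touchpoint, whis=1.5)   # whiskers at Q1 - 1.5*IQR and Q3 + 1.5*IQR
ax.set_xlabel("Touchpoint")
ax.set_ylabel("Attribution weight")
plt.show()
```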
[078] From FIG. 7, it is observed that the median attribution weight of non-convert sequences is very close to 0 and the distribution across touchpoints is uniform, with quartile 3 for all touchpoints < 0.15. However, the attribution weights of convert sequences have a non-uniform distribution across touchpoints, with touchpoints 3, 7, 10, and 13 having high values of quartile 3 and touchpoints 7, 9, 10, 12, and 13 having high values for quartiles 1 and 2, clearly pointing to the discriminative capability of the method of the present disclosure for attribution weights of convert and non-convert sequences.
Attribution guided budget allocation:
[079] The method of the present disclosure has been further analyzed for attribution-guided budget allocation. Here, different proportions of the total testing set budget have been used to evaluate the performance of the budget re-allocation algorithm. The attention weights learned by the method of the present disclosure have been used for computing the per-channel ROI (equation (13)) in the implementation of the system and method of the present disclosure. It is to be noted that the cost values are scaled to a very small value in the Criteo dataset. To highlight the difference of CPA across baselines, the Criteo data cost has been scaled by 1000. It is seen that if budget allocation guided by the attribution of the method of the present disclosure is used, CPA is the least and CVR, which is the conversion rate, is the highest for the method of the present disclosure for 0.4, 0.6, 0.8, and 1 proportions of the total budget. Hence, for attribution-guided budget allocation, the method of the present disclosure outperforms the baseline approaches for higher budgets. For a smaller budget, the prior research work (e.g., refer ‘Arava et al.’) and the method of the present disclosure are equivalent.
[080] It is to be pointed out that this methodology of evaluating budget allocation is an approximate technique in the absence of a real-time environment for simulating user, context, channel, cost, and outcome. In addition, the system and method of the present disclosure have been trained for learning the best attribution weights for the prediction of (non-)converts and (non-)clicks.
User-Behaviour using attribution:
[081] Channel attribution weights have been analysed for different types of users categorized based on their return to the advertiser. First, the return of each impression is computed from the attention weight of the impression, the money spent on the impression, and the probability of conversion as obtained using the method of the present disclosure. The average over a user’s touchpoints is then computed to obtain the user’s return to the advertiser. Based on this, users are categorized as low-, medium-, and high-return users (using 3-means clustering), constituting 93.26%, 5.88% and 0.90% of the testing set user population, respectively.
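A hedged sketch of this user grouping is given below. The per-impression return is assumed here to be the attention weight multiplied by the predicted conversion probability and divided by the money spent, which is one plausible reading of paragraph [081] rather than the exact formula of the present disclosure; the array names are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

def user_return(attn, cost, p_conv, user_ids):
    """Per-impression return (assumed form: attention * conversion probability
    / spend), averaged over each user's touchpoints."""
    ret = attn * p_conv / np.maximum(cost, 1e-12)
    users = np.unique(user_ids)
    avg = np.array([ret[user_ids == u].mean() for u in users])
    return users, avg

def group_users(avg_return, seed=0):
    """3-means clustering into low-, medium- and high-return user groups."""
    km = KMeans(n_clusters=3, n_init=10, random_state=seed).fit(avg_return.reshape(-1, 1))
    order = np.argsort(km.cluster_centers_.ravel())   # sort clusters: low < medium < high
    relabel = np.empty(3, dtype=int)
    relabel[order] = np.arange(3)
    return relabel[km.labels_]                        # 0 = low, 1 = medium, 2 = high return
```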
[082] Further, the user-level weighted average of the attention (attribution) weights was analysed for each of the K channels using box plots for the low-, medium-, and high-return user groups (not shown in FIGS). It was observed from the analysis that, for the low-return user group, the channel attribution weights are consistently low for all channels. Hence, it can be concluded that low-return users do not have a high affinity for any of the channels. Medium- and high-return users have a high propensity for all channels, while high-return users are particularly more prone to buying using channels 1 to 6.
[083] FIG. 8, with reference to FIGS. 2 through 7, depicts an impact of the CRN of FIG. 3 as implemented by the system 100 of FIG. 2 for causal attention-based estimation of per-channel attribution, in accordance with an embodiment of the present disclosure. More specifically, FIG. 8 illustrates the temporal confounding effects, where x1, x2 are the user-context covariates affecting channels c1, c2 and outcomes y1, y2 at touchpoints T1, T2 of a user journey. The crosses (or cross marks) indicate the bias-compensating nature of the CRN of FIG. 3.
[084] The present disclosure provides a system and method that implement a DNN based MTA framework by employing a causal recurrent network for abating selection bias, followed by an attention layer for computing the attribution weights, and a single layer MLP for predicting conversions. In the context of the challenging real-world Criteo dataset, it is shown that the method of the present disclosure is able to outperform several state-of-the-art baselines such as DARNN, DNAMTA, AMTA, etc. in terms of prediction accuracy (log-loss and AUC). Experiments have been presented towards interpreting the per-channel attribution weights. Using box plots, it has been observed that the median attribution weights of non-convert sequences are very close to 0 and their distribution across touchpoints is uniform, while the attribution weights of convert sequences have a non-uniform distribution across touchpoints, highlighting the discriminative power of the method of the present disclosure with respect to convert and non-convert sequences.
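As a high-level illustration of the described pipeline (a recurrent encoder over touchpoints, an attention layer producing attribution weights, and a single-layer MLP conversion head), a simplified PyTorch sketch is given below; layer sizes are placeholders and the causal/selection-bias compensation of the CRN is intentionally omitted.

```python
import torch
import torch.nn as nn

class AttributionNet(nn.Module):
    """Simplified skeleton: RNN over touchpoint features, attention weights
    per touchpoint (the attributions), and an MLP head for conversion.
    The confounder-balancing terms of the CRN are not modelled here."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)          # per-touchpoint attention score
        self.conv_head = nn.Linear(hidden, 1)     # single-layer MLP for conversion

    def forward(self, x):                         # x: (batch, seq_len, in_dim)
        h, _ = self.rnn(x)                        # latent state per touchpoint
        a = torch.softmax(self.attn(h).squeeze(-1), dim=1)   # attribution weights
        ctx = torch.sum(a.unsqueeze(-1) * h, dim=1)          # attention-pooled context
        p_conv = torch.sigmoid(self.conv_head(ctx)).squeeze(-1)
        return a, p_conv

model = AttributionNet(in_dim=144)
attr, p = model(torch.randn(4, 20, 144))
print(attr.shape, p.shape)                        # torch.Size([4, 20]) torch.Size([4])
```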
[085] Furthermore, insights into budget allocation are given using the attribution weights obtained using the method of the present disclosure. Perhaps one of the most notable achievements of the system and method of the present disclosure is user behavior modelling. Here, it has been shown that there are considerable behavioral differences observable between the low-return and medium/high-return user groups, as measured using the channel attribution weights obtained using the system and method of the present disclosure. While attribution weights remained low and uniform across all channels for low-return users, indicating low affinity for any of the channels, medium- and high-return users displayed a high propensity for all channels. Furthermore, high-return users were more prone to buying via channels 1 to 6.
[086] Therefore, temporal modeling of the MTA problem, coupled with compensation for selection bias, helps the attention layer (or attention mechanism) in the system of FIG. 2 (not shown in FIG. 2) and the CRN of FIG. 3 to derive reliable inferences on the buying patterns and user behavior. Furthermore, the system and method of the present disclosure provide an unbiased interpretation angle to the attribution weights obtained from the DNN architecture, which was not available in earlier research works (e.g., refer Ren et al.).
[087] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[088] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[089] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[090] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[091] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[092] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
We Claim:
1. A processor implemented method for causal attention-based estimation of per-channel attribution, comprising:
receiving, via one or more hardware processors, a browser history associated with one or more users, wherein the browser history comprises one or more touchpoints, and wherein the one or more touchpoints correspond to one or more interactions of the one or more users with information comprised in one or more channels being viewed (202);
learning, via the one or more hardware processors, one or more latent state representations for each of the one or more touchpoints (204);
learning, via the one or more hardware processors, one or more unbiased click representations for each of the one or more touchpoints using the one or more learned latent state representations (206);
learning, via an attention network executed by the one or more hardware processors, a per-touchpoint attribution weight for each of the one or more touchpoints, based on the one or more unbiased click representations (208);
predicting, via the one or more hardware processors, a per-channel attribution based on the per-touchpoint attribution weight, wherein the per-channel attribution corresponds to a conversion value of the one or more channels (210); and
estimating, via the one or more hardware processors, a conversion of the one or more users based on the per-channel attribution and the one or more unbiased click representations for each of the one or more touchpoints (212).
2. The processor implemented method of claim 1, wherein the step of learning one or more unbiased click representations for each of the one or more touchpoints using the one or more learned latent state representations is based on a decorrelation of one or more contexts of the one or more users from one or more associated channel preferences at each of the one or more touchpoints.
3. The processor implemented method of claim 2, wherein the one or more contexts comprise one or more of an operating system, a user information, a website, a channel, an advertising information, a user demographic, or a touchpoint history associated with the one or more users.
4. The processor implemented method of claim 1, further comprising predicting, based on the one or more unbiased click representations, an equi-propensity of each of the one or more channels, using a first classifier.
5. The processor implemented method of claim 1, further comprising predicting, based on the one or more unbiased click representations, a probability of a click at each of the one or more touchpoints, using a second classifier.
6. A system (100) for causal attention-based estimation of per-channel attribution, comprising:
a memory (102) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
receive a browser history associated with one or more users, wherein the browser history comprises one or more touchpoints, and wherein the one or more touchpoints correspond to one or more interactions of the one or more users with information comprised in one or more channels being viewed;
learn one or more latent state representations for each of the one or more touchpoints;
learn one or more unbiased click representations for each of the one or more touchpoints using the one or more learned latent state representations;
learn, via an attention network executed by the one or more hardware processors, a per-touchpoint attribution weight for each of the one or more touchpoints, based on the one or more unbiased click representations;
predict a per-channel attribution based on the per-touchpoint attribution weight, wherein the per-channel attribution corresponds to a conversion value of the one or more channels; and
estimate a conversion of the one or more users based on the per-channel attribution and the one or more unbiased click representations for each of the one or more touchpoints.
7. The system of claim 6, wherein the one or more unbiased click representations are learnt for each of the one or more touchpoints using the one or more learned latent state representations based on a decorrelation of one or more contexts of the one or more users from one or more associated channel preferences at each of the one or more touchpoints.
8. The system of claim 7, wherein the one or more contexts comprise one or more of an operating system, a user information, a website, a channel, an advertising information, a user demographic, or a touchpoint history associated with the one or more users.
9. The system of claim 6, wherein the one or more hardware processors are further configured by the instructions to predict, based on the one or more unbiased click representations, an equi-propensity of each of the one or more channels, using a first classifier.
10. The system of claim 6, wherein the one or more hardware processors are further configured by the instructions to predict, based on the one or more unbiased click representations, a probability of a click at each of the one or more touchpoints, using a second classifier.