Clustering Based Association Of Market Basket

< Back

Clustering Based Association Of Market Basket

Abstract: Systems and methods for clustering based association rules generator for market basket analysis are described herein. In one implementation, a clustering based association rule generator for market basket analysis (102) includes a density logic handler (216) and an events logic handler (218) to prepare a predefined table of clustered transactions which need to be analyzed for association rules. The predefined table is further optimized for association rule generation using other modules of the clustering based association rule generator for market basket analysis (102). Further, a cache calculator (214) to shorten the association rule generation time by shifting the association rule generation processing to the memory completely when a predefined threshold is reached. Further, an association rule generator (232) which uses a predefined set of analytical measures to classify the association rules such that the risk of finding spurious association rules is mitigated.

Get Free WhatsApp Updates!
Notices, Deadlines & Correspondence

Patent Information

Application #

Filing Date

30 September 2011

Publication Number

18/2014

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Patent Number

Legal Status

Grant Date

2020-07-13

Renewal Date

Applicants

TATA CONSULTANCY SERVICES LIMITED

Nirmal Building 9th Floor Nariman Point Mumbai Maharashtra

Inventors

1. MOHANTY Santosh Kumar

Tata Consultancy Services limited Gateway Park Akruti Business Port Road No. 13 MIDC Andheri Mumbai 400093 Maharashtra

2. KHOLKAR Dinanath

Tata Consultancy Services Nirlon Knowledge Park B3 Nirlon Complex Goregaon East Mumbai 400 063 Maharashtra

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See section 10, rule 13)
/. Title of the invention'. Clustering based Association of Market Basket
2. Applican(s)
NAME NATIONALITY ADDRESS
TATA CONSULTANCY INDIAN Nirmal Building, 9th Floor, Nariman Point, SERVICES LIMITED Mumbai, 400021 Maharashtra India
3. Preamble to the description
COMPLETE SPECIFICATION
The following specification particularly describes the invention and the manner in which it
is to be performed.

FIELD OF THE INVENTION
The present invention relates to the field of data mining or knowledge discovery in databases. More particularly, this invention relates to the process and association rules for determining significant and potentially hidden patterns and extracting useful information that exists in a very large database.
BACKGROUND OF THE INVENTION
Data mining or knowledge discovery in databases (KDD) is a process of determining significant and potentially hidden patterns and extracting useful information that exists in a very large database. In data mining, association rule determination is a popular and well researched method for determining interesting relations between variables in large databases. Such information can be used as the basis for decisions about marketing activities such as, e.g., promotional pricing or product placements. In addition to the above example from market basket analysis, association rules are employed today in many application areas including Web usage mining, intrusion detection and bioinformatics.
Over time a number of algorithms have been used for association rule generation like Apriori algorithm, Eclat algorithm, frequent pattern growth (FP-growth) algorithm, GUHA procedure ASSOC etc. For example in Apriori algorithm for association rule mining, given a set of item sets (for instance, sets of retail transactions, each listing individual items purchased), the Apriori algorithm attempts to find subsets which are common to at least a minimum number 'C of the item sets. The Apriori algorithm uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against a transactional data having the item sets. The algorithm terminates when no further successful extensions are found.
These algorithms were quite efficient, however when the size of data to be analyzed is very large, the time taken and computational resources needed to generate association rules became significantly large. Also, with the large volume of data to be analyzed and the large number of

the association rules hence generated, increased the risk of finding many spurious associations, i.e., associations which are valid based on the generated association rules but misleading from business decision perspective.
SUMMARY
This summary is provided to introduce concepts and methods for clustering based association rule generation for market basket analysis, and the concepts are further described in the detailed description. This summary is neither intended to identify essential features of the claimed subject matter nor is intended for use in determining or limiting the scope of the claimed subject matter.
A system and a method for clustering based association rule generation for market basket analysis are provided herein. In one implementation, a transactions list of transactions to be analyzed for association rules are populated in a predefined table in a database, after which they are sorted and pruned to remove transactions having only one item. This predefined table is then used to generate association rules using a predefined list of analytical measures and efficient computational method. Also, as a means to optimize the processing time there is a mechanism built in the system to shift the association rule generation processing to the memory completely when a predefined threshold is reached.
The principle objective of the present invention is to provide a system and method for mining
associations rules in large databases.
Another significant objective of the invention is to introduce additional analytical measures for
generating valid association rules that will lead to right business decision.
Another significant objective of the invention is to introduce an events based approach for
generating association rules applicable to specific events. This brings the sharpness of the
association rule through transactional data segmentation and more meaningful business decision.
Another significant objective of the invention is to introduce a density based approach for
generating association rules to significantly optimize the computing time while generating
association rules.
BRIEF DESCRIPTION OF DRAWINGS

Further objects, embodiments, features and advantages of the present invention will be more apparent when read together with the detailed description and the accompanied drawings. The objective of the drawings included herein is to lay more emphasis on understanding the underlying principle of the invention. The manner in which the drawings are presented in no way limit the scope of the invention and the advantages one can garner from the embodiments of the present invention.
The detailed description is described with reference to the accompanying figures. In the figures,
the left-most digit(s) of a reference number identifies the figure in which the reference number
first appears. The same numbers are used throughout the drawings to reference like features and
components.
Drawing 1: Illustrates an example of analysis of transactions.
Drawing 2: Illustrates an exemplary network environment implementing a clustering based
association rule generator for market basket analysis, in accordance with an implementation of
the present subject matter.
Drawing 3: Illustrates an exemplary clustering based association rule generator for market basket
analysis, in accordance with an embodiment of the present subject matter.
Drawing 4: Illustrates an exemplary method for clustering based association rule generation for
market basket analysis in an organization, according to an embodiment of the present subject
matter.
DETAILED DESCRIPTION
To better understand the context of the invention, a brief overview of typical association rules and their derivation is now provided. Let T be a set of all items in a database, for example, a database of goods of a commercial entity, such as a retail store. Let 'T' be a set of all sales transactions taking place in the commercial entity. Each transaction lists the set of items presents in it and hence, represents a nonempty subset of T. For example, in retail store a purchase made by a customer having one or more products can be considered as a transaction and the list of all the products in the store would form the T.

Any nonempty subset of T is called an itemset. Let 'A' and 'B' are the two nonempty itemsets of T. The following is a list of basic set definitions:
~A : Not of A.
A-B : The set of elements which are present in the set A but not in the set B.
AUB : The set of all elements present in the set A or the set B.
Uk Ak : A1 U A2 U... U Ak where Aj , 1 B, where A and B are disjoint and nonempty itemsets. The itemset A is called an antecedent and the itemset B is called a consequent of the association. The acceptance and strength of the association rules is determined using various analytical measures, such as support, confidence, and interest. The support of an association rule is defined as P(A∩B). It is denoted by supp(A=>B). The user specified minimum support is denoted by min_supp. In general, if 'S' is a nonempty subset of T having elements X1; X2, ..., Xm the support of 'S', denoted by supp(S), is defined as P(X1 ∩ X2 ∩...∩ Xm). The percentage of transactions in which all items involved in the association rule are present is given by the support of the association rule. An association rule with little support is usually of less significance to the user and only those association rules which are above the threshold support are accepted.
The confidence of an association rule is defined as P(B/A). It is denoted by conf(A=>B). The user specified minimum confidence is denoted by min_conf. The confidence of the association

rule is a measure of the chance of occurrence of'B' when 'A' has occurred. Thus it is a measure of how strong the implication is and association rules which have its confidence less than the threshold confidence are omitted. The value of min_conf should be greater or equal to the minsupp. Therefore only those association rules which are above the threshold support are generated.
The interest of an association rule is defined as P(B/A) / P(B). It is denoted by interest(A=>B). It measures by what multiple is the chance of occurrence of 'B' enhanced when 'A' occurs. Clearly, interest < 1 is not acceptable. Hence, such association rules are rejected.
The conviction of an association rule is defined as 1 / (P(~B/A) / P(~B)). It is denoted by conv(A=>B). It is the reciprocal of the multiple by which the chance of non-occurrence of B is enhanced when A occurs. It is desirable that this reciprocal of the multiple be as high as possible and values of the conviction < 1 is not acceptable. Hence, an association rule with conviction < 1 is rejected and higher the conviction, more robust is the association rule.
The different analytical measures defined till now, which are support, confidence, interest and conviction have been in use since the last decade but the association rules generated have known issues in terms of generation of spurious association rules. So there is a need to prune misleading association rules and generate valid association rules at each level of association. For this end the present subject matter introduces new analytical measures namely alternate confidence and valence and a decision matrix to decide valid association rules.
The alternate confidence is one of the new analytical measures, which is defined as P(B/~A). It is denoted by aconf(A=>B). The alternate confidence of an association rule is a measure of the chance of occurrence of'B' when 'A' does not occur. This measure is necessary to determine the robustness of the association rule. The alternate confidence of an association rule being equal or close to its confidence means the occurrence of 'B' is independent of the occurrence of 'A'. If the alternate confidence of the association rule is substantially greater than its confidence, then the implication is in the opposite direction meaning 'B' occurs when 'A' does not occur; whereas if it is substantially less than its confidence, then it confirms that the implication is robust.

The valence is another new analytical measure and is defined as the difference between the confidence and the alternate confidence (valence(A,B) = P(B/A) - P(B/~A)). The user specified minimum valence is denoted by minvalence (0 < minvalence < 1). The computational principles for valence and interest also states that 'valence < 0' if and only if 'interest <1'. Both analytical measures are important to determine the validity of the association rule that is defined in the decision matrix.
To bring a structure to the association rule selection process, a decision matrix is introduced wherein only associations whose support > min_supp and confidence > minconf are generated. In one implementation, the decision matrix can be represented in a tabular format as below:

Conditional Logic
valence < min valence interest < 1 conv < 1 conv < interest Decision
Y Y/N Y/N Y/N Reject
Y/N Y Y/N Y/N Reject
Y/N Y/N Y Y/N Reject
N N N Y Weakly Accept
N N N N Accept
The analytical measures described above can be further elucidated using an example of a market basket analysis for a supermarket where biscuits and chocolate are sold. Let us examine the association rule biscuits=>chocolates. Let minsupp be 30%, min_conf be 50% and minvalence is 0.15. The analysis of the transactions is illustrated in Fig 1:
As represented in the diagram, 65% of the transactions contain biscuits, 70% contain chocolates and 50%o contain both biscuits and chocolates. Hence the support (50%) of the association rule is greater than the min_supp. The confidence of the association rule is 50/65 or 0.77 {11%), which is greater than the minconf. The alternate confidence of the association rule is (70-50) / (100-65) or 0.57. The valence of the rule is 0.2 (0.77 - 0.57), which is greater than the minvalence. The interest measure of the rule is 0.50 / (.65*0.70) or 1.10. The conviction of the rule is (1.0 -0.70)/(0.15/0.65) or 1.3. Since valence > min_valence, interest > 1, conviction > 1, and conviction > interest, the association rule generated is categorized as 'Accept'.
In the present subject matter, systems and methods for a clustering based association rule generator for market basket analysis are proposed to overcome the problems associated with

prior methods and systems. In an embodiment of the proposed system of association rule generation for market basket analysis, the clustering based association rule generator for market basket analysis has a provision for enhanced analytical measures, decision matrix, reduction of processing time with in memory retention of transactions, events based clustering and density based clustering.
The systems and methods for the clustering based association rule generator for market basket analysis can be implemented in a variety of computing systems. The computing systems that can implement the described method(s) include, but are not restricted to, mainframe computers, workstations, personal computers, desktop computers, minicomputers, servers, multiprocessor systems, laptops and the like. The embodiments are described in the context of the following exemplary system architecture(s) represented by Fig 2 to 4.
Fig. 2 illustrates a network environment 100 implementing a clustering based association rule generator for market basket analysis 102, also referred to as the CAMB system 102, in accordance with an embodiment of the present subject matter. In one implementation, the network environment 100 can be an office network, including thousands of office personal computers, laptops, various servers, such as blade servers, and other computing devices connected over a network 106. In another implementation, the network environment 100 can be a retail network with a limited number of points of sale terminals and barcode scanners connected over the network 106. In yet another implementation, the CAMB system 102 can be a standalone system.
The CAMB system 102 is further connected to a plurality of user devices 104-1, 104-2, 104-3...104-N, collectively referred to as the user devices 104 and individually referred to as a user device 104. The CAMB system 102 and the user devices 104 may be implemented as any of a variety of conventional computing devices, including, for example, servers, a desktop PC, a notebook or portable computer, a workstation, a mainframe computer, and an internet appliance. The CAMB system 102 is connected to the user devices 104 over the network 106 through one or more communication links.

The network 106 may be a wireless network, a wired network, or a combination thereof. The network 106 can also be an individual network or a collection of many such individual networks, interconnected with each other and functioning as a single large network, e.g., the Internet or an intranet. The network 106 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and such. The network 106 may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), etc., to communicate with each other. The users, such as analytics experts, and database administrators may interact through the user devices 104 with the CAMB system 102 for faster generation of non spurious association rules.
The clustering based association rule generator 102 is further connected to a database 108 having information about all the transactions in the network environment 100. For example, in the case of the network environment 100 being a retail market network, the database may have information about all the interactions taking place in the retail market, all the items, such as bread, butter, biscuit present in the retail market, and all events, such weekday shopping, weekend shopping, festive shopping, sales occurring the retail market. The database 108 resides in a standard relational database environment. The database has at least three tables - an Item table, consisting all details of the. items; a Txn table, consisting all details of the transactions; a predefined table called as an Item_Txn table, where each record contains the respective keys of Item and Txn denoting the presence of an item in a transaction and an Events Indicator denoting the type of event the transaction belongs to. It should be appreciated that the terminology utilized for annotating the predefined table is only indicative and should not construed as a limitation.
Fig 3 illustrates the clustering based association rule generator for market basket analysis 102, in accordance with an embodiment of the present subject matter. In said embodiment, the CAMB system 102 includes one or more CPU's 204, hereinafter referred to as the processor(s) 204, supporting circuits 206 and a memory 210 coupled to the processor 204. The processor 204 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices

that manipulate signals based on operational instructions. Among other capabilities, the processor 204 is configured to fetch and execute computer-readable instructions and data stored in the memory 210. The supporting circuits 206, includes components, for example, but not 'limited to, Input/ Output (I/O) interfaces (not shown in the figure) and network interfaces (not shown in the figure).
The I/O interfaces may include a variety of software and hardware interfaces, for example, interface for peripheral device(s) such as a display unit, a keyboard, a mouse, an external memory, a printer, etc. The network interfaces may enable the CAMB system 102 to communicate with other computing devices and peripheral devices, such as web servers, and external databases over the network 106. The network interfaces may facilitate multiple communications within a wide variety of protocols and networks, such as wired networks, e.g., LAN, cable, etc., and/or wireless networks, e.g., WLAN, cellular, satellite, etc.
The memory 210 may include any computer-readable medium known in the art including, for example, volatile memory such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 210 includes program modules 212 and program data 220.
The program modules 212 include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. The program modules 212 further include an input module 236, a database connector 240, a candidate set generator 238, a cache calculator 214, a frequent itemset generator 234, an events logic handler 218, a density logic handler 216, an association rule generator 232, an output module 230, and other modules (not shown in the diagram). The other modules may include programs or coded instructions that supplement applications and functions on the CAMB system 102, for example, programs in operating system of the CAMB system 102.
The program data 220, amongst other things, serves as a repository for storing data processed, received, and generated by one or more of the program modules 212. The program data 220

includes candidate data 222, frequent itemset data 228, association rule data 224, and other data 226. The other data 226 may include data generated as a result of the execution of one or more modules in the program modules 212, such as the other modules.
In the exemplary embodiment, the input module 236 is designed to accept inputs from the users. Examples of inputs received by this module from the users are min_supp, minconf, min_valence, and database connection details. These inputs are then used by other modules of the CAMB system 102. The database connection details are used by the database connector 240 to establish a connection with the database 108. Once a database connection has been established by database connector 240 the other modules in the memory can read, write or modify the data in the database 108.
The density logic handler 216 consists of two major sub-modules: an initialization sub-module (not shown in the figure) and a Build sub-module (not shown in the figure). Once a connection to the database 108 is established, the Initialization sub-module of the density logic handler 216 uses the Item table to assign serial numbers to all the unique items present in the Txn table. In an alternate embodiment the Initialization sub-module goes through the Txn table to identify unique items in the Txn table and populates the Item table and subsequently assigns them serial numbers. The serial numbers can be represented as ij to JN where ii represents unique item 1 and iN represents unique item N, where N is the number of unique items present in Txn table. The initialization sub-module then calculates the number of transactions present in Txn table which is represented as T. The initialization sub-module then populates the Item_Txn table which has 'Tn' rows with each row corresponding to a transaction and N+3 columns. The first three columns in the Item_Txn table represent the Transaction Number, the Events Indicator, and a Density Index. Each of the next N columns in the ItemTxn table represents a unique item, for example, column 4 represents ii and column N+3 represents in
The Initialization sub-module then analyzes each row from the Txn table containing the details of the transactions and fills a 0 or 1 in columns 4 to N+3 of each row if the corresponding item is present in the transaction or not. For example, in a row, say row 1, items 2, 4, 5, and 6 are present, the Initialization sub-module then fills a 1 in columns corresponding to the items 2, 4, 5, and 6, i.e., the columns 5, 7, 8, and 9 and fills 0 in the remaining columns. The Initialization sub-

module then calculates the number of unique items present in each transaction and enters that value, as the Density Index, in a Density Index Column, i.e., the column representing the Density Index of each transaction. For instance, in the above example, the Density Index of the transaction represented in column 1 is 4 and will be entered by the Initialization sub-module in the Density Index Column corresponding to the row 1. The Initialization sub-module also copies the Transaction Number and the Event Indicator from the Txn table to Transaction Number column and Events Indicator column, i.e., the columns representing the Transaction Number and the event indicator, respectively, of each transaction in ItemTxn table.
The Build sub-module of the density logic handler 216 subsequently prunes the ItemTxn table so that records corresponding to all the transactions having Density Index value as ' 1' are deleted and accordingly adjusts the value of 'T' and 'N'. The events logic handler 218 first sorts the ItemTxn table in ascending order of the Events Indicator. Then events logic handler 218sorts the transactions for every Event Indicator value such that the Density Index is in ascending order. The following table (Table 01) represents the Txn Table, conversion to event-density-Txn Matrix, and event driven cluster tables with density based sorting for association rule generation.

Table 01 The ItemTxn table is then used by the Candidate Set Generator 238 and Frequent Itemset generator 234 which are standard modules executing the Generate-candidate and Frequent-Itemset-generator functions of the Apriori algorithm. For clarity, a Candidate Set at level 'k' (Ck) is the set of all itemset having k items with each itemset has support > minsupp. The candidate set thus includes a potential list of association rules, interchangeably referred to as the items combinations. A Frequent Itemset at level 'k' (Fk) is the set of all itemset under Ck where confidence > min_conf. Each Ck is derived efficiently from Fk-1 and each Fk is derived efficiently from Ck.
The Candidate Set Generator 238 creates a new candidate list of potential Frequent Itemset by combining items from a last list of Frequent Itemset Transactions passed as parameters to it. The Candidate Set Generator 238 also prunes various transaction combinations based on a flag variable called prune flag, which indicates whether a row from Frequent Itemset Transactions has to be pruned or not.
A first list of the Frequent Itemset Transactions consists of all the transactions recorded in the Item_Txn table. The Frequent Itemset generator 234 uses the newly generated candidate list from the Candidate Set generator 238 and checks whether each of the items combinations generated pass a minimum support test, i.e., whether each of the combinations has support more than the minsupp. If the items combination fails the minimum support test, the prune flag for the items combination is turned on. After the check for all the items in candidate list is done this list is passed to the Candidate Set Generator 238 as Frequent Itemset list along with the prune flag for each list item, for generation of new candidate list This process is repeated till the candidate set generated by the Candidate Set generator 238 is null at which stage the last Frequent Itemset list is provided as output to Association Rule Generator 232.
Further, to generate the Frequent Itemset from the Candidate set we need to find the support of each set in the Candidate Set. This is a key issue because this needs a scan over the whole database. Therefore if N iterations of a while loop, used for scanning, is made, N scans over the database is needed. Also, it means communicating with the database 108 N times. S ince the size

of the database is potentially large, this operation will be costly. To reduce the number of database iterations the Cache Calculator 214, calculates the memory requirements to store the Candidate set in the Memory 210. If the ratio of memory requirements for the Candidate list to the physical memory available in the memory 210 decreases to a preconfigured value all further processing is shifted to the memory 210. In one implementation, the preconfigured value should be ideally 0.5. Alternatively, the preconfigured value can lie between a range of about 0.25 and 0.35 if the processor is dedicated for this task. When the Candidate list and the Frequent Itemset are copied into the memory in Candidate data 222 and Frequent Itemset data 228 sections in the memory 210 and further execution is done using these sections instead of the database 108. The old Candidate List and Frequent Itemset list generated are overwritten by the new Candidate List and Frequent Itemset list and provided to the association rule generator 232.
The Association Rule Generator 232 tests the frequent itemset, having a final set of item combinations, against the various analytical measures provided in the decision matrix after which it is sent to the Output module 230. The Association Rule Generator 232 comprises of a calculation module (not shown in the figure) and an evaluation module (not shown in the figure). The calculation module calculates a set of predefined analytical measures for each of the frequent itemsets provided to it. In one implementation the predefined set of analytical measures are confidence, alternate confidence, valance, interest and conviction. The evaluation module uses the user provided inputs and the analytical measures calculated by calculation module to classify the last frequent itemset as the association rules. In one implementation, the predefined set of user inputs are minconf and minvalence which are from the input for the decision matrix to classify the association rules as Reject, Weakly Accept and Accept. The items combinations that are either accepted or weakly accepted are provided as verified association rules to the output module 230. The Output module 230 is used to send the verified association rules to external world through a variety of interfaces like display unit, printer or a file sent over the network 106.
Fig 4 illustrates an exemplary method 300 for market basket association rule generation in an organization according to one of the embodiments of the present subject matter. The exemplary method 300 may be described in the general context of computer executable instructions and may be a computer implementable method. Generally, computer executable instructions can

include routines, programs, objects, components, data structures, procedures, modules, functions and the like that perform particular functions or implement particular abstract data types. The method may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communication network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method, or an alternate method. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or a combination thereof.
In accordance with one embodiment of the present subject matter, the method 300 may be implemented in the previously described CAMB system 102. The method 300 is initiated at the block Stepl wherein the user inputs including, but not limited to, elimination criteria, such as values of min_supp and min_conf are accepted. The method 300 also receives the connection string (not shown in the diagram) to connect to the database 108. In the Step2 the connection with the database 108 is established. In the step 3 the Initialization sub-module of the Density Logic Handler module 216 populates a predefined table, interchangeably referred to as the Item_Txn table, having at least one record optimized for generation of association rules using a transaction table, for example, the Txn table present in the database. As a sub step of the step 3, the Initialization sub-module reads a row from the Txn table and fills in 1 for every unique item present in the transaction in the column predefined for the item in Item_Txn table. In the next sub step the total number of unique items present in the row is filled in the Density Index column of the row. In Step 4 the rows containing transactions having only one item are pruned and new values of the total number of transactions and unique items is calculated.
In Step 5 the Item_Txn table is sorted in ascending order of Events Index after which in Step 6 for each of the Event Index value the transactions are ordered in ascending order of the Density

index. This constitutes an initial frequent itemset Transactions list which is passed on to the Step 7. In Step 7 a new candidate set of potential Frequent Itemset is created by combining items from the last list of Frequent Itemset Transactions passed as parameters to it. During the creation of the candidate list the items having the prune flag turned on are dropped from the combination. The list generated at this stage is called the Candidate items list.
In Step 8 the size of the Candidate items list generated in the Step7 is compared against the Physical memory available in the CAMB system 102. If the ratio of the Memory requirements for the Candidate items list to the physical memory available increases to around a preconfigured value all further processing is shifted to the memory 210. In Step 9 the newly generated Candidate items list is checked and if found empty, the processing is shifted to step 12.
In Step 10, the non empty Candidate items list is checked for min_supp condition. In this check the support measure for each itemset in the Candidate items list is calculated. If the value is found to be below the minsupp level the prune switch is set. In Step 11, after check for all items in the Candidate items list is complete it creates the new Frequent Itemset list, which is then sent to step 7. In Step 12 the last Frequent Itemset list is checked against the decision matrix using minconf and minvalence to classify the frequent itemset list as association rules and finally in step 13 the accepted and weakly accepted association rules are published for business decision.
Results of Exemplary association rules for the Table 01 when tested against the analytical
measures are as follows:
The Table 01 Txn dataset under the assumption of min_supp = 20%, minconf = 30%, min_valence = 0.0, provides the following valid association rules for 'Generic Day' and 'Weekend' events.
Generic Day Event:
Level 01 Association Rule:
{1=>3, Conf = 100.0%, Interest = 1.5, Conv = Infinity, Rule = 'ACCEPT'}
{2=>4, Conf = 100%, Interest = 2.0, Conv = Infinity, Rule = 'ACCEPT'}
{2=>6, Conf = 50%>, Interest = 1.0, Conv = 1.0, Rule = 'ACCEPT'}

{2=>7, Conf = 50.0%, Interest = 1.2, Conv =1.17, Rule = 'WEAKLY ACCEPT'}
{5=>6, Conf=80.0%, Interest = 1.6, Conv = 2.5, Rule = 'ACCEPT'}
There are no association rules which are level 02 or higher satisfying the minsupp, minconf and minvalence criteria.
Weekend Event:
Level 01 Association Rule:
{1=>2, Conf =83.33%, Interest = 1.04, Conv = 1.2, Rule='ACCEPT'}
{1=>3, Conf =66.66%, Interest =1.11, Conv =1.21, Rule = 'ACCEPT'}
{1=>4, Conf = 100.0%, Interest = 1.25, Conv = Infinity, Rule='ACCEPT'}
{2=>3, Conf =62.5%, Interest = 1.04, Conv = 1.07, Rule ='ACCEPT'}
{3=>4, Conf = 83.33%, Interest = 1.04, Conv = 1.2, Rule = 'ACCEPT'}
{3=>6, Conf = 66.66%, Interest =1.11, Conv = 1.2, Rule = 'ACCEPT'}
{5=>6, Conf = 62.5%, Interest = 1.04, Conv = 1.07, Rule = 'ACCEPT'}
{5=>7, Conf = 75.0%, Interest = 1.07, Conv = 1.2, Rule = 'ACCEPT'}
{6=>7, Conf=83.33%, Interest =1.19, Conv = 1.2, Rule = 'ACCEPT'}
Level 02 Association Rule:
{(1, 2) =>3, Conf =60.0%, Interest = 1.0, Conv = 1.0, Rule='ACCEPT'}
{(1,2) =>4, Conf = 100.0%, Interest = 1.25, Conv = Infinity, Rule='ACCEPT'}
{(1, 3) =>4, Conf = 100.0%, Interest = 1.25, Conv = Infinity, Rule = 'ACCEPT'}
{(3, 4) =>6, Conf = 60.0%, Interest = 1.0, Conv = 1.0, Rule = 'ACCEPT'}
{(5, 6) =>7, Conf =80.0%, Interest =1.14, Conv = 1.5, Rule = ACCEPT'}
Level 03 Association Rule:
{(1,2, 3) =>4, Conf = 100.0%, Interest = 1.25, Conv = Infinity, Rule='ACCEPT'}
There are no association rules which are level 04 or higher satisfying the min_supp, minconf and min_valence criteria.

Although implementations of a clustering based association rule generator have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as implementations of the clustering based association rule generator.

I/WE CLAIM:
1. A clustering based association rule generator for market basket analysis (102) which can act on a database (108) having at least one transactions table having a plurality of transactions with each transaction having one or more items, the clustering based association rule generator for market basket analysis (102) comprising: a processor (204); and
a memory (210) coupled to the processor (204), the memory (210)comprising, an input module (236) configured to accept predefined user inputs; a database connector unit (240) to connect to the database (108) based on the predefined user inputs and fetch transactions data present in the database (108);
a density logic handler (216) configured to populate a predefined table from the transactions data such that transactions having more than one items are retained;
an events logic handler (218) configured to sort the predefined table such that the predefined table can be used for association rule generation on the basis of predefined events;
a candidate set generator (238) to generate a candidate set using the transactions data for testing against predefined analytical measures, wherein the candidate set includes a potential list of items combinations;
a frequent itemset generator (234) to test the candidate set against a set of analytical measures to create the next set of item combinations through the candidate set generator; and
an association rule generator (232) to test the item combinations provided
by the frequent itemset generator (234) against another set of analytical measures,
to receive verified association rules.
2. The clustering based association rule generator for market basket analysis (102) as
claimed in claim 1, further comprising an output module (230), configured to provide the
verified association rules from the rule generator (232).

3. The clustering based association rule generator for market basket analysis (102) as claimed in claim 1, wherein the clustering based association rule generator for market basket analysis (102) further comprises a cache calculator (214) configured to evaluate the condition wherein a candidate list and a Frequent Item list would be saved in the memory (210) based on the size of the candidate list and memory space available in the memory (210).
4. The clustering based association rule generator for market basket analysis (102) as claimed in claim 1, wherein the density logic handler (216) further comprises:
an initialization sub module configured to read data from the database (108) and populate the predefined table such that the transactions are grouped by events and ordered by number of unique items in a particular transaction; and
a build sub module configured to delete all transactions in the predefined table having only one unique item.
5. The clustering based association rule generator for market basket analysis (102) as
claimed in claim 1, wherein the rule generator (232) further comprises:
a calculation module for calculation of predefined analytical measures for last frequent itemset; and
an evaluation module for classification of the item combinations as the association rules using the analytical measures.
6. A computerized method for association rule generator using a database containing a
transaction table having a plurality of transaction records each having at least one item,
the method comprises:
receiving a set of inputs from user including a set of one or more elimination criteria;
populating a predefined table having at least one record optimized for generation of association rules using the transaction table present in the database;

generating a frequent itemset by combining one or more items present in each record of the predefined table such that they pass one set of elimination criteria provided in the inputs; and
filtering one or more association rules from a list frequent itemset based on another set of user provided elimination criteria.
7. The computerized method as claimed in claim 6 further comprises displaying the filtered association rules.
8. The computerized method as claimed in claim 6, wherein the generating the set of association rules further comprises evaluating a condition wherein the candidate list and Frequent Item list would be saved in the memory based on the size of candidate list and memory available.
9. The computerized method as claimed in claim 6, wherein the method further comprises :
reading data from the said database;
populating the predefined table such that transactions are grouped by events and ordered by number of unique items in a transaction; and
deleting all the transactions in the predefined table having only one unique item.
10. The computerized method as claimed in claim 6, wherein the method further comprises:
calculating a predefined set of analytical measures for the last frequent itemset; and classification of the last frequent itemset as the one or more association rules using the analytical measures.

Documents

Orders

Section	Controller	Decision Date

Application Documents

#	Name	Date
1	2805-MUM-2011-FORM 18(24-10-2011).pdf	2011-10-24
1	2805-MUM-2011-RELEVANT DOCUMENTS [26-09-2023(online)].pdf	2023-09-26
2	2805-MUM-2011-RELEVANT DOCUMENTS [27-09-2022(online)].pdf	2022-09-27
2	2805-MUM-2011-CORRESPONDENCE(24-10-2011).pdf	2011-10-24
3	2805-MUM-2011-POWER OF ATTORNEY(13-12-2011).pdf	2011-12-13
3	2805-MUM-2011-IntimationOfGrant13-07-2020.pdf	2020-07-13
4	2805-MUM-2011-PatentCertificate13-07-2020.pdf	2020-07-13
4	2805-MUM-2011-CORRESPONDENCE(13-12-2011).pdf	2011-12-13
5	2805-MUM-2011-Written submissions and relevant documents (MANDATORY) [01-11-2019(online)].pdf	2019-11-01
5	2805-MUM-2011-OTHERS [16-03-2018(online)].pdf	2018-03-16
6	2805-MUM-2011-FER_SER_REPLY [16-03-2018(online)].pdf	2018-03-16
6	2805-MUM-2011-Correspondence to notify the Controller (Mandatory) [10-10-2019(online)].pdf	2019-10-10
7	2805-MUM-2011-HearingNoticeLetter-(DateOfHearing-24-10-2019).pdf	2019-10-03
7	2805-MUM-2011-CORRESPONDENCE [16-03-2018(online)].pdf	2018-03-16
8	2805-MUM-2011-CORRESPONDENCE(3-2-2012).pdf	2018-08-10
8	2805-MUM-2011-COMPLETE SPECIFICATION [16-03-2018(online)].pdf	2018-03-16
9	2805-MUM-2011-FER.pdf	2018-08-10
9	2805-MUM-2011-CLAIMS [16-03-2018(online)].pdf	2018-03-16
10	2805-MUM-2011-FORM 1(3-2-2012).pdf	2018-08-10
10	Form-3.pdf	2018-08-10
11	ABSTRACT1.jpg	2018-08-10
11	Form-1.pdf	2018-08-10
12	Drawings.pdf	2018-08-10
13	ABSTRACT1.jpg	2018-08-10
13	Form-1.pdf	2018-08-10
14	2805-MUM-2011-FORM 1(3-2-2012).pdf	2018-08-10
14	Form-3.pdf	2018-08-10
15	2805-MUM-2011-CLAIMS [16-03-2018(online)].pdf	2018-03-16
15	2805-MUM-2011-FER.pdf	2018-08-10
16	2805-MUM-2011-COMPLETE SPECIFICATION [16-03-2018(online)].pdf	2018-03-16
16	2805-MUM-2011-CORRESPONDENCE(3-2-2012).pdf	2018-08-10
17	2805-MUM-2011-CORRESPONDENCE [16-03-2018(online)].pdf	2018-03-16
17	2805-MUM-2011-HearingNoticeLetter-(DateOfHearing-24-10-2019).pdf	2019-10-03
18	2805-MUM-2011-Correspondence to notify the Controller (Mandatory) [10-10-2019(online)].pdf	2019-10-10
18	2805-MUM-2011-FER_SER_REPLY [16-03-2018(online)].pdf	2018-03-16
19	2805-MUM-2011-OTHERS [16-03-2018(online)].pdf	2018-03-16
19	2805-MUM-2011-Written submissions and relevant documents (MANDATORY) [01-11-2019(online)].pdf	2019-11-01
20	2805-MUM-2011-PatentCertificate13-07-2020.pdf	2020-07-13
20	2805-MUM-2011-CORRESPONDENCE(13-12-2011).pdf	2011-12-13
21	2805-MUM-2011-POWER OF ATTORNEY(13-12-2011).pdf	2011-12-13
21	2805-MUM-2011-IntimationOfGrant13-07-2020.pdf	2020-07-13
22	2805-MUM-2011-RELEVANT DOCUMENTS [27-09-2022(online)].pdf	2022-09-27
22	2805-MUM-2011-CORRESPONDENCE(24-10-2011).pdf	2011-10-24
23	2805-MUM-2011-RELEVANT DOCUMENTS [26-09-2023(online)].pdf	2023-09-26
23	2805-MUM-2011-FORM 18(24-10-2011).pdf	2011-10-24

Search Strategy

1	2805_MUM_2011_search_14-09-2017.pdf