
System And Method For Sentence Classification Using Capsule Network

Abstract: This disclosure relates generally to sentence classification using a Multi Dimension capsule network including a word embedding layer, a feature extraction layer composed of a plurality of Bi-LSTMs, a primary capsule layer, a convolutional capsule layer and a softmax layer. The word embedding layer determines an initial sentence representation having a concatenated embedding vector associated with words of the sentence. The Bi-LSTMs encode contextual semantics between the words of the sentence using the concatenated embedding vector to obtain context vectors. Multiple capsules associated with distinct dimensions are obtained from the context vectors by a filter of the primary capsule layer. The Convolutional Capsule Layer computes a final sentence representation for the sentence by determining coupling strength between child-parent pair capsules connected in the multiple levels. Class probabilities of the final sentence representation are computed by passing the final sentence representation through the softmax layer for classification of the sentence. [To be published with FIG. 2]


Patent Information

Application #
Filing Date
26 July 2019
Publication Number
05/2021
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
kcopatents@khaitanco.com
Parent Application
Patent Number
Legal Status
Grant Date
2024-05-17
Renewal Date

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point Mumbai -400021 Maharashtra India

Inventors

1. KHURANA, Prerna
Tata Consultancy Services Limited Plot no. A-44 & A45, Ground , 1st to 05th floor & 10th floor Block C&D, Sector 62, Noida - 201309 Uttar Pradesh, India
2. SRIVASTAVA, Saurabh
Tata Consultancy Services Limited Plot no. A-44 & A45, Ground , 1st to 05th floor & 10th floor Block C&D, Sector 62, Noida - 201309 Uttar Pradesh, India

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
SYSTEM AND METHOD FOR SENTENCE CLASSIFICATION USING CAPSULE NETWORK
Applicant
Tata Consultancy Services Limited A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.

TECHNICAL FIELD [001] The disclosure herein generally relates to sentence classification, and, more particularly, to system and method for sentence classification using multi-dimension capsule network.
BACKGROUND [002] Sentence classification has been utilized for classification of sentences into a class. For example, sentence classification has been used for classifying sentences into a class associated with toxic comments, hate speech, appreciative speech, cheering messages, greetings, and so on. Typically, due to the enormous amount of content available on social network sites and on the internet in general, it is immensely challenging to manually detect the class to which a sentence belongs.
SUMMARY [003] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method for sentence classification is provided. The processor implemented method for sentence classification includes employing, a Multi Dimension Capsule Network for classification of a sentence via one or more hardware processors. The Multi Dimension Capsule Network includes a word embedding layer, a feature extraction layer composed of a plurality of Bi-directional Long short term memory (Bi-LSTMs), a primary capsule layer, a convolutional capsule layer and a softmax layer. Employing the Multi Dimension Capsule Network includes determining, by the word embedding layer, an initial sentence representation for the sentence via the one or more hardware processors. The initial sentence representation includes a concatenated embedding vector associated with a plurality of words of the sentence. Further, the method includes encoding, by the plurality of Bi-LSTMs, contextual semantics between the plurality of words of the sentence using the concatenated embedding vector to

obtain a plurality of context vectors (Ci), via the one or more hardware processors. Also, the method includes obtaining, from the plurality of context vectors (Ci), a plurality of capsules associated with distinct dimensions by a filter of the primary capsule layer, via the one or more hardware processors. Each capsule of the plurality of capsules includes a set of instantiated parameters obtained from the plurality of context vectors (Ci). Each of the plurality of capsules are connected in multiple levels using shared transformation matrices and a routing model. Moreover, the method includes computing, by the Convolutional Capsule Layer, a final sentence representation for the sentence by determining coupling strength between child-parent pair capsules of the plurality of capsules connected in the multiple levels, via the one or more hardware processors. Also, the method includes calculating, via the one or more hardware processors, class probabilities of the final sentence representation by passing the final sentence representation through the softmax layer for classification of the sentence.
[004] In another aspect, a system for sentence classification is provided. The system includes one or more memories; and one or more hardware processors, the one or more memories coupled to the one or more hardware processors, wherein the one or more hardware processors are configured to execute programmed instructions stored in the one or more memories to employ, a Multi Dimension Capsule Network for classification of a sentence. The Multi Dimension Capsule Network includes a word embedding layer, a feature extraction layer composed of a plurality of Bi-LSTMs, a primary capsule layer, a convolutional capsule layer and a softmax layer. The one or more hardware processors are configured to execute instructions to determine, by the word embedding layer, an initial sentence representation for the sentence, the initial sentence representation comprising a concatenated embedding vector associated with a plurality of words of the sentence. Further, the one or more hardware processors are configured to execute instructions to encode, by the plurality of Bi-LSTMs, contextual semantics between the plurality of words of the sentence using the concatenated embedding vector to obtain a plurality of context vectors (Ci). Moreover, the one or more hardware processors are configured to execute

instructions to obtain, from the plurality of context vectors (Ci), a plurality of capsules associated with distinct dimensions by a filter of the primary capsule layer. Each capsule of the plurality of capsules includes a set of instantiated parameters obtained from the plurality of context vectors (Ci). Each of the plurality of capsules are connected in multiple levels using shared transformation matrices and a routing model. Also, the one or more hardware processors are configured to execute instructions to compute, by the Convolutional Capsule Layer, a final sentence representation for the sentence by determining coupling strength between child-parent pair capsules of the plurality of capsules connected in the multiple levels. Finally, the one or more hardware processors are configured to execute instructions to calculate class probabilities of the final sentence representation by passing the final sentence representation through the softmax layer for classification of the sentence.
[005] In yet another aspect, a non-transitory computer readable medium for a method for sentence classification is provided. The processor implemented method for sentence classification includes employing, a Multi Dimension Capsule Network for classification of a sentence via one or more hardware processors. The Multi Dimension Capsule Network includes a word embedding layer, a feature extraction layer composed of a plurality of Bi-LSTMs, a primary capsule layer, a convolutional capsule layer and a softmax layer. Employing the Multi Dimension Capsule Network includes determining, by the word embedding layer, an initial sentence representation for the sentence via the one or more hardware processors. The initial sentence representation includes a concatenated embedding vector associated with a plurality of words of the sentence. Further, the method includes encoding, by the plurality of Bi-LSTMs, contextual semantics between the plurality of words of the sentence using the concatenated embedding vector to obtain a plurality of context vectors (Ci), via the one or more hardware processors. Also, the method includes obtaining, from the plurality of context vectors (Ci), a plurality of capsules associated with distinct dimensions by a filter of the primary capsule layer, via the one or more hardware processors. Each capsule of the plurality of capsules includes a set of instantiated parameters obtained from

the plurality of context vectors (Ci). Each of the plurality of capsules are connected in multiple levels using shared transformation matrices and a routing model. Moreover, the method includes computing, by the Convolutional Capsule Layer, a final sentence representation for the sentence by determining coupling strength between child-parent pair capsules of the plurality of capsules connected in the multiple levels, via the one or more hardware processors. Also, the method includes calculating, via the one or more hardware processors, class probabilities of the final sentence representation by passing the final sentence representation through the softmax layer for classification of the sentence.
[006] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[007] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[008] FIG. 1 illustrates an exemplary network environment for implementation of a system for sentence classification using a multi dimension capsule network according to some embodiments of the present disclosure.
[009] FIG. 2 illustrates a flow diagram for a method for sentence classification using a multi dimension capsule network, in accordance with an example embodiment of the present disclosure.
[010] FIG. 3 illustrates an example block diagram representing architecture of a multi dimension capsule network hierarchical capsule for sentence classification, in accordance with an example embodiment of the present disclosure.
[011] FIG. 4 illustrates a sentence encoder of the multi dimension capsule network of FIG. 3, in accordance with an example embodiment of the present disclosure.

[012] FIG. 5 illustrates a block diagram of an exemplary computer system for implementing embodiments consistent with the present disclosure.
[013] FIG. 6 illustrates a confusion matrix for TRACTM database utilized for an example scenario of implementing a multi dimension capsule network for sentence classification, in accordance with an example embodiment.
[014] FIGS. 7, 8 and 9 illustrate various examples of sentence classification performed by the disclosed system, in accordance with an example embodiment.
DETAILED DESCRIPTION OF EMBODIMENTS [015] In general, there is an enormous amount of content available on the internet. It is therefore a humongous task to perform classification of sentences in the available content manually. Earlier works on Capsule network based deep learning architectures to classify sentences have shown that these networks work well compared to other deep learning architectures. Numerous machine learning methods for detection of comments, such as inappropriate comments in online forums, exist today. Traditional approaches include Naïve Bayes classifiers, logistic regression, support vector machines, and random forests. However, deep learning models, for instance, convolutional neural networks and variants of recurrent neural networks, have shown promising results and achieved better accuracies in sentence classification tasks. Some of the conventional techniques in sentence classification compared different deep learning and shallow approaches on datasets and proposed an ensemble model that outperforms all approaches. Further, other work proposed LSTMs with attention on the TRAC dataset for better classification. Capsule networks have been shown to work well on images, and recently these networks have also been investigated for text classification.
[016] Various embodiments disclosed herein provide system and method for sentence classification using a multi dimension capsule network. For example in one embodiment, the method includes obtaining a sentence representation of sentences associated with a task by encoding input words of the sentence into a

fixed length vector. After obtaining the word representations, said word representations are concatenated into one single vector of fixed length. This fixed length vector is then passed, first to a feature extraction layer and then to a layer of capsules, to further squeeze out the essential word level features. An important contribution of the disclosed embodiments is the employability of a multi-dimension Capsule network for capturing the sentence representation. The multi-dimension Capsule network captures features necessary for classification of such sentences. The disclosed system is capable of handling transliterated comments. An example of the proposed multi-dimension capsule network and the use of the multi-dimension capsule network for detecting aggression and toxicity is described further with reference to FIGS. 1-9.
[017] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope being indicated by the following claims.
[018] Referring now to the drawings, and more particularly to FIGS. 1 through 9, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[019] FIG. 1 illustrates an example network implementation 100 of a system 102 for sentence classification using a multi-dimension capsule network, in accordance with an example embodiment. The multi-dimension structure of the disclosed capsule network aids in robustly classifying the sentences, as described further in detail below.

[020] In various embodiments disclosed herein, the multi-dimension capsule network for sentence classification includes a word embedding layer, a feature extraction layer having multiple Bi-LSTMs, a primary capsule layer, and a convolutional capsule layer. The word embedding layer obtains a fixed-size vector representation of each word of a sentence. The feature extraction layer composed of the plurality of Bi-LSTMs encodes the whole sentence, and the Primary and Convolutional Capsule layers then extract the high-level features from the sentences. The architecture of the multi-dimension capsule network is described further in detail with reference to FIGS. 2-5.
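By way of illustration only, the following sketch shows how the layers described above can be composed in the stated order (word embedding, Bi-LSTM feature extraction, primary capsules, and a classification head standing in for the convolutional-capsule and softmax layers). The class name, parameter defaults, and the use of PyTorch are assumptions for readability, not the patented implementation; routing is replaced here by simple pooling for brevity.

```python
import torch
import torch.nn as nn

class MultiDimCapsuleNet(nn.Module):
    """Hypothetical skeleton mirroring the layer ordering in the disclosure."""
    def __init__(self, vocab_size, emb_dim=300, lstm_units=200,
                 num_capsules=20, capsule_dim=15, num_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)            # word embedding layer
        self.bilstm = nn.LSTM(emb_dim, lstm_units,
                              bidirectional=True, batch_first=True)   # feature extraction layer
        self.primary_caps = nn.Linear(2 * lstm_units,
                                      num_capsules * capsule_dim)     # primary capsule layer (shared window)
        self.classifier = nn.Linear(num_capsules * capsule_dim,
                                    num_classes)                      # stands in for conv-capsule + softmax head

    def forward(self, token_ids):
        v = self.embedding(token_ids)          # (batch, N, emb_dim)
        C, _ = self.bilstm(v)                  # context vectors (batch, N, 2*lstm_units)
        p = torch.tanh(self.primary_caps(C))   # capsule-like features per word
        s = p.mean(dim=1)                      # crude pooling in place of routing
        return torch.softmax(self.classifier(s), dim=-1)

model = MultiDimCapsuleNet(vocab_size=30_000)
probs = model(torch.randint(0, 30_000, (2, 12)))   # two toy sentences of 12 tokens
print(probs.shape)                                 # torch.Size([2, 3])
```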
[021] Although the present disclosure is explained considering that the system 102 is implemented on a server, it may be understood that the system 102 may also be implemented in a variety of computing systems 104, such as a laptop computer, a desktop computer, a notebook, a workstation, a cloud-based computing environment and the like. It will be understood that the system 102 may be accessed through one or more devices 106-1, 106-2... 106-N, collectively referred to as devices 106 hereinafter, or applications residing on the devices 106. Examples of the devices 106 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, a Smartphone, a tablet computer, a workstation and the like. The devices 106 are communicatively coupled to the system 102 through a network 108.
[022] In an embodiment, the network 108 may be a wireless or a wired network, or a combination thereof. In an example, the network 108 can be implemented as a computer network, as one of the different types of networks, such as a virtual private network (VPN), intranet, local area network (LAN), wide area network (WAN), the internet, and such. The network 108 may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), and Wireless Application Protocol (WAP), to communicate with each other. Further, the network 108 may include a variety of network devices, including routers, bridges, servers, computing devices, and storage devices. The

network devices within the network 108 may interact with the system 102 through communication links.
[023] As discussed above, the system 102 may be implemented in a computing device 104, such as a hand-held device, a laptop or other portable computer, a tablet computer, a mobile phone, a PDA, a smartphone, and a desktop computer. The system 102 may also be implemented in a workstation, a mainframe computer, a server, and a network server. In an embodiment, the system 102 may be coupled to a data repository, for example, a repository 112. The repository 112 may store data processed, received, and generated by the system 102. In an alternate embodiment, the system 102 may include the data repository 112.
[024] The network environment 100 supports various connectivity options such as BLUETOOTH®, USB, ZigBee and other cellular services. The network environment enables connection of devices 106 such as Smartphone with the server 104, and accordingly with the database 112 using any communication link including Internet, WAN, MAN, and so on. In an exemplary embodiment, the system 102 is implemented to operate as a stand-alone device. In another embodiment, the system 102 may be implemented to work as a loosely coupled device to a smart computing environment. The components and functionalities of the system 102 are described further in detail with reference to FIGS.2-5.
[025] Referring collectively to FIGS. 2-5, components and functionalities of the system 102 for sentence classification using a multi-dimension capsule network are described in accordance with an example embodiment. For example, FIG. 2 illustrates a flow diagram for a method for sentence classification using a multi dimension capsule network, in accordance with an example embodiment of the present disclosure. FIG. 3 illustrates an example block diagram of a multi-dimension capsule network for sentence classification, in accordance with an example embodiment of the present disclosure. FIG. 4 illustrates a sentence encoder of the multi dimension capsule network of FIG. 3, in accordance with an example embodiment of the present disclosure. FIG. 5 illustrates a block diagram

of an exemplary computer system for implementing embodiments consistent with the present disclosure.
[026] In various embodiments described herein, the sentence classification is performed by obtaining an initial sentence representation of the sentence, in which the individual word representations for the words of the sentence, obtained from pre-trained fastText embeddings, are concatenated. The initial sentence representation is then passed through a feature extraction layer which consists of Bi-LSTM units to get a sentence representation. This representation is then passed through the Primary and Convolutional Capsule Layers to extract the high-level features of a sentence. Finally, the features are passed through a classification layer to calculate the class probabilities. The method of detecting toxicity is described in further detail with reference to FIG. 2 below.
[027] At 202 of method 200, a multi-dimensional capsule network for sentence classification of content is employed. The multi-dimensional capsule network 300 (illustrated with reference to FIG. 3) includes a word embedding layer 302, a feature extraction layer 304, a primary capsule layer 306 and a convolutional capsule layer 308. In addition, the multi-dimensional capsule network 300 includes a softmax layer 310. In an embodiment, employing the multi-dimensional capsule network 300 for sentence classification includes steps 204-212 of method 200, described below with reference to FIGS. 2-5.
[028] At 204, the method 202 includes determining an initial sentence representation for each sentence associated with the content. The sentence may include a plurality of words. The initial sentence representation is determined by the word embedding layer 302 of the sentence encoder 300. The initial sentence representation includes a concatenated embedding vector. The concatenated embedding vector includes a fixed-length vector vi corresponding to each word wi of the sentence. The fixed-length vector corresponding to a sentence is representative of lexical-semantics of words of the sentence. In an embodiment, the fixed-length vector vi corresponding to a word wi is obtained from a 'weight matrix' W ∈ R^(dw × |V|), where dw is the vector dimension and |V| is the vocabulary size. Each column j of the weight matrix corresponds to a vector Wj ∈ R^dw for the jth word in the vocabulary. Each vi represents the lexical-semantics of words obtained after pre-training on a large corpus through unsupervised training.
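As a concrete illustration of this lookup, the sketch below selects one column of a weight matrix per word index and stacks the resulting vectors into the initial sentence representation. The matrix W, the toy vocabulary, and the OOV handling are illustrative assumptions; in the disclosure the weights come from pre-trained embeddings.

```python
import numpy as np

d_w, vocab_size = 300, 50_000
rng = np.random.default_rng(0)
W = rng.normal(size=(d_w, vocab_size)).astype(np.float32)   # stand-in for pre-trained embedding weights

word_to_index = {"this": 0, "is": 1, "a": 2, "sentence": 3}  # toy vocabulary

def initial_sentence_representation(words):
    # v_i = W[:, j] for the j-th vocabulary word; unseen words get a random vector here
    vectors = [W[:, word_to_index[w]] if w in word_to_index
               else rng.normal(size=d_w).astype(np.float32)
               for w in words]
    return np.stack(vectors)        # shape (N, d_w): the per-word vectors forming [v1; v2; ...; vN]

v = initial_sentence_representation("this is a sentence".split())
print(v.shape)   # (4, 300)
```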
[029] At 206 of method 202, the feature extraction layer 304 composed of a plurality of Bi-LSTMs encodes contextual semantics between the words within the sentence using the concatenated embedding vector associated with the sentence to obtain a plurality of context vectors (Ci). For example, for a sentence of length N, the concatenated embedding vectors may be [v1; v2; ...; vN]. The contextual semantics between words within a sentence is encoded through it. The output from the feature extraction layer 304 for a word wi is
Ci = [ci^r ; ci^l] ∈ R^(2 × dsen)
where ci^r and ci^l are the right and left contexts (hidden activations), and dsen is the number of LSTM units. Finally, for all the N words,
C = [C1, C2, ..., CN] ∈ R^(N × (2 × dsen))
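A minimal sketch of this step, assuming PyTorch purely as notation: a bidirectional LSTM over the concatenated embeddings returns one context vector per word, whose dimension is twice the number of LSTM units because the forward and backward hidden activations are concatenated.

```python
import torch
import torch.nn as nn

d_w, d_sen, N = 300, 200, 12
bilstm = nn.LSTM(input_size=d_w, hidden_size=d_sen,
                 bidirectional=True, batch_first=True)

v = torch.randn(1, N, d_w)   # initial sentence representation [v1, ..., vN] for one sentence
C, _ = bilstm(v)             # context vectors, forward and backward states concatenated
print(C.shape)               # torch.Size([1, 12, 400]) == (1, N, 2 * d_sen)
```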
[030] Capsule layers replace singular scalar outputs by local "capsules", which are vectors of highly informative outputs known as "instantiated parameters". In text processing, these instantiated parameters can be hypothesized as local orders of the words and their semantic representation. In an embodiment, to capture the semantics and cover a large part of a sentence, the primary capsule layer 408 includes a filter (or shared window) Wb which obtains a plurality of capsules (pi) associated with distinct dimensions from the plurality of context vectors (Ci), at 208. As the plurality of capsules are associated with distinct dimensions, different instantiation parameters (or features) may be captured therefrom. In an embodiment, the dimension of the capsules can be varied to capture different sets of instantiation parameters from the context vectors by the capsules. An example representation of the capsule layers of the plurality of capsules is illustrated and described further with reference to FIG. 4.
[031] As illustrated in FIG. 4, for the context vectors Ci, different shared windows are used to obtain the capsules pi, where
pi = g(Wb Ci), with shared window Wb ∈ R^((2 × dsen) × d),
g is the non-linear squash activation, d is the capsule dimension, and dsen is the number of LSTM units used to capture the input features. The factor d can be used to vary a capsule's dimension, which in turn can be used to capture different instantiation parameters. The capsules are then stacked together to create a capsule feature map
P = [p1, p2, p3, ..., pC] ∈ R^(N × C × d)
consisting of a total of N × C capsules of dimension d.
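The following sketch, assuming PyTorch as notation, applies one shared window per capsule channel and the squash non-linearity to turn the context vectors into the N × C × d capsule feature map described above. The tensor names and the per-channel stacking of Wb are illustrative assumptions.

```python
import torch

def squash(s, dim=-1, eps=1e-8):
    # g(s) = (||s||^2 / (1 + ||s||^2)) * s / ||s||
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

N, d_sen, num_caps, d = 12, 200, 20, 15
C = torch.randn(N, 2 * d_sen)                 # context vectors from the Bi-LSTM
W_b = torch.randn(num_caps, 2 * d_sen, d)     # one shared window per capsule channel

# p[i, c] = squash(C_i @ W_b[c]); stacking gives the capsule feature map P of shape (N, C, d)
P = squash(torch.einsum('nk,ckd->ncd', C, W_b))
print(P.shape)                                # torch.Size([12, 20, 15])
```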
[032] It will be noted herein that, for brevity of description, only two capsules are illustrated in FIG. 4; however, in various applications and scenarios, the primary capsule layer 408 may include a greater number of capsules.
[033] Each of the plurality of capsules may be connected in multiple levels using shared transformation matrices and a routing model.
[034] In an embodiment, an iterative dynamic routing model is used to introduce a coupling effect in which the agreement between lower level capsules (layer l) and higher level capsules (layer l+1) is maintained. In an example scenario, if the number of capsules with low level features at layer l is "m", and the number of capsules at layer (l+1) is "n", then for a capsule j at layer (l+1) the output vector can be computed by:
vj = g( Σi cij Ws ui )
where cij is the coupling coefficient between capsule i of layer l and capsule j of layer (l+1), determined by iterative dynamic routing, and Ws is the shared weight matrix between the layers l and l+1. In an embodiment, a softmax function is utilized for the computations. The softmax function is used over all the b's to determine the connection strength between the capsules.
The routing process can be interpreted as computing soft attention between lower and higher level capsules.
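A sketch of this routing loop, following the standard dynamic-routing recipe and assumed (not guaranteed) to match the routing model of the disclosure: prediction vectors Ws·ui link children to parents, the logits bij start equal, the coupling coefficients cij are a softmax over them, and the agreement between predictions and parent outputs refines the logits over r rounds.

```python
import torch

def squash(s, dim=-1, eps=1e-8):
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

def dynamic_routing(u, W_s, num_parents, rounds=3):
    # u: (num_children, d_child); W_s: (num_parents, d_child, d_parent)
    u_hat = torch.einsum('id,jdk->ijk', u, W_s)        # prediction vectors u_hat_(j|i) = W_s u_i
    b = torch.zeros(u.shape[0], num_parents)            # logits b_ij, initially equal
    for _ in range(rounds):
        c = torch.softmax(b, dim=1)                     # coupling coefficients c_ij
        s = torch.einsum('ij,ijk->jk', c, u_hat)        # weighted sum of predictions per parent
        v = squash(s)                                   # parent capsule outputs
        b = b + torch.einsum('ijk,jk->ij', u_hat, v)    # agreement updates the logits
    return v

children = torch.randn(240, 15)              # e.g. N x C flattened child capsules of dimension 15
W_s = torch.randn(8, 15, 20)                 # shared transformation towards 8 parent capsules
parents = dynamic_routing(children, W_s, num_parents=8, rounds=3)
print(parents.shape)                         # torch.Size([8, 20])
```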
[035] At 210, the method 202 includes computing, by the Convolutional Capsule Layer 308, a final sentence representation for the sentence. In the Convolutional Capsule Layer 308, the capsules are connected to the lower level capsules, which determines the child-parent relationship, by multiplying with the shared transformation matrices followed by the routing algorithm. In an embodiment, the final sentence representation is calculated by determining the coupling strength between child-parent pairs of contextual capsules
û(j|i) = Ws ui
where ui is the child capsule and Ws is the shared weight between capsules i and j.
[036] Finally, the coupling strength between the child-parent capsules is determined by the routing algorithm to produce the parent feature map. The coupling coefficients cij are calculated iteratively over 'r' rounds by:
cij = exp(bij) / Σk exp(bik)
[037] The logits bij, which are initially equal, determine how strongly the capsules j should be coupled with capsule i. The capsules are then flattened out into a single layer and multiplied by a transformation matrix WFC, followed by the routing algorithm, to compute the final sentence representation (sk).
[038] At 212 of method 202, class probabilities of the final sentence representation are calculated by passing the final sentence representation through the softmax layer 310 for classification of the sentence.
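For orientation, the sketch below strings the last two steps together: the convolutional-capsule outputs are flattened into a single layer, transformed towards a final sentence representation (routing omitted here for brevity), and passed through a softmax classification layer. Layer sizes, the tanh placeholder non-linearity, and the three-class head are illustrative assumptions.

```python
import torch
import torch.nn as nn

num_caps, cap_dim, num_classes = 8, 20, 3
conv_caps_out = torch.randn(num_caps, cap_dim)           # parent capsules from the conv-capsule layer

flat = conv_caps_out.reshape(1, -1)                      # flatten into a single capsule layer
W_FC = nn.Linear(num_caps * cap_dim, 64)                 # stands in for the W_FC transformation (+ routing)
softmax_layer = nn.Linear(64, num_classes)               # classification layer

s_k = torch.tanh(W_FC(flat))                             # final sentence representation s_k
class_probs = torch.softmax(softmax_layer(s_k), dim=-1)  # e.g. P(OAG), P(CAG), P(NAG)
print(class_probs)
```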
[039] FIG. 5 is a block diagram of an exemplary computer system 501 for implementing embodiments consistent with the present disclosure. The computer system 501 may be implemented alone or in combination with components of the system 102 (FIG. 1). Variations of computer system 501 may be used for implementing the devices included in this disclosure. Computer system 501 may comprise a central processing unit ("CPU" or "hardware processor") 502. The hardware processor 502 may comprise at least one data processor for executing program components for executing user- or system-generated requests. The processor may include specialized processing units such as integrated system (bus) controllers, memory management control units, floating point units, graphics processing units, digital signal processing units, etc. The processor may include a microprocessor, such as AMD AthlonTM, DuronTM or OpteronTM, ARM's application, embedded or secure processors, IBM PowerPCTM, Intel's Core, ItaniumTM, XeonTM, CeleronTM or other line of processors, etc. The processor 502 may be implemented using mainframe, distributed processor, multi-core, parallel, grid, or other architectures. Some

embodiments may utilize embedded technologies like application specific integrated circuits (ASICs), digital signal processors (DSPs), Field Programmable Gate Arrays (FPGAs), etc.
[040] Processor 502 may be disposed in communication with one or more input/output (I/O) devices via I/O interface 503. The I/O interface 503 may employ communication protocols/methods such as, without limitation, audio, analog, digital, monoaural, RCA, stereo, IEEE-1394, serial bus, universal serial bus (USB), infrared, PS/2, BNC, coaxial, component, composite, digital visual interface (DVI), high-definition multimedia interface (HDMI), RF antennas, S-Video, VGA, IEEE 802.11 a/b/g/n/x, Bluetooth, cellular (e.g., code-division multiple access (CDMA), high-speed packet access (HSPA+), global system for mobile communications (GSM), long-term evolution (LTE), WiMax, or the like), etc.
[041] Using the I/O interface 503, the computer system 501 may communicate with one or more I/O devices. For example, the input device 504 may be an antenna, keyboard, mouse, joystick, (infrared) remote control, camera, card reader, fax machine, dongle, biometric reader, microphone, touch screen, touchpad, trackball, sensor (e.g., accelerometer, light sensor, GPS, gyroscope, proximity sensor, or the like), stylus, scanner, storage device, transceiver, video device/source, visors, etc.
[042] Output device 505 may be a printer, fax machine, video display (e.g., cathode ray tube (CRT), liquid crystal display (LCD), light-emitting diode (LED), plasma, or the like), audio speaker, etc. In some embodiments, a transceiver 506 may be disposed in connection with the processor 502. The transceiver may facilitate various types of wireless transmission or reception. For example, the transceiver may include an antenna operatively connected to a transceiver chip (e.g., Texas Instruments WiLink WL1283, Broadcom BCM4750IUB8, Infineon Technologies X-Gold 618-PMB9800, or the like), providing IEEE 802.11a/b/g/n, Bluetooth, FM, global positioning system (GPS), 2G/3G HSDPA/HSUPA communications, etc.

[043] In some embodiments, the processor 502 may be disposed in communication with a communication network 508 via a network interface 507. The network interface 507 may communicate with the communication network 508. The network interface may employ connection protocols including, without limitation, direct connect, Ethernet (e.g., twisted pair 10/100/1000 Base T), transmission control protocol/internet protocol (TCP/IP), token ring, IEEE 802.11a/b/g/n/x, etc. The communication network 508 may include, without limitation, a direct interconnection, local area network (LAN), wide area network (WAN), wireless network (e.g., using Wireless Application Protocol), the Internet, etc. Using the network interface 507 and the communication network 508, the computer system 501 may communicate with devices 509 and 510. These devices may include, without limitation, personal computer(s), server(s), fax machines, printers, scanners, various mobile devices such as cellular telephones, smartphones (e.g., Apple iPhone, Blackberry, Android-based phones, etc.), tablet computers, eBook readers (Amazon Kindle, Nook, etc.), laptop computers, notebooks, gaming consoles (Microsoft Xbox, Nintendo DS, Sony PlayStation, etc.), or the like. In some embodiments, the computer system 501 may itself embody one or more of these devices.
[044] In some embodiments, the processor 502 may be disposed in communication with one or more memory devices (e.g., RAM 513, ROM 514, etc.) via a storage interface 512. The storage interface may connect to memory devices including, without limitation, memory drives, removable disc drives, etc., employing connection protocols such as serial advanced technology attachment (SATA), integrated drive electronics (IDE), IEEE-1394, universal serial bus (USB), fiber channel, small computer systems interface (SCSI), etc. The memory drives may further include a drum, magnetic disc drive, magneto-optical drive, optical drive, redundant array of independent discs (RAID), solid-state memory devices, solid-state drives, etc. Variations of memory devices may be used for implementing, for example, any databases utilized in this disclosure.
[045] The memory devices may store a collection of program or database components, including, without limitation, an operating system 516, user interface

application 517, user/application data 518 (e.g., any data variables or data records discussed in this disclosure), etc. The operating system 516 may facilitate resource management and operation of the computer system 501. Examples of operating systems include, without limitation, Apple Macintosh OS X, Unix, Unix-like system distributions (e.g., Berkeley Software Distribution (BSD), FreeBSD, NetBSD, OpenBSD, etc.), Linux distributions (e.g., Red Hat, Ubuntu, Kubuntu, etc.), IBM OS/2, Microsoft Windows (XP, Vista/7/8, etc.), Apple iOS, Google Android, Blackberry OS, or the like. User interface 517 may facilitate display, execution, interaction, manipulation, or operation of program components through textual or graphical facilities. For example, user interfaces may provide computer interaction interface elements on a display system operatively connected to the computer system 501, such as cursors, icons, check boxes, menus, scrollers, windows, widgets, etc. Graphical user interfaces (GUIs) may be employed, including, without limitation, Apple Macintosh operating systems’ Aqua, IBM OS/2, Microsoft Windows (e.g., Aero, Metro, etc.), Unix X-Windows, web interface libraries (e.g., ActiveX, Java, Javascript, AJAX, HTML, Adobe Flash, etc.), or the like.
[046] In some embodiments, computer system 501 may store user/application data 518, such as the data, variables, records, etc. as described in this disclosure. Such databases may be implemented as fault-tolerant, relational, scalable, secure databases such as Oracle or Sybase. Alternatively, such databases may be implemented using standardized data structures, such as an array, hash, linked list, structured text file (e.g., XML), table, or as object-oriented databases (e.g., using ObjectStore, Poet, Zope, etc.). Such databases may be consolidated or distributed, sometimes among various computer systems discussed above. It is to be understood that the structure and operation of any computer or database component may be combined, consolidated, or distributed in any working combination.
[047] Additionally, in some embodiments, the server, messaging and instructions transmitted or received may emanate from hardware, including operating system, and program code (i.e., application code) residing in a cloud

implementation. Further, it should be noted that one or more of the systems and methods provided herein may be suitable for cloud-based implementation. For example, in some embodiments, some or all of the data used in the disclosed methods may be sourced from or stored on any cloud computing platform.
[048] An example scenario depicting the results of sentence classification performed by the disclosed system is described further. Herein, in the disclosed scenario, the sentence classification is performed to classify the comments into the following categories, Overtly Aggressive (OAG), Covertly Aggressive (CAG), and Non-aggressive (NAG).
[049] In the example scenario, the KaggleTM dataset was used for determining the sentence classification by the disclosed system. The dataset is made up of Wikipedia talk page comments and was contributed by Conversation AI. Each comment has a multi-class label, and there are a total of 6 classes, namely toxic, severe toxic, obscene, threat, insult and identity hate. The data (159,571 sentences) was split into training (90%) and validation (10%) sets, with a further 153,164 test sentences.
[050] Another dataset, the TRACTM dataset, was also considered. TRACTM is a dataset for aggression identification and contains 15,000 comments in both Hindi and English.
[051] As a preprocessing step, case folding of all the words and removal of punctuation was performed. The code used for tokenization properly separates the word tokens and special characters. For training the classification models, fastText embeddings of dimension 300, trained on a common crawl, were used. For out of vocabulary (OOV) words, the embeddings were randomly initialized. For feature extraction, 200 LSTM units were used, each for capturing forward and backward contexts (a total of 400). 20 capsules of dimension 15 and another 20 of dimension 20 were used for all the experiments. The number of routings was kept at 3, as more routings could introduce overfitting.
[052] To further avoid overfitting, the dropout values were adjusted to 0.4. Cross-entropy was used as the loss function and Adam as an optimizer (with

default values) for all the models. All the hyperparameter values were obtained by tuning several models on the validation set and then finally selecting the model with the minimum validation loss.
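The experimental settings reported in the two preceding paragraphs can be summarized as a configuration sketch; the dictionary keys and the dummy model are illustrative assumptions, while the values themselves come from the text.

```python
import torch

config = {
    "embedding_dim": 300,               # fastText embeddings trained on a common crawl
    "lstm_units": 200,                  # per direction (400 in total)
    "capsules": [(20, 15), (20, 20)],   # (count, dimension) per capsule group
    "routing_iterations": 3,            # more rounds could introduce overfitting
    "dropout": 0.4,
    "num_classes": 3,                   # OAG / CAG / NAG for the TRAC task
}

model = torch.nn.Sequential(torch.nn.Linear(10, config["num_classes"]))  # placeholder for the capsule network
criterion = torch.nn.CrossEntropyLoss()             # cross-entropy loss
optimizer = torch.optim.Adam(model.parameters())    # Adam with default values
```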
[053] The results were reported on a total of 3 datasets, two of which belong to the TRAC-1 dataset. The evaluation metric for TRAC-1 is the F1 score, while for the Kaggle dataset it is ROC-AUC. As illustrated in Table 1, the disclosed system performed better on all the datasets except the TRAC Twitter data, on which the model could not beat the previous Capsule Network.
Table 1: Results of various architectures on publicly available datasets

Model | Kaggle Toxic Comment Classification (ROC-AUC) | TRAC Twitter English (F1-Score) | TRAC Facebook English (F1-Score)
Vanilla CNN | 96.615 | 53.006 | 58.44
Bi-LSTM | 97.357 | 54.147 | 61.223
Attention Networks | 97.425 | 55.67 | 62.404
Hierarchical CNN | 97.952 | 53.169 | 58.942
Bi-LSTM with Maxpool | 98.209 | 53.391 | 62.02
Bi-LSTM and Logistic Regression | 98.011 | 53.722 | 61.478
Pretrained LSTMs | 98.05 | 53.166 | 62.9
CNN-Capsule | 97.888 | 54.82 | 60.09
LSTM-Capsule | 98.21 | 58.6 | 62.032
Disclosed system | 98.464 | 57.953 | 63.532
[054] Some very strong and some recent baseline algorithms were used for comparing the results. Some of the examples on which the model makes mistakes were analysed. Samples were picked from the TRAC Facebook English dataset. For the analysis, LIME was used, which performs perturbations on the input data to understand the relationship between the input and the output. It uses a local interpretable model to approximate the model in question and tries to create explanations of the input data.
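A sketch of how such a LIME analysis can be run with the lime package; the predict_proba function is a dummy stand-in for the trained classifier, and the class names follow the TRAC labels used in this example scenario.

```python
import numpy as np
from lime.lime_text import LimeTextExplainer

class_names = ["OAG", "CAG", "NAG"]

def predict_proba(texts):
    # placeholder for the trained capsule-network classifier:
    # it must return an array of shape (len(texts), num_classes)
    return np.tile([0.2, 0.3, 0.5], (len(texts), 1))

explainer = LimeTextExplainer(class_names=class_names)
explanation = explainer.explain_instance("example comment to analyse",
                                         predict_proba, num_features=6)
print(explanation.as_list())    # (word, weight) pairs behind the local prediction
```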

[055] From the confusion matrix (illustrated in FIG. 6), it was observed that the model gets most confused by predicting CAG comments as NAG. This can be because the words used in the sentence might not sound aggressive and the model labels them as neutral sentences. However, in reality, the sentence as a whole is a sarcastic one. For example, refer to FIG. 7, which goes wrong because the words it is focusing on are all neutral words, but when combined, the sentence is sarcasm about bridging the gap between the poor and the middle class. Secondly, the model is also incorrectly predicting NAG and OAG comments as CAG equally; this is because there are certain comments against the government which are mostly present in the CAG class. Refer to FIG. 9 and FIG. 7: in these comments, the government or some government official is being criticized, the attack is not directly pointed, and there is hidden aggression.
[056] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[057] Various embodiments disclosed herein provide a method and system for sentence classification using multi-dimension capsule networks. In an embodiment, a multi-dimension capsule network is employed for performing sentence classification. An important contribution of the disclosed embodiments is that the dimensions of the capsules in the capsule network can be changed, and by varying the dimensions of the capsules, various sets of instantiation parameters can be derived from the context vectors. These sets of instantiation parameters facilitate robust classification of the sentences. For example, for a sentence, in certain scenarios ten features may be enough, while in some other scenarios the aforementioned number of features may not be enough for the purpose of classification. Thus, an exact dimension 'd' of a capsule may not be determined. By allowing the capsules to have different dimensions, the disclosed system and

method can capture requisite features for classification. In case the dimensions are kept very large, the back-propagation can identify features extracted therefrom as redundant and may ignore such features for sentence classification.
[058] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[059] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[060] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed.

These examples are presented herein for purposes of illustration, and not
limitation. Further, the boundaries of the functional building blocks have been
arbitrarily defined herein for the convenience of the description. Alternative
boundaries can be defined so long as the specified functions and relationships
thereof are appropriately performed. Alternatives (including equivalents,
extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[061] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[062] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.

We Claim:
1. A processor implemented method for sentence classification, comprising:
employing, a Multi Dimension Capsule Network for classification of a sentence via one or more hardware processors, the Multi Dimension Capsule Network comprising a word embedding layer, a feature extraction layer composed of a plurality of Bi-directional Long short term memory (Bi-LSTMs), a primary capsule layer, a convolutional capsule layer and a softmax layer, wherein employing comprises:
determining, by the word embedding layer, an initial sentence representation for the sentence via the one or more hardware processors, the initial sentence representation comprising a concatenated embedding vector associated with a plurality of words of the sentence;
encoding, by the plurality of Bi-LSTMs, contextual semantics between the plurality of words of the sentence using the concatenated embedding vector to obtain a plurality of context vectors (Ci), via the one or more hardware processors;
obtaining, from the plurality of context vectors (Ci), a plurality of capsules associated with distinct dimensions by a filter of the primary capsule layer, via the one or more hardware processors, each capsule of the plurality of capsules comprising a set of instantiated parameters obtained from the plurality of context vectors (Ci), each of the plurality of capsules connected in multiple levels using shared transformation matrices and a routing model;
computing, by the Convolutional Capsule Layer, a final sentence representation for the sentence by determining coupling strength between child-parent pair capsules of the plurality of capsules connected in the multiple levels, via the one or more hardware processors; and
calculating, via the one or more hardware processors, class probabilities of the final sentence representation by passing the final sentence representation through the softmax layer for classification of the sentence.

2. The processor implemented method of claim 1, wherein the concatenated embedding vector comprises a fixed-length vector corresponding to the sentence and is representative of lexical-semantics of words of the sentence.
3. The processor implemented method of claim 1, wherein a context vector of the plurality of context vectors associated with a word comprises a right context and a left context between adjacent words of the sentence.
4. The processor implemented method of claim 1, wherein the filter (Wb) multiplies a context vector (Ci) with a shared window to obtain a capsule pi,
where, pi = g(WbCi),
Wb ∈ R^((2 × dsen) × d),
g is a non-linear squash function,
d is the dimension of the capsule (pi), and
dsen is a number of the plurality of Bi-LSTMs used to encode the plurality of context vectors.
5. A system (501) for sentence classification comprising:
one or more memories (504); and
one or more first hardware processors (502), the one or more first memories (504) coupled to the one or more first hardware processors (502), wherein the one or more first hardware processors (502) are configured to execute programmed instructions stored in the one or more first memories (504), to:
employ, a Multi Dimension Capsule Network for classification of a sentence, the Multi Dimension Capsule Network comprising a word embedding layer, a feature extraction layer composed of a plurality of Bi-directional Long short term memory (Bi-LSTMs), a primary capsule layer, a convolutional capsule layer and a softmax layer, wherein employing comprises:

determine, by the word embedding layer, an initial sentence representation for the sentence, the initial sentence representation comprising a concatenated embedding vector associated with a plurality of words of the sentence;
encode, by the plurality of Bi-LSTMs, contextual semantics between the plurality of words of the sentence using the concatenated embedding vector to obtain a plurality of context vectors (Ci);
obtain, from the plurality of context vectors (Ci), a plurality of capsules associated with distinct dimensions by a filter of the primary capsule layer, each capsule of the plurality of capsules comprising a set of instantiated parameters obtained from the plurality of context vectors (Ci), each of the plurality of capsules connected in multiple levels using shared transformation matrices and a routing model;
compute, by the Convolutional Capsule Layer, a final sentence representation for the sentence by determining coupling strength between child-parent pair capsules of the plurality of capsules connected in the multiple levels; and
calculate class probabilities of the final sentence representation by passing the final sentence representation through the softmax layer for classification of the sentence.
6. The system of claim 5, wherein the concatenated embedding vector comprises a fixed-length vector corresponding to the sentence and is representative of lexical-semantics of words of the sentence.
7. The system of claim 5, wherein a context vector of the plurality of context vectors associated with a word comprises a right context and a left context between adjacent words of the sentence.
8. The system of claim 5, wherein the filter (Wb) multiplies a context vector (Ci) with a shared window to obtain a capsule pi,
where, pi = g(WbCi),
Wb ∈ R^((2 × dsen) × d),
g is a non-linear squash function,
d is the dimension of the capsule (pi), and
dsen is a number of the plurality of Bi-LSTMs used to encode the plurality of context vectors.

Documents

Application Documents

# Name Date
1 201921030331-IntimationOfGrant17-05-2024.pdf 2024-05-17
2 201921030331-PatentCertificate17-05-2024.pdf 2024-05-17
3 201921030331-CLAIMS [19-11-2021(online)].pdf 2021-11-19
4 201921030331-FER_SER_REPLY [19-11-2021(online)].pdf 2021-11-19
5 201921030331-OTHERS [19-11-2021(online)].pdf 2021-11-19
6 201921030331-FER.pdf 2021-10-19
7 201921030331-ORIGINAL UR 6(1A) FORM 26-181119.pdf 2019-11-20
8 201921030331-ORIGINAL UR 6(1A) FORM 1-141119.pdf 2019-11-16
9 201921030331-FORM-26 [15-11-2019(online)].pdf 2019-11-15
10 201921030331-Proof of Right (MANDATORY) [12-11-2019(online)].pdf 2019-11-12
11 Abstract1.jpg 2019-10-25
12 201921030331-STATEMENT OF UNDERTAKING (FORM 3) [26-07-2019(online)].pdf 2019-07-26
13 201921030331-REQUEST FOR EXAMINATION (FORM-18) [26-07-2019(online)].pdf 2019-07-26
14 201921030331-FORM 18 [26-07-2019(online)].pdf 2019-07-26
15 201921030331-FORM 1 [26-07-2019(online)].pdf 2019-07-26
16 201921030331-FIGURE OF ABSTRACT [26-07-2019(online)].jpg 2019-07-26
17 201921030331-DRAWINGS [26-07-2019(online)].pdf 2019-07-26
18 201921030331-DECLARATION OF INVENTORSHIP (FORM 5) [26-07-2019(online)].pdf 2019-07-26
19 201921030331-COMPLETE SPECIFICATION [26-07-2019(online)].pdf 2019-07-26

Search Strategy

1 2021-05-1012-36-59E_10-05-2021.pdf

ERegister / Renewals

3rd: 31 May 2024

From 26/07/2021 - To 26/07/2022

4th: 31 May 2024

From 26/07/2022 - To 26/07/2023

5th: 31 May 2024

From 26/07/2023 - To 26/07/2024

6th: 31 May 2024

From 26/07/2024 - To 26/07/2025

7th: 06 Jun 2025

From 26/07/2025 - To 26/07/2026