

System And Method For Identification And Classification Of Multilingual Messages In An Online Interactive Portal

Abstract: The present disclosure provides a system and method for identification and classification of multilingual messages that would be considered inappropriate in an online interactive portal. The system may include one or more processors to generate a set of data of intended inappropriate multilingual messages to train a classification model. The set of data is classified with labels by assigning unique identifiers. The system includes a pre-processing module to eliminate unwanted characters from the set of data used to train the classification model. The classification model may be trained by a multilingual representation module based at least in part on the set of data with the labels. The classification model determines whether the set of data with one or more labels includes intended inappropriate multilingual messages. Furthermore, a feedback loop module is utilised to retrain the classification model recurrently to update the set of data. The system is formed on a Convolutional Neural Network (CNN) configured to classify multilingual messages as inappropriate in the online interactive portal.
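
The pre-processing step described above can be illustrated with a minimal sketch. It assumes the unwanted characters are the ones listed in claim 7 (punctuation marks, links/hyperlinks, extra spaces, and numbers). The function name `clean_message` and the regular expressions are hypothetical and are not taken from the specification.

```python
import re
import string

# Hypothetical patterns; the specification does not give exact rules.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
DIGIT_RE = re.compile(r"\d+")   # in Python 3 this also matches non-ASCII digits
SPACE_RE = re.compile(r"\s+")
# Translation table mapping every ASCII punctuation mark to a space.
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def clean_message(text: str) -> str:
    """Strip links, ASCII punctuation, numbers, and extra spaces."""
    text = URL_RE.sub(" ", text)         # remove links first, while "://" is intact
    text = text.translate(PUNCT_TABLE)   # punctuation marks become spaces
    text = DIGIT_RE.sub(" ", text)       # drop numbers
    return SPACE_RE.sub(" ", text).strip()  # collapse runs of whitespace


if __name__ == "__main__":
    print(clean_message("Visit https://example.com NOW!!!  call 12345"))
```

Links are removed before punctuation so that a URL is deleted as a whole rather than split into fragments. Only ASCII punctuation is handled here; script-specific marks such as the Devanagari danda would need to be added to the table separately.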


Patent Information

Application #

Filing Date

01 August 2024

Publication Number

32/2024

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Email

Parent Application

Applicants

EXTRAMARKS EDUCATION INDIA PVT. LTD.

D-180, Sector-63 Noida, Uttar Pradesh 201301

Inventors

1. KULSHRESTHA, Ritvik

Extramarks Education India Pvt. Ltd., D-180, Sector-63 Noida, Uttar Pradesh 201301

2. SHARMA, Gaurav

Extramarks Education India Pvt. Ltd., D-180, Sector-63 Noida, Uttar Pradesh 201301

3. DWIVEDI, Deep

Extramarks Education India Pvt. Ltd., D-180, Sector-63 Noida, Uttar Pradesh 201301

4. DAS, Abhra

Extramarks Education India Pvt. Ltd., D-180, Sector-63 Noida, Uttar Pradesh 201301

5. GADHAWAL, Suman

Extramarks Education India Pvt. Ltd., D-180, Sector-63 Noida, Uttar Pradesh 201301

6. TRIPATHI, Vipin

Extramarks Education India Pvt. Ltd., D-180, Sector-63 Noida, Uttar Pradesh 201301

Specification

WE CLAIM:
1. A system for identification and classification of multilingual messages that would be
considered inappropriate in an online interactive portal, the system comprising:
one or more processors to:
generate a set of data of intended inappropriate multilingual messages to train a
classification model;
classify the set of data with one or more labels by assigning one or more unique
identifiers;
a pre-processing module, communicatively coupled to the one or more processors,
to:
eliminate one or more unwanted characters from the set of data to train the
classification model;
a multilingual representation module, communicatively coupled to the one or more
processors and the pre-processing module, to:
train the classification model based at least in part on the set of data with the one
or more labels based on the one or more unique identifiers, the classification model being
generated by at least one deep-learning-based approach;
determine by the classification model whether the set of data with the one or more
labels includes the intended inappropriate multilingual messages, based at least in part on the
unique identifiers;
a feedback loop module, communicatively coupled to the multilingual
representation module, to:
retrain the classification model recurrently to update the set of data;
a memory storing computer-executable instructions, communicatively coupled to the one or more processors, to:
store at least the set of data with the one or more labels based on the one or more
unique identifiers to train the classification model, and
wherein the system is formed on a Convolutional Neural Network (CNN)
configured to classify the multilingual messages as inappropriate in the online interactive
portal.
2. The system according to claim 1, wherein the multilingual messages in the online interactive
portal are a set of text posted on at least one of a message board forum, a chatbot, a blog, or an
article.
3. The system according to claim 1, wherein the set of data comprises the multilingual messages
in various vernacular dialects such as Hindi, English, etc.
4. The system according to claim 1, wherein the classification of the set of data with the one or
more labels indicates that the set of text includes at least one of violence-inducing messages,
offensive, hate speech, or profanity.
5. The system according to claim 1, wherein the computer-executable instructions, when
executed by the one or more processors, further classify the set of data with the one or more
labels based on the one or more unique identifiers, wherein the one or more labels further
comprise:
a class label based on the one or more unique identifiers configured to classify the
set of data as offensive or non-offensive;
a flag label based on the one or more unique identifiers configured to classify
whether the set of data includes non-roman alphabets/numerals or is entirely
incomprehensible; and
a language label based on the one or more unique identifiers configured to classify
the set of data in the various vernacular dialects such as Hindi, English, etc.
6. The system according to claim 1, wherein each of the unique identifiers associated with the one or
more labels of the set of data is a numerical value (e.g., 0, 1, or 2).
7. The system according to claim 1, wherein the unwanted characters eliminated from the set
of data include at least one of punctuation marks, links/hyperlinks, extra spaces, numbers, etc.
8. The system according to claim 1, wherein the classification model based on the set of data
of the intended inappropriate multilingual messages is generated by using the various deep-learning-based approaches such as Support Vector Machines (SVM), K Nearest Neighbors
(KNN), Decision Tree (DT), Random Forest (RF), etc.
9. The system according to claim 1, wherein the feedback loop module retrains the
classification model recurrently to update the set of data on the basis of a confidence value, wherein the
confidence value ranges from 0 to 1,
wherein the feedback loop module updates the set of data to retrain the
classification model when the confidence value is greater than 0.9, and
wherein the feedback loop module verifies the intended inappropriate multilingual
message by an administrator before updating the set of data to retrain the classification
model when the confidence value is less than 0.9.
10. A method for identification and classification of multilingual messages that would be
considered inappropriate in an online interactive portal, the method comprising:
generating, via one or more processors, a set of data of intended inappropriate
multilingual messages to train a classification model;
classifying, via the one or more processors, the set of data with one or more labels
by assigning one or more unique identifiers;
eliminating, via a pre-processing module, one or more unwanted characters from
the set of data to train the classification model;
training, via a multilingual representation module, the classification model based
at least in part on the set of data with the one or more labels based on the one or more
unique identifiers, the classification model being generated by at least one deep-learning-based approach;
determining, via the classification model, whether the set of data with the one or more labels includes the intended inappropriate multilingual messages, based at least in part
on the unique identifiers;
retraining, via a feedback loop module, the classification model recurrently to
update the set of data;
storing, in a memory, at least the set of data with the one or more labels based on
the one or more unique identifiers to train the classification model, and
wherein the method is formed on a Convolutional Neural Network (CNN)
configured to classify the multilingual messages as inappropriate in the online interactive
portal.
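
The labelling scheme of claims 5 and 6 and the confidence-based feedback routing of claim 9 can be sketched as follows. This is an illustrative sketch only: the enum members, their numeric values, the `LabelledMessage` record, and the `route_prediction` helper are hypothetical names, and the claims fix neither which number maps to which label nor what happens when the confidence is exactly 0.9 (this sketch sends that case to administrator review).

```python
from dataclasses import dataclass
from enum import IntEnum


# Numeric identifiers per claim 6 (e.g., 0, 1, or 2); the mapping is assumed.
class ClassLabel(IntEnum):
    NON_OFFENSIVE = 0
    OFFENSIVE = 1


class FlagLabel(IntEnum):
    NONE = 0               # assumed "no flag" value, not stated in the claims
    NON_ROMAN = 1          # contains non-roman alphabets/numerals
    INCOMPREHENSIBLE = 2   # entirely incomprehensible


class LanguageLabel(IntEnum):
    ENGLISH = 0
    HINDI = 1
    OTHER = 2              # assumed catch-all for the "etc." in the claims


@dataclass(frozen=True)
class LabelledMessage:
    text: str
    class_label: ClassLabel
    flag_label: FlagLabel
    language_label: LanguageLabel


CONFIDENCE_THRESHOLD = 0.9  # threshold named in claim 9


def route_prediction(message, confidence, training_set, review_queue):
    """Add confident predictions to the training set; queue the rest for review."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in the range 0 to 1")
    if confidence > CONFIDENCE_THRESHOLD:
        training_set.append(message)   # used directly for retraining
        return "auto_added"
    review_queue.append(message)       # an administrator verifies before retraining
    return "needs_admin_review"


if __name__ == "__main__":
    training, queue = [], []
    msg = LabelledMessage("sample text", ClassLabel.OFFENSIVE,
                          FlagLabel.NONE, LanguageLabel.ENGLISH)
    print(route_prediction(msg, 0.95, training, queue))
    print(route_prediction(msg, 0.40, training, queue))
```

Using `IntEnum` keeps the identifiers numeric, as claim 6 describes, while still giving each value a readable name in code.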

Documents

Application Documents

#	Name	Date
1	202447058332-STATEMENT OF UNDERTAKING (FORM 3) [01-08-2024(online)].pdf	2024-08-01
2	202447058332-REQUEST FOR EXAMINATION (FORM-18) [01-08-2024(online)].pdf	2024-08-01
3	202447058332-FORM 18 [01-08-2024(online)].pdf	2024-08-01
4	202447058332-FORM 1 [01-08-2024(online)].pdf	2024-08-01
5	202447058332-DRAWINGS [01-08-2024(online)].pdf	2024-08-01
6	202447058332-DECLARATION OF INVENTORSHIP (FORM 5) [01-08-2024(online)].pdf	2024-08-01
7	202447058332-COMPLETE SPECIFICATION [01-08-2024(online)].pdf	2024-08-01
8	202447058332-FORM-26 [24-10-2024(online)].pdf	2024-10-24
9	202447058332-Proof of Right [04-12-2024(online)].pdf	2024-12-04
10	202447058332-RELEVANT DOCUMENTS [02-01-2025(online)].pdf	2025-01-02
11	202447058332-RELEVANT DOCUMENTS [02-01-2025(online)]-1.pdf	2025-01-02
12	202447058332-POA [02-01-2025(online)].pdf	2025-01-02
13	202447058332-POA [02-01-2025(online)]-1.pdf	2025-01-02
14	202447058332-FORM 3 [02-01-2025(online)].pdf	2025-01-02
15	202447058332-FORM 13 [02-01-2025(online)].pdf	2025-01-02
16	202447058332-FORM 13 [02-01-2025(online)]-1.pdf	2025-01-02
17	202447058332-FER.pdf	2025-08-22
18	202447058332-FORM 3 [22-10-2025(online)].pdf	2025-10-22

Search Strategy

1	202447058332_SearchStrategyNew_E_SearchHistory(41)E_20-05-2025.pdf