System And Method To Evaluate Authenticity Of A Message


Abstract: A system (10) to evaluate authenticity of a message is disclosed. The system includes a processing subsystem including a data acquisition module to acquire at least one screenshot of the message from a user. The processing subsystem includes a data extraction module to extract textual data from the at least one screenshot. The processing subsystem includes a content analysis module to preprocess the textual data to obtain a processed data. The content analysis module is to convert the processed data into numerical vectors. The content analysis module is to identify linguistic patterns by feeding the numerical vectors to an attention based transformer model. The content analysis module is to evaluate a confidence score upon comparing the linguistic patterns with historical linguistic patterns of a plurality of fraudulent messages. The processing subsystem includes a classification module to classify the message as a fraudulent message when the confidence score is below a predefined threshold. FIG. 1


Patent Information

Application #

Filing Date

15 December 2023

Publication Number

27/2025

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Applicants

PAVAN KUMAR

04020 TOWER-PINE, MAHAGUN MYWOODS, SECTOR16-C, GR. NOIDA, G. B. NAGAR, UTTAR PRADESH- 201308, INDIA

Inventors

1. PAVAN KUMAR

04020 TOWER-PINE, MAHAGUN MYWOODS, SECTOR16-C, GR. NOIDA, G. B. NAGAR, UTTAR PRADESH- 201308, INDIA

Specification

Description:FIELD OF INVENTION
[0001] Embodiments of the present disclosure relate to a field of data processing and more particularly to a system and a method to evaluate authenticity of a message.
BACKGROUND
[0002] Short message service and introduction of various messaging applications have revolutionized communication, enabling individuals and businesses to exchange information. However, fraudulent and malicious activities are increasing along with the convenience offered by the short message service and the messaging applications. The fraudulent and malicious activities include phishing, scams, and misinformation campaigns conducted through the short message service and the messaging applications.
[0003] The users often receive fraudulent messages which may appear legitimate but are attempts to deceive, defraud, and harm the users. The fraudulent messages may contain malicious links, personal information requests, and misleading content, posing risks to privacy of the users, finances, and overall well-being. Currently, the users face difficulty in identifying the fraudulent messages since existing solutions rely heavily on manual intervention, which is time-consuming, cumbersome, and prone to errors.
[0004] Hence, there is a need for an improved system and method to evaluate authenticity of a message to address the aforementioned issue(s).
OBJECTIVE OF THE INVENTION
[0005] An objective of the invention is to provide a system and method to evaluate authenticity of a message utilizing natural language processing and machine learning.
BRIEF DESCRIPTION
[0006] In accordance with an embodiment of the present disclosure, a system to evaluate authenticity of a message is provided. The system includes a processing subsystem hosted on a server and configured to execute on a network to control bidirectional communications among a plurality of modules. The processing subsystem includes a data acquisition module configured to acquire at least one screenshot of the message from a user through a user interface associated with a user device. The processing subsystem also includes a data extraction module operatively coupled to the data acquisition module. The data extraction module is configured to extract textual data from the at least one screenshot using an image processing technique. The processing subsystem also includes a content analysis module operatively coupled to the data extraction module. The content analysis module is configured to preprocess the textual data using a natural language processing technique to obtain a processed data. The content analysis module is also configured to convert the processed data into one or more numerical vectors using a word embedding technique. The content analysis module is also configured to identify one or more linguistic patterns by feeding the one or more numerical vectors to an attention based transformer model. The content analysis module is further configured to evaluate a confidence score upon comparing the one or more linguistic patterns with one or more historical linguistic patterns of a plurality of fraudulent messages. The processing subsystem further includes a classification module operatively coupled to the content analysis module. The classification module is configured to classify the message as a fraudulent message when the confidence score is below a predefined threshold. 
The classification module is also configured to provide a predefined color code to the textual data upon classifying the message as the fraudulent message, thereby evaluating the authenticity of the message. In one embodiment, the classification module may increase an intensity of the color coding as the confidence score decreases and vice versa.
[0007] In accordance with another embodiment of the present disclosure, a method to evaluate authenticity of a message is provided. The method includes acquiring, by a data acquisition module, at least one screenshot of the message from a user through a user interface associated with a user device. The method also includes extracting, by a data extraction module, textual data from the at least one screenshot using an image processing technique. The method also includes preprocessing, by a content analysis module, the textual data using a natural language processing technique to obtain a processed data. The method also includes converting, by the content analysis module, the processed data into one or more numerical vectors using a word embedding technique. The method also includes identifying, by the content analysis module, one or more linguistic patterns by feeding the one or more numerical vectors to an attention based transformer model. The method also includes evaluating, by the content analysis module, a confidence score upon comparing the one or more linguistic patterns with one or more historical linguistic patterns of a plurality of fraudulent messages. The method also includes classifying, by a classification module, the message as a fraudulent message when the confidence score is below a predefined threshold. The method further includes providing, by a classification module, a predefined color code to the textual data upon classifying the message as the fraudulent message, thereby evaluating the authenticity of the message.
[0008] To further clarify the advantages and features of the present disclosure, a more particular description of the disclosure will follow by reference to specific embodiments thereof, which are illustrated in the appended figures. It is to be appreciated that these figures depict only typical embodiments of the disclosure and are therefore not to be considered limiting in scope. The disclosure will be described and explained with additional specificity and detail with the appended figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The disclosure will be described and explained with additional specificity and detail with the accompanying figures in which:
[0010] FIG. 1 is a block diagram representation of a system to evaluate authenticity of a message in accordance with an embodiment of the present disclosure;
[0011] FIG. 2 is a block diagram representation of one embodiment of the system of FIG. 1, in accordance with an embodiment of the present disclosure;
[0012] FIG. 3 is a schematic representation of an exemplary embodiment of the system of FIG. 1, in accordance with an embodiment of the present disclosure;
[0013] FIG. 4 is a block diagram of a computer or a server in accordance with an embodiment of the present disclosure; and
[0014] FIG. 5 is a flow chart representing the steps involved in a method to evaluate authenticity of a message in accordance with an embodiment of the present disclosure.
[0015] Further, those skilled in the art will appreciate that elements in the figures are illustrated for simplicity and may not have necessarily been drawn to scale. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the figures by conventional symbols, and the figures may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the figures with details that will be readily apparent to those skilled in the art having the benefit of the description herein.
DETAILED DESCRIPTION
[0016] For the purpose of promoting an understanding of the principles of the disclosure, reference will now be made to the embodiment illustrated in the figures and specific language will be used to describe them. It will nevertheless be understood that no limitation of the scope of the disclosure is thereby intended. Such alterations and further modifications in the illustrated system, and such further applications of the principles of the disclosure as would normally occur to those skilled in the art are to be construed as being within the scope of the present disclosure.
[0017] The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such a process or method. Similarly, one or more devices or sub-systems or elements or structures or components preceded by "comprises... a" does not, without more constraints, preclude the existence of other devices, sub-systems, elements, structures, components, additional devices, additional sub-systems, additional elements, additional structures, or additional components. Appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but not necessarily do, all refer to the same embodiment.
[0018] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this disclosure belongs. The system, methods, and examples provided herein are only illustrative and not intended to be limiting.
[0019] In the following specification and the claims, reference will be made to a number of terms, which shall be defined to have the following meanings. The singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise.
[0020] Embodiments of the present disclosure relate to a system and a method to evaluate authenticity of a message. The system includes a processing subsystem hosted on a server and configured to execute on a network to control bidirectional communications among a plurality of modules. The processing subsystem includes a data acquisition module configured to acquire at least one screenshot of the message from a user through a user interface associated with a user device. The processing subsystem also includes a data extraction module operatively coupled to the data acquisition module. The data extraction module is configured to extract textual data from the at least one screenshot using an image processing technique. The processing subsystem also includes a content analysis module operatively coupled to the data extraction module. The content analysis module is configured to preprocess the textual data using a natural language processing technique to obtain a processed data. The content analysis module is also configured to convert the processed data into one or more numerical vectors using a word embedding technique. The content analysis module is also configured to identify one or more linguistic patterns by feeding the one or more numerical vectors to an attention based transformer model. The content analysis module is further configured to evaluate a confidence score upon comparing the one or more linguistic patterns with one or more historical linguistic patterns of a plurality of fraudulent messages. The processing subsystem further includes a classification module operatively coupled to the content analysis module. The classification module is configured to classify the message as a fraudulent message when the confidence score is below a predefined threshold. 
The classification module is also configured to provide a predefined color code to the textual data upon classifying the message as the fraudulent message, thereby evaluating the authenticity of the message.
[0021] FIG. 1 is a block diagram representation of a system (10) to evaluate authenticity of a message in accordance with an embodiment of the present disclosure. The system (10) includes a processing subsystem (20) hosted on a server (30) and configured to execute on a network (40) to control bidirectional communications among a plurality of modules. In a specific embodiment, an integrated database (50) may be associated with the processing subsystem (20) to store data associated with the plurality of modules. In some embodiments, the integrated database (50) may include a structured query language database, a non-structured query language database, a columnar database and the like.
[0022] Further, in one embodiment, the server (30) may be a cloud-based server. In another embodiment, the server (30) may be a local server. In one example, the network (40) may be a private or public local area network (LAN) or wide area network (WAN), such as the Internet. In another embodiment, the network (40) may include both wired and wireless communications according to one or more standards and/or via one or more transport mediums.
[0023] Furthermore, in one example, the network (40) may include wireless communications according to one of the 802.11 or Bluetooth specification sets, or another standard or proprietary wireless communication protocol. In yet another embodiment, the network (40) may also include communications over a terrestrial cellular network, including, a GSM (global system for mobile communications), CDMA (code division multiple access), and/or EDGE (enhanced data for global evolution) network.
[0024] Additionally, the processing subsystem (20) includes a data acquisition module (60) configured to acquire at least one screenshot of the message from a user through a user interface associated with a user device. In one embodiment, the data acquisition module (60) may be configured to acquire the message from the user device directly through the network (40). In one embodiment, the user device may include, but is not limited to, a phone, a computer, a tablet, a personal digital assistant, and the like. In some embodiments, the message may include a WhatsApp message, a Facebook message, an email, a flash message, a text message and the like.
[0025] Moreover, for example, consider a scenario in which a user X receives a text message on the phone of the user X. The text message says ‘Congratulations! You won the gift card. Please click on the following link to claim your gift card’. The user X may capture the at least one screenshot of the message and the data acquisition module (60) may acquire the screenshot from the user X through the user interface associated with the phone of the user X.
[0026] Also, the processing subsystem (20) includes a data extraction module (70) operatively coupled to the data acquisition module (60). The data extraction module (70) is configured to extract textual data from the at least one screenshot using an image processing technique. In one embodiment, the image processing technique may include optical character recognition. In continuation with the ongoing example, the data extraction module (70) may extract textual data from the at least one screenshot as follows: ‘Congratulations! You won the gift card. Please click on the following link to claim your gift card’.
[0027] Further, the processing subsystem (20) includes a content analysis module (80) operatively coupled to the data extraction module (70). The content analysis module (80) is configured to preprocess the textual data using a natural language processing technique to obtain a processed data. In one embodiment, the natural language processing technique may include tokenization, stemming, and lemmatization. As used herein, tokenization is a process of breaking the textual data into tokens. The tokens may include words, phrases and symbols. As used herein, stemming is a process of reducing the words into respective base forms. As used herein, lemmatization is a process of reducing the base forms into dictionary forms.
[0028] Furthermore, in continuation with the ongoing example, the content analysis module (80) may tokenize the textual data as follows: 'Congratulations', 'You', 'won', 'the', 'gift card', 'Please', 'click', 'on', 'the', 'following', 'link', 'to', 'claim', 'your', 'gift card'. The content analysis module (80) may perform stemming on the tokenized textual data. After stemming, the tokenized textual data may appear as follows: 'congratul', 'you', 'won', 'the', 'gift card', 'pleas', 'click', 'on', 'the', 'follow', 'link', 'to', 'claim', 'your', 'gift card'. The content analysis module (80) may further perform lemmatization on the stemmed textual data to obtain the processed data, and the results are as follows: 'congratulation', 'You', 'won', 'the', 'gift card', 'Please', 'click', 'on', 'the', 'following', 'link', 'to', 'claim', 'your', 'gift card'.
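By way of a non-limiting illustration, the tokenization and stemming steps of the ongoing example may be sketched in Python as follows. The names `PHRASES`, `SUFFIXES`, `tokenize` and `toy_stem` are hypothetical and do not appear in the disclosure. The suffix rules are a toy assumption chosen for this example and are not a complete stemming algorithm. Lemmatization is omitted because it would typically require a dictionary resource.

```python
import re

# Hypothetical multi-word phrases kept together as single tokens (assumption).
PHRASES = ("gift card",)

# Toy suffix rules, checked in this order; this is NOT a full stemming algorithm.
SUFFIXES = ("ations", "ing", "ed", "e")


def tokenize(text, phrases=PHRASES):
    """Split text into word tokens, keeping the listed phrases as one token."""
    alternatives = [re.escape(p) for p in phrases] + [r"[A-Za-z0-9']+"]
    pattern = re.compile("|".join(alternatives), re.IGNORECASE)
    return pattern.findall(text)


def toy_stem(token, min_stem=4):
    """Lowercase a token and strip one known suffix if enough letters remain."""
    word = token.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


message = ("Congratulations! You won the gift card. Please click on the "
           "following link to claim your gift card")
tokens = tokenize(message)
stems = [toy_stem(t) for t in tokens]
```

Under these assumptions, `tokens` and `stems` reproduce the tokenized and stemmed lists given in the example above. The minimum stem length keeps short words such as 'the' intact.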
[0029] Moreover, the content analysis module (80) is configured to convert the processed data into one or more numerical vectors using a word embedding technique. As used herein, the word embedding technique may be defined as a technique to represent the words in a continuous vector space, where the words with similar meaning are represented by vectors which are close to each other.
[0030] Additionally, in continuation with the ongoing example, the content analysis module (80) may convert the processed data into the one or more numerical vectors as follows: 'Congratulations': [0.1, 0.2, 0.3], 'You': [0.4, 0.5, 0.6], 'won': [0.7, 0.8, 0.9], 'the': [1.0, 1.1, 1.2], 'gift card': [4.0, 4.1, 4.2], 'Please': [1.6, 1.7, 1.8], 'click': [1.9, 2.0, 2.1], 'on': [2.2, 2.3, 2.4], 'following': [2.5, 2.6, 2.7], 'link': [2.8, 2.9, 3.0], 'to': [3.1, 3.2, 3.3], 'claim': [3.4, 3.5, 3.6], 'your': [3.7, 3.8, 3.9], 'gift card': [4.0, 4.1, 4.2].
[0031] Also, the content analysis module (80) is configured to identify one or more linguistic patterns by feeding the one or more numerical vectors to an attention based transformer model. In one embodiment, the attention based transformer model may be trained to identify the one or more linguistic patterns from the one or more numerical vectors by comparing the one or more numerical vectors with one or more prestored vectors. In continuation with the ongoing example, consider a scenario in which the attention based transformer model may be trained with the phrase “you won a gift card” associated with a plurality of fraudulent messages and the corresponding one or more prestored vectors associated with the phrase may include 'You': [0.4, 0.5, 0.6], 'won': [0.7, 0.8, 0.9], 'a': [1.4, 1.5, 1.6], 'gift card': [4.0, 4.1, 4.2], 'claim': [3.4, 3.5, 3.6], 'using': [0.5, 0.6, 0.7], 'link': [2.8, 2.9, 3.0].
[0032] Further, the attention based transformer model may calculate cosine similarity between the one or more numerical vectors and the one or more prestored vectors as follows: 'You': 1.0, 'won': 0.99, 'a': 0.87, 'gift card': 0.99, 'claim': 0.99, 'using': 0.60, 'link': 0.99. Based on the cosine similarity, the content analysis module (80) may identify the one or more linguistic patterns present in each of the one or more numerical vectors and the one or more prestored vectors. The one or more linguistic patterns are as follows: ‘you’, ‘won’, ‘gift card’, ‘claim’, ‘link’. In one embodiment, the content analysis module (80) may be defined using convolutional neural networks.
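As a minimal sketch of the cosine similarity comparison only, and not of the attention based transformer model itself, the calculation may be illustrated in Python as follows. The function names `cosine_similarity` and `matching_patterns`, the `min_similarity` cut-off of 0.95, and the rule that a prestored token absent from the message is skipped are assumptions made for illustration. The vectors are the example values given above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def matching_patterns(message_vectors, prestored_vectors, min_similarity=0.95):
    """Return prestored tokens that also occur in the message with a close vector."""
    matches = []
    for token, stored in prestored_vectors.items():
        vector = message_vectors.get(token)
        if vector is not None and cosine_similarity(vector, stored) >= min_similarity:
            matches.append(token)
    return matches


# Example values from the description (a subset of the message vectors).
message_vectors = {
    "You": [0.4, 0.5, 0.6], "won": [0.7, 0.8, 0.9], "the": [1.0, 1.1, 1.2],
    "gift card": [4.0, 4.1, 4.2], "link": [2.8, 2.9, 3.0], "claim": [3.4, 3.5, 3.6],
}
prestored_vectors = {
    "You": [0.4, 0.5, 0.6], "won": [0.7, 0.8, 0.9], "a": [1.4, 1.5, 1.6],
    "gift card": [4.0, 4.1, 4.2], "claim": [3.4, 3.5, 3.6],
    "using": [0.5, 0.6, 0.7], "link": [2.8, 2.9, 3.0],
}
patterns = matching_patterns(message_vectors, prestored_vectors)
```

With these inputs, `patterns` contains the five shared tokens 'You', 'won', 'gift card', 'claim' and 'link'. The tokens 'a' and 'using' are skipped because they do not occur in the message.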
[0033] Furthermore, the content analysis module (80) is configured to evaluate a confidence score upon comparing the one or more linguistic patterns with one or more historical linguistic patterns of the plurality of fraudulent messages. In continuation with the ongoing example, the content analysis module (80) may further compare ‘you’, ‘won’, ‘gift card’, ‘claim’, ‘link’ with the one or more historical linguistic patterns of the plurality of fraudulent messages. Consider a scenario in which the one or more historical linguistic patterns may also contain ‘you’, ‘won’, ‘gift card’, ‘claim’, ‘link’. In such a scenario, the evaluated confidence score may be 0 out of 100 since all of the one or more linguistic patterns are matching with the one or more historical linguistic patterns.
[0034] Moreover, the processing subsystem (20) includes a classification module (90) operatively coupled to the content analysis module (80). The classification module (90) is configured to classify the message as a fraudulent message when the confidence score is below a predefined threshold. In continuation with the ongoing example, consider a scenario in which the predefined threshold may include the confidence score of at least 75. Since the confidence score evaluated is below 75, the classification module (90) may classify the message as the fraudulent message.
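The disclosure does not state a formula for the confidence score. One possible scoring rule, offered purely as an assumption, is the share of identified patterns that do not match any historical fraudulent pattern, scaled to 100. The Python sketch below uses the hypothetical names `confidence_score` and `classify` and the example threshold of 75.

```python
def confidence_score(patterns, historical_patterns):
    """Return a 0-100 score; a lower score means more overlap with fraud patterns.

    Assumed rule (not from the disclosure): 100 * (1 - matched / identified).
    """
    found = {p.lower() for p in patterns}
    if not found:
        return 100.0  # nothing identified, so nothing matches fraud history
    known = {p.lower() for p in historical_patterns}
    matched = len(found & known)
    return 100.0 * (1.0 - matched / len(found))


def classify(score, threshold=75.0):
    """Classify as fraudulent when the confidence score is below the threshold."""
    return "fraudulent" if score < threshold else "not classified as fraudulent"


identified = ["you", "won", "gift card", "claim", "link"]
historical = ["you", "won", "gift card", "claim", "link"]
score = confidence_score(identified, historical)
label = classify(score)
```

When every identified pattern matches the historical patterns, as in the ongoing example, this rule yields a score of 0 and the message is classified as fraudulent.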
[0035] Additionally, the classification module (90) is configured to provide a predefined color code to the textual data upon classifying the message as the fraudulent message, thereby evaluating the authenticity of the message. In continuation with the ongoing example, the classification module (90) may provide a ‘red color’ to the textual data upon classifying the message as the fraudulent message. In one embodiment, the data acquisition module (60) is configured to generate a ticket number upon acquiring the at least one screenshot of the message to enable the user to fetch one or more details regarding the message. In such an embodiment, the one or more details may include the confidence score and the classification of the message. In one embodiment, the classification module (90) may generate a report highlighting the confidence score, the classification of the message, and a brief explanation of the analysis that led to the classification. The classification module (90) may render the report in the user interface associated with the user device.
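In an embodiment where the intensity of the color coding increases as the confidence score decreases, the mapping from score to color may be sketched as follows. The function name `red_highlight`, the RGBA representation, and the linear opacity ramp from 0.2 to 1.0 are assumptions for illustration only.

```python
def red_highlight(score, threshold=75.0):
    """Return an (R, G, B, alpha) highlight, or None when no color code applies.

    Assumption: opacity rises linearly from 0.2 just below the threshold
    to 1.0 at a confidence score of 0.
    """
    if score >= threshold:
        return None  # message is not classified as fraudulent
    clamped = max(0.0, score)
    alpha = 0.2 + 0.8 * (threshold - clamped) / threshold
    return (255, 0, 0, round(alpha, 2))


strongest = red_highlight(0.0)
no_color = red_highlight(75.0)
```

Here a score of 0 produces fully opaque red, while a score at or above the threshold produces no color code.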
[0036] FIG. 2 is a block diagram representation of one embodiment of the system (10) of FIG. 1 in accordance with an embodiment of the present disclosure. The system (10) of FIG. 1 includes the data acquisition module (60), the data extraction module (70), the content analysis module (80) and the classification module (90). In one embodiment, the system (10) of FIG. 1 may include the processing subsystem (20) including a link analysis module (100) configured to analyze the textual data to identify one or more universal resource locators (URLs) from the textual data. In such an embodiment, the link analysis module (100) may be configured to compare spellings of the one or more universal resource locators with one or more prestored universal resource locators stored in the integrated database to generate a reliability score. In one embodiment, the link analysis module (100) may also be configured to classify the message as the fraudulent message when the reliability score generated is below a predefined value.
[0037] Further, in continuation with the ongoing example, consider a scenario in which the text message received by the user X includes one or more URLs. The link analysis module (100) may analyze the textual data to identify the one or more URLs from the textual data. Consider the scenario in which, the one or more URLs identified by the link analysis module (100) may be ‘http://amazn.com’. The link analysis module (100) may compare ‘http://amazn.com’ with one or more prestored URLs stored in the integrated database to generate a reliability score. The one or more prestored URLs may include ‘https://amazon.com’ and the reliability score generated by the link analysis module (100) may be a low value due to spelling mistakes present in the ‘http://amazn.com’. The link analysis module (100) may further classify the message as the fraudulent message since the reliability score is below the predefined value.
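The spelling comparison may be sketched with the Python standard library as follows. The scoring rule is an assumption: an exact match with a prestored domain scores 100, and otherwise the score falls as the domain more closely resembles a prestored domain, so that look-alike spellings score lowest. The names `domain_of` and `reliability_score` are hypothetical, and URLs are assumed to include a scheme such as `http://`.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse


def domain_of(url):
    """Extract the lowercased host name, dropping a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def reliability_score(url, prestored_urls):
    """Return a 0-100 score; near-miss spellings of known domains score lowest."""
    domain = domain_of(url)
    trusted = {domain_of(u) for u in prestored_urls}
    if not trusted:
        return 0.0  # nothing to compare against, so no reliability is assumed
    if domain in trusted:
        return 100.0
    closest = max(SequenceMatcher(None, domain, t).ratio() for t in trusted)
    return round(100.0 * (1.0 - closest), 1)


low = reliability_score("http://amazn.com", ["https://amazon.com"])
exact = reliability_score("https://www.amazon.com", ["https://amazon.com"])
```

Under this rule, 'http://amazn.com' receives a low score because it differs from the prestored 'amazon.com' by a single character. The exact prestored domain receives the full score.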
[0038] Furthermore, in one embodiment, the processing subsystem (20) may include a validation module (110) configured to detect at least one domain name from the at least one screenshot to identify one or more internet protocol addresses associated with the domain name. In such an embodiment, the validation module (110) may be configured to compare the one or more internet protocol addresses identified with one or more flagged internet protocol addresses to evaluate a threat score. In one embodiment, the validation module (110) may also be configured to classify the message as the fraudulent message when the threat score evaluated is above a predefined threshold.
[0039] Moreover, in continuation with the ongoing example, the validation module (110) may detect the domain name ‘amazn.com’ from the at least one screenshot to identify the one or more internet protocol addresses associated with the domain name. The one or more internet protocol addresses may be as follows: 52.94.236.248, 205.251.242.103, 54.239.28.85. The validation module (110) may match each of the one or more internet protocol addresses with the one or more flagged internet protocol addresses stored in the integrated database to evaluate the threat score. The threat score evaluated by the validation module (110) may be a higher value when at least one of the one or more internet protocol addresses, i.e., 52.94.236.248, 205.251.242.103, 54.239.28.85, matches at least one of the one or more flagged internet protocol addresses. Upon obtaining the threat score above the predefined threshold, the validation module (110) may classify the message as the fraudulent message.
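The matching step may be sketched as a set intersection. The formula below, the percentage of identified addresses that appear in the flagged list, is an assumption, as are the names `threat_score` and `exceeds_threshold` and the default threshold of 0. Resolving the domain name to addresses is outside the sketch, so the addresses are passed in directly.

```python
import ipaddress


def threat_score(identified_ips, flagged_ips):
    """Return the percentage (0-100) of identified addresses that are flagged."""
    identified = {ipaddress.ip_address(ip) for ip in identified_ips}
    flagged = {ipaddress.ip_address(ip) for ip in flagged_ips}
    if not identified:
        return 0.0
    return 100.0 * len(identified & flagged) / len(identified)


def exceeds_threshold(score, threshold=0.0):
    """True when the threat score is above the (hypothetical) threshold."""
    return score > threshold


identified = ["52.94.236.248", "205.251.242.103", "54.239.28.85"]
clean = threat_score(identified, [])
```

Parsing with `ipaddress.ip_address` normalizes the textual form of each address and raises `ValueError` for malformed input. With an empty flagged list the score is 0, and the score rises with each identified address found in the flagged list.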
[0040] Additionally, in one embodiment, the processing subsystem (20) may include a recommendation module (120) configured to provide one or more recommendations to the user upon classifying the message as the fraudulent message. In such an embodiment, the one or more recommendations may include one or more procedures to be followed by the user while interacting with the message. In continuation with the ongoing example, the recommendation module (120) may recommend the user X to delete the message without clicking on the link upon classifying the message as the fraudulent message. The recommendation module (120) may also recommend opening the link in a sandbox environment upon classifying the message as the fraudulent message. As used herein, the sandbox environment refers to a virtualized and isolated environment where potentially malicious software or files may be executed and analysed in a secure manner.
[0041] Also, in one embodiment, the processing subsystem (20) may include an authentication module (130) configured to classify the message as a genuine message upon detecting a digital signature in the message. Consider another scenario in which the user X may receive another text message having the digital signature in the text message. The authentication module (130) may classify the text message as the genuine message upon detecting the digital signature in the text message.
[0042] Further, in some embodiments, the processing subsystem (20) may include a training module (140) configured to train the attention based transformer model based on the one or more linguistic patterns provided by the user. In such an embodiment, the training module (140) may also be configured to update the one or more prestored universal resource locators stored in the integrated database (50) with the one or more universal resource locators provided by the user. In continuation with the ongoing example, the training module (140) may train the attention based transformer model based on the one or more linguistic patterns provided by the user X. The one or more linguistic patterns may include a word including ‘lottery’ and a corresponding numerical vector. Similarly, the training module (140) may enable the user X to update the one or more prestored universal resource locators stored in the integrated database with a new website address. The training module (140) may dynamically update predefined rules and patterns associated with safe or unsafe messages.
[0043] FIG. 3 is a schematic representation of an exemplary embodiment (150) of the system (10) of FIG. 1 in accordance with an embodiment of the present disclosure. For example, consider a scenario in which a user Y (160) receives a text message on the phone of user Y (160). The text message says ‘your KYC has been updated. To get cashback click on the link.’ The user Y (160) may capture at least one screenshot of the message, and the data acquisition module (60) may acquire the screenshot from user Y (160) through the user interface associated with the phone of user Y (160). The data extraction module (70) may extract textual data from the at least one screenshot as follows: ‘your KYC has been updated. To get cashback click on the link.’
[0044] Further, the content analysis module (80) may tokenize the textual data as follows: 'your', 'KYC', 'has', 'been', 'updated', 'To', 'get', 'cashback', 'click', 'on', 'the', 'link'. The content analysis module (80) may perform stemming on the tokenized textual data. After stemming, the tokenized textual data may appear as follows: 'your', 'KYC', 'ha', 'been', 'updat', 'To', 'get', 'cashback', 'click', 'on', 'the', 'link'. The content analysis module (80) may further perform lemmatization on the textual data to obtain the processed data, and the results are as follows: 'your', 'KYC', 'have', 'been', 'updated', 'To', 'get', 'cashback', 'click', 'on', 'the', 'link'. The content analysis module (80) may convert the processed data into the one or more numerical vectors as follows: 'your': [0.1, 0.2, 0.3], 'KYC': [0.4, 0.5, 0.6], 'have': [0.7, 0.8, 0.9], 'been': [1.0, 1.1, 1.2], 'updated': [4.0, 4.1, 4.2], 'To': [1.6, 1.7, 1.8], 'get': [1.9, 2.0, 2.1], 'cashback': [2.2, 2.3, 2.4], 'click': [2.5, 2.6, 2.7], 'on': [2.8, 2.9, 3.0], 'the': [3.1, 3.2, 3.3], 'link': [3.4, 3.5, 3.6].
[0045] Furthermore, the content analysis module (80) may identify one or more linguistic patterns by feeding the one or more numerical vectors to an attention based transformer model. The attention based transformer model may be trained with the phrase “updated your KYC to get cashback” associated with a plurality of fraudulent messages. The corresponding one or more prestored vectors associated with the phrase may include 'update': [4.0, 4.1, 4.2], 'your': [0.1, 0.2, 0.3], 'KYC': [0.4, 0.5, 0.6], 'To': [1.6, 1.7, 1.8], 'get': [1.9, 2.0, 2.1], 'cashback': [2.2, 2.3, 2.4].
[0046] Moreover, the attention-based transformer model may calculate cosine similarity between the one or more numerical vectors and the one or more prestored vectors. The content analysis module (80) may identify the one or more linguistic patterns present in each of the one or more numerical vectors and the one or more prestored vectors. The one or more linguistic patterns identified may be as follows: ‘update’, ‘your’, ‘KYC’, ‘been’, ‘to’, ‘get’, ‘cashback’. In continuation with the ongoing example, the content analysis module (80) may further compare ‘update’, ‘your’, ‘KYC’, ‘been’, ‘to’, ‘get’, ‘cashback’ with the one or more historical linguistic patterns of the plurality of fraudulent messages.
[0047] Additionally, consider a scenario in which the one or more historical linguistic patterns may also contain ‘update’, ‘your’, ‘KYC’, ‘been’, ‘to’, ‘get’, ‘cashback’. In such a scenario, the evaluated confidence score may be 0 out of 100 since all of the one or more linguistic patterns match the one or more historical linguistic patterns. In continuation with the ongoing example, consider another scenario in which the predefined threshold may include the confidence score of at least 75. Since the confidence score evaluated is below 75, the classification module (90) may classify the message as the fraudulent message.
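The scoring and threshold comparison of this example may be sketched as follows. The formula used here, in which the confidence score falls linearly as more linguistic patterns match the historical linguistic patterns, is an assumption chosen only because it reproduces the value of 0 out of 100 when every pattern matches; the specification does not fix a formula. The function names and the label returned for an unflagged message are likewise hypothetical.

```python
def confidence_score(patterns, historical_patterns):
    """Return a score from 0 to 100; lower means more overlap with fraud patterns."""
    if not patterns:
        # Assumption: with nothing to compare, no fraud evidence is found.
        return 100.0
    historical = {p.lower() for p in historical_patterns}
    matched = sum(1 for p in patterns if p.lower() in historical)
    return 100.0 * (1 - matched / len(patterns))


def classify(score, threshold=75.0):
    """Flag the message as fraudulent when the score is below the threshold."""
    return "fraudulent" if score < threshold else "not flagged"


patterns = ["update", "your", "KYC", "been", "to", "get", "cashback"]
historical = ["update", "your", "KYC", "been", "to", "get", "cashback"]
score = confidence_score(patterns, historical)
label = classify(score)
```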
[0048] Also, the classification module (90) may provide a ‘red colour’ to the textual data upon classifying the message as the fraudulent message. In one embodiment, the data acquisition module (60) is configured to generate a ticket number upon acquiring the at least one screenshot of the message to enable the user to fetch one or more details regarding the message. In such an embodiment, the one or more details may include the confidence score, and classification of the message.
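One possible way of issuing a ticket number and later fetching the one or more details is sketched below. The use of a random identifier, the in-memory dictionary, and the function names are assumptions for illustration only; the specification does not state how the ticket number is generated or stored.

```python
import uuid

# Hypothetical in-memory store mapping ticket numbers to result details.
_details_by_ticket = {}


def create_ticket():
    """Issue a ticket number when a screenshot is acquired."""
    ticket = uuid.uuid4().hex[:12].upper()
    _details_by_ticket[ticket] = None  # result not yet available
    return ticket


def store_details(ticket, confidence_score, classification):
    """Record the confidence score and classification for a ticket."""
    _details_by_ticket[ticket] = {
        "confidence_score": confidence_score,
        "classification": classification,
    }


def fetch_details(ticket):
    """Return the stored details, or None if they are not yet available."""
    return _details_by_ticket.get(ticket)


ticket = create_ticket()
store_details(ticket, 0.0, "fraudulent")
details = fetch_details(ticket)
```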
[0049] Further, consider a scenario in which the text message received by user Y (160) includes one or more URLs. The link analysis module (100) may analyse the textual data to identify the one or more URLs from the textual data. Consider the scenario in which the URL identified by the link analysis module (100) may be ‘http://ptm.com’. The link analysis module (100) may compare ‘http://ptm.com’ with one or more prestored URLs stored in the integrated database to generate a reliability score. The one or more prestored URLs may include ‘https://paytm.com’, and the reliability score generated by the link analysis module (100) may be a low value due to spelling mistakes present in ‘http://ptm.com’. The link analysis module (100) may further classify the message as the fraudulent message since the reliability score is below the predefined value.
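A minimal sketch of the spelling comparison is given below, assuming that the reliability score is derived from the string similarity between host names. The scoring rule (100 for an exact match with a prestored host, otherwise lower the closer the look-alike), the predefined value of 50, and the function names are assumptions for illustration. This rule only targets misspelled look-alikes; a host unrelated to every prestored URL would score high under it and would need separate handling.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse


def extract_urls(text):
    """Find http(s) URLs in the textual data, trimming trailing punctuation."""
    return [u.rstrip(".,;)") for u in re.findall(r"https?://\S+", text)]


def reliability_score(url, prestored_urls):
    """Score 100 for a known host; near-miss spellings score low."""
    host = urlparse(url).hostname or ""
    known_hosts = [urlparse(k).hostname or "" for k in prestored_urls]
    if host in known_hosts:
        return 100.0
    best = max(
        (SequenceMatcher(None, host, k).ratio() for k in known_hosts),
        default=0.0,
    )
    # Assumption: the closer a non-matching host is to a known one,
    # the more likely it is a misspelled look-alike.
    return 100.0 * (1 - best)


text = "To get cashback click on the link http://ptm.com"
urls = extract_urls(text)
score = reliability_score(urls[0], ["https://paytm.com"])
is_fraudulent = score < 50.0  # hypothetical predefined value
```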
[0050] Furthermore, the validation module (110) may detect the domain name ‘ptm.com’ from the at least one screenshot to identify the one or more internet protocol addresses associated with the domain name. The one or more internet protocol addresses may be as follows: 52.94.236.248, 205.251.242.103, 54.239.28.85. The validation module (110) may match each of the one or more internet protocol addresses with the one or more flagged internet protocol addresses. The recommendation module (120) may recommend user Y (160) to delete the message without clicking on the link upon classifying the message as the fraudulent message.
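The matching of internet protocol addresses may be sketched as follows. The threat score is taken here to be the percentage of the identified addresses that appear among the flagged addresses, which is an assumption, as are the flagged list, the threshold of 25, and the function name. Resolving the domain name to its addresses is left out of the sketch so that it stays self-contained; the addresses are those listed in the example.

```python
import ipaddress


def threat_score(identified_ips, flagged_ips):
    """Percentage of identified addresses that are on the flagged list."""
    identified = {ipaddress.ip_address(ip) for ip in identified_ips}
    flagged = {ipaddress.ip_address(ip) for ip in flagged_ips}
    if not identified:
        return 0.0
    return 100.0 * len(identified & flagged) / len(identified)


# Addresses from the worked example.
identified = ["52.94.236.248", "205.251.242.103", "54.239.28.85"]
# Purely hypothetical flagged list for illustration.
flagged = ["205.251.242.103", "203.0.113.5"]

score = threat_score(identified, flagged)
is_fraudulent = score > 25.0  # hypothetical predefined threshold
```

Parsing the strings with `ipaddress.ip_address` rejects malformed input and avoids mismatches caused by differing textual forms of the same address.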
[0051] Moreover, consider another scenario in which user Y (160) may receive another text message having the digital signature in the text message. The authentication module (130) may classify the text message as the genuine message upon detecting the digital signature in the text message. The training module (140) may train the attention-based transformer model based on the one or more linguistic patterns provided by user Y (160). The one or more linguistic patterns may include a word including ‘pan number’ and a corresponding numerical vector. Similarly, the training module (140) may enable user Y (160) to update the one or more prestored universal resource locators stored in the integrated database with new website addresses.
[0052] FIG. 4 is a block diagram of a computer or a server (30) in accordance with an embodiment of the present disclosure. The server (30) includes processor(s) (170), and memory (180) operatively coupled to the bus (190). The processor(s) (170), as used herein, includes any type of computational circuit, such as, but not limited to, a microprocessor, a microcontroller, a complex instruction set computing microprocessor, a reduced instruction set computing microprocessor, a very long instruction word microprocessor, an explicitly parallel instruction computing microprocessor, a digital signal processor, or any other type of processing circuit, or a combination thereof.
[0053] The memory (180) includes several subsystems stored in the form of an executable program which instructs the processor to perform the method steps illustrated in FIG. 1. The memory (180) is substantially similar to the system (10) of FIG. 1. The memory (180) has the following subsystems: the processing subsystem (20) including the data acquisition module (60), the data extraction module (70), the content analysis module (80), the classification module (90), the link analysis module (100), the validation module (110), the recommendation module (120), the authentication module (130) and the training module (140). The plurality of modules of the processing subsystem (20) performs the functions as stated in FIG. 1 and FIG. 2. The bus (190), as used herein, refers to the internal memory channels or computer network used to connect computer components and transfer data between them. The bus (190) includes a serial bus or a parallel bus, wherein the serial bus transmits data in bit-serial format and the parallel bus transmits data across multiple wires. The bus (190), as used herein, may include, but is not limited to, a system bus, an internal bus, an external bus, an expansion bus, a frontside bus, a backside bus, and the like.
[0054] The processing subsystem (20) includes a data acquisition module (60) configured to acquire at least one screenshot of the message from a user through a user interface associated with a user device. The processing subsystem (20) also includes a data extraction module (70) operatively coupled to the data acquisition module (60). The data extraction module (70) is configured to extract textual data from the at least one screenshot using an image processing technique. The processing subsystem (20) also includes a content analysis module (80) operatively coupled to the data extraction module (70). The content analysis module (80) is configured to preprocess the textual data using a natural language processing technique to obtain a processed data. The content analysis module (80) is also configured to convert the processed data into one or more numerical vectors using a word embedding technique. The content analysis module (80) is also configured to identify one or more linguistic patterns by feeding the one or more numerical vectors to an attention based transformer model. The content analysis module (80) is further configured to evaluate a confidence score upon comparing the one or more linguistic patterns with one or more historical linguistic patterns of a plurality of fraudulent messages. The processing subsystem (20) further includes a classification module (90) operatively coupled to the content analysis module (80). The classification module (90) is configured to classify the message as a fraudulent message when the confidence score is below a predefined threshold. The classification module (90) is also configured to provide a predefined color code to the textual data upon classifying the message as the fraudulent message, thereby evaluating the authenticity of the message.
[0055] The processing subsystem (20) also includes a link analysis module (100) configured to analyze the textual data to identify one or more universal resource locators from the textual data. The link analysis module (100) is also configured to compare spellings of the one or more universal resource locators with one or more prestored universal resource locators stored in the integrated database (50) to generate a reliability score. The link analysis module (100) is further configured to classify the message as the fraudulent message when the reliability score generated is below a predefined value.
[0056] The processing subsystem (20) also includes a validation module (110) configured to detect at least one domain name from the at least one screenshot to identify one or more internet protocol addresses associated with the domain name. The validation module (110) is also configured to compare the one or more internet protocol addresses identified with one or more flagged internet protocol addresses to evaluate a threat score. The validation module (110) is further configured to classify the message as the fraudulent message when the threat score evaluated is above a predefined threshold.
[0057] The processing subsystem (20) also includes a recommendation module (120) configured to provide one or more recommendations to the user upon classifying the message as the fraudulent message. The one or more recommendations includes one or more procedures to be followed by the user while interacting with the message.
[0058] The processing subsystem (20) also includes an authentication module (130) configured to classify the message as a genuine message upon detecting a digital signature in the message.
[0059] The processing subsystem (20) also includes a training module (140) configured to train the attention based transformer model based on the one or more linguistic patterns provided by the user. The training module (140) is also configured to update the one or more prestored universal resource locators stored in the integrated database (50) with the one or more universal resource locators provided by the user.
[0060] Computer memory elements may include any suitable memory device(s) for storing data and executable program, such as read only memory, random access memory, erasable programmable read only memory, electrically erasable programmable read only memory, hard drive, removable media drive for handling cards and the like. Embodiments of the present subject matter may be implemented in conjunction with program modules, including functions, procedures, data structures, and application programs, for performing tasks, or defining abstract data types or low-level hardware contexts. Executable program stored on any of the above-mentioned storage media may be executable by the processor(s) (170).
[0061] FIG. 5 is a flow chart representing the steps involved in a method (300) to evaluate authenticity of a message in accordance with an embodiment of the present disclosure. The method (300) includes acquiring at least one screenshot of the message from a user through a user interface associated with a user device in step 310. In one embodiment, acquiring at least one screenshot of the message from a user through a user interface associated with a user device includes acquiring at least one screenshot of the message from a user through a user interface associated with a user device by a data acquisition module. In one embodiment, the user device may include, but is not limited to, a phone, a computer, a tab, a personal digital assistant, and the like. In some embodiments, the message may include a WhatsApp message, messenger message, an email, a flash message, a text message and the like.
[0062] The method (300) also includes extracting textual data from the at least one screenshot using an image processing technique in step 320. In one embodiment, extracting textual data from the at least one screenshot using an image processing technique includes extracting textual data from the at least one screenshot using an image processing technique by a data extraction module. In one embodiment, the image processing technique may include optical character recognition.
[0063] The method (300) also includes preprocessing the textual data using a natural language processing technique to obtain a processed data in step 330. In one embodiment, preprocessing the textual data using a natural language processing technique to obtain a processed data includes preprocessing the textual data using a natural language processing technique to obtain a processed data by a content analysis module. In one embodiment, the natural language processing may include stemming, lemmatization, and tokenizing. As used herein, the tokenization is a process of breaking the textual data into tokens. The tokens may include words, phrases and symbols. As used herein, the stemming is a process of reducing words into respective base forms. As used herein, lemmatization is a process of reducing the base forms into dictionary forms.
[0064] The method (300) also includes converting the processed data into one or more numerical vectors using a word embedding technique in step 340. In one embodiment, converting the processed data into one or more numerical vectors using a word embedding technique includes converting the processed data into one or more numerical vectors using a word embedding technique by the content analysis module. As used herein, the word embedding technique may be defined as a technique to represent words in a continuous vector space, where words with similar meaning are represented by vectors that are close to each other.
[0065] The method (300) also includes identifying one or more linguistic patterns by feeding the one or more numerical vectors to an attention based transformer model in step 350. In one embodiment, identifying one or more linguistic patterns by feeding the one or more numerical vectors to an attention based transformer model includes identifying one or more linguistic patterns by feeding the one or more numerical vectors to an attention based transformer model by the content analysis module. In one embodiment, the attention based transformer model may be trained to identify the one or more linguistic patterns from the one or more numerical vectors by comparing the one or more numerical vectors with one or more prestored vectors.
[0066] The method (300) also includes evaluating a confidence score upon comparing the one or more linguistic patterns with one or more historical linguistic patterns of a plurality of fraudulent messages in step 360. In one embodiment, evaluating a confidence score upon comparing the one or more linguistic patterns with one or more historical linguistic patterns of a plurality of fraudulent messages includes evaluating a confidence score upon comparing the one or more linguistic patterns with one or more historical linguistic patterns of a plurality of fraudulent messages by the content analysis module.
[0067] The method (300) also includes classifying the message as a fraudulent message when the confidence score is below a predefined threshold in step 370. In one embodiment, classifying the message as a fraudulent message when the confidence score is below a predefined threshold includes classifying the message as a fraudulent message when the confidence score is below a predefined threshold by a classification module.
[0068] The method (300) also includes providing a predefined color code to the textual data upon classifying the message as the fraudulent message, thereby evaluating the authenticity of the message in step 380. In one embodiment, providing a predefined color code to the textual data upon classifying the message as the fraudulent message includes providing a predefined color code to the textual data upon classifying the message as the fraudulent message by the classification module.
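Taken together, steps 320 to 380 of the method (300) may be sketched as a single pipeline, as shown below. Each step is passed in as a callable so that the sketch does not presume any particular optical character recognition engine, embedding or model; the names `evaluate_message` and `Verdict`, the default threshold of 75, and the colour ‘red’ follow the worked example where possible and are otherwise assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Verdict:
    confidence_score: float
    fraudulent: bool
    color: Optional[str]  # colour code applied only to fraudulent messages


def evaluate_message(
    screenshot: bytes,                                   # acquired in step 310
    extract_text: Callable[[bytes], str],                # step 320
    preprocess: Callable[[str], List[str]],              # step 330
    embed: Callable[[List[str]], List[Sequence[float]]], # step 340
    find_patterns: Callable[[List[Sequence[float]]], List[str]],  # step 350
    score: Callable[[List[str]], float],                 # step 360
    threshold: float = 75.0,
) -> Verdict:
    """Run the steps in order and classify the message (steps 370 and 380)."""
    text = extract_text(screenshot)
    tokens = preprocess(text)
    vectors = embed(tokens)
    patterns = find_patterns(vectors)
    confidence = score(patterns)
    fraudulent = confidence < threshold
    return Verdict(confidence, fraudulent, "red" if fraudulent else None)


# Usage with stand-in callables; real implementations would replace these.
verdict = evaluate_message(
    b"",
    extract_text=lambda image: "your KYC has been updated",
    preprocess=lambda text: text.split(),
    embed=lambda tokens: [[0.0, 0.0, 0.0] for _ in tokens],
    find_patterns=lambda vectors: ["update", "your", "KYC"],
    score=lambda patterns: 0.0,
)
```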
[0069] Various embodiments of the system and method to evaluate authenticity of a message described above enable various advantages. The data acquisition module is capable of acquiring at least one screenshot of the message from the user through the user interface associated with the user device, thereby providing flexibility to the user to submit at least one screenshot of the message to the data acquisition module. The data extraction module is capable of extracting the textual data from the at least one screenshot using the image processing technique, thereby facilitating efficient analysis of the content of the message.
[0070] Further, the content analysis module is capable of identifying the one or more linguistic patterns by feeding the one or more numerical vectors to an attention based transformer model, thereby ensuring identification of the one or more linguistic patterns which may be part of the fraudulent message. The content analysis module is capable of evaluating the confidence score upon comparing the one or more linguistic patterns with one or more historical linguistic patterns of the plurality of fraudulent messages, thereby providing a benchmark for classifying the message as the fraudulent message or the genuine message.
[0071] Furthermore, the classification module is capable of classifying the message as the fraudulent message when the confidence score is below a predefined threshold, thereby performing efficient classification of the message within the minimum possible time without any manual intervention. The classification module is capable of reducing error in the classification by eliminating the manual intervention in the due process. Further, the classification module is capable of color coding the textual data upon classifying the message as the fraudulent message, thereby imparting a visual impact to the user.
[0072] Moreover, the link analysis module is capable of analyzing the textual data to identify the one or more universal resource locators, identifying the spelling mistakes in the universal resource locators by comparing the one or more universal resource locators with one or more prestored universal resource locators, and classifying the message as the fraudulent message when the reliability score generated is below a predefined value, thereby providing a reliable way to classify the message based on the one or more universal resource locators present in the message. The validation module is capable of classifying the message as the fraudulent message based on the one or more internet protocol addresses associated with a domain name present in the message, thereby increasing reliability of the system while evaluating the authenticity of the message.
[0073] Additionally, the recommendation module is capable of providing the one or more recommendations to the user upon classifying the message as the fraudulent message, thereby protecting privacy and security of the user. The authentication module is capable of classifying the message as the genuine message upon detecting a digital signature in the message, thereby preventing erroneous evaluation of the message by the system. The training module is capable of training the attention based transformer model based on the one or more linguistic patterns provided by the user and updating the one or more prestored universal resource locators stored in the integrated database with the one or more universal resource locators provided by the user, thereby making the system adaptive to changes.
[0074] It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the disclosure and are not intended to be restrictive thereof. While specific language has been used to describe the disclosure, any limitations arising on account of the same are not intended.
[0075] The figures and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, the order of processes described herein may be changed and are not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all the acts need to be necessarily performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples.
Claims:
1. A system (10) to evaluate authenticity of a message comprising:
characterized in that:
a processing subsystem (20) hosted on a server (30) and configured to execute on a network (40) to control bidirectional communications among a plurality of modules comprising:
a data acquisition module (60) configured to acquire at least one screenshot of the message from a user through a user interface associated with a user device;
a data extraction module (70) operatively coupled to the data acquisition module (60), wherein the data extraction module (70) is configured to extract textual data from the at least one screenshot using an image processing technique;
a content analysis module (80) operatively coupled to the data extraction module (70), wherein the content analysis module (80) is configured to:
preprocess the textual data using a natural language processing technique to obtain a processed data;
convert the processed data into one or more numerical vectors using a word embedding technique;
identify one or more linguistic patterns by feeding the one or more numerical vectors to an attention based transformer model;
evaluate a confidence score upon comparing the one or more linguistic patterns with one or more historical linguistic patterns of a plurality of fraudulent messages;
a classification module (90) operatively coupled to the content analysis module (80), wherein the classification module (90) is configured to:
classify the message as a fraudulent message when the confidence score is below a predefined threshold; and
provide a predefined color code to the textual data upon classifying the message as the fraudulent message, thereby evaluating the authenticity of the message.
2. The system (10) as claimed in claim 1, wherein the data acquisition module (60) is configured to generate a ticket number upon acquiring the at least one screenshot of the message to enable the user to fetch one or more details regarding the message, wherein the one or more details comprises the confidence score, and classification of the message.
3. The system (10) as claimed in claim 1, wherein the image processing technique comprises optical character recognition.
4. The system (10) as claimed in claim 1, wherein the natural language processing technique comprises stemming, lemmatization, and tokenizing.
5. The system (10) as claimed in claim 1, wherein the processing subsystem (20) comprises a link analysis module (100) configured to:
analyze the textual data to identify one or more universal resource locators from the textual data;
compare spellings of the one or more universal resource locators with one or more prestored universal resource locators stored in the integrated database to generate a reliability score; and
classify the message as the fraudulent message when the reliability score generated is below a predefined value.
6. The system (10) as claimed in claim 1, wherein the processing subsystem (20) comprises a validation module (110) configured to:
detect at least one domain name from the at least one screenshot to identify one or more internet protocol addresses associated with the domain name;
compare the one or more internet protocol addresses identified with one or more flagged internet protocol addresses to evaluate a threat score; and
classify the message as the fraudulent message when the threat score evaluated is above a predefined threshold.
7. The system (10) as claimed in claim 1, wherein the processing subsystem (20) comprises a recommendation module (120) configured to provide one or more recommendations to the user upon classifying the message as the fraudulent message, wherein the one or more recommendations comprises one or more procedures to be followed by the user while interacting with the message.
8. The system (10) as claimed in claim 1, wherein the processing subsystem (20) comprises an authentication module (130) configured to classify the message as a genuine message upon detecting a digital signature in the message.
9. The system (10) as claimed in claim 1, wherein the processing subsystem (20) comprises a training module (140) configured to:
train the attention based transformer model based on the one or more linguistic patterns provided by the user; and
update the one or more prestored universal resource locators stored in the integrated database with the one or more universal resource locators provided by the user.
10. A method (300) comprising:
characterized in that:
acquiring, by a data acquisition module, at least one screenshot of the message from a user through a user interface associated with a user device; (310)
extracting, by a data extraction module, textual data from the at least one screenshot using an image processing technique; (320)
preprocessing, by a content analysis module, the textual data using a natural language processing technique to obtain a processed data; (330)
converting, by the content analysis module, the processed data into one or more numerical vectors using a word embedding technique; (340)
identifying, by the content analysis module, one or more linguistic patterns by feeding the one or more numerical vectors to an attention based transformer model; (350)
evaluating, by the content analysis module, a confidence score upon comparing the one or more linguistic patterns with one or more historical linguistic patterns of a plurality of fraudulent messages; (360)
classifying, by a classification module, the message as a fraudulent message when the confidence score is below a predefined threshold; (370) and
providing, by a classification module, a predefined color code to the textual data upon classifying the message as the fraudulent message, thereby evaluating the authenticity of the message. (380)
Dated this 15th day of December 2023

Signature

Jinsu Abraham
Patent Agent (IN/PA-3267)
Agent for the Applicant

Documents

Application Documents

#	Name	Date
1	202311085888-STATEMENT OF UNDERTAKING (FORM 3) [15-12-2023(online)].pdf	2023-12-15
2	202311085888-POWER OF AUTHORITY [15-12-2023(online)].pdf	2023-12-15
3	202311085888-FORM 1 [15-12-2023(online)].pdf	2023-12-15
4	202311085888-DRAWINGS [15-12-2023(online)].pdf	2023-12-15
5	202311085888-DECLARATION OF INVENTORSHIP (FORM 5) [15-12-2023(online)].pdf	2023-12-15
6	202311085888-COMPLETE SPECIFICATION [15-12-2023(online)].pdf	2023-12-15
7	202311085888-FORM-26 [23-01-2024(online)].pdf	2024-01-23
8	202311085888-Power of Attorney [29-04-2025(online)].pdf	2025-04-29
9	202311085888-FORM-26 [29-04-2025(online)].pdf	2025-04-29
10	202311085888-Covering Letter [29-04-2025(online)].pdf	2025-04-29