Abstract: Text messages are now commonplace. In the proposed invention, machine learning is used to predict the legitimacy of SMS messages, that is, to classify them as genuine or spam. To accomplish this objective, the Multinomial Naive Bayes approach is utilized; no prior publication of this specific application has been found. The invention depends on the gathering of data, cleaning of data, evaluation of data, creation of messages, and training of models. The data was cleaned, analyzed, and examined for patterns across a wide range of parameters. Initial text preparation included tokenizing, stemming, and vectorizing, and the most important steps of model training are described. The Multinomial Naive Bayes approach, which is effective at classifying text, identifies spam through word frequency. The Spam Detection System draws on concepts such as scams, spam, ham, phishing, junk mail, malware, whitelisting, blacklisting, text classification, Kaggle, machine learning, artificial intelligence, feature extraction, datasets, stop words, word stemming, lemmatization, n-grams, TF-IDF, and bag of words. Statistical evaluation measures used in the field include accuracy, sensitivity, specificity, F1 score, the ROC curve, area under the curve (AUC), true positives and true negatives, and false positives and false negatives. Related techniques include data pre-processing, confusion matrices, overfitting, underfitting, hyperparameter tuning, and ROC threshold tuning, together with version control, content evaluation, heuristics, and keyword correlation.
Description: Field of Invention
The Spam Detection System draws on terms such as scams, spam, ham, phishing, junk mail, malware, whitelisting, blacklisting, text classification, Kaggle, machine learning, artificial intelligence, feature extraction, datasets, stop words, word stemming, lemmatization, n-grams, TF-IDF, and bag of words. Statistical evaluation measures include accuracy, sensitivity, specificity, F1 score, the ROC curve, AUC, true positives and true negatives, and false positives and false negatives. Related techniques include data pre-processing, confusion matrices, overfitting, underfitting, hyperparameter tuning, and ROC threshold tuning. Version control, content evaluation, heuristics, and keyword correlation are also connected to the system.
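As a brief illustration of several of the text-preparation techniques named above (tokenization, stop-word removal, word stemming, n-grams, TF-IDF, bag of words), the following minimal sketch uses NLTK and scikit-learn; the sample messages and parameter choices are assumptions for illustration only and are not mandated by the invention.

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("stopwords", quiet=True)   # NLTK English stop-word list

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def preprocess(text: str) -> str:
    tokens = re.findall(r"[a-z']+", text.lower())            # tokenizing
    tokens = [t for t in tokens if t not in stop_words]      # stop-word removal
    return " ".join(stemmer.stem(t) for t in tokens)         # word stemming

messages = ["Congratulations! You have WON a free ticket, call now!",
            "Are we still meeting for dinner tonight?"]
cleaned = [preprocess(m) for m in messages]
print(cleaned)

# Bag-of-words / TF-IDF representation over unigrams and bigrams (n-grams).
tfidf = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(cleaned)
print(tfidf.shape)
```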
Objectives of the Invention
The primary objective of the invention is to establish early identification of spam. By enhancing spam detection and monitoring, a website that identifies spam can help users rapidly evaluate incoming data, recognize patterns of spammy behavior, and block or severely restrict access to suspicious material. To keep an eye on spam tendencies and make the most of detection tactics, administrators can make use of real-time warnings about potential hazards, sophisticated analytics, and long-term monitoring. Users appreciate automatic response mechanisms that effectively reduce spam risks, simpler reporting systems, and transparent processing of reported material. The technology protects internet networks from spam by integrating sophisticated algorithms and continually evolving learning systems, without jeopardizing the trust of users or the integrity of operations, while ensuring satisfactory performance. In addition, the result page of the spam detector provides quick and straightforward feedback on whether a message is genuine or spam. Users are given clear indications through color-coded labels or symbols, together with descriptions of the classification criteria used. Administrators gain a deeper understanding of each decision through confidence ratings and information about the reasons that drove the classification. By providing updates in real time, the system improves both its efficiency and its ability to respond to newly emerging risks. Through integrated reporting tools, users can submit feedback on the accuracy of the classifications, which contributes to ongoing improvement. The result page is responsible for providing users with information that is both plain and helpful, which builds confidence and enables well-informed decisions regarding reported messages.
Background of the Invention
Enlarging training samples for text categorization through a word embedding method is achieved by carrying out the following procedure, detailed in DE102019/201988A1:
collecting keywords from the small-sample class to form a keyword set; randomly selecting half of the words from the texts of the non-small-sample class; substituting those words with an equal number of words from the keyword set to construct new text segments; finding the K nearest neighbors of each new text segment among the known training samples by text similarity; assigning new text segments to the small-sample class using K-nearest-neighbor classification; and combining the accepted segments with the text categorization training set to form an enlarged training set. The intention is to enhance the quality of the available training samples by incorporating individual words rather than whole texts. The disclosure also explains how candidate samples derived from word embedding are screened with K-nearest-neighbor text classification, how erroneous candidate training samples are eliminated, how usable training samples are obtained, and how the training set is enlarged.
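By way of illustration only, the following is a minimal sketch, under assumed names and parameters, of the keyword-substitution augmentation screened by K-nearest-neighbor classification described above; it approximates the procedure with TF-IDF similarity and scikit-learn components and is not the implementation of DE102019/201988A1.

```python
import random
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

def augment_minority(minority_texts, majority_texts, all_texts, all_labels,
                     minority_label, k=5, n_keywords=50):
    # 1. Collect a keyword set from the small-sample (minority) class.
    vec = TfidfVectorizer()
    scores = np.asarray(vec.fit_transform(minority_texts).sum(axis=0)).ravel()
    vocab = vec.get_feature_names_out()
    keyword_set = [vocab[i] for i in scores.argsort()[::-1][:n_keywords]]

    # 2. Replace roughly half of the words of each majority-class text
    #    with keywords to build candidate text segments.
    candidates = []
    for text in majority_texts:
        words = text.split()
        for i in random.sample(range(len(words)), len(words) // 2):
            words[i] = random.choice(keyword_set)
        candidates.append(" ".join(words))

    # 3. Screen candidates with a K-nearest-neighbor classifier trained on the
    #    known samples; keep only candidates voted into the minority class.
    knn_vec = TfidfVectorizer().fit(all_texts + candidates)
    knn = KNeighborsClassifier(n_neighbors=k, metric="cosine")
    knn.fit(knn_vec.transform(all_texts), all_labels)
    preds = knn.predict(knn_vec.transform(candidates))
    keep = [c for c, p in zip(candidates, preds) if p == minority_label]

    # 4. Return the enlarged training set.
    return all_texts + keep, list(all_labels) + [minority_label] * len(keep)
```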
Encoder-decoder-based event extraction is the approach disclosed in CN112597366A. In step S1, text preparation, open-domain data is cleaned by removing unneeded words and phrases before word vector training begins. In step S2, the category of the sentence-related event is identified and the preprocessed text is labeled. Step S3 covers encoder-decoder training, in which a light-weight deep learning model is built by combining GRU and Attention models for event type identification. In step S4, the light-weight encoder-decoder network trained in S3 is used to complete the event extraction task and obtain the correct event type; continuous training of the encoder-decoder network simplifies the representation of the event extraction task and produces the prediction outcome. The approach offers several benefits, including a high capacity to learn abstract features, a compact model size, minimal computation time and resources, and substantial adaptability to the field.
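The following is a minimal, illustrative sketch of a light-weight GRU encoder-decoder with an attention layer of the kind referred to in steps S3 and S4; the vocabulary size, dimensions, and layer choices are assumptions and are not taken from CN112597366A.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size, n_event_types, embed_dim, hidden = 8000, 20, 64, 128  # assumed sizes

enc_in = layers.Input(shape=(None,), dtype="int32")        # token ids of a sentence
enc_emb = layers.Embedding(vocab_size, embed_dim)(enc_in)
enc_seq, enc_state = layers.GRU(hidden, return_sequences=True,
                                return_state=True)(enc_emb)  # GRU encoder

dec_in = layers.Input(shape=(None,), dtype="int32")        # shifted target sequence
dec_emb = layers.Embedding(n_event_types + 2, embed_dim)(dec_in)
dec_seq = layers.GRU(hidden, return_sequences=True)(dec_emb, initial_state=enc_state)

context = layers.Attention()([dec_seq, enc_seq])           # attention over encoder states
probs = layers.Dense(n_event_types + 2, activation="softmax")(
    layers.Concatenate()([dec_seq, context]))               # event-type prediction

model = Model([enc_in, dec_in], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```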
According to patent application US20090144374A1, a method and system are provided for identifying and avoiding unwanted email or spam. That invention is a system and method for channeling email in order to avoid unsolicited email and to identify its sources, accomplished by adding a new layer to standard email. A further example is found in US7747691B2, in which an APB generator makes it possible to send messages to wireless mobile communication devices by direct wireless communication rather than by traditional email messages.
In US10084801B2, the process of detecting infectious messages includes performing an individual characteristic analysis of a message to determine whether it is suspicious; if so, determining whether a similar message has been noted in the past; and, if a similar message has been noted, classifying the message according to its individual characteristics and its similarity to the previously noted message. Systems and methods for reporting spam detected in a communication network are given in patent application WO2012078318A1: entities in the network can determine that a particular electronic message contains spam and then generate a spam report for that message, presented in a format improved by newly defined fields. In a collection of scientific papers, a single document can act both as a document and as a citation of other documents in the collection, much like a digital library, with citations serving as connections between documents. The Bernoulli Process Topic (BPT) model represents the corpus at both the document level and the citation level. In the latent topic space, each document is represented in two distinct ways, one for each of its roles, and a Bernoulli-process-based generative method captures the multi-level hierarchical structure of the citation network. The distribution parameters of the BPT model are estimated using a variational approximation technique. This is disclosed in US9892367B2.
US10861064B2 describes a data processing technique and system that helps petroleum geoscientists identify sentiment mismatches in text related to petroleum geoscience. The data processing system carries out computations to find pertinent petroleum geoscience associations, forecast their sentiment, and highlight sentiment conflicts associated with those relationships. Businesses can calibrate the inconsistencies against their exploration and operations success and failure records in order to train a classifier to predict potential opportunities and risks. More specifically, the system can manage digital unstructured text from a wide variety of sources, such as academic articles, corporate reports, and websites; this data is analyzed and used to assist petroleum geoscientists in identifying opportunities and dangers to their enterprises. US8955106B2 provides systems and techniques for managing infectious messages that have been sent: managing electronic messages comprises receiving a message, forwarding it, detecting whether the forwarded message is infectious after it has been forwarded, and preventing further spread of the infectious forwarded message. Currently, most anti-virus programs identify viruses using virus signatures derived from previously identified viruses. Such solutions, however, frequently fail to protect the network during the period between the initial emergence of a virus and the deployment of its signature. During this interval, also known as "time zero" or "day zero", networks are more susceptible to attack by malicious actors. For a standard anti-virus system to function well, viruses must be discovered, signatures for those viruses must be established, and the system must be deployed. The time-zero danger can re-emerge even after the system has been adapted to deal with an outbreak, because the virus mutates and renders the previous signature useless.
Summary of the Invention
Using the spam.csv dataset, the proposed invention demonstrates the effectiveness of a multinomial Naive Bayes classifier for the detection of spam. This strategy will, in the long term, help detect and filter spam, a significant problem caused by unwanted messages. The system uses artificial intelligence techniques to discriminate between legitimate and spam communications, based on the features of the training data. Through this technology, spam detection is improved while dependency on human monitoring and external services is minimized.
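A minimal sketch of the summarized pipeline follows, assuming the Kaggle-style spam.csv layout with a "v1" label column and a "v2" message column; the column names, file path, and split ratio are assumptions rather than requirements of the invention.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the dataset and retain only the class label and the message text.
df = pd.read_csv("spam.csv", encoding="latin-1")[["v1", "v2"]]
df.columns = ["label", "message"]

X_train, X_test, y_train, y_test = train_test_split(
    df["message"], df["label"], test_size=0.2, random_state=42)

vectorizer = CountVectorizer(stop_words="english")   # bag-of-words token counts
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

clf = MultinomialNB()                                 # word-frequency based classifier
clf.fit(X_train_counts, y_train)

pred = clf.predict(X_test_counts)
print("accuracy:", accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))
```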
Detailed Description of the Invention
It is also important to keep in mind that word-matching algorithms have inherent performance limitations. With word-matching research tools, it is of particular importance to restrict the number of words in the query posed to the system, because the choice of terms has a dramatic effect on the outcome and limits the amount of data that can be retrieved: the query should contain every term that is genuinely relevant and nothing beyond that.
If the user employs an excessive number of generic terms, however, the search engine will return documents that merely contain those general phrases and are otherwise useless. To perform the task, the user must therefore already have some understanding of the subject area, and selecting a small number of words that are still significant is not an easy task in itself. The user must be sensitive to two aspects: 1) which information is relevant and should be added to the search, and which is irrelevant and should not (contextualization), and 2) the vocabulary that is appropriate for expressing that information (lexicalization). Both aspects are critical if the user is to succeed.
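The following small sketch, with invented example documents, illustrates the point: TF-IDF assigns low weight to terms that occur in nearly every document, so overly generic query terms contribute little discriminative information.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the court issued an order regarding the contract dispute",
    "the court heard the appeal on the patent claim",
    "the weather report mentioned the court yard of the museum",
]
vec = TfidfVectorizer()
vec.fit(docs)

# Ubiquitous words such as "the" or "court" receive the lowest IDF values,
# while discriminative terms such as "patent" or "contract" score higher.
for term, idf in sorted(zip(vec.get_feature_names_out(), vec.idf_), key=lambda x: x[1]):
    print(f"{term:>10s}  idf={idf:.2f}")
```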
If the user does not supply the words that are crucial and correct, or supplies too much irrelevant information, the word-matching program will be unable to do what it was designed to do; either of these failures, or both, may occur. For legal research, for instance, the user must understand to some extent the legal issues involved in the problem: before citing prior cases, the user must sift out the most important facts of the case and use the appropriate legal terms drawn from previous instances. In searching for a set of keywords that yields acceptable results, the user must go through a process of trial and error across a large number of potential word combinations.
Completing this phase is essential in order to identify the most appropriate group of keywords. The approach disclosed in this specification is an alternative technique that can be used in such situations, in which a research instrument intended to help practitioners learn what the law says on a given topic would otherwise be rendered ineffective.
A research instrument is meant to guide practitioners in learning what the law has to say on a given subject, and a similar conclusion can be reached for scientific literature and other uses of word-matching research tools, regardless of the setting in which the research is carried out; the same observation applies to a wide range of other situations. Over the last few years, several legal research tools have emerged that are available to the general public, and the capability to understand natural language was considered at various stages of their creation, which was necessary for these innovations to succeed. The main idea behind such research tools is to go systematically through the corpus of all case-related files contained in a database (e.g., judicial opinions, legislation, legal opinions, etc.) in order to identify language in the database records with the same meaning as the query.
These research instruments are built on this core principle, one purpose of which is to find language that provides a solution to the problem raised for consideration. Such a research instrument improves on conventional legal research instruments in that it does not require the language of the question to closely approximate the wording of a case document; this allows these tools to provide more precise results and makes them among the best currently available.
Brief description of Drawing
Figure 1. Data flow of the Spam Detection System

Claims:
The scope of the invention is defined by the following claims:
1. A computer-implemented method for classifying input textual data into categories based on previously labeled datasets, comprising the steps of:
a) receiving user input via a form submission and parsing the input text from the received form submission;
b) extracting features from the parsed text using a Count Vectorizer to convert the text into numerical token counts; and
c) classifying the extracted features using a trained Multinomial Naive Bayes classifier to predict the category of the input text.
2. The method according to claim 1, further comprising preprocessing the dataset by removing irrelevant columns and retaining only text messages and their associated class labels.
3. The method according to claim 1, wherein the dataset used for training comprises labeled examples of at least two classes, including spam and non-spam messages.
4. The method according to claim 1, further comprising splitting the dataset into a training set and a testing set using a train-test split approach to enable model training and evaluation.
5. The method according to claim 1, wherein the trained Multinomial Naive Bayes classifier predicts the category of the input text and provides a confidence score indicating the probability of the predicted classification.
6. The method according to claim 1, wherein the parsing of the form submission is performed by a Base HTTP Request Handler configured to handle POST requests and extract text input.
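By way of illustration only, and not as a limitation of the claims, the following minimal sketch shows how a Base HTTP Request Handler may parse a POST form submission, extract the text input, convert it to Count Vectorizer features, and return a Multinomial Naive Bayes prediction with a confidence score, in line with claims 1, 5, and 6; the form field name, the port, and the tiny in-line training set are assumptions made only so that the sketch runs standalone.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny in-line training set so the sketch is self-contained; in practice the model
# would be trained on spam.csv as described in the Summary of the Invention.
train_texts = ["win a free prize now", "call now to claim cash",
               "are we meeting for lunch", "see you at the office tomorrow"]
train_labels = ["spam", "spam", "ham", "ham"]
vectorizer = CountVectorizer()
clf = MultinomialNB().fit(vectorizer.fit_transform(train_texts), train_labels)

class SpamRequestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode("utf-8"))
        text = form.get("message", [""])[0]           # parse text from the form submission

        counts = vectorizer.transform([text])         # Count Vectorizer features
        label = clf.predict(counts)[0]                 # Multinomial Naive Bayes prediction
        confidence = clf.predict_proba(counts).max()   # probability of the predicted class

        body = f"{label} (confidence {confidence:.2f})".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8000), SpamRequestHandler).serve_forever()
```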
| # | Name | Date |
|---|---|---|
| 1 | 202541074782-REQUEST FOR EARLY PUBLICATION(FORM-9) [06-08-2025(online)].pdf | 2025-08-06 |
| 2 | 202541074782-FORM-9 [06-08-2025(online)].pdf | 2025-08-06 |
| 3 | 202541074782-FORM FOR STARTUP [06-08-2025(online)].pdf | 2025-08-06 |
| 4 | 202541074782-FORM FOR SMALL ENTITY(FORM-28) [06-08-2025(online)].pdf | 2025-08-06 |
| 5 | 202541074782-FORM 1 [06-08-2025(online)].pdf | 2025-08-06 |
| 6 | 202541074782-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [06-08-2025(online)].pdf | 2025-08-06 |
| 7 | 202541074782-EVIDENCE FOR REGISTRATION UNDER SSI [06-08-2025(online)].pdf | 2025-08-06 |
| 8 | 202541074782-EDUCATIONAL INSTITUTION(S) [06-08-2025(online)].pdf | 2025-08-06 |
| 9 | 202541074782-DRAWINGS [06-08-2025(online)].pdf | 2025-08-06 |
| 10 | 202541074782-COMPLETE SPECIFICATION [06-08-2025(online)].pdf | 2025-08-06 |