
System And Method For Automatic Evaluation Of Answer Scripts

Abstract: A system and method for automatically evaluating responses of students for descriptive answers is disclosed. The system (100) may include an input unit (102) adapted to receive responses of students from a plurality of sources. Embodiments may also include an input converter unit (104) adapted to receive responses from the input unit (102) and convert the received responses into an enhanced textual output. Embodiments may also include an evaluation unit (106) adapted to receive the enhanced textual output from the input converter unit (104) and evaluate the responses provided by the students; the evaluation unit (106) may be configured to perform pre-processing of the enhanced textual output to standardize the responses for further processing. Embodiments may also include compiling a list of reference responses from the key answers provided, a subset of randomized answer keys, or a summarization synthesized from high-scoring responses by students. FIG. 2


Patent Information

Application #
202241063737
Filing Date
08 November 2022
Publication Number
12/2023
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application

Applicants

AMRITA VISHWA VIDYAPEETHAM
Kasavanahalli, Carmelaram P.O. Bangalore – 560035, India

Inventors

1. PATI, PEETA BASA
AMRITA Vishwa Vidyapeetham, Department of Computer Science & Engineering, Kasavanahalli, Carmelaram P.O. Bangalore – 560035, India
2. GUPTA, DEEPA
AMRITA Vishwa Vidyapeetham, Department of Computer Science & Engineering, Kasavanahalli, Carmelaram P.O. Bangalore – 560035, India
3. SAYEED, MOHAMMED AZAM
AMRITA Vishwa Vidyapeetham, Department of Computer Science & Engineering, Kasavanahalli, Carmelaram P.O. Bangalore – 560035, India
4. KUMAR, AASHU
AMRITA Vishwa Vidyapeetham, Department of Computer Science & Engineering, Kasavanahalli, Carmelaram P.O. Bangalore – 560035, India

Specification

Description: FORM 2

THE PATENTS ACT, 1970
(39 of 1970)

COMPLETE SPECIFICATION
(See section 10 and rule 13)

TITLE
SYSTEM AND METHOD FOR AUTOMATIC EVALUATION OF ANSWER SCRIPTS
INVENTORS:
PATI, PEETA BASA
GUPTA, DEEPA
SAYEED, MOHAMMED AZAM
KUMAR, AASHU
Indian Citizens
AMRITA Vishwa Vidyapeetham, Department of Computer Science & Engineering,
Kasavanahalli, Carmelaram P.O. Bangalore – 560035, India
APPLICANT
AMRITA VISHWA VIDYAPEETHAM
Kasavanahalli, Carmelaram P.O.
Bangalore – 560035, India

THE FOLLOWING SPECIFICATION PARTICULARLY DESCRIBES THE INVENTION AND THE MANNER IN WHICH IT IS TO BE PERFORMED:

SYSTEM AND METHOD FOR AUTOMATIC EVALUATION OF ANSWER SCRIPTS
CROSS-REFERENCES TO RELATED APPLICATION
[1] None.
FIELD OF INVENTION
[2] The present disclosure relates to the evaluation of answer scripts and, in particular, to an automatic system for evaluation of answer scripts from multiple sources.
DESCRIPTION OF THE RELATED ART
[3] Examinations are the mainstay of assessment for students' academic success in educational institutions. Most classroom examinations are conducted with handwritten answer scripts. Manual grading of descriptive answers by human evaluators is a time-consuming and error-prone activity. Some amount of automation is possible for MCQs. For example, answers provided on screen by selecting radio buttons or check boxes can be fully automated. Answers to MCQs provided in the form of OMR sheets can also be automated with OMR readers. If the answer script contains check-boxes, these can be read with ICR/check-box readers available in the market and automation may be employed. However, anything beyond MCQs, where answers are provided in a descriptive format, presently has no solution.
[4] Automatic scoring methods for free-text student responses have the potential to significantly decrease instructors' burden. This task of automatically assessing such student responses (as opposed to, say, gap-filling questions) is known as short answer scoring (SAS), and automatic methods have been developed for tasks ranging from science assessments to reading comprehension, as well as for domains as diverse as foreign language learning, citizenship selection exams, and more traditional routine classroom tasks. AutoSAS-like systems can be applied in educational systems and businesses, where they can be utilized not only to reduce the economic cost of grading but also to significantly reduce the time-intensive task of manual correction, thereby providing an automated, scalable platform for consistent grading of answer sheets in an acceptable, time-bound manner. Students can also benefit immensely from such a scalable platform, as it provides continuous feedback on areas of improvement and strengths; students can use these insights to examine their work before submitting it for final approval.
[5] Various publications have tried to address the problems associated with automation of the evaluation of descriptive answers. US publication 2014065593A1 discloses a method to create and store assignments, automatically analyze student accuracy and ability, and store analysis on individual students for an extended amount of time. Various non-patent literature also discusses the evaluation of descriptive answers. Victoria et al., "Intelligent Short Answer Assessment using Machine Learning" (2020), discusses evaluation based on word usage, word importance, and the grammatical meaning of the sentences provided in the answer scripts. Extraction from the answer script, measuring various similarities from the summarized extracted text, and assigning a weight value to each calculated parameter to score the marks for the answer scripts is discussed in Rahman et al., "An Automated Approach for Answer Script Evaluation Using Natural Language Processing" (2019). Gomaa et al. discuss the reduction of time and the increase in evaluation consistency and standardization for student responses in "Tapping into the Power of Automatic Scoring" (201).
[6] Presently, there are no automated systems for evaluating answer scripts containing descriptive answers or responses provided by students in other forms such as speech. Therefore, there is a need for an automated system for evaluating answer scripts so that a large number of students can be assessed in a faster, less cumbersome, and less exhaustive manner.
SUMMARY OF THE INVENTION
[7] The present subject matter relates to a system and method for automatic evaluation of answer scripts.
[8] An embodiment of the present disclosure may include a system for automatically evaluating responses of students for descriptive answers. The system may include an input unit adapted to receive responses of students from a plurality of sources. Embodiments may also include an input converter unit adapted to receive responses from the input unit and convert the received responses into an enhanced textual output. Embodiments may also include an evaluation unit adapted to receive the enhanced textual output from the input converter unit and evaluate the responses provided by the students; the evaluation unit may be configured to perform pre-processing of the enhanced textual output to standardize the responses for further processing. Embodiments may also include compiling a list of reference responses from provided key answers, a subset of randomized answer keys, or answer keys synthesized from high-scoring responses by students. Embodiments may also include performing feature engineering to capture the patterns of responses provided by the students against the compiled reference answers obtained by feature extraction. Embodiments may also include performing dimensionality reduction on the features extracted from the enhanced textual output to reduce the redundancy and complexity of the enhanced textual output.
[9] Embodiments may also include determining and mimicking the cognitive patterns of human evaluators by modeling the extracted features using machine learning based training models coupled with semantic interpretation-based deep learning models on the enhanced representations of word embeddings. Embodiments may also include representing scores for student responses. Embodiments may also include a display unit for providing the final scores obtained by the students for their responses. In some embodiments, the plurality of sources includes one or more of a scanner, a keyboard, a tablet or a speech-to-text converter. In some embodiments, the input converter unit may include a text module, an audio module, a language module and an anomaly module.
[10] Embodiments of the present disclosure may also include a method for automatically evaluating responses of students for descriptive answers, the method may include receiving responses of students from a plurality of sources in an input unit. Embodiments may also include transmitting the received responses from the input unit to an input converter unit and converting the received responses into an enhanced textual output.
[11] Embodiments may also include performing pre-processing of the enhanced textual output to standardize the responses for further processing in an evaluation unit. Embodiments may also include compiling a list of reference responses from key answers provided, or a subset of randomized answer keys, or synthesized answer keys from high-scoring responses by students.
[12] Embodiments may also include performing feature engineering to capture the pattern of responses provided by the students to the reference answers obtained by feature extraction. Embodiments may also include performing dimensionality reduction on the features extracted from the enhanced textual output to reduce the redundancy and complexity of the enhanced textual output.
[13] Embodiments may also include determining and mimicking the cognitive patterns of human evaluators by modeling the extracted features using machine learning based training models coupled with semantic interpretation-based deep learning models on the enhanced representations of word embeddings. Embodiments may also include representing scores for student response. Embodiments may also include displaying the final scores obtained by the students for their responses on a display unit.
[14] In some embodiments, pre-processing of the enhanced textual output in the evaluation unit may include converting responses from uppercase, italics, or any other form to lowercase. Embodiments may also include removing irrelevant tokens present in textual data from web forms or speech responses. Embodiments may also include correcting usage of punctuation, spacing of words, and special characters. Embodiments may also include performing spell check and correction for spelling errors detected in the responses.
[15] In some embodiments, obtaining a list of reference responses may include receiving answer keys and generating answer keys based on the summarized text of all high-score responses of students and the received answer keys in a marking scheme. Embodiments may also include receiving pseudo answer keys addressing a plurality of marking scores for responses, with answer keys collected to represent marking scores other than the highest scores. Embodiments may also include receiving high-score responses and generating answer keys from a subset of smallest-length highest-scored responses, a subset of largest-length highest-scored responses, a subset of average-length highest-scored responses and a subset of median-length highest-scored responses.
[16] In some embodiments, feature engineering may include performing word embedding representations of a set of paired words to capture the semantic links between words. Embodiments may also include performing sentence embedding to obtain sentence-level semantics and eliminate errors that may be created through word embedding.
[17] Embodiments may also include obtaining representative sentence similarity scores to discern the similarity between the embedding representation of the enhanced textual output and the compiled reference answers. Embodiments may also include obtaining weighted keywords and generating n-gram ratios by extracting important keywords from texts, representing them as corresponding unigram, bigram, and trigram keywords, and generating a feature of the incidence ratio of a set of n-grams for each of the student responses to the compiled reference answers.
[18] Embodiments may also include determining the number of close matches on word-based scores using the important unigram keywords collected in n-grams in the textual output. Embodiments may also include determining statistical textual features of each response by number of words, number of unique terms, standard type-token ratio (TTR), root TTR, corrected TTR, and mean segmental TTR.
[19] Embodiments may also include determining linguistic features of the responses by computing average sentence length by character, average sentence length by word, average syllables per word, count of special characters, count of punctuations, and count of functional words. Embodiments may also include determining vocabulary richness features of the responses by hapax legomena, Honoré's measure, hapax dislegomena, Yule's characteristic, Simpson's index, Sichel's measure, Brunet's measure, and Shannon entropy.
[20] Embodiments may also include determining readability features of the responses by computing Flesch Reading Ease, Flesch-Kincaid Grade Level, Dale-Chall readability, and the Gunning fog index. Embodiments may also include determining the presence of alphanumerical indicators in the responses. Embodiments may also include obtaining lexical overlap for the response by computing the ratio of the incident count of nouns/verbs in a response to the total length of the set of nouns/verbs for each response.
[21] Embodiments may also include obtaining matching scores of important lexical words of nouns and verbs collected from each reference answer for student responses. Embodiments may also include obtaining word content overlap to determine the extent to which words/phrases capture content of a sentence in the response overlaps with the reference answer.
[22] Embodiments may also include obtaining arguments overlap to determine phrases in the response having noun followed by the verb. Embodiments may also include obtaining prompt overlap to determine the overlap between a question and the response. Embodiments may also include obtaining temporal features to determine the overall tense of the sentences used in the responses to understand a student’s writing style. Embodiments may also include performing Named Entity Recognition to classify a token in unstructured text into pre-defined entity classes based on domain/language corpus to generate features of ratios of incident counts of extracted Entity class from each response using pre-trained token classification models.
[23] Embodiments may also include determining the learning pattern and word representation, which may include training regression models based on machine learning with a dataset emulating the human evaluator's score. Embodiments may also include training fine-tuned embedding regression models with the dataset of embedding representations pretrained on a massive corpus.
[24] Embodiments may also include converting scanned documents of typewritten and/or handwritten responses into enhanced textual output may include performing noise estimation and removal of noise from the received scanned documents. Embodiments may also include performing boundary detection and page segmentation on the noise-free scanned documents.
[25] Embodiments may also include performing size normalization and adaptive binarization on the segmented scanned documents. Embodiments may also include performing layout analysis on the normalized documents. Embodiments may also include performing script recognition and character recognition after layout analysis for the scanned documents. Embodiments may also include performing language-specific accuracy enhancement required to increase the readability of the scanned documents. Embodiments may also include generating enhanced textual output from the scanned document.
[26] Embodiments may also include converting textual data into enhanced textual output, which may include performing language identification and translation for received textual data to increase readability. Embodiments may also include generating enhanced textual output from the textual data. Embodiments may also include converting audio input into enhanced textual output, which may include performing noise estimation and removal on the received audio input. Embodiments may also include performing language and speech identification on the noise-less audio input. Embodiments may also include performing language-specific accuracy enhancement on the identified audio output to increase the clarity of the audio output. Embodiments may also include generating enhanced textual output from the audio output.
[27] Embodiments may also include determining anomalies in input received as scanned documents, textual data and audio input, which may include receiving anomalous text as input. Embodiments may also include performing anomaly detection on the received anomalous text. Embodiments may also include providing suggested corrections and scoring on the received anomalous text. Embodiments may also include determining the decision on the correction to be made to the anomalous text. Embodiments may also include generating enhanced textual output from the corrected anomalous text.
[28] This and other aspects are described herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[29] The invention has other advantages and features, which will be more readily apparent from the following detailed description of the invention and the appended claims, when taken in conjunction with the accompanying drawings, in which:
[30] Figure 1 is a block diagram illustrating a system, according to some embodiments of the present disclosure.
[31] Figure 2 is a flowchart illustrating a method for automatically evaluating responses of students for descriptive answers, according to some embodiments of the present disclosure.
[32] Figure 3 is a flowchart illustrating the method for pre-processing of the enhanced textual output, according to some embodiments of the present disclosure.
[33] Figure 4 is a flowchart further illustrating the method for obtaining a list of reference responses, according to some embodiments of the present disclosure.
[34] Figure 5 is a flowchart further illustrating the method for performing feature engineering, according to some embodiments of the present disclosure.
[35] Figure 6 is a flowchart illustrating the method for converting scanned documents of typewritten and/or handwritten responses into enhanced textual output, according to some embodiments of the present disclosure.
[36] Figure 7 is a flowchart illustrating the method for converting textual data into enhanced textual output, according to some embodiments of the present disclosure.
[37] Figure 8 is a flowchart illustrating the method for converting audio input into enhanced textual output, according to some embodiments of the present disclosure.
[38] Figure 9 is a flowchart illustrating the method for determining anomalies in input received from plurality of sources, according to some embodiments of the present disclosure.
[39] Referring to the figures, like numbers indicate like parts throughout the various views.

DETAILED DESCRIPTION OF THE EMBODIMENTS
[41] While the invention has been disclosed with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the invention. In addition, many modifications may be made to adapt to a particular situation or material to the teachings of the invention without departing from its scope.
[42] Throughout the specification and claims, the following terms take the meanings explicitly associated herein unless the context clearly dictates otherwise. The meaning of “a”, “an”, and “the” include plural references. The meaning of “in” includes “in” and “on.” Referring to the drawings, like numbers indicate like parts throughout the views. Additionally, a reference to the singular includes a reference to the plural unless otherwise stated or inconsistent with the disclosure herein.
[43] The present subject matter describes a system and method for automatically evaluating answer scripts provided by students during both online and offline examinations. It facilitates the human evaluators in evaluating the answer scripts quickly and without bias while reducing the evaluation time.
[44] A system 100 for automatically evaluating answer scripts is illustrated in FIG. 1, according to some embodiments of the present disclosure. In some embodiments, the system 100 may include an input unit 102 adapted to receive responses of students from a plurality of sources, and an input converter unit 104 connected to the input unit 102 and adapted to receive responses from the input unit 102 and convert the received responses into an enhanced textual output. The system 100 may also include an evaluation unit 106 connected to the input converter unit 104, adapted to receive the enhanced textual output from the input converter unit 104 and evaluate the responses provided by the students. The system 100 further includes a display unit 108 for providing the final scores obtained by the students for their responses. The plurality of sources from which inputs may be received includes a scanner, a keyboard, a tablet or a speech-to-text converter. Further, the input converter unit 104 includes a text module, an audio module, a language module and an anomaly module for processing inputs received in a plurality of formats from a plurality of sources.
[45] In some embodiments, the evaluation unit 106 performs pre-processing of the received enhanced textual output to standardize the responses for further processing, and compiles a list of reference responses from the key answers provided, a subset of randomized answer keys, or a synthesized set of answer keys obtained, for example, by extractive/abstractive summarization, paraphrasing or extractive/abstractive question answering (for passage-based prompt types) from high-scoring responses by students. The evaluation unit 106 further performs feature engineering to capture patterns in the enhanced/raw responses provided by the students and, more importantly, relational patterns of the student responses with the compiled reference answers obtained by feature extraction. In some embodiments, dimensionality reduction is performed on the features extracted from the enhanced textual output to reduce the redundancy and complexity of the enhanced textual output. The cognitive patterns of human evaluators are determined and mimicked by modeling the extracted features using machine learning based training models coupled with semantic interpretation-based deep learning models on the enhanced representations of word embeddings, and scores for student responses are obtained.
[46] FIG. 2 is a flowchart that shows a method for automatically evaluating responses of students for descriptive answers, according to some embodiments of the present disclosure. The method may include receiving responses of students from a plurality of sources in an input unit as provided in 202. At 204, the method may include transmitting the received responses from the input unit to an input converter unit and converting the received responses into an enhanced textual output. At 206, the method may include performing pre-processing of the enhanced textual output to standardize the responses for further processing in an evaluation unit. In some embodiments, at 208, the method may include compiling a list of reference responses from the key answers provided, a subset of randomized answer keys, or answer keys synthesized by extractive/abstractive summarization, paraphrasing or extractive/abstractive question answering (for passage-based prompt types) from high-scoring responses by students. At 210, the method may include performing feature engineering to capture the pattern of responses provided by the students relative to the compiled reference answers obtained by feature extraction. At 212, the method may include performing dimensionality reduction on the features extracted from the enhanced textual output to reduce the redundancy and complexity of the enhanced textual output. At 214, the method may include determining and mimicking the cognitive patterns of human evaluators by modeling the extracted features using machine learning based training models coupled with semantic interpretation-based deep learning models on the enhanced representations of word embeddings. At 216, the method may include representing scores for student responses. At 218, the method may include displaying the final scores obtained by the students for their responses on a display unit.
[47] In some embodiments, determining the learning pattern and word representation pertaining to the cognitive patterns of the evaluators includes performing one or more additional steps such as training regression models based on machine learning with a dataset emulating the human evaluator's score, and training fine-tuned embedding deep learning regression models with a dataset of embedding representations pretrained on a massive corpus. Machine learning regression models selected for determination of learning patterns are those that are most consistent in a model performance metric such as quadratic weighted kappa (QWK), which is a typical assessment metric for determining agreement between two raters, such as a human evaluator and any automated evaluation system. Hyperparameter tuners that are external to the models are run before model training to provide the optimal values of the hyperparameters to be configured for model training. Further, transformer models having a modified architecture, with the final output layer as a regressor rather than standard classifiers, are used and fine-tuned on the same training set obtained during ML-based modeling for consistency, the only difference being that the input feature is the preprocessed text of student responses. Fine-tuning on pretrained models facilitates the utilization of the massive pools of embedding representations trained on massive corpora and hardware units, which surpass traditional recurrent neural networks or autoregressive models, with token masking, attention layers and complex bidirectional representation of sentences allowing the model to learn the complex inner representation of language.
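By way of illustration, the following is a minimal sketch of computing the quadratic weighted kappa agreement metric named above using scikit-learn; the score lists are hypothetical and any rounded model predictions could be substituted.

```python
# Minimal QWK sketch: Cohen's kappa with quadratic weights, as provided by scikit-learn.
from sklearn.metrics import cohen_kappa_score

human_scores = [3, 2, 5, 4, 1, 3]   # hypothetical marks awarded by a human evaluator
model_scores = [3, 2, 4, 4, 2, 3]   # hypothetical (rounded) marks predicted by the model

qwk = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(f"Quadratic weighted kappa: {qwk:.3f}")
```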
[48] A flowchart illustrating the method for pre-processing of the enhanced textual output, according to some embodiments of the present disclosure, is shown in FIG. 3. The pre-processing method used in the evaluation unit 106 includes converting responses from uppercase, italics, or any other form to lowercase as provided in step 302 and removing irrelevant tokens present in textual data from web forms or speech responses as provided in step 304. It also includes correcting usage of punctuation, spacing of words, and special characters as mentioned in step 306 and performing spell check and correction for spelling errors detected in the responses at 308. Spell correction helps improve semantics-based features such as word embedding and sentence-level embedding representations.
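A minimal sketch of these pre-processing steps follows, assuming simple regular-expression clean-up and the pyspellchecker library as the spelling corrector; the patterns shown are illustrative rather than the specific components of the disclosed system.

```python
import re
from spellchecker import SpellChecker  # pip install pyspellchecker (assumed corrector)

def preprocess(response: str) -> str:
    text = response.lower()                               # step 302: convert to lowercase
    text = re.sub(r"<[^>]+>", " ", text)                  # step 304: drop HTML-like web-form tokens (illustrative)
    text = re.sub(r"\s+([.,;:?!])", r"\1", text)          # step 306: remove space before punctuation
    text = re.sub(r"[^a-z0-9\s.,;:?!'-]", " ", text)      # step 306: drop stray special characters
    text = re.sub(r"\s{2,}", " ", text).strip()           # step 306: normalize spacing
    spell = SpellChecker()                                 # step 308: spell check and correction
    corrected = [spell.correction(w) or w for w in text.split()]
    return " ".join(corrected)

print(preprocess("Teh  Answr <b>is</b> NEWTONS  first LAW ."))
```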
[49] FIG. 4 is a flowchart illustrating the method for obtaining a list of reference responses, according to some embodiments of the present disclosure. The method includes, at 402, receiving answer keys and generating summarized answer keys based on all high-score responses of students, or additionally on summarization of paraphrased textual responses; for special prompts having reading passages, extractive and abstractive question answering tasks can be utilized to generate additional answer keys, together with the received answer keys in a marking scheme. At 404, the method includes receiving pseudo answer keys addressing a plurality of marking scores for responses, with answer keys collected to represent marking scores other than the highest scores. At 406, the method includes receiving high-score responses and generating answer keys from a subset of smallest-length highest-scored responses, a subset of largest-length highest-scored responses, a subset of average-length highest-scored responses and a subset of median-length highest-scored responses.
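A minimal sketch of synthesizing one such reference key by summarizing high-scoring responses is shown below; the Hugging Face summarization checkpoint and the sample responses are assumptions, and paraphrasing or question-answering pipelines could supply the other reference-key variants in the same way.

```python
from transformers import pipeline

# Assumed abstractive summarization checkpoint; any summarizer can be substituted.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

high_score_responses = [  # hypothetical high-scoring student responses
    "Photosynthesis is the process by which green plants convert sunlight into chemical energy.",
    "In photosynthesis, chlorophyll absorbs light, and carbon dioxide and water are converted into glucose and oxygen.",
]

# Summarize the concatenated high-score responses into a synthesized answer key.
summary = summarizer(" ".join(high_score_responses), max_length=60, min_length=15, do_sample=False)
print(summary[0]["summary_text"])
```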
[50] A method for performing feature engineering, according to embodiments of the present disclosure, is shown in FIG. 5. The method includes performing word embedding representations of a set of paired words to capture the semantic links between words at 502, performing sentence embedding to obtain sentence-level semantics and eliminate errors that may be created through word embedding at 504, and obtaining representative sentence similarity scores to discern the similarity between the embedding representation of the enhanced textual output and the compiled reference answers as provided in 506. The method further includes obtaining weighted keywords and generating n-gram ratios by extracting important keywords from texts, representing them as corresponding unigram, bigram, and trigram keywords, and generating a feature of the incidence ratio of a set of n-grams for each of the responses to the compiled reference answers as presented in 508, determining the number of close matches on word-based scores using the important unigram keywords collected in n-grams in the textual output at 510, determining statistical textual features of each response by number of words, number of unique terms, standard type-token ratio (TTR), root TTR, corrected TTR, and mean segmental TTR at 512, and determining linguistic features of the responses by computing average sentence length by character, average sentence length by word, average syllables per word, count of special characters, count of punctuations, and count of functional words as provided in 514. In 516, the method further determines vocabulary richness features of the responses by hapax legomena, Honoré's measure, hapax dislegomena, Yule's characteristic, Simpson's index, Sichel's measure, Brunet's measure, and Shannon entropy; determining readability features of the responses by computing Flesch Reading Ease, Flesch-Kincaid Grade Level, Dale-Chall readability, and the Gunning fog index is provided at 518, determining the presence of alphanumerical indicators in the responses at 520, and obtaining lexical overlap for the response by computing the ratio of the incident count of nouns/verbs in a response to the total length of the set of nouns/verbs for each response at 522. The method further includes obtaining matching scores of important lexical words of nouns and verbs collected from each reference answer for student responses as provided in 524, obtaining word content overlap to determine the extent to which words/phrases capturing the content of a sentence in the response overlap with the reference answer at 526, obtaining argument overlap to determine phrases in the response having a noun followed by a verb at 528, obtaining prompt overlap to determine the overlap between a question and the response at 530, obtaining temporal features to determine the overall tense of the sentences used in the responses to understand a student's writing style at 532, and performing named entity recognition to classify tokens in unstructured text into entity classes based on a pretrained domain/language corpus, to generate features of ratios of incident counts of the extracted entities from each response using pre-trained token classification models as provided in 534.
[51] In some embodiments, the textual data representation as word embeddings facilitates capturing semantic links between words and documents in their various settings. Although word embeddings aid in comprehension of the text, averaging of words may lead to some errors, which are resolved by using sentence embedding. To improve the embedding representation and the similarity between reference answers, sentence similarity scores are utilized that extrapolate sentence embeddings to a similarity score. These regressors work pair-wise, as Siamese networks with cosine similarity, to generate similarity scores. To obtain further clarity for the responses received, n-grams are utilized, as they capture context very well from the text, but they have to be constructed from meaningful phrases/keyphrases. Important keywords are extracted from texts and represented as corresponding unigram, bigram, and trigram keywords, generating a feature of the ratio of a set of n-grams for each student response to the reference answers. The unigrams are computed from high-scored responses, and the ratio of the number of unigrams present in each student response matching the unigrams generated from the reference answer to the total set of unigrams is computed. This is repeated by generating ratios of unique unigrams for the available reference keys, such as the summarized text of all high-scored responses, the paraphrased summarized text, the answer key from the question answering pipeline based on the raw/preprocessed passage text provided in the prompt (for reading passages) and the question, and the reference key list generated from the smallest-length, largest-length, average-length, and median-length high-score responses. The n-gram ratio is computed as bigrams and trigrams for each student response against the compiled collection of reference keys.
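A minimal sketch of the unigram incidence-ratio feature described above follows; it uses a plain whitespace tokenizer, whereas the disclosed method may rely on a dedicated keyword extractor, and the example strings are hypothetical.

```python
def unigram_ratio(student_response: str, reference_key: str) -> float:
    """Ratio of reference-key unigrams that also occur in the student response."""
    ref_unigrams = set(reference_key.lower().split())
    if not ref_unigrams:
        return 0.0
    resp_unigrams = set(student_response.lower().split())
    return len(ref_unigrams & resp_unigrams) / len(ref_unigrams)

print(unigram_ratio(
    "force equals mass times acceleration",
    "newton's second law states that force equals mass times acceleration",
))
```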
[52] In some embodiments, statistical features act as discriminating features between low-scored responses and high-scored answers, as statistical metrics can capture patterns that facilitate determining the human evaluator's patterns and the complexity of natural language. Sentence length indicates the complexity of a sentence; various ratios of the sentence length of each student response are computed against the length of the prompt, the lists of reference answers computed from the collections of smallest, largest, medium, and median length, the provided answer key and the extracted answer key. To capture the lexical knack of each student response, features are computed as the number of words, the number of unique terms, the standard type-token ratio (TTR), root TTR, corrected TTR and mean segmental TTR. To complement the statistical features, linguistic and vocabulary features are used. These include lexical features such as average sentence length by character, average sentence length by word, average syllables per word, count of special characters, count of punctuations and count of functional words. The vocabulary richness of student responses is computed using the following measures. Hapax legomenon is a metric that captures words or expressions that occur only once in the entire context; these reflect Zipf's law, the appearance of words not belonging to their origin or prevalence in context. The hapax dislegomena metric is a subset of significantly low hapax such that 40-60% belong to hapax legomena and the remaining 10-15% belong to dislegomena. Yule's characteristic (K) is a word frequency measurement for a large block of text that measures the likelihood of two nouns, chosen at random from the text, being the same, which indicates the repetitiveness and complexity of each student's response. Simpson's index is a metric used to represent the diversity of words used in sentences, calculated as the weighted arithmetic mean of proportional abundance, and measures the probability that two individual words selected at random belong to the same lexicons. Brunet's measure is a lexical diversity measure that has been employed in stylometric text studies and is typically stated to be independent of text length. Additionally, Shannon entropy is used, a metric derived from the foundational concepts of information theory; it is an index of entropy that quantifies the amount of information contained in the text, such as a student's response.
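A minimal sketch of a few of these measures (standard and root TTR, hapax legomena, Shannon entropy) is given below; the remaining measures (Yule's K, Simpson's index, Brunet's measure, etc.) follow the same token-counting pattern, and the example sentence is hypothetical.

```python
import math
from collections import Counter

def vocabulary_features(text: str) -> dict:
    tokens = text.lower().split()
    counts = Counter(tokens)
    n, v = len(tokens), len(counts)                       # token count and type count
    ttr = v / n if n else 0.0                             # standard type-token ratio
    root_ttr = v / math.sqrt(n) if n else 0.0             # root TTR
    hapax = sum(1 for c in counts.values() if c == 1)     # hapax legomena: words occurring once
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    return {"ttr": ttr, "root_ttr": root_ttr, "hapax_legomena": hapax, "shannon_entropy": entropy}

print(vocabulary_features("the cell is the basic unit of life and the cell divides"))
```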
[53] Further, readability features are extracted from the received student responses using the following measures. Flesch Reading Ease is one of the more accurate readability formulas, which assumes that the best text should contain shorter sentences and words; a score between 60-70 is acceptable, a lower score such as 0-29 signifies quite confusing text, and 90-100 represents quite simple and clear text. The Flesch-Kincaid Grade Level is a similar metric that indicates the corresponding US education grade level required to interpret the passage or answer text. The Dale-Chall readability formula is a readability test that offers a numerical assessment of the level of comprehension difficulty that readers encounter when reading a document; it uses a list of 3000 words that groups of fourth-grade American kids can dependably grasp, with any word that is not on that list considered tough. The Gunning fog index (similar to the Flesch-Kincaid Grade Level and the Dale-Chall readability formula) is a test of readability for English text that calculates the number of years of formal education required to grasp the material on the first reading. For responses that require numerical values with units and mathematical computations, such as answers for physics and chemistry, the responses contain alphanumerical values; the top three most commonly occurring alphanumeric values from high-scored responses, which probably represent important terms for evaluating responses, are obtained, and then three separate Boolean features are recorded that indicate whether the values are present in each student response.
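A minimal sketch of the Flesch Reading Ease feature follows, using a crude vowel-group heuristic for syllable counting; a readability library such as textstat would give more faithful values for this and the other indices named above.

```python
import re

def flesch_reading_ease(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n_words = max(1, len(words))
    # Crude syllable estimate: count vowel groups, with a minimum of one per word.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

print(round(flesch_reading_ease(
    "Plants make food using sunlight. This process is called photosynthesis."), 1))
```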
[54] In some embodiments, lexical overlap measures the frequency of noun overlap between two phrases. A collection of nouns is collected from the highest-scored student responses and then filtered based on a count threshold; this list is used to find the incident count, and the ratio is computed over the length of the noun set collected from the model answers. A similar metric is computed for verb overlap. Further, the ratio of the incident count of nouns/verbs to the total length of the set of nouns/verbs is captured as features for each student response against the entire compiled text of high-scored student responses, the summarized text of high-scored responses, the paraphrased summarized text, the extracted question answering pipeline text (if a passage-based prompt) and the reference answer lists generated based on the smallest-, largest-, average-, and median-length high-scored responses. Further, word content overlap determines the extent to which words/phrases capturing the content of a sentence overlap with the model answers. This collection is expanded to include word synonyms. Similar to lexical overlap, sets of nouns and verbs are computed as separate features using the GloVe architecture. The ratio of the incidence count of nouns and verbs (taken separately) to the total nouns/verbs from the reference text is computed, and content captured as synonyms is included in the incident count using a GloVe similarity threshold > 0.7.
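A minimal sketch of the noun-overlap ratio follows, assuming spaCy's small English model is installed (python -m spacy download en_core_web_sm); verb overlap is computed analogously with pos_ == "VERB", and the GloVe-based synonym expansion is omitted here.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed POS tagger

def noun_overlap_ratio(student_response: str, reference_text: str) -> float:
    """Ratio of reference nouns that also appear (by lemma) in the student response."""
    ref_nouns = {t.lemma_.lower() for t in nlp(reference_text) if t.pos_ == "NOUN"}
    if not ref_nouns:
        return 0.0
    resp_nouns = {t.lemma_.lower() for t in nlp(student_response) if t.pos_ == "NOUN"}
    return len(ref_nouns & resp_nouns) / len(ref_nouns)

print(noun_overlap_ratio(
    "the mitochondria produce energy for the cell",
    "mitochondria are organelles of the cell that produce energy",
))
```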
[55] In further embodiments, argument overlap is performed by creating custom rules using the spaCy package to extract phrases having the POS tag sequence of a noun followed by a verb. This limits the phrases extracted by the rule to noun-verb phrases; however, the rules can be expanded to other POS tag sequences depending on computing resources. For phrase overlap, a GloVe embedding similarity score threshold > 0.8 is used to capture the semantics and consider word synonyms. Noun-verb phrases are extracted from high-score clustered sentences by normalizing the sentence embeddings to unit length and applying affinity propagation clustering, which generates cluster centers without the need to specify the number of clusters as K-means does. From the list of all generated cluster-center sentences, features are generated for the ratio of the incident count of noun-verb phrases of each student response matching the total list of noun-verb phrases from the cluster-center list to the total length of the unique set of cluster phrases. This facilitates the usage of a reference list of high-scored answer cluster centers to represent the sample of high-scored answers more objectively. Noun-verb phrases of summative and raw-text high-scored responses are obtained by generating a ratio of incident counts of noun-verb phrases of the summative text of all high-scored answers, the paraphrased summarized text and the provided answer key. Additionally, the incident count ratio is used for all high-scored responses so that no important phrase is missed from the summative text. Noun-verb phrases of length-dependent reference answers are obtained by generating a ratio of the incident count of matching noun-verb phrases from the lists of reference answers (namely the smallest-length, largest-length, average-length and median-length high-scored answer lists) to the total length of unique noun-verb phrases extracted from each using spaCy rule matchers.
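A minimal sketch of the clustering step is shown below: sentence embeddings are normalized to unit length and affinity propagation selects exemplar (cluster-center) sentences without a preset number of clusters. The sentence-transformers encoder named here is an assumption; any sentence embedding model can be substituted.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sentence_transformers import SentenceTransformer  # assumed sentence encoder

high_score_sentences = [  # hypothetical sentences from high-scored responses
    "Force equals mass times acceleration.",
    "Acceleration is proportional to the net force applied.",
    "Plants make food through photosynthesis.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed checkpoint
embeddings = encoder.encode(high_score_sentences)
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)  # unit length

ap = AffinityPropagation(random_state=0).fit(embeddings)  # no preset cluster count
for idx in ap.cluster_centers_indices_:
    print("cluster-center sentence:", high_score_sentences[idx])
```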
[56] In further embodiments, prompt overlap is computed to obtain the overlap between the question and the answer. Short replies often get part of their information from the question asked, such as in English comprehension descriptive questions. Hence, the measure of the overlap between the question and the student response is an important grading statistic for any grader. For example, in reading comprehension-based prompts, the replies derive their context and meaning from the passage and the question itself. The prompt overlap is framed as a textual entailment task in which the student responses are evaluated against the prompt provided as the hypothesis. To obtain the prompt overlap, an embedded representation of each word in the phrases provided in the premise and hypothesis is computed, and a final entailment decision is made with the aggregated score. The question/prompt is used as the hypothesis and each student response as the premise, and the obtained result dictionary of entailment, contradiction, and neutrality scores is used as features.
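A minimal sketch of such an entailment-based prompt-overlap feature follows; the natural language inference checkpoint named here is an assumption, and the entailment/neutral/contradiction probabilities it produces would serve as the features described above.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ckpt = "roberta-large-mnli"  # assumed natural language inference checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt)

premise = "Photosynthesis converts sunlight into chemical energy in plants."  # student response
hypothesis = "Plants convert sunlight into energy."                           # question/prompt

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1).squeeze().tolist()
# Label order for this checkpoint: contradiction, neutral, entailment.
print(dict(zip(["contradiction", "neutral", "entailment"], probs)))
```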
[57] In some embodiments, temporal features are used to determine the tense of sentences using Boolean features representing present, future, and past tense; it is possible to have multiple temporalities for each student response depending on the complexity of the language used. Named entity recognition (NER) is a subtask of information extraction that classifies each token in unstructured text into pre-defined entity categories based on the domain. Features are generated as ratios of incident counts of the entities extracted from each student response using pretrained token classification models. NER is run on each student response to generate an incident count of the extracted unique entities of interest against the collected list of reference answers and the summarized text of all high-scored answers.
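A minimal sketch of the NER-based feature follows, using spaCy's pretrained pipeline as the token-classification model (an assumption); the ratio of reference entities of a given class also found in the student response is emitted as the feature.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed pretrained NER pipeline

def entity_overlap_ratio(student_response: str, reference_text: str, label: str = "PERSON") -> float:
    """Ratio of reference entities of the given class also found in the response."""
    ref_ents = {e.text.lower() for e in nlp(reference_text).ents if e.label_ == label}
    if not ref_ents:
        return 0.0
    resp_ents = {e.text.lower() for e in nlp(student_response).ents if e.label_ == label}
    return len(ref_ents & resp_ents) / len(ref_ents)

print(entity_overlap_ratio(
    "Newton proposed the three laws of motion",
    "The three laws of motion were proposed by Newton",
))
```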
[58] In some embodiments, dimensionality reduction is performed to reduce the space and computation time required for modeling the dataset, reduce redundancy in the data, avoid overfitting and reduce the complexity of the dataset in the training model. As high-dimensional word and sentence embeddings are used, there is a need for applying a generalized dimensionality reduction to the responses. The dimensionality reduction used is mutual information, which measures the entropy drop under the condition of the target value: MI(feature; target) = Entropy(feature) - Entropy(feature | target). The MI score ranges from 0 to 1, and a higher value signifies a closer relation of the feature to the target. sklearn's mutual_info_regression is used to estimate mutual information for the continuous target column, and a filter of MI threshold > 0 is used to obtain the subset of features such that the two variables are not independent.
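A minimal sketch of this feature-selection step follows, on a hypothetical feature matrix: mutual_info_regression scores each feature against the continuous target (the human-awarded score), and only features with MI > 0 are retained.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                         # hypothetical feature matrix
y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)   # hypothetical target scores

mi = mutual_info_regression(X, y, random_state=0)   # MI of each feature with the target
selected = np.where(mi > 0)[0]                      # keep features not independent of the target
print("MI scores:", np.round(mi, 3))
print("selected feature indices:", selected)
```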
[59] FIG. 6 is a flowchart illustrating the method for converting scanned documents of typewritten and/or handwritten responses into enhanced textual output, according to some embodiments of the present disclosure. The method includes performing noise estimation and removal of noise from the received scanned documents as provided in 602, performing boundary detection and page segmentation on the noise-free scanned documents as provided in 604, and performing size normalization and adaptive binarization on the segmented scanned documents as provided in 606. Further, the method includes performing layout analysis on the normalized documents at 608, performing script recognition and character recognition after layout analysis for the scanned documents at 610, performing language-specific accuracy enhancement required to increase the readability of the scanned documents at 612, and generating enhanced textual output from the scanned document as provided in 614.
[60] FIG. 7 is a flowchart illustrating the method for converting textual data into enhanced textual output, according to some embodiments of the present disclosure. The method includes performing language identification and translation for received textual data to increase the readability at 702 and generating enhanced textual output from the textual data as provided in 704.
[61] FIG. 8 is a flowchart illustrating the method for converting audio input into enhanced textual output, according to some embodiments of the present disclosure. The method includes performing noise estimation and removal on the received audio input at 802, performing language and speech identification on the noise-less audio input at 804, performing language-specific accuracy enhancement on the identified audio output to increase the clarity of the audio output at 806, and generating enhanced textual output from the audio output as provided in 808.
[62] FIG. 9 is a flowchart illustrating the method for determining anomalies in input received from a plurality of sources, according to some embodiments of the present disclosure. The method includes receiving anomalous text as input at 902, performing anomaly detection on the received anomalous text at 904, providing suggested corrections and scoring on the received anomalous text at 906, determining the decision on the correction to be made to the anomalous text at 908, and generating enhanced textual output from the corrected anomalous text at 910.
[63] Although the detailed description contains many specifics, these should not be construed as limiting the scope of the invention but merely as illustrating different examples and aspects of the invention. It should be appreciated that the scope of the invention includes other embodiments not discussed herein. Various other modifications, changes and variations which will be apparent to those skilled in the art may be made in the arrangement, operation and details of the system and method of the present invention disclosed herein without departing from the scope of the invention.
[64] While the invention has been disclosed with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from its scope, which should be as delineated in the claims appended hereto.

Claims: WE CLAIM:
1. A system (100) for automatically evaluating responses of students for descriptive answers, the system comprises:
an input unit (102) adapted to receive responses of students from a plurality of sources;
an input converter unit (104) adapted to receive responses from the input unit and convert the received responses into an enhanced textual output;
an evaluation unit (106) adapted to receive the enhanced textual output from the input converter unit and evaluate the responses provided by the students, the evaluation unit is configured to:
perform pre-processing of the enhanced textual output to standardize the responses for further processing;
compile a list of reference responses from key answers provided, or a subset of randomized answer keys, or synthesized answer keys from high-scoring responses by students;
perform feature engineering to capture the pattern of responses provided by the students to the reference answers obtained by feature extraction;
perform dimensionality reduction on the features extracted from the enhanced textual output to reduce the redundancy and complexity of the enhanced textual output;
determine and mimic the cognitive patterns of human evaluators by modeling the extracted features using machine learning based training models coupled with semantic interpretation-based deep learning models on the enhanced representations of word embeddings;
represent scores for student response; and
a display unit (108) for providing the final scores obtained by the students for their responses.

2. The system (100) as claimed in claim 1, wherein the plurality of sources includes one or more of a scanner, a keyboard, a tablet or a speech-to-text converter.

3. The system (100) as claimed in claim 1, wherein the input converter unit (104) comprises of a text module (110), an audio module (112), a language module (114) and an anomaly module (116).

4. A method (200) for automatically evaluating responses of students for descriptive answers, the method comprises:
receiving (202) responses of students from a plurality of sources in an input unit;
transmitting (204) the received responses from the input unit to an input converter unit and converting the received responses into an enhanced textual output;
performing pre-processing (206) of the enhanced textual output to standardize the responses for further processing in an evaluation unit;
compiling (208) a list of reference responses from key answers provided, or a subset of randomized answer keys, or synthesized answer keys from high-scoring responses by students;
performing feature engineering (210) to capture the pattern of responses provided by the students to the reference answers obtained by feature extraction;
performing dimensionality reduction (212) on the features extracted from the enhanced textual output to reduce the redundancy and complexity of the enhanced textual output;
determining and mimicking (214) the cognitive patterns of human evaluators by modeling the extracted features using machine learning based training models coupled with semantic interpretation-based deep learning models on the enhanced representations of word embeddings;
representing (216) scores for student response; and
displaying (218) the final scores obtained by the students for their responses on a display unit.

5. A method (200) as claimed in claim 4, wherein pre-processing of the enhanced textual output in the evaluation unit (106) comprises:
converting (302) responses from uppercase, italics, or any other form to lowercase;
removing (304) irrelevant tokens present in textual data from web forms or speech responses;
correcting (306) usage of punctuation, spacing of words, and special characters; and
performing (308) spell check and correction for spelling errors detected in the responses.

6. A method (200) as claimed in claim 4, wherein obtaining a list of reference responses comprises:
receiving (402) answer keys and generating answer keys based on the summarized text of all high-score responses of students;
receiving (404) pseudo answer keys addressing plurality of marking scores for responses, with answer keys collected to represent marking scores other than the highest scores; and
receiving (406) high-score responses and generating answer keys of a subset of smallest length highest scored responses, a subset of largest length highest scored responses, a subset of average length highest scored responses and a subset of median length highest scored responses.

7. A method (200) as claimed in claim 4, wherein feature engineering comprises:
performing (502) word embedding representations of a set of paired words to capture the semantic links between words;
performing (504) sentence embedding to obtain sentence-level semantics and eliminate errors that may be created through word embedding;
obtaining (506) representative sentence similarity scores to discern the similarity between the embedding representation of the enhanced textual output and compiled reference answers;
obtaining (508) weighted keywords and generate n-gram ratios by extracting important keywords from texts and representing them as corresponding unigram, bigram, and trigram keywords and generating a feature of the incidence ratio of a set of ngrams for each of the responses to the compiled reference answers;
determining (510) number of close matches on word-based scores using the important unigram keywords collected in n-grams in the textual output;
determining (512) statistical textual features of each response by number of words, number of unique terms, standard type-token ratio, root TTR, corrected TTR, and Mean Segmental TTR;
determining (514) linguistic features of the responses by computing average sentence length by character, average sentence length by word, average syllables per word, count of special characters, count of punctuations, count of functional words;
determining (516) vocabulary richness features of the responses by hapax legomena, Honoré's measure, hapax dislegomena, Yule's characteristic, Simpson's index, Sichel's measure, Brunet's measure, and Shannon entropy;
determining (518) readability features of the responses by computing Flesch Reading Ease, Flesch-Kincaid Grade Level, Dale-Chall readability, and Gunning fog index;
determining (520) the presence of alphanumerical indicators in the responses;
obtaining (522) lexical overlap for the response by computing ratio of incident count of nouns/verbs in a response to the total length of the set of noun/verbs for each response;
obtaining (524) matching scores of important lexical words of nouns and verbs collected from each reference answer for student responses;
obtaining (526) word content overlap to determine the extent to which words/phrases capture content of a sentence in the response overlaps with the reference answer;
obtaining (528) arguments overlap to determine phrases in the response having noun followed by the verb;
obtaining (530) prompt overlap to determine the overlap between a question and the response;
obtaining (532) temporal features to determine the overall tense of the sentences used in the responses to understand a student’s writing style; and
performing (534) named entity recognition to classify a token in unstructured text into pre-defined entity classes based on domain/language corpus to generate features of ratios of incident counts of extracted entities from each response using pre-trained token classification models.

8. The method (200) as claimed in claim 4, wherein determining the learning pattern and word representation comprises:
training regression models based on machine learning with a dataset emulating human evaluator's score; and
training fine-tuned embedding regression models with the dataset of embedding representation pretrained on the massive corpus.

9. The method (200) as claimed in claim 4, wherein converting scanned documents of typewritten and/or handwritten responses into enhanced textual output comprises:
performing (602) noise estimation and removal of noise from the received scanned documents;
performing (604) boundary detection and page segmentation on the noise-free scanned documents;
performing (606) size normalization and adaptive binarization on the segmented scanned documents;
performing (608) layout analysis on the normalized documents;
performing (610) script recognition and character recognition after layout analysis for the scanned documents;
performing (612) language-specific accuracy enhancement required to increase the readability of the scanned documents; and
generating (614) enhanced textual output from the scanned document.

10. The method (200) as claimed in claim 4, wherein converting textual data into enhanced textual output comprises:
performing (702) language identification and translation for received textual data to increase the readability; and
generating (704) enhanced textual output from the textual data.

11. The method (200) as claimed in claim 4, wherein converting audio input into enhanced textual output comprises:
performing (802) noise estimation and removal of the received audio input;
performing (804) language and speech identification on the noise-less audio input;
performing (806) language specific accuracy enhancement on the identified audio output to increase the clarity of the audio output; and
generating (808) enhanced textual output from the audio output.

12. The method (200) as claimed in claim 4, wherein determining anomalies in input received as scanned documents, textual data and audio input comprises:
receiving (902) anomalous text as input;
performing (904) anomaly detection on the received anomalous text;
providing (906) suggested corrections and scoring on received anomalous text;
determining (908) the decision on correction to be made to the anomalous text; and
generating (910) enhanced textual output from the corrected anomalous text.

Sd.- Dr V. SHANKAR
IN/PA-1733
For and on behalf of the Applicants

Documents

Application Documents

# Name Date
1 202241063737-STATEMENT OF UNDERTAKING (FORM 3) [08-11-2022(online)].pdf 2022-11-08
2 202241063737-FORM FOR SMALL ENTITY(FORM-28) [08-11-2022(online)].pdf 2022-11-08
3 202241063737-FORM 1 [08-11-2022(online)].pdf 2022-11-08
4 202241063737-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [08-11-2022(online)].pdf 2022-11-08
5 202241063737-EVIDENCE FOR REGISTRATION UNDER SSI [08-11-2022(online)].pdf 2022-11-08
6 202241063737-EDUCATIONAL INSTITUTION(S) [08-11-2022(online)].pdf 2022-11-08
7 202241063737-DRAWINGS [08-11-2022(online)].pdf 2022-11-08
8 202241063737-DECLARATION OF INVENTORSHIP (FORM 5) [08-11-2022(online)].pdf 2022-11-08
9 202241063737-COMPLETE SPECIFICATION [08-11-2022(online)].pdf 2022-11-08
10 202241063737-FORM-26 [04-01-2023(online)].pdf 2023-01-04
11 202241063737-FORM-9 [17-03-2023(online)].pdf 2023-03-17
12 202241063737-FORM 18 [30-06-2023(online)].pdf 2023-06-30
13 202241063737-Proof of Right [30-11-2024(online)].pdf 2024-11-30
14 202241063737-FORM-8 [30-11-2024(online)].pdf 2024-11-30
15 202241063737-FER.pdf 2025-02-06
16 202241063737-RELEVANT DOCUMENTS [04-04-2025(online)].pdf 2025-04-04
17 202241063737-POA [04-04-2025(online)].pdf 2025-04-04
18 202241063737-FORM 13 [04-04-2025(online)].pdf 2025-04-04
19 202241063737-PETITION UNDER RULE 137 [30-07-2025(online)].pdf 2025-07-30
20 202241063737-FORM-5 [30-07-2025(online)].pdf 2025-07-30
21 202241063737-FER_SER_REPLY [30-07-2025(online)].pdf 2025-07-30
22 202241063737-DRAWING [30-07-2025(online)].pdf 2025-07-30
23 202241063737-CORRESPONDENCE [30-07-2025(online)].pdf 2025-07-30
24 202241063737-CLAIMS [30-07-2025(online)].pdf 2025-07-30

Search Strategy

1 SearchStrategyMatrixE_01-02-2024.pdf