
System And Method For Video Resume Analysis To Evaluate Candidate Confidence Score

Abstract: A system (102) for analyzing video resumes for candidate evaluation is disclosed. The system (102) utilizes a lightweight state-of-the-art model to analyze individual frames of the video resume, ensuring the candidate is appropriately presented. This checks for offensive content and verifies adherence to video resume standards. In the processing phase, the video resume is divided into video, text, and speech processing streams, ensuring a comprehensive evaluation. Additionally, scores are generated for multiple features for a candidate, and the generated scores are combined to produce a confidence score, providing a thorough evaluation of each candidate's suitability. This system (102) enhances the recruitment process, ensuring a detailed and contextually aligned assessment of video resumes.

Patent Information

Application #
Filing Date
12 March 2024
Publication Number
15/2024
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application

Applicants

AVUA INTERNATIONAL PRIVATE LIMITED
E 272 Phase, 8 A, 2nd Floor, Sector 75, Sahibzada Ajit Singh Nagar, Punjab 160071

Inventors

1. KUMAR, Bharath
71 Gollahalli, JP Nagar 9th Phase, Anjanapura Banglore South, Karnataka - 560062
2. SHEEL, Manan
1548/9, New Colony, Near ICS Coaching Center, Sonipat, Haryana, India. Pin code - 131001.
3. CHOUDHARY, Adit
House No. 866 / Sector - 5, Urban Estate, Kurukshetra - 136118 , Haryana

Specification

Description:
TECHNICAL FIELD
[0001] The present disclosure relates to the technical field of human resources management and recruitment processes. More particularly, it pertains to a system and method for comprehensive analysis of video resumes to determine and assign a candidate confidence score.

BACKGROUND
[0002] Background description includes information that may be useful in understanding the present disclosure. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed disclosure, or that any publication specifically or implicitly referenced is prior art.
[0003] In the digital era, the process of applying for jobs has undergone a profound transformation, with the conventional practice of submitting resumes in PDF or Word formats becoming a widespread norm. The shift to online application platforms and email submissions has necessitated the adoption of document formats that are easily accessible, shareable, and maintain the integrity of the document's structure. Job seekers routinely send their resumes electronically, allowing for swift transmission to prospective employers and recruitment agencies.
[0004] PDF and Word formats have emerged as the de facto standards for resume submissions due to their compatibility, portability, and formatting consistency across various devices and operating systems. PDFs, in particular, offer a fixed layout, ensuring that the document appears as intended irrespective of the device or software used to view it. This consistency is crucial for presenting a professional and polished image to potential employers, reinforcing the importance of choosing a reliable and universal format.
[0005] The prevalence of online application portals, company websites, and email-based submissions has made it imperative for job seekers to tailor their resumes for electronic transmission. This digital evolution has prompted individuals to adapt their document creation and formatting strategies, considering the nuances associated with electronic submissions. Applicants now navigate the delicate balance of incorporating visually appealing elements while ensuring that the document remains easily parseable by applicant tracking systems (ATS) that many companies employ to manage large volumes of resumes.
[0006] As the job market continues to embrace technology, the need for innovative tools that can efficiently analyze and assess electronic resumes becomes increasingly apparent. While PDF and Word formats have become the standard for written resumes, the advent of video resumes introduces an exciting dimension to the hiring process.
[0007] Video resumes provide a comprehensive view of a candidate's communication skills, demeanor, and other non-verbal cues that are often critical in professional settings. The transition from traditional resumes to video formats has created new challenges and opportunities for both job seekers and recruiters. Traditional methods of evaluating resumes are often inadequate when applied to video content due to the additional layers of complexity introduced by visual and auditory elements. Moreover, the subjective nature of human judgment in interpreting non-verbal cues can lead to inconsistencies in candidate evaluations.
[0008] There is, therefore, a need for an improved, robust, and efficient solution that assesses various facets of a video resume, providing a more nuanced and reliable means of evaluating candidate suitability.

OBJECTS OF THE PRESENT DISCLOSURE
[0009] Some of the objects of the present disclosure, which at least one embodiment herein satisfies, are listed herein below.
[0010] An object of the present disclosure is to provide a system and method for comprehensive analysis of video resumes to determine and assign a candidate confidence score.
[0011] Another object of the present disclosure is to provide a system and method that enhances efficiency in candidate evaluation by analyzing facial features and non-verbal cues.
[0012] Another object of the present disclosure is to provide a system and method that ensures relevance in video resume content through advanced text processing techniques.
[0013] Another object of the present disclosure is to provide a system and method that prioritizes professionalism by detecting offensive or vulgar language in video resumes.
[0014] Another object of the present disclosure is to provide a system and method that leverages state-of-the-art audio processing techniques for a more comprehensive evaluation of candidates' communication skills.
[0015] Another object of the present disclosure is to provide a system and method that employs machine learning for precise confidence scoring, optimizing the combination of various assessment components.
[0016] Another object of the present disclosure is to provide a system that promotes fairness and objectivity in candidate evaluation by relying on standardized criteria and benchmarks.
[0017] Another object of the present disclosure is to provide a system that allows for easy integration with existing recruitment platforms, facilitating seamless adoption in diverse hiring processes.
[0018] Another object of the present disclosure is to provide a system that contributes to a positive candidate experience by ensuring that the evaluation process is focused on professional attributes and skills.

SUMMARY
[0019] Various aspects of the present disclosure pertain to the field of human resources management and recruitment processes. More particularly, the disclosure pertains to a system and method for comprehensive analysis of video resumes to determine and assign a candidate confidence score. This innovative approach employs cutting-edge technologies in computer vision, natural language processing (NLP), and audio signal processing to evaluate various facets of a candidate's presentation, behavior, and communication skills.
[0020] An aspect of the present disclosure pertains to a system for video resume analysis that includes an input unit, which receives a set of candidate-associated videos, and a controller, equipped with processors and memory, that orchestrates the system's operations. Initially, the system verifies each received video to detect whether it indicates a video resume; if it does not, the video is discarded. When the video indicates a video resume, facial features are extracted, encompassing eye movement, blink count, lip movement, and forehead/eyebrow dynamics. These features provide insights into the candidate's behavior and emotional expressions.
[0021] Simultaneously, the system converts audio content into text and evaluates its alignment with predefined resume context criteria. Videos failing this alignment test are promptly discarded. Acoustic features, including MFCC and spectrogram, along with transcription features such as hesitation markers and speaking rate, are extracted from the audio content. Each candidate is assigned individual scores based on facial features, extracted text, acoustic features, and transcription features.
[0022] Further refinement is achieved by analyzing eye movement patterns, blink counts, and lip movement for each candidate, providing deeper insights into attention, comfort, and audio-visual synchronization. Textual content extraction involves sophisticated techniques like audio-to-text conversion, NER, sentiment analysis, keyword extraction, and NLP, ensuring a nuanced understanding of the spoken words.
[0023] The set of acoustic features, covering MFCC, spectrogram, and others, contributes to a holistic assessment. Transcription features, including hesitation markers and speaking rate, offer additional layers of evaluation. The system's versatility is enhanced by a text classification technique for categorizing textual content and comparing it against predefined resume context criteria.
[0024] In the evaluation process, the controller detects non-resume-related, offensive, or vulgar words, adding a layer of content scrutiny. To fine-tune the confidence score generation, the system assigns weights to individual scores and utilizes a machine learning model. This model optimizes the confidence score derived from the combination of facial features, extracted text, acoustic features, and transcription features.
[0025] Another aspect of the present disclosure pertains to a method for video resume analysis that provides a systematic approach for evaluating candidates through their video submissions. Initially, a set of candidate-associated videos is received by an input unit, and a controller, equipped with processors, verifies each received video to detect whether it indicates a video resume; if it does not, the video is discarded. When the video indicates a video resume, one or more facial features are extracted from each video. This extraction includes intricate analyses such as eye movement patterns, blink counts, lip movement, and observation of forehead and eyebrow dynamics. These features offer nuanced insights into attention, comfort, audio-visual synchronization, and emotional expressions during the video. Simultaneously, the method includes converting the audio content of each video into a textual format and extracting textual content. The extracted text undergoes a rigorous evaluation for alignment with predefined resume context criteria. Videos failing this alignment test are promptly discarded. Additionally, the method involves extracting a set of acoustic features and a set of transcription features from the audio content of each video.
[0026] The generated individual scores for each candidate are based on evaluations of facial features, extracted text, acoustic features, and transcription features, conducted separately.
[0027] Various objects, features, aspects, and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which numerals represent components.

BRIEF DESCRIPTION OF DRAWINGS
[0028] The accompanying drawings are included to provide a further understanding of the present disclosure and are incorporated in, and constitute a part of this specification. The drawings illustrate exemplary embodiments of the present disclosure, and together with the description, serve to explain the principles of the present disclosure.
[0029] FIG. 1 illustrates an exemplary network architecture of a proposed system for video resume analysis, in accordance with an embodiment of the present disclosure.
[0030] FIG. 2 illustrates an exemplary architecture of a proposed system for video resume analysis, in accordance with an embodiment of the present disclosure.
[0031] FIG. 3 illustrates an exemplary view of a flow diagram of a proposed method for analyzing video resumes, in accordance with some embodiments of the present disclosure.

DETAILED DESCRIPTION
[0032] The following is a detailed description of embodiments of the disclosure depicted in the accompanying drawings. The embodiments are in such detail as to clearly communicate the disclosure. However, the amount of detail offered is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present disclosure as defined by the appended claims.
[0033] References to “an embodiment”, “an exemplary embodiment”, “one example”, “an example”, “for instance”, and so on, indicate that the embodiment(s) or example(s) may include a particular feature, structure, characteristic, property, element, or limitation, but that not every embodiment or example necessarily includes that particular feature, structure, characteristic, property, element, or limitation. Furthermore, repeated use of the phrase “in an embodiment” does not necessarily refer to the same embodiment.
[0034] Embodiments of the present disclosure pertain to the field of human resources management and recruitment processes. More particularly, the disclosure pertains to a system and method for comprehensive analysis of video resumes to determine and assign a candidate confidence score.
[0035] An embodiment of the present disclosure pertains to a system for video resume analysis that includes an input unit, which receives a set of candidate-associated videos, and a controller, equipped with processors and memory, that orchestrates the system's operations. Initially, the system verifies each received video to detect whether it indicates a video resume; if it does not, the video is discarded. When the video indicates a video resume, facial features are extracted, encompassing eye movement, blink count, lip movement, and forehead/eyebrow dynamics. These features provide insights into the candidate's behavior and emotional expressions.
[0036] Simultaneously, the system converts audio content into text and evaluates its alignment with predefined resume context criteria. Videos failing this alignment test are promptly discarded. Acoustic features, including MFCC and spectrogram, along with transcription features such as hesitation markers and speaking rate, are extracted from the audio content. Each candidate is assigned individual scores based on facial features, extracted text, acoustic features, and transcription features.
[0037] Further refinement is achieved by analyzing eye movement patterns, blink counts, and lip movement for each candidate, providing deeper insights into attention, comfort, and audio-visual synchronization. Textual content extraction involves sophisticated techniques like audio-to-text conversion, NER, sentiment analysis, keyword extraction, and NLP, ensuring a nuanced understanding of the spoken words.
[0038] In an embodiment, the set of acoustic features, covering MFCC, spectrogram, and others, contributes to a holistic assessment. Transcription features, including hesitation markers and speaking rate, offer additional layers of evaluation. The system's versatility is enhanced by a text classification technique for categorizing textual content and comparing it against predefined resume context criteria.
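As a purely illustrative aside (the disclosure itself relies on Librosa for acoustic feature extraction), a short-time magnitude spectrogram of the kind named above can be sketched with NumPy alone; the frame length, hop size, and 440 Hz test tone below are arbitrary choices, not values from the disclosure:

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Short-time magnitude spectrogram: slide a Hann window over the
    signal and take the magnitude of the FFT of each frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-redundant half of the spectrum.
    return np.abs(np.fft.rfft(frames, axis=1))

# Toy input: one second of a 440 Hz tone sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
spec = magnitude_spectrogram(np.sin(2 * np.pi * 440.0 * t))
# Each row is one frame; energy concentrates near bin 440/8000*256 ≈ 14.
```

In practice `librosa.stft` computes an equivalent short-time transform, and MFCCs are derived from such a spectrogram via a mel filter bank and discrete cosine transform.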
[0039] In an embodiment, in the evaluation process, the controller detects non-resume-related, offensive, or vulgar words, adding a layer of content scrutiny. To fine-tune the confidence score generation, the system assigns weights to individual scores and utilizes a machine learning model. This model optimizes the confidence score derived from the combination of facial features, extracted text, acoustic features, and transcription features.
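The weighted combination of individual scores described above can be sketched as follows. The score names and weight values here are hypothetical placeholders; per the disclosure, the actual weights would be optimized by a trained machine learning model rather than fixed by hand:

```python
def combine_scores(scores, weights):
    """Weighted average of the individual scores; assumes each score
    lies in [0, 1], so the confidence score also lies in [0, 1]."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(scores[name] * w for name, w in weights.items()) / total

# Hypothetical per-feature scores on a 0-1 scale.
scores = {"facial": 0.8, "text": 0.7, "acoustic": 0.6, "transcription": 0.9}
# Hypothetical weights; in practice these would be learned.
weights = {"facial": 0.3, "text": 0.3, "acoustic": 0.2, "transcription": 0.2}

confidence = combine_scores(scores, weights)  # 0.75 for these inputs
```

A learned model could replace the fixed weights with, for example, a regression fit on recruiter-labeled outcomes, while keeping the same interface.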
[0040] Another embodiment of the present disclosure pertains to a method for video resume analysis that provides a systematic approach for evaluating candidates through their video submissions. Initially, a set of candidate-associated videos is received by an input unit, and a controller, equipped with processors, verifies each received video to detect whether it indicates a video resume; if it does not, the video is discarded. When the video indicates a video resume, the controller extracts one or more facial features from each video. This extraction includes intricate analyses such as eye movement patterns, blink counts, lip movement, and observation of forehead and eyebrow dynamics. These features offer nuanced insights into attention, comfort, audio-visual synchronization, and emotional expressions during the video.
[0041] Simultaneously, the method includes converting the audio content of each video into a textual format and extracting textual content. The extracted text undergoes a rigorous evaluation for alignment with predefined resume context criteria. Videos failing this alignment test are promptly discarded. Additionally, the method involves extracting a set of acoustic features and a set of transcription features from the audio content of each video.
[0042] The generated individual scores for each candidate are based on evaluations of facial features, extracted text, acoustic features, and transcription features, conducted separately.
[0043] The manner in which the proposed system works is described in further detail in conjunction with FIGs. 1 to 3. It may be noted that these figures are only illustrative, and should not be construed to limit the scope of the subject matter in any manner.
[0044] FIG. 1 illustrates an exemplary network architecture of a proposed system for video resume analysis, in accordance with an embodiment of the present disclosure.
[0045] In an embodiment, referring to FIG. 1, a system 102 for video resume analysis to revolutionize a recruitment process is disclosed. The system 102 is configured to receive videos, i.e. video resumes, which act as comprehensive self-introductions covering various aspects of one's professional career, including education, work experience, skill sets, and projects.
[0046] The system 102 will be connected to a network 106, which is further connected to at least one computing device 108-1, 108-2, … 108-N (collectively referred to as computing device 108, herein). The communication network 106 may include, but not be limited to, a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet-switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a Public-Switched Telephone Network (PSTN), a cable network, a cellular network, a satellite network, a fiber optic network, or some combination thereof.
[0047] In an embodiment, the computing device 108 may communicate with the system 102 via a set of executable instructions residing on any operating system to transmit the video resumes. In an embodiment, the one or more computing devices 108 may include, but not be limited to, any electrical, electronic, or electromechanical equipment, or a combination of one or more of the above devices, such as a mobile phone, smartphone, Virtual Reality (VR) device, Augmented Reality (AR) device, laptop, general-purpose computer, desktop, personal digital assistant, tablet computer, mainframe computer, or any other computing device. It may be appreciated that the computing devices 108 are not restricted to the mentioned devices, and various other devices may be used. The computing device 108 is configured to display a graphical user interface, providing candidates with an intuitive and convenient way to share the video resume.
[0048] Upon receipt of the video from the input unit 104, the system 102 verifies the received videos using a video classifier to detect whether each received video indicates a video resume. If the video does not indicate a video resume, it is discarded. When the video indicates a video resume, the system further extracts one or more facial features of the candidate, converts the audio content from the video into a textual format, and extracts textual content.
[0049] Further, the system 102 evaluates the extracted textual content for alignment with predefined resume context criteria and, upon detecting that the audio content is not aligned to the predefined resume context criteria, discards the video. Further, the system 102 extracts a set of acoustic features and a set of transcription features from the audio content of the video, generates an individual score for the candidate for the facial features, the extracted text, the acoustic features, and the transcription features separately, and combines the generated individual scores to generate a confidence score for the candidate.
[0050] In an embodiment, the system 102 includes a server 110 communicatively coupled to the system by the network 106. The server 110 is a central repository for storing various types of data, including, but not limited to, data associated with video resumes, trained machine learning models, and the predefined criteria used for evaluating the resumes. Further, a centralized storage location is advantageous as it allows for efficient management and organization of crucial system components. For example, the server can store the video resumes received by the system, making them easily accessible for analysis. It can also store machine learning models that have been trained to perform specific tasks related to video resume analysis. Additionally, the predefined resume context criteria, which serve as a benchmark for evaluating the resumes, can be stored on the server.
[0051] Although FIG. 1 shows exemplary components of the network architecture 100, in other embodiments, the network architecture 100 may include fewer components, different components, differently arranged components, or additional functional components than depicted in FIG. 1. Additionally, or alternatively, one or more components of the network architecture 100 may perform functions described as being performed by one or more other components of the network architecture 100.
[0052] FIG. 2 illustrates an exemplary architecture of a proposed system for video resume analysis, in accordance with an embodiment of the present disclosure.
[0053] In an aspect, referring to FIG. 2, a system 102 may include a controller 202 that comprises one or more processor(s). The controller 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, edge or fog microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that process data based on operational instructions. Among other capabilities, the controller 202 may be configured to fetch and execute computer-readable instructions stored in a memory 204 of the system 102. The memory 204 may be configured to store one or more computer-readable instructions or routines in a non-transitory computer-readable storage medium, which may be fetched and executed to create or share data packets over a network service. The memory 204 may comprise any non-transitory storage device including, for example, volatile memory such as Random Access Memory (RAM), or non-volatile memory such as Erasable Programmable Read-Only Memory (EPROM), flash memory, and the like.
[0054] The system 102 may include an interface(s) 206. The interface(s) 206 may comprise a variety of interfaces, for example, interfaces for data input and output devices, referred to as I/O devices, storage devices, and the like. The interface(s) 206 may facilitate communication to/from the system 102. The interface(s) 206 may also provide a communication pathway for one or more components of the system 102. Examples of such components include but are not limited to, processing unit/engine(s) 208 and a database 210.
[0055] In an embodiment, the processing unit/engine(s) 208 may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing engine(s) 208. In the examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processing engine(s) 208 may be processor-executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the processing engine(s) 208 may comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the machine-readable storage medium may store instructions that, when executed by the processing resource, implement the processing engine(s) 208. In such examples, system 102 may comprise the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separate but accessible to system 102 and the processing resource. In other examples, the processing engine(s) 208 may be implemented by electronic circuitry.
[0056] In an embodiment, the database 210 may include data that may be either stored or generated as a result of functionalities implemented by any of the components of the controller 202 or the processing engine 208. In an embodiment, the database 210 may be separate from the system 102.
[0057] In an exemplary embodiment, the processing engine 208 may include one or more engines selected from any of a video resume collection module 212, a video processing module 214, an audio and text processing module 216, a speech processing module 218, a scoring module 220, a training and testing module 222, and other module(s) 224. The other module(s) 224 have functions that may include but are not limited to testing, storage, and peripheral functions, such as a wireless communication unit for remote operation, an audio unit for alerts and the like.
[0058] In an embodiment, the video resume collection module 212 may be configured to receive a set of videos (interchangeably referred to as video resumes, hereinafter), each associated with a candidate, from an input unit 104. Specifically, the input unit 104 serves as a gateway for the video resume collection module 212, facilitating the reception of the videos, with each video being associated with a particular candidate. The input unit 104 is configured to handle and process the incoming video resumes, ensuring that they are appropriately directed to the video resume collection module 212 for subsequent analysis. This component acts as the interface through which the system can access and incorporate video resumes into its workflow, initiating the comprehensive evaluation process. Further, upon receipt of the video from the input unit 104, the video resume collection module 212 verifies the received videos using a custom-trained video classifier with hybrid models. These models are specifically designed to detect whether the video uploaded by the candidate is, indeed, a video resume. If the video does not indicate a video resume, the video resume collection module 212 eliminates the video uploaded by the candidate, recognizing that it does not meet the criteria for a video resume based on visual cues. When the video indicates a video resume, the system processes the video in further steps.
[0059] In an embodiment, the video processing module 214 is configured to meticulously analyze and extract various facial features from the received video resumes, ensuring a comprehensive understanding of the featured individual's behavior and attitude. The video resumes undergo thorough processing from an initial frame to a concluding frame. Additionally, predefined libraries are employed to extract specific facial features, including eye movement, blink count, lip movement, and the movement of the forehead and eyebrows. These features provide insights into malpractice tracking, assessing comfort or nervousness, ensuring audio synchronization, and understanding cognitive processing and emotional expressions. Undesired actions like hand movements, adjustments to hair, or inappropriate expressions are carefully noted.
[0060] The video processing module 214 utilizes algorithms to analyze the individual's behavior with a specifically trained pose-detection model, quantifying both behavioral and attitudinal aspects. Through the analysis of all extracted features and their comparison with benchmark levels, the system determines and quantifies the level of confidence exhibited by the individual in the video resume. Further, super-resolution techniques are applied to elevate the quality and clarity of video resumes to higher standards. Additionally, for user-friendliness and consistency, all video resumes are standardized to uniform levels of clarity, brightness, and contrast.
[0061] In an exemplary embodiment, pose detection includes algorithms that can identify and track the human body's pose, which includes the positions and orientations of various body parts such as head, torso, arms, and legs. Additionally, pose-detection models are often trained on large datasets to learn the relationships between different body parts and their typical configurations. The pose-detection models can vary, and there are several approaches and architectures for pose estimation, including Convolutional Pose Machines (CPM), OpenPose, and various deep learning-based methods. The choice of the pose-detection algorithm depends on factors such as the specific requirements of the application, the available training data, and the desired level of accuracy.
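One concrete way to derive a blink count from facial landmarks, as a hedged illustration: the eye aspect ratio (EAR), a well-known technique from the facial-landmark literature that the disclosure does not specifically name, drops sharply when the eye closes. The six-point eye landmarks and the threshold below are assumptions for the sketch:

```python
import math

def eye_aspect_ratio(eye):
    """EAR from six (x, y) eye landmarks, ordered around the eye:
    (|p1-p5| + |p2-p4|) / (2 * |p0-p3|)."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    vertical1 = dist(eye[1], eye[5])
    vertical2 = dist(eye[2], eye[4])
    horizontal = dist(eye[0], eye[3])
    return (vertical1 + vertical2) / (2.0 * horizontal)

def count_blinks(ear_series, threshold=0.2, min_frames=2):
    """Count blinks as runs of at least min_frames consecutive
    frames whose EAR falls below the threshold."""
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:
        blinks += 1
    return blinks
```

In a full pipeline, a landmark detector (e.g., a pose/face model of the kind discussed above) would supply the per-frame eye coordinates feeding `eye_aspect_ratio`.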
[0062] Moreover, several initial models are built for specific use cases, eliminating video resumes that do not meet the criteria or are non-resume videos. These facial features (or visual features) are crucial for scoring the confidence estimation of a candidate and are extracted during the processing stage, including behavioral analysis of self-introduction actions, assessment of stress or nervousness through facial expression analysis, detection of script reading through eye analysis, and evaluation of attentiveness through frame-by-frame pose analysis and head pose estimation.
[0063] Each visual feature plays a significant role in estimating the confidence score of the candidate, emphasizing the importance of accurate analysis in capturing even subtle visual cues that may impact the confidence assessment.
[0064] In an embodiment, the audio and text processing module 216 may be configured to comprehensively analyze video resumes. Initially, the module converts the spoken words, or audio content, from each video into written text, or textual content, utilizing technologies like OpenAI Whisper. This conversion allows for efficient analysis of the information presented in the video resumes. Further, the audio and text processing module 216 cleans the extracted textual content by removing any extraneous noise or artifacts introduced during the conversion process. It also standardizes the formatting and structure of the text to maintain consistency across all video resumes. The module then assesses the relevance of the textual content to the expected theme of a video resume. Criteria for relevance include self-introduction, qualifications, skill sets, professional experience, and work ethics. If the textual content does not align with these criteria, the video is discarded.
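The cleaning and standardization step can be sketched as below. The artifact tags (`[noise]`, `(inaudible)`) are assumed examples of what a speech-to-text pass might leave behind, not forms specified by the disclosure:

```python
import re

def clean_transcript(raw):
    """Normalize an ASR transcript: strip non-speech artifact tags,
    collapse runs of whitespace, and capitalize sentence starts."""
    # Remove bracketed artifacts such as [noise], (music), [inaudible].
    text = re.sub(r"[\[(](?:noise|music|inaudible)[\])]", " ", raw, flags=re.I)
    # Collapse whitespace introduced by the removal above.
    text = re.sub(r"\s+", " ", text).strip()
    # Capitalize the first letter of each sentence for consistency.
    return re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)
```

For example, `clean_transcript("hello  [noise] i am a developer. i like python")` yields a single-spaced transcript with capitalized sentence starts.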
[0065] To enhance context evaluation, the audio and text processing module 216 may be further configured to employ Named Entity Recognition (NER) techniques to recognize and classify entities, such as names, organizations, and locations, within the textual content. An NER model specifically trained on resume-related content helps identify and flag non-resume-related, offensive, or vulgar words or sentences, resulting in the rejection of videos containing such elements. Further, text classification algorithms categorize the textual content within each video, enabling the system to compare it against predefined resume context criteria. This ensures effective differentiation between various types of textual information, based on established criteria.
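A minimal sketch of the relevance-and-content screen described above, using keyword matching in place of the trained NER and text classification models the disclosure actually contemplates. The keyword set, blocklist placeholder, and `min_hits` threshold are all assumptions:

```python
import re

# Hypothetical resume-context vocabulary; in the real system a trained
# classifier would judge alignment instead of a keyword list.
RESUME_KEYWORDS = {
    "education", "experience", "skills", "project", "degree",
    "internship", "certification", "qualification",
}
BLOCKLIST = {"badword"}  # placeholder for the offensive-word lexicon

def evaluate_transcript(text, min_hits=2):
    """Return (aligned, flagged): whether the transcript matches the
    resume context criteria, and whether it contains blocklisted words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    aligned = len(words & RESUME_KEYWORDS) >= min_hits
    flagged = bool(words & BLOCKLIST)
    return aligned, flagged
```

A video would be discarded when `aligned` is false or rejected when `flagged` is true, mirroring the two screening outcomes in the paragraph above.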
[0066] Continuing further, the audio and text processing module 216 conducts additional analyses, including sentiment analysis to understand the emotional tone conveyed by the individual in the video, and keyword extraction to highlight relevant skills, experiences, or qualifications.
[0067] Continuing further, Natural Language Processing (NLP) techniques are applied for contextual understanding, encompassing semantics and relationships within the extracted textual content.
[0068] Continuing further, grammar and spell checks are performed to guarantee the accuracy and professionalism of the textual content from the video resumes.
[0069] The culmination of all these features is compared against the predefined resume context criteria to evaluate the confidence level of the individual. In essence, the audio and text processing module 216 ensures a thorough and standardized evaluation of both audio and textual content within video resumes.
[0070] In an embodiment, the speech processing module 218 may be configured to extract both acoustic and transcription features from the audio content of each video resume. For the acoustic features, the Librosa library is utilized to process audio obtained from the video. The following exemplary list outlines various acoustic and transcription features extracted from the audio content of video resumes, each playing a unique role in characterizing the communication style and confidence of the speaker (i.e., the candidate).
TRANSCRIPTION FEATURES:
[0071] Number of Hesitation Markers: This includes counting occurrences of common hesitation markers like 'uh,' 'umm,' 'er,' 'ah,' and 'hmm' in the transcription, indicating moments of uncertainty or contemplation.
[0072] Response Length: The total number of words in the transcription, providing a measure of the length of the spoken content.
[0073] Pauses Ratio: The ratio of the total pause duration, including hesitations, to the total speaking time, offering insights into the speaker's pacing and fluency.
[0074] Filler Ratio: The ratio of total disfluencies (fillers) to the total number of content words, indicating the prevalence of speech fillers like 'uh' and 'um.'
[0075] Speaking Rate: The average number of tokens (words) spoken per second, indicating the speaker's pace.
[0076] Silence Per Token: The ratio of the total number of silence intervals to the total number of tokens (words) in the transcription, reflecting moments of quietness during speech.
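The transcription features listed above reduce to straightforward ratios. The sketch below assumes the token list, pause durations, and silence counts are supplied by the transcription stage; the exact definitions in the disclosure may differ:

```python
# Hesitation markers as named in the disclosure
HESITATIONS = {"uh", "umm", "er", "ah", "hmm"}

def transcription_features(tokens, total_pause_s, speaking_time_s, n_silences):
    """Compute the six transcription features from a token list plus
    timing data (assumed to come from ASR timestamps)."""
    n = len(tokens)
    hes = sum(t.lower() in HESITATIONS for t in tokens)
    content = n - hes  # content words = tokens minus fillers
    return {
        "hesitation_markers": hes,
        "response_length": n,
        "pauses_ratio": total_pause_s / speaking_time_s,
        "filler_ratio": hes / content,
        "speaking_rate": n / speaking_time_s,   # tokens per second
        "silence_per_token": n_silences / n,
    }

feats = transcription_features(
    ["I", "am", "uh", "a", "developer", "umm", "with", "Python", "skills"],
    total_pause_s=2.0, speaking_time_s=8.0, n_silences=3)
print(feats["hesitation_markers"], feats["response_length"])  # -> 2 9
```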
ACOUSTIC FEATURES:
[0077] MFCC (Mel Frequency Cepstral Coefficients): A compact representation of the spectral characteristics of an audio signal, capturing frequency information.
[0078] Spectrogram: A visual representation showing the magnitude or intensity of different frequencies present in the signal over time.
[0079] Energy Entropy: A measure of the distribution or randomness of energy across different frequency bands.
[0080] Spectral Centroid: A measure of the center of mass or average frequency of the signal's power spectrum.
[0081] Spectral Spread: Quantifies the width or dispersion of the signal's power spectrum around its spectral centroid, indicating the range of frequencies present.
[0082] Spectral Entropy: Measures the level of randomness or complexity in the distribution of spectral energy across different frequency bands.
[0083] Fundamental Frequency: The lowest dominant frequency component in the signal spectrum.
[0084] Mean Fundamental Frequency: The average of fundamental frequencies.
[0085] Range of Frequencies: Describes the span of frequencies present in the signal.
[0086] Amplitude Mean: The average of the absolute values of the audio signal.
[0087] Amplitude Variance: The variance of the absolute values of the audio signal.
[0088] Zero Crossing Rate: The number of zero crossings per duration of the signal, indicating changes in the signal's polarity.
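While the disclosure uses the Librosa library for production feature extraction, two of the simpler acoustic features admit short definitional implementations. This pure-Python sketch is illustrative only; MFCC and the spectral features would require a DSP library such as Librosa:

```python
def amplitude_stats(samples):
    """Amplitude mean and variance: statistics of the absolute sample values."""
    absvals = [abs(s) for s in samples]
    mean = sum(absvals) / len(absvals)
    var = sum((a - mean) ** 2 for a in absvals) / len(absvals)
    return mean, var

def zero_crossing_rate(samples, duration_s):
    """Zero crossings per second: count sign changes between adjacent samples."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / duration_s

sig = [0.2, -0.1, 0.3, -0.4, 0.1, -0.2]
print(zero_crossing_rate(sig, duration_s=1.0))  # -> 5.0
```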
[0089] These features collectively contribute to a comprehensive analysis of the speaker's confidence and communication style, enabling a holistic assessment in the context of video resumes. Further, these features provide insights into the prosodic features of speech, helping to gauge the confidence level of the candidate. The acoustic features involve complex characteristics such as MFCC and other spectral-domain features, while the transcription features focus on prosodic elements such as hesitation markers and speaking rate. As one example, upon extraction of these features, the video files are annotated for confidence levels. During this manual annotation process, human annotators evaluate and assess the confidence level of candidates, relying on cues provided by both acoustic and transcription features. This assessment results in the creation of a labeled dataset in which each video is paired with its corresponding confidence level.
[0090] Subsequently, a machine learning model is trained using this annotated dataset. The model is specifically crafted to predict a candidate's confidence level autonomously, taking into consideration both the acoustic and transcription features. Integrating both sets of features enables the model to offer a comprehensive and accurate prediction of the confidence exhibited by candidates in their video resumes, and ensures that the model captures nuanced aspects of a candidate's presentation and communication style. By considering a diverse range of features, the model becomes more adept at understanding and predicting subtle variations in confidence levels, resulting in a more robust and reliable confidence level prediction for each candidate.
[0091] In an embodiment, the scoring module 220 may be configured to generate individual scores for a candidate based on various aspects, including facial features, extracted text, acoustic features, and transcription features. Each of these individual scores is generated separately. To enhance the precision of the generated confidence score, the controller assigns weights to each of the individual scores. Additionally, a machine learning model is employed, trained specifically to optimize the generated confidence score by considering a combination of each individual score for the candidate. This machine learning model learns from an annotated dataset, incorporating nuanced relationships between different features and generating a confidence score for the candidate.
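A minimal sketch of the weighted combination performed by the scoring module, assuming each per-modality score lies in [0, 1]; the weight values below are illustrative placeholders, since the disclosure assigns and optimizes them via a trained model:

```python
# Illustrative weights only; the disclosure states the controller assigns
# weights and a trained model optimizes the combination.
WEIGHTS = {"facial": 0.30, "text": 0.25, "acoustic": 0.25, "transcription": 0.20}

def confidence_score(individual_scores):
    """Weighted sum of the four individual scores, each assumed in [0, 1]."""
    return sum(WEIGHTS[k] * individual_scores[k] for k in WEIGHTS)

score = confidence_score(
    {"facial": 0.8, "text": 0.6, "acoustic": 0.7, "transcription": 0.9})
print(round(score, 3))  # -> 0.745
```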
[0092] In an exemplary embodiment, the determination of the confidence level in a VideoResume is not solely based on the video content. Instead, this includes the incorporation of data from the video's audio and the extracted text from that audio. This holistic approach ensures that every aspect, including facial expressions, spoken words, and linguistic content, is meticulously utilized and monitored to precisely calculate the confidence level of the candidate.
[0093] In an exemplary embodiment, the features considered for scoring include sentiment analysis, which assesses the emotional tone conveyed by the individual in the video resume. Keyword extraction identifies and extracts key terms or keywords highlighting relevant skills, experiences, or qualifications mentioned by the candidate. The usage of standard words in the introduction, as well as grammar and spell checks, are also crucial factors in estimating the confidence score. These features collectively contribute to a comprehensive evaluation of the candidate's confidence level, ensuring a robust and informed assessment.
[0094] In an embodiment, the training and testing module 222 may be configured to receive video resumes from various platforms using web scraping techniques. This includes employing search queries on open-source video streaming platforms to gather a diverse set of video resumes for training purposes. The utilization of web scraping allows for the collection of a wide range of video resumes, enhancing the diversity of the training dataset. During the processing phase, a lightweight state-of-the-art (SOTA) model is employed to analyze individual frames within the Video-Resumes at specific intervals. This accelerates the evaluation process by focusing on keyframes rather than processing the entire video continuously. The initial check conducted by the model verifies whether the Video-Resumes feature the individual on camera, ensuring that the candidate is appropriately presented.
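The interval-based keyframe sampling described above can be sketched as an index computation; the one-second interval is an assumed value, as the disclosure states only that frames are analyzed at specific intervals:

```python
def keyframe_indices(total_frames, fps, interval_s=1.0):
    """Return frame indices sampled once per `interval_s` seconds,
    so only keyframes (not every frame) are passed to the SOTA model."""
    step = max(1, int(fps * interval_s))
    return list(range(0, total_frames, step))

# A 10-second clip at 30 fps yields 10 sampled frames instead of 300.
print(len(keyframe_indices(total_frames=300, fps=30)))  # -> 10
```

This keeps per-video analysis cost proportional to duration divided by the sampling interval, which is what accelerates the evaluation relative to continuous processing.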
[0095] Subsequent criteria are to ensure that the content within the video resume adheres to certain standards. This includes avoiding offensive language and steering clear of unrelated topics that deviate from the intended purpose of the video resume. This quality check ensures that the collected video resumes maintain a level of professionalism and relevance.
[0096] The processing phase itself is segmented into three distinct working phases: Video Processing, Text Processing, and Speech Processing. These phases operate simultaneously after the initial model check. The Video Processing phase focuses on the visual content, Text Processing analyzes the textual information extracted from the videos, and Speech Processing deals with the audio content. This segmentation allows for a comprehensive and efficient analysis of video resumes, covering different modalities and aspects simultaneously. This ensures a thorough evaluation of each video resume by considering visual, textual, and auditory elements in a synchronized manner.
[0097] Additionally, a testing phase assesses the model's performance. After training the model using the collected Video-Resumes, it undergoes evaluation on a separate testing dataset consisting of unseen examples. This testing dataset is carefully chosen to represent real-world scenarios and challenges that the model may encounter. During testing, the model processes new video resumes it has not seen before, predicting confidence levels based on extracted features. The model's performance is assessed by comparing its predictions to the known confidence levels in the testing dataset. Metrics such as accuracy, precision, recall, and F1 score are used to quantify how well the model generalizes to new data, providing insights into its effectiveness in predicting confidence levels in video resumes. Further, regular testing ensures ongoing model accuracy and facilitates iterative improvements for enhanced performance over time.
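The evaluation metrics named above can be computed as follows for a binary confident/not-confident labeling; this is the standard formulation of these metrics, not code from the disclosure:

```python
def classification_metrics(y_true, y_pred, positive="confident"):
    """Accuracy, precision, recall, and F1 for a binary labeling task."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return acc, precision, recall, f1

y_true = ["confident", "confident", "not", "not"]
y_pred = ["confident", "not", "not", "confident"]
print(classification_metrics(y_true, y_pred))  # -> (0.5, 0.5, 0.5, 0.5)
```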
[0098] In an exemplary embodiment, candidates submit video resumes online. The system 102 checks whether the videos follow certain standards, such as containing no offensive language, and analyzes them using AI, examining facial expressions, what is said, and how it is said. For the text in the videos, the system 102 checks for important information and sentiment. Favorable findings yield higher scores, and all of these scores are combined to decide whether the candidate's video is acceptable. The system learns from past decisions to improve over time. If a video is accepted, the candidate has performed well in showcasing skills and qualifications; if not, the video did not meet the expected standards.
[0099] FIG. 3 illustrates an exemplary view of a flow diagram of a proposed method for analysing a video resume, in accordance with some embodiments of the present disclosure.
[00100] In an embodiment, a method 300 for video resume analysis is disclosed. The method 300 begins at step 302, where an input unit 104 receives a set of videos, each associated with a candidate. The input unit 104 acts as the interface through which the system interacts with external sources providing video resumes. The set of videos received by the input unit includes individual video submissions from candidates, each showcasing their qualifications, skills, and experiences. These videos could be submitted through various channels, such as online platforms, job portals, or any other means that facilitates the collection of video resumes.
[00101] Continuing further, at step 304, a controller 202 verifies the received set of videos using a video classifier to detect whether each received video indicates a video resume; if a video does not indicate a video resume, it is discarded.
[00102] Continuing further, at step 306, upon successful verification that a video indicates a video resume, the controller 202 extracts one or more facial features of the candidate from the associated video. For extraction of the facial features, the controller 202 first analyzes the eye movement patterns of the candidate to gauge their attention and engagement levels throughout the duration of the video. This provides insights into the candidate's focus and interest in the content presented in the video.
[00103] Continuing with the facial feature extraction, the controller 202 goes on to determine the blink count of the candidate. The blink count is utilized as an indicator to evaluate the candidate's levels of comfort or nervousness during the video. Increased blink rates may suggest nervousness, while a normal or lower blink count could signify a more comfortable demeanor.
[00104] Additionally, the controller 202 monitors the movement of the candidate's lips to assess audio-visual synchronization in the associated video. This ensures that the audio content aligns appropriately with the lip movements, contributing to the quality and coherence of the video presentation.
[00105] Continuing further, the controller 202 observes the movement of the candidate's forehead and eyebrows. This analysis is conducted to delve into the cognitive processing and emotional expressions of the candidate during the video. Changes in these facial features can provide valuable insights into the candidate's thought processes and emotional responses throughout the presentation.
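One common proxy for the blink-count analysis described above is the eye aspect ratio (EAR) computed over six eye landmarks per frame; the disclosure does not name a specific blink-detection method, so this sketch and its 0.2 threshold are assumptions:

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Eye aspect ratio (EAR): a widely used blink proxy over six eye landmarks
# p1..p6 (p1/p4 are the eye corners); EAR drops sharply when the eye closes.
def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))

def count_blinks(ear_series, threshold=0.2):
    """Count falling-edge crossings of the EAR below an assumed threshold."""
    blinks, closed = 0, False
    for ear in ear_series:
        if ear < threshold and not closed:
            blinks, closed = blinks + 1, True
        elif ear >= threshold:
            closed = False
    return blinks

print(count_blinks([0.31, 0.30, 0.12, 0.10, 0.29, 0.30, 0.11, 0.28]))  # -> 2
```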
[00106] Continuing further, at step 308, the controller 202 converts the audio content from each video into a textual format separately and extracts textual content. The extraction of textual content is a multi-faceted procedure that includes several key techniques to enhance the quality and relevance of the extracted information. Firstly, an audio-to-text conversion technique is employed to transform the spoken words in the video into a written textual representation. This conversion is essential for making the spoken content accessible and analyzable by the system.
[00107] Subsequently, the extracted textual content undergoes text pre-processing. This step aims to refine the transcribed text by removing any unwanted elements such as noise, irrelevant information, or artifacts introduced during the transcription process. Standardizing the formatting and structure of the text is also part of this pre-processing, ensuring consistency for further analysis. Named Entity Recognition (NER) is applied to recognize and classify entities within the textual content. This involves identifying and categorizing elements such as names, organizations, and locations mentioned in the text, providing valuable insights into the candidate's background and experiences. Further to understand the emotional tone conveyed by the candidate in the video, sentiment analysis is performed on the extracted textual content. This analysis helps gauge the candidate's sentiments and expressions during the self-introduction, contributing to a more nuanced evaluation.
[00108] Additionally, keyword extraction techniques are applied to identify and extract key terms, highlighting the most relevant skills, experiences, and qualifications spoken by the candidate. This step is crucial for pinpointing the candidate's strengths and aligning them with the job requirements. Further, application of natural language processing (NLP) techniques follows, facilitating contextual understanding by interpreting semantics and relationships within the extracted textual content. NLP enhances the system's ability to comprehend the candidate's narrative and extract meaningful insights. Moreover, grammar and spell checks are conducted to ensure the accuracy and professionalism of the extracted textual content. This step contributes to maintaining a high standard of language quality in the video resumes.
[00109] Continuing further, at step 310, the controller 202 evaluates the extracted textual content from the audio content of each video for alignment with a predefined resume context criteria by utilizing a text classification technique. This technique includes categorizing the textual content within each video based on predefined criteria. The categorization helps classify the content into specific thematic groups, allowing for a more structured and systematic analysis. Following the categorization, the next step is to compare the categorized textual content with the predefined resume context criteria. The predefined criteria serve as benchmarks or standards that define what constitutes relevant and desirable content in the context of a resume. This includes assessing whether the content aligns with the expected attributes, qualifications, and information typically found in a professional resume.
[00110] Further, by employing this technique and comparing the categorized content with predefined criteria, the system ensures a targeted and contextually relevant analysis of the textual information. This step enhances the system's ability to filter out non-relevant or extraneous content, focusing on the aspects that are crucial for evaluating a candidate's suitability for a particular job.
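A toy keyword-based categorizer can stand in for the text classification and criteria comparison described above; the category keyword lists and the two-category alignment threshold are illustrative assumptions, not the trained classifier of the disclosure:

```python
# Illustrative category keyword lists standing in for the trained classifier.
RESUME_CRITERIA = {
    "self_introduction": {"name", "myself", "introduce"},
    "qualifications": {"degree", "certified", "graduated"},
    "skills": {"python", "java", "sql", "skills"},
}

def matched_categories(transcript):
    """Categorize the transcript by keyword overlap with each thematic group."""
    tokens = {t.strip(".,:;!?").lower() for t in transcript.split()}
    return {cat for cat, kws in RESUME_CRITERIA.items() if tokens & kws}

def aligns_with_resume_context(transcript, min_categories=2):
    # alignment = enough thematic groups matched (threshold is assumed)
    return len(matched_categories(transcript)) >= min_categories

text = "Let me introduce myself: I graduated with a CS degree and have Python skills."
print(aligns_with_resume_context(text))  # -> True
```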
[00111] Continuing further, at step 312, the controller 202 discards the video upon detection of audio content not aligned with the predefined resume context criteria. The controller 202 identifies instances of non-resume-related words, offensive language, or vulgar expressions within the audio content of the video. The detection of such elements serves as an indication that the video may contain inappropriate or unrelated content, compromising its suitability for a professional job application. The use of this criteria-based approach allows the method 300 to automatically filter out videos that deviate from the expected norms of professional communication. By identifying and flagging non-resume-related words, offensive language, or vulgar expressions, the system ensures that the videos selected for further consideration maintain a level of professionalism and appropriateness in their content.
[00112] Continuing further, at step 314, the controller 202 extracts a set of acoustic features and a set of transcription features from the audio content of each video. The acoustic features include, but are not limited to, Mel Frequency Cepstral Coefficients (MFCC), spectrogram, energy entropy, normalized energy, spectral centroid, spectral spread, spectral entropy, fundamental frequency, mean fundamental frequency, range of frequencies, amplitude mean, amplitude variance, and zero crossing rate. The transcription features include, but are not limited to, hesitation markers, response length, pauses ratio, filler ratio, speaking rate, and silence per token.
[00113] Continuing further, at step 316, the controller 202 generates an individual score for the candidate for the facial features, the extracted text, the acoustic features, and the transcription features separately. Each of these components contributes unique insights into the candidate's presentation, communication skills, and other relevant attributes. The generation of individual scores includes analyzing the performance of the candidate in each category separately. For example, facial features may include assessments of eye movement patterns, blink count, and lip and forehead movement, providing insights into attention, comfort levels, audio-visual synchronization, and emotional expressions during the video.
[00114] Simultaneously, the extracted text undergoes thorough evaluation, considering factors such as semantic content, sentiment, named entities, keywords, language structure, and grammar. Acoustic features and transcription features, capturing nuances in speech patterns, pauses, and other audio characteristics, are also individually scored.
[00115] Following the generation of these individual scores, controller 202 proceeds to combine them into a confidence score for the candidate. This consolidated score provides a holistic measure of the candidate's performance across multiple dimensions, offering a more nuanced and accurate representation of their suitability for the role.
[00116] To enhance the precision of the confidence score, the controller 202 is further configured to assign weights to each of the individual scores. This weighting mechanism allows certain aspects, deemed more critical or informative, to have a greater impact on the final confidence score. The controller 202 utilizes a lightweight state-of-the-art (SOTA) model, trained for this specific purpose, to optimize the combination of individual scores and their respective weights.
[00117] Thus, the present disclosure provides a system and method for thorough analysis of video resumes, with a primary objective of determining and assigning a candidate confidence score. This system enhances efficiency in candidate evaluation by incorporating advanced facial feature analysis and non-verbal cue assessments. It ensures the relevance of video resume content through sophisticated text processing techniques and maintains professionalism by detecting and flagging offensive or vulgar language. Leveraging cutting-edge audio processing techniques, the system provides a comprehensive evaluation of candidates' communication skills. Additionally, machine learning is employed to precisely score confidence, optimizing the amalgamation of various assessment components. This promotes fairness and objectivity in candidate evaluation, relying on standardized criteria and benchmarks, while also offering seamless integration with existing recruitment platforms for a positive and professional candidate experience.
[00118] The disclosed methods and systems, as illustrated in the ongoing description or any of its components, may be embodied in the form of a computer system. Typical examples of a computer system include a general-purpose computer, a programmed microprocessor, a micro-controller, a peripheral integrated circuit element, and other devices, or arrangements of devices that are capable of implementing the steps that constitute the method of the disclosure.
[00119] The computer system includes a computer, an input device, a display unit, and the internet. The computer further includes a microprocessor. The microprocessor is connected to a communication bus. The computer also includes a memory. The memory may be RAM or ROM. The computer system further comprises a storage device, which may be an HDD or a removable storage drive such as a floppy-disk drive, an optical-disk drive, and the like. The storage device may also be a means for loading computer programs or other instructions onto the computer system. The computer system also includes a communication unit. The communication unit allows the computer to connect to other databases and the internet through an input/output (I/O) interface, allowing the transfer as well as reception of data from other sources. The communication unit may include a modem, an Ethernet card, or other similar devices that enable the computer system to connect to databases and networks, such as LAN, MAN, WAN, and the Internet. The computer system facilitates input from a user through input devices accessible to the system through the I/O interface.
[00120] To process input data, the computer system executes a set of instructions stored in one or more storage elements. The storage elements may also hold data or other information, as desired. The storage element may be in the form of an information source or a physical memory element present in the processing machine.
[00121] The programmable or computer-readable instructions may include various commands that instruct the processing machine to perform specific tasks, such as steps that constitute the method of the disclosure. The systems and methods described can also be implemented using only software programming or only hardware, or using a varying combination of the two techniques. The disclosure is independent of the programming language and the operating system used in the computers. The instructions for the disclosure can be written in all programming languages, including, but not limited to, ‘C’, ‘C++’, ‘Visual C++’, ‘Visual Basic’, ‘Java’, and ‘Python’. Further, software may be in the form of a collection of separate programs, a program module containing a larger program, or a portion of a program module, as discussed in the ongoing description. The software may also include modular programming in the form of object-oriented programming. The processing of input data by the processing machine may be in response to user commands, the results of previous processing, or a request made by another processing machine. The disclosure can also be implemented in various operating systems and platforms, including, but not limited to, ‘Unix’, ‘DOS’, ‘Android’, ‘Symbian’, and ‘Linux’.
[00122] The programmable instructions can be stored and transmitted on a computer-readable medium. The disclosure can also be embodied in a computer program product comprising a computer-readable medium, or with any product capable of implementing the above methods and systems, or the numerous possible variations thereof.
[00123] Various embodiments of the system and method for comprehensive analysis of video resumes to determine and assign a candidate confidence score have been disclosed. However, it should be apparent to those skilled in the art that modifications in addition to those described are possible without departing from the inventive concepts herein. The embodiments, therefore, are not restrictive, except in the spirit of the disclosure. Moreover, in interpreting the disclosure, all terms should be understood in the broadest possible manner consistent with the context.
[00124] In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps, in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, used, or combined with other elements, components, or steps that are not expressly referenced.
[00125] Those having ordinary skills in the art will appreciate that the systems, modules, and sub-modules have been illustrated and explained to serve as examples and should not be considered limiting in any manner. It will be further appreciated that the variants of the above-disclosed system elements, modules, and other features and functions, or alternatives thereof, may be combined to create other different systems or applications.
[00126] Those skilled in the art will appreciate that any of the aforementioned steps and/or system modules may be suitably replaced, reordered, or removed, and additional steps and/or system modules may be inserted, depending on the needs of a particular application. In addition, the systems of the aforementioned embodiments may be implemented using a wide variety of suitable processes and system modules, and are not limited to any particular computer hardware, software, middleware, firmware, microcode, and the like.
[00127] The claims can encompass embodiments for hardware and software or a combination thereof.
[00128] It will be appreciated that variants of the above disclosed, and other features and functions or alternatives thereof, may be combined into many other different systems or applications. Presently unforeseen or unanticipated alternatives, modifications, variations, or improvements therein may be subsequently made by those skilled in the art, which are also intended to be encompassed by the following claims.

ADVANTAGES OF THE PRESENT DISCLOSURE
[00129] The present disclosure provides a system and method for conducting a thorough analysis of video resumes, with the ultimate aim of determining and assigning a confidence score to each candidate.
[00130] The present disclosure provides a system and method that streamlines efficiency of candidate evaluation by delving into the analysis of facial features and non-verbal cues.
[00131] The present disclosure provides a system and method that guarantees the relevance of video resume content through the utilization of advanced text processing techniques.
[00132] The present disclosure provides a system and method that focuses on professionalism, and thus identifies and flags offensive or vulgar language within video resumes.
[00133] The present disclosure provides a system and method that harnesses cutting-edge audio processing techniques, offering a more thorough evaluation of candidates' communication skills.
[00134] The present disclosure provides a system and method that integrates machine learning for precise confidence scoring, optimizing the amalgamation of various assessment components.
[00135] The present disclosure provides a system and method that champions fairness and objectivity in candidate evaluation by relying on standardized criteria and benchmarks.
[00136] The present disclosure provides a system and method that seamlessly integrates with existing recruitment platforms, allowing for easy adoption in diverse hiring processes.
[00137] The present disclosure provides a system and method that contributes to a positive candidate experience by ensuring that the evaluation process is centered on professional attributes and skills.
Claims:
We Claim:
1. A system (102) for video resume analysis comprising:
an input unit (104) configured to receive a set of videos, each associated with a candidate; and
a controller (202) in communication with the input unit, and the controller (202) comprising one or more processors, wherein the one or more processors are operatively coupled with a memory (204), the memory storing instructions executable by the one or more processors to:
receive the set of videos of candidates from the input unit (104);
verify the received set of videos from a video classifier to detect whether the received set of videos indicates a video resume;
extract one or more facial features of each candidate from the associated video, upon successful verification;
convert audio content from each video into a textual format separately, and extract textual content;
evaluate the extracted textual content from the audio content of each video for alignment with a predefined resume context criteria;
discard the video, upon detection of the audio content not aligned with the predefined resume context criteria;
extract a set of acoustic features and a set of transcription features from the audio content of each video; and
generate an individual score for the candidate for the one or more facial features, the extracted text, the set of acoustic features and the set of transcription features separately, and combine the generated individual scores to generate a confidence score for the candidate.

2. The system as claimed in claim 1, wherein, to extract the one or more facial features, the controller is further configured to:
analyse eye movement patterns of the candidate to assess attention and engagement during the associated video;
determine blink count of the candidate to evaluate levels of comfort or nervousness during the associated video;
monitor the movement of lips of the candidate to determine audio-visual synchronization in the associated video; and
observe the movement of forehead and eyebrows of the candidate to analyse cognitive processing and emotional expressions during the associated video.

3. The system as claimed in claim 1, wherein to extract the textual content from the audio content, the controller is further configured to:
utilize an audio-to-text conversion technique for extraction of the textual content from the video;
conduct text pre-processing to remove noise and irrelevant information from the textual content;
utilize a Named Entity Recognition (NER) technique to recognize and classify entities within the textual content;
perform sentiment analysis on the extracted textual content to discern the emotional tone conveyed by the candidate in the video;
identify and extract key terms through a keyword extraction technique to highlight relevant skills, experiences, and qualifications spoken by the candidate;
apply natural language processing (NLP) techniques for contextual understanding, including semantics and relationships within the extracted textual content; and
conduct grammar and spell checks in the extracted textual content.
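Production systems would typically use an NLP library for the NER, sentiment, and keyword steps above; the following is only a self-contained sketch of the pre-processing, keyword-extraction, and lexicon-based sentiment stages, with the stopword list and sentiment lexicons invented purely for illustration.

```python
import re
from collections import Counter

# Illustrative stopword list (noise removal); real systems use larger lists.
STOPWORDS = {"i", "a", "an", "the", "and", "in", "of", "to", "my", "have", "at", "with"}

def preprocess(text):
    """Lowercase, strip punctuation, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def keyword_extract(tokens, top_n=3):
    """Naive frequency-based keyword extraction."""
    return [w for w, _ in Counter(tokens).most_common(top_n)]

def sentiment(tokens,
              positive=frozenset({"passionate", "skilled", "led", "achieved", "improved"}),
              negative=frozenset({"failed", "struggled", "weak"})):
    """Lexicon-based polarity in [-1, 1]: (pos - neg) / token count."""
    pos = sum(t in positive for t in tokens)
    neg = sum(t in negative for t in tokens)
    return (pos - neg) / max(len(tokens), 1)
```

A transcript such as "I have led teams and achieved strong results in Python" yields a positive polarity, since two tokens match the positive lexicon and none match the negative one.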

4. The system as claimed in claim 1, wherein the set of acoustic features comprise any or a combination of Mel Frequency Cepstral Coefficients (MFCC), spectrogram, energy entropy, normalized energy, spectral centroid, spectral spread, spectral entropy, fundamental frequency, mean fundamental frequency, range of frequencies, amplitude mean, amplitude variance, and zero crossing rate.
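Features such as MFCCs and spectrograms require a signal-processing library, but several features in this set are simple per-frame statistics. As a minimal sketch (function names and the synthetic frames are assumptions), zero crossing rate, short-time energy, and amplitude mean/variance can be computed directly:

```python
def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)

def amplitude_stats(frame):
    """Mean and variance of absolute amplitude."""
    mean = sum(abs(s) for s in frame) / len(frame)
    var = sum((abs(s) - mean) ** 2 for s in frame) / len(frame)
    return mean, var
```

In practice these statistics are computed over short overlapping windows of the audio signal and averaged or pooled per candidate.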

5. The system as claimed in claim 1, wherein the set of transcription features comprise any or a combination of hesitation markers, response length, pauses ratio, filler ratio, speaking rate, and silence per token.
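These transcription features are straightforward token statistics once a transcript and its duration are available. A minimal sketch, assuming a token-level filler/hesitation lexicon (purely illustrative) and a pause count supplied by the speech-to-text aligner:

```python
# Illustrative token-level lexicons; real systems use richer marker lists.
FILLERS = {"um", "uh", "er", "like"}
HESITATIONS = {"um", "uh", "er", "hmm"}

def transcription_features(tokens, duration_seconds, pause_count=0):
    """Compute claim-5-style transcript statistics from word tokens."""
    n = len(tokens)
    fillers = sum(t in FILLERS for t in tokens)
    hesitations = sum(t in HESITATIONS for t in tokens)
    return {
        "response_length": n,
        "filler_ratio": fillers / max(n, 1),
        "hesitation_markers": hesitations,
        "speaking_rate": n / duration_seconds * 60.0,  # words per minute
        "pauses_ratio": pause_count / max(n, 1),
    }
```

For example, a six-word utterance spoken in three seconds corresponds to a speaking rate of 120 words per minute.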

6. The system as claimed in claim 1, wherein the controller is further configured to utilize a text classification technique for categorization of the textual content within each video and compare the categorized textual content with the predefined resume context criteria.

7. The system as claimed in claim 1, wherein, during the evaluation of the extracted textual content for alignment with the predefined resume context criteria, the controller is further configured to detect at least one of non-resume-related words, offensive words, and vulgar words.
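The simplest realisation of this detection step is a blocklist lookup; the word lists below are placeholders, not the actual criteria, and a deployed system would more likely combine lists with a trained text classifier as in claim 6.

```python
OFFENSIVE = {"damn", "hell"}          # placeholder offensive-word list
NON_RESUME = {"movie", "vacation"}    # placeholder off-topic word list

def flag_tokens(tokens):
    """Return tokens that violate the resume context criteria, in order."""
    return [t for t in tokens if t in OFFENSIVE or t in NON_RESUME]

def aligned(tokens, max_violations=0):
    """True when the transcript stays within the allowed violation budget."""
    return len(flag_tokens(tokens)) <= max_violations
```

Videos whose transcripts fail `aligned` would be discarded, per the discard step of claim 1.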

8. The system as claimed in claim 1, wherein the controller is further configured to assign weights to each of the generated individual scores and utilize a machine learning model trained to optimize the generated confidence score from a combination of each of the generated individual scores for the candidate.
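The claim leaves the combination function to a trained model; as a hedged baseline sketch, the weighted combination can be expressed as a normalized weighted average over the four per-modality scores (the feature names and weights here are illustrative, and a trained model would learn the weights rather than fix them):

```python
def confidence_score(scores, weights):
    """Weighted average of per-modality scores in [0, 1].

    `scores` and `weights` are dicts keyed by feature name; weights are
    normalized over the features actually present in `scores`.
    """
    total_w = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total_w
```

With equal weights this reduces to the plain mean of the individual scores; unequal weights let a particular modality (e.g. the transcript) dominate the final confidence score.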

9. A method (300) for video resume analysis, the method comprising:
receiving (302), by an input unit, a set of videos, each associated with a candidate;
verifying (304), the received set of videos from a video classifier for detecting whether the received set of videos indicates a video resume;
extracting (306), by a controller, one or more facial features of each candidate from the associated video, upon successful verification of the video;
converting (308), by the controller, audio content from each video into a textual format separately and extracting textual content;
evaluating (310), by the controller, the extracted textual content from the audio content of each video for alignment with predefined resume context criteria;
discarding (312), by the controller, the video upon detection of the audio content not aligned with the predefined resume context criteria;
extracting (314), by the controller, a set of acoustic features and a set of transcription features from the audio content of each video; and
generating (316), by the controller, an individual score for the candidate for the one or more facial features, the extracted text, the set of acoustic features, and the set of transcription features separately, and combining the generated individual scores to generate a confidence score for the candidate.

10. The method as claimed in claim 9, wherein the extraction of one or more facial features further comprises the steps of:
analyzing eye movement patterns of the candidate to assess attention and engagement during the associated video;
determining blink count of the candidate to evaluate levels of comfort or nervousness during the associated video;
monitoring the movement of lips of the candidate to determine audio-visual synchronization in the associated video; and
observing the movement of forehead and eyebrows of the candidate to analyze cognitive processing and emotional expressions during the associated video.

11. The method as claimed in claim 9, wherein the extraction of textual content from the audio content further includes:
utilizing an audio-to-text conversion technique for extraction of the textual content from the video;
conducting text pre-processing to remove noise and irrelevant information from the textual content;
utilizing a Named Entity Recognition (NER) technique to recognize and classify entities within the textual content;
performing sentiment analysis on the extracted textual content to discern the emotional tone conveyed by the candidate in the video;
identifying and extracting key terms through a keyword extraction technique to highlight relevant skills, experiences, and qualifications spoken by the candidate;
applying natural language processing (NLP) techniques for contextual understanding, including semantics and relationships within the extracted textual content; and
conducting grammar and spell checks in the extracted textual content.

12. The method as claimed in claim 9, further comprising utilizing a text classification technique for categorization of the textual content within each video and comparing the categorized textual content with the predefined resume context criteria.

13. The method as claimed in claim 9, further comprising, during the evaluation of the extracted textual content for alignment with the predefined resume context criteria, detecting at least one of non-resume-related words, offensive words, and vulgar words.

14. The method as claimed in claim 9, further comprising assigning weights to each of the generated individual scores and utilizing a machine learning model trained to optimize the generated confidence score from a combination of each of the generated individual scores for the candidate.

Documents

Application Documents

# Name Date
1 202411017855-POWER OF AUTHORITY [12-03-2024(online)].pdf 2024-03-12
2 202411017855-FORM-9 [12-03-2024(online)].pdf 2024-03-12
3 202411017855-FORM FOR SMALL ENTITY(FORM-28) [12-03-2024(online)].pdf 2024-03-12
4 202411017855-FORM 3 [12-03-2024(online)].pdf 2024-03-12
5 202411017855-FORM 1 [12-03-2024(online)].pdf 2024-03-12
6 202411017855-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [12-03-2024(online)].pdf 2024-03-12
7 202411017855-ENDORSEMENT BY INVENTORS [12-03-2024(online)].pdf 2024-03-12
8 202411017855-DRAWINGS [12-03-2024(online)].pdf 2024-03-12
9 202411017855-COMPLETE SPECIFICATION [12-03-2024(online)].pdf 2024-03-12
10 202411017855-MSME CERTIFICATE [16-03-2024(online)].pdf 2024-03-16
11 202411017855-FORM28 [16-03-2024(online)].pdf 2024-03-16
12 202411017855-FORM 18A [16-03-2024(online)].pdf 2024-03-16
13 202411017855-FER.pdf 2024-05-10
14 202411017855-FER_SER_REPLY [29-06-2024(online)].pdf 2024-06-29
15 202411017855-CORRESPONDENCE [29-06-2024(online)].pdf 2024-06-29
16 202411017855-CLAIMS [29-06-2024(online)].pdf 2024-06-29
17 202411017855-US(14)-HearingNotice-(HearingDate-07-10-2025).pdf 2025-09-03

Search Strategy

1 SearchHistory_202411017855E_07-05-2024.pdf