
Adaptive Learning System

Abstract: Disclosed herein is an adaptive learning system (100) comprising a data capturing unit (102) configured to receive input from a user, an audio capturing unit (104) configured to capture audio signals of the user, a textual interface (106) configured to receive and process textual input from the user, a communication network (122) configured to transfer data in a voice-interactive learning system, an input module (110) configured to receive and process user input data in a learning system, a pre-processing module (112) configured to perform noise reduction, audio normalization, speech enhancement, and feature extraction on the received data, an interactive learning module (114) configured to deliver skill-based tasks in the form of adaptive activities, an AI personalization module (116) configured to track user performance and adjust learning content and task difficulty in real time, and a scenario-based learning module (118) configured to guide learners through tasks that replicate real-world situations via audio-guided interactions.


Patent Information

Application #
Filing Date
21 May 2025
Publication Number
23/2025
Publication Type
INA
Invention Field
ELECTRONICS
Status
Email
Parent Application

Applicants

SR UNIVERSITY
ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA

Inventors

1. SURESH KUMAR MANDALA
SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA
2. NEELIMA GURRAPU
SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA
3. JAKKU HARSHAVARDHAN
SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA
4. KANDHAGATLA SHREYA
SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA
5. PALLE AKSHITH REDDY
SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA
6. GOLLA CHITRALEKA
SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA
7. KUSAM DEEPAK RAJ
SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA
8. CHANDUPATLA VYSHNAVI
SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA

Specification

Description: FIELD OF DISCLOSURE
[0001] The present disclosure generally relates to an adaptive learning system that is capable of processing spoken input to determine appropriate skill responses by evaluating semantic meaning and confidence levels for accurate voice-based interaction.
BACKGROUND OF THE DISCLOSURE
[0002] With the continuous development of voice technology and artificial intelligence, voice wake-up technology has advanced greatly in the field of intelligent systems such as smart homes. At present, both knowledge skills and business skills exist in an intelligent system: knowledge skills provide corresponding question-and-answer services for users of the intelligent system, for example supplying an answer to each user's question, while business skills provide corresponding business services for the user, such as music services, taxi services, weather services, and the like. However, when a general smart system recognizes a user's voice, it cannot determine whether the voice is intended to wake a business skill or a knowledge skill, which results in the wrong skill being invoked. Illustratively, when a smart sound box receives the user voice message "who is the mother in the morning", waking the music skill would play a song matching that phrase, whereas waking the knowledge skill would return an informational answer; invoking the wrong one leaves the user's intent unmet.
[0003] Traditional voice interaction systems commonly rely on rigid keyword detection or static command templates to trigger specific functions or services. These systems are often limited in their ability to understand natural variations in spoken language, leading to misinterpretation of user intent when voice inputs do not exactly match predefined commands. Additionally, they typically do not evaluate the relevance or reliability of potential responses across different skill domains, resulting in the activation of incorrect or suboptimal skills. This lack of semantic understanding and confidence-based evaluation reduces the overall accuracy and effectiveness of the interaction. Users may face repeated errors, require multiple attempts to get the desired result, or experience confusion due to inconsistent responses. Such limitations hinder user satisfaction and restrict the scalability of voice-based systems in dynamic or complex environments where more intelligent interpretation and flexible skill selection are essential for delivering relevant and accurate responses in real-time scenarios.
[0004] CN111081225B discloses an adaptive learning system that identifies awakening text information corresponding to a voice request message to be processed, calls a service skill semantic model to determine a target service field and a corresponding first confidence degree for the awakening text information, calls a knowledge skill semantic model to determine a knowledge reply answer and a corresponding second confidence degree, and selects one of a wake-up knowledge skill and a target business skill corresponding to the target business domain according to the first confidence and the second confidence. This can reduce the probability of falsely waking a skill from a voice message. However, the disclosure lacks integrated support for audio-based interaction, which limits accessibility for users with different learning styles, literacy levels, or physical constraints, and fails to engage users in natural, conversational learning modes.
[0005] CN112233656A discloses an artificial-intelligence voice awakening system comprising the steps of obtaining voice data, determining energy characteristics corresponding to the voice data using a voice detection model, determining text data corresponding to the voice data according to those energy characteristics, and judging whether the voice data contains awakening keywords; if it does, an awakening judgment model carries out the awakening judgment, and if not, the awakening judgment model outputs an instruction to maintain the current state. The technical scheme provided by that invention can effectively overcome the low accuracy of voice-awakening recognition and the inability to flexibly adjust awakening words in the prior art. However, it teaches only through plain questions and answers and does not guide users through real-world situations, which are important for hands-on learning.
[0006] Many systems disclosed in the prior art provide ways to assist in skill voice awakening; however, existing systems often lack personalization, adaptability to user needs, seamless integration with learning environments, and real-time feedback.
[0007] To overcome the aforementioned limitations, there exists a need to develop a system capable of enhancing skill voice awakening through personalized interaction, adaptive response mechanisms, continuous engagement, and context-aware learning support.
SUMMARY OF THE DISCLOSURE
[0008] The following is a summary description of illustrative embodiments of the invention. It is provided as a preface to assist those skilled in the art to more rapidly assimilate the detailed design discussion which ensues and is not intended in any way to limit the scope of the claims which are appended hereto in order to particularly point out the invention.
[0009] According to illustrative embodiments, the present disclosure focuses on an adaptive learning system which overcomes the above-mentioned disadvantages or provides users with a useful or commercial choice.
[0010] The present disclosure addresses the above limitations through an adaptive learning system.
[0011] An objective of the present disclosure is to develop a system that is capable of recognizing and interpreting awakening text information from voice input to enable relevant skill activation and user interaction in a context-aware manner using linguistic cues and semantic understanding.
[0012] Another objective of the present disclosure is to develop a system that is capable of analyzing voice requests through multiple semantic models to determine appropriate responses based on the user’s intent and content of the spoken input for improved engagement.
[0013] Another objective of the present disclosure is to develop a system that is capable of comparing confidence levels derived from different models to intelligently decide whether to activate a service-related response or provide informative feedback based on knowledge data.
[0014] Another objective of the present disclosure is to develop a system that is capable of identifying and extracting service-related terms from input and mapping them to predefined data structures for accurate and timely skill activation without manual intervention.
[0015] Another objective of the present disclosure is to develop a system that is capable of dynamically acquiring additional relevance information when direct matches are unavailable to maintain continuity in interaction and ensure user intent is addressed.
[0016] Yet another objective of the present disclosure is to develop a system that is capable of refining understanding of service relevance through structured evaluation of external content sources and correlation techniques to enhance response reliability.
[0017] In light of the above, in one aspect of the present disclosure, an adaptive learning system is disclosed herein. The system comprises identifying awakening text information from a voice request message to determine user intent. The system also includes invoking a service skill semantic model to extract service-related keywords and determine a target service domain along with a confidence degree. The system further includes invoking a knowledge skill semantic model to obtain a knowledge-based reply and corresponding confidence score. The system also includes selecting between a wake-up knowledge skill or a target service skill based on the comparative analysis of the confidence degrees. The system further includes refining service recognition using relevance data derived from keyword correlation and structured search evaluations.
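The confidence-based selection between the two semantic models described above can be illustrated with a minimal sketch; the function name, the threshold value, and the returned labels are illustrative conventions, not part of the disclosure:

```python
def select_skill(service_conf, knowledge_conf, threshold=0.5):
    """Pick the skill whose semantic model reports the higher
    confidence degree; if neither clears the threshold, return
    None so the caller can fall back to refinement, such as the
    relevance-data search the system also provides."""
    if max(service_conf, knowledge_conf) < threshold:
        return None  # neither interpretation is reliable enough
    return "service" if service_conf >= knowledge_conf else "knowledge"

# A voice request scored by both semantic models:
skill = select_skill(0.82, 0.41)  # the service model is more confident
```

In this sketch the tie goes to the service skill; a deployed system might instead weight the comparison or require a minimum margin between the two confidence degrees.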
[0018] In one embodiment, the system identifies awakening text information from a voice request message and selects an appropriate skill based on semantic confidence levels, and may further target specific types of service skills, such as musical skills.
[0019] In one embodiment, the system involves extracting service-related keywords and entity information from the awakening text using a semantic model. The extracted elements are then compared with a predefined database containing known service entries. This allows for accurate recognition and classification of service intent. The process enhances precision in selecting appropriate service responses.
[0020] In one embodiment, when the extracted keywords or entity information are not found in the database, the system retrieves relevant service data. This is achieved by analyzing service popularity metrics and correlation indices from external sources. The approach ensures continuity in processing ambiguous or new service requests and helps the system adapt to evolving service trends.
[0021] In one embodiment, the system evaluates the relevance of search results using a predefined strategy that assesses their quality and contextual alignment. Search engines are queried first with service keywords alone and then with added service names to refine results. This layered querying approach improves the accuracy of service selection. It ensures that the most appropriate skill is activated.
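The layered querying strategy of this embodiment can be sketched as follows; `search_fn` stands in for any search backend returning (result, relevance) pairs, and the relevance floor is an assumed value:

```python
def layered_query(search_fn, keywords, service_name, min_relevance=0.6):
    """Query first with the service keywords alone, then with the
    service name appended, and return the single best result that
    clears the relevance floor (None if nothing qualifies)."""
    broad = search_fn(" ".join(keywords))
    refined = search_fn(" ".join(keywords + [service_name]))
    candidates = [r for r in broad + refined if r[1] >= min_relevance]
    return max(candidates, key=lambda r: r[1], default=None)

# Toy backend: results mentioning "music" score highly.
fake_search = lambda q: [(q, 0.9 if "music" in q else 0.3)]
best = layered_query(fake_search, ["play", "song"], "music")
```

The second, refined query only improves the outcome when it surfaces a higher-scoring result; otherwise the broad query's results still compete on equal terms.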
[0022] In one embodiment, the system is integrated into a programmable system designed to execute the voice awakening process. It operates consistently across different environments without manual intervention. The instructions may also be stored in a retrievable format for reuse. This supports flexible deployment across various application platforms.
[0023] These and other advantages will be apparent from the present application of the embodiments described herein.
[0024] The preceding is a simplified summary to provide an understanding of some embodiments of the present invention. This summary is neither an extensive nor exhaustive overview of the present invention and its various embodiments. The summary presents selected concepts of the embodiments of the present invention in a simplified form as an introduction to the more detailed description presented below. As will be appreciated, other embodiments of the present invention are possible utilizing, alone or in combination, one or more of the features set forth above or described in detail below.
[0025] These elements, together with the other aspects of the present disclosure and various features are pointed out with particularity in the claims annexed hereto and form a part of the present disclosure. For a better understanding of the present disclosure, its operating advantages, and the specified object attained by its uses, reference should be made to the accompanying drawings and descriptive matter in which there are illustrated exemplary embodiments of the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
[0026] To describe the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the following briefly describes the accompanying drawings required for describing the embodiments or the prior art. Apparently, the accompanying drawings in the following description merely show some embodiments of the present disclosure, and a person of ordinary skill in the art can derive other implementations from these accompanying drawings without creative efforts. All of the embodiments or the implementations shall fall within the protection scope of the present disclosure.
[0027] The advantages and features of the present disclosure will become better understood with reference to the following detailed description taken in conjunction with the accompanying drawing, in which:
[0028] FIG. 1 illustrates a block diagram of an adaptive learning system in accordance with an embodiment of the present disclosure;
[0029] FIG. 2 illustrates a flow chart in accordance with an embodiment of the present disclosure;
[0030] Like reference numerals refer to like parts throughout the description of the several views of the drawings.
[0031] The adaptive learning system is illustrated in the accompanying drawings, in which like reference letters indicate corresponding parts in the various figures. It should be noted that the accompanying figure is intended to present illustrations of exemplary embodiments of the present disclosure. This figure is not intended to limit the scope of the present disclosure. It should also be noted that the accompanying figure is not necessarily drawn to scale.
DETAILED DESCRIPTION OF THE DISCLOSURE
[0032] The following is a detailed description of embodiments of the disclosure depicted in the accompanying drawings. The embodiments are in such detail as to communicate the disclosure. However, the amount of detail offered is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the present disclosure.
[0033] In the following description, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be apparent to one skilled in the art that embodiments of the present disclosure may be practiced without some of these specific details.
[0034] Various terms as used herein are shown below. To the extent a term is used, it should be given the broadest definition persons in the pertinent art have given that term as reflected in printed publications and issued patents at the time of filing.
[0035] The terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items.
[0036] The terms “having”, “comprising”, “including”, and variations thereof signify the presence of a component.
[0037] Referring now to FIG. 1 and FIG. 2, various exemplary embodiments of the present disclosure are described. FIG. 1 illustrates a block diagram of an adaptive learning system in accordance with an embodiment of the present disclosure.
[0038] The system 100 may include a data capturing unit 102, an audio capturing unit 104, a textual interface 106, a communication network 122, a microprocessor 108, an input module 110, a pre-processing module 112, an interactive learning module 114, an AI personalization module 116, a scenario-based learning module 118, a feedback generation module 120, and a user interface 124.
[0039] The system is characterized by a data capturing unit 102 that acquires raw user inputs from both audio and text interfaces and simply forwards them for processing without interpretation. For audio input, a microphone transduces sound waves into an analog electrical signal, which is then sampled and digitized by an analog-to-digital converter (ADC) into a stream of raw digital samples. A lightweight voice-activity detector (VAD) may monitor this digital stream for speech presence: the VAD watches the signal from a microphone channel for voice-like patterns and triggers an interrupt on detecting speech, but its role is only to flag voice segments (e.g., to wake the system or gate recording), not to recognize or analyze the content. The resulting raw PCM audio (including both speech and background noise) is then forwarded unchanged to downstream modules.
[0040] The data capturing unit 102 forms the critical front-end layer of the system; it serves as the sensory gateway through which all user-originated data enters. This unit is not a singular component but an intelligently orchestrated suite of subsystems that operate in harmony to capture, distinguish, and validate user inputs. Its design is rooted in multimodal interaction, allowing for both spoken and textual communication, thereby ensuring maximum accessibility and adaptability across user contexts. At the core of the data input module is its ability to dynamically detect and manage multiple input types, often concurrently. This means it can receive a user's voice input while also allowing typed commands or questions, handling them in a parallel yet coordinated fashion.
[0041] The audio capturing unit 104 is designed to detect, acquire, and forward the user's spoken input in its raw digital form, serving as the first point of contact between the user's voice and the system. It begins with a microphone or an array of microphones that continuously monitor the environment to detect incoming sound waves. These analog sound signals are then passed through an analog-to-digital converter (ADC), which samples the waveform at a specific frequency (e.g., 16 kHz or 44.1 kHz) and transforms it into a stream of digital audio data. To ensure efficiency, the system employs voice activity detection (VAD) algorithms that monitor the signal for human speech patterns, filtering out silence and irrelevant background noise. Once speech is detected, the digitized waveform, still unprocessed in terms of content, is buffered and then forwarded to downstream modules (such as a pre-processing or speech recognition engine) for further analysis. The audio capturing unit 104 does not perform interpretation or transcription; its role is purely to capture high-fidelity, timestamped, and clean audio signals, preserving the original speech data for intelligent processing in later stages of the system.
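A minimal energy-based VAD of the kind described can be sketched as follows; real detectors add noise-floor tracking and hangover smoothing, and the frame size and threshold here are assumed values:

```python
import numpy as np

def detect_speech(samples, rate=16000, frame_ms=20, energy_thresh=0.02):
    """Flag each frame whose RMS energy exceeds a fixed threshold.

    Returns one boolean per 20 ms frame; the unit only flags voice
    segments, it does not recognize or transcribe them.
    """
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    frames = np.reshape(samples[:n_frames * frame_len], (n_frames, frame_len))
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return rms > energy_thresh

# One second of silence followed by a tone standing in for speech:
t = np.arange(16000) / 16000
signal = np.concatenate([np.zeros(16000), 0.5 * np.sin(2 * np.pi * 440 * t)])
flags = detect_speech(signal)
```

In practice the threshold would adapt to the measured noise floor, so the detector keeps working as ambient conditions change.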
[0042] In one embodiment of the present invention, the audio capturing unit 104 comprises a microphone array configured to capture audio signals from multiple spatial directions, thereby enhancing the overall quality of the captured voice input and minimizing background interference. The microphone array consists of two or more spatially distributed microphones arranged in a predefined geometric configuration (e.g., a linear, circular, or beamforming-compatible pattern), enabling the system to perform spatial filtering techniques such as beamforming. When a user speaks, the array captures the sound wavefronts from different positions, allowing the system to isolate the primary speech signal from environmental noise or overlapping voices. This configuration not only improves the clarity and intelligibility of the captured audio but also supports directional voice detection and echo cancellation. The captured multi-channel audio data is then digitized via analog-to-digital conversion and passed through a voice activity detection (VAD) module, which confirms the presence of human speech before forwarding the clean, high-fidelity signal to the downstream processing pipeline. This embodiment ensures robust, real-time speech acquisition even in acoustically challenging environments, making it highly suitable for interactive learning systems, virtual assistants, and AI-driven communication platforms.
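Delay-and-sum beamforming, the simplest of the spatial filtering techniques mentioned above, can be sketched as follows; integer per-microphone sample delays are assumed to be known from the array geometry:

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Align each microphone channel by its integer sample delay,
    then average: wavefronts from the steered direction add
    coherently while off-axis noise partially cancels."""
    out = np.zeros(channels.shape[1])
    for ch, d in zip(channels, delays):
        out += np.roll(ch, -d)  # undo the per-mic arrival delay
    return out / channels.shape[0]

# Two mics; the second hears the same wavefront 3 samples later:
s = np.sin(2 * np.pi * 440 * np.arange(1600) / 16000)
aligned = delay_and_sum(np.stack([s, np.roll(s, 3)]), [0, 3])
```

Real arrays use fractional delays and frequency-domain weighting, but the principle, align then average, is the same.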
[0043] The textual interface 106 in the present system is configured to receive and process textual input from the user in a raw, unaltered form, acting as a dedicated channel for capturing written interactions. This interface is typically linked to hardware or software-based input devices such as physical keyboards, on-screen virtual keyboards, touchscreen panels, or stylus-based handwriting input systems. When a user initiates input, such as pressing a key or tapping on a screen, the interface captures each event in real time using event listeners that detect low-level input signals. These signals are translated into their corresponding characters using standard encoding schemes, ensuring accurate digital representation of the user's intended input. All input is timestamped and buffered, preserving the precise sequence of characters, including deletions, corrections, and pauses in typing.
[0044] At this stage, the textual interface 106 does not interpret, correct, or format the input data; it simply preserves the raw user-generated text exactly as entered. Whether the user inputs complete sentences, partial thoughts, or informal expressions, the interface maintains the authenticity and intent of the original message. The raw character stream is then passed to downstream modules such as pre-processing or natural language understanding components where further operations like spell checking, tokenization, or semantic analysis may occur. This separation of concerns ensures that the textual interface 106 operates as a pure input-capturing layer, free from any bias or alteration, and forms a critical foundation for multimodal interaction systems that integrate both text and voice inputs.
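The pass-through behaviour of the textual interface, capturing timestamped events without correction or interpretation, can be sketched as follows; the class and method names are illustrative:

```python
import time

class TextCaptureBuffer:
    """Record raw input events as (character, timestamp) pairs,
    preserving exactly what was typed, in the order it was typed,
    with no spell checking or formatting applied."""

    def __init__(self):
        self.events = []

    def on_key(self, char, timestamp=None):
        # Each keystroke is buffered with its capture time so
        # downstream modules can reconstruct typing sequence and pauses.
        self.events.append(
            (char, timestamp if timestamp is not None else time.monotonic())
        )

    def raw_text(self):
        # Reproduce the raw character stream verbatim.
        return "".join(c for c, _ in self.events)

buf = TextCaptureBuffer()
for c in "helo":  # the typo is deliberately preserved
    buf.on_key(c)
```

Any spell checking or tokenization of `buf.raw_text()` happens strictly in later modules, which is the separation of concerns the paragraph above describes.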
[0045] The user interface 124 (UI) in the present system is configured to act as an interactive layer that facilitates seamless input and engagement from the user through both audio and text modalities. It serves as the visible and interactive gateway between the user and the underlying system, enabling the user to select input modes, initiate commands, and receive system prompts or responses. In text mode, the UI provides editable input fields, chat windows, or form elements where users can type queries or feedback using physical or virtual keyboards. The UI monitors each input event and visually reflects user actions (such as character entry, deletion, or cursor movement) in real time, enhancing transparency and responsiveness. In addition to capturing input, it also helps organize the interaction flow, showing context and conversation history, and guiding the user via prompts, placeholder text, or labels.
[0046] In audio mode, the user interface 124 incorporates interactive elements such as microphone buttons, voice prompt indicators, and visual feedback cues (e.g., waveform animations or "listening…" states) that inform the user when the system is actively capturing voice input. Upon activation, typically through a tap or click, the UI triggers the audio capturing subsystem and synchronizes with voice activity detection to begin recording speech. During this process, the UI may display visual feedback such as dynamic audio level indicators or countdown timers, which give users a sense of control and system awareness. Importantly, the UI also manages mode switching, allowing users to seamlessly shift between typing and speaking within the same interaction thread, often without interrupting the session. By coordinating user behaviour across modalities and providing real-time feedback, the UI ensures an intuitive, accessible, and responsive experience that supports both natural language input methods effectively.
[0047] The microprocessor 108 is connected to the data capturing unit 102 and serves as the central processing hub that orchestrates the flow and initial handling of raw input data from both audio and textual channels. Upon receiving digitized audio signals and textual data streams, the microprocessor 108 manages buffering, synchronization, and timing, ensuring that inputs from different modalities are correctly aligned for subsequent analysis. It executes low-level control tasks such as managing interrupts triggered by voice activity detection or keystroke events, thus enabling real-time responsiveness. Additionally, the microprocessor 108 oversees signal integrity checks and may perform preliminary filtering or formatting to prepare data packets for the next stage of processing.
[0048] Beyond data management, the microprocessor 108 also facilitates communication between the data capturing unit 102 and other system components, such as the pre-processing module 112 and the AI personalization module 116. It executes embedded firmware routines to route captured data efficiently through the system's communication network 122, handle error detection and correction protocols, and manage system resource allocation to optimize performance. In systems with adaptive learning capabilities, the microprocessor 108 can dynamically adjust capture parameters such as sampling rates or sensitivity thresholds based on contextual feedback, enhancing the overall quality and reliability of user input acquisition. Thus, the microprocessor 108 acts as both a traffic controller and a preliminary data processor, ensuring seamless, synchronized, and high-fidelity input capture that forms the foundation for intelligent user interaction.
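The cross-modality alignment the microprocessor performs can be illustrated as a timestamp-ordered merge of buffered events; the event-tuple layout (time, modality, payload) is an assumption for the sketch:

```python
import heapq

def merge_by_timestamp(audio_events, text_events):
    """Merge two already time-ordered event streams (audio frames
    and keystrokes) into one stream ordered by capture time, so
    downstream modules see inputs in the order they occurred."""
    return list(heapq.merge(audio_events, text_events, key=lambda e: e[0]))

# A keystroke captured between two audio frames:
stream = merge_by_timestamp(
    [(0.00, "audio", b"\x00\x01"), (0.02, "audio", b"\x02\x03")],
    [(0.01, "text", "h")],
)
```

Because each capture path timestamps its own events, the merge needs no shared clock beyond the one the timestamps came from.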
[0049] The input module 110 in a learning system is responsible for receiving raw user data captured by the data capturing unit 102 and transforming it into a structured, meaningful format suitable for further analysis and learning processes. Upon receiving inputs, whether in the form of digitized audio streams or raw textual data, the module begins by performing essential pre-processing tasks. For audio inputs, this includes noise reduction, signal enhancement, and conversion of speech signals into phonetic or textual representations using speech recognition algorithms. For text inputs, the module carries out tokenization, normalization (such as converting all text to lowercase), and error correction to handle typos or misspellings, ensuring that the input is coherent and consistent.
[0050] Following initial cleaning, the input module 110 may apply language-specific processing, such as parsing grammatical structures, identifying named entities, or detecting sentiment and intent. It organizes the processed input into a standardized data structure, often integrating temporal markers to preserve the sequence and timing of multimodal inputs. This structured data is then forwarded to downstream components, such as the interactive learning module 114 or the AI personalization module 116, which use it to adapt the learning experience according to user needs. Throughout this workflow, the input module 110 maintains flexibility to handle diverse input types and formats, enabling the learning system to support natural, multimodal interactions while ensuring high-quality, reliable data feeds for effective educational outcomes.
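The text-side normalization and tokenization steps described above can be sketched minimally as:

```python
import re

def normalize_text(raw):
    """Lowercase the input, strip punctuation, and split into
    tokens, preserving word order for downstream intent detection."""
    lowered = raw.lower()
    cleaned = re.sub(r"[^\w\s]", "", lowered)  # drop punctuation only
    return cleaned.split()

tokens = normalize_text("Play the NEXT lesson, please!")
```

A production pipeline would add spelling correction and language-specific tokenization on top of this skeleton, as the paragraph above indicates.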
[0051] The pre-processing module 112 is configured to receive raw data from the data capturing unit 102 and perform critical signal conditioning and preparation tasks to enhance the quality and usability of the input for subsequent processing stages. For audio data, the module first applies noise reduction techniques to minimize background sounds and interference, improving the clarity of the user's speech. This is typically achieved through algorithms such as spectral subtraction, Wiener filtering, or adaptive noise cancellation. Following noise suppression, the module performs audio normalization, adjusting the amplitude of the speech signals to a consistent level to ensure uniformity across varying input volumes. Additionally, speech enhancement methods are applied to improve the intelligibility of the audio by emphasizing speech-relevant frequencies and suppressing distortions or reverberation.
[0052] Once the audio signal has been cleaned and normalized, the pre-processing module 112 proceeds to feature extraction, where it derives meaningful representations from the speech signal, such as Mel-frequency cepstral coefficients (MFCCs), spectrograms, or pitch contours. These features serve as compact, informative inputs for downstream modules like speech recognition or interactive learning systems. For textual data, the module may perform normalization steps such as lowercasing, punctuation removal, or tokenization to prepare the text for semantic analysis. By systematically enhancing and structuring the raw input data, the pre-processing module 112 ensures that the system operates with high-quality, reliable data that facilitates accurate recognition, interpretation, and personalized response.
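Two of the conditioning steps named above, spectral-subtraction noise reduction and amplitude normalization, can be sketched with NumPy; the frame length and target level are assumed values, and production systems add overlap-add windowing and a spectral floor:

```python
import numpy as np

def spectral_subtract(noisy, noise, frame_len=512):
    """Classic magnitude spectral subtraction: estimate the noise
    spectrum from a noise-only clip, subtract it from each frame
    of the noisy signal, and floor negative magnitudes at zero."""
    noise_mag = np.abs(np.fft.rfft(noise[:frame_len]))
    n_frames = len(noisy) // frame_len
    out = np.zeros(n_frames * frame_len)
    for i in range(n_frames):
        frame = noisy[i * frame_len:(i + 1) * frame_len]
        spec = np.fft.rfft(frame)
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        # Keep the original phase; only the magnitude is cleaned.
        out[i * frame_len:(i + 1) * frame_len] = np.fft.irfft(
            mag * np.exp(1j * np.angle(spec)), n=frame_len
        )
    return out

def normalize_peak(signal, target=0.9):
    """Scale the waveform so its peak sits at a consistent level."""
    peak = np.max(np.abs(signal))
    return signal * (target / peak) if peak > 0 else signal
```

Subtracting a noise estimate that matches the interference spectrum suppresses it almost entirely; real noise is nonstationary, which is why adaptive variants such as Wiener filtering are also named in the paragraph above.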
[0053] An interactive learning module 114 configured to deliver skill-based tasks in the form of interactive games operates as a core functional element of the invention, transforming traditional learning or skill development processes into engaging, user-friendly experiences. This module is architected to dynamically interact with the user through voice commands and responses, utilizing the data capturing unit 102 as the primary interface for input and feedback. The working of the interactive learning module 114 begins with the system receiving a command or initiating a session based on a scheduled program or a user prompt detected through the data capturing unit 102. Once activated, the learning module selects an appropriate game-based task, tailored either to the user's predefined learning goals or to their performance history within the system. These tasks are designed around specific skills such as language acquisition, memory enhancement, problem-solving, or technical knowledge and are embedded with interactive elements that require active user participation.
[0054] As the game unfolds, the user interacts with the module entirely through voice. For instance, the module may present a quiz, riddle, or role-playing scenario where the user must respond verbally. The learning module then evaluates the input in real time, determining correctness, assessing skill level, and delivering instant feedback, either as rewards, corrections, or prompts to proceed to the next level. Moreover, the module is designed to collect data on user performance over time, using analytics to adjust future challenges to better match the user's evolving capabilities. This ensures a personalized learning journey, where the system gradually introduces more complex tasks as the user's proficiency increases. By fully leveraging the hands-free nature of the speech interface, the interactive learning module 114 makes it possible for users to engage in skill-building activities in a convenient, accessible, and immersive way, suitable for environments such as classrooms, homes, or even while commuting.
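The real-time evaluation of a spoken answer can be illustrated with a simple token-overlap score; the thresholds and feedback strings are illustrative, and a deployed module would use a richer semantic comparison:

```python
def evaluate_response(expected, spoken):
    """Score a transcribed answer against the expected one by
    token overlap, so minor wording differences still earn
    partial credit, and return instant feedback."""
    exp = set(expected.lower().split())
    got = set(spoken.lower().split())
    score = len(exp & got) / len(exp) if exp else 0.0
    if score == 1.0:
        return score, "Correct! Moving to the next level."
    if score >= 0.5:
        return score, "Close, try rephrasing."
    return score, "Not quite, let's review this topic."

score, feedback = evaluate_response("water cycle", "the water cycle")
```

The returned score can feed directly into the performance history the module keeps for adjusting future challenges.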
[0055] In one embodiment of the present invention, the interactive learning module 114 includes a sophisticated context-aware engine designed to enhance the learning experience by continuously monitoring environmental parameters such as background noise, lighting conditions, and user proximity using sensors like microphones, light sensors, and proximity detectors. The engine processes real-time data to assess the ambient environment; for instance, if high background noise is detected, the system can automatically increase audio volume, simplify instructions, or switch to visual cues to ensure the user remains engaged and comprehends the content effectively. Similarly, if lighting conditions are poor, the system can adjust screen brightness or suggest relocating to a better-lit area. When the user is detected to be far from the system, the engine might prompt the user to move closer or modify the interaction style to maintain engagement. These dynamic adjustments optimize the learning environment, making it adaptable to different settings and ensuring that educational activities are effective, personalized, and accessible regardless of external conditions.
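As an illustrative sketch only (not part of the specification), the context-aware rules above can be expressed as a simple mapping from sensed conditions to interface adjustments. The threshold values (60 dB, 100 lux, 2 m) are assumed for the example and are not stated in the disclosure.

```python
def adapt_to_environment(noise_db: float, lux: float, distance_m: float) -> dict:
    """Map sensed environmental conditions to interface adjustments.
    Thresholds are illustrative placeholders, not values from the disclosure."""
    adjustments = {}
    if noise_db > 60:            # noisy room: louder audio plus visual cues
        adjustments["volume"] = "increase"
        adjustments["visual_cues"] = True
    if lux < 100:                # dim lighting: raise screen brightness
        adjustments["brightness"] = "increase"
    if distance_m > 2.0:         # user far from the device: prompt to approach
        adjustments["prompt"] = "Please move closer to the device."
    return adjustments
```

A quiet, well-lit room with the user nearby yields an empty adjustment set, i.e. no intervention.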
[0056] An AI personalization module 116 configured to track user performance and adjust learning content and task difficulty in real time functions as the intelligent backbone of the invention, ensuring that each user receives a customized and evolving learning experience tailored to their unique needs, strengths, and progress. This module operates continuously in the background, interfacing directly with both the interactive learning module 114 and the speech recognition system to collect and analyze interaction data at every step. The working of the AI personalization module 116 begins with real-time data acquisition. As the user engages with interactive tasks, responding via voice commands that are processed through speech recognition, the system captures key performance indicators such as response time, accuracy of answers, frequency of engagement, task completion rates, hesitation patterns, and even sentiment cues from voice tone if equipped with emotion recognition capabilities. All these data points are instantly fed into the personalization engine.
[0057] At the core of this module lies a set of machine learning algorithms, particularly reinforcement learning and user modelling techniques, which continuously process the incoming performance data. These algorithms map the user's learning curve, identify areas of difficulty or proficiency, and detect patterns that signal either mastery or struggle with specific concepts or tasks. Based on these insights, the AI module dynamically recalibrates the upcoming content, altering the complexity, nature, and pacing of the games or challenges. In addition to real-time adjustments during a session, the AI personalization module 116 maintains a longitudinal performance profile for each user. This profile becomes increasingly accurate over time, allowing the module to make predictive adjustments, anticipating areas where the user is likely to encounter difficulty and pre-emptively designing support mechanisms such as hints, slower pacing, or supplementary mini-games. Overall, the AI personalization module 116 enables a responsive, adaptive, and highly individualized learning environment. It transforms static educational experiences into dynamic journeys, where the content intelligently evolves in sync with the learner, maximizing both motivation and outcome.
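For illustration only, a minimal longitudinal profile of the kind described above can be maintained with an exponential moving average of answer accuracy driving a pacing decision. The class name, the smoothing factor, and the 0.4/0.8 thresholds are assumptions for the sketch; the specification instead contemplates reinforcement learning and user-modelling techniques.

```python
class LearnerProfile:
    """Toy longitudinal profile: an exponential moving average of accuracy
    stands in for the richer user model described in the specification."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha
        self.accuracy = 0.5  # neutral prior for an unknown learner

    def record(self, correct: bool) -> None:
        """Fold one task outcome into the running accuracy estimate."""
        outcome = 1.0 if correct else 0.0
        self.accuracy = (1 - self.alpha) * self.accuracy + self.alpha * outcome

    def recommend_pacing(self) -> str:
        """Predictive adjustment: advance on mastery, support on struggle."""
        if self.accuracy > 0.8:
            return "advance"      # introduce harder content
        if self.accuracy < 0.4:
            return "support"      # hints, slower pacing, mini-games
        return "maintain"
```

A run of correct answers pushes the estimate toward 1.0 and triggers "advance"; a run of misses decays it toward 0.0 and triggers "support".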
[0058] In one embodiment of the present invention, the AI personalization module 116 includes a peer collaboration module designed to facilitate interactive, voice-based multiplayer learning sessions among users. This module leverages speech recognition and natural language processing technologies to enable real-time communication, allowing users to collaborate, discuss, and solve problems collectively within the learning environment. The system can identify compatible users based on their skill levels, learning preferences, and progress, and then establish secure voice channels for collaborative activities. During sessions, the module may support features such as voice prompts, contextual hints, and shared virtual spaces to foster engagement and teamwork. Additionally, it can monitor conversation dynamics to ensure productive interactions, provide feedback on communication effectiveness, and adapt the difficulty or focus of activities based on group performance. This setup promotes social learning, enhances motivation, and creates a more interactive and personalized educational experience by connecting users in meaningful voice-based collaborations.
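The compatibility matching mentioned above can be sketched, purely illustratively, as pairing users whose skill levels lie within a small gap. The greedy pairing strategy and the `max_gap` parameter are assumptions of the example, not part of the disclosure.

```python
def match_peers(users: list[tuple[str, int]], max_gap: int = 1) -> list[tuple[str, str]]:
    """Pair users whose skill levels differ by at most `max_gap`.
    Greedy pass over users sorted by skill level; unmatched users are skipped."""
    pool = sorted(users, key=lambda u: u[1])
    pairs, i = [], 0
    while i + 1 < len(pool):
        a, b = pool[i], pool[i + 1]
        if abs(a[1] - b[1]) <= max_gap:
            pairs.append((a[0], b[0]))
            i += 2  # both users consumed into a session
        else:
            i += 1  # no close peer for this user yet
    return pairs
```

A production matcher would also weigh learning preferences and progress, as the embodiment describes; skill level alone keeps the sketch short.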
[0059] The scenario-based learning module 118, configured to emulate real-world skill scenarios via audio-guided interactions, forms a key functional aspect of the invention, enabling users to practice and develop practical, context-specific skills in a controlled, immersive, and fully voice-interactive environment. The module is designed to replicate real-life situations through a series of guided audio narratives, instructions, prompts, and decision-making tasks to which the user responds verbally, leveraging speech recognition for input and the feedback generation module 120 for real-time responses. A scenario is initiated either by user selection, by recommendation from the AI personalization module 116, or as part of a structured learning progression. Each scenario is crafted to mirror real-world situations relevant to the user’s learning goals, such as customer service interactions, emergency response training, language conversation practice, or technical troubleshooting procedures.
[0060] Once launched, the module delivers an immersive audio experience, guiding the user step-by-step through a simulated environment. This includes setting the context through descriptive audio, assigning a role or objective to the user (e.g., “You are a hotel receptionist handling a guest complaint”), and introducing dynamic challenges. These challenges evolve based on the user’s verbal choices and inputs. For each interaction point, the system uses audio prompts to present problems, characters, or questions, and then pauses to listen for the user's spoken response. This response is captured via the speech recognition module 102, processed for content and intent, and then evaluated against scenario logic. For example, in a simulated first-aid situation, if a user correctly says, “Check for breathing,” the simulation proceeds to the next realistic step; if the answer is incorrect or delayed, the module may initiate a corrective prompt or simulate a consequence, e.g., “The person isn’t breathing. What will you do next?” This creates a dynamic, reactive learning experience where the outcome is shaped by the user's performance.
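The scenario logic described above behaves like a small branching state machine. The following sketch is illustrative only; the node names, prompts, and expected phrases are hypothetical and the specification does not disclose a concrete data structure.

```python
# Hypothetical branching table for the first-aid scenario used as the example.
SCENARIO = {
    "start": {
        "prompt": "You find a person collapsed. What do you do first?",
        "expected": "check for breathing",
        "on_correct": "breathing_checked",   # proceed to the next realistic step
        "on_wrong": "consequence",           # simulate a consequence instead
    },
    "consequence": {
        "prompt": "The person isn't breathing. What will you do next?",
        "expected": "start cpr",
        "on_correct": "cpr_started",
        "on_wrong": "consequence",           # corrective prompt repeats
    },
}

def advance(state: str, spoken: str) -> str:
    """Move to the next scenario node based on the user's spoken choice."""
    node = SCENARIO[state]
    key = "on_correct" if spoken.strip().lower() == node["expected"] else "on_wrong"
    return node[key]
```

Because each node names its successors explicitly, the outcome of a session is shaped entirely by the sequence of spoken responses, matching the reactive behaviour the paragraph describes.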
[0061] A feedback generation module 120 configured to provide immediate audio feedback based on the user’s task performance is an essential component of the invention, designed to create a responsive, interactive, and engaging user experience. This module operates in real time, processing the outcome of each user interaction, primarily verbal responses captured by the speech recognition system, and generating appropriate audio cues or spoken messages that guide, encourage, or correct the user during their learning journey. The working process of the feedback generation module 120 begins the moment a user completes a voice-based task or provides an answer during an interactive game delivered by the interactive learning module 114. The response is captured through the speech recognition system, which converts the spoken input into text and forwards it to the learning engine. The engine then evaluates the response for accuracy, relevance, and contextual appropriateness. Once the evaluation is complete, the results are sent to the feedback generation module 120. Based on this assessment, the module selects the appropriate feedback response from a predefined library of audio messages or dynamically generates a customized response using text-to-speech (TTS) technology. Feedback may be positive (e.g., “Well done!” or “Correct!”), constructive (e.g., “That’s close, but try again.”), or instructional (e.g., “Remember, the past tense of ‘go’ is ‘went’.”).
[0062] To enhance engagement and learning efficiency, the module uses variations in tone, pace, and emotional inflection when delivering feedback. For example, success messages may be delivered in an enthusiastic tone, while corrective feedback may be delivered in a calm, encouraging manner. This emotional modulation is crucial for maintaining motivation and preventing user frustration, especially during more difficult or repetitive tasks. Moreover, the module is context-aware and performance-sensitive. It adapts its feedback style based on the user’s current performance level and learning history as tracked by the AI personalization module 116. For users demonstrating early signs of fatigue or repeated errors, the feedback module might shift to a more supportive tone, offer simpler tasks, or initiate a short break through a spoken suggestion like, “Let’s take a quick pause before the next challenge.” Another critical feature of this module is its ability to deliver corrective feedback in a step-by-step manner, helping users understand why an answer was incorrect and how to improve. For instance, if a user mispronounces a word or provides a grammatically incorrect sentence, the module does not simply state that it was wrong; it may repeat the correct form, break it down into simpler parts, and prompt the user to try again. This reinforces learning through guided correction.
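Purely as an illustration, the tone-selection behaviour above can be sketched as a function mapping the evaluation result and recent error count to a message and a delivery tone for the TTS stage. The messages reuse examples from the specification; the three-error threshold and the function name are assumptions.

```python
def make_feedback(correct: bool, recent_errors: int) -> dict:
    """Pick feedback text and TTS delivery tone; switch to a supportive
    tone (and suggest a break) after repeated errors."""
    if correct:
        return {"text": "Well done!", "tone": "enthusiastic"}
    if recent_errors >= 3:   # illustrative fatigue threshold
        return {"text": "Let's take a quick pause before the next challenge.",
                "tone": "supportive"}
    return {"text": "That's close, but try again.", "tone": "calm"}
```

The returned `tone` field would parameterize the TTS engine's prosody settings; the sketch stops at selecting it.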
[0063] The voice biometric module is designed to authenticate and identify individual users by analyzing their unique vocal characteristics. When a user speaks, the module captures the audio input through a microphone and processes it using advanced digital signal processing techniques to extract distinctive features such as pitch, tone, cadence, formant frequencies, and speech patterns. These features are then transformed into a numerical voiceprint or biometric template using algorithms like Mel-Frequency Cepstral Coefficients (MFCC) and machine learning models trained on a database of known users. During subsequent interactions, the system compares the real-time voice input against stored templates using pattern matching algorithms such as Gaussian Mixture Models (GMM) or deep neural networks. If the match exceeds a predefined confidence threshold, the system successfully authenticates the user, granting access or personalizing the experience accordingly. This biometric module ensures secure, seamless user identification, enabling personalized learning environments and protecting sensitive data by verifying individual identities based solely on vocal traits.
[0064] The localized content generator functions by utilizing regional, linguistic, or professional data to produce tailored voice-based learning tasks that resonate with specific user groups. It first identifies the user's region, preferred language, or profession through user profiles or contextual cues. Based on this information, the generator accesses relevant linguistic databases, cultural references, terminology, and contextual content to create customized learning tasks. Using natural language processing and speech synthesis technologies, it constructs voice prompts, questions, or instructions in the appropriate language and dialect, incorporating region-specific idioms, accents, or technical jargon as needed. These tasks are then delivered via voice-based interfaces, ensuring cultural relevance and language appropriateness. The system may also adapt the difficulty level or content complexity based on user proficiency or learning progress, providing a personalized and engaging educational experience that aligns with the user's regional, linguistic, or professional context.
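The template-driven generation described above can be sketched, illustratively, as locale-keyed prompt templates with a fallback. The locale codes, template strings, and function name are assumptions of the example; the disclosure describes the behaviour, not a data format.

```python
# Hypothetical locale-keyed prompt templates; a real system would draw on
# linguistic databases and region-specific terminology as the text describes.
TEMPLATES = {
    "en-IN": "Translate the word '{word}' into {language}.",
    "te-IN": "'{word}' అనే పదాన్ని {language}లోకి అనువదించండి.",
}

def make_task(locale: str, word: str, language: str) -> str:
    """Fill the locale-specific prompt template, falling back to en-IN
    when the user's locale has no template."""
    template = TEMPLATES.get(locale, TEMPLATES["en-IN"])
    return template.format(word=word, language=language)
```

The filled prompt would then be voiced through the speech-synthesis stage mentioned in the paragraph.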
[0065] Reference is now made to FIG. 2 to describe various exemplary embodiments of the present disclosure. FIG. 2 illustrates a flow chart of an adaptive learning system 100 in accordance with an exemplary embodiment of the present disclosure.
[0066] At step 202, the system 100 captures audio signals from the user using a microphone or microphone array. These signals are first converted from analog to digital form, then temporarily buffered. Voice activity detection (VAD) may be used to identify when speech occurs, ensuring the system only processes relevant audio segments.
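For illustration only, the voice activity detection (VAD) mentioned at step 202 can be sketched as a simple frame-energy gate; the 0.01 energy threshold and function names are assumptions, and production systems use more robust statistical VADs.

```python
def is_speech(frame: list[float], energy_threshold: float = 0.01) -> bool:
    """Flag a buffered audio frame as speech when its mean energy
    exceeds a fixed threshold (illustrative value)."""
    energy = sum(s * s for s in frame) / len(frame)
    return energy > energy_threshold

def speech_segments(frames: list[list[float]]) -> list[int]:
    """Indices of frames containing speech; silent frames are skipped
    so only relevant audio segments are processed downstream."""
    return [i for i, f in enumerate(frames) if is_speech(f)]
```

Frames flagged by `speech_segments` would be the only ones forwarded to the pre-processing and recognition stages.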
[0067] At step 204, the system 100 receives and processes textual input entered by the user via a keyboard, touchscreen, or similar interface. Raw keystroke or tap data is captured, encoded, and stored in a buffer without alteration, preserving the integrity of the user's original text.
[0068] At step 206, the system 100 facilitates user interaction through both audio and text by presenting a unified interface. This includes elements like microphone buttons, text fields, and visual feedback indicators. It ensures that users can seamlessly switch between speaking and typing as preferred.
[0069] At step 208, the system 100 performs signal processing tasks on the captured data. For audio, this includes noise reduction, normalization of volume levels, speech enhancement, and extracting features such as MFCCs or spectrograms. These cleaned signals are then used for speech recognition or analysis.
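Of the signal-processing tasks listed at step 208, volume normalization is the simplest to sketch. The peak-normalization approach and the 0.9 target are assumptions of this example; the specification does not fix a normalization method.

```python
def normalize_peak(samples: list[float], target: float = 0.9) -> list[float]:
    """Scale the signal so its largest absolute sample equals `target`,
    equalizing volume levels across recordings (illustrative method)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return samples  # pure silence: nothing to scale
    gain = target / peak
    return [s * gain for s in samples]
```

The normalized signal would then feed the feature-extraction stage (MFCCs or spectrograms), which requires dedicated DSP libraries and is not reproduced here.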
[0070] At step 210, user performance is tracked in real time. The system analyses response accuracy, speed, and behavioural patterns to evaluate progress. Based on this data, learning content is dynamically modified—tasks may become harder or easier depending on the learner’s needs.
[0071] At step 212, the system 100 engages the user with scenario-based tasks that simulate real-world environments. These tasks are delivered through audio prompts and interactive sequences, helping users practice applied skills in contextually relevant situations.
[0072] At step 214, the system 100 generates and delivers immediate audio feedback. This may include praise, corrections, hints, or guidance based on the user's performance in the task, reinforcing learning outcomes and maintaining engagement.
[0073] While the invention has been described in connection with what is presently considered to be the most practical and various embodiments, it will be understood that the invention is not to be limited to the disclosed embodiments, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims.
[0074] A person of ordinary skill in the art may be aware that, in combination with the examples described in the embodiments disclosed in this specification, units and algorithm steps may be implemented by electronic hardware, computer software, or a combination thereof.
[0075] The foregoing descriptions of specific embodiments of the present disclosure have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the present disclosure to the precise forms disclosed, and many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described to best explain the principles of the present disclosure and its practical application, and to thereby enable others skilled in the art to best utilize the present disclosure and various embodiments with various modifications as are suited to the particular use contemplated. It is understood that various omissions and substitutions of equivalents are contemplated as circumstances may suggest or render expedient, but such omissions and substitutions are intended to cover the application or implementation without departing from the scope of the present disclosure.
[0076] Disjunctive language such as the phrase “at least one of X, Y, Z,” unless specifically stated otherwise, is otherwise understood with the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present.
[0077] In a case that no conflict occurs, the embodiments in the present disclosure and the features in the embodiments may be mutually combined. The foregoing descriptions are merely specific implementations of the present disclosure, but are not intended to limit the protection scope of the present disclosure. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in the present disclosure shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims:
I/We Claim:
1. An adaptive learning system (100), for personalized skill development comprising:
a data capturing unit (102) configured to receive input from a user further comprises:
an audio capturing unit (104) configured to capture audio signals of the user by various means;
a textual interface (106) configured to receive and process textual input from a user;
a user interface (124) configured to facilitate user interaction and input through both audio and text modes;
a communication network (122) configured to transfer data in a voice-interactive learning system;
a microprocessor (108) connected to the data capturing unit (102) wherein the microprocessor (108) further comprises:
an input module (114) configured to receive and process user input data in a learning system;
a pre-processing module (112) configured to process the received data to perform noise reduction, audio normalization, speech enhancement, and feature extraction;
an interactive learning module (114) configured to deliver skill-based tasks in the form of adaptive activities;
an AI personalization module (116) configured to track user performance and adjust learning content and task difficulty in real time;
a scenario-based learning module (118) configured to guide learners through tasks that replicate real-world situations via audio-guided interactions;
a feedback generation module (120) configured to provide immediate audio feedback based on the user's task performance.
2. The system (100) as claimed in claim 1, wherein the audio capturing unit (104) further comprises a microphone array configured to capture audio signals from multiple directions to enhance signal quality and reduce noise.
3. The system (100) as claimed in claim 1, wherein the interactive learning module (114) further comprises a context-aware engine configured to monitor environmental parameters including background noise.
4. The system (100) as claimed in claim 1, wherein the AI personalization module (116) further comprises a peer collaboration module configured to establish voice-based multiplayer learning sessions between users.
5. The system (100) as claimed in claim 1, further comprising a voice biometric module configured to identify individual users based on vocal characteristics.
6. The system (100) as claimed in claim 1, further comprising a localized content generator configured to create region-specific, language-specific, or profession-specific voice-based learning tasks.
7. A method (200) for an adaptive learning system, the method (200) comprising:
capturing audio signals from the user;
receiving and processing textual input from the user;
facilitating user interaction through both audio and text modes;
performing noise reduction, audio normalization, speech enhancement, and feature extraction on the received data;
tracking user performance and adjusting learning content and task difficulty in real time;
guiding the user through scenario-based tasks that replicate real-world situations;
providing immediate audio feedback to the user based on the user's input and performance.

Documents

Application Documents

# Name Date
1 202541048895-STATEMENT OF UNDERTAKING (FORM 3) [21-05-2025(online)].pdf 2025-05-21
2 202541048895-REQUEST FOR EARLY PUBLICATION(FORM-9) [21-05-2025(online)].pdf 2025-05-21
3 202541048895-POWER OF AUTHORITY [21-05-2025(online)].pdf 2025-05-21
4 202541048895-FORM-9 [21-05-2025(online)].pdf 2025-05-21
5 202541048895-FORM FOR SMALL ENTITY(FORM-28) [21-05-2025(online)].pdf 2025-05-21
6 202541048895-FORM 1 [21-05-2025(online)].pdf 2025-05-21
7 202541048895-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [21-05-2025(online)].pdf 2025-05-21
8 202541048895-DRAWINGS [21-05-2025(online)].pdf 2025-05-21
9 202541048895-DECLARATION OF INVENTORSHIP (FORM 5) [21-05-2025(online)].pdf 2025-05-21
10 202541048895-COMPLETE SPECIFICATION [21-05-2025(online)].pdf 2025-05-21
11 202541048895-Proof of Right [30-05-2025(online)].pdf 2025-05-30