
System For Recommending Videos Based On A User's Emotion State

Abstract: A system for recommending videos based on a user's emotion state, comprising a voice recording module of a user’s mobile phone, a speech recognition module associated with the system, configured to convert audio data into text data, a text processing module associated with the system to process text data by removing unnecessary words and formatting the text for subsequent analysis, an emotion detection module associated with the system to detect patterns indicative of a user's emotional state and determine the user's emotion based on the detected patterns, a video recommendation module configured to access the user's YouTube account and an API linked to the user's Gmail account and recommend videos based on the user's determined mood and YouTube preferences.


Patent Information

Application #:
Filing Date: 13 August 2025
Publication Number: 35/2025
Publication Type: INA
Invention Field: ELECTRONICS
Status:
Parent Application:

Applicants

SR University
Ananthasagar, Hasanparthy (PO), Warangal-506371, Telangana, India.

Inventors

1. Radhakrishnan P
Assistant Dean (Student Welfare) & Assistant Professor, School of Computer Science & Artificial Intelligence, SR University, Ananthasagar, Hasanparthy (PO), Warangal-506371, Telangana, India.
2. Dr. N.Sharmila Banu
Assistant Dean (Research) & Assistant Professor, School of Computer Science & Artificial Intelligence, SR University, Ananthasagar, Hasanparthy (PO), Warangal-506371, Telangana, India.
3. Dr. Salomi Samsudeen
Assistant Professor, Department of Computational Intelligence, SRM Institute of Science and Technology, Kattankulathur, Chennai – 603203, Tamil Nadu, India.
4. Kodem Sravan
Assistant Professor, Department of Computer Science and Engineering, ACE Engineering College, Ankushapur, Ghatkesar Mandal, Medchal District, Telangana – 501301, India.
5. S. Deepan
Assistant Professor, School of Computing, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, 42, Avadi-Vel Tech Road, Vel Nagar, Avadi, Chennai, Tamil Nadu 600062, India.
6. Tamilselvi P
UG Scholar, GSS Jain College For Women, 96 Vepery High Road, Chennai 600007, Tamil Nadu, India.

Specification

Description:

FIELD OF THE INVENTION

[0001] The present invention relates to a system for recommending videos based on a user's emotion state that accurately identifies user emotional states via voice analysis, enabling personalized video recommendations. The system improves voice clarity through noise reduction and ensures reliable emotion detection, leading to highly relevant content and a superior user experience.

BACKGROUND OF THE INVENTION

[0002] The system for recommending videos aims to revolutionize content consumption. The system employs advanced voice analysis and noise reduction techniques to accurately identify real-time emotions. By integrating this emotional data with user preferences and historical viewing patterns, the system generates highly personalized video recommendations. This goes beyond traditional methods that rely solely on explicit preferences, offering a more intuitive and responsive entertainment experience. The goal is to provide content that truly resonates with the user's current mood, leading to increased engagement, satisfaction, and a seamless, tailored viewing journey.

[0003] Traditional video recommendation systems primarily rely on collaborative filtering and content-based filtering. Collaborative filtering suggests videos based on the viewing habits of similar users ("users who watched X also watched Y"). Its limitations include the cold start problem (difficulty recommending for new users or new content with limited data), data sparsity (difficulty finding patterns in vast datasets where most users haven't interacted with most items), and scalability issues as user and content bases grow. Content-based filtering recommends videos similar to those a user has previously enjoyed, based on metadata like genre, actors, or keywords. However, it often suffers from over-specialization, failing to introduce diverse or novel content, and struggles with new content lacking sufficient metadata. Neither method effectively captures a user's real-time emotional state, leading to recommendations that might be logically sound but emotionally irrelevant.

[0004] US10958962B2 discloses a video recommending system that includes a virtual reality device and a server. The virtual reality device includes a brainwave sensor and a processor. The brainwave sensor is configured to acquire first brainwave data. The processor is coupled to the brainwave sensor and is configured to receive the first brainwave data. The server is coupled to the virtual reality device. The server is configured to generate a recommending list according to first emotion data corresponding to the first brainwave data, and to transmit the recommending list to the virtual reality device, wherein the recommending list includes a plurality of video lists for the virtual reality device to play at least one video of the video lists.

[0005] JP2022119026A discloses an emotion estimation device, emotion estimation system, and program that enable a device of simple configuration to easily estimate the emotion of a speaker and achieve smooth communication even when the communication is non-face-to-face. The emotion estimation device WT comprises: an input state detection unit 120 that detects a key input state indicative of a state of a word input operation by a user; an estimation result creation unit 130 that generates an emotion estimation result indicative of an emotion of the user on the basis of the key input state; and an estimation result output unit 140 that outputs the emotion estimation result. The estimation result creation unit 130 is configured to determine an affection value indicative of a comfort degree of the user and an awakening degree indicative of a degree of awakening of the user on the basis of the key input state, and to generate the emotion estimation result on the basis of the determined affection value and awakening degree.

[0006] Conventionally, many systems are available in the market for recommending videos but existing video recommendation systems often fall short because they primarily rely on historical viewing data and stated preferences, neglecting the user's real-time emotional state. This leads to generic suggestions that may not align with a user's current mood. Furthermore, background noise often compromises voice input clarity, making accurate emotion detection unreliable.

[0007] In order to overcome the aforementioned drawbacks, there exists a need in the art to develop a system that personalizes content based on viewing history and preferences, offering users relevant suggestions from vast catalogs. This increases engagement, improves user satisfaction, and helps users discover new content, ultimately driving platform consumption and revenue.

OBJECTS OF THE INVENTION

[0008] The principal object of the present invention is to overcome the disadvantages of the prior art.

[0009] An object of the present invention is to develop a system that is capable of accurately identifying a user's emotional state by analyzing their voice input, enabling personalized video recommendations that match their mood, thereby improving user engagement and satisfaction through a more intuitive and responsive entertainment experience.

[0010] Another object of the present invention is to develop a system that is capable of improving the clarity of captured voice data by reducing background noise, ensuring reliable emotion detection for effective video suggestions, therefore leading to more accurate and relevant content delivery and a significantly improved user experience.

[0011] Yet another object of the present invention is to develop a system that is capable of recommending relevant videos by accessing the user's preferences and account data, providing a tailored viewing experience based on their emotional state, thus creating a truly personalized and intuitive entertainment platform that anticipates and responds to individual user needs.

[0012] The foregoing and other objects, features, and advantages of the present invention will become readily apparent upon further review of the following detailed description of the preferred embodiment as illustrated in the accompanying drawings.

SUMMARY OF THE INVENTION

[0013] The present invention relates to a system for recommending videos based on a user's emotion state to enhance voice data clarity through noise reduction, ensuring reliable emotion detection for accurate video suggestions. By leveraging user preferences and emotional states, it creates a personalized viewing experience, anticipating and responding to individual entertainment needs.

[0014] According to an embodiment of the present invention, a system for recommending videos based on a user's emotion state, comprising a voice recording module of a user’s mobile phone, associated with the system, configured to capture a user's voice input, a speech recognition module associated with the system, configured to convert audio data into text data by processing the audio data to recognize spoken words, a text processing module associated with the system to process text data by removing unnecessary words, an emotion detection module associated with the system to detect patterns indicative of a user's emotional state and determine the user's emotion based on the detected patterns, a video recommendation module configured to access the user's YouTube account and an API linked to the user's Gmail account and recommend videos based on the user's determined mood and YouTube preferences.

[0015] While the invention has been described and shown with particular reference to the preferred embodiment, it will be apparent that variations might be possible that would fall within the scope of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0016] These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description, appended claims, and accompanying drawings where:
Figure 1 illustrates an isometric view of a system for recommending videos based on a user's emotion state.

DETAILED DESCRIPTION OF THE INVENTION

[0017] The following description includes the preferred best mode of one embodiment of the present invention. It will be clear from this description of the invention that the invention is not limited to these illustrated embodiments, and that the invention also includes a variety of modifications and embodiments thereto. Therefore, the present description should be seen as illustrative and not limiting. While the invention is susceptible to various modifications and alternative constructions, it should be understood that there is no intention to limit the invention to the specific form disclosed; on the contrary, the invention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention as defined in the claims.

[0018] In any embodiment described herein, the open-ended terms "comprising," "comprises," and the like (which are synonymous with "including," "having," and "characterized by") may be replaced by the respective partially closed phrases "consisting essentially of," "consists essentially of," and the like, or the respective closed phrases "consisting of," "consists of," and the like.

[0019] As used herein, the singular forms “a,” “an,” and “the” designate both the singular and the plural, unless expressly stated to designate the singular only.

[0020] The present invention relates to a system for recommending videos based on a user's emotion state that accurately identifies user emotional states via voice analysis, delivering personalized video recommendations that match the user's mood. By leveraging user preferences and account data, the system creates a truly intuitive platform for enhanced engagement and satisfaction.

[0021] Referring to Figure 1, a block diagram depicting workflow of a system for recommending videos based on a user's emotion state is illustrated. The system integrates multiple modules that work seamlessly to capture, process, and analyze user input to deliver personalized video recommendations. This invention addresses the need for dynamic content curation that aligns with a user's emotional context, improving engagement and satisfaction.

[0022] The system begins with a voice recording module integrated into a user’s mobile phone. This module captures the user’s voice input, generating audio data that represents the spoken content. By utilizing the phone’s microphone, the system ensures accessibility and ease of use, allowing users to provide input naturally. To enhance audio quality, the system incorporates noise-filtering capabilities, which remove background noise to produce clear and reliable audio data for further processing (as illustrated in Fig.1).
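By way of illustration only, a minimal noise gate over raw audio samples might be sketched as follows; the specification does not prescribe a particular noise-filtering algorithm, so the moving-average smoothing, window size, and amplitude threshold here are assumptions:

```python
# Illustrative noise gate (assumed algorithm and parameters; the
# specification only requires that background noise be filtered).

def noise_gate(samples, threshold=0.05, window=3):
    """Smooth samples with a moving average, then zero out
    low-amplitude regions treated as background noise."""
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - window // 2)
        hi = min(len(samples), i + window // 2 + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    # Keep only samples loud enough to plausibly be speech.
    return [s if abs(s) >= threshold else 0.0 for s in smoothed]

# Quiet hiss at the edges is suppressed; the louder, speech-like
# peaks in the middle survive for downstream recognition.
cleaned = noise_gate([0.01, -0.02, 0.4, 0.5, 0.02, -0.01])
```

A production system would more likely apply spectral noise reduction on the phone itself, but the gating idea is the same: discard low-energy content before recognition.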

[0023] Next, the speech recognition module converts the captured audio data into text. This module employs advanced speech-to-text protocols to recognize spoken words accurately. A language model is applied to improve transcription accuracy by accounting for contextual linguistic patterns. This ensures that the system can handle diverse accents, dialects, and speech nuances, making it robust and adaptable to various users.
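As an illustrative sketch of how a language model can improve transcription accuracy through contextual linguistic patterns, the following toy example rescores candidate transcriptions using assumed bigram counts; the candidates, acoustic scores, and bigram table are hypothetical:

```python
# Toy rescoring step: combine each candidate transcription's acoustic
# score with a bigram language-model score, picking the contextually
# most plausible word sequence. All numbers here are assumptions.

BIGRAM_SCORES = {                      # assumed corpus counts
    ("i", "feel"): 3, ("feel", "happy"): 4,
    ("i", "fill"): 0, ("fill", "happy"): 0,
}

def lm_score(words):
    # Sum bigram evidence over adjacent word pairs.
    return sum(BIGRAM_SCORES.get(pair, 0)
               for pair in zip(words, words[1:]))

def best_transcription(candidates):
    # candidates: list of (acoustic_score, word_list) pairs.
    return max(candidates,
               key=lambda c: c[0] + lm_score(c[1]))[1]

candidates = [(2.0, ["i", "fill", "happy"]),
              (1.5, ["i", "feel", "happy"])]
print(best_transcription(candidates))  # ['i', 'feel', 'happy']
```

Even though "fill" scored higher acoustically, the language model's context evidence flips the decision toward "feel", which is the behavior the specification attributes to its language-model step.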

[0024] The text processing module then takes the transcribed text and prepares it for analysis. It removes unnecessary words, such as filler terms, and formats the text to ensure clarity and consistency. This preprocessing step is critical for enabling accurate emotional analysis by eliminating noise in the text data, ensuring that only relevant content is passed to the next stage.
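A minimal preprocessing sketch follows; the specification does not enumerate the "unnecessary words", so the filler list, lowercasing, and punctuation stripping shown here are assumptions:

```python
import re

# Minimal text preprocessing sketch: strip assumed filler terms and
# normalize case and punctuation before emotional analysis.

FILLERS = {"um", "uh", "like", "basically", "actually"}

def preprocess(text):
    # Tokenize into lowercase word characters, dropping punctuation.
    words = re.findall(r"[a-z']+", text.lower())
    # Remove filler terms so only content words reach the analyzer.
    return " ".join(w for w in words if w not in FILLERS)

print(preprocess("Um, I basically feel, like, really SAD today"))
# prints: i feel really sad today
```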

[0025] The core of the system lies in the emotion detection module, which uses a neural network to analyze the processed text data. By identifying patterns indicative of emotional states such as happiness, sadness, or excitement, the module determines the user’s mood with high precision. The neural network is trained to recognize subtle linguistic cues, enabling the system to infer emotions even from short or casual voice inputs. This emotional insight forms the foundation for personalized content recommendations.
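The specification calls for a neural network; as a self-contained stand-in for illustration only, the following sketch scores emotion keywords in the processed text against an assumed lexicon. A real implementation would replace the keyword scoring with the trained classifier described above:

```python
# Simplified emotion detection: count cue words per emotion and pick
# the highest-scoring one. The lexicon is an assumption; the patent's
# neural network would learn these cues from data instead.

EMOTION_LEXICON = {
    "happiness": {"happy", "great", "joy", "excited"},
    "sadness":   {"sad", "down", "tired", "lonely"},
}

def detect_emotion(text):
    words = set(text.split())
    # Score each emotion by how many of its cue words appear.
    scores = {emotion: len(words & cues)
              for emotion, cues in EMOTION_LEXICON.items()}
    return max(scores, key=scores.get)

print(detect_emotion("i feel really sad and lonely today"))  # sadness
```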

[0026] Finally, the video recommendation module leverages the detected emotional state to curate videos tailored to the user’s mood. Integrated with the user’s YouTube account via an API linked to their Gmail account, the module accesses the user’s watch history and liked videos. By analyzing these preferences alongside the determined mood, the system prioritizes and recommends videos that align with the user’s emotional and content preferences. For example, a user detected to be in a cheerful mood might receive recommendations for uplifting or comedic videos, while a user feeling contemplative might be suggested documentaries or inspirational content.
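The mood-aware ranking described above can be sketched over mock watch-history data; the field names, scoring weights, and mood-to-category mapping below are assumptions for illustration and do not reflect the actual YouTube Data API:

```python
# Illustrative ranking: videos matching the detected mood's assumed
# categories score highest, with a smaller boost for liked videos.

MOOD_CATEGORIES = {                      # assumed mapping
    "happiness": {"comedy", "music"},
    "sadness":   {"inspirational", "documentary"},
}

def recommend(videos, mood, liked_ids, top_n=2):
    def score(video):
        mood_match = 2 if video["category"] in MOOD_CATEGORIES[mood] else 0
        preference = 1 if video["id"] in liked_ids else 0
        return mood_match + preference
    ranked = sorted(videos, key=score, reverse=True)
    return [v["id"] for v in ranked[:top_n]]

videos = [
    {"id": "v1", "category": "comedy"},
    {"id": "v2", "category": "documentary"},
    {"id": "v3", "category": "inspirational"},
]
print(recommend(videos, "sadness", liked_ids={"v3"}))  # ['v3', 'v2']
```

In a deployed system the `videos` and `liked_ids` inputs would come from the user's watch history and liked-videos list retrieved over the authorized API connection.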

[0027] This system offers a novel approach to video recommendation by combining emotion detection with user-specific data, creating a highly personalized and context-aware experience. Its modular design ensures scalability and adaptability, making it suitable for integration into various platforms beyond YouTube. By addressing both technical accuracy and user engagement, the invention has the potential to redefine how digital content is curated and consumed.

[0028] The present invention works best in the following manner, wherein the system recommends videos based on the user's emotional state, enhancing the experience on platforms like YouTube. The process begins with the voice recording module within the user's mobile phone, capturing voice input and generating audio data. This module incorporates noise-filtering capabilities to ensure clear and reliable audio. Subsequently, the speech recognition module converts the audio data into text using advanced speech-to-text protocols and a language model for improved transcription accuracy. The text processing module then refines this transcribed text by removing filler words and formatting for consistency. The core of the system is the emotion detection module, employing a neural network to analyze the processed text and accurately identify emotional states such as happiness or sadness. Finally, the video recommendation module leverages this detected emotional state. Integrated with the user's YouTube account via the API linked to their Gmail account, this module accesses the user's watch history and liked videos. By analyzing these preferences alongside the determined mood, the system curates and prioritizes videos that align with both the user's emotional and content preferences, providing a truly personalized and context-aware viewing experience.

[0029] Although the field of the invention has been described herein with limited reference to specific embodiments, this description is not meant to be construed in a limiting sense. Various modifications of the disclosed embodiments, as well as alternate embodiments of the invention, will become apparent to persons skilled in the art upon reference to the description of the invention.

Claims:

1) A system for recommending videos based on a user's emotion state, comprising:

i) a voice recording module of a user’s mobile phone, associated with the system, configured to capture a user's voice input and generate audio data representing the captured voice input;
ii) a speech recognition module associated with the system, configured to convert audio data into text data by processing the audio data to recognize spoken words;
iii) a text processing module associated with the system, configured to process text data by removing unnecessary words and formatting the text for subsequent analysis;
iv) an emotion detection module associated with the system, configured to analyze processed text data using a neural network to detect patterns indicative of a user's emotional state and determine the user's emotion based on the detected patterns; and
v) a video recommendation module configured to access the user's YouTube account through an API linked to the user's Gmail account, in the computing unit, to find and recommend videos based on the user's determined mood and YouTube preferences.

2) The system as claimed in claim 1, wherein the voice recording module is further configured to filter background noise from the captured voice input to enhance the quality of the generated audio data.

3) The system as claimed in claim 1, wherein the speech recognition module is further configured to apply a language model to improve the accuracy of the conversion of audio data into text data by accounting for contextual linguistic patterns.

4) The system as claimed in claim 1, wherein the video recommendation module is further configured to prioritize video recommendations by analyzing the user's YouTube watch history and liked videos to align with the determined mood.

Documents

Application Documents

# Name Date
1 202541077309-STATEMENT OF UNDERTAKING (FORM 3) [13-08-2025(online)].pdf 2025-08-13
2 202541077309-REQUEST FOR EARLY PUBLICATION(FORM-9) [13-08-2025(online)].pdf 2025-08-13
3 202541077309-PROOF OF RIGHT [13-08-2025(online)].pdf 2025-08-13
4 202541077309-POWER OF AUTHORITY [13-08-2025(online)].pdf 2025-08-13
5 202541077309-FORM-9 [13-08-2025(online)].pdf 2025-08-13
6 202541077309-FORM FOR SMALL ENTITY(FORM-28) [13-08-2025(online)].pdf 2025-08-13
7 202541077309-FORM 1 [13-08-2025(online)].pdf 2025-08-13
8 202541077309-FIGURE OF ABSTRACT [13-08-2025(online)].pdf 2025-08-13
9 202541077309-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [13-08-2025(online)].pdf 2025-08-13
10 202541077309-EVIDENCE FOR REGISTRATION UNDER SSI [13-08-2025(online)].pdf 2025-08-13
11 202541077309-EDUCATIONAL INSTITUTION(S) [13-08-2025(online)].pdf 2025-08-13
12 202541077309-DRAWINGS [13-08-2025(online)].pdf 2025-08-13
13 202541077309-DECLARATION OF INVENTORSHIP (FORM 5) [13-08-2025(online)].pdf 2025-08-13
14 202541077309-COMPLETE SPECIFICATION [13-08-2025(online)].pdf 2025-08-13