Abstract: A method and a system for adaptively delivering content to a content viewer are disclosed. The method performed by the system includes receiving a media content request for a media content and accessing raw media content associated with the media content. The method includes extracting frame-level data from the raw media content and computing, by an adaptive encoding model, a complexity score for the raw media content based on the frame-level data. The method includes mapping one or more media content encoding parameters and a bitrate ladder to the media content based on the complexity score. The method includes encoding the media content to obtain a plurality of encoded media content files based on the one or more media content encoding parameters and the bitrate ladder. Herein, each encoded media content file corresponds to a particular content resolution and a particular bitrate.
DESC: The present technology generally relates to the delivery of streaming content to content viewers and, more particularly, to artificial intelligence-based methods and systems for adaptively encoding media content and delivering it to the content viewer (i.e., a user or a subscriber).
BACKGROUND
On-demand media streaming as well as live streaming of content has gained popularity in recent times, and subscribers are increasingly using a variety of electronic devices, i.e., user devices, for streaming content. The streaming content is accessed on the user device of the subscriber using Over-The-Top (OTT) media services (i.e., over the Internet). OTT streaming content providers typically use a Content Delivery Network (CDN) to deliver the streaming content to the user device associated with the subscriber.
In most cases, a content provider entrusts the CDN with delivering content to the end users. A user’s request for viewing/accessing content that is offered by the content provider is directed to the CDN, which identifies the nearest CDN Point Of Presence (PoP) or a sub-storage box to deliver the requested content to the user. Generally, the content provider delivers the requested content to the CDN in an encoded form at all possible resolutions, which is known as a static ladder. In various non-limiting instances, for a particular user device, a predefined resolution may be served such as 1080p, 720p, and so on. The static ladder is a playlist of the same media content in different resolution and bitrate combinations. The static ladder aims to enable users on mobile networks with varying bandwidth/speeds and on low-speed/bandwidth networks to access any media content with acceptable latency and appropriate resolution. The media player associated with a media application of the content provider installed on the user device may select from a variety of different alternate streams based on the Adaptive Bit-Rate (ABR)/the changing network conditions and adaptively switch the user to a higher/lower resolution, thus allowing the streaming session to adapt to the available data rate.
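For purposes of illustration only, such a static ladder may be expressed as an HLS master playlist, with one entry per resolution/bitrate combination; the bandwidth values and rendition paths below are non-limiting examples rather than values prescribed by the present disclosure:

    #EXTM3U
    #EXT-X-STREAM-INF:BANDWIDTH=1400000,RESOLUTION=842x480
    480p/index.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
    720p/index.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
    1080p/index.m3u8

The media player parses this playlist and switches between the variant streams as the measured bandwidth changes.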
Conventionally, before any media content is ingested into the CDN, the content needs to be processed. Content processing includes encoding the media such as videos based on a content policy of the content provider platform. It should be understood that encoding converts raw video files into one or more digital formats that are compatible with the video players or video extensions supported by various user devices. Initially, the subscriber requests desired content from the content provider. At the content provider’s end, the CDN is identified to serve the content to the subscriber, which in turn identifies the CDN PoP or sub-storage box nearest to the subscriber to serve the requested content. Once identified, in one scenario, the CDN PoP downloads the requested content from the CDN in various resolutions such as 480p, 720p, 1080p, 1440p (i.e., 2K), 2160p (i.e., 4K), and the like. In another scenario, the identified CDN PoP is pre-warmed/pre-fetched for the requested media content, and the media content is made ready to serve the subscriber. In yet another scenario, the identified CDN PoP may re-direct the subscriber to another CDN PoP to serve the requested media content. Then, a manifest file is generated by the content provider platform and transmitted to the user device of the subscriber. The manifest file includes content playback Uniform Resource Locators (URLs) corresponding to various available resolutions, a CDN PoP identifier, and the like. Further, the user device parses the manifest file to access the requested content from the CDN PoP.
It is noted that due to one or more limitations such as hardware limitations, software limitations, Internet bandwidth limitations, and the like, the user device may not be able to render the requested content for the subscriber in higher resolutions. The hardware limitations may include limited processing capability, limited physical display capability (e.g., the user device may only support 720p resolution), and the like. The software limitations may include a lack of certifications such as Widevine L1, L2, L3, and the like. To that end, the user device is unable to access the higher-end content from the CDN PoP even though the same was processed and downloaded by the CDN PoP. This leads to a wastage of processing resources, a wastage of time, and an increased load at the CDN’s end, which can create a bottleneck for other subscribers relying on the CDN for content, thereby requiring more CDNs and, in turn, causing a financial burden on the content provider platform. Further, streaming low-quality media due to these limitations also degrades the subscriber’s viewing experience, thereby leading to subscriber attrition.
In light of the above discussion, there exists a need for a technical solution for encoding and delivering content to a subscriber while addressing various limitations associated with the existing content delivery mechanism.
SUMMARY
Various embodiments of the present disclosure provide methods and systems for adaptively encoding media content into a plurality of encoded media content files.
In an embodiment, a computer-implemented method for adaptively encoding media content into a plurality of encoded media content files is disclosed. The computer-implemented method performed by a system includes receiving a media content request from an electronic device of a content viewer for a media content. The computer-implemented method includes accessing raw media content associated with the media content based, at least in part, on the media content request. Further, the computer-implemented method includes extracting frame-level data from the raw media content. The computer-implemented method further includes computing, by an adaptive encoding model associated with the system, a complexity score for the raw media content based, at least in part, on the frame-level data. Furthermore, the computer-implemented method includes mapping one or more media content encoding parameters and a bitrate ladder to the media content based, at least in part, on the complexity score. Thereafter, the computer-implemented method includes encoding the media content to obtain the plurality of encoded media content files based, at least in part, on the one or more media content encoding parameters and the bitrate ladder. Herein, each encoded media content file corresponds to a particular content resolution and a particular bitrate.
In another embodiment, a system is disclosed. The system includes a communication interface and a memory including executable instructions. The system also includes a processor communicably coupled to the communication interface and the memory. The processor is configured to execute the instructions to cause the system, at least in part, to receive a media content request from an electronic device of a content viewer for a media content. The system is further caused to access raw media content associated with the media content based, at least in part, on the media content request. Further, the system is caused to extract frame-level data from the raw media content. Furthermore, the system is caused to compute, by an adaptive encoding model associated with the system, a complexity score for the raw media content based, at least in part, on the frame-level data. Moreover, the system is caused to map one or more media content encoding parameters and a bitrate ladder to the media content based, at least in part, on the complexity score. Thereafter, the system is caused to encode the media content to obtain a plurality of encoded media content files based, at least in part, on the one or more media content encoding parameters and the bitrate ladder. Herein, each encoded media content file corresponds to a particular content resolution and a particular bitrate.
In yet another embodiment, a non-transitory computer-readable storage medium is disclosed. The non-transitory computer-readable storage medium includes computer-executable instructions that, when executed by at least a processor of a system, cause the system to perform a method. The method includes receiving a media content request from an electronic device of a content viewer for a media content. The method includes accessing raw media content associated with the media content based, at least in part, on the media content request. Further, the method includes extracting frame-level data from the raw media content. The method further includes computing, by an adaptive encoding model associated with the system, a complexity score for the raw media content based, at least in part, on the frame-level data. Furthermore, the method includes mapping one or more media content encoding parameters and a bitrate ladder to the media content based, at least in part, on the complexity score. Thereafter, the method includes encoding the media content to obtain the plurality of encoded media content files based, at least in part, on the one or more media content encoding parameters and the bitrate ladder. Herein, each encoded media content file corresponds to a particular content resolution and a particular bitrate.
In other words, the various embodiments of the present disclosure provide adaptive encoding of media content that is ingested on a CDN and delivery of the encoded media content to a subscriber in an efficient manner. The term ‘adaptive encoding’ refers to the encoding of various frames of media content via an Artificial Intelligence (AI) or a Machine Learning (ML) model based on one or more media content-related characteristics. The one or more media content-related characteristics include at least video motion level, image characteristics, genre classification, media complexity, producer classification, subscription classification, grain strength level (or pixel richness), darkness level, provider level, complexity score, and the like. Further, the adaptive encoding includes processing the media content to remove temporally and spatially redundant frames to reduce the computational burden on the media encoder. In particular, the media content is pre-processed at the frame level before being transmitted to the CDN servers to encode foreground frames, background frames, keyframes, redundant frames, etc., adaptively. As a result of the adaptive encoding, the processing resources required for performing the encoding process are reduced while also reducing file sizes, which makes streaming easier even on connections with limited or unstable bandwidth.
Further, the content provider platform also captures subscriber meta-data related to the subscriber during the registration or login process. The subscriber meta-data may include subscriber location, available bandwidth data, contact details, user device-related data, and the like. The user device-related data includes at least the screen resolution of the user device, processing constraints, software constraints, and the like. A subscriber profile is generated based on this subscriber meta-data by the AI or ML model. Further, the CDN PoP identified by the CDN to serve content to the subscriber relies on the subscriber profile to download the requested content from the CDN in specific resolutions only. For example, if the CDN PoP determines based on the subscriber profile that a user device supports 720p streaming and is unable to play any higher resolution, then the CDN PoP may only download encoded media content of 480p and 720p to serve the subscriber. This eliminates the need to download unnecessary content of higher resolutions (i.e., 1080p, 1440p, 2160p, etc.), thereby saving resources at the CDN PoP end as well.
BRIEF DESCRIPTION OF THE FIGURES
The advantages and features of the invention will become better understood with reference to the detailed description taken in conjunction with the accompanying drawings, wherein like elements are identified with like symbols, and in which:
FIG. 1 illustrates an exemplary environment for streaming media content to a plurality of subscribers related to at least some embodiments of the present disclosure;
FIG. 2 illustrates a block diagram of a system configured to adaptively encode media content (i.e., raw media content files) to generate encoded media content, in accordance with an embodiment of the present disclosure;
FIG. 3 illustrates a simplified block diagram representation for determining the complexity score from the raw media content, in accordance with an embodiment of the present disclosure;
FIG. 4 illustrates a simplified block diagram representation for determining the cumulative encoding ladder for the raw media content using an adaptive encoding model, in accordance with an embodiment of the present disclosure;
FIG. 5 illustrates a simplified block diagram representation for determining the provider level of input video content, in accordance with an embodiment of the present disclosure;
FIG. 6 illustrates a simplified representation of the three different darkness level categories, in accordance with an embodiment of the present disclosure; and
FIG. 7 depicts a flow diagram of a method for adaptively encoding media content into a plurality of encoded media content files, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
The best and other modes for carrying out the present invention are presented in terms of the embodiments, herein depicted in FIGS. 1 to 7. The embodiments are described herein for illustrative purposes and are subject to many variations. It is understood that various omissions and substitutions of equivalents are contemplated as circumstances may suggest or render expedient but are intended to cover the application or implementation without departing from the spirit or scope of the invention. Further, it is to be understood that the phraseology and terminology employed herein are for the purpose of the description and should not be regarded as limiting. Any heading utilized within this description is for convenience only and has no legal or limiting effect.
Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Appearances of the phrase “in an embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
The terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items.
The term ‘media content’ as used herein primarily refers to any multimedia content, such as streaming video content, audio content, and the like which is delivered to a user device such as a mobile phone, a personal computer, or a television set in response to user’s demand for content. Hereinafter, the term ‘media content’ is also interchangeably referred to as ‘content’ or ‘streaming content’ for purposes of the description.
The term ‘stream’ refers to the continuous transmission of media files from a server to a client, such as from a Content Delivery Network (CDN) to a subscriber. Before transfer, the media file is encoded in different resolutions, and the same media file may be played on the user device in a resolution suited to the prevailing network conditions.
The term ‘Content Delivery Network’ or ‘CDN’ as used herein primarily refers to a large distributed system of servers deployed in multiple data centers across the Internet. The goal of the CDN is to serve content to end-users with high availability and performance. In a non-limiting implementation disclosed by the present disclosure, CDNs serve live and on-demand streaming content to the subscriber.
Overview:
Various embodiments of the present disclosure provide methods, systems, electronic devices, and computer program products for adaptively encoding media content into a plurality of encoded media content files. In one embodiment, the present disclosure describes a system that is configured to receive a media content request from an electronic device of a content viewer for the media content. In an embodiment, the system is further configured to access raw media content associated with the media content based, at least in part, on the media content request. In a non-limiting example, the raw media content can be compressed based, at least in part, on a set of fixed encoding parameters. Further, the system may extract frame-level data from the raw media content. In a non-limiting implementation, the frame-level data may include at least one or more of a frame size, a frame type, a Structural Similarity Index (SSIM) value, an average brightness, a number of frames, a motion level, a number of elements in the media content, and the like.
In an embodiment, to extract the frame-level data, the system is further configured to randomly select a frame from the raw media content. Then, the system may compare the selected frame with at least a consecutive frame and a preceding frame of the selected frame. Then, the system may compute a motion level of the raw media content based, at least in part, on comparing the selected frame with the at least consecutive frame and the preceding frame. Herein, the frame-level data may include at least the motion level.
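For purposes of illustration only, the following is a minimal sketch of such a frame-difference motion computation, assuming OpenCV and NumPy are available; the mean absolute pixel difference used here is a non-limiting stand-in for the motion measure actually computed by the system:

    import random
    import cv2
    import numpy as np

    def motion_level(video_path: str) -> float:
        cap = cv2.VideoCapture(video_path)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        idx = random.randint(1, total - 2)  # leave room for both neighbours
        frames = []
        for i in (idx - 1, idx, idx + 1):
            cap.set(cv2.CAP_PROP_POS_FRAMES, i)
            ok, frame = cap.read()
            if not ok:
                raise RuntimeError("could not read frame %d" % i)
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        cap.release()
        # mean absolute pixel difference versus the preceding and consecutive frames
        prev_diff = float(np.mean(cv2.absdiff(frames[1], frames[0])))
        next_diff = float(np.mean(cv2.absdiff(frames[2], frames[1])))
        return (prev_diff + next_diff) / 2.0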
Thereafter, in an embodiment, the system is configured to compute a complexity score for the raw media content based, at least in part, on the frame-level data. In a non-limiting implementation, the system may compute the complexity score using an adaptive encoding model associated with the system. Further, the system may map one or more media content encoding parameters and a bitrate ladder to the media content based, at least in part, on the complexity score. Furthermore, the system may encode the media content to obtain the plurality of encoded media content files based, at least in part, on the one or more media content encoding parameters and the bitrate ladder. Herein, each encoded media content file may correspond to a particular content resolution and a particular bitrate.
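For purposes of illustration only, a simplified sketch of this flow is given below; the complexity cut-offs, the example ladders, and the libx264/ffmpeg invocation are assumptions for the sketch and not the disclosed adaptive encoding model itself:

    import subprocess

    # Assumed example ladders: lists of (height, video bitrate in kbps).
    LADDERS = {
        "low":    [(480, 800), (720, 1800)],
        "medium": [(480, 1200), (720, 2500), (1080, 4500)],
        "high":   [(480, 1600), (720, 3400), (1080, 6000), (2160, 16000)],
    }

    def pick_ladder(complexity_score: float):
        # Map the model's complexity score to a bitrate ladder (assumed cut-offs).
        if complexity_score < 0.33:
            return LADDERS["low"]
        return LADDERS["medium"] if complexity_score < 0.66 else LADDERS["high"]

    def encode_all(src: str, complexity_score: float) -> None:
        # One encoded media content file per (resolution, bitrate) rung.
        for height, kbps in pick_ladder(complexity_score):
            subprocess.run([
                "ffmpeg", "-y", "-i", src,
                "-vf", f"scale=-2:{height}",          # keep aspect ratio
                "-c:v", "libx264", "-b:v", f"{kbps}k",
                "-maxrate", f"{int(kbps * 1.5)}k", "-bufsize", f"{kbps * 2}k",
                f"out_{height}p.mp4",
            ], check=True)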
In some embodiments, the system is configured to determine a genre of the raw media content. The system may further determine one or more genre-specific parameters for the raw media content based, at least in part, on the determined genre. Then, the system may update the one or more media content encoding parameters and the bitrate ladder for the media content based, at least in part, on the one or more genre-specific parameters.
In some other embodiments, the system is configured to access a subscriber profile associated with the content viewer. Further, the system may determine a subscription category of the content viewer based, at least in part, on the subscriber profile. Then, the system may determine a content producer of the raw media content. Thereafter, the system may determine one or more producer and subscription-specific parameters based at least in part on the determined content producer and the determined subscription category. The system may further update the one or more media content encoding parameters and the bitrate ladder for the media content based, at least in part, on the one or more producer and subscription-specific parameters.
In a specific embodiment, the system is configured to determine a grain strength category of the raw media content. The system may determine one or more grain strength-specific parameters based, at least in part, on the determined grain strength category. Then, the system may update the one or more media content encoding parameters and the bitrate ladder for the media content based, at least in part, on the one or more grain strength-specific parameters.
In another embodiment, the system is configured to determine a darkness level category of the raw media content. Then, the system may determine one or more darkness level-specific parameters based at least in part on the darkness level category. The system may update the one or more media content encoding parameters and the bitrate ladder for the media content based, at least in part, on the one or more darkness level-specific parameters.
In a non-limiting implementation, the system is configured to identify at least one content repository server in a vicinity of the content viewer based, at least in part, on a subscriber profile associated with the content viewer. Later, the system may ingest the plurality of encoded media content files on the at least one content repository server. Thereafter, the system may generate and transmit a manifest file including a plurality of Uniform Resource Locators (URLs) corresponding to the plurality of encoded media content files to the content viewer.
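For purposes of illustration only, the following sketch builds such a manifest from the renditions ingested on the identified content repository server; the hostname and path layout are hypothetical:

    def build_manifest(pop_host: str, content_id: str, renditions) -> str:
        # renditions: list of (height, video bitrate in kbps) ingested on the PoP
        lines = ["#EXTM3U"]
        for height, kbps in renditions:
            width = height * 16 // 9                  # assume 16:9 content
            lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={kbps * 1000},"
                         f"RESOLUTION={width}x{height}")
            lines.append(f"https://{pop_host}/{content_id}/{height}p/index.m3u8")
        return "\n".join(lines) + "\n"

    # e.g. build_manifest("pop1.cdn.example.com", "movie42", [(480, 1200), (720, 2500)])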
In another non-limiting implementation, the system is configured to identify one or more specific content instances within the raw media content. Herein, the one or more specific content instances may include at least fast-moving content, slow-moving content, and depth of the raw media content. The system may further predict one or more high-attention areas and one or more low-attention areas within the one or more specific content instances. In a non-limiting example, the system may predict the one or more high-attention areas and the one or more low-attention areas using the adaptive encoding model. Further, in an embodiment, the system encodes the one or more high-attention areas based, at least in part, on one or more lossless encoding parameters. In another embodiment, the system encodes the one or more low-attention areas based, at least in part, on one or more lossy encoding parameters.
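For purposes of illustration only, one off-the-shelf approximation of this behavior is ffmpeg's addroi filter, which applies a quantization offset to a region of interest rather than strictly lossless/lossy encoding; the fixed center region below is a placeholder for the attention areas that the adaptive encoding model would predict:

    import subprocess

    # The centre quarter of the picture stands in for a predicted high-attention
    # area; a negative qoffset lowers quantization (higher fidelity) inside the
    # region, leaving the rest to the normal, lossier rate control.
    subprocess.run([
        "ffmpeg", "-y", "-i", "input.mp4",
        "-vf", "addroi=x=iw/4:y=ih/4:w=iw/2:h=ih/2:qoffset=-1/3",
        "-c:v", "libx264", "-crf", "23",
        "roi_encoded.mp4",
    ], check=True)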
Conventionally, the media content is offered by a content provider platform in all supported resolutions, as multi-resolution streaming offers the best of both worlds to the subscribers. Essentially, subscribers with slower internet and limited bandwidth should also be able to access all the media content and should not face high latency and buffering, which ruin the user experience. Hence, to avoid this problem, the content provider encodes media content at multiple resolutions and shares it with the CDN, and the media content is then further transferred to a CDN PoP or sub-boxes. Each PoP contains a number of caching servers responsible for content delivery to the subscribers within its proximity.
Media encoding is the process of compressing and configuring the format of raw media content to a digital file or format, which will in turn make the media content compatible with different devices and platforms. It is noted that every user device is different and supports different media formats while having different bandwidths to stream the media content. Hence, adaptive video encoding and content delivery optimization are introduced by the present disclosure during the content delivery process to ensure encoded media content delivery to the subscriber in an efficient manner for the available bandwidth while maintaining a minimum bitrate.
In particular, the present disclosure provides adaptive encoding of media content that is ingested on the CDN and delivers the encoded media content to the subscriber. The term ‘adaptive encoding’ refers to encoding the various frames of media content via an Artificial Intelligence (AI) or a Machine Learning (ML) model based on one or more media content-related characteristics. The one or more media content-related characteristics include at least video motion level, image characteristics, genre classification, media complexity, producer classification, subscription classification, grain strength level (or pixel richness), darkness level, provider level, complexity score, and the like. Further, adaptive encoding includes processing the media content to remove temporally and spatially redundant frames to reduce the computational burden on the media encoder. In particular, media content is pre-processed at the frame level before being transmitted to CDN servers to encode foreground frames, background frames, keyframes, redundant frames, etc., adaptively. As a result of the adaptive encoding, the processing resources required for performing the encoding process are reduced while also reducing file sizes, which makes streaming easier even on connections with limited or unstable bandwidth.
Further, the content provider platform also captures subscriber meta-data related to the subscriber during the registration or login process. The subscriber meta-data may include subscriber location, available bandwidth data, contact details, user device-related data, and the like. The user device-related data includes at least the screen resolution of the user device, processing constraints, software constraints, and the like. A subscriber profile is generated based on this subscriber meta-data by an AI or ML model. Further, the CDN PoP identified by the CDN to serve content to the subscriber relies on the subscriber profile to download the requested content from the CDN in specific resolutions only. For example, if the CDN PoP determines based on the subscriber profile that the user device supports 720p streaming and is unable to play any higher resolution, then the CDN PoP may only download and/or transmit encoded media content of 480p and 720p to serve the subscriber. This eliminates the need to download unnecessary content of higher resolutions (i.e., 1080p, 1440p, 2160p, etc.), thereby saving resources at the CDN PoP end as well.
FIG. 1 illustrates an exemplary environment 100 for streaming media content to a plurality of subscribers 102A, 102B, and 102C related to at least some embodiments of the present disclosure.
The term ‘content provider platform’ or ‘content provider’ as used interchangeably herein refers to an entity that holds digital rights associated with digital content present within digital video content libraries, and offers the video content on a subscription basis by using a digital platform and Over-The-Top (OTT) media services, i.e., the video content is streamed over the Internet to the user devices of the subscribers. A streaming content provider is hereinafter referred to as a ‘content provider’ for ease of description.
In an embodiment, the environment 100 depicts the plurality of subscribers 102A, 102B, and 102C (referred to hereinafter as ‘subscriber 102’) associated with a plurality of user devices 104A, 104B, and 104C (referred to hereinafter as ‘user device 104’). In at least some embodiments, the term ‘subscriber’ may also include one or more users in addition to the individual subscriber, for example, family members of the subscriber. To that effect, the term ‘subscriber’ as used herein may include one or more users. Further, the ‘subscriber’ is also interchangeably referred to hereinafter as a ‘content viewer’. The user device 104 is utilized by the subscriber 102 for viewing/accessing/requesting media content offered by the content provider. It is noted that the user device 104 is depicted as a smartphone for illustration purposes only and other suitable electronic devices may be used by the subscriber 102 as well. In some non-limiting examples, the user device 104 may include a smartphone, a tablet computer, a handheld computer, a wearable device, a portable media player, a gaming device, a Personal Digital Assistant (PDA), and the like.
Further, the environment 100 depicts a system 106 associated with a database 110 communicably connected to a plurality of content repository servers such as Content Delivery Networks (CDNs) 108(1), 108(2), …, 108(N), where ‘N’ is a natural number (hereinafter referred to interchangeably as ‘CDN 108’). The CDN 108 refers to a large distributed system of servers deployed in multiple data centers across the Internet. The CDN 108 is responsible for serving media content to end-users with high availability and performance. The term ‘media content’ refers to any content offered by the content provider that may be embodied as streaming content such as live streaming media content or on-demand video streaming content. The term ‘media content’ as used herein may include ‘video content’, ‘audio content’, ‘gaming content’, ‘textual content’, and any combination of such content offered in an interactive or non-interactive form. Accordingly, the term ‘content’ is also interchangeably referred to hereinafter as ‘media content’ for the purposes of the present disclosure. Individuals wishing to view/access the content may subscribe to at least one type of subscription offered by the content provider.
Initially, the system 106 may encode raw media content or raw media files present inside the database 110 into one or more digital formats that are compatible with the video players or video extensions supported by a mobile application or a Web application of the content provider platform either installed or operating on the user device (such as the user device 104A) of the subscriber (such as the subscriber 102A). This stage is known as the media pre-processing stage. Media encoding is the process of compressing and configuring the format of raw media content to a digital file or format, which will in turn make the media content compatible with different devices and platforms. It is noted that every user device is different and supports different media formats while having different bandwidths to stream the media content.
In an embodiment, the system 106 may adaptively encode the raw media content via an AI or ML model based, at least in part, on one or more media content-related characteristics. In a non-limiting example, the one or more media content-related characteristics include at least a complexity score determined using video motion level and image characteristics, a complexity level, genre classification, media complexity, producer classification, subscription classification, film grain strength level or pixel richness (or simply, grain strength level), darkness level, provider level, and the like. Further, adaptive encoding by the system 106 includes processing the media content to remove temporally and spatially redundant frames to reduce the computational burden. In particular, the media content is pre-processed at the frame level to encode foreground frames, background frames, keyframes, redundant frames, etc., adaptively. As a result of the adaptive encoding, the processing resources required for performing the encoding process are reduced while also reducing file sizes, which makes streaming easier even on connections with limited or unstable bandwidth. Hence, it is ensured that when the encoded media content is served to the subscriber 102, the same is achieved efficiently for the available bandwidth while maintaining a minimum bitrate. Further, the system 106 transmits the encoded media content to the CDN 108 using a network 114 (explained later). When the subscriber 102A requests desired media content, it is delivered using the encoded media content present inside the CDN 108. This process is explained in further detail with reference to FIG. 1.
In an illustrative example, to subscribe to the streaming content services offered by the content provider, subscribers such as the subscriber 102 may register with the content provider by creating an online account on the content provider’s portal. As a part of the account creation process, the subscriber 102 may provide personal information, such as age, gender, language preference, content preference, and any other personal preferences to the content provider. Such information may be stored in a subscriber profile along with other account information such as type of subscription, validity date of the subscription, etc., in the database 110 associated with the content provider. Further, user device-related data is also captured in the subscriber profile during the registration process and stored in the database 110. The user device-related data may include at least the screen resolution of the user device 104A, processing constraints, software constraints, and the like. Further, subscriber meta-data related to the subscriber 102 may be captured regularly by the content provider platform upon each new login. In a non-limiting example, the subscriber meta-data may include at least subscriber location, available bandwidth data, contact details, user device-related data, and the like. In a non-limiting example, the subscriber profile is generated based on the subscriber meta-data by an AI or ML model for each login session or over the duration of the subscription.
Once the subscriber 102 has created the account, the subscriber 102 may access a User Interface (UI) of the mobile application or the Web application associated with the content provider to view/access content. It is understood that the user device 104 may be in operative communication with a communication network, such as the Internet, enabled by a network provider, also known as the network 114 (such as an Internet Service Provider (ISP) network). The user device 104 may connect to the network 114 using a wired network, a wireless network, or a combination of wired and wireless networks. Some non-limiting examples of wired networks may include Ethernet, a Local Area Network (LAN), a fiber-optic network, and the like. Some non-limiting examples of wireless networks may include a Wireless LAN (WLAN), cellular networks, Bluetooth or ZigBee networks, and the like.
The user device (such as user device 104A) may fetch the Web interface associated with the content provider over the network 114 and cause a display of the Web interface on a display screen of the user device 104A. In an illustrative example, the Web interface may include a plurality of content titles corresponding to the content offered by the content provider to its subscriber (such as subscriber 102A).
In an illustrative example, the subscriber 102A may select a content title from among the plurality of content titles displayed on the display screen of the user device 104A. The selection of the content title may trigger a content request from the subscriber 102A. The content request may include a request for a manifest file along with other information such as network Autonomous System Number (ASN), an Internet Protocol (IP) address, and location information among other metadata associated with the user device 104A. The content request is sent from the user device 104A via the network 114 to the system 106 associated with the content provider.
In at least one embodiment, the system 106 is configured to authenticate the subscriber 102A and determine if the subscriber 102A is entitled to view the requested media content. To this effect, the system 106 may be in operative communication with one or more remote servers, such as an authentication server and an entitlement server (not shown in FIG. 1 for the sake of brevity). The authentication server may facilitate authentication of account credentials associated with the subscriber 102A using standard authentication mechanisms, which are well known and thus are not explained herein. The entitlement server may facilitate the determination of the subscriber’s subscription type (i.e., which subscription tier the subscriber 102A has subscribed to) and status (i.e., whether the subscription is still active or has expired) from the account credentials, which in turn may enable determination of whether the subscriber 102A is entitled to view/access the requested content or not. Upon successful authentication and subscription determination, the system 106 identifies the CDN (such as the CDN 108(1)) that is in proximity to the location of the subscriber 102A and stores the requested media content. The CDN 108(1) is the most optimal among the CDNs 108 for serving the subscriber 102A with the requested media content. It is noted that the system 106 determines the location of the subscriber 102A by extracting the location data, the network ASN, and the IP address from the content request from the subscriber 102A.
The system 106 is configured to take into account the location of the subscriber 102A, a content ID, performance metrics associated with the CDN 108, and one or more routing policies for determining the most optimal CDN for serving the requested content to the subscriber 102A. The system 106 is further configured to generate a token after the determination of the optimal CDN. The token may be embodied as a Hash-Based Message Authentication Code (HMAC) token.
In various scenarios, determining the optimal CDN may further include determining at least one content repository server such as, but not limited to, the CDN PoP 112 or sub-box (referred to interchangeably hereinafter) in the vicinity of the subscriber 102A if the same is available in the vicinity of the subscriber 102A. It is noted that CDN PoPs such as the CDN PoP 112 are data centers responsible for communicating with the users in their geographic vicinity. Each CDN PoP 112 may further include numerous caching servers. The CDN PoP 112 caches/downloads the encoded media content from the CDN 108(1) before delivering the same to the subscriber 102A. It is noted that the CDN PoP 112 downloads the encoded media content from the CDN 108(1) in multiple supported resolutions such as 480p, 720p, 1080p, 1440p, 2160p, and the like. This is done so that encoded media content can be served to the subscriber 102A irrespective of the bandwidth available to the subscriber 102A.
In an embodiment, the CDN PoP 112 determines the resolutions supported by the user device 104A based at least in part on the subscriber profile. In one example, if the CDN PoP 112 determines based on the subscriber profile that the user device only supports 720p streaming (determined based on the user device-related data within the subscriber profile) and is unable to play any higher resolution, then the CDN PoP 112 may only download the encoded media content of 480p and 720p to serve the subscriber 102A. This eliminates the need to download unnecessary content of higher resolutions (i.e., 1080p, 1440p, 2160p, etc.), thereby saving resources at the CDN PoP 112 end as well. In another example, if the CDN PoP 112 determines based on the subscriber profile that a user device only supports 1080p streaming (determined based on the bandwidth available to the subscriber 102A based on their location data) and is unable to play any higher resolution, then the CDN PoP 112 may only download the encoded media content of 480p, 720p, and 1080p to serve the subscriber 102A. This eliminates the need to download unnecessary content of higher resolutions (i.e., 1440p, 2160p, etc.), thereby saving resources at the CDN PoP’s end as well.
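For purposes of illustration only, the PoP-side filtering described above may be sketched as follows; the 'max_supported_resolution' profile field is a hypothetical name for the user device-related data in the subscriber profile:

    FULL_LADDER = [480, 720, 1080, 1440, 2160]   # resolutions available on the CDN

    def renditions_to_fetch(subscriber_profile: dict) -> list:
        # 'max_supported_resolution' is an assumed profile field for illustration.
        max_res = subscriber_profile.get("max_supported_resolution", 2160)
        return [r for r in FULL_LADDER if r <= max_res]

    # A device capped at 720p yields only the 480p and 720p renditions:
    assert renditions_to_fetch({"max_supported_resolution": 720}) == [480, 720]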
The system 106 provides the manifest file including one or more playback URLs for supported resolutions along with the HMAC token to the user device 104A. It is noted that the one or more playback URLs correspond to the supported resolutions by the CDN PoP 112 such as 480p, 720p, 1080p, 1440p, 2160p, and the like for the user device 104A previously determined based on the subscriber profile.
The transmission of the manifest is done via the network 114. The user device 104A may be configured to generate a Hypertext Transfer Protocol (HTTP) request using the one or more playback URLs and provide the token along with the HTTP request over the network 114 to the CDN PoP 112 as per the available bandwidth. For example, if the available bandwidth is excellent for the user device 104A supporting 720p streaming, the playback URL corresponding to the 720p stream may be used by the user device 104A to stream the encoded media content segments from the CDN PoP 112. However, if during the stream the bandwidth quality degrades, then future encoded media content segments may be fetched using the playback URL corresponding to the 480p stream. The transmission of the HTTP request and the token from the user device 104A to the CDN PoP 112 is done via the network 114. The delivery of the encoded media content from the CDN PoP 112 to the user device 104A is done via the network 114 as well.
FIG. 2 illustrates a block diagram of a system 200 configured to adaptively encode media content (i.e., files of the raw media content 234) to generate encoded media content, in accordance with an embodiment of the present disclosure. It is noted that the system 200 is identical to the system 106 of FIG. 1. In some embodiments, the system 200 is embodied as a cloud-based and/or SaaS-based (Software as a Service) architecture. Further, it is noted that although the present explanation has been provided with regard to the raw media content 234 for the content provider, the various embodiments of the present disclosure are applicable to live streaming media content as well and, therefore, should not be taken to limit the scope of the present disclosure.
The system 200 is depicted to include a processing module 202, a memory module 204, an Input/Output (I/O) module 206, and a communication module 208. It is noted that although the system 200 is depicted to include the processing module 202, the memory module 204, the I/O module 206, and the communication module 208, in some embodiments, the system 200 may include more or fewer components than those depicted herein. The various components of the system 200 may be implemented using hardware, software, firmware, or any combination thereof. Further, it is also noted that one or more components of the system 200 may be implemented in a single server or a plurality of servers, which are remotely placed from each other.
Further, the system 200 is communicably coupled with a database 232. It is noted that the database 232 is identical to the database 110 of FIG. 1. The database 232 may be incorporated in the system 200, may be an individual entity connected to the system 200, or may be a database stored in cloud storage. The database 232 is configured to store the raw media content 234 and an adaptive encoding model 236. In various non-limiting examples, the database 232 may further include various instructions or firmware data essential for the operation of the system 200. The outputs from the various modules and sub-modules such as a complexity classification sub-module 220, a producer and subscription classification sub-module 224, a film grain strength classification sub-module 226, and a darkness classification sub-module 228 are used as inputs to the adaptive encoding model 236 (the modules will be explained in subsequent paragraphs). The outputs from the adaptive encoding model 236 include an optimal bitrate control, an optimal average bitrate and a maximum bitrate, a denoising strength, a sharpening strength, and parameters to fine-tune the encoding (such as in the case of a cartoon, a dark scene, a grainy scene, a high-motion scene, etc.). The adaptive encoding model 236 also provides information that can be used to enhance the encoding of the current scene. It is to be noted that under the same constant rate factor (CRF) value, darker content usually generates worse quality. Herein, the trained model smartly decides the CRF offset based on the content darkness to achieve consistent output quality.
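For purposes of illustration only, the darkness-based CRF offset noted above may be sketched as follows; the brightness thresholds and offset values are assumptions rather than the values learned by the trained model:

    def crf_with_darkness_offset(base_crf: int, avg_brightness: float) -> int:
        # avg_brightness: mean luma in [0, 255] after black-edge removal
        if avg_brightness < 40:      # very dark scene: spend more bits
            return base_crf - 4
        if avg_brightness < 80:      # moderately dark scene
            return base_crf - 2
        return base_crf              # normal/bright scene: no offset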
In one embodiment, the database 232 is configured to store ASN/IP pool, CDN registration data, CDN-content map, routing policies, real-time metrics of CDNs, hostnames for CDNs, subscriber profiles, subscriber meta-data, user device-related data, and the like. The database 232 may include multiple storage units such as hard drives and/or solid-state drives in a Redundant Array of Inexpensive Disks (RAID) configuration. In some embodiments, the database 232 may include a Storage Area Network (SAN) and/or a Network-Attached Storage (NAS) system. In one embodiment, the database 232 may correspond to a distributed storage system, wherein the individual database 232 is configured to store custom information, such as routing policies, ASN/IP pool, CDN registration data, etc.
It is to be noted that the system 200 as illustrated and hereinafter described is merely illustrative of an apparatus that could benefit from embodiments of the present disclosure and, therefore, should not be taken to limit the scope of the present disclosure. It is to be noted that the system 200 may include fewer or more components than those depicted in FIG. 2.
In one embodiment, the processing module 202 may be embodied as a multi-core processor, a single-core processor, or a combination of one or more multi-core processors and one or more single-core processors. For example, the processing module 202 may be embodied as one or more of various processing devices, such as a coprocessor, a microprocessor, a controller, a Digital Signal Processor (DSP), a processing circuitry with or without an accompanying DSP, or various other processing devices including integrated circuits such as, for example, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Microcontroller Unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like. In one embodiment, the memory module 204 is capable of storing machine-executable instructions, referred to herein as platform instructions 210. Further, the processing module 202 is capable of executing the platform instructions 210. In an embodiment, the processing module 202 may be configured to execute hard-coded functionality. In an embodiment, the processing module 202 is embodied as an executor of software instructions, wherein the instructions may specifically configure the processing module 202 to perform the algorithms and/or operations described herein when the instructions are executed.
The memory module 204 stores instructions/code configured to be used by the processing module 202, or more specifically by the various modules of the processing module 202. The memory module 204 may be embodied as one or more non-volatile memory devices, one or more volatile memory devices, and/or a combination of one or more volatile memory devices and non-volatile memory devices. For example, the memory module 204 may be embodied as semiconductor memories, such as flash memory, mask ROM, PROM (programmable ROM), EPROM (erasable PROM), RAM (random access memory), etc., and the like.
In an embodiment, the I/O module 206 may include mechanisms configured to receive inputs from and provide outputs to an operator of the system 200. The term ‘operator of the system 200’ as used herein may refer to one or more individuals, whether directly or indirectly associated with managing the digital OTT platform on behalf of the content provider. To enable the reception of inputs and provide outputs to the system 200, the I/O module 206 may include at least one input interface and/or at least one output interface. Examples of the input interface may include but are not limited to, a keyboard, a mouse, a joystick, a keypad, a touch screen, soft keys, a microphone, and the like. Examples of the output interface may include but are not limited to, a display such as a light-emitting diode display, a Thin-Film Transistor (TFT) display, a liquid crystal display, an Active-Matrix Organic Light-Emitting Diode (AMOLED) display, a microphone, a speaker, a ringer, and the like.
In an example embodiment, at least one module of the system 200 may include I/O circuitry (not shown in FIG. 2) configured to control at least some functions of one or more elements of the I/O module 206, such as, for example, a speaker, a microphone, a display, and/or the like. The module of the system 200 and/or the I/O circuitry may be configured to control one or more functions of the elements of the I/O module 206 through computer program instructions, for example, software and/or firmware, stored on a memory, for example, the memory module 204, and/or the like, accessible to the processing module 202 of the system 200.
The communication module 208 is configured to facilitate communication between the system 200 and one or more remote entities over a communication network. For example, the communication module 208 is capable of facilitating communication with the user device 104 of the subscriber 102, with ISPs, with edge servers associated with CDNs, with content ingestion servers, and the like.
The processing module 202 is depicted to include a media encoding module 214, a transmission module 216, and a manifest module 218. The media encoding module 214 is depicted to further include the complexity classification sub-module 220, the genre classification sub-module 222, the producer and subscription classification sub-module 224, the film grain strength classification sub-module 226, the darkness classification sub-module 228, and a ladder prediction sub-module 230.
The various components of the system 200, such as the processing module 202, the memory module 204, the I/O module 206, and the communication module 208 are configured to communicate with each other via or through a centralized circuit system 212. The centralized circuit system 212 may be various devices configured to, among other things, provide or enable communication between the components of the system 200. In certain embodiments, the centralized circuit system 212 may be a central Printed Circuit Board (PCB) such as a motherboard, a main board, a system board, or a logic board. The centralized circuit system 212 may also, or alternatively, include other Printed Circuit Assemblies (PCAs) or communication channel media.
In at least one embodiment, the media encoding module 214 includes suitable logic and/or is configured to adaptively encode the raw media content 234 based on one or more media content-related characteristics. The adaptive encoding process is collectively carried out by the various sub-modules of the media encoding module 214. It is noted that while adaptively encoding the raw media content 234, either one, a combination, or all of the sub-modules of the media encoding module 214 can be utilized by the system 200, as per the requirements of the content provider platform. In a non-limiting example, the decision of using one or more of the sub-modules of the media encoding module 214 may be based on the subscriber profile of a subscriber (such as the subscriber 102). In another non-limiting example, the decision of using one or more of the sub-modules of the media encoding module 214 may be based on the analysis of a plurality of subscriber profiles associated with the plurality of subscribers 102. This analysis may be performed for a specific region; for example, a first set of sub-modules may be used for the India region, while a second set of sub-modules may be used for the China region. Other analysis approaches may also be used while determining which sub-modules to use during the adaptive encoding process and the same would be covered under the scope of the present disclosure.
In an embodiment, the complexity classification sub-module 220 is configured to determine a complexity score associated with the raw media content 234. At first, the complexity classification sub-module 220 accesses the raw media content 234 from the database 232 and compresses the raw media content 234 based on a set of fixed encoding parameters. In a non-limiting example, the set of fixed encoding parameters includes at least a codec, profile, level, resolution, maximum bitrate, buffer size, quantization, Group of Pictures (GOP) structure, and the like. Further, the raw media content 234 is analyzed to extract frame-level data. In a non-limiting example, the frame-level data includes at least one or more of a frame size, a frame type (such as I-frames, P-frames, or B-frames), a Structural Similarity Index (SSIM) value, an average brightness of the raw media content 234 (i.e., of the area after black edge removal), a number of frames in the raw media content 234, a motion level (i.e., a switching rate of the raw media content 234), a number of elements in the video, and the like. In a non-limiting example, the term ‘motion level’ refers to a rate of change of a number of characters/objects between consecutive frames in the media content. For example, if two humans are depicted in a first frame and only one human is depicted in the consecutive frame with the same background, the change in the number of depicted objects/characters or in the background is termed the motion level. Further, the frame-level data is converted to a sequential array. Then, the complexity score is computed via the AI or ML model such as the adaptive encoding model 236 based at least in part on the sequential array. It is noted that the complexity score indicates the preliminary complexity of the raw media content 234.
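For purposes of illustration only, the following sketch extracts per-frame sizes and frame types with ffprobe (assumed to be available on the system) and packs them into a sequential array of the kind fed to the adaptive encoding model 236:

    import json
    import subprocess

    def frame_level_array(video_path: str):
        # Per-frame packet size and picture type from the first video stream.
        out = subprocess.run(
            ["ffprobe", "-v", "quiet", "-select_streams", "v:0",
             "-show_frames", "-show_entries", "frame=pkt_size,pict_type",
             "-of", "json", video_path],
            capture_output=True, text=True, check=True,
        ).stdout
        frames = json.loads(out)["frames"]
        type_code = {"I": 0, "P": 1, "B": 2}
        # One (frame size, frame type) pair per frame, in display order.
        return [(int(f["pkt_size"]), type_code.get(f.get("pict_type"), 3))
                for f in frames]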
In an embodiment, the adaptive encoding model 236 is also configured to determine the rate at which the characters/objects in the media are moving. In a non-limiting example, this analysis is performed by randomly selecting one frame from the media content and then comparing the selected frame with a consecutive frame or a preceding frame. This enables the adaptive encoding model 236 to determine the motion level in the media content.
In an alternative non-limiting example, the analysis is performed by selecting different frames based on predefined guidelines. For example, the adaptive encoding model 236 may select frames from the rolling titles or credit scenes in the media content, frames from scene boundaries (e.g., a change of characters or a change of background), or frames that are present just before action sequences within the media content. In one implementation, meta-data markings within the media content may be used by the adaptive encoding model 236 to determine which frames to select for further analysis.
It is noted that apart from the motion level data, the adaptive encoding model 236 may also predict attention areas within the raw media content 234 that the subscriber 102A may focus on, i.e., one or more high-attention areas. In an embodiment, the adaptive encoding model 236 may predict the attention of the subscriber 102A when the scenes of the media content are not moving fast and the depth of the scene varies greatly (between foreground and background). In other words, the adaptive encoding model 236 is configured to predict the attention of the subscriber 102A during one or more specific content instances such as (1) slow-moving/paced content and (2) when the depth of the media content varies over a predefined threshold. Similarly, one or more low-attention areas may also be predicted. Then, the adaptive encoding model 236 may determine to encode the one or more low-attention areas in a lossy manner. However, areas that may gain higher attention from the subscriber may be encoded in a lossless manner. Therefore, it becomes possible to encode content in a manner where the subscriber 102A will not perceive any difference. Furthermore, redundant frames can also be selected based on spatial and temporal redundancies from the media content.
The complexity classification sub-module 220 is further configured to determine the complexity level of the raw media content 234 based on the complexity score. Herein, the complexity level/score is the value output by an estimation model 308 (which will be explained with reference to FIG. 3). The complexity level/score reflects the basic intrinsic video complexity such as the video breadth and depth. It is to be noted that the complexity level/score describes whether media content of high bitrate, medium bitrate, or low bitrate should be used for delivering a satisfactory media content quality to the subscriber 102A, thereby saving processing and network resources. Additionally, the complexity level can be combined with other factors to predict an optimal encoding ladder (bitrate) and parameters that can output satisfactory media content quality under a predefined cost control. This aspect is described later in the present disclosure. The pre-defined encoding ladders are created based on the previous encoding data. Further, the media content can be classified/categorized based on the complexity score/level. It is noted that for each category different encoding ladders are used and the optimal encoding ladder is determined based on the output quality and the stream size. In other words, the encoding ladder which generates a satisfactory media quality at the lowest bitrate is the optimal one.
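For purposes of illustration only, the selection of the optimal encoding ladder may be sketched as follows; quality_of() is a hypothetical placeholder for a quality measurement such as SSIM over a trial encode, and the threshold is an assumed value:

    def total_bitrate(ladder) -> int:
        # ladder: list of (height, video bitrate in kbps)
        return sum(kbps for _height, kbps in ladder)

    def optimal_ladder(candidate_ladders, quality_of, threshold=0.95):
        # Keep only ladders whose measured quality is satisfactory, then pick
        # the one that achieves it at the lowest total bitrate.
        acceptable = [(ladder, total_bitrate(ladder))
                      for ladder in candidate_ladders
                      if quality_of(ladder) >= threshold]
        return min(acceptable, key=lambda pair: pair[1])[0]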
The complexity classification sub-module 220 is further configured to transmit the complexity score to the genre classification sub-module 222 and the ladder prediction sub-module 230.
In an embodiment, the ladder prediction sub-module 230 is configured to map the complexity level to the one or more media content encoding parameters and the bitrate ladder for the media content.
In an embodiment, the genre classification sub-module 222 is configured to determine the genre of the raw media content 234 based at least on genre labels associated with the raw media content 234. In various non-limiting examples, the genre includes comedy, horror, fiction, fantasy, crime and thriller, family drama, and the like. The genre classification sub-module 222 classifies the raw media content 234 into three complexity categories, i.e., high, medium, and low. Further, the genre classification sub-module 222 determines one or more genre-specific parameters for the raw media content 234 based on the genre and the complexity category for the raw media content 234. In a non-limiting example, for high-complexity media content, if the genre is a reality show, then reality show-specific parameters are determined by the genre classification sub-module 222. This aspect of the present disclosure allows for encoding the raw media content 234 while ensuring that the key details required for a particular genre are preserved.
For example, if the genre is horror, it is generally expected that most of the scenes are shot against a dark background. Therefore, the subscriber 102A will require higher detail in dark areas of the media content; however, the same is not the case for a romantic comedy. By utilizing the one or more genre-specific parameters, this nuance between different media content is captured and the correct details are preserved in the final encoded media content, thus providing a better user experience while reducing the file size. The genre classification sub-module 222 is further configured to transmit the complexity score to the ladder prediction sub-module 230. It is noted that further categories may also be introduced to improve the granularity of the analysis, depending on the available computational resources.
In an embodiment, the ladder prediction sub-module 230 is configured to update the one or more media content encoding parameters and the bitrate ladder for the media content based on the one or more genre-specific parameters. It is noted that in conventional encoding, some genre parameters such as tune, Adaptive Quantization (AQ)-mode, AQ-strength, etc., are not stable across different media content and therefore do not perform well when encoding different media content. In such a scenario, general encoding parameters are used and, to avoid quality issues, bitrates are usually over-allocated so that no quality issues arise across different genres of media content. In the present adaptive encoding, the outputs from the various modules, such as the complexity classification sub-module 220, the genre classification sub-module 222, the producer and subscription classification sub-module 224, the film grain strength classification sub-module 226, the darkness classification sub-module 228, and the ladder prediction sub-module 230, are used as inputs to the adaptive encoding model 236. This process helps the adaptive encoding model 236 gain better knowledge/insight into the content/scene complexity and use more genre-specific parameters to further improve compression efficiency. To determine the weightage of the different characteristics that drive the encoding parameters and the bitrate, a large number of different media content items are used as training datasets for the adaptive encoding model 236. In other words, different media content having different characteristic values is encoded with different parameters and then used as training data for the adaptive encoding model 236. Further, the optimal parameters for the media content are confirmed by evaluating the output quality and bitrates. The training video characteristics are used as input and the optimal parameters are produced as output by using Machine Learning (ML) techniques. Through this process, the ML model, i.e., the adaptive encoding model 236, is developed with proper weights to predict optimal parameters for given media content characteristics. As may be understood, training an ML model is an iterative process that begins by initializing the ML model with certain operating parameters such as weights, biases, gradients, layers, and so on. The training process then performs predictions on the training dataset(s), compares the predictions with ground-truth values from the training dataset, and uses the obtained loss values to re-train or optimize the ML model using back-propagation techniques. This iterative training process is repeated until the loss values reach a threshold value, which may correspond to the loss saturating at a constant amount or reaching a predefined value.
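As a non-limiting illustration of this iterative training process, a minimal sketch is provided below; the feature width, the network shape, the synthetic data, and the loss-saturation threshold are assumptions for the sketch only and do not describe the actual adaptive encoding model 236:
..
import torch
import torch.nn as nn

# Placeholder training pairs: content characteristics in, optimal
# encoding parameters (e.g., CRF, max bitrate, AQ-strength) out.
features = torch.randn(1024, 6)
optimal_params = torch.randn(1024, 3)

model = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

prev_loss = float("inf")
for epoch in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(features), optimal_params)
    loss.backward()   # back-propagation of the loss
    optimizer.step()
    if abs(prev_loss - loss.item()) < 1e-6:  # loss values have saturated
        break
    prev_loss = loss.item()
..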
It is noted that each media content item with different resolutions is encoded at multiple different bitrates (e.g., 2000 kbps, 2200 kbps, 2400 kbps, 2600 kbps, among others). It is understood that the higher the bitrate, the better the quality of the media content delivered to the subscriber 102A. However, for a specific content item, the quality does not increase linearly with the bitrate; the media quality usually saturates at a specific bitrate. In other words, the media quality does not improve once the bitrate exceeds its saturated bitrate. As may be understood, the saturated bitrate usually varies between different media content. To that end, the various embodiments of the present disclosure enable the ladder prediction sub-module 230 to determine and select different bitrates or ladders for different media content requested by different subscribers.
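In a non-limiting illustration, the saturated bitrate may be located by scanning measured (bitrate, quality) pairs for the point at which the quality gain flattens; the sample points and the 0.2 quality margin below are illustrative assumptions:
..
def saturated_bitrate(points, margin=0.2):
    """Return the smallest bitrate beyond which quality stops improving."""
    points = sorted(points)  # (kbps, quality) pairs, ascending by bitrate
    for (bitrate, quality), (_, next_quality) in zip(points, points[1:]):
        if next_quality - quality < margin:  # the quality curve has flattened
            return bitrate
    return points[-1][0]

measured = [(2000, 92.1), (2200, 93.0), (2400, 93.6), (2600, 93.7)]
print(saturated_bitrate(measured))  # -> 2400
..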
In an embodiment, the producer and subscription classification sub-module 224 is configured to determine a content producer and a subscription category of the raw media content 234. Then, the producer and subscription classification sub-module 224 determines the one or more producer and subscription-specific parameters based at least in part on the content producer and the subscription category. In a non-limiting example, the subscription category includes premium subscribed content, general subscribed content, and general free content. These subscription categories are set up by the content provider platform. It is understood that the raw media content 234 from a high-end content producer will have more detail than that from a low-end content producer. Therefore, one or more lossy encoding parameters can be used while encoding the raw media content 234 from a high-end content producer, whereas only one or more lossless encoding parameters can be used to encode the raw media content 234 from a low-end content producer. To that end, even if one or more lossy encoding parameters are used for the raw media content 234 from the high-end content producer, the visual impact of the encoding can be kept at a minimal level. However, if the same one or more lossy encoding parameters were used for the raw media content 234 from the low-end content producer, the visual impact would be significant and the media content quality would be significantly degraded. Therefore, by employing the one or more producer and subscription-specific parameters, optimal encoding can be achieved. The producer and subscription classification sub-module 224 is further configured to transmit the complexity score to the ladder prediction sub-module 230.
In an embodiment, the ladder prediction sub-module 230 is configured to update the one or more media content encoding parameters and the bitrate ladder for the media content based on the one or more producer and subscription-specific parameters. For example, a free subscription makes the content available to all users, presented with second media content such as advertisements (“ads”) between the media content to cover the delivery cost for the content provider. In such a scenario, due to the limited ad revenue per user, free users may not be presented with content at a high bitrate but are instead presented with content at a lower bitrate. In another example, a premium subscription allows subscribers to access/stream content at a high bitrate (“premium content”) along with additional second media content such as animations or infographics related to the requested media content, and the delivery cost is covered by the subscription fee.
In an embodiment, the film grain strength classification sub-module 226 is configured to analyze the grain strength or pixel richness of the raw media content 234. Then, the grain strength is classified into five categories, i.e., not visible, only visible in dark scenes, visible in dark scenes, visible across all videos, and very visible across all videos. In a non-limiting example, the adaptive encoding model 236 is used to classify the grain strength in the required categories. Further, the film grain strength classification sub-module 226 is configured to determine the one or more grain strength-specific parameters based at least in part on the grain strength category of the raw media content 234.
It is understood that the raw media content 234 with a grain strength category of ‘very visible across all videos’ will have very high pixel richness when compared with the category of ‘not visible’. Therefore, one or more lossy encoding parameters can be used while encoding the raw media content 234 of the ‘very visible across all videos’ category whereas only one or more lossless encoding parameters can be used to encode the raw media content 234 from the ‘not visible’ category. To that end, even if one or more lossy encoding parameters are used for the raw media content 234 from the ‘very visible across all videos’ category, the visual impact of the encoding can be kept at a minimal level. However, if the same one or more lossy encoding parameters are used for the raw media content 234 from the ‘not visible’ category, the visual impact will be significant and the media content quality will be significantly degraded. Therefore, by employing and determining the one or more grain strength-specific parameters, optimal encoding can be achieved. The film grain strength classification sub-module 226 is further configured to transmit the complexity score to the ladder prediction sub-module 230.
In an embodiment, the ladder prediction sub-module 230 is configured to update the one or more media content encoding parameters and the bitrate ladder for the media content based on the one or more grain strength-specific parameters.
In an embodiment, the darkness classification sub-module 228 is configured to analyze the darkness level within the raw media content 234. Then, the darkness level is classified into three categories, i.e., very dark, dark, and normal. In a non-limiting example, the adaptive encoding model 236 is used to classify the darkness level in the required categories. Further, the darkness classification sub-module 228 is configured to determine the one or more darkness level-specific parameters based at least in part on the darkness level category of the raw media content 234. In an embodiment, for evaluating the darkness threshold, thousands of clips with different darkness levels are extracted and compressed with the same CRF value. Further, the compressed videos are aggregated by quality level. It is to be noted that the quality can be categorized into three levels (explained later in reference to FIG. 6). Finally, the average content darkness of each category is computed and the darkness threshold is estimated.
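In a non-limiting illustration, a minimal darkness classifier is sketched below; the frame-sampling stride and the luma thresholds of 40 and 90 (on a 0-255 scale) are assumptions for the sketch and are not the thresholds estimated by the procedure described above:
..
import cv2
import numpy as np

def darkness_category(video_path: str, sample_every: int = 30) -> str:
    """Bucket a video into 'very dark', 'dark', or 'normal' by mean luma."""
    cap = cv2.VideoCapture(video_path)
    lumas, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % sample_every == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            lumas.append(float(np.mean(gray)))
        index += 1
    cap.release()
    average_luma = float(np.mean(lumas))
    if average_luma < 40:
        return "very dark"
    if average_luma < 90:
        return "dark"
    return "normal"
..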
It is understood that the raw media content 234 with a darkness level category of ‘very dark’ will have very high pixel richness when compared with the ‘normal’ category. Therefore, one or more lossy encoding parameters can be used while encoding the raw media content 234 from the ‘very dark’ category, whereas only one or more lossless encoding parameters can be used to encode the raw media content 234 from the ‘normal’ category. To that end, even if one or more lossy encoding parameters are used for the raw media content 234 from the ‘very dark’ category, the visual impact of the encoding can be kept at a minimal level. However, if the same one or more lossy encoding parameters were used for the raw media content 234 from the ‘normal’ category, the visual impact would be significant and the media content quality would be significantly degraded. Therefore, by employing and determining the one or more darkness level-specific parameters, optimal encoding can be achieved. The darkness classification sub-module 228 is further configured to transmit the complexity score to the ladder prediction sub-module 230.
In an embodiment, the ladder prediction sub-module 230 is configured to update the one or more media content encoding parameters and the bitrate ladder for the media content based on the one or more darkness level-specific parameters. Further, the ladder prediction sub-module 230 is configured to compute the final values of the one or more media content encoding parameters and the bitrate ladder. Then, the ladder prediction sub-module 230 determines a cumulative encoding ladder based, at least in part, on the final media content encoding parameter value and the final bitrate ladder value.
In other words, the cumulative encoding ladder is determined, based at least in part, on the complexity score, one or more producer and subscription-specific parameters, one or more genre-specific parameters, one or more grain strength-specific parameters, and one or more darkness level-specific parameters. In a non-limiting example, the ladder prediction sub-module 230 is configured to determine the cumulative encoding ladder via the adaptive encoding model 236. Further, the ladder prediction sub-module 230 is configured to encode the raw media content 234 based at least in part on the cumulative encoding ladder (i.e., the final media content encoding parameter value and the final bitrate ladder value).
Upon encoding the raw media content 234, a plurality of encoded media content files is generated such that each encoded media content file corresponds to a content resolution (such as 480p, 720p, 1080p, 1440p, 2160p, and the like) and a bitrate supported by the content provider platform. To that end, by relying on the cumulative encoding ladder to perform the encoding process, the overall size of the encoded media content files is reduced without any noticeable degradation in the quality of the encoded media content. Further, due to the reduced file sizes, the subscriber 102 will be able to stream the desired media content at higher resolutions than previously possible with the same available bandwidth. Furthermore, due to the reduced file sizes, the costs associated with storing media content on the CDN 108 will be reduced for the content provider platform.
In an embodiment, the transmission module 216 is configured to transmit the plurality of encoded media content files to the CDN 108. In a non-limiting scenario, the plurality of encoded media content files may be ingested by the CDN PoP 112 associated with the CDN 108. This is done to ensure that the subscriber 102 receives the encoded media content from their nearest CDN PoP 112, thereby ensuring low latency and minimal buffering.
As explained with reference to FIG. 1, when the subscriber 102A requests media content from the content provider platform, a content request is transmitted to the system 200. Upon receiving the content request for a manifest file, the manifest module 218 is configured to generate a manifest file for the user device 104A of the subscriber 102A. The manifest file includes content playback URLs corresponding to various available resolutions, a CDN PoP identifier, and the like. Further, the user device 104A can parse the manifest file to access the requested content from the CDN PoP 112.
It is noted that various embodiments of the present disclosure can also be applied to encode and stream a live event, apart from static content (i.e., the raw media content 234). In particular, a delay window may be introduced in the live stream, during which the system 200 may determine the complexity score and then use the cumulative encoding ladder and codec parameters for the live-event stream to encode the final stream and deliver it to the subscribers 102 under limited latency. For example, in live streaming, instead of directly transcoding the media content to output, a delay of N seconds (where N is a non-zero natural number) is applied during processing and the content is cached in the processor memory. The received chunk is fetched from the memory every N seconds and used as the raw media content 234, which is the input to the adaptive encoding process. Hence, for live streaming, the same adaptive encoding process is used to encode the N-second chunks with optimal bitrate and parameters.
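In a non-limiting illustration, the N-second delay window may be realized with a simple chunk buffer; the queue, the value of N, and the encode_chunk() stand-in for the adaptive encoding process are assumptions for this sketch:
..
import queue
import threading
import time

N = 6  # delay window in seconds (illustrative value)
chunk_buffer: "queue.Queue[bytes]" = queue.Queue()

def encode_chunk(chunk: bytes) -> None:
    # Stand-in for the adaptive encoding process applied to each chunk.
    pass

def encode_worker() -> None:
    while True:
        time.sleep(N)                 # honour the delay window
        chunk = chunk_buffer.get()    # fetch the oldest cached chunk
        encode_chunk(chunk)           # encode it as raw media content

threading.Thread(target=encode_worker, daemon=True).start()
..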
FIG. 3 illustrates a simplified block diagram representation 300 for determining the complexity score from the raw media content 234, in accordance with an embodiment of the present disclosure.
The term “Constant Rate Factor (CRF)” refers herein to the default quality (and rate control) parameter of the H.264 and H.265 encoders. Values can be set between 0 and 51, where 0 is lossless (for 8-bit encoding only), lower values yield better quality, 23 is the default, and 51 is the worst quality possible. Further, the term “Group Of Pictures (GOP)” refers herein to the distance between two keyframes, measured as a number of frames or as the amount of time between keyframes. For example, if a keyframe is inserted every 1 second into a 30 frames-per-second video, the GOP length is 30 frames, or 1 second.
Furthermore, the term “structural similarity (SSIM) index” refers herein to a perceptual metric that quantifies image quality degradation caused by processing such as data compression or by losses in data transmission. The SSIM index is used for measuring the similarity between two images. The resultant SSIM index is a decimal value between -1 and 1, where 1 indicates perfect similarity, 0 indicates no similarity, and -1 indicates perfect anti-correlation.
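In a non-limiting illustration, the SSIM index between a reference frame and its encoded counterpart may be computed with scikit-image; the two file names below are placeholders:
..
import cv2
from skimage.metrics import structural_similarity

ref = cv2.imread("reference_frame.png", cv2.IMREAD_GRAYSCALE)
enc = cv2.imread("encoded_frame.png", cv2.IMREAD_GRAYSCALE)

score = structural_similarity(ref, enc)  # 1.0 would mean a perfect match
print(f"SSIM index: {score:.4f}")
..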
Video frames are compressed using different algorithms with different advantages and disadvantages, centered mainly on the amount of data compression needed. These different algorithms for video frames are called picture types or frame types. The three major picture types used in the different video algorithms are I, P, and B. An I-frame (Intra-coded picture) is a complete image, like a JPG or BMP image file. A P-frame (Predicted picture) holds only the changes in the image from the previous frame. A B-frame (Bidirectional predicted picture) saves even more space by using differences between the current frame and both the preceding and following frames to specify its content.
At first, the CRF output SSIM 302, the CRF output average bitrate 304, and the CRF output MSU spatial score 306 (where the MSU spatial score is the video's average I-frame size) are fed to the estimation model 308 as its input. CRF is a “constant quality” encoding mode, as opposed to constant bitrate (CBR). It is typically used to achieve constant quality by compressing every frame of the same type by the same amount, which means throwing away the same (relative) amount of information; CRF encoding therefore outputs videos at a relatively consistent quality. As noted above, CRF values lie between 0 and 51 (0 being the best, 51 being the worst), while SSIM is a video quality score that lies between 0 and 1 (0 being the worst, 1 being the best). The input video is encoded with a fixed CRF value to obtain an output video, i.e., v_output. Then, the SSIM score of v_output, i.e., the CRF output SSIM 302, the average bitrate of v_output, i.e., the CRF output average bitrate 304, and the average I-frame size of v_output as the MSU spatial complexity score, i.e., the CRF output MSU spatial score 306, are calculated. Finally, these three values are fed into the estimation model 308 to obtain a complexity score, which is used by the ladder prediction sub-module 310. The complexity score is an empirical value, inferred from the CRF output MSU spatial score 306 and the CRF output average bitrate 304, that indicates the comprehensive video complexity. However, it is to be noted that this complexity score is not very accurate on its own, so the content complexity is later classified into several categories and additional properties are used to obtain a fine-tuned encoding ladder. Typically, content complexity has two dimensions: spatial complexity and temporal complexity. Temporal complexity usually reflects the motion of the video and the scene changes within the video, while spatial complexity usually reflects the texture density and how colorful the image is. In an embodiment, the CRF output MSU spatial score is used as the strength of the spatial complexity. It is to be noted that experiments show the CRF output average bitrate has a high correlation with the temporal complexity and is therefore used as the strength of the temporal complexity.
Further, the estimation model 308 feeds its output to a ladder prediction sub-module 310 as input, and the ladder prediction sub-module 310 predicts/suggests the final ladder, which stores the video content at the suggested resolutions. It is noted that the ladder prediction sub-module 310 is identical to the ladder prediction sub-module 230 of FIG. 2.
In a non-limiting example, a pseudo-code for determining the complexity score is provided below:
..
-The chunks of an input video (say, original video V) are encoded to a fixed resolution using a fixed CRF and a fixed GOP;
-The output after encoding the video V is considered as A;
-Calculate an SSIM value of output A against the original video V; the resulting SSIM value is considered as B;
-The average bitrate of output A is calculated, and the output is considered as C;
-The average I-frame size of output A is calculated, and the output is considered as D;
-Compute the complexity score ‘E’ using a trained function, where the function is a model trained on historic data pairs;
-Complexity score: E = func(B, C, D).
..
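A runnable, non-limiting rendering of the above pseudo-code using the ffmpeg and ffprobe command-line tools is sketched below. The fixed CRF and GOP values, the placeholder file names, and the final linear combination standing in for the trained function func() are assumptions for this sketch; the fixed-resolution scaling step is omitted so that the SSIM of A against V can be computed directly, and some containers may not report a stream-level bit_rate:
..
import json
import subprocess

def run(cmd):
    return subprocess.run(cmd, capture_output=True, text=True, check=True)

def complexity_score(src="V.mp4", out="A.mp4", crf=23, gop=48):
    # Encode original video V with a fixed CRF and GOP to obtain output A.
    run(["ffmpeg", "-y", "-i", src,
         "-c:v", "libx264", "-crf", str(crf), "-g", str(gop), out])
    # B: SSIM of A against V, parsed from the "All:<value>" summary line.
    ssim_log = run(["ffmpeg", "-i", out, "-i", src,
                    "-lavfi", "ssim", "-f", "null", "-"]).stderr
    b = float(ssim_log.rsplit("All:", 1)[1].split()[0])
    # C: average bitrate of A; D: average I-frame size of A.
    probe = json.loads(run(["ffprobe", "-v", "error",
                            "-select_streams", "v:0",
                            "-show_entries",
                            "stream=bit_rate:frame=pict_type,pkt_size",
                            "-of", "json", out]).stdout)
    c = float(probe["streams"][0]["bit_rate"])
    i_frame_sizes = [int(f["pkt_size"]) for f in probe["frames"]
                     if f.get("pict_type") == "I"]
    d = sum(i_frame_sizes) / len(i_frame_sizes)
    # E = func(B, C, D): placeholder weights standing in for the trained
    # estimation model 308.
    return 0.5 * (1 - b) + 0.3 * (c / 1e6) + 0.2 * (d / 1e5)
..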
FIG. 4 illustrates a simplified block diagram representation 400 for determining the cumulative encoding ladder for raw media content 234 using the adaptive encoding model 236, in accordance with an embodiment of the present disclosure.
The representation 400 depicts components such as a complexity classification sub-module 406, an adaptive encoding model 408, a ladder prediction sub-module 410, and an adjustment unit 412. It is noted that the complexity classification sub-module 406, the adaptive encoding model 408, and the ladder prediction sub-module 410 are identical to the complexity classification sub-module 220, the adaptive encoding model 236, and the ladder prediction sub-module 230 of FIG. 2.
In an embodiment, the complexity classification sub-module 406 is configured to compute the complexity score based on motion level data 402 and image style data 404. Then, the adaptive encoding model 408 is configured to access producer information 414, darkness level information 416, grain strength information 418, and genre information 420 from the various sub-modules of the media encoding module 214 as described in FIG. 2. The adaptive encoding model 408 analyzes this information to aid the ladder prediction sub-module 410 in determining a cumulative encoding ladder. This cumulative encoding ladder is transmitted to the adjustment unit 412.
In a non-limiting example, the adjustment unit 412 may include one or more testing teams responsible for manually watching the encoded media content, i.e., content encoded based on the cumulative encoding ladder. The one or more testing teams may then identify one or more issues with the encoded media content based at least on a quality assurance policy of the content provider platform. These issues may then be used as a basis for providing feedback (see, 422) to improve the existing adaptive encoding model 408. In an embodiment, the adjustment unit 412 is configured to evaluate the quality of the encoded media content. It is noted that some media content might have lower quality than expected, in which case the feedback (see, 422) is used to correct the related modules in the adaptive encoding model 408 so that similar media can be encoded at a higher bitrate/quality in the future. Conversely, some media content may have much higher quality than expected, which indicates that bitrates are being over-allocated; a correction is then made in the adaptive encoding model 408 so that similar videos can be encoded at a lower bitrate/quality in the future.
In an alternative non-limiting example, the adjustment unit 412 may include one or more quality assurance machine learning models responsible for analyzing the final encoded media content. The one or more quality assurance machine learning models detect one or more issues with the encoded media content based at least on a quality assurance threshold established by the content provider platform and may generate one or more feedback messages (see, 422) for tuning the adaptive encoding model 408. This feedback loop may be operated iteratively until the final encoded media content passes the quality assurance threshold, upon which the encoded media content may be transmitted to the CDN 108. It is noted that the complexity score is saved after the final iteration and can be used directly for similar media content (which can be detected based on content labels associated with the raw media content 234) to save time and computational resources.
In a non-limiting example, a pseudo-code for determining the final media content encoding parameter and bitrate ladder is provided below:
..
-Compute the complexity score ‘E’ of an input media content;
-Compute the complexity level ‘F’;
-Map the complexity level ‘F’ to the media encoding parameters ‘G’;
-Map the complexity level ‘F’ to the bitrate ladder ‘H’;
-Update ‘G’ and ‘H’ based on the darkness level;
-Update ‘G’ and ‘H’ based on the film grain strength;
-Update ‘G’ and ‘H’ based on the genre;
-Update ‘G’ and ‘H’ based on the producer;
-Use the final ‘G’ and ‘H’ in the cumulative encoding ladder for the final encoding.
..
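A non-limiting, dictionary-based sketch of this update pipeline is provided below; the base ladders, the multipliers for each classification signal, and the parameter names are illustrative assumptions, not values from the present disclosure:
..
BASE_LADDERS = {
    "high":   {"1080p": 3000, "720p": 2000, "480p": 700},
    "medium": {"1080p": 2400, "720p": 1500, "480p": 600},
    "low":    {"1080p": 2000, "720p": 1200, "480p": 500},
}

def cumulative_ladder(level, darkness, grain, genre, producer):
    g = {"crf": 23, "aq_mode": 1}   # G: media encoding parameters
    h = dict(BASE_LADDERS[level])   # H: ladder for complexity level F
    scale = 1.0
    scale *= {"very dark": 1.15, "dark": 1.05, "normal": 1.0}[darkness]
    scale *= {"very visible across all videos": 1.10,
              "not visible": 0.95}.get(grain, 1.0)
    scale *= {"horror": 1.05}.get(genre, 1.0)  # e.g., preserve dark detail
    scale *= {"L1": 1.10, "L2": 1.0, "L3": 0.9}[producer]
    h = {res: round(kbps * scale) for res, kbps in h.items()}
    return g, h
..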
In an instance, a conventional bitrate ladder, including the encoding bitrate for each media resolution, is given as:
{
1080p: 3000 kbps,
720p: 2000 kbps,
480p: 700 kbps
}
In an instance, a bitrate ladder for simple media content that can save cost and provide fair media quality to the subscriber 102 with adaptive encoding is given as:
{
1080p: 2000 kbps,
720p: 1200 kbps,
480p: 500 kbps
}
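As a worked comparison of the two ladders above, the total ladder bitrate drops from 5700 kbps to 3700 kbps, i.e., roughly a 35% saving for simple media content at fair quality:
..
conventional = {"1080p": 3000, "720p": 2000, "480p": 700}
adaptive     = {"1080p": 2000, "720p": 1200, "480p": 500}

saving = 1 - sum(adaptive.values()) / sum(conventional.values())
print(f"ladder bitrate saving: {saving:.0%}")  # -> 35%
..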
FIG. 5 illustrates a simplified block diagram representation 500 for determining the provider level of input video content, in accordance with an embodiment of the present disclosure. It is noted that different providers' content is available to different types of subscribers. Further, for each provider, the maximum acceptable delivery cost is estimated based on the maximum bitrate, and the providers are categorized into three provider levels, namely the L1 provider, the L2 provider, and the L3 provider. Thereafter, for each provider level, the providers are mapped to the different complexity levels, and the average delivery cost is estimated. Finally, the mapping of the provider to the complexity level is fine-tuned based on the predefined actual average delivery cost.
For instance, the L1 provider might be mapped to the complexity levels of complex, normal, and simple (see, 502) and, based on the cost tier of the L1 provider, the content coming from this provider may be directly mapped to a bitrate or ladder of the ultra-complex, complex, and complex levels, respectively (see, 504).
In another instance, the L2 provider might be mapped to the complexity levels of complex, normal, and simple (see, 506) and, based on the cost tier of the L2 provider, the content coming from this provider may be directly mapped to a bitrate or ladder of the ultra-complex, complex, and default levels, respectively (see, 508).
In yet another instance, the L3 provider might be mapped to the complexity levels of complex, normal, and simple (see, 510) and, based on the cost tier of the L3 provider, the content coming from this provider may be directly mapped to a bitrate or ladder of the ultra-complex, complex, and simple levels, respectively (see, 512).
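The three instances above may be summarized, in a non-limiting sketch, as a simple lookup table mapping each provider level and complexity level to a ladder tier:
..
PROVIDER_LADDER_MAP = {
    "L1": {"complex": "ultra-complex", "normal": "complex", "simple": "complex"},
    "L2": {"complex": "ultra-complex", "normal": "complex", "simple": "default"},
    "L3": {"complex": "ultra-complex", "normal": "complex", "simple": "simple"},
}

def ladder_tier(provider_level: str, complexity: str) -> str:
    """Return the ladder tier for a provider's content of a given complexity."""
    return PROVIDER_LADDER_MAP[provider_level][complexity]
..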
FIG. 6 illustrates a simplified representation 600 of the three different darkness level categories, in accordance with an embodiment of the present disclosure. It is noted that these representations are only provided for the sake of understanding and do not limit the scope of the present disclosure. The different darkness level categories are determined by the darkness classification sub-module 228 by analyzing the individual frames of the raw media content 234. In one particular non-limiting implementation, the darkness level is classified into three categories, i.e., ‘very dark’, ‘dark’, and ‘normal’. It is noted that further categories may also be introduced to improve the granularity of the analysis, depending on the available computational resources. The representation depicts a first frame (see, 602) in which the majority of pixels are very dark; hence, the first frame 602 is categorized as ‘very dark’ by the darkness classification sub-module 228. Further, a second frame (see, 604) depicts that the majority of pixels in the second frame 604 are dark; hence, the second frame 604 is categorized as ‘dark’ by the darkness classification sub-module 228. Similarly, a third frame (see, 606) depicts an approximately equal proportion of very dark and dark pixels; hence, the third frame 606 is categorized at the ‘normal’ darkness level by the darkness classification sub-module 228.
In order to ascertain the various improvements in performance achieved by the various embodiments of the present disclosure, a few experiments were conducted, and the results are provided in Table 1. The experiments were performed to determine the difference in performance between the conventional encoding approach and the proposed encoding approach. It is noted that the various results provided in Table 1 are approximate in nature and may be associated with a ±5 to 10% error due to their experimental nature.
| Encoding Approach | Avg. VMAF (Indian movies and sports) | Avg. VMAF (Western content) |
|---|---|---|
| Conventional encoding approach | 99.36 | 97.21 |
| Proposed encoding approach | 98.94 | 96.43 |

Table 1. Experimental results comparing the conventional and proposed encoding approaches.
As depicted in Table 1, the first data column shows the average VMAF for Indian movie/sports content, where the proposed complexity-based encoding approach maintains a comparable VMAF while yielding average savings of approximately 10 to 40 percent. Further, the second data column shows the average VMAF for Western content, where the proposed complexity-based encoding approach likewise yields average savings of approximately 10 to 40 percent.
FIG. 7 depicts a flow diagram of a method 700 for adaptively encoding media content into a plurality of encoded media content files, in accordance with an embodiment of the present disclosure. The various steps and/or operations of the flow diagram, and combinations of steps/operations in the flow diagram, may be implemented by, for example, hardware, firmware, a processor, circuitry and/or by a system such as the system 200 explained with reference to FIG. 2 to FIG. 6 and/or by a different device associated with the execution of software that includes one or more computer program instructions. The method 700 starts at operation 702.
At the operation 702, the method 700 includes receiving, by a system, a media content request from an electronic device of a content viewer for a media content.
At the operation 704, the method 700 includes accessing, by the system, raw media content associated with the media content based, at least in part, on the media content request.
At the operation 706, the method 700 includes extracting, by the system, frame-level data from the raw media content.
At the operation 708, the method 700 includes computing, by an adaptive encoding model associated with the system, a complexity score for the raw media content based, at least in part, on the frame-level data.
At the operation 710, the method 700 includes mapping, by the system, one or more media content encoding parameters and a bitrate ladder to the media content based, at least in part, on the complexity score.
At the operation 712, the method 700 includes encoding, by the system, the media content to obtain a plurality of encoded media content files based, at least in part, on the one or more media content encoding parameters and the bitrate ladder. Herein, each encoded media content file corresponds to a particular content resolution and a particular bitrate.
It is noted that, in various embodiments of the present disclosure, the various functions of the system 200, or the method disclosed in FIG. 7, can be implemented using any one or more components of at least one content repository server such as the CDN 108 or the CDN PoP 112. In some instances, a CDN such as an origin CDN server and/or one or more cache servers, individually and/or in combination with each other, may be used to implement the system 200. Alternatively, the system 200 can be communicably coupled with the CDN 108 to perform the various embodiments or methods described in the present disclosure.
The foregoing descriptions of specific embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the present invention to the precise forms disclosed, and obviously, many modifications and variations are possible in light of the above teaching. The exemplary embodiment was chosen and described in order to best explain the principles of the present invention and its practical application, to thereby enable others skilled in the art to best utilize the present invention and various embodiments with various modifications as are suited to the particular use contemplated.
CLAIMS:
1. A computer-implemented method comprising:
receiving, by a system, a media content request from an electronic device of a content viewer for a media content;
accessing, by the system, raw media content associated with the media content based, at least in part, on the media content request;
extracting, by the system, frame-level data from the raw media content;
computing, by an adaptive encoding model associated with the system, a complexity score for the raw media content based, at least in part, on the frame-level data;
mapping, by the system, one or more media content encoding parameters and a bitrate ladder to the media content based, at least in part, on the complexity score; and
encoding, by the system, the media content to obtain a plurality of encoded media content files based, at least in part, on the one or more media content encoding parameters and the bitrate ladder, wherein each encoded media content file corresponds to a particular content resolution and a particular bitrate.
2. The computer-implemented method as claimed in claim 1, further comprising:
determining, by the system, a genre of the raw media content;
determining, by the system, one or more genre-specific parameters for the raw media content based, at least in part, on the determined genre; and
updating, by the system, the one or more media content encoding parameters and the bitrate ladder for the media content based, at least in part, on the one or more genre-specific parameters.
3. The computer-implemented method as claimed in claim 1, further comprising:
accessing, by the system, a subscriber profile associated with the content viewer;
determining, by the system, a subscription category of the content viewer based, at least in part, on the subscriber profile;
determining, by the system, a content producer of the raw media content;
determining, by the system, one or more producer and subscription-specific parameters based at least in part on the determined content producer and the determined subscription category; and
updating, by the system, the one or more media content encoding parameters and the bitrate ladder for the media content based, at least in part, on the one or more producer and subscription-specific parameters.
4. The computer-implemented method as claimed in claim 1, further comprising:
determining, by the system, a grain strength category of the raw media content;
determining, by the system, one or more grain strength-specific parameters based, at least in part, on the determined grain strength category; and
updating, by the system, the one or more media content encoding parameters and the bitrate ladder for the media content based, at least in part, on the one or more grain strength-specific parameters.
5. The computer-implemented method as claimed in claim 1, further comprising:
determining, by the system, a darkness level category of the raw media content;
determining, by the system, one or more darkness level-specific parameters based at least in part on the darkness level category; and
updating, by the system, the one or more media content encoding parameters and the bitrate ladder for the media content based, at least in part, on the one or more darkness level-specific parameters.
6. The computer-implemented method as claimed in claim 1, further comprising:
identifying, by the system, at least one content repository server in a vicinity of the content viewer based, at least in part, on a subscriber profile associated with the content viewer;
ingesting, by the system, the plurality of encoded media content files on the at least one content repository server; and
generating and transmitting, by the system, a manifest file comprising a plurality of Uniform Resource Locators (URLs) corresponding to the plurality of encoded media content files to the content viewer.
7. The computer-implemented method as claimed in claim 1, further comprising:
compressing, by the system, the raw media content based, at least in part, on a set of fixed encoding parameters.
8. The computer-implemented method as claimed in claim 1, wherein the frame-level data comprises at least one or more of a frame size, a frame type, Structural Similarity Index (SSIM) value, average brightness, a number of frames, a motion level, and a number of elements in media content.
9. The computer-implemented method as claimed in claim 1, wherein extracting the frame-level data comprises:
randomly selecting, by the system, a frame from the raw media content;
comparing, by the system, the selected frame with at least a consecutive frame and a preceding frame to the selected frame; and
computing, by the system, a motion level of the raw media content based, at least in part, on comparing the selected frame with the at least consecutive frame and the preceding frame, wherein the frame-level data comprises at least the motion level.
10. The computer-implemented method as claimed in claim 1, further comprising:
identifying, by the system, one or more specific content instances within the raw media content, wherein the one or more specific content instances comprises at least fast-moving content, slow-moving content, and depth of the raw media content;
predicting, by the adaptive encoding model, one or more high-attention areas and one or more low-attention areas within the one or more specific content instances;
encoding, by the system, the one or more high attention areas based, at least in part, on one or more lossless encoding parameters; and
encoding, by the system, the one or more low attention areas based, at least in part, on one or more lossy encoding parameters.
11. A system comprising:
a memory for storing instructions; and
a processor configured to execute the instructions and thereby cause the system, at least in part, to:
receive a media content request from an electronic device of a content viewer for a media content;
access raw media content associated with the media content based, at least in part, on the media content request;
extract frame-level data from the raw media content;
compute, by an adaptive encoding model associated with the system, a complexity score for the raw media content based, at least in part, on the frame-level data;
map one or more media content encoding parameters and a bitrate ladder to the media content based, at least in part, on the complexity score; and
encode the media content to obtain a plurality of encoded media content files based, at least in part, on the one or more media content encoding parameters and the bitrate ladder, wherein each encoded media content file corresponds to a particular content resolution and a particular bitrate.
12. The system as claimed in claim 11, wherein the system is further caused to:
determine a genre of the raw media content;
determine one or more genre-specific parameters for the raw media content based, at least in part, on the determined genre; and
update the one or more media content encoding parameters and the bitrate ladder for the media content based, at least in part, on the one or more genre-specific parameters.
13. The system as claimed in claim 11, wherein the system is further caused to:
access a subscriber profile associated with the content viewer;
determine a subscription category of the content viewer based, at least in part, on the subscriber profile;
determine a content producer of the raw media content;
determine one or more producer and subscription-specific parameters based at least in part on the determined content producer and the determined subscription category; and
update the one or more media content encoding parameters and the bitrate ladder for the media content based, at least in part, on the one or more producer and subscription-specific parameters.
14. The system as claimed in claim 11, wherein the system is further caused to:
determine a grain strength category of the raw media content;
determine one or more grain strength-specific parameters based, at least in part, on the determined grain strength category; and
update the one or more media content encoding parameters and the bitrate ladder for the media content based, at least in part, on the one or more grain strength-specific parameters.
15. The system as claimed in claim 11, wherein the system is further caused to:
determine a darkness level category of the raw media content;
determine one or more darkness level-specific parameters based at least in part on the darkness level category; and
update the one or more media content encoding parameters and the bitrate ladder for the media content based, at least in part, on the one or more darkness level-specific parameters.
16. The system as claimed in claim 11, wherein the system is further caused to:
identify at least one content repository server in a vicinity of the content viewer based, at least in part, on a subscriber profile associated with the content viewer;
ingest the plurality of encoded media content files on the at least one content repository server; and
generate and transmit a manifest file comprising a plurality of Uniform Resource Locators (URLs) corresponding to the plurality of encoded media content files to the content viewer.
17. The system as claimed in claim 11, wherein the system is further caused to:
compress the raw media content based, at least in part, on a set of fixed encoding parameters.
18. The system as claimed in claim 11, wherein to extract the frame-level data, the system is caused to:
randomly select a frame from the raw media content;
compare the selected frame with at least a consecutive frame and a preceding frame to the selected frame; and
compute a motion level of the raw media content based, at least in part, on comparing the selected frame with the at least consecutive frame and the preceding frame, wherein the frame-level data comprises at least the motion level.
19. The system as claimed in claim 11, wherein the system is further caused to:
identify one or more specific content instances within the raw media content, wherein the one or more specific content instances comprises at least fast-moving content, slow-moving content, and depth of the raw media content;
predict, by the adaptive encoding model, one or more high-attention areas and one or more low-attention areas within the one or more specific content instances;
encode the one or more high attention areas based, at least in part, on one or more lossless encoding parameters; and
encode the one or more low attention areas based, at least in part, on one or more lossy encoding parameters.
20. A non-transitory computer-readable storage medium comprising computer-executable instructions that, when executed by at least a processor of a system, cause the system to perform a method comprising:
receiving a media content request from an electronic device of a content viewer for a media content;
accessing raw media content associated with the media content based, at least in part, on the media content request;
extracting frame-level data from the raw media content;
computing, by an adaptive encoding model associated with the system, a complexity score for the raw media content based, at least in part, on the frame-level data;
mapping one or more media content encoding parameters and a bitrate ladder to the media content based, at least in part, on the complexity score; and
encoding the media content to obtain a plurality of encoded media content files based, at least in part, on the one or more media content encoding parameters and the bitrate ladder, wherein each encoded media content file corresponds to a particular content resolution and a particular bitrate.