FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See section 10, rule 13)
1. Title of the invention:
MULTIMEDIA PRESENTATION CONTENT SYNTHESIS
2. Applicant(s)
NAME: TATA CONSULTANCY SERVICES LIMITED
NATIONALITY: Indian
ADDRESS: Nirmal Building, 9th Floor, Nariman Point, Mumbai-400021, Maharashtra, India
3. Preamble to the description
COMPLETE SPECIFICATION
The following specification particularly describes the invention and the manner in which it
is to be performed.
TECHNICAL FIELD
[0001] The present subject matter, in general, relates to synthesis of multimedia
presentation content and, in particular, relates to synthesizing presentation content with a recording of a presenter of the presentation content.
BACKGROUND
[0002] Generally, in a lecture or a seminar, a presenter uses a variety of contents, such as
slideshows and/or demonstrations, to present information to one or more viewers. The content is often recorded and can later be made available to the viewers for reference purposes. The content is generally recorded to create multimedia files that can be provided to the viewers either through e-mail or can be downloaded from a website. In this way, for example, viewers who might have missed a live lecture or demonstration can view the lecture or the demonstration at a later time according to their convenience. A viewer thus need not be physically present at a lecture or a seminar to access the contents presented during the lecture or the seminar.
[0003] Such a facility of recording the content and making it available later is commonly used these days for purposes such as e-learning or distance learning. In cases of e-learning or distance learning, a lecture presented by a teacher is made available to the students in a recorded format. One can access the contents available in recorded form in a file format, using some playback media or directly over a network.
[0004] For a perfect viewing experience, it is desirable to have a view of both the contents
presented by a presenter and an audio/visual recording of the presenter. Viewing only the contents and not the recording of the presenter may become a monotonous experience for the viewers. A presenter's articulation of voice, facial expressions, and body language are some of the key considerations that make a lecture or a presentation engaging for the viewers.
SUMMARY
[0005] This summary is provided to introduce concepts related to synthesis of multimedia
presentation content, which is further described below in the detailed description. This summary
is not intended to identify essential features of the claimed subject matter nor is it intended for use in determining or limiting the scope of the claimed subject matter.
[0006] In one implementation, a system and method for synthesizing multimedia
presentation content comprising presentation content data and audio-visual content data is described. The method includes analyzing frames of at least one of the presentation content data and the audio-visual content data to identify an inactive frame. The time duration of the inactive frame is computed. Further, the inactive frame is segmented to create a multimedia presentation content including at least one frame of the presentation content data and at least one frame of the audio-visual content data present within the time duration. The segmentation, in one embodiment, is based on the presentation content data, the audio-visual content data and the time duration.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] The detailed description is described with reference to the accompanying figures. In
the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to reference like features and components.
[0008] Fig. 1 illustrates an exemplary network environment implementing a content
synthesis system for synthesis of multimedia presentation content, according to one embodiment of the present subject matter.
[0009] Fig. 2 illustrates an exemplary content synthesis system, in accordance with an
implementation of the present subject matter.
[00010] Fig. 3 illustrates an exemplary method to synthesize multimedia presentation content, in accordance with an implementation of the present subject matter.
[00011] Fig. 4 illustrates an exemplary method of furnishing multimedia presentation content, in accordance with an implementation of the present subject matter.
DETAILED DESCRIPTION
[00012] Providing multimedia contents, such as video clips, online presentations, etc., over a network is a common practice nowadays. Providing the multimedia contents in this way may help, for example, a viewer to view events, such as a lecture, a seminar, or a conference, at a later stage and without being physically present at the event. A video recording of the event may be provided to the viewer as a multimedia attachment, which can be viewed on a display device, such as a computer, laptop, mobile, etc., at a later time. In one embodiment, the video recording can also be made available to the viewer through a downloadable file available on the internet.
[00013] The multimedia attachment or downloadable file pertaining to an event may contain presentation contents including graphical contents, such as sketches, drawings, and slides, and visual contents, such as video, written demonstrations on a board, illustrations through charts, and so on, along with an audio/video recording of a presenter presenting the event.
[00014] Of late, a few systems have been developed that provide techniques for generating multimedia content by combining both the presentation content presented in an event and the audio/video recording of the presenter presenting the event. The presenter's recording may be presented along with the presentation content and may occupy half a screen or may be presented as a thumbnail. The presenter's recording and the presentation contents may be clubbed together in a way such that they can be presented side by side, with both the presentation content and the presenter's recording sharing about half the viewing area on a viewing screen, thereby allowing the viewer to view both the presentation contents and the presenter's recording simultaneously.
[00015] However, in this method, the presenter's recording may often distract a viewer who is trying to concentrate on the presentation contents, which are presented for a limited time only. Further, such a method of presenting multimedia content is not compatible with devices such as mobile phones. Owing to the small size of the screen in such devices, the presentation content along with the presenter's recording may not be appropriately accommodated on the screen.
[00016] In another technique, a single video file including both the presentation contents and the presenter's recording can be generated. In such a file, a video recording of the presentation contents is interleaved with the video recording of the presenter. However, creation of such a file requires a lot of manual editing effort, such as determining the time instances where the presentation content and the presenter's recording need to be combined. Moreover, once the file is created, the viewer does not have the flexibility to change the view of the video file being shown; for example, the viewer may not be able to switch between the video recordings of the presentation contents and the presenter's recording.
[00017] To this end, system(s) and method(s) for creation of multimedia presentation content by synthesizing presentation contents of an event and a recording of the presenter presenting the event have been described. The presentation contents may include, for example, a slide show, a video recording of a demonstration, illustrations on a board or a paper, drawings, and sketches. The presenter's recording may include an audio/video recording of the presenter's voice, facial expressions, etc. In one embodiment, a content synthesis system may combine the presentation contents and the presenter's recording to create a synthesized content, which is capable of providing a view of the presentation contents and a view of the presenter's recording at definite instances, allowing switching between the presentation contents and the presenter's recording, without any intervention from a viewer viewing the synthesized content.
[00018] The synthesized content can be presented to the viewer over a network in real time or as a downloadable file in a manner that makes viewing convenient, thereby creating an environment that makes the viewer feel as if he or she is physically present at the event.
[00019] The content synthesis system is a computer implementable system that may be implemented on various computing devices. In an implementation, the content synthesis system is coupled to the network for the purpose of sending the synthesized content over the network to a viewer. The synthesized content can be presented to the viewer through a display device, for example, a mobile phone, a computer system, or a personal digital assistant (PDA).
[00020] In an implementation, the content synthesis system receives the presentation content and the presenter's recording as input streams from same or different sources. After receiving the presentation content and the presenter's recording, the system analyzes the presentation content
and the presenter's recording to be displayed. In one implementation, frames of the presentation content are analyzed to determine time instances where it is appropriate to switch to the presenter's recording. Similarly, in one embodiment, the presenter's recording may be analyzed to determine time instances where it is appropriate to switch to the presentation content.
[00021] In one example, the time instances in the presentation content where the presenter's recording can be inserted are based on durations during which the presentation content remains inactive for a significant period of time, i.e., when there is no change in the visual of the presentation content. The presenter's recording can be inserted for a predetermined time slot at such time instances. A duration during which the presentation content remains inactive for a significant period of time is herein referred to as an inactive time slot. Similarly, the presenter's recording can also be analyzed to determine an inactive time slot, in order to insert the presentation content.
[00022] In one implementation, in order to determine an inactive time slot in the presentation content, each frame of the presentation content is compared with its subsequent frame to determine an instance where a frame with a substantial change in the current view of the presentation content occurs. For the purpose of comparison, the frames of the presentation content are extracted from the recording of the presentation content.
[00023] In addition, the inactive time slot is compared with a certain predefined inactivity period. If the inactive time slot exceeds the predefined inactivity period, the inactive time slot of the presentation content is segmented to allow display of the presenter's recording by switching to the presenter's recording for a brief period.
[00024] In an implementation, an inactive time slot may be identified in the presenter's recording to segment the inactive time slot for insertion of the presentation content.
[00025] For the purpose of segmentation of the inactive time slot, a display time for each of the presentation content and the presenter's recording in the inactive time slot is determined. In one implementation, the display time of the presentation content and the presenter's recording are predefined by the content synthesis system. In another implementation, display time of the presentation content and the presenter's recording may be variable and determined based on
inputs provided by one or more users. In yet another implementation, the display time of the presentation content and the presenter's recording is variable and determined by the content synthesis system on the basis of predefined analytical rules.
[00026] Further, the inactive time slot may be segmented into multiple slots based on the display time of the presentation content and the presenter's recording. The presentation content and the presenter's recording may be provided alternately in the segmented inactive time slots.
[00027] After segmenting the inactive frame, the presentation content and the presenter's recording are synchronized. The synchronization between the presentation content and the presenter's recording may be required in a situation where the presentation content and the presenter's recording are recorded by two different systems and therefore may have different sampling rates, due to which the presentation content and the presenter's recording may go out of synchronization.
[00028] After the inactive time slot and the synchronization points are determined, the system synthesizes the presentation content and the presenter's recording to create the multimedia presentation content. The system creates a synthesized content by synthesizing the audio and visual parts of the presenter's recording with the presentation content. The synthesized content shows both the content and the presenter's recording alternately based on the methods described above.
[00029] In one implementation, if the presenter's recording is not available for a particular presentation content frame, only the presentation content is provided to the viewer. In one embodiment, the viewer may be provided with an option to choose between the presentation content and the presenter's recording for viewing by providing inputs through a user interface, for example, by clicking on an icon provided on the display device. In another embodiment, the user may switch between the presentation content and the presenter's recording by using gazing techniques. For example, the presenter's recording can be displayed along with the presentation content as a small thumbnail at one corner of the screen. If the viewer gazes at the thumbnail for a certain amount of time, the presenter's recording may be displayed as a full screen, while the presentation content may be reduced to a thumbnail view. Similarly, when the viewer gazes at
the presentation content, which has been turned into a thumbnail view, the presentation content turns to a full screen view, while the presenter's recording may be reduced to a thumbnail view.
[00030] In one embodiment, the content synthesis system can also synthesize the presentation content and the presenter's recording in real time. The content synthesis system analyzes the presentation content in real time and determines time instances where it is appropriate to switch to the presenter's recording. In one implementation, the content synthesis system learns the presentation style of the presenter and then predicts the suitable time instances to toggle between the presentation content and the presenter's recording.
[00031] The present system synthesizes the presentation content and the presenter's recording to prepare multimedia presentation content without much editing effort and works without any intervention from the viewer. The present system can be used for e-learning and distance learning, where the lecture content and the recording of the teacher are recorded and synthesized to create a multimedia presentation content. The multimedia presentation content, also referred to as synthesized content, can be provided to the student at a remote location over a network. The synthesized content can be displayed on small devices, such as a mobile phone. Further, the viewer can also control the display of the presentation content and the presenter's recording by toggling between the presentation content and the presenter's recording alternately.
[00032] Although the present system is explained in the context of lectures or seminars for purposes such as e-learning or distance learning, it will be appreciated that such an explanation is merely for the purpose of illustration and should not be construed as a limitation. The present system may be extended to other events, such as seminars, conferences, web meetings, and demonstrations.
[00033] Fig. 1 shows an exemplary network environment 100 implementing a content synthesis system 102 for creating multimedia presentation content by synthesizing presentation content with a recording of a presenter presenting the presentation content, according to an embodiment of the present subject matter. The content synthesis system 102 may be used in e-learning and distance learning, where the presentation content of a lecture or a classroom, such as slides, and content on the board, is synthesized with the recording of the presenter presenting the presentation content to create a synthesized content. The synthesized content,
interchangeably referred to as multimedia presentation content, may be provided to the students even if the students have missed the lecture. The synthesized content may be provided through a downloadable file or an e-mail attachment. The synthesized content comprises the presentation content and presenter's recording spaced alternately such that both the presentation content and presenter's recording are presented for definite time slots thereby making the multimedia presentation interesting.
[00034] The network environment 100 includes the content synthesis system 102 communicating, through a network 104, with a plurality of client devices 106-1, 106-2... 106-N, hereinafter collectively referred to as client devices 106. The client devices 106 can be implemented as any of a variety of computing devices, including, for example, servers, a desktop PC, a notebook or portable computer, a workstation, a mobile computing device, an entertainment device, and an internet appliance.
[00035] The network 104 may be a wireless network, wired network or a combination thereof. The network 104 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and such. The network 104 may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), etc., to communicate with each other.
[00036] The content synthesis system 102 can be implemented as any of a variety of computing devices, including, for example, servers, a desktop PC, a notebook or portable computer, a workstation, a mainframe computer, a mobile computing device, and an internet appliance.
[00037] The content synthesis system 102 analyzes the presentation content and the presenter's recording to create the synthesized content. The presentation content can include, for example, a slide show, a video recording of the content presented in the form of a demonstration, content written on a black or white board, drawings, and sketches. The presenter's recording is in the form of an audio-visual recording of the presenter presenting the presentation content. The synthesized content may display portions of the presentation content and the presenter's recording alternately.
[00038] Consider an example of a presenter delivering a lecture for an hour using a presentation having 10 slides. For instance, the presenter may spend 5 minutes on the first slide and explain the second slide for the next 15 minutes, and so on. The slides and the presenter's recording are recorded through one or more recording devices. The recording of a particular slide of the presentation may be captured in multiple frames of the recording. For example, the recording of the first slide may comprise a set of frames extending up to 5 minutes. Similarly, the next set of frames may extend up to the next 15 minutes. These sets of frames may be referred to as the first and the second set of presentation content frames, respectively. In addition, a recording of the presenter explaining the slides is also captured. Accordingly, while the presenter is explaining the first slide for the first 5 minutes and the second slide for the next 15 minutes, the first and the second set of presenter's recording frames are captured. It will be understood that the recording for the first slide has a first set of presentation content frames corresponding to the first set of presenter's recording frames, and so on.
[00039] In the above example, the synthesized content created is such that a viewer is not presented with, say, the presentation content alone for the first 5 minutes, which may seem monotonous. Rather, in the 5 minutes, which is the span of display of the first set of presentation content frames, frames from the presenter's recording, where the presenter's explanation, facial expressions, hand movements, etc., for the first slide are captured, are interleaved. For example, the presentation content may be displayed for the first 2 minutes and then the presenter's recording may follow. Similarly, in the next 15 minutes, the entire 15-minute duration may be segmented into, say, four sequential slots of 4 minutes, 5 minutes, 2 minutes, and 4 minutes. The first sequential slot may provide a portion of the second set of presenter's recording frames to a viewer for 4 minutes, while the second sequential slot may present a part of the second set of presentation content frames, and the remaining two slots may alternate between portions of the second set of presentation content frames and the second set of presenter's recording frames in a similar manner.
[00040] In another example, consider a video recording of an experiment or demonstration being conducted. The entire recording may consist of 70 frames. Each frame is compared with its subsequent frame to identify any change in the view of the presentation content. A first frame is
compared with a second frame, the second frame is compared with a third frame, the third frame is compared with a fourth frame, and so on. If a change in the view of the presentation content is observed at the sixteenth frame, then frames 1-15 are identified as the first set of presentation content frames. If the next change is observed at frame number 25, then frames 16-24 are identified as the second set of presentation content frames. Similarly, all the frames are compared with their subsequent frames to identify the sets of presentation content frames. The presenter's recording for the first set of presentation content frames corresponds to the first set of presenter's recording frames. Similarly, the presenter's recording for the second set of presentation content frames corresponds to the second set of presenter's recording frames, and so on. In the present example, the time duration of the first set of presentation content frames is computed by comparing the first frame and the last frame of the first set of presentation content frames, that is, frame 1 and frame 15.
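By way of illustration only, the following Python sketch shows one possible way of grouping consecutive frames into sets of presentation content frames and computing the time duration of each set from the frame indices; the frame-difference helper, the threshold values, and the frame rate of one frame per second are assumptions made for this sketch and are not prescribed by the present subject matter.

import numpy as np

def frames_differ(frame_a, frame_b, pixel_threshold=25, change_ratio=0.01):
    # Subtract the two frames and treat small differences (cursor blink,
    # slight cursor movement) as noise; report a change only when enough
    # pixels differ. The threshold values are illustrative assumptions.
    diff = np.abs(frame_a.astype(np.int16) - frame_b.astype(np.int16))
    return (diff > pixel_threshold).mean() > change_ratio

def group_into_sets(frames, fps=1.0):
    # Compare each frame with its subsequent frame; whenever a change is
    # observed, close the current set of presentation content frames.
    # Returns (start_index, end_index, duration_in_seconds) per set, so the
    # 70-frame example above would yield frames 1-15 as the first set.
    sets, start = [], 0
    for i in range(1, len(frames)):
        if frames_differ(frames[i - 1], frames[i]):
            sets.append((start, i - 1, (i - start) / fps))
            start = i
    sets.append((start, len(frames) - 1, (len(frames) - start) / fps))
    return sets

Sets whose duration exceeds the predefined inactivity period would then be treated as inactive frames and passed on for segmentation.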
[00041] In order to create the synthesized content, the first set of presentation content frames, that is, frames 1-15, is segmented into different segments to provide both the first set of presentation content frames and the first set of presenter's recording frames alternately. Similarly, other sets of presentation content frames are segmented to insert the corresponding sets of presenter's recording frames.
[00042] The foregoing example is provided merely for the purpose of ease of explanation of concepts related to content synthesis and should not be construed as a limitation. From the above examples, it is apparent that for creation of the synthesized content, the time duration corresponding to a set of frames needs to be determined. Also, the time durations of the sequential time slots for display of the presentation content and the presenter's recording in that time duration need to be computed.
[00043] In one embodiment, the content synthesis system 102 includes an analysis module 108 and a synthesis module 110 configured to determine the time duration corresponding to a set of frames and the sequential time slots related to the same. The time duration corresponding to a set of frames is referred to as the time duration, while a sequential time slot for displaying the presentation content or the presenter's recording within the time duration is referred to as a display time. Further, a time duration may include multiple sequential time slots of the presentation content and the presenter's recording.
[00044] The analysis module 108 receives the presentation content and the presenter's recording as input streams from same or different sources. The analysis module 108 analyzes the presentation content and the presenter's recording. In one implementation, the analysis module 108 analyzes the frames of the presentation content to determine time instances in the presentation content where it is appropriate to insert the presenter's recording. In another implementation, the analysis module 108 may analyze the frames of the presenter's recording, in order to determine time instances in the presenter's recording where the frames of the presentation content can be inserted. In an example, the analysis module 108 analyzes the first set of presentation content frames to determine time instances in the first set of presentation content frames where the first set of presenter's recording frames can be inserted. The time instances in the presentation content where the presenter's recording can be inserted are based on durations during which the presentation content remains inactive for a significant period of time. A duration during which the presentation content remains inactive for a significant period of time is referred to as an inactive time slot. In an embodiment, the presenter's recording can also be analyzed to determine an inactive time slot in the presenter's recording, to insert the presentation content.
[00045] In order to determine the inactive time slot in the presentation content, the analysis module 108 extracts the first set of presentation content frames. In one embodiment, a pair of consecutive frames is compared with each other. For example, each frame of the first set of presentation content frames is compared with its subsequent frame to determine an instance where a change in the presentation content is observed. For example, the first frame of the first set of presentation content frames may be compared with the second frame, the second frame may be compared with the third frame, and so on until a change in the presentation content occurs. The change may be observed once the first frame of the second set of presentation content frames is accessed.
[00046] For the purpose of comparison, each frame of the first set of presentation content frames is subtracted from its subsequent frame. Further, the difference between the frames is compared with a threshold to filter out the noise which may occur due to a slight change in cursor position, blinking of the cursor, etc. The time difference between the frames where a change in the presentation content occurs is computed. For example, the time difference between the first frame and the last frame of the first set of presentation content frames is computed.
[00047] The analysis module 108 determines whether or not this time difference is greater than a predefined inactivity period. If the time difference is greater than the predefined inactivity period, the analysis module 108 identifies the first set of presentation content frames as an inactive frame.
[00048] After determining the inactive frame and the total duration of the inactive frame, the analysis module 108 segments the inactive frame of the presentation content to interleave the presenter's recording in between the presentation content. As explained before, the segments are basically the display times of the frames of the presentation content and the presenter's recording. For example, the first set of presentation content frames is segmented to interleave the first set of presenter's recording frames with the first set of presentation content frames.
[00049] Further, the analysis module 108 determines the display time of the frames of presentation content and presenter's recording in the inactive frame, in order to segment the inactive frame.
[00050] In one implementation, the display time of the presentation content and the presenter's recording may be fixed and predefined by the system. For example, if an inactive frame is of 60 seconds, the inactive frame is segmented into 4 slots to provide the presentation content and the presenter's recording alternately, twice each, for 15 seconds.
[00051] In another implementation, the display time of the presentation content and the presenter's recording may be variable and based on user input. For example, in an inactive frame of 60 seconds, the presentation content and the presenter's recording may be provided in four slots of 20 seconds, 10 seconds, 15 seconds, and 15 seconds, respectively. The variable display time may be determined by the analysis module 108 based on inputs received from one or more user interfaces. The viewer may interact with the system through user interfaces to provide inputs based on different factors, such as the time required to grasp the content. In another example, viewers may provide inputs on the basis of the presenter's recording. For example, if a part of the presenter's recording is interesting, one or more viewers may want that part to be provided for a longer duration.
[00052] In yet another implementation, the display time of the presentation content and the presenter's recording may be variable and determined by the content synthesis system 102 on the basis of a set of predefined analytical rules. In one example, the display time of the presentation content in a particular inactive frame is determined on the basis of the actual content present in a particular frame.
[00053] After determining the display time in accordance with any of the various aforementioned methods, the analysis module 108 divides the inactive frame of the presentation content into different segments based on the display time of the presentation content and the presenter's recording. These segments are divided so that the presentation content and the presenter's recording can be shown alternately.
[00054] Further, the synthesis module 110 synthesizes the presentation content with the presenter's recording in the inactive frame to create the synthesized content. The synthesized content shows the presentation content and the presenter's recording alternately based on the display time of the presentation content and the presenter's recording in the inactive frame.
[00055] In one implementation, viewers can switch between the presentation content and the presenter's recording as per their convenience. In one example, if the presentation content is complex, such as a derivation of a complex mathematical equation, a particular viewer may want to view the presentation content for more time. So, when the presenter's recording appears, the viewer may override the presenter's recording and view the presentation content. The viewer may view the presentation content for more time and is not bound to view the presentation content and the presenter's recording as created in the synthesized content. The viewer may interact with the content synthesis system 102 through the client device 106. The manner in which the synthesized content is created and displayed is further explained in detail in conjunction with Fig. 2.
[00056] Fig. 2 illustrates an exemplary content synthesis system 102, in accordance with an embodiment of the present subject matter. In said embodiment, the content synthesis system 102 includes a processor(s) 202, interface(s) 204, and a memory 206. The processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. The processor 202 is coupled to the
memory 206. Among other capabilities, the processor 202 is configured to fetch and execute computer-readable instructions stored in the memory 206.
[00057] The interface(s) 204 may include a variety of software and hardware interfaces, for example, a web interface allowing the content synthesis system 102 to interact with the client devices 106. Further, the interface(s) 204 may enable the content synthesis system 102 to communicate with other computing devices, such as web servers and external repositories or databases. The interface(s) 204 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example LAN, cable, etc., and wireless networks such as WLAN, cellular, or satellite. The interface(s) 204 may include one or more ports for connecting a number of computing devices to each other or to another server.
[00058] The memory 206 can include any computer-readable medium known in the art including, for example, volatile memory (e.g., RAM), and/or non-volatile memory (e.g., EPROM, flash memory, etc.). The memory 206 includes one or more module(s) 208 and data 210. In one embodiment, the module 208 further includes the analysis module 108, the synthesis module 110, a user interaction module 216, and other module(s) 218. The analysis module 108 includes a content analyzer 212 and a segmentation module 214.
[00059] The data 210 serves, amongst other things, as a repository for storing data processed, received, and generated by one or more of the module(s) 208. The data 210 includes, for example, presentation content data 220, audio-visual content data 222, synthesized content data 224, and other data 226. In one embodiment, the presentation content data 220, the audio-visual content data 222, and the synthesized content data 224 may be stored in the memory 206 in the form of data structures. In one embodiment, the presentation content data 220 pertains to the presentation content. Accordingly, the presentation content recorded may be stored in the memory 206 as the presentation content data 220, while the presenter's recording may be saved as the audio-visual content data 222. Similarly, the synthesized content data 224 pertains to the synthesized content.
[00060] As mentioned previously, the content synthesis system 102 can synthesize presentation content data 220 with the audio-visual content data 222 to generate the synthesized content data 224. The presentation content data 220 may include, for example, slide shows, and
video recordings of demonstrations or experiments, content on a board or charts, and drawings and sketches presented in an event, whereas the audio-visual content data 222 includes an audio-visual recording of the presenter presenting the presentation content data 220. The audio-visual recording captures the presenter's articulation, facial expressions, etc.
[00061] In one implementation, the synthesized content data 224, generated by the content synthesis system 102, switches between the presentation content data 220 and the audio-visual content data 222 alternately without any inputs from a viewer viewing the synthesized content data 224. In another implementation, the viewer can also control the switching between the presentation content data 220 and the audio-visual content data 222.
[00062] In order to create the synthesized content data 224, the content analyzer 212 receives the presentation content data 220 and the audio-visual content data 222 as input streams from one or more sources. After receiving the presentation content data 220 and the audio-visual content data 222, the content analyzer 212 analyzes the presentation content data 220 and the audio-visual content data 222. In one implementation, the content analyzer 212 analyzes the frames of the presentation content data 220 to determine time instances where it is appropriate to insert the audio-visual content data 222. In another implementation, the content analyzer 212 analyzes the audio-visual content data 222 to determine time instances where the presentation content data 220 can be inserted. In one example, the time instances in the presentation content data 220 are based on durations during which the presentation content data 220 remains inactive for a significant period of time, and the audio-visual content data 222 can be inserted for predetermined time slots at such time instances. A duration during which the presentation content data 220 remains inactive for a significant period of time is referred to as an inactive time slot.
[00063] In one implementation, in order to determine an inactive time slot in the presentation content data 220, the content analyzer 212 analyzes frames of the presentation content data 220. Similarly, the content analyzer 212 can also analyze the frames of the audio-visual content data 222 to determine an inactive time slot, in order to insert the presentation content data 220.
[00064] In one implementation, the inactive time slots in the presentation content data 220 can be determined by analyzing the frames of the presentation content data 220. In order to determine the inactive time slots, the content analyzer 212 extracts the frames of presentation
content data 220. The content analyzer 212 compares each frame of the presentation content frames with its subsequent frame until a change is observed. In an example, the content analyzer 212 extracts the first set of presentation content frames. The content analyzer 212 compares each frame of the first set of presentation content frames with its subsequent frame until a change in the current view of the presentation content data 220 occurs. The content analyzer 212 subtracts each frame of the first set of presentation content frames from its subsequent frame to generate a new frame. Further, the content analyzer 212 analyzes the new frame. In one implementation, the new frame is binarized by the content analyzer 212 to eliminate noise and other disturbances. Further, the content analyzer 212 normalizes the difference between the first frame and its subsequent frame of the first set of presentation content frames to identify any change in the pixels of the new frame.
[00065] In one implementation, the content analyzer 212 analyzes the new frame by the change in pixels. Black pixels indicate that there is no change in the view of the presentation content data 220 frame, and non-black pixels indicate that the view of the presentation content data 220 has changed. For example, the content analyzer 212 determines whether the first set of presentation content frames has changed or not.
[00066] If the new frame has all black pixels, the content analyzer 212 identifies that the first set of presentation content frames of the presentation content data 220 has not changed for a significant period of time. In addition, the content analyzer 212 determines whether the difference between the frames, in terms of pixels, is less than a threshold value. If the difference is less than the threshold, the first set of presentation content frames of the presentation content data 220 is identified as an inactive frame. The content analyzer 212 also analyzes other sets of presentation content frames of the presentation content data 220 to determine inactive frames.
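A minimal Python sketch of the pixel-level comparison described above is given below for illustration; it assumes grayscale frames represented as NumPy arrays, and the binarization and threshold values are assumptions rather than prescribed values.

import numpy as np

def view_unchanged(previous_frame, current_frame, pixel_threshold=30, change_ratio=0.005):
    # Subtract consecutive frames to generate a new frame and binarize it:
    # pixels whose difference stays below the pixel threshold are treated as
    # black (no change), the rest as non-black (changed).
    difference = np.abs(current_frame.astype(np.int16) - previous_frame.astype(np.int16))
    binary = difference > pixel_threshold
    # If the fraction of non-black pixels stays below the threshold, the view
    # of the presentation content data is considered unchanged.
    return binary.mean() < change_ratio

Consecutive frames for which this comparison reports no change would form part of the same inactive frame.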
[00067] In one embodiment, the threshold value that accounts for the noise is based on user inputs and may be built into the content analyzer 212. For this purpose, one or more viewers may be provided with a sequence of successive frames of the presentation content data 220. A viewer may identify the difference between the frames of the presentation content data 220. The frame number of the frame of the presentation content data 220 where the viewer detects a change in the presentation content with respect to the previous frame is recorded. Further, the frame number of the presentation content data where the viewer does not detect any change is recorded. The threshold value so recorded may be used to define thresholds at future instances.
[00068] After determining the inactive frame, the content analyzer 212 compares the time duration of the inactive frame with a predefined inactivity period. If the time duration of the inactive frame is greater than the predefined inactivity period, the inactive frame is segmented into different segments to provide frames of the presentation content data 220 and the audio-visual content data 222 alternately. In one implementation, the content analyzer 212 determines the time duration of the inactive frame on the basis of a comparison of the first frame of the first set of presentation content frames and the last frame of the first set of presentation content frames.
[00069] Further, the segmentation module 214 segments the inactive frame of the presentation content data 220 to provide audio-visual content data 222 along with presentation content data 220 within the time duration of the inactive frame.
[00070] In order to segment the inactive frame, the content analyzer 212 determines the display times of the frames of the presentation content data 220 and the audio-visual content data 222 in the inactive frame. In one implementation, the display time of the presentation content data 220 and the audio-visual content data 222 is fixed and predefined by the content analyzer 212. In one example, the content analyzer 212 determines the display time of the presentation content data 220 and the audio-visual content data 222 on the basis of the time duration of the inactive frame. If the time duration of the inactive frame is less than 50 seconds, the inactive frame is divided into two segments to provide the presentation content data 220 and the audio-visual content data 222. If the duration of the inactive frame is 50 seconds to 100 seconds, the inactive frame is divided into four segments to provide the presentation content data 220 and the audio-visual content data 222 alternately. Therefore, if an inactive frame is of 40 seconds, the inactive frame is divided into two segments of 20 seconds each, to provide the presentation content data 220 and the audio-visual content data 222, and if the inactive frame is of 80 seconds duration, the inactive frame is divided into four equal segments of 20 seconds each, to provide the presentation content data 220 and the audio-visual content data 222 alternately. Accordingly, a viewer is presented with the presentation content for 20 seconds, followed by the presenter's recording for the next 20 seconds, again followed by one slot each of the presentation content and the presenter's recording.
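The fixed segmentation rule of the preceding example can be illustrated with the short Python sketch below; the handling of inactive frames longer than 100 seconds is an assumption, since the example above does not address them.

def fixed_segments(duration_seconds):
    # Less than 50 seconds: two segments; 50 to 100 seconds: four segments.
    # Longer durations fall back to four segments here as an assumption.
    count = 2 if duration_seconds < 50 else 4
    length = duration_seconds / count
    sources = ("presentation content", "presenter's recording")
    return [(sources[i % 2], length) for i in range(count)]

# A 40-second inactive frame gives two 20-second segments, and an 80-second
# inactive frame gives four 20-second segments shown alternately.
print(fixed_segments(40))
print(fixed_segments(80))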
[00071] In another implementation, as aforementioned, the display time of the presentation content data 220 and the audio-visual content data 222 is variable, based on input received from one or more viewers. The frames of the presentation content data 220 may be provided to a sample of viewers and the time required to grasp the content is sought from the viewers. Based on the time indicated by the viewers, the display time of the frame of the presentation content data 220 is determined. For example, in a video recording of the information on a board, where the first portion of the recording only has a drawing and a second portion has the drawing with labels, one or more viewers may indicate that the display time of the second portion needs to be greater than that of the first portion, as more time will be required to grasp the presentation content data 220. Therefore, in the inactive frame for the first portion, the audio-visual content data 222 will be provided for more time as compared to the presentation content data 220. So, if the inactive frame is of 50 seconds, the audio-visual content data 222 may be provided for 30 seconds and the presentation content data 220 for 20 seconds. In another example, one or more viewers may provide input to view the audio-visual content data 222 for a longer duration, if the audio-visual content data 222 is interesting.
[00072] In the above examples, in order to find the display time of the presentation content data 220 in an inactive frame, the presentation content data 220 is classified into different levels of complexity, such as high, medium, and low, based on the inputs received from one or more viewers. For example, a frame having a mathematical formula may be classified as having high complexity by a viewer, while a graphical representation may be classified as having low complexity. Thus, if the complexity of the presentation content data 220 is high, the time required to grasp the content may be high, and if it is low, the time required is less. The frames of different complexity, such as high, medium, and low, are presented to a set of viewers. The minimum time required by the set of viewers to grasp the content of the frames is recorded. For example, if a particular frame having a high complexity is presented to the sample of viewers, the minimum time required by a viewer to grasp the presentation content data 220 is recorded and provided to the content analyzer 212. Based on the recorded time, the content analyzer 212 determines the display time of a frame of the presentation content data 220. In one implementation, the display time of a frame should be such that it allows the maximum number of viewers to read the content. For example, the display time is chosen such that about 80% of the viewers can grasp the content. The remaining 20% can later refer to the frame; for example, if the audio-visual content data 222 is being presented, the viewer may override it and view the presentation content data 220.
[00073] Accordingly, based on inputs received from the viewers, the presentation content data 220 may be identified to have high, medium, or low complexity, and a display time may be defined for each complexity type. For example, if a frame of the presentation content data 220 has a high complexity, the display time of the presentation content data 220 will be more than that of the audio-visual content data 222 in an inactive frame. If a frame of the presentation content data 220 has a low complexity, the display time of the audio-visual content data 222 will be more than that of the presentation content data 220.
[00074] Based on the inputs from the viewers, the content analyzer 212 determines the display time of the frames of the presentation content data 220 and the audio-visual content data 222 in the inactive frame.
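For illustration, the following Python sketch combines the complexity classification and the grasp-time inputs described above: a display time is picked so that about 80% of the sampled viewers can grasp the frame, and the remainder of the inactive frame is allotted to the audio-visual content data 222. The sampling of viewers, the coverage figure, and the minimum share reserved for the presenter's recording are assumptions made for this sketch.

import math

def grasp_time(reported_times, coverage=0.8):
    # Pick the display time that lets roughly the given fraction of sampled
    # viewers grasp the content of a frame of a given complexity.
    ordered = sorted(reported_times)
    index = min(len(ordered) - 1, math.ceil(coverage * len(ordered)) - 1)
    return ordered[index]

def allot_display_time(inactive_seconds, reported_times, min_presenter_share=0.2):
    # Give the presentation content data 220 the grasp time, capped so that
    # the audio-visual content data 222 keeps at least a minimum share.
    presentation = min(grasp_time(reported_times),
                       inactive_seconds * (1 - min_presenter_share))
    return presentation, inactive_seconds - presentation

# A high-complexity frame: most viewers need a long time, so the presentation
# content dominates the 50-second inactive frame.
print(allot_display_time(50, [25, 30, 32, 35, 38, 40, 41, 44, 47, 55]))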
[00075] In yet another implementation, the content analyzer 212 determines the display time of the frames of the presentation content data 220 and the audio-visual content data 222 based on different predefined analytical rules. In one example, the content analyzer 212 compares the first set of presentation content frames with a second set of presentation content frames. For instance, if the content analyzer 212 determines that, in a presentation, a first slide has two bullet points and a second slide has ten bullet points, it accordingly determines the display time of the second slide to be more than that of the first slide. Accordingly, for the first slide, frames of the presentation content data 220 are displayed for less time as compared to the frames of the audio-visual content data 222. In another example, if the first set of presentation content frames has a lot of text and the second set of presentation content frames has only images, then the content analyzer 212 determines that the display time of the first set of presentation content frames has to be more than that of the second set of presentation content frames. So, in the inactive frame for the first set of presentation content frames, the frames of the presentation content data 220 will be displayed for more time and the audio-visual content data 222 will be displayed for less time.
[00076] In yet another example, the content analyzer 212 may use certain analytical rules to determine the actual presentation content in a set of presentation content frames. For this purpose, in one example, the content analyzer 212 may count the words present in different frames of the presentation content data 220. The content analyzer 212 counts the words in the first set of presentation content frames by, for example, reading the number of spaces. If the word count is 0-100, then the display time of the presentation content data 220 in the inactive frame will be 30% of the time duration of the inactive frame. If the word count is 100-300, the display time of the presentation content data 220 in the inactive frame is 50% of the time duration of the inactive frame. Consider an example in which the inactive frame is of 100 seconds duration: in the first case, the presentation content data 220 will be provided for 30 seconds and the audio-visual content data 222 will be provided for 70 seconds; in the second case, the presentation content data 220 will be provided for 50 seconds and the audio-visual content data 222 will be provided for 50 seconds.
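The word-count rule of the above example may be sketched in Python as follows; approximating the count by splitting on whitespace, and the band assumed for more than 300 words, are illustrative choices rather than part of the example.

def presentation_share(frame_text):
    # 0-100 words: 30% of the inactive frame; 100-300 words: 50%.
    # More than 300 words is not covered by the example above, so a 60%
    # share is assumed here.
    words = len(frame_text.split())
    if words <= 100:
        return 0.30
    if words <= 300:
        return 0.50
    return 0.60

# For a 100-second inactive frame with a 40-word slide, the presentation
# content data 220 gets 30 seconds and the audio-visual content data 222
# gets the remaining 70 seconds.
share = presentation_share("word " * 40)
presentation_seconds = round(100 * share)
print(presentation_seconds, 100 - presentation_seconds)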
[00077] The content analyzer 212 may also analyze the audio-visual content data 222 to determine the display time of a frame of the audio-visual content data 222 in the inactive frame. The content analyzer 212 uses different motion detection techniques to analyze the frames of the audio-visual content data 222. The content analyzer 212 may compare two frames of the audio-visual content data 222 to determine the display time. In one example, if the content analyzer 212 determines that the first frame of the audio-visual content data 222 has more motion, such as hand movement and body movement of the presenter, as compared to the second frame, the display time of the first frame of the audio-visual content data 222 will be more than that of the second frame.
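Purely as an illustration of the comparison just described, the Python sketch below uses a mean absolute frame difference as a stand-in for the motion detection techniques mentioned above and splits a display slot in proportion to the motion scores; both the motion measure and the proportional split are assumptions.

import numpy as np

def motion_score(previous_frame, current_frame):
    # A rough motion measure: the mean absolute pixel difference between two
    # consecutive frames of the presenter's recording.
    return float(np.abs(current_frame.astype(np.int16)
                        - previous_frame.astype(np.int16)).mean())

def split_slot_by_motion(slot_seconds, score_first, score_second):
    # The frame with more motion (hand or body movement of the presenter)
    # receives the larger share of the display slot.
    total = score_first + score_second
    if total == 0:
        return slot_seconds / 2, slot_seconds / 2   # no motion: equal shares
    first = slot_seconds * score_first / total
    return first, slot_seconds - first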
[00078] After determining the time duration of the inactive frame and the display times of the presentation content data 220 and the audio-visual content data 222, the segmentation module 214 splits the time duration of the inactive frame into different segments to display frames of the presentation content data 220 and the audio-visual content data 222 alternately. In one implementation, the segmentation is done such that a first segment displays the presentation content data 220 based on the display time of the presentation content data 220 computed above and a second segment displays the audio-visual content data 222. In another implementation, for example, an inactive frame of 120 seconds may be divided into 5 segments of 40 seconds, 20 seconds, 25 seconds, 15 seconds, and 20 seconds. The first segment may display the frame of the audio-visual content data 222 for 40 seconds, followed by the frame of the presentation content data 220 for 20 seconds; the remaining segments display the audio-visual content data 222 and the presentation content data 220 alternately for 25 seconds, 15 seconds, and 20 seconds.
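The alternating display schedule of the 120-second example can be illustrated with the small Python sketch below; starting with the presenter's recording follows that example and is not a general requirement.

def build_schedule(segment_lengths, start_with="presenter's recording"):
    # Walk over the segment lengths, alternating the source displayed in
    # each segment of the inactive frame.
    other = {"presenter's recording": "presentation content",
             "presentation content": "presenter's recording"}
    schedule, source = [], start_with
    for length in segment_lengths:
        schedule.append((source, length))
        source = other[source]
    return schedule

# The 120-second inactive frame of the example: 40, 20, 25, 15 and 20 second
# segments, shown alternately starting with the presenter's recording.
print(build_schedule([40, 20, 25, 15, 20]))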
[00079] Subsequently, the synthesis module 110 determines synchronization points between the presentation content data 220 and the audio-visual content data 222. The presentation content data 220 and the audio-visual content data 222 may be recorded by two different systems. Therefore, the sampling rates of the presentation content data 220 and the audio-visual content data 222 may be different, due to which the two may go out of synchronization. In one implementation, the synthesis module 110 may extract an audio content from the audio-visual content data 222. After extracting the audio content, the synthesis module 110 determines time instances when the audio content needs to be synchronized with the presentation content data 220.
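The specification does not prescribe a particular synchronization technique; as one hedged illustration, the Python sketch below maps synchronization instants onto frame indices in both streams so that frames captured at different sampling rates can be paired, the frame rates themselves being assumptions.

def synchronization_points(instants_seconds, presentation_fps=1.0, recording_fps=25.0):
    # The presentation content data 220 and the audio-visual content data 222
    # may be captured by different systems at different sampling rates, so the
    # same instant corresponds to different frame indices in each stream.
    return [(round(t * presentation_fps), round(t * recording_fps))
            for t in instants_seconds]

# Instants at 120 s and 300 s map to (frame 120, frame 3000) and
# (frame 300, frame 7500) for the assumed frame rates.
print(synchronization_points([120, 300]))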
[00080] After determining the synchronization points between the presentation content data 220 and the audio-visual content data 222, the synthesis module 110 synthesizes the frames of the presentation content data 220 and the audio-visual content data 222 to create the synthesized content data 224. In one implementation, the synthesis module 110 creates the synthesized content data 224 by synthesizing the audio and video parts of the audio-visual content data 222 with the presentation content data 220 at appropriate instances. The synthesized content data 224 presents the presentation content data 220 and the audio-visual content data 222 alternately. The synthesized content data 224 can be presented to the viewer through the client devices 106.
[00081] In one embodiment, the user interaction module 216 provides options to the viewer to choose between the presentation content data 220 and the audio-visual content data 222, irrespective of the synthesized content data 224 created by the content synthesis system 102. In one example, the user interaction module 216 provides an option to the viewer to toggle between the presentation content data 220 and the audio-visual content data 222 by clicking on an icon, using user interface devices such as a mouse. A particular viewer can view only the presentation content data 220 or the audio-visual content data 222, irrespective of the synthesized content data 224 created by the content synthesis system 102.
[00082] In another example, the viewer can switch between the presentation content data 220 and the audio-visual content data 222 by using different gestures, such as eye movement, head movement, and hand movement. The audio-visual content data 222 can be displayed along with the presentation content data 220 as a small thumbnail at a corner of the screen. The user interaction module 216 analyzes different gestures of the user; for example, the user interaction module 216 determines whether the viewer is gazing at the presentation content data 220 or the audio-visual content data 222. In another case, the user interaction module 216 analyzes the head movement of the viewer. If the viewer is gazing at the audio-visual content data 222, the user interaction module 216 determines the time for which the viewer gazes at the audio-visual content data 222. If this time is greater than a predefined amount of time, the audio-visual
content data 222 is displayed as a full screen and the presentation content data 220 is displayed as a small thumbnail. Similarly, if the viewer gazes at the presentation content data 220 for a certain amount of time, the presentation content data 220 is displayed as a full screen and the audio-visual content data 222 is displayed as a thumbnail. In one implementation, display devices, such as the client devices 106, may be provided with a camera to capture the eye movement of the viewer.
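One possible realization of the gaze-based switching described above is sketched below in Python; the gaze tracker itself (camera capture and gaze estimation) is assumed to exist and to report which region the viewer is looking at, and the dwell time of two seconds is an assumption.

import time

class GazeToggle:
    def __init__(self, dwell_seconds=2.0):
        self.dwell_seconds = dwell_seconds
        self.fullscreen = "presentation"   # presenter's recording starts as the thumbnail
        self._gaze_started = None

    def on_gaze(self, region, now=None):
        # Called periodically with the region ("presentation" or "presenter")
        # the viewer is gazing at; returns which view should be full screen.
        now = time.monotonic() if now is None else now
        if region == self.fullscreen:
            self._gaze_started = None      # looking at the full-screen view
        elif self._gaze_started is None:
            self._gaze_started = now       # just started gazing at the thumbnail
        elif now - self._gaze_started >= self.dwell_seconds:
            self.fullscreen = region       # swap thumbnail and full screen
            self._gaze_started = None
        return self.fullscreen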
[00083] In an embodiment, the synthesized content data 224 may be created in real time also. The content analyzer 212 receives the presentation content data 220 and the audio-visual content data 222 as different streams in real time. The content analyzer 212 analyzes the presentation content data 220 to determine inactive frames and inserts frames of the audio-visual content data 222 with the presentation content data 220 based on the methods described above.
[00084] Fig. 3 illustrates an exemplary method to synthesize multimedia presentation content. The exemplary method 300 may be described in the general context of computer executable instructions. The method 300 may be a computer implementable method. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, and the like that perform particular functions or implement particular abstract data types. The method may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communication network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
[00085] The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method, or an alternate method. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof.
[00086] In accordance with one embodiment of the present subject matter, the method 300 may be implemented in the previously described system 102. However, it will be appreciated by
one skilled in the art that such an implementation is not limiting. The method 300 may be implemented in a variety of such similar systems.
[00087] At block 302, presentation content data and audio-visual content data are received. In one implementation, the content analyzer 212 receives frames of the presentation content data 220 and the audio-visual content data 222 as different streams from different sources. The presentation content data 220 and the audio-visual content data 222 are recorded by different recording systems. The presentation content data 220 and the audio-visual content data 222 are captured as frames. The content synthesis system 102 synthesizes the frames of the presentation content data 220 and the frames of the audio-visual content data 222 to create the multimedia presentation content.
[00088] At block 304, the presentation content data, for example, the presentation content data 220, is analyzed to identify an inactive frame in the presentation content data and the time duration of the inactive frame. If no change occurs in a particular set of presentation content frames of the presentation content data 220 for a certain amount of time, the set of presentation content frames is identified as the inactive frame. Although the explanation of the method 300 at block 304 has been described in the context of the presentation content data, it will be understood that the method 300 may be implemented in a similar manner, with changes apparent to one skilled in the art, by analyzing the audio-visual content data to create the synchronized content.
[00089] In one implementation, in order to determine the inactive frame, the content analyzer 212 subtracts each frame of the first set of presentation content frames from its subsequent frame to generate a new frame. If the new frame has all black pixels, the frames of the first set of presentation content frames have not changed over that period. Further, the difference between the frames is compared with a certain threshold value. If the difference is less than the threshold, the first set of presentation content frames is marked as the inactive frame. Further, a determination is made whether the time duration of the inactive frame is greater than a predefined inactivity period. If the time duration is greater than the predefined inactivity period, the content analyzer 212 segments the inactive frame into different segments as explained at block 308.
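A minimal sketch of the frame-difference check described above is given below, assuming that frames are available as NumPy arrays and that the content was captured at a fixed frame rate; the threshold value, the frame rate, and the function names are illustrative assumptions only and are not part of the specification.

    import numpy as np

    def is_inactive(frames, diff_threshold=2.0):
        """Sketch: a set of presentation content frames is treated as inactive when
        each frame, subtracted from its subsequent frame, yields a difference below
        the threshold (an all-black difference frame meaning no change at all)."""
        for current, subsequent in zip(frames, frames[1:]):
            # Subtract each frame from its subsequent frame to generate a new frame.
            difference = np.abs(subsequent.astype(np.int16) - current.astype(np.int16))
            # Compare the difference with the threshold value.
            if difference.mean() >= diff_threshold:
                return False  # a visible change occurred, so the set is not inactive
        return True

    def inactive_duration_seconds(num_inactive_frames, frame_rate=25.0):
        # Time duration of the inactive frame, assuming a fixed frame rate.
        return num_inactive_frames / frame_rate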
[00090] At block 306, the display time of the frames of the presentation content data and the audio-visual content data in the inactive frame is determined. In an implementation, the display time of the presentation content data 220 and the audio-visual content data 222, in an inactive frame, is fixed and predefined by the system. For example, if the inactive frame is of 100 seconds, the inactive frame is divided into four equal slots of 25 seconds each. The four slots display frames of the presentation content data 220 and the audio-visual content data 222 alternately. In another implementation, the display time of the frames of the presentation content data 220 and the audio-visual content data 222 may be variable.
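The fixed, equal-slot division described above may be sketched as follows; the slot count and the 100-second duration come from the example in the description, while the function name and the source labels are assumptions made for this sketch.

    def fixed_slots(inactive_duration_s, num_slots=4):
        """Sketch: divide the inactive frame into equal slots that display frames of
        the presentation content data and the audio-visual content data alternately."""
        slot_length = inactive_duration_s / num_slots
        sources = ("presentation", "audio_visual")
        return [(sources[i % 2], slot_length) for i in range(num_slots)]

    # For a 100-second inactive frame: four 25-second slots alternating between
    # the presentation content data and the audio-visual content data.
    # fixed_slots(100) -> [('presentation', 25.0), ('audio_visual', 25.0),
    #                      ('presentation', 25.0), ('audio_visual', 25.0)]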
[00091] In one embodiment, the variable time may be determined based on user input. In one example, in the inactive frame, the display time of the frame of the presentation content data 220 and the audio-visual content data 222 is determined on the basis of the complexity of the frame, as provided by one or more users. For example, if the input provided by one or more viewers indicates that the presentation content data 220 is highly complex, rules may be built into the system 102 to display the presentation content data 220 for a longer time as compared to the audio-visual content data 222 in that particular inactive frame.
[00092] In another embodiment, the variable time may be determined by the content analyzer 212 using predefined analytical rules. In one case, the content analyzer 212 may count the number of words in a particular frame of the presentation content data 220. The display time of the presentation content data 220 and the audio-visual content data 222 is then determined on the basis of the number of words in the frame of the presentation content data 220. For example, if the frame contains 0-100 words, the display time of the frames of the presentation content data 220 may be 30% of the time duration of the inactive frame, and if it contains 100-300 words, the display time may be 50% of the time duration of the inactive frame. In case of the audio-visual content data 222, the content analyzer 212 may employ motion detection techniques to determine the display time of the frame of the audio-visual content data 222.
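The word-count rule in the example above may be sketched as follows; the shares for 0-100 and 100-300 words come from that example, while the share used beyond 300 words and the function names are assumptions added only to make the sketch complete.

    def presentation_share(word_count):
        """Sketch: fraction of the inactive frame's time duration given to frames of
        the presentation content data, based on the number of words in the frame."""
        if word_count <= 100:
            return 0.30   # 0-100 words: 30% of the time duration of the inactive frame
        if word_count <= 300:
            return 0.50   # 100-300 words: 50% of the time duration of the inactive frame
        return 0.60       # assumed share for denser frames (not stated in the description)

    def display_times(inactive_duration_s, word_count):
        presentation_time = presentation_share(word_count) * inactive_duration_s
        audio_visual_time = inactive_duration_s - presentation_time
        return presentation_time, audio_visual_time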
[00093] At block 308, the inactive frame is segmented to display the presentation content data and the audio-visual content data alternately. After determining the display time of the frames of the presentation content data and the audio-visual content data as described at block 306, the inactive frame is segmented in accordance with the time duration of the inactive frame and the display time of the presentation content data and the audio-visual content data computed above at blocks 304 and 306. In one example, an inactive frame of 30 seconds is divided into 3 segments: the first segment displays the frame of the audio-visual content data 222 for 15 seconds, the second segment displays the frame of the presentation content data 220 for 10 seconds, and the third segment displays the audio-visual content data 222 for 5 seconds. In one implementation, the segmentation module 214 segments the time duration of the inactive frame into different segments to display the presentation content data 220 and the audio-visual content data 222 alternately.
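Purely as a sketch, a segmented inactive frame can be represented by an ordered list of (source, seconds) pairs whose lengths sum to the time duration of the inactive frame; the data structure and the names below are assumptions, and the three segments reproduce the 30-second example from the description above.

    def segment_inactive_frame(segments, total_duration_s):
        """Sketch: a segmented inactive frame as an ordered list of (source, seconds)
        pairs displaying the presentation content data and the audio-visual content
        data alternately within the time duration of the inactive frame."""
        assert abs(sum(seconds for _, seconds in segments) - total_duration_s) < 1e-6
        return segments

    # The 30-second example: 15 s of audio-visual content, 10 s of presentation
    # content, then 5 s of audio-visual content.
    example = segment_inactive_frame(
        [("audio_visual", 15), ("presentation", 10), ("audio_visual", 5)],
        total_duration_s=30,
    )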
[00094] At block 310, synchronization points between the presentation content data and the audio-visual content data are identified. The synchronization points are specific points where the frames of the audio-visual content data need to be synchronized with the frames of the presentation content data. In one implementation, the synthesis module 110 identifies the synchronization points between the presentation content data 220 and the audio-visual content data 222. In order to identify the synchronization points, the synthesis module 110 extracts audio from the audio-visual content data 222 to determine points where the audio needs to be synchronized with the presentation content data 220. The synchronization points may not be determined if the presentation content data 220 and the audio-visual content data 222 are already in synchronization, for example, in case they are recorded at the same rate.
[00095] At block 312, multimedia presentation content, also referred to as synthesized content, is created by synthesizing the presentation content data and the audio-visual content data. The multimedia presentation content presents the presentation content data 220 and the audio-visual content data 222 alternately. In one implementation, the synthesis module 110 synthesizes frames of the presentation content data 220 with the frames of the audio-visual content data 222 to create the multimedia presentation content, for example, the synthesized content data 224.
[00096] The multimedia presentation content is presented to a viewer. The multimedia presentation content, for example the synthesized content data 224, can be presented to the viewer through a display device, for example, a computer, laptop, mobile phone, or PDA. The synthesized content data 224 is a multimedia presentation used for e-learning and distance learning. In one example, the synthesized content can also be provided to a viewer through a downloadable file or an attachment which the viewer may view on a client device, for example, the client device 106.
[00097] Fig. 4 illustrates an exemplary method of furnishing multimedia presentation content. The order in which the exemplary method 400 is described is not intended to be
construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method, or an alternate method. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof.
[00098] At block 402, one of the presentation content data or the audio-visual content data is provided to a viewer based on the segmentation of an inactive frame. In one implementation, the segmentation of the inactive frame is done on the basis of the time duration of the inactive frame and the display time of the presentation content data 220 and the audio-visual content data 222. In one implementation, the synthesized content data 224 is provided to the viewer through the client device 106.
[00099] At block 404, a determination is made whether an input for toggling has been received from the viewer. Toggling refers to switching between the presentation content data and the audio-visual content data. In one implementation, the input can be provided by a viewer through a switch on the client device 106. In another implementation, the input can be provided by the viewer by gazing at a section of the display of the client device 106 for a predetermined period of time. For example, the audio-visual content data 222 may be provided as a small thumbnail at one corner of the screen of the client device 106 while the presentation content data 220 is displayed occupying the entire screen. In such an implementation, gazing techniques may be used to provide input for toggling between the presentation content data 220 and the audio-visual content data 222. If the viewer provides an input, the control flows to block 406. However, if the viewer does not provide an input, the control flows to block 402 and the synthesized content is provided to the viewer without any changes.
[000100] At block 406, the inputs provided by the viewer at block 404 are analyzed. Based on the analysis, a determination is made whether the viewer wants to view the presentation content data 220 or the audio-visual content data 222. In the above example, the user interaction module 216 determines whether the viewer is gazing at the presentation content data 220 or the audio-visual content data 222. If the viewer is gazing at the presentation content data 220, the user interaction module 216 may identify it as an indication that toggling is desired, and the presentation content data 220 is displayed in full screen. In another example, the user interaction module 216 identifies gestures, such as eye and head movement of the viewer, to toggle between the presentation content data 220 and the audio-visual content data 222 alternately.
[000101] At block 408, switching to either the presentation content data or the audio-visual content data is done based on the inputs received at block 404. In one example, if the viewer continuously gazes at the thumbnail depicting the audio-visual content data 222, the audio-visual content data 222 turns into full screen and the presentation content data 220 turns into a thumbnail. If the viewer gazes at the presentation content data 220, the audio-visual content data 222 turns into a thumbnail and the presentation content data 220 expands to full screen. In another example, if the audio-visual content data 222 is being provided and the viewer provides an input, through a switch, to view the presentation content data 220, the presentation content data 220 is provided to the viewer.
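Blocks 404 to 408 may be summarized by the following sketch, in which the toggle input stands for either a switch press or a sustained gaze at the thumbnail as described above; the function and state names are assumptions made for illustration.

    def handle_viewer_input(current_full_screen, toggle_input_received):
        """Sketch of blocks 404-408: if the viewer provides an input for toggling,
        switch between the presentation content data and the audio-visual content
        data; otherwise continue providing the synthesized content unchanged."""
        if not toggle_input_received:           # block 404: no input from the viewer
            return current_full_screen          # block 402: content provided without changes
        # Blocks 406-408: the other content becomes full screen; the current one
        # becomes a thumbnail.
        return ("audio_visual" if current_full_screen == "presentation"
                else "presentation")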
CONCLUSION
[000102] Although embodiments for creating a multimedia presentation content using a presentation content data and an audio-visual content data have been described in language specific to structural features and/or methods, it is to be understood that the invention is not necessarily limited to the specific features or methods described. Rather, the specific features and methods for synthesizing presentation content and audio-visual content are disclosed as exemplary implementations of the present invention.
I/We Claim:
1. A method for synthesizing multimedia presentation content comprising:
analyzing frames of at least one of a presentation content data and audio-visual content data to identify an inactive frame;
computing a time duration of the inactive frame;
segmenting the inactive frame based on the time duration and at least one of the presentation content data and audio-visual content data; and
creating a multimedia presentation content, wherein the multimedia presentation content comprises at least one frame of the presentation content data and at least one frame of the audio-visual content data present within the time duration.
2. The method as claimed in claim 1, wherein the analyzing further comprises comparing at least one frame of at least one of the presentation content data and audio-visual content data with a subsequent frame of at least one of the presentation content data and audio-visual content data, respectively.
3. The method as claimed in claim 1, wherein the segmenting further comprises calculating a display time for at least one of the presentation content data and audio-visual content data based at least on one of predefined rules and user defined inputs.
4. The method as claimed in claim 1, further comprising synchronizing the presentation content data and the audio-visual content data.
5. The method as claimed in claim 1, further comprising providing the multimedia presentation content to a viewer.
6. A content synthesis system (102) comprising:
a processor (202);
a memory (206) coupled to the processor (202), wherein the memory (206) comprises,
an analysis module (108) configured to:
analyze frames of at least one of a presentation content data (220) and audio-visual content data (222) to identify an inactive frame; and
segment the inactive frame based at least on one of the presentation content data (220) and audio-visual content data (222); and
a synthesis module (110) configured to create a synthesized content data (224) based on the segmented inactive frame, wherein the synthesized content data (224) comprises at least one frame of the presentation content data (220) and at least one frame of the audio-visual content data (222) present within the inactive frame such that a display time of the at least one frame of the presentation content data (220) and the audio-visual content data (222) is based at least in part on a time duration of the inactive frame.
7. The content synthesis system (102) as claimed in claim 6, further comprising a content analyzer (212) configured to compute the time duration of the inactive frame.
8. The content synthesis system (102) as claimed in claim 7, wherein the content analyzer (212) analyzes at least one of the presentation content data (220) and the audio-visual content data (222) to determine the display time of at least one of the presentation content data (220) and audio-visual content data (222).
9. The content synthesis system (102) as claimed in claim 6, further comprising a segmentation module (214) configured to segment the inactive frame in a plurality of segments to provide presentation content data (220) and audio-visual content data (222) alternately.
10. The content synthesis system (102) as claimed in claim 6, wherein the synthesis module (110) is further configured to identify at least one synchronization point between the presentation content data (220) and the audio-visual content data (222).
11. The content synthesis system (102) as claimed in claim 7, wherein the content analyzer (212) classifies the presentation content data (220) into different levels of complexity.
12. The content synthesis system as claimed in claim 7, wherein the content analyzer (212) is further configured to compare at least one frame of the presentation content data (220) and a subsequent frame of the presentation content data (220) to identify the inactive frame.
13. The system (102) as claimed in claim 6, wherein the synthesized content data (224) provides presentation content data (220) and audio-visual content data (222) alternately.
14. The content synthesis system (102) as claimed in claim 6, further comprising a user interaction module (216) configured to analyze user input for providing at least one of the presentation content data (220) and the audio-visual content data (222).