System And Method For Real Time Transcription Of A Web Conference For Task Execution

Abstract: A system (100) for real-time transcription of a web-conference is disclosed. A media data receiving module (110) receives media data corresponding to the web-conference. A speech recognition module (120) processes the media data into a pre-defined format and recognizes speech from the processed media data. A keyword identification module (130) extracts one or more first set of keywords from the speech recognized and generates one or more transcripts from the one or more first set of keywords extracted. A transcript broadcasting module (140) broadcasts the one or more transcripts on a display interface in real-time for one or more user associated activities. A workflow mechanization module (150) extracts one or more second set of keywords from the transcript, compares the one or more second set of keywords with one or more predetermined keywords of the transcript associated with one or more domain specific tasks, and triggers a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks. FIG. 1

Patent Information

Application #

Filing Date

04 June 2021

Publication Number

49/2022

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status


Parent Application

Applicants

ASTI INFOTECH PRIVATE LIMITED

NO 90, MANJUNATH KANNIKA (MANKA), GROUND FLOOR, 2ND MAIN, ELECTRONIC CITY PHASE 1, BANGALORE, 560100, KARNATAKA, INDIA

Inventors

1. MAHENDRA PRATAP CHOUDHARY

VENI- 201, SJR VERITY, KASAVANAHALLI, HOSA ROAD, BENGALURU RURAL, 560035, KARNATAKA, INDIA

2. MANDEEP SINGH

HOUSE NO. 73, WARD NO.-6, NEAR HANUMAN MANDIR, DUGAL KALAN, PATRAN, PATIALA, 147105, PUNJAB, INDIA

3. SONAL MALHOTRA

HOUSE NO. 208, WARD NO. 14, PREMNAGAR, PATHAKHEDA, BETUL, 460449, MADHYA PRADESH INDIA

4. NAVIN MISTRY

E-904, WESTERNHILLS PHASE 2. S.N. 45/1, NEAR BELA CASA, BANER-SUS, PUNE, 411021, MAHARASHTRA, INDIA

Claims

1. A system (100) for real-time transcription of a web-conference for a task execution comprising: a processing subsystem (105) hosted on a server (108), wherein the processing subsystem (105) is configured to execute on a network to control bidirectional communications among a plurality of modules comprising: a media data receiving module (110) configured to receive media data corresponding to the web-conference, wherein the media data comprises a video content or an audio content; a speech recognition module (120) operatively coupled to the media data receiving module (110), wherein the speech recognition module (120) is configured to: process the media data received into a pre-defined format using one or more speech processing techniques; and recognize speech from processed media data of the predefined format in real-time by using a speech recognition technique; a keyword identification module (130) operatively coupled to the speech recognition module (120), wherein the keyword identification module (130) is configured to: extract one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques; and generate one or more transcripts from the one or more first set of keywords extracted; a transcript broadcasting module (140) operatively coupled to the keyword identification module (130), wherein the transcript broadcasting module (140) is configured to broadcast the one or more transcripts on a display interface in real-time for one or more user associated activities; and a workflow mechanization module (150) operatively coupled to the transcript broadcasting module (140), wherein the workflow mechanization module (150) is configured to: extract one or more second set of keywords from the one or more transcripts based on the one or more user associated activities; compare the one or more second set of keywords with one or more predetermined keywords of the one or more transcripts associated with one or more domain specific tasks; and trigger a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks upon comparison of the one or more second set of keywords with the one or more predetermined keywords.

2. The system (100) as claimed in claim 1, wherein the media data corresponding to the web-conference is received from one or more media recording devices.

3. The system (100) as claimed in claim 1, wherein the one or more speech processing techniques comprises at least one of dynamic time warping technique, hidden Markov model, artificial neural network or a phase ware processing technique.

4. The system (100) as claimed in claim 1, wherein the plurality of keyword spotting factors comprises at least one of topic detection, fast audio search, voice enablement or a combination thereof.

5. The system (100) as claimed in claim 1, wherein the one or more keyword spotting techniques comprises at least one of a large vocabulary continuous speech recognition technique, an acoustic keyword spotting technique, a phonetic search keyword spotting technique or a combination thereof.

6. The system (100) as claimed in claim 1, wherein the one or more user associated activities comprises at least one of searching of one or more words from the transcript using external hyperlinks, selection of one or more words from the transcript, one or more advertisements recommendation for the user or a combination thereof.

7. The system (100) as claimed in claim 1, wherein the speech recognition module (120) is configured to determine a voiceprint of speaker in the web-conference from the media data based on a voice waveform of the speaker.

8. The system (100) as claimed in claim 1, wherein the keyword identification module (130) is configured to interpret the speech recognized for extraction of the one or more first set of keywords using a natural language processing technique.

9. A method (300) comprising: receiving, by a media data receiving module of a processing subsystem, media data corresponding to the web-conference, wherein the media data comprises a video content or an audio content (310); processing, by a speech recognition module of the processing subsystem, the media data received into a pre-defined format using one or more speech processing techniques (320); recognizing, by the speech recognition module of the processing subsystem, speech from processed media data of the predefined format in real-time by using a speech recognition technique (330); extracting, by a keyword identification module of the processing subsystem, one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques (340); generating, by the keyword identification module of the processing subsystem, one or more transcripts from the one or more first set of keywords extracted (350); broadcasting, by a transcript broadcasting module of the processing subsystem, the one or more transcripts on a display interface in real-time for one or more user associated activities (360); extracting, by a workflow mechanization module of the processing subsystem, one or more second set of keywords from the transcript based on the one or more user associated activities (370); comparing, by the workflow mechanization module of the processing subsystem, the one or more second set of keywords with one or more predetermined keywords of the transcript associated with one or more domain specific tasks (380); and triggering, by the workflow mechanization module of the processing subsystem, a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks upon comparison of the one or more second set of keywords with the one or more predetermined keywords (390). 
Dated this 04th day of June 2021

Signature: Harish Naidu, Patent Agent (IN/PA-2896), Agent for the Applicant

Description:

BACKGROUND

[0001] Embodiments of the present disclosure relate to a transcription system and more particularly to a system and a method for real-time transcription of a web-conference for task execution.

[0002] Web conferencing is often used for business and personal purposes as an effective and convenient communication method that bypasses the need to physically travel to a location to have a face-to-face conversation. Web conferences such as audio conferences or video conferences are becoming increasingly popular because they can simultaneously connect hundreds of people from anywhere in the world to a live and continuous conversation. Like any conversation, however, video conferences may be impeded by language barriers, unrecognizable accents, fast speaking, or the chance that attendees arrive late to a multi-person conference and miss what was previously discussed. Various systems are available which help in analysis of the conversation in web-conferences by creating transcripts.

[0003] Conventionally, the system available for web-conference transcription converts the video or the audio content into text format. However, such a conventional system involves manual intervention in transcript formation, which is time consuming and compromises the accuracy of the transcription process. Also, such a conventional system is unable to perform real-time transcription from the media content. Moreover, such a conventional system generates large transcripts covering the entire conference, which further increases the storage space required for the transcripts.

[0004] Hence, there is a need for an improved system and a method for real-time transcription of a web-conference for task execution in order to address the aforementioned issues.
BRIEF DESCRIPTION

[0005] In accordance with an embodiment of the present disclosure, a system for real-time transcription of a web-conference for task execution is disclosed. The system includes a processing subsystem hosted on a server, wherein the processing subsystem is configured to execute on a network to control bidirectional communications among a plurality of modules. The processing subsystem includes a media data receiving module configured to receive media data corresponding to the web-conference, wherein the media data comprises a video content or an audio content. The processing subsystem also includes a speech recognition module operatively coupled to the media data receiving module. The speech recognition module is configured to process the media data received into a pre-defined format using one or more speech processing techniques. The speech recognition module is also configured to recognize speech from processed media data of the predefined format in real-time by using a speech recognition technique. The processing subsystem also includes a keyword identification module operatively coupled to the speech recognition module. The keyword identification module is configured to extract one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques. The keyword identification module is also configured to generate one or more transcripts from the one or more first set of keywords extracted. The processing subsystem also includes a transcript broadcasting module operatively coupled to the keyword identification module. The transcript broadcasting module is configured to broadcast the one or more transcripts on a display interface in real-time for one or more user associated activities. The processing subsystem also includes a workflow mechanization module operatively coupled to the transcript broadcasting module.
The workflow mechanization module is configured to extract one or more second set of keywords from the transcript based on the one or more user associated activities. The workflow mechanization module is also configured to compare the one or more second set of keywords with one or more predetermined keywords of the transcript associated with one or more domain specific tasks. The workflow mechanization module is also configured to trigger a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks upon comparison of the one or more second set of keywords with the one or more predetermined keywords. [0006] In accordance with another embodiment of the present disclosure, a method for real-time transcription of a web-conference for task execution is disclosed. The method includes receiving, by a media data receiving module of a processing subsystem, media data corresponding to the web-conference, wherein the media data comprises a video content or an audio content. The method also includes processing, by a speech recognition module of the processing subsystem, the media data received into a pre-defined format using one or more speech processing techniques. The method also includes recognizing, by the speech recognition module of the processing subsystem, speech from processed media data of the predefined format in real-time by using a speech recognition technique. The method also includes extracting, by a keyword identification module of the processing subsystem, one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques. The method also includes generating, by the keyword identification module of the processing subsystem, one or more transcripts from the one or more first set of keywords extracted. 
The method also includes broadcasting, by a transcript broadcasting module of the processing subsystem, the one or more transcripts on a display interface in real-time for one or more user associated activities. The method also includes extracting, by a workflow mechanization module of the processing subsystem, one or more second set of keywords from the transcript based on the one or more user associated activities. The method also includes comparing, by the workflow mechanization module of the processing subsystem, the one or more second set of keywords with one or more predetermined keywords of the transcript associated with one or more domain specific tasks. The method also includes triggering, by the workflow mechanization module of the processing subsystem, a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks upon comparison of the one or more second set of keywords with the one or more predetermined keywords.

[0007] To further clarify the advantages and features of the present disclosure, a more particular description of the disclosure will follow by reference to specific embodiments thereof, which are illustrated in the appended figures. It is to be appreciated that these figures depict only typical embodiments of the disclosure and are therefore not to be considered limiting in scope. The disclosure will be described and explained with additional specificity and detail with the appended figures.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosure will be described and explained with additional specificity and detail with the accompanying figures in which:

[0008] FIG. 1 is a block diagram of a system for real-time transcription of a web-conference for task execution in accordance with an embodiment of the present disclosure;

[0009] FIG. 2 is a schematic representation of an exemplary embodiment of a system for real-time transcription of a web-conference for task execution of FIG. 1 in accordance with an embodiment of the present disclosure;

[0010] FIG. 3 is a block diagram of a computer or a server in accordance with an embodiment of the present disclosure; and

[0011] FIG. 4 (a) and FIG. 4 (b) are flow charts representing the steps involved in a method for real-time transcription of a web-conference for task execution in accordance with an embodiment of the present disclosure.

[0012] Further, those skilled in the art will appreciate that elements in the figures are illustrated for simplicity and may not have necessarily been drawn to scale. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the figures by conventional symbols, and the figures may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the figures with details that will be readily apparent to those skilled in the art having the benefit of the description herein.

DETAILED DESCRIPTION

[0013] For the purpose of promoting an understanding of the principles of the disclosure, reference will now be made to the embodiment illustrated in the figures and specific language will be used to describe them. It will nevertheless be understood that no limitation of the scope of the disclosure is thereby intended. Such alterations and further modifications in the illustrated system, and such further applications of the principles of the disclosure as would normally occur to those skilled in the art are to be construed as being within the scope of the present disclosure.

[0014] The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such a process or method.
Similarly, one or more devices or sub-systems or elements or structures or components preceded by "comprises... a" does not, without more constraints, preclude the existence of other devices, sub-systems, elements, structures, components, additional devices, additional sub-systems, additional elements, additional structures or additional components. Appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but not necessarily do, all refer to the same embodiment. [0015] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this disclosure belongs. The system, methods, and examples provided herein are only illustrative and not intended to be limiting. [0016] In the following specification and the claims, reference will be made to a number of terms, which shall be defined to have the following meanings. The singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise. [0017] Embodiments of the present disclosure relate to a system and a method for real-time transcription of a web-conference for task execution. The system includes a processing subsystem hosted on a server, wherein the processing subsystem is configured to execute on a network to control bidirectional communications among a plurality of modules. The processing subsystem includes a media data receiving module configured to receive media data corresponding to the web-conference, wherein the media data comprises a video content or an audio content. The processing subsystem also includes a speech recognition module operatively coupled to the media data receiving module. The speech recognition module is configured to process the media data received into a pre-defined format using one or more speech processing techniques. 
The speech recognition module is also configured to recognize speech from processed media data of the predefined format in real-time by using a speech recognition technique. The processing subsystem also includes a keyword identification module operatively coupled to the speech recognition module. The keyword identification module is configured to extract one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques. The keyword identification module is also configured to generate one or more transcripts from the one or more first set of keywords extracted. The processing subsystem also includes a transcript broadcasting module operatively coupled to the keyword identification module. The transcript broadcasting module is configured to broadcast the one or more transcripts on a display interface in real-time for one or more user associated activities. The processing subsystem also includes a workflow mechanization module operatively coupled to the transcript broadcasting module. The workflow mechanization module is configured to extract one or more second set of keywords from the transcript based on the one or more user associated activities. The workflow mechanization module is also configured to compare the one or more second set of keywords with one or more predetermined keywords of the transcript associated with one or more domain specific tasks. The workflow mechanization module is also configured to trigger a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks upon comparison of the one or more second set of keywords with the one or more predetermined keywords. [0018] FIG. 1 is a block diagram of a system (100) for real-time transcription of a web-conference for task execution in accordance with an embodiment of the present disclosure. 
The system (100) includes a processing subsystem (105) hosted on a server (108). In one embodiment, the server (108) may include a cloud server. In another embodiment, the server (108) may include a local server. The processing subsystem (105) is configured to execute on a network (not shown in FIG. 1) to control bidirectional communications among a plurality of modules. In one embodiment, the network may include a wired network such as a local area network (LAN). In another embodiment, the network may include a wireless network such as Wi-Fi, Bluetooth, Zigbee, near field communication (NFC), infra-red communication, radio frequency identification (RFID) or the like.

[0019] The processing subsystem (105) includes a media data receiving module (110) configured to receive media data corresponding to the web-conference, wherein the media data includes a video content or an audio content. In one embodiment, the media data corresponding to the web-conference is received from one or more media recording devices. In such embodiment, the one or more media recording devices may include, but are not limited to, a voice recorder of an electronic device, an inbuilt microphone of an electronic device, an image acquisition device and the like. In such embodiment, the electronic device may include at least one of a laptop, a desktop, a mobile phone, a personal digital assistant (PDA), an electronic tablet and the like.

[0020] The processing subsystem (105) also includes a speech recognition module (120) operatively coupled to the media data receiving module (110). The speech recognition module (120) is configured to process the media data received into a pre-defined format using one or more speech processing techniques. As used herein, the term ‘one or more speech processing techniques’ is defined as a technique which deals with the study and processing of speech signals in digital representation.
In one embodiment, the one or more speech processing techniques may include at least one of dynamic time warping technique, hidden Markov model, artificial neural network or a phase ware processing technique. The speech recognition module (120) is also configured to determine a voiceprint of a speaker in the web-conference from the media data based on a voice waveform of the speaker. The speech recognition module (120) is also configured to recognize speech from processed media data of the predefined format in real-time by using a speech recognition technique. As used herein, the term ‘speech recognition technique’ is defined as a technique which enables a machine to identify words spoken aloud and convert them into readable text.

[0021] The processing subsystem (105) also includes a keyword identification module (130) operatively coupled to the speech recognition module (120). In a particular embodiment, the keyword identification module (130) is configured to interpret the speech recognized for extraction of the one or more first set of keywords using a natural language processing (NLP) technique. The keyword identification module (130) is configured to extract one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques. As used herein, the term ‘one or more first set of keywords’ is defined as one or more terms or phrases recognized from speech uttered in the media data upon speech recognition. In one embodiment, the plurality of keyword spotting factors may include at least one of topic detection, fast audio search, voice enablement or a combination thereof. In such embodiment, the one or more keyword spotting techniques may include at least one of a large vocabulary continuous speech recognition (LVCSR) technique, an acoustic keyword spotting technique, a phonetic search keyword spotting technique or a combination thereof.
The keyword identification module (130) is also configured to generate one or more transcripts from the one or more first set of keywords extracted. As used herein, the term ‘one or more transcripts’ is defined as one or more records or phrases of the speech uttered by a speaker in a video or an audio. [0022] The processing subsystem (105) also includes a transcript broadcasting module (140) operatively coupled to the keyword identification module (130). The transcript broadcasting module (140) is configured to broadcast the one or more transcripts on a display interface in real-time for one or more user associated activities. In one embodiment, the display interface may include a graphical user interface (GUI) of the electronic device. In some embodiment, the one or more user associated activities may include, but not limited to, at least one of searching of one or more words from the transcript using external hyperlinks, selection of one or more words from the transcript, one or more advertisements recommendation for the user or a combination thereof. [0023] The processing subsystem (105) also includes a workflow mechanization module (150) operatively coupled to the transcript broadcasting module (140). The workflow mechanization module (150) is configured to extract one or more second set of keywords from the one or more transcripts based on the one or more user associated activities. As used herein, the term ‘one or more second set of keywords’ is defined as one or more significant words which are extracted from the transcript automatically based on some criteria. [0024] The workflow mechanization module (150) is also configured to compare the one or more second set of keywords with one or more predetermined keywords of the transcript associated with one or more domain specific tasks. 
Similarly, the term ‘one or more predetermined keywords’ is defined as one or more keywords which are already selected corresponding to any particular task and prestored in a database for future reference. The workflow mechanization module (150) is also configured to trigger a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks upon comparison of the one or more second set of keywords with the one or more predetermined keywords. In a specific embodiment, the one or more domain specific tasks may include, but are not limited to, a bank specific task, an industrial task, a medical assistance task, a cooking instruction task, an educational institution specific task and the like. In such embodiment, the bank specific task may include an issue of a chequebook, a balance check task and the like. In another embodiment, the industrial task may include an instruction for running a software code, an instruction for switching off a control system of a machine and the like. In yet another embodiment, the medical assistance task may include an instruction for initiating a diagnostic test, an instruction for initiating a surgery and the like. In one embodiment, the cooking instruction task may include addition of a specific ingredient, switching off a gas stove and the like.

[0025] FIG. 2 is a schematic representation of an exemplary embodiment of a system (100) for real-time transcription of a web-conference for task execution of FIG. 1 in accordance with an embodiment of the present disclosure. Consider an example in which the system (100) is utilized by a bank. Assume that an employee of the bank is having an online conversation with a client related to one or more banking offers. Such an online conversation between the client and the employee of the bank is critical and, if recorded, helps in further processes such as automation of a task execution.
To utilize the conversation of the web-conference, a real-time transcription process is essential so that the conversation can be used further for one or more purposes.

[0026] To receive the media data corresponding to the web-conference, a media data receiving module (110) of the system (100) is utilized. The media data receiving module (110) is located on a processing subsystem (105) which is hosted on a cloud server (108). In the example used herein, the media data includes an audio content. Upon receiving the media data, a speech recognition module (120) of the processing subsystem (105) processes the media data received into a pre-defined format using one or more speech processing techniques. For example, the one or more speech processing techniques may include at least one of dynamic time warping technique, hidden Markov model, artificial neural network or a phase ware processing technique. Once the media data is processed, the speech recognition module (120) recognizes speech from processed media data of the predefined format in real-time by using a speech recognition technique.

[0027] Based on the speech recognized, a keyword identification module (130) of the processing subsystem (105) interprets the speech for extraction of the one or more first set of keywords using a natural language processing (NLP) technique. Also, the keyword identification module (130) extracts one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques. For example, the one or more keyword spotting techniques may include at least one of a large vocabulary continuous speech recognition (LVCSR) technique, an acoustic keyword spotting technique, a phonetic search keyword spotting technique or a combination thereof. Further, the keyword identification module (130) is also configured to generate one or more transcripts from the one or more first set of keywords extracted.
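The extraction and transcript generation steps of paragraph [0027] can be illustrated with a minimal sketch. It is not the disclosed implementation: it assumes that speech has already been recognized into text, and it substitutes a simple stop-word filter for the keyword spotting techniques named above (LVCSR, acoustic or phonetic search), which operate on audio. The names `STOP_WORDS`, `TranscriptEntry`, `extract_keywords` and `generate_transcript` are hypothetical and do not appear in the disclosure.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical stop-word list (an assumption for illustration only).
STOP_WORDS = {
    "a", "an", "the", "i", "to", "of", "for", "and", "is", "my",
    "would", "like", "please", "you", "can", "me", "on", "in",
}


@dataclass
class TranscriptEntry:
    """One transcript record: who spoke and which keywords were kept."""
    speaker: str
    keywords: List[str]


def extract_keywords(recognized_text: str) -> List[str]:
    """Return a first set of keywords from already-recognized speech text.

    Text-level stand-in for keyword spotting: lower-case the text, split it
    into word tokens, drop stop words, and keep each remaining word once,
    in order of first appearance.
    """
    tokens = re.findall(r"[a-z0-9']+", recognized_text.lower())
    seen = set()
    keywords = []
    for token in tokens:
        if token not in STOP_WORDS and token not in seen:
            seen.add(token)
            keywords.append(token)
    return keywords


def generate_transcript(utterances: List[Tuple[str, str]]) -> List[TranscriptEntry]:
    """Build one transcript entry per (speaker, recognized_text) pair."""
    return [
        TranscriptEntry(speaker=speaker, keywords=extract_keywords(text))
        for speaker, text in utterances
    ]


if __name__ == "__main__":
    # Invented utterances in the spirit of the bank example.
    conversation = [
        ("client", "I would like to issue a new chequebook, please."),
        ("employee", "The balance check is done."),
    ]
    for entry in generate_transcript(conversation):
        print(entry.speaker, entry.keywords)
```

Keeping only keywords rather than every recognized word is consistent with the stated aim of avoiding large transcripts of the entire conference, although the disclosure does not specify how the transcript is stored.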
[0028] Once the one or more transcripts are generated, a transcript broadcasting module (140) is configured to broadcast the one or more transcripts on a display interface in real-time for one or more user associated activities. For example, the display interface may include a graphical user interface (GUI) of the electronic device. Here, the one or more user associated activities may include, but are not limited to, at least one of searching of one or more words from the transcript using external hyperlinks, selection of one or more words from the transcript, one or more advertisements recommendation for the user or a combination thereof.

[0029] Based on the one or more user associated activities, a workflow mechanization module (150) is configured to extract one or more second set of keywords from the one or more transcripts. The workflow mechanization module (150) is also configured to compare the one or more second set of keywords with one or more predetermined keywords of the one or more transcripts associated with one or more domain specific tasks. Further, upon comparison of the one or more second set of keywords with the one or more predetermined keywords, the workflow mechanization module (150) is configured to trigger a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks. In the example used herein, as the domain specific task is associated with the bank, an example of such a task includes an issue of a chequebook for the client in order to avail an offer, a balance check task for providing the offer to the client and the like. Thus, by automating the real-time transcription process from the web-conference, the system (100) further helps in execution of several domain specific tasks in various sectors in a faster and more accurate way.

[0030] FIG. 3 is a block diagram of a computer or a server in accordance with an embodiment of the present disclosure.
The server (200) includes processor(s) (230), and memory (210) operatively coupled to the bus (220). The processor(s) (230), as used herein, means any type of computational circuit, such as, but not limited to, a microprocessor, a microcontroller, a complex instruction set computing microprocessor, a reduced instruction set computing microprocessor, a very long instruction word microprocessor, an explicitly parallel instruction computing microprocessor, a digital signal processor, or any other type of processing circuit, or a combination thereof. [0031] The memory (210) includes several subsystems stored in the form of executable program which instructs the processor (230) to perform the method steps illustrated in FIG. 1. The memory (210) includes a processing subsystem (105) of FIG.1. The processing subsystem (105) further has following modules: a media data receiving module (110), a speech recognition module (120), a keyword identification module (130), a transcript broadcasting module (140) and a workflow mechanization module (150). [0032] The media data receiving module (110) is configured to receive media data corresponding to the web-conference, wherein the media data comprises a video content or an audio content. The speech recognition module (120) is configured to process the media data received into a pre-defined format using one or more speech processing techniques. The speech recognition module (120) is also configured to recognize speech from processed media data of the predefined format in real-time by using a speech recognition technique. The keyword identification module (130) is configured to extract one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques. The keyword identification module (130) is also configured to generate one or more transcripts from the one or more first set of keywords extracted. 
The transcript broadcasting module (140) is configured to broadcast the one or more transcripts on a display interface in real-time for one or more user associated activities. The workflow mechanization module (150) is configured to extract one or more second set of keywords from the transcript based on the one or more user associated activities. The workflow mechanization module (150) is also configured to compare the one or more second set of keywords with one or more predetermined keywords of the transcript associated with one or more domain specific tasks. The workflow mechanization module (150) is also configured to trigger a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks upon comparison of the one or more second set of keywords with the one or more predetermined keywords.
[0033] The bus (220) as used herein refers to internal memory channels or a computer network used to connect computer components and transfer data between them. The bus (220) includes a serial bus or a parallel bus, wherein the serial bus transmits data in bit-serial format and the parallel bus transmits data across multiple wires. The bus (220) as used herein may include, but is not limited to, a system bus, an internal bus, an external bus, an expansion bus, a frontside bus, a backside bus and the like.
[0034] FIG. 4 (a) and FIG. 4 (b) is a flow chart representing the steps involved in a method (300) for real-time transcription of a web-conference for task execution in accordance with an embodiment of the present disclosure. The method (300) includes receiving, by a media data receiving module of a processing subsystem, media data corresponding to the web-conference, wherein the media data comprises a video content or an audio content in step 310. In one embodiment, receiving the media data corresponding to the web-conference may include receiving the media data from one or more media recording devices. 
In such embodiment, the one or more media recording devices may include, but are not limited to, a voice recorder of an electronic device, an inbuilt microphone of an electronic device, an image acquisition device and the like.
[0035] The method (300) also includes processing, by a speech recognition module of the processing subsystem, the media data received into a pre-defined format using one or more speech processing techniques in step 320. In one embodiment, the one or more speech processing techniques used for processing the media data received into the pre-defined format may include at least one of dynamic time warping technique, hidden Markov model, artificial neural network or a phase ware processing technique. The method (300) also includes recognizing, by the speech recognition module of the processing subsystem, speech from processed media data of the predefined format in real-time by using a speech recognition technique in step 330.
[0036] The method (300) also includes extracting, by a keyword identification module of the processing subsystem, one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques in step 340. In one embodiment, extracting the one or more first set of keywords from the speech recognized may include extracting the one or more first set of keywords based on at least one of topic detection, fast audio search, voice enablement or a combination thereof. In such embodiment, extracting the one or more first set of keywords from the speech may include extracting the one or more first set of keywords using at least one of a large vocabulary continuous speech recognition (LVCSR) technique, an acoustic keyword spotting technique, a phonetic search keyword spotting technique or a combination thereof. 
The method (300) also includes generating, by the keyword identification module of the processing subsystem, one or more transcripts from the one or more first set of keywords extracted in step 350.
[0037] The method (300) also includes broadcasting, by a transcript broadcasting module of the processing subsystem, the one or more transcripts on a display interface in real-time for one or more user associated activities in step 360. In one embodiment, the one or more user associated activities for which the one or more transcripts are broadcast on the display interface in real-time may include, but are not limited to, at least one of searching of one or more words from the transcript using external hyperlinks, selection of one or more words from the transcript, one or more advertisements recommendation for the user or a combination thereof.
[0038] The method (300) also includes extracting, by a workflow mechanization module of the processing subsystem, one or more second set of keywords from the transcript based on the one or more user associated activities in step 370. The method (300) also includes comparing, by the workflow mechanization module of the processing subsystem, the one or more second set of keywords with one or more predetermined keywords of the transcript associated with one or more domain specific tasks in step 380. In one embodiment, the one or more domain specific tasks with which the one or more predetermined keywords of the transcript are associated may include, but are not limited to, a bank specific task, an industrial task, a medical assistance task, a cooking instruction task, an educational institution specific task and the like. 
[0039] The method (300) also includes triggering, by the workflow mechanization module of the processing subsystem, a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks upon comparison of the one or more second set of keywords with the one or more predetermined keywords in step 390.
[0040] Various embodiments of the present disclosure relate to a system that automatically transcribes the web-conference in real-time, which not only makes the transcription process faster but also eliminates manual intervention.
[0041] Moreover, the present disclosed system generates smaller transcripts from the speech uttered in the web-conference, which reduces the storage required for the transcripts and thereby addresses the problem of space complexity.
[0042] Furthermore, the present disclosed system also automatically initiates a workflow for execution of a particular task based on the transcripts generated in real-time. Hence, the system helps in automating the workflow based on the speech uttered in the web-conference through the transcription process.
[0043] It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the disclosure and are not intended to be restrictive thereof.
[0044] While specific language has been used to describe the disclosure, any limitations arising on account of the same are not intended. As would be apparent to a person skilled in the art, various working modifications may be made to the method in order to implement the inventive concept as taught herein.
[0045] The figures and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. 
Elements from one embodiment may be added to another embodiment. For example, the order of processes described herein may be changed and is not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts need to be necessarily performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples.

Specification

Claims:
1. A system (100) for real-time transcription of a web-conference for a task execution comprising:
a processing subsystem (105) hosted on a server (108), wherein the processing subsystem (105) is configured to execute on a network to control bidirectional communications among a plurality of modules comprising:
a media data receiving module (110) configured to receive media data corresponding to the web-conference, wherein the media data comprises a video content or an audio content;
a speech recognition module (120) operatively coupled to the media data receiving module (110), wherein the speech recognition module (120) is configured to:
process the media data received into a pre-defined format using one or more speech processing techniques; and
recognize speech from processed media data of the predefined format in real-time by using a speech recognition technique;
a keyword identification module (130) operatively coupled to the speech recognition module (120), wherein the keyword identification module (130) is configured to:
extract one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques; and
generate one or more transcripts from the one or more first set of keywords extracted;
a transcript broadcasting module (140) operatively coupled to the keyword identification module (130), wherein the transcript broadcasting module (140) is configured to broadcast the one or more transcripts on a display interface in real-time for one or more user associated activities; and
a workflow mechanization module (150) operatively coupled to the transcript broadcasting module (140), wherein the workflow mechanization module (150) is configured to:
extract one or more second set of keywords from the one or more transcripts based on the one or more user associated activities;
compare the one or more second set of keywords with one or more predetermined keywords of the one or more transcripts associated with one or more domain specific tasks; and
trigger a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks upon comparison of the one or more second set of keywords with the one or more predetermined keywords.
2. The system (100) as claimed in claim 1, wherein the media data corresponding to the web-conference is received from one or more media recording devices.
3. The system (100) as claimed in claim 1, wherein the one or more speech processing techniques comprises at least one of dynamic time warping technique, hidden Markov model, artificial neural network or a phase ware processing technique.
4. The system (100) as claimed in claim 1, wherein the plurality of keyword spotting factors comprises at least one of topic detection, fast audio search, voice enablement or a combination thereof.
5. The system (100) as claimed in claim 1, wherein the one or more keyword spotting techniques comprises at least one of a large vocabulary continuous speech recognition technique, an acoustic keyword spotting technique, a phonetic search keyword spotting technique or a combination thereof.
6. The system (100) as claimed in claim 1, wherein the one or more user associated activities comprises at least one of searching of one or more words from the transcript using external hyperlinks, selection of one or more words from the transcript, one or more advertisements recommendation for the user or a combination thereof.
7. The system (100) as claimed in claim 1, wherein the speech recognition module (120) is configured to determine a voiceprint of a speaker in the web-conference from the media data based on a voice waveform of the speaker.
8. The system (100) as claimed in claim 1, wherein the keyword identification module (130) is configured to interpret the speech recognized for extraction of the one or more first set of keywords using a natural language processing technique.
9. A method (300) comprising:
receiving, by a media data receiving module of a processing subsystem, media data corresponding to the web-conference, wherein the media data comprises a video content or an audio content (310);
processing, by a speech recognition module of the processing subsystem, the media data received into a pre-defined format using one or more speech processing techniques (320);
recognizing, by the speech recognition module of the processing subsystem, speech from processed media data of the predefined format in real-time by using a speech recognition technique (330);
extracting, by a keyword identification module of the processing subsystem, one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques (340);
generating, by the keyword identification module of the processing subsystem, one or more transcripts from the one or more first set of keywords extracted (350);
broadcasting, by a transcript broadcasting module of the processing subsystem, the one or more transcripts on a display interface in real-time for one or more user associated activities (360);
extracting, by a workflow mechanization module of the processing subsystem, one or more second set of keywords from the transcript based on the one or more user associated activities (370);
comparing, by the workflow mechanization module of the processing subsystem, the one or more second set of keywords with one or more predetermined keywords of the transcript associated with one or more domain specific tasks (380); and
triggering, by the workflow mechanization module of the processing subsystem, a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks upon comparison of the one or more second set of keywords with the one or more predetermined keywords (390).

Dated this 04th day of June 2021

Signature

Harish Naidu
Patent Agent (IN/PA-2896)
Agent for the Applicant
Description:
BACKGROUND
[0001] Embodiments of the present disclosure relate to a transcription system and more particularly to a system and a method for real-time transcription of a web-conference for task execution.
[0002] Web conferencing is often used for business and personal purposes as an effective and convenient communication method that bypasses the need to physically travel to a location to have a face-to-face conversation. Web conferences such as audio conferences or video conferences are becoming increasingly popular because they can simultaneously connect hundreds of people from anywhere in the world to a live, continuous conversation. Like any conversation, however, video conferences may be impeded by language barriers, unrecognizable accents, fast speaking, or the chance that attendees arrive late to a multi-person conference and miss what was previously discussed. Various systems are available which help in analysis of the conversation in web-conferences by creating transcripts.
[0003] Conventionally, the system available for web-conference transcription converts the video or the audio content into text format. However, such a conventional system involves manual intervention in transcript formation, which is time consuming and compromises the accuracy of the transcription process. Also, such a conventional system is unable to perform real-time transcription from the media content. Moreover, such a conventional system generates large transcripts for the entire conference, which further increases space complexity for storage of the transcripts.
[0004] Hence, there is a need for an improved system and a method for real-time transcription of a web-conference for task execution in order to address the aforementioned issues.
BRIEF DESCRIPTION
[0005] In accordance with an embodiment of the present disclosure, a system for real-time transcription of a web-conference for task execution is disclosed. The system includes a processing subsystem hosted on a server, wherein the processing subsystem is configured to execute on a network to control bidirectional communications among a plurality of modules. The processing subsystem includes a media data receiving module configured to receive media data corresponding to the web-conference, wherein the media data comprises a video content or an audio content. The processing subsystem also includes a speech recognition module operatively coupled to the media data receiving module. The speech recognition module is configured to process the media data received into a pre-defined format using one or more speech processing techniques. The speech recognition module is also configured to recognize speech from processed media data of the predefined format in real-time by using a speech recognition technique. The processing subsystem also includes a keyword identification module operatively coupled to the speech recognition module. The keyword identification module is configured to extract one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques. The keyword identification module is also configured to generate one or more transcripts from the one or more first set of keywords extracted. The processing subsystem also includes a transcript broadcasting module operatively coupled to the keyword identification module. The transcript broadcasting module is configured to broadcast the one or more transcripts on a display interface in real-time for one or more user associated activities. The processing subsystem also includes a workflow mechanization module operatively coupled to the transcript broadcasting module. 
The workflow mechanization module is configured to extract one or more second set of keywords from the transcript based on the one or more user associated activities. The workflow mechanization module is also configured to compare the one or more second set of keywords with one or more predetermined keywords of the transcript associated with one or more domain specific tasks. The workflow mechanization module is also configured to trigger a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks upon comparison of the one or more second set of keywords with the one or more predetermined keywords.
[0006] In accordance with another embodiment of the present disclosure, a method for real-time transcription of a web-conference for task execution is disclosed. The method includes receiving, by a media data receiving module of a processing subsystem, media data corresponding to the web-conference, wherein the media data comprises a video content or an audio content. The method also includes processing, by a speech recognition module of the processing subsystem, the media data received into a pre-defined format using one or more speech processing techniques. The method also includes recognizing, by the speech recognition module of the processing subsystem, speech from processed media data of the predefined format in real-time by using a speech recognition technique. The method also includes extracting, by a keyword identification module of the processing subsystem, one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques. The method also includes generating, by the keyword identification module of the processing subsystem, one or more transcripts from the one or more first set of keywords extracted. The method also includes broadcasting, by a transcript broadcasting module of the processing subsystem, the one or more transcripts on a display interface in real-time for one or more user associated activities. The method also includes extracting, by a workflow mechanization module of the processing subsystem, one or more second set of keywords from the transcript based on the one or more user associated activities. The method also includes comparing, by the workflow mechanization module of the processing subsystem, the one or more second set of keywords with one or more predetermined keywords of the transcript associated with one or more domain specific tasks. 
The method also includes triggering, by the workflow mechanization module of the processing subsystem, a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks upon comparison of the one or more second set of keywords with the one or more predetermined keywords.
[0007] To further clarify the advantages and features of the present disclosure, a more particular description of the disclosure will follow by reference to specific embodiments thereof, which are illustrated in the appended figures. It is to be appreciated that these figures depict only typical embodiments of the disclosure and are therefore not to be considered limiting in scope. The disclosure will be described and explained with additional specificity and detail with the appended figures.
BRIEF DESCRIPTION OF THE DRAWINGS
The disclosure will be described and explained with additional specificity and detail with the accompanying figures in which:
[0008] FIG. 1 is a block diagram of a system for real-time transcription of a web-conference for task execution in accordance with an embodiment of the present disclosure;
[0009] FIG. 2 is a schematic representation of an exemplary embodiment of a system for real-time transcription of a web-conference for task execution of FIG. 1 in accordance with an embodiment of the present disclosure;
[0010] FIG. 3 is a block diagram of a computer or a server in accordance with an embodiment of the present disclosure; and
[0011] FIG. 4 (a) and FIG. 4 (b) is a flow chart representing the steps involved in a method for real-time transcription of a web-conference for task execution in accordance with an embodiment of the present disclosure.
[0012] Further, those skilled in the art will appreciate that elements in the figures are illustrated for simplicity and may not have necessarily been drawn to scale. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the figures by conventional symbols, and the figures may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the figures with details that will be readily apparent to those skilled in the art having the benefit of the description herein.
DETAILED DESCRIPTION
[0013] For the purpose of promoting an understanding of the principles of the disclosure, reference will now be made to the embodiment illustrated in the figures and specific language will be used to describe them. It will nevertheless be understood that no limitation of the scope of the disclosure is thereby intended. Such alterations and further modifications in the illustrated system, and such further applications of the principles of the disclosure as would normally occur to those skilled in the art are to be construed as being within the scope of the present disclosure.
[0014] The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such a process or method. Similarly, one or more devices or sub-systems or elements or structures or components preceded by "comprises... a" does not, without more constraints, preclude the existence of other devices, sub-systems, elements, structures, components, additional devices, additional sub-systems, additional elements, additional structures or additional components. Appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but not necessarily do, all refer to the same embodiment.
[0015] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this disclosure belongs. The system, methods, and examples provided herein are only illustrative and not intended to be limiting.
[0016] In the following specification and the claims, reference will be made to a number of terms, which shall be defined to have the following meanings. The singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise.
[0017] Embodiments of the present disclosure relate to a system and a method for real-time transcription of a web-conference for task execution. The system includes a processing subsystem hosted on a server, wherein the processing subsystem is configured to execute on a network to control bidirectional communications among a plurality of modules. The processing subsystem includes a media data receiving module configured to receive media data corresponding to the web-conference, wherein the media data comprises a video content or an audio content. The processing subsystem also includes a speech recognition module operatively coupled to the media data receiving module. The speech recognition module is configured to process the media data received into a pre-defined format using one or more speech processing techniques. The speech recognition module is also configured to recognize speech from processed media data of the predefined format in real-time by using a speech recognition technique. The processing subsystem also includes a keyword identification module operatively coupled to the speech recognition module. The keyword identification module is configured to extract one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques. The keyword identification module is also configured to generate one or more transcripts from the one or more first set of keywords extracted. The processing subsystem also includes a transcript broadcasting module operatively coupled to the keyword identification module. The transcript broadcasting module is configured to broadcast the one or more transcripts on a display interface in real-time for one or more user associated activities. The processing subsystem also includes a workflow mechanization module operatively coupled to the transcript broadcasting module. 
The workflow mechanization module is configured to extract one or more second set of keywords from the transcript based on the one or more user associated activities. The workflow mechanization module is also configured to compare the one or more second set of keywords with one or more predetermined keywords of the transcript associated with one or more domain specific tasks. The workflow mechanization module is also configured to trigger a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks upon comparison of the one or more second set of keywords with the one or more predetermined keywords.
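The compare-and-trigger behaviour of the workflow mechanization module may be illustrated by the following minimal sketch. It is not the disclosed implementation: the registry `PREDETERMINED`, the task names, and the functions `extract_second_keywords`, `match_tasks` and `trigger_workflows` are hypothetical names introduced only for illustration, and simple set containment is assumed as the comparison rule. The bank tasks follow the chequebook and balance check example of the disclosure.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set

# Hypothetical registry: predetermined keywords -> domain specific task name.
PREDETERMINED: Dict[FrozenSet[str], str] = {
    frozenset({"issue", "chequebook"}): "issue_chequebook",
    frozenset({"balance", "check"}): "balance_check",
}


def extract_second_keywords(transcript: str, selected: Iterable[str]) -> Set[str]:
    """Keep the user-selected words that actually occur in the transcript."""
    words = {w.strip(".,!?").lower() for w in transcript.split()}
    return {w.lower() for w in selected} & words


def match_tasks(second_keywords: Set[str]) -> List[str]:
    """Return each task whose predetermined keywords are all present."""
    return [task for required, task in PREDETERMINED.items()
            if required <= second_keywords]


def trigger_workflows(tasks: Iterable[str],
                      handlers: Dict[str, Callable[[], str]]) -> List[str]:
    """Run the handler registered for each matched task, if any."""
    return [handlers[task]() for task in tasks if task in handlers]


if __name__ == "__main__":
    transcript = "Please issue a chequebook for the client."
    keywords = extract_second_keywords(transcript, ["issue", "chequebook", "client"])
    tasks = match_tasks(keywords)
    print(trigger_workflows(tasks, {"issue_chequebook": lambda: "chequebook workflow started"}))
```

Requiring every predetermined keyword of a task to be present is one possible comparison rule; a threshold on the number of matching keywords would be an equally plausible choice.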
[0018] FIG. 1 is a block diagram of a system (100) for real-time transcription of a web-conference for task execution in accordance with an embodiment of the present disclosure. The system (100) includes a processing subsystem (105) hosted on a server (108). In one embodiment, the server (108) may include a cloud server. In another embodiment, the server (108) may include a local server. The processing subsystem (105) is configured to execute on a network (not shown in FIG. 1) to control bidirectional communications among a plurality of modules. In one embodiment, the network may include a wired network such as local area network (LAN). In another embodiment, the network may include a wireless network such as Wi-Fi, Bluetooth, Zigbee, near field communication (NFC), infra-red communication, radio frequency identification (RFID) or the like.
[0019] The processing subsystem (105) includes a media data receiving module (110) configured to receive media data corresponding to the web-conference, wherein the media data includes a video content or an audio content. In one embodiment, the media data corresponding to the web-conference is received from one or more media recording devices. In such embodiment, the one or more media recording devices may include, but not limited to a voice recorder of an electronic device, an inbuilt microphone of an electronic device, an image acquisition device and the like. In such embodiment, the electronic device may include at least one of a laptop, a desktop, a mobile phone, a personal digital assistant (PDA), an electronic tablet and the like.
[0020] The processing subsystem (105) also includes a speech recognition module (120) operatively coupled to the media data receiving module (110). The speech recognition module (120) is configured to process the media data received into a pre-defined format using one or more speech processing techniques. As used herein, the term ‘one or more speech processing techniques’ is defined as techniques which deal with the study and processing of speech signals in digital representation. In one embodiment, the one or more speech processing techniques may include at least one of dynamic time warping technique, hidden Markov model, artificial neural network or a phase ware processing technique. The speech recognition module (120) is also configured to determine a voiceprint of a speaker in the web-conference from the media data based on a voice waveform of the speaker. The speech recognition module (120) is also configured to recognize speech from processed media data of the predefined format in real-time by using a speech recognition technique. Again, the term ‘speech recognition technique’ is defined as a technique which enables a machine to identify words spoken aloud and convert them into readable text.
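Of the speech processing techniques listed above, dynamic time warping may be illustrated with a short sketch. The following is the textbook dynamic-programming form applied to one-dimensional feature sequences, given only as an assumption-laden illustration and not as the processing actually performed by the speech recognition module (120); the function name `dtw_distance` and the use of absolute difference as the local cost are illustrative choices.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


if __name__ == "__main__":
    # The same contour spoken at a different pace aligns with zero cost.
    print(dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))  # 0.0
```

Because the alignment may repeat samples of either sequence, two utterances of the same word spoken at different speeds can still yield a small distance, which is why the technique is associated with template-based speech matching.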
[0021] The processing subsystem (105) also includes a keyword identification module (130) operatively coupled to the speech recognition module (120). In a particular embodiment, the keyword identification module (130) is configured to interpret the speech recognized for extraction of the one or more first set of keywords using a natural language processing (NLP) technique. The keyword identification module (130) is configured to extract one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques. As used herein, the term ‘one or more first set of keywords’ is defined as one or more terms or phrases recognized from speech uttered in the media data upon speech recognition. In one embodiment, the plurality of keyword spotting factors may include at least one of topic detection, fast audio search, voice enablement or a combination thereof. In such embodiment, the one or more keyword spotting techniques may include at least one of a large vocabulary continuous speech recognition (LVCSR) technique, an acoustic keyword spotting technique, a phonetic search keyword spotting technique or a combination thereof. The keyword identification module (130) is also configured to generate one or more transcripts from the one or more first set of keywords extracted. As used herein, the term ‘one or more transcripts’ is defined as one or more records or phrases of the speech uttered by a speaker in a video or an audio.
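By way of a simplified example, keyword extraction over already recognized speech may be sketched in Python as below. The sketch assumes that a recognizer has produced plain text, as would be the case after an LVCSR-style pass, and then searches that text for phrases from a vocabulary. The function name spot_keywords, the vocabulary contents and the normalization rule are hypothetical and are not taken from the disclosure; acoustic and phonetic keyword spotting operate on audio and are not shown.

```python
import re
from typing import Iterable, List


def spot_keywords(recognized_text: str, vocabulary: Iterable[str]) -> List[str]:
    """Return the vocabulary phrases found in the text, in order of first occurrence."""
    # Normalize to lowercase words separated by single spaces, dropping punctuation.
    normalized = " ".join(re.findall(r"[a-z0-9']+", recognized_text.lower()))
    hits = []
    for phrase in vocabulary:
        # Word boundaries prevent matching a phrase inside a longer word.
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        match = re.search(pattern, normalized)
        if match:
            hits.append((match.start(), phrase))
    return [phrase for _, phrase in sorted(hits)]
```

For the utterance "I'd like a new cheque book, and my balance please." and the vocabulary "balance", "cheque book" and "loan", the function returns "cheque book" followed by "balance", which could then serve as a first set of keywords from which a compact transcript is assembled.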
[0022] The processing subsystem (105) also includes a transcript broadcasting module (140) operatively coupled to the keyword identification module (130). The transcript broadcasting module (140) is configured to broadcast the one or more transcripts on a display interface in real-time for one or more user associated activities. In one embodiment, the display interface may include a graphical user interface (GUI) of the electronic device. In some embodiments, the one or more user associated activities may include, but are not limited to, at least one of searching of one or more words from the transcript using external hyperlinks, selection of one or more words from the transcript, recommendation of one or more advertisements for the user or a combination thereof.
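One possible way to picture the broadcasting described above is a publish and subscribe arrangement in which each display interface registers a callback and receives every transcript segment as soon as it is generated. The Python sketch below is an in-process illustration only; the class name TranscriptBroadcaster and its methods are hypothetical, and a deployed system would more likely push segments to remote clients over a network connection.

```python
from typing import Callable, List


class TranscriptBroadcaster:
    """Delivers each transcript segment to every registered display callback."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        # A display interface registers the function that renders a segment.
        self._subscribers.append(callback)

    def broadcast(self, segment: str) -> int:
        # Push the segment to all displays and report how many were reached.
        for callback in self._subscribers:
            callback(segment)
        return len(self._subscribers)
```

A display could, for instance, subscribe with a function that appends each segment to an on-screen list, after which user associated activities such as word selection operate on the displayed text.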
[0023] The processing subsystem (105) also includes a workflow mechanization module (150) operatively coupled to the transcript broadcasting module (140). The workflow mechanization module (150) is configured to extract one or more second set of keywords from the one or more transcripts based on the one or more user associated activities. As used herein, the term ‘one or more second set of keywords’ is defined as one or more significant words which are extracted from the transcript automatically based on some criteria.
[0024] The workflow mechanization module (150) is also configured to compare the one or more second set of keywords with one or more predetermined keywords of the transcript associated with one or more domain specific tasks. As used herein, the term ‘one or more predetermined keywords’ is defined as one or more keywords which are already selected corresponding to any particular task and prestored in a database for future reference. The workflow mechanization module (150) is also configured to trigger a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks upon comparison of the one or more second set of keywords with the one or more predetermined keywords. In a specific embodiment, the one or more domain specific tasks may include, but are not limited to, a bank specific task, an industrial task, a medical assistance task, a cooking instruction task, an educational institution specific task and the like. In such embodiment, the bank specific task may include an issue of a chequebook, a balance check task and the like. In another embodiment, the industrial task may include an instruction for running a software code, an instruction for switching off a control system of a machine and the like. In yet another embodiment, the medical assistance task may include an instruction for initiating a diagnostic test, an instruction for initiating a surgery and the like. In one embodiment, the cooking instruction task may include addition of a specific ingredient, switching off a gas stove and the like.
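The comparison and triggering described above may be illustrated, under stated assumptions, by the following Python sketch. The table PREDETERMINED_KEYWORDS stands in for the prestored database, and the task names, keyword entries and handler functions are hypothetical examples drawn from the bank specific tasks mentioned above; an exact, case-insensitive match is assumed as the comparison rule, which the disclosure does not prescribe.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List

# Hypothetical stand-in for the prestored keyword database: task -> keywords.
PREDETERMINED_KEYWORDS: Dict[str, FrozenSet[str]] = {
    "issue_chequebook": frozenset({"chequebook", "cheque book"}),
    "balance_check": frozenset({"balance"}),
}


def match_tasks(
    second_set: Iterable[str],
    predetermined: Dict[str, FrozenSet[str]],
) -> List[str]:
    """Return the tasks whose predetermined keywords overlap the second set."""
    found = {keyword.lower() for keyword in second_set}
    return [task for task, keywords in predetermined.items() if found & keywords]


def trigger_workflows(
    second_set: Iterable[str],
    predetermined: Dict[str, FrozenSet[str]],
    handlers: Dict[str, Callable[[], str]],
) -> List[str]:
    """Run the handler of every matched task and collect the handler results."""
    return [handlers[task]() for task in match_tasks(second_set, predetermined)]
```

With handlers that return a short status string for each task, a second set of keywords containing "Balance" and "offer" would match only the balance check task, and only that handler would be invoked.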
[0025] FIG. 2 is a schematic representation of an exemplary embodiment of a system (100) for real-time transcription of a web-conference for task execution of FIG. 1 in accordance with an embodiment of the present disclosure. Consider an example in which the system (100) is utilized by a bank. Assume that an employee of the bank is having an online conversation with a client about one or more banking offers. Such an online conversation, or web-conference, between the client and the employee of the bank is critical and, if recorded, helps in further processes such as automation of task execution. To utilize the conversation of the web-conference, a real-time transcription process is essential so that the transcript can be used further for one or more purposes.
[0026] For receiving a recording of the media data corresponding to the web-conference, a media data receiving module (110) of the system (100) is utilized. The media data receiving module (110) is located on a processing subsystem (105) which is hosted on a cloud server (108). In the example used herein, the media data includes an audio content. Upon receiving the media data, a speech recognition module (120) of the processing subsystem (105) processes the media data received into a pre-defined format using one or more speech processing techniques. For example, the one or more speech processing techniques may include at least one of a dynamic time warping technique, a hidden Markov model, an artificial neural network or a phase-aware processing technique. Once the media data is processed, the speech recognition module (120) recognizes speech from the processed media data of the pre-defined format in real-time by using a speech recognition technique.
[0027] Based on the speech recognized, a keyword identification module (130) of the processing subsystem (105) interprets the speech for extraction of the one or more first set of keywords using a natural language processing (NLP) technique. Also, the keyword identification module (130) extracts one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques. For example, the one or more keyword spotting techniques may include at least one of a large vocabulary continuous speech recognition (LVCSR) technique, an acoustic keyword spotting technique, a phonetic search keyword spotting technique or a combination thereof. Further, the keyword identification module (130) is also configured to generate one or more transcripts from the one or more first set of keywords extracted.
[0028] Once the one or more transcripts are generated, a transcript broadcasting module (140) is configured to broadcast the one or more transcripts on a display interface in real-time for one or more user associated activities. For example, the display interface may include a graphical user interface (GUI) of the electronic device. Here, the one or more user associated activities may include, but are not limited to, at least one of searching of one or more words from the transcript using external hyperlinks, selection of one or more words from the transcript, recommendation of one or more advertisements for the user or a combination thereof.
[0029] Based on the one or more user associated activities, a workflow mechanization module (150) is configured to extract one or more second set of keywords from the one or more transcripts. The workflow mechanization module (150) is also configured to compare the one or more second set of keywords with one or more predetermined keywords of the one or more transcripts associated with one or more domain specific tasks. Further, upon comparison of the one or more second set of keywords with the one or more predetermined keywords, the workflow mechanization module (150) is also configured to trigger a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks. In the example used herein, as the domain specific task is associated with the bank, an example of such a task includes an issue of a chequebook for the client in order to avail an offer, a balance check task for providing the offer to the client and the like. Thus, the system (100), by automating the real-time transcription process from the web-conference, further helps in execution of several domain specific tasks in various sectors in a faster and more accurate way.
[0030] FIG. 3 is a block diagram of a computer or a server in accordance with an embodiment of the present disclosure. The server (200) includes processor(s) (230), and memory (210) operatively coupled to the bus (220). The processor(s) (230), as used herein, means any type of computational circuit, such as, but not limited to, a microprocessor, a microcontroller, a complex instruction set computing microprocessor, a reduced instruction set computing microprocessor, a very long instruction word microprocessor, an explicitly parallel instruction computing microprocessor, a digital signal processor, or any other type of processing circuit, or a combination thereof.
[0031] The memory (210) includes several subsystems stored in the form of an executable program which instructs the processor (230) to perform the method steps illustrated in FIG. 1. The memory (210) includes a processing subsystem (105) of FIG. 1. The processing subsystem (105) further has the following modules: a media data receiving module (110), a speech recognition module (120), a keyword identification module (130), a transcript broadcasting module (140) and a workflow mechanization module (150).
[0032] The media data receiving module (110) is configured to receive media data corresponding to the web-conference, wherein the media data comprises a video content or an audio content. The speech recognition module (120) is configured to process the media data received into a pre-defined format using one or more speech processing techniques. The speech recognition module (120) is also configured to recognize speech from processed media data of the predefined format in real-time by using a speech recognition technique. The keyword identification module (130) is configured to extract one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques. The keyword identification module (130) is also configured to generate one or more transcripts from the one or more first set of keywords extracted. The transcript broadcasting module (140) is configured to broadcast the one or more transcripts on a display interface in real-time for one or more user associated activities. The workflow mechanization module (150) is configured to extract one or more second set of keywords from the transcript based on the one or more user associated activities. The workflow mechanization module (150) is also configured to compare the one or more second set of keywords with one or more predetermined keywords of the transcript associated with one or more domain specific tasks. The workflow mechanization module (150) is also configured to trigger a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks upon comparison of the one or more second set of keywords with the one or more predetermined keywords.
[0033] The bus (220), as used herein, refers to internal memory channels or a computer network that is used to connect computer components and transfer data between them. The bus (220) includes a serial bus or a parallel bus, wherein the serial bus transmits data in bit-serial format and the parallel bus transmits data across multiple wires. The bus (220), as used herein, may include, but is not limited to, a system bus, an internal bus, an external bus, an expansion bus, a frontside bus, a backside bus and the like.
[0034] FIG. 4 (a) and FIG. 4 (b) together form a flow chart representing the steps involved in a method (300) for real-time transcription of a web-conference for task execution in accordance with an embodiment of the present disclosure. The method (300) includes receiving, by a media data receiving module of a processing subsystem, media data corresponding to the web-conference, wherein the media data comprises a video content or an audio content in step 310. In one embodiment, receiving the media data corresponding to the web-conference may include receiving the media data from one or more media recording devices. In such embodiment, the one or more media recording devices may include, but are not limited to, a voice recorder of an electronic device, an inbuilt microphone of an electronic device, an image acquisition device and the like.
[0035] The method (300) also includes processing, by a speech recognition module of the processing subsystem, the media data received into a pre-defined format using one or more speech processing techniques in step 320. In one embodiment, processing the media data received into the pre-defined format may include processing the media data using at least one of a dynamic time warping technique, a hidden Markov model, an artificial neural network or a phase-aware processing technique. The method (300) also includes recognizing, by the speech recognition module of the processing subsystem, speech from the processed media data of the pre-defined format in real-time by using a speech recognition technique in step 330.
[0036] The method (300) also includes extracting, by a keyword identification module of the processing subsystem, one or more first set of keywords from the speech recognized based on a plurality of keyword spotting factors by using one or more keyword spotting techniques in step 340. In one embodiment, extracting the one or more first set of keywords from the speech recognized may include extracting the one or more first set of keywords based on at least one of topic detection, fast audio search, voice enablement or a combination thereof. In such embodiment, extracting the one or more first set of keywords from the speech recognized may include extracting the one or more first set of keywords using at least one of a large vocabulary continuous speech recognition (LVCSR) technique, an acoustic keyword spotting technique, a phonetic search keyword spotting technique or a combination thereof. The method (300) also includes generating, by the keyword identification module of the processing subsystem, one or more transcripts from the one or more first set of keywords extracted in step 350.
[0037] The method (300) also includes broadcasting, by a transcript broadcasting module of the processing subsystem, the one or more transcripts on a display interface in real-time for one or more user associated activities in step 360. In one embodiment, the one or more user associated activities for which the one or more transcripts are broadcast on the display interface in real-time may include, but are not limited to, at least one of searching of one or more words from the transcript using external hyperlinks, selection of one or more words from the transcript, recommendation of one or more advertisements for the user or a combination thereof.
[0038] The method (300) also includes extracting, by a workflow mechanization module of the processing subsystem, one or more second set of keywords from the transcript based on the one or more user associated activities in step 370. The method (300) also includes comparing, by the workflow mechanization module of the processing subsystem, the one or more second set of keywords with one or more predetermined keywords of the transcript associated with one or more domain specific tasks in step 380. In one embodiment, the one or more domain specific tasks with which the one or more predetermined keywords are associated may include, but are not limited to, a bank specific task, an industrial task, a medical assistance task, a cooking instruction task, an educational institution specific task and the like.
[0039] The method (300) also includes triggering, by the workflow mechanization module of the processing subsystem, a workflow corresponding to execution of a domain specific task from the one or more domain specific tasks upon comparison of the one or more second set of keywords with the one or more predetermined keywords in step 390.
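The sequence of steps 310 to 390 may be sketched as a single pipeline in which each step is supplied as a replaceable function. The Python outline below is illustrative only: the class name TranscriptionPipeline, its field names, the rule that the transcript is the first set of keywords joined by spaces, and the rule that the second set consists of user-selected words that also appear in the first set are all assumptions introduced for this sketch and are not stated in the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TranscriptionPipeline:
    to_predefined_format: Callable[[bytes], bytes]  # step 320
    recognize_speech: Callable[[bytes], str]        # step 330
    spot_keywords: Callable[[str], List[str]]       # step 340
    show_transcript: Callable[[str], List[str]]     # step 360: display, return user-selected words
    task_keywords: Dict[str, List[str]]             # predetermined keywords per task
    run_task: Callable[[str], None]                 # step 390

    def run(self, media_data: bytes) -> List[str]:
        """Process received media data (step 310) and return the tasks triggered."""
        audio = self.to_predefined_format(media_data)
        speech = self.recognize_speech(audio)
        first_set = self.spot_keywords(speech)
        transcript = " ".join(first_set)                      # step 350
        selected = self.show_transcript(transcript)
        second_set = [w for w in selected if w in first_set]  # step 370
        triggered = [
            task
            for task, keywords in self.task_keywords.items()
            if any(k in second_set for k in keywords)         # step 380
        ]
        for task in triggered:
            self.run_task(task)
        return triggered
```

Supplying each step as a function keeps the outline independent of any particular speech recognizer or display, so a stub recognizer can be used for testing and later replaced by an actual recognition engine without altering the order of the steps.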
[0040] Various embodiments of the present disclosure relate to a system for real-time transcription of the web-conference automatically which not only makes the transcription process faster but also eliminates manual intervention.
[0041] Moreover, the presently disclosed system generates transcripts from the keywords extracted from the speech uttered in the web-conference, which reduces the size of the transcripts and thereby addresses the problem of space complexity.
[0042] Furthermore, the presently disclosed system also automatically initiates a workflow for execution of a particular task based on the transcripts generated in real-time. Hence, the system helps in automation of the workflow based on the speech uttered in the web-conference through the transcription process.
[0043] It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the disclosure and are not intended to be restrictive thereof.
[0044] While specific language has been used to describe the disclosure, any limitations arising on account of the same are not intended. As would be apparent to a person skilled in the art, various working modifications may be made to the method in order to implement the inventive concept as taught herein.
[0045] The figures and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, the order of processes described herein may be changed and are not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts need to be necessarily performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples.

Documents

Application Documents

#	Name	Date
1	202141024995-STATEMENT OF UNDERTAKING (FORM 3) [04-06-2021(online)].pdf	2021-06-04
2	202141024995-PROOF OF RIGHT [04-06-2021(online)].pdf	2021-06-04
3	202141024995-POWER OF AUTHORITY [04-06-2021(online)].pdf	2021-06-04
4	202141024995-FORM FOR SMALL ENTITY(FORM-28) [04-06-2021(online)].pdf	2021-06-04
5	202141024995-FORM FOR SMALL ENTITY [04-06-2021(online)].pdf	2021-06-04
6	202141024995-FORM 1 [04-06-2021(online)].pdf	2021-06-04
7	202141024995-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [04-06-2021(online)].pdf	2021-06-04
8	202141024995-EVIDENCE FOR REGISTRATION UNDER SSI [04-06-2021(online)].pdf	2021-06-04
9	202141024995-DRAWINGS [04-06-2021(online)].pdf	2021-06-04
10	202141024995-DECLARATION OF INVENTORSHIP (FORM 5) [04-06-2021(online)].pdf	2021-06-04
11	202141024995-COMPLETE SPECIFICATION [04-06-2021(online)].pdf	2021-06-04
12	202141024995-FORM-8 [08-04-2025(online)].pdf	2025-04-08
13	202141024995-FORM 18 [30-05-2025(online)].pdf	2025-05-30