Abstract: The invention relates to a method and system for converting digital documents into Augmented Reality (AR) content. The method includes extracting (201) information including a plurality of images and associated text from a digital document; determining (202) an image quality score for the plurality of images using a deep learning model (103a); selecting (203) an AR engine (402) from a set of AR engines based on a weighted image quality score and an augmentation scenario; identifying (204) one or more objects in the digital document, based on an accuracy level of an identification mode, using the selected AR engine (402); and generating (205) binaries based on the information extracted from the digital document and the one or more objects.
[001] Generally, the invention relates to Augmented Reality (AR). More specifically, the invention relates to a method and system for converting digital documents into Augmented Reality (AR) content.
BACKGROUND
[002] Typically, user manuals and help content documents are available either as digital soft copies or physical printed copies. For example, the user manuals and the help content documents related to modern products and machines may be used to understand product features, installation steps, replacement, and repair procedures. A user such as a field support executive needs to follow the steps, diagrams, and images mentioned in such user manuals and help content documents to perform regular check-ups, repair, and replacement jobs. In addition to the user manuals and help content documents, the field support executive requires additional training and experience to handle complex scenarios related to the replacement and repair of products and machines. Complex machines and products may have hundreds of pages in their user manuals and help content documents. However, searching for the specific steps for a specific scenario in such documents is difficult. Moreover, it is difficult to follow the schematic images mentioned in a user guide, compare them with an actual physical object, and understand the next step of a repair/replacement task.
[003] Further, the physical copies are not user friendly and are not easy to carry. Moreover, the soft copies are difficult to read and fall short when pointing out instance-specific issues. In the soft copies, indexes are difficult to navigate and may end up providing multiple references. Further, these user manuals are difficult to understand, as the images are schematic in nature and not in line with the actual product. With the release of a new product version, both the physical copies and the soft copies of the user manuals become obsolete. Also, maintaining and upgrading a user manual requires cost and effort.
[004] The manual process has various limitations, including a low first-visit fix rate due to lack of expert knowledge. Further, training and onboarding of a new field support engineer relies on static content and does not give exposure to real-world scenarios, thereby increasing the learning curve. In other words, training new field support staff is a time-consuming process, as some scenarios require experience to complete a job. Further limitations include the difficulty of communicating with staff (or an area expert engineer) to get further help from a remote (ground zero) location, and the fact that hands-free operation is not feasible with such physical and digital user manuals.
SUMMARY
[005] In one embodiment, a method for converting digital documents into Augmented Reality (AR) content is disclosed. The method may include extracting information including a plurality of images and associated text from a digital document. The method may further include determining an image quality score for each of the plurality of images using a deep learning model. It should be noted that the deep learning model may be trained, using training data and historical data, to predict the quality of an image. The method may further include selecting an AR engine from a set of AR engines based on a weighted image quality score and an augmentation scenario. It should be noted that the weighted image quality score may be computed based on the image quality score for each of a set of images, from the plurality of images. Moreover, each of the set of images may have the image quality score greater than a pre-defined threshold value. Further, it should be noted that the augmentation scenario is based on the plurality of images and the associated text. The method may further include identifying one or more objects in the digital document, based on an accuracy level of an identification mode, using the selected AR engine. The method may further include generating binaries based on the information extracted from the digital document and the identified one or more objects.
[006] In another embodiment, a system for converting digital documents into Augmented Reality (AR) content is disclosed. The system may include a processor and a memory communicatively coupled to the processor. The memory may store processor-executable instructions, which, on execution, may cause the processor to extract information including a plurality of images and associated text from a digital document. The processor-executable instructions, on execution, may further cause the processor to determine an image quality score for each of the plurality of images using a deep learning model. It should be noted that the deep learning model may be trained, using training data and historical data, to predict the quality of an image. The processor-executable instructions, on execution, may further cause the processor to select an AR engine from a set of AR engines based on a weighted image quality score and an augmentation scenario. It should be noted that the weighted image quality score may be computed based on the image quality score for each of a set of images, from the plurality of images. Moreover, each of the set of images may have the image quality score greater than a pre-defined threshold value. Further, it should be noted that the augmentation scenario is based on the plurality of images and the associated text. The processor-executable instructions, on execution, may further cause the processor to identify one or more objects in the digital document, based on an accuracy level of an identification mode, using the selected AR engine. The processor-executable instructions, on execution, may further cause the processor to generate binaries based on the information extracted from the digital document and the identified one or more objects.
[007] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[008] The present application can be best understood by reference to the following description taken in conjunction with the accompanying drawing figures, in which like parts may be referred to by like numerals.
[009] FIG. 1 illustrates a functional block diagram of an Augmented Reality (AR) content generation device, in accordance with some embodiments of the present disclosure.
[010] FIG. 2 illustrates a flow diagram of an exemplary process for converting digital documents into Augmented Reality (AR) content, in accordance with some embodiments of the present disclosure.
[011] FIG. 3 illustrates a flow diagram of a simple process flow for converting digital documents into Augmented Reality (AR) content, in accordance with some embodiments of the present disclosure.
[012] FIG. 4 illustrates a block diagram of an exemplary system for converting digital documents into Augmented Reality (AR) content, in accordance with some embodiments of the present disclosure.
[013] FIG. 5 illustrates components for converting digital documents into Augmented Reality (AR) content, in accordance with some embodiments of the present disclosure.
[014] FIG. 6 illustrates a block diagram of an exemplary computer system for implementing embodiments consistent with the present disclosure.
DETAILED DESCRIPTION OF THE DRAWINGS
[015] The following description is presented to enable a person of ordinary skill in the art to make and use the invention and is provided in the context of particular applications and their requirements. Various modifications to the embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the invention. Moreover, in the following description, numerous details are set forth for the purpose of explanation. However, one of ordinary skill in the art will realize that the invention might be practiced without the use of these specific details. In other instances, well-known structures and devices are shown in block diagram form in order not to obscure the description of the invention with unnecessary detail. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
[016] While the invention is described in terms of particular examples and illustrative figures, those of ordinary skill in the art will recognize that the invention is not limited to the examples or figures described. Those skilled in the art will recognize that the operations of the various embodiments may be implemented using hardware, software, firmware, or combinations thereof, as appropriate. For example, some processes can be carried out using processors or other digital circuitry under the control of software, firmware, or hard-wired logic. (The term “logic” herein refers to fixed hardware, programmable logic and/or an appropriate combination thereof, as would be recognized by one skilled in the art to carry out the recited functions). Software and firmware can be stored on computer-readable storage media. Some other processes can be implemented using analog circuitry, as is well known to one of ordinary skill in the art. Additionally, memory or other storage, as well as communication components, may be employed in embodiments of the invention.
[017] Referring now to FIG. 1, a functional block diagram of an Augmented Reality (AR) content generation device 100 is illustrated, in accordance with some embodiments of the present disclosure. In some embodiments, the AR content generation device 100 may include an information extraction module 102, a score determination module 103, an AR engine selection module 104, an object identification module 105, and a binary generation module 106. Further, the AR content generation device 100 may also include a data store (not shown in FIG. 1) in order to store intermediate results generated by the modules 102-106.
[018] The information extraction module 102 may be configured to extract information from a digital document 101. By way of an example, the digital document 101 may be a user manual or a troubleshooting guide. The information may include a plurality of images and associated text. Further, the plurality of images may include a two-dimensional (2D) image, a schematic image, and a flowchart. The associated text for a particular image may include labels of one or more components of the particular image and an associated description for each of the one or more components. The information extraction module 102 may further pass the extracted information to the score determination module 103.
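By way of a non-limiting illustration, the extraction step may be sketched with an off-the-shelf PDF parsing library. The snippet below assumes PyMuPDF and an illustrative file name; it is only a sketch of how page text and embedded images could be pulled from the digital document 101, not the disclosed implementation.

```python
import fitz  # PyMuPDF (assumed third-party library, not part of the disclosure)

def extract_information(pdf_path):
    """Extract per-page text and embedded images from a digital document."""
    document = fitz.open(pdf_path)
    pages = []
    for page in document:
        text = page.get_text("text")              # associated text of the page
        images = []
        for image_info in page.get_images(full=True):
            xref = image_info[0]                  # cross-reference number of the embedded image
            pixmap = fitz.Pixmap(document, xref)
            images.append(pixmap.tobytes("png"))  # raw image bytes for later quality scoring
        pages.append({"text": text, "images": images})
    return pages

# Illustrative usage: pages = extract_information("user_manual.pdf")
```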
[019] The score determination module 103 may be configured to determine an image quality score for each of the plurality of images. In particular, to determine the image quality score, the score determination module 103 may include a deep learning model 103a. The deep learning model 103a may be trained using training data and historical data. The deep learning model 103a may predict the quality of an image based on a color of the image, a size of the image, a resolution of the image, and so forth. The score determination module 103 may be communicatively coupled to the AR engine selection module 104.
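A minimal sketch of such a quality-scoring model is given below, assuming a small Keras regression network trained on images labelled with quality scores; the architecture, input size, and variable names are illustrative assumptions rather than the disclosed deep learning model 103a.

```python
import tensorflow as tf

def build_quality_model(input_shape=(224, 224, 3)):
    """Small CNN that regresses an image quality score from a resized input image."""
    return tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 255, input_shape=input_shape),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1),                 # predicted quality score
    ])

model = build_quality_model()
model.compile(optimizer="adam", loss="mse")
# model.fit(training_images, quality_labels, epochs=10)   # training/historical data
# score = float(model.predict(image_batch)[0, 0])         # quality score for one image
```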
[020] The AR engine selection module 104 may be configured to select an AR engine from a set of AR engines. It should be noted that the AR engine selection module 104 may select the AR engine based on a weighted image quality score and an augmentation scenario. In some embodiments, the weighted image quality score may be an average image quality score. In some embodiments, the selected AR engine may be a marker-based engine. In some other embodiments, the selected AR engine may be a marker less engine. For example, the marker-based engine may identify the one or more objects using a computer vision-based approach. The marker less engine may identify the one or more objects using a machine learning model. Further, the AR engine selection module 104 may be communicatively coupled to the object identification module 105.
[021] The marker-based AR engine uses an approach based on scanning a target object/image and identifying markers from it. The marker-based AR engine may identify appropriate marker matches and augment a virtual object on top of the camera scene of the scanned image/object. This approach works on both types of targets, i.e., real object scanning as well as image scanning. An augmented virtual object may be a 2D or a 3D image. The AR content generation device 100 uses a 2D image or a real object as the marker and augments 2D images as virtual objects based on object identification during scanning. The marker-based AR engine uses algorithms such as Fast Library for Approximate Nearest Neighbors (FLANN) and Brute-Force.
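As a hedged illustration of the Brute-Force variant, the sketch below matches ORB descriptors between a stored marker image and a scanned camera frame using OpenCV; the threshold values and function names are assumptions chosen for the example.

```python
import cv2

def marker_found(marker_img, scene_img, min_matches=15, max_distance=50):
    """Brute-Force matching of ORB features between a stored marker and a camera frame."""
    orb = cv2.ORB_create()
    _, marker_desc = orb.detectAndCompute(marker_img, None)
    _, scene_desc = orb.detectAndCompute(scene_img, None)
    if marker_desc is None or scene_desc is None:
        return False
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(marker_desc, scene_desc)
    good = [m for m in matches if m.distance < max_distance]
    return len(good) >= min_matches   # marker identified -> augment the virtual object
```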
[022] The marker less AR engine is based on the machine learning model. The marker less AR engine may be created and trained from source images extracted from digital documents. Further, user experience and accuracy may be increased by feeding additional images, where useful images may be captured in real time. The marker less AR engine may augment text and images based on object identification. The approaches used are image classification and object identification.
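A minimal training sketch for such a model follows, assuming the extracted source images have been sorted into one folder per object/step and that a small transfer-learning classifier (MobileNetV2 here, an assumption) is sufficient; it illustrates the image classification approach only.

```python
import tensorflow as tf

# Assumed layout: extracted_images/<object_or_step_name>/*.png
train_ds = tf.keras.utils.image_dataset_from_directory(
    "extracted_images", image_size=(224, 224), batch_size=8)
num_classes = len(train_ds.class_names)

base = tf.keras.applications.MobileNetV2(
    include_top=False, pooling="avg", input_shape=(224, 224, 3))
base.trainable = False                      # transfer learning on a small document-derived set

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0),
    base,
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)               # additional real-time images may be added later
```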
[023] The object identification module 105 may be configured to identify one or more objects in the digital document 101, using the selected AR engine. For identification of the one or more objects, the object identification module 105 may consider an accuracy level of identification mode. Further, the object identification module 105 may be operatively connected to the binary generation module 106.
[024] The binary generation module 106 may be configured to generate binaries based on the information extracted from the digital document 101 and the identified one or more objects. In some embodiments, the AR content generation device 100 may determine sufficiency of the plurality of images and the associated text for output (AR content) 108 generation. Moreover, in some embodiments, additional inputs from a user 107 may be received by the AR content generation device 100. The additional inputs may include metadata of the one or more objects in the image to identify a plurality of components of the image during object identification. Thus, in some embodiments, the binaries may be generated based on the metadata provided by the user 107.
[025] In some embodiments, the user 107 may provide input to the AR content generation device 100 on one or more of an augmentation type, a processing option, and the accuracy level of identification mode. The augmentation type may include a text augmentation and an image augmentation. The accuracy level of identification mode may be a strict accuracy level of identification mode, a medium accuracy level of identification mode and a low accuracy level of identification mode. Further, the processing option may include an on-cloud processing option and an offline processing option.
[026] The AR content generation device 100 may receive the digital document 101 and convert the digital document 101 automatically into the AR content 108. Further, the AR content generation device 100 extracts images to build an AR database. The AR content generation device 100 identifies an appropriate AR engine for object identification based on training images and augmentation scenarios. In some embodiments, the AR content generation device 100 may help in generating a mobile application that may leverage augmented reality capabilities with object identification, text processing, augmentation, and communication. The generated mobile application may work with on-cloud processing power or in a native embedded (i.e., offline) mode. Once the mobile application is generated, it may provide an advantage by enabling end-users (e.g., technicians) to fetch information related to any component (i.e., a product or a service). By way of an example, the information fetched for a component, i.e., a product (for example, a smart watch), may include a plurality of steps to set up the smart watch.
[027] It should be noted that the AR content generation device 100 may be implemented in programmable hardware devices such as programmable gate arrays, programmable array logic, programmable logic devices, or the like. Alternatively, the AR content generation device 100 may be implemented in software for execution by various types of processors. An identified engine/module of executable code may, for instance, include one or more physical or logical blocks of computer instructions which may, for instance, be organized as a component, module, procedure, function, or other construct. Nevertheless, the executables of an identified engine/module need not be physically located together but may include disparate instructions stored in different locations which, when joined logically together, comprise the identified engine/module and achieve the stated purpose of the identified engine/module. Indeed, an engine or a module of executable code may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different applications, and across several memory devices.
[028] As will be appreciated by one skilled in the art, a variety of processes may be employed for converting digital documents into Augmented Reality (AR) content. For example, the exemplary AR content generation device 100 may automatically convert digital documents into the AR content, by the process discussed herein. In particular, as will be appreciated by those of ordinary skill in the art, control logic and/or automated routines for performing the techniques and steps described herein may be implemented by the AR content generation device 100 either by hardware, software, or combinations of hardware and software. For example, suitable code may be accessed and executed by the one or more processors on the AR content generation device 100 to perform some or all of the techniques described herein. Similarly, application specific integrated circuits (ASICs) configured to perform some or all the processes described herein may be included in the one or more processors of the AR content generation device 100.
[029] Referring now to FIG. 2, an exemplary process 200 for converting digital documents into Augmented Reality (AR) content is depicted, in accordance with some embodiments of the present disclosure. Each step of the process 200 may be performed by various modules of an AR content generation device (similar to the AR content generation device 100). FIG. 2 is explained in conjunction with FIG. 1.
[030] At step 201, information may be extracted from a digital document (for example, the digital document 101) using an information extraction module (same as the information extraction module 102). The information may include a plurality of images and associated text. Further, the plurality of images may include, but is not limited to, a two-dimensional (2D) image, a schematic image, and a flowchart. The associated text for a particular image may include, but is not limited to, labels of one or more components/objects of the particular image and associated description for each of the one or more components/objects.
[031] At step 202, an image quality score may be determined for each of the plurality of images using a deep learning model of a score determination module (same as the deep learning model 103a of the score determination module 103). As explained in FIG. 1, the deep learning model may be trained using training data and historical data, to predict the quality of each of the plurality of images.
[032] At step 203, an AR engine from a set of AR engines may be selected using an AR engine selection module (similar to the AR engine selection module 104). The AR engine may be selected based on a weighted image quality score and an augmentation scenario. In an embodiment, the weighted image quality score may be computed based on the image quality score for each of a set of images, from the plurality of images. Moreover, each of the set of images may have the image quality score greater than a pre-defined threshold value.
[033] By way of an example, suppose the pre-defined threshold value is defined to be ‘5’. Based on the defined threshold value, each of the set of images from the plurality of images having an image quality score greater than the pre-defined threshold value, i.e., ‘5’, may be selected for computing the weighted image quality score. Once the set of images is selected, the image quality score associated with each of the set of images may be employed to compute the weighted image quality score. Further, the augmentation scenario may be based on the plurality of images and the associated text. For example, the augmentation scenario used for selecting the AR engine may be selected based on an augmentation of at least one of the plurality of images, the text associated with each of the plurality of images, or a combination of each of the plurality of images along with the associated text. Further, based on the computed weighted image quality score and the selected augmentation scenario, the AR engine may be selected.
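A compact illustration of this computation is given below; it assumes the weighted image quality score is a simple average over the images that clear the threshold, consistent with the embodiment in which the weighted score is an average image quality score.

```python
def weighted_image_quality_score(image_scores, threshold=5.0):
    """Average the quality scores of the images whose score exceeds the threshold."""
    selected = [score for score in image_scores if score > threshold]
    if not selected:
        return 0.0
    return sum(selected) / len(selected)

# Example: with scores 7, 3, 9 and 6 and a threshold of 5, the images scoring
# 7, 9 and 6 are selected, giving (7 + 9 + 6) / 3 ≈ 7.33.
assert round(weighted_image_quality_score([7, 3, 9, 6]), 2) == 7.33
```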
[034] In an embodiment, the AR engine may be at least one of a marker-based engine and a marker less engine. In some embodiments, the marker-based engine may be selected. The marker-based engine may identify the one or more objects using a computer vision-based approach. In the computer vision-based approach, the AR engine may prepare a single database by storing a collection of markers, and the one or more objects may be identified via the single database. In some other embodiments, the marker less engine may be selected. The marker less engine may identify the one or more objects using a machine learning model. Moreover, when the AR engine is the marker less engine, the AR engine may create the machine learning model. The created machine learning model may be used to identify the one or more objects. As will be appreciated, for each digital document, one AR engine may be selected. Once the AR engine is selected, the selected AR engine may be used as the default engine for performing the further process.
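A minimal sketch of this selection step is shown below; the quality cut-off value and the scenario labels are illustrative assumptions, since the disclosure does not prescribe specific values.

```python
def select_ar_engine(weighted_score, augmentation_scenario, quality_cutoff=7.0):
    """Select one AR engine for the whole document from the weighted score and scenario."""
    # Higher-quality source images favour the marker-based (computer vision) engine;
    # otherwise the marker less (machine learning) engine is used as the default.
    image_driven = augmentation_scenario in ("image", "image_and_text")
    if weighted_score >= quality_cutoff and image_driven:
        return "marker_based"
    return "marker_less"

# Illustrative usage: select_ar_engine(7.33, "image_and_text") -> "marker_based"
```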
[035] Thereafter, at step 204, one or more objects in the digital document may be identified using an object identification module (analogous to the object identification module 105). The object identification may be performed based on an accuracy level of an identification mode. It should be noted that the object identification module may use the selected AR engine for identifying the one or more objects.
[036] At step 205, binaries may be generated based on the information extracted from the digital document and the identified one or more objects. A binary generation module (analogous to the binary generation module 106) may be employed for generating the binaries. In some embodiments, it may be checked if the plurality of images and the associated text are sufficient or not, for AR content generation. In case the plurality of images and the associated text are insufficient, additional inputs may be received from the user. The additional inputs may include metadata of the one or more objects in the image to identify a plurality of components/objects of the image during object identification. Therefore, in some embodiments, the metadata provided by the user may also be considered to generate the binaries.
[037] Further, in some embodiments, an input may be received from a user on one or more of an augmentation type, a processing option, and the accuracy level of identification mode. The augmentation type may include a text augmentation and an image augmentation. The accuracy level of identification mode may include a strict accuracy level of identification mode, a medium accuracy level of identification mode and a low accuracy level of identification mode. The processing option may include an on-cloud processing option and an offline processing option.
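These user-selectable options may be represented, purely as an illustrative sketch, by a small configuration structure such as the one below; the names are assumptions and not part of the disclosure.

```python
from dataclasses import dataclass
from enum import Enum

class AugmentationType(Enum):
    TEXT = "text"
    IMAGE = "image"

class ProcessingOption(Enum):
    ON_CLOUD = "on_cloud"
    OFFLINE = "offline"

class AccuracyLevel(Enum):
    STRICT = "strict"
    MEDIUM = "medium"
    LOW = "low"

@dataclass
class UserOptions:
    augmentation_types: tuple       # e.g. (AugmentationType.TEXT, AugmentationType.IMAGE)
    processing: ProcessingOption
    accuracy: AccuracyLevel

options = UserOptions((AugmentationType.TEXT, AugmentationType.IMAGE),
                      ProcessingOption.OFFLINE, AccuracyLevel.MEDIUM)
```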
[038] Referring now to FIG. 3, a simple process flow 300 for converting digital documents into Augmented Reality (AR) content is depicted, in accordance with some embodiments of the present disclosure. At step 301, a digital document may be uploaded to an AR content generation device. In reference to FIG. 1, the AR content generation device may correspond to the AR content generation device 100. Once the digital document is uploaded, at step 302, the uploaded digital document may be pre-processed. The digital document may be pre-processed to extract information from the digital document. The information that needs to be extracted may include a plurality of images and associated text.
[039] Upon extracting the information, at step 303, an image quality score may be computed for each of the plurality of images. The image quality score may be computed using a deep learning model. As explained above, the deep learning model may be trained using training data and historical data, to predict the quality of each of the plurality of images. Further, the image quality score may be computed for each of the plurality of images based on the quality of each of the plurality of images. Once the image quality score is computed for each of the plurality of images, a set of images from the plurality of images may be selected based on the pre-defined threshold value. It should be noted that the image quality score associated with each of the set of images may be greater than the pre-defined threshold value. Upon selecting the set of images, an overall image quality score (also referred to as the weighted image quality score) may be computed. In an embodiment, the overall image quality score may be computed based on the image quality score associated with each of the set of images.
[040] Once the overall image quality score is computed, at step 304, an AR engine may be selected from the set of AR engines. The AR engine may be selected based on the overall image quality score and an augmentation scenario. The augmentation scenario may be selected based on an augmentation of at least one of the plurality of images, the text associated with each of the plurality of images, or a combination of each of the plurality of images along with the associated text. Once the AR engine is selected, one or more objects may be identified in the digital document, based on an accuracy level of an identification mode, using the selected AR engine. In an embodiment, the AR engine may be at least one of the marker-based engine and the marker less engine. In some embodiments, the marker-based engine may be selected. The marker-based engine may identify the one or more objects using the computer vision-based approach. In the present embodiment, when the AR engine is the marker-based engine, the AR engine may prepare the single database by storing a collection of markers, and the one or more objects may be identified via the single database using the computer vision-based approach. In some other embodiments, the marker less engine may be selected. The marker less engine may identify the one or more objects using a machine learning model. In the present embodiment, when the AR engine is the marker less engine, the AR engine may create the machine learning model, and the one or more objects may be identified using the machine learning model. As will be appreciated, for each digital document, one AR engine may be selected. Further, at step 305, binaries may be generated using the selected AR engine. The binaries may be generated based on the extracted information and the one or more objects.
[041] Referring now to FIG. 4, an exemplary system 400 for converting a digital document into Augmented Reality (AR) content is illustrated, in accordance with some embodiments of the present disclosure. FIG. 4 is illustrated in conjunction with FIGS. 1-3. The system 400 may include a digital document 401, AR engines 402, and a target physical device 403. The digital document 401 may further include images/schematic images 401a and associated text/labels 401b.
[042] The system 400 may include a content parser and an image simulator, which are explained in detail in conjunction with FIG. 5. The content parser may parse content of the digital document 401. The AR engines 402 may include a marker less engine and a marker-based engine, which may be selected based on image quality. Further, the target physical device 403 may include, but is not limited to, a mobile device, a laptop, a tablet, a computer, and smart glasses. The system 400 provides augmentation features 404 including augmented text and steps, augmented images, pop-up detail view, drawing and sketch, audio connect, and video connect.
[043] Images and associated text from the digital document 401 may be extracted and stored in a database. Further, the marker-based engine or the marker less engine may be used to build an object identification model after refining images and mapping text. The object identification model may be used to process live scenes and display relevant augmented text and images on the target physical device 403.
[044] In some embodiments, the system 400 may use ARCore technology, ARKit technology, OpenCV technology, TensorFlow technology, and the like. Further, platforms used by the system 400 may include, but are not limited to, Android, iOS, Vuforia, and Unity.
[045] Referring now to FIG. 5, components for converting digital documents into Augmented Reality (AR) content are illustrated, in accordance with some embodiments of the present disclosure. FIG. 5 is explained in conjunction with FIGS. 1-4. The components may include a web admin panel 501, a marker less AR engine 502, a marker-based AR engine 503, and a mobile application 504.
[046] The web admin panel 501 may include user management, content upload, image extraction and simulation, a model builder, and an Android Application Package (APK) builder. Further, the web admin panel 501 may allow user management, under which options to create projects and admin accounts may be provided to users. The web admin panel 501 provides a dashboard and reports, including content uploaded, images extracted, models built, and APKs downloaded.
[047] Further, content of digital documents (such as user manuals) in different file formats may be uploaded in the web admin panel 501. Further, details including, but not limited to, the total number of pages and images in the uploaded content may be displayed. Moreover, in some embodiments, images and corresponding text data may be extracted from the digital documents, which may be further mapped with extracted labels and displayed in the web admin panel 501. The users may be provided with options to modify the labels and corresponding text information for any image.
[048] For modelling requirements, the web admin panel 501 may be integrated with third-party image annotation tools. The users may avail these options to label particular objects in the images for object identification models. By way of an example, the user may submit a form after reviewing the extracted images, associated labels, and mapped content for updating. On submission, image simulation may happen on a server using techniques such as greyscale, rotate, and re-size options for machine learning models.
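A brief sketch of this simulation step, assuming OpenCV and illustrative rotation angles and target sizes, is given below.

```python
import cv2

def simulate_variants(image, sizes=((224, 224),), angles=(90, 180, 270)):
    """Generate greyscale, rotated, and re-sized variants of an extracted image."""
    variants = [cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)]          # greyscale
    height, width = image.shape[:2]
    for angle in angles:                                           # rotation
        matrix = cv2.getRotationMatrix2D((width / 2, height / 2), angle, 1.0)
        variants.append(cv2.warpAffine(image, matrix, (width, height)))
    for size in sizes:                                             # re-sizing
        variants.append(cv2.resize(image, size))
    return variants
```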
[049] Further, the extracted and simulated images may then be sent to different online cloud services based on a chosen AR engine for database preparation. For example, for a marker-based AR engine, models may be created for target identification. For the marker less AR engine, an option is available for users to choose either an image classification model or an object identification model. The marker less engine (i.e., machine learning (ML) models) may be created on Graphical Processing Unit (GPU) machines. The provisioning and relevant code execution for model training are achieved by in-built workflow automation.
[050] Further, prepared databases and trained models may be downloaded automatically to a web admin panel server, which may be used for building APKs using supported AR Integrated Development Environments (IDEs). Once the APK building process is completed, a user may download the APK.
[051] The marker less engine (i.e., a machine learning (ML) engine) 502 may identify objects in the target image using a deep learning model. Based on the available number of images per step, a choice between image classification and object identification may be provided for building the model. For example, object detection typically requires more images than image classification. It should be noted that TensorFlow libraries may be utilized in the marker less AR engine, as TensorFlow object detection works on the concepts of deep machine learning and computer vision.
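As a hedged example of how such a trained detector might be invoked, the sketch below loads an exported TensorFlow SavedModel and filters detections by confidence; the model path and output key names follow the common TensorFlow Object Detection API convention and are assumptions here.

```python
import numpy as np
import tensorflow as tf

detector = tf.saved_model.load("exported_model/saved_model")   # assumed export path

def identify_objects(frame_rgb, score_threshold=0.5):
    """Run the detector on one camera frame and keep confident detections."""
    batch = tf.convert_to_tensor(np.expand_dims(frame_rgb, axis=0), dtype=tf.uint8)
    outputs = detector(batch)
    scores = outputs["detection_scores"][0].numpy()
    classes = outputs["detection_classes"][0].numpy().astype(int)
    boxes = outputs["detection_boxes"][0].numpy()
    keep = scores >= score_threshold
    return list(zip(classes[keep], boxes[keep], scores[keep]))
```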
[052] The marker-based AR engine 503 uses computer vision-based libraries to detect markers in a target image and further match them with images within an associated database. An AR engine may be chosen based on image quality, which may be verified by calculating an image quality score. A custom marker-based AR engine may be developed using OpenCV, with Java and Python implementations for native and online support, respectively. Further, target identification algorithms such as FLANN and Brute-Force may be used in the marker-based approach to identify the scanned object and suggest an appropriate solution. The marker-based engine may include a FLANN-based matcher and/or a Brute-Force matcher.
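The FLANN-based variant may be sketched, under the same hedged assumptions as the earlier Brute-Force example, using SIFT descriptors and Lowe's ratio test; parameter values are illustrative only.

```python
import cv2

def flann_marker_found(marker_img, scene_img, ratio=0.7, min_good=12):
    """FLANN-based matching of SIFT descriptors between a marker and the scanned scene."""
    sift = cv2.SIFT_create()
    _, marker_desc = sift.detectAndCompute(marker_img, None)
    _, scene_desc = sift.detectAndCompute(scene_img, None)
    if marker_desc is None or scene_desc is None:
        return False
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5),   # FLANN_INDEX_KDTREE
                                  dict(checks=50))
    matches = flann.knnMatch(marker_desc, scene_desc, k=2)
    good = [pair[0] for pair in matches
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]
    return len(good) >= min_good
```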
[053] The APK downloaded from the web admin panel 501 may be installed on any mobile device as an AR application. Once installed, the AR application provides various features to the users, including settings and configuration, 2D/3D image augmentation, and text augmentation. The users may select text augmentation, 2D/3D image augmentation, or a combination of both. Further, different identification modes, such as a strict accuracy level of identification, a medium accuracy level of identification, and a low accuracy level of identification, are available for object or image identification.
[054] The disclosed methods and systems may be implemented on a conventional or a general-purpose computer system, such as a personal computer (PC) or server computer. Referring now to FIG. 6, an exemplary computing system 600 that may be employed to implement processing functionality for various embodiments (e.g., as a SIMD device, client device, server device, one or more processors, or the like) is illustrated. Those skilled in the relevant art will also recognize how to implement the invention using other computer systems or architectures. The computing system 600 may represent, for example, a user device such as a desktop, a laptop, a mobile phone, personal entertainment device, DVR, and so on, or any other type of special or general-purpose computing device as may be desirable or appropriate for a given application or environment. The computing system 600 may include one or more processors, such as a processor 601 that may be implemented using a general or special purpose processing engine such as, for example, a microprocessor, microcontroller or other control logic. In this example, the processor 601 is connected to a bus 602 or other communication medium. In some embodiments, the processor 601 may be an AI processor, which may be implemented as a Tensor Processing Unit (TPU), or a graphical processor unit, or a custom programmable solution Field-Programmable Gate Array (FPGA).
[055] The computing system 600 may also include a memory 603 (main memory), for example, Random Access Memory (RAM) or other dynamic memory, for storing information and instructions to be executed by the processor 601. The memory 603 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by the processor 601. The computing system 600 may likewise include a read only memory (“ROM”) or other static storage device coupled to bus 602 for storing static information and instructions for the processor 601.
[056] The computing system 600 may also include a storage device 604, which may include, for example, a media drive 605 and a removable storage interface. The media drive 605 may include a drive or other mechanism to support fixed or removable storage media, such as a hard disk drive, a floppy disk drive, a magnetic tape drive, an SD card port, a USB port, a micro-USB port, an optical disk drive, a CD or DVD drive (R or RW), or another removable or fixed media drive. A storage media 606 may include, for example, a hard disk, magnetic tape, flash drive, or other fixed or removable medium that is read by and written to by the media drive 605. As these examples illustrate, the storage media 606 may include a computer-readable storage medium having stored therein particular computer software or data.
[057] In alternative embodiments, the storage devices 604 may include other similar instrumentalities for allowing computer programs or other instructions or data to be loaded into the computing system 600. Such instrumentalities may include, for example, a removable storage unit 607 and a storage unit interface 608, such as a program cartridge and cartridge interface, a removable memory (for example, a flash memory or other removable memory module) and memory slot, and other removable storage units and interfaces that allow software and data to be transferred from the removable storage unit 607 to the computing system 600.
[058] The computing system 600 may also include a communications interface 609. The communications interface 609 may be used to allow software and data to be transferred between the computing system 600 and external devices. Examples of the communications interface 609 may include a network interface (such as an Ethernet or other NIC card), a communications port (such as for example, a USB port, a micro USB port), Near field Communication (NFC), etc. Software and data transferred via the communications interface 609 are in the form of signals which may be electronic, electromagnetic, optical, or other signals capable of being received by the communications interface 609. These signals are provided to the communications interface 609 via a channel 610. The channel 610 may carry signals and may be implemented using a wireless medium, wire or cable, fiber optics, or other communications medium. Some examples of the channel 610 may include a phone line, a cellular phone link, an RF link, a Bluetooth link, a network interface, a local or wide area network, and other communications channels.
[059] The computing system 600 may further include Input/Output (I/O) devices 611. Examples may include, but are not limited to a display, keypad, microphone, audio speakers, vibrating motor, LED lights, etc. The I/O devices 611 may receive input from a user and also display an output of the computation performed by the processor 601. In this document, the terms “computer program product” and “computer-readable medium” may be used generally to refer to media such as, for example, the memory 603, the storage devices 604, the removable storage unit 607, or signal(s) on the channel 610. These and other forms of computer-readable media may be involved in providing one or more sequences of one or more instructions to the processor 601 for execution. Such instructions, generally referred to as “computer program code” (which may be grouped in the form of computer programs or other groupings), when executed, enable the computing system 600 to perform features or functions of embodiments of the present invention.
[060] In an embodiment where the elements are implemented using software, the software may be stored in a computer-readable medium and loaded into the computing system 600 using, for example, the removable storage unit 607, the media drive 605 or the communications interface 609. The control logic (in this example, software instructions or computer program code), when executed by the processor 601, causes the processor 601 to perform the functions of the invention as described herein.
[061] Thus, the present disclosure may overcome the drawbacks of the traditional systems discussed before. The method and system disclosed in the present disclosure may provide a better centralized, auto-focus, hands-free, and vision-based solution. Further, the disclosed system is efficient, time-effective, and cost-effective due to its reusability aspect (i.e., the system uses existing digital user guides and content). The disclosed system has its own custom AR engine with the capability to use third-party AR engines such as Vuforia, ARCore, and Wikitude.
[062] It will be appreciated that, for clarity purposes, the above description has described embodiments of the invention with reference to different functional units and processors. However, it will be apparent that any suitable distribution of functionality between different functional units, processors or domains may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processors or controllers may be performed by the same processor or controller. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality, rather than indicative of a strict logical or physical structure or organization.
[063] Although the present invention has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the claims. Additionally, although a feature may appear to be described in connection with particular embodiments, one skilled in the art would recognize that various features of the described embodiments may be combined in accordance with the invention.
[064] Furthermore, although individually listed, a plurality of means, elements or process steps may be implemented by, for example, a single unit or processor. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. Also, the inclusion of a feature in one category of claims does not imply a limitation to this category, but rather the feature may be equally applicable to other claim categories, as appropriate.
CLAIMS
What is claimed is:
1. A method for converting digital documents into Augmented Reality (AR) content, the method comprising:
extracting (201), by an AR content generation device (100), information comprising a plurality of images and associated text from a digital document;
determining (202), by the AR content generation device (100), an image quality score for each of the plurality of images using a deep learning model (103a), wherein the deep learning model (103a) is trained, using training data and historical data, to predict quality of the plurality of images;
selecting (203), by the AR content generation device (100), an AR engine (402) from a set of AR engines based on a weighted image quality score and an augmentation scenario, wherein the weighted image quality score is computed based on the image quality score for each of a set of images, from the plurality of images, wherein each of the set of images has the image quality score greater than a pre-defined threshold value, and wherein the augmentation scenario is based on the plurality of images and the associated text;
identifying (204), by the AR content generation device (100), one or more objects in the digital document, based on an accuracy level of an identification mode, using the AR engine (402); and
generating (205), by the AR content generation device (100), binaries based on the information extracted from the digital document and the one or more objects.
2. The method as claimed in claim 1, further comprising:
determining if the plurality of images and the associated text are insufficient for AR content generation; and
receiving additional inputs from a user, wherein the additional inputs comprise metadata of the one or more objects in an image to identify a plurality of components of the image during object identification, and wherein the binaries are generated based on the metadata provided by the user.
3. The method as claimed in claim 1, wherein the AR engine (402) comprises one of a marker-based engine and a marker less engine, and wherein the marker-based engine identifies the one or more objects using a computer vision-based approach, and the marker less engine identifies the one or more objects using a machine learning model.
4. The method as claimed in claim 1, wherein the plurality of images comprises a two-dimensional (2D) image, a schematic image, and a flowchart, and wherein the associated text for a particular image comprises labels of one or more components of the particular image and associated description for each of the one or more components.
5. The method as claimed in claim 1, further comprising receiving an input, from a user, on one or more of an augmentation type, a processing option, and the accuracy level of the identification mode, wherein the augmentation type comprises a text augmentation and an image augmentation, and wherein the accuracy level of the identification mode comprises a strict accuracy level of the identification mode, a medium accuracy level of the identification mode, and a low accuracy level of the identification mode, and wherein the processing option comprises an on-cloud processing option and an offline processing option.
6. A system for converting digital documents into Augmented Reality (AR) content, the system comprising:
a processor (601); and
a memory (603) communicatively coupled to the processor, wherein the memory (603) stores processor-executable instructions, which, on execution, cause the processor (601) to:
extract (201) information comprising a plurality of images and associated text from a digital document;
determine (202) an image quality score for each of the plurality of images using a deep learning model (103a), wherein the deep learning model (103a) is trained, using training data and historical data, to predict quality of the plurality of images;
select (203) an AR engine (402) from a set of AR engines based on a weighted image quality score and an augmentation scenario, wherein the weighted image quality score is computed based on the image quality score for each of a set of images, from the plurality of images, wherein each of the set of images has the image quality score greater than a pre-defined threshold value, and wherein the augmentation scenario is based on the plurality of images and the associated text;
identify (204) one or more objects in the digital document, based on an accuracy level of an identification mode, using the AR engine (402); and
generate (205) binaries based on the information extracted from the digital document and the one or more objects.
7. The system as claimed in claim 6, wherein the processor-executable instructions further cause the processor (601) to:
determine if the plurality of images and the associated text are insufficient for AR content generation; and
receive additional inputs from a user, wherein the additional inputs comprise metadata of the one or more objects in an image to identify a plurality of components of the image during object identification, and wherein the binaries are generated based on the metadata provided by the user.
8. The system as claimed in claim 6, wherein the AR engine (402) comprises one of a marker-based engine and a marker less engine, and wherein the marker-based engine identifies the one or more objects using a computer vision-based approach, and the marker less engine identifies the one or more objects using a machine learning model.
9. The system as claimed in claim 6, wherein the plurality of images comprises a two-dimensional (2D) image, a schematic image, and a flowchart, and wherein the associated text for a particular image comprises labels of one or more components of the particular image and associated description for each of the one or more components.
10. The system as claimed in claim 6, wherein the processor-executable instructions further cause the processor (601) to receive an input, from a user, on one or more of an augmentation type, a processing option, and the accuracy level of the identification mode, wherein the augmentation type comprises a text augmentation and an image augmentation, and wherein the accuracy level of the identification mode comprises a strict accuracy level of the identification mode, a medium accuracy level of the identification mode and a low accuracy level of the identification mode, and wherein the processing option comprises an on-cloud processing option and an offline processing option.