Abstract: A CONTROLLER AND METHOD TO AUTOMATICALLY GENERATE METADATA FROM AT LEAST ONE IMAGE Abstract The controller 100 receives at least one image and pre-processes the at least one image to enhance it using an image processing module 102, characterized in that, the controller 100 identifies the number of screens in the processed images using a screen detection module 104. In the identified screens, user interface components are detected using an object detection module 106. By removing the identified screens and the detected user interface components, a directional flow line with starting and ending point coordinates is detected using the object detection module 106. The detected screens are grouped with the detected user interface components and paired with the flow lines. The controller 100 and the method automatically generate metadata for the at least one image. Figure 1
Description: Complete Specification:
The following specification describes and ascertains the nature of this invention and the manner in which it is to be performed.
Field of the invention:
[0001] The present invention relates to a controller and method to automatically generate metadata from at least one image.
Background of the invention:
[0002] A User Interface (UI) flow diagram is a diagrammatic representation consisting of a sequence of screens that a user goes through while using a software application or website, wherein those screens comprise virtual buttons and components of different colors, sizes, shapes, etc. In a software development workflow, these representations are given by the User Interface & Experience (UI/UX) teams in the form of specifications. These specifications play a crucial role for, and form an integral part of the work of, the different stakeholders associated with a project. However, due to manual interpretation, these specifications are interpreted differently than required by stakeholders at different stages of the project, and this poses a challenge.
[0003] According to a prior art US2019250891, techniques are disclosed for automating GUI development from a Graphical User Interface (GUI) screen image that includes text information and one or more graphical user interface components. The GUI screen image is analysed to extract the text information and to identify the UI components included in the GUI screen. One or more text regions in the GUI screen image are detected and are replaced with placeholders. Images of one or more graphical user interface components in the GUI screen are extracted from the GUI screen image and are classified using a machine learning-based classifier. A GUI model is generated for the GUI based upon the classification results, the locations of the one or more text regions, and the locations of the one or more graphical user interface components. The generated model can then be used to generate one or more implementations (e.g., executable code) of the GUI, possibly for various platforms in different programming languages.
Brief description of the accompanying drawings:
[0004] An embodiment of the disclosure is described with reference to the following accompanying drawings.
[0005] Fig. 1 illustrates a block diagram of a controller to automatically generate metadata from at least one image, according to an embodiment of the present invention;
[0006] Fig. 2 illustrates different stages performed throughout the process of automatic generation of metadata, according to an embodiment of the present invention, and
[0007] Fig. 3 illustrates a flow diagram of a method for automatically generating metadata from at least one image, according to the present invention.
Detailed description of the embodiments:
[0008] Fig. 1 illustrates a block diagram of a controller 100 to automatically generate metadata from at least one image, according to an embodiment of the present invention. The controller 100 is configured to receive at least one image/document/file as an input in any form or combination, for example, png, jpeg, jpg, pdf, screen shot, snippet, cropped image or any other known or new formats. Here, the received at least one image gives an overall understanding of the whole workflow that would happen if a user interacted with any of the UI components present in the screen. The received at least one image is the diagrammatic representation consisting of the sequence of UI screens that the user goes through while using a software application, website, mobile application, webpage, etc., which is received as an input for processing across different stages. Each image of the at least one image has a plurality of UI screens. In simple words, the images represent the Graphical User Interface (GUI) or User Interface (UI) screens for a specific application.
[0009] Once the at least one image is received, the controller 100 is configured to pre-process the received at least one image to enhance the images using the image processing module 102. The image processing module 102 performs removal of noise, enhancement of edges, etc. using methods such as, but not limited to, image resizing, image smoothing, etc. The image processing module 102 enhances or improves the quality of the at least one image in several ways, which makes the images more suitable for further processing with computer vision algorithms. The image processing module 102 simplifies the overall structure of the at least one image, which improves the accuracy and speed of the algorithms used to extract features and recognize patterns in the images.
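By way of illustration only, a minimal pre-processing sketch is given below, assuming Python with OpenCV is available; the function name `preprocess`, the target width and the filter size are illustrative assumptions and not the actual implementation of the image processing module 102.

```python
import cv2


def preprocess(image_path, target_width=1280):
    """Load a UI-flow image, then resize and smooth it for further processing."""
    img = cv2.imread(image_path)                       # png/jpg/jpeg input (assumed path)
    scale = target_width / img.shape[1]                # keep the aspect ratio
    img = cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
    img = cv2.GaussianBlur(img, (3, 3), 0)             # image smoothing / noise removal
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)       # single channel for later stages
    return img, gray
```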
[0010] Once pre-processing is done, characterized in that, the controller 100 is configured to identify the number of different UI screens present in the processed at least one image using a screen detection module 104 (also known as a contour detection module), which makes grouping of the UI/UX components with their respective screens easy. The detection of the UI screens using the screen detection module 104 gives the proper navigation direction when the user interacts with a UI element present in a UI screen. The screen detection module 104 is a fine-tuned deep learning model, trained on a plurality of images received as input, used for detecting the various UI screens present in the received at least one image. Upon successful screen identification, the controller 100 is configured to detect the user interface components in the different identified UI screens of the at least one image using an object detection module 106. Here, along with the identification, each UI component is mapped with its parent screen and pixel-wise information of each component is calculated. Upon component detection, the color and text associated with the detected component are identified and the information associated with the UI components is mapped. The results of the component detection are validated by the object detection module 106, which is fine-tuned using a neural network. The neural network predicts/estimates bounding boxes and class probabilities for objects in the image. The object detection module 106 divides the image into a grid and makes predictions for each grid cell, which allows it to efficiently detect objects in real time.
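Since the screen detection module 104 is also referred to as a contour detection module, a hedged contour-based sketch is given below; the Otsu threshold, the minimum area ratio and the function name `detect_screens` are illustrative assumptions, and a fine-tuned deep learning detector could equally supply the screen bounding boxes.

```python
import cv2


def detect_screens(gray, min_area_ratio=0.02):
    """Return bounding boxes (x, y, w, h) of candidate UI screens via contours."""
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    image_area = gray.shape[0] * gray.shape[1]
    screens = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        if w * h > min_area_ratio * image_area:        # keep only screen-sized regions
            screens.append((x, y, w, h))
    return screens
```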
[0011] Once the UI screen identification and the UI component detection are done, the controller 100 is configured to detect the directional flow lines with a starting point and an ending point coordinate using the object detection module 106 by removing the identified UI screens and the detected user interface components. The controller 100 performs an 8-directional Breadth First Search (BFS). Upon removal of the screens and all UI components, the images of the at least one image are left with only the flow lines. The fine-tuned object detection module 106 gives the starting and ending point coordinates of all the detected flow lines. Since the image is left with only flow lines, it is possible to filter out all the pixel coordinates. The ending point coordinates are mapped over the image and the respective region of interest is cropped. The 8-directional breadth first search traversal traverses a flow line until it falls on or comes in proximity of the starting point, and when the respective starting point is found, the detection of the line is done/completed. Upon line detection, the flow line is mapped with its starting point and ending point coordinates.
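A minimal sketch of the 8-directional breadth first search traversal described above is given below; `binary` is assumed to be the image left with only flow-line pixels after removal of screens and components, and the `proximity` tolerance and the (row, column) coordinate convention are illustrative assumptions.

```python
from collections import deque

# the eight neighbouring offsets used by the 8-directional traversal
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def trace_flow_line(binary, end_pt, start_pt, proximity=3):
    """BFS over flow-line pixels from the ending point toward the starting point."""
    height, width = binary.shape
    visited = {end_pt}
    queue = deque([end_pt])
    path = []
    while queue:
        r, c = queue.popleft()
        path.append((r, c))
        if abs(r - start_pt[0]) <= proximity and abs(c - start_pt[1]) <= proximity:
            return path                                # starting point reached: line detected
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < height and 0 <= nc < width
                    and binary[nr, nc] > 0 and (nr, nc) not in visited):
                visited.add((nr, nc))
                queue.append((nr, nc))
    return path                                        # starting point not reached
```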
[0012] Upon flow line detection, the controller 100 is configured to group the detected user interface components with the detected UI screens and pair the flow lines with the detected UI screens. The controller 100 maps the UI/UX components. Here, each component is mapped with its respective parent UI screen. For this, the identified flow lines are paired with the starting and ending UI screens using a minimum distance criterion. The distance of a point P from a UI screen Si is taken as the minimum of its distances to the four line segments forming the screen boundary, i.e., D(Si, P) = Min(D(P, AB), D(P, AC), D(P, CD), D(P, BD)), wherein A, B, C and D are the corner points of the UI screen Si and P is the flow line point. Using this criterion, the flow points are mapped with the UI screen in closest proximity. This determines the flow direction through the UI screens, i.e., which is the source and which is the destination UI screen, and the UI components are mapped and grouped accordingly. Once all the information is compiled, metadata/schema is generated for the at least one image in JSON form or any other known or new form, which represents the characteristics of the UI/UX flow.
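The minimum distance criterion D(Si, P) = Min(D(P, AB), D(P, AC), D(P, CD), D(P, BD)) can be sketched as follows; the helper names and the (x, y, w, h) screen box format are illustrative assumptions.

```python
import math


def point_to_segment(p, a, b):
    """Euclidean distance from point p to the line segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def screen_distance(p, screen):
    """D(Si, P): minimum distance from point p to the four edges of a screen box."""
    x, y, w, h = screen
    a, b, c, d = (x, y), (x + w, y), (x, y + h), (x + w, y + h)   # corners A, B, C, D
    return min(point_to_segment(p, a, b), point_to_segment(p, a, c),
               point_to_segment(p, c, d), point_to_segment(p, b, d))


def nearest_screen(p, screens):
    """Index of the UI screen in closest proximity to flow point p."""
    return min(range(len(screens)), key=lambda i: screen_distance(p, screens[i]))
```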
[0013] According to an embodiment of the present invention, a device to automatically generate metadata from at least one image is disclosed. The device comprises the controller 100 configured to receive at least one image as input and pre-process the received at least one image to enhance the images using the image processing module 102, characterized in that, the controller 100 is configured to identify the number of UI screens in the processed at least one image using the screen detection module 104. By using the object detection module 106, the controller 100 is configured to detect the user interface components in the identified screens. The controller 100 detects the directional flow line with the starting and ending point coordinates using the object detection module 106 by removing the identified UI screens and the detected user interface components. Upon detection of the directional flow line, the controller 100 is configured to group the detected user interface components with the detected screens and pair the flow lines with the detected UI screens. Once all the information is compiled, metadata is generated in JSON form or any other known or new form, which represents the characteristics of the UI/UX flow.
[0014] In accordance with an embodiment of the present invention, the controller 100 is provided with the necessary signal detection, acquisition, and processing circuits. The controller 100 comprises an input interface, output interfaces having pins or ports, a memory element such as Random Access Memory (RAM) and/or Read Only Memory (ROM), an Analog-to-Digital Converter (ADC) and a Digital-to-Analog Converter (DAC), clocks, timers, counters and at least one processor (capable of implementing machine learning) connected with each other and to other components through communication bus channels. The memory element (not shown) is pre-stored with logics or instructions or programs or applications or modules/models and/or threshold values/ranges, reference values, predefined/predetermined criteria/conditions, lists, and knowledge sources which is/are accessed by the at least one processor as per the defined routines. The internal components of the controller 100 are not explained for being state of the art, and the same must not be understood in a limiting manner. The controller 100 may also comprise communication units such as transceivers to communicate through wireless or wired means such as Global System for Mobile Communications (GSM), 3G, 4G, 5G, Wi-Fi, Bluetooth, Ethernet, serial networks, and the like. The controller 100 is implementable in the form of a System-in-Package (SiP) or a System-on-Chip (SoC) or any other known type. Examples of the controller 100 comprise, but are not limited to, a microcontroller, microprocessor, microcomputer, Electronic Control Unit (ECU), etc.
[0015] In accordance with an embodiment of the present invention, the controller 100 is part of a device which is at least one of a group comprising a computer, a laptop, a cloud computer, a smart phone, and the like.
[0016] Fig. 2 illustrates different stages performed throughout the process of automatic generation of metadata, according to an embodiment of the present invention. The working of the controller 100 is explained with an example. Consider that the user receives at least one image in PDF format on a laptop as the input for the generation of metadata. The user processes the input through a UI platform, UI framework or UI module stored in the memory element of the controller 100. The user uploads the at least one image into the UI module, which is taken as the input by the controller 100 and processed further. As shown in Fig. 2, three images, namely a first image 202, a second image 204 and a third image 206, are shown to have been received in different forms, e.g., png, jpg, jpeg, pdf, etc., or in any new form. The three images are shown for simplicity of understanding, and there may be more as per the application. After receiving the images, the controller 100 pre-processes the images using different methods, e.g., image resizing, image smoothing, etc. Using these techniques, the overall structure of the images is enhanced and simplified by the controller 100. On the received images, the screen detection module 104 is applied to identify the number of different UI screens. The identification makes the grouping of the UI/UX components with their respective UI screens easy. In each of the three images 202, 204, 206, three UI screens 210, 212 and 214 are shown. This is just for explanation and is not to be understood in a limiting manner.
[0017] As shown in the second image 204, different components like radio buttons, check boxes, etc., along with their color and text, are identified. After the components, their color and their text are identified, the mapping of each UI component with its parent screen is done and pixel-wise information of all the identified components is also calculated. Further, the text and color are mapped with the respective components.
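A small sketch of identifying the colour and the text of a detected component is given below; pytesseract is an assumed OCR dependency (Tesseract must be installed separately) and the (x, y, w, h) bounding box format is an illustrative assumption.

```python
import cv2
import pytesseract  # assumed OCR dependency, not prescribed by the specification


def component_colour_and_text(img, box):
    """Return the mean BGR colour and the OCR'd text of a component bounding box."""
    x, y, w, h = box
    roi = img[y:y + h, x:x + w]                        # crop the component region
    b, g, r, _ = cv2.mean(roi)                         # mean colour over the crop
    text = pytesseract.image_to_string(roi).strip()    # read the component label
    return (int(b), int(g), int(r)), text
```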
[0018] As shown in the third image 206, the flow lines are detected using the 8-directional breadth first search. In this process of flow line 216 detection, the identified UI screens and components are removed, and the images are left with the flow lines 216, as shown in a fourth image 208. The fine-tuned object detection module 106 gives the starting point and the ending point coordinates of the flow lines, as shown in 206. The dots in the different screens show the starting and ending point coordinates. Once the coordinates are available, the ending point coordinates are mapped over the image and the respective region of interest is cropped. The 8-directional breadth first search traversal traverses a flow line until it falls on or comes in proximity of the starting point, and once the starting point is found, the detection of the line is done. Finally, the mapping of the detected flow line with its starting point and ending point coordinates is performed. In the third image 206, the controller 100 groups the UI/UX information with the respective screens using the minimum distance criterion. Using this criterion, the flow points are mapped with the UI screens in closest proximity. This also determines the flow direction through the UI screens, i.e., which is the source and which is the destination UI screen. Once the grouping is done, metadata is generated for the detected at least one image by the controller 100, which represents the characteristics of the UI/UX flow.
[0019] In another working example, the controller 100 is part of a cloud computer connected through the internet to the input units which are used to upload the at least one image as input. Once the at least one image is received as input, the controller 100 performs the aforementioned processes and generates the metadata. The metadata is stored in the cloud itself and is retrievable as per requirement.
[0020] Fig. 3 illustrates a flow diagram of a method for automatically generating metadata from at least one image, according to the present invention. The method comprises a plurality of steps, of which a step 302 comprises receiving at least one image as input in png, jpeg, jpg, pdf or any new or known format. A step 304 comprises pre-processing the received at least one image for enhancing the images using the image processing module 102. A step 306 comprises identifying the number of UI screens in the processed at least one image using the screen detection module 104. Here, the number of UI screens present in the received images is identified, so that the grouping of the respective UI/UX components is simple. A step 308 comprises detecting the UI components in the identified screens of the at least one image using the object detection module 106. Here, pixel-wise information of each component is calculated, and the actual or correct information is mapped with the UI component. The object detection module 106 is fine-tuned using a single neural network which predicts bounding boxes and class probabilities for objects in the images. A step 310 comprises detecting the directional flow line with the starting point and the ending point coordinates using the object detection module 106 by removing the identified UI screens and the detected user interface components. The flow lines are detected using the 8-directional breadth first search. A step 312 comprises grouping the detected user interface components with the detected screens and pairing the flow lines with the detected UI screens. A step 314 comprises generating metadata for the detected at least one image.
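A hedged sketch of step 314 is given below, compiling the detected screens, components and flow pairs into JSON metadata; the field names are illustrative assumptions and do not represent a schema prescribed by the specification.

```python
import json


def generate_metadata(screens, components, flows):
    """Compile screens, components and flow pairs into a JSON metadata string."""
    metadata = {
        "screens": [
            {
                "id": idx,
                "bounding_box": {"x": x, "y": y, "width": w, "height": h},
                "components": [c for c in components if c.get("screen_id") == idx],
            }
            for idx, (x, y, w, h) in enumerate(screens)
        ],
        "flows": [{"source_screen": src, "destination_screen": dst} for src, dst in flows],
    }
    return json.dumps(metadata, indent=2)


# Illustrative usage with made-up values:
# generate_metadata([(0, 0, 300, 600), (320, 0, 300, 600)],
#                   [{"screen_id": 0, "type": "button", "text": "Login"}],
#                   [(0, 1)])
```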
[0021] According to the method, the pre-processing methods are image resizing, image smoothing, and the like. By combining at least one of the pre-processing methods, the at least one image is enhanced in several ways which make the images more suitable for further processing. The pre-processing methods remove noise, enhance edges, simplify the overall structure of the image, etc.
[0022] According to the method, the flow lines are detected using the 8-directional BFS along with pairing them with the starting and ending points. Here, the first stage is the removal of the UI screens from the received images, and then the respective screen coordinates are obtained using the screen detection module 104. Upon detection of the UI screen coordinates, the object detection module 106 detects the starting and ending point coordinates of the flow lines. Upon detection of the flow lines, the UI/UX components are mapped with their respective parent screens, for which the identified flow lines are paired with the starting and ending UI screens using the minimum distance criterion. This also determines the flow direction through the UI screens, i.e., which is the source and which is the destination UI screen.
[0023] According to the method, the metadata is generated for the detected at least one image once all the information is compiled. The generated metadata is in the form of a JSON file or any known or new form which represents the characteristics of the UI/UX flow.
[0024] According to the present invention, the controller 100 and a method to create a schema from Graphical User Interface (GUI) based images to ease the project development workflow are disclosed. The controller 100 and the method fulfil the requirements of different teams in an organization; for example, they are usable to generate requirements specifications in natural language for product teams, user interface code for development teams, test interfaces for testing the developed code for testing teams, and many more. To overcome such challenges, an automated method is required, wherein metadata is generated from at least one image, the actual or correct interpretation of the UI/UX components is done automatically, and different types of test cases are generated automatically for different use cases.
[0025] It should be understood that the embodiments explained in the description above are only illustrative and do not limit the scope of this invention. Many such embodiments and other modification and changes in the embodiment explained in the description are envisaged. The scope of the invention is only limited by the scope of the claims.
Claims: We Claim:
1. A controller (100) to automatically generate metadata for User Interface (UI) screens from at least one image, said controller (100) configured to:
a. receive at least one image as an input;
b. pre-process said received at least one image using image processing module (102), characterized in that;
c. identify a number of UI screens in the processed at least one image using a screen detection module (104);
d. detect user interface components in the identified UI screens of the at least one image using an object detection module (106);
e. detect a directional flow line with a starting point and an ending point coordinate using said object detection module (106) by removing said identified UI screen and the detected user interface components;
f. group the detected user interface components with the detected screens and pair the flow lines with the detected UI screens, and
g. generate metadata for the detected UI screens of the at least one image.
2. The controller (100) as claimed in claim 1, wherein said pre-processing is achieved using methods selected from a group comprising image resizing, image smoothing, and the like.
3. The controller (100) as claimed in claim 1, wherein said object detection module (106) is trained using an artificial neural network/machine learning module.
4. The controller (100) as claimed in claim 1, wherein the detection of the directional flow line is identified using an 8-directionality breadth first search.
5. The controller (100) as claimed in claim 1, wherein the detected directional flow line is linked with the starting and ending UI screens, which are identified using a minimal distance criterion.
6. A method for automatically generating metadata for User Interface (UI) screens from at least one image, said method performed by a controller (100) comprising the steps of:
a) receiving at least one image as an input;
b) pre-processing said received at least one image using image processing module (102), characterized by
c) identifying a number of UI screens in said processed at least one image using a screen detection module (104);
d) detecting user interface components in the identified UI screens of the at least one image using an object detection module (106);
e) detecting a directional flow line with a starting point and an ending point coordinate using said object detection module (106) by removing the identified UI screens and the detected user interface components;
f) grouping the detected user interface components with the detected UI screens and pairing the flow lines with the detected UI screens, and
g) generating metadata for the detected UI screens of the at least one image.
7. The method as claimed in claim 6, wherein said step of pre-processing is achieved using methods selected from a group of image resizing, image smoothing, and the like.
8. The method as claimed in claim 6, wherein said object detection module (106) is trained using an artificial neural network/machine learning.
9. The method as claimed in claim 6, wherein the detection of directional flow line is identified using an 8-directionality breadth first search.
10. The method as claimed in claim 9, wherein the detected directional flow line is linked with the starting and ending UI screens using a minimal distance criterion.