Abstract: Complete automation in the process of accurately extracting and interpreting User Interface (UI) design requirements provided through a wireframe is required considering rapidly changing technology platforms, shrinking timelines and frequently changing UI design requirements. The embodiments herein provide a method and system for automatically recognizing User Interface (UI) elements from UI designs or wireframes to generate working screens. The method utilizes an optimal combination of Machine learning techniques, image processing techniques, Optical Character Recognition (OCR) and expert systems with rule engines to process an image of a wireframe, which defines the UI design requirement. The wireframe image is processed in accordance with the method disclosed herein to accurately detect, recognize and group the recognized UI elements in accordance with a parent-child relationship to generate a UI element list. The UI element list is further converted to XML based specification files to generate the working screen. [To be published with FIG. 6B]
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
TITLE
METHOD AND SYSTEM FOR AUTOMATICALLY RECOGNIZING UI
ELEMENTS FROM WIREFRAMES TO GENERATE WORKING
SCREENS
Applicant
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description
The following specification particularly describes the invention and the manner in which it is to be performed.
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
[001] The present application is a patent of addition of Indian Patent Application No. 201921018637, filed on 9 May 2019, the entire content of which is hereby incorporated herein by way of reference.
TECHNICAL FIELD
[002] The embodiments herein generally relate to the field of User Interface (UI) design of web based applications and, more particularly, to automatically recognizing UI elements from UI designs or wireframes to generate working screens.
BACKGROUND
[003] In the digital world, web applications on digital devices are a link between consumers and services provided through the internet. User Interface (UI) designing for these web applications is challenging considering rapidly changing technology platforms, shrinking timelines, short sprints in agile development and frequently changing UI design requirements owing to the customer centric approach followed by service providers. Conventionally, developers worked upon UI design requirements provided via wireframes and further relied on the technical expertise of an expert to develop a working screen or as-is screen. Any review changes or client requirements required multiple iterations and repeated involvement of technical experts along with the developer to bring in the requested change. Thus, the design loop is required to be repeated right from the UI design or wireframe stage, effectively making the process time consuming. This task is tedious and time consuming for the persons involved in UI design and development. Further, rapid technology changes have made it challenging to update the UI designs for the endless new resolutions and devices. For example, creating a website version for each resolution and new device would be impractical. Thus, automation is desired for a time efficient and effort efficient design and development process.
[004] The Applicant has addressed concerns and limitations in the art towards automation of the process of wireframe to working screens with a technology agnostic approach, in applicant’s Indian Patent Application No. 201921018637, filed on 9 May 2019, by providing a system with a UI editor. The UI editor enables a user, typically a developer, to work on and edit an image of a wireframe, alternatively referred to as a UI design. Further, the edited image is processed automatically by the system to extract specifications or user requirements from wireframes and the specifications are converted to a technology independent model generated by a model editor. This technology independent model can then be imported into UI models, followed by code conversion to the required technology stack. The user has control to modify the imported specifications before code conversion. However, this UI editor requires manual intervention to identify and label the UI components, which makes the process semi-automated and time consuming, in turn affecting user experience. Thus, automating the process of accurately extracting and interpreting the UI design requirements provided through a wireframe is desired.
[005] Recent developments provide machine learning based approaches that reduce human effort in identifying components of a UI design and generating UI code for a working screen. These developments can be categorized as: i) training a deep learning model by providing images and the corresponding DSLs as input and ii) identifying some of the popular UI elements and creating either Java or HTML or Mobile or any technology pages. However, the first approach requires interpreters to translate DSLs to working screens and the second approach is bound by technology limitations.
SUMMARY
[006] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method for recognizing User Interface (UI) elements from wireframes to generate working screens is provided. The method comprises accessing an
image of a wireframe providing UI design requirements from a database, wherein UI design requirements comprise one or more UI elements having a plurality of text elements and a plurality of control elements. Further, recognizing the plurality of text elements in the image, by processing the image using a pre-trained text detection neural network followed by an Optical Character Recognition (OCR), wherein the recognized plurality of text elements comprise a plurality of column groups within each of a plurality of row groups of text characters identified in the image, wherein each of the plurality of column groups comprises a set of text characters marked with a text bounding box, wherein the text bounding box is uniquely identified with a text quadruplet defining corner coordinates of the text bounding box with respect to coordinates of a reference corner of the image, a height and a width of the text bounding box. Furthermore, recognizing and labelling the plurality of control elements in the image using one of: a combination of one or more image processing techniques and a pre-trained Support Vector Machine (SVM) classifier, and a pre-trained deep learning model. Each of the recognized plurality of control elements is marked with a control bounding box, and a label representing a control type of the control element, wherein the control bounding box is uniquely identified by a control quadruplet defining corner coordinates of the control bounding box with reference to coordinates of a reference corner of the image, a height and a width of the control bounding box. Further, establishing a parent-child relationship to identify one or more parent elements and one or more child elements nested within each of the one or more parent elements among the labelled plurality of control elements using distance computation based on the control quadruplet, wherein the control quadruplet of each of the identified one or more child elements is revised in accordance with the reference corner of corresponding parent elements to define positioning of a child element within a parent element. Furthermore, generating a UI element list by mapping each of the plurality of control elements with each of the plurality of text elements based on the text quadruplet of each of the created plurality of column groups, the control quadruplet of each of the labelled plurality of control elements, the established parent child relationship among the labelled
plurality of control elements and a set of predefined rules. Further, generating a part summary for each of the plurality of control elements, wherein the part summary comprises a control name, a control type, the control quadruplet and a control content corresponding to the set of characters corresponding to each of the plurality of text elements mapped in the UI element list using a UI editor. Thereafter, converting the part summary to eXtensible Markup Language (XML) based specification files in accordance with a set of XML support files accessible to the UI editor. Further, generating a technology independent model, based upon a predefined standard, from the XML based specification files using a model editor. Furthermore, generating a working screen for the received image of the wireframe from the technology independent model using the model editor in accordance with a preset technology platform for the wireframe, wherein the working screen is displayed to a user via a graphical user interface.
[007] In another aspect, a system for recognizing User Interface (UI) elements from wireframes to generate working screens is provided. The system comprises a memory storing instructions; one or more Input/Output (I/O) interfaces; and processor(s) coupled to the memory via the one or more I/O interfaces, wherein the processor(s) is configured by the instructions to access an image of a wireframe providing UI design requirements from a database in the memory, wherein UI design requirements comprise one or more UI elements having a plurality of text elements and a plurality of control elements. Further, recognize the plurality of text elements in the image, by processing the image using a pre-trained text detection neural network followed by an Optical Character Recognition (OCR), wherein the recognized plurality of text elements comprise a plurality of column groups within each of a plurality of row groups of text characters identified in the image, wherein each of the plurality of column groups comprises a set of text characters marked with a text bounding box, wherein the text bounding box is uniquely identified with a text quadruplet defining corner coordinates of the text bounding box with respect to coordinates of a reference corner of the image, a height and a width of the text bounding box. Furthermore, recognize and label the plurality of control elements in the image using one of: a combination of one or more image processing techniques and a pre-trained Support Vector Machine (SVM) classifier, and a pre-trained deep learning model. Each of the recognized plurality of control elements is marked with a control bounding box, and a label representing a control type of the control element, wherein the control bounding box is uniquely identified by a control quadruplet defining corner coordinates of the control bounding box with reference to coordinates of a reference corner of the image, a height and a width of the control bounding box. Further, establish a parent-child relationship to identify one or more parent elements and one or more child elements nested within each of the one or more parent elements among the labelled plurality of control elements using distance computation based on the control quadruplet, wherein the control quadruplet of each of the identified one or more child elements is revised in accordance with the reference corner of corresponding parent elements to define positioning of a child element within a parent element. Furthermore, generate a UI element list by mapping each of the plurality of control elements with each of the plurality of text elements based on the text quadruplet of each of the created plurality of column groups, the control quadruplet of each of the labelled plurality of control elements, the established parent child relationship among the labelled plurality of control elements and a set of predefined rules. Further, generate a part summary for each of the plurality of control elements, wherein the part summary comprises a control name, a control type, the control quadruplet and a control content corresponding to the set of characters corresponding to each of the plurality of text elements mapped in the UI element list using a UI editor. Thereafter, convert the part summary to eXtensible Markup Language (XML) based specification files in accordance with a set of XML support files accessible to the UI editor. Further, generate a technology independent model, based upon a predefined standard, from the XML based specification files using a model editor. Furthermore, generate a working screen for the received image of the wireframe from the technology independent model using the model editor in accordance with a preset technology platform for the wireframe.
[008] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[009] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[010] FIG. 1 is a functional block diagram of a system for automatically recognizing User Interface (UI) elements from UI designs or wireframes to generate working screens, in accordance with some embodiments of the present disclosure.
[011] FIG. 2A through 2C is a flow diagram illustrating a method for automatically recognizing the UI elements from the UI designs or the wireframes to generate the working screens, using the system of FIG. 1, in accordance with some embodiments of the present disclosure.
[012] FIG. 3 illustrates a flow diagram of a process providing steps for text recognition of text elements within the UI elements, in accordance with some embodiments of the present disclosure.
[013] FIG. 4 is an example illustrating eight-coordinate to four-coordinate conversion of bounding boxes of the detected UI elements, in accordance with some embodiments of the present disclosure.
[014] FIG. 5A through FIG. 5C illustrate an example of establishment of a parent-child relationship among the detected control elements in the UI elements, in accordance with some embodiments of the present disclosure.
[015] FIG. 6A and FIG. 6B are examples illustrating output displayed by the system of FIG. 1 for intermediate steps of the method of FIG. 2, in accordance with some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
[016] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope being indicated by the following claims.
[017] The Applicant has addressed concerns and limitations in the art, in applicant’s Indian Patent Application No. 201921018637, filed on 9 May 2019, by providing a system with a UI editor to capture UI design requirements from wireframes. The UI editor enables a user, typically a developer, to work on and edit an image of a wireframe. Further, the edited image is processed automatically by the system to extract specifications or user requirements from wireframes and convert the specifications to a technology independent model generated by the model editor. This technology independent model can then be imported into UI models, followed by code conversion to the required technology stack. The user is also provided control to modify the imported specifications before going to code conversion. However, this UI editor requires manual intervention to identify and label the UI components or UI elements. Thus, further automation implemented over applicant’s Indian Patent Application No. 201921018637, so as to completely eliminate manual intervention, will enhance user experience and make the UI design to development process time efficient and less erroneous.
[018] To provide an end to end automated approach to further enhance the Indian Patent Application No. 201921018637, the embodiments herein provide a method and system for automatically recognizing User Interface (UI) elements from UI designs or wireframes to generate working screens. The method utilizes an optimal combination of Machine learning techniques, image processing techniques, Optical Character Recognition (OCR) and expert systems with rule engines to process an image of a wireframe, wherein the wireframe
defines the UI design requirement. The wireframe image is processed in accordance with the method disclosed herein to accurately detect, recognize and group the recognized UI elements in accordance with a parent-child relationship to generate a UI element list. The UI element list comprises a plurality of control elements mapped to a plurality of text elements, both detected from the processed wireframe. Once the UI element list is generated, which captures all the UI elements with their interrelations, the method disclosed herein utilizes the steps described in the Indian Patent Application No. 201921018637, to generate a part summary for each of the plurality of control elements of the UI element list using the UI editor. Further, the method comprises converting the part summary to an eXtensible Markup Language (XML) based specification file in accordance with a set of XML support files accessible to the UI editor. Using the XML based specification files, which capture the detected and recognized UI elements, a technology independent model is generated by a model editor. The model editor utilizes a predefined standard, such as the Open Mobile Alliance (OMA) standard, to generate the technology independent model. The technology independent model can then be converted to a working screen implementing all UI design requirements in the received image, based on the technology platform provided to the model editor.
[019] Referring now to the drawings, and more particularly to FIGS. 1 through 6B, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[020] FIG. 1 is a functional block diagram of a system 100 for automatically recognizing User Interface (UI) elements from UI designs or wireframes to generate working screens, in accordance with some embodiments of the present disclosure. In an embodiment, the system 100 includes a processor(s) 104, communication interface device(s), alternatively referred to as input/output (I/O) interface(s) 106, and one or more data storage devices or memory 102 operatively coupled to the processor(s) 104. In an embodiment, the processor(s) 104 can be one or more hardware processors 104. In an embodiment, the one or
more hardware processors 104 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is configured to fetch and execute computer-readable instructions stored in a memory 102. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
[021] The I/O interface(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like, and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server. The I/O interface 106 provides an interface to receive the images of the wireframes from external data sources. In an embodiment, images of wireframes may be stored in a database 108 of the memory 102. Further, the database 108 may also contain image processing libraries such as the image library cv2, SVM classifier models, deep learning models, OCR modules and the like used for automatically recognizing the UI elements in the wireframe such as the control elements and the text elements. Further, the memory may store the UI editor and the model editor, referred to in the Indian Patent Application No. 201921018637, for automatically generating working screens from the recognized UI elements.
[022] The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. Furthermore, the memory 102 may comprise information pertaining to input(s)/output(s) of
each step performed by the one or more hardware processors 104 of the system 100 and methods of the present disclosure.
[023] FIG. 2A through FIG. 2C is a flow diagram illustrating a method for automatically recognizing the UI elements from the UI designs or the wireframes to generate the working screens, using the system of FIG. 1, in accordance with some embodiments of the present disclosure.
[024] In an embodiment, the system 100 comprises one or more data storage devices or the memory 102 operatively coupled to the processor(s) 104 and is configured to store instructions for execution of steps of the method 200 by the one or more hardware processors 104. The steps of the method 200 of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in FIG. 1 and the steps of the flow diagrams as depicted in FIG. 2A through FIG. 2C and FIG. 3, along with illustrative examples from FIG. 4 through FIG. 6B. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
[025] Referring to the steps of the method 200, steps 202 through 212 describe processing of wireframes to automatically recognize one or more UI elements and generate the UI element list. Further, steps 214 through 218 generate the XML specification files and the working screens, as described in applicant’s Indian Patent Application No. 201921018637. Thus, steps 202 through 212 provide an enhancement over the applicant’s Indian Patent Application No. 201921018637 by providing automation at the wireframe UI element detection and recognition level, with higher accuracy, effectively enhancing user experience.
[026] Referring now to the steps of the method 200, at step 202, the one or more hardware processors 104 are configured to access the image of a wireframe of interest from the database 108, where the image may have been received from an external data source and stored. The wireframe provides UI design requirements in terms of a plurality of text elements and a plurality of control elements, as depicted in the partial view of an example wireframe image in FIG. 6A. For example, 602 indicates a text element ‘control1’ within a control element 604 of type drop down, whereas the text element 606 is adjacent to the control element.
[027] The received wireframe image is further processed at step 204 of the method 200, wherein the one or more hardware processors 104 are configured to recognize the plurality of text elements in the image, by processing the received image using a pre-trained text detection neural network followed by the Optical Character Recognition (OCR).
[028] The recognition of text elements is further elaborated in conjunction with steps of a process 300 as depicted in FIG. 3. At step 302 of the process 300, the received image is processed using the pre-trained text detection neural network to identify a plurality of text regions in the image. For example, the neural network model used may be a text detection Connectionist Text Proposal Network (CTPN) or the like. The detected plurality of text regions are marked with a plurality of bounding boxes. Each of the plurality of bounding boxes are uniquely identified by a first set of eight spatial coordinates corresponding to x and y coordinates of four corners of each bounding box with respect to coordinates of a reference corner of the image of the wireframe, as depicted by illustrative example 402 in FIG. 4. Further, at step 304, each of the plurality of text regions is cropped in accordance with the bounding boxes marked for each of the plurality of text regions.
[029] At step 306, each of the cropped plurality of text regions is processed using the OCR to mark a plurality of character bounding boxes around each of a plurality of characters in each of the cropped plurality of text regions and recognize the plurality of characters. The characters recognized using the OCR techniques can be stored in the memory. Each of the plurality of character bounding boxes are uniquely identified by a second set of eight spatial
coordinates corresponding to x and y coordinates of four corners of each character bounding box.
[030] At step 308, the second set of eight spatial coordinates corresponding to each of the plurality of characters is converted, using distance computation between the coordinates, to a set of four spatial coordinates. The four coordinates correspond to a first corner among the four corners of each of the plurality of character bounding boxes, a height of each of the plurality of character bounding boxes and a width of each of the plurality of character bounding boxes, as depicted in an illustrative example 404 of FIG. 4.
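By way of a non-limiting illustration only, the following Python sketch shows one possible way of converting the eight spatial coordinates of a character bounding box into the four-value representation (first corner, height and width) of step 308; the corner ordering and data layout are assumptions of this sketch and are not prescribed by the disclosure.

def to_quadruplet(corners):
    # `corners` is assumed to be four (x, y) pairs of an axis-aligned
    # character bounding box, expressed with respect to the reference
    # corner of the wireframe image.
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    x, y = min(xs), min(ys)                       # first corner of the box
    width, height = max(xs) - x, max(ys) - y      # distance computation
    return (x, y, width, height)

# Example: a character box of width 12 and height 20
print(to_quadruplet([(10, 5), (22, 5), (22, 25), (10, 25)]))   # (10, 5, 12, 20)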
[031] Since the wireframe is not an entire text document and text may be present in user intended random positions on wireframe, the method 200 localizes the text regions using the Text detection CTPN and the region details are passed on to the OCR to capture characters from position A (start of the cropped text region) to position B (end of the cropped text region).
[032] At step 310, each of the plurality of character bounding boxes is assigned a row number based on the corresponding x coordinate of each of the plurality of character bounding boxes. At step 312, row based grouping of characters among the plurality of characters is performed to create a plurality of row groups based on a predefined row vicinity threshold. The predefined row vicinity threshold can be decided based on a panel length, layout or a border that separates two lines on the image of the wireframe. Each of the plurality of row groups is assigned a unique line number in sequence. Once row grouping is performed, at step 314, column based grouping of characters associated with each of the plurality of row groups is performed. This creates a plurality of column groups within each of the plurality of row groups based on the y coordinate of each of the character bounding boxes of each of the plurality of row groups in accordance with a predefined column vicinity threshold. Since character width is consistent for a given font size, this value is used as the threshold to combine these boxes column wise. The predefined column vicinity threshold depends on the font size or on the absence of separation between columns, such as a border separation and the like. Each of the plurality of column groups represents a text element with the text bounding box
and is identified by a set of four spatial coordinates comprising the x and y coordinates of a first corner of a first character bounding box of each of the plurality of column groups, a height of each of the plurality of character bounding boxes and a width, wherein the width of the text bounding box is computed based on the number of character bounding boxes present in each of the plurality of column groups. The four-coordinate representation is as shown in example 404 of FIG. 4.
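A minimal Python sketch of the row and column grouping of steps 310 through 314 is given below. It follows the disclosure's convention of grouping rows on the x coordinate and columns on the y coordinate; the quadruplet layout and the two vicinity thresholds are assumed inputs supplied by the caller, not values from the disclosure.

def group_characters(char_boxes, row_threshold, col_threshold):
    # `char_boxes` is assumed to be a list of character quadruplets (x, y, w, h).
    rows = []
    for box in sorted(char_boxes, key=lambda b: b[0]):
        if rows and abs(box[0] - rows[-1][-1][0]) <= row_threshold:
            rows[-1].append(box)           # same line as the previous box
        else:
            rows.append([box])             # start a new row group
    text_elements = []
    for line_no, row in enumerate(rows, start=1):
        row.sort(key=lambda b: b[1])
        groups = [[row[0]]]
        for box in row[1:]:
            prev = groups[-1][-1]
            # Merge boxes whose gap along y is within the column vicinity threshold
            if box[1] - (prev[1] + prev[2]) <= col_threshold:
                groups[-1].append(box)
            else:
                groups.append([box])
        for group in groups:
            x, y, _, h = group[0]
            width = sum(b[2] for b in group)   # width from the member character boxes
            text_elements.append({"line": line_no, "box": (x, y, width, h)})
    return text_elements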
[033] Using OCR alone can provide text and corresponding bounding boxes, but this might not cover all the text areas, as the text is scattered on the screen. Thus, with only OCR for text recognition in the wireframe, accuracy will be poor. It is thus required to first identify the text areas and then use OCR to go to the location of the text and capture it. Such a combination of first identifying text regions (with the pre-trained text detection CTPN) and then extracting text (for example, with the OCR engine Tesseract), used by the system 100 disclosed herein, enables achieving near 100% accuracy in text recognition.
[034] Further, it may be observed that since text in the wireframe may be of varying font sizes, for a received image of a certain resolution, text region detection may not provide any output since some areas of the image may be blurred. In such scenarios, the one or more hardware processors 104 are configured to vary (shrink) the received image to different resolutions and repeat the text recognition process to get the text regions not captured at the original resolution of the received image. Once the text regions are identified at different resolutions, the output at each resolution is restored in accordance with the shrink percentage, and all text regions (bounding boxes) are collected and duplicates are filtered out.
[035] Thus, in an embodiment, at step 302, before proceeding to step 304, the one or more hardware processors 104 are configured to detect an absence of the plurality of text regions post processing of the image. Further, the original resolution of the image may be changed to generate a set of images at one or more predefined resolution percentages. Thereafter, each image among the set of images is processed to detect corresponding plurality of text regions in each of
the set of images and the resolution of the detected corresponding plurality of text regions in the set of images is restored to the original resolution. Once the resolution is restored, the corresponding plurality of text regions of each of the set of images are added by filtering out duplicates to generate the processed image comprising the filtered plurality of text regions. Thereafter, at step 304, the filtered plurality of text regions in the processed image are cropped to be processed using the OCR.
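A simplified Python sketch of this multi-resolution fallback is given below for illustration; `detect_text_regions` is a hypothetical callable standing in for the pre-trained text detection network, and the scale factors shown are assumptions of the sketch.

import cv2

def detect_with_multiple_resolutions(image, detect_text_regions,
                                     scales=(1.0, 0.75, 0.5)):
    # `detect_text_regions` is assumed to return a list of (x, y, w, h)
    # boxes in the coordinates of the image it receives.
    merged, seen = [], set()
    for scale in scales:
        if scale == 1.0:
            resized = image
        else:
            resized = cv2.resize(image, None, fx=scale, fy=scale)
        for (x, y, w, h) in detect_text_regions(resized):
            # Restore the box to the original resolution using the shrink percentage
            restored = (round(x / scale), round(y / scale),
                        round(w / scale), round(h / scale))
            if restored not in seen:       # filter out duplicate regions
                seen.add(restored)
                merged.append(restored)
    return merged

In practice, a tolerance-based or overlap-based de-duplication may be preferred over the exact-match check shown in this sketch.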
[036] Referring back to the steps of the method 200, once the text elements are recognized, at step 206, the one or more hardware processors 104 are configured to recognize and label the plurality of control elements in the image using one of: a combination of one or more image processing techniques and a pre-trained Support Vector Machine (SVM) classifier, and a pre-trained deep learning model. Each of the recognized plurality of control elements is marked with a control bounding box, and a label representing a control type of the control element. For example, the control type of the control element can be a text box, a text area, a button (e.g. a radio button and the like), a check box and any standard UI control. The control bounding box is uniquely identified by a control quadruplet defining corner coordinates of the control bounding box with reference to coordinates of a reference corner of the image, a height and a width of the control bounding box.
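For illustration only, a control-candidate detector built from basic OpenCV operations and a pre-trained classifier could be sketched as below; `classify_control` is a hypothetical stand-in for the pre-trained SVM or deep learning model, and the edge-detection thresholds and size filter are assumptions of this sketch rather than values from the disclosure.

import cv2

def detect_control_elements(image_gray, classify_control):
    # Edge detection followed by contour extraction to find box-like
    # control candidates; each candidate is cropped and labelled by the
    # supplied classifier (e.g. 'TextBox', 'Button', 'CheckBox').
    edges = cv2.Canny(image_gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    controls = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        if w < 10 or h < 10:              # discard noise-sized contours
            continue
        label = classify_control(image_gray[y:y + h, x:x + w])
        controls.append({"label": label, "box": (x, y, w, h)})
    return controls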
[037] FIG. 6B depicts the partial view of the wireframe image with the detected plurality of text elements and the plurality of control elements marked with bounding boxes. The detected text elements and the control elements are stored in the database 108 of the memory 102, for future reference.
[038] Once each of the control elements is marked with a bounding box, it is critical to identify any nesting and relations between these control elements. Thus, at step 208, the one or more hardware processors 104 are configured to establish a parent-child relationship to identify one or more parent elements and one or more child elements nested within each of the one or more parent elements among the labelled plurality of control elements. The nesting can
be identified using distance computation based on the control quadruplet. Once the parent-child relationship is established, the control quadruplet of each of the identified one or more child elements is revised in accordance with the reference corner of the corresponding parent element. This defines the positioning of each child element within the parent element.
[039] FIG. 5A, in conjunction with an example pseudo code below, explains nesting level detection based on a condition check for each of the control elements. It can be noted that the control quadruplet prior to nesting utilizes the image corner as reference, as depicted in FIG. 5B, while the control quadruplets for all child elements are with reference to the parent element, as depicted in FIG. 5C.
Pseudo code 1: for nesting level detection
Condition for control element 1 having control element 2 as child:
coordinates for control element 1: x1, y1, w1, h1
coordinates for control element 2: x2, y2, w2, h2
x2 > x1;
x2+ w2 < x1 + w1;
y2 > y1;
y2 + h2 < y1+ h1;
sample output
{level_1:{textElement1:{....},controlElement1:{ },....},
level_2:{controlElement4:{ },....},
level_3:{controlElement61:{ },controlElement29:{ }, },
....}
[040] Further, mapping child to parent elements is explained using pseudo code 2 in conjunction with FIG. 5C.
Pseudo code 2: mapping child to parent elements
n = number of levels
for each level from last to first:
find_parent(level i elements) in level i-1 elements
map child to parent and adjust position with respect to parent element.
sample output
{controlElement1: {...., type: 'Panel',
    child: {controlElement4: {
        child: {controlElement29: { }}
    }}
}}
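As a non-limiting, runnable rendering of Pseudo codes 1 and 2, the following Python sketch applies the containment condition to find, for each control element, its innermost enclosing control, and then re-expresses the child quadruplet with respect to the parent's reference corner. The dictionary layout and the smallest-area choice of parent are assumptions of this sketch.

def is_child(parent_box, child_box):
    # Containment condition of Pseudo code 1 on (x, y, w, h) quadruplets.
    x1, y1, w1, h1 = parent_box
    x2, y2, w2, h2 = child_box
    return x2 > x1 and x2 + w2 < x1 + w1 and y2 > y1 and y2 + h2 < y1 + h1

def nest_controls(controls):
    # `controls` is assumed to be {name: {"box": (x, y, w, h)}} with all
    # boxes initially expressed with respect to the image corner.
    absolute = {name: c["box"] for name, c in controls.items()}
    # Pass 1: innermost (smallest-area) enclosing control, if any.
    for name, ctrl in controls.items():
        parents = [(p, box) for p, box in absolute.items()
                   if p != name and is_child(box, absolute[name])]
        if parents:
            ctrl["parent"] = min(parents, key=lambda pb: pb[1][2] * pb[1][3])[0]
    # Pass 2: adjust child positions with respect to the parent's corner.
    for name, ctrl in controls.items():
        if "parent" in ctrl:
            px, py, _, _ = absolute[ctrl["parent"]]
            x, y, w, h = absolute[name]
            ctrl["box"] = (x - px, y - py, w, h)
    return controls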
[041] Once the parent-child relation is established, then at step 210, the UI element list is generated by mapping each of the plurality of control elements with each of the plurality of text elements based on the text quadruplet of each of the created plurality of column groups, the control quadruplet of each of the labelled plurality of control elements, the established parent child relationship among the labelled plurality of control elements and a set of predefined rules. The UI element list generated for the received image is stored in the memory, with a unique tag specifying the image for which the UI element list is created. This UI element list may be used for any future reference. The mapping can be one to one or many to one. A rule based system (expert system) is used to map text elements to control elements. A few illustrative rules are mentioned below for a plurality of identified cases, with an illustrative code sketch following the cases:
Case 1: Control element of type Text Box
If a text element is in the left proximity of a control element, then this text is mostly assigned as the caption/label for that control element.
If a text element is inside a control element of type Text Box then this text is assigned as content for that control element.
Condition:
coordinates for control element xc, yc, wc, hc
coordinates for text element xt, yt, wt, ht
xt > xc;
xt + wt < xc + wc;
yt > yc;
yt + ht < yc + hc;
Case 2: Control element of type Radio Button
If a text element is on left proximity of control element then this text is assigned as caption/label for that control element
Condition: can be defined by the system administrator based on coordinate values as in Case 1
If a text element is on right proximity of control element then this text is assigned as caption/label for that control element
Condition: can be defined by the system administrator based on coordinate values as in Case 1
Case 3: Control element of type Check Box
If a text element is on left proximity of control element then this text is assigned as caption/label for that control element
Condition: XXX (based on coordinates values)
If a text element is on right proximity of control element then this text is assigned as caption/label for that control element.
Condition: YYY (based on coordinate values)
Case 4: Control element of type Button
If a text element is inside a control element then this text is assigned as content for that control element.
Condition:
coordinates for control element xc, yc, wc, hc
coordinates for text element xt, yt, wt, ht
xt > xc;
xt + wt < xc + wc;
yt > yc;
yt + ht < yc + hc;
Case 5: Control element of type Drop down
If a text element is inside a control element then this text is assigned as content for that control element.
Condition:
coordinates for control element xc, yc, wc, hc
coordinates for text element xt, yt, wt, ht
xt > xc;
xt + wt < xc + wc;
yt > yc;
yt + ht < yc + hc;
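The expert-system rules above can be sketched in Python as below. This is only an illustrative simplification: the `proximity` threshold, the dictionary layout and the use of conventional screen coordinates for the left-proximity check are assumptions of this sketch, not values taken from the disclosure.

def is_inside(control_box, text_box):
    # Cases 1, 4 and 5: a text element lying inside the control element
    # becomes the content of that control element.
    xc, yc, wc, hc = control_box
    xt, yt, wt, ht = text_box
    return xt > xc and xt + wt < xc + wc and yt > yc and yt + ht < yc + hc

def map_text_to_controls(controls, texts, proximity=20):
    # `controls` and `texts` are assumed to be lists of dicts holding a
    # quadruplet under "box" and, for texts, the recognized characters.
    ui_element_list = []
    for ctrl in controls:
        xc, yc, wc, hc = ctrl["box"]
        entry = {"control": ctrl, "content": None, "caption": None}
        for txt in texts:
            xt, yt, wt, ht = txt["box"]
            if is_inside(ctrl["box"], txt["box"]):
                entry["content"] = txt["characters"]
            elif 0 <= xc - (xt + wt) <= proximity and abs(yt - yc) <= proximity:
                entry["caption"] = txt["characters"]   # left proximity rule
        ui_element_list.append(entry)
    return ui_element_list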
[042] The detected control elements can alternatively be referred to as parts, as in applicant’s Indian Patent Application No. 201921018637. Thus, for every control element in the UI element list, at step 212, the one or more hardware processors 104 are configured to generate a part summary for each of the plurality of control elements. Referring to the applicant’s Indian Patent Application No. 201921018637, it can be understood that the term control element herein maps to the term part. Thus, the part summary can be understood as a control element summary. The part summary comprises a control name, a control type, the control quadruplet and a control content corresponding to the set of characters corresponding to each of the plurality of text elements mapped in the UI element list using the UI editor, as described in detail in applicant’s Indian Patent Application No. 201921018637, and not repeated for brevity. The part summary further includes aesthetic details of each identified UI element (text element), like color of text, background color, border, font size, font type and the like, that are extracted during the text recognition.
[043] At step 214, the one or more hardware processors 104 are configured to convert the part summary to an eXtensible Markup Language (XML) based specification file in accordance with a set of XML support files accessible to the UI editor. The set of XML support files enables linking of files that store the parts cropped by the user to the XML-based specification file, as described in detail in applicant’s Indian Patent Application No. 201921018637, and not repeated for brevity.
[044] At step 216, the one or more hardware processors 104 are configured to generate a technology independent model, based upon a predefined standard, from the XML based specification files using a model editor. For example, the standard referred can be the Open Mobile Alliance (OMA) standard or the like. The technology independent model is a relational model as per Object Modelling Group standards, created for the XML-based specification file and captures functional requirements of the wireframe from the XML-based specification file. Further, the technology independent model is independent of the implementation technology to be used for generating the as-is screen with responsive behavior. At step 218, the one or more hardware processors 104 are configured to generate the working screen for the received image of the wireframe from the technology independent model using the model editor in accordance with a preset technology platform for the wireframe. The system 100 can receive user input indicating an implementation technology to be used from a plurality of implementation technologies available for generating the working screen. The working screen is displayed to a user via the graphical user interface of the system 100.
[045] Thus, the technology independent model created from the XML specifications can be imported into UI models, followed by code conversion to the required technology stack. The user is also provided control to modify the imported specifications before going to code conversion, along with a choice of technology during the generation.
[046] In a specific scenario, say after generating the working screen, the wireframe may be updated to incorporate suggestions in the UI design
requirements. Conventionally, it would be necessary to repeat the entire steps 202 through 218 of the method 200 on the new image of the updated wireframe, to generate a new working screen in line with the updated UI design requirements. However, if the changes are minimal, repetition of the entire steps of the method 200 is time and computation inefficient. Thus, the system 100 disclosed herein is configured to repeat only the steps 202 through 210, till generation of the UI element list, to get an updated UI element list instead of repeating the entire set of steps as mentioned above. The updated UI element list is now tagged with the new image of the updated wireframe. The system 100 is further configured to compare and determine the difference between the UI element list and the updated UI element list to identify the changes in the UI elements. Once the changes are identified, only the changes (new updated UI elements) are processed further to generate the part summary. Further, the XML based specification files are modified only for the changes detected. The change detection process can be referred to as a delta computation process.
Pseudo code 3: delta computation
oldRegions = FromStorage
newRegions = FromStep5
Each(newRegion){
    Exist(oldRegions): Do nothing
    notExist(oldRegions): addToList()
}
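A minimal Python rendering of the delta computation in Pseudo code 3 could look as follows; the (label, box) key used to compare UI elements is an assumption of this sketch.

def compute_delta(old_elements, new_elements):
    # Keep only UI elements of the updated wireframe that are not already
    # present in the stored UI element list; only these are pushed through
    # part summary generation and XML specification update.
    old_keys = {(e["label"], e["box"]) for e in old_elements}
    changed = []
    for element in new_elements:
        if (element["label"], element["box"]) in old_keys:
            continue                  # exists in old regions: do nothing
        changed.append(element)       # not in old regions: add to list
    return changed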
[047] Thus, the method disclosed utilizes an optimal combination of Machine learning techniques, image processing techniques, Optical Character Recognition (OCR) and expert systems with rule engines to process an image of a wireframe, which defines the UI design requirement. The wireframe image is processed to accurately detect, recognize and group the recognized UI elements in accordance with the parent-child relationship.
[048] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[049] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[050] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the
program for use by or in connection with the instruction execution system, apparatus, or device.
[051] The illustrated steps are set out to explain the exemplary
embodiments shown, and it should be anticipated that ongoing technological
development will change the manner in which particular functions are performed.
These examples are presented herein for purposes of illustration, and not
limitation. Further, the boundaries of the functional building blocks have been
arbitrarily defined herein for the convenience of the description. Alternative
boundaries can be defined so long as the specified functions and relationships
thereof are appropriately performed. Alternatives (including equivalents,
extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[052] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile
memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks,
and any other known physical storage media.
[053] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
We Claim:
1. A method for recognizing User Interface (UI) elements from wireframes to generate working screens, the method comprising:
accessing (202), by one or more hardware processors, an image of a wireframe providing UI design requirements from a database, wherein the UI design requirements comprise one or more UI elements having a plurality of text elements and a plurality of control elements;
recognizing (204), by the one or more hardware processors, the plurality of text elements in the image, by processing the image using a pre-trained text detection neural network followed by an Optical Character Recognition (OCR), wherein the recognized plurality of text elements comprise a plurality of column groups within each of a plurality of row groups of text characters identified in the image, wherein each of the plurality of column groups comprises a set of text characters marked with a text bounding box, and wherein the text bounding box is uniquely identified with a text quadruplet defining corner coordinates of the text bounding box with respect to coordinates of a reference corner of the image, a height and a width of the text bounding box;
recognizing and labelling (206), by the one or more hardware processors, the plurality of control elements in the image using one of: a combination of one or more image processing techniques and a pre-trained Support Vector Machine (SVM) classifier, and a pre-trained deep learning model, wherein each of the recognized plurality of control elements is marked with a control bounding box, and a label representing a control type of the control element, and wherein the control bounding box is uniquely identified by a control quadruplet defining corner coordinates of the control bounding box with reference to coordinates of a reference corner of the image, a height and a width of the control bounding box;
establishing (208), by the one or more hardware processors, a parent-child relationship to identify one or more parent elements and one
or more child elements nested within each of the one or more parent elements among the labelled plurality of control elements using distance computation based on the control quadruplet, and wherein the control quadruplet of each of the identified one or more child elements is revised in accordance to a reference corner of corresponding parent elements to define positioning of a child element within a parent element;
generating (210), by the one or more hardware processors, a UI element list by mapping each of the plurality of control elements with each of the plurality of text elements based on the text quadruplet of each of the created plurality of column groups, the control quadruplet of each of the labelled plurality of control elements, the established parent child relationship among the labelled plurality of control elements and a set of predefined rules;
generating (212), by the one or more hardware processors, a part summary for each of the plurality of control elements, wherein the part summary comprises a control name, a control type, the control quadruplet and a control content corresponding to the set of characters corresponding to each of the plurality of text elements mapped in the UI element list using a UI editor;
converting (214), by the one or more hardware processors, the part summary to eXtensible Markup Language (XML) based specification files in accordance with a set of XML support files accessible to the UI editor;
generating (216), by the one or more hardware processors, a technology independent model, based upon a predefined standard, from the XML based specification files using a model editor; and
generating (218), by the one or more hardware processors, a working screen for the received image of the wireframe from the technology independent model using the model editor in accordance with a preset technology platform for the wireframe, wherein the working screen is displayed to a user via a graphical user interface.
2. The method as claimed in claim 1, wherein recognizing the plurality of text elements in the image, using the pre-trained text detection neural network followed by the OCR comprises:
processing (302) the image using the pre-trained text detection neural network to detect a plurality of text regions in the image, wherein the detected plurality of text regions are marked with a plurality of bounding boxes, wherein each of the plurality of bounding boxes are uniquely identified by a first set of eight spatial coordinates corresponding to x and y coordinates of four corners of each bounding box with respect to the coordinates of the reference corner of the image;
cropping (304) each of the plurality of text regions in accordance with the bounding boxes marked for each of the plurality of text regions;
processing (306) each of the cropped plurality of text regions using the OCR to mark a plurality of character bounding boxes around each of a plurality of characters in each of the cropped plurality of text regions and recognizing the plurality of characters, wherein each of the plurality of character bounding boxes are uniquely identified by a second set of eight spatial coordinates corresponding to x and y coordinates of four corners of each character bounding box with respect to the coordinates of the reference corner of the image;
converting (308) the second set of eight spatial coordinates corresponding to each of the plurality of characters to a set of four spatial coordinates comprising x and y coordinates of a first corner among the four corners of each of the plurality of character bounding boxes, a height of each of the plurality of character bounding boxes and a width of each of the plurality of character bounding boxes;
assigning (310) each of the plurality of character bounding boxes a row number based on corresponding x coordinate of each of the plurality of character bounding boxes;
performing (312) row based grouping of characters among the plurality of characters to create a plurality of row groups based on a
predefined row vicinity threshold, wherein each of the plurality of row groups is assigned a unique line number in sequence; and
performing (314) column based grouping of characters associated with each of the plurality of row groups to create a plurality of column groups within each of the plurality of row groups based on the y coordinate of each of the character bounding boxes of each of the plurality of row groups in accordance with a predefined column vicinity threshold, wherein each of the plurality of column groups represents a text element marked with the text bounding box uniquely identified by the text quadruplet, and wherein the width of the text bounding box is computed based on number of character bounding boxes present in each of the plurality of column groups.
3. The method as claimed in claim 2, wherein the processing of the image
using the pre-trained text detection neural network to detect the plurality
of text regions comprises:
detecting an absence of the plurality of text regions post processing of the image;
changing original resolution of the image to generate a set of images at one or more predefined resolution percentages;
processing each image among the set of images to detect corresponding plurality of text regions in each of the set of images;
restoring the resolution of the detected corresponding plurality of text regions in the set of images to the original resolution;
adding corresponding plurality of text regions of each of the set of images by filtering out duplicates to generate the processed image comprising the filtered plurality of text regions; and
cropping the processed image comprising the filtered plurality of text regions to be processed using the OCR.
4. The method as claimed in claim 1, further comprising:
receiving an updated image of an updated wireframe;
recognizing the plurality of text elements and the plurality of control elements in the updated image to generate an updated UI element list;
comparing the updated UI element list with the UI element list to identify one or more changed UI elements;
generating an updated parts summary for the one or more changed
UI elements;
updating the XML based specification files with the updated parts
summary;
generating, by the one or more hardware processors, an updated technology independent model, based upon the predefined standard, from the updated XML based specification files using the model editor; and
generating, by the one or more hardware processors, an updated working screen for the updated image of the updated wireframe from the technology independent model using the model editor in accordance with the preset technology platform for the wireframe.
5. A system (100) for recognizing User Interface (UI) elements from wireframes to generate working screens, the system (100) comprising: a memory (102) storing instructions;
one or more Input/Output (I/O) interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more I/O interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
access an image of a wireframe providing UI design requirements from a database (108) in the memory (102), wherein the UI design requirements comprise one or more UI elements having a plurality of text elements and a plurality of control elements;
recognize the plurality of text elements in the image, by processing the image using a pre-trained text detection neural network
followed by an Optical Character Recognition (OCR), wherein the
recognized plurality of text elements comprise a plurality of column
groups within each of a plurality of row groups of text characters identified in the image, wherein each of the plurality of column groups comprises a set of text characters marked with a text bounding box, and wherein the text bounding box is uniquely identified with a text quadruplet defining corner coordinates of the text bounding box with respect to coordinates of a reference corner of the image, a height and a width of the text bounding box;
recognize and label the plurality of control elements in the image using one of: a combination of one or more image processing techniques and a pre-trained Support Vector Machine (SVM) classifier, and a pre-trained deep learning model, wherein each of the recognized plurality of control elements is marked with a control bounding box, and a label representing a control type of the control element, wherein the control bounding box is uniquely identified by a control quadruplet defining corner coordinates of the control bounding box with reference to coordinates of a reference corner of the image, a height and a width of the control bounding box;
establish a parent-child relationship to identify one or more parent elements and one or more child elements nested within each of the one or more parent elements among the labelled plurality of control elements using distance computation based on the control quadruplet, wherein the control quadruplet of each of the identified one or more child elements is revised in accordance with a reference corner of corresponding parent elements to define positioning of a child element within a parent element;
generate a UI element list by mapping each of the plurality of control elements with each of the plurality of text elements based on the text quadruplet of each of the created plurality of column groups, the control quadruplet of each of the labelled plurality of control elements, the established parent-child relationship among the labelled plurality of control elements and a set of predefined rules;
generate a part summary for each of the plurality of control elements, wherein the part summary comprises a control name, a control type, the control quadruplet and a control content corresponding to the set of characters corresponding to each of the plurality of text elements mapped in the UI element list using a UI editor;
convert the part summary to eXtensible Markup Language (XML) based specification files in accordance with a set of XML support files accessible to the UI editor;
generate a technology independent model, based upon a predefined standard, from the XML based specification files using a model editor; and
generate a working screen for the received image of the wireframe from the technology independent model using the model editor in accordance with a preset technology platform for the wireframe, wherein the working screen is displayed to a user via a graphical user interface.
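For illustration only, the parent-child establishment recited in claim 5 could be realised with a bounding-box containment test standing in for the distance computation, followed by re-expressing each child quadruplet relative to its parent's reference corner. The containment test, the choice of the smallest enclosing control as parent, and the field names are assumptions made for the example.

```python
def contains(parent_q, child_q):
    """True if the child box lies completely inside the parent box; quadruplets are (x, y, height, width)."""
    px, py, ph, pw = parent_q
    cx, cy, ch, cw = child_q
    return px <= cx and py <= cy and cx + cw <= px + pw and cy + ch <= py + ph

def establish_hierarchy(controls):
    """Attach a parent to each nested control and revise its quadruplet relative to the parent corner."""
    absolute = [c["quadruplet"] for c in controls]      # snapshot of absolute boxes
    for i, child in enumerate(controls):
        enclosing = [j for j in range(len(controls))
                     if j != i and contains(absolute[j], absolute[i])]
        if not enclosing:
            continue                                    # top-level control, no parent
        # the smallest enclosing control is taken as the parent
        j = min(enclosing, key=lambda k: absolute[k][2] * absolute[k][3])
        px, py, _, _ = absolute[j]
        cx, cy, ch, cw = absolute[i]
        child["parent"] = controls[j]["control_name"]
        child["quadruplet"] = (cx - px, cy - py, ch, cw)
```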
6. The system as claimed in claim 5, wherein the one or more hardware processors (104) are configured to recognize the plurality of text elements in the image, using the pre-trained text detection neural network followed by the OCR by:
processing the image using the pre-trained text detection neural network to detect a plurality of text regions in the image, wherein the detected plurality of text regions are marked with a plurality of bounding boxes, wherein each of the plurality of bounding boxes is uniquely identified by a first set of eight spatial coordinates corresponding to x and y coordinates of four corners of each bounding box with respect to the coordinates of the reference corner of the image;
cropping each of the plurality of text regions in accordance with the bounding boxes marked for each of the plurality of text regions;
processing each of the cropped plurality of text regions using the OCR to mark a plurality of character bounding boxes around each of a
plurality of characters in each of the cropped plurality of text regions and recognizing the plurality of characters, wherein each of the plurality of character bounding boxes is uniquely identified by a second set of eight spatial coordinates corresponding to x and y coordinates of four corners of each character bounding box with respect to the coordinates of the reference corner of the image;
converting the second set of eight spatial coordinates corresponding to each of the plurality of characters to a set of four spatial coordinates comprising x and y coordinates of a first corner among the four corners of each of the plurality of character bounding boxes, a height of each of the plurality of character bounding boxes and a width of each of the plurality of character bounding boxes;
assigning each of the plurality of character bounding boxes a row number based on corresponding x coordinate of each of the plurality of character bounding boxes;
performing row based grouping of characters among the plurality of characters to create a plurality of row groups based on a predefined row vicinity threshold, wherein each of the plurality of row groups is assigned a unique line number in sequence; and
performing column based grouping of characters associated with each of the plurality of row groups to create a plurality of column groups within each of the plurality of row groups based on the y coordinate of each of the character bounding boxes of each of the plurality of row groups in accordance with a predefined column vicinity threshold, wherein each of the plurality of column groups represents a text element marked with the text bounding box uniquely identified by the text quadruplet, and wherein the width of the text bounding box is computed based on the number of character bounding boxes present in each of the plurality of column groups.
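For illustration only, the conversion of the second set of eight spatial coordinates into the four-value form used in claim 6 (first-corner x, first-corner y, height and width) could be written as below. The corner ordering (x1, y1, ..., x4, y4) is an assumption, since the ordering produced by the OCR is not stated in the claims.

```python
def to_quadruplet(corners):
    """Convert eight corner coordinates (x1, y1, ..., x4, y4) into (x, y, height, width)."""
    xs = corners[0::2]                  # x coordinates of the four corners
    ys = corners[1::2]                  # y coordinates of the four corners
    x, y = min(xs), min(ys)             # first corner w.r.t. the reference corner of the image
    width = max(xs) - x
    height = max(ys) - y
    return (x, y, height, width)

# e.g. to_quadruplet((10, 20, 60, 20, 60, 45, 10, 45)) -> (10, 20, 25, 50)
```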
7. The system as claimed in claim 6, wherein the one or more hardware
processors (104) are configured to process the image using the pre-trained
text detection neural network to detect the plurality of text regions by:
detecting an absence of the plurality of text regions post processing of the image;
changing the original resolution of the image to generate a set of images at one or more predefined resolution percentages;
processing each image among the set of images to detect corresponding plurality of text regions in each of the set of images;
restoring the resolution of the detected corresponding plurality of text regions in the set of images to the original resolution;
adding corresponding plurality of text regions of each of the set of images by filtering out duplicates to generate the processed image comprising the filtered plurality of text regions; and
cropping the processed image comprising the filtered plurality of text regions to be processed using the OCR.
8. The system (100) as claimed in claim 5, wherein the one or more
hardware processors (104) are further configured to:
receive an updated image of an updated wireframe;
recognize the plurality of text elements and the plurality of control elements in the updated image to generate an updated UI element list;
compare the updated UI element list with the UI element list to identify one or more changed UI elements;
generate an updated part summary for the one or more changed UI elements;
update the XML based specification files with the updated part summary;
generate an updated technology independent model, based upon the predefined standard, from the updated XML based specification files using the model editor; and
generate an updated working screen for the updated image of the updated wireframe from the updated technology independent model using the model editor in accordance with the preset technology platform for the wireframe.
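For illustration only, writing an updated part summary back into an XML based specification file, as recited in claim 8, could be sketched as below. The element and attribute names are assumptions made for the example; the actual schema is governed by the XML support files accessible to the UI editor.

```python
import xml.etree.ElementTree as ET

def update_specification(xml_path, updated_parts):
    """Merge updated part summaries (assumed dicts) into an existing XML specification file."""
    tree = ET.parse(xml_path)
    root = tree.getroot()
    for part in updated_parts:
        # find the control by name, or create it if it is newly added
        node = root.find(f"./control[@name='{part['control_name']}']")
        if node is None:
            node = ET.SubElement(root, "control", name=part["control_name"])
        node.set("type", part["control_type"])
        x, y, height, width = part["quadruplet"]
        node.set("x", str(x))
        node.set("y", str(y))
        node.set("height", str(height))
        node.set("width", str(width))
        node.text = part.get("content", "")     # mapped text content of the control
    tree.write(xml_path, encoding="utf-8", xml_declaration=True)
```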
| # | Name | Date |
|---|---|---|
| 1 | 201923043574-FORM 1 [25-10-2019(online)].pdf | 2019-10-25 |
| 2 | 201923043574-COMPLETE SPECIFICATION [25-10-2019(online)].pdf | 2019-10-25 |
| 3 | 201923043574-DRAWINGS [25-10-2019(online)].pdf | 2019-10-25 |
| 4 | 201923043574-FIGURE OF ABSTRACT [25-10-2019(online)].jpg | 2019-10-25 |
| 5 | 201923043574-DECLARATION OF INVENTORSHIP (FORM 5) [25-10-2019(online)].pdf | 2019-10-25 |
| 6 | 201923043574-STATEMENT OF UNDERTAKING (FORM 3) [25-10-2019(online)].pdf | 2019-10-25 |
| 7 | 201923043574-FORM 18 [25-10-2019(online)].pdf | 2019-10-25 |
| 8 | 201923043574-REQUEST FOR EXAMINATION (FORM-18) [25-10-2019(online)].pdf | 2019-10-25 |
| 9 | 201923043574-Proof of Right (MANDATORY) [12-11-2019(online)].pdf | 2019-11-12 |
| 10 | 201923043574-ORIGINAL UR 6(1A) FORM 1-141119.pdf | 2019-11-16 |
| 11 | 201923043574-FORM-26 [19-03-2020(online)].pdf | 2020-03-19 |
| 12 | 201923043574-FER.pdf | 2021-10-19 |
| 13 | Abstract1.jpg | 2021-10-19 |
| 14 | 201923043574-CLAIMS [16-11-2021(online)].pdf | 2021-11-16 |
| 15 | 201923043574-FER_SER_REPLY [16-11-2021(online)].pdf | 2021-11-16 |
| 16 | 201923043574-PatentCertificate08-04-2024.pdf | 2024-04-08 |
| 17 | 201923043574-IntimationOfGrant08-04-2024.pdf | 2024-04-08 |
| 18 | 2021-06-0415-28-15E_04-06-2021.pdf | |