SYSTEM FOR GENERATING DIGITAL ARTWORK FROM TEXTUAL INPUT

ABSTRACT

A system (100) for generating digital artwork from a textual input is disclosed. The system (100) comprises a client device (102). The system (100) further comprises a second processor (110) located on an application server (108). The system (100) is configured to receive the text prompt from the client device (102); process the text prompt using natural language processing to extract semantic meaning; embed the processed text prompt into a representation compatible with an image generation model; generate a digital image corresponding to the text prompt using a diffusion-based machine learning model; and deliver the generated digital image to the client device (102) through a platform (104). The system (100) provides users with multiple variations and customization options.

Claims: 10, Figures: 3
BACKGROUND
Field of Invention
[001] Embodiments of the present invention generally relate to image generation systems and particularly to a system for generating digital artwork from a textual input.
Description of Related Art
[002] The field of digital art creation has witnessed rapid growth in recent years. Traditional methods of producing artwork often rely on manual techniques, specialized training, and access to professional tools. These conventional approaches demand both time and skill, which often limits participation to individuals with artistic expertise or access to advanced resources. As a result, many potential creators face challenges in expressing their ideas visually.
[003] In response, integration of artificial intelligence and computational models into creative industries has gained prominence. Machine learning techniques and natural language processing have been adapted to interpret human inputs and generate visual outputs. Various platforms demonstrate the ability to convert descriptive input into images, though such solutions often require technical familiarity and an understanding of specific prompt structures to achieve optimal results.
[004] Despite these advancements, limitations remain in accessibility, usability, and customization. Many current solutions do not adequately serve beginners or casual users, as they present steep learning curves or restricted creative control. Furthermore, existing tools frequently lack flexibility in accommodating diverse styles, contexts, and levels of detail.
[005] There is thus a need for an improved and advanced system for generating digital artwork from a textual input that can address the aforementioned limitations in a more efficient manner.
SUMMARY
[006] Embodiments in accordance with the present invention provide a system for generating digital artwork from a textual input. The system comprises a client device, comprising a first processor, adapted to deliver a text prompt to a platform. The system further comprises a second processor located on an application server. The system further comprises a communication network adapted to establish a communicative link connecting the client device to the application server. The system further comprises a storage medium comprising programming instructions executable by the second processor. The second processor is configured to receive the text prompt from the client device; process the text prompt using natural language processing to extract semantic meaning; embed the processed text prompt into a representation compatible with an image generation model; generate a digital image corresponding to the text prompt using a diffusion-based machine learning model; and deliver the generated digital image to the client device through the platform.
[007] Embodiments in accordance with the present invention further provide a method for generating digital artwork from a textual input. The method comprises the steps of receiving a text prompt from a client device; processing the text prompt using natural language processing to extract semantic meaning; embedding the processed text prompt into a representation compatible with an image generation model; generating a digital image corresponding to the text prompt using a diffusion-based machine learning model; and delivering the generated digital image to the client device through a platform.
[008] Embodiments of the present invention may provide a number of advantages depending on their particular configuration. First, embodiments of the present application may provide a system for generating digital artwork from a textual input.
[009] Next, embodiments of the present application may provide a system that allows users to generate professional-quality digital artwork directly from simple text prompts.
[0010] Next, embodiments of the present application may provide a system that eliminates the need for specialized artistic skill or technical expertise.
[0011] Next, embodiments of the present application may provide a system that reduces the time required to produce visual content.
[0012] Next, embodiments of the present application may provide a system that enables users to obtain high-quality images within seconds rather than hours.
[0013] Next, embodiments of the present application may provide a system that lowers the barrier to entry for digital art creation by offering an affordable and user-friendly platform that caters to students, hobbyists, educators, and professionals alike.
[0014] Next, embodiments of the present application may provide a system that provides users with multiple variations and customization options.
[0015] Next, embodiments of the present application may provide a system that enhances creative freedom and enables unique outputs tailored to individual preferences.
[0016] Next, embodiments of the present application may provide a system that supports diverse use cases, including education, content creation, entertainment, design, and social media.
[0017] Next, embodiments of the present application may provide a system that broadens utility across multiple industries.
[0018] These and other advantages will be apparent from the present application of the embodiments described herein.
[0019] The preceding is a simplified summary to provide an understanding of some embodiments of the present invention. This summary is neither an extensive nor exhaustive overview of the present invention and its various embodiments. The summary presents selected concepts of the embodiments of the present invention in a simplified form as an introduction to the more detailed description presented below. As will be appreciated, other embodiments of the present invention are possible utilizing, alone or in combination, one or more of the features set forth above or described in detail below.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] The above and still further features and advantages of embodiments of the present invention will become apparent upon consideration of the following detailed description of embodiments thereof, especially when taken in conjunction with the accompanying drawings, and wherein:
[0021] FIG. 1 illustrates a system for generating digital artwork from a textual input, according to an embodiment of the present invention;
[0022] FIG. 2 illustrates a table depicting a comparison of the system with existing generative systems, according to an embodiment of the present invention; and
[0023] FIG. 3 depicts a flowchart of a method for generating digital artwork from a textual input, according to an embodiment of the present invention.
[0024] The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word "may" is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include”, “including”, and “includes” mean including but not limited to. To facilitate understanding, like reference numerals have been used, where possible, to designate like elements common to the figures. Optional portions of the figures may be illustrated using dashed or dotted lines, unless the context of usage indicates otherwise.
DETAILED DESCRIPTION
[0025] The following description includes the best mode of one embodiment of the present invention. It will be clear from this description that the invention is not limited to the illustrated embodiments but also includes a variety of modifications and alternative embodiments. Therefore, the present description should be seen as illustrative and not limiting. While the invention is susceptible to various modifications and alternative constructions, it should be understood that there is no intention to limit the invention to the specific form disclosed; on the contrary, the invention is to cover all modifications, alternative constructions, and equivalents falling within the scope of the invention as defined in the claims.
[0026] In any embodiment described herein, the open-ended terms "comprising", "comprises", and the like (which are synonymous with "including", "having", and "characterized by") may be replaced by the respective partially closed phrases "consisting essentially of", "consists essentially of", and the like, or the respective closed phrases "consisting of", "consists of", and the like.
[0027] As used herein, the singular forms “a”, “an”, and “the” designate both the singular and the plural, unless expressly stated to designate the singular only.
[0028] As used herein, the term ‘user’ refers to a person or entity that provides textual or descriptive input for the purpose of generating images, and that receives or utilizes the images produced by the system.
[0029] FIG. 1 illustrates a system 100 for generating digital artwork from a textual input, according to an embodiment of the present invention. In an embodiment of the present invention, the system 100 may provide a structured framework for producing digital artwork from textual descriptions. The system 100 may be designed as an integrated architecture that combines natural language processing, machine learning, and image enhancement techniques in order to convert user input into high-quality visual output.
[0030] According to the embodiments of the present invention, the system 100 may incorporate non-limiting hardware components to enhance processing speed and efficiency. The system 100 may comprise a client device 102, a platform 104, a first processor 106, an application server 108, a second processor 110, a communication network 112, and a storage medium 114. In an embodiment of the present invention, the hardware components of the system 100 may be integrated with computer-executable instructions for overcoming the challenges and limitations of the existing generative systems.
[0031] In an embodiment of the present invention, the client device 102 may be an electronic device used by a user. The client device 102 may enable the user to browse and operate on the platform 104. The client device 102 may enable the user to deliver a text prompt to the platform 104. The text prompt may comprise the textual inputs such as, but not limited to, an object, a scene, an artistic style, an emotion, a descriptive phrase, and so forth. The client device 102 may enable the user to view a digital image generated by the platform 104.
[0032] The client device 102 may be, but is not limited to, a personal computer, a desktop, a server, a laptop, and the like. Embodiments of the present invention are intended to include or otherwise cover any type of the client device 102, including known, related art, and/or later developed technologies. The client device 102 may comprise the first processor 106. The first processor 106 may enable a computation of the platform 104. Further, the first processor 106 may be configured to relay actions of the user on the client device 102 to the second processor 110.
[0033] In an embodiment of the present invention, the application server 108 may be hardware adapted to accommodate and enable installation of the second processor 110. The application server 108 may be, but is not limited to, a motherboard, a wired board, a mainframe, and so forth. Embodiments of the present invention are intended to include or otherwise cover any type of the application server 108, including known, related art, and/or later developed technologies.
[0034] In an embodiment of the present invention, the second processor 110 may be located on the application server 108. The second processor 110 may be configured to receive the text prompt from the client device 102. The second processor 110 may be configured to process the text prompt using natural language processing to extract semantic meaning. The second processor 110 may be configured to embed the processed text prompt into a representation compatible with an image generation model.
[0035] The second processor 110 may be configured to generate the digital image corresponding to the text prompt using a diffusion-based machine learning model. The diffusion-based machine learning model may comprise a pre-trained model selected from Stable Diffusion, DALL·E, or an equivalent. The second processor 110 may be configured to deliver the generated digital image to the client device 102 through the platform 104. The delivery of the generated digital image may comprise at least one of display on a screen, download in a digital format, sharing on an online platform, and so forth. The second processor 110 may be configured to enhance the digital image through post-processing techniques including upscaling, noise reduction, style transfer, color grading, composition enhancements, style uplifting, application of style filters, and so forth.
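By way of non-limiting illustration only, the post-processing stage described above may be sketched with toy NumPy stand-ins. The function names and filter choices below are hypothetical and do not form part of the disclosed system; a production system would use trained enhancement models rather than these simple pixel operations.

```python
import numpy as np

def upscale_nearest(img: np.ndarray, factor: int = 2) -> np.ndarray:
    """Toy upscaling: repeat each pixel `factor` times along both axes."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def denoise_box(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Toy noise reduction: k x k box-filter average with edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

# Example: a noisy 4x4 grayscale image, denoised then upscaled to 8x8.
rng = np.random.default_rng(0)
image = rng.random((4, 4))
enhanced = upscale_nearest(denoise_box(image), factor=2)
```

A real pipeline would substitute learned super-resolution and denoising networks for these filters, but the enhance-then-deliver ordering would be the same.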
[0036] In an embodiment of the present invention, the communication network 112 may be adapted to establish a communicative link connecting the client device 102 to the application server 108. The communication network 112 may be, but is not limited to, a wired communication network, a wireless communication network, and so forth. Embodiments of the present invention are intended to include or otherwise cover any type of the communication network 112, including known, related art, and/or later developed technologies.
[0037] The wired communication network may be enabled by means such as, but not limited to, a twisted pair cable, a co-axial cable, an Ethernet cable, a modem, a router, a switch, and so forth. Embodiments of the present invention are intended to include or otherwise cover any type of the means that may enable the wired communication network, including known, related art, and/or later developed technologies.
[0038] The wireless communication network may be enabled by means such as, but not limited to, a Wi-Fi communication module, a Bluetooth communication module, a millimetre waves communication module, an Ultra-High Frequency (UHF) communication module, and so forth. Embodiments of the present invention are intended to include or otherwise cover any type of the means that may enable the wireless communication network, including known, related art, and/or later developed technologies.
[0039] In an embodiment of the present invention, the storage medium 114 may store the computer-programmable instructions in the form of programming modules. The storage medium 114 may be a non-transitory storage medium, in an embodiment of the present invention. The storage medium 114 may communicate with the second processor 110, and the second processor 110 may execute the computer-readable set of instructions present in the storage medium 114, in an embodiment of the present invention.
[0040] The storage medium 114 may be, but not limited to, a Random-Access Memory (RAM), a Static Random-Access Memory (SRAM), a Dynamic Random-Access Memory (DRAM), a Read Only Memory (ROM), an Erasable Programmable Read-only Memory (EPROM), an Electrically Erasable Programmable Read-only Memory (EEPROM), a NAND Flash, a Secure Digital (SD) memory, a cache memory, a Hard Disk Drive (HDD), a Solid-State Drive (SSD) and so forth. Embodiments of the present invention are intended to include or otherwise cover any type of the storage medium 114, including known, related art, and/or later developed technologies.
[0041] In an exemplary scenario of the present invention, the user may deliver the text prompt comprising the textual inputs. For example, the text prompt may be ‘a serene mountain landscape at sunset’. The system 100 may deploy Natural Language Processing (NLP) techniques to understand and tokenize the text prompt. Further, a Contrastive Language-Image Pre-training (CLIP) tool may be deployed to align a meaning of the text prompt with visual features. Further, the system 100 may execute the diffusion-based machine learning model to gradually convert noise into the digital image that may match the semantic meaning of the textual inputs. The conversion of the noise, using the diffusion-based machine learning model, may generate the digital image corresponding to ‘a serene mountain landscape at sunset’.
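The gradual conversion of noise described in this scenario may be illustrated, purely conceptually, by a toy loop that blends random noise toward a prompt-conditioned target vector. The hash-based "embedding" and the fixed blending rate are illustrative assumptions, not the disclosed diffusion model, which would run learned denoising steps over image tensors rather than small vectors.

```python
import hashlib
import numpy as np

def embed_prompt(prompt: str, dim: int = 16) -> np.ndarray:
    """Toy stand-in for a text encoder: hash the prompt into a seed and
    draw a deterministic pseudo-embedding from it."""
    seed = int.from_bytes(hashlib.sha256(prompt.encode()).digest()[:8], "big")
    return np.random.default_rng(seed).random(dim)

def denoise(prompt: str, steps: int = 50, rate: float = 0.1) -> np.ndarray:
    """Toy reverse-diffusion loop: start from pure noise and move a
    fraction of the way toward the prompt-conditioned target each step."""
    target = embed_prompt(prompt)
    x = np.random.default_rng(42).standard_normal(target.shape)  # pure noise
    for _ in range(steps):
        x = x + rate * (target - x)
    return x

image_vec = denoise("a serene mountain landscape at sunset")
```

After enough steps the remaining noise shrinks geometrically (by a factor of `1 - rate` per step), which is the intuition behind "gradually converting noise" into an output that matches the prompt.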
[0042] Additionally, the diffusion-based machine learning model may generate a set of multiple variations of the digital image corresponding to ‘a serene mountain landscape at sunset.’ The system 100 may further enhance, and may display, the generated digital image. The system 100 may further be open to additional textual inputs and feedback from the user for refinement of the generated digital image. The feedback may allow modulation of the system 100 to improve future outputs and adjust generation time based on the complexity of the text prompt. Further, the system 100 may ensure ethical usage and ownership of the generated digital image. The system 100 may support various predefined artistic styles, such as realism and abstract. Additionally, the system 100 may deliver results in real time, providing a quick and efficient creative experience.
[0043] In another exemplary scenario of the present invention, the user may deliver the text prompt comprising the textual inputs. For example, the text prompt may be ‘a futuristic city skyline under a starry night sky’. The system 100 may deploy the Natural Language Processing (NLP) techniques to analyze and tokenize the text prompt. Further, the Contrastive Language-Image Pre-training (CLIP) tool may be deployed to correlate the semantic meaning of the text prompt with corresponding visual attributes. Further, the system 100 may execute the diffusion-based machine learning model to iteratively refine random noise into a digital image that matches the textual description. The conversion of the noise, using the diffusion-based machine learning model, may generate the digital image corresponding to ‘a futuristic city skyline under a starry night sky’. Additionally, the diffusion-based machine learning model may generate multiple variations of the digital image corresponding to ‘a futuristic city skyline under a starry night sky’. The system 100 may further enhance, optimize, and display the generated digital image. The system 100 may remain receptive to additional textual inputs and interactive feedback from the user for refinement of the generated digital image. Such feedback may enable adaptive learning of the system 100 for improving the quality of future outputs and adjusting system performance according to the complexity of the text prompt.
[0044] Furthermore, the system 100 may implement responsible usage protocols and ensure rightful ownership of the generated digital image. The system 100 may also support predefined creative modes such as photorealism, cyberpunk, or impressionism. Additionally, the system 100 may deliver the generated outputs in near real time, ensuring a smooth, efficient, and engaging creative experience for the user.
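The predefined creative modes mentioned above may be illustrated under the assumption that each style is implemented as a modifier phrase appended to the prompt before generation. The preset names and modifier phrases below are hypothetical examples only, not part of the disclosed system.

```python
# Hypothetical style presets; names and modifier phrases are illustrative only.
STYLE_PRESETS = {
    "photorealism": "highly detailed, photorealistic, natural lighting",
    "cyberpunk": "neon-lit, futuristic, high contrast, cyberpunk style",
    "impressionism": "visible brush strokes, soft edges, impressionist painting",
}

def apply_style(prompt: str, style: str) -> str:
    """Expand a user prompt with a predefined style modifier, if one exists."""
    modifier = STYLE_PRESETS.get(style)
    if modifier is None:
        raise ValueError(f"unknown style: {style!r}")
    return f"{prompt}, {modifier}"

styled = apply_style("a futuristic city skyline under a starry night sky", "cyberpunk")
```

Implementing styles as prompt modifiers keeps the generation model unchanged; richer systems might instead swap fine-tuned model weights per style.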
[0045] FIG. 2 illustrates a table 200 depicting a comparison of the system 100, according to an embodiment of the present invention. In an embodiment of the present invention, the system 100 may demonstrate superior performance in comparison with existing art generation platforms. The system 100 achieves an image quality score of 4.5, a creativity score of 4.7, and a prompt accuracy score of 4.6 on a scale of 1 to 5. In comparison, DALL·E records scores of 4.2 for image quality, 4.3 for creativity, and 4.1 for prompt accuracy; Midjourney records scores of 4.6 for image quality, 4.8 for creativity, and 4.4 for prompt accuracy; and Stable Diffusion records scores of 4.0 for image quality, 4.1 for creativity, and 4.2 for prompt accuracy. The results establish that the system 100 consistently delivers high-quality outputs with strong alignment to user-provided prompts while maintaining competitive performance across creativity and accuracy dimensions.
[0046] FIG. 3 depicts a flowchart of a method 300 for generating digital artwork from the textual input using the system 100, according to an embodiment of the present invention.
[0047] At step 302, the system 100 may receive the text prompt from the client device 102.
[0048] At step 304, the system 100 may process the text prompt using the natural language processing to extract the semantic meaning.
[0049] At step 306, the system 100 may embed the processed text prompt into the representation compatible with the image generation model.
[0050] At step 308, the system 100 may generate the digital image corresponding to the text prompt using the diffusion-based machine learning model.
[0051] At step 310, the system 100 may deliver the generated digital image to the client device 102 through the platform 104.
[0052] At step 312, the system 100 may enhance the digital image through the post-processing techniques.
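The flowchart steps above may be sketched end to end as a single toy pipeline. Every stage below is an illustrative stand-in (a character-sum "embedding", an outer-product "image", a contrast-boost "enhancement"), chosen only so the control flow of steps 302 through 312 is visible; none of it represents the claimed implementation.

```python
import numpy as np

def run_pipeline(raw_prompt: str) -> dict:
    """Toy walk-through of the flowchart; all stages are illustrative stubs."""
    prompt = raw_prompt.strip()                        # step 302: receive prompt
    tokens = prompt.lower().split()                    # step 304: extract semantics (toy NLP)
    seed = sum(ord(c) for c in " ".join(tokens))       # step 306: embed (toy, deterministic)
    embedding = np.random.default_rng(seed).random(8)
    image = np.outer(embedding, embedding)             # step 308: "generate" an 8x8 image
    delivered = {"format": "png", "pixels": image}     # step 310: deliver via platform
    delivered["pixels"] = np.clip(image * 1.2, 0, 1)   # step 312: post-process (contrast)
    return delivered

result = run_pipeline("  a serene mountain landscape at sunset  ")
```

In a real deployment, steps 304 through 308 would call a tokenizer, a text encoder such as CLIP, and a diffusion sampler respectively, but the handoff between stages follows the same shape.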
[0053] While the invention has been described in connection with what is presently considered to be the most practical and various embodiments, it is to be understood that the invention is not to be limited to the disclosed embodiments, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims.
[0054] This written description uses examples to disclose the invention, including the best mode, and also to enable any person skilled in the art to practice the invention, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the invention is defined in the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal language of the claims.

CLAIMS
I/We Claim:
1. A system (100) for generating digital artwork from a textual input, the system (100) comprising:
a client device (102), comprising a first processor (106), adapted to deliver a text prompt to a platform (104); and
an application server (108) comprising a second processor (110), wherein the application server (108) is in communication with the client device (102) using a communication network (112), characterized in that the second processor (110) is configured to:
receive the text prompt from the client device (102);
process the text prompt using natural language processing to extract semantic meaning;
embed the processed text prompt into a representation compatible with an image generation model;
generate a digital image corresponding to the text prompt using a diffusion-based machine learning model; and
deliver the generated digital image to the client device (102) through the platform (104).
2. The system (100) as claimed in claim 1, wherein the second processor (110) is configured to deploy a Contrastive Language-Image Pre-training (CLIP) tool to align a meaning of the text prompt with visual features of the generated digital image.
3. The system (100) as claimed in claim 1, wherein the diffusion-based machine learning model is configured to gradually convert noise into digital images to match the extracted semantic meaning of the text prompt.
4. The system (100) as claimed in claim 1, wherein the second processor (110) is configured to enhance the digital image through post-processing techniques including upscaling, noise reduction, style transfer, color grading, composition enhancements, style uplifting, application of style filters, or a combination thereof.
5. The system (100) as claimed in claim 1, wherein the text prompt comprises textual inputs selected from an object, a scene, an artistic style, an emotion, a descriptive phrase, or a combination thereof.
6. The system (100) as claimed in claim 1, wherein the delivery of the generated digital image comprises at least one of display on a screen, download in a digital format, or sharing on an online platform.
7. A method (300) for generating digital artwork from a textual input, the method (300) characterized by the steps of:
receiving a text prompt from a client device (102);
processing the text prompt using natural language processing to extract semantic meaning;
embedding the processed text prompt into a representation compatible with an image generation model;
generating a digital image corresponding to the text prompt using a diffusion-based machine learning model; and
delivering the generated digital image to the client device (102) through a platform (104).
8. The method (300) as claimed in claim 7, further comprising a step of enhancing the digital image through post-processing techniques including upscaling, noise reduction, style transfer, color grading, composition enhancements, style uplifting, application of style filters, or a combination thereof.
9. The method (300) as claimed in claim 7, wherein the text prompt comprises textual inputs selected from an object, a scene, an artistic style, an emotion, a descriptive phrase, or a combination thereof.
10. The method (300) as claimed in claim 7, wherein the delivery of the generated digital image comprises at least one of display on a screen, download in a digital format, or sharing on an online platform.
Date: October 08, 2025
Place: Noida
Nainsi Rastogi
Patent Agent (IN/PA-2372)
Agent for the Applicant
| # | Name | Date |
|---|---|---|
| 1 | 202541098309-STATEMENT OF UNDERTAKING (FORM 3) [10-10-2025(online)].pdf | 2025-10-10 |
| 2 | 202541098309-REQUEST FOR EARLY PUBLICATION(FORM-9) [10-10-2025(online)].pdf | 2025-10-10 |
| 3 | 202541098309-POWER OF AUTHORITY [10-10-2025(online)].pdf | 2025-10-10 |
| 4 | 202541098309-OTHERS [10-10-2025(online)].pdf | 2025-10-10 |
| 5 | 202541098309-FORM-9 [10-10-2025(online)].pdf | 2025-10-10 |
| 6 | 202541098309-FORM FOR SMALL ENTITY(FORM-28) [10-10-2025(online)].pdf | 2025-10-10 |
| 7 | 202541098309-FORM 1 [10-10-2025(online)].pdf | 2025-10-10 |
| 8 | 202541098309-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [10-10-2025(online)].pdf | 2025-10-10 |
| 9 | 202541098309-EDUCATIONAL INSTITUTION(S) [10-10-2025(online)].pdf | 2025-10-10 |
| 10 | 202541098309-DRAWINGS [10-10-2025(online)].pdf | 2025-10-10 |
| 11 | 202541098309-DECLARATION OF INVENTORSHIP (FORM 5) [10-10-2025(online)].pdf | 2025-10-10 |
| 12 | 202541098309-COMPLETE SPECIFICATION [10-10-2025(online)].pdf | 2025-10-10 |