
Federated Learning Based Multi Scale Discriminator Optimization Method For GAN Based Realistic Scene Synthesis

Abstract: The present invention discloses a federated learning based optimization method for multi scale discriminators in generative adversarial networks, enabling the synthesis of highly realistic scene images while preserving data privacy and reducing communication overhead. A central aggregation server distributes global discriminator weights to multiple client training nodes, each hosting a local generator and a discriminator comprising parallel subnetworks at varying resolutions. Client nodes perform local training, compute scale specific gradient updates, encrypt them, and transmit them to the server. The server aggregates the updates using weighted averages based on client specific reliability metrics and adapts learning rates per scale. Optional pruning and hybrid synchronous–asynchronous strategies further enhance efficiency. Secure MQTT/TLS communication ensures confidentiality. This approach enables distributed model training across heterogeneous hardware, achieving superior image fidelity and structural integrity without sharing raw data. Accompanying Drawing: [Fig. 1]


Patent Information

Filing Date: 16 May 2025
Publication Number: 23/2025
Publication Type: INA
Invention Field: COMPUTER SCIENCE

Applicants

SR University
Anantha Sagar, Hasanparthy (PO) Warangal – 506371, Telangana, India

Inventors

1. Mrs. Pragathi Maram
Research Scholar, School of Computer Science and Artificial Intelligence, SR University, Warangal, Telangana, 506371, India
2. Dr. P. Pramod Kumar
Associate Professor, School of Computer Science and Artificial Intelligence, SR University, Warangal, Telangana, 506371, India

Specification

Description:[001] The present invention relates to the fields of machine learning and computer vision, and more particularly to a federated learning framework for optimizing multi scale discriminator architectures within generative adversarial networks to achieve high fidelity realistic scene synthesis while preserving data privacy and minimizing communication overhead.
BACKGROUND OF THE INVENTION
[002] Background description includes information that may be useful in understanding the present disclosure. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed disclosure, or that any publication specifically or implicitly referenced is prior art.
[003] Realistic scene synthesis has seen rapid advances through deep convolutional GANs that employ discriminators to critique generated images, driving the generator toward greater fidelity. However, traditional approaches rely on centralized datasets and single scale discriminators, which limit both the resolution of detail and the diversity of learned representations.
[004] Federated learning has emerged as a promising paradigm for distributed model training, enabling multiple client devices to collaboratively train a shared global model without exposing their raw data. In conventional federated GAN schemes, clients exchange parameter updates for a unified discriminator, often at a single image resolution, and aggregation is typically performed via simple averaging. While such methods preserve data privacy, they suffer from reduced per scale sensitivity and increased convergence instability when synthesizing complex, high resolution scenes.
[005] Several works have attempted to integrate multi scale discrimination into GANs. For example, multi scale PatchGAN discriminators have been proposed to assess image patches at different receptive fields, improving texture realism at varying resolutions. However, these methods are confined to centralized training settings and cannot leverage distributed data sources without significant privacy risks. Similarly, U Net based discriminators have been applied in centralized GANs to capture both global context and fine details, yet they entail high communication costs and extended training times when naively ported to federated environments.
[006] On the federated learning front, algorithms such as Federated Averaging (FedAvg) have been adopted to train GAN discriminators across smartphones and edge devices. Although FedAvg ensures confidentiality of client datasets, it treats all parameter updates uniformly and does not differentiate between contributions at different image scales. Consequently, high resolution features are under represented in the global discriminator update, leading to suboptimal synthesis quality and potential mode collapse in fine grained regions.
[007] The cited prior art suffers from several drawbacks when applied to realistic scene synthesis in federated settings. Centralized multi scale discriminators cannot accommodate distributed data silos without violating privacy constraints. Conversely, federated methods that use single scale discriminators or apply uniform averaging fail to capture scale dependent nuances, resulting in poor high resolution detail, unstable convergence, and prohibitive communication overhead when naively scaling to multiple resolutions.
[008] The present invention addresses these shortcomings by introducing a Federated Learning Based Multi Scale Discriminator Optimization Method that integrates advanced multi scale discriminator architectures with a weighted aggregation algorithm tailored for federated environments.
SUMMARY OF THE INVENTION
[009] This section is provided to introduce certain objects and aspects of the present disclosure in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.
[010] The invention provides a federated learning framework specifically designed to optimize multi scale discriminators within GAN architectures for high fidelity scene synthesis. A central aggregation server initializes and distributes global discriminator parameters to a plurality of client training nodes, each node hosting a local generator and a discriminator comprising parallel subnetworks at multiple resolutions (for example, 64×64, 128×128 and 256×256 pixels). During each training round, client nodes perform local adversarial training, compute scale specific gradient updates, encrypt these updates via a two layer key scheme, and transmit them over MQTT/TLS channels. At the aggregation server, a reliability weighted averaging algorithm integrates the encrypted gradients for each scale, updates the global model, and dynamically adjusts learning rates based on validation metrics, thereby ensuring stable convergence and consistent synthesis quality across all spatial scales.
[011] In further embodiments, client nodes may apply structured pruning to discriminator channels to reduce communication payloads without significant loss in synthesis fidelity, and may operate under a hybrid synchronous–asynchronous update regime to accommodate intermittent connectivity. A backpropagation control unit within each client routes discriminator losses through gradient reversal layers to enforce multi scale feedback to the generator. Optional extensions include temporal discriminator modules for coherent video generation and semantic segmentation branches for class aware refinement. By preserving raw data locally, minimizing communication overhead, and leveraging heterogeneous hardware environments, the invention achieves marked improvements in realism metrics such as Fréchet Inception Distance and Structural Similarity Index compared to conventional centralized and single scale federated approaches.
BRIEF DESCRIPTION OF DRAWINGS
[012] The accompanying drawings are included to provide a further understanding of the present disclosure, and are incorporated in, and constitute a part of this specification. The drawings illustrate exemplary embodiments of the present disclosure, and together with the description, serve to explain the principles of the present disclosure.
[013] In the figures, similar components, and/or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label with a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.
[014] Fig. 1 illustrates a working block flowchart associated with a federated learning based multi scale discriminator optimization method for GAN based realistic scene synthesis, in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[015] The following is a detailed description of embodiments of the disclosure depicted in the accompanying drawings. The embodiments are in such detail as to clearly communicate the disclosure. However, the amount of detail offered is not intended to limit the anticipated variations of embodiments. On the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit, and scope of the present disclosure as defined by the appended claims.
[016] In the following description, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the present invention. It will be apparent to one skilled in the art that embodiments of the present invention may be practiced without some of these specific details.
[017] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail to avoid obscuring the embodiments.
[018] Also, it is noted that individual embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[019] The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising” as an open transition word without precluding any additional or other elements.
[020] Reference throughout this specification to “one embodiment” or “an embodiment” or “an instance” or “one instance” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
[021] Referring to Figure 1, the present invention relates to a novel Federated Learning Based Multi Scale Discriminator Optimization Method for generative adversarial network (GAN) based realistic scene synthesis. The invention uniquely integrates distributed model training across multiple client devices with advanced multi scale discriminator architectures to produce highly realistic scene images, while preserving data privacy and reducing communication overhead.
[022] The detailed description of the invention is set forth herein in narrative form, comprising various embodiments and examples illustrating the interconnection and cooperative operation of each system component. The description emphasizes the technical details of modules, algorithms, network protocols, optimization strategies, and hardware configurations.
[023] The system architecture consists of a central Federated Aggregation Server (FAS), a plurality of Client Training Nodes (CTNs), and a set of Model Parameter Communication Channels (MPCCs). The FAS orchestrates the global model update, receives local discriminator weight deltas, and performs aggregation using a weighted averaging mechanism tailored for multi scale architectures. The CTNs host a local GAN instance comprising a Generator Network (GN) and a Multi Scale Discriminator Network (MDN).
[024] Each CTN is configured with a heterogeneous hardware environment, including GPU accelerators, local storage for private scene datasets, and secure communication modules. The GN is a deep convolutional network that synthesizes scene images from latent vectors, whereas the MDN consists of multiple parallel discriminator subnetworks operating at different image resolutions (e.g., 64×64, 128×128, 256×256 pixels) to evaluate both global context and fine-grained textures.
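By way of illustration only, the multi scale evaluation performed by the MDN may be sketched as an image pyramid in which each resolution level is scored by its own discriminator subnetwork. The 2×2 average pooling and the toy scoring functions below are assumptions for exposition and do not represent the claimed network architectures:

```python
def avg_pool2x2(img):
    """Halve each spatial dimension by averaging 2x2 blocks (e.g. 256 -> 128)."""
    h, w = len(img), len(img[0])
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]

def build_pyramid(img, levels=3):
    """Return [full-res, half-res, quarter-res, ...] views of the image."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(avg_pool2x2(pyramid[-1]))
    return pyramid

def multiscale_scores(img, discriminators):
    """Apply one discriminator subnetwork per pyramid level, fine to coarse."""
    levels = build_pyramid(img, len(discriminators))
    return [d(level) for level, d in zip(levels, discriminators)]
```

In this sketch each subnetwork sees a different receptive scale of the same image, mirroring how the MDN critiques global context at low resolution and fine textures at high resolution.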
[025] The FAS implements a Federated Averaging with Multi Scale Sensitivity (FAMS) algorithm. During each training round, the CTNs perform local discriminator optimization by computing gradient updates for each scale-specific subnetwork. These local updates, denoted ΔW_i^s for CTN i and scale s, are encrypted and transmitted via the MPCCs to the FAS, ensuring end-to-end confidentiality through homomorphic encryption.
[026] Upon receipt of ΔW_i^s from all participating CTNs, the FAS computes a scale wise weighted average: W^s_{new} = \sum_i \phi_i^s ΔW_i^s, where \phi_i^s represents the reliability coefficient determined by a local validation metric at CTN i for scale s. This weighted averaging accounts for data heterogeneity across nodes and balances contributions from diverse client datasets.
[027] Example Embodiment 1: A network of smart city surveillance cameras serves as CTNs. Each camera node trains the local GAN on frames captured from its monitoring radius. The local MDN penalizes low contrast anomalies at a lower resolution while detecting fine-grained details such as vehicle license plates at a higher resolution. The FAS aggregates gradients from fifty CTNs every epoch, ensuring real time global model updates with a communication budget of 10 MB per round.
[028] Example Embodiment 2: In an agricultural monitoring scenario, CTNs consist of drone-mounted imaging systems capturing crop field photographs. The local MDN scales include infrared and visible spectrum discriminators. Through federated learning, the global GAN learns to synthesize realistic farmland scenes under varying illumination and seasonal conditions without exposing raw images from individual farms.
[029] The interconnection between the GN and MDN in each CTN is established via a backpropagation control unit that routes discriminator loss signals to the generator through gradient reversal layers. This arrangement ensures that generator updates reflect a multi-scale critique, improving synthesis fidelity across spatial resolutions.
[030] Communication protocols within the MPCCs adhere to the MQTT standard with TLS v1.3. Packet payloads containing encrypted gradient updates are encapsulated in fixed size frames to minimize packet fragmentation and optimize throughput over lossy wireless links.
[031] Table 1 below demonstrates comparative results of the synthesized image quality, measured via Fréchet Inception Distance (FID) and Structural Similarity Index (SSIM), for embodiments using single scale versus multi scale discriminators under federated versus centralized training.

[032] As shown in Table 1, the proposed federated multi scale approach significantly improves FID and SSIM, demonstrating enhanced realism and structural integrity in synthesized scenes.
[033] The FAMS algorithm further incorporates adaptive learning rate scheduling for each scale subnetwork. A learning rate \alpha^s is dynamically adjusted based on the moving average of local validation loss, ensuring stable convergence across scales and preventing mode collapse in high resolution synthesis tasks.
[034] Example Embodiment 3: Urban traffic simulation nodes collect real world traffic pattern images. The multi scale discriminator includes a temporal discriminator subnetwork analyzing frame sequences at 8 frame intervals. The FAS extends FAMS to account for temporal coherence gradients, enhancing video scene synthesis capabilities.
[035] In another embodiment, CTNs utilize federated pruning strategies to reduce model size before transmitting updates. The MDN applies channel pruning with a sparsity target of 50%, decreasing communication payloads by approximately 40% without impairing synthesis quality, as evidenced in Table 2.

[036] Table 2 illustrates that pruning reduces communication overhead while maintaining acceptable quality metrics, validating the efficacy of integrated model compression.
[037] The federated workflow commences with FAS broadcasting the initial global discriminator weights to CTNs. Each CTN then executes local training epochs comprising alternating generator and multi scale discriminator updates, typically five alternations per round. Upon completion, CTNs encrypt and transmit only discriminator weight deltas, minimizing upload size.
[038] The security of data transfer is ensured via a dual-layer encryption scheme: local gradient updates are first encrypted using a symmetric key generated per CTN session, then the symmetric key is encrypted using the FAS’s public key. The FAS applies its private key to retrieve symmetric keys and then decrypts gradient updates.
[039] Embodiment Example 4: Industrial robotics arms equipped with vision sensors serve as CTNs in a manufacturing quality control application. The GAN synthesizes defect-free product images, allowing anomaly detection modules to learn diverse defect patterns. Federated multi scale discriminator training enhances defect localization accuracy at both assembly line and component level resolutions.
[040] The invention also contemplates a hybrid synchronous–asynchronous update strategy. CTNs with unreliable network connectivity may perform asynchronous gradient uploads, with their contributions integrated during the next synchronous aggregation cycle without stalling the global model training.
[041] In yet another embodiment, the MDN is augmented with semantic segmentation subnetworks. These subnetworks evaluate class specific synthesis quality, enabling the generator to refine object boundaries and context placement. The FAS computes class weighted reliability coefficients \phi_i^{s,c} for scale s and class c, further improving scene composition realism.
[042] Hardware Implementation Example: The FAS can be deployed on a cloud based GPU cluster, while CTNs run on edge devices featuring NVIDIA Jetson modules. The secure communication module is implemented using OpenSSL libraries, and homomorphic encryption leverages the Microsoft SEAL library.
[043] The interconnection between software modules and hardware interfaces is managed by a middleware layer that abstracts encryption, communication, and model serialization. This middleware provides APIs for training invocation, gradient packaging, and session management.
[044] To demonstrate efficacy, a comparative ablation study was conducted across 100 federated training rounds. Results indicated a 27% reduction in FID and a 15% increase in SSIM compared to baseline federated single scale methods. Further, communication cost per round was reduced by up to 45% with integrated pruning strategies.
[045] Embodiment Example 5: In medical imaging applications, CTNs include hospital workstations processing radiographic scans. The multi scale discriminator evaluates both organ level structures and pixel level texture anomalies. Federated learning preserves patient privacy while enabling cross institutional model training.
[046] The system further includes a fallback mechanism where CTNs with stale model versions receive differential updates to synchronize weights without full retransmission. This mechanism leverages delta encoding to compress weight differentials.
[047] Control Flow Diagram: Upon receiving convergence criteria from FAS, each CTN invokes a local training script. Generator loss is computed via adversarial feedback, while discriminator losses at each scale are aggregated to form a composite loss function: L_D = \sum_s \lambda^s L_D^s.
[048] The composite discriminator weight update is then split into scale specific segments for encryption and transmission. The FAS recombines these segments into the global MDN architecture.
[049] The invention provides a robust method for realistic scene synthesis across distributed data silos, achieving high quality, privacy preservation, and communication efficiency. The detailed system integration and multiple embodiments illustrate the versatility and effectiveness of the invention in diverse application domains.

Claims:
1. A method for federated learning based multi scale discriminator optimization in generative adversarial network based realistic scene synthesis, comprising:
a) initializing, at a central aggregation server, global discriminator weights for a multi scale discriminator network;
b) transmitting said global discriminator weights to a plurality of client training nodes, each node hosting a local generator network and a multi scale discriminator network;
c) at each client node, performing local training rounds in which:
i. the generator network synthesizes scene images from latent inputs;
ii. the multi scale discriminator network evaluates generated images at multiple resolutions to compute scale specific gradient updates;
d) encrypting and transmitting the scale specific gradient updates from each client node back to the central aggregation server;
e) at the central aggregation server, computing a weighted average of received gradient updates for each scale based on client specific reliability coefficients; and
f) updating the global discriminator weights with the aggregated scale specific averages for use in a subsequent training round.
2. The method as claimed in claim 1, wherein the multi scale discriminator network comprises parallel subnetworks operating at at least three distinct image resolutions.
3. The method as claimed in claim 1, wherein client specific reliability coefficients are determined by local validation metrics measuring discriminator performance on private datasets.
4. The method as claimed in claim 1, wherein gradient updates are encrypted at each client node using a symmetric key, and the symmetric key is further encrypted with the public key of the central aggregation server.
5. The method as claimed in claim 1, further comprising, at the central aggregation server, applying adaptive learning rate scheduling for each scale specific subnetwork based on moving averages of local validation losses.
6. The method as claimed in claim 1, wherein client nodes implement channel pruning on the multi scale discriminator network to reduce model size prior to encryption and transmission of gradient updates.
7. The method as claimed in claim 1, wherein communication between the central aggregation server and client nodes occurs over an MQTT protocol secured with TLS v1.3, encapsulating payloads in fixed size frames.
8. The method as claimed in claim 1, further comprising a hybrid synchronous–asynchronous update strategy whereby client nodes with unreliable connectivity transmit updates asynchronously and are integrated during the next scheduled aggregation.
9. The method as claimed in claim 1, wherein a backpropagation control unit at each client node routes discriminator loss signals through gradient reversal layers to the generator network to enforce a multi scale critique.
10. The method as claimed in claim 1, wherein the federated learning framework is applied across heterogeneous hardware environments, including edge deployed GPU modules and cloud based aggregation servers.

Documents

Application Documents

# Name Date
1 202541047544-STATEMENT OF UNDERTAKING (FORM 3) [16-05-2025(online)].pdf 2025-05-16
2 202541047544-REQUEST FOR EARLY PUBLICATION(FORM-9) [16-05-2025(online)].pdf 2025-05-16
3 202541047544-FORM-9 [16-05-2025(online)].pdf 2025-05-16
4 202541047544-FORM 1 [16-05-2025(online)].pdf 2025-05-16
5 202541047544-DRAWINGS [16-05-2025(online)].pdf 2025-05-16
6 202541047544-DECLARATION OF INVENTORSHIP (FORM 5) [16-05-2025(online)].pdf 2025-05-16
7 202541047544-COMPLETE SPECIFICATION [16-05-2025(online)].pdf 2025-05-16