Abstract: DYNAMIC DECENTRALIZED SYSTEM FOR SECURE ADAPTIVE FINE-TUNING OF LANGUAGE MODELS. A dynamic decentralized system (100) for adaptive fine-tuning of large language models, comprising: a network configuration (102); a plurality of edge nodes (101), each edge node comprising a training module (103); a secure aggregation unit (104) connected to the plurality of edge nodes; a filter unit (105) connected to the plurality of edge nodes; and a sparse adapter subspace unit (106) connected to the plurality of edge nodes and communicatively coupled with the secure aggregation unit and the filter unit. The system reconfigures peer edge nodes based on real-time network configuration and edge node capabilities. The edge nodes tailor local epochs based on node capabilities. The network configuration employs control-variate and low-rank adaptation methods. The secure aggregation unit employs additive secret-sharing for peer-to-peer communication. The filter unit integrates coordinate-wise median filtering. The sparse adapter subspace unit enables collaboration with different data types and modalities without interference. Figure 1
Description: FIELD OF THE INVENTION
[0001] The present disclosure relates to a dynamic decentralized system for secure adaptive fine-tuning of language models. More particularly, the present invention provides a dynamic and secure system for adaptive fine-tuning of large language models using low-rank adapters within a decentralized peer-to-peer network of edge nodes. The system is configured to preserve data privacy and model integrity while enabling efficient domain and task adaptation across heterogeneous environments.
BACKGROUND OF THE INVENTION
[0002] Large language models have become increasingly prevalent in various applications, ranging from natural language processing and conversational agents to content generation and decision support systems. These models, typically trained on vast and diverse datasets, have demonstrated remarkable capabilities in understanding, reasoning, and generating human-like text across multiple domains. However, the deployment, customization, and fine-tuning of such models present several challenges, particularly in decentralized or resource-constrained environments. These challenges include ensuring data privacy, handling heterogeneous hardware and data distributions, managing communication overhead, and maintaining model integrity and performance across distributed nodes.
[0003] Fine-tuning large language models often requires substantial computational resources and access to sensitive or proprietary data. Traditional approaches to fine-tuning typically involve centralized systems, where a single entity controls both the orchestration of the training process and complete access to the aggregated data. This centralized architecture may raise significant concerns regarding data privacy, security vulnerabilities, regulatory compliance, and the potential for misuse or unauthorized exploitation of user data especially in sensitive domains such as healthcare, finance, or personalized services.
[0004] Decentralized systems offer an alternative approach, allowing multiple parties such as individual data owners, edge devices, or organizational nodes to collaboratively participate in the fine-tuning process without relying on a central authority. These systems can potentially address some of the critical concerns associated with centralized training pipelines, particularly in terms of user data privacy, compliance with data sovereignty regulations, and minimizing single points of failure. However, decentralized fine-tuning introduces its own set of challenges, including ensuring communication efficiency over heterogeneous and bandwidth-limited networks, managing data heterogeneity across participants with non-IID (not independent and identically distributed) data, and implementing robust and secure aggregation methods to ensure convergence without compromising model integrity or leaking sensitive information.
[0005] Parameter-efficient fine-tuning techniques, such as low-rank adaptation (LoRA), have emerged as promising approaches to reduce the computational and memory requirements of fine-tuning large language models. These methods involve injecting trainable low-rank matrices into specific layers of a pre-trained model and updating only these lightweight components while keeping the majority of the model parameters frozen. Such strategies enable scalable adaptation even on resource-constrained devices. However, applying these techniques in decentralized settings presents additional complexities, particularly in managing the secure distribution, synchronization, and aggregation of parameter updates across heterogeneous and intermittently connected edge nodes.
[0006] The heterogeneity of data and computational resources across different nodes in a decentralized network poses another challenge. Nodes may have varying amounts and types of data ranging from structured to unstructured and differing computational capabilities, such as processing power, memory, and energy constraints. Addressing this heterogeneity while ensuring efficient and effective fine-tuning requires careful consideration of task scheduling, model update mechanisms, load balancing, and secure aggregation strategies that are robust to such variations across the network.
[0007] Security and privacy considerations are paramount in decentralized fine-tuning systems. Protecting the confidentiality of individual updates while still allowing for effective aggregation is a complex task, especially in environments with heterogeneous data sources and varying trust levels. This necessitates the use of secure multiparty computation or secret-sharing techniques that ensure model updates cannot be reverse-engineered or attributed to specific nodes. Additionally, safeguarding against malicious actors who may attempt to inject poisoned or adversarial updates into the system is crucial for maintaining the integrity, robustness, and fairness of the fine-tuned model across diverse deployment scenarios.
[0008] As the field of large language models continues to evolve, there is a growing interest in developing methods that can handle multi-modal data (such as text, images, audio, structured data and unstructured data) and support multi-task learning across diverse domains. Extending decentralized fine-tuning approaches to accommodate these scenarios introduces additional complexities, including the need to manage heterogeneous data modalities, ensure balanced task prioritization, and prevent negative transfer or interference between tasks during distributed training.
[0009] The prior-art document US20210342677A1 focuses on hardware-centric eXplainable Neural Network (XNN) architectures and details various optimizations aimed at improving interpretability and energy efficiency. It conceptually references the suitability of XNNs for “decentralized or federated implementation” in IoT/edge/mesh networks and mentions “Distributed XNNs” in the context of client-server or serverless topologies. However, it does not disclose any working federated-learning protocol: there are no update workflows, aggregation schemes, communication formats, or privacy-budget enforcement mechanisms, nor any embodiment that demonstrates decentralized convergence. Its notion of “dynamic” behavior is restricted to on-chip reconfiguration and does not extend to runtime reallocation of training roles, adaptation to bandwidth or node availability, or dynamic topology updates in a distributed network. No mechanisms are provided for heterogeneity-aware scheduling; all compute models assume homogeneous hardware and static configurations, with no discussion of dispatching tasks based on memory, latency, battery, or bandwidth constraints. Likewise, the document lacks secure peer-to-peer adapter-sharing protocols: there is no cryptographic design or communication mechanism for exchanging parameter modules such as low-rank adapters between decentralized, potentially untrusted nodes. Although secure enclaves and SMPC-inspired polynomial evaluations are briefly mentioned as possible future directions, they remain theoretical suggestions without any concrete system design, verified execution model, or demonstrated privacy-preserving pipeline for inference or explainability.
[0010] Therefore, in order to overcome the above-mentioned drawbacks, there is a requirement for a decentralized fine-tuning system for large language models that leverages distributed computational resources across edge devices. Such a system must employ innovative approaches capable of balancing efficiency and effectiveness in model adaptation, while simultaneously ensuring data privacy, communication security, and robustness against node heterogeneity and adversarial threats.
OBJECTIVE OF THE INVENTION
[0011] The primary objective of the present invention is to provide a dynamic, decentralized system for the secure and adaptive fine-tuning of large language models, specifically designed to address challenges related to efficiency, data privacy, and security in distributed computing environments.
[0012] Another objective of the present invention is to provide a system capable of dynamically reconfiguring peer-to-peer connections between edge nodes based on real-time network metrics and individual node capabilities, thereby optimizing the network topology to enhance convergence rates, reduce latency, and improve overall communication efficiency during the fine-tuning process.
[0013] Another objective of the present invention is to provide a method for dynamically tailoring the number of local training epochs at each edge node based on its computational resources, thereby enabling heterogeneous deployments by accommodating variability in hardware capabilities and ensuring efficient utilization of available resources during the fine-tuning process.
[0014] Another objective of the present invention is to incorporate a combination of control-variate techniques and low-rank adaptation methods to effectively manage data heterogeneity across edge nodes, thereby enhancing the stability and efficiency of the fine-tuning process in environments characterized by non-IID (not independent and identically distributed) data.
[0015] Another objective of the present invention is to provide a secure, serverless aggregation protocol based on additive secret-sharing, enabling edge nodes to engage in peer-to-peer communication and exchange adapter updates without dependence on a central server, thereby enhancing data privacy, mitigating risks of single points of failure, and promoting resilience in decentralized training environments.
[0016] Another objective of the present invention is to improve secure aggregation efficiency and to mitigate the impact of slow or resource-constrained clients by grouping edge nodes into clusters based on data-distribution similarity or network proximity. The system performs a two-stage, additive-secret-sharing aggregation: first aggregating adapter updates within each cluster, and then exchanging and aggregating these cluster-level results across clusters. Clusters may be adaptively re-formed to maintain performance. These measures preserve end-to-end privacy, minimize communication overhead, and ensure resilience against stragglers.
[0017] Another objective of the present invention is to provide a robust filtering mechanism employing coordinate-wise median methods to safeguard the system against poisoned or malicious model updates, thereby ensuring the integrity and reliability of the fine-tuned model even in the presence of adversarial or compromised edge nodes.
[0018] Another objective of the present invention is to provide a sparse adapter subspace that enables collaboration between edge nodes operating on diverse data types and modalities, thereby supporting multi-modal and multi-task learning scenarios while minimizing cross-task interference and preserving task-specific representations.
[0019] Another objective of the present invention is to provide a rank-adaptive low-rank adaptation method that dynamically adjusts the rank of each adapter during the training process, enabling a balance between model capacity and computational efficiency by tailoring the adapter rank in response to task complexity, data characteristics, and available resources.
[0020] Another objective of the present invention is to provide techniques for orthogonalization and sparsification of adapters to reduce cross-task interference in multi-task learning scenarios, thereby enabling more effective knowledge sharing across different tasks by preserving task-specific representations and minimizing negative transfer (e.g., when adapters are trained concurrently or sequentially on heterogeneous datasets).
[0021] Yet another objective of the present invention is to provide a system capable of supporting continual learning scenarios, wherein the model can adapt to new tasks over time while preserving performance on previously learned tasks through strategic management of adapter subspaces, enabling efficient task-specific adaptation.
SUMMARY OF THE INVENTION
[0022] The present invention provides a dynamic decentralized system (100) for adaptive fine-tuning of large language models. The system (100) includes a network configuration unit (102) that monitors network metrics and dynamically rewires the peer-to-peer topology of the edge nodes (101) in real time. A plurality of edge nodes (101) is present, each with a training module (103) for local model updates. Each training module (103) tailors its number of local training epochs based on its node’s capabilities and employs control-variate drift correction and low-rank adaptation methods for efficient, personalized updates. A secure aggregation unit (104) performs additive secret-sharing to enable privacy preserving peer-to-peer exchange of adapter updates without relying on a central server. A filter unit (105) applies coordinate-wise median filtering to the received adapter updates, defending the system (100) against poisoned or adversarial updates. A sparse adapter subspace unit (106) partitions adapter parameters into modality-specific subspaces and applies orthogonalization and sparsification, enabling collaboration across different data modalities while reducing cross-task interference. The system (100) also supports dynamic rewiring of peer connections by the network configuration unit (102), adaptive adapter rank modulation by the training module (103) during training, and subspace orthogonalization strategies by the sparse adapter subspace unit (106) to manage heterogeneity across tasks and nodes. A corresponding method and non-transitory computer-readable medium are provided, which include operations for initializing the decentralized network (100), adaptively reconfiguring peer connections using the network configuration unit (102), customizing local training schedules via each training module (103), applying heterogeneity-aware optimization techniques, executing secure aggregation protocols through the secure aggregation unit (104), and maintaining sparse adapter subspaces using the sparse adapter subspace unit (106) to support decentralized, multi-modal, and multi-task learning across diverse environments.
BRIEF DESCRIPTION OF DRAWINGS
[0023] The present invention will be better understood from the following detailed description of the presently preferred aspects thereof, read with reference to the appended drawings, in which the features, other aspects, and advantages of certain exemplary embodiments of the invention will become more apparent, and in which:
[0024] FIG. 1 illustrates a block diagram of a dynamic decentralized system for adaptive fine-tuning of large language models.
[0025] FIG. 2 illustrates a schematic diagram of a decentralized peer-to-peer network for low-rank fine-tuning, according to aspects of the present disclosure.
[0026] FIG. 3 depicts a flowchart for heterogeneity-aware gradient correction in a decentralized network, according to an embodiment.
[0027] FIG. 4 illustrates a flowchart for an adaptive local update scheduling mechanism, according to aspects of the present disclosure.
[0028] FIG. 5 depicts a secure aggregation protocol with three main phases, according to an embodiment.
[0029] FIG. 6 illustrates a flowchart for a rank-adaptive LoRA training process, according to aspects of the present disclosure.
[0030] FIG. 7 depicts a flowchart for robust aggregation in a decentralized system, according to an embodiment.
[0031] FIG. 8 illustrates a flowchart showing extension to multi-task learning and processing in a decentralized system, according to aspects of the present disclosure.
[0032] FIG. 9 illustrates a flowchart for a dynamic topology adaptation process in a decentralized network.
[0033] FIG. 10 illustrates a flowchart for a decentralized fine-tuning process.
[0034] FIG. 11 illustrates a flowchart for client preparation and peer communication in a decentralized system.
[0035] FIG. 12 illustrates a block diagram of a dynamic decentralized system with adaptive clustering and two-tier secure aggregation for global model updates.
DETAILED DESCRIPTION OF THE INVENTION
[0036] The following detailed description and embodiments set forth herein are merely exemplary of the wide variety of arrangements and instructions that can be employed with the present invention. The present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. All the features disclosed in this specification may be replaced by alternative features serving the same, equivalent, or similar purposes. Thus, unless expressly stated otherwise, all such features are within the scope of the present invention.
[0037] Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
[0038] It should be emphasized that the term “comprises/comprising” when used in this specification is taken to specify the presence of stated features, steps or components but does not preclude the presence or addition of one or more other features, steps, components or groups thereof.
[0039] The present invention provides a dynamic decentralized system for secure adaptive fine-tuning of large language models. The system may utilize a network of edge nodes to perform distributed fine-tuning while addressing challenges related to data privacy, security, and heterogeneity across nodes. The edge nodes locally process sensitive data without transmitting raw inputs to a central server, thereby preserving user privacy. The system further ensures secure aggregation of model updates, robust coordination of training across dynamic network topologies, and adaptability to variations in computational capacity and data distributions across participating nodes.
[0040] In an embodiment, the present invention provides a dynamic decentralized system for adaptive fine‑tuning of large language models, comprising a network configuration unit (102) that continuously monitors real‑time metrics (e.g., communication bandwidth, latency, compute and memory availability) and dynamically rewires peer‑to‑peer connections among the edge nodes (101); a plurality of edge nodes (101), each comprising a training module (103); a secure aggregation unit (104) connected to the plurality of edge nodes; a filter unit (105) connected to the plurality of edge nodes; and a sparse adapter subspace unit (106) connected to the plurality of edge nodes and communicatively coupled with both the secure aggregation unit and the filter unit.
[0041] The network configuration unit (102) further groups edge nodes into clusters based on low‑dimensional feature vectors derived from local data distributions or network-proximity metrics (e.g., latency PCA) to enable hierarchical two‑stage secure aggregation. Within each cluster, adapter updates are first aggregated; thereafter, designated cluster heads exchange and re‑aggregate these intermediate results across clusters to produce the global update. Clusters may be re‑formed periodically (every T rounds) or when a cluster‑health metric (such as intra‑cluster variance or aggregation delay) exceeds a predefined threshold.
[0042] Each training module (103) performs local fine‑tuning using a combined control‑variate drift‑correction method and low‑rank adaptation, and autonomously determines its number of local epochs based on current CPU, memory, and power constraints. Additionally, each module employs a rank‑adaptive low‑rank adaptation method that dynamically adjusts the adapter’s rank in response to convergence signals (e.g., loss-plateau detection) and available bandwidth, thereby balancing model capacity with communication and compute efficiency.
[0043] The secure aggregation unit (104) utilizes additive secret‑sharing to combine adapter updates in a fully decentralized, serverless manner. Each edge node partitions its update into cryptographic shares and exchanges them peer‑to‑peer; only the aggregate sum is ever reconstructed, preserving individual update privacy and eliminating any single point of failure.
[0044] The filter unit (105) applies a coordinate‑wise median aggregation technique, optionally supplemented by a Krum‑style robust filter, to detect and remove adversarial or poisoned adapter updates, enhancing overall model integrity and stability.
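By way of non-limiting illustration, the following Python sketch shows one possible realization of the coordinate-wise median aggregation performed by the filter unit (105). The function names, the use of NumPy, and the flattening of adapter updates into one-dimensional arrays are assumptions made solely for this example.

```python
import numpy as np

def coordinate_wise_median(updates):
    """Aggregate adapter updates by taking the median of each coordinate.

    `updates` is a list of equally sized 1-D NumPy arrays, one per edge node.
    The coordinate-wise median is robust to a minority of poisoned updates,
    because extreme values at any single coordinate cannot shift the median.
    """
    stacked = np.stack(updates, axis=0)   # shape: (num_nodes, num_params)
    return np.median(stacked, axis=0)     # shape: (num_params,)

# Illustrative usage: three honest updates and one large-magnitude poisoned update.
honest = [np.random.normal(0.0, 0.01, size=8) for _ in range(3)]
poisoned = np.full(8, 100.0)
aggregate = coordinate_wise_median(honest + [poisoned])
```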
[0045] The sparse adapter subspace unit (106) partitions adapter parameters into task‑ or modality‑specific subspaces, applies orthogonalization and sparsification to minimize cross‑task interference, and supports continual learning by freezing or zeroing out elements of prior-task subspaces when new tasks are introduced. This unit thereby enables modular, multi‑modal, and multi‑task collaboration across heterogeneous edge nodes while preserving task‑specific representations and preventing catastrophic forgetting.
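As a non-limiting illustration of the orthogonalization and sparsification performed by the sparse adapter subspace unit (106), the following Python sketch projects a new task's adapter update onto the orthogonal complement of directions occupied by prior tasks and then prunes small-magnitude coordinates. The dimensionality, keep ratio, and function names are illustrative assumptions.

```python
import numpy as np

def orthogonalize_against(prior_basis, adapter):
    """Remove components of `adapter` lying in the span of prior-task directions.

    `prior_basis` is a (d, k) matrix with orthonormal columns describing
    subspaces already used by earlier tasks; `adapter` is a length-d update.
    """
    if prior_basis is None or prior_basis.size == 0:
        return adapter
    return adapter - prior_basis @ (prior_basis.T @ adapter)

def sparsify(adapter, keep_ratio=0.25):
    """Keep only the largest-magnitude coordinates of the adapter update."""
    k = max(1, int(keep_ratio * adapter.size))
    threshold = np.sort(np.abs(adapter))[-k]
    return np.where(np.abs(adapter) >= threshold, adapter, 0.0)

# A prior task occupies two orthonormal directions of a 16-dimensional space.
prior_basis, _ = np.linalg.qr(np.random.randn(16, 2))
new_task_update = np.random.randn(16)
isolated_update = sparsify(orthogonalize_against(prior_basis, new_task_update))
```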
[0046] In another embodiment, the control-variate method employed within the training module (103) incorporates drift correction mechanisms to maintain stability and alignment of local model updates with the global objective. This helps mitigate divergence due to heterogeneous data distributions across edge nodes (101). Additionally, the low-rank adaptation method integrates subspace-aware normalization to effectively leverage the inherent low-rank structure of parameter updates, enabling parameter-efficient fine-tuning. This method dynamically adjusts the rank configuration of each edge node (101) during the training process based on node-specific resource constraints and convergence behaviour. The network configuration unit (102) dynamically rewires peer-to-peer connections among edge nodes (101) at each training round to optimize gradient mixing and accelerate convergence. The dynamic rewiring strategy leverages graph-theoretic criteria such as model update variance, node centrality, and communication bandwidth. Furthermore, the sparse adapter subspace unit (106) performs orthogonalization and sparsification of task-specific adapters to minimize cross-task interference and promote modularity in multi-task learning scenarios, especially when edge nodes (101) participate in heterogeneous tasks.
[0047] In another embodiment, a method for adaptive fine-tuning of large language models in a dynamic decentralized system is provided, the method comprising:
• configuring a network of a plurality of edge nodes (101), each comprising a local training module (103) and communication interface;
• dynamically reconfiguring peer-to-peer connectivity among edge nodes (101) based on real-time network metrics and individual edge node capabilities such as compute power, memory, or data availability, using a network configuration unit (102);
• tailoring the number of local training epochs individually for each edge node (101), in accordance with the respective node’s resource constraints and dataset size, as determined by its training module (103);
• employing a combination of control-variate methods and low-rank adaptation techniques within each training module (103) to address heterogeneity in node capabilities and local data distributions across the edge network;
• performing secure aggregation of adapter updates via additive secret-sharing protocols using the secure aggregation unit (104), thereby enabling decentralized peer-to-peer model synchronization without reliance on a central server;
• applying a coordinate-wise median-based filtering mechanism via the filter unit (105) to detect and suppress malicious or poisoned adapter contributions during aggregation;
• utilizing a sparse adapter subspace unit (106) to support collaboration among edge nodes (101) possessing heterogeneous datasets, data modalities, or domain-specific knowledge, thereby preventing cross-interference and promoting efficient fine-tuning.
[0048] In another embodiment, the present invention provides a non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations for adaptive fine-tuning of large language models in a dynamic decentralized system, the operations comprising:
• initializing the network configuration unit (102) with a plurality of edge nodes (101) interconnected via a peer-to-peer topology;
• dynamically reconfiguring peer connections between the edge nodes (101) using the network configuration unit (102) based on real-time network metrics, such as latency, bandwidth, and node reliability, as well as individual node capabilities;
• adjusting local training schedules for each edge node (101) via its training module (103) proportionally to that node’s available computational resources, memory footprint, and energy constraints;
• applying a combination of control-variate and low-rank adaptation methods within each training module (103) to effectively manage statistical and system heterogeneity across the distributed edge nodes (101);
• implementing a secure, serverless aggregation protocol for adapter updates using the secure aggregation unit (104) and additive secret-sharing, thereby ensuring privacy-preserving model synchronization without reliance on a centralized coordinator;
• filtering received adapter updates with the filter unit (105) using a coordinate-wise median aggregation method to enhance robustness against malicious or corrupted inputs from potentially adversarial edge nodes (101); and
• maintaining a shared sparse adapter subspace across participating edge nodes (101) via the sparse adapter subspace unit (106) to support efficient multi-modal and multi-task learning while preserving model compactness and generalization capability.
[0049] In another embodiment, dynamically reconfiguring peer connections comprises:
• evaluating current model variance and node centrality for each edge node; and
• rewiring connections between edge nodes based on the evaluation to optimize mixing and convergence.
[0050] In another embodiment, dynamically reconfiguring peer connections comprises:
• evaluating current model variance and node centrality for each edge node (101), wherein model variance indicates divergence in local updates and node centrality reflects the structural importance of the node within the peer-to-peer topology; and
• rewiring connections between edge nodes (101) by the network configuration unit (102) based on the evaluation to optimize efficiency and accelerate convergence across the decentralized network while preserving data locality and communication constraints.
[0051] In another embodiment, adjusting local training schedules comprises:
• assessing available computational resources on each edge node (101), including CPU utilization, memory availability, network bandwidth, and battery level, to ensure optimal use of device-specific capacities; and
• dynamically determining, by the training module (103) within each edge node (101), an appropriate number of local training epochs based on the assessed resource profile, such that training efficiency is balanced with energy and bandwidth constraints in heterogeneous environments.
[0052] In another embodiment, applying the combination of control-variate and low-rank adaptation methods by the training module (103) of each edge node (101) comprises:
• implementing a drift correction mechanism to compensate for statistical heterogeneity across edge nodes (101), wherein control variates are used to stabilize local updates by reducing gradient variance introduced by non-i.i.d. data distributions; and
• applying subspace-aware normalization to low-rank adapter updates before aggregation by the sparse adapter subspace unit (106), ensuring that the updates remain consistent across nodes (101) by aligning them within a common latent representation space derived from the adapter subspace.
[0053] In another embodiment, implementing the secure, serverless aggregation protocol comprises:
• splitting adapter updates using additive secret sharing at each edge node (101) to ensure that no single node possesses complete update information;
• exchanging secret shares with neighboring edge nodes (101) over encrypted peer-to-peer channels to preserve confidentiality and decentralization; and
• combining shares to reconstruct the aggregated adapter when a quorum of edge nodes (101) is reached, thereby enabling secure, privacy-preserving model synchronization without reliance on a central server, under the coordination of the secure aggregation unit (104) as part of the decentralized network managed by the network configuration unit (102).
[0054] In another embodiment, maintaining the sparse adapter subspace by the sparse adapter subspace unit (106) comprises:
• logically separating adapters corresponding to different tasks or modalities to preserve task-specific representations on each edge node (101);
• applying orthogonalization or sparsification techniques such as low-rank constraints or pruning within the sparse adapter subspace unit (106) to minimize interference between concurrently fine-tuned tasks and to promote parameter efficiency; and
• aggregating the task-specific or modality-specific adapters across the edge nodes (101) using the secure aggregation unit (104) and filter unit (105) prior to their integration into the global model, thereby ensuring coherent and non-destructive model updates in decentralized settings.
[0055] FIG. 1 illustrates a block diagram of a dynamic decentralized system (100) for adaptive fine-tuning of large language models. The system (100) includes a network configuration unit (102) connected to a plurality of edge nodes (101), which are distributed across a peer-to-peer topology. Each edge node (101) contains a training module (103) configured to perform local computations and fine-tune language models using locally available data. The system (100) further comprises a secure aggregation unit (104) and a filter unit (105), both operatively connected to the edge nodes (101) to manage and validate the flow of updates across the network. The secure aggregation unit (104) ensures privacy-preserving model update exchange using additive secret sharing or similar cryptographic techniques, while the filter unit (105) evaluates and selectively forwards updates based on trust metrics or performance thresholds. Both the secure aggregation unit (104) and the filter unit (105) are communicatively coupled with a sparse adapter subspace unit (106), which manages modular low-rank adapter updates and coordinates parameter-efficient fine-tuning strategies across the system. The network configuration unit (102) enables dynamic connectivity and orchestrates communication protocols between edge nodes (101), optimizing for latency, bandwidth, or node reliability. The components are logically arranged in a hierarchical yet decentralized architecture, with the network configuration unit (102) coordinating the edge nodes (101), which in turn interact with the secure aggregation unit (104), filter unit (105), and sparse adapter subspace unit (106). The connections between these components represent secure data exchange pathways and control signals essential for coordinated distributed training within the system (100).
[0056] The main objective of the present invention is to enable efficient, secure, and privacy preserving fine-tuning of large language models in a decentralized manner. The proposed system (100) facilitates collaborative model improvement across multiple edge nodes (101) or participants, without reliance on a centralized authority or the need to directly exchange raw or sensitive data. It incorporates mechanisms including secure aggregation via the secure aggregation unit (104), decentralized coordination through the network configuration unit (102), and local adaptation performed by the training module (103) at each edge node (101), to address challenges of data heterogeneity, communication overhead, and trust in distributed environments. The sparse adapter subspace unit (106) further supports modular learning across tasks and modalities. The system (100) is designed to maintain a balance between computational efficiency, model performance, data confidentiality, and system robustness in real-world decentralized fine-tuning scenarios.
[0057] FIG. 2 illustrates a schematic diagram of a decentralized peer-to-peer network for low-rank fine-tuning of large language models. The network includes a first edge node (1), a second edge node (2), a third edge node (3), a fourth edge node (4), and a fifth client node (5), all participating in collaborative model adaptation. Each edge node comprises a local adapter, which facilitates fine-tuning based on locally available data without sharing raw data externally. The network configuration unit (102) dynamically manages the topology, as depicted by both solid and dashed lines connecting the edge nodes (101). Solid lines may indicate active communication links over which adapter updates or model parameters are shared among the nodes, while dashed lines may represent potential or provisional connections that are established in subsequent training rounds as part of the adaptive reconfiguration process.
[0058] A dynamic topology adaptation process in the decentralized network is illustrated in FIG. 9. The process begins with network configuration initialization, wherein the network configuration unit (102) assesses current network conditions and assigns baseline parameters for communication and topology structure. This is followed by edge node setup using the edge nodes (101), where each node (101) registers with the system, performs identity verification, and shares relevant capability metadata such as compute availability and data domain with the network configuration unit (102). Initial peer connections are established by the network configuration unit (102) based on proximity, resource compatibility, or predefined heuristics. The dynamic topology adaptation section is subsequently triggered, starting with the selection of a peer adaptation method by the network configuration unit (102), which may involve criteria such as node churn, data similarity, or load balancing. Depending on the selected strategy, the process proceeds to either random peer assignment, wherein edge nodes (101) are reassigned peers stochastically to enhance robustness and exploration, or graph-based peer assignment, which constructs or adjusts peer relationships among edge nodes (101) using a connectivity graph optimized for factors such as communication efficiency, training loss convergence, or trust score propagation.
[0059] In some cases, the degree of each edge node (101) may vary dynamically based on computational capacity, memory, bandwidth, or reliability metrics. More powerful edge nodes (101) such as those with higher processing power, stable connectivity, or increased storage may connect to a greater number of peers, while resource-constrained or intermittently connected nodes (101) may maintain fewer connections. By continually reshaping the peer-to-peer topology, the dynamic decentralized system (100) mitigates the formation of clusters with stale or redundant local models, thereby potentially accelerating global model convergence. The dynamic graph adaptation may be achieved without transferring any raw training data across nodes (101). Instead, only low-rank adapter weights, generated by each training module (103) within the edge nodes (101), may be exchanged, preserving privacy and reducing bandwidth usage. This mechanism enhances communication efficiency by promoting effective mixing of model updates across diverse and heterogeneous network regions, which may result in improved model accuracy under limited or variable bandwidth conditions. The network configuration unit (102) may orchestrate this decentralized peer network within the dynamic decentralized system (100), intelligently reconfiguring edge node (101) connectivity to balance learning speed, resource consumption, and communication overhead. The dynamic rewiring may be informed by real-time network metrics such as latency or throughput, and by the individual performance profiles of each edge node (101). In some implementations, the network configuration unit (102) may update peer links adaptively at the beginning of each training round to optimize mixing and convergence across the dynamic decentralized system (100).
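For illustration only, the following Python sketch outlines one way the network configuration unit (102) could rewire the peer graph each round based on node capacity, update variance, and peer centrality. The use of the networkx library, the ring-based starting topology, and the specific weighting are assumptions of this sketch rather than a prescribed design.

```python
import random
import networkx as nx

def rewire_topology(node_capacity, update_variance, seed=None):
    """Rebuild the peer graph for one training round (illustrative).

    Each node's target degree scales with its reported capacity, and nodes
    whose local updates diverge most (high variance) are wired preferentially
    to high-centrality peers so that their updates mix quickly.
    """
    rng = random.Random(seed)
    nodes = list(node_capacity)
    graph = nx.Graph()
    graph.add_nodes_from(nodes)

    # Start from a ring so the graph remains connected.
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        graph.add_edge(a, b)

    for node in sorted(nodes, key=update_variance.get, reverse=True):
        centrality = nx.degree_centrality(graph)    # refreshed as edges are added
        extra_links = max(0, int(node_capacity[node]) - graph.degree[node])
        candidates = sorted(
            (p for p in nodes if p != node and not graph.has_edge(node, p)),
            key=lambda p: centrality[p] + 1e-6 * rng.random(),  # jitter breaks ties
            reverse=True,
        )
        for peer in candidates[:extra_links]:
            graph.add_edge(node, peer)
    return graph

# Five nodes with heterogeneous target degrees and drift levels.
topology = rewire_topology(
    node_capacity={"n1": 3, "n2": 2, "n3": 2, "n4": 1, "n5": 1},
    update_variance={"n1": 0.4, "n2": 0.1, "n3": 0.9, "n4": 0.2, "n5": 0.5},
    seed=7,
)
```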
[0060] An adaptive local update scheduling mechanism is illustrated in FIG. 4. The process begins with assessing available resources for each of the edge nodes (101) participating in the decentralized fine-tuning process. In some cases, the dynamic decentralized system (100), through the network configuration unit (102), may assess computational resources including CPU usage, memory availability, network bandwidth, and battery level for each of the edge nodes (101). These assessments may be performed periodically or on-demand, depending on the network conditions and task urgency. Based on the resource profile of each edge node (101), the training module (103) in each node may autonomously determine the optimal number of local fine-tuning epochs to perform before participating in communication or synchronization with the secure aggregation unit (104). For example, the first edge node (1) with high computational capacity and stable power may execute many local update steps between communication rounds, thereby reducing overhead. In contrast, the second edge node (2), which may be battery-constrained or experiencing poor connectivity, may perform fewer steps or potentially skip some communication rounds to conserve resources. The training module (103) may measure available compute cycles, memory headroom, and battery levels before initiating a training phase. Using this information, it may calculate a resource-aware and context-sensitive number of local epochs for fine-tuning. This heterogeneity-aware scheduling approach helps balance the training workload across the network and mitigates bottlenecks caused by slower or resource-constrained edge nodes (101), thereby enhancing the overall efficiency of the decentralized learning process coordinated by the network configuration unit (102). After computing the local adapter updates, each edge node (101) may determine an optimal transmission window based on its current state. For instance, the third edge node (3), facing intermittent connectivity or energy constraints, may choose to communicate only after completing multiple local epochs. Conversely, the fourth edge node (4), equipped with better connectivity and higher uptime, may send updates more frequently to the secure aggregation unit (104). This adaptive scheduling mechanism enables asynchronous participation and improves the robustness of the system (100) under diverse operating conditions.
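By way of non-limiting example, the following Python sketch shows how a training module (103) might translate a resource profile into a number of local epochs. The normalization budgets (8 GB of memory, 100 Mbps of bandwidth), the equal weighting of signals, and the epoch bounds are illustrative assumptions only.

```python
def choose_local_epochs(cpu_free, mem_free_gb, battery_level, bandwidth_mbps,
                        min_epochs=1, max_epochs=8):
    """Pick a number of local fine-tuning epochs from current resource headroom."""
    # Normalize each signal to [0, 1]; clamp to guard against bad readings.
    signals = [
        min(max(cpu_free, 0.0), 1.0),        # fraction of idle CPU
        min(mem_free_gb / 8.0, 1.0),         # memory headroom vs. an 8 GB budget
        min(max(battery_level, 0.0), 1.0),   # remaining battery fraction
        min(bandwidth_mbps / 100.0, 1.0),    # link quality vs. a 100 Mbps budget
    ]
    score = sum(signals) / len(signals)
    return max(min_epochs, round(min_epochs + score * (max_epochs - min_epochs)))

# A well-provisioned node trains longer between synchronization rounds,
# while a constrained node performs fewer local epochs.
print(choose_local_epochs(cpu_free=0.8, mem_free_gb=6, battery_level=0.9, bandwidth_mbps=80))
print(choose_local_epochs(cpu_free=0.2, mem_free_gb=1, battery_level=0.3, bandwidth_mbps=5))
```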
[0061] The dynamic decentralized system (100) may support semi-asynchronous operation, wherein not all edge nodes (101) are required to synchronize during every communication round. In scenarios where an edge node (101) experiences latency, temporary disconnection, or resource constraints, it may defer participation in global synchronization via the secure aggregation unit (104) without interrupting its local training process executed by the training module (103). The edge node (101) may independently continue fine-tuning its local model using the training module (103) and rejoin the synchronization process managed by the network configuration unit (102) in a subsequent round when it is ready. This operational flexibility enhances the system's (100) robustness and scalability by accommodating fluctuating network conditions, heterogeneous hardware capabilities, and variable availability across the distributed edge nodes (101).
[0062] The scheduling logic implemented by the training module (103) may be coordinated, wherein peer edge nodes (101) negotiate update schedules through a coordination protocol, or fully local, wherein each edge node (101) determines its update timing independently based on local resource availability, training progress, or network conditions. In either case, the global model may converge over time as long as updates are eventually propagated and aggregated across the network through the secure aggregation unit (104). This flexible scheduling mechanism supports asynchronous and heterogeneous learning environments, which are characteristic of decentralized systems.
[0063] By employing this adaptive scheduling approach, the dynamic decentralized system (100) may ensure that resource-constrained edge nodes (101) contribute to the training process without becoming overloaded. This may allow the overall training to proceed smoothly despite heterogeneity among the edge nodes (101) in terms of computational resources and network connectivity. The adaptive scheduler, implemented within the training module (103) of each edge node (101), may dynamically allocate training tasks based on real-time monitoring of each node’s CPU utilization, memory availability, battery level (for mobile nodes), and bandwidth status. Nodes with limited capacity may be assigned smaller model partitions, lower-frequency update cycles, or simpler computation tasks, thereby promoting balanced workload distribution and preventing system bottlenecks or node dropouts.
[0064] The dynamic decentralized system (100) may implement heterogeneity-aware correction of local adapter updates to mitigate client drift and improve convergence across non-identically distributed data settings. FIG. 3 illustrates a flowchart for heterogeneity-aware gradient correction in the decentralized network. Each edge node (101), through its training module (103), may apply a local correction term derived from global or neighbourhood statistics (which may be coordinated via the network configuration unit (102)), thereby aligning its updates with the overall training trajectory. This mechanism ensures that personalized updates remain compatible with aggregated model states managed by the secure aggregation unit (104), enabling stable and efficient fine-tuning in heterogeneous environments.
[0065] In some cases, the network configuration unit (102) may employ a control-variate method and a low-rank adaptation method to handle heterogeneity among the edge nodes (101), including variations in data distribution, computational resources, and communication bandwidth. These techniques assist in reducing variance and improving convergence across the decentralized network. The training module (103) in each of the edge nodes (101) may apply correction terms derived from the control variates to compensate for local model drift from the network average, thereby enhancing stability and accuracy during the aggregation of adapter updates processed by the secure aggregation unit (104).
[0066] FIG. 10 depicts a flowchart for a decentralized fine-tuning process, including a drift compensation module where the control-variate method may be applied. The control-variate method may employ drift correction to keep the training of the training module (103) in the edge nodes (101) on track, particularly in the presence of non-IID (not independent and identically distributed) data across nodes in the dynamic decentralized system (100). In some implementations, each of the edge nodes (101) may maintain a small drift vector shared among neighboring edge nodes (101) to implement corrections similar to the SCAFFOLD algorithm, thereby reducing client drift in local updates. When fusing low-rank adapter updates from different edge nodes (101), the training module (103) may perform a low-rank fusion step that accounts for system heterogeneity and local specialization. The low-rank adaptation method (1104) may apply subspace-aware normalization to account for the low-rank properties of the adapters and ensure alignment across learned representations. Since each edge node's (101) adapter may be represented as a low-rank matrix, the fusion process may project local updates into a common subspace or normalize them using shared basis representations, improving convergence stability and generalization.
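As a non-limiting sketch of the SCAFFOLD-style drift correction referenced above, the following Python code maintains a local control variate and applies the correction term (global minus local control variate) to each local gradient step. The class name, learning rate, and the assumption that per-step gradients are supplied as a list are illustrative only.

```python
import numpy as np

class DriftCorrectedTrainer:
    """SCAFFOLD-style control-variate correction for one edge node (illustrative)."""

    def __init__(self, dim, lr=0.01):
        self.lr = lr
        self.c_local = np.zeros(dim)   # this node's control variate

    def local_round(self, params, grads, c_global):
        """Run one local round; `grads` holds one gradient per local step."""
        start = params.copy()
        for g in grads:
            # Corrected step: gradient plus (global - local) control variate.
            params = params - self.lr * (g + c_global - self.c_local)
        # Refresh the local control variate from the total displacement
        # over this round (SCAFFOLD "option II" style update).
        steps = max(len(grads), 1)
        self.c_local = self.c_local - c_global + (start - params) / (steps * self.lr)
        return params

trainer = DriftCorrectedTrainer(dim=4)
new_params = trainer.local_round(
    params=np.zeros(4),
    grads=[np.random.randn(4) for _ in range(5)],
    c_global=np.zeros(4),
)
```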
[0067] In some cases, if an edge node’s (101) loss value is large, indicating a high gradient magnitude or potential divergence from the global model, then the corresponding adapter update from that edge node (101) may be scaled down by the training module (103) before being submitted to the secure aggregation unit (104), in order to stabilize convergence and prevent model drift. Alternatively, the edge nodes (101) may maintain an exponential moving average of their local adapter updates over time within the training module (103), using it to re-center or normalize subsequent updates. This normalization process ensures that individual updates remain consistent with historical behavior, thereby improving the robustness of the aggregation process within the secure aggregation unit (104) in the presence of noisy or heterogeneous data distributions.
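The loss-based scaling and moving-average re-centering described above may be expressed, purely for illustration, by the following Python sketch; the loss threshold and decay factor are assumed values, not prescribed parameters.

```python
import numpy as np

def stabilize_update(update, loss, ema, loss_threshold=2.0, ema_decay=0.9):
    """Stabilize a local adapter update before submission (illustrative).

    Updates from nodes with unusually large loss are scaled down, and every
    update is re-centered against an exponential moving average (EMA) of the
    node's own past updates so it stays consistent with historical behavior.
    Returns the stabilized update and the refreshed EMA.
    """
    if loss > loss_threshold:
        update = update * (loss_threshold / loss)   # damp potentially divergent nodes
    new_ema = ema_decay * ema + (1.0 - ema_decay) * update
    return update - new_ema, new_ema                # remove slow-moving bias

ema = np.zeros(8)
for step_loss in (0.8, 1.1, 5.0):                   # the last step looks divergent
    raw_update = np.random.randn(8) * step_loss
    stabilized, ema = stabilize_update(raw_update, step_loss, ema)
```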
[0068] The heterogeneity-aware fusion may be implemented by weighting updates proportional to factors such as local data size, statistical heterogeneity, or validation performance on held-out datasets. In some implementations, the principal components of adapter updates generated by the training module (103) in each edge node (101) may be aligned during the fusion process using dimensionality-reduction techniques such as Principal Component Analysis (PCA) or similar low-rank approximation methods. By treating the group of low-rank gradients, typically arising from adapter modules such as LoRA or other sparse fine-tuning mechanisms, as a fused object and applying heterogeneity mitigation techniques at the adapter level within the secure aggregation unit (104), the dynamic decentralized system (100) may achieve improved model performance and convergence under non-IID (not independent and identically distributed) data settings across the plurality of edge nodes (101).
[0069] The dynamic decentralized system (100) may implement a secure, serverless aggregation protocol through a secure aggregation unit (104) to preserve the privacy of adapter updates transmitted from the edge nodes (101). This protocol enables the aggregation of locally trained model updates using additive secret-sharing or homomorphic encryption mechanisms, thereby ensuring that no individual edge node (101)’s data is exposed during the aggregation process. To facilitate this, each edge node (101) performs cryptographic masking and coordinates with selected peers within the peer-to-peer network configured by the network configuration unit (102) for secure communication. FIG. 11 illustrates a flowchart for edge node (101) preparation and peer communication in the dynamic decentralized system (100), including key steps for secure aggregation handled by the secure aggregation unit (104).
[0070] In some cases, the secure aggregation unit (104) may encode adapter updates from the edge nodes (101) using additive secret sharing or homomorphic encryption techniques. This encoding ensures that individual adapter updates remain confidential and are never exposed in plaintext; only the aggregated result is revealed upon completion of secure aggregation. The secure aggregation unit (104) may enable each edge node (101) to partition its adapter parameters into multiple cryptographic shares, which are then distributed among a subset of peer edge nodes (101) in the network. For instance, a first edge node (1) (representing an instance of edge node (101)) may generate random masks or noise vectors, add them to its adapter update locally via its training module (103), and transmit the masked updates to neighboring edge nodes (101) over communication pathways coordinated by the network configuration unit (102). The corresponding mask components may be distributed separately such that, unless a predefined quorum or threshold of edge nodes (101) collaborates, the original adapter values cannot be reconstructed. Only when a sufficient number of shares are combined, such as during a secure aggregation round managed by the secure aggregation unit (104), do the masks cancel out, revealing the aggregate sum of adapter updates. The secure aggregation unit (104) may thus facilitate a fully decentralized, serverless aggregation mechanism that eliminates the need for a central coordinating server. In such embodiments, the edge nodes (101) may participate in secure multi-party computation (MPC) protocols, possibly organizing themselves into a cryptographic spanning tree or clique topology using communication routing orchestrated by the network configuration unit (102) to efficiently compute the global sum of adapters without compromising individual contributions. This fully decentralized protocol preserves end-to-end privacy of the training data without relying on any central server.
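A minimal Python sketch of the additive secret-sharing step is given below for illustration; it assumes adapter updates are flattened NumPy arrays and that every node exchanges shares with every other node, which is a simplification of the peer selection performed by the network configuration unit (102).

```python
import numpy as np

def make_shares(update, num_peers, rng):
    """Split an adapter update into additive secret shares.

    The first (num_peers - 1) shares are random masks; the last share is chosen
    so that all shares sum exactly to the original update, so no single share
    reveals anything about the update.
    """
    masks = [rng.uniform(-1.0, 1.0, size=update.shape) for _ in range(num_peers - 1)]
    return masks + [update - sum(masks)]

rng = np.random.default_rng(0)
updates = [rng.normal(0.0, 0.05, size=6) for _ in range(3)]   # three edge nodes

# Each node splits its update and sends one share to every participant;
# each participant sums the shares it receives into a partial aggregate.
shares = [make_shares(u, num_peers=3, rng=rng) for u in updates]
partial_sums = [sum(shares[node][peer] for node in range(3)) for peer in range(3)]

# Combining the partial aggregates reconstructs only the sum of all updates,
# never any individual node's contribution.
aggregate = sum(partial_sums)
assert np.allclose(aggregate, sum(updates))
```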
[0071] The protocol implemented by the secure aggregation unit (104) may guarantee that, as long as a threshold number of honest edge nodes (101) participate in the aggregation process, a global aggregated adapter can be correctly and securely computed. Each edge node's (101) individual adapter remains encrypted and undisclosed to other edge nodes (101) or the secure aggregation unit (104). This approach may extend secure aggregation techniques commonly used in federated learning to a fully decentralized, peer-to-peer network (coordinated by the network configuration unit (102)), eliminating reliance on a central server and enhancing resilience against node compromise or data leakage. Furthermore, the aggregation protocol may be designed to support additive secret sharing or homomorphic encryption to ensure that model updates are combined securely without revealing any raw data or intermediate representations.
[0072] In practice, the secure aggregation unit (104) may coordinate the following sequence of operations to ensure privacy-preserving model updates during decentralized fine-tuning:
1. Each edge node (101) masks its locally updated adapter parameters using a secret-sharing or additive noise mechanism and sends the masked data to a designated subset of neighboring edge nodes (101) within the peer-to-peer network managed by the network configuration unit (102).
2. The receiving edge nodes (101) participate in a secure exchange protocol, wherein they compute and share intermediate partial aggregates with their respective peers under coordination by the secure aggregation unit (104).
3. Through multiple iterative rounds of peer communication and aggregation, each participating edge node (101) computes a unique fragment representing a share of the total masked sum.
4. Finally, the true global aggregate of adapter updates is reconstructed by securely combining the individual fragments collected from multiple edge nodes (101), under the control of the secure aggregation unit (104), without exposing any single node's raw update.
[0073] As shown in FIG. 11, the process may include a "Distribute Shares to Peers" step, wherein each participating edge node (101) generates secret shares of its local model update and securely distributes them to a subset of peer edge nodes (101) in the network. This is followed by the "Send Masked Net Updates" step, in which the edge nodes (101) compute and transmit masked updates that preserve data privacy. Subsequently, during the "Exchange Partial Aggregates" step, the edge nodes (101) collaboratively compute intermediate aggregated values using the received shares. The "Multiple Rounds of Exchange" step indicates that this secure aggregation process, coordinated by the secure aggregation unit (104), may involve iterative communication rounds among the edge nodes (101) to refine the aggregation results and mitigate straggler effects or dropout scenarios, thereby ensuring robustness and consistency in the final model update.
[0074] The secure aggregation unit (104) may maintain the privacy of each edge node's (101) LoRA update even if some peer edge nodes (101) are curious or passively malicious, as individual adapters remain masked throughout the aggregation process. This ensures that no single edge node (101) can reconstruct another’s local model update. In some implementations, the secure aggregation unit (104) may incorporate differential privacy noise prior to the masking step to provide additional protection against inference attacks, thereby enhancing the overall robustness of the decentralized fine-tuning process. The combination of additive masking and differential privacy helps to preserve confidentiality while enabling collaborative learning across a heterogeneous and potentially semi-trusted network of edge nodes (101).
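For illustration, the differential-privacy noise mentioned above could be injected before masking roughly as follows; the clipping norm, noise multiplier, and Gaussian mechanism are assumptions of this sketch, and formal privacy accounting is not addressed here.

```python
import numpy as np

def add_gaussian_dp_noise(update, clip_norm=1.0, noise_multiplier=0.5, rng=None):
    """Clip an adapter update and add Gaussian noise prior to secret-sharing.

    Clipping bounds each node's contribution; the added noise, scaled by
    `noise_multiplier * clip_norm`, limits what can be inferred about any
    single node from the reconstructed aggregate.
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    if norm > clip_norm:
        update = update * (clip_norm / norm)
    return update + rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)

noisy_update = add_gaussian_dp_noise(np.random.randn(8) * 0.1)
```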
[0075] By implementing this secure serverless aggregation protocol, the secure aggregation unit (104) may enable adapter aggregation without any single party having full control over the process, ensuring both decentralization and confidentiality in the dynamic decentralized system (100). This protocol operates through additive secret sharing among the edge nodes (101), wherein locally trained adapter updates generated by the training modules (103) within the edge nodes (101) are securely combined in a distributed manner, mitigating risks of data leakage and ensuring that no individual node or intermediary gains access to the complete model or training data.
[0076] In order to further improve aggregation efficiency and reduce the impact of slow or resource‑constrained clients (“stragglers”), the edge network nodes (101) may be grouped into one or more clusters based on data similarity or network proximity. Prior to each secure aggregation round, each node (101) may compute a low‑dimensional feature vector (e.g., principal component projections of its local data distribution, or network latency/ping statistics) that characterizes its local context. This vector is then broadcast to its immediate peers in the overlay network via the network configuration unit (102). The received vectors may be used by the network configuration unit (102) to dynamically form or update clusters in a decentralized manner, allowing aggregation to proceed preferentially within groups of well-matched nodes before global model synchronization through the secure aggregation unit (104).
[0077] The clustering unit (107), which may be implemented within the network configuration unit (102) or as a standalone software module (and can operate in a fully decentralized fashion via a round-robin leader election), collects these feature vectors from each edge node (101) and applies a decentralized clustering algorithm (for example, k-means, hierarchical agglomerative clustering, or graph partitioning based on network adjacency metrics) to partition the edge nodes (101) into disjoint clusters C₁…Ck. Cluster size and membership are dynamically adapted based on overall network size, variance in feature distributions generated by the training modules (103), and the topology of the peer-to-peer communication graph configured by the network configuration unit (102), ensuring balanced clusters and minimized intra-cluster overhead.
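A simple, non-limiting Python sketch of the clustering step is shown below; it uses a plain k-means routine over per-node feature vectors, whereas the clustering unit (107) may equally apply hierarchical or graph-partitioning methods. The feature dimensionality and number of clusters are illustrative assumptions.

```python
import numpy as np

def cluster_nodes(feature_vectors, k, iterations=20, seed=0):
    """Group edge nodes with simple k-means over their feature vectors.

    `feature_vectors` is an (n_nodes, d) array of low-dimensional descriptors,
    e.g. PCA projections of local data statistics or latency measurements.
    Returns a cluster index for every node.
    """
    rng = np.random.default_rng(seed)
    centers = feature_vectors[rng.choice(len(feature_vectors), size=k, replace=False)]
    for _ in range(iterations):
        distances = np.linalg.norm(feature_vectors[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        for j in range(k):
            members = feature_vectors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

# Ten nodes described by two latency-derived features, grouped into three clusters.
features = np.random.default_rng(1).normal(size=(10, 2))
cluster_of = cluster_nodes(features, k=3)
```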
[0078] During secure aggregation, local updates from the edge nodes (101) are first combined within each cluster using the secret-sharing/encryption protocol described in [0061]-[0066], thereby preserving privacy even within the cluster. Designated cluster heads, which may be specific edge nodes (101) or dynamically elected peer aggregators, then exchange per-cluster aggregates and perform a secondary, global aggregation using the secure aggregation unit (104) to produce the final LoRA-adapter update (1104). This hierarchical approach reduces end-to-end latency by localizing the effects of straggler nodes within their respective clusters.
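The two-stage aggregation may be summarized, for illustration only, by the following Python sketch in which plain sums stand in for the securely reconstructed per-cluster and global results; in the actual system each stage would run under the secret-sharing protocol coordinated by the secure aggregation unit (104).

```python
import numpy as np

def two_tier_aggregate(updates, cluster_of):
    """Sum updates within each cluster, then combine cluster sums globally."""
    updates = np.asarray(updates)
    cluster_of = np.asarray(cluster_of)
    # Stage 1: per-cluster aggregation (performed by cluster members).
    cluster_sums = {c: updates[cluster_of == c].sum(axis=0) for c in np.unique(cluster_of)}
    # Stage 2: cluster heads exchange and combine the per-cluster aggregates.
    global_sum = sum(cluster_sums.values())
    return global_sum / len(updates)   # average over all participating nodes

node_updates = [np.random.randn(4) * 0.01 for _ in range(6)]
global_update = two_tier_aggregate(node_updates, cluster_of=[0, 0, 1, 1, 2, 2])
```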
[0079] Optionally, clusters may be re-formed every T rounds, where T is a configurable parameter, or earlier if a continuously monitored “cluster-health” metric, computed by the network configuration unit (102) from intra-cluster variance of feature vectors and observed aggregation delays, exceeds a predefined threshold. This reconfiguration process ensures that the clustering of edge nodes (101) remains optimal, thereby maintaining efficient training and communication performance across the dynamic decentralized system (100).
[0080] The training module (103) in the dynamic decentralized system (100) may implement a rank-adaptive low-rank adaptation method to efficiently fine-tune large language models on resource-constrained edge nodes (101). This method enables dynamic adjustment of the rank of adapter modules (1104) during the training process, allowing for optimization of both model performance and computational overhead based on local data complexity and system constraints. The rank is not fixed beforehand but adaptively updated in response to training feedback signals, such as gradient norms or performance plateaus. This approach allows each edge node (101) to adjust its fine-tuning capacity autonomously. FIG. 6 illustrates a flowchart for a rank-adaptive low-rank adaptation method training process.
[0081] In some cases, the training module (103) of an edge node (101) may initialize adapter matrices for an adapter (1104) with a low initial rank to minimize computational and memory overhead, especially in resource-constrained edge environments. The adapter (1104) may be implemented using techniques such as Low-Rank Adaptation (LoRA) or similar lightweight parameter-efficient fine-tuning methods. The process may then enter a training loop where the adapter matrices are iteratively updated using locally available data at each edge node (101), allowing personalized or domain adapted learning while preserving the core model parameters.
[0082] During training, the training module (103) in each edge node (101) may monitor convergence behavior and resource utilization metrics, such as memory usage, processing load, and communication overhead. If the local loss or validation metric for an edge node (101) plateaus or stops improving beyond a defined threshold, the training module (103) may dynamically increase the rank of the adapter (1104) to enhance model capacity and enable finer gradient updates. Conversely, if network communication costs or computational constraints become prohibitive, particularly in bandwidth-limited or battery-sensitive edge environments, then the training module (103) may reduce the rank of the adapter (1104) to lower synchronization and processing overhead. In such cases, the adapter (1104) may be retrained with the reduced rank to maintain acceptable performance levels while optimizing resource efficiency.
[0083] The flowchart in FIG. 6 demonstrates a decision point to determine whether rank adjustment is required based on training dynamics, such as convergence rate or validation loss trends. If adjustment is needed, the process proceeds to increase or decrease the rank r, enabling the system to adaptively allocate model capacity. This is followed by updating the low-rank decomposition of the adapter module (1104) within the training module (103) of the respective edge node (101), ensuring efficient parameterization. If no adjustment is needed, or after completing the rank adjustment, the process resumes the local training loop on the edge node (101).
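One possible form of the rank-adjustment decision point of FIG. 6 is sketched below; the plateau tolerance, growth/shrink factors, and rank bounds are illustrative assumptions rather than prescribed values.

```python
def adapt_rank(rank, loss_history, comm_budget_exceeded,
               plateau_eps=1e-3, r_min=2, r_max=64):
    """Decide whether to grow or shrink the LoRA rank based on training feedback."""
    if comm_budget_exceeded and rank > r_min:
        # shrink under bandwidth or battery pressure to cut synchronization cost
        return max(r_min, rank // 2)
    if len(loss_history) >= 3:
        recent = loss_history[-3:]
        if max(recent) - min(recent) < plateau_eps and rank < r_max:
            # grow when the local validation loss has plateaued
            return min(r_max, rank * 2)
    return rank

# e.g., a plateaued loss with spare bandwidth doubles the rank from 8 to 16
print(adapt_rank(8, [0.4120, 0.4115, 0.4112], comm_budget_exceeded=False))
```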
[0084] In some implementations, the rank may be adjusted globally by consensus among the edge nodes (101) via their respective training modules (103). After several rounds of training, the edge nodes (101), coordinated through the network configuration unit (102), may agree to update the target rank of their respective adapters (1104) based on aggregate metrics across the network, such as validation loss, convergence rate, or communication efficiency. This dynamic adjustment ensures that the low-rank adaptation performed by each training module (103) remains optimal with respect to the evolving characteristics of the distributed data and network conditions, thereby maintaining model accuracy while minimizing computational and bandwidth overhead.
[0085] The training module (103) may implement coordination mechanisms to manage rank changes across the edge nodes (101) during adapter fine-tuning. This coordination ensures consistency in the low-rank adaptation process across the decentralized network. In some implementations, the edge nodes (101) may perform a Riemannian gradient step to maintain or enforce a low-rank structure in the adapter matrices (1104). Alternatively, peer edge nodes (101) may exchange singular values of their respective adapters (1104) to enable synchronized re-basing of matrix decompositions. When a rank increase is triggered either locally by the training module (103) or via consensus among multiple edge nodes (101), the newly added dimensions of the adapter matrices (1104) may be initialized in a mathematically consistent and informed manner, such as by extrapolating from previous low-rank factors, to ensure backward compatibility and training stability. This rank-adaptive mechanism enables each adapter (1104) within an edge node (101) to dynamically allocate model capacity proportional to the complexity of its respective local task. For instance, the first edge node (1), encountering relatively simple or homogeneous data, may employ a low-rank adapter (1104), while a second edge node (2), tasked with learning from more complex, heterogeneous, or noisy data, may leverage a higher-rank adapter (1104) to better capture relevant patterns.
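The following sketch illustrates one mathematically consistent way of adding dimensions to existing LoRA factors when a rank increase is triggered: the new rows of the down-projection receive a small random initialization while the matching columns of the up-projection start at zero, so the adapter's output is unchanged at the moment of expansion. The factor shapes and initialization scale are assumptions; the singular-value re-basing variant mentioned above is not shown.

```python
import numpy as np

def grow_lora_rank(A, B, new_rank, init_scale=1e-3, seed=0):
    """Expand LoRA factors A (r x d_in) and B (d_out x r) to a higher rank.
    Existing factors are kept; new rows of A get a small random init and the
    matching columns of B start at zero, so B @ A is unchanged immediately
    after the expansion (backward compatibility and training stability)."""
    r, d_in = A.shape
    d_out = B.shape[0]
    rng = np.random.default_rng(seed)
    A_new = np.vstack([A, init_scale * rng.normal(size=(new_rank - r, d_in))])
    B_new = np.hstack([B, np.zeros((d_out, new_rank - r))])
    assert np.allclose(B_new @ A_new, B @ A)   # adapter output preserved at expansion time
    return A_new, B_new

# e.g., growing a rank-4 adapter for a 16-to-8 projection up to rank 8
A = np.random.default_rng(2).normal(size=(4, 16))
B = np.random.default_rng(3).normal(size=(8, 4))
A2, B2 = grow_lora_rank(A, B, new_rank=8)
```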
[0086] The training module (103) of each edge node (101) may accommodate the varying amounts of data transmitted by edge nodes (101) with different adapter ranks through the adaptive scheduling mechanism implemented by the network configuration unit (102). This mechanism may assess the computational capability, network bandwidth, and availability of each edge node (101) to allocate tasks accordingly. By dynamically adjusting the adapter rank and associated workload via the rank-adaptive low-rank adaptation method executed within the training module (103), the approach may improve training efficiency by tailoring model complexity to the resource profile of each edge node (101), thereby enhancing overall performance across the decentralized network.
[0087] The dynamic decentralized system (100) may implement robust aggregation and filtering mechanisms through a filter unit (105) to defend against malicious or malfunctioning edge nodes (101). The filter unit (105) may evaluate updates received from individual edge nodes (101) via their respective training modules (103) based on statistical deviation, trust scores, or cryptographic proof mechanisms, thereby ensuring only high-quality and trustworthy model updates are aggregated by the secure aggregation unit (104). This safeguards the integrity of the global model managed through the collaborative functioning of the network configuration unit (102) and enhances the resilience of the system (100) against poisoning attacks or faulty data contributions. FIG. 7 illustrates a flowchart for robust aggregation in the decentralized system (100).
[0088] In some cases, before computing a local aggregate, the filter unit (105) may examine incoming updates from neighboring edge nodes (101) to detect and eliminate outlier contributions that may degrade the quality of model convergence. The filter unit (105) may apply simple or advanced statistical rules to identify and remove extreme values from adapter updates received through the secure aggregation unit (104). For example, the filter unit (105) may drop updates whose entries significantly exceed a median-based threshold, thereby mitigating the influence of anomalous or potentially malicious edge nodes (101) while preserving the integrity of decentralized training managed under the coordination of the network configuration unit (102).
[0089] The filter unit (105) may compute the coordinate-wise median of the received adapter vectors submitted by the plurality of edge nodes (101) during the decentralized fine-tuning process managed by the network configuration unit (102). This operation serves as a robust aggregation technique that mitigates the influence of anomalous or potentially malicious updates generated by individual training modules (103) within the edge nodes (101). In some implementations, the filter unit (105) may further discard any adapter update whose distance from the computed median exceeds a predefined threshold, indicative of statistical deviation or inconsistency. This coordinate-wise median method may exclude up to half of the outliers in a given update round, thereby enhancing the integrity of the aggregation process and potentially ensuring that the remaining updates originate from honest peer edge nodes (101) operating within expected behavioral norms, prior to being processed by the secure aggregation unit (104).
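A compact sketch of the coordinate-wise median filtering step is given below; the distance threshold is an assumed tuning parameter rather than a value prescribed by the specification.

```python
import numpy as np

def median_filter(adapter_updates, distance_threshold):
    """Coordinate-wise median of peer adapter updates, then discard any update
    whose distance from the median exceeds the threshold."""
    U = np.asarray(adapter_updates, dtype=float)   # shape: (num_peers, num_params)
    median = np.median(U, axis=0)                  # robust coordinate-wise aggregate
    distances = np.linalg.norm(U - median, axis=1)
    kept = U[distances <= distance_threshold]      # updates consistent with the median
    return median, kept

# toy example: the third update is an obvious outlier and is dropped
updates = [np.ones(3), 1.1 * np.ones(3), 50.0 * np.ones(3)]
median, kept = median_filter(updates, distance_threshold=5.0)
print(median, len(kept))
```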
[0090] In some cases, the filter unit (105) may implement more advanced filtering methods to enhance robustness against malicious or low-quality updates. For instance, the filter unit (105) may apply a variant of the Krum algorithm, a known Byzantine-resilient aggregation technique. In this approach, the filter unit (105) may compute pairwise Euclidean distances between model updates received from the edge nodes (101), each generated by their respective training modules (103). It then identifies the update whose k nearest neighbors are closest in terms of aggregate distance, indicating higher consistency with the majority of nodes. This selected update is considered statistically reliable and may be forwarded for secure aggregation by the secure aggregation unit (104), under the coordination of the network configuration unit (102), while others potentially representing outliers or adversarial inputs may be discarded or downweighted. This mechanism ensures that the decentralized training process remains robust in the presence of heterogeneous, noisy, or even adversarial contributions within the peer-to-peer network of edge nodes (101).
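The Krum-style selection described above may be sketched as follows, with k an assumed neighbourhood-size parameter; variants that downweight rather than discard the remaining updates are equally possible.

```python
import numpy as np

def krum_select(updates, k):
    """Krum-style selection: score each update by the summed Euclidean distance
    to its k nearest peer updates and return the lowest-scoring (most consistent) one."""
    U = np.asarray(updates, dtype=float)
    n = len(U)
    dists = np.linalg.norm(U[:, None] - U[None, :], axis=2)   # pairwise distances
    scores = []
    for i in range(n):
        nearest = np.sort(np.delete(dists[i], i))[:k]         # k closest other updates
        scores.append(nearest.sum())
    return U[int(np.argmin(scores))]
```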
[0091] The filter unit (105) may operate in a decentralized context wherein the plurality of edge nodes (101) collaboratively execute filtering algorithms without reliance on a central coordinator, such as the network configuration unit (102). Each edge node (101) may perform local selection of model updates based on predefined criteria such as update quality, anomaly detection, or gradient consistency. Following local selection, a consensus mechanism such as majority voting, weighted voting, or iterative rounds of validation may be employed to determine the global set of updates to be accepted. This decentralized filtering by the filter unit (105) ensures robustness against malicious or low-quality updates while maintaining system scalability and fault tolerance.
[0092] By filtering updates, the filter unit (105) may make the dynamic decentralized system (100) resilient to a bounded number of Byzantine edge nodes (101). Even if some edge nodes (101) send maliciously poisoned adapter values, such as adversarially modified low-rank updates or anomalous gradients, the filtering mechanism within the filter unit (105) may detect and exclude such updates based on statistical deviation, similarity metrics, or reputation scores. This selective inclusion may prevent poisoned inputs from biasing the global model during secure aggregation performed by the secure aggregation unit (104) and maintain the overall robustness and integrity of the fine-tuning process in the system (100).
[0093] In some implementations, the filter unit (105) may incorporate anomaly detection mechanisms to identify peer edge nodes (101) that persistently deviate from expected model update patterns, communication protocols, or resource behavior. Such deviations may indicate faults, adversarial behavior, or hardware constraints. Upon detection, the anomalous edge nodes (101) may be temporarily excluded from the dynamic peer-to-peer topology maintained by the network configuration unit (102) to maintain overall system integrity and performance.
[0094] The combination of secure aggregation through the secure aggregation unit (104) and robust filtering via the filter unit (105) may help maintain both privacy and integrity in the dynamic decentralized system (100). The secure aggregation unit (104) ensures that individual model updates from the plurality of edge nodes (101), each comprising a training module (103), remain confidential during collective processing, while the filter unit (105), operating in coordination with the sparse adapter subspace unit (106), detects and excludes anomalous or potentially malicious contributions before they influence the global model state. The coordination of these components is orchestrated by the network configuration unit (102), which ensures secure and efficient communication pathways among the edge nodes (101). Together, these components enhance trustworthiness and resilience in the system’s collaborative fine-tuning process.
[0095] The dynamic decentralized system (100) may extend its capabilities to support multi-modal models and multi-task learning through the sparse adapter subspace unit (106), which enables parameter-efficient adaptation across diverse tasks and input modalities. Each edge node (101) may access and contribute to these sparse subspaces during training and synchronization. FIG. 8 illustrates a flowchart showing the extension to multi-task learning and processing in the decentralized system (100), wherein the sparse adapters managed by the sparse adapter subspace unit (106) dynamically allocate and manage subspaces corresponding to different tasks or modalities across participating edge nodes (101), thereby enhancing generalization and system-wide adaptability.
[0096] In some cases, the sparse adapter subspace unit (106) may manage adapters for different modalities and tasks to prevent interference between them during fine-tuning. This is particularly beneficial in heterogeneous training environments where multiple edge nodes (101) handle diverse data types such as text, audio, image, or sensor signals. For multi-modal scenarios, the sparse adapter subspace unit (106) may logically separate and maintain distinct adapter subspaces for each modality to preserve modality-specific features and optimize task performance. These modality-specific adapters may be locally trained by the training module (103) within each edge node (101) and subsequently synchronized through the secure aggregation unit (104) and filtered by the filter unit (105) if necessary. The flowchart in FIG. 8 illustrates this with a "Separate Adapters per Modality" step within the Multi-Modal Scenario path, indicating that adapter routing and selection are dynamically adjusted based on input modality.
[0097] The sparse adapter subspace unit (106) may enable the edge nodes (101) to collaboratively fine-tune large language models across heterogeneous data types and modalities (e.g., text, speech, images) without causing interference between task-specific or modality-specific knowledge representations. In some implementations, the sparse adapter subspace unit (106) may maintain modular lightweight adapters within a shared low-rank subspace and dynamically allocate or isolate them per modality for each edge node (101). It may also synchronize and aggregate these adapters across participating edge nodes (101) in a modality-aware manner, with coordination optionally facilitated by the network configuration unit (102). This synchronization and aggregation process may be represented by the “Synchronise & Aggregate per Modality” step shown in FIG. 8, ensuring that the decentralized model evolution remains stable and efficient across varying input domains.
[0098] For multi-task learning scenarios, the sparse adapter subspace unit (106) may employ orthogonalization and sparsification of adapters to reduce cross-task interference and enhance task-specific generalization. This enables the dynamic decentralized system (100) to manage multiple tasks concurrently while preserving performance isolation between them. The edge nodes (101) utilize this functionality during local training and aggregation. FIG. 8 depicts this approach in the Multi-Task Scenario path, which includes steps for "Separate Adapters per Task" and "Orthogonal or Sparse Adapter Design."
[0099] In some cases, the sparse adapter subspace unit (106) may fix one projection matrix, typically the down-projection, as a random orthonormal basis that is pre-generated or shared across tasks, while only training the complementary up-projection matrix under a task-specific sparsity mask. This approach enables efficient parameter isolation, ensuring that adapters for different tasks are constrained to occupy orthogonal subspaces within the shared parameter space. As a result, it potentially reduces cross-task interference during adapter merging, thereby enhancing modularity and facilitating continual learning across heterogeneous edge nodes (101) participating in the decentralized system (100), under the coordination of the network configuration unit (102).
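The following sketch illustrates this construction under assumed shapes: a frozen random orthonormal down-projection shared across tasks, and a trainable up-projection whose gradient updates are confined to a task-specific sparsity mask.

```python
import numpy as np

def make_sparse_task_adapter(d_in, d_out, rank, task_mask, seed=0):
    """Fixed random orthonormal down-projection shared across tasks, plus a
    trainable up-projection constrained by a task-specific sparsity mask."""
    rng = np.random.default_rng(seed)
    # QR factorization yields orthonormal columns for the frozen down-projection
    q, _ = np.linalg.qr(rng.normal(size=(d_in, rank)))
    down = q.T                                  # (rank, d_in), rows orthonormal, not trained
    up = np.zeros((d_out, rank))                # trainable, masked per task
    def apply_grad(grad):
        # zero out gradient entries outside this task's allotted subspace
        return grad * task_mask
    return down, up, apply_grad

# e.g., a 20%-dense mask selecting this task's portion of the up-projection
mask = (np.random.default_rng(4).random((32, 8)) < 0.2).astype(float)
down, up, apply_grad = make_sparse_task_adapter(d_in=64, d_out=32, rank=8, task_mask=mask)
```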
[0100] The sparse adapter subspace unit (106) may implement a tagging system for adapters to enable modular and task-specific fine-tuning across heterogeneous edge nodes (101). As shown in FIG. 8, the process may include steps to “Tag Adapters by Modality or Task” and “Average Adapters by Modality or Task.” Each adapter generated during local training by a respective training module (103) within an edge node (101) may be tagged with metadata indicating its associated modality (e.g., text, speech, image) or downstream task (e.g., sentiment analysis, question answering). This tagging system may allow the dynamic decentralized system (100) to manage, isolate, and selectively aggregate adapters via coordination between the sparse adapter subspace unit (106), secure aggregation unit (104), and filter unit (105) based on shared characteristics, thereby improving task-aligned model generalization while preserving the decentralized nature of the system.
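A minimal sketch of tag-based grouping and per-tag averaging follows; the tag strings and toy updates are purely illustrative.

```python
from collections import defaultdict
import numpy as np

def average_adapters_by_tag(tagged_adapters):
    """Group adapter updates by their modality or task tag and average each group,
    so that aggregation only mixes adapters sharing the same tag."""
    groups = defaultdict(list)
    for tag, adapter in tagged_adapters:        # e.g., ("text", np.ndarray), ("image", ...)
        groups[tag].append(adapter)
    return {tag: np.mean(adapters, axis=0) for tag, adapters in groups.items()}

updates = [("text", np.ones(4)), ("text", 3 * np.ones(4)), ("image", np.zeros(4))]
print(average_adapters_by_tag(updates))         # {'text': [2,2,2,2], 'image': [0,0,0,0]}
```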
[0101] In some implementations, the sparse adapter subspace unit (106) may support continual learning by enabling the model to incorporate knowledge from new tasks without catastrophically forgetting previously learned information. As illustrated in FIG. 8, the Continual Learning path includes steps such as “Zero-Out or Freeze Old Task Elements,” which prevent updates to previously critical parameters, and the use of “Sparsity Adapters for Prior Task Retention,” which allocate sparse, task-specific subspaces within the model. This architectural strategy may allow the dynamic decentralized system (100) to efficiently adapt to evolving user data or domain-specific tasks across distributed edge nodes (101), while maintaining robustness and accuracy on earlier tasks. Additionally, this mechanism implemented by the sparse adapter subspace unit (106) promotes parameter reuse and reduces the risk of interference between tasks, which is particularly important in resource-constrained decentralized environments comprising heterogeneous edge nodes (101).
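A minimal sketch of the "zero-out or freeze old task elements" idea is given below, under the assumption that parameters reserved for earlier tasks are recorded in a binary mask; the learning rate and mask shape are illustrative.

```python
import numpy as np

def continual_update(params, grad, old_task_mask, lr=1e-3):
    """Gradient step that freezes parameters reserved for previously learned tasks:
    mask entries of 1 are left untouched, entries of 0 remain trainable."""
    return params - lr * grad * (1.0 - old_task_mask)

p = np.ones(4)
g = np.full(4, 10.0)
m = np.array([1.0, 1.0, 0.0, 0.0])     # first two parameters belong to an old task
print(continual_update(p, g, m))        # only the last two entries change
```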
[0102] The sparse adapter subspace unit (106) may enable cross-modal consistency by facilitating occasional cross-view communication across heterogeneous training modalities. In some cases, the first edge node (1), which is part of the plurality of edge nodes (101) and is engaged in training on visual data, may share its learned sparse adapters with the second edge node (2), also part of the plurality of edge nodes (101), which may be training on textual data. This sharing of modality-specific adapters may allow the receiving edge node (2) to benefit from cross-modal inductive biases without directly accessing the raw data, thereby preserving privacy. Such sharing may occur using the same secure, serverless communication method implemented by the secure aggregation unit (104), ensuring confidentiality and integrity during transmission. This mechanism helps align representations across modalities while maintaining decentralized training constraints as orchestrated by the network configuration unit (102).
[0103] For scenarios where multiple edge nodes (101) train on different tasks or domain-specific datasets, the sparse adapter subspace unit (106) may allow sharing of only the overlapping or task-agnostic components of the adapters, thereby preserving specialization while enabling cross-task generalization. Alternatively, the sparse adapter subspace unit (106), in coordination with the training module (103) and the secure aggregation unit (104), may implement knowledge distillation techniques to extract and merge distilled representations from task-specific adapters, facilitating the integration of diverse learnings into a unified adapter subspace without requiring direct parameter sharing between the edge nodes (101).
[0104] The sparse adapter subspace unit (106) may manage the aggregation of adapters for multi-task scenarios, enabling efficient knowledge sharing across heterogeneous tasks within the dynamic decentralized system (100). In some implementations, the third edge node (3) and the fourth edge node (4), both instances of the plurality of edge nodes (101), may compute task-specific averages of their respective adapter matrices, maintaining isolation between task domains. The sparse adapter subspace unit (106), in coordination with the secure aggregation unit (104) and the filter unit (105), may apply task similarity weightings during this aggregation process to emphasize contributions from edge nodes (101) handling semantically similar tasks, thereby improving generalization while preserving task-specific nuances.
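One illustrative form of task-similarity weighting is sketched below, using cosine similarity between assumed per-node task descriptors; the half-and-half blend with the local update is an arbitrary illustrative choice rather than part of the specification.

```python
import numpy as np

def similarity_weighted_average(own_update, peer_updates, peer_task_embeddings, own_task_embedding):
    """Weight peer adapter contributions by cosine similarity between task descriptors,
    so semantically similar tasks contribute more to the merged adapter."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    weights = np.array([max(cos(own_task_embedding, e), 0.0) for e in peer_task_embeddings])
    if weights.sum() == 0:
        return own_update                       # no similar peers: keep the local adapter
    weights /= weights.sum()
    blended = sum(w * u for w, u in zip(weights, peer_updates))
    return 0.5 * own_update + 0.5 * blended     # keep half the weight on the local task
```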
[0105] By implementing these techniques, the sparse adapter subspace unit (106) may extend the capabilities of the dynamic decentralized system (100) to handle multi-modal and multi-task learning scenarios, such as simultaneous fine-tuning on text, image, and structured data modalities or adapting to diverse task objectives across domains. This extension enables efficient parameter sharing and modular adaptation across the edge nodes (101), while maintaining the core principles of adaptivity, security, and robustness in the decentralized fine-tuning process enabled by the coordinated interaction of the network configuration unit (102), training modules (103), secure aggregation unit (104), filter unit (105), and sparse adapter subspace unit (106).
[0106] FIG. 9 illustrates a flowchart for a dynamic topology adaptation process in a decentralized network. The process begins with network initialization, followed by client nodes setup and initial peer connections establishment. These initial steps prepare the network for distributed operation. The process then enters a dynamic topology adaptation phase, which branches into two parallel paths. One path implements random peer assignment, while the other implements graph-based peer assignment methods. The random peer assignment path allows for stochastic network connections, while the graph-based approach uses network metrics to determine optimal peer relationships.
[0107] Following the peer assignment, the process moves to steps for assigning nodes based on their degree and power. This includes assigning more powerful nodes to weaker clients, which helps balance the network load. The process then proceeds to update peer connections based on the new assignments.
[0108] The flowchart shows a feedback loop where the updated peer connections feed back into the topology adaptation phase, allowing for continuous optimization of the network structure. The process includes decision points that determine whether to proceed with random or graph-based assignment methods based on network conditions.
[0109] The process concludes with steps for finalizing the updated peer connections and preparing for the next iteration of topology adaptation. The arrangement of steps shows how the network can dynamically reconfigure itself while maintaining operational continuity.
[0110] FIG. 10 illustrates a flowchart for a decentralized fine-tuning process. The process begins with two parallel input paths that converge into a single processing stream. These initial paths represent different data input sources that feed into the system. The process continues through a sequence of processing steps arranged vertically. After the initial convergence point, the flow moves to a data preparation stage, followed by a model configuration phase. The flowchart then shows a branching path where the process splits into multiple parallel operations. These parallel operations lead to a central processing stage (1007) where the data undergoes transformation and analysis.
[0111] Following the central processing stage, the flow converges again into a single path leading to an output generation phase (1009). This phase includes multiple sequential steps for processing and finalizing the results. The flowchart shows feedback loops (1010) at various points, allowing for iterative refinement of the process.
[0112] The process concludes with a final output stage (1011), represented at the bottom of the flowchart. The overall structure demonstrates a systematic approach to decentralized fine-tuning, incorporating both parallel processing capabilities and sequential operations. The flowchart includes decision points, processing stages, and data flow paths that work together to facilitate the fine-tuning process.
[0113] FIG. 11 illustrates a flowchart for client preparation and peer communication in a decentralized system. The flowchart comprises three main sections: client preparation, peer communication, and aggregation and reconstruction.
[0114] The client preparation section begins with a start point (1100) and proceeds to a prepare adapter update step (1102). This leads to a decision point (1104) where the system determines whether to use baseline sharing or alternative methods.
[0115] The peer communication section shows two parallel paths. The first path involves distributing shares to peers (1106), followed by sending masked net updates (1108). The second path involves hyperparameter sharing (1110) and exchange period allocation (1112). These paths converge at a multiple rounds exchange step (1114).
[0116] The aggregation and reconstruction section begins with collecting shares from peers (1116). This is followed by an exchange server masks step (1118), which leads to a transfer shares to remainder step (1120). The process concludes with a final reconstruction phase (1122).
[0117] The flowchart demonstrates a sequential process flow with parallel processing capabilities during the peer communication phase. The connections between steps indicate both direct progression and conditional branching based on the sharing method selected. The structure allows for iterative processing through multiple rounds of exchange before proceeding to the final aggregation steps.
[0118] FIG. 12 illustrates a block diagram of a dynamic decentralized system for adaptive fine-tuning of language models. The system comprises three main processing sections arranged vertically:
[0119] 1. Upper section: Shows the cluster formation process, including statistical analysis and parameter updates.
[0120] 2. Middle section: Illustrates the secure update aggregation pathway, containing local model updates, secure sharing mechanisms, and aggregate computation modules.
[0121] 3. Lower section: Depicts the secure aggregation pathway, comprising parameter aggregation and final model state updates.
[0122] The system features bidirectional connections between modules, indicating interactive data flow and feedback mechanisms. It shows multiple parallel pathways for concurrent processing and includes decision points for branching into alternative paths. The system incorporates feedback loops and iterative processing paths, enabling continuous refinement and optimization of model parameters during fine-tuning.
[0123] While this invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims.
Claims:WE CLAIM
1. A dynamic decentralized system (100) for adaptive fine-tuning of large language models, comprising:
• a network configuration unit (102) that monitors network metrics;
• a plurality of edge nodes (101), each edge node comprising a training module (103);
• a secure aggregation unit (104);
• a filter unit (105) connected to the plurality of edge nodes (101), configured to apply robust aggregation filters on received adapter updates; and
• a sparse adapter subspace unit (106) connected to the plurality of edge nodes (101) and communicatively coupled with the secure aggregation unit (104) and the filter unit (105), configured to partition and manage low-rank adapters across multiple modalities and tasks;
(I) wherein the network configuration unit (102) dynamically rewires peer-to-peer connections among the edge nodes (101) in each training round on the basis of a plurality of real-time metrics and individual node capabilities to enable secure and decentralized fine-tuning;
(II) the plurality of edge nodes (101), via their respective training modules, dynamically tailor a number of local epochs each round according to real‑time resource measurements and individual node capabilities, enabling heterogeneous deployments within the system (100);
(III) the training module (103) is configured to apply a combined control-variate method and a low rank adaptation method based on the node’s local data distribution and capabilities, thereby allowing the system (100) to accommodate edge nodes (101) with differing data and node capabilities while keeping the training process efficient and effective;
(IV) the secure aggregation unit (104) enables peer-to-peer exchange of low rank adapter representations utilizing additive secret-sharing, while the network configuration unit (102) maintains the adapters (1104) in a bandwidth efficient form, without utilizing a central server;
(V) the filter unit (105) integrates a coordinate-wise median method to safeguard the system (100) against poisoned adapter updates; and
(VI) the sparse adapter subspace unit (106) enables the plurality of edge nodes (101) to collaborate across different data types and modalities by partitioning adapters into modality‑specific subspaces and applying orthogonalization and sparsification of adapters to reduce cross-task interference in multi-task learning scenarios.
2. The dynamic decentralized system (100) as claimed in claim 1, wherein the control-variate method employs drift correction to compensate for client‑specific update drift, thereby maintaining convergence of the local training performed by the training module (103) in each edge node (101).
3. The dynamic decentralized system (100) of claim 1, wherein the low-rank adaptation method applies subspace-aware normalization to account for low-rank properties and preserve numerical stability in adapter fusion.
4. The dynamic decentralized system (100) of claim 3, wherein the low-rank adaptation method dynamically adjusts the rank of each adapter at each edge node (101) during the training process in the training module (103) based on detection of local training loss plateaus and on available bandwidth constraints.
5. The dynamic decentralized system (100) of claim 1, wherein the network configuration unit (102) dynamically rewires peer connections among edge nodes (101) at each training round based on real-time metrics to improve mixing and accelerate convergence.
6. The dynamic decentralized system (100) of claim 5, wherein the dynamic rewiring of peer connections is performed by a graph-theoretic criteria that considers current model variance, node centrality and pairwise model-update similarity.
7. The dynamic decentralized system (100) as claimed in claim 1, wherein the plurality of edge nodes (101) and their respective training modules (103) are configured to support continual learning, enabling the local model to be sequentially adapted to new tasks over time while preserving performance on previously learned tasks.
8. The dynamic decentralized system (100) as claimed in claim 1, further comprising a clustering unit (107) that:
a. collects low-dimensional feature vectors from each edge node characterizing data-distribution or network-metric similarity;
b. partitions the plurality of edge nodes into clusters C₁…Ck using a decentralized clustering algorithm; and
c. directs the Secure Aggregation Unit (104) to first aggregate adapter updates within each cluster, then aggregate the resulting cluster-level sums across cluster heads to form the final global adapter update.
9. A method for adaptive fine-tuning of large language models in a dynamic decentralized system (100), comprising:
• configuring, by a network configuration unit (102), a peer-to-peer network among a plurality of edge nodes (101), wherein the network configuration unit (102) continuously monitors real-time network metrics and per-node compute and communication capabilities;
• dynamically rewiring, in each training round, peer connections among the edge nodes (101) based on said real-time network metrics and per-node capabilities;
• tailoring, at each edge node (101), a respective number of local training epochs in accordance with real-time resource measurements comprising at least one of CPU utilization, battery level, and available bandwidth;
• employing, across the edge nodes (101), a control‑variate drift‑correction method to compensate for update drift arising from heterogeneity in local data distributions and node capabilities;
• applying, in conjunction with said drift‑correction method, a low‑rank adaptation method to enforce bandwidth‑efficient adapter representations;
• performing, by a secure aggregation unit (104), fully decentralized secure aggregation of adapter updates via additive secret-sharing, thereby obviating reliance on any central server;
• detecting and removing, via a coordinate‑wise median filter method, poisoned adapter updates to safeguard model integrity;
• partitioning, by a sparse adapter subspace unit (106), adapter updates into modality‑specific sparse subspaces and applying orthogonalization and sparsification within each subspace to prevent cross‑task interference across heterogeneous data modalities;
• grouping, by a clustering unit (107), the edge nodes (101) into clusters based on at least one of data similarity and network similarity;
• performing, for each cluster, a first‑stage aggregation of filtered, sparse‑subspace adapter updates within the cluster and a subsequent second‑stage aggregation across clusters to produce a final global update;
• wherein the sparse adapter subspace unit (106) is employed during both the first‑stage and second‑stage aggregations to maintain orthogonal, sparsified adapter representations and thereby minimize cross‑task interference in multi‑task learning scenarios.
10. The method as claimed in claim 9, wherein clustering of edge nodes is triggered periodically each round or dynamically upon detection that intra‑cluster variance of feature vectors exceeds a predefined threshold.
11. The method as claimed in claim 9, wherein the system applies a Krum‑style robust aggregation algorithm as an alternative to the coordinate‑wise median filter to remove malicious or faulty updates.
12. The method as claimed in claim 9, wherein the control-variate method comprises the steps of, at each edge node (101), estimating a drift term from prior local update deviations and applying a corrective adjustment to the current gradient, thereby compensating for client-specific drift and maintaining convergence of the decentralized training.
13. The method as claimed in claim 9, wherein the low rank adaptation method applies subspace-aware normalization to each adapter update for scaling and orthogonalizing the low‑rank factors to preserve numerical stability and account for low-rank properties.