System And Method For Providing A Configurable Fault Tolerant Architecture

Abstract: The present disclosure relates to a system and method for providing a configurable fault tolerant architecture. The system includes at least four computation units (102, 104, 106, 108), a configuration management unit (116), and a configurable voter logic unit (210) coupled with a shared memory (118). The configurable voter logic unit (210) enables the system to adapt to different computing configurations in an event of a fault based on a health and activation status of the at least four computation units (102, 104, 106, 108) read and verified by the configuration management unit (116) and analysed by the configurable voter logic unit (210) in real-time to enhance fault tolerance of the system.


Patent Information

Application #

Filing Date

15 March 2024

Publication Number

38/2025

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Applicants

Bharat Electronics Limited

Corporate Office, Outer Ring Road, Nagavara, Bangalore - 560045, Karnataka, India.

Inventors

1. AMIT P JAGTAP

Central Research Laboratory, Bharat Electronics Limited, Jalahalli P.O., Bangalore - 560013, Karnataka, India.

2. SANDEEP B

Central Research Laboratory, Bharat Electronics Limited, Jalahalli P.O., Bangalore - 560013, Karnataka, India.

3. MASTIK KUMAR

Central Research Laboratory, Bharat Electronics Limited, Jalahalli P.O., Bangalore - 560013, Karnataka, India.

4. SHRIKANT KUMAR

Central Research Laboratory, Bharat Electronics Limited, Jalahalli P.O., Bangalore - 560013, Karnataka, India.

5. SAROJ BHARTI

Central Research Laboratory, Bharat Electronics Limited, Jalahalli P.O., Bangalore - 560013, Karnataka, India.

6. RAVI PRAKASH REDDY

Central Research Laboratory, Bharat Electronics Limited, Jalahalli P.O., Bangalore - 560013, Karnataka, India.

7. RANJITH KUMAR V

Central Research Laboratory, Bharat Electronics Limited, Jalahalli P.O., Bangalore - 560013, Karnataka, India.

Specification

Description:
TECHNICAL FIELD
[0001] The present disclosure relates, in general, to computer architecture systems, and more specifically, relates to a system and method for providing a configurable fault tolerant architecture to achieve near zero downtime of a plurality of arrangements of computers in the event of a failure of one or more computation units.

BACKGROUND
[0002] A computer system that is built using multiple computational units and continues to function without compromising system functionality even if one of the computational units fails is called a fault tolerant system. Fault tolerant systems are required in safety-, life- and mission-critical applications, including military systems, nuclear power plants, high-speed and mass public transport, industrial controllers in the oil and gas industry, and any system used in a hazardous environment. Because multiple computational units are involved, such systems always require a voter or comparator to compare the result of each computational unit's operations against the others. The mechanisms for failure detection and failure isolation are extremely important for such systems. A failure in such a system may be either safe or dangerous, and the system must fail safely, especially in applications where human lives are directly affected by a system failure.
[0003] The most commonly deployed fault tolerant systems are based either on triple modular redundancy (TMR) or on a dual redundant voting architecture, referred to as the 2oo3 and 2x2oo2 system architectures respectively. The TMR 2oo3 system uses three computational units and a voter unit that compares the results of the three units. At least two of the three computational unit outputs must match for the system to function, and in case of any failure in any one compute element the system goes to shutdown with all outputs in a safe state. The 2x2oo2 system comprises dual redundant 2oo2 subsystems. A dual redundant system is a design approach in which critical components or systems are duplicated to provide a level of fault tolerance. This redundancy helps to mitigate the impact of hardware or software failures and reduces the risk of system downtime. Within each 2oo2 subsystem of the 2x2oo2 system, the computational units are tightly coupled for decision making and for voting on compute-related activities. The other, redundant 2oo2 subsystem works in hot standby mode, ready to take over instantly if a failure occurs in the active 2oo2 subsystem. This minimizes the time it takes to switch to the redundant component: the redundant 2oo2 subsystem takes over operation seamlessly, ensuring continuity of service without manual intervention.
[0004] From the point of view of fault identification through voting among computational units, at least a triple structure such as 2oo3 is required to identify the faulty compute element. In a doublet structure such as 2oo2, the faulty element cannot be identified; through voting between the elements operating in parallel, the subsystem can only raise an alarm that a failure has occurred in one of its computational units. A normal requirement of any high availability system is that it continues to function even after one failure is detected, that is, the system must have a fault tolerance of at least one. The problem with the architectures discussed above is that both provide a fault tolerance of only one. In the case of the 2oo3 system, if any one of the computational units fails, the system continues to work in fail-safe mode with the remaining two computational units.
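To make the identification argument concrete, the following minimal Python sketch (not taken from the disclosure) contrasts a 2oo3 majority voter with a 2oo2 comparator. The function names `vote_2oo3` and `compare_2oo2` are hypothetical, and each unit's output is assumed to be a directly comparable value.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def vote_2oo3(outputs: Sequence[Hashable]) -> Tuple[Optional[Hashable], Optional[int]]:
    """Return (voted_value, index_of_dissenting_unit) for three unit outputs.

    voted_value is None when no two outputs agree (no majority).
    The dissenting index is None when all three units agree.
    """
    if len(outputs) != 3:
        raise ValueError("2oo3 voting needs exactly three outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        return None, None  # all three disagree: no majority exists
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, (dissenters[0] if dissenters else None)


def compare_2oo2(outputs: Sequence[Hashable]) -> Tuple[Optional[Hashable], bool]:
    """Return (value, mismatch) for two unit outputs.

    On a mismatch the pair can only raise an alarm; it has no way to
    tell which of the two units is the faulty one.
    """
    if len(outputs) != 2:
        raise ValueError("2oo2 comparison needs exactly two outputs")
    a, b = outputs
    return (a, False) if a == b else (None, True)
```

For example, `vote_2oo3([5, 5, 7])` names unit 2 as the dissenter, whereas `compare_2oo2([5, 7])` can only report that a mismatch occurred.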
[0005] The 2x2oo2 system also provides a fault tolerance of one: on a single failure, the system switches to the hot standby subsystem. Another problem with the 2x2oo2 system is that all the computational units must be of equal capability, which increases the power dissipation, size and cost of the system. A further problem commonly faced is switching between the hot standby subsystems. In some cases it has been observed that, if the switchover is not implemented properly, the failure of one subsystem leads to continuous switching between the two 2oo2 subsystems for control, or the hot standby subsystem never takes over even though the other subsystem has failed. Another disadvantage of the 2x2oo2 system is that, because the computational units are tightly coupled in the 2oo2 configuration, after a single failure the remaining working compute element in that 2oo2 subsystem becomes useless once control switches to the hot standby 2oo2 subsystem. To address the above problems, and to achieve a high degree of fault tolerance with the same amount of hardware as is conventionally used in a 2x2oo2 compute architecture, this invention proposes a dynamically configurable, flexible fault tolerant computation architecture. The dynamically configurable flexible compute architecture achieves greater fault tolerance by reconfiguring the system to the next lower fault tolerant scheme on a single failure and reconfiguring it back to the higher fault tolerance scheme upon recovery.
[0006] US7685464, Alternating Fault Tolerant Reconfigurable Computing Architecture, discloses an application that uses multiple configurable devices (such as FPGAs) and periodically reconfigures one of the N programmable devices to avoid single event upset (SEU) effects. These FPGAs are used for signal processing and receive input from sensors. A payload control computer communicates with the FPGAs (signal processing nodes) over a common bus and issues commands to a payload interface block, which receives processed payload data from the signal processing nodes. Using these elements, a method to reduce the effect of radiation on SRAM-based electronic devices (FPGAs) is described; the internal architecture of the signal processing node and configuration management is also discussed. US20050273653A1, Single Fault Tolerance in an Architecture with Redundant Systems, discloses two redundant systems (say 1 and 2). Each redundant system has two processors and an associated health monitoring process; health monitoring operates independently and performs a voting function to identify faults within an electronic module. Of the processor cards in systems 1 and 2, one is designated as the coordinator, which reads the health status from the other card and performs the voting task to identify failures. This system assumes that only one fault occurs at a time.
[0007] US7392426, Redundant Processing Architecture for Single Fault Tolerance, discloses a system comprising two redundant logic devices. Each logic device (FPGA) has two processors and a comparator, and the first and second comparators operate as a distributed comparator system. In this architecture the comparator is implemented in hardware, and the comparator function is also implemented in software, called software implemented fault tolerance (SIFT). The first level of comparison is done in software using SIFT, which looks for a majority among at least three processor outputs. The second level of comparison is done by the hardware comparator. The results of the software and hardware comparators are combined and used as the final vote to identify a faulty processor. All these elements, including the processors and comparators, are implemented in FPGAs.
[0008] US7047440, Dual/Triple Redundant Computer System, discloses a system aimed primarily at emergency shutdown or critical ON-OFF control applications. Voting in this architecture is done in two stages: first at the central processing module (CPM) stage, and second at the output module level. The CPMs perform two-out-of-three (2-of-3) majority voting on the three input data obtained through the input modules and send the resulting data to microcontrollers A, B and C. Each microcontroller sends the data to two output circuits. The second-stage voting is 2oo2 voting, obtained by summation of the outputs from the three output circuits A, B and C.
[0009] US7065672, Apparatus and Methods for Fault Tolerant Computing Using a Switching Fabric, relates generally to fault tolerant computer systems and more specifically to a method and apparatus for communicating data between elements of a fault tolerant computer system. It describes an asynchronous switching fabric between redundant data processing elements and the target I/O system or CPU end nodes. The switching fabric includes network components, such as switches, routers, repeaters and transceivers, interconnected through communication links.
[0010] US6334194, Fault Tolerant Computer Employing Double-Redundant Structure, discloses an architecture comprising two judgment sections, one for each operation controller in the double-redundant structure. Each judgment section compares the output of the operation controller connected to it with the output of the operation controller connected to the other judgment section. One judgment section receives a signal indicating the comparison result from the other judgment section and combines it with its own comparison result, with reference to additional diagnosis information, to judge whether the output of the operation controller connected to it is correct.
[0011] US6938183, Fault Tolerant Processing Architecture, relates generally to fault tolerant computer processors and more particularly to a voted processing system. The fault tolerant processing circuit includes at least three processors, a synchronizing circuit and a fault logic circuit. Each processor has inputs and outputs connected to it. The synchronizing circuit synchronizes the outputs from each processor output module. The fault logic circuit communicates with the synchronizing circuit and compares the outputs from each processor output module to detect errors. A fault occurs when no processor output (of any three) is in the majority. Once a fault is detected, a latched signal is used to reset the respective processor circuit.
[0012] US20070220367A1 discloses a Fault Tolerant Computing System. US6732300, Hybrid Triple Redundant Computer System, relates to hybrid multiple redundant systems that combine majority voting with fault diagnostic and fault recovery means to provide correct system outputs in the presence of multiple system component faults. The system is built around hybrid triple modular redundancy and includes CPU modules, inputs, outputs and a voting system for the outputs. The system normally works in a 2oo3 configuration while no failure alarm is raised; in the event of a failure of any one CPU, it reconfigures to a 2oo2 system, and on one further failure it reconfigures to 1oo1 voting.
[0013] The above inventions primarily focus on traditional 2oo2 or 2oo3 architectures and their combinations with other modules, such as hardware and software comparators and fault diagnostic modules. These architectures and arrangements are implemented either within a single FPGA or as separate hardware components. However, none of them covers a configurable fault tolerant architecture that offers greater fault tolerance than the conventional approaches.
[0014] Therefore, it is desired to overcome the drawbacks, shortcomings, and limitations associated with existing solutions, and to provide a configurable fault tolerant architecture that achieves near zero downtime of a plurality of arrangements of computers in the event of a failure of one or more computation units.

OBJECTS OF THE PRESENT DISCLOSURE
[0015] An object of the present disclosure is to provide a system and method for providing a configurable fault tolerant architecture to achieve near zero downtime of a plurality of arrangements of computers in the event of a failure of one or more computation units.
[0016] Another object of the present disclosure is to provide a system with a configurable fault tolerant architecture that achieves near zero downtime of a plurality of arrangements of computers in the event of a failure of one or more computation units, with a higher degree of fault tolerance than conventional fault tolerant architectures.
[0017] Another object of the present disclosure is to provide a system that achieves high fault tolerance without any additional hardware compared to conventional systems.
[0018] Another object of the present disclosure is to provide a system that ensures each CPU executes the common application code regardless of which fault tolerant architecture is in effect when the system is reconfigured from one fault tolerant architecture to another.
[0019] Another object of the present disclosure is to provide a system that automatically reconfigures itself to a higher fault tolerance level once it has recovered from a failure.
[0020] Yet another object of the present disclosure is to provide a system that attains high fault tolerance by dynamically configuring computational units so that they support adaptive redundancy without compromising system functionality.

SUMMARY
[0021] The present disclosure relates, in general, to computer architecture systems, and more specifically, relates to a system and method for providing a configurable fault tolerant architecture to achieve near zero downtime of a plurality of arrangements of computers in the event of a failure of one or more computation units.
[0022] The present disclosure provides a system for providing a configurable fault tolerant architecture. The system includes at least four computation units, a configuration management unit to monitor and verify the status of the at least four computation units, and a configurable voter logic unit coupled with a shared memory. The configuration management unit is configured to obtain the health status, CPU activation status and interface addresses of the at least four computation units. The configuration management unit is configured to read a slot ID and perform a self-test on each of the at least four computation units. Further, the configuration management unit is configured to verify the activation status of the at least four computation units. The configuration management unit is connected to a watchdog timer (WDT), a hot swap controller, and a temperature and power monitoring unit to collect the health status of each of the at least four computation units. The configurable voter logic unit analyses the health status of the at least four computation units to configure the fault tolerant architecture, which it does by applying Adaptive Redundancy Scaling techniques: it scales the fault tolerant architecture down to a lower fault tolerant scheme in the event of a single failure, and scales it back up to a higher fault tolerance level upon recovery from the fault. In this way, the configurable voter logic unit enhances the fault tolerance of the system by enabling it to adapt, in real time, to different computing configurations in the event of a fault, based on the health and activation status of the at least four computation units.
[0023] In an aspect of the present disclosure, a method for providing a configurable fault tolerant architecture is disclosed. The method begins with monitoring, by a configuration management unit, the health and activation status of at least four computation units. Next, the method verifies, by the configuration management unit, the activation status of the at least four computation units. Finally, the method configures, by a configurable voter logic unit, a fault tolerant architecture based on the verification.
[0024] Various objects, features, aspects, and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components.

BRIEF DESCRIPTION OF THE DRAWINGS
[0025] The following drawings form part of the present specification and are included to further illustrate aspects of the present disclosure. The disclosure may be better understood by reference to the drawings in combination with the detailed description of the specific embodiments presented herein.
[0026] FIG. 1 illustrates an exemplary block diagram representation (100) of the proposed system implementing concept 1 of dynamically configurable flexible computing architecture, in accordance with an embodiment of the present disclosure.
[0027] FIG. 2 illustrates an exemplary block diagram representation (200) of a computation unit of the system implementing concept 1 of dynamically configurable flexible compute architecture, in accordance with an embodiment of the present disclosure.
[0028] FIG. 3 illustrates an exemplary block diagram representation (300) of the proposed system implementing concept 2 of dynamically configurable flexible computing architecture, in accordance with an embodiment of the present disclosure.
[0029] FIG. 4 illustrates an exemplary block diagram representation (400) of the proposed system implementing concept 2 by a configurable voter unit with sync and configuration control logic, in accordance with an embodiment of the present disclosure.
[0030] FIG. 5 illustrates an exemplary flow diagram representation (500) of the flow of dynamic configuration management for flexible computation architecture, in accordance with an embodiment of the present disclosure.

DETAILED DESCRIPTION
[0031] The following is a detailed description of embodiments of the disclosure depicted in the accompanying drawings. The embodiments are in such detail as to clearly communicate the disclosure. If the specification states a component or feature “may”, “can”, “could”, or “might” be included or have a characteristic, that particular component or feature is not required to be included or have the characteristic.
[0032] As used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
[0033] Embodiments of the present disclosure relate to a system for providing a configurable fault tolerant architecture. The present disclosure relates, in general, to computer architecture systems, and more specifically, relates to a system and method for providing a configurable fault tolerant architecture to achieve near zero downtime of a plurality of arrangements of computers in the event of a failure of one or more computation units.
[0034] In an embodiment, a system for providing a configurable fault tolerant architecture is disclosed. The system includes at least four computation units, a configuration management unit to monitor and verify the status of the at least four computation units, and a configurable voter logic unit coupled with a shared memory. The configurable voter logic unit (210) enhances fault tolerance of the system by enabling the system to adapt to one or more different computing configurations in the event of a fault, based on the health and activation status of the at least four computation units (102, 104, 106, 108) in real time. The configuration management unit is configured to obtain the health status, CPU activation status and interface addresses of the at least four computation units. The configuration management unit (116) is configured to detect and isolate a fault in any of the at least four computation units (102, 104, 106, 108) at the module level. Further, the configuration management unit is configured to verify the activation status of the at least four computation units. The configuration management unit is connected to a watchdog timer (WDT), a hot swap controller, and a temperature and power monitoring unit to collect the health status of each of the at least four computation units. The configurable voter logic unit analyses the health status of the at least four computation units to configure the fault tolerant architecture. The configurable voter logic unit is configured to make the system adapt to the one or more different computing configurations, comprising the 2oo3+1, 2oo2+1, 2oo2 and 1oo1 configurations of the at least four computation units. The configurable voter logic unit configures the fault tolerant architecture by applying Adaptive Redundancy Scaling techniques: it scales the fault tolerant architecture down to a lower fault tolerant scheme in the event of a single failure, and back up to a higher fault tolerance level upon recovery from the fault.
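The following minimal Python sketch illustrates how Adaptive Redundancy Scaling might map per-unit health flags onto the 2oo3+1, 2oo2+1, 2oo2 and 1oo1 configurations. The disclosure does not state the exact mapping, so this sketch assumes exactly four units, one scheme per healthy-unit count (4 -> 2oo3+1, 3 -> 2oo2+1, 2 -> 2oo2, 1 -> 1oo1), a safe state when no unit is healthy, and that lower slot IDs are preferred as voters. `select_scheme`, `on_health_change` and the scheme strings are hypothetical names.

```python
from typing import Dict, List, Tuple

# Assumed order of schemes, from lowest to highest fault tolerance.
ORDER = ["SAFE_STATE", "1oo1", "2oo2", "2oo2+1", "2oo3+1"]
# Assumed mapping: number of healthy units -> scheme.
LADDER = {4: "2oo3+1", 3: "2oo2+1", 2: "2oo2", 1: "1oo1"}


def select_scheme(health: Dict[int, bool]) -> Tuple[str, List[int], List[int]]:
    """Return (scheme, voting_slots, standby_slots) for per-slot health flags."""
    healthy = sorted(slot for slot, ok in health.items() if ok)[:4]
    if not healthy:
        return "SAFE_STATE", [], []
    scheme = LADDER[len(healthy)]
    # "2oo3+1" -> voting part "2oo3" -> 3 voting units; "+1" is the standby unit.
    n_voters = int(scheme.split("+")[0].split("oo")[1])
    return scheme, healthy[:n_voters], healthy[n_voters:]


def on_health_change(current: str, health: Dict[int, bool]) -> Tuple[str, str]:
    """Re-evaluate after a fault or a recovery and report the scaling direction."""
    new_scheme, _, _ = select_scheme(health)
    delta = ORDER.index(new_scheme) - ORDER.index(current)
    if delta < 0:
        return new_scheme, "scale_down"
    if delta > 0:
        return new_scheme, "scale_up"
    return new_scheme, "unchanged"
```

Under this assumed mapping, a single failure always moves the system exactly one level down the ladder, and the recovery of that unit moves it one level back up. This mirrors the scale-down and scale-up behaviour described above.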
[0036] In an embodiment of the present disclosure, a computer architecture that achieves high fault tolerance is proposed. The system achieves high fault tolerance by dynamically configuring one or more computation units to support adaptive redundancy without compromising system functionality. In an aspect of the present disclosure, at least two hardware arrangements are discussed for the Dynamically Configurable Flexible Computing Architecture. The first approach of the proposed configurable fault tolerant architecture is distributed, and the second approach is centralized, in how dynamic configuration of the configurable voter is implemented. In concept 1, depicted in FIGS. 1 and 2, dynamic configuration of a configurable voter logic unit is achieved by a configuration management unit and a shared memory, and the configurable voter logic unit is part of the CPU logic. In concept 2, depicted in FIGS. 3 and 4, the configurable voter logic unit and the configuration management unit are independent hardware, separate from all the compute modules.
[0037] FIG. 1 illustrates an exemplary block diagram representation (100) of the proposed system implementing concept 1 of dynamically configurable flexible computing architecture, in accordance with an embodiment of the present disclosure. Illustrated in FIG. 1 is a block diagram of a system for providing a dynamically configurable, flexible, fault tolerant computing architecture in which the configurable voter logic unit and the configuration management unit are arranged in a distributed fashion. Concept 1, as depicted in FIG. 1, uses at least four computation units 102, 104, 106, and 108. Each of the at least four computation units is provided with a CPU 110a, 110b, 110c, and 110d (collectively referred to herein as CPU 110) coupled to a shared memory 118a, 118b, 118c, and 118d (collectively referred to herein as shared memory 118) and a configuration management unit 116a, 116b, 116c, and 116d (collectively referred to herein as configuration management unit 116). The CPUs 110a, 110b, 110c, and 110d communicate with one another over interfaces 112a and 112b (collectively referred to herein as interface 112), which may be point-to-point communication interfaces between each of the CPUs or a multipoint bus communication interface shared by all of the CPUs. Each of the CPUs 110a, 110b, 110c, and 110d is also provided with an interface 114 for input and output connectivity. The configuration management units 116a, 116b, 116c, and 116d are connected through a management bus 204. Each configuration management unit 116a, 116b, 116c, and 116d is internally connected to a hot swap controller 202, which enables the supply of power to the computation units 102, 104, 106, and 108 respectively. In each of the computation units 102, 104, 106, and 108, the configuration management unit 116a, 116b, 116c, or 116d is connected to the shared memory 118a, 118b, 118c, or 118d via an internal interface 120a, 120b, 120c, or 120d (collectively referred to herein as internal interface 120).
[0038] FIG. 2 illustrates an exemplary block diagram representation (200) of a computation unit of the system implementing concept 1 of dynamically configurable flexible compute architecture, in accordance with an embodiment of the present disclosure. Illustrated in FIG. 2 is a detailed block diagram of the at least four computation units 102, 104, 106, and 108. All of the at least four computation units 102, 104, 106, and 108 have the same hardware components and equal capabilities. The concept 1 approach focuses on a distributed implementation of the configurable voter logic unit 210 and the related arrangement required, through the configuration management units 116a, 116b, 116c, and 116d.
[0039] The configuration management units 116a, 116b, 116c, and 116d in FIG. 2 are configured to handle configuration of the configurable voter logic unit 210 running as part of the CPUs 110a, 110b, 110c, and 110d. The configuration management units may also be responsible for local health management and for communication with the other CMUs to collect health status information for the at least four computation units 102, 104, 106, and 108, as discussed for FIG. 1. Each CMU 116a, 116b, 116c, and 116d is connected to a WDT 206b, the hot swap controller 202, and a temperature and power monitoring unit 212 in order to collect the health status of each of the at least four computation units 102, 104, 106, and 108. Each CMU is also connected to a WDT 206a, which is driven by the logic of the CPU 110a, 110b, 110c, or 110d so that the CPU's active status is recorded. Each of the CPUs 110a, 110b, 110c, and 110d may implement two logic units: a main control logic unit 208 and a configurable voter logic (CVL) unit 210. The CPUs have a CPU memory 220 in which the main control logic unit 208 and the CVL unit 210 run. The shared memory 118a, 118b, 118c, and 118d is provided with a plurality of memory sections 214, 216, and 218, of which section 216 records the health status of each of the at least four computation units 102, 104, 106, and 108 as received from the respective CMUs 116a, 116b, 116c, and 116d.
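As an illustration of how a CMU might turn these signals into status flags, here is a hypothetical Python sketch. The `Watchdog` class, the 0.5 s timeout, the 85 degree C temperature limit and the AND-combination rule are all assumptions, not values or logic from the disclosure. Time is passed in explicitly so the logic can be exercised without real hardware.

```python
from dataclasses import dataclass


@dataclass
class Watchdog:
    """Software model of a watchdog timer such as WDT 206a (timeout is assumed)."""
    timeout_s: float
    last_kick_s: float = 0.0

    def kick(self, now_s: float) -> None:
        # Called periodically by the CPU's main control logic while it runs normally.
        self.last_kick_s = now_s

    def expired(self, now_s: float) -> bool:
        # Expired if the CPU has not kicked the timer within the timeout window.
        return now_s - self.last_kick_s > self.timeout_s


def cpu_active(wdt: Watchdog, now_s: float) -> bool:
    """CMU-side view: the CPU counts as active while its watchdog has not expired."""
    return not wdt.expired(now_s)


def unit_healthy(board_wdt_ok: bool, power_good: bool, temp_c: float,
                 power_ok: bool, temp_limit_c: float = 85.0) -> bool:
    """Combine readings from WDT 206b, hot swap controller 202 and monitor 212.

    Assumed rule: the unit is healthy only if every individual check passes.
    """
    return board_wdt_ok and power_good and power_ok and temp_c <= temp_limit_c
```

Keeping CPU activation (via WDT 206a) separate from board health (via WDT 206b, power and temperature) lets the CMU report the two statuses independently, which matches the separate health and CPU activation fields the text describes.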
[0040] Memory section 214 of the shared memory 118a, 118b, 118c, and 118d records the interface addresses involved in intercommunication among the at least four computation units 102, 104, 106, and 108 via the interface 112. Memory section 218 holds a record of the flexible fault tolerant computation architecture currently running, which may be decided dynamically by the master CMU (one of the CMUs 116a, 116b, 116c, and 116d) according to the flow of actions illustrated in FIG. 5 (500).
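The three shared memory sections can be pictured as a small record that the master CMU keeps and copies to the other units. The Python sketch below is only a model under stated assumptions: `SharedMemory` and `distribute` are hypothetical names, interface addresses are represented as plain strings, and the real sections are memory regions rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SharedMemory:
    """Hypothetical model of one computation unit's shared memory 118."""
    interface_addresses: Dict[int, str] = field(default_factory=dict)  # section 214
    health_status: Dict[int, bool] = field(default_factory=dict)       # section 216
    running_architecture: str = ""                                      # section 218


def distribute(master: SharedMemory, replicas: List[SharedMemory]) -> None:
    """Copy the master CMU's record into every other active unit's shared memory.

    The dicts are copied so later edits on the master do not leak into replicas.
    """
    for replica in replicas:
        replica.interface_addresses = dict(master.interface_addresses)
        replica.health_status = dict(master.health_status)
        replica.running_architecture = master.running_architecture
```

Because every unit holds its own copy of section 218, each CPU's configurable voter logic can read the current architecture locally instead of querying the master on every vote.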
[0041] FIG. 3 illustrates an exemplary block diagram representation (300) of the proposed system implementing concept 2 of dynamically configurable flexible computing architecture, in accordance with an embodiment of the present disclosure. Illustrated in FIG. 3 is an exemplary representation of the system implementing concept 2 of the Dynamically Configurable Flexible Computing Architecture, in which the configurable voter logic unit and the configuration management unit are implemented in a centralized fashion. The configurable voter logic unit 324, the configuration management unit 322, and at least four identical central processor modules (CPMs) 302, 304, 306, and 308 make up the architecture shown in FIG. 3. Every central processor module may contain a CPU, a BMC, and a communication logic unit. The at least four identical central processor modules 302, 304, 306, and 308 run in parallel.
[0042] The configuration management unit 322 reads data from the BMCs 314, 316, 318, and 320 via an I/O bus 328. The CPMs and the configuration management unit 322 also communicate with one another via a bus. The configurable voter logic unit 324 uses connections 326a, 326b, 326c, and 326d to check CPM status, and analyses the health status data received from the BMCs and the CPMs to dynamically configure the voting architecture. During boot-up, the configurable voter logic unit 324 may set up two-out-of-four (2-of-4) voting. If one, two, or three modules are defective, the system may dynamically reconfigure to two-out-of-three (2-of-3), two-out-of-two (2-of-2), or one-out-of-one (1-of-1) voting, respectively.
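The degradation sequence for concept 2 (2-of-4 at boot, then 2-of-3, 2-of-2 and 1-of-1 as modules become defective) can be sketched as a single voter whose threshold depends on how many CPMs are healthy. The Python below is a hypothetical illustration, not the patented logic. It assumes exactly four CPMs, treats a split in which two different values both reach the threshold (for example 2-2 under 2-of-4) as having no valid result, and falls back to a safe state when no CPM is healthy.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple

# (M, N) threshold per number of healthy CPMs, following the 2-of-4 ... 1-of-1 sequence.
SCHEMES = {4: (2, 4), 3: (2, 3), 2: (2, 2), 1: (1, 1)}


def configurable_vote(outputs: Sequence[Hashable],
                      healthy: Sequence[bool]) -> Tuple[str, Optional[Hashable]]:
    """Return (scheme, voted_value); voted_value is None if no valid majority exists."""
    live = [out for out, ok in zip(outputs, healthy) if ok]
    if len(live) not in SCHEMES:
        return "SAFE_STATE", None  # no healthy CPM left (or an unexpected count)
    m, n = SCHEMES[len(live)]
    scheme = f"{m}oo{n}"
    winners = [value for value, count in Counter(live).most_common() if count >= m]
    if len(winners) != 1:
        return scheme, None  # no agreement, or a tie such as 2-2 under 2oo4
    return scheme, winners[0]
```

For instance, with CPM 4 marked defective the same call votes 2oo3 over the remaining three outputs, so the reconfiguration is just a change in the health mask fed to the voter.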
[0043] FIG. 4 illustrates an exemplary block diagram representation (400) of the proposed system implementing concept 2 by a configurable voter unit with sync and configuration control logic, in accordance with an embodiment of the present disclosure. Illustrated in FIG. 4 are the configurable voter logic unit and a sync and input selection logic unit 402. Along with the sync and input selection logic unit 402, FIG. 4 depicts input capture and forward logic units 404, 406, 408, and 410, one for each input coming from a central processor module; the configurable voter logic unit 414; an output capture and forward logic unit 416; and a configuration control logic unit 412.
[0044] The configuration control logic unit 412 receives an input message from the configuration management unit and forwards the message to the configurable voter logic unit 414 and to the sync and input selection logic unit 402. Each input capture and forward logic unit is connected to the sync and input selection logic unit 402 via ports 418, 420, 422, and 424. The sync and input selection logic unit 402 uses the data obtained from the configuration control logic unit 412, and the configurable voter logic unit 414 likewise uses this data to dynamically configure the fault tolerant architecture.
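The data path above, where one configuration message steers both input selection and voting, might be sketched as follows. This is a hypothetical model: the `ConfigMessage` fields (a set of active module IDs and a required vote count `k`) are assumptions about what the message carries, and timing synchronization performed by the sync logic is not modelled.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class ConfigMessage:
    """Hypothetical message forwarded by the configuration control logic unit."""
    active_modules: FrozenSet[int]  # IDs of CPMs whose inputs may be voted on
    k: int                          # agreeing votes required by the voter


def select_inputs(captured: Dict[int, int], msg: ConfigMessage) -> Dict[int, int]:
    """Input selection: keep only captured inputs from modules marked active."""
    return {module: value for module, value in captured.items()
            if module in msg.active_modules}


def vote(selected: Dict[int, int], msg: ConfigMessage) -> Optional[int]:
    """Configurable voter: return the unique value with at least k votes, else None."""
    counts = Counter(selected.values())
    winners = [value for value, votes in counts.items() if votes >= msg.k]
    return winners[0] if len(winners) == 1 else None


def process_cycle(captured: Dict[int, int], msg: ConfigMessage) -> Optional[int]:
    """Apply the same configuration message to input selection and to voting."""
    return vote(select_inputs(captured, msg), msg)


# Usage: module 2 has been marked faulty, so its output is excluded before voting.
msg = ConfigMessage(active_modules=frozenset({0, 1, 3}), k=2)
print(process_cycle({0: 10, 1: 10, 2: 99, 3: 10}, msg))  # prints 10
```

Passing the same message object to both stages mirrors the idea that the input selection and the voter must never disagree about which modules are currently participating.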
[0045] FIG. 5 illustrates an exemplary flow diagram representation (500) of dynamic configuration management for the flexible and fault tolerant computation architecture, in accordance with an embodiment of the present disclosure. At step 502, power is supplied to the fault tolerant computer architecture and the system boots. At step 504, the Configuration Management Unit (CMU) monitors the health and related status of the at least four computation units. In this phase, the CMU reads a chassis slot ID and performs a self-test on each of the at least four computation units. At step 506, a Master CMU is appointed according to the slot ID; the Master CMU initiates queries to the other CMUs. At steps 508 and 510, the Master CMU obtains from the other CMUs their health status, CPU activation status, and interface addresses for communication. The Master CMU records these details in a local shared memory and distributes a copy to all active CMUs. At step 514, the Master CMU verifies the CPU activation status of the at least four computation units. At step 516, the availability and health status of the at least four computation units are evaluated to support the configuration decision: if all of the at least four computation units are available and healthy, the method proceeds to step 520; otherwise, it proceeds to step 518.
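Steps 504 to 516 can be illustrated with a small model of the status each CMU reports and the way the Master CMU is chosen and counts usable units. The disclosure only states that the master is appointed "according to the slot ID"; choosing the lowest slot ID that passed its self-test is an assumption made here for illustration, as are the `CmuReport` fields and the address format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CmuReport:
    """Details a CMU reports to the Master CMU at steps 508 and 510."""
    slot_id: int
    self_test_passed: bool
    cpu_active: bool
    interface_address: str  # hypothetical address format


def elect_master(reports: List[CmuReport]) -> int:
    """Step 506: appoint a Master CMU according to the chassis slot ID.

    Assumption: the lowest slot ID whose self-test passed becomes master.
    """
    candidates = [r.slot_id for r in reports if r.self_test_passed]
    if not candidates:
        raise RuntimeError("no CMU passed its self-test")
    return min(candidates)


def distribute_status(reports: List[CmuReport]) -> Dict[int, Dict[int, CmuReport]]:
    """Record all CMU details and hand a separate copy to every active CMU."""
    table = {r.slot_id: r for r in reports}
    return {r.slot_id: dict(table) for r in reports if r.cpu_active}


def count_available(reports: List[CmuReport]) -> int:
    """Steps 514 and 516: count units that are both healthy and CPU-active."""
    return sum(1 for r in reports if r.self_test_passed and r.cpu_active)


reports = [
    CmuReport(slot_id=3, self_test_passed=True, cpu_active=True, interface_address="if-3"),
    CmuReport(slot_id=1, self_test_passed=False, cpu_active=False, interface_address="if-1"),
    CmuReport(slot_id=2, self_test_passed=True, cpu_active=True, interface_address="if-2"),
    CmuReport(slot_id=4, self_test_passed=True, cpu_active=True, interface_address="if-4"),
]
print(elect_master(reports), count_available(reports))  # prints 2 3
```

In this example the unit in slot 1 fails its self-test, so slot 2 is appointed master and three units are counted as available, which would send the flow from step 516 to step 518.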
[0046] At step 520, with the at least four computation units available and healthy, the Master CMU may save the system configuration as a 2oo3+1 architecture (two-out-of-three voting with one redundant computation element, CE). At step 526, the local memory is updated with the new configuration, and the Master CMU may send the same update to the CMUs of the other computation units. At step 528, the CMUs of the other computation units update their local memory with the latest system configuration, after which the method continues. At step 530, the CMU releases the memory to be read by the Configurable Voter Logic (CVL) unit. At step 532, the CVL unit reads the shared memory to obtain the interface addresses. At step 534, the CVL unit is run, and at step 536, the CVL unit executes the determined configurable fault tolerant architecture of the system. The method then returns to step 506, so that the system cycle of checking and verifying the configuration repeats within a specified time. If the condition at step 516 is false, then at step 518 the availability and health status of at least three computation units are evaluated. If at least three computation units are available and healthy, the method advances to node C; otherwise, it proceeds to step 522. At step 540, with three computation units available and healthy, the Master CMU may save the system configuration as a 2oo2+1 architecture (one redundant CE).
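The update, replicate, and release sequence of steps 526 to 532 resembles a simple publish protocol: every copy is written first, and only then is the memory released so the CVL unit never reads a half-updated configuration. The sketch below is a single-threaded model under assumptions; `LocalMemory`, its `released` flag, and the function names are hypothetical and do not describe the disclosed hardware.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LocalMemory:
    """Hypothetical model of one CMU's local shared-memory region."""
    config: Optional[str] = None
    interface_addresses: Dict[int, str] = field(default_factory=dict)
    released: bool = False  # True once the CVL unit is allowed to read


def commit_configuration(master: LocalMemory, others: List[LocalMemory],
                         config: str, addresses: Dict[int, str]) -> None:
    """Steps 526 to 530: update the master copy, replicate it, then release."""
    regions = [master, *others]
    for mem in regions:
        mem.released = False  # withhold the region from the CVL unit while it changes
        mem.config = config
        mem.interface_addresses = dict(addresses)
    for mem in regions:
        mem.released = True   # step 530: memory released to the CVL unit


def cvl_read(mem: LocalMemory) -> Optional[Tuple[str, Dict[int, str]]]:
    """Step 532: the CVL unit reads the configuration only after release."""
    if not mem.released or mem.config is None:
        return None
    return mem.config, dict(mem.interface_addresses)


master, peers = LocalMemory(), [LocalMemory(), LocalMemory()]
print(cvl_read(master))  # prints None: nothing has been released yet
commit_configuration(master, peers, "2oo3+1", {1: "if-1", 2: "if-2"})
print(cvl_read(peers[0]))
```

Releasing only after all copies are written is one way to keep every CMU and the CVL unit on the same configuration; a real system would also need a hardware or OS synchronization primitive, which this sketch omits.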
[0047] At step 542, the local memory is updated with the new configuration, and the Master CMU may send the same update to the CMUs of the other computation units. At step 544, the CMU releases the memory to be read by the CVL unit, and the method proceeds via step 552 to step 532. If the condition at step 518 is false, the method proceeds to step 522, where the availability and health status of at least two computation units are evaluated. If at least two computation units are available and healthy, the method advances to node D; otherwise, it proceeds to step 524. Following node D, with two computation units available and healthy, the Master CMU may save the new configuration as a 2oo2 architecture. At step 546, the local memory is updated with the new configuration, and the Master CMU may send the same update to the CMUs of the other computation units. At step 548, the CMU releases the memory to be read by the CVL unit, and the method proceeds via step 552 to step 532. At step 524, the system activates a safe shutdown using the 1oo1 system configuration.
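Taken together, the decisions at steps 516, 518, 522, and 524 form a cascade from the number of available and healthy computation units to a configuration, re-evaluated on every cycle that returns to step 506. The sketch below is a hedged illustration of that cascade; the function names are hypothetical, and mapping zero or one available unit to the 1oo1 safe shutdown is an assumption. Running it over successive cycles shows the scale-down on a failure and scale-up on recovery described in claims 8 and 9.

```python
from typing import Iterable, List


def choose_configuration(available_healthy: int) -> str:
    """Select a configuration from the count of available, healthy units.

    Mirrors decision steps 516, 518, 522 and 524 of Fig. 5.
    """
    if available_healthy >= 4:
        return "2oo3+1"            # step 520: 2-out-of-3 plus one redundant unit
    if available_healthy == 3:
        return "2oo2+1"            # step 540: 2-out-of-2 plus one redundant unit
    if available_healthy == 2:
        return "2oo2"              # node D
    return "1oo1 (safe shutdown)"  # step 524: assumed for zero or one unit


def run_cycles(health_per_cycle: Iterable[int]) -> List[str]:
    """Re-evaluate the configuration each cycle (the return to step 506)."""
    return [choose_configuration(count) for count in health_per_cycle]


# A unit fails in cycle 2 and recovers in cycle 3.
print(run_cycles([4, 3, 4]))  # prints ['2oo3+1', '2oo2+1', '2oo3+1']
```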
[0048] It will be apparent to those skilled in the art that the system of the disclosure may be provided using some or all of the mentioned features and components without departing from the scope of the present disclosure. While various embodiments of the present disclosure have been illustrated and described herein, it will be clear that the disclosure is not limited to these embodiments only. Numerous modifications, changes, variations, substitutions, and equivalents will be apparent to those skilled in the art, without departing from the spirit and scope of the disclosure, as described in the claims.

ADVANTAGES OF THE PRESENT INVENTION
[0049] The present disclosure provides a system and method for providing a configurable fault tolerant architecture to achieve near zero downtime of a plurality of arrangements of computers in an event of a failure of one or more computation units.
[0050] The present disclosure provides a system for providing a configurable fault tolerant architecture that achieves near zero downtime of a plurality of arrangements of computers in an event of a failure of one or more computation units, thereby achieving a higher degree of fault tolerance than conventional fault tolerant architectures.
[0051] The present disclosure provides a system that achieves high fault tolerance without any additional hardware requirement compared to conventional systems.
[0052] The present disclosure provides a system that ensures each CPU executes the common application code regardless of the active fault tolerant architecture, including when the system is reconfigured from one fault tolerant architecture to another.
Claims:
1. A system (100) for providing a configurable fault tolerant architecture comprising:
at least four computation units (102, 104, 106, 108);
a configuration management unit (116) to monitor and verify a status of the at least four computation units (102, 104, 106, 108); and
a configurable voter logic unit (210) coupled with a shared memory (118),
wherein the configurable voter logic unit (210) enhances fault tolerance of the system by enabling the system to adapt to one or more different computing configurations in an event of a fault based on the status of the at least four computation units (102, 104, 106, 108) in real-time.
2. The system (100) as claimed in claim 1, wherein the configuration management unit (116) is configured to obtain health status, CPU activation status, and interface addresses from the at least four computation units (102, 104, 106, 108).
3. The system (100) as claimed in claim 1, wherein the configuration management unit (116) is configured to detect and isolate a fault of any of the at least four computation units (102, 104, 106, 108) at a modular level.
4. The system (100) as claimed in claim 1, wherein the configurable voter logic unit (210) is configured to make the system adapt to the one or more different computing configurations comprising 2oo3+1, 2oo2+1, 2oo2 and 1oo1 configurations of the at least four computation units (102, 104, 106, 108).
5. The system (100) as claimed in claim 1, wherein the configuration management unit (116) is connected to a watchdog timer (WDT) (206b), a hot swap controller (202), and a temperature and power monitoring unit (212) to collect the health status of each of the at least four computation units (102, 104, 106, 108).
6. The system (100) as claimed in claim 1, wherein the configurable voter logic unit (210) analyses health status of the at least four computation units (102, 104, 106, 108) for configuring the fault tolerant architecture.
7. The system (100) as claimed in claim 1, wherein the configurable voter logic unit (210) configures the fault tolerant architecture by applying Adaptive Redundancy Scaling techniques.
8. The system (100) as claimed in claim 1, wherein the configurable voter logic unit (210) scales the fault tolerant architecture to a lower fault tolerant scheme in an event of a single failure.
9. The system (100) as claimed in claim 1, wherein the configurable voter logic unit (210) scales the fault tolerant architecture to a higher fault tolerance level upon recovery from the fault.
10. A method (500) for providing a configurable fault tolerant architecture comprising steps of:
monitoring (504), by a configuration management unit (116), health and activation status of at least four computation units (102, 104, 106, 108);
verifying (514), by the configuration management unit (116), the activation status of the at least four computation units (102, 104, 106, 108) present; and
configuring (516), by a configurable voter logic unit (210), a fault tolerant architecture based on the verification.

Documents

Application Documents

#	Name	Date
1	202441019302-STATEMENT OF UNDERTAKING (FORM 3) [15-03-2024(online)].pdf	2024-03-15
2	202441019302-POWER OF AUTHORITY [15-03-2024(online)].pdf	2024-03-15
3	202441019302-FORM 1 [15-03-2024(online)].pdf	2024-03-15
4	202441019302-DRAWINGS [15-03-2024(online)].pdf	2024-03-15
5	202441019302-DECLARATION OF INVENTORSHIP (FORM 5) [15-03-2024(online)].pdf	2024-03-15
6	202441019302-COMPLETE SPECIFICATION [15-03-2024(online)].pdf	2024-03-15
7	202441019302-Proof of Right [16-09-2024(online)].pdf	2024-09-16
8	202441019302-POA [07-10-2024(online)].pdf	2024-10-07
9	202441019302-FORM 13 [07-10-2024(online)].pdf	2024-10-07
10	202441019302-AMENDED DOCUMENTS [07-10-2024(online)].pdf	2024-10-07
11	202441019302-Response to office action [01-11-2024(online)].pdf	2024-11-01