Integrated System On Chip (SoC) Based Auto Diagnosing Fault Tolerant

Integrated System On Chip (SoC) Based Auto Diagnosing Fault Tolerant (ADFT) Module

Abstract: The present disclosure provides a System On-Chip (SOC) based Auto Diagnosing Fault Tolerant (ADFT) system offering auto diagnosis, fault tolerance, short boot time and on-the-fly remote programming to increase system availability. The platform is based on indigenously developed IP cores for DVI, USB, Ethernet, configurable UARTs and ADC, multiple memories such as SD card and QSPI for booting, and a non-volatile memory cluster of NAND Flash, MRAM and NVSRAM for continuous storage of events for 3.88 days.

Patent Information

Application #

Filing Date

31 March 2023

Publication Number

40/2024

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Applicants

BHARAT ELECTRONICS LIMITED

Outer Ring Road, Nagavara, Bangalore 560045, Karnataka, India

Inventors

1. Suresh Ammaiappan

Embedded Systems/PDIC, Bharat Electronics Limited, Jalahalli P.O., Bangalore-560013, Karnataka, India

2. Pruthviraj N N

Embedded Systems/PDIC, Bharat Electronics Limited, Jalahalli P.O., Bangalore-560013, Karnataka, India

3. Yatam Venkata Krishna Reddy

Embedded Systems/PDIC, Bharat Electronics Limited, Jalahalli P.O., Bangalore-560013, Karnataka, India

4. Mohan Kumar M

Embedded Systems/PDIC, Bharat Electronics Limited, Jalahalli P.O., Bangalore-560013, Karnataka, India

5. Nihar Ranjan

Embedded Systems/PDIC, Bharat Electronics Limited, Jalahalli P.O., Bangalore-560013, Karnataka, India

Specification

TECHNICAL FIELD
[0001] The present disclosure relates generally to a system on chip (SoC). The disclosure more particularly relates to System On-Chip (SOC) based Auto Diagnosing Fault Tolerant (ADFT) system.
BACKGROUND
[0002] Over the last decade, the demand for higher speed, more memory, and more communication interfaces and I/O lines in smart processing modules has been increasing exponentially. Intelligence can be built in through controllers, processors, FPGAs, integrated SOCs, etc. The degree of programmable and configurable intelligence depends on the processing capability, instruction execution speed, number of processor cores, supported memory range, interconnection capability, power consumption, etc. Generally, GPUs or CPUs such as Intel core processors, AMD core processors, etc., are used to build SBCs and SOMs because they offer a larger number of high-speed processor cores in a single chip and support an OS, so that more functionality can be realized by using the inbuilt library functions. SOCs with multiple inbuilt ARM controllers, on the other hand, are suited to application-dense, real-time and multimedia domains, across low, medium, high and even very high speed range applications. In addition to these inbuilt ARM controllers, such SOCs offer vendor-specific soft-core processors such as NIOS, BLAZE, etc., many architecture-specific hard cores (DSP slices, memory elements, PLLs, temperature sensors, ADCs, DACs, etc.) and many soft-core libraries such as PCIe IP core, LAN IP core, configurable UART IP core, etc. Accordingly, integrated SOC based SBC and SOM modules, or custom computing modules designed with a unique fault tolerant architecture, are proposed for low, medium, high and even very high speed range applications, including real-time and multimedia applications, where low power, command, control, computing and communication are the prime requirements.
[0003] In an integrated SOC based SOM module, the SOM carries the basic off-chip components required to operate the SOC, such as power supply, reset, clock, boot memory, JTAG, on-chip memory expansion circuits such as DDR, and additional data memory such as NAND and NOR, together with the transceiver or adapter circuits that extend the available standard on-chip peripherals such as PCIe, LAN, USB, UART, I2C, SPI, etc. All the interfaces for B2B and C2C and the unused GPIOs are terminated at the connectors.
[0004] There are no automotive grade processors suitable for high computational requirements. The unique architecture, in contrast, is designed considering the compatibility requirements for different environments. The SOCs and all the ICs on the SOM are selected for pin-to-pin and footprint compatibility to support different operating environments.
[0005] The proposed platform embodies and discloses: error-free or low-error communication; adaptability and resilience of the system even under an unexpected sequence of operations, without compromise on performance, through auto/self-diagnosing capability; quick booting; quick recovery through online monitoring implemented with a specialized debugging structure; large memory to trace the events; a rich set of interfaces for communication, sensing and control; indigenously developed IPs for memory controllers, for ADC configuration and access to the digital samples of multiple channels, and for various opto-isolated sensor and control signals; and various booting mechanisms, including remote programming.
[0006] There is a tradeoff between resources, power and size. Processors and controllers have limited resources and hence cannot accommodate many interfaces on a single board. One approach to accommodate more interfaces is to use many processors in cascade, but their high price and the difficulty of programming them have resulted in a very low acceptance rate. Consequently, users prefer to use low-cost general-purpose workstations with limited resources. Some applications demand redundant memories to avoid any data loss that could lead to critical failure during the mission. Hence, the proposed platform has been designed with a provision for memory expansion through indigenously developed memory controllers in the programmable logic section of the SOC.
[0007] US patent no. 11,314,508 B1 dated April 26, 2022, relates to an FPGA based computing system for processing data in size, weight, and power constrained environments. The document describes technologies that are well suited for use in size, weight, and power (SWAP)-constrained environments. A host controller dispatches data processing instructions to hardware acceleration engines (HAEs) of one or more field programmable gate arrays (FPGAs) and further dispatches data transfer instructions to a memory controller, such that the HAEs perform processing operations on data stored in local memory devices of the HAEs in parallel with other data being transferred from external memory devices coupled to the FPGAs to the local memory devices.
[0008] US patent no. 11,169,722 B2 dated November 09, 2021, relates to a memory system and SoC including linear address remapping logic. The document discusses a system-on-chip connected to a first memory device and a second memory device. The system-on-chip comprises a memory controller configured to control an interleaving access operation on the first and second memory devices. A modem processor is configured to provide an address for accessing the first or second memory devices. A linear address remapping logic is configured to remap an address received from the modem processor and to provide the remapped address to the memory controller. The memory controller performs a linear access operation on the first or second memory device in response to receiving the remapped address.
[0009] US patent no. 10,372,859 B2 dated August 06, 2019, relates to a system and method for designing System on Chip (SoC) circuits using Single Instruction Multiple Agent instructions. The document discloses a system and method for designing an SoC by using a reinforcement learning processor. An SoC specification input is received, and a plurality of domains and a plurality of subdomains are created using an application specific instruction set to generate a chip specific graph library. An interaction is initiated between the reinforcement learning agent and the reinforcement learning environment using the application specific instructions. Each of the SoC subdomains from the plurality of SoC subdomains is mapped to a combination of environment, rewards and actions by a second processor. Further, interaction of a plurality of agents with the reinforcement learning environment is initiated a predefined number of times, and the Q value, V value, R value and A value are updated in the second memory module. Thereby, an optimal chip architecture for designing the SoC is acquired using the application domain specific instruction set (ASI).
[0010] US patent no. 9,891,687 B2 dated February 13, 2018, relates to an image forming apparatus, System On Chip (SoC) unit, and driving method thereof. The document discusses an image forming apparatus that is connected to a host device, includes first and second power domains which are separately supplied with power, and includes first and second memories disposed in the second power domain, a main controller disposed in the first power domain to perform a control operation using the first memory in a normal mode, and a sub-controller disposed in the second power domain to perform a control operation using the second memory in a power-saving mode. When the normal mode is changed to the power-saving mode, the power supply to the first power domain is shut off, the first memory operates in a self-refresh mode, and the main controller copies central processing unit (CPU) context information into a context storage unit. When the power-saving mode is changed to the normal mode, the main controller is booted using the CPU context information stored in the context storage unit.
[0011] Hence, there is a need for a system which not only automatically diagnoses itself but also takes appropriate steps to resolve any failure in the system.

SUMMARY
[0012] This summary is provided to introduce concepts of the invention related to a System On-Chip (SOC) based Auto Diagnosing Fault Tolerant (ADFT) system, as disclosed herein. This summary is neither intended to identify essential features of the invention as per the present invention nor is it intended for use in determining or limiting the scope of the invention as per the present invention.
[0013] In accordance with the present invention, A System On-Chip (SOC) based Auto Diagnosing Fault Tolerant (ADFT) system comprising: a plurality of communication terminals configured to connect to a plurality of external subsystems and a plurality of memories; a processor connected to the plurality of communication terminals and configured to perform a sequence of operations comprising: receive data from the external subsystems in a predetermined time-period cycle and store the data in at least one memory; determine status of one or more predetermined events occurring at each of the external subsystems; determine one or more of the external subsystems from which data is not received; and set priority to receive data from the determined external subsystems in the next predetermined time-period cycle; and display the status of each of the external subsystems based on the data.
[0014] In an embodiment, the processor is configured to perform fault tolerance through continuous tracking of the sequence of operations.
[0015] In an embodiment, the processor is configured to update boot image while the current boot image is under operation.
[0016] In an embodiment, the processor is configured to perform fault tolerance upon occurrence of faults, wherein the faults are system failure, incorrect sequence of operation, and data error.
[0017] In an embodiment, the predetermined time-period is 10 milliseconds (ms).
[0018] In an embodiment, the data is stored in the plurality of the memories with a real-time clock data.
[0019] In an embodiment, when a data error occurs in the data received from one or more sensors of the external subsystems, the processor is configured to perform fault tolerance by considering an average value of the data received for each of the external subsystems within the predetermined time-period.
[0020] In an embodiment, when data error occurs in the data received from the external subsystems, the processor is configured to perform fault tolerance by performing parity check or requesting retransmission from the external subsystems.
[0021] In an embodiment, when incorrect sequence of operation occurs, the processor is configured to perform fault tolerance by generating an alarm if the predetermined event is not determined at least three times consecutively.
[0022] In an embodiment, when system failure occurs, the processor is configured to perform fault tolerance by restoring the system data and resuming the sequence of operations from where it was stopped.
[0023] In an embodiment, the plurality of memories store the same data to ensure data availability during occurrence of any fault.
[0024] In another aspect of the present invention, a method for Auto Diagnosing Fault Tolerant (ADFT) comprises: receiving, by a processor, data from a plurality of external subsystems in a predetermined time-period cycle through a plurality of communication terminals; storing, by the processor, the data in one or more memories; determining, by the processor, status of one or more predetermined events occurring at each of the external subsystems; determining, by the processor, one or more of the external subsystems from which data is not received; setting priority, by the processor, to receive data from the determined external subsystems in the next predetermined time-period cycle; and displaying, by a display unit, the status of each of the external subsystems.
[0025] In an embodiment, the method further comprises performing fault tolerance, by the processor, through continuous tracking of the sequence of operations.
[0026] In an embodiment, the method further comprises updating the boot image, by the processor, while the current boot image is under operation.
[0027] In an embodiment, the method further comprises performing fault tolerance, by the processor, upon occurrence of faults, wherein the faults are system failure, incorrect sequence of operation, and data error.
[0028] In an embodiment, the predetermined time-period is 10 milliseconds (ms).
[0029] In an embodiment, the method further comprises storing the data in the plurality of the memories with real-time clock data.
[0030] In an embodiment, when a data error occurs in the data received from one or more sensors of the external subsystems, the method comprises performing fault tolerance by considering an average value of the data received for each of the external subsystems within the predetermined time-period.
[0031] In an embodiment, upon occurrence of data error in the data received from the external subsystems, the method comprises performing fault tolerance by performing parity check or requesting retransmission from the external subsystems.
[0032] In an embodiment, when an incorrect sequence of operation occurs, the method comprises performing fault tolerance by generating an alarm if the predetermined event is not determined at least three times consecutively.
[0033] In an embodiment, upon occurrence of system failure, the method comprises performing fault tolerance by restoring the system data and resuming the sequence of operations from where it was stopped.
[0034] In an embodiment, the method further comprises storing the same data into the plurality of memories to ensure data availability during occurrence of any fault.
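By way of a non-limiting illustration, the three fault-handling rules recited in the embodiments above (averaging within a cycle, parity check with retransmission, and an alarm after three consecutive missed events) may be sketched as follows. This is a minimal sketch under stated assumptions and not the disclosed implementation: the names `average_of_cycle`, `even_parity_ok`, `receive_checked` and `MissAlarm`, the use of a single even-parity bit, and the retry limit are all hypothetical.

```python
from statistics import mean


def average_of_cycle(samples):
    """Data-error fallback: use the average of the samples seen in one cycle."""
    return mean(samples)


def even_parity_ok(payload: bytes, parity_bit: int) -> bool:
    """Check a single even-parity bit over the payload (assumed scheme)."""
    ones = sum(bin(byte).count("1") for byte in payload)
    return (ones & 1) == parity_bit


def receive_checked(read_frame, max_retries=2):
    """Read a frame; on parity failure, ask again up to max_retries times.

    read_frame is a hypothetical callable returning (payload, parity_bit);
    calling it again stands in for a retransmission request.
    """
    for _ in range(max_retries + 1):
        payload, parity_bit = read_frame()
        if even_parity_ok(payload, parity_bit):
            return payload
    return None  # caller decides how to report the persistent error


class MissAlarm:
    """Raise an alarm when an expected event is missed N cycles in a row."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.misses = {}

    def update(self, event_id, seen: bool) -> bool:
        """Record one cycle for event_id; return True if the alarm fires."""
        if seen:
            self.misses[event_id] = 0
            return False
        self.misses[event_id] = self.misses.get(event_id, 0) + 1
        return self.misses[event_id] >= self.threshold
```

In this sketch the miss counter is reset as soon as the event is seen again, so only consecutive misses count towards the alarm.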

BRIEF DESCRIPTION OF ACCOMPANYING DRAWINGS
[0035] The detailed description is described with reference to the accompanying figures.
[0036] Figure 1 illustrates a functional block diagram, according to an exemplary implementation of the present disclosure.
[0037] Figure 2 illustrates functional block diagram of Auto Diagnosing Fault Tolerant (ADFT) system, according to an exemplary implementation of the present disclosure.
[0038] It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative methods embodying the principles of the present disclosure. Similarly, it will be appreciated that any flow charts, flow diagrams, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
DETAILED DESCRIPTION
[0039] The present disclosure describes a System On-Chip (SOC) based Auto Diagnosing Fault Tolerant (ADFT) system.
[0040] In the following description, for purpose of explanation, specific details are set forth in order to provide an understanding of the present disclosure. It will be apparent, however, to one skilled in the art that the present disclosure may be practiced without these details. One skilled in the art will recognize that embodiments of the present disclosure, some of which are described below, may be incorporated into a number of systems.
[0041] However, the systems and methods are not limited to the specific embodiments described herein. Further, structures and devices shown in the figures are illustrative of exemplary embodiments of the present disclosure and are meant to avoid obscuring the present disclosure.
[0042] It should be noted that the description merely illustrates the principles of the present invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described herein, embody the principles of the present invention. Furthermore, all examples recited herein are principally intended expressly to be only for explanatory purposes to help the reader in understanding the principles of the invention and the concepts contributed by the inventor to furthering the art and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass equivalents thereof.
[0043] The present disclosure relates to System On-Chip (SOC) based Auto Diagnosing Fault Tolerant (ADFT) system.
[0044] In an embodiment, the present disclosure describes a high-computing smart module built on an integrated System On Chip (SOC), combining in a single embodiment: auto diagnosis; fault tolerance; short boot time (in the msec range) for mission critical applications; on-the-fly remote programming to increase system availability; and a platform based on indigenously developed IP cores for DVI, USB, Ethernet, configurable UARTs and ADC, multiple memories such as SD card and QSPI for booting, and a non-volatile memory cluster of NAND Flash, MRAM and NVSRAM for continuous storage of events for 3.88 days. Further discussed are: the capability to be used as an SBC with multiple memories; black-box type functionality, not only for restoring the system data but also for resuming operation from where it was stopped due to an unexpected power failure; processing capability from the inbuilt dual ARM cores and a large programmable logic section for concurrent execution of standard and custom interfaces, along with uninterrupted storage of events in non-volatile memories; a rich set of indigenously developed IP cores for simultaneous operation in interrupt mode; and quick and easy testing and debugging through the inbuilt POST and BITE based testing and verification modules along with a custom test jig.
[0045] In an embodiment, the present disclosure discloses an auto diagnosing system in which the occurrence of all the predetermined events within 10 msec, along with indexed values of the data captured from other subsystems, is stored in non-volatile memories. Once every 10 msec, these events are captured irrespective of their sequence of occurrence. At the end of the 10 msec segment, if any of the events has not occurred within the specified time segment, the module can be configured to proceed with the data captured in the previous time segment, with the further option of notifying the subsystem. If the other subsystem is not active, the module, which includes a GUI, displays the status that the particular event is not occurring from that particular subsystem. Even if one or more events occur at the end of the 10 msec frame, the processing engine performs the required operations on first priority and, in parallel, the module proceeds to the next segment frame. If a particular sequence within the 10 msec is preferred, indexing of events, processing time and waiting time for an event, and a handshaking mechanism along with visual alarms are enabled accordingly.
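As a hedged illustration of the 10 msec segment handling described above, the following sketch closes one segment, falls back to the previous segment's data for any subsystem that did not report, and moves the silent subsystems to the front of the next cycle. The function names `close_frame` and `next_poll_order` and the dictionary-based data model are assumptions made for illustration only.

```python
def close_frame(expected, received, previous):
    """Close one 10 ms segment.

    expected: ordered list of subsystem names that should report.
    received: dict of subsystem name -> data captured in this segment.
    previous: dict of subsystem name -> data from the previous segment.
    Returns (frame, missing): the data to proceed with, and the list of
    subsystems that did not report (to be shown on the GUI or notified).
    """
    frame = {}
    missing = []
    for name in expected:
        if name in received:
            frame[name] = received[name]
        else:
            missing.append(name)
            if name in previous:
                # Configurable fallback: proceed with the last captured data.
                frame[name] = previous[name]
    return frame, missing


def next_poll_order(expected, missing):
    """Give priority in the next cycle to subsystems that stayed silent."""
    return list(missing) + [name for name in expected if name not in missing]
```

For example, with `expected = ["a", "b", "c"]` and only `a` and `c` reporting, `b` is flagged as missing, its previous value is carried forward, and it is polled first in the next segment.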
[0046] In an embodiment, the system further discloses that all the captured data are stored with real time clock (RTC) values which are used for continuous tracking and auto diagnosing.
[0047] In an embodiment, an Integrated System-on-Chip (SOC) based Auto Diagnosing Fault Tolerant (ADFT) module is developed for mission critical applications with high reliability and availability as the inherent factors. Hence, to support maximum availability of the module, the boot image is updated while the current boot image is under operation, through remote programming using the serial port and Ethernet ports. The mode of remote programming is selected based on the location of the product deployment and the required urgency. On-the-fly remote programming is thus achieved to increase equipment availability.
[0048] Further, the same serial port and Ethernet based interfaces are used to download the complete memory data for offline analysis and for record maintenance. The communication is first authenticated using secure access and with assured Data integrity.
[0049] In an embodiment, the module is developed for mission critical applications that require a very short initial boot time, in the range of milliseconds. The claimed maximum quick boot time is less than 1 sec, achieved through an increased SPI clock, a compressed boot image and an appropriately selected QSPI flash.
[0050] In an embodiment, the module uses a structure of multiple non-volatile memories to provide quick access, large data storage for extended tracking capability, redundancy to ensure data availability under all circumstances, and correctness of the data. Since NAND Flash access is slower than MRAM and NVSRAM access, one page (8K) of captured data is written first into MRAM and, in parallel, into NVSRAM. The data are then copied into NAND Flash in the background. Data are stored in NAND Flash and MRAM in linear circular addressing mode, whereas in NVSRAM, 16K is used as buffers.
[0051] In an embodiment, the Integrated System On Chip (SOC) based Auto Diagnosing Fault Tolerant (ADFT) module is configured to work as an SBC for mission critical applications, with multiple high-speed interfaces such as dual Gigabit Ethernet interfaces with jumbo frame support up to 8K, PCIe, USB, multiple serial ports, dual inbuilt ARM cores, etc. The module supports the PetaLinux OS, and its Configurable Platform Architecture makes the platform suitable for different ranges and scales of applications.
[0052] In an embodiment, the fault tolerant architecture is achieved through error detection and correction mechanisms, non-volatile memories for continuous storage of events, and continuous tracking of the sequence of operations, without compromise on performance.
[0053] In an embodiment, continuous tracking of events for more than 100 hours, with high-reliability data capture sequences, is possible.
[0054] In an embodiment, the drivers, APIs and the BSP operate in both OS and non-OS environments through effective use of interrupts instead of the standard polling method.
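The buffering scheme described above may be modelled, purely for illustration, as a logger that writes each record into a circular fast-memory region and flushes a full 8K page to a circular NAND page array. This is a simplified sketch: the class name `EventLogger`, the in-memory `bytearray` and list standing in for MRAM and NAND, and the 256-byte record size (taken from the status frame described later in the specification) are assumptions, and the parallel NVSRAM copy is omitted for brevity.

```python
PAGE_SIZE = 8 * 1024   # one NAND page of captured data
RECORD_SIZE = 256      # one status record (assumed from the 256-byte frame)


class EventLogger:
    """Toy model: circular MRAM region plus page-wise circular NAND store."""

    def __init__(self, mram_size, nand_pages):
        if mram_size % RECORD_SIZE:
            raise ValueError("mram_size must be a multiple of RECORD_SIZE")
        self.mram = bytearray(mram_size)
        self.mram_head = 0           # next write offset, wraps around
        self.page = bytearray()      # page being assembled for NAND
        self.nand = [None] * nand_pages
        self.nand_head = 0           # next NAND page index, wraps around

    def write(self, record: bytes):
        if len(record) != RECORD_SIZE:
            raise ValueError("record must be exactly RECORD_SIZE bytes")
        # Fast path: store the record in MRAM using circular addressing.
        end = self.mram_head + RECORD_SIZE
        self.mram[self.mram_head:end] = record
        self.mram_head = end % len(self.mram)
        # Background path: once a full page is collected, copy it to NAND.
        self.page += record
        if len(self.page) == PAGE_SIZE:
            self.nand[self.nand_head] = bytes(self.page)
            self.nand_head = (self.nand_head + 1) % len(self.nand)
            self.page.clear()
```

With these sizes, 32 records fill one page, so a NAND write happens once for every 32 record writes.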
[0055] Referring to figure 1, the board includes a ZynqSoC (350K) (hereafter referred to as the processor, controller or system on chip (SoC)) with 64MB QSPI flash, 20MB MRAM, 1MB NVSRAM, 8GB NAND Flash and 1024MB DDR3 memory. The board supports two Ethernet ports, one USB port, five RS422 ports and two RS232 ports to the backplane. At least forty-one Digital Output channels (DOPs) and twenty-eight Digital Input channels (DIPs) are provided to the backplane from the FPGA through level converters. At least eight ADC channels are connected to the backplane for system control and monitoring. The on-board 1MB NVSRAM is provided for critical data backup. One RS232 port is separately available for debug purposes. The board also supports two RTC ports (one RTC connected to the NVSRAM and the other connected to the PS and PL side), one RESET port, one external current sensor port and one external voltage sensor port. This is the proven board configuration. However, owing to the indigenous development of IP cores for all the interfaces and to the Configurable Platform Architecture (CPA), the number and type of interfaces are configurable and expandable based on the application(s).
[0056] The ZynqSoC belongs to the Zynq family, which is based on the Xilinx SoC architecture. The chosen device integrates a dual-core ARM Cortex-A9 based processing system (PS) and Xilinx programmable logic (PL) in a single device. The ARM Cortex-A9 CPUs are the heart of the PS, which also includes on-chip memory, external memory interfaces, and peripheral connectivity interfaces to the USB PHY, Gigabit Ethernet PHY, CAN interface, RS422 and RS232 transceivers, etc. The PS and PL are interconnected through ARM AMBA AXI based high-bandwidth connectivity.
[0057] QSPI flash is used for loading the boot loader. Each QSPI flash device provides 32MB, and two such devices are used to provide 64MB. DDR3 memory of 1024MB is used as run-time memory. The 64-bit requirement of the 1024MB DDR3 section is realized using two DDR3 ICs, each of 32-bit data width. A 1MB NVSRAM is used on the board and is expandable as per requirement.
[0058] MRAM is the ideal memory solution for applications that must permanently store and retrieve critical data and programs quickly. The chosen MRAM provides highly reliable data storage over a wide range of temperatures. MRAM combines the density of DRAM, the speed of SRAM and the non-volatility of flash memory. In total, five such ICs of 4MB each are used to provide 20MB. NAND Flash is a type of non-volatile storage technology that does not require power to retain data.
[0059] NAND Flash devices may include an asynchronous data interface for high-performance I/O operations. There are five control signals used to implement the asynchronous data interface: CE#, CLE, ALE, WE#, and RE#. These devices use a highly multiplexed 8-bit bus to transfer commands, address, and data.
[0060] The Ethernet PHY provides a 10/100/1000 Mbps Ethernet interface for external board-to-board communication. The physical layer device is a single 10/100/1000 Mbps Gigabit Ethernet transceiver that implements the Ethernet physical layer portion of the 1000BASE-T, 100BASE-TX, and 10BASE-T standards. The Ethernet interface is configured on the Processing System (PS), that is, the ARM side. The device has an integrated switching voltage regulator to generate all required voltages and can operate in multiple modes. There are two on-chip Ethernet peripherals. At present, one Ethernet interface is incorporated using one of the two on-chip Ethernet cores; this interface does not support jumbo frames. The Ethernet interface provided in the PL through an IP core, in contrast, supports jumbo frames up to 8K. Any number of Ethernet interfaces can be provided, limited only by the application and the number of GPIO pins available on the chosen SoC.
[0061] The USB PHY may include at least one USB interface provided from the PS section of the FPGA. A Hi-Speed USB 2.0 transceiver provides a configurable physical layer (PHY) solution and is an excellent match for a wide variety of products. A 24 MHz clock source is dedicated to the PS section for USB. Outstanding ESD robustness eliminates the need for external ESD protection devices in typical applications.
[0062] The ADC (Analogue to Digital Converter) may include at least one ADC chip, which supports eight analogue input channels. The said ADC is a 16-bit, simultaneous-sampling, analog-to-digital Data Acquisition System (DAS) with eight channels. The single-supply operation, on-chip filtering, and high input impedance eliminate the need for driver op amps and external bipolar supplies. A flexible interface and a fully integrated data acquisition solution may thus be achieved.
[0063] The board supports 28 DIP channels and 41 DOP channels. DIP channels take the input through VME / VPX connector and convert the 28V input into 3.3V CMOS signal as input to PL section of FPGA. These hermetically sealed Opto-couplers are capable of operation and storage over the high temperature range. DOP channels are controlled with low-level 3.3V and 1.8V CMOS input signals respectively. Each DOP control acts as a switch in allowing and sending the 28V input as output to a particular channel.
[0064] The board supports five RS422 and two RS232 interfaces, of which two RS422 interfaces and one RS232 (debug) interface are connected to the PS and the remaining interfaces are connected to the PL. The RS232 debug interface is connected to a micro D9 connector, and the rest are extended to the VME / VPX connector.
[0065] At least two clock sources, 33.33 MHz (ASEMB-33.333MHz-LY-T) and 24 MHz (ASEMB-24.000MHz-LY-T), are connected to the PS section of the SoC. Using the main 33.33 MHz clock source, all the clocks required for the internal core operations and for all the external interfaces can be derived. The other clock, 24 MHz, is used only for USB operation.
[0066] A miniature surface mount high-performance Inertial Measurement Unit (IMU) and Attitude Heading Reference System (AHRS) is used, as illustrated in figure 1. The sensor is considered both an IMU in that it can output acceleration, angular rate, and magnetic measurements along the X, Y, & Z axes of the sensor as well as an AHRS in that it can output filtered attitude estimates of the sensor with respect to a local coordinate frame. It senses the co-ordinates and sends signals to FPGA. The sensor is interfaced with the FPGA using SPI Protocol.
[0067] The real-time clock (RTC) device is programmed serially through an I2C bidirectional bus. It has a built-in power-sense circuit that detects power failures and automatically switches to the backup supply.
[0068] PCIe Gen 2 of x4 configuration and 5 Gbps bandwidth has been used on the board. The board has been tested as an endpoint configured device with on board clock and reset. Data from host is stored into DDR3 or Block memory Generator. The design is implemented in Programmable Logic Side of the SoC. PCIe data transfer in both QSPI and SD Card mode of booting has been tested. Image size has been decreased from 300 MB to 9 MB by changing the file system from system EXT4 to JFFS2.
[0069] Solid state Relay (SSR) is a Photovoltaic Relay which is a single-pole, normally open solid-state relay. It is particularly suited for isolated switching of high currents from 12 to 48 Volt AC or DC power sources. N number of SSR based controls can be provided and can be customized based on the application requirement.
[0070] Board Support Package (BSP) and Application Portable Interface (API): BSP with API has been developed to collect a status frame of 256 bytes for every 10 msec. A counter value representing every 10ms and RTC value are part of this status frame. The occurrence of the events in 10ms Nth frame are verified in N+1th frame and accordingly the priority is reassigned and accordingly the status information is shared for the required set up verification and for the corrective action by the operator. In case of the occurrence of the errors are consistent, though they are corrected to some extent, the same info is passed for the cross verification and correcting the set up based on the observation. In case of the rearranged order, developed functionality is to report the status and does not wait beyond the time limit to ensure the proper sequence of operation. Periodically all the slave interfaces such as NAND flash, MRAM, NVSRAM, are verified based on the known plain text writing and reading.
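To make the status frame and the periodic memory check more concrete, the following sketch packs a 256-byte frame that begins with a frame counter and an RTC value, and verifies a memory by writing and reading back a known pattern. The field widths (a 32-bit counter and a 64-bit RTC value), the byte order, the test pattern and all function names are assumptions for illustration; the specification does not define the frame layout.

```python
import struct

FRAME_SIZE = 256
# Hypothetical header layout: little-endian 32-bit counter, 64-bit RTC value.
HEADER = struct.Struct("<IQ")


def pack_status_frame(counter: int, rtc_value: int, payload: bytes) -> bytes:
    """Build one fixed-size status frame, zero-padding the payload."""
    room = FRAME_SIZE - HEADER.size
    if len(payload) > room:
        raise ValueError("payload does not fit in the status frame")
    header = HEADER.pack(counter & 0xFFFFFFFF, rtc_value)
    return header + payload.ljust(room, b"\x00")


def unpack_status_frame(frame: bytes):
    """Return (counter, rtc_value, payload_with_padding)."""
    counter, rtc_value = HEADER.unpack_from(frame)
    return counter, rtc_value, frame[HEADER.size:]


def verify_memory(write, read, pattern: bytes = b"\xA5\x5A" * 8) -> bool:
    """Write a known pattern through `write` and compare what `read` returns.

    write(data) and read(length) are hypothetical accessors for one memory
    device such as NAND flash, MRAM or NVSRAM.
    """
    write(pattern)
    return read(len(pattern)) == pattern
```

Masking the counter to 32 bits lets it wrap naturally; at one frame per 10 ms a 32-bit counter would take well over a year to wrap.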
[0071] For bare metal applications, where the image size is in terms of MB, a QSPI based booting mechanism is provided, whereas for OS based applications, which include a file system and a large image, SD card based booting is provided. For remote programming, Serial and Ethernet based options are provided to increase the equipment availability. In both options, the image file is transferred from a PC to the board. The image is first stored in DDR and then, using specific commands, moved from DDR to QSPI, where the new image overwrites the old one. Booting time is 800 ms with QSPI booting. The QSPI IO frequency is 166 MHz. For bare metal applications, QSPI booting is preferred and is faster than the other booting options.
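The two-step update described above (stage the image in DDR, then move it to QSPI) can be sketched as follows. This is a simplified model only: the memories are plain byte buffers, and the CRC-32 integrity check, the `ERASED` fill value and the function names are assumptions that are not stated in the disclosure.

```python
import zlib

ERASED = 0xFF  # assumption: unused flash area is left in the erased state


def stage_in_ddr(chunks):
    """Reassemble the image received over Serial or Ethernet into one DDR buffer."""
    return b"".join(chunks)


def commit_to_qspi(qspi: bytearray, staged: bytes, expected_crc: int) -> bool:
    """Overwrite the old QSPI image with the staged one if the staged copy is intact."""
    if len(staged) > len(qspi) or zlib.crc32(staged) != expected_crc:
        return False  # the old image is left untouched
    qspi[:] = staged + bytes([ERASED]) * (len(qspi) - len(staged))
    return True
```

Staging in DDR first means a corrupted or interrupted transfer can be rejected before the existing boot image is touched.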
[0072] The inertial measurement unit and attitude heading reference system (IMU-AHRS) device is interfaced with the FPGA using an SPI interface. Roll, pitch, yaw, angular rate and acceleration values are captured and sent to the GUI. The board is calibrated by mounting it on a 3-axis turntable, and the measurements are within a tolerance of ±1 deg. All the UARTs can be configured to any user provided baud rate, which makes the board usable across multiple slave systems. The design is implemented on the Programmable Logic side of the SoC.
[0073] The data (a 256-byte health packet) is sent to the GUI every 10 ms. The board includes 8 GB of NAND flash. More than 100 hours of continuous tracking is possible with the 8 GB NAND flash. A NAND flash bad block feature is implemented to skip any bad blocks that may develop during the mission. Five MRAMs are provided for redundancy. 256 bytes of data are written to MRAM every 10 ms, and once 8K of data has accumulated, the data is written to NAND flash. The data is thus available in both MRAM and NAND. Data is overwritten only after the 20 MB of MRAM is filled, and the old data remains available in the NAND flash.
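As a non-limiting worked example, the storage figures above can be checked with simple arithmetic. The sketch assumes that 8 GB, 20 MB and 8K denote binary units and that every 256-byte packet is stored without additional overhead; neither assumption is stated in the disclosure.

```python
FRAME_BYTES = 256                 # health packet size
FRAMES_PER_SECOND = 1000 // 10    # one packet every 10 ms
PAGE_BYTES = 8 * 1024             # NAND page size (8K)
MRAM_BYTES = 20 * 1024**2         # total MRAM (20 MB, assumed binary)
NAND_BYTES = 8 * 1024**3          # NAND flash (8 GB, assumed binary)

bytes_per_second = FRAME_BYTES * FRAMES_PER_SECOND    # 25600 bytes/s
frames_per_page = PAGE_BYTES // FRAME_BYTES           # 32 packets per page
page_interval_ms = frames_per_page * 10               # 320 ms between NAND writes
mram_seconds = MRAM_BYTES / bytes_per_second          # time until MRAM wraps
nand_hours = NAND_BYTES / bytes_per_second / 3600     # time until NAND wraps
nand_days = nand_hours / 24

print(f"MRAM wraps after {mram_seconds:.1f} s")
print(f"NAND holds {nand_hours:.1f} h ({nand_days:.2f} days)")
```

Under these assumptions a NAND page is filled every 320 ms, the MRAM wraps after about 819 s, and the NAND flash holds about 93 hours (3.88 days, the figure given in the abstract) of packets.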
[0074] The GUI dashboard is designed to show the operational mode of the system, power status, and overall peripheral wise status. The GUI has 2 modes of operations as explained below:
[0075] Test Operational Mode: This mode is used to test the hardware peripheral by peripheral. Each peripheral is evaluated for different parameters and the results are shown on the respective peripheral pages. The ADC sensor values are parsed, scaled, converted to human readable formats and displayed. The IMU-AHRS sensor values arrive as a structure of single-precision floating point values, which are converted to hex and then to a human readable format for display. The DOPs can be set from the GUI, and all the rules and logic for the DOPs are applied according to the business use-case. The memories are tested completely by uploading known data from the GUI, downloading the same data from the memories, and comparing the two for correctness. The GUI provides options to upload known data to the different memories, download and store the data, compare it byte by byte, and show the results. The GUI has options to trigger erasure of each memory separately. The GUI can also process the downloaded data into structures for analysis, and the processed data can be downloaded as a file for further analysis. Downloading of data from the memories is available redundantly over both the Serial and Ethernet options.
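By way of a hedged illustration, the hex conversion of single-precision values and the byte-by-byte memory comparison could look like the sketch below. The big-endian byte order and the function names are assumptions; the disclosure does not describe the GUI implementation.

```python
import struct


def float32_from_hex(hex_word: str) -> float:
    """Decode an 8-digit hex word as an IEEE 754 single-precision value."""
    return struct.unpack(">f", bytes.fromhex(hex_word))[0]


def float32_to_hex(value: float) -> str:
    """Encode a value as the hex form of its single-precision representation."""
    return struct.pack(">f", value).hex()


def compare_bytes(written: bytes, read_back: bytes):
    """Return the offsets at which the uploaded and downloaded data differ."""
    shorter = min(len(written), len(read_back))
    longer = max(len(written), len(read_back))
    mismatches = [i for i in range(shorter) if written[i] != read_back[i]]
    # Any length difference counts as a mismatch from the shorter length onward.
    mismatches.extend(range(shorter, longer))
    return mismatches
```

An empty list from `compare_bytes` corresponds to a passed memory test; a non-empty list gives the offsets to report on the peripheral page.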
[0076] Main Operational Mode: This mode is used during operation of the system. A single-page dashboard with critical parameters and interfaces provides a snapshot view of the system during operations. It shows the status of the critical interfaces and the timer data. The real-time data can be viewed with a single click on the dashboard.
[0077] In an embodiment, the system on chip disclosed in figure 1 may further be connected to a display and to one or more external devices through the plurality of communication terminals present on the board. The display (not shown) is configured to display results of the system, or to act as an indicator of the system performance or the measurement data of the system. The external device is connected to the system to share data with the system and may include one or more processors, a display unit, a sensor to sense the data, etc. The external devices and the SoC are communicably connected to each other. The system further includes one or more communication terminals to communicate with a plurality of systems, a base station, any suitable communication device or a user device, to receive data from or transmit data to the user device.
[0078] Figure 2 illustrates a functional flow diagram implemented according to an embodiment of the present application. When a system as illustrated in figure 1 starts, the Auto Diagnosing Fault Tolerant (ADFT) system turns ON with it. The ADFT system is inherent in the application related operations implemented on the system or the board. Product software execution with ADFT features starts, and if the system's Power-On Self-Test (POST) passes, the main sequence of product operation begins. Fault tolerance and periodic BITE, as auto diagnosing, are developed to be inherent in the application related operations. If the POST fails, an alarm is raised and the product is not sent to the customer until it is repaired.
[0079] In an embodiment, Auto Diagnose is performed at POST and Periodical BITE.
[0080] In an embodiment, in the System On-Chip (SOC) based Auto Diagnosing Fault Tolerant (ADFT) system, fault tolerance is intentionally not enabled as part of POST. The fault tolerant architecture is achieved through error detection and correction mechanisms, redundancy, non-volatile memories for storing the continuous events, and continuous tracking of the sequence of operations.
[0081] In an embodiment, fault tolerance functions are implemented according to the diagnosis of the fault in the system. As a fault tolerant system, the system takes the following actions:
[0082] In an embodiment, the system illustrated in the figures has a plurality of memories connected to the main processor.
[0083] MEMORY REDUNDANCY: Multiple non-volatile memories provide quick access, large data storage for extended tracking capability, and redundancy to ensure the availability and the correctness of the data under all circumstances. Since NAND flash access is slower than that of MRAM and NVSRAM, one page (8K) of captured data is first written into MRAM and, in parallel, into NVSRAM. The data are then copied into NAND flash in the background. Data are stored in NAND flash and MRAM in linear circular addressing mode, whereas in NVSRAM, 16K regions are used as buffers.
[0084] SYSTEM FAILURE: Black box type functionality is provided not only to restore the system data but also to resume operation from where it was stopped due to an unexpected power failure.
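A minimal, non-limiting sketch of the circular addressing and page-wise copy to NAND flash is given below. The memories are modelled as byte buffers, the `page` accumulator stands in for the NVSRAM buffer, and the class names `RingStore` and `FrameLogger` are hypothetical; bad-block handling and the parallel NVSRAM write are omitted.

```python
FRAME_BYTES = 256        # data captured every 10 ms
PAGE_BYTES = 8 * 1024    # one NAND page


class RingStore:
    """Memory written in linear circular addressing mode."""

    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)
        self.pos = 0      # next write offset
        self.wraps = 0    # how many times the store has wrapped around

    def write(self, chunk: bytes) -> None:
        if not chunk or len(self.buf) % len(chunk):
            raise ValueError("capacity must be a multiple of the chunk size")
        self.buf[self.pos:self.pos + len(chunk)] = chunk
        self.pos += len(chunk)
        if self.pos == len(self.buf):
            self.pos = 0          # wrap: the oldest data is overwritten next
            self.wraps += 1


class FrameLogger:
    """Write each frame to MRAM at once and to NAND one full page at a time."""

    def __init__(self, mram_capacity: int, nand_capacity: int):
        self.mram = RingStore(mram_capacity)
        self.nand = RingStore(nand_capacity)
        self.page = bytearray()   # frames collected since the last NAND write

    def log_frame(self, frame: bytes) -> None:
        if len(frame) != FRAME_BYTES:
            raise ValueError("frame must be exactly 256 bytes")
        self.mram.write(frame)
        self.page += frame
        if len(self.page) == PAGE_BYTES:
            self.nand.write(bytes(self.page))
            self.page.clear()
```

Because the NAND store is much larger than the MRAM store, data overwritten in MRAM after a wrap remains available in NAND.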
[0085] SEQUENCE FLOW: The execution of each stage is verified and recorded before proceeding to the next operation. If event 1 has not been sensed but event 2 has been sensed, a further attempt is made to sense event 1. This attempt is time limited. If event 1 is not sensed three times consecutively, an alarm is generated to draw the operator's attention. Each event represents an external interface. To ensure better reception, error detection and correction are also applied to the received data, in addition to retransmission requests, to mitigate data errors from each interface.
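The three-consecutive-miss rule can be illustrated with the following sketch. The class name `SequenceMonitor` and the per-cycle `update` interface are assumptions, and the time limit on each sensing attempt is not modelled.

```python
class SequenceMonitor:
    """Count consecutive cycles in which each expected event was not sensed."""

    def __init__(self, events, limit: int = 3):
        self.misses = {name: 0 for name in events}
        self.limit = limit

    def update(self, sensed):
        """Record one cycle and return the events that should raise an alarm."""
        alarms = []
        for name in self.misses:
            if name in sensed:
                self.misses[name] = 0       # sensed again: the count restarts
            else:
                self.misses[name] += 1
                if self.misses[name] >= self.limit:
                    alarms.append(name)
        return alarms
```

For example, if "event1" is missed in three cycles running while "event2" keeps arriving, the third call to `update` returns `["event1"]`, which would be the point at which the operator is alerted.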
[0086] DATA ERROR: (1) SENSOR DATA: An averaged measurement is used instead of an instantaneous measurement. (2) Other interface DATA: Retransmission algorithm, checksum, parity check, and error detection and/or correction algorithm.
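The data error measures listed above may be sketched, in simplified and non-limiting form, as follows. The window length, the even-parity convention, the 8-bit additive checksum and all names are illustrative assumptions; the disclosure does not specify which algorithms are used.

```python
from collections import deque


class AveragedSensor:
    """Report the mean of the most recent samples instead of the latest one."""

    def __init__(self, window: int = 8):
        self.samples = deque(maxlen=window)

    def add(self, value: float) -> float:
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)


def parity_ok(byte: int, parity_bit: int) -> bool:
    """Even parity: the data bits plus the parity bit hold an even number of ones."""
    return (bin(byte & 0xFF).count("1") + parity_bit) % 2 == 0


def receive_with_retry(read_packet, max_retries: int = 3):
    """Call read_packet() until the checksum matches; return None if it never does.

    read_packet is a caller-supplied function returning (payload, checksum);
    each further call models one retransmission request.
    """
    for _ in range(max_retries):
        payload, checksum = read_packet()
        if sum(payload) & 0xFF == checksum:
            return payload
    return None
```

Averaging limits the effect of a single corrupted sensor sample, while the parity and checksum tests decide whether a retransmission is requested.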
[0087] In an exemplary embodiment, the present disclosure provides a System On-Chip (SOC) based Auto Diagnosing Fault Tolerant (ADFT) system comprising: a plurality of communication terminals configured to connect to a plurality of external subsystems and a plurality of memories; a processor connected to the plurality of communication terminals and configured to perform a sequence of operations comprising: receive data from the external subsystems in a predetermined time-period cycle and store the data in at least one memory; determine status of one or more predetermined events occurring at each of the external subsystems; determine one or more of the external subsystems from which data is not received; and set priority to receive data from the determined external subsystems in the next predetermined time-period cycle; and display the status of each of the external subsystems based on the data.
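One possible reading of the priority-setting operation is sketched below: subsystems from which no data was received in the current cycle are served first in the next cycle. The function name `reprioritise` and the list-based ordering are assumptions made for illustration and are not taken from the disclosure.

```python
def reprioritise(order, received):
    """Return the polling order for the next cycle.

    Subsystems that delivered no data in the current cycle move to the
    front, keeping their relative order; the others follow unchanged.
    """
    missing = [name for name in order if name not in received]
    present = [name for name in order if name in received]
    return missing + present
```

For instance, with the order `["imu", "adc", "uart1", "eth"]` and data received only from `"imu"` and `"eth"`, the next cycle would poll `"adc"` and `"uart1"` first.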
[0088] The processor is configured to perform fault tolerance through continuous tracking of the sequence of operations.
[0089] The processor is configured to update boot image while the current boot image is under operation.
[0090] The processor is configured to perform fault tolerance upon occurrence of faults, wherein the faults are system failure, incorrect sequence of operation, and data error. The predetermined time-period is 10 milliseconds (ms).
[0091] The data is stored in the plurality of the memories with a real-time clock data.
[0092] When data error occurs in the data received from one or more sensors of the external subsystems, the processor is configured to perform fault tolerance by considering an average value of data received for each of the external subsystems within the predetermined time-period.
[0093] When data error occurs in the data received from the external subsystems, the processor is configured to perform fault tolerance by performing parity check or requesting retransmission from the external subsystems.
[0094] When incorrect sequence of operation occurs, the processor is configured to perform fault tolerance by generating an alarm if the predetermined event is not determined at least three times consecutively.
[0095] When system failure occurs, the processor is configured to perform fault tolerance by restoring the system data and resuming the sequence of operations from where it was stopped. The plurality of memories stores same data to ensure data availability during occurrence of any fault.
[0096] In yet another exemplary implementation, the present disclosure provides a method for Auto Diagnosing Fault Tolerant (ADFT) comprising: receiving, by a processor, data from plurality of external subsystems in a predetermined time-period cycle through plurality of communication terminals; storing, by the processor, the data in one or more memories; determining, by the processor, status of one or more predetermined events occurring at each of the external subsystems; determining, by the processor, one or more of the external subsystems from which data is not received; and setting priority, by the processor, to receive data from the determined external subsystems in the next predetermined time-period cycle; and displaying, by a display unit, the status of each of the external subsystems.
[0097] The method further comprises performing fault tolerance, by the processor, through continuous tracking of the sequence of operations.
[0098] The method further comprises updating boot image, by the processor, while the current boot image is under operation.
[0099] The method further comprises performing fault tolerance, by the processor, upon occurrence of faults, wherein the faults are system failure, incorrect sequence of operation, and data error.
[00100] The predetermined time-period is 10 milliseconds (ms).
[00101] The method further comprises storing the data in the plurality of the memories with a real-time clock data.
[00102] When data error occurs in the data received from one or more sensors of the external subsystems, the method comprises performing fault tolerance by considering an average value of data received for each of the external subsystems within the predetermined time-period.
[00103] Upon occurrence of data error in the data received from the external subsystems, the method comprises performing fault tolerance by performing parity check or requesting retransmission from the external subsystems.
[00104] When incorrect sequence of operation occurs, the method comprises performing fault tolerance by generating an alarm if the predetermined event is not determined at least three times consecutively.
[00105] Upon occurrence of system failure, the method comprises performing fault tolerance by restoring the system data and resuming the sequence of operations from where it was stopped.
[00106] The method further comprises storing the same data in the plurality of memories to ensure data availability during occurrence of any fault.
[00107] In an embodiment, the ADFT system provides various advantages. Auto diagnosing: keeping track of the sequence of events, updating the event database, and reconfiguring, rescheduling and prioritizing the sequence to diagnose and adjust or repair the occurrence of an event so that it aligns with the predefined, configurable sequence of operations for the specific application.
[00108] Serial port and Ethernet based on-the-fly remote programming increases the equipment availability, and the same interfaces are used to download the complete NAND flash data for offline analysis.
[00109] Quick booting time, achieved with an increased SPI clock, a compressed boot image and an appropriately selected QSPI flash, for mission critical applications where the boot time has to be in terms of msec.
[00110] Multiple redundant memory options for fault tolerance and mission criticality.
[00111] Integrated System on Chip (SOC) based module that can be configured as an SBC with high-speed interfaces such as dual Gigabit Ethernet interfaces with jumbo frame support up to 8K, one PCIe interface, one USB interface, seven serial ports, dual inbuilt ARM cores, etc. Interrupt-driven drivers are provided for all the interfaces, with OS and non-OS based implementations.
[00112] Fault tolerant: Fault tolerant architecture achieved through error detection and correction mechanisms and through continuous tracking of the sequence of operations. Continuous tracking of the events for more than 100 hours.
[00113] The foregoing description of the invention has been set forth merely to illustrate the invention and is not intended to be limiting. Since modifications of the disclosed embodiments incorporating the substance of the invention may occur to a person skilled in the art, the invention should be construed to include everything within the scope of the invention.
CLAIMS:
1. A System On-Chip (SOC) based Auto Diagnosing Fault Tolerant (ADFT) system comprising:
a plurality of communication terminals configured to connect to a plurality of external subsystems and a plurality of memories;
a processor connected to the plurality of communication terminals and configured to perform a sequence of operations comprising:
receive data from the external subsystems in a predetermined time-period cycle and store the data in at least one memory;
determine status of one or more predetermined events occurring at each of the external subsystems;
determine one or more of the external subsystems from which data is not received; and
set priority to receive data from the determined external subsystems in the next predetermined time-period cycle; and
display the status of each of the external subsystems based on the data.
2. The system as claimed in claim 1, wherein the processor is configured to perform fault tolerance through continuous tracking of the sequence of operations.
3. The system as claimed in any one of claims 1 to 2, wherein the processor is configured to update boot image while the current boot image is under operation.
4. The system as claimed in any one of claims 1 to 3, wherein the processor is configured to perform fault tolerance upon occurrence of faults, wherein the faults are system failure, incorrect sequence of operation, and data error.
5. The system as claimed in any one of claims 1 to 4, wherein the predetermined time-period is 10 milliseconds (ms).
6. The system as claimed in any one of claims 1 to 5, wherein the data is stored in the plurality of the memories with a real-time clock data.
7. The system as claimed in any one of claims 1 to 6, wherein, when data error occurs in the data received from one or more sensors of the external subsystems, the processor is configured to perform fault tolerance by considering an average value of data received for each of the external subsystems within the predetermined time-period.
8. The system as claimed in any one of claims 1 to 6, wherein, when data error occurs in the data received from the external subsystems, the processor is configured to perform fault tolerance by performing parity check or requesting retransmission from the external subsystems.
9. The system as claimed in any one of claims 1 to 8, wherein, when incorrect sequence of operation occurs, the processor is configured to perform fault tolerance by generating an alarm if the predetermined event is not determined at least three times consecutively.
10. The system as claimed in any one of claims 1 to 9, wherein, when system failure occurs, the processor is configured to perform fault tolerance by restoring the system data and resuming the sequence of operations from where it was stopped.
11. The system as claimed in any one of claims 1 to 10, wherein the plurality of memories stores same data to ensure data availability during occurrence of any fault.
12. A method for Auto Diagnosing Fault Tolerant (ADFT) comprising:
receiving, by a processor, data from plurality of external subsystems in a predetermined time-period cycle through plurality of communication terminals;
storing, by the processor, the data in one or more memories;
determining, by the processor, status of one or more predetermined events occurring at each of the external subsystems;
determining, by the processor, one or more of the external subsystems from which data is not received; and
setting priority, by the processor, to receive data from the determined external subsystems in the next predetermined time-period cycle; and
displaying, by a display unit, the status of each of the external subsystems.
13. The method as claimed in claim 12, wherein the method further comprises performing fault tolerance, by the processor, through continuous tracking of the sequence of operations.
14. The method as claimed in any of claims 12 or 13, wherein the method further comprises updating boot image, by the processor, while the current boot image is under operation.
15. The method as claimed in any one of claims 12 to 14, wherein the method further comprises performing fault tolerance, by the processor, upon occurrence of faults, wherein the faults are system failure, incorrect sequence of operation, and data error.
16. The method as claimed in any one of claims 12 to 15, wherein the predetermined time-period is 10 milliseconds (ms).
17. The method as claimed in any one of claims 12 to 16, wherein the method further comprises storing the data in the plurality of the memories with a real-time clock data.
18. The method as claimed in any one of claims 12 to 17, wherein, when data error occurs in the data received from one or more sensors of the external subsystems, the method comprises performing fault tolerance by considering an average value of data received for each of the external subsystems within the predetermined time-period.
19. The method as claimed in any one of claims 12 to 18, wherein, upon occurrence of data error in the data received from the external subsystems, the method comprises performing fault tolerance by performing parity check or requesting retransmission from the external subsystems.
20. The method as claimed in any one of claims 12 to 19, wherein, when incorrect sequence of operation occurs, the method comprises performing fault tolerance by generating an alarm if the predetermined event is not determined at least three times consecutively.
21. The method as claimed in any one of claims 12 to 20, wherein, upon occurrence of system failure, the method comprises performing fault tolerance by restoring the system data and resuming the sequence of operations from where it was stopped.
22. The method as claimed in any one of claims 12 to 21, wherein the method further comprises storing same data into the plurality of memories to ensure data availability during occurrence of any fault.

Documents

Application Documents

#	Name	Date
1	202341024780-PROVISIONAL SPECIFICATION [31-03-2023(online)].pdf	2023-03-31
2	202341024780-PROOF OF RIGHT [31-03-2023(online)].pdf	2023-03-31
3	202341024780-FORM 1 [31-03-2023(online)].pdf	2023-03-31
4	202341024780-DRAWINGS [31-03-2023(online)].pdf	2023-03-31
5	202341024780-FORM-26 [16-06-2023(online)].pdf	2023-06-16
6	202341024780-FORM 3 [28-03-2024(online)].pdf	2024-03-28
7	202341024780-ENDORSEMENT BY INVENTORS [28-03-2024(online)].pdf	2024-03-28
8	202341024780-DRAWING [28-03-2024(online)].pdf	2024-03-28
9	202341024780-COMPLETE SPECIFICATION [28-03-2024(online)].pdf	2024-03-28
10	202341024780-POA [28-10-2024(online)].pdf	2024-10-28
11	202341024780-FORM 13 [28-10-2024(online)].pdf	2024-10-28
12	202341024780-AMENDED DOCUMENTS [28-10-2024(online)].pdf	2024-10-28