System And Method For Disk Drives Failure Prediction

Abstract: This disclosure relates generally to a system and method for disk drives failure prediction. Existing SMART attributes fail to give accurate decisions about drive reliability, as the values of these attributes are vendor specific. In the present disclosure, SMART attributes are identified from the dataset received from the Backblaze® daily snapshot, and a feature importance method of machine learning is implemented to obtain a subset of relevant SMART attributes. Further, life cycle analysis is performed to understand the behavior of, or correlation between, the subset of relevant SMART attributes. Based on the determined behavior of the SMART attributes, a machine learning model is trained using training data to predict soon-to-fail hard-disk drives, and the trained model is applied to test data, wherein one or more alerts are generated if the probability of occurrence of failure is higher than a defined threshold value. [To be published with FIG. 2]

Patent Information

Application #

Filing Date

16 September 2019

Publication Number

20/2023

Publication Type

INA

Invention Field

ELECTRONICS

Status

kcopatents@khaitanco.com

Parent Application

Patent Number

Legal Status

Grant Date

2024-04-27

Renewal Date

Applicants

Tata Consultancy Services Limited

Nirmal Building, 9th Floor, Nariman Point Mumbai 400021 Maharashtra, India

Inventors

1. GAUTAM, Richa

Tata Consultancy Services Limited Sector 1, 1, Vibhuti Khand Rd, Vijaipur Colony, Vibhuti Khand, Gomti Nagar Lucknow 226010 Uttar Pradesh, India

2. VISHWAKARMA, Amish

Tata Consultancy Services Limited Sector 1, 1, Vibhuti Khand Rd, Vijaipur Colony, Vibhuti Khand, Gomti Nagar Lucknow 226010 Uttar Pradesh, India

3. KUMAR, Sanjeev

Tata Consultancy Services Limited GG7 Sector 74A, SKYVIEW Corporate Park Gurugram 122004 Haryana, India

4. GAUPTA, Gaurav

Tata Consultancy Services Limited Sector 1, 1, Vibhuti Khand Rd, Vijaipur Colony, Vibhuti Khand, Gomti Nagar Lucknow 226010 Uttar Pradesh, India

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
SYSTEM AND METHOD FOR DISK DRIVES FAILURE PREDICTION
Applicant
Tata Consultancy Services Limited A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description:
The following specification particularly describes the invention and the
manner in which it is to be performed.

CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
[001] The present application claims priority from Indian provisional application no. 201921037335, filed on September 16, 2019. The entire contents of the aforementioned application are incorporated herein by reference.
TECHNICAL FIELD
[002] The disclosure herein generally relates to the field of failure prediction, and, more particularly, to a system and method for disk drives failure prediction.
BACKGROUND
[003] S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often written as SMART) is a monitoring system included in computer hard-disk drives (HDDs), solid-state drives (SSDs), and embedded multimedia card (eMMC) drives. The primary function of such SMART systems is to detect and report various indicators of drive reliability with the intent of anticipating imminent hardware failures. Further, SMART attributes are supported by different hard-disk drive manufacturers, each of which defines a threshold value for each attribute; if the value of an attribute goes beyond that threshold value, it is treated as abnormal behavior.
[004] A basic existing problem in the IT industry is abrupt hardware failure. Storage Original Equipment Manufacturers (OEMs) are trying to minimize the risk to data integrity and the risk of data loss for their customers by developing new solutions with the latest technologies to predict hardware failure. Further, in any large-scale datacenter, hard-disk drives serve as the backbone for storing data, and data loss can seriously hamper the organization's business both financially and operationally. Even with the improved reliability of hard-disk drives, disk failures do happen.
[005] Existing SMART attributes have various limitations. SMART attributes fail to give accurate decisions about drive reliability, as the values of these attributes are vendor specific. Further, lack of documentation on SMART statistics is one of the major concerns, as the drive manufacturers do not share specific details of use cases with stakeholders. Every vendor has defined its own SMART statistics which it wants to track, but vendors do not specify the purpose for which these statistics are being tracked. Further, SMART attributes are not correlated, and they do not have a linear relation with failed hard-disk drives. Further, a large fraction of hard-disk drives showed no sign of failure in any of their SMART monitoring features, making it difficult to achieve an accurate decision.
SUMMARY
[006] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one aspect, there is provided a processor implemented method for disk drives failure prediction. The method comprises receiving, via one or more hardware processors, a first set of statistics specific to one or more hard-disk drives, wherein the first set of statistics corresponds to a first predetermined time period; identifying a plurality of attributes from the first set of statistics specific to the one or more hard-disk drives; applying a machine learning technique on the plurality of attributes to identify at least a subset of relevant attributes affecting the prediction of failure in the one or more hard-disk drives; obtaining, via the one or more hardware processors, a second set of statistics specific to a plurality of failed hard-disk drives, wherein the second set of statistics comprises the subset of relevant attributes identified from the plurality of attributes, and wherein the second set of statistics corresponds to a second predetermined time period; performing a life cycle analysis on the second set of statistics to determine behavior of the subset of relevant attributes; and training, based on the determined behavior of the subset of relevant attributes comprised in the second set of statistics, a machine learning model using training data comprised in a balanced dataset obtained from the first set of statistics, and applying the trained machine learning model on test data to predict soon-to-fail hard-disk drives.

[007] In another aspect, there is provided a system for disk drives failure prediction. The system comprises: a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: receive a first set of statistics specific to one or more hard-disk drives, wherein the first set of statistics corresponds to a first predetermined time period; identify a plurality of attributes from the first set of statistics specific to the one or more hard-disk drives; apply a machine learning technique on the plurality of attributes to identify at least a subset of relevant attributes affecting the prediction of failure in the one or more hard-disk drives; obtain a second set of statistics specific to a plurality of failed hard-disk drives, wherein the second set of statistics comprises the subset of relevant attributes identified from the plurality of attributes, and wherein the second set of statistics corresponds to a second predetermined time period; perform a life cycle analysis on the second set of statistics to determine behavior of the subset of relevant attributes; and train, based on the determined behavior of the subset of relevant attributes comprised in the second set of statistics, a machine learning model using training data comprised in a balanced dataset obtained from the first set of statistics, and apply the trained machine learning model on test data to predict soon-to-fail hard-disk drives.
[008] In yet another aspect, there are provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause receiving, via the one or more hardware processors, a first set of statistics specific to one or more hard-disk drives, wherein the first set of statistics corresponds to a first predetermined time period; identifying a plurality of attributes from the first set of statistics specific to the one or more hard-disk drives; applying a machine learning technique on the plurality of attributes to identify at least a subset of relevant attributes affecting the prediction of failure in the one or more hard-disk drives; obtaining, via the one or more hardware processors, a second set of statistics specific to a plurality of failed hard-disk drives, wherein the second set of statistics comprises the subset of relevant attributes identified from the plurality of attributes, and wherein the second set of statistics corresponds to a second predetermined time period; performing a life cycle analysis on the second set of statistics to determine behavior of the subset of relevant attributes; and training, based on the determined behavior of the subset of relevant attributes comprised in the second set of statistics, a machine learning model using training data comprised in a balanced dataset obtained from the first set of statistics, and applying the trained machine learning model on test data to predict soon-to-fail hard-disk drives.
[009] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[010] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[011] FIG. 1 illustrates an exemplary system, for a disk drive failure prediction, in accordance with some embodiments of the present disclosure.
[012] FIG. 2 illustrates an exemplary block diagram of the system, for the disk drives failure prediction, in accordance with some embodiments of the present disclosure.
[013] FIG. 3 is a flow diagram, illustrating the steps involved in the disk drives failure prediction, in accordance with some embodiments of the present disclosure.
[014] FIG. 4 is a use case illustrating the implementation of the feature importance method to identify a plurality of attributes for the disk drives failure prediction, in accordance with some embodiments of the present disclosure.
[015] FIGS. 5A through 5D are use cases illustrating the life cycle analysis of the failed hard-disk drives with respect to individual SMART attributes, for the disk drives failure prediction, in accordance with some embodiments of the present disclosure.
[016] FIG. 6 is a use case illustrating the correlation between the SMART attributes during life cycle analysis of the failed hard-disk drives, for the disk drives failure prediction, in accordance with some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
[017] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope being indicated by the following claims.
[018] The S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) monitoring system, which is included in HDD, SSD and eMMC drives, helps in predicting the failure of hard-disk drives. However, existing SMART attributes fail to give accurate decisions about disk drive reliability, as the values of these attributes are vendor specific. Further, SMART attributes are not correlated, and they do not have a linear relation with failed hard-disk drives. Further, a large fraction of hard-disk drives showed no sign of failure in any of their SMART monitoring features, making it difficult to achieve an accurate decision.
[019] To overcome the above technical problem and to understand what went wrong and where, embodiments of the present disclosure provide systems and methods for disk drives failure prediction. More specifically, in the present disclosure, SMART attributes are identified from the dataset received from the Backblaze® daily snapshot, wherein a feature importance method of machine learning is implemented to obtain a subset of relevant attributes. Further, the present disclosure includes performing life cycle analysis to understand the correlation between the subset of relevant attributes obtained from the feature importance method. Further, in the present disclosure, a machine learning model is trained using training data to predict soon-to-fail hard-disk drives.
[020] Referring now to the drawings, and more particularly to FIG. 1 through FIG. 6, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[021] FIG. 1 illustrates an exemplary system 100, for a disk drive failure prediction, in accordance with some embodiments of the present disclosure. In an embodiment, the system 100 includes one or more processors 104, communication interface device(s) or input/output (I/O) interface(s) 106, disk drive(s) 108 and one or more data storage devices or memory 102 operatively coupled to the one or more processors 104. The one or more processors 104 that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, graphics controllers, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) are configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
[022] The I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
[023] The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, one or more modules (not shown) of the system 100 can be stored in the memory 102.
[024] In an embodiment, the system 100 includes one or more data storage devices or memory 102 operatively coupled to the one or more processors 104 and is configured to store instructions configured for execution of steps of the method 300 by the one or more processors 104.
[025] FIG. 2 illustrates an exemplary block diagram of the system 100, for the disk drives failure prediction, in accordance with some embodiments of the present disclosure. In an embodiment, “hard-disk drives” may be referred to as “disk drives”, and the two terms are used interchangeably in the present disclosure. In an embodiment, the modules of the system 200 utilized for the hard-disk drive failure prediction include an input data set module 202, a SMART attribute module 204, a feature importance and selection module 206, a life cycle analysis of fail disk drives module 208, a classifier module 210, a machine learning model module 216 and an output module 218. The classifier module 210 of the system 200 further includes a train data set 212 and a test data set 214.
[026] FIG. 3 is a flow diagram illustrating the steps involved in the disk drives failure prediction, in accordance with some embodiments of the present disclosure. Steps of the method of FIG. 3 shall be described in conjunction with the components of FIG. 2. At step 302 of the method 300, the one or more hardware processors 104 receive a first set of statistics specific to one or more hard-disk drives, wherein the first set of statistics corresponds to a first predetermined time period. Referring to FIG. 2, the input data set module 202 of the system 200 receives the first set of statistics (for example, the dataset of Q1-2016) specific to one or more hard-disk drives from Backblaze®, which provides daily snapshot reports of the SMART attributes and health of its hard-disk drives (HDDs). Backblaze® obtains a daily snapshot of each operational hard-disk drive, wherein the snapshot includes basic hard-disk drive information along with the SMART statistics reported by that hard-disk drive. Further, the daily snapshot of one hard-disk drive is one record or row of data, and all the hard-disk drive snapshots for a given day are collected into a file consisting of a row for each active hard-disk drive. The format of this file is "csv" (Comma Separated Values). Each day this file is named in the format YYYY-MM-DD.csv, for example, 2013-04-10.csv. The dataset columns include:
• Date - The date of the file in yyyy-mm-dd format.
• Serial Number - The manufacturer-assigned serial number of the drive.
• Model - The manufacturer-assigned model number of the drive.
• Capacity - The drive capacity in bytes
• Failure - Contains a "0" if the drive is OK. Contains a "1" if this is the last day the drive was operational before failing.
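The record layout described above can be sketched with the Python standard library. This is a minimal illustration only: the header names (`date`, `serial_number`, `model`, `capacity_bytes`, `failure`, `smart_5_raw`, `smart_9_raw`) and the two sample rows are assumptions made for this sketch, not values taken from the specification.

```python
import csv
import io

# Hypothetical two-row excerpt of a daily snapshot file such as 2013-04-10.csv.
# The header names and values are assumptions made for this sketch.
SNAPSHOT = """date,serial_number,model,capacity_bytes,failure,smart_5_raw,smart_9_raw
2013-04-10,SERIAL1,MODEL1,3000592982016,0,0,4123
2013-04-10,SERIAL2,MODEL2,4000787030016,1,24,20871
"""

TEXT_COLUMNS = ("date", "serial_number", "model")


def read_snapshot(text):
    """Return one dict per drive, with the numeric fields converted to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for key in list(row):
            if key not in TEXT_COLUMNS:
                row[key] = int(row[key])
        rows.append(row)
    return rows


rows = read_snapshot(SNAPSHOT)
# Drives whose failure flag is 1 were last operational on this day.
failed_today = [r["serial_number"] for r in rows if r["failure"] == 1]
print(failed_today)
```

In practice each daily file would be read from disk rather than from an in-memory string; the parsing logic stays the same.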
Backblaze® releases SMART statistics for hard-disk drives from different vendors. Among the large number of SMART attributes present in the dataset, the following SMART attributes, along with the capacity, are identified as the most significant for the prediction of hard-disk drive failure.
• SMART 1 (Read Error Rate) - Stores data related to the rate of hardware read errors that occurred when reading data from a disk surface.
• SMART 3 (Spin-Up Time) - Average time of spindle spin up (from zero RPM to fully operational).
• SMART 5 (Reallocated Sectors Count) - Count of reallocated sectors. The raw value represents a count of the bad sectors that have been found and remapped.
• SMART 9 (Power-On Hours) - Count of hours in power-on state. The raw value of this attribute shows total count of hours in power-on state.

• SMART 187 (Reported Uncorrectable Errors) - The count of errors that could not be recovered using hardware ECC.
• SMART 188 (Command Timeout) - The count of aborted operations due to HDD timeout.
• SMART 196 (Reallocation Event Count) - Count of remap operations. The raw value of this attribute shows the total count of attempts to transfer data from reallocated sectors to a spare area.
• SMART 197 (Current Pending Sector Count) - Count of "unstable" sectors (waiting to be remapped, because of unrecoverable read errors).
• SMART 198 (Uncorrectable Sector Count) - The total count of uncorrectable errors when reading/writing a sector.
SMART (Self-Monitoring, Analysis and Reporting Technology) includes various attributes. However, not all of the attributes are relevant for hard-disk drive analysis. Hence, the SMART attributes which contribute most to hard-disk failure analysis are identified.
[027] At step 304 of the present disclosure, the one or more hardware processors 104 identify a plurality of attributes from the first set of statistics specific to the one or more hard-disk drives. The plurality of attributes identified from the first set of statistics comprises information on (i) read error rate, (ii) spin up time, (iii) number of reallocated sectors, (iv) number of power on hours, (v) number of reported uncorrectable errors, (vi) command timeout, (vii) number of reallocation events, (viii) number of current pending sectors, (ix) number of uncorrectable sectors, and (x) capacity. Referring to FIG. 2, the SMART attribute identification module 204 of the system 200 identifies a set of SMART attributes, specifically 9 SMART attributes which are supported by the majority of hard-disk drive vendors, plus the capacity. However, based on the analysis, it was observed that not all the identified SMART attributes contribute equally to the prediction of failure of hard-disk drives, and also that the dataset (the first set of statistics) was very noisy, so that an irrelevant attribute might decrease the accuracy of the trained machine learning model.
[028] At step 306 of the present disclosure, the one or more hardware processors 104 apply a machine learning technique on the plurality of attributes to identify at least a subset of relevant attributes affecting the prediction of failure in the one or more hard-disk drives, as depicted in FIG. 4. The feature importance and selection module 206 of the system 200 implements a machine learning technique, namely, a feature importance method, to identify a subset of SMART attributes which are supported by the majority of disk drive vendors and contribute most to the hard-disk failure prediction. Further, including a high-impurity attribute may degrade the overall accuracy of the model, so the feature importance method is implemented to identify the features which contribute most to hard-disk drive failure prediction. The top 4 SMART attributes by feature ranking are Uncorrectable_Sector_Count = 0.336510, Current_Pending_Sector_Count = 0.288338, Power_On_Hours = 0.177049 and Reported_Uncorrectable_Errors = 0.122693; these 4 SMART attributes have the highest feature importance values among the 9 SMART attributes.
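As a rough illustration of impurity-based feature ranking, the sketch below scores each attribute by the largest Gini impurity decrease that a single threshold on that attribute can achieve, then normalizes the scores to sum to 1. This is a deliberately simplified stand-in: the specification does not state how its feature importance values were computed, and a tree ensemble would accumulate such decreases over many splits and trees. The attribute names and the six toy records are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split_gain(values, labels):
    """Largest impurity decrease achievable with one threshold on one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_features(columns, labels):
    """Return (name, normalized score) pairs, most important first."""
    gains = {name: best_split_gain(vals, labels) for name, vals in columns.items()}
    total = sum(gains.values()) or 1.0
    scores = {name: gain / total for name, gain in gains.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: six drive records, label 1 = failed.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "smart_198_raw": [0, 0, 0, 8, 16, 24],  # separates the two classes
    "smart_3_raw": [5, 9, 5, 9, 5, 9],      # only weakly informative
}
ranking = rank_features(columns, labels)
print(ranking)
```

Selecting the subset of relevant attributes then amounts to keeping the top entries of the ranking.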
[029] At step 308 of the present disclosure, the one or more hardware processors 104 obtain a second set of statistics specific to a plurality of failed hard-disk drives, wherein the second set of statistics comprises the subset of relevant attributes identified from the plurality of attributes comprised in the first set of statistics. The second set of statistics comprises at least (a) number of uncorrectable sectors, (b) number of current pending sectors, (c) number of power on hours, and (d) reported uncorrectable errors. Further, the second set of statistics corresponds to a second predetermined time period. For example, since the life of a hard-disk drive may span from 3 to 5 years, datasets of 3 consecutive years are considered; failed and working hard-disk drives are separated, and the 3 years of data for only the failed hard-disk drives are then merged to perform life cycle analysis of failed hard-disk drives, in order to understand the behavior of the SMART attributes with respect to time, which gives details about hard-disk drive health.
For example:
Dataset:
Years of data: 2013, 2014, 2015, Q1 2016
Number of files = 641 files, size varies from 4 MB to 16 MB
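The separation and merge described in this step can be sketched as follows. The record tuples and serial numbers are hypothetical; the sketch only assumes that each daily record carries a date, a serial number and the failure flag.

```python
from collections import defaultdict

# Hypothetical daily records merged from several years of snapshots:
# (date, serial_number, failure flag, power-on hours).
records = [
    ("2013-05-01", "A", 0, 100),
    ("2013-05-01", "B", 0, 900),
    ("2014-07-15", "A", 0, 10500),
    ("2014-07-15", "B", 1, 11400),
    ("2015-02-03", "A", 0, 15300),
]


def failed_drive_histories(rows):
    """Keep only drives that eventually failed, with their rows in date order."""
    failed = {serial for _, serial, failure, _ in rows if failure == 1}
    histories = defaultdict(list)
    for row in sorted(rows):  # ISO dates sort chronologically as strings
        if row[1] in failed:
            histories[row[1]].append(row)
    return dict(histories)


histories = failed_drive_histories(records)
print(histories)
```

Each resulting history is the full recorded life of one failed drive, which is the input the life cycle analysis works on.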
[030] At step 310 of the present disclosure, the one or more hardware processors 104 perform a life cycle analysis on the second set of statistics to determine the behavior of the subset of relevant attributes, as depicted in FIGS. 5A through 5D and FIG. 6. Further, the second set of statistics comprises (a) number of uncorrectable sectors, (b) number of current pending sectors, (c) number of power on hours, and (d) reported uncorrectable errors. As depicted in FIG. 2, the life cycle analysis of fail hard-disk drives module 208 of the system 200 is configured to perform life cycle analysis of the subset of SMART attributes identified using the feature importance method. The life cycle analysis of the subset of relevant SMART attributes is performed to understand the correlation between the SMART attributes, i.e., to analyze the data of failed hard-disk drives for the attributes selected by the feature importance method and to understand how the values of these attributes vary during the life span of a failed hard-disk drive. Further, the analysis is intended to help in identifying a threshold limit which can be set for failed hard-disk drives, in preparing a threshold-value-based decision tree to identify failed hard-disk drives, or in building a rule-based classification of soon-to-fail drives. However, although there is an incremental pattern, it varies from drive to drive and from vendor to vendor, and the analysis of an individual attribute does not provide any concrete or valuable information about hard-disk drive failure. Further, all the SMART attributes are plotted in a single graph to check whether there is any correlation between these attributes for failed hard-disk drives. The graph (as depicted in FIG. 6) makes clear that there is no pattern by which an administrator can detect whether a disk drive is healthy or not. Further, a single decision tree with a defined threshold value will not fit every disk drive.
[031] At step 312 of the present disclosure, the one or more hardware processors 104 train, based on the determined behavior of the subset of relevant attributes comprised in the second set of statistics, a machine learning model using training data comprised in a balanced dataset obtained from the first set of statistics (for example, the dataset of Q1-2016), and apply the trained machine learning model on test data to predict soon-to-fail hard-disk drives. The step 312 of FIG. 3 is performed by the machine learning model module 216 of the system 200, as depicted in FIG. 2. Here, the machine learning model refers to a random forest, which is a supervised machine learning model that develops a forest of decision trees and then amalgamates them to make a more accurate decision. In the present disclosure, training the model includes calculating the fraction of working hard-disk drives and the fraction of failed hard-disk drives comprised in the first set of statistics (for example, the dataset of Q1-2016). The fractions are calculated to understand the ratio of failed to working hard-disk drives. The classifier module 210 of the system 200 partitions the complete dataset (for example, the dataset of Q1-2016) into a training dataset 212 and a testing dataset 214, as depicted in FIG. 2. The complete dataset is divided in a ratio of 70:30, in one example embodiment. Such division of training and testing data shall not be construed as limiting the scope of the present disclosure. For example,
Working disk drive records: 3827272
Failed disk drive records: 234
Fraction of Failed Disk Drives: 6.113641624598368e-05
Fraction of Working Disk Drives: 0.9999388635837541
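The two fractions follow directly from the record counts. The short calculation below reproduces them; the variable names are illustrative only.

```python
# Record counts quoted in the example above.
working = 3827272
failed = 234

total = working + failed
fraction_failed = failed / total
fraction_working = working / total
print(fraction_failed, fraction_working)
```

With well under one failed record per ten thousand, a classifier trained on the raw data could score a very high accuracy by predicting "working" for every record, which is why the class balance is adjusted before training.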
Since the data for failed and working hard-disk drives is imbalanced, undersampling is performed on the imbalanced training dataset 212 to obtain a balanced dataset.
For example,
Working and Failed Disk details after Undersampling:
Fraction of failed disks after under sampling: 0.4
Fraction of working disks after under sampling: 0.5999999999999999
Working disk drive records: 350
Failed disk drive records: 234
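A minimal sketch of random undersampling is shown below, assuming that every failed record is kept and that working records are drawn at random without replacement. The record identifiers, the pool of 10000 working records and the fixed seed are placeholders; the specification does not state how the 350 working records were chosen.

```python
import random


def undersample(working, failed, n_working, seed=0):
    """Keep every failed record and a random subset of the working records."""
    rng = random.Random(seed)
    kept = rng.sample(working, n_working)  # sampling without replacement
    balanced = kept + list(failed)
    rng.shuffle(balanced)
    return balanced


# Hypothetical stand-in records: (record id, failure label).
working = [(f"w{i}", 0) for i in range(10000)]
failed = [(f"f{i}", 1) for i in range(234)]

balanced = undersample(working, failed, n_working=350)
share_failed = sum(label for _, label in balanced) / len(balanced)
print(len(balanced), share_failed)
```

Keeping 350 working records against 234 failed records gives roughly the 60:40 split reported above.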

The model is then trained on the training dataset using the random forest model, wherein only those SMART attributes which have the highest feature importance values are used for training.
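The idea of growing many trees on resampled data and combining their votes can be sketched in plain Python. The example below is a heavily simplified stand-in for a random forest: each "tree" is a one-split decision stump trained on a bootstrap sample and on one randomly chosen attribute, whereas a real random forest grows deeper trees and samples features at every split. The four-attribute feature vectors and all values are hypothetical.

```python
import random


def fit_stump(rows, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows."""
    best = (len(rows) + 1, 0.0)
    for threshold in sorted({x[feature] for x, _ in rows}):
        errors = sum((x[feature] > threshold) != bool(y) for x, y in rows)
        best = min(best, (errors, threshold))
    return feature, best[1]


def fit_forest(rows, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sampling with replacement
        forest.append(fit_stump(sample, rng.randrange(n_features)))
    return forest


def failure_probability(forest, x):
    """Fraction of stumps that vote 'failed' for feature vector x."""
    votes = sum(x[feature] > threshold for feature, threshold in forest)
    return votes / len(forest)


# Hypothetical training rows: (uncorrectable sectors, pending sectors,
# power-on hours, reported uncorrectable errors), label 1 = failed.
train = [
    ((0, 0, 1000, 0), 0), ((0, 0, 2000, 0), 0), ((0, 0, 3000, 0), 0),
    ((8, 16, 30000, 2), 1), ((24, 8, 35000, 5), 1), ((16, 32, 40000, 1), 1),
]
forest = fit_forest(train)
p_worn = failure_probability(forest, (30, 40, 45000, 6))
p_fresh = failure_probability(forest, (0, 0, 500, 0))
print(p_worn, p_fresh)
```

The vote fraction returned by `failure_probability` plays the role of the probability of failure that is later compared against an alert threshold.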
For example,
Data set - Q1 2016
Number of Files = 90+ files, size varies from 14mb -16mb
Train Dataset = 3827506
Test Dataset = 1640360
Working records = 3827272
Fail records =234
Also, the trained machine learning model validation includes:
For example:
Confusion matrix:
Array ([[1609804, 30453], [45, 58]])
True positive rate: 0.5631067961165048
False positive rate: 0.0019009537543836648
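The two rates can be recomputed from the printed matrix. The sketch below assumes the rows are the actual classes and the columns the predicted classes, i.e. the layout [[TN, FP], [FN, TP]]; the specification does not state the layout. Under that reading the true positive rate reproduces the quoted value, while the false positive rate computed from the matrix as printed comes out near 0.0186 rather than the quoted figure, so the quoted false positive rate may stem from a different run or matrix.

```python
# Confusion matrix as printed in the example, read as [[TN, FP], [FN, TP]].
# That layout is an assumption made for this sketch.
matrix = [[1609804, 30453], [45, 58]]
(tn, fp), (fn, tp) = matrix

true_positive_rate = tp / (tp + fn)    # share of failed drives that were caught
false_positive_rate = fp / (fp + tn)   # share of working drives flagged as failing
print(true_positive_rate, false_positive_rate)
```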
[032] In an embodiment, the output module 218 of the system 200 applies the trained machine learning model to a third set of statistics (for example, the dataset of 2017) to obtain at least one of a set of healthy hard-disk drives and a set of failed hard-disk drives. The third set of statistics comprises at least (a) number of uncorrectable sectors, (b) number of current pending sectors, (c) number of power on hours, and (d) reported uncorrectable errors. Further, each failed hard-disk drive from the set of failed hard-disk drives is associated with a value indicative of a probability of occurrence of failure. Further, the output module 218 of the system 200 compares the value indicative of the probability of occurrence of failure of each failed hard-disk drive from the set of failed hard-disk drives with a threshold value, and generates one or more alerts based on the comparison. In one example embodiment, the threshold value is 0.92, i.e., when the probability of occurrence of failure is greater than or equal to 0.92, an automated email notification is sent to an administrator to alert them to the hard-disk drives which are more prone to failure, so that the administrator can proactively take a backup of these hard-disk drives. Such value of the probability of occurrence of failure shall not be construed as limiting the scope of the present disclosure.
For example, the text message below depicts the email notification sent to an administrator/user.
Dear Admin,
Below is the list of hard-disk drives which our system has detected as soon to fail.
Model : Model1 , Serial Number : Serial Number1
Model : Model2 , Serial Number : Serial Number2
Model : Model3 , Serial Number : Serial Number3
Model : Model4 , Serial Number : Serial Number4
Model : Model5 , Serial Number : Serial Number5
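The thresholding and notification step might be sketched as follows. The example only builds the message and does not send it; the function name `build_alert`, the subject line, the recipient address and the prediction records are all hypothetical, while the 0.92 threshold is the example value from the description.

```python
from email.message import EmailMessage

THRESHOLD = 0.92  # example threshold value from the description


def build_alert(predictions, threshold=THRESHOLD):
    """Return an email for drives at or above the threshold, or None."""
    flagged = [p for p in predictions if p["probability"] >= threshold]
    if not flagged:
        return None
    lines = [
        "Dear Admin,",
        "",
        "Below is the list of hard-disk drives which our system has detected as soon to fail.",
        "",
    ]
    lines += [
        f"Model : {p['model']} , Serial Number : {p['serial_number']}"
        for p in flagged
    ]
    message = EmailMessage()
    message["Subject"] = "Hard-disk drives predicted to fail soon"
    message["To"] = "admin@example.com"  # placeholder address
    message.set_content("\n".join(lines))
    return message


# Hypothetical model output for three drives.
predictions = [
    {"model": "Model1", "serial_number": "Serial Number1", "probability": 0.97},
    {"model": "Model2", "serial_number": "Serial Number2", "probability": 0.40},
    {"model": "Model3", "serial_number": "Serial Number3", "probability": 0.92},
]
alert = build_alert(predictions)
body = alert.get_content()
print(body)
```

Handing the resulting message to a mail transport (for instance `smtplib`) is left out, since the delivery mechanism is not described in the specification.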
[033] FIG. 4 is a use case illustrating an implementation of the feature importance method to identify a plurality of attributes for the hard-disk drives failure prediction, in accordance with some embodiments of the present disclosure. The feature importance and selection module 206 of the system 200 implements a machine learning technique, namely, a feature importance method, to identify a subset of SMART attributes which are supported by the majority of hard-disk drive vendors and contribute most to the hard-disk failure prediction. The identification of SMART attributes using the machine learning technique is depicted in step 306 of FIG. 3.
[034] Below Table 1 depicts examples of implementation of feature importance method to identify a subset of relevant attributes which contributes most in the hard-disk drives failure prediction.

Attributes	Feature Importance
UncorrectableSectorCount	0.336510
CurrentPendingSectorCount	0.288338
PowerOnHours	0.177049
ReportedUncorrectableErrors	0.122693
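As a small worked illustration of how the scores in Table 1 might be used, the sketch below ranks the four attributes and sums their importances. The values are copied from Table 1; the helper names and the 0.2 cutoff are hypothetical, and the disclosure does not state how the scores were computed or scaled. If the scores are normalized to sum to 1 over all candidate attributes (a common convention for tree-based importance, assumed here), the four listed attributes together account for about 0.92 of the total.

```python
# Scores copied from Table 1; attribute names are the table's own.
FEATURE_IMPORTANCE = {
    "UncorrectableSectorCount": 0.336510,
    "CurrentPendingSectorCount": 0.288338,
    "PowerOnHours": 0.177049,
    "ReportedUncorrectableErrors": 0.122693,
}


def rank_attributes(importances):
    """Sort (attribute, score) pairs from most to least important."""
    return sorted(importances.items(), key=lambda item: item[1], reverse=True)


def select_relevant(importances, min_importance):
    """Keep attributes whose score meets a cutoff (the cutoff is hypothetical)."""
    return [name for name, score in rank_attributes(importances) if score >= min_importance]


ranked = rank_attributes(FEATURE_IMPORTANCE)
combined = sum(FEATURE_IMPORTANCE.values())

if __name__ == "__main__":
    for name, score in ranked:
        print(f"{name:<30}{score:.6f}")
    print(f"Combined importance of the four attributes: {combined:.6f}")
    print("Attributes with score >= 0.2:", select_relevant(FEATURE_IMPORTANCE, 0.2))
```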
[035] FIGS. 5A through 5D are use cases illustrating the life cycle analysis of the failed hard-disk drives with respect to individual SMART attributes, for the disk drives failure prediction, in accordance with some embodiments of the present disclosure. The life cycle analysis of the subset of SMART attributes is performed by obtaining a second set of statistics specific to a plurality of failed hard-disk drives, wherein the second set of statistics comprises the subset of relevant attributes, and wherein the second set of statistics corresponds to a second predetermined time period, as depicted in step 308 of FIG. 3. Referring to FIGS. 5A through 5D, the life cycle analysis of each of the four relevant SMART attributes is obtained by plotting, for the failed hard-disk drives, the value of that attribute against time. It is observed that once the raw value of an attribute increases, the hard-disk drive may fail. The life cycle analysis thus characterizes the behavior of each SMART attribute over a life span of 3 to 5 years: once the value of a SMART attribute rises from 0, the hard-disk drive may fail, but there is no fixed duration from which to predict when that hard-disk drive will fail. The four SMART attributes, i.e., the subset of relevant attributes comprised in the second set of statistics, which contribute most to hard-disk failure prediction are described below:
Uncorrectable Sector Count: The total count of uncorrectable errors when reading/writing a sector.
Current Pending Sector Count: Count of "unstable" sectors (waiting to be remapped because of unrecoverable read errors).
Power-On Hours: Count of hours in power-on state. The raw value of this attribute shows the total count of hours in power-on state.
Reported Uncorrectable Errors: The count of errors that could not be recovered using hardware ECC.
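The observation in paragraph [035], that a drive may fail once an attribute's raw value rises from 0 but after no fixed duration, can be illustrated with a short sketch. The daily series below are invented for illustration and are not taken from the dataset; the function names are likewise assumptions. The sketch applies to the error-count attributes, since Power-On Hours increases throughout a drive's life by definition.

```python
def first_increase_day(raw_values):
    """Return the index of the first daily reading above 0, or None if none exists."""
    for day, value in enumerate(raw_values):
        if value > 0:
            return day
    return None


def lead_time_days(raw_values):
    """Days between the first non-zero reading and the last reading.

    The last reading is taken to be the day the drive failed
    (an assumption about how each series is cut).
    """
    first = first_increase_day(raw_values)
    if first is None:
        return None
    return len(raw_values) - 1 - first


# Hypothetical daily raw values of one error-count attribute for three failed drives.
drive_a = [0, 0, 0, 8, 8, 16, 24]
drive_b = [0, 2, 2, 2, 2, 2, 2, 2, 2, 40]
drive_c = [0, 0, 0, 0]  # failed without the attribute ever rising

if __name__ == "__main__":
    for name, series in [("drive_a", drive_a), ("drive_b", drive_b), ("drive_c", drive_c)]:
        print(name, "lead time in days:", lead_time_days(series))
```

In this toy data the lead times differ from drive to drive (and one drive gives no warning on this attribute at all), which is the kind of variation that prevents a single fixed warning period from being read off the life cycle plots.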

[036] FIG. 6 is a use case illustrating the correlation between the SMART attributes during life cycle analysis of the failed hard-disk drives, for the hard-disk drives failure prediction, in accordance with some embodiments of the present disclosure. The life cycle analysis of the subset of relevant SMART attributes is performed to determine the correlation between the SMART attributes, i.e., to analyze the data of failed hard-disk drives for the attributes selected by the feature importance method and to understand how the values of these attributes vary during the life span of a failed hard-disk drive. To determine this correlation, the SMART attributes of the failed hard-disk drives are plotted in a single plot. However, the life cycle analysis of the relevant SMART attributes did not yield any threshold value of a SMART attribute that can be used to predict drive failure.
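The disclosure examines the correlation visually, by plotting the attributes of failed drives together. As a numerical complement, and not as the method described above, one could compute a Pearson correlation coefficient between two attribute series of the same drive. The sketch below assumes that choice; the series values and the function name are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series.

    Returns None when either series is constant, because the
    coefficient is undefined in that case.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical daily raw values for one failed drive.
    uncorrectable_sectors = [0, 0, 8, 8, 16, 24]
    pending_sectors = [0, 0, 8, 16, 16, 32]
    print("correlation:", pearson(uncorrectable_sectors, pending_sectors))
```

A coefficient near 1 would indicate that the two counts rise together over the drive's life. It says nothing about a failure threshold, consistent with the paragraph's conclusion that the life cycle analysis yields none.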
[037] Hence, the present disclosure provides a machine learning approach to identify the SMART attributes that contribute most to hard-disk drive failure prediction. The values of these SMART attributes are analyzed for failed hard-disk drives to understand how they vary during the life span of a failed hard-disk drive and to determine a threshold limit from which to prepare a decision tree or rule-based classification that classifies soon-to-fail hard-disk drives. Although there is some incremental pattern, it varies from drive to drive and from vendor to vendor. Because a single decision tree with a defined threshold value does not fit each and every hard-disk drive, a random forest machine learning model, which is a supervised machine learning model, is implemented to calculate the probability of occurrence of failure. Wherever the probability of occurrence of failure is greater than or equal to the defined threshold value, an automated email is triggered and sent to the administrator to alert him/her to these hard-disk drives so that one or more corrective measures can be taken (e.g., proactively backing up these drives and replacing defective drives). The present approach helps administrators identify the hard-disk drives that are more prone to failure and take corrective action accordingly. The automated email eliminates the need for regular manual monitoring of disk drives.
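To illustrate why an ensemble yields a probability where a single threshold rule yields only a yes/no answer, the sketch below uses a deliberately simplified stand-in for a random forest: bagged one-level decision stumps, each trained on a bootstrap sample using one randomly chosen attribute. It is not the model of the disclosure. The training rows, the function names, the number of stumps and the seed are all assumptions made for illustration.

```python
import random

# Hypothetical rows: (uncorrectable sectors, pending sectors,
# power-on hours, reported uncorrectable errors). Label 1 = failed.
ROWS = [
    (0, 0, 12000, 0), (0, 0, 20000, 0), (0, 0, 8000, 0), (0, 0, 15000, 0),
    (8, 16, 30000, 3), (24, 8, 41000, 1), (2, 40, 35000, 6), (16, 4, 28000, 2),
]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]


def train_stump(rows, labels, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows.

    The stump predicts 'failed' when row[feature] > threshold.
    """
    candidates = [float("-inf")] + sorted({row[feature] for row in rows})
    best_threshold, best_errors = None, None
    for threshold in candidates:
        errors = sum(
            (row[feature] > threshold) != bool(label)
            for row, label in zip(rows, labels)
        )
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return feature, best_threshold


def train_ensemble(rows, labels, n_stumps=25, seed=0):
    """Train stumps on bootstrap samples, each with a randomly chosen feature."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_stumps):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))
        stumps.append(
            train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return stumps


def failure_probability(stumps, row):
    """Fraction of stumps that vote 'failed' for this drive."""
    votes = sum(row[feature] > threshold for feature, threshold in stumps)
    return votes / len(stumps)


if __name__ == "__main__":
    stumps = train_ensemble(ROWS, LABELS)
    print("degraded drive:", failure_probability(stumps, (50, 60, 50000, 9)))
    print("healthy drive:", failure_probability(stumps, (0, 0, 5000, 0)))
```

Each stump is one fixed-threshold rule of the kind the paragraph finds insufficient on its own. Averaging many such rules, each fitted to a different resample and attribute, produces a vote fraction that can be compared against a threshold such as the example value 0.92.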
[038] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[039] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[040] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[041] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[042] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[043] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.

We Claim:
1. A processor implemented method, comprising:
receiving, via one or more hardware processors, a first set of statistics specific to one or more hard-disk drives, wherein the first set of statistics corresponds to a first predetermined time period (302);
identifying a plurality of attributes from the first set of statistics specific to the one or more hard-disk drives (304);
applying a machine learning technique on the plurality of attributes to identify at least a subset of relevant attributes affecting the prediction of failure in the one or more hard-disk drives (306);
obtaining, via one or more hardware processor, a second set of statistics specific to a plurality of failed hard-disk drives, wherein the second set of statistics comprises the subset of relevant attributes identified from the plurality of attributes, and wherein the second set of statistics corresponds to a second predetermined time period (308);
performing a life cycle analysis on the second set of statistics to determine behaviour of the subset of relevant attributes (310); and
training, based on the determined behaviour of the subset of relevant attributes comprised in the second set of statistics, a machine learning model using a training data comprised in a balanced dataset obtained from the first set of statistics and applying the trained machine learning model on a test data to predict soon to fail hard-disk drives (312).
2. The processor implemented method of claim 1, further comprising:
obtaining a third set of statistics corresponding to a plurality of attributes specific to one or more hard-disk drives;
fitting the trained machine learning model into the third set of statistics to obtain at least one of a set of healthy hard-disk drives and a set of failed hard-disk drives, wherein each failed hard-disk drive from the set of failed hard-disk drives is associated with a value indicative of a probability of failure occurrence; and
performing a comparison of (i) the value indicative of the probability of failure occurrence of each failed hard-disk drive from the set of failed hard-disk drives and (ii) a threshold value and generating one or more alerts based on the comparison.
3. The processor implemented method of claim 2, wherein the plurality of attributes identified from the first set of statistics comprises information on (i) read error rate, (ii) spin up time, (iii) number of reallocated sectors, (iv) number of power on hours, (v) number of reported uncorrectable errors, (vi) command timeout, (vii) number of reallocation events, (viii) number of current pending sector, (ix) number of uncorrectable sectors and capacity, and wherein each of the second set of statistics and the third set of statistics comprises at least (a) number of uncorrectable sectors, (b) number of current pending sector, (c) number of power on hours, and (d) reported uncorrectable errors.
4. The processor implemented method of claim 3, wherein the disk drive health is based on at least one of the read error rate, the spin up time raw, the number of reallocated sectors, the number of power on hours, the number of reported uncorrectable errors, the command timeout, the number of reallocation events, the number of current pending sector, and the number of uncorrectable sectors.
5. A system (100), comprising:
a memory (102) storing instructions;
one or more communication interfaces (106);
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
receive, via one or more hardware processors, a first set of statistics specific to one or more hard-disk drives, wherein the first set of statistics corresponds to a first predetermined time period;
identify a plurality of attributes from the first set of statistics specific to the one or more hard-disk drives;
apply a machine learning technique on the plurality of attributes to identify at least a subset of relevant attributes affecting the prediction of failure in the one or more hard-disk drives;
obtain, via one or more hardware processor, a second set of statistics specific to a plurality of failed hard-disk drives, wherein the second set of statistics comprises the subset of relevant attributes, and wherein the second set of statistics corresponds to a second predetermined time period;
perform a life cycle analysis on the second set of statistics to determine behavior of the subset of relevant attributes;
train, based on the determined behavior of the subset of relevant attributes comprised in the second set of statistics, a machine learning model using a training data comprised in a balanced dataset obtained from the first set of statistics and apply the trained machine learning model on a test data to predict soon to fail hard-disk drives.
6. The system (100) as claimed in claim 5, wherein the one or more hardware processors are further configured by the instructions to:
obtain a third set of statistics corresponding to a plurality of attributes specific to one or more hard-disk drives;
fit the trained machine learning model into the third set of statistics to obtain at least one of a set of healthy hard-disk drives and a set of failed hard-disk drives, wherein each failed disk drive from the set of failed hard-disk drives is associated with a value indicative of a probability of failure occurrence; and
perform a comparison of (i) the value indicative of the probability of failure occurrence of each failed hard-disk drive from the set of failed hard-disk drives and (ii) a threshold value and generating one or more alerts based on the comparison.

7. The system (100) as claimed in claim 6, wherein the plurality of attributes identified from the first set of statistics comprises information on (i) read error rate, (ii) spin up time, (iii) number of reallocated sectors, (iv) number of power on hours, (v) number of reported uncorrectable errors, (vi) command timeout, (vii) number of reallocation events, (viii) number of current pending sector, (ix) number of uncorrectable sectors and capacity, and wherein each of the second set of statistics and the third set of statistics comprises at least (a) number of uncorrectable sectors, (b) number of current pending sector, (c) number of power on hours, and (d) reported uncorrectable errors.
8. The system (100) as claimed in claim 6, wherein the disk drive health is based on at least one of the read error rate, the spin up time raw, the number of reallocated sectors, the number of power on hours, the number of reported uncorrectable errors, the command timeout, the number of reallocation events, the number of current pending sector, and the number of uncorrectable sectors.

Documents

Application Documents

#	Name	Date
1	201921037335-STATEMENT OF UNDERTAKING (FORM 3) [16-09-2019(online)].pdf	2019-09-16
2	201921037335-PROVISIONAL SPECIFICATION [16-09-2019(online)].pdf	2019-09-16
3	201921037335-FORM 1 [16-09-2019(online)].pdf	2019-09-16
4	201921037335-DRAWINGS [16-09-2019(online)].pdf	2019-09-16
5	201921037335-Proof of Right (MANDATORY) [18-11-2019(online)].pdf	2019-11-18
6	201921037335-ORIGINAL UR 6(1A) FORM 1-201119.pdf	2019-11-22
7	201921037335-Proof of Right (MANDATORY) [28-11-2019(online)].pdf	2019-11-28
8	201921037335-ORIGINAL UR 6(1A) FORM 1-271119.pdf	2019-11-30
9	201921037335-FORM-26 [19-03-2020(online)].pdf	2020-03-19
10	201921037335-FORM 18 [15-09-2020(online)].pdf	2020-09-15
11	201921037335-ENDORSEMENT BY INVENTORS [15-09-2020(online)].pdf	2020-09-15
12	201921037335-DRAWING [15-09-2020(online)].pdf	2020-09-15
13	201921037335-CORRESPONDENCE-OTHERS [15-09-2020(online)].pdf	2020-09-15
14	201921037335-COMPLETE SPECIFICATION [15-09-2020(online)].pdf	2020-09-15
15	Abstract1.jpg	2021-10-19
16	201921037335-FER.pdf	2023-09-06
17	201921037335-FER_SER_REPLY [24-11-2023(online)].pdf	2023-11-24
18	201921037335-CLAIMS [24-11-2023(online)].pdf	2023-11-24
19	201921037335-PatentCertificate27-04-2024.pdf	2024-04-27
20	201921037335-IntimationOfGrant27-04-2024.pdf	2024-04-27

Search Strategy

1	201921037335E_05-09-2023.pdf