
System(s) And Method(s) To Facilitate Data Processing For Predicting Medical Disorders

Abstract: System and method to facilitate prediction of neurological disorders is disclosed. System comprises distributed databases to store medical data and processors linked to cloud platform for accessing data stored in distributed databases. System applies statistical processing technique on data to generate statistical data which comprises genetic variables and neural images and applies regression technique on statistical data to prepare correlations between genetic variables and neural images to obtain first set of values. System filters first set of values to obtain statistical significant values and generates dataset based on first set of values, statistical significant values and prescription data. System accepts dataset, develops supervised learned methodology to derive data patterns. Supervised learned methodology is technique to identify abnormal statistical data representing disorder. Data patterns are abnormal statistical values representing disorder. System maps data patterns to pre-stored neurological disorder data to identify abnormal statistical values from data pattern to predict neurological disorder.


Patent Information

Filing Date: 29 October 2013
Publication Number: 29/2015
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Grant Date: 11 October 2023

Applicants

TATA CONSULTANCY SERVICES LIMITED
NIRMAL BUILDING, 9TH FLOOR, NARIMAN POINT, MUMBAI 400021, MAHARASHTRA, INDIA

Inventors

1. AHAMED, SYED AZAR
TATA CONSULTANCY SERVICES LIMITED, PIONEER BUILDING, 12TH FLOOR, INTERNATIONAL TECH PARK, WHITEFIELD ROAD, BANGALORE - 560066, KARNATAKA, INDIA
2. VIJAYAKUMAR, SENTHILKUMAR
TATA CONSULTANCY SERVICES LIMITED, PIONEER BUILDING, 12TH FLOOR, INTERNATIONAL TECH PARK, WHITEFIELD ROAD, BANGALORE - 560066, KARNATAKA, INDIA
3. KUMAR, ROHIT
TATA CONSULTANCY SERVICES LIMITED, PIONEER BUILDING, 12TH FLOOR, INTERNATIONAL TECH PARK, WHITEFIELD ROAD, BANGALORE - 560066, KARNATAKA, INDIA

Specification

FORM 2
THE PATENTS ACT 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13).
TITLE OF INVENTION:
SYSTEM(S) AND METHOD(S) TO FACILITATE DATA PROCESSING FOR PREDICTING MEDICAL DISORDERS
Applicant
Tata Consultancy Services Limited, a company incorporated in India under The Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.

TECHNICAL FIELD
[001] The present subject matter described herein, in general, relates to data
processing system(s) and method(s) designed to facilitate forecasting, and more particularly to the system(s) and method(s) to facilitate medical data processing for predicting medical disorders.
BACKGROUND
[002] Medical disorders, particularly neurological disorders, are precarious conditions for living beings and affect the lives of millions globally. An estimated 6.8 million people die every year as a result of these disorders. Neurological disorders need to be identified, and medical attention needs to be provided, as early as possible; otherwise, their impact may prove fatal. People suffering from such disorders undergo various medical treatments. Numerous tests such as MRI scans, EEGs, and so on are conducted in order to detect abnormalities in neural function and treat the disease. The test results are accumulated as records and are used by medical specialists and doctors for treatment.
[003] This procedure has been followed for many decades, and over time the accumulation of data in the form of scan images and numerical and graphical reports has grown in size. Large amounts of data are acquired, processed, compared, annotated and archived for future reference. This data may contain rich information that may be analyzed and used to improve the existing medical treatment process pertaining to various diseases such as Alzheimer's disease (AD), Parkinson's disease (PD), and other neurological diseases. Further, this data may also be used by medical researchers and scientists for study and research to find better and improved medical solutions.
[004] The major challenges faced in the neuroscience domain are data organization, data integration and processing. Also, the traditional course of treatment is sequential and time consuming. In addition, the data analysis in this area is too complex for traditional data processing applications. Thus, no adequate support is provided to medical practitioners and researchers to analyze the available data, predict a disorder well before it becomes fatal, and provide the best possible solution.
SUMMARY
[005] This summary is provided to introduce aspects related to system(s) and
method(s) to facilitate data processing for prediction of neurological disorders and the
aspects are further described below in the detailed description. This summary is not
intended to identify essential features of the claimed subject matter nor is it intended
for use in determining or limiting the scope of the claimed subject matter.
[006] In one implementation, a system to facilitate prediction of neurological
disorders is described. The system comprises one or more distributed databases configured to store the data in one or more formats, wherein the data is a medical data and one or more processors linked to a cloud platform for accessing data stored in the distributed databases. The system further comprises one or more memories coupled to the one or more processors, wherein the one or more processors are capable of executing a plurality of modules stored in the one or more memories. The plurality of modules further comprises a computational module, a correlating module, a filtering module, a combination module and a machine learning module. The computational module is configured to apply a statistical processing technique on the data based on the one or more formats of the data in order to generate statistical data, wherein the statistical data comprises one or more genetic variables and one or more neural images. The correlating module is configured to apply a regression technique on the statistical data to prepare correlations between the genetic variables and the neural images in order to obtain one or more first set of values, wherein the first set of values represents a statistical significance of the correlation. The filtering module is configured to filter the one or more first set of values based upon a predefined threshold in order to obtain a set of statistical significant values. The combination module is configured to generate a dataset based on the one or more first set of
values, the set of statistical significant values, and prescription data. The machine
learning module is configured to accept the dataset and iteratively develop a
supervised learned methodology using the dataset to derive one or more data patterns,
wherein the supervised learned methodology is a technique to identify abnormal
statistical data representing a disorder, and the one or more data patterns are one or
more abnormal statistical values representing a disorder. The machine learning
module is further configured to map the data patterns to pre-stored neurological
disorder data representing an abnormality in order to identify one or more abnormal
statistical values from the data pattern to predict the neurological disorder.
[007] In one implementation, a method to facilitate prediction of
neurological disorders is disclosed. The method comprises accepting and storing the data in one or more formats in one or more distributed databases wherein the data is medical data. The method further comprises applying a statistical processing technique on the data based on the one or more formats of the data in order to generate statistical data, wherein the statistical data comprises one or more genetic variables and one or more neural images and applying a regression technique on the statistical data to prepare correlations between the genetic variables and the neural images in order to obtain one or more first set of values, wherein the first set of values represents a statistical significance of the correlation. The method further comprises filtering the one or more first set of values based upon a predefined threshold interval in order to obtain a set of statistical significant values and generating a dataset based on the one or more first set of values, the set of statistical significant values, and prescription data. The method further comprises iteratively developing a supervised learned methodology using the dataset to derive one or more data patterns, wherein the supervised learned methodology is a technique to identify abnormal statistical data representing a disorder, and the one or more data patterns are one or more abnormal statistical values representing a disorder and mapping the data patterns to pre-stored neurological disorder data representing an abnormality to identify one or more abnormal statistical values from the data pattern to predict the neurological
disorder. The accessing, the applying the statistical processing technique, the applying the regression technique, the filtering, the iteratively developing, the mapping and the identifying of the method are executed by one or more processors linked to a cloud platform.
BRIEF DESCRIPTION OF THE DRAWINGS
[008] The detailed description is described with reference to the
accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
[009] Figure 1 illustrates a network implementation of a system to facilitate
prediction of neurological disorders, in accordance with an embodiment of the present subject matter.
[0010] Figure 2 illustrates the system to facilitate prediction of neurological
disorders, in accordance with an embodiment of the present subject matter.
[0011] Figure 3 illustrates a method to facilitate prediction of neurological
disorders, in accordance with an embodiment of the present subject matter.
[0012] Figure 4 illustrates translation, rotation, zoom and shear of images
during preprocessing, in accordance with an embodiment of the present subject matter.
[0013] Figure 5 illustrates a Gaussian curve, in accordance with an embodiment
of the present subject matter.
[0014] Figure 6 illustrates the image results after statistical processing and
statistical parametric mapping (SPM) from the SPM tool, in accordance with an
exemplary embodiment of the present subject matter.
[0015] Figure 7 illustrates the statistical data results after statistical processing
and SPM from the SPM tool, in accordance with an exemplary embodiment of the
present subject matter.
[0016] Figure 8 illustrates the functioning of the system comprising
computation using MapReduce function, in accordance with an embodiment of the
present subject matter.
[0017] Figure 9 illustrates the working of the machine learning module, in
accordance with an embodiment of the present subject matter.
[0018] Figure 10 illustrates results for comparing the performance of
Massively Parallel Processing (MPP) database with the Conventional database, in
accordance with an exemplary embodiment of the present subject matter.
[0019] Figure 11 illustrates average Latency test results across various Cloud
Platforms, in accordance with an exemplary embodiment of the present subject
matter.
[0020] Figure 12 illustrates the comparison of various spectrum sizes 'k' and
the results of the tests, in accordance with an exemplary embodiment of the present
subject matter.
DETAILED DESCRIPTION
[0021] System(s) and method(s) to facilitate data processing for predicting
neurological disorders are disclosed. The system and method disclosed in the present invention facilitate medical data processing for predicting neurological disorders such as brain disorders and spinal cord disorders. The system and method also assist doctors and specialists by providing support for treatment recommendation for the neurological disorders in the course of treatment. The system combines the image and data processing capabilities of the statistical parametric mapping (SPM) tool with a classification algorithm to generate data patterns and map the data patterns to pre-stored data in order to predict the neurological disorders. Further, image and data processing techniques are used to process and analyze neural image data sequences from various techniques such as functional Magnetic Resonance Imaging (fMRI), Electroencephalography (EEG) and Magneto-encephalography (MEG). The data obtained as a result of processing is stored in a Massively Parallel Processing (MPP) database. The computational power used in the present invention is provided by Hadoop and its MapReduce feature. The processing of large datasets is simplified with the use of Cloud computing technology on the OpenStack Cloud Platform. Further, the system and method of the present invention facilitate understanding of the neural mechanisms underlying brain disorders and suggest new perspectives on diagnostics and therapeutic intervention.
[0022] The data sequences available from the above mentioned sources are
processed using SPM. Using SPM, the analysis of neuro-imaging data generally starts with a series of spatial transformations. These transformations may include Realignment, Spatial Normalization, Co-registration and Spatial Smoothing. After these transformations, the available data is used to examine the differences over a time series, that is, correlations between a task variable and brain activity in a certain area, using linear convolution models of how the measured signal is caused by underlying changes in neural activity. Parametric statistical models are assumed at each voxel (volumetric pixel), using the General Linear Model (GLM) to describe the data in terms of experimental and confounding effects, and residual variability. Classical statistical inference is used to test hypotheses that are expressed in terms of GLM parameters. This uses an image with voxel values (a Statistic image or Statistical Parametric Map) to obtain P-values. The P-values describe the significance of the data with respect to any specified test hypotheses. The P-values from multiple tests are continuously compared.
[0023] These statistical values are collected for a number of tests and are then
compared to quantitatively evaluate the initial situation of the patient and to monitor brain diseases. This data will be accumulated from all the sources and will be stored in an MPP database. The relevant information from this database will be extracted and mapped to various treatment procedures prescribed by specialists. The computation is carried out using the Hadoop framework and the optimization is done using MapReduce. Because the data analysis involved is too complex for traditional data processing applications, a Machine Learning technique called Support Vector Machine (SVM) is used to derive mappings and convert them to prediction models as a part of analytics. This pattern mapping is critical in the study of mutations and the analysis of rare brain diseases. Patterns derived in this way act as predicting models for upcoming datasets and aid specialists and researchers in the diagnosis of neurological disorders for real-time treatments. An attractive Cloud Computing solution may very well provide massively scalable computational power and green credentials, as long as the off-site compute is located where renewable sources of energy are used preferentially. An MPP database in the form of Greenplum for parallel access of huge datasets, coupled with the computational power of Hadoop, built on the foundation of the Cloud, and with a predictive parallel SVM algorithm, is the next generation solution provided by the present invention.
[0024] While aspects of described system and method for facilitating data
processing while predicting neurological disorders may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system.
[0025] Referring now to Figure 1, a network implementation 100 of a system
102 for facilitating prediction of neurological disorders is illustrated, in accordance with an embodiment of the present subject matter. In one embodiment, the system 102 accepts and stores the data in one or more formats in one or more distributed databases, wherein the data is medical data. The system works on a cloud platform for accessing data stored in the distributed databases and performing the required computation. Further, the system applies statistical processing techniques on the data to generate statistical data and applies a regression technique on the statistical data to obtain statistical significant data. Further, the system generates a dataset based on the statistical significant data and prescription data and derives one or more data patterns by using a supervised learned methodology. The system maps the data patterns to pre-stored neurological disorder data and identifies one or more abnormal statistical values from the data pattern to predict a neurological disorder. Further, the system also recommends a solution to the neurological disorder based on the pre-stored data related to a similar neurological disorder.
[0026] Although the present subject matter is explained considering that the
system 102 is implemented on one or more servers on a cloud platform, it may be understood that the system 102 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, and the like. It will be understood that the system 102 may be accessed by multiple users through one or more user devices 104-1, 104-2... 104-N, collectively referred to as user 104 hereinafter, or applications residing on the user devices 104. Examples of the user devices 104 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation. The user devices 104 are communicatively coupled to the system 102 through a network 106. In one implementation, the system 102 may comprise the cloud-based computing environment in which a user may operate individual computing systems configured to execute remotely located applications.
[0027] In one implementation, the network 106 may be a wireless network, a
wired network or a combination thereof. The network 106 can be implemented as one
of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network 106 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
[0028] Referring now to Figure 2, the system 102 is illustrated in accordance
with an embodiment of the present subject matter. In one embodiment, the system 102 may include at least one processor 202, an input/output (I/O) interface 204, and a memory 206. The at least one processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor 202 is configured to fetch and execute computer-readable instructions stored in the memory 206.
[0029] The I/O interface 204 may include a variety of software and hardware
interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 204 may allow the system 102 to interact with a user directly or through the client devices 104. Further, the I/O interface 204 may enable the system 102 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface 204 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface 204 may include one or more ports for connecting a number of devices to one another or to another server.
[0030] The memory 206 may include any computer-readable medium known
in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 206 may include modules 208 and data 210.
[0031] The modules 208 include routines, programs, objects, components,
data structures, etc., which perform particular tasks or implement particular abstract data types. In one implementation, the modules 208 may include a computational module 212, a correlating module 214, a filtering module 216, a Combination module 218, a machine learning module 220 and other modules 222. The other modules 222 may include programs or coded instructions that supplement applications and functions of the system 102. The modules 208 described herein may be implemented as software modules that may be executed in the cloud-based computing environment of the system 102.
[0032] The data 210, amongst other things, serves as a repository for storing
data processed, received, and generated by one or more of the modules 208. The data 210 may also include one or more distributed database(s) 224, and other data 130. The other data 130 may include data generated as a result of the execution of one or more modules in the other module 222.
[0033] In one implementation, at first, a user may use the client device 104 to
access the system 102 via the I/O interface 204. The user may register themselves using the I/O interface 204 in order to use the system 102. The working of the system 102 is explained in detail with reference to Figures 2 and 3 below. The system 102 may be used to facilitate predicting neurological disorders.
[0034] In accordance with an embodiment, referring to figure 2, the system
102 comprises one or more distributed databases configured to store the data in one or more formats. Further, the data stored in the distributed databases may be accessed by using the cloud platform. The data is medical data. The medical data may be neuroscience data. The data may be one or more of neural image data, graph data, records, and genetic data. The data may be generated from various techniques such as MRI, fMRI, EEG, MEG, Positron emission tomography (PET) and Single-photon emission computed tomography (SPECT). From a data perspective, the present invention involves data acquisition, storage, analysis and shared-access data referring to a large variety of information and measurements acquired or generated during MRI, fMRI, EEG, MEG, PET and SPECT studies. Further, the above mentioned studies generate data heterogeneous in nature. By way of example, the data may include scanned images (functional, structural), information about the applied stimuli, parameters of the acquisition protocol, subject characterization (for example, age, gender, pathology), results of the analysis (for example, statistical and activation maps, locations of activated regions) and interpretation (for example, annotations). The data may be generated by different types of physically dispersed equipment, and image analysis utilities may require a significant amount of time and effort for adequate management. As a result, developments in scanning techniques, analysis methods, and collaborative and multi-centre research result in data growth. The present invention provides a solution by using the large-size neural data for predicting brain disorders by combining Image Processing and Big Data Techniques.
[0035] For many neurological disorders, prediction of disease state is an
important clinical aim. Neuro-imaging provides detailed information about brain structure and function from which such predictions can be statistically derived. This is done in two ways - (i) predict the disease state based on the whole-brain neuro-imaging data and (ii) analyze the relative information of different image modalities and brain regions. Neuroscience generates data from the sensors such as EEG, MEG, and MRI. From a data perspective, these studies involve data acquisition, storage, and analysis and shared access data referring to a large variety of information and measurements acquired or generated during fMRI, EEG and MEG study. In this case, the results are heterogeneous in nature.
[0036] Referring to figure 2, the system 102 comprises the computational
module 212, which is configured to apply a statistical processing technique on the data based on the one or more formats of the data. The statistical processing techniques are applied on the data in order to generate statistical data. The statistical data comprises one or more genetic variables and neural images. The statistical data may be in the form of a joint neuro-imaging dataset. The joint neuro-imaging dataset may comprise one or more genetic variables and neural images.
[0037] In accordance with an embodiment, the statistical processing
techniques may be applied on the data by using SPM. The SPM techniques may apply different techniques based on the format of the data. By way of an example, the SPM techniques applied for processing MRI and fMRI and for EEG and MEG may be different.
[0038] In accordance with an embodiment of the present invention, the
statistical processing techniques for processing MRI and fMRI are explained in detail. The fMRI images have moderately high spatial and temporal resolution. SPM for fMRI images may be obtained in three stages, namely spatial preprocessing, linear modeling and statistical parametric mapping. According to an exemplary embodiment, the spatial pre-processing is explained in detail. The images from fMRI or MRI are generally distorted and have signal dropout. The procedure to get a standard template from the scan images of fMRI or MRI is explained. The objectives of the spatial preprocessing technique are to match all scanned images of an individual subject and also to match the scans of all subjects into a standard space. In this stage, a series of functions may need to be performed, comprising realignment, normalization and smoothing, which are explained in detail as follows.
[0039] According to another embodiment, the realignment technique is
explained. Realign is the basic function to match images. Realign function uses a rigid body transformation to manipulate the scanned images. Co-registration process is done by optimizing parameters of the scanned image that describe a rigid body
transformation between the source and a reference image. The co-registration process includes translations by moving the image in the X, Y, or Z direction and rotations about the X, Y, and Z axes. The manipulation that minimizes the difference between two scanned images is found by trial and error. The translations and rotations are as shown in Figure 4.
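By way of a non-limiting illustration only, the rigid-body manipulation described above may be sketched in Python as follows; the helper names, the use of NumPy/SciPy and the sum-of-squares cost are assumptions for illustration, not the SPM Realign implementation.

import numpy as np
from scipy.ndimage import affine_transform

def rigid_body_matrix(tx, ty, tz, rx, ry, rz):
    """4x4 rigid-body transform: translations (voxels) and rotations (radians)."""
    Rx = np.array([[1, 0, 0], [0, np.cos(rx), -np.sin(rx)], [0, np.sin(rx), np.cos(rx)]])
    Ry = np.array([[np.cos(ry), 0, np.sin(ry)], [0, 1, 0], [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0], [np.sin(rz), np.cos(rz), 0], [0, 0, 1]])
    M = np.eye(4)
    M[:3, :3] = Rz @ Ry @ Rx          # combined rotation about X, Y and Z
    M[:3, 3] = [tx, ty, tz]           # translation along X, Y and Z
    return M

def resample(volume, params):
    """Apply a rigid-body transform to a 3-D volume."""
    M = rigid_body_matrix(*params)
    return affine_transform(volume, M[:3, :3], offset=M[:3, 3], order=1)

def ssd_cost(source, reference, params):
    """Sum of squared differences a realignment search would try to minimize."""
    return float(np.sum((resample(source, params) - reference) ** 2))
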
[0040] According to another embodiment, the Normalization technique is
explained. Individual brains differ in size, shape and folding. The Normalize function is used to put scans into a standardized space, i.e. to make results from different studies comparable by bringing them into a standard coordinate system. The standardization of the images is achieved by applying a processing technique wherein SPM uses template images, the template images being averaged scanned images of multiple subjects. As with the realign function, the normalize function determines the transformation that minimizes the differences between two scanned images by minimizing the sum of squares of intensity differences. An individual anatomical scanned image is normalized such that the scanned image fits onto its corresponding template obtained from the registration process. The fit between the anatomical scan and the template is maximized by applying several transformations such as translations, rotations, zooms and shears, as shown in Figure 4.
[0041] According to another embodiment, the Smoothing technique is
explained. The Smooth function is used in spatial pre-processing to blur the functional images. The smooth function is applied to fMRI images to correct remaining functional or anatomical differences between the subjects. The subject is an anatomical part of a living being, for example the brain. Smoothing is achieved by replacing every voxel with a weighted average of its neighbors, wherein the weighting is defined by a Gaussian kernel. The size of the Gaussian curve is given by its Full Width at Half Maximum (FWHM). The larger the FWHM, the more smoothing is obtained. The Gaussian curve is shown in Figure 5.
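As an illustrative sketch only, the FWHM-controlled smoothing may be approximated with SciPy using the standard relation FWHM = 2*sqrt(2*ln 2)*sigma; the function name and the example voxel sizes are hypothetical.

import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_fwhm(volume, fwhm_mm, voxel_size_mm):
    """Gaussian smoothing: convert FWHM (mm) to the kernel sigma in voxel units."""
    fwhm = np.asarray(fwhm_mm, dtype=float)
    sigma_mm = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))   # FWHM = 2*sqrt(2*ln 2)*sigma
    sigma_vox = sigma_mm / np.asarray(voxel_size_mm, dtype=float)
    return gaussian_filter(volume, sigma=sigma_vox)

# e.g. an 8 mm FWHM kernel applied to 2 mm isotropic voxels:
# smoothed = smooth_fwhm(bold_volume, fwhm_mm=[8, 8, 8], voxel_size_mm=[2, 2, 2])
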
[0042] According to another embodiment, the GLM is explained. The GLM is
employed to estimate the parameters of a temporal model (encoded by a design matrix) and derive the appropriate univariate test statistic at every voxel. Using the GLM, the data is described in terms of experimental and confounding effects, and residual variability. The GLM is an equation, Y = Xβ + Є, that expresses the observed response variable Y in terms of a linear combination of explanatory variables X plus a well-behaved error term Є. The matrix containing the explanatory variables, e.g. designed effects or confounds, is called the design matrix. Each column of the design matrix corresponds to an effect one has built into the experiment or that may confound the results. These are referred to as explanatory variables, covariates or regressors. In Statistical Parametric Mapping, the GLM consists of a linear combination of basis functions, as shown in Equation 1 below:
Y = β1 f1(X) + β2 f2(X) + ... + βn fn(X) + Є (Equation 1)
[0043] Referring to Equation 1, every function 'f' corresponds to individual parameters considered for the comparison. Statistical inference is used to test hypotheses that are expressed in terms of GLM parameters. GLM parameters are the parameters used for tests and comparisons. GLM parameters may vary for different types of input data. The GLM parameters may depend on the voxel values, density values etc. Hypotheses are tested using mainly one-way analysis of variance (ANOVA), one- or two-sample t tests and χ2 tests, depending on the type of analysis. Contrast vectors are established to make inquiries regarding the different regressors modeled in the design matrix in the general linear model. Statistical inference and interrogation of results are obtained using these contrast vectors to produce Statistical Parametric Maps and Posterior Probability Maps.
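Purely to illustrate this estimation step, a minimal per-voxel least-squares fit of Y = Xβ + Є with a t-statistic for a contrast vector might look as follows; this is a simplified stand-in, not the SPM estimator.

import numpy as np

def fit_glm(Y, X, contrast):
    """Fit Y = X @ beta + error per voxel and return t-statistics for a contrast.

    Y: (n_scans, n_voxels) data; X: (n_scans, n_regressors) design matrix;
    contrast: (n_regressors,) contrast vector over the GLM parameters.
    """
    beta, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)       # parameter estimates
    resid = Y - X @ beta
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = np.sum(resid ** 2, axis=0) / dof               # residual variance per voxel
    c = np.asarray(contrast, dtype=float)
    var_c = c @ np.linalg.pinv(X.T @ X) @ c                 # contrast variance factor
    t = (c @ beta) / np.sqrt(sigma2 * var_c)                # t-statistic per voxel
    return beta, t
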
[0044] According to another embodiment, SPM is explained. According to
SPM technique, the images showing structural anatomy with neural activation overlay are saved and arranged in mosaics to prepare them for feature extraction. Neuronal activation results are extracted from the image by applying a binary mask. The mask is generated by filtering the RGB matrix values with a predefined threshold. A contour recognition algorithm is used to label and detect groups of activated voxels. The image is scanned to detect contiguous pixels and assign numeric labels to connected areas. Perimeter and area are computed for each cluster. Measures can be accessed through an interface which allows the user to select the cluster to be analyzed. Different subject or group results are examined to find differences in areas of interest using a Graphical User Interface (GUI). The subject can be neural images of living beings, for example brain images. The group results correspond to the results of the neural images of a group of living beings.
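A minimal sketch of the masking and cluster-labeling step, assuming a hypothetical red-channel threshold in place of the predefined RGB filter, may look as follows.

import numpy as np
from scipy import ndimage

def activation_clusters(rgb_image, threshold=200):
    """Binary-mask an activation overlay and label connected clusters.

    rgb_image: (H, W, 3) array; `threshold` on the red channel is an
    illustrative stand-in for the predefined RGB filter.
    """
    mask = rgb_image[..., 0] >= threshold                  # binary mask of activation
    labels, n_clusters = ndimage.label(mask)               # numeric labels for contiguous areas
    areas = ndimage.sum(mask, labels, index=range(1, n_clusters + 1))
    return labels, list(areas)                             # per-cluster pixel counts (area)
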
[0045] In accordance with an embodiment of the present invention, the
statistical processing techniques for the data produced by the EEG and MEG are explained. EEG and MEG techniques typically produce a time-varying modulation of signal amplitude or frequency-specific power in some peristimulus time period, at each electrode or sensor. The data obtained from EEG and MEG sensors is required to be pre-processed before the core analysis on the data takes place. Similar to the fMRI data the pre-processing for MEG or EEG data also involves various steps. The order of the pre-processing steps for MEG or EEG data may not be fixed and may depend on different considerations.
[0046] According to an embodiment of the invention, the pre-processing of
the data produced by EEG and MEG is explained. The pre-processing of the EEG or MEG data may include Epoching, Filtering, Downsampling, Baseline Correction, Artifact Detection and Rejection, and Averaging. The Epoching technique comprises cutting out little chunks of continuous data and saving them as "single trials". In MEG or EEG research, Epoching may be a data selection procedure to remove long gaps between trials. An epoching process may start at some user-specified pre-stimulus time and may end at some post-stimulus time; for example, from -100 to 400 milliseconds in peristimulus time. By default, the epoched data may be baseline corrected, that is, the mean of the pre-stimulus time is subtracted from the whole trial. The filtering technique is explained. By way of an example, the continuous or epoched data can be filtered, over time, with a lowpass, highpass, bandstop, or bandpass filter. The downsampling is explained herein. Downsampling is a process of reducing the sampling rate of a signal by a given ratio. The data can be downsampled to any sampling rate. The Baseline Correction is explained. The Baseline Correction function subtracts the baseline from channel data. The baseline period needs to be specified in ms; for example, [-100 0]. The Artifact Detection and Rejection is explained. The data as accepted in the present invention may contain (neuronal) signals of interest and may also include large signals from other sources, like eye movements or muscular activity. These signal components may be referred to as 'artifacts'. There may be many kinds of artifacts and there may be a variety of methods for detecting them. The data channels containing artifacts in a large proportion of trials may be automatically marked as bad. There are many layers in signal processing, and trials can be referred to as instances of durations within the layers. The averaging is explained. The averaging of single trial data is the crucial step to obtain the evoked response. When averaging single trial data, single trials are averaged within trial type. Power and phase data of single trials can also be averaged by using the SPM averaging function.
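For illustration only, epoching with baseline correction and a naive downsampling step can be sketched as follows; the sampling rate, window and stimulus onsets are assumed inputs, and this is not the SPM implementation.

import numpy as np

def epoch(data, onsets, sfreq, tmin=-0.1, tmax=0.4):
    """Cut single trials from continuous data and baseline-correct them.

    data: (n_channels, n_samples); onsets: stimulus sample indices assumed to
    leave enough margin; window defaults to -100 to 400 ms as in the example above.
    """
    pre, post = int(-tmin * sfreq), int(tmax * sfreq)
    trials = []
    for onset in onsets:
        trial = data[:, onset - pre: onset + post]
        baseline = trial[:, :pre].mean(axis=1, keepdims=True)  # mean of pre-stimulus period
        trials.append(trial - baseline)                        # baseline correction
    return np.stack(trials)                                    # (n_trials, n_channels, n_times)

def downsample(trials, factor):
    """Naive decimation by an integer factor (a real pipeline low-pass filters first)."""
    return trials[..., ::factor]

# evoked response: average the baseline-corrected single trials within a trial type
# evoked = epoch(raw, onsets, sfreq=1000).mean(axis=0)
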
[0047] According to an embodiment of the invention, the statistical
processing of the pre-processed data produced by EEG and MEG is explained. Statistical processing of MEG or EEG data in SPM may use techniques similar to those used for the PET, fMRI, and structural MRI data types in Voxel Based Morphometry (VBM). The statistical processing of MEG or EEG data may require transforming data from the SPM MEG or EEG format to image files, for example the NIfTI format. Once the data is obtained in an image format, for example the NIfTI format, the statistical processing (analyses) for MEG or EEG data are procedurally identical to between-subject analyses of PET or VBM data. By way of an example, once the data is obtained in an image format, for example the NIfTI format, the statistical processing (analyses) for MEG or EEG data are procedurally identical to the second-level processing (analyses) in fMRI data processing. The statistical processing (analyses) for MEG or EEG data considers one summary statistic image per subject per condition (or level of an experimental factor). The summary statistic image is a technical term for the data feature summarizing the treatment effects that one wants to make an inference about. When the summary statistic is itself a maximum likelihood estimate based on within-subject data, the analysis may be called a summary-statistic procedure for random effect models.
[0048] According to an exemplary embodiment of the present invention, the
statistical processing of the data by SPM may include conversion of the data from a native machine-dependent format to the MATLAB-based common SPM format. A dataset in this format consists of two files: a DAT-file, which is a binary file containing just the data, and a MAT-file containing a structure with all the additional information related to the dataset. By way of an example, for fMRI data analysis the required data formats may be as follows. DICOM is the output from an MRI scanner in the Digital Imaging and Communications in Medicine (DICOM) format; DICOM is also known as the standard for medical images. The Neuroimaging Informatics Technology Initiative (NIfTI) format is the image format used by SPM8, FSL etc. These image formats consist of one or two files, such as .nii or .img (image data) and .hdr (header information). The conversion of DICOM to NIfTI may also be done in SPM using the DICOM import option. By way of an example, for MEG or EEG data analysis the data formats are as follows. Raw data from EEG or MEG techniques may be required to be put into a format suitable for SPM to read. By way of an example, 'Convert' in SPM may be used to convert raw data from an MEG or EEG technique into suitable formats. Raw data and template files may be provided to SPM. SPM may then create a .mat and a .dat file. The .mat file contains information about the structure of the data. The .dat file contains the data itself.
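By way of a hedged example, such files could be read in Python with the third-party nibabel and pydicom libraries (assumed installed); this is an illustration, not SPM's own converter.

import nibabel as nib      # NIfTI (.nii / .img + .hdr) reader
import pydicom             # DICOM reader

def load_nifti(path):
    """Load a NIfTI image and return its voxel array and affine matrix."""
    img = nib.load(path)               # handles .nii as well as .img/.hdr pairs
    return img.get_fdata(), img.affine

def load_dicom_slice(path):
    """Read one DICOM file and return its pixel data."""
    ds = pydicom.dcmread(path)
    return ds.pixel_array
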
[0049] Referring to figure 2, the system 102 further comprises the correlating
module 214 configured to apply a regression technique on the statistical data to prepare correlations between the genetic variables and the neural images from the statistical data in order to obtain one or more first set of values. The first set of values represents a statistical significance of the correlation. By way of an example, the value of statistical significance of a correlation is termed a 'P value'. The statistical data is in the form of a joint neuro-imaging dataset, wherein the joint neuro-imaging dataset comprises one or more genetic variables and neural images. Several regressions are applied on the statistical data to provide several sets of the first set of values. The several sets of the first set of values can be termed intermediate data. Genetic information in conjunction with brain imaging data significantly improves understanding of both normal and pathological variability of brain organization. It may also lead to the development of biomarkers and personalized medicine in the near future. Further, the adapted statistical processing techniques detect significant associations between the highly heterogeneous variables provided by genotyping and brain imaging. The major requirement of the present invention, large-scale computation, is met by the computational power of Hadoop.
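A minimal sketch of this mass-univariate step, assuming X holds genotype codes in {0, 1, 2} and Y holds per-region image values (shapes and the 0.05 cut-off are illustrative assumptions), might compute the p-value matrix as follows.

import numpy as np
from scipy.stats import linregress

def association_pvalues(X, Y):
    """Regress every image variable on every genetic variable.

    X: (ns, ng) genotype codes; Y: (ns, nv) voxel-region values.
    Returns an (nv, ng) matrix of p-values for the correlations.
    """
    ns, ng = X.shape
    nv = Y.shape[1]
    P = np.empty((nv, ng))
    for g in range(ng):
        for v in range(nv):
            P[v, g] = linregress(X[:, g], Y[:, v]).pvalue
    return P

# keep only the statistically significant associations, e.g. p < 0.05:
# significant = P < 0.05
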
[0050] Referring to figure 2, the system comprises the filtering module 216 configured to filter the one or more first set of values based upon a predefined threshold in order to obtain a set of statistical significant values. The filtering may be implemented by means of the reducer function of the MapReduce functionality of the Hadoop Database system. The reducer accepts the one or more sets of the first set of values and performs filtering on the first set of values. The reducer accepts the intermediate data, merges the intermediate data and selects the values within a threshold interval in order to generate a set of statistical significant values. The reducer compares the intermediate data in order to compare the values of each simulation and selects the most significant p-values. The set of statistical significant values are the statistically most significant values providing the most significant correlation within the joint neuroimaging dataset. Thus, the set of statistical significant values provides the maximum accuracy towards detection of a disorder in the neural function.
[0051] According to an exemplary embodiment of the present invention,
Figure 6 shows the image captured for the brain using an MRI scanner. The image is in MRI image format. Figure 6 shows the four stepwise images obtained from the SPM tool for the MRI brain image. The images are the output of the SPM tool for the MRI brain scan image. Figure 6 shows a T1 (structural) and a functional blood-oxygen-level-dependent (BOLD) image in the initial state (1), the image after reorientation, realignment and co-registration in state (2), and the image after the normalization and functional smoothing step in state (3). The fourth section in the figure shows the computed statistical parametric map and an overlay of the neuronal activation areas on the normalized T1 image.
[0052] According to an exemplary embodiment of the present invention, the
statistical inference obtained from the SPM tool for the MRI scan image of the brain as shown in figure 6 is provided in figure 7. Figure 7 provides the statistical data produced by the SPM tool for the MRI brain image. The data shows the dimensions of the brain, P values, and the data related to abnormality in the brain in the form of Expected Voxels per cluster, Expected clusters, the actual volume, the actual volume in number of voxels and the actual voxel size obtained from SPM. This data further helps to identify or predict the abnormality in the brain. Figure 7 also shows the statistical data obtained for the abnormal patches seen in the fourth quadrant of Figure 6. The data also describes the grey patches (right side image in the 4th Quadrant) that are formed from the processed brain image. Further, based on the statistical data obtained from the SPM tool, and after providing this data to the machine learning module, the machine learning module is enabled to support detection of the disorder. The disorder detected, as shown by the abnormal data in figure 7, is Alzheimer's disease.
[0053] According to an exemplary embodiment, referring to figure 8, the
initial data to be processed by the statistical techniques is given by an (X, Y) pair of a joint neuroimaging dataset, where X represents a set of genetic variables and Y represents a set of brain images obtained, for example, from a functional MRI. A variable has the dimension 'nv', and there are 'ns' such variables, where 'ns' gives the number of subjects. Considering univariate testing, the set of values is assumed to be {0, 1, 2}, and X is obtained as a matrix containing these values. Since each image is divided into voxels, the set of values that forms a single variable in Y represents a value for a region of the brain image. The number of variables is given by the number of subjects (ns), and a matrix of real values is formed. A subject is a patient whose data is under analysis. After performing the necessary computations, the correlations between X and Y are obtained, giving a matrix of size nv x ng containing the first set of values, which may be referred to as 'p-values', representing the statistical significance of the association. Several regressions are performed with the data in order to obtain the correlations between X and Y of the statistical data, each giving a set of such correlations, and all this intermediate data is stored to compare the values of each simulation in order to keep the most significant p-values, that is, a set of statistical significant values. This set of statistical significant values forms a significant dataset for Machine Learning. Further, an intermediate state of data may be obtained wherein matrices of doubles (8 bytes) of size nv x ng are formed. The nv x ng matrix has nv rows and ng columns, and 8 bytes may be the size of each parameter within the matrix. When considering the data of patients residing at a global scale, the matrix can grow to petabytes; the amount of intermediate data that must be stored can scale up to 80 petabytes. This requirement of massively scalable storage for data is fulfilled by the OpenStack Cloud Infrastructure.
[0054] According to another embodiment, referring to figure 8, the computations performed in order to obtain the correlations between the joint variables of the statistical data are described in detail. To obtain the joint variables of the statistical data, such as X and Y, several computations need to be performed on different sets of the statistical data. The required computations cannot be performed with standard procedures and equipment. The computations can be run in parallel because the computation can be sliced over multiple dimensions such as neuroimaging, genetics or permutations. The neuroimaging dataset is split across multiple nodes by applying parallelization with respect to permutation, using the MapReduce function of the Hadoop Database System. The statistical data may also have various parameters such as density, position 1, position 2 etc. MapReduce takes the set of statistical data having parameters, for example density, position 1, position 2 etc., shuffles it and performs the regression between the two sets of data X and Y to obtain the correlation between them. For example, one module of MapReduce takes the values of density, position 1 and position 2 from a single source file and stores them in one node, another module takes the values of density, position 1 and position 2 from another source and stores them in another node, and so on. Further, the shuffling is done in which one mapper of MapReduce accumulates all the values of density into one single node, another mapper accumulates all the values of position 1 into a single node, and so on. The regression function of MapReduce performs a series of matrix operations over the accumulated data at a number of nodes as described above and provides a matrix as output. Each mapper performs regression and generates a matrix as the first set of values, or intermediate data. In the final step the reducer performs filtering on the values of the matrix. The reducer takes the intermediate data, merges the intermediate data and selects the values within a threshold interval. For example, the reducer captures the intermediate data obtained from the mapper and deletes duplicated values and multiple occurrences of values. The reducer combines the intermediate data into one single entry or file. One or more reducers may filter the intermediate data in parallel. Further, the results of the filtration are combined to get the set of statistical significance of the correlation, which can be called the unique statistical significance of the association. The set of statistical significance of the correlation may be combined with the treatment procedures prescribed by doctors and practitioners. This combined data is stored in the Greenplum MPP Database and is used for machine learning in the Alpine Miner tool.
[0055] Referring to figure 8, according to an exemplary embodiment, the working of MapReduce in parallel mode in order to obtain the set of statistical significant values is explained. By way of an example, there are 3 mappers as shown in figure 8. The shuffles are performed between variables X and Y in order to obtain intermediate data. For example, as shown in figure 8, X1=Shuffle(X), X2=Shuffle(X) and X3=Shuffle(X). Further, shuffles of X such as X1, X2 and X3 are mapped with Y or shuffles of Y to obtain the intermediate data R. For example, after mapping between the shuffles of X and Y, such as R1=X1 op Y, R2=X2 op Y and R3=X3 op Y, the intermediate data obtained is R1, R2 and R3. One or more reducers can be implemented to filter the intermediate data in parallel.
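The flow of figure 8 can be caricatured in plain Python as follows; the permutation 'shuffle', the least-squares regression standing in for 'op' and the threshold are illustrative assumptions, with multiprocessing as a toy stand-in for Hadoop MapReduce.

from multiprocessing import Pool
import numpy as np

def mapper(args):
    """One mapper: regress a shuffled permutation of X against Y (R = X op Y)."""
    X_perm, Y = args
    R, *_ = np.linalg.lstsq(X_perm, Y, rcond=None)   # intermediate matrix (stand-in 'op')
    return R

def reducer(intermediates, threshold=0.05):
    """Merge the intermediate matrices and keep values within a threshold interval."""
    merged = np.concatenate([R.ravel() for R in intermediates])
    merged = np.unique(merged)                       # drop duplicate occurrences
    return merged[np.abs(merged) < threshold]        # illustrative threshold filter

def run(X, Y, n_mappers=3):
    rng = np.random.default_rng(0)
    shuffles = [(rng.permutation(X), Y) for _ in range(n_mappers)]   # X1, X2, X3
    with Pool(n_mappers) as pool:
        intermediates = pool.map(mapper, shuffles)                   # R1, R2, R3 in parallel
    return reducer(intermediates)                                    # combined, filtered result
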
[0056] Referring to figure 2, the system 102 comprises the combination
module 218 configured to generate a dataset based on the one or more first set of values, the set of statistical significant values, and prescription data. According to an embodiment of the invention, the functionality of the combination module is implemented by means of the reducer function of the MapReduce of the Hadoop Database System. The result from the MapReduce phase is the combination of the unique statistical significance of the association with the treatment procedures prescribed by doctors and practitioners. By way of an example, the brain disease file is grouped into four levels, such as patient ID, p-values, statistical significance and treatment procedures prescribed by doctors and practitioners, and is saved as a CSV file in a format that can be used as a dataset for the Parallel SVM Algorithm during Machine Learning. The dataset is the structured file that is provided as input to the Machine Learning Module 220. The dataset is further labeled in a binary way with the p-values or first set of values and disease names, so as to train the algorithm in the Machine Learning - Training phase.
[0057] According to an exemplary embodiment of the invention, the
generation of the dataset is explained in detail. The data related to one patient can be stored in multiple files. Further, the files related to one patient are accumulated to group them. Similar grouping of files may be done for all the patients available on the system 102. Once the grouping of the files is completed, each field in the grouped file is separated, and a k-spectrum of messages is placed in a single line of the dataset and is stored in the CSV format. The preparation of the dataset can be implemented using a Folding Window technique or a Sliding Window technique. In the Folding Window method, the dataset is prepared by taking 'k' records (wherein 'k' is a window size that can be any whole integer) from the grouped file and arranging them in a single record in the CSV format. In case of the Sliding Window approach, the dataset is prepared by taking records n to n+k, where n varies from 1 to the difference between the number of records in the flat file and k, arranged as a single record in the CSV format.
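A minimal sketch of the two windowing approaches, assuming each record is a list of field values, may look as follows; the function names are hypothetical.

import csv

def folding_window(records, k):
    """Group k consecutive records into one dataset row (non-overlapping windows)."""
    return [sum(records[i:i + k], []) for i in range(0, len(records) - k + 1, k)]

def sliding_window(records, k):
    """Group records n..n+k into one row, sliding the window by one record each time."""
    return [sum(records[n:n + k], []) for n in range(len(records) - k + 1)]

def write_csv(rows, path):
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

# e.g. with spectrum size k = 5, each CSV line holds five records' fields:
# write_csv(sliding_window(grouped_records, k=5), "dataset.csv")
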
[0058] Referring to figure 2, the system 102 comprises the machine learning
module 220 configured to accept the dataset generated based on the one or more first set of values, the set of statistical significant values, and prescription data. The machine learning module 220 is further configured to iteratively develop a supervised learned methodology using the dataset to derive one or more data patterns. The supervised learned methodology is a technique to identify abnormal statistical data representing a disorder. The one or more data patterns are one or more abnormal statistical values representing a disorder. The machine learning module 220 is further configured to map the data patterns to pre-stored neurological disorder data representing an abnormality in order to identify one or more abnormal statistical values from the data pattern to predict a neurological disorder. The machine learning module works on the SVM Learning Algorithm. More specifically, the machine learning module works on the Parallel SVM Learning Algorithm.
[0059] Referring to figure 9, the working of the machine learning module is
explained in detail. The result from MapReduce is a combination of the unique statistical significance of the association with the treatment procedures prescribed by doctors and practitioners, which becomes the dataset given as input to the Machine Learning Module. The combined data in the form of a dataset is stored in the Greenplum MPP Database and is used for Machine Learning in the Alpine Miner tool. The functioning and implementation of the machine learning module is divided into three distinct phases: the Machine Learning - Training phase, the Machine Learning - Evaluation phase and the Production (Prediction and Solution) phase. Figure 9 shows the overall workflow of the three phases. Region 1 indicates the Machine Learning - Training phase, region 2 indicates the Machine Learning - Evaluation phase and region 3 indicates the Production phase.
[0060] Referring to figure 9, according to an embodiment, the machine
learning-training phase is described in detail. The dataset in CSV file format is accepted by the machine learning module. The implementation of the Machine Learning - Training phase is done by setting up a workflow using the Alpine Miner Analytics tool. The tool has provisions to create Machine Learning models and merge different learned models, and also supports Hadoop Integration. The output of the training phase is a Learned Model that can isolate and predict the type of disease when fed with a dataset of the same format as in the learning phase.
[0061] Referring to figure 9, according to an exemplary embodiment, the
Alpine Miner Analytics Tools accepts the dataset in CSV format. The data in the dataset is labeled with the first set of values or p-values, statistical significance and disease names for the Parallel SVM algorithm. In Alpine Miner Analytics Tools, the dataset operator denotes a database table or view. Either a database connection or a file path can be specified to access the data. The preferred file format is CSV. The dataset is fed into a Random Sampling operator, which extracts data rows from the input dataset and generates sample tables/views according to the sample properties specified by the user. The percentage or the number of rows to be sampled can be given as an input parameter to the Random Sampling operator. The output schema of the table that has
to be created is specified after sampling and also the columns that have to be sampled are selected. The dataset is randomly split in 80:20 proportions. The first part of 80% data is labeled as training data and it is used as the input for the SVM Classification operator in Machine Learning-Training phase. The other part of remaining 20% data is labeled as validation data and it is kept for use in the Machine Learning-Evaluation phase. The two split parts are saved using the Sample Selector operator. The Sample Selector connects to a preceding sample generating operator (for example, Random Sampling operator). Alpine Miner Analytics Tools allows users to specify one of the sample datasets generated from the preceding sample generating operator (may be Random Sampling) for use in the succeeding operator (SVM Classification).
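As a single-node illustration of this workflow (not the Alpine Miner or Parallel SVM implementation), the 80:20 split and SVM training could be sketched with scikit-learn; the file name dataset.csv and the label column Disease_Name are hypothetical.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# hypothetical CSV layout: feature columns plus a 'Disease_Name' label column
dataset = pd.read_csv("dataset.csv")
X = dataset.drop(columns=["Disease_Name"])
y = dataset["Disease_Name"]

# random 80:20 split into training data and validation data
X_train, X_val, y_train, y_val = train_test_split(X, y, train_size=0.8, random_state=42)

# train an SVM classifier on the 80% partition (single-node stand-in
# for the SVM Classification operator)
model = SVC(kernel="rbf")
model.fit(X_train, y_train)
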
[0062] Still referring to figure 9, the SVM Classification operator applies
Parallel SVM Classification algorithm on the input dataset (for example, a Database Table or a CSV file). Users can select the dependent columns of the dataset that can be used for Machine Learning. The type of message and its unique message number are important factors for predicting the disease. The "Message Type" column is required to determine the cause of a problem/disease. The "message type" refers to the disease that is involved, and every disease or pattern may be allocated specific numbers for differentiation. The diseases or the different stages of diseases may be identified by the specific number. The Description field is required to suggest a solution. The output of the SVM Classification node is the learned methodology, which is named the Learned Model. When a model is trained by a 'trainer' operator, for example the SVM Classification operator, the Model operator connected to it can retrieve the trained model and save it. By way of an example, the preferred spectrum size is taken as 5, so that each line in the dataset will consist of 5 sets of record fields. The spectrum size of 5 was found to be suitable because it provides the maximum accuracy in Machine Learning.
[0063] According to an embodiment of the invention, referring to figure 9, the machine learning-evaluation phase is described. A dataset prepared similarly to the machine learning-training phase is fed to the machine learning-evaluation phase. After the dataset is prepared, the abnormal statistical values from the neuro-images are identified and labeled in the dataset. In this phase, though the data is labeled, the label is not fed into the Parallel SVM algorithm. The label is used to check the correctness of the predictions made by the learned model. By way of an example, the Machine Learning-Evaluation phase is implemented using the Alpine Miner Analytics Tool. In this workflow, the training data is fed to the SVM Classification operator directly to formulate the learned model. The SVM Prediction operator utilizes an SVM model to apply prediction to the input dataset. The dataset contains columns whose names are the same as the columns in the dataset selected for Model training, except for the dependent column. If the model is a classification model, the prediction will be a classification prediction. If the model is a regression model, the prediction will be a regression prediction, and so on. The SVM prediction operation adds prediction columns alongside the columns of the input dataset in a prediction table specified by the user. The operator also includes an additional column with the syntax P(column_name), for example P(Disease_Name). This is the field that contains the prediction results. The column_name can be specified by the user. The SVM predictor can take two inputs: one is the model and the other is the data.
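Continuing the scikit-learn sketch above, the prediction step that appends a P(Disease_Name) column might look as follows.

# apply the learned model to the validation rows and append the prediction
# column, mirroring the P(column_name) output described above
predictions = model.predict(X_val)
prediction_table = X_val.copy()
prediction_table["P(Disease_Name)"] = predictions
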
[0064] Still referring to figure 9, the output from the SVM Classification
operator and the validation data are fed into components such as Receiver Operating Characteristic (ROC), Goodness of Fit, and Lift. The ROC operator in Alpine Miner verifies and compares the trained model passed from the preceding model operator by applying the algorithm on the dataset passed from the preceding operator. The ROC method considers the coordinate pairing of the False Positive rate (FP) and the True Positive rate (TP); this set of coordinates forms the ROC curve. The Goodness of Fit operator verifies a trained model: this operator applies the trained model on the input dataset and then calculates precision, recall, sensitivity, specificity, and accuracy. The Lift operator of the Alpine Miner verifies and compares the trained model from the preceding model operator by applying it to the dataset supplied by the preceding data operator.
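The quantities reported by these operators follow the standard definitions over true/false positive/negative counts, which may be restated, by way of an illustrative example, as follows (the class name is illustrative):

// Standard definitions behind the Goodness of Fit and ROC outputs,
// computed from true/false positive/negative counts.
public class FitMetrics {
    public static double precision(long tp, long fp)   { return (double) tp / (tp + fp); }
    public static double recall(long tp, long fn)      { return (double) tp / (tp + fn); } // equals sensitivity
    public static double specificity(long tn, long fp) { return (double) tn / (tn + fp); }
    public static double accuracy(long tp, long tn, long fp, long fn) {
        return (double) (tp + tn) / (tp + tn + fp + fn);
    }
    // One (False Positive rate, True Positive rate) coordinate of the ROC curve.
    public static double[] rocPoint(long tp, long fp, long tn, long fn) {
        return new double[] { (double) fp / (fp + tn), (double) tp / (tp + fn) };
    }
}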
[0065] Still referring to figure 9, in the Machine Learning - Evaluation phase, the different Kernel modes such as Polynomial, Dot Product and Gaussian are accurately evaluated. Also, the impact of the various parameters in each of these Kernel modes can be studied. The Kernel and the parameters with the best performance are chosen for implementation in the Production phase. After the best Kernel mode and the parameters for the Kernel are selected, the size of the spectrum, represented by 'k', can be determined. To determine the spectrum size k, the accuracy of three different spectrum sizes was compared and the most suitable spectrum size was used for the dataset. After the model is created, the performance of the model may be evaluated. To evaluate a model's performance, a labeled dataset is provided but the label is not disclosed to the model. The model is made to classify the data and assign a label. After the classification is complete, the label created by the model and the original label are compared to derive the confusion matrix. The confusion matrix may be used to check the performance of the algorithm. The columns of the confusion matrix may give the values of the predictions, and the rows represent the instances of the actual values.
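By way of an illustrative sketch (class and method names are hypothetical), such a confusion matrix may be derived by tallying the model's predicted labels against the withheld actual labels; the two lists are assumed to be of equal length:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Builds a confusion matrix: rows are actual labels, columns are the
// labels predicted by the model, cells count the instances.
public class ConfusionMatrix {
    public static Map<String, Map<String, Integer>> build(List<String> actual, List<String> predicted) {
        Map<String, Map<String, Integer>> matrix = new HashMap<String, Map<String, Integer>>();
        for (int i = 0; i < actual.size(); i++) {
            Map<String, Integer> row = matrix.get(actual.get(i));
            if (row == null) {
                row = new HashMap<String, Integer>();
                matrix.put(actual.get(i), row);
            }
            Integer count = row.get(predicted.get(i));
            row.put(predicted.get(i), count == null ? 1 : count + 1);
        }
        return matrix;
    }
}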
[0066] According to another embodiment of the present invention, referring to figure 9, the machine learning-production phase is described. The machine learning-production phase can be sub-divided into two stages, namely a Prediction Phase and a Solution Phase. The Prediction Phase takes the processed dataset as input to the learned model generated by the machine learning-training phase. The processed dataset can be a CSV file as mentioned in the above described paragraphs. The processed dataset is the labeled dataset. The learned model analyses the statistical value sequences from the dataset and generates the data patterns. The data patterns identified by the learned model are the abnormal statistical values representing the medical disorder. The medical disorder can be the neurological disorder. Further, based on the learnt abnormal statistical values from the data patterns, the learned model predicts the disorder in the organ of the living being. After the disorder is identified, the control shifts to the solution phase. The solution phase refers to the entries available in the Greenplum MPP database based on the pre-stored data and may recommend the solution to cure the disorder in the organ. The pre-stored data is the history data related to disorders in the neural organs. Further, the pre-stored data may be the past data related to a similar disorder. The disorder may be a neurological disorder. The organ may be the brain, the spinal cord, etc.
[0067] According to an exemplary embodiment of the present invention, the working of the prediction phase is explained in detail. In the production phase, the learned model may be implemented atop the Greenplum MPP Database in order to fetch the statistical values from the dataset and predict the disorder at a higher rate of speed. In the production phase, the learned model fetches the dataset file, which can be a disease file, and the Extraction, Loading and Transformation (ELT) process is applied by the learned model on the dataset in order to generate a cleaned dataset. The cleaned dataset is fed into the Greenplum Database with the Hadoop Distributed File System (HDFS). Data handled by the Greenplum MPP database can be transformed and processed in-flight, utilizing all nodes of the database in parallel, for high-performance ELT (extract-load-transform) and ETLT (extract-transform-load-transform) loading pipelines. Final gathering and storage of the data to disk takes place on all the nodes simultaneously, with data automatically partitioned across nodes and optionally compressed. This technology is exposed to the DBA via a programmable "external table" interface and a traditional command-line loading interface. The learned model obtained as the output from the Machine Learning-Evaluation phase is connected to the Greenplum MPP Database on HDFS using the Alpine Miner Analytics Tool so that it fetches the dataset from the database and predicts the diseases. The prediction phase implements the parallel SVM classification algorithm by means of the Alpine Miner Analytics Tool in order to generate the data pattern and predict the disorder. The learned model fetches the data from the parallel database, generates the data pattern and predicts the disorder
based on the data patterns. The predictor phase may provide the output as a disease file.
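Because Greenplum speaks the PostgreSQL wire protocol, the fetch step can be illustrated with the stock PostgreSQL JDBC driver; the host, credentials and use of the 'sequence' table below are assumptions for illustration only, reusing the query of Table 3.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Fetches rows of the cleaned dataset from the Greenplum database over
// the PostgreSQL JDBC driver; connection details are hypothetical.
public class DatasetFetcher {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://gp-master:5432/neurodb"; // hypothetical master host
        Connection conn = DriverManager.getConnection(url, "gpadmin", "secret");
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT * FROM sequence WHERE sequences = 'Alzheimer'");
        while (rs.next()) {
            System.out.println(rs.getString(1)); // hand each row to the learned model
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}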
[0068] According to an exemplary embodiment of the present invention, the working of the solution phase is explained. The solution phase accepts the output of the prediction phase, which may be the disease file, as input to the learned model. The learned model in the solution phase may identify the cause of the disease and suggest a solution from the database having pre-stored data related to the solutions of the disorders. The system can also point to the reason that may lead to the disease, so that the disorder pertaining to the data patterns can be easily diagnosed. The prescribed medical procedures to recover from the disorder will be predicted from the database containing the solutions for disorders or diseases that may have already occurred in the past. Further, two different databases, such as the Greenplum MPP Database and PostgreSQL, may be compared for efficiency and the better database may be selected for implementation. By default, the Alpine Miner Tool saves the statistical mapping file as a database table in PostgreSQL. The objective of the Solution phase may be to extract the data patterns pertaining to a disease (disorder) from the database, present the data patterns to the user in an easily understandable form, and suggest a solution for that disease (disorder) so that the counter measures for the disease (disorder) can be performed without any further delay. There can be more than one disorder predicted by the learned model, and the solutions may be provided by the learned model. The Alpine Miner Analytics tool provides support for the parallel implementation of the learned model to connect to the Greenplum Database and to fetch the data for classification. The learned model may be a fine-tuned learned model. The learned model is a supervised learned methodology. Further, the learned model is a technique to identify abnormal statistical data representing a disorder.
[0069] According to an embodiment of the present invention, the
implementation of the system 102 by means of the cloud platform is explained. The processing of large datasets is simplified with the use of Cloud computing technology on the OpenStack Cloud Platform. OpenStack Software delivers a massively scalable Open Source Cloud Operating System. OpenStack consists of a series of interrelated projects (like Hadoop, MapReduce, SNIA-CDMI, etc.) delivering various components for a Cloud infrastructure solution. OpenStack comprises three core projects: Compute, Object Storage and Image Service. As neuroimaging research continues to grow, dynamic neuro-informatics systems are necessary to store, retrieve, mine, and share the massive amount of data. Since implementation of the system 102 may require execution of computationally intensive tasks on a scalable infrastructure, and neuro-informatics is an area challenged by such requirements, Cloud Computing and its large dataset handling capabilities were explored in the study of Computational Neurosciences. The use of large datasets, the highly demanding algorithms, and the need for sudden and powerful computational resources make large-scale Neurosciences experiments an attractive test-case for Cloud Computing.
[0070] According to another embodiment, the OpenStack Compute feature is
explained. OpenStack Compute is Open Source software and standards for large-scale deployments of automatically provisioned virtual compute instances. This layer may provide Compute instances, i.e. Compute nodes or VMs, to the Hadoop framework, which may be the computational backbone for this solution. The OpenStack Object and Block Storage is explained next. OpenStack provides redundant, scalable object storage using clusters of standardized servers capable of storing petabytes of data. Object Storage, unlike a traditional file system, is a distributed storage system for static data such as virtual machine images, photo storage, medical image storage, backups and archives. Block storage is appropriate for performance sensitive scenarios such as database storage, expandable file systems, or providing a server with access to raw block level storage. OpenStack Cloud offers both computation and storage capacity on demand. A particular case of application of the present invention may be data-intensive applications, which need efficient data transfer as the main requirement. In order to satisfy this requirement, the computation platform which executes the application needs an appropriate storage backend which permits efficient data manipulation. The OpenStack Image Service is explained next. OpenStack Image Service is a package that provides discovery, registration, and delivery services for Virtual Disk Images (VDI) which can be pre-installed with an Image Processing tool and analytics packages. These VDIs may be used to launch instances on the OpenStack Cloud for Neural Image Processing.
[0071] In cloud computing, the hosted services provided in the form of application service provisioning that run client-server software at a remote location are 'SaaS' (Software as a Service), 'PaaS' (Platform as a Service), 'IaaS' (Infrastructure as a Service), 'HaaS' (Hardware as a Service) and finally 'EaaS' (Everything as a Service). End users access cloud-based applications through a web browser, a thin client or a mobile app, while the business software and the user's data are stored on servers at a remote location.
[0072] According to another embodiment of the present invention, the implementation of the system 102 by means of the cloud platform is explained. At the IaaS level, clients may run a distributed application using a set of VMs encapsulating it, possibly running under certain limitations. Direct access to local storage space on the physical machine is denied considering the data security aspect; clients are instead provided with a specialized storage service that can be accessed through a specific Application Programming Interface (API). Through the specific API, access to patients' images and storage is restricted from the public. At the PaaS level, the clients provide the application that complies with the software platform. In addition to remote storage in the OpenStack Cloud, specialized file systems have been designed, such as HDFS, the default storage layer of Hadoop's MapReduce framework, which allows better data manipulation than the alternative of using remote storage.
[0073] Current scan modalities and other neuro-images like fMRI, MRI,
EEG, and MEG may be stored on hard-disks present at the scanner's location and
backed up to a centralized Cloud storage facility for Image processing through the SPM tool. Information about each scan session is stored in the centralized MPP database for easy querying, and in a parallel way. Clinical assessments, questionnaires and psychological measures data are stored in a secure, reliable, centralized MPP database. New data may be added to the database through a unique subject-entry which requires a dynamic recommendation and treatment method. Images processed through SPM and legacy psychological data from doctors and practitioners are stored in the Greenplum MPP Database located on HDFS and are further used for Machine Learning and dynamic prediction of neurological disorders.
[0074] In the present scenario, OpenStack Cloud is being primarily used for
some critical functions in a web-based biological database of neuroimaging. Web-based neuroimaging offers versatile, automatable data upload/import/entry options, rapid and secure sharing of data, querying and export of all data, real-time data analytics and real-time reporting, which may be a great help for large institutions as well as smaller scale neuroscience and neuropsychology researchers. The database is running on Greenplum infrastructure. The Greenplum MPP Database is powerful, but it has certain limitations that prevent biological databases from running on its platform alone. So it was required to use NumPy, matplotlib and Biopython, among other Python packages, as well as command line programs written in C. These tasks are submitted to the Cloud for processing, where the option of higher computational speed is available. The complex nature of the algorithm, coupled with the data and computational parallelism of the Hadoop grid and the massively parallel processing database for querying from big datasets containing petabytes of scan data, improves the accuracy and speed and optimizes querying from big datasets residing on the Cloud.
[0075] According to an exemplary embodiment, the REST API is used for
fetching neuro-imaging data from databases. A web application can be used to gather neuroimaging data from various sources like research centers, scientists and neuroimaging labs worldwide. Each core project may expose one or more HTTP/RESTful interfaces for the purpose of interacting with the outside world. An existing API based on the HTTP protocol permits any application, executed on a cloud computation node or outside the cloud, to perform data operations (put/get). When it comes to data-intensive applications, like some of the scientific ones, the distance between computation nodes and data nodes affects the performance. The idea proposed in the present invention is to exploit the data locality principle by having the data and the computation in the same nodes of a public cloud. This approach trades data-availability for efficiency, a good tradeoff for scientific applications that are executed in the Cloud.
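By way of an illustrative sketch of such a 'get' data operation over the HTTP interface (the endpoint path is hypothetical; only the HTTP-based access pattern is taken from the description above):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Performs a simple HTTP GET against a RESTful data endpoint and
// returns the response body as a string.
public class RestFetch {
    public static String get(String endpoint) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("GET");
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        StringBuilder body = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) body.append(line);
        in.close();
        return body.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(get("http://cloud-node/api/v1/neuroimages/42")); // hypothetical URL
    }
}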
[0076] According to an exemplary embodiment, default databases used in
OpenStack are SQLite, MySQL and PostgreSQL. The experiments were conducted with the MySQL and PostgreSQL databases to fetch the sequences from the Cloud. It was observed that more complicated SQL queries are much slower due to the lack of clustered indices and poor join algorithms. Further, the databases were replaced with the EMC® Greenplum® HD Community Edition in the Cloud environment to fetch the dataset in parallel and process it. The datasets ran faster with more optimized code, so the SQL queries were optimized using the Greenplum MPP Database. All steps in the pipeline can be done from a web application, again through the REST API.
[0077] According to an exemplary embodiment, AMQP and Nova in
OpenStack Cloud are explained. Further, an HDFS deployable on the computational nodes of a public Cloud was created. By adapting and improving it, the system of the present invention offers all its basic properties and gains the features of the OpenStack Cloud such as automatic deployment, recovery of failed nodes, runtime scalability and privacy. Any type of application, like scientific applications (i.e. SPM8), can be automatically deployed and can benefit from this contribution. The system offers the necessary means to combine all local storages of worker roles into a unique and uniform file system that can be used to communicate and to pass data between workers. At the same time, data manipulation is enhanced due to data locality. Advanced Message Queuing Protocol (AMQP) is the messaging technology chosen by the OpenStack Cloud. The AMQP broker, either RabbitMQ or Qpid, sits between any two Nova components and allows them to communicate in a loosely coupled fashion. More precisely, Nova components (the compute fabric of OpenStack) use Remote Procedure Calls (RPC) to communicate with one another.
[0078] Nova implements RPC (both request+response, and one-way,
respectively 'rpc.call' and 'rpc.cast') over AMQP by providing an adapter class which takes care of marshaling and unmarshaling of messages into function calls. Each Nova service (for example Compute, Scheduler, etc.) creates two queues at initialization time: one which accepts messages with routing keys of the form 'NODE-TYPE.NODE-ID' (for example, compute.hostname) and another which accepts messages with the generic routing key 'NODE-TYPE' (for example, compute). The former is used specifically when Nova-API needs to redirect commands to a specific node, as with 'euca-run-instances'. In this case, the compute node runs an instance of the specified machine image on the host's hypervisor. The API acts as a consumer when RPC calls are request/response; otherwise it acts as a publisher only. Every Nova component connects to the message broker and, depending on its personality (for example a compute node or a network node), may use the queue either as an Invoker (such as API or Scheduler) or a Worker (such as Compute or Network).
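By way of an illustrative sketch with the RabbitMQ Java client (the exchange name, broker host and message payloads are assumptions), a message can be published with either routing key form described above:

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

// Publishes RPC-style messages to a topic exchange using the two routing
// key forms: a node-specific 'compute.hostname' key and a generic
// 'compute' key picked up by any compute worker.
public class NovaStyleCast {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("amqp-broker"); // hypothetical broker host
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        channel.exchangeDeclare("nova", "topic", true);
        // Directed message, consumed only by the named compute node:
        channel.basicPublish("nova", "compute.host-01", null, "run_instance".getBytes("UTF-8"));
        // Generic message, consumed by any compute worker:
        channel.basicPublish("nova", "compute", null, "report_state".getBytes("UTF-8"));
        channel.close();
        connection.close();
    }
}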
[0079] A common assumption in public clouds is that the application may use
the default communication mechanisms (e.g. AMQP), so they may not need the IPs of the virtual machines. A Remote Procedure Call (RPC) is the method through which a process can execute a procedure/function in another address space in order to communicate and to transmit data between entities. Because of this communication model, the Invoker must provide the configuration file with the addresses of all entities to each entity when the entity is started.
[0080] The Invoker is configured to be the first process started and executed
when a virtual machine is started. It will access a special service provided by the
cloud to ask for metadata about all instances, like IPs, NODE-TYPE/NODE-ID, etc. Based on this information, this agent is able to take the necessary decisions in order to start the HDFS. The decision mechanism was designed to allow customized policies for the HDFS and for the application. The Invoker creates the configuration file and, based on the ID of the virtual machine, starts the appropriate entity on the local machine. It also takes into account how many entities of each type were specified in the policy. The policy also permits specifying whether some nodes will be used just for running the application or just to run specific entities of AMQP. If so specified, the Invoker is able to start different applications on each machine, like MapReduce on one machine and Image Processing tools on another machine on the Cloud.
[0081] According to another embodiment of the present invention, the
working of the Hadoop database and the MapReduce functionality of the Hadoop database with respect to the present invention are explained. Hadoop is a software platform specifically designed to process and handle huge amounts of data. Hadoop is based on the principle that moving the computation to where the data resides is cheaper than moving large data blocks to the compute nodes. Hadoop is scalable, economical, efficient and reliable. Hadoop implements MapReduce using the HDFS. The MapReduce programming model has, in recent times, evolved as a very effective approach to develop high-performance applications over very large distributed systems such as grids and clouds. The storage layer forms the core of the MapReduce framework. The storage layer must meet a series of specific requirements to enable massively parallel data processing to a high degree over a large number of nodes.
[0082] Initially, MapReduce data is typically stored in huge files. The requirement of the present invention is that the computation should efficiently process small parts of these huge files concurrently. The storage layer should provide efficient access to the files. Further, the storage layer should sustain high throughput in spite of concurrent access to the same file. These requirements have not been addressed efficiently enough within the Hadoop framework. So the problem was solved by using the Greenplum database in combination with HDFS. The Greenplum database's shared-nothing MPP architecture provides every segment with a dedicated, independent, high-bandwidth channel to its disk. The segment servers are able to process every query in a parallel manner, using all disk connections simultaneously and efficiently flowing data between segments as the query plan dictates. The degree of parallelism and overall scalability allowed by the Greenplum database far exceeds that of general-purpose database systems.
[0083] According to another exemplary embodiment of the present invention,
the system configuration and the optimization of the system configuration by using the Hadoop database and the MapReduce functionality of the Hadoop database with respect to the present invention are explained. Working in a distributed context is necessary to deal with the memory and computational loads, which requires specific optimization strategies. Once the unitary cost is minimized, the main task while implementing such data parallel applications is to choose how to split the problem into smaller sub-problems and minimize computation, memory consumption and communication overhead. Hence, an efficient and optimized framework of the present system 102 that can handle all of the above said factors is explained. The system 102 is configured with several changes to the default configuration settings. By way of an example, the HDFS version used for the implementation may be 0.20.203. Data in HDFS may be stored using 128 MB data blocks instead of the default 64 MB. For example, setting the input data size to the system 102 as 160 GB and dfs.block.size as 64 MB, the minimum number of maps results in 2560 maps ((160 GB*1024 MB)/64 MB). If dfs.block.size is set to 128 MB, the minimum number of maps results in 1280 maps ((160 GB*1024 MB)/128 MB). In a small cluster having 2-3 nodes, the map task creation overhead is considerable; increasing the dfs.block.size leads to better utilization of the cluster resources. Two Map instances and a single Reduce instance are configured to execute concurrently on each node. In this approach, different combinations of sequences were generated and stored in the Greenplum database. Users can read and write files in parallel from Greenplum to HDFS, enabling rapid and simple data sharing. Cross-platform analysis can be performed using the power of Greenplum SQL and advanced analytic functions to access data on HDFS. Each combination executes the Map/Reduce phases in parallel.
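The block-size arithmetic above can be restated, by way of illustration (the class name is illustrative):

// Reproduces the map-count arithmetic above: with a 160 GB input,
// a 64 MB block yields 2560 map tasks and a 128 MB block yields 1280.
public class MapCount {
    public static long minimumMaps(long inputGb, long blockSizeMb) {
        return (inputGb * 1024) / blockSizeMb;
    }

    public static void main(String[] args) {
        System.out.println(minimumMaps(160, 64));  // 2560
        System.out.println(minimumMaps(160, 128)); // 1280
    }
}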
[0084] Further, by default, each file is split into blocks so that processing can take place in parallel. More buffer space (132 MB) is allowed for file read/write operations, and the sort buffer is increased to 200 MB with 100 concurrent streams for merging. Additionally, the number of parallel transfers run by Reduce during the shuffle phase and the number of worker threads for each TaskTracker's server were modified to be 15. The job tracker web UI provides information about the general job statistics of the Hadoop cluster, which indicates running, completed or failed jobs, and a job history log file. The parameter mapred.compress.map.output (Map Output Compression) is to be set true for large clusters and large jobs; this parameter delivers faster disk writes, saves disk space, and consumes less time in data transfer from Mappers to Reducers. For mapred.map/reduce.tasks.speculative.execution, the speculative execution parameter is set to false in the implementation, even though this may increase the job time if the task progress is slow. In a busy cluster, speculative execution can reduce the overall throughput, since redundant tasks are executed in an attempt to bring down the execution time for a single job.
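By way of an illustrative sketch, the tuning described above may be applied to a Hadoop 0.20-era JobConf as follows. The property names are the standard old-API names; the exact values mirror the text, dfs.* settings would normally live in the cluster's hdfs-site.xml, and the class name is illustrative.

import org.apache.hadoop.mapred.JobConf;

// Applies the cluster tuning described in paragraphs [0083]-[0085].
public class TunedJobConf {
    public static JobConf tune(JobConf conf) {
        conf.set("dfs.block.size", String.valueOf(128L * 1024 * 1024)); // 128 MB HDFS blocks
        conf.setInt("dfs.replication", 3);                // three replicas for fault tolerance
        conf.setInt("io.sort.mb", 200);                   // 200 MB sort buffer
        conf.setInt("io.sort.factor", 100);               // 100 concurrent merge streams
        conf.setInt("mapred.reduce.parallel.copies", 15); // parallel shuffle transfers
        conf.setInt("tasktracker.http.threads", 15);      // TaskTracker worker threads
        conf.setBoolean("mapred.compress.map.output", true);               // compress map output
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);  // no speculative maps
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        return conf;
    }
}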
[0085] Each virtual machine (node), in OpenStack cloud, has a local storage.
The system of the present invention overcomes the limitations of data not being persistent and accessible by other nodes, and exploits these resources efficiently. The HDFS uses the local space on each node to store the data and offers the possibility for all nodes to access data from any other node. By placing the data on the local storage of the nodes, the data is as close as possible to the computation, thus applying the principle of data locality. Fault tolerance is addressed through the replication parameter, with dfs.replication set to 3; having several replicas ensures that even in case a node fails, the remaining nodes can still retrieve the remaining copies of the data. In addition to the efficiency of data manipulation, another advantage relates to the cost. By using an HDFS instead of the storage offered by cloud providers, the costs for storage and for the bandwidth traffic are eliminated.
[0086] It is not necessary to alter the input data for MapReduce programs;
therefore the files were loaded on each node in parallel directly into HDFS as plain text using the command-line utility. Storing the data in this manner enables MapReduce programs to access data using Hadoop's TextInputFormat data format, where the keys are line numbers in each file and their corresponding values are the contents of each line. It was found that this approach yielded the best performance in both the loading process and task execution, as opposed to using Hadoop's serialized data formats or compression features. The Mapper implementation, via the map method, processes one line at a time, as provided by the specified TextInputFormat. It then splits the line into tokens separated by whitespace, via the StringTokenizer, and emits a Key-Value pair of (word, 1). The output of each map is then passed through a sorting phase which sorts the output of the Map according to the keys.
[0087] According to an exemplary embodiment, the implementation of the
correlating module via a mapper function is provided in the following Table 1.

Mapper Function:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

// The class is renamed from 'Mapper' so that it does not clash with the
// imported org.apache.hadoop.mapred.Mapper interface.
public class CorrelationMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable>
{
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException
    {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line); // split on whitespace
        while (tokenizer.hasMoreTokens())
        {
            word.set(tokenizer.nextToken());
            output.collect(word, one); // emit the (word, 1) pair
        }
    }
}
Table 1: Mapper function
[0088] According to an exemplary embodiment, the implementation of the
filtering module via a reducer function is provided in the following Table 2. The output of the Mapper function is given to the Reducer function, which sums up the values, i.e. the occurrence counts for each key.

Reducer Function:
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

// The class is renamed from 'Reducer' so that it does not clash with the
// imported org.apache.hadoop.mapred.Reducer interface.
public class CorrelationReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable>
{
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException
    {
        int sum = 0;
        while (values.hasNext())
        {
            sum += values.next().get(); // accumulate the occurrence count
        }
        output.collect(key, new IntWritable(sum));
    }
}
Table 2: Reducer function
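By way of an illustrative sketch, the Mapper of Table 1 and the Reducer of Table 2 may be wired into a job with the same 0.20-era mapred API, reading plain-text input via TextInputFormat as described in paragraph [0086]; the driver class name and input/output paths are illustrative.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

// Driver that submits the correlation counting job to the cluster.
public class CorrelationJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CorrelationJob.class);
        conf.setJobName("statistical-correlation-count");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(CorrelationMapper.class);   // Table 1
        conf.setReducerClass(CorrelationReducer.class); // Table 2
        conf.setInputFormat(TextInputFormat.class);     // plain-text input, per [0086]
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}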
[0089] According to an exemplary embodiment, the configuration and
optimization of the Greenplum database with respect to the present invention are explained. The Greenplum Database's shared-nothing MPP architecture provides every segment with a dedicated, independent, high-bandwidth channel to its disk. The segment servers can process every query in a fully parallel manner, use all disk connections simultaneously, and efficiently flow data between segments as the query plan dictates. The degree of parallelism and overall scalability that this allows far exceeds that of general-purpose database systems. The Greenplum Query Plans and the optimization procedures in Greenplum are explained next. The EXPLAIN and EXPLAIN ANALYZE query plans were chosen. By way of an example, the query
plans are provided below in Table 3 to fetch Alzheimer's disease with respective p-values.
EXPLAIN SELECT * FROM sequence WHERE sequences = 'Alzheimer';
QUERY PLAN
Gather Motion 2:1 (slice1) (cost=0.00..20.88 rows=1 width=13)
  -> Seq Scan on 'sequence' (cost=0.00..20.88 rows=1 width=13)
       Filter: name::text = 'Alzheimer'::text
Table 3: Query plan to fetch Alzheimer's disease with respective p-values.
[0090] The results of the scan operation are passed to a gather motion
operation. In the Greenplum Database, a gather motion is when segments send rows up to the master. In the present scenario, there are 2 segment instances sending to 1 master instance (2:1). The present operation is working on slice1 of the parallel query execution plan. In the Greenplum Database, a query plan is divided into slices so that portions of the query plan can be worked on in parallel by the segments. The estimated startup cost for this plan is 0.00 (that is, no cost) and the total cost is 20.88 disk page fetches. The planner estimates that this query will return one row.
[0091] According to another embodiment of the present invention, the key
aspect of the present invention is the use of a cloud computing platform in combination with the Hadoop-provided HDFS in the present system 102, which enables collection and storage of medical data over wide geographies. Further, the collected data comprises wide varieties of genetics and wide varieties of medical disorders, which provides medical practitioners with a huge combination of data as a knowledge base. The present invention thus assists doctors or medical practitioners in predicting neurological or medical disorders well in advance, before they become fatal.
[0092] Referring now to Figure 3, a method 300 to facilitate prediction of
neurological disorders is shown, in accordance with an embodiment of the present
subject matter. The method 300 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method 300 may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
[0093] The order in which the method 300 is described is not intended to be
construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 300 or alternate methods. Additionally, individual blocks may be deleted from the method 300 without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method 300 may be considered to be implemented in the above described system 102.
[0094] Referring to figure 3, the method 300 to facilitate prediction of
neurological disorders is shown, in accordance with an embodiment of the present subject matter. In step 302, the data in one or more formats may be accepted and stored in one or more distributed databases, wherein the data is medical data. In one implementation, the accepting and storing of the data in one or more formats in one or more distributed databases may be carried out by the distributed databases. In step 304, a statistical processing technique is applied on the data based on the one or more formats of the data in order to generate statistical data, wherein the statistical data comprises one or more genetic variables and neural images. In one implementation, applying a statistical processing technique on the data based on the one or more formats of the data in order to generate statistical data may be carried out by the computational module 212. In step 306, a regression technique is applied on the statistical data to prepare correlations between the genetic variables and the neural images in order to obtain a first set of values, wherein the first set of values represents a statistical significance of the correlation. In one implementation, applying a regression technique on the statistical data to prepare correlations between the genetic variables and the neural images in order to obtain a first set of values may be carried out by the correlating module 214. In step 306, the one or more first sets of values are filtered based upon a predefined threshold interval in order to obtain a set of statistical significant values. In one implementation, filtering the one or more first sets of values based upon the predefined threshold interval in order to obtain a set of statistical significant values may be carried out by the filtering module 216. In step 308, a dataset based on the one or more first sets of values, the set of statistical significant values, and prescription data is generated. In one implementation, generating the dataset based on the one or more first sets of values, the set of statistical significant values, and prescription data may be carried out by the combination module 218. In step 310, a supervised learned methodology is iteratively developed using the dataset to derive one or more data patterns, wherein the supervised learned methodology is a technique to identify abnormal statistical data representing a disorder, and the one or more data patterns are one or more abnormal statistical values representing a disorder. In one implementation, iteratively developing a supervised learned methodology using the dataset to derive one or more data patterns may be carried out by the machine learning module 220. In step 312, the data patterns are mapped to pre-stored neurological disorder data representing an abnormality, and in step 314, one or more abnormal statistical values from the data pattern are identified to predict the neurological disorder. In one implementation, mapping the data patterns to pre-stored neurological disorder data representing an abnormality and identifying one or more abnormal statistical values from the data pattern to predict the neurological disorder may be carried out by the machine learning module 220. In one implementation, the
accessing, the applying the statistical processing technique, the applying the regression technique, the filtering, the generating the dataset, the iteratively developing, the mapping and the identifying are executed by one or more processors linked to a cloud platform.
[0095] According to an exemplary embodiment, the working example of the system 102, implementing the Greenplum database with HDFS on the cloud platform, is provided below. The parts of the query plan shown in bold in Table 4 show the actual timing and rows returned for each plan node. If the plan is read from the bottom up, some additional information for each plan node operation can be seen. The total elapsed time taken to run the text query 'Alzheimer' was 19.328 milliseconds. The total time required for the text query 'Alzheimer' is explained in detail in Table 4.
EXPLAIN ANALYZE SELECT * FROM sequence WHERE sequences = 'Alzheimer';
QUERY PLAN
Gather Motion 2:1 (slice1) (cost=0.00..20.88 rows=1 width=13)
  recv: Total 1 rows with 0.305 ms to first row, 0.537 ms to end.
  -> Seq Scan on 'sequence' (cost=0.00..20.88 rows=1 width=13)
       Total 1 rows (seg0) with 0.255 ms to first row, 0.486 ms to end.
       Filter: name::text = 'Alzheimer'::text
Query Cost (Time): 19.328 ms elapsed
Table 4: Time required for the text query 'Alzheimer'
[0096] The sequential scan operation had only one segment (seg0) that returned rows, and it returned just 1 row. It took 0.255 milliseconds to find the first row and 0.486 milliseconds to scan all rows. The gather motion operation then received 1 row (the segments sending up to the master). The total elapsed time for this operation was 0.537 milliseconds. Through these optimized query plans in the Greenplum database, sequences can be fetched in a much faster and more efficient manner than with traditional databases. The fetched data predict that fMRI measures of the hippocampus (a region in the brain) may help to identify individuals who will develop Alzheimer's disease in the future. The volume of the hippocampus was measured with this methodology, and the proposed methodology reduces the time required to compute this volume from more than half an hour to a couple of minutes. Previous methods required tracing of this structure with sequential computer programs, which restricted their application to research studies because of the substantial requirement for parallel processing of the data.
[0097] According to an exemplary embodiment, a detailed comparative study, analysis and experimental results of the implementation of the system of the present invention are explained. This section deals with the Greenplum MPP vs. Conventional database experiments. In the experiment, referring to figure 10, to compare the performance of the MPP database with the Conventional database (PostgreSQL), a query was executed to read tables of varying sizes in MB. The number of tables to be fetched from the database was also changed and the performance was noted. Figure 10 shows that the Greenplum MPP Database outperformed the conventional database. It was observed that the time in fetching the table from the database decreased by 59.73% in the case of Greenplum (optimised query plans) with Hadoop when compared to PostgreSQL.
[0098] There is a potentially serious performance problem related to
MapReduce in handling of data transfer between Map and Reduce jobs. If each of the N Map instances produces M output files, each file is destined for a different Reduce instance. These files are written to the local disk on the node executing each particular Map instance. If N is 10 and M is 50, the Map phase of the program produces 500 local files. When the Reduce phase starts, each of the 50 Reduce instances needs to read its 10 input files and must use a file-transfer protocol to "pull" each of its input files from the nodes on which the Map instances were run. With hundreds of Reduce instances running simultaneously, it is inevitable that two or
more Reduce instances will attempt to read their input files from the same map node simultaneously, inducing large numbers of disk seeks and slowing the effective disk transfer rate.
[0099] Greenplum addresses this issue by utilizing a 'parallel-everywhere'
approach to loading, where data flows from one or more source systems to every node of the database without any sequential choke points. This differs from traditional "bulk loading" technologies, used by most mainstream database and MPP appliance vendors, that push data from a single source, often over a single or small number of parallel channels, and result in fundamental bottlenecks and ever-increasing load times. Greenplum's approach also avoids the need for a 'loader' tier of servers, as required by some other MPP database vendors, which can add significant complexity and cost while bottlenecking the bandwidth and parallelism of communication into the database. Greenplum's MPP Scatter/Gather Streaming (SG Streaming) technology eliminates the bottlenecks associated with other approaches to data loading, enabling lightning-fast flow of data into the Greenplum Database for large-scale analytics and data warehousing. Studies reveal that Greenplum customers are achieving production-loading speeds of over four terabytes per hour with negligible impact on concurrent database operations.
[00100] According to an exemplary embodiment, the latency test performed on
Cloud platforms is explained. A number of tests were executed on 5 different cloud platforms (Amazon, Google, OpenStack, Salesforce.com, and Azure) in an attempt to measure the performance of each platform. One of the conclusions is that each platform works better for different application types. In order to test cloud performance, 4 types of tests, benchmarking 5 native applications running on the 5 different Cloud platforms, were selected:
Tests
Requesting a Small Object - a 1x1 pixel GIF.
Requesting a Large Object - a 2 MB image.
Performing a CPU Intensive Task - 1,000,000 sine and sum operations. For Salesforce.com, a 100,000 ops load was used because of a platform limitation.
Performing an IO Intensive Task - querying a 500,000 rows table, using a Greenplum MPP database with cleared cache for Amazon and OpenStack, a data store for Salesforce.com, and BigTable for Google.
[00101] According to an exemplary embodiment, referring to figure 11, the average latency test across the various Cloud platforms and the results of the tests are provided. To perform the measurements, requests were sent at various time intervals from multiple worldwide locations. Two of the Cloud platforms, namely Salesforce.com and Google, provide PaaS services, while the other three, Amazon, OpenStack and Azure, provide IaaS services. All platforms performed well for small objects, while the PaaS platforms performed better than the IaaS platforms for larger objects. Salesforce.com performed poorly for CPU intensive tasks although its test included only 10% of the number of operations used on the other platforms. Google and OpenStack were best at the IO test.
[00102] According to an exemplary embodiment, referring to figure 12, the
comparison of various spectrum sizes 'k' and the results of the tests are provided. The Polynomial kernel mode with a degree parameter of 3 is the better suited configuration for the Parallel SVM classification algorithm for the dataset considered in the present invention. Further, it is also necessary to determine the optimal spectrum size of the messages in the dataset. The size of the spectrum must not be too large, as it increases the computation overhead considerably because a large amount of data has to be kept in memory at the same time. Also, a bigger spectrum size results in a larger number of normal messages coming along with the disease statistical mapping in the solution phase. To get the spectrum size that has the maximum accuracy, the results of three different spectrum sizes, namely 3, 4 and 5, are compared, and the results are shown in Figure 13. From Figure 13, it can be concluded that for the spectrum size k = 5, the maximum average accuracy of 92.5 percent is achieved. The datasets with spectrum sizes k=3 and k=4 have accuracies of 79.5 percent and 82.8 percent, respectively.
[00103] According to an exemplary embodiment of the present invention, the
system disclosed in the present invention can be used for detecting neurological disorders, such as abnormalities in the brain, spinal cord or nerves, and aid doctors and specialists in the course of treatments. The data obtained as a result of processing is captured and stored in an MPP database, thereby accumulating all the records from the global area by means of the cloud platform. The relevant information from this database is extracted and mapped to various treatment procedures prescribed by the specialists. The computational power is delivered here using Hadoop and its MapReduce feature. The data analysis involved can be too complex for traditional data processing applications. In this approach, an SVM learning algorithm is used, which derives pattern mapping and creates prediction models to form a part of the analytics. This pattern mapping is critical in the study of mutations and the analysis of rare brain diseases. Patterns derived by these technologies will be a unique paradigm in predicting models for upcoming datasets. Being able to make predictions by looking at the data from the initial scan allows the doctor to anticipate, and therefore treat, severe symptoms early on, before the impact becomes fatal. Doctors and researchers can also design more informative clinical trials if they can predict the likely course of a disease. This environment, when implemented using the sample dataset, has been shown to predict brain diseases with an accuracy of 90-92 percent.

Claims:
1. A system to facilitate prediction of neurological disorders, the system comprising:
one or more distributed databases configured to store the data in one or more formats, wherein the data is medical data;
one or more processors linked to a cloud platform accessing data stored in the distributed databases; and
one or more memories coupled to the one or more processors, wherein the one or more processors are capable of executing a plurality of modules stored in the one or more memories, the plurality of modules comprising:
a computational module configured to apply a statistical processing technique on the data based on the one or more formats of the data in order to generate statistical data, wherein the statistical data comprises one or more genetic variables and one or more neural images;
a correlating module configured to apply a regression technique on the statistical data to prepare correlations between the genetic variables and the neural images in order to obtain one or more first set of values, wherein the first set of values represents a statistical significance of the correlation;
a filtering module configured to filter the one or more first set of values based upon a predefined threshold in order to obtain a set of statistical significant values;
a combination module configured to generate a dataset based on the one or more first set of values, the set of statistical significant values, and prescription data;
a machine learning module configured to accept the dataset and iteratively develop a supervised learned methodology using the dataset to derive one or more data patterns, wherein the supervised learned methodology is a technique to identify abnormal statistical
data representing a disorder, and the one or more data patterns are one or more abnormal statistical values representing a disorder; and to map the data patterns to pre-stored neurological disorder data representing an abnormality in order to identify one or more abnormal statistical values from the data pattern to predict the neurological disorder.
2. The system of claim 1, wherein the memory further stores a recommendation module configured to recommend a solution to the neurological disorder based on the pre-stored data related to a similar neurological disorder.
3. The system of claim 1, wherein the dataset supplied to the supervised learned methodology includes a labeled dataset in a binary way with the statistical significant values and a disorder name.
4. The system of claim 1, wherein the data comprises neurological data.
5. The system of claim 1, wherein the prescription data further comprises clinical assessments, questionnaires, psychological measures, and treatment recommendations from users such as practitioners and doctors.
6. The system of claim 1, wherein the statistical data is in the form of a joint neuro- imaging dataset wherein the joint neuro-imaging dataset comprises one or more genetic variables and one or more neural images.
7. The system of claim 1, wherein the regression technique is applied in permutation over multiple dimensions of the statistical data.
8. The system of claim 1, wherein the one or more processors use Hadoop and its MapReduce feature for computation.
9. The system of claim 1, wherein the filtering module filters the first set of values by using Hadoop's MapReduce feature.
10. The system of claim 1, wherein the data obtained as a result of processing is captured and stored in a Massively Parallel Processing (MPP) database such as Greenplum database wherein the Greenplum database includes Hadoop Distributed File System.
11. The system of claim 1, wherein the Cloud platform is an open stack Cloud platform that implements a distributed file system using the Hadoop Distributed File System.
12. A method to facilitate prediction of neurological disorders, the method comprises:
accepting and storing the data in one or more formats in one or more distributed databases wherein the data is medical data;
applying a statistical processing technique on the data based on the one or more formats of the data in order to generate statistical data, wherein the statistical data comprises one or more genetic variables and neural images;
applying a regression technique on the statistical data to prepare correlations between the genetic variables and the neural images in order to obtain one or more first set of values, wherein the first set of values represents a statistical significance of the correlation;
filtering the one or more first set of values based upon a predefined threshold interval in order to obtain a set of statistical significant values;
generating a dataset based on the one or more first set of values, the set of statistical significant values, and prescription data;
iteratively developing a supervised learned methodology using the dataset to derive one or more data patterns, wherein the supervised learned methodology is a technique to identify abnormal statistical data representing a disorder, and the one or more data patterns are one or more abnormal statistical values representing a disorder; and
mapping the data patterns to pre-stored neurological disorder data representing an abnormality; and
identifying one or more abnormal statistical values from the data pattern to predict the neurological disorder;
wherein the accessing, the applying the statistical processing technique, the applying the regression technique, the filtering, the generating, the iteratively
developing, the mapping and the identifying are executed by one or more processors linked to a cloud platform.
13. The method of claim 12, wherein the method further comprises recommending a solution to the neurological disorder based on the pre-stored data related to a similar neurological disorder.
14. The method of claim 12, wherein the dataset on which the supervised learned methodology is applied includes a labeled dataset in a binary way with the statistical significant values and a disorder name.
15. The method of claim 12, wherein the data comprises neurological data.
16. The method of claim 12, wherein the prescription data further comprises clinical assessments, questionnaires, psychological measures, and treatment recommendations from users such as practitioners and doctors.
17. The method of claim 12, wherein the statistical data is in the form of a joint neuro-imaging dataset wherein the joint neuro-imaging dataset comprises one or more genetic variables and one or more neural images.
18. The method of claim 12, wherein the method uses Hadoop and its MapReduce feature for computation.
19. The method of claim 12, wherein the filtering of the one or more first set of values is carried out by using Hadoop's MapReduce feature.

Documents

Application Documents

# Name Date
1 3407-MUM-2013-FORM 1(25-11-2013).pdf 2013-11-25
2 3407-MUM-2013-CORRESPONDENCE(25-11-2013).pdf 2013-11-25
3 3407-MUM-2013-FORM 26(12-12-2013).pdf 2013-12-12
4 3407-MUM-2013-CORRESPONDENCE(12-12-2013).pdf 2013-12-12
5 3407-MUM-2013-FORM 1.pdf 2018-08-11
6 3407-MUM-2013-FORM 2.pdf 2018-08-11
7 3407-MUM-2013-FORM 2(TITLE PAGE).pdf 2018-08-11
8 3407-MUM-2013-FORM 3.pdf 2018-08-11
9 3407-MUM-2013-FORM 18.pdf 2018-08-11
10 3407-MUM-2013-DESCRIPTION(COMPLETE).pdf 2018-08-11
11 3407-MUM-2013-CLAIMS.pdf 2018-08-11
12 3407-MUM-2013-ABSTRACT.pdf 2018-08-11
13 3407-MUM-2013-DRAWING.pdf 2018-08-11
14 3407-MUM-2013-CORRESPONDENCE.pdf 2018-08-11
15 ABSTRACT1.jpg 2018-08-11
16 3407-MUM-2013-FER.pdf 2019-09-23
17 3407-MUM-2013-FER_SER_REPLY [23-03-2020(online)].pdf 2020-03-23
18 3407-MUM-2013-COMPLETE SPECIFICATION [23-03-2020(online)].pdf 2020-03-23
19 3407-MUM-2013-CLAIMS [23-03-2020(online)].pdf 2020-03-23
20 3407-MUM-2013-ABSTRACT [23-03-2020(online)].pdf 2020-03-23
21 3407-MUM-2013-OTHERS [23-03-2020(online)].pdf 2020-03-23
22 3407-MUM-2013-US(14)-HearingNotice-(HearingDate-25-05-2023).pdf 2023-04-25
23 3407-MUM-2013-Correspondence to notify the Controller [03-05-2023(online)].pdf 2023-05-03
24 3407-MUM-2013-FORM-26 [03-05-2023(online)].pdf 2023-05-03
25 3407-MUM-2013-FORM-26 [03-05-2023(online)]-1.pdf 2023-05-03
26 3407-MUM-2013-Written submissions and relevant documents [29-05-2023(online)].pdf 2023-05-29
27 3407-MUM-2013-PatentCertificate11-10-2023.pdf 2023-10-11
28 3407-MUM-2013-IntimationOfGrant11-10-2023.pdf 2023-10-11

Search Strategy

1 search_19-09-2019.pdf
2 ssamendedAE_30-12-2022.pdf

ERegister / Renewals

3rd: 27 Oct 2023 (From 29/10/2015 To 29/10/2016)
4th: 27 Oct 2023 (From 29/10/2016 To 29/10/2017)
5th: 27 Oct 2023 (From 29/10/2017 To 29/10/2018)
6th: 27 Oct 2023 (From 29/10/2018 To 29/10/2019)
7th: 27 Oct 2023 (From 29/10/2019 To 29/10/2020)
8th: 27 Oct 2023 (From 29/10/2020 To 29/10/2021)
9th: 27 Oct 2023 (From 29/10/2021 To 29/10/2022)
10th: 27 Oct 2023 (From 29/10/2022 To 29/10/2023)
11th: 27 Oct 2023 (From 29/10/2023 To 29/10/2024)
12th: 29 Oct 2024 (From 29/10/2024 To 29/10/2025)
13th: 24 Oct 2025 (From 29/10/2025 To 29/10/2026)