A Computer Implemented System And Method For Optimal Lossy Compression Of Sensor Data

Abstract: The present invention relates to a system and method for optimal lossy compression of sensor data. The technique is sensor-agnostic, i.e. independent of sensor type, and depends on the statistical or signal-processing nature of the sensor data. It helps to achieve a high compression gain (size of original data / size of compressed data) and to optimize the information loss (standard deviation of decompressed signal / standard deviation of original signal) when the sensor data is decompressed. The disclosed system applies a lower rate of compression to unusual data patterns in the sensor data and a higher rate of compression to normal data patterns, so as to maximize compression gain and optimize information loss. Fig. 1

Patent Information

Application #

Filing Date

24 February 2015

Publication Number

35/2016

Publication Type

INA

Invention Field

ELECTRICAL

Status

Email

dewan@rkdewanmail.com

Parent Application

Patent Number

Legal Status

Grant Date

2024-07-11

Renewal Date

Applicants

TATA CONSULTANCY SERVICES LIMITED

Nirmal Building, 9th Floor, Nariman Point, Mumbai – 400 021, Maharashtra, India.

Inventors

1. UKIL, Arijit

Innovation Lab, TCS, Kolkata

2. BANDYOPADHYAY, Soma

Innovation Lab, TCS, Kolkata

3. PAL, Arpan

Innovation Lab, TCS, Kolkata

4. SINHA, Aniruddha

Innovation Lab, TCS, Kolkata

Specification

CLAIMS: 1. A computer implemented system for compression of sensor data, said system comprising:
• a system processor 10 configured for receiving predetermined set of rules and further configured to use said rules for providing processing commands to control various elements of said system;
• a repository 20 configured to store predetermined set of rules, including rules relating to segmenting of received data into predefined block size;
• an input module 30 configured to cooperate with the system processor 10 and further configured to accept sensor data;
• a detection module 40 configured to cooperate with said system processor 10 to receive commands and said input module 30 to receive said sensor data, said detection module comprising:
o a sensitive content detector 42 having an extractor configured to detect and extract segments of sensor data containing sensitive content;
o a divider 44 configured to cooperate with said sensitive content detector to receive said extracted segments of sensor data and further configured to divide said extracted segments into blocks of predefined size containing specific number of segments containing sensitive data;
o a first score assignor 46 configured to cooperate with the divider to receive said blocks and further configured to assign first importance scores to the blocks based on the frequency of their repetition ;
• an optimal size block module 50 configured to cooperate with said system processor 10, said first score assignor 46 configured to receive the first importance scores, said optimal block size module 50 comprising:
o a comparator 52 configured to compare the first importance scores of consecutive blocks to identify the consecutive blocks having the same first importance score;
o an optimal size block creator 54 configured to cooperate with the comparator 52 to receive said identified consecutive blocks and further configured to create optimal size blocks by merging the consecutive blocks;
• a second score assignor 60 configured to cooperate with said system processor 10, said optimal size block creator 54 to receive said optimal sized blocks and further configured to assign a second importance score to said optimal sized blocks;
• an optimal threshold determiner 70 configured to cooperate with said system processor 10 and said second score assignor 60 to receive the second importance scores, and further configured to determine an optimal threshold value for each of said optimal sized blocks based on the second importance score of said optimal sized blocks;
• a compression module 80 configured to cooperate with said system processor 10, said optimal size block creator 54 to receive the optimal size blocks, and said optimal threshold determiner 70 to receive the optimal threshold value for each of said optimal sized blocks, and further configured to compress each of said optimal sized blocks corresponding to their optimal threshold values.
2. The system as claimed in claim 1, wherein the second importance score assigned to the optimized sized blocks is based on the first importance scores of the consecutive blocks associated with said optimal size blocks.
3. The system as claimed in claim 1, wherein the compression module 80 uses Chebyshev compression technique for compressing said first set and second set.
4. A computer implemented method for compression of sensor data, said method comprising:
• receiving predetermined set of rules and further configured to use said rules for providing processing commands to control various elements of said system
• storing predetermined set of rules, including rules relating to segmenting of received data into predefined block size, predefined threshold value, predefined higher compression rate and predefined lower compression rate
• accepting sensor data;
• detecting and extracting segments of sensor data containing sensitive content
• dividing said extracted segments into blocks of predefined size containing specific number of segments containing sensitive data
• assigning first importance scores to the blocks based on the frequency of their repetition;
• comparing the first importance scores of consecutive blocks and identifying the consecutive blocks having the same first importance score;
• creating optimal size blocks by merging the consecutive blocks;
• assigning a second importance score to said optimal sized blocks;
• determining an optimal threshold value for each of said optimal sized blocks based on the second importance score of said optimal sized blocks; and
• compressing said optimal sized blocks.
5. The method as claimed in claim 4, wherein the second importance score assigned to the optimized sized blocks is based on the first importance scores of the consecutive blocks associated with said optimal size blocks.
6. The method as claimed in claim 4, wherein Chebyshev compression technique is used for compressing said first set and second set.
TECHNICAL FIELD
The present disclosure relates to the field of data compression.

DEFINITIONS OF TERMS USED IN THE SPECIFICATION
The expression ‘sensitive content’ used hereinafter in this specification refers to segments of sensor data which indicate the presence of an anomaly and have a very low frequency of repetition. These segments of sensor data are more significant than the other segments of sensor data.
BACKGROUND
Sensor nodes are the embodiment of IoT systems at the microscopic level. The data collected by a network of sensors can be used to better understand the dynamics and operations of the structure or system to which the sensors are attached. As the number of sensors increases, the volume of sensor data increases exponentially. Storage, transmission and processing of such high-volume data can degrade performance and pose a scalability risk. It is also economically inefficient. The problem is aggravated with tiny sensors, as they have limited battery lifetime and processing power for transmission of high-volume data.
In the prior art, lossy compression methods are used to achieve significant compression gain when processing high volumes of sensor data, but owing to the heterogeneous nature of sensor data, no single lossy compression method is suitable for all of it. Additionally, the prior art does not provide any compression technique that is suitable for compressing all types of sensor data.
Therefore, there is a need for a system that limits the aforementioned drawbacks and provides a lossy compression technique that supports different sensor data types.
OBJECT
An object of the present disclosure is to provide a system which maximizes the compression gain.
Another object of the present disclosure is to provide a system which optimizes the information loss during reconstruction or decompression of sensor data.
Still another object of the present disclosure is to provide a system which captures the inherent properties of large number of sensor data more accurately.
An additional object of the present disclosure is to provide a system which is sensor agnostic, to support a gamut of sensor data types.
Other objects and advantages of the present disclosure will be more apparent from the following description when read in conjunction with the accompanying figures, which are not intended to limit the scope of the present disclosure.
SUMMARY
The present claimed subject matter envisages a system and method for optimal lossy compression of sensor data. The system comprises a processor to provide processing commands, a repository to store a predetermined set of rules, an input module to accept sensor data, a detection module to detect sensitive content in the sensor data and assign scores to the data, an optimal block size module to assign an optimal block size, a second score assignor to assign a second importance score, an optimal threshold determiner to determine the optimal threshold value, and a compression module to compress the optimal size blocks corresponding to the optimal threshold values, so as to achieve maximum compression gain and optimize information loss.
BRIEF DESCRIPTION OF ACCOMPANYING DRAWING
Figure 1 illustrates a schematic diagram for a computer implemented system for compression of sensor data in accordance with present disclosure.
Figures 2A and 2B illustrate a flow diagram showing the steps involved in compression of sensor data, in accordance with the present disclosure.
Figure 3A illustrates a bar chart showing the experimental comparison results of compression gain of various compression techniques without considering the optimal block size.
Figure 3B illustrates a bar chart showing the experimental comparison results of loss factor of various compression techniques without considering the optimal block size.
Figure 4A illustrates a bar chart showing the experimental comparison results of compression gain of various compression techniques with the optimal block size.
Figure 4B illustrates a bar chart showing the experimental comparison results of loss factor of various compression techniques with the optimal block size.
Figure 5A illustrates a bar chart showing the experimental comparison results of compression gain of various compression techniques with the optimal block size for heterogeneous sensor data.
Figure 5B illustrates a bar chart showing the experimental comparison results of loss factor of various compression techniques with the optimal block size for heterogeneous sensor data.
Figure 6 illustrates a graph showing the mean compression gain and loss factor with heterogeneous sensor data, showing high (+ve) compression gain and –ve loss factor.
Figure 7A illustrates a bar chart showing the experimental comparison results of compression gain of various compression techniques with the optimal block size for quasi-periodic signals like ECG.
Figure 7B illustrates a bar chart showing the experimental comparison results of loss factor of various compression techniques with the optimal block size for quasi-periodic signals like ECG.
DETAILED DESCRIPTION
A computer implemented system and method for optimal lossy compression of sensor data will now be described with reference to the embodiment shown in the accompanying drawing. The embodiment does not limit the scope and ambit of the disclosure. The description relates purely to the examples and preferred embodiments of the disclosed system and its suggested applications.
The system herein and the various features and advantageous details thereof are explained with reference to the non-limiting embodiments in the following description. Descriptions of well-known parameters and processing techniques are omitted so as to not unnecessarily obscure the embodiment herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiment herein may be practiced and to further enable those of skill in the art to practice the embodiment herein. Accordingly, the examples should not be construed as limiting the scope of the embodiment herein.
Figure 1 illustrates a system 100 for compression of sensor data. The system 100 comprises a system processor 10, a repository 20, an input module 30, a detection module 40, an optimal size block module 50, a second score assignor 60, an optimal threshold determiner 70 and a compression module 80.
The system processor 10 is configured for receiving the predetermined set of rules and adapted to use said rules for providing processing commands to control various elements of said system 100. The processor may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor is configured to fetch and execute computer-readable instructions stored in the memory.
The repository 20 is configured to store the predetermined set of rules. It also stores the rules relating to segmenting, for breaking the data into predefined block sizes. The repository can include any computer-readable medium known in the art including, for example, volatile memory (e.g. RAM), and/or non-volatile memory (e.g., EPROM, flash memory, etc.).
The input module 30 is configured to cooperate with the system processor 10 for receiving the processing commands and is configured to accept sensor data.
The detection module 40 is configured to cooperate with the system processor 10 to receive processing commands, and with said input module 30 to receive said sensor data. The detection module 40 comprises a sensitive content detector 42, a divider 44 and a first score assignor 46.
The sensitive content detector 42 comprises an extractor (not shown in figure). The sensitive content detector 42 is configured to detect and extract segments of sensor data containing sensitive content. In an embodiment, an anomaly present in the sensor data can be considered sensitive content. For example, an unusual pattern in ECG data (which may be indicative of arrhythmia) is sensitive content. Likewise, unusually high energy consumption at odd hours, detected through smart meter readings, is sensitive content for the purpose of mining and knowledge discovery.
In an embodiment detection of sensitive data further comprises determining a kurtosis value corresponding to the time-series data and comparing the kurtosis value with a reference value. The process of detecting further comprises determining a data distribution of the time-series data (sensor data) based upon the comparison. The data distribution comprises one of a platykurtic distribution, a mesokurtic distribution, and a leptokurtic distribution. Further the time-series data is processed using a first filtering means or a second filtering means. The first filtering means is used when the data distribution of the time-series data is either of the platykurtic distribution or the mesokurtic distribution. The second filtering means is used when the data distribution of the time-series data is the leptokurtic distribution.
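The kurtosis comparison described above can be illustrated with a minimal sketch. The specification does not state the reference value or how close to it a series must be to count as mesokurtic; the value 3.0 (the kurtosis of a normal distribution), the tolerance of 0.5, and the function names are assumptions made only for this example.

```python
import numpy as np

def pearson_kurtosis(series):
    """Fourth standardized moment of the series (3.0 for a normal distribution)."""
    x = np.asarray(series, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    if variance == 0:
        raise ValueError("kurtosis is undefined for a constant series")
    return np.mean(centered ** 4) / variance ** 2

def classify_distribution(series, reference=3.0, tolerance=0.5):
    """Compare the kurtosis with a reference value (assumed 3.0 +/- 0.5 here)."""
    k = pearson_kurtosis(series)
    if k > reference + tolerance:
        return "leptokurtic"
    if k < reference - tolerance:
        return "platykurtic"
    return "mesokurtic"

def choose_filter(series):
    """Select the first or second filtering means from the distribution type."""
    kind = classify_distribution(series)
    return "second" if kind == "leptokurtic" else "first"
```

A heavy-tailed series, such as a mostly flat signal with a single large spike, is classified as leptokurtic and routed to the second filtering means; flat or near-normal series go to the first.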

The divider 44 is configured to cooperate with said sensitive content detector 42 to receive the extracted segments of sensor data containing sensitive content. The divider 44 is configured to divide the extracted segments of data, using the predefined set of rules, into blocks of predefined size, wherein each block contains a specific number of segments and each segment contains sensitive data. In an embodiment, the predefined block size is equal to 256.
The first score assignor 46 cooperates with the divider 44 to receive the blocks of predefined size containing sensitive data. The first score assignor 46 is configured to assign first importance score to the blocks based on the frequency of their repetition.
The optimal size block module 50 cooperates with the system processor 10 to receive system processing commands, the first score assignor 46 to receive the first importance score for each of the predefined sized blocks. The optimal block size module 50 comprises a comparator 52 and an optimal size block creator 54.
The comparator 52 is configured to compare the first importance score of consecutive blocks to identify the consecutive blocks having the same first importance score.
The optimal size block creator 54 is configured to cooperate with the comparator 52 to receive the identified consecutive blocks and is further configured to create optimal sized blocks by merging said identified consecutive blocks.
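The cooperation of the comparator 52 and the optimal size block creator 54 can be pictured with a short sketch that merges runs of consecutive blocks sharing the same first importance score. The function name and the list-based representation of blocks and scores are assumptions for illustration, not the data structures of the disclosed system.

```python
from itertools import groupby

def merge_equal_score_blocks(blocks, scores):
    """Merge consecutive blocks whose first importance scores are equal.

    blocks: list of lists of samples (fixed-size blocks from the divider).
    scores: first importance score of each block, in the same order.
    Returns a list of (score, merged_block) pairs.
    """
    if len(blocks) != len(scores):
        raise ValueError("each block needs exactly one score")
    merged = []
    start = 0
    # groupby yields one group per run of equal consecutive scores
    for score, run in groupby(scores):
        run_length = len(list(run))
        combined = []
        for block in blocks[start:start + run_length]:
            combined.extend(block)
        merged.append((score, combined))
        start += run_length
    return merged
```

For example, blocks scored 1, 1, 2, 1 yield three merged blocks: the first two are joined, while the third and fourth stay separate because their neighbours carry different scores.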
In the present claimed subject matter, the Chebyshev approximation technique is used for the compression of sensor data. In Chebyshev compression, the maximum error constraint is violated because of the fixed, pre-assigned block size B. Therefore, in the present claimed subject matter a dynamic block size is adopted with the help of the optimal size block module 50.
In an embodiment, the optimal size block $B_o$ is assigned based on the first importance score differential ($\Delta\phi_{\varepsilon}$) as:
$$B_o = \begin{cases} \text{recursion}(B_{init} = 2 \cdot B_{init}), & \Delta\phi_{\varepsilon} \neq 0 \\ B_{init}, & \text{else} \end{cases}$$
where $B_{init}$ is the block of predefined size (= 256) and $\Delta\phi_{\varepsilon} = |\phi_{|B|} - \phi_{|B|+1}|$, where $\phi_{|B|}$ is the first importance score of the B-th block and $\phi_{|B|+1}$ is the first importance score of the (B+1)-th block. The inherent effect of a larger block size is to enhance compression gain as well as decrease information loss. Whenever the differential of the first importance score is non-zero for consecutive blocks, a larger block size is considered until the differential importance score is zero or the end of the data is reached.
The second score assignor 60 is configured to cooperate with the optimal size block creator 54 to receive the optimal sized blocks. The second score assignor 60 is further configured to assign the second importance score to the optimal sized blocks.
In an embodiment, the second importance score assigned to the optimal sized blocks is based on the average of the first importance scores of the consecutive blocks associated with said optimal sized blocks.
The optimal threshold determiner 70 is configured to cooperate with the system processor 10 to receive processing commands and with the second score assignor 60 to receive the second importance score for each of said optimal sized blocks.
The optimal threshold determiner 70 is further configured to determine an optimal threshold value for each of said optimal sized blocks based on the second importance score of said optimal sized blocks.
The optimal threshold value acts as a clipping parameter. A higher threshold value results in more compression gain as well as more information loss. Determining the optimal threshold value is therefore required for the trade-off between compression gain and information loss. In an embodiment, the optimal threshold determiner 70 uses the following equation for determining the optimal threshold value (Γ):
$\Gamma = c \cdot 2^{(\phi_{MAX} - \phi_{\varepsilon \in B_o})}$; $c = \lceil F \rceil$, where $F$ (the Fano factor) $= \sigma^2 / \bar{\varepsilon}$
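As a worked illustration of the threshold equation, the sketch below computes the Fano factor (variance divided by mean) and then Γ. The specification does not say over which samples σ² and the mean are taken; computing them over the samples of the block being compressed, the use of the population variance, and the function names are assumptions of this example.

```python
import math
import numpy as np

def fano_factor(samples):
    """Fano factor F = variance / mean of the samples (population variance)."""
    x = np.asarray(samples, dtype=float)
    mean = x.mean()
    if mean == 0:
        raise ValueError("Fano factor is undefined for a zero-mean series")
    return x.var() / mean

def optimal_threshold(samples, block_score, max_score):
    """Gamma = c * 2 ** (phi_MAX - phi_block), with c = ceil(F)."""
    c = math.ceil(fano_factor(samples))
    return c * 2.0 ** (max_score - block_score)
```

With this form, a block whose second importance score equals the maximum score receives the smallest threshold (Γ = c), and each unit drop in importance doubles the threshold, so less important blocks are clipped more aggressively.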
The compression module 80 cooperates with the system processor 10 to receive processing commands, with the optimal size block creator 54 to receive the optimal sized blocks, and with the optimal threshold determiner 70 to receive the optimal threshold value. The compression module 80 is configured to compress the optimal sized blocks containing segments of sensor data.
In an embodiment, the compression module 80 uses the non-linear model of the Chebyshev approximation technique for the compression of sensitive data. The non-linear model of Chebyshev approximation captures the inherent properties of a large number of sensor data more accurately. The fixed block size is the only demerit of the Chebyshev compression technique, set against its maximum error bound feature. The present disclosure overcomes this problem by introducing dynamic block sizes.
In another embodiment, the compressed data $\varepsilon_\delta$ is represented as a linear combination of Chebyshev polynomials: $\varphi(i) = \sum_{i=0}^{N} \alpha_i \, \eta_{\vartheta}(i)$, where the (predefined) block of size B consists of N segments,
$\vartheta = \left(i - \frac{N+1}{2}\right) \cdot \frac{2}{N-1}$, normalized to [-1, 1] (simply, a cosine look-up table), and
$\alpha_i$ is the Chebyshev polynomial coefficient at degree i.
For a defined optimal threshold $\Gamma$: $\varepsilon_\delta = \begin{cases} \varphi(i), & \varphi(i) \geq \Gamma \\ 0, & \text{else} \end{cases}$ (1)
Quantization (digitization) is applied to the non-zero $\varepsilon_\delta$ for storage and transmission purposes.
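The approximation and clipping steps can be sketched with NumPy's Chebyshev routines. This is a minimal illustration under stated assumptions: the least-squares fit via `chebfit`, the choice of polynomial degree, and the function names are not taken from the specification, and clipping rule (1) is applied literally to the evaluated values as written above.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def normalized_positions(n):
    """Map sample indices i = 1..N onto [-1, 1]: (i - (N+1)/2) * 2/(N-1)."""
    if n < 2:
        raise ValueError("a block needs at least two samples")
    i = np.arange(1, n + 1, dtype=float)
    return (i - (n + 1) / 2.0) * 2.0 / (n - 1)

def chebyshev_approximate(block, degree):
    """Fit Chebyshev coefficients to a block and evaluate the approximation."""
    y = np.asarray(block, dtype=float)
    theta = normalized_positions(y.size)
    coeffs = cheb.chebfit(theta, y, degree)   # least-squares coefficients
    phi = cheb.chebval(theta, coeffs)         # approximated block values
    return coeffs, phi

def clip_below_threshold(phi, gamma):
    """Rule (1): keep values that reach the threshold, zero the rest."""
    return np.where(phi >= gamma, phi, 0.0)
```

Only the retained non-zero values would then need to be quantized and stored, which is where the compression gain comes from; a larger Γ zeroes more values at the cost of more information loss.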
Figures 2A and 2B illustrate a flow diagram 200 showing the steps involved in compression of sensor data.
In step 202, a predetermined set of rules is received at the system processor 10 and these rules are used for providing processing commands to control various elements of the system.
In step 204, the predetermined set of rules is stored in the repository 20, wherein these rules include the rules relating to segmenting of received data into predefined block sizes.
In step 206, sensor data is accepted by the input module.
In step 208, segments of sensor data containing sensitive content are detected and extracted.
In step 210, the sensitive data is divided into blocks of predefined size using the rules relating to segmenting of received data into predefined block sizes.
In step 212, a first importance score is assigned to the blocks of predefined size based on the frequency of their repetition.
In step 214, the first importance scores of consecutive blocks are compared and the blocks having the same first importance score are identified.
In step 216, optimal size blocks are created by merging the identified consecutive blocks.
In step 218, a second importance score is assigned to each of said optimal sized blocks. In an embodiment, the second importance score assigned to the optimal sized blocks is based on the first importance scores of the consecutive blocks associated with said optimal size blocks.
In step 220, an optimal threshold value for each of said optimal sized blocks is determined based on the second importance score of the optimal sized blocks.
In step 222, the optimal sized blocks are compressed using the Chebyshev approximation technique.
Figure 3A illustrates a bar chart showing the comparison of compression gain of various compression techniques without considering the optimal block size. It is clear from Figure 3A that the information-aware (sensitive content detection) technique has a better compression gain than the standard Chebyshev and Lempel-Ziv techniques.
In an embodiment, smart meter data from five independent households is chosen for the set of experiments. All the experiments are done with publicly available sensor datasets such as REDD, BLUED and Physionet.
Figure 3B illustrates a bar chart showing the comparison of loss factor % of various compression techniques without considering the optimal block size. It is clear from Figure 3B that the information-aware (sensitive content detection) technique has a better loss factor % than the standard Chebyshev technique.
Figure 4A illustrates a bar chart showing the comparison of compression gain of various compression techniques with the optimal block size. It is clear from Figure 4A that the information-aware (sensitive content detection) technique with optimal block size has a better compression gain than the standard Chebyshev technique, the information-aware (sensitive content detection) technique with fixed block size, and the Lempel-Ziv technique.
Figure 4B illustrates a bar chart showing the comparison of loss factor % of various compression techniques with the optimal block size. It is clear from the figure 4B information aware (sensitive content detection) with optimal block size has almost loss factor % than standard Chebyshev technique.
Figure 5A illustrates a bar chart showing the experimental comparison results of compression gain of various compression techniques with the optimal block size for heterogeneous sensor data.
Figure 5B illustrates a bar chart showing the experimental comparison results of loss factor of various compression techniques with the optimal block size for heterogeneous sensor data.
From Figures 5A and 5B it is clear that the information-aware (sensitive content detection) optimal block technique shows poor performance.
Figure 6 illustrates a graph showing the mean compression gain and loss factor with heterogeneous sensor data, showing high (+ve) compression gain and –ve loss factor.
Figure 7A illustrates a bar chart showing the experimental comparison results of compression gain of various compression techniques with the optimal block size for quasi-periodic signals like ECG.
Figure 7B illustrates a bar chart showing the experimental comparison results of loss factor of various compression techniques with the optimal block size for quasi-periodic signals like ECG.
In an embodiment, it is experimentally observed that sensors with quasi-periodic components do not conform to the high performance gain of IA Optimal block. Analysis shows that the importance score determination is not apt; more precisely, the outlier detection technique is prone to high false positive errors (the swamping effect) for quasi-periodic signals. Adaptive window-based discord discovery (AWDD) is therefore used as the outlier detection technique for quasi-periodic signals like ECG, and the rest of the analysis remains the same. A subsequence $s$ of length $n$ of the time series $\varepsilon_t$ is a discord in $\varepsilon_t$ if $s$ has the largest distance to its nearest non-self match, i.e., for every subsequence $s^*$ of $\varepsilon_t$, with non-self match $\lambda_s$ of $s$ and non-self match $\lambda_{s^*}$ of $s^*$, $\min(\mathrm{Dist}(\lambda_s, s)) > \min(\mathrm{Dist}(\lambda_{s^*}, s^*))$. Instead of Euclidean distance, Mahalanobis distance is used as the distance metric. Discords at dynamic time windows are considered outliers. Figures 7A and 7B show the significant improvement in compression performance.
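The discord definition can be made concrete with a brute-force search. This sketch is deliberately simplified and is not AWDD itself: it uses a fixed window length instead of an adaptive one and plain Euclidean distance instead of the Mahalanobis distance named above. The function name and the rule that two windows are non-self matches when their start positions differ by at least the window length are assumptions of the example.

```python
import numpy as np

def find_discord(series, n):
    """Return (start, distance) of the length-n subsequence whose nearest
    non-self match is farthest away (brute force, Euclidean distance)."""
    x = np.asarray(series, dtype=float)
    count = x.size - n + 1
    if n < 1 or count < 1:
        raise ValueError("window length must fit inside the series")
    windows = np.array([x[p:p + n] for p in range(count)])
    starts = np.arange(count)
    best_start, best_dist = -1, -np.inf
    for p in range(count):
        # distance from window p to every other window
        dists = np.linalg.norm(windows - windows[p], axis=1)
        # non-self matches: windows that do not overlap window p
        non_self = np.abs(starts - p) >= n
        if not non_self.any():
            continue
        nearest = dists[non_self].min()
        if nearest > best_dist:
            best_start, best_dist = p, nearest
    if best_start < 0:
        raise ValueError("series too short to contain a non-self match")
    return best_start, best_dist
```

On a periodic signal with one corrupted sample, every clean window has an exact repeat elsewhere (distance zero), so the reported discord is a window covering the corrupted sample. The brute-force search costs on the order of the square of the number of windows, which is acceptable for a sketch but not for long recordings.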
TECHNICAL ADVANCEMENTS
The technical advancements of the system envisaged by the present disclosure include the realization of:
a system which maximizes the compression gain.
a system which optimizes the information loss during reconstruction or decompression of sensor data.
a system which captures the inherent properties of large number of sensor data more accurately.
a system which is sensor agnostic to support gamut of sensor data types.
a system that does not require physical modeling of sensor for sensor lossy compression performance enhancement.
The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the embodiments as described herein.

Documents

Application Documents

#	Name	Date
1	606-MUM-2015-FORM 1-(31-03-2015).pdf	2015-03-31
2	606-MUM-2015-CORRESPONDENCE-(31-03-2015).pdf	2015-03-31
3	606-MUM-2015-GENERAL POWER OF ATTORNEY(26-05-2015).pdf	2015-05-26
4	606-MUM-2015-CORRESPONDENCE(26-05-2015).pdf	2015-05-26
5	Other Patent Document [01-09-2016(online)].pdf	2016-09-01
6	FORM 3.pdf	2018-08-11
7	Drawings V 24.02.15.pdf	2018-08-11
8	CS_Draft V 24 02 15 clean.pdf	2018-08-11
9	abs.pdf	2018-08-11
10	606-MUM-2015-FER.pdf	2019-09-30
11	606-MUM-2015-OTHERS [15-11-2019(online)].pdf	2019-11-15
12	606-MUM-2015-FER_SER_REPLY [15-11-2019(online)].pdf	2019-11-15
13	606-MUM-2015-DRAWING [15-11-2019(online)].pdf	2019-11-15
14	606-MUM-2015-ABSTRACT [15-11-2019(online)].pdf	2019-11-15
15	606-MUM-2015-Response to office action [15-09-2020(online)].pdf	2020-09-15
16	606-MUM-2015-US(14)-HearingNotice-(HearingDate-01-12-2023).pdf	2023-10-27
17	606-MUM-2015-FORM-26 [29-11-2023(online)].pdf	2023-11-29
18	606-MUM-2015-Correspondence to notify the Controller [29-11-2023(online)].pdf	2023-11-29
19	606-MUM-2015-REQUEST FOR ADJOURNMENT OF HEARING UNDER RULE 129A [01-12-2023(online)].pdf	2023-12-01
20	606-MUM-2015-PETITION UNDER RULE 137 [01-12-2023(online)].pdf	2023-12-01
21	606-MUM-2015-US(14)-ExtendedHearingNotice-(HearingDate-15-01-2024).pdf	2023-12-15
22	606-MUM-2015-Correspondence to notify the Controller [11-01-2024(online)].pdf	2024-01-11
23	606-MUM-2015-Written submissions and relevant documents [30-01-2024(online)].pdf	2024-01-30
24	606-MUM-2015-FORM-26 [31-01-2024(online)].pdf	2024-01-31
25	606-MUM-2015-PatentCertificate11-07-2024.pdf	2024-07-11
26	606-MUM-2015-IntimationOfGrant11-07-2024.pdf	2024-07-11

Search Strategy

1	searchstrategy_26-07-2019.pdf
2	2019-07-2617-46-22_26-07-2019.pdf