Abstract: SYSTEMS AND METHODS FOR A MULTIMODAL SPATIOTEMPORAL AIR QUALITY NETWORK. The invention consists of an air quality monitoring network which uses a multimodal approach, multiple feature engineering techniques, and a machine learning architecture to give air quality data with high spatial coverage.
DESC:FIELD OF THE INVENTION:
This invention relates to the field of environmental engineering and sensors.
Particularly, this invention relates to monitoring systems and networks.
Specifically, this invention relates to systems and methods for a multimodal spatiotemporal air quality network.
BACKGROUND OF THE INVENTION:
According to available data, a country like India has only around 300 active air quality stations, which is not sufficient for a country of its size. There are multiple ways to measure air quality, including on-ground stations (reference grade or low-cost sensors) and satellite sources. Because these sources differ in the methods used to collect data and in the frequency of updating, they often show similar trends but still need to be synchronized with each other. With the right ways to synchronize both sources, knowledge of satellite trends and station data can help in estimating air quality values for a larger set of locations than was previously possible.
OBJECTS OF THE INVENTION:
An object of the invention is to provide air quality data with high spatial coverage.
Yet another object of the invention is to use a multimodal approach in order to provide air quality data.
Still another object of the invention is to provide air quality data with high spatiotemporal coverage.
SUMMARY OF THE INVENTION:
According to this invention, there are provided systems and methods for a multimodal spatiotemporal air quality network. The invention consists of an air quality monitoring network consisting of a physical sensor network which uses knowledge from various satellite and geospatial sources and applies techniques such as calibration, error correction, spatial correlation and machine learning to expand coverage of the network and give air quality data to an expanded set of locations.
In at least an embodiment, this invention’s system and method provides a multimodal approach that combines a Central Node / Server / Controller which collects and processes data, a collection of sensor nodes connected to the central node, satellite Air Quality Data collected by the central node for a predefined patch or bounding box, and Spatial Locations of Pollutant Sources (Industries, Roads, Airports, Ports, Bus Stations, Railway Stations, Office Areas, Disaster Events, etc.).
In at least an embodiment, this invention’s system and method provides a feature engineering pipeline architecture consisting of various derived spatial features: haversine distance between stations and satellite points, direction from stations to satellite points and vice versa, haversine distance from station or satellite points to pollution sources, direction from station or satellite points to pollution sources, and spatial correlation.
In at least an embodiment, this invention’s system and method uses various derived temporal features such as latest difference, 10 days mean difference and 5 days exponentially weighted moving average for the same hour between station and satellite sources.
In at least an embodiment, this invention’s system and method uses various weather parameters like temperature, humidity, wind speed, wind direction, visibility, precipitation probability, pressure and elevation.
In at least an embodiment, this invention’s system and method provides a machine learning architecture consisting of 3 models trained using spatial, temporal and weather features and a meta model trained using the output of the 3 models (first model using spatial features, second model using temporal features, third model using meteorological features).
In at least an embodiment, this invention’s system and method provides a hardware and software integrated network which uses a multimodal approach and algorithm to give air quality data with high spatial coverage.
According to this invention, there is provided a system for a multimodal spatiotemporal air quality network, said system consisting, essentially, of an air quality monitoring network using multimodal approach, multiple feature engineering techniques, and a machine learning architecture to give air quality data with high spatial coverage, in order to expand coverage of a defined sensor network to provide air quality data to an expanded set of locations beyond the defined sensor network, said system comprising:
- a network of sensor nodes (SSN), deployed over a region of interest, to provide a defined sensor network (SSN), the defined sensor network (SSN) comprising sources, selected from a group of sources, correlative to sensing, wherein each sensor node is configured to:
o measure environmental parameters including particulate matter data and gaseous pollutants data,
o measure meteorological parameters;
o transmit the measured environmental parameters to the central server node (CSN);
- a network of satellite data input nodes (SLN) configured to input satellite air quality data, weather data, and historical air quality data, for the region of interest;
- a geospatial module, configured with a network of location-specific nodes to input spatial locations of pollutant sources in order to obtain geospatial data per location-specific sensor node;
- a demographic module configured to poll demographic data per location-specific sensor node;
- a central server node (CSN) being communicably coupled to said network of sensor nodes (SSN), said network of satellite data input nodes (SLN), said geospatial module, and said demographic module, said central server node (CSN) being configured to collect and process air quality data, the central server node (CSN) comprising:
o a processor configured for anomaly detection, outlier detection, active error correction, and data filtering;
o a pre-processor configured to compute spatial features, temporal features, and meteorological features.
- said central server node (CSN) configured to output pollutant data in the region of interest based on said computed spatial features, said computed temporal features, and said computed meteorological features
In at least an embodiment, said central server node (CSN) utilizes machine learning algorithms to:
- calibrate and error-correct data from sensor nodes using:
o tracking anomalies and anomaly frequencies basis defined thresholds and basis total number of data points in a defined time window;
o shutdown criteria, to shutdown faulty sensor, involving defined thresholds;
- spatially correlate data to expand network coverage using:
o measuring straight-line distance between two points on Earth's surface, considering its curvature;
o correlating data from two sensors to obtain correlation data using sensor’s average values and deviations, in order to obtain a correlation score indicating strength and direction of relationship between sensor data at different locations;
o predicting air quality data basis physical distance between sensors and correlative correlation score;
- train a meta model using said computed spatial features, said computed temporal features, and said computed meteorological features to provide hyperlocal air quality and weather parameters, in order to determine priorities for said temporal features, said spatial features, said meteorological features, said training steps comprising the steps of:
o for temporal features,
▪ creating a buffer area around each sensor to capture external influence data on each sensor node;
▪ mapping each sensor node to satellite data by identifying closest satellite data point for each sensor node;
▪ extracting temporal feature by computing time-based factors correlative to sensor data;
▪ training a model using said extracted temporal features, sensor data, and satellite data, in order to obtain air quality values based on temporal factors;
o for spatial features,
▪ creating a buffer area around each sensor to capture external influence data on each sensor node;
▪ mapping each sensor node to satellite data by identifying closest satellite data point for each sensor node to form pairs of sensors and satellites;
▪ determining a first set of distances and directions between each pair of sensor and satellite;
▪ mapping sensors to nearest spatial features;
▪ determining a second set of distances and directions between each pair of sensor and satellite using spatially mapped sensors;
▪ extracting spatial correlations between each of said sensor – satellite pairs;
▪ training a model using said extracted spatial features, spatial correlations, sensor data, and satellite data, in order to obtain air quality values based on spatial factors;
o for meteorological features,
▪ creating a buffer area around each sensor to capture external influence data on each sensor node;
▪ mapping each sensor node to satellite data by identifying closest satellite data point for each sensor node to form pairs of sensors and satellites;
▪ collecting weather features, from satellites, in terms of pre-defined parameters;
▪ determining elevation of each sensor node;
▪ training a model using said extracted meteorological features, meteorological correlations, sensor data, and satellite data, in order to obtain air quality values based on meteorological factors.
In at least an embodiment, said network of sensor nodes (SSN) being configured with threshold values and parameters, each of them being:
- particulate matter (304) (PM1, PM2.5, PM10) parameter detectable in the range of 0-500 µg/m³ for PM1, 0-1000 µg/m³ for PM2.5, and 0-1000 µg/m³ for PM10;
- gaseous pollutant (305, 203) (SO2, NO2, CO, O3, CO2) parameter detectable in the range of 0-20 PPM for SO2, 0-20 PPM for NO2, 0-15 PPM for O3, 0-20 PPM for CO; and
- weather parameters (306) detectable in the range of -40-80 degC for temperature, 0-100% for relative humidity, 300-1100hPa for barometric pressure, 0-40m/s for wind speed, 0-360 degree for wind direction, 0-9999 mm/hr for rainfall.
In at least an embodiment, said central server node’s (CSN) said processor configured with instructions for anomaly detection, outlier detection, active error correction, and data filtering, said instructions for anomaly detection being:
- determining, and discarding, logically inconsistent data points; and
- defining rules, with threshold values, for pollutant data, in order to retain coherent pollutant data subscribing to said defined rules.
In at least an embodiment, said central server node’s (CSN) said processor configured with instructions for anomaly detection, outlier detection, active error correction, and data filtering, said instructions for outlier detection being:
- determining, and discarding, outliers by clustering data points and treating deviants, outside of pre-determined threshold values, from the clustered data points as being outliers, by:
o deploying multiple sensors, as a cluster, in close proximity to cross-validate sensor data readings;
o determining discrepancies amongst various sensors in a given cluster;
o standardizing sensor data, in said given cluster, to ensure that all features contribute equally to distance calculations;
o applying Density-Based Spatial Clustering of Applications with Noise algorithms to identify sub-clusters and outliers in said standardized sensor data;
o applying redundancy operations and cross-validation operations, along with discrepancy data, to sensor data, for error corrections in said sensor data;
In at least an embodiment, said central server node’s (CSN) said processor configured with instructions for anomaly detection, outlier detection, active error correction, and data filtering, said instructions for active error correction being:
- identifying and correcting sensor data in real time using redundancy operations and cross-validation operations.
In at least an embodiment, said central server node’s (CSN) said processor configured with instructions for anomaly detection, outlier detection, active error correction, and data filtering, said instructions for data filtering being:
- employing moving average techniques to smooth out short-term fluctuations and highlight longer-term trends, characterized, in that:
o employing a moving average technique with a window size of 8 hours to smooth out short-term fluctuations and highlight longer-term trends; and
o applying a weighted factor that increases quadratically with each hour, with the latest hour having highest weight.
In at least an embodiment, said central server node’s (CSN) said pre-processor with a spatial module for computing spatial features, is configured with instructions to perform the steps of:
- requesting for geo-location of said sensor for a specific time unit;
- determining distance from each satellite point, from the network of satellite data input nodes (SLN), to at least a known sensor node within a determined radius of the region of interest;
- determining direction towards each known sensor node from each of said satellite points within the determined radius;
- determining distance from each point to a spatial pollutant source;
- determining spatial correlation between known sensor nodes and satellite for that determined time unit.
In at least an embodiment, said central server node’s (CSN) said pre-processor with a temporal module for computing temporal features, is configured with instructions to perform the steps of:
- requesting for geo-location of said sensor for a specific time unit;
- determining latest difference in value of pollutant for satellite data and nearest known sensor data within a determined radius of the region of interest;
- determining mean difference over number of pre-determined past days between satellite data and known sensor data within the determined radius;
- recording temporal data;
- computing exponentially weighted moving average (EWMA) over number of pre-determined past days of differences for the same time unit in order to achieve temporal difference.
In at least an embodiment, said central server node’s (CSN) said pre-processor with a meteorological module for computing meteorological features, is configured with instructions to perform the steps of:
- recording wind parameters;
- recording visibility parameters;
- recording elevation parameters;
- recording humidity parameters;
- recording pressure parameters; and
- recording precipitation probability.
In at least an embodiment, said central server node’s (CSN) said pre-processor with a spatial module for computing spatial features, is configured with instructions to perform the steps of:
- determining a region of interest;
- mapping each sensor to its nearest satellite point;
- extracting coordinates of all satellite points in order to provide geolocation;
- calculating distance for each pair formed by at least a sensor and an associated satellite point, using Haversine distance function, in order to provide a first distance data;
- mapping each sensor to its nearest spatial feature;
- calculating distance for each pair formed by at least a sensor and an associated spatial feature using Haversine distance function, in order to provide a second distance data;
- deriving spatial correlations for each pair formed by at least a sensor and an associated satellite point in order to provide spatial correlation data;
- determining pollutant values from said at least a sensor;
- determining pollutant values from said at least a satellite point;
- determining direction of said sensor;
- determining direction of said satellite point;
- training said spatial module using first distance data, direction of said sensor, second distance data, direction of said satellite point, spatial correlation data, geolocation data, determined pollutant values from said at least a sensor, and determining pollutant values from said at least a satellite point – in order to obtain trained spatial data; and
- predicting pollutant data, in said region of interest, using said trained data in order to obtain predicted spatial features.
In at least an embodiment, said central server node’s (CSN) said pre-processor with a temporal module for computing temporal features, is configured with instructions to perform the steps of:
- determining a region of interest;
- mapping each sensor to its nearest satellite point;
- extracting elevation for each sensor from satellite sources in order to provide elevation data;
- determining pollutant values from said at least a sensor;
- determining pollutant values from said at least a satellite point;
- training said temporal module using determined pollutant values from said at least a sensor, and determining pollutant values from said at least a satellite point, extracted elevation data – in order to obtain trained temporal data; and
- predicting pollutant data, in said region of interest, using said trained data in order to obtain predicted temporal features.
In at least an embodiment, said central server node’s (CSN) said pre-processor with a meteorological module for computing meteorological features, is configured with instructions to perform the steps of:
- determining a region of interest;
- mapping each sensor to its nearest satellite point;
- determining meteorological data;
- extracting elevation for each sensor from satellite sources in order to provide elevation data;
- determining pollutant values from said at least a sensor;
- determining pollutant values from said at least a satellite point;
- training said meteorological module using determined pollutant values from said at least a sensor, and determining pollutant values from said at least a satellite point, determined meteorological data, extracted elevation data – in order to obtain trained meteorological data; and
- predicting pollutant data, in said region of interest, using said trained data in order to obtain predicted meteorological features.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS:
This invention will now be described in relation to the accompanying drawings, in which:
FIGURE 1 illustrates an example system consisting of a central server node (CSN), a network of sensor nodes (big square boxes) (SSN), and a network of satellite nodes (small square boxes) (SLN);
FIGURE 2 illustrates an overall high-level architecture diagram of this system;
FIGURE 3A illustrates a schematic block diagram for a processing gateway;
FIGURE 3B illustrates a schematic block diagram for a multimodal sensor node;
FIGURE 4 illustrates a flowchart for computation of distances;
FIGURE 5 illustrates a flowchart for computation of directions;
FIGURE 6 illustrates a flowchart for computation of spatial correlation;
FIGURE 7 and FIGURE 8 illustrate flowcharts for computation of temporal difference;
FIGURE 9 illustrates a flowchart for computation of Model Training for Spatial Features;
FIGURE 10 illustrates a flowchart for computation of Model Training for Temporal Features;
FIGURE 11 illustrates a flowchart for computation of Model Training for Weather Features; and
FIGURE 12 illustrates a flowchart for computation of training a meta model.
DETAILED DESCRIPTION OF THE ACCOMPANYING DRAWINGS:
According to this invention, there are provided systems and methods for a multimodal spatiotemporal air quality network. The invention consists of an air quality monitoring network which uses a multimodal approach, multiple feature engineering techniques, and a machine learning architecture to give air quality data with high spatial coverage.
In at least an embodiment, the system comprises:
1. a central server node (CSN) which collects and processes data;
2. a network of sensor nodes (SSN) connected to the central server node;
3. a network of satellite data input nodes (SLN) configured to input satellite air quality data, for a predefined patch or bounding box, to be transmitted to the central server node;
4. a network of location-specific nodes configured to input spatial locations of pollutant sources (industries, roads, airports, ports, bus stations, railway stations, office areas, disaster events, and the like).
In at least an embodiment, a network of sensor nodes (SSN) is established to provide a defined sensor network, the defined sensor network comprising sources selected from a group of sources consisting of satellite sources, geospatial sources, and the like. Data from a known sensor network is taken, calibrated, error-corrected, spatially correlated, applied to machine learning algorithms; in order to expand coverage of the formed sensor network to provide air quality data to an expanded set of locations beyond the defined sensor network.
In at least an embodiment, a satellite module is configured to poll satellite data in order to obtain historical air quality data, weather data, and the like data. Typically, the satellite module is a network of satellite data input nodes (SLN) configured to input satellite air quality data, for a predefined patch or bounding box, to be transmitted to the central server node.
In at least an embodiment, a geospatial module is configured to poll geospatial data in order to obtain historical air quality data, weather data, and the like data at specific locations. Typically, a network of location-specific nodes is configured to input spatial locations of pollutant sources (industries, roads, airports, ports, bus stations, railway stations, office areas, disaster events, and the like).
In at least an embodiment, a demographic module is configured to poll data such as socio-economic data of the given geography, data of human activities (construction, garbage burning, and the like) in the given geography, and the like data.
FIGURE 1 illustrates an example system consisting of a central server node (CSN), a network of sensor nodes (big square boxes) (SSN), and a network of satellite nodes (small square boxes) (SLN).
In at least an embodiment, the central server node (CSN) comprises a processor configured for the following functions:
1. Anomaly Detection and Outlier Detection are built inside sensors / controllers
2. Active error correction: Controller Node does active error correction by observing patterns in anomalies
3. Data filtering: If the frequency of anomalies is high, the sensor data is discarded
4. The controller shuts off a sensor from the pipeline if the frequency of the same anomaly crosses a threshold
Anomaly and Outlier detection steps:
Step 1: Check for Unscientific Data
Objective: Identify and filter out data points that are scientifically implausible or logically inconsistent before they are further processed.
Negative Pollutant Values:
Definition: Negative values for pollutant concentrations are physically impossible since concentrations cannot be less than zero.
Implementation: Set a rule in the data processing pipeline to automatically flag and discard any data points where pollutant concentrations are negative.
Inconsistent Ratios (e.g., PM2.5 > PM10):
Definition: PM2.5 represents particulate matter with a diameter less than 2.5 micrometers, while PM10 includes particles up to 10 micrometers. Thus, PM2.5 should logically never exceed PM10.
Implementation: Add a condition to the data validation step to flag and correct any readings where PM2.5 values exceed PM10 values.
According to non-limiting exemplary embodiments, rules could be:
• PM2.5 cannot be larger than PM10
• A pollutant value cannot be negative
• PM values of zero might indicate an error
• A station outputting the same value for a long duration might indicate an error
• A station outputting -999 or +999 for more than an hour might indicate an error
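By way of a non-limiting illustration, the exemplary rules above could be sketched as follows. The function names, the issue labels, and the `min_run` parameter are assumptions made for this sketch only; the sentinel rule is simplified to a per-reading check rather than a one-hour duration check.

```python
from typing import List

SENTINELS = {-999.0, 999.0}  # placeholder values named in the rules above

def check_reading(pm25: float, pm10: float) -> List[str]:
    """Return the list of rule violations for one PM2.5 / PM10 reading."""
    issues = []
    if pm25 < 0 or pm10 < 0:
        issues.append("negative_value")
    if pm25 > pm10:
        issues.append("pm25_exceeds_pm10")
    if pm25 == 0 or pm10 == 0:
        issues.append("zero_value")
    if pm25 in SENTINELS or pm10 in SENTINELS:
        issues.append("sentinel_value")
    return issues

def is_stuck(values: List[float], min_run: int) -> bool:
    """True if the last `min_run` readings are all identical (flatline)."""
    if len(values) < min_run:
        return False
    tail = values[-min_run:]
    return all(v == tail[0] for v in tail)
```

A reading with an empty issue list would pass on to the outlier detection step; any other reading would be flagged or discarded.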
Outlier detection:
Step 2: Use Unsupervised Clustering Algorithms
Objective: Identify anomalies/outliers by clustering data points and detecting those that do not fit well within any cluster.
Unsupervised Clustering with DBSCAN:
Algorithm Overview: DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups together points that are closely packed together, marking points that are in low-density regions as outliers.
Advantages: It can find clusters of arbitrary shape and is effective in identifying outliers that do not belong to any cluster.
Implementation:
Step 1: Data Preparation: Standardize the data to ensure that all features contribute equally to the distance calculations.
Step 2: Apply DBSCAN: Run the DBSCAN algorithm on the standardized data to identify clusters and outliers.
Step 3: Analyze Results: Points labeled as noise by DBSCAN are considered anomalies/outliers.
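By way of a non-limiting illustration, the three implementation steps above could be sketched with a compact, standard-library version of DBSCAN. The `eps` and `min_pts` values and the sample readings are assumptions chosen for illustration; a production system would more likely rely on an established library implementation.

```python
import math
from statistics import mean, pstdev
from typing import List

def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sigmas)] for r in rows]

def dbscan(points: List[List[float]], eps: float, min_pts: int) -> list:
    """Return a cluster label per point; -1 marks noise (outliers)."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbours(i: int) -> List[int]:
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nb)
    return labels

# Hypothetical (PM2.5, PM10) readings from five co-located sensors.
readings = [[50, 80], [52, 83], [51, 82], [49, 79], [150, 90]]
labels = dbscan(standardize(readings), eps=1.0, min_pts=3)
outliers = [r for r, lab in zip(readings, labels) if lab == -1]
```

In this sketch the fifth reading lies far from the dense group after standardization and is therefore labeled as noise.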
Active error correction involves identifying and correcting errors in sensor readings in real-time. This can be achieved through redundancy and Cross-Validation:
Deploy multiple sensors in close proximity to cross-validate readings. Discrepancies among readings can be used to identify and correct errors.
Example: If three sensors in the same area report 50, 52, and 150, the reading of 150 is likely erroneous.
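By way of a non-limiting illustration, the redundancy check in the example above could be sketched as follows. The specification does not state how an erroneous reading is corrected; replacing it with the cluster median, and the `max_dev` tolerance, are assumptions of this sketch.

```python
from statistics import median
from typing import List, Tuple

def cross_validate(readings: List[float], max_dev: float) -> Tuple[List[bool], List[float]]:
    """Flag readings far from the cluster median and replace them with it."""
    ref = median(readings)
    flagged = [abs(r - ref) > max_dev for r in readings]
    corrected = [ref if bad else r for r, bad in zip(readings, flagged)]
    return flagged, corrected

flagged, corrected = cross_validate([50, 52, 150], max_dev=20)
# flagged   -> [False, False, True]
# corrected -> [50, 52, 52]
```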
Data Filtering
Data filtering aims to remove noise and irrelevant information from sensor data to improve the quality and reliability of the analysis. Using moving average techniques, we smooth out short-term fluctuations and highlight longer-term trends. Steps include:
(i) employing a moving average technique with a window size of 8 hours to smooth out short-term fluctuations and highlight longer-term trends;
(ii) applying a weighted factor that increases quadratically with each hour, with the latest hour having highest weight. For example, the latest hour has a weight of 0.9, the second latest hour has a weight of 0.81, and so on, for a total of 8 hours.
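By way of a non-limiting illustration, the weighted 8-hour moving average could be sketched as follows. The sketch follows the example weights given above (0.9, 0.81, and so on, i.e. successive powers of 0.9); normalizing by the sum of the weights is an assumption, as the specification does not state it.

```python
from typing import List

def weighted_moving_average(hourly: List[float], window: int = 8, base: float = 0.9) -> float:
    """Weighted mean of the last `window` hourly values, newest weighted most."""
    recent = hourly[-window:]
    # Newest value gets base**1, the one before it base**2, and so on.
    weights = [base ** (len(recent) - i) for i in range(len(recent))]
    total = sum(w * v for w, v in zip(weights, recent))
    return total / sum(weights)
```

Because the weights shrink with age, a spike in the latest hour moves the filtered value more than the same spike eight hours earlier.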
Data Discarding
• Tracking Anomalies:
Each sensor's data is continuously monitored for anomalies.
A record is maintained of the number of anomalies detected for each sensor over a specific time period.
• Frequency Calculation:
Calculate the anomaly frequency as the number of anomalies divided by the total number of data points within a given time window.
If this frequency exceeds a predefined threshold, the data from the sensor is flagged for discarding.
• Data Discarding Mechanism:
The controller node evaluates the anomaly frequency for each sensor periodically.
If a sensor’s anomaly frequency exceeds the threshold, all data from that sensor within the current time window is discarded.
• Updating Records:
Maintain a rolling window to keep track of recent anomaly frequencies.
Use these records to make real-time decisions about data quality.
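By way of a non-limiting illustration, the anomaly frequency calculation and discarding decision could be sketched as follows. The class name, the window size, and the threshold value are assumptions made for this sketch.

```python
from collections import deque

class AnomalyTracker:
    """Rolling record of anomaly flags for one sensor."""

    def __init__(self, window_size: int, threshold: float):
        self.flags = deque(maxlen=window_size)  # True = anomalous data point
        self.threshold = threshold

    def record(self, is_anomaly: bool) -> None:
        self.flags.append(is_anomaly)

    def frequency(self) -> float:
        """Anomalies divided by total data points in the current window."""
        return sum(self.flags) / len(self.flags) if self.flags else 0.0

    def should_discard(self) -> bool:
        return self.frequency() > self.threshold
```

The bounded deque acts as the rolling window: once it is full, each new data point displaces the oldest one, so the frequency always reflects recent behaviour.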
• Sensor Shut Off Implementation
Track Anomaly Types:
Maintain a record of the types of anomalies reported by each sensor.
Track the frequency of each anomaly type for every sensor.
• Define Shutdown Criteria:
Set thresholds for the frequency of each anomaly type. If the frequency of a specific anomaly type exceeds the threshold, the sensor is flagged for shutdown.
• Monitor and Evaluate Anomaly Types:
Continuously monitor sensors and evaluate the types and frequencies of anomalies reported.
If a sensor repeatedly reports the same type of anomaly beyond the acceptable threshold, initiate a shutdown process.
• Shutdown Mechanism:
The controller node sends a command to deactivate the faulty sensor, removing it from the data collection pipeline.
The sensor is marked for maintenance and inspection.
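By way of a non-limiting illustration, the shutdown criteria could be sketched as follows. The anomaly type labels, the per-type limits, and the use of counts within one evaluation window in place of frequencies are assumptions made for this sketch.

```python
from collections import Counter
from typing import Dict, List

def sensors_to_shut_down(events: Dict[str, List[str]], limits: Dict[str, int]) -> List[str]:
    """Return IDs of sensors whose count of any one anomaly type exceeds its limit."""
    flagged = []
    for sensor_id, anomaly_types in events.items():
        counts = Counter(anomaly_types)  # occurrences of each anomaly type
        if any(counts[kind] > limit for kind, limit in limits.items()):
            flagged.append(sensor_id)
    return flagged

# Hypothetical anomaly log for two sensors over one evaluation window.
events = {
    "s1": ["negative_value"] * 5 + ["zero_value"],
    "s2": ["zero_value"] * 2,
}
limits = {"negative_value": 3, "zero_value": 3}
to_shut_down = sensors_to_shut_down(events, limits)
```

Sensors returned by this function would then be deactivated by the controller node and marked for maintenance.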
In at least an embodiment, the central server node (CSN) comprises a pre-processor for computing spatial features and temporal features.
In at least an embodiment, the pre-processor configured for computing spatial features is configured with the following instructions:
1. Request is made for geo-location [latitude longitude] for a time ‘t’
2. Distance from each satellite point to stations (inside a certain radius)
3. Direction towards each station from satellite points (inside a certain radius)
4. Distance from each point to a spatial pollutant source (industries, roads, airports, residences, disaster events) feature
5. Direction from each point to a spatial pollutant source (industries, roads, airports, residences, disaster events) feature
6. Spatial correlation between stations and satellite for that hour
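By way of a non-limiting illustration, the distance and direction features listed above could be computed as follows. The haversine formula is named in the specification; the use of the initial great-circle bearing as the "direction" and the mean Earth radius constant are assumptions of this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial bearing from point 1 to point 2, degrees clockwise from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360) % 360
```

The same two functions can serve every pair in the list: satellite point to station, and any point to a spatial pollutant source.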
In at least an embodiment, the pre-processor configured for computing temporal features is configured with the following instructions:
1. Determining latest difference in value for satellite and nearest station data
2. Determining mean difference in the past 10 days between satellite and station data
3. Recording Time of the day, day of the year, month
4. Exponentially weighted moving average (ewm) of past 5 days of differences for the same hour
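By way of a non-limiting illustration, the difference-based temporal features could be derived as follows. The sign convention (satellite minus station), the smoothing factor `alpha`, and the feature names are assumptions made for this sketch.

```python
from typing import Dict, List

def ewma(values: List[float], alpha: float) -> float:
    """Exponentially weighted moving average; `values` ordered oldest to newest."""
    avg = values[0]
    for v in values[1:]:
        avg = alpha * v + (1 - alpha) * avg
    return avg

def temporal_features(diffs: List[float], alpha: float = 0.5) -> Dict[str, float]:
    """`diffs` holds one (satellite - station) difference per day for the same hour, oldest first."""
    last10 = diffs[-10:]
    return {
        "latest_diff": diffs[-1],
        "mean_diff_10d": sum(last10) / len(last10),
        "ewma_diff_5d": ewma(diffs[-5:], alpha),
    }
```

Calendar features such as time of day, day of year, and month would be recorded alongside these values from the timestamp of each reading.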
In at least an embodiment, the pre-processor configured for computing spatial features is configured with the following instructions:
Step 1: Creating Buffer Area
For each sensor "x", we create a buffer area "A" with a certain radius "r". This means we're creating a circular area around each sensor, with the radius being some distance (in the same unit as latitude and longitude).
Step 2: Mapping Sensors to Nearest Satellite Points
The system maps each sensor "x" to its nearest satellite point "y".
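By way of a non-limiting illustration, Steps 1 and 2 could be sketched as follows. For readability the buffer radius here is expressed in kilometres and compared against a haversine distance, which is an assumption that differs from the degree-based radius described above; the identifiers and coordinates are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance in kilometres between two points."""
    p1, p2 = math.radians(a[0]), math.radians(b[0])
    dlmb = math.radians(b[1] - a[1])
    h = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def map_to_nearest(sensors: Dict[str, Point], sat_points: Dict[str, Point],
                   radius_km: float) -> Dict[str, Optional[str]]:
    """Map each sensor ID to the nearest satellite point ID inside its buffer, else None."""
    mapping = {}
    for sid, spos in sensors.items():
        best_id, best_d = None, radius_km
        for pid, ppos in sat_points.items():
            d = haversine_km(spos, ppos)
            if d <= best_d:  # inside the buffer and closer than the best so far
                best_id, best_d = pid, d
        mapping[sid] = best_id
    return mapping

# Hypothetical sensor "x" and satellite point "y" coordinates.
sensors = {"x1": (18.52, 73.85), "x2": (28.61, 77.21)}
sat_points = {"y1": (18.50, 73.80), "y2": (18.60, 74.00)}
pairs = map_to_nearest(sensors, sat_points, radius_km=25.0)
```

A sensor with no satellite point inside its buffer maps to None and would be excluded from the pairwise features that follow.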
Step 3: Deriving Distances and Directions
For each pair of sensor "x" and satellite point "y", we:
Calculate the distance between them using a function called haversine (which gives us the shortest distance between two points on a sphere, like Earth).
Calculate the direction from "x" to "y".
Step 4: Mapping Sensors to Nearest Spatial Features
The system maps each sensor "x" to its nearest spatial feature "z".
Step 5: Deriving Distances and Directions Again
For each pair of sensor "x" and spatial feature "z":
• Calculate the distance between them using haversine again
• Calculate the direction from "x" to "z"
Step 6: Deriving Spatial Correlations
The system calculates the spatial correlation (a measure of how similar two sets of data are) between each sensor "x" and its nearest satellite point "y".
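By way of a non-limiting illustration, the spatial correlation between a sensor series and its paired satellite series could be computed as a Pearson correlation coefficient, which uses each series' average value and deviations. Treating the correlation as Pearson's, and returning 0.0 for a constant series, are assumptions of this sketch.

```python
import math
from typing import List

def pearson(xs: List[float], ys: List[float]) -> float:
    """Correlation score in [-1, 1] indicating strength and direction of the relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant series; 0.0 is a convention chosen here
    return cov / (sx * sy)
```

A score near +1 indicates that the sensor and satellite readings rise and fall together, a score near -1 that they move in opposite directions.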
Step 7: Training Model
The system trains a model "m_spatial" using all these inputs:
• Distances, directions, and correlations for each pair of sensor "x" and satellite point "y"
• Coordinates (latitude and longitude) of the satellite points "y"
• Values of the pollutant from the satellite points "y"
• Values of the pollutant from the sensors "x"
What comes out as output:
A trained model "m_spatial", which we'll use to predict the pollutant levels for each sensor location "x".
In at least an embodiment, the pre-processor configured for computing meteorological features, from the network of satellite data input nodes (SLN), is configured with the following parameter recording modules:
1. Wind direction
2. Wind speed
3. Visibility
4. Elevation
5. Humidity
6. Pressure
7. Precipitation Probability
A ‘station’ is basically a sensor node which is static and deployed in one particular location.
The aforementioned three sets of features (spatial features, temporal features, meteorological features) are achieved and a meta model is trained on these features to provide a final resultant output which determines hyperlocal Air Quality and Weather parameters.
In the training of meta model,
What's happening: We're training a special kind of model called a "Meta Model" that helps make predictions by combining the forecasts from other models.
Step 1: Collecting Predictions
The Meta Model is trained on the following inputs:
Predictions from three different models:
o Spatial module for computing spatial features (e.g. location-based information)
o Temporal module for computing temporal features (e.g. time-series data like sales or weather)
o Meteorological module for computing meteorological features (e.g. weather-related data)
Actual values ("x_vals") for comparison
Step 2: Training the Meta Model
The Meta Model is trained on all these inputs using a generic machine learning model (represented by the model() function, usually a model such as XGBoost or Random Forest). This means the Meta Model learns, by comparison against the actual values, how to combine the predictions from the other models into its own predictions.
What comes out as output: The trained Meta Model, which we'll call "m_meta", is returned as the output.
In at least an embodiment, the pre-processor configured for computing meteorological features is configured with the following instructions:
What's happening: The system and method trains a model to predict weather-related features (e.g., temperature, humidity) for each sensor location ("x") based on various inputs.
Step 1: Creating Buffer Area
For each sensor "x", the system and method create a buffer area "A" with a certain radius "r". This means we're creating a circular area around each sensor, with the radius being some distance (in the same unit as latitude and longitude).
Step 2: Mapping Sensors to Nearest Satellite Points
The system maps each sensor "x" to its nearest satellite point "y".
Step 3: Collecting Weather Features
For each sensor "x", we collect a list of weather features from satellite sources, which includes:
o Temperature
o Humidity
o Wind speed
o Wind direction
o Visibility
o Precipitation probability
o Pressure
The system also gets the elevation at the location of sensor "x" from satellite sources.
Step 4: Training Model
The system trains a model "m_weather" using all these inputs:
• Weather features for each sensor "x"
• Elevation for each sensor "x"
• Values of the pollutant from the satellite points "y"
This model is trained to predict weather-related features, which we'll use later.
Step 5: Predicting Weather Features
The system uses the trained model "m_weather" to predict the weather-related features ("x_pred_weather") for each sensor location "x".
What comes out as output:
The predicted weather-related features ("x_pred_weather") for each sensor location "x".
Data Discarding
Tracking Anomalies:
Each sensor's data is continuously monitored for anomalies.
A record is maintained of the number of anomalies detected for each sensor over a specific time period.
Frequency Calculation:
Calculate the anomaly frequency as the number of anomalies divided by the total number of data points within a given time window.
If this frequency exceeds a predefined threshold, the data from the sensor is flagged for discarding.
Data Discarding Mechanism:
The controller node evaluates the anomaly frequency for each sensor periodically.
If a sensor’s anomaly frequency exceeds the threshold, all data from that sensor within the current time window is discarded.
Updating Records:
Maintain a rolling window to keep track of recent anomaly frequencies.
Use these records to make real-time decisions about data quality.
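By way of illustration only, the data discarding steps above can be sketched as follows. This is a minimal sketch, assuming a per-sensor rolling window of anomaly flags; the class name AnomalyWindow, the window size, and the threshold value are hypothetical placeholders and are not values given in this specification.

```python
from collections import deque


class AnomalyWindow:
    """Rolling record of anomaly flags for one sensor (hypothetical helper)."""

    def __init__(self, window_size=24, threshold=0.2):
        # window_size and threshold are illustrative assumptions.
        self.flags = deque(maxlen=window_size)
        self.threshold = threshold

    def record(self, is_anomaly):
        """Store whether the latest data point was flagged as an anomaly."""
        self.flags.append(bool(is_anomaly))

    def frequency(self):
        """Number of anomalies divided by total data points in the window."""
        if not self.flags:
            return 0.0
        return sum(self.flags) / len(self.flags)

    def should_discard(self):
        """True when the anomaly frequency exceeds the threshold."""
        return self.frequency() > self.threshold


window = AnomalyWindow(window_size=10, threshold=0.2)
for flag in [False, True, False, True, True, False, False, False, False, False]:
    window.record(flag)
```

Because the deque has a fixed maximum length, older flags drop out automatically, which is one simple way to keep the rolling window described above.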
Sensor Shut Off Implementation
Track Anomaly Types:
Maintain a record of the types of anomalies reported by each sensor.
Track the frequency of each anomaly type for every sensor.
Define Shutdown Criteria:
Set thresholds for the frequency of each anomaly type. If the frequency of a specific anomaly type exceeds the threshold, the sensor is flagged for shutdown.
Monitor and Evaluate Anomaly Types:
Continuously monitor sensors and evaluate the types and frequencies of anomalies reported.
If a sensor repeatedly reports the same type of anomaly beyond the acceptable threshold, initiate a shutdown process.
Shutdown Mechanism:
The controller node sends a command to deactivate the faulty sensor, removing it from the data collection pipeline.
The sensor is marked for maintenance and inspection.
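By way of illustration only, the shutdown criteria above can be sketched as a per-type frequency check. This is a sketch under stated assumptions: the anomaly type names, the threshold values, and the function name sensors_to_shut_down are hypothetical and are not defined in this specification.

```python
from collections import Counter

# Hypothetical per-type thresholds (fraction of readings in the window).
SHUTDOWN_THRESHOLDS = {"stuck_value": 0.5, "out_of_range": 0.3}


def sensors_to_shut_down(anomaly_log, readings_per_sensor,
                         thresholds=SHUTDOWN_THRESHOLDS):
    """Return {sensor_id: anomaly_type} for sensors exceeding a per-type threshold.

    anomaly_log: iterable of (sensor_id, anomaly_type) tuples.
    readings_per_sensor: dict mapping sensor_id to total readings in the window.
    """
    counts = Counter(anomaly_log)
    flagged = {}
    for (sensor_id, anomaly_type), count in counts.items():
        total = readings_per_sensor.get(sensor_id, 0)
        limit = thresholds.get(anomaly_type)
        if total == 0 or limit is None:
            continue  # no readings, or no threshold defined for this type
        if count / total > limit:
            flagged[sensor_id] = anomaly_type
    return flagged


log = [("s1", "stuck_value")] * 6 + [("s2", "out_of_range")] * 2
flagged = sensors_to_shut_down(log, {"s1": 10, "s2": 10})
```

In this sketch the controller node would then send the deactivation command for every sensor in the returned mapping and mark it for maintenance.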
Calibration: Develop regression models that adjust raw sensor readings based on known reference values (These can be government reference grade sensors). Calibration models correct systematic errors by applying adjustments derived from comparison with reference measurements.
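By way of illustration only, a calibration model of this kind can be sketched as a one-variable least-squares regression of reference readings on raw sensor readings. The linear form, the function names, and the sample values are assumptions made for this sketch; the specification does not prescribe a particular regression model.

```python
def fit_linear_calibration(raw, reference):
    """Least-squares fit of reference = slope * raw + intercept."""
    n = len(raw)
    if n < 2 or n != len(reference):
        raise ValueError("need at least two paired readings")
    mean_raw = sum(raw) / n
    mean_ref = sum(reference) / n
    sxx = sum((r - mean_raw) ** 2 for r in raw)
    if sxx == 0:
        raise ValueError("raw readings must not all be identical")
    sxy = sum((r - mean_raw) * (f - mean_ref) for r, f in zip(raw, reference))
    slope = sxy / sxx
    intercept = mean_ref - slope * mean_raw
    return slope, intercept


def calibrate(value, slope, intercept):
    """Apply the fitted correction to a raw sensor reading."""
    return slope * value + intercept


# Illustrative paired readings: low-cost sensor vs. reference-grade station.
raw_pm25 = [10.0, 20.0, 30.0, 40.0]
ref_pm25 = [13.0, 25.0, 37.0, 49.0]
slope, intercept = fit_linear_calibration(raw_pm25, ref_pm25)
```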
Error Correction: Active error correction involves identifying and correcting errors in sensor readings in real-time. This can be achieved through Redundancy and Cross-Validation explained further below:
Redundancy: In this case, redundancy refers to looking at multiple stations in the same vicinity to identify bad sensor values. This involves:
• Collecting data from multiple stations that are geographically close to each other.
• Analyzing the data to look for patterns or inconsistencies between the stations.
• Identifying any station(s) with readings that deviate significantly from the others.
By comparing data from multiple stations, you can catch errors in individual sensors and get a more accurate picture of environmental conditions.
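By way of illustration only, the redundancy check can be sketched by comparing each station with the median of its neighbours. The median rule, the tolerance value, and the function name are assumptions chosen for this sketch and are not specified in this document.

```python
from statistics import median


def find_deviating_stations(readings, max_deviation=15.0):
    """Return ids of stations whose reading is far from the group median.

    readings: dict mapping station id to a pollutant reading taken at the same time.
    max_deviation: hypothetical tolerance, in the same unit as the readings.
    """
    if len(readings) < 3:
        return []  # too few neighbours to judge
    centre = median(readings.values())
    return sorted(s for s, v in readings.items() if abs(v - centre) > max_deviation)


nearby = {"A": 42.0, "B": 45.0, "C": 44.0, "D": 120.0}
suspect = find_deviating_stations(nearby)
```

The median is used here rather than the mean so that a single faulty station does not shift the comparison value for the others.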
Cross-Validation: In this context, cross-validation refers to using a leave-one-out method to predict values based on other stations and checking for large discrepancies between predicted values and actual sensor readings. This involves:
Step 1: Removing one station's data from the dataset.
Step 2: Using the remaining data to train a model that predicts the missing value (i.e., the value from the removed station).
Step 3: Comparing the predicted value with the actual reading from the removed station.
Step 4: Repeating steps 1-3 for each sensor node in turn.
Step 5: Replacing the erroneous value with the predicted value.
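By way of illustration only, the leave-one-out procedure can be sketched as follows. The predictor used here is simply the median of the remaining stations, which stands in for the trained model described above; the error tolerance and function name are likewise hypothetical.

```python
from statistics import median


def leave_one_out_correct(readings, max_error=20.0):
    """Replace readings that disagree with a prediction from the other stations.

    readings: dict mapping station id to a reading taken at the same time.
    max_error: hypothetical tolerance, in the same unit as the readings.
    """
    corrected = dict(readings)
    for station, actual in readings.items():
        # Leave this station out and predict it from the rest.
        others = [v for s, v in readings.items() if s != station]
        if not others:
            continue
        predicted = median(others)  # stand-in for a trained model
        if abs(predicted - actual) > max_error:
            corrected[station] = predicted
    return corrected


readings = {"A": 40.0, "B": 44.0, "C": 42.0, "D": 130.0}
corrected = leave_one_out_correct(readings)
```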
Spatial Correlation
Objective: To improve network coverage by understanding how data from different sensor locations is related.
Steps:
Calculate Haversine Distance:
This step involves measuring the shortest distance between two points along the Earth's surface (the great-circle distance), which accounts for the Earth's curvature. This helps determine how far apart two sensors are from each other geographically.
Assess Spatial Correlation:
After knowing the distance, the next step is to determine how data from two sensors are related. If the readings from two sensors move together—meaning when one increases, the other does too—they have a strong positive correlation. Conversely, if one increases while the other decreases, they have a negative correlation.
The system looks at how much each sensor’s readings deviate from their average values and uses these deviations to calculate a correlation score. This score indicates the strength and direction of the relationship between the sensor data at different locations.
Distance: The system first calculates the physical distance between sensors.
Correlation: It then measures how closely the data from these sensors are related, which helps in predicting air quality in areas without sensors by using information from nearby locations.
Training the Meta Model
Objective: To combine predictions from different models to make a final prediction about air quality.
Collect Predictions: Gather the predictions from the models that focus on spatial data, temporal data, and weather data.
Train the Meta Model: Use these gathered predictions along with actual air quality values to train a new model. This model learns how to best combine the information from the three sources to improve accuracy.
Return the Meta Model: Once trained, this new model can be used to make more accurate predictions about air quality.
Model Training for Weather Features
Objective: To create a model that predicts air quality based on weather data.
Buffer Area: For each sensor, create a defined area around it to consider nearby influences on air quality.
Map Sensors to Satellite Data: Identify the closest satellite data point to each sensor.
Collect Weather Data: Gather weather information (like temperature, humidity, and wind speed) from the satellite and the elevation of the sensor’s location.
Train the Weather Model: Use the gathered weather data and the air quality values from the sensors to train a model that predicts air quality based on weather conditions.
Return Predictions: The trained model is then used to predict air quality values based on weather data.
Model Training for Temporal Features
Objective: To create a model that predicts air quality based on time-related factors.
Buffer Area: Similar to the weather model, create a defined area around each sensor to capture influences.
Map Sensors to Satellite Data: Identify the closest satellite data point for each sensor.
Extract Temporal Features: Calculate time-based factors like the difference in pollutant levels over time and averages.
Train the Temporal Model: Use the temporal features and pollutant values from both sensors and satellites to train a model for predicting air quality based on time.
Return Predictions: Use the trained model to predict air quality values based on temporal factors.
In at least an embodiment, the entire system and method of this invention comprises a modular, low-cost, IoT-enabled sensor network. The entire system is divided into 2 subsystems:
• a Central Gateway
• a set of Sensor Nodes
Air quality sensor nodes and gateway architecture are becoming increasingly popular due to the growing concern about air pollution and its impact on human health. The architecture typically consists of multiple sensor nodes that collect air quality data, which is then transmitted to a central gateway device. This gateway device is responsible for processing the data and making it available to other devices or systems for further analysis. The central gateway device is typically a more powerful device with more processing capabilities than the sensor nodes. It is responsible for receiving and processing the data from the sensor nodes, performing any necessary data filtering or pre-processing, and storing the data in a database or transmitting it to other devices or systems for further analysis.
The sensor nodes, themselves, are small, low-power devices that are designed to be deployed in a distributed manner throughout an area of interest. They typically contain a set of sensors that measure various parameters such as particulate matter (PM), carbon monoxide (CO), nitrogen dioxide (NO2), sulfur dioxide (SO2) and ozone (O3) levels. Data from these sensors is collected and transmitted wirelessly to the gateway device, which is responsible for aggregating and processing the data.
One of the key advantages of this architecture is its flexibility. The number of sensor nodes can be scaled up or down depending on the size of the area being monitored, and the types of sensors used can be customized to suit specific monitoring requirements. Additionally, data collected by the sensor nodes can be easily integrated with other systems, such as environmental monitoring systems, to provide a more comprehensive view of the environment.
In conclusion, air quality sensor node and gateway architecture is an effective way to monitor air quality in a distributed manner. The architecture allows for flexible deployment of sensor nodes and provides a centralized platform for data processing and analysis.
FIGURE 2 illustrates an overall high-level architecture diagram of this system.
FIGURE 3A illustrates a schematic block diagram for a processing gateway.
FIGURE 3B illustrates a schematic block diagram for a multimodal sensor node.
In preferred embodiments, a sensor node is capable of measuring the following parameters:
• Particulate Matter (PM1, PM2.5, PM10) [referenced by reference numeral 304]
• Gaseous Pollutants (SO2, NO2, CO, O3, CO2) [referenced by reference numeral 305, 302]
• Weather Parameters (Temperature, Relative Humidity, Barometric Pressure, Wind Speed, Wind Direction, and Rainfall) [referenced by reference numeral 306]
Table 1, below, sets out the working principles of the low-cost sensors used in this invention’s system.
Sensor | Detection Range | Principle
PM1 (reference numeral 304) | 0-500 ug/m3 | Laser Scattering
PM2.5 (reference numeral 304) | 0-1000 ug/m3 | Laser Scattering
PM10 (reference numeral 304) | 0-1000 ug/m3 | Laser Scattering
SO2 (reference numeral 305) | 0-20 PPM | Electrochemical
NO2 (reference numeral 305) | 0-20 PPM | Electrochemical
O3 (reference numeral 305) | 0-15 PPM | Electrochemical
CO (reference numeral 305) | 0-20 PPM | Electrochemical
Particulate Matter (reference numeral 304) | 0-1000 ug/m3 | Laser Scattering
Temperature (reference numeral 303) | -40-80 degC | Metal Oxide Semiconductor type
Relative Humidity (reference numeral 303) | 0-100% | Metal Oxide Semiconductor type
Barometric Pressure (reference numeral 303) | 300-1100 hPa | Metal Oxide Semiconductor type
Wind Speed (reference numeral 306) | 0-40 m/s | Ultrasonic
Wind Direction (reference numeral 306) | 0-360 degree | Ultrasonic
Rainfall (reference numeral 306) | 0-9999 mm/hr | Laser based
TABLE 1
Reference numeral 301 refers to timestamp, latitude, and longitude data.
In at least an embodiment, a sensor node is designed for collecting a wide range of environmental data using various sensors to monitor climate values. The following list illustrates features of such sensor nodes:
• Deep sleep of the MCU and sensors between data transmissions
• Two interrupt inputs that can be used to wake the MCU and sensors from deep sleep
• Confirmed and unconfirmed data up messages
• Ultra-low power consumption: under 10µA with all features and sensors
• Power input 3.5-6V:
o Battery (Li-Ion or Li-SOCl2 works fine)
o Battery with solar charger
• Sensor support:
o Bosch BME280 (humidity, barometric pressure and ambient temperature)
o Plantower PMS5003
o Alphasense gas sensors
Location-wise, these sensor nodes need to meet the following site conditions:
1. Site away from pollution sources or sinks
• Building exhausts
• Barbeque grills
• Dusty roads
2. Allow free air flow around the sensor node
• Ideally 270-degree unobstructed flow at the sensor, and no less than 180 degrees
3. Install about 3-6 ft above ground level
• Breathing zone height better represents exposure
4. Keep away from structures
• If the node must be next to a building, place it on the upwind side
• Make sure no trees surround the sensor
5. Look for sites that support your needs
• WiFi/Cellular signal
• Power availability
• Tamper resistant
• Safe to install
Typically, they need to be 3 to 6 feet above ground.
Typically, they need to be on the upwind side of buildings and away from trees.
Typically, they need to have a 270-degree air flow around them.
FIGURE 4 illustrates a flowchart for computation of distances.
In at least an embodiment, the central server node (CSN), comprising a processor, is envisaged in order to compute distances in the following manner:
For Haversine Distance Calculation - haversine(“x”,”y”),
1. Let “x” contain the coordinates (“lat1”, ”lon1”) and “y” contain the coordinates (“lat2”, “lon2”) [[STEP 402]]
a. for “coord” in [“lat1”, ”lon1”, “lat2”, “lon2”]: [[STEP 403]]
i. Convert “coord” to radians using “coord” = “coord” x pi/180 [[STEP 404]]
2. Calculate distances from each “y” to “x” using haversine distance [[STEP 405]]
3. Return the calculated distances [[STEP 406]]
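By way of illustration only, the steps above can be sketched as follows. This is a minimal sketch using the standard haversine formula; the mean Earth radius of 6371 km and the choice of kilometres as the output unit are assumptions, since this specification does not state them.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine(x, y):
    """Great-circle distance in km between x = (lat1, lon1) and y = (lat2, lon2).

    Coordinates are given in decimal degrees and converted to radians first.
    """
    lat1, lon1 = (math.radians(c) for c in x)
    lat2, lon2 = (math.radians(c) for c in y)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # min() guards against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


quarter_equator_km = haversine((0.0, 0.0), (0.0, 90.0))
```

Two points on the equator separated by 90 degrees of longitude lie a quarter of the Earth's circumference apart, which gives a simple sanity check for the function.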
FIGURE 5 illustrates a flowchart for computation of directions.
In at least an embodiment, the central server node (CSN), comprising a processor, is envisaged in order to compute directions.
For Direction Calculation - direction(“x”, “y”)
1. Let “x” contain the coordinates (“lat1”, “lon1”) and “y” contain the coordinates (“lat2”, “lon2”) [[STEP 502, STEP 503]]
a. for “coord” in [“lat1”, “lon1”, “lat2”, “lon2”]:
i. Convert “coord” to radians using “coord” = “coord” x pi/180 [[STEP 504]]
2. Calculate direction from “y” to “x” using the formula [[STEP 505]]
direction = direction x 180/pi
3. if direction<0: [[STEP 506, STEP 507]]
a. direction = direction + 360 [[STEP 508]]
4. Return the calculated directions [[STEP 509]]
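By way of illustration only, the direction computation can be sketched as follows. The formula in step 2 is not reproduced in the text, so this sketch assumes the standard initial-bearing (forward azimuth) formula, measured clockwise from north; that choice is an assumption and not a statement of the formula used in FIGURE 5.

```python
import math


def direction(x, y):
    """Initial bearing in degrees (0 to 360, clockwise from north) from y towards x.

    x = (lat1, lon1) and y = (lat2, lon2), both in decimal degrees.
    """
    lat_x, lon_x = (math.radians(c) for c in x)
    lat_y, lon_y = (math.radians(c) for c in y)
    dlon = lon_x - lon_y
    east = math.sin(dlon) * math.cos(lat_x)
    north = (math.cos(lat_y) * math.sin(lat_x)
             - math.sin(lat_y) * math.cos(lat_x) * math.cos(dlon))
    bearing = math.degrees(math.atan2(east, north))  # radians to degrees
    if bearing < 0:
        bearing += 360.0  # map negative bearings into the 0 to 360 range
    return bearing


due_west = direction((0.0, -1.0), (0.0, 0.0))
```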
FIGURE 6 illustrates a flowchart for computation of spatial correlation.
In at least an embodiment, the central server node (CSN), comprising a processor, is envisaged in order to compute spatial correlation.
For Spatial Correlation - spcorr(“x”,”y”)
1. Calculate haversine distance [[STEP 602]]
a. “d” = haversine(“x”,”y”)
2. Calculate spatial correlation using [[STEP 603]]
a. r = \frac{n \sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{n \sum_{i} (x_i - \bar{x})^2} \, \sqrt{n \sum_{i} (y_i - \bar{y})^2}}
Where:
n is the number of observations
x_i and y_i are the values of variables x and y at location i
\bar{x} and \bar{y} are the means of variables x and y over all locations
3. Return Output (r) [[STEP 604]]
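By way of illustration only, the correlation score in step 2 can be sketched as follows. The factor n appears once in the numerator and once (as the product of two square roots) in the denominator, so it cancels and the expression reduces to the Pearson correlation coefficient. The function name pearson_r and the handling of constant series are assumptions made for this sketch.

```python
import math


def pearson_r(x_vals, y_vals):
    """Correlation between paired sensor and satellite readings."""
    n = len(x_vals)
    if n < 2 or n != len(y_vals):
        raise ValueError("need at least two paired observations")
    mean_x = sum(x_vals) / n
    mean_y = sum(y_vals) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x_vals, y_vals))
    var_x = sum((a - mean_x) ** 2 for a in x_vals)
    var_y = sum((b - mean_y) ** 2 for b in y_vals)
    denom = math.sqrt(var_x) * math.sqrt(var_y)
    if denom == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / denom


r_up = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_down = pearson_r([1.0, 2.0, 3.0], [6.0, 4.0, 2.0])
```

Readings that rise together give a score close to +1, and readings that move in opposite directions give a score close to -1.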
FIGURE 7 and FIGURE 8 illustrate flowcharts for computation of temporal difference.
In at least an embodiment, the central server node (CSN), comprising a processor, is envisaged in order to compute temporal difference.
For Temporal Difference Calculation - temporal_difference(“x”, ”y”, ”t”)
1. let “t” be a timestamp. “xt” denotes the value of “x” at time “t”. “yt” denotes value of “y” at time “t” [[STEP 702]]
2. “difference_t” = “xt” - “yt” [[STEP 702]]
3. return “difference_t” [[STEP 703]]
Calculating various temporal differences - temporal_differences(“x”, ”y”, ”t”)
1. For latest differences,
a. for each mapped (“x”, ”y”) and “t” denoting current timestamp in hours, calculate: [[STEP 802]]
i. latest_diff = temporal_difference(“x”, ”y”, ”t -1”) [[STEP 803]]
2. For 10 day mean differences, [[STEP 804]]
a. for each mapped (“x”,”y”) and “T” denoting list of timestamps for 10 days in hours:
i. for each “t” in “T”:
1. diff(t) = temporal_difference(“x”, ”y”, ”t”)
ii. Take mean of all “diff(t)”, let “mean_t” represent the mean
3. For exponentially weighted moving averages (EWMA) for past 5 days for the same hour, [[STEP 805]]
a. for each mapped (“x”,”y”) and “Th” denoting list of timestamps for 5 days in hours for hour “h”:
i. for each “t” in “Th”:
1. diff(t) = temporal_difference(“x”, ”y”, ”t”)
ii. Calculate EWMA for all differences using:
1. “EWMA_t” = alpha * diff(t) + (1 - alpha) * EWMA(t-1)
4. Return (“latest_diff”, “mean_t”, “EWMA_t”) [[STEP 806]]
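By way of illustration only, the three temporal features can be sketched as follows. This sketch assumes hourly integer timestamps, dictionary-based series, a smoothing factor alpha of 0.5, and an EWMA seeded with the oldest difference; none of these choices is fixed by this specification.

```python
def temporal_difference(x_series, y_series, t):
    """Sensor value minus satellite value at timestamp t."""
    return x_series[t] - y_series[t]


def temporal_differences(x_series, y_series, t, alpha=0.5):
    """Return (latest_diff, mean_t, ewma_t) for current hourly timestamp t.

    x_series, y_series: dicts mapping integer hour timestamps to pollutant values.
    alpha: hypothetical EWMA smoothing factor.
    """
    latest_diff = temporal_difference(x_series, y_series, t - 1)

    # Mean difference over the previous 10 days of hourly values.
    hours = [h for h in range(t - 240, t) if h in x_series and h in y_series]
    if not hours:
        raise ValueError("no overlapping data in the 10-day window")
    mean_t = sum(temporal_difference(x_series, y_series, h) for h in hours) / len(hours)

    # EWMA over the same hour of the past 5 days, oldest first.
    same_hour = [t - 24 * d for d in range(5, 0, -1)]
    ewma_t = None
    for h in same_hour:
        if h not in x_series or h not in y_series:
            continue
        diff = temporal_difference(x_series, y_series, h)
        ewma_t = diff if ewma_t is None else alpha * diff + (1 - alpha) * ewma_t
    return latest_diff, mean_t, ewma_t


# Synthetic example: sensor follows a daily cycle, satellite value is flat.
x_series = {h: 50.0 + (h % 24) for h in range(0, 241)}
y_series = {h: 45.0 for h in range(0, 241)}
latest_diff, mean_t, ewma_t = temporal_differences(x_series, y_series, 240)
```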
FIGURE 9 illustrates a flowchart for computation of Model Training for Spatial Features.
In at least an embodiment, the central server node (CSN), comprising a processor, is envisaged in order to compute Model Training for Spatial Features.
For Model Training for Spatial Features - spatial_model(“x”,”y”,”z”)
1. For each sensor “x”, create a buffer area “A” with radius “r” where “r” is a distance in the same unit as the coordinate reference system (latitude and longitude) [[STEP 902]]
2. Map each sensor “x” to nearest satellite point “y” [[STEP 903]]
3. Let “lat_ys”,”lon_ys” be coordinates of all “y”s [[STEP 904]]
4. Derive distances and directions for each pair of “x” and “y” using “dists_sat” = haversine(“x”,”y”) and “dirs_sat” = direction(“x”,”y”) [[STEP 905]]
5. Map each sensor “x” to nearest spatial feature “z” [[STEP 906]]
6. Derive distances and directions for each pair of “x” and “z” using “dists_sf” = haversine(“x”,”z”) and “dirs_sf” = direction(“x”,”z”) [[STEP 907]]
7. Derive spatial correlations for each pair of “x” and “y” using “sp_corrs” = spcorr(“x”,”y”) [[STEP 908]]
8. With “x_vals” being list values of the pollutant from sensors “x” and “y_vals” being list of values of the pollutant from satellite points “y”, train model “m_spatial” [[STEP 909]]
“m_spatial” = model([“dists_sat”,“dirs_sat”,“dists_sf”,“dirs_sf”,“sp_corrs”,“lat_ys”,”lon_ys”,
“y_vals”], “x_vals”)
where model() is a generic machine learning model
9. predict “x_pred_spatial” using “m_spatial” [[STEP 910]]
10. return “x_pred_spatial” [[STEP 910]]
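By way of illustration only, the buffer and nearest-point mapping used in steps 1, 2 and 5 can be sketched as follows. This sketch expresses the buffer radius in kilometres and measures it with the haversine distance, which is an assumption (the steps above state the radius in the unit of the coordinate reference system); the function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = (math.radians(c) for c in a)
    lat2, lon2 = (math.radians(c) for c in b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def map_to_nearest(sensors, points, radius_km):
    """Map each sensor id to (point id, distance) of the nearest point in its buffer.

    sensors, points: dicts mapping ids to (lat, lon). A sensor with no point
    inside its buffer radius maps to None.
    """
    mapping = {}
    for sensor_id, sensor_xy in sensors.items():
        best = None
        for point_id, point_xy in points.items():
            d = haversine_km(sensor_xy, point_xy)
            if d <= radius_km and (best is None or d < best[1]):
                best = (point_id, d)
        mapping[sensor_id] = best
    return mapping


sensors = {"s1": (18.52, 73.85), "s2": (10.0, 70.0)}
satellite_points = {"p1": (18.53, 73.86), "p2": (19.5, 74.5)}
mapping = map_to_nearest(sensors, satellite_points, radius_km=5.0)
```

The same helper could be reused to map sensors to their nearest spatial feature “z” by passing the spatial feature coordinates in place of the satellite points.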
FIGURE 10 illustrates a flowchart for computation of Model Training for Temporal Features.
In at least an embodiment, the central server node (CSN), comprising a processor, is envisaged in order to compute Model Training for Temporal Features.
For Model Training for Temporal Features - temporal_model(“x”,”y”,”z”)
1. For each sensor “x”, create a buffer area “A” with radius “r” where “r” is a distance in the same unit as the coordinate reference system (latitude and longitude) [[STEP 1002]]
2. Map each sensor “x” to nearest satellite point “y” [[STEP 1002]]
3. Derive all the temporal_features [[STEP 1003]]
“latest_diff”, “mean_t”, “EWMA_t” = temporal_differences(“x”, ”y”, ”t”)
4. With “x_vals” being list values of the pollutant from sensors “x” and “y_vals” being list of values of the pollutant from satellite points “y”, train model “m_temporal” [[STEP 1004]]
“m_temporal” = model([“latest_diff”, “mean_t”, “EWMA_t”,
“y_vals”], “x_vals”)
where model() is a generic machine learning model
5. predict “x_pred_temporal” using “m_temporal” [[STEP 1005]]
6. return “x_pred_temporal” [[STEP 1006]]
FIGURE 11 illustrates a flowchart for computation of Model Training for Weather Features.
In at least an embodiment, the central server node (CSN), comprising a processor, is envisaged in order to compute Model Training for Weather Features.
For Model Training for Weather Features - weather_model(“x”,”y”,”z”)
1. For each sensor “x”, create a buffer area “A” with radius “r” where “r” is a distance in the same unit as the coordinate reference system (latitude and longitude) [[STEP 1102]]
2. Map each sensor “x” to nearest satellite point “y” [[STEP 1103]]
3. Let “weather_x” be a list containing temperature, humidity, wind speed, wind direction, visibility, precipitation probability and pressure taken from satellite sources and let “elevation_x” be elevation at the location of sensor “x” taken from satellite sources. [[STEP 1104]]
4. With “x_vals” being list values of the pollutant from sensors “x” and “y_vals” being list of values of the pollutant from satellite points “y”, train model “m_weather” [[STEP 1105, STEP 1106]]
“m_weather” = model([“weather_x”, “elevation_x”, “y_vals”], “x_vals”) where model() is a generic machine learning model
5. predict “x_pred_weather” using “m_weather” [[STEP 1107]]
6. return “x_pred_weather” [[STEP 1108]]
FIGURE 12 illustrates a flowchart for computation of training a meta model.
In at least an embodiment, the central server node (CSN), comprising a processor, is envisaged in order to compute training a meta model.
For Training the Meta Model - meta_model(“x_pred_spatial”,“x_pred_temporal”,“x_pred_weather”,”x_vals”)
1. Collect the predictions of the base models and “x_vals” [[STEP 1201]]
2. train meta model [[STEP 1202]]: “m_meta” =
model([“x_pred_spatial”,“x_pred_temporal”,“x_pred_weather”],”x_vals”)
where model() is a generic machine learning model
3. return “m_meta” [[STEP 1203]]
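By way of illustration only, the meta model training can be sketched as a stacking step. The specification leaves model() generic (for example XGBoost or Random Forest); this sketch substitutes a plain linear least-squares combiner, solved with the normal equations, purely to keep the example self-contained. The function names and sample values are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; base predictions may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w


def train_meta_model(x_pred_spatial, x_pred_temporal, x_pred_weather, x_vals):
    """Fit weights [intercept, w_spatial, w_temporal, w_weather] by least squares."""
    rows = [[1.0, s, t, w]
            for s, t, w in zip(x_pred_spatial, x_pred_temporal, x_pred_weather)]
    k = 4
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * v for r, v in zip(rows, x_vals)) for i in range(k)]
    return solve_linear_system(ata, atb)


def predict_meta(m_meta, s, t, w):
    """Combine the three base predictions into one final estimate."""
    return m_meta[0] + m_meta[1] * s + m_meta[2] * t + m_meta[3] * w


# Synthetic base-model predictions and sensor values for illustration.
spatial = [10.0, 20.0, 30.0, 40.0, 50.0, 35.0]
temporal = [12.0, 18.0, 33.0, 37.0, 55.0, 20.0]
weather = [9.0, 25.0, 28.0, 45.0, 48.0, 60.0]
x_vals = [0.5 * s + 0.3 * t + 0.2 * w for s, t, w in zip(spatial, temporal, weather)]
m_meta = train_meta_model(spatial, temporal, weather, x_vals)
```

The fitted weights indicate how much each base model contributes to the final estimate, which is one way of reading the relative priority of the spatial, temporal and meteorological features.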
The TECHNICAL ADVANCEMENT of this invention lies in providing an air quality monitoring network consisting of a physical sensor network which uses knowledge from various satellite and geospatial sources and applies techniques such as calibration, error correction, spatial correlation and machine learning to expand coverage of the network and give air quality data to an expanded set of locations.
While this detailed description has disclosed certain specific embodiments for illustrative purposes, various modifications will be apparent to those skilled in the art which do not constitute departures from the spirit and scope of the invention as defined in the following claims, and it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
CLAIMS:WE CLAIM,
1. A system for a multimodal spatiotemporal air quality network, said system consisting, essentially, of an air quality monitoring network using multimodal approach, multiple feature engineering techniques, and a machine learning architecture to give air quality data with high spatial coverage, in order to expand coverage of a defined sensor network to provide air quality data to an expanded set of locations beyond the defined sensor network, said system comprising:
- a network of sensor nodes (SSN), deployed over a region of interest, to provide a defined sensor network (SSN), the defined sensor network (SSN) comprising sources, selected from a group of sources, correlative to sensing, wherein each sensor node is configured to:
o measure environmental parameters including particulate matter data and gaseous pollutants data,
o measure meteorological parameters;
o transmit the measured environmental parameters to the central server node (CSN);
- a network of satellite data input nodes (SLN) configured to input satellite air quality data, weather data, and historical air quality data, for the region of interest;
- a geospatial module, configured with a network of location-specific nodes to input spatial locations of pollutant sources in order to obtain geospatial data per location-specific sensor node;
- a demographic module configured to poll demographic data per location-specific sensor node;
- a central server node (CSN) being communicably coupled to said network of sensor nodes (SSN), said network of satellite data input nodes (SLN), said geospatial module, and said demographic module, said central server node (CSN) being configured to collect and process air quality data, the central server node (CSN) comprising:
o a processor configured for anomaly detection, outlier detection, active error correction, and data filtering;
o a pre-processor configured to compute spatial features, temporal features, and meteorological features.
- said central server node (CSN) configured to output pollutant data in the region of interest based on said computed spatial features, said computed temporal features, and said computed meteorological features.
2. The system as claimed in claim 1 wherein, said central server node (CSN) utilizes machine learning algorithms to:
- calibrate and error-correct data from sensor nodes using:
o tracking anomalies and anomaly frequencies basis defined thresholds and basis total number of data points in a defined time window;
o shutdown criteria, to shutdown faulty sensor, involving defined thresholds;
- spatially correlate data to expand network coverage using:
o measuring straight-line distance between two points on Earth's surface, considering its curvature;
o correlating data from two sensors to obtain correlation data using sensor’s average values and deviations, in order to obtain a correlation score indicating strength and direction of relationship between sensor data at different locations;
o predicting air quality data basis physical distance between sensors and correlative correlation score;
- train a meta model using said computed spatial features, said computed temporal features, and said computed meteorological features to provide hyperlocal air quality and weather parameters, in order to determine priorities for said temporal features, said spatial features, said meteorological features, said training steps comprising the steps of:
o for temporal features,
• creating a buffer area around each sensor to capture external influence data on each sensor node;
• mapping each sensor node to satellite data by identifying closest satellite data point for each sensor node;
• extracting temporal feature by computing time-based factors correlative to sensor data;
• training a model using said extracted temporal features, sensor data, and satellite data, in order to obtain air quality values based on temporal factors;
o for spatial features,
• creating a buffer area around each sensor to capture external influence data on each sensor node;
• mapping each sensor node to satellite data by identifying closest satellite data point for each sensor node to form pairs of sensors and satellites;
• determining a first set of distances and directions between each pair of sensor and satellite;
• mapping sensors to nearest spatial features;
• determining a second set of distances and directions between each pair of sensor and satellite using spatially mapped sensors;
• extracting spatial correlations between each of said sensor – satellite pairs;
• training a model using said extracted spatial features, spatial correlations, sensor data, and satellite data, in order to obtain air quality values based on spatial factors;
o for meteorological features,
• creating a buffer area around each sensor to capture external influence data on each sensor node;
• mapping each sensor node to satellite data by identifying closest satellite data point for each sensor node to form pairs of sensors and satellites;
• collecting weather features, from satellites, in terms of pre-defined parameters;
• determining elevation of each sensor node;
• training a model using said extracted meteorological features, meteorological correlations, sensor data, and satellite data, in order to obtain air quality values based on meteorological factors.
3. The system as claimed in claim 1 wherein, said network of sensor nodes (SSN) being configured with threshold values and parameters, each of them being:
- particulate matter (304) (PM1, PM2.5, PM10) parameter detectable in the range of 0-500ug/m3 for PM1, 0-1000ug/m3 for PM2.5, and 0-1000ug/m3 for PM10;
- gaseous pollutant (305, 302) (SO2, NO2, CO, O3, CO2) parameter detectable in the range of 0-20 PPM for SO2, 0-20 PPM for NO2, 0-15 PPM for O3, 0-20 PPM for CO; and
- weather parameters (306) detectable in the range of -40-80 degC for temperature, 0-100% for relative humidity, 300-1100hPa for barometric pressure, 0-40m/s for wind speed, 0-360 degree for wind direction, 0-9999 mm/hr for rainfall.
4. The system as claimed in claim 1 wherein, said central server node’s (CSN) said processor configured with instructions for anomaly detection, outlier detection, active error correction, and data filtering, said instructions for anomaly detection being:
- determining, and discarding, logically inconsistent data points; and
- defining rules, with threshold values, for pollutant data, in order to retain coherent pollutant data subscribing to said defined rules.
5. The system as claimed in claim 1 wherein, said central server node’s (CSN) said processor configured with instructions for anomaly detection, outlier detection, active error correction, and data filtering, said instructions for outlier detection being:
- determining, and discarding, outliers by clustering data points and deviants, outside of pre-determined threshold values, from the clustered data points as being outliers by:
o deploying multiple sensors, as a cluster, in close proximity to cross-validate sensor data readings;
o determining discrepancies amongst various sensors in a given cluster;
o standardizing sensor data, in said given cluster, to ensure that all features contribute equally to distance calculations;
o applying Density-Based Spatial Clustering of Applications with Noise algorithms to identify sub-clusters and outliers in said standardized sensor data;
o applying redundancy operations and cross-validation operations, along with discrepancy data, to sensor data, for error corrections in said sensor data.
6. The system as claimed in claim 1 wherein, said central server node’s (CSN) said processor configured with instructions for anomaly detection, outlier detection, active error correction, and data filtering, said instructions for active error correction being:
- identifying and correcting sensor data in real time using redundancy operations and cross-validation operations.
7. The system as claimed in claim 1 wherein, said central server node’s (CSN) said processor configured with instructions for anomaly detection, outlier detection, active error correction, and data filtering, said instructions for data filtering being:
- employing moving average techniques to smooth out short-term fluctuations and highlight longer-term trends, characterized, in that:
o employing a moving average technique with a window size of 8 hours to smooth out short-term fluctuations and highlight longer-term trends; and
o applying a weighted factor that increases quadratically with each hour, with the latest hour having highest weight.
8. The system as claimed in claim 1 wherein, said central server node’s (CSN) said pre-processor with a spatial module for computing spatial features, is configured with instructions to perform the steps of:
- requesting for geo-location of said sensor for a specific time unit;
- determining distance from each satellite point, from the network of satellite data input nodes (SLN), to at least a known sensor node within a determined radius of the region of interest;
- determining direction towards each known sensor node from each of said satellite points within the determined radius;
- determining distance from each point to a spatial pollutant source;
- determining spatial correlation between known sensor nodes and satellite for that determined time unit.
9. The system as claimed in claim 1 wherein, said pre-processor of said central server node (CSN), having a temporal module for computing temporal features, is configured with instructions to perform the steps of:
- requesting the geo-location of said sensor for a specific time unit;
- determining the latest difference in pollutant value between satellite data and the nearest known sensor data within a determined radius of the region of interest;
- determining the mean difference, over a pre-determined number of past days, between satellite data and known sensor data within the determined radius;
- recording temporal data; and
- computing an exponentially weighted moving average (EWMA) of the differences for the same time unit over a pre-determined number of past days, in order to obtain a temporal difference.
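The EWMA step of claim 9 can be illustrated with the sketch below. It assumes the two input series hold the satellite and sensor values for the same time unit on consecutive past days, ordered oldest to newest; the function name `ewma_of_differences` and the smoothing factor `alpha=0.3` are hypothetical choices, since the claim does not specify a smoothing factor.

```python
def ewma_of_differences(satellite_values, sensor_values, alpha=0.3):
    """EWMA of (satellite - sensor) differences, ordered oldest to newest.

    `alpha` is an assumed smoothing factor in (0, 1]; a larger value
    gives more weight to the most recent days.
    """
    if not satellite_values or len(satellite_values) != len(sensor_values):
        raise ValueError("series must be non-empty and of equal length")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    diffs = [sat - sen for sat, sen in zip(satellite_values, sensor_values)]
    ewma = diffs[0]  # seed with the oldest difference
    for diff in diffs[1:]:
        ewma = alpha * diff + (1 - alpha) * ewma
    return ewma
```

Seeding with the oldest difference is one common convention; other initialisations (for example, the mean of the first few days) would also satisfy the wording of the claim.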
10. The system as claimed in claim 1 wherein, said pre-processor of said central server node (CSN), having a meteorological module for computing meteorological features, is configured with instructions to perform the steps of:
- recording wind parameters;
- recording visibility parameters;
- recording elevation parameters;
- recording humidity parameters;
- recording pressure parameters; and
- recording precipitation probability.
11. The system as claimed in claim 1 wherein, said pre-processor of said central server node (CSN), having a spatial module for computing spatial features, is configured with instructions to perform the steps of:
- determining a region of interest;
- mapping each sensor to its nearest satellite point;
- extracting coordinates of all satellite points in order to provide geolocation;
- calculating distance for each pair formed by at least a sensor and an associated satellite point, using Haversine distance function, in order to provide a first distance data;
- mapping each sensor to its nearest spatial feature;
- calculating distance for each pair formed by at least a sensor and an associated spatial feature using Haversine distance function, in order to provide a second distance data;
- deriving spatial correlations for each pair formed by at least a sensor and an associated satellite point in order to provide spatial correlation data;
- determining pollutant values from said at least a sensor;
- determining pollutant values from said at least a satellite point;
- determining direction of said sensor;
- determining direction of said satellite point;
- training said spatial module using first distance data, direction of said sensor, second distance data, direction of said satellite point, spatial correlation data, geolocation data, determined pollutant values from said at least a sensor, and determined pollutant values from said at least a satellite point, in order to obtain trained spatial data; and
- predicting pollutant data, in said region of interest, using said trained data in order to obtain predicted spatial features.
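The mapping and distance steps of claim 11 can be sketched as follows, using the standard Haversine great-circle formula. The mean Earth radius of 6371 km, the `(latitude, longitude)` tuple layout and the helper names `haversine_km` and `nearest_point` are assumptions for this illustration; the same helper could be reused for the sensor-to-spatial-feature pairs.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_point(sensor, candidate_points):
    """Return (index, distance_km) of the candidate closest to `sensor`.

    `sensor` and each candidate are (latitude, longitude) tuples.
    """
    if not candidate_points:
        raise ValueError("at least one candidate point is required")
    distances = [haversine_km(sensor[0], sensor[1], p[0], p[1])
                 for p in candidate_points]
    idx = min(range(len(distances)), key=distances.__getitem__)
    return idx, distances[idx]
```

A linear scan is adequate for a modest number of satellite points; for dense grids a spatial index would be a natural refinement, though nothing in the claim requires one.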
12. The system as claimed in claim 1 wherein, said pre-processor of said central server node (CSN), having a temporal module for computing temporal features, is configured with instructions to perform the steps of:
- determining a region of interest;
- mapping each sensor to its nearest satellite point;
- extracting elevation for each sensor from satellite sources in order to provide elevation data;
- determining pollutant values from said at least a sensor;
- determining pollutant values from said at least a satellite point;
- training said temporal module using determined pollutant values from said at least a sensor, determined pollutant values from said at least a satellite point, and extracted elevation data, in order to obtain trained temporal data; and
- predicting pollutant data, in said region of interest, using said trained data in order to obtain predicted temporal features.
13. The system as claimed in claim 1 wherein, said pre-processor of said central server node (CSN), having a meteorological module for computing meteorological features, is configured with instructions to perform the steps of:
- determining a region of interest;
- mapping each sensor to its nearest satellite point;
- determining meteorological data;
- extracting elevation for each sensor from satellite sources in order to provide elevation data;
- determining pollutant values from said at least a sensor;
- determining pollutant values from said at least a satellite point;
- training said meteorological module using determined pollutant values from said at least a sensor, determined pollutant values from said at least a satellite point, determined meteorological data, and extracted elevation data, in order to obtain trained meteorological data; and
- predicting pollutant data, in said region of interest, using said trained data in order to obtain predicted meteorological features.
Dated this 16th day of August, 2024
CHIRAG TANNA
of INK IDÉE
APPLICANT’S PATENT AGENT
REGN. NO. IN/PA – 1785