System And Method For Determining Air Quality By Processing Environmental And Traffic Related Visual Data

Abstract: A system (100) and method are disclosed for automated air quality determination using multimodal data fusion and a transformer-based deep learning architecture. The system (100) receives synchronized video streams from a first camera (104) and a second camera (106) along with environmental sensor data from an air quality monitoring device (108). Images are sampled at defined intervals, annotated with sensor data and contextual metadata, and processed using a frozen convolutional neural network to extract feature vectors. These visual features are combined with standardized sensor readings and encoded contextual information to form a unified feature representation. A custom transformer model projects this representation into a token sequence with positional encoding. The final classification is performed using the transformer’s last token output, which captures attention-weighted spatial-temporal correlations between visual indicators and environmental factors. The system outputs a six-category air quality index (AQI) classification: good, satisfactory, moderate, poor, very poor, or severe. FIG. 1

Patent Information

Application #:
Filing Date: 05 June 2024
Publication Number: 26/2025
Publication Type: INA
Invention Field: PHYSICS
Status:
Email:
Parent Application:

Applicants

INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY
VINDHYA A-3, 213, IIIT Gachibowli Hyderabad Telangana India 500032

Inventors

1. Sachin Chaudhari
VINDHYA A-3, 213, IIIT Gachibowli Hyderabad, Telangana, India 500032
2. Anoop Namboodiri
VINDHYA A-3, 213, IIIT Gachibowli Hyderabad, Telangana, India 500032
3. Om Kathalkar
VINDHYA A-3, 213, IIIT Gachibowli Hyderabad, Telangana, India 500032
4. Nitin Nilesh
VINDHYA A-3, 213, IIIT Gachibowli Hyderabad, Telangana, India 500032

Specification

DESC:BACKGROUND
Technical Field
[0001] The present invention relates to environmental monitoring and data analysis, specifically to a system and method for determining air quality by processing environmental and traffic-related visual data. This invention integrates advanced image processing techniques and environmental science to generate comprehensive image datasets that correlate traffic conditions with air quality levels.
Description of the related art
[0002] The accurate monitoring of air quality is critical for public health, environmental protection, and urban planning. Air quality is typically quantified using the Air Quality Index (AQI), a metric that ranges from 0 to 500, with lower values indicating better air quality and higher values indicating worse air quality. The AQI is determined based on concentrations of various pollutants, including particulate matter (PM2.5 and PM10) and gaseous pollutants, measured in micrograms per cubic meter (µg/m³). These pollutants significantly degrade ambient air quality, particularly in urban environments with high traffic density.
[0003] In India, the Central Pollution Control Board (CPCB) has established weather stations across various cities to monitor air quality. These stations are equipped with sophisticated and expensive instruments to measure atmospheric parameters and pollutant levels. However, the widespread spacing of these monitoring stations poses a significant challenge in capturing localized pollution events, such as those caused by burning or firecrackers. Consequently, the AQI readings provided by these stations may lack the granularity needed to reflect real-time, localized air quality conditions accurately.
[0004] To address this issue, numerous air quality researchers have explored low-cost AQI estimation systems that utilize highly accurate sensors. These systems, while effective, come with their own set of challenges, including the need for frequent maintenance and replacement of sensors, which can hinder the scalability and long-term reliability of air pollution monitoring efforts. In response to these limitations, there is a growing interest in developing sensor-free methodologies for air quality estimation. Image-based methods have emerged as a promising alternative due to their simplicity, cost-effectiveness, and ability to produce robust outputs with minimal effort. These methods leverage advancements in computer vision and machine learning to analyze images for indicators of air quality, such as haze and visibility reduction caused by particulate matter. Despite their potential, previous attempts at air quality estimation using images have encountered significant hurdles. Both stationary and mobile estimation techniques have underperformed primarily due to the lack of high-quality datasets. Existing datasets often lack essential attributes such as significant data points, seasonal variations, and day-night distinctions, which are crucial for accurate and comprehensive air quality analysis. In light of the global health implications of air pollution and the specific challenges faced by urban areas with high traffic-related pollution, there is a clear and pressing need for innovative approaches to air quality determination. Prominent companies like Google, Microsoft, and IBM have been at the forefront of this field, leveraging advanced machine-learning techniques to monitor air pollution.
[0005] Researchers have employed various methodologies for air quality monitoring using image datasets. Liu et al. utilized light attenuation and color information as key features for PM level estimation. Mondal et al. developed a custom CNN model to assess PM2.5 levels from images. Kalajdjieski et al. combined InceptionV3 for image features with a multilayer perceptron (MLP) for weather data, integrating these in a multimodal approach. Zhang et al. presented AQC-Net, a deep convolutional neural network based on ResNet, enhanced with a Spatial and Context Attention block (SCA) for better scene detail encoding. Nilesh et al. employed a hybrid approach using YOLOv5 for vehicle detection, BRISQUE for visibility metrics, and Random Forest for categorizing air quality conditions. Despite significant advancements, the lack of high-quality, comprehensive datasets for air quality monitoring remains a major challenge. Existing datasets often miss crucial features such as seasonal variations, day-night distinctions, and extensive data points.
[0006] Therefore, there arises a need to address the aforementioned technical drawbacks by providing a system and method that enhance the accuracy, reliability, and granularity of air quality determination in real-time, thereby contributing to more effective environmental management and public health protection.
SUMMARY OF THE INVENTION
[0007] The first aspect of the present invention provides a system for determining air quality by processing environmental and traffic-related visual data. The system includes a first camera mounted on the front side of a data collection vehicle. The first camera is configured to capture front-facing video data. A second camera is mounted on the rear side of the data collection vehicle. The second camera is configured to capture rear-facing video data. An air quality monitoring device includes a particulate matter sensor configured to measure particulate matter (PM2.5 and PM10) concentrations, a temperature and relative humidity sensor configured to measure temperature and relative humidity, a reference device configured to calibrate the particulate matter sensor, and a microcontroller unit to control functionalities of the particulate matter sensor, the temperature and relative humidity sensor, and the reference device. The air quality monitoring device operates at a temperature range from –40°C to +125°C and a humidity range from 0% to 100%. A server is communicatively connected to the first camera, the second camera, and the air quality monitoring device. The server is configured to receive the front-facing video data from the first camera, the rear-facing video data from the second camera, and the sensor data including the particulate matter, temperature, and relative humidity measurements from the air quality monitoring device.
The server is further configured to (i) sample images from the front-facing and rear-facing video data at defined intervals, and associate each sampled image with co-located sensor data; (ii) extract visual features from the sampled images using a frozen convolutional neural network (CNN) to obtain front and rear feature vectors; (iii) combine the front and rear visual features with standardized sensor data and encoded contextual information into a unified feature vector, the contextual information including day/night status and seasonal conditions, encoded numerically using categorical encoding techniques; (iv) process the unified feature vector using a transformer-based neural network architecture configured to model spatial-temporal and cross-modal relationships among the visual features, the sensor data, and the contextual information relevant to air quality assessment; and (v) classify air quality into one of six categories comprising good, satisfactory, moderate, poor, very poor, and severe based on the transformer output.
[0008] In some embodiments, the CNN is a frozen ResNet50 model with the final classification layer removed, producing a 2048-dimensional feature vector for each camera view.
[0009] In some embodiments, the sensor data is standardized using a StandardScaler to ensure zero mean and unit variance.
[0010] In some embodiments, the unified feature vector is projected into a sequence of ten tokens using a linear projection layer with an embedding dimension of 256.
[0011] In some embodiments, the sequence of tokens is augmented with positional encoding using sinusoidal functions to preserve spatial-temporal relationships.
[0012] In some embodiments, the transformer comprises six encoder layers with eight self-attention heads per layer and a feedforward dimensionality equal to four times the embedding dimension.
[0013] In some embodiments, the final token of the transformer output is used for classification, wherein the token represents the aggregated attention-weighted summary of all input features.
[0014] In some embodiments, the particulate matter (PM) sensor is calibrated using the reference device via a linear regression model, wherein outliers in the PM sensor data are removed using the interquartile range (IQR) method.
[0015] In some embodiments, the particulate matter sensor and the temperature and relative humidity sensor are configured to measure the particulate matter (PM) and weather parameters once every 5 seconds.
[0016] In some embodiments, the particulate matter sensor is configured to measure particle size from 0.3 to 10 µm with a measuring range of 0.0 to 999.9 µg/m³.
[0017] The second aspect of the present invention provides a method for determining air quality by acquiring and processing environmental and traffic-related visual data. The method includes (i) receiving, at a server, front-facing video data from a first camera, rear-facing video data from a second camera, and sensor data from an air quality monitoring device, the sensor data including particulate matter, temperature, and relative humidity measurements; (ii) sampling images from the front-facing and rear-facing video data at defined intervals, and associating each sampled image with co-located sensor data; (iii) extracting visual features from the sampled images using a frozen convolutional neural network (CNN) to obtain front and rear feature vectors; (iv) combining the front and rear visual features with standardized sensor data and encoded contextual information into a unified feature vector, the contextual information including day/night status and seasonal conditions, encoded numerically using categorical encoding techniques; (v) processing the unified feature vector using a transformer-based neural network architecture configured to model spatial-temporal and cross-modal relationships among the visual features, the sensor data, and the contextual information relevant to air quality assessment; and (vi) classifying air quality into one of six categories comprising good, satisfactory, moderate, poor, very poor, and severe based on the transformer output.
[0018] In some embodiments, the CNN is a frozen ResNet50 model with the final classification layer removed, producing a 2048-dimensional feature vector for each camera view.
[0019] In some embodiments, the sensor data is standardized using a StandardScaler to ensure zero mean and unit variance.
[0020] In some embodiments, the unified feature vector is projected into a sequence of ten tokens using a linear projection layer with an embedding dimension of 256, wherein the sequence of tokens is augmented with positional encoding using sinusoidal functions to preserve spatial-temporal relationships.
[0021] In some embodiments, the final token of the transformer output is used for classification, wherein the token represents the aggregated attention-weighted summary of all input features.
[0022] These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the scope thereof, and the embodiments herein include all such modifications.
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] The embodiments herein will be better understood from the following detailed description with reference to the drawings, in which:
[0024] FIG. 1 illustrates a system for determining air quality by processing environmental and traffic-related visual data according to some embodiments herein;
[0025] FIG. 2 is a block diagram of the server of FIG. 1 including various modules for determining air quality by processing environmental and traffic-related visual data according to some embodiments herein;
[0026] FIG. 3 illustrates a data sample from the traffic-related air quality image datasets of FIG. 1 according to some embodiments herein;
[0027] FIGS. 4A and 4B illustrate the dataset distribution across AQI categories and seasons according to some embodiments herein;
[0028] FIG. 5 is a table illustrating a distribution of the samples across different seasons and AQI categories according to some embodiments herein;
[0029] FIG. 6 is a block diagram illustrating a directory structure of the traffic-related air quality image datasets of FIG. 1 according to some embodiments herein;
[0030] FIG. 7 is a graphical illustration of the distribution of various vehicle classes across different Air Quality Index (AQI) categories within the traffic-related air quality image dataset of FIG. 1 according to some embodiments herein;
[0031] FIGS. 8A and 8B are flow diagrams that illustrate a method for determining air quality by processing environmental and traffic-related visual data according to some embodiments herein; and
[0032] FIG. 9 is a schematic diagram of a computer architecture in accordance with the embodiments herein.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0033] The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
[0034] As mentioned, there remains a need to address the aforementioned technical drawbacks by providing a dataset designed to enhance the accuracy, reliability, and granularity of air quality determination in real-time, thereby contributing to more effective environmental management and public health protection. Referring now to the drawings, and more particularly to FIGS. 1 through 9, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments.
[0035] FIG. 1 illustrates a system 100 for determining air quality by processing environmental and traffic-related visual data according to some embodiments herein. The system 100 includes a data collection vehicle 102 equipped with two dashboard cameras including a first camera 104 for front image capture, a second camera 106 for rear image capture, and an air quality monitoring device 108. The air quality monitoring device 108 includes a microcontroller unit (MCU) 110, a particulate matter (PM) sensor 112, a temperature and relative humidity (RH) sensor 114, and a reference device 116. The system 100 includes a server 118 and a network 120. The server 118 is communicatively connected to the first camera 104, the second camera 106, and the air quality monitoring device 108 of the data collection vehicle 102.
[0036] The first camera 104 may be a DDPAI Mola N3, which features a 5 MP CMOS sensor capable of recording in 2688×1944 ultra HD resolution. The second camera 106 may be a DDPAI X2 Pro, which offers a 120° lens at the rear, ensuring a broad field of view while recording at 1920×1080 resolution. The first camera 104 and the second camera 106 are configured to record one-minute videos at a resolution of 1920×1080 and a frame rate of 30 frames per second (fps). These features facilitate high-quality image acquisition, enhancing data accuracy in urban settings. The air quality monitoring device 108 is a custom device that measures particulate matter (PM) and weather parameters once every 5 seconds. The microcontroller unit 110 may be an Espressif ESP32 microcontroller unit. The particulate matter (PM) sensor 112 may be a Nova SDS011 particulate matter (PM) sensor to measure PM2.5 and PM10 concentrations. The temperature and relative humidity sensor 114 may be a BME280 sensor configured to measure temperature and relative humidity. The reference device 116 may be an Aeroqual S500, which captures a data point once per minute and whose primary function is to calibrate the particulate matter (PM) sensor 112. The particulate matter (PM) sensor 112 is configured to measure particle size from 0.3 to 10 µm with a measuring range of 0.0 to 999.9 µg/m³. The air quality monitoring device 108 operates at a temperature range from –40°C to +125°C and a humidity range from 0% to 100%. The air quality monitoring device 108 is mounted on the data collection vehicle 102 to collect air quality data, capturing the particulate matter (PM) values, the temperature, and the relative humidity, which are essential for assessing overall air quality.
[0037] The server 118 is configured to pre-process the video data obtained from the first camera 104 and the second camera 106. The video data is pre-processed by sampling images from the videos obtained from the first camera 104 and the second camera 106 every 5 seconds to eliminate any repetitive frames, ensure consistency between the images and corresponding sensor data, and align with the sampling rate of the sensors. The image data is pre-processed by filtering out outlier images such as those affected by high headlight glare and extremely low illumination. The images are resized to 640 × 360 dimensions, making them compatible with deep learning architectures. The system 100 can detect and count vehicle number plates, which provides valuable insights into which vehicles are contributing to pollution levels. To adhere to privacy principles and maintain dataset anonymity, the system 100 employs algorithms to blur number plates and faces within the captured images. Number plates of electric vehicles are uniquely blurred in green, preserving their distinguishable color, while other number plates are blurred in white. This ensures that the dataset remains anonymous and complies with privacy regulations. By combining air quality data from sensors, traffic condition data from monitoring systems, and the privacy-preserved images, the system 100 generates comprehensive datasets. These datasets are stored in a structured database, allowing for detailed analysis and research on the correlation between traffic patterns and air quality.
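By way of non-limiting illustration, the 5-second sampling described above may be sketched as follows. The sketch assumes a one-minute clip at 30 fps, as stated for the cameras, and pairs each sampled frame with the sensor reading closest in time. The function names, the reading format, and the nearest-timestamp pairing rule are hypothetical and are not taken from the specification.

```python
def sample_frame_indices(duration_s, fps, interval_s):
    """Return the indices of frames taken once every `interval_s` seconds."""
    step = round(fps * interval_s)   # frames between two samples
    total = int(duration_s * fps)    # frames in the clip
    return list(range(0, total, step))


def nearest_reading(frame_time_s, readings):
    """Pick the (timestamp_s, payload) reading closest in time to the frame."""
    return min(readings, key=lambda r: abs(r[0] - frame_time_s))


# One-minute clip at 30 fps, sampled every 5 seconds.
FPS = 30
indices = sample_frame_indices(duration_s=60, fps=FPS, interval_s=5)

# Hypothetical sensor log: (seconds since clip start, PM2.5 reading).
readings = [(4.8, 41.0), (9.9, 43.5), (15.1, 40.2)]
frame_time = indices[2] / FPS        # third sampled frame, at 10.0 s
nearest = nearest_reading(frame_time, readings)

print(len(indices), indices[:3], nearest)
```

With these inputs the clip yields twelve sampled frames, and the frame at 10.0 s is paired with the reading logged at 9.9 s.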
[0038] The server 118 is configured to pre-process the sensor data obtained from the air quality monitoring device 108. The sensor data is pre-processed by calibrating the particulate matter (PM) sensor 112 with the reference device 116 using a linear regression model and removing outliers using the interquartile range (IQR) method.
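By way of non-limiting illustration, the calibration step may be sketched as an IQR filter followed by an ordinary least-squares fit of reference readings against raw sensor readings. The order of the two steps, the conventional 1.5 × IQR fence, the helper names, and the sample values are assumptions made for this sketch. The specification does not state them.

```python
import statistics


def iqr_filter(pairs, k=1.5):
    """Drop (raw, reference) pairs whose raw value lies outside the IQR fences."""
    raw = [r for r, _ in pairs]
    q1, _, q3 = statistics.quantiles(raw, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [(r, ref) for r, ref in pairs if low <= r <= high]


def fit_line(pairs):
    """Least-squares fit of reference = slope * raw + intercept."""
    xs = [r for r, _ in pairs]
    ys = [ref for _, ref in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Invented (raw PM, reference PM) pairs in µg/m³; the last pair is a spike.
pairs = [(10.0, 17.0), (12.0, 20.0), (14.0, 23.0), (16.0, 26.0),
         (18.0, 29.0), (20.0, 32.0), (22.0, 35.0), (500.0, 40.0)]

clean = iqr_filter(pairs)
slope, intercept = fit_line(clean)
calibrated = slope * 15.0 + intercept  # apply the fit to a new raw reading

print(len(clean), round(slope, 3), round(intercept, 3), round(calibrated, 3))
```

Because the reference device is described as logging once per minute while the PM sensor logs every 5 seconds, a practical pipeline would first need to bring the two series onto a common time base. That step is omitted here.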
[0039] The server 118 is configured to generate a traffic-related air quality image dataset using a traffic-related air quality image dataset generation module 122. The traffic-related air quality image dataset includes a comprehensive compilation of 27,041 data samples with diversity across various environmental contexts. The dataset includes about 9,921 images obtained during the daytime (6 AM - 6 PM) and about 17,120 images obtained at nighttime (6 PM - 6 AM), which enables the possibility of predicting the air quality index (AQI) at different times of the day. The 27,041 data samples were accumulated from 60 hours of video capture, focusing on vehicular traffic observations from multiple viewpoints. The traffic-related air quality image dataset is distinctive as it features a multi-view setup, incorporating both front and rear images from traffic scenes. The inclusion of co-located sensor readings ensures accurate weather and air quality measurements for each image. Incorporating night images is essential for training AQI estimation models under various lighting and environmental conditions. Seasonal diversity enhances the dataset's representation of air quality by capturing different weather patterns, pollution levels, and atmospheric conditions influencing the visual appearance of the environment. The traffic-related air quality image dataset includes sequential data samples for each day, facilitating the analysis of spatial patterns in AQI. Its vast number of data samples captures the variability in urban environments, including different traffic patterns, building structures, and geographical features, ensuring the model's robustness and generalization capabilities. Images from the traffic-related air quality image dataset depict front and rear views of a single data sample across different AQI categories and day-night settings, showcasing the dataset's diversity.
Each image is annotated with temperature and season information, providing context for the captured environmental conditions.
[0040] Multiple heterogeneous data types are combined through a sophisticated pipeline. The server 118 extracts visual features using a frozen ResNet50 convolutional neural network, which processes dual-perspective images captured from the first camera 104 and the second camera 106. The CNN feature extraction is implemented using timm.create_model('resnet50', pretrained=True) with the final classification layer removed via reset_classifier(0), producing 2048-dimensional feature vectors from each camera perspective. The fusion process concatenates four distinct feature types into a unified representation. First, front CNN features are flattened from the ResNet50 output, providing 2048 features that capture forward-facing traffic and pollution patterns. Second, rear CNN features are similarly extracted and flattened, contributing another 2048 features that provide complementary viewpoint information for comprehensive scene understanding. Third, environmental sensor data including temperature and humidity readings are standardized using StandardScaler to ensure zero mean and unit variance. Fourth, categorical context features including day/night conditions and seasonal information are numerically encoded using pandas Categorical codes. The complete feature concatenation is implemented as: all_features = np.concatenate([front_features.flatten(), rear_features.flatten(), numerical, categorical]).astype(np.float32), creating a unified feature vector of approximately 4100 dimensions that combines visual pollution indicators with calibrated environmental measurements. The CNN feature extraction implements computational efficiency improvements through a frozen architecture approach.
The ResNet50 model is loaded with pre-trained ImageNet weights using self.cnn_extractor = timm.create_model('resnet50', pretrained=True), followed by setting the model to evaluation mode with self.cnn_extractor.eval() and freezing all parameters using for param in self.cnn_extractor.parameters(): param.requires_grad = False. This frozen weight strategy eliminates backpropagation through CNN layers, reduces GPU memory requirements significantly, and accelerates training speed while leveraging high-quality pre-trained visual features. Training optimization incorporates early stopping with checkpointing mechanisms to optimize model training efficiency. The system 100 implements patience-based early stopping with best_val_acc tracking and patience_counter management. When validation accuracy improves, the system 100 saves model weights using torch.save(model.state_dict(), args.checkpoint) and resets the patience counter. If validation accuracy fails to improve for 5 consecutive epochs, training terminates early to prevent overfitting. Additional optimizations include dynamic device selection using "cuda" if torch.cuda.is_available() else "cpu" for hardware acceleration and configurable batch processing for memory management.
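By way of non-limiting illustration, the patience-based early stopping described above may be sketched as follows. The sketch replays a list of per-epoch validation accuracies instead of running a real training loop, and marks the point where a checkpoint would be written. The function name and the accuracy values are hypothetical. The names best_val_acc and patience_counter and the patience of 5 epochs follow the description.

```python
def run_early_stopping(val_accuracies, patience=5):
    """Replay validation accuracies and report when training would stop.

    Returns (best_epoch, best_val_acc, last_epoch_run).
    """
    best_val_acc = float("-inf")
    best_epoch = None
    patience_counter = 0
    last_epoch = -1
    for epoch, acc in enumerate(val_accuracies):
        last_epoch = epoch
        if acc > best_val_acc:
            # Improvement: this is where the model weights would be saved.
            best_val_acc = acc
            best_epoch = epoch
            patience_counter = 0
        else:
            patience_counter += 1
            if patience_counter >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_val_acc, last_epoch


# Invented accuracy curve: the best value is reached at epoch 2.
history = [0.60, 0.65, 0.70, 0.69, 0.70, 0.68, 0.69, 0.67, 0.66, 0.80]
result = run_early_stopping(history, patience=5)
print(result)
```

In this replay a tie with the best accuracy does not reset the counter, so training stops at epoch 7 and the later value of 0.80 is never evaluated. Whether ties should count as improvement is a design choice the description leaves open.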
[0041] The custom AQI Transformer architecture of the present invention addresses the technical problem of capturing complex relationships between visual pollution indicators and measured air quality through specialized processing mechanisms. The transformer converts the concatenated feature vector into a sequence representation using an input projection mechanism: self.input_projection = nn.Linear(input_dim, self.seq_length * d_model), where seq_length is set to 10 tokens and d_model is 256 dimensions. Positional encoding maintains spatial-temporal relationships in the data through sinusoidal encoding patterns. The Positional Encoding module implements position-dependent sinusoidal functions: pe[:, 0, 0::2] = torch.sin(position * div_term) and pe[:, 0, 1::2] = torch.cos(position * div_term), where div_term = torch.exp(torch.arange(0, d_model, 2) * (-np.log(10000.0) / d_model)). This encoding preserves the relative importance and relationships between different feature components including visual front/rear perspectives, sensor readings, and contextual information within the sequence representation. Self-attention mechanisms identify relevant visual pollution indicators through multi-head attention processing. The transformer encoder utilizes 8 attention heads across 6 transformer layers, implemented as nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=4*d_model, dropout=dropout). The self-attention mechanism identifies correlations between visual features from both camera perspectives, weights sensor readings against visual pollution indicators, captures complex interactions between environmental conditions and visual patterns, and focuses attention on the most relevant pollution indicators for AQI prediction. The final classification uses only the last token output[:, -1, :], which contains the aggregated attention-weighted information from all sequence positions.
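By way of non-limiting illustration, the sinusoidal positional encoding and the ten-token layout described above may be sketched in plain Python. The sketch follows the stated div_term formula with seq_length = 10 and d_model = 256, but uses a zero-filled stand-in for the output of the linear projection and omits the encoder layers. The helper names are hypothetical.

```python
import math


def sinusoidal_positional_encoding(seq_length, d_model):
    """Return a seq_length x d_model table; d_model is assumed to be even."""
    pe = [[0.0] * d_model for _ in range(seq_length)]
    for pos in range(seq_length):
        for i in range(0, d_model, 2):
            div_term = math.exp(i * (-math.log(10000.0) / d_model))
            pe[pos][i] = math.sin(pos * div_term)      # even dimensions
            pe[pos][i + 1] = math.cos(pos * div_term)  # odd dimensions
    return pe


def to_tokens(flat, seq_length, d_model):
    """Split a flat projection output into seq_length tokens of size d_model."""
    if len(flat) != seq_length * d_model:
        raise ValueError("projection output has the wrong length")
    return [flat[t * d_model:(t + 1) * d_model] for t in range(seq_length)]


SEQ_LENGTH, D_MODEL = 10, 256
pe = sinusoidal_positional_encoding(SEQ_LENGTH, D_MODEL)

# Stand-in for the linear projection output (all zeros, for illustration only).
projected = [0.0] * (SEQ_LENGTH * D_MODEL)
tokens = to_tokens(projected, SEQ_LENGTH, D_MODEL)

# Add the positional encoding to each token element-wise.
encoded = [[x + p for x, p in zip(tok, row)] for tok, row in zip(tokens, pe)]

# In the described model the encoder layers would run next, and the
# classifier would then read the token at the last position.
last_position = encoded[-1]
print(len(encoded), len(last_position))
```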
[0042] The system 100 addresses varying illumination conditions through specialized preprocessing applied uniformly to both day and night images. Image transformations are implemented using transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor(), transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])]). This preprocessing pipeline applies standardized normalization using ImageNet statistics regardless of lighting conditions, ensuring consistent feature extraction from the frozen ResNet50 model across all illumination scenarios. Environmental factors are categorically encoded through systematic conversion of qualitative variables to numerical representations. Season information is encoded using self.df["Season"] = pd.Categorical(self.df["Season"]).codes, which converts seasonal categories such as Spring, Summer, Fall, and Winter into numerical codes. Day/night conditions are similarly encoded using self.df["Day_or_Night"] = pd.Categorical(self.df["Day_or_Night"]).codes for binary illumination classification. Temperature and humidity sensor readings undergo standardization using StandardScaler, which fits the scaler to numerical_features = self.df[["Temperature", "Humidity"]].astype(float).values and transforms them to zero mean and unit variance. Dual-perspective image capture compensates for viewpoint limitations through complementary camera positioning. The first camera 104 captures traffic density and forward pollution sources, while the second camera 106 provides an additional viewing angle for comprehensive scene analysis. Both perspectives undergo identical preprocessing and feature extraction, with the resulting features concatenated for fusion processing. The transformer's attention mechanism learns optimal weighting and combination of both viewpoints, enabling robust pollution assessment across varying environmental conditions, lighting scenarios, and traffic patterns.
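By way of non-limiting illustration, the standardization, categorical encoding, and concatenation steps may be sketched with the Python standard library alone. The helpers below are stand-ins for StandardScaler and pd.Categorical(...).codes. They reproduce the population-standard-deviation scaling and the sorted-category code assignment of those tools, but are not the library calls themselves. The sample table, the placeholder CNN features, and the helper names are hypothetical.

```python
import statistics

FEATURE_DIM = 2048  # per-camera ResNet50 feature length stated in the description


def standardize(column):
    """Zero-mean, unit-variance scaling using the population standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column) or 1.0  # constant columns are left unscaled
    return [(v - mean) / std for v in column]


def categorical_codes(values):
    """Integer codes assigned by sorted category order."""
    lookup = {c: i for i, c in enumerate(sorted(set(values)))}
    return [lookup[v] for v in values]


def unified_vector(front, rear, numerical, categorical):
    """Concatenate front, rear, sensor, and context features into one float list."""
    if len(front) != FEATURE_DIM or len(rear) != FEATURE_DIM:
        raise ValueError("each camera view must supply 2048 features")
    return [float(v) for v in (*front, *rear, *numerical, *categorical)]


# Hypothetical three-sample table (values invented for illustration).
temperature = [20.0, 25.0, 30.0]
humidity = [40.0, 55.0, 70.0]
season = ["Winter", "Summer", "Monsoon"]
day_or_night = ["Day", "Night", "Night"]

temp_z, hum_z = standardize(temperature), standardize(humidity)
season_codes = categorical_codes(season)
dn_codes = categorical_codes(day_or_night)

# Placeholder CNN features for the first sample; a real run would use ResNet50 outputs.
front0, rear0 = [0.0] * FEATURE_DIM, [0.0] * FEATURE_DIM
sample0 = unified_vector(front0, rear0,
                         [temp_z[0], hum_z[0]],
                         [season_codes[0], dn_codes[0]])
print(season_codes, dn_codes, len(sample0))
```

With two sensor values and two context codes, the concatenated vector has 2048 + 2048 + 2 + 2 = 4100 entries.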
[0043] FIG. 2 is a block diagram of the server 118 of FIG. 1 including various modules for determining air quality by processing environmental and traffic-related visual data according to some embodiments herein. The server 118 includes a database 202, a data receiving module 204, an image sampling module 206, a visual features extraction module 208, a unified feature vector generation module 210, a unified feature vector processing module 212, and an air quality classification module 214.
[0044] The data receiving module 204 is configured to receive the front-facing video data from the first camera 104, the rear-facing video data from the second camera 106, and the sensor data comprising particulate matter, temperature, and relative humidity measurements from the air quality monitoring device 108. The image sampling module 206 is configured to sample images from the front-facing and rear-facing video data at defined intervals, and associate each sampled image with co-located sensor data. The visual features extraction module 208 is configured to extract visual features from the sampled images using a frozen convolutional neural network (CNN) to obtain front and rear feature vectors. The unified feature vector generation module 210 is configured to combine the front and rear visual features with standardized sensor data and encoded contextual information into a unified feature vector. The contextual information includes day/night status and seasonal conditions, encoded numerically using categorical encoding techniques. The unified feature vector processing module 212 is configured to process the unified feature vector using a transformer-based neural network architecture configured to model spatial-temporal and cross-modal relationships among the visual features, the sensor data, and the contextual information relevant to air quality assessment. The air quality classification module 214 is configured to classify air quality into one of six categories comprising good, satisfactory, moderate, poor, very poor, and severe based on the transformer output.
[0045] FIG. 3 illustrates a data sample from the traffic-related air quality image datasets of FIG. 1 according to some embodiments herein. The data sample includes the front image 302, the rear image 304, the timestamp 306, the temperature (°C) 308, the relative humidity (%) 310, and the season 312, which are considered features 314. The data sample also includes the PM2.5 concentration value 316, the PM10 concentration value 318, the AQI value 320, and the AQI category 322 according to the CPCB standards, which are considered labels 324. The front image 302 and the rear image 304, each with 640 × 360 × 3 dimensions, help in understanding the vehicular dynamics of the traffic scenario. The temperature 308 and the relative humidity 310 are scalar, and the season 312 is categorical (i.e., summer, winter, or monsoon). The timestamp 306 contains the date and time at which the sample was captured. Another feature of the dataset is the sequential organization of the samples. Since the data was collected on 21 different dates, the data samples collected each day form a sequence, representing the progression of air quality as the data collection vehicle 102 traverses the streets.
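As a rough illustration of the sample layout described above, the following Python sketch models one data sample as a record and maps an AQI value to its category. The class name `TraqidSample`, its field names, and the helper `aqi_category` are hypothetical and are not part of the disclosed system. The breakpoints assume the CPCB National Air Quality Index bands (0 to 50 good, 51 to 100 satisfactory, 101 to 200 moderate, 201 to 300 poor, 301 to 400 very poor, above 400 severe).

```python
from dataclasses import dataclass
from datetime import datetime

# Inclusive upper AQI bound of each CPCB band, in ascending order.
# Values above the last bound fall into the "severe" category.
CPCB_BANDS = [
    (50, "good"),
    (100, "satisfactory"),
    (200, "moderate"),
    (300, "poor"),
    (400, "very poor"),
]


def aqi_category(aqi: float) -> str:
    """Map an AQI value to one of the six CPCB categories."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper, name in CPCB_BANDS:
        if aqi <= upper:
            return name
    return "severe"


@dataclass
class TraqidSample:
    """One data sample; field names are hypothetical."""
    # Features (314)
    front_image: str          # path to the 640 x 360 x 3 front image (302)
    rear_image: str           # path to the 640 x 360 x 3 rear image (304)
    timestamp: datetime       # date and time of capture (306)
    temperature_c: float      # temperature in degrees Celsius (308)
    relative_humidity: float  # relative humidity in percent (310)
    season: str               # "summer", "winter" or "monsoon" (312)
    # Labels (324)
    pm25: float               # PM2.5 concentration (316)
    pm10: float               # PM10 concentration (318)
    aqi: float                # AQI value (320)

    @property
    def category(self) -> str:
        """AQI category (322) derived from the AQI value."""
        return aqi_category(self.aqi)
```

Deriving the category from the AQI value, rather than storing it separately, keeps the two labels consistent in this sketch; a dataset loader could equally read the category directly from the CSV file.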
[0046] FIGS. 4A and 4B illustrate the dataset distribution across AQI categories and seasons according to some embodiments herein. According to FIG. 4A, AQI is generally adverse in traffic scenarios, with most data samples ranging from “satisfactory” to “moderate”, reflecting the typical air quality. The “poor”, “very poor”, and “severe” categories are significant, pointing to local events like construction activities, industrial operations, and open burning as contributing factors. FIG. 4B shows the distribution of the dataset with respect to seasons, including monsoon, summer, and winter.
[0047] FIG. 5 is a table illustrating a distribution of the samples across different seasons and AQI categories according to some embodiments herein. As shown in the table, monsoon months demonstrate improved air quality, ranging from good to moderate AQI levels, with a mean AQI of 108, attributed to frequent rains settling particulate matter. The winter months, in contrast, experience moderate to severe air quality, with a mean AQI of 152, as temperature inversion traps pollutants near the ground. Summer months witness the highest pollution levels, with a mean AQI of 190, likely due to warmer temperatures aiding pollutant dispersion. Recognizing these seasonal variations is crucial for devising targeted pollution mitigation strategies.
[0048] FIG. 6 is a block diagram 600 illustrating a directory structure of the traffic-related air quality image datasets of FIG. 1 according to some embodiments herein. The traffic-related air quality image dataset (TRAQID) 602 includes a single CSV file containing information about all data samples. Additionally, there are 21 subdirectories, each representing data from a different day. Each day’s data is represented as a sequence, with each sequence directory containing “Front” and “Rear” subdirectories holding the respective images named “.jpg” for that sequence. The CSV file includes a column labeled "Sequence," providing the sequence number for each data sample.
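The directory layout described above can be traversed along the following lines. This is a minimal sketch under stated assumptions: only the "Sequence" column and the "Front" and "Rear" subdirectory names come from the description, while the function names, the other CSV column, and the way a sequence directory is named are hypothetical. Image file names are deliberately not constructed here, since the naming pattern is not spelled out.

```python
import csv
import io
from collections import defaultdict
from pathlib import Path


def group_by_sequence(csv_text: str) -> dict:
    """Group CSV rows by the value of their "Sequence" column."""
    sequences = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        sequences[row["Sequence"]].append(row)
    return dict(sequences)


def image_dirs(root: Path, sequence_dir: str) -> tuple:
    """Return the Front and Rear image directories of one sequence.

    `sequence_dir` is the name of the per-day subdirectory; how it is
    derived from the sequence number is an assumption left to the caller.
    """
    base = root / sequence_dir
    return base / "Front", base / "Rear"


if __name__ == "__main__":
    # Toy CSV content with made-up values, for illustration only.
    demo = "Sequence,AQI\n1,90\n1,120\n2,210\n"
    for seq, rows in group_by_sequence(demo).items():
        front, rear = image_dirs(Path("TRAQID"), seq)
        print(seq, len(rows), front, rear)
```

Grouping by sequence first preserves the per-day ordering of samples, which matters if the sequential progression of air quality along a route is to be analysed.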
[0049] FIG. 7 is a graphical illustration 700 of the distribution of various vehicle classes across different Air Quality Index (AQI) categories within the traffic-related air quality image dataset of FIG. 1 according to some embodiments herein. The classes of vehicles examined include bikes, autorickshaws, cars, trucks, and buses, which are identified as the most prevalent in Indian traffic conditions. The overall air quality does not always directly correlate with traffic scenarios. This is evident in instances where images showed no vehicle presence, yet the data indicated poor to severe AQI categories. This discrepancy can be attributed to local events such as construction activities, industrial operations, and open burning, which exacerbate traffic congestion and significantly deteriorate environmental conditions.
[0050] FIGS. 8A and 8B are flow diagrams that illustrate a method for determining air quality by processing environmental and traffic-related visual data according to some embodiments herein.
[0051] At step 802, the method includes receiving, at a server, front-facing video data from a first camera, rear-facing video data from a second camera, and sensor data from an air quality monitoring device. The sensor data comprises particulate matter, temperature, and relative humidity measurements.
[0052] At step 804, the method includes sampling images from the front-facing and rear-facing video data at defined intervals, and associating each sampled image with co-located sensor data.
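One possible way to realise the sampling and association of step 804 is sketched below in Python. The description does not state how a sampled image is matched to a sensor reading, so pairing each frame with the reading nearest in time is an assumption, as are the function names. Times are expressed in seconds from the start of the recording.

```python
from bisect import bisect_left


def sample_times(duration_s: float, interval_s: float) -> list:
    """Timestamps (seconds from video start) at which frames are sampled."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    count = int(duration_s // interval_s) + 1
    return [i * interval_s for i in range(count)]


def nearest_reading(sensor_times: list, t: float) -> int:
    """Index of the sensor reading closest in time to t.

    `sensor_times` must be sorted in ascending order. On a tie the
    earlier reading is chosen.
    """
    if not sensor_times:
        raise ValueError("sensor_times must not be empty")
    i = bisect_left(sensor_times, t)
    if i == 0:
        return 0
    if i == len(sensor_times):
        return len(sensor_times) - 1
    before, after = sensor_times[i - 1], sensor_times[i]
    return i if after - t < t - before else i - 1


def associate(frame_times: list, sensor_times: list) -> list:
    """Pair every sampled frame time with the index of its sensor reading."""
    return [(t, nearest_reading(sensor_times, t)) for t in frame_times]
```

Binary search keeps the lookup logarithmic in the number of readings, which is convenient when a long recording is paired with a dense sensor log.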
[0053] At step 806, the method includes extracting visual features from the sampled images using a frozen convolutional neural network (CNN) to obtain front and rear feature vectors.
[0054] At step 808, the method includes combining the front and rear visual features with standardized sensor data and encoded contextual information into a unified feature vector, wherein the contextual information includes day, night status and seasonal conditions, encoded numerically using categorical encoding techniques.
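The combination performed at step 808 can be sketched as follows. This is an illustrative Python example and not the disclosed implementation: standardization is written out by hand to zero mean and unit variance, one-hot encoding is assumed as the categorical encoding technique, and the function names, the ordering of the concatenated parts, and the number of sensor channels are hypothetical.

```python
from statistics import fmean, pstdev

SEASONS = ("summer", "winter", "monsoon")


def standardize(columns: list) -> list:
    """Scale each sensor column to zero mean and unit variance.

    The population standard deviation is used; a constant column is
    left centred at zero instead of being divided by zero.
    """
    scaled = []
    for col in columns:
        mu, sigma = fmean(col), pstdev(col)
        sigma = sigma or 1.0
        scaled.append([(x - mu) / sigma for x in col])
    return scaled


def encode_context(is_day: bool, season: str) -> list:
    """One day/night flag followed by a one-hot season encoding (assumed)."""
    if season not in SEASONS:
        raise ValueError(f"unknown season: {season}")
    return [1.0 if is_day else 0.0] + [1.0 if season == s else 0.0 for s in SEASONS]


def unified_vector(front: list, rear: list, sensors: list, context: list) -> list:
    """Concatenate front features, rear features, sensors and context."""
    return list(front) + list(rear) + list(sensors) + list(context)
```

With two 2048-dimensional image feature vectors, three standardized sensor values, and the four context values produced here, the unified vector in this sketch would have 4103 entries; the actual dimensionality depends on the encodings chosen.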
[0055] At step 810, the method includes processing the unified feature vector using a transformer-based neural network architecture configured to model spatial-temporal and cross-modal relationships among the visual features, the sensor data, and the contextual information relevant to air quality assessment.
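Before the transformer processes its input, the unified feature vector is turned into a token sequence with positional encoding. The Python sketch below illustrates that preparation for ten tokens with an embedding dimension of 256, as recited in the claims. It assumes the widely used sinusoidal formulation, in which even indices carry sine terms and odd indices carry cosine terms with a base of 10000, and it assumes that the linear projection emits one flat vector of 10 × 256 values that is then split into tokens. The function names are hypothetical, and the learned projection and the encoder layers themselves are omitted.

```python
import math


def sinusoidal_encoding(num_tokens: int = 10, dim: int = 256) -> list:
    """Return a num_tokens x dim table of sinusoidal positional encodings."""
    if dim % 2:
        raise ValueError("dim must be even")
    table = []
    for pos in range(num_tokens):
        row = []
        for i in range(0, dim, 2):
            # The same angle feeds the sine (index i) and cosine (index i + 1).
            angle = pos / (10000 ** (i / dim))
            row.extend((math.sin(angle), math.cos(angle)))
        table.append(row)
    return table


def to_tokens(projected: list, num_tokens: int = 10, dim: int = 256) -> list:
    """Split a flat projection output of length num_tokens * dim into tokens."""
    if len(projected) != num_tokens * dim:
        raise ValueError("projection output has the wrong length")
    return [projected[k * dim:(k + 1) * dim] for k in range(num_tokens)]


def add_positions(tokens: list, table: list) -> list:
    """Add the positional encoding to each token element-wise."""
    return [[x + p for x, p in zip(tok, pe)] for tok, pe in zip(tokens, table)]
```

Because the encoding depends only on the token position and the embedding dimension, the table can be computed once and reused for every sample.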
[0056] At step 812, the method includes classifying air quality into one of six categories comprising good, satisfactory, moderate, poor, very poor, and severe based on the transformer output.
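The final classification of step 812 can be illustrated with a short Python sketch. It assumes that a classification head has already reduced the last output token of the transformer to six scores (logits), one per category, and that the category with the highest softmax probability is reported. The head itself, the function names, and the ordering of the categories are assumptions made for illustration.

```python
import math

AQI_CATEGORIES = ("good", "satisfactory", "moderate", "poor", "very poor", "severe")


def softmax(logits: list) -> list:
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: list) -> tuple:
    """Return the most probable AQI category and its probability."""
    if len(logits) != len(AQI_CATEGORIES):
        raise ValueError("expected one score per AQI category")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return AQI_CATEGORIES[best], probs[best]
```

Returning the probability alongside the label lets a downstream consumer, such as a route planner, treat low-confidence predictions differently from confident ones.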
[0057] The present invention offers numerous advantages that enhance air quality determination in real-time. By utilizing existing infrastructure such as CCTV cameras and dash cameras, the system 100 leverages readily available data sources to extract valuable air quality information, transforming surveillance footage into a practical tool for air quality assessment. The traffic-related air quality image dataset is uniquely comprehensive, capturing a large number of data points across various times of day, seasons, and geographic locations. This extensive dataset includes both front and rear images, co-located with sensor readings for parameters like temperature, humidity, and particulate matter concentrations, providing a rich, multi-faceted view of air quality dynamics. The temporal and spatial variability within the traffic-related air quality image dataset allows for the analysis of long-term trends and patterns, making it a robust resource for environmental monitoring. Additionally, the dataset's open access availability promotes collaboration and transparency, driving advancements in environmental research and public health. Overall, this innovative approach not only enhances environmental awareness but also has the potential to improve public health outcomes by enabling timely, localized air quality assessments and informing safer route planning based on real-time air quality data.
[0058] A representative hardware environment for practising the embodiments herein is depicted in FIG. 9 with reference to FIGS. 1 through 8. This schematic drawing illustrates a hardware configuration of a server 118/computer system in accordance with the embodiments herein. The server 118/computer system includes at least one processing device 10 and a cryptographic processor 11. The special-purpose CPU 10 and the cryptographic processor (CP) 11 may be interconnected via a system bus 14 to various devices such as a random access memory (RAM) 15, a read-only memory (ROM) 16, and an input/output (I/O) adapter 17. The I/O adapter 17 can connect to peripheral devices, such as disk units 12 and tape drives 13, or other program storage devices that are readable by the system 100. The server 118/computer system can read the inventive instructions on the program storage devices and follow these instructions to execute the methodology of the embodiments herein. The server 118/computer system further includes a user interface adapter 20 that connects a keyboard 18, mouse 19, speaker 25, microphone 23, and/or other user interface devices such as a touch screen device (not shown) to the bus 14 to gather user input. Additionally, a communication adapter 21 connects the bus 14 to a data processing network 26, and a display adapter 22 connects the bus 14 to a display device 24, which provides a graphical user interface (GUI) 30 of the output data in accordance with the embodiments herein, or which may be embodied as an output device such as a monitor, printer, or transmitter, for example. Further, a transceiver 27, a signal comparator 28, and a signal converter 29 may be connected with the bus 14 for processing, transmission, receipt, comparison, and conversion of electric or electronic signals.
[0059] The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope.

CLAIMS: I/We claim:
1. A system (100) for determining air quality by processing environmental and traffic-related visual data, wherein the system (100) comprises,
a first camera (104) mounted on front side of a data collection vehicle (102), wherein the first camera (104) is configured to capture front-facing video data;
a second camera (106) mounted on rear side of the data collection vehicle (102), wherein the second camera (106) is configured to capture rear-facing video data;
an air quality monitoring device (108) comprising a particulate matter sensor (112) configured to measure particulate matter (PM2.5 and PM10) concentrations, a temperature and relative humidity sensor (114) configured to measure temperature and relative humidity, a reference device (116) configured to calibrate the particulate matter sensor (112) and a microcontroller unit (110) to control functionalities of the particulate matter sensor (112), the temperature and relative humidity sensor (114) and the reference device (116), wherein the air quality monitoring device (108) operates at a temperature range from –40°C to +125°C and a humidity range from 0% to 100%;
a server (118) communicatively connected to the first camera (104), the second camera (106), and the air quality monitoring device (108), wherein the server (118) comprises a memory storing instructions that, when executed by the server (118) cause the server (118) to,
receive the front-facing video data from the first camera (104), the rear-facing video data from the second camera (106), and the sensor data from the air quality monitoring device (108), wherein the sensor data comprises particulate matter, temperature, and relative humidity measurements;
sample images from the front-facing and rear-facing video data at defined intervals, and associate each sampled image with co-located sensor data;
extract visual features from the sampled images using a frozen convolutional neural network (CNN) to obtain front and rear feature vectors;
combine the front and rear visual features with standardized sensor data and encoded contextual information into a unified feature vector, wherein the contextual information includes day, night status and seasonal conditions, encoded numerically using categorical encoding techniques;
process the unified feature vector using a transformer-based neural network architecture configured to model spatial-temporal and cross-modal relationships among the visual features, the sensor data, and the contextual information relevant to air quality assessment; and
classify air quality into one of six categories comprising good, satisfactory, moderate, poor, very poor, and severe based on the transformer output.

2. The system (100) as claimed in claim 1, wherein the CNN is a frozen ResNet50 model with final classification layer removed, producing a 2048-dimensional feature vector for each camera view.

3. The system (100) as claimed in claim 1, wherein the sensor data is standardized using a StandardScaler to ensure zero mean and unit variance.

4. The system (100) as claimed in claim 1, wherein the unified feature vector is projected into a sequence of ten tokens using a linear projection layer with an embedding dimension of 256.

5. The system (100) as claimed in claim 4, wherein the sequence of tokens is augmented with positional encoding using sinusoidal functions to preserve spatial-temporal relationships.

6. The system (100) as claimed in claim 1, wherein the transformer comprises six encoder layers with eight self-attention heads per layer and a feedforward dimensionality equal to four times the embedding dimension.

7. The system (100) as claimed in claim 1, wherein a final token of the transformer output is used for classification, wherein the final token represents the aggregated attention-weighted summary of all input features.

8. The system (100) as claimed in claim 1, wherein the particulate matter (PM) sensor (112) is calibrated using the reference device (116) via a linear regression model, wherein outliers in the PM sensor data are removed using the interquartile range (IQR) method.

9. The system (100) as claimed in claim 1, wherein the particulate matter sensor (112) and the temperature and relative humidity sensor (114) are configured to measure the particulate matter (PM) and weather parameters at an interval of 5 seconds.

10. The system (100) as claimed in claim 1, wherein the particulate matter sensor (112) is configured to measure particle size from 0.3 to 10 µm with a measuring range of 0.0 to 999.9 µg/m³.

11. A method for determining air quality by acquiring and processing environmental and traffic-related visual data, wherein the method comprises,
receiving, at a server (118), front-facing video data from a first camera (104), rear-facing video data from a second camera (106), and sensor data from an air quality monitoring device (108), the sensor data comprising particulate matter, temperature, and relative humidity measurements;
sampling images from the front-facing and rear-facing video data at defined intervals, and associating each sampled image with co-located sensor data;
extracting visual features from the sampled images using a frozen convolutional neural network (CNN) to obtain front and rear feature vectors;
combining the front and rear visual features with standardized sensor data and encoded contextual information into a unified feature vector, wherein the contextual information includes day, night status and seasonal conditions, encoded numerically using categorical encoding techniques;
processing the unified feature vector using a transformer-based neural network architecture configured to model spatial-temporal and cross-modal relationships among the visual features, the sensor data, and the contextual information relevant to air quality assessment; and
classifying air quality into one of six categories comprising good, satisfactory, moderate, poor, very poor, and severe based on the transformer output.

Dated this 5th June, 2025
Signature of the Patent Agent:
Arjun Karthik Bala
(IN/PA – 1021)

Documents

Application Documents

# Name Date
1 202441043706-STATEMENT OF UNDERTAKING (FORM 3) [05-06-2024(online)].pdf 2024-06-05
2 202441043706-PROVISIONAL SPECIFICATION [05-06-2024(online)].pdf 2024-06-05
3 202441043706-PROOF OF RIGHT [05-06-2024(online)].pdf 2024-06-05
4 202441043706-POWER OF AUTHORITY [05-06-2024(online)].pdf 2024-06-05
5 202441043706-FORM FOR SMALL ENTITY(FORM-28) [05-06-2024(online)].pdf 2024-06-05
6 202441043706-FORM 1 [05-06-2024(online)].pdf 2024-06-05
7 202441043706-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [05-06-2024(online)].pdf 2024-06-05
8 202441043706-EVIDENCE FOR REGISTRATION UNDER SSI [05-06-2024(online)].pdf 2024-06-05
9 202441043706-EDUCATIONAL INSTITUTION(S) [05-06-2024(online)].pdf 2024-06-05
10 202441043706-DRAWINGS [05-06-2024(online)].pdf 2024-06-05
11 202441043706-Request Letter-Correspondence [04-07-2024(online)].pdf 2024-07-04
12 202441043706-Power of Attorney [04-07-2024(online)].pdf 2024-07-04
13 202441043706-FORM28 [04-07-2024(online)].pdf 2024-07-04
14 202441043706-Form 1 (Submitted on date of filing) [04-07-2024(online)].pdf 2024-07-04
15 202441043706-Covering Letter [04-07-2024(online)].pdf 2024-07-04
16 202441043706-DRAWING [05-06-2025(online)].pdf 2025-06-05
17 202441043706-CORRESPONDENCE-OTHERS [05-06-2025(online)].pdf 2025-06-05
18 202441043706-COMPLETE SPECIFICATION [05-06-2025(online)].pdf 2025-06-05
19 202441043706-FORM-9 [21-06-2025(online)].pdf 2025-06-21
20 202441043706-FORM 18 [21-06-2025(online)].pdf 2025-06-21