Abstract: “AN AI-ENABLED ASSISTIVE WEARABLE DEVICE FOR BLIND AND VISUALLY IMPAIRED” The present invention relates to an AI-enabled assistive wearable device for blind and visually impaired. This assistive wearable device, designed for visually impaired individuals, enhances independence by translating visual information into audio signals. Integrated into eyeglasses (1), it captures real-time video and processes it to provide audio feedback on objects, currency, documents, and more. The device includes object detection, currency verification, document reading, live video calls, educational content access, and wayfinding assistance models. The device is user-friendly, easily attached to eyeglasses (1), and powered by a user equipment (5). It offers a significant advancement in technology for visually impaired individuals, empowering them to navigate their surroundings and access information more independently. Figure 1
DESC:FIELD OF INVENTION:
The present invention relates to the field of assistive technologies, specifically to an assistive wearable device designed for visually impaired individuals. More particularly, the invention relates to a device that translates visual information into audio signals using advanced camera vision technology. The device is configured to be mounted on standard eyeglasses, with an adjustable camera angle to ensure the optimal viewing angle. This configuration aims to provide relevant visual information without disrupting the user’s natural walking pattern.
BACKGROUND OF INVENTION:
Over 300 million blind or visually impaired people live in the world, with India having the single largest population of over 50 million, including over 1.4 million children who are blind or have severely low vision. Due to a lack of real-time and detailed information, people who are blind or have low vision find it difficult to adapt to rapidly changing environments, making them reliant on others for safe mobility. Because the majority of these people come from lower socioeconomic backgrounds, they have limited access to assistive technology and must rely solely on blind sticks.
A malfunction of the eyes, the primary sense organs, impairs the ability to perceive and adapt to information in the surrounding environment. As the world's population ages, the issue of visual impairment becomes more pressing. People with visual impairments rely on assistive technology to get around in their daily lives, and being self-sufficient is a top objective for almost everyone in today's technological and competitive age.
In recent years, inventors have increasingly focused on health-care improvements. Blind people face difficulty while navigating and require assistance when moving from one location to another. There are numerous ways to help the blind, and many creative alternatives have been developed to allow the blind to move freely as a result of technical improvements in hardware and software. The main challenge for blind people is figuring out how to get where they want to go; such people otherwise require the assistance of those with clear vision.
Over the past few years, scientists have been developing new devices to provide a reliable method for blind individuals to recognize obstacles and alert them when they are in danger. Several efforts have been launched over the years to help visually impaired people. The most prevalent methods include object detection, GPS, the use of ultrasonic sensors, and audio conversion. There are numerous systems with constraints and limitations. Ultrasonic sensors are the primary sensor systems of a number of existing Electronic Travel Aids (ETAs). The marketplace for mobility aid products for blind people that use an ultrasonic sensor is divided into two categories: canes and head-mounted devices. Therefore, it is essential to develop a more profitable and cost-effective perspective for the blind in order to help them navigate daily obstacles and go forward with greater reliability.
There are some conventional techniques available in the market that combine assistive wearable devices for the blind and visually impaired with mobile applications. The same are discussed below.
PRIOR ART AND ITS DISADVANTAGES:
An Indian patent application IN202341041448A discloses a blind smart glass, a ground-breaking wearable device designed to assist individuals with visual impairments. Incorporating augmented reality, computer vision, and haptic feedback technologies, this innovative solution provides real-time audio and haptic information about the user's surroundings. The transparent display overlays visual information onto the real-world environment, while the object recognition module identifies and classifies objects, obstacles, and landmarks. Users receive audio descriptions and haptic feedback to facilitate navigation and interaction. The device offers customizable settings, wireless connectivity, and a sleek design for comfort and style. The blind smart glass empowers visually impaired individuals, enhancing their independence, safety, and access to information.
However, the blind smart glass disclosed in the said prior art is unable to recognize different Indian languages, cannot read all printed documents or books, and fails to provide assistance in shopping and cash payment verification. Said prior art is principally and constructionally different from the present invention. Also, the cited invention fails to provide a real-time contextual understanding and description of visual scenes, summarization and interactive exploration of documents, user-specific query handling based on visual input, integration of gesture-based control for activation and interaction, document analysis and comprehension algorithms, summary generation using natural language processing techniques, and an interactive query-response system tailored to document content.
An Indian patent application number IN202321031932A discloses machine-learning-based smart goggles for the visually impaired. The invention provides an assistive system 100 for a visually impaired individual, helping them identify and navigate their environment via auditory feedback. The system includes assistive glasses 102 fitted with an image acquisition camera and an intelligent computing chip/processor 104, such as a Raspberry Pi 4 single board computer. The chip 104 processes the images and applies object and text detection algorithms to identify objects and text, which are converted into auditory cues. An audio output device 106 integrated into the glasses transmits these cues to the visually impaired individual.
However, the smart goggles disclosed in the said prior art are unable to recognize different Indian languages, cannot read all printed documents or books, and fail to provide assistance in shopping and cash payment verification. Said prior art is principally and constructionally different from the present invention. Also, the cited invention fails to provide a real-time contextual understanding and description of visual scenes, summarization and interactive exploration of documents, user-specific query handling based on visual input, integration of gesture-based control for activation and interaction, document analysis and comprehension algorithms, summary generation using natural language processing techniques, and an interactive query-response system tailored to document content.
The US patent application number US201916254780A discloses a software application and system that may be configured to enable a user equipment or other device to be used by a visually impaired person to receive voice navigation guidance during a directed exploration of an area. Directed exploration uses combinations of location data, directional data, and orientation data from the configured device to determine the direction that the user wishes to explore, providing narrated results only for streets, businesses, and other points of interest in that direction. The system may also utilize sets of wireless indicators positioned within indoor areas to provide accurate positioning to particular locations and floors within buildings.
However, the system disclosed in the said prior art is unable to recognize different Indian languages, cannot read all printed documents or books, and fails to provide assistance in shopping and cash payment verification. Said prior art is principally and constructionally different from the present invention. Also, the cited invention fails to provide a real-time contextual understanding and description of visual scenes, summarization and interactive exploration of documents, user-specific query handling based on visual input, integration of gesture-based control for activation and interaction, document analysis and comprehension algorithms, summary generation using natural language processing techniques, and an interactive query-response system tailored to document content.
In the prior art, a few attempts have been made to assist the visually impaired and/or blind population in making them independent. Such attempts were made by SHG Technologies in India by developing the Smart Vision Glass, and by OrCam, Envision, DotLumen, and EyeSyth internationally. However, these attempts help blind and visually impaired people only to some extent.
DISADVANTAGE OF PRIOR ART:
Existing assistive wearable devices suffer from all, or at least some, of the below-mentioned disadvantages:
• Many of the prior arts fail to provide an AI-enabled assistive wearable device for blind and visually impaired people that has access to all printed documents or books.
• Many of the prior arts fail to provide an AI-enabled assistive wearable device for blind and visually impaired people that assists blind people in shopping.
• Many of the prior arts fail to provide an AI-enabled assistive wearable device for the blind and visually impaired which helps with cash payments that may be verified using the device.
• Many of the prior arts fail to provide an AI-enabled assistive wearable device for the blind and visually impaired that is easy to use and, therefore, user-friendly.
• Most of the prior arts fail to provide an AI-enabled assistive wearable device for the blind and visually impaired which uses a device that is connected to the user equipment, such as a mobile phone, via a USB cable, with the same cable powering the device.
• Many of the prior arts fail to provide an AI-enabled assistive wearable device for the blind and visually impaired which activates the application for the user without human intervention.
• Many of the prior arts fail to provide an AI-enabled assistive wearable device for blind and visually impaired people that uses the user's mobile phone and is powerful enough to deliver the output or result to blind people.
• Many of the prior arts fail to provide an AI-enabled assistive wearable device for the blind and visually impaired that is compact.
• Many of the prior arts fail to provide an AI-enabled assistive wearable device for the blind and visually impaired that works without requiring the internet for image processing on servers.
• Many of the prior arts fail to provide an AI-enabled assistive wearable device for the blind and visually impaired that has a short operating time.
• Many of the prior arts fail to provide an AI-enabled assistive wearable device for the blind and visually impaired that gives real-time information.
• Many of the prior arts fail to provide an AI-enabled assistive wearable device for the blind and visually impaired that works without requiring battery recharge.
• Many of the prior arts fail to provide an AI-enabled assistive wearable device for the blind and visually impaired that performs video processing to provide information using a single camera.
• Many of the prior arts fail to provide an AI-enabled assistive wearable device for the blind and visually impaired that works without requiring the user to press a button when they need information.
OBJECTS OF THE INVENTION:
The main object of the present invention is to provide an AI-enabled assistive wearable device for the blind and visually impaired.
Another object of the present invention is to provide an AI-enabled assistive wearable device for the blind and visually impaired that has access to all printed documents or books.
Another object of the present invention is to provide an AI-enabled assistive wearable device for the blind and visually impaired that assists blind people in shopping.
Another object of the present invention is to provide an AI-enabled assistive wearable device for the blind and visually impaired which helps with cash payments that shall be verified using the assistive device.
Another object of the present invention is to provide an AI-enabled assistive wearable device for the blind and visually impaired that is easy to use.
Another object of the present invention is to provide an AI-enabled assistive wearable device for the blind and visually impaired which uses a device that is connected to the user equipment, such as a mobile phone, via a USB cable, with the same cable powering the device.
Another object of the present invention is to provide an AI-enabled assistive wearable device for the blind and visually impaired which activates the application for the user without human intervention.
Another object of the present invention is to provide an AI-enabled assistive wearable device for the blind and visually impaired that uses the user's mobile phone and is powerful enough to deliver the output or result to blind people.
Another object of the present invention is to provide an AI-enabled assistive wearable device for the blind and visually impaired that is compact.
Another object of the present invention is to provide an AI-enabled assistive wearable device for the blind and visually impaired that works without requiring the internet for image processing on servers.
Another object of the present invention is to provide an AI-enabled assistive wearable device for the blind and visually impaired that has a short operating time.
Another object of the present invention is to provide an AI-enabled assistive wearable device for the blind and visually impaired that gives real-time information.
Another object of the present invention is to provide an AI-enabled assistive wearable device for the blind and visually impaired that performs video processing to provide information using a single camera and translates the visual world into audio using camera vision.
Another object of the present invention is to provide an AI-enabled assistive wearable device for the blind and visually impaired that works without requiring battery recharge.
It is also an object of the present invention to develop a device that improves the lives of blind and visually impaired individuals, increases their access to information, and enhances their ability to participate in various activities.
BRIEF DESCRIPTION OF FIGURES
Various other objects, features and attendant advantages of the present invention will become fully appreciated as the same becomes better understood when considered in conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts throughout the several views, and wherein:
Figure 1 : Shows an illustrative view of an AI enabled assistive wearable device for blind and visually impaired according to the present invention.
Figure 2 : Shows a side view of an AI enabled assistive wearable device according to the present invention.
Figure 3 : (a) Shows an exploded view of AI enabled assistive wearable device for blind and visually impaired according to the present invention.
(b) Shows an illustrative view of Single Shot MultiBox Detector (SSD) framework of AI enabled assistive wearable device according to the present invention.
(c) Shows an illustrative view of MobileNet V2 backbone framework of AI enabled assistive wearable device according to the present invention.
(d) Shows an illustrative view of Feature Pyramid Network Lite (FPNLite) framework of AI enabled assistive wearable device according to the present invention.
(e) Shows an illustrative view of SSD MobileNet V2 FPNLite 640 construction of AI enabled assistive wearable device according to the present invention.
Figure 4 : Shows a flowchart of the working procedure of the wearable device for blind and visually impaired according to the present invention.
Figure 5 : Shows a user equipment (smart phone) showcasing user interface application (App) of AI enabled assistive wearable device for blind and visually impaired according to the present invention.
Figure 6 : Shows an initial setup for user signing in to the user interface application of AI enabled assistive wearable device for blind and visually impaired according to the present invention.
Figure 7 : (a) Shows an outdoor object recognition mode of the user interface application of AI enabled assistive wearable device for blind and visually impaired according to the present invention.
(b) Shows an indoor object recognition mode of the user interface application of AI enabled assistive wearable device for blind and visually impaired according to the present invention.
(c) Shows a workflow of the real-time object recognition mode of AI enabled assistive wearable device for blind and visually impaired according to the present invention.
Figure 8 : (a) Shows a text recognition mode of the user interface application of AI enabled assistive wearable device for blind and visually impaired according to the present invention.
(b) Shows an AI-powered text recognition workflow of AI enabled assistive wearable device for blind and visually impaired according to the present invention.
(c) Shows an AI-tutor text recognition workflow of AI enabled assistive wearable device for blind and visually impaired according to the present invention.
Figure 9 : (a) Shows a currency identification/verification mode of the user interface application of AI enabled assistive wearable device for blind and visually impaired according to the present invention.
(b) Shows a currency identification/verification workflow of AI enabled assistive wearable device for blind and visually impaired according to the present invention.
Figure 10 : Shows a smart eye system workflow of AI enabled assistive wearable device for blind and visually impaired according to the present invention.
SUMMARY OF THE INVENTION:
The present invention introduces an assistive wearable device configured to enhance the independence of individuals who are blind or visually impaired. This device utilizes advanced camera vision technology to translate visual information into audio signals, providing users with a valuable tool for navigating their surroundings and accessing information. The device is seamlessly integrated into an eyeglasses frame, featuring a USB camera mounted on the arm. This configuration ensures a natural and comfortable fit for the user, allowing for unobstructed movement. The device captures real-time video footage, which is then processed to extract relevant visual information. This processed data is subsequently conveyed to the user through audio feedback, enabling them to understand and interact with their environment more effectively. The device offers a range of features, including object recognition, currency verification, document reading, live video calls, educational content access, and wayfinding assistance. These features empower users to navigate their surroundings confidently, manage their finances independently, access information from printed materials, receive real-time support, and acquire educational resources. The device is designed for ease of use, with a user-friendly interface and customizable settings. It may be easily attached to eyeglasses frames and powered by a user equipment, ensuring convenience and portability. Overall, this assistive wearable device represents a significant advancement in technology for individuals with visual impairments, offering them greater independence and autonomy in their daily lives.
Reference numerals of said parts of the present invention.
1 : Eyeglasses
2 : USB camera of 5MP
3 : Enclosure
4 : USB Cable
5 : User equipment
6 : Screws
8 : Angle Supporter
9 : Cable Tie
DETAILED DESCRIPTION OF THE INVENTION
The following description is presented to enable any person skilled in the art to make and use the invention and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown but is to be accorded the widest scope consistent with the principles and features disclosed herein.
It is to be understood that the term “comprising” or “comprises” used in the specification and claims refers to the element of the invention which comprises X, Y, and Z, which means that the invention might have other elements in addition to X, Y, and Z. For example, the invention could include A, B, and/or C as long as it also has X, Y, and Z.
It is to be understood that the term “wearable device” or “device” herein referred in the specification has been used interchangeably and shall not change the original scope for protection of the invention.
Now according to the embodiments of the present invention, an Artificial Intelligence (AI)-enabled assistive wearable device for the blind and visually impaired has been disclosed.
Said device translates visual information into audio signals using advanced camera vision technology. The device is configured to be mounted on standard eyeglasses, with an adjustable camera angle to ensure the optimal viewing angle which provides relevant visual information without disrupting the user’s natural walking pattern.
Said device works using the TensorFlow Lite Model Maker, a library that streamlines the development of tailored TensorFlow Lite models for deployment on edge devices. By leveraging transfer learning techniques, the library significantly reduces the required training data volume and accelerates the training process. This renders it particularly well-suited for scenarios where computational resources and data availability may be constrained.
The library incorporates a set of key features, including the capacity to integrate custom datasets, streamline the model development workflow, and leverage pre-trained models. It is optimized for edge devices, ensuring efficient performance on user interface. TensorFlow Lite's deep integration with the TensorFlow ecosystem enables a seamless transition from model development to deployment, making it a preferred choice for numerous developers.
Within the broader TensorFlow Ecosystem, TensorFlow Lite Model Maker provides a valuable tool for addressing the challenges associated with deploying machine learning models on edge devices. By integrating seamlessly with the TensorFlow framework, developers may efficiently create and deploy customized TensorFlow Lite models, even in environments with limited computational resources.
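By way of illustration only, the following is a minimal sketch of how a custom detection model may be prepared with the TensorFlow Lite Model Maker library; the dataset paths, label names, and model specification are hypothetical, and the exact API may vary between library versions.
```python
# Hypothetical sketch: training a custom TFLite object detector via transfer learning.
from tflite_model_maker import model_spec, object_detector

# Load a labelled dataset (paths and labels are placeholders, not part of the invention).
train_data = object_detector.DataLoader.from_pascal_voc(
    images_dir="dataset/images",
    annotations_dir="dataset/annotations",
    label_map=["currency_note", "door", "chair"],
)

# Transfer learning from a pre-trained, edge-friendly detection backbone.
spec = model_spec.get("efficientdet_lite0")
model = object_detector.create(train_data, model_spec=spec, epochs=50, batch_size=8)

# Export a compact .tflite model for on-device inference on the user equipment.
model.export(export_dir="exported", tflite_filename="detector.tflite")
```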
Now referring to Figures 1 to 3a of the present invention, the AI-enabled assistive wearable device according to an exemplary embodiment is shown. The AI-enabled assistive wearable device comprises
• an eyeglass (1),
• an enclosure (3) connected with an arm of the eyeglass (1),
• a USB (Universal Serial Bus) cable (4) operatively fixed with a circuitry of the enclosure (3), and
• a user equipment (5) detachably connected with the USB cable (4) of the enclosure (3).
It is to be noted that a pair of eyeglasses comprises eye wires or rims that encircle and secure the lenses in place, a bridge connecting the two eye wires, nose pads that facilitate the positioning of the eye wires on the nose, and pad arms that connect the nose pads to the eye wires. It is to be understood that the term 'standard' is illustrative and not restrictive, as the specific shape and size of the eyeglasses may vary, and the scope of protection of the invention is not limited thereto.
Further, the device of the present invention consists of the eyeglasses (1), an arm of which is equipped with a USB camera (2). The camera is housed within an enclosure (3) and connected to the user equipment (5) via the USB cable (4). The camera is securely attached to the arm of the eyeglasses using at least two screws (6). The user equipment (5), with a high-capacity battery for extended usage, is responsible for capturing images from the camera, processing them, and generating audio output.
For certain features, an application with internet connectivity may be required. A protective case (not shown) is provided to safeguard the device when not in use. An angle support (8) allows users to adjust the camera's position for optimal viewing, and a cable tie (9) is used to securely fasten the device to the eyeglasses frame. The audio output is delivered through the user equipment's (5) speakers or connected audio devices.
Further, it is to be noted that the device performance may vary depending on the specific user equipment like mobile phone model and its configuration.
It is to be understood that the user equipment may be selected from, but is not limited to, smartphones and the like.
Referring to Figure 3a, said enclosure (3) is a cubic structure designed to provide enclosure to the electrical or electronic components. The enclosure according to the present invention consists of
• the USB camera (2) having a resolution of 5 megapixels (MP) and comprising a lens,
• a real-time object detection model,
• a high-performance object detection model,
• an Open Neural Network Exchange Text Recognition (Onnx TR) model,
• an Application Programming Interface (API) model, and
• a Generative Pre-trained Transformer 4 (GPT-4) model
For the real-time object detection model on the user equipment, the SSD MobileNet V2 FPNLite 640 model is preferably configured, striking an optimal balance between speed and accuracy. Said model is particularly suited for applications necessitating rapid detection. The core construction integrates the Single Shot MultiBox Detector (SSD) framework, which combines detection and classification within a single network pass, thereby enhancing both speed and accuracy. Due to that, it has a short operating time. The SSD construction is capable of real-time object detection by simultaneously predicting multiple bounding boxes and class scores.
The model employs the MobileNet V2 backbone, which utilizes depth-wise separable convolutions to minimize computational costs while maintaining performance. This lightweight convolutional neural network is specifically tailored for mobile and embedded devices, making it a key component of the SSD MobileNet V2 FPNLite 640 model. Additionally, the Feature Pyramid Network Lite (FPNLite) enhances the model’s capability to detect objects at various scales by leveraging multi-level feature maps. This feature significantly improves object detection by providing comprehensive multi-scale feature representations, enabling the model to effectively manage objects of different sizes.
Said model demonstrates a Mean Average Precision (mAP) of approximately 21.8% on the COCO dataset with an input size of 640x640. This metric evaluates the model’s proficiency in accurately detecting and classifying objects across various recall levels, with a score of 21.8% indicating its effectiveness in object identification and localization within images.
It is to be understood that, among the terms precision, recall, and inference time, precision measures the accuracy of the model's positive predictions, calculated as the ratio of true positive detections to the sum of true positive and false positive detections. Higher precision values signify fewer false positives and enhanced detection accuracy. Recall, on the other hand, assesses the model's ability to detect all relevant objects, defined as the ratio of true positive detections to the sum of true positive detections and false negatives. Higher recall values reflect the model's efficiency in capturing a larger proportion of true objects. Furthermore, the model is optimized for rapid inference, making it suitable for real-time applications. The inference time is a key factor for scenarios requiring swift detection and classification, such as in mobile and embedded vision systems.
The SSD MobileNet V2 FPNLite 640 model is versatile, suitable for various applications such as mobile vision systems, autonomous vehicles, drones, retail, and customer interaction. Its lightweight design and fast inference speed enable real-time object detection, safe navigation, and efficient inventory management.
In the embodiment of the present invention, said SSD (Single Shot MultiBox Detector) framework is an advanced object detection system engineered to execute detection and classification tasks within a single forward pass through the network. The SSD MobileNet V2 FPNLite 640 model integrates the SSD framework with the MobileNet V2 backbone and the FPNLite (Feature Pyramid Network Lite) configuration. This incorporation is specifically optimized for enhanced speed and efficiency, rendering it highly suitable for real-time applications on the user equipment. The model’s configuration ensures compliance with performance standards required for such applications, thereby facilitating its deployment in environments where rapid and accurate object detection is essential.
Referring to Figure 3b, said Single Shot MultiBox Detector (SSD) framework means is configured to facilitate the simultaneous detection and classification of objects within a single network pass. This is achieved by predicting bounding boxes and class scores across various scales and aspect ratios, enabling the efficient detection of multiple objects within an image.
The key components of the SSD framework means consist of default bounding boxes and multi-scale feature maps. Said default boxes are employed at multiple scales and aspect ratios, facilitating the comparison with ground truth boxes and the generation of predictions for object locations and classifications. Said multi-scale feature maps are utilized to detect objects of varying sizes, enhancing the model's versatility and applicability across different domains.
The SSD framework's ability to detect and classify objects in a single network pass significantly reduces computational load, making it particularly well-suited for real-time applications. Its multi-scale approach further enhances its versatility, enabling the detection of a wide range of object sizes.
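As an illustrative, non-limiting sketch, the following shows how a single network pass of such an SSD-style TFLite detection model may be invoked on the user equipment; the model file name and the ordering of the output tensors are assumptions and may differ for a particular export.
```python
# Hypothetical sketch: one forward pass of an SSD-style TFLite detector.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="ssd_mobilenet_v2_fpnlite_640.tflite")  # assumed file name
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
outs = interpreter.get_output_details()

frame = np.zeros((640, 640, 3), dtype=inp["dtype"])          # placeholder for a camera frame
interpreter.set_tensor(inp["index"], frame[np.newaxis, ...])
interpreter.invoke()                                         # detection + classification in one pass

# Typical detection outputs (ordering varies between exports): boxes, classes, scores.
boxes = interpreter.get_tensor(outs[0]["index"])
classes = interpreter.get_tensor(outs[1]["index"])
scores = interpreter.get_tensor(outs[2]["index"])
```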
Referring to Figure 3c, said MobileNet V2 construction is a lightweight convolutional neural network configured specifically for the user equipment applications. It employs depth-wise separable convolutions, linear bottlenecks, and inverted residuals to optimize computational efficiency while maintaining high accuracy.
Said MobileNet V2 consists of depth-wise separable convolutions, linear bottlenecks and inverted residuals. Wherein, said convolutions decompose standard convolutions into depth-wise and pointwise convolutions, reducing the number of parameters and computational complexity. The linear bottlenecks compress the input and output of each block, further reducing computational overhead without compromising feature extraction capabilities. Said inverted residual structure enables efficient feature extraction by expanding input channels, applying depth-wise convolutions, and projecting results back to a lower dimension.
These configuration selections collectively contribute to the MobileNet V2's low computational cost and its ability to achieve a good balance of speed and accuracy, making it well-suited for real-time applications on mobile devices.
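For illustration, a minimal Keras sketch of one such inverted residual block with a depth-wise separable convolution and a linear bottleneck is given below; channel counts and strides are illustrative only and do not reproduce the exact MobileNet V2 configuration.
```python
import tensorflow as tf
from tensorflow.keras import layers

def inverted_residual(x, in_channels, out_channels, expand=6, stride=1):
    # 1x1 expansion convolution
    y = layers.Conv2D(in_channels * expand, 1, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(6.0)(y)
    # 3x3 depth-wise convolution (per-channel spatial filtering)
    y = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(6.0)(y)
    # 1x1 linear bottleneck projection (no activation)
    y = layers.Conv2D(out_channels, 1, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    if stride == 1 and in_channels == out_channels:
        y = layers.Add()([x, y])   # inverted residual (skip) connection
    return y
```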
Referring to Figure 3d, said FPNLite architecture is a refined version of the traditional Feature Pyramid Network (FPN), configured to enhance the model's ability to detect objects at various scales. By incorporating multi-level feature maps, FPNLite effectively combines low-level features (capturing fine details) with high-level features (capturing abstract, context-rich information), improving the detection of objects of varying sizes within the same image.
To ensure suitability for resource-constrained devices, FPNLite streamlines the process of generating multi-scale feature maps, reducing computational complexity compared to traditional FPNs. This optimization allows FPNLite to maintain high detection accuracy while minimizing additional computational load, aligning with the overall goal of efficient edge computing.
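A simplified sketch of the multi-scale feature fusion idea used by such FPN-style necks is shown below; layer choices and channel counts are illustrative and assume the low-level map has twice the spatial resolution of the high-level map.
```python
import tensorflow as tf
from tensorflow.keras import layers

def fpn_merge(low_level, high_level, channels=128):
    """Fuse a fine, low-level feature map with an upsampled coarse, high-level map."""
    low = layers.Conv2D(channels, 1, padding="same")(low_level)      # lateral 1x1 projection
    high = layers.Conv2D(channels, 1, padding="same")(high_level)
    high_up = layers.UpSampling2D(size=2, interpolation="nearest")(high)
    merged = layers.Add()([low, high_up])                            # combine detail with context
    return layers.SeparableConv2D(channels, 3, padding="same")(merged)
```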
Referring to Figure 3e, said SSD MobileNet V2 FPNLite 640 configuration is a state-of-the-art object detection model designed for efficiency and real-time performance. It consists of several key components which have been described below:
- The model begins by taking an input image of size 640x640 pixels, a size chosen to balance detail and processing speed, thereby enabling efficient detection of objects of various sizes.
- The input image is then processed through depth-wise separable convolutions within the MobileNet V2 backbone to extract features with reduced computational cost. These extracted features are compressed using linear bottlenecks, facilitating efficient feature extraction. Further processing occurs through inverted residual blocks, which enhance the model’s ability to capture complex patterns and details.
- Subsequently, the FPNLite configuration combines low-level features (which capture fine details) and high-level features (which capture abstract information) from different layers of the backbone network. This combination generates multi-scale feature maps, essential for detecting objects of varying sizes within the same image.
- The SSD framework then applies default bounding boxes at different scales and aspect ratios across these multi-scale feature maps. The model predicts bounding boxes and class scores for each feature map, generating a set of potential object detections.
- In the post-processing phase, the model applies non-max suppression to filter out redundant bounding boxes, retaining only the most confident predictions. The final output consists of the remaining bounding boxes and class scores, which identify the locations and classifications of objects within the input image.
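By way of example, the non-max suppression step described above may be expressed as follows; the thresholds and function name are illustrative only.
```python
import tensorflow as tf

def filter_detections(boxes, scores, classes, iou_thresh=0.5, score_thresh=0.4, max_out=20):
    """Discard redundant, overlapping boxes and keep only the most confident predictions."""
    keep = tf.image.non_max_suppression(
        boxes, scores, max_output_size=max_out,
        iou_threshold=iou_thresh, score_threshold=score_thresh)
    return tf.gather(boxes, keep), tf.gather(scores, keep), tf.gather(classes, keep)
```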
Furthermore, said SSD MobileNet V2 FPNLite 640 configuration offers several advantages, such as its lightweight design and efficient feature extraction process, which make it suitable for resource-constrained devices and real-time applications. The model achieves high accuracy in object detection tasks, even on challenging datasets, and it may be applied to a wide range of industries and use cases, from mobile vision systems to autonomous vehicles and beyond.
The SSD MobileNet V2 FPNLite 640 model integrates the Single Shot MultiBox Detector (SSD) and MobileNet V2 configurations to optimize both accuracy and computational efficiency in object detection tasks. The model's efficacy is quantified using the Mean Average Precision (mAP) metric, which evaluates the model's proficiency in detecting and classifying objects across varying recall thresholds. For the COCO (Common Objects in Context) dataset, the model typically achieves a mAP score of approximately 21.8%, underscoring its capability in object detection.
The key performance indicators consist of precision, recall metrics and inference time. Said precision measures the model’s accuracy by indicating the proportion of true positive detections among all positive detections, thereby minimizing false positives. The recall assesses the model’s completeness by determining the proportion of true positive detections among all actual positives, thus ensuring most relevant objects are detected. The balance between precision and recall is critical, contingent upon specific application requirements.
Said inference time is a pivotal consideration, particularly for real-time applications. The SSD MobileNet V2 FPNLite 640 model is configured for rapid inference, rendering it suitable for deployment on edge devices where prompt object detection is imperative. The model’s performance metrics and efficiency render it a viable option for diverse object detection applications.
The evaluation of currency detection models using standardized metrics is important to ensure their reliability and effectiveness in practical applications. By assessing metrics such as Mean Average Precision (mAP), developers may gain valuable insights into the model’s accuracy, efficiency, and overall performance.
Said mAP is a comprehensive metric that evaluates the model’s ability to detect and classify objects across various recall levels. It is calculated by analyzing the precision-recall curve for each object class, which plots the precision against the recall. The area under the precision-recall curve for each class represents the Average Precision (AP), providing a single-number summary of the model’s performance. The mean of the AP values across all object classes gives the Mean Average Precision (mAP), which reflects the model’s overall accuracy in detecting and classifying objects.
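For illustration, one common way of computing the Average Precision described above (PASCAL-VOC-style all-point interpolation) is sketched below; the COCO mAP reported for the model additionally averages over multiple IoU thresholds.
```python
import numpy as np

def average_precision(recall, precision):
    """Area under the precision-recall curve for a single object class."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]       # monotonically decreasing precision envelope
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# mAP is then the mean of the per-class AP values:
# mAP = np.mean([average_precision(r_c, p_c) for r_c, p_c in per_class_curves])
```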
On the COCO dataset, the SSD MobileNet V2 FPNLite 640 model typically achieves a mean Average Precision (mAP) score of approximately 21.8% for the 640x640 input size. This performance metric indicates that while the model demonstrates a reasonable level of accuracy, it may necessitate further fine-tuning to meet the precision requirements of applications demanding higher accuracy.
The precision of a model, as defined in the context of object detection, is a critical metric that measures the accuracy of its positive predictions. It is calculated as the ratio of true positive detections to the sum of true positive and false positive detections.
Precision = True Positives / (True Positives + False Positives)
Wherein:
• True Positives: Objects that were correctly identified by the model.
• False Positives: Objects that the model incorrectly identified as positive.
The SSD MobileNet V2 FPNLite 640 model is configured to achieve an optimal balance between precision and computational efficiency. This model is particularly suited for deployment in the user equipment, where resource constraints necessitate a trade-off between accuracy and speed. However, it should be noted that the precision of SSD MobileNet V2 FPNLite 640 may not meet the stringent accuracy requirements of applications that prioritize maximum detection accuracy.
The recall, in the context of object detection, is a metric that quantifies a model's ability to identify all relevant objects within an image. It is calculated as the ratio of true positive detections to the sum of true positive detections and false negatives.
Recall = True Positives / (True Positives + False Negatives)
Wherein:
• True Positives: Objects that were correctly identified by the model.
• False Negatives: Objects that the model failed to identify.
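A short worked example with hypothetical detection counts illustrates the two formulas:
```python
# Hypothetical counts for a single evaluation run (not measured values).
tp, fp, fn = 42, 8, 14
precision = tp / (tp + fp)   # 42 / 50 = 0.84
recall = tp / (tp + fn)      # 42 / 56 = 0.75
```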
The SSD MobileNet V2 FPNLite 640 model is configured for optimal speed and efficiency. This optimization may result in a compromise with recall rates. Said model is designed to sustain a balanced recall rate that is adequate for numerous real-time applications.
The Inference time, in the context of object detection models, refers to the computational time required for the model to process an input image and generate predictions. This metric is essential for real-time applications where the speed of detection and classification directly impacts the user experience.
Said SSD MobileNet V2 FPNLite 640 model is configured for rapid inference on the user equipment. Its construction, incorporating the MobileNet V2 backbone and FPNLite enhancements, facilitates expedited processing. This makes it particularly suitable for real-time currency detection applications, ensuring compliance with performance requirements for the user equipment.
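The inference time referred to above may be measured, for example, as the average wall-clock duration of a single interpreter invocation; the sketch below assumes a TFLite interpreter prepared as described earlier.
```python
import time

def average_inference_ms(interpreter, runs=20):
    """Mean single-frame inference time, in milliseconds, for a prepared TFLite interpreter."""
    interpreter.invoke()                          # warm-up invocation
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) * 1000.0 / runs
```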
In conclusion, the evaluation of currency detection models using metrics such as mean Average Precision (mAP), precision, recall, and inference time offers a thorough assessment of their performance capabilities. The SSD MobileNet V2 FPNLite 640 model exhibits distinct advantages, rendering it appropriate for various applications. Said SSD MobileNet V2 FPNLite 640 model is optimized for environments where computational efficiency and rapid processing are critical. It demonstrates moderate accuracy, achieving a mean Average Precision (mAP) of 21.8%. This model strikes a balance between precision and recall, ensuring reliable performance. Its primary benefit is its expedited inference time and energy efficiency, making it particularly suitable for applications necessitating real-time processing capabilities, even if this entails a marginal trade-off in accuracy. This model is thus highly recommended for deployment in resource-constrained settings where operational speed and energy conservation are of paramount importance.
Now referring to the high-performance object detection model with a focus on efficiency, an EfficientDet D0, a member of the EfficientDet family, is constructed. Said model is particularly suitable for deployment in environments where computational resources are limited, yet accuracy remains a main requirement. The model integrates the EfficientNet backbone, which utilizes compound scaling to optimize the model’s width, depth, and resolution. This optimization ensures a balanced trade-off between performance and resource consumption, thereby enhancing both accuracy and efficiency. Additionally, the compound scaling feature enables the model to uniformly scale its width, depth, and resolution, ensuring adaptability across various deployment scenarios while maintaining high accuracy and computational efficiency.
Said EfficientDet D0 configuration is a highly efficient object detection model designed for real-time applications. It consists of several key components:
- An input layer of the model takes an image of size 640x640 pixels with 3 channels, representing the RGB color space. This image is fed into the model for processing, where it is analyzed to detect objects, such as currency notes in specific applications.
- The backbone of the model is EfficientNet, which consists of several key components. The CBS blocks (Conv + BN + SiLU) consist of convolutional filters (Conv) that extract features from the input image, batch normalization (BN) to stabilize and speed up training, and the SiLU (Sigmoid Linear Unit) activation function to introduce non-linearity, enabling the model to learn complex patterns. The C3 blocks are residual blocks that include multiple CBS blocks and a bottleneck structure, allowing the model to learn features at multiple scales and depths. These bottlenecks may be configured with or without residual connections: the “False” configuration is a standard bottleneck without residual connections, while the “True” configuration includes a residual (skip) connection to help prevent degradation of features and facilitate easier training of deeper networks. Additionally, the model incorporates the Spatial Pyramid Pooling Fast (SPPF) block, which applies pooling operations at different scales (max-pooling in parallel) to create a multi-scale feature map; the outputs of these different scale poolings are concatenated and passed through the CBS block (a sketch of this multi-scale pooling block is given after this list). This helps the model effectively capture features at different scales, which is particularly useful for detecting objects of varying sizes.
- The neck of the model acts as an intermediary between the backbone and the head, further processing the feature maps extracted by the backbone and preparing them for the final detection stage. It includes components such as
o An upsample component, which increases the spatial dimensions of the feature maps to match those of previous layers, facilitating their combination.
o A concat component, which concatenates feature maps from different levels of the backbone, combining low-level spatial information from earlier layers with high-level semantic information from deeper layers.
o The C3 blocks, which further process the concatenated feature maps to refine the features before they are passed to the detection head.
- The head of the model is responsible for producing the final object detection outputs, including bounding boxes and class predictions. It comprises detect blocks that operate at different spatial resolutions (80x80, 40x40, 20x20), each responsible for detecting objects at different scales. Each detect block outputs a set of bounding boxes and class predictions, allowing the model to detect both small and large objects within the same image. The detection process involves the output of each detect block being a feature map where each point corresponds to a potential object. The model predicts the bounding box coordinates, class probabilities, and objectness score, indicating how likely it is that an object is present at that location.
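As indicated above, the multi-scale pooling block may be sketched, purely for illustration, as parallel max-pooling branches whose outputs are concatenated and passed through a CBS block; the kernel sizes and channel counts below are illustrative assumptions.
```python
import tensorflow as tf
from tensorflow.keras import layers

def cbs(x, filters, kernel=1):
    """Conv + Batch Normalization + SiLU (swish) block, as described above."""
    x = layers.Conv2D(filters, kernel, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("swish")(x)

def multi_scale_pooling(x, filters=256):
    """Parallel max-pooling at several scales, concatenation, then a CBS block."""
    p1 = layers.MaxPooling2D(pool_size=5, strides=1, padding="same")(x)
    p2 = layers.MaxPooling2D(pool_size=9, strides=1, padding="same")(x)
    p3 = layers.MaxPooling2D(pool_size=13, strides=1, padding="same")(x)
    merged = layers.Concatenate()([x, p1, p2, p3])
    return cbs(merged, filters)
```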
The EfficientDet D0 configuration offers several advantages, such as being designed to be highly efficient, making it suitable for real-time applications. The model achieves high accuracy in object detection tasks, and the EfficientDet family of models (D0-D7) allows for scaling the model to different resource constraints.
Said model demonstrates a Mean Average Precision (mAP) of approximately 33.4% on the COCO dataset, indicating robust performance in object detection and classification. This mAP value reflects the model’s efficacy in accurately localizing and identifying objects. The precision, a metric that measures the accuracy of positive detections, highlights the model’s capability to minimize false positives and ensure high true positive identification. Said recall, on the other hand, assesses the model’s proficiency in detecting all relevant objects within an image, with higher recall values signifying the model’s effectiveness in identifying a substantial portion of true positives. Furthermore, said EfficientDet D0 is engineered for computational efficiency, making it suitable for environments with limited resources. The model adeptly balances speed and accuracy, ensuring optimal performance even under resource constraints.
Said EfficientDet D0 is a high-accuracy, efficient object detection model suitable for smart cameras, industrial automation, agriculture, and environmental monitoring. It achieves a higher mAP compared to SSD MobileNet V2 FPNLite 640, ensuring precise detection. Its compound scaling approach allows for efficient resource use and high accuracy, making it ideal for resource-constrained environments.
The EfficientDet D0 model represents a notable advancement in object detection capabilities when compared to the SSD MobileNet V2 FPNLite model. According to the Mean Average Precision (mAP) metric, said EfficientDet D0 achieves a mAP score of approximately 33.4% on the COCO dataset, significantly surpassing the performance of SSD MobileNet V2 FPNLite. This higher mAP score indicates superior accuracy in object detection and classification, making EfficientDet D0 a preferable option for applications requiring high precision and recall.
The inference time for said EfficientDet D0 is engineered for computational efficiency, striking an optimal balance between inference speed and accuracy. This makes it particularly suitable for deployment in scenarios where both performance and efficiency are critical considerations. The EfficientDet D0 offers substantial improvements in both object detection performance and computational efficiency over SSD MobileNet V2 FPNLite, thereby presenting a convincing option for diverse application contexts.
The evaluation of currency detection models using standardized metrics is crucial to ensure their reliability and effectiveness in practical applications. By assessing metrics such as Mean Average Precision (mAP), developers may gain valuable insights into the model’s accuracy, efficiency, and overall performance.
Said mAP is a comprehensive metric that evaluates the model’s ability to detect and classify objects across various recall levels. It is calculated by analyzing the precision-recall curve for each object class, which plots the precision against the recall. The area under the precision-recall curve for each class represents the Average Precision (AP), providing a single-number summary of the model’s performance. The mean of the AP values across all object classes gives the Mean Average Precision (mAP), which reflects the model’s overall accuracy in detecting and classifying objects.
The EfficientDet D0 model, recognized for its efficiency and superior accuracy, achieves a mean Average Precision (mAP) score of approximately 33.4% on the COCO dataset. This elevated mAP score, in comparison to the SSD MobileNet V2 FPNLite 640, suggests that EfficientDet D0 is more suitable for applications necessitating higher precision in object detection.
The precision of a model, as defined in the context of object detection, is a critical metric that measures the accuracy of its positive predictions. It is calculated as the ratio of true positive detections to the sum of true positive and false positive detections.
Precision = True Positives / (True Positives + False Positives)
Wherein:
• True Positives: Objects that were correctly identified by the model.
• False Positives: Objects that the model incorrectly identified as positive.
The EfficientDet D0 model, owing to its advanced architectural configuration, delivers superior precision relative to the SSD MobileNet V2 FPNLite 640. This enhanced precision renders EfficientDet D0 more appropriate for applications demanding high accuracy in object detection, such as the identification of currency, where minimizing false positives is of paramount importance.
The recall, in the context of object detection, is a metric that quantifies a model's ability to identify all relevant objects within an image. It is calculated as the ratio of true positive detections to the sum of true positive detections and false negatives.
Recall = True Positives / (True Positives + False Negatives)
Wherein:
• True Positives: Objects that were correctly identified by the model.
• False Negatives: Objects that the model failed to identify.
The EfficientDet D0 model exhibits superior recall rates compared to the SSD MobileNet V2 FPNLite 640. This characteristic renders it more appropriate for scenarios where the exhaustive detection of currency instances is imperative, such as in precise financial transactions or comprehensive currency validation processes.
The inference time, in the context of object detection models, refers to the computational time required for the model to process an input image and generate predictions. This metric is crucial for real-time applications where the speed of detection and classification directly impacts the user experience.
Said EfficientDet D0 model, optimized for both accuracy and computational efficiency, delivers competitive inference times. It is suitable for situations necessitating computational efficiency while maintaining a degree of real-time processing capability. This balance ensures adherence to operational standards where both performance and efficiency are critical.
In conclusion, the evaluation of currency detection models using metrics such as mean Average Precision (mAP), precision, recall, and inference time offers a thorough assessment of their performance capabilities. The EfficientDet D0 model exhibits distinct advantages, rendering it appropriate for various applications.
EfficientDet D0 demonstrates superior accuracy and recall, rendering it highly suitable for applications necessitating meticulous and precise currency detection. With a mean Average Precision (mAP) of 33.4%, it significantly surpasses the detection accuracy of SSD MobileNet V2 FPNLite 640. Despite its marginally larger model size, the EfficientDet D0’s optimized inference time and high efficiency establish it as a robust candidate for use cases where both accuracy and computational efficiency are paramount.
The aforementioned metrics, as applied to real-time and high-performance object detection models, serve as an important framework for developers. These metrics facilitate the selection of the most suitable model for specific applications and guide the fine-tuning and optimization processes. This ensures an optimal balance between accuracy, processing speed, and operational efficiency.
In another embodiment of the present invention, model size constitutes a pivotal consideration, particularly when deploying models onto devices characterized by limited storage or memory capacity. The SSD MobileNet V2 FPNLite 640 model, renowned for its compact dimensions, is very suitable for deployment on the user equipment. Although the EfficientDet D0 model exhibits a marginally larger footprint, it remains optimized for efficiency and may be deployed on analogous devices with certain adjustments.
Referring to another embodiment of the present invention, energy consumption emerges as a key factor, especially for devices powered by batteries. The SSD MobileNet V2 FPNLite 640's configuration is optimized for low energy consumption, rendering it ideal for applications such as drones, smartphones, and wearable devices. The EfficientDet D0, with its efficient scaling, also offers a favourable balance between performance and power usage, making it suitable for environments constrained by energy limitations.
For the Open Neural Network Exchange Text Recognition (Onnx TR) model, the Doc AI system constitutes a multipurpose tool designed for the management and analysis of diverse document formats. It extracts textual content and images, processes scanned PDFs through the application of Optical Character Recognition (OCR), and generates AI-powered descriptions for images. The system stores extracted data within MongoDB (a document-oriented database program) and leverages Langchain (a framework that facilitates the integration of large language models into applications) for efficient document processing, retrieval, and question-answering capabilities. Key features encompass multi-format support, OCR capabilities, image detection, orientation correction, the creation of searchable PDFs (Portable Document Format), and MongoDB integration.
The system facilitates document uploads via a FastAPI endpoint (FastAPI being a fast, high-performance web framework for building APIs with Python, based on standard Python type hints), accepting various document formats. Upon upload, the system identifies the document type and routes it through the appropriate processing pipeline. For PDFs, text is extracted directly from searchable PDFs, while scanned PDFs undergo OCR to generate a new searchable PDF with embedded text. DOCX and PPT/PPTX documents have their text and images extracted, with PowerPoint files converted to PDFs prior to extraction. Plain text files are processed directly. The system utilizes OnnxTR for OCR, detecting document orientation and extracting text from images. Images within documents are identified, extracted, and their orientation is corrected. The extracted images are analysed by an OpenAI model to generate textual descriptions. Further, the extracted text, images, and metadata are stored in MongoDB, with vector embeddings generated and stored for semantic searches. MongoDB Atlas is used for vector searches, enabling retrieval based on semantic content. The system interacts with the OpenAI API for image descriptions and document content processing, and with MongoDB Atlas for data storage. Temporary files are stored and subsequently cleaned up to maintain efficient storage utilization.
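Purely as an illustrative sketch of the document-upload and OCR entry point described above, the following assumes the FastAPI and OnnxTR packages; the endpoint name is hypothetical, the OnnxTR calls may differ between versions, and the MongoDB and embedding steps are indicated only as comments.
```python
# Hypothetical sketch of the document upload + OCR entry point.
from fastapi import FastAPI, File, UploadFile
from onnxtr.io import DocumentFile          # assumed OnnxTR API
from onnxtr.models import ocr_predictor

app = FastAPI()
predictor = ocr_predictor()                  # default text detection + recognition models

@app.post("/upload")
async def upload_document(file: UploadFile = File(...)):
    data = await file.read()
    if file.filename.lower().endswith(".pdf"):
        doc = DocumentFile.from_pdf(data)    # scanned or searchable PDF pages
    else:
        doc = DocumentFile.from_images(data) # image-based documents
    result = predictor(doc)                  # run OCR over every page
    text = result.render()                   # plain-text rendering of the recognized content
    # The extracted text, images, and metadata would then be stored in MongoDB,
    # with vector embeddings generated for semantic search and question answering.
    return {"filename": file.filename, "text": text}
```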
For the OpenAI Application Programming Interface (API) model, the AI Tutor, which is an interactive learning system, is integrated into the user interface application. It uses the OpenAI API to provide real-time answers and explanations based on user prompts. The key features include interactive learning, customizable responses, instant information retrieval, multi-disciplinary support, and conversational context retention. The system processes user prompts, interacts with the OpenAI API, and offers customization options for fine-tuning responses. It also manages session context and ensures secure communication with the OpenAI API.
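A minimal sketch of the tutor's request-response round trip is given below; the model name, system prompt, and function name are illustrative assumptions and not part of the claimed system.
```python
from openai import OpenAI

client = OpenAI()   # API key read from the environment

def ask_tutor(question: str, history: list) -> str:
    """Send a user prompt plus retained conversation context and return the answer."""
    messages = [{"role": "system",
                 "content": "You are a patient tutor for a visually impaired learner. "
                            "Answer in clear, spoken-style sentences."}]
    messages += history                                   # conversational context retention
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    answer = response.choices[0].message.content
    history += [{"role": "user", "content": question},
                {"role": "assistant", "content": answer}]
    return answer
```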
For the OpenAI Generative Pre-trained Transformer 4 (GPT-4) model, the smart eye system constitutes a multipurpose tool configured for the capture and analysis of real-time images from live feeds. By leveraging OpenAI's GPT-4 model, the system generates detailed and context-aware descriptions of visual content. The key features encompass real-time image capture, image description generation, contextual analysis, and system integration capabilities. The system integrates with live feeds, supports high-resolution images, and utilizes GPT-4 for image description generation. It analyses not only objects but also their relationships and the overall scene, providing adaptable descriptions. The system offers API access for image capture and description retrieval, ensures real-time processing, and integrates with other AI features. The images and their descriptions are stored temporarily, with efficient cleanup processes in place. The system is constructed to handle multiple live feeds simultaneously and may be scaled to accommodate varying deployment environments.
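As an illustrative sketch only, a captured frame may be submitted for description roughly as follows; the vision-capable model name and the prompt wording are assumptions.
```python
import base64
from openai import OpenAI

client = OpenAI()

def describe_frame(jpeg_bytes: bytes) -> str:
    """Request a context-aware description of a single captured frame."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    response = client.chat.completions.create(
        model="gpt-4o",   # assumed vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this scene for a blind user: objects, their relationships, and the overall context."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```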
Referring to another embodiment depicted in Figure 4, said wearable device, connected with the user equipment (5) through the USB cable (4) and having the camera (2), initiates processing by capturing images of objects at predefined intervals, which are subsequently transferred to the user equipment (5), where the user interface application processes them using the selected feature extraction means. The processed data is then delivered as audio output through the user equipment's (5) speaker or connected audio peripherals such as earphones or headphones, providing the user with an audible representation of the visual information.
Referring to another embodiment as depicted in Figures 5 to 9, the user equipment visual representation has been disclosed. Said camera (2) is connected to the user equipment (5) via said USB cable (4), and said device is further powered through the mobile phone battery. Upon insertion of the USB cable (4) into the user equipment's (5) USB port, the device is powered, and the associated user interface application is activated without requiring human intervention. Before device activation, the user must install the application and register using their phone number. Upon successful registration, the device establishes a connection with the application, enabling seamless interaction and utilization of its features. During the initial setup, the user is required to complete a one-time sign-in to the application. Following successful initial authentication, the application redirects the user to the home page of the user interface application.
The assistive wearable device of the present invention initiates with object detection mode, wherein the wearable device commences the identification of objects positioned within the camera's (2) field of view. The user possesses the flexibility to modify accessibility settings to align with their specific needs. To achieve this, users may leverage the accessibility mode integrated into their user equipment and employ gesture-based commands to transition between various modes as required. These modes encompass text reading, currency identification/verification, educational learning resources, document summarization, and the ability to obtain answers to targeted queries pertaining to document content. Additionally, users may customize language preferences, activate or deactivate small text detection within the object recognition mode, and access and manage saved files from the text mode.
The camera (2) is designed to capture visual information from the surrounding environment by recording video. This video data is then processed locally on the device without requiring an internet connection. The processing involves real-time analysis of the video frames to extract relevant information, which is subsequently conveyed to the user through audio or haptic feedback.
In addition to continuous video recording, the user has an option to capture three still images at intervals of one second. These images are also processed locally on the device, utilizing on-edge video processing techniques. The output of this processing is presented to the user in the form of audio, delivered through the device's built-in speakers or connected audio peripherals such as earphones or headphones.
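The following is a minimal illustrative sketch of this capture-and-speak option, assuming OpenCV for camera access and pyttsx3 for offline text-to-speech; analyse_frame is a hypothetical stand-in for the on-device models described herein.

import time
import cv2
import pyttsx3

def analyse_frame(frame) -> str:
    # Placeholder for local (on-edge) processing of a captured still image.
    height, width = frame.shape[:2]
    return f"captured a frame of {width} by {height} pixels"

def capture_stills_and_speak(count: int = 3, interval_s: float = 1.0) -> None:
    cap = cv2.VideoCapture(0)      # USB camera attached to the eyeglasses
    engine = pyttsx3.init()        # offline text-to-speech engine
    try:
        for _ in range(count):
            ok, frame = cap.read()
            if not ok:
                break
            engine.say(analyse_frame(frame))   # audio via speaker or earphones
            engine.runAndWait()
            time.sleep(interval_s)             # one-second gap between stills
    finally:
        cap.release()

if __name__ == "__main__":
    capture_stills_and_speak()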
The device settings are fully customizable to meet individual user preferences. This includes the ability to adjust the audio feedback volume and select the desired type of haptic feedback. Further to this, said camera (2) is designed to be securely attached to the arm of the eyeglasses (1) using the cable tie (9).
The camera (2) offers users the flexibility to select from four distinct viewing angles, ensuring optimal field of view for individual preferences and head positioning. Recognizing that each person has a unique viewing style, the device allows users to preset and securely fasten their preferred angle, eliminating the need for frequent adjustments. This feature enhances the overall ease of use and personalized experience provided by the device.
In accordance with the present invention, the device incorporates the following features:
In another embodiment of the present invention, the device is equipped with object detection capabilities, enabling users to navigate their surroundings with greater confidence. It accurately detects a wide range of objects, including people, furniture, and obstacles, providing real-time audio feedback to describe their location and nature. This information empowers users to make informed decisions and avoid potential risks.
In another embodiment of the present invention, the device incorporates a currency identification/verification feature that accurately identifies and provides real-time audio feedback on the denomination and value of banknotes. This functionality is particularly beneficial for independent financial management, ensuring accuracy and reducing the risk of errors during transactions.
In another embodiment of the present invention, the device employs the camera (2) to capture images of printed or handwritten documents. These images are then processed and converted into audio format, enabling users to access information that may otherwise be challenging or impossible to read. This feature facilitates tasks such as reviewing contracts, legal documents, or medical forms. Additionally, users have the option to upload documents for processing on a server. The user interface application offers document summarization capabilities and the ability to obtain answers to targeted queries pertaining to document content.
In another embodiment of the present invention, the device facilitates live video calls with volunteers, providing real-time assistance and guidance in navigating unfamiliar environments, reading labels or signs, and identifying objects. This feature enhances the user's independence and ability to perform daily tasks.
In another embodiment of the present invention, the device offers access to audio books, providing users with educational resources and the flexibility to listen to content at their convenience.
In another embodiment of the present invention, when multiple objects are detected, the device's wayfinding feature provides precise guidance on the safest path to avoid obstacles and potential injuries.
The present invention, the AI-enabled assistive wearable device designed for individuals with visual impairments, is configured to enhance their independence and self-sufficiency in daily life. By providing real-time visual assistance, the device empowers visually impaired individuals to navigate their surroundings more effectively and engage completely with the world around them.
WORKING OF THE INVENTION:
According to the embodiments shown in Figures 5, 6, 7a, 7b, 8a and 9a, the procedure for using the AI enabled assistive wearable device for blind and visually impaired comprises the following steps:
Step 1. Device Setup:
• wearing and connecting the USB cable (4) of the assistive wearable device with the user equipment (5),
• activating the corresponding user interface application on the user equipment (5),
Step 2. Visual Data Capture:
• activating the camera (2) of the wearable device without human intervention and capturing continuous video data from the user's surroundings,
Step 3. Data Processing and Feedback:
• analyzing the captured video data in real-time using the device's onboard processor,
• translating the visual information into audio signals providing real-time descriptions of the user's environment through the processor,
• transmitting the audio signals to the user through the user equipment's (5) built-in speakers or headphones,
• initiating haptic feedback (vibrations) for additional signals, such as indicating the presence of obstacles or objects,
The present invention of an AI enabled assistive wearable device described here also has certain customizable features such as:
• The users may customize the device's accessibility settings to suit their individual needs, including:
o object detection mode;
o text recognition mode; wherein, the text may be printed or handwritten, and
o currency identification/verification preferences.
• The users may adjust the audio volume and haptic feedback intensity.
According to another embodiment of the present invention, Figure 10 depicts the workflow of the smart eye feature using the AI enabled assistive wearable device for blind and visually impaired, wherein the system commences by capturing real-time images from various live feed sources. The captured images undergo processing to maintain high resolution, enabling detailed analysis. The processed images are subsequently sent to the GPT-4 model for the generation of detailed and context-aware descriptions. The system analyses objects within the images, their relationships, and the broader scene context. The descriptions are adapted to suit user requirements or specific use cases. The system provides the API endpoint for image capture and description retrieval, ensuring real-time processing. The images and their descriptions are stored temporarily, with efficient management and cleanup processes in place. The device handles multiple live feeds simultaneously and may be scaled to accommodate varying deployment needs.
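As a hedged sketch of this smart eye step, the snippet below sends one captured frame to a vision-capable GPT-4 model for a description, assuming the OpenAI Python SDK (v1) and an OPENAI_API_KEY in the environment; the model name shown is illustrative only.

import base64
import cv2
from openai import OpenAI

client = OpenAI()   # reads OPENAI_API_KEY from the environment

def describe_frame(frame) -> str:
    ok, jpeg = cv2.imencode(".jpg", frame)      # keep the live frame's resolution
    if not ok:
        raise RuntimeError("could not encode the captured frame")
    encoded = base64.b64encode(jpeg.tobytes()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",                         # illustrative vision-capable GPT-4 variant
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe the objects, their relationships and the overall "
                         "scene for a blind user."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Usage: ok, frame = cv2.VideoCapture(0).read(); print(describe_frame(frame))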
WORKING EXAMPLE:
The following are examples of different experimental views:
View 1. As depicted in Figure 5, the user interface application provides a visually intuitive platform for interacting with the wearable device.
View 2. Now referring to Figure 6, the user initiates the login process by connecting the wearable device via the USB cable to the user interface application. Upon successful authentication using a mobile number, the application redirects the user to the home page.
View 3. Figures 7a and 7b illustrate the wearable device's capability to identify objects within its field of view. Equipped with the object detection model, the device accurately detects a wide range of objects, including individuals, furniture, and obstacles. Real-time audio feedback is provided to describe the location and nature of these objects, empowering users to navigate their surroundings with confidence and avoid potential hazards.
Additionally, users may customize accessibility settings to meet their individual needs by utilizing the accessibility mode integrated into the user interface application. The gesture-based commands, which are an in-built feature of the user equipment, may be employed to seamlessly transition between various modes, as detailed in views 4 and 5.
Furthermore, Figure 7c illustrates that the system begins with user interaction, where the user launches the user interface application and directs the camera towards a desired scene or scenario. The mobile camera then captures a real-time video stream and sends it frame-by-frame for object detection. Each frame undergoes preprocessing, including resizing and normalization. The preprocessed frame is subsequently fed into the EfficientDet D0 model, which detects objects within the frame, predicts bounding boxes, and assigns class labels with confidence scores. Post-processing is applied to eliminate overlapping bounding boxes, retaining only the most confident predictions. The detected object labels and confidence scores are extracted, and objects with confidence scores below a predefined threshold are discarded. The names of the detected objects are then overlaid onto the screen in real-time, providing the user with a continuous, real-time object detection experience as the camera captures new frames.
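A simplified sketch of this detection loop is given below; run_efficientdet is a hypothetical stand-in for the EfficientDet D0 model, and OpenCV is assumed for frame capture, non-maximum suppression, and the on-screen overlay.

import cv2
import numpy as np

SCORE_THRESHOLD = 0.5   # discard low-confidence predictions
NMS_THRESHOLD = 0.45    # suppress overlapping bounding boxes

def run_efficientdet(frame_rgb: np.ndarray):
    # Placeholder: a real deployment would run the EfficientDet D0 graph here and
    # return boxes as [x, y, w, h] pixel values, confidence scores and class labels.
    return [], [], []

def detection_loop(max_frames: int = 300) -> None:
    cap = cv2.VideoCapture(0)
    for _ in range(max_frames):        # bounded loop; a real device runs continuously
        ok, frame = cap.read()
        if not ok:
            break
        # Pre-processing: resize to the 512x512 D0 input and convert BGR -> RGB
        # (exact normalisation is model-specific).
        rgb = cv2.cvtColor(cv2.resize(frame, (512, 512)), cv2.COLOR_BGR2RGB)
        boxes, scores, labels = run_efficientdet(rgb)
        # Post-processing: keep only the most confident, non-overlapping predictions.
        keep = cv2.dnn.NMSBoxes(boxes, scores, SCORE_THRESHOLD, NMS_THRESHOLD) if boxes else []
        for i in np.array(keep).flatten():
            x, y, w, h = (int(v) for v in boxes[i])
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame, f"{labels[i]} {scores[i]:.2f}", (x, max(y - 5, 0)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.imshow("detections", frame)      # real-time overlay of detected labels
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()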
View 4. As shown in Figure 8a, the device utilizes advanced camera technology to capture images of printed or handwritten documents. These images are subsequently processed and converted into audio format, enabling users to access information that may otherwise be inaccessible due to visual impairments.
Further, Figure 8b depicts the document processing workflow, wherein the system begins by accepting document uploads via a FastAPI endpoint, supporting a variety of file formats. Document type identification determines the appropriate processing pipeline for text and image extraction. Preprocessing steps, including orientation correction, conversion to images, Optical Character Recognition (OCR) for scanned PDFs, and direct parsing for non-scanned documents, ensure accurate content extraction. Text and images are extracted using Open Neural Network Exchange Text Recognition (OnnxTR) for scanned documents and format-specific libraries for non-scanned documents. Image descriptions are optionally generated using OpenAI Vision. Metadata and document structure are preserved to maintain context and accuracy. The extracted text, images, and metadata are combined into a structured format, such as JSON, for further analysis or storage. Optionally, the extracted data is stored in MongoDB, and vector embeddings are generated and stored for semantic search. Langchain integration enhances retrieval-augmented generation capabilities, enabling users to ask questions based on uploaded documents and receive context-aware answers. The system interacts with the OpenAI API for image descriptions and text processing and with MongoDB Atlas for data storage, and utilizes temporary file storage with subsequent cleanup. The final processed output is returned to the user or another system component in a structured format.
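As an illustrative sketch of the storage step, the snippet below generates vector embeddings for extracted text chunks and stores them in MongoDB for later semantic search; the embedding model name, connection string, and collection names are assumptions, and the Atlas Vector Search index itself is configured separately in MongoDB Atlas.

from openai import OpenAI
from pymongo import MongoClient

openai_client = OpenAI()
mongo = MongoClient("mongodb+srv://user:password@cluster.example.mongodb.net")  # placeholder URI
collection = mongo["doc_ai"]["chunks"]   # illustrative database/collection names

def store_chunks(document_id: str, chunks: list[str]) -> None:
    for order, text in enumerate(chunks):
        embedding = openai_client.embeddings.create(
            model="text-embedding-3-small",   # illustrative embedding model
            input=text,
        ).data[0].embedding
        collection.insert_one({
            "document_id": document_id,       # ties the chunk back to its source document
            "order": order,
            "text": text,
            "embedding": embedding,           # queried later via an Atlas vector search stage
        })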
Now referring to Figure 8c, it demonstrates that the system commences with user input, wherein users submit questions or prompts via the user interface application. The AI Tutor receives this input and incorporates relevant past interactions from the current session. Customization settings are applied, and the processed prompt, along with session context, is sent to the OpenAI API. The API generates a context-aware response that is subsequently formatted according to configuration settings and delivered to the user. The AI Tutor maintains session context and allows for state preservation, enabling users to pause and resume sessions. Sessions may be terminated by the user or upon timeout, with options to save or reset the session context.
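A minimal sketch of such a session-context loop is shown below, assuming the OpenAI Python SDK (v1); the model name and system prompt are illustrative only.

from openai import OpenAI

client = OpenAI()   # reads OPENAI_API_KEY from the environment

class TutorSession:
    def __init__(self, subject_hint: str = "general studies"):
        # The running message list is the session context sent with every prompt.
        self.messages = [{
            "role": "system",
            "content": (f"You are a patient tutor for {subject_hint}. "
                        "Answer clearly and concisely for audio playback."),
        }]

    def ask(self, prompt: str) -> str:
        self.messages.append({"role": "user", "content": prompt})
        response = client.chat.completions.create(
            model="gpt-4o-mini",              # illustrative model choice
            messages=self.messages,
        )
        answer = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": answer})
        return answer

    def reset(self) -> None:
        # Clears the conversational context, e.g. on user request or session timeout.
        self.messages = self.messages[:1]

# Usage: session = TutorSession("mathematics"); print(session.ask("What is a prime number?"))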
View 5. Figure 9a demonstrates the device's ability to accurately identify and provide real-time audio feedback on the denomination and value of banknotes. This functionality is invaluable for independent financial management, ensuring accuracy and minimizing the risk of errors during transactions.
Referring to Figure 9b, it describes how the system initiates with the capture of a real-time video stream utilizing the mobile camera. Individual frames are extracted from this video feed and subsequently preprocessed, involving resizing to 640x640 pixels and normalization of pixel values. The preprocessed frames are then fed into the SSD MobileNet V2 FPNLite 640 model for object detection. The model predicts bounding boxes, class labels (currency type), and confidence scores, with post-processing applied to eliminate overlapping boxes and retain only the most confident predictions. The identified currency labels and corresponding confidence scores are extracted, and predictions below a specified confidence threshold are discarded to ensure accuracy. The detected currency information is then converted into spoken output using a Text-to-Speech (TTS) engine, and the generated audio is played through the mobile device's speaker. This process is continuously repeated, enabling real-time feedback as new frames are captured by the mobile camera.
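The following simplified sketch mirrors this currency-recognition loop; run_currency_detector is a hypothetical stand-in for the SSD MobileNet V2 FPNLite 640 model, and OpenCV and pyttsx3 are assumed for camera access and offline text-to-speech.

import cv2
import numpy as np
import pyttsx3

CONFIDENCE_THRESHOLD = 0.6   # predictions below this score are discarded

def run_currency_detector(frame: np.ndarray):
    # Placeholder: a real deployment would run the SSD MobileNet V2 FPNLite 640
    # graph here and return (labels, scores), e.g. (["100 rupee note"], [0.92]).
    return [], []

def currency_loop(max_frames: int = 300) -> None:
    cap = cv2.VideoCapture(0)          # live video stream from the mobile camera
    engine = pyttsx3.init()            # offline text-to-speech engine
    for _ in range(max_frames):        # bounded loop; a real device runs continuously
        ok, frame = cap.read()
        if not ok:
            break
        # Pre-processing: resize to the model's 640x640 input and normalize pixel values.
        resized = cv2.resize(frame, (640, 640)).astype(np.float32) / 255.0
        labels, scores = run_currency_detector(resized)
        spoke = False
        for label, score in zip(labels, scores):
            if score >= CONFIDENCE_THRESHOLD:
                engine.say(f"{label} detected")   # spoken via the phone speaker
                spoke = True
        if spoke:
            engine.runAndWait()
    cap.release()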
ADVANTAGES OF THE INVENTION
The present invention provides an AI enabled assistive wearable device for blind and visually impaired.
The present invention of an AI enabled assistive wearable device for the blind and visually impaired assists the blind in walking safely on roads.
The present invention provides an AI enabled assistive wearable device that gives blind and visually impaired people access to printed documents and books.
The present invention provides an AI enabled assistive wearable device for blind and visually impaired people that assists blind people in shopping.
The present invention provides an AI enabled assistive wearable device for the blind and visually impaired that helps with cash payments, which can be verified using the assistive device.
The present invention of an AI enabled assistive wearable device for blind and visually impaired is easy to use.
The present invention of an AI enabled assistive wearable device for blind and visually impaired provides a device that is connected to the user equipment via a USB cable, with the same cable powering the device.
The present invention of an AI enabled assistive wearable device for blind and visually impaired provides a device which activates the application for the user without human intervention.
The present invention of an AI enabled assistive wearable device for blind and visually impaired provides a device that uses the user's phone and is powerful enough to deliver the output or result to blind people.
The present invention of an AI enabled assistive wearable device for blind and visually impaired provides a device that works without requiring the internet for image processing on servers.
The present invention of an AI enabled assistive wearable device for blind and visually impaired provides a device that is compact.
The present invention of an AI enabled assistive wearable device for blind and visually impaired provides a device that has a short operating time.
The present invention of an AI enabled assistive wearable device for blind and visually impaired provides a device that gives real-time information.
The present invention of an AI enabled assistive wearable device for blind and visually impaired provides a device that performs video processing using a single camera to provide information and translates the visual world into audio using camera vision.
The present invention of an AI enabled assistive wearable device for blind and visually impaired that works without requiring battery recharge.
The present invention of an AI enabled assistive wearable device for blind and visually impaired individuals increases their access to information and enhances their ability to participate in various activities.
CLAIMS:
We Claim,
1. An Artificial Intelligence (AI)-enabled assistive wearable device for blind and visually impaired, comprising:
- an eyeglass (1);
- an enclosure (3) connected with an arm of the eyeglass (1);
- a USB (Universal Serial Bus) cable (4) operatively fixed with a circuitry of the enclosure (3);
- a user equipment (5) detachably connected with the USB cable (4) of the enclosure (3),
- a circuitry within the enclosure (3), including a USB camera (2) having a resolution of 5 megapixels (MP) and consisting of a lens, a real-time object detection model, a high-performance object detection model, an Open Neural Network Exchange Text Recognition (Onnx TR), an Application Programming Interface (API) model, and a Generative Pre-trained Transformer 4 (GPT-4) model;
said real-time object detection model, comprising the Single Shot MultiBox Detector (SSD) framework, MobileNet V2 backbone, and Feature Pyramid Network Lite (FPNLite) configuration, optimized for speed and accuracy on the user equipment (5), capable of detecting objects of varying sizes and recall rates;
said high-performance object detection model comprising the EfficientDet D0 configuration, for efficient deployment in environments with computational resources, utilizing the EfficientNet backbone with compound scaling to optimize width, depth, and resolution, including CBS (Conv + BN + SiLU) blocks, C3 blocks, SPPF block, neck, and head;
said document processing system utilizes an Onnx TR model for text recognition, OCR, and image analysis, supporting multiple document formats and extracting textual content, images, and metadata, generating searchable PDFs from scanned PDFs using OCR and embedding text within them, processing DOCX and PPT/PPTX documents, utilizing MongoDB for data storage and vector searches, and integrating with Langchain for document processing, retrieval, and question-answering capabilities;
said interactive learning system integrated into the user interface application, including an API for real-time answers and explanations, providing interactive learning, customizable responses, instant information retrieval, multi-disciplinary support, and conversational context retention, processing user prompts, interacting with the API;
said real-time image analysis system includes the GPT-4 model for generating detailed and context-aware descriptions of visual content, integrating with live feeds and supporting high-resolution images, offering API access for image capture and description retrieval;
wherein, said wearable device is configured to capture real-time visual information from the surrounding environment, process the captured information, and provide audio or haptic feedback to the user.
2. The AI-enabled assistive wearable device as claimed in claim 1, wherein the device is capable of capturing video and images and processing the captured images and videos locally without internet connectivity.
3. The assistive wearable device of claim 1, wherein the device includes customizable settings for audio feedback volume and haptic feedback type.
4. The assistive wearable device of claim 1, wherein the device is equipped with an adjustable camera angle to ensure optimal viewing for the user.
5. The assistive wearable device of claim 1, wherein the device includes the object detection model enabling users to identify objects in their surroundings and the currency identification model enabling users to identify and verify currency.
6. The assistive wearable device of claim 1, wherein the device is securely attached to the arm of the eyeglasses (1) using a cable tie (9) and powered by the user equipment’s battery.
Dated this 07th day of October, 2024.