
System And Method For Image Based Indoor Localization And Navigation

Abstract: A system and a method for image based indoor localization and navigation are provided. The method includes receiving media content containing a plurality of frames captured in at least one orientation of the media content generator. A distance between each of the plurality of key points associated with the current frame and each of the plurality of key points associated with a subsequent frame is calculated to determine a change in the distance of each key point from the center point of the current frame and the subsequent frame. A topological map is generated based on the change in the distance of each key point and the orientation of the current frame. Loop closure for the current frame is detected to generate a navigation map corresponding to the topological map by identifying the current frame as a previously identified landmark in an environment based on the loop closure detection.


Patent Information

Application #: 201621027377
Filing Date: 10 August 2016
Publication Number: 07/2018
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email: ip@legasis.in
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2023-07-28
Renewal Date:

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point, Mumbai-400021, Maharashtra, India

Inventors

1. GUPTA, Gaurav
Tata Consultancy Services Limited, Block C, Kings Canyon, ASF Insignia, Gurgaon-Faridabad Road, Gawal Pahari, Gurgaon-122003, Haryana, India
2. KEJRIWAL, Nishant
Tata Consultancy Services Limited, Galaxy Business Park, Plot - A 44 & 45, Sector-62, Noida - 201309, Uttar Pradesh, India
3. PALLAV, Prasun
Tata Consultancy Services Limited, Galaxy Business Park, Plot - A 44 & 45, Sector-62, Noida - 201309, Uttar Pradesh, India
4. EHTESHAM, Hassan
Tata Consultancy Services Limited, Block C, Kings Canyon, ASF Insignia, Gurgaon-Faridabad Road, Gawal Pahari, Gurgaon-122003, Haryana, India
5. KUMAR, Swagat
Tata Consultancy Services Limited, Galaxy Business Park, Plot - A 44 & 45, Sector-62, Noida - 201309, Uttar Pradesh, India

Specification

Claims:

1. A computer-implemented method comprising:
receiving media content comprising a plurality of frames captured in at least one orientation of a media content generator;
determining a center point of a current frame from amongst the plurality of frames and a plurality of key points associated with the current frame;
calculating a distance between each of the plurality of key points associated with the current frame and each of the plurality of key points associated with a subsequent frame to determine a change in the distance of each key point from the center point of the current frame and the subsequent frame;
determining an orientation of the current frame;
generating a topological map based on the change in the distance of each key point and the orientation of the current frame, wherein the topological map comprises landmarks of a location;
detecting a loop closure for the current frame, wherein the detecting comprises:
extracting visual features for each node from a plurality of nodes of the current frame and assigning words corresponding to the visual features to obtain a dictionary comprising a set of visual words; and
determining a similarity index by a similarity measure for each node from the plurality of nodes of the current frame with one or more previous frames of the plurality of frames; and
dynamically generating a navigation map corresponding to the topological map by identifying the current frame as a previously identified landmark in an environment based on the loop closure detection.
2. The method as claimed in claim 1, wherein the plurality of frames comprises one of a video and a set of images.
3. The method as claimed in claim 1, wherein detecting the loop closure is performed by a Linear Chain Conditional Random Field (L-CRF).

4. The method as claimed in claim 1, wherein extracting the visual features further comprises extracting SURF (Speeded Up Robust Features) descriptors for each node of the current frame and matching them to corresponding words in the dictionary.

5. The method as claimed in claim 1, wherein the similarity index is indicative of a similarity between the words of the current frame and the one or more previous frames, wherein the similarity between the words is based on a score associated with common words in the current frame and the one or more previous frames.

6. The method as claimed in claim 1, wherein determining the change in the distance of each key point further comprises:
combining the determined change in the distance with a change in the distance determined by an accelerometer to obtain a final change in the distance.

7. The method as claimed in claim 6, further comprising:
applying a Kalman filter on the final change in the distance to obtain a filtered distance.

8. The method as claimed in claim 1, wherein the detecting further comprises:
assigning a weightage to each visual word based on frequency and time of occurrence of the word; and
removing words from the dictionary based on the weightage assigned to each visual word.

9. A computer implemented system comprising:
a visual odometer configured to:
determine a center point of a current frame and a plurality of key points associated with the current frame; and
calculate a distance between each of the plurality of key points associated with the current frame and each of the plurality of key points associated with a subsequent frame to determine a change in the distance of each key point from the center point of the current frame and the subsequent frame,
wherein the current frame and the subsequent frame are captured in at least one orientation of a media content generator;
an IMU (Inertial Measurement Unit) sensor, communicatively coupled to the visual odometer, configured to determine orientation of the current frame;
a computing device comprising at least one memory and at least one processor, the at least one memory, the visual odometer and the IMU sensor communicatively coupled to the at least one processor, wherein the at least one processor is capable of executing programmed instructions stored in the at least one memory to:
generate a topological map based on the change in the distance of each key point and the orientation of the current frame, wherein the topological map comprises landmarks of a location;
detect a loop closure for the current frame, wherein the detection comprises:
extracting visual features for each node from a plurality of nodes of the current frame and assigning words corresponding to the visual features to obtain a dictionary comprising a set of visual words; and
determining a similarity index by a similarity measure for each node from the plurality of nodes of the current frame with one or more previous frames of the plurality of frames; and
dynamically generate a navigation map corresponding to the topological map by identifying the current frame as a previously identified landmark in an environment based on the loop closure detection.

10. The system as claimed in claim 9, wherein the media content generator generates a plurality of frames, the plurality of frames including one of a video and a set of images.

11. The system as claimed in claim 9, wherein the IMU sensor comprises a gyroscope, a magnetometer and an accelerometer.

12. The system as claimed in claim 9, wherein extracting the visual features further comprises extracting SURF (Speeded Up Robust Features) descriptors for the current frame and quantizing them to corresponding words in the dictionary.

13. The system as claimed in claim 9, wherein the similarity measure is indicative of a similarity between the words of the current frame and the one or more previous frames, wherein the similarity between the words is based on a score associated with common words in the current frame and the one or more previous frames.

14. The system as claimed in claim 9, wherein determining the change in distance between each key point further comprises:
combining the change in the distance determined with a change in the distance determined by an accelerometer to obtain a final change in the distance.

15. The system as claimed in claim 14, wherein a Kalman filter is applied on the final change in the distance to obtain a filtered distance.

16. The system as claimed in claim 9, wherein the detecting further comprises:
assigning a weightage to each visual word based on frequency and time of occurrence of the word; and
removing words from the dictionary based on the weightage assigned to each visual word.

17. The system as claimed in claim 9, wherein the words are extracted from a K-D tree.
Description:

FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003

COMPLETE SPECIFICATION
(See section 10 and rule 13)

Title of invention:
SYSTEM AND METHOD FOR IMAGE BASED INDOOR LOCALIZATION AND NAVIGATION

Applicant:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India

The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
The embodiments herein generally relate to navigation, and in particular to methods and systems for image-based indoor localization and navigation.

BACKGROUND
Generally, in navigation, GPS technology works best in outdoor environments; however, in scenarios with occluding buildings, trees, roofs and other such objects, the signal from the satellite is attenuated, resulting in weak localization. The error range for indoor GPS localization can sometimes exceed the indoor space itself. Numerous approaches for indoor localization include the use of indoor local references, i.e. visual landmark points, Wi-Fi/Bluetooth based localization and Inertial Measurement Units (IMU). IMU based positioning utilizes accelerometer, gyroscope and magnetic field sensor readings for localization. The Wi-Fi/Bluetooth based approach uses the intensity of the received signal and fingerprinting for localization. The accuracy of the latter method is strictly dependent on the number of access points placed, and calibration with the same access point locations is required in the testing phase. An IMU based positioning system can ideally give the location of a user using orientation values from the gyroscope and double-integrated accelerometer values, provided an initial point is given.

SUMMARY
The following presents a simplified summary of some embodiments of the disclosure in order to provide a basic understanding of the embodiments. This summary is not an extensive overview of the embodiments. It is not intended to identify key/critical elements of the embodiments or to delineate the scope of the embodiments. Its sole purpose is to present some embodiments in a simplified form as a prelude to the more detailed description that is presented below.
In view of the foregoing, an embodiment herein provides a system and method for image based indoor localization and navigation. In an embodiment, the computer implemented method includes receiving media content having a plurality of frames captured in at least one orientation of the media content generator, determining a center point of a current frame from amongst the plurality of frames and a plurality of key points associated with the current frame, and calculating a distance between each of the plurality of key points associated with the current frame and each of the plurality of key points associated with a subsequent frame to determine a change in the distance of each key point from the center point of the current frame and the subsequent frame. The method includes determining an orientation of the current frame and generating a topological map based on the change in the distance of each key point and the orientation of the current frame, wherein the topological map comprises landmarks of a location. The method further includes detecting a loop closure for the current frame by extracting visual features for each node from a plurality of nodes of the current frame and assigning words corresponding to the visual features to obtain a dictionary comprising a set of visual words, and determining a similarity index by a similarity measure for each node from the plurality of nodes of the current frame with one or more previous frames of the plurality of frames. The method further includes dynamically generating a navigation map corresponding to the topological map by identifying the current frame as a previously visited landmark in an environment based on the loop closure detection.
In another aspect, an image based indoor localization and navigation system is provided. The system includes a visual odometer configured to determine a center point of a current frame and a plurality of key points associated with the current frame, and to calculate a distance between each of the plurality of key points associated with the current frame and each of the plurality of key points associated with a subsequent frame to determine a change in the distance of each key point from the center point of the current frame and the subsequent frame, wherein the current frame and the subsequent frame are captured in at least one orientation of a media content generator. The system includes an IMU (Inertial Measurement Unit) sensor, communicatively coupled to the visual odometer, configured to determine an orientation of the current frame. The system includes a computing device comprising at least one memory and at least one processor, the at least one memory, the visual odometer and the IMU sensor communicatively coupled to the at least one processor, wherein the at least one processor is capable of executing programmed instructions stored in the at least one memory to: generate a topological map based on the change in the distance of each key point and the orientation of the current frame, wherein the topological map comprises landmarks of a location; detect a loop closure for the current frame by extracting visual features for each node from a plurality of nodes of the current frame and assigning words corresponding to the visual features to obtain a dictionary comprising a set of visual words, and determining a similarity index by a similarity measure for each node from the plurality of nodes of the current frame with one or more previous frames of the plurality of frames; and dynamically generate a navigation map corresponding to the topological map by identifying the current frame as a previously visited landmark in an environment based on the loop closure detection.
In yet another aspect, a non-transitory computer-readable medium having embodied thereon a computer program for executing a method for image based indoor localization and navigation is provided. In an embodiment, the method includes receiving media content comprising a plurality of frames captured in at least one orientation of the media content generator; determining a center point of a current frame from amongst the plurality of frames and a plurality of key points associated with the current frame; calculating a distance between each of the plurality of key points associated with the current frame and each of the plurality of key points associated with a subsequent frame to determine a change in the distance of each key point from the center point of the current frame and the subsequent frame; determining an orientation of the current frame; and generating a topological map based on the change in the distance of each key point and the orientation of the current frame, wherein the topological map comprises landmarks of a location. The method further includes detecting a loop closure for the current frame by extracting visual features for each node from a plurality of nodes of the current frame and assigning words corresponding to the visual features to obtain a dictionary comprising a set of visual words, and determining a similarity index by a similarity measure for each node from the plurality of nodes of the current frame with one or more previous frames of the plurality of frames. The method further includes dynamically generating a navigation map corresponding to the topological map by identifying the current frame as a previously visited landmark in an environment based on the loop closure detection.
BRIEF DESCRIPTION OF THE FIGURES
The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to reference like features and modules.
FIG. 1 illustrates a process flow for determining a change in the distance between key points, in accordance with an example embodiment;
FIG. 2 illustrates a block diagram for image-based indoor localization and navigation, in accordance with an example embodiment;
FIG. 3 illustrates a flow diagram for loop closure detection, in accordance with an example embodiment;
FIG. 4 illustrates a Linear Chain CRF (Conditional Random Field) representation, in accordance with an example embodiment; and
FIG. 5 illustrates a flow diagram of a method for image-based indoor localization and navigation, in accordance with an example embodiment.
It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems and devices embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
DETAILED DESCRIPTION
The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
The present disclosure relates to a system and method for image-based indoor localization and navigation. Image-based localization and navigation provides more information than the various sensor-based navigation approaches that exist in the market today. An indoor localization system is necessary to assist a person navigating inside a building, particularly when the person wants to explore a new indoor environment.
Various techniques are used for determining the change in distance between two key points in a frame. Visual odometry is one such example: visual odometry data provides information about the ego motion of the capturing device by using continuous images from an image capturing unit. Visual odometry, in addition to image based loop closure, provides complementary information for traversing an unknown path. An example of determining the change in distance between key points in at least two frames is presented hereunder.
Referring now to the drawings, and more particularly to FIGS. 1 through 5, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments, and these embodiments are described in the context of the following exemplary system and method.
FIG. 1 illustrates a process flow 100 for determining a change in the distance between key points, in accordance with an example embodiment. At 102, a current image, for example an Image (I1) as shown in FIG. 1, is considered for determining a change in the distance. The Image (I1) can be received from a media content generator; for example, the Image (I1) may be one of a plurality of frames or a video. The Image (I1) is checked against a condition. In one example, at 112, if condition A is satisfied, that is, if |Oi| > 1 or the number of key points has fallen to 75% of the original key points, a fresh initialization of key points in the Image (I1) is performed. If condition A is not satisfied, then at block 104 the new key points are determined based on the optical flow of the previous frame's key points, for example, via a visual odometer. The original key points refer to the number of key points in a location at the latest previous key point initialization. The input Image (I1) is used for visual odometry based distance estimation. Herein, the distance is estimated by calculating the distance between each of the plurality of key points associated with the current frame and each of the plurality of key points associated with a subsequent frame to determine a change in the distance of each key point from the center point of the current frame and the subsequent frame. At 106, the distance estimated by visual odometry is normalized by summing up the distance measure for all key points and dividing the sum by the total number of key points present in the current image.
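The following is a minimal sketch of this key point tracking and normalization step, assuming OpenCV and NumPy; the function names, the corner detector parameters and the 200 key point budget are illustrative assumptions, not values from the specification.

```python
import cv2
import numpy as np

def track_keypoints(prev_gray, curr_gray, prev_pts):
    """Propagate key points to the current frame via sparse optical flow (block 104)."""
    curr_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts.astype(np.float32), None)
    good = status.reshape(-1) == 1            # keep only successfully tracked points
    return prev_pts[good], curr_pts[good]

def normalized_distance_change(prev_pts, curr_pts, frame_shape):
    """Sum the per-key-point change in distance from the frame center,
    then divide by the number of key points (the normalization at 106)."""
    h, w = frame_shape[:2]
    center = np.array([w / 2.0, h / 2.0])
    d_prev = np.linalg.norm(prev_pts.reshape(-1, 2) - center, axis=1)
    d_curr = np.linalg.norm(curr_pts.reshape(-1, 2) - center, axis=1)
    return float(np.sum(np.abs(d_curr - d_prev)) / len(d_curr))

def maybe_reinitialize(curr_gray, pts, original_count):
    """Condition A: re-seed key points when too many have been lost."""
    if pts is None or len(pts) <= 0.75 * original_count:
        pts = cv2.goodFeaturesToTrack(curr_gray, maxCorners=200,
                                      qualityLevel=0.01, minDistance=7)
    return pts
```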
At 108, an accelerometer is used for inertial sensor based distance estimation. At 110, the inertial sensor based estimate is calculated by double integration applied over the current frame and the subsequent frame. The normalized distance calculated by the visual odometer at 106 and the distance calculated by the accelerometer via double integration at 110 are combined at 114 to obtain a fused distance 116. The distance calculated from the optical flow at 104 drifts with time, and the distance from the accelerometer is noisy and jittery in nature. Hence, the normalized distance from 106 and the distance from 110 are combined by a linear combination of the data to obtain the fused distance. In an example embodiment, a Kalman filter is applied on the fused distance to obtain a final filtered change in distance measurement.
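A sketch of the fusion and filtering step follows, continuing the assumptions above; the combination weight and the noise variances are illustrative, since the specification does not state them.

```python
import numpy as np

def double_integrate(accel_samples, dt):
    """Inertial distance estimate (the step at 110): integrate the
    acceleration samples twice between the two frames."""
    velocity = np.cumsum(np.asarray(accel_samples) * dt)
    return float(np.sum(velocity * dt))

def fuse_distances(d_visual, d_inertial, w_visual=0.7):
    """Linear combination (the step at 114): the visual estimate drifts,
    the inertial one is jittery, so blend the two."""
    return w_visual * d_visual + (1.0 - w_visual) * d_inertial

class Kalman1D:
    """Scalar Kalman filter applied to the fused change-in-distance signal."""
    def __init__(self, process_var=1e-3, measurement_var=1e-1):
        self.x, self.p = 0.0, 1.0          # state estimate and its variance
        self.q, self.r = process_var, measurement_var

    def update(self, z):
        self.p += self.q                   # predict (constant-state model)
        k = self.p / (self.p + self.r)     # Kalman gain
        self.x += k * (z - self.x)         # correct with the fused measurement
        self.p *= (1.0 - k)
        return self.x
```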
At 118, the orientation of the media content generator is determined by locating the position of the media content generator. The combination of the change in distance measurement and the orientation data determines the position of the media content generator, as shown at 120. The mapping of the image is done only if Condition B is true, i.e., if |change in fused distance| > δ, where δ is the least distance between an old (previously visited) landmark and a new landmark at which loop closure is detected. The loop closure detection is further explained in detail in FIG. 3 and FIG. 4.
FIG. 2 is a block diagram of a system 200 for image-based indoor localization and navigation, in accordance with an example embodiment. The system 200 includes a visual odometer 202, an IMU sensor 204 and a computing device 208. For example, the IMU sensor 204 may include a gyroscope, a magnetometer and an accelerometer. The computing device 208 includes or is otherwise in communication with at least one memory such as a memory 210, at least one processor such as a processor 212, and a communication interface 214. The processor 212, the memory 210, and the communication interface 214 may be coupled by a system bus such as a system bus 218 or a similar mechanism. Although FIG. 2 shows example components of the computing device 208, in other implementations, the system 200 may contain fewer components, additional components, different components, or differently arranged components than depicted in FIG. 2.
The at least one processor such as the processor 212 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Further, the processor 212 may comprise a multi-core architecture. Among other capabilities, the processor 212 is configured to fetch and execute computer-readable instructions or modules stored in the memory 210. The processor 212 may include circuitry implementing, among others, audio and logic functions associated with the communication. For example, the processor 212 may include, but is not limited to, one or more digital signal processors (DSPs), one or more microprocessors, one or more special-purpose computer chips, one or more field-programmable gate arrays (FPGAs), one or more application-specific integrated circuits (ASICs), one or more computer(s), various analog to digital converters, digital to analog converters, and/or other support circuits. The processor 212 thus may also include the functionality to encode messages and/or data or information. The processor 212 may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor 212. Further, the processor 212 may include functionality to execute one or more software programs, which may be stored in the memory 210 or otherwise accessible to the processor 212.
The memory 210 may store any number of pieces of information, and data, used by the computing device 208 to implement the functions of the computing device 208. The memory 210 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. Examples of volatile memory may include, but are not limited to, volatile random access memory (RAM). The non-volatile memory may additionally or alternatively comprise an electrically erasable programmable read only memory (EEPROM), flash memory, hard drive, or the like. The memory 210 may be configured to store information, data, applications, instructions or the like for enabling the computing device 208 to carry out various functions in accordance with various example embodiments. Additionally or alternatively, the memory 210 may be configured to store instructions which when executed by the processor 212 cause the computing device 208 to behave in a manner as described in various embodiments.
The communication interface(s) 214 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as Wireless LAN (WLAN), cellular, or satellite. For this purpose, the communication interface(s) 214 may include one or more ports for connecting the computing device 208 to the visual odometer 202 and the IMU sensor 204.
In an embodiment, the media content generator 216 may include an image capturing module, such as a camera, video and/or audio module, in communication with the processor 212. The media content generator 216 may be any means for facilitating the capture of images, video and/or audio for storage, display, or transmission. For example, in an exemplary embodiment the media content generator 216 may be embodied in a camera, and the camera may be configured to form and save a digital image file from an image of a location captured by the camera. The media content generator 216 may include hardware such as CMOS/CCD (complementary metal-oxide semiconductor/charge coupled device) sensors configured for capturing images. In an embodiment, the media content generator 216 may be configured to capture media items in accordance with a number of capture settings. The capture settings may include, for example, focal length, zoom level, lens type, aperture, shutter timing, white balance, color, style (e.g., black and white, sepia, or the like), picture quality (e.g., pixel count), flash, date, time, or the like. Alternatively, the values of the capture settings (e.g., degree of zoom) may be obtained at the time a media item comprising the marker image is captured and stored in association with the captured media item in a memory device, such as the memory 210.
The media content generator 216 may also include all hardware, such as circuitry, a lens or other optical component(s), and software to provide various image capturing functionality, such as, for example, image zooming functionality. Image zooming functionality can include the ability to magnify or de-magnify an image prior to or subsequent to capturing an image. Alternatively, according to some example embodiments, the image sensor may include only the hardware needed to view an image, while a memory device, such as the memory device of the computing device 208, stores instructions for execution by the processor 212 in the form of software to create a digital image file from a captured image. In an exemplary embodiment, the media content generator 216 may further include a processor or co-processor which assists the processor 212 in processing image data, and an encoder and/or decoder for compressing and/or decompressing image data. The encoder and/or decoder may encode and/or decode according to, for example, a joint photographic experts group (JPEG) standard or other format.
In operation, the media content generator 216 generates media content such as videos and/or images. The content generated may be in the form of a plurality of frames. The media content may be captured in at least one orientation of the media content generator 216. A center point of a current frame and a plurality of key points associated with the current frame are determined. The distance between each of the plurality of key points associated with the current frame (I1) and each of the plurality of key points associated with a subsequent frame is determined using the method described in FIG. 1. The change in the distance of each key point from the center point of the current frame and the subsequent frame is calculated based on the method described in FIG. 1.
The orientation of the current frame is determined by an orientation sensor. In one implementation, the IMU (Inertial Measurement Unit) sensor 204, communicatively coupled to the visual odometer, is configured to determine the orientation of the current frame. The combination of the distance measurement from the previous calculations and the orientation data determines the position of the media content generator 216 and thereby the orientation of the current frame. A topological map based on the change in the distance of each key point and the orientation of the current frame is generated by the computing device 208. The topological map comprises landmarks of a location in a given environment. Further, loop closure for the current frame is detected by the computing device 208. The method of loop closure detection is further described in FIG. 3 and FIG. 4.
FIG. 3 illustrates a flow diagram for loop closure detection, in accordance with one of the embodiments. A Linear Chain Conditional Random Field (L-CRF) is applied to detect loop closure based on a sequence of observations instead of a single observation. In general, the decision of loop closure is taken based on the last observation. In the present invention, in the topological mapping, images are received continuously, so a sequence of past observations provides a better basis for loop closure detection, implemented using the L-CRF. Hence, the present method increases the robustness and accuracy of loop closure detection. The current frame includes a plurality of nodes. For each node from the plurality of nodes, visual features are extracted. Thereafter, words corresponding to the visual features are assigned to obtain a dictionary containing a set of visual words. For example, SURF (Speeded Up Robust Features) descriptors are used as image features for extracting the visual features of each node from the plurality of nodes. For each image, for example the current image, the SURF (128 dimensional) features are extracted. For each of the extracted features, a nearest neighbor search is performed in the dictionary of words using a K-D tree (K-Dimensional tree). A visual word dictionary and a word-pair dictionary are created via the dictionary of words and a multimap data structure, respectively. Both dictionaries are updated incrementally as and when a new image (new landmark) is captured in the environment.
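A sketch of this word assignment step is given below, assuming opencv-contrib (SURF is a non-free module) and SciPy's k-d tree; the matching threshold and the rebuild-on-insert strategy are simplifying assumptions, not the patented design.

```python
import cv2
import numpy as np
from scipy.spatial import cKDTree

surf = cv2.xfeatures2d.SURF_create(extended=True)  # 128-dim descriptors

def assign_words(image_gray, dictionary, threshold=0.25):
    """Map each SURF descriptor of the image to its nearest visual word,
    minting a new word when no existing word is close enough."""
    _, descriptors = surf.detectAndCompute(image_gray, None)
    if descriptors is None:
        return []
    words = []
    tree = cKDTree(np.asarray(dictionary)) if dictionary else None
    for desc in descriptors:
        if tree is not None:
            dist, idx = tree.query(desc)      # nearest-neighbor search
            if dist < threshold:              # close enough: reuse the word
                words.append(int(idx))
                continue
        dictionary.append(desc)               # otherwise mint a new word
        words.append(len(dictionary) - 1)
        tree = cKDTree(np.asarray(dictionary))  # naive incremental rebuild
    return words
```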
In one implementation, the words in the dictionary are matched with the word-pairs present in the current image. Based on the matching of the words in the dictionary and the word-pairs, a tf-idf (term frequency – inverse document frequency) score for each word and word-pair is generated and used in determining the similarity measure between the current image and one or more previous frames of the plurality of frames. Herein, the nodes of the current frame are matched with the plurality of nodes of the one or more previous frames. The similarity measure between two images is calculated based on the tf-idf scores of the common words found in both images. The tf-idf score also indicates the importance and occurrence of a word in the dictionary. The similarity measure is further described in detail in FIG. 4. Based on the computed similarity measure, a combined likelihood is used to determine the loop closure probability of each node of the current frame.
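The sketch below shows one conventional way to compute the tf-idf scores and the resulting similarity over common words; the specification does not spell out the exact weighting, so these are the standard bag-of-visual-words formulas, presented as an assumption.

```python
import math
from collections import Counter

def tfidf_scores(word_ids, doc_freq, num_images):
    """tf-idf weight for every visual word observed in one image.
    doc_freq maps a word to the number of images it has appeared in."""
    counts = Counter(word_ids)
    total = sum(counts.values())
    return {w: (c / total) * math.log(num_images / (1 + doc_freq.get(w, 0)))
            for w, c in counts.items()}

def similarity(scores_a, scores_b):
    """Similarity measure from the tf-idf scores of the words common to
    both images; a higher value suggests the frames show the same place."""
    common = set(scores_a) & set(scores_b)
    return sum(scores_a[w] + scores_b[w] for w in common)
```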
In one implementation, the size of the word-pair dictionary may be limited by removing very old and rarely used information from the map. A weight is assigned to each word in the dictionary based on its frequency and time of occurrence. Since a fixed size word dictionary is used, the memory requirement of the system 200 does not grow linearly, and the computation time is constant over time. The computation time may be reduced further by reducing the word-pair dictionary size. A forgetting factor is implemented in the dictionaries to limit the computation time per image by removing old and rarely used words.
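A possible shape for this forgetting factor is sketched below; the exponential decay law and the constants are assumptions, chosen only to show how frequency and recency can be combined into a pruning weight.

```python
import math

def prune_dictionary(word_stats, now, max_size, decay=0.01):
    """word_stats maps word_id -> (frequency, last_seen_time). Keep the
    max_size words with the highest recency-weighted frequency."""
    if len(word_stats) <= max_size:
        return word_stats
    weight = {w: f * math.exp(-decay * (now - t))   # old, rare words decay away
              for w, (f, t) in word_stats.items()}
    keep = sorted(word_stats, key=weight.get, reverse=True)[:max_size]
    return {w: word_stats[w] for w in keep}
```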
At 302, if the condition of the loop closure is satisfied, then bundle adjustment is performed, thereby correcting and/or refining the approximation of a user's location in the current frame. Herein, the loop closure indicates a previously visited landmark in the given environment, and in such a case of revisiting the landmark (i.e., when loop closure is satisfied), the bundle adjustment is performed, thereby correcting and/or refining the generated topological map. At 306, the current image and the location associated with the current image are saved as a previously identified landmark. In another implementation, if the loop closure is not satisfied, then the current image is considered a new landmark, and the algorithm timestamps and captures the current image as a new image (new landmark), and further processes and saves the new image in the database (not shown in the figure).
FIG. 4 illustrates a Linear Chain CRF representation, in accordance with an example embodiment. The topological map generated includes distinct landmarks in the environment. The connections between the distinct landmarks are based on the relative positions of the landmarks in the topological map. As shown in FIG. 4, the plurality of nodes of the current frame in the topological map are represented by hidden states in the L-CRF as S0, S1, ..., St-1, St, and the observations are represented by O1, ..., Ot-1, Ot. Herein, the observations refer to the bag of word pairs of each node in the map with the current frame. Each of the hidden states is defined based on the tf-idf score of each word and word-pair present in the node. For example, a current state of the user depends on its previous state in the map at time t-1 and the observation of the current state. The observation vector for the current frame will be (O1, O2, ..., On), where n is the number of nodes in the current frame, such that at any timestamp, Ot = (O1, O2, ..., On). Therefore, a clique defined for the L-CRF consists of St-1, St and Ot, and the conditional distribution over hidden states is defined as:
P(S0:T | O1:T) = (1/Z(X)) ∏_{t=1}^{T} Φt(St, St-1, Ot) ---- (1)
In an example embodiment, each state St can assume one of two values: {Loop Closure, New Node}. Herein, the loop closure indicates a previously visited landmark and the new node indicates a new landmark. The observations Ot consist of the tf-idf scores of each word and word-pair present in each node. Φt is a clique potential function which incorporates the observations and a predefined transition model. Further, a forward-backward algorithm is used for inference in the L-CRF.
The forward probabilities are calculated as follows.
Ft-3 = Ft-4 Tt-3 Ot-3 ---- (2)
Ft-2 = Ft-3 Tt-2 Ot-2 ---- (3)
Ft-1 = Ft-2 Tt-1 Ot-1 ---- (4)
Ft = Ft-1Tt Ot ---- (5)
Similarly, the backward probabilities are calculated and both the forward and backward probabilities are combined to form the final probability of the current image. The transition matrix Tt is defined based on certain rules. The observation matrix Ot is defined based on the tf-idf scores of each node.
The certain rules of the transition matrix Tt are defined as follows:
Formulation of P(St | St-1), i.e. the transition probabilities from time t-1 to t. For example, K nodes (N1, N2, ..., NK) are considered in the map at time t-1.
P(St = New place | St-1 = New place) = 0.9
P(St = Ni | St-1 = New place) = 0.1/K, where i ∈ {1, ..., K}.
P(St = New place | St-1 = Nj) = 0.1, where j ∈ {1, ..., K}.
P(St = Ni | St-1 = Nj) = uniform probability distribution over the immediate neighbours i ∈ {j-1, j, j+1}, where i, j ∈ {1, ..., K}.
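A sketch of the forward recursion using these transition rules follows; the observation scores per state would come from the tf-idf similarity above. Spreading the remaining 0.9 probability mass from an existing node uniformly over its immediate neighbours, and normalizing at each step to avoid underflow, are assumptions the rules imply but do not state.

```python
import numpy as np

def transition_matrix(K):
    """States: 0 = new place, 1..K = existing map nodes N1..NK."""
    T = np.zeros((K + 1, K + 1))
    T[0, 0] = 0.9                        # P(new | new)
    T[0, 1:] = 0.1 / K                   # P(Ni | new)
    for j in range(1, K + 1):
        T[j, 0] = 0.1                    # P(new | Nj)
        neighbours = [i for i in (j - 1, j, j + 1) if 1 <= i <= K]
        for i in neighbours:             # uniform over immediate neighbours
            T[j, i] = 0.9 / len(neighbours)
    return T

def forward(observations, K):
    """F_t = F_{t-1} T O_t, as in equations (2)-(5), with per-step
    normalization; observations is a sequence of per-state score vectors."""
    T = transition_matrix(K)
    f = np.full(K + 1, 1.0 / (K + 1))    # uniform prior over states
    for obs in observations:
        f = (f @ T) * np.asarray(obs)
        f /= f.sum()
    return f                             # posterior over {new place, N1..NK}
```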
With a growing map size, the computation time per image increases linearly and becomes intractable for large maps. The loop closure algorithm computation, in accordance with the present embodiment, mainly depends on the size of the dictionary. For a fixed size dictionary, only the most recent and frequent words in the dictionary are retained. The aforementioned forgetting factor is implemented to remove old and rarely used words. Once the size of the dictionary reaches a specified threshold, the computation time per image becomes constant.
Although the present subject matter is explained considering the system 200 being implemented as a single device, it may be understood that the system 200 may also be implemented as a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a network server, a tablet, a mobile phone, a robot and the like.
FIG. 5 illustrates a flow diagram of a method 500 for image-based indoor localization and navigation, in accordance with an example embodiment. At block 502, media content containing a plurality of frames is received. The media content is captured in at least one orientation of the media content generator. In an example embodiment, the plurality of frames may include a video or a set of images. At block 504, a center point of a current frame from amongst the plurality of frames and a plurality of key points associated with the current frame are determined. At block 506, the distance between each of the plurality of key points associated with the current frame and each of the plurality of key points associated with a subsequent frame is calculated. A change in the distance of each key point from the center point of the current frame and the subsequent frame is determined based on the calculated distance. The determined change in the distance is a combination of a change in the distance determined by the visual odometry and a change in the distance determined by an accelerometer, yielding a final change in the distance. A Kalman filter is applied on the final change in the distance to obtain a filtered distance.
At block 508, the orientation of the current frame with respect to the orientation of the media content generator is determined. A topological map is generated based on the change in the distance of each key point and the orientation of the current frame. The topological map comprises distinct landmarks of a location in a given environment.
At block 510, loop closure for the current frame is detected. The loop closure detection includes extracting visual features for each node from a plurality of nodes of the current frame and assigning words corresponding to the visual features to obtain a dictionary comprising a set of visual words, and determining a similarity index by a similarity measure for each node from the plurality of nodes of the current frame with one or more previous frames of the plurality of frames. The visual features of each node of the current frame are extracted using the SURF descriptor. The similarity index is indicative of a similarity between the words of the current frame and the one or more previous frames. The similarity between the words is based on a score associated with common words in the current frame and the one or more previous frames. Herein, a weightage is assigned to each visual word based on the frequency and time of occurrence of the word, and based on the weightage, less commonly occurring words are removed.
At block 512, a navigation map corresponding to the topological map is dynamically generated by identifying the current frame as a previously identified landmark in an environment based on the loop closure detection.
The order in which the method(s) are described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 500, or an alternative method. Additionally, individual blocks may be deleted from the methods without departing from the scope of the subject matter described herein. Furthermore, the method 500 can be implemented in any suitable hardware, software, firmware, or combination thereof.
In an implementation, one or more of the method(s) described herein may be implemented at least in part as instructions embodied in a non-transitory computer-readable medium and executable by one or more computing devices. In general, a processor (for example a microprocessor) receives instructions from a non-transitory computer-readable medium, for example a memory, and executes those instructions, thereby performing one or more method(s), including one or more of the method(s) described herein. Such instructions may be stored and/or transmitted using any of a variety of known computer-readable media. It is, however, to be understood that the scope of the protection extends to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device.
In various embodiments of FIGS. 1-5, a method and system for image-based indoor localization and navigation are disclosed. The system assists a person navigating inside a building, particularly when the person wants to explore a new indoor environment. The system and method further facilitate an effective mechanism to learn maps dynamically, i.e., without prior knowledge of the indoor environment to be navigated. Further, the visual odometry data provides information about the ego motion of the capturing device by using continuous images. Visual odometry, in addition to image based loop closure, provides complementary information for traversing an unknown path. The loop closure is based on a sequence of observations instead of a single observation, which increases the robustness and accuracy of loop closure detection.
Various embodiments disclosed herein present a method and system for image-based indoor localization and navigation that significantly improve the computation time per image and reduce memory usage.
The preceding description has been presented with reference to various embodiments. Persons having ordinary skill in the art and technology to which this application pertains appreciate that alterations and changes in the described structures and methods of operation can be practiced without meaningfully departing from the principle, spirit and scope.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope and spirit of disclosed embodiments being indicated by the following claims.
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.

Documents

Application Documents

# Name Date
1 Form 3 [10-08-2016(online)].pdf 2016-08-10
2 Form 20 [10-08-2016(online)].jpg 2016-08-10
3 Form 18 [10-08-2016(online)].pdf_118.pdf 2016-08-10
4 Form 18 [10-08-2016(online)].pdf 2016-08-10
5 Drawing [10-08-2016(online)].pdf 2016-08-10
6 Description(Complete) [10-08-2016(online)].pdf 2016-08-10
7 Other Patent Document [17-08-2016(online)].pdf 2016-08-17
8 Form 26 [20-09-2016(online)].pdf 2016-09-20
9 201621027377-Correspondence-190816.pdf 2018-08-11
10 201621027377-Correspondence-260916.pdf 2018-08-11
11 201621027377-Form 1-190816.pdf 2018-08-11
12 201621027377-Power of Attorney-260916.pdf 2018-08-11
13 ABSTRACT1.JPG 2018-08-11
14 201621027377-CLAIMS [28-02-2021(online)].pdf 2021-02-28
15 201621027377-COMPLETE SPECIFICATION [28-02-2021(online)].pdf 2021-02-28
16 201621027377-FER_SER_REPLY [28-02-2021(online)].pdf 2021-02-28
17 201621027377-OTHERS [28-02-2021(online)].pdf 2021-02-28
18 201621027377-FER.pdf 2021-10-18
19 201621027377-US(14)-HearingNotice-(HearingDate-11-04-2023).pdf 2023-03-20
20 201621027377-Correspondence to notify the Controller [29-03-2023(online)].pdf 2023-03-29
21 201621027377-FORM-26 [29-03-2023(online)].pdf 2023-03-29
22 201621027377-FORM-26 [29-03-2023(online)]-1.pdf 2023-03-29
23 201621027377-FORM-26 [11-04-2023(online)].pdf 2023-04-11
24 201621027377-PETITION UNDER RULE 137 [15-04-2023(online)].pdf 2023-04-15
25 201621027377-RELEVANT DOCUMENTS [15-04-2023(online)].pdf 2023-04-15
26 201621027377-Written submissions and relevant documents [17-04-2023(online)].pdf 2023-04-17
27 201621027377-IntimationOfGrant28-07-2023.pdf 2023-07-28
28 201621027377-PatentCertificate28-07-2023.pdf 2023-07-28

Search Strategy

1 searchE_05-08-2020.pdf

ERegister / Renewals

3rd: 07 Aug 2023 (10/08/2018 to 10/08/2019)
4th: 07 Aug 2023 (10/08/2019 to 10/08/2020)
5th: 07 Aug 2023 (10/08/2020 to 10/08/2021)
6th: 07 Aug 2023 (10/08/2021 to 10/08/2022)
7th: 07 Aug 2023 (10/08/2022 to 10/08/2023)
8th: 07 Aug 2023 (10/08/2023 to 10/08/2024)
9th: 10 Aug 2024 (10/08/2024 to 10/08/2025)
10th: 31 Jul 2025 (10/08/2025 to 10/08/2026)