Abstract: The present disclosure addresses generating a ranked list of emerging technologies from diverse and subjective sources and grouping them into technology families in a systematic and objective manner. The system and method systematically identify a ranked list of emerging technologies from diverse sources; eliminate duplicate technology terms; and perform multi-stage clustering of the de-duplicated emerging technologies to obtain a set of technology clusters that can be mapped to a plurality of technology families such as the unitary technologies, technology domains and industry domains. Once the de-duplicated emerging technologies are grouped into technology clusters and classified under the technology families, collaborative capabilities of the technology groups are assessed using a network of nodes representing technology groups and edges characterized by common keywords across the nodes, and the technologies are suitably ranked. This enables maintaining technology foresight in an ever-changing landscape for use by a portfolio management tool or innovation funnel models.
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
AUTOMATIC IDENTIFICATION, RANKING AND GROUPING OF EMERGING TECHNOLOGIES FROM DIVERSE SOURCES
Applicant
Tata Consultancy Services Limited A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
[001] The disclosure herein generally relates to the field of correlating data from diverse sources, and, more particularly, to systems and methods for automatic identification, ranking and grouping of emerging technologies from diverse sources into technology groups.
BACKGROUND
[002] Emerging technologies are catalysts for change, offering extraordinary new capabilities to innovation. Adoption of these technologies allows forward-thinking and reimagining the possible. There are a number of sources that periodically publish a list of emerging technologies they consider will make the most impact in the next several years. These sources are either analysts (Gartner, McKinsey, etc.) or futurologists (Future Today Institute, Imperial Tech Foresight, etc.). These sources do not apply the same criteria or look at all areas of technology with equal emphasis. As a result, there are several divergent versions of what future technology trends might look like. The methodologies rely significantly on in-house expert opinions and are not disclosed for independent verification. There is no benchmark by which the various sources can be evaluated.
[003] To enable insights and foresights, a formal and standardized approach to establish the technology watchlist is required. Since there are multiple sources of information with varying degree of forecasting accuracy, and since different sources often concentrate on a limited category of technologies, it is challenging to correlate data from diverse sources, generate a list of emerging technologies (including in-house sources) and objectively evaluate them based on meaningful criteria and rank them in order of their putative transformative capability.
SUMMARY
[004] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.
[005] In an aspect, there is provided a processor implemented method comprising the steps of: identifying, via one or more hardware processors, a list of emerging technologies, at a current time period, from a plurality of technologies obtained from a plurality of sources, the list of the emerging technologies being ranked based on a Content Attribution Score (CAS) computed for each technology in the plurality of technologies, wherein the emerging technologies are technologies associated with unrealized development, practical applications, or a combination thereof, and wherein the plurality of sources comprise at least one of (i) internet sources including one or more web pages pertaining to the emerging technologies, (ii) databases and (iii) offline documents pertaining to the emerging technologies; eliminating, via the one or more hardware processors, duplicate technology terms from the identified list of the emerging technologies using a string-matching technique and word vector embeddings of technology terms to obtain a list of de-duplicated emerging technologies that are re-ranked using the CAS, wherein (i) the technology terms and (ii) associated technology description for each of the technology terms are obtained from the plurality of sources and updated in a technology corpus of interest to an enterprise; and performing multi-stage clustering, via the one or more hardware processors, of the de-duplicated emerging technologies into a plurality of technology families, the multi-stage clustering comprising: (i) using k-means clustering on document word vectors generated from technology description obtained from the plurality of sources, to obtain a set of technology clusters, wherein each technology cluster in the set of technology clusters is characterized by one or more technology terms having technology description and keywords associated thereof; and (ii) refining the set of technology clusters using an Iterative Keyword Swapping Method (IKSM) based 
regrouping of the de-duplicated emerging technologies, for technology alignment of each emerging technology, using the technology corpus of interest to the enterprise associated with a preceding time period, to obtain a set of technology group clusters characterized by one or more technology terms having technology description and keywords associated thereof and belonging to a plurality of technology families for the current time period based on a received mapping of the plurality of technology families to the de-duplicated emerging technologies using the associated technology description.
[006] In another aspect, there is provided a system comprising: memory storing instructions; one or more communication interfaces; one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: identify, a list of emerging technologies, at a current time period, from a plurality of technologies obtained from a plurality of sources, the list of the emerging technologies being ranked based on a Content Attribution Score (CAS) computed for each technology in the plurality of technologies, wherein the emerging technologies are technologies associated with unrealized development, practical applications, or a combination thereof, and wherein the plurality of sources comprise at least one of (i) internet sources including one or more web pages pertaining to the emerging technologies, (ii) databases and (iii) offline documents pertaining to the emerging technologies; eliminate, duplicate technology terms from the identified list of the emerging technologies using a string-matching technique and word vector embeddings of technology terms to obtain a list of de-duplicated emerging technologies that are re-ranked using the CAS, wherein (i) the technology terms and (ii) associated technology description for each of the technology terms are obtained from the plurality of sources and updated in a technology corpus of interest to an enterprise; and perform multi-stage clustering, of the de-duplicated emerging technologies into a plurality of technology families, the multi-stage clustering comprising: (i) using k-means clustering on document word vectors generated from technology description obtained from the plurality of sources, to obtain a set of technology clusters, wherein each technology cluster in the set of technology clusters is characterized by one or more technology terms having technology description and keywords associated thereof; and (ii) refining 
the set of technology clusters using an Iterative Keyword Swapping Method (IKSM) based regrouping of the de-duplicated emerging technologies, for technology alignment of each emerging technology, using the technology corpus of interest to the enterprise associated with a preceding time period, to obtain a set of technology group clusters characterized by one or more technology terms having technology description and keywords associated thereof and belonging to a plurality of technology families for the current time period based on a received mapping of the plurality of technology families to the de-duplicated emerging technologies using the associated technology description.
[007] In yet another aspect, there is provided a computer program product comprising a non-transitory computer readable medium having a computer readable program embodied therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: identify, a list of emerging technologies, at a current time period, from a plurality of technologies obtained from a plurality of sources, the list of the emerging technologies being ranked based on a Content Attribution Score (CAS) computed for each technology in the plurality of technologies, wherein the emerging technologies are technologies associated with unrealized development, practical applications, or a combination thereof, and wherein the plurality of sources comprise at least one of (i) internet sources including one or more web pages pertaining to the emerging technologies, (ii) databases and (iii) offline documents pertaining to the emerging technologies; eliminate, duplicate technology terms from the identified list of the emerging technologies using a string-matching technique and word vector embeddings of technology terms to obtain a list of de-duplicated emerging technologies that are re-ranked using the CAS, wherein (i) the technology terms and (ii) associated technology description for each of the technology terms are obtained from the plurality of sources and updated in a technology corpus of interest to an enterprise; and perform multi-stage clustering, of the de-duplicated emerging technologies into a plurality of technology families, the multi-stage clustering comprising: (i) using k-means clustering on document word vectors generated from technology description obtained from the plurality of sources, to obtain a set of technology clusters, wherein each technology cluster in the set of technology clusters is characterized by one or more technology terms having technology description and keywords associated thereof; and (ii) refining the set of technology 
clusters using an Iterative Keyword Swapping Method (IKSM) based regrouping of the de-duplicated emerging technologies, for technology alignment of each emerging technology, using the technology corpus of interest to the enterprise associated with a preceding time period, to obtain a set of technology group clusters characterized by one or more technology terms having technology description and keywords associated thereof and belonging to a plurality of technology families for the current time period based on a received mapping of the plurality of technology families to the de-duplicated emerging technologies using the associated technology description.
[008] In accordance with an embodiment of the present disclosure, the one or more hardware processors are configured to receive the plurality of sources; perform at least one of: periodically crawling each of the plurality of sources using Document Object Model (DOM) navigation rules to obtain (i) the technology terms, (ii) a page rank for each of the internet sources and (iii) the technology description for each of the technology terms in the event that the plurality of sources are internet sources; and obtaining (i) the technology terms and the associated technology description for each of the technology terms from the plurality of sources using Named Entity Recognition (NER) extraction and Regular Expression (regex) patterns; and (ii) page rank from an associated internet source, in the event that the plurality of sources are the databases and the offline documents pertaining to the emerging technologies; and update the technology corpus of interest to the enterprise, with the obtained technology terms, page rank for each of the plurality of sources and the associated technology description for each of the technology terms, prior to identifying the list of emerging technologies.
[009] In accordance with an embodiment of the present disclosure, the one or more hardware processors are configured to compute CAS by: obtaining the page rank associated with each of the plurality of sources corresponding to each technology; adding page ranks of the plurality of sources corresponding to each technology that appears in more than one source from the plurality of sources; normalizing the added page rank corresponding to each technology within a range of 1 to 5 to obtain a Page Rank (PR) score corresponding to each technology; computing an enterprise technology score for each technology based on percentage of contribution of each of the plurality of sources to each technology and normalizing the enterprise technology score to a value between 1 and 5; obtaining an overall source rank for each technology as an average of normalized source ranks associated with corresponding sources from the plurality of sources, wherein each normalized source rank has a value in a range 1 to 10, and wherein a default value of 5 is assigned as a source rank of the technology before normalizing, if no source rank is available for a technology for a corresponding source; and computing the CAS for each technology based on the obtained PR score, the computed enterprise technology score and the obtained overall source rank, wherein the value of the CAS is in a range of 1 to 100.
[010] In accordance with an embodiment of the present disclosure, the one or more hardware processors are configured to eliminate the duplicate technologies from the identified list of emerging technologies by: combining the plurality of technologies in the technology corpus of interest to the enterprise, with the identified list of the emerging technologies; performing a first level filter of the combined plurality of technologies, by eliminating duplicate technology terms identified using the string-matching technique; splitting the technology terms from the first level filtered plurality of technologies into unigrams and identifying the word vector embedding for each combination of technology terms by using the unigrams to identify similar words in bigrams and trigrams as subparts; comparing the technology terms in each pair of technology terms using similarity scores associated with the word vector embedding for the combinations of technology terms; flagging, as duplicates, the technology terms with a higher similarity score in each pair of technology terms; performing a second level filter of the first level filtered list, to obtain the list of de-duplicated emerging technologies, by identifying the technology terms having the highest value of CAS amongst the flagged duplicate technology terms as distinct technology terms; and re-ranking the list of de-duplicated emerging technologies using the CAS.
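The two-level de-duplication described above can be sketched as follows. This is a simplified, single-pass illustration and not the disclosed implementation: the terms are visited in descending CAS order so that the highest-CAS term of each duplicate group survives. The CAS values, the toy embedding vectors, the averaging of unigram vectors and both thresholds are assumptions introduced only for this example.

```python
from difflib import SequenceMatcher
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def term_vector(term, embeddings):
    """Average the unigram vectors of a term (ASSUMED pooling strategy)."""
    vecs = [embeddings[w] for w in term.lower().split() if w in embeddings]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def is_duplicate(term, other, embeddings, string_threshold, vector_threshold):
    a, b = term.lower(), other.lower()
    # First-level filter: plain string matching.
    if SequenceMatcher(None, a, b).ratio() >= string_threshold:
        return True
    # Second-level filter: similarity of word vector embeddings.
    va, vb = term_vector(a, embeddings), term_vector(b, embeddings)
    return va is not None and vb is not None and cosine(va, vb) >= vector_threshold

def deduplicate(cas_by_term, embeddings, string_threshold=0.9, vector_threshold=0.95):
    """Return distinct terms, ranked by CAS; thresholds are ASSUMED values."""
    ranked = sorted(cas_by_term, key=cas_by_term.get, reverse=True)
    kept = []
    for term in ranked:
        if not any(is_duplicate(term, k, embeddings, string_threshold, vector_threshold)
                   for k in kept):
            kept.append(term)
    return kept

# HYPOTHETICAL toy data: 4-dimensional embeddings and CAS values.
toy_embeddings = {
    "ai": [1.0, 0.0, 0.0, 0.0], "drug": [0.0, 1.0, 0.0, 0.0],
    "development": [0.1, 0.8, 0.0, 0.1], "discovery": [0.1, 0.9, 0.0, 0.0],
    "graph": [0.0, 0.0, 1.0, 0.0], "neural": [0.8, 0.0, 0.2, 0.0],
    "networks": [0.0, 0.0, 1.0, 0.0], "social": [0.0, 0.0, 0.1, 0.9],
    "payments": [0.0, 0.0, 0.0, 1.0],
}
toy_cas = {
    "Graph Neural Networks": 70, "AI for drug development": 65,
    "graph neural networks": 55, "AI for drug discovery": 50,
    "Social payments": 40,
}
distinct = deduplicate(toy_cas, toy_embeddings)
```

In this toy run the lower-case variant is caught by string matching, while "AI for drug discovery" is caught only by the embedding comparison; whether two such terms should really be merged depends on the embeddings and thresholds chosen in practice.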
[011] In accordance with an embodiment of the present disclosure, the one or more hardware processors are configured to perform IKSM by: iteratively performing for each technology group in the technology corpus of interest to the
enterprise associated with the preceding time period, the steps of: identifying a cluster from the set of technology clusters as a target cluster, wherein the target cluster has a highest count of technology terms associated with a technology group in the technology corpus of interest to the enterprise associated with the preceding time period; naming the target cluster in line with a name associated with the technology group in the technology corpus of interest to the enterprise associated with the preceding time period; identifying one or more missing technologies in the target cluster when compared to the technology group in the technology corpus of interest to the enterprise associated with the preceding time period; determining a source cluster for each of the one or more missing technologies; transferring the technology terms from an associated source cluster for each of the one or more missing technologies along with associated minimum keywords to the target cluster based on minimum keywords in the source cluster that map to keywords associated with the target cluster; and transferring the technology terms from any source cluster associated with keywords mapping to the target cluster; determining one or more anchor keywords for each unnamed cluster, wherein the one or more anchor keywords are keywords associated with a predefined number of highest combined Term Frequency–Inverse Document Frequency (TF-IDF) scores in each unnamed cluster; grouping technologies associated with the one or more anchor keywords to form one or more regrouped unnamed clusters while eliminating the one or more anchor keywords from other clusters amongst the unnamed clusters; and naming the formed one or more regrouped unnamed clusters based on the associated one or more anchor keywords, thereby obtaining the set of technology group clusters for the current time period, wherein the set of technology group clusters belong to the plurality of technology families based on a received 
mapping of the plurality of technology families to the de-duplicated emerging technologies using the associated technology description.
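The anchor-keyword step for unnamed clusters can be illustrated with a minimal sketch. It covers only the selection of the keywords with the highest combined TF-IDF scores in a cluster, not the keyword swapping or the regrouping. The unsmoothed TF-IDF variant, the summation used to combine scores, and all tokens and technology names below are assumptions for illustration.

```python
from collections import Counter
from math import log

def tfidf(docs):
    """docs maps a technology name to its non-empty list of description tokens.
    Returns name -> {token: tf-idf}, using tf = count/len and idf = ln(N/df)."""
    n = len(docs)
    df = Counter(tok for toks in docs.values() for tok in set(toks))
    scores = {}
    for name, toks in docs.items():
        tf = Counter(toks)
        scores[name] = {t: (c / len(toks)) * log(n / df[t]) for t, c in tf.items()}
    return scores

def anchor_keywords(cluster, scores, top_n=2):
    """Keywords with the top_n highest combined (summed) TF-IDF in a cluster."""
    combined = Counter()
    for name in cluster:
        combined.update(scores[name])  # adds the per-technology scores
    return [kw for kw, _ in combined.most_common(top_n)]

# HYPOTHETICAL tokenized descriptions and one unnamed cluster.
toy_docs = {
    "t1": ["quantum", "qubit", "quantum", "error"],
    "t2": ["quantum", "sensor", "qubit"],
    "t3": ["payment", "wallet", "payment"],
    "t4": ["payment", "social", "wallet", "data"],
}
toy_scores = tfidf(toy_docs)
anchors = anchor_keywords(["t1", "t2"], toy_scores, top_n=2)
```

The predefined number of anchor keywords corresponds to `top_n` here; a cluster would then be named after the returned keywords.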
[012] In accordance with an embodiment of the present disclosure, the one or more hardware processors are further configured to: generate a network of technology groups in the set of technology group clusters, using a network analysis tool, wherein the network comprises nodes and edges, and wherein each node represents a technology group and two nodes are connected by an edge if there is at least one keyword common to the technology groups; compute a Weighted Degree Centrality (WDC) indicative of reactivity of a technology group; normalize values of the WDC metric of each node in a range of 0 to 10; and associate reactivity of a technology group to the normalized values of the WDC metric, wherein a highest normalized value of the WDC metric is indicative of a most reactive technology group.
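A minimal sketch of the network step follows, written in plain Python rather than with a network analysis tool. It assumes that the weight of an edge is the number of keywords shared by the two technology groups and that normalization divides by the largest WDC value; neither choice is stated in the disclosure. The group names are borrowed from examples elsewhere in this specification and the keyword sets are hypothetical.

```python
from itertools import combinations

def weighted_degree_centrality(keywords_by_group):
    """Sum of edge weights per node; an edge exists only when two groups
    share at least one keyword (weight = number of shared keywords, ASSUMED)."""
    wdc = {group: 0 for group in keywords_by_group}
    for a, b in combinations(keywords_by_group, 2):
        shared = len(keywords_by_group[a] & keywords_by_group[b])
        if shared:
            wdc[a] += shared
            wdc[b] += shared
    return wdc

def normalize_0_10(scores):
    """Scale scores to the range 0 to 10 by dividing by the maximum (ASSUMED)."""
    top = max(scores.values(), default=0)
    if top == 0:
        return {group: 0.0 for group in scores}
    return {group: 10 * value / top for group, value in scores.items()}

# HYPOTHETICAL keyword sets for four technology groups.
toy_groups = {
    "Artificial Intelligence": {"learning", "model", "data"},
    "Data & Intelligence": {"data", "analytics", "model"},
    "Material Science": {"polymer", "data"},
    "Space": {"orbit"},
}
wdc = weighted_degree_centrality(toy_groups)
reactivity = normalize_0_10(wdc)
most_reactive = max(reactivity, key=reactivity.get)
```

A group with no shared keywords, such as "Space" in this toy data, has no edges and therefore a reactivity of 0.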
[013] In accordance with an embodiment of the present disclosure, the one or more hardware processors are further configured to: input (i) the set of technology group clusters characterized by one or more technology terms having technology description and keywords and belonging to the plurality of technology families for the current time period, and (ii) reactivity of the technology group in the set of technology group clusters into at least one of (a) an innovation funnel model to select the technology groups from the set of technology group clusters for Research and Development (R&D) and (b) a portfolio management tool for evaluating research projects based on utilization of the emerging technologies in terms of prioritization, capacity utilization, risk mitigation and cost optimization.
[014] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[015] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[016] FIG.1 illustrates an exemplary block diagram of a system for automatic identification, ranking and grouping of emerging technologies from diverse sources into technology groups, in accordance with some embodiments of the present disclosure.
[017] FIG.2A through FIG.2C illustrate an exemplary flow diagram of a computer implemented method for automatic identification, ranking and grouping of emerging technologies from diverse sources into technology groups, in accordance with some embodiments of the present disclosure.
[018] FIG.3 illustrates an exemplary network comprising nodes and edges, in accordance with some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
[019] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[020] A study of various sources available that periodically publish a list of emerging technologies reveals that the variant sources have their own perspectives that are valuable and cannot be ignored. However, not all sources may have the same level of authority and reliability. Also, different sources may rank the same technology dissimilarly. It is critical that the divergent sources be considered but at the same time the differences therein are reconciled to determine a fair rank of each technology and derive insights on the impact each technology may make in future.
[021] In the context of the present disclosure, the expression ‘emerging technologies’ refers to technologies associated with unrealized development, practical applications, or both. The diverse sources that are evaluated, referred to as the ‘plurality of sources’, include at least one of (i) internet sources including one or more web pages pertaining to emerging technologies, (ii) databases and (iii) offline documents pertaining to emerging technologies. They may include in-house sources as well.
[022] In the context of the present disclosure, the expressions ‘technology terms’ and ‘technology’ may be used interchangeably.
[023] In the context of the present disclosure, ‘diffusion state’ of a technology refers to the usage or impact of the technology across industries or domains.
[024] In the context of the present disclosure, ‘unitary technologies’ refers to core technologies or digital technologies such as Artificial Intelligence (AI) and NextGen Computing, that are derivatives of Computer Sciences and Information Technology. These are foundational blocks of digital systems and include software, hardware, communications (network) and data systems and can potentially deliver exponential impact on applied business. Certain technologies, such as space, whose impact is yet to be observed but cannot be ignored because these could potentially alter existing commerce paradigms are also considered as unitary technologies.
[025] In the context of the present disclosure, ‘technology domains’ refers to digital technologies such as Data & Intelligence, and Material Science, that can create exponential impact across social, political and economic world. These technologies usually build over multiple core technologies and have cross-industry relevance. They explicate combinatorial possibilities at the confluence of the physical, digital and biological worlds and lead the change towards more connected and inter-dependent commerce. Digital platforms are also included under technology domains.
[026] In the context of the present disclosure, ‘industry domains’ refers to industry imperatives derived from human needs, further equated with the forces of disruption (e.g. infrastructure, demographics, etc.) and emerging ecosystems (e.g. Mumbai, Jakarta, etc.). Industry imperatives are expected to have longevity and will mostly remain invariant over many years. Some examples include Drug Discovery & Development and Genetic Engineering.
[027] In accordance with the present disclosure, a systematic method, and a system for executing the method, are provided for compiling a list of emerging technologies from diverse sources (including in-house sources) and objectively evaluating them. However, since emerging technologies are nascent, they are often referred to by different names by the individual sources, and hence a list compiled from different sources is likely to contain many duplicates. Again, being nascent, there is very little information published about the emerging technologies and therefore, there exists no standard way to de-duplicate the list of technologies. The method of the present disclosure addresses the nature and application of the technologies being compared to identify duplicates and then classifies them into enterprise-relevant technology families such as the unitary technologies, the technology domains and the industry domains (defined above) based on their diffusion state. In accordance with the present disclosure, identifying the emerging technologies from diverse sources and systematically and objectively grouping them under such domains is the key to deriving insights and foresight.
[028] Referring now to the drawings, and more particularly to FIG. 1 through FIG.3, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[029] FIG.1 illustrates an exemplary block diagram of a system 100 for automatic identification, ranking and grouping of emerging technologies from diverse sources into technology groups, in accordance with some embodiments of the present disclosure. In an embodiment, the system 100 includes one or more hardware processors 104, communication interface (s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 102 operatively coupled to the one or more hardware processors 104. The one or more hardware processors 104 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, graphics controllers, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) are configured to fetch and execute computer-readable instructions stored in the memory. In the context of the present disclosure, the expressions ‘processors’ and ‘hardware processors’ may be used interchangeably. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
[030] The communication interface (s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface(s) can include one or more ports for connecting a number of devices to one another or to another server.
[031] The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, one or more modules (not shown) of the system 100 can be stored in the memory 102.
[032] FIG.2A through FIG.2C illustrates an exemplary flow diagram of a computer implemented method 200 for automatic identification, ranking and grouping of emerging technologies from diverse sources into technology groups, in accordance with some embodiments of the present disclosure. In an embodiment, the system 100 includes the memory 102 operatively coupled to the one or more hardware processors 104 and is configured to store instructions configured for execution of steps of the method 200 by the one or more hardware processors 104. The steps of the method 200 will now be explained in detail with reference to the components of the system 100 of FIG.1. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
[033] In an embodiment of the present disclosure, the one or more hardware processors 104, are configured to identify, at step 202, a list of the emerging technologies, at a current time period, from a plurality of technologies obtained from a plurality of sources, the list of emerging technologies being ranked based on a Content Attribution Score (CAS) computed for each technology in the plurality of technologies.
[034] In an embodiment, the step of identifying the list of the emerging technologies is preceded by receiving the plurality of sources, which are typically diverse. Depending on whether the plurality of sources are internet sources or otherwise, one of the following steps is performed. If the plurality of sources are internet sources, the list of the emerging technologies is identified by periodically crawling each of the plurality of sources using Document Object Model (DOM) navigation rules to obtain (i) the technology terms, (ii) a page rank for each of the internet sources and (iii) the technology description for each of the technology terms. In an embodiment, Selenium™ Integrated Development Environment (IDE) may be used with the DOM. For each new source, a new crawler is developed and added to a collection of crawlers. Whenever there is a new content update, the crawlers are activated using a job scheduler. When the job completes, new content is extracted from all sites of interest and saved as individual files (one file per source) in a specially marked folder. For each source that is crawled, the page rank may be obtained using a PageRank Application Programming Interface (API). The page rank of a web page denotes the importance of the page based on the volume and page rank of all web sources that link to this page. This provides a good proxy for the relative authority of the source compared to the other sources in the plurality of sources.
[035] In the event that the plurality of sources are the databases and the offline documents pertaining to the emerging technologies, (i) the technology terms and the associated technology description for each of the technology terms are obtained from the plurality of sources using Named Entity Recognition (NER) extraction and Regular Expression (regex) patterns and (ii) the page rank is obtained from an internet source associated with the offline documents or databases.
[036] In accordance with the present disclosure, the CAS is computed as explained via examples below. The page rank associated with each of the plurality of sources corresponding to each technology is obtained. For instance, refer to Table 1 below that illustrates two exemplary sources and exemplary page ranks for a technology, say, social citizenship.
Table 1:
Sources | Page Rank
Wikipedia.org | 6.34/10
futuretodayinstitute.com | 4.5/10
[037] The page ranks of the plurality of sources corresponding to each technology that appears in more than one source from the plurality of sources are then added. For instance, refer Table 2 below that illustrates some technologies associated with exemplary sources and exemplary page ranks. Table 2:
Technologies               Page Rank
                           futuretodayinstitute.com   Wikipedia.org   Added page rank
Social payments            4.5                        0               4.5
AI for drug development    4.5                        0               4.5
Graph Neural Networks      4.5                        0               4.5
Digital citizenship        4.5                        6.34            10.84
Social citizenship         4.5                        6.34            10.84
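The addition illustrated in Table 2 can be sketched as below. The page ranks are those of Table 1; the dictionary and function names are illustrative assumptions, and only two of the technologies are shown.

```python
# Page ranks per source, taken from Table 1.
PAGE_RANK = {"futuretodayinstitute.com": 4.5, "Wikipedia.org": 6.34}

def added_page_rank(sources_listing_term):
    """Sum the page ranks of every source in which the technology term appears."""
    return round(sum(PAGE_RANK[s] for s in sources_listing_term), 2)

# Sources in which each term appears (two rows of Table 2).
appearances = {
    "Social payments": ["futuretodayinstitute.com"],
    "Digital citizenship": ["futuretodayinstitute.com", "Wikipedia.org"],
}
added = {term: added_page_rank(srcs) for term, srcs in appearances.items()}
```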
[038] The added page rank corresponding to each technology is then normalized within a range of 1 to 5 to obtain a Page Rank (PR) score corresponding to each technology. For instance, refer Table 3 below that illustrates some technologies associated with exemplary sources and exemplary page ranks. Table 3:
Technologies               Page Rank                                  PR score
                           futuretodayinstitute.com   Wikipedia.org   Normalized page rank
Social payments            4.5                        0               2.25
AI for drug development    4.5                        0               2.25
Graph Neural Networks      4.5                        0               2.25
Digital citizenship        4.5                        6.34            2.71
Social citizenship         4.5                        6.34            2.71
[039] An enterprise technology score is computed for each technology based on percentage of contribution of each of the plurality of sources to each technology and normalizing the enterprise technology score to a value between 1 and 5. For instance, refer Table 4 below that illustrates some technologies, Yes/No is indicative of the presence or absence of the technology term in the mentioned source, % contribution of the source and associated enterprise technology score.
[040] Table 4:
Technologies               futuretodayinstitute.com   Wikipedia.org   % contribution of the source   Enterprise technology score
Social payments            Yes                        No              0.8                            4
AI for drug development    Yes                        No              0.8                            4
Graph Neural Networks      Yes                        No              0.8                            4
Digital citizenship        Yes                        Yes             0.2                            1
Social citizenship         No                         Yes             0.2                            1
[041] An overall source rank is obtained for each technology, as an average of normalized source ranks associated with corresponding sources from the plurality of sources, wherein each normalized source rank has a value in a range 1 to 10, and wherein a default value of 5 is assigned as a source rank of the technology before normalizing, if no source rank is available for a technology for a corresponding source. For instance, refer Table 5 below that illustrates source rank and normalized source rank for some technologies. Table 5:
Technologies               Source Rank                                                 Normalized Source Rank (1-10)
                           Future Today Institute (FTI) rank   Wikipedia (Wiki) rank   FTI rank   Wiki rank
Social payments            2                                   0                       0.04       0
AI for drug development    360                                 0                       7.2        0
Graph Neural Networks      376                                 0                       7.52       0
Digital citizenship        14                                  5                       0.28       5
Social citizenship         0                                   5                       0          5
[042] Taking an average of the source ranks corresponding to technologies appearing in more than one source, Table 6 below illustrates an exemplary overall source rank obtained for each technology. Table 6:
Technologies               Overall source rank (average=1, if value is less than 1)
Social payments            1
AI for drug development    7.2
Graph Neural Networks      7.52
Digital citizenship        2.64
Social citizenship         2.5
[043] Finally, the CAS is computed for each technology based on the obtained PR score, the computed enterprise technology score and the obtained overall source rank, wherein the value of the CAS is in a range of 1 to 100. In an embodiment, the CAS may be represented as CAS = (PR score + enterprise technology score) * (10/Overall source rank) as shown in Table 7 below. Table 7:
Technologies               CAS = (PR score + Enterprise technology score) * (10/Overall source rank)
Social payments            62.5
AI for drug development    8.680555556
Graph Neural Networks      8.311170213
Digital citizenship        14.0530303
Social citizenship         14.84
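The CAS values of Table 7 can be reproduced with the short sketch below, which applies the formula of the embodiment to the PR scores of Table 3, the enterprise technology scores of Table 4 and the overall source ranks of Table 6. The function and variable names are illustrative only.

```python
def cas(pr_score, enterprise_score, overall_source_rank):
    """CAS = (PR score + enterprise technology score) * (10 / overall source rank)."""
    return (pr_score + enterprise_score) * (10 / overall_source_rank)

# (PR score, enterprise technology score, overall source rank) from Tables 3, 4 and 6.
inputs = {
    "Social payments": (2.25, 4, 1),
    "AI for drug development": (2.25, 4, 7.2),
    "Graph Neural Networks": (2.25, 4, 7.52),
    "Digital citizenship": (2.71, 1, 2.64),
    "Social citizenship": (2.71, 1, 2.5),
}
scores = {term: cas(*values) for term, values in inputs.items()}

# Ranked list of technologies, highest CAS first.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because the overall source rank appears in the denominator, a technology placed near the top of a source's own list (a small rank value) receives a higher CAS.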
[044] At the end of step 202, a ranked list of emerging technologies is identified, wherein each of the technologies is characterized by associated technology term and associated technology description. As mentioned above, since the emerging technologies are in a nascent stage, not much published information is available. Also, the perspectives of the associated plurality of sources may be different and subjective resulting in different names being used for the same technology in different sources from the plurality of sources. The method and system of the present disclosure evaluates existing and new technologies in a sequence that helps identify duplicates with increasing level of confidence.
[045] In an embodiment of the present disclosure, the one or more hardware processors 104, are configured to eliminate, at step 204, duplicate technology terms from the identified list of the emerging technologies, using (a) a string-matching technique and (b) word vector embeddings of technology terms to obtain a list of de-duplicated emerging technologies that are re-ranked using the CAS, wherein (i) the technology terms and (ii) associated technology description for each of the technology terms are obtained from the plurality of sources and updated in a technology corpus of interest to an enterprise.
[046] Sometimes, a technology term may be repeated as a part of another technology term. For instance, ‘Blockchain’ may be repeated in two other technology terms, ‘Blockchain in Banking’ and ‘Blockchain in Insurance’. Hence both the technology terms ‘Blockchain in Banking’ and ‘Blockchain in Insurance’ are required to be marked as duplicates of the technology ‘Blockchain’. Using only the Word2Vec methodology does not flag such duplicate terms. Also, ‘CNN’ representing Convolutional Neural Network and ‘RNN’ representing Recurrent Neural Network may be flagged as similar if only the Word2Vec methodology is used, since it is difficult for word embedding or comparable algorithms to differentiate between ‘CNN’ and ‘RNN’. Hence the present disclosure uses a combination of a string-matching technique and word vector embeddings for accurate de-duplication.
[047] In an embodiment, the step of eliminating the duplicate technologies comprises combining the plurality of technologies in the technology corpus of interest to the enterprise with the identified list of the emerging technologies and performing a first level filter of the combined plurality of technologies, by eliminating duplicate technology terms identified using the string-matching technique. Using string comparison, exact string matches are identified for elimination. The technology terms from the first level filtered plurality of technologies are split into unigrams and the word vector embedding is identified for each combination of technology terms by using the unigrams to identify similar words in bigrams and trigrams as subparts that improves the accuracy of the output.
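The first level filter and the use of unigrams to find a term repeated inside a longer term can be sketched as follows. This is an illustration under stated assumptions: matching is made case-insensitive and whitespace-insensitive here, which the disclosure does not specify, and the function names are hypothetical.

```python
def first_level_filter(corpus_terms, new_terms):
    """Combine both lists and drop exact string matches, keeping first occurrences."""
    seen, kept = set(), []
    for term in list(corpus_terms) + list(new_terms):
        key = " ".join(term.lower().split())  # assumption: ignore case and extra spaces
        if key not in seen:
            seen.add(key)
            kept.append(term)
    return kept

def subpart_duplicates(terms):
    """Map each term to the longer terms that contain all of its unigrams in order."""
    result = {}
    for base in terms:
        base_tokens = base.lower().split()
        n = len(base_tokens)
        hits = []
        for other in terms:
            tokens = other.lower().split()
            if len(tokens) > n and any(
                tokens[i:i + n] == base_tokens for i in range(len(tokens) - n + 1)
            ):
                hits.append(other)
        if hits:
            result[base] = hits
    return result

corpus = ["Blockchain", "CNN"]
identified = ["blockchain", "Blockchain in Banking", "Blockchain in Insurance", "RNN"]
filtered = first_level_filter(corpus, identified)
subparts = subpart_duplicates(filtered)
```

In this sketch ‘CNN’ and ‘RNN’ are left untouched by the string-matching stage, since neither is a subpart of the other.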
[048] In an embodiment, the word vector embedding is created by extracting word tokens from all documents in the technology corpus of interest to the enterprise and removing stop words such as on, of, the, etc. Valid word tokens are selected by computing a Normalized Pointwise Mutual Information (NPMI, ref: https://svn.spraakdata.gu.se/repos/gerlof/pub/www/Docs/npmi-pfd.pdf) for each pair of words appearing together in the technology corpus of interest to the enterprise and applying a threshold. Occurrences of the word tokens in each document in the technology corpus of interest to the enterprise are replaced with jointed unigrams (e.g., artificial neural network is replaced with artificial_neural_network).
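A minimal sketch of the NPMI computation is given below. It uses the standard definition NPMI(x, y) = ln(p(x, y) / (p(x) p(y))) / (-ln p(x, y)) and assumes, for illustration, that two words "appear together" when they occur in the same document. The toy documents and the threshold of 0.8 are hypothetical values, not taken from the disclosure.

```python
import math
from collections import Counter
from itertools import combinations

def npmi_scores(documents):
    """NPMI for each unordered word pair, using document-level co-occurrence."""
    n_docs = len(documents)
    word_df, pair_df = Counter(), Counter()
    for tokens in documents:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    scores = {}
    for (x, y), n_xy in pair_df.items():
        p_xy = n_xy / n_docs
        p_x, p_y = word_df[x] / n_docs, word_df[y] / n_docs
        if p_xy == 1.0:
            # -ln(1) is 0, so NPMI is undefined; treat the pair as fully associated.
            scores[(x, y)] = 1.0
        else:
            scores[(x, y)] = math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)
    return scores

docs = [
    ["neural", "network", "deep"],
    ["neural", "network"],
    ["quantum", "computer"],
    ["deep", "quantum"],
]
scores = npmi_scores(docs)
THRESHOLD = 0.8  # hypothetical cut-off
selected = {pair for pair, s in scores.items() if s >= THRESHOLD}
```

Pairs that pass the threshold are the candidates for joining into a single token before the embedding model is trained.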
[049] Table 8 below lists exemplary word tokens and associated jointed unigrams. Table 8:
Word tokens                                           Jointed unigrams
Collaborative Robots                                  CollaborativeRobots
Truth Decay in an Era of Synthetic Media              TruthDecayinanEraofSyntheticMedia
Autonomous Mobile Robots                              AutonomousMobileRobots
Synthetic Media and Society                           Synthetic_Media_and_Society
Self-Assembling Robots                                Self-AssemblingRobots
Mobile Robots                                         MobileRobots
Programmable Robot Swarms                             ProgrammableRobotSwarms
Synthetic Media and Content                           SyntheticMediaandContent
Synthetic Media Marketplaces                          Synthetic_Media_Marketplaces
Truth Decay in an Era of Synthetic Media              TruthDecayinanEraofSyntheticMedia
Synthetic Media Technologies                          SyntheticMediaTechnologies
Using Synthetic Media to get around Copyright Laws    SyntheticMediaTechnologies
[050] A Word2Vec model of the technology corpus of interest to the enterprise is then created using any state-of-the-art method, such as the implementation described at https://radimrehurek.com/gensim/models/word2vec.html.
[051] The technology terms in each pair of technology terms are compared using similarity scores associated with the word vector embedding for the combinations of technology terms. The technology terms with a higher similarity score in each pair of technology terms are flagged as duplicates. A second level filter of the first level filtered list is performed, to obtain the list of de-duplicated emerging technologies, by identifying the technology terms having the highest value of CAS amongst the flagged duplicate technology terms as distinct technology terms. In accordance with the present disclosure, the method thus takes into account the fact that two different technologies may score a high similarity and two dissimilar text features (phrases, n-grams) describing the same technology may be flagged as different technologies. The list of de-duplicated emerging technologies is then re-ranked using the CAS.
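The second level filter can be sketched as below. Several details are assumptions made for illustration: similarity is measured as cosine similarity, a pair is flagged when its similarity reaches a hypothetical threshold of 0.9, and duplicates are grouped greedily in descending CAS order. The two-dimensional vectors and CAS values are made-up toy data, not output of a trained Word2Vec model.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def second_level_filter(vectors, cas, threshold=0.9):
    """Keep the highest-CAS term of each group of similar terms.

    Returns {distinct term: [terms flagged as its duplicates]}.
    """
    kept = []        # distinct terms found so far, in descending CAS order
    duplicates = {}
    for term in sorted(vectors, key=lambda t: cas[t], reverse=True):
        match = next(
            (k for k in kept if cosine(vectors[term], vectors[k]) >= threshold), None
        )
        if match is None:
            kept.append(term)
            duplicates[term] = []
        else:
            duplicates[match].append(term)
    return duplicates

# Toy embeddings and CAS values (hypothetical).
vectors = {
    "Autonomous mobile robots": [0.90, 0.10],
    "Mobile Robots": [0.88, 0.15],
    "Synthetic media and content": [0.10, 0.95],
}
cas_values = {
    "Autonomous mobile robots": 40,
    "Mobile Robots": 25,
    "Synthetic media and content": 30,
}
deduplicated = second_level_filter(vectors, cas_values)
```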
[052] An exemplary set of duplicate technology terms identified at step 204, of the method 200 is as shown in Table 9 below. Table 9:
Technology term                Duplicate technology terms
Autonomous mobile robots       Self-Assembling Robots, Mobile Robots, Programmable Robot Swarms, Collaborative Robots
Synthetic media and content    Synthetic Media Marketplaces, Truth Decay in an Era of Synthetic Media, Synthetic Media Technologies, Synthetic Media and Society, Using Synthetic Media to get around Copyright Laws
[053] Conventionally, classification methods or clustering methods have been used to group technologies based on certain attributes, provided there is sufficient training data available, which is not the case with the emerging technologies. For clustering, the attributes that characterize the technology’s family affinity are hard to identify and measure. Hence traditional methods cannot be used to perform the classification of the de-duplicated emerging technologies obtained at the output of step 204. In accordance with the present disclosure, the sparse technology descriptions available are used to produce a set of technology clusters, which is further refined using an Iterative Keyword Swapping Method (IKSM) based regrouping. By iterating through each technology group, starting from named groups, the clusters are refined to a high degree of accuracy.
[054] In an embodiment of the present disclosure, the one or more hardware processors 104, are configured to perform multi-stage clustering, at step 206, of the de-duplicated emerging technologies into a plurality of technology families such as the unitary technologies, technology domains and industry domains based on their diffusion state. In an embodiment, the multi-stage clustering comprises using k-means clustering on document word vectors generated from the associated technology description obtained from the plurality of sources, to obtain a set of technology clusters, wherein each technology cluster in the set of technology clusters is characterized by one or more technology terms having technology description and keywords associated therewith.
[055] An example of a technology term and associated technology description is as shown below:
Decision Intelligence: Decision intelligence (DI) is a practical domain framing a wide range of decision-making techniques. DI provides a framework that brings multiple traditional and advanced disciplines together to design, model, align, execute, monitor and tune decision models and processes. Those disciplines include decision management (including advanced nondeterministic techniques such as agent-based systems), decision support, continuous intelligence and process management; and techniques such as descriptive, diagnostics and predictive analytics.
[056] The technology description associated with each technology in the list of de-duplicated emerging technologies is converted into a bag-of-words representation by tokenizing the description and removing stop words. Term Frequency-Inverse Document Frequency (TF-IDF) metric is computed for each word in the bag-of-words. A word vector is created for each technology using the technology corpus of interest to the enterprise as a vector space and the TF-IDF metric as coordinate values. The k-means clustering method is then employed to obtain the set of technology clusters. In an embodiment, an elbow method is used to determine the optimal number of clusters when employing the k-means clustering method. In an experiment, 700+ technology terms with associated technology description were considered for clustering. With the elbow method, it was determined that 26 is the optimal number of clusters for representing the technologies.
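The construction of the TF-IDF vectors can be sketched in plain Python as follows. The stop-word list, the tokenizer and the three short descriptions are illustrative assumptions, and the unsmoothed weighting tf * ln(N / df) is one common TF-IDF variant rather than necessarily the one used in the experiment. The resulting vectors would then be passed to a k-means implementation, which is not shown here.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "of", "on", "the", "to", "that", "is"}  # illustrative

def tokenize(text):
    """Lower-case the text, keep alphabetic tokens and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def tfidf_vectors(descriptions):
    """Return (vocabulary, {technology term: TF-IDF vector over the vocabulary})."""
    bags = {term: Counter(tokenize(desc)) for term, desc in descriptions.items()}
    vocab = sorted({w for bag in bags.values() for w in bag})
    n_docs = len(bags)
    df = Counter(w for bag in bags.values() for w in bag)  # document frequency
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    vectors = {}
    for term, bag in bags.items():
        total = sum(bag.values()) or 1
        vectors[term] = [bag[w] / total * idf[w] for w in vocab]
    return vocab, vectors

descriptions = {
    "Deep Learning": "Neural networks that learn layered representations",
    "Transfer Learning": "Models that reuse learned representations",
    "Quantum Computing": "Computers that exploit quantum states",
}
vocab, vectors = tfidf_vectors(descriptions)
```

Words shared by several descriptions receive a lower weight than words unique to one description, which is what lets the clustering separate technologies with sparse descriptions.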
[057] Furthermore, at step 206, as part of the multi-stage clustering, the set of technology clusters is refined using the IKSM based regrouping of the de-duplicated emerging technologies, for technology alignment of each emerging technology, using the technology corpus of interest to the enterprise associated with a preceding time period, to obtain a set of technology group clusters belonging to a plurality of technology families for the current time period based on a received mapping of the plurality of technology families to the de-duplicated emerging technologies using the associated technology description.
[058] Exemplary keywords associated with some exemplary clusters in the set of technology clusters are as shown below:
Cluster No. 21: learn, train, model, deep, deep learn, learn model, supervise, educational, supervise learn, experiential, automate, unsupervised, problem, performance, game, transfer learn, reward, transfer, human, task, object, approach, adaptive, unsupervised learn, cognitive, learn deep, education, reuse, algorithm, win
Cluster No. 6: analytics, decision, discovery, science, model, organization, learn, analysis, discipline, database, enterprise, user, asset, optimization, program, visualization, term, scientist, technique, intelligence, time, pattern, lake, base, big, print, predictive, insight, advance, structure
Cluster No. 22: smart, device, connect, open, appliance, contract, open source, google, smart contract, apple, tech, source, fabric, consumer, amazon, smart appliance, blockchain, voice, platform, glove, car, light, health, big, internet, screen, ecosystem, smart device, ownership
Cluster No. 5: compute, quantum, internet, architecture, access, infrastructure, web, computer, resource, wireless, center, communication, merchandise, organization, storage, native, scale, user, store, quantum computer, network, visual, supply chain, problem, governance, supply, connect, standard, distribute, software, human
[059] The IKSM based regrouping comprises iteratively performing for each technology group in the technology corpus of interest to the enterprise associated with the preceding time period, the steps explained hereinafter. A cluster from the set of technology clusters is identified as a target cluster, wherein the target cluster has a highest count of technology terms associated with a technology group in the technology corpus of interest to the enterprise associated with the preceding time period. The target cluster is then named in line with a name associated with the technology group in the technology corpus of interest to the enterprise, associated with the preceding time period. In the exemplary clusters shown above, cluster 21 having the highest count of technology terms associated with the technology group AI, is named AI.
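The selection and naming of the target cluster can be sketched as below. The cluster contents are a small hypothetical subset chosen for illustration, and the function name is not from the disclosure.

```python
def pick_target_cluster(clusters, group_terms):
    """Return the id of the cluster holding the most terms of a prior-period group."""
    wanted = set(group_terms)
    return max(clusters, key=lambda cid: len(wanted & set(clusters[cid])))

# Clusters produced by k-means for the current period (toy subset).
clusters = {
    21: ["Deep Learning", "AutoML", "Machine Learning"],
    6: ["Decision Intelligence", "Data Lakes"],
    22: ["Smart Robotics", "Smart Lighting"],
}
# Technology groups from the corpus of the preceding time period.
prior_groups = {
    "AI": ["Deep Learning", "AutoML", "Machine Learning",
           "Decision Intelligence", "Smart Robotics"],
}
# Name each target cluster after the prior-period group it best matches.
names = {pick_target_cluster(clusters, terms): group
         for group, terms in prior_groups.items()}
```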
[060] In an example, the keywords (in italics below) identified for the technologies which belong to AI cluster, but are present in some other clusters are as given below:
Cluster No. 21: learn, train, model, deep, deep learn, learn model, supervise, educational, supervise learn, experiential, automate, unsupervised, problem, performance, game, transfer learn, reward, transfer, task, object, approach, adaptive, unsupervised learn, cognitive, learn deep, education, reuse, algorithm, win, decision, analytics, intelligence, device, human, governance, ethical
Cluster No. 6: analytics, decision, discovery, science, model, organization, learn, analysis, discipline, database, enterprise, user, asset, optimization, program, visualization, term, scientist, technique, time, pattern, lake, base, big, print, predictive, insight, advance, structure
Cluster No. 22: connect, open, appliance, contract, open source, google, smart contract, apple, tech, source, fabric, consumer, amazon, human, smart appliance, device, blockchain, voice, platform, glove, car, light, intelligence, health, big, internet, screen, ecosystem, smart device, ownership
Cluster No. 5: compute, quantum, internet, architecture, access, infrastructure, web, computer, resource, wireless, center, communication, merchandise, organization, storage, native, scale, user, store, quantum computer, network, visual, supply chain, problem, governance, supply, connect, standard, distribute, software, ethical.
[061] One or more missing technologies in the target cluster (21) are identified when compared to the technology group AI in the technology corpus of interest to the enterprise associated with the preceding time period. A source cluster is determined for each of the one or more missing technologies. The technology terms from an associated source cluster for each of the one or more missing technologies are transferred along with associated minimum keywords to the target cluster based on minimum keywords in the source cluster that map to keywords associated with the target cluster. The technology terms from any source cluster associated with keywords mapping to the target cluster are also transferred.
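The transfer of missing technologies to the target cluster can be sketched as follows. This is a simplified illustration: the minimum keywords of each technology are supplied as an input (`term_keywords`, a hypothetical name), the keywords are copied to the target cluster without being removed from the source cluster, and the further transfer of other technologies sharing those keywords is not shown.

```python
def regroup(clusters, keywords, term_keywords, target, group_terms):
    """Move prior-period group members found in other clusters into the target cluster.

    The minimum keywords linking each moved technology to the group are added
    to the keyword list of the target cluster.
    """
    for source, terms in clusters.items():
        if source == target:
            continue
        missing = [t for t in terms if t in group_terms]
        for term in missing:
            terms.remove(term)
            clusters[target].append(term)
            for kw in term_keywords.get(term, []):
                if kw not in keywords[target]:
                    keywords[target].append(kw)
    return clusters, keywords

clusters = {
    21: ["Deep Learning"],
    6: ["Decision Intelligence", "Data Lakes"],
    22: ["Smart Robotics", "Smart Lighting"],
    5: ["Digital Ethics & AI Governance", "Cloud Computing"],
}
keywords = {21: ["learn", "model"], 6: ["decision", "analytics"],
            22: ["intelligence", "device", "human"], 5: ["governance", "ethical"]}
term_keywords = {
    "Decision Intelligence": ["decision", "analytics"],
    "Smart Robotics": ["intelligence", "device", "human"],
    "Digital Ethics & AI Governance": ["governance", "ethical"],
}
ai_group = {"Deep Learning", "Decision Intelligence", "Smart Robotics",
            "Digital Ethics & AI Governance"}
clusters, keywords = regroup(clusters, keywords, term_keywords, 21, ai_group)
```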
[062] Table 10 below represents some exemplary clusters from the set of technology clusters. It may be noted that exemplary cluster 21 having the highest count of technology terms associated with the technology group AI is named AI. Table 10:
Cluster No.   Technology group   Technology terms                 Keywords associated with the Technology group
21            AI                 -                                -
6             -                  Decision Intelligence            Decision, analytics
22            -                  Smart robotics                   Intelligence, device, human
5             -                  Digital Ethics & AI governance   governance, ethical
[063] The underlined technology terms Decision Intelligence from Source Cluster No.6, Digital Ethics & AI Governance from Source Cluster No.5 and Smart Robotics from Source Cluster No. 22, in Table 11 below, are exemplary technologies that belonged to the Technology group AI in the technology corpus of interest to the enterprise associated with the preceding time period. Accordingly, these technologies are moved to the Technology group AI (Target Cluster No. 21). It may be understood that the minimum keywords associated with these Source clusters also move to the Technology group AI. For instance, Decision and analytics are two minimum keywords that map the technology Decision Intelligence to the Technology group AI, while Intelligence, device and human are three minimum keywords that map the technology Smart robotics to the Technology group AI. Also, if there are any other technologies in the source clusters that are associated with the minimum keywords identified, those technologies are also transferred to the target technology group. Table 11:
Cluster No. 21: Adaptive Learning; AI PaaS; AutoML; Cognitive,Robo-brain; Continuous Learning; Deep Learning; Deep RL; Democratized AI; e-learning; Experiential Learning; GAN, Generative Models; Learning BPO; Machine Learning; MLOps; Multitask Learning; One-shot Learning; Reinforcement Learning; Small Data; Transfer Learning
Cluster No. 6: 3d printing; Additive Manufacturing and Printing; Advanced Anomaly Detection; analytics; Asset Performance Management; Augmented Data Discovery; big data; Building Information Modeling; Customer-Best-Next Action; Data Catalog; Data for Good; Data Lakes; data mining; Data Science; data visualization; database; Decision Intelligence; Decision Management; Edge Analytics; Ensemble Learning; Enterprise Information Management Programs; Graph Analysis; Information Capabilities Framework; Information Semantic Services; Intelligent Applications; Logical Data Warehouse; Notebooks; optimization; Python; Search-Based Data Discovery Tools; Servware; Smart Data Discovery; Space Analytics; Universal Intelligence
Cluster No. 22: Ambient Computing Expands; Automatic Voice Cloning and Dubbing; Automatic Voice Transcription; Big Tech Gets Into Healthcare; Blockchain & Smart Contracts; Connected Fabrics; Connected Home; Consumer Device Targeting; Consumer Smart Appliances; Corporate Foreign Policy; Data Ownership; Decentralized Autonomous Organization; Digital Eavesdropping Rights; Faster and More Powerful Open-Source Frameworks; Forced Bundling and Planned Obsolescence; Gaining Access to Backdoors; In-car Supremacy; Interactive Fitness Equipment; Interoperability; Marketplace Consolidation; Networked Smart Devices; NLP; Open Data; Open Source App Vulnerabilities; Prioritizing Accountability and Trust; Real Estate and Home Building Powered by Platforms; Retrofitting Old Homes with New Tech; Smart Advisors; Smart Appliance Screens; Smart Assets; Smart Belts and Shoes; Smart Cameras; Smart Contracts; Smart Fabrics; Smart Gloves; Smart Lighting; Smart Robotics; Smart Threads; Smart Workspace; The End of Attention Metrics; The Programmable Economy; Tokens For Smart Royalties and Freelancers
Cluster No. 5: 3G, 4G, WiFi, Bluetooth, Zigbee; 6G; Affective Computing; Cloud Center of Excellence; Cloud Computing; cloud storage; Cloud-Native; Cloud-Native Application Architecture; Cloudlets; Cognitive Computing; Composable Enterprise; Composable Infrastructure; Crypto-Mining Malware; Data as a Service; Digital Ethics & AI Governance; Digital Workplace; DNA Computing; Fragmentation; GPU & HPC; Hijacking Internet Traffic; hybrid cloud; Hyperscale Computing; Insecure Supply Chains; Intelligent Infrastructure; Internet Computing; Internet of Meat; Micro Data Centers; Microservices; Post Quantum Cryptography; Proliferation of Darknets; quantum; Quantum Communication; Quantum Computing; Security; Serverless Computing; Service Mesh; Sustainability in Supply Chain and Logistics; The Proliferation of Splinternets; Visual Merchandising; Visual Merchandising; Web 3.0; wireless
[064] After the iterative process is completed for each technology group in the technology corpus of interest to the enterprise associated with the preceding time period, there may be some unnamed clusters. For example,
Cluster No. 25: Bidirectional Brain-Machine Interface, Brain Computer Interface, Brain simulation, Brain-Machine Interfaces (BMIs), Neuroenhancers, neuroscience, Organoid Development
[065] One or more anchor keywords are determined for each unnamed cluster, wherein the one or more anchor keywords are keywords associated with a predefined number of highest combined Term Frequency–Inverse Document Frequency (TF-IDF) scores in each unnamed cluster. In an embodiment, the predefined number may be 2 or 3. In the exemplary cluster no. 25 provided above, ‘nervous’ and ‘neuron’ may be identified as the anchor keywords. The technologies associated with the one or more anchor keywords are grouped to form one or more regrouped unnamed clusters while eliminating the one or more anchor keywords from other clusters amongst the unnamed clusters. The anchor keywords ‘nervous’ and ‘neuron’ are removed from any other cluster, if present. The formed one or more regrouped unnamed clusters are then named based on the associated one or more anchor keywords, thereby obtaining the set of technology group clusters for the current time period, wherein the set of technology group clusters belong to the plurality of technology families based on a received mapping of the plurality of technology families to the de-duplicated emerging technologies using the associated technology description.
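The determination of anchor keywords can be sketched as below. It assumes that the "combined" TF-IDF score of a keyword is the sum of its scores over the technologies in the cluster, which is one possible reading. The per-technology scores are made-up values for illustration.

```python
from collections import defaultdict

def anchor_keywords(cluster_tfidf, top_n=2):
    """Sum TF-IDF scores per keyword across a cluster and return the top_n keywords."""
    combined = defaultdict(float)
    for scores in cluster_tfidf.values():
        for keyword, score in scores.items():
            combined[keyword] += score
    # Highest combined score first; ties broken alphabetically.
    return sorted(combined, key=lambda k: (-combined[k], k))[:top_n]

# Hypothetical TF-IDF scores for three technologies of unnamed cluster no. 25.
cluster_25 = {
    "Brain Computer Interface": {"neuron": 0.4, "nervous": 0.3, "signal": 0.1},
    "Neuroenhancers": {"nervous": 0.5, "neuron": 0.2, "drug": 0.3},
    "Brain simulation": {"neuron": 0.3, "model": 0.2},
}
anchors = anchor_keywords(cluster_25, top_n=2)
```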
[066] The reactivity of a technology is defined as a measure of its affinity to combine with other technologies, solutions and social trends to create new technologies and solutions adding greater value to business. Based on similarity of individual technologies comprising a technology group, a reactivity measure can be associated with each technology group. For technologies that have matured, this measure is easy to compute, as physical instances of such combinations are available and can be counted. However, there are no physical instances to count for emerging technologies, and hence the measure is to be estimated based on available information.
[067] At the end of step 206, the set of technology group clusters belonging to the plurality of technology families for the current time period is obtained, wherein each technology cluster is named (labeled) and is associated with a set of technologies and associated keywords. An exemplary technology cluster with an exemplary set of keywords is as shown in Table 12 below. Table 12:
Cluster No.   Label (name of technology group)   Keywords
21            AI                                 learn, train, model, deep, deep learn, learn model, supervise, educational, supervise learn, experiential, automate, unsupervised, problem, performance, game, transfer learn, reward, transfer, task, object, approach, adaptive, unsupervised learn, cognitive, learn deep, education, reuse, algorithm, win, decision, analytics, intelligence, device, human, governance, ethical
[068] In accordance with the present disclosure, a network is generated for each technology group and, using the metrics of the network, the reactivity of each technology group is estimated. In an embodiment of the present disclosure, the one or more hardware processors 104, are configured to generate, at step 208, a network of technology groups in the set of technology group clusters, using a network analysis tool, wherein the network comprises nodes and edges, and wherein each node represents a technology group and two nodes are connected by an edge if there is at least one keyword common to the technology groups. In an embodiment, the network analysis tool may be Gephi™, an open graph visualization platform.
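The construction of the nodes and edges can be sketched in plain Python as below. This stands in for the network analysis tool and does not use any Gephi™ interface; the group names and keywords are illustrative.

```python
from itertools import combinations

def build_network(group_keywords):
    """Return {(group_a, group_b): [common keywords]} for every connected pair.

    Two technology groups (nodes) are connected by an edge only if they share
    at least one keyword.
    """
    edges = {}
    for a, b in combinations(sorted(group_keywords), 2):
        common = sorted(set(group_keywords[a]) & set(group_keywords[b]))
        if common:
            edges[(a, b)] = common
    return edges

group_keywords = {
    "AI": ["learn", "model", "decision"],
    "Advanced Analytics": ["decision", "model", "insight"],
    "Media": ["content", "model"],
    "Quantum": ["qubit"],
}
edges = build_network(group_keywords)
```

A group with no keyword in common with any other group, such as the hypothetical "Quantum" node here, remains an isolated node.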
[069] Once the network is generated, a Weighted Degree Centrality (WDC) indicative of reactivity of a technology group is computed for each technology group (node), at step 210. The values of the WDC metric are normalized for each node in a range of 0 to 10, at step 212. The reactivity of the technology group is associated with the normalized values of the WDC metric, at step 214, wherein a highest normalized value of the WDC metric is indicative of a most reactive technology group.
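Steps 210 to 214 can be sketched as follows. The sketch assumes that the weight of an edge is the number of keywords common to its two nodes and that the normalization is a min-max scaling to the range 0 to 10. The edge weights are toy values.

```python
def weighted_degree_centrality(weighted_edges):
    """Sum the weights of all edges incident to each node."""
    wdc = {}
    for (a, b), weight in weighted_edges.items():
        wdc[a] = wdc.get(a, 0) + weight
        wdc[b] = wdc.get(b, 0) + weight
    return wdc

def normalize_0_10(values):
    """Min-max scale the values into the range 0 to 10."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}
    return {k: 10 * (v - lo) / (hi - lo) for k, v in values.items()}

# Edge weight = number of common keywords between two technology groups (toy data).
weighted_edges = {
    ("AI", "Advanced Analytics"): 3,
    ("AI", "Media"): 1,
    ("Media", "Security & Privacy"): 1,
}
wdc = weighted_degree_centrality(weighted_edges)
reactivity = normalize_0_10(wdc)
most_reactive = max(reactivity, key=reactivity.get)
```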
[070] FIG. 3 illustrates an exemplary network comprising nodes and edges, in accordance with some embodiments of the present disclosure. There are three exemplary clusters illustrated in FIG. 3, wherein each cluster is represented by nodes of a different shape. Cluster 1 is represented by circular nodes and the labels (technology groups) are identified by single ended dashed arrows. Cluster 2 is represented by triangular nodes and the labels (technology groups) are identified by double ended dashed arrows. Cluster 3 is represented by star shaped nodes and the labels (technology groups) are identified by single ended arrows. The size of a node represents the number of edges associated with the node: the higher the number of edges a node has, the larger the node. The number of lines between two nodes represents the number of common keywords between the two technology groups, while the thickness of the lines connecting the nodes represents the weight of the edge or the number of times the keyword has occurred between the two nodes. The top 5 nodes (C12 – Tech Domain, Media, Security & Privacy, Advanced Analytics, and Network & Communication) based on Weighted Degree Centrality are highlighted with bold font.
[071] For the exemplary network illustrated in FIG.3, Table 13 below represents the top 5 Technology groups (nodes) that are considered highly reactive. It may be noted that Media followed by Advanced Analytics are the two technology groups that show the highest affinity to combine with other technologies based on the associated WDC values. Table 13:
Technology groups (nodes)    WDC value    Normalized WDC = (value-min)/(max-min)    Value in range 0-10
Media                        112          111.67                                    5.94
Advanced Analytics           102          101.67                                    5.41
Network & Communication      102          101.67                                    5.41
Security & Privacy           100          99.67                                     5.3
C12-Tech domain              98           97.67                                     5.2
[072] The nodes representing technology groups in the network of the present disclosure provide meaningful inferences by way of degree of collaboration (reactivity) of a technology group with the other technology groups. These inferences enable ranking of technology groups based on their collaborative capabilities.
[073] Thus, the present disclosure systematically identifies a ranked list of emerging technologies from a plurality of sources, that are diverse in nature; eliminates duplicate technology terms to obtain a list of de-duplicated emerging technologies; performs multi-stage clustering of the de-duplicated emerging technologies to obtain a set of technology clusters that can be mapped to a plurality of technology families such as the unitary technologies, technology domains and industry domains. Having grouped the de-duplicated emerging technologies into technology clusters and further classified under the technology families, using the network of nodes representing technology groups and edges as explained above, collaborative capabilities of the technology groups are assessed and the technologies are suitably ranked.
[074] In accordance with the present disclosure, subjective tasks of identifying, ranking and grouping emerging technologies have been converted into a systematic and objective set of tasks that enable crisp inferences and consistent selection of emerging technologies to meet objectives of the enterprise.
[075] In an embodiment of the present disclosure, the one or more hardware processors 104, are configured to input, at step 216, (i) the set of technology group clusters characterized by one or more technology terms having technology description and keywords and belonging to the plurality of technology families for the current time period, and (ii) reactivity of the technology group in the set of technology group clusters into at least one of (a) an innovation funnel model to select the technology groups from the set of technology group clusters for Research and Development (R&D) and (b) a portfolio management tool for evaluating research projects based on utilization of the emerging technologies in terms of prioritization, capacity utilization, risk mitigation and cost optimization. In an embodiment, the portfolio management tool may be Planview™ Portfolio Management tool, Clarizen™ Project Management tool, and the like.
[076] In an embodiment, the system and method of the present disclosure can find application in Research and Development (R&D) funnel filtering. Companies that pursue innovation relentlessly and have invested a lot in their R&D, often employ innovation funnel models such as Bitrix24™, Canny™, and the like, to support innovation. In this model, a broad range of innovation ideas are generated through various means within the company and the funnel is used to gradually refine and select the best among them for R&D. Companies employ an automated application to simulate the working of the innovation funnel and employ automated filters to refine the list. While a broad range of criteria may be used for filtering, some of the most important technical criteria involve the evaluation of the technology needed to implement the idea. The key metrics used for automatic evaluation of the technical criteria are current ranking of the technology, the group to which the technology belongs and the reactivity of the technology group. These metrics provide a comparable measure for superiority of the solution, alignment with core technologies used by the company and the ability of the solution to serve multiple purposes.
[077] In another embodiment, the system and method of the present disclosure can also find application in portfolio analysis. Companies maintain a portfolio of research projects under various groups based on research areas or business domains. A portfolio management tool that analyzes the content of each portfolio from multiple perspectives is typically employed. These perspectives may include prioritization, capacity utilization, risk mitigation and cost optimization. The portfolio management tool offers recommendations on ways to improve the portfolio in these aspects. The system and method of the present disclosure provide key inputs to evaluate the research projects based on their utilization of emerging technologies in terms of prioritization, capacity utilization, risk mitigation and cost optimization.
[078] Thus, ad hoc decision making is replaced by an objective and systematic approach, with traceability of the selection of emerging technologies provided by the system to meet the objectives of the enterprise.
[079] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[080] It is to be understood that the scope of the protection is extended to such a program and, in addition, to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed, including, e.g., any kind of computer such as a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be, e.g., hardware means such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
[081] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[082] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[083] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more hardware processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[084] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
We Claim:
1. A processor implemented method (200) comprising the steps of:
identifying, via one or more hardware processors, a list of emerging technologies, at a current time period, from a plurality of technologies obtained from a plurality of sources, the list of the emerging technologies being ranked based on a Content Attribution Score (CAS) computed for each technology in the plurality of technologies, wherein emerging technologies in the list of emerging technologies are technologies associated with unrealized development, practical applications, or a combination thereof, and wherein the plurality of sources comprise at least one of (i) internet sources including one or more web pages pertaining to the emerging technologies, (ii) databases and (iii) offline documents pertaining to the emerging technologies (202);
eliminating, via the one or more hardware processors, duplicate technology terms from the identified list of the emerging technologies using (a) a string-matching technique and (b) word vector embeddings of technology terms to obtain a list of de-duplicated emerging technologies that are re-ranked using the CAS, wherein (i) the technology terms and (ii) associated technology description for each of the technology terms are obtained from the plurality of sources and updated in a technology corpus of interest to an enterprise (204); and
performing multi-stage clustering, via the one or more hardware processors, of the de-duplicated emerging technologies into a plurality of technology families, the multi-stage clustering comprising: (i) using k-means clustering on document word vectors generated from the associated technology description obtained from the plurality of sources, to obtain a set of technology clusters, wherein each technology cluster in the set of technology clusters is characterized by one or more technology terms having technology description and keywords associated thereof; and (ii) refining the set of technology clusters using an Iterative Keyword Swapping Method (IKSM) based regrouping of the de-duplicated emerging technologies, for technology alignment of each emerging technology, using the technology corpus of interest to the enterprise associated with a preceding time period, to obtain a set of technology group clusters characterized by one or more technology terms having technology description and keywords associated thereof and belonging to a plurality of technology families for the current time period based on a received mapping of the plurality of technology families to the de-duplicated emerging technologies using the associated technology description (206).
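By way of a non-limiting illustration, the first clustering stage recited above (k-means on document word vectors) may be sketched as follows. The sketch assumes that a document vector is the average of the word vectors of the words in a technology description; the `embeddings` lookup, the deterministic initialization from the first k points, and the fixed iteration count are hypothetical choices made only for the example and are not taken from the specification.

```python
def doc_vector(description, embeddings):
    """Average the word vectors of the known words in a description.

    `embeddings` is a hypothetical {word: vector} lookup supplied by the caller.
    """
    vecs = [embeddings[w] for w in description.lower().split() if w in embeddings]
    if not vecs:
        raise ValueError("no known words in description")
    return [sum(component) / len(vecs) for component in zip(*vecs)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns one cluster label per point.

    Assumption: centroids start at the first k points so the run is repeatable.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels
```

The resulting labels correspond only to the initial set of technology clusters; the IKSM based refinement of stage (ii) is not covered by this sketch.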
2. The processor implemented method of claim 1, wherein the step of identifying the list of emerging technologies is preceded by:
receiving the plurality of sources;
performing at least one of:
periodically crawling each of the plurality of sources using Document Object Model (DOM) navigation rules to obtain (i) the technology terms, (ii) a page rank for each of the internet sources and (iii) the technology description for each of the technology terms in the event that the plurality of sources are internet sources; and
obtaining (i) the technology terms and the associated technology description for each of the technology terms from the plurality of sources using Named Entity Recognition (NER) extraction and Regular Expression (regex) patterns; and (ii) page rank from an associated internet source, in the event that the plurality of sources are the databases and the offline documents pertaining to the emerging technologies; and
updating the technology corpus of interest to the enterprise, with the obtained technology terms, page rank for each of the plurality of sources and the associated technology description for each of the technology terms.
3. The processor implemented method of claim 2, wherein the CAS is computed by:
obtaining the page rank associated with each of the plurality of sources corresponding to each technology;
adding page ranks of the plurality of sources corresponding to each technology that appears in more than one source from the plurality of sources;
normalizing the added page rank corresponding to each technology within a range of 1 to 5 to obtain a Page Rank (PR) score corresponding to each technology;
computing an enterprise technology score for each technology based on percentage of contribution of each of the plurality of sources to each technology and normalizing the enterprise technology score to a value between 1 and 5;
obtaining an overall source rank for each technology as an average of normalized source ranks associated with corresponding sources from the plurality of sources, wherein each normalized source rank has a value in a range 1 to 10, and wherein a default value of 5 is assigned as a source rank of the technology before normalizing, if no source rank is available for a technology for a corresponding source; and
computing the CAS for each technology based on the obtained PR score, the computed enterprise technology score and the obtained overall source rank, wherein the value of the CAS is in a range of 1 to 100.
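By way of a non-limiting illustration, the CAS computation steps recited above may be sketched as follows. The claim states the ranges of the component scores and of the CAS but not how the components are combined, so `combine_cas` is a purely hypothetical combination (product of the three scores, mapped linearly onto 1 to 100). The sketch also assumes min-max scaling for normalization and assumes that the per-source ranks passed to `overall_source_rank` are already on the 1 to 10 scale.

```python
def min_max_scale(values, lo, hi):
    """Linearly rescale a {key: raw_value} dict into the range [lo, hi]."""
    vmin, vmax = min(values.values()), max(values.values())
    if vmax == vmin:
        # Degenerate case: all raw values equal, so place them mid-range.
        return {k: (lo + hi) / 2 for k in values}
    return {k: lo + (v - vmin) * (hi - lo) / (vmax - vmin) for k, v in values.items()}


def page_rank_scores(mentions):
    """mentions: {technology: {source: page_rank}} -> PR score in [1, 5].

    Page ranks of all sources mentioning a technology are added, then normalized.
    """
    summed = {tech: sum(per_source.values()) for tech, per_source in mentions.items()}
    return min_max_scale(summed, 1.0, 5.0)


def overall_source_rank(source_ranks, default=5.0):
    """Average the per-source ranks; a missing rank (None) takes the default of 5.

    Assumption: the supplied ranks are already normalized to 1..10.
    """
    filled = [default if r is None else float(r) for r in source_ranks]
    return sum(filled) / len(filled)


def combine_cas(pr_score, enterprise_score, source_rank):
    """Hypothetical combination, not taken from the specification.

    The product lies in [1, 250] and is mapped linearly onto [1, 100].
    """
    product = pr_score * enterprise_score * source_rank
    return 1.0 + (product - 1.0) * 99.0 / 249.0
```

Any other monotonic combination that yields a value in the range of 1 to 100 could be substituted for `combine_cas` without changing the remaining steps.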
4. The processor implemented method of claim 1, wherein the step of eliminating the duplicate technologies from the identified list of emerging technologies comprises:
combining the plurality of technologies in the technology corpus of interest to the enterprise, with the identified list of the emerging technologies;
performing a first level filter of the combined plurality of technologies, by eliminating duplicate technology terms identified using the string-matching technique;
splitting the technology terms from the first level filtered plurality of technologies into unigrams and identifying the word vector embedding for each combination of technology terms by using the unigrams to identify similar words in bigrams and trigrams as subparts;
comparing the technology terms in each pair of technology terms using similarity scores associated with the word vector embedding for the combinations of technology terms;
flagging, as duplicates, the technology terms with a higher similarity score in each pair of technology terms;
performing a second level filter of the first level filtered list, to obtain the list of de-duplicated emerging technologies, by identifying the technology terms having the highest value of CAS amongst the flagged duplicate technology terms as distinct technology terms; and
re-ranking the list of de-duplicated emerging technologies using the CAS.
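By way of a non-limiting illustration, the two-level de-duplication recited above may be sketched as follows. The sketch simplifies the claimed steps: string matching is approximated by comparing case-folded, punctuation-stripped terms; a term vector is the average of its unigram vectors; and pairwise flagging is replaced by a greedy pass in descending CAS order. The `embeddings` lookup and the similarity `threshold` of 0.9 are hypothetical values chosen for the example.

```python
import math
import re


def normalize(term):
    """Case-fold and collapse punctuation so trivially different spellings match."""
    return re.sub(r"[^a-z0-9]+", " ", term.lower()).strip()


def term_vector(term, embeddings):
    """Average the vectors of the term's unigrams; None if no unigram is known."""
    vecs = [embeddings[w] for w in normalize(term).split() if w in embeddings]
    if not vecs:
        return None
    return [sum(component) / len(vecs) for component in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def deduplicate(cas_by_term, embeddings, threshold=0.9):
    """Return [(term, cas)] with duplicates removed, ranked by descending CAS."""
    # First-level filter: string match after normalization, keep the highest CAS.
    best = {}
    for term, cas in cas_by_term.items():
        key = normalize(term)
        if key not in best or cas > best[key][1]:
            best[key] = (term, cas)
    # Second-level filter: visit terms by descending CAS and drop any term whose
    # vector is too similar to a term already kept (which has a higher CAS).
    kept = []
    for term, cas in sorted(best.values(), key=lambda tc: tc[1], reverse=True):
        vec = term_vector(term, embeddings)
        is_duplicate = vec is not None and any(
            kept_vec is not None and cosine(vec, kept_vec) >= threshold
            for _, _, kept_vec in kept
        )
        if not is_duplicate:
            kept.append((term, cas, vec))
    return [(term, cas) for term, cas, _ in kept]
```

Because the terms are visited in descending CAS order, the surviving member of each group of flagged duplicates is the one with the highest CAS, and the returned list is already re-ranked.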
5. The processor implemented method of claim 1, wherein the IKSM comprises:
iteratively performing for each technology group in the technology corpus of interest to the enterprise associated with the preceding time period, the steps of:
identifying a cluster from the set of technology clusters as a target cluster, wherein the target cluster has a highest count of technology terms associated with a technology group in the technology corpus of interest to the enterprise associated with the preceding time period;
naming the target cluster in line with a name associated with the technology group in the technology corpus of interest to the enterprise associated with the preceding time period;
identifying one or more missing technologies in the target cluster when compared to the technology group in the technology corpus of interest to the enterprise associated with the preceding time period;
determining a source cluster for each of the one or more missing technologies;
transferring the technology terms from an associated source cluster for each of the one or more missing technologies along with associated minimum keywords to the target cluster based on minimum keywords in the source cluster that map to keywords associated with the target cluster; and
transferring the technology terms from any source cluster associated with keywords mapping to the target cluster;
determining one or more anchor keywords for each unnamed cluster, wherein the one or more anchor keywords are keywords associated with a predefined number of highest combined Term Frequency–Inverse Document Frequency (TF-IDF) scores in each unnamed cluster;
grouping technologies associated with the one or more anchor keywords to form one or more regrouped unnamed clusters while eliminating the one or more anchor keywords from other clusters amongst the unnamed clusters; and
naming the formed one or more regrouped unnamed clusters based on the associated one or more anchor keywords, thereby obtaining the set of technology group clusters for the current time period, wherein the set of technology group clusters belong to the plurality of technology families based on a received mapping of the plurality of technology families to the de-duplicated emerging technologies using the associated technology description.
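By way of a non-limiting illustration, the determination of anchor keywords for an unnamed cluster may be sketched as follows. The sketch assumes that the combined TF-IDF score of a keyword is the sum of its per-description TF-IDF scores within the cluster, and that each technology description has already been reduced to a list of keywords; both the function name and the `top_n` parameter are hypothetical. The keyword-swapping and regrouping steps are not covered.

```python
import math
from collections import Counter


def anchor_keywords(cluster_docs, all_docs, top_n=2):
    """Return the top_n keywords of a cluster by combined TF-IDF score.

    cluster_docs, all_docs: lists of keyword lists, one list per technology
    description. Assumption: every document in cluster_docs is also in all_docs.
    """
    n_docs = len(all_docs)
    # Document frequency: number of descriptions containing each keyword.
    doc_freq = Counter(word for doc in all_docs for word in set(doc))
    combined = Counter()
    for doc in cluster_docs:
        for word, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[word])
            combined[word] += tf * idf
    # Sort by descending score; ties are broken alphabetically for repeatability.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]
```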
6. The processor implemented method of claim 5 further comprising:
generating, via the one or more hardware processors, a network of technology groups in the set of technology group clusters, using a network analysis tool (208), wherein the network comprises nodes and edges, and wherein each node represents a technology group and two nodes are connected by an edge if there is at least one keyword common to the technology groups;
computing a Weighted Degree Centrality (WDC) indicative of reactivity of a technology group (210);
normalizing values of the WDC metric of each node in a range of 0 to 10 (212); and
associating reactivity of a technology group to the normalized values of the WDC metric, wherein a highest normalized value of the WDC metric is indicative of a most reactive technology group (214).
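By way of a non-limiting illustration, the network generation and the Weighted Degree Centrality (WDC) computation recited above may be sketched as follows, without reliance on any particular network analysis tool. The sketch assumes that the weight of an edge is the number of keywords common to the two technology groups and that normalization to the range of 0 to 10 is a min-max scaling; neither choice is stated in the claim.

```python
from itertools import combinations


def weighted_degree_centrality(group_keywords):
    """group_keywords: {group_name: iterable of keywords} -> {group_name: WDC}.

    Two groups are joined by an edge when they share at least one keyword.
    Assumption: the edge weight equals the number of shared keywords.
    """
    wdc = {group: 0 for group in group_keywords}
    for a, b in combinations(group_keywords, 2):
        shared = len(set(group_keywords[a]) & set(group_keywords[b]))
        if shared:
            wdc[a] += shared
            wdc[b] += shared
    return wdc


def normalize_0_10(wdc):
    """Min-max scale the WDC values into [0, 10] (assumed normalization)."""
    lo, hi = min(wdc.values()), max(wdc.values())
    if hi == lo:
        return {group: 0.0 for group in wdc}
    return {group: 10.0 * (value - lo) / (hi - lo) for group, value in wdc.items()}


def most_reactive(group_keywords):
    """Return the group with the highest normalized WDC."""
    scores = normalize_0_10(weighted_degree_centrality(group_keywords))
    return max(scores, key=scores.get)
```

For example, for groups "AI" {data, model, cloud}, "Cloud" {cloud, data, edge}, "IoT" {edge, sensor} and "Quantum" {qubit}, the group "Cloud" shares keywords with two other groups and is returned as the most reactive, while "Quantum" is isolated and receives a normalized value of 0.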
7. The processor implemented method of claim 6, further comprising:
inputting, via the one or more hardware processors, (i) the set of technology group clusters characterized by one or more technology terms having technology description and keywords and belonging to the plurality of technology families for the current time period, and (ii) reactivity of the technology group in the set of technology group clusters into at least one of (a) an innovation funnel model to select the technology groups from the set of technology group clusters for Research and Development (R&D) and (b) a portfolio management tool for evaluating research projects based on utilization of the emerging technologies in terms of prioritization, capacity utilization, risk mitigation and cost optimization (216).
8. A system (100) comprising:
a memory (102) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
identify, a list of emerging technologies, at a current time period, from a plurality of technologies obtained from a plurality of sources, the list of the emerging technologies being ranked based on a Content Attribution Score (CAS) computed for each technology in the plurality of technologies, wherein emerging technologies in the list of emerging technologies are technologies associated with unrealized development, practical applications, or a combination thereof, and wherein the plurality of sources comprise at least one of (i) internet sources including one or more web pages pertaining to the emerging technologies, (ii) databases and (iii) offline documents pertaining to the emerging technologies;
eliminate, duplicate technology terms from the identified list of the emerging technologies using (a) a string-matching technique and (b) word vector embeddings of technology terms to obtain a list of de-duplicated emerging technologies that are re-ranked using the CAS, wherein (i) the technology terms and (ii) associated technology description for each of the technology terms are obtained from the plurality of sources and updated in a technology corpus of interest to an enterprise; and
perform multi-stage clustering, of the de-duplicated emerging technologies into a plurality of technology families, the multi-stage clustering comprising: (i) using k-means clustering on document word vectors generated from the associated technology description obtained from the plurality of sources, to obtain a set of technology clusters, wherein each technology cluster in the set of technology clusters is characterized by one or more technology terms having technology description and keywords associated thereof; and (ii) refining the set of technology clusters using an Iterative Keyword Swapping Method (IKSM) based regrouping of the de-duplicated emerging technologies, for technology alignment of each emerging technology, using the technology corpus of interest to the enterprise associated with a preceding time period, to obtain a set of technology group clusters characterized by one or more technology terms having technology description and keywords associated thereof and belonging to a plurality of technology families for the current time period based on a received mapping of the plurality of technology families to the de-duplicated emerging technologies using the associated technology description.
9. The system of claim 8, wherein the one or more processors are configured to:
receive the plurality of sources;
perform at least one of:
periodically crawling each of the plurality of sources using Document Object Model (DOM) navigation rules to obtain (i) the technology terms, (ii) a page rank for each of the internet sources and (iii) the technology description for each of the technology terms in the event that the plurality of sources are internet sources; and
obtaining (i) the technology terms and the associated technology description for each of the technology terms from the plurality of sources using Named Entity Recognition (NER) extraction and Regular Expression (regex) patterns; and (ii) page rank from an associated internet source, in the event that the plurality of sources are the databases and the offline documents pertaining to the emerging technologies; and
update the technology corpus of interest to the enterprise, with the obtained technology terms, page rank for each of the plurality of sources and the associated technology description for each of the technology terms, prior to identifying the list of emerging technologies.
10. The system of claim 9, wherein the one or more processors are configured to compute CAS by:
obtaining the page rank associated with each of the plurality of sources corresponding to each technology;
adding page ranks of the plurality of sources corresponding to each technology that appears in more than one source from the plurality of sources;
normalizing the added page rank corresponding to each technology within a range of 1 to 5 to obtain a Page Rank (PR) score corresponding to each technology;
computing an enterprise technology score for each technology based on percentage of contribution of each of the plurality of sources to each technology and normalizing the enterprise technology score to a value between 1 and 5;
obtaining an overall source rank for each technology as an average of normalized source ranks associated with corresponding sources from the plurality of sources, wherein each normalized source rank has a value in a range 1 to 10, and wherein a default value of 5 is assigned as a source rank of the technology before normalizing, if no source rank is available for a technology for a corresponding source; and
computing the CAS for each technology based on the obtained PR score, the computed enterprise technology score and the obtained overall source rank, wherein the value of the CAS is in a range of 1 to 100.
11. The system of claim 8, wherein the one or more processors are configured to eliminate the duplicate technologies from the identified list of emerging technologies by:
combining the plurality of technologies in the technology corpus of interest to the enterprise, with the identified list of the emerging technologies;
performing a first level filter of the combined plurality of technologies, by eliminating duplicate technology terms identified using the string-matching technique;
splitting the technology terms from the first level filtered plurality of technologies into unigrams and identifying the word vector embedding for each combination of technology terms by using the unigrams to identify similar words in bigrams and trigrams as subparts;
comparing the technology terms in each pair of technology terms using similarity scores associated with the word vector embedding for the combinations of technology terms;
flagging, as duplicates, the technology terms with a higher similarity score in each pair of technology terms;
performing a second level filter of the first level filtered list, to obtain the list of de-duplicated emerging technologies, by identifying the technology terms having the highest value of CAS amongst the flagged duplicate technology terms as distinct technology terms; and
re-ranking the list of de-duplicated emerging technologies using the CAS.
12. The system of claim 8, wherein the one or more processors are configured to perform IKSM by:
iteratively performing for each technology group in the technology corpus of interest to the enterprise associated with the preceding time period, the steps of:
identifying a cluster from the set of technology clusters as a target cluster, wherein the target cluster has a highest count of technology terms associated with a technology group in the technology corpus of interest to the enterprise associated with the preceding time period;
naming the target cluster in line with a name associated with the technology group in the technology corpus of interest to the enterprise associated with the preceding time period;
identifying one or more missing technologies in the target cluster when compared to the technology group in the technology corpus of interest to the enterprise associated with the preceding time period;
determining a source cluster for each of the one or more missing technologies;
transferring the technology terms from an associated source cluster for each of the one or more missing technologies along with associated minimum keywords to the target cluster based on minimum keywords in the source cluster that map to keywords associated with the target cluster; and
transferring the technology terms from any source cluster associated with keywords mapping to the target cluster;
determining one or more anchor keywords for each unnamed cluster, wherein the one or more anchor keywords are keywords associated with a predefined number of highest combined Term Frequency–Inverse Document Frequency (TF-IDF) scores in each unnamed cluster;
grouping technologies associated with the one or more anchor keywords to form one or more regrouped unnamed clusters while eliminating the one or more anchor keywords from other clusters amongst the unnamed clusters; and
naming the formed one or more regrouped unnamed clusters based on the associated one or more anchor keywords, thereby obtaining the set of technology group clusters for the current time period, wherein the set of technology group clusters belong to the plurality of technology families based on a received mapping of the plurality of technology families to the de-duplicated emerging technologies using the associated technology description.
13. The system of claim 12, wherein the one or more processors are further configured to:
generate a network of technology groups in the set of technology group clusters, using a network analysis tool, wherein the network comprises nodes and edges, and wherein each node represents a technology group and two nodes are connected by an edge if there is at least one keyword common to the technology groups;
compute a Weighted Degree Centrality (WDC) indicative of reactivity of a technology group;
normalize values of the WDC metric of each node in a range of 0 to 10; and
associate reactivity of a technology group to the normalized values of the WDC metric, wherein a highest normalized value of the WDC metric is indicative of a most reactive technology group.
14. The system of claim 13, wherein the one or more processors are further configured to: input (i) the set of technology group clusters characterized by one or more technology terms having technology description and keywords and belonging to the plurality of technology families for the current time period, and (ii) reactivity of the technology group in the set of technology group clusters into at least one of (a) an innovation funnel model to select the technology groups from the set of technology group clusters for Research and Development (R&D) and (b) a portfolio management tool for evaluating research projects based on utilization of the emerging technologies in terms of prioritization, capacity utilization, risk mitigation and cost optimization.
| # | Name | Date |
|---|---|---|
| 1 | 202121036165-STATEMENT OF UNDERTAKING (FORM 3) [10-08-2021(online)].pdf | 2021-08-10 |
| 2 | 202121036165-REQUEST FOR EXAMINATION (FORM-18) [10-08-2021(online)].pdf | 2021-08-10 |
| 3 | 202121036165-FORM 18 [10-08-2021(online)].pdf | 2021-08-10 |
| 4 | 202121036165-FORM 1 [10-08-2021(online)].pdf | 2021-08-10 |
| 5 | 202121036165-FIGURE OF ABSTRACT [10-08-2021(online)].jpg | 2021-08-10 |
| 6 | 202121036165-FIGURE OF ABSTRACT [10-08-2021(online)]-1.jpg | 2021-08-10 |
| 7 | 202121036165-DRAWINGS [10-08-2021(online)].pdf | 2021-08-10 |
| 8 | 202121036165-DECLARATION OF INVENTORSHIP (FORM 5) [10-08-2021(online)].pdf | 2021-08-10 |
| 9 | 202121036165-COMPLETE SPECIFICATION [10-08-2021(online)].pdf | 2021-08-10 |
| 10 | 202121036165-Proof of Right [17-08-2021(online)].pdf | 2021-08-17 |
| 11 | 202121036165-FORM-26 [21-10-2021(online)].pdf | 2021-10-21 |
| 12 | Abstract1.jpg | 2022-02-17 |
| 13 | 202121036165-FER.pdf | 2023-03-06 |
| 14 | 202121036165-OTHERS [08-08-2023(online)].pdf | 2023-08-08 |
| 15 | 202121036165-FER_SER_REPLY [08-08-2023(online)].pdf | 2023-08-08 |
| 16 | 202121036165-DRAWING [08-08-2023(online)].pdf | 2023-08-08 |
| 17 | 202121036165-COMPLETE SPECIFICATION [08-08-2023(online)].pdf | 2023-08-08 |
| 18 | 202121036165-CLAIMS [08-08-2023(online)].pdf | 2023-08-08 |
| 19 | 202121036165-ABSTRACT [08-08-2023(online)].pdf | 2023-08-08 |
| 20 | 202121036165-US(14)-HearingNotice-(HearingDate-01-08-2024).pdf | 2024-06-19 |
| 21 | 202121036165-Correspondence to notify the Controller [22-07-2024(online)].pdf | 2024-07-22 |
| 22 | 202121036165-Written submissions and relevant documents [08-08-2024(online)].pdf | 2024-08-08 |
| 23 | 202121036165-PatentCertificate02-09-2024.pdf | 2024-09-02 |
| 24 | 202121036165-IntimationOfGrant02-09-2024.pdf | 2024-09-02 |

| # | Name |
|---|---|
| 1 | npl2E_06-03-2023.pdf |
| 2 | npl1E_06-03-2023.pdf |