
A Novel System And Method For Automatic Generation Of Wikipedia Articles

Abstract: An automatic Wikipedia article generation system comprises an input/output device, wherein said device interacts with the computer program and computing device through a network interface and bus connected through a network channel; a storage device, wherein said storage device is a hard disk storage device for storing the components of the system which are read or written by the computer program; a graphics processing unit, wherein said unit performs computational operations; a data collection device, wherein said device extracts articles and performs scanning to retrieve the content of articles; a template generator unit, wherein said unit generates a new article for similar topics; a query generation unit, wherein said unit generates queries which can retrieve information from the web; and a search and content extraction unit, wherein said unit filters noise such as links to other articles.


Patent Information

Application #
Filing Date
13 October 2015
Publication Number
16/2017
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
ipr-cell@iitr.ac.in
Parent Application
Patent Number
Legal Status
Grant Date
2024-12-19
Renewal Date

Applicants

SANDEEP KUMAR
PROFESSOR COLONY,H.NO.810/12, STREET NO.-3 NEAR CHHOTA RLY, STATION, THANESAR, KURUKSHETRA, PIN-136118, HARYANA,INDIA
VIKRANT YADAV
PROFESSOR COLONY,H.NO.810/12, STREET NO.-3 NEAR CHHOTA RLY, STATION, THANESAR, KURUKSHETRA, PIN-136118, HARYANA,INDIA
FAISAL KHAN
PROFESSOR COLONY,H.NO.810/12, STREET NO.-3 NEAR CHHOTA RLY, STATION, THANESAR, KURUKSHETRA, PIN-136118, HARYANA,INDIA
ANSHUL SINGHAL
PROFESSOR COLONY,H.NO.810/12, STREET NO.-3 NEAR CHHOTA RLY, STATION, THANESAR, KURUKSHETRA, PIN-136118, HARYANA,INDIA
DEVENDRA PRATAP SINGH
PROFESSOR COLONY,H.NO.810/12, STREET NO.-3 NEAR CHHOTA RLY, STATION, THANESAR, KURUKSHETRA, PIN-136118, HARYANA,INDIA
KULDEEP KUMAR
PROFESSOR COLONY,H.NO.810/12, STREET NO.-3 NEAR CHHOTA RLY, STATION, THANESAR, KURUKSHETRA, PIN-136118, HARYANA,INDIA

Inventors

1. KUMAR,SANDEEP
PROFESSOR COLONY,H.NO.810/12, STREET NO.-3 NEAR CHHOTA RLY, STATION, THANESAR, KURUKSHETRA, PIN-136118, HARYANA,INDIA
2. YADAV,VIKRANT
V.P.O NANGAL PATHANI, TEH.-KOSLI, DISTT- REWARI, HARYANA,PIN-123401, INDIA
3. KHAN,FAISAL
H.NO.-470, SECTOR-15 PART-1,GURGAON, DISTRICT-GURGAON, HARYANA, PIN-122001, INDIA
4. SINGHAL, ANSHUL
H.NO.1620A, SECTOR-6, BAHADURGARH,DISTT.- JHAJJAR, HARYANA, PIN-124507, INDIA
5. SINGH,PRATAP,DEVENDRA
EWS-180, KALINDIPURAM,RAJROOPPUR ALLAHABAD (UTTAR PRADESH), PIN-211011, INDIA
6. KUMAR KULDEEP
PROFESSOR COLONY, H.NO.810/12, STREET NO.-3, NEAR CHHOTA RLY,STATION, THANESAR, KURUKSHETRA,PIN-136118, HARYANA,INDIA

Specification

Field of Invention -
The invention relates to an automatic Wikipedia article generation system that is capable of taking any
Wikipedia entity and category as input and can generate the corresponding Wikipedia article as an
output. The system generates articles in any language, such as Hindi, English, etc. The system is
capable of generating well-structured templates for any Wikipedia category. The system extracts text
from many formats of web data and is capable of summarizing the useful text with less redundancy
and better relevancy for populating templates. It learns the best queries for retrieval of less noisy and
relevant content from the web for any given entity in a Wikipedia category.
BACKGROUND OF INVENTION
The Indexed Web contains at least 4.68 billion pages. Information about almost everything is present
on the web in large quantities, but the problem with such a high volume of information is its ability to
confuse people about which information to take when studying a particular topic. This phenomenon is
also termed as information overload. Information overload occurs when the amount of input to a
system exceeds its processing capacity. Decision makers have fairly limited cognitive processing
capacity. Consequently, when information overload occurs, a reduction in decision quality occurs.
Therefore, there is a need for a single article to entirely cover up to date information about any given
topic. The popular website Wikipedia performs the same in a structured manner, thus giving
comprehensive coverage of different topics. Wikipedia presently contains about 4.8 million
English articles and is ranked as the 6th most popular website in the world by Alexa. But since
Wikipedia is crowd sourced, it has some limitations. Wikipedia articles are not present
on all the popular topics. As a result, many topics are present as stubs on Wikipedia. So, there is a need
to have a mechanism which takes a large amount of information and gives Wikipedia-like structured
and detailed information about any topic, thus having the potential to convert a stub into a proper
article.
SUMMARY OF THE INVENTION
This invention presents a system and method for automatically generating Wikipedia articles using the
information present on the Internet. In order to generate a Wikipedia-like structured article for a given
topic related to a given category, firstly a template is automatically generated specific to the given
category, for instance American Male Actors, City and Towns in India, etc., using the existing
Wikipedia articles relating to these categories. After determining the structure of the article to be
generated, queries are created using techniques such as clustering and topic modelling to
determine the relevant content for a particular section. For instance, for the category American Male Actors,
queries can be generated for sections like early life, personal life, career, etc.
The created queries then fetch the relevant webpages from the Internet using the ranking of web
search engines, for instance, Google in our case. The fetched content has a lot of noise like
advertisements, navigation panels, user comments, etc., collectively known as boilerplate, which is
removed using existing boilerplate removal algorithms. Since, in a retrieved webpage, a lot of
content is irrelevant to the topic concerned, we extract the relevant content by classification. Finally,
the filtered content from the previous step is compressed into a meaningful length article using
multi-document summarisation.
After the final document is generated, it is evaluated using some standard metrics along with manual
evaluation. Scores like ROUGE are used for automatic evaluation of the generated articles. The
articles are also evaluated manually by observing the edit history of the article for a certain period of
time after uploading it on Wikipedia. Such a two-pronged approach to evaluation helps in assessing the
efficiency of the system.
DETAILED DESCRIPTION OF INVENTION
Basic Concepts
Dynamic template creation
In order to make an automated system to generate Wikipedia articles, structure is required based on
which the article for a topic is created. Manual creation of such template may not be feasible.
Therefore, template is generated for a topic relating to a particular category dynamically in an
automated fashion. All the existing articles relating to a given category are analysed and the common
headings from them are considered. A simple approach of counting the frequency of headings across
the category is sufficient to determine the template of that category. After selecting the most frequent
headings, ordering in a meaningful manner is required. For ordering the headings in the template, the
order of the headings in the existing category is considered, based on which a methodology is
developed to determine the ordering of the headings in the generated template.
Information retrieval
In order to fetch information from the Internet regarding a particular topic, it is highly beneficial to
use context with the topic. Also, some information which is common across all the topics of a
category adds a lot of context to the topic and is useful for fetching information for it especially if the
topic is relatively new. This observation is captured using topic modelling, which gives a generalised
overview of any given topic by analysing the body text of Wikipedia articles of a category. Topic
modelling can be used as a starting point by authors for writing content about different topics.
Redundant queries are reduced using Word2Vec by selecting queries with higher semantic
similarity. This approach leads to the formation of useful and less redundant queries. While fetching
information from the Internet, documents contain a lot of information which is also used for generating
queries.
Filtering relevant information
The content retrieved from the web may have less relevant excerpts for a given topic. In order to filter
only relevant excerpts simple features like bag of words can be used to classify the results instead of
other complex features. Given the non-coherent information present on the webpages, we use the
concept of context windows. It breaks the excerpt to be classified into smaller sized excerpts thus
reducing the effect of noise over the main content even when the main content is present in less
amount. Anomalies are also handled, like disambiguating a positively classified excerpt based on the word count of the
related topic, removing content like an interview excerpt based on first and second person pronouns,
and resolving the conflict, if an excerpt is classified as positive in two or more sections, by using a
precedence relation.
Topic Modelling
Topic modelling is a technique of extracting hidden "topics" from a collection of documents.
These extracted "topics" represent the theme of the information present in the input documents. All of
the Wikipedia articles in a category share almost the same topics. Using topic modelling, high scoring
topics can be extracted and can be used to analyse the kind of information which needs to be retrieved
for a new article. Thus, topic modelling makes it easy to fetch the relevant raw information in the first
place for an article which is to be generated. There are various algorithms which can identify the
hidden topics in the data, like Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA),
etc. In this work, both LSA and LDA are used in the methodology and the best of them is chosen.
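To illustrate this step, the following Python sketch runs LSA (and, for comparison, LDA) with gensim over the body text of a category's existing articles; the sample documents, topic counts and pre-processing are illustrative assumptions rather than the exact configuration used by the system.

# A minimal sketch of extracting latent topics from the article bodies of a
# category with gensim; the corpus and num_topics values are assumptions.
from gensim import corpora, models
from gensim.parsing.preprocessing import preprocess_string

# Hypothetical input: body text of existing articles in one category
documents = [
    "He began his acting career in off-Broadway theatre productions.",
    "His early life was spent in New York before he moved to Los Angeles.",
    "The actor's filmography spans television and feature films.",
]

texts = [preprocess_string(doc) for doc in documents]   # tokenise, stem, drop stop words
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# LSA over the bag-of-words corpus
lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)
for topic_id, topic in lsi.print_topics(num_topics=2):
    print(topic_id, topic)

# LDA can be swapped in for comparison, as the description suggests
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=5)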
Web Based Question Answering
Web based question answering has been used to fill the information in our Wikipedia article through
searching for queries on the Internet. However, a web based Q&A system generally produces short
answers (like two to three lines) whereas we require long detailed paragraphs to get a descriptive
article. In the invented system, the web based query answering sub-system also adds context to the
queries before searching and retrieving. Also, instead of directly scoring results based on their
similarity with the query, the results are scored by comparing them with existing articles, i.e. classifying
whether they are relevant or not, and to what extent (probability of them being selected).
Content Extraction
A web page consists of information relating to a particular topic and also some unnecessary
information like navigation panels including home, about me, contact us, copyright, header and footer,
advertisements and comments, which are collectively known as boilerplate. To use such webpages it
is important to remove boilerplate from them so that good quality content is available for further
analysis, through a process known as content extraction. In this invention, the approach for content
extraction has been used after proper comparative performance analysis.
Text Classification Algorithms
The use of three different machine learning algorithms has been mentioned for text classification:
Multinomial Naive Bayes, Logistic Regression, and Linear Support Vector Machine.
Text-to-text generation
The system deals with the text extracted from different sources across the Internet. This becomes a
problem of multi-document summarization, which is an extension of single document summarization.
Information represented by different documents might be related to each other. This related
information needs to be summarized. There is a difference between sentence fusion, which
requires the fusion of different sentences, and sentence extraction, which extracts sentences from the
existing ones. The present invention focuses on sentence extraction.
Working Process
Figure 1 shows the overall architecture of the proposed system. As shown in the figure, the overall
system works on a computing device with the following components: Input and Output Device, Fixed
Storage, Processor & Memory (volatile and non-volatile), and Computer Program. The methodology
for the generation of Wikipedia articles is shown in the 'Computer Program' block that resides in the
memory of a computing device.
- Input and Output Device: In this system, the client interface through any standard input device is
required to input the entity name and category for which the Wikipedia article needs to be generated. It
is used by any end user of the system. For getting the generated Wikipedia article as an output from the
system, any standard output device is needed. The input and output devices interact with the
Computer Program and computing device through a network interface and bus connected through a
network channel.
Fixed Storage: The system needs fixed storage, i.e., a hard-disk storage device for storing the various
components of the system which are read or written by the Computer Program. It is one of the essential
components for execution of the invented methodology.
Processor and Memory: The functioning of the proposed system needs a standard high end system with
enough storage, RAM and computing power for its operation. This system requires a lot of
computationally intensive operations, so CUDA enabled computing devices are much beneficial.
While this system is capable of running without this, to leverage the great performance which
CUDA systems have to offer, this system utilizes it. It enables a dramatic increase in computing
performance by harnessing the power of the graphics processing unit (GPU). Multithreading is
phenomenal in these hardware devices. For execution of the various methods working as components
of the Computer Program, the processor and memory are the most essential components. The usage of
these devices at various steps in the Computer Program has also been explained in the corresponding
steps.
Computer Program: In this component, firstly, the Wikipedia data is collected and parsed for further
analysis. The template for different categories is then generated and queries are made from the
existing Wikipedia articles belonging to the same category. Among the generated queries, redundant
queries are removed and the best queries are selected. The generated queries are then fired through a web
search engine. The retrieved webpages are then stripped of noise like advertisements, navigation
panels, user comments, etc. The extracted content is further classified as relevant to the topic or not, and
finally the classified content is summarised into a single article using multi-document summarisation.
Figure 1 shows the various modules and workflow of the Computer Program. Each module has been
discussed below.
Data collection and Pre-processing
The initial data for the English language was downloaded from Wikipedia itself using the latest dump on
the site in XML format. The file size was around 50 GB, which includes all English Wikipedia
articles till December 10, 2014, and is maintained on the hard disk of the computing system. We
then create an Input-Output interface, as shown in Fig. 2, which extracts articles relevant to the given
Wikipedia category. The last step of this process, which involves scanning the Wikipedia dump to retrieve the
contents of the articles in the title list and saving them in an XML file, is computationally time intensive, so it can
be parallelized on a GPU using CUDA for fast processing. The articles in Wikipedia may belong to
more than one category, some of which are the sub-categories of the original category we are
searching articles for. Also, in the dump, the sub-category is explicitly written with an article with no
way of determining the parent category. Thus, to search for articles which are inside a category and its
sub-categories, a method is needed to efficiently determine the sub-categories of the input category
and the articles inside a category. Given a category, the Wikipedia API is used to determine its sub-categories
and all of the article titles inside the category. Articles' titles are added to a "title list" and
we recursively search for articles within sub-categories using the API up to a specified depth (usually 2 or
3). After generating the title list, a simple scan of the Wikipedia dump is enough to retrieve the
content for each article title in the list. The articles for a category are stored inside a single XML file in
a separate directory in the non-volatile storage of the communication architecture.
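The title-list construction can be sketched as follows with the MediaWiki API; the category name and the depth limit are illustrative assumptions, and the actual system may batch or cache these requests differently.

# A minimal sketch of building the "title list" for a category by recursively
# querying the MediaWiki API for pages and sub-categories up to a fixed depth.
import requests

API = "https://en.wikipedia.org/w/api.php"

def category_members(category, cmtype):
    """Yield member titles of a category ('page' or 'subcat'), following continuation."""
    params = {
        "action": "query", "list": "categorymembers", "format": "json",
        "cmtitle": category, "cmtype": cmtype, "cmlimit": "500",
    }
    while True:
        data = requests.get(API, params=params).json()
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])

def build_title_list(category, depth=2):
    """Collect article titles in a category and its sub-categories up to `depth`."""
    titles = set(category_members(category, "page"))
    if depth > 0:
        for subcat in category_members(category, "subcat"):
            titles |= build_title_list(subcat, depth - 1)
    return titles

titles = build_title_list("Category:American male film actors", depth=2)
print(len(titles))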
After retrieving the articles inside a category, we extract the original content of the article with all
the wiki markup removed. The Wikipedia data syntax is very complex and building a perfect
Wikipedia parser is still an active area of research. However, based upon one of the available extractors,
which seems almost fine in our case except for a few anomalies like unclosed "br, p" tags remaining
undetected, we developed an extractor according to our needs and created separate files for article
headings and article bodies, which are stored in the non-volatile storage of the system.
While storing headings and body, the headings {references, external links, see also, further reading,
links} are excluded because they do not contain any detailed content about the article and are
just links to other pages. The references, external links, see also, further reading, and links were noisy
items and are filtered out. This helped in removing most of the noise and proved to be very useful in
later stages of the project.
The Input-Output interface also has an article reading interface which creates a stream of articles in
the category and yields the next article's headings and section bodies on every access. This prevents
possible memory overflow errors by not reading all of the articles at once into main memory for
further computing operations. Also, most of the machine learning algorithms used in the coming
stages are online in nature. Thus, the reading interface was indeed necessary.
Template Generator
Almost all of the articles in a Wikipedia category have some common sections which represent the
kind of information present in the articles of the category. These common sections can be utilized to
generate a new article for the category containing the information about similar topics as other articles
in the category. Such sections' headings are added together to form the template for a category.
To generate the template for a Wikipedia category, the section / sub-section headings of all the articles in
the category are considered, except the following non-useful headings:
{references, external links, see also, further reading, links},
which contain noisy content. Our methodology for generating the template is described in Fig. 3.
In the method, the Porter stemming algorithm is used to remove noise from the data. For example, "automatic"
and "automated" can't be considered different words. So, we remove the suffix of words by using the rules of the
Porter stemming algorithm to generate the root word, which in this case is "automate". Also, standard
English stop words are removed from each heading. Step 4 in the methodology can be distributed on
five concurrent machines for fast processing. In this case, the SharedMap is partitioned dynamically so
that each machine is accessing a different partition of the map at a time and not blocking the others.
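The heading-frequency step can be sketched in Python as below; the sample headings and the frequency threshold are illustrative assumptions, and NLTK's Porter stemmer and stop-word list stand in for whatever implementation the system actually uses.

# A minimal sketch of counting normalised heading frequencies for a category:
# headings are stemmed with the Porter stemmer, stop words are removed, and
# the most frequent normalised headings form the template.
from collections import Counter
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords   # requires nltk.download('stopwords')

EXCLUDED = {"references", "external links", "see also", "further reading", "links"}
stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def normalise(heading):
    tokens = [t for t in heading.lower().split() if t not in stop_words]
    return " ".join(stemmer.stem(t) for t in tokens)

# Hypothetical headings gathered from all articles in one category
headings = ["Early life", "Early Life", "Career", "Personal life",
            "Career", "Filmography", "References", "Early years"]

counts = Counter(normalise(h) for h in headings if h.lower() not in EXCLUDED)
template = [h for h, c in counts.most_common() if c >= 2]   # threshold assumed
print(template)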
Since the order of information matters very much in Wikipedia articles, it is equally important to
decide the order of headings in the template. The easy and reliable way to do this was by considering
the order of the headings in the existing articles. A high-level methodology for this is discussed in Fig. 4.
Query Generation and Selection
For each section of the template, we need to generate some queries with which we can retrieve useful
information from the web (Google search) for any new article topic. We took two approaches to the
problem. The first is generating queries from only the section headings and the other is generating queries
from the body text of the existing articles. Both of the approaches have their advantages and
drawbacks which are discussed in the following sub-sections.
Query Generation Based on Section Headings
A section's heading is able to retrieve useful general information about the section for the given title
on which article is to be generated. We generate two queries from each section heading:
format 1 - "Category" + "Article Title" + "Section Heading" + " -wikipedia", and
format 2 - "Article Title" + "Section Heading" + " -wikipedia"
The first format proves highly useful for titles which are a little famous or which may belong to two
different categories at the same time. This brings context to the query. For an American Male Actor
who has just begun his career, there might not be much highly ranked information about him on web
and thus, we get results from other people with same name if queried in format 2. Similarly, for a title
like "Tony Anthony", which is the name of a famous boxer and also a famous actor, we get mixed top
results from both if queried in second format. However, format 1 resolves both of these problems by
attaching a bit of context to the query. With an "American Male Actor" attached to the not so famous
actor title, we get top results of the actor and not any other person. Similarly, with "American Male
Actor Tony Anthony", we get results concerning the famous actor and not the famous boxer.
Generating Queries
For each section in the template, we find the top 10 section headings which are semantically similar to
the section's heading. As in Template Generation, we gather all the section / sub-section headings from
all the articles in the category and pre-process them by stemming and removing stop words. Next, we
run a topic modelling technique, Latent Semantic Analysis (LSA), with nearly 300 latent topics on the
headings, with each heading represented in the form of a bag of words vector. With no surprise, the top 5
scoring topics of this LSA are indeed related to the template headings we selected by simple
count, and thus it also proves that our methods are performing well.
Next, for each section heading in the template, we map it to the latent space created by LSA and find
its top 10 closest section headings from all of the input section headings in the latent space. The latent
space vector is more of a semantic representation for a heading and, by simply using the cosine similarity
measure between two vectors in latent space, we calculate the semantic similarity between two
headings. The overall method is described in Fig. 5. Step 4 in the methodology in Fig. 5 can be
parallelized on the same machine using multiple threads.
With each of the selected closest headings, we make a query and search it on the web using distributed
servers for fast processing in the two formats, format 3 and format 4, by replacing "Section Heading"
with the closest heading. They bring almost the same results as if searched using the template section
heading, but sometimes they result in much needed section specific content.
Query Generation Based on Body Text
Queries generated from the body text of each section proved to be more section specific as they represent
the type of information generally present in the body text of the section. Although they do get noisy
results, adding context like the section heading and category reduces the noise. The two formats of
queries are as follows:
format 3 - "Category" + "Article Title" + "Section Heading" + "query term" + " -wikipedia", and
format 4 - "Article Title" + "Section Heading" + "query term" + " -wikipedia"
The usefulness of adding "Category" in the query is the same as that of format 1.
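For illustration, the four query formats can be composed with a small helper like the one below; the sample category, title, heading and query term are assumptions, and "-wikipedia" simply excludes Wikipedia itself from the search results.

# A minimal sketch of composing the four query formats described above.
def make_queries(category, title, heading, query_term=None):
    base = f'{title} {heading}'
    queries = {
        "format_1": f'{category} {base} -wikipedia',
        "format_2": f'{base} -wikipedia',
    }
    if query_term:   # formats 3 and 4 add a query term derived from body text
        queries["format_3"] = f'{category} {base} {query_term} -wikipedia'
        queries["format_4"] = f'{base} {query_term} -wikipedia'
    return queries

print(make_queries("American male actors", "Tony Anthony", "early life", "high school"))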
Generating queries
We use topic modelling to find the most common topics talked about in the existing articles for each
section of the template heading. For a section, we run a topic modelling technique Latent Semantic
Analysis (LSA) on the contents of the same (and semantically related) section in different articles to
find out the top scoring topics. Although there are many other good topic modelling algorithms like
Latent Dirichlet Allocation (LDA), Deep Boltzmann Machines, and Replicated Softmax Topic Modelling,
LSA resulted in better top scoring topics than the others with marginally less computation time. So,
LSA is to be preferred if an online system is generating the queries on-the-go.
From the results, almost always the top one or two topics have far higher singular values than the rest of
the topics. Also, the largest weighted terms in the top 2 topics are repeated with considerably high weights
in the rest of the topics. So, we discard the rest of the topics, consider the top 10 weighted terms in the top
2 topics and combine them to form possible query terms. The complete methodology is described in Fig. 6.
These query terms are used to search the web on distributed systems for fast retrieval with both
formats 3 and 4, where they take the place of "query term".
All of the possible queries generated are too many to be searched on the Internet to retrieve information.
Most of them either bring redundant results or do not make much sense intuitively. Further, this
process is very resource intensive and time consuming due to remote server constraints and high
latencies. To handle this issue, we propose two possible ways of reducing queries in the following
two sub-sections.
Removing Queries Giving Similar Results
Queries which are derived from the body text, in either format 3 or 4, are going to give
results which are highly redundant. Thus, before running the best query selection model on a large number
of articles, we need to select a minimal set of queries per section from all the generated queries which
give negligibly redundant results and also almost fully cover the results retrieved by all the queries.
For every section of a category, we sample 50 articles randomly which contain the particular section
and perform the search on the web for all the possible queries generated from the body text (in format 4)
separately. For each article, we retrieve the top 10 URLs in the results for each query. The methodology for
removing queries with similar results is explained in Fig. 7.
The choice of 20 clusters felt reasonable looking at the total number of queries, which was around 70
for body text. Also, taking 20 clusters resulted in the maximum number of clusters having internal
purity greater than 0.5 and having at least 3 items.
Reducing Queries using Word2Vec
All of the bigram queries which make some sense are the ones whose terms occur very close to each
other in the sentences of existing articles or which have similar meanings. For example, "high" and
"school" occur very close to each other and thus, "high school" should make sense. Similarly,
"elevation metres", "town area", etc. are some of the sensible queries. A good technique to learn
semantic and syntactic relationships among words is Word2Vec. Word2Vec learns the vector
representations for words from the input sentences. We used the gensim library in Python for learning
these word representations. We extract the sentences from all of the articles in a category and input them
to the Word2Vec model. We used a skip-gram Word2Vec model with a window size of 5 and 20
dimensions for each word vector.
We drop the unigram queries, for they are present in the bigram ones, and calculate the cosine
similarity between the terms of each query. The similarity of the terms in each query is the
score of that query. We sort the queries by their scores in decreasing order and select the top 20
queries with similarity above 0.7. If any of the unigram queries is not present in any of the selected top
20 queries, we select another query with the highest score containing the corresponding unigram query.
queries. After discussing the both strategies for reducing queries generated from body text, we are
ready to analyse them both. Reducing queries by removing redundant queries depends on the search
engine used and takes around a day to execute for each section. Whereas reducing queries using word
vectors barely takes five to ten seconds and removes less serisible queries. Results of both the
strategies are very much the same ignoring the unigrnm queries. Thus, we finally prefer the later
strategy of reducing queries using Word2Vec in our design.
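A sketch of this Word2Vec-based scoring follows, using gensim's skip-gram model with the window size 5 and 20 dimensions mentioned above; the tiny training sentences and candidate bigrams are illustrative assumptions (gensim 4.x parameter names are assumed).

# A minimal sketch of scoring bigram queries by the semantic similarity of
# their two terms, then keeping the top queries above the 0.7 threshold.
from gensim.models import Word2Vec

# Hypothetical tokenised sentences extracted from the category's articles
sentences = [
    ["he", "attended", "high", "school", "in", "ohio"],
    ["the", "town", "area", "covers", "ten", "square", "kilometres"],
    ["its", "elevation", "is", "two", "hundred", "metres"],
]
model = Word2Vec(sentences, vector_size=20, window=5, sg=1, min_count=1)

bigram_queries = [("high", "school"), ("town", "area"), ("elevation", "metres")]
scored = [(q, float(model.wv.similarity(*q))) for q in bigram_queries
          if q[0] in model.wv and q[1] in model.wv]
scored.sort(key=lambda x: x[1], reverse=True)
selected = [q for q, s in scored if s > 0.7][:20]   # top 20, similarity above 0.7
print(scored, selected)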
Best Query Selection
After filtering out the redundant queries, we select the top queries from both the body and section
heading generated queries separately which bring the best content for each section.
For each section of the template of a Wikipedia category, we randomly sample 500 existing articles of
the category which contain the particular section. For each query of the section, we query the web for
every sampled article. Scoring of a query can be done using equation (1):
query score, q = Σ s over the m sampled articles ..... (1)
where "m" is the total number of articles sampled (nearly 500) and "s" is the maximum cosine
similarity score between any of the excerpts retrieved by the query and the original text of the section
in the Wikipedia article, as represented by equation (2):
s = max(cosine_simi) ..... (2)
where cosine_simi is the cosine similarity between excerpt i and the original section body.
The top scoring five queries are selected and stored in the non-volatile storage for each section of the
template. Our results, as shown in Fig. 8, clearly show that body queries either equal the section
heading generated queries or beat them in the quality of content fetched from the Internet. Our results
disprove the claims made by Sauper et al. (2009), where they described body queries to be noisier
compared to queries made with the section heading only.
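The scoring in equations (1) and (2) can be sketched as follows; TF-IDF vectors are an assumption for computing the cosine similarities, and the sample input is hypothetical (summing or averaging the per-article scores ranks the queries identically).

# A minimal sketch of the query score: per sampled article, take the maximum
# cosine similarity between any retrieved excerpt and the original section
# body (equation (2)), then accumulate over the m articles (equation (1)).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def query_score(per_article_results):
    """per_article_results: list of (original_section_text, [excerpt, ...])."""
    scores = []
    for original, excerpts in per_article_results:
        if not excerpts:
            scores.append(0.0)
            continue
        vec = TfidfVectorizer().fit([original] + excerpts)
        sims = cosine_similarity(vec.transform([original]), vec.transform(excerpts))
        scores.append(float(sims.max()))    # s = max over the retrieved excerpts
    return float(np.mean(scores))           # averaged over the sampled articles

sample = [("He was born in 1970 and grew up in Texas.",
           ["Born in 1970, he spent his childhood in Texas.",
            "The film grossed ten million dollars."])]
print(query_score(sample))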
Search and Content Extraction
The queries generated in the previous section are then fired on the Internet using Google web search
engine. The retrieved excerpts from the webpages contain noise in the form of advertisements,
website information, copyright notices, headers, footers, links to other articles, user comments, etc.,
which is removed using a content extraction algorithm. The cleaned excerpts are then further classified as
relevant to the topic or not. The overall operation is summarized in Figure 9.
Search for excerpts
The Web contains a plethora of information which can be harnessed in a variety of areas including natural
language processing, text mining, information retrieval, social media analysis and a host of other
areas. This makes the use of the Web as a corpus very attractive given the different types of applications it can
have. Our work also exploits the web for retrieving information pertaining to the required topic. The
best queries generated by the best query selection model are used to retrieve information from the
Internet. The retrieved information is then used to form the article. Given the potential and huge size
of the web, some web search engines like Google, Baidu, Yahoo, Bing have been developed. They act
as corpus managers giving the ability to search the web efficiently and quickly. Our work utilises
Google search because of the large number of web pages indexed by it. Also, Google search provides
more appropriate results to the query fired as compared to Yahoo, which was used by Sauper et al.
(2009). The queries generated in the best query selection step are used to obtain the top 10 results
from Google, excluding the results from the Wikipedia website. Since we are training and testing the
model on Wikipedia, we remove the content of Wikipedia from the search results.
If any link in the above top 10 results is a PDF file, then we use a tool called PDFMiner which extracts
the text from the PDF file and removes information in the form of tables or images from the PDF. The use
of PDFs becomes important especially in the City and Towns in India category, as many government
reports regarding various cities and towns are present in detail in the form of PDFs. So, by extracting the
text from PDF files, precious information is not lost that may have been lost otherwise. A local
cache is created which avoids the repeated retrieval of the same PDF file. Before fetching a PDF file, it is
checked whether the PDF file is present in the cache or not. The cache is synced with every hardware unit
which is processing the queries in parallel.
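A sketch of the fetch-and-cache step is given below; pdfminer.six's high-level extract_text function is assumed as the extractor, and the hash-based cache layout is an assumption for illustration only.

# A minimal sketch of fetching a PDF result once, caching it on disk, and
# extracting its text for later classification.
import hashlib, os
import requests
from pdfminer.high_level import extract_text

CACHE_DIR = "pdf_cache"
os.makedirs(CACHE_DIR, exist_ok=True)

def fetch_pdf_text(url):
    """Download a PDF only if it is not already in the local cache, then extract text."""
    path = os.path.join(CACHE_DIR, hashlib.md5(url.encode()).hexdigest() + ".pdf")
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(requests.get(url, timeout=30).content)
    return extract_text(path)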
Content Extraction
The top 10 webpages fetched from the Internet have a lot of noise in the form of navigation panels
including home, about me, contact us, copyright, header and footer, and advertisements, comments,
etc. This noise is called boilerplate. In many webpages this noise constitutes up to 35-40% of the
total content on the webpage, which makes it necessary to remove it. Among the two main
techniques for boilerplate removal - webpage level based and website level based - we choose the
former because we need to extract content from different webpages and using a website level
approach becomes computationally intensive.
The webpages are fetched using a Python script which iterates over the top 10 results returned by the
search engine. When we fetch a webpage we need to identify ourselves as a browser and not as a robot,
for which User-Agents are used. One important observation regarding content extraction is the fact
that many websites are rendered with a lighter HTML on mobile devices rather than on desktops.
So, identifying ourselves as a mobile device can make the content extraction process
easier and more efficient by retrieving the lighter HTML in the first place. The analysis of various
techniques such as the Sauper et al./Naive approach, CETR, MSS, Kohlschütter et al., and
Readability for boilerplate removal is performed against the dataset provided by CleanEval and
collected by us. After analyzing the precision, recall, F1-score and the trade-off among these parameters,
the methodology presented by Kohlschütter et al. was found to be best suited for our work.
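The mobile-rendering observation can be sketched as below; the User-Agent string is illustrative, and whichever boilerplate remover was selected (the Kohlschütter-style extractor above) would then be applied to the returned HTML.

# A minimal sketch of fetching a result page while identifying as a mobile
# browser, so many sites return the lighter HTML mentioned above.
import requests

MOBILE_UA = ("Mozilla/5.0 (Linux; Android 10; Mobile) "
             "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0 Mobile Safari/537.36")

def fetch_html(url):
    """Fetch a webpage as a mobile client and return its raw HTML."""
    resp = requests.get(url, headers={"User-Agent": MOBILE_UA}, timeout=30)
    resp.raise_for_status()
    return resp.text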
Excerpt Classification
The candidate excerpts returned from the web may have information irrelevant to the section. For
example, many of the web-pages in the search results may have relevant information in only one of the
excerpts (paragraphs), while the others are completely irrelevant. To tackle this problem, classification of
excerpts is done to find the most representative excerpts for the section.
We train classifiers for each section on separate hardware for fast processing. For each section of the
template for a category, we randomly partition the existing articles into training (80 percent) and
testing (20 percent) sets. From the training set, for all articles, the body of the particular section in
consideration, if present in the article, is labelled positive, while the bodies of all other sections in the
training set are labelled negative if their section heading is part of the template. The test set is also
labelled the same way.
Instead of taking only accuracy as a measure of performance on testing dataset, our measure for
performance on testing set is average of both accuracy and sensitivity (or True Positive Rate - TPR),
as shown by equation (3):
TruePositiveRate, TPR = TruePositives / (TruePositives + FalseNegatives) ..... (3)
accuracy, acc = (TruePositives + TrueNegatives) / (TotalPositives + TotalNegatives) ..... (4)
score = (acc + TPR) / 2.0 ..... (5)
The number of positively labelled samples can go down to nearly one-tenth of the negatively labelled ones
for many sections. There is a need to correctly identify positive samples as much as we can, for we
don't want to lose any useful information. However, accuracy (as represented by equation (4)) can
deceive us, as a higher number of true negatives (correctly identified negative samples) may contribute
to higher accuracy even if true positives are low (as their population is very low). Averaging accuracy
and TPR balances the two, and we have a reliable scoring measure (as represented by equation (5)).
We trained three classifiers for each section, namely, Naive Bayes, Logistic Regression and Linear
Support Vector Machine, with many configurations in the parameter space (grid search), distributed over
a GPU using CUDA programming, and selected the classifier and its configuration which gave the best
score on the testing set. We tried the following features:
a) Normalized bag of words (BOW), and
b) Term Frequency - Inverse Document Frequency (TF-IDF)
with unigram, bigram and trigram dictionaries separately, and selected the best features using the
performance on the test set. For each section, an integrated pipeline is created whose 1st step involves
selection of a feature representation among the above mentioned features and whose 2nd step involves selection of a
classifier, Naive Bayes, Logistic Regression or Linear SVM, and performing a grid search on the set
of parameters. The features used along with the classifier and its parameters which perform best on the
testing set are stored so as to use them in future without retraining.
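The two-step pipeline and grid search can be sketched with scikit-learn as below; the grid values shown are assumptions, labels are assumed to be 0/1 (negative/positive), and the scorer implements the accuracy/TPR average of equation (5).

# A minimal sketch of the per-section feature + classifier selection.
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.metrics import make_scorer, accuracy_score, recall_score

def acc_tpr_score(y_true, y_pred):
    # equation (5): average of accuracy and the true positive rate (recall)
    return (accuracy_score(y_true, y_pred) + recall_score(y_true, y_pred)) / 2.0

pipeline = Pipeline([("features", TfidfVectorizer()), ("clf", MultinomialNB())])
param_grid = [{
    "features": [CountVectorizer(), TfidfVectorizer()],          # step 1: BOW or TF-IDF
    "features__ngram_range": [(1, 1), (1, 2), (1, 3)],           # unigram / bigram / trigram
    "clf": [MultinomialNB(), LogisticRegression(max_iter=1000), LinearSVC()],  # step 2
}]
search = GridSearchCV(pipeline, param_grid, scoring=make_scorer(acc_tpr_score), cv=3)
# search.fit(train_texts, train_labels)   # texts/labels come from the 80/20 split above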
The difference between the text in the training & testing sets and the excerpts retrieved from the web is that the
information in sections of existing articles is very coherent, whereas web excerpts have information
distributed here and there. We found many excerpts which contain sentences distributed across
them that positively qualify for a section, but the overall prediction for the excerpt by the classifier is
negative. To tackle this problem, we break the excerpts into "context windows" of size 3, where each
context window can contain at most 3 continuous sentences from the excerpt. This way, an excerpt
having N sentences results in (N-3+1) context windows. Generally, 3 or 4 continuous sentences have the
same context, and thus, our classifier now correctly labels each window based on whether the context
of the window is required for the particular section or not. After classifying, we combine the
positively classified windows in such a way that those belonging to an excerpt are put together in a
single paragraph with no redundant sentences. Now, we have excerpts whose overall context is more
or less similar to that of the section for which we are searching.
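A sketch of the context-window step follows; the naive sentence splitter is an assumption, and `classify` is a placeholder for whichever per-section classifier was selected above.

# A minimal sketch of splitting an excerpt into windows of at most 3
# consecutive sentences, classifying each window, and stitching the
# positively classified sentences back together without repetition.
import re

def sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def filter_excerpt(text, classify, size=3):
    sents = sentences(text)
    if not sents:
        return ""
    keep = set()
    for i in range(max(1, len(sents) - size + 1)):    # N sentences give N-3+1 windows
        window = " ".join(sents[i:i + size])
        if classify(window):                           # placeholder for the trained classifier
            keep.update(range(i, min(i + size, len(sents))))
    return " ".join(sents[i] for i in sorted(keep))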
The positively classified excerpts for a section may have any of the following anomalies:
(Anomaly - a) The context of the excerpt is the same as that of the section, but the excerpt is concerned with
some other entity. For example, an excerpt about the personal life of any other actor might be positively qualified in a
search where we want content for the personal life of only a specific actor.
(Anomaly - b) Some excerpts are interviews on websites and have a high usage of first and second
person pronouns (I, we, us, me, mine, etc.) along with a lot of question and exclamation marks.
However, we do not want first or second person pronouns in our article. Also, Wikipedia articles tend
not to contain question and exclamation marks in them.
(Anomaly - c) Some excerpts are positively classified for more than one section at a time, when we
are searching for a particular section. This creates confusion about which section the information in
the excerpt belongs to.
To take care of the above anomalies, we perform the following operations on the positively classified
excerpts:
1) Handling (Anomaly - a): Count the number of occurrences of the article's title in the excerpt. If we have
a zero count, we ignore the excerpt. An excerpt without even a mention of the article's title at least once
is very likely to be about some other entity.
2) Handling (Anomaly - b): If the number of sentences in the excerpt having a first or second person pronoun,
question marks or exclamation marks is greater than one fifth of the total number of sentences in the
excerpt, we ignore the excerpt. This is most likely to be an interview or some noisy content.
3) Handling (Anomaly - c): This type of anomaly is very difficult to handle automatically. However,
by defining a precedence relationship over sections, we can easily decide a section for an excerpt.
Finally, after performing the above operations, we select the top 10 excerpts with respect to the title counts
for summarisation and redundancy removal (next stage).
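The three filters can be sketched as below; the pronoun list follows the description, while the sentence splitter and the example precedence order are assumptions for illustration.

# A minimal sketch of the anomaly filters: (a) title-mention count,
# (b) interview-like excerpt detection, (c) precedence-based section choice.
import re

PRONOUNS = {"i", "we", "us", "me", "mine", "you", "your", "our"}

def title_count(excerpt, title):
    return excerpt.lower().count(title.lower())        # zero means "ignore the excerpt"

def looks_like_interview(excerpt):
    sents = [s for s in re.split(r"(?<=[.!?])\s+", excerpt) if s.strip()]
    noisy = sum(1 for s in sents
                if "?" in s or "!" in s
                or PRONOUNS & set(re.findall(r"[a-z']+", s.lower())))
    return bool(sents) and noisy > len(sents) / 5.0     # more than one fifth of sentences

def resolve_section(candidate_sections,
                    precedence=("early life", "career", "personal life")):
    # pick the highest-precedence section among the positively classified ones
    for section in precedence:
        if section in candidate_sections:
            return section
    return candidate_sections[0] if candidate_sections else None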
Multi-document summarisation
Multi-document summarization is related to taking multiple documents and then creating a single
comprehensive summary covering the diversity of all the topics. Multi-document summarization uses
two approaches: extractive and abstractive. The abstractive approach requires the use of natural language
processing and generation techniques which are not well developed. So the extractive multi-document
summarization approach is more popular.
While generating a summary, the following factors need to be taken into consideration:
- Relevance: Summary should contain informative textual units (sentences) which are relevant
to the topic.
- Redundancy: Similar sentences should not appear more than once in the summary.
- Length: A bound on the length of the summary is defined; the score of the summary needs to
be maximized subject to this length bound.
This is modeled as a global inference problem: the selection of a sentence for the summary depends not only
on its own properties but also on the complete document set.
Global Inference
The input is a set of documents D = {d1, d2, ..., dk}. A document di contains a set of textual units di =
{t1, t2, ..., tm}. The textual units are essentially the sentences present in the documents; textual units
and sentences are used interchangeably in the rest of the description. The document set now becomes a set of
sentences from different documents.
Summary generation undergoes the following steps, already reported in the literature:
1. Cleaning of the input document set: Headers and non-alpha-numeric characters are removed
from the documents.
2. Documents are split into sentences and sentences are stored on the local machine.
3. Sentences having less than 5 words are considered to be irrelevant and are thus omitted from
further consideration.
4. The parameters used in the model are computed.
5. A global inference algorithm is used to maximize the score of the summary, thus
extracting the sentences which contribute to maximizing the summary score.
The following scoring parameters are used in the score of the summary:
1. Relevance(i): The measure of relevance of a textual unit ti participating in the summary.
2. Redundancy(i, j): The measure of redundancy of a textual unit ti with another textual unit tj;
the higher the value, the higher the similarity between the two textual units.
3. Len(i): The length of the textual unit ti.
Now, the multi-document summarization inference problem can be stated by equations (6) and (7):
score(S) = max [ Σ Relevance(i) - Σ Redundancy(i, j) ],  ti, tj ∈ S, i < j ..... (6)
subject to Σ Len(i) ≤ L,  ti ∈ S ..... (7)
where S is the set of textual units selected for the summary and L is the bound on the summary length.
The selection is computed with a dynamic programming procedure over the textual units, in which each
table entry dp[i][k] keeps the better scoring of two candidate partial summaries T1 and T2:
6. if score(T1) > score(T2) then
7.     dp[i][k] = T1
8. else
9.     dp[i][k] = T2
10. return the maximum scoring entry of dp
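For illustration only, the objective of equations (6) and (7) can be exercised with a simple greedy loop as below; the specification's global inference / dynamic programming explores the space more exhaustively, and the relevance, redundancy and length inputs here are hypothetical.

# A minimal greedy sketch of relevance-minus-redundancy selection under a
# length budget; it only illustrates the objective, not the exact procedure.
def summarise(units, relevance, redundancy, length, budget):
    """units: sentence ids; relevance[i], redundancy[i][j], length[i] are given."""
    summary, used = [], 0
    remaining = set(units)
    while remaining:
        def gain(i):
            return relevance[i] - sum(redundancy[i][j] for j in summary)
        best = max(remaining, key=gain)
        if gain(best) <= 0 or used + length[best] > budget:
            break
        summary.append(best)
        used += length[best]
        remaining.remove(best)
    return summary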
The data of two categories, namely American Male Actors and City and Towns in India, were taken.
The American Male Actors category was chosen to compare our results with Sauper et al. (2009). The City and
Towns in India category has very rich information in its articles. The results were compared with the
results of Sauper et al. (2009) and an Oracle, which are discussed below:
Oracle: It is calculated using the present model without the multi-document summarization part. The
maximum cosine similarity between any of the positively classified excerpts and the
original content of the section in a Wikipedia article is measured. Averaging this maximum cosine similarity
over the test dataset gives the oracle score. It is the optimal score achievable by the present model and
thus gives an upper limit for the task of selecting relevant information.
After generating a final Wikipedia article, the quality needs to be evaluated. The evaluation
methodology followed is similar to that of evaluating a summary. Both human and automatic
evaluations can be used. Manual evaluation is very expensive. One of the most promising scores is
ROUGE (Recall-Oriented Understudy for Gisting Evaluation), and it has become a standard for
automatic evaluation of summaries as it correlates well with human evaluation.
The generated article has been compared with the original Wikipedia article using the ROUGE-N
score with N being 1, where N=1 implies considering unigrams only, N=2 implies considering bigrams only,
and so on. The results prove that the present system shows various merits in view of the existing
techniques.
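For illustration, a bare-bones ROUGE-N recall computation is sketched below; published ROUGE implementations add stemming and other refinements, and the example strings are hypothetical.

# A minimal sketch of ROUGE-N recall: the fraction of the reference article's
# n-grams that also appear in the generated article (here N = 1 by default).
from collections import Counter

def ngrams(text, n=1):
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(generated, reference, n=1):
    ref, gen = ngrams(reference, n), ngrams(generated, n)
    overlap = sum(min(count, gen[gram]) for gram, count in ref.items())
    return overlap / max(1, sum(ref.values()))

print(rouge_n("he was born in texas in 1970", "he was born in 1970 in texas"))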

We claim:
1. An automatic Wikipedia article generation system comprises:
i. Input/output device wherein said device interacts with computer program and computer device
through network interface and bus connected through network channel;
ii. Storage device wherein said storage device is hard disk storage device for storing the components
of system which are read or written by computer program;
iii. Graphics processing unit wherein said unit performs computational operations;
iv. Data collection device wherein said device extracts article, performs scanning to retrieve content of
articles;
v. Template generator unit wherein said unit generates new article for similar topics;
vi. Query generation unit wherein said unit generates query which can retrieve information from web;
vii. Search and Content Extraction unit wherein said unit filters noise that links to other articles.
2. The system as claimed in claim 1 accepts any Wikipedia category as input for which a Wikipedia article
is to be generated automatically and returned as an output.
3. The system as claimed in claim 1 generates templates for any input Wikipedia category using the
structure consisting of section/sub-section headings of the existing articles in the said category.
4. The system as claimed in claim 1 wherein said system learns the order of the section headings in
the template, assigns a priority number to each heading of the template and arranges them
accordingly in the template output.
5. The system as claimed in claim 1, fetches highly contextual information for any entity in a
Wikipedia category from the Internet, wherein the system does so by utilising the content of existing
articles in that category and generates possible queries, selects a maximal set of queries which are
least redundant and highly relevant for gathering the information about a new entity.
6. The system as claimed in claim 1 generates queries from the body text and section headings of the
existing articles in a Wikipedia category, wherein the queries generated from the body text are less noisy than those
generated from the section headings.
7. The system as claimed in claim 1 measures the semantic relationships from the queries, wherein the
said system displays them graphically.
8. The system as claimed in claim 1 processes varied Internet text documents, including PDF, and extracts
the text therefrom.
9. The system as claimed in claim 1 measures the quality of generated queries for information retrieval
and filters out any piece of text which is less relevant for an entity.
10. The system as claimed in claim 1 generates Wikipedia articles for multiple languages.

Documents

Orders

Section Controller Decision Date

Application Documents

# Name Date
1 3280-DEL-2015-IntimationOfGrant19-12-2024.pdf 2024-12-19
2 3280-DEL-2015-PatentCertificate19-12-2024.pdf 2024-12-19
3 3280-DEL-2015-Response to office action [19-12-2024(online)].pdf 2024-12-19
4 3280-DEL-2015-AMMENDED DOCUMENTS [02-07-2024(online)].pdf 2024-07-02
5 3280-DEL-2015-Annexure [02-07-2024(online)].pdf 2024-07-02
6 3280-DEL-2015-FORM 13 [02-07-2024(online)].pdf 2024-07-02
7 3280-DEL-2015-MARKED COPIES OF AMENDEMENTS [02-07-2024(online)].pdf 2024-07-02
8 3280-DEL-2015-Written submissions and relevant documents [02-07-2024(online)].pdf 2024-07-02
9 3280-DEL-2015-FORM-26 [17-06-2024(online)].pdf 2024-06-17
10 3280-DEL-2015-Correspondence to notify the Controller [14-06-2024(online)].pdf 2024-06-14
11 3280-DEL-2015-FORM 13 [07-06-2024(online)].pdf 2024-06-07
12 3280-DEL-2015-POA [07-06-2024(online)].pdf 2024-06-07
13 3280-DEL-2015-RELEVANT DOCUMENTS [07-06-2024(online)].pdf 2024-06-07
14 3280-DEL-2015-US(14)-HearingNotice-(HearingDate-19-06-2024).pdf 2024-06-04
15 3280-DEL-2015-EVIDENCE OF ELIGIBILTY RULE 24C1f [03-05-2023(online)].pdf 2023-05-03
16 3280-DEL-2015-FORM 18A [03-05-2023(online)].pdf 2023-05-03
17 3280-DEL-2015-Abstract-090721.pdf 2021-10-17
18 3280-DEL-2015-Claims-090721.pdf 2021-10-17
19 3280-DEL-2015-Correspondence-090721.pdf 2021-10-17
20 3280-DEL-2015-Description(Complete)-090721.pdf 2021-10-17
21 3280-DEL-2015-Drawing-090721.pdf 2021-10-17
22 3280-DEL-2015-FER.pdf 2021-10-17
23 3280-DEL-2015-Form 2(Title Page)-090721.pdf 2021-10-17
24 3280-DEL-2015-Form 3-090721.pdf 2021-10-17
25 3280-DEL-2015-OTHERS-090721.pdf 2021-10-17
26 3280-DEL-2015-FORM-8 [22-09-2021(online)].pdf 2021-09-22
27 3280-DEL-2015-ABSTRACT [23-06-2021(online)].pdf 2021-06-23
28 3280-DEL-2015-Annexure [23-06-2021(online)].pdf 2021-06-23
29 3280-DEL-2015-CLAIMS [23-06-2021(online)].pdf 2021-06-23
30 3280-DEL-2015-COMPLETE SPECIFICATION [23-06-2021(online)].pdf 2021-06-23
31 3280-DEL-2015-DRAWING [23-06-2021(online)].pdf 2021-06-23
32 3280-DEL-2015-FER_SER_REPLY [23-06-2021(online)].pdf 2021-06-23
33 3280-DEL-2015-FORM 3 [23-06-2021(online)].pdf 2021-06-23
34 3280-DEL-2015-OTHERS [23-06-2021(online)].pdf 2021-06-23
35 3280-DEL-2015-FORM 13 [04-06-2021(online)].pdf 2021-06-04
36 3280-DEL-2015-8(i)-Substitution-Change Of Applicant - Form 6 [01-06-2021(online)].pdf 2021-06-01
37 3280-del-2015-Annexure [01-06-2021(online)].pdf 2021-06-01
38 3280-DEL-2015-ASSIGNMENT DOCUMENTS [01-06-2021(online)].pdf 2021-06-01
39 3280-DEL-2015-ENDORSEMENT BY INVENTORS [01-06-2021(online)].pdf 2021-06-01
40 3280-DEL-2015-EVIDENCE FOR REGISTRATION UNDER SSI [01-06-2021(online)].pdf 2021-06-01
41 3280-DEL-2015-FORM28 [01-06-2021(online)].pdf 2021-06-01
42 3280-DEL-2015-OTHERS [01-06-2021(online)].pdf 2021-06-01
43 3280-DEL-2015-Proof of Right [01-06-2021(online)].pdf 2021-06-01
44 Form 18 [06-10-2016(online)].pdf 2016-10-06
45 3280-del-2015-Form-1-(13-10-2015).pdf 2015-10-13
46 3280-del-2015-Form-2-(13-10-2015).pdf 2015-10-13
47 3280-del-2015-Form-3-(13-10-2015).pdf 2015-10-13
48 3280-del-2015-Form-5-(13-10-2015).pdf 2015-10-13

Search Strategy

1 2020-12-2215-04-02E_22-12-2020.pdf

ERegister / Renewals

3rd: 12 Mar 2025

From 13/10/2017 - To 13/10/2018

4th: 12 Mar 2025

From 13/10/2018 - To 13/10/2019

5th: 12 Mar 2025

From 13/10/2019 - To 13/10/2020

6th: 12 Mar 2025

From 13/10/2020 - To 13/10/2021

7th: 12 Mar 2025

From 13/10/2021 - To 13/10/2022

8th: 12 Mar 2025

From 13/10/2022 - To 13/10/2023

9th: 12 Mar 2025

From 13/10/2023 - To 13/10/2024

10th: 12 Mar 2025

From 13/10/2024 - To 13/10/2025

11th: 12 Mar 2025

From 13/10/2025 - To 13/10/2026