Abstract: The present invention relates to the conversion of a file from one format to another within a computer system. In particular, it relates to the conversion of a file from Document to XML format. More particularly, it relates to a method and a system for converting a file from Document to XML format.
FORM 2
THE PATENT ACT 1970 (39 of 1970)
&
The Patents Rules, 2003
COMPLETE SPECIFICATION
(See Section 10 and rule 13)
1. TITLE OF INVENTION
A METHOD AND SYSTEM FOR CONVERTING A FILE FROM DOCUMENT TO XML FORMAT
2. APPLICANT(S)
a) Name : I MEDIA CORP LIMITED
b) Nationality : INDIAN Company
c) Address : EL 2 01, TTC INDUSTRIAL ESTATE,
MIDC MAHAPE,
NAVI MUMBAI - 400 079
MAHARASHTRA
3. PREAMBLE TO THE DESCRIPTION
The following specification particularly describes the invention and the manner in which it is to be performed:
A Method and System for Converting a File from Document to XML Format
Field of Invention:
The present invention relates to the conversion of a file from one format to another within a computer system. In particular, it relates to the conversion of a file from Document to XML format. More particularly, it relates to a method and a system for converting a file from Document to XML format.
Prior Art:
Document format is popular because almost all users can open and print such files easily, and because they are simple to create and post. However, the format has certain disadvantages that lead users to avoid it:
• it is difficult to ensure consistency of formatting, so Documents can look unprofessional
• Documents are vulnerable to corruption by viruses
• Documents may display strangely if they are opened using a character set different from the one with which they were created
• Documents appear vulnerable to appropriation, as they are easily downloaded and edited
XML stands for Extensible Markup Language. XML is a markup language much like HTML. XML was designed to describe data. XML tags are not predefined. You must define your own tags. XML uses a Document Type Definition (DTD) or an XML Schema to describe the data. XML with a DTD or XML Schema is designed to be self-descriptive. XML is a W3C Recommendation.
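The point that XML tags are not predefined can be illustrated with a minimal Python sketch. The tag names (`memo`, `author`, `subject`, `body`) and the helper `fields` are invented for this illustration and do not come from the specification. Validation against a DTD or XML Schema is not shown, since the standard-library parser used here is non-validating.

```python
import xml.etree.ElementTree as ET

# The tag names below (memo, author, subject, body) are invented for this
# illustration; XML itself does not predefine them.
MEMO_XML = """
<memo>
  <author>A. Writer</author>
  <subject>Quarterly review</subject>
  <body>Figures are attached.</body>
</memo>
"""

def fields(xml_text):
    """Parse the XML and return the root tag plus a tag-to-text mapping."""
    root = ET.fromstring(xml_text)
    return root.tag, {child.tag: child.text for child in root}

root_tag, memo_fields = fields(MEMO_XML)
```

Because the data is labelled by its own tags, a program can read the subject or the author by name without knowing anything about how the memo was formatted.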
XML has become a buzzword so overused that it is difficult to tell when it is appropriate and when it is not. In general, the main reason for XML's popularity is that it provides an underlying technology that gives "portability" of information across platforms, applications, and organizations.
Much of the emphasis on XML has been on sending "structured" data between companies. For example, if company A wants to send a purchase order to company B, they both need to agree on a formatting convention. XML provides both the language for describing that formatting convention and a convenient way to actually send the purchase order data.
While there are significant benefits to having inter-operable structured data, we believe that an equally important use of XML is the creation, storage, indexing, and publishing of Documents, often referred to as "unstructured content". Unstructured (and semi-structured) content in corporations today is kept in a number of locations and typically makes up about 80% of a company's overall data and information. Unlike structured data, which typically lives in databases and is well-ordered, unstructured content lives on individual file servers (as Microsoft Word or PDF files), in groupware databases (like Lotus Notes), on web servers (as HTML Documents) or in other legacy systems.
Why Create/Convert Documents to XML?
1. Allows Intelligent Queries of Content.
One of the main reasons to get Documents out of their existing formats is to be able to search / index those Documents in a meaningful way.
Say, for example, that your organization has one or more directories full of resumes.
Many resumes arrive by email or in Microsoft Word (.DOC) format. This is not a particularly useful format for searching or indexing. Suppose you wanted to run a query to find "all people who worked for Lotus from 1998-2000". It is difficult, if not impossible, to find this information from a group of files sitting on a file server. One approach has been to full-text index the Documents. This might help you find all people with the word Lotus in their resume, but there is still no intelligence around the indexing. If the Documents were broken into meaningful XML formats (such as HR-XML), this type of querying would be much easier, because you would have turned your Documents into a virtual database.
Similarly, if you were a mutual fund company, you might have a collection of investment research gathered from a number of different sources, sitting on file servers as PDF files. PDF files are particularly difficult to work with because they aren't meant to be edited, only read. However, you might want to query this body of research to find all of the research reports which upgraded a stock from a Buy to a Strong Buy. Again, if you were to convert these into a meaningful XML format (such as RIXML), you would be able to run this type of query against the source data because it would be intelligently categorized.
This "intelligent" indexing can happen even if the Documents stay as individual XML files on the file server, or it could happen by moving the XML into an XML database store.
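The resume query described above can be sketched in Python once the resumes are available as XML. The element and attribute names (`resume`, `job`, `employer`, `start`, `end`) and the function `worked_at` are assumptions made for this example; they are not taken from HR-XML or from the specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical resume markup; a real HR-XML document would look different.
RESUMES_XML = """
<resumes>
  <resume>
    <name>Candidate A</name>
    <job employer="Lotus" start="1998" end="2000"/>
    <job employer="Acme" start="2000" end="2003"/>
  </resume>
  <resume>
    <name>Candidate B</name>
    <job employer="Lotus" start="2001" end="2004"/>
  </resume>
  <resume>
    <name>Candidate C</name>
    <job employer="Acme" start="1997" end="1999"/>
  </resume>
</resumes>
"""

def worked_at(root, employer, first_year, last_year):
    """Return names of people whose time at `employer` overlaps the year range."""
    names = []
    for resume in root.findall("resume"):
        for job in resume.findall("job"):
            if (job.get("employer") == employer
                    and int(job.get("start")) <= last_year
                    and int(job.get("end")) >= first_year):
                names.append(resume.findtext("name"))
                break  # one matching job is enough for this person
    return names

resumes = ET.fromstring(RESUMES_XML)
matches = worked_at(resumes, "Lotus", 1998, 2000)
```

A full-text index would return both Candidate A and Candidate B for the word "Lotus"; the structured query can also take the employment dates into account.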
2. Write Once, Publish Many Times.
Perhaps the most important reason to convert Documents to XML arises when those Documents need to be published. Corporations today have more than one channel of information to their customers. These include printed Documents and manuals, electronic communication that is emailed (brochures, email), and web sites (which are in HTML format).
Most companies don't have a coherent strategy for external publishing - it is done in different ways throughout the company. One group might use Word Documents which are printed directly. Another might use a content management system for the web site. Yet another might convert to PDF for manuals.
The key with XML, as shown in Figure 1, is that it can be transformed into the appropriate publishing format: Word (DOC/RTF), HTML (for web sites), PDF (for printed Documentation), DocBook (an XML standard for storage and sharing of content), WML (for wireless devices), and any other format which becomes available in the future. This saves time and money because effort doesn't have to be repeated. With the push of a button, the XML can be transformed into the target format using XSLT (XML stylesheets).
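The "write once, publish many times" idea can be sketched as one XML source rendered into two output formats. The text names XSLT as the transformation mechanism; since the Python standard library has no XSLT processor, this sketch substitutes two plain Python functions for the stylesheets. The markup (`article`, `title`, `para`) and the function names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# A single hypothetical source document.
SOURCE_XML = """
<article>
  <title>Release Notes</title>
  <para>Version 2 adds batch conversion.</para>
  <para>Known issues are listed below.</para>
</article>
"""

def to_html(article):
    """Render the article as an HTML fragment."""
    parts = ["<h1>%s</h1>" % escape(article.findtext("title"))]
    parts += ["<p>%s</p>" % escape(p.text or "") for p in article.findall("para")]
    return "\n".join(parts)

def to_text(article):
    """Render the same article as plain text with an underlined title."""
    title = article.findtext("title")
    lines = [title, "=" * len(title)]
    lines += [p.text or "" for p in article.findall("para")]
    return "\n".join(lines)

article = ET.fromstring(SOURCE_XML)
html_output = to_html(article)
text_output = to_text(article)
```

Adding a further output channel means adding one more rendering function (or stylesheet); the source content is not touched.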
3. Custom-Assemble Documents for Customers, Business Partners.
Another key benefit that comes from having content stored in XML format is that it can be "custom-assembled". This means that customer A, who might be interested only in research about two companies in the semiconductor industry and three companies in software, can get a research report that covers only those companies, rather than having to go through dozens of companies in each industry. This is possible because the content can be assembled on the fly, as shown in Figure 2.
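Custom assembly can be sketched as selecting only the fragments a given customer has asked for and building a new document from them. The company names, the `report` markup, and the function `assemble` are hypothetical and serve only to illustrate the idea.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical research store; each report is an independent XML fragment.
RESEARCH_XML = """
<research>
  <report company="ChipCo" industry="semiconductor">ChipCo outlook.</report>
  <report company="WaferInc" industry="semiconductor">WaferInc outlook.</report>
  <report company="SoftOne" industry="software">SoftOne outlook.</report>
  <report company="FabWorks" industry="semiconductor">FabWorks outlook.</report>
</research>
"""

def assemble(root, wanted_companies):
    """Build a new <research> document holding only the requested reports."""
    custom = ET.Element("research")
    for report in root.findall("report"):
        if report.get("company") in wanted_companies:
            # Copy the fragment so the source document is left unchanged.
            custom.append(copy.deepcopy(report))
    return custom

store = ET.fromstring(RESEARCH_XML)
customer_a = assemble(store, {"ChipCo", "SoftOne"})
selected = [r.get("company") for r in customer_a.findall("report")]
```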
4. Saves Time and Money by Streamlining the Authoring Process.
Research has shown that as much as 50% of the time spent during the authoring process goes to formatting. By having templates for Documents that are similar (which can be done using XSLT) and using an XML authoring tool, the author only has to worry about the content. For example, most press releases look the same, as do most product brochures. Most proposals should look the same, but often don't. Using XML as the mechanism for authoring and storing content can enforce consistency in standards and free users from worrying about the eventual formatting, which will be handled by the templates and by validation files (DTDs or XML schemas).
5. Encourages Reuse of Documents and Fragments.
XML allows for the storage of "Document fragments", which encourages reuse of existing content. This means that you will be able to find Document fragments and include them in new Documents much more easily.
6. Distributed Authoring and Security.
XML is ideal for a content management system where dozens of people need to contribute content. Existing authoring tools, such as Word and other desktop editors, are not ideal for this type of environment. Because each section (or page, within a web site) may have one or more people who are allowed to edit it, storing pages in XML format allows each to be treated as a separate object with separate permissions, and authors can simultaneously edit different pages within the overall Document.
Another key benefit concerns end users who are only allowed to view certain parts of Documents: assembling the final Document based on the preferences of the end user is a better way to distribute Documents. Again, if all the sections are in XML, this type of end user security becomes much easier to enforce. If all the sections are stored in Word or PDF files, this becomes a much more difficult task.
7. Syndication of Content - Web Services.
XML is the language of Web Services and of Syndication of Content. This means that you can distribute your content (research reports, press releases, product catalogs, brochures) to other web sites or companies who may need to include your information on their site, but with some changes. Syndication of Content is often used for aggregation of content from different sources (for example, an industry site might want to publish a press release that your company created). If the information is provided in HTML, this is problematic because each source site will have different
formatting. However, if each source company provides XML (even if they provide slightly differing XML), the aggregation site can easily transform it into a consistent format.
Web services are an emerging trend where one server makes a request for content from another server. This could be any type of content, or could be more programmatic structured data. By converting your Documents into XML, you open up Web Services for Documents, which allows for better information sharing with customers, business partners, and suppliers. For more on Web Services, see the upcoming white paper, Web Services for Documents.
8. Portability of content.
Many web content management systems provide distributed authoring, reuse of fragments, etc., but do not store their content in an XML format. This makes it very difficult to move off that particular content management system. If, however, the data is in XML (or can be easily exported into XML), then the end user has the flexibility to migrate the content easily into another system that supports XML rather than being tied to a particular vendor.
In addition to all of these specific business benefits, XML is particularly well suited technically for the storage of unstructured and semi-structured content. This is because most Documents have a tree-like structure (title, heading 1, section 1, paragraph 1, etc.), and XML has a tree-like structure as well. A great deal of content has been published in HTML format over the last five years (millions of pages), and XML is a perfect format for distributing this information between sites. That is because HTML and XML are both based on SGML, which is a more generic language for defining Documents.
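The correspondence between a Document's outline and an XML tree can be sketched as follows. The input is a flat list of (kind, text) pairs, a simplification assumed for this example; the element names (`document`, `section`, `heading`, `para`) and the function `build_tree` are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Flat outline of a hypothetical Document: (kind, text) pairs in reading order.
LINES = [
    ("title", "User Guide"),
    ("heading", "Installation"),
    ("para", "Run the installer."),
    ("heading", "Usage"),
    ("para", "Open a file."),
    ("para", "Choose Convert."),
]

def build_tree(lines):
    """Nest each paragraph under the most recent heading's section."""
    doc = ET.Element("document")
    section = None
    for kind, text in lines:
        if kind == "title":
            ET.SubElement(doc, "title").text = text
        elif kind == "heading":
            section = ET.SubElement(doc, "section")
            ET.SubElement(section, "heading").text = text
        else:
            # Paragraphs before any heading attach directly to the document.
            parent = section if section is not None else doc
            ET.SubElement(parent, "para").text = text
    return doc

tree = build_tree(LINES)
serialized = ET.tostring(tree, encoding="unicode")
```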
From the above discussion it is clear why XML format is preferred over Document format. It is therefore necessary to provide a method as well as a system for converting a file from Document to XML format.
Object of the Invention:
The object of the present invention is to convert a file from Document to XML format.
Another object of the present invention is to provide a method for converting a file from Document to XML format.
Another object of the present invention is to provide a system for converting a file from Document to XML format.
Statement of the Invention:
To achieve the above mentioned objects there is provided a method for converting a file from Document to XML format within a computer system, said method comprising:
storing a Document file to be converted into XML in a computer readable storage medium within said computer system;
storing one or more conversion parameters required for converting a file from Document to XML format in said computer readable storage medium;
requesting the conversion of said file from Document to XML format from a conversion software module;
retrieving automatically said conversion parameters from said computer readable storage medium by the conversion software module;
performing conversion of said file into XML using said conversion parameters; and
outputting the resulting XML file from said conversion software module to an application folder.
According to the present invention, it is preferable to enter the conversion parameters manually.
The method of the present invention further comprises the step of initializing said conversion software module, thereby allowing said conversion software module to be executed automatically.
According to the present invention, the conversion software module consists of a set of instructions capable of performing the task of Document to XML conversion.
There is also provided a computer system for converting a file from Document to XML format, said system comprising:
at least one processor arranged to accept a request for conversion of a file from Document to XML format;
a computer readable storage medium for storing a Document file to be converted and the conversion parameters required for converting a file from Document to XML format; and
a conversion software module consisting of a set of instructions capable of being executed by said processor to perform the task of Document to XML conversion.
In a preferred form of the invention, the processor includes a computer server arrangement.
Brief description of the Drawings:
Detailed Description of the Invention:
The above and other objects, features and advantages of the invention will become apparent from the following description read in conjunction with the accompanying drawings.
According to the present invention, a computer system for converting a file from Document to XML format comprises: at least one processor arranged to accept a request for conversion of a file from Document to XML format; a computer readable storage medium for storing a Document file to be converted and the conversion parameters required for converting a file from Document to XML format; and a conversion software module consisting of a set of instructions capable of being executed by said processor to perform the task of Document to XML conversion.
One such computer system in its physical form is shown in figure 3.
As discussed herein, a "system" or "computer system", such as a system for accessing, presenting, and manipulating general-purpose data sources responsive to virtual file system operations, may be an apparatus including hardware and/or software for processing data. The system may include, but is not limited to, a computer (e.g., portable, laptop, desktop, server, mainframe, etc.), hard copy equipment (e.g., optical disk burner, printer, plotter, fax machine, etc.), and the like.
A computer system 10 representing an exemplary workstation, host, or server in which features of the present invention may be implemented will now be described with reference to FIG. 3. The computer system 10 represents one possible computer system for implementing embodiments of the present invention, however other computer systems and variations of the computer system 10 are also possible. The computer system 10 comprises a bus or other communication means 11 for communicating information, and a processing means such as processor 12 coupled with the bus 11 for processing information. The computer system 10 further comprises a random access memory (RAM) or other dynamic storage device 14 (referred to as main memory), coupled to the bus 11 for storing information and instructions to be executed by the processor 12. The main memory 14 also may be used for storing temporary variables or other intermediate information during execution of instructions by the processor 12. In one embodiment, the main memory 14 may be used for storing the operating system, the file system, and application programs/modules such as the DMT 120 (e.g., FSI 235 and DAM 240), data structures, VFS representations, coded instructions, rule sets, and other types of data. The main memory 14 may also be used to implement the on-demand or pre-fetch cache. The computer system 10 also comprises a read only memory (ROM) and other static storage devices 16 coupled to the bus 11 for storing static information and instructions for the processor 12, such as the BIOS. A data storage device 17 such as a magnetic disk, zip, or optical disc and its corresponding drive may also be coupled to the computer system 10 for storing information and instructions. In one embodiment, the data storage device 17 may be used to compile data from several
different data sources, such as for backup, analysis, conversion, sale, or other purposes. A software conversion module is stored in the memory or storage device.
The computer system 10 may also be coupled via the bus 11 to a display device 21, such as a cathode ray tube (CRT) or Liquid Crystal Display (LCD), for displaying information to an end user. The display device may be used to display certain VFS representations and GUIs discussed in this application. Typically, a data input device 22, such as a keyboard or other alphanumeric input device including alphanumeric and other keys, may be coupled to the bus 11 for communicating information and command selections to the processor 12. Another type of user input device is a cursor control device 23, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to the processor 12 and for controlling cursor movement on the display 21.
A communication device 25 is also coupled to the bus 11. Depending upon the particular implementation, the communication device 25 may include a modem, a network interface card, or other well-known interface devices, such as those used for coupling to Ethernet, token ring, or other types of physical attachment for purposes of providing a communication link to support a local or wide area network, for example. In any event, in this manner, the computer system 10 may be coupled to a number of clients or servers via a conventional network infrastructure, such as a company's intranet, an extranet, or the Internet, for example. The communication device may be used to send requests for data manipulation (e.g., data manipulation commands), and write data to other computer systems, and to receive results corresponding to the requests or commands.
Embodiments of the invention are not limited to any particular computer system. Rather, embodiments may be used on any stand alone, distributed, networked, or other type of computer system. For example, embodiments may be used on one or more computers compatible with NT, Linux, Windows, Macintosh, any variation of UNIX, or others. In a preferred form of invention a system includes a computer
server arrangement. The conversion software module is also stored in memory or on a fixed or removable disk.
Figure 4 shows a flow chart of a method for converting a file from Document to XML format according to the present invention. According to the method, the Document file to be converted into XML is first stored in a computer readable storage medium. The conversion parameters required for converting the file from Document to XML format are stored in the same computer readable medium along with the Document file. These conversion parameters are entered manually. Such a computer readable medium includes, but is not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and holographic devices; and magneto-optical media such as floptical disks.
The Document file and the conversion parameters are then made available to the file conversion software module. This conversion module consists of program instructions which are executed by the processor. The conversion module receives the request for converting a file from Document to XML format and then retrieves the conversion parameters from the computer readable medium.
After retrieving the conversion parameters, the conversion module converts the Document file into XML format and outputs the result to the application folder.
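The flow of Figure 4 can be sketched in Python under several assumptions that the specification leaves open: the Document is taken to be a plain-text file with paragraphs separated by blank lines, the conversion parameters are taken to be a JSON file holding two tag names, and temporary directories stand in for the storage medium and the application folder. All file names, parameter keys, and function names are hypothetical.

```python
import json
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def load_parameters(storage: Path) -> dict:
    """Retrieve the stored conversion parameters (assumed to be JSON)."""
    return json.loads((storage / "conversion_params.json").read_text(encoding="utf-8"))

def convert_document(storage: Path, name: str, app_folder: Path) -> Path:
    """Convert one stored Document to XML and write it to the application folder."""
    params = load_parameters(storage)
    text = (storage / name).read_text(encoding="utf-8")
    root = ET.Element(params["root_tag"])
    for block in text.split("\n\n"):  # assumed: blank line separates paragraphs
        if block.strip():
            ET.SubElement(root, params["paragraph_tag"]).text = block.strip()
    out_path = app_folder / (Path(name).stem + ".xml")
    ET.ElementTree(root).write(str(out_path), encoding="utf-8", xml_declaration=True)
    return out_path

with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp) / "storage"
    app_folder = Path(tmp) / "application"
    storage.mkdir()
    app_folder.mkdir()
    # Store the Document file and the manually entered parameters.
    (storage / "memo.txt").write_text(
        "First paragraph.\n\nSecond paragraph.", encoding="utf-8")
    (storage / "conversion_params.json").write_text(
        json.dumps({"root_tag": "document", "paragraph_tag": "para"}),
        encoding="utf-8")
    # Request the conversion; the module retrieves the parameters itself.
    result = convert_document(storage, "memo.txt", app_folder)
    converted = ET.parse(str(result)).getroot()
    paragraphs = [p.text for p in converted.findall("para")]
```

Keeping the parameters in storage rather than passing them with each request mirrors the described step in which the conversion module retrieves them automatically.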
The present invention includes various steps, as described above. The steps of the present invention may be performed by hardware components or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor or logic circuits programmed with the instructions to perform the steps. The present invention may be provided as a computer program product that may include a machine-readable medium having stored thereon instructions that may be used to program a computer (or other electronic devices) to perform a process according to the present invention. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, flash memory, or other types of media or machine-readable medium suitable for storing electronic instructions. Moreover, the present invention may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer to a requesting computer by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection). Alternatively, the steps may be performed by a combination of hardware and software.
One more embodiment is shown in Figure 5, which shows a flow chart of a method that converts a plurality of Document files into XML files in a single run. In this embodiment, a count equal to the total number of Document files is fed into the computer readable storage medium along with the Document files to be converted into XML format and the conversion parameters. All Document files are then converted into XML format one by one and output to the application folder by the conversion software module.
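The multi-file embodiment can be sketched as a loop driven by the supplied count. As an assumption for this example, Documents are represented as an in-memory mapping of file name to plain text rather than files on a storage medium, and the conversion itself is the same simplified paragraph split; `convert_text` and `convert_batch` are hypothetical names.

```python
import xml.etree.ElementTree as ET

def convert_text(text, root_tag="document", paragraph_tag="para"):
    """Convert plain text (blank-line separated paragraphs) to an XML string."""
    root = ET.Element(root_tag)
    for block in text.split("\n\n"):
        if block.strip():
            ET.SubElement(root, paragraph_tag).text = block.strip()
    return ET.tostring(root, encoding="unicode")

def convert_batch(documents, count):
    """Convert `count` Documents one by one; `documents` maps name to text."""
    if count != len(documents):
        raise ValueError("count must equal the number of Document files")
    outputs = {}
    for name, text in documents.items():
        xml_name = name.rsplit(".", 1)[0] + ".xml"
        outputs[xml_name] = convert_text(text)
    return outputs

batch = {"a.txt": "Alpha.", "b.txt": "Beta one.\n\nBeta two."}
results = convert_batch(batch, count=2)
```

Checking the count against the number of supplied files is a design choice of this sketch; it catches a missing or extra file before any output is produced.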
Industrial Applicability:
Corporations have a tremendous amount of information assets that exist today as individual files in directories. This includes memos, reports, proposals, brochures, white papers, documentation, research, intranet sites, public web pages, etc. Because of its unstructured nature, it has been difficult to leverage this information and to reduce both the cost and complexity of managing this information. XML is a powerful tool that simplifies the creation, storage, indexing, categorization, and publishing of this content in complex environments. By converting existing documents and new documents into XML, organizations can achieve significant savings of both time and money.
The present invention is not limited to the above described embodiments, and various alterations, modifications, and / or alternative applications of the invention may be possible, if desired, without departing from the scope and spirit of the invention which can be read from the claims and the entire specification. All these possible alterations, modifications, and / or alternative applications of the invention are also intended to be within technical scope of the present invention.
WE CLAIM:
1. A method for converting a file from Document to XML format within a computer system, said method comprising:
storing a Document file to be converted into XML in a computer readable storage medium within said computer system;
storing one or more conversion parameters required for converting a file from Document to XML format in said computer readable storage medium;
requesting for the conversion of said file from Document to XML format to the conversion software module;
retrieving automatically said conversion parameters from said computer readable storage medium by the conversion software module;
performing conversion of said file into XML using said conversion parameters; and
outputting the resulting XML file from said conversion software module to application folder.
2. A method for converting a file from Document to XML format within a computer system as claimed in claim 1 wherein conversion parameters can be entered manually.
3. A method for converting a file from Document to XML format within a computer system as claimed in claim 1 wherein it further comprises step of initializing said conversion software module, thereby allowing said conversion software module to be executed automatically.
4. A method for converting a file from Document to XML format within a computer system as claimed in claim 1 wherein said conversion software module consists of a set of instructions capable of performing the task of Document to XML conversion.
5. A computer system for converting a file from Document to XML format, said system comprises:
at least one processor being arranged to accept a request for conversion of file from Document to XML format;
computer readable storage medium for storing a Document file to be converted and conversion parameters required for converting a file from Document to XML format; and
conversion software module consisting of set of instructions capable of being executed by said processor to perform the task of Document to XML conversion.
6. A computer system for converting a file from Document to XML format as claimed in claim 7 wherein processor includes a computer server arrangement.
7. A method for converting a file from Document to XML format within a computer system as hereinbefore described and illustrated with reference to accompanying drawings.
8. A computer system for converting a file from Document to XML format as hereinbefore described and illustrated with reference to accompanying drawings.
Dated this 20th day of January, 2007.
| # | Name | Date |
|---|---|---|
| 1 | 131-MUM-2007- FIRST EXAMINATION REPORT.pdf | 2022-03-01 |
| 1 | 131-MUM-2007-CORRESPONDENCE(IPO)(10-10-2011).pdf | 2011-10-10 |
| 2 | 131-MUM-2007_EXAMREPORT.pdf | 2018-08-09 |
| 2 | 131-MUM-2007- OTHER DOCUMENTS.pdf | 2022-03-01 |
| 3 | 131-mum-2007-form-3.pdf | 2018-08-09 |
| 3 | 131-MUM-2007- PUBLICATION REPORT.pdf | 2022-03-01 |
| 4 | 131-mum-2007-form-2.pdf | 2018-08-09 |
| 5 | 131-mum-2007-claims.pdf | 2018-08-09 |
| 6 | 131-mum-2007-form-1.pdf | 2018-08-09 |
| 6 | 131-mum-2007-correspondence(11-9-2007).pdf | 2018-08-09 |
| 7 | 131-mum-2007-form 18(11-9-2007).pdf | 2018-08-09 |
| 7 | 131-MUM-2007-CORRESPONDENCE(12-7-2011).pdf | 2018-08-09 |
| 8 | 131-mum-2007-drawings.pdf | 2018-08-09 |
| 8 | 131-mum-2007-correspondence-received.pdf | 2018-08-09 |
| 9 | 131-MUM-2007-DRAWING(12-7-2011).pdf | 2018-08-09 |
| 9 | 131-mum-2007-description (complete).pdf | 2018-08-09 |