METHODS AND SYSTEMS FOR BUILDING PACKAGES THAT CONTAIN PRE-PAGINATED DOCUMENTS
TECHNICAL FIELD
This invention relates to a content framework, document format and related methods and systems that can utilize both.
BACKGROUND OF THE INVENTION
Typically today, there are many different types of content frameworks to represent content, and many different types of document formats to format various types of documents. Many times, each of these frameworks and formats requires its own associated software in order to build, produce, process or consume an associated document. For those who have the particular associated software installed on an appropriate device, building, producing, processing or consuming associated documents is not much of a problem. For those who do not have the appropriate software, building, producing, processing or consuming associated documents is typically not possible.
Against this backdrop, there is a continuing need for ubiquity insofar as production and consumption of documents is concerned.
SUMMARY OF THE INVENTION
Modular content framework and document format methods and systems are described. The described framework and format define a set of building blocks for composing, packaging, distributing, and rendering document-centered content. These building blocks define a platform-independent framework for document formats that enable software and hardware systems to generate, exchange, and display documents reliably and consistently. The framework and format have been designed in a flexible and extensible fashion.
In addition to this general framework and format, a particular format, known as the reach package format, is defined using the general framework. The reach package format is a format for storing paginated documents. The contents of a reach package can be displayed or printed with full fidelity among devices and applications in a wide range of environments and across a wide range of scenarios.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a block diagram of components of an exemplary framework and format in accordance with one embodiment.
Fig. 2 is a block diagram of an exemplary package holding a document comprising a number of parts in accordance with one embodiment.
Fig. 3 is a block diagram that illustrates an exemplary writer that produces a package, and a reader that reads the package, in accordance with one embodiment.
Fig. 4 illustrates an example part that binds together three separate pages.
Fig. 5 is a diagram that illustrates an exemplary selector and sequences arranged to produce a financial report containing both an English representation and a French representation of the report, in accordance with one embodiment.
Fig. 6 illustrates some examples of writers and readers working together to communicate about a package, in accordance with one embodiment.
Fig. 7 illustrates an example of interleaving multiple parts of a document.
Figs. 8 and 9 illustrate different examples of packaging the multiple parts of the document shown in Fig. 7.
Fig. 10 illustrates an exemplary reach package and each of the valid types of parts that can make up or be found in a package, in accordance with one embodiment.
Fig. 11 illustrates an exemplary mapping of Common Language Runtime concepts to XML in accordance with one embodiment.
Fig. 12 illustrates both upright and sideways glyph metrics in accordance with one embodiment.
Fig. 13 illustrates a one-to-one cluster map in accordance with one embodiment.
Fig. 14 illustrates a many-to-one cluster map in accordance with one embodiment.
Fig. 15 illustrates a one-to-many cluster map in accordance with one embodiment.
Fig. 16 illustrates a many-to-many cluster map in accordance with one embodiment.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Overview
This document describes a modular content framework and document format. The framework and format define a set of building blocks for composing, packaging, distributing, and rendering document-centered content. These building blocks define a platform-independent framework for document formats that enable software and hardware systems to generate, exchange, and display documents reliably and consistently. The framework and format have been designed in a flexible and extensible fashion. In various embodiments, there is no restriction on the type of content that can be included, how the content is presented, or the platform on which to build clients for handling the content.
In addition to this general framework, a particular format is defined using the general framework. This format is referred to as the reach package format in this document, and is a format for storing paginated or pre-paginated documents. The contents of a reach package can be displayed or printed with full fidelity among devices and applications in a wide range of environments and across a wide range of scenarios.
One of the goals of the framework described below is to ensure the interoperability of independently-written software and hardware systems reading or writing content produced in accordance with the framework and format described below. In order to achieve this interoperability, the described format defines formal requirements that systems that read or write content must satisfy.
The discussion below is organized along the following lines and presented in two main sections—one entitled "The Framework" and one entitled "The Reach Package Format".
The section entitled "The Framework" presents an illustrative packaging model and describes the various parts and relationships that make up framework packages. Information about using descriptive metadata in framework packages is discussed, as well as the process of mapping to physical containers, extending framework markup, and the use of framework versioning mechanisms.
The section entitled "The Reach Package Format" explores the structure of one particular type of framework-built package referred to as the reach package. This section also describes the package parts specific to a fixed payload and defines a reach package markup model and drawing model. This section concludes with exemplary reach markup elements and their properties along with illustrated samples.
As a high level overview of the discussion that follows, consider Fig. 1 which illustrates aspects of the inventive framework and format generally at 100. Certain exemplary components of the framework are illustrated at 102, and certain components of the reach package format are illustrated at 104.
Framework 102 comprises exemplary components which include, without limitation, a relationship component, a pluggable containers component, an interleaving/streaming component and a versioning/extensibility component, each of which is explored in more detail below. Reach package format 104 comprises components which include a selector/sequencer component and a package markup definition component.
In the discussion that follows below, periodic reference will be made back to Fig. 1 so that the reader can maintain perspective as to where the described components fit in the framework and package format.
THE FRAMEWORK
In the discussion that follows, a description of a general framework is provided. Separate primary sub-headings include "The Package Model", "Composition Parts: Selector and Sequence", "Descriptive Metadata", "Physical Model", "Physical Mappings" and "Versioning and Extensibility". Each primary sub-heading has one or more related sub-headings.
The Package Model
This section describes the package model and includes sub-headings that describe packages and parts, drivers, relationships, package relationships and the start part.
Packages and Parts
In the illustrated and described model, content is held within a package. A package is a logical entity that holds a collection of related parts. The package's purpose is to gather up all of the pieces of a document (or other types of content) into one object that is easy for programmers and end-users to work with. For example, consider Fig. 2 which illustrates an exemplary package 200 holding a document comprising a number of parts including an XML markup part 202 representing the document, a font part 204 describing a font that is used in the document, a number of page parts 206 describing pages of the document, and a picture part representing a picture within the document. The XML markup part 202 that represents a document is advantageous in that it can permit easy searchability and referencing without requiring the entire content of a package to be parsed. This will become more apparent below.
Throughout this document the notion of readers (also referred to as consumers) and writers (also referred to as producers) is introduced and discussed. A reader, as that term is used in this document, refers to an entity that reads modular content format-based files or packages. A writer, as that term is used in this document, refers to an entity that writes modular content format-based files or packages. As an example, consider Fig. 3, which shows a writer that produces a package and a reader that reads a package. Typically, the writer and reader will be embodied as software. In at least one embodiment, much of the processing overhead and complexity associated with creating and formatting packages is placed on the writer. This, in turn, removes much of the processing complexity and overhead from readers which, as will be appreciated by the skilled artisan, is a departure from many current models. This aspect will become apparent below.
In accordance with at least one embodiment, a single package contains one or more representations of the content held in the package. Often a package will be a single file, referred to in this application as a container. This gives end-users, for example, a convenient way to distribute their documents with all of the component pieces of the document (images, fonts, data, etc.). While packages often correspond directly to a single file, this is not necessarily always so. A package is a logical entity that may be represented physically in a variety of ways (e.g., without limitation, in a single file, a collection of loose files, in a database, ephemerally in transit over a network connection, etc.). Thus containers hold packages, but not all packages are stored in containers.
An abstract model describes packages independently of any physical storage mechanism. For example, the abstract model does not refer to "files", "streams", or other physical terms related to the physical world in which the package is located. As discussed below, the abstract model allows users to create drivers for various physical formats, communication protocols, and the like. By analogy, when an application wants to print an image, it uses an abstraction of a printer (presented by the driver that understands the specific kind of printer). Thus, the application is not required to know about the specific printing device or how to communicate with the printing device.
A container provides many benefits over what might otherwise be a collection of loose, disconnected files. For example, similar components may be aggregated and content may be indexed and compressed. In addition, relationships between components may be identified and rights management, digital signatures, encryption and metadata may be applied to components. Of course, containers can be used for and can embody other features which are not specifically enumerated above.
Common Part Properties
In the illustrated and described embodiment, a part comprises common properties (e.g., name) and a stream of bytes. This is analogous to a file in a file system or a resource on an HTTP server. In addition to its content, each part has some common part properties. These include a name, which is the name of the part, and a content type, which is the type of content stored in the part. Parts may also have one or more associated relationships, as discussed below.
Part names are used whenever it is necessary to refer in some way to a part. In the illustrated and described embodiment, names are organized into a hierarchy, similar to paths on a file system or paths in URIs. Below are examples of part names:
/document.xml
/tickets/ticket.xml
/images/march/summer.jpeg
/pages/page4.xml
As seen above, in this embodiment, part names have the following characteristics:
Part names are similar to file names in a traditional file system. Part names begin with a forward slash ('/').
Like paths in a file system or paths in a URI, part names can be organized into a hierarchy by a set of directory-like names (tickets, images/march and pages in the above examples). This hierarchy is composed of segments delineated by slashes. The last segment of the name is similar to a filename in a traditional file system.
It is important to note that the rules for naming parts, especially the valid characters that can be used for part names, are specific to the framework described in this document. These part name rules are based on internet-standard URI naming rules. In this embodiment, the grammar used for specifying part names exactly matches the abs_path syntax defined in Sections 3.3 (Path Component) and 5 (Relative URI References) of RFC2396, the Uniform Resource Identifiers (URI): Generic Syntax specification.
The following additional restrictions are applied to abs_path as a valid part name:
• Query Component, as it is defined in Sections 3 (URI Syntactic Components) and 3.4 (Query Component), is not applicable to a part name.
• Fragment identifier, as it is described in Section 4.1 (Fragment Identifier), is not applicable to a part name.
• It is illegal to have any part with a name created by appending *( "/" segment ) to the part name of an existing part.
Grammar for part names is shown below:
(Table Removed)
The segments of the names of all parts in a package can be seen to form a tree. This is analogous to what happens in file systems, in which all of the non-leaf nodes in the tree are folders and the leaf nodes are the actual files containing content. These folder-like nodes (i.e., non-leaf nodes) in the name tree serve a similar function of organizing the parts in the package. It is important to remember, however, that these "folders" exist only as a concept in the naming hierarchy; they have no other manifestation in the persistence format.
Part names cannot live at the "folder" level. Specifically, a non-leaf node in the part naming hierarchy (a "folder") cannot contain a part and a subfolder with the same name.
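By way of illustration only, the naming rules above might be checked along the following lines. This Python sketch is an assumption made for explanatory purposes; it covers only the restrictions stated in the text (leading slash, no query or fragment, no part doubling as a "folder") and does not implement the full RFC2396 abs_path grammar. The function names are hypothetical.

```python
from typing import Iterable


def is_valid_part_name(name: str) -> bool:
    """Check the basic shape of a part name (not the full abs_path grammar)."""
    if not name.startswith("/") or name.endswith("/"):
        return False
    if "?" in name or "#" in name:  # no query component, no fragment identifier
        return False
    # Every segment between slashes must be non-empty.
    return all(segment != "" for segment in name[1:].split("/"))


def conflicts_with_existing(name: str, existing: Iterable[str]) -> bool:
    """True if adding `name` would duplicate a part or make a part act as a folder."""
    for other in existing:
        if name == other:
            return True
        # One name extended by "/segment..." yields the other.
        if name.startswith(other + "/") or other.startswith(name + "/"):
            return True
    return False
```

For instance, with "/images/march/summer.jpeg" already present, a new part named "/images/march" would be rejected, because "march" would then be both a part and a folder.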
In the illustrated and described embodiment, every part has a content type which identifies what type of content is stored in a part. Examples of content types include:
image/jpeg
text/xml
text/plain; charset="us-ascii"
Content types are used in the illustrated framework as defined in RFC2045 (Multipurpose Internet Mail Extensions (MIME)). Specifically, each content type includes a media type (e.g., text), a subtype (e.g., plain) and an optional set of parameters in key=value form (e.g., charset="us-ascii"); multiple parameters are separated by semicolons.
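As a minimal sketch of the content type structure just described, the following Python function splits a content type into its media type, subtype and parameters. It is an illustrative assumption rather than a complete RFC2045 parser; in particular, it does not handle semicolons inside quoted parameter values. The function name is hypothetical.

```python
def parse_content_type(value: str):
    """Split 'type/subtype; key=value; ...' into (media_type, subtype, params)."""
    head, *rest = value.split(";")
    media_type, _, subtype = head.strip().partition("/")
    params = {}
    for item in rest:
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        key, _, val = item.partition("=")
        # Parameter names are case-insensitive; surrounding quotes are dropped.
        params[key.strip().lower()] = val.strip().strip('"')
    return media_type.lower(), subtype.lower(), params
```

Applied to the third example above, the function yields the media type "text", the subtype "plain" and a single parameter mapping "charset" to "us-ascii".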
Part Addressing
Often parts will contain references to other parts. As a simple example, imagine a container with two parts: a markup file and an image. The markup file will want to hold a reference to the image so that when the markup file is processed, the associated image can be identified and located. Designers of content types and XML schemas may use URIs to represent these references. To make this possible, a mapping between the world of part names and world of URIs needs to be defined.
In order to allow the use of URIs in a package, a special URI interpretation rule must be used when evaluating URIs in package-based content: the package itself should be treated as the "authority" for URI references and the path component of the URI is used to navigate the part name hierarchy in the package.
For example, given a package URI of http://www.example.com/foo/something.package, a reference to /abc/bar.xml is interpreted to mean the part called /abc/bar.xml, not the URI http://www.example.com/abc/bar.xml.
Relative URIs should be used when it is necessary to have a reference from one part to another in a container. Using relative references allows the contents of the container to be moved together into a different container (or into the container from, for example, the file system) without modifying the cross-part references.
Relative references from a part are interpreted relative to the "base URI" of the part containing the reference. By default, the base URI of a part is the part's name.
Consider a container which includes parts with the following names:
/markup/page.xml
/images/picture.jpeg
/images/other_picture.jpeg
If the "/markup/page.xml" part contains a URI reference to "../images/picture.jpeg", then this reference must be interpreted as referring to the part name "/images/picture.jpeg", according to the rules above.
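The resolution rule above can be sketched as follows. This Python example is illustrative only: it assumes that the default base URI (the part's own name) is in effect and that the reference carries no query or fragment. The function name is hypothetical.

```python
import posixpath


def resolve_part_reference(source_part: str, reference: str) -> str:
    """Resolve a URI reference found in `source_part` to a part name.

    The package is treated as the authority, so a reference that starts
    with "/" names a part directly; anything else is taken relative to
    the "folder" of the part that contains the reference.
    """
    if reference.startswith("/"):
        return posixpath.normpath(reference)
    base_folder = posixpath.dirname(source_part)
    return posixpath.normpath(posixpath.join(base_folder, reference))
```

With the container above, resolving "../images/picture.jpeg" from "/markup/page.xml" gives "/images/picture.jpeg", and resolving "/abc/bar.xml" from any part gives the part "/abc/bar.xml" rather than a location on the server hosting the package.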
Some content types provide a way to override the default base URI by specifying a different base in the content. In the presence of one of these overrides, the explicitly specified base URI should be used instead of the default.
Sometimes it is useful to "address" a portion or specific point in a part. In the URI world, a fragment identifier is used [see, e.g. RFC2396]. In a container, the mechanism works the same way. Specifically, the fragment is a string that contains additional information that is understood in the context of the content type of the addressed part. For example, in a video file a fragment might identify a frame; in an XML file it might identify a portion of the XML file via an XPath expression.
A fragment identifier is used in conjunction with a URI that addresses a part to identify fragments of the addressed part. The fragment identifier is optional and is separated from the URI by a crosshatch ("#") character. As such, it is not part of a URI, but is often used in conjunction with a URI.
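For illustration, separating a fragment identifier from the reference that addresses the part might look like the following Python sketch, which relies on the standard library function urllib.parse.urldefrag. The wrapper name and the sample fragment are hypothetical; how the fragment is interpreted remains up to the content type of the addressed part.

```python
from urllib.parse import urldefrag


def split_fragment(reference: str):
    """Return (part_reference, fragment); fragment is None when absent."""
    part_reference, fragment = urldefrag(reference)
    return part_reference, (fragment or None)
```

A reference such as "/pages/page4.xml#page-top" is split into the part reference "/pages/page4.xml" and the fragment "page-top".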
The following discussion provides some guidance for part naming, as the package and part naming model is quite flexible. This flexibility allows for a wide range of applications of a framework package. However, it is important to recognize that the framework is designed to enable scenarios in which multiple, unrelated software systems can manipulate "their own" parts of a package without colliding with each other. To allow this, certain guidelines are provided which, if followed, make this possible.
The guidelines given here describe a mechanism for minimizing or at least reducing the occurrences of part naming conflicts, and dealing with them when they do arise. Writers creating parts in a package must take steps to detect and handle naming conflicts with existing parts in the package. In the event that a name conflict arises, writers may not blindly replace existing parts.
In situations where a package is guaranteed to be manipulated by a single writer, that writer may deviate from these guidelines. However, if there is a possibility of multiple independent writers sharing a package, all writers must follow these guidelines. It is recommended, however, that all writers follow these guidelines in any case.
• It is required that writers adding parts into an existing container do so in a new "folder" of the naming hierarchy, rather than placing parts directly in the root, or in a pre-existing folder. In this way, the possibility of name conflicts is limited to the first segment of the part name. Parts created within this new folder can be named without risking conflicts with existing parts.
• In the event that the "preferred" name for the folder is already used by
an existing part, a writer must adopt some strategy for choosing
alternate folder names. Writers should use the strategy of appending
digits to the preferred name until an available folder name is found
(possibly resorting to a GUID after some number of unsuccessful
iterations).
• One consequence of this policy is that readers must not attempt to
locate a part via a "magic" or "well known" part name. Instead,
writers must create a package relationship to at least one part in each
folder they create. Readers must use these package relationships to
locate the parts rather than relying on well known names.
• Once a reader has found at least one part in a folder (via one of the
aforementioned package relationships) it may use conventions about
well known part names within that folder to find other parts.
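The folder-naming strategy in the guidelines above can be sketched as follows. This Python example is a hedged illustration: the limit on attempts, the separator before the GUID and the function name are assumptions, since the text only calls for appending digits and possibly resorting to a GUID.

```python
import uuid


def choose_folder_name(preferred: str, existing_part_names, max_attempts: int = 100) -> str:
    """Pick a first-segment "folder" name that no existing part already uses."""
    # Collect the first segment of every existing part name.
    used = {name.lstrip("/").split("/")[0] for name in existing_part_names}
    if preferred not in used:
        return preferred
    # Append digits to the preferred name until a free name is found.
    for n in range(1, max_attempts + 1):
        candidate = f"{preferred}{n}"
        if candidate not in used:
            return candidate
    # Fall back to a GUID after too many unsuccessful iterations.
    return f"{preferred}-{uuid.uuid4().hex}"
```

A writer would then create all of its parts beneath the returned folder and add a package relationship to at least one of them, so that readers do not need to guess the folder name.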
Drivers
The file format described herein can be used by different applications, different document types, etc., many of which have conflicting uses, conflicting formats, and the like. One or more drivers are used to resolve various conflicts, such as differences in file formats, differences in communication protocols, and the like. For example, different file formats include loose files and compound files, and different communication protocols include HTTP, network, and wireless protocols. A group of drivers abstracts various file formats and communication protocols into a single model. Multiple drivers can be provided for different scenarios, different customer requirements, different physical configurations, etc.
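One way to picture the driver concept is as a small abstract interface with interchangeable implementations. The Python sketch below is an assumption for explanatory purposes only; the class names, method names and the two example drivers (in-memory and loose files) are hypothetical and are not taken from the described framework.

```python
import os
from abc import ABC, abstractmethod


class PackageDriver(ABC):
    """Abstract view of a package: part names and part content only."""

    @abstractmethod
    def part_names(self): ...

    @abstractmethod
    def read_part(self, name: str) -> bytes: ...


class InMemoryDriver(PackageDriver):
    """Package held as a mapping of part name to bytes."""

    def __init__(self, parts):
        self._parts = dict(parts)

    def part_names(self):
        return sorted(self._parts)

    def read_part(self, name):
        return self._parts[name]


class LooseFilesDriver(PackageDriver):
    """Package stored as loose files beneath a root directory."""

    def __init__(self, root):
        self._root = root

    def part_names(self):
        names = []
        for folder, _dirs, files in os.walk(self._root):
            for filename in files:
                rel = os.path.relpath(os.path.join(folder, filename), self._root)
                names.append("/" + rel.replace(os.sep, "/"))
        return sorted(names)

    def read_part(self, name):
        path = os.path.join(self._root, *name.lstrip("/").split("/"))
        with open(path, "rb") as handle:
            return handle.read()


def total_size(driver: PackageDriver) -> int:
    """Example client code that works against any driver."""
    return sum(len(driver.read_part(name)) for name in driver.part_names())
```

Client code such as total_size never refers to files, streams or protocols, which mirrors the way an application prints through a printer abstraction without knowing the specific device.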
Relationships
Parts in a package may contain references to other parts in that package. In general, however, these references are represented inside the referring part in ways that are specific to the content type of the part; that is, in arbitrary markup or an application-specific encoding. This effectively hides the internal linkages between parts from readers that don't understand the content types of the parts containing such references.
Even for common content types (such as the Fixed Payload markup described in the Reach Package section), a reader would need to parse all of the content in a part to discover and resolve the references to other parts. For example, when implementing a print system that prints documents one page at a time, it may be desirable to identify pictures and fonts contained in the particular page. Existing systems must parse all information for each page, which can be time consuming, and must understand the language of each page, which may not be the situation with certain devices or readers (e.g., ones that are performing intermediate processing on the document as it passes through a pipeline of processors on the way to a device). Instead, the systems and methods described herein use relationships to identify relationships between parts and to describe the nature of those relationships. The relationship language is simple and defined once so that readers can understand relationships without requiring knowledge of multiple different languages. In one embodiment, the relationships are represented in XML as individual parts. Each part has an associated relationship part that contains the relationships for which the part is a source.
For example, a spreadsheet application uses this format and stores different spreadsheets as parts. An application that knows nothing about the spreadsheet language can still discover various relationships associated with the spreadsheets. For example, the application can discover images in the spreadsheets and metadata associated with the spreadsheets. An example relationship schema is provided below:
(Figure Removed)
This schema defines two XML elements, one called "relationships" and one called "relationship." This "relationship" element is used to describe a single relationship as described herein and has the following attributes: (1) "target," which indicates the part to which the source part is related, (2) "name" which indicates the type or nature of the relationship. The "relationships" element is defined to allow it to hold zero or more "relationship" elements and serves simply to collect these "relationship" elements together in a unit.
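Purely as an illustration of the two elements just described, the following Python sketch reads a relationship part and lists the targets of a given relationship type without touching the content of the source part. The sample XML, the relationship names "print-ticket" and "image", the part names and the absence of an XML namespace are all assumptions; the actual schema is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical relationship part for some source part.
RELATIONSHIPS_XML = """
<relationships>
  <relationship target="/tickets/ticket1.xml" name="print-ticket"/>
  <relationship target="/images/picture.jpeg" name="image"/>
</relationships>
"""


def read_relationships(xml_text: str):
    """Return a list of (target, name) pairs from a relationship part."""
    root = ET.fromstring(xml_text.strip())
    return [(rel.get("target"), rel.get("name")) for rel in root.findall("relationship")]


def targets_named(xml_text: str, name: str):
    """Return the targets of every relationship with the given name."""
    return [target for target, rel_name in read_relationships(xml_text) if rel_name == name]
```

A reader that understands only this small vocabulary could, for example, find the print ticket of a part by calling targets_named with the hypothetical name "print-ticket".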
The systems and methods described herein introduce a higher-level mechanism to solve these problems called "relationships". Relationships provide an additional way to represent the kind of connection between a source part and a target part in a package. Relationships make the connections between parts directly "discoverable" without looking at the content in the parts, so they are independent of content-specific schema and faster to resolve. Additionally, these relationships are protocol independent. A variety of different relationships may be associated with a particular part.
Relationships provide a second important function: allowing parts to be related without modifying them. Sometimes this information serves as a form of "annotation" where the content type of the "annotated" part does not define a way to attach the given information. Potential examples include attached descriptive metadata, print tickets and true annotations. Finally, some scenarios require information to be attached to an existing part specifically without modifying that part, for example, when the part is encrypted and cannot be decrypted or when the part is digitally signed and changing it would invalidate the signature. In another example, a user may want to attach an annotation to a JPEG image file. The JPEG image format does not currently provide support for identifying annotations. Changing the JPEG format to accommodate this user's desire is not practical. However, the systems and methods discussed herein allow the user to provide an annotation to a JPEG file without modifying the JPEG image format.
In one embodiment, relationships are represented using XML in relationship parts. Each part in the container that is the source of one or more relationships has an associated relationship part. This relationship part holds (expressed in XML using the content type application/PLACEHOLDER) the list of relationships for that source part.
Fig. 4 below shows an environment 400 in which a "spine" part 402 (similar to a FixedPanel) binds together three pages 406, 408 and 410. The set of pages bound together by the spine has an associated "print ticket" 404. Additionally, page 2 has its own print ticket 412. The connections from the spine part 402 to its print ticket 404 and from page 2 to its print ticket 412 are represented using relationships. In the arrangement of Fig. 4, the spine part 402 would have an associated relationship part which contained a relationship that connects the spine to ticket1, as shown in the example below.
Element: <selection>
Attributes: None
Allowed Child Elements: <item>
Element: <sequence>
Attributes: None
Allowed Child Elements: <item>
Element: <item>
Attributes: Target (the part name of a part in the composition)
As an example, here is the XML for the example of Fig. 5 above:
MainDocument.XML
EnglishRollup.XML
FrenchRollup.XML
In this XML, MainDocument.xml represents an entire part in the package and indicates, by virtue of the "selection" tag, that a selection is to be made between different items encapsulated by the "item" tag, i.e., the "EnglishRollup.xml" and the "FrenchRollup.xml".
The EnglishRollup.xml and FrenchRollup.xml are, by virtue of the "sequence" tags, sequences that sequence together the respective items encapsulated by their respective "item" tags.
Thus, a simple XML grammar is provided for describing selectors and sequences. Each part in this composition block is built and performs one operation: either selecting or sequencing. By using a hierarchy of parts, different robust collections of selections and sequences can be built.
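As a hedged illustration of this grammar, the Python sketch below parses a composition part into its operation and the ordered list of item targets. The sample markup is an assumption made from the tag names mentioned in the text ("selection", "sequence" and "item"); the lowercase "target" attribute spelling and the part names inside the English sequence are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the selector part of the financial report.
MAIN_DOCUMENT = """<selection>
  <item target="EnglishRollup.xml"/>
  <item target="FrenchRollup.xml"/>
</selection>"""

# Hypothetical markup for one of the sequence parts.
ENGLISH_ROLLUP = """<sequence>
  <item target="EnglishSummary.xml"/>
  <item target="EnglishFinancials.xml"/>
</sequence>"""


def parse_composition_part(xml_text: str):
    """Return (operation, targets) where operation is 'selection' or 'sequence'."""
    root = ET.fromstring(xml_text)
    if root.tag not in ("selection", "sequence"):
        raise ValueError(f"not a composition part: <{root.tag}>")
    return root.tag, [item.get("target") for item in root.findall("item")]
```

Because each part performs exactly one operation, a reader can decide what to do with the targets from the root element alone.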
Composition Block
The composition block of a package comprises the set of all composition parts (selector or sequence) that are reachable from the starting part of the package. If the starting part of the package is neither a selector nor a sequence, then the composition block is considered empty. If the starting part is a composition part, then the child items in those composition parts are recursively traversed to produce a directed, acyclic graph of the composition parts (stopping traversal when a non-composition part is encountered). This graph is the composition block (and it must, in accordance with this embodiment, be acyclic for the package to be valid).
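The traversal described above can be sketched as a depth-first walk that stops at non-composition parts and rejects cycles. In this illustrative Python example, the input mapping (composition part name to list of child targets) and the function name are assumptions; any part absent from the mapping is treated as a non-composition part.

```python
def composition_block(start_part: str, composition_parts: dict) -> dict:
    """Return the composition parts reachable from `start_part`.

    Raises ValueError if the composition graph contains a cycle, since a
    valid package requires the composition block to be acyclic.
    """
    if start_part not in composition_parts:
        return {}  # start part is neither a selector nor a sequence

    block = {}
    in_progress = set()

    def visit(name):
        if name in in_progress:
            raise ValueError(f"cycle through {name!r}: package is invalid")
        if name in block or name not in composition_parts:
            return  # already handled, or a non-composition part
        in_progress.add(name)
        for child in composition_parts[name]:
            visit(child)
        in_progress.discard(name)
        block[name] = list(composition_parts[name])

    visit(start_part)
    return block
```

The returned mapping contains only selectors and sequences; page parts and other content parts end the traversal and are not included.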
Determining Composition Semantics
Having established the relatively straightforward XML grammar above, the following discussion describes a way to represent the information such that selections can be made based on content type. That is, the XML described above provides enough information to allow readers to locate the parts that are assembled together into a composition, but does not provide enough information to help a reader know more about the nature of the composition. For example, given a selection that composes two parts, how does a reader know on what basis (e.g., language, paper size, etc.) to make the selection? The answer is that these rules are associated with the content type of the composition part. Thus, a selector part that is used for picking between representations based on language will have a different associated content type from a selector part that picks between representations based on paper sizes.
The general framework defines the general form for these content types:
Application/XML+Selector-SOMETHING
Application/XML+Sequence-SOMETHING
The SOMETHING in these content types is replaced by a word that indicates the nature of the selection or sequence, e.g. page size, color, language, resident software on a reader device and the like. In this framework then, one can invent all kinds of selectors and sequences and each can have very different semantics.
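To illustrate how a reader might interpret this general form, the following Python sketch splits a composition content type into the kind of composition part and the word standing in for SOMETHING. The function name and the sample value "Language" are hypothetical; the well-known content types themselves are not reproduced here.

```python
def parse_composition_content_type(content_type: str):
    """Return (kind, basis) for a selector or sequence content type, else None.

    `kind` is 'selector' or 'sequence'; `basis` is the word that replaces
    SOMETHING in the general form.
    """
    media_type, _, subtype = content_type.partition("/")
    if media_type.lower() != "application":
        return None
    prefix, _, rest = subtype.partition("+")
    kind, _, basis = rest.partition("-")
    if prefix.lower() != "xml" or kind.lower() not in ("selector", "sequence") or not basis:
        return None
    return kind.lower(), basis
```

A reader could use the returned basis to look up the selection or sequencing rule it implements, and treat a None result as an ordinary, non-composition content type.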
The described framework also defines the following well-known content types for selectors and sequences that all readers or reading devices must understand.
(Table Removed)
As an example, consider the following. Assume a package contains a document that has a page, and in the middle of the page there is an area in which a video is to appear. In this example, a video part of the page might comprise video in the form of a QuickTime video. One problem with this scenario is that QuickTime videos are not universally understood. Assume, however, that in accordance with this framework and, more particularly, the reach package format described below, there is a universally understood image format: JPEG. When producing the package that contains the document described above, the producer might, in addition to defining the video as a part of the package, define a JPEG image for the page and interpose a SupportedContentType selector so that if the user's computer has software that understands the QuickTime video, the QuickTime video is selected; otherwise the JPEG image is selected.
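The selection behavior in this example might be sketched as follows. The Python code is an illustrative assumption: it supposes that the selector's items are listed in order of preference and that the reader exposes the set of content types it understands. The function name and part names are hypothetical.

```python
def select_supported(items, understood_types):
    """Return the first part whose content type the reader understands.

    `items` is an ordered list of (part_name, content_type) pairs, most
    preferred first; returns None if nothing is understood.
    """
    understood = {content_type.lower() for content_type in understood_types}
    for part_name, content_type in items:
        if content_type.lower() in understood:
            return part_name
    return None


# Hypothetical items of the selector described above.
VIDEO_OR_IMAGE = [
    ("/media/clip.mov", "video/quicktime"),
    ("/media/clip.jpeg", "image/jpeg"),
]
```

A reader that understands both types would pick the video part, while a reader that understands only image/jpeg would fall back to the image part.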
Thus, as described above, the framework-level selector and sequence components allow a robust hierarchy to be built which, in this example, is defined in XML. In addition, there is a well-defined way to identify the behaviors of selectors and sequences using content types. Additionally, in accordance with one embodiment, the general framework comprises one particular content type that is predefined and which allows processing and utilization of packages based on what a consumer (e.g. a reader or reading device) does and does not understand.
Other composition part content types can be defined using similar rules, examples of which are discussed below.
Descriptive Metadata
In accordance with one embodiment, descriptive metadata parts provide writers or producers of packages with a way in which to store values of properties that enable readers of the packages to reliably discover the values. These properties are typically used to record additional information about the package as a whole, as well as individual parts within the container. For example, a descriptive metadata part in a package might hold information such as the author of the package, keywords, a summary, and the like.
In the illustrated and described embodiment, the descriptive metadata is expressed in XML, is stored in parts with well-known content types, and can be found using well-known relationship types.
Descriptive metadata holds metadata properties. Metadata properties are represented by a property name and one or many property values. Property values
have simple data types, so each data type is described by a single XML qname. The fact that descriptive metadata properties have simple types does not mean that one cannot store data with complex XML types in a package. In this case, one must store the information as a full XML part. When this is done, all constraints about only using simple types are removed, but the simplicity of the "flat" descriptive metadata property model is lost.
In addition to the general purpose mechanism for defining sets of properties, there is a specific, well-defined set of document core properties, stored using this mechanism. These document core properties are commonly used to describe documents and include properties like title, keywords, author, etc.
Finally, metadata parts holding these document core properties can also hold additional, custom-defined properties in addition to the document core properties.
Metadata Format
In accordance with one embodiment, descriptive metadata parts have a content type and are targeted by relationships according to the following rules:
(Table Removed)
The following XML pattern is used to represent descriptive metadata in accordance with one embodiment. Details about each component of the markup are given in the table after the sample.
_ value _
(Table Removed)
Document Core Properties
The following is a table of document core properties that includes the name of the property, the property type and a description.
(Table Removed)
Physical Model
The physical model defines various ways in which a package is used by writers and readers. This model is based on three components: a writer, a reader and a pipe between them. Fig. 6 shows some examples of writers and readers working together to communicate about a package.
The pipe carries data from the writer to the reader. In many scenarios, the pipe can simply comprise the API calls that the reader makes to read the package from the local file system. This is referred to as direct access.
Often, however, the reader and the writer must communicate with each other over some type of protocol. This communication happens, for example, across a process boundary or between a server and a desktop computer. This is referred to as networked access and is important because of the communications characteristics of the pipe (specifically, the speed and request latency).
In order to enable maximum performance, physical package designs must consider support in three important areas: access style, layout style and communication style.
Access Style
Streaming Consumption
Because communication between the writer and the reader using networked access is not instantaneous, it is important to allow for progressive creation and consumption of packages. In particular, it is recommended, in accordance with this embodiment, that any physical package format be designed to allow a reader to begin interpreting and processing the data (e.g., parts) as it receives it, before all of the bits of the package have been delivered through the pipe. This capability is called streaming consumption.
Streaming Creation
When a writer begins to create a package, it does not always know what it will be putting in the package. As an example, when an application begins to build a print spool file package, it may not know how many pages will need to be put into
the package. As another example, a program on a server that is dynamically generating a report may not realize how long the report will be or how many pictures the report will have - until it has completely generated the report. In order to allow writers like this, physical packages should allow writers to dynamically add parts after other parts have already been added (for example, a writer must not be required to state up front how many parts it will be creating when it starts writing). Additionally, physical packages should allow a writer to begin writing the contents of a part without knowing the ultimate length of that part. Together, these requirements enable streaming creation.
Simultaneous Creation and Consumption
In a highly-pipelined architecture, streaming creation and streaming consumption can occur simultaneously for a specific package. When designing a physical package, supporting streaming creation and supporting streaming consumption can push a design in opposite directions. However, it is often possible to find a design that supports both. Because of the benefits in a pipelined architecture, it is recommended that physical packages support simultaneous creation and consumption.
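A minimal sketch of simultaneous creation and consumption, assuming the pipe can be modeled as a Python generator: the writer emits each part as soon as it is ready without announcing a part count, and the reader processes each part as it arrives. The function names, the part naming scheme and the log are illustrative assumptions only.

```python
from typing import Iterable, Iterator, List, Tuple

Part = Tuple[str, bytes]


def writer(page_texts: Iterable[str], log: List[str]) -> Iterator[Part]:
    """Streaming creation: yield parts one by one, total count unknown up front."""
    for number, text in enumerate(page_texts, start=1):
        name = f"/pages/page{number}.xml"  # hypothetical part name
        log.append(f"wrote {name}")
        yield name, text.encode("utf-8")


def reader(pipe: Iterator[Part], log: List[str]) -> int:
    """Streaming consumption: process each part before later parts exist."""
    count = 0
    for name, _data in pipe:
        log.append(f"read {name}")
        count += 1
    return count


log: List[str] = []
pages = (f"page {n}" for n in range(1, 4))  # length not known to the writer
part_count = reader(writer(pages, log), log)
```

Because the generator is lazy, the log alternates between "wrote" and "read" entries, which mirrors a pipelined architecture in which the reader never waits for the complete package.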
Layout Styles
Physical packages hold a collection of parts. These parts can be laid out in one of two styles: simple ordering and interleaved. With simple ordering, the parts in the package are laid out with a defined ordering. When such a package is delivered in a pure linear fashion, starting with the first byte in the package through to the last, all of the bytes for the first part arrive first, then all of the bytes for the second part, and so on.
With interleaved layout, the bytes of the multiple parts are interleaved, allowing for improved performance in certain scenarios. Two scenarios that benefit significantly from interleaving are multi-media playback (e.g., delivering video and audio at the same time) and inline resource reference (e.g., a reference in the middle of a markup file to an image).
Interleaving is handled through a special convention for organizing the contents of interleaved parts. By breaking parts into pieces and interleaving these pieces, it is possible to achieve the desired results of interleaving while still making it possible to easily reconstruct the original larger part. To understand how interleaving works, Fig. 7 illustrates a simple example involving two parts: content.xml 702 and image.jpeg 704. The first part, content.xml, describes the contents of a page and in the middle of that page is a reference to an image (image.jpeg) that should appear on the page.
To understand why interleaving is valuable, consider how these parts would be arranged in a package using simple ordering, as shown in Fig. 8. A reader that is processing this package (and is receiving bytes sequentially) will be unable to display the picture until it has received all of the content.xml part as well as the image.jpeg. In some circumstances (e.g., small or simple packages, or fast communications links) this may not be a problem. In other circumstances (for example, if content.xml was very large or the communications link was very slow), needing to read through all of the content.xml part to get to the image will result in unacceptable performance or place unreasonable memory demands on the reader system.
In order to achieve closer to ideal performance, it would be nice to be able to split the content.xml part and insert the image.jpeg part into the middle, right after where the picture is referenced. This would allow the reader to begin processing
the image earlier: as soon as it encounters the reference, the image data follows. This would produce, for example, the package layout shown in Fig. 9. Because of the performance benefits, it is often desirable that physical packages support interleaving. Depending on the kind of physical package being used, interleaving may or may not be supported. Different physical packages may handle the internal representation of interleaving differently. Regardless of how the physical package handles interleaving, it's important to remember that interleaving is an optimization that occurs at the physical level and a part that is broken into multiple pieces in the physical file is still one logical part; the pieces themselves aren't parts.
Communication Styles
Communication between writer and reader can be based on sequential delivery of parts or by random-access to parts, allowing them to be accessed out of order. Which of these communication styles is utilized depends on the capabilities of both the pipe and the physical package format. Generally, all pipes will support sequential delivery. Physical packages must support sequential delivery. To support random-access scenarios, both the pipe in use and the physical package must support random-access. Some pipes are based on protocols that can enable random access (e.g., HTTP 1.1 with byte-range support). In order to allow maximum performance when these pipes are in use, it is recommended that physical packages support random-access. In the absence of this support, readers will simply wait until the parts they need are delivered sequentially.
Physical Mappings
The logical packaging model defines a package abstraction; an actual instance of a package is based on some particular physical representation of a
package. The packaging model may be mapped to physical persistence formats, as well as to various transports (e.g., network-based protocols). A physical package format can be described as a mapping from the components of the abstract packaging model to the features of a particular physical format. The packaging model does not specify which physical package formats should be used for archiving, distributing, or spooling packages. In one embodiment, only the logical structure is specified. A package may be "physically" embodied by a collection of loose files, a .ZIP file archive, a compound file, or some other format. The format chosen is supported by the targeted consuming device, or by a driver for the device.
Components Being Mapped
Each physical package format defines a mapping for the following components. Some components are optional and a specific physical package format may not support these optional components.
(Table Removed)
Common Mapping Patterns
(Table Removed)
There exist many physical storage formats whose features partially match the packaging-model components. In defining mappings from the packaging model to such storage formats, it may be desirable to take advantage of any similarities in capabilities between the packaging model and the physical storage medium, while using layers of mapping to provide additional capabilities not inherently present in the physical storage medium. For example, some physical package formats may store individual parts as individual files in a file system. In such a physical format, it would be natural to map many part names directly to identical physical file names. Part names that use characters which are not valid in file system file names may require some kind of escaping mechanism.
In many cases, a single common mapping problem may be faced by the designers of different physical package formats. Two examples of common mapping problems arise when associating arbitrary Content Types with parts, and when supporting the Interleaved layout style. This specification suggests common solutions to such common mapping problems. Designers of specific physical package formats may be encouraged, but are not required, to use the common mapping solutions defined here.
Identifying Content Types of Parts
Physical package format mappings define a mechanism for storing a content type for each part. Some physical package formats have a native mechanism for representing content types (for example, the "Content-Type" header in MIME). For such physical packages, it is recommended that the mapping use the native mechanism to represent content types for parts. For other physical package formats, some other mechanism is used to represent content types. The recommended mechanism for representing content types in these packages is by including a
specially-named XML stream in the package, known as the types stream. This stream is not a part, and is therefore not itself URI-addressable. However, it can be interleaved in the physical package using the same mechanisms used for interleaving parts.
The types stream contains XML with a top level "Types" element, and one or more "Default" and "Override" sub-elements. The "Default" elements define default mappings from part name extensions to content types. This takes advantage of the fact that file extensions often correspond to content type. "Override" elements are used to specify content types on parts that are not covered by, or are not consistent with, the default mappings. Package writers may use "Default" elements to reduce the number of per-part "Override" elements, but are not required to do so.
The "Default" element has the following attributes:
(Table Removed)
The "Override" element has the following attributes:
Name | Description | Required
PartName | A part URI. An "Override" element matches the part whose name equals this attribute's value. | Yes
ContentType | A content type as defined in RFC2045. Indicates the content type of the matching part. | Yes
The following is an example of the XML contained in a types stream:
The following table shows a sample list of parts, and their corresponding content types as defined by the above types stream:
Part Name | Content Type
/a/b/sample1.txt | plain/text
/a/b/sample2.jpeg | image/jpeg
/a/b/sample3.picture | image/gif
/a/b/sample4.picture | image/jpeg
For every part in the package, the types stream contains either (a) one matching "Default" element, (b) one matching "Override" element, or (c) both a matching "Default" element and a matching "Override" element (in which case the "Override" element takes precedence). In general there is, at most, one "Default" element for any given extension, and one "Override" element for any given part name.
The order of "Default" and "Override" elements in the types stream is not significant. However, in interleaved packages, "Default" and "Override" elements appear in the physical package before the part(s) they correspond to.
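The lookup rules for the types stream can be sketched as follows. This is a hedged illustration, not the format's definition: the XML below is an assumed types stream chosen to be consistent with the sample parts and content types listed above, and the attribute names Extension, PartName and ContentType, as well as the function name, are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed types stream; element names come from the text, attribute names are assumptions.
TYPES_XML = """
<Types>
  <Default Extension="txt" ContentType="plain/text"/>
  <Default Extension="jpeg" ContentType="image/jpeg"/>
  <Default Extension="picture" ContentType="image/gif"/>
  <Override PartName="/a/b/sample4.picture" ContentType="image/jpeg"/>
</Types>
"""


def content_type_for(part_name: str, types_xml: str = TYPES_XML) -> Optional[str]:
    """Resolve a part's content type; an Override takes precedence over a Default."""
    root = ET.fromstring(types_xml)
    for override in root.findall("Override"):
        if override.get("PartName") == part_name:
            return override.get("ContentType")
    last_segment = part_name.rsplit("/", 1)[-1]
    if "." not in last_segment:
        return None  # no extension, so no Default can match
    extension = last_segment.rsplit(".", 1)[-1].lower()
    for default in root.findall("Default"):
        if default.get("Extension", "").lower() == extension:
            return default.get("ContentType")
    return None
```

Checking Override elements first reflects the stated precedence rule; the order of the elements inside the stream does not affect the result.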
Interleaving
Not all physical packages support interleaving of the data streams of parts natively. In one embodiment, a mapping to any such physical package uses the general mechanism described in this section to allow interleaving of parts. The general mechanism works by breaking the data stream of a part into multiple pieces
that can then be interleaved with pieces of other parts, or whole parts. The individual pieces of a part exist in the physical mapping and are not addressable in the logical packaging model. Pieces may have a zero size.
The following unique mapping from a part name to the names for the individual pieces of a part is defined, such that a reader can stitch together the pieces in their original order to form the data stream of the part.
Grammar for deriving piece names for a given part name:
piece_name = part_name "/" "[" 1*digit "]" [ ".last" ] ".piece"
The following validity constraints exist for piece_names generated by the grammar:
• The piece numbers start with 0, and are non-negative, consecutive integers. Piece numbers can be left-zero-padded.
• The last piece of the set of pieces of a part contains the ".last" in the
piece name before ".piece".
• The piece name is generated from the name of the logical part before
mapping to names in the physical package.
Although it is not necessary to store pieces in their natural order, such storage may provide optimal efficiency. A physical package containing interleaved (pieced) parts can also contain non-interleaved (one-piece) parts, so the following example would be valid:
spine.xaml/[0].piece
pages/page0.xaml
spine.xaml/[1].piece
pages/page1.xaml
spine.xaml/[2].last.piece
pages/page2.xaml
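The piece-name grammar and the stitching step can be sketched as follows. This is an illustrative sketch under stated assumptions: the physical package is modeled as a dictionary from stored names to bytes, and the function names and the sample byte contents are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# piece_name = part_name "/" "[" 1*digit "]" [ ".last" ] ".piece"
PIECE_RE = re.compile(r"^(?P<part>.+)/\[(?P<num>[0-9]+)\](?P<last>\.last)?\.piece$")


def piece_names(part_name: str, piece_count: int) -> List[str]:
    """Derive the piece names for a part that is split into piece_count pieces."""
    if piece_count < 1:
        raise ValueError("a part needs at least one piece")
    names = []
    for index in range(piece_count):
        last = ".last" if index == piece_count - 1 else ""
        names.append(f"{part_name}/[{index}]{last}.piece")
    return names


def stitch(entries: Dict[str, bytes]) -> Dict[str, bytes]:
    """Rebuild logical parts from stored entries, joining pieces in numeric order."""
    parts: Dict[str, bytes] = {}
    pieces: Dict[str, List[Tuple[int, bool, bytes]]] = {}
    for name, data in entries.items():
        match = PIECE_RE.match(name)
        if match is None:
            parts[name] = data  # a non-interleaved (one-piece) part
            continue
        pieces.setdefault(match.group("part"), []).append(
            (int(match.group("num")), match.group("last") is not None, data))
    for part, found in pieces.items():
        found.sort(key=lambda piece: piece[0])
        if [number for number, _, _ in found] != list(range(len(found))):
            raise ValueError(f"piece numbers of {part} are not consecutive from 0")
        if [is_last for _, is_last, _ in found] != [False] * (len(found) - 1) + [True]:
            raise ValueError(f"only the final piece of {part} may be marked .last")
        parts[part] = b"".join(data for _, _, data in found)
    return parts
```

Applied to the example layout above, stitch would return four logical parts: spine.xaml (the concatenation of its three pieces) and the three page parts unchanged.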
Specific Mappings
The following defines specific mappings for the following physical formats: Loose files in a Windows file system.
Mapping to Loose Files in a Windows file system
In order to better understand how to map elements of the logical model to a physical format, consider the basic case of representing a Metro package as a collection of loose files in a Windows file system. Each part in the logical package will be contained in a separate file (stream). Each part name in the logical model corresponds to the name of the file.
(Table Removed)
The part names are translated into valid Windows file names, as illustrated by the table below.
Given below are two character sets that are valid for logical part name segments (URI segments) and for Windows filenames. This table reveals two important things:
There are two valid URI symbols colon (:) and asterisk (*) which we need to escape when converting a URI to a filename.
There are valid filename symbols ^ {} [] # which cannot be present in a URI (they can be used for special mapping purposes, like interleaving).
"Escaping" is used as a technique to produce valid filename characters when a part name contains a character that cannot be used in a file name. To escape a character, the caret symbol (^) is used, followed by the hexadecimal representation of the character.
To map from an abs_path (part name) to a file name:
remove the first /
convert all / to \
escape colon and asterisk characters
For example, the part name /a:b/c/d*.xaml becomes the file name a^3ab\c\d^2a.xaml (the colon is hexadecimal 3a and the asterisk is hexadecimal 2a).
To perform the reverse mapping:
convert all \ to /
add / to the beginning of the string
unescape characters by replacing ^[hexCode] with the corresponding character
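The forward and reverse mappings can be sketched as follows. This sketch makes two assumptions that the text does not state: the hexadecimal representation is the two-digit ASCII code of the character (3a for the colon, 2a for the asterisk), and only the colon and asterisk are escaped. The function names are illustrative.

```python
import re

ESCAPED_CHARACTERS = ":*"  # valid in URI segments, not valid in Windows file names


def part_name_to_file_name(part_name: str) -> str:
    """Map an abs_path part name to a relative Windows file name."""
    if not part_name.startswith("/"):
        raise ValueError("a part name must start with '/'")
    result = []
    for character in part_name[1:]:          # remove the first /
        if character == "/":
            result.append("\\")              # convert / to \
        elif character in ESCAPED_CHARACTERS:
            result.append(f"^{ord(character):02x}")  # caret plus two hex digits
        else:
            result.append(character)
    return "".join(result)


def file_name_to_part_name(file_name: str) -> str:
    """Reverse mapping: restore slashes, add the leading /, then unescape."""
    with_slashes = "/" + file_name.replace("\\", "/")
    return re.sub(r"\^([0-9A-Fa-f]{2})",
                  lambda match: chr(int(match.group(1), 16)),
                  with_slashes)
```

Under these assumptions the two functions are inverses of each other for part names that contain no caret, which the text states cannot appear in a URI.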
(Table Removed)
Versioning and Extensibility
Like other technical specifications, the specification contained herein may evolve with future enhancements. The design of the first edition of this specification includes plans for the future interchange of documents between software systems written based on the first edition, and software systems written for future editions. Similarly, this specification allows for third-parties to create extensions to the specification. Such an extension might, for example, allow for the construction of a document which exploits a feature of some specific printer, while still retaining compatibility with other readers that are unaware of that printer's existence.
Documents using new versions of the Fixed Payload markup, or third-party extensions to the markup, require readers to make appropriate decisions about behavior (e.g., how to render something visually). To guide readers, the author of a document (or the tool that generated the document) should identify appropriate
behavior for readers encountering otherwise-unrecognized elements or attributes. For Reach documents, this type of guidance is important.
New printers, browsers, and other clients may implement a variety of support for future features. Document authors exploiting new versions or extensions must carefully consider the behavior of readers unaware of those versions or extensions.
Versioning Namespace
XML markup recognition is based on namespace URIs. For any XML-namespace, a reader is expected to recognize either all or none of the XML-elements and XML-attributes defined in that namespace. If the reader does not recognize the new namespace, the reader will need to perform fallback rendering operations as specified within the document.
The XML namespace URI 'http://PLACEHOLDER/version-control' includes the XML elements and attributes used to construct Fixed Payload markup that is version-adaptive and extension-adaptive. Fixed Payloads are not required to have versioning elements within them. In order to build adaptive content, however, one must use at least one of the <Compatibility.Rules> and <AlternateContent> XML elements.
This Fixed-Payload markup specification has an xmlns URI associated with it: 'http://PLACEHOLDER/pdl'. Using this namespace in a Fixed Payload will indicate to a reader application that only elements defined in this specification will be used. Future versions of this specification will have their own namespaces. Reader applications familiar with the new namespace will know how to support the superset of elements and attributes defined in previous versions. Reader applications that are not familiar with the new version will consider the URI of the new version as if it were the URI of some unknown extension to the PDL. These applications
may not know that a relationship exists between the namespaces, that one is a superset of the other.
Backward and "Forward" Compatibility
In the context of applications or devices supporting the systems and methods discussed herein, compatibility is indicated by the ability of clients to parse and display documents that were authored using previous versions of the specification, or unknown extensions or versions of the specification. Various versioning mechanisms address "backward compatibility," allowing future implementations of clients to be able to support documents based on down-level versions of the specification, as illustrated below.
When an implemented client, such as a printer, receives a document built using a future version of the markup language, the client will be able to parse and understand the available rendering options. The ability of client software written according to an older version of a specification to handle some documents using features of a newer version is often called "forward compatibility." A document written to enable forward compatibility is described as "version-adaptive."
Further, because implemented clients will also need to be able to support documents that have unknown extensions representing new elements or properties, various semantics support the more general case of documents that are "extension adaptive."
If a printer or viewer encounters extensions that are unknown, it will look for information embedded alongside the use of the extension for guidance about adaptively rendering the surrounding content. This adaptation involves replacing unknown elements or attributes with content that is understood. However, adaptation can take other forms, including purely ignoring unknown content. In the
absence of explicit guidance, a reader should treat the presence of an unrecognized extension in the markup as an error-condition. If guidance is not provided, the extension is presumed to be fundamental to understanding the content. The rendering failure will be captured and reported to the user.
To support this model, new and extended versions of the markup language should logically group related extensions in namespaces. In this way, document authors will be able to take advantage of extended features using a minimum number of namespaces.
Versioning Markup
The XML vocabulary for supporting extension-adaptive behavior includes the following elements:
Versioning Element and Hierarchy | Description
<Compatibility.Rules> | Controls how the parser reacts to an unknown element or attribute.
  <Ignorable> | Declares that the associated namespace URI is ignorable.
    <ProcessContent> | Declares that if an element is ignored, the contents of the element will be processed as if they were contained by the container of the ignored element.
    <CarryAlong> | Indicates to the document editing tools whether ignorable content should be preserved when the document is modified.
  <MustUnderstand> | Reverses the effect of an element declared ignorable.
<AlternateContent> | In markup that exploits versioning/extension features, the <AlternateContent> element associates substitute "fallback" markup to be used by reader applications that are not able to handle the markup specified as Preferred.
  <Prefer> | Specifies preferred content. This content is used when a client is aware of version/extension features.
  <Fallback> | For down-level clients, specifies the "down-level" content to be substituted for the preferred content.
The <Compatibility.Rules> Element
Compatibility.Rules can be attached to any element that can hold an attached attribute, as well as to the Xaml root element. The <Compatibility.Rules> element controls how the parser reacts to unknown elements or attributes. Normally such items are reported as errors. Adding an Ignorable element to a Compatibility.Rules property informs the compiler that items from certain namespaces can be ignored.
Compatibility.Rules can contain the elements Ignorable and MustUnderstand. By default, all elements and attributes are assumed to be MustUnderstand. Elements and attributes can be made Ignorable by adding an Ignorable element into its container's Compatibility.Rules property. An element or property can be made MustUnderstand again by adding a MustUnderstand element
to one of the nested containers. One Ignorable or MustUnderstand refers to a particular namespace URI within the same Compatibility.Rules element.
The <Compatibility.Rules> element affects the contents of a container, not the container's own tag or attributes. To affect a container's tag or attributes, its container must contain the compatibility rules. The Xaml root element can be used to specify compatibility rules for elements that would otherwise be root elements, such as Canvas. The Compatibility.Rules compound attribute is the first element in a container.
The <Ignorable> Element
The <Ignorable> element declares that the enclosed namespace URI is ignorable. An item can be considered ignorable if an <Ignorable> tag is declared ahead of the item in the current block or a container block, and the namespace URI is unknown to the parser. If the URI is known, the Ignorable tag is disregarded and all items are understood. In one embodiment, all items not explicitly declared as Ignorable must be understood. The Ignorable element can contain <ProcessContent> and <CarryAlong> elements, which are used to modify how an element is ignored and to give document editing tools guidance on how such content should be preserved in edited documents.
The <ProcessContent> Element
The <ProcessContent> element declares that if an element is ignored, the contents of the element will be processed as if they were contained by the container of the ignored element.
Attributes
Attribute | Description
Elements | A space-delimited list of element names for which to process the contents, or "*" indicating the contents of all elements should be processed. The Elements attribute defaults to "*" if it is not specified.
The <CarryAlong> Element
The optional <CarryAlong> element indicates to the document editing tools whether ignorable content should be preserved when the document is modified. The method by which an editing tool preserves or discards the ignorable content is in the domain of the editing tool. If multiple <CarryAlong> elements refer to the same element or attribute in a namespace, the last <CarryAlong> specified has precedence.
Attributes
Attribute | Description
Elements | A space-delimited list of element names that are requested to be carried along when the document is edited, or "*" indicating the contents of all elements in the namespace should be carried along. The Elements attribute defaults to "*" if it is not specified.
Attributes | A space-delimited list of attribute names within the elements that are to be carried along, or "*" indicating that all attributes of the elements should be carried along. When an element is ignored and carried along, all attributes are carried along regardless of the contents of this attribute. This attribute only has an effect if the attribute specified is used in an element that is not ignored, as in the example below. By default, Attributes is "*".
The <MustUnderstand> Element
<MustUnderstand> is an element that reverses the effects of an Ignorable element. This technique is useful, for example, when combined with alternate content. Outside the scope defined by the <MustUnderstand> element, the element remains Ignorable.
Attributes
Attribute | Description
NamespaceUri | The URI of the namespace whose items must be understood.
The <AlternateContent> Element
The <AlternateContent> element allows alternate content to be provided if any part of the specified content is not understood. An AlternateContent block uses both a <Prefer> and a <Fallback> block. If anything in the <Prefer> block is not understood, then the contents of the <Fallback> block are used. A namespace is declared <MustUnderstand> in order to indicate that the fallback is to be used. If a namespace is declared ignorable and that namespace is used within a <Prefer> block, the content in the <Fallback> block will not be used.
Versioning Markup Examples
Using <Ignorable>
This example uses a fictitious markup namespace, http://PLACEHOLDER/Circle, that defines an element Circle in its initial version and uses the Opacity attribute of Circle introduced in a future version of the markup (version 2) and the Luminance property introduced in an even later version of the markup (version 3). This markup remains loadable in versions 1 and 2, as well as 3 and beyond. Additionally, the <CarryAlong> element specifies that v3:Luminance MUST be preserved when editing, even when the editor doesn't understand v3:Luminance.
For a version 1 reader, Opacity and Luminance are ignored.
For a version 2 reader, only Luminance is ignored.
For a version 3 reader and beyond, all the attributes are used.
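The per-version behavior listed above can be sketched as follows. This is a simplified model, not the markup parser itself: namespaces are represented by the stand-in identifiers "v1", "v2" and "v3" (hypothetical placeholders for the real namespace URIs), and the function name is an assumption. An attribute is used if its namespace is understood, skipped if its namespace is unknown but declared ignorable, and reported as an error otherwise.

```python
from typing import List, Set, Tuple


def usable_attributes(attributes: List[Tuple[str, str]],
                      understood: Set[str],
                      ignorable: Set[str]) -> List[str]:
    """Return the attribute names a reader uses, skipping ignorable unknown ones."""
    used = []
    for namespace, name in attributes:
        if namespace in understood:
            used.append(name)      # known namespace: the attribute is used
        elif namespace in ignorable:
            continue               # unknown but declared ignorable: skipped
        else:
            raise ValueError(f"{name}: namespace {namespace!r} must be understood")
    return used


# Attributes of the Circle element in the example, tagged with stand-in namespaces.
circle_attributes = [("v2", "Opacity"), ("v3", "Luminance")]
declared_ignorable = {"v2", "v3"}

version1 = usable_attributes(circle_attributes, {"v1"}, declared_ignorable)
version2 = usable_attributes(circle_attributes, {"v1", "v2"}, declared_ignorable)
version3 = usable_attributes(circle_attributes, {"v1", "v2", "v3"}, declared_ignorable)
```

If the namespaces were not declared ignorable, a version 1 reader would raise an error instead, which corresponds to the default rule that all items must be understood.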
Using <MustUnderstand>
The following example demonstrates the use of the <MustUnderstand> element.
Use of the <MustUnderstand> element causes the references to v3:Luminance to be in error, even though it was declared Ignorable in the root element. This technique is useful if combined with alternate content that uses, for example, the Luminance property of Canvas added in Version 2 instead (see below). Outside the scope of the Canvas element, Circle's Luminance property is ignorable again.
Using <AlternateContent>
If any element or attribute is declared as <MustUnderstand> but is not understood in the <Prefer> block of an <AlternateContent> block, the <Prefer> block is skipped in its entirety and the <Fallback> block is processed as normal (that is, any MustUnderstand items encountered are reported as errors).
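The choice between preferred and fallback content can be sketched as follows. This is a simplified model under stated assumptions: each block is reduced to the set of namespaces it uses, namespaces are stand-in identifiers rather than real URIs, and the function name and return values are hypothetical.

```python
from typing import Set


def select_alternate(preferred_namespaces: Set[str],
                     understood: Set[str],
                     ignorable: Set[str]) -> str:
    """Return "prefer" or "fallback" for one alternate-content block.

    The preferred block is skipped only when it uses a namespace that the
    reader does not understand and that is not declared ignorable.
    """
    for namespace in preferred_namespaces:
        if namespace not in understood and namespace not in ignorable:
            return "fallback"
    return "prefer"


# The preferred content uses a version 3 feature (stand-in identifier "v3").
old_reader = select_alternate({"v1", "v3"}, understood={"v1", "v2"}, ignorable=set())
new_reader = select_alternate({"v1", "v3"}, understood={"v1", "v2", "v3"}, ignorable=set())
lenient = select_alternate({"v1", "v3"}, understood={"v1"}, ignorable={"v3"})
```

The third call illustrates the rule that fallback content is not used when the unknown namespace in the preferred content has been declared ignorable.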
THE REACH PACKAGE FORMAT
In the discussion that follows, a description of a specific file format is provided. Separate primary sub-headings in this section include "Introduction to
the Reach Package Format", "The Reach Package Structure", "Fixed Payload Parts", "FixedPage Markup Basics", "Fixed-Payload Elements and Properties" and "FixedPage Markup". Each primary sub-heading has one or more related subheadings.
Introduction to the Reach Package Format
Having described an exemplary framework above, the description that follows is one of a specific format that is provided utilizing the tools described above. It is to be appreciated and understood that the following description constitutes but one exemplary format and is not intended to limit application of the claimed subject matter.
In accordance with this embodiment, a single package may contain multiple payloads, each acting as a different representation of a document. A payload is a collection of parts, including an identifiable "root" part and all the parts required for valid processing of that root part. For instance, a payload could be a fixed representation of a document, a reflowable representation, or any arbitrary representation.
The description that follows defines a particular representation called the fixed payload. A fixed payload has a root part that contains a FixedPanel markup which, in turn, references FixedPage parts. Together, these describe a precise rendering of a multi-page document.
A package which holds at least one fixed payload, and follows other rules described below, is referred to as a reach package. Readers and writers of reach packages can implement their own parsers and rendering engines, based on the specification of the reach package format.
Features of Reach Packages
In accordance with the described embodiment, reach packages address the requirements that information workers have for distributing, archiving, and rendering documents. Using known rendering rules, reach packages can be unambiguously and exactly reproduced or printed from the format in which they are saved, without tying client devices or applications to specific operating systems or service libraries. Additionally, because the reach payload is expressed in a neutral, application-independent way, the document can typically be viewed and printed without the application used to create the package. To provide this ability, the notion of a fixed payload is introduced and contained in a reach package.
In accordance with the described embodiment, a fixed payload has a fixed number of pages and page breaks are always the same. The layout of all the elements on a page in a fixed payload is predetermined. Each page has a fixed size and orientation. As such, no layout calculations have to be performed on the consuming side and content can simply be rendered. This applies not just to graphics, but to text as well, which is represented in the fixed payload with precise typographic placement. The content of a page (text, graphics, images) is described using a powerful but simple set of visual primitives.
Reach packages support a variety of mechanisms for organizing pages. A group of pages is "glued" together one after another into a "FixedPanel." This group of pages is roughly equivalent to a traditional multi-page document. A FixedPanel can then further participate in composition, the process of building sequences and selections to assemble a "compound" document.
In the illustrated and described embodiment, reach packages support a specific kind of sequence called a FixedPanel sequence that can be used, for example, to glue together a set of FixedPanels into a single, larger "document." Imagine, for example, gluing together two documents that came from different sources: a two-page cover memo (a FixedPanel) and a twenty-page report (a FixedPanel).
Reach packages support a number of specific selectors that can be used when building document packages containing alternate representations of the "same" content. In particular, reach packages allow selection based on language, color capability, and page size. Thus, one could have, for example, a bi-lingual document that uses a selector to pick between the English representation and the French representation of the document.
In addition to these simple uses of selector and sequence for composition in a reach package, it is important to note that selectors and sequences can also refer to further selectors and sequences thus allowing for powerful aggregate hierarchies to be built. The exact rules for what can and cannot be done, in accordance with this embodiment, are specified below in the section entitled "The Reach Package Structure".
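By way of illustration only, the nesting of selectors and sequences described above can be sketched in Python. The class names, the use of a language tag as the selection key, and the resolve function are hypothetical conveniences for this sketch and are not part of the described format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class FixedPanel:
    name: str
    page_count: int


@dataclass
class Sequence:
    # Children are glued together one after another.
    children: List["Node"] = field(default_factory=list)


@dataclass
class Selector:
    # Exactly one alternative is chosen; keyed here by language tag (assumption).
    alternatives: Dict[str, "Node"] = field(default_factory=dict)


Node = Union[FixedPanel, Sequence, Selector]


def resolve(node: Node, language: str) -> List[FixedPanel]:
    """Flatten a composition hierarchy into the FixedPanels to render, in order."""
    if isinstance(node, FixedPanel):
        return [node]
    if isinstance(node, Sequence):
        panels: List[FixedPanel] = []
        for child in node.children:
            panels.extend(resolve(child, language))
        return panels
    if isinstance(node, Selector):
        return resolve(node.alternatives[language], language)
    raise TypeError(f"unsupported node: {node!r}")


# A bilingual two-page cover memo followed by a twenty-page report.
memo_en = FixedPanel("memo-en", 2)
memo_fr = FixedPanel("memo-fr", 2)
report = FixedPanel("report", 20)
document = Sequence([Selector({"en": memo_en, "fr": memo_fr}), report])
```

In this sketch, resolving the compound document for French yields the French memo followed by the report, for twenty-two pages in total.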
Additionally, a reach package can contain additional payloads that are not fixed payloads, but instead are richer and perhaps editable representations of the document. This allows a package to contain a rich, editable document that works well in an editor application as well as a representation that is visually accurate and can be viewed without the editing application.
Finally, in accordance with this embodiment, reach packages support what is known as a print ticket. The print ticket provides settings that should be used when the package is printed. These print tickets can be attached in a variety of ways to achieve substantial flexibility. For example, a print ticket can be "attached" to an entire package and its settings will affect the whole package. Print tickets can be further attached at lower levels in the structure (e.g., to individual pages) and these print tickets will provide override settings to be used when printing the part to which they are attached.
The Reach Package Structure
As described above, a reach package supports a set of features including "fixed" pages, FixedPanels, composition, print tickets, and the like. These features are represented in a package using the core components of the package model: parts and relationships. In this section and its related sub-sections, a complete definition of a "reach package" is provided, including descriptions of how all these parts and relationships must be assembled, related, etc.
Reach Package Structure Overview
Fig. 10 illustrates an exemplary reach package and, in this embodiment, each of the valid types of parts that can make up or be found in a package. The table provided just below lists each valid part type and provides a description of each:
(Table Removed)
Because a reach package is designed to be a "view and print anywhere" document, readers and writers of reach packages must share common, unambiguously-defined expectations of what constitutes a "valid" reach package. To provide a definition of a "valid" reach package, a few concepts are first defined below.
Reach Composition Parts
A reach package must contain at least one FixedPanel that is "discoverable" by traversing the composition block from the starting part of the package. In accordance with the described embodiment, the discovery process proceeds as follows:
• Recursively traverse the graph of composition parts starting at the package starting part.
• When performing this traversal, only traverse into composition parts that are reach composition parts (described below).
• Locate all of the terminal nodes (those without outgoing arcs) at the edge of the graph.
These terminal nodes refer (via their elements) to a set of parts called the reach payload roots.
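The discovery process can be sketched, under stated assumptions, as a depth-first traversal. In the following Python sketch the graph is modeled with plain dictionaries; the names arcs, reach_parts, and items, as well as the part names in the example, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def discover_payload_roots(start: str,
                           arcs: Dict[str, List[str]],
                           reach_parts: Set[str],
                           items: Dict[str, List[str]]) -> List[str]:
    """Return the parts referred to by terminal reach composition parts.

    arcs        -- composition part -> composition parts it refers to
    reach_parts -- the composition parts that are reach composition parts
    items       -- composition part -> non-composition parts it refers to
    """
    roots: List[str] = []
    visited: Set[str] = set()

    def visit(part: str) -> None:
        # Only traverse into reach composition parts, and visit each once.
        if part in visited or part not in reach_parts:
            return
        visited.add(part)
        children = arcs.get(part, [])
        if not children:
            # Terminal node (no outgoing arcs): collect what it refers to.
            roots.extend(items.get(part, []))
        for child in children:
            visit(child)

    visit(start)
    return roots


# Hypothetical package: a sequence that refers to a language selector and to
# a composition part of an unrecognized type, which is skipped.
arcs = {"/start.seq": ["/lang.sel", "/other.cmp"]}
reach_parts = {"/start.seq", "/lang.sel"}
items = {"/lang.sel": ["/en/panel.xml", "/fr/panel.xml"],
         "/other.cmp": ["/ignored.xml"]}
payload_roots = discover_payload_roots("/start.seq", arcs, reach_parts, items)
```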
Fixed Payload
A fixed payload is a payload whose root part is a FixedPanel part. For example, each of the fixed payloads in Fig. 10 has as its root part an associated FixedPanel part. The payload includes the full closure of all of the parts required for valid processing of the FixedPanel. These include:
• The FixedPanel itself;
• All FixedPages referenced from within the FixedPanel;
• All image parts referenced (directly, or indirectly through a selector) by any of the FixedPages in the payload;
• All reach selectors (as described below) referenced directly or indirectly from image brushes used within any of the FixedPages within the payload;
• All font parts referenced by any of the FixedPages in the payload;
• All descriptive metadata parts attached to any part in the fixed payload; and
• Any print tickets attached to any part in the fixed payload.
Validity Rules for Reach Package
With the above definitions in place, conformance rules that describe a "valid" reach package in accordance with the described embodiment are now described:
• A reach package must have a starting part defined using the standard mechanism of a package relationship as described above;
• The starting part of a reach package must be either a selector or a sequence;
• A reach package must have at least one reach payload root that is a FixedPanel;
• PrintTicket parts may be attached to any of the composition parts, FixedPanel parts or any of the FixedPage parts identified in the FixedPanel(s). In the present example, this is done via the http://PLACEHOLDER/HasPrintTicketRel relationship;
   o PrintTickets may be attached to any or all of these parts;
   o Any given part must have no more than one PrintTicket attached;
• A Descriptive Metadata part may be attached to any part in the package;
• Every Font object in the FixedPayload must meet the font format rules defined in section "Font Parts";
• References to images from within any FixedPage in the fixed payload may point to a selector which may make a selection (potentially recursively through other selectors) to find the actual image part to be rendered;
• Every Image object used in the fixed payload must meet the image format rules defined in section "Image Parts";
• For any font, image or selector part referenced from a FixedPage (directly, or indirectly through a selector), there must be a "required part" relationship (relationship name = http://mmcf-fixed-RequiredResource-PLACEHOLDER) from the referencing FixedPage to the referenced part.
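Several of the structural rules above lend themselves to mechanical checking. The following Python sketch shows one possible way a reader or writer might do so; the Package class, its field names, and the part-type labels are hypothetical, and the font and image format rules are not covered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Package:
    start_part: str                  # "" when no starting part is defined
    part_types: Dict[str, str]       # part name -> type label (labels are assumptions)
    payload_roots: List[str]         # result of the discovery process
    # part name -> PrintTicket parts attached to it
    print_tickets: Dict[str, List[str]] = field(default_factory=dict)
    # FixedPage -> font, image or selector parts it references
    references: Dict[str, Set[str]] = field(default_factory=dict)
    # FixedPage -> targets of its "required part" relationships
    required: Dict[str, Set[str]] = field(default_factory=dict)


def validate(pkg: Package) -> List[str]:
    """Return a list of rule violations; an empty list means none were found."""
    errors: List[str] = []
    if not pkg.start_part:
        errors.append("no starting part")
    elif pkg.part_types.get(pkg.start_part) not in ("selector", "sequence"):
        errors.append("starting part must be a selector or a sequence")
    if not any(pkg.part_types.get(r) == "FixedPanel" for r in pkg.payload_roots):
        errors.append("no reach payload root is a FixedPanel")
    for part, tickets in pkg.print_tickets.items():
        if len(tickets) > 1:
            errors.append(f"{part}: more than one PrintTicket attached")
    for page, refs in pkg.references.items():
        for target in sorted(refs - pkg.required.get(page, set())):
            errors.append(f"{page}: missing required-part relationship to {target}")
    return errors


good = Package(
    start_part="/doc.seq",
    part_types={"/doc.seq": "sequence", "/panel.xml": "FixedPanel"},
    payload_roots=["/panel.xml"],
    print_tickets={"/panel.xml": ["/pt1.xml"]},
    references={"/p1.xml": {"/font.ttf"}},
    required={"/p1.xml": {"/font.ttf"}},
)
bad = Package(
    start_part="/panel.xml",
    part_types={"/panel.xml": "FixedPanel"},
    payload_roots=[],
    print_tickets={"/p1.xml": ["/a.xml", "/b.xml"]},
    references={"/p1.xml": {"/img.png"}},
)
```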
Reach Composition Parts
While a reach package may contain many types of composition part, only a well-defined set of types of composition parts have well-defined behavior according to this document. These composition parts with well-defined behavior are called reach composition parts. Parts other than these are not relevant when determining validity of a reach package.
The following types of composition parts are defined as reach composition parts:
(Table Removed)
Reach Selectors
Those selector composition parts defined as reach composition parts are called reach selectors. As noted above, a language selector picks between representations based on their natural language, such as English or French. To discover this language, the selector inspects each of its items. Only those that are XML are considered. For those, the root element of each one is inspected to determine its language. If the xml:lang attribute is not present, the part is ignored.
The selector then considers each of these parts in turn, selecting the first one whose language matches the system's default language.
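A minimal sketch of the language selection just described, written in Python with the standard xml.etree.ElementTree module, is given below. The function name, the representation of items as a mapping from part name to raw bytes, and the case-insensitive exact comparison of language tags are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def select_by_language(items: Dict[str, bytes],
                       system_language: str) -> Optional[str]:
    """Return the first item whose root xml:lang matches the system language."""
    for name, data in items.items():
        try:
            root = ET.fromstring(data)
        except ET.ParseError:
            continue  # only items that are XML are considered
        lang = root.get(XML_LANG)
        if lang is None:
            continue  # no xml:lang on the root element: the part is ignored
        if lang.lower() == system_language.lower():
            return name
    return None


items = {
    "/cover.png": b"\x89PNG\r\n\x1a\n",          # not XML, skipped
    "/nolang.xml": b"<Doc/>",                     # no xml:lang, ignored
    "/fr.xml": b'<Doc xml:lang="fr"/>',
    "/en.xml": b'<Doc xml:lang="en"/>',
    "/en-second.xml": b'<Doc xml:lang="en"/>',
}
```

With these items and a system default language of "en", the sketch selects /en.xml, the first matching item in order.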
A color selector chooses between representations based on whether they are monochromatic or color. The page size selector chooses between representations based on their page size. A content type selector chooses between representations based on whether their content types can be understood by the system.
Reach Sequences
Those sequence composition parts defined as reach composition parts are called reach sequences. A fixed sequence combines children that are fixed content into a sequence.
Fixed Payload Parts
The fixed payload can contain the following kinds of parts: a FixedPanel part, a FixedPage part, Image parts, Font parts, Print Ticket parts, and Descriptive Metadata parts, each of which is discussed below under its own sub-heading.
The FixedPanel Part
The document structure of the Fixed-Payload identifies FixedPages as part of a spine, as shown below. The relationships between the spine part and the page parts are defined within the relationships stream for the spine. The FixedPanel part is of content type application/xml+PLACEHOLDER.
The spine of the Fixed-Payload content is specified in markup by including a element within a element. In the example below, the element specifies the sources of the pages that are held in the spine.
<!-- SPINE -->
The Element
The element has no attributes and must have only one child: .
The <FixedPanel> Element
The <FixedPanel> element is the document spine, logically binding an ordered sequence of pages together into a single multi-page document. Pages always specify their own width and height, but a <FixedPanel> element may also optionally specify a height and width. This information can be used for a variety of purposes including, for example, selecting between alternate representations based on page size. If a <FixedPanel> element specifies a height and width, it will usually be aligned with the width and height of the pages within the <FixedPanel>, but these dimensions do not specify the height and width of individual pages.
The following table summarizes FixedPanel attributes in accordance with the described embodiment.
(Table Removed)
The element is the only allowable child element of the element. The elements are in sequential markup order matching the page order of the document.
The Element
Each element refers to the source of the content for a single page. To determine the number of pages in the document, one would count the number of children contained within the .
The element has no allowable children, and has a single required attribute, Source, which refers to the FixedPage part for the contents of a page.
As with the element, the element may optionally include a PageHeight and PageWidth attribute, here reflecting the size of the single page. The required page size is specified in the FixedPage part; the optional size on is advisory only. The size attributes allow applications such as document viewers to make visual layout estimates for a document quickly, without loading and parsing all of the individual FixedPage parts.
The table provided just below summarizes attributes and provides a description of the attributes.
(Table Removed)
The URI string of the page content must reference the part location of the content relative to the package.
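As a sketch of how a viewer might use the spine to count pages and gather advisory sizes without loading any FixedPage part, consider the following Python example. The element names Outer, Spine, and Page are placeholders invented for this sketch (the code does not depend on them); only the Source, PageHeight, and PageWidth attribute names are taken from the description above, and the numeric values are arbitrary.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Placeholder element names; attribute names follow the description above.
SPINE_MARKUP = """
<Outer>
  <Spine PageHeight="1056" PageWidth="816">
    <Page Source="/content/p1.xml" PageHeight="1056" PageWidth="816"/>
    <Page Source="/content/p2.xml"/>
  </Spine>
</Outer>
"""


def _to_float(value: Optional[str]) -> Optional[float]:
    return float(value) if value is not None else None


def read_spine(markup: str) -> List[Dict[str, object]]:
    """Return one entry per page: its source and any advisory size."""
    root = ET.fromstring(markup)
    children = list(root)
    if len(children) != 1:
        raise ValueError("the outer element must have exactly one child")
    spine = children[0]
    pages: List[Dict[str, object]] = []
    for page in spine:  # markup order is page order
        source = page.get("Source")
        if source is None:
            raise ValueError("Source is a required attribute")
        pages.append({
            "source": source,
            "height": _to_float(page.get("PageHeight")),  # advisory only
            "width": _to_float(page.get("PageWidth")),    # advisory only
        })
    return pages


pages = read_spine(SPINE_MARKUP)
page_count = len(pages)
```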
The FixedPage Part
Each element in the references by name (URI) a FixedPage part. Each FixedPage part contains FixedPage markup describing the rendering of a single page of content. The FixedPage part is of Content Type application/xml+PLACEHOLDER-FixedPage.
Describing FixedPages in Markup
Below is an example of how the markup of the source content might look for the page referenced in the sample spine markup above ().
// /content/p1.xml
The table below summarizes FixedPage properties and provides a description of the properties.
(Table Removed)
Reading Order in FixedPage Markup
In one embodiment, the markup order of the Glyphs child elements contained within a FixedPage must be the same as the desired reading order of the text content of the page. This reading order may be used both for interactive selection/copy of sequential text from a FixedPage in a viewer, and for enabling access to sequential text by accessibility technology. It is the responsibility of the application generating the FixedPage markup to ensure this correspondence between markup order and reading order.
Image Parts
Supported Formats
In accordance with the described embodiment, image parts used by FixedPages in a reach package can be in a fixed number of formats, e.g., PNG or JPEG, although other formats can be used.
Font Parts
In accordance with the described embodiment, reach packages support a limited number of font formats. In the illustrated and described embodiment, the supported font formats include the TrueType format and the OpenType format.
As will be appreciated by the skilled artisan, the OpenType font format is an extension of the TrueType font format, adding support for PostScript font data and complex typographical layout. An OpenType font file contains data, in table format, that comprises either a TrueType outline font or a PostScript outline font.
In accordance with the described embodiment, the following font formats are not supported in reach packages: Adobe Type 1 fonts, bitmap fonts, fonts with the hidden attribute (use the system flag to decide whether to enumerate them or not), vector fonts, and EUDC fonts (whose font family name is EUDC).
Subsetting Fonts
Fixed payloads represent all text using the Glyphs element described in detail below. Since, in this embodiment, the format is fixed, it is possible to subset fonts to contain only the glyphs required by FixedPayloads. Therefore, fonts in reach packages may be subsetted based on glyph usage. Though a subsetted font will not contain all the glyphs in the original font, the subsetted font must be a valid OpenType font file.
Print Ticket Parts
Print ticket parts provide settings that can be used when the package is printed. These print tickets can be attached in a variety of ways to achieve substantial flexibility. For example, a print ticket can be "attached" to an entire package and its settings will affect the whole package. Print tickets can be further attached at lower levels in the structure (e.g., to individual pages) and these print tickets will provide override settings to be used when printing the part to which they are attached.
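The override behavior can be illustrated with a short Python sketch in which tickets attached at lower levels take precedence over those attached higher up. The function name, the representation of a ticket as a dictionary, and the setting names are hypothetical.

```python
from typing import Dict, List


def effective_settings(path: List[str],
                       tickets: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Merge print tickets along `path`, ordered from the package level
    down to the part being printed; later levels override earlier ones."""
    settings: Dict[str, str] = {}
    for level in path:
        settings.update(tickets.get(level, {}))
    return settings


# Hypothetical tickets: one attached to the package, one to a single page.
tickets = {
    "package": {"duplex": "on", "media": "letter"},
    "/content/p2.xml": {"media": "a4"},
}
page1 = effective_settings(["package", "/panel.xml", "/content/p1.xml"], tickets)
page2 = effective_settings(["package", "/panel.xml", "/content/p2.xml"], tickets)
```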
Descriptive Metadata
As noted above, descriptive metadata parts provide writers or producers of packages with a way in which to store values of properties that enable readers of the packages to reliably discover the values. These properties are typically used to record additional information about the package as a whole, as well as individual parts within the container.
FixedPage Markup Basics
This section describes some basic information associated with the FixedPage markup and includes the following sections: "Fixed Payload and Other Markup Standards", "FixedPage Markup Model", "Resources and Resource References", and "FixedPage Drawing Model".
Fixed Payload and Other Markup Standards
The FixedPanel and FixedPage markup for the Fixed Payload in a reach package is a subset of Windows® Longhorn's Avalon XAML markup. That is, while the Fixed Payload markup stands alone as an independent XML markup format (as documented in this document), it loads in the same way as in Longhorn systems, and renders a WYSIWYG reproduction of the original multi-page document.
As some background on XAML markup, consider the following. XAML markup is a mechanism that allows a user to specify a hierarchy of objects and the programming logic behind the objects as an XML-based markup language. This provides the ability for an object model to be described in XML. This allows extensible classes, such as classes in the Common Language Runtime (CLR) of the .NET Framework by Microsoft Corporation, to be accessed in XML. The XAML mechanism provides a direct mapping of XML tags to CLR objects and the ability to represent related code in the markup. It is to be appreciated and understood that various implementations need not specifically utilize a CLR-based implementation of XAML. Rather, a CLR-based implementation constitutes but one way in which XAML can be employed in the context of the embodiments described in this document.
More specifically, consider the following in connection with Fig. 11 which illustrates an exemplary mapping of CLR concepts (left side components) to XML (right side components). Namespaces are found in the xmlns declaration using a CLR concept called reflection. Classes map directly to XML tags. Properties and events map directly to attributes. Using this hierarchy, a user can specify a hierarchy tree of any CLR objects in XML markup files. XAML files are XML files with a .xaml extension and a media type of application/xaml+xml. XAML files have one root tag that typically specifies a namespace using the xmlns attribute. The namespace may be specified in other types of tags.
Continuing, tags in a XAML file generally map to CLR objects. Tags can be elements, compound properties, definitions or resources. Elements are CLR objects that are generally instantiated during runtime and form a hierarchy of objects. Compound property tags are used to set a property in a parent tag. Definition tags are used to add code into a page and define resources. The resource tag provides the ability to reuse a tree of objects merely by specifying the tree as a resource. Definition tags may also be defined within another tag as an xmlns attribute.
Once a document is suitably described in markup (typically by a writer), the markup can be parsed and processed (typically by a reader). A suitably configured parser determines from the root tag which CLR assemblies and namespaces should be searched to find a tag. In many instances, the parser looks for and will find a namespace definition file in a URL specified by the xmlns attribute. The namespace definition file provides the name of assemblies and their install path and a list of CLR namespaces. When the parser encounters a tag, the parser determines which CLR class the tag refers to using the xmlns of the tag and the xmlns definition file for that xmlns. The parser searches in the order that the assemblies and namespaces are specified in the definition file. When it finds a match, the parser instantiates an object of the class.
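The ordered lookup performed by the parser can be approximated, purely for illustration, with the following Python sketch. It does not use the CLR; the registry dictionaries, the assembly and namespace names, the xmlns URI, and the two stand-in classes are all hypothetical.

```python
from typing import Dict, List, Tuple


# Stand-ins for classes that would live in different assemblies/namespaces.
class Canvas:
    pass


class LegacyCanvas:
    pass


# (assembly, namespace) -> classes available there; a stand-in for reflection.
ASSEMBLIES: Dict[Tuple[str, str], Dict[str, type]] = {
    ("AssemblyA", "Ns.Shapes"): {"Canvas": Canvas},
    ("AssemblyB", "Ns.Legacy"): {"Canvas": LegacyCanvas},
}

# Contents of a namespace definition file, keyed by xmlns URI.
# The list order is the search order.
DEFINITIONS: Dict[str, List[Tuple[str, str]]] = {
    "http://example.invalid/markup": [
        ("AssemblyA", "Ns.Shapes"),
        ("AssemblyB", "Ns.Legacy"),
    ],
}


def instantiate(xmlns: str, tag: str) -> object:
    """Find the class for `tag` by searching in definition-file order,
    then create an instance of the first match."""
    for location in DEFINITIONS.get(xmlns, []):
        cls = ASSEMBLIES.get(location, {}).get(tag)
        if cls is not None:
            return cls()
    raise LookupError(f"no class found for tag {tag!r} in {xmlns!r}")


obj = instantiate("http://example.invalid/markup", "Canvas")
```

Because AssemblyA is listed first in the definition, the Canvas tag resolves to the Canvas class rather than LegacyCanvas.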
Thus, the mechanism described just above, and more fully in the application incorporated by reference above, allows object models to be represented in an XML-based file using markup tags. This ability to represent object models as markup tags can be used to create vector graphic drawings, fixed-format documents, adaptive-flow documents, and application UIs asynchronously or synchronously.
In the illustrated and described embodiment, the Fixed Payload markup is a very minimal, nearly completely parsimonious subset of Avalon XAML rendering primitives. It represents visually anything that can be represented in Avalon, with full fidelity. The Fixed Payload markup is a subset of Avalon XAML elements and properties—plus additional conventions, canonical forms, or restrictions in usage compared to Avalon XAML.
The radically-minimal Fixed Payload markup set defined reduces the cost associated with implementation and testing of reach package readers, such as printer RIPs or interactive viewer applications—as well as reducing the complexity and memory footprint of the associated parser. The parsimonious markup set also minimizes the opportunities for subsetting, errors, or inconsistencies among reach package writers and readers, making the format and its ecosystem inherently more robust.
In addition to the minimal Fixed Payload markup, the reach package will specify markup for additional semantic information to support viewers or presentations of reach package documents with features such as hyperlinks, section/outline structure and navigation, text selection, and document accessibility.
Finally, using the versioning and extensibility mechanisms described above, it is possible to supplement the minimal Fixed Payload markup with a richer set of elements for specific target consuming applications, viewers, or devices.
FixedPage Markup Model
In the illustrated and described embodiment, a FixedPage part is expressed in an XML-based markup language, based on XML-Elements, XML-Attributes, and XML-Namespaces. Three XML-Namespaces are defined in this document for inclusion in FixedPage markup. One such namespace references the Version-control elements and attributes defined elsewhere in this specification. The principal namespace used for elements and attributes in the FixedPage markup is "http://schemas.microsoft.com/MMCF-PLACEHOLDER-FixedPage". And finally, FixedPage markup introduces a concept of "Resources" which requires a third namespace, described below.
Although FixedPage markup is expressed using XML-Elements and XML-Attributes, its specification is based upon a higher-level abstract model of "Contents" and "Properties". The FixedPage elements are all expressed as XML-elements. Only a handful of FixedPage elements can hold "Contents", expressed as child XML-elements. But a property-value may be expressed using an XML-Attribute or using a child XML-element.
FixedPage Markup also depends upon the twin concepts of a Resource-Dictionary and Resource-Reference. The combination of a Resource-Dictionary and multiple Resource-References allows for a single property-value to be shared by multiple properties of multiple FixedPage-markup elements.
Properties in FixedPage Markup
In the illustrated and described embodiment, there are three forms of markup which can be used to specify the value of a FixedPage-markup property.
If the property is specified using a resource-reference, then the property name is used as an XML-attribute name, and a special syntax for the attribute-value indicates the presence of a resource reference. The syntax for expressing resource-references is described in the section entitled "Resources and Resource-References".
Any property-value that is not specified as a resource-reference may be expressed in XML using a nested child XML-element identifying the property whose value is being set. This "Compound-Property Syntax" is described below.
Finally, some non-resource-reference property-values can be expressed as simple-text strings. Although all such property-values may be expressed using Compound-Property Syntax, they may also be expressed using simple XML-attribute syntax.
For any given element, any property may be set no more than once, regardless of the syntax used for specifying a value.
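The rule that a property may be set no more than once can be checked mechanically. The Python sketch below, offered only as an illustration, treats every XML-attribute as a property set by attribute syntax and every child element named in the dotted form described under "Compound-Property Syntax" as a property set by compound syntax; the function name and the sample color values are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List


def duplicate_properties(element: ET.Element) -> List[str]:
    """Return the names of properties that are set more than once."""
    prefix = element.tag + "."
    seen = set(element.attrib)      # properties set via attribute syntax
    duplicates: List[str] = []
    for child in element:
        if child.tag.startswith(prefix):   # compound-property child
            name = child.tag[len(prefix):]
            if name in seen:
                duplicates.append(name)
            seen.add(name)
    return duplicates


ok = ET.fromstring('<SolidColorBrush Color="#FF0000"/>')
conflicting = ET.fromstring(
    '<SolidColorBrush Color="#FF0000">'
    '<SolidColorBrush.Color>#00FF00</SolidColorBrush.Color>'
    '</SolidColorBrush>'
)
```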
Simple Attribute Syntax
For a property value expressible as a simple string, XML-attribute-syntax may be used to specify a property-value. For example, given the FixedPage-markup element called "SolidColorBrush," with the property called "Color", the following syntax can be used to specify a property value:
<!-- Simple Attribute Syntax -->
Compound-Property Syntax
Some property values cannot be expressed as a simple string, e.g., when an XML-element is used to describe the property value. Such a property value cannot be expressed using simple attribute syntax, but it can be expressed using compound-property syntax.
In compound-property syntax, a child XML-Element is used, but the XML-Element name is derived from a combination of the parent-element name and the property name, separated by dot. Given the FixedPage-markup element , which has a property "Fill" which may be set to a , the following markup can be used to set the "Fill" property of the element:
(Figure Removed)
Compound-Property Syntax may be used even in cases where Simple-Attribute Syntax would suffice to express a property-value. So, the example of the previous section:
(Figure Removed)
Can be expressed instead in Compound-Property Syntax:
(Figure Removed)
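The equivalence between the two syntaxes can be sketched as a small transformation. The following Python example moves a property from simple attribute syntax into a child element named by joining the parent-element name and the property name with a dot. The helper name and the color value are assumptions, and the output is only one plausible rendering of the compound form.

```python
import xml.etree.ElementTree as ET


def to_compound(element: ET.Element, prop: str) -> None:
    """Rewrite attribute `prop` of `element` as a Parent.prop child, in place."""
    value = element.attrib.pop(prop)
    child = ET.Element(f"{element.tag}.{prop}")
    child.text = value
    # Insert first so the property child precedes any existing children.
    element.insert(0, child)


brush = ET.fromstring('<SolidColorBrush Color="#FF0000"/>')
to_compound(brush, "Color")
compound_markup = ET.tostring(brush, encoding="unicode")
```

After the call, the Color attribute is gone and the element instead carries a single SolidColorBrush.Color child holding the same value.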
When specifying a property-value using Compound-Property Syntax, the child XML-elements representing "Properties" must appear before child XML-elements representing "Contents". The order of individual Compound-Property child XML-elements is not important, only that they appear together before any "Contents" of the parent-element.
For example, when using both Clip and RenderTransform properties of the