"xml2rfc" Version 3 Preparation Tool Description

The steps listed here are in order of processing. In all cases where the prep tool would "add" an attribute or element, if that attribute or element already exists, the prep tool will check that the attribute or element has valid values. If a value is incorrect, the prep tool will issue a warning showing the old and new values, then replace the incorrect value with the new value.

Currently, the IETF uses a tool called "idnits" to check text input to the Internet-Drafts posting process. idnits indicates whether it encountered anything it considers an error and describes all of the warnings and errors in human-readable form. The prep tool should probably check for as many of these errors and warnings as possible when it is processing the XML input. For the moment, tooling might run idnits on the text output from the prepared XML. The list below contains some of these errors and warnings, but the deployed version of the prep tool may contain additional steps to include more of the checks from idnits.

These steps will ensure that the input document is properly formatted and that all XML processing has been performed.

Process all <x:include> elements. Note: XML <x:include> elements may include more <x:include> elements (with relative references resolved against the base URI potentially modified by a previously inserted xml:base attribute). The tool may be configurable with a limit on the depth of recursion.
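A minimal sketch of the include step, assuming Python 3.9 or later and assuming that the "x:" prefix is bound to the XInclude namespace. It uses the standard library's xml.etree.ElementInclude module, whose max_depth argument provides the configurable recursion limit mentioned above. The in-memory SOURCES table and the dict_loader and resolve_includes functions are hypothetical stand-ins for reading real files; this sketch does not insert xml:base attributes.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI = "http://www.w3.org/2001/XInclude"

# Hypothetical in-memory documents standing in for files on disk.
SOURCES = {
    "main.xml": f'<rfc xmlns:xi="{XI}"><xi:include href="sec.xml"/></rfc>',
    "sec.xml": f'<section xmlns:xi="{XI}"><xi:include href="para.xml"/></section>',
    "para.xml": "<t>Hello</t>",
}


def dict_loader(href, parse, encoding=None):
    """Loader callback for ElementInclude: return a fresh tree (or text) for href."""
    text = SOURCES[href]
    return ET.fromstring(text) if parse == "xml" else text


def resolve_includes(root, max_depth=6):
    """Expand XInclude elements in place; nesting deeper than max_depth is an error."""
    ElementInclude.include(root, loader=dict_loader, max_depth=max_depth)
    return root
```

With the default limit, the two-level include above expands to `<rfc><section><t>Hello</t></section></rfc>`; with max_depth=1 the nested include raises ElementInclude.FatalIncludeError.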

Fully process any Document Type Definitions (DTDs) in the input document, then remove the DTD. At a minimum, this entails processing the entity references and includes for external files.

Remove processing instructions.
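Python's ElementTree parser silently discards processing instructions by default, so this sketch (assuming Python 3.8 or later) parses with TreeBuilder(insert_pis=True) to keep them visible and then deletes them. The helper names parse_keeping_pis and remove_pis are hypothetical. Text that followed a removed instruction is reattached to the previous sibling or to the parent, so no document text is lost. Only instructions inside the root element are handled, because ElementTree does not keep those outside the root in the tree.

```python
import xml.etree.ElementTree as ET


def parse_keeping_pis(text):
    """Parse XML so that processing instructions inside the root become tree nodes."""
    builder = ET.TreeBuilder(insert_pis=True)
    return ET.fromstring(text, parser=ET.XMLParser(target=builder))


def remove_pis(root):
    """Delete every processing-instruction node, keeping any text that followed it."""
    for parent in list(root.iter()):
        # Walk backwards so deleting a child does not shift unvisited indexes.
        for index in range(len(parent) - 1, -1, -1):
            child = parent[index]
            if child.tag is not ET.ProcessingInstruction:
                continue
            if child.tail:
                if index > 0:
                    previous = parent[index - 1]
                    previous.tail = (previous.tail or "") + child.tail
                else:
                    parent.text = (parent.text or "") + child.tail
            del parent[index]
    return root
```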

Check the input against the RELAX NG (RNG) in . If the input is not valid, give an error.

Check all elements for "anchor" attributes. If any "anchor" attribute begins with "s-", "f-", "t-", or "i-", give an error.
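The anchor check is a plain prefix test over every element. A sketch in Python using xml.etree.ElementTree follows; the function name find_reserved_anchors is hypothetical, and any non-empty result would be reported as an error.

```python
import xml.etree.ElementTree as ET

RESERVED_PREFIXES = ("s-", "f-", "t-", "i-")


def find_reserved_anchors(root):
    """Return every "anchor" value that starts with a reserved prefix."""
    return [
        elem.get("anchor")
        for elem in root.iter()
        if elem.get("anchor", "").startswith(RESERVED_PREFIXES)
    ]
```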

These steps will ensure that all default values have been filled into the XML, in case the defaults change at a later date. Steps in this section will not overwrite existing values in the input file.

If the <rfc> element has a "version" attribute with a value other than "3", give an error. If the <rfc> element has no "version" attribute, add one with the value "3".

If the <front> element of the <rfc> element does not already have a <seriesInfo> element, add one. Its "name" attribute is based on the mode in which the prep tool is running ("Internet-Draft" for Draft mode and "RFC" for RFC production mode). Its "value" attribute is the input filename minus any extension for Internet-Drafts, or a number specified by the RFC Editor for RFCs.

If the <front> element in the <rfc> element does not contain a <date> element, add it and fill in the "day", "month", and "year" attributes from the current date. If the <front> element in the <rfc> element has a <date> element with "day", "month", and "year" attributes, but the date indicated is more than three days in the past or is in the future, give a warning. If the <front> element in the <rfc> element has a <date> element with some but not all of the "day", "month", and "year" attributes, give an error.
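The three date cases can be sketched as a small decision function. This is a hypothetical Python helper operating on a plain dictionary of the <date> attributes, with today's date passed in so that the result is testable. Because month names are normalized only in a later step, it accepts "June" or "jun" as well as "6". Treating a <date/> that has none of the three attributes like a missing <date> is an assumption; the text does not cover that case.

```python
import datetime

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def month_number(value):
    """Accept '6', 'June', or 'jun' (names are normalized only in a later step)."""
    text = value.strip().lower()
    if text.isdigit():
        return int(text)
    for index, name in enumerate(MONTHS):
        if len(text) >= 3 and name.startswith(text):
            return index + 1
    raise ValueError(f"unrecognized month {value!r}")


def check_front_date(attrs, today):
    """Decide what to do with the <date> in <front>.

    attrs is None when there is no <date>; otherwise a dict of its attributes.
    Returns ("add", values), ("error", msg), ("warn", msg) or ("ok", None).
    """
    keys = ("day", "month", "year")
    present = [key for key in keys if attrs and key in attrs]
    if not present:
        return ("add", {"day": str(today.day), "month": str(today.month),
                        "year": str(today.year)})
    if len(present) < len(keys):
        return ("error", "date has only: " + ", ".join(present))
    date = datetime.date(int(attrs["year"]), month_number(attrs["month"]),
                         int(attrs["day"]))
    # Future dates and dates more than three days old only produce a warning.
    if date > today or (today - date).days > 3:
        return ("warn", f"{date.isoformat()} is in the future or over 3 days old")
    return ("ok", None)
```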

If the input document includes a "prepTime" attribute of <rfc>, exit with an error. Fill in the "prepTime" attribute of <rfc> with the current datetime.

Add a "start" attribute to every <ol> element that has a "group" attribute and does not already have a "start" attribute.

Fill in any default values for attributes on elements, except "keepWithNext" and "keepWithPrevious" of <t>, and "toc" of <section>. Some default values can be found in the RELAX NG schema, while others can be found in the prose describing the elements in .

For each <section>, modify the "toc" attribute to be either "include" or "exclude":

* for sections that have an ancestor of <boilerplate>, use "exclude"
* else for sections that have a descendant that has toc="include", use "include". If the ancestor section has toc="exclude" in the input, this is an error.
* else for sections that are children of a section with toc="exclude", use "exclude".
* else for sections that are deeper than rfc/@tocDepth, use "exclude"
* else use "include"
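A sketch of the "toc" rules in Python, walking the tree top-down so that each section sees its parent's final value. Several details are assumptions, not taken from the text: the default "tocDepth" of 3, that the "ancestor section" in the error rule is the section being evaluated, that explicit "include" or "exclude" values in the input are kept unless the section is in <boilerplate> or has an included descendant, and that top-level sections have depth 1. The function name assign_toc is hypothetical.

```python
import xml.etree.ElementTree as ET


def assign_toc(root, default_depth=3):
    """Rewrite every <section>'s "toc" attribute to "include" or "exclude"."""
    toc_depth = int(root.get("tocDepth", default_depth))
    # Snapshot the input values before anything is rewritten.
    given = {sec: sec.get("toc", "default") for sec in root.iter("section")}

    def has_included_descendant(section):
        return any(given[d] == "include"
                   for d in section.iter("section") if d is not section)

    def decide(section, in_boilerplate, depth, parent_toc):
        if in_boilerplate:
            return "exclude"
        if has_included_descendant(section):
            if given[section] == "exclude":
                raise ValueError(f"section {section.get('anchor')!r} has "
                                 "toc='exclude' but a descendant has toc='include'")
            return "include"
        if given[section] in ("include", "exclude"):
            return given[section]  # assumption: explicit input values are kept
        if parent_toc == "exclude":
            return "exclude"
        if depth > toc_depth:
            return "exclude"
        return "include"

    def walk(elem, in_boilerplate, depth, parent_toc):
        for child in elem:
            if child.tag == "section":
                toc = decide(child, in_boilerplate, depth + 1, parent_toc)
                child.set("toc", toc)
                walk(child, in_boilerplate, depth + 1, toc)
            else:
                walk(child, in_boilerplate or child.tag == "boilerplate",
                     depth, parent_toc)

    walk(root, False, 0, None)
    return root
```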

In I-D mode, if there is a <note> or <section> element with a "removeInRFC" attribute that has the value "true", add a paragraph to the top of the element with the text "This note is to be removed before publishing as an RFC." or "This section...", unless a paragraph consisting of that exact text already exists.

These steps will ensure that ideas that can be expressed in multiple different ways in the input document are only found in one way in the prepared document.

Normalize the values of "month" attributes in all <date> elements in <front> elements in <rfc> elements to numeric values.
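A sketch of month normalization in Python. The text does not say whether abbreviations are allowed or whether the result should be zero-padded; this hypothetical helper accepts full English month names, three-letter abbreviations, and digits, and emits unpadded numbers. It applies the rule to every <date> directly under a <front>, including the <front> of each <reference>, which is one reading of the rule.

```python
import xml.etree.ElementTree as ET

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def month_to_number(value):
    """Map '03', 'Mar', or 'March' (any case) to '3'."""
    text = value.strip().lower()
    if text.isdigit():
        number = int(text)
    else:
        # Accept full names and prefixes of at least three letters.
        matches = [i + 1 for i, name in enumerate(MONTHS)
                   if len(text) >= 3 and name.startswith(text)]
        if len(matches) != 1:
            raise ValueError(f"unrecognized month {value!r}")
        number = matches[0]
    if not 1 <= number <= 12:
        raise ValueError(f"month out of range: {value!r}")
    return str(number)


def normalize_front_dates(root):
    """Rewrite the "month" of every <date> that is a child of a <front>."""
    for date in root.findall(".//front/date"):
        if "month" in date.attrib:
            date.set("month", month_to_number(date.get("month")))
    return root
```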

In every <email>, <organization>, <street>, <city>, <region>, <country>, and <code> element, if there is an "ascii" attribute and the value of that attribute is the same as the content of the element, remove the "ascii" attribute and issue a warning about the removal. In every <author> element, if there is an "asciiFullname", "asciiInitials", or "asciiSurname" attribute, check the value of that attribute against its matching "fullname", "initials", or "surname" attribute (respectively). If the two are the same, remove the "ascii*" attribute and issue a warning about the removal.

For every <section>, <note>, <figure>, <references>, and <texttable> element that has a (deprecated) "title" attribute, remove the "title" attribute and insert a <name> element with the title from the attribute.

These steps will generate new content, overriding existing similar content in the input document. Some of these steps are important enough that they specify a warning to be generated when the content being overwritten does not match the new content.

If in I-D mode, fill in the "expiresDate" attribute of <rfc> based on the <date> element of the document's <front> element.

Create a <boilerplate> element if it does not exist. If there are any children of the <boilerplate> element, produce a warning that says "Existing boilerplate being removed. Other tools, specifically the draft submission tool, will treat this condition as an error" and remove the existing children.

Verify that <rfc> "submissionType" and <seriesInfo> "stream" are the same if they are both present. If either is missing, add it. Note that both have a default value of "IETF".
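A sketch of the submissionType/stream check in Python. The hypothetical reconcile_stream function looks only at the first <seriesInfo> in <front>, and, since the text does not say how a mismatch is reported, it raises an error.

```python
import xml.etree.ElementTree as ET


def reconcile_stream(rfc):
    """Make rfc/@submissionType and front/seriesInfo/@stream agree."""
    series = rfc.find("./front/seriesInfo")
    submission = rfc.get("submissionType")
    stream = series.get("stream") if series is not None else None
    if submission and stream and submission != stream:
        # The text says "verify"; raising here is this sketch's choice.
        raise ValueError(f"submissionType {submission!r} != stream {stream!r}")
    value = submission or stream or "IETF"  # both default to "IETF"
    rfc.set("submissionType", value)
    if series is not None:
        series.set("stream", value)
    return value
```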

Add the "Status of this Memo" section to the <boilerplate> element with current values. The application will use the "submissionType" and "consensus" attributes of the <rfc> element, the <workgroup> element, and the "status" and "stream" attributes of the <seriesInfo> element, to determine which boilerplate from to include, as described in Appendix A of .

Add the "Copyright Notice" section to the <boilerplate> element. The application will use the "ipr" and "submissionType" attributes of the <rfc> element and the <date> element to determine which portions and which version of the Trust Legal Provisions (TLP) to use, as described in A.1 of .

For any <reference> element that does not already have a "target" attribute, fill in the "target" attribute if the element has one or more <seriesInfo> child elements whose "name" attribute is "RFC", "Internet-Draft", "DOI", or another value for which it is clear what the "target" should be. The particular URLs for RFCs, Internet-Drafts, and Digital Object Identifiers (DOIs) for this step will be specified later by the RFC Editor and the IESG. These URLs might also be different before and after the v3 format is adopted.

Add a "slugifiedName" attribute to each <name> element that does not contain one; replace the attribute if it contains a value that begins with "n-".

If the "sortRefs" attribute of the <rfc> element is true, sort the <reference> and <referencegroup> elements lexically by the value of the "anchor" attribute, as modified by the "to" attribute of any <displayreference> element. The RFC Editor needs to determine what the rules for lexical sorting are. The authors of this document acknowledge that getting consensus on this will be a difficult task.
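Because the RFC Editor has not yet determined the sorting rules, the following Python sketch simply assumes case-insensitive ordering by Unicode code point, with the case-sensitive label as a tie-breaker. It sorts entries within each <references> element, uses a <displayreference> "to" value in place of the anchor where one exists, and keeps other children such as <name> ahead of the sorted entries. Function and variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

ENTRY_TAGS = ("reference", "referencegroup")


def sort_references(rfc):
    """Sort <reference>/<referencegroup> children of each <references> element."""
    if rfc.get("sortRefs") != "true":
        return rfc
    # A <displayreference> maps a reference anchor to the label actually shown.
    shown = {d.get("target"): d.get("to") for d in rfc.iter("displayreference")}

    def sort_key(entry):
        label = shown.get(entry.get("anchor"), entry.get("anchor"))
        return (label.casefold(), label)  # assumed collation, not a standard rule

    for refs in list(rfc.iter("references")):
        others = [c for c in refs if c.tag not in ENTRY_TAGS]
        entries = sorted((c for c in refs if c.tag in ENTRY_TAGS), key=sort_key)
        refs[:] = others + entries
    return rfc
```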

Add "pn" attributes for all parts. Parts are:

* <section> in <middle>: pn='s-1.4.2'
* <references>: pn='s-12' or pn='s-12.1'
* <abstract>: pn='s-abstract'
* <note>: pn='s-note-2'
* <section> in <boilerplate>: pn='s-boilerplate-1'
* <table>: pn='t-3'
* <figure>: pn='f-4'
* <artwork>, <aside>, <blockquote>, <dt>, <li>, <sourcecode>, <t>: pn='p-[section]-[counter]'

In every <iref> element, create a document-unique "pn" attribute. The value of the "pn" attribute will start with 'i-', and use the item attribute, the subitem attribute (if it exists), and a counter to ensure uniqueness. For example, the first instance of "<iref item='foo' subitem='bar'>" will have the "pn" attribute set to 'i-foo-bar-1'.
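A sketch of "pn" generation for <iref> in Python (the function name is hypothetical). A single counter per item/subitem pair, in document order, is enough to keep the values unique. The text does not say how characters that are not allowed in an identifier, such as spaces, should be handled, so this sketch does not transform the item text at all.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def assign_iref_pns(root):
    """Give each <iref> a pn of the form i-<item>[-<subitem>]-<counter>."""
    seen = Counter()
    for iref in root.iter("iref"):
        key = iref.get("item", "")
        if iref.get("subitem"):
            key += "-" + iref.get("subitem")
        seen[key] += 1  # the counter is per item/subitem combination
        iref.set("pn", f"i-{key}-{seen[key]}")
    return root
```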

For each <xref> element that has content, fill in the "derivedContent" attribute with the element content, having first trimmed the whitespace from both ends of the content text. Issue a warning if the "derivedContent" attribute already exists and has a different value from what was being filled in.

For each <xref> element that does not have content, fill in the "derivedContent" attribute based on the "format" attribute:

* For format='counter', the "derivedContent" is the section, figure, table, or ordered list number of the element with an anchor equal to the <xref> target.
* For format='default', if the "target" attribute points to a <reference> or <referencegroup> element, the "derivedContent" is the value of the "target" attribute (or the "to" attribute of a <displayreference> element for the targeted <reference>).
* For format='default', if the "target" attribute points to a <section>, <figure>, or <table>, the "derivedContent" is the name of the thing pointed to, such as "Section 2.3", "Figure 12", or "Table 4".
* For format='title', if the target is a <reference> element, the "derivedContent" is the name of the reference, extracted from the <title> child of the <front> child of the reference.
* For format='title', if the target element has a <name> child element, the "derivedContent" is the text content of that <name> element concatenated with the text content of each descendant node of <name> (that is, stripping out all of the XML markup, leaving only the text).
* For format='title', if the target element does not contain a <name> child element, the "derivedContent" is the value of the "target" attribute with no other adornment.

Issue a warning if the "derivedContent" attribute already exists and has a different value from what was being filled in.
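The two <xref> steps above can be sketched in Python as follows. The sketch covers an <xref> with content, format='title', and format='default' for references; "counter" and format='default' for sections, figures, and tables need the numbering produced by the "pn" step, so they are left unimplemented here. Looking targets up by "anchor" and collecting warnings in a list are assumptions of the sketch, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def text_of(elem):
    """All text inside elem with the markup stripped (elem's own tail excluded)."""
    return "".join(elem.itertext())


def derived_content(xref, root):
    content = text_of(xref).strip()
    if content:
        return content  # an <xref> with content uses its trimmed content
    target_id = xref.get("target")
    target = next((e for e in root.iter() if e.get("anchor") == target_id), None)
    fmt = xref.get("format", "default")
    if fmt == "title":
        if target is not None and target.tag == "reference":
            title = target.find("./front/title")
            if title is not None:
                return text_of(title)
        name = target.find("name") if target is not None else None
        return text_of(name) if name is not None else target_id
    if (fmt == "default" and target is not None
            and target.tag in ("reference", "referencegroup")):
        for display in root.iter("displayreference"):
            if display.get("target") == target_id:
                return display.get("to")
        return target_id
    raise NotImplementedError(f"format={fmt!r} needs section/figure/table numbering")


def fill_derived_content(xref, root, warnings):
    """Set derivedContent, recording a warning if an existing value differs."""
    new = derived_content(xref, root)
    old = xref.get("derivedContent")
    if old is not None and old != new:
        warnings.append(f"derivedContent changed from {old!r} to {new!r}")
    xref.set("derivedContent", new)
```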

If any <relref> element's "target" attribute refers to anything but a <reference> element, give an error. For each <relref> element, fill in the "derivedLink" attribute.

These steps will include external files into the output document.

Process <artwork> elements that have a "src" attribute as follows:

* If an <artwork> element has a "src" attribute where no scheme is specified, copy the "src" attribute value to the "originalSrc" attribute, and replace the "src" value with a URI that uses the "file:" scheme in a path relative to the file being processed. See for warnings about this step. This will likely be one of the most common authoring approaches.
* If an <artwork> element has a "src" attribute with a "file:" scheme, and if processing the URL would cause the processor to retrieve a file that is not in the same directory, or a subdirectory, as the file being processed, give an error. If the "src" has any shellmeta strings (such as "`", "$USER", and so on) that would be processed, give an error. Replace the "src" attribute with a URI that uses the "file:" scheme in a path relative to the file being processed. This rule attempts to prevent <artwork src='file:///etc/passwd'> and similar security issues. See for warnings about this step.
* If an <artwork> element has a "src" attribute, and the element has content, give an error.
* If an <artwork> element has type='svg' and there is a "src" attribute, the data needs to be moved into the content of the <artwork> element. If the "src" URI scheme is "data:", fill the content of the <artwork> element with that data and remove the "src" attribute. If the "src" URI scheme is "file:", "http:", or "https:", fill the content of the <artwork> element with the resolved XML from the URI in the "src" attribute. If there is no "originalSrc" attribute, add an "originalSrc" attribute with the value of the URI and remove the "src" attribute. If the <artwork> element has an "alt" attribute, and the SVG does not have a <desc> element, add the <desc> element with the contents of the "alt" attribute.
* If an <artwork> element has type='binary-art', the data needs to be in a "src" attribute with a URI scheme of "data:". If the "src" URI scheme is "file:", "http:", or "https:", resolve the URL. Replace the "src" attribute with a "data:" URI, and add an "originalSrc" attribute with the value of the URI. For the "http:" and "https:" URI schemes, the mediatype of the "data:" URI will be the Content-Type of the HTTP response. For the "file:" URI scheme, the mediatype of the "data:" URI needs to be guessed with heuristics (this is possibly a bad idea). This also fails for content that includes binary images but uses a type other than "binary-art". Note: since this feature can't be used for RFCs at the moment, this entire feature might be
* If an <artwork> element does not have type='svg' or type='binary-art' and there is a "src" attribute, the data needs to be moved into the content of the <artwork> element. Note that this step assumes that all of the preferred types other than "binary-art" are text, which is possibly wrong. If the "src" URI scheme is "data:", fill the content of the <artwork> element with the correctly escaped form of that data and remove the "src" attribute. If the "src" URI scheme is "file:", "http:", or "https:", fill the content of the <artwork> element with the correctly escaped form of the resolved text from the URI in the "src" attribute. If there is no "originalSrc" attribute, add an "originalSrc" attribute with the value of the URI and remove the "src" attribute.
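For binary art, the conversion to a "data:" URI can be sketched in Python with the standard base64 and mimetypes modules. Guessing the mediatype from the file extension is one possible heuristic for "file:" sources, with "application/octet-stream" as an assumed fallback; for HTTP sources, the Content-Type of the response would be passed in instead. Base64 encoding is a choice of this sketch, and keeping an existing "originalSrc" is an assumption so that a value recorded by an earlier step is not overwritten. Fetching the URL is omitted, and the function names are hypothetical.

```python
import base64
import mimetypes
import xml.etree.ElementTree as ET


def to_data_uri(payload: bytes, mediatype: str) -> str:
    """Encode bytes as a base64 "data:" URI."""
    return f"data:{mediatype};base64,{base64.b64encode(payload).decode('ascii')}"


def guess_mediatype(path: str) -> str:
    """Heuristic for file: sources: guess the mediatype from the file extension."""
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"


def inline_binary_art(artwork, payload: bytes, mediatype: str):
    """Replace a resolved binary-art "src" with a data: URI, remembering the original."""
    if artwork.get("originalSrc") is None:
        artwork.set("originalSrc", artwork.get("src"))
    artwork.set("src", to_data_uri(payload, mediatype))
    return artwork
```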

Process <sourcecode> elements that have a "src" attribute as follows:

* If a <sourcecode> element has a "src" attribute where no scheme is specified, copy the "src" attribute value to the "originalSrc" attribute and replace the "src" value with a URI that uses the "file:" scheme in a path relative to the file being processed. See for warnings about this step. This will likely be one of the most common authoring approaches.
* If a <sourcecode> element has a "src" attribute with a "file:" scheme, and if processing the URL would cause the processor to retrieve a file that is not in the same directory, or a subdirectory, as the file being processed, give an error. If the "src" has any shellmeta strings (such as "`", "$USER", and so on) that would be processed, give an error. Replace the "src" attribute with a URI that uses the "file:" scheme in a path relative to the file being processed. This rule attempts to prevent <sourcecode src='file:///etc/passwd'> and similar security issues. See for warnings about this step.
* If a <sourcecode> element has a "src" attribute, and the element has content, give an error.
* If a <sourcecode> element has a "src" attribute, the data needs to be moved into the content of the <sourcecode> element. If the "src" URI scheme is "data:", fill the content of the <sourcecode> element with that data and remove the "src" attribute. If the "src" URI scheme is "file:", "http:", or "https:", fill the content of the <sourcecode> element with the resolved XML from the URI in the "src" attribute. If there is no "originalSrc" attribute, add an "originalSrc" attribute with the value of the URI and remove the "src" attribute.
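The path check for "file:" sources is the same for <artwork> and <sourcecode>. A Python sketch, assuming POSIX paths: resolve the path (following ".." and symbolic links), require the result to lie below the directory of the document being processed, and return a relative "file:" URI. The set of characters treated as shell metacharacters and the "file:relative/path" spelling of the result are assumptions; the text pins down neither. The function name local_src_to_file_uri is hypothetical.

```python
import re
from pathlib import Path
from urllib.parse import unquote, urlsplit

# Characters treated as shell metacharacters here; the exact set is an assumption.
SHELL_META = re.compile(r"[`$|;&<>]")


def local_src_to_file_uri(src: str, document: str) -> str:
    """Validate a schemeless or file: "src" and return a relative file: URI."""
    parts = urlsplit(src)
    if parts.scheme not in ("", "file"):
        raise ValueError(f"not a local source: {src!r}")
    path = unquote(parts.path)  # decode %XX first so escapes cannot hide metacharacters
    if SHELL_META.search(path):
        raise ValueError(f"shell metacharacters in {src!r}")
    base = Path(document).resolve().parent
    target = (base / path).resolve()  # absolute paths replace base; ".." is collapsed
    if base not in target.parents:
        raise ValueError(f"{src!r} is outside {base}")
    return "file:" + target.relative_to(base).as_posix()
```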

These steps provide extra cleanup of the output document in RFC production mode.

In RFC production mode, if there is a <note> or <section> element with a "removeInRFC" attribute that has the value "true", remove the element.

If in RFC production mode, remove all <cref> elements.

In RFC production mode, process <link> elements as follows:

* Remove all <link> elements whose "rel" attribute has the value "alternate".
* Check if there is a <link> element with the current ISSN for the RFC series (2070-1721); if not, add <link rel="item" href="urn:issn:2070-1721">.
* Check if there is a <link> element with a DOI for this RFC; if not, add one of the form <link rel="describedBy" href="https://dx.doi.org/10.17487/rfcdd">, where "dd" is the number of the RFC, such as "https://dx.doi.org/10.17487/rfc2109". The URI is described in . If there was already a <link> element with a DOI for this RFC, check that the "href" value has the right format. The content of the href attribute is expected to change in the future.
* Check if there is a <link> element with the file name of the Internet-Draft that became this RFC, in the form <link rel="convertedFrom" href="https://datatracker.ietf.org/doc/draft-tttttttttt/">. If one does not exist, give an error.
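A sketch of the <link> rules in Python. The DOI form follows the text ("rfc" followed by the RFC number); how numbers below 1000 are written is not specified here, so the sketch does not pad them. Recognizing an existing DOI link by the "10.17487/" prefix, inserting new links at the start of <rfc>, and returning problems as a list instead of reporting them directly are all assumptions. Function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ISSN_HREF = "urn:issn:2070-1721"


def doi_href(number):
    """DOI link in the form given in the text: "rfc" followed by the RFC number."""
    return f"https://dx.doi.org/10.17487/rfc{number}"


def fix_rfc_links(rfc, number):
    """Apply the RFC-mode <link> rules to <rfc>; return a list of problems found."""
    problems = []
    for link in rfc.findall("link"):
        if link.get("rel") == "alternate":
            rfc.remove(link)
    hrefs = [link.get("href", "") for link in rfc.findall("link")]
    if ISSN_HREF not in hrefs:
        rfc.insert(0, ET.Element("link", rel="item", href=ISSN_HREF))
    doi_hrefs = [href for href in hrefs if "10.17487/" in href]
    if not doi_hrefs:
        rfc.insert(0, ET.Element("link", rel="describedBy", href=doi_href(number)))
    for href in doi_hrefs:
        if href != doi_href(number):
            problems.append(f"unexpected DOI link format: {href}")
    if not any(link.get("rel") == "convertedFrom" for link in rfc.findall("link")):
        problems.append("error: no <link rel='convertedFrom'> for the source draft")
    return problems
```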

If in RFC production mode, remove XML comments.

If in RFC production mode, remove all "xml:base" and "originalSrc" attributes from all elements.
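The comment and attribute removal steps can be sketched together in Python. ElementTree drops comments when parsing by default, so this sketch assumes Python 3.8 or later and parses with TreeBuilder(insert_comments=True) to keep them, then removes them while preserving the text that followed each one. ElementTree reports "xml:base" under the XML namespace URI, which is why the code uses the expanded name. Function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def parse_keeping_comments(text):
    """Parse XML so that comments inside the root become tree nodes."""
    builder = ET.TreeBuilder(insert_comments=True)
    return ET.fromstring(text, parser=ET.XMLParser(target=builder))


def rfc_mode_cleanup(root):
    """Remove comments, then every xml:base and originalSrc attribute."""
    for parent in list(root.iter()):
        for index in range(len(parent) - 1, -1, -1):  # backwards: safe deletion
            child = parent[index]
            if child.tag is not ET.Comment:
                continue
            if child.tail:  # keep text that followed the comment
                if index > 0:
                    previous = parent[index - 1]
                    previous.tail = (previous.tail or "") + child.tail
                else:
                    parent.text = (parent.text or "") + child.tail
            del parent[index]
    for elem in root.iter():
        elem.attrib.pop(XML_BASE, None)
        elem.attrib.pop("originalSrc", None)
    return root
```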

If in RFC production mode, ensure that the result is in full compliance with the v3 schema, without any deprecated elements or attributes, and give an error if any issues are found.

These steps provide the finishing touches on the output document.

Determine all the characters used in the document and fill in the "scripts" attribute for <rfc>.

Pretty-format the XML output. (Note: there are many tools that do an adequate job.)