<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
	<!ENTITY rfc2119 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
	<!ENTITY rfc2046 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2046.xml'> <!--MIME Part Two: Media Types-->
	<!ENTITY rfc2616 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2616.xml'> <!--HTTP/1.1-->
	<!ENTITY rfc2854 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2854.xml'> <!--text/html-->
	<!ENTITY rfc3023 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3023.xml'> <!--text/xml-->    
	<!ENTITY rfc3629 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml'> <!--UTF-8-->
]>
<!-- ?xml-stylesheet type='text/xsl' href='http://xml.resource.org/authoring/rfc2629.xslt' ? -->
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

<?rfc toc="yes" ?>
<?rfc tocindent="no" ?>
<?rfc autobreaks="no" ?>
<?rfc comments="yes" ?>
<?rfc inline="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc strict="no" ?>
<?rfc compact="yes"?>

<rfc category="std"
     ipr="trust200902"
     updates="2046"
     docName="draft-melnikov-mime-default-charset-01">
    <front>
        <title abbrev="MIME Charset Default Update">Update to MIME regarding Charset Parameter Handling in Textual Media Types</title>
        <author initials='A.' surname="Melnikov" fullname="Alexey Melnikov">
            <organization>Isode Limited</organization>
            <address>
              <postal>
                <street>5 Castle Business Village</street>
                <street>36 Station Road</street>
                <city>Hampton</city>
                <region>Middlesex</region>
                <code>TW12 2BX</code>
                <country>UK</country>
              </postal>
              <email>Alexey.Melnikov@isode.com</email>
            </address>
        </author>
        <author initials="J. F." surname="Reschke" fullname="Julian F. Reschke">
            <organization abbrev="greenbytes">greenbytes GmbH</organization>
            <address>
              <postal>
                <street>Hafenweg 16</street>
                <city>Muenster</city><region>NW</region><code>48155</code>
                <country>Germany</country>
              </postal>
              <email>julian.reschke@greenbytes.de</email>	
              <uri>http://greenbytes.de/tech/webdav/</uri>	
            </address>
        </author>
        <date year="2011" month="July" day="11"/>
        <area>Applications</area>
        <keyword>MIME</keyword>
        <keyword>charset</keyword>
        <keyword>text</keyword>
        <abstract>
          <t>
            This document changes the rules in RFC 2046 regarding default "charset"
            parameter values for "text/*" media types to better align with common
            usage by existing clients and servers.
          </t>
        </abstract>
    </front>

    <middle>
        <section title="Introduction and overview">
            <t>

<!--////Alexey: this might need improvments-->

	    <xref target="RFC2046"/> specifies that the default value of the "charset"
	    parameter (i.e., the value assumed when the parameter is not specified) is "US-ASCII".
	    <xref target="RFC2616"/> changed the default for use by HTTP to "ISO-8859-1".
	    This encoding is not very common for new text/* media types,
	    and a special rule in HTTP adds confusion
	    about which specification (<xref target="RFC2046"/> or <xref target="RFC2616"/>)
	    is authoritative with regard to the default charset for text/* media types.
	    
	    <cref>At the time of writing, the IETF HTTPbis WG is working
	    on an update to RFC 2616 that removes the default charset of "ISO-8859-1"
	    for "text/*" media types. It is expected that the set of HTTPbis documents
	    will reference this document in order to use the updated rules
	    for the default charset in "text/*" media types.</cref>
            </t>
    
            <t>
	    Many complex text subtypes, such as text/html <xref target="RFC2854"/> and
	    text/xml <xref target="RFC3023"/>, have internal (to their format) means
	    of describing the charset.
	    Many existing user agents ignore the "US-ASCII" default rule for at least
	    text/html and text/xml.
            </t>

	    <t>This document changes the rules in RFC 2046 regarding default "charset"
	    parameter values for "text/*" media types to better align with common
	    usage by existing clients and servers.
	    </t>

        </section>

	<section title="Conventions Used in This Document">
	    
	    <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
	    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
	    this document are to be interpreted as described in
	    <xref target="RFC2119"/>.</t>
          
	</section>
	
	<section title="New rules for default charset parameter values for text/* media types">

	    <t>Section 4.1.2 of <xref target="RFC2046"/> says:</t>

	    <t>"The default character set, which must be assumed in the absence
	    of a charset parameter, is US-ASCII."</t>
	    
<!--///RFC 2046, Section 4.1.2 also says:
   Note that the character set used, if anything other than US-ASCII,
   must always be explicitly specified in the Content-Type field.
-->
	    
	    <t>As explained in the Introduction, this rule is considered
	    outdated, so this document replaces it with the following set
	    of rules:</t>

<!--///Ned wrote:
    In the absence of a specification of a default, I'm tempted
    to say the "default default" should be UTF-8.
-->

<!--///John Klensin wrote:
    Require text/* to be accompanied by a charset parameter
    always.  Of course, if one is omitted, which will happen
    in practice, people will assume the old rules.  But new
    and updated specs say "parameter MUST be supplied".
-->   
	    
	    <t>Each subtype of the "text" media type that uses the "charset"
	    parameter can define its own default value for the "charset" parameter,
	    including the absence of any default.
	    </t>
	    
	    <t>
<!-- jre I'm not sure I understand that "for interoperability..." part-->
	    In order to improve interoperability with deployed agents,
	    "text/*" media type definitions SHOULD either
	    (a) specify that the "charset" parameter is not used for the defined subtype,
	    because the charset information is transported inside the payload (as in "text/xml"), or
	    (b) require explicit, unconditional inclusion of the "charset" parameter,
	    eliminating the need for a default value.
<!--////Alexey: Hmmm, does this mean that the second choice above doesn't specify a default either?-->

	    In accordance with option (a), above, "text/*" media types that can
	    transport charset information inside the corresponding payloads,
	    specifically including "text/html" and "text/xml", SHOULD NOT specify
	    the use of a "charset" parameter, nor any default value, in order to
	    avoid conflicting interpretations should the charset parameter value
	    and the value specified in the payload disagree.</t>
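	    <figure>
	      <preamble>As a non-normative illustration of options (a) and (b), the
	      following fragments sketch what each might look like in a MIME entity.
	      The subtype "text/example" is hypothetical and is used here only for
	      illustration; it is not a registered media type. The "text/xml" case
	      follows this document's description of that type rather than the
	      details of its registration.</preamble>
	      <artwork type="example"><![CDATA[
Option (a): no "charset" parameter; the payload itself declares
the charset.

   Content-Type: text/xml

   <?xml version="1.0" encoding="UTF-8"?>
   <greeting>Hello</greeting>

Option (b): the "charset" parameter is always present, so no
default value is needed.

   Content-Type: text/example; charset=UTF-8

   Hello
]]></artwork>
	    </figure>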
	    
<!--////Alexey: Julian also suggested that only charsets that are supersets of US-ASCII
should be used as defaults. I.e. this would rule out UTF-16. I tend to agree.-->
	    <t>
	    Thus, new subtypes of the "text" media type SHOULD NOT define a
	    default "charset" value.  If there is a strong reason to do so
	    despite this advice, they SHOULD use the "UTF-8" <xref target='RFC3629'/> charset
	    as the default.
	    </t>

	    <t>
	    The rules for specifying the "charset" parameter, and what
	    default value, if any, is used, are subtype-specific, NOT
	    protocol-specific.  Protocols that use MIME, therefore, MUST NOT
	    override default charset values for "text/*" media types to be
	    different for their specific protocol.  Protocol definitions MUST
	    leave that to the subtype definitions.
	    </t>
	    
        </section>

	<section title="Default charset parameter value for text/plain media type">

	    <t>The default charset parameter value for text/plain is unchanged
	    from <xref target="RFC2046"/> and remains "US-ASCII".</t>
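	    <figure>
	      <preamble>As a non-normative illustration, a recipient applying this
	      rule would interpret the following two header fields identically for
	      a text/plain entity:</preamble>
	      <artwork type="example"><![CDATA[
   Content-Type: text/plain
   Content-Type: text/plain; charset=US-ASCII
]]></artwork>
	    </figure>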
	    
        </section>
	
	
<!--////Do we also need to update the document that registers text/xml?-->


	<section anchor="security" title="Security Considerations">
	    
          <t>TBD. Guessing the default charset is a security problem.
	  Conflicting in-band and out-of-band charset information is also a security problem.
          </t>

	</section>

        <section anchor="iana" title="IANA Considerations">

            <t>
	      This document asks IANA to update the "text" subregistry of
	      the Media Types registry to additionally point to this document.
            </t>
	    
        </section>
    </middle>

    <back>
        <references title="Normative References">

	    &rfc2119;
	    &rfc2046;
	    &rfc3629;

        </references>

        <references title="Informative References">

	    &rfc2616;
	    &rfc2854;
	    &rfc3023;
	    
        </references>
    
    <section title="Acknowledgements">
	
      <t>
	Many thanks to Ned Freed and John Klensin for comments and ideas that motivated
	the creation of this document, and to Barry Leiba for suggested text.
      </t>

    </section>
	
    </back>
</rfc>
