<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc [
  <!ENTITY MAY "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>MAY</bcp14>">
  <!ENTITY MUST "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>MUST</bcp14>">
  <!ENTITY MUST-NOT "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>MUST NOT</bcp14>">
  <!ENTITY OPTIONAL "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>OPTIONAL</bcp14>">
  <!ENTITY RECOMMENDED "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>RECOMMENDED</bcp14>">
  <!ENTITY REQUIRED "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>REQUIRED</bcp14>">
  <!ENTITY SHALL "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>SHALL</bcp14>">
  <!ENTITY SHALL-NOT "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>SHALL NOT</bcp14>">
  <!ENTITY SHOULD "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>SHOULD</bcp14>">
  <!ENTITY SHOULD-NOT "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>SHOULD NOT</bcp14>">
	<!ENTITY rfc2119 PUBLIC '' 'bibxml/reference.RFC.2119.xml'>
	<!ENTITY rfc2046 PUBLIC '' 'bibxml/reference.RFC.2046.xml'> <!--MIME Part Two: Media Types-->
	<!ENTITY rfc2616 PUBLIC '' 'bibxml/reference.RFC.2616.xml'> <!--HTTP/1.1-->
	<!ENTITY rfc2854 PUBLIC '' 'bibxml/reference.RFC.2854.xml'> <!--text/html-->
	<!ENTITY rfc3023 PUBLIC '' 'bibxml/reference.RFC.3023.xml'> <!--text/xml-->    
	<!ENTITY rfc3629 PUBLIC '' 'bibxml/reference.RFC.3629.xml'> <!--UTF-8-->
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

<?rfc toc="yes" ?>
<?rfc tocindent="no" ?>
<?rfc autobreaks="no" ?>
<?rfc comments="yes" ?>
<?rfc inline="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc strict="no" ?>
<?rfc compact="yes"?>
<?rfc rfcedstyle="yes"?>

<rfc number="6657" category="std" submissionType="IETF" consensus="yes" ipr="trust200902" updates="2046"
     xmlns:x='http://purl.org/net/xml2rfc/ext'
     x:maturity-level="proposed">
    <front>
        <title abbrev="MIME Charset Default Update">Update to MIME regarding &quot;charset&quot; Parameter Handling in&#160;Textual&#160;Media&#160;Types</title>
        <author initials='A.' surname="Melnikov" fullname="Alexey Melnikov">
            <organization>Isode Limited</organization>
            <address>
              <postal>
                <street>5 Castle Business Village</street>
                <street>36 Station Road</street>
                <city>Hampton</city>
                <region>Middlesex</region>
                <code>TW12 2BX</code>
                <country>UK</country>
              </postal>
              <email>Alexey.Melnikov@isode.com</email>
            </address>
        </author>
        <author initials="J. F." surname="Reschke" fullname="Julian F. Reschke">
            <organization abbrev="greenbytes">greenbytes GmbH</organization>
            <address>
              <postal>
                <street>Hafenweg 16</street>
                <city>Muenster</city><region>NW</region><code>48155</code>
                <country>Germany</country>
              </postal>
              <email>julian.reschke@greenbytes.de</email>	
              <uri>http://greenbytes.de/tech/webdav/</uri>	
            </address>
        </author>
        <date year="2012" month="July"/>
        <area>Applications</area>
        <workgroup>Applications Area Working Group</workgroup>
        
        <keyword>MIME</keyword>
        <keyword>charset</keyword>
        <keyword>text</keyword>
        <abstract>
          <t>
            This document changes RFC 2046 rules regarding default "charset" parameter
	    values for "text/*" media types to better align with common usage by existing
	    clients and servers.
          </t>
        </abstract>
    </front>

    <middle>
        <section title="Introduction and Overview">
            <t>

	    RFC 2046 specified that the default value of the "charset" parameter
	    (i.e., the value used when the parameter is not specified) is "US-ASCII" (<xref x:fmt="of" x:sec="4.1.2" target="RFC2046"/>).
	    RFC 2616 changed the default for use by HTTP (Hypertext Transfer Protocol) to be "ISO-8859-1" (<xref x:fmt="of" x:sec="3.7.1" target="RFC2616"/>).
	    This encoding is not very common for new "text/*" media types,
	    and a special rule in the HTTP specification adds confusion
	    about which specification (<xref target="RFC2046"/> or <xref target="RFC2616"/>)
	    is authoritative with regard to the default charset for "text/*" media types.
	    
	    <!-- jre recommends to raise an HTTPbis issue one feels strongly about this
      
      <cref>At the time of writing of this document the IETF HTTPBIS WG is working
	    on an update to RFC 2616 which removes the default charset of "ISO-8859-1"
	    for "text/*" media types. It is expected that the set of HTTPBIs documents
	    will reference this document in order to use the updated rules
	    of default charset in "text/*" media types.</cref> -->
            </t>
    
            <t>
	    Many complex text subtypes, such as "text/html" <xref target="RFC2854"/> and "text/xml" <xref target="RFC3023"/>, have means
	    internal to their format of describing the charset.
	    Many existing user agents ignore the "US-ASCII" default for at least
	    "text/html" and "text/xml".
            </t>

	    <t>This document changes RFC 2046 rules regarding default "charset" parameter
	    values for "text/*" media types to better align with common usage by existing
	    clients and servers. It does not change the defaults for any currently
      registered media type.
	    </t>
<!-- JR: we may also want to state that we do not define handling of broken messages-->
        </section>

	<section title="Conventions Used in This Document">
	    
	    <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
	    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
	    this document are to be interpreted as described in
	    <xref target="RFC2119"/>.</t>
          
	</section>
	
	<section title="New Rules for Default &quot;charset&quot; Parameter Values for &quot;text/*&quot; Media Types">

	    <t><xref target="RFC2046" x:sec="4.1.2" x:fmt="of"/> says:</t>

      <x:blockquote cite="http://tools.ietf.org/html/rfc2046#section-4.1.2">
        <t>The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII.</t>
      </x:blockquote>

<!--///RFC 2046, Section 4.1.2 also says:
   Note that the character set used, if anything other than US- ASCII,
   must always be explicitly specified in the Content-Type field.
-->
	    
	    <t>As explained in the Introduction section, this rule is considered
	    outdated, so this document replaces it with the following set
	    of rules:</t>

<!--///Ned wrote:
    In the absence of a specification of a default, I'm tempted
    to say the "default default" should be UTF-8.
-->

<!--///John Klensin wrote:
    Require text/* to be accompanied by a charset parameter
    always.  Of course, if one is omitted, which will happen
    in practice, people will assume the old rules.  But new
    and updated specs say "parameter MUST be supplied".
-->   
	    
	    <t>Each subtype of the "text" media type that uses the "charset"
	    parameter can define its own default value for the "charset" parameter,
	    including the absence of any default.
	    </t>
	    
	    <t>
<!-- jre I'm not sure I understand that "for interoperability..." part-->
<!--
Henri S.> For backwards compatibility, pretty much every existing text/* type
Henri S.> will have to violate this "SHOULD NOT".

Ned F.> Yep. That's the main reason why it needs to be a SHOULD.

JR: those who will update text/xml and text/html know how to read the SHOULD.
-->
	    In order to improve interoperability with deployed agents,
	    "text/*" media type registrations &SHOULD; either
      </t>
      <t>
      <list style="letters">
        <t>
          specify that the "charset" parameter is not used for the defined subtype,
    	    because the charset information is transported inside the payload (such as in "text/xml"), or
        </t>
        <t>
          require explicit unconditional inclusion of the "charset" parameter,
    	    eliminating the need for a default value.
        </t>
      </list>
      </t>
      <t>
<!--////Alexey: Hmmm, does this mean that the second choice above doesn't specify a default either?-->

	    In accordance with option (a) above, registrations for "text/*" media types that can
	    transport charset information inside the corresponding payloads (such
	    as "text/html" and "text/xml") &SHOULD-NOT; specify
	    the use of a "charset" parameter or any default value, in order to
	    avoid conflicting interpretations if the "charset" parameter value
	    and the value specified in the payload disagree.</t>
	    
	    <t>
	    Thus, new subtypes of the "text" media type &SHOULD-NOT; define a
	    default "charset" value.  If there is a strong reason to do so
	    despite this advice, they &SHOULD; use the "UTF-8" <xref target='RFC3629'/> charset
	    as the default.
	    </t>
      
      <t>
      Regardless of what approach is chosen, all new "text/*" registrations &MUST;
      clearly specify how the charset is determined; relying on the default
      defined in <xref target="RFC2046" x:sec="4.1.2" x:fmt="of"/> is no longer
      permitted. However, existing "text/*" registrations that fail to specify
      how the charset is determined still default to US-ASCII. 
      </t>
    
	    <t>
	    Specifications covering the "charset" parameter, and what
	    default value, if any, is used, are subtype-specific, NOT
      protocol-specific.  Protocols that use MIME, therefore, &MUST-NOT;
      override the default charset values of "text/*" media types with
      protocol-specific values.  The protocol definitions &MUST; leave that
	    to the subtype definitions.
	    </t>
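	    <t>
	    As a non-normative illustration, the rules above can be read as a lookup
	    keyed on the subtype alone. The following Python sketch is an assumption
	    made for illustration only: the subtype names, the rule labels, and the
	    function name charset_source are hypothetical, and raising an error for a
	    missing mandatory parameter is a choice of this sketch, since this document
	    does not define the handling of malformed messages.
	    </t>
	    <figure><artwork type="code"><![CDATA[
```python
from typing import Optional

# Hypothetical registrations. The subtype names and rule labels below are
# assumptions made for illustration; they do not appear in any registry.
IN_BAND = "in-band"      # option (a): charset is carried inside the payload
EXPLICIT = "explicit"    # option (b): the "charset" parameter is mandatory
DEFAULTED = "defaulted"  # discouraged: the registration defines a default

REGISTRATIONS = {
    "text/example-inband": (IN_BAND, None),
    "text/example-explicit": (EXPLICIT, None),
    "text/example-defaulted": (DEFAULTED, "utf-8"),
}


def charset_source(media_type: str, charset_param: Optional[str]) -> str:
    """Return a charset label, or "payload" if it must be read in-band."""
    rule, default = REGISTRATIONS[media_type.lower()]
    if rule == IN_BAND:
        # The parameter is not used; the payload declares its own charset.
        return "payload"
    if charset_param is not None:
        return charset_param.lower()
    if rule == EXPLICIT:
        # Sketch-specific choice: treat a missing mandatory parameter as
        # an error rather than guessing.
        raise ValueError(f"{media_type} requires a charset parameter")
    # The default comes from the subtype registration only; no protocol
    # argument exists that could override it.
    return default
```
]]></artwork></figure>
	    <t>
	    In this sketch, the function takes no protocol argument, which mirrors the
	    requirement that defaults are subtype-specific rather than protocol-specific.
	    </t>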
	    
        </section>

	<section title="Default &quot;charset&quot; Parameter Value for &quot;text/plain&quot; Media Type">

	    <t>The default "charset" parameter value for "text/plain" is unchanged
	    from <xref target="RFC2046"/> and remains "US-ASCII".</t>
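	    <t>
	    As a non-normative illustration, a recipient might apply this default as in
	    the following Python sketch, which assumes the standard library "email"
	    package is used for header parsing. The function name text_plain_charset
	    is hypothetical and is not defined by this document.
	    </t>
	    <figure><artwork type="code"><![CDATA[
```python
from email import message_from_string


def text_plain_charset(raw_message: str) -> str:
    """Return the charset label that applies to a text/plain message."""
    msg = message_from_string(raw_message)
    if msg.get_content_type() != "text/plain":
        raise ValueError("this sketch only covers text/plain")
    # An explicit "charset" parameter wins; otherwise fall back to the
    # unchanged text/plain default of US-ASCII.
    return msg.get_content_charset(failobj="us-ascii")
```
]]></artwork></figure>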
	    
        </section>
		<section anchor="security" title="Security Considerations">
	    
          <t>
            Guessing of the "charset" parameter can lead to security issues
            such as content buffer overflows, denial of service, or bypass
            of filtering mechanisms. However, this document does not
            promote guessing, but encourages use of charset information
            that is specified by the sender.
          </t>
          <t>
            Conflicting information in-band vs. out-of-band can also lead to
            similar security problems, and this document recommends the use
            of charset information that is more likely to be correct (for
            example, in-band over out-of-band). 
          </t>

	</section>

        <section anchor="iana" title="IANA Considerations">

            <t>
              IANA has updated the "text" subregistry of the Media
              Types registry (<eref target="http://www.iana.org/assignments/media-types/text/"/>) to add the following preamble:
              "See [RFC6657] for information about 'charset' parameter handling for text media types."
            </t>
            <t>
              Also, IANA has added this RFC to the list of references at the beginning of the Application for Media Type (<eref target="http://www.iana.org/form/media-types"/>).
            </t>
        </section>
    </middle>

    <back>
        <references title="Normative References">

	    &rfc2119;
	    &rfc2046;
	    &rfc3629;

        </references>

        <references title="Informative References">

	    &rfc2616;
	    &rfc2854;
	    &rfc3023;
	    
        </references>
    
    <section title="Acknowledgements">
	
      <t>
	Many thanks to Ned Freed and John Klensin for comments and ideas that motivated
	creation of this document, and to Carsten Bormann, Murray S. Kucherawy, Barry Leiba, and Henri Sivonen for feedback and text suggestions.
      </t>

    </section>
	
    </back>
</rfc>
