Network Working Group                                        A. Melnikov
Internet-Draft                                             Isode Limited
Updates: 2046 (if approved)                                   J. Reschke
Intended status: Standards Track                              greenbytes
Expires: December 16, 2011                                 June 14, 2011

Update to MIME regarding Charset Parameter Handling in Textual Media Types


Abstract

This document changes RFC 2046 rules regarding default charset parameter values for text/* media types to better align with common usage by existing clients and servers.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress”.

This Internet-Draft will expire on December 16, 2011.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction and overview

[RFC2046] specified that the default charset parameter value (i.e., the value used when the parameter is not specified) is "US-ASCII". [RFC2616] changed the default for use by HTTP to "ISO-8859-1". This encoding is not very common for new text/* media types, and a special rule in HTTP adds confusion about which specification ([RFC2046] or [RFC2616]) is authoritative with regard to the default charset for text/* media types. [rfc.comment.1: At the time of writing of this document, the IETF HTTPBIS WG is working on an update to RFC 2616 which removes the default charset of "ISO-8859-1" for "text/*" media types. It is expected that the set of HTTPBIS documents will reference this document in order to use the updated rules for the default charset in "text/*" media types.]

Many complex text subtypes, such as text/html and text/xml, have internal (to their format) means of describing the charset. Many existing User Agents ignore the "US-ASCII" default rule for at least text/html and text/xml.

This document changes RFC 2046 rules regarding default charset parameter values for text/* media types to better align with common usage by existing clients and servers.

2. Conventions Used in This Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3. New rules for default charset parameter values for text/* media types

Section 4.1.2 of [RFC2046] says:

"The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII."

As explained in the Introduction, this rule is considered outdated, so this document replaces it with the following set of rules:

Each subtype of the "text" media type which uses the charset parameter can define its own default value for the charset parameter, including absence of any default.

In order to improve interoperability with deployed agents, "text/*" media type definitions SHOULD either a) recommend no default charset parameter value (i.e., the charset information is transported inside the payload, for example as in "text/xml") or b) require explicit unconditional inclusion of the charset parameter with the default value. "text/*" media types that can transport charset information inside the corresponding payloads SHOULD NOT specify any default, in order to avoid conflicting instructions if the charset parameter value and the value specified in the payload do not agree.

New subtypes of the "text" media type that do define a default charset SHOULD use the "UTF-8" [RFC3629] charset as the default.

Protocols using MIME MUST NOT override the default charset values of "text/*" media types with protocol-specific defaults.
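
The rules above can be sketched from a receiver's point of view. The following Python fragment is a minimal illustration, not part of the specification. The SUBTYPE_DEFAULTS table, the effective_charset function, and the "text/example" subtype are hypothetical names introduced here, and the per-subtype values are assumptions chosen to mirror the text (no default for "text/xml" and "text/html", "US-ASCII" for "text/plain").

```python
from email.message import Message
from typing import Optional

# Hypothetical table of per-subtype defaults (an assumption for illustration).
# None means "no default: the charset is carried inside the payload".
SUBTYPE_DEFAULTS = {
    "text/plain": "us-ascii",    # unchanged from RFC 2046
    "text/xml": None,            # charset information travels in the payload
    "text/html": None,           # charset information travels in the payload
    "text/example": "utf-8",     # made-up new subtype that defines a default
}


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset for a Content-Type value, or None if the
    receiver has to look inside the payload."""
    msg = Message()
    msg["Content-Type"] = content_type
    # An explicit charset parameter always takes precedence.
    explicit = msg.get_content_charset()
    if explicit is not None:
        return explicit
    # Otherwise the default (if any) comes from the subtype definition only.
    return SUBTYPE_DEFAULTS.get(msg.get_content_type())


print(effective_charset("text/plain"))
print(effective_charset("text/xml"))
print(effective_charset("text/html; charset=ISO-8859-1"))
```

The function deliberately takes no protocol argument: under the last rule, the result is the same whether the entity arrived over mail or over HTTP. A result of None signals that the receiver has to consult the payload.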

4. Default charset parameter value for text/plain media type

The default charset parameter value for text/plain is unchanged from [RFC2046] and remains "US-ASCII".
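
As an illustration only, a receiver could apply this default when decoding a text/plain entity that carries no charset parameter. The sketch below uses Python's standard email package. The sample message and the variable names are assumptions made for the example.

```python
from email import message_from_string

# A text/plain entity without a charset parameter (sample data, assumed).
raw = "Content-Type: text/plain\n\nhello\n"
msg = message_from_string(raw)

# No charset parameter is present, so the library reports None.
declared = msg.get_content_charset()

# Apply the text/plain default of US-ASCII when nothing is declared.
charset = msg.get_content_charset("us-ascii")

# Decode the body bytes with the resolved charset.
text = msg.get_payload(decode=True).decode(charset)
print(declared, charset, repr(text))
```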

5. Security Considerations

TBD. Guessing the default charset is a security problem. Conflicting charset information in-band versus out-of-band is also a security problem.
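
To make the in-band versus out-of-band conflict concrete, the following Python sketch compares a charset parameter with the encoding named in an XML declaration. It is a simplified illustration under stated assumptions: the function names are hypothetical, only payloads in an ASCII-compatible encoding are handled, and byte order marks and other detection steps required by the XML specification are ignored.

```python
import re
from typing import Optional

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the payload (simplified; assumes an ASCII-compatible encoding).
_XML_DECL = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)


def xml_declared_encoding(payload: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, if any."""
    match = _XML_DECL.match(payload)
    return match.group(1).decode("ascii") if match else None


def charsets_conflict(param_charset: Optional[str], payload: bytes) -> bool:
    """True if both sources name a charset and the names differ."""
    in_band = xml_declared_encoding(payload)
    if param_charset is None or in_band is None:
        return False
    # Charset names are compared case-insensitively.
    return param_charset.lower() != in_band.lower()


doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'
print(charsets_conflict("utf-8", doc))
print(charsets_conflict("iso-8859-1", doc))
```

How a receiver reacts to a detected conflict (reject, prefer one source, or warn) is outside the scope of this sketch.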

6. IANA Considerations

This document asks IANA to update the "text" subregistry of the Media Types registry to additionally point to this document.

7. References

7.2. Informative References

Le Hors, A., Raggett, D., and I. Jacobs, “HTML 4.01 Specification”, W3C Recommendation REC-html401-19991224, December 1999.
Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol -- HTTP/1.1”, RFC 2616, June 1999.
Sperberg-McQueen, C., Maler, E., Yergeau, F., Paoli, J., and T. Bray, “Extensible Markup Language (XML) 1.0 (Fifth Edition)”, W3C Recommendation REC-xml-20081126, November 2008.

Appendix A. Acknowledgements

Many thanks to Ned Freed and John Klensin for comments and ideas that motivated creation of this document.

Authors' Addresses

Alexey Melnikov
Isode Limited
5 Castle Business Village
36 Station Road
Hampton, Middlesex TW12 2BX
Julian F. Reschke
greenbytes GmbH
Hafenweg 16
Muenster, NW 48155