HTTP/2.0 Header Compression
Google, Inc <fenix@google.com>
Canon CRF <herve.ruellan@crf.canon.fr>
Applications
HTTPbis Working Group
Keywords: HTTP, Header
This document describes a format adapted to efficiently
represent HTTP headers in the context of HTTP/2.0.
In HTTP/1.X, HTTP headers, which are necessary for the
functioning of the protocol, are transmitted with no
transformations. Unfortunately, the amount of redundancy in
both the keys and the values of these headers is astonishingly
high, and is the cause of increased latency on
lower bandwidth links. This indicates that an alternate
encoding for headers would be beneficial to latency, and that
is what is proposed here.
As shown by SPDY, Deflate
compresses HTTP very effectively. However, the use of a
compression scheme which allows for arbitrary matches against
the previously encoded data (such as Deflate) exposes users to
security issues.
In particular, the compression of sensitive data, together
with other data controlled by an attacker, may lead to leakage
of that sensitive data, even when the resultant bytes are
transmitted over an encrypted channel.
Another consideration is that processing and memory costs of a
compressor such as Deflate may also be too high for some
classes of devices, for example when doing forward or reverse
proxying.
The HTTP header representation described in this document
is based on indexing tables that store (name, value) pairs,
called header tables in the remainder of this document.
This scheme is believed to be safe against all currently known
attacks on the compression context. Header tables are
incrementally updated during the whole HTTP/2.0 session.
Two independent header tables are used during an HTTP/2.0
session, one for HTTP request headers and one for HTTP
response headers.
The encoder is responsible for deciding which headers to
insert as (name, value) pairs in the header table. The
decoder then does exactly what the encoder prescribes,
ending in a state that exactly matches the encoder's
state. This enables decoders to remain simple and
understand a wide variety of encoders.
A header may be represented as a literal or as an index.
If represented as a literal, the representation specifies
whether this header is used to update the indexing table.
The different representations are described later in this document.
A set of headers is coded as a difference from the
previous set of headers.
An example illustrating the use of these different mechanisms
to represent headers is given at the end of this document.
The encoding and decoding of headers relies on a few
components. First, a header table is used to associate headers
with index values. Second, a set of headers is encoded as a
difference from the previous reference set of headers.
As messages are exchanged in two directions, from client
to server and from server to client, there are two sets of
components: one for each direction. All the headers sent
in messages from the client to the server are encoded (and
decoded) using one set of components. All the headers sent
in messages from the server to the client (including
headers contained in PUSH_PROMISE frames) are encoded
using the other set of components.
A header table consists of an ordered list of (name,
value) pairs. A pair is either inserted at the end of the
table or replaces an existing pair depending on the chosen
representation. A pair can be represented as an index
which is its position in the table, starting with 0 for
the first entry.
An input header name matches the header name of a (name,
value) pair stored in the Header Table if they are
equal using a character-based, case
sensitive comparison.
An input header value matches the header value of a
(name, value) pair stored in the Header Table if they are
equal using a character-based, case
sensitive comparison.
An input header (name, value) pair matches a pair in the
Header Table if both the name and value are matching as
per above.
Generally, the header table will not contain duplicate
header (name, value) entries. However, implementations
MUST be prepared to accept duplicates without signaling an
error. If duplicates are added to the table, they MUST be
treated as distinct entries with their own index
positions.
The header table is progressively updated based on headers
represented as literals. Two update mechanisms are
defined:
Incremental indexing: the represented header is
inserted at the end of the header table as a
(name, value) pair. The inserted pair index is
set to the next free index in the table: it is
equal to the number of headers in the table before
its insertion.
Substitution indexing: the represented header
contains an index to an existing (name, value)
pair. The existing pair is replaced by the
pair representing the new header.
Incremental and substitution indexing are optional. If
none of them is selected in a header representation, the
header table is not updated. In particular, no update
happens on the header table when processing an indexed
representation.
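The two update mechanisms can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name `HeaderTable` and its method names are hypothetical, and table size limits are deliberately ignored.

```python
class HeaderTable:
    """Ordered list of (name, value) pairs. Size limits are ignored here."""

    def __init__(self, initial=()):
        self.entries = list(initial)

    def add_incremental(self, name, value):
        """Append the pair; its index is the table length before insertion."""
        index = len(self.entries)
        self.entries.append((name, value))
        return index

    def substitute(self, index, name, value):
        """Replace the pair stored at an existing index."""
        if not 0 <= index < len(self.entries):
            raise IndexError("substitution must target an existing entry")
        self.entries[index] = (name, value)
        return index


# Illustrative values only.
table = HeaderTable([(":scheme", "http"), (":path", "/")])
new_index = table.add_incremental("x-my-header", "first")
table.substitute(1, ":path", "/index.html")
```

Processing an indexed representation would call neither method, since it leaves the table unchanged.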
The header table size can be bounded so as to limit the
memory requirements (see the SETTINGS_MAX_BUFFER_SIZE
setting). The header table
size is defined as the sum of the sizes of its entries.
The size of an entry is the sum of the length
in bytes of its name (as defined for string literals),
the length in bytes of its value, and 32 bytes (to
account for the entry structure overhead). The header
table size MUST NOT exceed this limit.
Before adding a new entry to the header table or changing
an existing one, a check has to be performed to ensure
that the change will not cause the table to grow in size
beyond the SETTINGS_MAX_BUFFER_SIZE limit. If necessary,
one or more items from the beginning of the table are
removed until there is enough free space available to make
the modification. Dropping an entry from the beginning of
the table causes the index positions of the remaining
entries in the table to be decremented by 1.
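The size accounting and eviction described above might look like the following sketch. The function names are hypothetical. Lengths are taken as UTF-8 byte counts, and an entry larger than the whole limit is simply not stored, which is an assumption since the text does not cover that case.

```python
ENTRY_OVERHEAD = 32  # bytes charged per entry, as stated in the text


def entry_size(name, value):
    """Size of one entry: name bytes + value bytes + fixed overhead."""
    return len(name.encode("utf-8")) + len(value.encode("utf-8")) + ENTRY_OVERHEAD


def table_size(entries):
    return sum(entry_size(name, value) for name, value in entries)


def add_with_eviction(entries, name, value, max_size=4096):
    """Append (name, value), evicting from the front until it fits.

    Returns the number of evicted entries; every remaining index
    decreases by that amount.
    """
    needed = entry_size(name, value)
    evicted = 0
    while entries and table_size(entries) + needed > max_size:
        entries.pop(0)
        evicted += 1
    if needed <= max_size:  # assumption: oversized entries are not stored
        entries.append((name, value))
    return evicted


# Illustrative values: each entry below costs 1 + 1 + 32 = 34 bytes.
entries = [("a", "1"), ("b", "2")]
evicted = add_with_eviction(entries, "c", "3", max_size=80)
```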
Feedback is needed on this automatic eviction
strategy.
When using substitution indexing, it is possible that the
existing item being replaced might be one of the items
removed when performing the necessary size adjustment. In
such cases, the substituted value being added to the
header table is inserted at the beginning of the header
table (at index position #0) and the index positions of
the other remaining entries in the table are incremented
by 1.
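The substitution corner case can be sketched as below. This is one possible reading of the paragraph above, with hypothetical names; rejecting an entry that can never fit is an assumption made only to keep the sketch within the size limit.

```python
ENTRY_OVERHEAD = 32


def entry_size(name, value):
    return len(name.encode("utf-8")) + len(value.encode("utf-8")) + ENTRY_OVERHEAD


def substitute(entries, index, name, value, max_size):
    """Replace entries[index], evicting from the front when needed.

    Returns the index at which the new pair ends up.
    """
    new_size = entry_size(name, value)
    if new_size > max_size:
        raise ValueError("entry can never fit")  # assumption, not in the text

    def size_after_change():
        total = sum(entry_size(n, v) for n, v in entries) + new_size
        if index >= 0:  # the replaced entry is still in the table
            total -= entry_size(*entries[index])
        return total

    while entries and size_after_change() > max_size:
        entries.pop(0)  # drop the oldest entry
        index -= 1      # remaining entries shift down by one
    if index >= 0:
        entries[index] = (name, value)
        return index
    # The replaced entry was itself evicted: insert at position 0.
    entries.insert(0, (name, value))
    return 0


# Illustrative values: three 34-byte entries, limit of 110 bytes.
entries = [("a", "1"), ("b", "2"), ("c", "3")]
position = substitute(entries, 0, "a", "x" * 20, max_size=110)
```

In this run the replaced entry and one more entry are evicted before the 53-byte replacement fits, so it lands at index 0.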
To optimize the representation of the headers exchanged at
the beginning of an HTTP/2.0 session, the header table is
initialized with common headers. Two lists of initial
headers are provided later in this document.
One is for messages sent from a client to a server, the
other is for messages sent from a server to a client.
The literal representation defines a new header. A
literal header is represented as:
* A header name, with two possible
  representations:
  - A literal string.
  - An index in the header table
    referencing the name of the
    corresponding header. The index is
    represented as an integer.
* The header value, represented as a literal
  string.
The indexed representation defines a header as a match
to a (name, value) pair in the header table. An
indexed header is represented as:
* An integer representing the index of the
  matching (name, value) pair.
A set of headers is encoded as a difference from the
previous reference set of headers. The initial reference
set of headers is the empty set.
An indexed representation toggles the presence of the
header in the current set of headers. If the header
corresponding to the indexed representation was not in the
set, it is added to the set. If the header index was in
the set, it is removed from it.
A literal representation adds a header to the current set
of headers.
To ensure a correct decoding of a set of headers, the
following steps or equivalent ones MUST be executed by the
decoder.
First, upon starting the decoding of a new set of headers,
the reference set of headers is interpreted into the
working set of headers: for each header in the reference
set, an entry is added to the working set, containing the
header name, its value, and its current index in the
header table.
Then, the header representations are processed in their
order of occurrence in the frame.
For an indexed representation, the decoder checks whether
the index is present in the working set. If true, the
corresponding entry is removed from the working set. If
several entries correspond to this encoded index, all
these entries are removed from the working set. If
the index is not present in the working set, it is used to
retrieve the corresponding header from the header table,
and a new entry is added to the working set representing
this header.
For a literal representation, a new entry is added to the
working set representing this header. If the literal
representation specifies that the header is to be indexed,
the header is added accordingly to the header table, and
its index is included in the entry in the working set.
Otherwise, the entry in the working set contains an
undefined index.
When all the header representations have been processed,
the working set contains all the headers of the set of
headers.
The new reference set of headers is computed by removing
from the working set all the headers that are not present
in the header table.
It should be noted that during the decoding of the header
representations, the same index may be associated to
different headers in the working set and in the header
table.
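The decoding steps above can be sketched as follows, operating on representations that have already been parsed into tuples. Everything here is illustrative: the tuple shapes and the function name are hypothetical, size limits and eviction are ignored, and "present in the header table" is interpreted as "the entry has a defined index whose table slot still holds the same pair".

```python
def decode_header_set(representations, table, reference_set):
    """Return (headers, new_reference_set).

    table: list of (name, value); reference_set: list of (name, value, index).
    """
    working = list(reference_set)
    for rep in representations:
        kind = rep[0]
        if kind == "indexed":
            index = rep[1]
            if any(entry[2] == index for entry in working):
                # Toggle off: drop every entry carrying this index.
                working = [e for e in working if e[2] != index]
            else:
                name, value = table[index]
                working.append((name, value, index))
        elif kind == "literal":  # no indexing: undefined index
            _, name, value = rep
            working.append((name, value, None))
        elif kind == "incremental":
            _, name, value = rep
            table.append((name, value))
            working.append((name, value, len(table) - 1))
        elif kind == "substitution":
            _, index, name, value = rep
            table[index] = (name, value)
            working.append((name, value, index))
        else:
            raise ValueError("unknown representation: %r" % (kind,))
    headers = [(name, value) for name, value, _ in working]
    new_reference = [e for e in working
                     if e[2] is not None and table[e[2]] == (e[0], e[1])]
    return headers, new_reference


# Made-up header values, starting from an empty table and reference set.
table = []
first, ref = decode_header_set(
    [("incremental", ":path", "/a"),
     ("incremental", "user-agent", "ua"),
     ("incremental", "x-my-header", "first")], table, [])
second, ref = decode_header_set(
    [("indexed", 0), ("indexed", 2),
     ("substitution", 0, ":path", "/b"),
     ("incremental", "x-my-header", "second")], table, ref)
```

The second call removes two headers from the reference set by index, then re-adds updated versions, while `user-agent` is carried over without being transmitted again.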
A header block consists of a set of header fields, which
are name-value pairs. Each header field is encoded using
one of the header representations.
Integers are used to represent name indexes, pair
indexes or string lengths.
The integer representation keeps byte-alignment as
much as possible as this allows various processing
optimizations as well as efficient use of DEFLATE.
For that purpose, an integer representation always
finishes at the end of a byte.
An integer is represented in two parts: a prefix that
fills the current byte and an optional list of bytes
that are used if the integer value does not fit in the
prefix. The number of bits of the prefix (called N)
is a parameter of the integer representation.
The N-bit prefix allows filling the current byte. If
the value is small enough (strictly less than 2^N-1),
it is encoded within the N-bit prefix. Otherwise all
the bits of the prefix are set to 1 and the value is
encoded using an
unsigned variable length integer
representation.
The algorithm to represent an integer I is as follows:
1. If I < 2^N - 1, encode I on N bits.
2. Else, encode 2^N - 1 on N bits and do the following steps:
   a. Set I to (I - (2^N - 1)) and Q to 1.
   b. While Q > 0:
      - Compute Q and R, quotient and remainder
        of I divided by 2^7.
      - If Q is strictly greater than 0, write
        one 1 bit; otherwise, write one 0 bit.
      - Encode R on the next 7 bits.
      - I = Q
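The algorithm can be transcribed fairly directly. The sketch below returns the prefix value separately from the continuation bytes, because the remaining bits of the first byte belong to the enclosing representation. The function name and return shape are illustrative choices.

```python
def encode_integer(value, prefix_bits):
    """Return (prefix_value, continuation_bytes) for an N-bit prefix."""
    limit = (1 << prefix_bits) - 1  # 2^N - 1
    if value < limit:
        return value, b""
    out = bytearray()
    value -= limit
    while True:
        quotient, remainder = divmod(value, 128)
        # High bit set means another 7-bit group follows.
        out.append((0x80 if quotient > 0 else 0x00) | remainder)
        if quotient == 0:
            break
        value = quotient
    return limit, bytes(out)


small = encode_integer(10, 5)
large = encode_integer(1337, 5)
```

Note that the 7-bit groups are emitted least significant first, and that a value equal to 2^N - 1 produces a full prefix followed by a single zero byte.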
The value 10 is to be encoded with a 5-bit prefix.
10 is less than 31 (= 2^5 - 1) and is
represented using the 5-bit prefix.
The value I=1337 is to be encoded with a 5-bit
prefix.
1337 is greater than 31 (= 2^5 - 1).
  The 5-bit prefix is filled with its max value (31).
The value to represent on next bytes is I = 1337 - (2^5 - 1) = 1306.
  1306 = 128*10 + 26, i.e. Q=10 and R=26.
  Q is greater than 0, bit 8 is set to 1.
  The remainder R=26 is encoded on next 7 bits.
  I is replaced by the quotient Q=10.
The value to represent on next bytes is I = 10.
  10 = 128*0 + 10, i.e. Q=0 and R=10.
  Q is equal to 0, bit 16 is set to 0.
  The remainder R=10 is encoded on next 7 bits.
  I is replaced by the quotient Q=0.
The process ends.
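The text does not spell out the decoding side. The sketch below assumes it is the plain inverse of the encoding algorithm; the function name and signature are hypothetical. It can be used to check the two examples above.

```python
def decode_integer(prefix_value, prefix_bits, following):
    """Return (value, number of continuation bytes consumed).

    prefix_value is the N-bit prefix already extracted from the first
    byte; following is the byte sequence that comes after that byte.
    """
    limit = (1 << prefix_bits) - 1
    if prefix_value < limit:
        return prefix_value, 0
    value = limit
    shift = 0
    for consumed, byte in enumerate(following, start=1):
        value += (byte & 0x7F) << shift  # groups arrive low-order first
        shift += 7
        if not byte & 0x80:  # high bit clear: last group
            return value, consumed
    raise ValueError("truncated integer")


first_example = decode_integer(10, 5, b"")
second_example = decode_integer(31, 5, bytes([0b10011010, 0b00001010]))
```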
Literal strings can represent header names or header
values. They are encoded in two parts:
* The string length, defined as the number of
  bytes needed to store its UTF-8
  representation, is represented as an integer
  with a 0-bit prefix. If the string length
  is strictly less than 128, it is represented
  as one byte.
* The string value, represented as the sequence of
  bytes of its UTF-8 encoding.
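With a 0-bit prefix, the integer representation reduces to a sequence of 7-bit groups, so a string literal could be produced as in this sketch (function names are illustrative):

```python
def encode_length(length):
    """Integer with a 0-bit prefix: 7-bit groups, low-order group first."""
    out = bytearray()
    while True:
        length, remainder = divmod(length, 128)
        out.append((0x80 if length > 0 else 0x00) | remainder)
        if length == 0:
            return bytes(out)


def encode_string(text):
    """Length in UTF-8 bytes, followed by the UTF-8 bytes themselves."""
    data = text.encode("utf-8")
    return encode_length(len(data)) + data


short = encode_string("accept")
accented = encode_string("\u00e9")  # one character, two UTF-8 bytes
```

The length counts bytes rather than characters, which is why the accented example is prefixed with 2.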
The indexed header representation starts with the '1' 1-bit pattern,
followed by the index of the matching pair, represented as
an integer with a 7-bit prefix.
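As a sketch, an indexed header could be serialized like this (the function name is illustrative):

```python
def encode_indexed(index):
    """'1' bit followed by the index as an integer with a 7-bit prefix."""
    limit = 0x7F  # 2^7 - 1
    if index < limit:
        return bytes([0x80 | index])
    out = bytearray([0x80 | limit])  # prefix saturated, continue below
    index -= limit
    while True:
        index, remainder = divmod(index, 128)
        out.append((0x80 if index > 0 else 0x00) | remainder)
        if index == 0:
            return bytes(out)


one_byte = encode_indexed(3)
two_bytes = encode_indexed(200)
```

Any index below 127 therefore fits in a single byte.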
The literal header without indexing representation, which
does not involve updating the header table, starts with the
'011' 3-bit pattern.
If the header name matches the header name of a (name,
value) pair stored in the Header Table, the index of
the pair increased by one (index + 1) is represented
as an integer with a 5-bit prefix. Note that if the
index is strictly below 30, one byte is used.
If the header name does not match a header name entry,
the value 0 is represented on 5 bits followed by the
header name, represented as a literal string.
The header name representation is followed by the header
value represented as a literal string.
The literal header with incremental indexing representation
starts with the '010' 3-bit pattern.
If the header name matches the header name of a (name,
value) pair stored in the Header Table, the index of
the pair increased by one (index + 1) is represented
as an integer with a 5-bit prefix. Note that if the
index is strictly below 30, one byte is used.
If the header name does not match a header name entry,
the value 0 is represented on 5 bits followed by the
header name, represented as a literal string.
The header name representation is followed by the header
value represented as a literal string.
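The two literal forms that use a 5-bit prefix ('011' without indexing and '010' with incremental indexing) differ only in their leading pattern, so one sketch can cover both. All names are illustrative, and picking the first entry with a matching name is an arbitrary choice made here.

```python
WITHOUT_INDEXING = 0b011_00000
INCREMENTAL_INDEXING = 0b010_00000


def encode_integer(value, prefix_bits, flags=0):
    """Integer with an N-bit prefix; flags fills the bits above the prefix."""
    out = bytearray()
    limit = (1 << prefix_bits) - 1
    if prefix_bits > 0:
        out.append(flags | min(value, limit))
        if value < limit:
            return bytes(out)
    value -= limit
    while True:
        value, remainder = divmod(value, 128)
        out.append((0x80 if value > 0 else 0x00) | remainder)
        if value == 0:
            return bytes(out)


def encode_string(text):
    data = text.encode("utf-8")
    return encode_integer(len(data), 0) + data


def encode_literal(pattern, name, value, table):
    """Name as (index + 1), or 0 plus a literal name; then the value."""
    names = [entry_name for entry_name, _ in table]
    if name in names:
        head = encode_integer(names.index(name) + 1, 5, pattern)
    else:
        head = encode_integer(0, 5, pattern) + encode_string(name)
    return head + encode_string(value)


# Illustrative table content.
table = [(":scheme", "http"), (":scheme", "https"), (":host", ""), (":path", "/")]
known_name = encode_literal(WITHOUT_INDEXING, ":path", "/a", table)
new_name = encode_literal(INCREMENTAL_INDEXING, "x-my-header", "first", table)
```

This function only serializes the header. With incremental indexing, encoder and decoder would additionally append the pair to their header tables.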
The literal header with substitution indexing representation
starts with the '00' 2-bit pattern.
If the header name matches the header name of a (name,
value) pair stored in the Header Table, the index of
the pair increased by one (index + 1) is represented
as an integer with a 6-bit prefix. Note that if the
index is strictly below 62, one byte is used.
If the header name does not match a header name entry,
the value 0 is represented on 6 bits followed by the
header name, represented as a literal string.
The index of the substituted (name, value) pair is
inserted after the header name representation as a
0-bit prefix integer.
The index of the substituted pair MUST correspond to a
position in the header table containing a non-void
entry. An index for the substituted pair that
corresponds to an empty position in the header table MUST
be treated as an error.
This index is followed by the header
value represented as a literal string.
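A literal header with substitution indexing could be serialized as in the following sketch. The helper names are illustrative, and the first entry with a matching name is used for the name reference, which is an arbitrary choice.

```python
def encode_integer(value, prefix_bits, flags=0):
    """Integer with an N-bit prefix; flags fills the bits above the prefix."""
    out = bytearray()
    limit = (1 << prefix_bits) - 1
    if prefix_bits > 0:
        out.append(flags | min(value, limit))
        if value < limit:
            return bytes(out)
    value -= limit
    while True:
        value, remainder = divmod(value, 128)
        out.append((0x80 if value > 0 else 0x00) | remainder)
        if value == 0:
            return bytes(out)


def encode_string(text):
    data = text.encode("utf-8")
    return encode_integer(len(data), 0) + data


def encode_substitution(name, value, substituted_index, table):
    """'00' pattern, name reference, substituted index, then the value."""
    if not 0 <= substituted_index < len(table):
        raise IndexError("substituted index must reference an existing entry")
    names = [entry_name for entry_name, _ in table]
    if name in names:
        head = encode_integer(names.index(name) + 1, 6)
    else:
        head = encode_integer(0, 6) + encode_string(name)
    return head + encode_integer(substituted_index, 0) + encode_string(value)


# Illustrative table content.
table = [(":scheme", "http"), (":scheme", "https"), (":host", ""), (":path", "/")]
encoded = encode_substitution(":path", "/b", 3, table)
```

Here the first byte carries the name reference (index 3 + 1), the second byte is the substituted index, and the rest is the value string.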
A few parameters can be used to accommodate client and server
processing and memory requirements.
These settings are currently not supported as they have
not been integrated in the main specification. Therefore,
the maximum buffer size for the header table is fixed at
4096 bytes.
Allows the sender to inform the remote endpoint of the
maximum size it accepts for the header table.
The default value is 4096 bytes.
Is this default value OK? Do we need a maximum size? Do we want to allow infinite buffer?
When the remote endpoint receives a SETTINGS frame
containing a SETTINGS_MAX_BUFFER_SIZE setting with a
value smaller than the one currently in use, it MUST
send as soon as possible a HEADERS frame with a stream
identifier of 0x0 containing a value smaller than or
equal to the received setting value.
This slightly changes the behaviour of the
HEADERS frame, which should be updated as follows:
A HEADERS frame with a stream identifier of 0x0
indicates that the sender has reduced the maximum size
of the header table. The new maximum size of the
header table is encoded on 32 bits. The decoder MUST
reduce its own header table by dropping entries from
it until the size of the header table is lower than or
equal to the transmitted maximum size.
TODO?
This memo includes no request to IANA.
SPDY Protocol (Twist, Google)
The tables in this section should be updated based on
statistical analysis of header name frequency and specific
HTTP/2.0 header rules (like removal of some headers).
These tables are not adapted for headers contained in
PUSH_PROMISE frames. Either the tables can be merged, or the
table for responses can be updated.
The following table lists the pre-defined headers that
make up the initial header table used to represent
requests sent from a client to a server.
Index  Header Name          Header Value
0      :scheme              http
1      :scheme              https
2      :host
3      :path                /
4      :method              GET
5      accept
6      accept-charset
7      accept-encoding
8      accept-language
9      cookie
10     if-modified-since
11     keep-alive
12     user-agent
13     proxy-connection
14     referer
15     accept-datetime
16     authorization
17     allow
18     cache-control
19     connection
20     content-length
21     content-md5
22     content-type
23     date
24     expect
25     from
26     if-match
27     if-none-match
28     if-range
29     if-unmodified-since
30     max-forwards
31     pragma
32     proxy-authorization
33     range
34     te
35     upgrade
36     via
37     warning
The following table lists the pre-defined headers that
make up the initial header table used to represent
responses sent from a server to a client. The same header
table is also used to represent request headers sent from
a server to a client in a PUSH_PROMISE frame.
Index  Header Name                  Header Value
0      :status                      200
1      age
2      cache-control
3      content-length
4      content-type
5      date
6      etag
7      expires
8      last-modified
9      server
10     set-cookie
11     vary
12     via
13     access-control-allow-origin
14     accept-ranges
15     allow
16     connection
17     content-disposition
18     content-encoding
19     content-language
20     content-location
21     content-md5
22     content-range
23     link
24     location
25     p3p
26     pragma
27     proxy-authenticate
28     refresh
29     retry-after
30     strict-transport-security
31     trailer
32     transfer-encoding
33     warning
34     www-authenticate
Here is an example that illustrates different representations
and how tables are updated.
This section needs to be updated to integrate differential coding.
The first header set to represent is the following:
Since the header table is empty, all headers are represented as
literal headers with indexing. The 'x-my-header' header
name is not in the header table and is encoded
literally. This gives the following representation:
The header table is as follows after the processing of
these headers:
As all the headers in the first header set are indexed in
the header table, all are kept in the reference
set of headers, which is:
The second header set to represent is the following:
Comparing this second header set to the reference set, the
first and third headers from the reference set are not
present in this second header set and must be removed. In
addition, in this new set, the first and third headers
have to be encoded.
The path header is represented as a literal header with
substitution indexing. The x-my-header will be
represented as a literal header with incremental indexing.
The header table is updated as follows:
All the headers in this second header set are indexed in
the header table, therefore, all are kept in the reference
set of headers, which becomes: