Why the quote element doesn't add quotes by default (was Re: http://www.w3.org/TR/2004/WD-xhtml2-20040722/xhtml2-diff.html)
Masayasu Ishikawa
2004-08-03 05:40:49 UTC
Section 9.8 The quote element
What's the rationale behind requiring the author to add quotes via style or
content instead of inserting them by default ("default stylesheet")?
This question comes up frequently, so I'll explain the rationale behind
this.

Short summary:

The q element in earlier versions of (X)HTML placed the burden of adding
"proper" quotation marks on the wrong side. The quote element in
XHTML 2.0 shifts that burden from user agents to authors, who know
which quotation marks are "proper" for their documents.

Longer story:

Back in 2001, the HTML Working Group reviewed all elements and attributes
in the XHTML namespace to decide whether they should be carried over to
XHTML 2.0. A question arose whether the q element should be altered to
NOT supply the quotation marks by default, and we discussed it with the
I18N WG and the CSS WG.

The basic problem is that the q element requires arcane knowledge of
language-sensitive quotation marks, and no user agent would be able to
capture all the possible combinations of all languages around the world.
So it would be unavoidable that each user agent would end up supporting
only a certain subset of language-sensitive quotation marks, and that
subset may differ from one user agent to another - the least common
denominator would be quite small, or even empty. So the result is
unpredictable, and authors can't be sure what kind of quotation marks
will be rendered, even though they do know what kind of quotation marks
they intended.

While the HTML 4 spec didn't indicate what a user agent should minimally
do [1], RFC 2070 included the following note [2]:

NOTE -- minimal support for the Q element is to surround the
contents with some kind of quotes, like the plain ASCII double
quotes. As this is rather easy to implement, and as the lack of
any visible quotes may affect the perceived meaning of the text,
user-agent implementors are strongly requested to provide at least
this minimal level of support.

And this fallback behavior is another reason why the q element was not
used widely. In the early days, the main reason was of course the
complete lack of support. However, by 2001 many "modern" browsers
provided at least "minimal" support for the q element. To list a few
(caution: this is the implementation status as of 2001, and it may have
improved since then):

- Lynx has supported nested q elements since 27 May 1996, alternating
double quotes and single quotes, with directional start and end
single quotes (i.e. something like "... `...' ...").
- Opera has supported the q element since version 4 (but only minimally),
and also supports the relevant CSS properties.
- Mozilla/Netscape 6 also support it, but they just insert " around
<q>...</q>, in a non-language-sensitive manner. They also support the
relevant CSS properties, but didn't handle nesting of quotes properly at
that time.
- Amaya alternates " and ', but it's not language-sensitive.
- Alis Tango lets users configure quotation marks, but strangely its
configuration depends on the language of the *user interface*,
so if a user chooses the Japanese UI, Tango inserts Japanese quotation
marks regardless of the language of the document, even in an English or
French context.
- IE5/Mac tries to be somewhat language-sensitive, but its behavior is
sometimes strange, e.g. it uses the combination of U+201C - U+201D and
U+2018 - U+2019 if the language is "en", but it merely uses " and '
for "en-US", "en-GB" and so on, and for some languages it uses strange
quotation marks. It doesn't support relevant CSS properties or other
means to override the default quotation marks.
- iCab implements the q element in a language-sensitive manner to some
extent, but doesn't provide a way to override the default quotation marks.
- IE/Win lacks support for the q element in all versions.
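
As a rough illustration of the "minimal" behavior described in the list
above (plain ASCII marks, alternating for nested quotations, no language
sensitivity), a sketch in CSS2 terms might look like the following. It
assumes a user agent that supports the CSS2 quotes property and generated
content; the rules are illustrative only and are not taken from any of
the browsers listed.

```css
/* Two pairs of marks: the first pair is used for outer quotations,
   the second pair for quotations nested inside them. */
q { quotes: '"' '"' "'" "'" }

/* Insert the marks as generated content around each q element. */
q:before { content: open-quote }
q:after  { content: close-quote }
```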

This situation effectively shows that the "minimal" level of support
for the q element is certainly not difficult, but very few implementors
dare to go beyond that level. Ironically, IE5/Mac and iCab tried to
implement it far better than other user agents, but neither of them
provided a way to override the default quotation marks. So, for example,
neither of them renders Japanese quotation marks correctly, and authors
cannot override the poor "fallback" quotation marks on those user agents.

This situation rather discourages the use of the q element, e.g. even if
a French author does know what the French quotation marks should be,
the specification says that authors should not put quotation marks
by themselves around q, and most browsers just end up with ", which
is not at all satisfactory. Given that situation, it is quite possible
that some authors just insert French quotation marks directly and don't
use the q element at all. Even the latest draft of "HTML Techniques
for WCAG 2.0" says the following [3]:

The q element marks up inline quotations.

NOTE: The q element, though designed for semantic markup, is
unsupported, or poorly-supported, in most browsers. So this is
a future technique.

This is not a document written in the last century; it was written
in 2004. Probably the "future" will never come. Not using appropriate
markup for quotations is worse than not having appropriate quotation marks.

Another difficult aspect of handling language-sensitive quotation marks
is that existing practice varies as to whether quotation marks are
considered part of the content of the parent of the quoted text, or
that of the quoted text itself. We researched a bunch of publications,
only to find that there's no consistent rule across the world.

For example, the quotation marks around English text quoted inside French
text are typically rendered as French quotation marks. On the
other hand, when languages like Chinese, German, Indonesian, Korean, or
Malay are quoted inside Japanese text, quotation marks are typically
rendered in the language of the *quoted text*, not as Japanese quotation
marks. These are all real-world examples, and those examples effectively
show that there are diverse practices around the world, and it is not
at all trivial to determine the "proper" quotation marks in an appropriate
context. The rule may even differ by local convention, or by author's
preference.

If we required user agents to have default style rules,
implementors would have to prepare a great number of language-sensitive
style rules, and even if they did a great job, they wouldn't be able to
cover all possible combinations of the various languages around the world,
and even if they could, the result might not match the author's
preference or convention. On the other hand, it is rather rare for a
document to include multilingual quotations, and authors only have to
provide the few style rules that are necessary for their documents. And
they do know their preference/convention.

So we concluded that it would be reasonable to place the burden of
adding "proper" quotation marks on authors rather than implementors.
The I18N WG recommended styling as the preferable approach
and encouraged CSS implementors to support the relevant features more
widely and consistently. Then, each author may have their own default
style rules, and may include them in their author style sheet. We could
provide some sample style rules, but they MUST NOT be in the default
XHTML 2.0 style sheet.
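
To give an idea of what such author style rules might look like, here is
a minimal sketch. It assumes a user agent that supports the CSS2 quotes
property, generated content, the child selector and the :lang()
pseudo-class. The particular marks, the languages chosen and the use of
NO-BREAK SPACE inside the French guillemets are illustrative assumptions,
not rules agreed by any of the Working Groups.

```css
/* Default marks for a document written mainly in English. */
quote { quotes: "\201C" "\201D" "\2018" "\2019" }

/* French convention: the marks follow the language of the surrounding
   text, so the rule selects on the language of the parent element. */
:lang(fr) > quote { quotes: "\00AB\00A0" "\00A0\00BB" }

/* Convention described above for Japanese text: the marks follow the
   language of the quoted text itself, so the rule also selects on the
   language of the quote element. */
:lang(ja) > quote:lang(ja) { quotes: "\300C" "\300D" "\300E" "\300F" }
:lang(ja) > quote:lang(de) { quotes: "\201E" "\201C" "\201A" "\2018" }

/* Insert whichever marks apply as generated content. */
quote:before { content: open-quote }
quote:after  { content: close-quote }
```

An author only needs the rules for the language combinations that
actually occur in their documents, and can swap in whichever marks match
their own preference or convention.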

That's what was agreed between the HTML, I18N, and CSS WGs more than three
years ago, and why the quote element doesn't add quotes by default.

[1] http://www.w3.org/TR/html4/struct/text.html#edef-Q
[2] http://www.rfc-editor.org/rfc/rfc2070.txt
[3] http://www.w3.org/TR/2004/WD-WCAG20-HTML-TECHS-20040730/#q

Regards,
--
Masayasu Ishikawa / ***@w3.org
W3C - World Wide Web Consortium
Jukka K. Korpela
2004-08-03 07:21:39 UTC
Post by Masayasu Ishikawa
The q element in earlier versions of (X)HTML placed the burden of adding
"proper" quotation marks on the wrong side. The quote element in
XHTML 2.0 shifts that burden from user agents to authors, who know
which quotation marks are "proper" for their documents.
Short summary of my comments:

There's no reason not to include the q element in XHTML 2.0, with
essentially the same definition as in HTML 4.01, if XHTML 2.0 is not
designed to be compatible with "old" user agents (such as IE 6 or
current indexing robots).
Post by Masayasu Ishikawa
The basic problem is that the q element requires arcane knowledge of
language-sensitive quotation marks, and no user agent would be able to
capture all the possible combinations of all languages around the world.
No, I don't think that's the basic problem. The basic problem is that
<q> was designed not to degrade gracefully. Browsers that do not recognize
or do not support <q> markup now render just the content, omitting the
potentially vital information that it's a quotation. Markup like
<q><qm>"</qm>To be or not to be, that is the question<qm>"</qm>.</q>
would degrade gracefully. Here the qm elements would contain quotation
marks that are to be omitted by user agents that support the q element.
(Cf. the ideas behind Ruby markup.)

It's a lot of work to support the quotation mark rules for all the
languages of the world. But it would be reasonable to support just
the hundred or so most used languages and use default rendering
(with Ascii quotation marks) for the rest. Besides, I think we can
realistically hope that the Common Locale Data Project conducted by the
Unicode Consortium will produce, as part of the locale data repository,
information about such conventions in a manner that can be directly fed
into software. (Quotation mark usage rules are not included
in the current scenario, but it is very natural to expect that they
will be addressed. Those rules are essential to text processing programs,
for example.)

After all, if we don't think that even relatively trivial things like
punctuation character variation can be handled, then what's the point of
telling authors to use language markup (for all changes in language too!)?
The W3C documents about language markup promise quite a few cool things,
like language-sensitive text formatting, spelling checks, etc. If the
reality is that even selecting quotation marks is way above the state of
the art, then please give me a break.
Post by Masayasu Ishikawa
This situation effectively shows that the "minimal" level of support
for the q element is certainly not difficult, but very few implementors
dare to go beyond that level.
But for XHTML 2.0, by making the minimal support mandatory, you will have a
new start. If some browser claims XHTML 2.0 conformance and accepts
XHTML 2.0 documents for rendering, yet fails to do such an extremely
simple thing as putting Ascii quotation marks around <q> content
if it can't do any better, then it's hopeless anyway. In fact, if you
added optional <qm> markup (though it would admittedly be odd - a
compatibility element in a language designed to be incompatible),
then authors could even take precautions against such misbehavior.
Post by Masayasu Ishikawa
This situation rather discourages the use of the q element, e.g. even if
a French author does know what the French quotation marks should be,
the specification says that authors should not put quotation marks
by themselves around q, and most browsers just end up with ", which
is not at all satisfactory.
The French rules are actually a good reason _for_ <q> markup. You can
enter guillemets as characters (in any way you like), in current HTML and
in the future, but how will you tell visual browsers to leave a
fine (thin) space between the text and the guillemets? If you use
the THIN SPACE character, you face the problem that it allows a line
break. If you use NO-BREAK SPACE, you get it typographically wrong
(too wide). Using <q> markup along with language information _allows_
a browser to create the best possible rendering. (Admittedly this raises
the question whether we need markup for questions and exclamations as
well.)
Post by Masayasu Ishikawa
Not using appropriate
markup for quotations is worse than not having appropriate quotation marks.
Is it? What would quotation markup be used for, then? I can imagine quite
a few _possible_ uses (like search engines specifically searching for
occurrences of words inside quotations), but realistically, what do you
expect, during this century?
Post by Masayasu Ishikawa
Another difficult aspect of handling language-sensitive quotation marks
is that existing practice varies as to whether quotation marks are
considered part of the content of the parent of the quoted text, or
that of the quoted text itself.
Indeed. And there is variation within each language, too. And you might
have difficulties in finding out what the official rules, and the
de facto rules, for nested quotations are. But such things need to be
addressed anyway in the world of computing. A good-quality text processing
program needs to know how to change Ascii quotation marks or apostrophes
into something more suitable, by language-sensitive rules. Building such
things into a browser means work but not rocket science.
Post by Masayasu Ishikawa
So we concluded that it would be reasonable to place the burden of
adding "proper" quotation marks on authors rather than implementors.
That's where the burden has been, and very few authors take the burden.
It has always been possible, for example, to use guillemets (which
belong to ISO 8859-1) in HTML documents in languages and orthographic
styles that use guillemets for quotations. But it is rare to see them used
in such situations. Placing the burden on authors alone means, in effect,
saying that proper use of quotation marks isn't generally relevant.
(I use the word "alone", since the <q> markup approach naturally
expects authors to use that markup for quotations.)
Post by Masayasu Ishikawa
The I18N WG recommended styling as the preferable approach
and encouraged CSS implementors to support the relevant features more
widely and consistently.
But how could you do such styling in CSS when your markup is
<quote>"To be or not to be, that is the question".</quote>
and you have no way in CSS to tell the browser not to render some
characters in the content? If you use the correct English quotation marks
in the content, you won't need any CSS styling. If you omit the quotation
marks from the content and add them in CSS, then you are relying on CSS
in conveying essential semantic information, and, besides, if authors
really did such things, millions of people would write the same CSS rules.
Well, not the same actually - authors would write _wrong_ rules.
Even the CSS 2 specification presents _wrong_ rules for quotation marks.
If those rules are hard to get right even for people who write
specifications, standards, and browsers, is it realistic to expect
that ordinary authors have a fair chance of getting them right?
(OK, admittedly we can expect a person to know the punctuation rules of
their native language. The expectation is mostly wrong, but fair.
But people use other languages in Web authoring, too.)
--
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Anne van Kesteren
2004-08-23 10:29:29 UTC
Post by Jukka K. Korpela
No, I don't think that's the basic problem. The basic problem is that
<q> was designed not to degrade gracefully. Browsers that do not
recognize or do not support <q> markup now render just the content,
omitting the potentially vital information that it's a quotation.
Markup like <q><qm>"</qm>To be or not to be, that is the
question<qm>"</qm>.</q> would degrade gracefully. Here the qm elements
would contain quotation marks that are to be omitted by user agents
that support the q element. (Cf. the ideas behind Ruby markup.)
That sounds like a great idea. Something similar could be designed for
BLOCKQUOTE as well, I guess.
--
Anne van Kesteren
<http://annevankesteren.nl/>
Bjoern Hoehrmann
2004-08-03 19:26:29 UTC
Post by Masayasu Ishikawa
So we concluded that it would be reasonable to place the burden of
adding "proper" quotation marks on authors rather than implementors.
The I18N WG recommended styling as the preferable approach
and encouraged CSS implementors to support the relevant features more
widely and consistently. Then, each author may have their own default
style rules, and may include them in their author style sheet.
http://www.w3.org/TR/xhtml2/xhtml2.html#sec_9.8. states

[...]
Visual user agents must not by default add delimiting quotation marks
(as was the case for the q element in earlier versions of XHTML). It
is the responsibility of the document author to add any required
quotation marks, either directly in the text, or via a stylesheet.
[...]

While http://www.w3.org/TR/WCAG10/#q31 states

[...]
6.1 Organize documents so they may be read without style sheets. For
example, when an HTML document is rendered without associated style
sheets, it must still be possible to read the document. [Priority 1]
[...]

It thus seems obvious that it is very misleading to state that authors
can use style sheets to add the quote marks. If they do, the document
would break in user agents that do not support style sheets. Rendering
e.g. "[QUOTE: ... ]" instead of "'...'" seems to be allowed, but is
something rather stupid for a user agent to do. Even if user agents are
allowed to render quote marks or other indicators for a quote, it would
likely break, as the user agent does not know whether the author included
the quote marks in the document. They would have to deal with all of

"<quote>...</quote>"
<quote>"..."</quote>
<quote>...</quote>

which might then yield e.g.

"[QUOTE: ... ]"
[QUOTE: "..." ]
[QUOTE: ... ]

Are the quote marks part of the content, or do they indicate
that the content is a quotation? If you want to write your document
so that it makes sense without style sheets you could write

"<quote>...</quote>"

but then, what would you do if you want to style the quote marks?
Not possible. So you might rather write

<span class="qm">"</span><quote>...</quote><span class="qm">"</span>

or maybe you would write

<quote><span class="qm">"</span>...<span class="qm">"</span></quote>

who knows..., then you could write

<style ...>
...
xhtml2|span.qm { display: none }
xhtml2|quote { quotes: "\201C" "\201D" "\2018" "\2019" }
xhtml2|quote::before { content: open-quote }
xhtml2|quote::after { content: close-quote }
xhtml2|quote::before,
xhtml2|quote::after { font-size: xx-large }
</style>

except that it would break the document again if the user agent
supports CSS2 but not CSS3, so you could write

<style ...>
span.qm { display: none }
quote { quotes: "\201C" "\201D" "\2018" "\2019" }
quote:before { content: open-quote }
quote:after { content: close-quote }
quote:before,
quote:after { font-size: xx-large }
</style>

Except that it would break in user agents that support CSS1
but not CSS2, so maybe you would rather write

<style ...>
.qm { font-size: xx-large }
</style>

and include the quote marks directly in the text, but then you might
want to give the quotation different background-color, like

<style ...>
quote { background-color: #eeeeee }
</style>

but that would paint the background-color behind the quote marks
too, so you might add

<style ...>
quote { background-color: #eeeeee }
.qm { background-color: transparent }
</style>

which works to some extent, until you want to use a border instead of
a background color, as that would require putting the quote mark span
outside the quote element... This design is completely broken, there is
essentially no authoring benefit and proper use is way too complicated.
Whatever proper use might be, who knows. Even something like

<quote open-quote = '...' close-quote = '...'>...

would be better than this mess. It is way more important to me that user
agents lacking CSS support properly indicate that something is a quote
than that they render the quote marks I desire, and if the latter is
what the HTML Working Group considers more important, it would make way
more sense to require that style sheets are not used to insert the quote
marks, or to require user agents to support the relevant CSS 2.0
features.
Karl Dubost
2004-08-04 14:38:00 UTC
Interesting analysis Björn
Post by Bjoern Hoehrmann
if the latter is
what the HTML Working Group considers more important, it would make way
more sense to require that style sheets are not used to insert the quote
marks, or to require user agents to support the relevant CSS 2.0
features.
CSS 2.0 or CSS 2.1?

CSS 2.1 is in CR phase
http://www.w3.org/TR/2004/CR-CSS21-20040225/

You can read their
s/Candidate Recommendation Exit Criteria/Proposed Recommendation
Entrance Criteria/
http://www.w3.org/TR/2004/CR-CSS21-20040225/#crec


which means to enter the Proposed Recommendation status:
http://www.w3.org/2004/02/Process-20040205/tr.html#cfr
""" 2. Shown that each feature of the technical
report has been implemented. Preferably, the Working
Group SHOULD be able to demonstrate two interoperable
implementations of each feature. If the Director
believes that immediate Advisory Committee review is
critical to the success of a technical report, the
Director MAY accept to Call for Review of a Proposed
Recommendation even without adequate implementation
experience;"""

More information about Implementation Report
http://esw.w3.org/topic/ImplementationReport

We might hope that it will be implemented (or dropped :/)
--
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager
*** Be Strict To Be Cool ***
Bjoern Hoehrmann
2004-08-05 02:52:14 UTC
Post by Karl Dubost
if the latter is what the HTML Working Group considers more
important, it would make way more sense to require that style
sheets are not used to insert the quote marks, or to require
user agents to support the relevant CSS 2.0 features.
CSS 2.0 or CSS 2.1?
For the features in question it does not really matter. The
features they have in common are defined to be equivalent;
that's at least what the CSS 2.0 Errata states.
Ian Hickson
2004-08-10 09:31:27 UTC
Post by Bjoern Hoehrmann
It thus seems obvious that it is very misleading to state that authors
can use style sheets to add the quote marks. If they do, the document
would break in user agents that do not support style sheets. Rendering
e.g. "[QUOTE: ... ]" instead of "'...'" seems to be allowed, but is
something rather stupid for a user agent to do. Even if user agents are
allowed to render quote marks or other indicators for a quote, it would
likely break, as the user agent does not know whether the author included
the quote marks in the document. They would have to deal with all of [...]
Bjoern makes some very valid points in his e-mail [1]. Is there any chance
the HTML working group could reply to his post? I am curious to understand
what the intended use of the <quote> element is.

[1] http://www.w3.org/mid/***@smtp.bjoern.hoehrmann.de

Cheers,
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Karl Dubost
2004-08-04 14:19:00 UTC
Post by Masayasu Ishikawa
This question comes up frequently, so I'll explain the rationale behind
this.
Excellent Masayasu!!! Thanks.