7 Commits

Author SHA1 Message Date
Roozbeh Pournader
d8917c69a9 Do not allow line breaks before currency symbols
Implement the change proposed in UTC document L2/16-043R
(http://www.unicode.org/L2/L2016/16043r-line-break-pr-po.txt) to make
sure we do not break between letters and currency symbols.

Bug: 24959657
Change-Id: Ia29d0e5625f84870bd910d0c6e19036d17206704
2016-03-16 16:21:09 -07:00
Raph Levien
56840e8006 Suppress line breaks in emoji + modifier
An emoji base with an emoji modifier renders as a single glyph and
thus should not be a line break. Current (Unicode 8) logic does
indicate a line break, so we override the results of the ICU line
break iterator. The code references a proposal to improve Unicode
behavior; when that is adopted and we upgrade ICU accordingly, the
special-case code should be deleted, but the tests can remain.

Bug: 27343378
Change-Id: I5de9c53e9a34c503816f9131e3d894e6f7a57d13
2016-02-26 19:05:34 +00:00
Raph Levien
d3f45892c7 Suppress linebreaks in emoji ZWJ sequences
Due to the way emoji ZWJ sequences are defined, the ICU line breaking
algorithm determines that there are valid line breaks inside the
sequence. This patch suppresses these line breaks.

This is an adaptation of I225ebebc0f4186e4b8f48fee399c4a62b3f0218a
into the nyc-dev branch.

Bug: 25433289
Change-Id: I84b50b1e6ef13d436965eab389659d02a30d100f
2016-02-18 15:00:24 -08:00
Raph Levien
c88ef135fc Add penalty for breaks in URLs and email addresses
Recent changes have added special cases for line breaks within URLs
and email addresses. Such breaks are undesirable when they can be
avoided, but at other times are needed to avoid huge gaps, or indeed
to make the line fit at all.

This patch assigns a penalty for such breaks, equal to the hyphenation
penalty. The mechanism is currently very simple, but would be easy to
fine-tune based on more detailed information about break quality.

Bug: 20126487
Bug: 20566159
Change-Id: I0d3323897737a2850f1e734fa17b96b065eabd9c
2016-02-17 23:13:44 +00:00
Raph Levien
6d15657e4a Add line breaks to email addresses and URLs
This change adds accceptable line breaks according to sections 7.42
(Dividing URLs and e-mail addresses) and 14.12 (URLs or DOIs and line
breaks) of the Chicago Manual of Style (16th ed.). In general, these
place breaks before punctuation symbols, and suppresses them after
hyphens.

Bug: 20126487
Bug: 20566159
Change-Id: I2d07d516b920a506a2f718c38fb435c5eb1ee1f8
2016-02-17 23:12:48 +00:00
Raph Levien
9c4cc648ab Special-case URLs and email addresses for line breaking
Detect URLs and email addresses, and suppress both line breaking and
hyphenation within them.

Bug: 20126487
Bug: 20566159

Change-Id: I43629347a063dcf579e355e5b678d7195f453ad9
2016-02-17 23:11:46 +00:00
Raph Levien
57b6dae989 Refine hyphenation around punctuation
Implement a WordBreaker that defines our concept of valid word
boundaries, customizing the ICU behavior. Currently, we suppress line
breaks at soft hyphens (these are handled specially). Also, the
new WordBreaker class has methods that determine the start and end
of the word (punctuation stripped) for the purpose of hyphenation.

This patch, in its current form, doesn't handle email addresses and
URLs specially, but the WordBreaker class is the correct place to do
so. Also, special case handling of hyphens and dashes is still done
in LineBreaker, but all of that should be moved to WordBreaker.

Bug: 20126487
Bug: 20566159
Change-Id: I492cbad963f9b74a2915f010dad46bb91f97b2fe
2016-02-16 22:05:07 -08:00