- Notifications
You must be signed in to change notification settings - Fork 3.1k
Fonts: Normalize font face font-family #9951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Fonts: Normalize font face font-family #9951
Conversation
This reverts commit 9f29537.
…nto fix/font-face-name-normalization
Test using WordPress PlaygroundThe changes in this pull request can previewed and tested using a WordPress Playground instance. WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser. Some things to be aware of
For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation. |
| The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the Core Committers: Use this line as a base for the props when committing in SVN: To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
| if ( | ||
| strlen( $font_family ) > 1 && | ||
| ( '"' === $font_family[0] && '"' === $font_family[ strlen( $font_family ) - 1 ] ) || | ||
| ( "'" === $font_family[0] && "'" === $font_family[ strlen( $font_family ) - 1 ] ) | ||
| ) { | ||
| _doing_it_wrong( | ||
| __METHOD__, | ||
| __( 'Font font-family should not be wrapped in quotes; they will be added automatically.' ), | ||
| '6.9.0' | ||
| ); | ||
| $font_family = substr( $font_family, 1, -1 ); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, I don't think this is a good idea and it should not be checking for quoted font families.
An already quoted font family suggests that it's been encoded for CSS already. In that case, the "normalization" here is dangerous.
The original implementation tried to cover both cases:
/* * Wrap font-family in quotes if it contains spaces * and is not already wrapped in quotes. */ if ( str_contains( $font_face['font-family'], ' ' ) && ! str_contains( $font_face['font-family'], '"' ) && ! str_contains( $font_face['font-family'], "'" ) ) { $font_face['font-family'] = '"' . $font_face['font-family'] . '"'; }If a quoted string were found, normalization would need to parse it as a CSS string first, then perform normalization on the parsed result.
#7857 includes some CSS parsing functionality, but that's not ready to land yet. I think at this time the best this can do is call _doing_it_wrong and not perform normalization.
I'd love to hear thoughts from other folks on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's CSSProcessor escapes strings:
(Note the method is misnamed, it's called escape_url_value but it should really be called escape_string).
It also parses strings, maybe that would be useful here:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An already quoted font family suggests that it's been encoded for CSS already. In that case, the "normalization" here is dangerous.
Agreed. What are the scenarios where we'd receive an already quoted value? Should we just tokenize our CSS data source?
A safe thing to do would be to assume it may not be encoded right and reject it right away. Alternatively, we could recognize it's a quoted value and try to parse it, but something about that feels off and I worry we'd end up with double-encoded values sooner or later. Sticking to a single accepted input format sounds great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for asking this, you encouraged me to dig deeper and research. I'll summarize that I think this change is doing the right thing based on usage I've seen, but documentation should be updated to indicate that plain text (unquoted, unescaped) should be provided and the system will handle transformation to a CSS font family name.
I believe this typically comes from JSON in a theme.json file. It's not CSS, it's a JSON string like in this example:
{ "fontFamily": "Open Sans", "fontWeight": "300 800", "fontStyle": "italic", "fontStretch": "normal", "src": [ "file:./assets/fonts/open-sans-italic.woff2" ] }Each object accepts these properties that map to descriptors for the @font-face CSS at-rule:
- fontFamily: A valid CSS font-family descriptor.
The documentation suggests that the value should be valid CSS, so someone passing in { "fontFamily": "\"O'Reilly Sans\"" } is doing the right thing according to documentation. At the same time, it would be good to quote Open Sans in the documented example, suggesting it's actually better to pass an unquoted, unescaped, plain JSON string and let the system handle it (much more robust with this PR).
Passing a correctly escaped CSS identifier for the font-family name would regress with this PR, like { "fontFamily": "Open\20Sans" }. However, I think it's very unlikely anyone is doing that.
I did some searching to describe what's currently being done.
I was surprised to see that twentytwentyfive quotes the font face family:
| "fontFamily": "\"Fira Code\"" |
But twentytwentythree does not:
| "fontFamily": "DM Sans", |
| "fontFamily": "IBM Plex Mono", |
| "fontFamily": "Source Serif Pro", |
I did a theme search and looked through the top themes:
- Twenty Twenty-Three uses unquoted families with spaces
- Twenty Twenty-Two uses unquoted families with spaces
"fontFamily": "Source Serif Pro" - Extendable uses unquoted families with spaces
"fontFamily": "Baloo Tamma 2" - Kubio uses unquoted families with spaces
- YITH Wonder uses unquoted families with spaces
- Raft uses unquoted families with spaces
- Variations uses unquoted families with spaces
- Neve FSE uses unquoted families with spaces
- Mindscape uses unquoted families with spaces
- Vertice uses unquoted families with spaces
I looked through the themes with a minimum of 5,000 installs and found many examples of unquoted families with spaces, but no examples of identifiers with CSS Unicode escape sequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice research! And I agree with your conclusion. We already have real inputs in both forms out there, continuing to support them is kind. Hopefully we won't see many/any escape sequence. Actually, that makes me wonder...
Why couldn't we support unicode escape sequences? If we treated that string value from theme.json as untokenized CSS, we could easily consume it using the CSSProcessor:
<?php use WordPress\DataLiberation\CSS\CSSProcessor; require_once __DIR__ . '/vendor/autoload.php'; $fontFamilies = [ "\"Fira Code\"", "Source Serif Pro", "Open\\20Sans" ]; foreach ( $fontFamilies as $family ) { $processor = CSSProcessor::create( $family ); while ( $processor->next_token() ) { if($processor->get_token_type() === CSSProcessor::TOKEN_WHITESPACE) { echo ' '; } else { echo $processor->get_token_value(); } } echo "\n"; }Outputs:
Fira Code Source Serif Pro Open Sans There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be perfectly clear, my familiarity with the font system in WordPress is limited to this PR. I can't speak to the goals and design of the system.
My intuitive understanding is that it's difficult and error prone to work with a CSS value here instead of a JSON string. Maybe it's the fact that the JSON string has quotes and you need strong familiarity with the system to understand that {"fontFamily": "\"O'Reilly Sans\""} is expected. It's also confusing that font families can be CSS identifiers or CSS strings, so Helvetica and "Helvetica" are equally valid and correct ways of referencing the same font family.
It still really feels like this PR is helpful and will mostly do what folks want, but I may be wrong! I'm happy to be challenged on this.
The other font family
Of course, there's the other fontFamily usage that's also a string of CSS like this (again from the example):
{ "name": "Primary", "slug": "primary", "fontFamily": "Charter, 'Bitstream Charter', 'Sitka Text', Cambria, serif" }Is seems like this could have been an array of unescaped, unquoted, JSON strings and it would be simpler and might avoid the split/sanitize/join issues:
{ // Wrong! This is not how the system works, but maybe it could be? "fontFamily": [ "Charter", "Bitstream Charter", "Sitka Text", "Cambria", "serif", // Uh-oh, that's a problem ], }The problem there would the generic font family serif. As a generic font family, it must be unquoted. But it's also possible to add a font family named "serif" (quoted) which is different from the generic. CSS semantics handle this by differentiating between the string "serif" and the identifier serif.
I'm not sure how to handle that in an array of JSON strings without making things… messy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean this PR makes things better than they were before, perhaps aside of an unlikely escape (\20). All I'm saying is tokenizing CSS would likely be nearly as fast and would free us from tiptoeing around implicit restrictions around commas, quotes, etc. It might not be the thing to do for 6.9, given we're in beta cycle now, but perhaps for 7.0?
"fontFamily": "Charter, 'Bitstream Charter', 'Sitka Text', Cambria, serif"
A CSS Tokenizer would solve for that one as well and correctly distinguish between strings and identifiers. I don't think we can support arbitrarily complex values without a tokenizer – every time we improve how we massage that string value we effectively move closer to tokenization.
The problem there would the generic font family
serif
Again, a tokenizer would help us distinguish between a string "serif" and an identifier serif in regular CSS values, although not in a JSON array.
Lacking one, we could reject "serif" as a custom font-family name and always treat it as an identifier. Perhaps it won't even ever come up? I'd guess it's an unlikely name, especially all lowercase. It seems like there's around 10 generic font family names. We could just list them all and treat them in a special way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My position is that it's actually confusing to work with these snippets of CSS text, and we're bettor off using plain JSON strings, at least in this case of the font-face font family.
In that case, CSS tokenizing isn't helpful. We need to encode for CSS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sirreal oops! I've edited my comment to address JSON syntax more and I've submitted that edit right before you've submitted your comment.
I have mixed feelings about using a JSON array. On one hand, sure, if we're committing to the JSON format let's just embrace it completely. It's less awkward than mixing CSS and JSON syntaxes. On the other, I'm not sure how well that would generalize to other CSS rules with their own subsyntax in the value. If it wouldn't, we'd end up with some keys requiring a CSS string value and some keys requiring a JSON array. I'm also worried about things like calc(), --main-font-family, etc. assuming we want to support them. 🤔
| __( 'Font font-family should not be wrapped in quotes; they will be added automatically.' ), | ||
| '6.9.0' | ||
| ); | ||
| return $font_family; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% satisfied with this. A broken family name could be used, like """, which would result in broken CSS output. The existing implementation has the same issue. This is likely good enough for anything but the most malicious font name.
The font family name """ could be safely used with this implementation as '"""' or preferably something like "\22\22\22".
One improvement here is that the string must start and end with a matching quote character "…" or '…' in order to be treated as a quoted string. This allows fonts to contain those characters without issue and they'll be normalized properly.
It would be nice if the system knew that plain strings were provided and all quoting and normalization of the font family name were handled by the system. I'm not sure that's possible while maintaining backwards compatibility.
tests/phpunit/tests/fonts/font-face/wpFontFace/generateAndPrint.php Outdated Show resolved Hide resolved
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| I'd love to get feedback from folks that worked on Font Face CSS support. Based on history, @hellofromtonya and @oandregal may be candidates. |
| */ | ||
| "\n" => '\\A ', | ||
| '\\' => '\\5C ', | ||
| ',' => '\\2C ', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this one necessary? Not that it's wrong, but we're producing a quoted value anyway. It shouldn't derail any syntax parsing and, within a string, it's seen as a comma either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When printing CSS, the font system checks for , and does a split/join in an effort to normalize the font family names:
wordpress-develop/src/wp-includes/fonts/class-wp-font-utils.php
Lines 67 to 76 in 4d43703
| if ( str_contains( $output, ',' ) ) { | |
| $items = explode( ',', $output ); | |
| foreach ( $items as $item ) { | |
| $formatted_item = self::maybe_add_quotes( $item ); | |
| if ( ! empty( $formatted_item ) ) { | |
| $formatted_items[] = $formatted_item; | |
| } | |
| } | |
| return implode( ', ', $formatted_items ); | |
| } |
It seemed preferable to escape , in the single font family name here to try to avoid being mangled later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah so it's a workaround for the deficiencies of another part of the system. Could we go the other way around and parse the value instead of exploding by ,? The CSS Processor has some code that could help with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Core doesn't have a CSS parser, does it?
This Unicode escaping is at worst harmless and at best better than requiring full blown CSS parsing (even if that would be the most correct thing to do later).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't today, but it could. CSS Processor is just a tokenizer, not a full-blown parser. Or you might want to reuse just the part of it that parses strings, just to avoid the blanket explode() call.
| I've been doing some testing and I have concerns about this approach. This has focused on fonts registered via theme.json, however after reviewing the issue and testing I've realized that font uploads are not processed by the functions that this PR touches. I need to investigate how font uploads are processed. I've tested uploading a font named That case is interesting because we know it's a plain font name (not CSS) so there's no question of whether or not the value has been quoted or escaped already (see #9951 (comment)). |
Ensure that font-family names in the font face class are normalized and correctly encoded for CSS.
This prevents names like
O'Reilly SansorSuisse BP Int'lare correctly encoded and will not break the printed CSS.This includes tests to verify correct and expected behavior. It's not as straightforward to verify that the normalized font works correctly in the browser.
- " rel="nofollow">I've prepared a demonstration to test and compare how these encodings work.
It includes some examples from the test suite and can be used to verify that, for example, these font families match:The behavior can be tested for regressions with the playground. The default
twentytwentyfivetheme will use the font familiesManropeandFira Codethat can be tested on the playground.Before (unquoted): https://playground.wordpress.net/?wp=6.8
After (quoted): https://playground.wordpress.net/?core-pr=9951
Trac ticket: https://core.trac.wordpress.org/ticket/63568
This builds on work and ideas in #8982.
This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.