GSoC'21: Final Report
Table of Contents:
- About Matplotlib
- Font Subsetting
- Font Fallback
To kick things off for the final report, here’s a meme as a nudge to revisit the previous blogs.
About Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations, which has become a de facto Python plotting library.
Much of the implementation behind its font manager is inspired by W3C-compliant algorithms, allowing users to interact with font properties such as family, style, weight, and size.
However, the way Matplotlib handled fonts and general text layout was not ideal, and improving that is what Summer 2021 was all about.
By “not ideal”, I do not mean that the library has design flaws, but that the design was engineered in the early 2000s, and is now outdated.
About the Project
(PS: here’s the link to my GSoC proposal, if you’re interested)
Overall, the project was divided into two major subgoals:
- Font Subsetting
- Font Fallback
But before we take each of them on, we should get an idea of some basic font terminology (there are a lot of terms, and they are rightly confusing).
The PR: Clarify/Improve docs on family-names vs generic-families brings about a bit of clarity about some of these terms. The next section has a linked PR which also explains the types of fonts and how that is relevant to Matplotlib.
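To make the family-name vs. generic-family distinction concrete, here’s a minimal Python sketch. It assumes the CSS-like behaviour where a generic family such as "sans-serif" stands for a configurable, ordered list of concrete family names (in Matplotlib, those lists live in rcParams such as font.sans-serif). The function expand_families and the sample font lists are hypothetical, for illustration only.

```python
# Generic family names, as defined by CSS and mirrored by Matplotlib's rcParams.
GENERIC_FAMILIES = {"serif", "sans-serif", "cursive", "fantasy", "monospace"}


def expand_families(families, generic_map):
    """Replace each generic family with its list of concrete family names.

    `families` is an ordered list like the font.family setting; `generic_map`
    maps a generic name (e.g. "sans-serif") to concrete family names.
    Concrete family names are passed through unchanged, in order.
    """
    expanded = []
    for family in families:
        key = family.lower()
        if key in GENERIC_FAMILIES:
            expanded.extend(generic_map.get(key, []))
        else:
            expanded.append(family)
    return expanded


# Hypothetical configuration, for illustration only.
generic_map = {"sans-serif": ["DejaVu Sans", "Arial"]}
print(expand_families(["Comic Neue", "sans-serif"], generic_map))
# ['Comic Neue', 'DejaVu Sans', 'Arial']
```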
Font Subsetting
An easy-to-read guide on Fonts and Matplotlib was created with PR: [Doc] Font Types and Font Subsetting, which is currently live at Matplotlib’s DevDocs.
Taking an excerpt from one of my previous blogs (and the doc):
Fonts can be considered a collection of glyphs (the shapes used to draw characters), so ultimately the goal of subsetting is to find out which glyphs are needed for a certain array of characters, and embed only those within the output.
PDF, PS/EPS and SVG output document formats are special, in that the text within them can be editable, i.e., one can copy/search text from documents (e.g., from a PDF file) if the text is editable.
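The core question of subsetting (which glyphs does this text need?) can be sketched in a few lines of Python. This is a toy model, not Matplotlib’s code: the cmap dictionary is a hypothetical stand-in for a real font’s character map (code point to glyph index), and glyph 0 is always kept because, by convention, it is the .notdef glyph drawn for missing characters.

```python
def glyphs_for_text(text, cmap):
    """Return the sorted glyph indices needed to render `text`.

    `cmap` maps Unicode code points to glyph indices (a toy stand-in for a
    font's cmap table). Characters missing from the font map to glyph 0,
    the conventional .notdef glyph, which is always included in the subset.
    """
    needed = {0}
    for ch in text:
        needed.add(cmap.get(ord(ch), 0))
    return sorted(needed)


# Toy font covering only a few lowercase letters.
toy_cmap = {ord("a"): 1, ord("b"): 2, ord("c"): 3, ord("z"): 26}
print(glyphs_for_text("abba", toy_cmap))  # [0, 1, 2]
```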
Matplotlib and Subsetting
The PDF, PS/EPS and SVG backends used to support font subsetting only for a few font types. That means that before Summer ’21, Matplotlib could generate Type 3 subsets for the PDF and PS/EPS backends, but it could not generate Type 42 / TrueType subsets.
With PR: Type42 subsetting in PS/PDF merged in, users can expect their PDF/PS/EPS documents to contain subsetted glyphs from the original fonts.
This is especially beneficial for people who wish to use commercial (or CJK) fonts. Licenses for many fonts require subsetting, so that the fonts can’t be trivially copied out of the output files generated by Matplotlib.
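Once the needed glyphs are known, embedding a subset means copying only those glyph outlines into the output file instead of the whole font. The sketch below assumes a hypothetical toy font whose outlines are plain byte strings keyed by glyph index; real Type 42 subsetting works on TrueType tables and is considerably more involved.

```python
def subset_outlines(outlines, needed):
    """Keep only the outlines for glyph indices in `needed`.

    `outlines` maps glyph index -> outline data (toy byte strings here).
    Glyph indices are preserved, so text that refers to them still works.
    """
    return {gid: outlines[gid] for gid in sorted(needed) if gid in outlines}


# A toy "font" with 1,000 glyphs of 100 bytes each.
full_font = {gid: bytes(100) for gid in range(1000)}
subset = subset_outlines(full_font, {0, 1, 2})

full_size = sum(len(data) for data in full_font.values())
subset_size = sum(len(data) for data in subset.values())
print(full_size, subset_size)  # 100000 300
```

Only the glyphs that were actually used end up in the file, which is what keeps the output small for fonts with very large glyph sets and keeps the full font from being extracted.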
Font Fallback
Matplotlib was designed to work with a single font at runtime. A user could specify a font.family, which was supposed to correspond to the CSS property of the same name, but it was only used to find a single font present on the user’s system.
Once that font was found (and one almost always is, since Matplotlib ships with a set of default fonts), all of the user’s text was rendered through that font alone (which used to produce a placeholder box if a character wasn’t found).
It might seem like an outdated approach to text rendering now that we have concepts like font fallback, but back when Matplotlib’s design took shape, even getting a single font to work was considered a hard engineering problem.
This was primarily because of the lack of any standardization for representation of fonts (Adobe had their own font representation, and so did Apple, Microsoft, etc.)
To migrate from a font-first approach to a text-first approach, there are multiple steps involved:
Parsing the whole font family
The very first (and crucial!) step is to get to a point where we have multiple font paths (ideally individual font files for the whole family). That is achieved with either:
- PR: [with findfont diff] Parsing all families in font_manager, or
- PR: [without findfont diff] Parsing all families in font_manager
Quoting one of my previous blogs:
Don’t break, a lot at stake!
My first approach was to change the existing public findfont API to incorporate multiple filepaths. Since Matplotlib has a huge userbase, there was a high chance that this would break a chunk of people’s workflows.
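One way to get multiple font paths without breaking existing callers is to add a new list-returning lookup and keep the old single-path function as a thin wrapper around it. This is a hypothetical sketch: find_font_paths, find_font and the font_index mapping are illustrative names, not Matplotlib’s actual API, and the property-based scoring a real font manager performs is skipped.

```python
def find_font_paths(families, font_index, default_path):
    """Return a de-duplicated list of font file paths, one per known family.

    `font_index` maps a family name to a font file path (a toy stand-in for
    a real font manager's database). If nothing matches, the default font is
    used so that callers always get at least one path.
    """
    paths = []
    for family in families:
        path = font_index.get(family)
        if path is not None and path not in paths:
            paths.append(path)
    return paths or [default_path]


def find_font(family, font_index, default_path):
    """Old-style API: still returns exactly one path, so existing code keeps working."""
    return find_font_paths([family], font_index, default_path)[0]


# Hypothetical paths, for illustration only.
index = {"DejaVu Sans": "/fonts/DejaVuSans.ttf", "Noto Sans CJK JP": "/fonts/NotoSansCJK.ttc"}
print(find_font_paths(["DejaVu Sans", "Noto Sans CJK JP"], index, "/fonts/DejaVuSans.ttf"))
# ['/fonts/DejaVuSans.ttf', '/fonts/NotoSansCJK.ttc']
print(find_font("Missing Font", index, "/fonts/DejaVuSans.ttf"))
# /fonts/DejaVuSans.ttf
```

The design choice here is that the public contract (one path in, one path out) stays untouched, while new internal code can ask for the whole list.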
Once we get a list of font paths, we need to change the internal representation of a “font”. Matplotlib has a utility called FT2Font, which is written in C++, and used with wrappers as a Python extension, which in turn is used throughout the backends. For all intents and purposes, it used to mean:
FT2Font === SingleFont (if you’re interested, here’s a meme about how FT2Font was named!)
But that is no longer the case; here’s a flowchart to explain what happens now:
With PR: Implement Font-Fallback in Matplotlib, every FT2Font object has a std::vector<FT2Font *> fallback_list, which is used for filling the parent cache, as can be seen in the self-explanatory flowchart.
For simplicity, only one of the caches is shown, whereas the actual implementation has two: the one shown above, and another one for glyphs.
Note: Some backends only use the parent’s APIs, so for each individual public function (like get_kerning), we use the parent FT2Font’s cache to find the FT2Font object that actually has the glyph!
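Delegating a parent-level call such as get_kerning to the font that owns the glyph can be sketched in Python. Two assumptions to flag: this toy get_kerning takes characters instead of glyph indices (which keeps it independent of per-font glyph numbering), and it returns 0 when the two characters come from different fonts, since a font’s kerning table only describes pairs within that font. ToyFont and its attributes are hypothetical.

```python
class ToyFont:
    """Stand-in for FT2Font: a character set, a kerning table, and fallbacks."""

    def __init__(self, name, charset, kerning=None, fallback_list=()):
        self.name = name
        self.charset = set(charset)
        self.kerning = dict(kerning or {})  # (left_char, right_char) -> adjustment
        self.fallback_list = list(fallback_list)
        self.char_to_font = {}  # parent cache, filled lazily

    def font_for_char(self, ch):
        """Find (and cache) the first font, parent included, that has `ch`."""
        if ch not in self.char_to_font:
            owner = next(
                (f for f in [self, *self.fallback_list] if ch in f.charset), self
            )
            self.char_to_font[ch] = owner
        return self.char_to_font[ch]

    def get_kerning(self, left, right):
        """Parent-level API: find which font owns each character, then delegate."""
        left_font = self.font_for_char(left)
        right_font = self.font_for_char(right)
        if left_font is not right_font:
            return 0  # assumption: no kerning across different fonts
        return left_font.kerning.get((left, right), 0)


cjk = ToyFont("ToyCJK", "世界", kerning={("世", "界"): -2})
latin = ToyFont("ToyLatin", "AV", kerning={("A", "V"): -5}, fallback_list=[cjk])
print(latin.get_kerning("A", "V"), latin.get_kerning("世", "界"), latin.get_kerning("A", "世"))
# -5 -2 0
```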
Multi-Font embedding in PDF/PS/EPS
Now that we have multiple fonts to render a string, we also need to embed them for those special backends (i.e., PDF/PS, etc.). This was done with some patches to specific backends:
- PR: Implement multi-font embedding for PDF Backend
- PR: Implement multi-font embedding for PS Backend
With this, one could create a PDF or a PS/EPS document with multiple fonts which are embedded (and subsetted!).
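Conceptually, embedding multiple fonts means splitting the characters of the rendered text by the font that draws them, and then subsetting and embedding each font separately. Here is a hypothetical Python sketch of that grouping step, where toy_resolve stands in for the fallback lookup (it is not a Matplotlib function).

```python
from collections import defaultdict


def chars_per_font(text, resolve):
    """Group the characters of `text` by the font that renders them.

    `resolve` maps a character to a font name (a stand-in for the fallback
    lookup). Each resulting group is what one embedded subset must cover.
    """
    groups = defaultdict(set)
    for ch in text:
        groups[resolve(ch)].add(ch)
    # Sorted strings make the result deterministic and easy to read.
    return {font: "".join(sorted(chars)) for font, chars in groups.items()}


def toy_resolve(ch):
    # Toy rule: CJK Unified Ideographs go to a CJK font, everything else to Latin.
    return "ToyCJK" if "\u4e00" <= ch <= "\u9fff" else "ToyLatin"


groups = chars_per_font("Hello 世界", toy_resolve)
# groups == {'ToyLatin': ' Helo', 'ToyCJK': '世界'}
```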
From small contributions to eventually working on a core module of such a huge library, the road was not what I had imagined, and I learnt a lot while designing solutions to these problems.
The work I did would eventually end up affecting every single Matplotlib user.
…since all plots will work their way through the new codepath!
I think that single statement makes it all worth it.
Pull Request Statistics
For the sake of statistics (and to make GSoC sound a bit less intimidating), here’s a list of contributions I made to Matplotlib before GSoC, most of which are only a few lines of diff:

| Created At | PR Title | Diff | Status |
|---|---|---|---|
| Nov 2, 2020 | Expand ScalarMappable.set_array to accept array-like inputs | (+28 −4) | MERGED |
| Nov 8, 2020 | Add overset and underset support for mathtext | (+71 −0) | MERGED |
| Nov 14, 2020 | Strictly increasing check with test coverage for streamplot grid | (+54 −2) | MERGED |
| Jan 11, 2021 | WIP: Add support to edit subplot configurations via textbox | (+51 −11) | DRAFT |
| Jan 18, 2021 | Fix over/under mathtext symbols | (+7,459 −4,169) | MERGED |
| Feb 11, 2021 | Add overset/underset whatsnew entry | (+28 −17) | MERGED |
| May 15, 2021 | Warn user when mathtext font is used for ticks | (+28 −0) | MERGED |

Here’s a list of PRs I opened during GSoC:
- [Status: ✅] Clarify/Improve docs on family-names vs generic-families
- [Status: ✅] Add parse_math in Text and default it False for TextBox
- [Status: ✅] Type42 subsetting in PS/PDF
- [Status: ✅] [Doc] Font Types and Font Subsetting
- [Status: 🚧] [with findfont diff] Parsing all families in font_manager
- [Status: 🚧] [without findfont diff] Parsing all families in font_manager
- [Status: 🚧] Implement Font-Fallback in Matplotlib
- [Status: 🚧] Implement multi-font embedding for PDF Backend
- [Status: 🚧] Implement multi-font embedding for PS Backend
From learning software engineering fundamentals from Tom to learning the nitty-gritty details of font representations from Jouni;
From learning through Antony’s patches and pointers to receiving amazing feedback on these blogs from Hannah, it has been an adventure! 💯
Special Mentions: Frank, Srijan and Atharva for their helping hands!
And lastly, you, the reader; if you’ve been following my previous blogs, or if you’ve landed at this one directly, I thank you nevertheless. (one last meme, I promise!)
I know I speak for every developer out there when I say that it means a lot when you choose to look at their journey or their work product, whether it’s a tiny website or something as big as designing a complete library!
I’m grateful to Matplotlib (under the parent organisation: NumFOCUS), and of course, Google Summer of Code for this incredible learning opportunity.
Farewell, reader! :’)