GSoC'21: Introduction at Matplotlib
- 4 minsTable of Contents:
- About Me
- First Contributions
- Getting started with Matplotlib
- Learning throughout the process
- Where does GSoC fit?
- About the Project
The day of result, was a very, very long day.
With this small writeup, I intend to talk about everything before that day, my experiences, my journey, and the role of Matplotlib throughout!
About Me
I am a third-year undergraduate student currently pursuing a Dual Degree (B.Tech + M.Tech) in Information Technology at Indian Institute of Information Technology, Gwalior.
During my sophomore year, my interests started expanding in the domain of Machine Learning, where I learnt about various amazing open-source libraries like NumPy, SciPy, pandas, and Matplotlib! Gradually, in my third year, I explored the field of Computer Vision during my internship at a startup, where a big chunk of my work was to integrate their native C++ codebase to Android via JNI calls.
To actuate my learnings from the internship, I worked upon my own research along with a friend from my university. The paper was accepted in CoDS-COMAD’21 and is published at ACM Digital Library. (Link, if anyone’s interested)
During this period, I also picked up the knack for open-source and started glaring at various issues (and pull requests) in libraries, including OpenCV [contributions] and NumPy [contributions].
I quickly got involved in Matplotlib’s community; it was very welcoming and beginner-friendly.
Fun fact: Its dev call was the very first I attended with people from all around the world!
First Contributions
We all mess up, my very first PR to an organisation like OpenCV went horrible, till date, it looks like this:
In all honesty, I added a single commit with only a few lines of diff.
However, I pulled all the changes from upstream
master
to my working branch, whereas the PR was to be made on3.4
branch.
I’m sure I could’ve done tons of things to solve it, but at that time I couldn’t do anything - imagine the anxiety!
At this point when I look back at those fumbled PRs, I feel like they were important for my learning process.
Fun Fact: Because of one of these initial contributions, I got a shiny little badge [Mars 2020 Helicopter Contributor] on GitHub!
Getting started with Matplotlib
It was around initial weeks of November last year, I was scanning through Good First Issue
and New Feature
labels, I realised a pattern - most Mathtext related issues were unattended.
To make it simple, Mathtext is a part of Matplotlib which parses mathematical expressions and provides TeX-like outputs, for example:
I scanned the related source code to try to figure out how to solve those Mathtext issues. Eventually, with the help of maintainers reviewing the PRs and a lot of verbose discussions on GitHub issues/pull requests and on the Gitter channel, I was able to get my initial PRs merged!
Learning throughout the process
Most of us use libraries without understanding the underlining structure of them, which sometimes can cause downstream bugs!
While I was studying Matplotlib’s architecture, I figured that I could use the same ideology for one of my own projects!
Matplotlib uses a global dictionary-like object named as rcParams
, I used a smaller interface, similar to rcParams, in swi-ml - a small Python library I wrote, implementing a subset of ML algorithms, with a switchable backend.
Where does GSoC fit?
It was around January, I had a conversation with one of the maintainers (hey Antony!) about the long-list of issues with the current ways of handling texts/fonts in the library.
After compiling them into an order, after few tweaks from maintainers, GSoC Idea-List for Matplotlib was born. And so did my journey of building a strong proposal!
About the Project
Proposal Link: Google Docs (will stay alive after GSoC), GSoC Website (not so sure)
Revisiting Text/Font Handling
The aim of the project is divided into 3 subgoals:
-
Font-Fallback: A redesigned text-first font interface - essentially parsing all family before rendering a “tofu”.
(similar to specifying font-family in CSS!)
-
Font Subsetting: Every exported PS/PDF would contain embedded glyphs subsetted from the whole font.
(imagine a plot with just a single letter “a”, would you like it if the PDF you exported from Matplotlib to embed the whole font file within it?)
-
Most mpl backends would use the unified TeX exporting mechanism
Mentors Thomas A Caswell, Antony Lee, Hannah.
Thanks a lot for spending time reading the blog! I’ll be back with my progress in subsequent posts.