Google Web Fonts API – categories and subsets

Each font provided by the Google Web Fonts API belongs to one category and comes in one or more subsets.

The category provides a broad grouping of the fonts into styles. These are the categories:

  • Sans Serif
  • Display
  • Serif
  • Handwriting

Each font appears in exactly one of these categories. When you request lists of fonts from the Google API, you cannot filter the list by category. That would be useful when choosing a font from within an application, but the API simply does not support it.

The subset is similar to the category in that you cannot ask the API for a list of fonts matching a subset. It differs in that a font can have many subsets. A subset defines the range of characters for which the font contains a glyph (the visual representation of a character). The default is latin, which covers Western European glyphs. Extended latin (latin-ext) adds many more characters. Here is the current list, but bear in mind that it will keep growing as Google widens its international support.

  • latin
  • latin-ext
  • menu
  • greek
  • greek-ext
  • cyrillic
  • cyrillic-ext
  • vietnamese
  • arabic
  • khmer
  • lao
  • tamil
  • bengali
  • hindi
  • korean

It is important to note that not all of these are available through the Google API at present, but more on this later.

When you request a font from Google for use on a website, you can list which of the above subsets you want included. You can select multiple subsets, and Google will deliver a single font that includes them all, covering the full range of glyphs. However, the more subsets you request, the bigger the download. This can hurt mobile users in particular, so request only the subsets you need.
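
The following is a minimal Python sketch of what such a request looks like. It assembles a URL for the fonts.googleapis.com/css endpoint. All families go into a single "family" parameter separated by "|", with styles after a colon. All subsets go into a single comma-separated "subset" parameter. The function name and the dictionary input format are illustrative choices of my own, not part of any Google library. Family names are not escaped beyond turning spaces into "+".

```python
# Base of the Google Fonts CSS endpoint used in the URLs in this post.
CSS_BASE = "http://fonts.googleapis.com/css"


def build_css_url(families, subsets=None):
    """Assemble a Google Fonts CSS request URL (illustrative helper).

    families: dict mapping a family name to a list of style codes,
              e.g. {"Open Sans": ["700", "300"]}; an empty list means
              no styles are specified for that family.
    subsets:  optional list of subset names, e.g. ["latin", "greek-ext"].
    """
    parts = []
    for name, styles in families.items():
        # Spaces in family names are written as "+"; no other escaping is done.
        part = name.replace(" ", "+")
        if styles:
            # Styles follow a colon and are comma-separated.
            part += ":" + ",".join(styles)
        parts.append(part)
    # All families share one "family" parameter, separated by "|".
    url = CSS_BASE + "?family=" + "|".join(parts)
    if subsets:
        # All subsets share one "subset" parameter, separated by commas.
        url += "&subset=" + ",".join(subsets)
    return url


if __name__ == "__main__":
    print(build_css_url({"Open Sans": ["700", "300"], "PT Sans": ["400", "700"]},
                        ["latin", "cyrillic"]))
```

Each extra entry in the subsets list makes the delivered font bigger, so the list should stay as short as the site allows. For a font such as Khmer, build_css_url({"Khmer": []}, ["khmer"]) requests the font in its Khmer subset.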

A problem arises when choosing a font for a site. If you only need the latin subset, you can select almost any font, as most support latin. Latin extended is also supported by the vast majority. Greek and Cyrillic are less widely supported, so you have fewer fonts to choose from. The remaining subsets are supported by fewer still, some by only around half a dozen fonts. Because the Google API cannot return lists of fonts filtered by subset, you cannot choose a font from within an application based on the subsets that application needs to support.

Edit: it has been pointed out that not all fonts provided by Google have Latin glyphs. For example, the Khmer font is currently only available with Khmer glyphs. To use that font, you must either request the Khmer subset or request the font without any subset filter. That is how the following Khmer text is displayed in the Khmer font:

ញ៉ាំកញ្ចក់បាន ដោយគ្មាន

It should be noted that the Google Fonts website does allow you to filter the fonts by both subset and category, so you can browse your fonts there, note them down, then select those same fonts for your application. You just don’t have that filter in an application of your own that pulls its font information from the Google Web Fonts API.

It should also be noted that most modern browsers cope with missing glyphs in a font by substituting glyphs from other fonts. That may solve the problem for the end user, but it gives the designer less control over how the site looks and how the layout works.

So how do we get around this limitation of the API? How do we filter the font lists by category and subset? It turns out that the data behind the fonts Google delivers is managed in a public Mercurial (hg) repository. You can browse the fonts and all the metadata that describes them, and you can download the repository and pull the data out into your own lists.

The hg repository is over 2.6 GB in size (every font in every subset, format and style takes up a lot of space), so you probably don’t want to download it too often. Unfortunately you cannot download just the metadata and leave the fonts behind; it is all or nothing.

What I have done is download the repository once as a massive one-off download, and keep it regularly updated (which involves many small updates). I then extract the metadata with some hacky scripts and summarise the results in a number of useful ways. The scripts that produce this end result can be found in the git repository linked below. Updates to that repository are not automated yet, but will be soon. If you want to access the metadata as JSON data files, that is where to get them.
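
The scripts mentioned above are not reproduced here. As an independent illustration of the idea, here is a minimal Python sketch that walks a local checkout and groups font families by category and by subset. It assumes that each family directory contains a METADATA.json file with "name", "category" and "subsets" keys. That layout is an assumption to verify against your own checkout, and the function name is hypothetical.

```python
import json
import os
from collections import defaultdict


def summarise_metadata(repo_root):
    """Group font families by category and by subset (hypothetical helper).

    ASSUMPTION: each family directory under repo_root holds a METADATA.json
    file with "name", "category" and "subsets" keys. Check this against
    your own checkout before relying on it.
    """
    by_category = defaultdict(set)
    by_subset = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        if "METADATA.json" not in filenames:
            continue
        path = os.path.join(dirpath, "METADATA.json")
        with open(path, encoding="utf-8") as fh:
            meta = json.load(fh)
        # Fall back to the directory name if no "name" key is present.
        name = meta.get("name", os.path.basename(dirpath))
        by_category[meta.get("category", "unknown")].add(name)
        for subset in meta.get("subsets", []):
            by_subset[subset].add(name)
    # Sorted lists (rather than sets) serialise cleanly to JSON.
    return (
        {cat: sorted(names) for cat, names in by_category.items()},
        {sub: sorted(names) for sub, names in by_subset.items()},
    )
```

The two dictionaries can be written out with json.dump, which gives one JSON file listing the fonts per category and another listing them per subset. Those are exactly the two filters the API does not offer.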

The hg repository is Google’s work area. It is where the font lists are collected and constructed before being released publicly through the API. One consequence is that the repository will always be ahead of what the API supports: there will usually be more fonts, more subsets and more styles than you can get through the API. But at least you know they will be arriving at some point, so you can plan ahead.

https://github.com/academe/GoogleFontMetadata

Note: I realise that the subsets are listed against each font when you request a list of fonts from the API, so you could use that information to filter them locally. The category is still not listed though.
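
Here is a minimal Python sketch of that local filtering. It takes the parsed JSON returned by the Web Fonts Developer API list endpoint (https://www.googleapis.com/webfonts/v1/webfonts?key=YOUR_API_KEY) and keeps only the families whose "subsets" array contains every subset you need. It assumes the response has an "items" array whose entries carry "family" and "subsets" fields. Fetching the JSON, which needs an API key, is left out. The function name is hypothetical, and the sample data is made up for illustration.

```python
import json


def families_with_subsets(webfont_list, required):
    """Return the families whose "subsets" include every subset in `required`.

    `webfont_list` is the parsed JSON from the Web Fonts Developer API list
    endpoint (assumed shape: {"items": [{"family": ..., "subsets": [...]}, ...]}).
    """
    needed = set(required)
    return [
        item["family"]
        for item in webfont_list.get("items", [])
        # set.issubset() accepts any iterable, so the JSON list works directly.
        if needed.issubset(item.get("subsets", []))
    ]


if __name__ == "__main__":
    # Made-up sample data in the assumed response shape.
    sample = json.loads("""
    {"kind": "webfonts#webfontList", "items": [
      {"family": "Sample Khmer", "subsets": ["khmer"]},
      {"family": "Sample Sans", "subsets": ["latin", "latin-ext", "greek", "cyrillic"]},
      {"family": "Sample Serif", "subsets": ["latin", "cyrillic"]},
      {"family": "Sample Display", "subsets": ["latin"]}
    ]}
    """)
    print(families_with_subsets(sample, ["latin", "cyrillic"]))
    # prints ['Sample Sans', 'Sample Serif']
```

This covers subsets only. Filtering by category still needs another source, such as the repository metadata.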

———–

Are there ways of analysing the metadata that I have not covered, and would perhaps like me to include in the scripts? Is it worth me creating an API for downloading this analysed metadata for your applications, at least until Google provide support for such filters? Would you use such an API? Would you trust it? Let me know what you think.

4 Responses to Google Web Fonts API – categories and subsets

  1. Alexander 2014-01-06 at 16:01

    Do you have any idea how to include multiple subsets when you are loading multiple fonts in a single line? Example:

    http://fonts.googleapis.com/css?family=Open+Sans:700,300|PT+Sans:400,700

    But I have no idea how to include multiple subsets.

    • Jason Judge 2014-01-12 at 17:39

      The font families all go into the “family” parameter, each separated by a bar (|). The subsets all go into the “subset” parameter, each separated by a comma.

      An example:

      http://fonts.googleapis.com/css?family=Advent+Pro:100,r,b|Alegreya:r,ri,b,bi&subset=latin,greek-ext

      This would load two fonts in various weights, each in both latin and greek-ext (extended) subsets. That is assuming those subsets are available for those fonts.

  2. Patrick 2014-03-10 at 22:12

    I filtered for the Khmer font in the Latin subset but this returned no results.
    However, you mention that all fonts are part of the Latin subset?

    • Jason Judge 2014-03-15 at 19:58

      Yes, it looks like the Khmer font is only available in the Khmer subset. You can request that font in the Latin subset in a page, but you won’t get it. You can display Latin characters using that font, but the browser will fall back to another font to do the actual rendering. Browsers tend to look for a “best fit” font when a subset is not available in the font you are trying to display.

      The Google Fonts page for Khmer lists only one selectable character set, also called Khmer.

      I’ll correct the main article. Thanks.
