About RDFa Lite, Microdata, and Microformats

Why I am switching from microformats to RDFa Lite.

Webpages are written for humans. As humans, we understand articles in their greater context. We understand that a recently published piece about the pandemic is about COVID-19, and that an article about the newly elected president of France is about Emmanuel Macron, even if neither mentions these names. This is harder for computers. Humans also understand that a piece about a pandemic or the French elections published in a different context is about something else entirely. How do we explain these differences to computers? How does a computer understand what a webpage is about? This is linked data at its core: making data machine-readable.

There are different ways to do this. AI and machine learning play an important role nowadays. These are top-down approaches: they try to make sense of unstructured data. Linked data is the opposite, a bottom-up approach: we add structure to our own data so computers can understand it more easily. RDFa, microdata, and microformats are three ways to do this.

Different Formats, Same Goal

In I started sprinkling microformats all over this blog. Microformats gained adoption, and when version 2 was published, I updated my blog to use the new vocabulary. Meanwhile, over at work, I started playing with microdata, a similar idea. Microdata is more verbose, but it is typically paired with the more extensive schema.org vocabulary. Both formats can describe a person, but microformats have no way to mark up an FAQ, to name one example.

<p class="h-card">
  <a class="p-name u-url" href="">
    Jane Doe
  </a>
</p>
↑ Example 1: Structured data about a person in microformats.
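To see what machine-readable means in practice, here is a minimal sketch of how a consumer might read the h-card from Example 1, using only Python's standard-library `html.parser`. It is a simplification, not the official microformats2 parsing algorithm: `p-*` classes take the element's text, `u-*` classes take its `href`, and nested microformats and void elements are ignored. The `HCardParser` name is hypothetical, and since Example 1 leaves its `href` empty, the sketch uses the placeholder `https://example.com/`.

```python
from html.parser import HTMLParser

# Elements that never get an end tag; skipped to keep the stack balanced.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HCardParser(HTMLParser):
    """Collect p-* and u-* properties from the first h-card in a document.

    Simplified sketch (assumption): not the full microformats2 parser.
    Assumes well-formed markup where every non-void start tag is closed.
    """

    def __init__(self):
        super().__init__()
        self.card = {}     # property name -> list of values
        self.stack = []    # (p-* names, text chunks) per open element in the card
        self.done = False  # set once the first h-card has been closed

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if not self.stack and "h-card" not in classes:
            return  # not inside an h-card yet
        for cls in classes:
            # u-* properties are URLs, read from the href attribute
            if cls.startswith("u-") and attrs.get("href") is not None:
                self.card.setdefault(cls[2:], []).append(attrs["href"])
        # p-* properties are plain text, known only once the element closes
        p_names = [cls[2:] for cls in classes if cls.startswith("p-")]
        self.stack.append((p_names, []))

    def handle_data(self, data):
        for _, chunks in self.stack:
            chunks.append(data)  # text belongs to every enclosing element

    def handle_endtag(self, tag):
        if self.done or not self.stack or tag in VOID_TAGS:
            return
        p_names, chunks = self.stack.pop()
        text = " ".join("".join(chunks).split())  # collapse whitespace
        for name in p_names:
            self.card.setdefault(name, []).append(text)
        if not self.stack:
            self.done = True


parser = HCardParser()
parser.feed("""
<p class="h-card">
  <a class="p-name u-url" href="https://example.com/">
    Jane Doe
  </a>
</p>
""")
parser.close()
print(parser.card)  # {'url': ['https://example.com/'], 'name': ['Jane Doe']}
```

Note how everything hinges on class names: rename or drop `p-name` during a CSS refactor and the name silently disappears from the result.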
<section itemscope itemtype="">
  <article itemprop="mainEntity" itemscope itemtype="">
    <h1 itemprop="name">Who directed the 1972 film “The Godfather”?</h1>
    <span itemprop="acceptedAnswer" itemscope itemtype="">
      <p itemprop="text">
        Francis Ford Coppola directed the film. He also co-wrote the screenplay with Mario Puzo, based on Puzo’s novel.
      </p>
    </span>
  </article>
</section>
↑ Example 2: Structured data about an FAQ in microdata.
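Microdata is just as easy to consume. This standard-library Python sketch turns `itemscope`/`itemprop` markup into nested dictionaries, roughly the structure a search engine would extract from Example 2. It simplifies the real microdata algorithm: `itemref` and `itemid` are ignored, and only `meta[content]`, `link[href]`, and `a[href]` get special value handling, while every other element contributes its text. The `https://schema.org/FAQPage`, `Question`, and `Answer` item types are my assumption, since Example 2 leaves `itemtype` empty, and `MicrodataParser` and `add_value` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never get an end tag; they are handled on the start tag only.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def add_value(item, name, value):
    """Store a property value, turning repeated properties into a list."""
    if name not in item:
        item[name] = value
    elif isinstance(item[name], list):
        item[name].append(value)
    else:
        item[name] = [item[name], value]


class MicrodataParser(HTMLParser):
    """Turn itemscope/itemprop markup into nested dicts (simplified sketch)."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items: itemscope without itemprop
        self.scopes = []  # currently open items, innermost last
        self.stack = []   # per open element: (props, owner item, text chunks or None, opened an item?)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        props = (attrs.get("itemprop") or "").split()
        owner = self.scopes[-1] if self.scopes else None

        if tag in VOID_TAGS:
            # <meta itemprop content> or <link itemprop href>: value is an attribute
            value = attrs.get("content") if tag == "meta" else attrs.get("href")
            if owner is not None and value is not None:
                for name in props:
                    add_value(owner, name, value)
            return

        opened, chunks = "itemscope" in attrs, None
        if opened:
            item = {"@type": attrs.get("itemtype") or None}
            if props and owner is not None:
                for name in props:
                    add_value(owner, name, item)  # nested item as a property value
            else:
                self.items.append(item)
            self.scopes.append(item)
        elif props and owner is not None:
            if tag == "a" and attrs.get("href") is not None:
                for name in props:
                    add_value(owner, name, attrs["href"])
            else:
                chunks = []  # value is the element's text, complete at the end tag
        self.stack.append((props, owner, chunks, opened))

    def handle_data(self, data):
        for _, _, chunks, _ in self.stack:
            if chunks is not None:
                chunks.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        props, owner, chunks, opened = self.stack.pop()
        if opened:
            self.scopes.pop()
        if chunks is not None:
            text = " ".join("".join(chunks).split())  # collapse whitespace
            for name in props:
                add_value(owner, name, text)


parser = MicrodataParser()
parser.feed("""
<section itemscope itemtype="https://schema.org/FAQPage">
  <article itemprop="mainEntity" itemscope itemtype="https://schema.org/Question">
    <h1 itemprop="name">Who directed the 1972 film “The Godfather”?</h1>
    <span itemprop="acceptedAnswer" itemscope itemtype="https://schema.org/Answer">
      <p itemprop="text">Francis Ford Coppola.</p>
    </span>
  </article>
</section>
""")
parser.close()
print(parser.items[0]["mainEntity"]["acceptedAnswer"]["text"])  # Francis Ford Coppola.
```

The explicit `itemscope`/`itemprop` attributes make the nesting unambiguous: each `itemscope` opens a new item, and each `itemprop` attaches a value to the nearest enclosing one.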

The third format, RDFa, is again similar: a way to augment HTML with machine-readable hints about what the copy is about. It can use the same vocabulary as microdata, but it is more extensible: a document can easily use a different vocabulary, or even several vocabularies at once. While microdata is limited to HTML documents, RDFa can also be used in XML-based formats like SVG and Atom. RDFa can be more complex, but there is a simplified subset, RDFa Lite, which gets by with five attributes (vocab, typeof, property, resource, and prefix) and seems to be just fine for a website like mine.

<section vocab="https://schema.org/" typeof="Book">
  <h1 property="name">1984</h1>
  <p property="author">George Orwell</p>
  <p property="isbn">
    <a href="" property="mainEntityOfPage">
    </a>
  </p>
  <p property="review" typeof="Review">
    <span property="reviewRating" typeof="Rating">
      <meta property="bestRating" content="5">
      <meta property="ratingValue" content="5">
    </span>
  </p>
</section>
↑ Example 3: Book entry from my reading list in RDFa Lite.
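Because RDFa is built on RDF, a consumer ends up with (subject, predicate, object) triples rather than a tree. This standard-library Python sketch extracts triples from a trimmed version of Example 3, assuming `vocab="https://schema.org/"` is set on the `<section>` and leaving out the ISBN and link. It is a simplification, not a conforming RDFa 1.1 processor: `prefix` declarations are not resolved (prefixed terms are kept as-is), `resource` only names a node or supplies a value, and datatypes and language tags are ignored. `RDFaLiteParser`, the `_:page` document subject, and the `_:b0`-style blank node labels are hypothetical choices of mine.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Elements that never get an end tag; they are handled on the start tag only.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class RDFaLiteParser(HTMLParser):
    """Emit (subject, predicate, object) triples from RDFa Lite attributes.

    Simplified sketch (assumption), not a conforming RDFa 1.1 processor.
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        self.count = 0  # counter for blank node labels _:b0, _:b1, ...
        # per open element: (vocab, current subject, props awaiting text or None, text chunks)
        self.stack = [(None, "_:page", None, [])]

    @staticmethod
    def expand(term, vocab):
        """Turn a term like 'name' into a full IRI using the active vocab."""
        return term if ":" in term or vocab is None else vocab + term

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        vocab, subject, _, _ = self.stack[-1]
        vocab = attrs.get("vocab", vocab)
        props = [self.expand(p, vocab) for p in (attrs.get("property") or "").split()]

        if "typeof" in attrs:
            # typeof starts a new node; property (if any) links the parent to it
            node = attrs.get("resource")
            if node is None:
                node = f"_:b{self.count}"
                self.count += 1
            for type_name in (attrs.get("typeof") or "").split():
                self.triples.append((node, RDF_TYPE, self.expand(type_name, vocab)))
            for p in props:
                self.triples.append((subject, p, node))
            props, subject = [], node

        # a value in content, href or resource wins over the element's text
        value = attrs.get("content", attrs.get("href", attrs.get("resource")))
        if props and value is not None:
            for p in props:
                self.triples.append((subject, p, value))
            props = []

        if tag not in VOID_TAGS:
            self.stack.append((vocab, subject, props or None, []))

    def handle_data(self, data):
        for _, _, props, chunks in self.stack:
            if props:
                chunks.append(data)  # only elements still waiting for a text value

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or len(self.stack) == 1:
            return  # never pop the document-level context
        _, subject, props, chunks = self.stack.pop()
        if props:
            text = " ".join("".join(chunks).split())  # collapse whitespace
            for p in props:
                self.triples.append((subject, p, text))


parser = RDFaLiteParser()
parser.feed("""
<section vocab="https://schema.org/" typeof="Book">
  <h1 property="name">1984</h1>
  <p property="author">George Orwell</p>
  <p property="review" typeof="Review">
    <span property="reviewRating" typeof="Rating">
      <meta property="bestRating" content="5">
      <meta property="ratingValue" content="5">
    </span>
  </p>
</section>
""")
parser.close()
for triple in parser.triples:
    print(triple)
```

The `vocab` attribute is what turns short terms like `name` into full IRIs such as `https://schema.org/name`; swapping or mixing vocabularies only changes how those terms expand, which is where RDFa's extensibility comes from.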

What To Choose?

So, which one should you use: microformats, microdata, or RDFa Lite? To add to the complexity, there is also JSON-LD, which I haven’t even looked at yet.

While I initially liked the “you simply add special CSS class names” idea from microformats, I now prefer the more explicit microdata or RDFa attributes. In a larger project with multiple developers, it’s too easy to accidentally remove a class name; dedicated attributes at least signal that they serve a special purpose. Microformats are more limited in scope, and I have a nagging feeling their glory days are over. Microdata and RDFa Lite seem similar, but I can see how investing time in learning RDF might pay off in the long run, since it can be used more broadly. And what about microdata being retired, until it wasn’t? Lastly, this blog has always been a place to experiment with and learn about different web technologies, and RDFa is a good excuse to do so.

That’s why I am slowly moving my blog from microformats to RDFa Lite, starting with my reading list as a first exercise.