Solutions to implement Microdata + Schema.org in Drupal

This tutorial was tested only on Drupal 7.

Drupal is one of the best CMSs available right now. It has great capabilities by default, and if it lacks something you can extend it with the many modules available, ranging from APIs to themes. However, although Drupal has improved enormously, there are still some things left to improve. One of them has to do with the RDFa module. This module is used to define machine-readable data (often called rich snippets), but it has some problems with HTML5 themes. Unfortunately, there is no replacement for this module, and no module solves the problem completely. Nonetheless, there are other simple ways to define rich snippets. Have you heard about Microdata and Schema.org? Microdata is a specification introduced with HTML5.

There are two ways to use Microdata + Schema.org to define rich snippets and get rid of the RDFa module. The first one is to use the Microdata module for Drupal, which can attach Schema.org datatypes to content types and fields. However, it does not support some datatypes yet and it might still be unstable (you had better test it yourself). The second one is to add or modify templates in order to set the rich snippets yourself. It is more work, but you have more control. You can also do more things, such as setting hidden data or reusing information that is already displayed. Besides, you are not supposed to change or define rich snippets all the time, so why not do it by templating?

First alternative: Microdata (module)

There is a module available for Drupal that can apply Microdata to content types and their fields. It does not support some datatypes yet, and it is somewhat rudimentary, but it is still a good way to define rich snippets with Microdata. It works directly with the content types, so if you need to define anything beyond them, I am afraid you cannot do it with this module. The datatypes are typed directly into a text input (no select box or anything else), and the input is not validated. Therefore you need some knowledge of Microdata and Schema.org. Despite its limitations, it is pretty simple to use: specify the datatype for a given content type, specify the item for each field, and you are done.

Drupal - Microdata (module)
A simple way to implement Microdata+Schema.org is using a module like Microdata (module)

Second alternative: Template override

Although the Microdata module seems pretty interesting, it could still be somewhat unstable, it is not customizable, and you cannot define hidden data with it. If you override some templates, however, you can do all that and more. Actually, how many times do you need to define the rich snippets? I would say just once. Even if someone needs to do it more than once, it would be just a few times; you are not supposed to change the rich snippets every day.

We are going to define some common rich snippets by template override, which should show exactly how to implement Microdata/Schema.org with this method. The examples are WebPage, BlogPosting, author and articleBody.

To define the rich snippets with Schema.org, some key templates must be overridden: the HTML template (html.tpl.php), some content types (node.tpl.php) and some fields (field.tpl.php, especially the body field). Additionally, we are going to write a function that marks up every image with Schema.org.

An overview of the theme files

Before doing anything, we must check which rich snippets we can define in each template that we can override, so that we know exactly which templates to override. This is an overview of the most basic templates that we can create.

Theme files overview
Attribution: drupal.org

Basically, we must have a list of all the rich snippets that we want to define, so that we can find where each piece of data is displayed in the templates and define it properly. For this you must know the structure of Drupal and how to theme it. Usually the HTML and body tags are defined, along with the wrapper for the page content, in the html.tpl.php file. The page content, which includes regions such as the content itself, the header, the footer and the sidebars, is defined in the page.tpl.php file. The node.tpl.php file prints the content of every node, which may include its respective fields. This is the real content in Drupal. The fields, including the body field that holds the main content of the node, are defined in the field.tpl.php file (it is not in the image above, but it exists and it is important, at least for me). You need to know these things in order to define the desired data.

Microdata and Schema.org let you define any data within an HTML element through a set of attributes. There are basic datatypes such as “Date” or “Number”, which exist just to hold simple data such as numbers, dates or pieces of text. There are global datatypes such as “WebPage” and “WebPageElement” (and all their derivatives), which represent general entities that enclose data not necessarily associated with them. There are more specific datatypes such as “Person”, “Organization” and “Place”, which include several items with fully detailed data. Once you know which data to define and where it is located, you simply have to add the proper attributes.

An overview of Microdata and Schema.org

Schema.org is a vocabulary of datatypes that can be used with Microdata. Microdata is a specification for defining machine-readable data in HTML documents. It was designed for HTML5 and is pretty simple to use. In short, Microdata works with three attributes: “itemscope”, “itemtype” and “itemprop” (HTML elements with Microdata attributes are going to be called items from now on).

The “itemtype” attribute defines the datatype of a given item, and its value is the URL of the datatype. All the Schema.org datatypes can be found in the “The type hierarchy” section of its documentation. Conveniently, the URL of each datatype page listed there is the exact URL to set in that attribute.

Most items with a defined datatype contain a set of child items, not just simple data. For example, an item with the “Person” datatype may hold a name, an email, a telephone number and much more. In these cases the “itemscope” attribute must be set. The “itemscope” attribute marks an element as an item that holds child items, so it almost always appears together with the “itemtype” attribute.

Every child item must have an “itemprop” attribute that indicates what it represents. For example, the item representing the name of another item must have the attribute “itemprop” with the value “name”, which marks it as that specific child of its parent item. If the child item has a datatype of its own, it should have an “itemtype” attribute (and an “itemscope” attribute, since it has child items as well). Be careful with items that have an “itemscope” attribute: they must have at least one child item, otherwise they will not be recognized. Some datatypes require specific items, such as the “name” item (e.g. the “Person” datatype). You are encouraged to define the “name” item whenever possible.

<!-- a Person datatype item without any parent item -->
<div id="element" class="element" itemscope itemtype="http://schema.org/Person">

  ...

  <!-- a paragraph with two children items (name and jobTitle) enclosed by span tags whose parent is the Person datatype item -->
  <p>Hello I am <span itemprop="name">Miguel Rodriguez</span>,
  a freelance <span itemprop="jobTitle">Developer</span>.</p>

  ...

  <!-- an Organization datatype item, it represents the 'worksFor' data of the Person datatype item -->
  <div itemprop="worksFor" itemscope itemtype="http://schema.org/Organization">

    <!-- a child item, it is the 'name' data of the Organization datatype item -->
    <div itemprop="name">Independent Contractor</div>

  </div>

</div>

This is an example of how Schema.org is used. There is an item with the “Person” datatype holding three children: the “name”, “jobTitle” and “worksFor” items. All these child items are indicated by the “itemprop” attribute. The “worksFor” item has the “Organization” datatype and holds a “name” item of its own. A common error is not writing the whole URL of the datatype (you must write “http://schema.org/Person”, not just “schema.org/Person”). You should also validate the rich snippets with an online validator such as the Google Structured Data Testing Tool.

The HTML template: html.tpl.php

In the HTML template you can find the HTML and body tags and the HTML elements enclosing the page content. Although there is nothing more, this template is perfect for setting a “WebPage” datatype item, as well as any other global datatype item that encloses the content. The following is a possible structure of the HTML template.

<head ... >

<?php print $head; ?>
<title><?php print $head_title; ?></title>

....

<?php print $styles; ?>
<?php print $scripts; ?>

....

</head>
<!-- you can place a datatype in the Body tag -->
<body ... itemscope itemtype="http://schema.org/WebPage" >

...

<?php print $page_top; ?>

...

<!-- or you can make a content wrapper -->
<div class="wrapper" ... [itemprop=" ... "] itemscope itemtype="http://schema.org/WebPage" >

<?php print $page; ?>

<meta itemprop="name" content="WebPage Name">
</div>

...

<?php print $page_bottom; ?>

...

</body>

The html.tpl.php file will look different depending on whether you are using a theme built from scratch or a subtheme.

Let us say this is the structure of the html.tpl.php file. As mentioned before, this template is the right place for global datatype items that enclose all the content.

The Page template: page.tpl.php

Some website elements, such as the header, the footer and the sidebars, are defined in the Page template. The content is defined here as well (actually, it is just a wrapper, but the content is inside anyway). In this template you can define those elements as “WPHeader”, “WPFooter” and “WPSideBar” items respectively. The following is an example of a Page template (remember that all the templates may differ depending on the base theme you are using, if you are using one).

...

<div id="page">

...

<div id="header" ... itemscope itemtype="http://schema.org/WPHeader">
...
<meta itemprop="name description" content="Header">
</div>

...

<div id="content" ... >
...
<?php print render($page['content']); ?><!-- your content goes here -->
...
</div>

...

<div id="sidebar-first" ... itemscope itemtype="http://schema.org/WPSideBar">
...
<meta itemprop="name description" content="Left Sidebar">
</div>

...

<div id="sidebar-second" ... itemscope itemtype="http://schema.org/WPSideBar">
...
<meta itemprop="name description" content="Right Sidebar">
</div>

...

<div id="footer" itemscope itemtype="http://schema.org/WPFooter">
...
<meta itemprop="name description" content="Footer">
</div>

...

</div>

The Node template: node.tpl.php

All the content in Drupal is displayed as nodes, whatever the content type. Data such as the title (if there is any), the author, the author's picture and the creation date can be found in this template. Depending on the content, you can override the Node template to define a “BlogPosting” or “Article” item. Although you can define it in the Page template as well, defining it in the Node template is encouraged.

Not all the data is shown on the page. Some data must be defined from the variables that the template provides, without displaying it. To do this, the data is defined in HTML meta elements. In the example below, the meta elements for the author are placed inside an item that represents the “author” of the “BlogPosting” or “Article” item. According to the “Article” datatype (the “BlogPosting” datatype inherits all its properties), the “author” child item is expected to be a “Person” (or an “Organization”). Therefore a “Person” item is created that encloses the user's picture (this relates the picture to the author automatically; see the section “Adding the images” below), and all the hidden data is defined after it (this article does not cover how to get the author's data).

<div id="node-<?php print $node->nid; ?>" ... itemscope itemtype="http://schema.org/BlogPosting" ... >

...

<h2<?php print $title_attributes; ?>><a href="<?php print $node_url; ?>"><?php print $title; ?></a></h2>

...

<div itemprop="author" itemscope itemtype="http://schema.org/Person">

<?php print $user_picture; ?>

<div class="submitted">
<?php print $submitted; ?>
</div>

<meta itemprop="name" content="author's name">
<meta itemprop="url" content="author's URL">
<meta itemprop="email" content="author's email">

</div>

...

<div class="content clearfix"<?php print $content_attributes; ?>>

...

<?php print render($content); ?>

...

</div>

...

<meta itemprop="name" content="<?php print $title; ?>">
<meta itemprop="dateCreated" content="<?php print $date; ?>">

</div>
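Schema.org date properties such as “dateCreated” expect an ISO 8601 value, while the $date variable of the Node template holds a date formatted for human readers. The following is a minimal sketch of one way to provide a machine-readable date. It assumes a theme whose machine name is “mytheme” and introduces a variable called $iso_created; both names are only illustrative and are not part of Drupal.

```php
<?php
/**
 * Preprocess function for node.tpl.php in a hypothetical theme "mytheme".
 * It would normally live in the theme's template.php file.
 */
function mytheme_preprocess_node(&$vars) {
  // $node->created is a Unix timestamp. gmdate('c') formats it as an
  // ISO 8601 date in UTC, e.g. 1970-01-01T00:00:00+00:00 for timestamp 0.
  // 'iso_created' is an illustrative variable name, not a Drupal default.
  $vars['iso_created'] = gmdate('c', $vars['node']->created);
}
```

With this in place, the “dateCreated” meta element in the Node template could print $iso_created instead of $date.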

Although you can set one datatype for all the nodes, setting specific datatypes according to the content type is encouraged. You can override the template for a specific content type by naming the template file like this:

node--[content type].tpl.php

For more information visit the Drupal documentation page about template (theme hook) suggestions.
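Keep in mind that the machine name of a content type uses underscores, while template file names use hyphens. The small helper below is only a sketch that illustrates the naming rule; the function name is hypothetical and it is not part of Drupal.

```php
<?php
/**
 * Hypothetical helper (not a Drupal function): builds the file name of the
 * node template override for a given content type machine name.
 */
function example_node_template_name($content_type) {
  // Underscores in the machine name become hyphens in the file name.
  return 'node--' . str_replace('_', '-', $content_type) . '.tpl.php';
}
```

For instance, a content type whose machine name is blog_post would be overridden by a file named node--blog-post.tpl.php.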

The Fields template: field.tpl.php

Every single field in Drupal can be customized with this template. Some fields, such as the body field, come by default. Generally, the main content is in this field, which makes it perfect for defining items such as “articleBody” or any other item that belongs inside the “Article” item.

<div itemprop="articleBody" class="<?php print $classes; ?>"<?php print $attributes; ?>>

...

<?php if (!$label_hidden): ?>
<div class="field-label"<?php print $title_attributes; ?>><?php print $label ?>:&nbsp;</div>
<?php endif; ?>

...

<div class="field-items"<?php print $content_attributes; ?>>

<?php foreach ($items as $delta => $item): ?>
<div class="field-item <?php print $delta % 2 ? 'odd' : 'even'; ?>"<?php print $item_attributes[$delta]; ?>><?php print render($item); ?></div>
<?php endforeach; ?>

</div>

...
</div>

The same applies to the fields. Although you can define one item for all the fields, setting a specific item for each field according to the content type is encouraged. You can override the template for a specific field and content type like this:

field--[field name]--[content type].tpl.php

For more information, see the Drupal documentation page about template (theme hook) suggestions.

Defining more information

Not all the information can be added just by templating. Some of it, such as the images and the ads, needs other techniques. For example, some information may be contained in blocks, and other information, such as the images, cannot be accessed that easily.

Adding the images

The information about the website and its articles has been added. However, the images have not. They must be added and associated with the webpage. As you may know, this cannot be done merely by overriding a template. Instead, we must add a function that adds the microdata attributes automatically.

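/**
 * Adds itemprop="image" to every image rendered through the image theme hook.
 * Replace THEME with the machine name of your theme.
 */
function THEME_preprocess_image(&$vars) {
  $vars['attributes']['itemprop'] = 'image';
}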

All the images are associated with their respective parent item automatically. If there is an image inside the “WebPage” item, that image is associated with it. If there is an image inside the “BlogPosting” item, that image is associated with it (not with the “WebPage” item), and so on.
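If some images already receive a more specific “itemprop” elsewhere, the function can leave those alone. The following is a sketch of that variation, again assuming a theme whose machine name is “mytheme” (an illustrative name).

```php
<?php
/**
 * Preprocess function for images in a hypothetical theme "mytheme".
 * It only adds itemprop="image" when no itemprop has been set already.
 */
function mytheme_preprocess_image(&$vars) {
  // Keep an itemprop that other code has already assigned to this image.
  if (empty($vars['attributes']['itemprop'])) {
    $vars['attributes']['itemprop'] = 'image';
  }
}
```

This keeps the default behaviour for ordinary images while allowing special cases to set their own property.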

Adding the Ads

You can also define the ads. To do this, they should be added in a block with the “Full HTML” text format and without any text editor (WYSIWYG). You should not modify the ad code itself, but you can place it inside a container that has the proper attributes.

<div itemscope itemtype="http://schema.org/WPAdBlock">

<script type="text/javascript" ... > ... </script>

...

</div>

By doing this, all the ads are defined with Microdata + Schema.org. It is important to define as much data as possible, even if it means marking up all the ads.

Adding information in the fields (wysiwyg)

This is somewhat cumbersome because no text editor makes adding microdata comfortable, but it is still possible as long as you know HTML and can enter full HTML in the field. Basically, you just have to remember how Microdata and Schema.org work. Select a piece of information and enclose it in span tags. If it is an item that is going to hold more items, add an “itemscope” attribute. If it is a child of a parent item, add an “itemprop” attribute with the name of the item it represents. If it has a specific datatype (whether or not its parent item requires one), add an “itemtype” attribute with the proper datatype.
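As an illustration, the following is a sketch of what could be typed into a body field that accepts full HTML. It assumes that the surrounding node is already a “BlogPosting” item, as in the Node template above; the person and the job title are invented for the example.

```html
<p>This post was reviewed by
  <!-- a child item of the BlogPosting with its own datatype and children -->
  <span itemprop="contributor" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Jane Doe</span>, a
    <span itemprop="jobTitle">technical editor</span>
  </span>.
</p>
```

The outer span has all three attributes because it is a child of the “BlogPosting” item, has a datatype of its own and holds child items. The inner spans only need “itemprop”.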

Benefits and Limitations

Both possibilities have benefits and limitations that you should consider:

Using Microdata (module):

Using this module is best for people who do not know how to code or who like things easy. You can define a lot of data without hassle. Besides, it might be ideal if you need to add or change Microdata and Schema.org data often (this is hypothetical; I still do not know why you would do that). Although it has some limitations, it is still a great alternative.

Benefits

  • It is a straightforward process.
  • No coding skill required (although you should know how Microdata and Schema.org work).
  • Many data types can be used.

Limitations

  • It can only specify data for content types and their fields.
  • Although it supports many datatypes, there might be some that cannot be used.
  • Hidden data cannot be specified.
  • A lot of data outside the content types cannot be defined.

Through theming:

In my personal opinion, this is more a job for theming than for a module. It is something that must be planned, not done “on the go”. If you need to define the data thoroughly (all the items that you can, including hidden data), this is perhaps the best solution.

Benefits

  • The data can be thoroughly specified.
  • Any datatype can be used.
  • Hidden data can be defined.
  • You can define other items such as the header, footer, sidebar, ads and many other things.

Limitations

  • It requires coding skills.
  • It might be hard and complex.