A More Semantic Web with Schema.org, The Open Graph Protocol and HTML5

One of the most important things for any modern business is its internet presence. If you’re not on the internet, or not active and visible on it, you might as well not exist to a large group of people. Search Engine Optimisation (SEO) is the process of improving one’s website so that it appears higher up the Google Search rankings, where more people are likely to find it.

At the same time, one of the most interesting elements of modern software and services is their openness. Everyone from local councils to The Association of Train Operating Companies is currently in the process of opening up their data to the world, hoping that someone innovative, or with a different set of skills and resources, can make something they either couldn’t imagine themselves or didn’t have the time and money to build, for mutual benefit.

One possible enhancement to both SEO and openness for an organisation is to make its website semantic. The definition of semantics, according to The Oxford Dictionary, is:

The branch of linguistics and logic concerned with meaning. The two main areas are logical semantics, concerned with matters such as sense and reference and presupposition and implication, and lexical semantics, concerned with the analysis of word meanings and relations between them.

The main takeaway point is that things, in this case HTML markup for websites, have meaning. We need to make sure that the meanings we are making visible to the world actually mean what we want them to mean. A nice side-effect of this is that web pages become a lot easier to parse or screen-scrape and extract information from.

HTML5

Prior to HTML5 the best way to give meaning to an element was to use an id or class. So if you were to mark up a simple website with a header and a list of news stories you might come up with something like this:

<div id="header">
	<h1>News Website</h1>
	<img src="logo.png" alt="logo"/>
</div>
<div id="newslist">
	<div class="story">
		<h2>News Title</h2>
		<p>Here is some exciting news!</p>
	</div>
	<div class="story">
		<h2>Another bit of news</h2>
		<p>A shame, as no news is good news!</p>
	</div>
</div>

Whilst this is relatively clean code, it does come with some issues. How is a screen reader or search engine spider meant to know the meaning of a “story” element, for example? Whilst it seems simple when viewed by a human being, we must remember that there are literally thousands of possible id and class names that could mean “story”.

HTML5 provides some new semantic tags which allow us to bake meaning into the elements themselves. The example below simplifies and improves the previous code using the new HTML5 semantic tags.

<header>
	<h1>News Website</h1>
	<img src="logo.png" alt="logo"/>
</header>
<main>
	<article>
		<h2>News Title</h2>
		<p>Here is some exciting news!</p>
	</article>
	<article>
		<h2>Another bit of news</h2>
		<p>A shame, as no news is good news!</p>
	</article>
</main>

This implementation allows a browser, spider or screen reader to accurately understand what each element is for, as the tag names used have been standardized by the W3C. In case you’re wondering, the `<article>` tag is what browsers like IE and Safari detect in order to show a Reading View.
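To illustrate how much easier semantic markup is to consume, here is a minimal sketch of a “spider” written in Python with the standard library’s `html.parser`. It pulls the headline of every news story out of the page above simply by looking for `<h2>` elements inside `<article>` elements. The names `ArticleHeadlineParser` and `extract_headlines` are my own inventions for this example, and real pages would need more robust handling than this.

```python
from html.parser import HTMLParser

# The semantic markup from the example above.
PAGE = """
<header>
    <h1>News Website</h1>
    <img src="logo.png" alt="logo"/>
</header>
<main>
    <article>
        <h2>News Title</h2>
        <p>Here is some exciting news!</p>
    </article>
    <article>
        <h2>Another bit of news</h2>
        <p>A shame, as no news is good news!</p>
    </article>
</main>
"""


class ArticleHeadlineParser(HTMLParser):
    """Collects the text of every <h2> that sits inside an <article>.

    Hypothetical helper written for this post, not part of any library.
    """

    def __init__(self):
        super().__init__()
        self.article_depth = 0    # number of <article> elements currently open
        self.in_headline = False  # True while we are inside a matching <h2>
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag == "h2" and self.article_depth > 0:
            self.in_headline = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth > 0:
            self.article_depth -= 1
        elif tag == "h2":
            self.in_headline = False

    def handle_data(self, data):
        if self.in_headline:
            self.headlines[-1] += data


def extract_headlines(html):
    """Return the headline text of each article found in the given HTML."""
    parser = ArticleHeadlineParser()
    parser.feed(html)
    parser.close()
    return [headline.strip() for headline in parser.headlines]


print(extract_headlines(PAGE))
```

Notice that the parser never needs to know which id or class names a particular site chose for its stories. The same few lines would work on any page that uses `<article>` as intended.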

Wherever possible you should aim to use the semantic tags over generic tags such as `<div>`. It makes code easier to read in addition to being more semantically correct. A full list of the HTML5 semantic tags and their meanings can be found on DiveIntoHTML5.

The Open Graph Protocol

Whilst I had been using HTML5 semantic elements for some time, I wanted to do more as part of the CS Blogs project both in terms of SEO and improving user experience through semantics.

I started with the Open Graph protocol. It was developed by Facebook to allow websites to integrate better with Facebook, both in app and on the web. However, other social media services also take advantage of Open Graph, including Pinterest, Twitter and even Google+.

The Open Graph protocol is implemented as a series of `<meta>` tags that you place in the head of your HTML pages. Each page can describe itself as identifying a Person, Movie, Song or other graph object using code such as that shown below for a blogger on csblogs.com:

<meta property="og:title" content="The Computer Science Blogs profile of Daniel Brown" />
<meta property="og:site_name" content="Computer Science Blogs"/>
<meta property="og:type" content="profile"/>
<meta property="og:locale" content="en_GB"/>
<meta property="og:image" content="https://avatars.githubusercontent.com/u/342035" />
<meta property="profile:first_name" content="Daniel"/>
<meta property="profile:last_name" content="Brown"/>
<meta property="profile:username" content="dannybrown"/>

As you can see, most Open Graph properties start with an `og:` prefix, except those particular to the type of content you are making available, which are prefixed with the type name instead (here, `profile:`). The documentation for the available tags can be found on the Open Graph website.
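To show roughly how a consumer such as a social network might read these tags, here is a small Python sketch using only the standard library. It collects every `<meta property="…">` tag into a dictionary and then groups the properties by their prefix. The helper names (`OpenGraphParser`, `parse_open_graph`, `group_by_prefix`) are made up for this post, and as a simplification a repeated property overwrites the earlier value rather than building a list.

```python
from html.parser import HTMLParser

# The Open Graph tags from the CS Blogs profile page shown above.
HEAD = """
<meta property="og:title" content="The Computer Science Blogs profile of Daniel Brown" />
<meta property="og:site_name" content="Computer Science Blogs"/>
<meta property="og:type" content="profile"/>
<meta property="og:locale" content="en_GB"/>
<meta property="og:image" content="https://avatars.githubusercontent.com/u/342035" />
<meta property="profile:first_name" content="Daniel"/>
<meta property="profile:last_name" content="Brown"/>
<meta property="profile:username" content="dannybrown"/>
"""


class OpenGraphParser(HTMLParser):
    """Collects <meta property="prefix:name" content="..."> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property")
        content = attributes.get("content")
        # Only keep namespaced properties such as og:title or profile:username.
        if name and ":" in name and content is not None:
            self.properties[name] = content


def parse_open_graph(html):
    """Return a dict mapping each property name to its content."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


def group_by_prefix(properties):
    """Split 'og:title' style names into {'og': {'title': ...}, ...}."""
    grouped = {}
    for name, value in properties.items():
        prefix, _, key = name.partition(":")
        grouped.setdefault(prefix, {})[key] = value
    return grouped


properties = parse_open_graph(HEAD)
grouped = group_by_prefix(properties)
print(grouped["og"]["type"])  # the kind of graph object this page describes
print(grouped["profile"])     # the properties specific to that type
```

Grouping by prefix makes the split between generic properties and type-specific ones easy to see: everything under `og` applies to any page, while everything under `profile` only makes sense because `og:type` is `profile`.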

This code will then be used by Facebook when someone links to that particular web page in their messages, or on their newsfeed. Here’s an example:

Open Graph element displayed on Facebook newsfeed

Whilst Open Graph is great for this purpose, it does have some limitations. Each page can only be of one type, and you cannot add semantics for more than one item on a page. This is a problem for pages such as csblogs.com/bloggers, which represents multiple people.

Despite its limitations, it’s still worth implementing Open Graph on pages for which it makes sense, especially if those pages are likely to be shared on social media.

Facebook, as usual, have some great development tools for Open Graph, including the Open Graph Debugger, which allows you to see how Facebook interprets your page. Because Open Graph is a standard, it’ll also help you debug any issues with Pinterest, Twitter etc.

Schema.org

Schema.org is a standard developed in a weird moment of collaboration between the 3 search engine giants: Google, Microsoft and Yahoo. It allows you to specify the meaning of certain elements of content. You can technically do this using 3 different types of syntax (microdata, RDFa and JSON-LD). In this blog post I will focus on microdata, partly because it’s the easiest to understand, fits inline with your pages and is an official part of the HTML5 spec, but also because, at the time of writing, it’s the only format fully supported by the Google search engine.

To begin with, here is the HTML5 structure of a blog post before it has been marked up with schema.org microdata. It should be pretty simple to understand if you’ve checked out the HTML5 semantic elements mentioned previously.

<article>
    <header>
        <h2><a href="dannybrown.net">A Blog Post</a></h2>
    </header>
    <img src="dannybrown.net/image.png" alt="Featured Image"/>
    <p>This is an excerpt... <a class="read-more" href="dannybrown.net">Read more →</a></p>
    <footer>
        <div class="article-info">
            <a class="avatar" href="/bloggers/dannybrown">
                <img class="avatar" src="dannybrown.net/danny.png" alt="Avatar"/>
            </a>
            <a class="article-author" href="/bloggers/dannybrown">Daniel Brown</a>
            <p class="article-date">1 day ago</p>
        </div>
    </footer>
</article>

In order to mark up our HTML with Schema.org we need to do a few things:

  1. Determine which Schema.org schema best suits the element we are describing.
  2. Determine the scope of that element.
  3. Add the microdata attributes to our HTML.

For our blog post example above the most relevant schema is BlogPosting. You can see all of the different types in a hierarchy at schema.org. The scope of the BlogPosting is the entire block contained within the `<article>` tags.

The scope of an item is declared on the opening tag of the element that contains it, using the `itemscope` attribute. Read it as “Every bit of microdata within this element is about one item”. When we define the `itemscope` we also need to give it its type, which is done with the `itemtype` attribute. The value of the `itemtype` is the URL of the schema.org schema, in our case `http://schema.org/BlogPosting`.

The values of the fields that make up our schema, for example the “headline” of a blog post, are either other schemas or the values of elements. Here’s a fully schema’d up blog post:

<article itemscope itemtype="http://schema.org/BlogPosting">
    <header>
        <h2 itemprop="headline"><a href="dannybrown.net">A semantic blog post</a></h2>
    </header>
    <img itemprop="image" src="dannybrown.net/image.png" alt="Featured Image"/>
    <p itemprop="articleBody">This is an excerpt... <a itemprop="url" class="read-more" href="dannybrown.net">Read more →</a></p>
    <footer>
        <div class="article-info">
            <div itemscope itemprop="author" itemtype="http://schema.org/Person">
                <a class="avatar" href="/bloggers/dannybrown">
                    <img class="avatar" itemprop="image" src="dannybrown.net/danny.png" alt="Avatar"/>
                </a>
                <a class="article-author" itemprop="sameAs" href="/bloggers/dannybrown"><span itemprop="givenName">Daniel</span> <span itemprop="familyName">Brown</span></a>
            </div>
            <p class="article-date" itemprop="datePublished">1 day ago</p>
        </div>
    </footer>
</article>

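Here we can see that just by assigning an `itemprop` attribute to a tag, the textual content it contains becomes the value of the named field. For elements that point at a resource the value is taken from the relevant attribute instead: `src` for an `<img>` and `href` for an `<a>`. We can also see that a Person schema can be nested inside our BlogPosting schema to give us a rich author ‘object’.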

One other thing worth noting here is that I elected to add `<span>` elements (which don’t change the visual layout of the HTML page) around the first and last names of the author so as to be able to correctly mark them up with `givenName` and `familyName` itemprops.
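To make the nesting a little more concrete, here is a rough Python sketch (standard library only) of how a consumer might turn the microdata above into nested objects. It is a deliberately simplified reading of the microdata rules, not a full implementation: it assumes well-formed markup, only knows about a handful of URL-carrying tags, and ignores features such as `itemref`. `MicrodataParser` and `extract_microdata` are names I have invented for this example, and the markup is a trimmed copy of the blog post above.

```python
from html.parser import HTMLParser

POST = """
<article itemscope itemtype="http://schema.org/BlogPosting">
    <h2 itemprop="headline"><a href="dannybrown.net">A semantic blog post</a></h2>
    <img itemprop="image" src="dannybrown.net/image.png" alt="Featured Image"/>
    <div itemscope itemprop="author" itemtype="http://schema.org/Person">
        <a itemprop="sameAs" href="/bloggers/dannybrown"><span itemprop="givenName">Daniel</span> <span itemprop="familyName">Brown</span></a>
    </div>
    <p itemprop="datePublished">1 day ago</p>
</article>
"""

# Tags without a closing tag, and tags whose value comes from an attribute.
VOID_TAGS = {"img", "meta", "link", "br", "hr", "input", "area", "source"}
VALUE_ATTRIBUTE = {"a": "href", "link": "href", "area": "href",
                   "img": "src", "source": "src", "meta": "content"}


class MicrodataParser(HTMLParser):
    """Simplified microdata reader (hypothetical helper for this post)."""

    def __init__(self):
        super().__init__()
        self.items = []      # top-level items found in the document
        self.scopes = []     # items whose itemscope element is still open
        self.open_tags = []  # (tag, opened_scope, text_property_names, text_parts)

    def _add(self, name, value):
        # Properties always belong to the innermost open itemscope.
        if self.scopes:
            self.scopes[-1]["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        names = (attributes.get("itemprop") or "").split()
        opened_scope, text_names = False, []
        if "itemscope" in attributes and tag not in VOID_TAGS:
            item = {"type": attributes.get("itemtype"), "properties": {}}
            if names:
                for name in names:   # nested item: becomes a property value
                    self._add(name, item)
            else:
                self.items.append(item)
            self.scopes.append(item)
            opened_scope = True
        elif names and tag in VALUE_ATTRIBUTE:
            for name in names:       # value comes from src/href/content
                self._add(name, attributes.get(VALUE_ATTRIBUTE[tag]))
        elif names:
            text_names = names       # value is the text content, known at the end tag
        if tag not in VOID_TAGS:
            self.open_tags.append((tag, opened_scope, text_names, []))

    def handle_data(self, data):
        # Text counts towards every open element that is collecting a text value.
        for _, _, text_names, parts in self.open_tags:
            if text_names:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.open_tags or self.open_tags[-1][0] != tag:
            return  # this sketch assumes well-formed markup
        _, opened_scope, text_names, parts = self.open_tags.pop()
        if opened_scope:
            self.scopes.pop()
        for name in text_names:
            self._add(name, " ".join("".join(parts).split()))


def extract_microdata(html):
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


post = extract_microdata(POST)[0]
author = post["properties"]["author"][0]
print(post["type"], post["properties"]["headline"])
print(author["type"], author["properties"]["givenName"], author["properties"]["familyName"])
```

Each property is stored as a list because microdata allows the same `itemprop` to appear more than once within an item. The author ends up as a dictionary inside the blog post’s properties, which mirrors the way the Person `itemscope` sits inside the BlogPosting one in the HTML.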

Any elements which you mark up with schema.org should be visible to the end user. Writing schema elements into your page and then hiding them via CSS or JavaScript can actually result in your SEO ratings being reduced, and could impair applications which rely on schema properties (for example a screen reader that used schema.org properties, although to my knowledge none do yet).

Google provides a debugger for Schema.org called the Structured Data Testing Tool, which was of great use whilst I was adding support to CS Blogs. The output for the home page of csblogs.com is shown below:

Google Structured Data Testing Tool Output

As you can see, using Schema.org means that the Google search engine can actually understand what is on the page, and therefore its semantic meaning. csblogs.com is therefore more likely to rank higher for searches that include the word blog, or for searches for the names of the authors mentioned, for example.

Wrapping Up

Hopefully this blog post will have made you think about what you can do to make your websites more semantic — and therefore better for search engines, accessibility and in terms of openness. You can use all three of the technologies above at the same time, and I would implore you to do so. In return you’ll benefit from better Search Engine rankings, your users will benefit from better Social Media integration and screen reading for those with disabilities, and search engines can point people to web pages with a better understanding of what that page represents rather than just scanning for keywords.

Danny

Campus Party : Europe in London

Last week I was fortunate enough to be with some of my fellow Microsoft Student Partners, some Windows Ambassadors, some Microsoft Interns and some Microsoft Employees at Campus Party Europe, an event which was described by the BBC as ‘Glastonbury for geeks’.

I would say this was fairly accurate, except there was less mud! Like Glastonbury there were several stages, a whole host of interesting people to meet, and tents!

Working on the Microsoft Stand

Tuesday through to Friday I worked for 6 hours a day on the Microsoft Stand. It was really good fun! Our job was to talk to people about Windows 8.1, Windows Phone 8, Microsoft Surface and the Xbox One, and to endeavour to answer any questions they had about either the software or hardware. As well as that, we tried to get as many people as possible to take our surveys. In return, each participant got a surprisingly stylish pair of Windows 8 branded sunglasses and a glow stick!

I was also fortunate enough to have Academic Audience Lead Phil Cross point a few developers who had questions about Visual Studio and developing for Windows platforms my way.

The TeamworkPM App for Windows 8 I developed on the 2 big displays and the Surface Pro I wrote it on

Throughout Wednesday and Thursday I spent much of my shifts writing a Windows 8 app for the project management website TeamworkPM. It was especially interesting to do this because my display was being projected on two 42-inch monitors above my head. This meant everyone could see what I was doing, and it attracted quite a few developers to come and talk about developing for the platform.

In the evenings, when the stand got a bit quiet, we would try to entice people to come and see our wares in a variety of ways, one of which was through the medium of dance :P. My highlight was the Macarena, or the Microsoft Macarena as I called it. Below you can see us all dancing and waving our glowsticks to the ever-entertaining Harlem Shake.

Talks

The main thing that first attracted me to the offer of working for Microsoft at Campus Party Europe was the fact that we could spend our down time watching some of the many speakers that came to talk about their respective fields.

I was fortunate enough to catch 2 or 3 lectures a day, from people as well respected and diverse as Jon “Maddog” Hall — chairman of Linux International — and Ian Livingstone — President of Eidos and founder of Games Workshop.

The O2 arena hosted 8 stages, all of which had talks from 10am to 10pm each day, so there was certainly a lot to take in, and too much to write about here.

My favourite talks were actually those about free and open source software (sorry, Microsoft), and the relatively new phenomenon of open data.

Swag

At the end of the week my fellow MSPs and I were super happy to have witnessed one of the coolest and largest tech conferences in the world. On top of that, Microsoft were generous enough to allow us to keep the devices we had been using throughout the week to showcase both Windows 8.1 and Windows Phone 8 to customers. This meant a Nokia Lumia 920 and a Microsoft Surface RT each!

I was over the moon with the Surface RT because I had been looking to get an RT device for a while to test the performance of a few of my apps on the lower powered ARM CPUs, but I was especially happy with the Nokia Lumia 920. My phone contract ends in a few days, and because I now have an awesome new phone I’m gonna go on a SIM only plan and save myself some money 🙂

Thanks

I would like to say a massive thank-you to everyone involved at the O2, the people behind Campus Party, and of course Microsoft for making everything work like clockwork and giving me a fantastic opportunity to learn from some of the best minds in our industry, along with a lot of laughs, some great knowledge and some cool electronics! I hope to see you all again soon!

Danny.