Marking up your website with JSON-LD according to Schema.org

JSON-LD is a JSON-based method for embedding structured data in a website. Unlike other structured data formats, such as Microformats, RDFa, and Microdata, the markup isn’t added as annotations within the source code. Instead, metadata is implemented in script form, separately from the website content. JSON-LD uses JSON notation and extends it with the ‘types’ and ‘properties’ described on Schema.org. The JSON-LD specification comes from Digital Bazaar’s founder, Manu Sporny, and has been a W3C Recommendation since 2014.

What is JSON?

Short for ‘JavaScript Object Notation’, JSON is a compact, text-based format for exchanging data that can be easily processed by both humans and machines. The data format is derived from JavaScript, meaning that valid JSON documents are also valid JavaScript. JSON is independent of any particular programming language and can be used across all platforms. As a format for serializing data structures, it is used for transferring and storing structured data in web applications and mobile apps. JSON objects consist of name-value pairs in which the name and the value are separated by a colon, and the pairs themselves are separated by commas:

JSON syntax:

{
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/about/"
}

The example contains the word pair ‘Manu Sporny’. Relying on innate pattern recognition skills, a human reader can tell from context that this sequence of letters is a name and that the accompanying hyperlink points to the JSON-LD developer’s web presence. Programs like web browsers or search engine crawlers, on the other hand, need metadata in order to grasp such contexts. This is exactly what JSON supplies by providing name-value pairs. The example code above contains the two names ‘name’ and ‘homepage’ with their corresponding values. A program reading a website with this JSON object is thus able to recognize ‘Manu Sporny’ as a name and ‘http://manu.sporny.org/about/’ as a homepage.
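
As a minimal sketch of how a program reads these name-value pairs, the following Python snippet parses the example object with the standard-library json module. The variable names are chosen for illustration only.

```python
import json

# The JSON object from the example above, as it might arrive in a file or HTTP response.
raw = '{"name": "Manu Sporny", "homepage": "http://manu.sporny.org/about/"}'

# json.loads turns the text into a Python dict of name-value pairs.
person = json.loads(raw)

# Each name can now be used to look up its value.
for name, value in person.items():
    print(f"{name} -> {value}")
```

Note that the parser only learns which value belongs to which name; what ‘name’ or ‘homepage’ actually mean is not part of the JSON itself.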

Linked data (LD)

While JSON works well for assigning values within a single website, analyzing several websites can quickly lead to ambiguity. To understand why, consider how these programs work: generally, they parse information from many websites and evaluate the collected data in their databases. With the example code above, however, a program cannot determine whether the names ‘name’ and ‘homepage’ carry the same meaning on every website that uses them. To rule out this ambiguity, JSON-LD adds context-forming elements to the original JSON notation: a vocabulary and a schema of sorts, referred to as a ‘type’. This is done with the help of linked data, that is, freely available data on the web that is identified by uniform resource identifiers (URIs). The Schema.org project offers a standard set of schemata, or types, for structuring data. JSON-LD, however, isn’t tied to any particular vocabulary.

When supplemented with the relevant context elements, the example code above produces the following lines:

JSON syntax supplemented by keywords:

{
  "@context" : "http://schema.org/",
  "@type" : "Person",
  "name" : "Manu Sporny",
  "url" : "http://manu.sporny.org/about/"
}

To make JSON suitable for linked data, JSON-LD supplements the name-value pairs with keywords. These keywords begin with an ‘@’ symbol. The keywords @context and @type are central here. While @context (line 2) defines the script’s underlying vocabulary (here: Schema.org), @type (line 3) states which schema (data type) is being used. A program that knows this vocabulary can therefore recognize that the text element ‘Manu Sporny’ describes a person as defined by Schema.org’s data type ‘Person’. The name-value pairs introduced as ‘name’ and ‘url’ are processed as properties of the type ‘Person’. The vocabulary determines which properties can be assigned to a type.
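
The effect of @context and @type can be approximated in a few lines of Python. This is a deliberately simplified sketch, not the JSON-LD expansion algorithm: a real processor retrieves the context document and applies its term definitions, whereas the hypothetical helper resolve_terms below simply treats the context URL as a prefix, which happens to match how Schema.org names its terms.

```python
import json

doc = json.loads("""
{
  "@context": "http://schema.org/",
  "@type": "Person",
  "name": "Manu Sporny",
  "url": "http://manu.sporny.org/about/"
}
""")

def resolve_terms(node):
    """Map the type and each plain property name to a full vocabulary identifier.

    Assumption: @context is a single vocabulary URL that can be used as a prefix.
    """
    vocab = node["@context"].rstrip("/") + "/"
    resolved = {"@type": vocab + node["@type"]}
    for key, value in node.items():
        if not key.startswith("@"):  # keywords are instructions, not properties
            resolved[vocab + key] = value
    return resolved

for term, value in resolve_terms(doc).items():
    print(term, "=", value)
```

With the names resolved this way, ‘name’ on one website and ‘name’ on another refer to the same identifier, which is what removes the ambiguity described earlier.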

How JSON-LD stacks up against other data formats

JSON-LD assigns types in the same way as other formats for semantically tagging web content. Converted into source code annotations, the example script above can be expressed in Microdata or RDFa according to Schema.org without any loss of information:

Microdata syntax according to Schema.org:

<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Manu Sporny</span>
  <span itemprop="url">http://manu.sporny.org/about/</span>
</div>

RDFa syntax according to Schema.org:

<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Manu Sporny</span>
  <span property="url">http://manu.sporny.org/about/</span>
</div>

The advantage that JSON-LD has over its competitors is that the metadata doesn’t need to be woven into the HTML elements that hold the content. Instead, it can be embedded at any point in the document. This is done with a script tag of the following type:

Implementation of JSON-LD in HTML:

<script type="application/ld+json">
{
  JSON-LD
}
</script>

The strict separation of HTML and semantic annotation does more than just increase the source code’s readability. It also suits dynamically generated websites, because the markup can be produced independently of the visible content. This means that JSON-LD allows metadata to be entered in the backend, read out of a database, and then generated automatically with the help of a template. Despite all of this, JSON-LD still hasn’t managed to replace the other structured data formats. Although Schema.org named JSON-LD its preferred format for structuring data in 2013, the leading search engines only support the script-based implementation of metadata for a few data types. For example, Google recommends JSON-LD for the Knowledge Graph and the sitelink search box as well as for tagging recipes and events. For the remaining data types, on the other hand, the market leader suggests annotating via RDFa or Microdata; one reason for this is the high potential for spam when the markup is detached from the visible content.

Until now, marking up the visible content itself to make it machine-readable has been a fundamental rule of semantic annotation for search engines. JSON-LD’s script-based markup breaks with this tradition.
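
The backend workflow mentioned above, reading values from a database and generating the script through a template, could look roughly like the following Python sketch. The function render_json_ld and the record row are hypothetical and stand in for whatever template system and data source a site actually uses.

```python
import json

def render_json_ld(record):
    """Build a JSON-LD script element from a plain dict, e.g. one database row."""
    data = {
        "@context": "http://schema.org",
        "@type": "Person",
        "name": record["name"],
        "url": record["url"],
    }
    # Escape "</" so that no value can close the surrounding script element early.
    # "\/" is a valid JSON escape for "/", so the data itself is unchanged.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'

# Hypothetical record as it might be read from a database.
row = {"name": "Manu Sporny", "url": "http://manu.sporny.org/about/"}
print(render_json_ld(row))
```

Because the script is assembled from data rather than from the visible page, it can be generated once per record and placed anywhere in the template.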


Using JSON-LD

Google recommends using script-based markup with JSON-LD for information related to events. In HTML, announcements for concerts, musicals, theatrical performances, etc. might look as follows:

Event announcement in HTML

<p>
  <a href="http://www.eventhost.com/events/band/2016-04-20-2000">band in New York City</a>,<br>
  date: 20.04.2016,<br>
  admission: 20:00,<br>
  <a href="http://www.eventhost.com/events/band/2016-04-20-2000/tickets">tickets</a>,<br>
  price: 100,<br>
  tickets available: 1839,<br>
  <a href="http://www.event-in-NYC.com/">event location</a>,<br>
  Example street 1,<br>
  10458 New York City
</p>

Typical data for the data type ‘Event’ includes dates, times, prices, the number of available tickets, the event’s location, and additional information about the event and its venue. Human site visitors are able to extract this information from its various forms of presentation (e.g. paragraphs, tables, or titles) and assign it to the corresponding semantic context. Programs like search engine crawlers, on the other hand, require metadata with instructions on how the presented information should be processed. JSON-LD delivers this data in the form of a script, which can be placed anywhere in the HTML source code, separately from the content.

Tagging events with JSON-LD

Event details in JSON-LD format can be transferred into a separate script as follows:

JSON-LD script for marking up event information:

<script type="application/ld+json">
{
  "@context" : "http://schema.org",
  "@type" : "Event",
  "name" : "band in New York City",
  "startDate" : "2016-04-20T20:00",
  "url" : "http://www.eventhost.com/events/band/2016-04-20-2000",
  "offers" : {
    "@type" : "AggregateOffer",
    "url" : "http://www.eventhost.com/events/band/2016-04-20-2000/tickets",
    "lowPrice" : "100",
    "offerCount" : "1839"
  },
  "location" :
  {
    "@type" : "Place",
    "sameAs" : "http://www.event-in-NYC.com/",
    "name" : "event location",
    "address" :
    {
      "@type" : "PostalAddress",
      "streetAddress" : "example street 1",
      "addressLocality" : "New York City",
      "postalCode" : "10458"
    }
  }
}
</script>

The script tag in line one declares the enclosed content as the type ‘application/ld+json’. The information that follows is thus aimed at programs that are able to read linked data in JSON format. On the first layer, the keywords @context and @type appear with the values ‘http://schema.org’ and ‘Event’ (lines 03 and 04). They tell a parsing program that the following information belongs to the type ‘Event’ as defined by Schema.org and that each entry describes a specific property of that event. These properties are given in the form of name-value pairs. The properties ‘name’, ‘startDate’, ‘url’, ‘offers’, and ‘location’ are located on the first layer, and the event information is assigned to them as values. This way, a search engine crawler is able to identify ‘http://www.eventhost.com/events/band/2016-04-20-2000’ as the URL for the relevant event and ‘2016-04-20T20:00’ as its starting time (‘startDate’).
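
How a crawler might pick such a script out of a page can be sketched with Python’s standard-library HTML parser. The class JsonLdExtractor and the shortened sample page are assumptions made for this illustration; real crawlers are considerably more elaborate.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every script element of type application/ld+json."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so it is buffered.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False

# Shortened version of the event page: visible content plus the JSON-LD script.
page = """
<html><body>
<p>band in New York City</p>
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Event",
  "name": "band in New York City",
  "startDate": "2016-04-20T20:00",
  "url": "http://www.eventhost.com/events/band/2016-04-20-2000"
}
</script>
</body></html>
"""

extractor = JsonLdExtractor()
extractor.feed(page)
event = extractor.items[0]
print(event["@type"], event["startDate"], event["url"])
```

The visible paragraph is ignored entirely; everything the program learns about the event comes from the script.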

Properties as types

Just as with RDFa and Microdata, JSON-LD makes it possible to give a property’s value a type of its own and then describe it further with specific properties. Such cases can be seen on lines 09, 16, and 21. Here, the value of the event property ‘offers’ is tagged as the type ‘AggregateOffer’ and is given the properties ‘url’, ‘lowPrice’, and ‘offerCount’:

  "offers" : {
    "@type" : "AggregateOffer",
    "url" : "http://www.eventhost.com/events/band/2016-04-20-2000/tickets",
    "lowPrice" : "100",
    "offerCount" : "1839"
  },

The value of the event property ‘location’ is tagged as the type ‘Place’, which is then further specified through ‘sameAs’, ‘name’, and ‘address’. Line 21 above, on the third layer, shows a further level of nesting. Here, the value of ‘address’, a property of the type ‘Place’, is itself given the type ‘PostalAddress’, which in turn carries the properties ‘streetAddress’, ‘addressLocality’, and ‘postalCode’.
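
The layers described above can be made visible with a short recursive walk over the data. The following Python sketch uses an abbreviated copy of the event script as a dict; the helper name nested_types is hypothetical.

```python
# Abbreviated copy of the event markup as a Python dict.
event = {
    "@context": "http://schema.org",
    "@type": "Event",
    "name": "band in New York City",
    "offers": {"@type": "AggregateOffer", "lowPrice": "100", "offerCount": "1839"},
    "location": {
        "@type": "Place",
        "name": "event location",
        "address": {
            "@type": "PostalAddress",
            "streetAddress": "example street 1",
            "addressLocality": "New York City",
            "postalCode": "10458",
        },
    },
}

def nested_types(node, layer=1, via=None):
    """Yield (layer, property leading to the node, @type) for a node and all nested objects."""
    yield layer, via, node.get("@type")
    for key, value in node.items():
        if isinstance(value, dict):
            yield from nested_types(value, layer + 1, key)

for layer, via, type_name in nested_types(event):
    print(layer, via, type_name)
```

The walk reports ‘Event’ on the first layer, ‘AggregateOffer’ and ‘Place’ on the second, and ‘PostalAddress’ on the third.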

Testing JSON-LD scripts

Nesting types, subtypes, and properties within one another makes complex JSON-LD scripts possible. Separating HTML markup and semantic annotation keeps them far more readable than formats like RDFa and Microdata, which rely on source code annotation. To help avoid markup mistakes, Google offers a free tool that lets developers validate JSON-LD scripts for structured data.
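
Before a script is submitted to such a tool, a rough local pre-check can already catch basic mistakes. The Python sketch below is not Google’s validator and makes no attempt to verify Schema.org types or properties; the hypothetical function check_json_ld only tests whether the text is valid JSON and whether a single top-level object carries @context and @type.

```python
import json

def check_json_ld(text):
    """Return a list of problems found in a JSON-LD string (an empty list means none found)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg} (line {err.lineno}, column {err.colno})"]
    if not isinstance(data, dict):
        # Other top-level values are simply out of scope for this sketch.
        return ["top-level value is not a JSON object (not handled by this sketch)"]
    problems = []
    for keyword in ("@context", "@type"):
        if keyword not in data:
            problems.append(f"missing {keyword}")
    return problems

print(check_json_ld('{"@context": "http://schema.org", "@type": "Event"}'))
print(check_json_ld('{"name": "Manu Sporny",}'))  # a trailing comma is not valid JSON
print(check_json_ld('{"name": "Manu Sporny"}'))
```

A check like this is useful mainly for catching syntax slips such as stray commas or missing quotation marks, which are easy to introduce when editing scripts by hand.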