Canonical tag: what’s behind the canonical URL?

When search engines index website content, they follow two main principles. First, each page that is to be included in the index must be able to answer at least one relevant search query. Second, a page’s content may only be retrievable under a single URL; otherwise it is classified as duplicate content. This means that the page in question won’t get indexed or will disappear from the index. Many web projects face a dilemma when they offer similar content in the same language on several country-specific domains (e.g. stores that have websites for the US, Canada, and the UK), where product descriptions often differ only in currency and shipping costs. More generally, online stores risk duplicate content whenever they offer several variants of a product and the descriptions differ by only a few words, for example shoes that are offered in different colors.

One possible solution to this problem is the so-called canonical tag, which enables you to declare a URL or URI as the default resource for indexing.

What is a canonical tag?

In 2009, Yahoo, Microsoft, and Google introduced a new link type named canonical, which quickly became one of the most important tools for SEOs and has been documented since 2012 in RFC 6596. As the rel value of a link element, it is placed in the head of an HTML page, the same location where the title and the meta description can be found. If a canonical link is embedded on a page, it refers to a specified default URL or URI, which is also known as the canonical URL and is used as the indexing source instead of the page itself.

The canonical URL combines its own link popularity and reputation with that of the pages referencing it, which can lead to a better ranking. Since the tagged URLs are not included in the index, there are no problems with duplicate content. However, the tag is merely a recommendation: search engines are not obliged to follow it. If the implementation is incomplete or flawed, there is even the risk that the entire site will be ignored, which is why it’s so important to use the canonical tag correctly.

How does the canonical tag work?

For the canonical tag to work, two things are required: firstly, you need the exact URL of the desired canonical page to be specified as the default resource. Secondly, you need a link element in which you can insert the canonical URL including the canonical statement. The corresponding code looks like this:

<link rel="canonical" href="URL/URI of the canonical page">

The link element, which is an empty element with no closing tag in HTML, carries the attributes rel and href. The former specifies the relationship between the current and the linked document, while the latter indicates where the linked document can be found. The values are given within the quotation marks: the rel value 'canonical' states that the link points to a canonical URL, and the href attribute contains that URL.

Tip

The canonical tag can be used to link to an external domain, not just an internal one. The approach isn’t different, which is why you need to specify the exact URL, not the default address of the site.

So that the alternative pages reference the specified default resource, the code needs to be inserted into the <head> area of the respective HTML documents, as mentioned above. If the content is not in HTML, for example in the case of a PDF file, the canonical URL can be declared in the HTTP header instead. This requires a slightly different syntax:

Link: <URL/URI of the canonical page>; rel="canonical"
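As a minimal sketch of the header syntax above, the following Python helper builds the header for a non-HTML response. The function name canonical_headers and the example URL are illustrative assumptions; how the header is actually attached depends on your web server or framework.

```python
def canonical_headers(canonical_url):
    """Return a response header dict declaring canonical_url (hypothetical helper)."""
    # Same syntax as shown above: the URL in angle brackets, then the rel parameter.
    return {"Link": f'<{canonical_url}>; rel="canonical"'}


# Example: a PDF form whose canonical version is an HTML page (assumed URL).
pdf_headers = canonical_headers("https://example.com/forms/order.html")
print(pdf_headers["Link"])
```

The returned dictionary can be merged into whatever header structure your server-side code uses when it delivers the PDF.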

When does it make sense to use canonical instructions?

By definition, the canonical tag isn’t an instruction that search engines are obliged to follow. By specifying a concrete, representative version for content that exists in multiple versions, you’re giving search engine crawlers a helping hand, although you can’t always be sure that the hint is honored. Since the code must also be implemented individually for each duplicate and for each alternative URI, the question arises of whether the comparatively high effort is worth it. In the next few paragraphs, we will discuss four scenarios in which you should consider using the canonical tag.

Content is distributed on dynamic URLs

Dynamic URLs now play an important role – especially in e-commerce. Although the user-specific pages are an excellent and easy option for presenting the same content (including slight variations) to different users, they also create problems for the search engine crawler. Here, canonical tags are highly recommended to prevent possible duplicate content.
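One way to picture this is a small Python sketch that derives a canonical URL from a dynamic one by dropping parameters that do not change the content. The parameter names in IGNORED_PARAMS and the function canonical_for are assumptions for illustration; which parameters are safe to drop depends entirely on your own store.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that alter the URL but not the page content.
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium", "sort"}


def canonical_for(url):
    """Return url without the ignored query parameters and without a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


dynamic = "https://my-ecommerce.store/fashion/shirt-green?sessionid=abc123&utm_source=mail"
print(canonical_for(dynamic))
```

The result would then be used as the href value of the canonical tag on every dynamic variant of the page.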

Content can be accessed via different URLs

Due to the structure, some web projects such as blogs, web stores, and advice portals offer content in several categories at the same time, and therefore often under different URLs. For example, a store could feature a 'green shirt' at the same time on several URLs:

  • my-ecommerce.store/fashion/shirt-green
  • my-ecommerce.store/summerfashion/shirt-green
  • my-ecommerce.store/winterfashion/shirt-green
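For the three URLs above, every variant would carry the same canonical tag in its head. The following Python sketch assumes that the /fashion/ URL is the preferred version and that the site is served over HTTPS; both are illustrative choices, as is the helper name canonical_link_tag.

```python
from html import escape

# Assumption: the store treats the /fashion/ URL as the preferred version.
CANONICAL_URL = "https://my-ecommerce.store/fashion/shirt-green"


def canonical_link_tag(url):
    """Render the link element that belongs in the <head> of each variant."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


variant_paths = [
    "/fashion/shirt-green",
    "/summerfashion/shirt-green",
    "/winterfashion/shirt-green",
]

# Every variant, including the preferred one, points to the same URL.
head_snippets = {path: canonical_link_tag(CANONICAL_URL) for path in variant_paths}
for path, snippet in head_snippets.items():
    print(path, "->", snippet)
```

Escaping the URL matters as soon as it contains characters such as an ampersand, which would otherwise produce invalid markup.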

Content is also often retrievable via different URLs after a change in website structure or even a complete domain transfer. In cases like these, 301 redirects are usually the preferred solution, but if they aren’t possible for technical reasons, link rel="canonical" can prove a useful alternative.

Content is available on different domains

The possibilities of so-called cross-domain canonicals have already been mentioned briefly. They allow you to publish your posts on a second domain without creating duplicate content. Positive user signals, links, and other ranking-relevant factors are transferred to the original URL, which can significantly improve its performance.

Content has different formats

In some situations, it is useful to publish content in different formats and, for example, to offer forms not only in HTML versions, but also as PDF files and printable versions. To ensure that search engines do not analyze each variant individually and end up ranking the wrong version, it’s recommended to use canonicals. As mentioned above, you may need to integrate the tag in the modified syntax into the HTTP header, depending on the format.

Tip

If you have a valid SSL/TLS certificate, you should make sure the secured HTTPS URLs are canonical URLs and link to them from the unsecured variants (HTTP). The same also applies to mobile or AMP sites – for the latter, canonicals are even required.

The differences between canonical tags and 301 redirecting

At first sight, the canonical tag seems to be very similar to the 301 redirect. This redirect, which is based on the HTTP status code 301 (Moved Permanently), also signals to the search engines that several URLs should be treated as a single version. However, redirects also lead visitors to the original URL, so all other variants are no longer reachable. Pages with canonical tags are only marked as a copy for the search engines and therefore remain accessible to the user.

Another difference is the fact that search engines never ignore a redirect, whereas they may ignore a canonical link, which is only a recommendation. Last but not least, the two methods also differ in how they work across multiple domains: while the 301 redirect moves a page from domain A to domain B, the canonical tag merely states that a relationship between the page on domain A and the page on domain B exists.

Canonical tag: common errors

A canonical URL is the optimal solution in many situations to avoid duplicate content on your site. Leading search engines take the canonical tag into account when indexing, recognizing that you don’t want the same or similar content to be ranked several times. Positive search engine signals are combined in the main URL, which improves its positioning. At this point, however, it should be noted that canonical tags can quickly end up having a negative effect if they point to the wrong URL or are incorrectly implemented. The following sections show you the most common canonical errors.

Numbered pages refer to a canonical URL

To present content in an appealing manner, many webmasters resort to page numbering. News portals, in particular, use this method by dividing content across several numbered pages. However, if you set canonical tags for content like this and link the later pages to the first page using link rel="canonical", you’re making a mistake: the later pages are by no means duplicates of the first one, so their unique content ends up missing from the index. If you still want to give the search engines information about the chosen numbering, there are two advisable approaches:

  • Link to the previous and subsequent page on each URL. To do this, use the rel attribute of the link element with the value 'prev' or 'next' instead of 'canonical'.
  • Using link rel="canonical", link to a one-page version of the relevant content, which combines all numbered pages.
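The first approach can be sketched in Python as follows. The ?page=N URL scheme, the example article URL, and the function name pagination_links are assumptions for illustration; your own pagination URLs may look different.

```python
def pagination_links(base_url, page, last_page):
    """Return the prev/next link elements for one page of a numbered series."""
    tags = []
    if page > 1:
        # Every page except the first points back to its predecessor.
        tags.append(f'<link rel="prev" href="{base_url}?page={page - 1}">')
    if page < last_page:
        # Every page except the last points forward to its successor.
        tags.append(f'<link rel="next" href="{base_url}?page={page + 1}">')
    return tags


# Page 2 of a three-page article gets both elements.
for tag in pagination_links("https://example.com/news/article", 2, 3):
    print(tag)
```

Note that none of the numbered pages receives a canonical tag pointing to page 1 in this approach.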

Relative URLs don’t link to the desired canonical page

Like most HTML tags, the <link> tag offers the possibility to specify absolute and relative URLs. While absolute URLs describe the entire path (including 'http://' etc.), relative URLs link to a particular folder on the current site without requiring the complete URL. For example, a relative path 'images/image.jpg' states that the image 'image.jpg' can be accessed in the sub folder entitled 'images'. However, when using the canonical tag, using paths like these quickly leads to complications, which could mean that the crawler ends up completely ignoring the tag.

Google cites the following as an example:

<link rel="canonical" href="example.com/cupcake.html">

Because of the missing HTTP prefix, the crawler interprets the URL 'example.com/cupcake.html' as a relative URL, assuming that the desired canonical URL is 'http://example.com/example.com/cupcake.html'. Ideally, you should always specify the full URL, including the scheme, when you place the canonical tag. Alternatively, you can use a root-relative path that starts with a slash and omits the domain:

<link rel="canonical" href="/cupcake.html">
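The resolution behavior described here can be reproduced with the standard URL resolution in Python's urllib.parse. This is only a sketch of how a relative href is resolved against the page's own address; the page URL http://example.com/index.html and the function name resolve_canonical are assumptions, and a crawler's actual handling may differ.

```python
from urllib.parse import urljoin

# Assumption: the page that contains the canonical tag lives at this address.
PAGE_URL = "http://example.com/index.html"


def resolve_canonical(page_url, href):
    """Resolve the href of a canonical tag against the URL of the page."""
    return urljoin(page_url, href)


# Missing scheme: the value is treated as a relative path.
print(resolve_canonical(PAGE_URL, "example.com/cupcake.html"))
# Root-relative path: resolved against the domain of the page.
print(resolve_canonical(PAGE_URL, "/cupcake.html"))
# Absolute URL: used exactly as written.
print(resolve_canonical(PAGE_URL, "https://example.com/cupcake.html"))
```

Only the absolute form is independent of where the page itself is located, which is why it is the safest choice.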

Pages linking to more than one canonical URL

By their nature, canonical URLs rule out a page pointing to several of them at once. Multiple declarations like these are created quickly when you work with a content management system or store software: plugins and templates often insert canonical tags automatically, even if you have already specified a canonical URL. If you are using extensions like these, you should check the source code of your pages and correct the information if necessary. Otherwise, your efforts will probably be fruitless, since search engines are likely to ignore all canonical tags instead of preferring one of them.

Canonical URL is specified in the body

The link element can be included as often as required in an HTML document. However, the canonical tag only works if it is placed in the <head> area. If it is in the <body> area of the respective page, it has no effect. To avoid problems when processing the HTML code (HTML parsing), Google also recommends that the canonical instruction appears as early as possible in the <head> area.
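A quick way to check a page for this error, and for duplicate canonical tags, is a small audit script. The following Python sketch uses the standard library's html.parser; the class name CanonicalAudit, the function audit, and the sample markup are illustrative assumptions, and the script only inspects static HTML, not tags added later by JavaScript.

```python
from html.parser import HTMLParser


class CanonicalAudit(HTMLParser):
    """Collect rel="canonical" links and record where they appear."""

    def __init__(self):
        super().__init__()
        self.section = None      # "head", "body", or None outside both
        self.in_head = []        # canonical hrefs found inside <head>
        self.outside_head = []   # canonical hrefs found anywhere else

    def handle_starttag(self, tag, attrs):
        if tag in ("head", "body"):
            self.section = tag
            return
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            target = self.in_head if self.section == "head" else self.outside_head
            target.append(attributes.get("href"))

    def handle_endtag(self, tag):
        if tag == "head":
            self.section = None


def audit(html):
    """Return (canonicals in <head>, canonicals outside <head>) for a document."""
    parser = CanonicalAudit()
    parser.feed(html)
    parser.close()
    return parser.in_head, parser.outside_head


sample = (
    '<html><head><title>Shirt</title>'
    '<link rel="canonical" href="https://my-ecommerce.store/fashion/shirt-green">'
    '</head><body>'
    '<link rel="canonical" href="https://my-ecommerce.store/summerfashion/shirt-green">'
    '</body></html>'
)
in_head, outside_head = audit(sample)
print("in head:", in_head)
print("outside head:", outside_head)
```

A page is in good shape when the first list contains exactly one URL and the second list is empty.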

No canonical tags in the mobile version

Anyone who offers a mobile version of their website as well as a desktop version is faced with the possibility that a lot of things could go wrong. Even if pages can be indexed despite missing information, you should try to make it as easy as possible for search engines to collect and categorize information. For example, Google recommends pointing from desktop pages to their mobile counterparts using rel="alternate", and linking from the mobile pages back to the desktop URLs using the canonical tag. Google provides a guide with extensive tips and possible approaches for 'Mobile Friendly Websites'.

Canonical tags and hreflang contradict each other

International web projects with different country domains present a great challenge to SEOs. On the one hand, the pages of all variants should rank well and be displayed to the appropriate users; on the other hand, the risk of duplicate content should be reduced to a minimum. One of the most important tools is the hreflang attribute, which allows individual versions to be labeled as equivalent alternatives. For this to work, however, each of these pages must also reference itself. If such a URL simultaneously points to a different URL via a canonical tag, this is a big contradiction for search engine crawlers.

This results in the search engine ignoring both signals and instead indexing websites based on other features. Therefore, you should avoid using both these instructions at the same time.

Fact

Not only does the combination of canonical URLs and hreflang lead to contradictions that have a negative effect on your site’s ranking; combining canonical tags with instructions such as 'nofollow' or 'noindex' doesn’t sit well with Google either.