The term duplicate content refers to the occurrence of one and the same piece of content or very similar content under several URLs.
Why duplicate content can be bad for SEO
Duplicate content can cause SEO problems even for otherwise good content, because search engines like Google do not rate the affected content as unique. Since unique, high-quality content is an important factor in how Google evaluates a website, duplicates can have negative consequences for your website's rankings.
Another problem with duplicate content is the following: if Google crawls multiple pages with the same content, it does not know which of the affected pages is more relevant and should appear in the search results. The relevance, therefore, is "split" among the respective pages, or Google selects a page to be displayed in the search results that might be the wrong one for your SEO strategy.
Duplicate content is also problematic for SEO with regard to backlinks. If the same content is accessible under several URLs, other websites may link to a version other than the one you want. Valuable references are then lost, or you end up with two or more weakly linked pages instead of one well-linked page. Duplicate content therefore splits link equity across the affected pages, which can hurt the ranking of each individual page on Google.
In the case of deliberate manipulation, duplicate content can even lead to a penalty by Google. This is the case, for example, if content is copied from external sites without permission, or if you publish several pages on the same topic to appear more often in the search results and thereby increase the number of visitors to your website.
Nevertheless, duplicate content is not always bad for SEO. Sometimes it is even necessary, e.g. for legal information that has to be repeated on several pages. Google knows this, which is why duplicate content is not penalized as a matter of course. Instead, Google evaluates the appropriateness of duplicated content on a case-by-case basis.
Types of duplicate content
First, you have to distinguish between internal and external duplicate content. Internal duplicate content is defined as content that exists on several URLs of the same website. External duplicate content, on the other hand, occurs when the same content can be found on different websites. It can be caused, for example, by adopting press releases or by plagiarism. The creation of separate websites for individual projects of a company can also cause external duplicate content if these websites copy content from the main company website.
In addition, there are different levels of duplicate content. An exact duplicate exists if two URLs contain the same content. The pages do not have to be 100% identical for this: even if the page titles differ, for example, Google recognizes pages with the same content/text as duplicates.
Besides such exact duplicates, a page can also contain the complete content of another page alongside other content of its own. This problem often occurs on weblogs, when the complete text of an article is displayed on the home page or on tag pages.
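To illustrate why differing page titles do not prevent two pages from counting as exact duplicates, here is a minimal sketch that fingerprints only the body text. The page data and the function name content_fingerprint are made up for this example, and the approach is a simple stand-in, not a description of how Google detects duplicates.

```python
import hashlib
import re


def content_fingerprint(body_text: str) -> str:
    """Hash the main text of a page, ignoring case and whitespace differences."""
    # Collapse runs of whitespace and lowercase the text so that trivial
    # formatting differences do not change the fingerprint.
    normalized = re.sub(r"\s+", " ", body_text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical pages: different titles, same body text.
page_a = {"title": "Red running shoes", "body": "Lightweight shoe for  daily training."}
page_b = {"title": "Running shoes (red)", "body": "lightweight shoe for daily training."}

same_content = content_fingerprint(page_a["body"]) == content_fingerprint(page_b["body"])
print(same_content)  # True: the titles differ, but the body text matches
```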
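One way to spot this kind of inclusion in your own content is to measure how much of one text reappears in another. The following sketch uses word shingles (overlapping word sequences) and a containment score. The sample texts, the shingle size of 3, and the function names are assumptions chosen for illustration.

```python
def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word sequences of length `size`."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def containment(inner: str, outer: str, size: int = 3) -> float:
    """Share of the shingles of `inner` that also occur in `outer`."""
    inner_shingles = shingles(inner, size)
    if not inner_shingles:
        return 0.0
    return len(inner_shingles & shingles(outer, size)) / len(inner_shingles)


# Hypothetical texts: the home page shows the full article plus other content.
article = "duplicate content can split link equity across several urls"
home_page = "latest posts " + article + " read our other articles"

print(containment(article, home_page))  # 1.0: the article is fully included
print(containment(home_page, article) < 1.0)  # True: the home page has more text
```

A score close to 1.0 in one direction and a lower score in the other suggests that one page embeds the other rather than being an exact duplicate of it.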
Another important type of duplicate content is near duplicate content. This SEO term describes the occurrence of the same content on several pages, but worded and edited differently in each case. An example would be publishing two different articles about "SEO" that cover the same aspects. Although the pages are not identical, this leads to keyword cannibalization: both articles are about the same topic, so they target the same keywords and impair each other's ranking in search results.
When does duplicate content occur?
Duplicate content occurs when identical content is accessible under different URLs. This can have a variety of causes, such as
- content is accessible with or without the "www." subdomain in the URL
- a website is accessible via HTTP as well as via HTTPS
- a home page can be accessed with or without "index.html" in the URL
- identical content is linked with different URL parameters (e.g. products of an online shop sorted by different parameters, but with the same results)
- session IDs in a URL that are used to track user behavior
- changing the domain and using the same content on the new domain
- owning different domain names or extensions with the same content (e.g. a company owns and operates the domains www.companyabc.com, www.company-abc.com, and www.company-abc.info to prevent third parties from occupying these domains)
- category and tag pages, e.g. on blogs, where complete articles are listed beneath each other
- pagination (page numbering), e.g. of comments
- print versions of individual pages
- using upper and lower case URLs simultaneously (e.g. a page of a corporate website can be accessed both at www.company.com/products and www.company.com/Products; the domain name itself is not case-sensitive, but the path usually is)
- different language versions of a page (no problem for SEO if Google can recognize that the versions are intended for different countries, e.g. based on an hreflang attribute)
- mobile versions of a website with the same content
- using identical content and texts from external pages or your own page (e.g. direct adoption of product descriptions from a manufacturer's homepage)
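Several of the URL-related causes above can be found by normalizing a list of crawled URLs and grouping those that collapse to the same address. The sketch below is only an illustration: the example.com URLs, the choice of https without www. as the preferred form, and the list of ignored parameters are assumptions that you would have to adapt to your own site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: on this hypothetical site these parameters never change the content.
IGNORED_PARAMS = {"sessionid", "sort", "utm_source"}


def normalize(url: str) -> str:
    """Map URL variants to one preferred form (here: https, no www.)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    # Keep only the parameters that actually select different content.
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in IGNORED_PARAMS]
    return urlunsplit(("https", host, path, urlencode(kept), ""))


crawled = [
    "http://www.example.com/index.html",
    "https://example.com/",
    "https://example.com/shoes?sort=price&sessionid=abc123",
    "https://www.example.com/shoes",
    "https://example.com/shoes?color=red",
]

groups = defaultdict(list)
for url in crawled:
    groups[normalize(url)].append(url)

# Every group with more than one member is a set of duplicate URLs.
duplicates = {key: urls for key, urls in groups.items() if len(urls) > 1}
for key, urls in duplicates.items():
    print(key, "<-", urls)
```

The sketch deliberately leaves the case of the path untouched, because whether /Shoes and /shoes show the same page depends on the server.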
How to solve duplicate content issues
If you already have an SEO problem with duplicate content on your website, the following methods can solve it. You can also use them to prevent duplicate content from occurring in the first place.
First, you can set up an HTTP redirect, preferably with the HTTP status code 301 - "Moved Permanently" - for URLs that should not appear in search results. The redirect automatically sends users and bots to the "correct" URL, which solves the duplicate content problem. However, make sure that each URL is redirected to its corresponding subpage and not, for example, to your home page. Otherwise, users have to search for the right page again, which harms their user experience.
Another way to solve duplicate content problems is to use canonical links. A canonical link is a link element with rel="canonical" that is placed in the head section of a page's source code and points to the original source of the page's content. This way, you tell search engines such as Google which URL is preferred (= canonical URL) and should, therefore, appear in search results. There is no general rule as to whether an HTTP redirect or a canonical link is the right solution; this has to be decided case by case. For example, a canonical link is more suitable for print versions, whereas an HTTP redirect should be used for domain changes.
Another solution is to use "noindex", for example in a robots meta tag or in an X-Robots-Tag HTTP header, to tell Google that a particular page should not be indexed. This also counteracts the problem of duplicate content.
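As a rough sketch of such a redirect, the following minimal WSGI application answers requests for retired URLs with status 301 and a Location header that points to the matching subpage rather than the home page. The paths in REDIRECTS and the helper call are hypothetical. In practice, redirects are often configured in the web server or CMS instead of in application code.

```python
# Hypothetical mapping from duplicate or retired paths to their preferred pages.
REDIRECTS = {
    "/index.html": "/",
    "/old-blog/duplicate-content": "/blog/duplicate-content",
}


def app(environ, start_response):
    """Minimal WSGI app: 301 for known duplicates, 200 for everything else."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # Redirect to the corresponding subpage, not to the home page.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"page content"]


def call(path):
    """Invoke the app directly (no network) and return status and headers."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    app({"PATH_INFO": path}, start_response)
    return captured["status"], captured["headers"]


print(call("/old-blog/duplicate-content"))
print(call("/blog/duplicate-content"))
```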
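To check which canonical URL a page actually declares, you can read the link element with rel="canonical" from its HTML. The sketch below uses Python's standard html.parser module. The sample markup and the example.com URL are made up, and the class name CanonicalFinder is just a label for this illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


# Hypothetical print version that points to the regular page as its original.
print_version = """
<html><head>
<title>Shoes (print version)</title>
<link rel="canonical" href="https://www.example.com/shoes">
</head><body><p>Product text</p></body></html>
"""

finder = CanonicalFinder()
finder.feed(print_version)
print(finder.canonical)  # https://www.example.com/shoes
```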
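A noindex instruction is commonly given either in a robots meta tag in the page's HTML or in an X-Robots-Tag HTTP response header. The following sketch checks both places. It is a simplified illustration: the function name is_noindex is made up, and crawler-specific variants (such as a meta tag addressed to one particular bot) are ignored.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of all <meta name="robots"> elements."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def is_noindex(html: str, headers: dict[str, str]) -> bool:
    """True if the meta tag or the X-Robots-Tag header contains noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    header_directives = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_directives.update(d.strip().lower() for d in value.split(","))
    return "noindex" in finder.directives or "noindex" in header_directives


print(is_noindex('<meta name="robots" content="noindex, follow">', {}))  # True
print(is_noindex("<p>Regular page</p>", {"X-Robots-Tag": "noindex"}))  # True
print(is_noindex("<p>Regular page</p>", {}))  # False
```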
Further information on avoiding duplicate content
To avoid duplicate content in the first place, you should not use the same content multiple times on different pages. Instead, when setting up a website, your goal should be to create unique and high-quality content for users and to avoid using repetitive text modules, as this is not only badly received by search engines, but also by users.
If you cannot avoid reusing existing content in some cases, always link to the source when you deliberately copy external content, so that search engines like Google recognize which version is the original. Alternatively, you can use canonical links here as well.
Furthermore, there are some technical aspects you should consider if you want to avoid duplicate content:
- Decide whether your URLs should use www. or not, and redirect all pages from the other variant to the preferred one with status code 301 - "Moved Permanently" (do not allow both!).
- In case of a domain change, you have to set up a redirect from the old domain to the new one. Here you should also make sure that you always redirect to the corresponding subpages and not to your home page.
- You should limit URLs to the lower case version.
- Lists etc., which can be sorted by different parameters, should be limited to one variant via a canonical link.
- It is best to specify a canonical link for each page. That way, unwanted parameters in URLs (e.g. /index.html?source=web&refer=google), which can result from careless linking, cannot generate duplicates.
- Check whether the result lists on your website are sufficiently distinctive. If, for example, all articles from category A are also in category B, these category pages or result lists will probably be identical, even if they have a different order.
- With category and tag pages (e.g. on blogs), it is better to just tease the text of individual articles (instead of displaying it in full) and offer a read more button. This not only prevents duplicate content but also increases page views per user.
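The check on result lists mentioned above can be sketched as a comparison of the items on two category pages that ignores their order. The category contents and the function name overlap are invented for this example, and any threshold at which you treat two lists as too similar is your own judgment call.

```python
def overlap(list_a, list_b) -> float:
    """Jaccard overlap of two result lists, ignoring the order of the items."""
    items_a, items_b = set(list_a), set(list_b)
    if not items_a and not items_b:
        return 0.0
    return len(items_a & items_b) / len(items_a | items_b)


# Hypothetical category pages, each listing article identifiers.
category_a = ["post-1", "post-2", "post-3"]
category_b = ["post-3", "post-1", "post-2"]  # same articles, different order
category_c = ["post-3", "post-4"]

print(overlap(category_a, category_b))  # 1.0: the lists are effectively identical
print(overlap(category_a, category_c))  # 0.25: the lists are clearly distinct
```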