Penguin is a series of Google algorithm updates launched to improve the quality of search results by discounting the value of certain manipulative link building practices that marketing vendors use to improve visibility in Google and other major search engines. Web sites that have used these practices will notice a drop in the traffic they receive from Google organic search results because of these algorithmic updates.

What really happened?

When you see a traffic drop from Google, it does not necessarily mean that it is due to a Penguin penalty. There might be other causes such as:

Tracking issues

On certain occasions the actual reason for a Google organic traffic drop might be that the Google Analytics tracking tag is missing from one or more pages.

Checklist

Scan Site – Crawl the whole web site and identify which pages are missing the Google Analytics tag (this can be done with either Screaming Frog or gachecker.com).

Referrals – Make sure your own site is not showing as a referral traffic source (GA > Acquisition > All Traffic > Referrals).

Top pages – Check the top pages by traffic and find out if any pages have stopped working (GA > Behaviour > Landing Pages).

Hostnames – Identify whether any other web sites are using your analytics code. These can be caches, translations and other web sites. (This report is found under Visitors, Network Properties, and Hostnames.)

Alerts – Set up intelligence alerts to get notified when there are big traffic drops (read more here: https://support.google.com/analytics/answer/1033021?hl=en).
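
If you prefer to script the scan step, here is a minimal sketch in Python. It assumes you already have each page's HTML as a string (for example from a crawler export). The function names and the list of snippet patterns are my own illustrative assumptions and are not an exhaustive list of Google Analytics tags.

```python
import re

# Hypothetical patterns for common Google Analytics snippets (an assumption,
# not an exhaustive list): ga.js / analytics.js, gtag.js, and UA property IDs.
GA_PATTERNS = [
    re.compile(r"google-analytics\.com/(ga|analytics)\.js"),
    re.compile(r"googletagmanager\.com/gtag/js"),
    re.compile(r"\bUA-\d{4,}-\d+\b"),
]


def has_ga_tag(html: str) -> bool:
    """Return True if the HTML contains any of the known tracking snippets."""
    return any(pattern.search(html) for pattern in GA_PATTERNS)


def pages_missing_tag(pages: dict[str, str]) -> list[str]:
    """Given {url: html}, return the sorted URLs with no tracking snippet."""
    return sorted(url for url, html in pages.items() if not has_ga_tag(html))
```

Pages flagged by a check like this still need a manual look, since a tag can be present but misconfigured.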

Content issues

The content of the site might be of poor quality, or there might be duplicate content issues that have resulted in a Panda or manual penalty.

Check with Google Web Master Tools and Siteliner to find internal duplicate content issues (duplicate title tags, meta descriptions) and check for doubles of:

Checklist
  • www.site.com and site.com
  • http:// and https://
  • dir and dir/
  • / and /index.php
  • /cat/dir/ and /dir/cat/
  • /cat/dir/id/ and /cat/id/
  • param_1=12?t_1=34 and /cat_12/dir_34/
  • site.com and test.site.com
  • test.site.com and site.com/test/
  • /?you_id=334 and /session_id=344333
  • Use CopyScape or PlagSpotter for external duplicate content issues.
  • Make sure all parameters are blocked from the search engines so that these pages do not get indexed.
  • Check for partial duplicates.
  • Check for inconsistent internal linking (screaming frog, deepcrawl).
  • Look for any sub-domains.
  • Use Angular Duplicate Content Tools to identify any duplicate content issues (http://angular.marketing/free-tools/duplicate-content/).
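
Some of the duplicate URL variants in the checklist above can be spotted from a crawl export with a few lines of Python. This is only a sketch: the normalisation rules (ignore scheme, leading www., trailing slash, trailing index.php, query string and letter case) are my assumptions and cover only a few of the pairs listed, so adjust them per site.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Collapse common duplicate variants of a URL into one comparable key.

    Assumed rules: ignore the scheme, a leading "www.", a trailing slash,
    a trailing index.php, the query string and letter case.
    """
    parts = urlsplit(url.strip().lower())
    host = parts.netloc
    if host.startswith("www."):
        host = host[4:]
    path = parts.path
    if path.endswith("/index.php"):
        path = path[: -len("index.php")]
    path = path.rstrip("/")
    return host + path


def find_duplicate_groups(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs sharing a normalized key; keep groups with 2+ members."""
    groups: dict[str, list[str]] = {}
    for url in urls:
        groups.setdefault(normalize_url(url), []).append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Each group returned is a set of URLs that probably serve the same page and should be consolidated with a 301 or a canonical tag.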

Technical Issues

This is very common after a site migration, due to a disallow directive in robots.txt, a wrong implementation of rel="canonical", severe site performance issues etc.

Checklist
  • Are 301 redirects used properly?
  • Are there any “bad” redirects?
  • Do redirects point to the final URL rather than going through redirect chains?
  • Has the canonical version of the site been specified in Google Webmaster Tools?
  • Is the rel="canonical" link tag properly implemented across the site?
  • Is the site using absolute instead of relative URLs?
  • Is the canonical version of the site established through 301s?
  • Is the robots.txt file blocking any pages?
  • Scan the whole site and see what tags are being used on every page, in case the noindex tag has been applied by mistake to one or more pages.
  • Check for indexed PDF versions of your content (site:mydomain filetype:pdf).
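
The robots.txt item in the checklist above can be tested offline with Python's standard library. This is a rough sketch, assuming you paste in the robots.txt content and a list of important URLs. The helper name is mine, and the standard-library parser does not necessarily interpret every rule the way Googlebot does, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt text disallows for the agent."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]
```

Running it against your top landing pages right after a migration is a quick way to catch a leftover disallow directive.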

Outbound linking issues – Many times a site links to spam sites or web sites operating in untrustworthy niches.

  • Links to low trust web sites
  • Paid Links

Negative SEO – If you experience a sudden traffic drop, then you might have been a victim of negative SEO. Negative SEO usually refers to the practice of a competitor buying low quality links and pointing them to your web site with the intention of hurting your organic traffic.

Hacking – On several occasions your web site could have been hosting spam, malware or viruses as a consequence of being hacked.

Google Updates

In order to get a better understanding of how the various Google updates affected the organic traffic, it is also recommended to identify all the core dates on which any updates took place (official and unofficial) and how they affected the web site’s traffic.

Google Algorithm Updates Sources

Google Traffic

In order to isolate the Google Traffic you will need to create the following segment:

https://www.google.com/analytics/web/template?uid=CsPptfU_QE-X-Yngg00dVQ

Google Algorithm Updates Tools

There are two online tools that I highly recommend to speed up the process:

What is Google Penguin

Google Penguin is an algorithmic update first launched by Google in April 2012 to improve the quality of the search results returned to users by dealing with various forms of search spam (also known as spamdexing or Black Hat SEO) such as:

Key facts about Penguin

  • Penguin is an algorithmic update, which means that it is possible to instantly recover from it.
  • Penguin seems to affect rankings at a keyword level more than site wide.
  • Recovery is possible before the next update.
  • You DO NOT receive a notification in Google Web Master Tools if you have been hit by a Penguin update.
  • You can only submit a reconsideration request when you have received a manual penalty.
  • The key date is 24 April 2012, so if you see a traffic drop right after this date, you may have been hit by the Google Penguin algorithmic update.

How to find out if you were hit by Penguin 

As Penguin is related mostly to backlinks, it is absolutely necessary to examine the following:

  • Over optimised anchor text (externally and internally)
  • Over optimised anchor text on low quality web sites
  • The dates on which your web site’s traffic was affected.
  • If you received any notification in Google Web Master Tools.
  • Is it a site-wide drop or does it seem to be keyword-specific?

Steps To Recovery

Step 1 – Match updates to Google Analytics organic traffic

Google Analytics is a very useful tool in this case as it can help us identify if there was any traffic drop after each Penguin update.

  • April 24, 2012: Penguin 1
  • May 25, 2012: Penguin 1.1
  • October 5, 2012: Penguin 1.2
  • May 22, 2013: Penguin 2.0
  • October 4, 2013: Penguin 2.1
  • October 18, 2014: Penguin 3.0

Step 2 – Compare two weeks before and two weeks after

Now you need to compare the organic traffic for the two weeks prior to each Penguin update with the two weeks after it, allowing a few days of buffer on both sides to give the algorithm time to shake out.
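
To make the date ranges concrete, here is a small Python sketch that works out the two comparison windows around an update date and sums daily visits inside each. The three-day buffer is only my assumption of what "a few days" means, and the daily visit numbers would come from your own Google Analytics export.

```python
from datetime import date, timedelta


def comparison_windows(update: date, days: int = 14, buffer_days: int = 3):
    """Return ((before_start, before_end), (after_start, after_end)).

    The buffer skips buffer_days on each side of the update date itself.
    """
    before_end = update - timedelta(days=buffer_days + 1)
    before_start = before_end - timedelta(days=days - 1)
    after_start = update + timedelta(days=buffer_days + 1)
    after_end = after_start + timedelta(days=days - 1)
    return (before_start, before_end), (after_start, after_end)


def total_visits(daily_visits: dict[date, int], window: tuple[date, date]) -> int:
    """Sum the visits for every day that falls inside the window (inclusive)."""
    start, end = window
    return sum(visits for day, visits in daily_visits.items() if start <= day <= end)
```

For the 24 April 2012 update, for example, this compares 7 to 20 April with 28 April to 11 May.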

Step 3 – Investigating what dropped

Now that you have a clear understanding of which updates affected the web site’s organic traffic from Google, you also need to find out what actually dropped.

Step 4 – Which keywords dropped?

Penguin seems to affect web sites more at a keyword level rather than site wide. For the same periods that you checked your traffic, compare the top keywords that you are optimising your web site for, to see if any were severely affected.

Step 5 – Gather All Links

You have now reached the point where you need to gather all links to start the analysis. For this process you will need your backlink profile from the following tools:

After you have exported all the data and removed all duplicates with Excel, start the analysis of the anchor text. What you need to do initially is to find instances of anchor text by using the following functions:

COUNTIF

Microsoft Excel Definition: Counts the number of cells within a range that meet the given criteria.

Syntax: COUNTIF(range,criteria)

COUNTIF is your go-to function for getting a count of the number of instances of a particular string.

IFERROR

Microsoft Excel Definition: Returns a value that you specify if a formula evaluates to an error; otherwise, it returns the result of the formula. Use IFERROR to trap and handle errors in a formula.

Syntax: IFERROR(value, value_if_error)

IFERROR is really simple and will become an important piece of most of our formulas as things get more complex. IFERROR is your method to turn those pesky #N/A, #VALUE! or #DIV/0! messages into something a bit more presentable.
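
If you would rather do this step outside Excel, the same two ideas can be sketched in Python. These helpers are rough equivalents I wrote for illustration, not Excel's exact behaviour: count_anchor matches whole anchor texts case-insensitively, and safe_ratio only traps division by zero.

```python
from collections import Counter


def count_anchor(anchors: list[str], criteria: str) -> int:
    """Rough COUNTIF equivalent: count exact, case-insensitive matches."""
    counts = Counter(anchor.strip().lower() for anchor in anchors)
    return counts[criteria.strip().lower()]


def safe_ratio(numerator: float, denominator: float, fallback: float = 0.0) -> float:
    """Rough IFERROR equivalent: return a fallback instead of a division error."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return fallback
```

For example, safe_ratio(count_anchor(anchors, "cheap flights"), len(anchors)) gives the share of links using that anchor without failing on an empty export.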

Step 6 – Combine data to pull out learnings

Now you need to pull data from Google Analytics for each update (15 days before vs 15 days after) for the top anchor texts, in order to discover whether organic traffic dropped for the keywords that were used as anchor text in links back to your web site (top anchors). Here is what you need to do step by step:

  • Combine all link resources in excel
  • Keep only the i) Anchor Text ii) Linking Domains iii) Links Containing Anchor Text
  • De-duplicate data
  • Use COUNTIF and IFERROR to find anchor text instances
  • Extract data from Google Analytics (pre and post update)
  • Find the percentage of traffic drop by using the formula (traffic_before - traffic_after) / traffic_before, selecting the columns and cells that represent the data for each date range.
  • Create a pivot table and combine the following information
    • The drop;
    • # of LRDs;
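
The steps above can also be sketched in Python instead of a pivot table. This is an illustration under my own assumptions about the data shape: links is a list of (anchor text, linking domain) pairs from the combined exports, and traffic maps each keyword to its (before, after) visits from Google Analytics.

```python
from typing import Optional


def traffic_drop(visits_before: float, visits_after: float) -> Optional[float]:
    """(before - after) / before, or None when there was no traffic before."""
    if visits_before == 0:
        return None
    return (visits_before - visits_after) / visits_before


def summarise(links: list[tuple[str, str]], traffic: dict[str, tuple[float, float]]) -> list[dict]:
    """Combine the drop and the number of linking root domains per anchor."""
    domains: dict[str, set[str]] = {}
    for anchor, domain in links:
        # A set de-duplicates repeated links from the same domain.
        domains.setdefault(anchor.lower(), set()).add(domain.lower())
    report = []
    for anchor, (before, after) in traffic.items():
        report.append({
            "anchor": anchor,
            "drop": traffic_drop(before, after),
            "lrds": len(domains.get(anchor.lower(), set())),
        })
    # Largest drop first; anchors with no prior traffic sort as zero.
    return sorted(report, key=lambda row: row["drop"] or 0.0, reverse=True)
```

Anchors that combine a large drop with a high number of linking root domains are the first candidates to investigate.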

If you are not very familiar with Excel and pivot tables, I recommend downloading the following spreadsheet and using it as a guide, as it will save you a lot of time.

Penguin Recovery Spreadsheet

Step 7 – Check links using Link Detox

Link Detox is a very powerful tool if used properly, as it combines data from multiple sources. Here is what you need to do:

  • Create an account here https://www.linkresearchtools.com/
  • Go to Link Detox https://www.linkresearchtools.com/toolkit/dtox.php
  • Enter Domain to analyze
  • Analyze links going to Root Domain
  • Activate the NOFOLLOW evaluation
  • Select theme of domain from dropdown
  • Select whether Google sent you a manual spam example (Yes, No, Do Not Know)
  • Upload any links you already have (Ahrefs, Open Site Explorer, Majestic, Google Web Master Tools)
  • Upload disavowed links (if you have disavowed any)
  • Hit Run Link Detox and wait
  • After one hour, or even the next day, go to Reports to find the Link Detox report
  • Classify all your anchor text before you start auditing your links
    • Money
    • Brand
    • Compound (brand + money example : Debenhams toys collection)
    • Other
  • Download the report in CSV format and open it with Excel.
  • Keep only the following columns
    • From URL- This is the URL of the page that links to your web site.
    • To URL- This is the page of your web site that the external web site is linking to.
    • Anchor Text – This is the keyword or keyword phrase used as link text.
    • Link Status: If the link is passing link juice or not for the search engines (follow or nofollow).
    • Link Loc – The location of the link on the page (paragraph, footer, widget etc.). Very useful when you need to remove it.
    • HTTP-Code – The HTTP status code sent by the server in its response.
    • Link Audit Priority – The importance of reviewing the links coming from each domain. The higher the priority, the more urgent it is to examine the links.
    • DTOXRISK – How toxic each link is.
    • Sitewide links – A site wide link is one that appears on most or all of a website’s pages (blogroll, footer etc.)
    • Disavow – Whether Google has been notified through the Disavow Tool that this link has to be ignored.
    • Power Trust – A metric used to show how powerful and trusted a page or domain is in the eyes of Google.
    • Power Trust* Domain – The Power Trust metric applied to the whole domain rather than to an individual page.
    • Rules – Spam link classification (banned domain, link network etc.)
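
The anchor text classes listed earlier in this step (Money, Brand, Compound, Other) can be assigned in bulk with a simple rule. The sketch below is my own simplification: it assumes you supply lowercase lists of brand terms and money keywords, and it uses plain substring matching, so review the output by hand.

```python
def classify_anchor(anchor: str, brand_terms: set[str], money_terms: set[str]) -> str:
    """Label an anchor as 'Brand', 'Money', 'Compound' or 'Other'.

    brand_terms and money_terms are expected in lowercase.
    """
    text = anchor.lower()
    has_brand = any(term in text for term in brand_terms)
    has_money = any(term in text for term in money_terms)
    if has_brand and has_money:
        return "Compound"
    if has_brand:
        return "Brand"
    if has_money:
        return "Money"
    return "Other"
```

With brand_terms={"debenhams"} and money_terms={"toys"}, the anchor "Debenhams toys collection" from the example above is labelled Compound.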

Step 8 – Create additional columns

Before you start, you will need to create the following columns:

  • Contact Email
  • Contact Page URL
  • Contacted (Yes, No)
  • Removed (Yes, No)
  • Page Trust Flow (Majestic)
  • Domain Trust Flow (Majestic)
  • Niche (use majestic for this)
  • Page Indexed (double check)
  • Date of 1st Contact
  • Date of 2nd Contact
  • Date of 3rd Contact
  • Notes

The following are supplementary:

  • Edu Domains (Majestic)
  • Domain Toxic Links (OpenLinkProfiler)
  • Governmental Domains  (Majestic)
  • Page Facebook Shares
  • Page Facebook Likes
  • Page Twitter Shares
  • Page Google+ Likes

Step 9 – Keep only one URL per domain

Sort the sheet by the domain column, then create an additional column after the domain column and paste the following function into the first cell: =IF(B1=B2,"duplicate","unique"). Copy it down the whole column and then use the filter control you have applied to view only the unique values.
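
The same filtering can be done in Python if the link list is too large for Excel. This is a sketch with a few assumptions of mine: every URL includes its scheme (http:// or https://), www. is ignored, and other sub-domains count as separate domains.

```python
from urllib.parse import urlsplit


def one_url_per_domain(urls: list[str]) -> list[str]:
    """Keep the first URL seen for each host, preserving input order."""
    seen: set[str] = set()
    unique = []
    for url in urls:
        host = urlsplit(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host not in seen:
            seen.add(host)
            unique.append(url)
    return unique
```

Unlike the Excel formula, this keeps the first URL per domain rather than the last one, and it does not need the list to be sorted.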

Step 10 – Exclude non-verified domains

Simply use the filter from the column anchor text to exclude the unverified links. These are links that do not exist anymore.

Step 11 – Exclude disavowed Links

If you have done link audits before and you have existing disavow files, it would be good to exclude those links too, as this will save precious time. You can review these links separately.

Step 12 – Start with banned domains  

  • Now is the time to start reviewing your links. Follow backlinks are always a higher priority, as they are the ones that can violate the Google webmaster guidelines directly.
  • Apply a filter to all cells.
  • Then apply filter to view links with TOX1 (banned links).
  • Use the tag columns to mark which domain needs to be removed or not by using a descriptive tag.
  • Mark any URL or domain that needs to be disavowed so that you can create a file at the end very easily.
  • Be careful while reviewing as in certain cases several domains might not be indexed for other reasons than being penalised (robots.txt, no-index tag).
  • Also, several domains might be very authoritative and trustworthy and there is nothing wrong with linking to you. For instance the following link was found as TOXIC during one of my link audits.

Step 13 – Domain infected with viruses

  • Apply a filter to all cells
  • Then apply filter to view links with TOX2 (virus infected)
  • If you find a good one, do not remove the link but simply contact the web master.
  • Remove only the bad ones.
  • Double check the domains with one of the following tools

Step 14 – Audit TOX3 domains

According to Link Genesis, all these links are classified as highly toxic, so you will need to check them very carefully and remove them if you agree with the Link Detox suggestions.

Step 15 – Double check Google Web Master backlinks

Pay particular attention to links imported from Google Web Master Tools during your reviews, as John Mueller from Google has confirmed that it should be the primary source of backlinks used to clean up your web site from bad links.

How to judge the value of a Link

Before deciding to take any action on links that might be toxic and could therefore result in your web site receiving a Penguin penalty from Google, you need to devote time to understanding all the data that you have pulled into the spreadsheet, such as:

Domain Trust Flow (Majestic): How respected the domain is on the web. If the domain has high trust in general, this is an indication that Google values it. (This usually applies to domains with a domain trust over 10.)

Page Trust Flow (Majestic): This metric is similar to Domain Trust Flow but it is applied at a page level.

Domain Power Trust (Cemper): This metric determines the quality of a website according to its strength and trustworthiness. It analyses data in real time from over 24 sources including Google, Moz, SEMrush, MajesticSEO, Sistrix, and many more.

There are four types of links:

  • High Trust and Low Power – Links from highly trusted domains such as universities or governmental institutions. These links are usually very difficult to get and have a very positive impact on your web site’s credibility.
  • Low Trust and High Power – These links require further research as they are not necessarily always good.
  • Low Trust and Low Power – These links do not help much in general as they may come from new, dormant, or even penalized sites. Review any of these sites carefully before you decide to build links on any of their pages.
  • High Trust and High Power – This is ideally what you are looking for. Pursue these links actively as they will strongly benefit your web site.

DTOXRISK: This is the risk for each link, based on how harmful it might be for your web site according to Link Detox calculations (client feedback, observations, linking domains, neighbourhood, internal and external SEO experts, known Google publications etc.)

To get a full understanding please go to the following page: http://www.linkdetox.com/faq

Link Audit Priority: The higher the priority the more important it is to review each link.

Link Status: Whether a link is follow or no-follow.

Link Location: Where the link is placed on the page (header, footer, navigation etc.)

Niche: The niche that the domain falls under (finance, property, computers etc.)

HTTP code: These codes help identify the cause of a problem based on the response sent from the server (for detailed information see http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html).

Page indexed: Whether the page is indexed or not by Google. Please double check and also use http://indexchecking.com/ .

On top of all these metrics you will need to take into consideration how search engines judge the value of each link.

Which Links To Remove?

  • Link networks
  • Article submissions
  • Directory submissions
  • Duplicate content links (e.g. guest blog duplicated over 100s of domains)
  • Spammy bookmarking sites
  • Forum profiles (if done for backlinks)
  • Malware/hacked sites
  • Gambling/Adult sites (if your site is not in the same niche)
  • Comment links with over-optimized anchor text (e.g. Cheap Flights instead of John Doe)
  • Blog roll links
  • Footer links
  • Site wide links (in most cases)
  • Scraper sites
  • Any auto-generated links (xRumer forum posts, etc.)

Defining and measuring the success

After identifying and prioritising the most toxic links and removing them, check the results of this work. Depending on the web site’s case, there can be three scenarios:

  • Manual penalty: Even if you have received the “Manual spam action revoked” message from Google, this does not ensure that your web site’s traffic will recover to its pre-penalty level.
  • Algorithmic hit: In this case, the effect is visible only after a month or so. For example, recovering from a Penguin update may take as long as 6 months.
  • Proactive approach: For sites that initially had many toxic links, avoiding a manual penalty or a future algorithmic hit is itself a worthwhile result.

How to Remove Links

When it comes to removing links, there are several options available.

Contact web masters

  • Draft a document containing the complete list of links to be removed. Send it to the webmaster in a single, short, well drafted email.
  • Adopt one communication channel (email, LinkedIn, Facebook). Switch channel only when you get no response on the first one.
  • Be polite to webmasters. They are trying their best to solve your problem! Keep a human touch in your communication. An email addressing the webmaster by name is more likely to get a response and to develop a strong professional relationship.
  • Emailing from the domain whose links you wish to remove improves the chance of a response from the webmaster.
  • Avoid paying any high fees for link removal.
  • Be polite because you are asking a favour from the webmaster.
  • Be polite while talking to spammers, because they can blackmail you for money if you are rude.
  • If you spammed somebody’s site, be polite, admit it and apologise, and make sure you don’t repeat it (the webmaster will check).

The disavow tool

  • If you aren’t able to remove all toxic links, use the disavow tool before submitting a reconsideration request. The disavow tool shouldn’t be used as a shortcut at the expense of the tasks mentioned above, because Google checks previous removal efforts before entertaining your request.
  • Make a spreadsheet of the links you dealt with before using the disavow tool; sort it and make a list of un-removed links, removed links and methods of removal.
  • Focus on the un-removed links. Try to sort the data by domain.
  • You can either disavow a full domain or just one link from a domain. Choose the method wisely. Make separate lists in Notepad for the domains and the links.

Some tips:

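  • Don’t include http://www. before a domain; use the domain: prefix instead (for example, domain:example.com).
  • Keep the full domain name, including its extension (such as .com), exactly as it appears in the links.
  • Put each domain on a new line and add the reason for disavowing (prefixed with #) for reference.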
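
If you keep your un-removed links in a spreadsheet, the disavow file can be generated rather than typed by hand. The sketch below assumes the line format documented by Google for the disavow tool (domain:example.com for whole domains, a full URL for single links, and # for comment lines). The function name and the input shape are my own.

```python
def build_disavow_file(domains: dict[str, str], urls: dict[str, str]) -> str:
    """Build disavow file text from {domain: reason} and {url: reason}."""
    lines = []
    for domain, reason in sorted(domains.items()):
        # Strip the scheme, a leading www. and a trailing slash from the domain.
        host = domain.lower().removeprefix("http://").removeprefix("https://")
        host = host.removeprefix("www.").rstrip("/")
        lines.append(f"# {reason}")
        lines.append(f"domain:{host}")
    for url, reason in sorted(urls.items()):
        lines.append(f"# {reason}")
        lines.append(url)
    return "\n".join(lines) + "\n"
```

Save the returned text as a plain .txt file and review it line by line before uploading, since a wrong domain entry disavows every link from that site.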

Link Removal Tools

Link Research Tools / Link Detox

This is my personal favourite tool as it has some very powerful features such as:

  • Nofollow Link Evaluation
  • Disavow Links Import
  • Link Spam Classification
  • Link Prioritisation
  • Tagging
  • Exporting to excel
  • Disavow File Creation
  • Data Visualisation
  • Contact details extraction

I found it extremely useful when it comes to link networks, penalised web sites, article sites, directories and forums. However, sometimes a few of the results are not 100% accurate especially for Toxic 1 Links and probably because the tool is unable to retrieve the right data. If you use it carefully and have a full understanding of the metrics used to categorise each link you will be able to speed up your work without removing any good links.

KERBOO

Kerboo can be used both to analyse your backlink profile and to audit your disavow file. All links are classified based on their LinkRisk score from 0 to 1000. You can add your own link data to a Kerboo Audit profile. The more data you use, the more accurate the LinkRisk score. There is a nice user interface that allows you to easily audit all link profile data, while the Peek tool makes information discovery a lot easier, saving you a lot of time.

LinkDelete

LinkDelete is in my opinion one of the best removal services I have seen. Their algorithm is very sophisticated and accurate, and they also handle manual outreach to web masters and send you reports that you can submit to Google. Different packages are available depending on your budget and needs. The service keeps improving.

Remove’em

Remove’em can help you identify spammy links and save time by managing the outreach from the same place. The interface is not that great but it definitely helps speed up the process. I use it mostly for link removal after I have completed the link audit.

Rmoov

Rmoov does not attempt to classify links as spammy; it simply helps you speed up the link outreach by locating any contact information available on each web site.

Buzzstream

Buzzstream is not a link audit tool but it can be used to help you with link removal as it pulls the contact information from each web site and it can also retrieve the Whois information if there is not any contact information available on the web site. Other useful features included are: i) powerful templates ii) list building iii) flexible filtering and iv) notifications.

Disavow

After you have tried removing as many links as possible, use the disavow tool provided by Google for the links that you were not able to remove. This tool is only meant to be used as a last resort, so you have to make very good use of it.

404 the pages

In several cases, if you cannot get deep links removed at all, you can also change the URL of the page so that all these links go to a 404. I personally redirect them to another site that I create specifically for this reason, as I do not like to increase the errors on any web site that I am working on.

Writing Effective Link Removal Emails

When writing to web masters to remove low quality links that might hurt or that have already hurt your web site you always need to bear in mind the following simple guidelines:

  • Use your own email
  • Be brief
  • Be polite
  • Explain the situation
  • Make it easy to remove the links
    • Provide the pages that are linking to your web site
    • Provide the linked pages
  • Explain what exactly needs to be done
    • Remove links
    • Make links nofollow
    • Disavowing is not enough, please remove any mention of our site from any page.
    • Notify me when the links have been removed.

Here are a few examples that you can use

Email example 1

Subject line:  Please remove a link

Hi,

I am currently trying to remove any links I can as my web site has been penalised by Google. Your site is really great but in order to increase the chances of getting out of the penalty I will have to ask for your help.

Here are the pages that are linking to my site:

www.example.com/randompage is linking to www.mysite.com/my-page with the “Luxury London Hotels” anchor text.

The links need to actually be removed, rather than just disavowed. Even if they are “nofollow”, I’d still like them removed.

Please let me know as soon as you remove the links from your web site.

All The Best
Your Name

Email example 2

Hi,

I am trying to remove some backlinks pointing to our website, www.hurford-salvi-carr.co.uk. I would really appreciate your help in removing these links. Here is the info…

My website is linked on your website here:

  • http://www.example.com/path/page.html

The URLs point to the following pages:

  • http://www.mysite.com/path/page.html

And it uses the following anchor text:

  • Sussex Flats To Let

If you could please send a confirmation note letting me know that the link has been removed, I would really appreciate it. Thanks in advance! I hope to hear from you soon.

Kindest Regards
Your Name

A Common Sense Approach

Based on everything that has been said by John Mueller and Matt Cutts from Google and by industry experts, and on my personal experience, I would suggest the following actions:

  • Review all links very carefully.
  • Clean up as many links as you can, in particular the ones you created yourself (directories, forums, mini sites, profiles, press releases on poor quality sites).
  • Disavow all toxic links that you could not remove.
  • Make sure 60% of your anchor text is branded and only 40% focuses on money keywords (20% exact, 20% miscellaneous).
  • Review your niche carefully to understand its typical link profile (branded vs non-branded percentage, link types).
  • Build only quality links to restore link equity and to also build trust with Google.
  • Grow your brand.
  • Get media coverage.
  • Wait until Google reruns the Penguin algorithm and reassesses your site.
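
To check the 60% branded guideline against your own data, here is a small Python sketch. It assumes each link has already been given a class label such as "Brand", "Money", "Compound" or "Other". Whether compound anchors should count towards the branded share is a judgement call, and this sketch counts only the label you pass in.

```python
from collections import Counter


def anchor_shares(labels: list[str]) -> dict[str, float]:
    """Return each label's share of all anchors as a fraction of 1."""
    total = len(labels)
    if total == 0:
        return {}
    return {label: count / total for label, count in Counter(labels).items()}


def meets_brand_target(labels: list[str], brand_label: str = "Brand", target: float = 0.6) -> bool:
    """True when the branded share is at or above the target (60% by default)."""
    return anchor_shares(labels).get(brand_label, 0.0) >= target
```

Running the same check on a few competitors in your niche gives a useful baseline before you decide which anchors to dilute.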

I personally try to remove as many links as possible to recover a site from Penguin even if it is not necessary, simply because I do not want the sites that I work with to be associated with any spammy or low quality sites. Furthermore, I am not 100% convinced that the disavow works without removing any links. Another reason is that sometimes, if I choose to disavow links instead of domains (rarely), I might miss several bad links.

Depending on your time and resources, you will have to decide whether you really wish to clean up the site’s backlink profile or whether to focus only on recovering from Penguin. Link removal campaigns in general have a 5 – 20% success rate, so for algorithmic updates they are inefficient, but you should always talk to your clients about this option.

If you have any suggestions please feel free to email me. I will also try to keep improving this guide as much as I can with new tools and information.