Remove cached URLs from Google

https://www.ilfusion.com/how-to-prevent-google-from-indexing-certain-web-pages

 

https://www.google.com/webmasters/tools/home?hl=en

Controlling Indexing

  • Create robots.txt in webroot
    • Allow only the sitemap and the start page
      User-agent: *
      Sitemap: https://xxx.yyy/sitemap.txt
      Allow: /sitemap.txt
      Allow: /index.html
      Disallow: /
    • Exclude subfolders
      User-agent: *
      Disallow: /dir1/
      Disallow: /dir2/
  • Exclude specific pages from the index with a robots meta tag in the HTML head
    • <meta name="robots" content="noindex">
  • Create sitemap.txt
  • In Webmaster Tools (Search Console)
    • Submit the sitemap to encourage a fast re-index
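
The robots.txt rules above can be sanity-checked offline before they are deployed. This is a minimal sketch using Python's standard urllib.robotparser. The helper name is_allowed and the test path /dir1/page.html are illustrative assumptions, and xxx.yyy is the placeholder host from these notes. This parser does plain prefix matching and, in the versions I know, applies the first matching rule, whereas Google picks the most specific one, so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the notes; xxx.yyy is the placeholder host used there.
ROBOTS_TXT = """\
User-agent: *
Sitemap: https://xxx.yyy/sitemap.txt
Allow: /sitemap.txt
Allow: /index.html
Disallow: /
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper for illustration; it parses the text locally and
    never touches the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # /dir1/page.html is an assumed example path, not a real page.
    for path in ("/index.html", "/sitemap.txt", "/dir1/page.html"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "https://xxx.yyy" + path) else "blocked"
        print(path, verdict)
```

Only the two explicitly allowed files should come back as allowed; everything else falls under Disallow: /.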

Make Google forget sensitive data from specific pages permanently

"Make removal permanent"
https://support.google.com/webmasters/answer/1663419?hl=en

  • Hotfix: Remove the URLs via Search Console (see the Webmaster Tools link above)
  • Then immediately make sure the problematic URLs return a proper 404 and/or carry noindex tags
    DO NOT INCLUDE THEM in robots.txt, otherwise the bots will never find out that the page no longer exists!
    • Remove or update the actual content from your site (images, pages, directories) and make sure that your web server returns either a 404 (Not Found) or 410 (Gone) HTTP status code.
      Non-HTML files (like PDFs) should be completely removed from your server.
    • Block access to the content, for example by requiring a password.
    • Indicate that the page should not be indexed using the noindex meta tag. This is less secure than the other methods.
      IMPORTANT: For the noindex meta tag to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by a robots.txt file, the crawler will never see the noindex tag.
      (https://support.google.com/webmasters/answer/93710)
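
The checklist above can be turned into a quick check on a response you have already fetched. This is a rough sketch, not Google's actual logic. The function name is_removal_ready is hypothetical, and it assumes only three signals count: a 404 or 410 status, an X-Robots-Tag header containing noindex, or a robots meta tag containing noindex. It does not check whether robots.txt blocks the page, which still has to be verified separately.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects the content attribute of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            self.directives.append((attr.get("content") or "").lower())


def is_removal_ready(status: int, headers: dict, body: str = "") -> bool:
    """Return True if the response tells crawlers to drop the page.

    Hypothetical helper: `status`, `headers` and `body` come from whatever
    HTTP client you use; nothing is fetched here.
    """
    # 404 (Not Found) or 410 (Gone) is enough on its own.
    if status in (404, 410):
        return True
    # Header names are case-insensitive, so compare them in lower case.
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    # Fall back to the robots meta tag in the HTML.
    parser = _RobotsMetaParser()
    parser.feed(body)
    return any("noindex" in directive for directive in parser.directives)
```

Feed it the status code, headers and body from a plain request to each problematic URL; any URL for which it returns False still needs attention.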

Example for symfony1 action

  public function executeShow(sfRequest $request)
  {
    $this->getResponse()->setStatusCode(404); // Not Found (410 would signal Gone)
    $this->getResponse()->setHttpHeader('X-Robots-Tag', 'noindex');
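    // Note: the meta tag is only emitted if a layout outputs the metas; with sfView::NONE the header above does the work.
    $this->getResponse()->addMeta('robots', 'noindex'); // or "noindex,nofollow"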
    echo "404 - page not found";
    return sfView::NONE;
  } 

 

Bulk Remover

A Chrome extension that helps with submitting removal requests for a long list of individual URLs