Remove cached URLs from Google
https://www.ilfusion.com/how-to-prevent-google-from-indexing-certain-web-pages
https://www.google.com/webmasters/tools/home?hl=en
- Add site (verify ownership by uploading the Google verification file to the web root)
- Removal of urls can be requested here:
https://www.google.com/webmasters/tools/url-removal?hl=en
Help: https://support.google.com/webmasters/answer/1663419?hl=en
- Single url:
  - /foo/bar.html?baz=zak
  - /foo/ (with trailing slash)
- Directory:
  - /foo (without trailing slash)
  - Note: does not seem to remove subdirectories!
    Removes /foo/bar and /foo/bar.html, but not /foo/bar/baz/
- Whole site:
  - Leave the url field empty
Controlling Indexing
- Create robots.txt in the webroot
  - Block everything except the sitemap and the index page:
      User-agent: *
      Sitemap: https://xxx.yyy/sitemap.txt
      Allow: /sitemap.txt
      Allow: /index.html
      Disallow: /
  - Exclude subfolders only:
      User-agent: *
      Disallow: /dir1/*
      Disallow: /dir2/*
- Exclude specific pages with an HTML head meta tag (for non-HTML files such as PDFs, send the equivalent X-Robots-Tag HTTP header; see the sketch after this list):
  - <meta name="robots" content="noindex">
- Create sitemap.txt (plain-text sitemap: one absolute url per line; see the generator sketch below)
- In webmaster tools:
  - Submit the sitemap to encourage a fast re-index
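A plain-text sitemap is simply one absolute url per line. A minimal PHP sketch of a generator, where getPublicUrls() and its stand-in data are made up for illustration:

  <?php
  // build_sitemap.php: write sitemap.txt with one absolute url per line.
  // getPublicUrls() is a placeholder for however you enumerate your pages.
  function getPublicUrls()
  {
      return array('/index.html', '/about.html'); // stand-in data
  }

  $lines = array();
  foreach (getPublicUrls() as $path) {
      $lines[] = 'https://xxx.yyy' . $path;
  }
  file_put_contents(__DIR__ . '/sitemap.txt', implode("\n", $lines) . "\n");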
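Non-HTML files such as PDFs cannot carry a robots meta tag, but Google honors the equivalent X-Robots-Tag response header. A minimal PHP sketch, assuming the file is served through a download script (the file path is hypothetical):

  <?php
  // download.php: serve a PDF while telling crawlers not to index it.
  $file = __DIR__ . '/docs/report.pdf';

  header('X-Robots-Tag: noindex'); // header equivalent of the robots meta tag
  header('Content-Type: application/pdf');
  header('Content-Length: ' . filesize($file));
  readfile($file);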
Make Google forget sensitive data from specific pages permanently
"Make removal permanent"
https://support.google.com/webmasters/answer/1663419?hl=en
- Hotfix: remove the urls via Search Console (see above)
- Then immediately make sure the problematic urls return a proper 404 and/or noindex tags.
  DO NOT INCLUDE THEM in robots.txt, otherwise the bots will never find out that the page does not exist anymore!
- Remove or update the actual content on your site (images, pages, directories) and make sure that your web server returns either a 404 (Not Found) or 410 (Gone) HTTP status code.
  Non-HTML files (like PDFs) should be completely removed from your server.
- Block access to the content, for example by requiring a password.
- Indicate that the page should not be indexed using the noindex meta tag. This is less secure than the other methods.
  IMPORTANT: For the noindex meta tag to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by robots.txt, the crawler will never see the noindex tag.
  (https://support.google.com/webmasters/answer/93710)
Example for a symfony1 action:

  public function executeShow(sfRequest $request)
  {
    // 404 Not Found tells crawlers the page no longer exists
    $this->getResponse()->setStatusCode(404);
    $this->getResponse()->setHttpHeader('X-Robots-Tag', 'noindex');
    $this->getResponse()->addMeta('robots', 'noindex'); // or "noindex,nofollow"
    echo '404 - page not found';

    return sfView::NONE;
  }
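The same idea in plain PHP, outside symfony, covering the 410 (Gone) and password-blocking options from the list above; the credential is made up:

  <?php
  // Alternative A: the content is gone for good -> 410 (Gone) plus noindex header.
  header('HTTP/1.1 410 Gone');
  header('X-Robots-Tag: noindex');
  echo '410 - page permanently removed';
  exit;

  // Alternative B: keep the content but require a password (HTTP basic auth);
  // 's3cret' is a made-up credential:
  // if (!isset($_SERVER['PHP_AUTH_PW']) || $_SERVER['PHP_AUTH_PW'] !== 's3cret') {
  //     header('WWW-Authenticate: Basic realm="private"');
  //     header('HTTP/1.1 401 Unauthorized');
  //     exit('401 - authentication required');
  // }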
Bulk Remover
A Chrome extension that helps with removing a long list of separate urls
- https://www.searchcommander.com/how-to-bulk-remove-urls-google/
- Download and install the Chrome extension
- Build a list of urls:
  - Google Search Console -> MySite -> Search Queries -> Internal Links -> Download Table
  - Convert to a text file with one url per line
  - Add the FQDN to each line (see the conversion sketch after this list)
- Or directly from Google:
  - Do your Google search (e.g. site:https://xxx.yyy)
  - Search settings -> 100 entries per page
- http://www.chrisains.com/seo-tools/extract-urls-from-web-serps/
- Upload with the Chrome extension
  - If you get the message "Quota exceeded", wait a while and reload the page (resends the POST data)
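A sketch of the conversion step in PHP, assuming the Search Console export is a CSV whose first column holds the page path (the file names and column index are assumptions; adjust to your export):

  <?php
  // csv2urls.php: turn a Search Console table export into one url per line.
  // export.csv / urls.txt and the first-column layout are assumptions.
  $host = 'https://xxx.yyy';
  $in   = fopen(__DIR__ . '/export.csv', 'r');
  $out  = fopen(__DIR__ . '/urls.txt', 'w');

  fgetcsv($in); // skip the header row
  while (($row = fgetcsv($in)) !== false) {
      $path = $row[0];
      // Prepend the FQDN when the export contains bare paths.
      $url = (strpos($path, 'http') === 0) ? $path : $host . $path;
      fwrite($out, $url . "\n");
  }
  fclose($in);
  fclose($out);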