Remove cached URLs from Google
https://www.ilfusion.com/how-to-prevent-google-from-indexing-certain-web-pages
https://www.google.com/webmasters/tools/home?hl=en
- Add site (verify ownership by uploading the Google verification file to the web root)
- Removal of urls can be requested here:
https://www.google.com/webmasters/tools/url-removal?hl=en
Help: https://support.google.com/webmasters/answer/1663419?hl=en
- Single url:
  - /foo/bar.html?baz=zak
  - /foo/ (with trailing slash)
- Directory:
  - /foo (without trailing slash)
  - Note: does not seem to remove subdirectories!
    Removes /foo/bar and /foo/bar.html, but not /foo/bar/baz/
- Whole site:
  - Leave the url field empty
Controlling Indexing
- Create robots.txt in the webroot
  - Block everything except the sitemap and the index page:
      User-agent: *
      Sitemap: https://xxx.yyy/sitemap.txt
      Allow: /sitemap.txt
      Allow: /index.html
      Disallow: /
  - Exclude subfolders only:
      User-agent: *
      Disallow: /dir1/*
      Disallow: /dir2/*
- Exclude specific pages with an HTML head meta tag (for non-HTML files such as PDFs, send the equivalent X-Robots-Tag HTTP header; see the sketch after this list):
  - <meta name="robots" content="noindex">
- Create sitemap.txt (plain-text sitemap: one absolute url per line; see the generator sketch below)
- In webmaster tools:
  - Submit the sitemap to encourage a fast re-index
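A plain-text sitemap is simply one absolute url per line. A minimal PHP sketch of a generator, where getPublicUrls() and its stand-in data are made up for illustration:

  <?php
  // build_sitemap.php: write sitemap.txt with one absolute url per line.
  // getPublicUrls() is a placeholder for however you enumerate your pages.
  function getPublicUrls()
  {
      return array('/index.html', '/about.html'); // stand-in data
  }

  $lines = array();
  foreach (getPublicUrls() as $path) {
      $lines[] = 'https://xxx.yyy' . $path;
  }
  file_put_contents(__DIR__ . '/sitemap.txt', implode("\n", $lines) . "\n");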
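Non-HTML files such as PDFs cannot carry a robots meta tag, but Google honors the equivalent X-Robots-Tag response header. A minimal PHP sketch, assuming the file is served through a download script (the file path is hypothetical):

  <?php
  // download.php: serve a PDF while telling crawlers not to index it.
  $file = __DIR__ . '/docs/report.pdf';

  header('X-Robots-Tag: noindex'); // header equivalent of the robots meta tag
  header('Content-Type: application/pdf');
  header('Content-Length: ' . filesize($file));
  readfile($file);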
Make Google forget sensitive data from specific pages permanently
"Make removal permanent"
https://support.google.com/webmasters/answer/1663419?hl=en
- Hotfix: remove the urls via Search Console (see above)
- Then immediately make sure the problematic urls return a proper 404 and/or noindex tags.
  DO NOT INCLUDE THEM in robots.txt, otherwise the bots will never find out that the page does not exist anymore!
- Remove or update the actual content on your site (images, pages, directories) and make sure that your web server returns either a 404 (Not Found) or 410 (Gone) HTTP status code.
  Non-HTML files (like PDFs) should be completely removed from your server.
- Block access to the content, for example by requiring a password.
- Indicate that the page should not be indexed using the noindex meta tag. This is less secure than the other methods.
  IMPORTANT: For the noindex meta tag to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by robots.txt, the crawler will never see the noindex tag.
  (https://support.google.com/webmasters/answer/93710)
Example for a symfony1 action:

  public function executeShow(sfRequest $request)
  {
    // 404 Not Found tells crawlers the page no longer exists
    $this->getResponse()->setStatusCode(404);
    $this->getResponse()->setHttpHeader('X-Robots-Tag', 'noindex');
    $this->getResponse()->addMeta('robots', 'noindex'); // or "noindex,nofollow"
    echo '404 - page not found';

    return sfView::NONE;
  }
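The same idea in plain PHP, outside symfony, covering the 410 (Gone) and password-blocking options from the list above; the credential is made up:

  <?php
  // Alternative A: the content is gone for good -> 410 (Gone) plus noindex header.
  header('HTTP/1.1 410 Gone');
  header('X-Robots-Tag: noindex');
  echo '410 - page permanently removed';
  exit;

  // Alternative B: keep the content but require a password (HTTP basic auth);
  // 's3cret' is a made-up credential:
  // if (!isset($_SERVER['PHP_AUTH_PW']) || $_SERVER['PHP_AUTH_PW'] !== 's3cret') {
  //     header('WWW-Authenticate: Basic realm="private"');
  //     header('HTTP/1.1 401 Unauthorized');
  //     exit('401 - authentication required');
  // }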
Bulk Remover
A Chrome extension that helps with removing a long list of separate urls
- https://www.searchcommander.com/how-to-bulk-remove-urls-google/
- Download and install the Chrome extension
- Build a list of urls:
  - Google Search Console -> MySite -> Search Queries -> Internal Links -> Download Table
  - Convert to a text file with one url per line
  - Add the FQDN to each line (see the conversion sketch after this list)
- Or directly from Google:
  - Do your Google search (e.g. site:https://xxx.yyy)
  - Search settings -> 100 entries per page
- http://www.chrisains.com/seo-tools/extract-urls-from-web-serps/
- Upload with the Chrome extension
  - If you get the message "Quota exceeded", wait a while and reload the page (resends the POST data)
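A sketch of the conversion step in PHP, assuming the Search Console export is a CSV whose first column holds the page path (the file names and column index are assumptions; adjust to your export):

  <?php
  // csv2urls.php: turn a Search Console table export into one url per line.
  // export.csv / urls.txt and the first-column layout are assumptions.
  $host = 'https://xxx.yyy';
  $in   = fopen(__DIR__ . '/export.csv', 'r');
  $out  = fopen(__DIR__ . '/urls.txt', 'w');

  fgetcsv($in); // skip the header row
  while (($row = fgetcsv($in)) !== false) {
      $path = $row[0];
      // Prepend the FQDN when the export contains bare paths.
      $url = (strpos($path, 'http') === 0) ? $path : $host . $path;
      fwrite($out, $url . "\n");
  }
  fclose($in);
  fclose($out);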