
Request #12453 disallow robots in versioned api docs
Submitted: 2007-11-15 14:52 UTC
From: cweiske
Assigned: dufuz
Status: Closed
Package: pearweb (version CVS)
PHP Version: Irrelevant
OS:
Roadmaps: 1.18.0, 1.17.0

 [2007-11-15 14:52 UTC] cweiske (Christian Weiske)
Description:
------------
When searching via a search engine for e.g. "mdb2 multiple", I end up at the API docs of the outdated MDB2 2.0.0rc5. Versioned API doc URLs should be disallowed for robots, so that only the "/latest/" URLs get into search engines.
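The intent, sketched as a robots.txt fragment (the /package/MDB2/docs/<version>/ layout here is an assumption, and under the classic standard a line would have to be generated for every released version of every package):

```
User-agent: *
Disallow: /package/MDB2/docs/2.0.0rc5/
Disallow: /package/MDB2/docs/2.0.0/
# ...one Disallow per released version, leaving /docs/latest/ unlisted
```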


 [2007-12-17 19:25 UTC] wiesemann (Mark Wiesemann)
This is impossible with the current directory layout and robots.txt. The description for the "Disallow" lines is: "The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved." We either need another directory structure for the API docs or some ugly hack that detects search engine access to the API doc directories.
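This prefix matching is easy to verify with Python's urllib.robotparser, which implements the classic robots.txt rules. Assuming a hypothetical layout of /package/MDB2/docs/<version>/, a plain Disallow on the docs directory blocks the /latest/ tree too:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: /package/MDB2/docs/<version>/ for every release,
# plus /package/MDB2/docs/latest/ for the current one.
robots_txt = """\
User-agent: *
Disallow: /package/MDB2/docs/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Prefix matching: BOTH URLs are blocked, including /latest/.
print(rp.can_fetch("*", "/package/MDB2/docs/2.0.0rc5/index.html"))  # False
print(rp.can_fetch("*", "/package/MDB2/docs/latest/index.html"))    # False
```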
 [2007-12-30 06:24 UTC] dufuz (Helgi Þormar Þorbjörnsson)
Look at rev 1.8 and the comment. Basically: according to the de facto standard a glob isn't supported, but Google, Yahoo, MSN and say they support it (I haven't checked with any other engine), so hopefully this works. I'm keeping the ticket open just in case; we'll have to monitor the access logs and check the engines in the coming weeks.

The alternative is to move all the api docs to and just symlink. Then we can put a disallow on only the /docs/ url and the problem should be taken care of. Note we'd not put a redirect for latest; that way we should actually be able to trick the search engines into believing it's under /package/ with only the latest one. We can rename docs to api-docs or whatever, the name doesn't matter. The biggest issue with this is that we have to alter the cron and do a live move on the server, which has to happen when we do a release, though we could just cp the dir and keep things temporarily in 2 dirs until the release is over.

Just ranting out ideas, but let's see if this robots.txt idea pans out. Mark and Christian, please help me monitor this after the release on 2 Jan.
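For comparison, a sketch of the Disallow/Allow idea without globs: one explicit pair per package. The wildcard form (`Disallow: /package/*/docs/`) is only a de facto extension, and Python's standard-compliant parser ignores it, but an Allow line listed before the Disallow achieves the same effect for a single package:

```python
from urllib.robotparser import RobotFileParser

# One explicit Allow/Disallow pair per package; path layout is assumed.
robots_txt = """\
User-agent: *
Allow: /package/MDB2/docs/latest/
Disallow: /package/MDB2/docs/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# First matching rule wins, so /latest/ stays fetchable while every
# versioned directory under /docs/ is blocked.
print(rp.can_fetch("*", "/package/MDB2/docs/latest/MDB2.html"))    # True
print(rp.can_fetch("*", "/package/MDB2/docs/2.0.0rc5/MDB2.html"))  # False
```

The downside, as noted above, is that the cron job would have to regenerate these pairs for every package; globs would collapse them into two lines, but only for engines that support the extension.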
 [2008-01-03 21:20 UTC] jorrit (Jorrit Schippers)
Is a <meta name="robots" content="noindex" /> on all non-latest api-docs pages possible?
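A per-page noindex tag sidesteps robots.txt entirely. As a sketch only (the helper name and parameters are assumptions, not pearweb's actual API), the template layer could emit the tag for every non-latest doc page:

```python
def robots_meta(version: str, latest_version: str) -> str:
    """Hypothetical helper: return a noindex meta tag for any API doc
    page that is not the /latest/ alias or the newest release."""
    if version in ("latest", latest_version):
        return ""  # let search engines index the current docs
    return '<meta name="robots" content="noindex" />'

print(robots_meta("2.0.0rc5", "2.4.1"))  # outdated page gets noindex
print(robots_meta("latest", "2.4.1"))    # latest alias stays indexable
```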
 [2008-01-10 18:56 UTC] dufuz (Helgi Þormar Þorbjörnsson)
Bummer, seems my Disallow/Allow thing did not work, even though Google, Yahoo and others said they'd support it. Perhaps I'm doing something wrong, hrmm.
 [2008-01-10 19:13 UTC] dufuz (Helgi Þormar Þorbjörnsson)
I just looked at how Google parses the file; they have this nifty webmaster tool, so if you have a Google account then do this:

* Sign into Google Webmaster Tools with your Google Account.
* On the Dashboard, click the URL for the site you want.
* Click Tools, and then click Analyze robots.txt.

Then add these as your test URLs: Only the middle URL is blocked, so searching for "mdb2 multiple" on Google should show us the latest thing, but for some reason we're not listed on the first page; we even have our own bugs listed higher. I wonder why that is? We probably got listed this high because we had the same content on multiple URLs (i.e. one for each release), so I wonder what we have to do to get the API docs higher up, and indeed the docs in general.
 [2008-01-10 21:32 UTC] dufuz (Helgi Þormar Þorbjörnsson)
Thank you for your bug report. This issue has been fixed in the latest released version of the package, which you can download at

I'd say this one is kinda fixed, but I think we should discuss how we can get the API docs and normal docs higher in search results compared to some tutorials and blog posts. That should be done in another bug report or on, so if you have ideas then please let me know via those channels :)