PR/historical branches are getting indexed by Google #3645

Closed
chalin opened this issue Dec 5, 2023 · 5 comments

Comments

@chalin
Contributor

chalin commented Dec 5, 2023

Originally posted by @thesuperzapper in #3628 (comment):

@chalin Also, all our PR/historical branches are getting indexed by Google; we should fix that at the same time as this PR.

The goals would be:

  1. The main www.kubeflow.org site should be indexed
  2. All PR deploy previews (deploy-preview-XXXX--competent-brattain-de2d6d.netlify.app) should NOT be indexed
  3. All other v1-7-branch.kubeflow.org sites should NOT be indexed:
    • (these are just CNAME records pointing to the branch domains like v1-7-branch--competent-brattain-de2d6d.netlify.app)
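
As a rough illustration of the three goals above, the sketch below classifies a hostname and reports whether it should be indexed. The function names (classify_host, should_be_indexed) and the regular expressions are assumptions made for this example, based only on the hostnames quoted in the list; they are not part of the kubeflow/website repository.

```python
import re

# Netlify site name as it appears in the hostnames quoted above.
NETLIFY_SITE = "competent-brattain-de2d6d.netlify.app"

# Goal 2: PR deploy previews, e.g. deploy-preview-1234--<site>.
PREVIEW_RE = re.compile(r"^deploy-preview-\d+--" + re.escape(NETLIFY_SITE) + r"$")
# Goal 3: branch deploys on Netlify, e.g. v1-7-branch--<site>.
BRANCH_NETLIFY_RE = re.compile(r"^[a-z0-9-]+--" + re.escape(NETLIFY_SITE) + r"$")
# Goal 3: CNAME records for versioned docs, e.g. v1-7-branch.kubeflow.org.
# The vN-N-branch naming pattern is an assumption generalized from that one example.
BRANCH_CNAME_RE = re.compile(r"^v\d+-\d+-branch\.kubeflow\.org$")


def classify_host(host: str) -> str:
    """Return 'production', 'deploy-preview', 'branch-deploy', or 'unknown'."""
    host = host.strip().lower()
    if host == "www.kubeflow.org":
        return "production"
    # Check previews first: their hostnames also match the generic branch pattern.
    if PREVIEW_RE.match(host):
        return "deploy-preview"
    if BRANCH_NETLIFY_RE.match(host) or BRANCH_CNAME_RE.match(host):
        return "branch-deploy"
    return "unknown"


def should_be_indexed(host: str) -> bool:
    """Only the main site (goal 1) should be indexed."""
    return classify_host(host) == "production"
```

Under these assumptions, should_be_indexed("www.kubeflow.org") is true, while the preview and branch hostnames from the list are classified as non-indexable.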

I believe your changes here achieve 2: you are setting -e dev in the hugo command, and because this is not "production", Docsy adds noindex <meta> tags.

We need to be careful about 1. Are you 100% confident that not setting -e production or HUGO_ENV=production is safe?

To achieve 3, we could set the HUGO_ENV from [context.branch-deploy.environment] to dev, but it will probably propagate faster if we use a robots.txt disallow on those domains (otherwise, the <meta> tags will take until Google next indexes each page).

@chalin
Contributor Author

chalin commented Dec 6, 2023

To achieve 3, we could set the HUGO_ENV from [context.branch-deploy.environment] to dev, but it will probably propagate faster if we use a robots.txt disallow on those domains (otherwise, the <meta> tags will take until Google next indexes each page).

AFAIK, what you propose won't work. I've had to work through a similar issue for another CNCF project with multiple versions of the docs being indexed. In my experience, you'll need to change each old-version branch individually (to somehow set or configure it to emit noindex, nofollow as appropriate for the branch) and have it rebuilt and redeployed.
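
After a branch has been rebuilt and redeployed, one way to confirm that a page actually emits the directives is to parse its HTML for the robots <meta> tag. This is a minimal sketch using only Python's standard library; the class and function names are hypothetical, and fetching the page over HTTP is deliberately left out.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def is_noindex(html: str) -> bool:
    """True if the page asks search engines not to index it."""
    return "noindex" in robots_directives(html)
```

For example, is_noindex('<meta name="robots" content="noindex, nofollow">') is true, while a page without a robots <meta> tag yields an empty set of directives.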

Btw, you can't use robots.txt to prevent domains from being indexed (see https://developers.google.com/search/docs/crawling-indexing/robots/intro).
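
To illustrate the distinction: a robots.txt rule answers "may this URL be crawled?", not "may this URL appear in the index?". The sketch below uses Python's standard urllib.robotparser with a hypothetical blanket-disallow file for a branch domain; the rules and the may_crawl helper are assumptions for the example, not the contents of any real robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows everything on a branch domain.
ROBOTS_TXT_LINES = [
    "User-agent: *",
    "Disallow: /",
]

rules = RobotFileParser()
rules.parse(ROBOTS_TXT_LINES)


def may_crawl(url: str, agent: str = "Googlebot") -> bool:
    """Return whether the parsed rules allow the agent to fetch the URL."""
    return rules.can_fetch(agent, url)
```

A crawler that honors this file never fetches the page, so it also never sees a noindex <meta> tag on it; the URL can still end up in search results if other pages link to it.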

/cc @nate-double-u

@chalin
Contributor Author

chalin commented Dec 6, 2023

As I mentioned elsewhere, I'm OOO, but I'll be glad to help with this in the new year.

@thesuperzapper
Member

@chalin It's possible that if the Netlify configs for all branches are defined in master (rather than in the branches themselves), as discussed in #3628 (comment), we might only need to update master and then trigger a re-deploy of the older Netlify branches.

(However, I think the super new version of Hugo running in master will probably break our really old Docsy versions and the deploy might fail).

@thesuperzapper
Member

thesuperzapper commented Oct 29, 2024

This has been done mostly as part of #3863 (comment).

We still need to update the CNAME for the older branches, but we are working on this in #3915

/close


@thesuperzapper: Closing this issue.


github-project-automation bot moved this from To Do to Closed in Needs Triage on Oct 29, 2024