[FEATURE REQUEST] Adding sitemap.xml for link extraction #997

Open
n3rada opened this issue Sep 30, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@n3rada

n3rada commented Sep 30, 2023

Hi there! 👋

Thanks for all your work on feroxbuster. I may be wrong, but it seems to me that link extraction takes robots.txt into account but not sitemap.xml. 🤔

That would be "simple" and interesting to add, wouldn't it?

@epi052
Owner

epi052 commented Oct 4, 2023

hey there, sorry, I forgot to reply to this one. Parsing the sitemap would absolutely be an interesting feature to add. I see you submitted a PR 🎉, I'll be able to take a look soon. Thank you!

@epi052
Owner

epi052 commented Oct 7, 2023

Ok, adding some thoughts here on how to go about parsing sitemaps.

  • there are various sitemap formats:
    • traditional xml
    • rss feed (still xml, different tags)
    • plaintext
  • sitemaps can be located inside any directory on the webserver. when located in a non-root directory, they only describe locations within that directory
  • urls in a sitemap are entity escaped (e.g. " becomes &quot;)
  • sitemaps may be gzipped if they're huge, but have a max uncompressed size of 50MB/50,000 URLs
  • a sitemap may be a sitemap index. a sitemap index points to multiple sitemaps

It may seem like a lot, but if we're going to add sitemap parsing, I'd like to get the majority of the cases above handled.
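
The cases listed above can be sketched roughly as follows. This is only an illustration in Python using the standard library, not feroxbuster's code or the approach taken in the PR. The names `parse_sitemap`, `locs_under`, and `in_scope` are hypothetical, fetching is left out entirely, and the caller is assumed to re-run the parser on each child sitemap returned for a sitemap index. Entity escaping needs no extra step for the XML formats because the XML parser decodes entities such as `&amp;` on its own.

```python
import gzip
import io
import xml.etree.ElementTree as ET

GZIP_MAGIC = b"\x1f\x8b"
MAX_BYTES = 50 * 1024 * 1024  # 50MB uncompressed limit noted in the list above


def local_name(tag: str) -> str:
    """Drop the XML namespace prefix: '{ns}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def locs_under(root: ET.Element, parent: str) -> list[str]:
    """Collect <loc> text that sits directly inside each <parent> child of root."""
    found = []
    for node in root:
        if local_name(node.tag) != parent:
            continue
        for child in node:
            if local_name(child.tag) == "loc" and child.text:
                found.append(child.text.strip())
    return found


def parse_sitemap(body: bytes) -> tuple[list[str], list[str]]:
    """Return (page_urls, child_sitemap_urls) for one sitemap body."""
    if body[:2] == GZIP_MAGIC:
        # Read at most one byte past the limit so an oversized file is rejected
        # without inflating all of it.
        with gzip.GzipFile(fileobj=io.BytesIO(body)) as gz:
            body = gz.read(MAX_BYTES + 1)
    if len(body) > MAX_BYTES:
        raise ValueError("sitemap exceeds 50MB uncompressed")

    body = body.strip()
    if not body.startswith(b"<"):
        # Plaintext format: one URL per line, no entity escaping.
        text = body.decode("utf-8", errors="replace")
        return [line.strip() for line in text.splitlines() if line.strip()], []

    root = ET.fromstring(body)
    kind = local_name(root.tag)
    if kind == "sitemapindex":
        return [], locs_under(root, "sitemap")
    if kind == "urlset":
        return locs_under(root, "url"), []
    if kind == "rss":
        pages = []
        for item in root.iter("item"):
            link = item.find("link")
            if link is not None and link.text:
                pages.append(link.text.strip())
        return pages, []
    return [], []  # unknown root element: nothing extracted


def in_scope(sitemap_url: str, url: str) -> bool:
    """A sitemap only describes URLs at or below its own directory."""
    base = sitemap_url.rsplit("/", 1)[0] + "/"
    return url.startswith(base)
```

A caller would fetch a candidate sitemap, pass the raw bytes to `parse_sitemap`, keep the page URLs for which `in_scope` is true, and queue each child sitemap URL for the same treatment. The 50,000 URL cap is not enforced here, and Atom feeds are not handled.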

@n3rada
Author

n3rada commented Oct 8, 2023

I clearly underestimated the sitemap.xml then. I guess I'm not going to be very useful in this job after all. But I think it would be amazing to have this feature as well! 😊


stale bot commented Dec 15, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 15, 2023
@L1-0

L1-0 commented May 16, 2024

This would be a great feature!

@stale stale bot removed the stale label May 16, 2024