https://github.com/capjamesg/getsitemap
A Python library that retrieves all URLs in the sitemaps on a website.
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: capjamesg
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://getsitemap.readthedocs.io/en/latest/
- Size: 62.5 KB
Statistics
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 1
- Releases: 2
Metadata Files
README.md
getsitemap
getsitemap is a Python library that retrieves every URL listed in the sitemaps on a website.
This project may be useful if you are building a search crawler or a validator that checks the status codes of sitemap URLs.
You can read the documentation for this project on Read the Docs.
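To make the idea of "URLs found in sitemaps" concrete, here is a minimal sketch of pulling `<loc>` values out of sitemap XML with the standard library. It is not getsitemap's implementation; the function name `parse_sitemap` and the sample documents are assumptions made for illustration, and the only thing taken as given is the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical sample documents (not taken from any real site).
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
</sitemapindex>"""

SAMPLE_URLSET = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""


def parse_sitemap(xml_text: str) -> Tuple[str, List[str]]:
    """Return the root element name and every <loc> value in the document.

    The root name is "sitemapindex" when the document lists other sitemaps
    and "urlset" when it lists pages.
    """
    root = ET.fromstring(xml_text)
    kind = root.tag.replace(SITEMAP_NS, "")
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, locs


if __name__ == "__main__":
    print(parse_sitemap(SAMPLE_INDEX))
    print(parse_sitemap(SAMPLE_URLSET))
```

A recursive crawler would fetch each `<loc>` from a `sitemapindex` document and parse it in turn, while `<loc>` values from a `urlset` document are the page URLs themselves.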
Installation 💻
To get started, pip install getsitemap:
pip install getsitemap
Quickstart ⚡
Get all URLs recursively in all sitemaps
```python
import getsitemap

urls = getsitemap.get_individual_sitemap("https://jamesg.blog/sitemap.xml")

print(urls)
```
Get all URLs in a single sitemap
```python
import getsitemap

all_urls = getsitemap.retrieve_sitemap_urls("https://sitemap")

print(all_urls)
```
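The status code validator use case mentioned earlier could look roughly like the sketch below. It assumes you already have a plain list of URL strings, such as one gathered from a sitemap. The names `find_broken_urls` and `default_status` are hypothetical helpers, not part of getsitemap, and the status lookup is injectable so the logic can be tried without network access.

```python
from typing import Callable, Dict, Iterable


def default_status(url: str) -> int:
    """Return the HTTP status code for a URL using a HEAD request."""
    # Imported lazily so find_broken_urls can be used without requests installed.
    import requests

    return requests.head(url, allow_redirects=True, timeout=10).status_code


def find_broken_urls(
    urls: Iterable[str],
    get_status: Callable[[str], int] = default_status,
) -> Dict[str, int]:
    """Map each URL whose status code is 400 or higher to that code."""
    broken = {}
    for url in urls:
        status = get_status(url)
        if status >= 400:
            broken[url] = status
    return broken


if __name__ == "__main__":
    # Hypothetical URLs and canned status codes, used instead of real requests.
    sample = ["https://example.com/", "https://example.com/missing"]
    fake_codes = {sample[0]: 200, sample[1]: 404}
    print(find_broken_urls(sample, fake_codes.get))
```

Passing the lookup as a parameter keeps the filtering logic separate from the HTTP client, so a different client or a cache can be swapped in later.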
Code Quality
This library uses tox, pytest, and flake8 to ensure code quality.
To run code quality checks, run the following command:
tox
License 👩
This project is licensed under an MIT License.
Contributing 🛠️
We would love to have your help in improving getsitemap. Have an idea for a new feature or a bug to fix? Leave information in a GitHub Issue to start a discussion!
If you have
Contributors 💻
- capjamesg
Owner
- Name: James
- Login: capjamesg
- Kind: user
- Location: Scotland
- Company: @Roboflow
- Website: jamesg.blog
- Repositories: 320
- Profile: https://github.com/capjamesg
from words, wonder.
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 15
- Total Committers: 3
- Avg Commits per committer: 5.0
- Development Distribution Score (DDS): 0.133
Top Committers
| Name | Email | Commits |
|---|---|---|
| capjamesg | j****g@j****g | 13 |
| dependabot[bot] | 4****]@u****m | 1 |
| jamesg | 3****g@u****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 2
- Total pull requests: 8
- Average time to close issues: 1 day
- Average time to close pull requests: about 15 hours
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 2.0
- Average comments per pull request: 0.0
- Merged pull requests: 8
- Bot issues: 0
- Bot pull requests: 8
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- capjamesg (1)
- minism (1)
Pull Request Authors
- dependabot[bot] (11)
Packages
- Total packages: 3
- Total downloads: pypi 54 last-month
- Total dependent packages: 0 (may contain duplicates)
- Total dependent repositories: 0 (may contain duplicates)
- Total versions: 8
- Total maintainers: 1
pypi.org: getsitemap
Retrieve all URLs from a sitemap.
- Homepage: https://github.com/capjamesg/getsitemap
- Documentation: https://getsitemap.readthedocs.io/
- License: MIT License
- Latest release: 0.1.5 (published over 2 years ago)
Rankings
Maintainers (1)
pypi.org: disinfo-domains
Analyze the reliability of a source using Wikipedia.
- Homepage: https://github.com/capjamesg/getsitemap
- Documentation: https://disinfo-domains.readthedocs.io/
- License: MIT License
- Latest release: 0.1.0 (published almost 2 years ago)
Rankings
Maintainers (1)
pypi.org: sourcetrust
Analyze the reliability of a source using Wikipedia.
- Homepage: https://github.com/capjamesg/getsitemap
- Documentation: https://sourcetrust.readthedocs.io/
- License: MIT License
- Latest release: 0.1.0 (published almost 2 years ago)
Rankings
Maintainers (1)
Dependencies
- beautifulsoup4 ==4.11.1
- bs4 ==0.0.1
- certifi ==2022.9.24
- charset-normalizer ==2.1.1
- idna ==3.4
- requests ==2.28.1
- soupsieve ==2.3.2.post1
- urllib3 ==1.26.12