https://github.com/capjamesg/site-asset-size-crawler
Measure the size of different assets (e.g. PNG, GIF, MP4) across your entire website. Find opportunities to reduce file sizes.
Repository
Basic Info
- Host: GitHub
- Owner: capjamesg
- License: MIT
- Language: Python
- Default Branch: main
- Size: 19.5 KB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
site-asset-size-crawler
Measure the size of different assets (e.g. PNG, GIF, MP4) across your entire website.
The analyze.py script in this project takes a URL, downloads the associated sitemap, then crawls every page listed in the sitemap to find the file sizes of image and video assets. Use it to find assets that are larger than they should be and to see how many times each asset is referenced on your website.
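The crawl described above can be sketched roughly as follows. This is a minimal illustration, not the code in analyze.py: the function names, the `ASSET_EXTENSIONS` list, and the choice of tags to inspect are all assumptions, and it uses only the standard library (the project itself lists BeautifulSoup and requests as dependencies).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed set of extensions to treat as image/video assets.
ASSET_EXTENSIONS = (".png", ".gif", ".jpg", ".jpeg", ".webp", ".mp4")


def sitemap_page_urls(sitemap_xml):
    """Return the text of every <loc> element, ignoring XML namespaces."""
    root = ET.fromstring(sitemap_xml)
    return [
        element.text.strip()
        for element in root.iter()
        if element.tag.endswith("loc") and element.text
    ]


class AssetCollector(HTMLParser):
    """Collect absolute asset URLs from src attributes of media tags."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("img", "video", "source"):
            return
        for name, value in attrs:
            if name == "src" and value:
                url = urljoin(self.page_url, value)
                # Drop any query string before checking the extension.
                if url.lower().split("?")[0].endswith(ASSET_EXTENSIONS):
                    self.assets.append(url)


def page_assets(page_url, html):
    """Return the asset URLs referenced by one page's HTML."""
    collector = AssetCollector(page_url)
    collector.feed(html)
    return collector.assets
```

Given the sitemap XML and each page's HTML as strings, `sitemap_page_urls` yields the pages to visit and `page_assets` yields the assets to measure on each page.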
You can also measure the total weight of image and video assets referenced on each page, allowing you to find pages that may load slowly due to the number of assets referenced.
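As a rough illustration of the per-page total, the sketch below sums asset sizes for each page and sorts pages from heaviest to lightest. The function name and input shapes are hypothetical, and counting a repeated asset only once per page is an assumption, not something the README states.

```python
def pages_by_asset_weight(page_assets, asset_sizes):
    """Return (page, total_bytes) pairs, heaviest page first.

    page_assets maps a page URL to the asset URLs it references;
    asset_sizes maps an asset URL to its size in bytes.
    """
    totals = {
        # Assumption: an asset referenced twice on a page is counted once.
        page: sum(asset_sizes.get(url, 0) for url in set(urls))
        for page, urls in page_assets.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Assets with an unknown size contribute 0 bytes here, so a page total is a lower bound whenever some sizes could not be measured.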
This tool sends a HEAD request to each asset URL and reads the file size from the Content-Length response header, so you can measure file sizes without downloading the assets themselves.
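A minimal sketch of the HEAD-request technique, assuming nothing about how analyze.py implements it. The function names are hypothetical, and the standard library's urllib is used here in place of the requests package the project depends on.

```python
import urllib.request


def content_length(headers):
    """Parse Content-Length from a header mapping; None if missing or invalid."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        size = int(value)
    except ValueError:
        return None
    return size if size >= 0 else None


def asset_size(url, timeout=10.0):
    """Send a HEAD request and return the reported size in bytes, or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return content_length(response.headers)
```

Servers are not required to send Content-Length (for example, when a response uses chunked transfer encoding), so a caller should be prepared for `None`.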
Installation
To get started, clone this project repository and install the required dependencies:
git clone https://github.com/capjamesg/site-asset-size-crawler
cd site-asset-size-crawler
pip3 install -r requirements.txt
Usage
To analyze your site, run:
python3 analyze.py --url https://example.com
Where https://example.com is either:
- Your root site, or
- A specific sitemap on which you want to run an analysis.
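One way the two input forms could be told apart is sketched below. This is an assumption for illustration only: the README does not say how analyze.py locates the sitemap for a root site, and both the `resolve_sitemap_url` helper and the `/sitemap.xml` convention are hypothetical.

```python
from urllib.parse import urljoin, urlparse


def resolve_sitemap_url(url):
    """Return url unchanged if it looks like a sitemap, else guess /sitemap.xml."""
    if urlparse(url).path.lower().endswith(".xml"):
        return url
    # Assumption: the site serves its sitemap at the conventional root path.
    return urljoin(url, "/sitemap.xml")
```

Sites that publish their sitemap elsewhere (for example, one declared only in robots.txt) would need the sitemap URL passed explicitly.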
The script creates the following files:
- assets_by_size.csv: A list of images and videos found, listed in descending order by file size.
- assets_by_use.csv: A list of images and videos found, listed in descending order of the number of pages on which the image was referenced.
- potential_optimizations.csv: A list of images and videos > 200 KB in size.
- pages_by_asset_size.csv: A list of pages, listed in descending order by the total size of assets referenced on the page.
- results.json: Stores the URLs analyzed and asset sizes by page as a JSON file.
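The orderings and the 200 KB filter behind the first three reports can be sketched as below. This is illustrative only: the function names, the column headers, and the reading of 200 KB as 200,000 bytes are assumptions, and the real files may be laid out differently.

```python
import csv
import io
from collections import Counter

# Assumption: "200 KB" is read as decimal kilobytes.
LARGE_ASSET_BYTES = 200 * 1000


def build_reports(page_assets, asset_sizes):
    """Return rows for three of the reports, keyed by file stem."""
    # Count the number of pages that reference each asset.
    use_counts = Counter(
        url for urls in page_assets.values() for url in set(urls)
    )
    by_size = sorted(asset_sizes.items(), key=lambda item: item[1], reverse=True)
    large = [(url, size) for url, size in by_size if size > LARGE_ASSET_BYTES]
    return {
        "assets_by_size": by_size,
        "assets_by_use": use_counts.most_common(),
        "potential_optimizations": large,
    }


def to_csv(header, rows):
    """Serialize a header and rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `to_csv(["url", "bytes"], reports["potential_optimizations"])` produces text that could be written to potential_optimizations.csv.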
License
This project is licensed under an MIT license.
Contributing
Have an idea for how this software can be improved? Create an Issue in the project's GitHub repository. Found a bug? Feel free to file a PR to help make the software better.
Owner
- Name: James
- Login: capjamesg
- Kind: user
- Location: Scotland
- Company: @Roboflow
- Website: jamesg.blog
- Repositories: 320
- Profile: https://github.com/capjamesg
from words, wonder.
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- BeautifulSoup *
- requests *
- tqdm *
- validators *