Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.7%) to scientific vocabulary
Repository
Concurrent data collection, compression and storage
Statistics
- Stars: 1
- Watchers: 2
- Forks: 5
- Open Issues: 0
- Releases: 29
Metadata Files
README.md
aswan
Collect and organize data into a T1 data depot. Named after the Aswan Dam.
Collect and compress data from the internet for later parsing
- quick, parallel, customizable to collect
- compressed to store
- quick to sync with a remote store
- sync to continue collecting
- sync to parse
- immutable collection
To Set Up a Remote
Set the environment variables ASWAN_AUTH_HEX and ASWAN_AUTH_PASS as required by the zimmauth package, and set ASWAN_REMOTE to the name of the default remote.
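Before pushing or pulling, it can help to confirm that all three variables are present. The following is a minimal sketch, not part of aswan itself: the helper name `missing_remote_vars` is hypothetical, and the values the variables must hold are defined by zimmauth and are not checked here.

```python
import os

# Variable names taken from the setup note above.
REQUIRED_VARS = ("ASWAN_AUTH_HEX", "ASWAN_AUTH_PASS", "ASWAN_REMOTE")


def missing_remote_vars(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; it only checks presence,
    not whether the values are valid for zimmauth.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_remote_vars()
    if missing:
        print("Remote is not configured, missing:", ", ".join(missing))
    else:
        print("Remote variables are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.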
Concepts
- objects
  - saved by collection events
- events
  - collection
  - registration (v2: registration for parsing)
  - (v2) parsing
- runs
  - manual run vs automated run
    - makes manual adding of urls easy but revertible
  - has unique id
  - generates events
  - linked to a specific version of the code
    - ideally commit hash + pip freeze
- statuses
  - determined by base status + runs integrated
  - contains
    - what urls need to be collected
    - (v2) what collected objects need to be parsed
  - sqlite file, constantly trimmed
Structure
- objects
  - 00, 01, ...
- runs
  - run-hash
    - context.yaml
      - commit-hash, pip-freeze, ...
    - events.zip
  - run-hash
- statuses
  - status-hash
    - context.yaml
      - parent-status, integrated
    - db.sqlite.zip
current-run
- context.yaml
- events
  - these to be compressed into ../runs
- status.sqlite
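To make the layout above concrete, here is a small sketch that creates the top-level directories in an empty folder. It assumes that `current-run` sits next to `objects`, `runs` and `statuses` under one root, which the outline does not state explicitly; `init_depot` is a hypothetical name, and aswan may create these paths differently.

```python
from pathlib import Path

# Directory names taken from the structure outline above.
TOP_LEVEL_DIRS = ("objects", "runs", "statuses", "current-run")


def init_depot(root):
    """Create the empty directory skeleton and return the root path.

    Hypothetical helper for illustration only.
    """
    root = Path(root)
    for name in TOP_LEVEL_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    # Events of the run in progress are kept here until they are compressed.
    (root / "current-run" / "events").mkdir(exist_ok=True)
    return root


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        depot = init_depot(tmp)
        print(sorted(p.name for p in depot.iterdir()))
```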
There is a 'TEST' status:
- whatever is based on it cannot be integrated
- a test run can be made on it...
When starting a run:
- check if current-run is empty
  - if not, fail with
- find latest status
  - if it has not integrated all past runs, create a new status that has
- start collection (+ registration)
  - either stops or breaks, all events and objects are saved to disk
  - if properly stops, move and compress stuff
    - based on one that was the starter, and current run id
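The first two checks of the start-up sequence can be sketched as below. This is a hedged illustration rather than aswan's code: the exception class, both function names, and the representation of runs as plain id strings are all assumptions.

```python
from pathlib import Path


class CurrentRunNotEmpty(RuntimeError):
    """Raised when a previous run left files behind in current-run."""


def ensure_current_run_empty(depot_root):
    """Fail early if current-run still holds data from an earlier run."""
    current = Path(depot_root) / "current-run"
    current.mkdir(parents=True, exist_ok=True)
    leftovers = sorted(p.name for p in current.iterdir())
    if leftovers:
        raise CurrentRunNotEmpty(f"current-run is not empty: {leftovers}")
    return current


def unintegrated_runs(all_run_ids, integrated_run_ids):
    """Return the runs the latest status has not integrated yet, in order."""
    done = set(integrated_run_ids)
    return [run_id for run_id in all_run_ids if run_id not in done]


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        print(ensure_current_run_empty(tmp).name)
    print(unintegrated_runs(["r1", "r2", "r3"], ["r1"]))
```

A non-empty result from `unintegrated_runs` corresponds to the case where a new status has to be created before collection starts.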
Pre v1.0 laundry list
- parallelize push / pull
- parsing/connection/broken session error docs
- transferring / ignoring cookies
- template projects
  - oddsportal
    - updating thingy, based on latest match in season
  - footy
  - rotten
  - boxoffice
Owner
- Name: Endre Mark Borza
- Login: endremborza
- Kind: user
- Twitter: endremborza
- Repositories: 14
- Profile: https://github.com/endremborza
Citation (CITATION.cff)
cff-version: 1.2.0
message: If you use this software, please cite it as below.
url: https://github.com/endremborza/aswan
authors:
- family-names: Borza
  given-names: Endre Márk
  orcid: https://orcid.org/0000-0002-8804-4520
title: endremborza/aswan
version: 0.5.15
date-released: 2024-06-07
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 228
- Total Committers: 2
- Avg Commits per committer: 114.0
- Development Distribution Score (DDS): 0.039
Top Committers
| Name | Email | Commits |
|---|---|---|
| Endre Márk Borza | e****a@g****m | 219 |
| papsebestyen | p****n@g****m | 9 |
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 0
- Total pull requests: 7
- Average time to close issues: N/A
- Average time to close pull requests: 20 days
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.57
- Merged pull requests: 7
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- papsebestyen (4)
- endremborza (3)
Packages
- Total packages: 1
- Total downloads:
  - pypi: 1,715 last month
- Total dependent packages: 0
- Total dependent repositories: 3
- Total versions: 27
- Total maintainers: 1
pypi.org: aswan
Data collection manager
- Homepage: https://github.com/endremborza/aswan
- Documentation: https://aswan.readthedocs.io/
- License: mit
- Latest release: 0.5.15 (published over 1 year ago)
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- codecov/codecov-action v3 composite
- nanasess/setup-chromedriver v1 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- atqo >=0.3.0
- beautifulsoup4 *
- brotli *
- flask *
- flask-cors *
- html5lib *
- pyyaml *
- requests *
- selenium *
- sqlalchemy *
- typer *