Recent Releases of paperscraper
paperscraper - v0.3.2
What's Changed
- Update pdf.py by @MoonDavid in https://github.com/jannisborn/paperscraper/pull/82
- Chemrxiv limit by @jannisborn in https://github.com/jannisborn/paperscraper/pull/84
- prepare 0.3.2 by @jannisborn in https://github.com/jannisborn/paperscraper/pull/86
New Contributors
- @MoonDavid made their first contribution in https://github.com/jannisborn/paperscraper/pull/82
Full Changelog: https://github.com/jannisborn/paperscraper/compare/v0.3.1...v0.3.2
- Python
Published by jannisborn 8 months ago
paperscraper - v0.3.1
What's Changed
- Load API keys automatically from a .env file if available by @jannisborn in https://github.com/jannisborn/paperscraper/pull/77
- Optionally download bioRxiv PDFs via a requester-pays S3 bucket by @jannisborn in https://github.com/jannisborn/paperscraper/pull/80
Pre-releases:
- Selflink by @jannisborn in https://github.com/jannisborn/paperscraper/pull/76
- Homogenize self citation/reference client by @jannisborn in https://github.com/jannisborn/paperscraper/pull/78
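The automatic .env loading from #77 can be illustrated with a minimal, standard-library-only sketch. It makes three assumptions: the file uses a simple KEY=VALUE format, values may be wrapped in quotes, and lines starting with '#' are comments. It also lets variables already set in the environment take precedence. parse_env_text and load_env_file are hypothetical names, not paperscraper's API, and the real implementation may behave differently.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    pairs = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Drop one pair of matching surrounding quotes, e.g. KEY="abc".
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        pairs[key.strip()] = value
    return pairs


def load_env_file(path: str = ".env") -> dict:
    """Copy keys from `path` into os.environ without overriding existing ones."""
    env_file = Path(path)
    if not env_file.is_file():
        return {}  # no .env file: nothing to load
    loaded = {}
    for key, value in parse_env_text(env_file.read_text(encoding="utf-8")).items():
        if key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded
```

The function returns only the keys it actually set, so a caller can log which variables came from the file.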
Full Changelog: https://github.com/jannisborn/paperscraper/compare/v0.3.0...v0.3.1
- Python
Published by jannisborn 11 months ago
paperscraper - v0.3.0
What's Changed
- Citations of a paper can now be retrieved from a DOI by @jannisborn in https://github.com/jannisborn/paperscraper/pull/73
- Full text download fallback implementation by @mathinic in https://github.com/jannisborn/paperscraper/pull/72
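The full-text fallback from #72 follows a common pattern: try several sources in order and stop at the first one that returns content. The sketch below shows only that pattern. The source names and fetcher functions are hypothetical placeholders. These notes do not say which sources paperscraper actually tries, or in what order.

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetcher takes a DOI and returns the file content, or None if unavailable.
Fetcher = Callable[[str], Optional[bytes]]


def fetch_with_fallback(
    doi: str, sources: Sequence[Tuple[str, Fetcher]]
) -> Tuple[Optional[str], Optional[bytes]]:
    """Try each (name, fetcher) in order; return the first non-empty result."""
    for name, fetch in sources:
        try:
            content = fetch(doi)
        except Exception:  # a failing source should not abort the whole chain
            continue
        if content:
            return name, content
    return None, None  # every source failed or returned nothing
```

Returning the source name alongside the content makes it easy to record where each paper was eventually found.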
New Contributors
- @mathinic made their first contribution in https://github.com/jannisborn/paperscraper/pull/72
Full Changelog: https://github.com/jannisborn/paperscraper/compare/v0.2.16...v0.3.0
- Python
Published by jannisborn about 1 year ago
paperscraper - v0.2.16
What's Changed
- feat: support retries for chemrxiv api by @jannisborn in https://github.com/jannisborn/paperscraper/pull/66
- BREAKING CHANGE: Homogenize the usage of begindate instead of startdate by @jannisborn in https://github.com/jannisborn/paperscraper/pull/69
- Ensure unique DOI from PubMed API by @jannisborn in https://github.com/jannisborn/paperscraper/pull/71
- More robust PubMed requests (bumped pymed-paperscraper dependency)
Full Changelog: https://github.com/jannisborn/paperscraper/compare/v0.2.15...v0.2.16
- Python
Published by jannisborn about 1 year ago
paperscraper - v0.2.15
What's Changed
- feat: support scraping arxiv entirely by @jannisborn in https://github.com/jannisborn/paperscraper/pull/64
- feat: support date search in arxiv by @jannisborn in https://github.com/jannisborn/paperscraper/pull/63
- feat: Journal Impact factors are now up to date until 2024 by @jannisborn in https://github.com/jannisborn/paperscraper/pull/55
- feat: paperscraper.pdf.save_pdf can now also save paper metadata in json format by @jannisborn in https://github.com/jannisborn/paperscraper/pull/57
Pre-releases:
- Adding support for self-referencing (#59) by @jannisborn in https://github.com/jannisborn/paperscraper/pull/60
- Base setup for self-linking by @jannisborn in https://github.com/jannisborn/paperscraper/pull/61
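For the arxiv date search (#63), the public arXiv API accepts a range filter of the form submittedDate:[YYYYMMDDTTTT TO YYYYMMDDTTTT]. The sketch below builds such a query, assuming that date search can be expressed this way. arxiv_date_query and arxiv_request_url are hypothetical helpers, not part of paperscraper.

```python
from datetime import date
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def arxiv_date_query(terms: str, begin: date, end: date) -> str:
    """AND a search expression with an inclusive submittedDate range."""
    # arXiv expects YYYYMMDDTTTT timestamps (GMT, minute precision).
    start = begin.strftime("%Y%m%d") + "0000"
    stop = end.strftime("%Y%m%d") + "2359"
    return f"({terms}) AND submittedDate:[{start} TO {stop}]"


def arxiv_request_url(terms: str, begin: date, end: date, max_results: int = 100) -> str:
    """Build a percent-encoded GET URL for the arXiv export API."""
    params = {
        "search_query": arxiv_date_query(terms, begin, end),
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"
```

Whether paperscraper builds its queries like this is an assumption. The point is only that a date window can be combined with ordinary search terms via AND.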
Full Changelog: https://github.com/jannisborn/paperscraper/compare/v0.2.14...v0.2.15
- Python
Published by jannisborn about 1 year ago
paperscraper - v0.2.14
What's Changed
- Refactor to pymed-paperscraper as dependency by @jannisborn in https://github.com/jannisborn/paperscraper/pull/53
- Support and Tests for higher Python versions by @jannisborn in https://github.com/jannisborn/paperscraper/pull/48
- Expand unit tests by @jannisborn in https://github.com/jannisborn/paperscraper/pull/49
- doc: Basic mkdocs setup by @jannisborn in https://github.com/jannisborn/paperscraper/pull/50
- Add codespell support (config, workflow to detect/not fix) and make it fix few typos by @yarikoptic in https://github.com/jannisborn/paperscraper/pull/54
New Contributors
- @yarikoptic made their first contribution in https://github.com/jannisborn/paperscraper/pull/54
Full Changelog: https://github.com/jannisborn/paperscraper/compare/v0.2.13...v0.2.14
- Python
Published by jannisborn over 1 year ago
paperscraper - v0.2.13
What's Changed
- Bump scholarly dependency by @jannisborn in https://github.com/jannisborn/paperscraper/pull/47
Full Changelog: https://github.com/jannisborn/paperscraper/compare/v0.2.12...v0.2.13
- Python
Published by jannisborn almost 2 years ago
paperscraper - v0.2.12
What's Changed
- chore(deps): bump requests from 2.31.0 to 2.32.0 by @dependabot in https://github.com/jannisborn/paperscraper/pull/42
- add retry logic in XRXivApi to tackle request timed out by @memray in https://github.com/jannisborn/paperscraper/pull/43
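Retry logic of the kind added in #43 (and for chemrxiv in #66) is commonly an exponential-backoff loop around a request. The following is a generic sketch. with_retries is a hypothetical helper, and the default exception types, retry count and delays are illustrative choices, not paperscraper's settings.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call fn(); on a listed exception, wait and retry with exponential backoff."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_retries:
                raise  # retries exhausted: re-raise the last error
            time.sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... base_delay
            attempt += 1
```

Note that requests raises its own exception classes (requests.exceptions.ConnectionError, requests.exceptions.Timeout), which are not subclasses of the built-in ConnectionError. Pass them explicitly via retry_on when wrapping a requests call.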
New Contributors
- @memray made their first contribution in https://github.com/jannisborn/paperscraper/pull/43
Full Changelog: https://github.com/jannisborn/paperscraper/compare/v0.2.11...v0.2.12
- Python
Published by jannisborn almost 2 years ago
paperscraper - v0.2.11
What's Changed
- fix: lower default max_results by @jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/41
Full Changelog: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.10...v0.2.11
- Python
Published by jannisborn about 2 years ago
paperscraper - Impact factor restoration
v0.2.9 was broken because the dependencies of paperscraper.impact were not shipped via PyPI (installation from source worked).
This release fixes that and expands the tests to catch such cases in the future.
What's Changed
- Hotfix by @jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/39
Full Changelog: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.9...v0.2.10
- Python
Published by jannisborn about 2 years ago
paperscraper - Impact factor integration
Fuzzy search for journal impact factors
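The core idea of fuzzy search is to match a possibly misspelled or abbreviated journal name against a list of canonical titles. The sketch below does this with the standard library's difflib and is an illustration only. paperscraper's own matcher, scoring and thresholds may differ, and fuzzy_journal_match is a hypothetical name.

```python
import difflib
from typing import List, Sequence


def fuzzy_journal_match(
    query: str, journals: Sequence[str], n: int = 3, cutoff: float = 0.6
) -> List[str]:
    """Return up to `n` canonical journal names closest to `query`, best first."""
    # Compare case-insensitively but return the original spelling.
    lookup = {name.lower(): name for name in journals}
    hits = difflib.get_close_matches(query.lower(), list(lookup), n=n, cutoff=cutoff)
    return [lookup[h] for h in hits]
```

The cutoff trades recall for precision. Lowering it tolerates heavier abbreviation but returns more spurious matches.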
What's Changed
- Impact factor by @jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/37
Full Changelog: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.8...v0.2.9
- Python
Published by jannisborn over 2 years ago
paperscraper - v0.2.8
What's Changed
- Graceful handling of connection errors by @jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/35
- chore(deps): bump requests from 2.24.0 to 2.31.0 by @dependabot in https://github.com/PhosphorylatedRabbits/paperscraper/pull/30
New Contributors
- @dependabot made their first contribution in https://github.com/PhosphorylatedRabbits/paperscraper/pull/30
Full Changelog: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.7...v0.2.8
- Python
Published by jannisborn over 2 years ago
paperscraper - v0.2.7
What's Changed
- fix: OS agnostic urljoining by @jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/29
Fixes a bug that prevented Windows users from querying the chemrxiv API.
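A common cause of this kind of bug is building URLs with os.path.join. On Windows that function is ntpath.join, which inserts backslashes. The sketch below shows an OS-agnostic alternative. join_url and the base URL are hypothetical, and this is not necessarily how #29 fixed the problem.

```python
import ntpath


def join_url(base: str, *segments: str) -> str:
    """Join URL path segments with '/', independent of the host OS."""
    parts = [base.rstrip("/")] + [s.strip("/") for s in segments]
    return "/".join(parts)


# Hypothetical base URL, for illustration only.
BASE = "https://example.org/api/v1"

# os.path.join is ntpath.join on Windows, which inserts a backslash:
windows_join = ntpath.join(BASE, "items")  # 'https://example.org/api/v1\\items'
portable_join = join_url(BASE, "items", "search")  # 'https://example.org/api/v1/items/search'
```

urllib.parse.urljoin is another option, but it resolves relative references, so urljoin("https://example.org/api/v1", "items") yields "https://example.org/api/items" unless the base ends with a slash.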
Full Changelog: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.6...v0.2.7
- Python
Published by jannisborn almost 3 years ago
paperscraper - 0.2.6
What's Changed
- Save DOIs from arxiv papers by @jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/27. This also makes it possible to scrape PDFs from arxiv metadata.
Full Changelog: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.5...v0.2.6
- Python
Published by jannisborn almost 3 years ago
paperscraper - v0.2.5
What's Changed
- Extract records from biorxiv and medrxiv based on start date and end date by @achouhan93 in https://github.com/PhosphorylatedRabbits/paperscraper/pull/24
- Extract records from chemrxiv based on start date and end date by @achouhan93 and @jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/25
EXAMPLE
Since v0.2.5, paperscraper can also scrape {med/bio/chem}rxiv for specific date ranges!
```py
from paperscraper.get_dumps import medrxiv

medrxiv(begin_date="2023-04-01", end_date="2023-04-08")
```
But watch out: the resulting .jsonl file will be labelled with the current date, and all your subsequent searches will be based on this file only. If you use this option, keep an eye on the source files (paperscraper/server_dumps/*.jsonl) to ensure they contain the metadata for all papers you're interested in.
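To check which dates a local dump actually covers, one option is to scan its publication dates. This sketch assumes each line is a JSON object with an ISO-formatted date field. The field name "date" is an assumption and may differ between servers, and dump_date_range and covers are hypothetical helpers. covers only compares the earliest and latest dates, so it cannot detect gaps inside the range.

```python
import json
from datetime import date
from typing import Iterable, Optional, Tuple


def dump_date_range(lines: Iterable[str], field: str = "date") -> Optional[Tuple[date, date]]:
    """Return (earliest, latest) date found in JSONL lines, or None if none."""
    dates = []
    for line in lines:
        if not line.strip():
            continue
        value = json.loads(line).get(field)
        if value:
            dates.append(date.fromisoformat(value[:10]))  # keep YYYY-MM-DD
    return (min(dates), max(dates)) if dates else None


def covers(lines: Iterable[str], begin: date, end: date, field: str = "date") -> bool:
    """True if the earliest date is <= begin and the latest date is >= end."""
    span = dump_date_range(lines, field)
    return span is not None and span[0] <= begin and end <= span[1]
```

Both functions accept any iterable of lines, for example an open file handle on one of the dumps.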
New Contributors
- @achouhan93 made their first contribution in https://github.com/PhosphorylatedRabbits/paperscraper/pull/24
Full Changelog: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.4...v0.2.5
- Python
Published by jannisborn almost 3 years ago
paperscraper - v0.2.4
v0.2.4 - release summary
1. Support for scraping PDFs
2. Harmonize return types of scraper classes to pd.DataFrame rather than List[Dict].
1. Scraping PDFs
v0.2.4 now supports downloading PDFs. The core function is paperscraper.pdf.save_pdf, which receives a dictionary with the key doi and downloads the PDF for that DOI. There is also a wrapper function, paperscraper.pdf.save_pdf_from_dump, that takes the filepath of a .jsonl file obtained from a previous metadata search and downloads the PDFs of all papers in it. Examples are given in the README.
Thanks to @daenuprobst for suggestions!
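The loop behind a wrapper like save_pdf_from_dump can be sketched in three steps: read the .jsonl dump line by line, pull out each DOI, and hand it to a per-paper downloader. Several parts of this sketch are assumptions, not paperscraper's actual code: the name save_pdfs_from_jsonl, the download callback (a stand-in for a function such as paperscraper.pdf.save_pdf) and the file-naming scheme.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def save_pdfs_from_jsonl(
    dump_path: str,
    pdf_dir: str,
    download: Callable[[Dict, str], None],
    key: str = "doi",
) -> List[str]:
    """Call download({key: doi}, target_path) for every paper in a JSONL dump."""
    out_dir = Path(pdf_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    targets = []
    with open(dump_path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            doi = json.loads(line).get(key)
            if not doi:
                continue  # cannot download without a DOI
            # DOIs contain '/', which is not allowed in file names.
            target = out_dir / (doi.replace("/", "_") + ".pdf")
            download({key: doi}, str(target))
            targets.append(str(target))
    return targets
```

Passing the downloader in as a callable keeps the loop testable without network access.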
2. Return types
With this version, all scraper classes return their results as a pandas DataFrame (one paper per row) instead of a list of dictionaries (one paper per dict).
Full Changelog: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.3...v0.2.4
- Python
Published by jannisborn over 3 years ago
paperscraper - v0.2.3
What's Changed
- fix: preprint['id'] should be preprint['item']['id'] by @oppih in https://github.com/PhosphorylatedRabbits/paperscraper/pull/22
Full Changelog: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.2...v0.2.3
- Python
Published by jannisborn almost 4 years ago
paperscraper - v0.2.2
What's Changed
- refactor: extraction of published DOI/URL by @oppih in https://github.com/PhosphorylatedRabbits/paperscraper/pull/21
New Contributors
- @oppih made their first contribution in https://github.com/PhosphorylatedRabbits/paperscraper/pull/21
Full Changelog: https://github.com/PhosphorylatedRabbits/paperscraper/compare/0.2.1...v0.2.2
- Python
Published by jannisborn almost 4 years ago
paperscraper - 0.2.1 Streamline .jsonl handling (saving/loading)
This version streamlines the handling of .jsonl files throughout the package. It removes an inconsistency between the arxiv/pubmed and the biorxiv/chemrxiv/medrxiv entry points: the former dumped each paper as one string per line, while the latter dumped one dict (JSON object) per line.
Thanks @juliusbierk for pointing this out.
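The one-dict-per-line convention is the JSON Lines format. Each line is a complete JSON object, so every line can be parsed on its own with json.loads. The following is a minimal round-trip sketch. dump_jsonl and load_jsonl are hypothetical names, not paperscraper's functions.

```python
import json
from typing import Dict, Iterable, List


def dump_jsonl(papers: Iterable[Dict], path: str) -> None:
    """Write one JSON object (one paper) per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for paper in papers:
            fh.write(json.dumps(paper, ensure_ascii=False) + "\n")


def load_jsonl(path: str) -> List[Dict]:
    """Read a file written by dump_jsonl back into a list of dicts."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```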
What's Changed
- Export to json format by @juliusbierk in https://github.com/PhosphorylatedRabbits/paperscraper/pull/19
- 0.2.1 - Streamline jsonl file saving/loading by @jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/20
New Contributors
- @juliusbierk made their first contribution in https://github.com/PhosphorylatedRabbits/paperscraper/pull/19
Full Changelog: https://github.com/PhosphorylatedRabbits/paperscraper/compare/0.2.0...0.2.1
- Python
Published by jannisborn over 4 years ago
paperscraper - 0.2.0 - Integrate chemRxiv API from Open Engage
What's Changed
- 0.2.0 - Chemrxiv engage api by @jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/18
- Brings back support for chemrxiv
- Extends functionality compared to the old figshare API (more searchable fields)
Full Changelog: https://github.com/PhosphorylatedRabbits/paperscraper/compare/0.1.1...0.2.0
- Python
Published by jannisborn over 4 years ago
paperscraper - 0.1.1 - Reflect ChemRxiv API shutdown
Release 0.1.1 to reflect ChemRxiv API shutdown
What's Changed
ChemRxiv update by @jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/16:
- Decided to keep the chemrxiv-related code in the package to ensure backwards compatibility.
- Attempting to download the latest chemrxiv dump (paperscraper.get_dumps.chemrxiv) will now raise a ConnectionRefusedError.
- Loading the package still tries to find a local chemrxiv dump. If one is available, the package behaves as before (i.e., existing local chemrxiv dumps remain fully searchable with all associated functionalities).
- If no chemrxiv dump is available, the package proceeds silently (no logging, since this is the new default; closes #13)
- Improved dump loading in case the .jsonl files are empty or faulty (fixes #15)
- Added a README section with details about the chemrxiv migration from figshare to Engage
- Added badges with download statistics
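Tolerant loading of empty or faulty .jsonl files (#15) can be sketched as follows: skip blank lines, and count lines that are not valid JSON objects instead of raising on them. load_jsonl_tolerant is a hypothetical helper, and paperscraper's own handling of faulty dumps may differ.

```python
import json
from typing import Dict, Iterable, List, Tuple


def load_jsonl_tolerant(lines: Iterable[str]) -> Tuple[List[Dict], int]:
    """Parse JSONL lines, skipping blank or malformed ones.

    Returns the parsed records and the number of non-blank lines that were
    skipped because they were not valid JSON objects.
    """
    records, skipped = [], 0
    for line in lines:
        if not line.strip():
            continue  # empty lines (or an empty file) are not an error
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1
            continue
        if isinstance(obj, dict):
            records.append(obj)
        else:
            skipped += 1  # valid JSON, but not one paper per line
    return records, skipped
```

Reporting the skipped count lets the caller warn about a damaged dump instead of failing at import time.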
ci: switch from travis to GA by @jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/12
- PyPI releases are now triggered by GitHub releases instead of tags
Full Changelog: https://github.com/PhosphorylatedRabbits/paperscraper/compare/0.1.0...0.1.1
- Python
Published by jannisborn over 4 years ago