pushshiftdumps

Example scripts for the pushshift dump files

https://github.com/watchful1/pushshiftdumps

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (3.7%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Watchful1
  • License: MIT
  • Language: Python
  • Default Branch: master
  • Size: 262 KB
Statistics
  • Stars: 386
  • Watchers: 7
  • Forks: 73
  • Open Issues: 4
  • Releases: 0
Created over 4 years ago · Last pushed 8 months ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

This repo contains example Python scripts for processing the Reddit dump files created by Pushshift. The files can be torrented from here.

  • single_file.py decompresses and iterates over a single zst-compressed file.
  • iterate_folder.py does the same for every file in a folder.
  • combine_folder_multiprocess.py uses separate processes to iterate over multiple files in parallel, writes the lines that match the supplied criteria to text files, then combines those text files into a final zst-compressed file.
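
As a rough illustration of the kind of work single_file.py is described as doing, the sketch below streams a zst-compressed ndjson file and yields one JSON object per line. It is not the repository's code. The function names `iter_lines` and `read_zst_objects` are hypothetical, and the `max_window_size` of `2**31` is an assumption that the dumps were compressed with a large window. It needs the third-party `zstandard` package, which appears in the dependency list.

```python
import codecs
import json


def iter_lines(reader, chunk_size=2**20):
    """Yield text lines from a binary stream that is read in fixed-size chunks."""
    # An incremental decoder keeps multi-byte UTF-8 characters intact even
    # when a chunk boundary falls in the middle of one.
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        buffer += decoder.decode(chunk)
        lines = buffer.split("\n")
        # The last piece may be an incomplete line; hold it for the next chunk.
        buffer = lines.pop()
        yield from lines
    # Flush the decoder and emit a final line that has no trailing newline.
    buffer += decoder.decode(b"", final=True)
    if buffer:
        yield buffer


def read_zst_objects(path):
    """Yield one parsed JSON object per non-empty line of a zst ndjson file."""
    # Imported lazily so iter_lines can be used without the package installed.
    import zstandard

    with open(path, "rb") as handle:
        # Assumption: the file may use a large compression window, so the
        # decompressor is allowed a window of up to 2**31 bytes.
        decompressor = zstandard.ZstdDecompressor(max_window_size=2**31)
        reader = decompressor.stream_reader(handle)
        for line in iter_lines(reader):
            if line.strip():
                yield json.loads(line)
```

A caller would loop with `for obj in read_zst_objects(path):`, passing the path of a downloaded dump file, and inspect whichever fields it needs. Reading in chunks keeps memory use bounded no matter how large the decompressed file is.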

Owner

  • Login: Watchful1
  • Kind: user

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Pushshift dump utils
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Watchful1
repository-code: 'https://github.com/Watchful1/PushshiftDumps'
abstract: >-
  Tools to help parse reddit data from zstandard compressed
  ndjson files from the pushshift archives
license: MIT

GitHub Events

Total
  • Issues event: 9
  • Watch event: 92
  • Issue comment event: 14
  • Push event: 43
  • Fork event: 21
Last Year
  • Issues event: 9
  • Watch event: 92
  • Issue comment event: 14
  • Push event: 43
  • Fork event: 21

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 167
  • Total Committers: 2
  • Avg Commits per committer: 83.5
  • Development Distribution Score (DDS): 0.006
Past Year
  • Commits: 60
  • Committers: 1
  • Avg Commits per committer: 60.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name             Email          Commits
Watchful1        w****l@w****r  166
Peter Eckersley  p****e@o****s  1

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 22
  • Total pull requests: 3
  • Average time to close issues: 1 day
  • Average time to close pull requests: 5 months
  • Total issue authors: 18
  • Total pull request authors: 2
  • Average comments per issue: 3.23
  • Average comments per pull request: 1.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 9
  • Pull requests: 0
  • Average time to close issues: about 9 hours
  • Average time to close pull requests: N/A
  • Issue authors: 5
  • Pull request authors: 0
  • Average comments per issue: 2.11
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Zoher15 (3)
  • sean-doody (1)
  • Flopblopper (1)
  • Jacobzwj (1)
  • jolui206 (1)
  • anatoliivanov (1)
  • JodanJodan (1)
  • maria-pro (1)
  • taygetea (1)
  • pj097 (1)
  • jannat5134 (1)
  • huycke (1)
  • PizzaCoder (1)
  • refaat31 (1)
  • Kashish-1426 (1)
Pull Request Authors
  • pde (2)
  • Watchful1 (1)

Dependencies

Pipfile.lock pypi
  • certifi ==2022.6.15
  • charset-normalizer ==2.1.0
  • discord-logging https://github.com/Watchful1/DiscordLogging.git#9e543194d3612dde92eae1203d0ea143a7963f6e
  • dnspython ==2.2.1
  • idna ==3.3
  • numpy ==1.23.1
  • pymongo ==4.1.1
  • requests ==2.28.1
  • scipy ==1.8.1
  • urllib3 ==1.26.10
  • zstandard ==0.18.0