hail

Cloud-native genomic dataframes and batch computing

https://github.com/hail-is/hail

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    25 of 95 committers (26.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.1%) to scientific vocabulary

Keywords

bioinformatics genetics genomics gwas hail python software vcf

Keywords from Contributors

operating-system state-management optim interactive pde networks agda numeric dependent-types notebook
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: hail-is
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage: https://hail.is
  • Size: 128 MB
Statistics
  • Stars: 1,026
  • Watchers: 52
  • Forks: 256
  • Open Issues: 292
  • Releases: 124
Topics
bioinformatics genetics genomics gwas hail python software vcf
Created over 10 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation Authors Zenodo

README.md

Hail


Hail is an open-source, general-purpose, Python-based data analysis tool with additional data types and methods for working with genomic data.

Hail is built to scale and has first-class support for multi-dimensional structured data, like the genomic data in a genome-wide association study (GWAS).

Hail is exposed as a Python library, using primitives for distributed queries and linear algebra implemented in Scala, Spark, and increasingly C++.

See the documentation for more info on using Hail.

Community

Hail has been widely adopted in academia and industry, including as the analysis platform for the Genome Aggregation Database (gnomAD) and the UK Biobank rapid GWAS. Learn more about Hail-powered science.

Contribute

If you'd like to discuss or contribute to the development of methods or infrastructure, please:

Hail uses a continuous deployment approach to software development, which means we frequently add new features. We update users about changes to Hail via the Discussion Forum. We recommend creating an account on the Discussion Forum so that you can subscribe to these updates as well.

Maintainer

Hail is maintained by a team in the Neale lab at the Stanley Center for Psychiatric Research of the Broad Institute of MIT and Harvard and the Analytic and Translational Genetics Unit of Massachusetts General Hospital.

Contact the Hail team at hail@broadinstitute.org.

Citing Hail

If you use Hail for published work, please cite the software. You can get a citation for the version of Hail you installed by executing:

```python
import hail as hl

print(hl.citation())
```

The output will look like this:

Hail Team. Hail 0.2.13-81ab564db2b4. https://github.com/hail-is/hail/releases/tag/0.2.13.

Acknowledgements

The Hail team has several sources of funding at the Broad Institute:

  • The Stanley Center for Psychiatric Research, which together with the Neale lab has provided an incredibly supportive and stimulating home.
  • Principal Investigators Benjamin Neale and Daniel MacArthur, whose scientific leadership has been essential for solving the right problems.
  • Jeremy Wertheimer, whose strategic advice and generous philanthropy have been essential for growing the impact of Hail.

We are grateful for generous support from:

  • The National Institute of Diabetes and Digestive and Kidney Diseases
  • The National Institute of Mental Health
  • The National Human Genome Research Institute
  • The Chan Zuckerberg Initiative

We would like to thank Zulip for supporting open-source by providing free hosting, and YourKit, LLC for generously providing free licenses for YourKit Java Profiler for open-source development.

Owner

  • Name: Hail
  • Login: hail-is
  • Kind: organization

Scalable genetic data analysis

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 11,233
  • Total Committers: 95
  • Avg Commits per committer: 118.242
  • Development Distribution Score (DDS): 0.785
Past Year
  • Commits: 208
  • Committers: 12
  • Avg Commits per committer: 17.333
  • Development Distribution Score (DDS): 0.779
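The averages above are total commits divided by the number of committers. The page does not define the Development Distribution Score (DDS); the sketch below assumes it is 1 minus the top committer's share of all commits, a reading that reproduces the all-time value (1 - 2,416 / 11,233 ≈ 0.785). The past-year DDS cannot be checked this way because per-person commit counts for the past year are not listed.

```python
# Commit statistics copied from the "Committers" section above.
ALL_TIME_COMMITS = 11_233
ALL_TIME_COMMITTERS = 95
TOP_COMMITTER_COMMITS = 2_416  # first row of "Top Committers" (Tim Poterba)
PAST_YEAR_COMMITS = 208
PAST_YEAR_COMMITTERS = 12


def avg_commits_per_committer(commits: int, committers: int) -> float:
    """Mean number of commits per committer."""
    return commits / committers


def development_distribution_score(top_commits: int, total_commits: int) -> float:
    """ASSUMED DDS definition: 1 minus the top committer's share of all commits.

    A score of 0 means a single person made every commit; scores closer
    to 1 mean the work is spread across more people.
    """
    return 1 - top_commits / total_commits


all_time_avg = avg_commits_per_committer(ALL_TIME_COMMITS, ALL_TIME_COMMITTERS)
past_year_avg = avg_commits_per_committer(PAST_YEAR_COMMITS, PAST_YEAR_COMMITTERS)
all_time_dds = development_distribution_score(TOP_COMMITTER_COMMITS, ALL_TIME_COMMITS)

print(f"All-time avg commits per committer: {all_time_avg:.3f}")
print(f"Past-year avg commits per committer: {past_year_avg:.3f}")
print(f"All-time DDS: {all_time_dds:.3f}")
```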
Top Committers
| Name | Email | Commits |
|---|---|---|
| Tim Poterba | t****a@b****g | 2,416 |
| Daniel King | d****g@g****m | 2,039 |
| cseed | c****n@a****u | 1,275 |
| jigold | j****d | 1,139 |
| Daniel Goldstein | d****5@g****m | 839 |
| jbloom22 | j****m@b****g | 631 |
| John Compitello | j****c@b****g | 490 |
| Christopher Vittal | c****l@b****g | 456 |
| Patrick Schultz | p****z@b****g | 335 |
| Arcturus Wang | w****g@b****g | 268 |
| Alex V. Kotlar | a****r@b****u | 207 |
| Amanda Wang | a****g@a****u | 145 |
| Edmund Higham | e****m | 114 |
| dependabot[bot] | 4****] | 106 |
| Nick Watts | n****s@b****g | 68 |
| Dan King | d****g@b****g | 62 |
| iris | 8****n | 61 |
| Konrad Karczewski | k****i@g****m | 52 |
| Chris Llanwarne | c****e | 44 |
| lfrancioli | l****n@b****g | 38 |
| maccum | 3****m | 37 |
| Patrick Cummings | 4****2 | 35 |
| alexb-3 | a****3 | 33 |
| Dania-Abuhijleh | a****d@n****u | 28 |
| Milo | i****s@g****m | 26 |
| Leonhard Gruenschloss | l****s@p****u | 25 |
| ammekk | 7****k | 23 |
| Carolin Diaz | 6****6 | 23 |
| Kumar Veerapen | m****n@g****m | 15 |
| vrautela | 1****a | 12 |
and 65 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 401
  • Total pull requests: 2,029
  • Average time to close issues: 2 months
  • Average time to close pull requests: 18 days
  • Total issue authors: 42
  • Total pull request authors: 32
  • Average comments per issue: 1.27
  • Average comments per pull request: 1.03
  • Merged pull requests: 1,283
  • Bot issues: 0
  • Bot pull requests: 145
Past Year
  • Issues: 70
  • Pull requests: 451
  • Average time to close issues: 16 days
  • Average time to close pull requests: 14 days
  • Issue authors: 17
  • Pull request authors: 15
  • Average comments per issue: 0.16
  • Average comments per pull request: 0.91
  • Merged pull requests: 290
  • Bot issues: 0
  • Bot pull requests: 3
Top Authors
Issue Authors
  • danking (189)
  • daniel-goldstein (44)
  • chrisvittal (33)
  • cjllanwarne (26)
  • jigold (22)
  • patrick-schultz (16)
  • ehigham (14)
  • grohli (7)
  • jmarshall (5)
  • iris-garden (4)
  • nawatts (3)
  • tpoterba (3)
  • mhebrard (2)
  • kasittig (2)
  • MattWellie (2)
Pull Request Authors
  • danking (465)
  • daniel-goldstein (337)
  • patrick-schultz (230)
  • ehigham (213)
  • chrisvittal (204)
  • dependabot[bot] (142)
  • jigold (131)
  • cjllanwarne (115)
  • iris-garden (58)
  • grohli (25)
  • jmarshall (22)
  • tpoterba (21)
  • Will-Tyler (12)
  • sjparsa (12)
  • kasittig (8)
Top Labels
Issue Labels
bug (159), new-feature (84), needs-triage (76), batch (73), query (70), chore (20), fs (10), enhancement (9), snack (7), infrastructure (7), prio:low (4), security (4), documentation (4), Epic (3), hailctl (3), good-first-project (3), triaged (3), performance (3), help wanted (3), UI (2), dependencies (2), AoU Echo (1), cursor (1), needs more info (1), migration (1), WIP (1), stacked PR (1)
Pull Request Labels
dependencies (141), python (121), prio:high (109), WIP (39), migration (33), ready-to-merge (23), do-not-test (12), full-deploy (11), stacked PR (8), java (7), NIST 800-53 (6), documentation (6), prio:low (5), tiny PR (3), snack (3), terraform (1)

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 24,921 last-month
  • Total docker downloads: 237,565
  • Total dependent packages: 10
    (may contain duplicates)
  • Total dependent repositories: 45
    (may contain duplicates)
  • Total versions: 167
  • Total maintainers: 5
  • Total advisories: 1
pypi.org: hail

Scalable library for exploring and analyzing genomic data.

  • Versions: 157
  • Dependent Packages: 10
  • Dependent Repositories: 43
  • Downloads: 24,883 Last month
  • Docker Downloads: 237,565
Rankings
Docker downloads count: 1.1%
Downloads: 2.0%
Stargazers count: 2.1%
Dependent repos count: 2.2%
Average: 2.4%
Dependent packages count: 3.3%
Forks count: 3.4%
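The "Average" entry appears to be the unweighted arithmetic mean of the other percentile rankings listed for the package. The page does not say so; this is an inference from the numbers, and the sketch below checks it for the hail package.

```python
from statistics import mean

# Percentile rankings for the pypi.org "hail" package, copied from the list above.
# ASSUMPTION: "Average" is the unweighted mean of these six values.
hail_rankings = {
    "docker_downloads_count": 1.1,
    "downloads": 2.0,
    "stargazers_count": 2.1,
    "dependent_repos_count": 2.2,
    "dependent_packages_count": 3.3,
    "forks_count": 3.4,
}

average = mean(hail_rankings.values())
print(f"Average: {average:.1f}%")
```

Applying the same mean to the five rankings listed for j11hail (16.96) and cpg-hail (21.2) also reproduces their reported averages of 17.0% and 21.2%.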
Maintainers (2)
Last synced: 6 months ago
pypi.org: j11hail

Scalable library for exploring and analyzing genomic data.

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 16 Last month
Rankings
Stargazers count: 2.1%
Forks count: 3.4%
Dependent packages count: 7.4%
Average: 17.0%
Dependent repos count: 22.2%
Downloads: 49.7%
Maintainers (2)
Last synced: 6 months ago
pypi.org: cpg-hail

Scalable library for exploring and analyzing genomic data.

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 22 Last month
Rankings
Stargazers count: 2.1%
Forks count: 3.4%
Dependent packages count: 7.4%
Average: 21.2%
Dependent repos count: 22.2%
Downloads: 70.9%
Maintainers (1)
cpg
Last synced: 6 months ago
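The totals at the top of the Packages section look like plain sums over the three PyPI packages listed (hail, j11hail and cpg-hail). That would also explain the "may contain duplicates" warning: a repository depending on two of these packages would be counted once per package. The sketch below checks this reading against the per-package figures; docker downloads are left out because only the hail entry reports them.

```python
# Per-package figures copied from the three pypi.org entries above.
packages = {
    "hail": {"versions": 157, "dependent_packages": 10, "dependent_repos": 43,
             "downloads_last_month": 24_883, "maintainers": 2},
    "j11hail": {"versions": 4, "dependent_packages": 0, "dependent_repos": 1,
                "downloads_last_month": 16, "maintainers": 2},
    "cpg-hail": {"versions": 6, "dependent_packages": 0, "dependent_repos": 1,
                 "downloads_last_month": 22, "maintainers": 1},
}


def totals(pkgs: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum every field across packages (no de-duplication, as assumed above)."""
    fields = next(iter(pkgs.values())).keys()
    return {field: sum(p[field] for p in pkgs.values()) for field in fields}


summary = totals(packages)
for field, value in summary.items():
    print(f"{field}: {value:,}")
```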