edgar

A small library to access files from SEC's edgar

https://github.com/joeyism/py-edgar

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.2%) to scientific vocabulary

Keywords

cik edgar sec
Last synced: 6 months ago

Repository

A small library to access files from SEC's edgar

Basic Info
  • Host: GitHub
  • Owner: joeyism
  • License: gpl-3.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 103 KB
Statistics
  • Stars: 240
  • Watchers: 11
  • Forks: 51
  • Open Issues: 4
  • Releases: 8
Topics
cik edgar sec
Created over 8 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

EDGAR

A small library to access files from SEC's edgar.

Installation

pip install edgar

Example

To get a company's latest 5 10-Ks, run

```python
from edgar import Company
company = Company("Oracle Corp", "0001341439")
tree = company.get_all_filings(filing_type = "10-K")
docs = Company.get_documents(tree, no_of_documents=5)
```
or
```python
from edgar import Company, TXTML

company = Company("INTERNATIONAL BUSINESS MACHINES CORP", "0000051143")
doc = company.get_10K()
text = TXTML.parse_full_10K(doc)
```

To get all companies and find a specific one, run

```python
from edgar import Edgar
edgar = Edgar()
possible_companies = edgar.find_company_name("Cisco System")
```

To avoid pull of all company data from sec.gov on Edgar initialization, pass in a local path to the data

```python
from edgar import Edgar
edgar = Edgar("/path/to/cik-lookup-data.txt")
possible_companies = edgar.find_company_name("Cisco System")
```
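The local lookup file can also be inspected without the library. The sketch below assumes each line of cik-lookup-data.txt has the form `COMPANY NAME:CIK:`; that layout is an assumption, not something this README states. `parse_cik_lookup` is a hypothetical helper and is not part of edgar.

```python
from typing import Dict, Iterable


def parse_cik_lookup(lines: Iterable[str]) -> Dict[str, str]:
    """Map company name -> CIK, assuming lines shaped like 'NAME:CIK:'."""
    companies = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Company names may themselves contain colons, so split from the right.
        parts = line.rstrip(":").rsplit(":", 1)
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed layout
        name, cik = parts
        companies[name] = cik
    return companies


# Sample lines built from the two companies used elsewhere in this README.
sample = [
    "ORACLE CORP:0001341439:",
    "INTERNATIONAL BUSINESS MACHINES CORP:0000051143:",
]
lookup = parse_cik_lookup(sample)
print(lookup["ORACLE CORP"])
```

Splitting from the right keeps a name such as `A: B CO` intact, since only the last colon-separated field is treated as the CIK.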

To get XBRL data, run
```python
from edgar import Company, XBRL, XBRLElement

company = Company("Oracle Corp", "0001341439")
results = company.get_data_files_from_10K("EX-101.INS", isxml=True)
xbrl = XBRL(results[0])
XBRLElement(xbrl.relevant_children_parsed[15]).to_dict()  # returns a dictionary of name, value, and schemaRef
```

API

Company

```python
Company(name, cik, timeout=10)
```
  • name (company name)
  • cik (company CIK number)
  • timeout (optional) (default: 10)
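The examples in this README pass the CIK as a 10-character, zero-padded string (for instance "0000051143"). A minimal sketch for turning a bare number into that form follows. `normalize_cik` is a hypothetical helper, not part of edgar, and whether the library requires the padded form is an assumption.

```python
def normalize_cik(cik) -> str:
    """Return the CIK as a 10-character, zero-padded string."""
    digits = str(cik).strip()
    if not digits.isdigit():
        raise ValueError(f"CIK must contain only digits, got {cik!r}")
    return digits.zfill(10)


print(normalize_cik(51143))
```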

Methods

get_filings_url(self, filing_type="", prior_to="", ownership="include", no_of_entries=100) -> str

Returns a URL to fetch filings data.
  • filing_type: The type of document you want, e.g. 10-K, S-8, 8-K. If not specified, all documents are returned.
  • prior_to: The time before which documents are to be retrieved. If not specified, all documents are returned.
  • ownership: Defaults to include. Options are include, exclude, only.
  • no_of_entries: Defaults to 100. The number of entries to return. Maximum is 100.
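To make the parameters concrete, here is a rough sketch of how such a URL could be assembled. It assumes EDGAR's `browse-edgar` company endpoint and its query names (`action`, `CIK`, `type`, `dateb`, `owner`, `count`). The helper `build_filings_url` is hypothetical, and the library's actual implementation may differ.

```python
from urllib.parse import urlencode

# Assumed endpoint; not taken from this README.
BASE_URL = "https://www.sec.gov/cgi-bin/browse-edgar"


def build_filings_url(cik, filing_type="", prior_to="",
                      ownership="include", no_of_entries=100):
    """Build a company filings URL from the documented parameters."""
    if ownership not in {"include", "exclude", "only"}:
        raise ValueError("ownership must be include, exclude, or only")
    query = {
        "action": "getcompany",
        "CIK": cik,
        "type": filing_type,               # empty string means all filing types
        "dateb": prior_to,                 # empty string means no date cutoff
        "owner": ownership,
        "count": min(no_of_entries, 100),  # documented maximum is 100
    }
    return f"{BASE_URL}?{urlencode(query)}"


url = build_filings_url("0001341439", filing_type="10-K")
print(url)
```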

get_all_filings(self, filing_type="", prior_to="", ownership="include", no_of_entries=100) -> lxml.html.HtmlElement

Returns the HTML as an lxml.html element.
  • filing_type: The type of document you want, e.g. 10-K, S-8, 8-K. If not specified, all documents are returned.
  • prior_to: The time before which documents are to be retrieved. If not specified, all documents are returned.
  • ownership: Defaults to include. Options are include, exclude, only.
  • no_of_entries: Defaults to 100. The number of entries to return. Maximum is 100.

get_10Ks(self, no_of_documents=1, as_documents=False) -> List[lxml.html.HtmlElement]

Returns the HTML, as lxml.html elements, of the concatenation of all the documents in the 10-K.
  • no_of_documents (default: 1): number of documents to be retrieved
  • as_documents: when set to True, the method instead returns List[edgar.document.Documents], a list of Documents

get_10Ks_metadata(self) -> List[dict]

Returns the metadata of all the documents in the 10-K as a list of dictionaries.

get_document_type_from_10K(self, document_type, no_of_documents=1) -> List[lxml.html.HtmlElement]

Returns the HTML, as lxml.html elements, of the document within the 10-K.
  • document_type: The type of document you want, e.g. 10-K, EX-3.2
  • no_of_documents (default: 1): number of documents to be retrieved

get_data_files_from_10K(self, document_type, no_of_documents=1, isxml=False) -> List[lxml.html.HtmlElement]

Returns the HTML, as lxml.html elements, of the data file within the 10-K.
  • document_type: The type of document you want, e.g. EX-101.INS
  • no_of_documents (default: 1): number of documents to be retrieved
  • isxml (default: False): by default, parsing is not case sensitive and the file is parsed as HTML by lxml. If this is True, the file is parsed with etree, which is case sensitive
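The case-sensitivity difference behind isxml can be seen with the standard library alone. The sketch below uses `html.parser` and `xml.etree.ElementTree` as stand-ins for lxml's HTML and etree parsers, which is a swapped-in illustration rather than what the library itself calls. The snippet and its tag names are made up.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Made-up snippet with mixed-case tag names, as is common in XBRL files.
SNIPPET = "<Report><NetIncome>42</NetIncome></Report>"


class TagCollector(HTMLParser):
    """Record the start tags exactly as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(SNIPPET)
collector.close()
html_tags = collector.tags  # the HTML parser lowercases tag names

# The XML parser keeps tag names exactly as written.
xml_tags = [element.tag for element in ET.fromstring(SNIPPET).iter()]

print(html_tags)
print(xml_tags)
```

With HTML parsing, a lookup for `NetIncome` would have to use the lowercased name, which is why XML parsing is the safer choice for data files such as EX-101.INS.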

Class Method

get_documents(self, tree: lxml.html.HtmlElement, no_of_documents=1, debug=False, as_documents=False) -> List[lxml.html.HtmlElement]

Returns a list of strings, each containing the body of the specified document from the input.

  • tree: lxml.html form that is returned from Company.get_all_filings
  • no_of_documents: number of documents returned. If it is 1, the returned result is just one string, instead of a list of strings. Defaults to 1.
  • debug (default: False): if True, displays the URL and form
  • as_documents: when set to True, the method instead returns List[edgar.document.Documents], a list of Documents
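Because a single document comes back bare rather than in a list, calling code may want to normalize the result before looping over it. A minimal sketch follows. `as_list` is a hypothetical helper and not part of edgar.

```python
def as_list(result):
    """Wrap a single document in a list; pass lists through unchanged."""
    if isinstance(result, list):
        return result
    return [result]


# Stand-in values; real results would come from Company.get_documents.
single = as_list("<html>one document</html>")
several = as_list(["<html>first</html>", "<html>second</html>"])
print(len(single), len(several))
```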

Edgar

Gets all companies from EDGAR

get_cik_by_company_name(company_name: str) -> str: Returns the CIK if given the exact name of the company

get_company_name_by_cik(cik: str) -> str: Returns the company name if given the CIK (with the 000s)

find_company_name(words: str) -> List[str]: Returns a list of company names by exact word matching

find_company_name_cik(words: str) -> List[tuple[str, str]]: Return a list of company names and their CIK values

match_company_by_company_name(self, name, top=5) -> List[Dict[str, Any]]: Returns a list of dictionaries with company names, CIKs, and their fuzzy match scores.
  • top (default: 5): the number of top fuzzy matches to return. If set to None, the whole list is returned (which is a lot)
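The idea of ranked fuzzy matching with a `top` cutoff can be sketched with the standard library. This example swaps in `difflib.SequenceMatcher` for the fuzzywuzzy dependency, so its scores will not equal the library's. The function name `match_companies` and the dictionary keys are hypothetical.

```python
from difflib import SequenceMatcher


def match_companies(name, companies, top=5):
    """Rank companies by similarity to `name`; return the best `top` matches."""
    scored = []
    for company_name, cik in companies.items():
        ratio = SequenceMatcher(None, name.lower(), company_name.lower()).ratio()
        scored.append({
            "company_name": company_name,  # hypothetical key names
            "cik": cik,
            "score": round(100 * ratio),
        })
    scored.sort(key=lambda match: match["score"], reverse=True)
    # top=None returns every candidate, mirroring the documented behavior.
    return scored if top is None else scored[:top]


companies = {
    "ORACLE CORP": "0001341439",
    "INTERNATIONAL BUSINESS MACHINES CORP": "0000051143",
}
matches = match_companies("Oracle", companies, top=1)
print(matches[0]["company_name"])
```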

XBRL

Parses data from XBRL

Properties

relevant_children
  • get children that are not context

relevant_children_parsed
  • get children that are not context, unit, schemaRef
  • cleans tags
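The filtering described by these two properties can be illustrated with `xml.etree.ElementTree`. This is a sketch of the idea only, not the library's code. The sample document, the `demo` namespace, and the `local_name` helper are made up for illustration, and "cleans tags" is interpreted here as stripping the namespace prefix, which is an assumption.

```python
import xml.etree.ElementTree as ET

# Made-up, minimal XBRL-like instance document.
XBRL_SAMPLE = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:link="http://www.xbrl.org/2003/linkbase"
      xmlns:demo="http://example.com/demo">
  <link:schemaRef/>
  <context id="FY"/>
  <unit id="USD"/>
  <demo:Revenues contextRef="FY" unitRef="USD">1000</demo:Revenues>
</xbrl>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(XBRL_SAMPLE)

# Children that are not context elements.
relevant_children = [child for child in root if local_name(child.tag) != "context"]

# Children that are not context, unit, or schemaRef, with cleaned tag names.
SKIPPED = {"context", "unit", "schemaRef"}
relevant_children_parsed = [
    {"name": local_name(child.tag), "value": (child.text or "").strip()}
    for child in root
    if local_name(child.tag) not in SKIPPED
]
print(relevant_children_parsed)
```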

Documents

Filing and Documents Details for the SEC EDGAR Form (such as 10-K)

```python
Documents(url, timeout=10)
```

Properties

url: str: URL of the document

content: dict: Dictionary of metadata of the document

content['Filing Date']: str: Document filing date

content['Accepted']: str: Document accepted datetime

content['Period of Report']: str: The date period that the document is for

element: lxml.html.HtmlElement: The HTML element for the Document (from the url) so it can be further parsed
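Since the metadata values are strings, they usually need converting before dates can be compared or sorted. The sketch below assumes the dates look like `YYYY-MM-DD` and the accepted timestamp like `YYYY-MM-DD HH:MM:SS`; these formats are an assumption, and the sample values are made up. `parse_metadata` is a hypothetical helper.

```python
from datetime import date, datetime

# Made-up sample shaped like Documents.content.
content = {
    "Filing Date": "2020-06-22",
    "Accepted": "2020-06-22 16:05:31",
    "Period of Report": "2020-05-31",
}


def parse_metadata(content):
    """Convert the documented string fields into date/datetime objects."""
    return {
        "filing_date": date.fromisoformat(content["Filing Date"]),
        "accepted": datetime.strptime(content["Accepted"], "%Y-%m-%d %H:%M:%S"),
        "period_of_report": date.fromisoformat(content["Period of Report"]),
    }


parsed = parse_metadata(content)
# Days between the end of the reporting period and the filing date.
print((parsed["filing_date"] - parsed["period_of_report"]).days)
```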

Contribution

Buy Me A Coffee

Owner

  • Name: Joey
  • Login: joeyism
  • Kind: user
  • Location: Toronto, Canada

Machine Learning Engineer, with a lot of CLI Dev Tools

GitHub Events

Total
  • Watch event: 15
  • Fork event: 1
Last Year
  • Watch event: 15
  • Fork event: 1

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 109
  • Total Committers: 6
  • Avg Commits per committer: 18.167
  • Development Distribution Score (DDS): 0.064
Past Year
  • Commits: 7
  • Committers: 1
  • Avg Commits per committer: 7.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
joeyism s****y@g****m 102
pprice p****e@d****m 2
Koen Oussoren K****n@n****m 2
kecarus k****n@i****t 1
kbennatti 3****i 1
eabase 5****e 1

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 23
  • Total pull requests: 7
  • Average time to close issues: 4 months
  • Average time to close pull requests: 3 months
  • Total issue authors: 17
  • Total pull request authors: 6
  • Average comments per issue: 2.7
  • Average comments per pull request: 0.71
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • gregjasonroberts (4)
  • Deanc419 (2)
  • rpocase (2)
  • victor4shen (2)
  • auggunner (1)
  • eabase (1)
  • ipl31 (1)
  • bdog1385 (1)
  • bosanipietro (1)
  • Jurga14 (1)
  • chrislakumb (1)
  • compusaurusrex (1)
  • kostadtk (1)
  • lascott (1)
  • joezein (1)
Pull Request Authors
  • colfax4 (2)
  • nickderobertis (1)
  • ipl31 (1)
  • Koen-kun (1)
  • eabase (1)
  • kbennatti (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 2,803 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 13
  • Total versions: 63
  • Total maintainers: 1
pypi.org: edgar

Scrape data from SEC's EDGAR

  • Versions: 63
  • Dependent Packages: 0
  • Dependent Repositories: 13
  • Downloads: 2,803 Last month
Rankings
Dependent repos count: 4.0%
Stargazers count: 4.6%
Forks count: 5.7%
Average: 6.6%
Downloads: 8.6%
Dependent packages count: 10.0%
Maintainers (1)
Last synced: 6 months ago

Dependencies

requirements-dev.txt pypi
  • pytest * development
requirements.txt pypi
  • fuzzywuzzy *
  • lxml *
  • requests *
  • tqdm *
setup.py pypi
  • package.split *