metafinder

Search for documents in a domain through Search Engines (Google, Bing and Baidu). The objective is to extract metadata.

https://github.com/josue87/metafinder

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.5%) to scientific vocabulary

Keywords

crawler metadata osint
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: Josue87
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 53.7 KB
Statistics
  • Stars: 219
  • Watchers: 7
  • Forks: 34
  • Open Issues: 5
  • Releases: 0
Archived
Topics
crawler metadata osint
Created about 5 years ago · Last pushed about 2 years ago
Metadata Files
Readme Contributing License

README.md

MetaFinder

Search for documents in a domain through Search Engines. The objective is to extract metadata.


Installation

```
pip3 install metafinder
```

Upgrades are also available using:

```
pip3 install metafinder --upgrade
```

Usage

MetaFinder can be used in 2 ways:

CLI

```
metafinder -d domain.com -l 20 -o folder [-t 10] -go -bi -ba
```

Parameters:
  • -d: Specifies the target domain.
  • -l: Specifies the maximum number of results to search for in the search engines.
  • -o: Specifies the path where the report is saved.
  • -t: Optional. Sets the number of threads (4 by default).
  • -v: Shows the MetaFinder version.
  • Search engines to select (Google by default):
    • -go: Optional. Search in Google.
    • -bi: Optional. Search in Bing.
    • -ba: Optional. Search in Baidu (experimental).
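When MetaFinder is driven from a larger Python script, the flags above can be assembled into an argument list. This is a minimal sketch, not part of MetaFinder: the helper `build_metafinder_command` is hypothetical, it only uses the flags documented here, and actually running the command assumes the `metafinder` executable is on your `PATH`.

```python
from typing import Iterable, List, Optional

# Search engine flags documented in the parameter list above
VALID_ENGINES = {"go", "bi", "ba"}


def build_metafinder_command(
    domain: str,
    limit: int,
    output_dir: str,
    threads: Optional[int] = None,
    engines: Iterable[str] = ("go",),
) -> List[str]:
    """Hypothetical helper: build a metafinder CLI argument list."""
    cmd = ["metafinder", "-d", domain, "-l", str(limit), "-o", output_dir]
    if threads is not None:
        cmd += ["-t", str(threads)]  # omit to keep the default thread count
    for engine in engines:
        if engine not in VALID_ENGINES:
            raise ValueError(f"unknown search engine flag: {engine}")
        cmd.append(f"-{engine}")
    return cmd


if __name__ == "__main__":
    command = build_metafinder_command(
        "domain.com", 20, "folder", threads=10, engines=("go", "bi", "ba")
    )
    print(" ".join(command))
    # To run it (requires metafinder to be installed):
    # import subprocess; subprocess.run(command, check=True)
```

Returning a list rather than a shell string keeps the domain and output path from being re-parsed by a shell when the list is passed to `subprocess.run`.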

In Code

```python
import metafinder.extractor as metadata_extractor

documents_limit = 5
domain = "target_domain"
# Choose one search engine per call: Google, Bing or Baidu (experimental)
result = metadata_extractor.extract_metadata_from_google_search(domain, documents_limit)
# result = metadata_extractor.extract_metadata_from_bing_search(domain, documents_limit)
# result = metadata_extractor.extract_metadata_from_baidu_search(domain, documents_limit)

authors = result.get_authors()
software = result.get_software()
# Print each document's URL followed by its extracted metadata
for k, v in result.get_metadata().items():
    print(f"{k}:")
    print(f"| URL: {v['url']}")
    for metadata, value in v['metadata'].items():
        print(f"|__ {metadata}: {value}")

# Extract metadata from a local file instead of a search
document_name = "test.pdf"
try:
    metadata_file = metadata_extractor.extract_metadata_from_document(document_name)
    for k, v in metadata_file.items():
        print(f"{k}: {v}")
except FileNotFoundError:
    print("File not found")
```
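The loop above walks the dictionary returned by `result.get_metadata()`, where each entry has a `url` and a `metadata` mapping. As a sketch that assumes only that shape, the hypothetical helper below counts how often each value of one metadata field appears across the documents found. The field name `"Author"` and the sample data are illustrative assumptions, not real MetaFinder output, so check which keys your results actually contain.

```python
from collections import Counter
from typing import Any, Dict


def count_field_values(documents: Dict[str, Dict[str, Any]], field: str) -> Counter:
    """Hypothetical helper: count the values of one metadata field across documents.

    Assumes entries shaped like {"url": ..., "metadata": {field: value, ...}},
    mirroring the keys used in the README loop.
    """
    counts: Counter = Counter()
    for entry in documents.values():
        value = entry.get("metadata", {}).get(field)
        if value:  # skip documents where the field is missing or empty
            counts[str(value)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample data in the assumed shape
    sample = {
        "report.pdf": {"url": "https://example.com/report.pdf",
                       "metadata": {"Author": "alice", "Creator": "Word"}},
        "slides.pptx": {"url": "https://example.com/slides.pptx",
                        "metadata": {"Author": "alice"}},
        "sheet.xlsx": {"url": "https://example.com/sheet.xlsx",
                       "metadata": {"Author": "bob"}},
    }
    for author, n in count_field_values(sample, "Author").most_common():
        print(f"{author}: {n}")
```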


Author

This project has been developed by:

Contributors

Disclaimer!

The software is designed to leave no trace in the documents we upload to a domain. The author is not responsible for any illegitimate use.

Owner

  • Name: Josué Encinar
  • Login: Josue87
  • Kind: user
  • Location: Madrid
  • Company: IriusRisk

Security Researcher / Offensive Security Enthusiast

GitHub Events

Total
  • Watch event: 24
  • Issue comment event: 1
  • Pull request event: 1
  • Fork event: 3
Last Year
  • Watch event: 24
  • Issue comment event: 1
  • Pull request event: 1
  • Fork event: 3

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 33
  • Total Committers: 5
  • Avg Commits per committer: 6.6
  • Development Distribution Score (DDS): 0.273
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Josue87 (j****7@g****m): 24 commits
  • Josué Encinar (J****7): 5 commits
  • Lucas Fernandez (l****n@g****m): 2 commits
  • abdallaEG (5****G): 1 commit
  • febrezo (f****o@d****g): 1 commit
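You can reproduce the All Time figures above from the Top Committers counts. This sketch assumes the usual definition of the Development Distribution Score: one minus the top committer's share of all commits. Here that is 1 − 24/33 ≈ 0.273, and 33 commits across 5 committers give an average of 6.6, both matching the reported values. The function name `commit_stats` is illustrative.

```python
from typing import Dict


def commit_stats(commits_by_author: Dict[str, int]) -> Dict[str, float]:
    """Total commits, committers, average per committer, and DDS (1 - top share)."""
    total = sum(commits_by_author.values())
    committers = len(commits_by_author)
    top = max(commits_by_author.values())
    return {
        "total_commits": total,
        "total_committers": committers,
        "avg_commits_per_committer": round(total / committers, 1),
        "dds": round(1 - top / total, 3),
    }


if __name__ == "__main__":
    # Commit counts copied from the Top Committers list above
    top_committers = {
        "Josue87": 24,
        "Josué Encinar": 5,
        "Lucas Fernandez": 2,
        "abdallaEG": 1,
        "febrezo": 1,
    }
    print(commit_stats(top_committers))
    # prints {'total_commits': 33, 'total_committers': 5, 'avg_commits_per_committer': 6.6, 'dds': 0.273}
```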

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 9
  • Total pull requests: 7
  • Average time to close issues: 9 days
  • Average time to close pull requests: 6 days
  • Total issue authors: 9
  • Total pull request authors: 7
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.43
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • truesamurai (1)
  • vicmac-github (1)
  • alanEG (1)
  • Bartates (1)
  • lucferbux (1)
  • wb4r (1)
  • landaboot (1)
  • mlinton (1)
  • sec0ps (1)
Pull Request Authors
  • zblurx (2)
  • mlinton (2)
  • lucferbux (1)
  • six2dez (1)
  • alanEG (1)
  • AxylumRust (1)
  • febrezo (1)
Top Labels
Issue Labels
Pull Request Labels
enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 1,547 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 87
  • Total versions: 13
  • Total maintainers: 1
pypi.org: metafinder

MetaFinder - Metadata search through Search Engines

  • Versions: 13
  • Dependent Packages: 0
  • Dependent Repositories: 87
  • Downloads: 1,547 Last month
Rankings
Dependent repos count: 1.6%
Downloads: 3.7%
Average: 6.1%
Stargazers count: 7.1%
Forks count: 8.2%
Dependent packages count: 10.0%
Maintainers (1)
Last synced: 6 months ago

Dependencies

requirements.txt pypi
  • beautifulsoup4 >=4.9.3
  • openpyxl >=3.0.5
  • pikepdf >=2.5.2
  • prompt-toolkit >=3.0.5
  • python-docx >=0.8.6
  • python-pptx >=0.6.18
  • requests >=2.25.1
  • urllib3 >=1.26.4
setup.py pypi
  • beautifulsoup4 >=4.9.3
  • openpyxl >=3.0.5
  • pikepdf >=2.5.2
  • prompt-toolkit >=3.0.5
  • python-docx >=0.8.6
  • python-pptx >=0.6.18
  • requests >=2.25.1
  • urllib3 >=1.26.4
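Both requirements.txt and setup.py declare only minimum versions. The sketch below checks whether a local environment meets those minimums. It is not part of MetaFinder and makes a few assumptions. It uses `importlib.metadata.version` (standard library, Python 3.8+) with the distribution names exactly as listed. Its version parser is deliberately simplified: it compares only the leading numeric release components and is not a full PEP 440 implementation.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Tuple

# Minimum versions from requirements.txt / setup.py above
MINIMUM_VERSIONS: Dict[str, str] = {
    "beautifulsoup4": "4.9.3",
    "openpyxl": "3.0.5",
    "pikepdf": "2.5.2",
    "prompt-toolkit": "3.0.5",
    "python-docx": "0.8.6",
    "python-pptx": "0.6.18",
    "requests": "2.25.1",
    "urllib3": "1.26.4",
}


def parse_release(v: str) -> Tuple[int, ...]:
    """Simplified parser: keep leading numeric components ('2.0.0rc1' -> (2, 0, 0))."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def meets_minimum(installed: str, minimum: str) -> bool:
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "2.0" compares equal to "2.0.0"
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_requirements(minimums: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means every minimum is met."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (need >={minimum})")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name}: {installed} installed, need >={minimum}")
    return problems


if __name__ == "__main__":
    for line in check_requirements(MINIMUM_VERSIONS) or ["all minimum versions satisfied"]:
        print(line)
```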