Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: tercanblg
  • Language: Python
  • Default Branch: main
  • Size: 452 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed about 2 years ago
Metadata Files
  • Readme
  • Citation

README.md

Data Extraction Project

Introduction

This project aims to extract data from various sources and formats, such as websites, databases, and documents, using Python-based tools and libraries. The extracted data can be used for purposes such as analysis, reporting, and machine learning.

Installation

Clone this repository to your local machine:

    git clone https://github.com/tercanblg/dataextraction.git

Navigate to the project directory:

    cd dataextraction

Install the required dependencies:

    pip install -r requirements.txt

Usage

Modify the configuration file (config.ini) to specify the sources and formats from which you want to extract data.

Run the main script to start the data extraction process:

    python main.py

The extracted data will be saved to the output location specified in the configuration.
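The README does not show the layout of config.ini or how main.py reads it. The following is a minimal sketch, assuming a hypothetical `[extraction]` section with `sources`, `formats`, and `output_dir` keys (none of these names come from the repository), of how such a file could be parsed with Python's standard `configparser` module:

```python
import configparser

# Hypothetical config.ini contents. The section name and keys are
# assumptions for illustration, not taken from the repository.
SAMPLE_CONFIG = """
[extraction]
sources = https://example.com/page, data/report.pdf
formats = html, pdf
output_dir = output
"""


def split_list(value):
    """Split a comma-separated option value into a list of trimmed items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_settings(text):
    """Parse INI text into a plain dict of extraction settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["extraction"]
    return {
        "sources": split_list(section.get("sources", fallback="")),
        "formats": split_list(section.get("formats", fallback="")),
        "output_dir": section.get("output_dir", fallback="output"),
    }


settings = load_settings(SAMPLE_CONFIG)
print(settings)
```

To read an actual file instead of the inline sample, `parser.read("config.ini")` could replace `parser.read_string(text)`. The real keys expected by main.py may differ.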

Configuration

Modify the config.ini file to customize the extraction process. Specify the sources, formats, output location, and any other parameters required for data extraction.

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, feel free to open an issue or submit a pull request.

License

This project is licensed under the MIT License.

Contact

For any inquiries or feedback, you can reach out to [insert your contact information].


Owner

  • Login: tercanblg
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Joneš"
  given-names: "Jan"
title: "AI-based Structured Web Data Extraction"
version: 1.0.0
date-released: 2022-05-04
url: "https://github.com/jjonescz/awe"
repository-code: "https://github.com/jjonescz/awe"
repository-artifacts: "https://github.com/jjonescz/awe/releases/tag/v1.0"
preferred-citation:
  type: thesis
  authors:
  - family-names: "Joneš"
    given-names: "Jan"
  title: "AI-based Structured Web Data Extraction"
  thesis-type: MS
  year: 2022
  department: Department of Software Engineering
  institution:
    name: Charles University
    city: Prague
    country: CZ
  date-published: 2022-06-15
  url: "http://hdl.handle.net/20.500.11956/174143"

Dependencies

.github/workflows/demo-docker-image.yml actions
  • actions/checkout v2 composite
.github/workflows/fly-deploy.yml actions
  • actions/checkout v2 composite
  • superfly/flyctl-actions 1.3 composite
.github/workflows/gradient-docker-image.yml actions
  • actions/checkout v2 composite
.github/workflows/heroku-deploy.yml actions
  • actions/checkout v2 composite
.github/workflows/training.yml actions
  • actions/checkout v2 composite
  • actions/upload-artifact v2 composite
gradient/Dockerfile docker
  • nvidia/cuda 9.2-base-ubuntu16.04 build
js/package.json npm
  • @types/cli-progress 3.9.2 development
  • @types/express 4.17.13 development
  • @types/natural-compare-lite 1.4.0 development
  • @types/node 16.11.6 development
  • ts-node 10.4.0 development
  • typescript 4.4.4 development
  • @oclif/command 1.8.0
  • @oclif/errors 1.3.5
  • cheerio 1.0.0-rc.10
  • cli-progress 3.9.1
  • domhandler 4.3.1
  • express 4.17.3
  • fast-glob 3.2.7
  • generic-pool 3.8.2
  • html-template-tag 4.0.0
  • natural-compare-lite 1.4.0
  • puppeteer-core 11.0.0
  • python-shell 3.0.1
  • rxjs 7.4.0
  • winston 3.3.3
js/pnpm-lock.yaml npm
  • 252 dependencies
awe/requirements.txt pypi
  • absl-py =1.0.0=pypi_0
  • anyio =3.5.0=pypi_0
  • argon2-cffi =21.3.0=pypi_0
  • argon2-cffi-bindings =21.2.0=pypi_0
  • astroid =2.9.3=pypi_0
  • asttokens =2.0.5=pypi_0
  • attrs =18.2.0=pypi_0
  • autopep8 =1.6.0=pypi_0
  • babel =2.10.1=pypi_0
  • backcall =0.2.0=pypi_0
  • beautifulsoup4 =4.11.1=py39h06a4308_0
  • bleach =5.0.0=pypi_0
  • brotlipy =0.7.0=py39h27cfd23_1003
  • bzip2 =1.0.8=h7b6447c_0
  • ca-certificates =2021.10.8=ha878542_0
  • cachetools =5.0.0=pypi_0
  • certifi =2021.10.8=py39hf3d152e_2
  • cffi =1.15.0=py39hd667e15_1
  • chardet =4.0.0=py39h06a4308_1003
  • charset-normalizer =2.0.4=pyhd3eb1b0_0
  • click =8.1.2=pypi_0
  • click-completion =0.5.2=pypi_0
  • click-didyoumean =0.3.0=pypi_0
  • click-help-colors =0.9.1=pypi_0
  • colorama =0.4.3=pypi_0
  • conda =4.12.0=py39hf3d152e_0
  • conda-build =3.21.8=py39h06a4308_2
  • conda-content-trust =0.1.1=pyhd3eb1b0_0
  • conda-package-handling =1.7.3=py39h27cfd23_1
  • cryptography =36.0.0=py39h9ce1e76_0
  • cycler =0.11.0=pypi_0
  • cython =0.29.28=pypi_0
  • debugpy =1.6.0=pypi_0
  • decorator =5.1.1=pypi_0
  • defusedxml =0.7.1=pypi_0
  • descartes =1.1.0=pypi_0
  • entrypoints =0.4=pypi_0
  • executing =0.8.3=pypi_0
  • fastjsonschema =2.15.3=pypi_0
  • filelock =3.6.0=pyhd3eb1b0_0
  • fonttools =4.33.2=pypi_0
  • gensim =4.1.2=pypi_0
  • gh =2.6.0=ha8f183a_0
  • glob2 =0.7=pyhd3eb1b0_0
  • google-auth =2.6.6=pypi_0
  • google-auth-oauthlib =0.4.6=pypi_0
  • gql =3.0.0a6=pypi_0
  • gradient =2.0.2=pypi_0
  • gradient-utils =0.5.0=pypi_0
  • graphql-core =3.1.7=pypi_0
  • grpcio =1.44.0=pypi_0
  • halo =0.0.31=pypi_0
  • huggingface-hub =0.5.1=pypi_0
  • icu =69.1=h9c3ff4c_0
  • idna =3.3=pyhd3eb1b0_0
  • ijson =3.1.4=pypi_0
  • importlib-metadata =4.11.3=pypi_0
  • inflection =0.5.1=pypi_0
  • ipykernel =6.13.0=pypi_0
  • ipython =8.2.0=pypi_0
  • ipython-genutils =0.2.0=pypi_0
  • ipywidgets =7.6.5=pypi_0
  • isort =5.10.1=pypi_0
  • jedi =0.18.1=pypi_0
  • jinja2 =3.1.1=pypi_0
  • joblib =1.1.0=pypi_0
  • json5 =0.9.6=pypi_0
  • jsonschema =4.4.0=pypi_0
  • jupyter-client =7.2.2=pypi_0
  • jupyter-core =4.10.0=pypi_0
  • jupyter-server =1.16.0=pypi_0
  • jupyterlab =3.2.4=pypi_0
  • jupyterlab-pygments =0.2.2=pypi_0
  • jupyterlab-server =2.13.0=pypi_0
  • jupyterlab-widgets =1.1.0=pypi_0
  • kiwisolver =1.4.2=pypi_0
  • lazy-object-proxy =1.7.1=pypi_0
  • ld_impl_linux-64 =2.35.1=h7274673_9
  • libarchive =3.4.2=h62408e4_0
  • libffi =3.3=he6710b0_2
  • libgcc-ng =11.2.0=h1d223b6_16
  • libiconv =1.16=h516909a_0
  • liblief =0.11.5=h295c915_1
  • libstdcxx-ng =11.2.0=he4da1e4_16
  • libuv =1.42.0=h7f98852_0
  • libxml2 =2.9.12=h885dcf4_1
  • libzlib =1.2.11=h166bdaf_1014
  • llvm-openmp =13.0.1=he0ac6c6_1
  • log-symbols =0.0.14=pypi_0
  • lz4-c =1.9.3=h295c915_1
  • markdown =3.3.6=pypi_0
  • markupsafe =2.0.1=py39h27cfd23_0
  • marshmallow =2.21.0=pypi_0
  • matplotlib =3.5.1=pypi_0
  • matplotlib-inline =0.1.3=pypi_0
  • mccabe =0.6.1=pypi_0
  • mistune =0.8.4=pypi_0
  • mizani =0.7.4=pypi_0
  • multidict =6.0.2=pypi_0
  • nbclassic =0.3.7=pypi_0
  • nbclient =0.6.0=pypi_0
  • nbconvert =6.5.0=pypi_0
  • nbformat =5.3.0=pypi_0
  • ncurses =6.3=h7f8727e_2
  • nest-asyncio =1.5.5=pypi_0
  • nodejs =17.1.0=h8ca31f7_2
  • notebook =6.4.11=pypi_0
  • notebook-shim =0.1.0=pypi_0
  • numpy =1.22.3=pypi_0
  • oauthlib =3.2.0=pypi_0
  • openssl =1.1.1n=h166bdaf_0
  • packaging =21.3=pypi_0
  • palettable =3.3.0=pypi_0
  • pandas =1.4.2=pypi_0
  • pandocfilters =1.5.0=pypi_0
  • parso =0.8.3=pypi_0
  • patchelf =0.13=h295c915_0
  • patsy =0.5.2=pypi_0
  • pexpect =4.8.0=pypi_0
  • pickleshare =0.7.5=pypi_0
  • pillow =9.1.0=pypi_0
  • pip =21.2.4=py39h06a4308_0
  • pkginfo =1.8.2=pyhd3eb1b0_0
  • platformdirs =2.5.2=pypi_0
  • plotnine =0.8.0=pypi_0
  • progressbar2 =4.0.0=pypi_0
  • prometheus-client =0.9.0=pypi_0
  • prompt-toolkit =3.0.29=pypi_0
  • protobuf =3.20.1=pypi_0
  • psutil =5.8.0=py39h27cfd23_1
  • ptyprocess =0.7.0=pypi_0
  • pure-eval =0.2.2=pypi_0
  • py-lief =0.11.5=py39h295c915_1
  • pyasn1 =0.4.8=pypi_0
  • pyasn1-modules =0.2.8=pypi_0
  • pycodestyle =2.8.0=pypi_0
  • pycosat =0.6.3=py39h27cfd23_0
  • pycparser =2.21=pyhd3eb1b0_0
  • pygments =2.11.2=pypi_0
  • pylint =2.12.2=pypi_0
  • pymongo =3.12.3=pypi_0
  • pyopenssl =21.0.0=pyhd3eb1b0_1
  • pyparsing =3.0.8=pypi_0
  • pyrsistent =0.18.1=pypi_0
  • pysocks =1.7.1=py39h06a4308_0
  • python =3.9.7=h12debd9_1
  • python-dateutil =2.8.2=pypi_0
  • python-libarchive-c =2.9=pyhd3eb1b0_1
  • python-slugify =5.0.2=pypi_0
  • python-utils =3.1.0=pypi_0
  • python_abi =3.9=2_cp39
  • pytz =2021.3=pyhd3eb1b0_0
  • pyyaml =5.4.1=pypi_0
  • pyzmq =22.3.0=pypi_0
  • readline =8.1.2=h7f8727e_1
  • regex =2022.3.15=pypi_0
  • requests =2.27.1=pyhd3eb1b0_0
  • requests-oauthlib =1.3.1=pypi_0
  • requests-toolbelt =0.9.1=pypi_0
  • ripgrep =12.1.1=0
  • rsa =4.8=pypi_0
  • ruamel_yaml =0.15.100=py39h27cfd23_0
  • sacremoses =0.0.49=pypi_0
  • scikit-learn =1.0.2=pypi_0
  • scipy =1.8.0=pypi_0
  • selectolax =0.3.6=pypi_0
  • send2trash =1.8.0=pypi_0
  • setuptools =58.0.4=py39h06a4308_0
  • shellingham =1.4.0=pypi_0
  • six =1.16.0=pyhd3eb1b0_0
  • smart-open =5.2.1=pypi_0
  • sniffio =1.2.0=pypi_0
  • soupsieve =2.3.1=pyhd3eb1b0_0
  • spinners =0.0.24=pypi_0
  • sqlite =3.37.0=hc218d9a_0
  • stack-data =0.2.0=pypi_0
  • statsmodels =0.13.2=pypi_0
  • tensorboard =2.8.0=pypi_0
  • tensorboard-data-server =0.6.1=pypi_0
  • tensorboard-plugin-wit =1.8.1=pypi_0
  • termcolor =1.1.0=pypi_0
  • terminado =0.13.3=pypi_0
  • terminaltables =3.1.10=pypi_0
  • text-unidecode =1.3=pypi_0
  • threadpoolctl =3.1.0=pypi_0
  • tinycss2 =1.1.1=pypi_0
  • tk =8.6.11=h1ccaba5_0
  • tokenizers =0.10.3=pypi_0
  • toml =0.10.2=pypi_0
  • torch =1.10.0=pypi_0
  • torch-tb-profiler =0.2.1=pypi_0
  • torchinfo =1.6.5=pypi_0
  • torchmetrics =0.6.2=pypi_0
  • torchtext =0.11.0=pypi_0
  • tornado =6.1=pypi_0
  • tqdm =4.62.3=pyhd3eb1b0_1
  • traitlets =5.1.1=pypi_0
  • transformers =4.15.0=pypi_0
  • typing-extensions =4.2.0=pypi_0
  • tzdata =2021e=hda174b7_0
  • urllib3 =1.26.7=pyhd3eb1b0_0
  • wcwidth =0.2.5=pypi_0
  • webencodings =0.5.1=pypi_0
  • websocket-client =0.57.0=pypi_0
  • werkzeug =2.1.1=pypi_0
  • wheel =0.35.1=pypi_0
  • widgetsnbextension =3.5.2=pypi_0
  • wrapt =1.13.3=pypi_0
  • xz =5.2.5=h7b6447c_0
  • yaml =0.2.5=h7b6447c_0
  • yarl =1.7.2=pypi_0
  • zipp =3.8.0=pypi_0
  • zlib =1.2.11=h166bdaf_1014
  • zstd =1.4.9=haebb681_0
demo/Dockerfile docker
  • janjones/awe-gradient latest build
gitpod/Dockerfile docker
  • janjones/awe-gradient 1650739890 build
gradient/requirements-torch.txt pypi
  • torch ==1.10.0
gradient/requirements.txt pypi
  • autopep8 ==1.6.0
  • gensim ==4.1.2
  • gradient ==2.0.2
  • ijson ==3.1.4
  • inflection ==0.5.1
  • ipywidgets ==7.6.5
  • jupyterlab ==3.2.4
  • matplotlib ==3.5.1
  • plotnine ==0.8.0
  • pylint ==2.12.2
  • python-slugify ==5.0.2
  • scikit-learn ==1.0.2
  • selectolax ==0.3.6
  • torch-tb-profiler ==0.2.1
  • torchinfo ==1.6.5
  • torchmetrics <0.7
  • torchtext ==0.11.0
  • transformers ==4.15.0