acquiring-linguistic-knowledge

Master's thesis of Theodor Amariucai, supervised by Alexander Warstadt and Prof. Ryan Cotterell.

https://github.com/amariucaitheodor/acquiring-linguistic-knowledge

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.5%) to scientific vocabulary

Keywords

linguistics multimodal-learning natural-language-processing

Repository

Basic Info
  • Host: GitHub
  • Owner: amariucaitheodor
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 5.14 MB
Statistics
  • Stars: 3
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created almost 3 years ago · Last pushed over 2 years ago

README.md

alkmi

Introduction

Project sections

  1. Text-vision pre-training runs directly through HuggingFace and Weights and Biases (WandB).
  • Log in with HuggingFace: `huggingface-cli login`
  • Log in with Weights and Biases: `wandb login`

```shell
poetry build
poetry install
poetry shell

cd alkmi/callbacks/evaluation-pipeline
pip install -e ".[dev]"
unzip filter-data.zip

# Test run:
bash run_locally.sh configs/flava/debug.yaml
```

  2. Evaluation involves the evaluation-pipeline project for BLiMP. Note: this requires Python >= 3.9.

Miscellaneous

  • For sweeps with WandB on the WiT configuration, run e.g.:

```shell
cd alkmi
wandb sweep configs/flava/wit_final_sweep_config.yaml
bash run_sweep_on_cluster.sh NR_AGENTS SWEEP_ID
```

  • To connect to a Euler compute node, first SSH into the login node, then run e.g.:

```bash
# An already-running job
srun --interactive --jobid JOBID --pty bash

# Interactive node with 2 GPUs (20 GB VRAM each) and a lot of RAM for the (initial) WiT collapsing
srun --gpus=2 \
  --gres=gpumem:40g \
  --ntasks-per-node=2 \
  --job-name "interactive" --cpus-per-task=4 --mem-per-cpu=15000 --nodes=1 --time=4:00:00 --pty --preserve-env $SHELL

# Interactive node with 1 GPU and a lot of RAM for the (initial) WiT collapsing
srun --gpus=1 \
  --gres=gpumem:40g \
  --ntasks-per-node=1 \
  --job-name "interactive" --cpus-per-task=4 --mem-per-cpu=15000 --nodes=1 --time=4:00:00 --pty --preserve-env $SHELL

# Processing node
srun --time=24:00:00 --ntasks-per-node=1 --cpus-per-task=16 --mem-per-cpu=16000 --nodes=1 --pty --preserve-env $SHELL
```
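
The "a lot of RAM" in the comments above can be made concrete. The sketch below assumes Slurm's usual convention that `--mem-per-cpu` without a unit suffix is in megabytes and applies to every allocated CPU, so the total is nodes × tasks per node × CPUs per task × memory per CPU. The helper `total_memory_mb` is hypothetical and only illustrates the arithmetic; it is not part of the repository.

```python
def total_memory_mb(nodes: int, ntasks_per_node: int, cpus_per_task: int, mem_per_cpu_mb: int) -> int:
    """Total memory in MB implied by an srun request (assumes --mem-per-cpu is in MB)."""
    return nodes * ntasks_per_node * cpus_per_task * mem_per_cpu_mb


# The three resource requests from the srun commands above.
requests = {
    "interactive, 2 GPUs": total_memory_mb(1, 2, 4, 15000),
    "interactive, 1 GPU": total_memory_mb(1, 1, 4, 15000),
    "processing node": total_memory_mb(1, 1, 16, 16000),
}

for name, mb in requests.items():
    print(f"{name}: {mb} MB")
```

Under these assumptions the requests come to 120000 MB, 60000 MB and 256000 MB respectively.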

Euler

  • Check space:

```shell
du -h -d 2 ~ | sort -h  # home directory
du -h -d 2 /cluster/scratch/tamariucai/ | sort -h  # scratch space
du -h -d 2 /cluster/work/cotterell/tamariucai/ | sort -h  # work directory
```

  • For faster access, set up a ~/init.sh:

```bash
env2lmod
module purge
module load eth_proxy gcc/8.2.0 python_gpu/3.10.4 cuda/11.8.0 cudnn/8.8.1.3  # should move to python_gpu/3.11.2!
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256
export PATH="$HOME/.local/bin:$PATH"
export HF_DATASETS_CACHE="/cluster/scratch/tamariucai/HuggingfaceDatasets"
export HF_HOME="/cluster/work/cotterell/tamariucai/HuggingfaceHome"
export WANDB_CACHE_DIR="/cluster/scratch/tamariucai/WandbCache"
export WANDB_DIR="/cluster/work/cotterell/tamariucai/WandbDir"
export PYTHONPATH=/cluster/work/cotterell/tamariucai/acquiring-linguistic-knowledge/:/cluster/work/cotterell/tamariucai/acquiring-linguistic-knowledge/alkmi/callbacks/evaluation-pipeline
cd /cluster/work/cotterell/tamariucai/acquiring-linguistic-knowledge
poetry shell
cd alkmi
```
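
Before launching a long job, it can help to confirm that the variables exported in `~/init.sh` are actually visible to Python. The following is a minimal sketch and not part of the repository: the helper `missing_cache_vars` is hypothetical, and the variable names assume the standard underscore spellings (`HF_HOME`, `HF_DATASETS_CACHE`, `WANDB_CACHE_DIR`, `WANDB_DIR`).

```python
import os
from typing import Mapping, Optional

# Assumed names: the standard spellings of the cache variables set in ~/init.sh.
EXPECTED_VARS = ("HF_HOME", "HF_DATASETS_CACHE", "WANDB_CACHE_DIR", "WANDB_DIR")


def missing_cache_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the expected variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in EXPECTED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_cache_vars()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All cache directories are configured.")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to try out without touching the real environment.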

Submodules

To work with submodules:

  • Pull library updates with `git submodule update --remote --rebase`. Running `git submodule update --init --force --remote` might also help.
  • Push changes (including to the library) with `git push --recurse-submodules=check`.

Owner

  • Name: Theodor Amariucai
  • Login: amariucaitheodor
  • Kind: user
  • Location: Zurich, Switzerland
  • Company: ETH Zurich

Interested in Algorithmics, Artificial Intelligence, Vision & Robotics, Natural Language Processing, Mobile App Development.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Amariucai"
  given-names: "Theodor"
  orcid: "https://orcid.org/0009-0001-2506-2510"
- family-names: "PyTorch"
  given-names: "Multimodal Team"
title: "Acquiring Linguistic Knowledge from Multimodal Input"
version: 0.1.0
# doi: 10.5281/zenodo.1234
date-released: 2023-09-13
url: "https://github.com/amariucaitheodor/acquiring-linguistic-knowledge"

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Issues and Pull Requests

Last synced: 12 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

.github/workflows/pylint.yml actions
  • Gr1N/setup-poetry v8 composite
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
poetry.lock pypi
  • aiohttp 3.8.5
  • aiosignal 1.3.1
  • antlr4-python3-runtime 4.9.3
  • appdirs 1.4.4
  • async-timeout 4.0.3
  • attrs 23.1.0
  • blobfile 2.0.2
  • certifi 2023.7.22
  • charset-normalizer 3.2.0
  • click 8.1.7
  • cmake 3.27.2
  • colorama 0.4.6
  • contourpy 1.1.0
  • cycler 0.11.0
  • dall-e 0.1
  • datasets 2.14.4
  • dill 0.3.7
  • docker-pycreds 0.4.0
  • evaluate 0.4.0
  • exceptiongroup 1.1.3
  • filelock 3.12.2
  • fonttools 4.42.1
  • frozenlist 1.4.0
  • fsspec 2023.6.0
  • gitdb 4.0.10
  • gitpython 3.1.32
  • huggingface-hub 0.16.4
  • hydra-core 1.3.2
  • idna 3.4
  • iniconfig 2.0.0
  • jinja2 3.1.2
  • kiwisolver 1.4.4
  • lightning-utilities 0.9.0
  • lit 16.0.6
  • lxml 4.9.3
  • markupsafe 2.1.3
  • matplotlib 3.7.2
  • mpmath 1.3.0
  • multidict 6.0.4
  • multiprocess 0.70.15
  • mypy 1.5.1
  • mypy-extensions 1.0.0
  • networkx 3.1
  • numpy 1.25.2
  • nvidia-cublas-cu11 11.10.3.66
  • nvidia-cuda-cupti-cu11 11.7.101
  • nvidia-cuda-nvrtc-cu11 11.7.99
  • nvidia-cuda-runtime-cu11 11.7.99
  • nvidia-cudnn-cu11 8.5.0.96
  • nvidia-cufft-cu11 10.9.0.58
  • nvidia-curand-cu11 10.2.10.91
  • nvidia-cusolver-cu11 11.4.0.1
  • nvidia-cusparse-cu11 11.7.4.91
  • nvidia-nccl-cu11 2.14.3
  • nvidia-nvtx-cu11 11.7.91
  • omegaconf 2.3.0
  • packaging 23.1
  • pandas 2.0.3
  • pathtools 0.1.2
  • patsy 0.5.3
  • pillow 9.5.0
  • pluggy 1.2.0
  • protobuf 4.24.1
  • psutil 5.9.5
  • pyarrow 12.0.1
  • pycocotools 2.0.7
  • pycryptodomex 3.18.0
  • pyparsing 3.0.9
  • pytest 7.4.0
  • python-dateutil 2.8.2
  • pytorch-lightning 2.0.7
  • pytz 2023.3
  • pyyaml 6.0.1
  • regex 2023.8.8
  • requests 2.31.0
  • responses 0.18.0
  • scipy 1.9.3
  • seaborn 0.12.2
  • sentry-sdk 1.29.2
  • setproctitle 1.3.2
  • setuptools 68.1.2
  • six 1.16.0
  • smmap 5.0.0
  • statsmodels 0.14.0
  • sympy 1.12
  • tokenizers 0.13.3
  • tomli 2.0.1
  • torch 2.0.0
  • torchmetrics 1.0.3
  • torchvision 0.15.1
  • tqdm 4.66.1
  • transformers 4.29.2
  • triton 2.0.0
  • typing-extensions 4.7.1
  • tzdata 2023.3
  • urllib3 2.0.4
  • wandb 0.15.4
  • wheel 0.41.1
  • xxhash 3.3.0
  • yarl 1.9.2
pyproject.toml pypi
  • DALL-E ^0.1
  • Pillow ^9.5.0
  • datasets ^2.13.1
  • evaluate ^0.4.0
  • hydra-core ^1.3.2
  • omegaconf ^2.3.0
  • protobuf ^4.23.2
  • pycocotools ^2.0.6
  • python ^3.10
  • pytorch-lightning ^2.0.5
  • requests ^2.31.0
  • seaborn ^0.12.2
  • statsmodels ^0.14.0
  • torch 2.0.0
  • transformers 4.29.2
  • wandb 0.15.4
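
The `^` entries in pyproject.toml are Poetry caret constraints: they allow updates that leave the leftmost non-zero version component unchanged. This is why, for example, `datasets ^2.13.1` can resolve to 2.14.4 in poetry.lock. The sketch below illustrates that rule for plain numeric versions only; `parse`, `caret_bounds` and `satisfies` are hypothetical helpers written for this example, not Poetry's API.

```python
def parse(version: str) -> tuple[int, ...]:
    """Parse 'X', 'X.Y' or 'X.Y.Z' into a 3-tuple, padding missing parts with zeros."""
    parts = [int(p) for p in version.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported version: {version!r}")
    return tuple(parts + [0] * (3 - len(parts)))


def caret_bounds(spec: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Return (inclusive lower, exclusive upper) bounds for a caret constraint such as '^2.13.1'."""
    raw = spec.lstrip("^")
    given = len(raw.split("."))
    lower = parse(raw)
    # Bump the leftmost non-zero component; if every given component is zero, bump the last given one.
    bump = next((i for i in range(given) if lower[i] != 0), given - 1)
    upper = lower[:bump] + (lower[bump] + 1,) + (0,) * (2 - bump)
    return lower, upper


def satisfies(version: str, spec: str) -> bool:
    """Check whether a locked version falls inside the caret range."""
    lower, upper = caret_bounds(spec)
    return lower <= parse(version) < upper


# Locked versions from poetry.lock checked against their pyproject.toml constraints.
print(satisfies("2.14.4", "^2.13.1"))  # datasets
print(satisfies("0.4.0", "^0.4.0"))    # evaluate
print(satisfies("0.1", "^0.1"))        # DALL-E
```

Entries without a caret, such as `torch 2.0.0`, are exact pins and are not covered by this sketch.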