acquiring-linguistic-knowledge
Master's thesis of Theodor Amariucai, supervised by Alexander Warstadt and Prof. Ryan Cotterell.
https://github.com/amariucaitheodor/acquiring-linguistic-knowledge
Statistics
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
README.md
alkmi

Introduction
Project sections
- Text-vision pre-training works directly through HuggingFace and W&B.
  - Log in with HuggingFace: `huggingface-cli login`
  - Log in with Weights and Biases: `wandb login`

```shell
poetry build
poetry install
poetry shell

cd alkmi/callbacks/evaluation-pipeline
pip install -e ".[dev]"
unzip filter-data.zip

# Test run:
bash run_locally.sh configs/flava/debug.yaml
```
- Evaluation involves the evaluation-pipeline project for BLiMP. Note: this requires Python >= 3.9.
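Before running the setup, it can help to confirm that the interpreter is recent enough and that the command-line tools used above are on the `PATH`. The following is a minimal sketch and not part of the repository: `version_ge`, `missing_tools`, and `check_python` are hypothetical helper names, and the 3.9 threshold is taken from the note above.

```shell
# Succeeds if dotted version $1 is greater than or equal to dotted version $2.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            xi = (i <= n) ? x[i] + 0 : 0   # missing components count as 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Prints the name of every given tool that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Compares the version reported by `python3 --version` (e.g. "Python 3.10.4") with 3.9.
check_python() {
    found=$(python3 --version 2>&1 | awk '{print $2}')
    if version_ge "$found" "3.9"; then
        echo "python3 $found: ok"
    else
        echo "python3 $found: too old, need >= 3.9"
        return 1
    fi
}

# Usage:
#   check_python
#   missing_tools poetry huggingface-cli wandb unzip
```

An empty result from `missing_tools` means every listed tool was found.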
Miscellaneous
- For sweeps with WandB on the WiT configuration, run e.g.:
```shell
cd alkmi
wandb sweep configs/flava/wit_final_sweep_config.yaml
bash run_sweep_on_cluster.sh NR_AGENTS SWEEP_ID
```
- To connect to the Euler compute node, first ssh into the login node, then run e.g.:
```bash
# An already-running job
srun --interactive --jobid JOBID --pty bash

# Interactive node with 2 GPUs (20GBs VRAM each) and a lot of RAM for the (initial) WiT collapsing
srun --gpus=2 \
  --gres=gpumem:40g \
  --ntasks-per-node=2 \
  --job-name "interactive" --cpus-per-task=4 --mem-per-cpu=15000 --nodes=1 --time=4:00:00 --pty --preserve-env $SHELL

# Interactive node with 1 GPU and a lot of RAM for the (initial) WiT collapsing
srun --gpus=1 \
  --gres=gpumem:40g \
  --ntasks-per-node=1 \
  --job-name "interactive" --cpus-per-task=4 --mem-per-cpu=15000 --nodes=1 --time=4:00:00 --pty --preserve-env $SHELL

# Processing node
srun --time=24:00:00 --ntasks-per-node=1 --cpus-per-task=16 --mem-per-cpu=16000 --nodes=1 --pty --preserve-env $SHELL
```
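To make "a lot of RAM" concrete, the total memory requested by each `srun` call above can be worked out from its flags. The sketch below assumes Slurm's default unit for `--mem-per-cpu` (megabytes) and a single node, so the total is tasks × CPUs per task × memory per CPU. `total_mem_mb` is a hypothetical helper, not something the repository provides.

```shell
# Total memory in MB requested on one node: ntasks * cpus-per-task * mem-per-cpu.
total_mem_mb() {
    ntasks=$1
    cpus_per_task=$2
    mem_per_cpu=$3
    echo $(( ntasks * cpus_per_task * mem_per_cpu ))
}

total_mem_mb 2 4 15000    # 2-GPU interactive node
total_mem_mb 1 4 15000    # 1-GPU interactive node
total_mem_mb 1 16 16000   # processing node
```

Under these assumptions the three requests come to 120000 MB, 60000 MB, and 256000 MB respectively.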
Euler
Check space:
```shell
du -h -d 2 ~ | sort -h # home directory
du -h -d 2 /cluster/scratch/tamariucai/ | sort -h # scratch space
du -h -d 2 /cluster/work/cotterell/tamariucai/ | sort -h # work directory
```
For faster access, set up a `~/init.sh`:
```bash
env2lmod
module purge
module load eth_proxy gcc/8.2.0 python_gpu/3.10.4 cuda/11.8.0 cudnn/8.8.1.3 # should move to python_gpu/3.11.2!
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256
export PATH="$HOME/.local/bin:$PATH"
export HF_DATASETS_CACHE="/cluster/scratch/tamariucai/HuggingfaceDatasets"
export HF_HOME="/cluster/work/cotterell/tamariucai/HuggingfaceHome"
export WANDB_CACHE_DIR="/cluster/scratch/tamariucai/WandbCache"
export WANDB_DIR="/cluster/work/cotterell/tamariucai/WandbDir"
export PYTHONPATH=/cluster/work/cotterell/tamariucai/acquiring-linguistic-knowledge/:/cluster/work/cotterell/tamariucai/acquiring-linguistic-knowledge/alkmi/callbacks/evaluation-pipeline
cd /cluster/work/cotterell/tamariucai/acquiring-linguistic-knowledge
poetry shell
cd alkmi
```
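After sourcing an init script like the one above, a quick check that the cache-related variables are actually set can save a misdirected download into the home directory. This is a sketch under assumptions: `unset_vars` is a hypothetical helper, and the names in the usage line are the standard underscore spellings of the Hugging Face, W&B, and PyTorch environment variables.

```shell
# Prints the name of every listed environment variable that is unset or empty.
# Only pass trusted, literal variable names: the lookup goes through eval.
unset_vars() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}

# Usage:
#   unset_vars HF_HOME HF_DATASETS_CACHE WANDB_DIR WANDB_CACHE_DIR PYTORCH_CUDA_ALLOC_CONF
```

No output means all listed variables have a value.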
Submodules
To work with submodules:
- Pull library updates with `git submodule update --remote --rebase`. This might also help: `git submodule update --init --force --remote`.
- Push changes (including to the library) with `git push --recurse-submodules=check`
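To see which submodules need one of the commands above, the flag character that `git submodule status` puts in front of each commit hash can be filtered: `-` marks an uninitialized submodule, `+` a checked-out commit that differs from the one recorded in the superproject, and `U` a merge conflict. The filter below is a minimal sketch, with `submodules_needing_attention` as a hypothetical name.

```shell
# Reads `git submodule status` output on stdin and prints "<flag> <path>"
# for every submodule that is uninitialized (-), out of sync (+), or conflicted (U).
# Expected line format: <flag><sha1> <path> [(<describe>)]
submodules_needing_attention() {
    awk '/^[-+U]/ { print substr($1, 1, 1), $2 }'
}

# Usage:
#   git submodule status | submodules_needing_attention
```

Lines starting with a space (submodules that are initialized and in sync) are skipped, so empty output means nothing needs updating.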
Owner
- Name: Theodor Amariucai
- Login: amariucaitheodor
- Kind: user
- Location: Zurich, Switzerland
- Company: ETH Zurich
- Website: https://www.linkedin.com/in/amariucaitheodor/
- Twitter: TAmariucai
- Repositories: 15
- Profile: https://github.com/amariucaitheodor
Interested in Algorithmics, Artificial Intelligence, Vision & Robotics, Natural Language Processing, Mobile App Development.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Amariucai"
    given-names: "Theodor"
    orcid: "https://orcid.org/0009-0001-2506-2510"
  - family-names: "PyTorch"
    given-names: "Multimodal Team"
title: "Acquiring Linguistic Knowledge from Multimodal Input"
version: 0.1.0
# doi: 10.5281/zenodo.1234
date-released: 2023-09-13
url: "https://github.com/amariucaitheodor/acquiring-linguistic-knowledge"
GitHub Events
Total
- Watch event: 2
Last Year
- Watch event: 2
Dependencies
- Gr1N/setup-poetry v8 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- aiohttp 3.8.5
- aiosignal 1.3.1
- antlr4-python3-runtime 4.9.3
- appdirs 1.4.4
- async-timeout 4.0.3
- attrs 23.1.0
- blobfile 2.0.2
- certifi 2023.7.22
- charset-normalizer 3.2.0
- click 8.1.7
- cmake 3.27.2
- colorama 0.4.6
- contourpy 1.1.0
- cycler 0.11.0
- dall-e 0.1
- datasets 2.14.4
- dill 0.3.7
- docker-pycreds 0.4.0
- evaluate 0.4.0
- exceptiongroup 1.1.3
- filelock 3.12.2
- fonttools 4.42.1
- frozenlist 1.4.0
- fsspec 2023.6.0
- gitdb 4.0.10
- gitpython 3.1.32
- huggingface-hub 0.16.4
- hydra-core 1.3.2
- idna 3.4
- iniconfig 2.0.0
- jinja2 3.1.2
- kiwisolver 1.4.4
- lightning-utilities 0.9.0
- lit 16.0.6
- lxml 4.9.3
- markupsafe 2.1.3
- matplotlib 3.7.2
- mpmath 1.3.0
- multidict 6.0.4
- multiprocess 0.70.15
- mypy 1.5.1
- mypy-extensions 1.0.0
- networkx 3.1
- numpy 1.25.2
- nvidia-cublas-cu11 11.10.3.66
- nvidia-cuda-cupti-cu11 11.7.101
- nvidia-cuda-nvrtc-cu11 11.7.99
- nvidia-cuda-runtime-cu11 11.7.99
- nvidia-cudnn-cu11 8.5.0.96
- nvidia-cufft-cu11 10.9.0.58
- nvidia-curand-cu11 10.2.10.91
- nvidia-cusolver-cu11 11.4.0.1
- nvidia-cusparse-cu11 11.7.4.91
- nvidia-nccl-cu11 2.14.3
- nvidia-nvtx-cu11 11.7.91
- omegaconf 2.3.0
- packaging 23.1
- pandas 2.0.3
- pathtools 0.1.2
- patsy 0.5.3
- pillow 9.5.0
- pluggy 1.2.0
- protobuf 4.24.1
- psutil 5.9.5
- pyarrow 12.0.1
- pycocotools 2.0.7
- pycryptodomex 3.18.0
- pyparsing 3.0.9
- pytest 7.4.0
- python-dateutil 2.8.2
- pytorch-lightning 2.0.7
- pytz 2023.3
- pyyaml 6.0.1
- regex 2023.8.8
- requests 2.31.0
- responses 0.18.0
- scipy 1.9.3
- seaborn 0.12.2
- sentry-sdk 1.29.2
- setproctitle 1.3.2
- setuptools 68.1.2
- six 1.16.0
- smmap 5.0.0
- statsmodels 0.14.0
- sympy 1.12
- tokenizers 0.13.3
- tomli 2.0.1
- torch 2.0.0
- torchmetrics 1.0.3
- torchvision 0.15.1
- tqdm 4.66.1
- transformers 4.29.2
- triton 2.0.0
- typing-extensions 4.7.1
- tzdata 2023.3
- urllib3 2.0.4
- wandb 0.15.4
- wheel 0.41.1
- xxhash 3.3.0
- yarl 1.9.2
- DALL-E ^0.1
- Pillow ^9.5.0
- datasets ^2.13.1
- evaluate ^0.4.0
- hydra-core ^1.3.2
- omegaconf ^2.3.0
- protobuf ^4.23.2
- pycocotools ^2.0.6
- python ^3.10
- pytorch-lightning ^2.0.5
- requests ^2.31.0
- seaborn ^0.12.2
- statsmodels ^0.14.0
- torch 2.0.0
- transformers 4.29.2
- wandb 0.15.4