https://github.com/alfa-group/code-representations-ml-brain
[NeurIPS 2022] "Convergent Representations of Computer Programs in Human and Artificial Neural Networks" by Shashank Srikant*, Benjamin Lipkin*, Anna A. Ivanova, Evelina Fedorenko, Una-May O'Reilly.
Science Score: 10.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 14.4%)
Keywords
cognitive-neuroscience
fmri-data-analysis
language-models
language-understanding
programming-languages
python
representation-learning
Last synced: 5 months ago
Repository
Statistics
- Stars: 5
- Watchers: 5
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
cognitive-neuroscience
fmri-data-analysis
language-models
language-understanding
programming-languages
python
representation-learning
Created over 3 years ago · Last pushed about 3 years ago
https://github.com/ALFA-group/code-representations-ml-brain/blob/main/
# Convergent Representations of Computer Programs in Human and Artificial Neural Networks
Resources for the paper `Convergent Representations of Computer Programs in Human and Artificial Neural Networks` by Shashank Srikant*, Benjamin Lipkin*, Anna A. Ivanova, Evelina Fedorenko, Una-May O'Reilly.
Published in NeurIPS 2022: https://openreview.net/forum?id=AqexjBWRQFx
Citation:
```bibtex
@inproceedings{SrikantLipkin2022,
    title={Convergent Representations of Computer Programs in Human and Artificial Neural Networks},
    author={Shashank Srikant* and Ben Lipkin* and Anna A Ivanova and Evelina Fedorenko and {Una-May} {O'R}eilly},
    booktitle={Advances in Neural Information Processing Systems},
    editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
    year={2022},
    url={https://openreview.net/forum?id=AqexjBWRQFx}
}
```
The labs involved:
https://evlab.mit.edu/
https://alfagroup.csail.mit.edu/
For additional information, contact shash@mit.edu, lipkinb@mit.edu, evelina9@mit.edu, or unamay@csail.mit.edu.
Related material, including slides, a recorded talk, and a summary of our work, [is available here](https://shashank-srikant.github.io/notes/neurips22-brain/).
The datasets and model checkpoints that this codebase downloads and analyzes are available here: https://huggingface.co/datasets/benlipkin/braincode-neurips2022
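The codebase downloads these automatically, but as a hedged illustration (not the package's own download logic), the files can also be fetched directly with the standard `huggingface_hub` API:
```python
# Sketch: fetch the paper's dataset/checkpoint files from the Hugging Face Hub.
# The repo id comes from the link above; how `make setup` actually obtains the
# data may differ.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="benlipkin/braincode-neurips2022",
    repo_type="dataset",  # hosted as a dataset repository
)
print(local_dir)  # path to the locally cached files
```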
## Overview
The goal of this work is to relate brain representations of code to (1) specific code properties and (2) representations of code produced by language models trained on code.
In Experiment 1, we predict static and dynamic program-analysis metrics from the fMRI recordings (each of dimension D_B) of 24 human subjects reading 72 unique Python programs (N = 72), training a separate linear model for each subject and metric.
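A minimal sketch of this style of decoding analysis, assuming ridge regression and rank-correlation scoring (illustrative stand-ins, not the package's exact pipeline):
```python
# Experiment 1 sketch: cross-validated linear decoding of one code property
# (e.g., token count) from one subject's brain responses to the N=72 programs.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

def decode_metric(X_brain, y_metric, n_splits=5):
    """X_brain: (N, D_B) brain features; y_metric: (N,) program metric."""
    preds = np.empty_like(y_metric, dtype=float)
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X_brain):
        model = RidgeCV(alphas=np.logspace(-3, 3, 7))
        model.fit(X_brain[train], y_metric[train])
        preds[test] = model.predict(X_brain[test])
    # rank correlation between true and held-out predicted metric values
    return spearmanr(y_metric, preds).correlation
```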
In Experiment 2, we learn affine maps from brain representations to the corresponding representations generated by code language models (each of dimension D_M) on these 72 programs.
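Likewise, a hedged sketch of the Experiment 2 mapping, written here as multi-output ridge regression (an illustration, not the authors' exact estimator):
```python
# Experiment 2 sketch: learn an affine map X_brain @ W + b ≈ Y_model from
# brain space (N, D_B) to a code model's representation space (N, D_M).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def fit_affine_map(X_brain, Y_model, alpha=1.0):
    X_tr, X_te, Y_tr, Y_te = train_test_split(X_brain, Y_model, random_state=0)
    mapper = Ridge(alpha=alpha).fit(X_tr, Y_tr)  # multi-output by default
    # score: mean per-dimension correlation between predicted and true embeddings
    Y_hat = mapper.predict(X_te)
    corrs = [np.corrcoef(Y_hat[:, j], Y_te[:, j])[0, 1] for j in range(Y_te.shape[1])]
    return mapper, float(np.mean(corrs))
```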

## Details
This pipeline supports several major functions.
- **MVPA** (multivariate pattern analysis) evaluates decoding of **code properties** or **code model** representations from their respective **brain representations** within a collection of canonical **brain regions**.
- **PRDA** (program representation decoding analysis) evaluates decoding of **code properties** from **code model** representations.
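As a concrete illustration, a PRDA-style analysis reduces to cross-validated classification from model embeddings to a code property; the arrays below are random stand-ins for real features and labels, not the package internals:
```python
# PRDA sketch: decode a categorical property (e.g., task-structure:
# seq/for/if) from code-model embeddings via cross-validated classification.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_model = rng.normal(size=(72, 768))       # stand-in for code-bert embeddings
y_structure = rng.integers(0, 3, size=72)  # stand-in labels: seq/for/if

scores = cross_val_score(LogisticRegression(max_iter=1000), X_model, y_structure, cv=5)
print(scores.mean())  # chance is ~1/3 for three balanced classes
```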
## Reproducing paper results
This package provides an automated build using [GNU Make](https://www.gnu.org/software/make/manual/make.html). A single pipeline is provided that starts from an empty environment and produces a ready-to-use software environment.
```bash
make setup # see 'make help' for more info
```
Pipelines also exist to run core analyses and generate figures and tables.
To run all core experiments from the paper, the following command will suffice after setup:
```bash
make analysis
```
To regenerate tables and figures from the paper, run the following after completing the analyses:
```bash
make paper
```
Note: these commands take roughly 8 hours to complete on a machine without GPUs.
## Custom Analyses
The pipeline can also be used for custom analyses via the following command-line interface.
```bash
# basic examples
python braincode mvpa -f brain-MD -t task-structure # brain -> {task, model}
python braincode prda -f code-bert -t task-tokens # model -> task
# more complex example
python braincode mvpa -f brain-lang+brain-MD -t code-projection -d 64 -m SpearmanRho -p $BASE_PATH --score_only
# note how `+` operator can be used to join multiple representations via concatenation
# additional metrics are available in the `metrics.py` module
```
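To make the `+` join concrete: concatenation stacks the two ROI feature matrices column-wise before decoding. A sketch with random stand-ins for the real data (the dimensions are placeholders):
```python
# `brain-lang+brain-MD` joins the two region representations feature-wise.
import numpy as np

rng = np.random.default_rng(0)
X_lang = rng.normal(size=(72, 500))  # brain-lang: (N programs, D_lang)
X_MD = rng.normal(size=(72, 400))    # brain-MD:   (N programs, D_MD)
X_joint = np.hstack([X_lang, X_MD])  # joined:     (72, 900)
```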
### Supported Brain Regions
- `brain-MD` (Multiple Demand)
- `brain-lang` (Language)
- `brain-vis` (Visual)
- `brain-aud` (Auditory)
### Supported Code Features
**Code Properties**
- `test-code` (code vs. sentences)
- `test-lang` (English vs. Japanese)
- `task-content` (math vs. str) \*datatype
- `task-structure` (seq vs. for vs. if) \*control flow
- `task-tokens` (# of tokens in program) \*static analysis
- `task-lines` (# of runtime steps during execution) \*dynamic analysis
- `task-bytes` (# of bytecode ops executed)
- `task-nodes` (# of nodes in AST)
- `task-halstead` (function of tokens, operations, vocabulary)
- `task-cyclomatic` (function of program control flow graph)
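Several of these properties can be computed with standard tooling; below is a hedged sketch using the `radon`, `ast`, and `tokenize` modules this repo depends on (not necessarily the authors' exact extraction code):
```python
# Static-analysis sketch for a toy program: token count (task-tokens),
# AST node count (task-nodes), cyclomatic complexity (task-cyclomatic),
# and Halstead volume (task-halstead).
import ast
import io
import tokenize
from radon.complexity import cc_visit
from radon.metrics import h_visit

program = """
def total_positive(xs):
    total = 0
    for x in xs:
        if x > 0:
            total += x
    return total
"""

n_tokens = sum(1 for _ in tokenize.generate_tokens(io.StringIO(program).readline))
n_nodes = sum(1 for _ in ast.walk(ast.parse(program)))
cyclomatic = sum(block.complexity for block in cc_visit(program))
halstead_volume = h_visit(program).total.volume
print(n_tokens, n_nodes, cyclomatic, halstead_volume)
```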
**Code Models**
- `code-projection` (presence of tokens)
- `code-bow` (token frequency)
- `code-tfidf` (token and document frequency)
- `code-seq2seq` [1](https://github.com/IBM/pytorch-seq2seq) (sequence modeling)
- `code-xlnet` [2](https://arxiv.org/pdf/1906.08237.pdf) (autoregressive LM)
- `code-gpt2` [4](https://huggingface.co/microsoft/CodeGPT-small-py) (autoregressive LM)
- `code-bert` [5](https://arxiv.org/pdf/2002.08155.pdf) (masked LM)
- `code-roberta` [6](https://huggingface.co/huggingface/CodeBERTa-small-v1) (masked LM)
- `code-transformer` [3](https://arxiv.org/pdf/2103.11318.pdf) (LM + structure learning)
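As a hedged example of extracting a per-program representation from one of these models via the standard `transformers` API (mean-pooled final hidden states; the repo's own extraction may differ):
```python
# Sketch: embed a program with CodeGPT (model id from the list above) by
# mean-pooling the final hidden states over its tokens.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "microsoft/CodeGPT-small-py"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

program = "x = 1\nfor i in range(3):\n    x += i\n"
inputs = tokenizer(program, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, T, D_M)
embedding = hidden.mean(dim=1).squeeze(0)       # (D_M,) program representation
```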
## License
This project is licensed under the [MIT License](https://opensource.org/licenses/MIT).
Owner
- Name: Anyscale Learning For All (ALFA)
- Login: ALFA-group
- Kind: organization
- Email: alfa-apply@csail.mit.edu
- Location: Cambridge, MA, USA
- Website: https://alfagroup.csail.mit.edu/
- Repositories: 19
- Profile: https://github.com/ALFA-group
Scalable machine learning technology, Adversarial AI, Evolutionary algorithms, and data science frameworks.
GitHub Events
Total
- Watch event: 2
Last Year
- Watch event: 2
Dependencies
Dockerfile
docker
- continuumio/miniconda3 latest build
requirements.txt
pypi
- astor ==0.8.1
- datasets ==1.9.0
- dill ==0.3.4
- joblib ==0.14.1
- line_profiler ==3.3.0
- lxml ==4.8.0
- matplotlib ==3.3.4
- mne ==0.24.1
- mypy ==0.941
- numpy ==1.18.1
- pylint ==2.13.4
- pylint-exit ==1.2.0
- pylint-json2html ==0.3.0
- radon ==5.1.0
- scikit_learn ==0.24.1
- scipy ==1.4.1
- tensorflow ==2.3.0
- torch ==1.4.0
- torchtext ==0.5.0
- tqdm ==4.43.0
- transformers ==3.1.0