aoc-frp-code

Replication package, supplementary materials, and analysis pipeline for our EEG study

https://github.com/brains-on-code/aoc-frp-code

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found CITATION.cff file
✓ codemeta.json file: found codemeta.json file
✓ .zenodo.json file: found .zenodo.json file
✓ DOI references: found 2 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (12.3%) to scientific vocabulary

Keywords

eeg fixation-related-potentials natural-language-comprehension program-comprehension replication-package research

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: brains-on-code
License: cc-by-4.0
Language: Jupyter Notebook
Default Branch: main
Homepage:
Size: 13.5 MB

Statistics

Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 0

Created about 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

Unexpected but informative: What fixation-related potentials tell us about the processing of confusing program code

Concept

Inspired by existing studies on atoms of confusion, such as Langhout and Aniche: Atoms of Confusion in Java (ICPC 2021), which analyzed confusing patterns in program code, this study extends the approach to an EEG and eye-tracking setup. The goal is to detect neurocognitive processes at a deeper and more detailed level using fixation-related potentials.

Requirements

The software was developed and used on Windows 10 and 11. There are no specific hardware requirements.

Most of the code is provided as Jupyter notebooks, which require Python 3 (the implementation uses Python 3.11). We recommend creating a virtual environment for running the notebooks. The required packages can be installed from the requirements file, which should take only a few minutes.

pip install -r requirements.txt
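After installation, a small check like the one below can confirm that the environment matches the versions pinned in requirements.txt. This is a minimal sketch that uses only the standard library (`importlib.metadata`). The helper names are hypothetical and not part of the repository. The `>=` comparison is deliberately simplified to numeric components, so the `packaging` library would be the more robust choice for pre-release tags.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Matches pinned lines such as "I2MC ==2.2.2" or "opencv-python >=4.8.0".
REQUIREMENT = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(==|>=)\s*([^\s#]+)")


def parse_requirements(text):
    """Return (name, operator, version) tuples for the pinned requirement lines."""
    parsed = []
    for line in text.splitlines():
        match = REQUIREMENT.match(line)
        if match:
            parsed.append(match.groups())
    return parsed


def version_tuple(text):
    """Simplified version parsing: keep only the numeric components."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def check_environment(requirements):
    """List requirements that are missing or do not satisfy their pin."""
    problems = []
    for name, operator, wanted in requirements:
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if operator == "==":
            ok = installed == wanted
        else:  # ">=": compare numeric components only
            ok = version_tuple(installed) >= version_tuple(wanted)
        if not ok:
            problems.append(f"{name}: installed {installed}, required {operator}{wanted}")
    return problems
```

For example, `check_environment(parse_requirements(open("requirements.txt").read()))` returns an empty list when every pinned package is installed at a matching version.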

For some aspects, such as the study presentation and the statistical models, we used other software. The presentation is based on scripts generated in PsychoPy, so the script in 06-Task-Presentation_Recording was created with and is executed by PsychoPy (version 2021.3.2). Additionally, the GLMERs are computed in R (4.3.2), which has to be installed and usable from Jupyter notebooks.

Study

Purpose

This folder contains all materials used for the study. In the study, we showed participants code snippets and recorded their gaze (eye tracker) and brain activity (EEG) while they comprehended the presented code.

Pipeline and Folders

The folder contains the following sub-folders for an iteration:

01-Data-Code_Snippets contains the adapted versions (v0) we created for our study, as well as the variants v1 and v2 we derived from each of them to enlarge our snippet database. They are based on Langhout and Aniche's snippets but shortened to suit an FRP setup. The correct solutions of the snippets, the provided wrong solutions, and additional information such as the atom-of-confusion (AoC) categories are stored in the CSV file snippets.csv.
03-Data-CodeSnippetImages contains:
- Pictures: the images generated for each of our snippets for this study, whose content is located in the presentation image folder.
- AOI/Pictures/: annotated versions of the snippets, in which the atoms of confusion and the expected area of interest (AoI) for the reader are marked. These were used later for the FRP analysis. We grouped the heterogeneous AoIs into multiple categories depending on the amount and type of expression marked: blue for blocks (multiple lines of code), green for whole statements, yellow for larger expressions or areas, orange for small expressions or parts of an expression, and red for crucial operators or keywords. Both snippets of a pair have identical AoI categories, and the AoIs differ only in their size and position. For the FRP analysis, only the AoI with the lowest category (and smallest size) of each snippet pair was used.
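The AoI selection rule for the FRP analysis can be expressed compactly. This is a minimal sketch with two assumptions. First, "lowest category" is read as the most fine-grained one (red, crucial operators or keywords). Second, each AoI is stored as a colour plus a bounding box. The `AoI` class, `CATEGORY_RANK`, and `select_frp_aois` are hypothetical names, not part of the repository.

```python
from dataclasses import dataclass

# Hypothetical ranking of the colour categories, from finest (red: crucial
# operator or keyword) to coarsest (blue: block of several lines).
CATEGORY_RANK = {"red": 0, "orange": 1, "yellow": 2, "green": 3, "blue": 4}


@dataclass
class AoI:
    colour: str
    x: int
    y: int
    width: int
    height: int

    @property
    def area(self) -> int:
        return self.width * self.height


def select_frp_aois(pair):
    """Pick one AoI per snippet of a pair: finest shared category, then smallest area.

    `pair` maps a snippet name to its list of AoI boxes. Both snippets of a pair
    are assumed to carry the same categories, so the finest category exists in each.
    """
    finest = min(CATEGORY_RANK[a.colour] for aois in pair.values() for a in aois)
    return {
        snippet: min(
            (a for a in aois if CATEGORY_RANK[a.colour] == finest),
            key=lambda a: a.area,
        )
        for snippet, aois in pair.items()
    }
```

Choosing the category once per pair, rather than per snippet, keeps both snippets of a pair on the same AoI category, mirroring the statement that snippet pairs share identical AoI categories.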
04-Task-GenerateBlockSequences contains the task GenerateBlockSequences. It uses the snippet information to create presentation sequences for all participants of the study by randomly ordering the snippets under a set of constraints, and then creates the condition files for the presentation.
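The README does not list the ordering constraints, so the sketch below only illustrates the general technique: constrained randomization by rejection sampling. It uses one hypothetical constraint (no condition appears more than twice in a row) and writes the result as CSV condition-file text, a format PsychoPy loops can load. The function names, the `image`/`condition` columns, and the constraint itself are assumptions, not the GenerateBlockSequences implementation.

```python
import csv
import io
import random


def longest_run(labels):
    """Length of the longest stretch of identical consecutive labels."""
    if not labels:
        return 0
    longest = run = 1
    for previous, current in zip(labels, labels[1:]):
        run = run + 1 if current == previous else 1
        longest = max(longest, run)
    return longest


def constrained_order(trials, rng, max_same=2, attempts=10_000):
    """Reshuffle until no condition repeats more than `max_same` times in a row."""
    order = list(trials)
    for _ in range(attempts):
        rng.shuffle(order)
        if longest_run([t["condition"] for t in order]) <= max_same:
            return order
    raise RuntimeError("no order satisfying the constraint was found")


def condition_file_text(order):
    """Render the ordered trials as CSV with a header row, one row per trial."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(order[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(order)
    return buffer.getvalue()
```

Passing a seeded `random.Random(seed)` per participant keeps each generated sequence reproducible.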
05-Data-Trial_Runs stores the three condition file sequences for each participant. They can be found with the raw data in the data repository.
06-Task-StudyPresentationSoftware contains the final presentation script created with PsychoPy, including all images as generated into Pictures. The condition files, which can be found with the raw data in the data repository, need to be stored in the sub-folder conditions.
07-Task-Study management and execution contains the templates for the documents used in the study, such as the questionnaires and consent forms, as well as a document describing the study procedure.
08-Data-Trial_Recordings needs to store the results of the participants' trials and is structured into:
- raw, whose data is taken directly from the presentation and EEG recording scripts,
- processed, which contains the transformed data used for analysis,
- manualaccuracyevaluation, which contains the iterative evaluation of the fixations: the evaluation sheets of each of the two reviewers as one Excel file, and their combination (the fixation data itself is in a sub-folder of processed),
- screenshots, which should contain the screenshots with and without AoI boxes.

All required files for this folder can be found with the raw data in the data repository.

09-Task-Evaluate_Data contains the framework scripts to preprocess (PRE) and evaluate (EVAL) the recorded behavioral, visual, and EEG data. They are structured as numbered Jupyter notebooks forming the analysis pipeline and need to be executed in numerical order. The exception is 01d4 and 01d5, which are executed by iteration first and then by number: 01d4PREVISFixationCorrection-it1, 01d5PREVISSnippetCorrectionAccuracyImage-it1, 01d4PREVISFixationCorrection-it2, 01d5PREVISSnippetCorrectionAccuracyImage-it2, 01d4PREVISFixationCorrection-it3, 01d5PREVISSnippetCorrectionAccuracyImage-it3, 01d4PREVISFixationCorrection-it4, 01d5PREVISSnippetCorrectionAccuracyImage-it4.
- The core functionality is extracted into a separate utils package, documented in the package's README.
- image contains the images for the instruction READMEs in this folder.
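The notebook names alone are enough to generate this execution order. The order is numerical, except that 01d4 and 01d5 are interleaved by iteration, and the manual evaluation in 01d6, which follows 01d5 of each iteration, is repeated in every loop. This is a hypothetical helper, not part of the utils package. It assumes the names are passed without the `.ipynb` extension and start with the number as printed in this README. A plain string sort would place 01d11 before 01d2, hence the numeric key.

```python
import re

# Leading notebook number, e.g. "01d11PREVIS..." -> ("01", "d", "11").
PREFIX = re.compile(r"^(\d+)([a-z]?)(\d*)")
ITERATION = re.compile(r"-it(\d+)$")
LOOP_STEPS = {(1, "d", 4), (1, "d", 5)}  # 01d4 and 01d5, run once per iteration
EVALUATION_STEP = (1, "d", 6)  # 01d6, repeated after every 01d5


def step_key(name):
    """Numeric sort key so that 01d11 comes after 01d9 (plain sorting would not)."""
    match = PREFIX.match(name)
    if match is None:
        raise ValueError(f"not a numbered notebook: {name}")
    major, letter, minor = match.groups()
    return (int(major), letter, int(minor) if minor else 0)


def iteration(name):
    """Iteration number from a "-itN" suffix, or 0 if there is none."""
    match = ITERATION.search(name)
    return int(match.group(1)) if match else 0


def execution_order(notebooks):
    """Order notebooks by number, with 01d4 -> 01d5 -> 01d6 repeated per iteration."""
    looped = [n for n in notebooks if step_key(n) in LOOP_STEPS]
    evaluation = [n for n in notebooks if step_key(n) == EVALUATION_STEP]
    linear = sorted(
        (n for n in notebooks if n not in looped and n not in evaluation),
        key=lambda n: (step_key(n), n),
    )
    loop = []
    for it in sorted({iteration(n) for n in looped}):
        loop += sorted((n for n in looped if iteration(n) == it), key=step_key)
        loop += evaluation
    # Insert the loop before the first notebook numbered after 01d6.
    split = next((i for i, n in enumerate(linear) if step_key(n) > EVALUATION_STEP), len(linear))
    return linear[:split] + loop + linear[split:]
```

Because 01d6 involves manual work by the evaluators, the resulting list is best used as a checklist rather than executed unattended.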

The Jupyter notebooks have the following content:
- 001PREVISScreenshotAOIidentification extracts the AoI positions from the screenshots.
- 01a1PREDate_Anonymization anonymizes all raw data extracted from each experiment run.
- 01a2PREExclusion_Files creates the exclusion files for each participant.
- 01bPREEEGMetadataCheck checks the impedances of the EEG recording for each participant's experiment run.
- 01c1PREBEHEventData extracts and cleans the behavioral data.
- 01c2PREBEHTrialExclusion excludes data based on behavioral criteria: at participant level if the overall correctness is below the level expected by chance, or at trial level if the comprehension time is an outlier (no exclusion for this study).
- 01d1PREVISEventGaze_Data extracts the eye-gaze data as a stream of x-y coordinates.
- 01d2PREVISFixationCalculation calculates the fixations from the gaze data. Attention! This uses the I2MC algorithm, which is unstable. We set the random parameters to constant values, but you might still obtain different results.
- 01d3PREVISFixationCrossAccuracyImage creates the accuracy images for the fixation-cross view, displaying the fixations over the screenshot of each trial.
- 01d4PREVISFixationCorrection-it1 performs the first iteration of fixation correction. It applies no correction, as the data already exists and the first iteration is used exclusively for outlier removal.
- 01d5PREVISSnippetCorrectionAccuracyImage-it1 performs the first iteration of creating accuracy images for the snippet view to detect outliers, and creates the template to be filled in by the evaluators.
- 01d6PREVISManualAccuracy_Evaluation: the evaluators perform the accuracy evaluation separately and combine their results (after 01d5 of each iteration).
- 01d4PREVISFixationCorrection-it2 performs the second iteration of fixation correction by applying up to 4 correction algorithms to the existing data of all trials. The correction algorithms do not apply an x-offset correction.
- 01d5PREVISSnippetCorrectionAccuracyImage-it2 performs the second iteration of creating accuracy images that compare the results of the different algorithms with the original version in the snippet view, to detect outliers. It also creates the template to be filled in by the evaluators and considers the x-offset when plotting the results.
- 01d6PREVISManualAccuracy_Evaluation: the evaluators perform the accuracy evaluation separately and combine their results (after 01d5 of each iteration).
- 01d4PREVISFixationCorrection-it3 performs the third iteration of fixation correction by applying up to 4 correction algorithms to the existing data of all trials that need to be reworked. The correction algorithms do not apply an x-offset correction.
- 01d5PREVISSnippetCorrectionAccuracyImage-it3 performs the third iteration of creating accuracy images that compare the results of the different algorithms with the original version in the snippet view, to detect outliers (for all trials that need to be reworked). It also creates the template to be filled in by the evaluators and considers the x-offset when plotting the results.
- 01d6PREVISManualAccuracy_Evaluation: the evaluators perform the accuracy evaluation separately and combine their results (after 01d5 of each iteration).
- 01d4PREVISFixationCorrection-it4 performs the fourth iteration of fixation correction by applying up to 4 correction algorithms to the existing data of all trials that need to be reworked. The correction algorithms do not apply an x-offset correction.
- 01d5PREVISSnippetCorrectionAccuracyImage-it4 performs the fourth iteration of creating accuracy images that compare the results of the different algorithms with the original version in the snippet view, to detect outliers (for all trials that need to be reworked). It also creates the template to be filled in by the evaluators and considers the x-offset when plotting the results.
- 01d6PREVISManualAccuracy_Evaluation: the evaluators perform the accuracy evaluation separately and combine their results (after 01d5 of each iteration).
- 01d7PREVISGazeSelection_Fixation checks that the manual evaluation was executed correctly and applies the outlier removal, exclusions, and corrections to the final dataset.
- 01d8PREVIS_Statistics computes gaze and fixation statistics about the final dataset.
- 01d9PREVISSpecialFixations determines the fixation in each trial to be used for the ERP analysis.
- 01d11PREVISOutlierTrials aggregates outlier statistics for each phase of the eye-tracking preprocessing (01d1-01d7).
- 01e1PREEEGICAReasoning_File creates the template for performing ICA in BrainVision Analyzer.
- 01e2PREEEGBVApreparation describes the preprocessing steps performed in BrainVision Analyzer.
- 01e3PREEEGFilterTrial_Assignments loads the EEG data and crops it to the relevant part. It then tries to automatically assign trial information (snippet) to each event in the EEG recording, using information generated by 01c1PREBEHEventData. Otherwise, the trial information has to be assigned manually based on a provided information template.
- 01e4PREEEGTrialAssignments_Split creates EEG segments for each trial as previously assigned.
- 01f1PREALLSummaryParticipant_Exclusion summarizes all exclusions in each data modality, including the reason.
- 01g1PREEEGVISSetFRPmarkers copies the EEG segments and synchronizes them with the eye-tracking data by adding a marker at the time of the special fixation for the FRP analysis.
- 03a1EVALBEHConfusionDistribution analyzes the behavioral data by generating overviews of the dependent variables for each condition and at different levels of abstraction.
- 03a2EVALBEHConfusionLMEM Correctness displays the steps of the backtracking algorithm that determines the optimal model for answer correctness.
- 03a2EVALBEHConfusionLMEM Duration displays the steps of the backtracking algorithm that determines the optimal model for comprehension time.
- 03a2EVALBEHConfusionLMEM Rating displays the steps of the backtracking algorithm that determines the optimal model for the subjective difficulty rating.
- 03c1EVALERP aggregates the EEG segments into subject averages and grand averages based on stimulus onset.
- 03c2EVALERPMassUnivariate_sdiff 03-15 100Hz performs the cluster-based permutation test.
- 03c2EVALERP_plot creates different visualizations of the grand averages.
- 03d1EVALFRP aggregates the EEG segments into subject averages and grand averages based on fixation onset.
- 03d2EVALFRPfixationanalysis analyzes fixation-based metrics, such as onset delay, for all trials included in the FRP analysis.
- 03d2EVALFRPPermutationsdiff01-10100Hz performs the cluster-based permutation test.
- 03d2EVALFRP_plot creates different visualizations of the grand averages.
- 03e1EVALVISReadingMetrics analyzes reading-based metrics, such as first pass in the AoI.

10-DataEvaluationResults will be created to store the evaluation results, such as behavioral statistics, GLMER models for the behavioral metrics, and the ERP and FRP analyses with their epoch data and statistical analysis.
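The two behavioral exclusion criteria of 01c2PREBEHTrialExclusion can be sketched as follows. This is a minimal illustration, not the notebook's code. It makes three assumptions. The chance level is a parameter you must supply, since it depends on the answer format. "Below chance" is read as a plain accuracy threshold, although the notebook may use a statistical test instead. Tukey's fences with k = 1.5 stand in for the unspecified outlier rule for comprehension times. Both function names are hypothetical.

```python
from statistics import quantiles


def below_chance(correct, chance_level):
    """Participant-level rule: overall answer correctness below the chance level.

    `correct` holds one boolean per trial of a participant; `chance_level` is an
    assumed parameter (e.g. 0.5 for two equally likely answers).
    """
    return sum(correct) / len(correct) < chance_level


def time_outliers(times, k=1.5):
    """Trial-level rule (assumed): indices of times outside Tukey's fences.

    Requires at least two comprehension times.
    """
    q1, _, q3 = quantiles(times, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [i for i, t in enumerate(times) if t < low or t > high]
```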

The data of this project is made available on Zenodo upon publication.

Terminology

Due to the project's evolution, some terms used in the paper correspond to different names in the code analysis (code term -> paper term):

Duration -> Comprehension Time
Correctness -> Answer Correctness
Rating -> Subjective Difficulty Rating
Clean -> Clean
Obf -> Confusing
Block_No -> BlockNo
InBlockNo -> ItemOrder
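When comparing the analysis output with the paper, the mapping above can be applied programmatically. The following is a minimal sketch for records such as those produced by `csv.DictReader`. The `Condition` column name is a hypothetical placeholder, since the README does not name the column that holds Clean/Obf. The function name is also illustrative.

```python
# Code term -> paper term, following the list above.
COLUMN_TERMS = {
    "Duration": "Comprehension Time",
    "Correctness": "Answer Correctness",
    "Rating": "Subjective Difficulty Rating",
    "Block_No": "BlockNo",
    "InBlockNo": "ItemOrder",
}
CONDITION_TERMS = {"Clean": "Clean", "Obf": "Confusing"}


def to_paper_terms(row, condition_column="Condition"):
    """Rename the keys of one record and relabel its condition value.

    `condition_column` is a hypothetical column name; adjust it to the data.
    Keys without a paper equivalent are kept unchanged.
    """
    renamed = {COLUMN_TERMS.get(key, key): value for key, value in row.items()}
    if condition_column in renamed:
        label = renamed[condition_column]
        renamed[condition_column] = CONDITION_TERMS.get(label, label)
    return renamed
```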

Clarification of further terms:
- Conditions: Confusing, Clean
- Snippet: one program code stimulus, also referred to as (code) snippet
- Snippet pair: two corresponding snippets that differ only in the condition
- Snippet number: the snippet number as assigned by Langhout and Aniche; it stands for 3 snippet pairs that differ only in their version number (v0, v1, v2)
- Trial: a unique combination of snippet and participant
- Behavioral data: Comprehension Time, Answer Correctness, Subjective Difficulty Rating

License

The work is licensed as Creative Commons Attribution 4.0 International.

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Owner

Name: Brains-on-Code
Login: brains-on-code
Kind: organization

Website: https://brains-on-code.github.io/
Repositories: 2
Profile: https://github.com/brains-on-code

We are researchers interested in empirical software engineering from Chemnitz, Magdeburg, Saarbrücken, and Raleigh.

Citation (CITATION.cff)

cff-version: 1.2.0
title: >-
  Unexpected but informative: What fixation-related
  potentials tell us about the processing of confusing
  program code
message: >-
  If you use these scripts, please cite it using the metadata
  from this file.
date-released: 2024-11-28
url: "https://github.com/brains-on-code/AoC-FRP-Code/"
authors:
  - given-names: Annabelle
    family-names: Bergum
  - given-names: Anna-Maria
    family-names: Maurer
  - given-names: Norman
    family-names: Peitek
  - given-names: Regine
    family-names: Bader
  - given-names: Axel
    family-names: Mecklinger
  - given-names: Vera
    family-names: Demberg
  - given-names: Janet
    family-names: Siegmund
  - given-names: Sven
    family-names: Apel
identifiers:
  - type: doi
    value: 10.48550/arXiv.2412.10099
    description: Preprint
preferred-citation:
  authors:
  - given-names: Annabelle
    family-names: Bergum
  - given-names: Anna-Maria
    family-names: Maurer
  - given-names: Norman
    family-names: Peitek
  - given-names: Regine
    family-names: Bader
  - given-names: Axel
    family-names: Mecklinger
  - given-names: Vera
    family-names: Demberg
  - given-names: Janet
    family-names: Siegmund
  - given-names: Sven
    family-names: Apel
  doi: "10.48550/arXiv.2412.10099"
  title: "Unexpected but informative: What fixation-related potentials tell us about the processing of confusing program code"
license: CC-BY-4.0

GitHub Events

Total

Push event: 3

Last Year

Push event: 3

Dependencies

requirements.txt pypi

I2MC ==2.2.2
Pillow ==9.5.0
cliffs-delta ==1.0.0
h5py ==3.9.0
ipywidgets ==8.0.6
matplotlib ==3.7.1
mne ==1.4.2
numpy ==1.24.2
opencv-python >=4.8.0
openpyxl ==3.1.2
pandas ==2.0.0
pingouin ==0.5.4
pybv ==0.7.5
rpy2 ==3.5.15
scipy ==1.11.1
seaborn ==0.12.2
statsmodels ==0.14.1
tqdm ==4.65.0
