disruption-prediciton-based-on-multimodal-deep-learning

Research-repository: Disruption Prediction and Analysis through Multimodal Deep Learning in KSTAR

https://github.com/zinzinbin/disruption-prediciton-based-on-multimodal-deep-learning

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, sciencedirect.com, springer.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.4%) to scientific vocabulary

Keywords

computer-vision deep-learning disruptions multi-modal-learning plasma-instabilities plasma-physics time-series-classification tokamak
Last synced: 6 months ago

Repository

Research-repository: Disruption Prediction and Analysis through Multimodal Deep Learning in KSTAR

Basic Info
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
computer-vision deep-learning disruptions multi-modal-learning plasma-instabilities plasma-physics time-series-classification tokamak
Created almost 4 years ago · Last pushed 12 months ago
Metadata Files
Readme · Citation

README.md

# Disruptive prediction model using KSTAR video and numerical data via Deep Learning [Paper : Disruption Prediction and Analysis Through Multimodal Deep Learning in KSTAR]

Introduction

This is the GitHub repository of research on disruption prediction using deep learning with the KSTAR dataset. In this research, KSTAR IVIS data and 0D parameters are used to predict disruptions. We obtained plasma image data from the IVIS in KSTAR and 0D parameters such as stored energy, beta, plasma current, internal inductance, and so on. Additional information such as ECE data, including electron temperature and density, is also used.
Unlike other research on disruption prediction using machine learning, we also used video data as an input in order to exploit the spatio-temporal information of the plasma, including the time-varying plasma shape and the light emission induced by plasma-neutral interaction. This requires neural networks of the kind generally used for video classification tasks.

However, there is an imbalanced data distribution issue that results from the time-scale difference between the disruptive phase and the overall plasma operation. Thus, we applied re-sampling together with Focal loss and LDAM loss to handle this problem. We demonstrated that using multimodal data, including video and 0D parameters, can enhance the precision of the disruption alarms. Moreover, consistent results can be shown using GradCAM and permutation feature importance, which implies that the networks focus on the neighborhood of the plasma core in both the images and the 0D parameters (Te, Ne in the core of the plasma). Several techniques were used to compare the model performance indirectly.
If there is any question or comment, please contact me by email (personal: wlstn5376@gmail.com, school: asdwlstn@snu.ac.kr) at any time.

How to generate training data

To generate the training dataset, we rely on a few assumptions and techniques. The major steps are listed below.
  • First, we set image sequence data as the input (B,T,C,W,H) and assume that the last frame, in which the plasma image inside the tokamak disappears, corresponds to the disruption.
  • The second-to-last frame of each image sequence can then be considered the current quench.
  • Thus, the frame sequences that include the second-to-last frame of each experiment are labeled as disruptive.
  • Under this condition, neural networks trained on the labeled dataset can predict the disruption prior to a current quench.
This method can be useful for a small dataset since it uses all the data obtained from each experiment, but an imbalanced data distribution can arise in this process due to the short time scale of the disruptive phase. Therefore, we use specific learning algorithms to handle this issue (e.g. re-sampling, re-weighting, Focal Loss, LDAM Loss).
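As a rough illustration of this labeling rule, the sketch below uses a hypothetical helper (not code from this repository) that marks every sliding window containing the second-to-last frame as disruptive:

```python
import numpy as np

def label_sequences(n_frames: int, seq_len: int, stride: int = 1):
    """Label sliding-window frame sequences for one shot.

    Assumption (from the text): the last frame is the disruption, the
    second-to-last frame is the current quench, and any window that
    contains the second-to-last frame is labeled disruptive (1).
    """
    quench_idx = n_frames - 2          # second-to-last frame
    windows, labels = [], []
    for start in range(0, n_frames - seq_len + 1, stride):
        end = start + seq_len          # window covers frames [start, end)
        windows.append((start, end))
        labels.append(1 if start <= quench_idx < end else 0)
    return windows, np.array(labels)

# Example: a 210-frame shot split into 21-frame input windows
windows, labels = label_sequences(n_frames=210, seq_len=21)
print(labels.sum(), "disruptive windows out of", len(labels))
```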

The model performance of disruption prediction

We can perform continuous disruption prediction using video data (left) and 0D data (right) for KSTAR shot #21310. It is quite clear that the sensitivity of the model, controlled by the alarm threshold, affects the missed-alarm rate (recall). Additionally, different characteristics in predicting disruptions are observed depending on the data modality. Thus, we can consider combining the two modalities of data to overcome these limits.
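As a rough illustration of how the alarm threshold trades precision against the missed-alarm rate, the following sketch sweeps a threshold over per-window disruption probabilities. The probabilities and labels here are synthetic placeholders, not results from this repository:

```python
import numpy as np

def alarm_metrics(probs: np.ndarray, labels: np.ndarray, threshold: float):
    """Precision/recall of window-level disruption alarms at a given threshold."""
    alarms = probs >= threshold
    tp = np.sum(alarms & (labels == 1))
    fp = np.sum(alarms & (labels == 0))
    fn = np.sum(~alarms & (labels == 1))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0   # recall = 1 - missed-alarm rate
    return precision, recall

# Synthetic placeholder data: ~5% disruptive windows
rng = np.random.default_rng(0)
labels = (rng.random(1000) < 0.05).astype(int)
probs = np.clip(0.6 * labels + 0.3 * rng.random(1000), 0.0, 1.0)
for thr in (0.3, 0.5, 0.7):
    print(thr, alarm_metrics(probs, labels, thr))
```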

Analysis of the models using visualization of hidden vectors

The hidden vectors embedded by the vision networks can be visualized using PCA or t-SNE. The separation of the embedded data becomes clearer when the model predicts the disruption well. Since distinct patterns or precursors for predicting disruptions cannot be detected over long prediction times, the separation is hardly observed in the case of a long prediction time.
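A minimal sketch of this kind of visualization with scikit-learn, assuming the hidden vectors have already been extracted from the vision encoder (the data below is a synthetic placeholder):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# `hidden` : (N, D) encoder hidden vectors, `labels` : 0 = normal, 1 = disruptive.
# Synthetic placeholder data; in practice these come from the trained vision encoder.
rng = np.random.default_rng(0)
hidden = np.vstack([rng.normal(0.0, 1.0, (200, 128)), rng.normal(1.5, 1.0, (50, 128))])
labels = np.array([0] * 200 + [1] * 50)

emb_pca = PCA(n_components=2).fit_transform(hidden)
emb_tsne = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(hidden)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, emb, title in zip(axes, (emb_pca, emb_tsne), ("PCA", "t-SNE")):
    ax.scatter(emb[labels == 0, 0], emb[labels == 0, 1], s=5, label="normal")
    ax.scatter(emb[labels == 1, 0], emb[labels == 1, 1], s=5, label="disruptive")
    ax.set_title(title)
    ax.legend()
plt.show()
```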

Analysis of the models using permutation feature importance

The importance of the 0D parameters can be estimated by permutation feature importance. According to the permutation feature importance, we can observe that electron information such as electron density and temperature, from both the edge and the core, is important for predicting the disruption. It is quite interesting that the importance of q95 is smaller than that of the other values except kappa. Since KSTAR operations proceed in the high-q95 region, the low-q limit is less of a factor than the others.
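A minimal sketch of permutation feature importance for the 0D input channels, assuming a generic `model.predict` interface and a user-supplied metric (both are hypothetical, not this repository's API):

```python
import numpy as np

def permutation_importance(model, X, y, metric, n_repeats: int = 5, seed: int = 0):
    """Drop in metric when each 0D input channel is shuffled across samples.

    X : (N, T, F) sequences of 0D parameters, y : labels,
    metric(y_true, y_pred) : higher is better (e.g. F1 score).
    `model.predict` is a hypothetical interface used for illustration.
    """
    rng = np.random.default_rng(seed)
    base = metric(y, model.predict(X))
    importance = np.zeros(X.shape[-1])
    for f in range(X.shape[-1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            perm = rng.permutation(len(X))
            Xp[:, :, f] = X[perm, :, f]          # shuffle one feature across samples
            drops.append(base - metric(y, model.predict(Xp)))
        importance[f] = np.mean(drops)           # large drop = important feature
    return importance
```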

Analysis of the models using GradCAM and attention rollout

GradCAM and attention rollout are applied to visualize the information flow in the models trained on IVIS data. The CNN-based networks are trained well enough to focus on a specific region inside the tokamak. However, this locality is not observed in the case of the Video Vision Transformer, which has low precision due to a high false-alarm rate.
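A minimal Grad-CAM sketch for a CNN-based video encoder using PyTorch hooks; the target layer and model interface are placeholders, and the repository's actual implementation may differ:

```python
import torch
import torch.nn.functional as F

def grad_cam(model, clip, target_layer, class_idx: int = 1):
    """Grad-CAM for one video clip of shape (1, C, T, H, W).

    `target_layer` is assumed to be the last convolutional block of the
    video encoder (e.g. something like model.layer4 for an R(2+1)D-style backbone).
    """
    feats, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.update(v=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

    model.eval()
    logits = model(clip)                       # (1, n_classes)
    model.zero_grad()
    logits[0, class_idx].backward()            # gradient w.r.t. the disruptive class

    h1.remove(); h2.remove()
    fmap, grad = feats["v"], grads["v"]        # (1, C', T', H', W')
    weights = grad.mean(dim=(2, 3, 4), keepdim=True)   # channel-wise importance
    cam = F.relu((weights * fmap).sum(dim=1))          # (1, T', H', W')
    cam = cam / (cam.max() + 1e-8)             # normalize to [0, 1] for overlay
    return cam
```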

Enhancement of predicting disruptions using multimodal learning

Applying multimodal learning to the video vision network reduces false alarms and restores locality in the vision encoder. This means that the model capability gained from multimodal data can improve the low precision of the disruption predictors and provide robustness with respect to data noise. Furthermore, several factors that induce the various types of disruptions can be considered by using multimodal data including IVIS, 1D profiles, 0D parameters, and so on.

Environment

The code was developed using python 3.9 on Ubuntu 18.04

The GPU used : NVIDIA GeForce RTX 3090 24GB x 4

The resources for training the networks were provided by PLARE at Seoul National University

How to Run

Setting

  • Environment
    conda env create -f environment.yaml
    conda activate research-env

  • Video Dataset Generation : old version, inefficient memory usage and scalability

    ```
    # generate disruptive video data and normal video data from .avi
    python3 ./src/generate_video_data_fixed.py --fps 210 --duration 21 --distance 5 --save_path './dataset/'

    # train and test split with converting video as image sequences
    python3 ./src/preprocessing.py --test_ratio 0.2 --valid_ratio 0.2 --video_data_path './dataset/dur21_dis0' --save_path './dataset/dur21_dis0'
    ```

  • Video Dataset Generation : new version, more efficient than the old version

    ```
    # additional KSTAR shot log with frame information of the video data
    python3 ./src/generate_modified_shot_log.py

    # generate video dataset from extended KSTAR shot log : you don't need to split the train-test set for every distance
    python3 ./src/generate_video_data.py --fps 210 --raw_video_path "./dataset/raw_videos/raw_videos/" --df_shot_list_path "./dataset/KSTAR_Disruption_Shot_List_extend.csv" --save_path "./dataset/temp" --width 256 --height 256 --overwrite True
    ```

  • 0D Dataset Generation (Numerical dataset)

    ```
    # interpolate KSTAR data and convert as tabular dataframe
    python3 ./src/generate_numerical_data.py
    ```
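As a rough illustration of what this interpolation step does (the column names and time bases below are placeholders, not the repository's actual schema), 0D signals sampled on different time bases can be interpolated onto a common grid and merged into one tabular dataframe:

```python
import numpy as np
import pandas as pd

# Two hypothetical 0D signals with different sampling rates
t_ip = np.linspace(0, 10, 1001)                 # plasma current time base
t_ne = np.linspace(0, 10, 251)                  # electron density time base
ip = pd.Series(1.0 + 0.1 * np.sin(t_ip), index=t_ip, name="ip")
ne = pd.Series(3.0 + 0.2 * np.cos(t_ne), index=t_ne, name="ne")

# Interpolate both signals onto a common time grid and merge into one table
t_common = np.round(np.arange(0.0, 10.0, 0.01), 3)
df = pd.DataFrame({
    "time": t_common,
    "ip": np.interp(t_common, ip.index, ip.values),
    "ne": np.interp(t_common, ne.index, ne.values),
})
print(df.head())
```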

Test

  • Test code before model training : check for invalid data or issues in the model architecture

    ```
    # test all processes : data + model
    pytest test

    # test the data validity
    pytest test/test_data.py

    # test the model validity
    pytest test/test_model.py
    ```

Model training process

  • Models for video data
    python3 train_vision_nework.py --batch_size {batch size} --gpu_num {gpu num} --use_LDAM {bool : use LDAM loss} --model_type {model name} --tag {name of experiment / info} --use_DRW {bool : use Deferred re-weighting} --use_RS {bool : use re-sampling} --seq_len {int : input sequence length} --pred_len {int : prediction time} --image_size {int}

  • Models for 0D data
    python3 train_0D_nework.py --batch_size {batch size} --gpu_num {gpu num} --use_LDAM {bool : use LDAM loss} --model_type {model name} --tag {name of experiment / info} --use_DRW {bool : use Deferred re-weighting} --use_RS {bool : use re-sampling} --seq_len {int : input sequence length} --pred_len {int : prediction time}

  • Models for MultiModal (video + 0D data)
    python3 train_multi_modal.py --batch_size {batch size} --gpu_num {gpu num} --use_LDAM {bool : use LDAM loss} --use_GB {bool : use Gradient Blending} --tag {name of experiment / info} --use_DRW {bool : use Deferred re-weighting} --use_RS {bool : use re-sampling} --seq_len {int : input sequence length} --pred_len {int : prediction time} --tau {int : stride for input sequence}

Experiment

  • Experiment for each network (vision, 0D, multimodal) with different prediction times

    ```
    # R1Plus1D
    sh exp/exp_r1plus1d.sh

    # Slowfast
    sh exp/exp_slowfast.sh

    # ViViT
    sh exp/exp_vivit.sh

    # Transformer
    sh exp/exp_0D_transformer.sh

    # CnnLSTM
    sh exp/exp_0D_cnnlstm.sh

    # MLSTM-FCN
    sh exp/exp_0D_mlstm.sh

    # Multimodal model
    sh exp/exp_multi.sh

    # Multimodal model with Gradient Blending
    sh exp/exp_multi_gb.sh
    ```

  • Experiment with different learning algorithms and models

    ```
    # case : R2Plus1D
    sh exp/exp_la_r2plus1d.sh

    # case : SlowFast
    sh exp/exp_la_slowfast.sh

    # case : ViViT
    sh exp/exp_la_vivit.sh
    ```

  • Model performance visualization for continuous disruption prediction using gif
    python3 make_continuous_prediction.py

Detail

Model to use

  • Video encoder

    • R2Plus1D : https://github.com/irhum/R2Plus1D-PyTorch
    • Slowfast : https://github.com/facebookresearch/SlowFast
    • ViViT : https://github.com/rishikksh20/ViViT-pytorch
  • 0D data encoder

    • Transformer : paper(https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf), application code(https://www.kaggle.com/general/200913)
    • Conv1D-LSTM using self-attention : https://pseudo-lab.github.io/Tutorial-Book/chapters/time-series/Ch5-CNN-LSTM.html
    • MLSTM_FCN : paper(https://arxiv.org/abs/1801.04503), application code(https://github.com/titu1994/MLSTM-FCN)
  • Multimodal Model

    • Multimodal fusion model: video encoder + 0D data encoder (see the sketch after this list)
    • Tensor Fusion Network
    • Other methods (Future work)
      • Multimodal deep representation learning for video classification : https://link.springer.com/content/pdf/10.1007/s11280-018-0548-3.pdf?pdf=button
      • Truly Multi-modal YouTube-8M Video Classification with Video, Audio, and Text : https://static.googleusercontent.com/media/research.google.com/ko//youtube8m/workshop2017/c06.pdf
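A minimal sketch of the concatenation-style multimodal fusion model described above, with the encoders left abstract (the actual video and 0D encoders are the networks listed above; this only illustrates the wiring):

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Late fusion: concatenate video and 0D embeddings, then classify."""

    def __init__(self, video_encoder: nn.Module, zero_d_encoder: nn.Module,
                 video_dim: int, zero_d_dim: int, n_classes: int = 2):
        super().__init__()
        self.video_encoder = video_encoder      # maps (B, C, T, H, W) -> (B, video_dim)
        self.zero_d_encoder = zero_d_encoder    # maps (B, T, F) -> (B, zero_d_dim)
        self.classifier = nn.Sequential(
            nn.Linear(video_dim + zero_d_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, video: torch.Tensor, zero_d: torch.Tensor) -> torch.Tensor:
        # Concatenate the two modality embeddings and predict disruptive / normal
        z = torch.cat([self.video_encoder(video), self.zero_d_encoder(zero_d)], dim=-1)
        return self.classifier(z)
```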

Technique or algorithm to use

  • Solving the imbalanced classification issue

    • Re-Sampling : ImbalancedWeightedSampler, over-sampling for minor classes
    • Re-Weighting : define inverse class frequencies as weights applied with the loss function (CE, Focal Loss, LDAM Loss); a minimal focal-loss sketch is given after this list
    • LDAM with DRW : label-distribution-aware margin loss with deferred re-weighting scheduling
    • Multimodal Learning : Gradient Blending for avoiding the sub-optimal solutions that arise when jointly training multiple modalities
    • Multimodal Learning : CCA Learning for enhancement
  • Analysis on physical characteristics of disruptive video data

    • CAM : in progress
    • Grad CAM : paper(https://arxiv.org/abs/1610.02391), target model(R2Plus1D, SlowFast)
    • attention rollout : paper(https://arxiv.org/abs/2005.00928), target model(ViViT)
  • Data augmentation

    • Video Mixup algorithm for data augmentation (done, not effective)
    • Conventional image augmentation (flip, brightness, contrast, blur, shift)
  • Training Process enhancement

    • Multigrid training algorithm : Fast training for SlowFast
    • Deep CCA : deep canonical correlation analysis to train the multi-modal representation
  • Generalization and Robustness

    • Add noise to the image sequences and 0D data for robustness
    • Multimodality can also provide robustness against data noise
    • Gradient Blending for avoiding sub-optimal states in multi-modal learning
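As a sketch of the re-weighting and focal-loss ideas referenced in the list above (a standard formulation, not necessarily identical to this repository's implementation):

```python
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Focal loss with optional per-class weights (alpha) for imbalanced data."""

    def __init__(self, gamma: float = 2.0, alpha: Optional[torch.Tensor] = None):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha          # e.g. inverse class frequencies for re-weighting

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        log_p = F.log_softmax(logits, dim=-1)
        # (optionally class-weighted) cross entropy per sample
        ce = F.nll_loss(log_p, target, weight=self.alpha, reduction="none")
        # probability assigned to the true class
        p_t = log_p.gather(1, target.unsqueeze(1)).squeeze(1).exp()
        # down-weight easy examples by (1 - p_t)^gamma
        return ((1.0 - p_t) ** self.gamma * ce).mean()
```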

Additional Task

  • Multi-GPU distributed Learning : done
  • Database construction : Tabular dataset (IKSTAR) + Video dataset, done
  • ML Pipeline : Tensorboard, done

Dataset

  • Disruption : disruptive state at t = tipminf (current-quench)
  • Borderline : inter-plane region (not used)
  • Normal : non-disruptive state

📖 Citation

If you use this repository in your research, please cite the following:

📜 Research Article

Disruption prediction and analysis through multimodal deep learning in KSTAR
Kim, Jinsu, et al. "Disruption prediction and analysis through multimodal deep learning in KSTAR." Fusion Engineering and Design 200 (2024): 114204.

📌 Code Repository

Jinsu Kim (2024). Disruption-Prediciton-based-on-Multimodal-Deep-Learning. GitHub.
https://github.com/ZINZINBIN/Disruption-Prediciton-based-on-Multimodal-Deep-Learning

📚 BibTeX:

```bibtex
@software{Kim_Deep_Multimodal_Learning_2024,
  author  = {Kim, Jinsu},
  doi     = {https://doi.org/10.1016/j.fusengdes.2024.114204},
  license = {MIT},
  month   = feb,
  title   = {{Deep Multimodal Learning based KSTAR Disruption Prediction Model}},
  url     = {https://github.com/ZINZINBIN/Disruption-Prediciton-based-on-Multimodal-Deep-Learning},
  version = {1.0.0},
  year    = {2024}
}
```

Owner

  • Name: KIM JINSU
  • Login: ZINZINBIN
  • Kind: user
  • Location: Seoul, Republic of Korea
  • Company: Seoul National University

BS : Nuclear Engineering / Physics

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite our article and repository."
authors:
  - family-names: "Kim"
    given-names: "Jinsu"
    orcid: "https://orcid.org/0009-0000-2610-4551"
title: "Deep Multimodal Learning based KSTAR Disruption Prediction Model"
version: "1.0.0"
doi: "https://doi.org/10.1016/j.fusengdes.2024.114204"  # Replace with your DOI
url: "https://github.com/ZINZINBIN/Disruption-Prediciton-based-on-Multimodal-Deep-Learning"
date-released: "2024-02-02"
license: "MIT"

GitHub Events

Total
  • Watch event: 2
  • Push event: 2
Last Year
  • Watch event: 2
  • Push event: 2