anomalies_solar_orbiter
https://github.com/imperialcollegelondon/anomalies_solar_orbiter
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: ImperialCollegeLondon
- License: mit
- Language: HTML
- Default Branch: main
- Size: 1.89 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
Readme.md
ESA's Solar Orbiter

This is a project to analyze data from ESA's Solar Orbiter for a period of 2 years, from 1st of January 2022 to 1st of January 2024.
All the data needed has already been provided in the Data folder as Solar_Orbiter.csv.
Primary Tasks
There are two primary tasks which I want to perform:
1. Build a dashboard of all the instruments' behavior over time.
2. Detect anomalies in the data to understand on which dates the spacecraft was doing something interesting.
Setup Instructions
To get started, ensure you have Python 3 installed on your system. You can download the latest stable version from the official Python download page.
Environment Setup
Build a virtual environment (optional). This will help keep your project isolated from other Python modules on your device.
- Make a virtual environment by typing this in your console:
python3 -m venv my_project
- Activate it:
source my_project/bin/activate
- Change my_project to whatever name you want to give it.
Install all the dependencies of the project by typing this in your console:
pip install -r Setup_File/Requirements.txt
To quickly run everything and see the main results, run these in the terminal:
- python3 Run_Ml_Models.py, after changing the working directory with cd Python_Scripts
- python3 Dashboard.py, from the same Python_Scripts directory (confirm it using pwd)
Understanding the Data
Open the Data Folder. It has Solar_Orbiter.csv. This file contains, per day, mean values of:
- Radial Distance from Sun (AU)
- Electronic Box Temperature (DegC)
- Out Board Sensor Temperature (DegC)
- In Board Sensor Temperature (DegC)
- Search Coil Magnetometers Temperature (DegC)
- Solar Array Angle (Deg)
- High Gain Antenna azimuth (Deg)
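As a minimal sketch of reading a file shaped like this one, the snippet below parses daily rows with Python's csv module. The column names and sample values are invented for illustration and are not taken from Solar_Orbiter.csv, so check the real header before reusing them.

```python
import csv
import io
from statistics import mean

# Hypothetical sample in the same spirit as Solar_Orbiter.csv: one row per day.
# Column names and values are illustrative assumptions, not the real file's.
SAMPLE_CSV = """Date,Radial_Distance_AU,Solar_Array_Angle_Deg
2022-01-01,0.95,10.0
2022-01-02,0.94,10.5
2022-01-03,0.93,11.0
"""

def load_daily_means(handle):
    """Read daily rows and convert every column except Date to float."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"Date": record["Date"]}
        for name, value in record.items():
            if name != "Date":
                row[name] = float(value)
        rows.append(row)
    return rows

rows = load_daily_means(io.StringIO(SAMPLE_CSV))
average_distance = mean(row["Radial_Distance_AU"] for row in rows)
print(len(rows), round(average_distance, 2))
```

To try it on the real data, pass an open file handle for Solar_Orbiter.csv from the Data folder instead of the in-memory sample, and adjust the column names to match.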
Detecting Anomalies
TO KEEP THIS BRIEF(ER), I HAVE PROVIDED MORE DETAILED EXPLANATIONS IN CODE COMMENTS
- Open the Python_Scripts folder.
- Here, you will find a file called Run_Ml_Models.py.
- Running this in the terminal from the Python_Scripts directory will detect all the anomalies within the dataset using the Isolation Forest model. The output will be stored as Solar_Orbiter_With_Anomalies.csv in the Data folder.
The Isolation Forest algorithm is an unsupervised learning algorithm for anomaly detection that works by:
- Randomly selecting a feature and a split value between the maximum and minimum values of that feature.
- Repeating this process recursively to create a tree-like structure.
- Isolating anomalies with a shorter path length, i.e., fewer splits.
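The three steps above can be sketched in plain Python. This is a simplified illustration, not the code in Run_Ml_Models.py and not scikit-learn's implementation: it skips subsampling and the path-length normalisation that a real Isolation Forest uses, and the data, function names, and parameters are all made up for the example.

```python
import random

def path_length(point, data, rng, depth=0, max_depth=12):
    """Number of random splits needed to isolate `point` from `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    feature = rng.randrange(len(point))        # pick a random feature
    values = [row[feature] for row in data]
    low, high = min(values), max(values)
    if low == high:                            # cannot split any further
        return depth
    split = rng.uniform(low, high)             # random split between min and max
    # Keep only the rows on the same side of the split as `point`.
    if point[feature] < split:
        same_side = [row for row in data if row[feature] < split]
    else:
        same_side = [row for row in data if row[feature] >= split]
    return path_length(point, same_side, rng, depth + 1, max_depth)

def mean_path_length(point, data, n_trees=100, seed=0):
    """Average the path length over many random trees."""
    rng = random.Random(seed)
    return sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees

# Invented data: a tight cluster of 50 points plus one far-away point.
data_rng = random.Random(42)
inliers = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)]
outlier = (10.0, 10.0)
data = inliers + [outlier]

outlier_depth = mean_path_length(outlier, data)
inlier_depth = sum(mean_path_length(p, data) for p in inliers) / len(inliers)
print(f"outlier: {outlier_depth:.2f} splits, typical inlier: {inlier_depth:.2f} splits")
```

The far-away point needs fewer splits on average than the clustered points, which is the property the anomaly score is built on.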

References: - Original Paper - Scikit-learn Documentation
Understanding Anomalies with SHAP
- SHAP values are used to explain the decisions of the Isolation Forest model.
- SHAP (SHapley Additive exPlanations) values derive from game theory and provide insights into the contribution of each feature to a specific prediction made by the model.
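- The Shapley value of feature $i$, for a model $f$ over the feature set $F$, is given by the standard formula

$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!}\,\bigl[f(S \cup \{i\}) - f(S)\bigr]$$

- As can be seen, we find the marginal contribution of feature $i$ to each subset $S$ of the remaining features, and weight it by $|S|!\,(|F| - |S| - 1)!/|F|!$, the fraction of the $|F|!$ feature orderings in which exactly the features in $S$ come before $i$.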
- These will be calculated upon running the Run_Ml_Models.py file from the Python_Scripts directory.
- The functions that support these calculations are put in a separate file called Helpers.py.
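To make the Shapley idea concrete, here is a small sketch that computes exact Shapley values by enumerating every subset of the other features and averaging the marginal contributions with the usual factorial weights. It is not the code in Helpers.py, and the SHAP library does not work this way for real models (full enumeration is exponential in the number of features). The value function and feature names are invented for the example.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(features)
    result = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            # Weight of a subset of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = frozenset(subset)
                marginal = value(members | {feature}) - value(members)
                total += weight * marginal
        result[feature] = total
    return result

# Hypothetical value function: additive contributions plus one interaction.
BASE = {"distance": 2.0, "temperature": 1.0, "angle": 0.5}

def toy_value(members):
    score = sum(BASE[name] for name in members)
    if "distance" in members and "angle" in members:
        score += 1.0   # interaction bonus, shared equally by the two features
    return score

phi = shapley_values(list(BASE), toy_value)
print(phi)
```

The values sum to the difference between the value of the full feature set and the empty set, and the interaction bonus is split evenly between the two features that cause it.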
- Visualization:
- A visualization of the mean absolute SHAP value per feature, which gives feature importance, is created and stored in the Python_Scripts/Explainability folder, based on Section 9.6.5 of the Interpretable ML Book.
- It shows that the temperature of the Outboard sensor has the largest mean absolute SHAP value, i.e., it causes the largest change in the model's output when predicting anomalies.
References: - Interpretable ML Book - SHAP (Section 9.6.5 SHAP Feature Importance, Section 9.6 SHAP) - PyData Conference on SHAP - Tel Aviv - SHAP Documentation
- Output:
- We will have Solar_Orbiter_With_Anomalies.csv saved within the Data folder (this contains the original dataset with anomaly scores, explained in the code).
- We will have Shap_Values_Plot.html saved in Python_Scripts/Explainability, containing the visualisation of feature importance.
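The feature-importance plot described above ranks features by the mean absolute SHAP value across all rows. A minimal sketch of that aggregation follows. The SHAP matrix and feature names are invented for illustration and are not the project's results.

```python
def mean_abs_importance(shap_rows, feature_names):
    """Rank features by mean(|SHAP value|) across all rows, largest first."""
    n_rows = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for index, value in enumerate(row):
            totals[index] += abs(value)
    importance = {name: total / n_rows for name, total in zip(feature_names, totals)}
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)

# Hypothetical SHAP values: one row per day, one column per feature.
feature_names = ["outboard_temp", "radial_distance", "array_angle"]
shap_rows = [
    [-0.4, 0.1, 0.0],
    [0.6, -0.2, 0.1],
    [-0.5, 0.3, -0.1],
]

ranking = mean_abs_importance(shap_rows, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking the absolute value first matters: a feature that pushes the score strongly up on some days and strongly down on others would otherwise average out to near zero.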
Dashboarding the Data
- Local run
- The dashboard can be run on a local server on your own system by running the Dashboard.py file within the Python_Scripts folder.
- The dashboard consists of 4 key visualisations:
- Time Series Chart: This shows how each of the features varies over time and gives us insights about how the data looks overall.
- Correlation Heatmap: This calculates the correlation coefficient between each pair of features and displays it in the form of a heatmap. Interestingly, Solar Array Angle is highly correlated with Radial Distance from the Sun. This is because the solar arrays change their angle to point in the direction of the Sun.
- Anomaly Score Chart: This is used to find the anomalous dates for the spacecraft. The lower the score, the more anomalous the date. Interestingly, 4 May to 11 May are identified as anomalous dates by the model. This is consistent with the fact that the spacecraft was having a high-noise period around that time ( https://www.cosmos.esa.int/web/soar/support-data )
- Feature Importance Plot: This is simply embedded from the Explainability folder using the Dash HTML component Iframe.
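For reference, the numbers behind a correlation heatmap like the one above are pairwise Pearson coefficients. The sketch below computes them in plain Python. The column names and values are invented for illustration, and the dashboard itself may well compute them differently (for example with a dataframe library).

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Nested dict of pairwise correlations, one entry per pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical daily means, chosen only to show the shape of the result.
columns = {
    "distance": [0.3, 0.5, 0.7, 0.9],
    "angle": [10.0, 20.0, 30.0, 40.0],
    "temperature": [35.0, 20.0, 22.0, 5.0],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

A coefficient near 1 or -1 marks a strong linear relationship. In this made-up sample the angle is an exact linear function of the distance, so that pair comes out at 1.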
- Deploying using Render
I have deployed the dashboard on the web using Render, largely by following a tutorial.
Please follow this tutorial to do the same: https://www.youtube.com/watch?v=XWJBJoV5yww&t=0s
For this, you will find the entire dashboard, named app, and everything it needs within the src folder in the Deploy_With_Render folder.
1. Copy the Deploy_With_Render directory and open it in a separate project to avoid nested git repositories.
2. Ensure you have dash-tools installed. It is listed in requirements.txt, so it should already be installed; otherwise run pip install dash-tools.
3. Type dashtools gui in the terminal.
4. Go to the Deploy section on the newly opened page.
5. Open your project there by putting the path of your folder in the text box.
6. Follow the remaining instructions in the tutorial and you will be able to deploy it, just like this: https://my-render-jh3k.onrender.com/
Scalability
- Memory Profiling:
- We use the memory_profiler library to do memory profiling.
- The results are stored in Scalability/Memory_Profiling.
- From the results, it can be seen that within the dashboard every profiled line sits at about 120 MiB of memory, and the callback requires about 120 MiB each time it runs.
- Also, within Run_Ml_Models.py, calculating SHAP values and fitting the model are the most memory-intensive tasks.
- Interestingly, as seen in Scalability/Plot_ML_Model, memory usage grows and then declines for Run_Ml_Models, but there is no decline for the Dashboard.
- You can reproduce these results by reading the comments in the Dashboard.py file. You will simply need to uncomment 2 lines.
Reference: https://pypi.org/project/memory-profiler/
Reference: https://github.com/pythonprofilers/memory_profiler
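If you want a quick look at memory use without installing anything, Python's built-in tracemalloc module gives a similar view. Note that this is a different tool from the memory_profiler library used in this project, so the numbers will not match the stored results. The workload below is a made-up stand-in, not code from Dashboard.py or Run_Ml_Models.py.

```python
import tracemalloc

def build_table(n_rows):
    """Stand-in workload: allocate a list of rows, like loading a dataset."""
    return [[float(i), float(i) * 2.0] for i in range(n_rows)]

tracemalloc.start()
table = build_table(50_000)
current, peak = tracemalloc.get_traced_memory()   # both values are in bytes
# Largest allocation sites, grouped by source line.
top = tracemalloc.take_snapshot().statistics("lineno")[:3]
tracemalloc.stop()

print(f"current={current / 2**20:.1f} MiB, peak={peak / 2**20:.1f} MiB")
for stat in top:
    print(stat)
```

tracemalloc only counts memory allocated by Python objects while tracing is on, so it reports less than a process-level tool would.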
- Time Profiling:
- We use the cProfile package for time profiling.
- The results are stored in Scalability/Time_Profiling.
- To reproduce, simply follow the instructions at the bottom of the code for Dashboard.py and Run_Ml_Models.py. You can interpret the results using snakeviz, as mentioned there.
- For Dashboard.py, it shows the time required to load the dashboard completely, along with a breakdown of the time required by different components.
- For Run_Ml_Models.py, it shows the time required to run the model and get the Shapley values with the visualisation, also with a breakdown.
Reference: https://docs.python.org/3/library/profile.html
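As a rough sketch of what time profiling with cProfile looks like, the snippet below profiles a small stand-in function and prints the most expensive calls. The function is invented for the example; the actual profiling instructions are the ones at the bottom of Dashboard.py and Run_Ml_Models.py.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """Stand-in workload: a deliberately slow loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(200_000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
report = buffer.getvalue()
print(report)
```

To inspect a run in snakeviz instead, you could save it with profiler.dump_stats("profile.prof") and then open that file with snakeviz; the filename here is only an example.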
Access the Dashboard at link:
Deployed dashboard link: Dashboard ("The server is on a free tier, so it needs some time to restart and reload; I will buy a paid server for better deployment in the next version")
Security and License
Please read the License to ethically and safely reproduce the repository. Please read the Security policy to report any security issues. Please report any issues in the Issues section and I will try to fix them soon.
Owner
- Name: Imperial College London
- Login: ImperialCollegeLondon
- Kind: organization
- Email: icgithub-support@imperial.ac.uk
- Location: Imperial College London
- Repositories: 311
- Profile: https://github.com/ImperialCollegeLondon
Imperial College main code repository
Citation (Citation.cff)
cff-version: 1.0.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Jain"
    given-names: "Rishabh"
    orcid: "https://orcid.org/0009-0007-5652-0602"
title: "Solar_Orbiter_Anomalies"
version: 1.0.7
date-released: 2024-05-10
url: "https://github.com/Rishie123/Solar_Orbiter_Anomalies"
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0