Research



Most Recent Research


Exploratory Analysis of Magnetic Polarity Inversion Line Metadata and Eruptive Characteristics of Solar Active Regions

Berkay Aydin, Anli Ji, Nigar Khasayeva, Rafal A. Angryk, Petrus Martens, Manolis K. Georgoulis

We have built a systematic and comprehensive dataset of polarity inversion lines (PILs) from line-of-sight magnetograms in the HMI Active Region Patches (HARPs) data series, covering May 2010 to December 2021. This dataset includes PIL-related binary masks as rasters (i.e., the thinned PIL, the region of polarity inversion (RoPI), and the convex hull of the PIL) and time series metadata extracted from these masks. We primarily use the definitive series (12-minute cadence) mapped to the Lambert Cylindrical Equal-Area (CEA) projection, in which each pixel covers roughly the same physical area on the Sun. The PIL detection procedure is based on an edge detection technique combined with magnetic field strength and PIL size filters. First, we identify positive and negative polarity regions using a magnetic field strength threshold. Then, we apply the Canny edge detector and morphological operations to both the positive and negative regions to identify coarse PILs. Finally, we obtain the PILs by applying the magnetic field strength and PIL size filters to these coarse PILs. We envision that this comprehensive PIL dataset will benefit space weather analytics research, specifically in understanding PIL structure and evolution, and will complement existing datasets used for space weather forecasting.
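The three detection steps above can be sketched on a small NumPy array of line-of-sight field values. This is a simplified illustration under stated assumptions, not the pipeline used to build the dataset: in place of the Canny edge detector it marks coarse PIL pixels where dilated positive and negative masks overlap, and the threshold (`strength_threshold`, in gauss) and minimum size (`min_size`, in pixels) are hypothetical values. The function names `dilate`, `connected_components`, and `detect_pil` are illustrative.

```python
import numpy as np
from collections import deque


def dilate(mask: np.ndarray) -> np.ndarray:
    """3x3 binary dilation implemented with array shifts (no SciPy needed)."""
    padded = np.pad(mask, 1, constant_values=False)
    out = np.zeros_like(mask)
    h, w = mask.shape
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def connected_components(mask: np.ndarray):
    """Label 8-connected components of a boolean mask with breadth-first search."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    h, w = mask.shape
    for y, x in zip(*np.nonzero(mask)):
        if labels[y, x]:
            continue
        current += 1
        labels[y, x] = current
        queue = deque([(y, x)])
        while queue:
            cy, cx = queue.popleft()
            for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                    if mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = current
                        queue.append((ny, nx))
    return labels, current


def detect_pil(bz: np.ndarray, strength_threshold: float = 50.0, min_size: int = 5):
    """Hypothetical PIL sketch: strength threshold, boundary overlap, size filter."""
    pos = bz > strength_threshold           # step 1: strong positive polarity
    neg = bz < -strength_threshold          # step 1: strong negative polarity
    coarse = dilate(pos) & dilate(neg)      # step 2: stand-in for Canny + morphology
    labels, n = connected_components(coarse)
    pil = np.zeros_like(coarse)
    for label in range(1, n + 1):           # step 3: keep only large enough PILs
        component = labels == label
        if component.sum() >= min_size:
            pil |= component
    return pil
```

On a synthetic bipolar patch, the coarse PIL is the two-pixel-wide strip where the dilated polarity masks meet; the size filter drops any connected component smaller than `min_size`, and a thinning step (omitted here) would reduce the result to a one-pixel thinned PIL.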

Available here

Solar Flare Forecasting with Deep Learning-based Time Series Classifiers

Anli Ji, Junzhi Wen, Rafal Angryk, Berkay Aydin

Over the past two decades, machine learning and deep learning techniques for forecasting solar flares have had a great impact due to their ability to learn from high-dimensional data spaces. However, the lack of high-quality data on flaring phenomena is a constraining factor for such tasks. One way to tackle this complex problem is to train classifiers on multivariate time series of magnetic field parameters.

In this work, we compare popular deep learning-based multivariate time series classifiers with commonly used machine learning classifiers, such as SVM.

We explore the role of data augmentation in time series-oriented flare prediction techniques, specifically the deep learning-based ones. We use four time series data augmentation techniques and couple them with selected multivariate time series classifiers to understand how each of them affects the outcome. In the end, we show that both the deep learning algorithms and the augmentation techniques improve our classifiers’ performance. After augmentation, the resulting classifiers outperformed the traditional flare forecasting techniques.
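The four augmentation techniques are not named here, so the following is only a sketch of two common time series augmentations, jittering (additive Gaussian noise) and scaling (random per-parameter multipliers), applied to minority-class instances shaped `(time_steps, parameters)`. The function names and the `sigma` values are hypothetical and are not claimed to match the study's choices.

```python
import numpy as np


def jitter(series, sigma=0.03, rng=None):
    """Add i.i.d. Gaussian noise to every time step of every parameter."""
    rng = np.random.default_rng() if rng is None else rng
    return series + rng.normal(0.0, sigma, size=series.shape)


def scale(series, sigma=0.1, rng=None):
    """Multiply each parameter (column) by one random factor drawn around 1."""
    rng = np.random.default_rng() if rng is None else rng
    factors = rng.normal(1.0, sigma, size=(1, series.shape[1]))
    return series * factors


def augment_minority(X, y, minority_label=1, copies=2, seed=0):
    """Append `copies` jittered-and-scaled versions of every minority instance.

    X has shape (n_instances, time_steps, parameters); y has shape (n_instances,).
    """
    rng = np.random.default_rng(seed)
    new_X, new_y = [X], [y]
    minority = X[y == minority_label]
    for _ in range(copies):
        augmented = [scale(jitter(s, rng=rng), rng=rng) for s in minority]
        new_X.append(np.stack(augmented))
        new_y.append(np.full(len(minority), minority_label))
    return np.concatenate(new_X), np.concatenate(new_y)
```

Augmenting only the minority class doubles as oversampling; applying the same transforms to all classes would instead act as regularization without changing the class ratio.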

Available at TBD

A Modular Approach to Building Solar Energetic Particle Event Forecasting Systems

Anli Ji, Akhil Arya, Dustin Kempton, Rafal Angryk, Manolis K. Georgoulis, Berkay Aydin

Unlike common predictions that focus on the occurrence of an event, an All-Clear forecast places more emphasis on predicting the absence of an event. Such forecasts, while usually not addressed directly, can be crucial in operational environments. We have developed an All-Clear SEP event prediction system utilizing active region-based prediction methods together with active region scenarios (i.e., location and complexity). Within our All-Clear forecast system, signals are generated only on request, as binary predictions of YES or NO indicating “All Clear” or “Not All Clear”, respectively. These signals refer to the possibility that any event occurs within the next prediction window, which in our case is the next 24 hours.
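A minimal sketch of how per-region probabilities could be turned into a YES/NO All-Clear signal, assuming regions produce events independently so that the chance of at least one event is 1 - ∏(1 - p_i). The threshold of 0.1 and the function names are hypothetical choices for illustration, not the system's operational settings.

```python
from math import prod


def full_disk_probability(ar_probs):
    """Chance that at least one region produces an event, assuming independent regions."""
    return 1.0 - prod(1.0 - p for p in ar_probs)


def all_clear_signal(ar_probs, threshold=0.1):
    """Return 'YES' (All Clear) or 'NO' (Not All Clear) for the next prediction window."""
    return "YES" if full_disk_probability(ar_probs) < threshold else "NO"
```

With no active regions on the disk the product is empty, the aggregated probability is 0, and the sketch issues an All Clear.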

Available here

Figure. An illustration of the prediction workflow for a hypothetical set of active regions and multivariate time series parameters (MVTS), derived from their NRT Magnetogram Patches.

Four space weather event forecasting modules are established, corresponding to flare prediction (FP), eruptive flare prediction (ERP), CME speed prediction, and the full-disk aggregation methodology. All of them are implemented as loosely coupled microservices that do not communicate directly with one another. Our system design follows a modular approach for flexibility, maintainability, and extensibility, and can be configured to use various data access mechanisms outside the confines of our system, such as file storage or database systems.
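The loose coupling described above can be illustrated with a small Python sketch in which each module depends only on a storage interface, so the backend (files or a database) can be swapped by configuration. `PredictionStore`, `JsonFileStore`, `InMemoryStore`, and `run_module` are hypothetical names; the real system's services and interfaces are not described here.

```python
import json
from pathlib import Path
from typing import Callable, Protocol


class PredictionStore(Protocol):
    """Hypothetical storage interface that every module depends on."""

    def save(self, key: str, record: dict) -> None: ...

    def load(self, key: str) -> dict: ...


class InMemoryStore:
    """Stand-in for a database-backed store (handy for tests)."""

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}

    def save(self, key: str, record: dict) -> None:
        self._records[key] = dict(record)

    def load(self, key: str) -> dict:
        return dict(self._records[key])


class JsonFileStore:
    """File-storage backend: one JSON file per key under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, key: str, record: dict) -> None:
        (self.root / f"{key}.json").write_text(json.dumps(record))

    def load(self, key: str) -> dict:
        return json.loads((self.root / f"{key}.json").read_text())


def run_module(name: str, predict: Callable[[], dict], store: PredictionStore) -> dict:
    """Run one forecasting module and persist its output; modules never call each other."""
    result = predict()
    store.save(name, result)
    return result
```

Because an aggregation module would read other modules' outputs only through the store, replacing `JsonFileStore` with a database-backed class leaves the forecasting code untouched.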

Towards Coupling Full-disk and Active Region-based Flare Prediction for Operational Space Weather Forecasting

Chetraj Pandey, Anli Ji, Rafal A. Angryk, Manolis Georgoulis, Berkay Aydin

We present a set of new heuristic approaches to train and deploy an operational solar flare prediction system for ≥M1.0-class flares with two prediction modes: full-disk and active region-based. In full-disk mode, predictions are performed on full-disk line-of-sight magnetograms using deep learning models, whereas in active region-based mode, predictions are issued for each active region individually using multivariate time series data instances. The outputs of the individual active region forecasts and the full-disk predictor are combined into a final full-disk prediction by a meta-model. We used an equally weighted average of the two base learners’ flare probabilities as our baseline meta-learner, and improved on both base learners by training a logistic regression model as the meta-learner.
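To make the meta-learning step concrete, the sketch below compares the equal-weight average baseline with a logistic regression meta-learner that takes the two base learners’ probabilities as inputs. It fits the logistic regression with plain batch gradient descent in NumPy rather than a library solver; the learning rate, epoch count, and function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def average_meta(p_fd, p_ar):
    """Baseline meta-learner: equal-weight average of the two base probabilities."""
    return 0.5 * p_fd + 0.5 * p_ar


def _design(p_fd, p_ar):
    """Stack an intercept column with the full-disk and AR-based probabilities."""
    return np.column_stack([np.ones_like(p_fd), p_fd, p_ar])


def fit_logistic_meta(p_fd, p_ar, y, lr=0.5, epochs=2000):
    """Fit weights [intercept, w_fd, w_ar] by batch gradient descent on mean log-loss."""
    X = _design(p_fd, p_ar)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w


def predict_logistic_meta(w, p_fd, p_ar):
    """Full-disk flare probability from the fitted meta-learner."""
    return sigmoid(_design(p_fd, p_ar) @ w)
```

Because the meta-learner learns one weight per base learner plus an intercept, it can down-weight whichever base model is less informative, which an equal-weight average cannot do.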

Available here

The major findings of this study are: (i) we successfully coupled two heterogeneous flare prediction models, trained with different datasets and model architectures, to predict a full-disk flare probability for the next 24 hours; (ii) our proposed ensembling model, i.e., logistic regression, improves on the predictive performance of the two base learners and the baseline meta-learner, as measured by two widely used metrics, the True Skill Statistic (TSS) and the Heidke Skill Score (HSS); and (iii) our analysis suggests that the logistic regression-based ensemble improves on the full-disk model (base learner) by ∼9% in terms of TSS and ∼10% in terms of HSS. Similarly, it improves on the AR-based model (base learner) by ∼17% and ∼20% in terms of TSS and HSS, respectively. Finally, compared to the baseline meta-model, it improves TSS by ∼10% and HSS by ∼15%.
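TSS and HSS are both computed from a binary confusion matrix. The sketch below uses the standard definitions, TSS = TP/(TP + FN) - FP/(FP + TN) and HSS = 2(TP·TN - FN·FP) / [(TP + FN)(FN + TN) + (TP + FP)(FP + TN)]; the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) for binary labels where 1 means a flare."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def skill_scores(tp, fn, fp, tn):
    """True Skill Statistic and Heidke Skill Score from confusion-matrix counts."""
    tss = tp / (tp + fn) - fp / (fp + tn)
    hss = 2 * (tp * tn - fn * fp) / ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn))
    return tss, hss
```

TSS ranges from -1 to 1 and does not depend on the class ratio, whereas HSS measures improvement over random chance and does depend on it, which is one reason the two are commonly reported together.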

Figure. A timeline diagram to present the problem formulation of our deep learning-based full-disk flare prediction model using bi-daily observations of full-disk line-of-sight magnetograms and prediction window of 24 hours considered to label the magnetogram instances.



Deep Neural Networks based Solar Flare Prediction using Compressed Full-disk Line-of-sight Magnetograms

Chetraj Pandey; Rafal A. Angryk; Berkay Aydin

We selected three prediction modes: two binary modes for predicting the occurrence of ≥M1.0 and ≥C4.0 class flares, and one multi-class mode for predicting the occurrence of <C4.0, [≥C4.0, <M1.0], and ≥M1.0 class flares within the next 24 hours. We perform our experiments in all three modes using three well-known pre-trained CNN models: AlexNet, VGG16, and ResNet34. For this, we collected compressed 8-bit images derived from full-disk line-of-sight magnetograms provided by the Helioseismic and Magnetic Imager (HMI) instrument onboard the Solar Dynamics Observatory (SDO). We trained our models using data-augmented oversampling to address the existing class-imbalance issue, followed a time-segmented cross-validation strategy to assess the predictive performance of our models, and used the true skill statistic (TSS) and Heidke skill score (HSS) as evaluation metrics.
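A sketch of how labels for the three prediction modes could be derived from a flare list, assuming each magnetogram instance is labeled by the strongest flare peaking within the 24 hours after its observation time. GOES classes are converted to peak 1-8 Å flux (A = 10⁻⁸, B = 10⁻⁷, C = 10⁻⁶, M = 10⁻⁵, X = 10⁻⁴ W/m²); the half-open window, the label strings, and the function names are assumptions.

```python
from datetime import datetime, timedelta

# Peak 1-8 Å X-ray flux (W/m^2) corresponding to each GOES class letter.
CLASS_FLUX = {"A": 1e-8, "B": 1e-7, "C": 1e-6, "M": 1e-5, "X": 1e-4}


def peak_flux(goes_class: str) -> float:
    """Convert a GOES class string such as 'M1.5' to peak flux in W/m^2."""
    return CLASS_FLUX[goes_class[0].upper()] * float(goes_class[1:])


def label_instance(obs_time: datetime, flares, mode: str = "M1.0",
                   window: timedelta = timedelta(hours=24)):
    """Label one observation from the strongest flare peaking in (obs_time, obs_time + window].

    flares: iterable of (peak_time, goes_class) pairs.
    mode: a binary threshold such as "M1.0" or "C4.0", or "multi" for three classes.
    """
    fluxes = [peak_flux(c) for t, c in flares if obs_time < t <= obs_time + window]
    strongest = max(fluxes, default=0.0)
    if mode == "multi":
        if strongest >= peak_flux("M1.0"):
            return ">=M1.0"
        if strongest >= peak_flux("C4.0"):
            return "C4.0-M1.0"
        return "<C4.0"
    return int(strongest >= peak_flux(mode))
```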

Available here

Figure. An overview of three deep learning architectures we use (a) AlexNet-, (b) VGG16-, (c) ResNet34-based models for both the binary and multi-class flare prediction. Models produce a set of probabilities determined based on the prediction mode.

The major results of this study are: (1) we successfully implemented an efficient and effective full-disk flare predictor for operational forecasting using compressed images of solar magnetograms; and (2) our candidate model for multi-class flare prediction achieves an average TSS of 0.36 and an average HSS of 0.31. Similarly, for binary prediction, in (i) ≥C4.0 mode we achieve an average TSS of 0.47 and HSS of 0.46, and in (ii) ≥M1.0 mode we achieve an average TSS of 0.55 and HSS of 0.43.

Solar Flare Forecasting with Deep Neural Networks using Compressed Full-disk HMI Magnetograms

Chetraj Pandey; Rafal A. Angryk; Berkay Aydin

We present a solution to full-disk flare prediction using compressed magnetogram images, in which we trained a set of CNNs to produce operations-ready flare forecasts. We selected two binary prediction modes, for predicting the occurrence of ≥M1.0 and ≥C1.0 class flares within the next 24 hours.

For this, we used the pre-trained AlexNet model and collected compressed images derived from solar magnetograms provided by the Helioseismic and Magnetic Imager (HMI) instrument onboard Solar Dynamics Observatory (SDO).

Figure. Architecture of our AlexNet-based flare prediction model.

We followed two time-segmented cross-validation strategies, chronological and non-chronological, to assess the predictive skill of our models. We also trained our models using data augmentation and oversampling to address the existing class imbalance, and used the true skill statistic (TSS) and Heidke skill score (HSS) as evaluation metrics. Our experimental evaluation suggests that training a flare prediction model is heavily influenced by the sampling strategy, owing to the imbalanced nature of the datasets, and that predicting ≥M1.0 class flares is more challenging than predicting ≥C1.0 class flares.
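The two cross-validation strategies can be sketched as index splits over observation timestamps. Here, as an assumption, "chronological" means contiguous folds in time order, while "non-chronological" groups instances by calendar quarter (January to March, April to June, and so on) regardless of year; the fold definitions and function names are hypothetical, not necessarily the study's exact partitions.

```python
from datetime import datetime


def chronological_folds(timestamps: list[datetime], k: int = 4) -> list[list[int]]:
    """Split instance indices into k contiguous blocks in time order."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    size = -(-len(order) // k)  # ceiling division so no instance is dropped
    return [order[j * size:(j + 1) * size] for j in range(k)]


def tri_monthly_folds(timestamps: list[datetime]) -> list[list[int]]:
    """Group indices by calendar quarter, mixing years within each fold."""
    folds: list[list[int]] = [[] for _ in range(4)]
    for i, t in enumerate(timestamps):
        folds[(t.month - 1) // 3].append(i)
    return folds


def cv_splits(folds):
    """Yield (train, test) index lists, holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Either way, every instance falls in exactly one test fold, so oversampling or augmentation should be applied only to the training indices of each split.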

Available here




Multiscale IoU: A Metric for Evaluation of Salient Object Detection with Fine Structures

A. Ahmadzadeh, D. J. Kempton, Y. Chen, R. A. Angryk


Available at IEEE and arXiv





A Framework for Local Outlier Detection from Spatio-Temporal Trajectory Datasets

X. Cai, B. Aydin, A. Ji, R. Angryk

We develop an interpretable, clustering-based technique to detect local outliers in multi-type trajectory datasets by utilizing the spatial and temporal attributes of moving objects. This local outlier detection involves three phases. First, we apply a temporal partition to divide each raw trajectory into multiple trajectory segments and extract trajectory features from the spatial and temporal attributes of each segment. Second, we generate template features of trajectory segments by applying a clustering scheme. Lastly, we compute the abnormal score, a novel dissimilarity measure that quantifies the disparity between the query and template trajectory segments in terms of trajectory features, and determine the local outliers based on the distribution of abnormal scores.
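The three phases can be sketched with NumPy on trajectories stored as `(t, x, y)` rows. The features (mean speed and net displacement), the k-means clustering used to produce templates, the nearest-template distance used as the abnormal score, and the mean-plus-`z`-standard-deviations cutoff are all hypothetical stand-ins chosen for illustration; they are not the paper's feature set or its abnormal score.

```python
import numpy as np


def partition(trajectory, seg_len):
    """Phase 1a: split a (T, 3) array of (t, x, y) rows into fixed-length temporal segments."""
    return [trajectory[i:i + seg_len]
            for i in range(0, len(trajectory) - seg_len + 1, seg_len)]


def features(segment):
    """Phase 1b (hypothetical features): mean speed and net displacement of a segment."""
    t, xy = segment[:, 0], segment[:, 1:]
    path_length = np.linalg.norm(np.diff(xy, axis=0), axis=1).sum()
    speed = path_length / (t[-1] - t[0])
    displacement = np.linalg.norm(xy[-1] - xy[0])
    return np.array([speed, displacement])


def kmeans(F, k, iters=50, seed=0):
    """Phase 2: plain k-means; the cluster centers serve as template features."""
    rng = np.random.default_rng(seed)
    centers = F[rng.choice(len(F), size=k, replace=False)]
    for _ in range(iters):
        dists = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        centers = np.array([F[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                            for j in range(k)])
    return centers


def abnormal_scores(F, templates):
    """Phase 3 (stand-in score): distance from each segment to its nearest template."""
    return np.linalg.norm(F[:, None, :] - templates[None, :, :], axis=-1).min(axis=1)


def local_outliers(F, k=2, z=2.0):
    """Indices whose score exceeds mean + z * std of the score distribution."""
    scores = abnormal_scores(F, kmeans(F, k))
    return np.nonzero(scores > scores.mean() + z * scores.std())[0]
```

The choice of `k` matters: if it is too large, an isolated outlier can become its own template and receive a score near zero.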

To demonstrate the effectiveness of our method, we conducted three case studies on real-life spatio-temporal trajectory datasets from the solar astroinformatics domain: solar active regions, coronal mass ejections, and polarity inversion lines (PILs). Our experimental results show that our local outlier detection approach can effectively discover erroneous reports from the reporting module and abnormal phenomena in various spatio-temporal trajectory datasets.



How to Train Your Flare Prediction Model: Revisiting Robust Sampling of Rare Events

A. Ahmadzadeh, B. Aydin, M. Georgoulis, D. J. Kempton, S. S. Mahajan, and R. A. Angryk

We have been working on a case study of solar flare forecasting by means of metadata feature time series. We treat this as a problem marked by severe class imbalance and temporal coherence. We take full advantage of pre-flare time series in solar active regions, which is made possible by the Space Weather Analytics for Solar Flare benchmark dataset, known as SWAN-SF. This benchmark dataset is a partitioned collection of multivariate time series of active region properties comprising 4075 regions and spanning over 9 years of the Solar Dynamics Observatory (SDO) period of operations.

Twelve consecutive time series slices for the parameter Total Unsigned Current Helicity (TOTUSJH) corresponding to an M1.0-class flare associated with NOAA AR 11875 (HARP 3291). Each time series spans 12 hours of observation with a 12-minute cadence.

We illustrate the general concept of temporal coherence (figure above), which arises from the continuity inherent in time series forecasting, and show that a lack of proper understanding of this effect may spuriously enhance models’ performance.
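One common safeguard against the effect described above, sketched here under the assumption that each time series slice is tagged with its HARP number, is to split the data so that all slices of one active region fall on the same side of the train/test boundary. `grouped_split` and `shared_regions` are hypothetical helpers, not part of SWAN-SF.

```python
import random


def grouped_split(harp_ids, test_fraction=0.25, seed=0):
    """Split slice indices so every active region (HARP) lands in exactly one partition."""
    regions = sorted(set(harp_ids))
    random.Random(seed).shuffle(regions)
    n_test = max(1, round(len(regions) * test_fraction))
    test_regions = set(regions[:n_test])
    train = [i for i, h in enumerate(harp_ids) if h not in test_regions]
    test = [i for i, h in enumerate(harp_ids) if h in test_regions]
    return train, test


def shared_regions(harp_ids, train, test):
    """Regions whose slices appear in both partitions (should be empty)."""
    return {harp_ids[i] for i in train} & {harp_ids[i] for i in test}
```

A purely random split of overlapping slices would typically place neighboring slices of the same region in both partitions, which is exactly the kind of leakage that can inflate scores.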


We further address another well-known challenge in rare-event prediction, namely the class-imbalance issue. The SWAN-SF is an appropriate dataset for this, with a 60:1 imbalance ratio for GOES M- and X-class flares and an 800:1 ratio for X-class flares against flare-quiet instances (figure on the right). We revisit the main remedies for these challenges and present several experiments to illustrate the exact impact that each of these remedies may have on performance.
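The two most basic remedies, random undersampling of the majority class and random oversampling (with replacement) of the minority class, can be sketched on index lists. The function name, the dictionary layout, and the fixed seed are illustrative assumptions; the experiments in the study are not reproduced here.

```python
import random


def rebalance(indices_by_class, strategy="undersample", seed=0):
    """Return a class-balanced list of sample indices.

    'undersample' shrinks every class to the smallest class size;
    'oversample' grows every class to the largest size by sampling with replacement.
    """
    rng = random.Random(seed)
    sizes = [len(ix) for ix in indices_by_class.values()]
    target = min(sizes) if strategy == "undersample" else max(sizes)
    balanced = []
    for ix in indices_by_class.values():
        if len(ix) >= target:
            balanced.extend(rng.sample(ix, target))
        else:
            balanced.extend(ix + rng.choices(ix, k=target - len(ix)))
    return balanced
```

At a 60:1 ratio, undersampling discards most flare-quiet instances while oversampling repeats each flaring instance about 60 times, which is why both should be applied to training data only and compared on an untouched test partition.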

This study is published in The Astrophysical Journal Supplement Series, here. A pre-print is also available here.




All-Clear Flare Prediction Using Interval-based Time Series Classifiers

Anli Ji, Berkay Aydin, Manolis K. Georgoulis, Rafal Angryk


An all-clear flare prediction is a type of solar flare forecasting that aims to predict relatively small flares and flare-quiet regions. This type of prediction focuses on forecasting the non-flaring class more precisely, rather than simply providing a binary or probabilistic estimate of whether a flare will occur. While many flare prediction studies do not address this problem directly, all-clear predictions can be useful in an operational context. However, in all-clear predictions, finding the right balance between avoiding false negatives (misses) and reducing false positives (false alarms) is often challenging.

We put more emphasis on predicting non-flaring instances with high precision while still maintaining valuable predictive results. Our study focuses on training and testing a set of interval-based time series classifiers known as Time Series Forest (TSF). These classifiers are used toward building an all-clear flare prediction system that utilizes multivariate time series data. An ensemble schema and an overview of the system are shown in the figure.
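Time Series Forest builds each tree on summary features (mean, standard deviation, and slope) computed over randomly chosen intervals of the series. The sketch below shows only that feature-extraction step for a multivariate instance shaped `(time_steps, parameters)`; the resulting vectors would then be passed to a decision tree, and an ensemble of such trees forms the forest. The `min_len` value and the function names are assumptions.

```python
import numpy as np


def random_intervals(length, n_intervals, rng, min_len=3):
    """Draw (start, end) pairs with end - start >= min_len and end <= length."""
    intervals = []
    for _ in range(n_intervals):
        start = int(rng.integers(0, length - min_len + 1))
        end = int(rng.integers(start + min_len, length + 1))
        intervals.append((start, end))
    return intervals


def interval_features(series, intervals):
    """Mean, standard deviation, and least-squares slope of a 1-D series per interval."""
    feats = []
    for start, end in intervals:
        seg = series[start:end]
        slope = np.polyfit(np.arange(len(seg)), seg, 1)[0]
        feats.extend([seg.mean(), seg.std(), slope])
    return np.array(feats)


def mvts_features(mvts, intervals):
    """Concatenate interval features for every parameter (column) of a (T, d) array."""
    return np.concatenate([interval_features(mvts[:, j], intervals)
                           for j in range(mvts.shape[1])])
```

Each tree in the forest would draw its own set of intervals, which is what gives the ensemble its diversity; a homogeneous ensemble can then combine several such forests.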

Schematic overview of the homogeneous ensemble pipeline

Our research is built around three branches: data collection, predictive model building and evaluation, and comparison of our time series classification models with baselines using our benchmark datasets. Our results show that time series classifiers provide better forecasting results in terms of skill scores, precision, and recall, and that they can be further improved for more precise all-clear forecasts by tuning model hyperparameters.

This study is published in IEEE Big Data 2020, and is accessible here, and also on arXiv here.


Multivariate time series dataset for space weather data analytics

Rafal A. Angryk, Petrus C. Martens, Berkay Aydin, Dustin Kempton, Sushant S. Mahajan, Sunitha Basodi, Azim Ahmadzadeh, Xumin Cai, Soukaina Filali Boubrahimi, Shah Muhammad Hamdi, Michael A. Schuh & Manolis K. Georgoulis

We introduce and make openly accessible a comprehensive, multivariate time series (MVTS) dataset extracted from solar photospheric vector magnetograms in Space weather HMI Active Region Patch (SHARP) series. Our dataset also includes a cross-checked NOAA solar flare catalog that immediately facilitates solar flare prediction efforts. We discuss methods used for data collection, cleaning and pre-processing of the solar active region and flare data, and we further describe a novel data integration and sampling methodology.

Our dataset covers 4,098 MVTS data collections from active regions occurring between May 2010 and December 2018, includes 51 flare-predictive parameters, and integrates over 10,000 flare reports. The immediate tasks enabled by the disseminated dataset include optimization of solar flare prediction and detailed investigation of elusive flare predictors or precursors, with potential future benefits for both operations (research-to-operations) and basic research (operations-to-research).





This study is published in Scientific Data, Nature, and is publicly accessible here.

Overview of our 4-step flare data enhancement and cross-checking procedures, with the enhancements accompanying each step (brief explanations also provided). Cross-checking with secondary flare data sources (SSW Latest Events and Hinode-XRT) results in three sets of flare reports: (1) primary-verified, where the locations of the primary flare reports (from GOES) are verified by at least one secondary source; (2) secondary-verified, where GOES-reported locations could not be verified but the SSW and XRT reported locations agree; and (3) non-verified, where the flare location cannot be verified from any of the three data sources.
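The three-way labeling described in the caption can be sketched as a small decision function. It assumes flare locations are given as heliographic `(latitude, longitude)` pairs in degrees (or `None` when missing) and treats two locations as agreeing if they lie within a hypothetical tolerance of 10 degrees; the real cross-checking criteria are not specified here, and the function names are illustrative.

```python
import math


def agree(a, b, tol_deg=10.0):
    """Hypothetical agreement test: both locations known and within tol_deg degrees."""
    if a is None or b is None:
        return False
    return math.hypot(a[0] - b[0], a[1] - b[1]) <= tol_deg


def verification_status(goes, ssw, xrt, tol_deg=10.0):
    """Classify one flare report using the GOES, SSW Latest Events, and XRT locations."""
    if agree(goes, ssw, tol_deg) or agree(goes, xrt, tol_deg):
        return "primary-verified"      # GOES location confirmed by a secondary source
    if agree(ssw, xrt, tol_deg):
        return "secondary-verified"    # GOES unconfirmed, but SSW and XRT agree
    return "non-verified"
```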