Solar Flare Prediction from Time Series Data 2020

A Track in the IEEE Big Data 2020 Big Data Cup Challenge

Task Background Information

This competition extends last year's competition (Solar Flare Prediction from Time Series of Solar Magnetic Field Parameters), with several changes to both the dataset and the problem you are asked to solve. For instance, at SABiD 2019, which our competition was a part of, one of the participants presented improvements in the integration of solar flares and active regions in "An Application of Spatio-temporal Co-occurrence Analyses for Integrating Solar Active Region Data from Multiple Reporting Modules". The updated flare-to-active-region catalog produced by that work is integrated into the dataset for this year's competition.

In addition to the dataset updates, we have changed the problem to multi-class classification instead of binary classification. This change, which applies to the first phase of the competition, is motivated by findings in two papers published since the start of last year's competition: "Rare-Event Time Series Prediction: A Case Study of Solar Flare Forecasting" and "Challenges with Extreme Class-Imbalance and Temporal Coherence: A Study on Solar Flare Data". These two papers extensively analyze the effects of various sampling decisions on training solutions for class imbalance on this dataset. Since some of their more interesting results used over- and under-sampling while preserving the climatology of the underlying sub-classes, we felt the added information would be useful for this year's competition.
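To make the idea of sampling "while preserving the climatology of the underlying sub-classes" concrete, here is a minimal Python sketch of one generic way to do it: undersample a pool by keeping the same fraction of every sub-class, so their relative proportions stay roughly unchanged. This is not the procedure from the two papers; the function name `undersample_preserving_subclasses`, the label strings, and the toy counts are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def undersample_preserving_subclasses(
    samples: Sequence[Tuple[str, object]],
    keep_fraction: float,
    seed: int = 0,
) -> List[Tuple[str, object]]:
    """Randomly keep the same fraction of every sub-class.

    `samples` holds (sub_class_label, record) pairs drawn from one pool.
    Keeping an equal fraction per sub-class leaves the relative
    sub-class proportions approximately unchanged.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    rng = random.Random(seed)

    # Group the samples by their sub-class label.
    by_label: Dict[str, List[Tuple[str, object]]] = defaultdict(list)
    for sample in samples:
        by_label[sample[0]].append(sample)

    kept: List[Tuple[str, object]] = []
    for label in sorted(by_label):  # sorted for reproducibility
        group = by_label[label]
        n_keep = round(len(group) * keep_fraction)
        kept.extend(rng.sample(group, n_keep))
    return kept


# Hypothetical pool; the sub-class counts are made up for illustration.
toy_pool = (
    [("Q", i) for i in range(100)]
    + [("C", i) for i in range(40)]
    + [("B", i) for i in range(20)]
)
toy_kept = undersample_preserving_subclasses(toy_pool, keep_fraction=0.25)
toy_counts = {
    label: sum(1 for s in toy_kept if s[0] == label) for label in ("Q", "C", "B")
}
```

In this toy pool the 100:40:20 proportions of the three sub-classes carry over to the reduced sample, which is the property a naive random undersample of the whole pool would only match on average.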

The second stage of the competition, in which participants submit a regular paper for peer review, will again be the binary classification task presented last year. This time we will also provide the solution file for the public and private leaderboard datasets so that participants can calculate the requested evaluation metrics (we will update the submissions page with the required metrics once phase two starts). As such, it may be useful to review the work of last year's winners. Their papers are listed below (in no particular order):

As stated in the competition overview, our dataset mainly relies on Spaceweather HMI Active Region Patches (SHARPs) available from the Joint Science Operations Center (JSOC). This data product stems from solar vector magnetograms obtained by the Helioseismic and Magnetic Imager (HMI) onboard the Solar Dynamics Observatory (SDO). The processed dataset provided for this competition is a set of magnetic field parameters calculated from individual SHARPs. We have transformed the SHARPs input data into multivariate time series (MVTS) of magnetic field parameters and sliced the resulting MVTS into records twelve hours long with a sample cadence of twelve minutes. The sliced MVTS are annotated with class labels. For the purposes of the BigData Cup Challenge, participants must differentiate between five classes in the first phase; these five are later grouped into two in the second phase of the competition. For the first phase, each MVTS slice is labeled according to whether its SHARP has an X-, M-, C-, or B-class flare occurring within the next twenty-four hours, or none of those (considered flare quiet or no flare in this competition).
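As a rough illustration of the slicing described above, the following Python sketch works through the arithmetic: a twelve-hour record sampled every twelve minutes holds 60 time steps. The function name `slice_mvts`, the list-of-rows representation, and the default non-overlapping offset between records are assumptions made for illustration; the text does not state how consecutive slices are offset.

```python
from typing import List, Sequence

CADENCE_MINUTES = 12  # sample cadence stated in the task description
RECORD_HOURS = 12     # record length stated in the task description
STEPS_PER_RECORD = RECORD_HOURS * 60 // CADENCE_MINUTES  # 60 time steps


def slice_mvts(
    series: Sequence[Sequence[float]],
    step: int = STEPS_PER_RECORD,
) -> List[List[Sequence[float]]]:
    """Cut a multivariate series (one row per time step) into fixed-length records.

    `step` is the offset between consecutive record starts. The default
    (non-overlapping records) is an assumption, not the organizers' setting.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    records = []
    # Only emit complete records of STEPS_PER_RECORD time steps.
    for start in range(0, len(series) - STEPS_PER_RECORD + 1, step):
        records.append(list(series[start:start + STEPS_PER_RECORD]))
    return records


# Hypothetical example: 150 time steps of 3 made-up parameters.
toy_series = [[float(t), 2.0 * t, -1.0 * t] for t in range(150)]
toy_records = slice_mvts(toy_series)
```

With 150 time steps and non-overlapping records, only two complete 60-step records fit; the trailing 30 steps are dropped in this sketch.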

In the second phase of the competition, each MVTS slice is labeled as coming from a SHARP that has a major solar flare (M- or X-class) occurring within the next twenty-four hours, or from one that does not (i.e., it may have B- or C-class flares or no flares at all). It is important to note that large flares (M- or X-class), which are the most commonly targeted in predictive analyses, are scarce. In our dataset, there is a large imbalance between our flaring and non-flaring classes, with ~4K flaring samples and ~192K non-flaring samples in the training set. It is safe to assume that similar imbalances continue in the testing data, but specifics of this shall remain a closely held secret for the duration of this competition.
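The grouping of the five first-phase classes into the two second-phase classes, and the size of the imbalance quoted above, can be sketched in a few lines of Python. The label strings (including "Q" for flare quiet) and the function name `to_binary_label` are illustrative assumptions, not the dataset's actual encoding; the counts are the approximate figures from the text.

```python
# Five first-phase labels; the string names are illustrative assumptions.
PHASE_ONE_CLASSES = ("X", "M", "C", "B", "Q")  # "Q" = flare quiet / no flare
MAJOR_FLARE_CLASSES = {"X", "M"}


def to_binary_label(phase_one_label: str) -> int:
    """Return 1 for a major flare (M- or X-class), 0 otherwise."""
    if phase_one_label not in PHASE_ONE_CLASSES:
        raise ValueError(f"unknown label: {phase_one_label!r}")
    return 1 if phase_one_label in MAJOR_FLARE_CLASSES else 0


# Approximate training-set counts quoted in the text.
n_flaring = 4_000
n_non_flaring = 192_000

# Share of flaring samples and the non-flaring to flaring ratio.
flaring_fraction = n_flaring / (n_flaring + n_non_flaring)
imbalance_ratio = n_non_flaring / n_flaring
```

With these approximate counts, about 2% of the training samples are flaring, or one flaring sample for every 48 non-flaring ones.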