The Data Mining Lab at Georgia State University performs research on the storage, processing, retrieval, and analysis of massive, real-life data with highly dynamic spatial and temporal characteristics. Members of the lab work in close collaboration with experts from solar physics, astronomy, business, geosciences, statistics, and other fields. This interdepartmental, multi-institutional collaborative environment gives lab members numerous opportunities to work on problems that are not only interesting from a computer science perspective but also impact other areas of scientific understanding.






Courtesy: NASA (svs.gsfc.nasa.gov)


Some topics that the Data Mining Lab has worked on include:

  • (Un)Supervised Classification/Clustering
  • Fuzzy (overlapping) multi-class data
  • Ensemble learning (fusion of classifiers)
  • Frequent Pattern Mining (Co-location patterns)
  • Parallel/Distributed/GPU Computing
  • Information Visualization (and presentation)
  • Spatial/Temporal Databases (OpenGIS systems)
  • Time Series analysis
  • Linear Regression Models and Trends
  • Dimensionality Reduction
  • Multidimensional Database Indexing
  • Knowledge Rule Mining (empirical findings and hypothesis validation)

Data Challenge 2019

The Upcoming Data Challenge

We are now organizing a Big Data Cup Challenge on Solar Flare Prediction as part of IEEE BigData 2019. The goal of this dataset competition is to introduce the machine learning/data mining community to an integrated dataset that can be utilized for predicting and understanding solar flares.

The winning prediction method(s) will be selected using the following weighting:

  • 30% coming from their rank on the private leaderboard,
  • 10% from their rank on the public leaderboard, and
  • 60% from the quality of the accompanying paper describing their methods and results.

After the competition phase is completed, a link for the submission of the accompanying academic paper will be provided to the top 10 participants as ranked by the public/private leaderboard weighting described above. The academic papers will be ranked by peer reviewers and a final decision will be made using the weighting method detailed above.
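The weighting above can be illustrated with a small sketch. The rules do not say how a rank is converted into a numeric score, so the linear mapping in `rank_to_score` is purely an assumption for illustration, and the function names are hypothetical rather than part of the official evaluation.

```python
# Weights stated in the competition rules.
W_PRIVATE = 0.30  # rank on the private leaderboard
W_PUBLIC = 0.10   # rank on the public leaderboard
W_PAPER = 0.60    # quality of the accompanying paper


def rank_to_score(rank: int, n: int) -> float:
    """Assumption: map ranks 1..n linearly onto scores 1.0..0.0 (rank 1 is best)."""
    if not 1 <= rank <= n:
        raise ValueError("rank must lie between 1 and n")
    return 1.0 if n == 1 else (n - rank) / (n - 1)


def leaderboard_score(private_rank: int, public_rank: int, n_teams: int) -> float:
    """Combined leaderboard component, usable to pick the top 10 participants."""
    return (W_PRIVATE * rank_to_score(private_rank, n_teams)
            + W_PUBLIC * rank_to_score(public_rank, n_teams))


def final_score(private_rank: int, public_rank: int, n_teams: int,
                paper_rank: int, n_papers: int) -> float:
    """Leaderboard component plus the peer-reviewed paper component."""
    return (leaderboard_score(private_rank, public_rank, n_teams)
            + W_PAPER * rank_to_score(paper_rank, n_papers))
```

Under these assumptions, a team ranked first everywhere would receive the maximum score of 1.0, and a strong paper can outweigh a modest leaderboard position because it carries 60% of the total.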






As a prize for this competition, we will cover the conference registration fee for the top two to three finalists so that they can attend the IEEE BigData 2019 conference and present their work. Their accompanying academic papers will be published in the conference proceedings.

The data competition is hosted on Kaggle, and participants' classification results are submitted through that platform. Teams are limited to two submissions per day.