Publication Details

Are we meeting a deadline? classification goal achievement in time in the presence of imbalanced data

HLOSTA, M.; ZDRÁHAL, Z.; ZENDULKA, J. Are we meeting a deadline? classification goal achievement in time in the presence of imbalanced data. KNOWLEDGE-BASED SYSTEMS, 2018, vol. 160, p. 278-295. ISSN: 0950-7051.
Czech title
Splníme termín? klasifikace dosažení cíle v čase při nevyvážených datech
Type
journal article
Language
English
Authors
Hlosta Martin, Ing., Ph.D.
Zdráhal Zdeněk
Zendulka Jaroslav, doc. Ing., CSc. (UIFS)
URL
https://www.sciencedirect.com/science/article/pii/S0950705118303496
Keywords
Classification, imbalanced data, learning analytics, educational data mining

Abstract

This paper addresses the problem of a finite set of entities that are required
to achieve a goal within a predefined deadline. For example, a group of students
is supposed to submit a homework assignment by a specified cutoff. We are
interested in predicting which entities will achieve the goal by the deadline.
The predictive models are built using only data from that population. The
predictions are computed at various time instants, taking into account updated
data about the entities. The first contribution of the paper is a formal
description of the problem. A key characteristic of the proposed model-building
method is that it uses the properties of entities that have already achieved
the goal. We call this approach "Self-Learning". Since typically only a few
entities have achieved the goal at the beginning and their number grows
gradually, the problem is inherently imbalanced. To mitigate the curse of
imbalance, we improved the Self-Learning method by addressing information loss
and by applying several sampling techniques. The original Self-Learning method
and its modifications were evaluated in a case study predicting submission of
the first assessment in distance higher education courses. The results show that
the proposed improvements outperform the two specified baseline models and the
original Self-Learner, and that the best results are achieved when domain-driven
techniques are used to tackle the imbalance problem. Using the Wilcoxon
signed-rank test, we also show that these improvements are statistically
significant.
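The abstract describes Self-Learning only at a high level. Below is a minimal sketch, under our own assumptions, of how a training set could be formed at one time instant t: every entity that has already achieved the goal by t is labelled positive and every other entity negative, after which the minority class is randomly oversampled. Random oversampling is one generic sampling technique chosen for illustration, not necessarily one used in the paper. The `Entity` class, its fields, and the functions `self_learning_training_set` and `random_oversample` are hypothetical names, and features are assumed to be precomputed from data available at t.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical entity record (illustrative, not from the paper): a feature
# vector assumed to be computed from data available at time t, plus the time
# at which the goal was achieved.
@dataclass
class Entity:
    entity_id: str
    features: List[float]
    achieved_at: Optional[float]  # None if the goal has not (yet) been achieved


def self_learning_training_set(
    entities: List[Entity], t: float
) -> Tuple[List[List[float]], List[int]]:
    """Label each entity 1 if it achieved the goal by time t, else 0."""
    X = [e.features for e in entities]
    y = [
        1 if e.achieved_at is not None and e.achieved_at <= t else 0
        for e in entities
    ]
    return X, y


def random_oversample(
    X: List[List[float]], y: List[int], seed: int = 0
) -> Tuple[List[List[float]], List[int]]:
    """Duplicate random minority-class rows until both classes are equal in size.

    Generic random oversampling, shown as one possible way to counter the
    imbalance when only a few entities have achieved the goal early on.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        # At most one class present: nothing to balance against.
        return list(X), list(y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


# Usage with made-up toy data: on day 5 only s1 has submitted.
students = [
    Entity("s1", [3.0, 1.0], achieved_at=2.0),
    Entity("s2", [0.5, 0.0], achieved_at=None),
    Entity("s3", [1.0, 0.0], achieved_at=9.0),
    Entity("s4", [0.2, 0.1], achieved_at=None),
]
X, y = self_learning_training_set(students, t=5.0)
print(y)                 # [1, 0, 0, 0]
Xb, yb = random_oversample(X, y)
print(sum(yb), len(yb))  # 3 6
```

A classifier trained on the balanced set could then score the entities that have not yet achieved the goal, and the procedure would be repeated at later time instants as more entities achieve it. The paper's own improvements, such as the treatment of information loss and the domain-driven handling of imbalance, are not reproduced in this sketch.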

Published
2018
Pages
278–295
Journal
KNOWLEDGE-BASED SYSTEMS, vol. 160, ISSN 0950-7051
DOI
10.1016/j.knosys.2018.07.021
UT WoS
000446283900022
BibTeX
@article{BUT155093,
  author="Martin {Hlosta} and Zdeněk {Zdráhal} and Jaroslav {Zendulka}",
  title="Are we meeting a deadline? classification goal achievement in time in the presence of imbalanced data",
  journal="KNOWLEDGE-BASED SYSTEMS",
  year="2018",
  volume="160",
  pages="278--295",
  doi="10.1016/j.knosys.2018.07.021",
  issn="0950-7051",
  url="https://www.sciencedirect.com/science/article/pii/S0950705118303496"
}