FAIR multi-label classification

ECML PKDD 2021 TUTORIAL


Dragi Kocev · JOŽEF STEFAN INSTITUTE · Dragi.Kocev@ijs.si
Jasmin Bogatinovski · Technical University Berlin · Jasmin.Bogatinovski@tu-berlin.de
Ana Kostovska · JOŽEF STEFAN INSTITUTE · Ana.Kostovska@ijs.si
Panče Panov · JOŽEF STEFAN INSTITUTE · Pance.Panov@ijs.si

Virtual

13 - 17 September


Description



Multi-label classification (MLC) is a machine learning task where the goal is to label an example with multiple labels simultaneously. It is receiving increasing interest from the machine learning community, as evidenced by the growing number of papers and, with them, the growing number of MLC methods and MLC problems. MLC is widely used in life sciences and medicine, environmental sciences, text classification, semantic annotation of images and videos, business and engineering. Ensuring proper, correct, robust and trustworthy benchmarking is therefore of utmost importance for the further development of the field. We believe that this can be achieved by adhering to cutting-edge standards of data management such as the FAIR (Findable, Accessible, Interoperable, and Reusable) and TRUST (Transparency, Responsibility, User focus, Sustainability, and Technology) principles. In the first part of the tutorial, we will outline the main principles in designing a repository that closely follows the FAIR and TRUST principles. The central part of such a repository is an ontology-based annotation schema for semantic annotation of MLC datasets. In the second part of the tutorial, we will present the execution of, and the findings from, a large benchmarking study of more than 20 MLC methods evaluated on more than 40 datasets using a full range of 18 evaluation measures. Such a study draws the landscape of the MLC task by identifying the state-of-the-art MLC methods, outlining the strengths and weaknesses of the MLC methods, and performing a meta learning study to further enhance our understanding of the MLC task.


Tutorial outline

  • Opening of the tutorial
  • Introduction to MLC
  • Knowledge and data representation, and FAIR principles
  • FAIR MLC data representation and repositories
  • Coffee break
  • Comprehensive empirical study of MLC methods
  • Landscaping MLC through meta learning on the empirical results
  • Summary remarks outlining the open challenges in MLC

Presenters


Dragi Kocev · JOŽEF STEFAN INSTITUTE · Dragi.Kocev@ijs.si

Dragi Kocev is a researcher at the Department of Knowledge Technologies, JSI. He completed his PhD in 2011 at the JSI Postgraduate School in Ljubljana on the topic of learning ensemble models for predicting structured outputs. He was a visiting research fellow at the University of Bari, Italy in 2014/2015. His research interests lie in the development of methods for analysis of complex data and their application in a variety of domains (incl. life sciences, environmental sciences and engineering). His work focuses on developing human-centric AI methods that are trustworthy and provide explainability/understandability for domain experts. He has participated in several national Slovenian projects and in the EU-funded projects IQ, PHAGOSYS and HBP. He was co-coordinator of the FP7 FET Open project MAESTRA. He is currently the Principal Investigator of two ESA-funded projects: GalaxAI – Machine learning for spacecraft operation and AiTLAS – AI prototyping environment for EO. He has been a member of the PC of premier ML conferences (e.g., DS, ECML PKDD, AAAI, IJCAI, KDD) and a member of the editorial boards of Data Mining and Knowledge Discovery, Machine Learning Journal, Expert Systems with Applications (as Action Editor) and Ecological Informatics. He served as PC co-chair for DS 2014 and Journal track co-chair for ECML PKDD 2017.

Jasmin Bogatinovski · Technical University Berlin · Jasmin.Bogatinovski@tu-berlin.de

Jasmin Bogatinovski is a research associate in the Distributed Operating Systems group at TU Berlin. He received his MSc degree in Computer Science from the IPS Jožef Stefan in 2019 while working at the Department of Knowledge Technologies as a collaborator. His research interests are in the areas of artificial intelligence, machine learning and distributed systems. More specifically, he is interested in meta learning and its impact across different learning tasks, including single-target and multi-target classification/regression and anomaly detection. On the practical side, he is interested in developing and applying novel machine learning methods in the domain of IT operations (AIOps).

Ana Kostovska · JOŽEF STEFAN INSTITUTE · Ana.Kostovska@ijs.si

Ana Kostovska is a PhD student at the Department of Knowledge Technologies at the Jožef Stefan Institute. She is a holder of the Young Researcher Grant awarded by the Slovenian Research Agency. Her research interests lie in the fields of machine learning, and knowledge representation and reasoning. Ana’s research work has focused on formalizing the knowledge in a variety of domains (i.e., machine learning, process-based modelling, optimization) in the form of ontologies. Her main goal is to improve the reusability and reproducibility of research resources and to develop benchmarking systems that rely on semantic web technologies. She has been involved in several national and international research projects (e.g., IMPERATRIX, TAILOR) and is currently working on two ESA-funded projects: GalaxAI and AiTLAS.

Panče Panov · JOŽEF STEFAN INSTITUTE · Pance.Panov@ijs.si

Panče Panov is a researcher at JSI KT and Assistant Professor at the Jožef Stefan International Postgraduate School and the Faculty of Information Studies in Novo Mesto. He completed his PhD in data mining in 2012 at the Jožef Stefan International Postgraduate School, Ljubljana. His research interests include ML, data mining, knowledge discovery, and ontologies for describing them in various applications. He is the principal investigator of the national basic research project IMPERATRIX (2018-2021) and was involved in several other national and EU projects (e.g., IQ, SUMO, MAESTRA). He has published over 30 scientific publications and was the co-editor of one book and one journal special issue, both published by Springer. He was program co-chair of the International Conference on Discovery Science in 2014.