ML Datasets as Synthetic Cognitive Experience Records

Authors

  • H. Castro
  • M. T. Andrade

Keywords:

Machine-Learning, Datasets, Cyber-physical, Synthetic Cognition, Metadata

Abstract

Machine Learning (ML), presently the major research area within Artificial Intelligence, aims at developing tools that can learn, largely on their own, from data. Through a training phase, ML tools learn to associate input data with an output evaluation of it. When the input data is audio or visual media (i.e. akin to sensory information) and the output corresponds to some interpretation of it, the process may be described as Synthetic Cognition (SC). Presently, ML (or SC) research is heterogeneous, comprising a broad set of disconnected initiatives that make no systematic effort to cooperate or to integrate their achievements, and no standards exist to facilitate that. The training datasets (base sensory data and targeted interpretation), which are very labour-intensive to produce, are also built with ad hoc structures and (metadata) formats and have very narrow expressive objectives, and thus enable no true interoperability or standardisation. Our work helps to overcome this fragility by putting forward: a specification for a standard ML dataset repository, describing how it internally stores the different components of datasets and how it interfaces with external services; and a tool for the comprehensive structuring of ML datasets, defining them as Synthetic Cognitive Experience (SCE) records, which interweave the base audio-visual sensory data with multilevel interpretative information. A standardised structure to express the different components of the datasets and their interrelations will promote reusability, resulting in the availability of a very large pool of datasets for a myriad of application domains.
Our work thus contributes to: the universal interpretability and reusability of ML datasets; easier acquisition and sharing of training and testing datasets within the ML research community; easier comparison of results from different ML tools; and a faster overall research process.

Published

2018-08-01

How to Cite

H. Castro, & M. T. Andrade. (2018). ML Datasets as Synthetic Cognitive Experience Records. International Journal of Computer Information Systems and Industrial Management Applications, 10, 15. Retrieved from https://cspub-ijcisim.org/index.php/ijcisim/article/view/593

Section

Original Articles