An empirical evaluation of machine learning techniques to classify code comprehension based on EEG data

Abstract

Context: Code comprehension is a cognitive process in which developers invest mental effort to grasp source code snippets. The literature provides evidence that electroencephalogram (EEG) signals correlate with developers’ code comprehension. An effective machine learning technique that classifies code comprehension based on EEG data would allow recommendation systems to suggest an appropriate task for the developer. Problem: However, there is a lack of empirical studies investigating the effectiveness of machine learning techniques for classifying developers’ code comprehension based on their EEG data. Objective: This study empirically analyzes the effectiveness of machine learning techniques, trained with EEG data, for classifying developers’ code comprehension. Method: A wireless EEG device was used to collect brainwaves from software developers while they performed code comprehension tasks. The generated data set was used to train a K-Nearest Neighbor (KNN), a Neural Network (NN), a Naïve Bayes (NB), a Random Forest (RF), and a Support Vector Machine (SVM) classifier. We measured the effectiveness of these techniques using precision, recall, and f-measure. Furthermore, we analyzed the mean differences of f-measure between classifiers to check whether the classifiers’ effectiveness is significantly higher than the f-measure mean of random guessing and than a high-effectiveness threshold of 80%. Results: After we trained the classifiers using 10-fold cross-validation, the K-Nearest Neighbor classifier obtained an f-measure mean of 86% for classifying code comprehension. Moreover, the results show that the KNN, NN, and RF techniques were able to classify code comprehension based on developers’ EEG data with an f-measure above 80%.
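The evaluation procedure described in the abstract (10-fold cross-validation, scored with precision, recall, and f-measure, then compared against an 80% threshold) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' pipeline: the data are synthetic Gaussian blobs standing in for EEG features, the KNN is a plain majority-vote implementation with a hypothetical `k=5`, and the one-sample t statistic is one common way to compare a mean against a threshold (the abstract does not name the test used).

```python
import math
import random
import statistics


def make_synthetic_data(n_per_class=100, n_features=4, seed=0):
    """Two Gaussian blobs standing in for EEG features (hypothetical data)."""
    rng = random.Random(seed)
    data = []
    for label, center in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            features = [rng.gauss(center, 1.0) for _ in range(n_features)]
            data.append((features, label))
    rng.shuffle(data)
    return data


def knn_predict(train, x, k=5):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes_for_one = sum(label for _, label in nearest)
    return 1 if votes_for_one * 2 > k else 0


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and f-measure for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def cross_validate(data, n_folds=10, k=5):
    """Return one (precision, recall, f1) tuple per held-out fold."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        y_true = [label for _, label in test]
        y_pred = [knn_predict(train, x, k) for x, _ in test]
        scores.append(precision_recall_f1(y_true, y_pred))
    return scores


def one_sample_t(values, threshold):
    """t statistic for H0: mean(values) == threshold (assumed test choice)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        # All folds scored identically; the statistic is unbounded or zero.
        if mean == threshold:
            return 0.0
        return math.inf if mean > threshold else -math.inf
    return (mean - threshold) / (sd / math.sqrt(len(values)))


if __name__ == "__main__":
    fold_scores = cross_validate(make_synthetic_data())
    f1_scores = [f1 for _, _, f1 in fold_scores]
    print(f"mean f-measure: {statistics.mean(f1_scores):.2f}")
    print(f"t vs 0.80 threshold: {one_sample_t(f1_scores, 0.80):.2f}")
```

The numbers this prints reflect only the synthetic data and say nothing about the study's results. Swapping `knn_predict` for another classifier while keeping `cross_validate` unchanged is the kind of like-for-like comparison the abstract describes.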

Publication
Expert Systems with Applications