A graphically based machine learning approach to predict secondary schools performance in Tunisia

Periodical: Socio-Economic Planning Sciences
Volume: 70
Year: 2020
Relates to study/studies: PISA 2012

Abstract

The main purpose of this paper is to identify the key factors that affect schools' academic performance and to explore their relationships through a two-stage analysis of a sample of Tunisian secondary schools. In the first stage, we use the Directional Distance Function (DDF) approach to account for undesirable outputs. The DDF is estimated with the Data Envelopment Analysis (DEA) method. In the second stage, we apply machine-learning approaches (regression trees and random forests) to identify and visualize the variables associated with higher school performance. The data are drawn from the Program for International Student Assessment (PISA) 2012 survey. The first-stage analysis shows that almost 22% of Tunisian schools are efficient and that the schools could improve their students' educational performance by 15.6% while using the same level of resources. The regression tree findings indicate that the most important factors associated with higher performance are school size, competition, class size, parental pressure and the proportion of girls. Only school location appears to have no impact on school efficiency. The random forest results show that the proportion of girls at school and school size contribute most to the predictive accuracy of our model and hence may be the most influential factors for school efficiency. The findings also reveal the strong non-linearity of the relationships between these key factors and school performance and highlight the importance of modeling their interactions when explaining efficiency scores.
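The second-stage idea, relating first-stage efficiency scores to school characteristics with a regression tree, can be illustrated with a minimal standard-library Python sketch of a CART-style tree: each split is chosen to minimise the summed squared error of the scores in the two child nodes. This is not the authors' code. The function names (`build_tree`, `predict`, `best_split`), the stopping rules (`max_depth`, `min_leaf`) and the toy data are hypothetical assumptions, and the direction of the effect in the toy data is arbitrary.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Optional, Sequence, Tuple


@dataclass
class Node:
    value: float                   # mean efficiency score of the schools in this node
    feature: Optional[int] = None  # index of the splitting variable; None marks a leaf
    threshold: float = 0.0         # rows with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sse(ys: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (the usual regression-tree impurity)."""
    m = fmean(ys)
    return sum((v - m) ** 2 for v in ys)


def best_split(X: Sequence[Sequence[float]], y: Sequence[float],
               min_leaf: int) -> Optional[Tuple[int, float, float]]:
    """Try every feature and midpoint threshold; return (feature, threshold, cost)."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # hypothetical rule: at least min_leaf schools per leaf
            cost = sse(left) + sse(right)
            if best is None or cost < best[2]:
                best = (f, t, cost)
    return best


def build_tree(X: Sequence[Sequence[float]], y: Sequence[float],
               max_depth: int = 2, min_leaf: int = 2) -> Node:
    """Grow a regression tree by greedy, top-down recursive binary splitting."""
    node = Node(value=fmean(y))
    if max_depth == 0 or len(y) < 2 * min_leaf:
        return node
    split = best_split(X, y, min_leaf)
    if split is None or split[2] >= sse(y):
        return node  # no admissible split reduces the squared error
    node.feature, node.threshold, _ = split
    go_left = [row[node.feature] <= node.threshold for row in X]
    node.left = build_tree([r for r, g in zip(X, go_left) if g],
                           [v for v, g in zip(y, go_left) if g], max_depth - 1, min_leaf)
    node.right = build_tree([r for r, g in zip(X, go_left) if not g],
                            [v for v, g in zip(y, go_left) if not g], max_depth - 1, min_leaf)
    return node


def predict(node: Node, x: Sequence[float]) -> float:
    """Follow the splits down to a leaf and return that leaf's mean score."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# Toy data, invented for illustration only (not PISA 2012):
# each row is [school_size, class_size]; y holds hypothetical efficiency scores.
X = [[200, 30], [250, 35], [300, 28], [800, 32], [900, 40], [1000, 30]]
y = [0.95, 0.90, 0.92, 0.70, 0.65, 0.72]

tree = build_tree(X, y, max_depth=2, min_leaf=2)
print(tree.feature, tree.threshold)  # prints: 0 550.0
print(predict(tree, [400, 33]))      # mean score of the three smaller toy schools
```

A random forest, the other second-stage method named in the abstract, averages many such trees, each grown on a bootstrap sample of schools with only a random subset of variables considered at each split; variable importance is commonly measured by how much predictive accuracy drops when a variable's values are randomly permuted. For real analyses a tested library implementation would be preferable to this sketch.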