Using Data Mining to help with Student Retention in Further and Higher Education

Student retention is an important and high-profile issue in Further and Higher Education. Students may drop out of their course for a number of reasons, and their withdrawal has implications for both the institution and the student. For the institution, the main impacts are reduced funding and the need to maintain expensive processes to identify students at risk of withdrawal.

Although data mining is widely used by large private sector companies, my impression is that it has not been used extensively in the education sector, with the possible exception of educational research. I have a general interest in data mining, but I am particularly interested in how data mining and predictive analytics can be applied in education. One area where I feel they have particular relevance is student retention. As a result, I have started to investigate whether data mining could help institutions understand student retention and, crucially, whether it can help them identify the students most at risk of withdrawal.

Student Retention

The Higher Education Academy EvidenceNet report on retention describes retention in the following way:

Student retention refers to the extent to which learners remain within a higher education institution, and complete a programme of study in a pre-determined time-period.

A wide range of terms is used in both the UK and internationally to describe retention and its opposite. Some tend to emphasise what might be termed the student dimension, e.g. ‘persistence’, ‘withdrawal’ and ‘student success’. By contrast, others focus on the place (e.g. retained within an institution) or the system (e.g. graduation rates) and then the responsibility shifts to either the institution or government.

Measuring Student Retention

In the UK, there are two agreed measures of retention, defined by the Higher Education Funding Council for England (HEFCE):

  • Completion rate – the proportion of starters in a year who continue their studies until they obtain their qualification, with no more than one consecutive year out of higher education.
  • Continuation rate – the proportion of an institution’s intake which is enrolled in higher education in the year following their first entry to higher education.
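As a rough illustration of the continuation rate described above, the following sketch computes it for a single intake. The `Entrant` record, its field names and the sample data are all assumptions made for the example; HEFCE's official calculation involves further rules that are not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entrant:
    student_id: str
    entry_year: int            # year of first entry to higher education
    enrolled_years: frozenset  # every year in which the student was enrolled in HE


def continuation_rate(entrants, entry_year):
    """Proportion of the entry_year intake enrolled in HE in the following year."""
    cohort = [e for e in entrants if e.entry_year == entry_year]
    if not cohort:
        raise ValueError(f"no entrants recorded for {entry_year}")
    continued = sum(1 for e in cohort if entry_year + 1 in e.enrolled_years)
    return continued / len(cohort)


# Made-up records, for illustration only.
intake = [
    Entrant("s1", 2010, frozenset({2010, 2011, 2012})),
    Entrant("s2", 2010, frozenset({2010})),        # left during the first year
    Entrant("s3", 2010, frozenset({2010, 2011})),
    Entrant("s4", 2010, frozenset({2010, 2012})),  # not enrolled in 2011
    Entrant("s5", 2011, frozenset({2011, 2012})),  # different intake, ignored
]

rate_2010 = continuation_rate(intake, 2010)
print(f"Continuation rate for 2010 entrants: {rate_2010:.0%}")
```

Note that the figure can only be calculated once the following year's enrolment data exists, which is why a measure like this describes the past rather than predicting the future.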

These measures give us ways to quantify student retention and to see whether retention rates are improving or getting worse. However, both measures are largely retrospective. They tell us what retention rates were, but they do not enable us to be proactive in managing student withdrawal.

What would be really useful then is to be able to understand the factors that influence student retention, and use this understanding to help predict where we are likely to have issues. If we could predict which courses or groups of students are likely to have retention issues, or, perhaps, identify which individual students are most likely to withdraw, this would help us to be much more proactive in managing retention. With restricted public finances squeezing the further and higher education sector, this insight would enable institutions to maximise funding as well as ensure that pastoral support is being given to the right students.

Factors Influencing Student Retention

A large body of research has been carried out into student retention, and it generally agrees that poor retention is linked to low levels of academic and social integration among students. The following are examples of factors which have been shown to relate to these areas. (Source: Higher Education Academy EvidenceNet report, Student retention and success: a synthesis of research)

  • Students are not adequately prepared for higher education, especially academically
  • Students who leave higher education often find that the programme they have enrolled in does not meet their expectations or they are simply on the wrong course.
  • Students do not feel integrated into the social environment of the institution – i.e. the extent to which students feel that they “fit in”
  • From an academic perspective, performance, personal development, academic self-esteem, enjoyment of subjects and identification with one’s role as a student all contribute to a student’s overall sense of integration into the university.

To a lesser extent, the following issues can affect retention:

  • Lack of money and concern about debt both adversely affect retention. Most studies, however, confirm that finance is not the main reason why students withdraw.
  • Studies show that personal circumstances (such as mental and physical health problems, or having to care for a relative or dependant) are relevant factors for some students, but they are not as significant as is sometimes assumed.

Whilst it is recognised that students who are highly integrated academically and socially are more likely to persist and complete their degrees, these concepts are abstract and difficult to measure. Many institutions carry out surveys to try to measure different facets of academic and social integration. However, these are expensive and time-consuming to carry out.

Application of Data Mining to the Student Retention Problem

Data mining uses algorithms with predictive capabilities to find patterns and correlations in underlying data sets. It is already widely used across the private sector, where customer churn analysis (the identification of customers who are at risk of leaving a company) is the activity most closely related to the problem of student retention. Churn analysis matters to companies because the cost of retaining a customer is far less than that of acquiring a new one. There has also been a large amount of research into using data mining to identify students at risk of withdrawal, and this research shows that data mining can be applied successfully to the problem.

The key benefit of applying data mining to this problem is that there are often multiple complex factors which influence a student's likelihood of withdrawing. Data mining enables us to analyse historical data sets at an institution, identify the combination of factors which are most closely correlated with student withdrawal and build a model which allows us to predict the likelihood that an individual student will withdraw in the future. In addition, it allows us to use data which changes over time (the so-called "activity" data) to see whether there are indications of an increased risk of withdrawal. Together these give us a powerful way to understand retention and a proactive way to manage retention issues.
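To make the idea of building a model from historical data a little more concrete, here is a minimal sketch using logistic regression, which is just one of many possible techniques and is not necessarily the one a real solution would use. The two factors (attendance rate and a normalised entry tariff), the historical records and the function names are all invented for illustration.

```python
import math

# Made-up historical records, for illustration only:
# (attendance rate, normalised entry tariff, withdrew: 1 = yes, 0 = no)
HISTORY = [
    (0.95, 0.8, 0), (0.90, 0.6, 0), (0.85, 0.7, 0), (0.80, 0.5, 0),
    (0.60, 0.4, 1), (0.50, 0.6, 1), (0.40, 0.3, 1), (0.30, 0.5, 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, learning_rate=1.0, epochs=5000):
    """Fit logistic regression weights by batch gradient descent."""
    n_features = len(rows[0]) - 1
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for *features, withdrew in rows:
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - withdrew  # predicted risk minus actual outcome
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def withdrawal_risk(model, features):
    """Estimated probability of withdrawal for one student's factor values."""
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


model = train(HISTORY)
low_attendance = withdrawal_risk(model, (0.35, 0.4))
high_attendance = withdrawal_risk(model, (0.90, 0.7))
print(f"Low attendance student:  {low_attendance:.2f}")
print(f"High attendance student: {high_attendance:.2f}")
```

With real institutional data there would be far more records and factors, and the model would need to be validated on students it had not seen during training before anyone relied on its predictions.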

Benefits to HE Institutions and Individual Students

Use of such a mining model would provide a higher education institution with the following benefits:

  • An understanding of the factors which influence student retention (courses, periods of time, students from a certain background etc.)
  • An understanding of how data which changes over time may influence a student's risk of withdrawal (for example, their VLE usage)
  • A prediction of the risk that an individual student will withdraw from their course
  • The ability to intervene earlier with high-risk students, and to design and implement appropriate intervention programmes
  • A way to assess which intervention programmes make a positive difference to student retention
  • An understanding of the impact of student withdrawal on funding
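The point in the list above about data which changes over time, such as VLE usage, could be approached in many ways. One simple possibility, sketched here under invented assumptions, is to compare a student's recent weekly VLE logins with their own earlier baseline. The function name, the window sizes, the 50% threshold and the login counts are all hypothetical.

```python
from statistics import mean


def engagement_drop(weekly_logins, baseline_weeks=4, recent_weeks=2, threshold=0.5):
    """Return True if recent activity is below a share of the student's own baseline."""
    if len(weekly_logins) < baseline_weeks + recent_weeks:
        return False  # not enough history to judge
    baseline = mean(weekly_logins[:baseline_weeks])
    recent = mean(weekly_logins[-recent_weeks:])
    if baseline == 0:
        return False  # no earlier activity to compare against
    return recent < threshold * baseline


# Made-up weekly VLE login counts, for illustration only.
vle_logins = {
    "s1": [12, 10, 11, 13, 12, 11],  # steady usage
    "s2": [9, 10, 8, 9, 3, 1],       # sharp recent fall
    "s3": [5, 6],                    # too little history
}

flagged = [sid for sid, logins in vle_logins.items() if engagement_drop(logins)]
print(flagged)
```

Comparing each student against their own baseline, rather than against a fixed number of logins, avoids flagging students who simply use the VLE less than their peers but whose behaviour has not changed.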

Providing individual students with an indication of their risk may also have a number of benefits:

  • Allowing students to recognise earlier whether patterns in their learning indicate a problem, and helping them to take responsibility for resolving it and to be more proactive in doing so
  • Enabling them to chart their academic engagement and see what impact it has on their success on their course

The proof is in the pudding

My colleagues will tell you I have a particular fondness for cakes and puddings (which is true!).

You don’t know that something tastes good until you try it. So, over the next few months, I’ll be working on a prototype to test whether data mining can be used to support student retention. In particular, I’ll be looking to build a solution which aggregates multiple indicators of risk and presents them in a way which helps institutions make the right decisions about risk of withdrawal. Also on my radar is how we can build a solution which adds value for institutions beyond the standard data mining tools that are available, such as SPSS. At present, data mining tools require users to have a deep understanding of the different types of data mining algorithm, of how to prepare data for mining and of how to interpret the mining results. These requirements present significant barriers to institutions looking to use data mining. My research will look at how we can package up flexible data mining processes which are geared towards the education sector and which simplify adoption and implementation.
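To give a flavour of what aggregating multiple indicators of risk might look like, here is a minimal sketch which combines several indicator scores into one weighted score and a traffic-light band. This is not the prototype itself: the indicator names, the weights and the band thresholds are all invented for the example, and a real solution would need to derive them from evidence.

```python
def overall_risk(indicators, weights):
    """Weighted average of indicator scores, each expected in the range 0 to 1."""
    total_weight = sum(weights[name] for name in indicators)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(score * weights[name] for name, score in indicators.items())
    return weighted / total_weight


def risk_band(score, amber=0.4, red=0.7):
    """Translate a numeric score into a band that is easy to present."""
    if score >= red:
        return "red"
    if score >= amber:
        return "amber"
    return "green"


# Hypothetical weights and indicator scores, for illustration only.
WEIGHTS = {"model_prediction": 0.5, "vle_engagement_drop": 0.3, "missed_assessments": 0.2}

students = {
    "s1": {"model_prediction": 0.2, "vle_engagement_drop": 0.0, "missed_assessments": 0.0},
    "s2": {"model_prediction": 0.8, "vle_engagement_drop": 1.0, "missed_assessments": 0.5},
    "s3": {"model_prediction": 0.5, "vle_engagement_drop": 0.0, "missed_assessments": 1.0},
}

bands = {sid: risk_band(overall_risk(ind, WEIGHTS)) for sid, ind in students.items()}
for sid, band in bands.items():
    print(sid, band)
```

Presenting a band rather than a raw score is one way of hiding the detail of the underlying algorithms from the people who need to act on the result.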

I’ll post further updates as I progress.

In the meantime, if you work for an institution in the Further or Higher Education sector or are interested in what we will be doing, I’d love to hear your thoughts – so please, drop me a line!


Higher Education Academy EvidenceNet Report: Student retention and success: a synthesis of research.
