Abstract: Longitudinal social surveys are widely used to understand which factors enable or constrain access to higher education. One such data resource is the Next Steps survey, comprising an initial sample of 16,122 pupils aged 13–14 attending English state and private schools in 2004, with annual follow-up to age 19–20 and a further survey at age 25. The Next Steps data are a potentially rich resource for studying inequalities of access to higher education. They contain a wealth of information about pupils’ social background characteristics—including household income, parental education, parental social class, housing tenure and family composition—as well as longitudinal data on aspirations, choices and outcomes in relation to education. However, as with many longitudinal social surveys, Next Steps suffers from a substantial amount of missing data due to item non-response and sample attrition, which may seriously compromise the reliability of research findings. Helpfully, the Next Steps data have been linked with more robust administrative data from the National Pupil Database (NPD), which contains a more limited range of social background variables but comparatively little missing data due to item non-response or attrition. We analyse these linked datasets to assess the implications of missing data for the reliability of Next Steps. We show that item non-response biases the apparent socioeconomic composition of the Next Steps sample upwards, and that this bias is exacerbated by sample attrition, since participants from less advantaged social backgrounds are more likely to drop out of the study. Moreover, by the time it is possible to measure access to higher education, the socioeconomic background variables in Next Steps have very little explanatory power after controlling for the social background and educational attainment variables contained in the NPD.
Given these findings, we argue that longitudinal social surveys with substantial missing data are reliable sources of data on access to higher education only if they can be linked effectively with more robust administrative data sources. This in turn raises the question: why not simply use the more robust datasets?
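The kind of bias check described above can be illustrated with a small sketch. The idea is that an administrative variable observed for everyone (here, a simulated indicator of disadvantage, loosely analogous to free school meal eligibility in the NPD) can be compared between the full linked sample and the subset who respond to the survey. All variable names and numbers below are illustrative assumptions, not the actual Next Steps or NPD fields.

```python
# Hypothetical sketch: using a fully observed administrative indicator to
# quantify non-response bias in a survey sample. Data are simulated; the
# column names and rates are illustrative, not real NPD/Next Steps values.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Administrative indicator of disadvantage, observed for every pupil.
fsm = rng.random(n) < 0.3

# Survey response is simulated so that disadvantaged pupils respond less
# often (50% response) than advantaged pupils (80% response).
responded = rng.random(n) > np.where(fsm, 0.5, 0.2)

df = pd.DataFrame({"fsm_eligible": fsm, "responded": responded})

# Compare the composition of the full linked sample with respondents only.
full_rate = df["fsm_eligible"].mean()
respondent_rate = df.loc[df["responded"], "fsm_eligible"].mean()

print(f"Disadvantage rate, full linked sample: {full_rate:.3f}")
print(f"Disadvantage rate, respondents only:  {respondent_rate:.3f}")
# Respondents look more advantaged than the full sample, i.e. the apparent
# socioeconomic composition of the survey is biased upwards.
```

Because the administrative indicator has no missingness, the gap between the two rates directly measures how much differential non-response shifts the apparent composition of the survey sample.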