1 Introduction: Data science in education – you’re invited to the party!


This chapter welcomes readers and gives an overview of the book, its themes, and its structure. It walks through what it means to learn about data science in education and the complex nature of education systems. The chapter describes how we clarify the data scientist role through a series of walkthroughs that introduce foundational skills for those new to learning R, run analysis on student perceptions of learning and student performance data, and describe how to use aggregate data.

Dear Data Scientists, Educators, and Data Scientists who are Educators:

This book is a warm welcome and an invitation. If you’re a data scientist in education or an educator in data science, your role isn’t exactly straightforward. This book is our contribution to a growing movement to merge the paths of data analysis and education. We wrote this book to make your first step on that path a little clearer and a little less scary.

Whether you’re a data scientist using your skills in an education job or an educator who wants to learn data science skills, we invite you to read this book and put these techniques to work in the real world. We think that your work in the education community will help decide how education and data science come together going forward.

1.1 Learning data science in education

Over the coming chapters, we’ll be learning together about what data science in education can look like. But to understand why we were compelled to write about the topic, we need to talk about why data science in education is not such a straightforward thing.

Learning data science in education is challenging because there isn’t yet a universal vision for the data scientist’s role in education. That role isn’t straightforward because education itself isn’t straightforward. If education were a building, it would be multi-storied with many rooms. There are privately and publicly funded schools. There are more than 18 possible grade levels. Students can learn alone or with others in a classroom.

This imaginary building we call education also has rooms most residents never see—rooms where business and finance staff plan the most efficient use of limited funds. The transportation department plans bus routes across vast spaces. University administrators search for the best way to measure career readiness. Education consultants study how students perform on course work and even how they feel about class materials.

There are a lot of ways one could do data science in education, but building consensus on ways one should do data science in education is just getting started. The “data science in education” community is still working out how it all fits together.

And for someone just getting started, it can all seem very overwhelming.

Even if we did have perfect clarity on the topic, there would still be the issue of helping education systems learn to leverage these new analytical tools. In many education settings, school administrators and their staff may never have worked with someone who deeply understands education, knows how to write code, and uses statistical techniques all at once, which is one way data science in education could be defined (Conway, 2010).

1.2 Making the path a little clearer

As data science in education grows, the way we talk about and conceptualize it also needs to grow; doing so can help us advance data science in education as a discipline and speak to the unique opportunities and concerns that arise with analyzing data in our domain.

We begin this book by offering a primer for data science in education, including a discussion of unique challenges and foundational skills in the programming language R. The primer consists of this chapter, suggestions for how to use this text (Chapter 2), our definition of the process of data science and what it “looks like” in terms of who does data science and how they do it (Chapter 3), and a discussion of data science in education in the context of the wider fields of both education and data science (Chapter 4).

Next, you’ll take what you’ve learned and apply it in our data analysis in education walkthroughs. The walkthroughs in this book are our contribution towards a more example-driven approach to learning. They’re meant to make the ambiguous path of learning data science in education a little clearer by way of recognizable and actionable demonstrations.

These examples fall into four different themes, with chapters applying to each theme:

  • Build a foundation to use R and RStudio
  • Analyze student perceptions of learning
  • Analyze student performance data
  • Get value from publicly available data

We’ll end the book by discussing how to bring data science skills into your education job, with strategic considerations for applying data science in your work (Chapter 15), an overview of teaching data science (Chapter 16), and chapters on learning more (Chapter 17) and additional resources (Chapter 18).

We hope after reading this book you’ll feel like you’re not alone in learning to do data science in education. We hope your experience with this book is the right balance of challenging and fun. Finally, we hope you’ll take what you learned and share it with others who are looking to start this journey.

1.3 Conventions used in the book

The following typographical conventions are used in this book:

  • Package names are surrounded by curly brackets: {caret}
  • Function names are in constant width, followed by parentheses: clean_names()
  • Variable names are in constant width: var1