Collaborative Research: Exploring the Mathematics of Biological Ecosystems using Data Science (EMBEDS)

Lead Staff:
Andee Rubin
Gillian Puttick


Data science practitioners increasingly use interactive software tools to explore and visualize data, whether to communicate, to explain, to model, or to make predictions. Yet the tools that data scientists use, and their reasons for using them, remain obscure to most students. Students are generally expected to read and interpret data and are not presented with the possibility of delving deeper. Furthermore, opportunities to interpret data are seldom presented in the context of meaningful inquiry into real-world phenomena.

While the current standards for computer science and technology education recognize the importance of data practices, they do not specify what effective practices could or should be for developing student competence. We conjecture that data practices focused on real-world phenomena might best be learned in the context of disciplinary learning.

The EMBEDS project is investigating the potential of “data excursions” for developing high school students’ competencies with data practices and modeling within an existing, phenomenon-based science curriculum focused on the Serengeti ecosystem. Integrating CODAP, a free web-based data analysis and modeling tool designed for students in grades 6–14, the data excursions will allow students to interact with existing datasets drawn from primary research.

Research Activity

Students will query the contexts and sampling methodology scientists used to gather the data in the original scientific studies. In addition, they will explore ways to aggregate and represent the data to explain population changes of large animals on the Serengeti in Tanzania. In a design-based research study, we will ask: How can working with real datasets, creating computational representations, and using them to reason about ecosystem phenomena support students’ development of data science literacy?

EMBEDS data chart

The project is a collaboration between TERC and the University of Colorado, Boulder, and the team brings together expertise in biology and data science. We are conducting our mixed-methods design study in collaboration with high school teachers from two districts, gathering evidence that can inform subsequent efficacy research.


Our data excursions will be integrated into a free, widely used and recognized high school biology curriculum, ensuring reach to classrooms in 24 states. The research results may serve as a guide for other science curriculum developers who aspire to a similar integration of data science concepts and practices into their materials.

While data science is increasingly relevant and widespread, there is still much to be learned about the issues that arise when data is made a central part of a high school science class. How can the goals of both data education and science education be realized simultaneously? Our project is investigating best practices for positioning investigations of extant datasets centrally in NGSS-aligned science curricula.


Penuel, W., Rubin, A., Henson, K., Puttick, G., and Deverel-Rico, C. (2023). A Teaching Routine for Working with Existing Data in Science Classrooms. Proceedings of the International Society of the Learning Sciences Conference 2023, Montreal, Canada.

Deverel-Rico, C., Penuel, W., Rubin, A., Puttick, G., and Henson, K. (2023). Supporting Alignments in Scientific Activity: Moving Across Question, Evidence and Explanation. Proceedings of the International Society of the Learning Sciences Conference 2023, Montreal, Canada.

Rubin, A., Penuel, W., Puttick, G., Henson, K., and Deverel-Rico, C. (2023). Data-ing in the Context of High School Science. International Workshop on Statistical Reasoning, Thinking and Literacy (SRTL), Maleny, Australia.