Header image: a histogram of roughly evenly distributed data (left) and the stages of a coin flip, from launch to landing in an open hand (right).

Data Science Project Releases First Learning Pathway: Summary Statistics and Probability

Data science is a highly interdisciplinary field that has become increasingly relevant in the job market, education, and society at large. With this in mind, LabXchange's Data Science–Driven Science Education (DSDSE) project has been diligently working to create two clusters of brand-new learning resources that apply data science to real-world challenges in biotechnology and climate change.

Aimed at high school educators and learners, the resources within each cluster have been carefully designed to align with Next Generation Science Standards (NGSS) and AP standards, ensuring that they are highly relevant to secondary school curricula across the United States and beyond.

The first cluster, Data Science in Biotechnology, teaches data science concepts through the lens of modern biotechnology problems (including disease testing and clinical trials) in a series of five learning pathways containing engaging and interactive learning resources. We are excited to announce that the first of these learning pathways, Summary Statistics and Probability, is now live on LabXchange!

Below, learn more about the first pathway and check out a handful of the new learning resources that feature in it.

Summary Statistics and Probability

The Summary Statistics and Probability pathway covers fundamental concepts in statistics and probability that are related to data science. These include the characterization of distributions, summary statistics, variability in data, the sum and product rules, conditional probability, Bayes' theorem, and more.

Learning Objectives

In this pathway, students will learn to:

  1. Interpret histograms as a tool for representing data (including variation and uncertainty).
  2. Explain how histograms can be used to represent discrete or continuous distributions.
  3. Define basic summary statistics (mean, median, etc.), calculate them, evaluate which are most suitable for characterizing a given dataset, and describe a distribution using summary statistics.
  4. Explain the differences among variance, standard deviation, and standard error, as well as interpret the meaning of error bars.
  5. Rewrite elements of written scenarios using probability notation and language.
  6. Calculate the probability that dependent or independent events occur together.
  7. Employ Bayes' theorem to demonstrate the effects of different relevant conditions on the probability of an event.
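Objective 4 distinguishes three closely related measures of spread. As a rough illustration (not material taken from the pathway itself), the minimal Python sketch below computes all three for a small, made-up sample using the standard-library `statistics` module. The sample values are hypothetical, and the sketch assumes the usual sample variance with an n − 1 denominator.

```python
import math
import statistics

# Hypothetical sample of five measurements, invented for illustration only
sample = [4.0, 6.0, 5.0, 7.0, 3.0]

# Sample variance: average squared distance from the mean, using an n - 1 denominator
variance = statistics.variance(sample)

# Standard deviation: square root of the variance, back in the data's original units
std_dev = statistics.stdev(sample)

# Standard error of the mean: how much the sample mean itself would vary
# from sample to sample; it shrinks as the sample size grows
std_error = std_dev / math.sqrt(len(sample))

print(f"variance={variance:.3f}, sd={std_dev:.3f}, se={std_error:.3f}")
```

For this sample the variance is 2.5, so the standard deviation is about 1.58 and the standard error is about 0.71. Error bars on a chart may show either the standard deviation or the standard error, which is why it matters to check which one a figure uses. For population (n denominator) versions, `statistics.pvariance` and `statistics.pstdev` can be used instead.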

Featured Resources

This pathway includes thirteen learning resources of various types, including videos, texts, infographics, question sets, and interactives. In addition, a number of resources within this pathway are accompanied by our newest resource type, the worksheet, which gives educators direct access to a printable worksheet designed to enhance students' engagement and learning with each resource.

Intro to Data Science and Data Literacy (video)

In this video, Dr. Xiao-Li Meng and Dr. Joseph Blitzstein, professors at Harvard University and principal investigators on the DSDSE project, introduce the field of data science and explain the importance of data literacy in the modern world. From advertising and current events to careers and education, data plays an increasingly important role in our daily lives.

Mean vs Median vs Mode (infographic + worksheet)

What are mean, median, and mode, and when is each best used? This infographic defines and provides examples of these three commonly encountered measures of central tendency.
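To make the comparison concrete, here is a minimal Python sketch (not drawn from the infographic) that computes all three measures for a small, hypothetical dataset with one unusually large value, using the standard-library `statistics` module.

```python
import statistics

# Hypothetical dataset, invented for illustration; note the single large value (20)
values = [2, 3, 3, 3, 4, 5, 5, 6, 9, 20]

mean = statistics.mean(values)      # arithmetic average; pulled upward by the outlier
median = statistics.median(values)  # middle of the sorted data (average of the two middle values here)
mode = statistics.mode(values)      # most frequently occurring value

print(mean, median, mode)
```

Here the mean is 6, the median is 4.5, and the mode is 3. The single outlier drags the mean above most of the data, while the median barely moves, which is one reason the median is often preferred for skewed distributions.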

How to Interpret Disease Screening Test Results (scrollable interactive + worksheet)

On the heels of the COVID-19 pandemic, the topic of disease testing is more pertinent than ever. Does a positive result for a disease screening test always mean that someone has the disease? Does a negative result always mean that someone does not? In this scrollable interactive, explore how Bayes' theorem can help us to interpret disease screening test results.
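The core calculation behind that question can be sketched in a few lines. The Python example below applies Bayes' theorem to a hypothetical screening test; the prevalence, sensitivity, and specificity values and the function names are illustrative assumptions, not figures or code from the interactive.

```python
def prob_disease_given_positive(prevalence: float, sensitivity: float, specificity: float) -> float:
    """P(disease | positive result), computed with Bayes' theorem."""
    true_pos = sensitivity * prevalence               # P(+ | disease) * P(disease)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(+ | no disease) * P(no disease)
    return true_pos / (true_pos + false_pos)


def prob_no_disease_given_negative(prevalence: float, sensitivity: float, specificity: float) -> float:
    """P(no disease | negative result), computed with Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)         # P(- | no disease) * P(no disease)
    false_neg = (1 - sensitivity) * prevalence        # P(- | disease) * P(disease)
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Hypothetical test: 95% sensitivity and 95% specificity, for a disease affecting 1% of people
    ppv = prob_disease_given_positive(0.01, 0.95, 0.95)
    npv = prob_no_disease_given_negative(0.01, 0.95, 0.95)
    print(f"P(disease | positive)    = {ppv:.3f}")
    print(f"P(no disease | negative) = {npv:.4f}")
```

With these assumed numbers, only about 16% of people who test positive actually have the disease, because false positives from the large healthy population outnumber true positives from the small affected one. A negative result, by contrast, is correct more than 99.9% of the time.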

About the Data Science–Driven Science Education Project

With generous support from the U.S. Department of Defense (DoD) STEM, the DSDSE project was launched in 2023 as an ambitious initiative aimed at educating the next generation of learners on the incredible real-world importance of data science and data literacy.

The project will provide sustainable, long-term data science resources for high school students and educators that will augment national digital literacy by integrating data science with existing high school STEM curricula and building educator capacity to confidently lead students in data science explorations.

Learn more in the initial DSDSE project announcement and visit the DSDSE project page to stay up to date on what's coming next.

Written by
Chris Burnett
Digital Content Specialist
