Statistics education research, Data science education, Machine learning


Data science is a new field of research that has attracted growing interest in recent years because it focuses on turning raw data into understanding, insight, knowledge, and value. New data science education programs, which are being launched at an increasing rate, are designed for multiple education levels and populations. Machine learning (ML) is an essential element of data science that requires an extensive background in mathematics. Although it is possible to teach the principles of ML only as a black box, novice learners might find it difficult to improve an algorithm’s performance without a white-box understanding of the underlying ML algorithms. In this paper, we suggest a pedagogical method, based on hands-on pen-and-paper tasks, to support white-box understanding of ML algorithms for learners who lack the level of mathematics knowledge required for this purpose. Data were collected using a comprehension questionnaire and analyzed according to the process-object theory borrowed from mathematics education research. We present evidence of the effectiveness of this method based on data collected in an introductory data science course for graduate psychology students. This population had extensive psychology domain knowledge, as well as an established background in statistics, but had gaps in mathematical and computer science knowledge compared with data science majors. The research contribution is both practical and theoretical. Practically, we present a learning module that supports non-major data science students’ white-box understanding of ML. Theoretically, we propose a data analysis method to evaluate students’ conceptions of ML algorithms.

Author Biography

ORIT HAZZAN, Technion - Israel Institute of Technology

Professor Orit Hazzan has been a faculty member at the Technion’s Department of Education in Science and Technology since October 2000. Her research focuses on computer science, software engineering, and data science education.

