Podcast
Teaching Open Science & Reproducible Research: Research Software Engineering Practices for Academia
Original Episode
Use these links for the canonical episode and media sources.
- Open the original DataTalks.Club podcast page
- Watch on YouTube
- Listen on Spotify
- Listen on Apple Podcasts
Episode Overview
How do you teach reproducible research and practical research software engineering (RSE) skills to neuroimaging students and researchers? In this episode, Johanna Bayer, a psychologist-turned-computational neuroscientist and open science advocate who is completing a PhD in machine learning for clinical neuroimaging at the University of Melbourne, walks through concrete approaches to teaching these skills. We cover course design: Carpentries-style curricula, Git introductions, and reproducible manuscripts.
Chapter Summary
Use these checkpoints to decide whether to open the source transcript.
- 0:00 - Podcast Introduction
- 1:08 - Guest Background: Johanna Bayer — Psychology to Machine Learning in Neuroimaging
- 2:24 - Academic Journey: Studies in Germany, Zurich and Move to Melbourne
- 5:27 - Teaching Open Science: Intro to Git, Homework Support and Course Structure
- 7:39 - Carpentries & Structured Beginner Curriculum for Reproducible Research
- 8:30 - Open Science Curriculum: Reproducible Manuscripts with Embedded Code
- 10:52 - Guided Onboarding to Open Source: Small Repos, Pull Requests & Turing Book
- 12:10 - What RSE Means: Software-Focused Research Outputs and Practices
- 14:10 - Academic RSE Roles: PhD Students, Methods Papers and Toolboxes
- 16:36 - Software as Research Output: DOIs, Toolboxes and Publishing Code
- 17:10 - Culture Change in Labs: Convincing Supervisors & Grassroots Hackathons
- 20:05 - Industry Lessons for Academia: Programming Expectations & Tool Adoption
- 22:12 - Experiment Tracking in Research: MLflow and Reproducibility Tools
- 22:16 - Barriers to Teaching Software Skills: Time, Expertise and Fear of Scrutiny
- 23:54 - Infrastructure Gaps: Hosting Interactive Reproducible Papers and Costs
- 27:38 - Core Coding Practices to Teach: Packaging, Environments, Formatting & Tests
- 28:18 - Learning by Doing: Brainhack, Hackathons, Community Contributions
- 30:44 - Formal Courses vs Self-Learning: Structure, Discipline and Freelancing
- 33:04 - Collaboration & Code Review: Working Alone vs Community Feedback
- 36:05 - Benefits of Open Code: Citations, Collaboration and Career Visibility
- 37:01 - Data Sharing Reality: “Data Upon Request”, Access Controls and Consortia
- 38:50 - Project Case Study: Normative Brain Model — Folder Structure & Cookiecutter
- 39:27 - Applied Engineering Practices: Branching, Formatting, Versioning & MLflow
- 42:22 - Sensitive Data Practices: De-identification and Controlled Access
- 45:24 - Balancing Open Source, Hackathons and Full-Time Research Commitments
- 47:42 - Discovering Projects: GitHub Trending, Social Media & Community Platforms
- 49:46 - Contributing to Repositories: Readme, Contributing Guides, Issues & Communication
- 52:22 - Open Publishing vs Industry IP: Academic Openness and Commercial Concerns
- 55:12 - Recommended Resources: The Turing Way, The Carpentries & ML Solutions Handbook
- 58:03 - Episode Conclusion and Closing Remarks