Season 2, episode 1 of the DataTalks.Club podcast with Eugene Yan
Today we’re discussing technical writing, logging, documentation, and more. Our special guest is Eugene Yan. Eugene works at the intersection of machine learning and product, building pragmatic ML systems while writing and speaking about effective data science, ML in production, and career growth.
Here’s what we covered:
Q: Before we dive into technical writing, can you tell us about your career journey? How did you transition into data science and tech?
Eugene: It’s a bit unusual. I graduated about ten years ago with a psychology degree and spent a few years in investment policy. I didn’t enjoy it—I was writing a lot of contracts and agreements—and wanted to work more with data. I took 20-30 MOOCs, interviewed, and was fortunate to join IBM. While there, I entered a product-classification competition; my team placed in the top 3% and we shared our approach at a meetup. An e-commerce startup with a similar problem was in the audience and invited me to lunch to discuss solutions across multiple languages (Vietnamese, Indonesian, English). The next day they offered me a job. I wanted startup experience, so I joined as their third data scientist. That startup was Lazada. A few years later, Alibaba acquired us. I then moved to a healthtech startup (which didn’t work out), and now I’m an Applied Scientist at Amazon in Seattle, working on recommendations and ML systems. So if anyone says Kaggle is a waste of time—well, I was very lucky, but it helped me.
Q: You’re known for publishing regularly on your blog. When did you start writing publicly, and what motivated you to begin?
Eugene: My first post was in September 2015—a report for DataKind, an NGO project accelerator. They asked for volunteer writers; I volunteered because I wanted to practice writing. That leads to the “why.” Early on, I interviewed data scientists and leaders I admired—people two to three steps ahead—and asked, “What skills make someone effective?” I expected answers like domain expertise, ninja hacking, or PhD-level research. Instead, ~80% said: communication—writing and speaking with non-technical stakeholders so things actually happen. I thought they were kidding, but I tried for a year: any chance to write, I wrote; I also gave my first meetup talk (on that competition). Many good things happened, so I kept going.
Q: Writing consistently takes a lot of time and effort, and the rewards aren’t always immediate. What keeps you motivated to publish regularly?
Eugene: I reflected on this at the end of 2020 and realized my main reason is to share. I publish how I cleaned Amazon data, how I deployed an API, summaries of Georgia Tech OMSCS classes (people asked which courses and professors were good), and conference notes to spread learnings. When others find it useful, it motivates me. I also write to learn—you think you understand something until you try to explain it. Writing exposes gaps and consolidates knowledge. I’ve used this for topics like NLP surveys (from RNNs to BigBird), data discovery platforms, and real-time recommendations. Finally, I write to be a lighthouse—to broadcast what I’m interested in so like-minded people can find me, discuss, and learn together. Publishing draws thoughtful feedback and conversations—even with people I’ve long admired—which is humbling and energizing.
Q: You mentioned writing as a “lighthouse” and having specific readers in mind. Who is your target audience?
Eugene: The lighthouse is a signal—I’m broadcasting what I’m interested in, so if you’re interested in the same things, you can find me and we can talk. It’s about attracting like-minded people.
I write for three groups: myself (so I benefit even if no one reads), my wife (to explain what I do), and my current and future teammates and like-minded peers (people with technical or ML backgrounds who want to level up). Some pieces get less traction—like “writing vs. coding”—but I still publish them because they matter to that audience.
Q: Do you approach each article like you’re building a product?
Eugene: Absolutely. If readers don’t “get it,” the UX is bad; if there’s no substance, the backend is weak. I ship weekly to accumulate reps: 50+ “product” iterations a year. In five to ten years, I hope to deliver the same clarity in five hours instead of twenty.
Q: Let’s talk about your actual writing process. How do you go from idea to published article?
Eugene: I’ll start with the wrong way I used to think about it: sit at the computer, type, and something beautiful appears; it must be 100% original and 100% useful. That mindset makes writing nearly impossible. Now I remind myself I write to share, to learn, and to be a lighthouse. Writing doesn’t start with writing—it starts with reading, thinking, and note-taking. It doesn’t have to be perfectly original or useful.
When an idea pops up, on a walk or from a tweet, I drop the title into my notes. That backlog has ~50 topics. Then I follow a simple weekly cadence: each post gets seven days, and I publish at the end of the week no matter what state it’s in.
The key is iterating on outlines, not prose. I’m slow with sentences and wordsmithing, so I spend 50–70% of the time on the outline where I focus only on ideas.
Q: You emphasize iterating on outlines rather than prose. Can you explain your outline technique—what does it look like, and why do you rewrite it from memory?
Eugene: Section headers plus a topic sentence for each paragraph, with supporting evidence bullets. In later stages, the outline is essentially the content—just not in paragraph form. That makes prose a straightforward translation step.
The memory rewriting is key—it’s like telling a friend what a book was about: you recall only the most important parts. This filters for the strongest arguments. Then I compare with the previous outline to see if I missed any truly essential points. If I couldn’t remember it, maybe it wasn’t essential. I don’t do this every day—if I didn’t finish the outline, I continue. If I finished but I’m not satisfied, I rewrite from memory. If I am satisfied, I start prose earlier and give myself a lighter weekend.
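The comparison step described above can be sketched in a few lines of Python. This is only an illustration, not Eugene's actual tooling: the function name `compare_outlines` and the sample outline points are hypothetical, and it assumes each outline point is a short sentence that is matched by exact text.

```python
def compare_outlines(previous: list[str], recalled: list[str]) -> dict[str, list[str]]:
    """Compare the previous outline with the one rewritten from memory.

    Assumption: points are matched by exact text, which is a simplification.
    """
    previous_set = set(previous)
    recalled_set = set(recalled)
    return {
        # Points that survived the rewrite from memory (likely the strongest).
        "kept": [p for p in previous if p in recalled_set],
        # Points that were not recalled: check whether any is truly essential.
        "forgotten": [p for p in previous if p not in recalled_set],
        # Points that only appeared in the rewrite.
        "new": [r for r in recalled if r not in previous_set],
    }


# Hypothetical outline points, for illustration only.
previous = [
    "Writing starts with reading and notes",
    "Iterate on outlines, not prose",
    "Ship every Sunday",
    "Choose a blogging platform",
]
recalled = [
    "Iterate on outlines, not prose",
    "Writing starts with reading and notes",
    "Ship every Sunday",
    "Compare outline versions side by side",
]

result = compare_outlines(previous, recalled)
print("Forgotten:", result["forgotten"])
print("New:", result["new"])
```

The "forgotten" list is the one to review by hand: anything in it either gets restored because it was essential, or dropped because it was not.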
Q: You publish weekly, which seems like a fast pace. How much time do you actually spend writing each article?
Eugene: I’m actually a slow writer. Roughly: 1–2 hours each weekday morning (~7 hours), weekends ~13 hours, plus ~5 hours for final editing—about 25 hours a week. Iterating on outlines makes that feasible.
Q: With all that editing and iteration, is there a point where you can over-edit and actually make a post worse?
Eugene: Yes. My hard stop is Sunday night—I must ship. Over-editing can regress quality. That’s why I iterate on outlines, not prose. I compare outline versions side by side: which tells the story more clearly? For prose, I’ll rewrite a paragraph above the old one and compare. My success metric: is it shorter while preserving all key ideas? If the new version is longer and muddier, I revert. Think of it like regression testing.
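To make the regression-testing analogy concrete, here is a minimal sketch of the "shorter while preserving all key ideas" check. It assumes key ideas can be approximated by short phrases that must still appear in the text, which is a crude stand-in for a human read. The names `keeps_key_ideas` and `accept_rewrite` and the sample sentences are hypothetical.

```python
def keeps_key_ideas(text: str, key_ideas: list[str]) -> bool:
    """Return True if every key idea (a short phrase) still appears in the text."""
    lowered = text.lower()
    return all(idea.lower() in lowered for idea in key_ideas)


def accept_rewrite(old: str, new: str, key_ideas: list[str]) -> bool:
    """Accept the rewrite only if it is shorter and drops no key idea."""
    is_shorter = len(new.split()) < len(old.split())
    return is_shorter and keeps_key_ideas(new, key_ideas)


# Hypothetical paragraph versions, for illustration only.
old = (
    "Writing does not begin when you sit down to type; "
    "it really begins with reading, thinking, and taking notes."
)
new = "Writing starts with reading, thinking, and notes."
ideas = ["reading", "thinking", "notes"]

if accept_rewrite(old, new, ideas):
    print("Keep the new version")
else:
    print("Revert to the old version")
```

As with a regression test, the check does not say the new version is good, only that it has not lost anything the old version had.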
Q: Where do you get ideas for what to write about, and with a backlog of ~50 topics, how do you prioritize which one to write each week?
Eugene: Ideas come from three main sources:
For prioritization, my simple trick: readers of my site love ML, but I can’t (and don’t want to) write only ML every week. So I aim for one or two ML posts a month, then mix in pieces I feel are important—for example, why you shouldn’t default to online courses and what better learning paths look like. I picture specific readers—my teammates and mentees—and ask: what message would help them become more effective data scientists? That becomes the priority.
Q: Do you aim for a specific article length? Many people say attention spans are short nowadays.
Eugene: I’ve heard ~1,500 words (≈10 minutes) is “ideal,” but I don’t optimize for a number. I write just enough to communicate the message. Some posts are 600–1,000 words; others (e.g., real-time recommendations) need 4–5k to preserve the big picture. I assume my core readers are genuinely interested and will read or revisit as needed. I still try to be as concise as courtesy allows—and weekly deadlines help constrain scope.
Q: How do you approach titling your articles?
Eugene: Like naming functions: the clearest title that no one can misunderstand. By Sunday night I rarely have energy for clever titles, so clarity beats clickbait. That said, framing for the audience matters—e.g., “The Importance of Writing in a Tech Career” beats “Blogging and Technical Writing” because it speaks to why the audience should care.
Q: For someone who wants to start writing but hasn’t yet, what should they do? Should they find their niche first or just start writing?
Eugene: Start writing. Brutal truth: almost no one will read your first post, so write like nobody’s reading. You’re writing for yourself—to practice. What to write? What you’re thinking about now—work, gardening, recipes—anything. Don’t obsess over “finding your niche” first; you discover it by writing a lot and then looking back at what resonated.
It’s useful to have a niche, but you won’t find it before doing reps. Patrick McKenzie (patio11) wrote hundreds of pieces before recognizing his sweet spot at the intersection of engineering and marketing. You connect the dots retrospectively. Also, writing the same narrow thing forever is boring; I can’t do ML every week. I rotate topics that matter to me and my audience (career, process, ML production, etc.).
Q: What blogging platform or tools do you recommend for someone just starting out?
Eugene: Use whatever is easiest so you remove friction. Writing is already hard, so don’t burn cycles on CI/CD and theme yak-shaving. Start with the lowest setup: Medium, Substack, or WordPress. I began on WordPress; when I needed more customization, I moved to Jekyll on GitHub Pages, which I still use, but only after publishing 50–60 posts. The tool is the least important decision; hitting “publish” weekly is the most important.
Q: You work full-time and write a lot. How do you fit writing into your schedule?
Eugene: From 2017 to 2019 I was doing an online MSCS part-time alongside work, which took 20–40 hours a week. After graduating, I redirected that energy into writing: 1–2 hours every morning as a daily habit (like exercise or meditation). Saturdays I hammer out the prose; Sundays I edit. You don’t have to spend that much; start with short snippets, even 500 words. Tweets and threads can work too, though the constraints can make them harder.
Q: What’s your advice for growing a blog and building an audience? Is there a secret to success?
Eugene: No secret sauce. I just try to be transparent about my process. If anything, the “iterate on the outline” approach is my closest thing to a secret.
For distribution, I publish, then share a short note on Twitter and repost it on LinkedIn. That’s it. Over time, like-minded people find and share it. I’ve hit the Hacker News front page a few times—often when a post invites thoughtful disagreement. But the real lever is consistency: last year I “shot 55 arrows”; a few hit. Consistency beats tricks.
Q: Let’s talk about writing at work—documentation, design docs, etc. Why is internal writing important, and how does Amazon’s “working backwards” approach work?
Eugene: Writing lets you test ideas at scale before coding. At Amazon, we use the press release (PR/FAQ) before a project: does this excite customers (and internal stakeholders)? If yes, we write a design doc with requirements, latency, throughput, cost, training and serving plans, and trade-offs—then circulate it for feedback.
The “working backwards” process means starting from the customer problem and working back to the solution. The initial press release tests whether it’s worth investing. Writing is central: Amazon largely avoids slides; documents scale your message without you in the room.
Documentation also serves your future self and your team: decision logs (e.g., why DynamoDB over Redis, why Flink over Spark), code notes, and rationales you’ll forget six months later.
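A decision log can be as light as one structured entry per choice. The sketch below is one possible shape, not Amazon's or Eugene's format: the `Decision` class, its fields, the date, and the placeholder rationale are all hypothetical, and the actual reasons would come from your own project.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Decision:
    """One decision-log entry: what was chosen, over what, and why."""

    decided_on: date
    chosen: str
    rejected: str
    rationale: str

    def to_line(self) -> str:
        """Render the entry as a single line suitable for a text log."""
        return (
            f"{self.decided_on.isoformat()}: chose {self.chosen} "
            f"over {self.rejected} because {self.rationale}"
        )


# Hypothetical entry; the date and rationale are placeholders, not real reasons.
entry = Decision(
    decided_on=date(2021, 1, 15),
    chosen="DynamoDB",
    rejected="Redis",
    rationale="<record the actual trade-off here>",
)
print(entry.to_line())
```

Keeping the rationale next to the choice is the point: six months later the entry answers "why did we do this?" without anyone having to remember.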
Q: What about writing for a technical portfolio, like documenting a Kaggle competition or personal project? What should a good README include?
Eugene: Think of the hiring manager. Clear quick start instructions, requirements, and a repo tour: where to find data prep, training, validation, and serving. Show you can document as well as code. A concise decision log (assumptions, trade-offs) helps your credibility.