
DataTalks.Club

Snowflake: The Definitive Guide

by Joyce Kay Avila

The book of the week from 23 Jan 2023 to 27 Jan 2023

Snowflake’s ability to eliminate data silos and run workloads from a single platform creates opportunities to democratize data analytics, allowing users at all levels within an organization to make data-driven decisions. Whether you’re an IT professional working in data warehousing or data science, a business analyst or technical manager, or an aspiring data professional wanting to get more hands-on experience with the Snowflake platform, this book is for you.

You’ll learn how Snowflake users can build modern integrated data applications and develop new revenue streams based on data. Using hands-on SQL examples, you’ll also discover how the Snowflake Data Cloud helps you accelerate data science by avoiding replatforming or migrating data unnecessarily.

You’ll be able to:

  • Efficiently capture, store, and process large amounts of data at an amazing speed
  • Ingest and transform real-time data feeds in both structured and semistructured formats and deliver meaningful data insights within minutes
  • Use Snowflake Time Travel and zero-copy cloning to produce a sensible data recovery strategy that balances system resilience with ongoing storage costs
  • Securely share data and reduce or eliminate data integration costs by accessing ready-to-query datasets available in the Snowflake Marketplace

Questions and Answers

Joyce Kay Avila

This has been a wonderful experience to interact with you all! Please feel free to connect with me via LinkedIn at https://www.linkedin.com/in/joycekayavila/
Thank you Francis Terence Amit and Alexey Grigorev for this incredible opportunity. I hope you’ll have me back next year when my new O’Reilly book (“Hands-On Salesforce Genie: Implementing and Managing a Real-Time Customer Data Platform”) is released.

Alexey Grigorev

Thanks for being with us this week!

Alejandro Marmol

I’m so Happy! Thank you so much! I will do my best to read it!!!!! 😊

Philip Dießner

Hello Joyce Kay Avila, thanks for being here! Would this book make sense for somebody who has some on-prem DWH experience but not so much in the cloud, but who wants to branch out a bit? Or are there resources I should look at first?

Joyce Kay Avila

I think learning about Snowflake makes great sense for somebody who has some on-prem DWH experience but not so much in the cloud (but wants to branch out). If you understand the basics of DWH and are proficient in SQL, you should be ready to start learning Snowflake.
If you want some knowledge of cloud data warehousing before you start learning Snowflake, there is a good resource that I’ve found that will only take you a few hours to digest: https://resources.snowflake.com/ebooks/cloud-data-warehousing-for-dummies
It is, of course, going to be a bit biased toward Snowflake as the choice for a cloud data platform but the content is really good and easy to read and understand.
Best of luck to you Philip Dießner (author) and Brent Brewington (interested party)!

Philip Dießner

Thanks a lot for the detailed answer! That sounds like a very good entry point, I will look into it.

Julia

Hi Joyce Kay Avila, will the gift book have your original signature? 😊

Liliana

Hi Joyce Kay Avila thank you so much for answering our questions. I’ve been thinking of studying for the Snowflake core certification. Does your book cover all of the topics for the exam? What other resources would you recommend to study along with your book while preparing?

Joyce Kay Avila

Liliana, you’re welcome! It has been my pleasure.
I have had several people tell me that my book has been helpful in passing the Snowflake core certification exam. That said, I’d recommend you take a look at the official Snowflake study guide to make sure you are focusing on the necessary topics to pass the exam. Your first stop should be: https://www.snowflake.com/certifications/ where you can then find details about any of the Snowflake certifications you want to know more about.

Shalltear

First question: I create a table and add comments to each column. Now I build a view referring to this table, but the comments are not available in this view. I could add the comments to the view as well, but I don’t want to do that. I want the view to inherit the comments from the table. Is that possible?

Joyce Kay Avila

Whenever someone instant messages me on LinkedIn with a very specific technical question such as this, I point them to the Snowflake forums. That way, the questioner can get the benefit of responses from the entire Snowflake community to help solve unique issues. And you can also interact with the Snowflake product experts who monitor the forums; it’s important that they know what questions users are having so that they can improve documentation and/or add new functionality that is frequently requested but not yet available. I encourage you Shalltear to get involved with the Snowflake community to get answers to any specific technical questions such as this and to also help answer questions that others may have for which you have the answers. You can participate in the community forums at https://community.snowflake.com/s/

Shalltear

Thanks. Spoke to Snowflake directly this morning. Turns out, right now that’s not possible. They are working on it.

Low Kim Hoe

Hi, I want to ask: which is better, “Snowflake on cloud” or “Snowflake Data Cloud”?

Joyce Kay Avila

Great question Low Kim Hoe! In the early days (2015), Snowflake’s original positioning was “The Data Warehouse Built for the Cloud”. While Snowflake was hugely successful with its data warehouse offering, there was so much more that was possible with Snowflake beyond data warehousing. Thus, in 2019, Snowflake branded itself as THE “Data Cloud”. This rebranding helped call out Snowflake’s ability to federate the data and the fact that Snowflake was built from the ground up to be a data sharing platform (the actual workload is now called the “Data Collaboration” workload). Therefore, to answer your question, it is preferred to use the term “Snowflake Data Cloud”.

Low Kim Hoe

Noted on this. Thanks for your speedy reply!😁

Laura

Hey, my question: in the last months and years, there has been a rivalry between Snowflake and Databricks. Do you think this will continue until there is a “winner” or do you think they will focus on their original areas of expertise again?

Joyce Kay Avila

That is a really tricky question to answer, Laura! Snowflake and Databricks are both great products, depending on the use case, so I don’t think there will ever be just one “winner”. And it is not unusual to see clients who have both Snowflake and Databricks, for a variety of reasons. However, I have a preference for Snowflake (probably no big surprise to you!) over Databricks. The general consensus, and what I’ve experienced, is that Snowflake is easier to get up and running right away and overall costs are lower with Snowflake when you consider the specialized (and costly) skillset needed to implement and manage a Databricks project.

Joyce Kay Avila

As far as the future, I think both Snowflake and Databricks will continue to evolve, so I don’t know that we’ll see a return to their original areas of expertise. But it will be interesting to revisit this conversation in a year (or two!)

Stacey Kellough

Thank you for the excellent guide to Snowflake, Joyce Kay Avila!!

Stacey Kellough

A little bit of an off-topic question, but do you have any advice for data professionals/folks with more of a technology-focused background on getting a better understanding of the business value of data and being able to speak more to that side? I don’t know if there’s an easy answer to this beyond getting an MBA, becoming a CPA, or making a career change, but I was curious to get your take on this. 🙂 Thank you so much

Joyce Kay Avila

Actually, I can provide you an on-topic answer to the question Stacey Kellough!
One thing I’d recommend is to read about some of the trends about the business value of data that you find interesting. For example, I’m really excited about Data Monetization. Snowflake has some really great resource materials on the topic, especially because of Snowflake’s advanced data collaboration (data sharing) features which make data monetization possible.
Here are a few Snowflake resources for Data Monetization that I’d recommend:
(1) Introduction to Monetizing Data: https://www.snowflake.com/guides/monetizing-data
(2) Modern Data Monetization Strategies:
https://www.snowflake.com/resource/modern-data-monetization-strategies/
(3) Blog post articles written by Jennifer Belissent

Julia

Hi Joyce Kay Avila, I’m so excited to see this book. I always prefer holding a paper book in my hands to reading an e-book. The smell of a new book is amazing 😊👍 My question is: how many hours did you spend working on this book?

Joyce Kay Avila

The one really nice thing about an electronic copy is that you can do quick and easy searching. However, I, too, really like having a copy to put my hands on and to refer to Julia!

Joyce Kay Avila

As far as the number of hours spent working on the book, I didn’t keep track. Not sure exactly why I didn’t but maybe because it was something that I was doing that I was really passionate about and not something that I was counting on for the financial compensation. Maybe I just didn’t think it would take me as long as it did so I didn’t track my hours in the beginning. Not really sure.
Interestingly, in the beginning, my O’Reilly editor and I agreed on a schedule and I thought that there was no way it was going to take me 15+ months to bring an O’Reilly book to life and boy was I wrong. Initially, I was ahead of schedule but lots of things happened. For one thing, I originally wrote the book using the classic UI because the new Snowsight UI wasn’t generally available yet. So, then I had to go back and rewrite major sections to incorporate Snowsight. I just couldn’t, in good conscience, leave things as is with the old user interface. I’m really glad that I took the time to do that but it meant that I finished the book exactly on schedule, not before.
As a first-time author I had no idea everything that was involved. My editor and I laugh about it now. I’ve just started work on my second book and I have a much more realistic idea of what it’s going to take to get it across the finish line this time.

Julia

Awesome😊👍 I agree with you: if you are passionate about what you are doing, you don’t count the time you spend on it.

Julia

Joyce Kay Avila, is it your first book?

Joyce Kay Avila

Yes, Julia, the O’Reilly Snowflake Definitive Guide was my first book! But, it’s certainly not my last one! I’ve signed up to write the O’Reilly book titled “Hands-on Salesforce Genie” which (fingers-crossed) will be on bookshelves the early part of 2024.

Buvan

Hi Joyce Kay Avila, Snowflake is easy to manage in a small environment. Once the environment is big (i.e., 100s of DBs and PBs of data), it’s difficult to manage as a platform admin because of role/grant explosion (i.e., thousands of grants). How do you see this, and how can it be managed easily (with minimal effort)?

Joyce Kay Avila

Excellent question, Buvan!
There is no one right answer to this question. Without knowing more details about your architecture and role requirements, I’ll provide a high level response with some suggestions.
There are (1) things you can use within Snowflake to help streamline and optimize role management, (2) processes you can put in place, and (3) tools that you might want to consider.
You might want to consider designating some databases as having open access to make it clear which data, if any, is available to all. While this prevents interruption to data consumers’ access, it could pose a security and compliance risk.
Some organizations have had success in simplifying management by defining only a limited number of roles for most databases. One example would be to have a Read Only and a Creator role per database.
Using role hierarchies is also a great way to create an abstraction layer where roles can gain access to other roles’ privileges through inheritance. If you are not currently using role hierarchies, this is definitely one thing I’d recommend researching to see if it is appropriate for your use case.
When user / role management is an activity that requires a dedicated resource or team, you could consider using ServiceNow or Salesforce Case functionality where users can request new user accounts and/or roles. This also allows you to document requests and have an approval process in place.
Also, you could take a look at Dataops.live which can be used to help manage a large number of roles. See this article for more details: https://www.dataops.live/blog/security-considerations-ina-dataops-world
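The role hierarchy suggestion above can be sketched with a small toy model. This is only an illustration of how inheritance reduces the number of grants, not Snowflake code: the `RoleGraph` class, the role names (`SALES_READ_ONLY`, `ANALYTICS_ENGINEER`, and so on), and the privilege labels are all hypothetical, and the labels are simplified strings rather than real grant syntax. The one assumption taken from Snowflake is that a role granted to another role passes its privileges up to that role.

```python
from collections import defaultdict


class RoleGraph:
    """Toy model: a role gains the privileges of every role granted to it."""

    def __init__(self):
        self.direct = defaultdict(set)   # role -> privileges granted directly
        self.granted = defaultdict(set)  # parent role -> roles granted to it

    def grant_privilege(self, privilege, role):
        self.direct[role].add(privilege)

    def grant_role(self, child, parent):
        # Mirrors the idea of granting one role to another role:
        # the parent inherits everything the child can do.
        self.granted[parent].add(child)

    def effective_privileges(self, role):
        # Walk the hierarchy downward, collecting privileges along the way.
        seen, stack, result = set(), [role], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            result |= self.direct.get(current, set())
            stack.extend(self.granted.get(current, set()))
        return result


def setup_database_roles(graph, database):
    """Create a Read Only and a Creator role for one database (hypothetical naming)."""
    reader = f"{database}_READ_ONLY"
    creator = f"{database}_CREATOR"
    graph.grant_privilege(f"SELECT ON {database}", reader)
    graph.grant_privilege(f"CREATE TABLE ON {database}", creator)
    graph.grant_role(reader, creator)  # Creator inherits Read Only
    return reader, creator


graph = RoleGraph()
for db in ("SALES", "FINANCE"):
    setup_database_roles(graph, db)

# A functional role is assembled from database roles instead of individual grants.
graph.grant_role("SALES_CREATOR", "ANALYTICS_ENGINEER")
graph.grant_role("FINANCE_READ_ONLY", "ANALYTICS_ENGINEER")

print(sorted(graph.effective_privileges("ANALYTICS_ENGINEER")))
```

In this sketch the functional role needs only two role grants, and any privilege later added to a database role reaches it automatically, which is the abstraction layer the answer describes.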

jjp

Hi - the book mentions design and deploying - does someone have to have an extensive data engineering background to get the most out of the book?

Joyce Kay Avila

There are two reasons why I don’t think you need an extensive background in data engineering to get the most out of the book jjp.
First, in comparison to other data cloud platforms, Snowflake is considered to be a much easier platform on which to design and deploy. Even if you are not a career data engineer, you should be able to understand most (if not all) of the concepts in the book, including the data engineering topics.
Second, the book was purposely designed so that each chapter stands on its own. That way, if there is some specific knowledge you want (that is not related to data engineering, for example) or need right away then you can tackle that chapter first without having to complete any previous chapter exercises. If, for example, you have a need to learn about Snowflake administration (i.e., user and role management) then you can gain a good understanding of that material. Later, you can dive into Snowflake data loading and unloading or other data engineering related topics.

jjp

Thank you!

Julia

How many pages does this book have? 📖

Joyce Kay Avila

The physical (paper) book goes up to page 442, including the index.

Lirone

Thanks for the book! I would like to know if inside the book, maybe at the beginning, you explain why to choose Snowflake over other data storage solutions? 🙂

Joyce Kay Avila

The book wasn’t really meant to provide a comparison between Snowflake and other solutions. The intent of the book is to inform and educate the reader about Snowflake, assuming the reader is someone who has already decided on Snowflake or who wants to know more about Snowflake to be able to decide if Snowflake is a good fit for them.
That said, there are many reasons why individuals and companies choose Snowflake. Data storage factors (for example, the amazing Snowflake micropartitions and the fact that Snowflake bills customers based on compressed data size) are one consideration. Other reasons include ease of use (powerful web UI capabilities, access to existing cloud data, time travel capabilities, simple role-based access, and more) and the existence of Snowflake workload capabilities such as data collaboration (data sharing), for which Snowflake is incredibly well known.
And it isn’t necessarily that Snowflake would be the only data storage solution selected although Snowflake supports data warehousing and data lake workloads, among others. In larger organizations, it is not uncommon to have Snowflake in addition to another different data storage solution. It really depends on the use case(s).

Joyce Kay Avila

Lirone Perhaps a few other reasons why Snowflake could be a good choice, too, for those interested in AI / ML — Snowpark and Snowflake’s acquisition of Myst are other excellent reasons! 🙂

Alejandro Marmol

Hi Joyce Kay Avila! I would like to know: when is the best time to apply Snowflake? What stage of data maturity do we need to have reached to apply it?

Joyce Kay Avila

Excellent question Alejandro Marmol! Many of the clients I see getting started with Snowflake are looking to improve their insights so they are not necessarily at the beginning stages of their data journey. That said, I don’t think there is necessarily a right or wrong time to apply Snowflake.
The really great thing about Snowflake is that if the use case lends itself to using Snowflake, you can get started right away. For example, if you already have data sources that exist in the cloud (in an S3 bucket, for example), you don’t even necessarily need to move the data into Snowflake to be able to surface and use the data for insights right away in Snowflake!
Another specific example I frequently see is clients who have their logic built into a BI tool which locks them into that specific BI tool; they’re looking to extract that logic from the BI tool and put it into Snowflake instead which allows them to have more of a “Bring Your Own BI Tool” (BYOT) approach.
Because Snowflake implementation and maintenance rely primarily on having some SQL knowledge, no unique or hard-to-find skillsets are needed which means it is really a good choice to get started with at any data maturity stage.

Alejandro Marmol

Ohhhhh! That’s interesting!!!😄 I will really be glad to enjoy that book!!

Roman Zabolotin

Hey Joyce Kay Avila. Thank you for the book, and for sharing your knowledge with us. What kind of technical skills or background do you think are necessary for someone to get the most out of the Snowflake platform and this book?

Joyce Kay Avila

You’re welcome. It’s been my pleasure to join you.
To answer your question, I’d recommend that you at least be familiar with relational databases and SQL to get the most out of the book, Roman Zabolotin. In addition, to take advantage of the vast product offerings that Snowflake has to offer, you’d probably want to be knowledgeable about data warehousing and know a little something about data engineering or at least be working with a team who has those skillsets. Some of the Snowflake workloads, such as Data Sharing / Data Collaboration, are likely going to be a new topic for most everyone, so most people won’t have a background in them.

Roman Zabolotin

Joyce Kay Avila I heard one can develop a new revenue stream with his data stored on Snowflake Cloud, is it true? And could you elaborate on that a little bit?

Joyce Kay Avila

You may be referring to the Snowflake Marketplace (a subset of the Data Collaboration workload) where data providers can monetize their data by offering it to the public. There also exists the functionality to create a Snowflake private data exchange to monetize data with a set of known partners. Snowflake is definitely the leader in the Data Collaboration (Data Sharing) space.

Buvan

https://www.snowflake.com/en/data-cloud/marketplace/ and https://www.snowflake.com/provider-policies/ are some good places to start. I think you can provide data as a data producer, and it can be used by other customers through Snowflake’s marketplace (it can be free or paid)… Example: https://www.snowflake.com/datasets/starschema-covid-19-epidemiological-data/ where they provided the COVID data sets. So, the customers don’t need to reinvent the wheel and can concentrate on the business, while these providers are responsible for the data. FYI

Roman Zabolotin

Joyce Kay Avila, does Snowflake have something like a free-tier period for people who want to learn the platform, or do they need to pay for it?

Joyce Kay Avila

Yes, Snowflake does offer a free 30-day trial – https://signup.snowflake.com/
Here is a tip for you Roman Zabolotin – Snowflake gives you a full 30-day trial where the time starts on the first day of the next month. So, for example, if you sign up for a free trial the first week of the month, you’ll get the remaining 3 weeks of the month plus the full next month. The free trial lasts for that 30+ day period or until you use $400 in usage credits, whichever comes first. So, you’ll want to be smart about how you use the Snowflake trial org so you don’t run out of usage credits. $400 is a large amount of usage credits, so it’s unlikely you’d run out, but I just wanted to call that out. The usage credits start from the first day that you create the trial account and go through the life of the trial account.

Roman Zabolotin

Thanks, I think it’s a very useful trick.

Đức Phạm

👋 Hello, team!

richard

hi,
I also found the book on the Snowflake company website, offered “Compliments of Snowflake”, but have not yet gone deep into it.
Would your definitive guide be the best start into the topic, and what would be a good next book?
Other books are:
  • Building the Snowflake Data Cloud, Monetizing and Democratizing Your Data (Apress)
  • Mastering Snowflake Solutions, Supporting Analytics and Data Sharing (Apress)
  • Jumpstart Snowflake: A Step-by-Step Guide to Modern Cloud Analytics (Apress)
  • Snowflake Access Control, Mastering the Features for Data Privacy and Regulatory Compliance (Apress)
  • Snowflake Essentials, Getting Started with Big Data in the Cloud (Apress)
  • Snowflake Cookbook: Techniques for building modern cloud data warehousing solutions (Packt)

To take part in the book of the week event:

  • Register in our Slack
  • Join the #book-of-the-week channel
  • Ask as many questions as you'd like
  • The book authors answer questions from Monday till Thursday
  • On Friday, the authors decide who wins free copies of their book

To see other books, check the book of the week page.
