9 min read

Top Snowflake Interview Questions for Data Engineers

Nasrul Hasan
Nasrul Hasan
Nasrul Hasan
Cover Image for Top Snowflake Interview Questions for Data Engineers

1. Explain Snowflake Architecture?

Snowflake uses a fully decoupled architecture consisting of three independent layers: Storage, Compute, and Cloud Services. The Storage layer holds all data in compressed, columnar structures, internally broken into micro-partitions. Compute is provided through Virtual Warehouses, which can scale independently and execute SQL, transformations, and data loading. Each warehouse is isolated, meaning workloads don’t impact each other. The Cloud Services layer coordinates authentication, metadata management, caching, query optimization, billing, and infrastructure orchestration. Unlike traditional warehouses, Snowflake’s compute and storage scale independently, allowing you to increase compute for performance without increasing storage cost. This architecture supports concurrency, elasticity, and automatic management without exposing infrastructure.

2. Explain Micro Partition in snowflake

Snowflake stores data in micro-partitions, which are immutable storage blocks typically ranging from 50 MB to 500 MB of compressed data. They are stored in columnar format and contain rich metadata, such as min/max values per column, bloom filters, and clustering information. Snowflake’s pruning engine uses this metadata to skip entire micro-partitions during query execution, reducing I/O and improving performance. Micro-partitions get automatically optimized, compacted, and reclustered over time without manual intervention. You cannot directly control micro-partition size, but clustering keys influence how Snowflake organizes data for efficient pruning.

3. What are stages in snowflake?

External stages let Snowflake reference files stored in external cloud storage like S3. They are used for loading, unloading, or querying external data. You typically create a storage integration for secure credentials, then define a stage pointing to an S3 bucket.

CREATE OR REPLACE STAGE my_csv_stage
  URL = 's3://company-ingestion-raw/customer/'
  CREDENTIALS = (
      AWS_KEY_ID = 'AKIAxxxxxxxxxxxxxxx'
      AWS_SECRET_KEY = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
  )
  FILE_FORMAT = (
      TYPE = 'CSV'
      FIELD_DELIMITER = ','
      SKIP_HEADER = 1
  );
n1ql

4. What is Time Travel in snowflake

Time Travel allows you to retrieve historical data from Snowflake tables for up to 90 days (depending on your edition). It lets you query previous versions of data or restore mistakenly dropped tables. Time Travel works because Snowflake retains old micro-partitions and metadata for the retention window.

SELECT * FROM customers AT (OFFSET => -3600);     -- 1 hour ago
SELECT * FROM orders BEFORE (STATEMENT => 'UUID'); 
RESTORE TABLE customers TO BEFORE (STATEMENT => 'UUID');
pgsql

5. What is a Failsafe in snowflake

Fail-safe is a 7-day period after Time Travel expires during which Snowflake can recover data for operational emergencies. Unlike Time Travel, Fail-safe is not user-accessible — only Snowflake support can restore data. Its purpose is long-term protection, not analytics. You should rely on Time Travel or cloning for recoveries, not Fail-safe.

6.How do you dynamically mask data in snowflake

Masking policies in Snowflake enforce dynamic data masking at query time, based on user roles. You can mask PII fields such as email or phone based on who is querying the table.

CREATE MASKING POLICY mask_email AS
  (val STRING) RETURNS STRING ->
    CASE
      WHEN CURRENT_ROLE() IN ('ADMIN_ROLE') THEN val
      ELSE '***MASKED***'
    END;

ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY mask_email;
pgsql

7. What are snowpipes in snowflake

Snowpipe is Snowflake’s serverless continuous ingestion service, used to automatically load data from S3/GCS/Azure as soon as files arrive. It listens to events from cloud storage and triggers ingestion automatically, without maintaining warehouses. Snowpipe loads data micro-batch style and relies heavily on metadata to avoid duplicate ingestion.

CREATE PIPE my_pipe
AUTO_INGEST = TRUE
AS
COPY INTO raw_sales
FROM @my_s3_stage
FILE_FORMAT = (TYPE = PARQUET);
oxygene

8. What are streams in snowflake?

Streams allow Snowflake to capture CDC-style changes (insert/update/delete) to a table. A stream keeps a change log, enabling incremental processing. When any row is updated on streamed table it capture additional columns what changed like METADATA$ACTION, METADATA$UPDATE.

When used with tasks or Snowpipe, you can build fully automated pipelines.

CREATE STREAM customer_stream ON TABLE customers;

SELECT * FROM customer_stream;
pgsql

9. What are some warehouses are available in snowflake and how do you decide which is best?

In Snowflake, warehouses are compute clusters used to run queries. They come in different sizes—from X-Small, Small, Medium, Large, XL, 2XL, up to 6XL. You can also create multiple warehouses for different workloads, such as ETL/ELT, BI/reporting, data science, or ad-hoc queries.

When deciding which warehouse is best, you look at:

  • Workload intensity: Larger warehouses for heavy transformations or large joins; smaller ones for light queries.

  • Concurrency needs: More users or processes may require a bigger warehouse or multi-cluster mode for auto-scaling.

  • Cost vs. performance: Larger warehouses run faster but cost more per second, so you balance speed and budget.

  • Query pattern: If performance is inconsistent due to queueing, scale up or enable multi-cluster; if queries are simple, scale down.

10 What are Materialised view in snowflake?

Materialized View (MV) in Snowflake is a stored, precomputed copy of a query result. Instead of recalculating the query each time, Snowflake maintains the MV so queries run much faster—especially for aggregations, joins, and filters on large tables.

Refresh Frequency:

  • In Snowflake, materialized views refresh automatically and continuously—there is no fixed frequency like daily or hourly.

  • Refresh happens as soon as underlying table data changes, using Snowflake’s background services.

  • Manual refresh is not required.

CREATE MATERIALIZED VIEW sales_mv AS
SELECT 
    customer_id,
    SUM(amount) AS total_amount,
    COUNT(*) AS txn_count
FROM sales
GROUP BY customer_id;
n1ql