IKEA Data Warehouse | End-to-End Data Engineering & BI Project
Saquib Hazari
Sep 25, 2025
Skills & Technologies Used
This repository showcases a comprehensive data warehousing and analytics solution, covering the end-to-end data pipeline: from data ingestion and transformation to business intelligence and insights generation.
Designed as a portfolio project, it demonstrates industry best practices in data engineering, ETL processes, and analytical modeling, enabling efficient data-driven decision-making. This project serves as a blueprint for building a scalable and structured data warehouse architecture that supports business intelligence and reporting.
The basic data architecture used for this project builds Bronze, Silver, and Gold layers.

This project involves data architecture design, ETL pipeline development, data modeling, and analytics and reporting.
This repository is an excellent resource for professionals and students looking to showcase expertise in SQL development, data engineering, ETL processes, data modeling, and data analytics.
Objective
Develop a modern data warehouse using PostgreSQL to consolidate sales data, enabling analytical reporting and informed decision-making.
Specifications
The ETL process is the backbone of this data warehouse project, ensuring seamless data ingestion, transformation, and integration from multiple sources into a structured analytical model. This project follows a layered Medallion Architecture (Bronze, Silver, Gold) to progressively refine raw data into a business-ready format for reporting and analytics.
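To make the layered refinement concrete, here is a hedged sketch of what one Silver-layer load step might look like. The table and column names below are illustrative only; the actual logic lives in the Procedures/ scripts (bronze_layer_batch.sql, silver_layer_batch.sql).

```sql
-- Hypothetical Silver-layer load: cleanse and standardize raw Bronze rows.
-- Table and column names are illustrative, not taken from this repository.
INSERT INTO silver.customers (customer_id, first_name, last_name, gender, load_date)
SELECT
    customer_id,
    TRIM(INITCAP(first_name)),          -- normalize name casing and whitespace
    TRIM(INITCAP(last_name)),
    CASE UPPER(TRIM(gender))            -- map raw codes to readable labels
        WHEN 'M' THEN 'Male'
        WHEN 'F' THEN 'Female'
        ELSE 'n/a'
    END,
    NOW()                               -- record when the row was loaded
FROM bronze.customers
WHERE customer_id IS NOT NULL;          -- drop rows missing the business key
```

The Gold layer then joins and aggregates these cleansed Silver tables into the dimensional model (dim_customers, dim_products, fact_sales) used for reporting.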
Objective
Develop SQL-based analytics to deliver detailed insights into customer behavior, product performance, and sales trends.
These insights empower stakeholders with key business metrics, enabling strategic decision-making.
Insights Delivered
This section includes powerful SQL queries to extract actionable insights from the Gold Layer tables: dim_customers, dim_products, and fact_sales.
1. Schema Check
-- List all base tables in the gold schema
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'gold' AND table_type = 'BASE TABLE';

-- Inspect the columns of a given gold table (substitute the table name)
SELECT
    column_name,
    data_type,
    is_nullable
FROM information_schema.columns
WHERE table_schema = 'gold'
  AND table_name = 'your_table_name';

-- Row counts per gold table
SELECT 'dim_customers' AS table_name, COUNT(*) AS row_count FROM gold.dim_customers
UNION ALL
SELECT 'dim_products', COUNT(*) FROM gold.dim_products
UNION ALL
SELECT 'fact_sales', COUNT(*) FROM gold.fact_sales;
2. Gold.dim_customers Table Check
-- Gender distribution
SELECT gender, COUNT(*) AS customer_count
FROM gold.dim_customers
GROUP BY gender;

-- Age distribution derived from date_of_birth
SELECT DATE_PART('year', AGE(date_of_birth)) AS age, COUNT(*) AS customer_count
FROM gold.dim_customers
GROUP BY age
ORDER BY age;

-- Customer count per preferred store
SELECT preferred_store, COUNT(*) AS customer_count
FROM gold.dim_customers
GROUP BY preferred_store
ORDER BY COUNT(*) DESC;
3. Gold.dim_products Table Check
-- Product count per category
SELECT category, COUNT(*) AS product_count
FROM gold.dim_products
GROUP BY category
ORDER BY COUNT(*) DESC;

-- Defective vs. non-defective products
SELECT defect_flag, COUNT(*) AS product_count
FROM gold.dim_products
GROUP BY defect_flag;

-- Top 10 suppliers by product count
SELECT supplier_name, COUNT(*) AS product_count
FROM gold.dim_products
GROUP BY supplier_name
ORDER BY COUNT(*) DESC
LIMIT 10;
4. Gold.fact_sales Table Check
-- Order volume per month
SELECT DATE_TRUNC('month', order_date) AS order_month, COUNT(*) AS order_count
FROM gold.fact_sales
GROUP BY order_month
ORDER BY order_month;

-- Average and maximum shipping time in days
SELECT
    AVG(ship_date - order_date) AS avg_shipping_days,
    MAX(ship_date - order_date) AS max_shipping_days
FROM gold.fact_sales;

-- Total net sales by order status
SELECT order_status, SUM(net_sales) AS total_sales
FROM gold.fact_sales
GROUP BY order_status;
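The monthly-order query above can be extended into a trend analysis with a window function. This is a hedged sketch, not a query from the repository's analysis scripts; it assumes only the order_date and net_sales columns already used above.

```sql
-- Monthly net sales with a running cumulative total.
-- Assumes order_date and net_sales in gold.fact_sales, as in the checks above.
SELECT
    DATE_TRUNC('month', order_date) AS order_month,
    SUM(net_sales) AS monthly_sales,
    SUM(SUM(net_sales)) OVER (
        ORDER BY DATE_TRUNC('month', order_date)
    ) AS running_total                  -- cumulative sales up to each month
FROM gold.fact_sales
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY order_month;
```

Nesting `SUM(SUM(net_sales))` is valid PostgreSQL here because the inner aggregate is computed per group first, and the window function then runs over the grouped result.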
5. Health Check
-- List all foreign-key constraints defined in the gold schema
SELECT
    conname AS constraint_name,
    conrelid::regclass AS table_from,
    a.attname AS column_from,
    confrelid::regclass AS table_to,
    af.attname AS column_to
FROM pg_constraint
JOIN pg_class ON conrelid = pg_class.oid
JOIN pg_attribute a ON a.attrelid = conrelid AND a.attnum = ANY(conkey)
JOIN pg_attribute af ON af.attrelid = confrelid AND af.attnum = ANY(confkey)
WHERE contype = 'f'
  AND pg_class.relnamespace::regnamespace::text = 'gold';
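Listing the constraints confirms they exist; a complementary health check is to look for orphaned fact rows that violate referential integrity in spirit. This is a hedged sketch: the surrogate-key column names (customer_key, product_key) are assumptions for illustration, not confirmed names from the DDL scripts.

```sql
-- Referential-integrity spot check: fact rows with no matching dimension row.
-- Key column names (customer_key, product_key) are assumed for illustration.
SELECT f.*
FROM gold.fact_sales f
LEFT JOIN gold.dim_customers c ON f.customer_key = c.customer_key
LEFT JOIN gold.dim_products  p ON f.product_key  = p.product_key
WHERE c.customer_key IS NULL
   OR p.product_key  IS NULL;          -- any row returned indicates an orphan
```

An empty result set means every fact row resolves to both dimensions.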
├── assets/                  # Static assets like diagrams or images
│   └── images/
│
├── CSV/                     # All CSV datasets
│   ├── Gold_csv/            # Final curated data (Gold Layer)
│   │   ├── dim_customers.csv
│   │   ├── dim_products.csv
│   │   └── fact_sales.csv
│   └── Ikea_sale/           # Raw synthetic ERP + CRM datasets
│       ├── customer_preferences.csv
│       ├── customers.csv
│       ├── order_items.csv
│       ├── orders.csv
│       ├── products.csv
│       └── sales.csv
│
├── License                  # Project license (e.g., MIT)
├── Procedures/              # ETL procedures for data loading
│   ├── bronze_layer_batch.sql
│   ├── main.sql
│   └── silver_layer_batch.sql
│
├── project/
│   └── Business analysis/   # Business logic & insight generation
│       ├── advance_analysis.sql
│       ├── business_analysis.sql
│       └── EDA.sql
│
├── python/                  # Python automation/data generation
│   └── main.py
│
├── readme.md                # Project documentation
├── requirements.txt         # Python package dependencies
│
├── scripts/                 # DDL scripts to define schema layers
│   ├── ddl_bronze_layer.sql
│   ├── ddl_gold_layer.sql
│   ├── ddl_silver_layer.sql
│   ├── keys_constraint.sql
│   └── main.sql
│
├── tests/                   # Placeholder for test SQLs or checks
│
├── todo.todo                # Notes or to-do list
│
└── venv/                    # Virtual environment files (excluded from version control)
This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.
Hi there! I'm Saquib Hazari. With a strong foundation in analytical thinking and data-driven decision-making, I've worked on a range of data science projects, including predictive modeling, ETL pipelines, and data warehousing.
Currently, I focus on data engineering, ETL pipelines, and end-to-end analytics projects like this one. I enjoy exploring new technologies in machine learning, AI, and deep learning. My approach is hands-on, and I love learning by doing, whether it's building machine learning models or fine-tuning complex data pipelines.
Feel free to connect or check out my previous projects!