Food Data Warehousing

Food Data Warehouse (ETL + Star Schema + Tableau Dashboards)

Saquib Hazari

πŸ“… Sep 25, 2025

GitHub

Follow me >
πŸ› οΈ Skills & Technologies Used

PostgreSQL SQL ETL Data Warehousing Data Modeling Tableau Data Analysis Python CSV

🏠 Food Data Warehouse

This project showcases an end-to-end data warehousing solution built using SQL Server and follows the Medallion Architecture (Bronze, Silver, Gold). It simulates a retail food business by consolidating synthetic ERP and CRM data sources into a robust, analytics-ready star schema.

Designed as a portfolio project, it demonstrates real-world practices in data engineering, ETL, data modeling, and analytical querying, serving as a powerful resource for data-driven decision-making.


βš™οΈ Architecture

The warehouse follows the Medallion Architecture:

πŸ“₯ Bronze Layer

  • Raw ingestion from CRM and ERP systems in .csv format
  • No transformation applied
  • Stored in raw form for traceability

πŸ”„ Silver Layer

  • Cleansed, standardized, and transformed data
  • Data quality issues resolved
  • Structured for integration

✨ Gold Layer

  • Star schema modeled for analytics
  • Fact table: fact_sales
  • Dimensions: dim_customers, dim_products
  • Optimized for reporting and business insights
Data Architecture

Architecture

Data mart design

Data Mart

Data Integration

Data integration


πŸ“– Project Goals

  • Ingest and unify data from multiple sources (ERP + CRM)

  • Implement ETL pipelines for each schema layer

  • Design a star schema for analytics

  • Generate valuable business insights using SQL

  • Visualize insights using Tableau dashboards

    Dashboard:

    Dashboard

πŸ“Š Business Insights

  • 75% of customers used online payment, while 25% preferred cash.
  • Achieved $3.32M in total sales last year with an average order value of $386/month.
  • Phoenix led in customer volume, while New York City had the least.
  • Most sold cuisines: Chinese and South Indian, showing consistent monthly demand.

🧱 Gold Layer Tables

Table Name Description
dim_customers Customer demographics and profile
dim_products Product attributes and categories
fact_sales Transactions and order metrics

Example: Star Schema

        dim_customers
               |
               |
          fact_sales
               |
               |
         dim_products

πŸ”Έ Schema: gold

βœ… Purpose

  • This schema represents the final curated layer used for reporting and dashboarding.
  • Tables follow a star schema model, with surrogate keys, dimensional descriptors, and additive fact measures.

πŸ§ͺ EDA Highlights

  • βœ… All tables follow dimensional modeling best practices.
  • βœ… Surrogate keys are in place for all tables.
  • ⚠️ Apply foreign key checks manually (PostgreSQL doesn’t enforce FKs across layers unless specified).
  • ⚠️ Validate business logic (e.g., totals, date order, null distributions) using SQL profiling.

πŸ“ Next Steps

  1. Run null checks on key descriptive fields.
  2. Validate relationships between fact and dimensions.
  3. Standardize text-based fields (category, channel, engaged, payment_method).
  4. Optionally, create automated data tests using:
    • dbt for integrity tests
    • Great Expectations for profiling and validation
    • Custom SQL scripts for business rule assertions

🧠 This markdown serves as an EDA summary for documentation and QA. Use it as a living checklist for data engineers and analysts to validate your gold layer.


Repository Structure

.
β”œβ”€β”€ products.csv                   # Product data
β”œβ”€β”€ sales.csv                      # Sales transaction data
β”œβ”€β”€ License                        # Project license (e.g., MIT)
β”œβ”€β”€ Procedures/                    # ETL procedures for data loading
β”‚   β”œβ”€β”€ bronze_layer_batch.sql     # ETL batch script for Bronze layer
β”‚   β”œβ”€β”€ main.sql                   # Main SQL script for the project
β”‚   └── silver_layer_batch.sql     # ETL batch script for Silver layer
β”œβ”€β”€ project/                       # Project-related files and analysis
β”‚   └── Business_analysis/         # Business logic & insight generation
β”‚       β”œβ”€β”€ advance_analysis.sql   # Advanced analysis SQL script
β”‚       └── complex_analysis.sql   # Additional complex analysis script
β”œβ”€β”€ assets                         # Project images and visuals
β”‚   β”œβ”€β”€ Integration_food_dataset.png
β”‚   └── integration_model.png
β”œβ”€β”€ Business_Analysis              # SQL analysis for business insights
β”‚   β”œβ”€β”€ complex_analysis_1.sql
β”‚   └── complex_analysis_2.sql
β”œβ”€β”€ CSV                             # Raw data files
β”‚   β”œβ”€β”€ dim_customer.csv
β”‚   β”œβ”€β”€ dim_products.csv
β”‚   └── fact_sales.csv
β”œβ”€β”€ EDA                             # Placeholder for exploratory data analysis
β”œβ”€β”€ food_DB                         # Source database files (CRM & ERP data)
β”‚   β”œβ”€β”€ crm_customers.csv
β”‚   β”œβ”€β”€ crm_feedback.csv
β”‚   β”œβ”€β”€ crm_marketing.csv
β”‚   β”œβ”€β”€ erp_order_items.csv
β”‚   β”œβ”€β”€ erp_orders.csv
β”‚   β”œβ”€β”€ erp_payments.csv
β”‚   └── food_products.csv
β”œβ”€β”€ Gold                            # SQL for Gold Layer tables and views
β”‚   β”œβ”€β”€ god_layer_tables.sql
β”‚   └── gold_layer_view.sql
β”œβ”€β”€ License                         # Licensing information
β”œβ”€β”€ procedures                      # Data transformation batch processes
β”‚   β”œβ”€β”€ bronze_layer_batch.sql
β”‚   └── silver_layer_batch.sql
β”œβ”€β”€ python                          # Python scripts for data processing
β”‚   └── main.py
β”œβ”€β”€ readme.md                       # Project overview and setup instructions
β”œβ”€β”€ requirements.txt                # Python packages needed for the project
β”œβ”€β”€ script                          # SQL DDL scripts for database schema
β”‚   β”œβ”€β”€ ddl_bronze.sql
β”‚   β”œβ”€β”€ ddl_gold_layer.sql
β”‚   β”œβ”€β”€ ddl_silver_layer.sql
β”‚   └── index.sql
β”œβ”€β”€ todo.todo                       # Project task tracking file


πŸͺͺ License

This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.

About Me

Hi there! I’m a data science professional with a background in aeronautical engineering, now transitioning into marketing analytics. With a strong foundation in analytical thinking and data-driven decision-making, I’ve worked on a range of data science projects, including predictive modeling, ETL pipelines, and data warehousing.

Currently, I focus on:

  • Building and optimizing ETL pipelines.
  • Working with large-scale databases and performing advanced data analysis.
  • Using tools like SQL, Python, and Tableau to extract actionable insights from data.

I enjoy exploring new technologies in machine learning, AI, and deep learning. My approach is hands-on, and I love learning by doing β€” whether it’s building machine learning models or fine-tuning complex data pipelines.

Current Projects:

  • Developing a predictive model to understand customer behavior.
  • Implementing an advanced data pipeline to optimize data workflows.
  • Working on various marketing analytics projects to drive business decisions.

Feel free to connect or check out my previous projects! Data warehouse project

Connect with Me

LinkedIn Twitter Discord Website Gmail

Technologies
SQL
PostgreSQL
Python
Tools
Compatibility
macOS
Windows