Sales Insights - Architecture

Aim: Build a small Azure data platform that a hypothetical business could use to ingest, store and visualise its data. Following this, I’ll be looking to apply some further intelligence to the data in the form of customer segmentation, sales forecasting or natural language processing on customer reviews.

Design & Approach

[Diagram: High-Level Data Flow]

Blob Storage

Acts as the file landing zone where raw CSV files can be dropped and stored.

Azure Data Factory

Used to ingest and process files from Blob Storage into the SQL Database.

SQL Database

Database set up in a medallion design to store, clean and enrich the data.

Tableau

Reporting tool to access and visualise the "Gold Standard" data within the database.

Azure Architecture

As the aim of this project is to keep things simple and cost-effective, the underlying architecture is also quite straightforward, as seen in the image below.

[Image: Azure architecture diagram]

Azure Subscription & Resource Groups

Subscriptions and resource groups can largely be covered by a Netflix analogy. There’s the bill payer account (the subscription), and then all the freeloading user accounts underneath (the resource groups). Whilst one entity owns and pays for the Netflix subscription, multiple accounts within it are being used. Just instead of re-watching TV series, people use resource groups for building VMs, storing data, running data science pipelines and so on.


Blob Storage

Similar to the storage on your laptop, this is multipurpose file storage for all kinds of file types and data. In this use case, it’ll act as a hypothetical landing zone where files are dropped for processing, giving Azure Data Factory an easy place to find them.


Azure Data Factory (ADF)

Used for processing data, and lots of it. ADF provides access to the benefits of the cloud (scalable compute, parallel processing, speed and performance etc.) in a ‘simple’ interface where users can build data pipelines for moving and processing data. In this use case, I built a metadata-driven pipeline that looks in the ‘Landing Folder’ for CSV files and runs an ingestion pipeline for each one, moving the data from the CSV files into the SQL database. The benefit of using metadata is that rather than building multiple pipelines, you build one template pipeline and the metadata dynamically runs a version of that template for each variation of the metadata, in this case the file name. Building this out further, triggers can be added to run the pipelines based on events, such as files landing in Blob Storage, with the pipeline relocating the files once the data has been transferred into the database.
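To make the metadata-driven idea concrete, here is a minimal Python sketch of the same pattern, not ADF itself: one template ingestion routine, and a driver that discovers whatever CSV files sit in the landing folder (the metadata) and runs the template once per file. Table and folder names are illustrative, and SQLite stands in for the Azure SQL Database.

```python
import csv
import sqlite3
from pathlib import Path


def ingest_csv(conn: sqlite3.Connection, csv_path: Path) -> None:
    """Template 'pipeline': load one CSV into a table named after the file."""
    with csv_path.open(newline="") as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]
    table = csv_path.stem  # metadata: target table derived from the file name
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    conn.commit()


def run_landing_zone(landing: Path, conn: sqlite3.Connection) -> list:
    """Driver: discover CSVs in the landing folder and run the template per file."""
    ingested = []
    for csv_path in sorted(landing.glob("*.csv")):
        ingest_csv(conn, csv_path)
        ingested.append(csv_path.name)
    return ingested
```

Adding a new source file then requires no new pipeline code at all; the driver simply picks it up on the next run, which is the point of the metadata-driven design.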


SQL Database

The core of the solution and where the data sits. It uses a medallion architecture, a design pattern in which the data improves as it moves through Bronze, Silver and Gold stages. The data lands from ADF in the Bronze schema, where the data types are incorrect, there may be null or missing values, and overall the data needs some work. An initial data quality stored procedure in the Bronze schema handles basic processes such as deduplication. Following this, the data is promoted to the Silver schema by another stored procedure that continues the improvement by correcting the data types and handling missing values. Further enrichment occurs in this layer before the data moves into the final Gold layer, which is made accessible to data consumers, in this case starting with a Tableau reporting dashboard.
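A toy version of the Bronze → Silver → Gold flow can be sketched with plain SQL. The table names and cleaning rules below are illustrative, not the project’s actual stored procedures, and SQLite (which has no stored procedures) stands in for the Azure SQL Database, so each “procedure” is just a SQL statement.

```python
import sqlite3


def build_medallion(conn: sqlite3.Connection) -> None:
    """Illustrative medallion flow: raw Bronze -> cleaned Silver -> reporting Gold."""
    conn.executescript("""
        -- Bronze: raw landing table, everything as text, duplicates possible.
        CREATE TABLE bronze_sales (order_id TEXT, amount TEXT, region TEXT);
        INSERT INTO bronze_sales VALUES
            ('1', '10.50', 'North'),
            ('1', '10.50', 'North'),   -- duplicate row
            ('2', NULL,    'South'),   -- missing amount
            ('3', '7.25',  'North');

        -- "Stored procedure" 1: basic data quality in Bronze (deduplication).
        CREATE TABLE bronze_sales_dedup AS
            SELECT DISTINCT * FROM bronze_sales;

        -- "Stored procedure" 2: promote to Silver, correcting data types
        -- and handling missing values.
        CREATE TABLE silver_sales AS
            SELECT CAST(order_id AS INTEGER)            AS order_id,
                   CAST(COALESCE(amount, '0') AS REAL)  AS amount,
                   region
            FROM bronze_sales_dedup;

        -- Gold: enriched, consumer-facing aggregate for reporting.
        CREATE TABLE gold_sales_by_region AS
            SELECT region, SUM(amount) AS total_amount, COUNT(*) AS orders
            FROM silver_sales
            GROUP BY region;
    """)
    conn.commit()
```

The reporting tool only ever reads the Gold table, so the messier Bronze and Silver layers stay an internal concern of the database.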

[Image: Schema overview]

Outcome:

For next to no cost, a simple data platform was built making use of Azure’s free offerings: the database and Blob Storage are free within usage limits, Data Factory costs only a negligible amount for compute, and it all took just a little time to set up. This gives a solid foundation to continue building on top of, quickly and easily providing a robust data solution that could replace the common Excel reports seen in businesses. In the next phase I’ll be building a Tableau report over the top of this new database to provide insights into the data.