Data Engineering


Guiding Lights

  1. Make reversible, high-ROI decisions.
  1. Prioritize requirements over building something cool.
  1. Architecture first, technology second.
  1. Instead of designing the best architecture, try to design the least-worst architecture.
  1. Perfect is the enemy of good.
  1. Lindy effect: the longer a technology has been established, the longer it is likely to remain in use.
  1. Use the Lindy effect as a litmus test for distinguishing immutable technologies from transitory ones.

    Terminology

    Terms that I need to know.
     
    Examples of data architecture:
    1. Data Warehouse
      1. Cloud data warehouse
      2. Data mart
    1. Data Lake
    1. Modern Data Stack
    1. Lambda Architecture
    1. Kappa Architecture
    1. Data Mesh
     

    Data warehouse

    A data warehouse is a central hub used for reporting and analysis. Data in a warehouse is typically highly formatted and structured for analytical use cases. It’s among the oldest and most well-established data architectures.
     
    Bill Inmon originated the concept of the data warehouse, which he defined as “a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of management’s decisions.”
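
    To make "structured for analytical use cases" concrete, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for a real warehouse. The table names (dim_product, fact_sales), columns, and sample rows are all made up for illustration; a production warehouse would use a dedicated engine and far richer dimensions.

```python
import sqlite3


def build_warehouse() -> sqlite3.Connection:
    """Create a tiny in-memory 'warehouse' with one dimension and one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension table: descriptive attributes (hypothetical schema).
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            category   TEXT NOT NULL
        );
        -- Fact table: measurable events that reference the dimension.
        CREATE TABLE fact_sales (
            sale_id    INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
            sale_date  TEXT NOT NULL,
            amount     REAL NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?)",
        [(1, "books"), (2, "games")],
    )
    conn.executemany(
        "INSERT INTO fact_sales (product_id, sale_date, amount) VALUES (?, ?, ?)",
        [(1, "2024-01-05", 10.0), (1, "2024-01-20", 15.0), (2, "2024-02-01", 40.0)],
    )
    conn.commit()
    return conn


def revenue_by_category(conn: sqlite3.Connection) -> list:
    """Typical analytical query: aggregate facts, grouped by a dimension attribute."""
    return conn.execute(
        """
        SELECT d.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS d ON d.product_id = f.product_id
        GROUP BY d.category
        ORDER BY d.category
        """
    ).fetchall()


if __name__ == "__main__":
    print(revenue_by_category(build_warehouse()))
```

    The point of the sketch is the shape of the data: facts are stored once, keyed to descriptive dimensions, so reporting queries reduce to joins and aggregations.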
     

    Data Lakehouse (databricks)

    A data lakehouse is a data management system that combines the benefits of data lakes and data warehouses.
     

    Delta Lake

    An optimised storage layer that supports ACID transactions and schema enforcement
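
    As a rough mental model only, the two properties above can be sketched in plain Python. This is not the Delta Lake API: the SCHEMA dict, SchemaError, and append_batch are hypothetical names, and the in-memory list stands in for a table. Delta Lake itself achieves this with a transaction log over Parquet files.

```python
from typing import Any

# Hypothetical table schema: column name -> required Python type.
SCHEMA: dict = {"id": int, "name": str}


class SchemaError(ValueError):
    """Raised when a row does not match the table schema."""


def validate(row: dict, schema: dict) -> None:
    """Schema enforcement: reject rows with wrong columns or wrong types."""
    if set(row) != set(schema):
        raise SchemaError(f"columns {sorted(row)} do not match {sorted(schema)}")
    for column, expected in schema.items():
        if not isinstance(row[column], expected):
            raise SchemaError(f"column {column!r} expects {expected.__name__}")


def append_batch(table: list, batch: list, schema: dict) -> None:
    """Atomic append: validate every row first, then commit all rows or none."""
    for row in batch:
        validate(row, schema)
    table.extend(batch)  # reached only if the whole batch is valid


if __name__ == "__main__":
    table: list = []
    append_batch(table, [{"id": 1, "name": "alice"}], SCHEMA)
    try:
        # The second row has a bad type, so the whole batch is rejected.
        append_batch(table, [{"id": 2, "name": "bob"}, {"id": "x", "name": "eve"}], SCHEMA)
    except SchemaError as err:
        print("rejected:", err)
    print(len(table))
```

    The sketch covers only atomicity and schema checks; consistency across concurrent writers, isolation, and durability are where a real storage layer does the heavy lifting.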

    Unity Catalog

    A unified, fine-grained governance solution for data and AI

    Domain

    The real-world subject area for which you’re architecting.

    Service

    A service is a set of functionality whose goal is to accomplish a task.

    Data Mesh

    Each software team is responsible for preparing its data for consumption across the rest of the organization.
     

    Databricks

     

    Delta Lake

     

    Azure Data Lake Storage (ADLS)

     

    DOMO

     

    Qlik Replicate

     

    Change Data Capture

     

    Portable

     

    SOC 2

     

    Yardi

     

    OLAP

     

    OLAP Cube

     

    OLAP MDX

     

    Apache Iceberg format (originally developed at Netflix)

     

    Parquet format

     

    Dimensional Modelling (Star Schema)

     

    Power BI

     

    Data Warehousing

     

    Data Lake