bits.of.info

Architecture for a data lake REST API using Delta Lake, Fugue & Spark

Posted on August 14, 2023 (updated December 18, 2023) by bitsofinfo

"Hey, we need some kind of a REST API over all our data lakes to let analysts and other integrations query records on demand . Can we please get this done?" That was the use-case laid out that needed a solution. If you've had any experience with data lakes you know that they can be … Continue reading Architecture for a data lake REST API using Delta Lake, Fugue & Spark →

Architecture for generative Terragrunt & Terraform infrastructure as code (IaC)

Posted on February 26, 2023 (updated June 26, 2023) by bitsofinfo

This article covers a specific scenario where despite trying to leverage as many DRY (don't repeat yourself) principles made available to us by the underlying IaC (infrastructure as code) frameworks, sometimes we still need to elevate the abstraction to another level to fully reduce code duplication and gain larger economies of scale deploying large platforms …

Fully Automated Lets Encrypt TLS certs with ACME-DNS on Kubernetes

Posted on September 8, 2022 (updated June 22, 2023) by bitsofinfo

This article covers fully automating DNS and the issuance of TLS certificates on Kubernetes for Ingress-based workloads (both public and private) utilizing cert-manager, external-dns, acme-dns and kubernetes-acme-dns-registrar. Scenario: You are a busy DevOps professional. You want to set up a Kubernetes platform that can accept any typical HTTP-based workload (Ingress-based) with minimal management …

Reacting to K8s Events with k8s-watcher

Posted on May 23, 2022 (updated June 26, 2023) by bitsofinfo

As part of a recent project which needs to automatically issue new TLS certificates for hosts defined in Kubernetes Ingress objects, I ended up having to create a library that would let me detect such events in a simplified manner for part of a larger Python program which needs to react to such events. My …

Architecture for non-deterministic mass data collection: part 2: dynamic data lake schemas

Posted on February 14, 2022 (updated July 11, 2023) by bitsofinfo

Note, this is the final part of a two-part series about this project; article #1 is here. Continuing on from where we last left off, now that we had a functioning collection engine producing full graphs of crawled data all the way down to interrogable dataset_items, it was time to get down to …

Architecture for non-deterministic mass data collection: part 1: collection engine

Posted on January 23, 2022 (updated July 11, 2023) by bitsofinfo

Note, this is part one of a two-part series about this project; article #2 is here. One of my more recent projects was spawned from a pretty interesting idea. The team wanted to build a system that would permit them to scour the Internet for information regarding a particular set of targets; a "target" …

Serverless AWS Lambda architecture for large scale data ingestion

Posted on November 29, 2021 (updated July 7, 2023) by bitsofinfo

I was recently faced with a requirement to build out an extensible data import framework that would be able to consume various file formats provided by 3rd parties... but make it faster than the current implementation. The mechanism in place was using a proprietary packaged legacy file ETL product whose output was an …

Using private Python Azure Artifacts feeds in Alpine Docker builds

Posted on June 3, 2021 by bitsofinfo

This one will be relatively short; I figured I'd post this for anyone else who was struggling with this use case. Your goal: your application needs to use a Python module that is available in a private Azure Artifacts feed and you want to pip install this module in an Alpine-based Docker build. I was recently working …

Immutable health check management

Posted on August 25, 2020 by bitsofinfo

If you've ever had to monitor an application, endpoint or website, you've likely come across literally hundreds of monitoring services that can execute simple HTTP-based checks from N global endpoints, then notify an operator when certain thresholds are met. One of the more widely known services that can do this is Pingdom. On a …