AWS – bits.of.info

Architecture for non-deterministic mass data collection: part 2: dynamic data lake schemas

Posted on February 14, 2022 (updated July 11, 2023) by bitsofinfo

Note: this is the final part of a two-part series about this project; article #1 is here. Continuing from where we left off: now that we had a functioning collection engine producing full graphs of crawled data all the way down to interrogable dataset_items, it was time to get down to …

Architecture for non-deterministic mass data collection: part 1: collection engine

Posted on January 23, 2022 (updated July 11, 2023) by bitsofinfo

Note: this is part one of a two-part series about this project; article #2 is here. One of my more recent projects was spawned from a pretty interesting idea. The team wanted to build a system that would permit them to scour the Internet for information regarding a particular set of targets; a "target" …

Serverless AWS Lambda architecture for large scale data ingestion

Posted on November 29, 2021 (updated July 7, 2023) by bitsofinfo

Recently I was faced with a requirement to build out an extensible data import framework that would be able to consume various file formats provided by 3rd parties... but make it faster than the current implementation. The mechanism in place was using a proprietary packaged legacy file ETL product whose output was an …

Hazelcast discovery with Etcd

Posted on November 25, 2015 (updated December 20, 2018) by bitsofinfo

I've used Hazelcast for years and have generally relied upon the availability of multicast for Hazelcast cluster discovery and formation (within a single data-center). Recently I was faced with two things: first, expanding the footprint into a non-multicast enabled data-center, and second, pre-prepping the service for containerization, where nodes will come and go as scaling policies dictate …

Hazelcast discovery with Consul

Posted on November 20, 2015 (updated December 22, 2018) by bitsofinfo

Book review: Building Microservices

Posted on April 6, 2015 by bitsofinfo

Recently I read Sam Newman's "Building Microservices"; at ~280 pages it's a fairly quick read. The reviews on this book overall are mixed and I can see where readers are coming from. By the title of this book one might expect some coverage of some of the microservices frameworks out there, concrete examples, maybe …

Copying lots of files into S3 (and within S3) using s3-bucket-loader

Posted on November 10, 2014 (updated December 22, 2018) by bitsofinfo

Recently a project I've been working on had the following requirements for a file-set containing roughly a million files, varying in individual size from one byte to over a gigabyte, with the file-set totaling between 500 GB and one terabyte: store this file-set on Amazon S3; make this file-set accessible to applications …

Clustering Liferay globally across data centers (GSLB) with JGroups and RELAY2

Posted on May 21, 2014 (updated December 22, 2018) by bitsofinfo

Recently I've been looking into options to solve the problem of GSLB'ing (global server load balancing) a Liferay Portal instance. This article is a work in progress... and a long one. Jan Eerdekens states it correctly in his article, "Configuring a Liferay cluster is part experience and part black magic"... however doing it …

Testing yas3fs: a distributed S3 FUSE filesystem

Posted on May 4, 2014 by bitsofinfo

I've recently been doing quite a bit of evaluation of a few S3 filesystems; one in particular is yas3fs, which so far is quite impressive. I plan on doing a more detailed post about it later, however for now I'd like to share a little tool I wrote to help me in my evaluation of it. You …