How to Quickly Run a Docker Image on GCP


(Image: Run Docker container in GCP Cloud Shell)

In this blog post, we will briefly go over how to build and run a Docker container as quickly as possible on Google Cloud Platform (GCP). First, you need to create a Docker image. Something very simple should suffice. Here is a sample Dockerfile:

    FROM python:3
    RUN apt-get update && apt-get install -y vim
    RUN python -m pip install pandas

After creating the Dockerfile, the next step is to build and push the image to Artifact Registry.…
Read more ⟶

Goroutines - a deep dive into Go's concurrency features


(Image: Goroutine high level)

Go's native concurrency features are usually among the first things developers bring up when describing the advantages of using Go. However, because they are so easy to use, they are also wildly misused.

“You can have a second computer once you’ve shown you know how to use the first one.” –Paul Barham

We will not dive deep into when to use and when not to use goroutines specifically, even though we touch upon some hints.…
Read more ⟶

Rust: Scoped threads - easier multithreading


CPU design is currently moving towards a larger number of cores rather than faster cores, so writing parallel code is becoming more important in order to utilize the hardware's full potential (Concurrency is not Parallelism). In this blog post we will dive into scoped threads: what they are and how they differ from regular threads in Rust. First of all, only use threads if you need the speedup. Introducing threads to a program adds complexity, which makes the program harder to maintain and, if not done correctly, also slower to run (due to communication between threads and scheduling).…
Read more ⟶

Rust Foreign data wrappers for postgres


Background

In this blog post we will try to implement a foreign data wrapper for Postgres in Rust. We will build on top of pgx so that we do not have to build everything from the ground up. But first off, what is a foreign data wrapper? From the Postgres docs: A foreign data wrapper is a library that can communicate with an external data source, hiding the details of connecting to the data source and obtaining data from it.…
Read more ⟶

DBOS: A Database-Oriented Operating System


A group of researchers is proposing a radical change to the operating system of the future. By replacing the fundamental Unix idea that everything is a file and relying instead on concepts from the database world, an operating system that supports large-scale distributed applications in the cloud can be built.

(Image: Everything is a file)

The core principles suggested to achieve this are:
- Store all application in tables in a distributed database
- Store all OS state in tables in a distributed database.…
Read more ⟶

Bloom filters


A Bloom filter is a space-efficient probabilistic data structure. It can be used to quickly check whether a value definitely does not exist or might exist in a set: false positives are possible (with a low likelihood), but false negatives are not. The time to check whether an element exists, or to add an element, is O(k), where k is the number of hash functions (we will cover this later), so it does not depend on how many elements are stored.…
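
The behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the full post: the class name `BloomFilter`, the method `might_contain`, the default sizes, and the choice to derive the k positions from salted SHA-256 digests are all assumptions made for the example.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter sketch: num_bits bits and num_hashes hash functions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into a bytearray that starts as all zeros.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Assumption: simulate k hash functions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set the k bits for this item: O(k) work regardless of set size.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not present"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


bloom = BloomFilter()
bloom.add("apple")
print(bloom.might_contain("apple"))
```

An added item always reports `True`, which is why false negatives cannot happen. An item that was never added usually reports `False`, but it can report `True` if its k bits happen to have been set by other items.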
Read more ⟶

Setting up a Basic dbt Development Container for BigQuery in GCP


In this post, you will learn how to set up a basic dbt project in Google Cloud Platform (GCP) and share a development container to kickstart your project. While there are numerous blog posts out there about dbt and BigQuery, none of them show how to set it up in a development container without using any of the dbt Cloud services (at least to my knowledge).

Setting up your environment

Set up a GCP project, or use one you already have…
Read more ⟶

Go memory arenas for apache arrow, Part 2


This blog post continues the dive into Apache Arrow, specifically the Go memory allocation for Apache Arrow. This is a follow-up to Go memory arenas for apache arrow, Part 1. First of all, why do we want to manage memory manually instead of using the GC? One of Arrow's key features is its support for sharing memory between programs without copying; however, in a garbage-collected language this does not work that well.…
Read more ⟶

Push based query engine


In this blog post we will dive into the difference between push-based and pull-based query engines. As simple as it sounds, in a push-based engine data is pushed from the source through the different operators towards the sink. This model is used by Snowflake and is argued to be superior for OLAP, which we will dive deeper into. Pull-based engines have been around for longer: there, the sink pulls data from the source up through the operators. This is also known as the Volcano Iterator Model…
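
To make the contrast concrete, here is a small Python sketch of the same filter pipeline written both ways. It is only an illustration under assumed names (`scan`, `filter_pull`, `make_filter_push`, `run_pull`, `run_push`) and does not reflect how any particular engine is implemented.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, int]


# Pull based (Volcano style): each operator is an iterator, and the
# consumer at the top asks its child for the next row.
def scan(rows: Iterable[Row]) -> Iterator[Row]:
    for row in rows:
        yield row


def filter_pull(child: Iterator[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    for row in child:
        if predicate(row):
            yield row


def run_pull(rows: Iterable[Row]) -> List[int]:
    # The list comprehension acts as the sink and drives execution by pulling.
    return [row["x"] for row in filter_pull(scan(rows), lambda r: r["x"] > 1)]


# Push based: each operator is a callback that receives a row and
# forwards it to the next operator downstream.
def make_filter_push(
    predicate: Callable[[Row], bool], downstream: Callable[[Row], None]
) -> Callable[[Row], None]:
    def on_row(row: Row) -> None:
        if predicate(row):
            downstream(row)

    return on_row


def run_push(rows: Iterable[Row]) -> List[int]:
    out: List[int] = []
    pipeline = make_filter_push(lambda r: r["x"] > 1, lambda r: out.append(r["x"]))
    # The source drives execution by pushing every row into the pipeline.
    for row in rows:
        pipeline(row)
    return out


data = [{"x": 1}, {"x": 2}, {"x": 3}]
print(run_pull(data), run_push(data))
```

Both versions produce the same rows. What differs is who holds the control flow: the consumer in the pull version, the producer in the push version.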
Read more ⟶

Go memory arenas for apache arrow, Part 1


This blog post will try to dive into Apache Arrow, specifically the Go memory allocation for Apache Arrow. Apache Arrow states that it allows for the following types of memory allocation:
- Go default allocations (standard Go GC-collected memory)
- CGo allocator (memory allocated through CGo)
- Checked Memory Allocator

We will deep dive into these in a follow-up blog post. Today the goal is to extend this list with a new memory allocator, mostly because I read up on Go memory arenas, which were introduced in 1.…
Read more ⟶