Google BigQuery



  • BigQuery is a distributed database
  • different ways to load data into BigQuery
  • partitioning a table helps optimize query performance and reduce query cost. Reason: the BigQuery pricing model depends heavily (among other aspects) on the amount of data that BigQuery needs to process while handling a query, and a query that filters on the partitioning column only scans the matching partitions (see partitioned-table)
  • clustering organizes the data based on the contents of one or more columns in the table's schema (see clustered-table)
  • how to remove duplicates manually in BigQuery
  • the data model/schema in BigQuery is usually denormalized. BigQuery uses columnar storage, where each column is stored in a separate file block, which makes it well suited to OLAP (Online Analytical Processing) use cases (see bigquery-storage)
  • BigQuery shines when used in the proper context: append-only data, time-series data, and workloads that need to run very complex queries
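The partitioning, clustering and manual de-duplication notes above can be sketched as SQL statements. The snippet below is only a minimal sketch that builds the statements as strings so they can be pasted into the BigQuery console. The table name `my_dataset.events` and the columns `event_id`, `user_id` and `event_ts` are assumptions made up for illustration, and the de-duplication query is one common pattern (keep the latest row per key), not necessarily the only way to do it.

```python
def partitioned_clustered_ddl(table, partition_column, cluster_columns):
    """Build a CREATE TABLE statement with daily partitions and clustering.

    The column list is a hypothetical example schema.
    """
    cluster_by = ", ".join(cluster_columns)
    return (
        f"CREATE TABLE `{table}` (\n"
        "  event_id STRING,\n"
        "  user_id STRING,\n"
        "  event_ts TIMESTAMP\n"
        ")\n"
        # One partition per day of the timestamp column.
        f"PARTITION BY DATE({partition_column})\n"
        # Rows inside each partition are organized by these columns.
        f"CLUSTER BY {cluster_by}"
    )


def dedup_query(table, key_column, order_column):
    """Build a SELECT that keeps only the most recent row per key."""
    return (
        "SELECT * EXCEPT(row_num)\n"
        "FROM (\n"
        "  SELECT *,\n"
        f"    ROW_NUMBER() OVER (PARTITION BY {key_column} "
        f"ORDER BY {order_column} DESC) AS row_num\n"
        f"  FROM `{table}`\n"
        ")\n"
        "WHERE row_num = 1"
    )


if __name__ == "__main__":
    print(partitioned_clustered_ddl("my_dataset.events", "event_ts", ["user_id"]))
    print()
    print(dedup_query("my_dataset.events", "event_id", "event_ts"))
```

Filtering on `DATE(event_ts)` in later queries is what lets BigQuery skip partitions. The de-duplication statement is a plain `SELECT`, so its result still has to be saved to a destination table if the cleaned data should be kept.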

The free tier of BigQuery requires a credit card. In 2019, Google introduced the BigQuery sandbox, which lets new users experiment with BigQuery at no cost and without entering credit card information. The BigQuery sandbox provides up to 1 terabyte of query capacity per month and 10 GB of free storage. All tables and partitions have a 60-day retention policy. Some BigQuery features are not included in the sandbox (DML, Streaming, Data Transfer Service).

As of 2022-01-24, the Try BigQuery Free button mentioned in the blog article always leads to the Free trial instead of the BigQuery sandbox as advertised. Hence I followed this guide to access the BigQuery sandbox.

Helpful resources:

BigQuery Pricing

  • storage: The first 10 GB per month is free
  • queries (analysis): The first 1 TB of query data processed per month is free
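To see how the two free allowances above affect a monthly bill, here is a rough sketch of the arithmetic. The free amounts come from the notes above. The unit prices are deliberately left as parameters because they change over time and by region, and the values used in the example calls are placeholders, not actual BigQuery prices.

```python
FREE_QUERY_TB_PER_MONTH = 1.0     # first 1 TB of processed query data is free
FREE_STORAGE_GB_PER_MONTH = 10.0  # first 10 GB of storage is free


def monthly_query_cost(processed_tb, price_per_tb):
    """Cost of on-demand queries after subtracting the free allowance."""
    billable_tb = max(0.0, processed_tb - FREE_QUERY_TB_PER_MONTH)
    return billable_tb * price_per_tb


def monthly_storage_cost(stored_gb, price_per_gb):
    """Cost of storage after subtracting the free allowance."""
    billable_gb = max(0.0, stored_gb - FREE_STORAGE_GB_PER_MONTH)
    return billable_gb * price_per_gb


if __name__ == "__main__":
    # Placeholder prices: look up the current values on the pricing page.
    print(monthly_query_cost(processed_tb=3.0, price_per_tb=5.0))
    print(monthly_storage_cost(stored_gb=60.0, price_per_gb=0.02))
```

Since the query cost grows with the amount of data processed, anything that reduces scanned bytes (partition filters, selecting fewer columns) lowers the first figure directly.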

bobby_dreamer | Loading data into BigQuery using Python

Patrick Dunn | Time series analytics with BigQuery

Bence Komarniczky | Loading files into BigQuery

Zakhar Yung | BigQuery Tutorial – How to Add Agility to Your Business