As we navigate our digital environments on a daily basis, data silently influences our decisions and the way we experience things. Just like how we understand different words and gestures in real life, Google BigQuery uses different data types to give meaning to these online experiences. The BigQuery data types might seem small and easy to miss, but they help us understand all the information on the internet, making it more meaningful and useful.
While there is no denying that BigQuery data types play a major role in data integrity and efficiency, the catch is that BigQuery isn't very forgiving when it comes to misaligned data types. Mixing them up can generate unexpected errors, skewed results, and a general feeling of "why isn't this working as it should?"
Now, if this situation sounds like something you are experiencing and desperately want to get out of, you are at the right place. Over the next 10 minutes, we’ll break down the different BigQuery data types to help you understand how to handle your data when working with BigQuery, and to simplify the process of tackling data migration tasks.
What Is Google BigQuery?
BigQuery is Google’s fully managed enterprise data warehouse. It is designed to quickly analyze large datasets using features like machine learning, geospatial analysis, and business intelligence.
One of its key features is its serverless architecture which doesn’t require managing any infrastructure. Instead, you can directly use SQL to dig deep into your data. It provides rapid querying capabilities and can handle terabytes of data in seconds.
BigQuery has a unique approach – it keeps the compute engine (which does the analysis) separate from storage. This design means you can either store data within BigQuery itself or analyze your data where it currently resides. The platform provides:
- Streaming to continuously update data
- Federated queries to pull data from other sources
- Advanced tools like BigQuery ML for deeper data insights
What Are The 9 Different BigQuery Data Types?
Google BigQuery has different data types for various datasets and use cases. Understanding these is important if you are looking to maximize the efficiency of BigQuery’s warehousing solutions. Using BigQuery data types not only has certain performance implications, but also supports maintaining data integrity.
With BigQuery supporting everything from basic numerical values to intricate arrays or structs, it gives you greater flexibility in data processing. So, let’s explore the different data types in detail.
Numeric Data Type in BigQuery
The numeric data type represents any data that can be quantified numerically, including whole numbers, fractions, and numbers with decimal points. In databases like Google BigQuery, these types are important for arithmetic operations, comparisons, and statistical computations. They allow for precise representation and manipulation of quantitative data. There are 3 primary numeric data types in BigQuery:
Integer (INT64)
This data type represents whole numbers without any fractional component.
Example:
plaintextSELECT 100 AS IntValue;
In this simplistic representation, we’re identifying a number, 100, as a whole number, emphasizing its nature as an integer.
Float (Float64)
Floats are used for numbers that have both whole and fractional components. They give an approximation rather than an exact value which is something to be cautious of when dealing with precise calculations.
Example:
plaintextSELECT 3.14159 AS ApproximatePi;
Pi is a number that goes on indefinitely. But for many applications, rounding it to a few decimal places (as an approximation) suffices.
Numeric
This data type is analogous to the ‘Decimal’ type in many database systems. The Numeric type can store valid values with 38 fractional digits of precision and 9 decimal digits of scale, like valid values from 00. This makes it perfect for exact calculations like those required in financial operations.
Example:
plaintextSELECT NUMERIC '25.6789012345' AS PreciseValue;
Here, the Numeric data type guarantees every decimal is accounted for.
Arithmetical/Mathematical Operations
You can perform basic arithmetic operations on numeric data types. It includes addition, subtraction, multiplication, and division. For example, you might calculate the area of a circle using arithmetic/mathematical operations:
plaintextWITH CircleData AS (
SELECT 7 AS Radius
)
SELECT Radius, 3.14159 * Radius * Radius AS Area
FROM CircleData;
Output
Handling Division Errors
Dividing by zero or performing operations that yield non-finite values can disrupt your SQL queries. BigQuery provides the IEEE_divide function to handle these scenarios. This function returns ‘NaN’ (Not-a-Number) when a division by zero occurs, so your query runs without hitches.
plaintextWITH DivisionData AS (
SELECT 100 AS Numerator, 0 AS Denominator
)
SELECT Numerator, Denominator, IEEE_Divide(Numerator, Denominator) AS Result
FROM DivisionData;
Output
Safe Functions
In some cases, you might want your query to return NULL instead of an error. For instance, the logarithm of a negative number is mathematically undefined. Using the ‘SAFE’ prefix with your operations returns a NULL value instead of an error.
plaintextSELECT SAFE.SQRT(-4) AS SquareRootValue;
Output
Comparisons
Numeric data types in BigQuery support standard comparison operators. These comparisons are necessary, especially when filtering data using the WHERE clause. When you understand the sorting order of values in BigQuery, it becomes easier to filter and analyze your data. For instance:
WITH Numbers AS(
SELECT 5 AS Value
UNION ALL SELECT -3
UNION ALL SELECT 8
)
SELECT * FROM Numbers
WHERE Value > 0
ORDER BY Value ASC;
Output
Precise Decimal Calculations With Numeric
When every decimal point matters, BigQuery’s numeric type shines. Especially in financial and accounting sectors where miscalculations, even by the tiniest of margins, can be catastrophic. For example:
plaintext
WITH FinancialData AS (
SELECT NUMERIC '10.25' AS Price,
NUMERIC '0.07' AS TaxRate
)
SELECT Price, TaxRate, Price * TaxRate AS TaxAmount
FROM FinancialData;
Output
Boolean Data Type in BigQuery
The Boolean data type in BigQuery represents variables using 2 distinct keywords: TRUE and FALSE. Unlike some languages where the representation might be 0 and 1, BigQuery uses these explicit keywords which makes it easier to read and understand. Let’s look at the core functions of Boolean.
Logical Operations
- They use logical operators such as AND, OR, and NOT.
- These operations help in refining the results based on certain conditions.
- The primary tool used for filtering with logical operations is the WHERE clause.
- The outcome when employing a WHERE clause paired with a logical operator is dependent on the WHERE condition being TRUE or FALSE.
Conditional Operations
- BOOLEAN values play an important role when working with conditional clauses.
- These clauses make the queries more efficient as they give both an option and an alternative when the condition isn’t met.
- A typical example of this is the IF clause which operates on the given condition, returning different results based on whether the condition evaluates to TRUE or FALSE.
Exploring Boolean Through A Unique Code Example
Suppose we have a dataset of students and we want to determine if they passed or failed based on their scores. Let’s also introduce a Boolean field has_attendance indicating if the student has met the attendance criteria.
plaintextWITH student_data AS(
SELECT "John" AS Name, 85 AS Score, TRUE AS has_attendance
UNION ALL SELECT "Doe", 45, TRUE
UNION ALL SELECT "Jane", 60, FALSE)
SELECT Name, Score,
IF(has_attendance AND Score >= 50, "Passed", "Failed") AS Result
FROM student_data
Output
In this example, the IF clause is used to determine if a student passed or failed. A student is considered to have ‘Passed’ only if they meet the attendance criteria (‘has_attendance’ is TRUE) and their score is 50 or above. Otherwise, they are marked as ‘Failed’.
Strings Data Types in BigQuery
Strings in BigQuery represent sequences of Unicode characters with variable lengths. They play an important role in data handling, and with BigQuery, you get a rich set of functions to improve your string manipulation tasks. Remember that strings must always be enclosed within single, double, or triple quotation marks.
Core Functions & Operations On Strings
To efficiently handle strings in BigQuery, different functions are provided. Here’s a detailed overview:
Cast
This command is crucial for data type conversions. For instance, you can convert a string to an INT64 or FLOAT64, and likewise, an integer or float can be converted back to a string. Here’s an example:
plaintextSELECT
CAST (25 AS STRING) AS StringRepresentation,
CAST('100' AS INT64) AS IntegerRepresentation
Safe_cast
Sometimes, direct conversion using cast might fail. In such scenarios, safe_cast comes to the rescue by returning NULL so your query still runs without problems. For instance, converting the string ‘hello’ to a boolean data type would fail. But with safe_cast, you’d get a NULL instead.
plaintextSELECT CAST('false' AS bool) AS RegularCast,
SAFE_CAST ('hello' AS bool) AS SafeCastResult
Format
This command lets you shape the appearance of the data being converted. For instance, you can control the precision of a float or pad numbers with zeros. Here’s a query example:
plaintextSELECT
FORMAT ('%4d', 320) AS FormattedA,
FORMAT ('%6.2f', 44.4444) AS FormattedB,
FORMAT ('%04d', 91) AS FormattedC
Output
Concat
Concat is an invaluable tool when you want to merge parts of different strings. Let’s say you have a name and city column and you want to combine them:
plaintextWITH name_city AS (
SELECT 'Alice' AS name, 'London' as city
UNION ALL SELECT 'Bob', 'Paris'
UNION ALL SELECT 'Charlie', 'Berlin'
)
SELECT name, city,
CONCAT(
name,
' lives in ', city) AS description
FROM name_city
Output
Regular Expressions
BigQuery provides robust support for regex operations that increase your string search capabilities. Some of the main functions include:
- REGEXP_CONTAINS: Detects if a regex pattern or character is present.
- REGEXP_EXTRACT: Extracts the matched regex pattern.
- REGEXP_REPLACE: Replaces the matched regex pattern.
For example, to check which strings match a pattern of 3 or more letters:
plaintextSELECT text,
REGEXP_CONTAINS (text, r'[a-zA-Z]{3,}') AS MatchStatus
FROM (SELECT *
from unnest(['abc', 'xy', 'defgh']) AS text
)
Output
Bytes Data Types in BigQuery
Bytes, as a data type in BigQuery, serve a unique purpose. Unlike strings, which are one or two-digit characters, they are a sequence of raw data often used to store binary information.
Representation Of Bytes
- Always prefixed with the letter 'B' or 'b' to indicate the byte nature of the data.
- Enclosed within single, double, or even triple quotation marks.
- Should never be swapped with strings despite the familiar look.
Let's say you have a database with a table named ‘music_files’. Each row represents a music file, and there’s a column named ‘AudioSignature’ which contains a byte representation of the audio signature for matching purposes.
SELECT * FROM music_files
WHERE AudioSignature = b'9aF3Zx8J12LnP7Qr';
In this query, we’re looking to retrieve music files with a specific audio signature. The byte value b'9aF3Zx8J12LnP7Qr' serves as our filter. It's important to note the prefix ‘b’, indicating that we are dealing with a byte literal and not a string.
Time & Date Data Types in BigQuery
Google Cloud’s BigQuery has a range of data types to cater to different requirements when working with date and time. These data types help in representing different moments or periods which can be crucial for analysis and reporting in BigQuery.
Date Data Type
Represents an absolute point in time, denoting a valid calendar date that is independent of any time zone.
- Typical format: YYYY : [ M ] M : [ D ] D.
- Represents a valid calendar date that is independent of any time zone.
- A single date can depict different periods if logged in various time zones.
For instance, the date ‘2023-08-25’ in BigQuery’s Date data type would signify the same calendar day across the globe even if the exact moment it refers to varies depending on the local timezone.
TimeData Type
Denotes a specific time devoid of any particular date. If you were to specify “14:30:00” in the Timedata type, it would just be a representation of the time 2:30 PM without any reference to which day it belongs.
Datetime Data Type
- Points to a specific moment
- It disregards any precision beyond that particular moment.
For instance, if you observe ‘2023-08-25 14:30:00’ in the DateTime data type, it gives a picture of that specific minute on that specific day without any finer granularity or timezone context.
Timestamp Data Type
- DateTime is enclosed within Timestamp, albeit with a lower precision level.
- Captures an exact moment with microsecond precision irrespective of geographical location.
To put it simply, while DateTime might tell you it’s 2:30 PM on August 25, 2023, Timestamp would provide added details like ‘2023-08-25 14:30:00.123456’.
Timezone Data Types in BigQuery
Timezones, as another major BigQuery data type, play an important role when it comes to parsing and displaying timestamps. In BigQuery, there are 2 primary formats to represent timezones.
Offset From Coordinated Universal Time (UTC)
- In this format, there should be no spaces between the timezone and the rest of the timestamp.
- The UTC offset format represents the difference in hours and minutes from the Coordinated Universal Time.
For instance:
2020-05-15 10:45:00.00-3:00
Here, -3:00 indicates that the time is 3 hours behind UTC.
Time Zone Name From TZ Database
- This format uses the time zone names from the TZ database which is a comprehensive collection of time zone information.
- There must be a space between the timezone name and the rest of the timestamp.
For example:
2020-05-15 10:45:00.00 America/New_York
Here, America/New_York specifies the timezone corresponding to New York, USA.
Geography (GIS) Data Type in BigQuery
Bigquery’s Geography data type is adept at representing geographical entities like points, paths, and shapes on Earth’s surface. Its alignment with the WGS84 reference ellipsoid, which is common in GPS systems, facilitates direct usage of latitude and longitude to pinpoint locations in BigQuery.
Key Functions & Use-cases
Constructors
They create new geographical values. For instance:
plaintextST_GEOGFROMPOINT(latitude, longitude)
Parsers
These functions interpret external formats like WKT and GeoJSON and transform them into Geography data types:
plaintextST_PARSEGEOG(wkt_string)
Formatters
Acting as the reverse of Parsers, they convert Geography data types into known external formats. For instance:
plaintextST_TOGEOGTEXT(geography_value)
Transformations
They derive new Geography data types from pre-existing ones. Consider:
plaintextST_COMBINE(geography_1, geography_2)
Predicates
These are evaluators that return TRUE or FALSE based on spatial attributes or relationships. For example:
plaintextST_ISWITHIN(geography_1, geography_2, distance)
Accessors
They grant insights into geographical data properties. To retrieve the total number of points in a geographical entity:
plaintextST_TOTALPOINTS(geography_value)
Measures
Used for gauging metrics between geographical entities, like:
plaintextST_MEASUREDISTANCE(geography_1, geography_2)
Aggregate Functions
They perform collective operations specific to geography data. For example:
plaintextST_COMBINEALL(geography)
Array Data Type in BigQuery
In BigQuery, the array data type is your go-to for storing ordered collections of elements. While these collections can have any data type, they should not be arrays themselves. Arrays in BigQuery find representation via angle brackets (< and >).
Let’s look at array declarations
Standard Arrays
BigQuery accommodates most data types within an array. Examples include:
plaintextARRAY<Double 64>
ARRAY<String>
Handling NULL
If a query is to return an array with only NULLs, BigQuery will display an error. However, such arrays can be used internally in your queries.
Nested Arrays
Direct nesting of arrays is not permissible, so ARRAY<ARRAY<INT 32>> isn’t valid.
Using Struct With An Array
To achieve nesting, add a struct. An example:
plaintextARRAY<Struct<ARRAY<INT 32>>>
Struct Data Type in BigQuery
The struct data type can have different fields, each with its designated data type and optionally a name. In BigQuery, structs are demarcated using angle brackets (< and >).
Let’s take a look at struct declarations.
Standard Structs
Structs in BigQuery can have a wide range of data types. An example is:
plaintextSTRUCT<INT 32>
Nested Structs
One Struct can reside within another, as shown by:
plaintextSTRUCT<y STRUCT<'a' String, b INT 32>>
Struct with an array
A struct can carry an array:
plaintextSTRUCT<ARRAY<STRING>>
Handling NULL
Structs in BigQuery are receptive to NULL values or fields comprising NULL values.
Moving BigQuery Data In Real-Time With Estuary Flow
Google’s BigQuery is an advanced tool for managing huge volumes of data. But sometimes businesses have to transfer this data to other systems and doing so quickly and accurately can be a challenge. While BigQuery has tools for this, they might not be the fastest option. This is where Estuary Flow comes into play, making data transfers efficient and timely — without having to worry about maintenance, schema, coding, or the different BigQuery data types.
Estuary Flow enables real-time data movement to and from BigQuery. Flow is our managed data integration platform built for real-time analytics. It lets you move and transform data between databases, warehouses, lakes, and other systems with millisecond latency. It achieves millisecond freshness through change data capture (CDC) from BigQuery.
Let us explore some of the key capabilities of Estuary Flow:
- Live monitoring: Offers live reporting and monitoring to ensure uninterrupted data flow.
- Exactly-once semantics: Guarantees transactional consistency for precise data views.
- Materializations: Offers low-latency views across desired systems with native real-time operations.
- Schema inference: Converts unstructured data into structured data for better data management.
- Powerful transformations: Supports both SQL and JavaScript for real-time streaming transformations.
- Scalability: A distributed system that can handle data volumes up to 7 GB/s and can backfill large data sets quickly.
- Optimized data handling: Minimizes system load with low-impact CDC, distributing data across all systems without re-syncing.
- Data capture: Seamlessly captures data from different sources, including real-time Change Data Capture (CDC) from databases and integrations with SaaS applications.
Key Takeaways
The wide range of Google’s BigQuery data types lets you store and process different datasets with precision. When you pick the right data type, you can enhance data integrity, system performance, and analytical capabilities. For many organizations, the next step is moving BigQuery data to other systems in real-time for additional processing or analytics.
Estuary Flow provides a robust platform for achieving millisecond fresh BigQuery data in downstream tools. Its seamless change data capture from BigQuery combined with streaming ETL enables real-time data pipeline orchestration.
With pre-built connectors, automatic schema management, data consistency features, and enterprise-grade scalability, Estuary Flow simplifies your real-time data integration challenges. To experience its capabilities firsthand, you can sign up for free today. You can also contact our team for additional assistance in implementing real-time BigQuery data movement.