What is a BigQuery schema?

A BigQuery schema defines the structure of a table in Google BigQuery, including column names, data types (STRING, INT64, FLOAT64, BOOL, TIMESTAMP, DATE, RECORD), and modes (NULLABLE, REQUIRED, REPEATED).

How does JSON map to BigQuery types?

JSON strings map to STRING, integers to INT64, floats to FLOAT64, booleans to BOOL, ISO date strings to TIMESTAMP or DATE, null values produce NULLABLE mode, arrays produce REPEATED mode, and nested objects produce RECORD type with nested fields.

What are RECORD and REPEATED in BigQuery?

RECORD is a complex type that represents a nested object structure with its own sub-fields. REPEATED means a field can contain an array of values, similar to an array type in other databases. A field can be both RECORD and REPEATED (array of objects).

JSON to BigQuery Schema Generator Online

What is the JSON to BigQuery Schema Generator?

Google BigQuery is a columnar, serverless data warehouse, and unlike a document store it works against a fixed table schema. Load jobs can auto-detect a schema, but an explicit one is more predictable. This tool reads a JSON document and produces a BigQuery table schema, either as the JSON schema array that the bq load command and the BigQuery API expect, or as a CREATE TABLE DDL statement you can paste straight into the console.

BigQuery's data model is what makes its schema interesting, because it is not flat. Every field has a type (STRING, INT64, FLOAT64, NUMERIC, BOOL, TIMESTAMP, DATE, RECORD) and, separately, a mode (NULLABLE, REQUIRED, or REPEATED). These two axes are independent, and confusing them is a common source of schema errors.

A nested JSON object becomes a RECORD (called a STRUCT in SQL): a field that contains its own sub-fields. A JSON array becomes a REPEATED field. A JSON array of objects becomes a REPEATED RECORD, which is how BigQuery stores one-to-many relationships inline instead of in a separate joined table.

This nested-and-repeated model lets BigQuery store denormalized analytics data efficiently, but it changes how you query: you do not JOIN to a child table, you UNNEST the repeated field. The generator produces the schema; the FAQ below shows the matching query syntax so the schema is usable once your data is loaded.

How to Use

Choose the output format: Schema JSON for the bq command-line tool and the API (bq load --schema=schema.json), or DDL when you want to run CREATE TABLE directly in the BigQuery console
Paste an ARRAY of representative rows, not one row — mode inference (whether a field is NULLABLE or REQUIRED) depends on seeing nulls across multiple records
Enable snake_case conversion: BigQuery column names cannot contain most special characters and the ecosystem convention is snake_case, so camelCase JSON keys should be normalized
Verify that arrays of objects came out as REPEATED RECORD and that you intend that denormalized shape, rather than splitting them into a separate table
For large tables, add partitioning on a TIMESTAMP/DATE column and clustering on high-cardinality filter columns. On-demand queries are billed by bytes scanned, so partition pruning and clustering are usually the biggest levers on query cost in BigQuery
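The advice to paste several rows rather than one can be illustrated with a small sketch of mode inference. This is a hypothetical helper (`infer_modes` is not the tool's actual code) and it assumes a simple rule: a field is REPEATED if any sample holds an array, REQUIRED only if it is present and non-null in every sample, and NULLABLE otherwise.

```python
from typing import Any


def infer_modes(rows: list[dict[str, Any]]) -> dict[str, str]:
    """Infer a BigQuery mode per top-level key from sample rows (hypothetical helper)."""
    # Collect every key seen in any row, preserving first-seen order.
    keys: list[str] = []
    for row in rows:
        for key in row:
            if key not in keys:
                keys.append(key)

    modes: dict[str, str] = {}
    for key in keys:
        # A key missing from a row is treated the same as an explicit null.
        values = [row.get(key) for row in rows]
        if any(isinstance(v, list) for v in values):
            modes[key] = "REPEATED"
        elif all(v is not None for v in values):
            modes[key] = "REQUIRED"
        else:
            modes[key] = "NULLABLE"
    return modes


rows = [
    {"id": 1, "email": "a@example.com", "tags": ["new"]},
    {"id": 2, "email": None, "tags": []},
    {"id": 3, "tags": ["vip"]},
]
modes = infer_modes(rows)
print(modes)

# With only the first row as a sample, "email" would look REQUIRED.
single_row_modes = infer_modes(rows[:1])
print(single_row_modes["email"])
```

With three rows the null and the missing key both push `email` to NULLABLE; with one row the same field looks REQUIRED, which is why a single sample is a weak basis for mode inference.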

Why Use This Tool?

Emits the exact JSON schema array shape that bq load and the BigQuery REST API require, with both type and mode set on every field

Maps nested JSON objects to RECORD and arrays to REPEATED mode, including the REPEATED RECORD shape for arrays of objects

Detects ISO date strings and assigns TIMESTAMP or DATE rather than leaving them as STRING

Produces a CREATE TABLE DDL alternative with partitioning hints for direct console use

Normalizes column names to snake_case to satisfy BigQuery naming rules
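As a rough sketch of what snake_case normalization involves, the function below converts camelCase and punctuated keys into names made only of lowercase letters, digits, and underscores. The exact rules the tool applies are not documented here, so `to_snake_case` and its handling of acronyms and leading digits are assumptions for illustration.

```python
import re


def to_snake_case(name: str) -> str:
    """Normalize a JSON key to a snake_case column name (illustrative rules only)."""
    # Insert "_" at lower/digit -> upper boundaries: "userId" -> "user_Id".
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Insert "_" between an acronym and a following word: "HTTPStatus" -> "HTTP_Status".
    s = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", s)
    # Replace every run of characters other than letters, digits, "_" with "_".
    s = re.sub(r"[^A-Za-z0-9_]+", "_", s)
    # Collapse repeated underscores, trim the ends, and lowercase.
    s = re.sub(r"_+", "_", s).strip("_").lower()
    # Assumed rule: a column name should start with a letter or underscore.
    if not s or s[0].isdigit():
        s = "_" + s
    return s


for key in ["userId", "HTTPStatusCode", "order-total (USD)", "2ndAddress"]:
    print(key, "->", to_snake_case(key))
```

Two different JSON keys can normalize to the same column name (for example `userId` and `user_id`), so a real implementation would also need a collision check.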

Tips & Best Practices

Mode and type are independent: REPEATED is a mode, not a type. In the schema JSON an array of strings is type STRING, mode REPEATED, and there is no "array type"; only SQL and DDL spell the same field as ARRAY<STRING>. Reading the schema as if REPEATED were a type leads to wrong UNNEST queries
Use NUMERIC (or BIGNUMERIC) for money, never FLOAT64. FLOAT64 is a binary float and will give you 0.1 + 0.2 != 0.3 rounding errors in financial sums. The generator may infer FLOAT64 from a decimal sample — change it to NUMERIC for currency
REQUIRED mode is strict: by default a load fails if any row is missing that field. You can relax a REQUIRED column to NULLABLE later, but you cannot tighten a NULLABLE column to REQUIRED in place. When in doubt, prefer NULLABLE
TIMESTAMP stores an instant in UTC; DATETIME stores a wall-clock time with no zone; DATE stores just the calendar day. An ISO string with a Z or offset should be TIMESTAMP — pick deliberately because they are not interchangeable in queries
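You can ADD new NULLABLE or REPEATED columns to an existing table at any time. Other in-place changes are restricted: a mode can only be relaxed from REQUIRED to NULLABLE, type changes are limited to a few widening conversions (for example INT64 to NUMERIC), and removing a column requires ALTER TABLE ... DROP COLUMN. Anything else means rewriting the table, so plan the schema before loading large volumes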
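The TIMESTAMP, DATETIME, and DATE distinction from the tips above can be sketched as a small classifier. This is an assumption about how such detection might work, not the generator's actual logic: `classify_temporal` is a hypothetical name, and it only recognizes the extended ISO 8601 shapes covered by its two regular expressions.

```python
import re
from datetime import date, datetime

_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
_DATETIME_RE = re.compile(
    r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d{3}|\.\d{6})?(Z|[+-]\d{2}:\d{2})?"
)


def classify_temporal(value: str) -> str:
    """Suggest DATE, DATETIME, TIMESTAMP, or STRING for a JSON string value."""
    if _DATE_RE.fullmatch(value):
        try:
            date.fromisoformat(value)  # rejects impossible dates such as month 13
        except ValueError:
            return "STRING"
        return "DATE"
    if _DATETIME_RE.fullmatch(value):
        # Older Python versions do not accept a trailing "Z", so rewrite it.
        normalized = value[:-1] + "+00:00" if value.endswith("Z") else value
        try:
            parsed = datetime.fromisoformat(normalized)
        except ValueError:
            return "STRING"
        # A zone or offset pins an instant (TIMESTAMP); without one it is wall-clock time.
        return "TIMESTAMP" if parsed.tzinfo is not None else "DATETIME"
    return "STRING"


for sample in ["2024-05-01T09:15:00Z", "2024-05-01T09:15:00", "2024-05-01", "evt_88"]:
    print(sample, "->", classify_temporal(sample))
```

Falling back to STRING for anything that does not parse cleanly is the conservative choice: a STRING column loads every value, while a wrongly inferred TIMESTAMP column rejects rows.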

Frequently Asked Questions

What is the complete JSON to BigQuery type and mode mapping?

JSON string → type STRING, mode NULLABLE (or TIMESTAMP/DATE when it parses as an ISO date)
JSON integer → INT64
JSON float → FLOAT64 (switch to NUMERIC for currency)
JSON boolean → BOOL
JSON null → the field is emitted with mode NULLABLE
JSON nested object → type RECORD with the sub-fields nested under "fields"
JSON array of primitives → the element type with mode REPEATED
JSON array of objects → type RECORD with mode REPEATED

Note that type (STRING, INT64, RECORD...) and mode (NULLABLE, REQUIRED, REPEATED) are always set independently.
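The mapping above can be expressed as a short recursive function. This is a minimal sketch under stated assumptions, not the generator's implementation: `infer_field` is a hypothetical name, it looks at a single document only, it uses the first non-null array element as representative, and it leaves ISO date detection out for brevity.

```python
import json
from typing import Any


def infer_field(name: str, value: Any) -> dict[str, Any]:
    """Map one JSON key/value pair to a BigQuery schema field (illustrative only)."""
    mode = "NULLABLE"
    if isinstance(value, list):
        mode = "REPEATED"
        # Assumption: the first non-null element stands in for the whole array.
        # Arrays of arrays are not handled and fall through to STRING.
        value = next((v for v in value if v is not None), None)
    if isinstance(value, dict):
        return {
            "name": name,
            "type": "RECORD",
            "mode": mode,
            "fields": [infer_field(k, v) for k, v in value.items()],
        }
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        bq_type = "BOOL"
    elif isinstance(value, int):
        bq_type = "INT64"
    elif isinstance(value, float):
        bq_type = "FLOAT64"
    else:
        bq_type = "STRING"  # strings, plus nulls and empty arrays of unknown type
    return {"name": name, "type": bq_type, "mode": mode}


doc = json.loads("""
{
  "id": 7, "price": 9.5, "active": true, "note": null,
  "tags": ["a", "b"],
  "geo": {"country": "US"},
  "items": [{"sku": "A-1", "qty": 2}]
}
""")
schema = [infer_field(key, value) for key, value in doc.items()]
print(json.dumps(schema, indent=2))
```

Type and mode are computed in separate steps, which mirrors the point that REPEATED is a mode layered on top of whatever element type the array holds.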

What is the difference between RECORD, STRUCT, and REPEATED?

RECORD and STRUCT are the same thing under two names — RECORD is the term used in the schema/API, STRUCT is the term used in SQL. Both mean a field that holds an ordered set of named sub-fields, i.e. a nested object. REPEATED is unrelated: it is a mode meaning the field holds an array of values. They combine: a REPEATED RECORD is an array of nested objects, which is how BigQuery models a one-to-many relationship without a join table.
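The two vocabularies can be lined up by translating a schema JSON field into the type string that SQL uses. The sketch below assumes the field dictionaries follow the schema JSON shape shown on this page (`name`, `type`, `mode`, `fields`); `ddl_type` is a hypothetical helper name.

```python
from typing import Any


def ddl_type(field: dict[str, Any]) -> str:
    """Render a schema JSON field as a SQL type string (illustrative only)."""
    if field["type"] == "RECORD":
        # RECORD in the schema is spelled STRUCT<...> in SQL.
        inner = ", ".join(f"{sub['name']} {ddl_type(sub)}" for sub in field["fields"])
        base = f"STRUCT<{inner}>"
    else:
        base = field["type"]
    # REPEATED is a mode in the schema, but wraps the type as ARRAY<...> in SQL.
    if field.get("mode") == "REPEATED":
        return f"ARRAY<{base}>"
    return base


items = {
    "name": "items", "type": "RECORD", "mode": "REPEATED",
    "fields": [
        {"name": "sku", "type": "STRING", "mode": "NULLABLE"},
        {"name": "qty", "type": "INT64", "mode": "NULLABLE"},
    ],
}
tags = {"name": "tags", "type": "STRING", "mode": "REPEATED"}

print(ddl_type(items))
print(ddl_type(tags))
```

A REPEATED RECORD therefore reads as an ARRAY of STRUCT on the SQL side, which is the same field seen through the other naming scheme.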

When should a field be NULLABLE, REQUIRED, or REPEATED?

NULLABLE (the default) means the field may be absent or null in a row. REQUIRED means every row must supply a non-null value, and a load fails if any row omits it. An existing REQUIRED column can be relaxed to NULLABLE, but a NULLABLE column cannot be tightened to REQUIRED in place, so use REQUIRED sparingly. REPEATED means the field is an array of zero or more values; a REPEATED field is never null, and an empty array represents absence. Use REPEATED for any JSON array, and only use REQUIRED for fields you are certain are always present.

How do I query nested and repeated fields after loading?

For a RECORD (STRUCT) you reach sub-fields with dot notation: SELECT geo.country FROM t. For a REPEATED field you must flatten it with UNNEST in the FROM clause, typically as a correlated cross join written with a comma: SELECT t.id, item.sku FROM t, UNNEST(t.items) AS item. Selecting a sub-field of a repeated record directly (SELECT items.sku FROM t) fails with an error along the lines of "Cannot access field sku on a value with type ARRAY<STRUCT<...>>". The cross join drops rows whose array is empty; use LEFT JOIN UNNEST(t.items) AS item to keep them. UNNEST is the BigQuery equivalent of joining to the child rows that a REPEATED RECORD stores inline.
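To make the effect of the comma-plus-UNNEST pattern concrete, here is a plain Python simulation of what the cross join produces. It is only a mental model of the query semantics, not BigQuery code, and `cross_join_unnest` is a hypothetical name.

```python
from typing import Any, Iterator


def cross_join_unnest(
    rows: list[dict[str, Any]], array_field: str
) -> Iterator[tuple[dict[str, Any], Any]]:
    """Yield one (row, element) pair per array element, like FROM t, UNNEST(t.x)."""
    for row in rows:
        # A missing, null, or empty array yields nothing, so the row disappears.
        for element in row.get(array_field) or []:
            yield row, element


events = [
    {"event_id": "evt_88", "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}]},
    {"event_id": "evt_89", "items": []},
]

# Equivalent in spirit to:
#   SELECT event_id, item.sku, item.qty FROM events, UNNEST(items) AS item
flat = [
    (row["event_id"], item["sku"], item["qty"])
    for row, item in cross_join_unnest(events, "items")
]
print(flat)
```

One input row with two items becomes two output rows, and the event with an empty array produces none, which matches the behavior of a cross join against UNNEST.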

Why should I not use FLOAT64 for money or IDs?

FLOAT64 is an IEEE-754 binary float, so decimal fractions like 0.1 cannot be represented exactly and sums drift. Use NUMERIC (precision 38, scale 9, that is, up to 9 digits after the decimal point) or BIGNUMERIC for currency and exact decimals. For large integer identifiers, INT64 is exact up to 2^63 − 1; do not store IDs as FLOAT64, because integers beyond 2^53 lose precision.
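Both failure modes are easy to reproduce outside BigQuery, because Python's `float` is the same IEEE-754 double as FLOAT64. The sketch below uses `decimal.Decimal` as a stand-in for an exact decimal type such as NUMERIC; it illustrates the arithmetic, not BigQuery itself.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_sum = 0.1 + 0.2
float_matches = float_sum == 0.3

# An exact decimal type keeps the digits as written.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_matches = decimal_sum == Decimal("0.3")

# A double has a 53-bit significand, so not every integer above 2**53 survives.
big_id = 2**53 + 1
round_tripped = int(float(big_id))
id_survives = round_tripped == big_id

# The largest value a signed 64-bit integer column can hold.
INT64_MAX = 2**63 - 1

print(float_matches, decimal_matches, id_survives, INT64_MAX)
```

The ID case is the quieter of the two: no error is raised, the stored value is simply a neighboring integer.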

Does this tool send my data to a server?

No. Schema generation runs entirely in your browser. Your JSON never leaves your device.

Real-world Examples

Event table with a nested RECORD and a REPEATED RECORD

A product analytics event has a flat top level, a nested geo object (which becomes a RECORD), and an items array of objects (which becomes a REPEATED RECORD). This is the canonical denormalized BigQuery shape — one event row carries its line items inline instead of in a joined table.

Input

{
  "event_id": "evt_88",
  "event_time": "2024-05-01T09:15:00Z",
  "user_id": 4021,
  "geo": { "country": "US", "city": "Austin" },
  "items": [
    { "sku": "A-1", "qty": 2 },
    { "sku": "B-7", "qty": 1 }
  ]
}

Output

[
  { "name": "event_id",   "type": "STRING",    "mode": "NULLABLE" },
  { "name": "event_time", "type": "TIMESTAMP", "mode": "NULLABLE" },
  { "name": "user_id",    "type": "INT64",     "mode": "NULLABLE" },
  { "name": "geo", "type": "RECORD", "mode": "NULLABLE", "fields": [
      { "name": "country", "type": "STRING", "mode": "NULLABLE" },
      { "name": "city",    "type": "STRING", "mode": "NULLABLE" }
  ]},
  { "name": "items", "type": "RECORD", "mode": "REPEATED", "fields": [
      { "name": "sku", "type": "STRING", "mode": "NULLABLE" },
      { "name": "qty", "type": "INT64",  "mode": "NULLABLE" }
  ]}
]

-- Query the repeated record with UNNEST:
-- SELECT event_id, geo.country, item.sku, item.qty
-- FROM events, UNNEST(items) AS item;

Partitioned orders table as DDL with NUMERIC money

For a transactional table, money is NUMERIC (not FLOAT64) to avoid rounding, the created_at column drives time partitioning, and customer_id is clustered for cheap filtered scans. This DDL is ready to paste into the BigQuery console.