v3.0.2 release #124

Merged 25 commits on Mar 31, 2025
Commits
6136008
Correct title of ApiServer
markdumay Jan 15, 2025
4543b7b
Merge pull request #108 from markdumay/main
pflooky Jan 16, 2025
d765143
Make physicalType and logicalType optional (already in JSON schema, u…
simonharrer Feb 10, 2025
c6a3683
Merge pull request #112 from bitol-io/dev-types-optional2
simonharrer Feb 11, 2025
d6cd078
Add in common server properties table in docs README
pflooky Feb 26, 2025
ad38dca
Fix naming for required field for stagingDir in Athena Server
ed-fbiberger Feb 26, 2025
32260c1
Merge pull request #118 from ed-fbiberger/fix-athena-staging-dir
pflooky Feb 26, 2025
5e5ac42
add Expressing Date / Datetime / Timezone information to readme
dccakes Mar 10, 2025
2069585
Fix: add physicalName for properties
simonharrer Mar 10, 2025
9f82f4d
Add citation recommendation
simonharrer Mar 10, 2025
f5d12e3
fix build
simonharrer Mar 10, 2025
f18beec
fix build
simonharrer Mar 10, 2025
128e407
Merge pull request #116 from pflooky/add-common-server-doc
simonharrer Mar 10, 2025
cd2e413
update woth JsFormatter defaults
dccakes Mar 10, 2025
f7d8e31
add physicalType example and description information
dccakes Mar 11, 2025
8d3f70d
Update README.md
jgperrin Mar 11, 2025
c06cccd
Update CHANGELOG.md
jgperrin Mar 11, 2025
1ea8d1d
Merge pull request #121 from bitol-io/readme-date-logical
pflooky Mar 14, 2025
ec43113
Merge pull request #122 from bitol-io/jgperrin-patch-default-date-format
pflooky Mar 14, 2025
f8eedae
Update JSON schema and documentation to include 'name' field for team…
simonharrer Mar 25, 2025
01b617a
Update basic-four-dpo.odcs.yaml
jgperrin Mar 25, 2025
9358c78
Update YAML and JSON files to replace 'comment' with 'description' fo…
simonharrer Mar 25, 2025
56ca98b
Update full-example.odcs.yaml
jgperrin Mar 25, 2025
5fc9b92
Merge pull request #123 from bitol-io/jgperrin-patch-1
jgperrin Mar 25, 2025
acddac5
Release for v3.0.2, update docs to use latest version, update changel…
pflooky Mar 31, 2025
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,14 @@ image: "https://raw.githubusercontent.com/bitol-io/artwork/main/horizontal/color

This document tracks the history and evolution of the **Open Data Contract Standard**.

# v3.0.2 - 2025-03-31 - APPROVED

* Added field `physicalName` for the properties in JSON schema.
* Explicitly specified `YYYY-MM-DDTHH:mm:ss.SSSZ` as the default date format.
* Added field `name` for team members in JSON schema and docs.
* Added field `description` for team members in JSON schema and docs.
* Fixed Athena Server required property name from `staging_dir` to `stagingDir`.
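
For contracts written against v3.0.1, the entries above suggest a few mechanical updates. The following is a minimal sketch, not an official migration tool: it assumes the contract has already been parsed into a plain Python dict, and the function name `upgrade_to_v302` is hypothetical. It also assumes that some existing contracts used `comment` on team members (as the earlier examples did) or `staging_dir` on Athena servers.

```python
import copy


def upgrade_to_v302(contract: dict) -> dict:
    """Return a copy of a parsed contract with v3.0.2 naming applied.

    Hypothetical helper; the standard does not ship a migration tool.
    """
    upgraded = copy.deepcopy(contract)
    upgraded["apiVersion"] = "v3.0.2"

    # Team members: the examples replace `comment` with `description`.
    for member in upgraded.get("team", []):
        if "comment" in member and "description" not in member:
            member["description"] = member.pop("comment")

    # Athena servers: the required property is `stagingDir`, not `staging_dir`.
    for server in upgraded.get("servers", []):
        if server.get("type") == "athena" and "staging_dir" in server:
            legacy_value = server.pop("staging_dir")
            server.setdefault("stagingDir", legacy_value)

    return upgraded
```

The function works on a deep copy, so the original dict is left untouched and the two versions can be compared before anything is written back.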

# v3.0.1 - 2024-12-22 - APPROVED

* Added field `authoritativeDefinitions` into JSON schema
16 changes: 15 additions & 1 deletion README.md
@@ -13,7 +13,7 @@ Welcome!
Thanks for your interest and for taking the time to come here! ❤️

## Executive summary
This standard describes a structure for a **data contract**. Its current version is v3.0.1. It is available for you as an Apache 2.0 license. Contributions are welcome!
This standard describes a structure for a **data contract**. Its current version is v3.0.2. It is available for you as an Apache 2.0 license. Contributions are welcome!

## Discover the open standard
A reader-friendly version of the standard can be found on its [dedicated site](https://bitol-io.github.io/open-data-contract-standard/).
@@ -56,6 +56,20 @@ Check out the [CONTRIBUTING](./CONTRIBUTING.md) page.

## More

### Citation

If you need to cite this standard, you can use the following BibTeX entry:

```bibtex
@manual{ODCS2025,
title = {Open Data Contract Standard (ODCS)},
author = {{Bitol}},
organization = {LF AI \& Data Foundation},
year = {2025},
url = {https://bitol-io.github.io/open-data-contract-standard}
}
```

### History
Formerly known as the data contract template, this standard is used to implement Data Mesh at [PayPal](https://about.pypl.com/). Starting with v2.2.0, it is maintained by a 501(c)(6) non-profit organization called [AIDA User Group (Artificial Intelligence, Data, and Analytics User Group)](https://aidaug.org). On November 30th, 2023, [AIDA User Group](https://aidaug.org) and the [Linux Foundation AI & Data](https://lfaidata.foundation/) joined forces to create [Bitol](https://bitol.io). Bitol encompasses ODCS and future standards & tools.

77 changes: 65 additions & 12 deletions docs/README.md
@@ -43,7 +43,7 @@ This section contains general information about the contract.
### Example

```YAML
apiVersion: v3.0.1 # Standard version
apiVersion: v3.0.2 # Standard version
kind: DataContract

id: 53581432-6c55-4ba2-a65f-72344a91553a
@@ -66,7 +66,7 @@ tags: ['finance']

| Key | UX label | Required | Description |
|--------------------------------------|---------------------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| apiVersion | Standard version | Yes | Version of the standard used to build data contract. Default value is `v3.0.1`. |
| apiVersion | Standard version | Yes | Version of the standard used to build data contract. Default value is `v3.0.2`. |
| kind | Kind | Yes | The kind of file this is. Valid value is `DataContract`. |
| id | ID | Yes | A unique identifier used to reduce the risk of dataset name collisions, such as a UUID. |
| name | Name | No | Name of the data contract. |
@@ -243,9 +243,9 @@ Some keys are more applicable when the described property is a column.
|--------------------------|------------------------------|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| primaryKey | Primary Key | No | Boolean value specifying whether the field is primary or not. Default is false. |
| primaryKeyPosition | Primary Key Position | No | If field is a primary key, the position of the primary key element. Starts from 1. Example of `account_id, name` being primary key columns, `account_id` has primaryKeyPosition 1 and `name` primaryKeyPosition 2. Default to -1. |
| logicalType | Logical Type | Yes | The logical field datatype. One of `string`, `date`, `number`, `integer`, `object`, `array` or `boolean`. |
| logicalType | Logical Type | No | The logical field datatype. One of `string`, `date`, `number`, `integer`, `object`, `array` or `boolean`. |
| logicalTypeOptions | Logical Type Options | No | Additional optional metadata to describe the logical type. See [here](#logical-type-options) for more details about supported options for each `logicalType`. |
| physicalType | Physical Type | Yes | The physical element data type in the data source. For example, VARCHAR(2), DOUBLE, INT. |
| physicalType | Physical Type | No | The physical element data type in the data source. For example, VARCHAR(2), DOUBLE, INT. |
| description | Description | No | Description of the element. |
| required | Required | No | Indicates if the element may contain Null values; possible values are true and false. Default is false. |
| unique | Unique | No | Indicates if the element contains unique values; possible values are true and false. Default is false. |
@@ -270,7 +270,7 @@ Additional metadata options to more accurately define the data type.
| array | maxItems | Maximum Items | No | Maximum number of items. |
| array | minItems | Minimum Items | No | Minimum number of items. |
| array | uniqueItems | Unique Items | No | If set to true, all items in the array are unique. |
| date | format | Format | No | Format of the date. Follows the format as prescribed by [JDK DateTimeFormatter](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html). For example, format 'yyyy-MM-dd'. |
| date | format | Format | No | Format of the date. Follows the format as prescribed by [JDK DateTimeFormatter](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html). The default is the ISO 8601 format 'YYYY-MM-DDTHH:mm:ss.SSSZ'. Example of a custom format: 'yyyy-MM-dd'. |
| date | exclusiveMaximum | Exclusive Maximum | No | If set to true, all values are strictly less than the maximum value (values < maximum). Otherwise, less than or equal to the maximum value (values <= maximum). |
| date | exclusiveMinimum | Exclusive Minimum | No | If set to true, all values are strictly greater than the minimum value (values > minimum). Otherwise, greater than or equal to the minimum value (values >= minimum). |
| date | maximum | Maximum | No | All date values are less than or equal to this value (values <= maximum). |
@@ -289,6 +289,53 @@ Additional metadata options to more accurately define the data type.
| string | minLength | Minimum Length | No | Minimum length of the string. |
| string | pattern | Pattern | No | Regular expression pattern to define valid value. Follows regular expression syntax from ECMA-262 (https://262.ecma-international.org/5.1/#sec-15.10.1). |

#### Expressing Date / Datetime / Timezone information

Given the complexity of handling various date and time formats (e.g., date, datetime, time, timestamp, timestamp with and without timezone), the only temporal `logicalType` currently supported is `date`. To specify additional temporal details, use `logicalType` in conjunction with `logicalTypeOptions.format` or `physicalType`. Using `physicalType` lets you state the data type specific to your data source.

``` yaml
version: 1.0.0
kind: DataContract
id: 53581432-6c55-4ba2-a65f-72344a91553a
status: active
name: date_example
apiVersion: v3.0.2
schema:
  # Date Only
  - name: event_date
    logicalType: date
    logicalTypeOptions:
      - format: "yyyy-MM-dd"
    examples:
      - "2024-07-10"

  # Date & Time (UTC)
  - name: created_at
    logicalType: date
    logicalTypeOptions:
      - format: "yyyy-MM-ddTHH:mm:ssZ"
    examples:
      - "2024-03-10T14:22:35Z"

  # Time Only
  - name: event_start_time
    logicalType: date
    logicalTypeOptions:
      - format: "HH:mm:ss"
    examples:
      - "08:30:00"

  # Physical Type with Date & Time (UTC)
  - name: event_date
    logicalType: date
    physicalType: DATETIME
    logicalTypeOptions:
      - format: "yyyy-MM-ddTHH:mm:ssZ"
    examples:
      - "2024-03-10T14:22:35Z"
```
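
One way to sanity-check that an `examples` value agrees with its declared `format` is sketched below. This is an illustration under stated assumptions, not part of the standard: the helper name `example_matches_format` is hypothetical, only the six pattern tokens used in the example above are translated, and every other character (including `T` and `Z`) is treated as a literal, which is a simplification of how a real DateTimeFormatter interprets patterns.

```python
import re
from datetime import datetime

# Assumed subset of pattern tokens, mapped to Python strptime directives.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}
_TOKEN_RE = re.compile("|".join(_TOKENS))


def example_matches_format(example: str, pattern: str) -> bool:
    """Return True if `example` parses under the translated `pattern`."""
    # Translate each known token in a single pass; other characters stay literal.
    strptime_format = _TOKEN_RE.sub(lambda m: _TOKENS[m.group()], pattern)
    try:
        datetime.strptime(example, strptime_format)
    except ValueError:
        return False
    return True
```

For instance, `example_matches_format("2024-07-10", "yyyy-MM-dd")` returns `True`, while a value such as `"10/07/2024"` checked against the same pattern returns `False`.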

### Authoritative definitions

Reference to an external definition on element logic or values.
@@ -595,7 +642,7 @@ team:
dateIn: 2022-10-01
- username: daustin
role: Owner
comment: Keeper of the grail
description: Keeper of the grail
name: David Austin
dateIn: 2022-10-01
```
@@ -607,6 +654,8 @@ The UX label is the label used in the UI and other user experiences.
|-------------------------|----------------------|----------|--------------------------------------------------------------------------------------------|
| team | Team | No | Object |
| team.username | Username | No | The user's username or email. |
| team.name | Name | No | The user's name. |
| team.description | Description | No | The user's description. |
| team.role | Role | No | The user's job role; Examples might be owner, data steward. There is no limit on the role. |
| team.dateIn | Date In | No | The date when the user joined the team. |
| team.dateOut | Date Out | No | The date when the user ceased to be part of the team. |
@@ -722,7 +771,8 @@ Each server in the schema has the following structure:

```yaml
servers:
- type: <server-type>
- server: my-server-name
type: <server-type>
description: <server-description>
environment: <server-environment>
<server-type-specific-fields> # according to the server type
@@ -734,11 +784,14 @@ servers:

#### Common Server Properties

- **type**: The type of server. Valid values include various server technologies like `athena`, `bigquery`, `postgresql`, etc.
- **description**: A description of the server.
- **environment**: The environment where the server operates (e.g., `prod`, `dev`, `uat`). There are no set values.
- **roles**: An optional array of roles that have access to the server.
- **customProperties**: Any additional custom properties specific to the server that are not part of the standard.
| Key | UX label | Required | Description |
|------------------|-------------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| server | Server | Yes | Identifier of the server. |
| type | Type | Yes | Type of the server. Can be one of: api, athena, azure, bigquery, clickhouse, databricks, denodo, dremio, duckdb, glue, cloudsql, db2, informix, kafka, kinesis, local, mysql, oracle, postgresql, postgres, presto, pubsub, redshift, s3, sftp, snowflake, sqlserver, synapse, trino, vertica, custom. |
| description | Description | No | Description of the server. |
| environment | Environment | No | Environment of the server. Examples include: prod, preprod, dev, uat. |
| roles | Roles | No | List of roles that have access to the server. Check [roles](#roles) section for more details. |
| customProperties | Custom Properties | No | Custom properties that are not part of the standard. |
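
The table above marks `server` and `type` as required for every server entry. A minimal sketch of checking this on a parsed contract follows; the helper name `missing_server_keys` is hypothetical, and the check covers only key presence, not whether `type` is one of the listed values.

```python
# Keys marked "Required: Yes" in the common server properties table.
REQUIRED_SERVER_KEYS = ("server", "type")


def missing_server_keys(servers: list[dict]) -> dict[int, list[str]]:
    """Map each server's list index to the required keys it lacks."""
    problems = {}
    for index, server in enumerate(servers):
        missing = [key for key in REQUIRED_SERVER_KEYS if key not in server]
        if missing:
            problems[index] = missing
    return problems
```

An empty result means every entry carries both keys; otherwise the returned indices point at the offending entries in the `servers` list.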

### Specific Server Properties

8 changes: 5 additions & 3 deletions docs/examples/all/full-example.odcs.yaml
@@ -16,7 +16,7 @@ description:
tenant: ClimateQuantumInc

kind: DataContract
apiVersion: v3.0.1 # Standard version (follows semantic versioning)
apiVersion: v3.0.2 # Standard version (follows semantic versioning)

# Infrastructure & servers
servers:
@@ -32,6 +32,7 @@ schema:
- name: tbl
physicalName: tbl_1
physicalType: table
businessName: Core Payment Metrics
description: Provides core payment metrics
authoritativeDefinitions:
- url: https://catalog.data.gov/dataset/air-quality
@@ -41,7 +42,8 @@ schema:
tags: [ 'finance', 'payments']
dataGranularityDescription: Aggregation on columns txn_ref_dt, pmt_txn_id
properties:
- name: txn_ref_dt
- name: transaction_reference_date
physicalName: txn_ref_dt
primaryKey: false
primaryKeyPosition: -1
businessName: transaction reference date
@@ -152,7 +154,7 @@ team:
dateIn: "2022-10-01"
- username: daustin
role: Owner
comment: Keeper of the grail
description: Keeper of the grail
dateIn: "2022-10-01"


2 changes: 1 addition & 1 deletion docs/examples/data-types/all-data-types.odcs.yaml
@@ -4,7 +4,7 @@ id: 53581432-6c55-4ba2-a65f-72344a91553a
status: active
name: my_table
dataProduct: my_quantum
apiVersion: v3.0.1
apiVersion: v3.0.2
schema:
- name: transactions_tbl
description: Provides core payment metrics
@@ -3,7 +3,7 @@ kind: DataContract
id: 53581432-6c55-4ba2-a65f-72344a91553a
status: active
name: my_quantum
apiVersion: v3.0.1
apiVersion: v3.0.2
schema:
- name: tbl
description: Provides core payment metrics
2 changes: 1 addition & 1 deletion docs/examples/quality/column-accuracy.odcs.yaml
@@ -4,7 +4,7 @@ id: 53581432-6c55-4ba2-a65f-72344a91553a
status: active
name: my_table
dataProduct: my_quantum
apiVersion: v3.0.1
apiVersion: v3.0.2
schema:
- name: Air_Quality
description: Air quality of the city of New York
2 changes: 1 addition & 1 deletion docs/examples/quality/column-completeness.odcs.yaml
@@ -1,5 +1,5 @@
version: 1.0.0
apiVersion: v3.0.1
apiVersion: v3.0.2
kind: DataContract
id: 53581432-6c55-4ba2-a65f-72344a91553a
status: active
2 changes: 1 addition & 1 deletion docs/examples/quality/column-custom.odcs.yaml
@@ -1,5 +1,5 @@
version: 1.0.0
apiVersion: v3.0.1
apiVersion: v3.0.2
kind: DataContract
id: 53581432-6c55-4ba2-a65f-72344a91553a
status: active
2 changes: 1 addition & 1 deletion docs/examples/quality/column-validity.odcs.yaml
@@ -1,5 +1,5 @@
version: 1.0.0
apiVersion: v3.0.1
apiVersion: v3.0.2
kind: DataContract
id: 53581432-6c55-4ba2-a65f-72344a91553a
status: active
@@ -5,7 +5,7 @@ status: active
name: my_table
dataProduct: my_quantum
schema: []
apiVersion: v3.0.1
apiVersion: v3.0.2
roles:
- role: microstrategy_user_opr
access: read
2 changes: 1 addition & 1 deletion docs/examples/schema/all-schema-types.odcs.yaml
@@ -3,7 +3,7 @@ kind: DataContract
id: 53581432-6c55-4ba2-a65f-72344a91553a
status: active
name: my_quantum
apiVersion: v3.0.1
apiVersion: v3.0.2
schema:
- name: tbl
logicalType: object
2 changes: 1 addition & 1 deletion docs/examples/schema/kafka-schema.odcs.yaml
@@ -1,4 +1,4 @@
apiVersion: v3.0.1
apiVersion: v3.0.2
kind: DataContract
id: orders
status: development