diff --git a/CHANGELOG.md b/CHANGELOG.md index b21aeda..82dba5f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,14 @@ image: "https://raw.githubusercontent.com/bitol-io/artwork/main/horizontal/color This document tracks the history and evolution of the **Open Data Contract Standard**. +# v3.0.2 - 2025-03-31 - APPROVED + +* Added field `physicalName` for the properties in JSON schema. +* Explicitly specified `YYYY-MM-DDTHH:mm:ss.SSSZ` as the default date format. +* Added field `name` for team members in JSON schema and docs. +* Added field `description` for team members in JSON schema and docs. +* Fixed Athena Server required property name from `staging_dir` to `stagingDir`. + # v3.0.1 - 2024-12-22 - APPROVED * Added field `authoritativeDefinitions` into JSON schema diff --git a/README.md b/README.md index d5489ea..ec3cf26 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,7 @@ Welcome! Thanks for your interest and for taking the time to come here! ❤️ ## Executive summary -This standard describes a structure for a **data contract**. Its current version is v3.0.1. It is available for you as an Apache 2.0 license. Contributions are welcome! +This standard describes a structure for a **data contract**. Its current version is v3.0.2. It is available for you as an Apache 2.0 license. Contributions are welcome! ## Discover the open standard A reader-friendly version of the standard can be found on its [dedicated site](https://bitol-io.github.io/open-data-contract-standard/). @@ -56,6 +56,20 @@ Check out the [CONTRIBUTING](./CONTRIBUTING.md) page. 
## More +### Citation + +If you need to cite this standard, you can use the following BibTeX entry: + +```bibtex +@manual{ODCS2025, + title = {Open Data Contract Standard (ODCS)}, + author = {{Bitol}}, + organization = {LF AI \& Data Foundation}, + year = {2025}, + url = {https://bitol-io.github.io/open-data-contract-standard} +} +``` + ### History Formerly known as the data contract template, this standard is used to implement Data Mesh at [PayPal](https://about.pypl.com/). Starting with v2.2.0, it is maintained by a 501c6 non-profit organization called [AIDA User Group (Artificial Intelligence, Data, and Analytics User Group)](https://aidaug.org). On November 30th, 2023, [AIDA User Group](https://aidaug.org) and the [Linux Foundation AI & Data](https://lfaidata.foundation/) joined forces to create [Bitol](https://bitol.io). Bitol englobes ODCS and future standards & tools. diff --git a/docs/README.md b/docs/README.md index 475af3a..a8e859f 100644 --- a/docs/README.md +++ b/docs/README.md @@ -43,7 +43,7 @@ This section contains general information about the contract. ### Example ```YAML -apiVersion: v3.0.1 # Standard version +apiVersion: v3.0.2 # Standard version kind: DataContract id: 53581432-6c55-4ba2-a65f-72344a91553a @@ -66,7 +66,7 @@ tags: ['finance'] | Key | UX label | Required | Description | |--------------------------------------|---------------------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| apiVersion | Standard version | Yes | Version of the standard used to build data contract. Default value is `v3.0.1`. | +| apiVersion | Standard version | Yes | Version of the standard used to build data contract. Default value is `v3.0.2`. | | kind | Kind | Yes | The kind of file this is. Valid value is `DataContract`. 
| | id | ID | Yes | A unique identifier used to reduce the risk of dataset name collisions, such as a UUID. | | name | Name | No | Name of the data contract. | @@ -243,9 +243,9 @@ Some keys are more applicable when the described property is a column. |--------------------------|------------------------------|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | primaryKey | Primary Key | No | Boolean value specifying whether the field is primary or not. Default is false. | | primaryKeyPosition | Primary Key Position | No | If field is a primary key, the position of the primary key element. Starts from 1. Example of `account_id, name` being primary key columns, `account_id` has primaryKeyPosition 1 and `name` primaryKeyPosition 2. Default to -1. | -| logicalType | Logical Type | Yes | The logical field datatype. One of `string`, `date`, `number`, `integer`, `object`, `array` or `boolean`. | +| logicalType | Logical Type | No | The logical field datatype. One of `string`, `date`, `number`, `integer`, `object`, `array` or `boolean`. | | logicalTypeOptions | Logical Type Options | No | Additional optional metadata to describe the logical type. See [here](#logical-type-options) for more details about supported options for each `logicalType`. | -| physicalType | Physical Type | Yes | The physical element data type in the data source. For example, VARCHAR(2), DOUBLE, INT. | +| physicalType | Physical Type | No | The physical element data type in the data source. For example, VARCHAR(2), DOUBLE, INT. | | description | Description | No | Description of the element. | | required | Required | No | Indicates if the element may contain Null values; possible values are true and false. Default is false. 
| | unique | Unique | No | Indicates if the element contains unique values; possible values are true and false. Default is false. | @@ -270,7 +270,7 @@ Additional metadata options to more accurately define the data type. | array | maxItems | Maximum Items | No | Maximum number of items. | | array | minItems | Minimum Items | No | Minimum number of items. | | array | uniqueItems | Unique Items | No | If set to true, all items in the array are unique. | -| date | format | Format | No | Format of the date. Follows the format as prescribed by [JDK DateTimeFormatter](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html). For example, format 'yyyy-MM-dd'. | +| date | format | Format | No | Format of the date. Follows the format as prescribed by [JDK DateTimeFormatter](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html). Default value is using ISO 8601: 'YYYY-MM-DDTHH:mm:ss.SSSZ'. For example, format 'yyyy-MM-dd'. | | date | exclusiveMaximum | Exclusive Maximum | No | If set to true, all values are strictly less than the maximum value (values < maximum). Otherwise, less than or equal to the maximum value (values <= maximum). | | date | exclusiveMinimum | Exclusive Minimum | No | If set to true, all values are strictly greater than the minimum value (values > minimum). Otherwise, greater than or equal to the minimum value (values >= minimum). | | date | maximum | Maximum | No | All date values are less than or equal to this value (values <= maximum). | @@ -289,6 +289,53 @@ Additional metadata options to more accurately define the data type. | string | minLength | Minimum Length | No | Minimum length of the string. | | string | pattern | Pattern | No | Regular expression pattern to define valid value. Follows regular expression syntax from ECMA-262 (https://262.ecma-international.org/5.1/#sec-15.10.1). 
| +#### Expressing Date / Datetime / Timezone information + +Given the complexity of handling various date and time formats (e.g., date, datetime, time, timestamp, timestamp with and without timezone), the existing `logicalType` options currently support only `date`. To specify additional temporal details, `logicalType` should be used in conjunction with `logicalTypeOptions.format` or `physicalType` to define the desired format. Using `physicalType` lets you define a data type specific to your data source. + +``` yaml +version: 1.0.0 +kind: DataContract +id: 53581432-6c55-4ba2-a65f-72344a91553a +status: active +name: date_example +apiVersion: v3.0.2 +schema: + # Date Only + - name: event_date + logicalType: date + logicalTypeOptions: + - format: "yyyy-MM-dd" + examples: + - "2024-07-10" + + # Date & Time (UTC) + - name: created_at + logicalType: date + logicalTypeOptions: + - format: "yyyy-MM-ddTHH:mm:ssZ" + examples: + - "2024-03-10T14:22:35Z" + + # Time Only + - name: event_start_time + logicalType: date + logicalTypeOptions: + - format: "HH:mm:ss" + examples: + - "08:30:00" + + # Physical Type with Date & Time (UTC) + - name: event_date + logicalType: date + physicalType: DATETIME + logicalTypeOptions: + - format: "yyyy-MM-ddTHH:mm:ssZ" + examples: + - "2024-03-10T14:22:35Z" + +``` + ### Authoritative definitions Reference to an external definition on element logic or values. @@ -595,7 +642,7 @@ team: dateIn: 2022-10-01 - username: daustin role: Owner - comment: Keeper of the grail + description: Keeper of the grail name: David Austin dateIn: 2022-10-01 ``` @@ -607,6 +654,8 @@ The UX label is the label used in the UI and other user experiences. |-------------------------|----------------------|----------|--------------------------------------------------------------------------------------------| | team | Team | No | Object | | team.username | Username | No | The user's username or email. | +| team.name | Name | No | The user's name. 
| +| team.description | Description | No | The user's description. | | team.role | Role | No | The user's job role; Examples might be owner, data steward. There is no limit on the role. | | team.dateIn | Date In | No | The date when the user joined the team. | | team.dateOut | Date Out | No | The date when the user ceased to be part of the team. | @@ -722,7 +771,8 @@ Each server in the schema has the following structure: ```yaml servers: - - type: + - server: my-server-name + type: description: environment: # according to the server type @@ -734,11 +784,14 @@ servers: #### Common Server Properties -- **type**: The type of server. Valid values include various server technologies like `athena`, `bigquery`, `postgresql`, etc. -- **description**: A description of the server. -- **environment**: The environment where the server operates (e.g., `prod`, `dev`, `uat`). There are no set values. -- **roles**: An optional array of roles that have access to the server. -- **customProperties**: Any additional custom properties specific to the server that are not part of the standard. +| Key | UX label | Required | Description | +|------------------|-------------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| server | Server | Yes | Identifier of the server. | +| type | Type | Yes | Type of the server. Can be one of: api, athena, azure, bigquery, clickhouse, databricks, denodo, dremio, duckdb, glue, cloudsql, db2, informix, kafka, kinesis, local, mysql, oracle, postgresql, postgres, presto, pubsub, redshift, s3, sftp, snowflake, sqlserver, synapse, trino, vertica, custom. | +| description | Description | No | Description of the server. | +| environment | Environment | No | Environment of the server. 
Examples include: prod, preprod, dev, uat. | +| roles | Roles | No | List of roles that have access to the server. Check [roles](#roles) section for more details. | +| customProperties | Custom Properties | No | Custom properties that are not part of the standard. | ### Specific Server Properties diff --git a/docs/examples/all/full-example.odcs.yaml b/docs/examples/all/full-example.odcs.yaml index a1f0d2f..dd71ea6 100644 --- a/docs/examples/all/full-example.odcs.yaml +++ b/docs/examples/all/full-example.odcs.yaml @@ -16,7 +16,7 @@ description: tenant: ClimateQuantumInc kind: DataContract -apiVersion: v3.0.1 # Standard version (follows semantic versioning) +apiVersion: v3.0.2 # Standard version (follows semantic versioning) # Infrastructure & servers servers: @@ -32,6 +32,7 @@ schema: - name: tbl physicalName: tbl_1 physicalType: table + businessName: Core Payment Metrics description: Provides core payment metrics authoritativeDefinitions: - url: https://catalog.data.gov/dataset/air-quality @@ -41,7 +42,8 @@ schema: tags: [ 'finance', 'payments'] dataGranularityDescription: Aggregation on columns txn_ref_dt, pmt_txn_id properties: - - name: txn_ref_dt + - name: transaction_reference_date + physicalName: txn_ref_dt primaryKey: false primaryKeyPosition: -1 businessName: transaction reference date @@ -152,7 +154,7 @@ team: dateIn: "2022-10-01" - username: daustin role: Owner - comment: Keeper of the grail + description: Keeper of the grail dateIn: "2022-10-01" diff --git a/docs/examples/data-types/all-data-types.odcs.yaml b/docs/examples/data-types/all-data-types.odcs.yaml index 93d7b1f..5711d29 100644 --- a/docs/examples/data-types/all-data-types.odcs.yaml +++ b/docs/examples/data-types/all-data-types.odcs.yaml @@ -4,7 +4,7 @@ id: 53581432-6c55-4ba2-a65f-72344a91553a status: active name: my_table dataProduct: my_quantum -apiVersion: v3.0.1 +apiVersion: v3.0.2 schema: - name: transactions_tbl description: Provides core payment metrics diff --git 
a/docs/examples/fundamentals/table-column-description.odcs.yaml b/docs/examples/fundamentals/table-column-description.odcs.yaml index aacadbb..a225ee8 100644 --- a/docs/examples/fundamentals/table-column-description.odcs.yaml +++ b/docs/examples/fundamentals/table-column-description.odcs.yaml @@ -3,7 +3,7 @@ kind: DataContract id: 53581432-6c55-4ba2-a65f-72344a91553a status: active name: my_quantum -apiVersion: v3.0.1 +apiVersion: v3.0.2 schema: - name: tbl description: Provides core payment metrics diff --git a/docs/examples/quality/column-accuracy.odcs.yaml b/docs/examples/quality/column-accuracy.odcs.yaml index 445f0e3..8abbf5c 100644 --- a/docs/examples/quality/column-accuracy.odcs.yaml +++ b/docs/examples/quality/column-accuracy.odcs.yaml @@ -4,7 +4,7 @@ id: 53581432-6c55-4ba2-a65f-72344a91553a status: active name: my_table dataProduct: my_quantum -apiVersion: v3.0.1 +apiVersion: v3.0.2 schema: - name: Air_Quality description: Air quality of the city of New York diff --git a/docs/examples/quality/column-completeness.odcs.yaml b/docs/examples/quality/column-completeness.odcs.yaml index 92b31a7..c33141c 100644 --- a/docs/examples/quality/column-completeness.odcs.yaml +++ b/docs/examples/quality/column-completeness.odcs.yaml @@ -1,5 +1,5 @@ version: 1.0.0 -apiVersion: v3.0.1 +apiVersion: v3.0.2 kind: DataContract id: 53581432-6c55-4ba2-a65f-72344a91553a status: active diff --git a/docs/examples/quality/column-custom.odcs.yaml b/docs/examples/quality/column-custom.odcs.yaml index 80e68f8..676a409 100644 --- a/docs/examples/quality/column-custom.odcs.yaml +++ b/docs/examples/quality/column-custom.odcs.yaml @@ -1,5 +1,5 @@ version: 1.0.0 -apiVersion: v3.0.1 +apiVersion: v3.0.2 kind: DataContract id: 53581432-6c55-4ba2-a65f-72344a91553a status: active diff --git a/docs/examples/quality/column-validity.odcs.yaml b/docs/examples/quality/column-validity.odcs.yaml index 406d919..2d35810 100644 --- a/docs/examples/quality/column-validity.odcs.yaml +++ 
b/docs/examples/quality/column-validity.odcs.yaml @@ -1,5 +1,5 @@ version: 1.0.0 -apiVersion: v3.0.1 +apiVersion: v3.0.2 kind: DataContract id: 53581432-6c55-4ba2-a65f-72344a91553a status: active diff --git a/docs/examples/roles/service-and-operational-roles.odcs.yaml b/docs/examples/roles/service-and-operational-roles.odcs.yaml index b0adde2..f0510a9 100644 --- a/docs/examples/roles/service-and-operational-roles.odcs.yaml +++ b/docs/examples/roles/service-and-operational-roles.odcs.yaml @@ -5,7 +5,7 @@ status: active name: my_table dataProduct: my_quantum schema: [] -apiVersion: v3.0.1 +apiVersion: v3.0.2 roles: - role: microstrategy_user_opr access: read diff --git a/docs/examples/schema/all-schema-types.odcs.yaml b/docs/examples/schema/all-schema-types.odcs.yaml index b32f503..5537d27 100644 --- a/docs/examples/schema/all-schema-types.odcs.yaml +++ b/docs/examples/schema/all-schema-types.odcs.yaml @@ -3,7 +3,7 @@ kind: DataContract id: 53581432-6c55-4ba2-a65f-72344a91553a status: active name: my_quantum -apiVersion: v3.0.1 +apiVersion: v3.0.2 schema: - name: tbl logicalType: object diff --git a/docs/examples/schema/kafka-schema.odcs.yaml b/docs/examples/schema/kafka-schema.odcs.yaml index 3e48ef1..33ac834 100644 --- a/docs/examples/schema/kafka-schema.odcs.yaml +++ b/docs/examples/schema/kafka-schema.odcs.yaml @@ -1,4 +1,4 @@ -apiVersion: v3.0.1 +apiVersion: v3.0.2 kind: DataContract id: orders status: development diff --git a/docs/examples/schema/kafka-schemaregistry.odcs.yaml b/docs/examples/schema/kafka-schemaregistry.odcs.yaml index a5f5033..e671d00 100644 --- a/docs/examples/schema/kafka-schemaregistry.odcs.yaml +++ b/docs/examples/schema/kafka-schemaregistry.odcs.yaml @@ -1,4 +1,4 @@ -apiVersion: v3.0.1 +apiVersion: v3.0.2 kind: DataContract id: orders status: production diff --git a/docs/examples/schema/table-column.odcs.yaml b/docs/examples/schema/table-column.odcs.yaml index 863f15f..1db0038 100644 --- a/docs/examples/schema/table-column.odcs.yaml +++ 
b/docs/examples/schema/table-column.odcs.yaml @@ -4,7 +4,7 @@ id: 53581432-6c55-4ba2-a65f-72344a91553b status: active name: my_table dataProduct: my_quantum -apiVersion: v3.0.1 +apiVersion: v3.0.2 schema: - name: tbl physicalType: table diff --git a/docs/examples/schema/table-columns-with-partition.odcs.yaml b/docs/examples/schema/table-columns-with-partition.odcs.yaml index fdb9684..e3f7172 100644 --- a/docs/examples/schema/table-columns-with-partition.odcs.yaml +++ b/docs/examples/schema/table-columns-with-partition.odcs.yaml @@ -4,7 +4,7 @@ id: 53581432-6c55-4ba2-a65f-72344a91553c status: active name: my_table dataProduct: my_quantum -apiVersion: v3.0.1 +apiVersion: v3.0.2 schema: - name: tbl physicalType: table diff --git a/docs/examples/server/azure-server.odcs.yaml b/docs/examples/server/azure-server.odcs.yaml index 1c3cdf2..5f961dd 100644 --- a/docs/examples/server/azure-server.odcs.yaml +++ b/docs/examples/server/azure-server.odcs.yaml @@ -1,5 +1,5 @@ version: 1.0.0 -apiVersion: v3.0.1 +apiVersion: v3.0.2 kind: DataContract id: abc123 status: in development diff --git a/docs/examples/server/kafka-server.odcs.yaml b/docs/examples/server/kafka-server.odcs.yaml index e155f14..e169d68 100644 --- a/docs/examples/server/kafka-server.odcs.yaml +++ b/docs/examples/server/kafka-server.odcs.yaml @@ -1,5 +1,5 @@ version: 1.0.0 -apiVersion: v3.0.1 +apiVersion: v3.0.2 kind: DataContract id: abc123 status: in development diff --git a/docs/examples/sla/database-table-sla.odcs.yaml b/docs/examples/sla/database-table-sla.odcs.yaml index 53bd9ac..16a5d26 100644 --- a/docs/examples/sla/database-table-sla.odcs.yaml +++ b/docs/examples/sla/database-table-sla.odcs.yaml @@ -1,5 +1,5 @@ version: 1.0.0 -apiVersion: v3.0.1 +apiVersion: v3.0.2 kind: DataContract id: 53581432-6c55-4ba2-a65f-72344a91553a status: active @@ -27,7 +27,7 @@ slaProperties: value: 1 valueExt: 1 unit: d - column: tab1.txn_ref_dt + element: tab1.txn_ref_dt - property: timeOfAvailability value: 09:00-08:00 
element: tab1.txn_ref_dt diff --git a/docs/examples/stakeholders/basic-four-dpo.odcs.yaml b/docs/examples/stakeholders/basic-four-dpo.odcs.yaml index a48e54c..1cb1b1e 100644 --- a/docs/examples/stakeholders/basic-four-dpo.odcs.yaml +++ b/docs/examples/stakeholders/basic-four-dpo.odcs.yaml @@ -1,5 +1,5 @@ version: 1.0.0 -apiVersion: v3.0.1 +apiVersion: v3.0.2 kind: DataContract id: 53581432-6c55-4ba2-a65f-72344a91553a status: active @@ -20,7 +20,7 @@ team: - username: cjane role: dpo dateIn: "2019-03-14" - comment: Minor interruption due to sabbatical, will be back by end of April 2021 + description: Minor interruption due to sabbatical, will be back by end of April 2021 dateOut: "2021-04-01" replacedByUsername: bkid - username: bkid diff --git a/schema/odcs-json-schema-latest.json b/schema/odcs-json-schema-latest.json index 435e1db..cb9bfb9 100644 --- a/schema/odcs-json-schema-latest.json +++ b/schema/odcs-json-schema-latest.json @@ -16,9 +16,9 @@ }, "apiVersion": { "type": "string", - "default": "v3.0.1", - "description": "Version of the standard used to build data contract. Default value is v3.0.1.", - "enum": ["v3.0.1", "v3.0.0", "v2.2.2", "v2.2.1", "v2.2.0"] + "default": "v3.0.2", + "description": "Version of the standard used to build data contract. Default value is v3.0.2.", + "enum": ["v3.0.2","v3.0.1", "v3.0.0", "v2.2.2", "v2.2.1", "v2.2.0"] }, "id": { "type": "string", @@ -626,7 +626,7 @@ } }, "required": [ - "staging_dir", + "stagingDir", "schema" ] }, @@ -1555,6 +1555,11 @@ "type": "string", "description": "The physical element data type in the data source. For example, VARCHAR(2), DOUBLE, INT." }, + "physicalName": { + "type": "string", + "description": "Physical name.", + "examples": ["col_str_a"] + }, "required": { "type": "boolean", "default": false, @@ -1913,6 +1918,7 @@ "Tags": { "type": "array", "description": "A list of tags that may be assigned to the elements (object or property); the tags keyword may appear at any level. 
Tags may be used to better categorize an element. For example, `finance`, `sensitive`, `employee_record`.", + "examples": ["finance", "sensitive", "employee_record"], "items": { "type": "string" } @@ -2195,7 +2201,22 @@ "properties": { "username": { "type": "string", - "description": "The user's username or email." + "description": "The user's username or email.", + "examples": [ + "mail@example.com", + "uid12345678" + ] + }, + "name": { + "type": "string", + "description": "The user's name.", + "examples": [ + "Jane Doe" + ] + }, + "description": { + "type": "string", + "description": "The user's description." }, "role": { "type": "string", diff --git a/schema/odcs-json-schema-v3.0.2.json b/schema/odcs-json-schema-v3.0.2.json new file mode 100644 index 0000000..cb9bfb9 --- /dev/null +++ b/schema/odcs-json-schema-v3.0.2.json @@ -0,0 +1,2382 @@ +{ + "$schema": "https://json-schema.org/draft/2019-09/schema", + "title": "Open Data Contract Standard (ODCS)", + "description": "An open data contract specification to establish agreement between data producers and consumers.", + "type": "object", + "properties": { + "version": { + "type": "string", + "description": "Current version of the data contract." + }, + "kind": { + "type": "string", + "default": "DataContract", + "description": "The kind of file this is. Valid value is `DataContract`.", + "enum": ["DataContract"] + }, + "apiVersion": { + "type": "string", + "default": "v3.0.2", + "description": "Version of the standard used to build data contract. Default value is v3.0.2.", + "enum": ["v3.0.2","v3.0.1", "v3.0.0", "v2.2.2", "v2.2.1", "v2.2.0"] + }, + "id": { + "type": "string", + "description": "A unique identifier used to reduce the risk of dataset name collisions, such as a UUID." + }, + "name": { + "type": "string", + "description": "Name of the data contract." + }, + "tenant": { + "type": "string", + "description": "Indicates the property the data is primarily associated with. Value is case insensitive." 
+ }, + "tags": { + "$ref": "#/$defs/Tags" + }, + "status": { + "type": "string", + "description": "Current status of the dataset.", + "examples": [ + "proposed", "draft", "active", "deprecated", "retired" + ] + }, + "servers": { + "type": "array", + "description": "List of servers where the datasets reside.", + "items": { + "$ref": "#/$defs/Server" + } + }, + "dataProduct": { + "type": "string", + "description": "The name of the data product." + }, + "description": { + "type": "object", + "description": "High level description of the dataset.", + "properties": { + "usage": { + "type": "string", + "description": "Intended usage of the dataset." + }, + "purpose": { + "type": "string", + "description": "Purpose of the dataset." + }, + "limitations": { + "type": "string", + "description": "Limitations of the dataset." + }, + "authoritativeDefinitions": { + "$ref": "#/$defs/AuthoritativeDefinitions" + }, + "customProperties": { + "$ref": "#/$defs/CustomProperties" + } + } + }, + "domain": { + "type": "string", + "description": "Name of the logical data domain.", + "examples": ["imdb_ds_aggregate", "receiver_profile_out", "transaction_profile_out"] + }, + "schema": { + "type": "array", + "description": "A list of elements within the schema to be cataloged.", + "items": { + "$ref": "#/$defs/SchemaObject" + } + }, + "support": { + "$ref": "#/$defs/Support" + }, + "price": { + "$ref": "#/$defs/Pricing" + }, + "team": { + "type": "array", + "items": { + "$ref": "#/$defs/Team" + } + }, + "roles": { + "type": "array", + "description": "A list of roles that will provide user access to the dataset.", + "items": { + "$ref": "#/$defs/Role" + } + }, + "slaDefaultElement": { + "type": "string", + "description": "Element (using the element path notation) to do the checks on." + }, + "slaProperties": { + "type": "array", + "description": "A list of key/value pairs for SLA specific properties. 
There is no limit on the type of properties (more details to come).", + "items": { + "$ref": "#/$defs/ServiceLevelAgreementProperty" + } + }, + "authoritativeDefinitions": { + "$ref": "#/$defs/AuthoritativeDefinitions" + }, + "customProperties": { + "$ref": "#/$defs/CustomProperties" + }, + "contractCreatedTs": { + "type": "string", + "format": "date-time", + "description": "Timestamp in UTC of when the data contract was created." + } + }, + "required": ["version", "apiVersion", "kind", "id", "status"], + "additionalProperties": false, + "$defs": { + "Server": { + "type": "object", + "description": "Data source details of where data is physically stored.", + "properties": { + "server": { + "type": "string", + "description": "Identifier of the server." + }, + "type": { + "type": "string", + "description": "Type of the server.", + "enum": [ + "api", "athena", "azure", "bigquery", "clickhouse", "databricks", "denodo", "dremio", + "duckdb", "glue", "cloudsql", "db2", "informix", "kafka", "kinesis", "local", + "mysql", "oracle", "postgresql", "postgres", "presto", "pubsub", + "redshift", "s3", "sftp", "snowflake", "sqlserver", "synapse", "trino", "vertica", "custom" + ] + }, + "description": { + "type": "string", + "description": "Description of the server." 
+ }, + "environment": { + "type": "string", + "description": "Environment of the server.", + "examples": ["prod", "preprod", "dev", "uat"] + }, + "roles": { + "type": "array", + "description": "List of roles that have access to the server.", + "items": { + "$ref": "#/$defs/Role" + } + }, + "customProperties": { + "$ref": "#/$defs/CustomProperties" + } + }, + "allOf": [ + { + "if": { + "properties": { + "type": { + "const": "api" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/ApiServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "athena" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/AthenaServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "azure" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/AzureServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "bigquery" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/BigQueryServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "clickhouse" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/ClickHouseServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "databricks" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/DatabricksServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "denodo" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/DenodoServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "dremio" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/DremioServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "duckdb" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/DuckdbServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "glue" + } + }, + "required": ["type"] + }, + 
"then": { + "$ref": "#/$defs/ServerSource/GlueServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "cloudsql" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/GoogleCloudSqlServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "db2" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/IBMDB2Server" + } + }, + { + "if": { + "properties": { + "type": { + "const": "informix" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/InformixServer" + } + }, + + { + "if": { + "properties": { + "type": { + "const": "custom" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/CustomServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "kafka" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/KafkaServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "kinesis" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/KinesisServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "local" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/LocalServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "mysql" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/MySqlServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "oracle" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/OracleServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "postgresql" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/PostgresServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "postgres" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/PostgresServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "presto" + } + 
}, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/PrestoServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "pubsub" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/PubSubServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "redshift" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/RedshiftServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "s3" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/S3Server" + } + }, + { + "if": { + "properties": { + "type": { + "const": "sftp" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/SftpServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "snowflake" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/SnowflakeServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "sqlserver" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/SqlserverServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "synapse" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/SynapseServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "trino" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/TrinoServer" + } + }, + { + "if": { + "properties": { + "type": { + "const": "vertica" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/ServerSource/VerticaServer" + } + } + ], + "required": ["server", "type"] + }, + "ServerSource": { + "ApiServer": { + "type": "object", + "title": "ApiServer", + "properties": { + "location": { + "type": "string", + "format": "uri", + "description": "The URL of the API.", + "examples": [ + "https://api.example.com/v1" + ] + } + }, + "required": [ + "location" + ] + }, + "AthenaServer": { + "type": "object", + 
"title": "AthenaServer", + "properties": { + "stagingDir": { + "type": "string", + "format": "uri", + "description": "Amazon Athena automatically stores query results and metadata information for each query that runs in a query result location that you can specify in Amazon S3.", + "examples": [ + "s3://my_storage_account_name/my_container/path" + ] + }, + "schema": { + "type": "string", + "description": "Identify the schema in the data source in which your tables exist." + }, + "catalog": { + "type": "string", + "description": "Identify the name of the Data Source, also referred to as a Catalog.", + "default": "awsdatacatalog" + }, + "regionName": { + "type": "string", + "description": "The region your AWS account uses.", + "examples": ["eu-west-1"] + } + }, + "required": [ + "stagingDir", + "schema" + ] + }, + "AzureServer": { + "type": "object", + "title": "AzureServer", + "properties": { + "location": { + "type": "string", + "format": "uri", + "description": "Fully qualified path to Azure Blob Storage or Azure Data Lake Storage (ADLS), supports globs.", + "examples": [ + "az://my_storage_account_name.blob.core.windows.net/my_container/path/*.parquet", + "abfss://my_storage_account_name.dfs.core.windows.net/my_container_name/path/*.parquet" + ] + }, + "format": { + "type": "string", + "enum": [ + "parquet", + "delta", + "json", + "csv" + ], + "description": "File format." + }, + "delimiter": { + "type": "string", + "enum": [ + "new_line", + "array" + ], + "description": "Only for format = json. How multiple json documents are delimited within one file" + } + }, + "required": [ + "location", + "format" + ] + }, + "BigQueryServer": { + "type": "object", + "title": "BigQueryServer", + "properties": { + "project": { + "type": "string", + "description": "The GCP project name." + }, + "dataset": { + "type": "string", + "description": "The GCP dataset name." 
+ } + }, + "required": [ + "project", + "dataset" + ] + }, + "ClickHouseServer": { + "type": "object", + "title": "ClickHouseServer", + "properties": { + "host": { + "type": "string", + "description": "The host of the ClickHouse server." + }, + "port": { + "type": "integer", + "description": "The port to the ClickHouse server." + }, + "database": { + "type": "string", + "description": "The name of the database." + } + }, + "required": [ + "host", + "port", + "database" + ] + }, + "DatabricksServer": { + "type": "object", + "title": "DatabricksServer", + "properties": { + "host": { + "type": "string", + "description": "The Databricks host", + "examples": [ + "dbc-abcdefgh-1234.cloud.databricks.com" + ] + }, + "catalog": { + "type": "string", + "description": "The name of the Hive or Unity catalog" + }, + "schema": { + "type": "string", + "description": "The schema name in the catalog" + } + }, + "required": [ + "catalog", + "schema" + ] + }, + "DenodoServer": { + "type": "object", + "title": "DenodoServer", + "properties": { + "host": { + "type": "string", + "description": "The host of the Denodo server." + }, + "port": { + "type": "integer", + "description": "The port of the Denodo server." + }, + "database": { + "type": "string", + "description": "The name of the database." + } + }, + "required": [ + "host", + "port" + ] + }, + "DremioServer": { + "type": "object", + "title": "DremioServer", + "properties": { + "host": { + "type": "string", + "description": "The host of the Dremio server." + }, + "port": { + "type": "integer", + "description": "The port of the Dremio server." + }, + "schema": { + "type": "string", + "description": "The name of the schema." + } + }, + "required": [ + "host", + "port" + ] + }, + "DuckdbServer": { + "type": "object", + "title": "DuckdbServer", + "properties": { + "database": { + "type": "string", + "description": "Path to duckdb database file." + }, + "schema": { + "type": "string", + "description": "The name of the schema."
+ } + }, + "required": [ + "database" + ] + }, + "GlueServer": { + "type": "object", + "title": "GlueServer", + "properties": { + "account": { + "type": "string", + "description": "The AWS Glue account", + "examples": [ + "1234-5678-9012" + ] + }, + "database": { + "type": "string", + "description": "The AWS Glue database name", + "examples": [ + "my_database" + ] + }, + "location": { + "type": "string", + "format": "uri", + "description": "The AWS S3 path. Must be in the form of a URL.", + "examples": [ + "s3://datacontract-example-orders-latest/data/{model}" + ] + }, + "format": { + "type": "string", + "description": "The format of the files", + "examples": [ + "parquet", + "csv", + "json", + "delta" + ] + } + }, + "required": [ + "account", + "database" + ] + }, + "GoogleCloudSqlServer": { + "type": "object", + "title": "GoogleCloudSqlServer", + "properties": { + "host": { + "type": "string", + "description": "The host of the Google Cloud Sql server." + }, + "port": { + "type": "integer", + "description": "The port of the Google Cloud Sql server." + }, + "database": { + "type": "string", + "description": "The name of the database." + }, + "schema": { + "type": "string", + "description": "The name of the schema." + } + }, + "required": [ + "host", + "port", + "database", + "schema" + ] + }, + "IBMDB2Server": { + "type": "object", + "title": "IBMDB2Server", + "properties": { + "host": { + "type": "string", + "description": "The host of the IBM DB2 server." + }, + "port": { + "type": "integer", + "description": "The port of the IBM DB2 server." + }, + "database": { + "type": "string", + "description": "The name of the database." + }, + "schema": { + "type": "string", + "description": "The name of the schema." + } + }, + "required": [ + "host", + "port", + "database" + ] + }, + "InformixServer": { + "type": "object", + "title": "InformixServer", + "properties": { + "host": { + "type": "string", + "description": "The host to the Informix server. 
" + }, + "port": { + "type": "integer", + "description": "The port to the Informix server. Defaults to 9088." + }, + "database": { + "type": "string", + "description": "The name of the database." + } + }, + "required": [ + "host", + "database" + ] + }, + "CustomServer": { + "type": "object", + "title": "CustomServer", + "properties": { + "account": { + "type": "string", + "description": "Account used by the server." + }, + "catalog": { + "type": "string", + "description": "Name of the catalog." + }, + "database": { + "type": "string", + "description": "Name of the database." + }, + "dataset": { + "type": "string", + "description": "Name of the dataset." + }, + "delimiter": { + "type": "string", + "description": "Delimiter." + }, + "endpointUrl": { + "type": "string", + "description": "Server endpoint.", + "format": "uri" + }, + "format": { + "type": "string", + "description": "File format." + }, + "host": { + "type": "string", + "description": "Host name or IP address." + }, + "location": { + "type": "string", + "description": "A URL to a location.", + "format": "uri" + }, + "path": { + "type": "string", + "description": "Relative or absolute path to the data file(s)." + }, + "port": { + "type": "integer", + "description": "Port to the server. No default value is assumed for custom servers." + }, + "project": { + "type": "string", + "description": "Project name." + }, + "region": { + "type": "string", + "description": "Cloud region." + }, + "regionName": { + "type": "string", + "description": "Region name." + }, + "schema": { + "type": "string", + "description": "Name of the schema." + }, + "serviceName": { + "type": "string", + "description": "Name of the service." + }, + "stagingDir": { + "type": "string", + "description": "Staging directory." + }, + "warehouse": { + "type": "string", + "description": "Name of the cluster or warehouse." 
+ } + } + }, + "KafkaServer": { + "type": "object", + "title": "KafkaServer", + "description": "Kafka Server", + "properties": { + "host": { + "type": "string", + "description": "The bootstrap server of the kafka cluster." + }, + "format": { + "type": "string", + "description": "The format of the messages.", + "examples": ["json", "avro", "protobuf", "xml"], + "default": "json" + } + }, + "required": [ + "host" + ] + }, + "KinesisServer": { + "type": "object", + "title": "KinesisDataStreamsServer", + "description": "Kinesis Data Streams Server", + "properties": { + "region": { + "type": "string", + "description": "AWS region.", + "examples": [ + "eu-west-1" + ] + }, + "format": { + "type": "string", + "description": "The format of the record", + "examples": [ + "json", + "avro", + "protobuf" + ] + } + } + }, + "LocalServer": { + "type": "object", + "title": "LocalServer", + "properties": { + "path": { + "type": "string", + "description": "The relative or absolute path to the data file(s).", + "examples": [ + "./folder/data.parquet", + "./folder/*.parquet" + ] + }, + "format": { + "type": "string", + "description": "The format of the file(s)", + "examples": [ + "json", + "parquet", + "delta", + "csv" + ] + } + }, + "required": [ + "path", + "format" + ] + }, + "MySqlServer": { + "type": "object", + "title": "MySqlServer", + "properties": { + "host": { + "type": "string", + "description": "The host of the MySql server." + }, + "port": { + "type": "integer", + "description": "The port of the MySql server." + }, + "database": { + "type": "string", + "description": "The name of the database." 
+ } + }, + "required": [ + "host", + "port", + "database" + ] + }, + "OracleServer": { + "type": "object", + "title": "OracleServer", + "properties": { + "host": { + "type": "string", + "description": "The host to the oracle server", + "examples": [ + "localhost" + ] + }, + "port": { + "type": "integer", + "description": "The port to the oracle server.", + "examples": [ + 1523 + ] + }, + "serviceName": { + "type": "string", + "description": "The name of the service.", + "examples": [ + "service" + ] + } + }, + "required": [ + "host", + "port", + "serviceName" + ] + }, + "PostgresServer": { + "type": "object", + "title": "PostgresServer", + "properties": { + "host": { + "type": "string", + "description": "The host to the Postgres server" + }, + "port": { + "type": "integer", + "description": "The port to the Postgres server." + }, + "database": { + "type": "string", + "description": "The name of the database." + }, + "schema": { + "type": "string", + "description": "The name of the schema in the database." + } + }, + "required": [ + "host", + "port", + "database", + "schema" + ] + }, + "PrestoServer": { + "type": "object", + "title": "PrestoServer", + "properties": { + "host": { + "type": "string", + "description": "The host to the Presto server", + "examples": [ + "localhost:8080" + ] + }, + "catalog": { + "type": "string", + "description": "The name of the catalog.", + "examples": [ + "postgres" + ] + }, + "schema": { + "type": "string", + "description": "The name of the schema.", + "examples": [ + "public" + ] + } + }, + "required": [ + "host" + ] + }, + "PubSubServer": { + "type": "object", + "title": "PubSubServer", + "properties": { + "project": { + "type": "string", + "description": "The GCP project name." + } + }, + "required": [ + "project" + ] + }, + "RedshiftServer": { + "type": "object", + "title": "RedshiftServer", + "properties": { + "host": { + "type": "string", + "description": "An optional string describing the server." 
+ }, + "database": { + "type": "string", + "description": "The name of the database." + }, + "schema": { + "type": "string", + "description": "The name of the schema." + }, + "region": { + "type": "string", + "description": "AWS region of Redshift server.", + "examples": ["us-east-1"] + }, + "account": { + "type": "string", + "description": "The account used by the server." + } + }, + "required": [ + "database", + "schema" + ] + }, + "S3Server": { + "type": "object", + "title": "S3Server", + "properties": { + "location": { + "type": "string", + "format": "uri", + "description": "S3 URL, starting with `s3://`", + "examples": [ + "s3://datacontract-example-orders-latest/data/{model}/*.json" + ] + }, + "endpointUrl": { + "type": "string", + "format": "uri", + "description": "The server endpoint for S3-compatible servers.", + "examples": ["https://minio.example.com"] + }, + "format": { + "type": "string", + "enum": [ + "parquet", + "delta", + "json", + "csv" + ], + "description": "File format." + }, + "delimiter": { + "type": "string", + "enum": [ + "new_line", + "array" + ], + "description": "Only for format = json. How multiple json documents are delimited within one file" + } + }, + "required": [ + "location" + ] + }, + "SftpServer": { + "type": "object", + "title": "SftpServer", + "properties": { + "location": { + "type": "string", + "format": "uri", + "pattern": "^sftp://.*", + "description": "SFTP URL, starting with `sftp://`", + "examples": [ + "sftp://123.123.12.123/{model}/*.json" + ] + }, + "format": { + "type": "string", + "enum": [ + "parquet", + "delta", + "json", + "csv" + ], + "description": "File format." + }, + "delimiter": { + "type": "string", + "enum": [ + "new_line", + "array" + ], + "description": "Only for format = json. 
How multiple json documents are delimited within one file" + } + }, + "required": [ + "location" + ] + }, + "SnowflakeServer": { + "type": "object", + "title": "SnowflakeServer", + "properties": { + "host": { + "type": "string", + "description": "The host to the Snowflake server" + }, + "port": { + "type": "integer", + "description": "The port to the Snowflake server." + }, + "account": { + "type": "string", + "description": "The Snowflake account used by the server." + }, + "database": { + "type": "string", + "description": "The name of the database." + }, + "schema": { + "type": "string", + "description": "The name of the schema." + }, + "warehouse": { + "type": "string", + "description": "The name of the cluster of resources that is a Snowflake virtual warehouse." + } + }, + "required": [ + "account", + "database", + "schema" + ] + }, + "SqlserverServer": { + "type": "object", + "title": "SqlserverServer", + "properties": { + "host": { + "type": "string", + "description": "The host to the database server", + "examples": [ + "localhost" + ] + }, + "port": { + "type": "integer", + "description": "The port to the database server.", + "default": 1433, + "examples": [ + 1433 + ] + }, + "database": { + "type": "string", + "description": "The name of the database.", + "examples": [ + "database" + ] + }, + "schema": { + "type": "string", + "description": "The name of the schema in the database.", + "examples": [ + "dbo" + ] + } + }, + "required": [ + "host", + "database", + "schema" + ] + }, + "SynapseServer": { + "type": "object", + "title": "SynapseServer", + "properties": { + "host": { + "type": "string", + "description": "The host of the Synapse server." + }, + "port": { + "type": "integer", + "description": "The port of the Synapse server." + }, + "database": { + "type": "string", + "description": "The name of the database." 
+ } + }, + "required": [ + "host", + "port", + "database" + ] + }, + "TrinoServer": { + "type": "object", + "title": "TrinoServer", + "properties": { + "host": { + "type": "string", + "description": "The Trino host URL.", + "examples": [ + "localhost" + ] + }, + "port": { + "type": "integer", + "description": "The Trino port." + }, + "catalog": { + "type": "string", + "description": "The name of the catalog.", + "examples": [ + "hive" + ] + }, + "schema": { + "type": "string", + "description": "The name of the schema in the database.", + "examples": [ + "my_schema" + ] + } + }, + "required": [ + "host", + "port", + "catalog", + "schema" + ] + }, + "VerticaServer": { + "type": "object", + "title": "VerticaServer", + "properties": { + "host": { + "type": "string", + "description": "The host of the Vertica server." + }, + "port": { + "type": "integer", + "description": "The port of the Vertica server." + }, + "database": { + "type": "string", + "description": "The name of the database." + }, + "schema": { + "type": "string", + "description": "The name of the schema." + } + }, + "required": [ + "host", + "port", + "database", + "schema" + ] + } + }, + "SchemaElement": { + "type": "object", + "properties": { + "name": { + "type": "string", + "description": "Name of the element." + }, + "physicalType": { + "type": "string", + "description": "The physical element data type in the data source.", + "examples": ["table", "view", "topic", "file"] + }, + "description": { + "type": "string", + "description": "Description of the element." + }, + "businessName": { + "type": "string", + "description": "The business name of the element." 
+ }, + "authoritativeDefinitions": { + "$ref": "#/$defs/AuthoritativeDefinitions" + }, + "tags": { + "$ref": "#/$defs/Tags" + }, + "customProperties": { + "$ref": "#/$defs/CustomProperties" + } + } + }, + "SchemaObject": { + "type": "object", + "properties": { + "logicalType": { + "type": "string", + "description": "The logical element data type.", + "enum": ["object"] + }, + "physicalName": { + "type": "string", + "description": "Physical name.", + "examples": ["table_1_2_0"] + }, + "dataGranularityDescription": { + "type": "string", + "description": "Granular level of the data in the object.", + "examples": ["Aggregation by country"] + }, + "properties": { + "type": "array", + "description": "A list of properties for the object.", + "items": { + "$ref": "#/$defs/SchemaProperty" + } + }, + "quality": { + "$ref": "#/$defs/DataQualityChecks" + } + }, + "allOf": [ + { + "$ref": "#/$defs/SchemaElement" + } + ], + "required": ["name"], + "unevaluatedProperties": false + }, + "SchemaBaseProperty": { + "type": "object", + "properties": { + "primaryKey": { + "type": "boolean", + "description": "Boolean value specifying whether the element is primary or not. Default is false." + }, + "primaryKeyPosition": { + "type": "integer", + "default": -1, + "description": "If element is a primary key, the position of the primary key element. Starts from 1. Example of `account_id, name` being primary key columns, `account_id` has primaryKeyPosition 1 and `name` primaryKeyPosition 2. Default to -1." + }, + "logicalType": { + "type": "string", + "description": "The logical element data type.", + "enum": ["string", "date", "number", "integer", "object", "array", "boolean"] + }, + "logicalTypeOptions": { + "type": "object", + "description": "Additional optional metadata to describe the logical type." + }, + "physicalType": { + "type": "string", + "description": "The physical element data type in the data source. For example, VARCHAR(2), DOUBLE, INT." 
+ }, + "physicalName": { + "type": "string", + "description": "Physical name.", + "examples": ["col_str_a"] + }, + "required": { + "type": "boolean", + "default": false, + "description": "Indicates if the element may contain Null values; possible values are true and false. Default is false." + }, + "unique": { + "type": "boolean", + "default": false, + "description": "Indicates if the element contains unique values; possible values are true and false. Default is false." + }, + "partitioned": { + "type": "boolean", + "default": false, + "description": "Indicates if the element is partitioned; possible values are true and false." + }, + "partitionKeyPosition": { + "type": "integer", + "default": -1, + "description": "If element is used for partitioning, the position of the partition element. Starts from 1. Example of `country, year` being partition columns, `country` has partitionKeyPosition 1 and `year` partitionKeyPosition 2. Default to -1." + }, + "classification": { + "type": "string", + "description": "Can be anything, like confidential, restricted, and public to more advanced categorization. Some companies like PayPal, use data classification indicating the class of data in the element; expected values are 1, 2, 3, 4, or 5.", + "examples": ["confidential", "restricted", "public"] + }, + "encryptedName": { + "type": "string", + "description": "The element name within the dataset that contains the encrypted element value. For example, unencrypted element `email_address` might have an encryptedName of `email_address_encrypt`." + }, + "transformSourceObjects": { + "type": "array", + "description": "List of objects in the data source used in the transformation.", + "items": { + "type": "string" + } + }, + "transformLogic": { + "type": "string", + "description": "Logic used in the element transformation." + }, + "transformDescription": { + "type": "string", + "description": "Describes the transform logic in very simple terms." 
+ }, + "examples": { + "type": "array", + "description": "List of sample element values.", + "items": { + "$ref": "#/$defs/AnyType" + } + }, + "criticalDataElement": { + "type": "boolean", + "default": false, + "description": "True or false indicator; If element is considered a critical data element (CDE) then true else false." + }, + "quality": { + "$ref": "#/$defs/DataQualityChecks" + } + }, + "allOf": [ + { + "$ref": "#/$defs/SchemaElement" + }, + { + "if": { + "properties": { + "logicalType": { + "const": "string" + } + } + }, + "then": { + "properties": { + "logicalTypeOptions": { + "type": "object", + "properties": { + "minLength": { + "type": "integer", + "minimum": 0, + "description": "Minimum length of the string." + }, + "maxLength": { + "type": "integer", + "minimum": 0, + "description": "Maximum length of the string." + }, + "pattern": { + "type": "string", + "description": "Regular expression pattern to define valid value. Follows regular expression syntax from ECMA-262 (https://262.ecma-international.org/5.1/#sec-15.10.1)." + }, + "format": { + "type": "string", + "examples": ["password", "byte", "binary", "email", "uuid", "uri", "hostname", "ipv4", "ipv6"], + "description": "Provides extra context about what format the string follows." + } + }, + "additionalProperties": false + } + } + } + }, + { + "if": { + "properties": { + "logicalType": { + "const": "date" + } + } + }, + "then": { + "properties": { + "logicalTypeOptions": { + "type": "object", + "properties": { + "format": { + "type": "string", + "examples": ["yyyy-MM-dd", "yyyy-MM-dd HH:mm:ss", "HH:mm:ss"], + "description": "Format of the date. Follows the format as prescribed by [JDK DateTimeFormatter](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html). For example, format 'yyyy-MM-dd'." + }, + "exclusiveMaximum": { + "type": "boolean", + "default": false, + "description": "If set to true, all values are strictly less than the maximum value (values < maximum). 
Otherwise, less than or equal to the maximum value (values <= maximum)." + }, + "maximum": { + "type": "string", + "description": "All date values are less than or equal to this value (values <= maximum)." + }, + "exclusiveMinimum": { + "type": "boolean", + "default": false, + "description": "If set to true, all values are strictly greater than the minimum value (values > minimum). Otherwise, greater than or equal to the minimum value (values >= minimum)." + }, + "minimum": { + "type": "string", + "description": "All date values are greater than or equal to this value (values >= minimum)." + } + }, + "additionalProperties": false + } + } + } + }, + { + "if": { + "anyOf": [ + { + "properties": { + "logicalType": { + "const": "integer" + } + } + } + ] + }, + "then": { + "properties": { + "logicalTypeOptions": { + "type": "object", + "properties": { + "multipleOf": { + "type": "number", + "exclusiveMinimum": 0, + "description": "Values must be multiples of this number. For example, multiple of 5 has valid values 0, 5, 10, -5." + }, + "maximum": { + "type": "number", + "description": "All values are less than or equal to this value (values <= maximum)." + }, + "exclusiveMaximum": { + "type": "boolean", + "default": false, + "description": "If set to true, all values are strictly less than the maximum value (values < maximum). Otherwise, less than or equal to the maximum value (values <= maximum)." + }, + "minimum": { + "type": "number", + "description": "All values are greater than or equal to this value (values >= minimum)." + }, + "exclusiveMinimum": { + "type": "boolean", + "default": false, + "description": "If set to true, all values are strictly greater than the minimum value (values > minimum). Otherwise, greater than or equal to the minimum value (values >= minimum)." 
+ }, + "format": { + "type": "string", + "default": "i32", + "description": "Format of the value in terms of how many bits of space it can use and whether it is signed or unsigned (follows the Rust integer types).", + "enum": ["i8", "i16", "i32", "i64", "i128", "u8", "u16", "u32", "u64", "u128"] + } + }, + "additionalProperties": false + } + } + } + }, + { + "if": { + "anyOf": [ + { + "properties": { + "logicalType": { + "const": "number" + } + } + } + ] + }, + "then": { + "properties": { + "logicalTypeOptions": { + "type": "object", + "properties": { + "multipleOf": { + "type": "number", + "exclusiveMinimum": 0, + "description": "Values must be multiples of this number. For example, multiple of 5 has valid values 0, 5, 10, -5." + }, + "maximum": { + "type": "number", + "description": "All values are less than or equal to this value (values <= maximum)." + }, + "exclusiveMaximum": { + "type": "boolean", + "default": false, + "description": "If set to true, all values are strictly less than the maximum value (values < maximum). Otherwise, less than or equal to the maximum value (values <= maximum)." + }, + "minimum": { + "type": "number", + "description": "All values are greater than or equal to this value (values >= minimum)." + }, + "exclusiveMinimum": { + "type": "boolean", + "default": false, + "description": "If set to true, all values are strictly greater than the minimum value (values > minimum). Otherwise, greater than or equal to the minimum value (values >= minimum)." 
+ }, + "format": { + "type": "string", + "default": "i32", + "description": "Format of the value in terms of how many bits of space it can use (follows the Rust float types).", + "enum": ["f32", "f64"] + } + }, + "additionalProperties": false + } + } + } + }, + { + "if": { + "properties": { + "logicalType": { + "const": "object" + } + } + }, + "then": { + "properties": { + "logicalTypeOptions": { + "type": "object", + "properties": { + "maxProperties": { + "type": "integer", + "minimum": 0, + "description": "Maximum number of properties." + }, + "minProperties": { + "type": "integer", + "minimum": 0, + "default": 0, + "description": "Minimum number of properties." + }, + "required": { + "type": "array", + "items": { + "type": "string" + }, + "minItems": 1, + "uniqueItems": true, + "description": "Property names that are required to exist in the object." + } + }, + "additionalProperties": false + }, + "properties": { + "type": "array", + "description": "A list of properties for the object.", + "items": { + "$ref": "#/$defs/SchemaProperty" + } + } + } + } + }, + { + "if": { + "properties": { + "logicalType": { + "const": "array" + } + } + }, + "then": { + "properties": { + "logicalTypeOptions": { + "type": "object", + "properties": { + "maxItems": { + "type": "integer", + "minimum": 0, + "description": "Maximum number of items." + }, + "minItems": { + "type": "integer", + "minimum": 0, + "default": 0, + "description": "Minimum number of items" + }, + "uniqueItems": { + "type": "boolean", + "default": false, + "description": "If set to true, all items in the array are unique." + } + }, + "additionalProperties": false + }, + "items": { + "$ref": "#/$defs/SchemaItemProperty", + "description": "List of items in an array (only applicable when `logicalType: array`)." 
+ } + } + } + } + ] + }, + "SchemaProperty": { + "type": "object", + "$ref": "#/$defs/SchemaBaseProperty", + "required": ["name"], + "unevaluatedProperties": false + }, + "SchemaItemProperty": { + "type": "object", + "$ref": "#/$defs/SchemaBaseProperty", + "properties": { + "properties": { + "type": "array", + "description": "A list of properties for the object.", + "items": { + "$ref": "#/$defs/SchemaProperty" + } + } + }, + "unevaluatedProperties": false + }, + "Tags": { + "type": "array", + "description": "A list of tags that may be assigned to the elements (object or property); the tags keyword may appear at any level. Tags may be used to better categorize an element. For example, `finance`, `sensitive`, `employee_record`.", + "examples": ["finance", "sensitive", "employee_record"], + "items": { + "type": "string" + } + }, + "DataQuality": { + "type": "object", + "properties": { + "authoritativeDefinitions": { + "$ref": "#/$defs/AuthoritativeDefinitions" + }, + "businessImpact": { + "type": "string", + "description": "Consequences of the rule failure.", + "examples": ["operational", "regulatory"] + }, + "customProperties": { + "type": "array", + "description": "Additional properties required for rule execution.", + "items": { + "$ref": "#/$defs/CustomProperty" + } + }, + "description": { + "type": "string", + "description": "Describe the quality check to be completed." + }, + "dimension": { + "type": "string", + "description": "The key performance indicator (KPI) or dimension for data quality.", + "enum": ["accuracy", "completeness", "conformity", "consistency", "coverage", "timeliness", "uniqueness"] + }, + "method": { + "type": "string", + "examples": ["reconciliation"] + }, + "name": { + "type": "string", + "description": "Name of the data quality check." 
+ }, + "schedule": { + "type": "string", + "description": "Rule execution schedule details.", + "examples": ["0 20 * * *"] + }, + "scheduler": { + "type": "string", + "description": "The name or type of scheduler used to start the data quality check.", + "examples": ["cron"] + }, + "severity": { + "type": "string", + "description": "The severity of the quality rule.", + "examples": ["info", "warning", "error"] + }, + "tags": { + "$ref": "#/$defs/Tags" + }, + "type": { + "type": "string", + "description": "The type of quality check. 'text' is human-readable text that describes the quality of the data. 'library' is a set of maintained predefined quality attributes such as row count or unique. 'sql' is an individual SQL query that returns a value that can be compared. 'custom' is quality attributes that are vendor-specific, such as Soda or Great Expectations.", + "enum": ["text", "library", "sql", "custom"], + "default": "library" + }, + "unit": { + "type": "string", + "description": "Unit the rule is using, popular values are `rows` or `percent`, but any value is allowed.", + "examples": ["rows", "percent"] + } + }, + "allOf": [ + { + "if": { + "properties": { + "type": { + "const": "library" + } + } + }, + "then": { + "$ref": "#/$defs/DataQualityLibrary" + } + }, + { + "if": { + "properties": { + "type": { + "const": "sql" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/DataQualitySql" + } + }, + { + "if": { + "properties": { + "type": { + "const": "custom" + } + }, + "required": ["type"] + }, + "then": { + "$ref": "#/$defs/DataQualityCustom" + } + } + ] + }, + "DataQualityChecks": { + "type": "array", + "description": "Data quality rules with all the relevant information for rule setup and execution.", + "items": { + "$ref": "#/$defs/DataQuality" + } + }, + "DataQualityLibrary": { + "type": "object", + "properties": { + "rule": { + "type": "string", + "description": "Define a data quality check based on the predefined rules as per ODCS.", +
"examples": ["duplicateCount", "validValues", "rowCount"] + }, + "mustBe": { + "description": "Must be equal to the value to be valid. When using numbers, it is equivalent to '='." + }, + "mustNotBe": { + "description": "Must not be equal to the value to be valid. When using numbers, it is equivalent to '!='." + }, + "mustBeGreaterThan": { + "type": "number", + "description": "Must be greater than the value to be valid. It is equivalent to '>'." + }, + "mustBeGreaterOrEqualTo": { + "type": "number", + "description": "Must be greater than or equal to the value to be valid. It is equivalent to '>='." + }, + "mustBeLessThan": { + "type": "number", + "description": "Must be less than the value to be valid. It is equivalent to '<'." + }, + "mustBeLessOrEqualTo": { + "type": "number", + "description": "Must be less than or equal to the value to be valid. It is equivalent to '<='." + }, + "mustBeBetween": { + "type": "array", + "description": "Must be between the two numbers to be valid. Smallest number first in the array.", + "minItems": 2, + "maxItems": 2, + "uniqueItems": true, + "items": { + "type": "number" + } + }, + "mustNotBeBetween": { + "type": "array", + "description": "Must not be between the two numbers to be valid. 
Smallest number first in the array.", + "minItems": 2, + "maxItems": 2, + "uniqueItems": true, + "items": { + "type": "number" + } + } + }, + "required": ["rule"] + }, + "DataQualitySql": { + "type": "object", + "properties": { + "query": { + "type": "string", + "description": "Query string that adheres to the dialect of the provided server.", + "examples": ["SELECT COUNT(*) FROM ${table} WHERE ${column} IS NOT NULL"] + } + }, + "required": ["query"] + }, + "DataQualityCustom": { + "type": "object", + "properties": { + "engine": { + "type": "string", + "description": "Name of the engine which executes the data quality checks.", + "examples": ["soda", "great-expectations", "monte-carlo", "dbt"] + }, + "implementation": { + "oneOf": [ + { + "type": "string" + }, + { + "type": "object" + } + ] + } + }, + "required": ["engine", "implementation"] + }, + "AuthoritativeDefinitions": { + "type": "array", + "description": "List of links to sources that provide more details on the dataset; examples would be a link to an external definition, a training video, a git repo, data catalog, or another tool. Authoritative definitions follow the same structure in the standard.", + "items": { + "type": "object", + "properties": { + "url": { + "type": "string", + "description": "URL to the authority." + }, + "type": { + "type": "string", + "description": "Type of definition for authority: v2.3 adds standard values: `businessDefinition`, `transformationImplementation`, `videoTutorial`, `tutorial`, and `implementation`.", + "examples": ["businessDefinition", "transformationImplementation", "videoTutorial", "tutorial", "implementation"] + } + }, + "required": ["url", "type"] + } + }, + "Support": { + "type": "array", + "description": "Top level for support channels.", + "items": { + "$ref": "#/$defs/SupportItem" + } + }, + "SupportItem": { + "type": "object", + "properties": { + "channel": { + "type": "string", + "description": "Channel name or identifier." 
+ }, + "url": { + "type": "string", + "description": "Access URL using normal [URL scheme](https://en.wikipedia.org/wiki/URL#Syntax) (https, mailto, etc.)." + }, + "description": { + "type": "string", + "description": "Description of the channel, free text." + }, + "tool": { + "type": "string", + "description": "Name of the tool, value can be `email`, `slack`, `teams`, `discord`, `ticket`, or `other`.", + "examples": ["email", "slack", "teams", "discord", "ticket", "other"] + }, + "scope": { + "type": "string", + "description": "Scope can be: `interactive`, `announcements`, `issues`.", + "examples": ["interactive", "announcements", "issues"] + }, + "invitationUrl": { + "type": "string", + "description": "Some tools use an invitation URL for requesting or subscribing. Follows the [URL scheme](https://en.wikipedia.org/wiki/URL#Syntax)." + } + }, + "required": ["channel", "url"] + }, + "Pricing": { + "type": "object", + "properties": { + "priceAmount": { + "type": "number", + "description": "Subscription price per unit of measure in `priceUnit`." + }, + "priceCurrency": { + "type": "string", + "description": "Currency of the subscription price in `price.priceAmount`." + }, + "priceUnit": { + "type": "string", + "description": "The unit of measure for calculating cost. Examples: megabyte, gigabyte." + } + } + }, + "Team": { + "type": "object", + "properties": { + "username": { + "type": "string", + "description": "The user's username or email.", + "examples": [ + "mail@example.com", + "uid12345678" + ] + }, + "name": { + "type": "string", + "description": "The user's name.", + "examples": [ + "Jane Doe" + ] + }, + "description": { + "type": "string", + "description": "The user's description." + }, + "role": { + "type": "string", + "description": "The user's job role; examples might be owner or data steward. There is no limit on the role." + }, + "dateIn": { + "type": "string", + "format": "date", + "description": "The date when the user joined the team." 
+ }, + "dateOut": { + "type": "string", + "format": "date", + "description": "The date when the user ceased to be part of the team." + }, + "replacedByUsername": { + "type": "string", + "description": "The username of the user who replaced the previous user." + } + } + }, + "Role": { + "type": "object", + "properties": { + "role": { + "type": "string", + "description": "Name of the IAM role that provides access to the dataset." + }, + "description": { + "type": "string", + "description": "Description of the IAM role and its permissions." + }, + "access": { + "type": "string", + "description": "The type of access provided by the IAM role." + }, + "firstLevelApprovers": { + "type": "string", + "description": "The name(s) of the first-level approver(s) of the role." + }, + "secondLevelApprovers": { + "type": "string", + "description": "The name(s) of the second-level approver(s) of the role." + }, + "customProperties": { + "$ref": "#/$defs/CustomProperties" + } + }, + "required": ["role"] + }, + "ServiceLevelAgreementProperty": { + "type": "object", + "properties": { + "property": { + "type": "string", + "description": "Specific property in SLA, check the periodic table. May require units (more details to come)." + }, + "value": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "number" + }, + { + "type": "integer" + }, + { + "type": "boolean" + }, + { + "type": "null" + } + ], + "description": "Agreement value. The label will change based on the property itself." + }, + "valueExt": { + "$ref": "#/$defs/AnyNonCollectionType", + "description": "Extended agreement value. The label will change based on the property itself." + }, + "unit": { + "type": "string", + "description": "**d**, day, days for days; **y**, yr, years for years, etc. Units use the ISO standard." + }, + "element": { + "type": "string", + "description": "Element(s) to check on. Multiple elements should be extremely rare and, if so, separated by commas." 
+ }, + "driver": { + "type": "string", + "description": "Describes the importance of the SLA from the list of: `regulatory`, `analytics`, or `operational`.", + "examples": ["regulatory", "analytics", "operational"] + } + }, + "required": ["property", "value"] + }, + "CustomProperties": { + "type": "array", + "description": "A list of key/value pairs for custom properties.", + "items": { + "$ref": "#/$defs/CustomProperty" + } + }, + "CustomProperty": { + "type": "object", + "properties": { + "property": { + "type": "string", + "description": "The name of the key. Names should be in camel case–the same as if they were permanent properties in the contract." + }, + "value": { + "$ref": "#/$defs/AnyType", + "description": "The value of the key." + } + } + }, + "AnyType": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "number" + }, + { + "type": "integer" + }, + { + "type": "boolean" + }, + { + "type": "null" + }, + { + "type": "array" + }, + { + "type": "object" + } + ] + }, + "AnyNonCollectionType": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "number" + }, + { + "type": "integer" + }, + { + "type": "boolean" + }, + { + "type": "null" + } + ] + } + } +} diff --git a/src/script/validate-examples.sh b/src/script/validate-examples.sh index 74b6a5f..d98e2d5 100644 --- a/src/script/validate-examples.sh +++ b/src/script/validate-examples.sh @@ -7,7 +7,7 @@ LIGHT_BLUE='\033[1;34m' NC='\033[0m' script_dir=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) -json_schema_version=${JSON_SCHEMA_VERSION:-v3.0.1} +json_schema_version=${JSON_SCHEMA_VERSION:-v3.0.2} num_failed_validation=0 echo "Checking if $json_schema_version JSON schema is valid" diff --git a/vendors.md b/vendors.md index 7bfa377..3306778 100644 --- a/vendors.md +++ b/vendors.md @@ -7,8 +7,7 @@ Vendors who natively support ODCS (Open Data Contract Standard). 
A non-exhaustive, alphabetical list of organizations offering solutions natively compatible with ODCS, such as data catalogs, data quality platforms, security tools, and more. -* [Data Caterer](https://data.catering/latest/docs/guide/data-source/metadata/open-data-contract-standard/) - Test data - management tool using data contracts as a metadata source +* [Data Caterer](https://data.catering/latest/docs/guide/data-source/metadata/open-data-contract-standard/) - Test data management tool using data contracts as a metadata source * [Data Contract CLI](https://cli.datacontract.com) - Open Source tooling around data contracts * [Data Contract Manager](https://datacontract-manager.com) - Professional data contract management tool with Data Marketplace, Access Management, and Data Governance AI. * [Data Contract Playground](https://data-catering.github.io/data-contract-playground/) - Playground site for creating, exporting and validating data contracts