dwh_thesaurus_data
Description¶
This table stores the standardized set of concepts used to represent medical data within the data warehouse. It acts as a controlled vocabulary for coding clinical information, laboratory tests, procedures, diagnoses, and medications, covering multiple classification systems (e.g., CIM10, CCAM, ATC, local lab codes). Each record represents a unique concept with associated metadata to describe its meaning, structure, permissible values, and usage in the dataset.
Columns¶
| Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table |
|---|---|---|---|---|---|---|---|
| thesaurus_data_num | Unique identifier | bigint(64) | Yes | Yes | No | ||
| thesaurus_code | Identifier of the vocabulary or coding system (e.g. CIM10, CCAM, ATC). | varchar(30) | Yes | No | No | ||
| concept_code | The concept code represents the identifier of the concept in the source vocabulary. Note that concept codes are not unique across vocabularies. | varchar(100) | Yes | No | No | ||
| concept_str | Human-readable name or label of the concept. | varchar(2000) | Yes | No | No | ||
| description | Extended description or definition of the concept. | varchar(4000) | No | No | No | ||
| measuring_unit | Measurement unit for quantitative concepts (e.g., mg, mmHg). | varchar(50) | No | No | No | ||
| value_type | Type of value expected | numeric : quantitative value text: free-text value present : presence/absence indicator liste : enumerated values |
varchar(50) | Yes | No | No | |
| list_values | List of values when type is liste else null | Values should be split by ; (e.g. principal;associé;relié) | varchar(4000) | No | No | No | |
| thesaurus_parent_num | Parent concept in the hierarchy. | bigint(64) | No | No | Yes | dwh_thesaurus_data | |
| count_data_used | Count of dwh_data associated with this concept | integer(32) | No | No | No | ||
| update_date | Date and time of the record’s last update. | timestamptz | No | No | No | ||
| upload_id | Identifier of the pipeline integration run, used to differentiate each batch of integrated data. | Defined at the start of the pipeline as datetime.now().strftime("%Y%m%d%H%M%S"). For example a batch integrated on 15/09/2025 at 00:00:00 has upload_id = 20250915000000. |
bigint(64) | No | No | No |