dwh_document

Description

This table contains unstructured clinical documents and reports associated with a patient’s healthcare encounter. Each record corresponds to a single document instance, such as a clinical report, observation note, or questionnaire form. The table captures the document content as free text together with key metadata including the author, the date of creation, the type of document, and the healthcare context in which it was generated (e.g., visit, service, or care unit).

Columns

Field User Guide ETL Conventions Datatype Required Primary Key Foreign Key FK Table
document_num Unique identifier bigint(64) Yes Yes No
patient_num bigint(64) Yes No Yes dwh_patient
title The title of the document. varchar(400) No No No
document_date The date the document was recorded. timestamptz Yes No No
document_type The type of the document (e.g. CR, observation, formulaire, ...) varchar(100) No No No
author The person who created or authored the document varchar(200) No No No
displayed_text The content of the document. Convert to html text Yes No No
stay_num The visit during which the document was created. bigint(64) No No Yes dwh_patient_stay
department_num The service associated with the document. bigint(64) No No Yes dwh_thesaurus_department
unit_num The care unit associated with the document. bigint(64) No No Yes dwh_thesaurus_unit
instance_document_id Code of the healthcare center, see hospital_instance for more informations varchar(40) No No No
document_origin_code Indicate source software for this document varchar(60) No No No
id_doc_source Unique identifier in source software varchar(300) No No No
document_pid Optional pseudo-identifier for a document. Mainly included for structural consistency; not required for standard analytical use. Generated as a hash of id_doc_source combined with document_salt. varchar(300) No No No
document_salt Optional random salt used in the hash algorithm to generate document_pid. varchar(300) No No No
upload_id Identifier of the pipeline integration run, used to differentiate each batch of integrated data. Defined at the start of the pipeline as datetime.now().strftime("%Y%m%d%H%M%S").
For example a batch integrated on 15/09/2025 at 00:00:00 has upload_id = 20250915000000.
bigint(64) No No No
update_date Date and time of the record’s last update. timestamptz No No No