Re-analysis and generation of Overstay2 model: Difference between revisions
| (12 intermediate revisions by 2 users not shown) | |||
| Line 2: | Line 2: | ||
== Defining the contributing factors data == | == Defining the contributing factors data == | ||
The model depends on a regression analysis of a number of possible factors in our regularly collected data. Our data structure had changed since the original project, so we cleaned up our definitions, resulting in the [[Data definition for | The model depends on a regression analysis of a number of possible factors in our regularly collected data. Our data structure had changed since the original project, so we cleaned up our definitions, resulting in the [[Data definition for factor candidates for the Overstay2 project]]. | ||
{{Discuss | Still needs: | {{Discuss | Still needs: | ||
* considerations | * considerations | ||
* values we considered and rejected | * values we considered and rejected | ||
* minimize duplication of [[Data definition for | * minimize duplication of [[Data definition for factor candidates for the Overstay2 project]]: things that users of the data need to know going forward need to live there; decisions taken that don't affect the ongoing process should be documented here. | ||
}} | }} | ||
| Line 14: | Line 14: | ||
** \\ad.wrha.mb.ca\WRHA\HSC\shared\MED\MED_CCMED\Julie\MedProjects\Overstay_Project_2025 | ** \\ad.wrha.mb.ca\WRHA\HSC\shared\MED\MED_CCMED\Julie\MedProjects\Overstay_Project_2025 | ||
* '''Reference Admit DtTm:''' We based the date range on the first medicine admit date during a [[Data definition for | * '''Reference Admit DtTm:''' We based the date range on the first medicine admit date during a [[Data definition for factor candidates for the Overstay2 project#Hospitalization]], based on the earliest [[Boarding Loc]] dttm. | ||
* '''Dataset inclusion criteria:''' all of the following | * '''Dataset inclusion criteria:''' all of the following | ||
** ''Reference Admit DtTm'' >=2020-11-01 and <2025-01-01 | ** ''Reference Admit DtTm'' >=2020-11-01 and <2025-01-01 | ||
** [[RecordStatus]] = Vetted | ** [[RecordStatus]] = Vetted | ||
** final [[dispo]] of the [[Data definition for | ** final [[dispo]] of the [[Data definition for factor candidates for the Overstay2 project#Hospitalization]] is to a destination outside the hospital of the admission (this can be another hospital) | ||
** HOBS: include the record only if: | ** HOBS: include the record only if: | ||
*** the first medicine admission during a hospitalization is on a HOBS unit, and | *** the first medicine admission during a hospitalization is on a HOBS unit, and | ||
| Line 33: | Line 33: | ||
| All || All|| 42,078|| 1741 (4.1%) || 40,337 (95.9%) | | All || All|| 42,078|| 1741 (4.1%) || 40,337 (95.9%) | ||
|- | |- | ||
| All || Training|| 21,054|| 859 ( | | All || Training|| 21,054|| 859 (4.1%) || 20,195 (95.9%) | ||
|- | |- | ||
| All || Validation|| 21,024|| 882 (2 | | All || Validation|| 21,024|| 882 (4.2%) || 20,142 (95.8%) | ||
|- | |- | ||
|- | |- | ||
| Line 45: | Line 45: | ||
|- | |- | ||
|- | |- | ||
| SBGH || All|| 13,762|| | | SBGH || All|| 13,762|| 398 (2.9%) || 13,364 (97.1%) | ||
|- | |- | ||
| SBGH || Training|| 6,905|| 204 (3.0%) || 6,701 (97.0%) | | SBGH || Training|| 6,905|| 204 (3.0%) || 6,701 (97.0%) | ||
| Line 115: | Line 115: | ||
== Analysis and model generation == | == Analysis and model generation == | ||
=== Parameter candidates === | === Parameter candidates === | ||
See [[Data definition for | See [[Data definition for factor candidates for the Overstay2 project]] for the definitions. | ||
==== Location Grouping considerations ==== | |||
{{DJ | | |||
* When I looked at your code that breaks out {{OSDD|Location / living arrangement}} into groupings and measures it seemed to me that it was mixing up data cleaning and validation with measure definition and it might be good to keep those separate. Cleaning and validation should apply to the data in general, not just this model, no? It would make sense to document the steps taken and things found and remedies implemented on this page, but having them part of the definition seems problematic. I think I sent that as an email, but I think it would be better to track this on the wiki to have a trail for the decisions. [[User:Ttenbergen|Ttenbergen]] 12:03, 25 June 2025 (CDT) | |||
}} | |||
# Postal Code | === reference/examples for links === | ||
{{DJ| | |||
* leaving these here as examples how to link to the definitions on [[Data definition for factor candidates for the Overstay2 project]]. The currently used definition should live there, but changes and reasons should probably live here. We can change that format, talk to me if needed. [[User:Ttenbergen|Ttenbergen]] 11:35, 25 June 2025 (CDT) | |||
}} | |||
* {{OSDD|Age}} | |||
* {{OSDD|PCH/Chronic Care}} | |||
* other {{OSDD|Location / living arrangement}} | |||
* {{OSDD|ADL components}} and | |||
** {{OSDD|ADL_Adlmean_NH }} - among those who came from PCH/CHF | |||
** {{OSDD|ADL_Adlmean_age}} - interaction with Age | |||
* {{OSDD|Glasgow Coma Scale}} | |||
* {{OSDD|Location / living arrangement}} Postal Code (also see [[#Location Grouping for Postal Code is N/A]])
* {{OSDD|Charlson Diagnoses}} (Categories and Total Score) | |||
** MI, CHF, PVD, CVA , Pulmonary, Connective, Ulcer, Renal | |||
** {{OSDD|Charlson Comorbidity Index}} | |||
** {{OSDD|Charlson Score * NH }} - among those who came from PCH/CHF | |||
* {{OSDD|Diagnoses}} that might prevent/delay meeting PCH/Home Care criteria | |||
* {{OSDD|Homeless}} | |||
=== Location Grouping for [[Postal Code]] is N/A === | |||
Analysis notes: JM found 2759 records where the postal code is N/A and used R_Province, Pre_inpt_Location, and Previous Location instead to define the 5 categories above. JM also encountered postal codes with no match in the Postal_Code_Master List but was able to categorize them based on the first 3 characters (N=273); the list was given to Pagasa to add. (DR agreed in the meeting with JM on Feb 10.)
=== Dataset split into training and validation data === | === Dataset split into training and validation data === | ||
| Line 166: | Line 153: | ||
=== Decision on a model === | === Decision on a model === | ||
*For each site's training set and validation set, perform a chi-square test for independence between the variable OS (Overstay >= 10 days vs. Overstay < 10 days) and each factor listed in [[Data definition for | *For each site's training set and validation set, perform a chi-square test for independence between the variable OS (Overstay >= 10 days vs. Overstay < 10 days) and each factor listed in [[Data definition for factor candidates for the Overstay2 project]] to identify the factors that may individually affect overstay. | ||
*Training data set - Methodology to find the '''best''' model involves | *Training data set - Methodology to find the '''best''' model involves | ||
** Basic plan for selecting the variables for the model - | ** Basic plan for selecting the variables for the model - | ||