Re-analysis and generation of Overstay2 model

From CCMDB Wiki
This page is about the development of the model for generating scores/colours for [[Project Overstay2]]. Since our data collection and the healthcare system changed since the first iteration, we did a re-analysis and generation of the Overstay2 model, resulting in the [[Overstay2 scoring models]] that generate the [[Overstay2 colour]]. Also see the [[Overstay2 Overview]].


== Defining the contributing factors data ==
The model depends on a regression analysis of a number of possible factors in our regularly collected data. Our data structure had changed since the original project, so we cleaned up our definitions, resulting in the [[Data definition for factor candidates for the Overstay2 project]].  


{{Discuss | Still needs:
* considerations
* values we considered and rejected
* minimize duplication of [[Data definition for factor candidates for the Overstay2 project]], things that users of the data need to know going forward need to live there, decisions taken that don't affect ongoing process should be documented here.
  }}


== Model dataset and date range ==
* '''Dataset:''' We used the file 2025-2-3_13.56.31_Centralized_data.accdb as a basis for the project. A copy for future reference is at  
** \\ad.wrha.mb.ca\WRHA\HSC\shared\MED\MED_CCMED\Julie\MedProjects\Overstay_Project_2025   
   
   
* '''Reference Admit DtTm:''' We based the date range on the first medicine admit date during a [[Data definition for factor candidates for the Overstay2 project#Hospitalization|hospitalization]], based on the earliest [[Boarding Loc]] dttm.
 
* '''Dataset inclusion criteria''' (all of the following):
** ''Reference Admit DtTm'' >= 2020-11-01 and < 2025-01-01
** [[RecordStatus]] = Vetted 
** final [[dispo]] of the [[Data definition for factor candidates for the Overstay2 project#Hospitalization|hospitalization]] is to a destination outside of the hospital of the admission (can be to another hospital)
** HOBS: include the record only if: 
*** the first medicine admission during a hospitalization is on a HOBS unit, and
*** there is a Transfer_Ready_Dttm associated with that unit, and 
*** the patient is discharged from that unit to a destination outside of the hospital of the admission (can be to another hospital)  
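As a sketch only, the all-of criteria above can be expressed as a single filter function; the field names here are hypothetical stand-ins, not the actual Centralized_data column names:

```python
from datetime import datetime

def include_record(rec):
    """Apply the dataset inclusion criteria (all must hold) to one
    hospitalization record. Field names are illustrative only."""
    in_range = (datetime(2020, 11, 1) <= rec["reference_admit_dttm"]
                < datetime(2025, 1, 1))
    vetted = rec["record_status"] == "Vetted"
    # final dispo outside the admitting hospital (may be another hospital)
    left_hospital = rec["final_dispo_outside_hospital"]
    # HOBS records additionally need a Transfer_Ready_Dttm for that unit
    hobs_ok = (not rec["is_hobs"]) or (
        rec["transfer_ready_dttm"] is not None and left_hospital)
    return in_range and vetted and left_hospital and hobs_ok

example = {
    "reference_admit_dttm": datetime(2022, 5, 4),
    "record_status": "Vetted",
    "final_dispo_outside_hospital": True,
    "is_hobs": True,
    "transfer_ready_dttm": datetime(2022, 5, 20),
}
print(include_record(example))  # True: all criteria hold for this record
```

The actual filter lives in the SAS code referenced below; this is only to make the and-of-all-criteria logic explicit.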
 
* This resulted in a dataset with the following:  
** Total hospitalizations: 42,078


{| class="wikitable notsortable"
!Site !! Data Set !! Total !! Overstay >= 10 days !! Overstay < 10 days
|-
| All || All || 42,078 || 1,741 (4.1%) || 40,337 (95.9%)
|-
| All || Training || 21,054 || 859 (4.1%) || 20,195 (95.9%)
|-
| All || Validation || 21,024 || 882 (4.2%) || 20,142 (95.8%)
|-
| HSC || All || 16,813 || 616 (3.7%) || 16,197 (96.3%)
|-
| HSC || Training || 8,371 || 295 (3.5%) || 8,076 (96.5%)
|-
| HSC || Validation || 8,442 || 321 (3.8%) || 8,121 (96.2%)
|-
| SBGH || All || 13,762 || 398 (2.9%) || 13,364 (97.1%)
|-
| SBGH || Training || 6,905 || 204 (3.0%) || 6,701 (97.0%)
|-
| SBGH || Validation || 6,857 || 194 (2.8%) || 6,663 (97.2%)
|-
| GGH || All || 11,503 || 727 (6.3%) || 10,776 (93.7%)
|-
| GGH || Training || 5,778 || 360 (6.2%) || 5,418 (93.8%)
|-
| GGH || Validation || 5,725 || 367 (6.4%) || 5,358 (93.6%)
|}


The SAS code defining this dataset can be found in S:\MED\MED_CCMED\Julie\MedProjects\Overstay_Project_2025\Data\prepdata_7Feb2025.sas
 
The CFE code defining this dataset
{{DT | still needs to be set up by Tina... }}
 
{{Collapsable
| always=Specific decisions were discussed and made.
| full=
JM had found n=226 Vetted cases with a Last discharge DtTm (in ICU or Med) after 2024, up to Feb 3, 2025. Only 13 did not leave their own site, 19 expired, 194 left the site. Of the 213, some are long-stay patients admitted Aug: 1, Sept: 3, Oct: 8, Nov: 18, Dec: 196. (DR agreed in the meeting with JM Feb 10.)


* First Med Admits who were [[RecordStatus]] = incomplete but with [[Dispo DtTm]] present are excluded.   
* First Med Admits who were still in the unit are excluded (ie no [[Dispo DtTm]])
* First Med Admits who were [[RecordStatus]] = vetted are included.
 
* Deceased should be included: I think there was talk about excluding these; I don’t think that is valid. We don’t know, when they arrive, that they will die, and if they die after becoming transfer ready, that is still an overstay we could have avoided. 
 
* Discharge to or Previous Location = Hospice should be included – for the same reason we would include PCH.
 
* Palliative patients should be included 
** because our definition of “Palliative care” (ICD10 Z51.5) doesn’t imply death is imminent. Palliative patients were excluded before, but our definition has changed, and so, apparently, has how this is handled now. Also, they may be waiting in hospital for a hospice, so again, that’s overstay. 
** Discharged to STB Palliative Care – included (DR agreed in the meeting with JM Feb 10)
 
* AMA – include these. 
** The initial thought was that AMA implies they were not discharge ready, but it could also include those who were sick of waiting for a PCH and walked out. They might be someone who waited for 2 weeks while dispo ready and eventually left because they did not want to wait for home care etc. any longer. But can someone be transfer ready and still leave AMA? Yes, e.g. when they were transfer ready but the discharge took so long that they no longer are and can now leave AMA again.   
*** JM found 3061 dispo AMA (2810 without TR_DtTm, 251 with TR_DtTm)
 
* Dispo TCU/TCE – include, and treat as discharge from this hospital
 
* Dispo HSC Lennox Bell/Institution NOS – treat as we would back-to-PCH/home
 
* Dispo another ward within WPG (LAU at CON, OAKS, VIC) – include, and treat as discharge from this hospital
 
* Unknown disposition at discharge on the last admission – those transferred to another service (ICU / OR / etc. within the same hospital) are already excluded with RecordStatus = "incomplete" and by only including if (1c)
 
* Dispo Transfers to different hospital ICU within Winnipeg – include
 
* Transfers outside WPG – include and treat as if discharged
 
* Overstay 5 to 9 days  - included as normal (Rodrigo excluded these from model building)


* A null Tr_DtTm will be allowed
* This defines “hospitalization” as per-site, so if the patient is moved to subsequent medicine wards at a different hospital there will be a new record
* EMIP / TR_DtTm during the ED portion of a visit: treat this as you would on the ward. The first TR DtTm at ER will be taken, regardless of whether there is a second TR DtTm when the patient moves to a Med ward (DR agreed in the meeting with JM Feb 10)
}}


=== Model development Inclusion/Exclusion of "Green" admissions ===
If we plan to generate overstay colours like the last time, then the one group who would not have the model applied to them would be the “greens”, since the decision tree turns them green before the model would be applied. If we were able to determine who these greens would have been, would we want to exclude them from the model?

There is no way to exclude the greens from the model, so we won’t try.
== Analysis and model generation ==
=== Parameter candidates ===
See [[Data definition for factor candidates for the Overstay2 project]] for the definitions.

==== Location Grouping considerations ====
{{DJ |  
* When I looked at your code that breaks out {{OSDD|Location / living arrangement}} into groupings and measures, it seemed to me that it was mixing up data cleaning and validation with measure definition, and it might be good to keep those separate. Cleaning and validation should apply to the data in general, not just this model, no? It would make sense to document the steps taken, the things found, and the remedies implemented on this page, but having them be part of the definition seems problematic. I think I sent that as an email, but it would be better to track this on the wiki to have a trail for the decisions. [[User:Ttenbergen|Ttenbergen]] 12:03, 25 June 2025 (CDT)
}}


=== reference/examples for links ===
{{DJ|
* leaving these here as examples of how to link to the definitions on [[Data definition for factor candidates for the Overstay2 project]]. The currently used definition should live there, but changes and their reasons should probably live here. We can change that format; talk to me if needed. [[User:Ttenbergen|Ttenbergen]] 11:35, 25 June 2025 (CDT)
}}
* {{OSDD|Age}}
* {{OSDD|PCH/Chronic Care}}
* other {{OSDD|Location / living arrangement}}
* {{OSDD|ADL components}} and
** {{OSDD|ADL_Adlmean_NH}} - among those who came from PCH/CHF
** {{OSDD|ADL_Adlmean_age}} - interaction with Age
* {{OSDD|Glasgow Coma Scale}}
* {{OSDD|Location / living arrangement}} [[Postal Code]] (also see [[#Location Grouping for Postal Code is N/A]])
* {{OSDD|Charlson Diagnoses}} (Categories and Total Score)
** MI, CHF, PVD, CVA , Pulmonary, Connective, Ulcer, Renal
** {{OSDD|Charlson Comorbidity Index}}
** {{OSDD|Charlson Score * NH }} - among those who came from PCH/CHF
* {{OSDD|Diagnoses}} that might prevent/delay meeting PCH/Home Care criteria
* {{OSDD|Homeless}}
=== Location Grouping for Postal Code is N/A ===
Analysis notes: JM found 2759 records with postal code N/A; JM used R_Province, Pre_inpt_Location, and Previous Location instead to define the 5 categories above. Some postal codes had no match in the Postal_Code_Master list but could be categorized based on their first 3 characters (N=273); the list was given to Pagasa to add. (DR agreed in the meeting with JM Feb 10.)
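The fallback order described in this note (full postal code, then first 3 characters, then the other location fields) could be sketched like this; the master list, field names, and category values are placeholders, not the actual grouping definitions:

```python
def postal_category(postal_code, master_list, fallback_fields):
    """Categorize by full postal code; if it is N/A or unmatched,
    fall back to the first 3 characters, then to other location
    fields (e.g. R_Province, Pre_inpt_Location, Previous Location)."""
    if postal_code and postal_code != "N/A":
        if postal_code in master_list:
            return master_list[postal_code]
        prefix = postal_code[:3]  # first 3 characters of the postal code
        if prefix in master_list:
            return master_list[prefix]
    # postal code unusable: derive the category from the other fields
    return fallback_fields.get("Previous Location", "Unknown")

master = {"R3T 2N2": "Winnipeg", "R3T": "Winnipeg"}  # placeholder entries
print(postal_category("R3T 9Z9", master, {}))  # matched via first 3 characters
```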
=== Dataset split into training and validation data ===
We separated the population into two datasets based on the odd/even status of the last digit of the [[Chart number]]: 
* Even: Training set
* Odd: Validation set
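As a minimal sketch of this split rule (assuming the chart number ends in a digit):

```python
def assign_split(chart_number):
    """Training set if the last digit of the chart number is even,
    validation set if it is odd."""
    last_digit = int(str(chart_number).strip()[-1])
    return "training" if last_digit % 2 == 0 else "validation"

print(assign_split("1234567"))  # validation (7 is odd)
print(assign_split("1234560"))  # training (0 is even)
```

Splitting on a stable identifier like this keeps the two sets reproducible without storing a random seed.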
=== Model generation and testing ===
See \\ad.wrha.mb.ca\WRHA\HSC\shared\MED\MED_CCMED\Julie\MedProjects\Overstay_Project_2025 and emails between Julie, Tina and Dan Roberts ~2025-02
=== Decision on a model ===
*For each site's training set and validation set, perform a chi-square test of independence between the variable OS (Overstay >= 10 days vs Overstay < 10 days) and each factor listed in [[Data definition for factor candidates for the Overstay2 project]], to identify the factors that may individually affect overstay.
*Training data set - the methodology to find the '''best''' model involves:
** Basic plan for selecting the variables for the model:
*** Fit a logistic model with OS as the dependent variable, beginning with the independent variables identified by the univariable analysis above.
*** Then perform a multivariable analysis using all independent variables (full model) and select via a stepwise procedure with both forward and backward selection.
*** Examine the importance of each included variable based on the p-value of its coefficient. 
*** Variables not contributing to the model are eliminated and a new model is fitted. The process of deleting, refitting and verifying continues until all important variables appear to be included.
** Assess the adequacy of the model, both in terms of the individual variables and its overall fit, by the following: 
***Estimated coefficients with p-values < 0.05, or with clinical relevance and p-values higher than or close to 0.05, are included in the model.
***The association of the predicted probabilities and observed responses is measured by the Concordance (C) index, the area under the curve (AUC) between the true positive rate (sensitivity) and false positive rate (1-specificity). A value > 0.5 implies ability to discriminate between positive and negative outcomes, while a value of 1 implies perfect classification. This quantity indicates how well the model ranks predictions.
***The Hosmer-Lemeshow goodness-of-fit test is used to assess how well the logistic regression model fits the data. A high p-value (usually > 0.05) means the model fits well, while a low p-value (≤ 0.05) indicates poor fit of the model to the data.
*Validation data set involves:
** Using the candidate models from the training data set, fit the model on the validation data set.
** From the predicted values, determine the Concordance (C) index / AUC between the true positive rate (sensitivity) and false positive rate (1-specificity). It should result in values close to 1.
** Group the predicted data into deciles (10 groups) and, for each group, compare the observed number of events to the expected number of events predicted by the model. The chi-square statistic summed over these 10 groups, with 8 degrees of freedom, must have a p-value > 0.05 to denote good fit.
* If both the training data set and the validation data set give good results in all tests, the model is a candidate for selection. If there is more than one candidate model, the one with more clinical relevance is chosen. 
* This resulted in [[Overstay2 scoring models]] by site.
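For reference, the Concordance (C) index used in the methodology above is, for a binary outcome, the fraction of (event, non-event) pairs in which the event received the higher predicted probability (equivalent to the AUC). A minimal sketch, with made-up numbers rather than actual project data:

```python
def c_index(labels, probs):
    """C index / AUC for a binary outcome: fraction of concordant
    (event, non-event) pairs; tied probabilities count as half."""
    events = [p for y, p in zip(labels, probs) if y == 1]
    nonevents = [p for y, p in zip(labels, probs) if y == 0]
    pairs = concordant = 0
    for pe in events:
        for pn in nonevents:
            pairs += 1
            if pe > pn:
                concordant += 1       # event ranked above non-event
            elif pe == pn:
                concordant += 0.5     # tie
    return concordant / pairs

print(c_index([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.6]))  # 0.875
```

The SAS output's C statistic is computed the same way; this pairwise form is only practical for small examples.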
=== Decision on a probability threshold ===
The predictive models we established are used to stratify the patient population for the different [[Overstay2 processes on the units to reduce discharge delay]]. Details about establishing a threshold for the probabilities of the [[Overstay2 scoring models]] are at:
*[[Overstay2 colour]]
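Mechanically, applying such a probability threshold is just a comparison; the threshold value and colour names below are placeholders, and the actual values are documented at [[Overstay2 colour]]:

```python
def overstay_colour(prob, threshold=0.05):
    """Map a predicted overstay probability to a colour flag.
    The 0.05 default is illustrative only, not the project threshold."""
    return "red" if prob >= threshold else "green"

print(overstay_colour(0.12))  # red
print(overstay_colour(0.01))  # green
```

Raising the threshold lowers the share of admissions flagged "red", which is how the workload on the unit processes is kept within resource limits.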


== Related articles ==  

Latest revision as of 15:43, 12 August 2025
