Ambulatory Care Quality: Physician vs Patient Perceptions
Methods
We developed interview guides and conducted in-depth telephone interviews (45–60 min) with patients and clinicians (physicians, nurse practitioners and physician assistants). Respondents were asked to think about recent office visits and to provide detailed information about behaviors that made each visit good or bad. From this information we wrote descriptions, called critical incidents, of each behavior, the context in which it occurred and its result, using a standardized template (see Fig. 1). The template reflects the types of questions and probes used to elicit this information but does not reproduce their exact wording. These critical incidents were used to develop a taxonomy of behaviors.
Figure 1. Examples of patient and clinician critical incidents.
Respondents were identified primarily through two health plans, one in the Chicago area and one in Hawaii, and through a recruiting firm. (Four interviews were conducted with patients in Florida whose predominant language was Spanish, to enable the collection of critical incidents that might be idiosyncratic to such patients.) Because of HIPAA regulations, the health plans recruited patients by sending letters inviting interested people to contact us. Because we did not control the number of letters sent, a participation rate cannot be calculated.
We aimed to interview 40 clinicians and 160 patients and to distribute the patients equally among four racial–ethnic groups (Caucasian, African American, Asian American and Hispanic) and between the geographic areas. We oversampled minorities to increase the likelihood of capturing quality behaviors that reflect culturally specific issues. In our final sample, 90 patients and 19 clinicians were from the Chicago area, 74 patients and 16 clinicians were from Hawaii and 4 patients and 4 clinicians were from other parts of the country. Table 1 provides the number and characteristics of interview respondents. The clinicians, 30 of whom were physicians, each received $125 for participating. We interviewed 168 patients, who each received $35.
Eleven interviewers were trained in the critical incident technique (CIT), conducted the phone interviews and prepared critical incidents from their detailed notes and a review of interview tapes or transcripts. Three of the interviewers were bilingual, enabling Spanish-speaking respondents to be interviewed in Spanish. Each incident was reviewed by another researcher to ensure that it met the criteria required for a good critical incident. Incidents that failed to meet these criteria were returned to the interviewer, who reviewed the source tape or transcript and edited the incident. If adequate edits could not be made, the incident was discarded. A total of 14 incidents (0.5%) were discarded.
A set of 200 randomly selected incidents was used to develop the initial taxonomy. Two teams of two senior researchers read each incident and prepared a description of the incident's focal behavior, grouping together critical incidents whose focal behaviors were judged to be virtually identical. The development of critical incident taxonomies is grounded in the data and is not based on a predefined conceptual model or theory. Categories of critical incidents describing related behaviors were grouped together into larger aggregations. The researchers met and reconciled their respective taxonomies, then reviewed subsequent sets of incidents. New categories and subcategories were created for incidents that did not fit into existing categories. Codes were compared for agreement, and inconsistencies were used to revise the category descriptions and the organization of the taxonomy.
The classification scheme was reviewed until there was agreement that the behaviors grouped together at the finest level were homogeneous and that the taxonomy was logical, internally consistent and appropriate for its intended functions. Saturation was achieved when the final 100 incidents resulted in the creation of no major categories and two or fewer subcategories. The reliability of this saturated taxonomy was established through inter-rater reliability checks: Cohen's weighted kappa for agreement at the subcategory level was 0.68.
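The weighted kappa reported above can be illustrated with a minimal Python sketch. The text does not state which weights were used, so the weighting scheme here is purely an assumption: full credit when two raters assign the same subcategory, half disagreement when they agree only on the major category, and full disagreement otherwise. The function names and the (major category, subcategory) tuple encoding are hypothetical and are not taken from the study.

```python
from collections import Counter


def hierarchical_disagreement(code_a, code_b):
    """Assumed weights (not from the study): codes are (major, sub) tuples."""
    if code_a == code_b:
        return 0.0          # same subcategory: no disagreement
    if code_a[0] == code_b[0]:
        return 0.5          # same major category only: partial disagreement
    return 1.0              # different major category: full disagreement


def weighted_kappa(rater_a, rater_b, disagreement=hierarchical_disagreement):
    """Cohen's weighted kappa: 1 - (observed / chance-expected disagreement)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of incidents")
    n = len(rater_a)
    pair_counts = Counter(zip(rater_a, rater_b))
    marginal_a = Counter(rater_a)
    marginal_b = Counter(rater_b)

    # Observed weighted disagreement across the jointly coded incidents.
    observed = sum(disagreement(a, b) * count / n
                   for (a, b), count in pair_counts.items())
    # Disagreement expected by chance from each rater's marginal distribution.
    expected = sum(disagreement(a, b) * marginal_a[a] * marginal_b[b] / n ** 2
                   for a in marginal_a for b in marginal_b)
    if expected == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1.0 - observed / expected


# Invented codes for four incidents, for illustration only.
coder_1 = [("A", 1), ("A", 1), ("B", 1), ("B", 1)]
coder_2 = [("A", 1), ("A", 2), ("B", 1), ("B", 1)]
kappa = weighted_kappa(coder_1, coder_2)
```

Passing a disagreement function that returns 0 for identical codes and 1 otherwise reduces this to the unweighted Cohen's kappa.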
We compared the prevalence of incidents reported by clinicians and by patients in each major category and subcategory using the chi-square statistic. We limited subcategory comparisons to those subcategories for which 10 or more respondents reported an incident. To compare prevalence as a function of patient race/ethnicity, we estimated a multivariate probit model for each category and subcategory (with at least 10 respondents), with race/ethnicity, age and gender as explanatory variables. These models determined whether persons in each of the other self-identified racial–ethnic groups were more or less likely than self-identified Caucasians to report an incident in each category or subcategory. Similar procedures and models were employed to compare clinicians who were physicians with other clinicians. All analyses used SAS Version 9.1.