How to Calculate Propensity Scores
- 1). Collect the data and enter it properly into a computer program, such as SAS, R, SPSS or Stata. Data are usually obtained from ongoing studies. They can be entered into a spreadsheet and then read into a statistics program, or they can be entered directly into the statistics program using built-in functions. Many statistics packages offer other ways to import data as well.
- 2). Use the software's logistic regression function. The dependent variable is the binary indicator of being selected (for example, 1 = treated, 0 = not treated), and the independent variables are all the covariates. The fitted model then estimates each subject's probability of being selected. The proper way to perform logistic regression varies with the program. For example, in SAS, PROC LOGISTIC is appropriate; in R, the glm() function with family = binomial is appropriate. Each statistics package has extensive documentation. According to authorities such as Rosenthal and Rosnow, it is better to include more variables as opposed to fewer.
- 3). Output the conditional probability of being selected, given the covariates. For example, in SAS, you would use a statement similar to that used by Parsons:
OUTPUT OUT=STUDY.Propen prob=prob;
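Here, the data set "Propen" in the SAS library "STUDY" would have a variable called "prob". This is the propensity score.
- 4). Check the model. The goal is a propensity score that balances the covariates between the selected and unselected groups, so start by checking that the logistic regression fits the data adequately. In SAS, you can use the "lackfit" option on the model statement, which implements the Hosmer-Lemeshow test of goodness of fit. A large p-value indicates no evidence of poor fit, but you can look at the tabular output as well. Good fit alone does not guarantee balance, so also compare the covariates between the two groups among subjects with similar propensity scores.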
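The fitting and scoring steps above can be sketched in a few lines of code. The example below is only an illustration in plain Python, which is not one of the packages named in this article. It uses simulated data, and the covariate names (age, severity), the sample size, and the helper functions fit_logistic and propensity_scores are all hypothetical. It fits the logistic regression with a simple gradient method. In real work, you would use your statistics package's own logistic regression routine instead.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_predictor(beta, row):
    # Intercept plus the weighted sum of the covariates.
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def fit_logistic(X, t, lr=0.5, n_iter=1000):
    """Fit a logistic regression by gradient ascent on the mean log-likelihood.

    X is a list of covariate rows, t is a list of 0/1 selection indicators.
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, ti in zip(X, t):
            resid = ti - sigmoid(linear_predictor(beta, row))
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    # Propensity score = estimated probability of selection given covariates.
    return [sigmoid(linear_predictor(beta, row)) for row in X]


# Simulated (hypothetical) data: two standardized covariates and a 0/1
# indicator of being selected, generated from an assumed true model.
random.seed(0)
X, t = [], []
for _ in range(300):
    age = random.gauss(0.0, 1.0)
    severity = random.gauss(0.0, 1.0)
    p_true = sigmoid(-0.5 + 1.0 * age + 0.5 * severity)
    X.append([age, severity])
    t.append(1 if random.random() < p_true else 0)

beta = fit_logistic(X, t)              # step 2: fit the model
scores = propensity_scores(X, beta)    # step 3: output the probabilities

print("coefficients:", [round(b, 3) for b in beta])
print("first five propensity scores:", [round(s, 3) for s in scores[:5]])
```

Each value in scores plays the same role as the "prob" variable in the SAS output data set. One quick sanity check is that the average propensity score should match the observed proportion of selected subjects, which holds for any logistic regression that includes an intercept.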