all_scores.csv
This file contains all the estimates along with the candidate IDs. You can download it from here, but it is also available on the Harvard Dataverse.
The codebook for all_scores.csv:
- mean, median, sd, mad, q5, q95: Summary statistics of the estimated distribution of theta (q5 and q95 are its 5th and 95th percentiles). You should probably use mean unless you have a reason not to.
- rhat, ess_bulk, ess_tail: These are the standard convergence/diagnostic measures from Stan.
- year: The year of the survey used in this estimate.
- candidate_ratings: The number of ratings (from survey participants) used in the estimate for this year-candidate pair. If this is missing, the candidate was NOT included in that year's survey, and their scores are interpolated between the years on either side.
- candidate_mean: The raw mean of the ratings from the survey.
- candidate_sd: The raw standard deviation of the ratings from the survey.
- first_instance: An indicator for whether this is the first year a candidate appeared in the data.
- Candidate: This is a “standardized” name for the candidate (it was arbitrarily selected from the names used in the CES data).
- project_id: This is a unique identifier for each candidate used only within this project.
- bonica.rid: This is the ID associated with the candidate in Bonica’s DIME database.
- ICPSR2: This is the ICPSR code for the candidate (taken from DIME).
- politican_id: This is the ID for the candidate matched to the 538 Election Database.
- state: The most common state among the survey participants who rated this candidate.
- district: The most common district among the survey participants who rated this candidate.
- ces_name: The name of the candidate used on that year’s CES survey.
- param, variable, param_id, past_id, rating_type, type: You can ignore these.