## SPSS Unemployment Illness

### Introduction

The aim of this report is to find patterns in the data and to model relevant factors, specifically:

- Modelling unemployment levels across the different districts in England.
- Modelling long-term illness in relation to other key variables.
- Creating and developing an index which measures affluence for the different English districts.

### Analysis

The first part of the project focuses on developing a model which fits unemployment levels across the different districts, using multiple regression analysis. Before taking this step, some basic exploratory analysis is useful, as it gives an idea of the main characteristics of the data. As can be observed from the above chart and table, the majority of the districts have unemployment rates between 2% and 6%. The Isles of Scilly have the lowest unemployment rate at 1.34%, whilst Hackney has the highest at 10.59%. The average unemployment rate per district is about 4%.

The Pearson correlation coefficients (Appendix 1) show that the level of unemployment is correlated with all the variables involved in this study. The key positive relationships are with % of lone parent families (r=0.907, p<0.001), % of persons without a car (r=0.886, p<0.001) and % of people living in rented local authority (LA) accommodation (r=0.739, p<0.001). The key negative relationships are with % married (r=-0.815, p<0.001), % of homeowners (r=-0.678, p<0.001) and % of the population aged 45 to 59 (r=-0.613, p<0.001). In other words, the higher a district's unemployment rate, the higher its proportions of people without a car, of people renting from the LA and of lone parent families. Conversely, districts with low unemployment rates tend to have higher proportions of people who are married, middle-aged and homeowners. Correlations are a good starting point for identifying which predictors could be useful in the regression model. In this instance, all 18 variables are significantly correlated with % unemployed.
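
The correlations above can be reproduced from the raw district columns. Below is a minimal sketch of the Pearson coefficient and of the t statistic used to test it. The function names and the sample values are hypothetical. Converting t into a p-value needs the t distribution's survival function (for example `scipy.stats.t.sf`, or `scipy.stats.pearsonr`, which returns r and p together). That step is left out so the sketch stays dependency-free.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length, at least 3 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical district values (not from the report): % lone parents vs % unemployed.
lone_parents = [8.1, 10.4, 12.9, 15.2, 18.7, 21.3]
unemployed = [2.3, 3.1, 3.8, 5.0, 6.2, 7.9]
r = pearson_r(lone_parents, unemployed)
print(round(r, 3), round(t_for_r(r, len(unemployed)), 2))
```

With several hundred districts, even a modest correlation gives a large t. Assuming n = 366 districts (the highest case number in the outlier tables), r = -0.225 gives t of about -4.4. That is well past the roughly 3.3 needed for p < 0.001 (two-tailed).
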

In an ideal world, the 'best' model, accounting for as much of the variation as possible, would include all 18 variables. From a practical perspective, however, this is unworkable. Stepwise regression is therefore used instead, as it provides a model with fewer predictors that is still statistically robust. For the purpose of this project, no more than six predictors can be used to model unemployment rates. Stepwise regression produced a model based on the following six predictors: % of lone parent families (r=0.907, p<0.001), % with limiting long-term illness (LLTI) in the district (r=0.573, p<0.001), % of persons aged 60 and over (r=-0.225, p<0.001), % of households with no earners (r=0.551, p<0.001), % of persons in detached, semi-detached or terraced housing (r=-0.525, p<0.001) and % of females in each district (r=0.283, p<0.001).

Various alternative multiple regression models were also tested. A three-predictor model produced by the stepwise procedure had a lower R-squared statistic and was therefore discarded. Another model with six independent variables, namely those with the highest correlations with the dependent variable, was also tested. Again its R-squared statistic was lower, so this option was discarded too. In light of these results, the six variables proposed by the stepwise procedure were retained.

- The model looks appropriate, as about 94% of the variability in % unemployed is explained by the six predictors. The adjusted R-squared is very close to R-squared, which indicates that the fit is not being inflated by superfluous predictors.
- Autocorrelation of the residuals is not a problem for this model, as the Durbin-Watson (D-W) statistic is very close to 2.
- F = 997.4 with p<0.001, so the regression model as a whole is statistically significant.
- Large t-values go with small p-values and indicate that the corresponding independent variable makes a significant contribution to the model. In this model all six predictors contribute, and none of the 95% confidence intervals for the coefficients includes zero.
- Multicollinearity is not a problem for this model, as the VIF values are very low.
- The residuals appear to be normally and randomly distributed.

Broadly speaking, this is a decent model, as all the diagnostics and criteria indicate. The tables below show which cases (districts) have extreme values, statistically termed outliers. In the first table the observed unemployment rate is well above the predicted value; in the second it is well below it.

| District (case no.) | Observed unemployment (%) | Predicted unemployment (%) |
|---|---|---|
| 207 | 4.90 | 3.54 |
| 258 | 9.21 | 7.81 |
| 357 | 8.65 | 7.35 |

| District (case no.) | Observed unemployment (%) | Predicted unemployment (%) |
|---|---|---|
| 366 | 1.34 | 3.05 |
| 241 | 3.84 | 5.53 |
| 280 | 4.34 | 5.97 |
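
The regression diagnostics discussed above are all produced by SPSS: R-squared, adjusted R-squared, F, Durbin-Watson, VIF, and the residuals behind the outlier tables. The sketch below shows what each one computes. It is plain Python with a hand-rolled least-squares solver, and every function name is hypothetical. Its forward selection uses adjusted R-squared as the entry criterion. That is a simplification: SPSS's stepwise method enters and removes variables using F-test probabilities, so it will not necessarily pick the same predictors. Standardized residuals are returned so large ones can be flagged. A common convention is |value| > 3, although the cutoff used for the tables above is not stated.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares fit of y on an intercept plus the given predictor columns."""
    n, k = len(y), len(columns)
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors + 1")
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = k + 1
    xtx = [[sum(X[i][j] * X[i][c] for i in range(n)) for c in range(p)] for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)  # normal equations (X'X) b = X'y
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for yi, row in zip(y, X)]
    ybar = sum(y) / n
    sse = sum(e * e for e in resid)
    sst = sum((yi - ybar) ** 2 for yi in y)
    if sse == 0 or sst == 0:
        raise ValueError("degenerate fit: zero residual or zero total variation")
    r2 = 1 - sse / sst
    s = math.sqrt(sse / (n - k - 1))  # standard error of the estimate
    return {
        "beta": beta,
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "F": ((sst - sse) / k) / (sse / (n - k - 1)) if k else float("nan"),
        # Durbin-Watson depends on case order, which is arbitrary for districts.
        "DW": sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / sse,
        "std_resid": [e / s for e in resid],
    }


def forward_select(candidates, y, max_predictors=6):
    """Greedy forward selection on adjusted R-squared (not SPSS's F-based rule)."""
    chosen, best = [], -math.inf
    while len(chosen) < max_predictors:
        scores = {}
        for name in candidates:
            if name in chosen:
                continue
            try:
                scores[name] = ols([candidates[c] for c in chosen + [name]], y)["adj_r2"]
            except ValueError:
                continue  # skip candidates that make the fit singular
        if not scores:
            break
        name = max(scores, key=scores.get)
        if scores[name] <= best:
            break  # no remaining candidate improves adjusted R-squared
        chosen.append(name)
        best = scores[name]
    return chosen


def vif(candidates, names):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 regresses predictor j on the others."""
    out = {}
    for name in names:
        others = [candidates[c] for c in names if c != name]
        r2 = ols(others, candidates[name])["r2"] if others else 0.0
        out[name] = 1 / (1 - r2)
    return out


# Toy data (hypothetical): y depends on x1 only; x2 is unrelated.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05, -0.05, 0.0]
y = [2 + 3 * a + e for a, e in zip(x1, noise)]
candidates = {"x1": x1, "x2": x2}

chosen = forward_select(candidates, y, max_predictors=6)
fit = ols([candidates[c] for c in chosen], y)
print(chosen, round(fit["adj_r2"], 4), round(fit["DW"], 2), vif(candidates, chosen))
```
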

### Economic Position

- As can be observed from the above table, those who are permanently sick, unemployed or retired are more likely to have a long-term illness.
- On the other hand, those in employment are less likely to have a long-term illness.

### Marital Status

- From the above table, it can be clearly seen that the 'healthiest' people are those who are single.
- Those who were remarried, divorced or widowed were more likely to suffer from long-term illness.

### Number of Cars

- Those without a car were more likely to suffer from long-term illness.
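
The comparisons in the three subsections above come from crosstabulations. For each category, they use the percentage of respondents reporting a long-term illness. Below is a minimal sketch of that row-percentage calculation. The function name and the counts are hypothetical, chosen only to illustrate the arithmetic, and are not taken from the report's tables.

```python
def llti_rates(table):
    """Percentage of each category reporting a long-term illness.

    table maps category -> (count_with_illness, count_without_illness).
    """
    return {cat: 100.0 * ill / (ill + well) for cat, (ill, well) in table.items()}


# Hypothetical counts, for illustration only.
cars = {"no car": (120, 380), "one car": (90, 610), "two or more cars": (40, 460)}
rates = llti_rates(cars)
for cat, pct in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cat}: {pct:.1f}% with long-term illness")
```
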

### Age

- It can be seen from the above two plots that the older the respondents, the less likely they are to have a long-term illness.

- Respondents working part-time are about 1.2 times more likely not to have a long-term illness than full-time workers.
- Respondents working full-time are about 3 times more likely not to have a long-term illness than retired respondents.
- Respondents working full-time are about 1.5 times more likely not to have a long-term illness than students.
- Respondents working full-time are about 3.6 times more likely not to have a long-term illness than economically inactive respondents.
- Respondents who are permanently sick are certain to have a long-term illness.
- Respondents working full-time are about 2.4 times more likely not to have a long-term illness than unemployed respondents.
- Respondents who are married are almost twice as likely to have a long-term illness as single respondents.
- Respondents who have remarried are about 2.2 times more likely to have a long-term illness than single respondents.
- Respondents who are widowed or divorced are also more likely than single respondents to suffer from a long-term illness.
- Respondents who have one car are about 1.8 times more likely to have a long-term illness than those who do not have a car.
- Respondents who have two cars are about 2.2 times more likely to have a long-term illness than those who do not have a car.
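
This sketch assumes the "times more likely" figures above are odds ratios, such as the Exp(B) column that SPSS reports for a binary logistic regression. The source does not say which procedure produced them. It shows why "full-time workers are about 3 times more likely not to have a long-term illness than retired respondents" says the same thing as "retired respondents have about 3 times the odds of having one". Strictly, an odds ratio compares odds rather than probabilities. When the illness is common, "times more likely" therefore exaggerates the ratio of probabilities. The function names and counts are hypothetical.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a, b: counts with / without the outcome in the group of interest
    c, d: counts with / without the outcome in the reference group
    """
    return (a / b) / (c / d)


def odds_ratio_from_coefficient(coef):
    """A logistic-regression coefficient B corresponds to the odds ratio Exp(B)."""
    return math.exp(coef)


# Hypothetical counts: 60 of 100 retired and 30 of 100 full-time respondents ill.
or_ill = odds_ratio(60, 40, 30, 70)      # retired vs full-time, odds of illness
or_not_ill = odds_ratio(70, 30, 40, 60)  # full-time vs retired, odds of NO illness
print(round(or_ill, 2), round(or_not_ill, 2))  # the two framings agree
```
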

- Low unemployment rates
- High home ownership
- High marriage rates
- Low levels of LA-rented housing
- A low proportion of people without cars
- A high proportion of people aged between 45 and 59
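
One simple way to combine the six characteristics above into a single affluence score is an equal-weight composite of signed z-scores. Each indicator is standardized across districts and given a sign according to whether high values signal affluence or deprivation. The indicator names and the equal weighting are assumptions, not the report's method. The AFF_1 values in the tables below are on a different scale, so this is not a reconstruction of AFF_1.

```python
import statistics

# Hypothetical indicator names for the six characteristics listed above.
# +1: higher values signal affluence; -1: higher values signal deprivation.
DIRECTIONS = {
    "unemployed": -1,
    "home_owners": +1,
    "married": +1,
    "la_rented": -1,
    "no_car": -1,
    "aged_45_59": +1,
}


def affluence_index(data):
    """Equal-weight composite index: the sum of signed z-scores per district.

    data maps an indicator name (a key of DIRECTIONS) to a list of district
    values, with districts in the same order in every list. Higher = more affluent.
    """
    n = len(next(iter(data.values())))
    scores = [0.0] * n
    for name, values in data.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation
        for i, v in enumerate(values):
            scores[i] += DIRECTIONS[name] * (v - mean) / sd
    return scores


# Toy example with three hypothetical districts and two indicators.
print(affluence_index({"unemployed": [1, 5, 9], "home_owners": [80, 60, 40]}))
```
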

| district | county | unemp (%) | AFF_1 |
|---|---|---|---|
| Isles of Scilly | Cornwall a | 1.34 | 76.47 |
| South Lakeland | Cumbria | 1.81 | 40.65 |
| Ribble Valley | Lancashire | 1.88 | 32.08 |

| district | county | unemp (%) | AFF_1 |
|---|---|---|---|
| Tower Hamlets | Inner Lond | 9.32 | 147.42 |
| Knowsley | Merseyside | 9.50 | 107.15 |
| Hackney | Inner Lond | 10.59 | 129.90 |
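
The two tables above list the three districts with the lowest and the three with the highest unemployment rates. Below is a minimal sketch of that selection, assuming each district is held as a (district, county, unemp, AFF_1) tuple. The function name `extremes` is hypothetical. Only the six rows shown are used here, whereas the real selection would sort every district.

```python
def extremes(rows, k=3, key_index=2):
    """Return the k rows with the lowest and the k with the highest value in
    column key_index, each group in ascending order of that value."""
    ranked = sorted(rows, key=lambda row: row[key_index])
    return ranked[:k], ranked[-k:]


# The six rows from the tables above (county strings kept as printed).
rows = [
    ("Hackney", "Inner Lond", 10.59, 129.90),
    ("South Lakeland", "Cumbria", 1.81, 40.65),
    ("Knowsley", "Merseyside", 9.50, 107.15),
    ("Isles of Scilly", "Cornwall a", 1.34, 76.47),
    ("Tower Hamlets", "Inner Lond", 9.32, 147.42),
    ("Ribble Valley", "Lancashire", 1.88, 32.08),
]
lowest, highest = extremes(rows)
print([r[0] for r in lowest], [r[0] for r in highest])
```
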