Deciding which statistical test to use
Although it is assumed that the variables are interval and normally distributed, we can include dummy variables when performing correlations. In the first example above, we see that the correlation between read and write is 0. By squaring the correlation and then multiplying by 100, you can determine what percentage of the variability is shared. In the output for the second example, we can see the correlation between write and female is 0. Squaring this number yields.
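The step of squaring a correlation and multiplying by 100 can be sketched in a few lines of Python. This is only an illustration: the scores below are hypothetical and are not taken from the hsb2 data file, and `pearson_r` is a helper written for this sketch rather than part of SPSS or any library.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores (assumption: NOT the hsb2 values)
read_scores = [57, 68, 44, 63, 47, 50, 34, 63]
write_scores = [52, 59, 33, 44, 52, 59, 46, 57]

r = pearson_r(read_scores, write_scores)
shared_pct = r ** 2 * 100  # percentage of variability the two variables share
print(f"r = {r:.3f}, shared variability = {shared_pct:.1f}%")
```

The same function works when one of the two variables is a 0/1 dummy variable, which is why dummy variables can be included in a correlation.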
Simple linear regression allows us to look at the linear relationship between one normally distributed interval predictor and one normally distributed interval outcome variable. For example, using the hsb2 data file, say we wish to look at the relationship between writing scores (write) and reading scores (read); in other words, predicting write from read. We see that the relationship between write and read is positive. Hence, we would say there is a statistically significant positive linear relationship between reading and writing.
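As a rough sketch of what predicting write from read involves, the least-squares line can be computed by hand. The data are hypothetical (not the hsb2 file), `fit_line` is a helper defined only for this example, and the sketch estimates the line without testing its statistical significance.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical scores (assumption: NOT the hsb2 values)
read_scores = [57, 68, 44, 63, 47, 50, 34, 63]
write_scores = [52, 59, 33, 44, 52, 59, 46, 57]

intercept, slope = fit_line(read_scores, write_scores)
predicted_write = intercept + slope * 60  # predicted writing score for read = 60
print(f"write = {intercept:.2f} + {slope:.3f} * read")
print(f"predicted write at read = 60: {predicted_write:.1f}")
```

A positive slope corresponds to the positive relationship described above; judging whether it is statistically significant requires the standard error of the slope, which this sketch does not compute.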
A Spearman correlation is used when one or both of the variables are not assumed to be normally distributed and interval but are assumed to be ordinal.
The values of the variables are converted to ranks and then correlated. In our example, we will look for a relationship between read and write.
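The two steps just described, converting values to ranks and then correlating the ranks, can be sketched as follows. The helper names and the data are assumptions made for this illustration; tied values receive the average of the ranks they span, which is a common convention.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical scores (assumption: NOT the hsb2 values)
read_scores = [57, 68, 44, 63, 47, 50, 34, 63]
write_scores = [52, 59, 33, 44, 52, 59, 46, 57]
print(f"rho = {spearman_rho(read_scores, write_scores):.3f}")
```

Because only the ordering of the values matters, any monotonic relationship (linear or not) between the two variables yields a rank correlation of 1 or -1.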
We will not assume that both of these variables are normal and interval.
Logistic regression assumes that the outcome variable is binary (i.e., coded as 0 and 1). We have only one variable in the hsb2 data file that is coded 0 and 1, and that is female.
We understand that female is a silly outcome variable (it would make more sense to use it as a predictor variable), but we can use female as the outcome variable to illustrate how the code for this command is structured and how to interpret the output.
The first variable listed after the logistic command is the outcome or dependent variable, and all of the rest of the variables are predictor or independent variables. In our example, female will be the outcome variable, and read will be the predictor variable. As with OLS regression, the predictor variables must be either dichotomous or continuous; they cannot be categorical.
The results indicate that reading score (read) is not a statistically significant predictor of gender (i.e., being female). Likewise, the test of the overall model (the LR chi-squared) is not statistically significant.
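To make the idea of a binary outcome predicted from one continuous variable concrete, here is a minimal sketch that fits a one-predictor logistic regression by Newton-Raphson and computes a likelihood-ratio chi-squared against the intercept-only model. It is not how SPSS implements the command; the data, the variable names `score` and `outcome`, and all function names are assumptions for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the negated Hessian, weighted by p * (1 - p)
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def log_likelihood(x, y, b0, b1):
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

# Hypothetical data (assumption): a continuous predictor and a 0/1 outcome
score = [1, 2, 3, 4, 5, 6, 7, 8]
outcome = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(score, outcome)

# Intercept-only (null) model: the fitted probability is the mean of the outcome
p_bar = sum(outcome) / len(outcome)
null_b0 = math.log(p_bar / (1.0 - p_bar))
lr_chi2 = 2.0 * (log_likelihood(score, outcome, b0, b1)
                 - log_likelihood(score, outcome, null_b0, 0.0))
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, LR chi-squared = {lr_chi2:.3f}")
```

The LR chi-squared is compared with a chi-squared distribution with one degree of freedom (one predictor) to judge the overall model; that lookup is left out here to keep the sketch dependency-free.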
Multiple regression is very similar to simple regression, except that in multiple regression you have more than one predictor variable in the equation. For example, using the hsb2 data file we will predict writing score from gender (female), reading, math, science and social studies (socst) scores. Furthermore, all of the predictor variables are statistically significant except for read.
Analysis of covariance is like ANOVA, except in addition to the categorical predictors you also have continuous predictors as well.
For example, the one-way ANOVA example used write as the dependent variable and prog as the independent variable.
Multiple logistic regression is like simple logistic regression, except that there are two or more predictors.
The predictors can be interval variables or dummy variables, but cannot be categorical variables. If you have categorical predictors, they should be coded into one or more dummy variables. We have only one variable in our data set that is coded 0 and 1, and that is female. The first variable listed after the logistic regression command is the outcome or dependent variable, and all of the rest of the variables are predictor or independent variables listed after the keyword with.
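Coding a categorical predictor into dummy variables can be sketched as below. The function name, the `reference` parameter, and the category labels are hypothetical choices for this example; the idea is that a variable with k categories becomes k - 1 columns of 0/1 indicators, with the omitted category serving as the reference.

```python
def dummy_code(values, reference=None):
    """Turn a categorical variable into k - 1 columns of 0/1 indicators."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    columns = {}
    for level in levels:
        if level == reference:
            continue  # the reference level is represented by all zeros
        columns[level] = [1 if v == level else 0 for v in values]
    return columns

# Hypothetical program-type labels (assumption: illustrative values only)
prog = ["general", "academic", "vocational", "academic"]
dummies = dummy_code(prog, reference="general")
for name, column in dummies.items():
    print(name, column)
```

Each resulting column can then be entered as a predictor in the same way as a variable that is already coded 0 and 1.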
In our example, female will be the outcome variable, and read and write will be the predictor variables. These results show that both read and write are significant predictors of female.
Discriminant analysis is used when you have one or more normally distributed interval independent variables and a categorical dependent variable. It is a multivariate technique that considers the latent dimensions in the independent variables for predicting group membership in the categorical dependent variable.
For example, using the hsb2 data file, say we wish to use read, write and math scores to predict the type of program a student belongs to (prog). Clearly, the SPSS output for this procedure is quite lengthy, and it is beyond the scope of this page to explain all of it. However, the main point is that two canonical variables are identified by the analysis, the first of which seems to be more related to program type than the second.
For example, using the hsb2 data file, say we wish to examine the differences in read, write and math broken down by program type (prog). The students in the different programs differ in their joint distribution of read, write and math.
Multivariate multiple regression is used when you have two or more dependent variables that are to be predicted from two or more independent variables. In our example using the hsb2 data file, we will predict write and read from female, math, science and social studies (socst) scores.
These results show that all of the variables in the model have a statistically significant relationship with the joint distribution of write and read.
Canonical correlation is a multivariate technique used to examine the relationship between two groups of variables. For each set of variables, it creates latent variables and looks at the relationships among the latent variables. It assumes that all variables in the model are interval and normally distributed.
SPSS requires that each of the two groups of variables be separated by the keyword with. There need not be an equal number of variables in the two groups before and after the with. The output above shows the linear combinations corresponding to the first canonical correlation. At the bottom of the output are the two canonical correlations. These results indicate that the first canonical correlation is. The F-test in this output tests the hypothesis that the first canonical correlation is equal to zero.
However, the second canonical correlation of.
Factor analysis is a form of exploratory multivariate analysis that is used to either reduce the number of variables in a model or to detect relationships among variables.
All variables involved in the factor analysis need to be interval and are assumed to be normally distributed. The goal of the analysis is to try to identify factors which underlie the variables.
There may be fewer factors than variables, but there may not be more factors than variables. We will include subcommands for varimax rotation and a plot of the eigenvalues.
We will use a principal components extraction and will retain two factors. These options make our results compatible with those from SAS and Stata; they are not necessarily the options that you will want to use.
Communality (which is the opposite of uniqueness) is the proportion of variance of a variable that is accounted for by all of the factors taken together. The scree plot may be useful in determining how many factors to retain. From the component matrix table, we can see that all five of the test scores load onto the first factor, while all five tend to load not so heavily on the second factor.
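The relationship between loadings, communality and uniqueness can be sketched numerically. The loadings below are hypothetical (they are not the values from the SPSS output), and the sketch assumes standardized variables and orthogonal factors, in which case a variable's communality is the sum of its squared loadings and its uniqueness is one minus that sum.

```python
def communalities(loadings):
    """Sum of squared loadings per variable (assumes orthogonal factors)."""
    return {name: sum(value ** 2 for value in row)
            for name, row in loadings.items()}

# Hypothetical loadings on two retained factors (assumption: illustrative only)
loadings = {
    "read": [0.85, 0.15],
    "write": [0.80, -0.30],
    "math": [0.82, 0.20],
}

for name, h2 in communalities(loadings).items():
    uniqueness = 1.0 - h2  # variance not accounted for by the retained factors
    print(f"{name}: communality = {h2:.3f}, uniqueness = {uniqueness:.3f}")
```

A variable with a very low communality shares little variance with the retained factors.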
The purpose of rotating the factors is to get the variables to load either very high or very low on each factor. In this example, because all of the variables loaded onto factor 1 and not on factor 2, the rotation did not aid in the interpretation. Instead, it made the results even more difficult to interpret.
You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results.
Statistical tests: which one should you use?
Statistical tests can be used to determine whether a predictor variable has a statistically significant relationship with an outcome variable.
What are the main assumptions of statistical tests?
Statistical tests commonly assume that the data are normally distributed, that the groups being compared have similar variance, and that the data are independent. If your data do not meet these assumptions, you might still be able to use a nonparametric statistical test, which has fewer requirements but also makes weaker inferences.
What is the difference between discrete and continuous variables?
Discrete and continuous variables are two types of quantitative variables: discrete variables represent counts, and continuous variables represent measurable amounts.
Rebecca Bevans
Rebecca is working on her PhD in soil ecology and spends her free time writing. She's very happy to be able to nerd out about statistics with all of you.
What is the effect of income on longevity? Statistical errors are the deviations of the observed values of the dependent variable from their true or expected values.
These errors are unobservable, since we usually do not know the true values, but we can estimate them with residuals, the deviation of the observed values from the model-predicted values. Additionally, many of these models produce estimates that are robust to violation of the assumption of normality, particularly in large samples. Leeper, Ph. We thank Professor Leeper for permission to adapt and distribute this page from our site.