# Alternate statistical test to linear regression?

## Alternate statistical test to linear regression?

Greetings,

I am testing whether linear relationships exist between my x and y variables. I ran several diagnostics in R to check the normality of the x variable data, using qqnorm, qqline and histograms of the data. If the data appear normally distributed in either the normal quantile plots or the histograms (i.e. a bell-shaped distribution), I assume normality and fit a linear regression model with "lm". In some cases, however, my distributions do not satisfy the normality criteria, so I feel that a linear regression model would not be appropriate there.

For that reason, would you be able to suggest an alternative to the linear regression model in R? Maybe a non-parametric counterpart to it?

Thank you, and any help would be greatly appreciated!

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

## Re: Alternate statistical test to linear regression?

Note that the normality assumptions are about the residuals (or about y conditional on x), not about the x variable(s) or the marginal (unconditional) distribution of y. If x is highly skewed and the residuals are normal, then diagnostics on y alone will also show skewness (if there is a relationship between x and y).

Also, the normality assumptions concern the tests and confidence intervals; the least squares fit is legitimate (but possibly not the most interesting fit) whether the residuals are normal or not. The Central Limit Theorem also applies in regression, so if the residuals are non-normal but you have a large sample size, the tests and intervals will still be approximately correct (with the quality of the approximation depending on the degree of non-normality and the sample size).

There are many alternative tools. The CRAN task view on Robust Statistical Methods summarizes many packages and tools for robust regression (and other things as well) that do not depend on the normality assumptions.

On Wed, Oct 23, 2019 at 9:21 AM rain1290--- via R-sig-Geo <[hidden email]> wrote:
> I am testing to see if linear relationships exist between my x and y variables. [...] For that reason, would you be able to suggest an alternate test to the linear regression model in R? Maybe a non-parametric counterpart to it?
--
Gregory (Greg) L. Snow Ph.D.
[hidden email]
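The point about residuals versus the marginal distribution of y can be checked with a small simulation. The following is only a sketch on made-up data (the variable names, sample size, and coefficients are assumptions, not the original poster's data): x is drawn from a skewed distribution, the errors are normal by construction, and the Q-Q plots and a simple skewness measure are computed for both y and the residuals of the "lm" fit.

```r
# Hypothetical data: skewed predictor, linear relationship, normal errors.
set.seed(42)
n <- 500
x <- rexp(n, rate = 1)               # strongly right-skewed predictor
y <- 2 + 3 * x + rnorm(n, sd = 1)    # errors are normal by construction

fit <- lm(y ~ x)
res <- residuals(fit)

# Q-Q plots side by side (drawn only in an interactive session).
if (interactive()) {
  op <- par(mfrow = c(1, 2))
  qqnorm(y, main = "Q-Q plot of y")
  qqline(y)
  qqnorm(res, main = "Q-Q plot of residuals")
  qqline(res)
  par(op)
}

# Simple moment-based skewness, defined here to avoid extra packages.
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3
skew_y   <- skewness(y)    # y inherits the skewness of x
skew_res <- skewness(res)  # the residuals are close to symmetric
```

In this setup the Q-Q plot of y bends away from the line while the residual plot stays close to it, so a diagnostic on y alone would wrongly suggest that the regression assumptions fail.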

## Re: Alternate statistical test to linear regression?

Hi Greg and others,

Thank you for your very informative response! I made a mistake in my initial message: I was actually testing the y variable for normality, not the x. I will also look into those packages on CRAN. But even if there is some skewness in y, because my sample size is much larger than 30 (N>30), might it be safe to apply a linear regression analysis, if we can assume linearity?

A useful alternative would be to use correlation coefficients to test the degree of association between the x and y variables; specifically, the Pearson correlation coefficient, since both the x and y variables are quantitative. Does that make sense?

Thanks again,

-----Original Message-----
From: Greg Snow <[hidden email]>
To: rain1290 <[hidden email]>
Cc: r-sig-geo <[hidden email]>
Sent: Wed, Oct 23, 2019 1:00 pm
Subject: Re: [R-sig-Geo] Alternate statistical test to linear regression?

> Note that the normality assumptions are about the residuals (or about y conditional on x), not on the x variable(s) or all of y (non-conditional). [...]
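On the Pearson suggestion, it may help to see that, for a single predictor, the test of zero Pearson correlation and the test of zero slope in "lm" are the same t test. The sketch below uses simulated data (the sample size and coefficients are arbitrary assumptions) and compares the two with base R's cor.test and summary.

```r
# Hypothetical data with a weak linear relationship.
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- 1 + 0.1 * x + rnorm(n, sd = 3)

# Slope test from simple linear regression.
fit     <- lm(y ~ x)
coefs   <- summary(fit)$coefficients
slope_t <- coefs["x", "t value"]
slope_p <- coefs["x", "Pr(>|t|)"]

# Test of zero Pearson correlation.
ct    <- cor.test(x, y, method = "pearson")
cor_t <- unname(ct$statistic)
cor_p <- ct$p.value

# The two tests coincide, and r^2 equals the regression R-squared.
same_t  <- all.equal(slope_t, cor_t)
same_p  <- all.equal(slope_p, cor_p)
same_r2 <- all.equal(unname(ct$estimate)^2, summary(fit)$r.squared)
```

All three comparisons should return TRUE, which is why the correlation test is not an independent alternative to the regression test: it relies on the same assumptions.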

## Re: Alternate statistical test to linear regression?

First, please expunge the "(N>30)" concept from your mind. It is an oversimplified rule of thumb used in introductory statistics courses (I am guilty of using it in intro stat as well, but I try to emphasize to my students that it is only a rule of thumb for that class and that the truth is more complex once you are in the real world, so consult a statistician). There is nothing magical about a sample size of 30; I have seen cases where n=6 is large enough for the CLT and cases where n=10,000 was not big enough.

If the data are not overly skewed and your sample size is large, then you can just use regression as is and the inference will be approximately correct (with a really good approximation). But with skewness we often prefer the median over the mean. Least squares regression is equivalent to fitting a mean, while some of the robust regression options are equivalent to fitting a median, so they may be preferable on that count.

Note that Pearson's correlation does not test linearity; it assumes linearity (and bivariate normality). Most issues with regression will be the same for the correlation.

On Wed, Oct 23, 2019 at 11:25 AM <[hidden email]> wrote:
> Hi Greg and others,
>
> Thank you for your very informative response! I actually made a mistake in my initial message, in that I was actually testing for the y variable, not the x. [...] because my sample size is much larger than 30 (N>30), it might be safe to apply a linear regression analysis, if we can assume linearity?
>
> A useful alternative would be to use correlation coefficients to test the degree of association between the x and y variables; specifically, the Pearson correlation coefficient, since both x and y variables are quantitative. Does that make sense?
--
Gregory (Greg) L. Snow Ph.D.
[hidden email]
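The remark that least squares fits a mean while some robust options fit a median can be illustrated in the simplest case, a model with only an intercept. This is a sketch on simulated skewed data (the lognormal sample and the helper abs_loss are assumptions made for illustration): minimising squared error recovers the sample mean, and minimising absolute error recovers the sample median. It does not use any of the robust regression packages from the task view.

```r
# Hypothetical right-skewed sample; odd n so the median is a single value.
set.seed(7)
y <- rlnorm(501, meanlog = 0, sdlog = 1)

# Least squares with an intercept only reproduces the sample mean.
ls_fit <- unname(coef(lm(y ~ 1)))

# Minimising the sum of absolute deviations reproduces the sample median.
abs_loss <- function(m) sum(abs(y - m))
lad_fit  <- optimize(abs_loss, interval = range(y), tol = 1e-8)$minimum

c(least_squares = ls_fit, mean = mean(y),
  least_absolute = lad_fit, median = median(y))
```

With skewed data the two fitted values differ, the mean being pulled toward the long tail, which is the sense in which a median-type fit may be the more interesting summary.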