Elsevier

Electoral Studies

Volume 19, Issue 4, December 2000, Pages 577-613
Electoral inquiry section
Loess: a nonparametric, graphical tool for depicting relationships between variables

https://doi.org/10.1016/S0261-3794(99)00028-1

Abstract

Loess is a powerful but simple strategy for fitting smooth curves to empirical data. The term “loess” is an acronym for “local regression” and the entire procedure is a fairly direct generalization of traditional least-squares methods for data analysis. Loess is nonparametric in the sense that the fitting technique does not require an a priori specification of the relationship between the dependent and independent variables. Although it is used most frequently as a scatterplot smoother, loess can be generalized very easily to multivariate data; there are also inferential procedures for confidence intervals and other statistical tests. For all of these reasons, loess is a useful tool for data exploration and analysis in the social sciences. Loess should be particularly helpful in the field of elections and voting behavior because theories often lead to expectations of nonlinear empirical relationships, even though prior substantive considerations provide very little guidance about precise functional forms.

Introduction

The purpose of this paper is to discuss the loess procedure for fitting smooth curves to scatterplots. Loess provides a graphical summary of the relationship between a dependent variable and one or more independent variables. The distinctive feature of this procedure is that it “allows the data to speak for themselves”. Loess is nonparametric, so the fitted curve is obtained empirically rather than through stringent prior specifications about the nature of any structure that may exist within the data. Therefore, loess-enhanced scatterplots often reveal relatively complex relationships that could easily be overlooked with traditional statistical modeling procedures.

Loess and other nonparametric estimation strategies are useful in social scientific research because current substantive theories usually provide little detail about the kinds of structural patterns that should exist within empirical data. In other words, hypotheses suggest which variables should be related to each other and, often, the direction of any such relationship; for example, “education levels should be positively related to voting turnout”. Beyond statements like this, however, there are generally no predictions about functional forms. Researchers therefore fall back on simple specifications for want of theory-based directions to the contrary, a situation that Beck and Jackman (1998) have recently called “linearity by default”. This creates a potentially serious problem because the detailed theories that do exist suggest that nonlinear relationships are pervasive throughout the field of elections, voting, and mass political behavior (e.g. Przeworski and Soares, 1971; Zaller, 1992; Brown, 1995). Thus, a nonparametric technique like loess should be very useful for discerning such nonlinearities and explicating their forms.

The rest of this paper provides a detailed presentation of the loess method, along with the major practical considerations involved in its use. Most of the discussion will focus on the simplest case — using loess as a descriptive, exploratory tool for fitting smooth curves to scatterplots. This is undoubtedly the kind of situation where loess is employed most frequently. However, the technique is much more general than this. So, some attention will also be given to statistical inference and multivariate loess. Overall, loess is a very useful tool for discerning systematic structure within empirical data. As such, this technique should help researchers develop theories that provide accurate, powerful representations of real-world phenomena.

Section snippets

Scatterplot smoothing

The two-dimensional scatterplot is the basic graphical display method for bivariate data. At the same time, the scatterplot is the “building block” for more complex graphical depictions of multivariate data (Jacoby, 1998). One of the great strengths of the scatterplot is that it enables visual assessments of relationships or functional dependencies between the variables included in the display.

An example of loess smoothing

In order to demonstrate the utility of the loess procedure, we will examine a substantive example, using state-level data on education and voter turnout in the 1992 American presidential election. This is an ideal topic for our present purposes, because it epitomizes the ambiguities that often exist in our theoretical propositions. The relationship between education and mass political participation is widely acknowledged by social scientists. However, Nie et al. (1996) point out that even

Fitting a loess smooth curve

The loess procedure is computationally intensive; in other words, there is a large number of distinct steps involved in fitting even a simple loess curve to a small dataset. Nevertheless, the calculations themselves are straightforward. They should be readily understandable to anyone who is familiar with ordinary least squares regression analysis. The discussion in this section will provide a brief overview of the methodology underlying loess. Complete details and a simple, step-by-step example

Fitting parameters for the loess smooth curve

The loess procedure is nonparametric in the sense that the analyst does not specify the functional form of the final smooth curve. However, there are some parameters that must be supplied prior to the fitting procedure in order to guarantee that the loess curve really does pass through the center of the empirical data points. Selecting the values for these parameters is a subjective process, but the considerations that are involved in the decisions are quite straightforward.
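To make the role of such fitting parameters concrete, the following is a minimal sketch of a loess-style smoother, not the routine used in the paper. It assumes a local-linear fit with tricube weights; the function names, the `span` argument (the fraction of observations used in each local fit), and the example data are all choices made for this illustration only.

```python
def tricube(u):
    """Tricube weight: 1 at u = 0, falling smoothly to 0 at |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_point(x0, xs, ys, span=0.5):
    """Local-linear smoothed value at x0 (illustrative sketch only).

    span is the assumed fraction of the data used in each local fit.
    """
    n = len(xs)
    q = min(n, max(2, round(span * n)))         # points in the local window
    h = sorted(abs(x - x0) for x in xs)[q - 1]  # distance to q-th nearest point
    weights = [tricube((x - x0) / h) if h > 0 else 0.0 for x in xs]
    if sum(weights) == 0.0:
        # Degenerate window (ties at the edge): fall back to equal weights.
        weights = [1.0 if abs(x - x0) <= h else 0.0 for x in xs]
    sw = sum(weights)
    # Weighted least squares for y = a + b * u, where u = x - x0.
    mu = sum(w * (x - x0) for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    suu = sum(w * ((x - x0) - mu) ** 2 for w, x in zip(weights, xs))
    suy = sum(w * ((x - x0) - mu) * (y - my)
              for w, x, y in zip(weights, xs, ys))
    slope = suy / suu if suu > 0 else 0.0
    return my - slope * mu                      # fitted value at u = 0


def loess_curve(xs, ys, span=0.5):
    """Smoothed value at every observed x."""
    return [loess_point(x, xs, ys, span) for x in xs]


# Made-up, exactly linear data: a local-linear fit should reproduce it.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
fitted = loess_curve(xs, ys, span=0.5)
```

In this sketch a larger `span` draws more observations into each local regression and so yields a smoother curve, while a smaller `span` lets the curve follow local variation more closely.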

Plotting loess residuals

The residuals from a loess fit can be employed as a useful diagnostic tool in order to determine whether the smooth curve adequately incorporates all of the interesting structure in the data. The strategy for doing so is identical to that used in traditional, linear regression analysis. The residuals are scrutinized for systematic patterns that may remain after an hypothesized structural representation has been fitted to the empirical data.
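The logic of this diagnostic can be sketched with a small example. For brevity, the sketch below fits an ordinary least-squares line rather than a loess curve, and it takes residuals in the conventional way as observed minus fitted values; the same check applies to fitted values from any smoother. The data and function names are made up for illustration.

```python
def ols_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def residuals(ys, fitted):
    """Observed value minus fitted value, point by point."""
    return [y - f for y, f in zip(ys, fitted)]


# Made-up curved data: a straight line cannot capture the bend.
xs = [float(i) for i in range(-5, 6)]
ys = [x ** 2 for x in xs]

a, b = ols_line(xs, ys)
res = residuals(ys, [a + b * x for x in xs])

# Residuals are positive at both ends and negative in the middle:
# a systematic pattern showing that the fit missed structure in the data.
for x, r in zip(xs, res):
    print(f"x = {x:5.1f}   residual = {r:6.1f}")
```

A fit that had captured the structure would instead leave residuals scattered around zero with no visible trend across the range of x.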

The loess residuals are defined as the difference

Goodness of fit for a loess smooth curve

When a loess smooth curve is fitted to data, attention is usually focused on the shape of the resultant curve because that feature is most revealing of the structure within the data. However, it is also useful to consider how well the smooth curve characterizes the empirical data values. This latter phenomenon is usually called ‘goodness of fit’, although that term is only partially appropriate in the case of nonparametric smoothers like loess.

A summary fit statistic similar to an R² value can

Loess and statistical inference

The discussion so far has assumed that loess is being used as a strictly descriptive tool. However, the statistical theory for local regression models has been worked out, so it is possible to incorporate an inferential component into a loess analysis. Doing so facilitates generalizations about the structure of the population from which the observed data were drawn. Inferential tools also enable the researcher to assess the degree of uncertainty about the precise form of the smooth curve fitted

Loess and multivariate data

Although the discussion so far has focused on bivariate scatterplot smoothing, loess can also be a useful tool for situations where a dependent variable is hypothesized to be a function of several independent variables. In fact, there are at least two different approaches that can be used: multivariate loess (or, more precisely, ‘local multiple regression’) and generalized additive models. Let us briefly consider each of these strategies.

Software for loess

Because of its computationally intensive nature, loess smoothing is effectively impossible to carry out by hand. Therefore, most potential users (at least non-programmers) are constrained by the options that are provided by the available software. Fortunately, loess fitting is now widely incorporated into statistical software packages. However, the exact nature, capabilities and flexibility of the routines vary markedly from one program to the next.

Some packages only provide basic scatterplot

Conclusions

Loess has recently received a great deal of attention in statistical circles, where it is recognized as one member of a broader family of procedures called nonparametric regression models (Green and Silverman, 1994; Fan and Gijbels, 1996; Fox, 1999). However, loess is far less well known among political scientists. This is unfortunate, because it provides a very flexible approach to the problem of representing structure within a dataset. Accordingly, loess fitting is a useful addition to the

Acknowledgements

I would like to thank Harold D. Clarke and Harvey Starr for their comments and suggestions on an earlier version of this paper. Special thanks go to Saundra K. Schneider; this project could not have been completed without her help.

References (44)

  • W.S. Cleveland et al. Regression by local fitting: methods, properties, and computational algorithms. J. Econometrics (1988)
  • EViews 3.0 user's guide (1997)
  • SAS/IML software: usage and reference, version 6 (1990)
  • SAS/INSIGHT user's guide, version 6 (1995)
  • STATA reference manual, release 5 (1997)
  • S-PLUS guide to statistical and mathematical analysis (1995)
  • Beck N, Jackman S, 1997. Getting the mean right is a good thing: generalized additive models. Working Paper at the...
  • N. Beck et al. Beyond linearity by default: generalized additive models. Am. J. Polit. Sci. (1998)
  • R. Becker et al. The visual design and control of trellis displays. J. Comput. Graph. Stat. (1996)
  • R. Becker et al. The use of brushing and rotation for data analysis
  • D.A. Belsley et al. Regression diagnostics (1980)
  • C. Brown. Serpents in the sand: essays on the nonlinear nature of politics and human destiny (1995)
  • J.M. Chambers et al. Graphical methods for data analysis (1983)
  • W.S. Cleveland. Robust locally weighted regression and smoothing scatterplots. J. Am. Stat. Assoc. (1979)
  • W.S. Cleveland. Visualizing data (1993)
  • W.S. Cleveland. The elements of graphing data (revised edition) (1994)
  • W.S. Cleveland et al. Locally weighted regression: an approach to regression analysis by local fitting. J. Am. Stat. Assoc. (1988)
  • W.S. Cleveland et al. Graphical perception: theory, experimentation, and application to the development of graphical methods. J. Am. Stat. Assoc. (1984)
  • W.S. Cleveland et al. Local regression models
  • R.D. Cook et al. An introduction to regression graphics (1994)
  • B. Efron et al. An introduction to the bootstrap (1993)
  • R.S. Erikson et al. Statehouse democracy: public opinion and public policy in the American States (1993)

The data used in the examples presented in this paper, along with the S-Plus routines used to produce the graphs, can be found on the World Wide Web, at http://www.cla.sc.edu/gint/faculty/jacoby