## DATA ANALYSIS: BARYCENTRIC REPRESENTATION & MULTIDIMENSIONAL SCALING & SCATTERPLOTS

October 21, 2006 at 11:40 pm | Posted in Philosophy, Research, Science & Technology

**Scatterplot & Multidimensional Scaling**

**A scatterplot, scatter diagram or scatter graph is a graph used in statistics to visually display and compare two or more sets of related quantitative, or numerical, data by displaying only finitely many points, each having a coordinate on a horizontal and a vertical axis.**

**The scatter diagram is one of the seven basic tools of quality control, which include the histogram, Pareto chart, check sheet, control chart, cause-and-effect diagram and flowchart.**


**For example, to study the effects of lung capacity on the ability to hold one’s breath, a statistician would choose a group of people to study, and test each one’s lung capacity (first data set) and how long that person could hold their breath (second data set). They would then set up the data in a scatter plot, assigning “lung capacity” to the horizontal axis, and “time holding breath” to the vertical axis. A person with a lung capacity of 400 cc who held their breath for 21.7 seconds would be represented by a single dot on the scatter plot at the point (400, 21.7) in Cartesian coordinates. The scatter plot of all the people in the study would enable the statistician to obtain a visual comparison of the two sets of data, and help to determine what kind of relationship there might be between them.**


**A scatter plot shows the position of all of the cases in an x-y or x-y-z coordinate system. The relationship between interval variables can be identified from a scatter graph. Each dot in the body of the chart represents the intersection of one case's values on the x and y axes.**

**One advantage of a scatterplot is that it does not require a user to specify dependent or independent variables. Either type of variable can be plotted on either axis. Scatterplots represent the association (not causation) between two variables.**

**A scatterplot can show three kinds of relationship: positive (rising), negative (falling), and no relationship, in which the points are scattered without a pattern. If the pattern of dots slopes downward from upper left to lower right, it suggests a negative correlation between the variables being studied. If the pattern slopes upward from lower left to upper right, it suggests a positive correlation. A line of best fit can be drawn in order to study the correlation between the variables. An equation for the line of best fit can be calculated by linear regression; its slope equals the correlation coefficient multiplied by the ratio of the standard deviation of the vertical variable to that of the horizontal variable.**

**(In probability theory and statistics, correlation, also called the correlation coefficient, indicates the strength and direction of a linear relationship between two random variables. In general statistical usage, correlation or co-relation refers to the departure of two variables from independence, although correlation does not imply causality. In this broad sense there are several coefficients, measuring the degree of correlation, adapted to the nature of the data.**

**A number of different coefficients are used for different situations. The best known is the Pearson product-moment correlation coefficient, which is obtained by dividing the covariance of the two variables by the product of their standard deviations. Despite its name, it was first introduced by Francis Galton.)**
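As a minimal sketch of the definitions above, the following Python computes the Pearson coefficient as the covariance divided by the product of the standard deviations, and a least-squares line of best fit. The function names and the sample measurements are hypothetical and chosen only to echo the lung-capacity example; only the point (400, 21.7) comes from the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance / (std_x * std_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

def best_fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical measurements (invented for illustration only).
lung_capacity_cc = [400, 450, 500, 550, 600]
breath_hold_s = [21.7, 24.0, 27.5, 29.1, 33.0]

r = pearson_r(lung_capacity_cc, breath_hold_s)
slope, intercept = best_fit_line(lung_capacity_cc, breath_hold_s)
print(f"r = {r:.3f}, best fit: y = {slope:.4f} * x + {intercept:.2f}")
```

A positive `r` corresponds to a rising pattern of dots and a negative `r` to a falling one; the sign of the fitted slope always matches the sign of `r`.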

**http://en.wikipedia.org/wiki/Scatterplot**

**Multidimensional scaling**


*The technique is also used in marketing, see Multidimensional scaling in marketing*

**Multidimensional scaling (MDS) is a set of related statistical techniques often used in data visualisation for exploring similarities or dissimilarities in data. An MDS algorithm starts with a matrix of item-item similarities, then assigns a location to each item in a low-dimensional space, suitable for graphing or 3D visualisation.**
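One way to make this concrete is classical MDS, which turns a distance matrix into coordinates through an eigendecomposition. The sketch below assumes NumPy is available and that the input is already a matrix of Euclidean-like distances rather than similarities. The function name `classical_mds` and the four sample points are hypothetical, not taken from any particular package.

```python
import numpy as np

def classical_mds(distances, n_components=2):
    """Embed items in n_components dimensions from a distance matrix."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to obtain a Gram (inner-product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding before the square root.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

# Hypothetical items: four corners of a 3-by-4 rectangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(distances)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(coords)
```

The recovered coordinates are only determined up to rotation, reflection and translation, so the check that matters is whether the pairwise distances in `recovered` match the input `distances`.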


**Categorization of MDS**


**MDS algorithms fall into a taxonomy, depending on the meaning of the input matrix:**

**Classical multidimensional scaling (also often called metric multidimensional scaling): assumes the input matrix is just an item-item distance matrix. Analogous to principal components analysis, an eigenvector problem is solved to find the locations that minimize distortions to the distance matrix. Its goal is to find a Euclidean distance approximating a given distance. It can be generalized to handle 3-way distance problems (the generalization is known as DISTATIS).**

**Metric multidimensional scaling: a superset of classical MDS that assumes a known parametric relationship between the elements of the item-item dissimilarity matrix and the Euclidean distance between the items.**

**Generalized multidimensional scaling (GMDS): a superset of metric MDS that allows for the target distances to be non-Euclidean.**

**Non-metric multidimensional scaling: in contrast to metric MDS, non-metric MDS finds both a non-parametric monotonic relationship between the dissimilarities in the item-item matrix and the Euclidean distance between items, and the location of each item in the low-dimensional space. The relationship is typically found using isotonic regression.**
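The isotonic regression step mentioned for non-metric MDS can be sketched with the pool-adjacent-violators algorithm. This is a minimal pure-Python illustration, not the routine used by any particular MDS package. It assumes the map distances have already been listed in order of increasing input dissimilarity, and the function name is hypothetical.

```python
def isotonic_regression(values):
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    blocks = []  # each block is [sum of values, number of values]
    for value in values:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

# Hypothetical map distances, ordered by increasing dissimilarity rating.
map_distances = [1.0, 3.0, 2.0, 4.0]
disparities = isotonic_regression(map_distances)
print(disparities)  # [1.0, 2.5, 2.5, 4.0]
```

The fitted values (often called disparities) are the closest non-decreasing sequence to the map distances; a non-metric MDS procedure would alternate between this fit and moving the points to reduce the remaining mismatch.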


**Applications**


**Applications include scientific visualisation
and data mining in fields such as cognitive science, information science, psychophysics, psychometrics and ecology.**


**Marketing**


**In marketing, MDS is a statistical technique for taking the preferences and perceptions of respondents and representing them on a visual grid. These grids, called perceptual maps, are usually two-dimensional, but they can represent more than two dimensions.**

**Comparison and advantages**

**Potential customers are asked to compare pairs of products and make judgements about their similarity. Whereas other techniques (such as factor analysis, discriminant analysis, and conjoint analysis) obtain underlying dimensions from responses to product attributes identified by the researcher, MDS obtains the underlying dimensions from respondents’ judgements about the similarity of products. This is an important advantage. It does not depend on researchers’ judgements. It does not require a list of attributes to be shown to the respondents. The underlying dimensions come from respondents’ judgements about pairs of products. Because of these advantages, MDS is the most common technique used in perceptual mapping.**
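To illustrate how pairwise similarity judgements become MDS input, the sketch below enumerates every product pair and assembles a symmetric dissimilarity matrix. With n products there are n(n - 1)/2 pairs to rate. The brand names echo the soft-drink example used elsewhere in this post; the ratings and the helper name `dissimilarity_matrix` are hypothetical.

```python
from itertools import combinations

brands = ["Coke", "Pepsi", "Dr Pepper", "Hires rootbeer"]

# Every unordered pair of brands is one question for the respondent.
pairs = list(combinations(brands, 2))

# Hypothetical ratings on a 7-point scale (1 = very similar, 7 = very dissimilar).
ratings = {
    ("Coke", "Pepsi"): 1,
    ("Coke", "Dr Pepper"): 4,
    ("Coke", "Hires rootbeer"): 6,
    ("Pepsi", "Dr Pepper"): 4,
    ("Pepsi", "Hires rootbeer"): 6,
    ("Dr Pepper", "Hires rootbeer"): 3,
}

def dissimilarity_matrix(items, pair_ratings):
    """Build a symmetric matrix with zeros on the diagonal."""
    index = {item: i for i, item in enumerate(items)}
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for (a, b), rating in pair_ratings.items():
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = float(rating)
    return matrix

matrix = dissimilarity_matrix(brands, ratings)
print(len(pairs), "questions for", len(brands), "brands")
for row in matrix:
    print(row)
```

No attribute list appears anywhere in this input: the matrix contains only the respondents' overall judgements, which is the advantage the paragraph above describes.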


**Multidimensional scaling procedure**


**There are several steps in conducting MDS research:**

**Formulating the problem – What brands do you want to compare? How many brands do you want to compare? More than 20 is cumbersome. Fewer than 8 (4 pairs) will not give valid results. What purpose is the study to be used for?**

**Obtaining input data – Respondents are asked a series of questions. For each product pair they are asked to rate similarity (usually on a 7-point Likert scale from very similar to very dissimilar). The first question could be for Coke/Pepsi, for example, the next for Coke/Hires rootbeer, the next for Pepsi/Dr Pepper, the next for Dr Pepper/Hires rootbeer, etc. The number of questions is a function of the number of brands and can be calculated as Q = N (N - 1) / 2, where Q is the number of questions and N is the number of brands; for example, 10 brands require 45 questions. This approach is referred to as the “Perception data: direct approach”. There are two other approaches. There is the “Perception data: derived approach”, in which products are decomposed into attributes which are rated on a semantic differential scale. The other is the “Preference data approach”, in which respondents are asked their preference rather than similarity.**

**Running the MDS statistical program – Software for running the procedure is available in most of the better statistical applications programs. Often there is a choice between metric MDS (which deals with interval or ratio level data) and nonmetric MDS (which deals with ordinal data). The researchers must decide on the number of dimensions they want the computer to create. The more dimensions, the better the statistical fit, but the more difficult it is to interpret the results.**

**Mapping the results and defining the dimensions – The statistical program (or a related module) will map the results. The map will plot each product (usually in two-dimensional space). The proximity of products to each other indicates either how similar they are or how preferred they are, depending on which approach was used. The dimensions must be labelled by the researcher. This requires subjective judgement and is often very challenging. The results must be interpreted (see perceptual mapping).**

**Testing the results for reliability and validity – Compute R-squared to determine what proportion of variance of the scaled data can be accounted for by the MDS procedure. An R-squared of .6 is considered the minimum acceptable level. Other possible tests are Kruskal’s Stress, split data tests, data stability tests (i.e., eliminating one brand), and test-retest reliability.**

**See also: positioning, perceptual mapping, product management, marketing**
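The two fit measures named in the last step can be sketched as follows. This assumes the common definitions: Kruskal's Stress-1 as the square root of the summed squared differences between map distances and disparities, divided by the summed squared map distances, and R-squared as the squared Pearson correlation between the two. The function names and sample numbers are hypothetical, and statistical packages may normalise these quantities differently.

```python
import math

def kruskal_stress_1(map_distances, disparities):
    """Stress-1: sqrt(sum((d - dhat)^2) / sum(d^2)); 0 means a perfect fit."""
    numerator = sum((d - dh) ** 2 for d, dh in zip(map_distances, disparities))
    denominator = sum(d ** 2 for d in map_distances)
    return math.sqrt(numerator / denominator)

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical values for four product pairs.
map_distances = [1.1, 1.9, 2.7, 3.9]
disparities = [1.0, 2.0, 2.5, 4.0]

stress = kruskal_stress_1(map_distances, disparities)
rsq = r_squared(map_distances, disparities)
print(f"Stress-1 = {stress:.3f}, R-squared = {rsq:.3f}")
print("meets the .6 threshold" if rsq >= 0.6 else "below the .6 threshold")
```

Lower stress and higher R-squared both indicate that the map reproduces the input judgements more faithfully.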


**References**

**Abdi, H. (2007). Metric multidimensional scaling. In N.J. Salkind (Ed.), *Encyclopedia of Measurement and Statistics*. Thousand Oaks (CA): Sage.**

**Abdi, H., Valentin, D., O’Toole, A.J., Edelman, B. (2005). DISTATIS: The analysis of multiple distance matrices. *Proceedings of the IEEE Computer Society: International Conference on Computer Vision and Pattern Recognition* (San Diego, CA, USA), pp. 42-47.**

**Abdi, H., Valentin, D., Chollet, S., & Chrea, C. (in press, 2007). Analyzing assessors and products in sorting tasks: DISTATIS, theory and applications. *Food Quality and Preference*, 18.**

**Bronstein, A.M., Bronstein, M.M., and Kimmel, R. (2006). Generalized multidimensional scaling: a framework for isometry-invariant partial surface matching. *Proc. National Academy of Sciences (PNAS)*, Vol. 103/5, pp. 1168-1172.**

**Cox, T.F., Cox, M.A.A. (2001). *Multidimensional Scaling*. Chapman and Hall.**

**Coxon, Anthony P.M. (1982). *The User’s Guide to Multidimensional Scaling. With special reference to the MDS(X) library of Computer Programs*. London: Heinemann Educational Books.**

**Green, P. (1975). Marketing applications of MDS: Assessment and outlook. *Journal of Marketing*, vol 39, January 1975, pp 24-31.**

**Kruskal, J.B., and Wish, M. (1978). *Multidimensional Scaling*. Sage University Paper series on Quantitative Application in the Social Sciences, 07-011. Beverly Hills and London: Sage Publications.**

**Torgerson, W.S. (1958). *Theory & Methods of Scaling*. New York: Wiley.**

**See also**

**factor analysis, discriminant analysis**

**Links**

**An elementary introduction to multidimensional scaling**

**Evaluation of multidimensional scaling algorithms**

**NewMDSX: Multidimensional Scaling Software**

**PERMAP, free software for making multidimensional analyses**

**Relational Perspective Map: MDS on closed manifolds**

**http://en.wikipedia.org/wiki/Multidimensional_scaling**

**Comment:**

**See also barycentric representation such as in Appendix B., “Harvest of the Palm” book, James Fox, Harvard University Press.**