
Glossary of Empirical Musicology

This is an in-progress glossary of terminology that I hope might be useful for those reading in music perception and cognition, but who are unacquainted with conventions of reasoning and description in the behavioral and mind sciences. Many terms from basic statistics are described, but (caution!) I am not an experienced statistician. I welcome feedback.

If you’re just getting started, my definitions for “p-value”, “Pearson coefficient,” and “t-test” might be a good place to start. You’ll need to have a basic sense of these for this week’s readings.

Let me know if you encounter anything unclear, or any additional terms you’d like defined here.


ANOVA, analysis of variance: an analysis of a data set that tests whether the mean of a dependent variable differs across the levels of one or more independent variables, and estimates the likelihood that differences this large could occur by chance. For example, if 6 individual participants are asked to rate the intensity of 5 stimuli that differ in a variable hypothetically associated with intensity, an ANOVA determines whether differences in that variable are associated with the ratings more strongly than individual differences among the participants are. You can calculate an analysis of variance using Vassar’s statistics analysis site; beginners might like to start with my oversimplification of how to use it.

bottom-up analysis: as opposed to “top-down” approaches, any approach beginning with specimens, constituents, and individual manifestations of structure, gathering short-term or local relational data before inferring or hypothesizing features of a larger system or structure. For example: studying note-to-note relationships before generalizing about phrases and relationships between themes in the middle-ground or back-ground. This can be confusing, because the “bottom” is sometimes also referred to as “surface,” and the “top” in this metaphor is sometimes considered a kind of “depth.”

brain localization: the quality of a mental mechanism being organized in “specific, localizable brain regions.”

cognition: the organization and processing of information by the brain. Cognition is often distinguished loosely from perception with terms like “higher-order,” referring to neural activity of which we’re at least potentially aware, and potentially able to influence. Cognition is at least a little remote from sensory activity; in the classics of the discipline of cognitive science, perception is understood as cognition’s input; action as its output.

Cohen’s d value: a measure of effect size, calculated as the difference between two sample means, divided by the “pooled standard deviation”: a combination of the two samples’ standard deviations, weighted according to the number of values representing each treatment or level of the independent variable.
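For readers who like to see the arithmetic, here is a minimal Python sketch of Cohen’s d under the definition above. The function name `cohens_d` and the two lists of ratings are invented for illustration; the pooled variance weights each sample’s variance by its degrees of freedom (n - 1), which is one common convention and an assumption on my part.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference between two sample means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: each sample variance weighted by its degrees of freedom (n - 1).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical intensity ratings under two treatments (made-up numbers).
loud = [6, 7, 8, 7, 7]
soft = [4, 5, 6, 5, 5]
d = cohens_d(loud, soft)
print(round(d, 2))  # the means differ by 2; the pooled SD is about 0.71
```

A d near 0.2 is often described as small, 0.5 as medium, and 0.8 or more as large, though these labels are rules of thumb rather than fixed boundaries.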

confounding variable: one variable feature of experience that might be difficult to distinguish from another. Confounding variables threaten the status of one variable (see esp. “independent variable”) as a basis for predicting another (see esp. “dependent variable”).

correlation: a relationship between the state of a dependent variable and that of an independent variable, which can be observed, and which links the two variables to one another reliably. Strong correlation indicates that a change in one variable is closely matched to a change in another. Negative correlation indicates that an increase in one parameter matches a decrease in another. Correlations do not indicate causality.

deduction: reasoning from general premises, usually assumed from prior observation and study, to particular conclusions.

degrees of freedom: the number of aspects of a system that can vary, and that contribute to the calculation of a mean; typically, this is the number of scores taken in a test, minus 1. This is reported with t-tests and F-tests because a higher number increases the “power” of the test result. T-tests report one degrees-of-freedom value; F-tests for ANOVA report two. The first is the number of groups being compared minus 1 (so it is often a small number such as 1 or 2); the second reflects the number of scores within those groups.

dependent variable:  a variable feature of experience that can be observed repeatedly, and whose variation we hope to explain. We assign the term “dependent variable” when our hypothesis predicts its dependency on other measurable factors. Examples from seminar discussion: what (how much) information is immediately available to a viewer after a brief visual presentation? Less obviously: what types of bonding mechanisms (tactile, visual-gestural, vocal, rhythmic) are favored in early hominids? Although the second example is more complex than the first, both are examples of question marks that can be refined—add a phrase “under ___ conditions?” to either of those questions, and you have an opportunity to observe the variable change with the conditions on which it depends.


domain (cf. “domain-specificity”), cognitive domain: “a category of knowledge and behavior as we explicitly conceptualize it, and not the cognitive and neural resources upon which this category relies.” (Justus & Hutsler 2005, 2)

domain-specificity: a quality of a mental mechanism being “specific to processing only one kind of information,” (Justus & Hutsler 2005, 2)—information in only one (cognitive) “domain.”

effect: the relationship between variables; to describe an “effect” is to state what kind of change in one variable is produced by a change in another.

effect size: the strength or extent to which one variable affects another. A popular measure of effect size is Cohen’s d value. Effect size is a crucial measurement to take in meta-analysis (for example, in the comparison of studies of similar but not identical designs), especially when a variable is included in one study that’s excluded from another. Two experiments may achieve similarly strong (or weak) correlation measures (showing that the variance in one parameter closely or remotely matches variance in another), without the degree of impact of one variable on another being clear; effect-size measurements are designed to clarify that issue. Unlike tests of significance, effect-size measures do not grow simply because the sample size or degrees of freedom increase.

F-value: a test statistic for variance and significance, usually associated with ANOVA. ANOVA is generally reported as follows: F(df1, df2) = ___, p < ___. Df1 is the degrees of freedom (see definition in this glossary) between groups (number of groups minus 1), and df2 is the degrees of freedom within groups (in a simple one-way design, the total number of scores minus the number of groups).

The F-value is calculated as the ratio “MSR/MSE”, where MSR (the “model mean square”) is the sum of the squared deviations of each group’s mean from the grand mean, weighted by group size and divided by df1, and MSE (the “error mean square”) is the sum of the squared deviations of each score from its own group’s mean, divided by df2. In other words, it’s how much variance in the dependent variable is associated with the conditions you’re applying to the situation, divided by how much variance you find that isn’t related to those conditions. A high number suggests that the dependent variable is responding more to those variable conditions that you’re testing (the independent), than to all those unrelated, and unknown factors, that make the relationship less than perfect. It’s called an F value because it’s named after Ronald Fisher, the statistician and geneticist who developed the analysis of variance.

The p value is basically the likelihood that an F value at least this large could have occurred at random. It’s usually expressed as “<” if the test is statistically significant (conventionally < .05, with stricter reports such as < .005 or < .0005), and “=” if it is not.
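To make the MSR/MSE ratio concrete, here is a small Python sketch of a one-way, between-groups ANOVA worked by hand. It assumes each list holds scores from a separate group (not the repeated-measures design in the ANOVA example earlier in this glossary); the function name `one_way_anova` and the ratings are hypothetical. It stops at the F value and the two degrees of freedom, since turning F into a p value requires a lookup in the F distribution.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df1, df2) for a one-way, between-groups ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n_total = len(groups), len(all_scores)

    # Between-groups sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: distance of each score from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df1, df2 = k - 1, n_total - k
    msr = ss_between / df1  # "model mean square"
    mse = ss_within / df2   # "error mean square"
    return msr / mse, df1, df2

# Made-up ratings for three treatments, three scores each.
ratings = [[2, 3, 4], [4, 5, 6], [6, 7, 8]]
f_value, df1, df2 = one_way_anova(ratings)
print(f"F({df1}, {df2}) = {f_value:.2f}")  # prints "F(2, 6) = 12.00"
```

Here the group means (3, 5, 7) are spread widely relative to the scatter inside each group, which is why the ratio comes out well above 1.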

independent variable: a variable feature of experience that can usually be manipulated reliably and specifically, with the purpose of observing how the world (and especially “dependent variables”) is affected by its changes. Examples from seminar discussion: the presence vs. absence of a cue to narrow the “reporting” task for 12 letters in a brief visual presentation. Imagined conditions that serve as evolutionary pressure for early hominids to favor tactile, gestural, aural, and aural-rhythmic bonding.

induction: usually, the process of using particular facts or observations to infer, or guess, general principles; the use of commonalities among a set of phenomena to imagine what might next occur. Contrary to popular belief, induction is not “reasoning from the particular to the general”—what distinguishes induction is rather its tentative, hypothesis-oriented nature. You get a hunch from a sequence of experiences and you reason from that what is probable. “Bias” is also a form of induction, and in these forms, induction is viewed skeptically by thinkers ranging from Hume to Popper.

information-encapsulation: a mental mechanism’s relative independence, and separation, from other mechanisms; its ability to work quickly/efficiently in a narrow domain of function, as a result of its not being impacted by information not directly associated with that function.

innate constraint:  a quality of a mental mechanism being constrained, innately, to its function.

level: (1) in reference to experimental design, a level (more often described as a “treatment”) is a point of reference on an independent variable; a point to which an independent variable is set, from which its relationship to other variables is observed.
(2) in reference to musical structure, a level is usually one of many ways of dividing the time of a composition or performance into meaningful complementary (and often, but not necessarily, similar-lengthed) sections. The term is normally used under a presumption of hierarchical levels, nesting types of division within each other recursively, so that a “top” level (or a “deep” level) contains sections that are few, and large, governing the whole form of the piece, while middle and bottom levels (levels closer to the “surface”) describe progressively more detailed and “local” information. However, heterarchical levels might consist of different aspects of a work wherein the sections at one level overlap with those of another level; in other words, divisions of one level may occur independently of another.


mean: the standard calculation of an average, the sum of collected values divided by the number of measurements. In a set 1, 2, 3, 3, 4, 5, 7, 8, and 11, the mean is 44/9 = 4.89. 

median: in a collection of values sorted by size, the middle value, with as many measurements greater than it as lesser than it. In collections with an even number of values, the median is the mean of the two values nearest the mid-point of distributed values. In a set 1, 2, 3, 3, 4, 5, 7, 8, and 11, the median is 4. (If there were an extra “3” in the set, the median would be 3.5.)

mode: in a collection of values, the value which occurs most frequently. In a set 1, 2, 3, 3, 4, 5, 7, 8, and 11, the mode is 3. When more than one value occurs at the highest frequency (i.e. if there were a “5” added to the previous set), we refer to the set as “multi-modal.”
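The three examples above (mean, median, and mode) can be checked with Python’s standard `statistics` module. This is just a sketch using the same nine values; the variable names are my own.

```python
from statistics import mean, median, multimode

values = [1, 2, 3, 3, 4, 5, 7, 8, 11]

average = mean(values)        # 44 / 9, about 4.89
middle = median(values)       # the fifth of nine sorted values
modes = multimode(values)     # list of the most frequent value(s)

# The two "what if" cases mentioned in the definitions above.
median_with_extra_3 = median(values + [3])   # ten values: mean of the two middle ones
modes_with_extra_5 = multimode(values + [5]) # 3 and 5 now tie

print(round(average, 2), middle, modes)         # 4.89 4 [3]
print(median_with_extra_3, modes_with_extra_5)  # 3.5 [3, 5]
```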

modularity of mind: the belief that innate constraint, domain specificity, information encapsulation, and brain localization are tightly related and associated with one another, in a variety of mental mechanisms, and that we can take “demonstration of one quality in a given system, such as brain localization…as evidence for the other qualities, such as innate constraint.”

standard deviation (SD or “st dev”): in a collection of measurements of equivalent phenomena, a measure of the typical distance of the measurements from their mean, calculated as the square root of the mean of the squared deviations from the mean. High standard deviations indicate a widely varying collection; low standard deviations indicate relative consistency.
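As a rough illustration, the calculation can be spelled out in Python for the same nine values used in the “mean” entry. The sketch shows two conventions: dividing by n (treating the values as a whole population) and dividing by n - 1 (treating them as a sample, which is what most statistics software reports by default). Which one applies is an assumption you make about your data.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

values = [1, 2, 3, 3, 4, 5, 7, 8, 11]
m = mean(values)

# Squared distance of every value from the mean.
squared_devs = [(x - m) ** 2 for x in values]

population_sd = sqrt(sum(squared_devs) / len(values))    # divide by n
sample_sd = sqrt(sum(squared_devs) / (len(values) - 1))  # divide by n - 1

# The standard library's pstdev and stdev give the same two answers.
print(round(population_sd, 2), round(pstdev(values), 2))  # about 3.03
print(round(sample_sd, 2), round(stdev(values), 2))       # about 3.22
```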

standard error (SE or “st err”): a measurement of the accuracy with which a sample represents a population: the degree to which a sample mean (the mean within a single sample) is expected to deviate from the actual mean of a population. The smaller the standard error, the more representative the sample will be of the overall population. Standard errors are inversely proportional to the square root of the sample size; the larger the sample, the closer the sample’s “mean” will likely be to the mean of the whole population.
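A short Python sketch of the standard error of a mean, assuming the usual formula (sample standard deviation divided by the square root of the sample size). The helper names `standard_error` and `se_from_sd` are hypothetical. The second helper shows the square-root relationship: with the same spread, quadrupling the sample only halves the standard error.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

def se_from_sd(sd, n):
    """Same formula, starting from an already-known standard deviation."""
    return sd / sqrt(n)

values = [1, 2, 3, 3, 4, 5, 7, 8, 11]
se = standard_error(values)
print(round(se, 2))  # about 1.07

# Same spread (SD of 10), different sample sizes.
se_25 = se_from_sd(10, 25)    # 2.0
se_100 = se_from_sd(10, 100)  # 1.0
```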

treatment: a point of reference on an independent variable; a point to which an independent variable is set, from which its relationship to other variables is observed. See also “level.”

p-value (“p <”, “p =”): a test of the likelihood that results would occur under random conditions. Traditionally, scientists accept a correlation or an association of independent and dependent variables when p < .05; stricter tests require < .01, < .001, or smaller. When p is expressed as “<”, statistical significance is usually being claimed; otherwise it is expressed as “=.”

Pearson correlation coefficient (r): a measure of the degree to which one value predicts another in a linear continuum. (How consistently adding or subtracting from one value will change the other.) When r is 1, two values predict each other perfectly in positive correlation, when r is -1, the correlation is perfectly negative (adding to one value always corresponds to a reduction in the other).
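Here is a minimal Python sketch of how r can be computed from paired measurements, using the standard formula (the summed products of paired deviations, divided by the square root of the product of the two sums of squared deviations). The function name `pearson_r` and the tempo and rating numbers are invented for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    # How the two variables move together around their means.
    co_deviation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return co_deviation / sqrt(ss_x * ss_y)

# Hypothetical data: tempo in beats per minute and two sets of ratings.
tempo = [60, 80, 100, 120]
ratings_rising = [1, 2, 3, 4]   # rises in step with tempo
ratings_falling = [4, 3, 2, 1]  # falls in step with tempo

r_up = pearson_r(tempo, ratings_rising)     # perfectly positive
r_down = pearson_r(tempo, ratings_falling)  # perfectly negative
```

Real data almost never land exactly on 1 or -1; values closer to 0 indicate that a straight line predicts one variable from the other poorly.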

perception: the process of taking up, and processing, sensory information. Perception is distinguished from cognition with terms like “lower-order,” referring to neural uses of sensory information that are relatively automatic and minimally subject to conscious influence. Visual perception includes everything from the intake of light by rods and cones, to the inversion of an image by the lens as it is projected onto the retina, and the re-inversion of that image within the brain, and probably a number of judgments about the borders of objects, their relative distances, and perhaps even their stability as constructs in the visual field. Auditory perception includes binaural processing to locate objects’ directions in relation to the listener, judgments of pitch height, and stream-segregations (for example, the ability to separate words between sentences spoken simultaneously by different speakers) that organize sounds into likely causes and continua.

priming: any effect in which one stimulus influences responses to a later stimulus. In psychology, this often refers to the way a stimulus “cues” certain kinds of attention at the expense of others; for example, when listeners hear two sentences simultaneously from different speakers, they might be less likely to comprehend one of them if the other includes an utterance of the listener’s own name (Broadbent 1958, discussed in Barry Arons, “A Review of the Cocktail Party Effect”, MIT Media Lab, date unlisted). Snowden, Wichter, and Gray (2008) showed that male or female images differently “prime” male subjects’ rating of a word’s attractiveness or aesthetic appeal, depending upon their sexuality.

r = : See “Pearson correlation coefficient.”

t-test, Tukey HSD test, Student’s t-test: These three types of tests are similar; all are significance tests used alongside “analyses of variance” (see ANOVA); roughly, they are designed to compare pairs of means, and determine the likelihood that the difference between them arose at random. The most common test, the t-test, is expressed in formats like “t(17) = 5.4, p < .005”, indicating:

- 17 degrees of freedom (the size of your sample, minus 1).

- A t value of 5.4 (this is calculated in varying ways depending on the type of test involved, but it is, in general, the ratio of the difference between means to the standard error of that difference; further from zero is better!) … and

- A p-value (lower than .05—i.e. lower than 1/20th of a bell-curve—is usually required, this tests the likelihood the result would occur randomly)
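The three pieces of a report like “t(17) = 5.4” can be sketched in Python for the simplest case, a one-sample (or paired-difference) t-test, where the degrees of freedom are the sample size minus 1. The function name `one_sample_t` and the difference scores are hypothetical, and the sketch stops short of the p value, which requires a lookup in the t distribution.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, expected_mean=0.0):
    """Return (t, df): the sample mean's distance from expected_mean, in standard errors."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    t = (mean(sample) - expected_mean) / standard_error
    return t, n - 1

# Made-up difference scores: each participant's rating in condition A minus condition B.
differences = [2, 4, 3, 5, 1, 3]
t_value, df = one_sample_t(differences)
print(f"t({df}) = {t_value:.2f}")  # prints "t(5) = 5.20"
```

The mean difference here is 3 and its standard error is about 0.58, so the mean sits a little over five standard errors away from zero.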

type I error: the assertion of an effect (the impact of an independent variable on a dependent one) when none is present.

type II error: the assertion of no effect (in relation to any given variable, i.e. a confounding or independent variable), when one is present.
