Glossary of Empirical Musicology

This is an in-progress glossary of terminology that I hope might be useful for those reading in music perception and cognition, but who are unacquainted with conventions of reasoning and description in the behavioral and mind sciences. Many terms from basic statistics are described, but (caution!) I am not an experienced statistician. I welcome feedback.

If you’re just getting started, my definitions for “p-value,” “Pearson coefficient,” and “t-test” might be a good place to begin. You’ll need to have a basic sense of these for this week’s readings.

Let me know if you encounter anything unclear, or any additional terms you’d like defined here.


ANOVA, analysis of variance: an analysis of a data set that determines whether the average of a dependent variable differs across two or more levels (treatments) of one or more independent variables, and how likely differences that large would be to occur by chance. For example, if 6 individual participants are asked to rate the intensity of 5 stimuli that differ in a variable hypothetically associated with intensity, an ANOVA determines whether differences in that variable account for the participants’ ratings more strongly than the idiosyncratic patterns of individual participants do. You can calculate an analysis of variance using Vassar’s statistics analysis site (VassarStats); beginners might like to start with my oversimplification of how to use it.

bottom-up analysis: as opposed to “top-down” approaches, any approach beginning with specimens, constituents, and individual manifestations of structure, gathering short-term or local relational data before inferring or hypothesizing features of a larger system or structure. For example: studying note-to-note relationships before generalizing about phrases, and about relationships between themes in the middleground or background. This can be confusing, because the “bottom” is sometimes also referred to as the “surface,” and the “top” in this metaphor is sometimes considered a kind of “depth.”

brain localization: the quality of a mental mechanism being organized in “specific, localizable brain regions.”

cognition: the organization and processing of information by the brain. Cognition is often distinguished loosely from perception with terms like “higher-order,” referring to neural activity of which we’re at least potentially aware, and potentially able to influence. Cognition is at least a little remote from sensory activity; in the classics of the discipline of cognitive science, perception is understood as cognition’s input; action as its output.

Cohen’s d value: a measure of effect size, calculated as the difference between two sample means divided by the “pooled standard deviation”: a combination of the two groups’ standard deviations in which each group is weighted according to the number of values representing it (that is, each treatment or level of the independent variable).
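To make the pooled standard deviation concrete, here is a minimal Python sketch of Cohen’s d for two independent groups. The function name `cohens_d` and the ratings are hypothetical illustrations, not data from any study discussed here. It assumes the common pooled-SD formula, in which each group’s variance is weighted by its size minus 1.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d: difference between two sample means divided by the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # Sample variances (n - 1 in the denominator)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Pooled SD: each group's variance weighted by its degrees of freedom (n - 1)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical confidence ratings (1-5) under two treatments
loud = [4, 5, 4, 3, 5]
soft = [2, 3, 3, 2, 4]
print(round(cohens_d(loud, soft), 2))  # prints 1.67
```

Swapping the two groups only flips the sign of d; its size stays the same.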

confounding variable: a variable feature of experience that changes along with the independent variable and might also affect the dependent variable, so that its influence is difficult to distinguish from the independent variable’s. Confounding variables threaten the status of one variable (see esp. “independent variable”) as a basis for predicting another (see esp. “dependent variable”).

correlation: an observable relationship between the states of two variables (often a dependent and an independent variable) that links them to one another reliably. Strong correlation indicates that a change in one variable is closely matched by a change in the other. Negative correlation indicates that an increase in one parameter matches a decrease in the other. Correlation alone does not establish causation: both variables might, for example, depend on a third (see “confounding variable”).

deduction: reasoning from general premises, usually assumed from prior observation and study, to particular conclusions.

degrees of freedom: the number of values in a calculation that are free to vary once certain quantities (such as a mean) have been fixed; for a single sample, this is the number of scores minus 1. Degrees of freedom are reported with t-tests and F-tests because the reference distribution used to find the p-value depends on them; with more degrees of freedom (usually, more data), a test has more “power” to detect an effect. A t-test reports one degrees-of-freedom value; an F-test for ANOVA reports two: the first is the number of groups (samples) being compared, minus 1, and the second is the total number of scores, minus the number of groups.

dependent variable: a variable feature of experience that can be observed repeatedly, and whose variation we hope to explain. We assign the term “dependent variable” when our hypothesis predicts its dependency on other measurable factors. Examples from seminar discussion: what (how much) information is immediately available to a viewer after a brief visual presentation? Less obviously: what types of bonding mechanisms (tactile, visual-gestural, vocal, rhythmic) are favored in early hominids? Although the second example is more complex than the first, both are open questions that can be refined: add the phrase “under ___ conditions?” to either of them, and you have an opportunity to observe the variable change with the conditions on which it depends.


domain, cognitive domain (cf. “domain-specificity”): “a category of knowledge and behavior as we explicitly conceptualize it, and not the cognitive and neural resources upon which this category relies.” (Justus & Hutsler 2005, 2)

domain-specificity: a quality of a mental mechanism being “specific to processing only one kind of information,” (Justus & Hutsler 2005, 2)—information in only one (cognitive) “domain.”

effect: the relationship between variables; to describe an “effect” is to state what kind of change in one variable is produced by a change in another.

effect size: the strength or extent to which one variable affects another. A popular measure of effect size is Cohen’s d value. Effect size is a crucial measurement in meta-analysis, for example in the comparison of studies of similar but not identical designs, especially when a variable is included in one study but excluded from another. Two experiments may achieve similarly strong (or weak) correlation measures (showing that the variance in one parameter closely or remotely matches variance in another) without making clear how much impact one variable has on the other; effect-size measurements are designed to clarify that issue. Unlike p-values, effect sizes do not shrink or grow simply because a sample gets larger or has more degrees of freedom (although larger samples estimate them more precisely).

F-value: the test statistic of an ANOVA, used to judge whether the differences among treatment means are larger than chance variation alone would produce. ANOVA is generally reported as follows: F(df1, df2) = ___, p < ___. Df1 is the degrees of freedom (see definition in this glossary) between groups (number of groups or samples, minus 1), and df2 is the degrees of freedom within groups (total number of scores, minus the number of groups).

The F-value is calculated as the ratio “MSR/MSE.” MSR (the “model,” treatment, or between-groups mean square) is the sum of the squared deviations of each group’s mean from the grand mean, each weighted by the number of scores in that group, divided by df1; MSE (the “error” or within-groups mean square) is the sum of the squared deviations of each score from its own group’s mean, divided by df2. In other words, it’s how much variance you find between the conditions you’re applying to the situation, divided by how much variance you find that isn’t related to those conditions. A high number suggests that the dependent variable is responding more to the conditions you’re testing (the independent variable) than to all the unrelated, unknown factors that make the relationship less than perfect. It’s called an F value in honor of Ronald Fisher, the statistician and geneticist who developed analysis of variance.

The p value is the probability of obtaining an F value at least this large if the independent variable actually had no effect. It’s often expressed with “<” if the test is statistically significant (usually < .05, < .01, or < .001), and with “=” if it is not.
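Here’s a minimal sketch, in Python, of where MSR, MSE, and the two degrees of freedom come from in a one-way between-subjects ANOVA. The `one_way_anova` helper and the three groups of ratings are made up for illustration; a real analysis would more likely use a statistics package or a site like VassarStats.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    k, n_total = len(groups), len(all_scores)
    # Between-groups (treatment) sum of squares: group means vs. the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups (error) sum of squares: each score vs. its own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between  # "MSR", the treatment mean square
    ms_within = ss_within / df_within     # "MSE", the error mean square
    return ms_between / ms_within, df_between, df_within


# Hypothetical confidence ratings for three conditions, four stimuli each
groups = [[2, 3, 2, 3], [3, 4, 3, 4], [4, 5, 5, 4]]
f, df1, df2 = one_way_anova(groups)
print(f"F({df1},{df2}) = {f:.2f}")  # prints F(2,9) = 12.00
```

If SciPy happens to be installed, `scipy.stats.f_oneway(*groups)` should report the same F, along with a p-value taken from the F distribution.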

independent variable: a variable feature of experience that can usually be manipulated reliably and specifically, with the purpose of observing how the world (and especially “dependent variables”) is affected by its changes. Examples from seminar discussion: the presence vs. absence of a cue to narrow the “reporting” task for 12 letters in a brief visual presentation; imagined conditions that serve as evolutionary pressure for early hominids to favor tactile, gestural, aural, and aural-rhythmic bonding.

induction: usually, the process of using particular facts or observations to infer, or guess, general principles; the use of commonalities among a set of phenomena to imagine what might next occur. Contrary to popular belief, induction is not “reasoning from the particular to the general”—what distinguishes induction is rather its tentative, hypothesis-oriented nature. You get a hunch from a sequence of experiences and you reason from that what is probable. “Bias” is also a form of induction, and in these forms, induction is viewed skeptically by thinkers ranging from Hume to Popper.

information-encapsulation: a mental mechanism’s relative independence, and separation, from other mechanisms; its ability to work quickly/efficiently in a narrow domain of function, as a result of its not being impacted by information not directly associated with that function.

innate constraint: the quality of a mental mechanism being constrained, innately, to its function.

level: (1) in reference to experimental design, a level (more often described as a “treatment”) is a point of reference on an independent variable; a point to which an independent variable is set, from which its relationship to other variables is observed.
(2) in reference to musical structure, a level is usually one of many ways of dividing the time of a composition or performance into meaningful complementary (and often, but not necessarily, similar-length) sections. The term is normally used under a presumption of hierarchical levels, nesting types of division within each other recursively, so that a “top” level (or a “deep” level) contains sections that are few, and large, governing the whole form of the piece, while middle and bottom levels (levels closer to the “surface”) describe progressively more detailed and “local” information. However, heterarchical levels might consist of different aspects of a work wherein the sections at one level overlap with those of another level; in other words, divisions of one level may occur independently of another.


mean: the standard calculation of an average, the sum of collected values divided by the number of measurements. In a set 1, 2, 3, 3, 4, 5, 7, 8, and 11, the mean is 44/9 = 4.89. 

median: in a collection of values, the middle value once the values are sorted, so that as many measurements fall above it as below it. In a collection with an even number of values, the median is the mean of the two middle values. In a set 1, 2, 3, 3, 4, 5, 7, 8, and 11, the median is 4. (If there were an extra “3” in the set, the median would be 3.5.)

mode: in a collection of values, the value which occurs most frequently. In a set 1, 2, 3, 3, 4, 5, 7, 8, and 11, the mode is 3. When more than one value occurs at the highest frequency (i.e. if there were a “5” added to the previous set), we refer to the set as “multi-modal.”
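Python’s standard `statistics` module can check the mean, median, and mode examples above, using the same set of values (a quick sketch for self-checking, not part of any analysis procedure described here):

```python
import statistics

values = [1, 2, 3, 3, 4, 5, 7, 8, 11]

print(round(statistics.mean(values), 2))   # 4.89 (44 / 9)
print(statistics.median(values))           # 4
print(statistics.multimode(values))        # [3]

# A second "3" makes the set even-sized: the median is the mean of the two middle values
print(statistics.median(values + [3]))     # 3.5
# Adding a "5" ties 3 and 5 for most frequent, so the set is multimodal
print(statistics.multimode(values + [5]))  # [3, 5]
```

`statistics.multimode` (Python 3.8 and later) returns every value tied for the highest frequency, which is why it is used here instead of `statistics.mode`.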

modularity of mind: the belief that innate constraint, domain specificity, information encapsulation, and brain localization are tightly related and associated with one another, in a variety of mental mechanisms, and that we can take “demonstration of one quality in a given system, such as brain localization…as evidence for the other qualities, such as innate constraint.”

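standard deviation (SD or “st dev”): in a collection of measurements of equivalent phenomena, a measure of how far the measurements typically fall from their mean: the square root of the average squared distance from the mean. (When estimating from a sample, the squared distances are summed and divided by n - 1 rather than n.) High standard deviations indicate a widely varying collection; low standard deviations indicate relative invariance and consistency.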
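A short Python sketch of the calculation, on the same set of values used for mean, median, and mode. It shows both the population version (divide by n) and the sample version (divide by n - 1), and checks each against the standard library’s `pstdev` and `stdev`:

```python
import math
import statistics

values = [1, 2, 3, 3, 4, 5, 7, 8, 11]
m = statistics.mean(values)

# Square each deviation from the mean, average the squares, then take the square root
population_sd = math.sqrt(sum((x - m) ** 2 for x in values) / len(values))
# The sample SD divides by n - 1 (the degrees of freedom) instead of n
sample_sd = math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

print(round(population_sd, 2), round(statistics.pstdev(values), 2))  # 3.03 3.03
print(round(sample_sd, 2), round(statistics.stdev(values), 2))       # 3.22 3.22
```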

standard error (SE or “st err”): a measurement of the accuracy with which a sample mean represents the mean of a population: roughly, how far the mean within a single sample is likely to deviate from the actual mean of the population. It is usually estimated as the sample’s standard deviation divided by the square root of the sample size. The smaller the standard error, the more precisely the sample mean estimates the population mean. Standard errors are inversely proportional to the square root of the sample size: the larger the sample, the closer the sample’s mean will likely be to the mean of the whole population (quadrupling the sample halves the standard error).
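A minimal sketch of the usual estimate, SE = SD / √n, in Python. `standard_error` is a hypothetical helper name, and the sample SD (`statistics.stdev`) is assumed as the estimate of the population SD:

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


values = [1, 2, 3, 3, 4, 5, 7, 8, 11]
print(round(standard_error(values), 2))  # 1.07

# Holding the SD fixed, quadrupling n (9 -> 36) halves the standard error
sd = statistics.stdev(values)
se_9, se_36 = sd / math.sqrt(9), sd / math.sqrt(36)
print(round(se_9, 3), round(se_36, 3))
```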

treatment: a point of reference on an independent variable; a point to which an independent variable is set, from which its relationship to other variables is observed. See also “level.”

p-value (“p <”, “p =”): the probability of obtaining results at least as extreme as those observed if there were no real effect (that is, under random conditions alone). Traditionally, scientists accept a correlation or an association of independent and dependent variables as statistically significant when p < .05; stricter tests require p < .01, p < .001, or smaller. When p is expressed with “<”, statistical significance is usually being claimed; otherwise it is expressed with “=”.
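The definition can be made concrete with an exact permutation test, one way of computing a p-value directly from data rather than from a t or F table. This is a swapped-in technique for illustration (the t-tests and ANOVAs discussed here rely on theoretical distributions instead), and the `permutation_p_value` helper and both groups of ratings are hypothetical:

```python
import itertools
import statistics


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference between two means.

    The p-value is the share of all possible relabelings of the pooled scores
    whose mean difference is at least as extreme as the observed one.
    """
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    extreme = total = 0
    for idx in itertools.combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(statistics.mean(a) - statistics.mean(b))
        total += 1
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    return extreme / total


loud = [4, 5, 4, 3, 5]
soft = [2, 3, 3, 2, 4]
print(round(permutation_p_value(loud, soft), 3))  # 0.079
```

Here 20 of the 252 possible relabelings produce a difference at least as large as the observed one, so p ≈ .079: not significant at the conventional .05 level, even though the two means look fairly different.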

Pearson correlation coefficient (r): a measure of the strength and direction of the linear relationship between two variables: how consistently adding to or subtracting from one value is matched by a proportional change in the other. When r is 1, two values predict each other perfectly in positive correlation; when r is -1, the correlation is perfectly negative (adding to one value always corresponds to a proportional reduction in the other); when r is 0, there is no linear relationship at all.
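A minimal Python sketch of Pearson’s r, computed as the covariance of the two variables divided by the product of their standard deviations (the n - 1 terms cancel, so plain sums of squares are used). The `pearson_r` helper and the data are hypothetical:

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson's r: covariance of x and y divided by the product of their SDs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


distance = [0.8, 2.5, 4.2, 0.8, 2.5, 4.2]    # hypothetical loudness distances (phons)
confidence = [2.1, 3.0, 4.4, 2.5, 3.2, 4.0]  # hypothetical mean confidence ratings
print(round(pearson_r(distance, confidence), 2))
print(round(pearson_r([1, 2, 3], [6, 4, 2]), 6))  # perfectly negative: -1.0
```

On Python 3.10 and later, `statistics.correlation(xs, ys)` computes the same coefficient.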

perception: the process of taking up, and processing, sensory information. Perception is distinguished from cognition with terms like “lower-order,” referring to neural uses of sensory information that are relatively automatic and minimally subject to conscious influence. Visual perception includes everything from the intake of light by rods and cones (onto which the eye’s lens projects an inverted image), to the brain’s construction of an upright visual scene, and probably a number of judgments about the borders of objects, their relative distances, and perhaps even their stability as constructs in the visual field. Auditory perception includes binaural processing to locate objects’ directions in relation to the listener, judgments of pitch height, and stream segregation (for example, the ability to follow the words of one sentence while another speaker talks at the same time), which organizes sounds into likely causes and continua.

priming: any effect in which one stimulus influences responses to a later stimulus. In psychology, this often refers to the way a stimulus “cues” certain kinds of attention at the expense of others; for example, when listeners hear two sentences simultaneously from different speakers, they might be less likely to comprehend one of them if the other includes the listener’s own name (Broadbent 1958, discussed in Barry Arons, “A Review of the Cocktail Party Effect,” MIT Media Lab, date unlisted). Snowden, Wichter, and Gray (2008) showed that male or female images differently “prime” male subjects’ ratings of a word’s attractiveness or aesthetic appeal, depending upon their sexuality.

r = : See “Pearson correlation coefficient.”

t-test (Student’s t-test), Tukey HSD test: significance tests that compare means. A t-test compares two means (two treatments, or one sample against a fixed value); Tukey’s HSD (“honestly significant difference”) test is typically run after a significant ANOVA (see ANOVA) to compare every possible pair of treatment means while keeping the overall chance of a false positive under control. The most common of these, the t-test, is reported in formats like “t(17) = 5.4, p < .005”, indicating:

- 17 degrees of freedom (for a one-sample or paired test, the number of scores minus 1; for two independent groups, the total number of scores minus 2).

- A t value of 5.4 (calculated in slightly different ways depending on the type of test, but in general the difference between means divided by the standard error of that difference: the “signal” relative to the “noise,” so a larger absolute value is stronger evidence), and

- A p-value (a value below .05, i.e. a result in the outermost 1/20th of the bell curve that would occur less than 1 time in 20 if there were no real difference, is usually required; it estimates how likely a result this extreme would be by chance alone).
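To show where the t value and its degrees of freedom come from, here is a minimal Python sketch of an independent-samples Student’s t-test (equal variances assumed). The `students_t` helper and the ratings are hypothetical. Turning t into a p-value needs the t distribution, which a statistics package (for instance SciPy’s `scipy.stats.ttest_ind`) or a site like VassarStats supplies; that step is left out here.

```python
import math
import statistics


def students_t(group_a, group_b):
    """Independent-samples Student's t (equal variances assumed) and its df."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    # Standard error of the difference between the two means
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se_diff
    return t, df


loud = [4, 5, 4, 3, 5]
soft = [2, 3, 3, 2, 4]
t, df = students_t(loud, soft)
print(f"t({df}) = {t:.2f}")  # prints t(8) = 2.65
```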

type I error: the assertion of an effect (the impact of an independent variable on a dependent one) when none is present; a “false positive.”

type II error: the assertion of no effect (in relation to any given variable, e.g. a confounding or independent variable) when one is present; a “false negative.”


Calculating and reporting analysis of variance

The goal of an ANOVA is to be able to report that you compared how the different levels (treatments) of an independent variable affected a dependent variable, under specific conditions. You’ll end up stating your statistical evidence in this way:

“A one-way between-subjects ANOVA was conducted to compare the effect of (Ind. Variable)______________ on (Dependent Variable)_______________ in ______ (list the conditions).”

E.g. “A one-way ANOVA was conducted to compare the effect of average streaming distance (in phons) between two unpulsed streams segregated by loudness, on participants’ confidence in hearing a pulsed high or low stream segregated by pitch. The conditions were .8 phons, 2.5 phons, and 4.2 phons [F(1,20) = 25.65, p < .0001].”

I constructed that assertion from the following steps. My data consisted of a list of the loudness distances of 12 stimuli (12 rows, each one with the number .8, the number 2.5, or the number 4.2 in it), and 12 “average participant responses” (12 rows, each with an average of the 5 participants’ subjective 1-5 rankings of confidence).

Eager to understand the significance of these numbers, I navigated to http://vassarstats.net, and sure enough, their server was online. (I think it helps to keep your fingers crossed, though I can’t demonstrate that statistically.)

1. Preparation: I selected ANOVA in the left column, entered “2” in “Number of samples…”, selected “Independent Samples,” and chose a “Weighted” analysis. (These are the most common procedures; I’ll refer you to literature on their distinctions, and special cases, in our individual meetings, if these become important.)

2. Data: Then I entered the list of loudness distances in the column marked “Sample 1,” and the list of “average participant responses” in the column marked “Sample 2.” (These are the two columns whose variance you hope will be correlated.) Then I clicked “calculate.”

To report the analysis:

1. I stated my “degrees of freedom” in parentheses after the enigmatic letter “F.” The first degree is 1 (number of samples, minus 1); the second is 20 (the number of scores in all samples, minus the number of samples), so it reads “F(1,20).”

2. I stated the “F” value (F = the ratio of the between-groups, or treatment, mean square to the within-groups, or error, mean square^1), which is listed under “F” in the “Treatment” row: 25.65.

3. The “p” value (< .0001: the probability that an arrangement of numbers like this could have arisen by chance if the treatment had no effect). Vassar shows it as a “less than” statement if it’s low enough to be considered “significant,” or as an “equals” statement if it’s not. That’s the value that counts!
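The degrees-of-freedom bookkeeping in step 1 and the formatting in steps 2 and 3 can be sketched in Python. This assumes a one-way between-subjects design; the helper names, the group sizes, and the F and p values are hypothetical placeholders, not output from VassarStats.

```python
def anova_df(group_sizes):
    """Degrees of freedom for a one-way between-subjects ANOVA."""
    k = len(group_sizes)        # number of samples (treatments)
    n_total = sum(group_sizes)  # number of scores in all samples
    return k - 1, n_total - k   # (between groups, within groups)


def format_p(p, floor=0.0001):
    """Report very small p-values as an inequality, others as an exact value."""
    if p < floor:
        return "p < " + f"{floor:g}".lstrip("0")
    return "p = " + f"{p:.3f}".lstrip("0")


def format_anova(group_sizes, f_value, p_value):
    """Build an 'F(df1,df2) = ..., p ...' fragment for a results sentence."""
    df1, df2 = anova_df(group_sizes)
    return f"F({df1},{df2}) = {f_value:.2f}, {format_p(p_value)}"


# Hypothetical design: 3 conditions with 4 stimuli each. F and p are placeholders
# standing in for what a calculator such as VassarStats would report.
report = format_anova([4, 4, 4], f_value=12.0, p_value=0.0029)
print(report)  # F(2,9) = 12.00, p = .003
```

The leading zero is stripped from p because, by the usual reporting convention, a probability cannot exceed 1.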

Having trouble with any terms on this page? Visit my glossary! <http://benleedscarson.com/empiricism-neurosis/>


^1. The F value is MSR/MSE. MSR (the “model,” treatment, or between-groups mean square) is the sum of the squared deviations of each group’s mean from the grand mean, each weighted by the number of scores in that group, divided by the number of groups minus 1; MSE (the “error” or within-groups mean square) is the sum of the squared deviations of each score from its own group’s mean, divided by the total number of scores minus the number of groups. In other words, it’s how much variance you find between the conditions you’re applying to the situation, divided by how much variance you find that isn’t related to those conditions. A high number suggests that the dependent variable is responding more to the conditions you’re testing (the independent variable) than to all the unrelated, unknown factors that make the relationship less than perfect. It’s called an F value in honor of Ronald Fisher, the statistician and geneticist who developed analysis of variance.


Empiricism, radical

Empiricism is, roughly, a belief that experience and observation are the most reliable sources of knowledge. (Normally this is contrasted with “rationalism”…) It might also be described as the tendency in Western philosophy to recognize the limits of a priori beliefs, theories, or assumptions in understanding our present experience: a kind of humility about how we already think about things, a resistance to (but not a rejection of!) established categorical ways of thinking or comprehending. When we say we are doing an “empirical” study of x, it means that we resist theory as a basis for understanding x and instead embark on observation. The observation/theory distinction actually plays a more complex role in empirical philosophy, but it can be simplified by saying that for empiricists, the production of knowledge is a process, maybe even a kind of dance, between these two types of mental activity.

Observations consist of limited perspectives, but in empirical philosophy they are never “flawed” per se, because it’s in the very nature of an observation to be temporary and contingent.^1 As long as we recognize that a perspective is bounded, an observation has stability as such: you can dispute what it means, and what actually happened, but it doesn’t make much sense to dispute whether or not an observation occurred. This is important because, however limited an observation might be, it can be construed as part of a series of observations that form a pattern, and that pattern might be a useful source of knowledge about other things. A ‘pattern’ of observations is just as bounded as an observation, always referring not to what is, but to “what we experience when ___, and if ____.” These patterns don’t have to lead to a profound truth in order to provide a broad basis for prediction. More importantly, when a series of observations shows some stability across a range of perspectives (even if that stability consists in understanding the role those perspectives play), the broadening range becomes a broadening basis for theory. Scientific methods are systematic ways of expanding the utility of observation through the controlled comparison of observations repeated across time and space. Theories that arise from those observations not only contribute to our worldview; they help us both to stay humble about our limited perceptions as individuals (for example, we learn that the sun doesn’t revolve around us) and to challenge authority where it might force a kind of knowledge (for example, sexism, or internalized self-oppression) that limits us as a culture or as a species.

Theories are attempts to explain such a series of observations in a way that lets them all relate to some common explanation or premise. By connecting a large number of experiences to a small number of ideas, a theory economizes our thoughts about those experiences. Theory, in that sense, functions in Lacan’s symbolic domain: theory individuates a complex reality with a simpler series of statements; it replaces something chaotic with something invocative and orderly, drawing sharper or bolder lines in the world that, while artificial and sometimes simplistic, sometimes give us half a chance at seeing more of it. (And sometimes those lines need to be broken when clarity requires a new view.) Theory doesn’t just have that definition in science; it applies colloquially as well. When Robert Mapplethorpe says “my theory about creativity is that the more money one has, the more creative one can be,” he is representing creativity by establishing a common idea (money facilitating freedom) as a stand-in (a standing-for) for a more elusive one (creativity).

The question left wide open by the empirical dance of theory and observation (even in some presumably “empirical” scientific discourse) is the degree to which the status and organization of the knowledge left behind, so to speak, in the wake of the method, is itself empirical. (Again, we know that the facts were gained empirically, but what of their status? Their organization?) Say we are trying to determine some differences between women’s brains and men’s brains, and we observe that women tend to have denser, more abundant networks of fibers connecting the brain hemispheres. We also observe that patients whose connecting fibers have been severed have difficulty recognizing emotions in the faces of others. Then we theorize that women are more sensitive to the emotional determinants of facial expressions. If we can confirm the theory in a series of observations, and we’re careful in recording the circumstances of those observations, then the theory becomes a kind of knowledge.

But suppose instead that we find women tend to fall at either end of the spectrum of sensitivity, while men cluster at a moderate level, and that the ends of the spectrum are populated by men only in rare cases, where those men are even more extreme. Then we return to observation, a little dissatisfied with what seems like a messy relationship between gender and sensitivity. Say we notice that the location of women on the spectrum is sometimes radically affected by whether the face is a stranger’s or a relative’s, and sometimes not, but men never produce that effect. And then, in one group of women, there is a much higher standard deviation, except where the faces are elderly men or androgynous. But the results of the Boston experiment can’t be duplicated in Denmark.

Is this knowledge? Certainly—at the very least, we can say that we have knowledge that the situation is complex or ambiguous. However, this isn’t the kind of knowledge (yet) that qualifies colloquially as “scientific knowledge”: the paper isn’t ready for publication, or if it is published, the journalist does not feel that the scientist is ready to be interviewed, or if the journalist finds it newsworthy, the managing editors do not find room for it this week, or if they do, perhaps the news-consuming public tunes out the story.

“Knowledge” is very different from “experience” or “reality.” Like “theory,” knowledge is a semiotic entity (Gordon Wells, “Semiotic mediation, dialogue, and the construction of knowledge,” Human Development 50 (2007): 244–274) consisting of memes that stand in for a chaotic abundance of experience and reality. What determines the signifier? When does a scientist declare the emergence of knowledge? No matter how empirical her process, the declaration of success in the process is an aesthetic declaration, is it not? It reflects a decision that experience has become something both more than experience and less than it, something worth holding, something worth situating next to the next chaotic bit of experience you’ll have, in hopes of improving your experience of it. The “aesthetics” to which I’m referring here isn’t merely a problem at the level of journalists and magazines and the public who read them; it begins with a scientist’s sense of time, and becoming, in the construction of questions and their meanings.



William James thought through these issues in the early 1900s, and recognized that sound science requires not only what I am calling a dance between observation and theory, but also a willingness to reject the aesthetics, or the narrativity, of “clarity” in knowledge, as it is spiritually and culturally defined, and to replace that aesthetics with an embrace of ambiguity and the inherent fluctuation of its subjectivity, its becoming. That embrace, a key element of his most famous thesis on knowledge, “Essays in Radical Empiricism” (New York: Longmans, Green and Co., 1912)^2, requires us to give the same status of “reality” to the relations between things as to things themselves. If physical objects are real, and shapes and letters are real, then the ways that shapes and signs refer to other things are also real, and so are the processes and developments of our narratives about those references.

Implicitly or not, the best scientists today have embraced James’ outlook, not presuming that clarity = grand truth, & ambiguity = ask again, something must be wrong. When I speak casually with a neurobiologist studying sex differences in the cerebral cortex, and ask her about her research, she normally does not say “I am learning about the mental essence of man and woman”. Instead, she’ll say something like “I am testing whether female cats with elevated levels of potassium enjoy hunting more than their male counterparts.” And if you ask her what she has learned, she will always say “well the cats I have studied exhibit…”, always being certain to profess the limitedness of observation, never saying that what she has learned is the truth in any broad sense.

However, this project, what turns out to be quite humble at its core, doesn’t begin or end that way. A sense of knowledge in the inquiry, and a sense of choice about the nature of question and answer, unfolds simultaneously on different planes, and in relation to different criteria. The lab she is working in discovers something useful, the utility is put forward in a grant application, the institution produces a name for itself and projects it to a journalist. The empirical ideal doesn’t disappear, but at some point it has to make a transition from the base/systemic labor of methodology, to the superstructural domain of utility or meaning…the dance of observation has to be cut, into a shape, something that looks like knowledge (or doesn’t look like it), before we can interrupt one barrage of tests and begin another, searching once again for that aesthetic. Flash forward six months, and the scientist finds herself “humanized”, she is inspiring us about the wonders of knowledge, musing the implications of what she has found, she is laughing approvingly at a joke about why women like to shop. This mediating process does not float freely, or independently, apart from scientific inquiry; it is an integrated part of scientific method.


The legacy of empiricism, defined radically, also extends from Bergson’s Matter and Memory (partially read in DANM 201), which in turn informs the familiar Deleuzian/Guattarian concepts of the “Nomadic” and the “Rhizome,” developed in A Thousand Plateaus: Capitalism and Schizophrenia. Rhizomes.net’s resource on rhizomes and becoming is excellent, and is more practical for our purposes than direct engagement of the whole Deleuzian project.




^1.  For example, “I saw the bird fly west” is, for an empiricist, only incorrect if it is a lie. I may have seen something that didn’t happen, I may have misidentified a bug as a bird, I can mistake jumping for flying, and I can mistake west for east. But regardless of what ‘actually’ happened when those mistakes were made, the observation itself (if not a lie) is irrefutable; that is what I saw. (One can debate the semantic point here: “did you see it, or only think that you saw it?” But for our purposes here, seeing is always tantamount to “thinking that you see.” For empiricists, there is no perfect certainty about the senses, except one: we can be certain that we sensed.)

^2. That link is to a PDF of my own design; the full text (James, William. Essays in Radical Empiricism. New York: Longmans, Green and Co., 1912) is fun too.



This is too much of a digression from 206D to make a big deal out of—but on the subject of empirical reasoning, and what might be “essentially” or “universally” human, consider this bit of spiral-shaped scientific rancor around the life of a little dog and the words he knows.

In relation to the debate narrative you find in these very short basic articles (and their supplementary materials), consider one important question that hovers in the background, but isn’t fully stated:

If dogs aren’t quite like humans, what are humans like? 

In other words: how, exactly, do we get to describing ourselves right out of the limits of dogs (or dolphins, or parrots, or non-human apes)? This is an open question… don’t read the phrase “how do we get to” as “how dare we” or “isn’t it silly that we.” I want to know more literally how we do it, and what kind of sense it makes or doesn’t make.

You will need access as a UCSC student to the library’s portal, in order to read these articles.

1. Word Learning in a Domestic Dog: Evidence for “Fast Mapping” Juliane Kaminski, Josep Call, Julia Fischer. Science 11 June 2004: Vol. 304 no. 5677 pp. 1605-1606, in html text or pdf.

2. Can a Dog Learn a Word? Paul Bloom. Science 11 June 2004: Vol. 304 no. 5677 pp. 1605-1606, in html text or pdf. Here is the supplemental video for this article.

3. Differential Sensitivity to Human Communication in Dogs, Wolves, and Human Infants. József Topál, György Gergely, Ágnes Erdőhegy, Gergely Csibra, and Ádám Miklósi. Science 4 September 2009: Vol. 325 no. 5945 pp. 1269-1272. There are also six videos that supplement the article, available here under the headings “Movies” (S1 to S6).