You bad, bad systems neuroscientists

A recent paper in Nature Neuroscience suggests that systems neuroscience has nearly as big a problem with statistics as the fMRI field used to. The abstract:

In neuroscience, experimental designs in which multiple observations are collected from a single research object (for example, multiple neurons from one animal) are common: 53% of 314 reviewed papers from five renowned journals included this type of data. These so-called ‘nested designs’ yield data that cannot be considered to be independent, and so violate the independency assumption of conventional statistical methods such as the t test. Ignoring this dependency results in a probability of incorrectly concluding that an effect is statistically significant that is far higher (up to 80%) than the nominal α level (usually set at 5%). We discuss the factors affecting the type I error rate and the statistical power in nested data, methods that accommodate dependency between observations and ways to determine the optimal study design when data are nested. Notably, optimization of experimental designs nearly always concerns collection of more truly independent observations, rather than more observations from one research object.

I would write something on this but there’s nothing I would say that Tal Yarkoni hasn’t already:

I don’t have any objection to the advocacy for hierarchical models; that much seems perfectly reasonable. If you have nested data, where each subject (or petri dish or animal or whatever) provides multiple samples, it’s sensible to try to account for as many systematic sources of variance as you can. That point may have been made many times before, but it never hurts to make it again.

What I do find surprising though–and frankly, have a hard time believing–is the idea that 53% of neuroscience articles are at serious risk of Type I error inflation because they fail to account for nesting. This seems to me to be what the abstract implies, yet it’s a much stronger claim that doesn’t actually follow just from the observation that virtually no studies that have reported nested data have used hierarchical models for analysis. What it also requires is for all of those studies that use “conventional” (i.e., non-hierarchical) analyses to have actively ignored the nesting structure and treated repeated measurements as if they in fact came from entirely different subjects or clusters.
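To get a feel for how bad the inflation in the abstract can be, here is a minimal simulation sketch. It is not the paper's analysis: the group sizes, the intraclass correlation `icc`, and the function names are all my own assumptions, and it uses a normal approximation to the t test. There is no true effect, so every "significant" result is a false positive.

```python
import random
from statistics import NormalDist, fmean


def simulate_group(rng, n_animals, n_neurons, icc):
    """Draw neurons nested in animals; icc is the share of variance between animals."""
    observations = []
    for _ in range(n_animals):
        animal_effect = rng.gauss(0.0, icc ** 0.5)
        for _ in range(n_neurons):
            observations.append(animal_effect + rng.gauss(0.0, (1.0 - icc) ** 0.5))
    return observations


def sample_variance(values):
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def naive_false_positive_rate(icc, n_animals=5, n_neurons=20,
                              n_sims=1000, alpha=0.05, seed=0):
    """Fraction of null experiments called significant when every neuron
    is (wrongly) treated as an independent observation."""
    rng = random.Random(seed)
    # Normal approximation to the two-sample t test (fine with 100 neurons per group).
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    false_positives = 0
    for _ in range(n_sims):
        a = simulate_group(rng, n_animals, n_neurons, icc)
        b = simulate_group(rng, n_animals, n_neurons, icc)
        standard_error = (sample_variance(a) / len(a) + sample_variance(b) / len(b)) ** 0.5
        z = (fmean(a) - fmean(b)) / standard_error
        if abs(z) > z_crit:
            false_positives += 1
    return false_positives / n_sims


if __name__ == "__main__":
    for icc in (0.0, 0.1, 0.5):
        print(icc, naive_false_positive_rate(icc))
```

With `icc=0.0` the neurons really are independent and the rate should sit near the nominal 5%; as `icc` grows, the pooled test gets more and more overconfident. Note that this only simulates the case Yarkoni is skeptical about, where the nesting is ignored outright.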

 

Genetically programmed, but with options

I have a guest post at NeuWriteSD:

Australia has been having a problem with discarded beer bottles. It turns out that the Australian Jewel Beetle finds these bottles so attractive that they will mate with them until they die from dehydration. The bottles, so fine with their seductive golden shading and arousingly dimpled bottoms, are the ultimate beetle aphrodisiac. So gorgeous are these bottles that the Jewel Beetles have been known to continue mating even while being eaten alive by ants. Unfortunately for them, they have been genetically adapted to a different environment than the one they currently find themselves in, and their genetic programming has been hijacked. What they once found so deeply enticing about their fellow beetles, they now find in empty beer bottles instead.

Much of our behavior – not just insect behavior but our behavior – is driven by our ‘genetic programming’. Take a look at the list of things that are ‘cultural universals’. Language, color, family and friend groups, toys, weapons, play, song, dance. These are all things that we do no matter who we are. These are some of the things it means to be genetically human.

Somehow, most of my links disappeared, though! The most important are of the Australian Jewel Beetle mating with the beer bottle:

They are not the only species to be in an ‘evolutionary trap’; this happens in nature all the time:

Australian wasps have long been known to fall prey to a native orchid’s seductive ways.

“Orchids are really famous for their wide variety of evolutionary traps,” says Dr Anne Gaskett, a biologist at The University of Auckland, in New Zealand. “They have a diverse range of ways of fooling insects into acting as pollinators without having to give them a reward.”

Australian tongue orchids (Cryptostylis), for example, have evolved to mimic the scents and appearance of female wasps (Lissopimpla excelsa), in order to trick male wasps into spreading their pollen.

And here is the list of cultural universals. Remember that even though these are things that are seen in every culture, they need not be biological! There is tons of cultural transmission, even to so-called ‘uncontacted tribes’.

 

Unrelated to all that, 3/28 edition

Why academics should blog: it’s not just about overqualified science journalists

The uncomfortable project; things that work but are a pain in the ass to use

Fun fact: squirrels hibernate so hard that you can juggle them

Is French the language of the future? If you add up the population of the Francophone countries, by 2050 it will have more speakers than any other language in the world. (Of course the answer is: no.)

Paul Erdős has published 108 papers since he died. You’re all slackers.

Microscope photos of butterfly wings.

A list of the top 5 papers in bioinformatics, according to someone with a blog

Half of all edits to Wikipedia are made by bots.

Bright lights, big data, and the digital humanities.

Some notes on scent:

Real musk is obtained from the dried gland of a wild male deer, and fresh from the wild it has a repulsive smell. Harvesting musk involves hanging out in forests during mating season with a high-powered rifle, shooting a medium-large mammal, cutting out a small gland, and then taking and preparing that gland for a luxury product. Hunters can’t tell whether the deer in their sights is a buck or a doe, and consequently half of the animals they shoot are the wrong sex. That’s one reason for synthetic musks. Another is that real musk is a mélange of chemicals that will cause some wearers to break out in hives. The first synthetic variant—Musk Xylol—was produced in 1888, and as the industry bible by Steffen Arctander notes, it is a close relative of trinitrotoluene, or TNT. Attempts to synthesize Musk Xylol in commercial quantities have been responsible for serious explosions in fragrance laboratories, and the death of more than a few fragrancers.

Penguins do battle with their natural predators:

Ants are using our cocoa trees as their own personal farms, and that means less chocolate for us (via Jennifer Welsh)

Seminars by the numbers: let’s just code common questions in order to save time

My buddy Seth Kadish made a graphic of how gridded different cities are (more here):

Neuroecology programming note

Posting will likely be lighter than normal over the next month or two. I am writing my dissertation, working on multiple manuscripts, visiting family in India and moving to Philadelphia. Busy! And all this writing leaves me feeling drained and uncreative; it is hard to come up with the energy or ideas required for the blog to keep up its regular output.

Incidentally, if anyone who is reading this is a professor at UPenn or Princeton (or somewhere within driving distance of Philly), please send me an email, I am hunting for postdocs 😉

Fractal organization in MMOs

One of my favorite pet topics is using MMOs (online games) to understand questions of social structure and economics. Benedikt Fuchs looked at social structure in the game Pardus:

But exactly what kinds of structures form and to what extent these groupings depend on the environment is still the subject of much debate. So an interesting question is whether humans form the same kinds of structures in online worlds as they do in real life…

…[they] have studied the groups humans form when playing a massive multiplayer online game called Pardus. Their conclusion is that humans naturally form into a fractal-like hierarchy in which people belong to a variety of groups on different scales. In fact, the formation of hierarchies seems to be an innate part of the human condition.

They find groups that are progressively larger: (1) individuals, (2) close friends, (3) other friends, (4) ‘alliances’, (5) communication, and (6) everyone. These aren’t necessarily subsets of each other, either (ie, friends may be outside of alliances). They claim that these groups exhibit a fractal structure that is seen in other human societies, though I think that just means that each group is progressively larger…

Incidentally, I had never heard of Pardus before but now I desperately want to play… if only I had free time.

Reference

Fuchs B, Sornette D, & Thurner S (2014). Fractal multi-level organisation of human groups in a virtual world arXiv

Explaining the structure of science through information theory

tl;dr I propose a half-baked and totally wrong theory of the philosophy of science based on information theory that explains why some fields are more data oriented, and some more theory oriented

Which is better: data or theory? Here’s a better question: why do some people do theory, and some people analyze data? Or rather, why do some fields tend to have a large set of theorists, and some don’t?

If we look at any individual scientist, we could reasonably say that they are trying to understand the world as well as possible. We could describe this in information theory terms: they are trying to maximize the information they have about their description of the world, when given some set of data. One way to think about information is that it reduces uncertainty. In other words, when given a set of data we want to reduce our uncertainty about our description of the world as much as possible. When you have no information about something, you are totally uncertain about it. You know nothing! But the more information you have, the less uncertain you are. How do we do that?

Thanks to Shannon, we have an equation that tells us how much information two things share. In other words, how much will knowing one thing tell us about the other:

I(problem; data) = H(problem) – H(problem | data)

This tells you (in bits!) how much more certain you will be about a problem or our description of the world if you get some set of data.

H(problem) is the entropy function; it tells us how many different possibilities we have of describing this problem. Is there only one way? Many possible ways? Similarly, H(problem | data) is how many possible ways we have of describing the problem if we’ve seen some data. If we see data, and there are still tons of possibilities, the data has not told us much; we won’t have much information about the problem. But if the data is so precise that for each set of data we know exactly how to describe the problem, then we will have a lot of information!

This tells us that there are two ways to maximize our information about our problem if we have a set of data. We can either increase our set of descriptions of the problem or we can decrease how many possible ways there are to describe the problem when we see data.
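If it helps to see the equation with numbers in it, here is a small sketch that computes I(problem; data) = H(problem) – H(problem | data) from a joint probability table. The two toy tables (`precise` and `useless`) and the function names are made up for illustration.

```python
from collections import defaultdict
from math import log2


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


def mutual_information(joint):
    """joint maps (problem, data) pairs to probabilities that sum to 1."""
    p_problem = defaultdict(float)
    p_data = defaultdict(float)
    for (problem, data), p in joint.items():
        p_problem[problem] += p
        p_data[data] += p

    h_problem = entropy(p_problem.values())

    # H(problem | data) = sum over d of p(d) * H(problem | data = d)
    h_problem_given_data = 0.0
    for d, p_d in p_data.items():
        if p_d == 0:
            continue
        conditional = [p / p_d for (_, data), p in joint.items() if data == d]
        h_problem_given_data += p_d * entropy(conditional)

    return h_problem - h_problem_given_data


# Each data set pins down the description exactly: a full bit of information.
precise = {("A", "a"): 0.5, ("B", "b"): 0.5}
# The data says nothing about which description is right: zero bits.
useless = {("A", "a"): 0.25, ("A", "b"): 0.25, ("B", "a"): 0.25, ("B", "b"): 0.25}

if __name__ == "__main__":
    print(mutual_information(precise), mutual_information(useless))
```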

In a high-quality, data-rich world we can mostly get away with the second one: the data isn’t noisy, and will tell us what it represents. Information can simply be maximized by collecting more data. But what happens when the data is really noisy? Collecting more data gives us a smaller marginal improvement in information than working on the set of descriptions – modeling and theory.
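Here is a toy version of the noisy-data case, under assumptions that are entirely mine: the "problem" is a single fair coin flip, and each data point is a copy of it that gets flipped with probability `flip_prob`. The sketch computes how many bits `n_obs` such observations carry about the problem.

```python
from math import comb, log2


def binary_entropy(p):
    """Entropy in bits of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def information_from_observations(n_obs, flip_prob):
    """I(problem; data) in bits for a fair binary problem seen through
    n_obs independent copies, each flipped with probability flip_prob."""
    h_problem_given_data = 0.0
    for k in range(n_obs + 1):
        # Probability that k of the copies read 1, under each value of the problem.
        p_k_if_one = comb(n_obs, k) * (1.0 - flip_prob) ** k * flip_prob ** (n_obs - k)
        p_k_if_zero = comb(n_obs, k) * flip_prob ** k * (1.0 - flip_prob) ** (n_obs - k)
        p_k = 0.5 * (p_k_if_one + p_k_if_zero)
        if p_k > 0:
            posterior_one = 0.5 * p_k_if_one / p_k
            h_problem_given_data += p_k * binary_entropy(posterior_one)
    # H(problem) is exactly 1 bit for a fair coin.
    return 1.0 - h_problem_given_data


if __name__ == "__main__":
    for flip_prob in (0.01, 0.4):
        print(flip_prob, [round(information_from_observations(n, flip_prob), 3)
                          for n in (1, 2, 5, 10)])
```

With clean data (`flip_prob=0.01`) a single observation already gets you most of the bit. With noisy data (`flip_prob=0.4`) each observation is worth only a sliver, which is the regime where I'm claiming that working on the descriptions pays off more.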

This explains why some fields have more theory than others. One of the hopes of Big Data is that it will reduce the noise in the data, shifting fields to focusing on the H(problem|data) part of the equation. On the other hand, the data in economics, honestly, kind of sucks. It’s dirty and noisy and we can’t even agree on what it’s telling us. Hence, marginal improvements come by creating new theory!

Look at the history of physics; for a long time, physics was getting high-quality data and had a lot of experimentalists. Since, oh, the 50s or so it’s been getting progressively harder to get new, good data. Hence, theorists!

Biology, too, has such an explosion of data it’s hard to know what to do with it. If you put your mind to it, it’s surprisingly easy to get good data that tells you about something.

In short: theory proposes possible alternatives for how the world could work, which sets H(problem); data limits H(problem | data). The problem is that the data itself is noisy.

Humans can discriminate a trillion smells – wait, what?

To me, the idea that you can smell a trillion smells is somehow baffling. What does that even mean? To talk it through to myself, I wrote a story about it on Medium which somehow partially metastasized into a history of the classification of smells:

In What the Nose Knows, Avery Gilbert describes the history of the people who have tried to force an order onto smells. It begins, somehow predictably, with the godfather of scientific systemization, Linnaeus. The man who had brought us the taxonomic classification for animals, with names such as Felis catus and Caenorhabditis elegans, attempted to do the same for odors. He decided there must be a discrete set of classes of odors, which include fragrant, spicy, musky, garlicky, goaty, foul, and nauseating. This was later refined by Hendrik Zwaardemaker, who added the classes ethereal and empyreumatic, as well as adding subclasses for each class. Next came Hans Henning, who decided that smells lie on an odor prism. Each vertex was a specific quality of odor (flowery, foul, fruity, spicy, burnt, and resinous) and the distance of a smell from each vertex was the relative contribution of that quality to the odor. This gave odors a space and direction and possibly even dynamics from one point to another. But it also didn’t work.

I think the key to understanding the concept of ‘a trillion smells’ is that there are so many basis elements; there are only three primary colors but there are many, many primary odorants.
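A back-of-the-envelope sketch of why the number of basis elements matters so much. This is not how the study arrived at its number; it assumes, purely for illustration, that each primary odorant is either present or absent in a mixture, and it counts mixtures rather than smells anyone can actually tell apart.

```python
def count_mixtures(n_primaries):
    """Number of non-empty on/off combinations of n_primaries basis odorants."""
    return 2 ** n_primaries - 1


def primaries_needed(target):
    """Smallest number of on/off primaries whose mixtures reach the target count."""
    n = 0
    while count_mixtures(n) < target:
        n += 1
    return n


if __name__ == "__main__":
    print(count_mixtures(3))           # three on/off primaries
    print(primaries_needed(10 ** 12))  # how many it takes to pass a trillion
```

Three on/off primaries give only a handful of combinations, while a few dozen already blow past a trillion, so "a trillion" stops sounding so crazy once there are many primaries to combine.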

Go read the whole thing.

The Hierarch of Cosyne

Attention warning: I appear to be in the list-making and ranking mood, lately. This list is probably not 100% accurate. This is from every Cosyne except 2012 & 2013 (seriously, get the list of posters up in a non-PDF form, I ain’t scraping that.)

I thought that a good way to get a handle on who is active in the computational neuroscience community would be to see who presents the most posters at Cosyne. Presumably, the more active you are, the more posters that you will have. There are obvious biases here: bigger labs will have more posters, international researchers have a harder time making it to Cosyne, and some people (eg Terry Sejnowski) just aren’t interested in showing up. So take this for what it is.


This year the winner of the ‘Most Posters’ award (aka, The Hierarch of Cosyne) was Wulfram Gerstner with 6 posters, followed by Jonathan Pillow, Tatyana Sharpee, and Maneesh Sahani with 5.

Historically, the number of posters follows a power law (with obeisance given to Cosma Shalizi, noting this is probably not a power law and I’m too lazy to test it.)
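For the less lazy, a first step toward checking would be a maximum-likelihood estimate of the exponent. The sketch below uses the standard continuous-data estimator, alpha = 1 + n / sum(ln(x_i / x_min)), and tries it on synthetic data with a known exponent rather than on the poster counts. Poster counts are discrete and `x_min` has to be chosen somehow, so treat this as an illustration, not a test; a real test would also compare against alternatives such as a log-normal.

```python
import math
import random


def sample_power_law(rng, alpha, x_min, n):
    """Inverse-transform samples from a continuous power law
    p(x) proportional to x**(-alpha) for x >= x_min."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_power_law_exponent(values, x_min):
    """Maximum-likelihood exponent for the tail of the data at or above x_min."""
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    rng = random.Random(0)
    synthetic = sample_power_law(rng, alpha=2.5, x_min=1.0, n=5000)
    print(fit_power_law_exponent(synthetic, x_min=1.0))
```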

Here is the ranking of most Cosyne posters aka the “Pope of Cosyne” award – remember that I’m unfortunately omitting 2012-2013:

(32) Liam Paninski

(22) Maneesh Sahani

(18) Jonathan Pillow

(16) Wei Ji Ma

(15) Paul Schrater, Markus Meister

(14) Masato Okada, Peter Dayan

(13) Wulfram Gerstner, Vijay Balasubramanian, Mate Lengyel, Zach Mainen, Alexandre Pouget, Krishna Shenoy

I was going to make a connectivity diagram but I realized I have no idea how! If anyone has a tool that is easy to use, let me know.
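One low-tech possibility, sketched here with placeholder names rather than the real poster lists: count how often each pair of authors shares a poster, then write the result out in Graphviz's DOT format, which a tool like `dot` can turn into a picture. The `posters` list and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(posters):
    """Count how many posters each pair of authors shares."""
    edges = Counter()
    for authors in posters:
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


def to_dot(edges):
    """Render the co-authorship counts as an undirected Graphviz graph."""
    lines = ["graph cosyne {"]
    for (a, b), weight in sorted(edges.items()):
        lines.append(f'  "{a}" -- "{b}" [label="{weight}"];')
    lines.append("}")
    return "\n".join(lines)


# Placeholder data: one author list per poster.
posters = [
    ["Author A", "Author B"],
    ["Author A", "Author B", "Author C"],
    ["Author C", "Author D"],
]

if __name__ == "__main__":
    print(to_dot(coauthor_edges(posters)))
```

Saving the output to a file and running something like `dot -Tpng cosyne.dot -o cosyne.png` should give a first rough diagram, assuming Graphviz is installed.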

(Incidentally, the most common last name was ‘Wang’ followed by ‘Paninski’)

What’s the use of econophysics?

Mark Buchanan has the answers:

1. More than anything, physicists have helped to establish empirical facts about financial markets; for example, that the probability of large market movements (up or down) decreases in accordance with an inverse cubic power law in many diverse markets…

4. Work in econophysics — through the study of minimal models such as the minority game — has also revealed surprising qualitative features of markets; for example, that a key determinant of market dynamics is the diversity of participants’ strategic behaviour…

7. On a similar theme, fundamental analysis by physicists has examined the relationship between market efficiency and stability.

Go read about 8 success stories of econophysics. Whenever I delve into the econophysics literature, it mostly seems…pretty boring and bad. But there’s some good stuff out there! You just have to find it.

Purkinje and his cell

A hundred thousand hourglasses – on the Purkinje cell:

If a mid-19th century European—a Prussian, let’s say—wanted to contact famed Czech histologist Jan Evangelista Purkinje, he only needed to address his envelope with two words: Purkinje, Europe; so large was Purkinje’s renown, that his dwelling was an entire continent…

Born in 1787 to a housewife and a German priest, Purkinje was raised in Bohemia (now Czech Republic) and graduated in 1818 with a degree in medicine. He was soon appointed as a Professor of Physiology at Prague’s Charles University where he taught and conducted research on human anatomy. In addition to discovering Purkinje images (reflections of objects from structures of the eye) and the Purkinje shift (the change in the intensity of red and blue colors as light intensity ebbs at nightfall) he also proposed the scientific term for plasma, the colorless fluid part of blood, lymph, or milk, in which corpuscles or fat globules are suspended. Today, his name also adorns a university in Ústí nad Labem, Czech Republic; a crater on the Moon; and a small asteroid (#3701), but he lives on—commemorated best, I like to think—as an elegant cerebellar cell.

That I did not know about Purkinje! Go read this beautiful essay on the Purkinje cell.