Reward and punishment operate through two very different pathways in the brain. The general idea is that these two types of learning, positive and negative, work through different types of dopamine receptors. The D1 receptors (D1Rs) are generally ‘positive’ receptors, while the D2 receptors (D2Rs) are ‘negative’. Specifically, D1Rs tend to increase the concentration of cAMP and D2Rs decrease it, so they have opposite effects on downstream processes such as receptor plasticity, voltage-gated ion channels, etc.
How are the D1 and D2 pathways distinct in terms of learning? The hypothesis has been that, among striatal projection neurons, D1R-expressing direct-pathway medium spiny neurons (dMSNs) mediate reinforcement and D2R-expressing indirect-pathway medium spiny neurons (iMSNs) mediate punishment. Kravitz et al. expressed channelrhodopsin selectively in either dMSNs or iMSNs so they could use light to activate only one type of neuron at a time. They figured that the striatum would be a good place to start looking for the effects of these neurons. After all, it is a primary site of reinforcement and action selection (also, they probably tried a few other places and didn’t get great results…?). These transgenic mice were then placed in a box with two triggers, one of which switched on the light stimulation while the other did nothing. So the mice are in this box, able to turn their own neurons on and off if they want to. I wonder how that feels?
When the mice were able to activate their D1R (reinforcing) neurons, they were much more likely to keep pressing the stimulation trigger. The mice stimulating their D2R (punishing) neurons were more likely to press the other trigger. But that’s not all! By the third day, the effects of activating the D2R pathway had worn off: the mice no longer cared about the stimulation. You can see this on the graph to the left, where 50% is chance. The preference for the D1R pathway persisted, however. Even on short time scales of 15 to 30 seconds, the reinforcing effect of D1R stimulation outlasted the aversive effect of D2R stimulation. In the figure to the right, YFP is the control (it should have no effect): activating the dMSN pathway differs from YFP throughout the first 30 seconds, whereas activating the iMSN pathway only shows a statistically significant difference over the first 15 seconds.
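To make the "50% is chance" idea concrete, here is a minimal sketch of how a trigger preference could be scored. It is only an illustration: the function names are made up for this post, the counts in the usage note are invented, and the exact binomial test is my own choice of check against chance, not the statistical analysis the authors used.

```python
from math import comb


def preference_percent(paired_presses, unpaired_presses):
    """Percent of all trigger presses made on the stimulation-paired trigger.

    50% means the mouse pressed both triggers equally often (chance).
    """
    total = paired_presses + unpaired_presses
    if total == 0:
        raise ValueError("no trigger presses recorded")
    return 100.0 * paired_presses / total


def binomial_two_sided_p(successes, trials, p=0.5):
    """Exact two-sided binomial p-value against a chance level p.

    Sums the probability of every outcome that is no more likely
    than the observed one (an assumption of this sketch, not the
    test reported in the paper).
    """
    probs = [
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = probs[successes]
    # Small tolerance so ties survive floating-point rounding.
    tail = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, tail)
```

With invented counts, `preference_percent(30, 10)` gives 75.0 (a preference for the stimulation trigger), `preference_percent(10, 30)` gives 25.0 (avoidance), and `binomial_two_sided_p(30, 40)` asks how surprising 30 out of 40 presses would be if the mouse really had no preference.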
The authors conclude that the dMSN pathway is sufficient for persistent reinforcement, while the iMSN pathway is sufficient for transient punishment. This is a nice finding: the D1R pathway really is doing positive reinforcement, the D2R pathway is doing punishment, and one is more effective in the long term than the other. Remember this when raising your kids!
Kravitz AV, Tye LD, & Kreitzer AC (2012). Distinct roles for direct and indirect pathway striatal neurons in reinforcement. Nature Neuroscience. PMID: 22544310