Three stories about the rise of data science programs

Data science programs of various types seem to be proliferating, and everyone seems to have a different notion of what, precisely, “data science” is. I’ve been watching this trend for years mostly as someone looking to maybe enter that world. Recent efforts to create a data science minor at Middlebury got me thinking about it again as a quasi-disinterested observer of college/university decision-making. I’ve come up with three stories on why data science programs are emerging at colleges and universities. These stories aren’t exhaustive or mutually exclusive, but I think they capture the essence of what I’m seeing.

1. Data science as an expression of the needs/fruits of capital/technology

I think this is what most people point to when making the case for a new data science program. Moore’s Law, the proliferation of open-source software, massively scalable cloud + distributed edge computing, and the explosion of data associated with these forces all create strong demand for a class of workers with a (somewhat) unique blend of skills. They need to be able to process data and do some analysis, create digestible summaries (often for executives or other decision-makers), and ideally do it all in a web-ready format using some amount of cloud capacity. But they don’t need to be statisticians or computational experts—those roles (still) often demand enough specialized knowledge to require more-advanced training. These pressures lead to data science programs that tend to have a lower level of statistical sophistication than pre-existing data-centric disciplines, like an econometrics focus or a stats minor. In exchange they can deliver more data acquisition and analytics communication skills. I think this view is pretty compatible with a “data science” major/minor, albeit one which may be mostly a convex combination of existing programs. There’s a non-trivial overlap between this type of data science and “business analytics” programs.

I think existing quantitative (i.e. not “stocks for jocks”) economics and business programs are probably the closest thing to this type of data science program. They tend to feature a statistical sequence emphasizing a blend of theory, applications, and interpretation (mostly the latter two); a theory sequence focused on the relevant modeling foundations (e.g. logics for considering certain variables in a model, domain-specific patterns to look for to ensure model validity); and applied courses focused on topic-specific issues (e.g. environmental issues in an environmental econ course). They lack a focus on developing digestible analytics products for managers or investors (more true of economics programs than business programs), on programmatically grabbing data from the web or specific servers (I think true for both, though maybe more for economics programs), and on working as part of a larger data generation/collection-storage-analysis pipeline (maybe true for both). To the extent that this story is leading to data science programs, I would expect them not to emerge where there are already business analytics programs; in those places I might expect an existing business analytics program to morph into a data science program (at least partly for the branding). Where there is a strong economics department and/or business school, I would predict involvement from those faculty and courses.

2. Data science as a category for grouping scientific activities

This story holds that the rise of data science reflects the need for a semantic grouping of a class of activities that aren’t necessarily new, but are perhaps becoming more common. The terms “natural” or “social” science describe topics that a scientist studies, and the term “lab science” describes a methodology for conducting science. “Data science”, then, is a term for describing a different class of scientific methodologies. Loosely, in this story, “natural science:social science::lab science:data science”. These pressures lead to data science programs that codify and legitimize existing practices and activities, bringing them under a common umbrella. This can be valuable, especially if you believe that cool things happen when people with compatible skills/interests in different disciplines share a roof, e.g. cool new interdisciplinary work. But it’s not a set of pressures that really tends to motivate fundamental changes in how disciplines work—the change has already happened and the program’s rise reflects that, and/or the new “data science-y” variant of the field finds itself a bit on the fringes of its own discipline. In a sense, the emergence of this type of data science is the inverse of the process that led to the formation of political science: a grouping of disparate work under a common umbrella without a clear agenda (DS), rather than a fracturing of an established field into a new one focused on a specific agenda (PS). I’ve seen some research, like Jessica Hullman’s work, that I think could be classified as “data science first” rather than just “data science by way of another discipline”. But such research seems more the exception than the rule. I don’t think there’s much of an agreed-upon “data science” research agenda or many field-specific institutions guiding agendas yet.

This story came to me from reflecting on a colleague’s statement, “don’t all sciences use data?” Yes, surely, and how is that reflected in our language? We call some things “lab sciences”. Google tells me Oxford defines “laboratory” as “a room or building equipped for scientific experiments, research, or teaching, or for the manufacture of drugs or chemicals.” The drugs and chemicals bit can’t be essential to the definition, else we wouldn’t really consider many “labs” in physics or biology as “labs”. The bit about “equipped for scientific experiments, research, or teaching” could apply to nearly any natural or social science. Even economists run experiments! Even sociologists teach! Do we then consider economics or sociology as “lab sciences”? Not usually. We reserve the term mostly for “natural sciences”, where we view experiments as a key part of methodology. “Data science” provides a term for fields or subfields built around data analysis rather than data generation. In this view, a “data scientist” is like a “lab technician”: an individual trained in any of a set of fields, skilled in particular methodologies, and not necessarily wedded to a single topic. In this view, a program offering a “data science” major/minor is a bit odd—would we consider a “lab science” major/minor, training students in how to use laboratory equipment at large?—but a “data science” job classification or working group is perfectly logical and even useful.

3. Data science as an expression of shifting power relations between factions

This story views the emergence of “data science” programs as an expression of some programs growing/evolving/colonizing/dying/being colonized by data-centric philosophies. From what I’ve seen, math, stats, and computer science seem to be typical initiators of these processes, e.g. the rise of computational linguistics. This doesn’t have to be an active takeover—there are often real synergies to these collaborations. But fundamentally in this story the emergence of a data science program is about specific factions within a college or university accumulating institutional power.

This take came to me from something Miles Kimball once said, something along the lines of, “you can tell how much a field feels it’s exhausted the low-hanging fruit in its core problem area by how much it tries to branch out into other areas.” Clearly there are many fascinating problems left in math/stats/CS. But even if there are low-hanging fruits, a program unable to attract the talent and resources necessary to chase those fruits may find itself forced to branch out, e.g. a CS department that cannot attract/retain leading scholars but is nevertheless active and seeking to do exciting work. I’ll call this “Kimball’s necessary condition for field branching”. (If it were a sufficient condition I would expect to see more small/under-resourced physics departments initiating data science programs.)

I think the other necessary condition here is a pursuit of institutional power. Programs which train relatively many students and can obtain consistent and unrestricted funding streams (independent of existing streams controlled by an advancements or alumni relations office) have the potential to accumulate institutional power—primarily resources and influence within/beyond the college or university. In places with programs subject to Kimball’s necessary condition, this story can generate data science programs where the first two stories may not—particularly if other conditions for power accumulation (e.g. rising relative enrollments, greater external prestige) are met.

How does one identify a data science program primarily oriented around gathering institutional power? One marker, I think, is rejection of existing power centers within the institution, particularly those which are data-centric but not subject to Kimball’s necessary condition. Such groups pose a threat to a power-oriented data science program. If let into the fold, they can use their existing power to claim a large share of the rents generated by a data science program. Another is a lack of clear connection to story #1. A data science program which doesn’t emphasize specific skill packages and outcomes (or lacks a credible plan to deliver them) seems to be less based in story #1 than one which does. Note that power accumulation is completely compatible with story #2. Indeed, groups which already “do data science” but are not very powerful in their institutions may face a strong incentive to agglomerate and form a program that gives them an umbrella and some clout.

Reflections

In general I think stories #1 and #3 can be sufficient conditions for a data science program to emerge. I don’t think story #2 can be—maybe for a data science working group or something, but not for a full-fledged program, at least not without surplus resources that another faction within a college or university hasn’t claimed. Story #2 just doesn’t generate the potential for rents that (I think) is necessary to motivate and strengthen costly efforts to claim additional resources. Story #1 generates rents to college or university stakeholders primarily from outside entities, e.g. greater absolute enrollments or wealthier alumni who donate more. Story #3 generates rents, but a non-trivial share may be transfers from existing data-centric fields towards the data science program initiators. New rents at the college/university level are generated if and only if the data science program is able to induce otherwise non-data-oriented students to enroll and improve their future earnings or social standing. If there are already data-centric power centers in the college or university, then most such students may already be claimed. A minor, then, may make more sense than a major—a low-commitment way to get students in seats and signal strength to other factions.

Story #1 is arguably consistent with any college or university’s mission, though it may be better received at some (e.g. technical colleges with a compatible focus, or universities focusing on expanding educational access to historically-underserved populations) than others (e.g. elite liberal arts colleges with high post-grad employment rates). Story #2 may or may not be mission-consistent, but provided the resource utilization is relatively low, it’s unlikely to be harmful. Story #3 is a bit trickier. It’s not obvious to me when or whether internecine power struggles can support a college or university mission. Perhaps a useful test is, “what new capacities does this program enable?” If it enables sufficiently valuable new capacities, perhaps the existing power balance was in need of disruption. Else, perhaps the struggle is simply a zero- or even negative-sum game.


Powers of 176 end in 76

Vikram Hegde tweeted a statement about powers of \(176\): they all end in \(76\).

I was about to ask for a proof before I realized I’m procrastinating and could write an R function to test this pretty quickly. So I did, and claimed that the statement was not true. Only the first \(8\) powers end in \(76\). Proof by computer!

But no. These numbers get big pretty fast, and floating point is not great with the trailing digits of big numbers. Vikram and Manish noticed some oddities with the sequence. Manish went a step further, noting that powers of all numbers ending in \(6\) ought to end in \(6\).
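
For the record, here’s a rough sketch of a check that sidesteps the floating point issue by only ever tracking the last two digits (i.e. working mod \(100\)):

# Naive check: compute 176^n directly. Once 176^n gets past 2^53, a double
# can't represent the integer exactly, so the trailing digits turn to junk
# and the check starts returning FALSE even though the statement is true.
naive_ends_in_76 <- function(n) (176^n) %% 100 == 76
sapply(1:12, naive_ends_in_76)

# Safer check: only ever keep the last two digits, so nothing overflows.
mod_ends_in_76 <- function(n) {
  last_two <- 76 # 176^1 mod 100
  if (n > 1) for (i in 2:n) last_two <- (last_two * 176) %% 100
  last_two == 76
}
all(sapply(1:1000, mod_ends_in_76)) # TRUE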

But I’m still procrastinating, so here’s a proof by induction: all powers of \(176\) end in \(76\).

The proof

First, note that \(176^1 = 176\) and \(176^2 = 30976\), both of which end in \(76\).

Now suppose \(176^m\) ends in \(76\). If we can establish that \(176^{m+1}\) ends in \(76\), we’re done.

Since \(176^m\) ends in \(76\), we can write

\[\begin{equation} 176^m = 100c + 76, \end{equation}\]

for some natural number \(c\). So now we have

\[\begin{align} 176^{m+1} &= 176^m \cdot 176 \\
&= (100c + 76)(100 + 76) \\
&= 100^2 c + 100 c \cdot 76 + 100 \cdot 76 + 76^2 \\
& \text{defining } 100 c = c_1, 76c = c_2, 76 = c_3, \\
&= 100 (c_1 + c_2 + c_3) + 5776 \\
&= 100 (c_1 + c_2 + c_3) + 5700 + 76 \\
& \text{defining } 57 = c_4, \\
&= 100 (c_1 + c_2 + c_3 + c_4) + 76 \\
& \text{defining } c' = c_1 + c_2 + c_3 + c_4, \\
&= 100 c' + 76, \end{align}\]

which is of the same form as what we had for \(176^m\). So \(176^{m+1}\) must also end in \(76\)…? I’m not entirely sure this follows, since our earlier statement was applied for some \(c\), not all \(c\).

But anyway, assuming it’s true (my favorite way to proof), we have what we need. The base case says \(176^2\) ends in \(76\), and the induction step says if \(176^m\) ends in \(76\) then so does \(176^{m+1}\). I’ll stop here—these are dark powers and I dare not toy with them. Thanks Vikram and Manish for this problem!

(I like proofs by induction because they make me picture a little train engine, chugging along establishing the statement till the end of time.)

EDIT: I changed the factorization from \(176^m = 10c + 76\) to \(176^m = 100c + 76\). I think this makes it a little stronger, since now \(c\) doesn’t need to be a multiple of \(10\) to keep the trailing \(76\). I think this also makes the step I was unsure about go through: \(c\) can be any natural number and \(100 c + 76\) will still end in \(76\).
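
In congruence notation the induction step is just one line: since \(176 \equiv 76 \pmod{100}\) and (by the hypothesis) \(176^m \equiv 76 \pmod{100}\),

\[\begin{equation} 176^{m+1} = 176 \cdot 176^m \equiv 76 \cdot 76 = 5776 \equiv 76 \pmod{100}. \end{equation}\]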


Distribution regression in R

Distribution regression is a cool technique I saw someone talking about on Twitter. The idea is pretty straightforward: given an outcome \(Y\) which depends on some \(X\), we recover the conditional distribution of \(Y\) given \(X\) (and from it, distributional effects of \(X\) on \(Y\)) by running regressions of the form \(I(Y \leq y_i) = f_i(X) + \epsilon\), where \(I(Y \leq y_i)\) is an indicator variable for whether \(Y\) is less than or equal to some value \(y_i\). By running this for a grid of \(y_i\) values covering the range of observed \(Y\), we trace out the distribution of \(Y\) given values of \(X\). Chernozhukov et al. (2012) discuss this and other ideas in more detail (ungated version here).

The advice I’ve seen on choosing \(y_i\) is to use the quantiles of \(Y\), which gives a grid that’s uniform on the probability scale. \(f_i\) could be a different function for each \(y_i\), or just the same linear form with a different coefficient vector for each \(y_i\).

A simple example with a homogeneous treatment effect

Let’s illustrate this with a simple example. Suppose we have a binary \(X\) which has a constant treatment effect on a continuous \(Y\). The model is

\[\begin{equation} Y = 1 + 2X + \epsilon, \end{equation}\]

where \(\epsilon \sim N(0,1)\).

library(tidyverse)
library(patchwork)
set.seed(101)

# Generating the data
X <- round(runif(1000,0,1),0)
Y <- 1 + 2*X + rnorm(1000,0,1)
dfrm <- data.frame(Y=Y, X=X)

# Generating the grid
nodes <- as.numeric(quantile(Y, probs = seq(0,1,0.01)))[-1] # convert it to a numeric to strip the labels, and drop the min observation since we can't do anything with it

# Generating the indicators
for(i in seq_along(nodes)) {
  dfrm <- dfrm %>% mutate( "Y_{i}0" := ifelse(Y<=nodes[i], 1, 0) )
}

# summary(dfrm)

Now let’s run the regressions:

intercept_vec <- rep(NA,length.out=length(nodes))
prediction_vec_0 <- rep(NA,length.out=length(nodes)) # conditional distribution of Y given X = 0
prediction_vec_1 <- rep(NA,length.out=length(nodes)) # conditional distribution of Y given X = 1
se_vec_0 <- rep(NA,length.out=length(nodes)) # vector for standard error of Y|X=0. obtain 95% CI by multiplying SE by 1.96 and +/- from prediction
se_vec_1 <- rep(NA,length.out=length(nodes))
models_list <- list()

for(i in seq_along(nodes)) {
  # Generate the models
  # models_list[[i]] <- lm(dfrm[,i+2] ~ X, data = dfrm)
  models_list[[i]] <- glm(dfrm[,i+2] ~ X, data = dfrm, family = "binomial")

  # Predictions when X=0
  # prediction <- predict(models_list[[i]], newdata = data.frame(X=0), se.fit=TRUE) 
  prediction <- predict(models_list[[i]], newdata = data.frame(X=0), se.fit=TRUE, type = "response") 
  prediction_vec_0[i] <- prediction$fit[[1]]
  se_vec_0[i] <- prediction$se
  # Predictions when X=1
  # prediction <- predict(models_list[[i]], newdata = data.frame(X=1), se.fit=TRUE) 
  prediction <- predict(models_list[[i]], newdata = data.frame(X=1), se.fit=TRUE, type = "response") 
  prediction_vec_1[i] <- prediction$fit[[1]]
  se_vec_1[i] <- prediction$se
}

results <- data.frame(Y_0 = prediction_vec_0, se_0 = se_vec_0, Y_1 = prediction_vec_1, se_1 = se_vec_1, Y=nodes) %>%
  arrange(Y)

F_y_x_plot <- ggplot(data = results, aes(x=Y)) + 
  geom_line(aes(y=Y_0), color="dodgerblue2", size=1) +
  geom_line(aes(y=Y_1), color="firebrick2", size=1) +
  geom_ribbon(aes(ymin=Y_0-1.96*se_0, ymax=Y_0+1.96*se_0), alpha = 0.25) +
  geom_ribbon(aes(ymin=Y_1-1.96*se_1, ymax=Y_1+1.96*se_1), alpha = 0.25) +
  labs(title="Conditional distribution of Y (blue: X=0, red: X=1)", x="Y value", y="F(y|x)") +
  theme_bw()

Q_y_x_plot <- ggplot(data = results, aes(y=Y)) + 
  geom_line(aes(x=Y_0), color="dodgerblue2", size=1) +
  geom_line(aes(x=Y_1), color="firebrick2", size=1) +
  geom_ribbon(aes(xmin=Y_0-1.96*se_0, xmax=Y_0+1.96*se_0), alpha = 0.25) +
  geom_ribbon(aes(xmin=Y_1-1.96*se_1, xmax=Y_1+1.96*se_1), alpha = 0.25) +
  labs(title="Conditional quantiles of Y (blue: X=0, red: X=1)", x="F(y|x)", y="Y") +
  theme_bw()

F_y_x_plot | Q_y_x_plot

(Plot: conditional distribution of Y on the left, conditional quantiles of Y on the right; blue: X=0, red: X=1.)

In the left plot, the blue line is the distribution of \(Y\) given \(X=0\) and the red line is the distribution of \(Y\) given \(X=1\). The right plot shows the corresponding conditional quantiles of \(Y\). The average difference in Y values between the blue and red distributions is about 2—almost exactly the treatment effect of \(X\) on \(Y\). It’s consistent throughout, indicating that the effect is constant. The conditional distribution for \(Y \vert X=0\) reaches \(1\) faster than the distribution of \(Y \vert X=1\), affirming that having \(X=1\) increases the value of \(Y\).

It’s a bit awkward that the confidence intervals are going above 1/below 0, but that’s fixable. Chernozhukov et al. (2012) also discuss a bootstrap procedure for generating “correct” confidence intervals. I don’t actually know if the “usual” way I did them here is appropriate, or what assumptions it would imply.
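
One quick-and-dirty patch for the plots (not the bootstrap procedure from the paper, just cosmetics) would be to clamp the bands to \([0,1]\) and sort the fitted values so each estimated CDF is monotone in \(Y\). Something like this, using the results data frame from above:

# Crude cosmetic fix: clamp the 95% bands to [0, 1], then sort each column so
# the estimated conditional CDFs (and their bands) are monotone in Y. This is
# not the paper's bootstrap procedure, just a cleanup for plotting.
results_clamped <- results %>%
  arrange(Y) %>%
  mutate(lo_0 = pmax(Y_0 - 1.96*se_0, 0), hi_0 = pmin(Y_0 + 1.96*se_0, 1),
         lo_1 = pmax(Y_1 - 1.96*se_1, 0), hi_1 = pmin(Y_1 + 1.96*se_1, 1)) %>%
  mutate(across(c(Y_0, lo_0, hi_0, Y_1, lo_1, hi_1), sort))

The ribbons would then use the lo/hi columns instead of the raw \(\pm 1.96\) SE terms.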

Multiple regression

Distribution regression can scale to multiple RHS variables, too. Suppose we take the model from before (binary \(X\), continuous \(Y\)) and introduce a variable \(Z\). The model is

\[\begin{equation} Y = 1 + 5X + 2\sqrt{Z} + \epsilon, \end{equation}\]

where \(\epsilon \sim N(0,1)\).

# Generating the data
X <- round(runif(1000,0,1),0)
Z <- runif(1000,0,1)
Y <- 1 + 5*X + 2*sqrt(Z) + rnorm(1000,0,1)
dfrm <- data.frame(Y=Y, X=X, Z=Z)

# Generating the grid
nodes <- as.numeric(quantile(Y, probs = seq(0,1,0.01)))[-1] # convert it to a numeric to strip the labels, and drop the min observation since we can't do anything with it

# Generating the indicators
for(i in seq_along(nodes)) {
  dfrm <- dfrm %>% mutate( "Y_node{i}" := ifelse(Y<=nodes[i], 1, 0) )
}
mean(dfrm$Z)
## [1] 0.5141996
intercept_vec <- rep(NA,length.out=length(nodes))
prediction_vec_0 <- rep(NA,length.out=length(nodes)) # conditional distribution of Y given X = 0
prediction_vec_1 <- rep(NA,length.out=length(nodes)) # conditional distribution of Y given X = 1
se_vec_0 <- rep(NA,length.out=length(nodes)) # vector for standard error of Y|X=0. Obtain 95% CI by multiplying SE by 1.96 and +/- from prediction
se_vec_1 <- rep(NA,length.out=length(nodes))
models_list <- list()

for(i in seq_along(nodes)) {
  # Generate the models
  # models_list[[i]] <- lm(dfrm[,i+3] ~ X + Z, data = dfrm)
  models_list[[i]] <- glm(dfrm[,i+3] ~ X + Z, data = dfrm, family = "binomial")
  
  # Predictions when X=0
  prediction <- predict(models_list[[i]], newdata = data.frame(1, X=0, Z=mean(dfrm$Z)), se.fit=TRUE, type="response")
  prediction_vec_0[i] <- prediction$fit[[1]]
  se_vec_0[i] <- prediction$se
  
  # Predictions when X=1
  prediction <- predict(models_list[[i]], newdata = data.frame(1, X=1, Z=mean(dfrm$Z)), se.fit=TRUE, type="response")
  prediction_vec_1[i] <- prediction$fit[[1]]
  se_vec_1[i] <- prediction$se
}

results <- data.frame(Y_0 = prediction_vec_0, se_0 = se_vec_0, Y_1 = prediction_vec_1, se_1 = se_vec_1, Y=nodes) %>%
  arrange(Y)

F_y_x_plot <- ggplot(data = results, aes(x=Y)) + 
  geom_line(aes(y=Y_0), color="dodgerblue2", size=1) +
  geom_line(aes(y=Y_1), color="firebrick2", size=1) +
  geom_ribbon(aes(ymin=Y_0-1.96*se_0, ymax=Y_0+1.96*se_0), alpha = 0.25) +
  geom_ribbon(aes(ymin=Y_1-1.96*se_1, ymax=Y_1+1.96*se_1), alpha = 0.25) +
  labs(title="Conditional distribution of Y (blue: X=0, red: X=1)", x="Y value", y="F(y|x)") +
  theme_bw()

Q_y_x_plot <- ggplot(data = results, aes(y=Y)) + 
  geom_line(aes(x=Y_0), color="dodgerblue2", size=1) +
  geom_line(aes(x=Y_1), color="firebrick2", size=1) +
  geom_ribbon(aes(xmin=Y_0-1.96*se_0, xmax=Y_0+1.96*se_0), alpha = 0.25) +
  geom_ribbon(aes(xmin=Y_1-1.96*se_1, xmax=Y_1+1.96*se_1), alpha = 0.25) +
  labs(title="Conditional quantiles of Y (blue: X=0, red: X=1)", x="F(y|x)", y="Y") +
  theme_bw()

F_y_x_plot | Q_y_x_plot

(Plot: conditional distribution of Y on the left, conditional quantiles of Y on the right, now controlling for Z; blue: X=0, red: X=1.)

Anyway, I thought this was a cool technique and I’d like to use it sometime.


Recessionary jargon and epidemics

The definition of “a recession” is weird.

The NBER Business Cycle Dating Committee has spoken: the COVID recession in the US lasted 2 months. NBER defines a recession as the time from the peak of a business cycle to the trough. The trough has to be “significant” enough that it isn’t just noise, which in the past usually meant that it was at least somewhat prolonged. As the Committee notes in their announcement, the shortest recession prior to this one was 6 months long.

I know this definition, have known this for a while, but I’ve never really thought about it before. Why peak to trough? What concept are we trying to capture there? The definition is describing the duration of the decline, but why?

In my experience of general use, “recession” often means “the bad time when the economy isn’t doing well”. I don’t know many folks who use it in everyday conversation without referring to the suffering first and the technical definition sometime later (if at all). When the technical definition differs from the general-use sense of the term, it’s more common that I hear “those economists are crazy” than “huh, let me recalibrate my use of this term”. Economics papers and reports use the technical definition, but that’s to be expected. Maybe the two senses of the term align pretty well for most types of recessions observed before. But this wasn’t like most of the recessions observed before. (Even the Committee agrees, this time is different.)

I’m not original for saying this was the first pandemic that the global “we” have been so prepared to deal with. We went into it with lots of machines and drugs to keep people alive, we developed vaccines in record time, in some places we acted very quickly with social distancing, and so on. It’s also been the first set of epidemic-driven recessions we’ve experienced with so much preparation.

“Rational epidemic” models describe laissez-faire epidemic recessions as short, brutal things. I’m working on such a model (with an amazing group of people way smarter than me) to think about disease-economy tradeoffs under different policy approaches. When calibrated to the national US economy and contact structure using pre-pandemic data, the recession (technical meaning) hits the trough at day 55 of the epidemic. At the quarterly frequency, it’s “only” a 33% decline in consumption. But at a daily frequency, it’s a 66% decline. In our calibrated model, it takes about a year and a half for the economy to recover from the epidemic and the virus to be suppressed. The recovery is a slow, grinding thing. It doesn’t have to be; if the coordination failure driving the recession is addressed, it’s a 3% daily consumption decline at the trough, with a similar 2-month recession duration. The virus still takes about a year and a half to suppress, but the economic recovery takes only half a year. The suffering is much lower when policy addresses the coordination failure effectively even if the recession (technical sense) is just as long. (We define recovery as “output gap within 1.5% of the initial steady state”, but the result holds for any such threshold. Our definition is entirely arbitrary and “reasonable-seeming”.)

This isn’t a post about that paper, so I’ll leave it at that. This isn’t a post about rational epidemics, either. This is about the term “recession” in general, and epidemic recessions in particular.

When I first saw the NBER announcement, I thought it was some kind of joke. Were they out to lunch? The peak-to-trough decline in GDP was on the order of 38%, and you’d have to be under a literal rock to miss the ongoing suffering. The Sahm rule indicator doesn’t flatten out till April 2021. I had to read the announcement twice before the definition stuck. It was right at the top, so it’s a testament to the strength of my prior that the COVID recession (general sense) really sucked and continues to do so for a lot of people. Okay, given this definition, the COVID recession (technical sense) in the US was 2 months long. The grinding “expansion” that followed has been a year and change of very polarized suffering, but polarized suffering is nowhere in the definition.

I think this recession highlights the issue of concept-jargon disconnect: the technical term we use to describe a thing actually describes an effect of the thing in its general usage, rather than the thing generally discussed itself. (I’m sure this is an idea people already have a name for, but I couldn’t find it.) Describing a recession in the general-use sense is hard, and maybe the technical sense came first, I don’t know. But now the term seems pretty clearly linked in general usage to a concept that’s similar to, but can be quite distinct from, the definition of the term. The concept and the jargon aren’t fully connected to each other. I think this kind of thing is bad PR for economists and promotes sloppy thinking, but what else is new. There are lots of terms like this (“rational”), and anyway I don’t know if it’s avoidable when studying people and social things. Folks who study critical race theory are getting a particularly nasty version of the bad PR angle, with a general use of the concept being actively shaped to push an inimical agenda.

Sometimes there’s a clear theoretical reason behind keeping a bit of jargon around despite its disconnect from the concept. “Rational” is a great example here. The disconnect between “complete and transitive preferences” (technical sense) and “doing smart things” (general sense) is large, especially when talking about things like epidemics and coordination failures. It’s still bad PR, and I still think it can promote sloppy thinking. But the technical-use definition maps to a real theoretical construct, the precise definition of which is important for a lot of other theory and empirical approaches. Even if we changed the name we’d still keep the definition because it’s important on its own terms. Is the peak-to-trough definition important for any theoretical reasons? If we changed the name, would “peak-to-trough decline” still be an important part of the edifice of economics rather than just one among many statistics we use to describe “the time when the economy isn’t doing well”? Or is it a case of “we’ve always done it this way, so let’s be consistent”/”that’s just the definition”? The latter feel like lazy reasoning to me, especially in the macro world where things like unemployment statistics aren’t directly comparable across certain periods because of definitional changes.

Suppose we take the definitions we’re given. In that case, let’s be very clear that epidemic recessions are quite different from financial recessions. The coordination failure isn’t about “expectations” or other residuals, it’s about a real biological phenomenon that can be meaningfully measured. (I’m skeptical of our ability to measure expectations meaningfully, but that’s another story.) The coordination failure can be resolved (in part) through Pigouvian/Coasean policies—though fiscal and monetary policy have a role to play, this isn’t about getting shovels in the dirt or scaring the bond vigilantes. In the absence of policy that actually addresses the failure, the recession will be severe. The severity of the COVID recession for so many people in the US is an indicator of realtime policy failure. It’s a really bad failure, especially given how well-prepared we were by historical standards.

A recession definition that’s focused on the duration of the decline instead of its magnitude, or recovery time, or dispersion of losses, or some combination, or even a composite Ramsey-like indicator of welfare losses (let’s dream) isn’t really equipped to describe an epidemic recession. I’m not sure what the NBER/current macro definition is equipped to describe. I’m not sure why I (as a person experiencing the economy, or even as an economist studying it) should care about what it does describe beyond its attenuated connection to the suffering people experience when the economy isn’t doing well. True, the Committee decided to call this a recession despite how short it was (“what’s duration got to do with the price of milk?”). A sociobiologist friend once described economists as “phenomenal mathletes playing Calvinball”. To me, this announcement has strong Calvinball energy.


Institutional governance and endowment management

Good institutional governance involves good management of institutional wealth. For some institutions a lot of that wealth is in an endowment. “Endowment rules” governing the proportion of wealth consumed today vs saved for tomorrow encode a lot of information about the institution’s values, goals, and constraints.

Maximization is probably not a desirable framework for designing governance

For a broad class of institutional objective functions, endowment rules which maximize the objective ought to be Euler-like. They should equate marginal utilities and shadow prices today with their discounted counterparts tomorrow. (If the objective does not involve the future then those tomorrow components can be set to zero, but let’s suppose we’re in the class of objectives that presume continuation into the indefinite future.)

Let’s suppose the institution uses a market discount rate. The longer the impact of a new project stretches into the future, the stronger the case for using the lowest feasible discount rate. Regardless, any maximizing endowment draw rate \(c_t\) ought to satisfy something like

\[u'(c_t) + \lambda_{t} f'(K_{t}, c_t) = \beta E_t[u'(c_{t+1}) + \lambda_{t+1} f'(K_{t+1}, c_{t+1})],\]

in all periods, where \(u\) is the utility function which encodes the institution’s values and goals, \(\beta\) is the discount factor, \(K_{t}\) is the stock of wealth, \(f'\) is the marginal growth of the institution’s wealth, and \(\lambda_t\) is the shadow price of that wealth. \(E_t\) on the right-hand side indicates the need for forecasts of how things might look tomorrow. This can be generalized to multiple types of wealth and consumption if necessary, but it already gives us a lot of information. For example, \(u'\) and the \(\lambda\)s tell the institution’s managers how to value marginal fundraising or new projects.

A lot of institutional governance can be expressed as struggles over the components of equations like this one. What’s the \(u\) we’ll use, and who gets more or less voice in that? Which \(\lambda\)s should be considered, and how should their value be assessed? Which discount rate is relevant? In actual practice, the components of equations like these change over time as different factions become ascendant, constraints from previous decisions bind, and constraints on future decisions (perhaps to limit some new faction’s ascendancy) are introduced. Getting an institution to commit to following any explicit stationary maximization rule is probably hard and likely to raise a lot of unpleasant political battles. Maximization takes us to the Pareto frontier (defined appropriately relative to the objective being maximized), and the Pareto frontier tends to force uncomfortable tradeoffs. It’s not obvious to me that any generation’s preferred maximization rule can really be time consistent when current members exit and new members enter. Part of the appeal to turn-taking modes of politics is the chance to change the components guiding decision-making eventually, so the battle to make a stationary rule could be quite ugly even without raising the prospect of future generations with intrinsically different goals or constraints.

Sustainability and guaranteeing capabilities to future generations might be a better approach

It’s probably a lot easier and less contentious to have some sort of heuristic which acts as a safeguard against “bad” decisions rather than a rule which forces “good” decisions. I think a good one for long-lived institutions involves ensuring some notion of sustainability. Fortunately, smart folks have thought about this. An equivalence result in the linked paper establishes that changes in intergenerational well-being map to changes in “comprehensive” wealth. In the institutional context, that “intergenerational well-being” is simply the intertemporal objective. “Comprehensive” wealth includes machine and financial (“reproducible”) capital, human capital, and natural resource capital. Endowments are not the same as “comprehensive” wealth, since many institutions have at least substantial human capital if not natural resource capital, but suppose we take it as such for now. This suggests a simple rule to use for ensuring an institution’s wealth is governed sustainably: make sure the real value of the endowment (before gifts) doesn’t fall.

This approach gives the current generation a lot of flexibility to exercise their capabilities and look after their own interests without harming future generations. Large negative shocks to permanent income (e.g., a pandemic with a large recessionary impact) would reduce future comprehensive wealth, giving current generations a bit more room to draw on capital and smooth over the shock. There are probably more nuanced and smarter things to be said about sustainability over the business cycle. But the bigger appeal, to me, is that it allows us to abstract from the specific utility function (and avoid encoding maximization goals) to focus on wealth instead.
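
As a back-of-the-envelope illustration of the guardrail (a sketch with made-up numbers, assuming a single financial asset, a known nominal return, and a draw taken at the start of the period):

# Does a proposed draw keep the real value of the endowment, before gifts,
# from falling? Assumes the draw comes out at the start of the period and the
# remainder earns the nominal return; the numbers below are made up.
endowment_guardrail_ok <- function(K, draw, nominal_return, inflation) {
  K_next_real <- (K - draw) * (1 + nominal_return) / (1 + inflation)
  K_next_real >= K
}

endowment_guardrail_ok(K = 1e9, draw = 40e6, nominal_return = 0.07, inflation = 0.02) # TRUE
endowment_guardrail_ok(K = 1e9, draw = 60e6, nominal_return = 0.07, inflation = 0.02) # FALSE

Under that timing the rule boils down to drawing no more than (roughly) the expected real return.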

Gifts could perhaps be included in the value being maintained, but I think there are two reasons to exclude gifts. First, to the extent that gifts are random and hard to forecast, forecast errors can lead to unsustainable paths. If the goal is to be sustainable for as much of the future as possible, excluding gifts can support the objective but including them can only harm it. This is a sort of “conservative management” principle. Second, to the extent that gifts come with additional obligations, the short-run gains they provide may not net out with the long-run costs they impose. This is a really unpleasant effect, since endowment managers can end up having to starve other line items being funded in order to preserve the real value of the gift. The fact that gift contracts include these kinds of terms seems to me a signal that people have found value in rules like this. The entire endowment is effectively a gift from previous generations to current ones, with current generations acting as stewards for future generations, so applying the rule to the endowment as a whole (rather than some parts only) seems like a natural extension.

A sustainability rule like this one requires institutional managers to accept new projects with caution even if they come with funding sources. If new projects create future obligations which could reduce wealth eventually, they may be unsustainable despite guaranteed current funding. Austerity measures which reduce future flows into wealth (e.g., by reducing the long-run prestige of the institution) are similarly unsustainable. These conclusions follow from purely financial considerations, since we are treating “comprehensive” wealth as only the financial wealth in an endowment.

For institutions like colleges and universities, adding some measure of “accessible knowledge capital” seems sensible as well. It’s not obvious how to value it (“what’s the shadow price?”) but changes in library scope or quality could form a useful metric for investment in said capital. Arrow and co manage to assess sustainability using only changes in comprehensive wealth (“comprehensive investment”), so the measurement problem here is really only to track and value flows. Endowment flows are easy to track and value, at least in principle.

The most interesting issue to me is how to think about human capital in this framework. Arrow and co think about this as education and health, which maybe makes sense for nations. Health capital seems like a reasonable component for any kind of institution that employs people, especially under the American system of employer-provided healthcare. Perhaps some metric like “proportion of claims filled”, chosen so as to avoid selection against those with pre-existing conditions, is a useful way to track the flows of health capital. There’s still a problem of figuring out the right shadow price. Supposing the measurement challenge is resolvable, adding a health capital metric to comprehensive wealth offers an even more stringent (conservative) test for the sustainability of austerity measures.

I don’t think “education” forms a desirable metric for human capital. Institutions often need employees with a wide range of skills, representing different types and degrees of formal and informal education. Appropriately measuring such capital/investment and valuing it with respect to institutional goals seems quite hard and like a recipe for bitter internecine fights. Perhaps there’s some other metric that better gets at the end goal of having different mixes of employees, but I still see unpleasant politics ahead when trying to incorporate human capital.

A sustainability approach delivers a bunch of things

Properly defined, sustainability and guaranteeing capabilities for the future requires relatively simple heuristics for wealth management. These principles will generally not be of the form “only draw \(x\%\) per year”, no matter what \(x\) is chosen. They do not provide a guide to what a “good” decision is, only guardrails against “bad” decisions. A “bad” decision in this framework is anything which reduces comprehensive wealth along future paths. “Bad” decisions can include projects with funding not guaranteed beyond a few years, austerity measures, and direct reductions in institutional capital stocks (e.g., libraries). Sustainability rules create a need for forecasts and some further rules on how to handle uncertain impacts in the future. The rules to handle uncertainty seem tricky: it’s not obvious what the right level of risk tolerance ought to be. But some state-dependent sustainability guardrails seem better than none.
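
To make the forecasting point a bit more concrete, here’s a rough sketch (with a made-up return process and made-up parameters) of one way to put a number on that risk tolerance: simulate real return paths, apply a candidate draw rate, and ask how often the real endowment ever dips below its starting value.

# Rough sketch: probability that a constant draw rate never pushes the real
# endowment below its starting value over a 20-year horizon. The i.i.d. normal
# real-return process and all parameter values are made up for illustration.
set.seed(42)
guardrail_survival_prob <- function(draw_rate, K0 = 1e9, years = 20, n_sims = 5000,
                                    mean_real_return = 0.05, sd_real_return = 0.12) {
  never_breached <- logical(n_sims)
  for (s in 1:n_sims) {
    K <- K0
    breached <- FALSE
    for (t in 1:years) {
      r <- rnorm(1, mean_real_return, sd_real_return)
      K <- K * (1 - draw_rate) * (1 + r) # draw first, then real growth
      if (K < K0) breached <- TRUE
    }
    never_breached[s] <- !breached
  }
  mean(never_breached)
}

sapply(c(0.03, 0.045, 0.06), guardrail_survival_prob)

Where to set the acceptable breach probability is exactly the risk tolerance question, and I don’t have a principled answer to it.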
