Archive for the 'Reflections' Category

Time to Put the Cards on the Table

After a weekend that was less productive than I would have hoped, I finally bit the bullet and made a start on laying out my index cards and building the structure of the papers. It’s been quite helpful so far: forcing myself to determine what conclusions I can actually draw from the data and plots I have makes it clear that those conclusions define the questions I must ask at the outset, and the introductions I must build to support them.

A big sticking point for me right now is the first paper. With the exception of the side-track about morphology and phylogeny, this paper is basically about “how to build a morphospace”, which isn’t really an interesting finding so much as a long methods section. The one thing I realized in doing it, and that Andy found quite interesting too, is that the choice of data culling criteria can have a pretty substantial effect on what you see, and it’s not a choice that has (to my knowledge) been addressed explicitly in prior morphospace studies. But for that to be a useful finding of the paper, I will need to actually run some analyses to show how different choices affect the outcome. Setting up the choices shouldn’t be too hard—I should think starting with the full data set as collected and then executing random (bootstrap) replicates for progressively smaller subsets of the original data is the way to go—but I’m not sure what metric I should use to show these effects. Mean pairwise distance through time? This is one of those metrics that is used a lot, but then I need to front-load a whole lot of explanation about the through-time stuff (linking the morphospace to Neptune, etc.) that I was hoping to save for the second paper, where I think this belongs.
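A minimal sketch of what that culling test could look like, assuming taxa are coded as equal-length tuples of character states and that the metric is a simple mean pairwise mismatch fraction. The function names and the toy data are made up for illustration, and the draws here are without replacement, which is strictly subsampling rather than a true bootstrap (swapping `rng.sample` for `rng.choices` would resample with replacement).

```python
import random
from itertools import combinations


def mean_pairwise_distance(rows):
    """Mean fraction of mismatched characters over all pairs of taxa."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        total += sum(x != y for x, y in zip(a, b)) / len(a)
    return total / len(pairs)


def subsample_curve(rows, fractions, replicates=200, seed=0):
    """For each fraction of the data kept, return (mean, spread) of the metric
    over random subsets drawn without replacement."""
    rng = random.Random(seed)
    curve = {}
    for frac in fractions:
        # Keep at least two taxa so that a pairwise distance exists.
        k = min(len(rows), max(2, round(frac * len(rows))))
        values = [mean_pairwise_distance(rng.sample(rows, k))
                  for _ in range(replicates)]
        mean = sum(values) / len(values)
        spread = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
        curve[frac] = (mean, spread)
    return curve


# Hypothetical toy matrix: four taxa scored for three binary characters.
toy = [(0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 0)]
for frac, (mean, spread) in subsample_curve(toy, [1.0, 0.75, 0.5]).items():
    print(frac, round(mean, 3), round(spread, 3))
```

Plotting the mean and spread against the fraction retained would show how quickly the metric drifts as more data are culled. This version ignores time entirely, so it sidesteps the Neptune linkage but also says nothing about through-time effects.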

Anyway, these are the sorts of questions I’m dealing with. I really had hoped that I was all done with analyses by this point, but I’m not sure how well the first paper will stand up on its own without a little bit of additional work. It’s really not a huge deal—it shouldn’t take more than a day to code up once I’ve decided on a metric—but choosing an output variable that captures what I’m trying to say and works with the logic and construction of the two papers is a bit of a challenge.

Here is what the poker table looks like, by the way:

I did also take two closer-up views of the layout for the first paper and the second paper, just in case disaster strikes in the form of wind, fire, loss, etc. You can’t be too careful at this stage.

The Genuine Improvement™ Weekend

The end of last week was a bit of a struggle: the accumulated weeks of it, combined with watching SJ and John Crowley defend (on the same day), drove the point of my stasis home with a vengeance. It was a bit of a low.

It was doubly pleasant and important, then, that the weekend was a real raiser of spirits. On Saturday Kati and I got away for the day, spent a very relaxing morning talking at Darwin’s, and then a restorative afternoon walking through Maudslay State Park in Newburyport. It gave us the chance to finally spend the sort of quality, pair-bonding, unstructured and carefree time together that I had hoped Copenhagen would provide but, to my disappointment, didn’t. Perhaps it just needed time, but things feel markedly improved this morning.

On Sunday afternoon we spent some time with Evan and Katie (and Gavi), which was also a surprising source of motivation. Evan has an almost uncannily positive attitude to big tasks and intimidating projects at work. Perhaps it was because I came primed from a weekend of relaxing and connecting, but somehow giving my usual “no, I’m not done yet” pity party spiel this time inspired me to take a more Evan-ish, optimistic, go-gettum view of the task at hand. I am at a point where I can finish up (these first two chapters, at the very least), and what a formidable challenge. So, instead of moping, fearing, and pushing my head far into the sand until the last moment of the weekend, I actually spent Sunday evening quietly looking forward to getting to work and moving forward. I programmed the coffee maker before bed and felt rested and ready to go this morning.

Anyway, this is all a long preamble, but the bottom line is that I am working at Darwin’s today feeling qualitatively different than I have for the past few months—since the big push started petering out in March.

Pairwise Combinations, Continued

I left off yesterday with a cliffhanger about the conceptual jump from a simple 1D space of character states to the “number of realized pairwise character combinations” disparity metric. This metric takes the 1D list of m character states and expands it to two dimensions, with two identical axes. Each grid square (matrix element) is now a combination of two character states, and the matrix represents all possible pairwise combinations. Some of these pairwise combinations need to be disregarded, of course. For starters, the main diagonal of the matrix represents pairings of a character state with itself, which obviously makes no sense. So from a total of m^2 combinations, we must subtract m items. Then, because the matrix is symmetric, we must disregard one half of the remaining combinations, because the upper triangle is a duplication of the lower triangle, leaving (m^2 - m)/2 combinations. Finally, some of these remaining combinations will be impossible: for example, pairwise combinations of states of the same character (you can’t be square and triangular at the same time, using yesterday’s example), as well as logically inapplicable combinations (from my morphospace, for example, you can’t have “no raphe” and “fibulae” at the same time). Calling the number of impossible combinations k, the final number of combinations can be written as something like [(m^2 - m)/2] - k. Crucially, with n characters of x states each, m = n*x, so this number is of order (n*x)^2, whereas the “full” morphospace has 2^n cells for binary characters, x^n cells for multistate characters, and so on.
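To make the bookkeeping concrete, here is a small sketch of the count. The function names are hypothetical, and the number of logically inapplicable pairs is simply passed in as a number, since working out which pairs those are is the hard part.

```python
def possible_pairwise_combinations(states_per_character, inapplicable_pairs=0):
    """Count admissible unordered pairs of character states.

    states_per_character: number of states for each character, e.g. [2, 3, 2].
    inapplicable_pairs: count of logically impossible cross-character pairs
                        (supplied by hand; an assumption of this sketch).
    """
    m = sum(states_per_character)          # total number of character states
    half_matrix = (m * m - m) // 2         # drop the diagonal, keep one triangle
    # Two states of the same character can never co-occur.
    same_character = sum(x * (x - 1) // 2 for x in states_per_character)
    k = same_character + inapplicable_pairs
    return half_matrix - k


def full_morphospace_cells(states_per_character):
    """Number of cells in the full morphospace (product of state counts)."""
    cells = 1
    for x in states_per_character:
        cells *= x
    return cells


# Twenty binary characters: m = 40 states in total.
print(possible_pairwise_combinations([2] * 20))  # 760
print(full_morphospace_cells([2] * 20))          # 1048576
```

With twenty binary characters the pairwise space has 760 cells against roughly a million in the full morphospace, which is the whole point of the order-(n*x)^2 argument.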

So, to cut a long story short—there are three ways I’ve talked about to measure how much morphospace is occupied. The first is to measure how many “cells” of the full, n-dimensional morphospace are occupied, but this is such a vast space that changes in occupancy are likely to be meaningless. And more importantly, a nightmare to try and calculate. The other extreme is to simply collapse the morphospace down into 1 dimension, namely a binary space consisting of just the character states. This will be much less sparsely filled. Somewhere in between those two options, but closer to the second one, is the “realized pairwise character combinations” approach. It will be more sparsely occupied than the 1D approach, but much more densely occupied than the full morphospace. But: it’s no different in principle than the 1D approach, and I think it only really makes sense if the 1D approach fails because the data “saturate” in 1D (like in our toy example).

Here’s the justification why I’m not going to go through the trouble of doing pairwise states: The number of realized states tops out at about 300 (306 to be exact). My plot shows an increase through time up to that point. There are 317 states in total, so we are not “saturating” that 1D space early—I think we’re seeing everything we’re going to be able to see. Justification, part II: it would be a ton of work and a genuine hassle to figure out which are the forbidden character combinations (due to logical inapplicability). [BUT: this might not actually be necessary, since that’s only required to tally the total number of possible pairwise combinations, e.g. for calculating % saturation of the space. Wouldn’t be needed for a tally of the raw number of realized pairwise combinations.]
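For the record, a rough sketch of the two tallies, assuming each taxon is stored as a set of “character:state” labels (a made-up encoding, as are the taxon names and function names). It also illustrates the bracketed point: the raw count of realized pairs needs no list of forbidden combinations, because as long as each taxon carries one state per character and no inapplicable codings, forbidden pairs never show up in the data.

```python
from itertools import combinations


def realized_states(taxa):
    """Set of all character states observed in at least one taxon."""
    seen = set()
    for states in taxa.values():
        seen.update(states)
    return seen


def realized_pairs(taxa):
    """Set of all unordered state pairs that co-occur in at least one taxon."""
    seen = set()
    for states in taxa.values():
        # Sorting makes each pair appear in one canonical order.
        seen.update(combinations(sorted(states), 2))
    return seen


# Hypothetical toy data: two taxa, labels invented for illustration.
toy = {
    "taxon_A": {"outline:square", "raphe:absent"},
    "taxon_B": {"outline:triangular", "raphe:present", "fibulae:present"},
}
print(len(realized_states(toy)))  # 5
print(len(realized_pairs(toy)))   # 4

# Saturation of the 1D space, using the numbers quoted above.
print(round(306 / 317, 3))        # 0.965
```

Restricting `taxa` to the taxa present in a given time bin would give the per-bin version of either tally.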

 

Time for a Break

Made this plot, following on from the plots showing average list length through time and average convex hull volume per list through time. It struck me that they looked similar, and indeed, when time is taken out of the equation and one is plotted against the other, it seems that the major control on the morphospace occupied by a list is the diversity of that list (at least when viewed on average per time bin).

What does this mean? Well, in the most conservative (and perhaps cynical) interpretation, I would read this to mean that morphospace is pretty well constant over time. Some lists are longer than others, perhaps because of the choice of what taxa to list for a particular section, or perhaps because there were simply fewer taxa present in the section. But the more taxa are found, the more morphospace is occupied. The two outliers are, of course, the Cretaceous samples (data collected according to very different rules); the rest fall on a pretty tight trendline.
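A simplified sketch of the per-bin averaging behind a plot like this. It assumes each list is a set of taxon coordinates on the first two ordination axes only, so it computes 2D hull area rather than a higher-dimensional volume; the data layout and function names are invented for illustration.

```python
def hull_area(points):
    """Area of the 2D convex hull (monotone chain, then the shoelace formula)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        # Build one side of the hull, discarding points that make a non-left turn.
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    hull = half(pts) + half(reversed(pts))
    area = 0.0
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def per_bin_averages(lists_by_bin):
    """Map each time bin to (mean list length, mean hull area) over its lists."""
    out = {}
    for bin_label, lists in lists_by_bin.items():
        n = len(lists)
        out[bin_label] = (sum(len(taxa) for taxa in lists) / n,
                          sum(hull_area(taxa) for taxa in lists) / n)
    return out


# Hypothetical toy bin holding two lists of (axis 1, axis 2) coordinates.
toy = {"bin_1": [[(0, 0), (1, 0), (0, 1)],
                 [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]]}
print(per_bin_averages(toy))  # {'bin_1': (4.0, 2.25)}
```

Plotting the second value against the first, one point per bin, is the time-free comparison described above.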

Progress Report Time (This Time, Without Giving a Shit)

Progress report season… for the sixth time! Yay! Well, actually I’m not sure I even had a progress report the first two years, but whatever. I spent yesterday evening reading through the past year’s DSA notes (depressing), comparing them to the goals I set out at this time last year (actually, precisely to the day!). This morning, I sat down and wrote out the progress report and handed it to Andy. I was feeling pretty good about myself, particularly since I managed to convince him not to call a committee meeting (for the first time ever). But then he read over the report and came back to my office to request a change. I had described how, as per his recommendation, I was dropping the diatom diversity project and instead expanding the morphospace project to two chapters. He “reminded” me of the PlanktonTech book chapter we agreed to write and asked that I change the section in my progress report back to include a chapter on diatom diversity.

What?! I thought the diatom diversity chapter was dead. I thought I had explained to Andy that I didn’t think the SQ subsampling method was going to work on the Neptune data. I thought he had suggested I drop the chapter, “with an eye toward finishing sooner rather than later”. Well, that didn’t seem to matter much—I suppose he remembered that there was a book chapter due for the PlanktonTech people, and that it was supposed to be about diatom diversity, and that was it. Just add it to the dissertation, as another chapter.

I could freak out at this point. I could despair about how the goalposts keep shifting. I could sit down and try to realistically plan how I am going to go from two chapters’ worth of data and analysis and no chapters written to four chapters’ worth of data and analysis and four chapters written by September 15th (the deadline for dissertation submission for the November graduation date). But I think I’m just too exhausted to do that at this point. Andy wants a chapter on diversity? Fine. So I rewrite the progress report (here it is, by the way) to include a few sentences about how “the diatom diversity project will take on a smaller role and will be represented by a short review chapter for submission to the book resulting from the PlanktonTech research initiative”.

Whatever. I don’t have the energy to engage with stressing out about how long things are going to take, when I am going to be done, what the dissertation is going to look like. The best I can do right now is go from one day to the next. Today, I needed to get a progress report done and signed by my committee members. I did that—I got Andy, Jacques, and Dave to sign off on it (and without requiring a committee meeting!). Whatever happens tomorrow, or next month, or when the thesis is due, happens then. Who cares what the damn report says.

March Madness: Day 22 is Tufte Day

Took the day off to see Tufte do his thing. It was cool. I liked the idea of making graphics about the content, and putting everything in service of the cognitive task at hand: making every aspect of the display support the intellectual activity the display is trying to accomplish. At many points along the way I reflected on what this means for my morphospace project. In some ways, a helpful reflection. In other ways, reinforcing my crippling stuckness. There’s nothing I can accomplish with a good figure if I don’t know what I’m trying to say with that figure.

The metaphor is the map. Make the graphic as clear and uncluttered and minimal as a map. But, how can you make a map if you don’t know where you’re going?

“The best good design can do is not to get in the way.” I liked that thought. But it scared me a bit, too, because in some ways I feel like well-designed figures are all I have in this project. What I’m lacking is the spine to back it up.

March Madness Day 21: It’s Officially Not Madness Anymore

It’s just March Stuckness. I helped Wil with his project some more. Sure, it felt good. But I felt like crap about the morphospace and didn’t do a single little thing about it. Tried to get Github set up for it. Nothing else.

Not a single word written that I could send Beau. Not a single shred of motivation to do anything about it. What a dire low.

Wasted Weekend

Saturday: Ali and Mike brought Amanda for her college visit. I just couldn’t find it in me to recuse myself given Ali’s condition. Got two phone calls, both very emotionally draining, the latter of which kept me up until past 1am.

Sunday: Completely exhausted, spent the morning having a more or less complete meltdown. Spent the afternoon doing taxes. All in all, a completely non-work weekend.

To Crawl From a Stall—March Madness Day (Barf) 13

I can’t believe it’s two weeks into the March Madness month. I am trying very hard not to focus on it, but I’m halfway to the deadline for my first chapter, and I don’t feel great about the rate of progress in the last week. I think being sick was just a part of it; I’m also pretty capitally stuck, and I’m not sure how to forge ahead. The past few days haven’t been a complete write-off. I’ve had some good ideas, chief among them plotting the taxa in morphospace with the plot symbol size reflecting the proportion of occurrences that the taxon represents. This would create one more plot for the morphospace-through-time paper, if indeed I go with the division of papers Andy suggested.

This is one of the sticking points. What will go in which of the two papers? One possible layout, suggested by Andy, was to present the results more or less in the order I did them. First paper:

  1. General introduction of both papers
  2. PCO dimensionality reduction and methods for estimating variance captured by axes
  3. What the axes mean, and how hard it is to know that
  4. Comparison to NMDS
  5. Comparing PCO (axes 1-2) to phylogeny
  6. Comparing morphological to molecular distance

The second paper would then look at morphospace through time.

  1. Linking the PCO morphospace to Neptune—stacked 3D plot with convex hulls
  2. Disparity and diversity through time, under range-through
  3. Ditto under sampling in-bin
  4. Ditto under subsampling by unweighted lists
  5. …?

Andy’s suggestion was actually to end the first paper with the PCO-through-time plot, and then leave the disparity analyses for the second paper. It seems more logical to me to have all of the through-time stuff in the second paper, but maybe I’m wrong in that. Also, if I do put the first PCO-through-time plot in the first paper, what does that leave for the second paper? Just looking at disparities through time in various subsampling regimes? Hmph.

Well. That’s the first sticking point. The second sticking point is the narrative, the biological hook, the meaning of it all. At this point everything I have is sheer analysis, divorced from biological meaning or hypotheses—both at the general evolutionary theory level and at the specific level of diatoms, their ecology, and what has been written about their evolution. The past few days’ reading (half-hearted though it was) has not really helped me find direction in that regard.

The third sticking point, which is closely related to the second, is what to do about the missing characters-through-time plot. I want a plot in my paper that shows some relevant characters changing in their prevalence or occurrence through time. I gave up on the idea of plotting all characters, and instead found the characters responsible for morphospace expansion in each time bin. That didn’t lead to any clear story. So then I was hunting for biological hypotheses in the literature that might be tested by looking for particular characters (see the problems in sticking point #2). So now all that’s left to be done, besides more open-ended reading in the hope of coming across something that makes sense, is to sift through the list of characters in the morphospace and hope that something jumps out at me as relevant. Yech.

Well. So much for the main sticking points. I suppose I have a couple of clear to-dos arising:

  1. Generate the morphospace “density” plot.
  2. Look through the list of characters for ones that might be biologically interesting to plot through time.

These, unfortunately, don’t address the major to-do, which is to get the damn chapters written. Possibilities: since writing isn’t working so well, try talking through what I have, if necessary on tape or to an unsuspecting victim, and hope that some sort of narrative emerges that way. Read more. Try turning my figures into a Keynote presentation and generate a narrative (or narratives) that way. That’s something I’ll have to do sooner or later anyway.

 

Chew(-ing-my-arm-off)sday

Grrrr. This is not the level of productivity I’d been hoping for. There’s a definite taste of slump in the air, and I don’t like it. Tomorrow is February, and the feeling of disappointment is hard to fend off. I was going to be done with the morphospace chapter, once and for all, but I’m still back at square one, and item one on my Listy Fifty, with a queasy feeling in the pit of my stomach about the quality and meaningfulness of my morphospace data.

What I need, of course, is to ignore this, and just forge ahead, and GRD. This is easier said than done, unfortunately.