March Madness Midpoint

Zoiks. Best not to think about it too much.

In spite of being on the mend, still felt pretty grotty and exhausted in the morning, and consequently didn’t arrive at the office until well after 10 am. Fired off emails to Scott Edwards and Zoe—setting up a meeting with the latter for Friday morning (the former is out of the office for a while).

In the afternoon, helped Wil with a stats problem. He was trying to do a nonlinear regression in Excel (a nightmare), and I was able to figure it out in R in less than an hour. I was quite proud of myself and felt very useful—once again, it’s proving an invaluable skill to have under my belt. I did hesitate for a moment, thinking I shouldn’t take more time away from work, but since I was mostly stewing about how to write up my chapters, I figured I wasn’t that productive anyway, and that model fitting was something I really ought to know how to do in R. I promised myself I’d take no longer than an hour and then give up.  That turned out to be ample time.

I spent the rest of the working day reading two of Mike Foote’s papers, one on “Rarefaction analysis of morphological and taxonomic diversity”, which I ended up not concentrating on all too deeply, and the other on the Paleozoic crinoid morphospace, which I read with much more attention than I had given it before. Some thoughts:

  • The rarefaction paper does classical rarefaction (none of the fancier Alroy algorithms, which I suppose post-date that paper); what’s more, it rarefies by species, not by occurrence, because Foote isn’t working off of an occurrence-level, PBDB-type database. This is good news for me: I think my study might be (?) the first time someone’s actually populating a morphospace with occurrence-level data in the time dimension. That allows me to subsample/rarefy by occurrence.
  • The 1995 crinoid morphospace paper is full of cross-references to other Foote papers on the crinoid morphospace. Basically, his crinoid morphospace project is a ~5-paper monster split into bits. That makes it really, really hard to read: it’s tricky to understand one paper without having read all the others. I realized in the reading that I am veering in that direction with my paper, and it’s something I’d prefer to avoid. What’s the point of doing all that detailed work if nobody’s going to understand it because it’s presented in a scattered and opaque manner?
  • His work is almost all at the abstract, morphospace/disparity/diversity meta-level. Rarely does biology, function, or phylogeny enter into the discussion. This is, on the one hand, heartening—if he can do it, so can I. But it’s also unsatisfying.
  • Where he does touch on biology: early on, he sets up the idea of “morphological constraint” in crinoids, i.e. the idea that crinoids early on hit some sort of intrinsic limit on morphological diversity, and then subsequently just evolve about in a constrained space. This acts as a straw man of sorts for developing his arcane mathematical manipulations of the morphospace data.
  • In toto, I’m not entirely enamored of using Foote’s paper(s) as a model for my own.
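
To make the species-versus-occurrence distinction concrete, here is a minimal sketch of classical rarefaction by occurrence. It is written in Python for illustration only (the actual analysis was done in R), and the function name, the toy occurrence list, and the quotas are all hypothetical. Each entry stands for one occurrence labelled by genus; the function draws a fixed quota of occurrences without replacement and averages the number of distinct genera over many draws.

```python
import random
from statistics import mean


def rarefy_by_occurrence(occurrences, quota, n_iter=200, seed=0):
    """Mean number of distinct taxa in `quota` occurrences drawn without replacement."""
    if quota > len(occurrences):
        raise ValueError("quota exceeds the number of occurrences in the bin")
    rng = random.Random(seed)  # seeded so repeated runs give the same draws
    richness = [len(set(rng.sample(occurrences, quota))) for _ in range(n_iter)]
    return mean(richness)


# Hypothetical time bin: one entry per occurrence, labelled by genus.
occurrences = ["A"] * 50 + ["B"] * 30 + ["C"] * 15 + ["D"] * 4 + ["E"]

for quota in (5, 20, 100):
    print(quota, rarefy_by_occurrence(occurrences, quota))
```

Rarefying by species would instead treat each species as a single item to draw, which throws away the abundance information that an occurrence-level dataset carries.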

March Madness, Day 2

Gave Kati a ride to work and planted myself in the shiny new law school building in the morning. I had written a note to myself last night when I stopped working (at about 10:30, a March-madness-worthy hour) with three tasks for the start of my next day, but of course I forgot to bring that in the rush of the morning. Perhaps I can remember:

  1. Fix the genus richness plot to remove bins with zero diversity (replace with NA).
  2. Implement the species-level diversity calculation for UW.
  3. Add code to the genus-level morphospace subsampling to keep track of genus diversity.

Following this list, I pretty much nailed the morphospace subsampling exercise this morning, with only moments to spare before my battery failed at the law school and I migrated to the office. The subsampling takes a while to run, perhaps long enough to warrant running it on the cluster. 100 iterations of the morphospace subsampling and 200 iterations of the species diversity run for a couple of minutes. To get a really reasonable result I guess I should probably run it for something on the order of 10,000 iterations. That would be a hundred times a couple of minutes, in other words, on the order of hours. Even 1000 minutes is just under 17 hours, which is just fine by me in terms of computing time on my laptop; I can just have it running while I sleep.
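
The back-of-the-envelope runtime estimate can be checked with a few lines of Python. This is only a sketch: it assumes the runtime scales linearly with the number of iterations, and the baseline timings of 2, 5 and 10 minutes per 100 iterations are guesses at what "a couple of minutes" might mean, not measurements.

```python
def scaled_runtime_minutes(base_minutes, base_iterations, target_iterations):
    """Estimated runtime, assuming time grows linearly with iteration count."""
    return base_minutes * target_iterations / base_iterations


# Hypothetical baselines: minutes needed for 100 iterations.
for base in (2, 5, 10):
    minutes = scaled_runtime_minutes(base, 100, 10_000)
    print(f"{base} min per 100 iterations: {minutes:.0f} min, or {minutes / 60:.1f} h")
```

Even the most pessimistic of these baselines stays within an overnight run.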

I really should, however, plot error bars on my subsampled data to give a sense of the range of variation in subsamples. Spent some time wondering about what a 95% confidence interval meant for these things until Tinker explained that it was literally just the range in which 95% of the subsamples fall, which is so obvious it’s embarrassing I even had to ask. Anyway, that makes it fairly easy to plot the error bars, I think. Well, so I thought. It ended up taking the rest of the afternoon, until 4:15, but I finally have a plot of my diversity/disparity measures as calculated for UW subsamples (100 morphospace, 200 species diversity replicates).
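
The interval Tinker described, the range containing the central 95% of the subsampled values, can be sketched as follows. This is an illustrative Python version (the real plots were made in R); the function name and the simulated replicate values are hypothetical, and the simple floor-index rule used here gives only approximate bounds when the number of replicates is small.

```python
import random


def percentile_interval(values, coverage=0.95):
    """Return (low, high) bounds that enclose roughly the central `coverage` share of values."""
    if not values:
        raise ValueError("need at least one subsampled value")
    ordered = sorted(values)
    n = len(ordered)
    tail = (1.0 - coverage) / 2.0  # share trimmed from each end
    lo_idx = int(tail * (n - 1))   # floor of the lower cut position
    hi_idx = n - 1 - lo_idx        # mirror image at the upper end
    return ordered[lo_idx], ordered[hi_idx]


# Hypothetical stand-in for 200 subsampled richness values in one time bin.
rng = random.Random(1)
replicates = [rng.gauss(30, 4) for _ in range(200)]
low, high = percentile_interval(replicates)
print(f"error bar for this bin: {low:.1f} to {high:.1f}")
```

Repeating this for each time bin gives the lower and upper ends of the error bars.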

Now, alas, I have to stop—I promised the museum I would volunteer at tomorrow’s paleo fest and I need to spend the rest of the hour before set-up starts reviewing my paleobotany notes so that I’m not completely and utterly unprepared when it comes to talking about the plant fossils I’ll be standing behind. All in all, a busy and successful day. One more plot out of the way (I think—I could try and make confidence intervals for the other two panels, but it’s been so much damn work I’m not going to do it unless someone forces me to).

March Madness Begins: Working Towards a Subsampling Solution

From the beginning of the day:

A fresh day, a fresh start. Dark grey skies and heavy “wintery mix” falling outside: perfect conditions to hole up and work hard. Where to start today? I’m torn between two things. One is the rarefaction/UW subsampling, so that I can finish off the disparity-diversity comparison plots. The other is the “all characters” through time plot, along with the associated exercises of picking a few characters and figuring out which characters are actually responsible for the expansion in morphospace area, which would address the looming question of what the biological story might be. The latter is more important in some ways, I think, but it also has the most potential for going off the rails (in the sense that I’m going to need to do a fair bit of reading). So let me start with the subsampling, and get it done, and then move on to the character-specific story.

From the end of the day:

A very brief report on what I’ve been working on today, because there are no results yet, though I am getting close. On account of the gruesome weather and a desire to focus, I stayed at home, and it was a good day. Put in a very decent effort and made some serious headway in adapting the existing code for generating the diversity/disparity plot so that it will perform by-list, unweighted subsampling for the morphospace. The whole thing isn’t working yet, but a tantalizing glimpse of the subsampled values for convex hulls and alpha volumes looked like the curve flattens out completely under subsampling, suggesting that the increase over time might be an artifact of sampling.
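
For reference, the general shape of by-list, unweighted subsampling applied to a morphospace can be sketched like this. It is a Python illustration under several assumptions, not the R code described above: the morphospace is reduced to two axes, disparity is measured only as convex hull area (alpha volumes are left out), and the genus coordinates, collection lists, and quota are invented. Each replicate draws a fixed number of collection lists without replacement, regardless of their length, pools the genera on them, and records both genus count and hull area.

```python
import random


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_area(points):
    """Area of the 2-D convex hull (shoelace formula); 0.0 if the hull is degenerate."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    twice_area = 0.0
    for i, (x1, y1) in enumerate(hull):
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def subsample_by_list(lists, coords, quota, n_iter=100, seed=0):
    """Draw `quota` whole lists per replicate; return (genus count, hull area) pairs."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_iter):
        drawn = rng.sample(lists, quota)  # lists are drawn unweighted by length
        genera = set().union(*drawn)      # pool the genera on the drawn lists
        area = hull_area([coords[g] for g in genera])
        results.append((len(genera), area))
    return results


# Hypothetical time bin: genus positions in a 2-D morphospace and collection lists.
coords = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (2.0, 2.0), "D": (0.0, 2.0), "E": (1.0, 1.0)}
lists = [["A", "E"], ["B", "E"], ["C"], ["D", "E"], ["A", "B"], ["E"]]

for n_genera, area in subsample_by_list(lists, coords, quota=3, n_iter=5):
    print(n_genera, area)
```

Keeping the genus count alongside the hull area in each replicate is what makes it possible to compare subsampled diversity and disparity on the same footing.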