Sebastián Pardo bio photo

Sebastián Pardo

Email Twitter LinkedIn Github   ORCID   ResearchGate   Google Scholar

The idea behind this post was born after a conversation with Simon Valdez-Juarez, a fellow graduate student at SFU. He mentioned an idea he had for creating a Shiny web app that allows people to see the evolutionary distinctiveness of bird species in their eBird checklists. Given that I have played around with wrangling eBird checklists to display summary stats of sightings by location and year (the myebird package, see a previous blog post about it here), we decided to join forces to create this website. The functions introduced below will be the backbone of such a project.

What is ED?

Evolutionary Distinctiveness (ED) is a measure of species’ relative contributions to phylogenetic diversity. It is calculated by dividing the total phylogenetic diversity of a clade among all its members. David Redding, a former PhD student at SFU, came up with the idea of ED for his M.Sc. thesis at the University of East Anglia. Simply put, the ED score of a species is the sum of all the branch lengths to the root of the tree, each divided by the number of species each branch section is shared among. I’ll illustrate this using four hypothetical species (A, B, C, and D). For each branch, the number above represents the length of the branch (in million years), while the number below the branch shows the number of species that share that branch, so ED is the sum of all these fractions from the root of the tree to each tip (Figure is adapted from Isaac et al. [2007]):

ED is a pretty straightforward metric that can help identify individual (perhaps flagship?) species of high evolutionary distinctiveness.This metric can be combined with a threat score based on the Red List status compiled by the International Union for the Conservation of Nature (IUCN). The Red List status of each species can be converted to a quantitative measure by converting qualitative status into ordered integers, in this case denoted as GE: thus a species assess as “Least Concern” would have a GE score of 0, a “Near Threatened” species would be scored as a 1, and so on up to a species assessed as “Critically Endangered”, which would have a score of 4. Thus, by using Evolutionary Distinctiveness (ED) and Global IUCN Red List status (GE) you can create a composite score of Evolutionary Distinctive and Globally Endangered (EDGE) species. The one proposed by Isaac et al. (2007) and subsequently used by Jetz et al. (2014) on birds is as follows:

\[EDGE = ln(ED + 1) + GE * ln(2)\]

The data from Jetz et al. (2014) include a bird phylogeny covering almost 10,000 species (available here) and their associated EDGE scores and ranks (available here); these data are the basis for these R functions and I would not have been able to create them without their work.

The myedge and mypd functions

To being playing with bird trees and your eBird data, you first have to install the myebird package from GitHub using devtools::install_github and load it in R:

devtools::install_github("sebpardo/myebird")
library(dplyr) # for piping and data wrangling
library(myebird)
library(ggplot2)

There are three functions in this package that I’ve already written about here (ebirdclean, myebird, and myebirdcumul). I have added two new functions: myedge and mypd, as well as a dataset named edge.

To use these functions, we first need to download all our data from eBird: an email with a download link will be sent to you after pressing “Sumbit” in the following eBird link (make sure you’re logged in to eBird!): http://ebird.org/ebird/downloadMyData. We then clean this csv file using the ebirdclean function:

mybirds <- ebirdclean("MyEBirdData.csv")

To obtain ED scores as well as EDGE scores and ranks of the species included in your eBird checklist, you can simply use the myedge function, whose first argument is the output provided by ebirdclean, and the second argument is edge.cutoff, which is the cutoff for species to be shown in terms of ranking (defaults to returning all species):

myedgebirds <-myedge(mybirds, edge.cutoff = 500)
myedgebirds %>% select(-sciName)
## # A tibble: 12 × 5
##                     comName  ED.Score EDGE.Score EDGE.Rank
##                       <chr>     <dbl>      <dbl>     <int>
## 1    Peruvian Diving-Petrel 28.418480   5.461065        29
## 2  Australian Painted-Snipe 26.901620   5.408126        32
## 3             Andean Condor 43.702267   4.493171       228
## 4          Marbled Murrelet  8.815285   4.363382       277
## 5       Hutton's Shearwater  8.341314   4.313888       293
## 6           Andean Flamingo 15.887414   4.212863       339
## 7              Magpie Goose 62.675689   4.153803       372
## 8                    Osprey 56.063247   4.044160       431
## 9                Great Knot 12.885775   4.017159       451
## 10         Beach Thick-knee 26.631221   4.012094       459
## 11   Yellow-nosed Albatross  5.730249   3.986054       471
## 12        Providence Petrel 11.894030   3.943059       500
## # ... with 1 more variables: sciName.edge <chr>

We can also easily subset to see the ED and EDGE ranks and scores for a single country or even region:

mybirds %>%
  filter(Country == "Chile") %>%
  myedge(edge.cutoff = 1000) %>%
  select(-sciName)
## # A tibble: 15 × 5
##                         comName  ED.Score EDGE.Score EDGE.Rank
##                           <chr>     <dbl>      <dbl>     <int>
## 1        Peruvian Diving-Petrel 28.418480   5.461065        29
## 2                 Andean Condor 43.702267   4.493171       228
## 3               Andean Flamingo 15.887414   4.212863       339
## 4        Black-browed Albatross  5.340944   3.926469       509
## 5     Diademed Sandpiper-Plover 23.787857   3.903501       525
## 6              Peruvian Pelican 22.821709   3.863744       553
## 7               Westland Petrel 10.252474   3.806882       599
## 8              Chilean Flamingo 20.040577   3.739600       657
## 9          White-chinned Petrel  8.965589   3.685432       704
## 10             Humboldt Penguin  8.335894   3.620161       754
## 11            Pied-billed Grebe 35.996441   3.610822       763
## 12 South American Painted-Snipe 34.544290   3.570780       805
## 13                Black Vulture 33.390317   3.537775       858
## 14             James's Flamingo 15.887414   3.519716       875
## 15              Royal Albatross  7.167664   3.486477       931
## # ... with 1 more variables: sciName.edge <chr>
mybirds %>%
  filter(County == "Metro Vancouver") %>%
  myedge(edge.cutoff = 1000) %>%
  select(-sciName)
## # A tibble: 6 × 5
##                  comName  ED.Score EDGE.Score EDGE.Rank
##                    <chr>     <dbl>      <dbl>     <int>
## 1       Marbled Murrelet  8.815285   4.363382       277
## 2                 Osprey 56.063247   4.044160       431
## 3      Red-throated Loon 49.484481   3.921666       511
## 4      Pied-billed Grebe 35.996441   3.610822       763
## 5       Long-tailed Duck  8.053587   3.589455       786
## 6 Golden-crowned Kinglet 32.860078   3.522237       873
## # ... with 1 more variables: sciName.edge <chr>

myedge also can provide a separate data frame for each country when used in conjunction with dplyr::do:

mybirds %>% 
  group_by(Country) %>% 
  do(myedge(., edge.cutoff = 500)) %>%
  select(-sciName)
## Source: local data frame [17 x 6]
## Groups: Country [6]
## 
##                     Country                  comName  ED.Score EDGE.Score
##                       <chr>                    <chr>     <dbl>      <dbl>
## 1                 Australia Australian Painted-Snipe 26.901620   5.408126
## 2                 Australia      Hutton's Shearwater  8.341314   4.313888
## 3                 Australia             Magpie Goose 62.675689   4.153803
## 4                 Australia                   Osprey 56.063247   4.044160
## 5                 Australia               Great Knot 12.885775   4.017159
## 6                 Australia         Beach Thick-knee 26.631221   4.012094
## 7                 Australia   Yellow-nosed Albatross  5.730249   3.986054
## 8                 Australia        Providence Petrel 11.894030   3.943059
## 9                    Canada         Marbled Murrelet  8.815285   4.363382
## 10                   Canada                   Osprey 56.063247   4.044160
## 11                    Chile   Peruvian Diving-Petrel 28.418480   5.461065
## 12                    Chile            Andean Condor 43.702267   4.493171
## 13                    Chile          Andean Flamingo 15.887414   4.212863
## 14               Costa Rica                   Osprey 56.063247   4.044160
## 15                Nicaragua                   Osprey 56.063247   4.044160
## 16 United States of America         Marbled Murrelet  8.815285   4.363382
## 17 United States of America                   Osprey 56.063247   4.044160
## # ... with 2 more variables: EDGE.Rank <int>, sciName.edge <chr>

Problems with ED

As it has been previously pointed out, there are some issues with composite indices of evolutionary distinctiveness. One important problem with ED estimates is that they can be very sensitive to splits or lumps at the species level. Let’s take the hypothetical example presented in the tree above, but this time using a taxonomy that considers species A to actually be three separate species (A, E, and F) which split around 0.2 million years (MY) before present:

Using this taxonomy, the ED score of species A drastically drops, even though it is still nested in a very long branch. One way around this is to change the focus from scoring individual species (which is sexy) and instead look at the phylogenetic diversity of a community of species (not as sexy).

Phylogenetic Diversity

One neat way to estimate phylogenetic diversity of a clade is to simply add all branches of a tree; this represents the total evolutionary time covered by the species in the clade/tree. Again with our hypothetical four species, if we wanted to calculate PD for species B and C:

Our estimate of PD for species B and C is 4 MY of evolution. Now, if we also include species D, our estimate of PD will increase only slightly as species C and D are closely related.

On the other hand, if we include species A instead of species D, the phylogenetic diversity of this group will be much higher, as including species A means a much longer branch of evolutionary time is now being added:

The function mypd does exactly this: it calculates the phylogenetic distance of the species in your eBird checklist based on the Jetz et al. phylogeny. It is very straightforward to run, however it averages values across 20 very large trees, and therefore is a bit slow to run (shouldn’t take more than a minute though):

mypdvalues <- mypd(mybirds)
mypdvalues
## # A tibble: 1 × 3
##    mean_pd median_pd    sd_pd
##      <dbl>     <dbl>    <dbl>
## 1 20917.93  20755.27 1247.857

This means that the combined phylogenetic distance of the species recorded in all my eBird checklists encompasses 20.8 billion years of evolution!

As with the myedge function, you can look at PD values by country:

mybirds %>%
  group_by(Country) %>%
  do(mypd(.))
## Source: local data frame [16 x 4]
## Groups: Country [16]
## 
##                             Country   mean_pd median_pd     sd_pd
##                               <chr>     <dbl>     <dbl>     <dbl>
## 1                         Argentina 2426.8959 2403.6887 121.87432
## 2                         Australia 7679.9299 7596.6293 466.10298
## 3                            Canada 6226.8165 6172.2301 357.44688
## 4                             Chile 5746.2029 5813.5036 277.16189
## 5                             China 1615.5454 1605.0135 104.20097
## 6                        Costa Rica 5331.2497 5273.0615 323.00325
## 7                           Estonia  782.8760  776.5314  55.08342
## 8                             Italy 2097.5834 2085.1940 133.16929
## 9  Lao People's Democratic Republic  576.1632  573.7042  39.61243
## 10                         Malaysia 1473.7752 1458.2762  94.28255
## 11                      Netherlands 2495.7160 2466.9962 140.25366
## 12                        Nicaragua 1569.0076 1567.9522  96.66135
## 13                        Singapore 1615.6165 1592.0610 108.28689
## 14                         Thailand 3207.9384 3154.9717 242.42609
## 15         United States of America 6125.7561 6052.1918 336.01728
## 16                          Vanuatu 1282.8242 1281.9231  80.43118

and even by county/region within a state or province (in this example, regions within British Columbia):

mybirds %>%
  filter(State.Province == "CA-BC") %>%
  group_by(County) %>%
  do(mypd(.))
## Source: local data frame [15 x 4]
## Groups: County [15]
## 
##                    County   mean_pd median_pd     sd_pd
##                     <chr>     <dbl>     <dbl>     <dbl>
## 1       Alberni-Clayoquot 1151.9561 1140.2276  63.73931
## 2                 Capital 2981.2589 2970.1906 160.05197
## 3        Central Okanagan 2367.7571 2347.5893 142.14835
## 4        Columbia-Shuswap 1442.1585 1444.1457  70.47375
## 5        Comox-Strathcona 1068.2720 1066.9558  44.22091
## 6           Fraser Valley 2447.5208 2436.1617 134.31824
## 7       Kootenay Boundary 1797.7464 1768.6904 116.87098
## 8         Metro Vancouver 5505.4194 5458.0431 315.90021
## 9        Mount Waddington 1232.4101 1223.7951  71.03491
## 10                Nanaimo  768.0895  762.9666  43.55813
## 11   Okanagan-Similkameen 3934.0560 3907.4926 234.13345
## 12 Skeena-Queen Charlotte  354.8324  350.6889  19.17975
## 13      Squamish-Lillooet 1825.5306 1803.6692 114.35689
## 14         Sunshine Coast 1528.9859 1511.2885  77.00035
## 15        Thompson-Nicola 2503.5555 2483.0836 147.10740

Are ED and PD correlated?

Arne Mooers, who is a co-author of the bird tree paper, mentioned to me that there is often a strong correlation between ED and PD across species lists. With these two functions, it is easy to look at the relationship between PD and sum of EDs for a bunch of different lists, for example, ED and PD values by country:

# calculating ED by country
edcountry <- mybirds %>% 
  group_by(Country) %>% 
  do(myedge(.)) %>%
  select(-sciName) %>%
  summarise(EDs = sum(ED.Score), n = n())

# calculating PD by country
pdcountry <- mybirds %>%
  group_by(Country) %>%
  do(mypd(.))

edpdcountry <- left_join(edcountry, pdcountry, by = "Country")

# fitting linear model and extracting coefficients to print on plot
m <- lm(EDs ~ median_pd, edpdcountry)
rout <- list(paste('Fitted model: ', round(coef(m)[1], 3), ' + ',
                               round(coef(m)[2], 3), '* PD', sep = ''),
              paste('R^2 == ', round(summary(m)[['r.squared']], 3),
                    sep = '')) 

ggplot(data = edpdcountry, aes(median_pd, EDs, size = n)) + 
  geom_point(color = "grey60") +
  ggtitle("ED and PD by country") +
  geom_smooth(method = "lm", show.legend = FALSE) +
    geom_text(aes(label = Country), size = 4, 
            nudge_y = -100, check_overlap = TRUE) +
  geom_label(aes(x = 500, y = 3000, label = rout[[1]]), hjust = 0,
             size = 5,  inherit.aes = FALSE) +
  geom_label(aes(x = 500, y = 2600, label = rout[[2]]), hjust = 0, size = 5,
            inherit.aes = FALSE, parse = TRUE) 

At least in the example above, PD and ED are strongly correlated, with an \(R^2\) of 0.98 and a coefficient of 0.45, which indicates that the sum of ED scores for a group of species is roughly half of the PD covered by that same group.


Lastly, the edge object provides a data frame that is useful for the two previous functions as it links the Jetz et al. (2014) phylogeny with your eBird checklist. You can peruse it independently, as it shows updated scientific names, common names, and EDGE scores and ranks as reported by the Jetz et al. paper:

edge
## # A tibble: 9,993 × 6
##                    sciName                       comName ED.Score
##                      <chr>                         <chr>    <dbl>
## 1       Pseudibis gigantea                    Giant Ibis 56.62592
## 2        Aegotheles savesi New Caledonian Owlet-nightjar 40.26083
## 3  Gymnogyps californianus             California Condor 33.39032
## 4      Strigops habroptila                        Kakapo 32.37154
## 5      Rhynochetos jubatus                          Kagu 55.38186
## 6  Houbaropsis bengalensis               Bengal Florican 24.27388
## 7          Athene blewitti                  Forest Owlet 24.08449
## 8    Pithecophaga jefferyi              Philippine Eagle 23.63732
## 9         Fregata andrewsi  Christmas Island Frigatebird 22.67818
## 10     Carpococcyx viridis        Sumatran Ground-cuckoo 21.15445
## # ... with 9,983 more rows, and 3 more variables: EDGE.Score <dbl>,
## #   EDGE.Rank <int>, sciName.edge <chr>

The species with the highest EDGE score is the Giant Ibis, which is classified as Critically Endangered by the IUCN (GE = 4) and also has the highest ED score among all Critically Endangered bird species.

Important note: There are many discrepancies between the taxonomy used in the Jetz paper and that one used in eBird; the latter even gets updated yearly! I’ve wrangled both species lists to try and match as many species as possible (you can see the code here), however the coverage is not perfect as there are many species which have been recently split and can’t be matched either by common or scientific name. Unfortunately, any species in your eBird checklists that can’t be matched to the Jetz phylogeny are ignored by myedge and mypd, yet there isn’t many of them (~50 out of 1300+ species in my checklist) however, this shouldn’t affect (too much) the species with highest EDGE scores or considerably bias PD estimates.

The Giant Ibis (Thamantibis gigantea or Pseudibis gigantea, depending on who you ask) is the bird species with the highest EDGE score. Illustration by Henrik Grönvold - 1911 / Public Domain

Bugs/Problems?

This package is still a work in progress, so if you find any issues/bugs/problems please let me know. There are a few more features I’d like to add (e.g., graphing functions, plotting location of sightings on a map, to name a few); but any ideas are welcome!

References

Isaac, N. J. B., Turvey, S. T., Collen, B., Waterman, C. & Baillie, J. E. M. (2007) Mammals on the EDGE: Conservation Priorities Based on Threat and Phylogeny. PLOS ONE, 2, e296. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000296

Jetz, W., Thomas, G. H., Joy, J. B., Redding, D. W., Hartmann, K. & Mooers, A. O. (2014) Global Distribution and Conservation of Evolutionary Distinctness in Birds. Current Biology, 24, 919–930. http://www.cell.com/current-biology/fulltext/S0960-9822(14)00270-X