Species Diversity¶
EcoPy contains several methods for estimating species diversity:
diversity()
rarefy()
 :py:func`div_partition`
 :py:func`beta_dispersion`

diversity
(x, method='shannon', breakNA=True, num_equiv=True)¶ Calculate species diversity for every site in a site x species matrix
Parameters
 x: numpy.ndarray or pandas.DataFrame (required)
 A site x species matrix, where sites are rows and columns are species.
 method: [‘shannon’  ‘ginisimpson’  ‘simpson’  ‘dominance’  ‘spRich’  ‘even’]
shannon: Calculates Shannon’s H
where is the relative abundance of species k
ginisimpson: Calculates the GiniSimpson coefficient
simpon: Calculates Simpson’s D
dominance: Dominance index.
spRich: Species richness (# of nonzero columns)
even: Evenness of a site. Shannon’s H divided by log of species richness.
 breakNA: [True  False]
 Whether null values should halt the process. If False, then null values are removed from all calculations.
 num_equiv: [True  False]
Whether or not species diversity is returned in numberequivalents, which has better properties than raw diversity. Number equivalents are calculated as follows:
shannon:
ginisimpson:
simpson:
‘spRich’: No conversion needed.
Example
Calculate Shannon diversity of the ‘varespec’ dataset from R:
import ecopy as ep varespec = ep.load_data('varespec') shannonH = ep.diversity(varespec, 'shannon')

rarefy
(x, method='rarefy', size=None, breakNA=True)¶ Returns either rarefied species richness or draws a rarefaction curve for each row. Rarefied species richness is calculated based on the smallest sample (default) or allows userspecified sample sizes.
Parameters
 x: numpy.ndarray or pandas.DataFrame (required)
 A site x species matrix, where sites are rows and columns are species.
 method: [‘rarefy’  ‘rarecurve’]
rarefy: Returns rarefied species richness.
where N is the total number of individuals in the site, is the number of individuals of species i, and size is the sample size for rarefaction
rarecurve: Plots a rarefaction curve for each site (row). The curve is calculated as
where is the total number of species in the matrix and size ranges from 0 to the total number of individuals in each site.
Example
Calculate rarefied species richness for the BCI dataset:
import ecopy as ep BCI = ep.load_data('BCI') rareRich = ep.rarefy(BCI, 'rarefy')
Show rarefaction curves for each site:
ep.rarefy(BCI, 'rarecurve')

div_partition
(x, method='shannon', breakNA=True, weights=None)¶ Partitions diversity into alpha, beta, and gamma components. First, diversity is calculated for each site (see
diversity()
). Then, a weighted average of each diversity metric is calculated to yield an average alpha diversity. This average alpha diversity is converted to number equivalents (seediversity()
). Next, gamma diversity is calculated using the species totals from the entire matrix (i.e. summing down columns) and converted to a number equivalent . Beta diversity is then:Parameters
 x: numpy.ndarray or pandas.DataFrame (required)
 A site x species matrix, where sites are rows and columns are species.
 method: [‘shannon’  ‘ginisimpson’  ‘simpson’  ‘spRich’]
 See
diversity()
 breakNA: [True  False]
 Whether null values should halt the process. If False, then null values are removed from all calculations.
 weights: list or np.ndarray
 Weights given for each row (site). Defaults to the sum of each row divided by the sum of the matrix. This yields weights based on the number of individuals in a site for raw abundance data or equal weights for relative abundance data.
Example
Partition diversity into alpha, beta, and gamma components for the ‘varespec’ data:
import ecopy as ep varespec = ep.load_data('varespec') D_alpha, D_beta, D_gamma = ep.div_partition(varespec, 'shannon')

beta_dispersion
(X, groups, test='anova', scores=False, center='median', n_iter=99)¶ Calculate beta dispersion among groups for a given distance matrix. First, the data is subject to transformation as described in PCoA. Next, eigenvalues and eigenvectors are calculated for the transformed matrix. Eigenvectors are split into two matrices, those pertaining to nonnegative eigenvalues and those pertaining to negative eigenvalues. Next, centroids for the positive and negative eigenvector matrices are determined (using spatial_median if center=’median’). Zdistances are calculated as:
where is the squared euclidean distance between observation ij and the center of group i, and +/ denote the nonnegative and negative eigenvector matrices.
A oneway ANOVA is conducted on the zdistances.
Parameters
 X: numpy.ndarray, pandas.DataFrame
 A square, symmetric distance matrix
 groups: list, pandas.Series, pandas.DataFrame
 A column or list containing the group identification for each observation
 test: [‘anova’  ‘permute’]
 Whether significance is calculated using the ANOVA approximation or permutation of residuals.
 scores: [True  False]
 Whether or not the calculated zdistances should be returned for subsequent plotting
 center: [‘median’  ‘centroid’]
 Which central tendency should be used for calculating zdistances.
 n_iter: integer
 Number of iterations for the permutation test
Example
Conduct beta dispersion test on the ‘varespec’ dataset from R:
varespec = ep.load_data('varespec') dist = ep.distance(varespec, method='bray') groups = ['grazed']*16 + ['ungrazed']*8 ep.beta_dispersion(dist, groups, test='permute', center='median', scores=False)