Quantifying inherent predictability and spatial synchrony in the aphid vector Myzus persicae: field‐scale patterns of abundance and regional forecasting error in the UK

Abstract
Background: Sugar beet is threatened by virus yellows, a disease complex vectored by aphids that reduces sugar content. We present an analysis of Myzus persicae population dynamics with and without neonicotinoid seed treatment. We use 6 years' yellow water trap and field‐collected aphid data and two decades of 12.2 m suction‐trap aphid migration data. We investigate both spatial synchrony and forecasting error to understand the structure and spatial scale of field counts and why forecasting aphid migrants lacks accuracy. Our aim is to derive statistical parameters to inform regionwide pest management strategies.
Results: Spatial synchrony, indicating the coincident change in counts across the region over time, is rarely present and is best described as stochastic. Uniquely, early season field populations in 2019 did show spatial synchrony to 90 km compared to the overall average weekly correlation length of 23 km. However, 70% of the time series were spatially heterogeneous, indicating patchy between‐field dynamics. Field counts lacked the same seasonal trend and did not peak in the same week. Forecasts tended to under‐predict mid‐season log10 counts. A strongly negative correlation between forecasting error and the proportion of zeros was shown.
Conclusion: Field populations are unpredictable and stochastic, regardless of neonicotinoid seed treatment. This outcome presents a problem for decision‐support that cannot usefully provide a single regionwide solution. Weighted permutation entropy inferred that M. persicae 12.2 m suction‐trap time series had moderate to low intrinsic predictability. Early warning using a migration model tended to predict counts at lower levels than observed.
© 2022 The Authors. Pest Management Science published by John Wiley & Sons Ltd on behalf of Society of Chemical Industry.

There are two forms of the spline correlogram:
• Univariate: the analysis is conducted on a vector representing a single week using a step function. This step function calculates the local spatial covariance across a range of focal distances and then approximates its continuous function over the entire spatial extent as a spline. The bandwidth parameter h adjusts the smoothness of the spline based on those local distances, and is itself determined by the distance class width parameter λ. The function uses the scalar product on Euclidean distances, a 2-dimensional calculation for parallelism. The overall spatial dependence is measured by Moran's I.
• Multivariate: as above, but the analysis is conducted on a set of vectors, representing multiple sampling weeks that have identical dimensions, to estimate a cross-correlogram. These correlograms are based on first-differenced time series of abundance (i.e. using the difference between successive observations). In essence, this is a cross-product statistic, a 3-dimensional calculation that reports the size of the parallelism in Euclidean space between a variable and its spatial lags, with the variable expressed in deviations from its mean. The smooth is on the (lower.tri + diagonal) and centred (default) Pearson correlation matrix on distance. The overall spatial dependence is measured by a centred Mantel statistic.
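
The multivariate idea above can be illustrated with a minimal base R sketch. It is not the authors' analysis: the site coordinates and counts are simulated, all object names (coords, counts, pair_rho and so on) are hypothetical, and a plain smoothing spline (smooth.spline) stands in for the spline correlogram. Unlike the description above, the sketch drops the diagonal of self-correlations and does not centre the correlations.

```r
# Sketch: pairwise synchrony versus distance (base R only, simulated data).
set.seed(1)
n_sites <- 20   # hypothetical number of fields
n_weeks <- 12   # hypothetical number of sampling weeks

# Hypothetical site coordinates in km
coords <- cbind(x = runif(n_sites, 0, 100), y = runif(n_sites, 0, 100))

# Simulated weekly counts: one row per site, one column per week
counts <- matrix(rpois(n_sites * n_weeks, lambda = 5), nrow = n_sites)

# First-difference each site's series (difference between successive weeks)
d_counts <- t(apply(counts, 1, diff))   # n_sites x (n_weeks - 1)

# Pearson correlation between every pair of sites, and their Euclidean distances
rho     <- cor(t(d_counts))
dist_km <- as.matrix(dist(coords))

# Keep each pair once (lower triangle, excluding the diagonal)
keep      <- lower.tri(rho)
pair_rho  <- rho[keep]
pair_dist <- dist_km[keep]

# Regional average synchrony and a smoothed correlation-distance curve
regional_mean <- mean(pair_rho)
fit      <- smooth.spline(pair_dist, pair_rho, df = 5)
curve_at <- predict(fit, x = seq(5, 100, by = 5))
```

Because the simulated counts are independent, the smoothed curve should hover around zero at all distances; real field data with local synchrony would instead start high at short distances and decay towards the regional mean.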

Spatial Synchrony Theory: Covariance
Local covariance is moderate. It decays to regional synchrony over the entire distance.

3.
# Seasonal GAMM: spatial with temporally independent model.
# A more detailed model of the season by week using a factor-smooth approach:
# y ~ f + s(x, by = f). Separate smooth functions are fitted for each level of the factor; each is centred
# and does not include the group means. The factor f must be included separately because the
# parametric term f includes the uncentred means, which become testable.
# Duchon splines (bs = 'ds') cope better with the spatial boundaries of a model, preventing
# curling at the edges.
library(mgcv)    # provides gam()
library(gratia)  # provides draw()
spatialgam.week <- gam(Myzus ~ fWeek + s(Lon, Lat, bs = 'ds', by = fWeek, k = 32) + s(fsite, bs = "re"),
                       family = nb, method = 'REML', data = YWT.2019.gam)
draw(spatialgam.week)
summary(spatialgam.week)

S1b: Forecasting Error
• In a retrospective study, we used simple linear regression to predict the number of aphids caught in the Broom's Barn suction-trap to 17th June from average daily mean temperatures over the 59-day period 1st January to 28th February inclusive.
• Abundances were predicted for each of 2002 to 2021 in turn, with each regression including abundance and temperature data from 1965 up to the year before the prediction year (so N = 37…55 for 2002…2021, respectively). In this way, the most up-to-date aphid information was used to produce each subsequent forecast.
• All observed counts were log-transformed (base 10, after adding an offset of unity to cope with zeros) before analysis. The adjusted coefficient of determination (adjusted r²) ranged between 0.552 (2019) and 0.664 (2003).
• For each prediction year, the forecasting error was defined as the difference between the observed log count to 17th June, derived from the 12.2 m suction-trap, and the predicted log count to the same date, derived from the linear model (Figure S1a).
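
The retrospective procedure described in these bullets can be sketched in base R as follows. This is an illustration under stated assumptions rather than the authors' script: the temperature and count series are simulated, the data frame and column names (dat, temp, log_count) are hypothetical, and the error is taken as observed minus predicted, which is an assumed sign convention. The simulated series has no missing years, so the training sizes it produces need not match the N values quoted above.

```r
# Sketch: retrospective one-year-ahead forecasts with an expanding training window.
set.seed(42)
years <- 1965:2021

# Hypothetical data: January-February mean temperature (deg C) and count to 17th June
temp  <- rnorm(length(years), mean = 4, sd = 1.5)
count <- rpois(length(years), lambda = exp(1 + 0.6 * temp))
dat   <- data.frame(year = years, temp = temp,
                    log_count = log10(count + 1))  # offset of 1 copes with zeros

forecast_years <- 2002:2021
results <- do.call(rbind, lapply(forecast_years, function(yr) {
  train <- dat[dat$year < yr, ]                # all years before the forecast year
  fit   <- lm(log_count ~ temp, data = train)  # simple linear regression
  new   <- dat[dat$year == yr, ]
  pred  <- unname(predict(fit, newdata = new))
  data.frame(year      = yr,
             n_train   = nrow(train),
             intercept = unname(coef(fit)[1]),
             slope     = unname(coef(fit)[2]),
             adj_r2    = summary(fit)$adj.r.squared,
             observed  = new$log_count,
             predicted = pred,
             error     = new$log_count - pred)  # assumed sign: observed minus predicted
}))
```

Under this sign convention a positive error means the model under-predicted the observed log count. Each row of results holds the same kind of quantities as Table S1a (intercept, slope, observed, predicted, error) for one forecast year.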

FIGURE S1a. [Figure not reproduced; axis label: Predicted]

S1c Permutation Entropy
• Entropy has a perfect inverse relationship with knowledge: the more knowledge there is, the lower the entropy and the easier it is to produce a prediction for the system.
• Permutation entropy (PE) is a type of entropy measure.
• It is a model-free method that approximates the rate at which new information is generated along a time series and how this is transmitted from past states to the present.
• Define a window length that slides along a time series.
• In our example, the window length (m) is short (3 days); a numerical phrase is sought using ordinal ranks.
• The window then slides down by one observation to the next phrase ….. x3, x1, x2, which yields the number sequence 3,1,2, and so on.
• The frequency distribution of these phrases then allows a measure of the stochastic/deterministic components.
• In R, the statcomp package (library(statcomp)) was used to compute unweighted and weighted PE; the weighted form used the function weighted_ordinal_pattern_distribution(x = x, ndemb = 3).
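
The sliding-window procedure in these bullets can be illustrated with a short base R sketch. It does not reproduce the statcomp implementation: the helper perm_entropy() is hypothetical, ties are broken by position (the default behaviour of order()), the weights are window variances, and the result is normalised by log(m!) so that it lies between 0 and 1. These choices are assumptions made for illustration.

```r
# Sketch: unweighted and weighted permutation entropy (base R, window length m = 3).
perm_entropy <- function(x, m = 3, weighted = FALSE) {
  n_win <- length(x) - m + 1
  # Each row is one window ("phrase") of m consecutive observations
  windows <- t(sapply(seq_len(n_win), function(i) x[i:(i + m - 1)]))
  # Ordinal pattern of each window: the ordering of its values, stored as a label
  patterns <- apply(windows, 1, function(v) paste(order(v), collapse = ""))
  # Weight each window by its variance (weighted PE) or equally (unweighted PE)
  w <- if (weighted) {
    apply(windows, 1, function(v) mean((v - mean(v))^2))
  } else {
    rep(1, n_win)
  }
  # Relative (weighted) frequency of each observed pattern
  p <- tapply(w, patterns, sum) / sum(w)
  p <- p[p > 0]
  # Shannon entropy, normalised by log(m!) so the result lies in [0, 1]
  -sum(p * log(p)) / log(factorial(m))
}

set.seed(7)
noise <- rnorm(500)   # white noise: all patterns roughly equally frequent
trend <- 1:500        # monotone series: only one pattern ever occurs

pe_noise  <- perm_entropy(noise)
wpe_noise <- perm_entropy(noise, weighted = TRUE)
pe_trend  <- perm_entropy(trend)
```

Values near 1 (the white-noise series) indicate low intrinsic predictability, while values near 0 (the monotone series) indicate a highly deterministic signal. Weighting by window variance reduces the influence of patterns formed by very small fluctuations.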
FIGURE S1b. Accumulation of M. persicae caught at Broom's Barn between late March and end August each year from 2002-2021.

TABLE S1a. Summary of results from the retrospective forecasting error study. Observed and predicted counts of M. persicae to 17th June are on the log10 scale (after adding an offset of unity). Forecast error is the difference between the observed and predicted logged counts. Intercept and slope are the parameters of simple linear regressions relating historical M. persicae counts to 17th June to average January-February daily mean temperatures.