Basics of Autocovariance, Autocorrelation and Partial Autocorrelation Explained
Hello guys,
Today I am going to explain autocovariance, autocorrelation and partial autocorrelation, and how to compute them in Python.
These concepts are essential in time series analysis (TSA), so let's start.
Before looking at autocovariance and autocorrelation, it helps to understand covariance and correlation, so let me introduce those first.
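Covariance measures how two variables vary together: it is positive when they tend to move in the same direction and negative when they tend to move in opposite directions. For n paired observations of x and y (or x1 and x2), the sample covariance can be calculated as

$$\mathrm{Cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$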
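Correlation measures how strongly, and in which direction, two variables move together: when one increases (decreases), the other tends to increase (decrease), or the opposite. E.g. salary tends to increase with experience. Correlation on its own does not show that one variable causes the other. It is the covariance scaled by the standard deviations of both variables, so it always lies between -1 and 1, and it can be calculated as

$$\mathrm{Cor}(x, y) = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}$$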
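As a quick check of these two definitions, here is a minimal sketch in NumPy. The experience and salary numbers are made up for illustration, the helper names `covariance` and `correlation` are my own, and the covariance uses the n - 1 denominator, which is an assumption about which version you want.

```python
import numpy as np

# Hypothetical toy data: years of experience and salary (in thousands).
experience = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
salary = np.array([30.0, 35.0, 41.0, 44.0, 50.0])


def covariance(x, y):
    """Sample covariance with the n - 1 denominator (assumed convention)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)


def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / np.sqrt(covariance(x, x) * covariance(y, y))


cov_xy = covariance(experience, salary)
corr_xy = correlation(experience, salary)
print("covariance:", cov_xy)
print("correlation:", corr_xy)
```

The results should agree with `np.cov(experience, salary)[0, 1]` and `np.corrcoef(experience, salary)[0, 1]`, which is an easy way to confirm the hand-written versions.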
Now let's move on to time series data.
Autocovariance in time series data.
Autocovariance.
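Autocovariance is the covariance between the present value (xt) and an earlier value of the same series: (xt) with (xt-1), (xt) with (xt-2), and so on. It is denoted by γ. If the time series is stationary, the mean does not change over time, so a single mean can be used for every term, and the formula at lag k becomes

$$\gamma_k = \frac{1}{n} \sum_{t=k+1}^{n} (x_t - \bar{x})(x_{t-k} - \bar{x})$$

(Some texts divide by n - k instead of n.)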
Autocovariance (auto means self) of (xt) and (xt-1) is therefore the covariance of a variable with its own values at a different time.
In the special case k=0, the autocovariance becomes the variance. k is called the lag.
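To make the lag idea concrete, here is a small sketch that computes the autocovariance by hand. The function name `autocovariance` and the four-value series are my own illustration, and I am assuming the version that subtracts the overall mean and divides by n.

```python
import numpy as np


def autocovariance(x, k):
    """Sample autocovariance at lag k (overall mean, 1/n factor: assumed convention)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean = x.mean()
    # Pair each value x[t] with the value k steps earlier, x[t - k].
    return np.sum((x[k:] - mean) * (x[:n - k] - mean)) / n


# Hypothetical short series, only for illustration.
series = [2.0, 4.0, 6.0, 8.0]

gamma_0 = autocovariance(series, 0)  # lag 0: the variance of the series
gamma_1 = autocovariance(series, 1)  # lag 1: x[t] against x[t - 1]
print("lag 0:", gamma_0)
print("lag 1:", gamma_1)
```

At lag 0 every value is paired with itself, so the result equals `np.var(series)`, which is the special case mentioned above.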
Calculation of autocovariance in Python
import pandas as pd
import numpy as np
import statsmodels.tsa.api as smt

# Load the data
data = pd.read_csv("https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv",index_col="Date")
data.index = pd.to_datetime(data.index)
print(data.shape)
data.head()
Output:
(3650, 1)
Date Temp
1981-01-01 20.7
1981-01-02 17.9
1981-01-03 18.8
1981-01-04 14.6
1981-01-05 15.8
Calculate the autocovariance of the Temp column:
smt.stattools.acovf(data["Temp"][:20])  # autocovariance of the first 20 observations (lags 0 to 19)
Output:
array([ 1.010510e+01, 4.408355e+00, -6.342900e-01, -1.495935e+00, -1.316180e+00, -1.443475e+00, -4.288200e-01, 7.155850e-01, 1.831690e+00, 1.372950e-01, -2.870950e+00, -3.624195e+00, -1.485090e+00, -5.879850e-01, -4.358000e-02, 8.315750e-01, 6.112800e-01, 6.655350e-01, 5.390000e-03, -3.287550e-01])
Autocorrelation.
In time series we deal with variables measured over time, such as a company's sales over the years (or temperature, ozone level, etc.), whose future values we want to predict. When predicting the future sales of a company, the past sales influence the future sales. So we look for the correlation between the present sales (xt) and the previous sales (xt-1), then between (xt) and (xt-2), (xt-3), etc. To find correlation within the same column, we use autocorrelation.
Autocorrelation is the correlation between a variable (feature) and earlier values of the same variable (in our case, the correlation between (Xt and Xt-1), (Xt and Xt-2), etc.). It is denoted by ρ.
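The autocorrelation function (ACF) of a time series is defined as

$$\rho_k = \frac{\gamma_k}{\gamma_0}$$

where γ_k is the autocovariance at lag k. Written out in terms of the observations, the autocorrelation at lag k is

$$\rho_k = \frac{\sum_{t=k+1}^{n} (x_t - \bar{x})(x_{t-k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2}$$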
In the special case k=0, the autocorrelation is 1. k is called the lag.
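Here is a minimal sketch of the same idea in code: compute the autocovariance at each lag and divide by the lag-0 value. The function name `autocorrelation` and the toy series are my own, and the 1/n autocovariance is an assumed convention (the factor cancels in the ratio anyway).

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag: autocovariance divided by variance."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dev = x - x.mean()  # deviations from the overall mean
    gamma = np.array(
        [np.sum(dev[k:] * dev[:n - k]) / n for k in range(max_lag + 1)]
    )
    return gamma / gamma[0]


# Hypothetical short series, only for illustration.
series = [2.0, 4.0, 6.0, 8.0]
rho = autocorrelation(series, 2)
print(rho)
```

The first entry is always 1 because at lag 0 the series is compared with itself, and every other entry stays between -1 and 1.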
We will now draw autocorrelation plots in Python for the same data with different numbers of lags.
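lags = [10, 50, 100, 200, 500, 1500, 3000]
for l in lags:
    # One plot per lag count; alpha=0.05 draws the 95% confidence band
    titles = "Autocorrelation with lags " + str(l)
    smt.graphics.plot_acf(data, lags=l, alpha=0.05, title=titles)
Output: one autocorrelation plot for each value in lags.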
Partial autocorrelation(PACF)
In autocorrelation we find the correlation between the present value (xt) and a lagged value such as (xt-1), including any influence passed along through the values in between. Partial autocorrelation instead finds the direct correlation between the present value (xt) and the value at a chosen lag (xt-h), so the correlation carried by the intermediate values (xt-1), (xt-2), (xt-3), …, (xt-(h-1)) is not taken into account.
The partial autocorrelation at lag k is the correlation that results after removing the effect of any correlations due to the terms at shorter lags.
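One way to see what "removing the effect of shorter lags" means is a regression sketch: regress xt on its first k lagged values and read off the coefficient on (xt-k). This is only one of several ways to estimate the PACF, so library results can differ slightly. The function name `pacf_by_regression` and the simulated AR(1) series (coefficient 0.6, fixed seed) are assumptions made for this illustration.

```python
import numpy as np


def pacf_by_regression(x, max_lag):
    """Partial autocorrelation via least squares: the coefficient on x[t - k]
    when x[t] is regressed on x[t - 1], ..., x[t - k] plus an intercept."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    values = [1.0]  # lag 0 is 1 by convention
    for k in range(1, max_lag + 1):
        # Column j holds the series lagged by j steps, aligned with x[k:].
        lagged = np.column_stack([x[k - j:n - j] for j in range(1, k + 1)])
        design = np.column_stack([np.ones(n - k), lagged])
        coefs, *_ = np.linalg.lstsq(design, x[k:], rcond=None)
        values.append(coefs[-1])  # last coefficient belongs to lag k
    return np.array(values)


# Hypothetical AR(1) series: each value depends directly only on the previous one.
rng = np.random.default_rng(0)
n_obs = 2000
noise = rng.normal(size=n_obs)
x = np.zeros(n_obs)
for t in range(1, n_obs):
    x[t] = 0.6 * x[t - 1] + noise[t]

partial = pacf_by_regression(x, 3)
print(partial)
```

For this simulated series the lag-1 value should come out close to 0.6, while lags 2 and 3 should be close to 0, because once (xt-1) is accounted for the older values add nothing new.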
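lags = [10, 50, 100, 200]
for l in lags:
    # plot_pacf (not plot_acf) draws the partial autocorrelation
    titles = "Partial Autocorrelation with lags " + str(l)
    smt.graphics.plot_pacf(data, lags=l, alpha=0.05, title=titles)
Output: one partial autocorrelation plot for each value in lags.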
I hope you really enjoyed this article — please leave your feedback and suggestions below.
Thanks for reading.
References
- machinelearningmastery
- Nallagoni Omkar(Data Scientist | ML & DL Trainer)
- Introductory Time Series with R by Paul S.P. Cowpertwait (Author), Andrew V. Metcalfe (Contributor)