TimeSeries - Time Series Analysis in Python (0.2)

2. Reference guide

2.1. TimeSeries

2.1.1. Some terminology

  • data value: a single scalar value recorded at a specific time.
  • data sample: one or more values associated with a specific time. The number of data samples in a time series is the same as the length of the time vector.

2.1.2. todo

2.1.2.1. Methods to implement in timeseries

  • addsample Add a data sample to a timeseries object.
  • append Concatenate timeseries objects in the time dimension.
  • delsample Delete a sample from a timeseries object.
  • detrend Subtract the mean or best-fit line and remove all NaNs from time-series data.
  • filter Shape frequency content of time-series data using a 1-D digital filter.
  • getinterpmethod Get the interpolation method for a timeseries object.
  • getsampleusingtime Extract data samples from an existing timeseries object into a new timeseries object based on specified start and end time values.
  • idealfilter Apply an ideal pass or notch (noncausal) filter to a timeseries object.
  • resample Select or interpolate data in a timeseries object using a new time vector.
  • setinterpmethod Set the interpolation method for a timeseries object.
  • synchronize Synchronize and resample two timeseries objects using a common time vector.
class Event(date=None, text=None, value=None, short_text=None)

Event class

>>> import datetime
>>> d1 = datetime.datetime(2000, 1, 1)
>>> event = Event(d1, "event 1", 10)
>>> event.value
10
class Events

List of events

Inherits from dict. It simply checks that each object added is an instance of Event.

Each key is taken either from event.short_text or from a counter.

Two methods remove or get elements; they are just aliases for the del and get operations of the dictionary class.

>>> import datetime
>>> d1 = datetime.datetime(2000, 1, 1)
>>> event1 = Event(d1, "event 1", 10, 'ev1')
>>> d2 = datetime.datetime(2000, 10, 1)
>>> event2 = Event(d2, "event 2", 10, 'ev2')
>>> events = Events()
>>> events.addevent(event1)
>>> events.addevent(event2)
>>> del events['ev1']
>>> events.keys()
['ev2']
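
The behaviour described above (a dict that only accepts event objects and derives each key from the short label or from a counter) can be sketched as follows. This is only an illustration under stated assumptions, not the library's implementation: the names EventSketch and EventsSketch are hypothetical, and the exact form of the counter-based key is a guess.

```python
import datetime


class EventSketch:
    """Hypothetical stand-in for Event: a date, a text, a value and a short label."""

    def __init__(self, date=None, text=None, value=None, short_text=None):
        self.date = date
        self.text = text
        self.value = value
        self.short_text = short_text


class EventsSketch(dict):
    """Hypothetical dict-based container that only accepts EventSketch instances."""

    def __init__(self):
        super().__init__()
        self._counter = 0  # used to build a key when short_text is missing

    def addevent(self, event):
        # Reject anything that is not an event object.
        if not isinstance(event, EventSketch):
            raise TypeError("expected an EventSketch instance")
        key = event.short_text
        if key is None:
            # Assumption: fall back to an integer counter as the key.
            key = self._counter
            self._counter += 1
        self[key] = event


events = EventsSketch()
events.addevent(EventSketch(datetime.datetime(2000, 1, 1), "event 1", 10, "ev1"))
events.addevent(EventSketch(datetime.datetime(2000, 10, 1), "event 2", 10, "ev2"))
events.addevent(EventSketch(datetime.datetime(2001, 1, 1), "unlabelled"))
del events["ev1"]
remaining_keys = list(events.keys())
```

Passing a plain datetime to addevent raises a TypeError in this sketch, which is why the events themselves, not their dates, have to be added.
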
addevent(event)
class TSCollections
add_timeseries(ts)
class TimeRange(start, end, num=None, step=None, frequency=None)

d1 = datetime.datetime(2010,1,1)
d2 = datetime.datetime(2010,2,1)

class TimeSeries(data, name=None, time=None, units=None, step=1, events=None, dataquality=None, interpolation=None, start=None, end=None, frequency=None)

TimeSeries stores data and time vectors.

The time vector (if provided) must have the same length as the data vector. If the time values are date strings, you must specify time as a list of date strings.

If the time vector contains duplicate values:
  • Duplicated values must occupy contiguous elements.
  • Time values must not be decreasing.
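
A possible check for these two constraints is sketched below. It assumes numeric (relative) time values, and the helper name is_valid_time_vector is hypothetical rather than part of the library. Since a non-decreasing sequence necessarily keeps equal values next to each other, a single test on consecutive differences covers both rules.

```python
import numpy as np


def is_valid_time_vector(time):
    """Return True if the time values never decrease.

    A non-decreasing vector automatically keeps duplicated values in
    contiguous positions, so this covers both constraints.
    """
    time = np.asarray(time, dtype=float)
    return bool(np.all(np.diff(time) >= 0))


ok = is_valid_time_vector([0, 1, 1, 2])    # duplicates are contiguous
bad = is_valid_time_vector([0, 1, 0, 1])   # time goes backwards
```
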

Interpolating time-series data using methods like resample and synchronize can produce different results depending on whether the input timeseries contains duplicate times.

Default: a time vector that ranges from 0 to N-1 with a 1-second interval, where N is the number of samples. In that case, the time vector is said to be relative. If start is provided, the time vector is absolute.

The attribute data wraps the input parameter data in a numpy.array, so data provides all the methods of numpy.array, for example data.mean(). In addition, attributes are provided for some of the standard descriptive statistics. So:

>>> ts = TimeSeries([1,2,0,1])
>>> ts.mean
1.0

is equivalent to:

>>> ts = TimeSeries([1,2,0,1])
>>> ts.data.mean()
1.0
>>> ts = TimeSeries([-1,1,-2,2,5])
>>> ts.time
[0,1,2,3,4]
>>> ts.mean
1.0

In addition to data and time values, you can also use the time-series object to store events, descriptive information about data and time, data quality, and the interpolation method.

Data Sample

The time range depends on which of start and end are provided:
  • if start and end are not provided, the time range is (0, N*step, N);
  • if start is provided but not end, the time range is (start, start+step*N, N);
  • if start and end are both provided, the time range is (start, end, N).
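
The three cases above can be written as a small helper. This is a sketch for numeric (relative) times only; the function name time_range_bounds is hypothetical, and the case where end is given without start is not described in the text, so the sketch rejects it.

```python
def time_range_bounds(n, step=1, start=None, end=None):
    """Return the (start, end, N) triple following the three rules above."""
    if start is None and end is None:
        return (0, n * step, n)
    if start is not None and end is None:
        return (start, start + step * n, n)
    if start is not None and end is not None:
        return (start, end, n)
    # Assumption: "end without start" is not covered by the rules, so refuse it.
    raise ValueError("end given without start is not covered by the rules")


default_range = time_range_bounds(4)
shifted_range = time_range_bounds(5, step=2, start=10)
explicit_range = time_range_bounds(3, start=1, end=7)
```
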
N

data sample size

addSample(data)
data
getData()
getDataSampleSize()
getIQR()
getMAX()
getMEAN()
getMEDIAN()
getMIN()
getN()
getSTD()
getVAR()
gettsafteratevent(label)

Create a new timeseries object by extracting the samples from an existing time series that occur after or at a specified event.

gettsatevent()

Create a new timeseries object by extracting the samples that occur at the same time as a specified event from an existing time series.

gettsbeforeatevent(label)

Create a new timeseries object by extracting the samples that occur before a specified event from an existing time series.

gettsbetweenevents(label1, label2)

Create a new timeseries object by extracting the samples that occur between two specified events from an existing time series.

import datetime
from timeseries import *
d1 = datetime.datetime(2010,1,1)
d2 = datetime.datetime(2011,1,1)
fd = FinancialData('MT.PA', d1, d2)

t1 = TimeSeries(fd.data.low, time=fd.data.date)
event1 = Event(datetime.datetime(2010, 11, 8), "event1", t1.data[10], "ev1")
event2 = Event(datetime.datetime(2010, 12, 8), "event2", t1.data[32], "ev2")
t1.events.addevent(event1)
t1.events.addevent(event2)

t1.plot()
t2 = t1.gettsbetweenevents('ev1', 'ev2')
t2.plot('xg-', keep=True)  # to not erase the previous plot

hist(bins=10)

Simple histogram using pylab.hist

iqr

Return the interquartile range (IQR) of timeseries data.

max

Return the maximum value of timeseries data.

mean

Return the mean of timeseries data.

median

Return the median of timeseries data.

min

Return the minimum value of timeseries data.

plot(*args, **kargs)

kargs: withevents (bool)

events_properties: todo

setData(data)
std

Return the standard deviation of timeseries data.

step
var

Return the variance of timeseries data.

addmonth(date)
ar(values, errors, alpha)

An autoregressive time series process has the following form:

y_t = \alpha_0 + \alpha_1 y_{t-1} + \dots + \alpha_n y_{t-n} + \epsilon_t
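
As an illustration of the recursion, the sketch below simulates such a process with numpy. It is not the library's ar() function: the name simulate_ar and its arguments are hypothetical, and values before the start of the series are assumed to be zero.

```python
import numpy as np


def simulate_ar(alpha0, alpha, errors):
    """Simulate y_t = alpha0 + sum_i alpha[i-1] * y_{t-i} + errors[t]."""
    alpha = np.asarray(alpha, dtype=float)
    errors = np.asarray(errors, dtype=float)
    y = np.zeros(len(errors))
    for t in range(len(errors)):
        # Only lags that exist are used; earlier values are treated as zero.
        for i in range(1, min(t, len(alpha)) + 1):
            y[t] += alpha[i - 1] * y[t - i]
        y[t] += alpha0 + errors[t]
    return y


rng = np.random.default_rng(0)
y = simulate_ar(0.0, [0.5, 0.2], rng.normal(0.0, 1.0, 10))
```
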

arch(values, errors, alpha)

An autoregressive conditional heteroscedastic (ARCH) process has the following form:

y_t = \sigma_t \epsilon_t

\sigma_t^2 = \alpha_0 + \alpha_1 y^2_{t-1} + \dots + \alpha_n y^2_{t-n}
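
A minimal simulation of this process is sketched below. It assumes the usual convention in which the recursion gives the conditional variance, so that \sigma_t is its square root. The name simulate_arch is hypothetical and the sketch is not the library's arch() function; values before the start of the series are taken as zero.

```python
import numpy as np


def simulate_arch(alpha0, alpha, errors):
    """Simulate y_t = sigma_t * errors[t], with
    sigma_t^2 = alpha0 + sum_i alpha[i-1] * y_{t-i}^2."""
    alpha = np.asarray(alpha, dtype=float)
    errors = np.asarray(errors, dtype=float)
    n = len(errors)
    y = np.zeros(n)
    sigma2 = np.zeros(n)  # conditional variance at each time step
    for t in range(n):
        sigma2[t] = alpha0
        # Only lags that exist are used; earlier values are treated as zero.
        for i in range(1, min(t, len(alpha)) + 1):
            sigma2[t] += alpha[i - 1] * y[t - i] ** 2
        y[t] = np.sqrt(sigma2[t]) * errors[t]
    return y, sigma2


rng = np.random.default_rng(0)
y, sigma2 = simulate_arch(1.0, [0.3, 0.2], rng.normal(0.0, 1.0, 10))
```
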

arma(values, errors, alpha, beta)

AR and MA processes can be combined to obtain an ARMA-process:

y_t = \alpha_0 + \alpha_1 y_{t-1} + \dots + \alpha_n y_{t-n} + \beta_1 \epsilon_{t-1} + \dots + \beta_m\epsilon_{t-m} + \epsilon_t

Such an ARMA time series can be created with the following code:

import numpy
n = 10
mu = 0
sig = 1
errors = numpy.random.normal(mu, sig, n)
n_ar = 3
alpha = numpy.random.uniform(0,1,n_ar)
n_ma = 2
beta = numpy.random.uniform(0,1,n_ma)
values = numpy.zeros(n)
arma(values, errors, alpha, beta)
garch(values, errors, alpha, beta)

GARCH process

An ARCH process can be extended to a generalized autoregressive conditional heteroscedastic (GARCH) process by also incorporating lagged values of the conditional variance \sigma_t^2:

y_t = \sigma_t \epsilon_t

\sigma_t^2 = \alpha_0 + \alpha_1 y^2_{t-1} + \dots + \alpha_n y^2_{t-n} + \beta_1 \sigma^2_{t-1} + \dots + \beta_m \sigma^2_{t-m}

import numpy
import math
n = 10
mu = 0
sig = 1
errors = numpy.random.normal(mu, sig, n)
n_a = 2
alpha = numpy.random.uniform(0,1,n_a)
n_b = 2
beta = numpy.random.uniform(0,1,n_b)
values = numpy.zeros(n)
sigma2 = numpy.zeros(n)
garch(values, errors, alpha, beta)
ma(values, errors, beta)

A moving average time series process has the form:

y_t = \beta_0 + \beta_1 \epsilon_{t-1} + \dots + \beta_n\epsilon_{t-n} + \epsilon_t
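
The formula can be evaluated directly on a vector of errors, as in the sketch below. The name simulate_ma is hypothetical and this is not the library's ma() function; errors before the start of the series are assumed to be zero.

```python
import numpy as np


def simulate_ma(beta0, beta, errors):
    """Compute y_t = beta0 + sum_i beta[i-1] * errors[t-i] + errors[t]."""
    beta = np.asarray(beta, dtype=float)
    errors = np.asarray(errors, dtype=float)
    n = len(errors)
    y = np.full(n, float(beta0)) + errors
    for t in range(n):
        # Only lagged errors that exist are used.
        for i in range(1, min(t, len(beta)) + 1):
            y[t] += beta[i - 1] * errors[t - i]
    return y


rng = np.random.default_rng(0)
y = simulate_ma(0.0, [0.4, 0.1], rng.normal(0.0, 1.0, 10))
```
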

timeConvertor(date)

Convert an input into a valid datetime instance.

If the input is already a datetime, it is returned unchanged. If the input is a string, the format may be one of:

  • dd-mm-yyyy
  • dd:mm:yyyy
  • yyyy:mm:dd
  • yyyy-mm-dd
  • dd/mm/yyyy
  • yyyy/mm/dd

Note that the month is always between the year and the day.

>>> d1 = timeConvertor('2000-12-31')
>>> d2 = timeConvertor('31-12-2000')
>>> d1 == d2
True
>>> d1 = timeConvertor('2000:12:31')
>>> d2 = timeConvertor('2000/12/31')
>>> d1 == d2
True
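
One way such a conversion could be written with the standard library is sketched below. The function name to_datetime_sketch is hypothetical and this is not the library's implementation; it simply tries each of the six documented formats in turn with datetime.strptime.

```python
import datetime

# The six documented layouts, expressed as strptime format strings.
_FORMATS = (
    "%d-%m-%Y", "%d:%m:%Y", "%Y:%m:%d",
    "%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d",
)


def to_datetime_sketch(date):
    """Return date unchanged if it is a datetime, else parse it as a string."""
    if isinstance(date, datetime.datetime):
        return date
    for fmt in _FORMATS:
        try:
            return datetime.datetime.strptime(date, fmt)
        except ValueError:
            continue  # try the next layout
    raise ValueError("unrecognised date format: %r" % (date,))


d1 = to_datetime_sketch("2000-12-31")
d2 = to_datetime_sketch("31-12-2000")
```

Because the year is written with four digits, a day-first string cannot be mistaken for a year-first one, so the order in which the formats are tried does not matter here.
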

2.2. Data sets

get_imcenfant_data()

imcenfant.csv

Description: a sample of children's records was entered. The children were seen during a visit in the first year of nursery school (maternelle) in 1996-1997, in schools in Bordeaux (Gironde, France). The sample consists of 152 children aged 3 or 4.

Description of the data set columns:
  • sex: f (girl) or g (boy)
  • school located in a priority zone: yes (O) or no (N)
  • weight
  • age (years)
  • age (months)
  • height (cm)

get_m30_data()

Road fatalities, with a frequency of 30 days.

Source: [Aragon2010]
get_nottem_data()
from pylab import *
from timeseries import *
ts = get_nottem_data()
ts.plot()

Source: [Aragon2010]
get_popfr_data()

French population over time.

Returns a TimeSeries instance.

Source: [Aragon2010]

2.3. Financial Data

class FinancialData(value, d1, d2)

Class to get financial data and create summary plots.

import datetime
from timeseries import FinancialData
d1 = datetime.datetime(2010,1,1)
d2 = datetime.datetime(2011,1,1)
fd = FinancialData('MT.PA', d1, d2)
fd.plot_summary()


Uses matplotlib.finance to get the data from Yahoo.

Parameters:
  • value – a valid string e.g. “google”, ‘arcelor’, ...
  • d1 – a valid datetime
  • d2 – a valid datetime
Attributes:

d1, d2, value, data

data contains the volume, open, close, low and high values.

d1
d2
data
getD1()
getD2()
getDATA()
getReturns()
getValue()
get_finance_yahoo(adjusted=True)

Uses pylab tools to get Yahoo finance data.

Parameters: adjusted (bool) – defaults to True; see the pylab documentation.
hist_returns(nbins=100)

Plot the histogram of the returns and an approximate normalised histogram.

plot_returns(i=None, f=None, log=False)

Plot the returns.

plot_summary()

Plot the open values and volumes.

plot_volume(*args, **kargs)

Plot the volume versus time

returns

Return the arithmetic returns, (close - open) / open (to be checked).
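
The formula above can be computed element-wise with numpy, as in the sketch below. The function name arithmetic_returns and the sample prices are illustrative only; whether the library uses exactly this definition is, as noted, still to be checked.

```python
import numpy as np


def arithmetic_returns(open_values, close_values):
    """Return (close - open) / open for each pair of values."""
    open_values = np.asarray(open_values, dtype=float)
    close_values = np.asarray(close_values, dtype=float)
    return (close_values - open_values) / open_values


# Made-up prices, for illustration only.
returns = arithmetic_returns([10.0, 20.0], [11.0, 19.0])
```
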

rotate_xticks(fontsize=10, rotation=0)
setD1(d1)
setD2(d2)
value