fit.ipynb

by Joe Hahn, jhahn@spacescience.org, 17 January 2014.

To execute this ipython notebook, use: ipython notebook --pylab inline. To generate an html version, use: ipython nbconvert --to html fit.ipynb

The following fits a model to the client's business data. That data is synthetic and was generated via Monte Carlo methods (see mc_model.py), with that code's output used as input here. In the model, the client produces 'widgets' that are made available for sale online, and customers then order them at later dates. The probability (per unit time) that a customer orders a widget is assumed to vary as lambda(dt) = lambda_0 x ((dt/tau)**n) x exp(-dt/tau), where dt is the time since the widget's creation and the model parameters are: scale factor lambda_0, timescale tau over which customer interest in the widget wanes, and power-law exponent n. This code uses Python's lmfit library to optimize the values of lambda_0, tau, and n to obtain the best model fit to the data, and it also assesses the uncertainties in those parameters.
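The assumed order-rate model can be sketched directly; the following is a minimal illustration (the helper name order_rate and the parameter values are hypothetical, chosen only to show the shape of lambda(dt)).

```python
import numpy as np

# Sketch of the assumed order-rate model:
#     lambda(dt) = lambda_0 * (dt/tau)**n * exp(-dt/tau)
# where dt is the time (in days) since the widget's creation.
def order_rate(dt, lambda_0, tau, n):
    x = dt / tau
    return lambda_0 * (x ** n) * np.exp(-x)

# Illustrative (not fitted) parameter values; the rate rises from zero
# and peaks at dt = n*tau before customer interest wanes exponentially.
dt = np.arange(0.0, 5.0)
rate = order_rate(dt, lambda_0=0.01, tau=12.0, n=2.5)
```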

Import the modules to be used below.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
import lmfit

This function will be called by lmfit.minimize(), and it calculates the difference between the real and simulated number of orders.

In [2]:
def min_fn(params, t, n_orders_hist, n_widgets_hist):
    """Residuals between the observed and modeled daily order counts;
    called by lmfit.minimize()."""
    L0 = params['L0'].value
    n = params['n'].value
    tau = params['tau'].value
    #order-rate kernel lambda(t) = L0*(t/tau)**n*exp(-t/tau)
    x = t/tau
    L = L0*(x**n)*np.exp(-x)
    #convolve the kernel with the widget-production history
    Nt = t.size
    n_orders_model = np.zeros(Nt)
    dt = t[1] - t[0]
    for i in range(Nt):
        for j in range(i+1):
            n_orders_model[i] += L[i-j]*n_widgets_hist[j]*dt
    return n_orders_model - n_orders_hist
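The double loop above is a discrete convolution of the order-rate kernel L with the widget-production history, so an equivalent (and much faster) vectorized form is possible; the sketch below uses a hypothetical helper name, orders_model, and is not part of the original code.

```python
import numpy as np

# Vectorized equivalent of the double loop in min_fn: the modeled order
# count is the convolution of the kernel L with the widget history,
# truncated to the first Nt samples and scaled by the time step dt.
def orders_model(L, n_widgets_hist, dt):
    Nt = L.size
    return np.convolve(L, n_widgets_hist)[:Nt] * dt
```

np.convolve's "full" output satisfies out[i] = sum_j L[i-j]*n_widgets_hist[j], which matches the inner loop term by term once truncated to Nt samples.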

Load simulated data that was generated by mc_model.py.

In [3]:
s = np.load('mc_model.npz')
t = s['t']
n_widgets_hist = s['n_widgets_hist']
n_orders_hist = s['n_orders_hist']
t_orders = s['t_orders']
t_widgets = s['t_widgets'] 
#the + 0 converts these zero-dimensional arrays into ordinary scalars
lambda_0 = s['lambda_0'] + 0
n = s['n'] + 0
tau = s['tau'] + 0

Plot histograms of the widgets and orders created over time; note that n_orders_hist is the rate at which orders are produced each day.

In [4]:
plt.close()
fig = plt.figure(figsize=(6.5, 4.5))
ax = fig.add_subplot(1, 1, 1)
ax.set_title('widget & orders production rate')
ax.set_xlabel('time t    (days)')
ax.set_ylabel('number of orders and widgets per day')
ax.plot(t, n_widgets_hist, drawstyle='steps-mid', label='widgets', color='blue')
ax.plot(t, n_orders_hist, drawstyle='steps-mid', label='orders', color='green')
ax.legend(loc='upper left')
plt.draw()

params_guess contains the initial (guessed) parameter values plus limits on their allowed ranges.

In [5]:
params_guess = lmfit.Parameters()
params_guess.add('L0',   value= 0.5,  min=0.0, max=1.0)
params_guess.add('n', value= 0.0, min=-10.0, max=10.0)
params_guess.add('tau', value= 10.0, min=0.0, max=300.0)

Fit the model to the data: use lmfit.minimize() to find the Parameters that minimize the discrepancy between n_orders_hist and the model's predicted order counts.

In [6]:
fit = lmfit.minimize(min_fn, params_guess, args=(t, n_orders_hist, n_widgets_hist))
lmfit.printfuncs.report_fit(fit.params, show_correl=False)
#passing zeros in place of the data makes min_fn return the model itself
n_orders_fit = min_fn(fit.params, t, n_orders_hist*0, n_widgets_hist)
ax.plot(t, n_orders_fit, label='model', color='red')
ax.legend(loc='upper left')
plt.draw()
[[Variables]]
     L0:      0.006651 +/- 0.002832 (42.58%) initial =  0.500000
     n:       2.568166 +/- 0.509048 (19.82%) initial =  0.000000
     tau:     12.471051 +/- 1.898055 (15.22%) initial =  10.000000


Calculate confidence intervals. Note that lmfit.conf_interval() is slow; this takes a few minutes.

In [7]:
prob = 0.1
#request confidence intervals at the 10% and 95% probability levels
ci = lmfit.conf_interval(fit, sigmas=[prob, 0.95], verbose=True)
lmfit.printfuncs.report_ci(ci)
Calculating CI for L0
Calculating CI for L0
Calculating CI for n
Calculating CI for n
Calculating CI for tau
Calculating CI for tau
       95.00%    10.00%     0.00%    10.00%    95.00%
tau   9.21621  12.23860  12.47105  12.70702  16.61173
 L0   0.00191   0.00630   0.00665   0.00700   0.01148
  n   1.72848   2.50563   2.56817   2.63212   3.75666
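The ci dictionary returned above maps each parameter name to a list of (probability, value) pairs, ordered from the lower 95% bound up through the best-fit value to the upper 95% bound; this is why the plotting code below indexes ci[p][j][1]. A sketch, using values copied from the report above (ci_example is an illustrative stand-in, assuming the tuple layout of the lmfit version used here):

```python
# ci[p] is a list of (probability, value) pairs ordered:
# lower 95%, lower 10%, best fit (0%), upper 10%, upper 95%.
ci_example = {'tau': [(0.95, 9.21621), (0.10, 12.23860), (0.00, 12.47105),
                      (0.10, 12.70702), (0.95, 16.61173)]}
tau_best = ci_example['tau'][2][1]   #best-fit value, index j=2
tau_lo10 = ci_example['tau'][1][1]   #lower 10% bound, index j=1
tau_hi10 = ci_example['tau'][3][1]   #upper 10% bound, index j=3
```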

Extrapolate widgets out to t=400 days.

In [8]:
t_ext = np.arange(t.max() + 101, dtype=float)
#linear fit to the widget-production history, evaluated across t_ext
coef = np.polyfit(t, n_widgets_hist, 1)
n_widgets_hist_ext = np.polyval(coef, t_ext)

Plot the extrapolated widgets versus time, then show the best-fitting model extrapolated out to t=400 days, and overplot the model's confidence intervals.

In [10]:
#Show extrapolated widgets and orders versus time.
plt.close()
fig = plt.figure(figsize=(6.5, 4.5))
ax = fig.add_subplot(1, 1, 1)
ax.set_title('widget & orders production rate')
ax.set_xlabel('time t    (days)')
ax.set_ylabel('number of orders and widgets per day')
ax.plot(t, n_widgets_hist, drawstyle='steps-mid', color='blue')
ax.plot(t, n_orders_hist, drawstyle='steps-mid', label='orders', color='green')
tm = np.where(t_ext>=t[-1])[0]
ax.plot(t_ext[tm], n_widgets_hist_ext[tm], drawstyle='steps-mid', label='widgets', color='blue')

#Plot the extrapolated model at the lower 10% bounds (orange), the upper
#10% bounds (yellow), and the best-fit values (red), where ci[p][j] is a
#(probability, value) pair with j=1, 3, 2 respectively.
params = fit.params
for j, color, label in [(1, 'orange', None), (3, 'yellow', None), (2, 'red', 'model')]:
    for p in ('L0', 'n', 'tau'):
        params[p].value = ci[p][j][1]
    n_orders = min_fn(params, t_ext, n_widgets_hist_ext*0, n_widgets_hist_ext)
    ax.plot(t_ext, n_orders, color=color, label=label)
ax.legend(loc='upper left')
plt.draw()
plt.savefig('fit.png', dpi=200)