numpy - Fitting a Lognormal Distribution in Python using CURVE_FIT -


i have hypothetical y function of x , trying find/fit lognormal distribution curve shape on data best. using curve_fit function , able fit normal distribution, curve not optimized.

below give y , x data points y = f(x).

y_axis = [0.00032425299473065838, 0.00063714106162861229, 0.00027009331177605913, 0.00096672396877715144, 0.002388766809835889, 0.0042233337680543182, 0.0053072824980722137, 0.0061291327849408699, 0.0064555344006149871, 0.0065601228278316746, 0.0052574034010282218, 0.0057924488798939255, 0.0048154093097913355, 0.0048619350036057446, 0.0048154093097913355, 0.0045114840997070331, 0.0034906838696562147, 0.0040069911024866456, 0.0027766995669134334, 0.0016595801819374015, 0.0012182145074882836, 0.00098231827111984341, 0.00098231827111984363, 0.0012863691645616997, 0.0012395921040321833, 0.00093554121059032721, 0.0012629806342969417, 0.0010057068013846018, 0.0006081017868837127, 0.00032743942370661445, 4.6777060529516312e-05, 7.0165590794274467e-05, 7.0165590794274467e-05, 4.6777060529516745e-05] 

y-axis probabilities of event occurring in x-axis time bins:

x_axis = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0] 

i able better fit on data using excel , lognormal approach. when attempt use lognormal in python, fit not work , doing wrong.

below code have fitting normal distribution, seems 1 can fit in python (hard believe):

#fitting distributino on top of savitzky-golay %matplotlib inline import matplotlib import matplotlib.pyplot plt import pandas pd import scipy import scipy.stats import numpy np scipy.stats import gamma, lognorm, halflogistic, foldcauchy scipy.optimize import curve_fit  matplotlib.rcparams['figure.figsize'] = (16.0, 12.0) matplotlib.style.use('ggplot') # results savgol x_axis = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0,     13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0] y_axis = [0.00032425299473065838, 0.00063714106162861229, 0.00027009331177605913, 0.00096672396877715144, 0.002388766809835889, 0.0042233337680543182, 0.0053072824980722137, 0.0061291327849408699, 0.0064555344006149871, 0.0065601228278316746, 0.0052574034010282218, 0.0057924488798939255, 0.0048154093097913355, 0.0048619350036057446, 0.0048154093097913355, 0.0045114840997070331, 0.0034906838696562147, 0.0040069911024866456, 0.0027766995669134334, 0.0016595801819374015, 0.0012182145074882836, 0.00098231827111984341, 0.00098231827111984363, 0.0012863691645616997, 0.0012395921040321833, 0.00093554121059032721, 0.0012629806342969417, 0.0010057068013846018, 0.0006081017868837127, 0.00032743942370661445, 4.6777060529516312e-05, 7.0165590794274467e-05, 7.0165590794274467e-05, 4.6777060529516745e-05]  ## y_axis values must normalised sum_ys = sum(y_axis)  # normalize 1 y_axis = [_/sum_ys _ in y_axis]  # def gamma_f(x, a, loc, scale): #     return gamma.pdf(x, a, loc, scale)  def norm_f(x, loc, scale): #     print 'loc: ', loc, 'scale: ', scale, "\n"     return norm.pdf(x, loc, scale)  fitting = norm_f  # param_bounds = ([-np.inf,0,-np.inf],[np.inf,2,np.inf]) result = curve_fit(fitting, x_axis, y_axis) result_mod = result  # mod scale # results_adj  = [result_mod[0][0]*.75, result_mod[0][1]*.85]  plt.plot(x_axis, y_axis, 'ro') plt.bar(x_axis, y_axis, 1, alpha=0.75) plt.plot(x_axis, [fitting(_, *result[0]) _ in x_axis], 'b-') plt.axis([0,35,0,.1])  # convert probability y_norm_fit = [fitting(_, *result[0]) _ in x_axis] y_fit = [_*sum_ys _ in y_norm_fit] print list(y_fit)  plt.show() 

i trying answers 2 questions:

  1. is best fit normal distribution curve? how can imporve fit?

normal distribution result: enter image description here

  1. how can fit lognormal distribution data or there better distribution can use?

i playing around lognormal distribution curve adjust mu , sigma, looks there possible better fit. don't understand doing wrong similar results in python.

actually, gamma distribution might fit @glen_b proposed. i'm using second definition \alpha , \beta.

nb: trick use quick fit compute mean , variance , typical two-parametric distribution enough recover parameters , quick idea if fit or not.

enter image description here

code

import math scipy.misc import comb  import matplotlib.pyplot plt  y_axis = [0.00032425299473065838, 0.00063714106162861229, 0.00027009331177605913, 0.00096672396877715144, 0.002388766809835889, 0.0042233337680543182, 0.0053072824980722137, 0.0061291327849408699, 0.0064555344006149871, 0.0065601228278316746, 0.0052574034010282218, 0.0057924488798939255, 0.0048154093097913355, 0.0048619350036057446, 0.0048154093097913355, 0.0045114840997070331, 0.0034906838696562147, 0.0040069911024866456, 0.0027766995669134334, 0.0016595801819374015, 0.0012182145074882836, 0.00098231827111984341, 0.00098231827111984363, 0.0012863691645616997, 0.0012395921040321833, 0.00093554121059032721, 0.0012629806342969417, 0.0010057068013846018, 0.0006081017868837127, 0.00032743942370661445, 4.6777060529516312e-05, 7.0165590794274467e-05, 7.0165590794274467e-05, 4.6777060529516745e-05] x_axis = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0]  ## y_axis values must normalised sum_ys = sum(y_axis)  # normalize 1 y_axis = [_/sum_ys _ in y_axis]  m = 0.0 k in range(0, len(x_axis)):     m += y_axis[k] * x_axis[k]  v = 0.0 k in range(0, len(x_axis)):     t = (x_axis[k] - m)     v += y_axis[k] * t * t  print(m, v)  b = m/v = m * b  print(a, b)  z = [] k in range(0, len(x_axis)):     q = b**a * x_axis[k]**(a-1.0) * math.exp( - b*x_axis[k] ) / math.gamma(a)     z.append(q)  plt.plot(x_axis, y_axis, 'ro') plt.plot(x_axis, z, 'b*') plt.axis([0, 35, 0, .1]) plt.show() 

Comments