I have a DataFrame of weather data:
            id           date        element  data_value
    0       usw00094889  2014-11-12  tmax     22
    1       usc00208972  2009-04-29  tmin     56
    2       usc00200032  2008-05-26  tmax     278
    3       usc00205563  2005-11-11  tmax     139
    4       usc00200230  2014-02-27  tmax     -106
    5       usw00014833  2010-10-01  tmax     194
    6       usc00207308  2010-06-29  tmin     144
    7       usc00203712  2005-10-04  tmax     289
    8       usw00004848  2007-12-14  tmin     -16
    9       usc00200220  2011-04-21  tmax     72
    10      usc00205822  2013-01-16  tmax     11
    11      usc00205822  2008-05-29  tmin     28
    12      usc00203712  2008-10-17  tmin     17
    13      usc00205563  2006-05-14  tmax     183
    14      usc00200842  2006-05-14  tmax     122
    ....
    165083  usc00200230  2006-11-29  tmin     117
I'd like to make two lists: the minimum and the maximum temperature for each day. The way I tried doing this was by making a list of the dates:

    dates = df['date'].unique()

and then looping through the data and appending the values to the lists:

    mint, maxt = [], []
    for i in dates:
        mint.append(df[(df['date'] == i) & (df['element'] == 'tmin')]['data_value'].min())
        maxt.append(df[(df['date'] == i) & (df['element'] == 'tmax')]['data_value'].max())
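Each pass of that loop builds two boolean masks over the entire frame, so the work grows roughly with (number of dates) × (number of rows), which is likely where the time goes with ~165,000 rows. A minimal sketch of the same two lists built with one filter and one groupby per element, assuming the column names and lowercase element values shown above; the small frame is hypothetical sample data and the names `tmin`, `tmax` and `daily` are illustrative:

```python
import pandas as pd

# Hypothetical sample data in the question's column layout
df = pd.DataFrame({
    "id": ["usw00094889", "usc00208972", "usc00200032", "usc00205563", "usc00207308"],
    "date": ["2014-11-12", "2014-11-12", "2008-05-26", "2008-05-26", "2008-05-26"],
    "element": ["tmax", "tmin", "tmax", "tmin", "tmin"],
    "data_value": [22, 56, 278, 139, 144],
})

# One boolean filter per element, then one groupby: the frame is scanned a
# fixed number of times instead of once per unique date
tmin = df[df["element"] == "tmin"].groupby("date")["data_value"].min()
tmax = df[df["element"] == "tmax"].groupby("date")["data_value"].max()

# Align both Series on the same date index; dates missing one element get NaN,
# just as .min()/.max() of an empty selection would in the loop
daily = pd.concat([tmin, tmax], axis=1, keys=["min", "max"])
mint = daily["min"].tolist()
maxt = daily["max"].tolist()
print(daily)
```

One difference from the loop: groupby sorts the dates, whereas the loop follows the order of `df['date'].unique()`.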
I also tried sorting the DataFrame by date and data_value, and picking out the first value in each date's list as the max and the last as the min:

    df = df.sort_values(['date', 'data_value'], ascending=False)
    for i in dates:
        mint.append(df[df['date'] == i]['data_value'].values[-1])
        maxt.append(df[df['date'] == i]['data_value'].values[0])

But it still takes reeeeeeeally long :( ... Can you please help me make it faster?
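The sort idea can also work without the per-date loop: sort once, then let groupby take the first and last value of each date. A sketch on hypothetical sample data (the names `ordered` and `per_day` are illustrative); note that, like the sorting attempt, it mixes tmin and tmax readings, so in this sample the 2014-11-12 "min" is actually a tmax value:

```python
import pandas as pd

# Hypothetical sample data in the question's column layout
df = pd.DataFrame({
    "date": ["2014-11-12", "2014-11-12", "2008-05-26", "2008-05-26"],
    "element": ["tmax", "tmin", "tmax", "tmin"],
    "data_value": [22, 56, 278, 139],
})

# Sort once (ascending), so within each date the smallest value comes first
ordered = df.sort_values(["date", "data_value"])

# groupby keeps the row order inside each group, so "first"/"last" play the
# role of values[0] / values[-1] in the loop (here: min / max)
per_day = ordered.groupby("date")["data_value"].agg(["first", "last"])
per_day.columns = ["min", "max"]
print(per_day)
```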
You may want to try the pandas.DataFrame.groupby method:
    import io
    import pandas

    # generate test data
    data = u"""id,date,element,data_value
    usw00094889,2014-11-12,tmax,22
    usc00208972,2014-11-12,tmin,56
    usc00200032,2008-05-26,tmax,278
    usc00205563,2005-11-11,tmax,139
    usc00200230,2014-02-27,tmax,-106
    usw00014833,2010-10-01,tmax,194
    usc00207308,2010-06-29,tmin,144
    usc00203712,2012-06-29,tmax,289
    usw00004848,2007-12-14,tmin,-16
    usc00200220,2011-04-21,tmax,72
    usc00205822,2013-01-16,tmax,11
    usc00205822,2008-05-29,tmin,28
    usc00203712,2006-05-14,tmin,17
    usc00205563,2006-05-14,tmax,183
    usc00200842,2006-05-14,tmax,122
    """
    buffer = io.StringIO(data)
    df = pandas.read_csv(buffer)  # DataFrame.from_csv no longer exists in pandas 1.0+

    # here is the magic sauce: iterating over the date groups
    grouper = df.groupby('date')
    df_min_max = pandas.DataFrame(columns=['min', 'max'])

    # you can use the grouper for iteration; each group holds one date's rows
    for date, group in grouper:
        df_min_max.loc[date, 'min'] = group['data_value'].min()
        df_min_max.loc[date, 'max'] = group['data_value'].max()
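If only the per-date min and max are needed, the loop above can itself be collapsed into a single aggregation call, which avoids growing the output frame one cell at a time. A sketch using the first few rows of the test data (variable names mirror the answer's; like the loop, it does not separate tmin from tmax):

```python
import io

import pandas

data = u"""id,date,element,data_value
usw00094889,2014-11-12,tmax,22
usc00208972,2014-11-12,tmin,56
usc00200032,2008-05-26,tmax,278
"""
df = pandas.read_csv(io.StringIO(data))

# agg() with a list of function names computes both statistics for every
# date group in one call, with no Python-level loop over the groups
df_min_max = df.groupby("date")["data_value"].agg(["min", "max"])
print(df_min_max)
```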
Note: you can add other fields to the output DataFrame if you like. Be aware that appending to a DataFrame becomes more expensive the larger the DataFrame becomes. You may want to append the max and min values to lists instead, depending on how you are analyzing the data.
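Following that note, a sketch that collects the values in plain Python lists during the groupby iteration and builds the output DataFrame only once at the end (the list names `dates`, `mins` and `maxs` are illustrative; the data is a few rows of the test data above):

```python
import io

import pandas

data = u"""id,date,element,data_value
usw00094889,2014-11-12,tmax,22
usc00208972,2014-11-12,tmin,56
usc00200032,2008-05-26,tmax,278
"""
df = pandas.read_csv(io.StringIO(data))

# Appending to Python lists is cheap, unlike enlarging a DataFrame
dates, mins, maxs = [], [], []
for date, group in df.groupby("date"):
    dates.append(date)
    mins.append(group["data_value"].min())
    maxs.append(group["data_value"].max())

# Build the DataFrame once, after the loop has finished
df_min_max = pandas.DataFrame({"min": mins, "max": maxs}, index=dates)
print(df_min_max)
```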