Faster way to get row data (based on a condition) from one dataframe and merge onto another - pandas, Python


I have 2 dataframes with different indices and lengths. I'd like to grab the data for the asset column from asset_df wherever a row matches on the same ticker and year. See below.

I created a simplistic implementation using for-loops, but I imagine there are fancier, faster ways to do this?

ticker_df

year   ticker    asset    doc
2011   fb        nan      doc1
2012   fb        nan      doc2

asset_df

year   ticker    asset
2011   fb        100
2012   fb        200
2013   goog      300

Ideal result for ticker_df:

year   ticker    asset    doc
2011   fb        100      doc1
2012   fb        200      doc2

My sucky implementation:

for i in ticker_df.name.index:
    c_asset = asset_df[asset_df.tic == ticker_df.name.ix[i]]
    if len(c_asset) > 0:     # this checks to see if there is asset data on the company
        asset = c_asset[c_asset.fyear == ticker_df.year.ix[i]]['at']
        if len(asset) > 0:
            asset = int(asset)
            print 'g', asset, type(asset)
            ticker_df.asset.ix[i] = asset
            continue
        else:
            ticker_df.asset.ix[i] = np.nan
            continue
    if len(c_asset) == 0:
        ticker_df.asset.ix[i] = np.nan
        continue

You can use the update method. The indices of the two dataframes need to be aligned first.

In [23]: ticker_df['ticker'] = ticker_df.ticker.str.upper()

In [24]: ticker_df = ticker_df.set_index(idx)

In [25]: asset_df = asset_df.set_index(idx)

In [26]: ticker_df.update(asset_df)

In [27]: ticker_df
Out[27]:
             asset   doc
year ticker
2011 fb        100  doc1
2012 fb        200  doc2

[2 rows x 2 columns]
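For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same update approach. The sample frames are rebuilt from the tables in the question, and `idx` is assumed to be the list of shared key columns `['year', 'ticker']`, since the session above does not show where it is defined.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the tables in the question.
ticker_df = pd.DataFrame({
    "year": [2011, 2012],
    "ticker": ["fb", "fb"],
    "asset": [np.nan, np.nan],
    "doc": ["doc1", "doc2"],
})
asset_df = pd.DataFrame({
    "year": [2011, 2012, 2013],
    "ticker": ["fb", "fb", "goog"],
    "asset": [100, 200, 300],
})

# Assumption: idx is the list of key columns both frames share.
idx = ["year", "ticker"]

# Put both frames on the same (year, ticker) index so rows can be matched by label.
ticker_df = ticker_df.set_index(idx)
asset_df = asset_df.set_index(idx)

# update() modifies ticker_df in place (it returns None), copying non-NaN
# values from asset_df for index labels and columns present in both frames.
ticker_df.update(asset_df)

# Optionally move the keys back into ordinary columns.
result = ticker_df.reset_index()
print(result)
```

The 2013 goog row is simply ignored, because update only touches labels that already exist in ticker_df. Since the asset column started out as NaN, it remains a float column after the update.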
