python - How to subset a pandas dataframe by a two-column list of any length
I have tried different combinations of boolean arrays and .isin constructions, but my pandas-fu is not strong enough.
If I have the following example dataframe:
In [1]: import pandas as pd
   ...: exampledf = pd.DataFrame({
   ...:     'factor1': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'],
   ...:     'factor2': ['e', 'e', 'e', 'e', 'f', 'f', 'f', 'f'],
   ...:     'numeric': [1., 2., 3., 4., 5., 6., 7., 8.],
   ...: })
I need to pass a list of (factor1, factor2) pairs of any length and get back the subset of the dataframe whose rows match one of those combinations of factors.
For example:
In [2]: def factorfilter(df, factorlist):
   ...:     # code goes here
   ...:     # returns dataframe
   ...:
   ...: factorfilter(exampledf, [['a', 'e'], ['c', 'f']])
Out[2]:
  factor1 factor2  numeric
0       a       e      1.0
6       c       f      7.0
(If there's a better way to set up the lists, I'm all ears; this is just what occurred to me as easy to produce and pass to a function.)
You can use a multi-index (an index built from more than one column). Two ways of building the index for the example schema come to mind.
import pandas as pd

# Way 1: Cartesian product of the two factor levels
index = pd.MultiIndex.from_product([list('abcd'), list('ef')], names=['factor1', 'factor2'])

# Way 2: explicit (factor1, factor2) tuples
factor1 = list('abcdabcd')
factor2 = list('eeeeffff')
index = pd.MultiIndex.from_tuples(list(zip(factor1, factor2)), names=['factor1', 'factor2'])
From this, you can create a multi-index dataframe with

numerics = list(range(1, 9))
df = pd.DataFrame({'numeric': numerics}, index=index)
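df outputs (shown for the from_product index; the from_tuples index holds the same pairs in a different row order)

                 numeric
factor1 factor2
a       e              1
        f              2
b       e              3
        f              4
c       e              5
        f              6
d       e              7
        f              8

[8 rows x 1 columns]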
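To make the difference between the two index constructions concrete, here is a small check. It is only a sketch, and the names idx_product, idx_tuples, by_product and by_tuples are mine, not part of the answer. Both indexes hold the same eight pairs, but from_product orders them a/e, a/f, b/e, ... as in the output above, while from_tuples keeps the row order of the question's exampledf, so the same numerics end up attached to different pairs.

```python
import pandas as pd

names = ["factor1", "factor2"]

# Way 1: every combination of the two level lists, first level varying slowest
idx_product = pd.MultiIndex.from_product([list("abcd"), list("ef")], names=names)

# Way 2: pairs taken row by row, in the order of the question's columns
idx_tuples = pd.MultiIndex.from_tuples(
    list(zip(list("abcdabcd"), list("eeeeffff"))), names=names
)

# Same set of pairs, different ordering
same_pairs = set(idx_product) == set(idx_tuples)
same_order = list(idx_product) == list(idx_tuples)
print(same_pairs, same_order)
print(list(idx_product)[:3])
print(list(idx_tuples)[:3])

# Attaching the same numerics to each index gives different values per pair
numerics = list(range(1, 9))
by_product = pd.DataFrame({"numeric": numerics}, index=idx_product)
by_tuples = pd.DataFrame({"numeric": numerics}, index=idx_tuples)
print(by_product.loc[("c", "f"), "numeric"], by_tuples.loc[("c", "f"), "numeric"])
```

If the goal is to reproduce the question's data exactly, the from_tuples index is the one that lines up with it.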
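Then you can retrieve a subset of the index by passing a list of tuples to the .loc indexer (older pandas versions also offered .ix for this, but it was deprecated in 0.20 and removed in 1.0).

subdf = df.loc[[('a', 'e'), ('c', 'f')]]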
subdf outputs

                 numeric
factor1 factor2
a       e              1
c       f              6

[2 rows x 1 columns]
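Tying this back to the question, here is a minimal sketch of factorfilter that leaves the flat dataframe as it is and uses a MultiIndex only for the lookup. The cols parameter and its default are my own assumption for illustration, and MultiIndex.from_frame requires pandas 0.24 or later.

```python
import pandas as pd


def factorfilter(df, factorlist, cols=("factor1", "factor2")):
    """Return the rows of df whose (cols[0], cols[1]) pair appears in factorlist."""
    # isin on a MultiIndex expects tuples, so convert the inner lists
    wanted = [tuple(pair) for pair in factorlist]
    # Temporary MultiIndex built from the two factor columns, aligned row by row with df
    keys = pd.MultiIndex.from_frame(df[list(cols)])
    # Boolean mask: True where the row's pair is one of the wanted pairs
    return df[keys.isin(wanted)]


exampledf = pd.DataFrame({
    "factor1": ["a", "b", "c", "d", "a", "b", "c", "d"],
    "factor2": ["e", "e", "e", "e", "f", "f", "f", "f"],
    "numeric": [1., 2., 3., 4., 5., 6., 7., 8.],
})

result = factorfilter(exampledf, [["a", "e"], ["c", "f"]])
print(result)
```

With the question's exampledf this keeps rows 0 and 6, with their original row labels and all three columns. Pairs that do not occur in the dataframe are simply ignored rather than raising an error.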