python - Pandas groupby: compute (relative) sizes and save in original dataframe
My database structure is such that I have units that belong to several groups and have different variables (I focus on one, x, for this question). I also have year-based records. So the database looks like

       unitid  groupid  year   x
    0       1        1  1990   5
    1       2        1  1990   2
    2       2        1  1991   3
    3       3        2  1990  10

etc. I would like to measure an "intensity" variable, going by the number of units per group and year, and put it back into the database.
So far, I am doing

    asd = df.drop_duplicates(subset=['unitid', 'year'])  # this keyword was `cols` in older pandas versions
    groups = asd.groupby(['year', 'groupid'])
    intensity = groups.size()

and intensity then looks like

    year  groupid
    1961  2000       4
          2030       3
          2040       1
          2221       1
          2300       2

However, I don't know how to put them back into the old dataframe. I can access them through intensity[0], but intensity.loc() gives a "LocIndexer not callable" error.
Secondly, it would be very nice if I could scale intensity. Instead of "units per group-year", it would be "units per group-year, scaled by the average/max units per group-year in that year". That is, if {t,g} denotes a group-year cell and the simple intensity variable (for time and group) is called intensity(t, g), I would like to create

    relativeintensity(t, g) = intensity(t, g) / mean(intensity(t=t, g=:))

if this fake code helps at all in making myself clear.

Thanks!
Update

Just putting the answer here (explicitly) for readability. The first part was solved by

    intensity = intensity.reset_index()
    df['intensity'] = intensity[0]
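One caveat with the assignment above: assigning a Series to a DataFrame column aligns on the row index, not on (year, groupid). A minimal sketch of attaching the counts by key instead, using the four toy rows from the question; the column name `intensity` and the variable `df_with_intensity` are assumptions made for illustration:

```python
import pandas as pd

# Toy data copied from the question.
df = pd.DataFrame({
    "unitid": [1, 2, 2, 3],
    "groupid": [1, 1, 1, 2],
    "year": [1990, 1990, 1991, 1990],
    "x": [5, 2, 3, 10],
})

# Count each unit once per year, then count units per (year, groupid) cell.
asd = df.drop_duplicates(subset=["unitid", "year"])
intensity = (
    asd.groupby(["year", "groupid"])
    .size()
    .reset_index(name="intensity")  # flatten the MultiIndex into columns
)

# Attach the cell counts to every original row, matching on the keys.
df_with_intensity = df.merge(intensity, on=["year", "groupid"], how="left")
print(df_with_intensity)
```

A left merge keeps every row of `df` in its original order, so rows that share a group-year cell simply receive the same count.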
It's a multi-index. You can reset the index by calling .reset_index() on your resultant dataframe. Or you can disable it when you compute the group-by operation, by specifying as_index=False to groupby(), like:

    intensity = asd.groupby(["year", "groupid"], as_index=False).size()

As for your second question, I'm not sure what you mean by: instead of "units per group-year", "units per group-year, scaled by the average/max units per group-year in that year". If you want to compute "intensity" as intensity / mean(intensity), you can use the transform method, like:

    asd.groupby(["year", "groupid"])["x"].transform(lambda x: x / x.mean())

Is this what you're looking for?
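To illustrate what transform does here: it returns a result with the same index as the input, so it can be stored as a new column directly. The following is only a sketch on the toy rows from the question; the names `units_per_cell` and `x_rel` are made up for this example:

```python
import pandas as pd

# Toy data copied from the question (already one row per unit and year).
asd = pd.DataFrame({
    "unitid": [1, 2, 2, 3],
    "groupid": [1, 1, 1, 2],
    "year": [1990, 1990, 1991, 1990],
    "x": [5, 2, 3, 10],
})

keys = ["year", "groupid"]

# Group size broadcast back to every row of its group.
units_per_cell = asd.groupby(keys)["unitid"].transform("size")

# x divided by the mean of x within the same (year, groupid) cell.
x_rel = asd.groupby(keys)["x"].transform(lambda s: s / s.mean())

asd["intensity"] = units_per_cell
asd["x_rel"] = x_rel
print(asd)
```

Because both results are indexed like `asd`, no reset_index or merge is needed for this variant.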
Update

If you want to compute intensity / mean(intensity), where mean(intensity) is based only on the year and not on the year/groupid subsets, then you first have to create the mean(intensity) based on the year only, like:

    intensity["mean_intensity_only_by_year"] = intensity.groupby(["year"])["x"].transform("mean")

And then compute intensity / mean(intensity) for every year/groupid subset, where mean(intensity) is derived only from the year subset:

    intensity["relativeintensity"] = intensity.groupby(["year", "groupid"]).apply(lambda x: pd.DataFrame({"relativeintensity": x["x"] / x["mean_intensity_only_by_year"]}))

Maybe this is what you're looking for, right?