python - Pandas groupby: compute (relative) sizes and save in original dataframe


My database is structured so that I have units that belong to several groups and have different variables (I focus on one, x, for this question). I have year-based records. The database looks like

       unitid  groupid  year   x
    0       1        1  1990   5
    1       2        1  1990   2
    2       2        1  1991   3
    3       3        2  1990  10

etc. Now I would like to measure an "intensity" variable, which is going to be the number of units per group and year, and I would like to put it back into the database.

So far, I am doing

    # keep one row per (unitid, year); current pandas uses `subset`, not `cols`
    asd = df.drop_duplicates(subset=['unitid', 'year'])
    groups = asd.groupby(['year', 'groupid'])
    intensity = groups.size()

and intensity looks like

    year  groupid
    1961  2000       4
          2030       3
          2040       1
          2221       1
          2300       2

However, I don't know how to put them back into the old dataframe. I can access them through intensity[0], but intensity.loc() gives a "LocIndexer not callable" error.

Secondly, it would be very nice if I could scale the intensity. Instead of "units per group-year", it would be "units per group-year, scaled by the average/max units per group-year in that year". If {t,g} denotes a group-year cell, that would be:

[image: formula for relative intensity]

That is, if the simple intensity variable (for time and group) is called intensity(t, g), I would like to create relativeintensity(t,g) = intensity(t,g)/mean(intensity(t=t,g=:)), if this fake code helps at all in making myself clear.

Thanks!

Update

Just putting the answer here (explicitly) for readability. The first part was solved by

    intensity = intensity.reset_index()
    # after reset_index() the group sizes sit in the column labelled 0
    df['intensity'] = intensity[0]

It's a multi-index. You can reset the index by calling .reset_index() on your resultant dataframe. Or you can disable it when you compute the group-by operation, by specifying as_index=False in groupby(), like:

    intensity = asd.groupby(["year", "groupid"], as_index=False).size()
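
To get the counts back onto the original rows, one option is a merge on the group keys. A plain column assignment aligns on the row index rather than on (year, groupid), so it does not by itself match each row to its cell. The following is a minimal sketch using the sample rows from the question; the names `df_with_intensity` and the column name `intensity` are chosen here for illustration only.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "unitid":  [1, 2, 2, 3],
    "groupid": [1, 1, 1, 2],
    "year":    [1990, 1990, 1991, 1990],
    "x":       [5, 2, 3, 10],
})

# One row per (unitid, year), then count rows per (year, groupid).
asd = df.drop_duplicates(subset=["unitid", "year"])
intensity = (
    asd.groupby(["year", "groupid"])
       .size()
       .reset_index(name="intensity")  # flat columns: year, groupid, intensity
)

# Left-merge on the group keys so every original row receives its cell's count.
df_with_intensity = df.merge(intensity, on=["year", "groupid"], how="left")
print(df_with_intensity)
```

Using `reset_index(name=...)` on the size Series gives the count column an explicit name, which avoids depending on how a given pandas version labels it.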

As for your second question, I'm not sure what you mean by: instead of "units per group-year", "units per group-year, scaled by the average/max units per group-year in that year". If you want to compute "intensity" as intensity / mean(intensity), you can use the transform method, like:

    # divide each value by the mean of its own (year, groupid) group
    asd.groupby(["year", "groupid"])["x"].transform(lambda x: x / x.mean())

Is that what you're looking for?
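
What makes transform convenient here is that it returns one value per original row, aligned on the original index, so the result can be assigned straight into the dataframe. A small sketch on the question's sample rows follows; the column names `x_rel` and `intensity` are illustrative, and using `"nunique"` on `unitid` is offered as one possible shortcut for "number of units per group and year", not as the method from the thread.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "unitid":  [1, 2, 2, 3],
    "groupid": [1, 1, 1, 2],
    "year":    [1990, 1990, 1991, 1990],
    "x":       [5, 2, 3, 10],
})

keys = ["year", "groupid"]

# x divided by the mean of x within the same (year, groupid) cell.
df["x_rel"] = df.groupby(keys)["x"].transform(lambda s: s / s.mean())

# Number of distinct units in each (year, groupid) cell, repeated on every row.
df["intensity"] = df.groupby(keys)["unitid"].transform("nunique")

print(df)
```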

Update

If you want to compute intensity / mean(intensity), where mean(intensity) is based only on the year and not on the year/groupid subsets, then you first have to create the mean(intensity) based on the year only, like:

    # per-year mean, repeated on every row of that year
    intensity["mean_intensity_only_by_year"] = intensity.groupby(["year"])["x"].transform("mean")

And then compute intensity / mean(intensity) for each year/groupid subset, where mean(intensity) is derived only from the year subset:

    # the per-year mean is already a column, so a row-wise division is enough
    intensity["relativeintensity"] = intensity["x"] / intensity["mean_intensity_only_by_year"]

Maybe that's what you're looking for, right?
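
Putting the two steps together on the question's sample rows, a hedged end-to-end sketch could look like the following. It assumes the intensity is the number of distinct units per (year, groupid) cell and that the scaling is by the mean over all groups within the same year; the column names `intensity` and `relativeintensity` are illustrative.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "unitid":  [1, 2, 2, 3],
    "groupid": [1, 1, 1, 2],
    "year":    [1990, 1990, 1991, 1990],
    "x":       [5, 2, 3, 10],
})

# intensity(t, g): number of units per (year, groupid) cell, one row per cell.
intensity = (
    df.drop_duplicates(subset=["unitid", "year"])
      .groupby(["year", "groupid"])
      .size()
      .reset_index(name="intensity")
)

# mean(intensity(t, :)): average over the groups of the same year,
# repeated on every row of that year.
year_mean = intensity.groupby("year")["intensity"].transform("mean")

# relativeintensity(t, g) = intensity(t, g) / mean(intensity(t, :))
intensity["relativeintensity"] = intensity["intensity"] / year_mean

print(intensity)
```

Replacing `"mean"` with `"max"` in the transform call would scale by the largest group of the year instead, which is the other variant mentioned in the question.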

