MySQL - how to get the average of rows that have a certain relationship
I have a bunch of data stored pertaining to county demographics in a database. I need to be able to access the average of the data within the state of a given county. For example, I need to be able to get the average of all counties whose state_id matches the state_id of the county with a county_id of 1. Essentially, if a county is in Virginia, I need the average of all of the counties in Virginia. I'm having trouble setting up this query, and I was hoping you guys could give me some help. Here's what I have written, but it only returns one row from the database because of the linking of the county_id of the two tables together.
SELECT AVG(demographic_data.percent_white) AS avg_percent_white
  FROM demographic_data, counties, states
 WHERE counties.county_id = demographic_data.county_id
   AND counties.state_id = states.state_id
Here's the basic database layout:

counties
------------------------
county_id | county_name

states
---------------------
state_id | state_name

demographic_data
-----------------------------------------
percent_white | percent_black | county_id
Your query is returning one row because there's an aggregate and no GROUP BY. If you want the average of all counties within a single state, we'd expect only one row.
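To illustrate the role of GROUP BY, here is a minimal sketch that returns one row per state instead of a single overall row. It assumes, as the query in the question does, that the counties table has a state_id column (the layout shown in the question does not list it).

```sql
-- One average per state: GROUP BY collapses the rows into one group per state_id.
-- Assumes counties.state_id exists, as implied by the question's own query.
SELECT c.state_id
     , AVG(d.percent_white) AS avg_percent_white
  FROM demographic_data d
  JOIN counties c
    ON c.county_id = d.county_id
 GROUP BY c.state_id;
```

Without the GROUP BY clause, the same aggregate collapses every joined row into a single result row, which is the behavior described above.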
To get a "statewide" average of all counties within a state, here's one way to do it:
SELECT AVG(d.percent_white) AS avg_percent_white
  FROM demographic_data d
  JOIN counties a
    ON a.county_id = d.county_id
  JOIN counties o
    ON o.state_id = a.state_id
 WHERE o.county_id = 42
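Note that there's no need to join the states table. You just need all the counties that have a matching state_id. The query above uses two references to the counties table. One reference is aliased as "a" to get all the counties within a state; the other reference is aliased as "o" to get the state_id for a particular county.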
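As an alternative formulation (not the one used in the answer), the second reference to counties can be written as a scalar subquery. This sketch assumes county_id is unique in counties, so the subquery returns at most one row; the value 42 is just the example county from above.

```sql
-- Same statewide average, with the state lookup moved into a scalar subquery.
-- Assumes county_id is unique in counties; otherwise MySQL raises a
-- "Subquery returns more than 1 row" error.
SELECT AVG(d.percent_white) AS avg_percent_white
  FROM demographic_data d
  JOIN counties a
    ON a.county_id = d.county_id
 WHERE a.state_id = (SELECT o.state_id
                       FROM counties o
                      WHERE o.county_id = 42);
```

Both forms should return the same value; which one reads better is largely a matter of taste.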
If you already had the state_id, you wouldn't need the second reference:
SELECT AVG(d.percent_white) AS avg_percent_white
  FROM demographic_data d
  JOIN counties a
    ON a.county_id = d.county_id
 WHERE a.state_id = 11
Followup

Q: What if I wanted to bring in another table... let's call it demographic_data_2, which is linked via county_id?
A: I made the assumption that the demographic_data table had one row per county_id. If the same holds true for the second table, then it's a simple JOIN operation.
JOIN demographic_data_2 c ON c.county_id = d.county_id
With that table joined in, you would just add the appropriate aggregate expression to the SELECT list (e.g. SUM, MIN, MAX, AVG).
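Put together, the query might look like the sketch below. The column name median_income is hypothetical, since the followup question doesn't say what demographic_data_2 contains; substitute whatever column you actually need to aggregate.

```sql
-- Statewide aggregates drawn from both demographic tables.
-- median_income is a hypothetical column name used only for illustration.
-- Assumes each table has exactly one row per county_id.
SELECT AVG(d.percent_white) AS avg_percent_white
     , AVG(c.median_income) AS avg_median_income
  FROM demographic_data d
  JOIN demographic_data_2 c
    ON c.county_id = d.county_id
  JOIN counties a
    ON a.county_id = d.county_id
  JOIN counties o
    ON o.state_id = a.state_id
 WHERE o.county_id = 42;
```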
The trouble spots are typically "missing" and "duplicate" data. When there isn't a row for every county_id in the second table, or there's more than one row for a particular county_id, that leads to rows either not being included in the aggregate or getting double counted in the aggregate.
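One way to check for those trouble spots before trusting the aggregate is a quick count per county. This is a sketch, not part of the original answer; it lists every county that has zero rows or more than one row in demographic_data_2.

```sql
-- Counties that would be dropped (0 rows) or double counted (2+ rows)
-- when demographic_data_2 is inner joined on county_id.
SELECT a.county_id
     , COUNT(c.county_id) AS rows_in_table_2
  FROM counties a
  LEFT JOIN demographic_data_2 c
    ON c.county_id = a.county_id
 GROUP BY a.county_id
HAVING COUNT(c.county_id) <> 1;
```

If this returns no rows, the one-row-per-county assumption holds for the second table.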
Note that the aggregate returned by your original query is an "average of averages". It's the average of the values for each county, with every county weighted equally regardless of its size.

Consider:
bucket  count_red  count_blue  count_total  percent_red
------  ---------  ----------  -----------  -----------
     1        480           4         1000           48
     2         60           1          200           30
Note that there's a difference between an "average of averages" and an average calculated from the totals.
SELECT AVG(percent_red) AS avg_percent_red
     , 100 * SUM(count_red) / SUM(count_total) AS tot_percent_red

avg_percent_red  tot_percent_red
---------------  ---------------
             39               45
Both values are valid, but we don't want to misinterpret or misrepresent either value.
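To reproduce the two figures, here is a self-contained sketch. The table name bucket_counts is made up for illustration, and count_blue is left out because it doesn't enter either calculation.

```sql
-- Hypothetical table holding the example buckets from above.
CREATE TEMPORARY TABLE bucket_counts
( bucket      INT PRIMARY KEY
, count_red   INT
, count_total INT
, percent_red INT
);

INSERT INTO bucket_counts (bucket, count_red, count_total, percent_red)
VALUES (1, 480, 1000, 48)
     , (2,  60,  200, 30);

-- Average of averages: (48 + 30) / 2 = 39
-- Average from totals: 100 * (480 + 60) / (1000 + 200) = 45
SELECT AVG(percent_red)                        AS avg_percent_red
     , 100 * SUM(count_red) / SUM(count_total) AS tot_percent_red
  FROM bucket_counts;
```

The second figure is higher because bucket 1 holds five times as many items as bucket 2, so its 48 percent carries more weight when the totals are used.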