MySQL - How to get the average of rows that have a certain relationship
I have a bunch of data stored pertaining to county demographics in a database. I need to be able to access the average of the data within the state of a given county. For example, I need to be able to get the average of all counties whose state_id matches the state_id of the county with a county_id of 1. Essentially, if a county is in Virginia, I need the average of all of the counties in Virginia. I'm having trouble setting up this query, and I was hoping you guys could give me some help. Here's what I have written, but it only returns one row from the database because of the linking of the county_id of the two tables together.
    SELECT AVG(demographic_data.percent_white) AS avg_percent_white
      FROM demographic_data, counties, states
     WHERE counties.county_id = demographic_data.county_id
       AND counties.state_id = states.state_id

Here's my basic database layout:
    counties
    ------------------------
    county_id | county_name

    states
    ---------------------
    state_id | state_name

    demographic_data
    -----------------------------------------
    percent_white | percent_black | county_id

Your query is returning one row because there's an aggregate and no GROUP BY. If you want the average of all counties within a state, then we'd expect only one row.
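To see the one-row behavior for yourself, here's a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are made up, and the state_id column on counties is an assumption (the layout in the question doesn't list it, but the query references it). Without GROUP BY the aggregate collapses everything into a single row; with GROUP BY state_id you get one row per state.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with made-up county data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE counties (
            county_id   INTEGER PRIMARY KEY,
            county_name TEXT,
            state_id    INTEGER   -- assumed column, implied by the question's query
        );
        CREATE TABLE demographic_data (
            percent_white REAL,
            percent_black REAL,
            county_id     INTEGER
        );
        INSERT INTO counties VALUES (1, 'Alpha', 11), (2, 'Bravo', 11), (3, 'Charlie', 12);
        INSERT INTO demographic_data VALUES (60.0, 30.0, 1), (80.0, 10.0, 2), (50.0, 40.0, 3);
    """)
    return conn

def average_without_group_by(conn):
    """Aggregate with no GROUP BY: always a single row over all joined rows."""
    sql = """
        SELECT AVG(d.percent_white) AS avg_percent_white
          FROM demographic_data d
          JOIN counties c ON c.county_id = d.county_id
    """
    return conn.execute(sql).fetchall()

def average_per_state(conn):
    """Same aggregate with GROUP BY: one row for each state_id."""
    sql = """
        SELECT c.state_id, AVG(d.percent_white) AS avg_percent_white
          FROM demographic_data d
          JOIN counties c ON c.county_id = d.county_id
         GROUP BY c.state_id
         ORDER BY c.state_id
    """
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    conn = build_demo_db()
    print(average_without_group_by(conn))  # one row covering every county
    print(average_per_state(conn))         # one row per state
```

The SQL itself is plain enough that it should behave the same way in MySQL; only the Python wrapper and the sample data are specific to this sketch.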
To get a "statewide" average of all the counties within a state, here's one way to do it:
    SELECT AVG(d.percent_white) AS avg_percent_white
      FROM demographic_data d
      JOIN counties a
        ON a.county_id = d.county_id
      JOIN counties o
        ON o.state_id = a.state_id
     WHERE o.county_id = 42

Note that there's no need to join to the states table. You just need all the counties that have a matching state_id. The query above uses two references to the counties table. The reference aliased as "a" is for all the counties within the state; the reference aliased as "o" is there to get the state_id for one particular county.
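Here's a small runnable sketch of that two-reference join, again using Python's sqlite3 module in place of MySQL. The sample counties and percentages are invented, the state_id column on counties is assumed, and statewide_average is just a hypothetical helper name for this example.

```python
import sqlite3

def build_counties_db():
    """In-memory database with invented data: three counties in state 11, one in state 12."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE counties (
            county_id   INTEGER PRIMARY KEY,
            county_name TEXT,
            state_id    INTEGER   -- assumed column
        );
        CREATE TABLE demographic_data (
            percent_white REAL,
            percent_black REAL,
            county_id     INTEGER
        );
        INSERT INTO counties VALUES
            (41, 'Alpha', 11), (42, 'Bravo', 11), (43, 'Charlie', 11), (50, 'Delta', 12);
        INSERT INTO demographic_data VALUES
            (60.0, 30.0, 41), (70.0, 20.0, 42), (80.0, 10.0, 43), (10.0, 80.0, 50);
    """)
    return conn

def statewide_average(conn, county_id):
    """Average percent_white over every county sharing a state with county_id."""
    sql = """
        SELECT AVG(d.percent_white) AS avg_percent_white
          FROM demographic_data d
          JOIN counties a              -- "a": all counties in the state
            ON a.county_id = d.county_id
          JOIN counties o              -- "o": the one county we start from
            ON o.state_id = a.state_id
         WHERE o.county_id = ?
    """
    return conn.execute(sql, (county_id,)).fetchone()[0]

if __name__ == "__main__":
    conn = build_counties_db()
    # County 42 is in state 11, so the result averages counties 41, 42 and 43.
    print(statewide_average(conn, 42))
```

Passing the county_id as a bound parameter rather than pasting it into the SQL string is a choice made for this sketch; the query shape is the same either way.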
If you already had the state_id, you wouldn't need the second reference:
    SELECT AVG(d.percent_white) AS avg_percent_white
      FROM demographic_data d
      JOIN counties a
        ON a.county_id = d.county_id
     WHERE a.state_id = 11

FOLLOWUP
Q: What if I wanted to bring in another table, let's call it demographic_data_2, that is linked via county_id?
A: I made the assumption that the demographic_data table had one row per county_id. If the same holds true for the second table, then it's a simple JOIN operation.
      JOIN demographic_data_2 c
        ON c.county_id = d.county_id

With that table joined in, you can add an appropriate aggregate expression to the SELECT list (e.g. SUM, MIN, MAX, AVG).
The trouble spots are typically "missing" and "duplicate" data. When there isn't a row for every county_id in the second table, or there's more than one row for a particular county_id, that leads to rows either not being included in the aggregate or getting double counted in the aggregate.
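To make the missing/duplicate problem concrete, here's a sketch with invented data, run through Python's sqlite3 module rather than MySQL. The median_age column on demographic_data_2 is purely hypothetical. County 1 appears twice in the second table and county 3 is absent, so the joined average drifts away from the true one. The second query is one possible way to spot counties that don't have exactly one matching row.

```python
import sqlite3

def build_join_db():
    """Invented data: demographic_data_2 duplicates county 1 and omits county 3."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE demographic_data (
            percent_white REAL,
            percent_black REAL,
            county_id     INTEGER
        );
        CREATE TABLE demographic_data_2 (
            median_age REAL,      -- hypothetical column, for illustration only
            county_id  INTEGER
        );
        INSERT INTO demographic_data VALUES (60.0, 30.0, 1), (80.0, 10.0, 2), (40.0, 50.0, 3);
        INSERT INTO demographic_data_2 VALUES (35.0, 1), (36.0, 1), (41.0, 2);
    """)
    return conn

def true_average(conn):
    """Average over demographic_data alone: each county counted once."""
    return conn.execute(
        "SELECT AVG(percent_white) FROM demographic_data"
    ).fetchone()[0]

def joined_average(conn):
    """Same average after an inner join: county 1 counted twice, county 3 dropped."""
    sql = """
        SELECT AVG(d.percent_white)
          FROM demographic_data d
          JOIN demographic_data_2 c
            ON c.county_id = d.county_id
    """
    return conn.execute(sql).fetchone()[0]

def counties_with_bad_row_count(conn):
    """List (county_id, matching_rows) wherever the second table has 0 or 2+ rows."""
    sql = """
        SELECT d.county_id, COUNT(c.county_id) AS matching_rows
          FROM demographic_data d
          LEFT JOIN demographic_data_2 c
            ON c.county_id = d.county_id
         GROUP BY d.county_id
        HAVING COUNT(c.county_id) <> 1
         ORDER BY d.county_id
    """
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    conn = build_join_db()
    print(true_average(conn))                 # each county once
    print(joined_average(conn))               # skewed by the duplicate and the gap
    print(counties_with_bad_row_count(conn))  # the counties causing the skew
```

Running a check like the last query before trusting a joined aggregate is only a suggestion; how you then handle the offending rows depends on what the second table is supposed to mean.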
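We should note that the aggregate returned by the original query is an "average of averages". It's the average of the percentage values for each county, with every county weighted equally regardless of its size.
Consider: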
    bucket  count_red  count_blue  count_total  percent_red
    ------  ---------  ----------  -----------  -----------
         1        480           4         1000           48
         2         60           1          200           30

Note that there's a difference between an "average of averages" and calculating an average using totals.
    SELECT AVG(percent_red) AS avg_percent_red
         , 100 * SUM(count_red)/SUM(count_total) AS tot_percent_red

    avg_percent_red  tot_percent_red
    ---------------  ---------------
                 39               45

Both values are valid; we just don't want to misinterpret or misrepresent either value.
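The two figures can be reproduced with a short sketch, using the bucket rows from the table above loaded into an in-memory SQLite database via Python's sqlite3 module. The table name bucket_counts is hypothetical, and the ratio is multiplied by 100.0 both to express it as a percentage and to avoid SQLite's integer division (MySQL's / operator would not truncate here).

```python
import sqlite3

def build_bucket_db():
    """Load the two example buckets; bucket_counts is a made-up table name."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bucket_counts (
            bucket      INTEGER,
            count_red   INTEGER,
            count_total INTEGER,
            percent_red REAL
        );
        INSERT INTO bucket_counts VALUES (1, 480, 1000, 48.0), (2, 60, 200, 30.0);
    """)
    return conn

def compare_averages(conn):
    """Return (average of per-bucket percentages, percentage computed from totals)."""
    sql = """
        SELECT AVG(percent_red) AS avg_percent_red
             , 100.0 * SUM(count_red) / SUM(count_total) AS tot_percent_red
          FROM bucket_counts
    """
    return conn.execute(sql).fetchone()

if __name__ == "__main__":
    avg_percent_red, tot_percent_red = compare_averages(build_bucket_db())
    # (48 + 30) / 2 versus 100 * (480 + 60) / (1000 + 200)
    print(avg_percent_red, tot_percent_red)
```

The larger bucket pulls the totals-based figure toward its own 48 percent, which is why it comes out higher than the unweighted average of the two percentages.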