google bigquery - Parallelizable OVER EACH BY -

- September 15, 2014

i hitting obstacle again , again...

join each , group each clauses can't used on output of window functions

is there best practice or recommendations how use window functions (over()) large data sets cannot processed on single node?

fragmenting data , running same query different filters can work, limiting, takes lot of time (and manual labor) , costly (running same query on same data set 30 times instead of once).

referring jeremy's answer bellow... it's better, still doesn't work properly. if take original query sample:

select title,count (case when contributor_id<>leadcontributor 1 else null end) different, count (case when contributor_id=leadcontributor 1 else null end) same, count(*) total ( select title,contributor_id,lead(contributor_id)over(partition title order timestamp) leadcontributor   [publicdata:samples.wikipedia] regexp_match(title,r'^[a,b]')=true ) group title

now works... but

select title,count (case when contributor_id<>leadcontributor 1 else null end) different, count (case when contributor_id=leadcontributor 1 else null end) same, count(*) total ( select title,contributor_id,lead(contributor_id)over(partition title order timestamp) leadcontributor   [publicdata:samples.wikipedia] regexp_match(title,r'^[a-z]')=true ) group each title

gives again resources exceeded error...

window functions can executed in distributed fashion according partition clause given inside over. if supply partition window functions, data processed in parallel similar how join each , group each processed.

in addition, can use partition on output of join each or group each without serializing execution. using same keys partition join each or group each particularly efficient, because data not need reshuffled between join/aggregation , window function execution.

Search This Blog

GHI

google bigquery - Parallelizable OVER EACH BY -

Comments

Post a Comment

Popular posts from this blog

reporting services - Visible Export Data Feed option SSRS report -

git - Initial Commit: "fatal: could not create leading directories of ..." -

parcelable - Unmarshalling unknown type code exception while reading parcel values in Android 4.4+ -