I commonly see Dataset.count used throughout codebases in 3 scenarios:

  1. logging: log.info(s"this ds has ${dataset.count} rows")
  2. branching: if (dataset.count > 0) do x else do y
  3. forcing a cache: dataset.persist.count

In any of those scenarios, does it prevent the query optimizer from creating the most efficient DAG by forcing evaluation to be eager prematurely?
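
To make the three scenarios concrete, here is a minimal sketch of them in one runnable program. The local SparkSession, the app name, and the example dataset built with spark.range are assumptions made purely for illustration; they are not taken from any real codebase. The explain call at the end prints the plans for a downstream query so they can be inspected by eye.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object CountScenarios {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("count-scenarios")
      .getOrCreate()

    // Hypothetical dataset standing in for `dataset` in the question.
    val dataset: Dataset[java.lang.Long] =
      spark.range(0, 1000).filter("id % 2 = 0")

    // 1. Logging: count() is an action, so this line runs a job.
    println(s"this ds has ${dataset.count()} rows")

    // 2. Branching: another count() runs another job over the same lineage.
    if (dataset.count() > 0) println("do x") else println("do y")

    // 3. Forcing a cache: persist() only marks the dataset for caching;
    //    the count() that follows is what materializes it.
    val cached = dataset.persist()
    cached.count()

    // Print the logical and physical plans of a downstream query.
    cached.filter("id > 10").explain(true)

    cached.unpersist()
    spark.stop()
  }
}
```

Running the explain call with and without the preceding count calls is one way to check whether the plans differ.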
