0

I'm working on PostgreSQL (latest version), in a simple SCADA system. The problem is that every query involving the largest table in my schema [figure 1] takes about 2 hours.

Schema

What I want is to obtain the constructora's PKs for buildings (edificio) whose departments (departamento) record (valor in table medicion) surpluses in the measurement of "gas" or "electricidad" in 2010. So...

  • variable contains the max value for a record (valmax)
  • medicion contains the value of the metric (valor)

Tables' Size (in rows).

  • constructura -> 10
  • edificio -> 100
  • departamento -> 50.000
  • variable -> 8
  • medicion_departamento -> 400.000
  • medicion -> 8.000.000

Query

I've made a query with INNER JOINs that goes from the littlest tables (constructora and variable) to the biggest one (medicion).

SELECT DISTINCT C.id_constructora FROM constructora C INNER JOIN variable V ON (V.nombre = 'electricidad' OR V.nombre = 'gas') INNER JOIN edificio E ON (E.id_constructora = C.id_constructora) INNER JOIN departamento D ON (E.id_edificio = D.id_edificio) INNER JOIN medicion M ON (M.id_variable = V.id_variable) WHERE ( (M.valor > V.valmax) AND EXTRACT(YEAR FROM M.fecha) = 2010 ); 

EXPLAIN

"HashAggregate (cost=2343438.58..2343438.68 rows=10 width=4)" " Group Key: c.id_constructora" " -> Hash Join (cost=164536.25..1947605.25 rows=158333333 width=4)" " Hash Cond: (e.id_constructora = c.id_constructora)" " -> Hash Join (cost=4.25..1510.75 rows=50000 width=4)" " Hash Cond: (d.id_edificio = e.id_edificio)" " -> Seq Scan on departamento d (cost=0.00..819.00 rows=50000 width=4)" " -> Hash (cost=3.00..3.00 rows=100 width=8)" " -> Seq Scan on edificio e (cost=0.00..3.00 rows=100 width=8)" " -> Hash (cost=164136.12..164136.12 rows=31670 width=4)" " -> Nested Loop (cost=0.00..164136.12 rows=31670 width=4)" " -> Nested Loop (cost=0.00..163739.12 rows=3167 width=0)" " Join Filter: ((m.valor > v.valmax) AND (v.id_variable = m.id_variable))" " -> Seq Scan on medicion m (cost=0.00..162408.00 rows=38000 width=8)" " Filter: (date_part('year'::text, fecha) = '2010'::double precision)" " -> Materialize (cost=0.00..1.13 rows=2 width=24)" " -> Seq Scan on variable v (cost=0.00..1.12 rows=2 width=24)" " Filter: (((nombre)::text = 'electricidad'::text) OR ((nombre)::text = 'gas'::text))" " -> Materialize (cost=0.00..1.15 rows=10 width=4)" " -> Seq Scan on constructora c (cost=0.00..1.10 rows=10 width=4)" 

My question is what can I do to reduce -significantly- the execution time of my query?

4
  • How is the tables variable and medicion linked with other tables in the query? It seems that you have missed medicion_departamento in it... Commented Jul 20, 2017 at 0:51
  • But I only need "valmax" and not the data from medicion_departamento, because I know that medicion contains id_variable from medicion_departamento and that table got this value from variable Commented Jul 20, 2017 at 1:00
  • 1
    But you should at least to link medicion with departamento using id_departamento column. Isn't it? Commented Jul 20, 2017 at 1:03
  • This solved my problem! Now I'm getting the results in 3.2 seconds. Commented Jul 20, 2017 at 1:38

2 Answers 2

1

It seems (according to the schema) your query should be:

SELECT DISTINCT C.id_constructora FROM constructora C INNER JOIN edificio E ON (E.id_constructora = C.id_constructora) INNER JOIN departamento D ON (E.id_edificio = D.id_edificio) -- Changes here INNER JOIN medicion M ON (D.id_departamento = M.id_departamento) INNER JOIN variable V ON (M.id_variable = V.id_variable AND (V.nombre = 'electricidad' OR V.nombre = 'gas')) WHERE ( (M.valor > V.valmax) AND EXTRACT(YEAR FROM M.fecha) = 2010 ); 

Because in your original query you have the cross join of medicion and variable tables with all other tables.

Also you should to have the functional index like

create index idx_medicion_fecha_year on medicion(EXTRACT(YEAR FROM M.fecha)); 

to be used for EXTRACT(YEAR FROM fecha) = 2010 condition.

Sign up to request clarification or add additional context in comments.

Comments

1

The or in the join is a performance killer. Perhaps this does what you want:

SELECT DISTINCT C.id_constructora FROM constructora C INNER JOIN edificio E ON E.id_constructora = C.id_constructora INNER JOIN departamento D ON E.id_edificio = D.id_edificio LEFT JOIN variable ve ON ve.nombre = 'electricidad' AND M.valor > ve.valmax LEFT JOIN variable vg ON vg.nombre = 'gas' LEFT JOIN medicion me ON me.id_variable = ve.id_variable AND EXTRACT(YEAR FROM me.fecha) = 2010 AND me.valor > ve.valmax LEFT JOIN medicion mg ON mg.id_variable = vg.id_variable AND EXTRACT(YEAR FROM mg.fecha) = 2010 AND mg.valor > vg.valmax WHERE EXTRACT(YEAR FROM M.fecha) = 2010 AND (me.id_variable IS NOT NULL OR mg.id_variable IS NOT NULL); 

I also suspect that you want join conditions on mg.id_departamento = d.id_departamento and me.id_departamento = d.id_departamento in the appropriate on clauses.

This assumes that you have appropriate indexes on the join keys (particularly the primary keys). In addition, you want an index on variable(nombre, valor) and medicion(id_variable, fecha, valor). If you are using id_departamento, it should be the first key in the index.

2 Comments

Thanks for the answer! The problem here is: what is V.id_variable?
@TomiSebastiánJuárez . . . I misunderstood the data model. I modified the answer.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.