I'm using MySQL to work with a large log table (300 million records or so) with four columns (two varchars, an int, and a key), but queries against it are taking a long time.
The goal is to dig through the log and find records that are taking a certain action at a high frequency: records with a status of D or U during events with an EventID higher than an arbitrary value. I'm inserting them into a new table using a GROUP BY, and it's taking upwards of a full day to run. Is there a way to do this faster?
```sql
INSERT INTO `tbl_FrequentActions` (`ActionCount`, `RecordNumber`)
SELECT COUNT(`idActionLog`) AS ActionCount, `RecordNumber`
FROM `ActionLog`
WHERE (`ActionStatus` LIKE 'D' OR `ActionStatus` LIKE 'U')
  AND `EventID` > 103
GROUP BY `RecordNumber`
HAVING COUNT(`idActionLog`) > 19;
```

Would it be faster to use temporary tables to apply the WHERE conditions separately, i.e. create a temporary table to cut everything down before running the GROUP BY?
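For what it's worth, checking the plan for the SELECT part should show whether any of the indexes are actually being used; this is just that check, with the same names as above:

```sql
-- Show the execution plan for the SELECT part of the slow INSERT,
-- to see whether any index on ActionLog is actually used.
EXPLAIN
SELECT COUNT(`idActionLog`) AS ActionCount, `RecordNumber`
FROM `ActionLog`
WHERE (`ActionStatus` LIKE 'D' OR `ActionStatus` LIKE 'U')
  AND `EventID` > 103
GROUP BY `RecordNumber`
HAVING COUNT(`idActionLog`) > 19;
```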
All fields in the ActionLog are indexed.
EDIT: All the data is already in the log database in one table. It was mentioned that I was ambiguous on that point earlier.
The indexes are single-column indexes, one per column (no composite indexes).
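For context, a composite index covering the filter and grouping columns would look something like the sketch below; the index name is made up and I have not actually added it:

```sql
-- One composite index over the columns the query filters and groups by;
-- separate single-column indexes generally can't serve this query as well.
ALTER TABLE `ActionLog`
    ADD INDEX `ix_status_event_record` (`ActionStatus`, `EventID`, `RecordNumber`);
```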
EDIT2: Somebody asked whether my log and buffer settings are correctly configured for a table of this size. That's a great question, but I don't know. Yes, the table is InnoDB.
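For anyone checking the same thing, this is how I understand you inspect the relevant setting; the commented SET line is only a placeholder value, not a recommendation:

```sql
-- Report the current InnoDB buffer pool size (in bytes).
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Since MySQL 5.7 the buffer pool can be resized at runtime; 8G here is a placeholder.
-- SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;
```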
I built a test table of a couple million records and ran the query against it. It took 1 minute 30 seconds. I then broke the query into two steps: a temporary table to handle the WHERE clause, followed by the GROUP BY query against that temporary table. That knocked the time down to under a minute, so at full scale that could translate into a savings of several hours.
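Roughly what that two-step version looks like; the temporary table name and its index are just my own choices:

```sql
-- Step 1: materialize only the rows that pass the WHERE clause.
CREATE TEMPORARY TABLE tmp_FilteredActions
    (INDEX (RecordNumber))
    ENGINE=InnoDB
SELECT `idActionLog`, `RecordNumber`
FROM `ActionLog`
WHERE (`ActionStatus` LIKE 'D' OR `ActionStatus` LIKE 'U')
  AND `EventID` > 103;

-- Step 2: aggregate the much smaller temporary table.
INSERT INTO `tbl_FrequentActions` (`ActionCount`, `RecordNumber`)
SELECT COUNT(`idActionLog`), `RecordNumber`
FROM tmp_FilteredActions
GROUP BY `RecordNumber`
HAVING COUNT(`idActionLog`) > 19;
```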
EDIT3: Can I use ON DUPLICATE KEY UPDATE to make this faster? I tried the query below, but it just ran forever. I think it's a Cartesian product issue. Do I need to alias the tables somehow?
```sql
INSERT INTO `tbl_FrequentActions` (`ActionCount`, `RecordNumber`)
SELECT 1 AS ActionCount, `RecordNumber`
FROM `ActionLog`
WHERE (`ActionStatus` LIKE 'D' OR `ActionStatus` LIKE 'U')
  AND `EventID` > 103
ON DUPLICATE KEY UPDATE `ActionCount` = `ActionCount` + 1;
```
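As far as I understand, ON DUPLICATE KEY UPDATE only fires when an inserted row collides with a unique key, so this approach assumes `tbl_FrequentActions` has one on `RecordNumber`; the index name below is made up:

```sql
-- Without a unique key on RecordNumber every selected row inserts as a new record,
-- the UPDATE branch never runs, and the counts never accumulate.
ALTER TABLE `tbl_FrequentActions`
    ADD UNIQUE KEY `uq_RecordNumber` (`RecordNumber`);
```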
Comments: Changing `like` to `=` might help; I'm not sure that MySQL will correctly optimise `ActionStatus like 'D'` to `ActionStatus = 'D'`. What is your `innodb_buffer_pool_size`? (My point is that the timings for the smaller table may not extrapolate to the bigger table.)
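For reference, the `=`/`IN` version of that filter, using the same column names as above, would be:

```sql
-- Same filter written with IN instead of LIKE, in case the optimizer
-- treats equality on a single-character status better than LIKE.
SELECT COUNT(`idActionLog`) AS ActionCount, `RecordNumber`
FROM `ActionLog`
WHERE `ActionStatus` IN ('D', 'U')
  AND `EventID` > 103
GROUP BY `RecordNumber`
HAVING COUNT(`idActionLog`) > 19;
```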