
I'm trying to join three tables using inner joins, but the result contains more rows than it should. My tables are set up like this:

Table: gameday.atbats

    GameName                        Inning  num  b  s  o  Batter  Pitcher  Result
    ------------------------------  ------  ---  -  -  -  ------  -------  ------
    gid_2008_09_24_cinmlb_houmlb_1  1       1    2  3  1  457803  150116   Jay Bruce strikes out swinging.
    gid_2008_09_24_cinmlb_houmlb_1  1       2    1  0  2  433898  150116   Jeff Keppinger lines out to right fielder Hunter Pence.
    gid_2008_09_24_cinmlb_houmlb_1  1       3    3  1  2  458015  150116   Joey Votto singles on a line drive to right fielder Hunter Pence.
    gid_2008_09_24_cinmlb_houmlb_1  1       4    2  3  3  429665  150116   Edwin Encarnacion called out on strikes.
    gid_2008_09_24_cinmlb_houmlb_1  1       5    1  2  0  430565  459371   Kazuo Matsui singles on a line drive to right fielder Jay Bruce.

Table: gameday.pitches

    GameName                        GameAtBatID  Result
    ------------------------------  -----------  ------
    gid_2008_09_24_cinmlb_houmlb_1  1            Called Strike
    gid_2008_09_24_cinmlb_houmlb_1  1            Ball
    gid_2008_09_24_cinmlb_houmlb_1  1            Swinging Strike
    gid_2008_09_24_cinmlb_houmlb_1  1            Ball
    gid_2008_09_24_cinmlb_houmlb_1  1            Foul
    gid_2008_09_24_cinmlb_houmlb_1  1            Foul
    gid_2008_09_24_cinmlb_houmlb_1  1            Swinging Strike
    gid_2008_09_24_cinmlb_houmlb_1  2            Ball
    gid_2008_09_24_cinmlb_houmlb_1  2            In play, out(s)
    gid_2008_09_24_cinmlb_houmlb_1  3            Called Strike
    gid_2008_09_24_cinmlb_houmlb_1  3            Ball

Table: gameday.batters

    GameName                        id      name_display_first_last
    ------------------------------  ------  -----------------------
    gid_2008_09_24_cinmlb_houmlb_1  407783  Geoff Geary
    gid_2008_09_24_cinmlb_houmlb_1  209315  David Newhan
    gid_2008_09_24_cinmlb_houmlb_1  115629  LaTroy Hawkins
    gid_2008_09_24_cinmlb_houmlb_1  113889  Darin Erstad
    gid_2008_09_24_cinmlb_houmlb_1  457803  Jay Bruce
    gid_2008_09_24_cinmlb_houmlb_1  433898  Jeff Keppinger
    gid_2008_09_24_cinmlb_houmlb_1  458015  Joey Votto
    gid_2008_09_24_cinmlb_houmlb_1  429665  Edwin Encarnacion

I'm running what seems like a fairly standard set of inner joins, connecting the tables to get an output that shows, pitch by pitch, what each batter did throughout the game. My code is as follows:

    SELECT gameday.atbats.inning,
           gameday.batters.name_display_first_last,
           gameday.pitches.Result
    FROM gameday.atbats
    INNER JOIN gameday.pitches
        ON gameday.atbats.num = gameday.pitches.gameAtBatID
    INNER JOIN gameday.batters
        ON gameday.atbats.batter = gameday.batters.ID
    WHERE gameday.atbats.gamename = "gid_2008_09_24_cinmlb_houmlb_1"

My issue is that when I run this query, batters have more results than they should. For example, Jay Bruce (num 1 in the atbats table) should have 7 pitches thrown to him in the first inning, but the query returns 10 pitches for him. What am I doing incorrectly to get these results? Also, I am aware that the field names are poorly chosen, but they were named by someone else, and I have not had a chance to change them yet.

2 Answers


I'm betting that atbats.num and pitches.GameAtBatID are not meant to globally uniquely identify an at-bat, but rather, that they only uniquely identify an at-bat within a given game. So in addition to restricting atbats.GameName to the desired game, you also need to specify that pitches.GameName = atbats.GameName:

    SELECT gameday.atbats.inning,
           gameday.batters.name_display_first_last,
           gameday.pitches.Result
    FROM gameday.atbats
    JOIN gameday.pitches
        ON gameday.atbats.GameName = gameday.pitches.GameName
       AND gameday.atbats.num = gameday.pitches.GameAtBatID
    JOIN gameday.batters
        ON gameday.atbats.GameName = gameday.batters.GameName
       AND gameday.atbats.batter = gameday.batters.ID
    WHERE gameday.atbats.gamename = 'gid_2008_09_24_cinmlb_houmlb_1'

(Note: I also included the analogous AND for batters, because although the values of batters.ID are large enough that it seems plausible that that really is a unique field, it made sense to include it for consistency.)
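To see why the missing GameName condition can inflate the row count, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the asker's database. The second game and its three pitches are invented for illustration, chosen so the counts match the 7 and 10 from the question; the real extra rows may come from a different game or a different number of games.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE atbats  (GameName TEXT, num INTEGER, batter INTEGER);
    CREATE TABLE pitches (GameName TEXT, GameAtBatID INTEGER, Result TEXT);
""")

game = "gid_2008_09_24_cinmlb_houmlb_1"
other = "hypothetical_other_game"  # invented second game, not in the original data

# At-bat 1 of the real game (Jay Bruce), as in the question's sample.
conn.execute("INSERT INTO atbats VALUES (?, 1, 457803)", (game,))

# The seven pitches of at-bat 1 from the question's sample.
for result in ["Called Strike", "Ball", "Swinging Strike", "Ball",
               "Foul", "Foul", "Swinging Strike"]:
    conn.execute("INSERT INTO pitches VALUES (?, 1, ?)", (game, result))

# Three invented pitches for at-bat 1 of the invented game.
for result in ["Ball", "Ball", "Foul"]:
    conn.execute("INSERT INTO pitches VALUES (?, 1, ?)", (other, result))

# Join on the at-bat number only: pitches from every game with num = 1 match.
num_only = conn.execute("""
    SELECT COUNT(*) FROM atbats a
    JOIN pitches p ON a.num = p.GameAtBatID
    WHERE a.GameName = ?
""", (game,)).fetchone()[0]

# Join on game and at-bat number: only this game's pitches match.
with_game = conn.execute("""
    SELECT COUNT(*) FROM atbats a
    JOIN pitches p ON a.GameName = p.GameName AND a.num = p.GameAtBatID
    WHERE a.GameName = ?
""", (game,)).fetchone()[0]

print(num_only, with_game)  # 10 7
```

The WHERE clause filters only atbats, so without the GameName condition in the ON clause the pitches side of the join is unrestricted.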



That is true. SQL logically applies the joins from top to bottom, so when you join the first two tables with

Inner join gameday.pitches on gameday.atbats.num = gameday.pitches.gameAtBatID 

you get these results:

    GameName                        GameAtBatID  Result           Batter
    ------------------------------  -----------  ---------------  ------
    gid_2008_09_24_cinmlb_houmlb_1  1            Called Strike    457803
    gid_2008_09_24_cinmlb_houmlb_1  1            Ball             457803
    gid_2008_09_24_cinmlb_houmlb_1  1            Swinging Strike  457803
    gid_2008_09_24_cinmlb_houmlb_1  1            Ball             457803
    gid_2008_09_24_cinmlb_houmlb_1  1            Foul             457803
    gid_2008_09_24_cinmlb_houmlb_1  1            Foul             457803
    gid_2008_09_24_cinmlb_houmlb_1  1            Swinging Strike  457803
    gid_2008_09_24_cinmlb_houmlb_1  2            Ball             433898
    gid_2008_09_24_cinmlb_houmlb_1  2            In play, out(s)  433898
    gid_2008_09_24_cinmlb_houmlb_1  3            Called Strike    458015
    gid_2008_09_24_cinmlb_houmlb_1  3            Ball             458015

Then, when you add the second join, that is

inner join gameday.batters on gameday.atbats.batter = gameday.batters.ID 

you get this result from the three tables:

    name_display_first_last  GameAtBatID  Result           Batter
    -----------------------  -----------  ---------------  ------
    Jay Bruce                1            Called Strike    457803
    Jay Bruce                1            Ball             457803
    Jay Bruce                1            Swinging Strike  457803
    Jay Bruce                1            Ball             457803
    Jay Bruce                1            Foul             457803
    Jay Bruce                1            Foul             457803
    Jay Bruce                1            Swinging Strike  457803
    Jeff Keppinger           2            Ball             433898
    Jeff Keppinger           2            In play, out(s)  433898
    Joey Votto               3            Called Strike    458015
    Joey Votto               3            Ball             458015
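The walkthrough above holds only if each join key matches exactly one row on the other side. One way to check that is to count how many rows share each key. The sketch below does this for batters.id, again with Python's sqlite3 module as a stand-in for the real database; the second game's row is invented for illustration and is not in the original data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE batters (GameName TEXT, id INTEGER, name_display_first_last TEXT)"
)

rows = [
    ("gid_2008_09_24_cinmlb_houmlb_1", 457803, "Jay Bruce"),
    ("gid_2008_09_24_cinmlb_houmlb_1", 458015, "Joey Votto"),
    # Invented row: the same player listed again for a hypothetical second game.
    ("hypothetical_other_game", 457803, "Jay Bruce"),
]
conn.executemany("INSERT INTO batters VALUES (?, ?, ?)", rows)

# Any id returned here appears in more than one row, so joining on id alone
# would repeat every matching pitch once per duplicate row.
dupes = conn.execute("""
    SELECT id, COUNT(*) AS n
    FROM batters
    GROUP BY id
    HAVING COUNT(*) > 1
    ORDER BY id
""").fetchall()

print(dupes)  # [(457803, 2)]
```

The same GROUP BY ... HAVING check can be pointed at (GameName, num) in atbats to confirm which combination of columns is actually unique.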
