7

I'm trying to set up two PostgreSQL servers on one machine and perform a streaming replication. I have succeeded once, but when i've tried again following exactly the same steps it doesn't work.. Those are the steps: I have $PGDATA = home/postgresql/9.1/data and $STANDBY = home/postgresql/9.1/data2

  1. Set up two nodes:
     initdb -D $PGDATA initdb -D $STANDBY
  2. In the master node create a user for replication. I do that in pgAdmin (it does have superuser privileges)
  3. In the master node in pg_hba.conf add the part that allows standby to connect: host replication repuser 127.0.0.1/0 md5
  4. In the master node in postgresql.conf set:

    max_wal_senders = 1 archive_mode = on archive_command = 'cp %p ~/postgresql/backup/archivedir/%f' wal_level = archive wal_keep_segments = 32

  5. Start the master node and do the base backup:

    psql -d dellstore2 -c "SELECT pg_start_backup('backup for replication', true)" rsync -av ${PGDATA}/ $STANDBY --exclude postmaster.pid psql -d dellstore2 -c "select pg_stop_backup()" 
    pg_stop_backup says that everything is fine, all the WAL files were archived

  6. In the standby (data2) node I create recovery.conf with:

    standby_mode = 'on' primary_conninfo = 'host=127.0.0.1 port=5432 user=repuser password=haslo' trigger_file = '/home/michau/postgresql/replication.trigger' restore_command = 'cp /home/michau/postgresql/backup/archivedir/%f "%p"'

  7. Start the master node, then start the standby node - replication should start and standby should catch up with the master. That was exactly what happened the first time. Now when I start the standby I get: "Address already in use" error. Of course both standby and master have the same port specified in postgresql.conf (they have exactly the same postgresql.conf files). If I change the port in standby to let's say 5433 then I get:

     LOG: database system was shut down in recovery at 2012-06-12 19:48:01 CEST LOG: entering standby mode cp: cannot stat /home/michau/postgresql/backup/archivedir/000000010000000000000007: No such file or directory LOG: consistent recovery state reached at 0/7000070 LOG: record with zero length at 0/7000070 cp: cannot stat /home/michau/postgresql/backup/archivedir/000000010000000000000007: No such file or directory LOG: streaming replication successfully connected to primary LOG: redo starts at 0/7000070 

And it just hangs here. Running ps -ef | grep postgresql yields:

 michau 2491 1898 0 19:46 pts/0 00:00:00 postgres -D /home/michau/postgresql/9.1/data michau 2493 2491 0 19:46 ? 00:00:01 postgres: writer process michau 2494 2491 0 19:46 ? 00:00:00 postgres: wal writer process michau 2495 2491 0 19:46 ? 00:00:00 postgres: autovacuum launcher process michau 2496 2491 0 19:46 ? 00:00:00 postgres: archiver process last was 000000010000000000000008 michau 2497 2491 0 19:46 ? 00:00:00 postgres: stats collector process michau 2571 2214 0 19:49 pts/1 00:00:00 postgres -D /home/michau/postgresql/9.1/data2 michau 2572 2571 0 19:49 ? 00:00:01 postgres: startup process recovering 000000010000000000000009 michau 2575 2571 0 19:49 ? 00:00:01 postgres: writer process michau 2578 2571 0 19:49 ? 00:00:02 postgres: wal receiver process streaming 0/99782DC michau 2579 2491 0 19:49 ? 00:00:00 postgres: wal sender process repuser 127.0.0.1(42142) streaming 0/99782DC michau 2586 2491 0 19:51 ? 00:00:00 postgres: michau postgres ::1(49941) idle michau 2587 2491 0 19:51 ? 00:00:01 postgres: michau dellstore2 ::1(49942) idle 
The recovering 0000000010000009 where changing for a while, but for half an hour it doesn't anymore.

I'm sure there is something I must have done the first time and not written down or something, but I am at a complete loss to say what it was. I would appreciate any help.

4
  • To clarify your configuration on the slave, what port numbers do you for these keys in postgresql.conf? 'port' and 'primary_conninfo' ? The former should be 5433 and the latter 5432. Commented Jun 12, 2012 at 19:35
  • 1
    That's what I have right now, but the replication is not working. Shouldn't the ports be the same in master and slave so in case of failover the slave can replace the master? Commented Jun 12, 2012 at 19:50
  • 2
    So what doesn't work? According to the ps output, the replication seems to work. Commented Jun 12, 2012 at 23:06
  • 3
    The ports cannot be the same when they run on the same machine. Failover is normally done with more then one physical machine as with a single machine it is pointless. Commented Jun 13, 2012 at 5:34

3 Answers 3

8

PostgreSQL replicas never finish recovering. This is by design. Basically a replica is always in "recovering from disaster" mode except that it is using receiving the WAL segments from the master rather than on disk.

So what you are seeing is not cause for concern. If it is not working yet, then you will need to provide a more detailed description of what you are trying to do and what is not working. But as far as you are posting it seems normal.

0

The steps above are not for a Streaming Replication but for a WAL Archiving Replication setup. If you want to setup Streaming Replication take a look at this instruction Streaming Replication Setup.

0

Are you starting PostgreSQL on the standby using the postgresql command ? If yes then you should use the pg_ctl command as it detaches the DB server process from the terminal. With the postgresql command it will only display the log output on the screen like you have described and stay there because it is a standby server.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.