Richard
  • 133
  • 5

MongoDB Replica Set Member Server Changed: Status Stuck in STARTUP, Missing Config

I am on MongoDB v3.0.x and have a replica set with 3 members, but the second member (and its entire server) was unexpectedly taken down and destroyed. I have stood up a new server as a replacement for the second member and need help bringing it back to a working state. I am not sure what the proper way to bring a member back in is, even though the new server has the same address (mongochat02).

Do I need to run rs.reconfig(), or rs.remove() followed by rs.add() for mongochat02? Or am I supposed to follow a different procedure to get this member working?

When I issue rs.status() on the primary (mongochat03):

001-rs:PRIMARY> rs.status()
{
    "set" : "001-rs",
    "date" : ISODate("2017-04-07T19:39:23.860Z"),
    "myState" : 1,
    "term" : NumberLong(-1),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "members" : [
        {
            "_id" : 0,
            "name" : "mongochat01:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 174004,
            "optime" : Timestamp(1491593961, 4),
            "optimeDate" : ISODate("2017-04-07T19:39:21Z"),
            "lastHeartbeat" : ISODate("2017-04-07T19:39:22.386Z"),
            "lastHeartbeatRecv" : ISODate("2017-04-07T19:39:21.977Z"),
            "pingMs" : NumberLong(0),
            "syncingTo" : "mongochat03:27017",
            "configVersion" : 3
        },
        {
            "_id" : 1,
            "name" : "mongochat02:27017",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : Timestamp(0, 0),
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2017-04-07T19:39:23.672Z"),
            "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
            "pingMs" : NumberLong(0),
            "lastHeartbeatMessage" : "Operation timed out",
            "configVersion" : -1
        },
        {
            "_id" : 2,
            "name" : "mongochat03:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 15127641,
            "optime" : Timestamp(1491593963, 2),
            "optimeDate" : ISODate("2017-04-07T19:39:23Z"),
            "electionTime" : Timestamp(1491419961, 1),
            "electionDate" : ISODate("2017-04-05T19:19:21Z"),
            "configVersion" : 3,
            "self" : true
        }
    ],
    "ok" : 1
}
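To pick the problem member out of output like this, the status document can be filtered on the `health` and `configVersion` fields. The following is only a sketch: `unhealthyMembers` is a hypothetical helper name (not a MongoDB API), and the `sample` object is a trimmed copy of the output above rather than a live result.

```javascript
// Hypothetical helper (not part of MongoDB): return the members that the
// reporting node sees as unhealthy or without a known config version.
function unhealthyMembers(status) {
  return status.members
    .filter(function (m) {
      // health 0 means the heartbeat failed; configVersion -1 means the
      // reporting node has not learned a config version from that member.
      return m.health !== 1 || m.configVersion === -1;
    })
    .map(function (m) {
      return {
        name: m.name,
        stateStr: m.stateStr,
        message: m.lastHeartbeatMessage || ""
      };
    });
}

// Trimmed copy of the rs.status() output shown above (illustrative only).
var sample = {
  set: "001-rs",
  members: [
    { name: "mongochat01:27017", health: 1, stateStr: "SECONDARY", configVersion: 3 },
    { name: "mongochat02:27017", health: 0, stateStr: "(not reachable/healthy)",
      lastHeartbeatMessage: "Operation timed out", configVersion: -1 },
    { name: "mongochat03:27017", health: 1, stateStr: "PRIMARY", configVersion: 3 }
  ]
};

var problems = unhealthyMembers(sample);
```

In the mongo shell, the same function could be called as `printjson(unhealthyMembers(rs.status()))`. For the sample above, only mongochat02:27017 is returned.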

On mongochat02, rs.status() returns:

> rs.status()
{
    "info" : "run rs.initiate(...) if not yet done for the set",
    "ok" : 0,
    "errmsg" : "no replset config has been received",
    "code" : 94
}

Connections seem to work up until the heartbeat. The logs on mongochat02 show the following errors:

2017-04-06T10:42:11.831-0600 I REPL [ReplicationExecutor] Error in heartbeat request mongochat01:27017; ExceededTimeLimit: Operation timed out
2017-04-06T10:42:11.911-0600 I NETWORK [initandlisten] connection accepted from 10.1.240.185:36358 #6671 (151 connections now open)
2017-04-06T10:42:11.947-0600 I REPL [ReplicationExecutor] Error in heartbeat request to mongochat03:27017; ExceededTimeLimit: Operation timed out

This made me question whether the members are able to communicate with one another:

  • All the members are able to ping one another

  • However, mongochat02 is unable to connect to mongochat01 or mongochat03 through the mongo shell:

    [root@mongochat02]$ mongo --host mongochat03:27017
    MongoDB shell version: 3.2.9
    connecting to: mongochat03:27017/test
    2017-04-07T14:02:03.411-0600 I NETWORK [thread1] Socket recv() errno:110 Connection timed out mongochat03:27017
    2017-04-07T14:02:03.411-0600 I NETWORK [thread1] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_ERROR] server [mongochat03:27017]
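Since ping uses ICMP and does not exercise a TCP port, a successful ping does not show that port 27017 is reachable. A direct TCP check may help separate a port-level problem (for example, a firewall rule) from a MongoDB-level one. The following is a minimal Node.js sketch, assuming Node.js is available on the host; `canConnect` is a hypothetical helper name, and the 5000 ms timeout in the usage comment is an arbitrary choice.

```javascript
const net = require("net");

// Hypothetical helper: resolves true if a TCP connection to host:port
// opens within timeoutMs, and false on refusal, error, or timeout.
function canConnect(host, port, timeoutMs) {
  return new Promise(function (resolve) {
    const socket = new net.Socket();
    let settled = false;

    function finish(result) {
      if (settled) return;   // report only the first outcome
      settled = true;
      socket.destroy();      // release the socket in every case
      resolve(result);
    }

    socket.setTimeout(timeoutMs);
    socket.on("connect", function () { finish(true); });
    socket.on("timeout", function () { finish(false); });
    socket.on("error", function () { finish(false); });
    socket.connect(port, host);
  });
}

// Example usage from mongochat02 (host names taken from the post):
// canConnect("mongochat03", 27017, 5000).then(function (ok) {
//   console.log("mongochat03:27017 reachable:", ok);
// });
```

The check only reports whether the TCP connection opens. The shell output above shows a timeout on recv(), which happens after that point, so a `true` result here would not by itself rule out a problem further along the path.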
