I'm trying to read HTML from a URLConnection. In one case, the HTML file I'm trying to read includes 5 line breaks before the actual doctype declaration. In this case the input reader throws an EOFException.
URL pageUrl = new URL("http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html");
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
DataInputStream dis = new DataInputStream(getConn.getInputStream());
// some read method here

Has anyone run into a problem like this?
URL pageUrl = new URL("http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html");
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
DataInputStream dis = new DataInputStream(getConn.getInputStream());
String urlData = "";
while ((urlData = dis.readUTF()) != null) // exception thrown
    System.out.println(urlData);
java.io.EOFException
    at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
    at java.io.DataInputStream.readUTF(DataInputStream.java:572)
    at java.io.DataInputStream.readUTF(DataInputStream.java:547)
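For context on the trace above: DataInputStream.readUTF() does not read text lines. It expects a 2-byte length prefix followed by that many bytes of modified UTF-8, which is the format DataOutputStream.writeUTF() produces. Here is a minimal sketch of that behaviour, using in-memory streams rather than the real page. The class name ReadUtfDemo and the sample bytes are made up for illustration, and this may not be the only cause of the exception in the real request.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
class ReadUtfDemo {

    // Writes a string with writeUTF (length prefix + modified UTF-8)
    // and reads it back with readUTF. This is the pairing readUTF is meant for.
    static String roundTrip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(text);
        }
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            return in.readUTF();
        }
    }

    // Returns true if readUTF fails with EOFException on the given raw bytes.
    static boolean failsWithEof(byte[] rawBytes) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(rawBytes))) {
            in.readUTF();
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a page that starts with 5 line breaks (assumed content).
        byte[] html = "\n\n\n\n\n<!DOCTYPE html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(roundTrip("hello"));
        System.out.println(failsWithEof(html));
    }
}
```

In the sample, the first two bytes are both 0x0A, so readUTF treats them as a length of 2570 and runs out of input while trying to read that many bytes. An empty stream fails even earlier, inside readUnsignedShort, which matches the top frame of the trace.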
With a BufferedReader, it just returns null and doesn't continue:
pageUrl = new URL("http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html");
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
BufferedReader br = new BufferedReader(new InputStreamReader(getConn.getInputStream()));
String urlData = "";
while (true) {
    urlData = br.readLine();
    System.out.println(urlData);
}

This outputs null.
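For comparison with the BufferedReader attempt above, here is a minimal sketch of the usual read-until-null loop. It assumes the page is UTF-8, and the names PageReader and readAll are made up for illustration. readLine() returns null only at end of stream, and leading blank lines come back as empty strings, so a null on the very first call suggests the response body itself was empty. Printing the HTTP status code, as main does, is one way to check for a redirect or an error response.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original code.
class PageReader {

    // Reads the whole stream line by line and returns the text,
    // with each line terminated by '\n'. Stops when readLine() returns null.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        URI page = URI.create(
                "http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html");
        HttpURLConnection conn = (HttpURLConnection) page.toURL().openConnection();
        // A status other than 200 (for example a redirect) can explain an empty body.
        System.out.println("HTTP status: " + conn.getResponseCode());
        // UTF-8 is an assumption; the real charset comes from the response headers.
        System.out.println(readAll(conn.getInputStream(), StandardCharsets.UTF_8));
    }
}
```

Taking an InputStream rather than a URL keeps readAll easy to try on in-memory data, such as a ByteArrayInputStream that starts with five line breaks.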