C++ reading info from a webpage

Posted: Wed Jun 09, 2010 12:09 am
by VolTeK
I want my program to read a single line of information from a site, for example a "daily update", and simply copy that information, like this:


www.examplesite.com

webpage:

Daily Update: First Day Of Program Release

end of webpage
That's all the site will display.

How can I get my C++ Win32 program to read that line from that site?

Re: C++ reading info from a webpage

Posted: Wed Jun 09, 2010 2:33 am
by Combuster
1: Implement the HTTP protocol to download the page in question
2: Parse the XML to get the wanted data.
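
For a page as simple as the one described, step 2 may not need a full XML parser. A minimal sketch, assuming the page body has already been downloaded into a std::string and the wanted line carries the literal label "Daily Update:" (the function name find_labeled_line is made up for illustration):

```cpp
#include <sstream>
#include <string>

// Scans `page` line by line for `label`. On success, stores the text that
// follows the label (leading spaces skipped) in `out` and returns true.
// Returns false and leaves `out` untouched when no line contains the label.
bool find_labeled_line(const std::string& page,
                       const std::string& label,
                       std::string& out)
{
    std::istringstream lines(page);
    std::string line;
    while (std::getline(lines, line)) {
        // Text fetched over HTTP often has CRLF line endings; drop the CR.
        if (!line.empty() && line[line.size() - 1] == '\r') {
            line.erase(line.size() - 1);
        }
        const std::string::size_type pos = line.find(label);
        if (pos == std::string::npos) {
            continue;
        }
        std::string::size_type start = pos + label.size();
        while (start < line.size() && line[start] == ' ') {
            ++start;
        }
        out = line.substr(start);
        return true;
    }
    return false;
}
```

Calling find_labeled_line(page, "Daily Update:", text) on the example page would leave "First Day Of Program Release" in text. If the page is real HTML with markup around the line, a proper parser is the safer choice.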

Re: C++ reading info from a webpage

Posted: Wed Jun 09, 2010 3:12 am
by Thomas
Combuster wrote:
1: Implement the HTTP protocol to download the page in question
2: Parse the XML to get the wanted data.
That's rather far-fetched :wink:. See: http://www.w3.org/Library/

--Thomas

Re: C++ reading info from a webpage

Posted: Wed Jun 09, 2010 4:29 am
by Combuster
Where did I say that there wasn't a library that does most of that for you? :wink:

Re: C++ reading info from a webpage

Posted: Wed Jun 09, 2010 4:57 am
by Thomas
Hi,
Yeah .. that's right :mrgreen:
Program to an interface, not an implementation.
--Thomas

Re: C++ reading info from a webpage

Posted: Wed Jun 09, 2010 3:17 pm
by Gigasoft
On Windows, either use the WinInet API or URLDownloadToCacheFile.

If the data is formatted as HTML, you can use the MSHTML component to manipulate it.

Re: C++ reading info from a webpage

Posted: Wed Jun 09, 2010 5:02 pm
by VolTeK
Thank you very much

Re: C++ reading info from a webpage

Posted: Thu Jun 10, 2010 7:00 am
by dak91
I made this simple url_get function in C/Linux socket

http://www.inventati.org/dak/src/c/geturl.cpp

Re: C++ reading info from a webpage

Posted: Thu Jun 10, 2010 9:53 am
by Solar
Not being (completely) serious here:

Code:

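#include <stdlib.h>
#include <stdio.h>

#define MAXLEN 200

int main()
{
    FILE * input;
    char infoline[ MAXLEN ];
    /* Let wget do the HTTP work; it saves the page as index.html. */
    if ( system( "wget http://www.examplesite.com/index.html" ) != 0 )
    {
        return EXIT_FAILURE;
    }
    input = fopen( "index.html", "r" );
    if ( input == NULL )
    {
        return EXIT_FAILURE;
    }
    /* Print the first line of the page, then clean up. */
    if ( fgets( infoline, MAXLEN, input ) != NULL )
    {
        puts( infoline );
    }
    fclose( input );
    remove( "index.html" );
    return 0;
}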
Sorry, I just wanted to write a piece of code. 8)

Re: C++ reading info from a webpage

Posted: Thu Jun 10, 2010 9:54 am
by fronty
dak91 wrote:I made this simple url_get function in C/Linux socket

http://www.inventati.org/dak/src/c/geturl.cpp
Did you even try to compile that? It doesn't compile, it isn't written in C, and the more correct name of the API is Berkeley sockets.

I made this simple version in C; it uses getaddrinfo(3) (a better version of gethostbyname(3) and getservbyname(3)). Tested on FreeBSD/amd64 and NetBSD/sparc. link

Re: C++ reading info from a webpage

Posted: Thu Jun 10, 2010 11:01 am
by dak91
fronty wrote:
dak91 wrote:I made this simple url_get function in C/Linux socket

http://www.inventati.org/dak/src/c/geturl.cpp
Did you even try to compile that? It doesn't compile, it isn't written in C, and the more correct name of the API is Berkeley sockets.

I made this simple version in C; it uses getaddrinfo(3) (a better version of gethostbyname(3) and getservbyname(3)). Tested on FreeBSD/amd64 and NetBSD/sparc. link
I wrote that code two years ago, but I remember it compiling correctly...

Anyway, thanks for the correction about the API name.

Re: C++ reading info from a webpage

Posted: Thu Jun 10, 2010 2:32 pm
by Owen
dak91 wrote:
fronty wrote:
dak91 wrote:I made this simple url_get function in C/Linux socket

http://www.inventati.org/dak/src/c/geturl.cpp
Did you even try to compile that? It doesn't compile, it isn't written in C, and the more correct name of the API is Berkeley sockets.

I made this simple version in C; it uses getaddrinfo(3) (a better version of gethostbyname(3) and getservbyname(3)). Tested on FreeBSD/amd64 and NetBSD/sparc. link
I wrote that code two years ago, but I remember it compiling correctly...

Anyway, thanks for the correction about the API name.
Reading that code...

Code:

using namespace std;
OMG. Bringing in a craptonne of unknown crap. Crazy.

Code:

	for(int x=0;x<strlen(serverd.c_str());x++){ 
We've never heard of std::string::length() now? Which is far faster?

Oh, and we're executing strlen every freaking time through the loop.

Code:

		if(serverd.c_str()[x] == '/'){ 
Wait, so we're reimplementing std::string::find/strchr now?

Code:

			y = x; 
			break; 
Again with the one-letter variables. I'm glad nobody is trying to read your code.

Oh, wait...

Code:

	if(y!=0){
		data = server = "";
		for(int x=y;x<strlen(serverd.c_str());x++){ data += serverd.c_str()[x]; }
		for(int x=0;x<y;x++){ server += serverd.c_str()[x]; }
	}
Yay! Let's poorly reimplement std::string's substring constructor.
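
For comparison, a minimal sketch of the same host/path split done with std::string::find and substr. It assumes the input has no scheme prefix such as http://, and the name split_host_path is made up for illustration:

```cpp
#include <string>
#include <utility>

// Splits "host/path" at the first '/' into {host, path}.
// When there is no '/', the whole input is the host and the path is "/".
std::pair<std::string, std::string> split_host_path(const std::string& url)
{
    const std::string::size_type slash = url.find('/');
    if (slash == std::string::npos) {
        return std::make_pair(url, std::string("/"));
    }
    // substr(0, slash) is everything before the '/';
    // substr(slash) is the '/' and everything after it.
    return std::make_pair(url.substr(0, slash), url.substr(slash));
}
```

With "www.examplesite.com/index.html" this yields "www.examplesite.com" and "/index.html", with no hand-written loops and no repeated strlen calls.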

Code:

	if(connect(y,(struct sockaddr*) &server_addr, sizeof(server_addr))  != 0){
		cout<<"Cannot connect...\n";
		return ""; 
	}
Oh, I encountered an error. Let's print it. Not, you know, return it to the caller. Not emit it to the error stream either.

Code:

		string get_request = "GET "+data+" HTTP/1.1\n\n\n\n";
Yargh! Let's build an invalid HTTP/1.1 request. (For a start, you're missing the required Host: header. And you definitely want that one, too. I mean, it would be such a shame if 99% of websites didn't work.)

Oops.
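
A minimal sketch of what a well-formed request could look like, assuming the host and path have already been separated. HTTP/1.1 wants CRLF line endings, a Host: header, and one empty line to end the headers. The Connection: close header is my own addition so the server closes the socket after the response, and the function name build_get_request is made up for illustration:

```cpp
#include <string>

// Builds a minimal HTTP/1.1 GET request for `path` on `host`.
std::string build_get_request(const std::string& host, const std::string& path)
{
    std::string request;
    request += "GET " + path + " HTTP/1.1\r\n";  // request line
    request += "Host: " + host + "\r\n";         // required in HTTP/1.1
    request += "Connection: close\r\n";          // ask the server to close when done
    request += "\r\n";                           // empty line ends the headers
    return request;
}
```

The reply will still start with a status line and headers, and the body may arrive with chunked transfer encoding, so this only covers the request half of the problem.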

Code:

		send(y, get_request.c_str(), strlen(get_request.c_str()), 0);
Let's pretend that the OS will always send my data in one go.

Oh wait, it won't...

Code:

		char dat[10000];
		for(int x=0;x<10000;x++){ dat[x] = '\0'; }
Wait, we are reinventing memset now?!

And what if my page is bigger than 10 kB? Did dynamically allocated buffers go out of style?

I mean, are we just forgetting that std::string and std::stringstream are part of the standard library? :-(

Code:

		recv(y, dat, 10000, 0);
		close(y);	
Let's pretend the OS will always return the page in one go...

Code:

		y = 0;
		char buf;
		while(buf!='\0'){
			buf = dat[y];
			data += buf;
			y++;
		}
		return data;
Again with the reimplementing basic functionality badly.

Incidentally, I notice that fronty's also makes the "send/recv will always do everything in one go" assumption, but is on the whole at least much cleaner.

It's ironic, but Solar's is the only one which works properly.

It's not like me to tear into people's code like this, but there's code with problems and code which is plain bad, and this falls into the latter category.

Seriously, people, just use QNetworkAccessManager, or libwww, or libcurl.

Re: C++ reading info from a webpage

Posted: Thu Jun 10, 2010 2:53 pm
by fronty
Owen wrote:Incidentally, I notice that fronty's also makes the "send/recv will always do everything in one go" assumption, but is on the whole at least much cleaner.
Damn, should've read it a couple more times. :D Not enough network programming for me in recent years.

Re: C++ reading info from a webpage

Posted: Thu Jun 10, 2010 4:55 pm
by dak91
Owen wrote: Reading that code...

I wrote it when I had just started programming.

Re: C++ reading info from a webpage

Posted: Sun Jun 13, 2010 3:19 pm
by Gigasoft
Send won't return until it has sent everything or it fails, unless the socket is set to non-blocking mode. Recv, however, may return less data than requested.
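
Because a single recv may deliver only part of the page, the usual pattern is to keep reading until the peer closes the connection. A minimal sketch, written against a generic callable so it can be tried without a real socket. The name read_until_closed and the read_some parameter are made up for illustration; read_some stands in for a recv-style call that returns the number of bytes read, 0 once the connection is closed, and a negative value on error:

```cpp
#include <cstddef>
#include <string>

// Appends everything read_some delivers to `out` until it reports
// end of stream (0). Returns false if read_some reports an error (< 0).
template <typename ReadSome>
bool read_until_closed(ReadSome read_some, std::string& out)
{
    char chunk[4096];
    for (;;) {
        const long got = read_some(chunk, sizeof chunk);
        if (got < 0) {
            return false;  // read error
        }
        if (got == 0) {
            return true;   // peer closed the connection, nothing more to read
        }
        // Append only the bytes actually received, however few they are.
        out.append(chunk, static_cast<std::size_t>(got));
    }
}
```

With Berkeley sockets, read_some could be a lambda that calls recv(fd, buf, len, 0) on a connected descriptor fd; the Winsock recv takes an int length, so a cast would be needed there. Reading until the connection closes only makes sense if the server actually closes it after the response.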