C++ reading info from a webpage
I want my program to read just one line of information from a site, for example a "daily update", and copy that information, like this:
www.examplesite.com
webpage:
Daily Update: First Day Of Program Release
end of webpage
That's all the site will display.
How can I get my C++ Win32 program to read that line from that site?
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
- Contact:
Re: C++ reading info from a webpage
1: Implement the HTTP protocol to download the page in question
2: Parse the XML to get the wanted data.
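As a rough sketch of step 2 for the simple case in the question (a plain-text page, so no real XML parsing): once the raw HTTP response is in a string, the headers can be skipped and the wanted line picked out by its prefix. The name find_line_with_prefix and the "Daily Update:" prefix are illustrative assumptions, not part of any library.

```cpp
#include <string>

// Hypothetical helper: returns the first body line of a raw HTTP response
// that starts with `prefix`, or an empty string if there is none.
std::string find_line_with_prefix(const std::string& response,
                                  const std::string& prefix) {
    // The header block ends at the first empty line; the body follows it.
    const std::string::size_type header_end = response.find("\r\n\r\n");
    std::string::size_type pos =
        (header_end == std::string::npos) ? 0 : header_end + 4;

    while (pos < response.size()) {
        std::string::size_type eol = response.find('\n', pos);
        if (eol == std::string::npos) eol = response.size();

        std::string line = response.substr(pos, eol - pos);
        if (!line.empty() && line.back() == '\r') line.pop_back();  // strip CR

        if (line.compare(0, prefix.size(), prefix) == 0) return line;
        pos = eol + 1;
    }
    return "";
}
```

Calling find_line_with_prefix(response, "Daily Update:") on the example page would hand back the whole "Daily Update: ..." line.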
Re: C++ reading info from a webpage
Combuster wrote:
1: Implement the HTTP protocol to download the page in question
2: Parse the XML to get the wanted data.
That's rather far-fetched. See: http://www.w3.org/Library/
--Thomas
- Combuster
Re: C++ reading info from a webpage
Where did I say that there wasn't a library that does most of that for you?
Re: C++ reading info from a webpage
Hi,
Yeah .. that's right
--Thomas
Program to an interface not an implementation
Re: C++ reading info from a webpage
On Windows, either use the WinInet API or URLDownloadToCacheFile.
If the data is formatted as HTML, you can use the MSHTML component to manipulate it.
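A minimal sketch of the WinInet route, assuming a Windows build that links against wininet.lib. The helper name download_page and the user-agent string "DailyUpdateReader" are made up for illustration; on any other platform the function simply reports failure.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <wininet.h>
#pragma comment(lib, "wininet.lib")  // MSVC-specific; other toolchains link wininet explicitly
#endif

// Hypothetical helper: appends the content at `url` to `body`.
// Returns false on any failure (and always on non-Windows builds).
bool download_page(const std::string& url, std::string& body) {
#ifdef _WIN32
    HINTERNET session = InternetOpenA("DailyUpdateReader",
                                      INTERNET_OPEN_TYPE_PRECONFIG,
                                      NULL, NULL, 0);
    if (session == NULL) return false;

    HINTERNET request = InternetOpenUrlA(session, url.c_str(), NULL, 0,
                                         INTERNET_FLAG_RELOAD, 0);
    if (request == NULL) {
        InternetCloseHandle(session);
        return false;
    }

    char chunk[4096];
    DWORD got = 0;
    bool ok = true;
    for (;;) {
        // InternetReadFile may deliver the page in several pieces;
        // a successful read of zero bytes marks the end of the data.
        if (!InternetReadFile(request, chunk, sizeof chunk, &got)) {
            ok = false;
            break;
        }
        if (got == 0) break;
        body.append(chunk, got);
    }

    InternetCloseHandle(request);
    InternetCloseHandle(session);
    return ok;
#else
    (void)url;   // WinInet is not available here
    (void)body;
    return false;
#endif
}
```

For the page in the question, the text up to the first newline in body would then be the "Daily Update" line.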
Re: C++ reading info from a webpage
Thank you very much
Re: C++ reading info from a webpage
Not being (completely) serious here:
Sorry, I just wanted to write a piece of code.
Code: Select all
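#include <stdlib.h>
#include <stdio.h>
#define MAXLEN 200
int main()
{
    FILE * input;
    char infoline[ MAXLEN ];
    /* Let wget do the HTTP work; it saves the page as index.html. */
    if ( system( "wget http://www.examplesite.com/index.html" ) != 0 )
        return EXIT_FAILURE;
    input = fopen( "index.html", "r" );
    if ( input == NULL )
        return EXIT_FAILURE;
    /* Read only the first line of the downloaded page. */
    if ( fgets( infoline, MAXLEN, input ) == NULL )
        infoline[ 0 ] = '\0';
    fclose( input );
    remove( "index.html" );
    puts( infoline );
    return 0;
}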
Every good solution is obvious once you've found it.
Re: C++ reading info from a webpage
dak91 wrote: I made this simple url_get function in C/Linux socket
http://www.inventati.org/dak/src/c/geturl.cpp
Did you even try to compile that? It doesn't compile, it isn't written in C, and the more correct name of the API is Berkeley sockets.
I made this simple version in C. It uses getaddrinfo(3) (a better version of gethostbyname(3) and getservbyname(3)). Tested on FreeBSD/amd64 and NetBSD/sparc. link
Re: C++ reading info from a webpage
fronty wrote: Did you even try to compile that? It doesn't compile, it isn't written in C and more correct name of the API is Berkeley sockets.
I made that code 2 years ago, but I remember that it compiled correctly...
Anyway, thanks for the correction about the API name.
- Owen
- Member
- Posts: 1700
- Joined: Fri Jun 13, 2008 3:21 pm
- Location: Cambridge, United Kingdom
- Contact:
Re: C++ reading info from a webpage
dak91 wrote: I made that code 2 years ago, but I remember that it compile correctly...
Reading that code...
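OMG. Bringing in craptonne of unknown crap. Crazy.
Code: Select all
using namespace std;
We've never heard of std::string::length() now? Which is far faster?
Code: Select all
for(int x=0;x<strlen(serverd.c_str());x++){
Oh, and we're executing strlen every freaking time through the loop.
Wait, so we're reimplementing std::string::find/strchr now?
Code: Select all
if(serverd.c_str()[x] == '/'){
Again with the one-letter variables. I'm glad nobody is trying to read your code.
Code: Select all
y = x;
break;
Oh, wait...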
Yay! Let's poorly reimplement std::string's substring constructor.
Code: Select all
if(y!=0){
data = server = "";
for(int x=y;x<strlen(serverd.c_str());x++){ data += serverd.c_str()[x]; }
for(int x=0;x<y;x++){ server += serverd.c_str()[x]; }
}
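For comparison, the same split written with std::string::find and substr might look like the sketch below. split_host_path is a made-up name, and it assumes the input carries no "http://" scheme prefix, as in the code under review.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: splits "www.examplesite.com/index.html" into
// the host ("www.examplesite.com") and the path ("/index.html").
// Without any '/', the whole input is the host and the path is "/".
std::pair<std::string, std::string> split_host_path(const std::string& url) {
    const std::string::size_type slash = url.find('/');
    if (slash == std::string::npos) {
        return std::make_pair(url, std::string("/"));
    }
    return std::make_pair(url.substr(0, slash), url.substr(slash));
}
```

The returned pair holds the host in .first and the path in .second.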
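Oh, I encountered an error. Let's print it. Not, you know, return it to the caller. Not emit it to the error stream either.
Code: Select all
if(connect(y,(struct sockaddr*) &server_addr, sizeof(server_addr)) != 0){
cout<<"Cannot connect...\n";
return "";
}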
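Yargh! Let's build an invalid HTTP/1.1 request. (For a start, you're missing the required Host: header. And you definitely want that one, too. I mean, it would be such a shame if 99% of websites didn't work.)
Code: Select all
string get_request = "GET "+data+" HTTP/1.1\n\n\n\n";
Oops.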
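For reference, a minimal well-formed request could be built along these lines. HTTP/1.1 requires the Host: header, each line ends in CRLF, and one empty line closes the header block. build_get_request is a hypothetical helper, and the Connection: close header is an added assumption so that the server closes the socket once the response is complete.

```cpp
#include <string>

// Hypothetical helper: builds a minimal HTTP/1.1 GET request
// for `path` on `host`.
std::string build_get_request(const std::string& host,
                              const std::string& path) {
    std::string request = "GET " + path + " HTTP/1.1\r\n";
    request += "Host: " + host + "\r\n";   // mandatory in HTTP/1.1
    request += "Connection: close\r\n";    // assumption: one request per connection
    request += "\r\n";                     // empty line ends the headers
    return request;
}
```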
Let's pretend that the OS will always send my data in one go.
Code: Select all
send(y, get_request.c_str(), strlen(get_request.c_str()), 0);
Oh wait, it won't...
Wait, we are reinventing memset now?!
Code: Select all
char dat[10000];
for(int x=0;x<10000;x++){ dat[x] = '\0'; }
And what if my page is bigger than 10 kB? Did dynamically allocated buffers go out of style?
I mean, are we just forgetting that std::string and std::stringstream are part of the language?
Let's pretend the OS will always return the page in one go...
Code: Select all
recv(y, dat, 10000, 0);
close(y);
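A sketch of what looping on send and recv could look like with Berkeley sockets on a POSIX system. send_all and recv_all are hypothetical helper names; the read loop assumes the server closes the connection when the response is complete, and EINTR handling is left out for brevity.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: keeps calling send() until every byte of `data`
// has been handed to the OS. Returns false on error.
bool send_all(int fd, const std::string& data) {
    std::string::size_type sent = 0;
    while (sent < data.size()) {
        const ssize_t n = send(fd, data.data() + sent, data.size() - sent, 0);
        if (n <= 0) return false;
        sent += static_cast<std::string::size_type>(n);
    }
    return true;
}

// Hypothetical helper: appends everything the peer sends to `out`
// until the peer closes the connection. Returns false on error.
bool recv_all(int fd, std::string& out) {
    char chunk[4096];
    for (;;) {
        const ssize_t n = recv(fd, chunk, sizeof chunk, 0);
        if (n == 0) return true;   // orderly shutdown by the peer
        if (n < 0) return false;   // error
        out.append(chunk, static_cast<std::string::size_type>(n));
    }
}
```

Because the result grows inside a std::string, the page size is not capped by a fixed 10000-byte buffer.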
Code: Select all
y = 0;
char buf;
while(buf!='\0'){
buf = dat[y];
data += buf;
y++;
}
return data;
Incidentally, I notice that fronty's also makes the "send/recv will always do everything in one go" assumption, but is on the whole at least much cleaner.
It's ironic, but Solar's is the only one which works properly.
It's not like me to tear into people's code like this, but there's code with problems and code which is plain bad, and this falls into the latter category.
Seriously people, just use QNetworkAccessManager, or libwww, or libcurl.
Re: C++ reading info from a webpage
Owen wrote: Incidentally, I notice that fronty's also makes the "send/recv will always do everything in one go" assumption, but is on the whole at least much cleaner.
Damn, should've read it a couple of times more. Not enough network programming for me in the last few years.
Re: C++ reading info from a webpage
Owen wrote: Reading that code...
OMG. Bringing in craptonne of unknown crap. Crazy.
Code: Select all
using namespace std;
We've never heard of std::string::length() now? Which is far faster?
Code: Select all
for(int x=0;x<strlen(serverd.c_str());x++){
Oh, and we're executing strlen every freaking time through the loop.
Wait, so we're reimplementing std::string::find/strchr now?
Code: Select all
if(serverd.c_str()[x] == '/'){
Again with the one-letter variables. I'm glad nobody is trying to read your code.
Code: Select all
y = x;
break;
Oh, wait...
Yay! Let's poorly reimplement std::string's substring constructor.
Code: Select all
if(y!=0){
data = server = "";
for(int x=y;x<strlen(serverd.c_str());x++){ data += serverd.c_str()[x]; }
for(int x=0;x<y;x++){ server += serverd.c_str()[x]; }
}
Oh, I encountered an error. Let's print it. Not, you know, return it to the caller. Not emit it to the error stream either.
Code: Select all
if(connect(y,(struct sockaddr*) &server_addr, sizeof(server_addr)) != 0){
cout<<"Cannot connect...\n";
return "";
}
Yargh! Let's build an invalid HTTP/1.1 request. (For a start, you're missing the required Host: header. And you definitely want that one, too. I mean, it would be such a shame if 99% of websites didn't work.)
Code: Select all
string get_request = "GET "+data+" HTTP/1.1\n\n\n\n";
Oops.
Let's pretend that the OS will always send my data in one go.
Code: Select all
send(y, get_request.c_str(), strlen(get_request.c_str()), 0);
Oh wait, it won't...
Wait, we are reinventing memset now?!
Code: Select all
char dat[10000];
for(int x=0;x<10000;x++){ dat[x] = '\0'; }
And what if my page is bigger than 10 kB? Did dynamically allocated buffers go out of style?
I mean, are we just forgetting that std::string and std::stringstream are part of the language?
Let's pretend the OS will always return the page in one go...
Code: Select all
recv(y, dat, 10000, 0);
close(y);
Code: Select all
y = 0;
char buf;
while(buf!='\0'){
buf = dat[y];
data += buf;
y++;
}
return data;
I made it when I had just started programming.
Re: C++ reading info from a webpage
Send won't return until it has sent everything or it fails, unless the socket is set to non-blocking mode. Recv, however, may return less data than requested.