
grabbing html from a separate page with cgi

how would i do this? like i have a page at something.com/page.cgi and i want to include the source code of http://www.geocities.com/keith in the html.

i tried the format:

Code:
print "Content-type: text/html\n\n";
include "http://www.geocities.com/keith";

but that didn't work. i can get it in php, but i need it in cgi, and i'm somewhat cgi dumb. any suggestions?
 
when you say include the source code is that

a) include the code not processed

b) include the output of the code (run on the other server)

c) include the source into your cgi and run the whole thing
 
i mean just include the textual html on the page being copied from.


so if the entire index page at geocities.com/keith read simply:
Code:
<title>whoopity doo</title>
...then it would print the text "&lt;title&gt;whoopity doo&lt;/title&gt;" in the cgi output wherever the command appeared. it basically just needs to print whatever html is on the page being grabbed. i just don't know how to get it to work.


php uses
Code:
<? include "http://geocities.com/keith" ?>
...and it works fine, but it's gotta be cgi.
 
perl

Code:
use LWP::Simple;
 
$url="http://domain.com/mypage.html";
print "Content-type: text/html\n\n";
getprint($url);

should work
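get() returns undef if the fetch fails, so if you want to catch the remote page being unreachable, a slightly longer sketch (same LWP::Simple, domain.com is still just a placeholder) would be:

Code:
use LWP::Simple;

my $url = "http://domain.com/mypage.html";
print "Content-type: text/html\n\n";

# get() returns the page body, or undef if the fetch failed
my $html = get($url);
if (defined $html) {
    print $html;
} else {
    print "<p>could not fetch $url</p>";
}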
 
dude, you rock!

quick question though, i'm using this for a new redirection option at wzr.net, to make your subdomain appear as a real address and not a framed redirect. it acts like ip pointing, only it points at a webpage, not an ip.

so, if you visit http://keith.wzr.net, it goes to http://www.weezer.com with this new redirection format

if you right click the page and hit 'properties', you'll see the page reads as keith.wzr.net. so it appears that keith.wzr.net is a real domain rather than a redirect.

but if you right click on an image and hit 'properties', you'll see it reads 'keith.wzr.net/imagename.jpg'. so not only does it load the html through the subdomain, it looks like the subdomain is actually hosting the images.

so what i'm wondering is, does it really load the images through wzr.net? or directly from the weezer.com server, and only appear to be hosted by keith.wzr.net? because if it's pulling them off weezer.com through the redirect, i'm not going to do it, unfortunately; that would obviously eat up a ton of bandwidth.

if that's the case, i'd add
Code:
<base href="$url">
in there as well, that'd take care of the problem, but not before i figure out how it loads the images. it's actually kind of cool to see 'keith.wzr.net/image.jpg', but there's always the bandwidth deal...
 
since it's server side, the images are going to load through a script much like the one lucifer posted; it sends the data from your server to the client, so it uses up your bandwidth. :(
 
damn... thanks a lot. looks like i'll be adding the <base> tag.

well, at least it'll still load the html through it, making it look like a real domain.
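roughly what i'm thinking, adapting lucifer's snippet - just a sketch, with weezer.com standing in for whatever the subdomain points at, and a crude &lt;head&gt; match:

Code:
use LWP::Simple;

# $url is whatever page the subdomain is set to point at
my $url  = "http://www.weezer.com";
my $html = get($url);

print "Content-type: text/html\n\n";
if (defined $html) {
    # drop a <base> tag in right after <head> so relative image/link
    # urls resolve against the original server instead of the subdomain
    $html =~ s!<head>!<head><base href="$url/">!i;
    print $html;
}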
 
maybe you could do something like


print "Location: $real_url\n\n";

if it's an image etc (or not text)
 
actually that's what the <base href="$url"> tag would do, but it's not 100% failsafe.

i'd need a way of locating links with .jpg, .gif, .zip, etc. extensions and parsing in the real url. any ideas, oh cgi guru? :)

hey, any way i could have the cgi locate:
Code:
<img src="

in the code and change them all to:
Code:
<img src="$url/

? ...that would be pretty sweet, but probably more work than it's worth.
 
Code:
use LWP::Simple;
$content = get("http://.....");


would let you parse $content and then print it out; to change the links, just use regular expressions


also in LWP::Simple

head($url)

Get document headers. Returns the following 5 values if successful: ($content_type, $document_length, $modified_time, $expires, $server)

use that first to decide whether to serve the "fake" proxied copy or point them at the real location

ie pass it through, or use print "Location: .....\n\n"

do you want all this extra bandwidth?
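roughly like this - just a sketch, with weezer.com as the example target and a very simple text/html check:

Code:
use LWP::Simple;

my $url = "http://www.weezer.com/index.html";

# head() returns (content type, length, modified time, expires, server)
my ($content_type) = head($url);

if (defined $content_type && $content_type =~ m!^text/html!) {
    # html: pass it through this script (this uses your bandwidth)
    print "Content-type: text/html\n\n";
    getprint($url);
} else {
    # images, zips etc: bounce the browser straight to the real location
    print "Location: $url\n\n";
}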
 
alright, i think you lost me there. what would i have to do to this code to get what you're saying?:

Code:
{
	use LWP::Simple;
	print "Content-type: text/html\n\n";
	getprint($url);
}

i'm guessing it shouldn't be too much more bandwidth if i can filter out everything but the html. hmmm, but maybe i'm wrong.
 
yep, that wasn't very well written, I needed to get to bed.

two things

1) you could check file types using head() to see if things are text/html, though it's probably easier to just look for .htm and .html

2) using get you can grab the contents of a page and then change all the href/src attributes as wanted using regular expressions

so instead of
Code:
use LWP::Simple;
print "Content-type: text/html\n\n";
getprint($url);

you have

Code:
use LWP::Simple;

$content = get($url);

# transform paths - you'd want a better reg exp than this (just an example)
$content =~ s!<img src="!<img src="$url/!ig;

print "Content-type: text/html\n\n";
print $content;
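for the href/src part, a slightly broader (still rough) substitution that catches href as well and leaves urls that are already absolute alone might look like:

Code:
# still not bulletproof - a proper html parser would be safer,
# but this also rewrites href and skips anything already starting with http
$content =~ s!(src|href)="(?!http)!$1="$url/!ig;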
 
I would also like to know how to do this, but via php. I want a script where, say, you put in www.mydomain.com and it does something similar to what keith.wzr.net has (full path forwarding etc., so it looks like a real domain).

I need this via PHP; if anyone has any information it would be greatly appreciated.
 
PHP:
<? 
# get html
$stuff=open("http://mydomain.com/page.html");
#rewrite urls etc
$stuff=preg_replace(    some patten , some replace , $stuff)
#output
echo $stuff;
?>
 
Thanks lucifer, didn't work though:

Parse error: parse error in /home/test/public_html/test.php on line 5
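 
the parse error on line 5 is most likely the preg_replace call - "some patten" / "some replace" were placeholders and need to be real quoted arguments (and open() isn't a php function; fopen is). a filled-in sketch, assuming allow_url_fopen is on and with mydomain.com still just the example:

PHP:
<?php
# grab the remote page
$url = "http://mydomain.com/page.html";
$fp = fopen($url, "r");
if (!$fp) { die("could not open $url"); }
$stuff = "";
while (!feof($fp)) {
    $stuff .= fread($fp, 8192);
}
fclose($fp);

# rewrite image urls so they point back at the original server
# (same idea as the perl substitution earlier in the thread)
$stuff = preg_replace('!<img src="!i', '<img src="' . $url . '/', $stuff);

# output
echo $stuff;
?>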
 