
obtain information from a website

Posted: Thu Nov 26, 2009 8:44 pm
by SabrinaE
Hello,
I'd like to know whether there is a way to retrieve real-estate rental and sale listings. Here is an example site: http://www.guyhoquet-reunion.com

I would like to get a list of all the rental offers and incorporate them into NeoBookDB. Is that possible? Thank you for your help.

Posted: Sat Nov 28, 2009 6:26 am
by Tony Kroos
You can send requests to the site's search engine and extract the information you need from the HTML code of the results page. I doubt it is possible to obtain a ready-made copy of the site's database unless you hack it.

Posted: Sat Nov 28, 2009 6:37 am
by Gaev
maxoureunion:

Do you own (or have control over) the website in question?

If you do, you can create a special request ... which can be initiated via the InternetGet/InternetPost commands ... and have the website script return a comma-separated list of values in plain-text format (without fancy HTML tags) ... which would then be easy to parse within NeoBook.
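To show how little parsing such a plain-text response would need, here is a minimal sketch in Python (not NeoBook script). It assumes a hypothetical format in which the site's script returns a header row followed by one listing per line; the column names `reference`, `type`, `price` and `town` are made up, and the real ones would be whatever you program the script to return.

```python
import csv
import io


def parse_listings(text):
    """Turn a plain-text, comma-separated response into a list of dicts.

    Assumes (hypothetically) a header row such as "reference,type,price,town"
    followed by one listing per line.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


# Made-up response, standing in for what the site's script might return.
sample = (
    "reference,type,price,town\n"
    "A101,rental,650,Town A\n"
    "A102,sale,185000,Town B\n"
)
listings = parse_listings(sample)
for item in listings:
    print(item["reference"], item["type"], item["price"])
```

Using the csv module rather than splitting each line on commas means a quoted field that itself contains a comma (a description, for instance) still parses correctly.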

If you don't, there might be legal issues with redistributing proprietary data anyway.

Posted: Sat Nov 28, 2009 7:57 am
by SabrinaE
The director of the real-estate agency has given me permission to use the photos and text on the site. I'm trying to make a multimedia presentation that is more attractive than the site. I need your help to retrieve the photos automatically with a query. What methods are there?

Thanks

Posted: Sat Nov 28, 2009 8:29 am
by dpayer
maxoureunion wrote: The director of the real-estate agency has given me permission to use the photos and text on the site. I'm trying to make a multimedia presentation that is more attractive than the site. I need your help to retrieve the photos automatically with a query. What methods are there?

Thanks


You can retrieve the HTML of the page using the InternetGet command in NeoBook (NB). Then you have to parse the document to find the images and other relevant data.

For images, look for <img> tags and their src attribute, something like:

 <img src="templates/subSilver/images/icon_minipost.gif">


This GIF is found on this page of the forum, so its full URL will be http://www.neosoftware.com/forum/templa ... nipost.gif (the relative src path is resolved against the folder of the page that contains it). Remember that Linux servers are case-sensitive.
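A minimal Python sketch of the same idea, assuming you already have the page's HTML as a string: it collects the src of every <img> tag and resolves relative paths against the page's own address, which is how a relative path like the forum GIF's becomes a full URL. The page address and HTML snippet are hypothetical. The parser lowercases tag names but leaves the src value untouched, which matters on case-sensitive servers.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes lowercased tag names and a list of (name, value) pairs.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # urljoin resolves a relative path against the page's address.
                self.images.append(urljoin(self.base_url, src))


def find_images(html, base_url):
    collector = ImageCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.images


# Hypothetical page address and HTML snippet, for illustration only.
page_url = "http://example.com/forum/viewtopic.php?t=1"
html = '<p>Hi</p><img src="templates/icons/post.gif"><img src="/logo.png" alt="">'
print(find_images(html, page_url))
```

A real HTML parser copes with attribute order, extra spaces and single quotes better than a plain text search for `src=` would.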

You may need to find the section of HTML code that represents the area you are interested in, and then search only within that section.
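Limiting the search to one section can be as simple as cutting out the text between two markers. This Python sketch assumes hypothetical `<div id="results">` and `<div id="footer">` markers; on a real page you would pick strings that appear just before and just after the block of listings.

```python
def extract_section(html, start_marker, end_marker):
    """Return the text between start_marker and end_marker ("" if either is missing).

    The markers are hypothetical; choose strings that bracket the area you need.
    """
    start = html.find(start_marker)
    if start == -1:
        return ""
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        return ""
    return html[start:end]


page = (
    '<div id="menu"><img src="menu.gif"></div>'
    '<div id="results"><img src="house1.jpg"></div>'
    '<div id="footer"></div>'
)
print(extract_section(page, '<div id="results">', '<div id="footer">'))
```

Searching only this slice keeps menu icons and logos out of the list of listing photos.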

It is similar when attempting to get data from a webpage. The webpage developer made great efforts to take simple data from a database and wrap the necessary code around it to make it look nice. Now you have to make great efforts to unwrap that code and get back to the data. There is no simple way to do this.

David P

Posted: Sat Nov 28, 2009 8:42 am
by Tony Kroos
There's nothing complicated: just analyze the HTML code of the resulting page and extract the information you need. You can also use VBScript to perform all of these operations.

Posted: Sat Nov 28, 2009 8:42 am
by Gaev
maxoureunion:
I need your help to retrieve the photos automatically with a query.
When I filled in the form on the home page and submitted a search, the resulting page had a URL (web address) like ...

http://www.guyhoquet-reunion.com/REP_1/ ... ces=1&ci=&

... so you can ...

a) either use this structure to compose the values for your InternetGet command ... and then parse the contents of the [variable] for references to image files.

b) or specify such a URL for one of your WebBrowser objects ... then deploy JavaScript/VBScript code to extract the src filenames of all image elements (beyond the scope of this free forum assist).

c) or (using the Firefox browser) right-click on any image ... and then select the option called "Copy Image Location" ... which will copy the URL of the image to the Clipboard ... so you can follow it up with a button in your NeoBook application that pastes it into a NeoBook [variable] or [array] ...
If "[myCount]" "=" ""
  SetVar "[myCount]" "!0"
EndIf
SetVar "[myCount]" "![myCount]+1"
SetVar "[myImageURL[myCount]]" "[Clipboard]"
... and then use the DownloadFile command to download each file to your local disk ... not fully automated, but easy/efficient to implement.
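Option a) can be sketched in Python as follows: compose the search URL from its parameters, then download the result (roughly what InternetGet does in NeoBook). The base URL and the parameter names are placeholders, not the site's real ones; copy those from the address bar after running a search. The UTF-8 default is also an assumption, since older French sites often use ISO-8859-1.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url, params):
    """Append URL-encoded query parameters to base_url.

    base_url and the parameter names are placeholders (hypothetical):
    copy the real ones from the address bar after a search on the site.
    """
    return base_url + "?" + urlencode(params)


def fetch_page(url, encoding="utf-8"):
    """Download a page and return its HTML as text.

    The encoding is an assumption; check the page's charset declaration.
    """
    with urlopen(url, timeout=30) as response:
        return response.read().decode(encoding, errors="replace")


url = build_search_url(
    "http://www.example.com/search",  # hypothetical search page
    {"type": "rental", "town": "Saint Paul", "pg": 1},  # hypothetical parameters
)
print(url)  # http://www.example.com/search?type=rental&town=Saint+Paul&pg=1
# html = fetch_page(url)  # then look for image references in html
```

urlencode takes care of spaces and accented characters in the values, which is easy to get wrong when gluing the URL together by hand.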

Some of the image file URLs I found were like ...

http://d.visuels.poliris.com/thumbnails ... 6-6cc1.jpg
http://5.visuels.poliris.com/2d/5/1/b/4 ... a-622d.jpg
http://www.guyhoquet-reunion.com/z/weba ... lphoto.jpg
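For the DownloadFile step, here is a hedged Python sketch that saves a list of image URLs (for example, the ones collected via the clipboard) to a local folder. The helper names `local_name` and `download_all` are hypothetical; the numeric prefix is there in case two different hosts use the same file name.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def local_name(url, index):
    """Build a local file name from the last part of the URL path.

    The zero-padded index keeps files in order and avoids name collisions.
    """
    name = os.path.basename(urlparse(url).path) or "image"
    return f"{index:03d}_{name}"


def download_all(urls, folder):
    """Download each image URL into folder, one file per URL."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for i, url in enumerate(urls, start=1):
        path = os.path.join(folder, local_name(url, i))
        urlretrieve(url, path)
        saved.append(path)
    return saved


print(local_name("http://img.example.com/thumbs/house.jpg", 1))  # 001_house.jpg
```

Calling `download_all(image_urls, "photos")` would then fetch every collected URL in one go.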

Posted: Sat Nov 28, 2009 9:11 am
by Wrangler
As David says, this is not a simple thing to do. Each website is different and has different source code. But you can probably get it done with some basic detective work.

First, do a query on the site. Look at the URL:

http://www.guyhoquet-reunion.com/REP_1/ ... ng=fr&ci=&

It should be safe to assume that the URL for every query will be pretty much the same, with just the parameters changing.

Now look at the source code of the page. Decide what information or images you want to grab, and look for consistencies in the code. The HTML on these pages should be the same, with just the data changing.

This is where the real detective work comes in. You must determine which snippets of code on these pages you want to grab. Then, using the DownloadFile action, download the results page and use NeoBook to parse and manipulate the page code until you get what you want, which you can store in a flat file. Just change the parameters in the URL as needed.
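Here is a rough Python sketch of that detective work, assuming (hypothetically) that every listing on the results page is wrapped in the same markup, `<div class="offer"><h2>TITLE</h2><span class="price">PRICE</span></div>`. The class names are invented for the example; the real ones have to be read from the site's page source. A regular expression like this only works because the markup repeats exactly, so it breaks as soon as the site changes its layout.

```python
import csv
import re

# Hypothetical markup pattern; replace it with what the real page source shows.
OFFER_RE = re.compile(
    r'<div class="offer">\s*<h2>(?P<title>.*?)</h2>\s*'
    r'<span class="price">(?P<price>.*?)</span>',
    re.DOTALL,
)


def extract_offers(html):
    """Return (title, price) pairs for every block matching the assumed pattern."""
    return [
        (m.group("title").strip(), m.group("price").strip())
        for m in OFFER_RE.finditer(html)
    ]


def save_flat_file(rows, path):
    """Write the rows to a simple CSV flat file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "price"])
        writer.writerows(rows)


page = (
    '<div class="offer"><h2>T2 apartment</h2><span class="price">650 EUR</span></div>'
    '<div class="offer"><h2>Villa</h2> <span class="price">1 200 EUR</span></div>'
)
print(extract_offers(page))
# save_flat_file(extract_offers(page), "offers.csv")
```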

This website will usually display more than one page of results. Click the page 2 link and look at the URL:

http://www.guyhoquet-reunion.com/REP_1/ ... NN_QRYpg=2

Note the "pg=2" at the end. Using DownloadFile, you can grab subsequent pages by increasing the page number at the end of this URL. Once you've downloaded all the pages, you can loop through them and grab all the data.
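The page loop can be sketched in Python like this, assuming the page number is an ordinary query-string parameter. The default name `pg` comes from the "pg=2" mentioned above, but because the real URL is truncated its full parameter name may differ, and the example URL is hypothetical; pass whatever name the real address uses.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def page_urls(first_page_url, last_page, param="pg"):
    """Return the search URL once per page, with param set to 1..last_page.

    Assumes the page number is a normal query-string parameter named param.
    """
    parts = urlparse(first_page_url)
    # keep_blank_values preserves empty parameters such as "ci="
    query = parse_qs(parts.query, keep_blank_values=True)
    urls = []
    for page in range(1, last_page + 1):
        query[param] = [str(page)]
        urls.append(urlunparse(parts._replace(query=urlencode(query, doseq=True))))
    return urls


for u in page_urls("http://www.example.com/search?type=rental&ci=&pg=1", 3):
    print(u)
```

The last page number can be read from the first results page; alternatively, keep going until a downloaded page contains no listings.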

I hope some of this will make sense to you. It is difficult to explain, and I don't understand French, but it might make enough sense to point you in the right direction. It should also be possible to use InternetGet and InternetPost, but I've never tried it that way. Using the basic concept I've described, that method may even be easier.

I have done this successfully on sites such as the IMDb movie database, and it works great. But keep in mind it won't be easy: it takes a lot of testing and tweaking in NeoBook. Knowing string manipulation in NeoBook helps A LOT.

Anyway, I hope this helps.

Posted: Sun Nov 29, 2009 11:13 am
by SabrinaE
Thank you all. I have tried to obtain the information with InternetGet; see this link: http://testmax974.ifrance.com/neobook/citations.zip

But it takes a long time. I thought there was a faster way to get all this information. Thank you for your advice; I'm going to continue my research in this direction.