
Questions and information about creating Internet-aware NeoBook applications, including PHP, HTML, FTP, HTTP, Email, etc.

Moderator: Neosoft Support

obtain information from a website

Postby SabrinaE » Thu Nov 26, 2009 8:44 pm

Hello,
I'd like to know if there is a way to retrieve rental or sale offers for real estate. Here is an example site: http://www.guyhoquet-reunion.com

I would like to get a list of all the rental offers and incorporate them into NeoBookDB. Is this possible? Thank you for your help.
SabrinaE
 
Posts: 182
Joined: Fri Mar 10, 2006 11:51 am

Postby Tony Kroos » Sat Nov 28, 2009 6:26 am

You can send requests to the site's engine and extract the necessary information from the HTML of the results pages. I doubt that it is possible to obtain a ready-made copy of the site's database, unless you hack it.
Tony Kroos
 
Posts: 402
Joined: Thu Oct 15, 2009 3:43 pm

Postby Gaev » Sat Nov 28, 2009 6:37 am

maxoureunion:

Do you own (or have control over) the website in question ?

If you do, you can create a special request ... which can be initiated via the InternetGet/InternetPost commands ... and have the website script return a comma-separated list of values in plain text (without fancy html tags) format ... which would then be easy to parse within NeoBook.
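
A minimal sketch of the NeoBook side, assuming a hypothetical getoffers.php script on a server you control that returns one offer per line as plain comma-separated text (the InternetGet and FileWrite parameter order is from memory of the NeoBook help, so verify it against your version):

Code: Select all
. Ask a server-side script you control for a plain-text,
. comma-separated list of offers (hypothetical URL and script)
InternetGet "http://www.example.com/getoffers.php" "[OfferList]"
. [OfferList] now holds lines like: ref,type,price,photo.jpg
. Append it to a local text file for import into NeoBookDB
FileWrite "[PubDir]offers.txt" "Append" "[OfferList]"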

If you don't, there might be some legal issues about re-distributing proprietary data anyway.
Gaev
 
Posts: 3716
Joined: Fri Apr 01, 2005 7:48 am
Location: Toronto, Canada

Postby SabrinaE » Sat Nov 28, 2009 7:57 am

The director of the real estate agency has given me permission to use the photos and text found on the site. I am trying to create a multimedia presentation that is more attractive than the site. I need your help to retrieve the photos automatically with a query. What methods are there?

thanks
SabrinaE
 
Posts: 182
Joined: Fri Mar 10, 2006 11:51 am

Postby dpayer » Sat Nov 28, 2009 8:29 am

maxoureunion wrote:The director of the real estate agency has given me permission to use the photos and text found on the site. I am trying to create a multimedia presentation that is more attractive than the site. I need your help to retrieve the photos automatically with a query. What methods are there?

thanks


You can retrieve the HTML of the page using the InternetGet command in NeoBook. Then you have to parse the document to find the images and other relevant data.

For images, you have to look for an <img src=...> tag, something like this:

Code: Select all
 <img src="templates/subSilver/images/icon_minipost.gif">


This gif is found on this page of the forum, so the URL will be http://www.neosoftware.com/forum/templa ... nipost.gif . Remember that Linux servers are case sensitive.

You may need to look for a section of html code that represents the area you are interested in and then search only in that section.

It is similar when attempting to get data from a webpage. The webpage developer made great efforts to take simple data from a database and wrap the necessary code around it to make it look nice. Now you have to make great efforts to unwrap that code and get back to the data. There is no simple way to do this.
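
To make the parsing step concrete, here is a rough NeoBook sketch that pulls the first <img src="..."> value out of a downloaded page. The URL is only an example, [#34] is NeoBook's code for a literal quote character, and the exact SearchStr/SubStr/Math parameter order should be checked against your NeoBook help file:

Code: Select all
. Grab the raw HTML of the page (example URL)
InternetGet "http://www.guyhoquet-reunion.com/" "[HTML]"
. Find the first image tag; SearchStr puts 0 in [Pos] if not found
SearchStr "<img src=[#34]" "[HTML]" "[Pos]"
If "[Pos]" ">" "0"
  . Step past the 10 characters of the <img src=" prefix
  Math "[Pos]+10" "0" "[Start]"
  . Take a generous slice, then cut it at the closing quote
  SubStr "[HTML]" "[Start]" "200" "[Slice]"
  SearchStr "[#34]" "[Slice]" "[EndPos]"
  Math "[EndPos]-1" "0" "[Len]"
  SubStr "[Slice]" "1" "[Len]" "[ImageFile]"
  AlertBox "Found" "[ImageFile]"
EndIf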

David P
dpayer
 
Posts: 1380
Joined: Mon Apr 11, 2005 5:55 am
Location: Iowa - USA

Postby Tony Kroos » Sat Nov 28, 2009 8:42 am

There's nothing complicated here: just analyze the resulting HTML page code and extract the necessary information. You can also use VBScript to perform all of these operations.
Tony Kroos
 
Posts: 402
Joined: Thu Oct 15, 2009 3:43 pm

Postby Gaev » Sat Nov 28, 2009 8:42 am

maxoureunion:
I need your help to retrieve the photos automatically with a query.
When I filled in the form on the Home page and submitted a search request, the resulting page had a URL (web address) like ...

http://www.guyhoquet-reunion.com/REP_1/ ... ces=1&ci=&

... so you can ...

a) either use this structure to compose the values for your InternetGet command ... and then parse the contents of the [variable] for references to image files.

b) or specify such a URL for one of your WebBrowser objects ... then deploy javascript/vbscript code to extract all image elements' src filenames (beyond the scope of this free forum assist).

c) or (using Firefox Browser) right click on any image ... and then select the option called "Copy Image Location" ... which will copy the URL of the image to the Clipboard ... so you can follow it up with a button in your NeoBook Application that can paste the same into a NeoBook [variable] or [array] ...
Code: Select all
If "[myCount]" "=" ""
  SetVar "[myCount]" "!0"
EndIf
SetVar "[myCount]" "1+[myCount]"
SetVar "[myImageURL[myCount]]" "[Clipboard]"
... and then use the DownloadFile command to download each file to your local disk ... not fully automated, but easy/efficient to implement.
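
Following on from the snippet above, the download step could be a simple loop. The file naming is arbitrary, and the DownloadFile parameter order is from memory of the NeoBook help, so confirm it in your version:

Code: Select all
. Walk the [myImageURL...] list built from the Clipboard and
. save each image beside the publication (assumed naming)
Loop "1" "[myCount]" "[i]"
  DownloadFile "[myImageURL[i]]" "[PubDir]image[i].jpg"
EndLoop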

Some of the image file URLs I found were like ...

http://d.visuels.poliris.com/thumbnails ... 6-6cc1.jpg
http://5.visuels.poliris.com/2d/5/1/b/4 ... a-622d.jpg
http://www.guyhoquet-reunion.com/z/weba ... lphoto.jpg
Gaev
 
Posts: 3716
Joined: Fri Apr 01, 2005 7:48 am
Location: Toronto, Canada

Postby Wrangler » Sat Nov 28, 2009 9:11 am

As David says, this is not a simple thing to do. Each web site is different and will use different source code. But you can probably get it done with some basic detective work.

First, do a query on the site. Look at the url:

http://www.guyhoquet-reunion.com/REP_1/ ... ng=fr&ci=&

It should be safe to assume that the url for every query will be pretty much the same, with just the parameters changing.

Now look at the source code of the page. Decide what information or images you want to grab, and look for consistencies in the code. The html on these pages should be the same, with just the data changing.

This is where the real detective work comes in. You must determine what snippets of code on these pages you want to grab. Then, using the DownloadFile action, download the results page and use NeoBook to parse and manipulate the page code until you get what you want, which you can store in a flat file. Just change the parameters in the url as needed.

This web site will usually display more than one page of results. Click on the page 2 link, and look at the url:

http://www.guyhoquet-reunion.com/REP_1/ ... NN_QRYpg=2

Note the "pg=2" on the end. Using DownloadFile, you can grab subsequent pages by increasing the page number at the end of this url. Once you've downloaded all the pages, you can loop through and grab all the data.
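
As a rough sketch of that paging loop (the site's real query string is abbreviated above, so the URL below is a placeholder; substitute the full one from your browser's address bar):

Code: Select all
. Fetch results pages 1 through 5 by bumping the pg= value
. (placeholder URL - substitute the site's full query string)
Loop "1" "5" "[pg]"
  DownloadFile "http://www.example.com/results?NN_QRYpg=[pg]" "[PubDir]page[pg].html"
EndLoop
. Each saved page can then be read back in and parsed for data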

I hope some of this will make sense to you. It is difficult to explain, and I don't understand French. But it might make enough sense to point you in the right direction. It should also be possible to use InternetGet and InternetPost, but I've never tried it that way. Using the basic concept I've described, it may even be easier with that method.

I have done this successfully on sites such as the IMDB movie database site, and it works great. But keep in mind it won't be easy: it takes a lot of testing and tweaking in NeoBook. Knowing string manipulation in NeoBook helps A LOT.

Anyway, I hope this helps.
Wrangler
--------------
"You never know about a woman. Whether she'll laugh, cry or go for a gun." - Louis L'Amour

Windows 7 Ultimate SP1 64bit
16GB Ram
Asus GTX 950 OC Strix
Software made with NeoBook
http://highdesertsoftware.com
Wrangler
 
Posts: 1505
Joined: Thu Mar 31, 2005 11:40 pm
Location: USA

Postby SabrinaE » Sun Nov 29, 2009 11:13 am

Thank you all. I have tried to obtain information with InternetGet; see this link: http://testmax974.ifrance.com/neobook/citations.zip

But the work is long; I thought there was a faster way to get all this information. Thank you for your advice. I'm going to start in this direction with my research.
SabrinaE
 
Posts: 182
Joined: Fri Mar 10, 2006 11:51 am

