
Problem with code %20 in filenames


Postby CN_Iceman » Wed Aug 13, 2014 11:14 am

Hi people.

I am trying to download a list of PDF files (airport charts) and I found a rather curious problem: if a linked file has two consecutive spaces in its name, the browser does not find the file.
An example makes it easier to understand.

This link returns a list of links to the PDF files:
http://api.openaviationdata.com/v1/charts/retrieve/icao/lezg.xml

The XML file in question has some links to files with spaces in the filename.

In the XML there are 29 links, but I am having problems with 3 of them. Let's check one of them, the 11th link.
Code:
<dat>
    <name>AD 2 - LEZG - IAC HI-TACAN RWY 12R - MIPS</name>
    <url>
        charts/1407/LEZG/AD 2 - LEZG - IAC HI-TACAN RWY 12R - MIPS.pdf
    </url>
</dat>


Prepending http://data.openaviationdata.com/ to that path should give the full link to the PDF file:
Code:
http://data.openaviationdata.com/charts/1407/LEZG/AD 2 - LEZG - IAC HI-TACAN RWY 12R - MIPS.pdf


The problem is that you can't download the file with that link, because the link is not correct: between HI-TACAN and RWY 12R it has 1 space, but the real filename has 2.

Check this link now:
Code:
http://data.openaviationdata.com/charts/1407/LEZG/AD%202%20-%20LEZG%20-%20IAC%20HI-TACAN%20%20RWY%2012R%20-%20MIPS.pdf

As you can see, there are two %20 between HI-TACAN and RWY, and the link works. If you paste this link into your browser, you can see that the browser deletes one of the %20... Why? I don't know.

The browser translates each space into %20, and some filenames contain 2 or more consecutive spaces, so two spaces, for example, should become %20%20.
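
To make that encoding rule concrete, here is a minimal sketch in Python rather than NeoBook, purely as an illustration. It uses the standard library's `urllib.parse.quote`, which encodes every space separately, so two consecutive spaces come out as `%20%20`. The base URL and path are the ones from this post, with the two spaces restored; the helper name `build_chart_url` is made up for the example.

```python
from urllib.parse import quote

# Base URL taken from this post.
BASE = "http://data.openaviationdata.com/"


def build_chart_url(relative_path):
    """Percent-encode a relative path, keeping '/' as the path separator.

    Hypothetical helper for illustration only.
    """
    return BASE + quote(relative_path, safe="/")


# Path with TWO spaces between HI-TACAN and RWY.
path = "charts/1407/LEZG/AD 2 - LEZG - IAC HI-TACAN  RWY 12R - MIPS.pdf"
url = build_chart_url(path)
print(url)
```

The printed link should be the same as the working link shown above, with `%20%20` between HI-TACAN and RWY.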

When I capture the XML, I use this code:
Code:
InternetGet "[WEB]" "[Pagina]" "HideProgress"
FileWrite "[Carpeta_Datos]\CapturaPagina.dat" "All" "[Pagina]"


When I parse CapturaPagina.dat, I can see that the double spaces are gone, so it is impossible to build the correct link for the PDF files that have 2 spaces in their names.

Any help, please?
Greetings/Saludos, Jose.
www.icemansoft.es
CN_Iceman
 
Posts: 297
Joined: Tue Mar 01, 2011 11:04 am
Location: España

Re: Problem with code %20 in filenames

Postby Neosoft Support » Wed Aug 13, 2014 11:26 am

It looks like the xml file simply has a typo. If the file name is wrong in the xml file, the solution would be to edit the xml file and add the missing space or rename the pdf file so that it matches the xml.
NeoSoft Support
Neosoft Support
NeoSoft Team
 
Posts: 5593
Joined: Thu Mar 31, 2005 10:48 pm
Location: Oregon, USA

Re: Problem with code %20 in filenames

Postby CN_Iceman » Wed Aug 13, 2014 12:02 pm

Thanks for the answer, Dave.

Yes, now I think that may be the error. The problem is that my program searches nearly 10,000 airports, giving me thousands and thousands of links, and finding those bugs is not fun.
Also, some of the links and XML files change every 28 days...

Re: Problem with code %20 in filenames

Postby dec » Wed Aug 13, 2014 1:34 pm

Hello,

Could npUrlEncode help here? Probably not, if the problem turns out to be a "broken" link served by the site, but...
Enhance your NeoBook applications!
57 plugins, 1113 actions and 230 samples
NeoPlugins website: www.neoplugins.com
dec
 
Posts: 1663
Joined: Wed Nov 16, 2005 12:48 am
Location: Spain

Re: Problem with code %20 in filenames

Postby Gaev » Wed Aug 13, 2014 3:37 pm

Jose:

Try ...
Code:
DownloadFile "!http://api.openaviationdata.com/v1/charts/retrieve/icao/lezg.xml" "[PubDir]downloadedxml.txt" ""

... the file contents look like ...
Code:
<?xml version="1.0" encoding="UTF-8"?>
<message time="1407967830"><response><code>200</code><data><dat><name>AD 2 - LEZG - ADC 1.1</name><url>charts/1407/LEZG/AD 2 - LEZG - ADC 1.1.pdf</url></dat><dat><name>AD 2 - LEZG - ADC 1.2</name><url>charts/1407/LEZG/AD 2 - LEZG - ADC 1.2.pdf</url></dat><dat><name>AD 2 - LEZG - AOC1</name><url>charts/1407/LEZG/AD 2 - LEZG - AOC1.pdf</url></dat><dat><name>AD 2 - LEZG - AOC2</name><url>charts/1407/LEZG/AD 2 - LEZG - AOC2.pdf</url></dat><dat><name>AD 2 - LEZG - AOC4</name><url>charts/1407/LEZG/AD 2 - LEZG - AOC4.pdf</url></dat><dat><name>AD 2 - LEZG - ARR RWY 1230</name><url>charts/1407/LEZG/AD 2 - LEZG - ARR RWY 1230.pdf</url></dat><dat><name>AD 2 - LEZG - DEP RWY 1230 - AATC-1(C)</name><url>charts/1407/LEZG/AD 2 - LEZG - DEP RWY 1230 - AATC-1(C).pdf</url></dat><dat><name>AD 2 - LEZG - DEP RWY 1230</name><url>charts/1407/LEZG/AD 2 - LEZG - DEP RWY 1230.pdf</url></dat><dat><name>AD 2 - LEZG - DPN</name><url>charts/1407/LEZG/AD 2 - LEZG - DPN.pdf</url></dat><dat><name>AD 2 - LEZG - GMC</name><url>charts/1407/LEZG/AD 2 - LEZG - GMC.pdf</url></dat><dat><name>AD 2 - LEZG - IAC HI-TACAN  RWY 12R - MIPS</name><url>charts/1407/LEZG/AD 2 - LEZG - IAC HI-TACAN  RWY 12R - MIPS.pdf</url></dat><dat><name>AD 2 - LEZG - IAC HI-TACAN or ILS or LOC RWY 30R - MIPS</name><url>charts/1407/LEZG/AD 2 - LEZG - IAC HI-TACAN or ILS or LOC RWY 30R - MIPS.pdf</url></dat><dat><name>AD 2 - LEZG - IAC ILS Z RWY 30R</name><url>charts/1407/LEZG/AD 2 - LEZG - IAC ILS Z RWY 30R.pdf</url></dat><dat><name>AD 2 - LEZG - IAC LOC Z RWY 30R</name><url>charts/1407/LEZG/AD 2 - LEZG - IAC LOC Z RWY 30R.pdf</url></dat><dat><name>AD 2 - LEZG - IAC NDB Y RWY 12R</name><url>charts/1407/LEZG/AD 2 - LEZG - IAC NDB Y RWY 12R.pdf</url></dat><dat><name>AD 2 - LEZG - IAC NDB Z  RWY 12R - ICAO</name><url>charts/1407/LEZG/AD 2 - LEZG - IAC NDB Z  RWY 12R - ICAO.pdf</url></dat><dat><name>AD 2 - LEZG - IAC NDB Z  RWY 12R</name><url>charts/1407/LEZG/AD 2 - LEZG - IAC NDB Z  RWY 12R.pdf</url></dat><dat><name>AD 2 - LEZG - 
IAC TACAN RWY 12R - MIPS</name><url>charts/1407/LEZG/AD 2 - LEZG - IAC TACAN RWY 12R - MIPS.pdf</url></dat><dat><name>AD 2 - LEZG - IAC TACAN or ILS Y or LOC Y RWY 30R - MIPS</name><url>charts/1407/LEZG/AD 2 - LEZG - IAC TACAN or ILS Y or LOC Y RWY 30R - MIPS.pdf</url></dat><dat><name>AD 2 - LEZG - IAC VOR RWY 12R - ICAO</name><url>charts/1407/LEZG/AD 2 - LEZG - IAC VOR RWY 12R - ICAO.pdf</url></dat><dat><name>AD 2 - LEZG - IAC VOR RWY 12R</name><url>charts/1407/LEZG/AD 2 - LEZG - IAC VOR RWY 12R.pdf</url></dat><dat><name>AD 2 - LEZG - IAC VOR RWY 30R</name><url>charts/1407/LEZG/AD 2 - LEZG - IAC VOR RWY 30R.pdf</url></dat><dat><name>AD 2 - LEZG - PDC</name><url>charts/1407/LEZG/AD 2 - LEZG - PDC.pdf</url></dat><dat><name>AD 2 - LEZG - SID RWY 1230</name><url>charts/1407/LEZG/AD 2 - LEZG - SID RWY 1230.pdf</url></dat><dat><name>AD 2 - LEZG - STAR RWY 1230</name><url>charts/1407/LEZG/AD 2 - LEZG - STAR RWY 1230.pdf</url></dat><dat><name>AD 2 - LEZG - VAC JET MIL VFR CORRIDORS</name><url>charts/1407/LEZG/AD 2 - LEZG - VAC JET MIL VFR CORRIDORS.pdf</url></dat><dat><name>AD 2 - LEZG - VAC</name><url>charts/1407/LEZG/AD 2 - LEZG - VAC.pdf</url></dat><dat><name>AD 2 - LEZG- CDA - RWY 12R</name><url>charts/1407/LEZG/AD 2 - LEZG- CDA - RWY 12R.pdf</url></dat><dat><name>AD 2 - LEZG- CDA - RWY 30R</name><url>charts/1407/LEZG/AD 2 - LEZG- CDA - RWY 30R.pdf</url></dat></data></response></message>

... and it does contain the 2 spaces ... using NeoBook's string manipulation functions, you should be able to extract the content between each <url> and </url>.
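
The extraction idea can be sketched as follows. This is Python, not NeoBook, and only shows the logic: search for each `<url>`, then the next `</url>`, and copy what lies between them without touching the spaces. The function name `extract_urls` and the shortened `sample` string are assumptions made for the example; the two entries in it are copied from the file contents above.

```python
def extract_urls(xml_text):
    """Return the raw text between each <url> and </url>, spaces untouched."""
    open_tag, close_tag = "<url>", "</url>"
    urls = []
    pos = 0
    while True:
        start = xml_text.find(open_tag, pos)
        if start == -1:
            break  # no more <url> tags
        start += len(open_tag)
        end = xml_text.find(close_tag, start)
        if end == -1:
            break  # unterminated tag, stop rather than guess
        urls.append(xml_text[start:end])
        pos = end + len(close_tag)
    return urls


# Shortened sample: two entries taken from the downloaded file.
sample = (
    "<data>"
    "<dat><name>AD 2 - LEZG - GMC</name>"
    "<url>charts/1407/LEZG/AD 2 - LEZG - GMC.pdf</url></dat>"
    "<dat><name>AD 2 - LEZG - IAC HI-TACAN  RWY 12R - MIPS</name>"
    "<url>charts/1407/LEZG/AD 2 - LEZG - IAC HI-TACAN  RWY 12R - MIPS.pdf</url></dat>"
    "</data>"
)

urls = extract_urls(sample)
for u in urls:
    print(repr(u))  # repr() makes the double space visible
```

Because the text is copied byte for byte, the double space between HI-TACAN and RWY survives the extraction.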
Gaev
 
Posts: 3716
Joined: Fri Apr 01, 2005 7:48 am
Location: Toronto, Canada

Re: Problem with code %20 in filenames

Postby CN_Iceman » Wed Aug 13, 2014 11:28 pm

Thanks to all for the help.

I am using David's npXmls plugin to parse the file, but I am going to try doing it as Gaev suggests.

Re: Problem with code %20 in filenames

Postby dec » Thu Aug 14, 2014 11:03 am

Hello,

CN_Iceman wrote: Thanks to all for the help.

I am using David's npXmls plugin to parse the file, but I am going to try doing it as Gaev suggests.


If you use the npXmls plugin, you can "sanitize" the URL after getting it. In addition to NeoBook itself, the npUtil plugin also has some string utility actions. Alternatively, if you like, consider using regular expressions with the npRexp plugin.
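
As a rough idea of the regular-expression route, here is a Python sketch (not npRexp, whose actions are not shown in this thread). It assumes "sanitize" means trimming only the whitespace around the path, keeping the inner spaces, and then percent-encoding it. The names `URL_TAG` and `sanitized_links` are invented for the example, and the base URL is the one from the first post.

```python
import re
from urllib.parse import quote

# Non-greedy match of everything between <url> and </url>, newlines included.
URL_TAG = re.compile(r"<url>(.*?)</url>", re.DOTALL)


def sanitized_links(xml_text, base="http://data.openaviationdata.com/"):
    """Extract each <url> path, trim its edges, and percent-encode it."""
    links = []
    for raw in URL_TAG.findall(xml_text):
        path = raw.strip()  # removes only leading/trailing whitespace
        links.append(base + quote(path, safe="/"))
    return links


# Sample entry with line breaks around the path and two spaces inside it.
sample = (
    "<dat><url>\n"
    "    charts/1407/LEZG/AD 2 - LEZG - IAC NDB Z  RWY 12R.pdf\n"
    "</url></dat>"
)

links = sanitized_links(sample)
print(links[0])
```

The important detail is that only the edges are trimmed. Collapsing runs of spaces inside the path would recreate the broken link.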

Re: Problem with code %20 in filenames

Postby CN_Iceman » Thu Aug 14, 2014 1:00 pm

Well, after some fun trying again and again, I finally found the solution.

I had to do without npXmls and do it "by hand" using NeoBook commands and some npUtil options. After "sanitizing and cleaning up" the file, I managed to get the expected result.

Again, thank you all for the help.

Re: Problem with code %20 in filenames

Postby virger » Sun Aug 17, 2014 3:14 pm

Hi...
For example, I didn't have any problem with this link, using XP and NB ver 5.80:
http://MyPageFromCostaRica.com/K%20K%20%20C%20%20%20K%20%20%20%20H.pdf


InternetFileExists "http://MyPageFromCostaRica.com/k%20k%20%20c%20%20%20k%20%20%20%20h.txt" "[yn]" ""
DownloadFile "http://MyPageFromCostaRica.com/k%20k%20%20c%20%20%20k%20%20%20%20h.txt" "[PubDir]k%20k%20%20c%20%20%20k%20%20%20%20h.txt" ""


Pura Vida
From Costa Rica
COSTA RICA
PURA VIDA
virger
 
Posts: 509
Joined: Mon Sep 18, 2006 12:21 pm
Location: Costa Rica, America Central

