Saturday, September 19, 2015

Parsing and splitting an OpenStreetMap file [ C ]


It's quite bizarre to me: the more I try to make friends with string / character processing in C, the more hostile it gets.

I need to split any big osm file into three parts: NODE, WAY and RELATION. You can do that by giving a line number range to a program, but I want to find the line numbers automatically.

This seemed very easy: just count the newlines until the required string, i.e. <way id=, turns up.
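
The "line number range" step could look roughly like the sketch below. The function name copy_line_range and its parameters are made up for illustration and are not from any existing tool; it assumes both streams are already open and that line numbers are 1-based and inclusive.

```c
#include <stdio.h>

/* Hypothetical helper: copy lines first..last (1-based, inclusive)
   from in to out. Returns the number of complete lines copied. */
long copy_line_range(FILE *in, FILE *out, long first, long last)
{
    long line = 1;      /* number of the line currently being read */
    long copied = 0;
    int c;              /* int, so EOF can be told apart from any char */

    /* Check the line number first so nothing past the range is consumed. */
    while (line <= last && (c = getc(in)) != EOF) {
        if (line >= first)
            putc(c, out);
        if (c == '\n') {
            if (line >= first)
                copied++;
            line++;
        }
    }
    return copied;
}
```

With the first line numbers of the WAY and RELATION blocks known, three calls with different ranges would give the three parts.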

So I wrote,

 char way_compare[] = "<way id=";
 FILE *fp = fopen("Kiel.osm", "r");
 char ckchr[10];
 int count_line = 1;
 int chr = getc(fp);          /* int, not char, so EOF is detected reliably */
 while (chr != EOF)
 {
   if (chr == '\n')
   {
     count_line++;
     fgets(ckchr, 10, fp);    /* reads at most 9 characters of the next line */
     if (!strcmp(way_compare, ckchr))
       { break; }
   }
   chr = getc(fp);
 }
 printf("Searched: %s found at %d\n", way_compare, count_line);



Well, nothing happened. WHY? Searching on Google told me that fgets keeps the newline ( '\n' ) at the end of the string it reads. So I modified it to,

 char way_compare[] = "<way id=\n";  and  
 char way_compare[] = "<way id=\n\0";  

Same.
Reading more or fewer characters with fgets ended up with the same result.

Later, I tried strncmp, giving the number of characters to compare.

  if(!strncmp(way_compare,ckchr,10))  

When printing ckchr, it showed the right string that I was searching for, but the comparison still did not match. Later I figured out that there is a tab character in front of every child / child-child element in the osm file.
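
Instead of hard-coding the tab into the search string, the indentation can be skipped before comparing. This is only a sketch: starts_with_tag is a made-up helper name, and it assumes the indentation consists of spaces and tabs only.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if line, after any leading spaces
   or tabs, begins with tag; returns 0 otherwise. */
int starts_with_tag(const char *line, const char *tag)
{
    while (*line == ' ' || *line == '\t')
        line++;                               /* skip the indentation */
    return strncmp(line, tag, strlen(tag)) == 0;
}
```

Comparing only strlen(tag) characters also means the rest of the line (the id value, a trailing newline) cannot spoil the match.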

So I changed the search string to

 char way_compare[] = "\t<way id=";

For \t<node id= it worked like a charm. But for \t<way id=, no good.

While examining line by line what fgets was bringing into its buffer, I found that where the <way id= line, or the first child after a closing tag, should be, there was only an empty line. Instantly, I understood that fgets reads </node> or </way> including the '\n' and moves the file pointer to the next line. So the line right after the end of a child is always missed.

I thought that if I replaced the newline with another character, maybe it would work. So I did this:
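
The effect can be reproduced in isolation. The sketch below is a made-up helper (chunk_took_newline is not a library function) that does one fgets call and reports whether that call swallowed the line's '\n'.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one fgets chunk of at most size-1
   characters into buf. Returns 1 if the chunk includes the line's
   '\n', 0 if it does not (also 0 at end of file). */
int chunk_took_newline(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) == NULL)
        return 0;
    return strchr(buf, '\n') != NULL;
}
```

The line "\t</node>" plus its newline is 9 characters, so with a buffer of 10 fgets takes all of it and the next getc already returns the tab of the following line; that line's start is never seen right after a '\n'. With a buffer of 8, fgets stops after "\t</node" and the newline is still there for getc to find.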

 char *replace_nl;  
 if((replace_nl = strchr(ckchr,'\n')))  
         *replace_nl = 0;  

Yes, it helps to clear the empty line from the output, but the file pointer has already moved.

So the solution could come in two ways: either move the file pointer back to where the line started, which is quite difficult with static searching, or avoid reading the newline at the end of a child. So my solution became:
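
For reference, the standard library can undo that movement with ftell and fseek. The sketch below is a made-up helper (peek_chunk is an illustrative name) and assumes the file is seekable, which holds for a regular file on disk but not for a pipe.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to size-1 characters into buf like
   fgets, then restores the stream position so the next read sees the
   same data again. Returns 1 on success, 0 on end of file or error. */
int peek_chunk(FILE *fp, char *buf, int size)
{
    long pos = ftell(fp);                 /* remember where we are */
    int ok;

    if (pos < 0)
        return 0;
    ok = fgets(buf, size, fp) != NULL;
    if (fseek(fp, pos, SEEK_SET) != 0)    /* go back to the saved position */
        return 0;
    return ok;
}
```

Seeking after every line costs extra calls on a big file, so this is more of a fallback than a first choice.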

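 #include <stdio.h>
 #include <string.h>

 int main(void){
   /* 7 characters: short enough that fgets stops before the '\n' of "\t</way>" */
   char way_compare[] = "\t<way i";
   FILE *fp = fopen("Kiel.osm", "r");
   char ckchr[8];
   int count_line = 1;
   int chr;                             /* int, so EOF is detected reliably */
   if (fp == NULL)
   {
     perror("Kiel.osm");
     return 1;
   }
   chr = getc(fp);
   while (chr != EOF)
   {
     if (chr == '\n')
     {
       count_line++;
       /* read at most 7 characters of the new line */
       if (fgets(ckchr, 8, fp) != NULL && !strncmp(way_compare, ckchr, 8))
       {
         printf("Found IT: %s\n", ckchr);
         break;
       }
     }
     chr = getc(fp);
   }
   printf("Searched: WAY TAG found at %d\n", count_line);
   fclose(fp);
   return 0;
 }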
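
A different route, not the one taken above, is to let fgets read whole lines into a larger buffer and count the lines directly, so the newline never has to be dodged. This is a sketch under assumptions: find_tag_line is a made-up name, the indentation is spaces or tabs, and the indentation plus the tag fits in the first 4095 characters of a line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the 1-based number of the first line
   that, after leading spaces or tabs, starts with tag; returns 0 if
   no line matches. */
long find_tag_line(FILE *fp, const char *tag)
{
    char buf[4096];
    long line = 1;
    int at_line_start = 1;            /* does buf hold the start of a line? */
    size_t tag_len = strlen(tag);

    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (at_line_start) {
            const char *p = buf;
            while (*p == ' ' || *p == '\t')
                p++;                  /* skip the indentation */
            if (strncmp(p, tag, tag_len) == 0)
                return line;
        }
        /* A chunk without '\n' means the line continues in the next chunk. */
        at_line_start = strchr(buf, '\n') != NULL;
        if (at_line_start)
            line++;
    }
    return 0;
}
```

Calling it with "<way id=" and then, after rewind(fp), with "<relation id=" would give the two line numbers where the file has to be cut.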


