A PIPELINE ALGORITHM FOR BUILDING AND STRUCTURE SHAPES GENERATION USING DERIVED LAS DATASET: AN EFFICIENT ALTERNATIVE TO MANUAL DIGITIZATION FROM ORTHOPHOTOS IN FLOOD HAZARD FEATURE EXTRACTION
Bernardino J. Buenaobra1 , Aure Flo A. Oraya1 , Aries Martin P. Openiano1 , Claurice L. Mangle1 ,
Lora Magnolia F. Cubero1 , Kirby Henriksen L. Tan1 , Arthur Gil T. Sabandal1 , Laarlyn N. Abalos1 ,
Janice B. Jamora1,2, Ricardo L. Fornis1,2 and Roland Emerito S. Otadoy1,3
1 USC Phil-LiDAR Research Center, University of San Carlos, Talamban, Cebu City, Philippines 6000
2 School of Engineering, Department of Civil Engineering, University of San Carlos, Talamban, Cebu City, Philippines 6000
3 Theoretical and Computational Sciences and Engineering Group, Department of Physics, University of San Carlos, Talamban, Cebu City, Philippines 6000
KEY WORDS: Floodplain, LiDAR, Computational, Height break, Workflow
ABSTRACT: In this paper we propose and describe an implementation of a computationally efficient generation of building and structure shapes that dramatically improves the manual process in the flood hazard feature extraction workflow when no orthophotos are available. The impact is felt not only in the cost of procuring very high-resolution true-color RGB images but ultimately in the reduction of average processing time from 4.5 hours of manual editing down to just over 40 minutes using the computational approach. This time saving results from carrying out a prescribed pipeline operation on the LiDAR LAS dataset. The generation of the wanted shapes relies on the convex hull set that forms an outline from the LAS data points during program execution. In contrast with LAS point data, orthophotos represent a 2-dimensional data matrix whose elements are light intensity RGB values in horizontal and vertical pixels alone, without any depth information or elevation values of the spatial surfaces. In this paper we exploit the z value, representing relative height and elevation from ground, in the LAS point cloud data that comprises millions of laser point returns with coordinates in 3D space. By incrementally stepping through a number of height breaks, we are able to select structures in the floodplain areas that would otherwise be difficult to identify by visual acuity and experience in outlining polygons by hand alone on a 2D image. To quantify the difference between the manually derived polygons and those derived by machine, we used a 2-dimensional cross correlation to yield a number for goodness of alignment. Our experience indicated that an optimum for our test case was reached at a height break of 30 m, from an iteration over 3-30 m height breaks and step sizes of 0.5-5.0 m. The new method prototype achieved practically a one-to-one alignment and an improved granularity in height discrimination.
1.1 A workflow background review
The current Phil-LiDAR 1 feature extraction practice requires the selection of 5 km x 5 km blocks. These cover areas historically known to experience flooding brought about by natural, often seasonal, hydrological processes. These blocks are delineated following the processing workflow depicted in Figure 1. For the most part, the feature analyst or editor for the floodplain must use his or her judgement to select features from a true-color image such as an orthophoto, or from a DSM derived from the usual (d,v,s,t) ASCII data supplied by the data acquisition group, which preprocesses flight mission data. A constraint is imposed so that, for example, only structures at least 2 m high and 25 sq. m in area are accepted; anything below these thresholds is discarded. The analyst or data processor then draws by hand the outline of each structure based on his or her visual perception of depth or rise from the input photo of the area under study or from a featureless DSM raster.
1.2 The proposed new workflow concept
In the interest of computational efficiency and cost, as well as the more objective manner of a software-driven feature extraction that does not depend entirely on the skill and experience of an analyst or data processor, we have proposed an enhancement to the current process by introducing a modification of the workflow. Our test case is based on the unavailability of an orthophoto or similar true-color imagery at a time when the LAS point cloud is readily available and floodplain planning has started. This paper proposes instead the use of LiDAR LAS laser points and the LAS point cloud processing program LAStools as input, in contrast with the workflow shown in the previous section. We show in the later development the notable differences in processing time and output detail.
In principle, the hand editing from orthophotos, which gives the analyst or data processor a visual perception of the scene from the operator's monitor, is replaced with a script that runs a sequence of executable programs with parameters passed through placeholders or variables in a command interface (CMD) batch file in a Windows OS environment. This is a custom Pipeline Algorithm that can programmatically differentiate, for example, buildings or structures from vegetation.
2. COMPUTATIONAL SET-UP FOR AUTOMATING FEATURE EXTRACTION FROM LAS DATA SET
2.1 Selection of study area and set-up specifications for prototyping of new method
For our initial test and proof of concept we take three areas, namely the flight missions for Cebu_Blk47A with the following .las laser point files: pt000218.las, pt000219.las and pt000220.las, which were specially selected for the planned floodplain feature extraction and field validation under actual study. We had our analysts and data processors perform the usual hand editing to the best of their ability to generate the polygons, with the constraints of dropping 2 m downwards from structures and enclosing smaller structures in a single lumped 25 sq. m polygon, from a provided DSM. We obtained an average start-to-finish time of 4.5 hrs. for the manual editing, up to the shape file generation from the finished polygons, using iMac desktop computers with i7 quad-core class CPUs and 2 TB hybrid HDDs running on 6 cores. Our most recent versions of the scripts for the pipeline algorithm run for an average of 1 hr. 10 min. when all the mission flights in a Cebu_Blk47A folder are processed, while a single point file takes just under 12 minutes. At the end of the LAStools-based pipeline algorithm execution we have produced one complete self-contained zipped folder for each of the three point files, consisting of .shp, .tiff, .kml and other associated files. The shape files are then read into ArcGIS, and the overlap between those processed by hand editing and those produced by LAStools processing is visually compared. The process iterates over height break and step size to find the optimal match by visual inspection. Both of these shape files are overlaid on the DSM and visually checked again for the quality of their alignment; all of these files are then stored in a server database. We summarize this scheme in Table 1 on the next page.
Table 1 Run set-up and table of specifications for prototyping the new method
2.2 Description of the Pipeline Algorithm for discriminating buildings and structures
We adapted and modified for our purposes a publicly shared example of using LAStools to discriminate buildings from vegetation, from "rapidlasso", the makers of LAStools for LAS point cloud processing; we reference that work for our development of a custom script. At the core of this algorithm are successive steps leading to the generation of shape files and classified TIFF images for both buildings and vegetation. In our application we are interested only in the structures or buildings in the floodplain areas.
At a high level, the implementation of this algorithm can be described as a set of simple steps leading to the final output that is useful to the analyst and data processor working on feature extraction for the floodplains. The following are the key steps in the Pipeline Algorithm used in this paper:
1. Generate the first and last return DSMs
2. Compute the normalized DSM by subtracting the last return DSM from the first return DSM
3. Extract the bare earth points from the last return, assuming for large cities the metro option
4. Compute the height of each LAS point from ground; drop points below or above an unwanted height as needed
5. Classify the structures (buildings) and vegetation (trees) from the results of (3) and (4)
6. Convert the LAS points from the file into a raster grid (.tiff file type) for the specific classification wanted (buildings or vegetation)
7. Compute the boundary polygon for the points to form the shape file
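The steps above can be sketched as a sequence of LAStools command lines assembled programmatically. The Python sketch below is an illustration only, not the authors' actual batch script: the output file names and the exact combination of options are assumptions, although the tools (lasgrid, lasground, lasheight, lasclassify, lasboundary) and the flags shown follow standard LAStools conventions and should be checked against the LAStools documentation.

```python
def build_pipeline(las_file, height_break=2.0, step=1.0):
    """Sketch of the Pipeline Algorithm as LAStools command lines.

    File names and parameter choices are illustrative assumptions,
    not the authors' exact script.
    """
    stem = las_file.rsplit(".", 1)[0]
    return [
        # 1-2. First and last return DSMs (the normalized DSM is their difference)
        f"lasgrid -i {las_file} -first_only -elevation -o {stem}_dsm_first.tif",
        f"lasgrid -i {las_file} -last_only -elevation -o {stem}_dsm_last.tif",
        # 3. Extract bare-earth points (metro option assumed for large cities)
        f"lasground -i {las_file} -metro -o {stem}_ground.las",
        # 4. Height of each point above ground; drop points below the height break
        f"lasheight -i {stem}_ground.las -drop_below {height_break} -o {stem}_height.las",
        # 5. Classify buildings and vegetation
        f"lasclassify -i {stem}_height.las -o {stem}_classified.las",
        # 6. Rasterize only the building class (LAS class 6) at the given step size
        f"lasgrid -i {stem}_classified.las -keep_class 6 -step {step} -o {stem}_buildings.tif",
        # 7. Boundary polygon of the building points as a shape file
        f"lasboundary -i {stem}_classified.las -keep_class 6 -o {stem}_buildings.shp",
    ]

commands = build_pipeline("pt000218.las", height_break=2.0, step=1.0)
for c in commands:
    print(c)
```

In the actual workflow each of these commands would be executed in sequence (e.g. by a CMD batch file, as described above), with the height break and step size passed in as loop variables.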
2.3 Data processing script and execution profiling
To determine the optimal shape file, i.e. the best possible match to a template (here, the manually edited shape file, arbitrarily chosen), we run the pipeline algorithm at a starting height break of 3 m, ending at 30 m, in a loop for each change in the step size of the lasgrid.exe optional parameter, from 0.5 m to 5.0 m. This is graphically depicted in Figure 4 below:
To determine how fast each tile is processed, a code profiling measurement was devised: for each loop iteration over each height break change, the CPU time is captured and logged to a text file. The same profiling is done for each executable program run internally by the Pipeline Algorithm routine. At the code level, Figure 5.0 on the next page shows a code snippet of the Runloop.bat script; the boxed part highlights where the time is captured and logged. We do not show the Pipeline Algorithm script here, but it is made available by the authors at the download link given in the references and can be freely downloaded and modified by readers using their own LiDAR LAS data sets.
At the code level, a loop script calls a main script and passes a variable holding the count for the height breaks. Because of a limitation of the command interface (CMD) in Windows, only integer values can be used, so the fractional step size is converted inside the batch file before the loop runs. The code snippet is shown in Figure 5.0.
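In a language with floating-point arithmetic, the same parameter sweep with per-run timing can be expressed directly, without the scaled-integer workaround that CMD requires. The minimal Python sketch below mirrors the run set-up described above (height breaks of 3-30 m, step sizes of 0.5-5.0 m); the 1.0 m and 0.5 m increments and the timing granularity are illustrative assumptions.

```python
import time

# Height breaks from 3 m to 30 m and lasgrid step sizes from 0.5 m to 5.0 m,
# mirroring the loop that Runloop.bat drives with scaled integers.
height_breaks = [3.0 + 1.0 * k for k in range(28)]   # 3, 4, ..., 30 m
step_sizes = [0.5 + 0.5 * k for k in range(10)]      # 0.5, 1.0, ..., 5.0 m

timings = []
for step in step_sizes:
    for hb in height_breaks:
        t0 = time.perf_counter()
        # ... run the pipeline for (hb, step) here, e.g. via subprocess ...
        elapsed = time.perf_counter() - t0
        timings.append((hb, step, elapsed))  # one log entry per run, as in the batch profiler

print(len(timings))  # 28 height breaks x 10 step sizes = 280 runs
```

Each `(height break, step size, elapsed time)` tuple corresponds to one line of the profiling log text file described above.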
3. RESULTS AND DISCUSSIONS
3.1 Evolution of structural detail from height breaks iteration
The result of carrying out the pipeline algorithm described in the previous sections shows that structures start to reveal themselves when the iteration begins at 3 m, and the outlines of structures become very clear and very similar to those edited by hand at 30 m. Therefore, given a set acceptance criterion for a certain height and a certain structure outline, all that is necessary is to determine by visual inspection which result of the iteration is most suitable. Because we decided to start the height break at 2.0 m, execution of the pipeline algorithm showed that nothing survives below this height; that is, no classification or computation is done for LAS laser points from that height break down. Table 2 on the next page shows the evolution of the structures for an increasing number of height breaks. We can see that structures start to evolve from 2 m up, as evidenced by slowly developing additions of outlines from a spatial point of view. Unlike the perspective from a manual feature extraction operation aided by a DSM and orthophotos, which gives no true depth (or, in our case, heights) because of the inherent 2D nature of these references, the LAS laser point files are on their own true 3D coordinates in space: points representing heights, not images, since they are computed optical beam returns rather than intensity values in a 2D data matrix. The paper emphasizes a computational and more objective approach over a subjective, often skill-based one that can vary dramatically between analysts and data processors and is also affected by operator vision fatigue.
Table 2 Evolution of structures at selected height breaks from the laser point file
3.3 Measurement of similarity by 2-dimensional Normalized Cross Correlation
We developed an objective test of similarity between an arbitrary template, here the "hand generated" shape file output by an analyst or data processor, and the shape files "computationally generated" from the LAS data set using the pipeline algorithm. The basic concept of correlation, as we apply it to images, involves a sub-image I1(x, y) of size K x L within an image I2(x, y) of size M x N, where K <= M and L <= N. The normalized cross correlation between I1(x, y) and I2(x, y) at a point (i, j) is the mathematical operation given by the following equation:

C(i, j) = [ Σ_x Σ_y I1(x, y) I2(x + i, y + j) ] / sqrt( [Σ_x Σ_y I1(x, y)^2] [Σ_x Σ_y I2(x + i, y + j)^2] )    (1)

where the sums run over x = 0, ..., K - 1 and y = 0, ..., L - 1.
Because an image I(x, y) is represented as a 2D data matrix whose elements are light intensity values in the R, G and B color planes, we can see that equation (1) above is essentially a discrete multiplication of two matrices in which one image is offset by an amount i in the row elements and j in the column elements. A perfect cross correlation between the two images means a perfect match between the template and the image under comparison; the result approaches zero when they are completely off. We use a readily available Matlab function for 2D normalized cross correlation, normxcorr2, in the script in Figure 6 to compare in a graphical manner the manually edited shape file with the shape file derived from the LAS laser point file; the script is shown for the test case of pt000218.las. The results for each test case are shown on the next page.
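Outside Matlab, equation (1) can also be evaluated directly. The NumPy sketch below computes the normalized cross correlation of a template against every valid offset of a larger image. It is a simplified stand-in for normxcorr2 (which additionally subtracts local means and pads image borders), intended only to illustrate the computation; the toy arrays are assumptions for demonstration.

```python
import numpy as np

def ncc(template, image):
    """Normalized cross correlation per equation (1): at each offset (i, j),
    the sum of products of the template and the shifted image window,
    divided by the square root of the product of both energies.
    Simplified illustration; Matlab's normxcorr2 also subtracts local
    means and handles border padding."""
    K, L = template.shape
    M, N = image.shape
    t_energy = np.sqrt((template ** 2).sum())
    out = np.zeros((M - K + 1, N - L + 1))
    for i in range(M - K + 1):
        for j in range(N - L + 1):
            window = image[i:i + K, j:j + L]
            denom = t_energy * np.sqrt((window ** 2).sum())
            out[i, j] = (template * window).sum() / denom if denom else 0.0
    return out

# A perfect match scores 1.0 at the offset where the template sits.
img = np.zeros((6, 6))
img[2:4, 2:4] = 1.0                  # a 2x2 "structure" inside a blank image
tpl = img[2:4, 2:4].copy()           # template cut from the image
scores = ncc(tpl, img)
print(scores.max())                  # 1.0, at offset (2, 2)
```

The location of the correlation peak gives the alignment offset between the two shape-file rasters, and the peak value gives the goodness of alignment used in this paper.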
Table 4 Result of 2D cross correlation between the hand generated shape file (template) and the 30 m LAS-derived shape file produced by the pipeline algorithm
When the results from the manual editing are taken as the reference or template in the 2D cross correlation process, we see some spread in the peak. This can be explained by the fact that the pipeline algorithm yields more detail at the exact height breaks than hand editing does. If the reverse is chosen, that is, when the computer-generated shape file is taken as the template, then the manual editing actually misses much more height detail than it should capture. This demonstrates better accuracy and precision when the height or elevation of structures from the ground is the most important criterion in feature extraction for floodplain modeling.
4. CONCLUSION

We have shown that a computational approach to the generation of shape files for feature extraction in floodplain modeling, using only LAS laser point files, leads to better results and greater detail in the height discrimination of structures than the manual, hand-edited counterpart. Moreover, the proposed new workflow and method shows that a typical finish time of 4.5 hrs. per 5 km x 5 km floodplain area can be reduced to around 40 minutes by using a pipeline algorithm to automatically discriminate buildings and structures.
ACKNOWLEDGEMENTS

This paper is an output of "Project 8. LIDAR Data Processing and Validation in the Visayas: Central Visayas (Region 7)" under "Phil-LiDAR 1. Hazard Mapping of the Philippines Using LiDAR – Program B. LIDAR Data Processing and Validation by SUCs and HEIs" headed by Dr. Enrico Paringit. The close collaboration between the USC Phil-LiDAR Research Center, University of San Carlos and the Training Center for Applied Geodesy and Photogrammetry, National Engineering Center, University of the Philippines-Diliman is also hereby acknowledged. We thank the Philippine Council for Industry, Energy and Emerging Technology Research and Development of the Department of Science and Technology (DOST-PCIEERD) for funding support. We also thank the University of San Carlos for moral, logistical, and financial support to the Phil-LiDAR 1 and 2 research programs.
REFERENCES

Isenberg, M. LAStools – efficient tools for LiDAR processing, version 150330. Retrieved September 9, 2015 from http://lastools.org.
Isenberg, M., 2015. Discriminating Vegetation from Buildings. Retrieved September 9, 2015 from http://rapidlasso.com/2014/10/23/discriminating-vegetation-from-buildings/
Gonzalez, R., Woods, R., Eddins, S., 2009. Digital Image Processing Using Matlab, 2nd Ed., p. 15.
National Instruments, 2005. Pattern Matching. IMAQ Vision Manual, pp. 12-7.
Matlab normxcorr2 function. Retrieved September 9, 2015 from: http://www.mathworks.com/search/site_search.html?suggestion=&c=entire_site_en&q=normxcorr2