HISPCRT | High Intelligence Spatial Computing Research Team

Journal	Remote Sensing
English Title	A Single Data Extraction Algorithm for Oblique Photographic Data Based on the U-Net
Chinese Title	一种基于U-Net的倾斜摄影数据的单一数据提取算法
Authors	Shaohua Wang and Xiao Li et al.
DOI	https://doi.org/10.3390/rs16060979

Abstract

In the automated modeling generated by oblique photography, various terrains cannot be physically distinguished individually within the triangulated irregular network (TIN). To utilize the data representing individual features, such as a single building, a process of building monomer construction is required to identify and extract these distinct parts. This approach aids subsequent analyses by focusing on specific entities, mitigating interference from complex scenes. A deep convolutional neural network is constructed, combining U-Net and ResNeXt architectures. The network takes as input both digital orthophoto map (DOM) and oblique photography data, effectively extracting the polygonal footprints of buildings. Extraction accuracy among different algorithms is compared, with results indicating that the ResNeXt-based network achieves the highest intersection over union (IOU) for building segmentation, reaching 0.8255. The proposed “dynamic virtual monomer” technique binds the extracted vector footprints dynamically to the original oblique photography surface through rendering. This enables the selective representation and querying of individual buildings. Empirical evidence demonstrates the effectiveness of this technique in interactive queries and spatial analysis. The high level of automation and excellent accuracy of this method can further advance the application of oblique photography data in 3D urban modeling and geographic information system (GIS) analysis.

Background

Three-dimensional (3D) building modeling involves selecting a single building in an aerial image and querying its information, which has significant implications in various fields. For instance, 3D building modeling can aid architects, urban planners, and policymakers in making informed decisions about the development, design, and sustainability of buildings and urban spaces. Accurate 3D models of buildings can help in civil engineering and construction by detecting potential design conflicts, and in disaster management and emergency response by assessing vulnerability or planning evacuation routes. Three-dimensional modeling is closer to human visual habits, provides more information than two-dimensional (2D) modeling, and expresses more spatial relationships. Both group users and individual users have an urgent need for a 3D geographic information system (GIS). Three-dimensional building modeling is one of the major functions in 3D GIS applications, the development of which is affected by various factors. In the early days, the economic and time costs of 3D data acquisition were the most critical constraints on the wide application of 3D GIS. With the continuous development of theories and technologies such as computer graphics, virtual reality, and mapping, 3D GIS has gradually become one of the mainstream directions of GIS research in recent years. In place of manual 3D modeling, new 3D data acquisition methods, such as oblique photogrammetry, have emerged. Oblique photogrammetry extends aerial imaging from vertical to tilted views, with multiple sensors on the aircraft capturing images simultaneously. The oblique photographic model is then generated by automatic batch modeling. With its high precision, high efficiency, high realism, and low cost, the oblique photographic model has the potential to become an important data source for 3D GIS.

Fig 1：Building selection by overlaying a polygon on remote sensing images.

Framework

A deep convolutional neural network was applied to extract building footprints, and a shadow volume rendering technique was proposed to attach the footprint polygons to the building surfaces provided by the oblique photographic model. First, the oblique photography model data were preprocessed by conversion into DOM and digital surface model (DSM) raster data, which serve as the input to the network. Then, a deep convolutional neural network based on U-Net was constructed, with ResNeXt50 as the backbone network. It was used to automatically extract building outlines from the oblique aerial photography data, and its accuracy was compared with that of other algorithms. Finally, shadow volume rendering was used to bind the extracted vector footprint data to the building surfaces of the oblique photography model, thereby realizing the dynamic virtual method of building monomer construction using oblique photography.

Fig 2：Overall workflow.
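
The preprocessing step above produces DOM and DSM rasters as network input. The summary does not say how the two rasters are combined, so the following is only a minimal sketch under one common assumption: the DSM height is min-max normalized and appended to the DOM's RGB values as a fourth channel. The function name `stack_dom_dsm` and the tiny sample rasters are hypothetical.

```python
def stack_dom_dsm(dom_rgb, dsm):
    """Build a 4-channel raster from a DOM and a DSM of the same size.

    dom_rgb: rows of (r, g, b) tuples with values in 0..255.
    dsm:     rows of heights (any unit).
    Assumption (not stated in the source): channels are simply
    concatenated per pixel as (R, G, B, normalized height).
    """
    heights = [h for row in dsm for h in row]
    h_min, h_max = min(heights), max(heights)
    span = (h_max - h_min) or 1.0  # avoid division by zero on flat terrain
    stacked = []
    for rgb_row, h_row in zip(dom_rgb, dsm):
        out_row = []
        for (r, g, b), h in zip(rgb_row, h_row):
            # Scale colors to [0, 1] and heights to [0, 1] within this tile.
            out_row.append((r / 255, g / 255, b / 255, (h - h_min) / span))
        stacked.append(out_row)
    return stacked


# Hypothetical 1 x 2 tile: a red ground pixel and a green rooftop pixel.
dom = [[(255, 0, 0), (0, 255, 0)]]
dsm = [[10.0, 30.0]]
tile = stack_dom_dsm(dom, dsm)
```

In this layout the height channel lets the network separate roofs from ground that has a similar color, which is one plausible reason the oblique-derived input scores higher than DOM alone.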

Result

The experimental results in Table 1 show that, for each algorithm, building extraction based on oblique photographic modeling data is more accurate than extraction using DOM alone. The IOU gap between DOM and oblique photographic modeling data is larger for random forest than for the other two algorithms, and it is smallest for U-Net based on ResNeXt50. Since the oblique photographic modeling data provide more detailed features for deep learning, the IOU values of the three algorithms are compared on the oblique photographic data. U-Net based on ResNeXt50 extracts building footprints with an IOU of 0.8255, which is higher than the 0.8143 of U-Net based on VGG-19 and the 0.7532 of random forest. This result indicates that U-Net based on ResNeXt50 extracts building footprint polygons more accurately than the other two algorithms. Among the approaches compared, extraction from oblique photographic data by U-Net based on ResNeXt50 is therefore the most accurate.

Tab 1：Experimental results.
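
The IOU metric used in the comparison is the ratio of the area where the predicted and reference footprints overlap to the area covered by either. A minimal sketch on binary masks follows; the function name `iou`, the sample masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details from the paper.

```python
def iou(pred, truth):
    """Intersection over union of two same-sized binary masks.

    pred, truth: rows of 0/1 values (1 = building pixel).
    """
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                inter += 1  # pixel is building in both masks
            if p or t:
                union += 1  # pixel is building in at least one mask
    # Assumed convention: two empty masks agree perfectly.
    return inter / union if union else 1.0


# Hypothetical 2 x 3 masks: 2 shared pixels out of 4 covered pixels.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
score = iou(pred, truth)
```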

Innovation

The successful selection of buildings by the dynamic virtual method of building monomer construction proposed in this study provides a solid base for a range of applications. One of these is the interactive query of the oblique photographic model. Clicking a building in the 3D scene with the mouse yields the intersection point O. Point O in the Cartesian coordinate system is converted into point P in the geographic coordinate system. The footprint polygon containing point P is then located, and its geometric data, including the ID value and attribute information, are returned (as shown in Figure 3a). In addition, structured query language (SQL) queries and spatial queries on the oblique photographic model can also be realized by the same method (as shown in Figure 3b,d). The set of polygon objects in the database that satisfy the SQL query and spatial query conditions is selected, the polygons are attached to the oblique photographic modeling data by rendering the stencil shadow volume, and the corresponding building surfaces are highlighted with a specific color. This function also supports the production of thematic maps and similar products (as shown in Figure 3c).

Fig 3：Dynamic virtual monomer of oblique photographic model: (a) query properties, (b) buffer query, (c) thematic map representation, (d) peripheral query.
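
The attribute query in Figure 3a comes down to finding the footprint polygon that contains point P. A minimal sketch of that lookup follows, using the standard ray-casting point-in-polygon test. The conversion from point O to point P is assumed to have been done already, and the names `point_in_polygon`, `query_building`, and the sample footprints are hypothetical; a real system would more likely use a spatial index or database query.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # ray to the right crosses this edge
    return inside


def query_building(x, y, footprints):
    """Return ID and attributes of the first footprint containing (x, y), else None."""
    for fp in footprints:
        if point_in_polygon(x, y, fp["ring"]):
            return {"id": fp["id"], "attributes": fp["attributes"]}
    return None


# Hypothetical footprints in a planar coordinate system.
footprints = [
    {"id": 1, "ring": [(0, 0), (4, 0), (4, 4), (0, 4)], "attributes": {"name": "A"}},
    {"id": 2, "ring": [(10, 0), (14, 0), (14, 3), (10, 3)], "attributes": {"name": "B"}},
]
hit = query_building(12, 1, footprints)
miss = query_building(20, 20, footprints)
```

Because only the 2D footprint is tested, the oblique model itself never has to be cut; the returned ID is what the renderer would use to highlight the matching building surface.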

Conclusion

Fully automated oblique photographic modeling technology solves the problem of 3D data sources for 3D GIS applications. However, oblique photographic modeling data are full-element slice data, which makes it impossible to operate on a single building. This research proposes a dynamic virtual monomer method for oblique photographic modeling data. The vector building footprint is attached to the surface of the oblique photographic modeling data by the shadow volume rendering technique, which builds a dynamic correlation between the vector footprint data and the oblique photographic modeling data. Without cutting the data, this realizes single-object representation and object-oriented querying of oblique photographic modeling data. A deep convolutional neural network based on the U-Net structure is proposed to extract building footprints from oblique photographic data, which improves the effectiveness of the fully automated monomer process for oblique photographic models. In this way, the study provides a deep-learning-based approach for extracting monomeric data from oblique photography, thereby achieving dynamic virtual monomerization of oblique photographic modeling data. The network is based on the U-Net architecture, with ResNeXt50 serving as its backbone. Comparative experiments indicate that this method is efficient and feasible, with higher extraction accuracy than traditional algorithms. It fully automates the representation of individual objects in oblique photography, supports the two-dimensional integration of oblique photographic modeling data into GIS applications, and effectively promotes the widespread application of oblique photographic modeling data in industries such as surveying, planning, and smart cities.

Fig 4：Building extraction from oblique photographic data based on U-Net.