Towards a sustainable geoprocessing environment at Statistics Netherlands through performance benchmarking
This research project aimed to support decision making on a possible transition to an up-to-date, sustainable geo-ICT architecture. It is based on an analysis of the technical factors that influence the performance of geoprocessing tools at the spatial team of Statistics Netherlands and on experiences at comparable organizations. The current geoprocessing environment consists of ArcInfo Workstation and ArcGIS Desktop 10.1, although most production processes run in ArcInfo Workstation and its scripting language AML. Support for ArcInfo Workstation will not be continued after 2015. Therefore, migration of these processes to ArcGIS Desktop 10.1 or a higher version of this product suite has to be considered. Consequently, the following research question was formulated: Which alternatives to the current geo-ICT infrastructure can be proposed for Statistics Netherlands that meet the performance requirements of its geoprocessing activities and are suitable for implementation within the organizational constraints of Statistics Netherlands? Based on a study of the workload, resources and performance bottlenecks at Statistics Netherlands, a benchmark was developed that evaluates four geoprocessing tools (UNION, INTERSECT, DISSOLVE and NEAR). The tools were tested on scalability (using synthetic data), on the impact of the optimization factors available in ArcGIS Desktop 10.1, and on the impact of big data hardware and the new software releases ArcGIS 10.2 and ArcGIS Pro (using real data). A number of administrative processing tools (SUMMARY STATISTICS, FREQUENCY, CALCULATE and JOIN FIELD) were additionally included in the tests with the big data hardware and the new software releases. The results of the benchmark were combined with the experiences of other organizations in migrating from ArcInfo to ArcGIS Desktop or ArcSDE, and were weighed against possible internal and external restrictions and trends.
These organizations are Statistics Portugal, Statistics Italy, the United States Geological Survey and PBL (Netherlands Environmental Assessment Agency). The benchmark results differ per tool: whereas UNION and INTERSECT show the same performance in ArcGIS Desktop 10.1 as in ArcInfo Workstation, DISSOLVE and JOIN are considerably slower in all higher desktop versions. The selected geoprocessing tools UNION, INTERSECT, DISSOLVE and NEAR did not show improvement with the use of the available optimization factors (spatial index, spatial sort, parallel processing environment, compacting and compressing) in ArcGIS 10.1, although the use of optimization factors has not been exhaustive. The most remarkable results were declines in performance, for example after compression of the input datasets. NEAR shows no difference between ArcInfo Workstation and ArcGIS 10.1 on the fat client, but shows a big improvement in ArcGIS 10.2 and ArcGIS Pro on the big data computer. This improvement implies that the algorithm of the tool has been redesigned. The other tools, including a number of administrative processing tools tested for Statistics Netherlands, did not show substantial improvement in a big data hardware environment or in ArcGIS Pro.
The lessons learned from the interview and questionnaires can be summarized in several points:
1. The transition has been a gradual process for the respondent organizations, but for Statistics Netherlands the urgency to migrate is higher because of the large number of production processes written in AML and the product cycle of ArcInfo Workstation.
2. The influence on ICT infrastructure decision making in terms of facilities for geoprocessing will depend on the user base and the organization of the GIS users.
3. The PBL shows interesting options in optimization, but also has a more flexible geo-ICT infrastructure.
4. Open source or other proprietary software is largely new territory and has not really been investigated for functionality and performance.
5. For most of the statistical agencies and the PBL, some part of the migration has been outsourced, at least in the initial phase of the migration.
6. The evaluation of the migration to a higher ArcGIS version is mixed and varies per tool or model. It also depends on the user needs of the organization.
To provide migration advice that also takes future developments into account, a number of trends in geoprocessing technology have been described, such as cloud computing and MapReduce/Hadoop. Applying these technologies would require a substantial investment in knowledge and adaptation of production processes. Moreover, solutions that affect the ICT infrastructure as a whole, like cloud computing, or that could affect the results of statistical production by using algorithms like MapReduce, are less likely to be implemented in the near future. Based on the previous information, the following conclusion can be stated: given the limited scalability and end of support of ArcInfo Workstation, migration to a more up-to-date infrastructure is unavoidable. Three alternatives are possible:
Alternative 1: Keep a small-scale infrastructure within the ArcGIS Desktop suite and, in future, ArcGIS Pro
Spatial Statistics stays within the ArcGIS Desktop suite but should consider a quicker migration to 10.2 or rather 10.3, because more performance problems are resolved in these versions (according to ESRI) and because the possibilities to optimize ArcGIS Desktop 10.1 remain very limited. This will mean that Spatial Statistics needs less in-house expertise on performance optimization, but remains dependent on ESRI product development processes. This dependency could be countered by cooperating more closely with other ArcGIS users nationally and internationally to press for the needed improvements.
Alternative 2: Invest in a sustainable geo-ICT environment with other departments
Being more in control of performance optimization requires a more configurable geo-ICT infrastructure. Spatial Statistics is too small to make such an investment in infrastructure and the needed expertise (“humanware”). Therefore it should cooperate with other departments at Statistics Netherlands that have a need for spatial data analysis. Such a geo-ICT environment could include several components: a spatial database such as Oracle combined with ArcSDE, an open source spatial database such as PostGIS, or a file-based environment. Additionally, an extension of the benchmark towards spatial databases would be needed. This need is especially apparent if the volume of data is expected to grow in the future.
Alternative 3: Organize innovative projects with internal and external partners as a complementary step
In addition to the more short-term solution, Spatial Statistics should start a number of projects dedicated to the application of new geo-ICT technologies. An example is a project involving Spatial Statistics and other teams in cooperation with the innovation laboratory.
The conclusions have also resulted in a number of recommendations that are aimed at improving the benchmark and at helping the migration process of Statistics Netherlands.
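The scalability test on synthetic data described in the abstract can be illustrated with a minimal timing harness. This is only a sketch under stated assumptions, not the benchmark used in the study: `make_points`, `near_bruteforce` and `benchmark` are hypothetical names, and the brute-force nearest-neighbour search is a pure-Python stand-in for a NEAR-style tool, not the ArcGIS implementation.

```python
import random
import time


def make_points(n, seed=0):
    """Generate n synthetic 2D points in the unit square (reproducible via seed)."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def near_bruteforce(sources, targets):
    """Hypothetical stand-in for a NEAR-style tool.

    Returns, for every source point, the index of the closest target point.
    """
    result = []
    for sx, sy in sources:
        closest = min(
            range(len(targets)),
            key=lambda i: (targets[i][0] - sx) ** 2 + (targets[i][1] - sy) ** 2,
        )
        result.append(closest)
    return result


def benchmark(tool, sizes, repeats=3):
    """Run tool on growing synthetic inputs; keep the best wall-clock time per size."""
    timings = {}
    for n in sizes:
        sources = make_points(n, seed=1)
        targets = make_points(n, seed=2)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            tool(sources, targets)
            best = min(best, time.perf_counter() - start)
        timings[n] = best
    return timings


if __name__ == "__main__":
    # Doubling the input size shows how the run time scales with the data volume.
    for n, seconds in benchmark(near_bruteforce, [100, 200, 400]).items():
        print(f"{n:>6} features: {seconds:.4f} s")
```

Taking the best of several repeats reduces the influence of background load on the measurement. A real benchmark would replace the stand-in with calls to the geoprocessing tool under test and would also record the hardware and software version of each run.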