Relational database management system (RDBMS) and Geographic Information System (GIS)
Relational databases are based on the relational model of data developed by E.F. Codd. In this model, the table is the only data structure. Tables that have their rows of data stored in the database are called base tables. Derived tables that have only their definitions (not their contents) stored are called views. Within a base table, duplicate rows are not allowed. The order of the rows and columns does not matter because each fact is wholly contained within a single row. Columns are identified by their names rather than by their positions. Although the order of rows and columns is irrelevant at the logical level, an order does exist at the physical level and needs to be taken into account when tuning the database for efficient access. The results of queries are also tables, but these are unnamed and may contain duplicate rows. When a query is formulated, one can specify whatever order one wants the data to be displayed in, regardless of the actual order used to store the data in base tables (Halpin, 2001, p.44).
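The distinction between base tables, views, and query results can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than any system named in this essay, and the table city, the view large_city, and the sample rows are all hypothetical. Here a primary key on the name column is what keeps the rows of the base table distinct.

```python
import sqlite3

# In-memory database; the table, view, and data are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# Base table: its rows are stored. The primary key prevents duplicate rows.
conn.execute(
    "CREATE TABLE city (name TEXT PRIMARY KEY, country TEXT, population INTEGER)"
)
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [("Lyon", "France", 520000), ("Paris", "France", 2100000), ("Turin", "Italy", 850000)],
)

# View: only the definition is stored, not the contents.
conn.execute(
    "CREATE VIEW large_city AS SELECT name, country FROM city WHERE population > 800000"
)

# Inserting a row that already exists in the base table is rejected.
try:
    conn.execute("INSERT INTO city VALUES ('Paris', 'France', 2100000)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A query result is an unnamed table and may contain duplicate rows.
countries = conn.execute("SELECT country FROM city ORDER BY country").fetchall()

# The display order is chosen in the query, independent of storage order.
by_population = [
    row[0] for row in conn.execute("SELECT name FROM city ORDER BY population DESC")
]

# Columns are referred to by name, here through the view.
large = conn.execute("SELECT name FROM large_city ORDER BY name").fetchall()
```

In this sketch the countries result lists France twice, which shows that a query result, unlike a base table, may repeat rows.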
The database management system (DBMS) is the software and collection of tools that manages the database. Oracle software is a DBMS. A relational database management system (RDBMS) is a DBMS that is relational in nature. This means that its internal workings access data in a relational manner. Oracle is an RDBMS (Edward & Adrien, 1998, p.6).
The Oracle 7 relational database management system is a suite of products that supports various functions, including query processing, online transaction processing, data warehousing, workgroup management, and internet access, among others. High-performance transaction processing is accomplished through a multi-server, multi-threaded architecture in which several user requests are coordinated simultaneously. Sophisticated query optimization strategies are supported. B-tree based access methods as well as hashing techniques are used for efficient retrieval. Concurrency control is provided by row-level locking of both data and indexes (Thuraisingham, 1997, p.243).
Special query processing and indexing techniques for data warehousing applications are also supported. These techniques include bitmap indexes, hash joins, and partitioned data. The query optimizer for data warehousing is cost based and supports queries such as star queries and snowflake queries. The query optimizer takes syntax as well as the dynamic database environment into consideration. In addition, parallel database processing techniques are utilized for efficiency. The parallel query architecture supports parallel execution of various operations, including sorts, aggregation, and table creation (Thuraisingham, 1997, p.243).
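To make the idea of a star query concrete, the following is a minimal sketch of its general shape: one central fact table joined to several surrounding dimension tables. It uses Python's built-in sqlite3 module, not Oracle, and says nothing about how Oracle's optimizer treats such a query. The tables sales, region, and product and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: the fact table "sales" refers to two dimension tables.
conn.executescript(
    """
    CREATE TABLE region  (region_id  INTEGER PRIMARY KEY, region_name  TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE sales (
        region_id  INTEGER REFERENCES region,
        product_id INTEGER REFERENCES product,
        amount     REAL
    );
    INSERT INTO region  VALUES (1, 'North'), (2, 'South');
    INSERT INTO product VALUES (1, 'Map'), (2, 'Atlas');
    INSERT INTO sales   VALUES (1, 1, 10.0), (1, 2, 25.0), (2, 1, 5.0), (1, 1, 15.0);
    """
)

# Star query: join the fact table to each dimension, then aggregate.
totals = conn.execute(
    """
    SELECT r.region_name, p.product_name, SUM(s.amount)
    FROM sales AS s
    JOIN region  AS r ON r.region_id  = s.region_id
    JOIN product AS p ON p.product_id = s.product_id
    GROUP BY r.region_name, p.product_name
    ORDER BY r.region_name, p.product_name
    """
).fetchall()
```

A snowflake query has the same shape, except that the dimension tables are themselves normalized into further tables that must also be joined.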
The Structured Query Language (SQL) provides users of relational databases with the facility to define the database scheme (data definition), and then insert, modify, and retrieve data from the database (data manipulation). The data definition language (DDL) component of SQL allows the creation, alteration, and deletion of relation schemes. Normally, a relation scheme is altered only rarely once the database is operational. A relation scheme provides a set of attributes, each with an associated data domain, with additional properties relating to keys and integrity constraints. Data manipulation using SQL commands is quite straightforward, allowing insertion of single or multiple tuples, update of tuples in tables, and deletion of tuples (Worboys & Duckham, 2004, p.51).
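The data definition and data manipulation statements described above can be sketched as follows. This is an illustration only, run through Python's built-in sqlite3 module; the parcel table, its columns, and its rows are hypothetical, and other database systems may differ in the details of their SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Data definition: create a relation scheme with domains, a key, and constraints.
cur.execute(
    """
    CREATE TABLE parcel (
        parcel_id INTEGER PRIMARY KEY,
        owner     TEXT NOT NULL,
        area_ha   REAL CHECK (area_ha > 0)
    )
    """
)
# Data definition: alter the scheme by adding an attribute.
cur.execute("ALTER TABLE parcel ADD COLUMN land_use TEXT")

# Data manipulation: insert a single tuple, then several tuples at once.
cur.execute("INSERT INTO parcel VALUES (1, 'Ames', 2.5, 'farm')")
cur.executemany(
    "INSERT INTO parcel VALUES (?, ?, ?, ?)",
    [(2, "Brook", 0.8, "housing"), (3, "Cole", 4.1, "farm")],
)
# Data manipulation: update and delete tuples.
cur.execute("UPDATE parcel SET owner = 'Dale' WHERE parcel_id = 2")
cur.execute("DELETE FROM parcel WHERE area_ha > 4")
conn.commit()

remaining = cur.execute(
    "SELECT parcel_id, owner, land_use FROM parcel ORDER BY parcel_id"
).fetchall()

# Data definition: delete the relation scheme altogether.
cur.execute("DROP TABLE parcel")
tables_left = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```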
The primitive operations that can be supported by a relational database are the traditional set operations of union, intersection, and difference, along with the characteristically relational operations of project, restrict, join, and divide. The structure of these operations and the ways in which they can be combined are provided by relational algebra. The set operations union, intersection, and difference work on the relations as sets of tuples. The project operation applies to a single relation and returns a new relation that has a subset of the attributes of the original. The restrict operation acts on a relation to return only those tuples that satisfy a given condition. The join operation makes connections between relations, taking two relations as operands and returning a single relation (Longley et al., 1999, p.376).
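The operations described above can be sketched in a few lines of Python. This is an illustrative model only, not how any database engine implements them: a relation is represented as a list of dictionaries, one per tuple, and the relations roads and districts are hypothetical. The join shown is a natural join on shared attribute names, and divide is omitted for brevity.

```python
def _dedupe(rows):
    """Relations are sets: drop duplicate tuples, keeping first-seen order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def project(relation, attrs):
    """Keep only the named attributes of every tuple."""
    return _dedupe([{a: row[a] for a in attrs} for row in relation])


def restrict(relation, predicate):
    """Keep only the tuples that satisfy the condition."""
    return [row for row in relation if predicate(row)]


def natural_join(left, right):
    """Combine tuples that agree on all attributes the two relations share."""
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                out.append({**l, **r})
    return _dedupe(out)


def union(a, b):
    return _dedupe(a + b)


def intersection(a, b):
    return [row for row in a if row in b]


def difference(a, b):
    return [row for row in a if row not in b]


# Hypothetical relations.
roads = [
    {"road_id": 1, "name": "High St", "district_id": 10},
    {"road_id": 2, "name": "Mill Rd", "district_id": 20},
    {"road_id": 3, "name": "Park Ave", "district_id": 10},
]
districts = [
    {"district_id": 10, "district": "North"},
    {"district_id": 20, "district": "South"},
]

district_ids = project(roads, ["district_id"])
north_roads = restrict(roads, lambda row: row["district_id"] == 10)
joined = natural_join(roads, districts)
```

Because every operation takes relations and returns a relation, the operations can be nested, for example project(natural_join(roads, districts), ["name", "district"]).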
Geographical objects are described by two types of data, namely locational data and attribute data. GIS typically employ database management system strategies for handling these two types of data. The relational model is the most popular database model used to organize data in GIS. Spatial data are stored in a set of direct access operating system files for speed of input-output, while attribute data are usually stored in a standard commercial DBMS. The GIS software manages linkages between the spatial data files and the DBMS during different map processing operations. While a number of different approaches to the storage of the spatial data are used, the linking mechanism to the database is essentially the same. It involves unique identifiers stored in a database table of attributes that allow the attributes to be tied to individual map elements. Accordingly, the relational database management system (RDBMS) is the most popular database management system used in GIS. In the relational model, the database is a group of tables, which are also referred to as relations. Each table contains a data item (or a column of data) that also appears in at least one other table containing additional data. In other words, each table contains data relevant to a particular object and is linked to other tables by a common value. This type of data organization is particularly suited to querying with SQL (Structured Query Language) (Malczewski, 1999, p.30).
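The linking mechanism described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the storage layout of any particular GIS product: a Python dictionary stands in for the spatial data files, an in-memory SQLite table stands in for the commercial DBMS, and the names geometries, parcel_attr, and feature_id are hypothetical.

```python
import sqlite3

# Spatial side: polygon outlines keyed by a unique feature identifier
# (a stand-in for the GIS's own spatial data files).
geometries = {
    101: [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)],
    102: [(4.0, 0.0), (6.0, 0.0), (6.0, 3.0), (4.0, 3.0)],
}

# Attribute side: a relational table that carries the same identifiers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcel_attr (feature_id INTEGER PRIMARY KEY, owner TEXT, land_use TEXT)"
)
conn.executemany(
    "INSERT INTO parcel_attr VALUES (?, ?, ?)",
    [(101, "Ames", "farm"), (102, "Brook", "housing")],
)


def features_with_land_use(land_use):
    """Select by attribute in the DBMS, then fetch each geometry by its identifier."""
    rows = conn.execute(
        "SELECT feature_id, owner FROM parcel_attr WHERE land_use = ? ORDER BY feature_id",
        (land_use,),
    ).fetchall()
    return [(fid, owner, geometries[fid]) for fid, owner in rows]


def polygon_area(ring):
    """Area of a simple polygon from its vertices (shoelace formula)."""
    total = 0.0
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


farms = features_with_land_use("farm")
farm_area = polygon_area(farms[0][2])
```

The attribute query runs entirely in the DBMS, and only the matching identifiers are used to pull map elements from the spatial store, which is the division of labour the paragraph describes.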
Relational database manipulations are based on relational algebra operations. Basic relational operations on data sets include restriction, projection, product, union, intersect, difference, join, divide, a relational assignment operation, and a number of additional branch operators. Database processing is therefore much closer to a bit-pattern matching process than to item-by-item compare-and-branch searching. Hence, database operations are rather efficient, fast, and almost independent of database size. Performance does, however, depend on the implementation of virtual memory, the cache size, the external disk memory performance, the I/O channel bandwidth, and the size of allocated real memory. The database manipulation language, usually referred to as SQL, can either stand alone or be embedded in conventional programming languages. A fourth-generation language (4GL) is a kind of higher-level scripting language, supported by an SQL precompiler, that allows ordinary users to create and manipulate complex databases, perform queries, and carry out database processing with little effort (Nikolik, 2003, p.219).
There are two fundamental approaches to relational database design, namely the synthesis approach and the decomposition approach. The synthesis approach is a bottom-up approach in which we start from the data dependencies in order to obtain a database schema in the required normal form. The decomposition approach, on the other hand, is a top-down approach in which we start from the set of attributes over which the functional dependencies (FDs) are defined and decompose this set iteratively until the resulting database schema is in the desired normal form (Levene & Loizou, 1999, p.238).
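The decomposition approach can be sketched as a short recursive procedure. The sketch below assumes Boyce-Codd normal form (BCNF) as the target, which the text above does not specify, and it searches all attribute subsets, so it is suitable only for small schemas. The attribute names and the two FDs are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under the FDs (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def decompose_bcnf(schema, fds):
    """Split the schema top-down until no sub-schema violates BCNF."""
    schema = frozenset(schema)
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & schema
            # Violation: x determines something new but is not a key of the schema.
            if x < x_plus < schema:
                r1 = x_plus
                r2 = x | (schema - x_plus)
                return decompose_bcnf(r1, fds) + decompose_bcnf(r2, fds)
    return [schema]


# Hypothetical starting point: one set of attributes and the FDs over it.
attributes = {"parcel", "owner", "owner_phone", "district"}
fds = [
    (frozenset({"parcel"}), frozenset({"owner", "district"})),
    (frozenset({"owner"}), frozenset({"owner_phone"})),
]

schemas = decompose_bcnf(attributes, fds)
```

In this example the dependency from owner to owner_phone causes the split, leaving one relation for owners and their phone numbers and another for parcels, their owners, and their districts. The synthesis approach would instead start from the two FDs and build one relation per dependency group.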
The representation of inexact or incomplete information has become an important area of database research. A number of authors have proposed a variety of generalizations to parts of classical database theory so as to be able to deal with vague or incomplete information. For the most part, the generalizations have been of one specific data model, the relational database model. One of the features of the relational database model is that its associated query language, though algebraic in nature, is equivalent to a fragment of predicate calculus, the relational calculus. Many of the generalizations use fuzzy set theory to provide an interpretation of impreciseness in relational databases (Hohle et al, 1992, p.308).
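One way in which fuzzy set theory can be carried into the relational model is sketched below. It is an illustration of the general idea only and is not taken from the cited work: each tuple carries a membership degree between 0 and 1, a vague condition is expressed as a membership function, and the two are combined with the minimum, which is one common choice for fuzzy conjunction. The function near, its 1000 m limit, and the sites relation are hypothetical.

```python
def near(distance_m, limit_m=1000.0):
    """Hypothetical membership function: 1.0 at distance 0, falling linearly to 0.0."""
    return max(0.0, 1.0 - distance_m / limit_m)


# Fuzzy relation: "mu" is the degree to which each tuple belongs to the relation.
sites = [
    {"site": "A", "distance_m": 250.0, "mu": 1.0},
    {"site": "B", "distance_m": 500.0, "mu": 0.6},
    {"site": "C", "distance_m": 1200.0, "mu": 0.9},
]


def fuzzy_restrict(relation, membership, attr):
    """Restrict by a vague condition; combine degrees with min, drop degree 0."""
    out = []
    for row in relation:
        mu = min(row["mu"], membership(row[attr]))
        if mu > 0.0:
            out.append({**row, "mu": mu})
    return out


near_sites = fuzzy_restrict(sites, near, "distance_m")
```

Where a classical restrict either keeps or discards a tuple, the fuzzy version keeps it with a reduced degree, so site B remains in the answer but only to degree 0.5.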
Database process in GIS
Database systems provide the engine for GIS. In the database approach, the computer acts as a facilitator of data storage and sharing. It allows the data to be modified and analyzed while in the store (Longley et al., 1999, p.373). GIS shares common areas with a number of other disciplines, such as computer-aided design, computer cartography, database management, and remote sensing. None of these disciplines, however, can by itself fully meet the requirements of a GIS application. Examples of such requirements include the ability to use locational data to produce high-quality plots; perform complex operations such as network analysis; enable spatial searching and overlay operations; support spatial analysis and modeling; and provide data management functions such as efficient storage, retrieval, and modification of large datasets, independence, integrity, and security of data, and concurrent access for multiple users (Gangopadhyay & Adam, 1997, p.2).
Certain design standards need to be followed in creating a GIS database. The design should adhere to computer industry standards. These standards set forth guidelines on system interoperability and integration, which are critical for the success of a GIS application project. There are four important standards for modern GIS software, namely Microsoft Windows for the interface, Structured Query Language (SQL) for data access, Component Object Model (COM) for tools, and Transmission Control Protocol/Internet Protocol (TCP/IP) and Hypertext Transfer Protocol (HTTP) for network data transfer (Shamsi, 2002, p.181).
History of GIS database
The progression of GIS as a technology follows the proliferation of computers from the 1950s to the present day. Significant milestones reached during this time include theories on using computers to correctly represent our planet and mimic our daily needs and routines, the progression of software and hardware development, and the founding of industry support groups. As a technology, GIS can trace its roots back to the 1950s and 1960s. In 1959, a simple model for applying computer technology to cartography was produced. This system, known as map in – map out, sought to automate map making through the integration of database technology with computer aided drafting (CAD) systems. By doing so, maps could be annotated with text automatically through the use of an existing database. This drastically improved the cartographer’s ability to recreate standardized maps and led to future GIS applications. Today, GIS technology is an integral part of cartography and mapmaking.[1]
[1] Water Environment Federation (2005) GIS implementation for water and wastewater treatment facilities, McGraw-Hill Professional, p.3
A quite distinct history of GIS stems from the benefits of automating the map production process. Widespread realization of the benefits of automated cartography had to await the development of suitable mechanisms for the input, display, and output of map data, but the necessary devices, such as the map digitizer, interactive graphics display device, and plotter, had become available at reasonable cost by the mid-1970s, and from then on a large number of organizations set out to convert all their maps into computerized form (Longley et al., 1999, p.3).
Different databases that can be used in GIS
GIS data comes from many sources, such as maps, remote sensing, imagery, CD-ROMs and the Internet. These diverse sets of data are not easily integrated. The central data integrator for GIS is the database. A major strength of GIS is that it accepts and merges diverse databases and different types of data, giving the user a flexible and powerful set of data for project work. A database in GIS is a simple concept: a list or table of data arranged as rows and columns. Rows are the records, one for each observation entered into the database. Columns are called fields, which present the attributes or descriptions of each record. Attributes are data descriptors, such as colour, ownership, magnitude, and classification. A database can be small and simple or very large, with dozens of fields and hundreds of records (Davis, 2001, p.53).
Spatial data is at the heart of every GIS project or application. Spatial data contains the locations and shapes of map features. Also known as digital map data, this is the kind of data one needs to make maps and study spatial relationships. Spatial data includes points that represent such locations as shopping centres, banks, and physicians’ offices, and lines that represent streets, highways, and rivers. It also includes polygons that represent natural areas and political or administrative areas, such as the boundaries of countries, states, cities, census tracts, postal zones, and markets.[2]
[2] Getting to know ArcView GIS: The geographic information system for everyone, ESRI Press, 1999, p.2
Geo-spatial data is divided into two classes, namely raster and vector. Raster data is structured as an array or grid of cells, referred to as pixels. The three-dimensional equivalent is a three-dimensional array of cubic cells, called voxels. Each cell in a raster is addressed by its position in the array (row and column number). Rasters are able to represent a large range of computable spatial objects. Rasters are natural structures to use in computers, because programming languages commonly support array handling and operations. A vector, on the other hand, is a finite straight line segment defined by its end points. The discretization of space into a grid of cells is not explicit as it is with the raster structure. Vectors are an appropriate representation for a wide range of spatial data. The vector data representation is inherently more efficient in its use of computer storage than raster because only points of interest need to be stored (Worboys & Duckham, 2004, p.17).
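The contrast between the two structures can be illustrated with a minimal sketch. The 3 by 4 grid, the cell value 1 standing for a road, and the coordinates of the segment are all hypothetical, and real GIS formats store considerably more metadata than is shown here.

```python
import math

# Raster: a grid of cells, each addressed by (row, column). 1 marks a road cell.
raster = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]


def cells_with_value(grid, value):
    """Addresses (row, column) of every cell holding the given value."""
    return [
        (r, c)
        for r, row in enumerate(grid)
        for c, cell in enumerate(row)
        if cell == value
    ]


road_cells = cells_with_value(raster, 1)

# Vector: the same road as one straight line segment defined by its end points.
road_segment = ((2.5, 0.0), (2.5, 3.0))


def segment_length(segment):
    """Euclidean length of a segment given as two (x, y) end points."""
    (x1, y1), (x2, y2) = segment
    return math.hypot(x2 - x1, y2 - y1)


road_length = segment_length(road_segment)

# Storage comparison: every raster cell is stored, but only the end points of the vector.
raster_values_stored = sum(len(row) for row in raster)
vector_values_stored = sum(len(point) for point in road_segment)
```

Even in this tiny example the raster stores twelve cell values to describe the road, whereas the vector stores four coordinates, and the gap widens as the grid becomes finer.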
Data manipulation and interrelation in GIS as they relate to relational databases
The distinguishing feature of GIS systems is their capability of performing an integrated analysis of spatial and attribute data. The data are manipulated and analysed to obtain information useful for a particular application. There is a wide range of operations available to GIS users. Of these, two broad categories of GIS functions can be distinguished, namely fundamental and advanced functions. This distinction is based on the extent to which these functions can be used in a variety of spatial analyses, including spatial multi-criteria decision analysis. The functions considered to be useful for a wide range of applications are referred to as fundamental functions. They are more generic than advanced functions in the sense that they are available in a wide variety of GIS systems for different data structures (Malczewski, 1999, p.36).
For data storage and manipulation, a database management system uses a data model, such as a hierarchical, network, or relational data model. The relational data model is the most widely used data model. An RDBMS is a software program that is used to create, maintain, modify, and manipulate a relational database. An RDBMS is also used to create the applications that will enable users to interact with the data stored in the database. It allows for easy data entry and manipulation, provides fast query and display and maintains data integrity and security. Relational database systems have become the commercial de facto standard because of ease of use and implementation, ability to be modified, and flexibility (Shamsi, 2002, p.182).
Most specific GIS systems support standard industry applications. Additionally, all GIS systems have unique or special tools to enable enhanced operations in certain application areas. These standard applications and special tools need to be known prior to the design of the physical database. It is therefore imperative that GIS selection precede the physical GIS database design process (Castle, 1993, p.283).
References
Castle, G.H. (1993) Profiting from a geographic information system, John Wiley and Sons, p.394
Davis, B.E. (2001) GIS: A visual approach, Thomson Delmar Learning, p.448
Edward, W. & Adrien, D.S. (1998) Teach yourself Oracle 8 in 21 days, Sams Publishing, Indianapolis, Ind.
Gangopadhyay, A. & Adam, N.R. (1997) Database issues in geographic information systems, Springer, p.136
Halpin, T.A. (2001) Information modeling and relational databases: From conceptual analysis to logical design, Morgan Kaufmann, p.761
Hohle, U., Klement, E.P. & Rodabaugh, S.E. (1992) Applications of category theory to fuzzy subsets, Springer
Levene, M.M. & Loizou, G. (1999) A guided tour of relational databases and beyond, Springer, p.625
Malczewski, J. (1999) GIS and multicriteria decision analysis, John Wiley & Sons, p.408
Nikolik, D.A. (2003) Manager's primer on E-networking: An introduction to enterprise networking in E-business ACID, Springer, p.285
Ricardo, C.M. (2004) Databases illuminated, Jones and Bartlett Publishers, p.874
Shamsi, U.M. (2002) GIS tools for water, wastewater, and stormwater systems, ASCE Publications, p.375
Thuraisingham, B.M. (1997) Data management systems: Concepts, developments, and trends, CRC Press, p.257
Worboys, M.F. & Duckham, M. (2004) GIS: A computing perspective, Taylor and Francis, p.426