Review: Advanced Distributed File System

Snahil Indoria, Department of Computer Technology, JK Lakshmipat University, Jaipur, [email protected]
Shukla, Department of Computer and Technology, JK Lakshmipat University, Jaipur, [email protected]

Abstract: Networks of computers are everywhere. The Internet is one of the most common examples. Likewise, a distributed system is a network that consists of autonomous computers connected through distributed middleware. In this paper, four distributed file system architectures (the Google File System, the Microsoft Distributed File System, the Andrew File System, and the Sun Network File System) are reviewed on the basis of performance, scalability, data integrity, security, and heterogeneity. A comparative study is required for a better understanding of the different file systems.
Keywords: DFS, GFS, SUN, AFS, Google File System, Sun Network File System, Andrew File System.

I. Introduction

A file system [1], sometimes referred to as file management and abbreviated FS, is a method and data structure that an operating system uses to keep track of the files on a disk or partition. The term also refers to a partition or disk that is used to store files, or to the type of the file system. A file is a collection of related information recorded on secondary storage, or equivalently a collection of logically related entities. A file system usually consists of files separated into groups called directories. Many types of file system are in common use, and the type determines how data is accessed.
A distributed file system (DFS) is a client/server-based application that allows clients to access and process data stored on the server as if it were on their own machine. When a user accesses a file on the server, the server sends the user a copy of the file, which is cached on the user's computer while the data is being processed and is then returned to the server. A distributed file system organizes the files and directory services of individual servers into a global directory in such a way that remote data access is not location-specific but is identical from any client. The files requested by a user may be located on different systems at different places around the world; whenever a user requests a service or file, these systems together provide the information or service to the client. Sharing of resources is the main motive of a DFS.

A DFS runs on multiple independent computers connected through a communication network, but appears to its users as a single virtual machine. Each computer runs its own OS, and each node has its own memory. The Internet, intranets, and mobile and ubiquitous computing are some examples of DFS. Fig__ shows the architecture of a distributed file system.

II. Literature Review

Aditya B. Patel, Manashvi Birla, Ushma Nair, "Addressing Big Data Problem Using Hadoop and Map Reduce", NIRMA University International Conference on Engineering (NUiCONE), 06-08 December 2012. [2]

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, "The Google File System", Google. [3]

Shiva Asadianfam, Mahboubeh Shamsi, and Shahrad Kashany, "A Review: Distributed File System", International Journal of Computer Networks and Communications Security, vol. 3, no. 5, May 2015, pp. 229–234. [4]

III. Distributed File System

A distributed file system [5] is a client/server-based application that allows clients to access and process data stored on the server as if it were on their local node. When a user accesses a file on the server, the server sends the user a copy of the file, which is cached on the user's computer while the data is being processed and is then returned to the server. Distributed file systems are the bedrock of distributed computing in office and engineering environments.
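The access pattern described above (the server sends a copy, the client caches it while processing, and the result is returned to the server) can be sketched in a few lines of Python. This is a minimal in-memory illustration under simplifying assumptions, not the design of any particular system; the class names `FileServer` and `CachingClient`, their methods, and the file path are hypothetical.

```python
class FileServer:
    """Hypothetical in-memory stand-in for a remote file server."""

    def __init__(self):
        self._files = {}

    def fetch(self, path):
        # The server sends the client a copy of the file contents.
        return self._files.get(path, b"")

    def store(self, path, data):
        # The client returns the processed data to the server.
        self._files[path] = data


class CachingClient:
    """Client that works on a locally cached copy of a remote file."""

    def __init__(self, server):
        self._server = server
        self._cache = {}

    def open(self, path):
        # Fetch a copy on first access; later accesses use the cache.
        if path not in self._cache:
            self._cache[path] = bytearray(self._server.fetch(path))
        return self._cache[path]

    def close(self, path):
        # Write the cached copy back to the server and drop it locally.
        data = self._cache.pop(path)
        self._server.store(path, bytes(data))


server = FileServer()
server.store("/docs/report.txt", b"draft")

client = CachingClient(server)
buf = client.open("/docs/report.txt")   # copy is cached on the client
buf.extend(b" v2")                      # processed locally
client.close("/docs/report.txt")        # returned to the server
```

The sketch ignores what happens when two clients modify the same file at once; that is the data-integrity concern discussed among the features of a distributed file system.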
Fig-I Architecture of Distributed File System [6]

Features of Distributed File System [7]

Transparency [8]: Transparency refers to hiding details from the user. There are four types of transparency:
i. Structure transparency: Multiple file servers are used to provide better performance, scalability, and reliability. The multiplicity of file servers should be transparent to the clients of a distributed file system.
ii. Access transparency: Local and remote files should be accessible in the same way. The file system should automatically locate an accessed file and transport it to the client's site.
iii. Naming transparency: The name of a file should not reveal its location, and the name must not change when the file moves from one node to another.
iv. Replication transparency: Where files are replicated on multiple nodes, the existence of multiple copies and their locations should be hidden from the clients.

User Mobility: The user is not bound to a specific node but should have the flexibility to work on any given machine at different times.

Performance: Performance is measured as the average amount of time needed to satisfy client requests, which includes CPU time plus the time for accessing secondary storage along with network access time. Explicit file placement decisions should not be needed to increase the performance of a distributed file system.

Data Integrity: Concurrent access requests from multiple users who are competing to access a file must be properly synchronized using some form of concurrency control mechanism. A file system can also provide atomic transactions to users for data integrity.
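The data-integrity requirement can be illustrated with a lock that serializes concurrent updates to one file. The sketch below uses Python's standard `threading` module; the `SharedFile` class and the `writer` helper are hypothetical stand-ins, and a real distributed file system would need a distributed locking or transaction protocol rather than an in-process lock.

```python
import threading


class SharedFile:
    """Hypothetical file whose updates are serialized by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.lines = []
        self.version = 0

    def append(self, line):
        # Only one writer at a time may modify the contents and version.
        with self._lock:
            self.lines.append(line)
            self.version += 1


def writer(shared, user, count):
    # Each simulated user appends its own numbered records.
    for i in range(count):
        shared.append(f"{user}:{i}")


shared = SharedFile()
threads = [
    threading.Thread(target=writer, args=(shared, f"user{n}", 100))
    for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every update happens inside the lock, the version counter always matches the number of records, whichever order the four writers run in.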
IV. Characteristics of Distributed File System [9]

Concurrency: Concurrency is the occurrence of two or more events at the same time. The system must handle the sharing of resources between clients, because concurrently executing programs share resources such as web pages and files.

No Global Clock: In a distributed system, computers are connected through a network and each has its own clock. Programs communicate and share only through messages, and their coordination depends on time.

Independent Failure: Each component of a distributed system can fail independently, leaving the other components unaffected.

Fault Tolerance: Fault tolerance is the property of a system that continues operating properly in the event of failure.
Scalability: Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth.

Heterogeneity: Heterogeneous computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding processors of the same type but also by adding dissimilar co-processors.

Security: Security is one of the most important principles. Because security needs to be pervasive throughout the system, security mechanisms are normally built into a distributed system.

V. Google File System [10]

The Google File System is a highly scalable distributed file system that runs on inexpensive commodity hardware, provides fault tolerance, and delivers high aggregate performance to many clients. The design was driven by observations of Google's application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions.
This led its designers to reexamine traditional design choices and explore radically different design points. The file system has successfully met Google's storage needs as the storage platform for the generation and processing of data. The largest cluster provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. GFS is one of the most successful real-world applications of a distributed system, with a very high degree of fault tolerance.
Fig-II Architecture of Google File System [11]

The basic idea of GFS is that the master maintains the metadata. A client contacts the master and retrieves the metadata about the chunks stored on the chunk-servers; after that, the client contacts the chunk-servers directly. Fig-II illustrates this.

“A GFS cluster consists of a single master and multiple chunk-servers and is accessed by multiple clients. Each of these is typically a commodity Linux machine running a user-level server process. Files are divided into fixed-size chunks. Each chunk is identified by an immutable and globally unique 64 bit chunk handle assigned by the master at the time of chunk creation. Chunk-servers store chunks on local disks as Linux files and read or write chunk data specified by a chunk handle and byte range. For reliability, each chunk is replicated on multiple chunk-servers. By default, three replicas are stored, though users can designate different replication levels for different regions of the file namespace.”

The master maintains all file system metadata. This includes the namespace, access control information, the mapping from files to chunks, and the current locations of chunks. It also controls system-wide activities such as chunk lease management, garbage collection of orphaned chunks, and chunk migration between chunk-servers.
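As a rough illustration of the metadata the master keeps, the following sketch models the mapping from files to chunks and from chunk handles to replica locations. It is a simplified model built on stated assumptions, not Google's implementation: the class and method names are hypothetical, the chunk size is passed in as a parameter (and is tiny here only for readability), handles come from a plain counter, and replicas are placed round-robin.

```python
import itertools


class Master:
    """Toy model of the metadata kept by a GFS-style master."""

    def __init__(self, chunk_servers, chunk_size, replicas=3):
        self.chunk_servers = list(chunk_servers)
        self.chunk_size = chunk_size
        self.replicas = replicas
        self.file_chunks = {}      # file name -> list of chunk handles
        self.chunk_locations = {}  # chunk handle -> list of chunk-servers
        self._next_handle = itertools.count(1)
        self._next_server = 0

    def _new_chunk(self):
        # Handles are unique and fit in 64 bits (assumed: a plain counter).
        handle = next(self._next_handle) & 0xFFFFFFFFFFFFFFFF
        # Assumed placement policy: round-robin over the chunk-servers.
        n = len(self.chunk_servers)
        servers = [self.chunk_servers[(self._next_server + i) % n]
                   for i in range(self.replicas)]
        self._next_server = (self._next_server + 1) % n
        self.chunk_locations[handle] = servers
        return handle

    def create(self, name, size):
        # Split the file into fixed-size chunks (the last may be partial).
        count = max(1, -(-size // self.chunk_size))  # ceiling division
        self.file_chunks[name] = [self._new_chunk() for _ in range(count)]

    def lookup(self, name, offset):
        # Translate a byte offset into (chunk handle, replica locations).
        handle = self.file_chunks[name][offset // self.chunk_size]
        return handle, self.chunk_locations[handle]


master = Master(["cs1", "cs2", "cs3", "cs4"], chunk_size=64)
master.create("/logs/web.log", size=200)   # needs four chunks of size 64
handle, locations = master.lookup("/logs/web.log", offset=130)
```

Here `lookup` plays the role of the metadata request a client would send to the master before reading a byte range from a chunk-server.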
The master periodically communicates with each chunk-server in HeartBeat messages to give it instructions and collect its state. GFS client code linked into each application implements the file system API and communicates with the master and chunk-servers to read or write data on behalf of the application. Clients interact with the master for metadata operations, but all data-bearing communication goes directly to the chunk-servers.

Neither the client nor the chunk-server caches file data. Client caches offer little benefit because most applications stream through huge files or have working sets too large to be cached. Not having them simplifies the client and the overall system by eliminating cache coherence issues. (Clients do cache metadata, however.) Chunk-servers need not cache file data because chunks are stored as local files, and so Linux's buffer cache already keeps frequently accessed data in memory.
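The division of labour described above (metadata from the master, data directly from a chunk-server, and metadata but not file data cached on the client) can be sketched as follows. All class names, method names, and values are hypothetical, and the read path assumes the requested range lies inside a single chunk; it is an illustration of the idea, not the GFS client library.

```python
class ChunkServer:
    """Hypothetical chunk-server holding chunk data keyed by handle."""

    def __init__(self):
        self.chunks = {}

    def read(self, handle, start, length):
        # Serve a byte range of a chunk identified by its handle.
        return self.chunks[handle][start:start + length]


class Master:
    """Hypothetical master that answers metadata queries only."""

    def __init__(self, table):
        self.table = table   # (file name, chunk index) -> (handle, servers)
        self.requests = 0

    def locate(self, name, index):
        self.requests += 1
        return self.table[(name, index)]


class Client:
    """Client library: metadata from the master, data from chunk-servers."""

    def __init__(self, master, chunk_size):
        self.master = master
        self.chunk_size = chunk_size
        self.metadata_cache = {}   # clients cache metadata, not file data

    def read(self, name, offset, length):
        # Assumes the requested range lies within a single chunk.
        index = offset // self.chunk_size
        key = (name, index)
        if key not in self.metadata_cache:
            self.metadata_cache[key] = self.master.locate(name, index)
        handle, servers = self.metadata_cache[key]
        # Data flows directly between the client and a chunk-server.
        return servers[0].read(handle, offset % self.chunk_size, length)


cs = ChunkServer()
cs.chunks[7] = b"hello, chunk"
master = Master({("/data/file", 0): (7, [cs])})
client = Client(master, chunk_size=64)

first = client.read("/data/file", 0, 5)
second = client.read("/data/file", 7, 5)
```

The second read reuses the cached chunk location, so the master is contacted only once even though two reads are served.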
VI. Sun Network File System [12]

A network file system is a file system located remotely across a network. It is a type of file mechanism that provides storage and retrieval of data and services from multiple disks or nodes. NFS was initially developed by Sun Microsystems in the 1980s and is now managed by the Internet Engineering Task Force (IETF). NFS versions 2 and 3 allow the User Datagram Protocol (UDP) running over an IP network to provide a stateless network connection between client and server, but the current version of NFS requires the Transmission Control Protocol (TCP) [13].

Fig-III Architecture of Sun Network File System (Client Side) [14]

Fig-IV Architecture of Sun Network File System (Server Side) [15]

VII. Microsoft Distributed File System [16]

Distributed File System (DFS) Namespaces and DFS Replication offer simplified, highly available access to files, load sharing, and WAN-friendly replication. In the Windows Server® 2003 R2 operating system, Microsoft revised and renamed DFS Namespaces (formerly called DFS), replaced the Distributed File System snap-in with the DFS Management snap-in, and introduced the new DFS Replication feature. In the Windows Server® 2008 operating system, Microsoft added the Windows Server 2008 mode of domain-based namespaces.

DFS Namespaces: Enables you to group shared folders that are located on different servers into one or more logically structured namespaces. Each namespace appears to users as a single shared folder with a series of subfolders.
DFS Replication: DFS Replication is an efficient, multiple-master replication engine that you can use to keep folders synchronized between servers across limited-bandwidth network connections. It replaces the File Replication Service (FRS) as the replication engine for DFS Namespaces.

Fig-V Elements of Namespace [16]

Namespace Server: A namespace server hosts a namespace. The namespace server can be a member server or a domain controller.

Folder Targets: A folder target is the UNC path of a shared folder or another namespace that is associated with a folder in a namespace. The folder target is where data and content are stored.
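The relationship between a namespace folder and its folder targets can be pictured as a lookup table. The sketch below is only a conceptual model in Python, not the Windows DFS Namespaces API; the `Namespace` class, the server, share, and site names, and the rule of preferring a target in the client's own site are all assumptions made for illustration.

```python
class Namespace:
    """Conceptual model of a DFS namespace (not the Windows API)."""

    def __init__(self, root):
        self.root = root
        self.folders = {}   # folder name -> list of (site, UNC target path)

    def add_target(self, folder, site, unc_path):
        self.folders.setdefault(folder, []).append((site, unc_path))

    def resolve(self, folder, client_site):
        # Assumed rule: prefer a target in the client's own site,
        # otherwise fall back to the first target listed.
        targets = self.folders[folder]
        for site, unc_path in targets:
            if site == client_site:
                return unc_path
        return targets[0][1]


ns = Namespace(r"\\example\public")
ns.add_target("Tools", "SiteA", r"\\server-a\Tools")
ns.add_target("Tools", "SiteB", r"\\server-b\Tools")
ns.add_target("Guides", "SiteB", r"\\server-b\Guides")

tools_for_b = ns.resolve("Tools", "SiteB")
guides_for_a = ns.resolve("Guides", "SiteA")
```

A client always asks for the same namespace folder; which server actually holds the data is decided by the lookup, which is what keeps the storage location hidden from the user.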
In the previous figure, the folder named Tools has two folder targets, one in London and one in New York, and the folder named Training Guides has a single folder target in New York. A user who browses to \\Contoso\Public\Software\Tools is transparently redirected to the shared folder \\LDN-SVR-01\Tools or \\NYC-SVR-01\Tools, depending on which site the user is currently located in.

VIII. Andrew File System [17]

The Andrew File System started as a joint effort of Carnegie Mellon University and IBM. Today it is the basis for DCE/DFS, the distributed file system included in the Open Software Foundation's Distributed Computing Environment. Its approach to caching draws on observations of UNIX file system usage.

The Andrew File System (AFS) is a location-independent file system that uses a local cache to reduce the workload and increase the performance of a distributed computing environment. A first request for data from a workstation is satisfied by the server, and the data is placed in a local cache. A second request for the same data is satisfied from the local cache.
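The caching behaviour just described can be sketched as a workstation cache backed by a server that remembers which workstations hold a copy and notifies them when the file changes. This is a simplified illustration of the callback idea associated with AFS, not the actual AFS protocol; the `Server` and `Workstation` classes, their methods, and the file path are hypothetical.

```python
class Server:
    """Hypothetical AFS-style file server with callback tracking."""

    def __init__(self):
        self.files = {}
        self.callbacks = {}   # path -> set of workstations caching it
        self.fetches = 0

    def fetch(self, path, workstation):
        self.fetches += 1
        # Promise to notify this workstation if the file changes.
        self.callbacks.setdefault(path, set()).add(workstation)
        return self.files[path]

    def store(self, path, data):
        self.files[path] = data
        # Break the callback promise for every caching workstation.
        for ws in self.callbacks.pop(path, set()):
            ws.invalidate(path)


class Workstation:
    """Workstation with a local cache of whole files."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, path):
        # First request goes to the server; later ones hit the cache.
        if path not in self.cache:
            self.cache[path] = self.server.fetch(path, self)
        return self.cache[path]

    def invalidate(self, path):
        self.cache.pop(path, None)


server = Server()
server.store("/afs/notes.txt", b"v1")
ws = Workstation(server)
ws.read("/afs/notes.txt")               # fetched from the server
ws.read("/afs/notes.txt")               # served from the local cache
server.store("/afs/notes.txt", b"v2")   # callback broken, cache dropped
latest = ws.read("/afs/notes.txt")
```

Only two of the three reads reach the server: the second is a cache hit, and the third refetches because the update invalidated the cached copy.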
AFS may be accessed from a distributed environment or a location-independent platform. A user accesses AFS from a computer running any type of OS, with Kerberos authentication and single-namespace features. Users share files and applications after logging into machines that interact within the Distributed Computing Infrastructure (DCI). [18]

Fig-VI Architecture of Andrew File System (Server Side) [19]

File System | Google File System (GFS) | Sun Network File System (SUN-NFS) | Microsoft Distributed System (MSDN) | Andrew File System (AFS)
Architecture | Clustered-based, asymmetric | Symmetric | Symmetric
Processes | Stateful | Stateful | Stateful | Stateful
Communication | RPC/TCP | RPC/TCP; UDP in versions 2 and 3 | TCP | RPC/TCP
Scalability | Highly scalable | Highly scalable | – | Highly scalable
Synchronization | Write-once-read-many | Read-ahead, delayed-write | – | Callback promise
Fault Tolerance | Failure as standard | Failure as standard | Failure as standard | Failure as standard

Table-I Comparison of GFS, SUN-NFS, MSDN and AFS File Systems

Conclusion

In this paper, the Google File System, the Sun Network File System, the Microsoft Distributed File System, and the Andrew File System have been reviewed on the basis of the special features of distributed file systems, and they have also been compared on the basis of their implementation.
Acknowledgment

Many people were associated with the completion of this paper, and I would like to thank each one of them.

References

[1] http://searchstorage.techtarget.com/definition/file-system
[2] Aditya B. Patel, Manashvi Birla, Ushma Nair, "Addressing Big Data Problem Using Hadoop and Map Reduce", NIRMA University International Conference on Engineering (NUiCONE), 06-08 December 2012.
[3] Sandberg R., Goldberg D., Kleiman S., Walsh D., Lyon B., "Design and Implementation of the Sun Network Filesystem".
[4] Ghemawat S., Gobioff H., Leung S., "The Google File System".
[5] Dean J., Ghemawat S., "MapReduce: Simplified Data Processing on Large Clusters", OSDI 2004.
[6] Keerthivasan M., "Review of Distributed File Systems: Concepts and Case Studies", ECE 677 Distributed Computing Systems.
[7] Aditya B. Patel, Manashvi Birla, Ushma Nair, "Addressing Big Data Problem Using Hadoop and Map Reduce", NIRMA University International Conference on Engineering,
[8] Youwei Wang, Jiang Zhou, Can Ma, Weiping Wang, Dan Meng, Jason Kei, "Clover: A distributed file system of expandable metadata service derived from HDFS", IEEE International Conference on Cluster Computing, DOI 10.1109/CLUSTER.54, pp