
Spectrum sharing is one of the most challenging issues in Cognitive Radio Systems (CRS). In November 2008, the Federal Communications Commission (FCC) adopted a new rule that allows applications and devices to use white space, the unused portion of the RF spectrum~\cite{radio_2011}. Devices that use this white-space spectrum must avoid collisions by means of technologies such as spectrum sensing and geolocation capabilities. The concept of CR was introduced by Joseph Mitola at the Defense Advanced Research Projects Agency (DARPA) in the United States~\cite{cognitive_2017}. To address the spectrum sharing problem, DARPA hosts a competition, the DARPA Spectrum Collaboration Challenge (SC2)~\cite{darpa_darpa}.

The competition focuses on collaboration between competitors. Current spectrum management assigns exclusive bands to different users according to manually designed policies, which leaves much of the licensed spectrum underutilized. Rather than dividing the radio spectrum into small, exclusively assigned blocks, SC2 takes the position that the solution is collaboration: it seeks to resolve the tension between the scarcity of radio spectrum and the growing demand for it by encouraging radio users to communicate actively with one another and to adjust their operating frequencies dynamically based on the shared information.

Because a CRS is driven by decision making, it must decide on spectrum availability, on a strategy for selecting a channel to sense or access, and on how to optimize radio performance. The cognitive process can be implemented in a centralized or a distributed fashion. This aspect is critical because decisions may be influenced by collaboration among the cognitive radios themselves and with other devices, so the choice of decision algorithm becomes a central issue.

Several studies have tailored neural networks, genetic algorithms, ant-colony optimization, and other techniques to meet CRS requirements~\cite{oshea_deep_2016,jiang2011efficient,ding_spectrum_2017,li_multi-agent_2009}. Here, the problem is defined as an agent selecting the best action while another agent, referred to as DARPA~\cite{darpa_darpa}, also affects the environment. The best action is one that avoids colliding with DARPA in every state; thus, an agent can choose the best action if it takes as input both the current system state and DARPA's next step. Reinforcement learning (RL) is a powerful tool for selecting a good action among all possible actions. The rewards and punishments used in RL algorithms help the agent learn actively from its dynamic environment: on the one hand, whenever the agent takes a good action, it is encouraged by a positive reward; on the other hand, when the chosen action results in a collision with DARPA, it is discouraged by a punishment.
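As a minimal sketch of the reward and punishment scheme just described, the Python fragment below models a toy version of the task. Everything in it is a hypothetical simplification rather than the SC2 setup: the number of channels, the $\pm 1$ reward values, the DARPA trajectory, the helper names (\texttt{reward}, \texttt{best\_action}, \texttt{total\_score}), and the reduction of the system state to the agent's current channel are all assumptions. It illustrates why knowing DARPA's next step is enough to avoid every collision.

```python
# Hypothetical toy version of the collision-avoidance task; the channel
# count, reward values and DARPA trajectory are illustrative assumptions.
N_CHANNELS = 4
REWARD = 1.0       # positive reward for a collision-free step
PUNISHMENT = -1.0  # punishment for colliding with DARPA


def reward(agent_channel: int, darpa_channel: int) -> float:
    """Reward for one time step: punish a collision, reward anything else."""
    return PUNISHMENT if agent_channel == darpa_channel else REWARD


def best_action(current_channel: int, darpa_next: int) -> int:
    """Choose the next channel from the agent's current channel (its
    simplified state) and DARPA's next channel. Stay put if that is safe,
    otherwise move to the lowest-numbered channel DARPA will not occupy."""
    if current_channel != darpa_next:
        return current_channel
    return next(c for c in range(N_CHANNELS) if c != darpa_next)


def total_score(darpa_channels: list[int], informed: bool, start: int = 0) -> float:
    """Sum of rewards over a DARPA trajectory. An informed agent sees DARPA's
    next channel before acting; an uninformed one never leaves `start`."""
    channel, score = start, 0.0
    for darpa_next in darpa_channels:
        if informed:
            channel = best_action(channel, darpa_next)
        score += reward(channel, darpa_next)
    return score


trajectory = [0, 1, 0, 2]
print(total_score(trajectory, informed=False))  # → 0.0
print(total_score(trajectory, informed=True))   # → 4.0
```

The informed agent never collides, while the fixed-channel agent is punished each time DARPA lands on its channel. In the RL setting, the agent has to learn comparable behaviour from the rewards themselves rather than being handed a rule like this one.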

The agent aims to maximize its total score by collecting more reward and less punishment. Many methods have been proposed for learning in RL: model-free methods include SARSA, Q-learning, and Expected SARSA, while other approaches are model-based~\cite{sutton_reinforcement_2017}. In this study, we discuss the results of model-free RL algorithms along with a model-based Markov decision process (MDP) approach. This paper is organized as follows: after this introduction, a brief background is presented in Section

In Section~\ref{appr}, the studied approaches are described. Section~\ref{eval} presents an analysis of the results, and Section~\ref{relat} briefly describes related work by others.

The conclusion of the paper is drawn in Section
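The model-free methods named earlier differ mainly in the value they bootstrap from: SARSA uses the next action actually taken, Q-learning the greedy next action, and Expected SARSA the expected value under the current policy~\cite{sutton_reinforcement_2017}. The following minimal tabular sketch in Python illustrates these three updates together with value iteration, one standard way to solve an MDP whose transition probabilities and rewards are known. It is an illustration under assumptions, not the method evaluated in this paper: the NumPy Q-table layout, the $\epsilon$-greedy policy used by Expected SARSA, all function names, and the choice of value iteration as the MDP solver are illustrative choices.

```python
import numpy as np

# Minimal tabular sketch (illustrative, not the paper's implementation).
# Q is a float array of shape (n_states, n_actions); alpha = step size,
# gamma = discount factor (assumed < 1), epsilon = exploration rate.


def epsilon_greedy_probs(q_row: np.ndarray, epsilon: float) -> np.ndarray:
    """Action probabilities of an epsilon-greedy policy in one state."""
    n_actions = len(q_row)
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(q_row)] += 1.0 - epsilon
    return probs


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    # On-policy: bootstrap from the action the agent actually takes next.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    # Off-policy: bootstrap from the best-valued next action.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])


def expected_sarsa_update(Q, s, a, r, s_next, alpha, gamma, epsilon):
    # Bootstrap from the expected next value under the epsilon-greedy policy.
    probs = epsilon_greedy_probs(Q[s_next], epsilon)
    target = r + gamma * np.dot(probs, Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])


def value_iteration(P, R, gamma, tol=1e-8):
    """Model-based: solve a known MDP.

    P has shape (S, A, S) with P[s, a, s'] the transition probability;
    R has shape (S, A) with the expected immediate reward.
    Returns the optimal state values and a greedy policy.
    """
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)  # (S, A, S) @ (S,) -> (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```

Terminal transitions are not handled in this sketch; for them the target reduces to the immediate reward $r$. With $\epsilon = 0$, the Expected SARSA target coincides with the Q-learning target, which makes the relationship between the two methods easy to check.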