Many methods now exist for learning and adaptation in intelligent software agents, and reinforcement learning is one of them. It has proved successful across different tasks, types of environment (with or without the Markov property), and both discrete and continuous variables. Because the algorithms are built on random selection mechanisms, reinforcement learning methods suffer from the “curse of dimensionality”. This paper proposes an approach that considerably reduces the search space without degrading the quality of the resulting Q-table. The developed algorithm is applied to SARSA(λ) (temporal-difference learning with eligibility traces), a simple but popular learning method. The test task is an equally popular one: controlling an agent in a grid (cellular) world that has the Markov property. The essence of the approach is that the agent, as with eligibility traces, uses additional labels (marks) that act as a reward. The approach stays within the set of actions already available to the agent.
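For orientation, the baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of standard tabular SARSA(λ) with accumulating eligibility traces on a small grid world; it does not implement the label (mark) mechanism proposed in the paper. The function name `sarsa_lambda`, the grid size, the reward values, and all hyperparameters are assumptions chosen only for illustration.

```python
import random


def sarsa_lambda(width=4, height=4, episodes=200, alpha=0.1, gamma=0.95,
                 lam=0.9, epsilon=0.1, max_steps=200, seed=0):
    """Tabular SARSA(lambda) on a hypothetical grid world (illustrative only)."""
    rng = random.Random(seed)
    actions = [(0, -1), (0, 1), (-1, 0), (1, 0)]  # moves as (dx, dy)
    goal = (width - 1, height - 1)                # assumed goal: far corner
    # Q-table keyed by (state, action index), initialised to zero.
    q = {((x, y), a): 0.0
         for x in range(width) for y in range(height) for a in range(4)}

    def choose(state):
        # Epsilon-greedy selection with random tie-breaking.
        if rng.random() < epsilon:
            return rng.randrange(4)
        values = [q[(state, a)] for a in range(4)]
        best = max(values)
        return rng.choice([a for a, v in enumerate(values) if v == best])

    def step(state, action):
        # Deterministic move; walls keep the agent inside the grid.
        dx, dy = actions[action]
        nxt = (min(max(state[0] + dx, 0), width - 1),
               min(max(state[1] + dy, 0), height - 1))
        reward = 1.0 if nxt == goal else -0.01    # assumed reward scheme
        return nxt, reward, nxt == goal

    for _ in range(episodes):
        traces = {}                               # eligibility traces, reset per episode
        state = (0, 0)
        action = choose(state)
        for _ in range(max_steps):
            nxt, reward, done = step(state, action)
            next_action = choose(nxt)
            # On-policy TD error: bootstrap from the action actually chosen next.
            target = reward if done else reward + gamma * q[(nxt, next_action)]
            delta = target - q[(state, action)]
            # Accumulating trace for the visited state-action pair.
            traces[(state, action)] = traces.get((state, action), 0.0) + 1.0
            for key in list(traces):
                q[key] += alpha * delta * traces[key]
                traces[key] *= gamma * lam        # decay every trace
            if done:
                break
            state, action = nxt, next_action
    return q
```

Calling `sarsa_lambda()` returns the learned Q-table as a dictionary. Even on this 4×4 grid the table holds 64 entries, and the count grows with every added state variable, which is the search-space growth the abstract refers to.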