Under the hood
How does IoTIFY simulate your job when you submit a run? What happens behind the scenes? Learn more about our orchestration strategy.
One of the key requirements of a simulator is seamless scalability without degrading simulation performance. IoTIFY is designed to be truly horizontally scalable in this regard, i.e., more clients can be added to a simulation without affecting the baseline performance of the other clients. The IoTIFY architecture uses Docker containers extensively, which enables seamless global scalability. As long as the IoTIFY agents have connectivity to your IoT backend, they can run and simulate the job. However, distributing the test, orchestrating it and collecting its results rely on a few internal strategies that are worth a closer look.
Let's follow your simulation from the moment it is submitted, either from the UI or through the API.
When a simulation job is submitted, it is sent to a job queue. Based on the current availability of the nodes and the job settings, the total number of clients required for the simulation is divided into smaller chunks of, say, 100 clients each. The simulation of each chunk is then submitted as an individual task. For example, if you simulate 1000 clients, the first 100 clients could be simulated on one VM while the next 100 run on another machine. Once all tasks have been distributed, the job is marked as running.
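As an illustration of the chunking step, here is a minimal Python sketch. It assumes a fixed chunk size of 100 and a simple round-robin assignment of chunks to nodes; the `Task` type, the `split_job` function and the node names are hypothetical, and IoTIFY's actual scheduler also weighs node availability and job settings, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Task:
    job_id: str
    first_client: int  # index of the first client in this chunk
    count: int         # number of clients in this chunk
    node: str          # node agent the task is assigned to (hypothetical)


def split_job(job_id: str, total_clients: int, nodes: list[str], chunk_size: int = 100) -> list[Task]:
    """Divide a job's clients into fixed-size chunks and assign each chunk to a node round-robin."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not nodes:
        raise ValueError("at least one node is required")
    tasks = []
    for i, start in enumerate(range(0, total_clients, chunk_size)):
        # The last chunk may be smaller than chunk_size.
        count = min(chunk_size, total_clients - start)
        tasks.append(Task(job_id, start, count, nodes[i % len(nodes)]))
    return tasks
```

With 1000 clients and two nodes, this yields 10 tasks alternating between the nodes; with 250 clients, the last chunk simply holds the remaining 50.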
When a task is scheduled on a node agent, the agent calls the Device Model Init stage before establishing a connection to the server. This is particularly useful if you want to set up credentials for the connection, or even to dynamically control which client connects to which server.
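The kind of per-client logic an Init stage enables can be sketched as follows. This is a hypothetical Python illustration, not IoTIFY's template API: the `init` function, the `SERVERS` list and the HMAC-based credential scheme are assumptions chosen only to show credentials and server selection being derived from a client's index.

```python
import hashlib
import hmac

# Hypothetical server pool, for illustration only.
SERVERS = ["mqtt-eu.example.com", "mqtt-us.example.com", "mqtt-ap.example.com"]


def init(client_index: int, secret: str) -> dict[str, str]:
    """Hypothetical Init hook: derive per-client credentials and choose a server."""
    client_id = f"device-{client_index:05d}"
    # Keyed hash of the client ID, so each client gets a distinct, reproducible password.
    password = hmac.new(secret.encode(), client_id.encode(), hashlib.sha256).hexdigest()[:16]
    # Spread clients evenly across the server pool by index.
    server = SERVERS[client_index % len(SERVERS)]
    return {"client_id": client_id, "password": password, "server": server}
```

Because the choice depends only on the index, client 4 lands on the same server as client 1, and rerunning the simulation reproduces the same assignment.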
When a task starts, all clients within the task must complete the Init stage before the message function can run for the first client. This is where the timeout limit specified in the template comes into play: if a client cannot connect to the server within the specified time, its result is marked as failed and it will not send any messages to the cloud platform. In summary, every client must either connect successfully or definitively fail to connect before the first iteration is executed and the first message is sent to the server. A client that fails to connect simply sits idle while the other clients continue to run normally.
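The two-phase behaviour described above (every client connects or fails, then only connected clients send) can be modelled with Python's `asyncio`. This is a minimal sketch under stated assumptions: `run_task` and the `connect_fn` callback are hypothetical, and a failed connection is represented by a timeout or an `OSError` such as a refused connection.

```python
import asyncio
from typing import Awaitable, Callable


async def run_task(
    client_ids: list[int],
    connect_fn: Callable[[int], Awaitable[None]],  # hypothetical connection routine
    timeout: float,
    iterations: int,
) -> tuple[list[int], list[int], list[tuple[int, int]]]:
    async def try_connect(cid: int) -> tuple[int, bool]:
        try:
            await asyncio.wait_for(connect_fn(cid), timeout)
            return cid, True
        except (asyncio.TimeoutError, OSError):
            # Timed out or refused: this client is marked as failed.
            return cid, False

    # Phase 1: wait until every client has either connected or failed.
    results = await asyncio.gather(*(try_connect(c) for c in client_ids))
    connected = [cid for cid, ok in results if ok]
    failed = [cid for cid, ok in results if not ok]

    # Phase 2: only connected clients run their message loops; failed ones stay idle.
    sent: list[tuple[int, int]] = []

    async def send_loop(cid: int) -> None:
        for i in range(iterations):
            sent.append((i, cid))   # stand-in for sending message i
            await asyncio.sleep(0)  # yield so clients interleave

    await asyncio.gather(*(send_loop(c) for c in connected))
    return connected, failed, sent
```

The key design point is the first `gather`: no message loop starts until every connection attempt has resolved, which matches the rule that the first iteration waits for all clients to connect or fail.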
The connection limit (connections per second) is an advanced setting that throttles connection initiation across all clients in all jobs in order to enforce a global limit. If you apply a global connection limit, increase the connection timeout as well, since throttled clients may otherwise time out while waiting for their turn to connect.
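A rough way to size the timeout is a back-of-the-envelope calculation: under a global limit of R connections per second, the k-th queued client (counting from 0) cannot start connecting before k/R seconds, so the last of N clients starts at about (N - 1)/R seconds. The Python sketch below assumes ideal pacing, that the timeout is measured from when a client is queued, and that N counts clients across all jobs sharing the limit; the function names and the `connect_latency` parameter are hypothetical.

```python
def connection_start_times(total_clients: int, limit_per_second: float) -> list[float]:
    """Earliest start time (in seconds) of each connection under ideal global pacing."""
    if limit_per_second <= 0:
        raise ValueError("limit_per_second must be positive")
    return [k / limit_per_second for k in range(total_clients)]


def min_connect_timeout(total_clients: int, limit_per_second: float, connect_latency: float) -> float:
    """Lower bound for the connection timeout: wait for the last slot, then connect."""
    starts = connection_start_times(total_clients, limit_per_second)
    last_start = starts[-1] if starts else 0.0
    return last_start + connect_latency
```

For example, 1000 clients at 100 connections per second with about 2 seconds of connection latency gives a bound of roughly 12 seconds (9.99 + 2), so a timeout of only a few seconds would cause most of these clients to fail.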
Once the setup function has run and the connection is established, each client sends messages to the server independently. Clients within the same task send their messages almost simultaneously; however, tasks running in different containers may not be fully synchronized. As a result, not all clients in your simulation start at exactly the same moment (which is actually a good thing), although the interval between each client's messages remains fixed. This also mimics the real-world behaviour of devices, which do not all send data at exactly the same time.
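The timing described above can be pictured with a small Python sketch: each task starts at its own offset, because containers are not perfectly synchronized, but every message after that follows at the same fixed interval. The offsets and the `send_times` function are illustrative assumptions, not values or code taken from IoTIFY.

```python
def send_times(task_offsets: dict[str, float], interval: float, iterations: int) -> dict[str, list[float]]:
    """Nominal send time of each message for every task: start offset plus k fixed intervals."""
    return {
        task: [offset + k * interval for k in range(iterations)]
        for task, offset in task_offsets.items()
    }


# Two tasks that started 0.25 s apart on different containers, sending every 5 s.
schedule = send_times({"task-a": 0.0, "task-b": 0.25}, interval=5.0, iterations=3)
for task, times in schedule.items():
    print(task, times)
# task-a [0.0, 5.0, 10.0]
# task-b [0.25, 5.25, 10.25]
```

The start times differ by 0.25 s, but the gap between consecutive messages is 5 s for both tasks.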