Development of a robust real-time synchronized data transmission technique from a Magnetic Observatory to an INTERMAGNET GIN

Since internet availability at CPL is very limited, owing to its location far from any city, we first approached BSNL (Bharat Sanchar Nigam Limited), maintained by the Govt. of India, for a reliable permanent fiber optic setup. However, it was too expensive to set up and maintain, so we used the facilities of a local service provider, with a maximum bandwidth of 20 Mbps, to initiate the data transfer technique.

Initial configuration

The online data transfer from CPL to the HYB Observatory was started using cross-platform data transmission, as the ISP (Internet Service Provider) resources were unavailable. Since the service provider could not resolve a few issues at the TCP/IP network level concerning data transmission between two distant Linux machines, we proceeded with a cross-platform data transmission process, as the final data had to be processed with Windows-based Matlab codes.

Initially, we set up shell scripts, cron jobs and the rsync protocol to transfer the data from the Magrec-4B data logger to an intermediate Linux (CentOS) machine deployed at CPL. The data was transferred from the Magrec-4B to the Linux machine (backup storage) in the CPL control room with a latency of 5 min, and then to a Windows machine (client) at the HYB Observatory using codes and scripts developed by us together with third-party tools (Fig. 2). Since the bandwidth was low, we decided to transfer the data from the Linux machine to the Windows PC at HYB-NGRI with a time lapse of 1 min.
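
The shell scripts and cron entries of this initial stage are not reproduced in the text; the following is a minimal Python sketch of the Magrec-4B to CentOS leg, assuming placeholder host names and paths, with the 5 min latency provided by the cron schedule noted in the comments.

```python
#!/usr/bin/env python3
"""Illustrative sketch only: the actual stage is handled by shell scripts,
cron jobs and rsync. Invoking a script like this from a cron entry every
5 min reproduces the latency described above. Paths and host names below
are placeholders, not the real configuration."""
import subprocess

SRC = "/magrec/data/"                 # daily files written by the data logger (assumed path)
DST = "backup@cpl-linux:/data/cpl/"   # intermediate CentOS backup machine at CPL (assumed)

def push_latest() -> None:
    # -a preserves attributes, -z compresses, --partial resumes interrupted files
    subprocess.run(["rsync", "-az", "--partial", SRC, DST], check=True)

if __name__ == "__main__":
    push_latest()
```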

Figure 2

Cross-platform data transfer system from the Linux PC (deployed at CPL) to the Windows PC (deployed at HYB), and the percentage of successful data transmission between the systems.

We have installed a batch file with the batch option set to “Abort” and the confirm option set to “Off” to ascertain the health of the connection at the client end (Windows PC), repeated with a default time limit of 120 s. Each session commences by checking the host ID/username and the authenticated pre-entered password with an RSA (Rivest, Shamir, and Adleman) key over SFTP (Secure File Transfer Protocol). The terms ‘Comparing’ and ‘Synchronizing’ in the figure show the details of the data transmission from the host to the client machine at regular intervals of 120 s.
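
The batch file and the third-party tool themselves are not listed in the paper. As an illustration only, the sketch below uses the Python paramiko library (not the tool used in the actual setup) to mimic the same 120 s compare-and-synchronize cycle over SFTP with RSA-key authentication; the host, user name and directories are placeholders.

```python
import os
import time
import paramiko  # third-party SSH/SFTP library, used here purely for illustration

HOST, USER = "cpl-linux", "magdata"                          # placeholder host and account
KEY = os.path.expanduser("~/.ssh/id_rsa")                    # pre-entered RSA key
REMOTE_DIR, LOCAL_DIR = "/data/cpl/today", r"D:\CPL\today"   # assumed directories

def sync_once() -> None:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(HOST, username=USER, key_filename=KEY)
    sftp = client.open_sftp()
    for name in sftp.listdir(REMOTE_DIR):
        remote_path = f"{REMOTE_DIR}/{name}"
        local_path = os.path.join(LOCAL_DIR, name)
        remote_size = sftp.stat(remote_path).st_size
        # 'Comparing': copy only files that are new or have grown since the last cycle
        if not os.path.exists(local_path) or os.path.getsize(local_path) != remote_size:
            sftp.get(remote_path, local_path)                # 'Synchronizing'
    sftp.close()
    client.close()

if __name__ == "__main__":
    while True:
        sync_once()
        time.sleep(120)   # repeat every 120 s, as in the batch-file setup
```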

From the Magrec-4B, we selected nine data parameters, as shown in Fig. 2, to transmit real-time data to the client machine. The file size of each data parameter and the rate at which the data is transmitted from the host to the client machine are also shown. The percentages in column 5 of Fig. 2 show the data transmission and updating process on the client machine. A 100% data transfer is achieved only when the data is copied with the latest records of the 120 s cycle, after which the client machine rechecks the data by synchronizing the earlier records of the current day. An example of the continuous data transmission and updating process with the latest records is shown in row 9 of Fig. 2: once the data is synchronized with the latest records (for example, the filename in row 9 of Fig. 2), the 23% file transmission becomes 100% upon completion of this task, and the file is further synchronized with the earlier saved data. The file sizes of the above nine parameters keep incrementing every 120 s as the data is updated on the host machine. The whole process is repeated in 120 s cycles until the day is complete.

Since a large amount of data from both Observatories has to be transferred and dedicated storage is needed to save the data on a daily basis, we set up a server at the HYB Observatory. In addition, the internet services at CPL were recently upgraded to a bandwidth of 50 Mbps (the maximum available to date), which allowed us to configure the automated, robust data transmission technique to the GIN; the details are discussed below.

Final configuration

Since our main aim was to achieve automated 1 min data transmission from the HYB and CPL Observatories to the GIN, we had to make additional R&D efforts to develop a robust setup in terms of both hardware (i.e., a high-end workstation and a firewall router) and software. Python code, shell scripts, cron jobs and the rsync protocol were therefore developed to handle the data transmission without data loss. Even when the internet services are disconnected, once they are restored the Python code rechecks the data from the last successfully transmitted file.
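
The Python code itself is not reproduced in the paper; the sketch below shows one way the resume-after-outage behaviour described above could be implemented, assuming a small state file that records the last successfully transmitted file. All paths, host names and file-naming conventions are placeholders.

```python
import pathlib
import subprocess

DATA_DIR = pathlib.Path("/data/cpl/outgoing")            # staging directory (assumed)
STATE = pathlib.Path("/var/lib/magsync/last_sent.txt")   # last successful file (assumed)
DEST = "imo-cpl@hyb-server:/data/IMO-CPL/"               # central server at HYB (assumed)

def pending_files():
    """Files newer than the last successfully transmitted one."""
    last = STATE.read_text().strip() if STATE.exists() else ""
    # assumes file names sort in time order, e.g. cpl20240101.min (hypothetical pattern)
    return sorted(p for p in DATA_DIR.iterdir() if p.name > last)

def transmit() -> None:
    for path in pending_files():
        result = subprocess.run(["rsync", "-az", str(path), DEST])
        if result.returncode != 0:
            break                       # link is down: stop and retry on the next cycle
        STATE.write_text(path.name)     # remember the last successful file

if __name__ == "__main__":
    transmit()
```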

The data transfer from CPL and HYB to the central server located at the HYB Observatory follows the RSA and SSH key mechanism, which is in itself a highly secure approach. We have designed the system to transfer the data in a secure, encrypted manner with SSH keys and to save the same dataset on the local server at CSIR-NGRI. We used the RSA-SSH algorithm (Rivest–Shamir–Adleman), a public-key cryptosystem that is widely used for secure data transmission. The key generated by ssh-keygen on the source machine (MAGREC-DAS) creates two files, “id_rsa” and “id_rsa.pub”, in the .ssh directory, and the public key is shared/copied to the destination machine (CentOS), so there is a proper handshake between the source and destination machines for data transfer. This setup remains valid as long as the network configuration remains the same, for which reason we have assigned a static IP. Along with the SSH keys, a code was written to transfer the data using the rsync tool, and it was incorporated in the crontab so that it repeats every 10 s. The same technique was also used at the HYB Observatory, from the CentOS machine to the server, for secure and successful data transmission.
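
For illustration, a minimal sketch of this key-based rsync push is given below; the one-time key handshake is summarised in the module comment and the 10 s cadence is shown as a simple loop (in the actual setup it is driven from the crontab). The IP address, user names and paths are placeholders.

```python
"""Sketch of the key-based rsync push from the DAS to the CentOS machine.

One-time handshake on the source machine, as described above:
    ssh-keygen -t rsa        # creates ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub
    # append id_rsa.pub to ~/.ssh/authorized_keys on the destination (static IP)

All host details below are placeholders, not the observatory configuration."""
import subprocess
import time

SRC = "/magrec/data/"
DST = "magdata@192.0.2.10:/data/cpl/"     # static IP of the CentOS machine (placeholder)
SSH = "ssh -i /home/magrec/.ssh/id_rsa"   # authenticate with the generated RSA key

if __name__ == "__main__":
    while True:
        subprocess.run(["rsync", "-az", "-e", SSH, SRC, DST])
        time.sleep(10)   # repeat roughly every 10 s, as in the crontab-driven setup
```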

After the successful R&D efforts on data transmission from both Observatories to a dedicated high-end Linux server with a 24 TB RAID-5 configuration at the HYB Observatory, we created individual user accounts on the server, i.e., IMO-CPL and IMO-HYB, to store the data received from the respective Observatories. The developed Python code transfers several data types from the DAS and stores them in the respective user accounts daily (Fig. 3). Scripts developed for each Linux PC filter the data according to the requirements of the target directory (i.e., the GIN). The sorted data from each directory is transmitted to the INTERMAGNET GIN with a latency of 300 s.
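
The filtering scripts are not listed in the paper; the sketch below only illustrates the kind of routing described above, assuming (hypothetically) that incoming file names start with the IAGA code of the observatory and that the per-observatory accounts live under /home.

```python
import shutil
from pathlib import Path

INCOMING = Path("/data/incoming")                  # files received from both DAS (assumed)
ACCOUNTS = {"cpl": Path("/home/IMO-CPL/gin"),      # per-observatory user accounts (assumed)
            "hyb": Path("/home/IMO-HYB/gin")}

def route_files() -> None:
    """Move each received file into the account of the observatory it belongs to."""
    for path in INCOMING.iterdir():
        code = path.name[:3].lower()               # e.g. 'cpl20240101.min' (assumed pattern)
        if code in ACCOUNTS:
            ACCOUNTS[code].mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(ACCOUNTS[code] / path.name))

if __name__ == "__main__":
    route_files()   # the sorted directories are then pushed to the GIN every 300 s
```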

Figure 3

Automated 1 min data transmission from (a) CPL and (b) HYB Observatories to Edinburgh GIN using the Python code.

After successful data transmission from both Observatories to the GIN, we came across a few minor issues; how we resolved them is discussed in detail below:

Issue-1 Initially, the Python code was executed using the ‘rsync synchronization protocol’ with a minimum latency of 60 s to transfer the real-time data from both Observatories. As reported by the GIN experts, with this latency the same data was being sent repeatedly to the receiving web service (http://app.geomag.bgs.ac.uk/GINFileUpload/UploadForm.html) (Fig. 4a), due to which the storage/cache memory at the GIN was receiving huge volumes of data from both Observatories. This caused problems for their entire web service: the log files filled up very quickly, and the cache of data in the web service became difficult to use because it occupied a large amount of disk space (Fig. 4b).

Figure 4

(a) Details of the data cache memory for both Observatories on the INTERMAGNET BGS website; (b) error message showing “no space left” due to the huge volume of duplicate data on the BGS server.

Solution To resolve the above issue, we created background daemons instead of the ‘rsync synchronization protocol’, so that the 60 s rechecking of the data was replaced with a 300 s cycle. The daemons at the back end execute the Python code every 300 s, giving smooth transmission of the real-time data without any duplication (as shown in Fig. 3).
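
The daemon is not reproduced in the paper; the following is a minimal sketch of the 300 s cycle, keeping a record of files already sent so that the same data is not uploaded twice. The upload step is left as a placeholder because the GIN upload call is not detailed in the text, and the real code may track byte offsets rather than whole files since the daily files keep growing.

```python
import time
from pathlib import Path

OUTGOING = Path("/home/IMO-CPL/gin")        # directory prepared for the GIN (assumed)
SENT_LOG = Path("/home/IMO-CPL/sent.log")   # names of files already transmitted (assumed)

def upload_to_gin(path: Path) -> bool:
    """Placeholder for the upload step used by the actual Python code."""
    return True

def run() -> None:
    while True:
        sent = set(SENT_LOG.read_text().split()) if SENT_LOG.exists() else set()
        for path in sorted(OUTGOING.iterdir()):
            # avoid re-sending data the GIN has already received
            if path.name not in sent and upload_to_gin(path):
                with SENT_LOG.open("a") as log:
                    log.write(path.name + "\n")
        time.sleep(300)   # one cycle every 300 s, as in the final configuration

if __name__ == "__main__":
    run()
```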

Issue-2 After successful transmission of data from both Observatories, on a few occasions the data plots on the INTERMAGNET website were not updated even though our hardware and software were intact. We cross-checked the logs at our end and found that the data had been successfully uploaded to the GIN. Even though the data uploads were successful, it was unknown why the data was not plotted on the INTERMAGNET website.

Solution The above issue was resolved after the BGS experts suggested a link (http://app.geomag.bgs.ac.uk/GINFileUpload/UploadForm.html) through which a single-day file can be uploaded to check whether the upload succeeds. As suggested by BGS, if the upload of data is not successful and returns errors (Fig. 4), the issue lies with the INTERMAGNET server. This verification allowed us to ascertain that the code we are executing functions correctly (Fig. 5).

Figure 5

Cross-checking of (a) CPL and (b) HYB data logs from the HYB Observatory server to the GIN server.
