Windows
Researching cyber threats to Windows operating systems and designing new Artificial Intelligence (AI)-based cybersecurity solutions, such as intrusion detection, threat intelligence and hunting, privacy preservation, and digital forensics, are of great interest for enhancing system security. The variety of cyber threats has prompted the development of numerous defensive mechanisms, each with specialized techniques addressing particular device subsystems. Intrusion Detection Systems (IDSs) are one specific class of defense systems that has drawn a great deal of research interest over the last decade.
IDSs can be divided into two types by deployment: Host IDSs (HIDSs) are deployed on a host computer and monitor it, while Network IDSs (NIDSs) are placed at key locations in a network and monitor incoming and outgoing traffic. Based on their internal mechanism, IDSs are classified as either anomaly-based, alerting on atypical behavior, or signature-based, identifying known misuse patterns. To test IDSs and AI-based security tools, high-quality data must be used that accurately reflects current behavior in both normal and attack scenarios.
This paper addresses the above problems by proposing new Windows datasets that include new features derived from the audit traces of memory, processor, process, and hard disk in a novel IoT network architecture. The testbed was implemented in three layers: edge, fog, and cloud. The edge layer consists of network and IoT devices, the fog layer consists of gateways and virtual machines, and the cloud layer consists of cloud services, such as data visualization and analytics, that are connected to the other layers.
These layers were dynamically managed using Software-Defined Networking (SDN) and Network Function Virtualization (NFV) technologies with the VMware NSX and vCloud NFV platforms. During the testbed configuration and deployment, attack and normal events were run to capture labeled data samples according to a ground truth table, which can be used to test the performance of new cybersecurity applications.
Intrusion Detection Systems
IDSs are widely grouped into two categories, host-based and network-based. Network-based IDSs are deployed at key locations in a network, e.g., gateways, to observe traffic going to and coming from the internal network for abnormal patterns. Host-based IDSs are agents that monitor host PCs, using information such as system calls and logs, to identify unauthorized activity. An IDS can also be classified by its detection method into three categories: signature-based detection, anomaly-based detection, and a hybrid of the two.
Signature-based detection uses pre-defined patterns, or signatures, to detect attacks. The idea is that a pattern database is maintained and updated periodically so that attack events can be matched and identified. A signature-based HIDS tracks the state of a host by inspecting logs, memory dumps, operating system data, and network traffic sent or received by the host.
Signature-based IDSs can offer a high detection rate against known attacks and produce results quickly. They are, however, defeated by even slight variations in well-known attack patterns, which is a common evasion tactic used by attackers. Moreover, since they rely on known patterns, signature-based methods cannot recognize unknown attack patterns, the so-called zero-day exploits.
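The matching idea and its weakness against slight variations can be illustrated with a minimal sketch. The signature names, the patterns, and the `match_signatures` helper are hypothetical and chosen only for illustration; real signature-based IDSs use far richer rule languages.

```python
# Hypothetical signature database: signature name -> lowercase substring pattern.
# These entries are illustrative assumptions, not rules from any real IDS.
SIGNATURES = {
    "sql_injection": "' or 1=1",
    "path_traversal": "../../",
}


def match_signatures(event: str, signatures: dict[str, str] = SIGNATURES) -> list[str]:
    """Return the names of all signatures whose pattern occurs in the event."""
    lowered = event.lower()
    return [name for name, pattern in signatures.items() if pattern in lowered]


# A request that contains the known pattern is flagged.
known = match_signatures("GET /login?user=admin' OR 1=1 --")

# A slight variation of the same attack no longer matches any signature.
variant = match_signatures("GET /login?user=admin' OR 2=2 --")

print(known)    # ['sql_injection']
print(variant)  # []
```

The second call shows why a signature database must be updated continually: the variant is the same attack in spirit, yet no stored pattern covers it.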
Edge layer
The edge layer comprises the physical devices and operating systems used as the infrastructure for deploying the virtualization technology and cloud services at the fog and cloud layers, respectively. It includes several IoT/IIoT devices, e.g., Modbus and light bulb sensors, smartphones, and smart TVs, as well as host systems, e.g., workstations and servers, used to connect the IoT/IIoT devices, hypervisors, and physical gateways (i.e., routers and switches) to the Internet.
The NSX-VMware hypervisor technology was deployed on a host server at the edge layer to manage the Virtual Machines (VMs) created at the fog layer. The fog layer comprises the virtualization technology that manages the VMs and their services through the VMware and vCloud platforms, providing the infrastructure for implementing SDN and NFV in the proposed testbed. The NSX vCloud NFV platform supports a dynamic IoT/IIoT testbed network for the TON IoT datasets by creating and managing several VMs for attack and normal operations and by enabling communication among the edge, fog, and cloud layers.
Data Logger Systems
Traces of the Windows 7 and Windows 10 operating systems were logged in the testbed. The Data Collector Set tool groups data collection points, such as performance counters and event trace data, into a single collection. Data Collector Sets allow data gathering to be scheduled so that the data can be analyzed and exported to CSV files, as was done for our datasets. While the normal and attack scenarios were running, the Data Collector Sets on both Windows VMs were configured to capture the activity of memory, network, hard disks, processors, and processes.
The performance counter data gathered by the Data Collector Sets on Windows 7 and Windows 10 were saved in log files that can be opened with the Windows Performance Monitor tool. The data features created for the Windows TON IoT datasets are explained below. The fog and edge services were integrated with the public HIVE MQTT dashboard, a public vulnerable PHP website, cloud virtualization, and cloud data analytics services.
The public HIVE MQTT dashboard allowed us to publish and subscribe to the telemetry data of the IoT/IIoT services through a Node-RED configuration. The public vulnerable PHP website was used to launch injection attacks against websites. The other cloud services were set up in either Microsoft Azure or AWS to send sensor data to the cloud and visualize its patterns.
Feature Generation and Labelling
The Windows datasets were produced on the virtual machines running Windows 7 and Windows 10 and combine data from multiple sources on each system, including memory, process, processor, and hard disk. To produce the datasets, the collectors of the Performance Monitor tool were run on each system. The raw version of the datasets was gathered as Performance Monitor log files, and the disk, process, processor, and memory activities were then parsed with the Performance Monitor tool and stored in CSV format.
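Once the counters are in CSV form, the columns can be grouped by the subsystem they describe. The sketch below is illustrative only: it assumes that each counter column is named in the form `\\COMPUTER\Object(instance)\Counter` with a timestamp in the first column, and the host name `WIN10`, the sample values, and the `group_counters` helper are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical two-line CSV export; host name, counters, and values are made up.
SAMPLE = (
    r'"Time","\\WIN10\Memory\Available MBytes",'
    r'"\\WIN10\Processor(_Total)\% Processor Time"' "\n"
    '"01/01/2020 00:00:01","2048","12.5"\n'
)


def group_counters(header: list[str]) -> dict[str, list[str]]:
    """Group counter column names by their object (Memory, Processor, ...)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for column in header[1:]:  # assumption: the first column is the timestamp
        parts = column.strip("\\").split("\\")
        if len(parts) < 3:  # skip columns that do not follow the assumed layout
            continue
        # parts[1] looks like "Processor(_Total)"; drop the instance suffix.
        object_name = parts[1].split("(")[0]
        groups[object_name].append(column)
    return dict(groups)


header = next(csv.reader(io.StringIO(SAMPLE)))
groups = group_counters(header)
print(sorted(groups))  # ['Memory', 'Processor']
```

Grouping by object name gives one list of columns per audited source, which makes it easy to check that memory, process, processor, and disk counters are all present before features are generated.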
The Windows 7 dataset contains 133 features plus two additional attributes: the class label (normal or attack) and the attack type, covering the nine attacks used in the testbed. The Windows 10 dataset contains 125 features plus the same two attributes for class label and attack type. Once the data features of Windows 7 and Windows 10 were created, we appended these two attributes to label each record as normal or as a particular attack type. Labeling was based on the ground truth CSV files, which list the attack events that occurred during the testbed execution.
The timestamp attribute was compared against every record in the CSV files: if the timestamp in the ground truth table equals the timestamp of a data record, the record was labeled as an attack; otherwise, it was labeled as normal. This ground-truth-based labeling supports the fidelity of the datasets to the security events that occurred while the testbed was running and their suitability for evaluating cybersecurity solutions based on machine learning algorithms.
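The timestamp-matching rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' labeling script: the field names `ts`, `label`, and `type`, the sample values, and the `label_records` helper are all hypothetical.

```python
def label_records(records: list[dict], ground_truth: dict[int, str]) -> list[dict]:
    """Append a binary label and an attack type to each record.

    ground_truth maps an attack timestamp to its attack type (assumed layout).
    A record whose timestamp appears in the ground truth is labeled as an
    attack (label 1); every other record is labeled as normal (label 0).
    """
    labeled = []
    for record in records:
        attack_type = ground_truth.get(record["ts"])
        labeled.append({
            **record,
            "label": 1 if attack_type is not None else 0,
            "type": attack_type if attack_type is not None else "normal",
        })
    return labeled


# Hypothetical ground truth and data records, for illustration only.
ground_truth = {1001: "ddos", 1003: "injection"}
records = [
    {"ts": 1001, "cpu": 97.0},
    {"ts": 1002, "cpu": 12.5},
    {"ts": 1003, "cpu": 40.1},
]

labeled = label_records(records, ground_truth)
print([(r["ts"], r["label"], r["type"]) for r in labeled])
# [(1001, 1, 'ddos'), (1002, 0, 'normal'), (1003, 1, 'injection')]
```

Holding the ground truth in a dictionary keyed by timestamp keeps each lookup constant-time, which matters when every record of a large dataset has to be checked.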
CONCLUSION
This paper has introduced the description and preliminary tests of the Windows TON IoT datasets created at the IoT lab of UNSW Canberra. To create the federated datasets, a new IoT testbed was designed that comprises a diverse set of IoT services hosted at the edge layer, operating system virtual machines configured at the fog layer, and cloud services built at the cloud layer. Dynamic interaction among the three layers was implemented with the VMware NSX and vCloud NFV platforms to offer SDN and NFV services.
Nine attack classes and recent normal activities were included in the datasets to produce realistic data sources for measuring the dependability of novel AI-driven cybersecurity solutions. In addition, the Windows 10 and Windows 7 data features were obtained from the audit traces of memory, hard disks, processors, and processes to provide a means of detecting novel attack patterns that may stealthily exploit Windows operating systems.
A large number of data samples were gathered for Windows 7 and Windows 10 datasets. The gathered datasets represent a broad spectrum of normal and attack activity, demonstrating the fidelity of the datasets for testing new AI-driven cybersecurity solutions, such as intrusion detection, privacy protection, and digital forensics, as well as threat intelligence and hunting, which we plan to investigate in the future.