Mining Streams.

            Stream mining is the ability to process and analyze large quantities of data in real time using minimal resources. Consider an autonomous car with over 1000 sensors, each capturing over 20 KB of information every second: in one hour, the car captures at least 72 GB of raw data (1000 × 20 KB × 3600 s). This raw data has to be analyzed in real time to detect irregularities in the surrounding environment so that the car can react quickly.
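            The volume figure follows from simple arithmetic. The short Python sketch below assumes each of the 1000 sensors produces 20 KB per second and uses decimal units (1 GB = 1,000,000 KB); the constant and function names are illustrative only.

```python
# Assumed figures from the example: 1000 sensors, each producing 20 KB per second.
SENSORS = 1000
KB_PER_SENSOR_PER_SECOND = 20
SECONDS_PER_HOUR = 3600


def data_volume_gb(sensors: int, kb_per_second: int, seconds: int) -> float:
    """Return the raw data volume in GB, using decimal units (1 GB = 1,000,000 KB)."""
    total_kb = sensors * kb_per_second * seconds
    return total_kb / 1_000_000


print(data_volume_gb(SENSORS, KB_PER_SENSOR_PER_SECOND, SECONDS_PER_HOUR))  # 72.0
```

At that rate the car produces about 20 MB of raw data every second.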

            Data stream mining is the process of extracting information from the continuous, rapidly arriving records of a data stream. A data stream is a continuous, timestamped sequence of data that evolves over time and is volatile: each record is available only briefly before it is discarded. In the autonomous car example above, data from sensors such as cameras and motion detectors arrives continuously and changes as the car moves. Once the car has analyzed the incoming data and made a decision, it stores a summary of the information and discards the raw data.
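            One way to keep a summary while discarding the raw data is to maintain running statistics that are updated once per reading. The sketch below uses Welford's online algorithm for the mean and variance; the choice of statistics, the class name `StreamSummary` and the sample distance readings are assumptions for illustration, not a description of how any particular car stores its summaries.

```python
import math
from dataclasses import dataclass


@dataclass
class StreamSummary:
    """Constant-memory summary of a numeric stream (Welford's online algorithm).

    Only five numbers are kept; each raw reading can be discarded after update().
    """

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, x: float) -> None:
        """Fold one reading into the summary."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)  # uses the updated mean
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def variance(self) -> float:
        """Sample variance; returns 0.0 until at least two readings are seen."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


# Hypothetical distance readings (metres) from a single motion sensor.
summary = StreamSummary()
for reading in [12.0, 11.5, 11.0, 10.2, 9.8]:
    summary.update(reading)  # the reading itself is not stored
print(summary.count, round(summary.mean, 2), summary.minimum, summary.maximum)
```

Memory use stays constant no matter how many readings arrive, which is what allows each raw reading to be discarded right after the update.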

            Data stream mining uses data-based techniques to analyze a representative subset of the timestamped data, for example only the data that falls between timestamp1 and timestamp2. Data-based techniques serve as a preprocessing step for data stream algorithms. One example is approximation techniques, which maintain only a subset of the current data and discard older data. The approximation can be sequence-based or timestamp-based. In sequence-based approximation, the system stores the Y most recent elements; when a new element arrives, the oldest element is removed. In timestamp-based approximation, the window is bounded by two instants, Tn and Tn+1, and holds the elements received within that period.
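            Both kinds of window can be sketched in a few lines of Python. In this sketch, `SequenceWindow` keeps the Y most recent elements (Y is the `size` parameter) and `TimestampWindow` keeps elements whose timestamps fall in a sliding interval of length `width` ending at the newest timestamp. The class names and the sliding reading of the Tn to Tn+1 bound are assumptions; the timestamp window also assumes elements arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Generic, List, Tuple, TypeVar

T = TypeVar("T")


class SequenceWindow(Generic[T]):
    """Sequence-based window: keeps only the most recent `size` elements."""

    def __init__(self, size: int) -> None:
        # deque(maxlen=...) drops the oldest element automatically when full.
        self._items: Deque[T] = deque(maxlen=size)

    def add(self, item: T) -> None:
        self._items.append(item)

    def contents(self) -> List[T]:
        return list(self._items)


class TimestampWindow(Generic[T]):
    """Timestamp-based window: keeps elements whose timestamp lies in (now - width, now]."""

    def __init__(self, width: float) -> None:
        self.width = width
        self._items: Deque[Tuple[float, T]] = deque()

    def add(self, timestamp: float, item: T) -> None:
        # Assumes elements arrive in non-decreasing timestamp order.
        self._items.append((timestamp, item))
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop elements from the front until the oldest one is inside the window.
        while self._items and self._items[0][0] <= now - self.width:
            self._items.popleft()

    def contents(self) -> List[T]:
        return [item for _, item in self._items]


seq = SequenceWindow(size=3)
for value in [1, 2, 3, 4, 5]:
    seq.add(value)
print(seq.contents())  # [3, 4, 5]

tw = TimestampWindow(width=2.0)
for ts, reading in [(0.0, "a"), (0.5, "b"), (1.5, "c"), (2.5, "d")]:
    tw.add(ts, reading)
print(tw.contents())  # ['c', 'd']
```

Using `collections.deque` makes removal from the front O(1), so each new element costs constant amortized time in either window.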

            To generate useful information from the data stream, the system applies data mining techniques. Several data mining algorithms, such as classification, regression, and outlier detection, are used in stream data analysis. These algorithms need to deal with concept drift (changes over time in the relationship between the input data and what is being predicted), huge quantities of data, and limited resources (Albert C, 2019). For example, the autonomous car uses classification algorithms such as the decision stump, a one-level decision tree, to decide whether it should stop or go around a detected object.
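            The sketch below shows what a decision stump learns: a single threshold on one feature, with one label on each side. The feature (lateral clearance beside the detected object, in metres), the labels and the training examples are hypothetical, and the stump is refit from the labelled examples in the current window, which is one simple way to follow concept drift rather than necessarily the method used in the cited article.

```python
from typing import Sequence


class DecisionStump:
    """One-level decision tree on a single numeric feature.

    Predicts `left_label` when x <= threshold, otherwise `right_label`.
    """

    def __init__(self) -> None:
        self.threshold = 0.0
        self.left_label = ""
        self.right_label = ""

    def fit(self, xs: Sequence[float], ys: Sequence[str]) -> "DecisionStump":
        """Pick the threshold and labels that misclassify the fewest examples."""
        labels = sorted(set(ys))
        values = sorted(set(xs))
        # Candidate thresholds: midpoints between consecutive distinct values.
        candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
        best_errors = len(ys) + 1
        for t in candidates:
            for left in labels:
                for right in labels:
                    errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
                    if errors < best_errors:
                        best_errors = errors
                        self.threshold, self.left_label, self.right_label = t, left, right
        return self

    def predict(self, x: float) -> str:
        return self.left_label if x <= self.threshold else self.right_label


# Hypothetical labelled examples from the current window:
# lateral clearance beside the object (metres) and the correct action.
clearance = [0.3, 0.5, 0.8, 1.6, 2.0, 2.4]
actions = ["stop", "stop", "stop", "go_around", "go_around", "go_around"]

stump = DecisionStump().fit(clearance, actions)
print(stump.predict(0.6), stump.predict(1.9))  # stop go_around
```

Because the stump is refit only on recent examples, its threshold moves when the relationship between clearance and the correct action changes.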

Reference.

Albert C, 2019. Introduction to Stream Mining. Towards Data Science.

