Cost-effective and Fault-tolerant Model for Real-time Analysis

Das, Sushree (2018) Cost-effective and Fault-tolerant Model for Real-time Analysis. MTech thesis.

This study attempts to support financial decisions, specifically stock market prediction, by forecasting the future price of a company’s stock. To this end, Twitter data and stock market data are used to score the sentiment expressed about a particular firm. These behaviours and sentiments are then used to predict the rise and fall of each stock’s market value. Streaming data is a continuous source of material for real-time analysis: it arrives as an unbroken flow of information from sources such as websites, mobile phone applications, server logs, social websites, and trading floors. Its accessibility and availability allow user behaviour to be analysed and predicted continuously. A classification model built from historical data can be refined continually to give more accurate results, since each prediction can be compared against the next observed value. Spark Streaming is used to process the large volumes of data, and data ingestion tools such as the Twitter API and NodeJS are used to feed the analysis. Earlier research has explored the same concept; the goal here is a model that is scalable, fault tolerant, and has lower latency. The model rests on a distributed computing architecture called the “Lambda Architecture”, which helps attain these goals. The analysis found that stock values are predicted accurately when support vector regression, via Spark’s MLlib, is used and historical stock values serve as the training data.
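As a rough illustration of the Lambda Architecture idea mentioned in the abstract, the sketch below merges a batch view, recomputed from the full event log, with a speed view that absorbs recent events. It is a minimal single-process sketch in plain Python, not the thesis implementation, which is described as using Spark. The names `batch_view`, `SpeedView`, and `mean_sentiment`, and the `(ticker, score)` event shape, are assumptions made for this example.

```python
from collections import defaultdict


def batch_view(master_dataset):
    """Batch layer: recompute per-ticker [sum, count] from the full event log."""
    view = defaultdict(lambda: [0.0, 0])
    for ticker, score in master_dataset:
        view[ticker][0] += score
        view[ticker][1] += 1
    return view


class SpeedView:
    """Speed layer: incrementally absorb events not yet covered by a batch run."""

    def __init__(self):
        self.view = defaultdict(lambda: [0.0, 0])

    def update(self, ticker, score):
        self.view[ticker][0] += score
        self.view[ticker][1] += 1


def mean_sentiment(ticker, batch, speed):
    """Serving layer: merge the batch and speed views at query time."""
    # .get() does not trigger the defaultdict factory, so unknown tickers
    # are not silently added to either view.
    b = batch.get(ticker, (0.0, 0))
    s = speed.view.get(ticker, (0.0, 0))
    total, count = b[0] + s[0], b[1] + s[1]
    return total / count if count else None


# Hypothetical sentiment scores for a made-up ticker.
master = [("ACME", 1.0), ("ACME", 0.0)]
batch = batch_view(master)
speed = SpeedView()
speed.update("ACME", -1.0)
result = mean_sentiment("ACME", batch, speed)
```

In this sketch, losing the speed view costs only the most recent events, because the batch view can always be rebuilt from the stored log. This is one common reading of how the architecture supports fault tolerance alongside low-latency queries.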

Item Type:Thesis (MTech)
Uncontrolled Keywords:Spark streaming; NodeJS; Twitter API; Lambda architecture; MLlib.
Subjects:Engineering and Technology > Computer and Information Science > Data Mining
Divisions: Engineering and Technology > Department of Computer Science Engineering
ID Code:9622
Deposited By:IR Staff BPCL
Deposited On:24 Apr 2019 19:10
Last Modified:24 Apr 2019 19:10
Supervisor(s):Rath, Santanu Kumar
