SDF: Psychological Stress Detection Framework from Microblogs using Pre-defined Rules and Ontologies

Unwanted microblogs spread pervasively through Social Networking Sites (SNS) and lead to disturbances such as mental disorders, wasted time, broken relationships, and stress that gives rise to psychological health problems. Overcoming these problems requires ignoring such microblogs, which users find difficult because of their addiction to social media. Even literate people fall prey to psychological stress from SNS. Researchers have rarely addressed these stress-related issues or tackled such vicious microblogs. We propose a prediction strategy, the Stress Detection Framework (SDF), to analyze stress in microblogs. SDF is built on an Ontology Based Information Extraction technique that uses a probabilistic model (GSHL and TreeAlignment algorithms) and a set of pre-defined, knowledge-based logical rules. The rules cover low-level attributes (simple textual and linguistic words), visual features (emoticons and images), and social interaction (likes and dislikes) to detect and predict stress in microblog messages. Compared with TensiStrength, SDF achieves a stress detection rate of 94.2%. The experimental results support precise decisions on blocking, eradicating, or segregating stress-related microblogs on social media, especially SNS.


Introduction
Stress-related messages in social media, especially on Social Networking Sites (SNS), have adversely affected human health and behavior. Pew Research reports describe the upsurge of social media and its impact on civic life, where the sharing of information leads to psychological stress [1]. The dark side of Facebook is that it can collect and then unintentionally expose countless hidden facts about an individual's private life to other individuals. The affordances of Facebook are designed to expose individual profiles, likes, dislikes, relationships, negative psychological emotions, thoughts, events, visibility, and persistence, resulting in unusual stress [2]. Surveillance of offline and online Facebook contacts has revealed ex-partner relationships, break-ups, and the associated sexual desire, which ended in reduced personal growth and a stressful life [3]. Undoubtedly, Facebook's social interaction gives users many advantages: fast and economical communication for sharing information, providing updates, personal satisfaction through entertainment, and reliable connectivity with other groups across wide area networks at low cost. Accumulating the psychological stress words in microblogs of the same user from various SNS (Facebook, WhatsApp, LinkedIn, Twitter) is a difficult task because these SNS have different architectures. Facebook has its own methodology for collecting individual profile information at different points in time, whereas WhatsApp does not support this facility; instead, it uses the mobile number, through which the user's various activities are attached at one location. Similarly, Twitter has its own affordances for collecting and sharing a user's tweets and the responses to them from different users.
In SNS, user behavior varies from person to person because it depends on various artifacts such as linguistics, usage patterns, words, and the user's way of responding in a given context. WordNet Ontology can be used effectively to detect such words; it is an intelligent logical dictionary that helps to find the synonyms of single or multiple words [4]. Further, the probabilistic Global Similarity Hierarchy Learning (GSHL) and TreeAlignment algorithms help to categorize the predicted synonym of a particular word precisely under a specified ontology through a learning process [16]. Predicting the context from a set of existing words is an intelligent mining process, for which machine learning techniques that rely on historical knowledge are used. Most of these techniques are efficient, but they are deployed on well-defined data to obtain results from which logical rules are formulated. A limitation of machine learning techniques is that they use statistical measures that exploit topic-related terms, and the results obtained are not satisfactory in practical applications [5]. The TensiStrength program uses a lexical approach with a set of logical rules to predict stress from tweets and has given better results than machine learning and sentiment analysis techniques [5]. Thus, studies suggest that embedding well-defined rules in real-time applications assists fast retrieval of useful information. For example, knowledge guided by pre-defined rules has been used for the classification of various e-crimes [6]. Similarly, the Anti Phishing Detection (APD) system predicts deceptive phishing attacks from microblogs in SNS; it was developed using a data mining technique in which well-defined phishing rules were used effectively [7].
Formulating knowledge-based logical rules and extracting the best features for decision making requires extensive domain expertise in a specific field. This section has discussed the adverse effects of continuous SNS usage on the human mind as well as existing strategies to overcome stress from SNS. Section 2 gives the history of psychological stress detection strategies in the literature, from manual stress detection methods to automated tools. Section 3 elaborates the proposed Stress Detection Framework (SDF) strategy for detecting stress using WordNet Ontology and a set of pre-defined logical rules. In Section 4, the SDF strategy is tested and the experimental results are reported for a scenario consisting of chatting sessions (i.e., microblogs); the SDF strategy is then compared with the TensiStrength stress detection technique. Finally, in Section 5, we conclude that the results obtained from the SDF strategy are far better than those of TensiStrength, because SDF can detect word patterns from various users by understanding the language linguistics in SNS. Sentiment analysis programs and machine learning techniques are good enough to retrieve results using various algorithms, but in real-time applications the use of pre-defined rules increases the stress detection rate to about 94%. The future scope of this research is to identify stress in multilingual text and short-form words among different age groups, ranging from children to young, middle-aged, and old-aged people. With the development of innovative high-end technologies, electronic gadgets have become cheaper and affordable to everyone, and their usage frequency has increased drastically. Usage is highest in the age group of 17 to 40, as per the Pew trends report [1].

Problem Statement And Related Work
These days, psychological stress is becoming a risk to individual wellbeing. It is critically important to recognize and manage stress before it turns into serious problems. Psychological Stress Detection (PSD) comprises four methods: i) psychological evaluation, ii) evaluation of physiological signals, iii) behavioral responses (Twitter), and iv) visual and social media interactions (Facebook, WhatsApp). In this paper we consider textual behavioral responses, visuals, and social media interactions for PSD from social media, especially SNS. In 2014, H. Lin et al. detected psychological stress from text, visual, and social interaction attributes, drawn from short text, image properties, and counts of responses received through tweets, using neural networks that involve excessive training [12]. In 2017, the same authors extended their work using Convolutional Neural Network (CNN) and Factor Graph Model (FGM) approaches to increase the stress detection rate (87.2% to 91.4%); here tweets comprise text, visual, behavior, and social interaction attributes, with behavior as an extra attribute considered for stress analysis [13]. Identifying psychological stress in SNS is rarely attended to by researchers. The "blue whale challenge game" shocked the world with multiple suicides of young children who used social networks. The victims of this stress-buster game were found to use keywords of suspicious activities inside their personal SNS accounts [8]. Many of these keywords point towards psychological stress levels. The decision to commit suicide is due to an increase in acute stress levels. The cybercrime department needs to monitor the microblogs of offenders who incite suspicious actions in SNS and then take rigorous steps to entrap these e-criminals [6].
FriendA reply to User: when will you return from busy works?
User reply to FriendA: I am sad as it will take time.
The real-time snippet of user posts shown in Fig. 1 is taken from one user's Twitter account. It shows a short post whose stress-related words convey the user's emotions through keywords such as "unfinished work" and "feeling sad", marked in red. An image and emojis (such as ":(") are also posted to show the user's emotions. Identifying stress features from a user's textual words, images, and emoticons is not an easy task. A precise decision about stress from SNS can be taken only after closely examining all three (3) features of the user's post, as discussed in Section 1. Efficient models developed earlier monitor textual posts in SNS for phishing and suspicious words [6] [7]. These models embedded ontology learning using the probabilistic GSHL and TreeAlignment algorithms, which enhanced the accuracy of categorizing the predicted words under the various suspicious cybercrime domains in SNS [7] [16]. Emotions are categorized into two (2) types, positive and negative. Not every emotion indicates stress; it depends on parameters such as the context, time, day, event, and occasion in which the keywords are used [11]. Stress in an image is identified by extracting features such as dullness, brightness, and clearness, calculated using the mean between specified threshold values [9]. Similarly, sad emojis are used to express emotions on Facebook, which ultimately leads to stress through social media [10]. Likes and comments express the strength of a given post among 'n' users; the comments raised on the shared post are collected as text, emojis, or images, which have to be analyzed to find stress [11].
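The image feature described above can be illustrated with a minimal sketch. It assumes an image is available as a flat list of RGB pixels and uses the common BT.601 luma weights to approximate brightness; the function names and the `low`/`high` thresholds are hypothetical and are not taken from [9].

```python
from statistics import mean

def mean_brightness(pixels):
    """Mean luminance of an RGB image, scaled to the range 0..1.

    `pixels` is a flat list of (r, g, b) tuples with channels in 0..255.
    """
    # BT.601 luma weights approximate perceived brightness per pixel.
    lumas = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]
    return mean(lumas) / 255.0

def brightness_label(pixels, low=0.35, high=0.65):
    """Label an image by comparing its mean brightness with two thresholds.

    `low` and `high` are illustrative values, not thresholds from the paper.
    """
    level = mean_brightness(pixels)
    if level < low:
        return "dull"
    if level > high:
        return "bright"
    return "neutral"

# Tiny synthetic images (four identical pixels each), for illustration only.
dark_image = [(20, 20, 30)] * 4
light_image = [(240, 240, 230)] * 4
print(brightness_label(dark_image))
print(brightness_label(light_image))
```

In a framework of this kind, the label (or the raw mean) would be one of several signals combined with textual and emoticon evidence rather than a decision on its own.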
Recently, the TensiStrength program, which uses a lexical approach with a set of logical rules to predict stress from tweets, has given better results than machine learning and sentiment analysis techniques, but its use is restricted to textual keywords [5].
In this paper, knowledge-based, well-defined rules are used that comprise two (2) types of attributes, namely low-level and social interaction attributes, in which the terms linguistic and emoticons are used. The term "linguistic" refers to the hidden meaning of words in the mind, expressed via short posts in natural language that children and adults frequently use to convey the intent of a message through a set of words. The term "emoticons" refers to emojis, through which thoughts are expressed using small digital icons or images, whereas the term "social interaction" refers to likes and comments. Thus, stress is predicted from SNS or an Instant Messaging System (IMS) from short posts that show emotions leading to stress, whether in linguistic words, in emoticon format, or through social interaction.

Proposed Stress Detection Framework (SDF)
This section describes the operational phases of the Stress Detection Framework (SDF) for predicting stress from SNS, illustrated in Fig. 3. The Stress Detection (SD) algorithm captures the short posts/microblogs sent between clients/users and stores them in a database to identify stress using the set of pre-defined logical rules (SPSWDB) shown in Table I and the Ontology Based Information Extraction (OBIE) technique. The schematic-cum-flowchart of the SD algorithm is shown in Fig. 2.

Fig. 2. Stress Detection Framework for detection of Psychological Stress
For the proposed SDF architecture (Fig. 2), the SD algorithm (Fig. 6) is applied, which captures the microblogs into SPDB. The words stored in SPDB are checked for unnecessary words, which are filtered out using information retrieval techniques [14]. After the irrelevant words are filtered out, textual words, emoticons, and images remain. The textual words are pushed into TPDB, whereas emoticons and images are pushed into EIDB [13]. The relevant words stored in TPDB are mapped against the set of pre-defined logical rules stored in SPSWDB to find the stress words; on detection, these stress words are pushed into SKDB. Subsequently, the emoticons and images stored in EIDB are mapped against SPSWDB to find stress-related emojis; on detection, these emoticons are also pushed into SKDB, whereas for images the saturation (brightness) is calculated using the mean value and the stress value obtained is pushed into SKDB [9]. In the next stage, the stress words, emoticons, and image mean value stored together in SKDB are checked against the threshold values of stress level. When the pre-defined threshold constraint is satisfied, stress is identified and reported to the e-crime department.
a. The stress words found by comparison with SPSWDB are shortlisted and stored in SKDB. Subsequently, emoticons for which a match is found in SPSWDB are also stored in SKDB.
b. The calculated mean image value, retrieved after mapping against the pre-defined threshold value (SPSWDB (Rule 2)) given in the SD algorithm, is stored in SKDB.
c. Similarly, the calculated mean value of social interactions, i.e., likes and comments, is checked against the pre-defined threshold value; if it is met, that value is stored in SKDB.

5. All the features retrieved from the textual attribute, emoticons attribute, image, and social interaction attribute stored in SKDB are fed into the probabilistic SD algorithm to predict the stress levels of the user. Finally, in the last step, the stress levels and their seriousness are predicted and stored in the form of a report. If the chatting session activities turn out to be stressful, a recommendation is submitted automatically to the e-crime department for appropriate action on stress management in SNS.
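The flow of posts through the databases described above can be sketched in a few lines. This is a minimal illustration, not the SD algorithm itself: the lexicon entries, stop words, emoticon pattern, and the interpretation of the threshold as the share of stress hits among all retained items are assumptions made for the example, and Python lists stand in for the SPDB, TPDB, EIDB, and SKDB tables.

```python
import re

# Hypothetical stand-ins for SPSWDB entries; not the paper's actual lexicon.
STRESS_WORDS = {"sad", "stress", "overtime"}
STRESS_EMOTICONS = {":(", ":'("}
EMOTICON_RE = re.compile(r":'?\(|:\)")
STOP_WORDS = {"i", "am", "as", "it", "will", "the", "a", "to"}

def detect_stress(posts, threshold=0.07):
    """Route posts through list-based stand-ins for TPDB, EIDB, and SKDB."""
    tpdb, eidb, skdb = [], [], []  # textual words, emoticons, stress hits
    for post in posts:  # `posts` plays the role of SPDB
        # Emoticons go to EIDB; the remaining text is filtered into TPDB.
        eidb.extend(EMOTICON_RE.findall(post))
        text = EMOTICON_RE.sub(" ", post).lower()
        words = re.findall(r"[a-z']+", text)
        tpdb.extend(w for w in words if w not in STOP_WORDS)
    # Map TPDB and EIDB against the rule lexicon; matches go to SKDB.
    skdb.extend(w for w in tpdb if w in STRESS_WORDS)
    skdb.extend(e for e in eidb if e in STRESS_EMOTICONS)
    total = len(tpdb) + len(eidb)
    # Assumed threshold rule: share of stress hits among retained items.
    score = len(skdb) / total if total else 0.0
    return {"hits": skdb, "score": score, "stressed": score >= threshold}

result = detect_stress(["I am sad as it will take time :(", "when will you return?"])
print(result["hits"], result["stressed"])
```

Image and social interaction means would enter the same decision as additional values in SKDB; they are omitted here to keep the sketch short.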

Fig. 3. Schematic-cum-algorithmic steps of SDF
OBIE is a probabilistic learning method that predicts the domain to which these stress words belong [16]. We used several database tables, namely SPDB, TPDB, EIDB, ODB, SPSWDB, SKDB, EDB, and Metadata. In this framework, SPDB (Short Posts Database) stores the online messages/posts communicated between the users (chatmates). ODB (Ontology Database) is a lexical database that identifies terms, synonyms, concepts, taxonomy (concept hierarchy), relations, axioms, and rules [4] (Table I).

Topic (Domain) Hierarchy Construction using GSHL Algorithm
Wang Wei et al. developed Global Similarity Hierarchy Learning (GSHL). This algorithm recursively finds the topics most similar to the current "root" topic and removes those words under the domain that do not fulfill the condition on the difference of KL divergence. It starts with an initial topic as the root node and looks for the top n most similar topics according to (dis)similarity measures. The parameters used in the algorithm are given below:
• N - The total number of topics (domains, i.e., root words).
Fig. 4.
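The similarity search at the core of this procedure can be sketched as follows. This is only the "top n most similar topics" step, not the full recursive GSHL algorithm: topics are assumed to be probability distributions over a shared vocabulary, the distributions below are made up for illustration, and `max_divergence` is a hypothetical stand-in for the KL divergence condition.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) for two discrete distributions."""
    # eps guards against division by zero when q has zero-probability entries.
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def most_similar_topics(root, topics, n=2, max_divergence=1.0):
    """Return up to n topics closest to `root`, dropping those that are too far."""
    scored = []
    for name, dist in topics.items():
        if name == root:
            continue
        divergence = kl_divergence(topics[root], dist)
        if divergence <= max_divergence:  # assumed form of the KL condition
            scored.append((divergence, name))
    scored.sort()  # smallest divergence first, i.e., most similar first
    return [name for _, name in scored[:n]]

# Illustrative topic-word distributions over a three-word vocabulary.
topics = {
    "stress": [0.6, 0.3, 0.1],
    "anxiety": [0.5, 0.4, 0.1],
    "sports": [0.05, 0.05, 0.9],
    "sadness": [0.55, 0.35, 0.1],
}
print(most_similar_topics("stress", topics))
```

A recursive hierarchy builder would call such a search again with each selected topic as the new root, attaching the results as children.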

OBIE Model for Root Word (Domain) Extraction using TreeAlignment Algorithm
An ontology is intelligent information that comprises a set of words within a domain and predicts the associations among those words within the specific domain and with other successive domain(s). The hidden stress words are explored from the short posts or microblogs using the pre-defined logical rules (SPSWDB) given in Table I. In our Stress Detection Framework, the stress domain words of Rule 1 (stress lexicon, negative emotion lexicon, negating words lexicon, and emoticons) are identified using the ontology, along with Rule 2 (images and social interaction attributes). The ontology actively assists in forming a partial tree (constrained to two levels, i.e., "parent-child") using the algorithm in [15] [17]. During this tree building phase, the stem words (TPDB) are mapped against the domain words (SPSWDB), and at the same time the threshold value for each domain is checked using the SD algorithm shown in Fig. 6.
Mean of brightness and contrast of images is calculated. When Mean is less than 0%, image has low brightness. When Mean is greater than 0%, image has high brightness. (low < 0% < high)
Social Interaction (Mean Value) →
RULE 3 (threshold value)
Check the user-defined threshold value for the stem words that may belong to multiple domains, using precision and mean values (Rule 1 and Rule 2) and the SD algorithm (Fig. 6).
RULE 4 (undetected words to be ignored)
Special characters and unknown image formats are sometimes also sent via SNS; these are ignored.
Algorithm 1. GSHL (root)
Require: Initialize V, Ms, I, THs, THd, THn, and Mc.
Ensure: A terminological ontology with "broader" and "related" relations.

Role of Stress Detection (SD) Algorithm for the proposed Stress Detection Framework
In the Stress Detection Framework, as discussed in the previous section, the SD algorithm drives the overall process, from storing text messages in SPDB to finding the stress keywords and providing a detailed report from the SKDB and EDB databases to the e-crime department when stress words are detected. The schematic-cum-algorithmic steps of SDF in Fig. 3 are revisited in Fig. 6.

Algorithm 3. Stress Detection Algorithm (SD)
Input: Short posts/microblogs stored day to day in the text database (SPDB) from the IMS/SD Framework.
Output: Report to the e-crime department when stress messages are detected.

1. Evaluation method for data sets
The precision metric given in Equation (2) [14] is used to evaluate tweets in the SD Framework. The efficacy of stress word extraction is based on two factors: the number of actual words available in the pre-defined database (SPSWDB) for the stress domain, and the number of stress words extracted from the tweet chat session.
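As a reading aid, and assuming the metric follows the standard information-retrieval definition of precision, the two factors described above would combine as shown below. The symbols are introduced here for illustration only: $W_{\text{extracted}}$ denotes the set of stress words extracted from the chat session and $W_{\text{SPSWDB}}$ the set of stress-domain words stored in SPSWDB.

```latex
\[
\text{Precision} =
\frac{\left| W_{\text{extracted}} \cap W_{\text{SPSWDB}} \right|}
     {\left| W_{\text{extracted}} \right|}
\]
```

Under this reading, a word extracted from the session counts toward the numerator only if it also appears in the pre-defined database for the stress domain.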

Tested using SD Framework and TensiStrength
A real chatting session was conducted intentionally, and the experimental results are demonstrated for the conversation between the two users shown in Fig. 7. The Twitter chatting session (Fig. 7) is tested using the SD algorithm (Fig. 6) with the threshold value set to 7%. The stress words, shown in red, are "busy works", "feeling sad", "unfinished" (not detected), "overtime", and "stress". In addition, four (4) emoticons map exactly to rule 2 of our pre-defined database (SPSWDB), and one (1) stressful image has a mean value that appears highly saturated. The precision rate obtained by TensiStrength is 58.32%, whereas the SDF system obtains 94.2%, as shown in Table II.
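The way a phrase such as "unfinished" can go undetected is easy to see in a small lexicon-matching sketch. The lexicon below contains only the four detected phrases named above, and, apart from the first message, the messages are hypothetical stand-ins for the session in Fig. 7; the function name and the simple substring matching are assumptions, not the SD algorithm.

```python
def find_stress_phrases(messages, lexicon):
    """Return lexicon phrases that occur in any message, in first-seen order."""
    found = []
    for message in messages:
        text = message.lower()
        for phrase in lexicon:
            # Plain substring match; a phrase absent from the lexicon is never found.
            if phrase in text and phrase not in found:
                found.append(phrase)
    return found

# Illustrative lexicon: the detected phrases reported for the session.
lexicon = ["busy works", "feeling sad", "overtime", "stress"]

# Hypothetical messages (only the first echoes the snippet quoted earlier).
messages = [
    "When will you return from busy works?",
    "Feeling sad, the report is unfinished and I am on overtime again.",
    "Too much stress this week.",
]

detected = find_stress_phrases(messages, lexicon)
print(detected)
print("unfinished" in detected)
```

Because detection depends entirely on lexicon coverage, adding the missing phrase to the rule database would be enough for it to be reported in this sketch.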

Challenges And Future Work
It is of significant importance to detect and manage stress before it turns into severe problems. Traditional methods are mainly based on interviews conducted by psychologists or on sensors, which is a time-consuming process. The proposed SDF strategy detects psychological stress by employing a probabilistic model [17]. The SDF model adds two unique features not found in earlier work; the exception is TensiStrength, which uses one feature, pre-defined rules, and considers only textual words. The first feature is that we embed a set of pre-defined logical rules comprising not only linguistic stress lexicon words but also emoticons, images, and social interaction attributes. The second feature is the integration of the OBIE technique (probabilistic GSHL algorithm), which was not used earlier by any state-of-the-art system. The comparative features of the SD Framework and TensiStrength are shown in Table III.