Spark streaming - filter tweets after streaming with geolocation
Asked by: Harika Punyamurthula. Asked on: 2017-04-12T23:35:49

I am a beginner trying to collect tweets with Spark Streaming in Scala, using some filter keywords. After streaming, is it possible to keep only the tweets whose geolocation is not null? I am saving the tweets to Elasticsearch, so before saving the tweet map, can I filter for the tweets that carry geolocation information and save only those? I build the JSON with json4s.JsonDSL from fields of the tweet. This is the sample code:

import org.apache.spark.streaming.twitter.TwitterUtils
import org.elasticsearch.spark.rdd.EsSpark
import org.json4s.JsonDSL._

// ssc (the StreamingContext) and filters (the keywords) are defined elsewhere
val stream = TwitterUtils.createStream(ssc, None, filters)
val tweetMap = stream.map(status => {
  val tweetMap =
    // getGeoLocation returns null for tweets without coordinates, hence the Option
    ("location" -> Option(status.getGeoLocation).map(geo => s"${geo.getLatitude},${geo.getLongitude}")) ~
    ("UserLang" -> status.getUser.getLang) ~
    ("UserLocation" -> Option(status.getUser.getLocation)) ~
    ("UserName" -> status.getUser.getName) ~
    ("Text" -> status.getText) ~
    ("TextLength" -> status.getText.length) ~
    // Tokenize the tweet text and keep only the words starting with #
    ("HashTags" -> status.getText.split(" ").filter(_.startsWith("#")).mkString(" ")) ~
    ("PlaceCountry" -> Option(status.getPlace).map(pl => s"${pl.getCountry}"))
  tweetMap
})

tweetMap.map(s => List("Tweet Extracted")).print

// Each batch is saved to Elasticsearch
tweetMap.foreachRDD { tweets => EsSpark.saveToEs(tweets, "sparksender/tweets") }

// Before this step, is there a way to filter out tweets whose "location" is null?

I adapted the code from GitHub: https://github.com/luvgupta008/ScreamingTwitter/blob/master/src/main/scala/com/spark/streaming/TwitterTransmitter.scala
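
One possible approach, sketched here rather than taken from the question: test for a null geolocation before building the JSON, so that only geotagged tweets ever reach the save step. The snippet below is a minimal, self-contained illustration of that predicate. `Geo`, `Tweet`, `hasGeo` and `locations` are hypothetical stand-ins (not twitter4j or Spark types) so that it runs without any dependencies, and the sample coordinates are made up.

```scala
// Hypothetical stand-ins for twitter4j's GeoLocation and Status,
// so this sketch runs without Spark or twitter4j on the classpath.
final case class Geo(latitude: Double, longitude: Double)
final case class Tweet(text: String, geo: Geo) // geo may be null, like Status.getGeoLocation

object GeoFilterSketch {
  // True only when the tweet carries coordinates; Option(null) is None.
  def hasGeo(t: Tweet): Boolean = Option(t.geo).isDefined

  // Keep the geotagged tweets and render "lat,lon" like the question's "location" field.
  def locations(tweets: Seq[Tweet]): Seq[String] =
    tweets.filter(hasGeo).map(t => s"${t.geo.latitude},${t.geo.longitude}")

  def main(args: Array[String]): Unit = {
    // Made-up sample data: one tweet without coordinates, one with.
    val sample = Seq(Tweet("no coords", null), Tweet("with coords", Geo(39.1, -94.58)))
    locations(sample).foreach(println)
  }
}
```

In the streaming job, the same check would presumably go on the DStream itself before the map, for example stream.filter(status => status.getGeoLocation != null), since DStream has a filter method that takes a predicate. That way every record passed to EsSpark.saveToEs has a "location" value.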

Copyright Notice:Content Author:「Harika Punyamurthula」,Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/43373942/spark-streaming-filter-tweets-after-streaming-with-geolocation
