James Stanier and Steve Loughran teamed up for an interview! James Stanier will be speaking at Berlin Buzzwords on Monday, June 6, 2016. He is VP Engineering, Product (Backend) at Brandwatch, where he works on projects that try to analyse and make sense of large amounts of social media data. That makes him a perfect interview partner for our program committee member Steve Loughran, a member of technical staff at Hortonworks, where he works on leading-edge developments within the Hadoop ecosystem, including service availability, cloud infrastructure integration and emerging layers in the Hadoop stack. We believe they are both cycling enthusiasts.
Steve Loughran: Hi James, I'm glad you have a minute. Tell me, what is your talk about, and why should people attend it?
James Stanier: I will give an overview and some examples of probabilistic data structures, such as Bloom filters. Discussions of these data structures usually scare people off because the maths seems too complicated. It is easier than people think: you can do some clever, interesting things with very little memory, and everyone will be really impressed. So I will give practical tips on working with these structures: how to control heap usage, how to pick the right moments and situations to use them, which libraries to use, and so on.
Can you describe the uses of the technology you are covering?
I am going to dwell on real-time filtering of lists, for example filtering Twitter data using next to no memory. I'll also focus on cache optimisation.
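To make "filtering with next to no memory" concrete, here is a minimal Bloom filter sketch in Python. It is an illustration under stated assumptions, not code from James's talk or from any particular library: the class name, the use of SHA-256 and the double-hashing scheme for deriving the k bit positions are choices made for this example. The sizing follows the standard formulas m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions for n expected items and a target false-positive rate p.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sketch (hypothetical, for illustration only)."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n, p = expected_items, false_positive_rate
        self.num_bits = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)  # packed bit array

    def _positions(self, item: str):
        # Assumed scheme: double hashing, deriving k positions from two
        # 64-bit values taken from a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set the k bits chosen for this item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely never added"; True means "probably added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


if __name__ == "__main__":
    seen = BloomFilter(expected_items=1_000_000, false_positive_rate=0.01)
    print(seen.num_bits, seen.num_hashes)
    seen.add("@berlinbuzzwords")
    print(seen.might_contain("@berlinbuzzwords"))
```

With one million expected items and a 1% false-positive target, this sizing works out to roughly 9.6 million bits (about 1.2 MB) and 7 hash functions, regardless of how long the individual items are. That property is what makes such structures attractive for streaming filters: the filter never yields false negatives, and the memory/accuracy trade-off is set up front through p.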
Probability and statistics are becoming core parts of big system applications. Where would somebody begin to learn (or re-learn) this branch of mathematics?
Start with the problems and the tools, and then look at the history behind them. For example, reading Lamport's papers from the 1980s is a lot easier when you can read something and immediately go "hey, this is in ZooKeeper!"
Where do you see the area of technology you are working in going in the next few years?
Tooling like Spark and MLlib makes it easier to do machine learning on large datasets, which improves what we can do in sentiment analysis and its extensions, e.g. detecting intent to purchase, detecting mood, or detecting when unusual things happen online. New technologies will let you step things up a notch; we are now thinking about long-term strategy, such as "store in HDFS forever".
What are you most enjoying working on at the moment?
I got some VC funding, so I am having a lot of fun hiring, building more things and managing a team.
Emacs, vi or something else?
I'm using Emacs and LaTeX for my PhD, but I have always used vi. IntelliJ is great for coding.
I think this is one of the best conferences: you can always find some good talks, and it's free of nonsense. This is going to be my fourth time there; I first attended in 2012, and it's my second talk.
Having looked at the other talks, what do you hope to learn from the conference yourself?
Solr: we're a big Solr shop, so Solr 6. I'm also interested in Kafka Streams; the Confluent spin-off looks good, and I would like to see where they are going. Apache Kudu looks promising as well.
Thank you, James!
Don't miss James's talk "Acceptably inaccurate: probabilistic data structures" on Monday, June 6, 2016, at 11 am at Kesselhaus, Kulturbrauerei.
Photo: James Stanier at Postbahnhof CC BY-SA 2.0 by Gregor Fischer