Analysts of Big Data should have the following strengths:
- Familiarity with statistical programming languages such as R
- Understanding and use of analytics modeling techniques
- Outstanding familiarity with the data to be analyzed
- Risk-taking mentality to experiment with data
Technical skills needed include, among others:
- A very good understanding of, and experience with, open source software
- Data architecting of databases holding terabytes of data and growing every minute
- Experience managing software frameworks like Hadoop; expertise in NoSQL databases such as Cassandra and HBase
- Expertise with analytics programming languages and tools, notably R and Pig
- Ability to manage hardware with hundreds or thousands of “small” CPUs, handling multiple terabytes of data
Many organizations also need soft skills that have little to do with Big Data itself:
- Understanding of the “ins and outs” of the business
- Understanding of the “bottom line” of the business
- Ability to discern which analytics will answer the bottom-line questions
- Communication skills to explain the analytics results
- Understanding not only transactions but also interactions and observations
10 Skills of Big Data
Skill 1. Open Source: Apache Hadoop
A Big Data processing system has to be able to distribute data in “chunks” to a number of processors and reassemble the results without losing anything in the process. The Hadoop platform is powerful, but because of its distributed storage and processing architecture it is a beast that requires tender loving care and appropriate feeding by skilled technicians. Skills with the Hadoop stack, such as HDFS, MapReduce, Flume, Oozie, Hive, Pig, HBase, and YARN, are in high demand in the industry.
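To make the “distribute and reassemble” idea concrete, here is a minimal single-machine sketch of the MapReduce pattern (map, shuffle, reduce) for counting words. It is plain Python, not Hadoop’s Java API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and the “chunks” are simply strings standing in for HDFS blocks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(chunk: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between stages."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    # In Hadoop each chunk would be processed on a different node;
    # here the "nodes" are just successive iterations on one machine.
    mapped = (pair for chunk in chunks for pair in map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    chunks = ["big data needs big tools", "data in chunks", "tools for data"]
    print(word_count(chunks))
```

Because each map call sees only its own chunk and each reduce call sees only one key, both stages can run in parallel on separate machines; only the shuffle needs to move data between them.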
Skill 2. Open Source: Apache Spark, an alternative to MapReduce
In contrast to Hadoop’s two-stage, disk-based MapReduce paradigm, Spark’s multi-stage in-memory primitives provide performance up to 100 times faster for certain applications, because user programs can load data into a cluster’s memory and query it repeatedly. Spark can be used either within a Hadoop framework (for example, running on YARN and reading data from HDFS) or outside it. Spark requires technical expertise to program and run.
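The benefit of loading data into memory once and querying it repeatedly can be illustrated without a cluster. The plain-Python sketch below is not Spark code (in PySpark the analogous step would be calling `cache()` on a DataFrame or RDD); it only contrasts re-reading a file for every query with reading it once and serving later queries from memory. The class names, `run_queries`, and the sample sales data are hypothetical.

```python
import csv
import os
import tempfile
from typing import Callable, List, Optional


def write_sample(path: str) -> None:
    """Write a small CSV file standing in for a large dataset on disk."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "sales"])
        writer.writerows([["east", 10], ["west", 25], ["east", 5], ["north", 40]])


class DiskDataset:
    """Re-reads the file from disk for every query, like chaining disk-based jobs."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.reads = 0  # number of times the file was read from disk

    def rows(self) -> List[dict]:
        self.reads += 1
        with open(self.path, newline="") as f:
            return list(csv.DictReader(f))


class CachedDataset(DiskDataset):
    """Reads the file once, then serves every later query from memory."""

    def __init__(self, path: str) -> None:
        super().__init__(path)
        self._cache: Optional[List[dict]] = None

    def rows(self) -> List[dict]:
        if self._cache is None:
            self._cache = super().rows()  # the only disk read
        return self._cache


def run_queries(ds: DiskDataset) -> List[int]:
    """Run three independent queries over the same dataset."""
    queries: List[Callable[[List[dict]], int]] = [
        lambda rows: sum(int(r["sales"]) for r in rows),  # total sales
        lambda rows: sum(int(r["sales"]) for r in rows if r["region"] == "east"),
        lambda rows: max(int(r["sales"]) for r in rows),  # largest single sale
    ]
    return [query(ds.rows()) for query in queries]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sales.csv")
        write_sample(path)
        disk, cached = DiskDataset(path), CachedDataset(path)
        print(run_queries(disk), "disk reads:", disk.reads)
        print(run_queries(cached), "disk reads:", cached.reads)
```

Both versions return the same answers; the difference is that the disk-backed version reads the file three times while the cached version reads it once, which is the effect that grows large when the dataset is terabytes rather than four rows.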
Skill 3. Some More Technologies: Python, Data Lake, NoSQL
- Python
Python is a widely used general-purpose, high-level programming language. Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than would be possible in languages such as C++ or Java. Python supports multiple programming paradigms, including object-oriented, imperative, functional, and procedural styles.
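As a small illustration of Python’s multiple paradigms, the sketch below computes the same quantity (the sum of the squares of the even numbers in a list) in imperative, functional, and object-oriented style. The function and class names are invented for the example.

```python
from functools import reduce
from typing import List

numbers = [1, 2, 3, 4, 5, 6]


# Imperative / procedural style: explicit loop and a mutable accumulator.
def sum_even_squares_imperative(values: List[int]) -> int:
    total = 0
    for n in values:
        if n % 2 == 0:
            total += n * n
    return total


# Functional style: filter, map, and reduce without mutable state.
def sum_even_squares_functional(values: List[int]) -> int:
    evens = filter(lambda n: n % 2 == 0, values)
    return reduce(lambda acc, sq: acc + sq, map(lambda n: n * n, evens), 0)


# Object-oriented style: behaviour bundled with the data it operates on.
class NumberSeries:
    def __init__(self, values: List[int]) -> None:
        self.values = list(values)

    def sum_even_squares(self) -> int:
        # A generator expression keeps the logic to one readable line.
        return sum(n * n for n in self.values if n % 2 == 0)


if __name__ == "__main__":
    print(sum_even_squares_imperative(numbers))
    print(sum_even_squares_functional(numbers))
    print(NumberSeries(numbers).sum_even_squares())
```

All three print 56 (4 + 16 + 36); the one-line generator expression in the class method is the kind of concise, readable code the paragraph above refers to.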
- Data Lake
A Data Lake is a large storage repository that “holds data until it is needed,” typically keeping the data in its raw, native format. The term was coined by James Dixon, then chief technology officer of Pentaho.
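One common way to read “holds data until it is needed” is schema-on-read: records are stored raw, and each analysis imposes its own structure only when it reads them. The sketch below assumes JSON-lines records held in a Python list as a stand-in for files in a real lake (for example, on HDFS or object storage); the field names and the `read_with_schema` helper are hypothetical.

```python
import json
from typing import Dict, List

# The "lake": raw records kept exactly as they arrived, with no schema enforced on write.
raw_lake: List[str] = [
    '{"user": "ana", "event": "click", "ms": 120}',
    '{"user": "ben", "event": "purchase", "amount": 19.99}',
    '{"user": "ana", "event": "click", "ms": 95, "device": "mobile"}',
]


def read_with_schema(lines: List[str], fields: List[str]) -> List[Dict[str, object]]:
    """Schema-on-read: project each raw record onto the fields one analysis needs.

    Fields missing from a record become None instead of causing a load failure.
    """
    records = []
    for line in lines:
        record = json.loads(line)
        records.append({field: record.get(field) for field in fields})
    return records


if __name__ == "__main__":
    # Two analyses impose two different schemas on the same raw data.
    clicks = [r for r in read_with_schema(raw_lake, ["user", "event", "ms"])
              if r["event"] == "click"]
    revenue = sum(r["amount"] or 0 for r in read_with_schema(raw_lake, ["amount"]))
    print(clicks)
    print(revenue)
```

The design choice being illustrated is that nothing is discarded at load time: the `device` field, which no current analysis uses, is still in the lake for whoever needs it later.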
- NoSQL
A NoSQL (originally referring to “non SQL” or “nonrelational”) database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases. Motivations for this approach include simplicity of design, simpler “horizontal” scaling to clusters of machines (which is a problem for relational databases), and finer control over availability.
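“Horizontal” scaling usually means spreading keys across many machines so that adding machines adds capacity. The toy key-value store below shows the simplest version of that idea, hash partitioning, with each “node” modeled as a Python dict. The class name and key format are hypothetical, and this is not how any specific NoSQL product is implemented.

```python
import hashlib
from typing import Dict, List, Optional


class PartitionedKeyValueStore:
    """Toy key-value store that spreads keys across several 'nodes' by hashing.

    Each node is just a dict here; in a real cluster it would be a separate machine.
    """

    def __init__(self, num_nodes: int) -> None:
        self.nodes: List[Dict[str, str]] = [{} for _ in range(num_nodes)]

    def _node_for(self, key: str) -> Dict[str, str]:
        # Use a stable hash (not Python's per-process salted hash()) so placement is repeatable.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key: str, value: str) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._node_for(key).get(key)


if __name__ == "__main__":
    store = PartitionedKeyValueStore(num_nodes=3)
    for i in range(9):
        store.put(f"customer:{i}", f"profile-{i}")
    print(store.get("customer:4"))
    print([len(node) for node in store.nodes])  # how many keys landed on each node
```

Simple modulo hashing moves most keys when the node count changes, so production systems use other schemes: Cassandra, for instance, uses consistent hashing over a token ring, while HBase splits tables into key ranges called regions.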
Skill 4. SQL
SQL is the old faithful of database languages, almost 40 years old. It has been revived after a lull in the relational world. NoSQL is used in more complex environments with enormous volumes of data, while SQL is used for simpler, “no brainer” applications. And because of the impetus of projects such as Cloudera’s Impala, SQL is on its way to becoming the lingua franca for the next generation of Hadoop-scale data warehouses.
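A short example of the declarative style that keeps SQL attractive, run here against Python’s built-in `sqlite3` module rather than a Hadoop engine; the `orders` table and its rows are hypothetical. SQL-on-Hadoop engines such as Impala and Hive accept broadly similar `SELECT ... GROUP BY` statements, though their dialects differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 30.0), ("north", 200.0)],
)

# Declarative: describe the result wanted, not the loop that produces it.
query = """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY region
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

The query says nothing about how rows are scanned, grouped, or sorted; that freedom is what lets a distributed engine execute essentially the same statement across many machines.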
Skill 5. General-Purpose Programming Languages: Java, C, Python, Scala
General-purpose programming languages such as Java, C, Python, and Scala are very useful for a person with an analytics background, and computer programmers with data analytics backgrounds are in high demand. In computer software, a general-purpose programming language is one designed for writing software in a wide variety of application domains.
Skill 6. Data Mining and Machine Learning
- Data Mining
Data Mining is the computational process of discovering patterns in large data sets (“Big Data”), using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. It is the analysis of data with the intent of discovering gems of hidden information in the vast quantity of data captured in the normal course of running the business.
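One classic data-mining task is finding items that frequently occur together, the first step of association-rule mining in market-basket analysis. The sketch below counts the support of every item pair by brute force; it is a simplified stand-in for algorithms such as Apriori, which prune candidates so the search scales to large data. The function name and sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def frequent_pairs(transactions: List[List[str]],
                   min_support: float) -> Dict[Tuple[str, str], float]:
    """Return item pairs appearing together in at least `min_support` of transactions.

    Support of a pair = (transactions containing both items) / (total transactions).
    """
    counts: Counter = Counter()
    for basket in transactions:
        # sorted(set(...)) drops duplicate items and gives each pair a canonical order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "diapers", "beer"],
        ["milk", "diapers", "beer"],
        ["bread", "milk", "diapers", "beer"],
    ]
    print(frequent_pairs(baskets, min_support=0.75))
```

With a 75% threshold only the pair (beer, diapers) survives, appearing in three of the four baskets: a “hidden” regularity that no single transaction reveals.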
- Machine Learning
Machine learning evolved from the study of pattern recognition and computational learning theory in artificial intelligence. It is the study and construction of algorithms that can learn from data and make predictions on it.
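“Learning from data and making predictions” can be shown with the simplest supervised model: a straight line fitted by ordinary least squares. This is plain Python for illustration, not any particular machine-learning library; the function names and the advertising/sales figures are made up.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Learn slope and intercept by ordinary least squares (closed-form solution)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: Tuple[float, float], x: float) -> float:
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Training data: advertising spend vs. sales (made-up numbers).
    spend = [1.0, 2.0, 3.0, 4.0, 5.0]
    sales = [3.0, 5.0, 7.0, 9.0, 11.0]
    model = fit_line(spend, sales)
    print(model)                # learned (slope, intercept)
    print(predict(model, 6.0))  # prediction for an input not seen in training
```

The “learning” is the estimation of the two parameters from examples; the “prediction” is applying them to an input the model has never seen. The data here lie exactly on y = 2x + 1, so the fit recovers slope 2 and intercept 1.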
Skill 7. Statistical and Quantitative Analysis
This is the crux of what Big Data is all about, and its main purpose. If you have a background in quantitative reasoning and a degree in a field like mathematics or statistics, you are already halfway there. If you have worked with the language R, or have used statistical software, you are several notches up. A quantitative background is a big plus. Quantitative analysis is the analysis of a situation or event by means of complex mathematical and statistical modeling: the translation of data into information and, in turn, into predictive insight.
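As a small example of statistical reasoning turning raw numbers into information, the sketch below uses Python’s standard `statistics` module to standardize values into z-scores and flag unusually large deviations. The 2.0 threshold and the daily order figures are illustrative assumptions, not a recommended rule.

```python
import statistics
from typing import List


def z_scores(values: List[float]) -> List[float]:
    """Standardize each value: how many sample standard deviations it lies from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - mean) / stdev for v in values]


def flag_outliers(values: List[float], threshold: float = 2.0) -> List[float]:
    """Return the values whose absolute z-score exceeds `threshold`."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


if __name__ == "__main__":
    daily_orders = [100, 102, 98, 101, 99, 100, 160]
    print([round(z, 2) for z in z_scores(daily_orders)])
    print(flag_outliers(daily_orders))
```

Here the spike of 160 lies about 2.26 standard deviations above the mean and is flagged, while the other days stay within half a standard deviation; deciding *why* that day differs is where subject matter knowledge comes in.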
Skill 8. Data Visualization
Big Data can be very hard to comprehend if one looks only at numbers and letters. Nothing compares with the comprehension the human brain achieves when your eyes see the “shape of your data.” Visualization is an interface that presents information in an easy-to-understand and easy-to-relate-to, often graphical, way, giving users a lot of meaningful information at a glance.
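Seeing the “shape of your data” does not require special software. The sketch below renders category counts as a plain-text bar chart in Python; in practice a plotting library such as matplotlib would be the usual choice. The `text_histogram` name and the sample complaint categories are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def text_histogram(values: Iterable[str], width: int = 20) -> str:
    """Render category counts as horizontal bars scaled so the largest is `width` chars."""
    counts: Dict[str, int] = Counter(values)
    if not counts:
        return ""
    largest = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    # Largest categories first, so the dominant pattern is visible immediately.
    for label, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        bar = "#" * round(width * n / largest)
        lines.append(f"{label.ljust(label_width)} | {bar} {n}")
    return "\n".join(lines)


if __name__ == "__main__":
    complaints = ["billing"] * 12 + ["network"] * 30 + ["support"] * 6
    print(text_histogram(complaints))
```

Even this crude chart makes it obvious at a glance that network complaints dominate, which takes noticeably longer to see in a raw list of 48 labels.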
Skill 9. Creativity
Creativity is a phenomenon whereby something new and somehow valuable is formed. No matter what software and hardware you use, in whichever industry, your brain is invaluable. The tools listed here will be replaced by others in a few years, but the human brain has developed over a few million years, and the creative potential of our brain cells is monumental. Curiosity is the key to creativity, leading to new ways of looking at Big Data. Can you tell stories based on the data, and can you communicate them to the appropriate audience? Do you like data, and do you like to play with it?
Skill 10. Problem Solving and Subject Matter Expertise
If you have subject matter expertise in a field such as health, finance, telecommunications, or retail, can think outside the box (look at the data differently from the way everybody else is looking at it), are not afraid of swimming against the stream, and don’t take the path of least resistance out of convenience, you are the best candidate for Big Data projects.