
Splunk Knowledge Objects: Splunk Events, Event Types And Tags



In my previous blog, I discussed three knowledge objects: Splunk Timechart, Data Model and Alert, which relate to reporting and visualizing data. In case you want to have a look, you can refer to it here. In this blog, I am going to explain Splunk Events, Event Types and Splunk Tags.
These knowledge objects help to enrich your data in order to make them easier to search and report on.

So, let’s get started with Splunk Events.

Splunk Events

An event refers to any individual piece of data. The custom data that has been forwarded to the Splunk server is called a Splunk Event. This data can be in any format, for example: a string, a number or a JSON object.

Let me show you how events look in Splunk:

[Screenshot: Splunk events with their default fields]
As you can see in the above screenshot, there are default fields (Host, Source, Sourcetype and Time) which get added after indexing. Let us understand these default fields:

  1. Host: Host is the hostname or IP address of the machine or device from which the data originates. In the above screenshot, My-Machine is the host.
  2. Source: Source is where the data on the host comes from. It is the full pathname of a file or directory on that machine.
    For example: C:\Splunk\emp_data.txt
  3. Sourcetype: Sourcetype identifies the format of the data, whether it is a log file, XML, CSV or a thread field. It describes the data structure of the event.
    For example: employee_data
  4. Index: It is the name of the index where the raw data is indexed. If you don't specify anything, it goes into the default index.
  5. Time: It is a field which displays the time at which the event was generated. It is stamped on every event and cannot be changed, although you can rename it or slice it into time ranges to change its presentation.
    For example: 3/4/16 7:53:51 represents the timestamp of a particular event.
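
Putting these default fields together, you can already narrow a search. For illustration, a search along these lines (using the host and sourcetype values from the example above) would retrieve only those employee events:

host="My-Machine" sourcetype="employee_data"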

Now, let us learn how Splunk Event types help you to group similar events.


Splunk Event Types

Assume you have events containing the employee name and the employee ID, and you want to retrieve both with a single search query rather than searching for them individually. Splunk Event Types can help you here. They group these two separate pieces of information, and you can save the search as a single event type (Employee_Detail).

  • A Splunk event type refers to a collection of data which helps in categorizing events based on common characteristics.
  • It is a user-defined field which scans through huge amounts of data and returns the search results in the form of dashboards. You can also create alerts based on the search results.

Do note that you cannot use a pipe character or a subsearch while defining an event type. However, you can associate one or more tags with an event type. Now, let us learn how these Splunk event types are created.
There are multiple ways to create an event type:

  1. Using Search
  2. Using Build Event Type Utility
  3. Using Splunk Web
  4. Configuration files (eventtypes.conf)

Let us go into more detail to understand it properly:

1. Using Search: We can create an event type by writing a simple search query.

Go through the below steps to create one:
> Run a search with the search string.
For example: index=emp_details emp_id=3
> Click Save As and select Event Type.
You can refer to the below screenshot to get a better understanding:

[Screenshot: saving a search as an event type]


2. Using Build Event Type Utility: The Build Event Type utility enables you to dynamically create event types based on events returned by searches. This utility also enables you to assign specific colors to event types.


You can find this utility in your search results. Let’s go through the below steps:
[Screenshot: the event actions menu showing the Build Event Type option]
Step 1: Open the dropdown event menu.

Step 2: Find the down arrow next to the event timestamp.
Step 3: Click Build Event Type.
Once you click on ‘Build Event Type’ displayed in the above screenshot, it will return the selected set of events based on a particular search.

3. Using Splunk Web: This is the easiest way to create an event type.
For this, you can follow these steps:
» Go to Settings
» Navigate to Event Types
» Click New

Let me take the same employee example to make it easy.
Search query would be same in this case:
index=emp_details emp_id=3

Refer to the below screenshot to get a better understanding:

4. Configuration files (eventtypes.conf): You can create event types by directly editing the eventtypes.conf configuration file in $SPLUNK_HOME/etc/system/local.
For Example: “Employee_Detail”
Refer to the below screenshot to get a better understanding:

[Screenshot: the Employee_Detail event type defined in eventtypes.conf]
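
For illustration, a minimal stanza along these lines (using the Employee_Detail name and the employee search from the earlier example; other attributes are optional) would define the event type:

[Employee_Detail]
search = index=emp_details emp_id=3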

By now, you would have understood how event types are created and displayed. Next, let us learn how Splunk tags can be used and how they bring clarity to your data.


Splunk Tags

You must be aware of what a tag means in general. Most of us use the tagging feature in Facebook to tag friends in a post or photo. Even in Splunk, tagging works in a similar fashion. Let's understand this with an example: we have an emp_id field in a Splunk index, and you want to assign a tag (Employee2) to the emp_id=2 field/value pair. We can create a tag for emp_id=2, which can then be searched using Employee2.

  • Splunk tags are used to assign names to specific fields and value combinations.
  • It is a simple way to retrieve results for a field/value pair while searching. Any event type can have multiple tags for quicker results.
  • It helps to search groups of event data more efficiently. 
  • Tagging is done on the key value pair which helps to get information related to a particular event, whereas an event type provides the information of all the events associated with it. 
  • You can also assign multiple tags to a single value.

To create a Splunk tag, go to Settings -> Tags.

Now, you might have understood how a tag is created. Let us now understand how Splunk tags are managed. There are three views in the Tags page under Settings:

  1. List by field value pair
  2. List by tag name
  3. All unique tag objects

Let us get into more details and understand different ways to manage and get quick access to associations that are made between tags and field/value pairs.

1. List by field value pair: This helps you to review or define a set of tags for a field/value pair. You can see the list of such pairings for a particular tag.
Refer to the below screenshot to get a better understanding:

[Screenshot: the List by field value pair view]

 2. List by tag name: It helps you to review and edit the sets of field/value pairs associated with a tag. You can find the list of field/value pairings for a particular tag by going to the 'list by tag name' view and then clicking on the tag name. This takes you to the detail page of the tag.
Example: Open the detail page of the employee2 tag.
Refer to the below screenshot to get a better understanding:

[Screenshot: the List by tag name view]

3. All unique tag objects: It lists all the unique tag names and field/value pairings in your system. You can search for a particular tag to quickly see all the field/value pairs with which it is associated. You can also maintain the permissions here, to enable or disable a particular tag.

Refer to the below screenshot to get a better understanding:

[Screenshot: the All unique tag objects view]

Now, there are 2 ways to search tags:

  • If we need to search a tag associated with a value in any field, we can use:
    tag=<tagname>
    In the above example, it would be: tag=employee2
  • If we are looking for a tag associated with a value in a specified field, we can use:
    tag::<field>=<tagname>
    In the above example, it would be: tag::emp_id=employee2
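
For instance, assuming the emp_details index and the employee2 tag created above, a search like the following would count the tagged events:

index=emp_details tag::emp_id=employee2 | stats count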

In this blog, I have explained three knowledge objects that help to make your searches easier. In my next blog, I will explain some more knowledge objects like Splunk fields, how field extraction works and Splunk lookups. Hope you enjoyed reading my second blog on knowledge objects. Stay tuned for my next blog in this series!


Do you wish to learn Splunk and implement it in your business? Check out our Splunk certification training here, that comes with instructor-led live training and real-life project experience.




Informatica Transformations: The Heart and Soul of Informatica PowerCenter


Informatica Transformations are repository objects which can read, modify or pass data to defined target structures like tables, files, or any other targets required. A transformation is basically used to represent a set of rules which define the data flow and how the data is loaded into the targets. Informatica PowerCenter provides multiple transformations, each serving a particular functionality.

To understand Informatica Transformations better, let us first understand what a mapping is. A mapping is a collection of source and target objects linked together by a set of transformations. Hence, transformations in a mapping represent the operations that the Integration Service will perform on the data during the execution of the workflow. To get a better understanding of workflows, you can check out our blog Informatica Tutorial: Workflow Management.

What are the Various Informatica Transformations?

Informatica Transformations can be classified into two main categories: first, based on the connectivity (linking in the mapping) of the transformations with each other, and second, based on the change in the overall number of rows between the source and the target. Let's start by taking a look at the Informatica transformations based on connectivity.

1) Types of transformations in Informatica based on connectivity:

  • Connected Transformations.
  • Unconnected Transformations.

In Informatica, those transformations which are connected to one or more other transformations are called Connected transformations.

Connected transformations are used when, for every input row, the transformation is called and is expected to return a value. For example, we can use a connected Lookup transformation to get the names of all employees working in a specific department by specifying the Department ID in the lookup expression.

Some of the Major connected Informatica transformations are Aggregator, Router, Joiner, Normalizer, etc.

Those transformations that are not connected to any other transformations are called Unconnected transformations. Their functionality is used by calling them inside other transformations like the Expression transformation. These transformations are not part of the mapping pipeline.

Unconnected transformations are used when their functionality is only required under certain conditions. For example, as a programmer you may wish to perform a complicated operation on the data, but you do not wish to use Informatica transformations like Expression or Filter to perform it. In such a case, you can create an external DLL or UNIX shared library with the code to perform the operation and call it from the External Procedure transformation.

There are 3 Informatica transformations viz. External Procedure, Lookup, and Stored Procedure which can be unconnected in a valid mapping (A mapping which the Integration Service can execute).

2) Types of Informatica transformations based on the change in the number of rows

  • Active Transformations
  • Passive Transformations

Active Transformations– An active transformation can perform any of the following actions:

  • Change the number of rows that pass through the transformation: For instance, the Filter transformation is active because it removes rows that do not meet the filter condition.
  • Change the transaction boundary: A transaction boundary is a boundary that encloses all the transactions before a commit is called, or between two commit calls. For example, during a transactional operation the user may decide that after certain transactions a commit is required and call the commit command to create a savepoint; by doing so the user changes the default transaction boundary. By default, the transaction boundary lies between the start of the file and the auto-commit point or EOF.
  • Change the rowtype attribute: The rowtype attribute is a record type that represents a row in a table. The record can store an entire row of data selected from the table or fetched from a pointer or pointer variable. For example, the Update Strategy transformation flags the rowtype as 0 for insert, 1 for update, 2 for delete or 3 for reject.
  • Aggregator, Filter, Joiner, Normalizer, etc. are a few examples of Active transformation.

Passive Transformation: A passive transformation is one which will satisfy all these conditions:

  • The number of rows before and after transformation is the same.
  • Maintains the transaction boundary.
  • Maintains the rowtype attribute.
  • Expression, ExternalProcedure, HTTP, etc. are a few examples of Passive transformation.

In a passive transformation, no new rows are created and no existing rows are dropped.

You must be wondering what passive transformations are used for if they do not change the number of rows. They are generally used to update values, to call an external procedure from a shared library, and to define the input and output of mapplets. A mapplet is a collection of only the transformations from a mapping. For example, for a student database we may wish to update the values of the marks column to percentiles instead of percentages; this can be done using an Expression transformation, which converts the values and updates the same column, keeping the overall number of rows the same after the transformation.

There is no restriction that a transformation used as a passive transformation cannot later be used as an active one. Similarly, an unconnected transformation can be used as a connected transformation as per need. All possible combinations can be formed between these categories, and this is the magic of Informatica transformations. You will get a better idea later in this blog about the possible types a transformation can belong to.

Now that we have gotten an understanding of the various types of Informatica transformations, let’s begin exploring them. Below are a few major types of Informatica transformations:

Transformation | Type | Description
Aggregator | Active, Connected | Performs aggregate calculations.
Expression | Passive, Connected | Calculates a value.
Java | Active Connected or Passive Connected | Executes user logic coded in Java. The bytecode for the user logic is stored in the repository.
Joiner | Active, Connected | Joins data from different databases or flat file systems.
Lookup | Active Connected, Passive Connected, Active Unconnected or Passive Unconnected | Looks up and returns data from a flat file, relational table, view, or synonym.
Normalizer | Active, Connected | Used in the pipeline to normalize data from relational or flat file sources.
Rank | Active, Connected | Limits records to a top or bottom range.
Router | Active, Connected | Routes data into multiple transformations based on group conditions.
SQL | Active Connected or Passive Connected | Executes SQL queries against a database.
Union | Active, Connected | Merges data from different databases or flat file systems.
XML Generator | Active, Connected | Reads data from one or more input ports and outputs XML through a single output port.
XML Parser | Active, Connected | Reads XML from one input port and outputs data to one or more output ports.
XML Source Qualifier | Active, Connected | Represents the rows that the Integration Service reads from an XML source when it runs a session.

 

Let us now start looking at the transformations one by one.

Aggregator Transformation

Aggregator transformation is an Active and Connected transformation. This Informatica transformation is useful to perform calculations such as averages and sums (mainly to perform calculations on multiple rows or groups). For example, to calculate the total number of daily sales or to calculate the average of monthly or yearly sales. Aggregate functions such as AVG, FIRST, COUNT, PERCENTILE, MAX, SUM, etc., can be used in aggregate transformation.
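
As a quick sketch (the port names STORE_ID and SALES are illustrative, not taken from a specific example in this blog), an Aggregator with STORE_ID selected as the Group By port and output ports defined as follows would return one row per store:

TOTAL_SALES = SUM(SALES)
AVERAGE_SALE = AVG(SALES)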

Lookup Transformation

Lookup transformation is the most popular and widely used Informatica transformation. Based on the user's requirement, the lookup transformation can be used as a Connected or Unconnected transformation, and can behave as an Active or Passive transformation. It is mainly used to look up details from a source, source qualifier, or target in order to get the relevant required data. You can also look up a flat file, relational table, view or synonym. One can use multiple lookup transformations in a mapping.

The lookup transformation is created with the following types of ports (logical points for the transfer of information):

  • Input port (I)
  • Output port (O)
  • Look up Ports (L)
  • Return Port (R) (Only in case of Unconnected lookup)

Differences between Connected and UnConnected Lookup Transformation:

  • A Connected lookup receives input values directly from the mapping pipeline, whereas an Unconnected lookup receives values from the lookup expression of another transformation. (Sources, transformations and targets connected together in a mapping are considered a pipeline.)
  • A Connected lookup returns multiple columns from the same row as it has multiple return ports, whereas an Unconnected lookup has only one return port and returns one column from each row. For example, if we use a connected lookup on an employee database with a specific department ID as a parameter, we can get all the details of the employees of that department, such as their names, employee ID numbers, addresses, etc., whereas with an Unconnected lookup we can get only one attribute of the employee, like the name or employee ID number or any attribute specified by the user.
  • A Connected lookup caches all lookup columns, whereas an Unconnected lookup caches only the lookup output and lookup conditions.
  • A Connected lookup supports user-defined default values, whereas an Unconnected lookup does not. For example, if you wish to change all values of a certain column to NULL after lookup, you can set the default value of those columns to NULL in the lookup expressions. This is not possible with an Unconnected lookup.

Let's say that from a customer database, I wish to know the details of customers who have more than 1 non-cancelled invoice. To obtain this data, we can use a lookup transformation.

Here are the steps. 

  1. Begin by loading the Invoice table as the source into the mapping designer. In case you are not clear on how to load source data into the Designer, click here.
  2. Let us now filter out the cancelled invoices. To do this, create a new filter named fil_ODS_CUSTOMER_ACTIVE on the Source Qualifier with the property NOT (ISNULL (DATE_CLOSED)) AND CANCELED = 0.
  3. Now add a lookup transformation named lkp_CUSTOMER in the designer.
  4. Specify the lookup table as the Customer table.
  5. Double click on the header of lkp_CUSTOMER to open the edit menu. Under the Condition tab, set the lookup condition as CUST_ID = CUST_NO.
  6. In the Properties tab, change the Connection Information to $Source and click on OK to save the transformation.
  7. Link the lkp_CUSTOMER ports to the ODS_CUSTOMER_ACTIVE ports to complete the required transformation, where ODS_CUSTOMER_ACTIVE is the required target.
  8. The final iconic map including the lookup transformation should look as below.

Expression Transformation

Expression transformation is a Passive and Connected Informatica transformation. Expression transformations are used for row-wise manipulation. For any type of manipulation you wish to perform on an individual record, use an Expression transformation. The Expression transformation accepts the row-wise data, manipulates it, and passes it to the target. For example, to calculate the discount for each product or to concatenate first and last names or to convert dates to a string field.
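
As a small sketch (the port names are illustrative), an Expression transformation with input ports FIRST_NAME and LAST_NAME could populate an output port FULL_NAME using the expression:

FULL_NAME = FIRST_NAME || ' ' || LAST_NAME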

Joiner Transformation 

The Joiner transformation is an Active and Connected Informatica transformation used to join two heterogeneous sources. The joiner transformation joins sources based on a specified condition that matches one or more pairs of columns between the two sources. The two input pipelines include a master and a detail pipeline or branch. To join more than two sources, you need to join the output of the joiner transformation with another source. To join n number of sources in a mapping, you need n-1 joiner transformations. The Joiner transformation supports the following types of joins:
  • Normal
  • Master Outer
  • Detail Outer
  • Full Outer
Normal join discards all the rows of data from the master and detail sources that do not match, based on the condition.
Master outer join discards all the unmatched rows from the master source and keeps all the rows from the detail source plus the matching rows from the master source.
Detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.
Full outer join keeps all rows of data from both the master and detail sources.

We cannot join more than two sources using a single joiner. To join three sources, we need to have two joiner transformations.

Let's say we want to join three tables – Employees, Departments and Locations – using the Joiner. We will need two joiners: Joiner-1 will join Employees and Departments, and Joiner-2 will join the output from Joiner-1 with the Locations table.

Here are the steps:

  1. Bring the three sources into the mapping designer.
  2. Create Joiner-1 to join Employees and Departments using Department_ID.
  3. Create the next joiner, Joiner-2. Take the output from Joiner-1 and the ports from the Locations table and bring them into Joiner-2. Join these two data sources using Location_ID.
  4. The last step is to send the required ports from Joiner-2 to the target, either directly or via an Expression transformation.

 

Union Transformation

The Union transformation is an Active and Connected Informatica transformation. It is used to merge multiple datasets from various streams or pipelines into one dataset. This Informatica transformation works like the UNION ALL statement in SQL, which means it does not remove any duplicate rows. It is recommended to use an Aggregator to remove duplicates which are not expected at the target.

Normalizer Transformation

Normalizer Transformation is an Active and Connected Informatica transformation. It is one of the most widely used Informatica transformations mainly with COBOL sources where most of the time data is stored in de-normalized format. Also, Normalizer transformation can be used to create multiple rows from a single row of data.

Let's try to load a comma separated flat file from a flat file/COBOL source.

Here are the steps:

  1. Start by loading the Store flat file containing the store name and the quarterly revenue.
  2. Create a new Normalizer transformation named NRM_STORE_EXP with two ports, Store and Quarter (Quarter repeats 4 times because we have data for 4 quarters).
  3. The Ports tab should look as seen below.
  4. Copy/link the following columns and connect them to the Normalizer transformation:
    Store
    Quarter1
    Quarter2
    Quarter3
    Quarter4
  5. Create a new Expression transformation named exp_STORE. Copy/link the following columns and connect them to the Expression transformation:
    Store
    Quarter
    GK_QUARTER
    GCID_QUARTER
  6. Link the expression to the final target to complete the mapping using the Normalizer transformation.

 


XML transformation

XML transformation is an Active and Connected Informatica transformation. It is mainly used when the source file or the data is of XML type. XML transformations can be classified into 3 types:

  • XML Source Qualifier Transformation.
  • XML Parser Transformation.
  • XML Generator Transformation.

XML Source Qualifier Transformation: XML Source Qualifier is an Active and Connected transformation. XML Source Qualifier is used only with an XML source definition. It represents the data elements that the Informatica Server reads when it executes a session with XML sources.  XML Source Qualifier has one input or output port for every column in the source. If you remove an XML source definition from a mapping, the Designer also removes the corresponding XML Source Qualifier transformation.

XML Parser Transformation : XML Parser Transformation is an Active and  Connected transformation. XML Parser transformation is used to extract XML inside a pipeline and then pass this to the target. The XML is extracted from the source systems such as files or databases. The XML Parser transformation reads XML data from a single input port and writes data to one or more output ports.

XML Generator Transformation :  XML Generator is an Active and Connected transformation. XML Generator transformation is used to create XML inside a pipeline. XML Generator Transformation reads data from one or more input ports and outputs XML through a single output port.

Rank Transformation

Rank transformation is an Active and Connected transformation. It is an Informatica transformation that helps you select the top or bottom rank of data. For example, to select the top 10 regions where the sales volume was very high, or to select the 10 lowest priced products.

Consider that you wish to load the first and the last record into a target table from an employee database. The idea behind this is to add a sequence number to the records and then take the Top 1 rank and the Bottom 1 rank from the records.

  1. Drag and drop the ports from the source qualifier to two Rank transformations.
  2. Create a reusable sequence generator with start value 1 and connect its next value port to both Rank transformations.
  3. Set the properties of Rank – 1 as follows. The newly added sequence port should be chosen as the Rank Port. There is no need to select any port as Group By Port.
  4. Set the properties of Rank – 2 similarly.
  5. Make two instances of the target and connect the output ports to them.

Router Transformation

Router is an Active and Connected transformation. It is similar to the Filter transformation. The only difference is that the Filter transformation drops the data that does not meet the condition, whereas the Router has an option to capture the data that does not meet the condition. It is useful for testing multiple conditions. It has input, output and default groups.

Let’s say you wish to separate the odd and even records of a table, this can be done by using a router transformation. 

The idea is to add a sequence number to the records and then divide the record number by 2. If it is divisible, then move it to even target and if not then move it to odd target.

  1. Drag the source and connect it to an Expression transformation.
  2. Add the next value port of a sequence generator to the Expression transformation.
  3. In the Expression transformation make two ports, one named "odd" and the other "even".
  4. Write the expressions for these two ports (a sketch is given after this list).
  5. Connect a Router transformation to the Expression transformation.
  6. Make two groups under the Router transformation.
  7. Give the group conditions (see the sketch after this list).
  8. Then send the two groups to different targets. This is the entire flow.
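
Since the screenshots are not reproduced here, the following is a minimal sketch of the expressions and router conditions (assuming the sequence generator's NEXTVAL port is linked to the Expression transformation; the group names are illustrative):

In the Expression transformation:
odd = MOD(NEXTVAL, 2)
even = 1 - MOD(NEXTVAL, 2)

In the Router transformation, group filter conditions:
ODD_GROUP: odd = 1
EVEN_GROUP: even = 1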

I hope this Informatica Transformations blog was helpful in building your understanding of the various Informatica transformations and has created enough interest to learn more about Informatica.


If you found this blog helpful, you can also check out our Informatica Tutorial blog series: What is Informatica: A Beginner Tutorial of Informatica PowerCenter and Informatica Tutorial: Understanding Informatica 'Inside Out'. In case you are looking for details on Informatica certification, you can check our blog Informatica Certification: All there is to know.

If you have already decided to take up Informatica as a career, I would recommend you have a look at our Informatica Training course page. The Informatica Certification training at Edureka will make you an expert in Informatica through live instructor-led sessions and hands-on training using real-life use cases.


What is AWS? – An Introduction to AWS


Introduction

What is AWS? Amazon Web Services (AWS) is a cloud platform from Amazon which provides services in the form of building blocks; these building blocks can be used to create and deploy any type of application in the cloud.

These services or building blocks are designed to work with each other, and result in applications which are sophisticated and highly scalable.


Each type of service in this “What is AWS” blog is categorized under a domain. A few of the widely used domains are:

  • Compute
  • Storage
  • Database
  • Migration
  • Network and Content Delivery
  • Management Tools
  • Security & Identity Compliance
  • Messaging

The Compute domain includes services related to compute workloads. It includes the following services:

  • EC2 (Elastic Compute Cloud)
  • Lambda
  • Elastic Beanstalk
  • Amazon LightSail

The Storage domain includes services related to data storage. It includes the following services:

  • S3 (Simple Storage Service)
  • Elastic Block Store
  • Amazon Glacier
  • AWS Snowball

The Database domain is used for database-related workloads. It includes the following services:

  • Amazon Aurora
  • Amazon RDS
  • Amazon DynamoDB
  • Amazon RedShift

The Migration domain is used for transferring data to or from the AWS infrastructure. It includes the following services:

  • AWS database Migration Service
  • AWS SnowBall

The Networking and Content Delivery domain is used for isolating your network infrastructure, while content delivery is used for faster delivery of content. It includes the following services:

  • Amazon Route 53
  • AWS CloudFront

The Management Tools domain consists of services which are used to manage other services in AWS. It includes the following services:

  • AWS CloudWatch
  • AWS CloudFormation
  • AWS CloudTrail

The Security, Identity & Compliance domain consists of services which are used to manage authentication and provide security for your AWS resources. It consists of the following services:

  • AWS IAM
  • AWS KMS
  • AWS Shield

The Messaging domain consists of services which are used for queuing, notifying or emailing messages. It consists of the following services:

  • Amazon SQS
  • Amazon SNS
  • Amazon SES
  • Amazon Pinpoint

To learn more about the products of AWS, you can refer to our Amazon AWS Tutorial, which contains detailed information about all of these services.


You now have a fair idea about what AWS is and the services it covers, so let's go ahead and straightaway apply this knowledge to build applications. You might feel that you don't know much about AWS, but then,

Sometimes you have to run before you can walk! 

Keeping that in mind, let's understand how one builds applications in AWS:

Building Applications

First and foremost, you should analyze, what is your application about? Is it something that requires you to be worried about the underlying infrastructure? Is it something that requires a database? Is it something which will require monitoring?

So, once you know all the requirements about your application, you can pick the domain, and hence choose a service.

For example, say you want to deploy an application in AWS which does not require you to worry about the underlying architecture; which service would you choose?

Well, in the compute section there is this service called Elastic Beanstalk. You just upload your application, and AWS does the rest for you. It’s that simple!

Of course, you wouldn't know about any of these services without using them, right? That's why AWS came up with an amazing free tier option.

Who is eligible for a free tier?

Every customer receives the free tier option from the time of registration on AWS, and remains eligible for it for one year from that date.

How shall this help?

You can try the various services in AWS and learn! The more you practice, the more you learn about what AWS is.

So basically, you learn for free!

How do you sign up on AWS?

Step 1: Go to aws.amazon.com and click on Create an AWS Account.

Step 2: Click on the ‘I am a new customer’ option, enter your email address and click on Sign In.

Step 3: On the next page, fill in all the relevant information and click on Create Account.

Step 4: On the next page, fill in your personal details and click on Create Account.

Step 5: You will be asked to enter your credit or debit card details on this page; once you do that, proceed by clicking on Continue.

Step 6: The next step is to verify your phone number; enter the details and click on Call Me Now.

Step 7: You will get a call from AWS and will be asked to enter a PIN. Next you will select your plan for AWS, but before that click on Next.

Step 8: Select a plan which suits you. I will be going with the Basic plan, since this account is for personal use.

Step 9: Congrats! Your AWS account is ready to be used! Go sign in and play!

Now, since you have an AWS account at your disposal, why not do some hands-on? What say?

Let’s host a PHP website on EC2 and back it up with an RDS MySQL database. Not familiar with the services? Let me brief you up:

EC2 (Elastic Compute Cloud) is a compute service offered by AWS which provides resizable compute capacity in the cloud.

Learn AWS Now!

So in simple words, you get a server with custom compute capacity, and this capacity can be adjusted according to your needs. Cool, right? Want to know more? Check out the AWS EC2 blog.

Let's discuss RDS now. RDS is the Relational Database Service, which supports different database engines like MySQL, PostgreSQL etc.

So basically, RDS manages these databases for you. How? Check out this blog on AWS RDS.

Demo

We will be creating a small application on EC2-RDS infrastructure in this What is AWS blog. By the end, you will have a PHP application on EC2, backed by a fully managed MySQL server.  

Let’s start by deploying an EC2 instance first in this What is AWS blog.

Step 1: Log in to the AWS Management Console.

Step 2: Select a region from the drop-down.

Step 3: Click EC2 under the Compute section. This will take you to the EC2 dashboard.

Step 4: Select Launch Instance and then select an AMI. For our example in this What is AWS blog, we will be selecting a Windows Server 2016 instance, which falls under the free tier.

Step 5: Once you have selected your desired AMI, select your instance type. This is basically where you decide how much computing power you need to start with; since ours is a small application, the free tier will suffice.

Step 6: Configure all the details and then click on Add Storage.

Step 7: Here you will configure your storage devices; once done, click on Tag Instance.

Step 8: Here you will tag your instance; this is how your instance will be identified.

Step 9: Now you will configure your security group.

Step 10: Check all your settings and, once verified, launch your instance!

Step 11: In the next step you will be prompted for a key pair; create one and download it to a handy location.

Step 12: Select your instance and click on Connect.

Step 13: Once you click Connect, you will be prompted with the following screen. Copy the public IP and then click on Get Password.

Step 14: Select the key pair that you downloaded, then click on Decrypt Password.

Step 15: Copy the password and the public IP and keep them handy for the next step.

Step 16: We have the public IP and the password now, so let's connect to our instance! Open the remote desktop manager, enter the public IP address and click on Connect.

Step 17: Enter the saved password here and click on OK.

Step 18: Congratulations! Windows Server on EC2 at your service!

 

Next, let's create an RDS instance for MySQL.

Step 1: Select the RDS service from the AWS Management Console.

Step 2: Since we will be launching a MySQL instance, select MySQL from the list of database engines.

Step 3: Since we are creating this instance for demo purposes, select the Dev/Test option and click on Next Step.

Step 4: On the next page you will fill in the following details:

  • You can select your desired DB instance class here.
  • You can select whether you want Multi-AZ enabled for your MySQL DB.
  • You can select how much storage you want to allocate to your DB instance; it can vary from 5 GB to 6 TB.
  • In the end you will set the username and password for your DB instance.

Step 5: In the next step, you will configure the Advanced Settings for your DB:

  • You will select the VPC here; if you do not wish to launch your instance in a specific VPC, you can leave the default settings and move ahead.
  • In the next section you can select which version of the DB engine you want to use; for our example we are using MySQL 5.6.
  • In the next section you can set your backup preferences, like the retention period etc.
  • After that you will set the maintenance window; this is the time frame during which your DB instances will be updated.
  • Once you have filled in all the details, launch the DB instance!

Step 6: Congratulations on your first RDS instance!

Next in this What is AWS demo, let’s configure your RDS instance to connect to your EC2 server.

Step 1: On your RDS dashboard, select your RDS instance.

Step 2: You have to edit the Security Group here. Why? Because you want your EC2 instance to be able to connect to your RDS instance, and for that you have to add the IP address of your EC2 instance here.

Step 3: Select the Security Group, then select the Inbound rules, then click on Edit.

Step 4: Select MySQL/Aurora, and then enter the public IP address of your EC2 instance in the second field. Any IP address that you enter here should be followed by a ‘/32’ to convert it into CIDR notation. In the end, click Save.

That’s it! Your RDS instance is ready to receive commands from your EC2 instance.

What next? You would need MySQL Workbench or a MySQL client/server to connect to your RDS instance. I installed MySQL on the EC2 instance itself; you can do that on your localhost too.

Note: If you are doing it on your localhost, be sure to add your IP address in the Security Group of your RDS instance, like we did in the above step in this What is AWS blog.

Let’s connect to the RDS Instance Now!

Step 1: Open the command prompt and navigate to the bin folder of your MySQL installation.

Step 2: Next, copy the endpoint from your RDS instance dashboard; you will need it in the next step of this What is AWS blog demo to connect to your RDS instance. The endpoint is how your RDS instance gets identified. Following the endpoint is the port number ‘3306’, which you will also need in the next step.

Step 3: Come back to the command prompt and type the following command. You will be prompted for the password; enter the password that you set while creating your RDS instance and you are set!

mysql -h xx.rds.amazonaws.com -P <port number> -u <username> -p

You can create your database and relevant tables here. I have already created mine; for your reference, I am creating the sample ones here.
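
For reference, statements along these lines would create a database and table matching the PHP code shown later in this blog (edu_test and test are the names used in that code; the column sizes are illustrative assumptions):

CREATE DATABASE edu_test;
USE edu_test;
CREATE TABLE test (
  name VARCHAR(50),   -- name entered on the registration page
  email VARCHAR(50)   -- email entered on the registration page
);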
Your RDS service is set now!

Let’s move on, to the most exciting part of this What is AWS blog! Hosting your website!!

Step 1: On your EC2 instance, click on Start and then Server Manager.

Step 2: Click on Add roles and features.

Step 3: Click Next on the first page; on the second page, select the following option and click Next.

Step 4: Select the server pool option and click on Next.

Step 5: Select the Web Server (IIS) role from the list and click on Next.

Step 6: Select the .NET Framework features mentioned here, and click Next.

Step 7: This is the confirmation page; go through what is getting installed, and click on Install.

Once IIS is installed, you will be able to see it on your Server Manager dashboard.

After this, install Microsoft Web Platform Installer from here.

Step 8: Open IIS now, double click the server, and click on Web Platform Installer in the Management section.

We will be deploying a PHP web application, so we need PHP installed on this server, which is why we need the Web Platform Installer.

Step 9: Search for PHP in the search bar of the Web Platform Installer and install the following package.

 

*Note: If your PHP Manager fails to install, there are some values that you have to change in the registry; you can refer to this post.

Step 10: Once installed, you can view PHP Manager in IIS.

Your EC2 server is ready to host a website now!

Let’s upload your website to this EC2 server.

Step 1: First, copy all the files of your website to the folder “C:/inetpub/wwwroot” on this server.

Step 2: Return to IIS, click on your server and then right click on Sites.

Step 3: In the Site name field, give a relevant name to your website, in the Physical path field give the path used in Step 1, and in the end click OK.

Your website is now live!

Step 4: Enter the public IP address of your EC2 instance and voila! Your website is up and running. Enter the details and click on Add.

Step 5: This shows that your RDS connection with your EC2 instance is working well. Whatever you enter here gets stored on your RDS instance, while your website itself is served from your EC2 instance. Click on Go Back.

Step 6: On the main page, click on View Results, and you shall see this page.

These are the records which are present in your MySQL table.


Need the code for this application? Here you go:

index.php

<!DOCTYPE html>
<html>
<body>
<h1>Registration Page</h1>
<form action="process.php" method="post">
<b> Name: </b> <input type="text" name="name"><br>
<b> Email: </b> <input type="text" name="email"><br>
<input type="Submit" value="add">
<a href="result.php">View Results</a>
</form>
</body>
</html>

process.php

<html>
<body>
<?php
// Read the values submitted from the registration form on index.php
$name=$_POST['name'];
$email=$_POST['email'];
// RDS connection details (endpoint, credentials, database and table created earlier)
$hostname='edureka-test.xx.rds.amazonaws.com';
$username='edureka';
$password='edureka1234';
$dbname='edu_test';
$usertable='test';
// Connect to the RDS MySQL instance and select the database
$con=mysqli_connect($hostname,$username, $password) OR DIE ('Unable to connect to database! Please try again later.');
mysqli_select_db($con,$dbname);
// Insert the submitted name and email into the table
$query = "insert into ".$usertable." values('".$name."','".$email."');";
mysqli_query($con,$query) or die("Not Updated!");
echo "Insertion Successful!!";
?>
<br>
<a href="index.php">Go Back</a>
</body>
</html>

result.php

<html>
<?php
// RDS connection details (same database and table used by process.php)
$hostname='edureka-test.xx.rds.amazonaws.com';
$username='edureka';
$password='edureka1234';
$dbname='edu_test';
$usertable='test';
$yourfield1='name';
$yourfield2='email';
// Connect to the RDS MySQL instance and select the database
$con=mysqli_connect($hostname,$username, $password) OR DIE ('Unable to connect to database! Please try again later.');
mysqli_select_db($con,$dbname);
// Fetch all the records from the table and print them as an HTML table
$query = 'SELECT * FROM test';
$result = mysqli_query($con,$query);
echo '<body><center><table border=3><tr><td><b>Name</b></td><td><b>Email</b></td></tr>';
if($result) while($row = mysqli_fetch_array($result)){
 $name = $row[$yourfield1];
 $email= $row[$yourfield2];
 echo '<tr><td>' . $name . '</td>' ;
 echo ' <td> ' . $email . '</td></tr>' ;
 }
 ?>
 </table>
 <a href="index.php"> Go Back </a>
 </body>
 </html>

So this is it, guys! I hope you enjoyed this What is AWS blog. If you are reading this, congratulations! You are no longer a newbie in AWS! The things that you learnt in the hands-on part of this What is AWS blog are what is required in an AWS interview. The more you practice, the more you will learn. To make your journey easy, we have come up with Top AWS Interview Questions. To learn more about AWS you can refer to our Amazon AWS Tutorial blog. We have also come up with a curriculum which covers exactly what you would need to crack the Solution Architect exam! You can have a look at the course details for AWS Solution Architect training.


Got a question for us? Please mention it in the comments section of this What is AWS blog and we will get back to you.


Spark SQL Tutorial – Understanding Spark SQL With Examples


Apache Spark is a lightning-fast cluster computing framework designed for fast computation. It is one of the most successful projects in the Apache Software Foundation. Spark SQL is a new module in Spark which integrates relational processing with Spark's functional programming API. It supports querying data either via SQL or via the Hive Query Language.

For those of you familiar with RDBMS, Spark SQL will be an easy transition from your earlier tools, where you can extend the boundaries of traditional relational data processing. Through this blog, I will introduce you to this exciting new domain of Spark SQL, and together we will equip ourselves to lead our organizations to leverage the benefits of relational processing and call complex analytics libraries in Spark.

The following provides the storyline for the blog:

  1. Why Spark SQL came into picture?
  2. Spark SQL Overview
  3. Spark SQL Libraries
  4. Features of Spark SQL
  5. Querying using Spark SQL
  6. Adding Schema to RDDs
  7. RDDs as Relations
  8. Caching Tables In-Memory

Why Spark SQL Came Into Picture?

Spark SQL originated as a way to run Apache Hive workloads on top of Spark and is now integrated with the Spark stack. Apache Hive had certain limitations, as mentioned below; Spark SQL was built to overcome these drawbacks and replace Apache Hive.

Limitations With Hive:

  • Hive launches MapReduce jobs internally to execute ad-hoc queries, and MapReduce lags in performance when it comes to the analysis of medium sized datasets (10 to 200 GB).
  • Hive has no resume capability. This means that if the processing dies in the middle of a workflow, you cannot resume from where it got stuck.
  • Hive cannot drop encrypted databases in cascade when trash is enabled, which leads to an execution error. To overcome this, users have to use the Purge option to skip trash instead of Drop.

These drawbacks gave way to the birth of Spark SQL.


Spark SQL Overview

Spark SQL integrates relational processing with Spark’s functional programming. It provides support for various data sources and makes it possible to weave SQL queries with code transformations thus resulting in a very powerful tool.

Let us explore what Spark SQL has to offer. Spark SQL blurs the line between RDDs and relational tables. It offers much tighter integration between relational and procedural processing, through declarative DataFrame APIs which integrate with Spark code. It also provides higher optimization. The DataFrame API and the Datasets API are the ways to interact with Spark SQL.

With Spark SQL, Apache Spark is accessible to more users and improves optimization for the current ones. Spark SQL provides DataFrame APIs which perform relational operations on both external data sources and Spark's built-in distributed collections. It introduces an extensible optimizer called Catalyst, which helps in supporting a wide range of data sources and algorithms in Big Data.

Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). It is easy to run locally on one machine; all you need is to have Java installed on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation.


Figure: Architecture of Spark SQL.

Spark SQL Libraries

Spark SQL has the following four libraries which are used to interact with relational and procedural processing:

  1. Data Source API (Application Programming Interface):

    This is a universal API for loading and storing structured data.

    • It has built in support for Hive, Avro, JSON, JDBC, Parquet, etc.
    • Supports third party integration through Spark packages
    • Support for smart sources.
  2. DataFrame API:

    A DataFrame is a distributed collection of data organized into named columns. It is equivalent to a relational table in SQL used for storing data into tables.

    • It is a data abstraction and Domain Specific Language (DSL) applicable to structured and semi-structured data.
    • The DataFrame API is a distributed collection of data in the form of named columns and rows.
    • It is lazily evaluated, like Apache Spark transformations, and can be accessed through the SQL Context and Hive Context.
    • It can process data ranging in size from kilobytes on a single-node cluster to petabytes on multi-node clusters.
    • Supports different data formats (Avro, CSV, Elastic Search and Cassandra) and storage systems (HDFS, HIVE Tables, MySQL, etc.).
    • Can be easily integrated with all Big Data tools and frameworks via Spark-Core.
    • Provides API for Python, Java, Scala, and R Programming.

  3. SQL Interpreter And Optimizer:

    The SQL Interpreter and Optimizer is based on functional programming and is constructed in Scala.

    • It is the newest and most technically evolved component of SparkSQL. 
    • It provides a general framework for transforming trees, which is used to perform analysis/evaluation, optimization, planning, and run time code spawning.
    • This supports cost based optimization (run time and resource utilization is termed as cost) and rule based optimization, making queries run much faster than their RDD (Resilient Distributed Dataset) counterparts.

         e.g. Catalyst is a modular library built as a rule-based system. Each rule in the framework focuses on a distinct optimization.

  4. SQL Service

    SQL Service is the entry point for working with structured data in Spark. It allows the creation of DataFrame objects as well as the execution of SQL queries; a minimal sketch follows this list.
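
As a minimal sketch (run in the Spark shell, where the 'spark' session is already available, and using the employee.json file from the querying examples later in this blog):

val df = spark.read.json("examples/src/main/resources/employee.json")
df.createOrReplaceTempView("employee")
spark.sql("SELECT name, age FROM employee WHERE age > 30").show()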

Features Of Spark SQL

The following are the features of Spark SQL:

  1. Integration With Spark
    Spark SQL queries are integrated with Spark programs. Spark SQL allows us to query structured data inside Spark programs, using SQL or a DataFrame API which can be used in Java, Scala, Python and R. To run streaming computation, developers simply write a batch computation against the DataFrame / Dataset API, and Spark automatically runs it incrementally in a streaming fashion. This powerful design means that developers don't have to manually manage state, failures, or keeping the application in sync with batch jobs. Instead, the streaming job always gives the same answer as a batch job on the same data.

  2. Uniform Data Access
    DataFrames and SQL support a common way to access a variety of data sources, like Hive, Avro, Parquet, ORC, JSON, and JDBC, and you can even join data across these sources. This is very helpful in accommodating all the existing users into Spark SQL.

  3. Hive Compatibility

    Spark SQL runs unmodified Hive queries on current data. It reuses the Hive front-end and metastore, giving full compatibility with current Hive data, queries, and UDFs.

  4. Standard Connectivity
    Connection is through JDBC or ODBC. JDBC and ODBC are the industry norms for connectivity for business intelligence tools.

  5. Performance And Scalability
    Spark SQL incorporates a cost-based optimizer, code generation and columnar storage to make queries agile while scaling to thousands of nodes using the Spark engine, which provides full mid-query fault tolerance. The interfaces provided by Spark SQL give Spark more information about the structure of both the data and the computation being performed. Internally, Spark SQL uses this extra information to perform additional optimization. Spark SQL can directly read from multiple sources (files, HDFS, JSON/Parquet files, existing RDDs, Hive, etc.). It ensures fast execution of existing Hive queries.

    The image below depicts the performance of Spark SQL when compared to Hadoop. Spark SQL executes up to 100x faster than Hadoop.

    Figure: Runtime of Spark SQL vs Hadoop. Spark SQL is faster.
    Source: Cloudera Apache Spark Blog

  6. User Defined Functions
    Spark SQL has language integrated User-Defined Functions (UDFs). UDF is a feature of Spark SQL to define new Column-based functions that extend the vocabulary of Spark SQL’s DSL for transforming Datasets. UDFs are black boxes in their execution.

    The example below defines a UDF to convert a given text to upper case.

    Code explanation:
    1. Creating a DataFrame with columns ‘id’ and ‘text’ containing the words “hello” and “world”.
    2. Defining a function ‘upper’ which converts a string into upper case.
    3. We now import the ‘udf’ package into Spark.
    4. Defining our UDF, ‘upperUDF’ and importing our function ‘upper’.
    5. Displaying the results of our User Defined Function in a new column ‘upper’.

     

    val dataset = Seq((0, "hello"),(1, "world")).toDF("id","text")
    val upper: String => String = _.toUpperCase
    import org.apache.spark.sql.functions.udf
    val upperUDF = udf(upper)
    dataset.withColumn("upper", upperUDF('text)).show
    

Figure: Demonstration of a User Defined Function, upperUDF

Code explanation:
1. We now register our function as ‘myUpper’
2. Cataloging our UDF among the other functions.

spark.udf.register("myUpper", (input:String) => input.toUpperCase)
spark.catalog.listFunctions.filter('name like "%upper%").show(false)

  Figure: Results of the User Defined Function, upperUDF
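
Once registered, ‘myUpper’ can be called from SQL like any built-in function. A minimal sketch (the ‘words’ view below is created purely for illustration):

import spark.implicits._
val words = Seq("hello", "world").toDF("text")
words.createOrReplaceTempView("words")
spark.sql("SELECT text, myUpper(text) AS upper_text FROM words").show()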

Querying Using Spark SQL

We will now start querying using Spark SQL. Note that the actual SQL queries are similar to the ones used in popular SQL clients.

Starting the Spark Shell: Go to the Spark directory and execute ./bin/spark-shell in the terminal to start the Spark Shell.

For the querying examples shown in the blog, we will be using two files, ’employee.txt’ and ’employee.json’. The images below show the content of both files. Both files should be placed in the ‘examples/src/main/resources/’ directory inside the folder containing the Spark installation (~/Downloads/spark-2.0.2-bin-hadoop2.7). So, all of you who are executing the queries, place the files in this directory or set the path to your files in the lines of code below.

Employee TXT File - Spark SQL - Edureka

Figure: Contents of employee.txt

Employee JSON File - Spark SQL - Edureka

Figure: Contents of employee.json

Code explanation:
1. We first import a Spark Session into Apache Spark.
2. Creating a Spark Session ‘spark’ using the ‘builder()’ function.
3. Importing the Implicts class into our ‘spark’ Session.
4. We now create a DataFrame ‘df’ and import data from the ’employee.json’ file.
5. Displaying the DataFrame ‘df’. The result is a table of 5 rows of ages and names from our ’employee.json’ file. 

import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("Spark SQL basic example").config("spark.some.config.option", "some-value").getOrCreate()
import spark.implicits._
val df = spark.read.json("examples/src/main/resources/employee.json")
df.show()

Figure: Starting a Spark Session and displaying DataFrame of employee.json

Code explanation:
1. Importing the Implicts class into our ‘spark’ Session.
2. Printing the schema of our ‘df’ DataFrame.
3. Displaying the names of all our records from ‘df’ DataFrame.

import spark.implicits._
df.printSchema()
df.select("name").show()

Figure: Schema of a DataFrame

Code explanation:
1. Displaying the DataFrame after incrementing everyone’s age by two years.
2. We filter all the employees above age 30 and display the result.

df.select($"name", $"age" + 2).show()
df.filter($"age" > 30).show()

Figure: Basic SQL operations on employee.json

Code explanation:
1. Counting the number of people with the same ages. We use the ‘groupBy’ function for the same.
2. Creating a temporary view ’employee’ of our ‘df’ DataFrame.
3. Perform a ‘select’ operation on our ’employee’ view to display the table into ‘sqlDF’.
4. Displaying the results of ‘sqlDF’.

df.groupBy("age").count().show()
df.createOrReplaceTempView("employee")
val sqlDF = spark.sql("SELECT * FROM employee")
sqlDF.show()

Figure: SQL operations on employee.json
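
The grouping we did with the DataFrame API above can be expressed just as easily in SQL against the temporary view. A minimal sketch:

spark.sql("SELECT age, COUNT(*) AS cnt FROM employee GROUP BY age").show()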

Creating Datasets

After understanding DataFrames, let us now move on to Dataset API. The below code creates a Dataset class in SparkSQL.

Code explanation:
1. Creating a class ‘Employee’ to store name and age of an employee.
2. Assigning a Dataset ‘caseClassDS’ to store the record of Andrew.
3. Displaying the Dataset ‘caseClassDS’.
4. Creating a primitive Dataset to demonstrate mapping of DataFrames into Datasets.
5. Mapping over the primitive Dataset (adding 1 to each element) and collecting the result into an array.

case class Employee(name: String, age: Long)
val caseClassDS = Seq(Employee("Andrew", 55)).toDS()
caseClassDS.show()
val primitiveDS = Seq(1, 2, 3).toDS()
primitiveDS.map(_ + 1).collect()

Figure: Creating a Dataset

Code explanation:
1. Setting the path to our JSON file ’employee.json’.
2. Creating a Dataset ‘employeeDS’ from the file.
3. Displaying the contents of ’employeeDS’ Dataset.

val path = "examples/src/main/resources/employee.json"
val employeeDS = spark.read.json(path).as[Employee]
employeeDS.show()

Figure: Creating a Dataset from a JSON file

LEARN SPARK FROM EXPERTS

Adding Schema To RDDs

Spark introduces the concept of an RDD (Resilient Distributed Dataset), an immutable fault-tolerant, distributed collection of objects that can be operated on in parallel. An RDD can contain any type of object and is created by loading an external dataset or distributing a collection from the driver program.

A Schema RDD is an RDD on which you can run SQL. It is more than SQL; it is a unified interface for structured data.

Code explanation:
1. Importing Expression Encoder for Datasets. Datasets are similar to RDDs, but use encoders for serialization.
2. Importing Encoder library into the shell.
3. Importing the Implicts class into our ‘spark’ Session.
4. Creating an ‘employeeDF’ DataFrame from ‘employee.txt’, splitting each line on the comma delimiter ‘,’ and mapping the attributes to the Employee case class.
5. Creating the temporary view ’employee’.
6. Defining a DataFrame ‘youngstersDF’ which will contain all the employees between the ages of 18 and 30.
7. Mapping the names from the RDD into ‘youngstersDF’ to display the names of youngsters.

import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.Encoder
import spark.implicits._
val employeeDF = spark.sparkContext.textFile("examples/src/main/resources/employee.txt").map(_.split(",")).map(attributes => Employee(attributes(0), attributes(1).trim.toInt)).toDF()
employeeDF.createOrReplaceTempView("employee")
val youngstersDF = spark.sql("SELECT name, age FROM employee WHERE age BETWEEN 18 AND 30")
youngstersDF.map(youngster => "Name: " + youngster(0)).show()

Figure: Creating a DataFrame for transformations

Code explanation:
1. Converting the mapped names into string for transformations.
2. Using the mapEncoder from Implicits class to map the names to the ages.
3. Mapping the names to the ages of our ‘youngstersDF’ DataFrame. The result is an array with names mapped to their respective ages.

youngstersDF.map(youngster => "Name: " + youngster.getAs[String]("name")).show()
implicit val mapEncoder = org.apache.spark.sql.Encoders.kryo[Map[String, Any]]
youngstersDF.map(youngster => youngster.getValuesMap[Any](List("name", "age"))).collect()

Figure: Mapping using DataFrames

RDDs support two types of operations:

  • Transformations: These are the operations (such as map, filter, join, union, and so on) performed on an RDD which yield a new RDD containing the result.
  • Actions: These are operations (such as reduce, count, first, and so on) that return a value after running a computation on an RDD.

Transformations in Spark are “lazy”, meaning that they do not compute their results right away. Instead, they just “remember” the operation to be performed and the dataset (e.g., a file) on which the operation is to be performed. The transformations are computed only when an action is called, and the result is returned to the driver program and stored as a Directed Acyclic Graph (DAG). This design enables Spark to run more efficiently. For example, if a big file was transformed in various ways and passed to the first action, Spark would only process and return the result for the first line, rather than do the work for the entire file.
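
A minimal sketch of this laziness, using the employee.txt sample file from the earlier examples:

// transformation: nothing is read or computed at this point
val lines = spark.sparkContext.textFile("examples/src/main/resources/employee.txt")
val upperLines = lines.map(_.toUpperCase)
// action: only now is the file actually read and the map applied
upperLines.count()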

Figure: Ecosystem of Schema RDD in Spark SQL

By default, each transformed RDD may be recomputed each time you run an action on it. However, you may also persist an RDD in memory using the persist or cache method, in which case Spark will keep the elements around on the cluster for much faster access the next time you query it.
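
A minimal sketch of caching, continuing with the same sample file (cache() is shorthand for persist() with the default storage level):

val employeeLines = spark.sparkContext.textFile("examples/src/main/resources/employee.txt")
employeeLines.cache()      // mark the RDD to be kept in memory
employeeLines.count()      // first action: reads the file and populates the cache
employeeLines.count()      // second action: served from the in-memory cache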

RDDs As Relations

Resilient Distributed Datasets (RDDs) are a distributed memory abstraction which lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs can be created from any data source, e.g. a Scala collection, a local file system, Hadoop, Amazon S3, an HBase table, etc.

Specifying Schema

Code explanation:
1. Importing the ‘types’ class into the Spark Shell.
2. Importing ‘Row’ class into the Spark Shell. Row is used in mapping RDD Schema.
3. Creating a RDD ’employeeRDD’ from the text file ’employee.txt’.
4. Defining the schema as “name age”. This is used to map the columns of the RDD.
5. Defining ‘fields’, an array of StructField objects created by mapping each name in ‘schemaString’ to a StructField.
6. Building the ‘schema’ (a StructType) from ‘fields’.

import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
val employeeRDD = spark.sparkContext.textFile("examples/src/main/resources/employee.txt")
val schemaString = "name age"
val fields = schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)

Figure: Specifying Schema for RDD transformation

Code explanation:
1. We now create a RDD called ‘rowRDD’ and transform the ’employeeRDD’ using the ‘map’ function into ‘rowRDD’.
2. We define a DataFrame ’employeeDF’ and store the RDD schema into it.
3. Creating a temporary view of ’employeeDF’ into ’employee’.
4. Performing the SQL operation on ’employee’ to display the contents of employee.
5. Displaying the names of the previous operation from the ’employee’ view.

val rowRDD = employeeRDD.map(_.split(",")).map(attributes => Row(attributes(0), attributes(1).trim))
val employeeDF = spark.createDataFrame(rowRDD, schema)
employeeDF.createOrReplaceTempView("employee")
val results = spark.sql("SELECT name FROM employee")
results.map(attributes => "Name: " + attributes(0)).show()

Figure: Result of RDD transformation

Even though RDDs are defined, they don’t contain any data. The computation to create the data in an RDD is only done when the data is referenced, e.g. when caching results or writing out the RDD.

Caching Tables In-Memory

Spark SQL caches tables using an in-memory columnar format:

  1. Scan only required columns
  2. Fewer allocated objects
  3. Automatically tunes compression to minimize memory usage
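
A minimal sketch of table caching, assuming the temporary view ‘employee’ created earlier in this blog is still registered:

spark.catalog.cacheTable("employee")                  // cache the view in the in-memory columnar format
spark.sql("SELECT COUNT(*) FROM employee").show()     // this and subsequent queries read from the cache
spark.catalog.uncacheTable("employee")                // release the cached data when no longer needed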

Loading Data Programmatically

The below code will read employee.json file and create a DataFrame. We will then use it to create a Parquet file.

Code explanation:
1. Importing Implicits class into the shell.
2. Creating an ’employeeDF’ DataFrame from our ’employee.json’ file.

import spark.implicits._
val employeeDF = spark.read.json("examples/src/main/resources/employee.json")

Figure: Loading a JSON file into DataFrame

Code explanation:
1. Writing our ‘employeeDF’ DataFrame into a Parquet file named ‘employee.parquet’.
2. Reading the Parquet file back into the ‘parquetFileDF’ DataFrame.
3. Creating a ‘parquetFile’ temporary view of our DataFrame.
4. Selecting the names of people between the ages of 18 and 30 from our Parquet file.
5. Displaying the result of the Spark SQL operation.

employeeDF.write.parquet("employee.parquet")
val parquetFileDF = spark.read.parquet("employee.parquet")
parquetFileDF.createOrReplaceTempView("parquetFile")
val namesDF = spark.sql("SELECT name FROM parquetFile WHERE age BETWEEN 18 AND 30")
namesDF.map(attributes => "Name: " + attributes(0)).show()

Figure: Displaying results from a Parquet DataFrame

JSON Datasets

We will now work on JSON data. As Spark SQL supports JSON dataset, we create a DataFrame of employee.json. The schema of this DataFrame can be seen below. We then define a Youngster DataFrame and add all the employees between the ages of 18 and 30.

Code explanation:
1. Setting the path to our ’employee.json’ file.
2. Creating a DataFrame ’employeeDF’ from our JSON file.
3. Printing the schema of ’employeeDF’.
4. Creating a temporary view of the DataFrame into ’employee’.
5. Defining a DataFrame ‘youngsterNamesDF’ which stores the names of all the employees between the ages of 18 and 30 present in ’employee’.
6. Displaying the contents of our DataFrame.

val path = "examples/src/main/resources/employee.json"
val employeeDF = spark.read.json(path)
employeeDF.printSchema()
employeeDF.createOrReplaceTempView("employee")
val youngsterNamesDF = spark.sql("SELECT name FROM employee WHERE age BETWEEN 18 AND 30")
youngsterNamesDF.show()

Figure: Operations on JSON Datasets

Code explanation:
1. Creating a RDD ‘otherEmployeeRDD’ which will store the content of employee George from New Delhi, Delhi.
2. Assigning the contents of ‘otherEmployeeRDD’ into ‘otherEmployee’.
3. Displaying the contents of ‘otherEmployee’.

val otherEmployeeRDD = spark.sparkContext.makeRDD("""{"name":"George","address":{"city":"New Delhi","state":"Delhi"}}""" :: Nil)
val otherEmployee = spark.read.json(otherEmployeeRDD)
otherEmployee.show()

Figure: RDD transformations on JSON Dataset

Hive Tables

We perform a Spark example using Hive tables.

Code explanation:
1. Importing ‘Row’ class into the Spark Shell. Row is used in mapping RDD Schema.
2. Importing Spark Session into the shell.
3. Creating a class ‘Record’ with attributes Int and String.
4. Setting the location of ‘warehouseLocation’ to Spark warehouse.
5. We now build a Spark Session ‘spark’ to demonstrate Hive example in Spark SQL.
6. Importing Implicits class into the shell.
7. Importing SQL library into the Spark Shell.
8. Creating a table ‘src’ with columns to store key and value.

import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession
case class Record(key: Int, value: String)
val warehouseLocation = "spark-warehouse"
val spark = SparkSession.builder().appName("Spark Hive Example").config("spark.sql.warehouse.dir", warehouseLocation).enableHiveSupport().getOrCreate()
import spark.implicits._
import spark.sql
sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")

Figure: Building a Session for Hive

Code explanation:
1. We now load the data from the examples present in Spark directory into our table ‘src’.
2. The contents of ‘src’ is displayed below.

sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
sql("SELECT * FROM src").show()

Figure: Selection using Hive tables

Code explanation:
1. We perform the ‘count’ operation to select the number of keys in ‘src’ table.
2. We now select all the records with ‘key’ value less than 10 and store it in the ‘sqlDF’ DataFrame.
3. Creating a Dataset ‘stringsDS’ from ‘sqlDF’.
4. Displaying the contents of the ‘stringsDS’ Dataset.

sql("SELECT COUNT(*) FROM src").show()
val sqlDF = sql("SELECT key, value FROM src WHERE key < 10 ORDER BY key")
val stringsDS = sqlDF.map {case Row(key: Int, value: String) => s"Key: $key, Value: $value"}
stringsDS.show()

Hive Tables DF - Spark SQL - Edureka

Figure: Creating DataFrames from Hive tables

Code explanation:
1. We create a DataFrame ‘recordsDF’ and store all the records with key values 1 to 100.
2. Create a temporary view ‘records’ of ‘recordsDF’ DataFrame.
3. Displaying the contents of the join of tables ‘records’ and ‘src’ with ‘key’ as the primary key.

val recordsDF = spark.createDataFrame((1 to 100).map(i => Record(i, s"val_$i")))
recordsDF.createOrReplaceTempView("records")
sql("SELECT * FROM records r JOIN src s ON r.key = s.key").show()


Figure: Recording the results of Hive operations

So this concludes our blog. I hope you enjoyed reading this blog and found it informative. By now, you must have acquired a sound understanding of what Spark SQL is. The hands-on examples will give you the required confidence to work on any future projects you encounter in Spark SQL. Practice is the key to mastering any subject and I hope this blog has created enough interest in you to explore learning further on Spark SQL.

WATCH SPARK TUTORIAL

Got a question for us? Please mention it in the comments section and we will get back to you at the earliest.

If you wish to learn Spark, build a career in the Spark domain and gain expertise in performing large-scale Data Processing using RDD, Spark Streaming, SparkSQL, MLlib, GraphX and Scala with real-life use cases, check out our interactive, live-online Apache Spark Certification Training here, which comes with 24*7 support to guide you throughout your learning period.

The post Spark SQL Tutorial – Understanding Spark SQL With Examples appeared first on Edureka Blog.

Splunk Lookup and Fields: Splunk Knowledge Objects


Splunk knowledge objects which include Splunk Lookup and Fields - Edureka

In my previous blog, I explained Splunk Events, Event types and Tags that help in simplifying your searches. In this blog, I am going to explain the following concepts – Splunk lookup, fields and field extraction.
I will discuss why lookups are important and how you can associate data from an external source by matching the unique key value. On the other hand, Splunk fields help in enriching your data by providing a specific value to an event. I have also explained how these fields can be extracted in different ways.

So, let’s get started with Splunk Lookup.

Splunk Lookup

You might be familiar with lookups in Excel. Splunk lookups work in a similar fashion. For example, you have a product_id value which matches its definition in a different file, say a CSV file. A lookup can help you map the details of the product into a new field. Suppose you have product_id=2 and the name of the product is present in a different file; Splunk lookup will then create a new field – ‘product_name’ – associated with that ‘product_id’.

  • A lookup table is a mapping of keys and values.
  • Splunk Lookup helps you in adding a field from an external source based on the value that matches your field in the event data.
  • It enriches the data while comparing different event fields.
  • The Splunk lookup command can accept multiple event fields and destination fields (destfields).
  • It can translate fields into more meaningful information at search time.

If you see the image below, these are the different types of Splunk lookup which I will be explaining in detail below.

Types of Splunk Lookup - Edureka

  1. CSV Lookup: As the name itself says, a CSV lookup pulls data from CSV files. It populates the event data with fields and represents it in a static table of data. Therefore, it is also called a “static lookup”. There must be at least two columns, each representing a field with a set of values. They can have multiple instances of the same value.
  2. External Lookup: In this type of lookup, your event data is populated from an external source, say a DNS server. It can use Python scripts or binary executables to get field values from an external source. Therefore, it is also called a “scripted lookup”.
  3. KV Store Lookup: In this type of lookup, it populates your event data with fields pulled from your App Key Value Store (KV Store) collections. This lookup matches the fields in your event to fields in a KV store.
  4. Geospatial Lookup: In this type of lookup, the data source is a KMZ (compressed keyhole markup language) file, which is used to define the boundaries of mapped regions such as US states and US counties. It matches your events against the regions encoded in the KMZ file and outputs fields to your events, such as country, state or county names.

Create Splunk lookup - Edureka

You can configure Splunk lookups by:

Settings -> Lookups

Once you click on ‘Lookups’, a new page will be displayed saying ‘Create and configure lookups’.

You can create new lookups or edit the existing lookups.

Refer to the screenshot on the left to get a better understanding on how to create Splunk lookup.

 

 

 

There are 3 ways to create and configure Splunk lookups:

  1. Lookup table files
  2. Lookup definitions
  3. Automatic lookups

Get Started With Splunk

Let us get into more details and understand these different ways:

1. Lookup table files: In lookup table files, you can simply upload a new file.
When you click on ‘Add new’ view, you can upload CSV files to use in your field lookups.

To create a lookup table file, you need to follow the below steps:
» Go to Lookups page
» Open Lookups table files
» Click Add new
» Upload a lookup file, browse for the CSV file (product.csv) to upload.
» Under Destination filename, name the file product.csv


Refer to the below screenshot to get a better understanding.

Create Splunk Lookup Table Files - Edureka

 

2. Lookup definitions: Lookup definitions help to edit existing lookup definitions or define a new file-based lookup. While defining a lookup, you can reuse the same file, and later make that lookup run automatically.

To create a lookup definition, you need to follow the below steps:
» Go to Lookups page
» Open Lookups definitions
» Click ‘Add new’
» A new box will open to add field definition
» Provide the name of the lookup
» Set the Type as ‘File-based’
» Select the name of the lookup file (product.csv)

Refer to the below screenshot to get a better understanding.

3. Automatic lookups: Automatic lookup helps to configure a new lookup to run automatically or edit an existing one.

To create an automatic lookup, you can go through the below steps:
» Go to Lookups page

» Open Automatic Lookups
» Click Add new
» A new box will open to add Automatic lookup
» Provide the name for the Automatic Lookup
» Under Lookup tables, select product_lookup
» Select lookup input and output fields.

Refer to the below screenshot to get a better understanding.

There are two important search commands to create a Splunk Lookup – Input and Output lookup. These are explained below.  

Input Lookup: The inputlookup command loads search results from a specified static lookup table. It scans the lookup table as specified by a filename or a table name. If ‘append’ is set to true, the data from the lookup file will be appended to the current set of results. For example: Read the product.csv lookup file.

| inputlookup product.csv

Outputlookup: The outputlookup command writes search results to the specified static lookup table. It saves the result to a lookup table as specified by a filename or a table name. If the ‘createinapp’ option is set to false or if there is no current application, then Splunk creates the file in the system lookups directory. For example: Write to the product.csv lookup file.

| outputlookup product.csv

By now, you would have understood how Splunk lookups are created. Next, I will explain Splunk fields and how these fields can be extracted to enrich your data.

Learn Splunk From Experts

Splunk Fields

Suppose you have a large amount of data for a company and you need an easy way to access information in key=value pair. Let’s say you want to identify the name of a particular employee or want to find the employee ID. For this, we can declare a Splunk field such as Emp_name or Emp_ID and associate a value to it.
For example: Emp_name= “Jack” or Emp_ID= ‘00124’

  • Fields are the searchable names in the event data.
  • Fields filter the event data by providing a specific value to a field.
  • Fields are the building blocks of Splunk searches, reports, and data models.
  • A field can have multiple values. It can appear more than once having different values each time.
  • Field names are case-sensitive.

Let us now understand how fields can be extracted.

Splunk Field Extraction: The process of extracting fields from the events is Splunk field extraction. There are some fields which are extracted by default such as: host, source, sourcetype and timestamps.
I will explain with an easy example to understand this process properly.

Splunk Sample event - Edureka

As you can see in the above example, it displays the event data. In this case, I have taken a sample event and kept the source type as ‘splunk_web_access’. Splunk Enterprise will now extract fields based on the data collected for this sourcetype.
In the above example, I have opted for the regular expression method, which helps match the highlighted value in the sample events. I have extracted the value ‘537.36’ and given the field the name ‘test’. This will display the set of values and the extracted values.

Let’s look at how these extracted values are displayed.

Splunk fields extraction display - Edureka

Now, there are two types of field extractions depending on when Splunk extracts fields:

  1. Index Time Field Extraction (in case of default fields)
  2. Search Time Field Extraction (in case of search fields)

Let us go into more detail to understand it properly:

Index Time Field Extraction:
  1. Index time field extraction happens at index time, when Splunk indexes data.
  2. You can define custom source types and hosts before indexing, so that events can be tagged with them.
  3. Splunk extracts a set of default fields for each event, like host, source and sourcetype. It also includes static or dynamic host assignment, structured data field extraction, custom index-time field extraction, event timestamping, etc.

Search Time Field Extraction:
  1. Search time field extraction happens at search time, when we search through data.
  2. You cannot change the host or source type assignments.
  3. Splunk can extract additional fields other than default fields depending on its search settings. It includes event type matching, search-time field extraction, addition of fields from lookups, event segmentation, field aliasing, tagging, etc.

Next, there are 3 ways in Splunk to achieve field extraction.

  1. Using Field Extractor Utility
  2. Using Field Extractions Page in Splunk Web
  3. Using Field extractions directly in .conf files

These three are explained in detail below:

1. Using Field Extractor Utility: You can use the field extractor utility to create new fields. It is also used to create custom fields dynamically in your Splunk instance.

  • The field extractor enables you to define field extraction by selecting a sample event and highlighting fields to extract from that event.
  • It provides two methods to extract a field – regular expression and delimiters. The regular expression method works best with unstructured event data, whereas delimiters are designed for structured event data.
  • The field extractor utility is useful if you are not familiar with regular expression syntax and usage, because it generates field-extracting regular expressions and allows you to test them.

Refer to the below screenshot to get a better understanding.

Splunk field extraction steps - Edureka
2. Using Field Extractions Page in Splunk Web: We can use the ‘Field Extractions Page’ to manage search-time field extractions.
The Field Extractions page enables us to:

  • Review the overall set of search-time extractions.
  • Create new search-time field extractions.
  • Update permissions for field extractions.
  • Delete field extractions

Let’s see how we can access Field extraction page in Splunk Web:
Go to Settings -> Fields -> Field Extractions

Add new Splunk field extraction - Edureka

In the above screenshot, I have explained how the employee name field is extracted from employee_data sourcetype. You can use the following regular expression to extract emp_name field:
^\d+\s+(?P<emp_name>\w+\s+)

Also, you can generate this regular expression from field extractor utility if you don’t know how to create regular expressions.

3. Using Field extractions directly in .conf files: You can also extract fields by directly editing the props.conf and transforms.conf files. You can find them in: $SPLUNK_HOME/etc/system/local/

NOTE: Do not edit files in $SPLUNK_HOME/etc/system/default/ as it includes system settings, authentication and authorization information, index mappings and various other important settings.

So, this was all about Splunk Knowledge Objects. I hope these blogs helped you learn the different knowledge objects and the role they play in bringing operational efficiency to your business. Check out the next tutorial blog, which explains the three concepts that every Splunk administrator must know at their fingertips – licensing, data ageing and configuration files.

<< Previous Next >>

Do you wish to learn Splunk and implement it in your business? Check out our Splunk certification training here, that comes with instructor-led live training and real-life project experience.

 Check Out Our Splunk Course

The post Splunk Lookup and Fields: Splunk Knowledge Objects appeared first on Edureka Blog.

Informatica ETL: A Beginner’s Guide To Understanding ETL Using Informatica PowerCenter


Feature Image - Informatica - ETL - Edureka

The purpose of Informatica ETL is to provide the users, not only a process of extracting data from source systems and bringing it into the data warehouse, but also provide the users with a common platform to integrate their data from various platforms and applications. Before we talk about Informatica ETL, let us first understand why we need ETL. 

Why Do We Need ETL?

Every company these days has to process large sets of data from varied sources. This data needs to be processed to give insightful information for making business decisions. But quite often, such data comes with the following challenges:

  • Large companies generate lots of data, and such huge chunks of data can be in any format. They may be spread across multiple databases and many unstructured files.
  • This data must be collated, combined, compared, and made to work as a seamless whole. But the different databases don’t communicate well!
  • Many organisations have implemented interfaces between these databases, but they faced the following challenges:
    • Every pair of databases requires a unique interface.
    • If you change one database, many interfaces may have to be upgraded.

Below you can see the various databases of an organisation and their interactions:

 

Various Dataset of an Organisation - Informatica - ETL - Edureka

   Various Databases used by different departments of an organization

 

Different Interface Between the Databases - Informatica - ETL - Edureka

                                Different Interactions of the Databases in an Organisation

As seen above, an organisation may have various databases in its various departments, and the interactions between them become hard to implement as separate interfaces have to be created for each pair. To overcome these challenges, the best possible solution is to use the concepts of Data Integration, which allow data from different databases and formats to communicate with each other. The below figure helps us understand how the Data Integration tool becomes a common interface for communication between the various databases.

Data Integration-Informatica ETL-Edureka

              Various Databases connected via Data Integration

But there are different processes available to perform Data Integration. Among these processes, ETL is the most optimal, efficient and reliable. Through ETL, the user can not only bring in data from various sources, but can also perform various operations on the data before storing it in the end target.

Among the various ETL tools available in the market, Informatica PowerCenter is the market’s leading data integration platform. Having been tested on nearly 500,000 combinations of platforms and applications, Informatica PowerCenter interoperates with the broadest possible range of disparate standards, systems, and applications. Let us now understand the steps involved in the Informatica ETL process.

Informatica ETL | Informatica Architecture | Informatica PowerCenter Tutorial | Edureka

This Edureka Informatica tutorial helps you understand the fundamentals of ETL using Informatica Powercenter in detail.

Steps in Informatica ETL Process:

Before we move to the various steps involved in Informatica ETL, let us have an overview of ETL. In ETL, Extraction is where data is extracted from homogeneous or heterogeneous data sources, Transformation is where the data is transformed into the proper format or structure for the purposes of querying and analysis, and Loading is where the data is loaded into the final target database, operational data store, data mart, or data warehouse. The below image will help you understand how the Informatica ETL process takes place.

ETL Process - Informatica - ETL - Edureka

                                                                   ETL Process Overview

As seen above, Informatica PowerCenter can load data from various sources and store them into a single data warehouse. Now, let us look at the steps involved in the Informatica ETL process.

There are mainly 4 steps in the Informatica ETL process, let us now understand them in depth:

  1. Extract or Capture
  2. Scrub or Clean
  3. Transform
  4. Load and Index

1. Extract or Capture: As seen in the image below, the Capture or Extract is the first step of Informatica ETL process. It is the process of obtaining a snapshot of the chosen subset of data from the source, which has to be loaded into the data warehouse. A snapshot is a read-only static view of the data in the database. The Extract process can be of two types:

  • Full extract: The data is extracted completely from the source system and there’s no need to keep track of changes to the data source since the last successful extraction.
  • Incremental extract: This will only capture changes that have occurred since the last full extract.
ETL Step-1 Extract -Informatica ETL-Edureka

                                  Phase 1: Extract or Capture

2. Scrub or Clean: This is the process of cleaning the data coming from the source by using various pattern recognition and AI techniques to upgrade the quality of the data taken forward. Usually, errors like misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data and inconsistencies are highlighted and then corrected or removed in this step. Also, operations like decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging and locating missing data are done in this step. As seen in the image below, this is the second step of the Informatica ETL process.

ETL Step-2 Scrub - Informatica ETL-Edureka

                              Phase 2: Scrubbing or Cleaning of data

3. Transform: As seen in the image below, this is the third and most essential step of the Informatica ETL process. Transformation is the operation of converting data from the format of the source system into the schema of the data warehouse. A transformation is basically used to represent a set of rules, which define the data flow and how the data is loaded into the targets. To know more about transformations, check out the Transformations in Informatica blog.

ETL Step-3 Transform - Informatica ETL-Edureka

                                  Phase 3: Transformation

4. Load and Index: This is the final step of the Informatica ETL process, as seen in the image below. In this stage, we place the transformed data into the warehouse and create indexes for the data. There are two major types of data load, based on the load process:

  • Full Load or Bulk Load: This is the data load performed the very first time. The job extracts the entire volume of data from a source table and loads it into the target data warehouse after applying the required transformations. It is a one-time job; after that, only changes will be captured as part of an incremental extract.
  • Incremental Load or Refresh Load: After the full load, only the modified data is updated in the target. The changes are captured by comparing the created or modified date against the last run date of the job. Only the modified data is extracted from the source and updated in the target, without impacting the existing data.
ETL Step-4 Load and Index - Informatica ETL - Edureka

                                    Phase 4: Load and Index

If you have understood the Informatica ETL process, we are now in a better position to appreciate why Informatica is the best solution in such cases.

Features of Informatica ETL: 

For all the Data integration and ETL operations, Informatica has provided us with Informatica PowerCenter. Let us now see some key features of Informatica ETL:

  • Provides facility to specify a large number of transformation rules with a GUI.
  • Generate programs to transform data.
  • Handle multiple data sources.
  • Supports data extraction, cleansing, aggregation, reorganisation, transformation, and load operations.
  • Automatically generates programs for data extraction.
  • High-speed loading of target data warehouses.

Below are some of the typical scenarios in which Informatica PowerCenter is being used:

  1. Data Migration:

A company has purchased a new Accounts Payable Application for its accounts department. PowerCenter can move the existing account data to the new application. The figure below will help you understand how you can use Informatica PowerCenter for Data migration. Informatica PowerCenter can easily preserve data lineage for tax, accounting, and other legally mandated purposes during the data migration process.

Data Migration - Informatica - ETL - Edureka

 Data Migration from an Older Accounting application to a new Application

  2. Application Integration:

Let’s say Company-A purchases Company-B. So, to achieve the benefits of consolidation, the Company-B’s billing system must be integrated into the Company-A’s billing system which can be easily done using Informatica PowerCenter. The figure below will help you understand how you can use Informatica PowerCenter for the integration of applications between the companies.

Application Integration - Informatica - ETL - Edureka

                  Integrating Application between Companies

  3. Data Warehousing

   Typical actions required in data warehouses are: 

  • Combining information from many sources together for analysis.
  • Moving data from many databases to the Data warehouse.

All the above typical cases can be easily performed using Informatica PowerCenter. Below, you can see Informatica PowerCenter being used to combine data from various kinds of databases like Oracle, Salesforce, etc. and bring it into a common data warehouse created by Informatica PowerCenter.

Data Warehouse - Informatica - ETL - Edureka

            Data From various databases integrated to a common Data warehouse

  4. Middleware

Let’s say a retail organisation is making use of SAP R3 for its Retail applications and SAP BW as its data warehouse. A direct communication between these two applications is not possible due to the lack of a communication interface. However, Informatica PowerCenter can be used as a Middleware between these two applications. In the image below you can see the architecture of how Informatica PowerCenter is being used as middleware between SAP R/3 and SAP BW. The Applications from SAP R/3 transfer their data to the ABAP framework which then transfers it to the SAP Point of Sale (POS) and SAP Bills of Services (BOS). Informatica PowerCenter helps the transfer of data from these services to the SAP Business Warehouse (BW).

SAP retail architecture-Informatica ETL-Edureka

Informatica PowerCenter as Middleware in SAP Retail Architecture

Learn Informatica from Experts

While you have seen a few key features and typical scenarios of Informatica ETL, I hope you understand why Informatica PowerCenter is the best tool for ETL process. Let us now see a use case of Informatica ETL.

Use Case: Joining Two tables to obtain a Single detailed Table

Let’s say you wish to provide department-wise transportation to your employees, as the departments are located at various locations. To do this, you first need to know which department each employee belongs to and the location of that department. However, the details of employees are stored in different tables, and you need to join the department details to an existing database with the details of all employees. To do this, we will first load both tables into Informatica PowerCenter, perform a Source Qualifier Transformation on the data and finally load the details into the target database. Let us begin:

Step 1: Open PowerCenter Designer.

Launching Designer - Informatica - ETL - Edureka

Below is the Home page of Informatica PowerCenter Designer.

Designer Hompage - Informatica - ETL - Edureka

Let us now connect to the repository. In case you haven’t configured your repositories or are facing any issues you can check our Informatica Installation blog.

Step 2: Right click on your repository and select connect option.

Connect to Repository - Informatica - ETL - Edureka

On clicking the connect option, you will be prompted with the below screen, asking for your repository username and password.

Connecting to Repository - Informatica - ETL - Edureka

 

Once you have connected to your repository, you have to open your working folder as seen below:

Working Folder - Informatica - ETL - Edureka

You will be prompted asking the name of your mapping. Specify the name of your mapping and click on OK (I have named it as m-EMPLOYEE).

Mapping Name - Informatica - ETL - Edureka

Step 3: Let us now load the tables from the database. Start by connecting to the database. To do this, select the Sources tab and the Import from Database option as seen below:

Selecting source-Informatica ETL-Edureka

On clicking Import from Database, you will be prompted with the screen below, asking for the details of your database and its username and password for connection (I am using the Oracle database and the HR user).

Source Database Details-Informatica ETL-Edureka

Click on Connect to connect to your database.

Connecting to Source Database-Informatica ETL-Edureka

Step 4: As I wish to join the EMPLOYEES and DEPARTMENT tables, I will select them and click on OK.
Datasets - Informatica - ETL - Edureka

The sources will be visible on your mapping designer workspace as seen below.

Source Mapping-Informatica ETL-Edureka

Step 5: Similarly Load the Target Table to the Mapping.

Source Target Mapping-Informatica ETL-Edureka

Step 6: Now let us link the Source qualifier and the target table. Right click on any blank spot of the workspace and select Autolink as seen below:

Autolink - Informatica ETL - Edureka

Below is the mapping linked by Autolink.

Linked Mapping-Informatica ETL-Edureka

Step 7: As we need to link both the tables to the Source Qualifier, select the columns of the Department table and drop it in the Source Qualifier as seen below:

Selecting Colums from Source -Informatica ETL-Edureka

Drop the column values into the Source Qualifier SQ_EMPLOYEES.

Droping Colums to Source Qualifier - Informatica ETL - Edureka

Below is the updated Source Qualifier.

Updated Mapping - Informatica ETL - Edureka

Step 8: Double click on Source Qualifier to edit the transformation.

Editing Source Qualifier - Informatica ETL - Edureka

You will get the Edit Transformation pop up as seen below. Click on Properties tab.

Edit Source Transformations - Informatica ETL - Edureka

Step 9: Under the Properties tab, click on the Value field of the User Defined Join row.

Source Qualifier Properties - Informatica ETL - Edureka

You will get the following SQL Editor:

SQL Editor - Informatica ETL - Edureka

Step 10: Enter EMPLOYEES.DEPARTMENT_ID=DEPARTMENT.DEPARTMENT_ID as the condition to join both the tables in the SQL field and click on OK.

User Join Condition - Informatica ETL - Edureka

Step 11: Now click on the SQL Query row to generate the SQL for joining as seen below:

SQL Query Option - Informatica ETL - Edureka

You will get the following SQL Editor, Click on Generate SQL option.

Genetating SQL - Informatica ETL - Edureka

The following SQL will be generated for the condition we had specified in the previous step. Click on OK.

User Defined Join SQL - Informatica ETL - Edureka

Step 12: Click on Apply and OK.

Updating Source Qualifier - Informatica ETL-Edureka

Below is the completed mapping.

Updated Source Qualifier Mapping - Informatica ETL - Edureka

We have completed the design of how the data has to be transferred from the source to the target. However, the actual transfer of data has yet to happen, and for that we need to use the PowerCenter Workflow Manager. The execution of the workflow will lead to the transfer of data from the source to the target. To know more about workflows, check our Informatica Tutorial: Workflow blog.

Step 13: Let us now launch the Workflow Manager by clicking the W icon as seen below:

Launching Workflow - Informatica ETL - Edureka

Below is the workflow designer home page.

Workflow Manager Hompage - Informatica ETL - Edureka

Step 14: Let us now create a new Workflow for our mapping. Click on Workflow tab and select Create Option.

Creating Workflow - Informatica ETL - Edureka

You will get the below pop-up. Specify the name of your workflow and click on OK.

Workflow Naming - Informatica ETL - Edureka

Step 15: Once a workflow is created, we get the Start Icon in the Workflow Manager workspace.

Start Icon - Informatica ETL - Edureka

Let us now add a new Session to the workspace as seen below by clicking the session icon and clicking on the workspace:

Creating Session - Informatica ETL - Edureka

Click on the workspace to place the Session icon.

Adding Session Icon - Informatica ETL-Edureka

Step 16: While adding the session you have to select the Mapping you had created and saved in the above steps. (I had saved it as m-EMPLOYEE).

Selecting Mapping - Informatica ETL - Edureka

Below is the workspace after adding the session icon.

Updated Workflow - Informatica ETL - Edureka

Step 17: Now that you have created a new Session, we need to link it to the start task. We can do it by clicking on Link Task icon as seen below: 

Adding Link - Informatica ETL - Edureka

Click on the Start icon first and then on the Session icon to establish a link.

Linking Icon - Informatica etl - Edureka

Below is a connected workflow.

Linked Workflow - informatica etl - edureka

Step 18: Now that we have completed the design, let us start the workflow. Click on Workflow tab and select Start Workflow option.

Starting Workflow - Informatica ETL - Edureka

The Workflow Manager starts the Workflow Monitor.

Launching Workflow Moniter - Informatica ETL - Edureka

Step 19: Once we start the workflow, the Workflow Monitor automatically launches and allows you to monitor the execution of your workflow. Below you can see the Workflow Monitor showing the status of your workflow.

Workflow Moniter - Informatica ETL - Edureka

 

Step 20: To check the status of the workflow, right click on the workflow and select Get Run Properties as seen below:

Getting Run Properties - Informatica - ETL - Edureka

Select the Source/Target Statistics tab.

Source Target Statistics - Informatica ETL - Edureka

Below you can see the number of rows that have been transferred between the source and target after transformation.

Source Target Properties - Informatica - ETL - Edureka

You can also verify your result checking your target table as seen below.

Target Database - Informatica - ETL - Edureka

I hope this Informatica ETL blog was helpful to build your understanding on the concepts of ETL using Informatica and has created enough interest for you to learn more about Informatica.

View Upcoming Batches

If you found this blog helpful, you can also check out our Informatica Tutorial blog series: What is Informatica: A Beginner Tutorial of Informatica PowerCenter, Informatica Tutorial: Understanding Informatica ‘Inside Out’ and Informatica Transformations: The Heart and Soul of Informatica PowerCenter. In case you are looking for details on Informatica Certification, you can check our blog Informatica Certification: All there is to know.

If you have already decided to take up Informatica as a career, I would recommend you to have a look at our Informatica training course page. The Informatica Certification training at Edureka will make you an expert in Informatica through live instructor-led sessions and hands-on training using real life use cases. 

The post Informatica ETL: A Beginner’s Guide To Understanding ETL Using Informatica PowerCenter appeared first on Edureka Blog.

What Is Salesforce? A Beginners Guide To Understanding Salesforce


what is salesforce

In recent years, there has been a big surge in Cloud Computing technologies. One such technology which has had an immense impact on the world of computing is Salesforce. In this blog, I will introduce you to Salesforce and will answer: What is Salesforce? Why use Salesforce? And where is Salesforce being utilized?

How It All Began?

Before Salesforce, Customer Relationship Management (CRM) solutions were hosted on a company’s own server. Can you imagine the cost and time it took for companies to have their own CRM solutions? Well, it used to take months or even years to set it up and the cost went up to millions of dollars. Even after setting up, they were extremely hard to use. What would be a feasible solution to this? I am sure you guessed it – building an affordable CRM software and delivering it entirely online as a service. This was the main idea behind Salesforce. Started as a Software as a Service (SaaS) company, Salesforce has grown into the fifth-largest software company in the world.

What Made Salesforce An Instant Hit?

The answer to this is very simple, it was Cloud Computing. Salesforce wasn’t just about a better product at a fraction of the cost. It was about replacing the lengthy installation process and moving everything to the internet. They changed the business model – no more long term contracts or expensive licensing deals, anyone could use Salesforce with only a simple 50-dollar monthly subscription fee.

Why Salesforce?

Before I answer the question, what is Salesforce, let me brief you about why you should choose Salesforce.

why salesforce - what is salesforce - edureka

  • As seen in the above image, Salesforce provides you with the fastest path from Idea to App. You can concentrate on building your app using Salesforce tools, rather than building the infrastructure and tools yourself. This can save you years of time and millions of dollars.
  • Salesforce customers generally say that it’s unique for three major reasons:
    • Fast – Traditional CRM software can take more than a year to deploy, compare that to months or even weeks with Salesforce.
    • Easy – Salesforce wins in the easy to use category hands down. You can spend more time putting it to use and less time figuring it out.
    • Effective – Because it is easy to use and can be customized to meet business needs, customers find Salesforce very effective.
  • Salesforce is in the cloud, so your team can use it from anywhere with access to the internet.
  • If you are a business that is rapidly changing or you are a seasoned company that’s been around for years, your business is probably changing too. Salesforce is completely scalable to your growth.
  • Salesforce seamlessly integrates with 3rd party apps. If you want to integrate Salesforce with Gmail you can do it, if you want to integrate it with your accounting software you can do that too. On the other hand, integration is tough with other CRMs.
  • Salesforce is affordable, especially if you consider its vast variety of capabilities. Even startups and small business can use Salesforce.

Statistics which make you choose Salesforce

As of May 2016, Salesforce has had over 150,000 customers across the world. In the world of CRM, Salesforce dominates with a 19.7% market share. Its closest competitors SAP (12.1%), Oracle (9.1%) and Microsoft (6.2%) are far behind. The Salesforce AppExchange features over 2,700 applications which has driven a total of over 3 million installations and more than 70% of Salesforce customers use applications that are listed on the AppExchange. Today, many companies are developing their applications on Salesforce platform or are migrating to Salesforce. This has increased the demand for Salesforce developers and administrators. Currently, Salesforce Architect is one of the hottest skills to have on a tech resume. According to Business Insider,

“Salesforce Architect has the highest average salary across the most valuable job skills.”

infographics - what is salesforce - edureka

What Is Salesforce?

I hope I have answered your question on why you should choose Salesforce. Now, let me introduce you to Salesforce and answer the question on your mind: What is Salesforce? Below is an image which shows the power of Salesforce in today’s tech-savvy world. From tech giants like Google and Facebook to your nearby call center, all of them use Salesforce services and products to solve their problems.

what is salesforce - what is salesforce - edureka

Salesforce started as Software as a Service (SaaS) CRM company. Salesforce now provides various software solutions and a platform for users and developers to develop and distribute custom software. Salesforce.com is based on multi-tenant architecture. This means that multiple customers share common technology and all run on the latest release. You don’t have to worry about the application or infrastructure upgrades – they happen automatically. This helps your organization focus on innovation rather than managing technology.   

Get Started With Salesforce

What Are The Services And Products That Salesforce Offers?

To understand what is Salesforce, you need to know the different services and products that Salesforce has to offer and when to use them. Through Salesforce, you can access a wide range of products and services in Cloud, Social and Mobile domains. Below is an image  that shows the different services and products that Salesforce offers to its customers.

salesforce services - what is salesforce - edureka

The Cloud Services That Are Offered By Salesforce Are:


salesforce sales cloud - what is salesforce - edureka Salesforce Sales Cloud –
The Sales Cloud is a CRM platform that enables you to manage your organization’s sales, marketing and customer support facets. If your company is engaged in business-to-business (B2B) and business-to-customer (B2C), then sales cloud is the service your sales team needs.

 

salesforce marketing cloud - what is salesforce - edurekaSalesforce Marketing Cloud – The marketing cloud provides you with one of the world’s most powerful digital marketing platforms. The marketers in your organisation can use it to manage customer journey, email, mobile, social media, web personalization, content creation, content management and data analytics.

 

salesforce service cloud - what is salesforce - edurekaSalesforce Service Cloud – The Service Cloud is a service platform for your organization’s customer service and support team. It provides features like case tracking and social networking plug-in for conversation and analytics. This not only helps your agents to solve customer problems faster, but also gives your customers access to answers. Using these answers your customers can solve problems on their own.


salesforce community cloud - what is salesforce - edurekaSalesforce Community Cloud –
If you need a social platform for your organization to connect and facilitate communication among your employees, partners and customers then Salesforce Community Cloud is the service you need. You can use this platform to exchange data and images in real time.

 

salesforce commerce cloud - what is salesforce - edurekaSalesforce Commerce Cloud – The commerce cloud enables your organization to provide seamless customer service and experience irrespective of  your customer’s location (online or in-store). It also provides for customer data integration so that your consumers can have a better experience. If your goal is to provide customer with a positive, engaging customer experience, Commerce Cloud is the service you need.


salesforce analytics cloud - what is salesforce - edurekaSalesforce Analytics Cloud –
The Analytics Cloud provides a business intelligence platform for your organization to work with large data files, create graphs, charts and other pictorial representations of data. It is optimized for mobile access and data visualization and can be integrated with other Salesforce clouds.

 

salesforce app cloud - what is salesforce - edurekaSalesforce App Cloud – To develop custom apps that will run on the Salesforce platform, you can use the Salesforce App Cloud. It provides you with a collection of development tools that you can utilize to create custom applications. Some of the tools in the App Cloud include:

 

  • Force.com allows admins and developers to create websites and applications that integrate into the main Salesforce.com application.
  • AppExchange is an online application marketplace for third-party applications that run on the Force.com platform.
  • Heroku Enterprise gives developers the flexibility to create apps using their preferred languages and tools.
  • Salesforce Thunder is a big data and rules processing engine designed to analyze events and take personalized actions.
  • Salesforce Sandbox allows developers to test ideas in a safe and isolated development environment.

 

slaesforce IoT cloud - what is salesforce - edurekaSalesforce IoT Cloud – When your organization needs to store and process Internet of Things (IoT) data, you can utilize the service of Salesforce IoT cloud. The platform is built to take in massive volumes of data generated by devices, sensors, websites, applications, customers and partners. On receiving this data, the platform initiates actions to give you real time responses.

 

salesforce health cloud - what is salesforce - edurekaSalesforce Health Cloud – If you are a Health IT organization and require a CRM system that incorporates doctor-patient relationship and record management, then Health Cloud is what you need. Through the patient profile you can support one-to-one relationship by integrating information from multiple data sources.

 

Other Services That Are Offered By Salesforce Are:

salesforce chatter - what is salesforce - edurekaChatter – Chatter is an enterprise collaboration platform from Salesforce that enables your employees to collaborate. Chatter can help you drive productivity by connecting employees wherever they are. It also helps in knowledge sharing between departments in an organization or different organizations.

 

salesforce1 - what is salesforce - edurekaSalesforce1 – Salesforce1 is the platform which enables you to develop applications and exchange data via Application Programming Interfaces (APIs). APIs refer to prebuilt programming code components.
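
Since this data exchange ultimately happens over Salesforce’s REST endpoints, a small sketch can make the idea concrete. The instance URL, API version and access token below are placeholders (obtaining a token via OAuth or a connected app is a separate step), so treat this as a rough illustration rather than an exact recipe.

```python
import requests

# Placeholder values -- a real call needs your org's instance URL and an OAuth access token.
INSTANCE_URL = "https://yourInstance.salesforce.com"
ACCESS_TOKEN = "replace-with-a-real-access-token"
API_VERSION = "v52.0"

headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

# Query a standard object (Account) through the REST API's query endpoint.
resp = requests.get(
    f"{INSTANCE_URL}/services/data/{API_VERSION}/query",
    headers=headers,
    params={"q": "SELECT Id, Name FROM Account LIMIT 5"},
)
resp.raise_for_status()

for record in resp.json()["records"]:
    print(record["Id"], record["Name"])
```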

 

Learn Salesforce From Expert

Which Companies Use Salesforce?

Now that we have a clear understanding of what is Salesforce and which service to use when, let’s look at where Salesforce is being used by various companies across different industries.

Industry Company Case
Communications Comcast-Spectator Comcast-Spectator used Salesforce to  maintain detailed customer profiles so that they can identify their biggest fans and market more effectively to them.
Financial Services American Express American Express started to use Salesforce Sales Cloud in 2010. Now, they have their customer success platform on Salesforce, which connects thousands of employees across organizations, locations and time zones.
Government Obama for America Staff of Obama for America used Salesforce Service Cloud to send personalized emails to users. They also used dashboards to get real-time read on what the nation was thinking about and where opinions differed across the country.
Health Care Health Leads With Salesforce Community Cloud, Health Leads are leading the way to a new model for healthcare. Also, with Salesforce App Cloud, they can easily view and update patient data, coordinate with physicians and find effective community resources.
High Tech Sony Sony uses Salesforce Service Cloud to tune in with its customers. Sony’s customer cases are managed as one unified agent experience which has helped them to keep their customers happy.
Media Coca-Cola Enterprises (CCE) CCE uses Salesforce across multiple geographies and multiple business functions. From call center agents to service technicians and sales representatives, Salesforce is being used to connect people and information. This has helped CCE to deliver a better customer experience.
Manufacture InMobi InMobi swapped out several CRM systems for the Salesforce Sales Cloud. Salesforce has enabled InMobi to run a single layer of data management through the company. This has made InMobi a fast and efficient company.
Retail Trip A Deal Trip A Deal’s Heroku-based cloud platform application was designed and deployed in under five weeks. This delivered critical advantages to the start-up business, including system stability and cost-effective scalability.

Salesforce Use Case

We have seen above how different companies are using Salesforce to solve problems and improve productivity. Now, let’s look at how Hindustan Computer Limited (HCL) utilized Salesforce to solve its business challenges.

  1. HCL wanted to cater to changing business requirements. To do this, HCL wanted to implement, integrate and migrate to a new sales force automation application (SFA – software that automates business tasks) from its previous .NET-based SFA.
  2. HCL wanted to overcome the challenges of data redundancy and inconsistency caused by multiple applications.

To solve these business challenges HCL chose to implement SFA using Salesforce.com. HCL evaluated Salesforce and found that it met their requirements in regard to security, governance, collaboration and integration with third party applications.

hcl salesforce - what is salesforce - edureka

The above image shows the different modules that were implemented to solve the challenges. Below I have explained these modules in detail.

Lead Management Module – HCL used the Salesforce standard solution for lead capturing and assignment, lead qualification and conversion. They developed a custom solution for capturing leads from external sources and screening leads to prevent duplication.

Account Management Module – HCL used the Salesforce standard account management module and implemented a custom solution for defining business logic for real-time search for customer accounts, manage the flow of data and manage daily batch updates.

Contact Management Module – HCL used the Salesforce standard contact management module including activity management and implemented a custom solution for managing flow of data after account conversion.  

Opportunity Management Module – HCL used the standard Salesforce opportunity module and developed a custom solution for adding products to opportunity, enabling manual product pricing and for creating discount approvals on opportunity product.

Integration With Other Applications – Salesforce was integrated with applications such as SAP, ODS, BI and various third party applications by using AppExchange applications and customized Web Service APIs.

HCL implemented the Salesforce1 Sales Cloud in four months with 92% user adoption. For a program, which impacted over 6200 users across 14 countries, four months is quick. Ease of integration with third party applications improved productivity by 3-4 percent. Using Salesforce, HCL was able to validate data at the time of data entry. This enabled them to achieve an incredible 90% accuracy in their master data.

I hope you have understood Salesforce and got the answer to what is Salesforce. Do read my next blog on how to create a custom Salesforce App with step-by-step instructions.


Check out our Salesforce Certification Training, which comes with instructor-led live training and real life project experience. Feel free to leave any questions you have in the comment box below.

Check Out Salesforce Course

The post What Is Salesforce? A Beginners Guide To Understanding Salesforce appeared first on Edureka Blog.

Salesforce Tutorial: Learn To Create Your Own Salesforce App


salesforce tutorial - edurekaIn the previous blog, you learnt what Salesforce is and how it revolutionized cloud computing. In this Salesforce tutorial blog, I will show you how to create a custom Salesforce App. I will be creating an app called StudentForce which can be used to maintain student records.

This app will contain three different objects (tables) to store data. The first object, called Students Data, will contain the names of students and their personal details like email ID, phone number and native city. The second object, called College, will store the college that each student belongs to, and the third object, called Marks, will contain the marks obtained by the students in various subjects.

Salesforce Tutorial

I have covered the following topics in this Salesforce tutorial blog with step-by-step instructions and screenshots:

  • How to create the app environment?
  • What are tabs and how to create tabs in your app?
  • What are profiles and how to customize user profiles?
  • How to create objects in the app?
  • How to create fields in objects and define their data type?
  • How to add entries (fields) into these objects?
  • How to link (create a relationship between) two different objects?

Before I get started with creating an app, let me introduce you to the cloud environment where Salesforce apps are built.

Salesforce Org

The cloud computing space offered to you or your organization by Force.com is called a Salesforce org. It is also called the Salesforce environment. Developers can create custom Salesforce apps, objects, workflows, data sharing rules, Visualforce pages and Apex code on top of a Salesforce org.

Let us now deep dive into Salesforce Apps and understand how it functions.

Salesforce Apps

The primary function of a Salesforce app is to manage customer data. Salesforce apps provide a simple UI to access customer records stored in objects (tables). Apps also help in establishing relationships between objects by linking fields.

Apps contain a set of related tabs and objects which are visible to the end user. The below screenshot shows how the StudentForce app looks.

salesforce app - salesforce tutorial - edureka

The highlighted portion in the top right corner of the screenshot displays the app name: StudentForce. The text highlighted next to the profile pic is my username: Vardhan NS.

Before you create an object and enter records, you need to set up the skeleton of the app. You can follow the below instructions to set up the app.

Steps To Setup The App

  1. Click on Setup button next to app name in top right corner.
  2. In the bar which is on the left side, go to Build → select Create → select Apps from the drop down menu.
    create salesforce app - salesforce tutorial - edureka
  3. Click on New as shown in the below screenshot.
    new salesforce app - salesforce tutorial - edureka
  4. Choose Custom App.
  5. Enter the App Label. StudentForce is the label of my app. Click on Next.custom app - salesforce tutorial - edureka
  6. Choose a profile picture for your app. Click Next.
  7. Choose the tabs you deem necessary. Click Next.
  8. Select the different profiles you want the app to be assigned to. Click Save.

In steps 7 and 8, you were asked to choose the relevant tabs and profiles. Tabs and profiles are an integral part of Salesforce Apps because they help you to manage objects and records in Salesforce.

In this salesforce tutorial, I will give you a detailed explanation of Tabs, Profiles and then show you how to create objects and add records to it.

Get Started With Salesforce

Salesforce Tabs

Tabs are used to access objects (tables) in the Salesforce app. They appear at the top of the screen and are similar to a toolbar. The tab bar contains shortcut links to multiple objects. On clicking an object name in a tab, the records in that object will be displayed. Tabs also contain links to external web content, custom pages and other URLs. The highlighted portion in the below screenshot is that of Salesforce tabs.

salesforce tabs - salesforce tutorial - edureka

All applications will have a Home tab by default. Standard tabs can be chosen by clicking on ‘+’ in the Tab menu. Accounts, Contacts, Groups, Leads, Profile are the standard tabs offered by Salesforce. For example, Accounts tab will show you the list of accounts in the SFDC org and Contacts tab will show you the list of contacts in the SFDC org.

Steps To Add Tabs

  1. Click on ‘+’ in the tab menu.
  2. Click on Customize tabs, which is present on the right side.
  3. Choose the tabs of your choice and click on Save.custom tabs - salesforce tutorial - edureka

Besides standard tabs, you can also create custom tabs. Students tab that you see in the above screenshot is a custom tab that I have created. This is a shortcut to reach the custom object: Students.

Steps To Create Custom Tabs

  1. Navigate to Setup → Build → Create → Tabs.
  2. Click on New.
  3. Select the object name for which you are creating a tab. In my case, it is Students Data. This is a custom object which I have created (the instructions to create this object are covered later in this blog).
  4. Choose a tab style of your preference and enter a description.
  5. Click on Next → Save. The new Students Data tab will appear as shown below.students data object - salesforce tutorial - edureka

Salesforce Profiles

Every user who needs to access the data or SFDC org will be linked to a profile. A profile is a collection of settings and permissions which controls what a user can view, access and modify in Salesforce.

A profile controls user permissions, object permissions, field permissions, app settings, tab settings, apex class access, Visualforce page access, page layouts, record types, login hour and login IP addresses.

You can define profiles based on the background of the user. For example, different levels of access can be set for different users like system administrator, developer and sales representative.

Similar to tabs, we can use any standard profile or create a custom profile. By default, the available standard profiles are: read only, standard user, marketing user, contract manager, solution manager and system administrator. If you want to create a custom profile, you have to first clone a standard profile and then edit the clone. Do note that one profile can be assigned to many users, but one user cannot be assigned many profiles.

Steps To Create A Profile

  1. Click on Setup → Administer → Manage users → Profiles
  2. You can then clone any of the existing profiles by clicking on Edit.
    salesforce profiles - salesforce tutorial - edureka

Once the tabs and profiles are set up for your App, you can load data into it. The next section of this Salesforce tutorial will thus cover how data is added to objects in the form of records and fields.

Objects, Fields And Records In Salesforce

Objects, Fields and Records are the building blocks of Salesforce. So, it is important to know what they are and what role they play in building Apps.

Objects are the database tables in Salesforce where data is stored. There are two types of objects in Salesforce:

  • Standard objects: The objects provided by Salesforce are called standard objects. For example, Accounts, Contacts, Leads, Opportunities, Campaigns, Products, Reports, Dashboard etc.
  • Custom objects: The objects created by users are called custom objects.

Objects are a collection of records and records are a collection of fields.

Every row in an object consists of many fields. Thus a record in an object is a combination of related fields. Look at the below excel for illustration.

sample excel - salesforce tutorial - edureka

I will create an object called Students Data which will contain personal details of students.

Steps to create a custom object:

  1. Navigate to Setup → Build → Create → Object
  2. Click on New Custom Object.
  3. Fill in the Object Name and Description. As you can see from the below image, the object name is Students Data.custom object - salesforce tutotial - edureka
  4. Click on Save.

If you want to add this custom object to the tab menu, then you can follow the instructions mentioned earlier in this Salesforce tutorial blog.

After creating the object, you need to define various fields in that object. For example, the fields in a student’s record will be the student’s name, phone number, email ID, the department the student belongs to and their native city.

You can add records to objects only after defining the fields.

Steps To Add Custom Fields

  1. Navigate to Setup → Build → Create → Objects
  2. Select the object to which you want to add fields. In my case, it is Students Data.
  3. Scroll down to Custom Fields & Relationships for that object and click on New as shown in the below screenshot.new objects - salesforce tutorial - edureka
  4. You need to choose the data type of that particular field and then click Next. I have chosen text format because I will be storing letters in this field.
    The different data types of fields have been explained in detail in the next section of this blog.
  5. You will then be prompted to enter the name of the field, maximum length of that field and description.
  6. You can also make it an optional/ mandatory field and allow/ disallow duplicate values for different records by checking on the check boxes. See the below screenshot to get a better understanding.new fields - salesforce tutotial - edureka
  7. Click on Next.
  8. Select the various profiles who can edit that text field at a later point of time. Click Next.
  9. Select the page layouts that should include this field.
  10. Click Save.

As you can see from the below screenshot, there are two types of fields: standard fields, which are created for every object by default, and custom fields, which I have created myself. The four fields which I have created for Students Data are City, Department, Email ID and Phone No. You will notice that all custom fields are suffixed with ‘__c’, which indicates that you have the power to edit and delete those fields, whereas some standard fields can be edited but none can be deleted.

fields - salesforce tutotial - edureka

You can now add student records (complete rows) to your object, either through the point-and-click steps below or programmatically via the API (a short sketch follows the steps).

Steps To Add A Record

  1. Go to the object table from the tab menu. Students Data is the object to which I will add records.
  2. As you can see from the below image, there are no existing records. Click on New to add new student records.new record - salesforce tutotial - edureka
  3. Add student details into different fields as shown in the below screenshot. Click on Save.custom record - salesforce tutotial - edureka
  4. You can create any number of student records. I have created 4 student records as shown in the below screenshot.students - salesforce tutotial - edureka
  5. In case you want to edit the student details, you can click on Edit as shown in the below screenshot.student - salesforce tutotial - edureka
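
If you prefer loading records programmatically rather than clicking through the UI, the same thing can be done over the Salesforce API. Below is a minimal sketch using the third-party simple-salesforce Python library; the object and field API names (Students_Data__c, City__c and so on, with the __c suffix Salesforce adds to custom items) are assumptions based on this example, and your org’s credentials and exact names will differ.

```python
from simple_salesforce import Salesforce  # third-party library: pip install simple-salesforce

# Placeholder credentials -- replace with your own org's login details.
sf = Salesforce(
    username="you@example.com",
    password="your_password",
    security_token="your_security_token",
)

# Insert one student record into the custom object (hypothetical API name Students_Data__c).
result = sf.Students_Data__c.create({
    "Name": "John",
    "City__c": "Bangalore",
    "Department__c": "Computer Science",
    "Email_ID__c": "john@example.com",
    "Phone_No__c": "9000000000",
})
print(result)  # e.g. {'id': 'a01...', 'success': True, 'errors': []}
```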

Data Types Of Fields

Data type controls which type of data can be stored in a field. Fields within a record can have different data types. For example:

  • If it is a phone number field, you can choose Phone.
  • If it is a name or a text field, you can choose Text.
  • If it is a date/ time field, you can choose Date/Time.
  • By choosing Picklist as data type for a field, you can write predefined values in that field and create a drop-down.

You can choose any one of the data types for custom fields. Below is a screenshot listing the different data types.

data types - salesforce tutorial - edureka

Data types like Lookup Relationship, Master-Detail Relationship and External Lookup Relationship are used to create links/ relationships between one or more objects. Relationships between objects are the next topic of discussion in this Salesforce tutorial blog.

Object Relationship In Salesforce

As the name suggests, object relationship is used in Salesforce to create a link between two objects. The question on your mind would be, why is it needed? Let me talk about the need with an example.

In my StudentForce app, there is a Students Data object, which contains personal information of students. Details regarding students’ marks and their previous college are present in different objects. We can use relationships to link these objects through related fields: the Marks and College objects can be linked to the Students Data object using the Student Name field.

Relationships can be defined while choosing the data type. They are always defined in the child object and reference a common field in the master object. Creating such links will help you to search and query data easily when the required data is present in different objects. There are three different types of relationships that can exist between objects. They are:

  • Master-Detail
  • Lookup
  • Junction

Let us look into each of them:

Master-Detail Relationship (1:n)

Master-Detail relationship is a parent-child relationship in which the master object controls the behaviour of the dependent object. It is a 1:n relationship, in which there can be only one parent, but many children. In my example, Students Data is the master object and Marks is the child object.

Let me give you an example of a Master-Detail relationship. The Students Data object contains student records. Each record contains personal information about a student. However, the marks obtained by students are present in another object called Marks. Look at the screenshot of the Marks object below.

master detail relationship - salesforce tutotial - edureka

I have created a link between these two objects by using the student’s name. Below are the points you have to keep in mind when setting up a Master-Detail relationship.

  • Being the controlling object, the master field cannot be empty.
  • If a record in the master object is deleted, the corresponding records in the dependent object are also deleted. This is called a cascade delete.
  • Dependent records will inherit the owner, sharing and security settings from their master.

You can define master-detail relationships between two custom objects, or between a custom object and standard object as long as the standard object is the master in the relationship.
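
One practical payoff of a master-detail relationship is that a single SOQL query can fetch a parent record together with its related child records. The sketch below (again via simple-salesforce) assumes hypothetical API names – Students_Data__c as the master, Marks__c as the detail and Marks__r as the child relationship name – so adjust them to whatever you actually defined.

```python
from simple_salesforce import Salesforce

sf = Salesforce(username="you@example.com", password="your_password",
                security_token="your_security_token")

# Parent-to-child SOQL: fetch each student along with the related Marks child records.
soql = """
    SELECT Name,
           (SELECT Name, Subject__c, Score__c FROM Marks__r)
    FROM Students_Data__c
"""
for student in sf.query(soql)["records"]:
    marks = student.get("Marks__r") or {"records": []}
    print(student["Name"], [m.get("Score__c") for m in marks["records"]])
```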

Lookup Relationship (1:n)

Lookup relationships are used when you want to create a link between two objects, but without the dependency on the parent object. You can think of this as a form of parent-child relationship where there is only one parent, but many children, i.e. a 1:n relationship. Below are the points you have to keep in mind when setting up a Lookup relationship.

  • The lookup field on the child object is not necessarily required.
  • Deleting a record in the parent object does not delete the related records in the child object; they remain unaffected.
  • Child records will not inherit the owner, sharing and security settings of their parent.

An example of a lookup relationship in my case would be that of a College object. You can see the child object: Students Data in the below screenshot. You will notice that there is an empty College field for the first record. This indicates that the dependency is not a necessity.

lookup relationship - salesforce tutotial - edureka

Below is a screenshot of the schema diagram of both the relationships. College – Student Data forms the Lookup relationship and Student Data – Marks forms the Master-Detail relationship.

schema builder 1 - salesforce tutotial - edureka

Self-Relationship

This is a form of lookup relationship where instead of two tables/ objects, the relationship is within the same table/ object. Hence the name self-relationship. Here, the lookup is referenced to the same table. This relationship is also called Hierarchical relationship.

Junction Relationship (Many-To-Many)

This kind of relationship exists when there is a need to model a many-to-many association. It is built by creating two master-detail relationships across three custom objects. Here, two objects will be master objects and the third object will be dependent on both. In simpler words, it will be a child object for both the master objects.

To give you an example of this relationship, I have created two new objects.

  • A master object called Professor. It contains the list of professors.
  • A child object called Courses. It contains the list of courses available.
  • I will use the Students Data object as another master object.

I have created a many-to-many relationship such that every record in the Courses object must have at least one student and at least one professor. This is because every course is a combination of students and professors. In fact, a course can have one or more students and professors associated with it.

The dependency on Student and Professor objects makes Courses as the child object. Student and Professor are thus the master objects. Below is a screenshot of Courses object.

many to many relationship - salesforce tutotial - edureka

You will notice that there are different combinations of professors and students for these subjects. For example, Kate is associated with two courses and has two different professors for each of those two courses. Mike is associated with only one course, but, has two different professors for that course. Both Joe and Kate are associated with the same course and same professor. In the below screenshot, you will find the schematic diagram of this relationship.

schema builder 2 - salesforce tutotial - edureka
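
To see why the junction object is convenient, consider querying it: because Courses is the child of both Students Data and Professor, one query on Courses can resolve the whole many-to-many mapping. The API names below (Course__c, Student__c, Professor__c and their __r relationship names) are assumptions for illustration only.

```python
from simple_salesforce import Salesforce

sf = Salesforce(username="you@example.com", password="your_password",
                security_token="your_security_token")

# Each Course row links one student to one professor, so traversing both
# master fields from the junction object lists every student/professor pairing.
soql = """
    SELECT Name, Student__r.Name, Professor__r.Name
    FROM Course__c
"""
for row in sf.query(soql)["records"]:
    print(row["Name"], "-", row["Student__r"]["Name"], "/", row["Professor__r"]["Name"])
```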

Congrats! The StudentForce App is successfully built. The two schema diagrams present above show how the different objects are linked inside my Salesforce App.

This brings us to the end of this Salesforce tutorial. I hope you understood the various concepts like apps, tabs, profiles, fields, objects and relationships which were explained in this Salesforce tutorial blog. In case you have any doubts or queries, feel free to leave them in the comment section below and I will get back to you at the earliest.

Stay tuned to read the next blog in our Salesforce tutorial series. In the meantime, I would suggest you to create a Salesforce account and play around with the Salesforce app. You can try building your own app by following the instructions mentioned above.


If you want to become a professional skilled in Salesforce then, check out our Salesforce Certification Training which comes with instructor-led live training and real life project experience.

Learn Salesforce From Experts

The post Salesforce Tutorial: Learn To Create Your Own Salesforce App appeared first on Edureka Blog.


What Is Data Science? A Beginner’s Guide To Data Science


What is Data Science - EdurekaAs the world entered the era of big data, the need for its storage also grew. It was the main challenge and concern for the enterprise industries until 2010. The main focus was on building frameworks and solutions to store data. Now that Hadoop and other frameworks have successfully solved the problem of storage, the focus has shifted to the processing of this data. Data Science is the secret sauce here. All the ideas which you see in Hollywood sci-fi movies can actually be turned into reality by Data Science. Data Science is the future of Artificial Intelligence. Therefore, it is very important to understand what Data Science is and how it can add value to your business.

In this blog, I will be covering the following topics.

  • The need for Data Science.
  • What is Data Science?
  • How is it different from Business Intelligence (BI) and Data Analysis?
  • The lifecycle of Data Science with the help of a use case.

By the end of this blog, you will be able to understand what is Data Science and its role in extracting meaningful insights from the complex and large sets of data all around us.

Let’s Understand Why We Need Data Science

  • Traditionally, the data that we had was mostly structured and small in size, and could be analyzed using simple BI tools. Today, most of the data is unstructured or semi-structured. Let’s have a look at the data trends in the image given below, which shows that by 2020, more than 80% of the data will be unstructured.
    Flow of unstructured data - Edureka
    This data is generated from different sources like financial logs, text files, multimedia forms, sensors, and instruments. Simple BI tools are not capable of processing this huge volume and variety of data. This is why we need more complex and advanced analytical tools and algorithms for processing, analyzing and drawing meaningful insights out of it.

This is not the only reason why Data Science has become so popular. Let’s dig deeper and see how Data Science is being used in various domains.

  • How about if you could understand the precise requirements of your customers from existing data like their past browsing history, purchase history, age and income? No doubt you had all this data earlier too, but now, with the vast amount and variety of data, you can train models more effectively and recommend products to your customers with more precision. Wouldn’t it be amazing, as it will bring more business to your organization?
  • Let’s take a different scenario to understand the role of Data Science in decision making. How about if your car had the intelligence to drive you home? Self-driving cars collect live data from sensors, including radars, cameras and lasers, to create a map of their surroundings. Based on this data, they take decisions like when to speed up, when to slow down, when to overtake and where to take a turn – making use of advanced machine learning algorithms.
  • Let’s see how Data Science can be used in predictive analytics. Let’s take weather forecasting as an example. Data from ships, aircrafts, radars, satellites can be collected and analyzed to build models. These models will not only forecast the weather but also help in predicting the occurrence of any natural calamities. It will help you to take appropriate measures beforehand and save many precious lives.

Let’s have a look at the below infographic to see all the domains where Data Science is creating its impression.

Data Science Use Cases - Edureka

Now that you have understood the need of Data Science, let’s understand what is Data Science.

Get Started With Data Science

What is Data Science?

Use of the term Data Science is increasingly common, but what does it exactly mean? What skills do you need to become a Data Scientist? What is the difference between BI and Data Science? How are decisions and predictions made in Data Science? These are some of the questions that will be answered further.

First, let’s see what is Data Science. Data Science is a blend of various tools, algorithms, and machine learning principles with the goal to discover hidden patterns from the raw data. How is this different from what statisticians have been doing for years?

The answer lies in the difference between explaining and predicting.

Data Analyst v/s Data Science - Edureka

As you can see from the above image, a Data Analyst usually explains what is going on by processing the history of the data. On the other hand, a Data Scientist not only does exploratory analysis to discover insights from the data, but also uses various advanced machine learning algorithms to identify the occurrence of a particular event in the future. A Data Scientist will look at the data from many angles, sometimes angles not known earlier.

So, Data Science is primarily used to make decisions and predictions making use of predictive causal analytics, prescriptive analytics (predictive plus decision science) and machine learning.

  • Predictive causal analytics – If you want a model which can predict the possibilities of a particular event in the future, you need to apply predictive causal analytics. Say, if you are providing money on credit, then the probability of customers making future credit payments on time is a matter of concern for you. Here, you can build a model which can perform predictive analytics on the payment history of the customer to predict if the future payments will be on time or not.
  • Prescriptive analytics: If you want a model which has the intelligence of taking its own decisions and the ability to modify it with dynamic parameters, you certainly need prescriptive analytics for it. This relatively new field is all about providing advice. In other terms, it not only predicts but suggests a range of prescribed actions and associated outcomes.
    The best example for this is Google’s self-driving car which I had discussed earlier too. The data gathered by vehicles can be used to train self-driving cars. You can run algorithms on this data to bring intelligence to it. This will enable your car to take decisions like when to turn, which path to take, when to slow down or speed up.
  • Machine learning for making predictions — If you have transactional data of a finance company and need to build a model to determine the future trend, then machine learning algorithms are the best bet. This falls under the paradigm of supervised learning. It is called supervised because you already have the data based on which you can train your machines. For example, a fraud detection model can be trained using a historical record of fraudulent purchases.
  • Machine learning for pattern discovery – If you don’t have the parameters based on which you can make predictions, then you need to find the hidden patterns within the dataset to be able to make meaningful predictions. This is nothing but an unsupervised model, as you don’t have any predefined labels for grouping. The most common algorithm used for pattern discovery is clustering.
    Let’s say you are working in a telephone company and you need to establish a network by putting up towers in a region. Then, you can use the clustering technique to find those tower locations which will ensure that all the users receive optimum signal strength (a small clustering sketch follows this list).
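
To make the clustering idea a little more concrete, here is a minimal scikit-learn sketch on made-up user coordinates; the cluster centres then play the role of candidate tower locations. The data and the choice of four towers are purely illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up (x, y) locations of users in a region -- purely illustrative data.
rng = np.random.default_rng(0)
user_locations = np.vstack([
    rng.normal(loc=centre, scale=1.0, size=(50, 2))
    for centre in [(0, 0), (10, 0), (0, 10), (10, 10)]
])

# Group users into 4 clusters; each cluster centre is a candidate tower location.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(user_locations)
print("Candidate tower locations:\n", kmeans.cluster_centers_)
```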

Let’s see how the proportion of above-described approaches differ for Data Analysis as well as Data Science. As you can see in the image below, Data Analysis includes descriptive analytics and prediction to a certain extent. On the other hand, Data Science is more about Predictive Causal Analytics and Machine Learning.

Data Science Analytics - Edureka

I am sure you might have heard of Business Intelligence (BI) too. Often Data Science is confused with BI. I will state some concise and clear contrasts between the two which will help you in getting a better understanding. Let’s have a look.

Business Intelligence (BI) vs. Data Science

  • BI basically analyzes past data to provide hindsight and insight into business trends. BI enables you to take data from external and internal sources, prepare it, run queries on it and create dashboards to answer questions like quarterly revenue analysis or business problems. BI can also evaluate the impact of certain events in the near future.
  • Data Science is a more forward-looking approach, an exploratory way with the focus on analyzing the past or current data and predicting future outcomes with the aim of making informed decisions. It answers open-ended questions about “what” and “how” events occur.

Let’s have a look at some contrasting features.

  • Data Sources – BI: structured (usually SQL, often a data warehouse); Data Science: both structured and unstructured (logs, cloud data, SQL, NoSQL, text)
  • Approach – BI: statistics and visualization; Data Science: statistics, machine learning, graph analysis, natural language processing (NLP)
  • Focus – BI: past and present; Data Science: present and future
  • Tools – BI: Pentaho, Microsoft BI, QlikView, R; Data Science: RapidMiner, BigML, Weka, R

 

This was all about what is Data Science, now let’s understand the lifecycle of Data Science.

A common mistake made in Data Science projects is rushing into data collection and analysis, without understanding the requirements or even framing the business problem properly. Therefore, it is very important for you to follow all the phases throughout the lifecycle of Data Science to ensure the smooth functioning of the project.

Learn Data Science From Experts

Lifecycle of Data Science

Here is a brief overview of the main phases of the Data Science Lifecycle:

Lifecycle of Data Science - Edureka


Discovery of Data Science - EdurekaPhase 1—Discovery: 
Before you begin the project, it is important to understand the various specifications, requirements, priorities and required budget. You must possess the ability to ask the right questions. Here, you assess if you have the required resources present in terms of people, technology, time and data to support the project. In this phase, you also need to frame the business problem and formulate initial hypotheses (IH) to test.

 

Data Science data preparation - Edureka

Phase 2—Data preparation: In this phase, you require an analytical sandbox in which you can perform analytics for the entire duration of the project. You need to explore, preprocess and condition data prior to modeling. Further, you will perform ETLT (extract, transform, load and transform) to get data into the sandbox. Let’s have a look at the Statistical Analysis flow below.


You can use R for data cleaning, transformation, and visualization. This will help you to spot the outliers and establish a relationship between the variables. Once you have cleaned and prepared the data, it’s time to do exploratory analytics on it. Let’s see how you can achieve that.

Phase 3—Model planning: Data Science model planning - EdurekaHere, you will determine the methods and techniques to draw the relationships between variables. These relationships will set the base for the algorithms which you will implement in the next phase. You will apply Exploratory Data Analytics (EDA) using various statistical formulas and visualization tools.

             

  Let’s have a look at various model planning tools.

Model planning tools in Data Science - Edureka

  1. R has a complete set of modeling capabilities and provides a good environment for building interpretive models.
  2. SQL Analysis services can perform in-database analytics using common data mining functions and basic predictive models.
  3. SAS/ACCESS  can be used to access data from Hadoop and is used for creating repeatable and reusable model flow diagrams.

Although many tools are present in the market, R is the most commonly used tool.

Now that you have got insights into the nature of your data and have decided on the algorithms to be used, in the next stage you will apply these algorithms and build a model.

Data Science model building - EdurekaPhase 4—Model building: In this phase, you will develop datasets for training and testing purposes. You will consider whether your existing tools will suffice for running the models or whether you will need a more robust environment (like fast and parallel processing). You will analyze various learning techniques like classification, association and clustering to build the model.

You can achieve model building through the following tools.

Model building tools in Data Science

Phase 5—Operationalize:  Data Science operationalize - EdurekaIn this phase, you deliver final reports, briefings, code and technical documents. In addition, sometimes a pilot project is also implemented in a real-time production environment. This will provide you with a clear picture of the performance and other related constraints on a small scale before full deployment.

          
Communication in Data Science - EdurekaPhase 6—Communicate results: 
Now it is important to evaluate whether you have achieved the goal that you planned in the first phase. So, in the last phase, you identify all the key findings, communicate them to the stakeholders and determine whether the results of the project are a success or a failure based on the criteria developed in Phase 1.

Now, I will take a case study to explain you the various phases described above.

Case Study: Diabetes Prevention

What if we could predict the occurrence of diabetes and take appropriate measures beforehand to prevent it?
In this use case, we will predict the occurrence of diabetes making use of the entire lifecycle that we discussed earlier. Let’s go through the various steps.

Step 1:

  • First, we will collect the data based on the medical history of the patient as discussed in Phase 1. You can refer to the sample data below.

Data Science sample data - Edureka

  • As you can see, we have the various attributes as mentioned below.

 Attributes:

  1. npreg     –   Number of times pregnant
  2. glucose   –   Plasma glucose concentration
  3. bp          –   Blood pressure
  4. skin        –   Triceps skinfold thickness
  5. bmi        –   Body mass index
  6. ped        –   Diabetes pedigree function
  7. age        –   Age
  8. income   –   Income

Step 2:

  • Now, once we have the data, we need to clean and prepare the data for data analysis.
  • This data has a lot of inconsistencies like missing values, blank columns, abrupt values and incorrect data format which need to be cleaned.
  • Here, we have organized the data into a single table under different attributes – making it look more structured.
  • Let’s have a look at the sample data below.

Data Science inconsistent data - Edureka

This data has a lot of inconsistencies.

  1. In the column npreg, “one” is written in words, whereas it should be in the numeric form like 1.
  2. In the column bp, one of the values is 6600, which is impossible (at least for humans) as bp cannot go up to such a huge value.
  3. As you can see, the Income column is blank and is also of no use in predicting diabetes. Therefore, it is redundant and should be removed from the table.
  • So, we will clean and preprocess this data by removing the outliers, filling up the null values and normalizing the data type. If you remember, this is our second phase which is data preprocessing.
  • Finally, we get the clean data as shown below, which can be used for analysis (a small pandas sketch of this cleaning follows the image).

Data Science consistent data - Edureka
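
For readers who want to see what this cleaning looks like in code, here is a minimal pandas sketch. The toy dataframe, the column names and the blood-pressure cut-off of 250 are assumptions made for illustration, not part of the original dataset.

```python
import pandas as pd

# Toy raw data mirroring the inconsistencies described above (assumed values).
raw = pd.DataFrame({
    "npreg":   ["one", 2, 3],
    "glucose": [148, 85, 183],
    "bp":      [72.0, 6600.0, 64.0],
    "bmi":     [33.6, None, 23.3],
    "income":  [None, None, None],
})

df = raw.drop(columns=["income"])                           # empty, redundant column
df["npreg"] = df["npreg"].replace({"one": 1}).astype(int)   # words -> numeric form
df.loc[df["bp"] > 250, "bp"] = float("nan")                 # impossible values -> missing (assumed cut-off)
df["bp"] = df["bp"].fillna(df["bp"].median())               # fill the null values
df["bmi"] = df["bmi"].fillna(df["bmi"].median())
print(df)
```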

Step 3:

Now let’s do some analysis as discussed earlier in Phase 3.

  • First, we will load the data into the analytical sandbox and apply various statistical functions on it. For example, R has functions like describe which gives us the number of missing values and unique values. We can also use the summary function which will give us statistical information like mean, median, range, min and max values.
  • Then, we use visualization techniques like histograms, line graphs and box plots to get a fair idea of the distribution of the data (a pandas equivalent of these checks is sketched after the image below).

Data Science visualization - Edureka
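
The blog’s examples use R, but the same exploratory checks translate directly to Python. The sketch below is an assumed pandas/matplotlib equivalent, reading a hypothetical cleaned file produced by the previous step.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("diabetes_clean.csv")  # hypothetical file produced by the cleaning step

print(df.describe())      # mean, quartiles, min/max -- similar to R's summary()
print(df.isna().sum())    # missing values per column -- similar to the describe() output mentioned above
df.hist(figsize=(10, 8))  # histograms of each numeric column to see the distributions
plt.show()
```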

Step 4:

Now, based on insights derived from the previous step, the best fit for this kind of problem is the decision tree. Let’s see how.

  • Since we already have the major attributes for analysis, like npreg, bmi, etc., we will use a supervised learning technique to build a model here.
  • Further, we have particularly used a decision tree because it takes all attributes into consideration in one go, including the ones which have a linear relationship as well as those which have a non-linear relationship. In our case, we have a linear relationship between npreg and age, whereas a non-linear relationship between npreg and ped.
  • Decision tree models are also very robust, as we can use different combinations of attributes to make various trees and then finally implement the one with the maximum efficiency (a small training sketch follows below).

Let’s have a look at our decision tree.

Here, the most important parameter is the level of glucose, so it is our root node. Now, the current node and its value determine the next important parameter to be taken. It goes on until we get the result in terms of pos or neg. Pos means the tendency of having diabetes is positive and neg means the tendency of having diabetes is negative.

If you want to learn more about the implementation of the decision tree, refer to this blog: How To Create A Perfect Decision Tree.
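
As a rough illustration of this modelling step, here is a hedged scikit-learn sketch that trains a decision tree on the cleaned attributes. The file name and the 'type' label column (pos/neg) are assumptions, and a real project would also tune the tree depth, validate more carefully and compare attribute combinations as described above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("diabetes_clean.csv")   # hypothetical cleaned dataset from Step 2
X = df[["npreg", "glucose", "bp", "skin", "bmi", "ped", "age"]]
y = df["type"]                           # assumed label column holding "pos" / "neg"

# Hold out part of the data for testing, as described in the model-building phase.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier(max_depth=4, random_state=42)  # a shallow tree keeps the rules readable
model.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("Most informative attribute:", X.columns[model.feature_importances_.argmax()])  # the blog's tree puts glucose at the root
```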

Step 5:

In this phase, we will run a small pilot project to check if our results are appropriate. We will also look for performance constraints if any. If the results are not accurate, then we need to replan and rebuild the model.

Step 6:

Once we have executed the project successfully, we will share the output for full deployment.

Being a Data Scientist is easier said than done. So, let’s see what all you need to be a Data Scientist. A Data Scientist requires skills from three major areas, as shown below.

Data Science skills - Edureka

As you can see in the above image, you need to acquire various hard skills and soft skills. You need to be good at statistics and mathematics to analyze and visualize data. Needless to say, Machine Learning forms the heart of Data Science and requires you to be good at it. Also, you need to have a solid understanding of the domain you are working in to understand the business problems clearly. Your task does not end here. You should be capable of implementing various algorithms which require good coding skills. Finally, once you have made certain key decisions, it is important for you to deliver them to the stakeholders. So, good communication will definitely add brownie points to your skills.

In the end, it won’t be wrong to say that the future belongs to Data Scientists. It is predicted that by the end of the year 2018, there will be a need for around one million Data Scientists. More and more data will provide opportunities to drive key business decisions. Data Science is soon going to change the way we look at the world, which is deluged with data all around us. Therefore, a Data Scientist should be highly skilled and motivated to solve the most complex problems.

I hope you enjoyed reading my blog and understood what Data Science is. Check out our Data Science certification training here, which comes with instructor-led live training and real-life project experience.

Check Out Our Data Science Course

The post What Is Data Science? A Beginner’s Guide To Data Science appeared first on Edureka Blog.

Salesforce Certifications: Jump-Start Your Career In Salesforce


salesforce certifications - edurekaAre you considering a career in Salesforce and not sure which path to take for career growth? Then it’s time for you to consider Salesforce certifications. In my previous blog, we saw what Salesforce is, why organizations should choose Salesforce and where Salesforce technology has been successfully applied. Now, you may be wondering how Salesforce can be beneficial to your career.

In this blog, I will give you a detailed explanation on why you should learn Salesforce and introduce you to numerous Salesforce credentials. Then, we’ll look at the frequently taken Salesforce certifications and their significance in today’s technology market. Finally, I will provide you with details about the career opportunities available in Salesforce. 

Why Should You Learn Salesforce?

In the past, you may have learnt different technologies and later found out that you can’t do much using it yourself or in the industry. Salesforce isn’t like other technologies which are difficult to learn and even more difficult to get experienced in. Salesforce is a skill that you can learn whenever you want, wherever you want and get recognized across the world. It is a skill that thousands of companies use to make their business successful.

salesforce learning - salesforce certifications - edureka

As seen in the above image, learning Salesforce is advantageous and can provide a boost to your career. If you are a professional looking for a technology or career change, then you should definitely consider Salesforce for the following reasons:

  • Salesforce has a customer base of over 150,000 companies comprising large-scale, medium-scale and small-scale enterprises which require Salesforce professionals. Source: www.expandedramblings.com
  • Big companies like Facebook, Google, Twitter, General Electric, HCL and others use Salesforce. Learning Salesforce can help you get recognized and land a job in such top companies. Source: www.Salesforce.com
  • Learning Salesforce can help you increase your salary.
  • Salesforce technology is being used by organizations across the world, so you can easily find opportunities everywhere.

Why Salesforce Certifications?

As we have seen, Salesforce is one of the hottest technologies out there. Mentioning Salesforce on your resume is definitely a plus point and makes you stand out from the rest. You might be wondering what the point of getting certified is and what you gain by it. Well, here are the reasons why you should get Salesforce certified:

  • Salesforce certifications give credibility to your Salesforce knowledge and expertise. As the standard of Salesforce certifications is high, you can take it for granted that a person who has been certified is an expert in that field.
  • Currently, there is a high demand for people with Salesforce certifications. If you are a Salesforce certified professional, you’ll be able to easily attract employers.
  • Organizations need certified people as they attract clients. Clients generally prefer companies whose employees are certified, as it gives them assurance with regard to quality.
  • If you are a person who is looking for a change in role or advancing your career, then getting certified will provide you with that opportunity.

salesforce job and career graph - salesforce certifications - edureka

Above are two graphs from www.indeed.com, which will give you the current trend in the job market. As you can see from the graph on the left, the number of Salesforce job postings has significantly increased in the last few years. From the graph on the right, we can infer that over the past 3 years, there has been a steep rise in companies interested in Salesforce professionals. The graph below from www.jobgraphs.com provides you with the information on the average yearly salary a Salesforce professional receives in different countries. As you can see Salesforce is a good technology to bet your future on.

salary graph - salesforce certifications - edureka

Learn Salesforce From Expert

Salesforce Certifications

You’ll be surprised to know that millions of people across the world apply for Salesforce certifications every year. Owing to this high number, Salesforce has a separate entity called the Salesforce University to handle certifications and release exams.   

Now that you know why it would be useful for you to learn Salesforce and the significance of Salesforce certifications, let’s see all the certifications that Salesforce has to offer and the ones which are relevant to you.

salesforce credentials - salesforce certifications - edureka

The above image shows all the Salesforce credentials. Yes, that’s a lot of credentials that you can go for. But, depending on what you want to pursue, we can divide the frequently taken certifications into two tracks:

  1. Administrator/ Implementation Track
  2. Developer Track

The image below gives you a better idea of what each of these tracks comprises. I will discuss each of these tracks in detail.

salesforce roadmap - salesforce certifications - edureka
Administrator/ Implementation Track

In organizations which use Salesforce products, a Salesforce administrator plays an important role. It is not like your regular administrative roles like network administrator or database administrator whose job is to just maintain users and change passwords. A Salesforce administrator has immense responsibilities and needs to have a complete understanding about the business and its functionalities. If you are a Salesforce administrator or an implementation expert you would be a functional consultant working alongside your organization’s customers and solving their problems.

Let us take a look at the certifications that people frequently take up in the Administration/ Implementation track: 

1. Salesforce Certified Administrator – If you are a fresher or someone who wants to start off as an administrator or consultant, then you should be looking at being a Salesforce Certified Administrator. To be a Salesforce administrator, you need to have general knowledge of Salesforce features. Having a Salesforce Administrator Certificate proves that:

  • You have a broad knowledge of Salesforce application.
  • You know how to configure and manage Sales and Service cloud applications.
  • You know how to manage Salesforce CRM and have good presentation skills.
  • You are good with basic functionalities like managing users, development and customizations.

about the exam - salesforce certifications - edureka

The image above gives you the details about the Salesforce Administrator Exam. Below is the weightage distribution of topics on which questions will be asked in the certification examination.topics weightage - salesforce certifications - edureka

2. Salesforce Certified Advanced Administrator – To be a Salesforce Certified Advanced Administrator you first need to be a Salesforce Certified Administrator. Along with this, you need 2-5 years of experience as a Salesforce administrator and be capable of using the full set of Salesforce features. Having a Salesforce Advanced Administrator certificate proves that:

  • You are comfortable with designing robust and scalable applications for clients.
  • You have handled a fair share of Salesforce Orgs.

about the exam - salesforce certifications - edureka

The above image gives you the details about the Salesforce Advanced Administrator Exam. Below is the weightage distribution of topics on which questions will be asked in the certification examination.advanced administrator weightage - salesforce certifications - edureka

3. Salesforce Sales Cloud Consultant – To be a Salesforce Sales Cloud Consultant you first need to be a Salesforce Certified Administrator. Along with this, you need to have experience in designing solutions that optimize Sales Cloud functionality and experience working with sales and marketing organizations. Generally, people who go for the Sales Cloud consultant certification have 2-5 years of experience as a senior business analyst. Having the Salesforce Sales Cloud certificate proves that:

  • You are proficient in understanding customer needs.
  • You can design and maintain Sales cloud application as per customer business requirements.  

about the exam - salesforce certifications - edureka

The above image gives you the details about the Salesforce Sales Cloud Consultant Exam. Below is the weightage distribution of topics on which questions will be asked in the certification examination.sales cloud weightage - salesforce certifications - edureka

4. Salesforce Service Cloud Consultant – To be a Salesforce Service Cloud Consultant you first need to be a Salesforce Certified Administrator. Along with this, you need to have experience in designing solutions using Service cloud. Generally, people who go for Service cloud consultant certification have 2-5 years of experience as a senior business analyst and industry expertise in Salesforce applications. Having Salesforce Service Cloud certificate proves that:

  • You are an implementation expert in Service cloud solutions as per customer needs.

about the exam - salesforce certifications - edureka

The above image gives you the details about the Salesforce Service Cloud Consultant Exam. Below is the weightage distribution of topics on which questions will be asked in the certification examination.service cloud weightage - salesforce certifications - edureka

Typically, people choose to first become a Salesforce Certified Administrator, which would suit you if you are looking for an entry level role. Then they move on to either Advanced Administrator or an Implementation Expert certification.

Developer Track

Generally, professionals who choose the developer track are the ones who have some experience in developing applications in languages and frameworks like Java, PHP, C++, C#, AngularJS or others. If you fall in this category, then Salesforce provides you with a wide range of opportunities to build and advance your career.

Let us take a look at the certifications that people take up frequently in the developer track:

1. Salesforce Certified Platform App Builder – This certification is for someone who has experience in developing custom applications on the Force.com Platform. Having Salesforce Platform App Builder certificate proves that:

  • You are comfortable with the declarative part of Salesforce.
  • You are comfortable with lightning experience and the lightning app builder.

about the exam -salesforce certifications - edureka

The above image gives you the details about the Salesforce Platform App Builder Exam. Below is the weightage distribution of topics on which questions will be asked in the certification examination.

app builder - salesforce certifications - edureka

2. Salesforce Certified Platform Developer I – If you are a developer with 6 months to 1 year of experience developing applications on any platform, then you should be looking at becoming a Salesforce Certified Platform Developer I. You should have at least 6 months of experience on the Force.com platform. Having Salesforce Platform Developer I certificate proves that:

  • You are proficient in building custom applications on the Force.com platform.
  • You have basic knowledge of Apex and Visualforce.

about the exam - salesforce certifications - edureka

The above image gives you the details about the Salesforce Platform Developer I Exam. Below is the weightage distribution of topics on which questions will be asked in the certification examination.

platform developer 1 - salesforce certifications - edureka

3. Salesforce Certified Platform Developer II – To become Salesforce Certified Platform Developer II, you first need to be Salesforce Certified Platform Developer I. Generally, people who go for this certification have 2-4 years of experience as a developer and at least 1 year of experience on the Force.com platform. Having Salesforce Platform Developer II certificate proves that:

  • You have mastered Apex and Visualforce.
  • You have a good understanding of SOAP and REST APIs, Heroku and building Lightning components.

about the exam - salesforce certifications - edureka

The above image gives you the details about the Salesforce Platform Developer II Exam. Below is the weightage distribution of topics on which questions will be asked in the certification examination.

platform developer 2 - salesforce certifications - edureka

For people looking to start their careers in a development role, Salesforce App Builder or Salesforce Platform Developer I would be the right starting points. If you get certified as a Salesforce App Builder, then you have the option of furthering your career by getting certified in the Architect role. If you choose Salesforce Platform Developer I, then you can advance your career by becoming a Salesforce Platform Developer II. These are the options that are available for you in the developer track.

Careers In Salesforce

Salesforce being the number one CRM company in the world, professionals who work on it are in high demand. Salesforce has been growing at a rapid pace over the last decade. Over the next five years, the field is expected to grow by 25% or more. According to Glassdoor, Salesforce is one of the top 10 skills in the market and over 4,000 jobs are available for Salesforce professionals. If you are a person looking for a technical role, then you would be interested in jobs like administrator, engineer, developer, analyst, cloud professional, technical support and more. The below image will give you a sense of the different roles available for a Salesforce professional and their average salary.

salesforce salary - salesforce certifications - edureka

Source: www.salesforce.com

From this blog, I hope you have got a good idea on the different Salesforce certifications that are available in the market and which certifications/ tracks are suitable and relevant for you. Feel free to leave any questions you have in the comment box below. 


Check out our Salesforce Certification Training, which will enable you to understand the basic concepts related to Salesforce. The course curriculum is aligned with the administrator and developer certifications. This course will provide you a 360 degree view of all Salesforce features, with instructor-led live training and real life project experience. 

Check Out Salesforce Course

The post Salesforce Certifications: Jump-Start Your Career In Salesforce appeared first on Edureka Blog.

Salesforce Service Cloud – One Stop Solution For Customer Needs


Salesforce Service Cloud - Edureka

Salesforce, being a CRM, is used to connect people and information. In this blog, I am going to explain one of its core services, Salesforce Service Cloud, and how it revolutionized customer support by making interactions easier between an organization and its customers. In my previous blog, you learned how to create a custom Salesforce Application. Moving forward, I will help you understand how Salesforce Service Cloud can add value to your business. First, I will explain the need for Salesforce Service Cloud, what it is and what services it provides to engage your customers. In the end, I will explain a use case on how Coca-Cola has been extremely successful in enhancing its customers' experience using Service Cloud.

So, let’s get started with why your organization should choose Salesforce Service Cloud.

Why Salesforce Service Cloud?

If your company cares deeply about customer service, then Salesforce Service Cloud is what you should go for. Irrespective of whether you are in the B2C or B2B domain, you will have several customers raising tickets and queries on a regular basis. These tickets will be received by your service agents. Salesforce Service Cloud helps you in tracking and solving these tickets efficiently.
This is not the only way you can transform the customer experience. Let's dig deeper and see how Salesforce Service Cloud is creating an impression.

  • Maximize Agent Productivity – Using Service Cloud, agents can work from anywhere. With the easy management options available (such as a web-based application, mobile devices and a knowledge base), agent productivity is enhanced, leading to a reduction in overhead costs.
  • Transform Customer Experience – Customer relations are drastically enhanced by connecting one to one with every customer via live agents. You can increase customer loyalty, satisfaction and retention, leading to repeat business from existing customers, an increase in the LTV (lifetime value) of your customers and positive word of mouth for your brand.
  • Security – Your data is completely safe and secure with the Service Cloud platform. It follows a multilayered approach to protect the information which is vital to your business.
  • Leverage Social Media Platforms – You can also interact with your customers on social media such as Facebook or Twitter in real-time.
  • Case Tracking – Tracking helps you in faster case resolution. This leads to better management of a person’s day to day activities and manual errors are drastically reduced.

To sum up, Salesforce Service Cloud definitely helps in improving your operational processes, leading to a better experience for your customers. Based on a study done across companies using Salesforce Service Cloud, growth in performance metrics has been dramatic. As you can see in the below infographic, agent productivity increased by 40% and case resolution improved by 41%, which eventually led to a 31% increase in customer retention.

Growth in performance using Salesforce Service Cloud- Edureka

This growth illustrates why people prefer Salesforce Service Cloud and how it plays an important role in improving your customer support team.

Now let’s understand what Salesforce Service Cloud is and what services it has to offer.

What is Salesforce Service Cloud?

Salesforce offers Service Cloud as Software as a Service. Service Cloud is built on the Salesforce Customer Success Platform, giving you a 360-degree view of your customers and enabling you to deliver smarter, faster and more personalized service. 

With Salesforce Service Cloud, you can create a connected knowledge base, enable live agent chat and manage case interactions – all on one platform. You can have personalized customer interactions or even up-sell your products/services based on a customer's past activity data.

Now, you may be wondering how to access Service Cloud. Let me walk you through the steps to access a Service Cloud Console.
Step 1: Login to login.salesforce.com
Step 2: Create a SF Console App
Step 3: Choose its display
Step 4: Customize push notifications
Step 5: Grant users Console Access – Sc User 

 Get Started With Salesforce

What Services Does It Offer?

As I mentioned earlier, there are case tracking and knowledge base features. There are several other services that Salesforce Service Cloud offers which will enable you to provide a differentiated customer experience. You can refer to the below image to see what Salesforce Service Cloud has to offer you.

Salesforce Service Cloud - Edureka

You can take your console to the next level by learning the following features in Salesforce:

Case Management Salesforce Service Cloud - Edureka
Case Management – Any customer issues raised are usually captured and tracked as cases. Cases can be further classified into the following:

 

  • Email-To-Case: Email-To-Case helps you create a case automatically when an email is sent to one of your company’s email addresses, such as support@edureka.co. These generated cases will be displayed in an ‘Emails related list’. This Emails related list includes all emails sent by your customer on a particular case, as well as the email threads. 
  • Web-to-Case: Web-to-case helps you create a new case automatically in Salesforce whenever a support request comes directly from your company’s website. To enable it, you can go to Setup → Build → Self-service → Web-to-case settings.
    Check the “Enable Web-to-Case” checkbox. You can select an Auto-response template and select the default case origin as ‘Web’.
  • Escalation and Auto-Response: Case escalation rules are used to reassign and optionally notify individuals when a case is not closed within a specified time period. Also, you can configure auto-response rules to respond to cases either from the web or email. 

At the core of the Service Cloud lies the 'Case' module. Let us understand the Case module with an example. Assume that in a large organization like Coca-Cola, a few of the employees' systems crash; let's call this case 'breakdown of laptops'. Now you need to fix this as soon as possible to ensure business continuity. Service Cloud helps you track the progress and provides you with all the necessary information about every Coca-Cola agent. You can solve the problem by creating a case. You can then assign it a 'high' priority, categorize the origin of the case (such as phone, email or web) and click on 'Save'. Refer to the below screenshot to get a better understanding.

New case in Salesforce Service Cloud - Edureka

Solutions in Salesforce Service Cloud - Edureka
Solutions – You can categorize your solutions into query types, making your solution search easier and closing cases faster. With this, the agent does not need to create a new solution for existing queries every time. This helps in enhancing your agent productivity. Solutions do not need any additional license.

For the same Coca-Cola scenario, if you want to solve a case as an agent, then you will definitely search for a solution. First, you can check whether a solution is already present. If it is not, then your admin can create a solution stating how the case has been resolved so that it can be closed. You can refer to the screenshot attached below.

laptop solution in Salesforce Service Cloud- Edureka

As you can see in the above screenshot, I have created a solution called 'Laptop Solution' that displays the title, status and details of the solution created.

Salesforce Knowledge - Edureka
Knowledge
Salesforce Knowledge is a knowledge base where users can create, edit and manage content. Knowledge articles are documents of information. Customers can go to the company's website and search for solutions. Unlike solutions, knowledge articles can be associated with a case before it is closed. Salesforce Knowledge needs a separate license to be purchased.

 

Salesforce Communities - Edureka
Communities – Communities are a way to collaborate with business partners and customers, distributors, resellers and suppliers who are not part of your organization. Typically, these are the people who are not your regular SFDC users, but you want to provide them some channel to connect with your organization and provide them access to some data as well. 

In Salesforce, if you go to the 'Call Center' dropdown, you will find the Success Community. A Salesforce user can use their user id and password to log in there. This community is accessible to all developers, functional consultants and admins. In this community, a user can search for anything, as it contains documentation, articles, knowledge, feeds, questions and much more. For example, if you want to know about record types, you can search for them here. Have a look at the screenshot attached below.

Salesforce Service Community - Edureka


As you can see in the above search results, you get a lot of customer problems, documentation, known issues, ideas, etc. You can now start exploring them, understand the major issues faced by customers and fix them accordingly.

Salesforce Service Cloud Console - Edureka
Console – The agent console provides a unified agent experience. It reduces response time by placing all the information together. In a console, you can find everything from customer profiles to case histories to dashboards – all in one place.

I have already shown you the basics of how to set up a Salesforce console at the beginning of this blog. An admin can grant Console Access to users: Service Cloud gives you console access where you can assign users to it. Referring to the below screenshot, you can assign a user profile for the console. Also, you can assign the Service Cloud user license to agents with those profiles so that they can start using your console.

Console Access Salesforce Service Cloud - Edureka

 

Salesforce Service Social Media - Edureka
Social Media – Service Cloud lets you leverage social media platforms such as Facebook and Twitter to engage visitors. With Salesforce Social Studio, customer requests are escalated directly to your social service team. Social media plays an important role in bridging the gap with the virtual world and engaging customers in real time.

 

Salesforce Live Agent - Edureka
Live Agent – Live agents deal with 1:1 customer interactions. Agents can provide answers faster with customer chat and keyboard shortcuts. They stay totally connected to the customers, as their team members are alerted immediately to get the issue resolved. It also makes agents smarter and more productive in the process with real-time assistance, which in turn improves customer satisfaction.

Salesforce Service Cloud is all about providing services to your customers and building a relationship with them. You can use other features such as the call center, email & chat, phone, Google search, contracts and entitlements, Chatter and call scripting.

Learn Salesforce From The Experts

How Much Does Salesforce Service Cloud Cost?

Salesforce Service Cloud offers three pricing packages – Professional, Enterprise and Unlimited. You can refer to the feature lists below and select your plan accordingly.

Professional – $75 USD/user/month

  • Case management
  • Service contracts and entitlements
  • Single Service Console app
  • Web and email response
  • Social customer service
  • Lead-contact account management
  • Order management
  • Opportunity tracking
  • Chatter collaboration
  • Customizable reports and dashboards
  • CTI integration
  • Mobile access and administration
  • Limited process automation
  • Limited number of record types, profiles, and role permission sets
  • Unlimited apps and tabs

Enterprise – $150 USD/user/month

  • Advanced case management
  • Multiple Service Console apps
  • Workflow and approvals
  • Integration via web service API
  • Enterprise analytics
  • Call scripting
  • Offline access
  • Salesforce Identity
  • Salesforce Private AppExchange
  • Custom app development
  • Multiple sandboxes
  • Knowledge base
  • Live Agent web chat
  • Customer Community
  • Live video chat (SOS)

Unlimited – $300 USD/user/month

  • Live Agent web chat
  • Knowledge base
  • Additional data storage
  • Expanded sandbox environments
  • 24/7 toll-free support
  • Access to 100+ admin services
  • Unlimited online training
  • Customer Community
  • Live video chat (SOS)


“Our agents love Salesforce CRM Service. They tell us how easy it is to use and how phenomenal it is when it comes to driving a better customer experience” – Charter

This is how Salesforce Service Cloud has revolutionized the way customers interact with organizations using the services over the internet. Now, let’s have a look at how Coca-Cola implemented Salesforce Service Cloud to solve its business challenges.

Salesforce Service Cloud Use Case: Coca-Cola 

coca-cola

Many global organizations leverage Salesforce Service Cloud for a better customer relationship management solution. Here, I will talk about how Coca-Cola Germany used Service Cloud to analyze consumer behavior and build data-driven business strategies. This use case will give you an idea of how Service Cloud can be used extensively across any domain.
Salesforce Service Cloud is an integrated platform to connect employees, customers and suppliers around the world.

Earlier, Coca-Cola was facing several issues while managing their customers. Some of them are listed below:

  • The company's in-house repair facility formerly had technicians who tracked their jobs on paper, which took a lot of time and effort.
  • The call center and repair department suffered from frequent downtime.
  • Lack of speed, functionality, scalability and connectivity with a fully-mobile experience.
  • Slow mobile app sync-up.
  • Overall unsatisfactory user experience.

“In the past, big companies outcompeted smaller companies. But that’s history. Today, the fast companies outcompete the slow companies,” explained Ulrik Nehammer – CEO of Coca-Cola.

Now that they are connected to Salesforce Service Cloud, technicians are alerted in real time about customer issues. This helps reduce response time dramatically. Also, call center support agents receive instant access to customer history. With all of this, the productivity of Coca-Cola Germany's technical services has shot up by 30%.

A Big Fix for Coca-Cola

With the Service Cloud, they wanted to understand their customers' needs and cater to them more effectively. Here are some key points that contributed to their excellence.

  • Customer satisfaction – One-to-one support to customers through any channel or product, with in-app services like video chat or agents instantly guiding them to solutions.
  • Mobile App – Using mobile app support, customers can interact via live agent video chat, screen sharing and on-screen guided assistance. These services transform customer support, resulting in happier customers.
  • Analytics – Using Salesforce Service Cloud, all information is gathered and evaluated through a custom dashboard. Coca-Cola performed analysis to check past transactions and immediately took action at the locations they serve. This helped them in making better and more profitable decisions in less time.
  • Agent productivity is supercharged – With features such as email-to-case, skills-based routing and milestone tracking, Service Cloud gave their agents the tools to respond quickly and efficiently to customers on any channel. This is how Coca-Cola has enhanced overall productivity.

“This has been a massive step forward for us,” said Andrea Malende, business process expert and mobile solutions at Coca-Cola. “I’m amazed how quick and smooth the implementation was.”

This is how Coca-Cola implemented Salesforce Service Cloud thus making their customers happy. There are several other Salesforce Service Cloud use case stories which show how various companies have benefited and grown their business.

Integrations available for Salesforce Service Cloud

Salesforce Service Cloud supports integration with various applications and business systems, as shown in the image below:

Integrations in Salesforce Service Cloud - Edureka

Since everyone and everything is connected on one platform, you should definitely go for Salesforce Service Cloud. Hope you enjoyed reading my blog, stay tuned for my next blog in this series!


Check out our Salesforce Certification Training, which comes with instructor-led live training and real life project experience. Feel free to leave any questions you have in the comment box below.

Check Out Our Course

The post Salesforce Service Cloud – One Stop Solution For Customer Needs appeared first on Edureka Blog.

K-means Clustering Algorithm: Know How It Works


k-means clustering - Edureka

We do understand that not all customers are alike or have the same taste. This leads to the challenge of marketing the right product to the right customer. An offer or product which might entice a particular customer segment may not be very helpful to other segments. So, you can apply the k-means clustering algorithm to segment your entire customer audience into groups with similar traits and preferences, based on various metrics (such as their activities, likes and dislikes on social media and their purchase history). Based on the customer segments identified, you can create personalized marketing strategies and bring more business to your organisation.

I hope you enjoyed reading my previous blog – What is Data Science which covers Machine Learning and the lifecycle of Data Science in detail. Before delving into k-means clustering directly, I will be covering following topics to give you a basic understanding of clustering.

  • Introduction to Machine Learning
  • The need of clustering with examples
  • What is clustering?
  • Types of clustering
  • k-means clustering
  • Hands-on: Implementation of k-means clustering on movie dataset using R. Cluster formation of movies based on their business and popularity among viewers.

Machine Learning is one of the most recent and exciting technologies. You probably use it dozens of times a day without even knowing it. Machine Learning is a type of artificial intelligence that provides computers with the ability to learn without being explicitly programmed. It works on supervised and unsupervised learning models. Unlike the supervised learning model, the unsupervised model of Machine Learning has no predefined groups under which you can distribute your data. You can find these groupings through clustering. I will explain it further through the following examples.

Unlabeled data - k-means Clustering - Edureka

As you can see in this image, the data points are shown as blue dots. These data points do not have labels based on which you can differentiate them. You do not know anything about this data. So now the question is, can you find any structure in this data? This problem can be solved using the clustering technique. Clustering will divide this entire dataset into different labels (here called clusters), grouping similar data points into one cluster, as shown in the graph given below. It is a very powerful technique for exploratory descriptive analysis.


Clustering - Edureka

Here, the clustering technique has partitioned the entire set of data points into two clusters. The data points within a cluster are similar to each other but different from those in other clusters. For example, suppose you have data on the symptoms of patients. You can then find out the name of a particular disease based on these symptoms.

Let’s understand clustering further with an example of google news.

Every day, hundreds and thousands of news items come up on the web, and Google News groups them into cohesive news stories. Let's see how.

Once you go to news.google.com, you will see numerous news stories grouped as shown below.

google news cluster - Edureka

They are grouped into different news stories. Here, if you look at the red highlighted area, you will see that various news URLs related to Trump and Modi are grouped under one section and the rest under other sections. On clicking a different URL from the group, you will get a different story on the same topic. So, Google News automatically clusters news stories about the same topic.

genes cluster - Edureka

Another very fascinating application of clustering is in genomics. Genomics is the study of DNA. As you can see in the image, different colors like red, green and grey depict the degree to which an individual does or does not have a specific gene. So, you can run a clustering algorithm on the DNA data of a group of people to create different clusters. This can give you very valuable insights into the health of particular genes.

For example, people with the Duffy-negative genotype tend to have higher resistance to malaria and are generally found in African regions. So, you can draw a relationship between the genotype and the native habitat, and find out people's response to particular diseases.

So, basically, clustering partitions the dataset into different groups based on similarities, which can act as a base for further analysis. The result is that objects in one group will be similar to one another but different from objects in another group.

Get Started With Data Science

Now, once you have understood what is clustering, let’s look at different ways to achieve these clusters. 

Exclusive clustering - Edureka

Exclusive Clustering – In exclusive clustering, an item belongs exclusively to one cluster, not several. In the image, you can see that data belonging to cluster 0 does not belong to cluster 1 or cluster 2. k-means clustering is a type of exclusive clustering.

 

Overlapping clustering - Edureka

Overlapping Clustering – Here, an item can belong to multiple clusters, with a different degree of association with each cluster. The Fuzzy C-means algorithm is based on overlapping clustering.

 

hierarchical clustering - Edureka

Hierarchical Clustering – In hierarchical clustering, the clusters are not formed in a single step; rather, the algorithm follows a series of partitions to come up with the final clusters. The result looks like a tree, as visible in the image.

 

While implementing any algorithm, computational speed and efficiency become very important parameters for the end results. I have chosen to explain k-means clustering because it works really well with large datasets due to its higher computational speed and its ease of use.

k-means Clustering

k-means clustering is one of the simplest algorithms which uses the unsupervised learning method to solve known clustering issues. k-means clustering requires the following two inputs (a minimal R call using them is sketched right after the list).

  1. k = number of clusters
  2. Training set (m) = {x1, x2, x3, …, xm}
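
Here is that minimal sketch of how the two inputs map onto R's built-in kmeans() function; the matrix x and the value of k below are purely illustrative placeholders, not data from this blog.

# x is the training set (a numeric matrix); k is the chosen number of clusters
x <- matrix(rnorm(200), ncol = 2)   # 100 illustrative two-dimensional points
k <- 3
model <- kmeans(x, centers = k)     # the two inputs: training set and k
model$centers                       # resulting cluster centroids
table(model$cluster)                # cluster sizes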

Let’s say you have an unlabeled data set like the one shown below and you want to group this data into clusters.

 unlabeled data - Edureka

Now, the important question is how should you choose the optimum number of clusters? There are two possible ways for choosing the number of clusters.


(i)   Elbow Method: 
Here, you draw a curve between WSS (within sum of squares) and the number of clusters. It is called the elbow method because the curve looks like a human arm and the elbow point gives us the optimum number of clusters. As you can see, after the elbow point there is a very slow change in the value of WSS, so you should take the elbow point value as the final number of clusters.

Elbow method - Edureka

(ii)  Purpose Based: You can run the k-means clustering algorithm to get different clusters based on a variety of purposes. You can partition the data on different metrics and see how well it performs for that particular case. Let's take an example of marketing T-shirts of different sizes. You can partition the dataset into a different number of clusters depending upon the purpose that you want to meet. In the following example, I have taken two different criteria: price and comfort.

Let’s see these two possibilities as shown in the image below.

T shirt clusters - Edureka

  1. K=3:  If you want to provide only 3 sizes(S, M, L) so that prices are cheaper, you will divide the data set into 3 clusters.
  2. K=5: Now, if you want to provide more comfort and variety to your customers with more sizes (XS, S, M, L, XL), then you will divide the data set into 5 clusters.

Now, once we have the value of k with us, let's understand its execution.

  • Initialisation:
    First, you need to randomly initialise two points called the cluster centroids. Here, you need to make sure that the number of cluster centroids (depicted by an orange and a blue cross in the image) is less than the number of training data points (depicted by navy blue dots). The k-means clustering algorithm is an iterative algorithm and it follows the next two steps iteratively. Once you are done with the initialization, let's move on to the next step.
  • Cluster Assignment:
    cluster assignment - Edureka
    In this step, the algorithm goes through all the navy blue data points to compute the distance between each data point and the cluster centroids initialised in the previous step. Depending upon the minimum distance from the orange or the blue cluster centroid, each data point groups itself into that particular cluster. So, the data points are divided into two groups, one represented by orange and the other by blue, as shown in the graph. Since these cluster formations are not the optimised clusters, let's move ahead and see how to get the final clusters.

Move centroid - Edureka

  • Move Centroid:
    Now, you will take the above two cluster centroids and iteratively reposition them for optimization. You will take all the blue dots, compute their average and move the current blue cluster centroid to this new location. Similarly, you will move the orange cluster centroid to the average of the orange data points. Therefore, the new cluster centroids will look as shown in the graph. Moving forward, let's see how we can optimize the clusters, which will give us better insight.

    Centroid convergence - Edureka
  • Optimization:
    You need to repeat the above two steps iteratively till the cluster centroids stop changing their positions and become static. Once the clusters become static, the k-means clustering algorithm is said to have converged.
  • Convergence:
    Finally, the k-means clustering algorithm converges and divides the data points into two clusters, clearly visible in orange and blue. Note that k-means may end up converging to different solutions depending on how the clusters were initialised. (A short R sketch of one cluster-assignment and move-centroid pass follows this list.)
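
To make the two iterated steps concrete, here is a minimal R sketch of a single cluster-assignment and move-centroid pass. It is purely illustrative and is not the built-in kmeans() function used in the hands-on later; x is assumed to be a numeric matrix and centroids a k-row matrix of current centroids.

# One illustrative k-means pass: assign points, then move centroids
kmeans_one_pass <- function(x, centroids) {
  # Cluster assignment: each point joins its nearest centroid
  assignment <- apply(x, 1, function(p) which.min(colSums((t(centroids) - p)^2)))
  # Move centroid: each centroid becomes the mean of its assigned points
  new_centroids <- t(sapply(seq_len(nrow(centroids)), function(j)
    colMeans(x[assignment == j, , drop = FALSE])))
  list(assignment = assignment, centroids = new_centroids)
}
# Repeating this pass until the centroids stop moving gives the converged clusters.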

As you can see in the graph below, the three clusters are clearly visible but you might end up having different clusters depending upon your choice of cluster centroids.

Cluster partition - Edureka

Below shown are some other possibilities of cluster partitioning based on the different choice of cluster centroids. You may end up having any of these groupings based on your requirements and the goal that you are trying to achieve.

Cluster partition variety - Edureka

Now that you have understood the concepts of clustering, let's do some hands-on in R.

Learn Data Science From Experts 

k-means clustering case study: Movie clustering

Let’s say, you have a movie dataset with 28 different attributes ranging from director facebook likes, movie likes, actor likes, budget to gross and you want to find out movies with maximum popularity amongst the viewers. You can achieve this by k-means clustering and divide the entire data into different clusters and do further analysis based on the popularity.

For this, I have taken the movie dataset of 5000 values with 28 attributes. You can find the dataset here Movie Dataset.

Step 1. First, I have loaded the dataset in RStudio. Note that read_csv() comes from the readr package, so load it first.

library(readr)

movie_metadata <- read_csv("~/movie_metadata.csv")

View(movie_metadata)

Movie metadata - Edureka

Step 2. As you can see, there are many NA values in this data, so I will clean the dataset and remove all the null values from it.

movie <- data.matrix(movie_metadata)

movie <- na.omit(movie)

Clean movie data - Edureka

Step 3. In this example, I have taken the first 500 values from the data set for analysis.

smple <- movie[sample(nrow(movie),500),]

Step 4. Further, with the R code below, you can take two attributes, budget and gross, from the dataset to make clusters.

smple_short <- smple[c(9,23)]

smple_matrix <- data.matrix(smple_short)

View(smple_matrix)

Our dataset will look like below.

gross budget - Edureka

Step 5. Now, let’s determine the number of clusters.

wss <- (nrow(smple_matrix)-1)*sum(apply(smple_matrix,2,var))

for (i in 2:15) wss[i]<-sum(kmeans(smple_matrix,centers=i)$withinss)

plot(1:15, wss, type="b", xlab="Number of Clusters", ylab="Within Sum of Squares")

It gives the elbow plot as follows.

wss vs number of clusters - Edureka

As you can see, there is a sudden drop in the value of WSS (within sum of squares) as the number of clusters increases from 1 to 3, after which the curve flattens. The bend at k=3 therefore strikes a good balance between k and WSS, so in this case the optimum number of clusters is k=3.

Step 6. Now, with this cleaned data, I will apply the inbuilt kmeans function in R to form clusters.

cl <- kmeans(smple_matrix,3,nstart=25)

You can plot the graph and cluster centroid using the following command.

plot(smple_matrix, col = (cl$cluster + 1), main="k-means result with 3 clusters", pch=1, cex=1, las=1)

points(cl$centers, col = "black", pch = 17, cex = 2)

k-means cluster - Edureka

Step 7. Now, I will analyze how good my cluster formation is by printing the model object cl. It gives the following output.

Within cluster sum of squares by cluster:

[1] 3.113949e+17 2.044851e+17 2.966394e+17

(between_SS / total_SS =  72.4 %)

Here, total_SS is the sum of squared distances of each data point to the global sample mean, whereas between_SS is the sum of squared distances of the cluster centroids to the global mean. Here, 72.4% is the proportion of the total variance in the data set that is explained by the clustering. The goal of k-means is to maximize the between-group dispersion (between_SS). So, the higher the percentage value, the better the model.

Step 8. For a more in-depth look at the clusters, we can examine the coordinates of the cluster centroids using the cl$centers component, which are as follows for gross and budget.

      gross          budget

1  91791300     62202550

2 223901969   118289474

3  18428131      19360546

As per the cluster centroids, we can infer that clusters 1 and 2 have more gross than budget. Hence, we can infer that clusters 1 and 2 made a profit while cluster 3 was at a loss.

Step 9. Further, we can also examine how the cluster assignment relates to individual characteristics like director_facebook_likes (column 5) and movie_facebook_likes (column 28). I have taken the following 20 sample values.

Movie facebook likes - Edureka

Using the aggregate function, we can look at other parameters of the data and draw insights; a sketch of the aggregate call is shown after the figures below. As you can see, cluster 3 has the least movie Facebook likes as well as the least director likes. This is expected, because cluster 3 is already at a loss. Also, cluster 2 is doing a pretty good job by grabbing maximum likes and maximum gross.

Aggregate movie likes - Edureka

Aggregate director likes - Edureka
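
For reference, here is a minimal sketch of how such an aggregation could be computed in R. It assumes the rows of smple line up with the cluster assignments in cl$cluster (from Step 6) and that the original column names movie_facebook_likes and director_facebook_likes are present.

# Illustrative only: mean Facebook likes per cluster
likes <- data.frame(cluster        = cl$cluster,
                    movie_likes    = smple[, "movie_facebook_likes"],
                    director_likes = smple[, "director_facebook_likes"])

aggregate(cbind(movie_likes, director_likes) ~ cluster, data = likes, FUN = mean)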

Organizations like Netflix make use of clustering to target the movie clusters with maximum popularity among viewers. They sell these movies and make a huge profit out of this.

“We live and breathe the customer,” said Dave Hastings, Netflix’s director of product analytics. Currently, Netflix has 93.80 million streaming customers worldwide. Netflix closely watches your every move on the internet – what movies you like, which directors you prefer – and then applies clustering to group movies based on popularity. It then recommends movies from the most popular cluster and enhances its business.

I hope you enjoyed reading my blog and understood k-means clustering. Doing projects using R can give a big boost to your resume, which will eventually lead to recruiters flocking around you. If you are interested to know more, then check out our Data Science certification training here, which comes with instructor-led live training and real-life project experience.


Check Out Our Data Science Course

The post K-means Clustering Algorithm: Know How It Works appeared first on Edureka Blog.

Salesforce Marketing Cloud: A Powerful Marketing Platform


Are you finding it difficult to connect and build relationships with your customers? Is your organization in need of software to market goods and services online? If your answer to the above questions is yes, then you should consider the Salesforce marketing cloud. In my previous blogs, we have learnt about Salesforce, the different certifications available in Salesforce, how to create an application on the Salesforce platform and Salesforce service cloud.

In this blog, I will be introducing you to the Salesforce marketing cloud. I will provide you with details as to why you should choose the Salesforce marketing cloud and the different platforms and channels that it provides. Finally, we'll take a look at a use case which explains how the marketing cloud is being utilized at Peak Games for advertising.

A Brief History

Before Salesforce came out with marketing cloud in 2012, there were various challenges in digital marketing. I have listed those challenges below:

  • It was difficult to harness the information on customers and audiences from social networks.
  • The growth of online conversations and metrics required a platform that could manage the huge amount of content being produced.
  • Conversations between organizations and customers were scattered across various channels.
  • All the data that was available on a customer was stuck in silos and was not being utilized.
  • Companies struggled to find return-on-investment across multiple channels.

To solve these challenges, Salesforce came up with the marketing cloud – one platform to integrate all social programs and data.         

Why Choose Salesforce Marketing Cloud?

Salesforce marketing cloud is one of the market leaders in the marketing cloud domain, along with other clouds like Adobe marketing cloud, IBM marketing cloud and Oracle marketing cloud. Salesforce marketing cloud has a total market share of 24%, second only to Adobe marketing cloud. Below is an image from Google Trends showing the interest in different marketing clouds over time. As you can see, the interest in Salesforce marketing cloud has been increasing rapidly.

google trends - salesforce marketing cloud - edureka

If you are wondering what makes the marketing cloud stand out, then you should take a look at the benefits given below:

  • Salesforce marketing cloud provides you with a platform to plan, personalize and optimize customer journey.
  • You can map customer journeys across multiple channels, devices and customer life-cycle stages all in one software.
  • Salesforce marketing cloud can be integrated with other software like Salesforce CRM, Salesforce Sales cloud, Workfront and other applications to provide deeper and better insights of customers.

Companies like Aston Martin, Vodafone, Philips, Western Union and General Electric, which have huge customer bases, use Salesforce marketing cloud to connect with their customers. Businesses that adopt these tools have a dramatic advantage, not just now, but well into the future.

Get Started With Salesforce

What Is Salesforce Marketing Cloud?

Marketing cloud is the platform for delivering relevant, personalized journeys across channels and devices – enabling marketers to deliver the right messages to the right people via the right channel. Below is an image that shows the different functionalities that Salesforce marketing cloud provides your organization – journey builder, contact management tools, content management tools, analytics builder and various channels like email and mobile.

why salesforce marketing cloud - salesforce marketing cloud - edureka

We have seen as to why Salesforce brought the marketing cloud into the market and why your organization should consider using it. Now, let’s dive deep into the product and take a look at the different platforms and channels that Salesforce marketing cloud provides.

Below is an image from www.salesforce.com which describes the complete Salesforce marketing cloud product. The Salesforce marketing cloud is built on Salesforce infrastructure and on the Fuel platform. It consists of various other platforms which you can utilize for your organization's marketing purposes: a customer data platform which you can use to store your customers' data, a predictive intelligence platform which you can use for building predictive models of 1-to-1 customer journeys, and a platform for maintaining your organization's content and messages. It also provides tools for performing analytics and marketing operations on data obtained from customers. You can connect with your customers across various channels like email, mobile, ads and social networks. You can also use applications listed on the hub exchange to add additional marketing features to your marketing cloud.

 

salesforce marketing cloud product - salesforce marketing cloud - edureka

Now, let’s take a look at the different platforms and channels that Salesforce marketing cloud provides and how your organization can benefit using these powerful features.

Platforms – Salesforce marketing cloud provides you with 6 different platforms which your organization can utilize to build effective marketing strategies. I have described each of the different platforms in detail below:  

journey builder - salesforce marketing cloud - edureka Journey Builder – With Journey builder you can build 1-to-1 journeys at scale. You can deliver simple or complex journeys for every individual, no matter the size of your customer base. You can incorporate sales and service activities right into the journey. Using the journey builder you can define specific goals and measure CTRs, timing, channels, conversions and more. You can evaluate your progress and optimize the performance as you go.
audience builder - salesforce marketing cloud - edureka Audience Builder – With audience builder, you can build a single view of your customer using data from different sources like Sales cloud, Service cloud and other data sources. Audience builder provides you with the functionality to filter data from multiple sources instantly. This can help your organization to target smart audiences. Not only this, but you can also validate audiences and engage with them at the right moment.
personalization builder - salesforce marketing cloud - edureka Personalization Builder – Your organization can use the power of personalization builder’s predictive analytics and predictive modeling, to understand each customer’s preference. This enables your organization to build profiles of customers. You can then use these profiles to tailor personalized content and deliver it across different channels.
 content builder - salesforce marketing cloud - edureka Content Builder – With content builder, you can create, manage and track content across all your digital channels from a single location. The content builder provides you with drag-and-drop smart content blocks so that you can create content once and use it in various places. Content builder comes with sophisticated algorithms to determine and deliver the best content for each customer.
analytics builder - salesforce marketing cloud - edureka Analytics Builder – Using the analytics builder you can uncover new insights about your customers. With analytics builder you can display your reports using bar graph, pie charts, scatter plots and other visualization techniques. Analytics builder also comes with email analytics and reporting using which you can understand whether a customer has opened, clicked, unsubscribed and more for each of your campaigns.
cloud connect - salesforce marketing cloud - edureka Marketing Cloud Connect – With marketing cloud connect you get access to all of your Salesforce customer data – data in different Salesforce products. You can trigger activities that connect interactions across Salesforce sales cloud, Salesforce service cloud and other Salesforce products.

Channels – Salesforce marketing cloud provides you with 5 different channels using which your organization can interact with its customers. I have described each channel below: 

email studio - salesforce marketing cloud - edureka Email Studio – Your organization can use the email studio to create customer engaging emails. Using email studio, you can keep tabs on your email campaigns. You can further boost your return on investment with built-in A/B testing capabilities, integrated predictive intelligence and email delivery tools. Also, using email studio you can filter your subscriber base and send targeted email messages based on customer data.
social studio - salesforce marketing cloud - edureka Social Studio – Social studio provides your organization with social listening tools to hear conversations from different sources. You can plan, execute and track social media marketing campaigns. Using social studio you can monitor your owned social channels and participate in conversations at scale.
mobile studio - salesforce marketing cloud - edureka Mobile Studio – With mobile studio your organization can get a mobile-first mindset with SMS, MMS, push messaging and group messaging. You can engage with customers in the moment, send real-time alerts and notifications. With mobile studio you can build powerful APIs to automate mobile marketing solutions. Using geo-location technology, you can interact with your customers at the right place and time.
 advertising studio - salesforce marketing cloud - edureka Advertising Studio – With advertising studio you can power digital advertising and manage ad campaigns. You can use customer data from multiple sources to securely reach customers and lookalikes across various platforms like Facebook, Google, Instagram, Twitter etc. Your organization can use the advertising studio to manage ad campaigns at scale. 
 web studio - salesforce marketing cloud - edureka Web Studio – Web studio provides you with tools to create beautiful, dynamic web-pages and personalized content. You can track real-time customer interaction on your website and gain insights from it. Using the web studio, your organization can deliver personalized content and recommendations.

 

Well, that's a lot of things that you can do using the Salesforce marketing cloud. Generally, organizations utilize only a few of the features described above. First Midwest Bank leverages Mobile studio to engage with its customers; Stanley Black and Decker uses email studio and social studio to understand its customers. Below, I have described in detail how Peak Games, a mobile gaming company, is utilizing Salesforce marketing cloud to reduce the cost of their social advertising.

Learn Salesforce From Expert

Salesforce Marketing Cloud Use Case – Peak Games

peak games - salesforce marketing cloud - edureka

Peak Games is a household name in the mobile gaming industry. They have over 275 million users across 150 countries. You may have played some of their popular games like Okey Plus, War Of Mercenaries and Lost Bubble. Currently, Peak Games runs over 175 marketing campaigns every day, and they use Salesforce marketing cloud to do so. In this section, I will describe the challenges that Peak Games faced and how they used Salesforce marketing cloud to overcome them. We'll also take a look at the impact that Salesforce marketing cloud had at Peak Games.

The challenges that Peak Games faced were:

  • Peak Games used to run a high volume of marketing campaigns. Therefore, they required a tool for bulk campaign management.
  • Peak Games marketed their games using paid social advertisements and wanted to scale their social advertising.
  • Peak Games used tactics like A/B testing, ad segmentation and audience targeting optimization. They wanted to make better use of these tactics so that they could achieve a better return on investment.
  • Peak Games required a powerful reporting center to handle their reporting.

As a solution, Peak Games turned to Salesforce marketing cloud's advertising studio platform.

  • Peak Games used the bulk campaign management feature to handle their marketing campaigns.
  • Peak Games was able to iterate the process of A/B testing for images to gain insights as to which image is effective.
  • Using the marketing cloud, Peak Games' advertising team was able to develop best practices to attract high quality game players.
  • Using the marketing cloud, Peak Games' team was able to look at its players holistically and optimize advertising depending on their actions.

Below is an image which clearly outlines the challenges that Peak Games faced and the solution that Salesforce marketing cloud provided them.

use case peak games - salesforce marketing cloud - edureka

The marketing cloud didn't just solve these challenges, it also had a positive impact on the company. The results of Peak Games using the Salesforce marketing cloud were:

  • Peak Games was able to reduce its cost per engagement and identify its best audience.
  • Peak Games was able to compare the success of two separate images with regard to Click Through Rate (CTR) and Cost Per Install (CPI).
  • Peak Games also found out that:
    • Using the 20% text allowance in images reduces CPI by 27%.
    • Images showing a user interacting with a game doubled CTR and reduced CPC (Cost Per Click) by 50%.

Salesforce marketing cloud has enabled Peak Games to more effectively harness the power of social advertising, while also getting a better understanding of its audiences and their preferences.

I urge you to see this Salesforce marketing cloud video tutorial that explains all that we have discussed in the blog. Go ahead, enjoy the video and tell me what you think.

Salesforce Marketing Cloud Training Video | Edureka

This Edureka Salesforce marketing cloud training video for beginners will help you learn Salesforce marketing cloud benefits, what it is, its various features, use case along with marketing cloud demo.

From this blog, I hope you have got a complete understanding of the Salesforce marketing cloud, the different channels and platforms it offers and how to use them for your organization’s benefit. Feel free to leave any questions you have in the comment box below. 

Check Out Salesforce Course

Check out our Salesforce Certification Training, which comes with instructor-led live training and real life project experience.

The post Salesforce Marketing Cloud: A Powerful Marketing Platform appeared first on Edureka Blog.

Salesforce Developer Tutorial: Get Started With Salesforce Programming


Are you aspiring to be a software application developer? Do you want to build your own application on the Force.com platform? If your answer to these questions is yes, then you should definitely consider becoming a Salesforce developer.

In my previous blogs, I have discussed Salesforce and Salesforce certifications, and also shown you how to build a custom application using the declarative options available in Salesforce. In this blog, I will discuss the programmatic options available in Salesforce to develop your application.

MVC Architecture

Before I dive into building an application using Visualforce and Apex, I will first discuss the Salesforce Model-View-Controller architecture. Below is a diagram that outlines the Salesforce Model-View-Controller architecture along with the different Salesforce components.

mvc - salesforce developer - edureka

Model: The model is your Salesforce data: objects, fields and relationships. It consists of standard objects (Account, Opportunity, etc.) and custom objects (objects you create).

View: The view represents the presentation of the data, i.e. the user interface. In Salesforce, the view consists of Visualforce pages, components, page layouts and tabs.

Controller: The controller is the building block of the actual application logic. You can perform actions whenever the user interacts with Visualforce.

Salesforce in Action

To be a Salesforce developer, you need to first know how Salesforce applications work. Below is an image which gives you the complete picture of Salesforce in action. The client or user either requests or provides information to the Salesforce application. This is generally done using Visualforce. This information is then passed on to the application logic layer, written in Apex. Depending upon the information, data is either inserted into or removed from the database. Salesforce also provides you with the option of using web services to directly access the application logic.

salesforce in action - salesforce developer - edureka

A Salesforce developer can approach development using either the declarative or the programmatic options. Below is an image which provides you with details on both the declarative and programmatic approaches available at each of the user interface, business logic and data model layers. To build your user interface, you can either use the declarative approach, i.e. page layouts and record types, or a programmatic approach like Visualforce pages and components. Generally, you should use the programmatic approach only when you cannot achieve the necessary user interface using the declarative approach. To develop your application's business logic layer, you can either use the Salesforce declarative options of workflows, validation rules and approval processes, or a programmatic approach like triggers, controllers and classes. To access the data model, you can use the declarative approach using objects, fields and relationships. You can also access the data model programmatically using the metadata API, REST API and Bulk API.

declarative vs programmatic - salesforce developer - edureka

We have seen how Salesforce applications work, the MVC architecture used for development in Salesforce and the two different approaches that are available for a Salesforce developer. Now, let me discuss Visualforce and Apex.

Get Started With Salesforce

Visualforce

To build applications on the Salesforce platform, you need to know how to develop the user interface and write application logic. As a Salesforce developer, you can develop the user interface using Visualforce. Visualforce is the user-interface framework for the Force.com platform. Just as you can use the JavaScript AngularJS framework to build user interfaces for your websites, you can use Visualforce to design and build user interfaces for your Salesforce applications.

You can use Visualforce whenever you need to build custom pages. A few examples of situations where you can use Visualforce are:

  • To build email templates
  • To develop mobile user-interface
  • To generate PDFs of data stored in Salesforce
  • To embed them into your standard page layouts
  • To override a standard Salesforce page
  • To develop custom tabs for your application

A visualforce page consists of two primary elements:

  • Visualforce Markup – The visualforce markup includes the visualforce tags, HTML, JavaScript or any other web-enabled code.   
  • A Visualforce Controller – The visualforce controller contains the instructions that specify what happens when a user interacts with a component. The visualforce controller is written using Apex programming language.

You can take a look at a simple Visualforce page code along with the different components below:

vf components - salesforce developer - edureka

 

Below I have shown you the steps to write a simple visualforce page for displaying countries and their currency values:

Step 1: From Setup, enter Visualforce Pages in Quick Find box, then select Visualforce Pages and click New.

Step 2: In the editor add the following code to display country and its currency value:

  <apex:page standardController="country__c" recordSetVar="countries">

    <!-- recordSetVar gives the page a list of country__c records to iterate over -->

    <apex:pageBlock title="Countries">

        <apex:pageBlockTable value="{!countries}" var="c">

            <apex:column value="{!c.Name}"/>

            <apex:column value="{!c.currency_value__c}"/>

        </apex:pageBlockTable>

    </apex:pageBlock>

  </apex:page>

 

Apex

Once you are done developing the user interface, as a Salesforce developer you need to know how to add custom logic to your application. You can write controller code and add custom logic to your application using the Apex programming language. Apex is an object-oriented programming language that allows you to execute flow and transaction control statements on the Force.com platform. If you have used the Java programming language before, then you can easily learn Apex; Apex syntax is about 70% similar to that of Java.

You can use Apex whenever you want to add custom logic to your application. A few examples of situations where you can use Apex are:

  • When you want to add web and email services to your application
  • When you want to perform complex business processes
  • When you want to add complex validation rules to your application
  • When you want to add a custom logic on operations like saving a record

Below is a screenshot of Apex code along with its different components like looping statement, control-flow statement and SOQL query:

apex code - salesforce developer - edureka

Now that we have understood what Apex is and when to use it, let me dive deep into Apex programming.

Programming In Apex

If you have understood the concepts described above, then you are halfway through your journey in becoming a Salesforce developer. In this section, I will dive deeper into Apex by providing you with information on the different data types and variables, different ways of retrieving data from the database and showing you how to write a class and method.

Datatypes And Variables

Apex offers you 4 different categories of data types and variables. The list below provides information on each of them, with examples:

  • Primitive – Primitive data types in Apex include Boolean, Date, Integer, Object, String and Time.
    Boolean isSunny = true;
    Integer i = 1;
    String myString = 'Hello World';
  • sObjects – An sObject refers to any object that can be stored in the database.
    Account a = new Account();
    MyCustomObj__c obj = new MyCustomObj__c();
  • Collections – Apex has the following types of collections: Lists, Sets and Maps.
    List<String> var_lst = new List<String>();
    Set<String> setOne = new Set<String>();
    Map<String, String> var_map = new Map<String, String>();
  • Enums – Enums are abstract data types with values that take on a finite set of identifiers.
    public enum Seasons {Winter, Spring, Summer, Fall}


SOQL And SOSL

Developing software applications requires you to know how to insert and retrieve data from databases. In Salesforce, you can retrieve data from the databases using SOQL and SOSL. If you want to be a Salesforce developer, then you must know both of these query languages. I have provided you with a detailed explanation of these languages below:

  • SOQL stands for Salesforce Object Query Language. Using SOQL statements, you can retrieve data from the database as a list of sObjects, a single sObject or an Integer for the count method. You can think of SOQL as the equivalent of a SELECT SQL query. I have provided an example of a SOQL query below:

List<Account> accList = [SELECT Id, Name FROM Account WHERE Name='YourName'];

  • SOSL stands for Salesforce Object Search Language. You can use SOSL statements to retrieve a list of sObjects, where each list contains the search results for a particular sObject type. You can think of SOSL as an equivalent to a database search query. I have provided an example of a SOSL query below: 

List<List<sObject>> searchList = [FIND 'map*' IN ALL FIELDS RETURNING Account (Id, Name), Contact, Opportunity, Lead];

You can use SOQL when you know which object the data resides in and use SOSL when you don’t know the name of the object where the data resides.      

Classes And Methods

Like in every other object oriented programming language, you can develop classes and methods using Apex. You can think of a class as a blueprint using which individual objects are created and used. You can think of a method as a subprogram, which acts on data and returns a value. I have provided you with the syntax to write a class and method below:  

classes and methods - salesforce developer - edureka

I will now show you how to add a class and method in Apex:

Step 1: From Setup, enter Apex Classes in the Quick Find box, then select Apex Classes and click New.

Step 2: In the editor add the following class definition:

Public class HelloWorld {

}

Step 3: Add a method definition between the class opening and closing brackets:

Public static void helloWorldMethod(Country__c[] countries) {

    For ( Country__c country : countries){

      country.currency_value__c *= 1.5;

    }

  }

Step 4: Click on Save and you should have your full class as:

Public class HelloWorld {

  Public static void helloWorldMethod(Country__c[] countries) {

    For ( Country__c country : countries){

      country.currency_value__c *= 1.5;

    }

  }

}

You can use the syntax and example shown above to develop your own classes and methods for your Salesforce application. To become a Salesforce developer you need to know more than just writing classes and methods. In the next few sections, I will discuss topics which make developing applications on the Salesforce platform simple and easy.

Learn Salesforce From Expert

Triggers

Every Salesforce developer must know the concept of Salesforce triggers. You might have previously come across triggers while working with other databases. Triggers are nothing but stored programs that get invoked before or after changes are made to Salesforce records. For example, a trigger can run before an insert operation is performed or after an update operation is performed. There are two types of triggers:

  • Before trigger – You can use before triggers to update or validate record values before they are saved to the database.
  • After trigger – You can use after triggers to access field values that are set by the system and to affect changes in other records.

Triggers get executed before or after the below operations:

  • Insert
  • Update
  • Delete
  • Merge
  • Upsert
  • Undelete

I will show you how to add a trigger in apex by adding a trigger for the Country object which you have seen in the class above:

Step 1: From the object management settings for country, go to Triggers and click on New.

Step 2: In the trigger editor, add the following trigger definition:

Trigger HelloWorldTrigger on Country__c (before insert) {

  Country__c[] countries = Trigger.new;

  HelloWorld.helloWorldMethod(countries);

}

The above code will update your country’s currency before every insert into the database.

Governor Limits

You might know that Salesforce works on a multi-tenant architecture, which means that resources are shared across different clients. To make sure no single client monopolizes the shared resources, the Apex run-time engine strictly enforces governor limits. If your Apex code ever exceeds a limit, the corresponding governor issues a run-time exception that cannot be handled. So, as a Salesforce developer you have to be very careful while developing your application.

Bulk Operations

As a Salesforce developer, you always have to ensure that your code stays within the governor limits. To make sure Apex adheres to governor limits, you must use the bulk calls design pattern. A bulk operation refers to committing more than one record in a single DML operation. Before you make a DML operation, you should always add the rows into a collection and operate on the collection as a whole. Below is an image that gives you a complete description of the bulk operation design pattern.

bulk operation - salesforce developer - edureka  

DMLs And Data Operations

You have seen earlier how to retrieve data from the database using SOQL and SOSL queries. Now let’s take a look at the different statements that you can use to insert data into the Salesforce database. For a Salesforce developer, it is a must to know what these statements can do and how to use them.   

DML Statement Description
Insert Adds one or more sObjects to your organization’s data
Update Modifies one or more existing sObject records
Upsert Creates new records and updates existing sObject records in a single statement
Delete Deletes one or more existing sObject records
Undelete Restores one or more deleted sObject records
Merge Merges up to three records of the same sObject type into one record

 

Visualforce And Apex

You have come a long way in your quest to become a Salesforce developer. Next, I will discuss how you can integrate your Visualforce pages with your Apex code. You can connect your Visualforce page and your Apex code by using controllers and extensions.

  • Custom Controllers – When you want your Visualforce page to run entirely in system mode, i.e. without enforcing the user’s permissions and field-level security, use a custom controller.

  • Controller Extension – When you want to add new actions or functions that extend the functionality of a standard or custom controller, use a controller extension.

In the code below, I have shown you how to include custom controller in your visualforce page:

<apex:page controller="myController" tabStyle="Account">

    <apex:form>

    </apex:form>

</apex:page>

In the code below, I have shown you how to include controller extension in your visualforce page:

<apex:page standardController="Account" extensions="extensionClass">

    <apex:form>

    </apex:form>

</apex:page>

Exception Handling

If you have developed applications before, then you have definitely come across exceptions. An exception is a special condition that changes the normal flow of program execution, for example dividing a number by zero or accessing a list index that is out of bounds. If you don’t handle these exceptions, then execution stops and any DML operations in the transaction are rolled back.

As a Salesforce developer, you need to know how to catch these exceptions and what to do once you catch them. To catch exceptions you can use the try, catch and finally construct. Once you have caught the exception, then you can handle it in ways mentioned below:

Exception How To Handle It
DML Use the addError() method on a record or a field
Visualforce Use the ApexPages.Message class
Sending an email on exception You can notify the developer by email
Logging in a custom object You can use a future method to log the exception in a custom object

So far in this Salesforce developer blog, you have seen how to develop your user interface using Visualforce, how to write custom logic using Apex, and concepts like triggers, bulk operations and exception handling. Last but not least, we’ll take a look at the Salesforce testing framework.

Testing

As a Salesforce developer, you need to know how to test the code you write. Test driven development is a good way of ensuring long-term success of your software application. You need to test your application so that you can validate that your application works as expected. Especially, if you are developing an application for a customer then testing it before delivering the final product is very important. Apex provides you a testing framework that allows you to write unit tests, run the tests, check test results and have code coverage results.

You can test your application in two ways:

  1. Through the Salesforce user interface; this way of testing is important but will not catch all of the use cases for your application
  2. By testing bulk functionality; up to 200 records can be passed through your code using the SOAP API or a Visualforce standard set controller

Test classes commit no data to the database and are annotated with @isTest. I have shown you how to add a test class by adding a test class for the HelloWorld class below:

@isTest

  private class HelloWorldTestClass {

    static testMethod void validateHelloWorld() {

      Country__c country = new Country__c(Name='India', currency_value__c=50.0);

      Insert country;

      country = [SELECT currency_value__c FROM Country__c WHERE Id = :country.Id];

      System.assertEquals(75, country.currency_value__c);

    }

  }

I hope you have understood all the concepts that you need to know to be a Salesforce developer. To dive into more details, checkout our Salesforce Certification Training which comes with instructor led live training and real life project experience. If you have any comments, then please leave them in the comment box below.

Check Out Salesforce Course

 

The post Salesforce Developer Tutorial: Get Started With Salesforce Programming appeared first on Edureka Blog.

Spark Tutorial: Real Time Cluster Computing Framework

$
0
0

Apache Spark is an open-source cluster computing framework for real-time processing. It is one of the most successful projects in the Apache Software Foundation. Spark has clearly evolved as the market leader for Big Data processing. Today, Spark is being adopted by major players like Amazon, eBay and Yahoo!, and many organizations run Spark on clusters with thousands of nodes. We are excited to begin this journey through this Spark Tutorial blog. This blog is the first in the upcoming Apache Spark blog series, which will include Spark Streaming, Spark Interview Questions, Spark MLlib and others.

When it comes to Real Time Data Analytics, Spark stands as the go-to tool across all other solutions. Through this blog, I will introduce you to this new exciting domain of Apache Spark and we will go through a complete use case, Earthquake Detection using Spark.

The following are the topics covered in this Spark Tutorial blog:

  1. Real Time Analytics
  2. Why Spark when Hadoop is already there?
  3. What is Apache Spark?
  4. Spark Features
  5. Getting Started with Spark
  6. Using Spark with Hadoop
  7. Spark Components
  8. Use Case: Earthquake Detection using Spark

Spark Tutorial: Real Time Analytics

Before we begin, let us have a look at the amount of data generated every minute by social media leaders.

Data Generated - Spark Tutorial - Edureka

Figure: Amount of data generated every minute

As we can see, there is a colossal amount of data that the internet world needs to process in seconds. We will go through all the stages of handling big data in enterprises and discover the need for a Real Time Processing Framework called Apache Spark.

To begin with, let me introduce you to a few domains that use real-time analytics heavily in today’s world.

Real Time Analytics - Spark Tutorial - Edureka

Figure: Spark Tutorial – Examples of Real Time Analytics

We can see that Real Time Processing of Big Data is ingrained in every aspect of our lives. From fraud detection in banking to live surveillance systems in government, automated machines in healthcare to live prediction systems in the stock market, everything around us revolves around processing big data in near real time.

Let us look at some of these use cases of Real Time Analytics:

  1. Healthcare: The healthcare domain uses real-time analysis to continuously check the medical status of critical patients. Hospitals on the lookout for blood and organ transplants need to stay in real-time contact with each other during emergencies. Getting medical attention on time is a matter of life and death for patients.
  2. Government: Government agencies perform Real Time Analysis mostly in the field of national security. Countries need to continuously keep a track of all the military and police agencies for updates regarding threats to security.
  3. Telecommunications: Companies revolving around services in the form of calls, video chats and streaming use real-time analysis to reduce customer churn and stay ahead of the competition. They also extract measurements of jitter and delay in mobile networks to improve customer experiences.
  4. Banking: Banking transacts with almost all of the world’s money. It becomes very important to ensure fault tolerant transactions across the whole system. Fraud detection is made possible through real-time analytics in banking.
  5. Stock Market: Stockbrokers use real-time analytics to predict the movement of stock portfolios. Companies re-think their business model after using real-time analytics to analyze the market demand for their brand.

GET STARTED WITH SPARK

Spark Tutorial: Why Spark when Hadoop is already there?

The first of the many questions everyone asks when it comes to Spark is, “Why Spark when we have Hadoop already?“. 

To answer this, we have to look at the concepts of batch and real-time processing. Hadoop is based on the concept of batch processing, where processing happens on blocks of data that have already been stored over a period of time. At the time, Hadoop broke all expectations with the revolutionary MapReduce framework in 2005. Hadoop MapReduce is the best framework for processing data in batches.

This went on until 2014, when Spark overtook Hadoop. The USP for Spark was that it could process data in real time and was about 100 times faster than Hadoop MapReduce at batch processing large data sets.

The following figure gives a detailed explanation of the differences between processing in Spark and Hadoop.

Spark vs Hadoop - Spark Tutorial - Edureka

Figure: Spark Tutorial – Differences between Hadoop and Spark

Here, we can draw out one of the key differentiators between Hadoop and Spark. Hadoop is based on batch processing of big data. This means that the data is stored over a period of time and is then processed using Hadoop. Whereas in Spark, processing can take place in real-time. This real-time processing power in Spark helps us to solve the use cases of Real Time Analytics we saw in the previous section. Alongside this, Spark is also able to do batch processing 100 times faster than that of Hadoop MapReduce (Processing framework in Apache Hadoop). Therefore, Apache Spark is the go-to tool for big data processing in the industry.

Spark Tutorial: What is Apache Spark?

Apache Spark is an open-source cluster computing framework for real-time processing. It has a thriving open-source community and is the most active Apache project at the moment. Spark provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance. 

Apache Spark - Spark Interview Questions - Edureka

Figure: Spark Tutorial – Real Time Processing in Apache Spark

It was built on top of Hadoop MapReduce and it extends the MapReduce model to efficiently use more types of computations.

Spark Tutorial: Features of Apache Spark

Spark has the following features: 

Spark Features - Spark Tutorial - Edureka

Figure: Spark Tutorial – Spark Features

Let us look at the features in detail:

Polyglot:

Spark provides high-level APIs in Java, Scala, Python and R. Spark code can be written in any of these four languages. It provides a shell in Scala and Python. The Scala shell can be accessed through ./bin/spark-shell and Python shell through ./bin/pyspark from the installed directory.

Spark Features - Spark Tutorial - Edureka

Spark Features 1 - Spark Tutorial - Edureka

Speed:

Spark runs up to 100 times faster than Hadoop MapReduce for large-scale data processing. Spark is able to achieve this speed through controlled partitioning. It manages data using partitions that help parallelize distributed data processing with minimal network traffic.

Multiple Formats:

Spark supports multiple data sources such as Parquet, JSON, Hive and Cassandra, apart from the usual formats such as text files, CSV and RDBMS tables. The Data Source API provides a pluggable mechanism for accessing structured data through Spark SQL. Data sources can be more than just simple pipes that convert data and pull it into Spark.

Spark Features 2 - Spark Tutorial - Edureka
Spark Features 3 - Spark Tutorial - Edureka

Lazy Evaluation:

Apache Spark delays its evaluation until it is absolutely necessary. This is one of the key factors contributing to its speed. For transformations, Spark adds them to a DAG (Directed Acyclic Graph) of computation, and only when the driver requests some data does this DAG actually get executed.
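To make this concrete, here is a small PySpark sketch (assuming a local Spark installation) where the transformations only build up the DAG and nothing runs until the final action:

from pyspark import SparkContext

sc = SparkContext("local[2]", "LazyEvalDemo")

numbers = sc.parallelize(range(1, 1001))        # no computation happens yet
squares = numbers.map(lambda x: x * x)          # transformation: only added to the DAG
evens = squares.filter(lambda x: x % 2 == 0)    # still nothing executed

print(evens.count())                            # action: the DAG is executed here (prints 500)
sc.stop()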

Real Time Computation:

Spark’s computation is real-time and has low latency because of its in-memory computation. Spark is designed for massive scalability; the Spark team has documented users running production clusters with thousands of nodes, and it supports several computational models.

 Spark Features 4 - Spark Tutorial - Edureka
Spark Features 5 - Spark Tutorial - Edureka

Hadoop Integration:

Apache Spark provides smooth compatibility with Hadoop. This is a boon for all the Big Data engineers who started their careers with Hadoop. Spark is a potential replacement for the MapReduce functions of Hadoop, while it also has the ability to run on top of an existing Hadoop cluster using YARN for resource scheduling.

Machine Learning:

Spark’s MLlib is the machine learning component which is handy when it comes to big data processing. It eradicates the need to use multiple tools, one for processing and one for machine learning. Spark provides data engineers and data scientists with a powerful, unified engine that is both fast and easy to use.

Spark Features 6 - Spark Tutorial - Edureka

Spark Tutorial: Getting Started With Spark

The first step in getting started with Spark is installation. Let us install Apache Spark 2.1.0 on our Linux systems (I am using Ubuntu).

Installation:

  1. The prerequisites for installing Spark are having Java and Scala installed.
  2. Install Java, in case it is not already installed, using the commands below.
    sudo apt-get install python-software-properties
    sudo apt-add-repository ppa:webupd8team/java
    sudo apt-get update
    sudo apt-get install oracle-java8-installer
    
  3. Download the latest Scala version from Scala Lang Official page. Once installed, set the scala path in ~/.bashrc file as shown below.
    export SCALA_HOME=Path_Where_Scala_File_Is_Located
    export PATH=$SCALA_HOME/bin:$PATH
    
  4. Download Spark 2.1.0 from the Apache Spark Downloads page. You can also choose to download a previous version.
  5. Extract Spark tar using below command.
    tar -xvf spark-2.1.0-bin-hadoop2.7.tgz
    
  6. Set the Spark_Path in ~/.bashrc file.
    export SPARK_HOME=Path_Where_Spark_Is_Installed
    export PATH=$PATH:$SPARK_HOME/bin
    

Before we move further, let us start up Apache Spark on our systems and get used to the main concepts of Spark like Spark Session, Data Sources, RDDs, DataFrames and other libraries.

Spark Shell: 

Spark’s shell provides a simple way to learn the API, as well as a powerful tool to analyze data interactively.

Spark Session: 

In earlier versions of Spark, the Spark Context was the entry point for Spark. For every other API, we needed to use a different context: for streaming we needed StreamingContext, for SQL we needed SQLContext, and for Hive we needed HiveContext. To solve this issue, SparkSession came into the picture. It is essentially a combination of SQLContext, HiveContext and, in the future, StreamingContext.
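As an illustration, here is a minimal sketch of creating a SparkSession in PySpark (Spark 2.x, local mode assumed):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("SparkSessionDemo")
         .master("local[2]")
         .getOrCreate())

# The session wraps the older contexts, which remain reachable if needed
sc = spark.sparkContext
print(spark.version)
spark.stop()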

Data Sources:

The Data Source API provides a pluggable mechanism for accessing structured data through Spark SQL. It is used to read and store structured and semi-structured data into Spark SQL. Data sources can be more than just simple pipes that convert data and pull it into Spark.

RDD:

Resilient Distributed Dataset (RDD) is a fundamental data structure of Spark. It is an immutable distributed collection of objects. Each dataset in RDD is divided into logical partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes.
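Here is a small self-contained sketch of creating and transforming an RDD in PySpark (local mode assumed):

from pyspark import SparkContext

sc = SparkContext("local[2]", "RDDDemo")

words = sc.parallelize(["spark", "hadoop", "spark", "hive"])
counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)   # transformations
print(counts.collect())   # action, e.g. [('spark', 2), ('hadoop', 1), ('hive', 1)]
sc.stop()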

Dataset: 

A Dataset is a distributed collection of data. A Dataset can be constructed from JVM objects and then manipulated using functional transformations (map, flatMap, filter, etc.). The Dataset API is available in Scala and Java.

DataFrames: 

A DataFrame is a Dataset organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources such as: structured data files, tables in Hive, external databases or existing RDDs.
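And a minimal DataFrame sketch in PySpark, with a couple of made-up rows just for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("DataFrameDemo").master("local[2]").getOrCreate()

df = spark.createDataFrame([("Alice", 34), ("Bob", 45)], ["name", "age"])
df.filter(df.age > 40).show()   # prints only the row for Bob
spark.stop()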

Spark Tutorial: Using Spark with Hadoop

The best part of Spark is its compatibility with Hadoop. As a result, this makes for a very powerful combination of technologies. Here, we will be looking at how Spark can benefit from the best of Hadoop.

Spark And Hadoop - Spark Tutorial - Edureka

Figure: Spark Tutorial – Spark Features

Hadoop components can be used alongside Spark in the following ways:

  1. HDFS: Spark can run on top of HDFS to leverage the distributed replicated storage.
  2. MapReduce: Spark can be used along with MapReduce in the same Hadoop cluster or separately as a processing framework.
  3. YARN: Spark applications can be made to run on YARN (Hadoop NextGen).
  4. Batch & Real Time Processing: MapReduce and Spark are used together where MapReduce is used for batch processing and Spark for real-time processing.

LEARN SPARK FROM EXPERTS

Spark Tutorial: Spark Components

Spark components are what make Apache Spark fast and reliable. A lot of these Spark components were built to resolve the issues that cropped up while using Hadoop MapReduce. Apache Spark has the following components:

  1. Spark Core
  2. Spark Streaming
  3. Spark SQL
  4. GraphX
  5. MLlib (Machine Learning)

Spark Core

Spark Core is the base engine for large-scale parallel and distributed data processing. The core is the distributed execution engine and the Java, Scala, and Python APIs offer a platform for distributed ETL application development. Further, additional libraries which are built atop the core allow diverse workloads for streaming, SQL, and machine learning. It is responsible for:

  1. Memory management and fault recovery
  2. Scheduling, distributing and monitoring jobs on a cluster
  3. Interacting with storage systems

Spark Streaming

Spark Streaming is the component of Spark which is used to process real-time streaming data. Thus, it is a useful addition to the core Spark API. It enables high-throughput and fault-tolerant stream processing of live data streams. The fundamental stream unit is DStream which is basically a series of RDDs (Resilient Distributed Datasets) to process the real-time data.

Spark Streaming - Spark Tutorial - Edureka

Figure: Spark Tutorial – Spark Streaming
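As a hedged sketch, the classic DStream word count in PySpark looks roughly like this (it assumes text arriving on localhost:9999, for example from nc -lk 9999):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "StreamingWordCount")
ssc = StreamingContext(sc, 5)                       # 5-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)     # DStream of incoming lines
words = lines.flatMap(lambda line: line.split(" "))
counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
counts.pprint()                                     # print the counts for each batch

ssc.start()
ssc.awaitTermination()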

Spark SQL

Spark SQL is a new module in Spark which integrates relational processing with Spark’s functional programming API. It supports querying data either via SQL or via the Hive Query Language. For those of you familiar with RDBMS, Spark SQL will be an easy transition from your earlier tools where you can extend the boundaries of traditional relational data processing. 

Spark SQL integrates relational processing with Spark’s functional programming. Further, it provides support for various data sources and makes it possible to weave SQL queries with code transformations thus resulting in a very powerful tool.

The following are the four libraries of Spark SQL.

  1. Data Source API
  2. DataFrame API
  3. Interpreter & Optimizer
  4. SQL Service

Spark SQL - Spark Tutorial - Edureka

A complete tutorial on Spark SQL can be found in the given blog: Spark SQL Tutorial Blog

GraphX

GraphX is the Spark API for graphs and graph-parallel computation. Thus, it extends the Spark RDD with a Resilient Distributed Property Graph.

The property graph is a directed multigraph which can have multiple edges in parallel. Every edge and vertex have user defined properties associated with it. Here, the parallel edges allow multiple relationships between the same vertices. At a high-level, GraphX extends the Spark RDD abstraction by introducing the Resilient Distributed Property Graph: a directed multigraph with properties attached to each vertex and edge.

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and mapReduceTriplets) as well as an optimized variant of the Pregel API. In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.

MlLib (Machine Learning)

MLlib stands for Machine Learning Library. Spark MLlib is used to perform machine learning in Apache Spark. 

Machine Learning - Spark Tutorial - Edureka

Use Case: Earthquake Detection using Spark

Now that we have understood the core concepts of Spark, let us solve a real-life problem using Apache Spark. This will help give us the confidence to work on any Spark projects in the future.

Problem Statement: To design a Real Time Earthquake Detection Model that sends life-saving alerts and improves its machine learning to provide near real-time computation results.

Use Case – Requirements:

  1. Process data in real-time
  2. Handle input from multiple sources
  3. Easy to use system
  4. Bulk transmission of alerts

We will use Apache Spark which is the perfect tool for our requirements.

Use Case – Dataset:

Earthquake ROC Dataset - Spark Tutorial - Edureka

Figure: Use Case – Earthquake Dataset

Click here to download the complete dataset: Earthquake Dataset – Spark Training – Edureka

Before moving ahead, there is one concept we have to learn that we will be using in our Earthquake Detection System and it is called Receiver Operating Characteristic (ROC). An ROC curve is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. We will use the dataset to obtain an ROC value using Machine Learning in Apache Spark.

Use Case – Flow Diagram:

The following illustration clearly explains all the steps involved in our Earthquake Detection System.

Flow Diagram - Spark Tutorial - Edureka

Figure: Use Case – Flow diagram of Earthquake Detection using Apache Spark

Use Case – Spark Implementation:

Moving ahead, now let us implement our project using Eclipse IDE for Spark.

Find the Pseudo Code below: 

//Importing the necessary classes
import org.apache.spark._
...
//Creating an Object earthquake
object earthquake {
 def main(args: Array[String]) {
 
//Creating a Spark Configuration and Spark Context
val sparkConf = new SparkConf().setAppName("earthquake").setMaster("local[2]")
val sc = new SparkContext(sparkConf)
 
//Loading the Earthquake ROC Dataset file as a LibSVM file
val data = MLUtils.loadLibSVMFile(sc, *Path to the Earthquake File* )
 
//Training the data for Machine Learning
val splits = data.randomSplit( *Splitting 60% to 40%* , seed = 11L)
val training = splits(0).cache()
val test = splits(1)

//Creating a model of the trained data
val numIterations = 100
val model = *Creating SVM Model with SGD* (  *Training Data* , *Number of Iterations* )
 
//Using map transformation of model RDD
val scoreAndLabels = *Map the model to predict features*
 
//Using Binary Classification Metrics on scoreAndLabels
val metrics = * Use Binary Classification Metrics on scoreAndLabels *(scoreAndLabels)
val auROC = metrics. *Get the area under the ROC Curve*()
 
//Displaying the area under Receiver Operating Characteristic
println("Area under ROC = " + auROC)
 }
}
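For readers who prefer Python, here is a hedged PySpark sketch of the same pipeline; the dataset path is a placeholder, and the parameters mirror the pseudo code above rather than the downloadable source:

from pyspark import SparkConf, SparkContext
from pyspark.mllib.util import MLUtils
from pyspark.mllib.classification import SVMWithSGD
from pyspark.mllib.evaluation import BinaryClassificationMetrics

conf = SparkConf().setAppName("earthquake").setMaster("local[2]")
sc = SparkContext(conf=conf)

# Load the Earthquake ROC dataset as a LibSVM file (placeholder path)
data = MLUtils.loadLibSVMFile(sc, "/path/to/earthquake_dataset.txt")

# Split the data 60% / 40% into training and test sets
training, test = data.randomSplit([0.6, 0.4], seed=11)
training.cache()

# Train an SVM model with SGD
model = SVMWithSGD.train(training, iterations=100)
model.clearThreshold()   # return raw scores instead of 0/1 labels

# Map the model over the test set to get (score, label) pairs
score_and_labels = test.map(lambda p: (float(model.predict(p.features)), p.label))

# Compute the area under the ROC curve
metrics = BinaryClassificationMetrics(score_and_labels)
print("Area under ROC = " + str(metrics.areaUnderROC))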

Click here to get the full source code of Earthquake Detection using Apache Spark.

From our Spark program, we obtain the ROC value to be 0.088137. We will be transforming this value to get the area under the ROC curve.

Use Case – Visualizing Results:

We will plot the ROC curve and compare it with the specific earthquake points. Wherever the earthquake points exceed the ROC curve, such points are treated as major earthquakes. As per our algorithm to calculate the area under the ROC curve, we can assume that these major earthquakes are above 6.0 magnitude on the Richter scale.

Earthquake ROC Curve - Spark Tutorial - Edureka

Figure: Earthquake ROC Curve

The above image shows the Earthquake line in orange. The area in blue is the ROC curve that we have obtained from our Spark program. Let us zoom into the curve to get a better picture.

Visualizing Earthquake ROC Results - Spark Tutorial - Edureka

Figure: Visualizing Earthquake Points

We have plotted the earthquake curve against the ROC curve. At points where the orange curve is above the blue region, we have predicted the earthquakes to be major, i.e., with magnitude greater than 6.0. Thus armed with this knowledge, we could use Spark SQL and query an existing Hive table to retrieve email addresses and send people personalized warning emails. Thus we have used technology once more to save human life from trouble and make everyone’s life better.

GET EARTHQUAKE DETECTION CODE

Now, this concludes the Apache Spark blog. I hope you enjoyed reading it and found it informative. By now, you must have acquired a sound understanding of what Apache Spark is. The hands-on examples will give you the required confidence to work on any future projects you encounter in Apache Spark. Practice is the key to mastering any subject and I hope this blog has created enough interest in you to explore learning further on Apache Spark.

We recommend the following Apache Spark Training videos from Edureka to begin with:

Apache Spark Tutorial | Apache Spark Training | Edureka

This video series on the Spark Tutorial provides a complete background into the components along with real-life use cases such as Twitter Sentiment Analysis, NBA Game Prediction Analysis, Earthquake Detection System, Flight Data Analytics and Movie Recommendation Systems. We have personally designed the use cases so as to provide an all-round expertise to anyone running the code.

GET SPARK CERTIFIED TODAY

Got a question for us? Please mention it in the comments section and we will get back to you at the earliest.

If you wish to learn Spark and build a career in domain of Spark to perform large-scale Data Processing using RDD, Spark Streaming, SparkSQL, MLlib, GraphX and Scala with Real Life use-cases, check out our interactive, live-online Apache Spark Certification Training here, that comes with 24*7 support to guide you throughout your learning period.

The post Spark Tutorial: Real Time Cluster Computing Framework appeared first on Edureka Blog.


Python Tutorial – Python Programming For Beginners

$
0
0

Python Tutorial:

I will start this Python Tutorial by giving you enough reasons to learn Python.

Python is simple and incredibly readable, since it closely resembles the English language. It’s a great language for beginners, all the way up to seasoned professionals. You don’t have to deal with complex syntax. Let me give you an example:

If I want to print “Hello World” in python, all I have to write is:

print ('Hello World')

It’s that simple.

This blog on Python tutorial includes the following topics:

Let me give you one more motivation to learn Python: its wide variety of applications.

Python Applications:

Python finds application in a lot of domains, below are few of those:

Python Applications - Python Tutorial - Edureka

This is not all; it is also used for automation and for performing a lot of other tasks.

After this Python tutorial, I will be coming up with a separate blog on each of these applications.

Python Introduction:

Python is an open source scripting language, let’s look at some cool features of python.

Python Features - Python Tutorial - Edureka
Let’s move ahead in this Python tutorial and understand how Variables work in Python.

Variables in Python:

Variables are nothing but reserved memory locations to store values. This means that when you create a variable you reserve some space in memory.

Variables - Python Tutorial - Edureka

Fig: The figure above shows three variables A, B and C

In Python, you don’t need to declare variables before using them, unlike other languages such as Java and C.

Assigning values to a variable:

Python variables do not need explicit declaration to reserve memory space. The declaration happens automatically when you assign a value to a variable. The equal sign (=) is used to assign values to variables.

Consider the below example:

S = 10
print(S)

This will assign value ‘10’ to the variable ‘S’ and will print it. Try it yourself.

Now in this Python tutorial, is the time to understand Data types.

Data Types in Python:

Python supports various data types; these data types define the operations possible on the variables and the storage method.

Below is the list of standard data types available in Python:

Python Data Types - Python Tutorial - Edureka

Let’s discuss each of these in detail. In this Python tutorial we will start with the ‘Numeric’ data type.

Numeric:

Just as expected Numeric data types store numeric values.

They are immutable data types, which means that you cannot change their values.

Python supports three different Numeric types:

Numeric Data Types - Python Tutorial - Edureka

Fig: The figure above shows the example of each of the numeric data types

Integer type: It holds all the integer values i.e. all the positive and negative whole numbers.

Float type: It holds real numbers and is represented by a decimal, and sometimes even scientific notation, with E or e indicating the power of 10 (2.5e2 = 2.5 x 10^2 = 250).

Complex type: These are of the form a + bj, where a and b are floats and j represents the square root of -1 (an imaginary number).

Now you can even perform type conversion. For example you can convert the integer value to a float value and vice-versa. Consider the example below:

A = 10
# Convert it into float type
B = float(A)
print(B)

The code above will convert an integer value to a float type. Similarly you can convert a float value to integer type:

A = 10.76
# Convert it into integer type
B = int(A)
print(B)

Now let’s understand what exactly are lists in this Python tutorial.

List Data Type:

You can think of Lists as arrays in C, but in a List you can store elements of different types, whereas in an array all the elements must be of the same type.

List is the most versatile datatype available in Python which can be written as a list of comma-separated values (items) between square brackets. Consider the example below:

Subjects = ['Physics', 'Chemistry', 'Maths', 2]
print(Subjects)

Notice that the Subjects List contains both words as well as numbers. Now, let’s perform some operations on our Subjects List.

List Operations:

The table below contains the operations possible with Lists:

Syntax Result Description
Subjects [0] Physics This will give the index 0 value from the Subjects List.
Subjects [0:2] ['Physics', 'Chemistry'] This will give the index values from 0 up to, but not including, index 2 of the Subjects List.
Subjects [3] = 'Biology' ['Physics', 'Chemistry', 'Maths', 'Biology'] This will update the List, replacing the value 2 at index 3 with 'Biology'.
del Subjects [2] ['Physics', 'Chemistry', 2] This will delete the value at index 2 from the Subjects List.
len (Subjects) 4 This will give the number of elements in the Subjects List.
Subjects + [1, 2, 3] ['Physics', 'Chemistry', 'Maths', 2, 1, 2, 3] This will concatenate the two Lists.
Subjects * 2 ['Physics', 'Chemistry', 'Maths', 2, 'Physics', 'Chemistry', 'Maths', 2] This will repeat the Subjects List twice.
Subjects [::-1] [2, 'Maths', 'Chemistry', 'Physics'] This will reverse the Subjects List.
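Here is a quick runnable snippet exercising a few of the operations from the table above:

Subjects = ['Physics', 'Chemistry', 'Maths', 2]

print(Subjects[0])           # Physics
print(Subjects[0:2])         # ['Physics', 'Chemistry']
print(len(Subjects))         # 4
print(Subjects + [1, 2, 3])  # ['Physics', 'Chemistry', 'Maths', 2, 1, 2, 3]
print(Subjects[::-1])        # [2, 'Maths', 'Chemistry', 'Physics']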

Now, let’s focus on Tuples.


Tuples:

A Tuple is a sequence of immutable Python objects. Tuples are sequences, just like Lists. The differences between tuples and lists are that tuples cannot be changed, unlike lists, and that tuples use parentheses, whereas lists use square brackets. Consider the example below:

Chelsea = ('Hazard', 'Lampard', 'Terry')

Now you must be thinking why Tuples when we have Lists. Consider the below reason:

Tuples are faster than lists. If you’re defining a constant set of values, and all you’re ever going to do with it is iterate through it, that’s when you can use a Tuple instead of a List.

Guys! all Tuple operations are similar to Lists, but you cannot update, delete or add an element to a Tuple. 

Now, stop being lazy and don’t expect me to show all those operations, try it yourself.

Time to move on and understand Strings.

String Data Type:

Strings are amongst the most popular types in Python. We can create them simply by enclosing characters in quotes. Python treats single and double quotes in exactly the same fashion. Consider the example below:


S = "Welcome To edureka!"
D = 'edureka!'

Let’s look at few operations that you can perform with Strings.

Syntax Operation
print (len(String_Name)) String Length
print (String_Name.index(“Char”)) Locate a character in String
print (String_Name.count(“Char”)) Count the number of times a character is repeated in a String
print (String_Name[Start:Stop]) Slicing
print (String_Name[::-1]) Reverse a String
print (String_Name.upper()) Convert the letters in a String to upper-case
print (String_Name.lower()) Convert the letters in a String to lower-case
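Here is a quick runnable snippet trying out the operations from the table above:

S = "Welcome To edureka!"

print(len(S))        # 19
print(S.index("T"))  # 8
print(S.count("e"))  # 4
print(S[0:7])        # Welcome
print(S[::-1])       # !akerude oT emocleW
print(S.upper())     # WELCOME TO EDUREKA!
print(S.lower())     # welcome to edureka!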

I hope you have enjoyed the read till now. Next up, in this Python tutorial we will focus on Set.

Set Data type:

A set is an unordered collection of items. Every element is unique.

A set is created by placing all the items (elements) inside curly braces {}, separated by comma. Consider the example below:

Set_1 = {1, 2, 3}

In Sets every element has to be unique. Try printing the below code:

Set_2 = {1, 2, 3, 3}

Here 3 is repeated twice, but it will print it only once.

Let’s look at some Set operations:

Union:

Union of A and B is a set of all elements from both sets. Union is performed using | operator. Consider the below example:

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
print ( A | B)
output = {1, 2, 3, 4, 5, 6}

Intersection:
Intersection - Python Tutorial - Edureka

Intersection of A and B is a set of elements that are common in both sets. Intersection is performed using & operator. Consider the example below:

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
print ( A & B )
Output = {3, 4}

Difference:

Set Difference - Python Tutorial - Edureka

Difference of A and B (A – B) is a set of elements that are only in A but not in B. Similarly, B – A is a set of elements in B but not in A. Consider the example below:

A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}
print(A - B)
Output = {1, 2, 3}

Symmetric Difference:

Symmetric Difference - Python Tutorial - Edureka

Symmetric Difference of A and B is a set of elements in both A and B except those that are common in both. Symmetric difference is performed using the ^ operator. Consider the example below:

A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}
print(A ^ B)
Output = {1, 2, 3, 6, 7, 8}

Now is the time to focus on the last Data type i.e. Dictionary

Dictionary Data Type:

Now let me explain you Dictionaries with an example.

I am guessing you guys know about the Aadhaar card. For those of you who don’t know what it is, it is nothing but a unique ID given to every Indian citizen. So for every Aadhaar number there is a name and a few other details attached.

Now you can consider the Aadhaar number as a Key and the person’s details as the Value attached to that Key.

Dictionaries contain these Key-Value pairs enclosed within curly braces, with Keys and Values separated by ‘:’. Consider the below example:

Dict = {'Name' : 'Saurabh', 'Age' : 23}

 You know the drill, now comes various Dictionary operations.

Dictionary Operations:

Access elements from a dictionary:

Dict = {'Name' : 'Saurabh', 'Age' : 23}
print(Dict['Name'])
Output = Saurabh

Changing elements in a Dictionary:

Dict = {'Name' : 'Saurabh', 'Age' : 23}
Dict['Age'] = 32
Dict['Address'] = 'Starc Tower'
Output = {'Name': 'Saurabh', 'Age': 32, 'Address': 'Starc Tower'}


Enough with Data types, Now is the time to see various Operators in Python.

Check Out Our Python Course

Operators in Python:

Operators are the constructs which can manipulate the values of the operands. Consider the expression 2 + 3 = 5, here 2 and 3 are operands and + is called operator.

Python supports the following types of Operators:

Python Operators - Python Tutorial - Edureka

Let’s focus on each of these Operators one by one.

Arithmetic Operators:

These Operators are used to perform mathematical operations like addition, subtraction etc. Assume that A = 10 and B = 20 for the below table.

Operator Description Example
+ Addition Adds values on either side of the operator A + B = 30
– Subtraction Subtracts the right hand operand from the left hand operand A – B = -10
* Multiplication Multiplies values on either side of the operator A * B = 200
/ Division Divides the left hand operand by the right hand operand A / B = 0.5
% Modulus Divides left hand operand by right hand operand and returns remainder B % A = 0
** Exponent Performs exponential (power) calculation on operators A ** B = 10 to the power 20

Consider the example below:

a = 21
b = 10
c = 0

c = a + b
print ( c )

c = a - b
print ( c )

c = a * b
print ( c )

c = a / b
print ( c )

c = a % b
print ( c )
a = 2
b = 3
c = a**b
print ( c )
Output = 31, 11, 210, 2.1, 1, 8

Now let’s see comparison Operator.

Comparison Operators:

These Operators compare the values on either side of them and decide the relation between them. Assume A = 10 and B = 20.

Operator Description Example
== If the values of two operands are equal, then the condition becomes true. (A == B) is not true
!= If the values of two operands are not equal, then the condition becomes true. (A != B) is true
> If the value of the left operand is greater than the value of the right operand, then the condition becomes true. (A > B) is not true
< If the value of the left operand is less than the value of the right operand, then the condition becomes true. (A < B) is true
>= If the value of the left operand is greater than or equal to the value of the right operand, then the condition becomes true. (A >= B) is not true
<= If the value of the left operand is less than or equal to the value of the right operand, then the condition becomes true. (A <= B) is true

Consider the example below:

a = 21
b = 10
c = 0

if ( a == b ):
   print ("a is equal to b")
else:
   print ("a is not equal to b")

if ( a != b ):
   print ("a is not equal to b")
else:
   print ("a is equal to b")

if ( a < b ):
   print ("a is less than b")
else:
   print ("a is not less than b")

if ( a > b ):
   print ("a is greater than b")
else:
   print ("a is not greater than b")

a = 5
b = 20
if ( a <= b ):
   print ("a is either less than or equal to b")
else:
   print ("a is neither less than nor equal to b")

if ( a >= b ):
   print ("a is either greater than  or equal to b")
else:
   print ("a is neither greater than  nor equal to b")
Output = a is not equal to b
         a is not equal to b
         a is not less than b
         a is greater than b
         a is either less than or equal to b
         a is neither greater than  nor equal to b

Now in the above example, I have used conditional statements (if, else). It basically means if the condition is true then execute the print statement, if not then execute the print statement inside else. We will understand these statements later in the blog.

Assignment Operators:

An Assignment Operator is the operator used to assign a new value to a variable. Assume A = 10 and B = 20 for the below table.

Operator Description Example
= Assigns values from right side operands to left side operand c = a + b assigns value of a + b into c
+= Add AND It adds right operand to the left operand and assign the result to left operand c += a is equivalent to c = c + a
-= Subtract AND It subtracts right operand from the left operand and assign the result to left operand c -= a is equivalent to c = c – a
*= Multiply AND It multiplies right operand with the left operand and assign the result to left operand c *= a is equivalent to c = c * a
/= Divide AND It divides left operand with the right operand and assign the result to left operand c /= a is equivalent to c = c / a
%= Modulus AND It takes modulus using two operands and assign the result to left operand c %= a is equivalent to c = c % a
**= Exponent AND Performs exponential (power) calculation on operators and assign value to the left operand c **= a is equivalent to c = c ** a

Consider the example below:

a = 21
b = 10
c = 0

c = a + b
print ( c )

c += a
print ( c )

c *= a
print ( c )

c /= a
print ( c )

c  = 2
c %= a
print ( c )

c **= a
print ( c )
Output = 31, 52, 1092, 52.0, 2, 2097152

Bitwise Operators:

These operations directly manipulate bits. In all computers, numbers are represented with bits, a series of zeros and ones. In fact, pretty much everything in a computer is represented by bits. Consider the example shown below:

Bitwise And - Python Tutorial - Edureka

Following are the Bitwise Operators supported by Python:

Bitwise Operators - Python Tutorial - Edureka

Consider the example below:

a = 58        # 111010
b = 13        # 1101
c = 0

c = a & b     # 8 = 1000
print ( c )

c = a | b     # 63 = 111111
print ( c )

c = a ^ b     # 55 = 110111
print ( c )

c = a << 2    # 232 = 11101000
print ( c )

c = a >> 2    # 14 = 1110
print ( c )

Next up, in this Python tutorial we will focus on Logical Operators.

Logical Operators:

Logical Operators - Python Tutorial - Edureka

The following are the Logical Operators present in Python:

Operator Description Example
and True if both the operands are true X and Y
or True if either of the operands are true X or Y
not True if operand is false (complements the operand) not X

Consider the example below:

x = True
y = False

print('x and y is',x and y)

print('x or y is',x or y)

print('not x is',not x)
Output = x and y is False
         x or y is True
         not x is False

Now, we will focus on Membership Operators.

Membership Operators:

These Operators are used to test whether a value or a variable is found in a sequence (Lists, Tuples, Sets, Strings, Dictionaries)

The following are the Membership Operators:

Operator Description  Example
in True if value/variable is found in the sequence 5 in x
not in  True if value/variable is not found in the sequence 5 not in x

Consider the example below:

X = [1, 2, 3, 4]
A = 3
print(A in X)
print(A not in X)
Output = True
         False

Now is the time to look at the last Operator i.e. Identity Operator.

Identity Operators:

These Operators are used to check whether two values (or variables) are located in the same part of memory. Two variables being equal does not imply that they are identical.

Following are the Identity Operators in Python:

Operator Description Example
is True if the operands are identical x is True
is not  True if the operands are not identical x is not True

Consider the example below:



X1 = 'Welcome To edureka!'

X2 = 1234

Y1 = 'Welcome To edureka!'

Y2 = 1234

print(X1 is Y1)

print(X1 is not Y1)

print(X1 is not Y2)

print(X1 is X2)

Output = True
         False
         True
         False


I hope you have enjoyed the read till now. Next, we will look at various Conditional Statements.

Conditional Statements:

Conditional statements are used to execute a statement or a group of statements when some condition is true. There are namely three conditional statements – If, Elif, Else.

Consider the flowchart shown below:

 Conditional Statements - Python Tutorial - Edureka

Let me tell you how it actually works.

  • First, the control will check the ‘If’ condition. If it’s true, then the control will execute the statements after the If condition.
  • When ‘If’ condition is false, then the control will check the ‘Elif’ condition. If Elif condition is true then the control will execute the statements after Elif condition.
  • If ‘Elif’ Condition is also false then the control will execute the Else statements.

Below is the syntax:

if condition1:
    statements

elif condition2:
    statements

else:
    statements

Consider the example below:

X = 10
Y = 12

if X < Y:
    print('X is less than Y')
elif X > Y:
    print('X is greater than Y')
else:
    print('X and Y are equal')

Output = X is less than Y


Now is the time to understand Loops.

Loops:

  • In general, statements are executed sequentially: The first statement in a function is executed first, followed by the second, and so on.
  • There may be a situation when you need to execute a block of code several times.

A loop statement allows us to execute a statement or group of statements multiple times. The following diagram illustrates a loop statement:

Python Loops - Python Tutorial - Edureka

Let me explain the above diagram:

  • First the control will check the condition. If it is true then the control will move inside the loop and execute the statements inside the loop. 
  • Now, the control will again check the condition, if it is still true then again it will execute the statements inside the loop.
  • This process will keep on repeating until the condition becomes false. Once the condition becomes false the control will move out of loop.

There are two types of loops:

  • Infinite: When condition will never become false.
  • Finite: At one point, the condition will become false and the control will move out of the loop.

There is one more way to categorize loops:

  • Pre-test: In this type of loops the condition is first checked and then only the control moves inside the loop.
  • Post-test: Here first the statements inside the loops are executed and then the condition is checked.

Python does not support Post-test loops.
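That said, post-test behaviour can be emulated with while True and break, as in this small sketch:

count = 0
while True:
    print(count)        # the body always runs at least once
    count += 1
    if count >= 3:      # exit condition is checked after the body, like a do-while
        break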

Learn Python From Experts

Loops in Python:

In Python, there are three loops:

  • While
  • For
  • Nested

While Loop: Here, first the condition is checked and if it’s true, control will move inside the loop and execute the statements inside the loop until the condition becomes false. We use this loop when we are not sure how many times we need to execute a group of statements or you can say that when we are unsure about the number of iterations.

Consider the example:

count = 0
while (count < 10):
   print ( count )
   count = count + 1

print ("Good bye!")
Output = 0
         1
         2
         3
         4
         5
         6
         7
         8
         9
         Good bye!

For Loop: Like the While loop, the For loop also allows a code block to be repeated a certain number of times. The difference is that in a For loop we know the number of iterations required, unlike a While loop, where the iterations depend on the condition. You will get a better idea about the difference between the two by looking at the syntax:

for variable in Sequence:
    statements

Notice here, we have specified the range, that means we know the number of times the code block will be executed.

Consider the example:


fruits = ['Banana', 'Apple',  'Grapes']

for index in range(len(fruits)):
   print (fruits[index])

Output = Banana
         Apple
         Grapes

Nested Loops: It basically means a loop inside a loop. It can be a For loop inside a While loop and vice-versa. Even a For loop can be inside a For loop or a While loop inside a While loop.

Consider the example:



count = 1
for i in range(10):
    print (str(i) * i)

    for j in range(0, i):
        count = count +1

Output =
1
22
333
4444
55555
666666
7777777
88888888
999999999


Now is the best time to introduce functions in this Python tutorial.

Functions:

Functions are a convenient way to divide your code into useful blocks, allowing us to order our code, make it more readable, reuse it and save some time. 

Python Functions - Python Tutorial - Edureka
def add (a, b):
    return a + b
c = add(10,20)
print(c)
Output = 30

I hope you have enjoyed reading this Python tutorial. We have covered all the basics of Python in this tutorial, so you can start practicing now. After this Python tutorial, I will be coming up with more blogs on Python for Analytics, Python Oops concepts, Python for web development, Python RegEx, and Python Numpy. Stay tuned!

View Upcoming Python Batches

Got a question for us? Mention them in the comments section and we will get back to you.  

To get in-depth knowledge on Python along with its various applications, you can enroll here for live online training with 24/7 support and lifetime access.

The post Python Tutorial – Python Programming For Beginners appeared first on Edureka Blog.

What Is Selenium? A Beginner’s Guide To Automation Testing

$
0
0

Do you know what is Selenium? Do you have any idea why it is used? If you want the answer to these two questions, then wait until you read the entire content of this blog because you will be glad you spent a worthy amount of time getting an introduction to what could be at the heart of your next job role.

Before I get started, let me tell you what you will get to learn by the end of this blog:

  1. Need for software testing
  2. Challenges with manual testing
  3. How automation testing beats manual testing?
  4. Selenium vs. other testing tools?
  5. Selenium suite of tools 

Selenium is an automation testing tool. Wait, before you get carried away, let me inform you that only testing of web applications is possible with Selenium. We can neither test any desktop (software) application nor test any mobile application using Selenium. It’s a bummer, I can feel your pain. But don’t worry, there are many tools for testing software and mobile applications, like IBM’s RFT, HP’s QTP, Appium and many more. But the focus of this blog is testing dynamic web applications, and why Selenium is the best tool for that purpose. Before going into further details of Selenium, you ought to know the story behind how Selenium came into being what it is today.

Need For Software Testing

Software testing is what it all boils down to. Today’s world of technology is completely dominated by machines, and their behavior is controlled by the software powering them. Will the machines behave exactly as we want them to? Every time? Everywhere? The answer to these questions lies in software testing.

At the end of the day, it is the software application’s success rate which is going to control your business growth. The same thing can be said even for web applications because most businesses today are completely reliant on the internet.

Take, for example, any e-commerce company, be it Amazon, eBay or Flipkart: they rely on customer traffic on their websites and on their web-based mobile applications for business. Imagine something catastrophic happens, like the prices of a number of products being capped at $10, all because of a small bug in a “not so easily readable” part of the code. Then what can be done, and how can we prevent it the next time? By testing the code before deployment, right? So, this takes us to the next topic: manual testing.

Challenges With Manual Testing

Manual testing means that the web application is tested manually by QA testers. Tests need to be performed manually in every environment, using different data sets, and the success/failure rate of every transaction should be recorded.

manual testing challenges - what is selenium

Look at the above image of a poor chap who manually verifies the recorded transactions. The challenges he faces cause fatigue, boredom, delays, mistakes and errors because of the manual work. This leads to the need for automation testing.

Automation Testing Beats Manual Testing

Automation testing beats manual testing every time. Why? Because it is faster, needs less investment in human resources, is less prone to errors, allows frequent execution, supports lights-out execution, and supports both regression testing and functional testing.

Let’s take an example similar to the one mentioned earlier. Suppose there is a login page and we need to verify whether all login attempts are successful. It is easy to write a simple script that validates whether every transaction/login attempt succeeds (automated test case execution). Moreover, these tests can be configured so that they run in different environments and web browsers. What else can be done? You can automate the generation of the result file by scheduling it for a particular time of the day, and then automate the generation of reports based on those results, and so on. The key point is that automation testing makes a tester’s job a whole lot simpler. Check out the image below, which shows the more relaxed environment in which the same tester is working.

automation testing - what is selenium

Now, let me talk about Selenium in particular.

What Is Selenium?

Selenium is a suite of software tools used to automate web browser testing. It is an open-source tool and is mainly used for functional testing and regression testing. Since it is open-source, there is no licensing cost involved, which is a major advantage over other testing tools.

selenium features - what is selenium

Other reasons behind Selenium’s ever growing popularity are:

  • Allows testers to write test scripts in 7 different programming languages => Java, Python, C#, PHP, Ruby, Perl and JavaScript
  • It is not limited to any single environment; it allows testers to perform tests on various operating systems => Windows, Mac, Linux, iOS and Android
  • Testing can be done on different web browsers => Mozilla Firefox, Internet explorer, Google Chrome, Safari and Opera browsers
  • Most of all, we can integrate it with frameworks like TestNG, JUnit & NUnit for managing test cases and generating reports

But there surely has to be shortcomings right?

  • We can use Selenium only to test web applications. We cannot test desktop applications or any other software
  • There is no guaranteed support available for Selenium. We need to leverage the available customer communities
  • It is not possible to perform testing on images. We need to integrate Selenium with Sikuli for image based testing
  • There is no native reporting facility. But we can overcome that issue by integrating it with frameworks like TestNG or JUnit or NUnit

Learn Selenium From Experts

Now let us see where Selenium stands in the market.

Selenium vs. QTP vs. RFT

I have compared its performance with two other popular tools: QTP and RFT in the table below.

Feature                                      | HP QTP                | IBM RFT               | Selenium
License                                      | Required              | Required              | Open-source
Cost                                         | High                  | High                  | Free (open-source software)
Customer support                             | Dedicated HP support  | Dedicated IBM support | Open-source community
Hardware consumption during script execution | High                  | High                  | Low
Coding experience                            | Not required          | Required              | Ample coding skills and experience needed
Environment support                          | Only Windows          | Only Windows          | Windows, Linux, Solaris, OS X (if browser & JVM or JavaScript support exists)
Language support                             | VB Script             | Java and C#           | Java, C#, Ruby, Python, Perl, PHP and JavaScript

It is pretty clear from the above table why Selenium is the most preferred tool. In the next part of what is Selenium blog, let’s go into the depths of Selenium by understanding the different flavors it comes in.

Selenium Suite Of Tools

  • Selenium RC (now deprecated)
  • Selenium IDE
  • Selenium Grid
  • Selenium WebDriver

Selenium RC (Remote Control)

Before I talk about the details of Selenium RC, I would like to go a step back and talk about the first tool in the Selenium project. Selenium Core was the first tool. But Selenium Core hit a roadblock in terms of cross-domain testing because of the same origin policy. The same origin policy prohibits JavaScript code from accessing web elements hosted on a domain different from the one where the JavaScript was launched.

To overcome the same origin policy issue, testers needed to install local copies of both Selenium Core (a JavaScript program) and the web server containing the web application being tested, so that both would belong to the same domain. The permanent solution to this problem turned out to be Selenium RC. So what does RC do?

RC overcame the problem by introducing an HTTP proxy server to “trick” the browser into believing that Selenium Core and the web application being tested come from the same domain. This makes RC a two-component tool:

  1. Selenium RC Server
  2. Selenium RC Client – Library containing your programming language code

RC Server communicates using simple HTTP GET/ POST requests. Look at the below image for understanding the RC architecture.

selenium rc - what is selenium

Selenium RC was the flagship tool from the entire Selenium project because it could be used to write web application tests in different programming languages.
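
To give you a feel for how an RC test looked, here is a minimal sketch using the legacy Java client API (the DefaultSelenium class from the old com.thoughtworks.selenium package); the host, port and URL values are placeholders and assume an RC server is already running:

import com.thoughtworks.selenium.DefaultSelenium;
import com.thoughtworks.selenium.Selenium;

public class RcExample
{
public static void main(String[] args)
{
// Connect to a Selenium RC server running on localhost:4444
Selenium selenium = new DefaultSelenium("localhost", 4444, "*firefox", "https://www.edureka.co/");
selenium.start(); // launches the browser via the RC server
selenium.open("/"); // opens the base URL
selenium.waitForPageToLoad("30000"); // waits up to 30 seconds for the page to load
System.out.println(selenium.getTitle()); // prints the page title
selenium.stop(); // shuts the browser down
}
}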

But the drawback with RC is that every communication with the RC server is time-consuming, and hence RC is very slow. So slow that it would sometimes take hours to complete a single test.

From Selenium v3 onwards, RC has been deprecated and moved to a legacy package. You can still download and work with RC, but unfortunately you cannot avail support for it. And why would you want to use an outdated tool, especially when there is a more efficient one called Selenium WebDriver? Before I talk about WebDriver, let me discuss IDE and Grid, which are the other tools that make up Selenium v1.

Selenium IDE (Integrated Development Environment)

Selenium IDE is a Firefox plugin which is used to quickly and frequently execute simple test cases. Test cases in IDE are created by recording the interactions which the user has with the web browser. These tests can then be played back any number of times.

The advantage with Selenium IDE is that, tests recorded via the plugin can be exported in different programming languages like: Java, Ruby, Python etc. Check out the below screenshot of Firefox’s IDE plugin.

selenium ide - what is selenium

But the associated shortcomings of IDE are:

  • Plug-in only available for Mozilla Firefox browser
  • It is not suitable for testing dynamic web applications; only simple tests can be recorded
  • Test cases cannot be created via programming logic
  • Does not support Data Driven testing

These were some of the aspects of Selenium IDE. Let me now talk about Selenium Grid.

Selenium Grid

Selenium Grid was a part of Selenium v1 and it was used in combination with RC to run tests on remote machines. In fact, with Grid, multiple test scripts can be executed at the same time on multiple machines.

Parallel execution is achieved with the help of Hub-Node architecture. Hub controls the test scripts running on various browsers, operating systems and programming languages on multiple nodes.

selenium grid - what is selenium
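
For context, here is a minimal sketch of how a test is typically pointed at a Grid hub from Java; the hub URL and capabilities are placeholders, and the exact name of the standalone-server jar depends on the version you download:

import java.net.URL;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.remote.RemoteWebDriver;

public class GridExample
{
public static void main(String[] args) throws Exception
{
// Hub and node are started separately from the command line, for example:
// java -jar selenium-server-standalone-<version>.jar -role hub
// java -jar selenium-server-standalone-<version>.jar -role node -hub http://localhost:4444/grid/register

// The test only talks to the hub; the hub forwards it to a matching node
DesiredCapabilities capabilities = DesiredCapabilities.firefox();
WebDriver driver = new RemoteWebDriver(new URL("http://localhost:4444/wd/hub"), capabilities);
driver.get("https://www.edureka.co/");
System.out.println(driver.getTitle());
driver.quit();
}
}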

Grid is still in use, and works with both WebDriver and RC. Now is the ideal time for me to talk about Selenium v2. Selenium v2 included WebDriver as part of the Selenium project. WebDriver is more efficient than RC and I will tell you why that is the case in the next section of the blog.

Selenium WebDriver

In contrast to IDE, Selenium WebDriver provides a programming interface to create and execute test cases. Test cases are written such that web elements on web pages are identified and then actions are performed on those elements.

WebDriver is an upgrade to RC because it is much faster. It is faster because it makes direct calls to the browser, whereas RC needs an RC server to interact with the web browser. Each browser has its own driver through which WebDriver controls it (a short sketch follows the list below). The different WebDrivers are:

  • Firefox Driver (Gecko Driver)
  • Chrome Driver
  • Internet Explorer Driver
  • Safari Driver and
  • HTML Unit Driver
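
As a quick sketch of what switching drivers looks like (the chromedriver path below is a placeholder; you need to download the matching ChromeDriver binary separately), only the driver class and the system property change:

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class ChromeExample
{
public static void main(String[] args)
{
// Point WebDriver at the ChromeDriver executable (placeholder path)
System.setProperty("webdriver.chrome.driver", "files/chromedriver.exe");
WebDriver driver = new ChromeDriver(); // same test code, different browser driver
driver.get("https://www.edureka.co/");
System.out.println(driver.getTitle());
driver.quit();
}
}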

Selenium v2 was essentially a merger of Selenium RC and the WebDriver project, and WebDriver was launched commercially as part of Selenium v2. Currently, Selenium v3 is in use.

Benefits Of Selenium WebDriver:

  • Support for 7 programming languages: Java, C#, PHP, Ruby, Perl, Python and JavaScript.
  • Supports testing on various browsers like: Firefox, Chrome, IE, Safari
  • Tests can be performed on different operating systems like: Windows, Mac, Linux, Android, iOS
  • Overcomes limitations of Selenium v1 like file upload, download, pop-ups & dialogs barrier

Short-comings Of Selenium WebDriver

  • Detailed test reports cannot be generated
  • Testing images is not possible

No matter the challenge, these shortcomings can be overcome by integrations with other frameworks. For testing images, Sikuli can be used and for generating detailed test reports, TestNG can be used.

So that draws the conclusion to this blog on what is Selenium. To learn more about Selenium WebDriver and TestNG, read the other blogs in this Selenium tutorial blog series. You can alternatively see the video below delivered by an industry expert where she has shared her opinion of Selenium as an automation testing tool.

What Is Selenium? | Selenium Tutorial | Edureka

 

This Edureka Selenium tutorial video will give you an introduction to software testing. It talks about the drawbacks of manual testing and reasons why automation testing is the way forward.

Learn Selenium From Experts

If you wish to learn Selenium and build a career in the testing domain, then check out our interactive, live-online Selenium 3.0 Certification Training here, that comes with 24*7 support to guide you throughout your learning period.

Got a question for us? Please mention it in the comments section and we will get back to you.

The post What Is Selenium? A Beginner’s Guide To Automation Testing appeared first on Edureka Blog.

Selenium Tutorial: All You Need To Know About Selenium WebDriver

$
0
0

In my previous blog, I told you what Selenium is all about. I spoke about the need for automation testing and also about the Selenium suite of tools. In case, you missed out on reading my previous blog in this Selenium tutorial series, then here is the link for the same: What Is Selenium? This is the second blog in the same Selenium tutorial series and in this blog, I will tell you everything you need to know to get started with testing web apps using Selenium WebDriver. In continuation to the previous blog, here I will deep dive into Selenium WebDriver, which is the flagship tool in the Selenium project.

Selenium Tutorial – Selenium WebDriver

Before I get started with Selenium WebDriver, let me show you the topics I will be covering in this blog, which will help you write your first Selenium code for automation testing. In this blog, I have written Selenium code to test automated login to Facebook. The topics are:

  1. Drawbacks of Selenium RC and the birth of WebDriver
  2. What is Selenium WebDriver?
  3. What are browser elements?
  4. Locating browser elements present on the web page
  5. Operations on browser elements

Drawbacks Of Selenium RC And The Birth Of WebDriver

Let’s first discuss the limitations of Selenium RC because that was the reason for the eventual development of WebDriver. You might be surprised when I say that Selenium RC became an instant hit when it was launched. That was because it overcame the same origin policy issue which was a major problem while testing web apps with Selenium Core. But do you know what the same origin policy issue was?

The same origin policy is a set of rules that enforces the web application security model. According to the same origin policy, the web browser will allow JavaScript code to access elements on a web page only if both the JavaScript and the web page being tested are hosted on the same domain. Selenium Core, being a JavaScript-based testing tool, was handicapped for that very reason: it could not test every web page.

But when Selenium RC came into the picture, it rid testers of the same origin policy issue. But, how did RC do that? RC did that by using another component called Selenium RC server. So, RC is a tool which is a combination of two components: Selenium RC server and Selenium RC client.

Selenium RC server is an HTTP proxy server, designed to “trick” the browser into believing that Selenium Core and the web application being tested are from the same domain. Hence, there is no stopping the JavaScript code from accessing and testing any web site.

Even though Selenium RC was a major hit, it had its own share of problems. The major one was the time taken to execute tests. Since the Selenium RC server acts as a middleman in the communication between the browser and your Selenium commands, test executions are very time-consuming. Besides the time factor, RC’s architecture is also slightly complicated.

This architecture involves first injecting Selenium Core into the web browser. Then Selenium Core receives the instructions from the RC server and converts them into JavaScript commands. This JavaScript code is responsible for accessing and testing the web elements. If you look at the image below, you will get an idea of how RC works.

rc architecture - selenium tutorial

To overcome these problems, Selenium WebDriver was developed. WebDriver is faster because it interacts directly with the browser and there is no involvement of an external proxy server. The architecture is also simpler as the browser is controlled from the OS level. The below image will help you understand how WebDriver works.

webdriver architecture - selenium tutorial

Another benefit of WebDriver is that it supports testing on the HTML Unit driver, which is a headless driver. A headless driver means the browser has no GUI. RC, on the other hand, does not support the HTML Unit driver. These are some of the reasons why WebDriver scores over RC.
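
As a rough illustration of a headless run (assuming the separate HtmlUnit driver dependency is on your classpath; the URL is just a placeholder), the code differs from a normal browser test only in the driver class used:

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;

public class HeadlessExample
{
public static void main(String[] args)
{
// HtmlUnitDriver runs the test without opening any browser window
WebDriver driver = new HtmlUnitDriver();
driver.get("https://www.edureka.co/");
System.out.println(driver.getTitle()); // the title is still available, just no GUI
driver.quit();
}
}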

What Is Selenium WebDriver?

In this part of Selenium tutorial blog, I will dig deep into Selenium WebDriver. There is a good chance that you will be aware of the details I have covered in the below paragraph, but I will be revising it anyway.

Selenium WebDriver is a web-based automation testing framework which can test web pages across various web browsers and operating systems. In fact, you also have the freedom to write test scripts in different programming languages like Java, Perl, Python, Ruby, C#, PHP and JavaScript. Do note that Mozilla Firefox is Selenium WebDriver’s default browser.

WebDriver was introduced as part of Selenium v2.0. Selenium v1 consisted of only IDE, RC and Grid. But the major breakthrough in the Selenium project was when WebDriver was developed and introduced as a replacement in Selenium v2. However, with the release of Selenium v3, RC has been deprecated and moved to legacy package. You can still download and work with RC but, don’t expect any support for it.

Learn Selenium From Experts

In a nutshell, the advantages WebDriver has over RC are:

  • Support for more programming languages, operating system and web browsers
  • Overcoming the limitations of Selenium 1 like file upload, download, pop-ups & dialog barrier
  • Simpler commands when compared to RC, and a better API
  • Support for Batch testing, Cross browser testing & Data driven testing

But the drawback when compared to RC is that test reports cannot be generated; RC generates detailed reports.

The below image depicts how WebDriver works:

webdriver - selenium tutorial

You must have heard the term “browser elements” a number of times. The next part of this Selenium tutorial will be about what are these elements and how testing happens on these web elements.

What Are Browser Elements?

Elements are the different components that are present on web pages. The most common elements we notice while browsing are:

  • Text boxes
  • CTA Buttons
  • Images
  • Hyperlinks
  • Radio buttons/ Check boxes
  • Text area/ Error messages
  • Drop down box/ List box/ Combo box
  • Web Table/ HTML Table
  • Frame

Testing these elements essentially means we have to check whether they are working fine and responding the way we want them to. For example, if we are testing text boxes, what would you test them for?

  1. Whether we are able to send text or numbers to the text box
  2. Can we retrieve text that has been passed to the text box, etc.

If we are testing an image, we might want to:

  1. Download the image
  2. Upload the image
  3. Click on the image link
  4. Retrieve the image title, etc.

Similarly, operations can be performed on each of the elements mentioned earlier. But only after the elements are located on the web page, we can perform operations and start testing them right? So, the next topic, I will be covering in this Selenium tutorial blog is element locator techniques.

Locating Browser Elements Present On The Web Page

Every element on a web page has attributes (properties). Elements can have more than one attribute, and most of these attributes will be unique for different elements. For example, consider a page having two elements: an image and a text box. Both these elements have a ‘Name’ attribute and an ‘ID’ attribute, and these attribute values need to be unique for each element; in other words, two elements cannot share the same ‘ID’ or ‘Name’ value. Elements can, however, have the same value for ‘Class Name’.

In the example considered, the image and text box can neither have the same ‘ID’ value nor the same ‘Name’ value. However, there are some attributes that can be common for a group of elements on the page. I will tell you which are those attributes later, but before that let me list down the 8 attributes using which we can locate elements. Those attributes are ID, Name, Class Name, Tag Name, Link Text, Partial Link Text, CSS and XPath.

Since the elements are located using these attributes, we refer to them as ‘Locators’. The locators are listed below, and a short example follows the list:

  • By.id
    Syntax: driver.findElement(By.id(“xxx”));
  • By.name
    Syntax: driver.findElement(By.name(“xxx”));
  • By.className
    Syntax: driver.findElement(By.className(“xxx”));
  • By.tagName
    Syntax: driver.findElement(By.tagName(“xxx”));
  • By.linkText
    Syntax: driver.findElement(By.linkText(“xxx”));
  • By.partialLinkText
    Syntax: driver.findElement(By.partialLinkText(“xxx”));
  • By.cssSelector
    Syntax: driver.findElement(By.cssSelector(“xxx”));
  • By.xpath
    Syntax: driver.findElement(By.xpath(“xxx”));
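
For instance, assuming a login form has an email input with name="email" (a hypothetical element used purely for illustration), the same field could be located in several ways:

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public class LocatorExample
{
static void locate(WebDriver driver)
{
// All three calls below find the same hypothetical <input name="email"> element
WebElement byName = driver.findElement(By.name("email"));
WebElement byCss = driver.findElement(By.cssSelector("input[name='email']"));
WebElement byXpath = driver.findElement(By.xpath("//input[@name='email']"));
}
}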

By looking at the syntax above, you might have realized that locators are called inside methods. So, before going any further, you need to learn the other methods, browser commands and functions that can be used to perform operations on the elements.

Operations On Browser Elements

From this section of the blog onwards, you will be having a lot of fun because there will be less theory and more codes. So be prepared, and keep your Eclipse IDE open with the required Selenium packages installed.

To start testing a web page, we need to first open a browser, then navigate to the web page by providing the URL right? Check out the below piece of code, where I have replicated the same. Firefox browser will first be initiated and then it will navigate to Facebook’s login page.

package seleniumWebDriver;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class WebDriverClass
{
public static void main(String[] args)
{
System.setProperty("webdriver.gecko.driver", "files/geckodriver.exe");
WebDriver driver = new FirefoxDriver();
driver.get("https://www.facebook.com/");
driver.getTitle();
driver.quit();
}
}

import org.openqa.selenium.WebDriver; is a library package which contains the required class to initiate the browser loaded with a specific driver.

import org.openqa.selenium.firefox.FirefoxDriver; is a library package which contains the FirefoxDriver class needed to start FirefoxDriver as the browser initiated by the WebDriver class.

System.setProperty(“webdriver.gecko.driver”, “files/geckodriver.exe”); – This command tells the run-time engine that the Gecko driver is present in the specified path. For recent versions of Firefox (and from Selenium 3 onwards), we need to download the Gecko driver to work with WebDriver. In case you want to test on Chrome, you have to download ChromeDriver, which is a .exe file, and specify its path in this line of code. We do the same for other browsers as well.

WebDriver driver = new FirefoxDriver(); – This command is used to initiate a new Firefox driver object.

driver.get(“https://www.facebook.com/”); – This method is used to open the specified URL, which in our code is Facebook’s login page.

driver.getTitle(); – This command gets the title of the tab that is currently open in the browser.

driver.quit(); – This command closes the browser driver.

But, what if you want to navigate to a different URL and then do testing? In that case you can use the navigate.to() command as shown in the below code snippet. If you then want to come back to the previous page, then you can do that by using navigate.back() command. Similarly for refreshing the current page, you can use navigate.refresh() command.

driver.navigate().to("https://www.edureka.co/testing-with-selenium-webdriver");
driver.navigate().refresh();
driver.navigate().back();

If you want to maximize the size of browser window, then you can do that by using the code in the snippet below.

driver.manage().window().maximize();

In case you want to set a custom size for the browser window, then you can set your own dimensions as shown in the below code snippet (the Dimension class comes from the org.openqa.selenium package).

Dimension d = new Dimension(420,600);
driver.manage().window().setSize(d);

Now that you know most of the basics, let’s go to the next topic in this Selenium tutorial blog. Let’s try to find an element on the web page and then perform any operation that is possible.

I’m pretty sure, you all have Facebook accounts. So, let me show you how to log into Facebook by passing the credentials from the code itself.

There are two text fields in the Facebook login page, one for Email/Phone and another for Password. We have to locate these two elements, pass the credentials to those elements and then find the third element: Login button which needs to be clicked on.

Look at the screenshot below. It is the screenshot of Facebook’s login page.

facebook screenshot - selenium tutorial

If you Inspect (Ctrl + Shift + I) this page, you will get the same window in your browser. Then, under Elements, the list of all the elements present on the page and their attributes will be displayed. There are three portions highlighted in the above screenshot. The first highlighted element is the email text field, the second is the password text field and the third is the Login button.

If you can recall, I mentioned earlier that these elements can be located using element locator techniques. Let’s use it to locate these elements and send the field values.
This is the syntax for finding the element: driver.findElement(By.id(“xxx”));
For sending it values, we can use the method sendKeys(“credentials“);
For clicking on a button, we have to use the method click();

So, let’s get started with finding the element and performing an operation on it. The code for it is in the below snippet.

driver.findElement(By.name("email")).sendKeys("xxx@gmail.com");
driver.findElement(By.name("pass")).sendKeys("xxxxxx");
driver.findElement(By.id("u_0_q")).click();

In line #1, we are identifying the Email element by its unique ‘Name’ attribute and sending it the EmailID. 
In line #2, we are identifying the Password element by its unique ‘Name’ attribute and sending it the password.
In line #3, we are locating the Login button element by its unique ID and clicking on that button.

Adding just these lines of code might not be enough. Because of the dynamic nature of web pages, the elements might not be available immediately, and if WebDriver tries to locate them before the page has finished loading, it will throw an exception and the test will fail. This issue might not show up on Facebook’s page because it loads quickly, but it will most likely happen on e-commerce sites and other dynamic websites.

To overcome this problem, we need to use an advanced technique: we ask our WebDriver to wait after the page is accessed, and only once it has loaded completely do we locate the elements and perform actions on them.

In case you want your WebDriver to wait until the elements load on a web page, you can achieve that with an implicit wait or, more crudely, with the Thread.sleep() method. (Note that driver.wait() is just Java’s Object.wait() and is not a Selenium wait, so avoid it.) If you are writing more advanced code, you should use Implicit waits or Explicit waits. In the next blog of this Selenium tutorial series, I will explain the concept of wait conditions. But for our case, the below commands are enough.

driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);
// or, as a crude alternative (requires handling InterruptedException):
// Thread.sleep(5000);

But, while working with the implicit wait, remember to import this library:
import java.util.concurrent.TimeUnit;
We do it because the implicit wait method takes a TimeUnit argument.
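
For completeness, here is a minimal sketch of an explicit wait, which waits for one specific element instead of applying a blanket timeout. It uses WebDriverWait and ExpectedConditions from the org.openqa.selenium.support.ui package; the field located by name "email" is the same Facebook field used in this example.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class ExplicitWaitExample
{
static void waitForEmailField(WebDriver driver)
{
// Wait up to 10 seconds for the email field to become visible, then type into it
WebDriverWait wait = new WebDriverWait(driver, 10);
WebElement email = wait.until(ExpectedConditions.visibilityOfElementLocated(By.name("email")));
email.sendKeys("xxx@gmail.com");
}
}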

The entire code I explained, is present in the below code snippet.

package seleniumWebDriver;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import java.util.concurrent.TimeUnit;

public class WebDriverClass
{
public static void main(String[] args)
{
System.setProperty("webdriver.gecko.driver", "files/geckodriver.exe");
WebDriver driver = new FirefoxDriver();
driver.get("https://www.facebook.com/");
driver.manage().window().maximize();
driver.getTitle();
driver.navigate().to("https://www.edureka.co/testing-with-selenium-webdriver");

driver.navigate().back();
driver.navigate().refresh();
driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);
// or, as a crude alternative:
// Thread.sleep(5000);

driver.findElement(By.name("email")).sendKeys("xxx@gmail.com");
driver.findElement(By.name("pass")).sendKeys("xxxxxx");
driver.findElement(By.id("u_0_q")).click();

driver.quit();
}
}

When you replace the credentials with your actual email and password and execute this code, then Facebook will open in a new window, enter your credentials and login to your account.

Voila! You have successfully logged in, which means your code executed successfully.

I have used the ID and Name attributes for locating elements. You can in fact use any other locator for finding the elements. XPath is the most useful and important of the locator techniques. But, as long as you can find even one of the attributes and use them for locating elements, you should be good. You can see the below video delivered by an industry expert, where she has shown all the above mentioned features of Selenium WebDriver with hands-on.

Selenium WebDriver Tutorial For Beginners | Selenium Tutorial | Edureka

 

This Selenium WebDriver tutorial video talks about the drawbacks of Selenium RC and what was the need for Selenium WebDriver. It goes into the details of the advantages that WebDriver has over RC and how it replaced RC for automation testing.

Learn Selenium From Experts

My next blog in this Selenium tutorial series is about how TestNG can be used along with Selenium WebDriver. I urge you to read it because it talks about how the limitations of WebDriver (test case management and report generation) can be overcome by using TestNG.

If you encounter any problem while executing the code present in this blog or if you have any other queries, then put them in the comments section below and we will get back to you right away.

If you wish to learn Selenium and build a career in the testing domain, then check out our interactive, live-online Selenium 3.0 Certification Training here, that comes with 24*7 support to guide you throughout your learning period.

The post Selenium Tutorial: All You Need To Know About Selenium WebDriver appeared first on Edureka Blog.

Selenium WebDriver: TestNG For Test Case Management & Report Generation

$
0
0

In the previous blog, I taught you how to run your first Selenium WebDriver test. In this blog, I will be covering advanced Selenium WebDriver concepts. I have mentioned quite a few times already that Selenium WebDriver has limitations with respect to test case management and test report generation. So, what is the alternative? A tool as popular as Selenium must definitely have a workaround right? Of course it does! We can use a combination of Selenium and TestNG to beat this limitation and that will be the topic of this blog’s discussion.

In case, you are new to Selenium, and want an introduction to the basic concepts, you can start your journey from here: What Is Selenium? However, the others can get started with TestNG for Selenium from this blog.

Software developers from around the world will unanimously agree that structuring code into test cases saves a good part of their debugging time. Why? That is because test cases help in creating robust and error-free code. How? By breaking the entire code into smaller test cases and then evaluating each of them against pass/fail conditions, we can create error-free code. Since Selenium does not support managing code as test cases on its own, we have to use TestNG for that. This is where TestNG fits into the Selenium framework.

TestNG stands for Test Next Generation and it is an open-source test automation framework inspired by JUnit and NUnit. Well, not just inspired, but an upgrade to those two frameworks. So you may ask, what is the upgrade here? The upgrade with TestNG is that it provides additional functionality like test annotations, grouping, prioritization, parameterization and sequencing techniques in the code, which were not possible earlier.

Besides managing test cases, even detailed reports of tests can be obtained by using TestNG. There will be a summary displaying the test case that has failed, along with the group which it was a part of, and the class it falls under. When bugs can be accurately located like this, they can be fixed immediately to the relief of developers. The below image depicts the working of TestNG.

testng - selenium webdriver

So, how does TestNG get the job done? This question will be answered in the next section of this Selenium WebDriver tutorial blog, where I will be discussing how to manage various test cases by using TestNG.

Learn Selenium From Experts

Selenium WebDriver With TestNG

Test cases can be defined and managed by one of the following ways:

  1. Test Annotations
  2. Prioritization
  3. Disabling Test Cases
  4. Method Dependency
  5. Grouping
  6. Assertions
  7. Report Generation

Let me start explaining each of these functionalities.

Test Annotations

First of all, let’s ask ourselves this question: Why do we need to use annotations? When can we use them? Annotations in TestNG are used to control the order in which methods are executed. Test annotations are defined before every method in the test code. In case any method is not prefixed with an annotation, that method will be ignored and not executed as part of the test code. To define them, methods simply need to be annotated with ‘@Test‘. Look at the below code snippet for example.

package testng;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.testng.annotations.AfterClass;
import org.testng.annotations.AfterMethod;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.BeforeMethod;
import org.testng.annotations.Test;

public class TestAnnotations {
 @Test
 public void myTestMethod() {
 System.out.println("Inside method:- myTestMethod");
 WebDriver driver = new FirefoxDriver();
 driver.get("http://www.seleniumframework.com/Practiceform/");
 String title = driver.getTitle();
 System.out.println(title);
 driver.quit();
 }

 @BeforeMethod
 public void beforeMethod() {
 System.out.println("This piece of code is executed before method:- myTestMethod");
 System.setProperty("webdriver.gecko.driver", "C:\\Users\\Vardhan\\workspace\\SeleniumProject\\files\\geckodriver.exe");
 }

 @AfterMethod
 public void afterMethod() {
 System.out.println("This piece of code is executed after method:- myTestMethod");
 }

 @BeforeClass
 public void beforeClass() {
 System.out.println("This piece of code is executed before the class is executed");
 }

 @AfterClass
 public void afterClass() {
 System.out.println("This piece of code is executed after the class is executed");
 }
}

In the above code, you would have noticed that I have not defined a ‘main’ method. However, I have 5 other methods defined. They are ‘myTestMethod’, ‘beforeMethod’, ‘afterMethod’, ‘beforeClass’ and ‘afterClass’. Also, note the order of definition of methods in the code because they will not be executed in this same order.

The method ‘myTestMethod’ is annotated with @Test, and it is the main method or piece of code which has to be executed. Other annotated methods will be executed before and after this method is executed. Since ‘beforeMethod’ is annotated with @BeforeMethod, it will be executed before ‘myTestMethod’ is executed. Similarly, ‘afterMethod’ is annotated with @AfterMethod, and thus it will be executed after ‘myTestMethod’.

However, ‘beforeClass’ is annotated with @BeforeClass, which means it will be executed even before the class itself is executed. Our class name here is TestAnnotations, and thus before the class starts getting executed, the piece of code inside ‘beforeClass’ will be executed. Similarly, ‘afterClass’ is annotated with @AfterClass, and thus will be executed after the class TestAnnotations is executed.

If you still have confusion regarding the order of execution, then the below snippet will definitely help you.

1. BeforeSuite
2. BeforeTest
3. BeforeClass
4. BeforeMethod
5. Test
6. AfterMethod
7. AfterClass
8. AfterTest
9. AfterSuite

The output of the above code will be:

This piece of code is executed before the class is executed
This piece of code is executed before method:- myTestMethod
Inside method:- myTestMethod
1493192682118 geckodriver INFO Listening on 127.0.0.1:13676
1493192682713 mozprofile::profile INFO Using profile path C:\Users\Vardhan\AppData\Local\Temp\rust_mozprofile.wGkcwvwXkl2y
1493192682729 geckodriver::marionette INFO Starting browser C:\Program Files (x86)\Mozilla Firefox\firefox.exe
1493192682729 geckodriver::marionette INFO Connecting to Marionette on localhost:59792
[GPU 6152] WARNING: pipe error: 109: file c:/builds/moz2_slave/m-rel-w32-00000000000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
1493192688316 Marionette INFO Listening on port 59792
Apr 26, 2017 1:14:49 PM org.openqa.selenium.remote.ProtocolHandshake createSession
INFO: Detected dialect: W3C
JavaScript error: http://t.dtscout.com/i/?l=http%3A%2F%2Fwww.seleniumframework.com%2FPracticeform%2F&j=, line 1: TypeError: document.getElementsByTagName(...)[0] is undefined
Selenium Framework | Practiceform
1493192695134 Marionette INFO New connections will no longer be accepted
Apr 26, 2017 1:14:57 PM org.openqa.selenium.os.UnixProcess destroy
SEVERE: Unable to kill process with PID 6724
This piece of code is executed after method:- myTestMethod
This piece of code is executed after the class is executed
PASSED: myTestMethod

===============================================
 Default test
 Tests run: 1, Failures: 0, Skips: 0
===============================================


===============================================
Default suite
Total tests run: 1, Failures: 0, Skips: 0
===============================================

As you can see from the above output, the number of tests run is 1 and failed is 0. This means that the code is successful. Even the order of execution of methods will be in the order I mentioned earlier.

When you execute this code in your machine, Selenium WebDriver will instantiate your Firefox browser, navigate to Selenium Framework’s practice form, close the browser instance and display the same output as shown above in your Eclipse IDE.

I have only used 5 different annotations in my code. But there are many more annotations which can be used to control the next method to be executed. The entire list of annotations are explained in the table below:

@BeforeSuite – The method annotated with @BeforeSuite will run before all the tests in the suite have run.

@AfterSuite – The method annotated with @AfterSuite will run after all the tests in the suite have run.

@BeforeTest – The method annotated with @BeforeTest will run before any test method belonging to a class is run.

@AfterTest – The method annotated with @AfterTest will run after all the test methods belonging to a class have run.

@BeforeGroups – The method annotated with @BeforeGroups will run before each group is run.

@AfterGroups – The method annotated with @AfterGroups will run after every group is run.

@BeforeClass – The method annotated with @BeforeClass will run once before the first test method in the current class is invoked.

@AfterClass – The method annotated with @AfterClass will run once after all the test methods in the current class have run.

@BeforeMethod – The method annotated with @BeforeMethod will run before any test method inside a class is run.

@AfterMethod – The method annotated with @AfterMethod will run after every test method inside a class is run.

@Test – The method annotated with @Test is the main test method in the entire program. Other annotated methods will be executed around this method.

The screenshot of the TestNG report is present below:-

testng report - selenium webdriver

Prioritization

We spoke about how different methods can be defined such that they are executed around the @Test method. But what if you have more than one @Test method and you want to define the execution order between them?

In that case, we can prioritize them by assigning a number to the annotated test cases: the smaller the number, the higher the priority. Priority is assigned as a parameter (the lowercase ‘priority’ attribute) while defining the test cases. If no priority is assigned, the annotated test methods are executed in the alphabetical order of their names. Look at the parameters of the test annotations in the below piece of code.

@Test(priority = 2)
public static void FirstTest()
{
System.out.println("This is the Test Case number Two because of Priority #2");
}

@Test(priority = 1)
public static void SecondTest()
{
System.out.println("This is the Test Case number One because of Priority #1");
}

@Test
public static void FinalTest()
{
System.out.println("This is the Final Test Case because there is no Priority");
}

Disabling Test Cases

Let me show you something more interesting. What if you have a code spanning a million lines, consisting of hundreds of test cases, and you want to only disable one test method? You don’t need to delete any part of the code, instead, we can simply disable that test method.

The act of disabling a test case is also done via parameters. We can set the ‘enabled’ attribute to false. By default, all test cases are enabled, hence we do not need to set the attribute every time we write a test. Look at the parameters of the third and fourth methods in the below piece of code.

@Test(priority = 2, enabled = true)
public static void FirstTest()
{
System.out.println("This is the Test Case number Two because of Priority #2");
}

@Test(priority = 1, enabled = true)
public static void SecondTest()
{
System.out.println("This is the Test Case number One because of Priority #1");
}

@Test(enabled = false)
public static void SkippedTest()
{
System.out.println("This is the Skipped Test Case because this has been disabled");
}

@Test(enabled = true)
public static void FinalTest()
{
System.out.println("This is the Final Test Case, which is enabled and has no Priority");
}

Method Dependency

Now, in case you have a situation wherein you want a piece of code to be executed only if it satisfies a condition, or only if a particular method executes successfully, then we can do that by using the dependsOnMethods attribute. This is basically a condition of method dependency, where a method will be executed depending on another method. If we additionally set the alwaysRun attribute to true, then the method will be executed irrespective of the fail/pass status of the method it depends on. Look at the code in the below code snippet.

@Test
public static void FirstTest()
{
System.out.println("This is the first Test Case to be executed");
}

@Test(dependsOnMethods = { "FirstTest" })
public static void SecondTest()
{
System.out.println("This is the second Test Case to be executed; This is a Dependent method");
}

@Test(dependsOnMethods = { "SecondTest" }, alwaysRun = true)
public static void FinalTest()
{
System.out.println("This is the Final Test Case; It will be executed anyway.");
}

Now, this takes us to another important aspect in test annotations which is Grouping.

Grouping

By now you must know that there will be a number of methods as part of our test case in the code. Let’s say there are 100 test cases but, we want to execute only 20 test cases in our next test. Do you think we can do that? Sure we can.

We can use the groups attribute for this purpose. We can assign a group name to a number of test cases and later choose to execute only that group instead of the entire code. Look at the below code snippet to understand how to create groups; a sketch of how to run only that group follows the snippet.

@Test(groups = { "MyGroup" })
public static void FirstTest()
{
system.out.println("This is a part of the Group: MyGroup");
}

@Test(groups = { "MyGroup" })
public static void SecondTest()
{
system.out.println("This is also a part of the Group: MyGroup");
}

@Test
public static void ThirdTest()
{
system.out.println("But, this is not a part of the Group: MyGroup");
}
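
To actually run only this group, you would normally include it in a testng.xml suite (similar to the one shown in the Report Generation section below). As a rough sketch, you can also drive TestNG programmatically; the class name GroupingExample below is just a placeholder for the class containing the grouped test methods:

import org.testng.TestNG;

public class RunMyGroup
{
public static void main(String[] args)
{
TestNG testng = new TestNG();
// Run only the test methods that belong to the group "MyGroup"
testng.setGroups("MyGroup");
testng.setTestClasses(new Class[] { GroupingExample.class }); // placeholder test class
testng.run();
}
}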

TestNG Assertions

This now takes us to the next topic in TestNG, which is assertions. As the name suggests, assertions can be used in test methods to determine the pass/fail condition of a test. Based on the true/false outcome of a statement, the test will pass or fail.

In the code below I have included 3 test methods, wherein the first and third methods have a pass condition and the second method will have a fail condition. See the code for yourself.

package testng;

import org.testng.annotations.Test;
import org.testng.annotations.BeforeMethod;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.testng.Assert;
import org.testng.annotations.AfterMethod;

public class Assertions {
 @BeforeMethod
 public void beforeMethod() {
 System.setProperty("webdriver.gecko.driver",
 "C:\\Users\\Vardhan\\workspace\\SeleniumProject\\files\\geckodriver.exe");
 }

 public boolean isEqual(int a, int b) {
 if (a == b) {
 return true;
 } else {
 return false;
 }
 }

 @Test
 public void testEquality1() {
 Assert.assertEquals(true, isEqual(10, 10));
 System.out.println("This is a pass condition");
 }

 @Test
 public void testEquality2() {
 Assert.assertEquals(true, isEqual(10, 11));
 System.out.println("This is a fail condition");
 }

 @Test
 public void getTitle() {
 WebDriver driver = new FirefoxDriver();
 driver.get("https://www.gmail.com");
 String title = driver.getTitle();
 Assert.assertEquals(title, "Gmail");
 System.out.println("This is again a pass condition");
 }
}

When you look at the report that gets generated after this execution, you will notice that out of the three tests, one failed and two passed. Another important point to note is that when an assertion fails, the remaining commands/lines of code in that test are skipped. Only when the assertion succeeds does the next line of code in that test execute. Check out the output below, where System.out.println has executed only for the first and third methods.

1493277977348 geckodriver INFO Listening on 127.0.0.1:47035
1493277977993 mozprofile::profile INFO Using profile path C:\Users\Vardhan\AppData\Local\Temp\rust_mozprofile.Z7X9uFdKODvi
1493277977994 geckodriver::marionette INFO Starting browser C:\Program Files (x86)\Mozilla Firefox\firefox.exe
1493277977998 geckodriver::marionette INFO Connecting to Marionette on localhost:50758
[GPU 6920] WARNING: pipe error: 109: file c:/builds/moz2_slave/m-rel-w32-00000000000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
1493277981742 Marionette INFO Listening on port 50758
Apr 27, 2017 12:56:22 PM org.openqa.selenium.remote.ProtocolHandshake createSession
INFO: Detected dialect: W3C
This is again a pass condition
This is a pass condition
PASSED: getTitle
PASSED: testEquality1
FAILED: testEquality2
java.lang.AssertionError: expected [false] but found [true]
 at org.testng.Assert.fail(Assert.java:93)
 at org.testng.Assert.failNotEquals(Assert.java:512)
 at org.testng.Assert.assertEqualsImpl(Assert.java:134)
 at org.testng.Assert.assertEquals(Assert.java:115)
 at org.testng.Assert.assertEquals(Assert.java:304)
 at org.testng.Assert.assertEquals(Assert.java:314)
 at testng.Assertions.testEquality2(Assertions.java:38)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:108)
 at org.testng.internal.Invoker.invokeMethod(Invoker.java:661)
 at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:869)
 at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1193)
 at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:126)
 at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109)
 at org.testng.TestRunner.privateRun(TestRunner.java:744)
 at org.testng.TestRunner.run(TestRunner.java:602)
 at org.testng.SuiteRunner.runTest(SuiteRunner.java:380)
 at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:375)
 at org.testng.SuiteRunner.privateRun(SuiteRunner.java:340)
 at org.testng.SuiteRunner.run(SuiteRunner.java:289)
 at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
 at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
 at org.testng.TestNG.runSuitesSequentially(TestNG.java:1301)
 at org.testng.TestNG.runSuitesLocally(TestNG.java:1226)
 at org.testng.TestNG.runSuites(TestNG.java:1144)
 at org.testng.TestNG.run(TestNG.java:1115)
 at org.testng.remote.AbstractRemoteTestNG.run(AbstractRemoteTestNG.java:132)
 at org.testng.remote.RemoteTestNG.initAndRun(RemoteTestNG.java:230)
 at org.testng.remote.RemoteTestNG.main(RemoteTestNG.java:76)


===============================================
 Default test
 Tests run: 3, Failures: 1, Skips: 0
===============================================


===============================================
Default suite
Total tests run: 3, Failures: 1, Skips: 0
===============================================

So, that is the end of the concepts related to test case management. We are left with one more topic, and that is report generation. Report generation is the last topic in this Selenium WebDriver tutorial because reports can be generated only after all the tests are executed.

Report Generation

The most important thing you need to note is that the report will only be generated via a .xml file. This means, be it a method, or be it a class, or be it a group which you want to test, they all have to be specified in the .xml file.

So first you can create a new folder under your project, and create a new file inside that folder and give a name to the file and save it with .xml extension. You can create the new folder and file by right-clicking on the package explorer. Once you have created the file, go to the source tab from the bottom of the window and enter the configurations as specified in the below snippet.

<!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd" >
<suite name="TestNGs">
 <test name="Test Annotations">
  <classes>
   <class name="testng.TestAnnotations">
   </class>
  </classes>
 </test>
</suite>

The first line is the XML document type definition. This is standard and compulsory for all test suites. The other lines are pretty self-explanatory. I have used the open tags for suite, test, classes and class. The classes tag can have one or more class tags inside it, so it can be used if we want to generate a report where we are testing multiple classes. This is handy especially for developers who want to test a long piece of code.

Anyway, getting back to our report: you can name each suite, test or class after opening those tags, and remember to close every tag you open. I have given my suite name as TestNGs, test name as Test Annotations and class name as testng.TestAnnotations. Do note that the class name is in the format ‘packagename.classname’.

When you run this file as TestNG suite, the execution will start and you will get the detailed test reports. You will get the test output in your console tab and the result of the test suite in the next tab. The report that I have generated for executing my code is in the below screenshot. You will notice that this time, there is a suite name, test name, class name along with the time taken for executing each of them.

test report - selenium webdriver

In case you want to view the HTML report (Index report or Emailable-report), you can go to the test-output folder inside the project directory in your workspace. By clicking on them, you can view the reports even at a later point of time. Below are their screenshots.

Index Report:-

Emailable Report:-

So that brings us to the end of this Selenium WebDriver tutorial blog. It is time for you to setup eclipse at your end, install the various Selenium packages, install TestNG and get started with writing your test cases.

You can check out the below Selenium WebDriver tutorial video to witness a demonstration of the various concepts explained in this blog.

Selenium Training | TestNG Framework For Selenium | Edureka

 

This Edureka Selenium Training video will take you through the in-depth details of Selenium WebDriver. This Selenium tutorial video is ideal for both beginners and professionals who want to brush up the basics of WebDriver commands and learn how TestNG can be used with Selenium for managing various test cases.

Learn Selenium From Experts

If you wish to learn Selenium and build a career in the testing domain, then check out our interactive, live-online Selenium 3.0 Certification Training here, that comes with 24*7 support to guide you throughout your learning period.

Got a question for us? Please mention it in the comments section and we will get back to you.

The post Selenium WebDriver: TestNG For Test Case Management & Report Generation appeared first on Edureka Blog.

Selenium IDE: Record And Playback Test Cases With Selenium IDE

$
0
0

In my previous blogs in this Selenium tutorial series, I have discussed the need for automation testing and how Selenium WebDriver is the best tool available in the market. However, in this blog, I am going to talk about another tool in the Selenium suite: Selenium IDE. If you have missed out on reading any of the previous blogs, I urge you to start reading from here: What is Selenium.

IDE in Selenium IDE stands for Integrated Development Environment. While WebDriver and RC allow us to write test cases using programming logic, Selenium IDE works a little differently. You can simply use the interactions you have with your browser to create test cases. Sounds simple, right? Yes!

Unlike WebDriver and RC, you don’t need to use programming logic. You can simply record the actions you perform on your browser, and use the playback option to re-run tests (i.e your actions). That is how simple it is. But, there is a catch here. Selenium IDE is a Mozilla Firefox plugin, which means tests can be recorded only on Firefox browser and not on any other browser.

The test cases which are recorded in Selenium IDE can be exported to other programming languages: either C#, Java, Ruby or Python. But don’t mistake this simple tool for a full-fledged testing tool, because it is not meant for complex test cases or test suites. It is just a prototyping tool and works well only with static web pages; testing dynamic (real-time) web pages brings a lot of problems with it.

Before I go any further, let me show you what the IDE interface looks like.

I have highlighted certain functionalities in the above screenshot. They are:

  1. Menu bar
  2. Address bar
  3. Tool bar
  4. Test case pane
  5. Test script editor box
  6. Log, Reference, UI Element and Rollup pane

These functionalities provide various options while working with test cases. Let us deep dive into each of the functionalities.

Learn Selenium From Experts

Selenium IDE – Menu Bar

The menu bar consists of the following tabs: File, Edit, Actions, Favorites, Options and Help.

File: This tab gives us options to open, save, export and create new test cases and test suites. Tests are saved in HTML format by default and they can be exported to either .cs(C#), .java(java), .py(python) or .rb(ruby) formats. Once exported, they can run on Selenium RC and Selenium WebDriver. 

Edit: Under this tab, we have options to copy, paste, delete, undo and select all operations for editing test cases and test suites. Along with these, we also have options to insert new commands and new comments. These two options come in handy when we want to manually add a step in the test case.

Actions: Under the actions tab, you can find options to either execute single test cases or a group of test cases in the form of test suites. In fact, it also has an option to execute single commands inside a test case and set breakpoints inside a test case.

Options: This tab provides us further options to change IDE settings. There are two important categories under this tab, they are Options and Clipboard format.


Advanced IDE settings can be found under Options => Options. Though there are many settings here, we will concentrate only on a few important ones.
options tab - selenium ide

Under the General tab, we can set the timeout value for locating elements on the page. If this time is exceeded before an element is located, an error is thrown and the test case fails. The default value is 30000 ms.

Selenium IDE extensions can be used to extend the capabilities of the IDE.

If the Remember Base URL checkbox is checked, the IDE will launch with the specified URL; otherwise, the IDE will launch with a blank URL.

If Record assertTitle automatically is checked, then every time a new page is fetched, an assertTitle command with the page’s title is added to the test case automatically.

Start recording automatically on open makes IDE record browser interactions automatically upon startup.

clipboard - selenium ide

Under the Locator Builders tab, you can set the order of preference for element attributes. The attribute with the greatest preference will be used to locate elements by default. If that attribute is not specified for an element, then the attribute with the next preference will be used.

Clipboard format will be found under Options => Clipboard format. Under Clipboard format options, you can choose the programming language into which you want to copy your Selenese command to. A single Selenese command which is a part of a test case can be copied to your preferred IDE.

By default, commands are copied in HTML format. The other available formats are visible in the screenshot above.
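
As an illustration, suppose the recorded row is a click command with the target link=Selenium Course. With the clipboard format set to one of the Java options, copying that single row would yield roughly a one-line fragment like the following (the exact output depends on the format you pick):

selenium.click("link=Selenium Course");                       // Remote Control style format

driver.findElement(By.linkText("Selenium Course")).click();   // WebDriver-backed format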

Selenium IDE – Address Bar

The address bar holds the Base URL, which is the URL that will be fetched when Selenium IDE is launched. Besides that, the address bar also has a drop-down list where previously visited websites are listed for easy access.
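
For example, if the Base URL is set to https://www.edureka.co, an open command with a relative target is resolved against it (the path below is hypothetical):

command: open
target: /all-courses
value: <blank>

When played back, this would navigate the browser to https://www.edureka.co/all-courses.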

Selenium IDE – Tool Bar

The tool bar in Selenium IDE has the following options:

  • You can use the speed control option to control the speed at which the test cases are executed
  • The Run button is used to run the currently selected test case
  • The Run All button is used to execute all the test cases in the test suite
  • The Pause/Resume button allows pausing and resuming of a test case
  • Step is used to “step” through a test case by running its commands one at a time
  • Rollup is used to execute a reusable group of Selenium commands in one step
  • The Record button records the user’s browser actions, thus generating a test case

Selenium IDE – Test Case Pane

The test case pane will contain the list of test cases that you have recorded. You can open more than one test case at a time, and when you open a test suite, all the test cases contained in that suite will be listed in the test case pane. Each test case will contain multiple commands. Below the test case pane, you can see the pass/fail status of the various test cases.

Selenium IDE – Test Script Edit Box

Every user interaction recorded in a test case will be stored as commands in the editor box. The editor box is divided into 3 columns: Command, Target and Value.

  • Command is the actual operation/action that is performed on the browser elements. For example, if you are opening a new URL, the command will be ‘open’; if you are clicking on a link or button on the web page, the command will be ‘click’. A drop-down list lets you pick the command you need.
  • Target is the web element on which the operation has to be performed along with a locator attribute. If you are clicking on a button called ‘Selenium Course’, then the target will be ‘link=Selenium Course’.
  • Value is an optional field and it is used when we need to send some parameters. If you are entering the email address or password in a textbox, then the value will contain the actual credentials.

The HTML equivalent of these commands can be viewed in the Source tab. Just as in the Source view, your script can also be edited in the Table view.
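
For example, the three columns for typing an email address into a textbox and then clicking a course link might be filled in as follows (the locators and values are illustrative):

command: type
target: id=email
value: abc@example.com

command: click
target: link=Selenium Course
value: <blank>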

Selenium IDE – Log/ Reference/ UI-Element/ Rollup Pane

The Log pane displays the runtime messages during execution. It provides real-time updates of the action which Selenium IDE is performing. Log messages can be categorized into four types: info, error, debug and warn. The log messages here will be displayed along with the category it belongs to.

The Reference pane shows a concise description of the currently selected Selenese command in the Editor. It also shows the description of the locator and value to be used on that command.

The UI-Element pane uses JavaScript Object Notation (JSON) to define element mappings. The documentation and resources can be found under the “UI Element Documentation” option in the Help menu of Selenium IDE.

Rollup allows you to execute a group of commands in one step. Rollups are reusable; meaning, they can be used multiple times within the test case. Since rollups are groups of commands combined into one, they contribute a lot in shortening your test script.
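
Inside a test case, a rollup rule is invoked with the rollup command, passing the rule’s name as the target (the rule name below is hypothetical and must already be defined in a user-extension script):

command: rollup
target: do_login
value: <blank>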

Selenium IDE – Running Your First Test Case

Now that you have a fair idea of the various components in IDE and their functionalities, let’s get started with our first test.

Once you have added the IDE plugin to Firefox, you can launch the IDE by clicking on the IDE button on the Firefox toolbar. You can then click on the record button and start building your test case by performing actions on the browser. The actions you perform will be stored as commands in the editor box. You can extract the entire test case, or even a single command, from the editor box into the programming language of your preference. To understand these concepts, see the screenshot below.

[Screenshot: Selenium IDE interface with the recorded test case]

The above screenshot is that of the Selenium IDE plugin. The commands and logs you see in the image are a result of recording the following interactions on the browser (a rough sketch of how these steps might appear as rows in the editor follows the list):

  1. Navigated to www.edureka.co
  2. Clicked on the button: Browse Courses
  3. Scrolled down the page till Selenium 3.0 Certification Training appears
  4. Clicked on the link to the course: Selenium 3.0 Certification Training
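
Recorded in the editor box, these interactions would appear roughly as the following rows (the locators are assumptions; the scroll in step 3, as explained below, is not recorded at all):

command: open
target: /
value: <blank>

command: clickAndWait
target: link=Browse Courses
value: <blank>

command: clickAndWait
target: link=Selenium 3.0 Certification Training
value: <blank>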

I have named my test case first_IDE_Test. If you look at the log pane, the final log displayed says that the test case passed. Below the test case pane, you can see the results of the test case: I have run only one test and it has passed.

An important point to note is that the recorded browser actions will not always execute successfully when we run them later. That is because of the dynamics of the page. This is the same issue I mentioned earlier: only static web pages can be tested reliably.

Scrolling down a page is one example of an action that will not be recorded in the IDE editor box. Hence, when we run the recorded test, the IDE will not scroll down the page to locate the web element, and the test case fails. Another example is page load timeout: it is a challenge to synchronize the Selenium IDE speed with the speed of the web page. This is why the IDE is not the preferred tool for testing.

If you try running the same test on Edureka’s home page, which I showed earlier in the blog, the test will most likely fail. The workaround is to manually add commands in the test script editor to synchronize the IDE speed and page speed. I have manually added one command and modified another command in the editor.

One of them is to scroll down the page. The command is:

command: storeEval
target: selenium.browserbot.getCurrentWindow().scrollTo(0,20000)
value: <blank>

The other command I modified is for clicking on the link.

command: clickAndWait
target: link=Selenium 3.0 Certification Training
value: <blank>
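
For comparison, if you later export this test to Java (WebDriver-backed), the same two steps could be written along the following lines: JavascriptExecutor handles the scroll and an explicit wait replaces clickAndWait. This is a sketch under those assumptions, not the IDE’s exact export output.

import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class ScrollAndClickSketch {
    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver();
        driver.get("https://www.edureka.co/");

        // Equivalent of the storeEval scroll command: run JavaScript to scroll down the page
        ((JavascriptExecutor) driver).executeScript("window.scrollTo(0, 20000);");

        // Equivalent of clickAndWait: wait until the course link is clickable, then click it
        new WebDriverWait(driver, 10)
                .until(ExpectedConditions.elementToBeClickable(
                        By.linkText("Selenium 3.0 Certification Training")))
                .click();

        driver.quit();
    }
}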

So, that brings us to the end of this blog. If you are looking for a video tutorial on Selenium IDE, you can watch the video below, which is delivered by an industry expert.

Selenium IDE Tutorial For Beginners | Selenium Tutorial | Edureka

 

This Edureka Selenium IDE tutorial will give you an introduction to Selenium IDE and talk about its features. By the end of this video you will know how to record and playback test cases using Selenium IDE.

I urge you to install Selenium IDE and play around with the tool and if you encounter any problem while working with the tool, put those queries in the comment section below and we will get back to you at the earliest.


If you wish to learn Selenium and build a career in the testing domain, then check out our interactive, live-online Selenium 3.0 Certification Training here, that comes with 24*7 support to guide you throughout your learning period.

The post Selenium IDE: Record And Playback Test Cases With Selenium IDE appeared first on Edureka Blog.
