
How to Handle Multiple Windows in Selenium?


In this rapidly developing world of technology, one must always find ways to keep up with the times. As the world moves toward software development, testing plays a vital role in keeping the process defect-free. Selenium is one such tool that helps in finding and resolving bugs. This article on how to handle multiple windows in Selenium gives you an idea of how you can work with multiple windows while testing an application.

Before we learn how to handle multiple windows in Selenium simultaneously, we’ll first see what Selenium WebDriver is and how it functions.

Handling Multiple Windows in Selenium: What is Selenium Webdriver?

I bet most of the software testers out there have worked with this amazing tool, which makes their lives easier. For those who don’t know what Selenium is, I’ll help you take your first baby steps in exploring this tool.


Selenium is an automation tool whose sole purpose is to test web applications developed by any organization. Now, what’s so fascinating about this tool that makes it so popular?

  • Selenium is an open-source, portable framework that helps automate the testing of web applications.
  • Test scripts in Selenium can be written in the programming language of your choice, like Java, Python, C# and many more.
  • Selenium has its own set of commands, called Selenese, which is used to write test scripts.
  • It can run across various web browsers like Chrome, Safari, Firefox, IE and many more.
  • Selenium WebDriver interacts directly with the browser rather than interacting with a server.
  • It is fast and efficient.
  • Selenium is highly flexible when it comes to functional testing and regression testing.
  • Another interesting factor: Selenium makes use of element locators, which help in finding elements on a web page (see the short sketch below). They are: class, name, XPath, ID, linkText, DOM, partial linkText and CSS selectors.
  • It also integrates with tools like Maven and JUnit to test applications faster and more effectively.
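For instance, here is a brief sketch of what a few of these locators look like in Java. This assumes driver is an already-initialized WebDriver and org.openqa.selenium.By is imported; the locator values themselves are hypothetical:

driver.findElement(By.id("username"));                      // locate by ID
driver.findElement(By.name("q"));                           // locate by name
driver.findElement(By.linkText("Sign in"));                 // locate by link text
driver.findElement(By.xpath("//button[@type='submit']"));   // locate by XPath
driver.findElement(By.cssSelector(".nav-link"));            // locate by CSS selector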

Now let’s see how we can test an application using Selenium.

Handling Multiple Windows in Selenium: How to Run a Test Case in Selenium?

To test any application in Selenium, we follow certain procedures that eventually help in performing the desired tasks.

The basic pre-requisites to run a test-case are:

Pre-requisites to Run Selenium Test-Case

  • The first thing we require is to choose the right programming language for writing the test scripts. As Java is one of the simplest languages to understand, we are going to add the JRE libraries to the project.
  • We require an IDE where we can run the test scripts. There are NetBeans, Eclipse and many other IDEs, but we prefer working on the Eclipse IDE because it is highly effective for executing Java projects.
  • Some Selenium plugins like the Selenium standalone server, Selenium jar files, and Selenium IDE.
  • Browser drivers for different browsers like Chrome, IE, Firefox, etc.

Check out this article on how to set up Selenium for end-to-end guidance on the installation process.

So, in order to test an application, we need to know the basics of the programming language, which is Java in our case. The basic steps (see the sketch after this list) are:

  • Initialize the WebDriver and create an object of the same.
  • Instantiate the browser driver to the new ChromeDriver (in this case we are working with ChromeDriver; you can choose Firefox or IE) and specify the path where the ChromeDriver executable exists.
  • Get the URL of the particular web page you want to test.
  • Find the element using one of the element locators in Selenium.
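Here is a minimal sketch of these four steps in Java. The ChromeDriver path and the element ID below are assumptions for illustration, not part of this article’s demos:

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class FirstTestCase {
    public static void main(String[] args) {
        // Steps 1 & 2: point Selenium at the ChromeDriver executable and instantiate the driver
        System.setProperty("webdriver.chrome.driver", "D:\\chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        // Step 3: open the web page to be tested
        driver.get("https://www.edureka.co/");
        // Step 4: find an element using one of the element locators (this ID is hypothetical)
        driver.findElement(By.id("search-inp")).sendKeys("Selenium");
        driver.quit();
    }
}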

Take a look at this video to understand each and every nuance of testing an application.

How to Write & Run a Test Case in Selenium | Edureka

So, once we know how to run a test case in Selenium, we’ll try altering the process by working across multiple windows simultaneously. To do that, we’ll look at a few functions that help us do it.

Handling Multiple Windows in Selenium: How to Handle Multiple Windows?

Now, what is a window handle function? How do you handle multiple windows in Selenium while testing an application? Well, this section will answer all your questions.

What is a window handle? 

A window handle is a unique identifier that holds the address of a window. It is essentially a pointer to a window, returned as a string value. The window handle functions help in getting the handles of all open windows, and each browser window is guaranteed to have a unique handle.

Syntax

  • driver.getWindowHandle(): helps in getting the window handle of the current window as a string
  • driver.getWindowHandles(): helps in getting the handles of all the open windows
  • Set: the window handles are stored in a set of strings: Set<String> handles = driver.getWindowHandles();
  • driver.switchTo().window(handle): helps in switching between the windows
  • Actions class: helps to perform certain actions, such as keyboard and mouse events, on the windows

These are a few new functions we’ll be seeing in this demo. Apart from these, the rest of the functions help in automating the process. 
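Put together, a minimal sketch of how these calls combine looks like this (it assumes driver is an initialized WebDriver, java.util.Set is imported, and a child window has already been opened):

String parent = driver.getWindowHandle();           // handle of the current (parent) window
Set<String> handles = driver.getWindowHandles();    // handles of all open windows
for (String handle : handles) {
    if (!handle.equals(parent)) {
        driver.switchTo().window(handle);           // move the driver's focus to a child window
    }
}
driver.switchTo().window(parent);                   // switch back to the parent window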

How to Handle Multiple Windows in Selenium Webdriver | Edureka

Handling Multiple Windows in Selenium: Demo

In this demo section, we’ll automate a few websites and check how to handle multiple windows.

We’ll automate pages such as,

  • ToolsQA website for testing the window handle function
  • We’ll also automate our official website edureka.co and perform some actions
  • Naukri.com which is one of the most popular online job portals

Now, we’ll first start by testing the webpage ToolsQA

  • We will first specify the browser driver on which we are going to work and specify the path in which it is located using this command: System.setProperty("webdriver.chrome.driver", "D:\\chromedriver.exe");
  • Instantiate the webdriver to the new chromedriver.
  • Get the URL of the web page that we want to test.
  • Inspect the element and find it on the webpage using the element locators, in this case, it is the ID.
  • After this, we need to open multiple child windows. So, I’m going to do this using the for loop. 

package selenium;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class demo1 {
    public static void main(String[] args) throws InterruptedException {
        System.setProperty("webdriver.chrome.driver", "D:\\chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        driver.get("http://toolsqa.com/automation-practice-switch-windows/");
        WebElement clickElement = driver.findElement(By.id("button1"));

        // Open three child windows by clicking the button repeatedly
        for (int i = 0; i < 3; i++) {
            clickElement.click();
            Thread.sleep(3000);
        }
    }
}

Next step, we’ll add on a few more commands and notice the change in the execution.

  • This is almost similar to the first project except that we need to print the window handle of the parent and the child windows.
  • Get the handle of the parent window using the command: String parentWindowHandle = driver.getWindowHandle();
  • Print the window handle of the parent window.
  • Find the element on the web page using an ID which is an element locator.
  • Open multiple child windows.
  • Iterate through child windows.
  • Pause the execution for a few seconds using the sleep command Thread.sleep(3000), where the time is specified in milliseconds.
  • Get the handles of all the windows that are currently open using the command: Set<String> allWindowHandles = driver.getWindowHandles(); which returns the set of handles. 
  • Print the handles of all the windows.

package selenium;

import java.util.Set;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class demo2 {
    public static void main(String[] args) throws InterruptedException {
        System.setProperty("webdriver.chrome.driver", "D:\\chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        driver.get("http://toolsqa.com/automation-practice-switch-windows/");
        String parentWindowHandle = driver.getWindowHandle();
        System.out.println("Parent window's handle -> " + parentWindowHandle);
        WebElement clickElement = driver.findElement(By.id("button1"));

        // Open three child windows
        for (int i = 0; i < 3; i++) {
            clickElement.click();
            Thread.sleep(3000);
        }

        // Collect and print the handles of all open windows
        Set<String> allWindowHandles = driver.getWindowHandles();
        for (String handle : allWindowHandles) {
            System.out.println("Window handle - > " + handle);
        }
    }
}

Now, we’ll customize the web page by adding a few more commands to the above program.

  • In this program, we’ll test the same web page ToolsQA and pass another URL to the parent window.
  • Instantiate the browser driver to the new Chromedriver.
  • Get the window handle of the parent window and print it.
  • Find the element on the page using an ID which is an element locator.
  • Use for loop to iterate the number of child windows being created.
  • Get the handles of all the windows opened.
  • Print the window handle of the first window.
  • Use the switchTo() command to switch to each window in turn, and pass the URL of the web page “google.com” to the current window.

package selenium;

import java.util.Set;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class demo4 {
    public static void main(String[] args) throws InterruptedException {
        System.setProperty("webdriver.chrome.driver", "D:\\chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        driver.get("http://toolsqa.com/automation-practice-switch-windows/");
        String parentWindowHandle = driver.getWindowHandle();
        System.out.println("Parent window's handle -> " + parentWindowHandle);
        WebElement clickElement = driver.findElement(By.id("button1"));

        // Open three child windows
        for (int i = 0; i < 3; i++) {
            clickElement.click();
            Thread.sleep(3000);
        }

        // Switch to every open window and navigate it to google.com
        Set<String> allWindowHandles = driver.getWindowHandles();
        for (String handle : allWindowHandles) {
            System.out.println("Switching to window - > " + handle);
            System.out.println("Navigating to google.com");
            driver.switchTo().window(handle); // switch to the desired window first, then issue commands via driver
            driver.get("http://google.com");
        }
    }
}

After this, we’ll see how we can alter the child window without a change in the parent window.

  • The process is almost similar to the previous program, except that after passing the URL google.com to each window, we will switch to the parent window and close it.
  • After this, we keep track of the handle of the last window, switch to it, and pass the URL of the ToolsQA page, so that this URL opens only in the last window while the other two child windows still show the google.com page.

package selenium;

import java.util.Set;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class demo5 {
    public static void main(String[] args) throws InterruptedException {
        System.setProperty("webdriver.chrome.driver", "D:\\chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        driver.get("http://toolsqa.com/automation-practice-switch-windows/");
        String parentWindowHandle = driver.getWindowHandle();
        System.out.println("Parent window's handle -> " + parentWindowHandle);
        WebElement clickElement = driver.findElement(By.id("button1"));

        // Open three child windows
        for (int i = 0; i < 3; i++) {
            clickElement.click();
            Thread.sleep(3000);
        }

        // Navigate every window to google.com, remembering the last handle seen
        Set<String> allWindowHandles = driver.getWindowHandles();
        String lastWindowHandle = "";
        for (String handle : allWindowHandles) {
            System.out.println("Switching to window - > " + handle);
            System.out.println("Navigating to google.com");
            driver.switchTo().window(handle); // switch to the desired window first, then issue commands via driver
            driver.get("http://google.com");
            lastWindowHandle = handle;
        }

        // Switch to the parent window
        driver.switchTo().window(parentWindowHandle);
        // Close the parent window
        driver.close();
        // At this point there is no focused window; we must explicitly switch back to some window
        driver.switchTo().window(lastWindowHandle);
        driver.get("http://toolsqa.com");
    }
}

Now, we’ll test one of the top job portals, Naukri.com.

  • Set the system property to ChromeDriver and specify its path.
  • Instantiate the WebDriver to the new ChromeDriver.
  • Get the URL of the web page and maximize the window.
  • Get the window handle of the parent window.
  • Get the window handles of all the windows.
  • Next, we declare an Iterator object to use for switching to each child window and performing actions on it.
  • We check whether the main window is equal to the child window with if(!mainWindow.equals(childWindow)). If they are not equal, the condition holds and we switch to the child window.

package selenium;

import java.util.Iterator;
import java.util.Set;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.annotations.Test;

public class MultipleWindowsClass {
    @Test
    public void testMultipleWindows() throws InterruptedException {
        System.setProperty("webdriver.chrome.driver", "D:\\chromedriver.exe");
        // Open the browser
        WebDriver driver = new ChromeDriver();
        // Maximize the browser window
        driver.manage().window().maximize();
        // Open the Naukri website, which opens multiple windows
        driver.get("http://www.naukri.com/");
        // Returns the parent window handle as a String
        String mainWindow = driver.getWindowHandle();
        // Returns the handles of all windows opened by the WebDriver as a Set of Strings
        Set<String> set = driver.getWindowHandles();
        // Use an Iterator to walk through the window handles
        Iterator<String> itr = set.iterator();
        while (itr.hasNext()) {
            String childWindow = itr.next();
            // If the handle is not the main window's, switch to it, print its title and close it
            if (!mainWindow.equals(childWindow)) {
                driver.switchTo().window(childWindow);
                System.out.println(driver.switchTo().window(childWindow).getTitle());
                driver.close();
            }
        }
        // Switch back to the main window
        driver.switchTo().window(mainWindow);
    }
}

Next, we’ll perform some actions on our official website edureka.co

  • We’ll initialize the WebDriver to the new ChromeDriver.
  • Get the URL of the web page.
  • We’ll use JavascriptExecutor, an interface that provides a mechanism to execute JavaScript through the Selenium WebDriver.
  • Get the window handle of the parent window.
  • Find the element using XPath, which is an element locator, and send keys to a particular location on the web page.
  • Scroll down the page using the JavascriptExecutor command: js.executeScript("window.scrollBy(X-axis, Y-axis)");
  • Get the handles of all the windows and print them.
  • Next, we declare an Iterator object to use for switching to the child window and performing actions on it.
  • We check the condition if(!mainweb.equals(child)); if it holds, we switch to the child window and print its title.

package selenium;

import java.util.Iterator;
import java.util.Set;

import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.Keys;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class demo3 {
    public static void main(String[] args) throws InterruptedException {
        System.setProperty("webdriver.chrome.driver", "S:\\chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        JavascriptExecutor js = (JavascriptExecutor) driver;
        driver.get("https://www.edureka.co/community");
        String mainweb = driver.getWindowHandle();
        // SHIFT + ENTER opens the link in a new window
        driver.findElement(By.xpath("//a[@class='qa-logo-link edureka']")).sendKeys(Keys.SHIFT, Keys.ENTER);
        Thread.sleep(100);
        js.executeScript("window.scrollBy(0,400)");
        Set<String> set = driver.getWindowHandles();
        System.out.println(set);
        Iterator<String> itr = set.iterator();
        while (itr.hasNext()) {
            js.executeScript("window.scrollBy(0,400)");
            String child = itr.next();
            if (!mainweb.equals(child)) {
                driver.switchTo().window(child);
                System.out.println(driver.switchTo().window(child).getTitle());
                // driver.close();
            }
        }
        driver.switchTo().window(mainweb);
    }
}

Next, we’ll automate the same webpage by customizing it.

  • The process is almost similar to the previous one, but in this one we print the title of the current page.
  • Use the JavascriptExecutor to scroll down the page.
  • Find the element using the XPath element locator and send keys (a string) to that element.
  • Declare a WebElement link to click on a particular link on the page; in this case, we want the link to open in a new window.
  • Pause the execution for a few seconds after this.
  • Get the window handles of all the windows and print them in sequence.
  • Switch to each window and check whether the title matches. If it does, scroll down the page using the JavascriptExecutor, find another element on the web page using an element locator, and set the position of the new window.
  • Switch back to the parent window and scroll down the page.
package selenium;

import java.util.Set;

import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.Keys;
import org.openqa.selenium.Point;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.interactions.Actions;

public class selenium {
    public static void main(String[] args) throws Exception {
        System.setProperty("webdriver.chrome.driver", "D:\\chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        driver.get("https://www.edureka.co/");
        String title = driver.getTitle();
        System.out.println(title);
        JavascriptExecutor js = (JavascriptExecutor) driver;
        // Type a search query and click the search button
        driver.findElement(By.cssSelector("#search-inp")).sendKeys("Selenium Certification Course");
        js.executeScript("window.scrollBy(0,40)");
        driver.findElement(By.xpath("//span[@class='typeahead__button']")).click();
        // SHIFT-click the link so that it opens in a new window
        WebElement link = driver.findElement(By.xpath("//li[@class='ga-allcourses']//a[@class='giTrackElementHeader'][contains(text(),'Courses')]"));
        Actions newwin = new Actions(driver);
        newwin.keyDown(Keys.SHIFT).click(link).keyUp(Keys.SHIFT).build().perform();
        Thread.sleep(3000);
        Set<String> windows = driver.getWindowHandles();
        System.out.println(windows);
        System.out.println("a1");
        for (String window : windows) {
            driver.switchTo().window(window);
            if (driver.getTitle().contains("Best Training & Certification Courses for Professionals | Edureka")) {
                System.out.println("a2");
                js.executeScript("window.scrollBy(0,1000)");
                System.out.println("b1");
                driver.findElement(By.xpath("//*[@id=\"allc_catlist\"]/li[3]/a")).click();
                driver.manage().window().setPosition(new Point(-2000, 0));
            }
        }
        Thread.sleep(3000);
        Set<String> windows1 = driver.getWindowHandles();
        System.out.println(windows1);
        System.out.println("a3");
        for (String window : windows1) {
            driver.switchTo().window(window);
            System.out.println("a4");
            js.executeScript("window.scrollBy(0,400)");
        }
    }
}

Now let’s check the output of the last program.


First, we initialize the browser, get the URL of the web page we want to test, find the search box element on the page, send keys to the search box, and click the search icon.

After this, we will open the course link in a new window using the Actions command.

Once we do this, we’ll scroll down the child window using the JavascriptExecutor.

And after this, print the title of the first window and also the window handles of the two windows.
Now with this, we come to the end of this “How to handle multiple windows in Selenium” blog. I hope you enjoyed this article and understood what Selenium WebDriver is, and how the window handle functions help in switching between windows. Now that you have understood how to work on multiple windows simultaneously, check out the Selenium Certification Course by Edureka, a trusted online learning company with a network of more than 650,000 satisfied learners spread across the globe. This course is designed to introduce you to the complete Selenium features and their importance in testing software. Got a question for us? Please mention it in the comments section of “How to handle multiple windows in Selenium” and we will get back to you.

 

 



KNN Algorithm: A Practical Implementation Of KNN Algorithm In R


KNN Algorithm In R:

With the amount of data that we’re generating, the need for advanced Machine Learning Algorithms has increased. One such algorithm is the K Nearest Neighbour algorithm. In this blog on KNN Algorithm In R, you will understand how the KNN algorithm works and its implementation using the R Language.

To get in-depth knowledge on Data Science, you can enroll for live Data Science Certification Training by Edureka with 24/7 support and lifetime access.

The following topics will be covered in this KNN Algorithm In R blog:

  1. What Is KNN Algorithm?
  2. Features Of KNN Algorithm
  3. How Does KNN Algorithm Work?
  4. KNN Algorithm Use Case
  5. KNN Algorithm Pseudocode
  6. Practical Implementation Of KNN Algorithm In R

It is essential that you know Machine Learning basics before you get started with this KNN Algorithm In R blog.

To get an in-depth understanding of the KNN Algorithm In R, you can go through this video recorded by our Machine Learning experts.

KNN Algorithm Using R | Edureka

What Is KNN Algorithm?

KNN, which stands for K Nearest Neighbor, is a Supervised Machine Learning algorithm that classifies a new data point into the target class, depending on the features of its neighboring data points.

Let’s try to understand the KNN algorithm with a simple example. Let’s say we want a machine to distinguish between images of cats and dogs. To do this, we must input a dataset of cat and dog images and train our model to detect the animals based on certain features. For example, features such as pointy ears can be used to identify cats; similarly, we can identify dogs based on their long ears.

What is KNN Algorithm? – KNN Algorithm In R – Edureka

After studying the dataset during the training phase, when a new image is given to the model, the KNN algorithm will classify it into either cats or dogs depending on the similarity in their features. So if the new image has pointy ears, it will classify that image as a cat because it is similar to the cat images. In this manner, the KNN algorithm classifies data points based on how similar they are to their neighboring data points.

Now let’s discuss the features of the KNN algorithm.

Features Of KNN Algorithm

The KNN algorithm has the following features:

  • KNN is a Supervised Learning algorithm that uses labeled input data set to predict the output of the data points.
  • It is one of the most simple Machine learning algorithms and it can be easily implemented for a varied set of problems.
  • It is mainly based on feature similarity. KNN checks how similar a data point is to its neighbor and classifies the data point into the class it is most similar to.

Features of KNN – KNN Algorithm In R – Edureka

  • Unlike most algorithms, KNN is a non-parametric model which means that it does not make any assumptions about the data set. This makes the algorithm more effective since it can handle realistic data.
  • KNN is a lazy algorithm, this means that it memorizes the training data set instead of learning a discriminative function from the training data.
  • KNN can be used for solving both classification and regression problems.

KNN Algorithm Example

To make you understand how KNN algorithm works, let’s consider the following scenario:

How does KNN Algorithm work? – KNN Algorithm In R – Edureka

  • In the above image, we have two classes of data, namely class A (squares) and Class B (triangles)
  • The problem statement is to assign the new input data point to one of the two classes by using the KNN algorithm
  • The first step in the KNN algorithm is to define the value of ‘K’. But what does the ‘K’ in the KNN algorithm stand for?
  • ‘K’ stands for the number of Nearest Neighbors and hence the name K Nearest Neighbors (KNN).

How does KNN Algorithm work? – KNN Algorithm In R – Edureka

  • In the above image, I’ve defined the value of ‘K’ as 3. This means that the algorithm will consider the three neighbors that are the closest to the new data point in order to decide the class of this new data point.
  • The closeness between the data points is calculated by using measures such as Euclidean and Manhattan distance, which I’ll be explaining below.
  • At ‘K’ = 3, the neighbors include two squares and one triangle. So, if I were to classify the new data point based on ‘K’ = 3, it would be assigned to Class A (squares).

How does KNN Algorithm work? – KNN Algorithm In R – Edureka

  • But what if the ‘K’ value is set to 7? Here, I’m basically telling my algorithm to look for the seven nearest neighbors and classify the new data point into the class it is most similar to.
  • At ‘K’ = 7, the neighbors include three squares and four triangles. So, if I were to classify the new data point based on ‘K’ = 7, then it would be assigned to Class B (triangles) since the majority of its neighbors were of class B.

How does KNN Algorithm work? – KNN Algorithm In R – Edureka

In practice, there’s a lot more to consider while implementing the KNN algorithm. This will be discussed in the demo section of the blog.

Earlier I mentioned that KNN uses Euclidean distance as a measure to check the distance between a new data point and its neighbors; let’s see how.

Euclidean Distance – KNN Algorithm In R – Edureka

  • Consider the above image; here we’re going to measure the distance between P1 and P2 using the Euclidean distance measure.
  • The coordinates for P1 and P2 are (1,4) and (5,1) respectively.
  • The Euclidean distance can be calculated like so:

dist(P1, P2) = √((x2 − x1)² + (y2 − y1)²) = √((5 − 1)² + (1 − 4)²) = √(16 + 9) = √25 = 5

It is as simple as that! KNN makes use of simple measures in order to solve complex problems; this is one of the reasons why KNN is such a commonly used algorithm.

To sum it up, let’s look at the pseudocode for KNN Algorithm.

KNN Algorithm Pseudocode

Consider the set, (Xi, Ci),

  • Where Xi denotes the feature variables and ‘i’ indexes the data points, i = 1, 2, ….., n
  • Ci denotes the output class for Xi, for each i

The condition Ci ∈ {1, 2, 3, ……, c} holds for all values of ‘i’, assuming that the total number of classes is denoted by ‘c’.

Now let’s pretend that there’s a data point ‘x’ whose output class needs to be predicted. This can be done by using the K-Nearest Neighbour (KNN) Algorithm.

KNN Algorithm Pseudocode:

Calculate D(x, xi), where 'i' = 1, 2, ….., n and 'D' is the Euclidean distance between the data points.
Arrange the calculated Euclidean distances in ascending order.
Initialize k and take the first k distances from the sorted list.
Figure out the k points corresponding to these k distances.
Calculate ki, the number of data points belonging to the ith class among the k points, i.e. ki ≥ 0.
If ki > kj ∀ i ≠ j, put x in class i.

The above pseudocode can be used for solving a classification problem by using the KNN Algorithm.
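As an illustration, here is a minimal sketch of this pseudocode in base R. The function name and its arguments are my own for this sketch; the practical implementation later in this blog uses the knn() function from the 'class' package instead:

knn_predict <- function(train_x, train_y, x, k) {
  # Euclidean distance from the new point x to every training point
  d <- sqrt(rowSums(sweep(as.matrix(train_x), 2, x)^2))
  # Take the classes of the k nearest points and vote by majority
  nearest <- train_y[order(d)[1:k]]
  names(which.max(table(nearest)))
}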

Before we get into the practical implementation of KNN, let’s look at a real-world use case of the KNN algorithm.

KNN Algorithm Use-Case

Surely you have shopped on Amazon! Have you ever noticed that when you buy a product, Amazon gives you a list of recommendations based on your purchase? Not only this, Amazon displays a section which says, ‘customers who bought this item also bought this.. ‘.

Machine learning plays a huge role in Amazon’s recommendation system. The logic behind a recommendation engine is to suggest products to customers based on other customers who have a similar shopping behavior.

KNN Use Case – KNN Algorithm In R – Edureka

Consider an example: let’s say a customer A who loves mystery novels bought the Game Of Thrones and Lord Of The Rings book series. Now, a couple of weeks later, another customer B who reads books of the same genre buys Lord Of The Rings. He does not buy the Game of Thrones book series, but Amazon recommends it to customer B since his shopping behavior and his choice in books are quite similar to customer A’s.

Therefore, Amazon recommends products to customers based on how similar their shopping behaviors are. This similarity can be understood by implementing the KNN algorithm which is mainly based on feature similarity.

Now that you know how KNN works and how it is used in real-world applications, let’s discuss the implementation of KNN using the R language. If you’re not familiar with the R language, you can go through this video recorded by our Machine Learning experts.

R Tutorial For Beginners | Edureka

Practical Implementation Of KNN Algorithm In R

Problem Statement: To study a bank credit dataset and build a Machine Learning model that predicts whether an applicant’s loan can be approved or not based on his socio-economic profile. 

Dataset Description: The bank credit dataset contains information about 1000s of applicants. This includes their account balance, credit amount, age, occupation, loan records, etc. By using this data, we can predict whether or not to approve the loan of an applicant.

Dataset – KNN Algorithm In R – Edureka

Logic: This problem statement can be solved using the KNN algorithm that will classify the applicant’s loan request into two classes:

  1. Approved
  2. Disapproved

Now that you know the objective of this project, let’s get started with the coding part.

Step 1: Import the dataset

#Import the dataset
loan <- read.csv("C:/Users/zulaikha/Desktop/DATASETS/knn dataset/credit_data.csv")

After importing the dataset, let’s take a look at the structure of the dataset:

str(loan)
'data.frame': 1000 obs. of 21 variables:
$ Creditability : int 1 1 1 1 1 1 1 1 1 1 ...
$ Account.Balance : int 1 1 2 1 1 1 1 1 4 2 ...
$ Duration.of.Credit..month. : int 18 9 12 12 12 10 8 6 18 24 ...
$ Payment.Status.of.Previous.Credit: int 4 4 2 4 4 4 4 4 4 2 ...
$ Purpose : int 2 0 9 0 0 0 0 0 3 3 ...
$ Credit.Amount : int 1049 2799 841 2122 2171 2241 3398 1361 1098 3758 ...
$ Value.Savings.Stocks : int 1 1 2 1 1 1 1 1 1 3 ...
$ Length.of.current.employment : int 2 3 4 3 3 2 4 2 1 1 ...
$ Instalment.per.cent : int 4 2 2 3 4 1 1 2 4 1 ...
$ Sex...Marital.Status : int 2 3 2 3 3 3 3 3 2 2 ...
$ Guarantors : int 1 1 1 1 1 1 1 1 1 1 ...
$ Duration.in.Current.address : int 4 2 4 2 4 3 4 4 4 4 ...
$ Most.valuable.available.asset : int 2 1 1 1 2 1 1 1 3 4 ...
$ Age..years. : int 21 36 23 39 38 48 39 40 65 23 ...
$ Concurrent.Credits : int 3 3 3 3 1 3 3 3 3 3 ...
$ Type.of.apartment : int 1 1 1 1 2 1 2 2 2 1 ...
$ No.of.Credits.at.this.Bank : int 1 2 1 2 2 2 2 1 2 1 ...
$ Occupation : int 3 3 2 2 2 2 2 2 1 1 ...
$ No.of.dependents : int 1 2 1 2 1 2 1 2 1 1 ...
$ Telephone : int 1 1 1 1 1 1 1 1 1 1 ...
$ Foreign.Worker : int 1 1 1 2 2 2 2 2 1 1 ...

Note that the ‘Creditability’ variable is our output or target variable. Its value represents whether an applicant’s loan is approved or rejected.

Step 2: Data Cleaning

From the structure of the dataset, we can see that there are 21 predictor variables that will help us decide whether or not an applicant’s loan must be approved.

Some of these variables are not essential in predicting the loan of an applicant, for example, variables such as Telephone, Concurrent.Credits, Duration.in.Current.address, Type.of.apartment, etc. Such variables must be removed because they will only increase the complexity of the Machine Learning model.

loan.subset <- loan[c('Creditability','Age..years.','Sex...Marital.Status','Occupation','Account.Balance','Credit.Amount','Length.of.current.employment','Purpose')]

In the above code snippet, I’ve filtered down the predictor variables. Now let’s take a look at how our dataset looks:

str(loan.subset)
'data.frame': 1000 obs. of 8 variables:
$ Creditability : int 1 1 1 1 1 1 1 1 1 1 ...
$ Age..years. : int 21 36 23 39 38 48 39 40 65 23 ...
$ Sex...Marital.Status : int 2 3 2 3 3 3 3 3 2 2 ...
$ Occupation : int 3 3 2 2 2 2 2 2 1 1 ...
$ Account.Balance : int 1 1 2 1 1 1 1 1 4 2 ...
$ Credit.Amount : int 1049 2799 841 2122 2171 2241 3398 1361 1098 3758 ...
$ Length.of.current.employment: int 2 3 4 3 3 2 4 2 1 1 ...
$ Purpose : int 2 0 9 0 0 0 0 0 3 3 ...

Now we have narrowed down 21 variables to 8 predictor variables that are significant for building the model.

Step 3: Data Normalization

You must always normalize the data set so that the output remains unbiased. To explain this, let’s take a look at the first few observations in our data set.

head(loan.subset)
  Creditability Age..years. Sex...Marital.Status Occupation Account.Balance Credit.Amount
1             1          21                    2          3               1          1049
2             1          36                    3          3               1          2799
3             1          23                    2          2               2           841
4             1          39                    3          2               1          2122
5             1          38                    3          2               1          2171
6             1          48                    3          2               1          2241
  Length.of.current.employment Purpose
1                            2       2
2                            3       0
3                            4       9
4                            3       0
5                            3       0
6                            2       0

Notice the Credit.Amount variable: its values are in the 1000s, whereas the rest of the variables are in single or double digits. If the data isn’t normalized, it will lead to a biased outcome.

#Normalization
normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}

In the below code snippet, we’re storing the normalized data set in the ‘loan.subset.n’ variable, and we’re also removing the ‘Creditability’ variable since it’s the response variable that needs to be predicted.

loan.subset.n <- as.data.frame(lapply(loan.subset[,2:8], normalize))

This is the normalized data set:

head(loan.subset.n)
   Age..years. Sex..Marital Occupation Account.Balance Credit.Amount
1  0.03571429   0.3333333   0.6666667    0.0000000      0.04396390
2  0.30357143   0.6666667   0.6666667    0.0000000      0.14025531
3  0.07142857   0.3333333   0.3333333    0.3333333      0.03251898
4  0.35714286   0.6666667   0.3333333    0.0000000      0.10300429
5  0.33928571   0.6666667   0.3333333    0.0000000      0.10570045
6  0.51785714   0.6666667   0.3333333    0.0000000      0.10955211
Length.of.current.employment Purpose
            0.25                0.2
            0.50                0.0
            0.75                0.9
            0.50                0.0
            0.50                0.0
            0.25                0.0

Step 4: Data Splicing

After cleaning the data set and formatting it, the next step is data splicing. Data splicing basically involves splitting the data set into training and testing data set. This is done in the following code snippet:

set.seed(123)
dat.d <- sample(1:nrow(loan.subset.n), size = nrow(loan.subset.n) * 0.7, replace = FALSE) #random selection of 70% data.

train.loan <- loan.subset.n[dat.d,] # 70% training data
test.loan <- loan.subset.n[-dat.d,] # remaining 30% test data

After deriving the training and testing data set, the below code snippet is going to create a separate data frame for the ‘Creditability’ variable so that our final outcome can be compared with the actual value.

#Creating a separate dataframe for the 'Creditability' feature, which is our target.
train.loan_labels <- loan.subset[dat.d,1]
test.loan_labels <- loan.subset[-dat.d,1]

Step 5: Building a Machine Learning model

At this stage, we have to build a model by using the training data set. Since we’re using the KNN algorithm to build the model, we must first install the ‘class’ package provided by R. This package has the KNN function in it:

#Install class package
install.packages('class')
# Load class package
library(class)

Next, we’re going to calculate the number of observations in the training data set. The reason we’re doing this is that we want to initialize the value of ‘K’ in the KNN model. One of the ways to find the optimal K value is to calculate the square root of the total number of observations in the data set. This square root will give you the ‘K’ value.

#Find the number of observation
NROW(train.loan_labels) 
[1] 700

So, we have 700 observations in our training data set. The square root of 700 is around 26.45, therefore we’ll create two models: one with a ‘K’ value of 26 and the other with a ‘K’ value of 27.

knn.26 <- knn(train=train.loan, test=test.loan, cl=train.loan_labels, k=26)
knn.27 <- knn(train=train.loan, test=test.loan, cl=train.loan_labels, k=27)

Step 6: Model Evaluation

After building the model, it is time to calculate the accuracy of the created models:

#Calculate the proportion of correct classification for k = 26, 27
ACC.26 <- 100 * sum(test.loan_labels == knn.26)/NROW(test.loan_labels)
ACC.27 <- 100 * sum(test.loan_labels == knn.27)/NROW(test.loan_labels)

ACC.26
[1] 67.66667

ACC.27
[1] 67.33333

As shown above, the accuracy for K = 26 is 67.66% and for K = 27 it is 67.33%. We can also check the predicted outcome against the actual value in tabular form:

# Check prediction against actual value in tabular form for k=26
table(knn.26 ,test.loan_labels)

       test.loan_labels
knn.26   0     1
0        11    7
1        90   192

knn.26
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1
[51] 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[101] 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[151] 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1
[201] 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1
[251] 0 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1
Levels: 0 1

# Check prediction against actual value in tabular form for k=27
table(knn.27 ,test.loan_labels)

       test.loan_labels
knn.27    0     1
0         11    8
1         90   191

knn.27
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1
[51] 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[101] 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[151] 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1
[201] 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1
[251] 0 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1
Levels: 0 1

You can also use the confusion matrix to calculate the accuracy. To do this we must first install the well-known caret package:

install.packages('caret')
library(caret)

Now, let’s use the confusion matrix to calculate the accuracy of the KNN model with K value set to 26:

confusionMatrix(table(knn.26 ,test.loan_labels))
Confusion Matrix and Statistics
       test.loan_labels
knn.26   0   1
     0  11   7
     1  90 192
                                          
               Accuracy : 0.6767          
                 95% CI : (0.6205, 0.7293)
    No Information Rate : 0.6633          
    P-Value [Acc > NIR] : 0.3365          
                                          
                  Kappa : 0.0924          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.10891         
            Specificity : 0.96482         
         Pos Pred Value : 0.61111         
         Neg Pred Value : 0.68085         
             Prevalence : 0.33667         
         Detection Rate : 0.03667         
   Detection Prevalence : 0.06000         
      Balanced Accuracy : 0.53687         
                                          
       'Positive' Class : 0

So, from the output, we can see that our model predicts the outcome with an accuracy of 67.67%, which is reasonable given that we worked with a small data set. A point to remember is that the more (well-chosen) data you feed the machine, the more efficient the model will be.

Step 7: Optimization

In order to improve the accuracy of the model, you can use a number of techniques such as the elbow method and the maximum-percentage-accuracy graph. In the below code snippet, I’ve created a loop that calculates the accuracy of the KNN model for ‘K’ values ranging from 1 to 28. This way you can check which ‘K’ value will result in the most accurate model:

k.optm <- 1
for (i in 1:28) {
  knn.mod <- knn(train = train.loan, test = test.loan, cl = train.loan_labels, k = i)
  k.optm[i] <- 100 * sum(test.loan_labels == knn.mod) / NROW(test.loan_labels)
  cat(i, '=', k.optm[i], '\n')
}
1 = 60.33333
2 = 58.33333
3 = 60.33333
4 = 61
5 = 62.33333
6 = 62
7 = 63.33333
8 = 63.33333
9 = 63.33333
10 = 64.66667
11 = 64.66667
12 = 65.33333
13 = 66
14 = 64
15 = 66.66667
16 = 67.66667
17 = 67.66667
18 = 67.33333
19 = 67.66667
20 = 67.66667
21 = 66.33333
22 = 67
23 = 67.66667
24 = 67
25 = 68
26 = 67.66667
27 = 67.33333
28 = 66.66667

From the output you can see that for K = 25, we achieve the maximum accuracy, i.e. 68%. We can also represent this graphically, like so:

#Accuracy plot
plot(k.optm, type="b", xlab="K- Value",ylab="Accuracy level")

Accuracy Plot – KNN Algorithm In R – Edureka

The above graph shows that for ‘K’ value of 25 we get the maximum accuracy. Now that you know how to build a KNN model, I’ll leave it up to you to build a model with ‘K’ value as 25.

Now that you know how the KNN algorithm works, I’m sure you’re curious to learn more about the various Machine learning algorithms. Here’s a list of blogs that covers the different types of Machine Learning algorithms in depth:

  1. Linear Regression
  2. Logistic Regression
  3. Support Vector Machine
  4. Decision Trees
  5. Random Forest
  6. K-Means

With this, we come to the end of this KNN Algorithm In R blog. I hope you found this blog informative, if you have any doubts, leave a comment and we’ll get back to you.

Stay tuned for more blogs on trending technologies.

If you are looking for online structured training in Data Science, edureka! has a specially curated Data Science course which helps you gain expertise in Statistics, Data Wrangling, Exploratory Data Analysis, Machine Learning Algorithms like K-Means Clustering, Decision Trees, Random Forest, Naive Bayes. You’ll learn the concepts of Time Series, Text Mining and an introduction to Deep Learning as well. New batches for this course are starting soon!!


How To Perform Logistic Regression In Python?


Logistic regression in Python is a predictive analysis technique. It is also used in Machine Learning for binary classification problems. In this blog we will go through the following topics to understand logistic regression in Python:

  1. What is Regression?
  2. Logistic Regression in Python
  3. Logistic Regression vs Linear Regression
  4. Use Cases
  5. Demonstration

You may also refer to this detailed tutorial on logistic regression in Python with a demonstration for a better understanding, or go through the certified Python training to master logistic regression.

What is Regression?

Regression analysis is a powerful statistical analysis technique. One or more independent variables in a data set are used to predict the value of a dependent variable of interest.

We come across regression in an intuitive way all the time, such as predicting the weather using a data set of past weather conditions.

Regression uses many techniques to analyse and predict an outcome, but the emphasis is mainly on the relationship between the dependent variable and one or more independent variables.

Logistic regression analysis predicts the outcome as a binary variable, one that has only two possible values.

Logistic Regression In Python

It is a technique to analyse a data set that has a dependent variable and one or more independent variables, and to predict the outcome as a binary variable, meaning it will have only two outcomes.

The dependent variable is categorical in nature. The dependent variable is also referred to as the target variable, and the independent variables are called the predictors.

Logistic regression builds on linear regression, but predicts the outcome of a categorical variable. It estimates the probability of the event using the logistic (sigmoid) function.

We use the sigmoid function/curve to predict the categorical value, and a threshold value decides the outcome (e.g. win/lose).

Linear regression equation:    y = β0 + β1X1 + β2X2 …. + βnXn

  • Y stands for the dependent variable that needs to be predicted.
  • β0 is the Y-intercept, which is basically the point on the line which touches the y-axis.
  • β1 is the slope of the line (the slope can be negative or positive depending on the relationship between the dependent variable and the independent variable.)
  • X here represents the independent variable that is used to predict our resultant dependent value.

Sigmoid function:    p = 1 / (1 + e^(−y))

Applying the sigmoid function to the linear regression equation gives:

Logistic regression equation:  p = 1 / (1 + e^(−(β0 + β1X1 + β2X2 + …. + βnXn)))
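As a quick numeric illustration, here is a minimal Python sketch of this mapping; the coefficient values below are made up purely for illustration:

import numpy as np

def sigmoid(y):
    # squashes any real-valued input into the range (0, 1)
    return 1 / (1 + np.exp(-y))

# hypothetical linear combination: beta0 + beta1*X1 + beta2*X2
y = 0.4 + 1.2 * 0.5 - 0.8 * 1.0
p = sigmoid(y)                 # probability of the event
label = 1 if p >= 0.5 else 0   # a 0.5 threshold decides the outcome
print(p, label)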

Let’s take a look at the different types of logistic regression.

 

Types Of Logistic Regression


    • Binary logistic regression – It has only two possible outcomes. Example: yes or no.
    • Multinomial logistic regression – It has three or more nominal categories. Example: cat, dog, elephant.
    • Ordinal logistic regression – It has three or more ordinal categories, ordinal meaning that the categories are ordered. Example: user ratings (1-5).

Linear Vs Logistic Regression


While linear regression can have infinite possible values, logistic regression has definite outcomes.

Linear regression is used when the response variable is continuous in nature, but logistic regression is used when the response variable is categorical in nature.

Predicting a defaulter in a bank using the transaction details in the past is an example of logistic regression, while a continuous output like a stock market score is an example of linear regression.

Use Cases

The following are use cases where we can apply logistic regression.

Weather Prediction

Weather predictions are the result of logistic regression. Here, we analyse the data of previous weather reports and predict the possible outcome for a specific day. But logistic regression can only predict categorical data, like whether it’s going to rain or not.

Determining Illness

We can use logistic regression, with the help of the patient’s medical history, to predict whether the illness is positive or negative in any case.

Let’s take a sample data set to build a prediction model using logistic regression.

Demo

We are going to build a prediction model using logistic regression in Python with the help of a dataset; we will cover the following steps to achieve logistic regression.

Collecting Data

The very first step for implementing logistic regression is to collect the data. We will load the csv file containing the data set into the program using pandas. We are using NBA data to build a prediction model that predicts the possibility of a home game or an away game by analyzing the relationships within the relevant data.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv(r'C:\Users\MohammadWaseem\Documents\data.csv')
print(df.head(5))


You will get all the data into a readable format for easier analysis. And then you can determine the dependent and independent variables for your model.

Analyzing Data

The data set is analyzed to determine the relationship between the variables, by creating different plots to check the relationships between them.

sns.countplot(x='Home', hue='WINorLOSS', data=df)
plt.show()


Above is the relationship between the win/loss percentage with respect to home/away games. Similarly, we can plot graphs of the relationships between other relevant columns in the data.

Data Wrangling

The data set is modified according to the target variable. We will eliminate all the null values, and any string values as well, from the DataFrame.

print(df.isnull().sum())

We check for all irrelevant data, like null values and values that will not be required while building the prediction model. Since there are no null values in the NBA dataset that we are using, we can proceed with splitting the data.

Test and Train Data

To evaluate the performance of the model, the data is split into test data and train data using train_test_split. The data here is split in the ratio 67:33 (test_size=0.33).

Now, for the model prediction the logistic regression function is implemented by importing the logistic regression model in the sklearn module.

The model is then fit on the train set using the fit function. After this the prediction is performed using the prediction function.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix,accuracy_score

x = df.drop('Home', axis=1)
y = df['Home']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=1)
logmodel = LogisticRegression()
logmodel.fit(x_train, y_train)

predictions = logmodel.predict(x_test)
print(classification_report(y_test, predictions))
print(confusion_matrix(y_test, predictions))
print(accuracy_score(y_test, predictions))


Classification Report:

The classification report displays the precision, recall, F1 and support scores for the model.

Precision indicates how precise the model’s positive predictions are. The precision for a home game is 0.62 and for an away game it is 0.58.

Recall indicates the proportion of actual outcomes the model correctly predicts. Recall for a home game is 0.57 and for an away game it is 0.64. F1 is the weighted average of precision and recall, and support is the number of test samples of each class. In the NBA data set, the data tested for home games is 1662 and for away games is 1586.

Confusion matrix:

A confusion matrix is a table that describes the performance of a prediction model. It contains the actual values and the predicted values, and we can use these values to calculate the accuracy score of the model.

Confusion matrix heatmap:

Let’s plot a heatmap of the confusion matrix using seaborn and matplotlib to visualize the prediction model that we have built. To plot the heatmap, we can use the following:

sns.heatmap(pd.DataFrame(confusion_matrix(y_test,predictions)))
plt.show()


By looking at the heatmap, we can conclude the following:

  • Out of all the predictions, the classifier predicted yes a total of 1730 times, out of which 1012 were actual yeses.
  • Out of all the predictions, the classifier predicted no a total of 1518 times, out of which 944 were actual nos.

With this analysis of the confusion matrix we can conclude the accuracy score for our prediction model.
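Concretely, using the counts above: accuracy = (correct yes + correct no) / total predictions = (1012 + 944) / (1730 + 1518) = 1956 / 3248 ≈ 0.60, which matches the accuracy score below.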

Accuracy score:

The accuracy score is the percentage of the model’s predictions that are correct. For our model the accuracy score is 0.60, which is reasonably accurate. The higher the accuracy score, the better the prediction model; you should always aim for a higher accuracy score.

By following the steps discussed above, we have predicted the possibility of a home/away game using the NBA dataset. After analyzing the classification report, we can estimate the possibility of a home/away game.

In this blog we have discussed the concepts of logistic regression in Python and how it differs from the linear approach. We have also covered a demonstration using the NBA dataset. For more insight and practice, you can use a dataset of your choice and follow the steps discussed to implement logistic regression in Python.

Also, check out the various Data Science blogs on the Edureka platform to master the data scientist in you.

If you wish to learn Python and build a career in Data Science, then check out our interactive, live-online Data Science Certification here, which comes with 24/7 support to guide you throughout your learning period.
Got a question? Please mention it in the comments and we will get back to you.


Most Frequently Asked Artificial Intelligence Interview Questions


Artificial Intelligence Interview Questions:

Ever since we realized how Artificial Intelligence is positively impacting the market, nearly every large business is on the lookout for AI professionals to help them make their vision a reality. In this Artificial Intelligence Interview Questions blog, I have collected the most frequently asked questions by interviewers. These questions are collected after consulting with Artificial Intelligence Certification Training Experts.

In case you have attended any Artificial Intelligence interview in the recent past, do paste those interview questions in the comments section and we’ll answer them at the earliest. You can also comment below if you have any questions in your mind, which you might face in your Artificial Intelligence interview.

In this blog on Artificial Intelligence Interview Questions, I will be discussing the top Artificial Intelligence related questions asked in your interviews. So, for your better understanding I have divided this blog into the following 3 sections:

  1. Artificial Intelligence Basic Level Interview Questions
  2. Artificial Intelligence Intermediate Level Interview Questions
  3. Artificial Intelligence Scenario Based Interview Question

Artificial Intelligence Basic Level Interview Questions

Q1. What is the difference between AI, Machine Learning and Deep Learning?

Artificial Intelligence vs Machine Learning vs Deep Learning – Artificial Intelligence Interview Questions – Edureka

Q2. What is Artificial Intelligence? Give an example of where AI is used on a daily basis.

Artificial Intelligence (AI) is an area of computer science that emphasizes the creation of intelligent machines that work and react like humans.

“The capability of a machine to imitate intelligent human behavior.”

Google’s Search Engine – Artificial Intelligence Interview Questions – Edureka

Google’s Search Engine
One of the most popular AI applications is the Google search engine. If you open up your Chrome browser and start typing something, Google immediately provides recommendations for you to choose from. The logic behind the search engine is Artificial Intelligence.

AI uses predictive analytics, NLP and Machine Learning to recommend relevant searches to you. These recommendations are based on data that Google collects about you, such as your search history, location, age, etc. Thus, Google makes use of AI, to predict what you might be looking for.

Q3. What are the different types of AI?

  • Reactive Machines AI: Acts based only on present inputs; it cannot use previous experiences to inform current decisions or update its memory.
    Example: Deep Blue
  • Limited Memory AI: Used in self-driving cars. They detect the movement of vehicles around them constantly and add it to their memory.
  • Theory of Mind AI: Advanced AI that has the ability to understand emotions, people and other things in the real world.
  • Self Aware AI: AIs that possess human-like consciousness and reactions. Such machines have the ability to form self-driven actions.
  • Artificial Narrow Intelligence (ANI): AI built to perform a single, narrow task; it is used in building virtual assistants like Siri.
  • Artificial General Intelligence (AGI): Also known as strong AI. An example is the Pillo robot that answers questions related to health.
  • Artificial Superhuman Intelligence (ASI): AI that possesses the ability to do everything that a human can do and more. An example is the Alpha 2 which is the first humanoid ASI robot.

 Q4. Explain the different domains of Artificial Intelligence.

Domains Of AI – Artificial Intelligence Interview Questions – Edureka

  • Machine Learning: It’s the science of getting computers to act by feeding them data so that they can learn a few tricks on their own, without being explicitly programmed to do so.
  • Neural Networks: They are a set of algorithms and techniques, modeled in accordance with the human brain. Neural Networks are designed to solve complex and advanced machine learning problems.
  • Robotics: Robotics is a subset of AI, which includes different branches and applications of robots. These robots are artificial agents acting in a real-world environment. An AI robot works by manipulating the objects in its surroundings, by perceiving, moving and taking relevant actions.
  • Expert Systems: An expert system is a computer system that mimics the decision-making ability of a human. It is a computer program that uses artificial intelligence (AI) technologies to simulate the judgment and behavior of a human or an organization that has expert knowledge and experience in a particular field.
  • Fuzzy Logic Systems: Fuzzy logic is an approach to computing based on “degrees of truth” rather than the usual “true or false” (1 or 0) boolean logic on which the modern computer is based. Fuzzy logic Systems can take imprecise, distorted, noisy input information.
  • Natural Language Processing: Natural Language Processing (NLP) refers to the Artificial Intelligence method that analyses natural human language to derive useful insights in order to solve problems.

Q5. How is Machine Learning related to Artificial Intelligence?

Artificial Intelligence is a technique that enables machines to mimic human behavior. Whereas, Machine Learning is a subset of Artificial Intelligence. It is the science of getting computers to act by feeding them data and letting them learn a few tricks on their own, without being explicitly programmed to do so.

Therefore Machine Learning is a technique used to implement Artificial Intelligence.

Artificial Intelligence vs Machine Learning – Artificial Intelligence Interview Questions – Edureka

Q6. What are the different types of Machine Learning?

Types Of Machine Learning – Artificial Intelligence Interview Questions – Edureka

Q7. What is Q-Learning?

Q-learning is a Reinforcement Learning algorithm in which an agent tries to learn the optimal policy from its past experiences with the environment. The past experiences of an agent are a sequence of state-action-rewards:

What Is Q-Learning? – Artificial Intelligence Interview Questions – Edureka

In the above state diagram, the agent was initially in state (s0); performing an action (a0) resulted in receiving a reward (r1) and a transition to state (s1).

Q8. What is Deep Learning?

Deep learning imitates the way our brain works i.e. it learns from experiences. It uses the concepts of neural networks to solve complex problems.

What Is Deep Learning? – Artificial Intelligence Interview Questions – Edureka

Any Deep neural network will consist of three types of layers:

  • Input Layer: This layer receives all the inputs and forwards them to the hidden layer for analysis
  • Hidden Layer: In this layer, various computations are carried out and the result is transferred to the output layer. There can be n number of hidden layers, depending on the problem you’re trying to solve.
  • Output Layer: This layer is responsible for transferring information from the neural network to the outside world.

Q9. Explain how Deep Learning works.

Biological Neurons – Artificial Intelligence Interview Questions – Edureka

  • Deep Learning is based on the basic unit of a brain called a brain cell or a neuron. Inspired from a neuron, an artificial neuron or a perceptron was developed.
  • A biological neuron has dendrites which are used to receive inputs.
  • Similarly, a perceptron receives multiple inputs, applies various transformations and functions and provides an output.
  • Just like how our brain contains multiple connected neurons called a neural network, we can also have a network of artificial neurons called perceptrons to form a Deep neural network.

Deep Neural Network – Artificial Intelligence Interview Questions – Edureka

  • An Artificial Neuron or a Perceptron models a neuron which has a set of inputs, each of which is assigned some specific weight. The neuron then computes some function on these weighted inputs and gives the output.
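To see what a perceptron computes, here is a minimal sketch of its forward pass; the weights, bias and inputs below are illustrative assumptions.

```python
import numpy as np

# A minimal perceptron sketch: weighted inputs passed through a step function.
def perceptron(inputs, weights, bias):
    weighted_sum = np.dot(inputs, weights) + bias
    return 1 if weighted_sum > 0 else 0   # step activation

# Illustrative inputs and weights, not taken from the article
print(perceptron(np.array([1.0, 0.5]), np.array([0.4, -0.2]), bias=0.1))  # 1
```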

Q10. Explain the commonly used Artificial Neural Networks.

Feedforward Neural Network

  • The simplest form of ANN, where the data or the input travels in one direction.
  • The data passes through the input nodes and exits through the output nodes. This neural network may or may not have hidden layers.

Convolutional Neural Network

  • Here, input features are taken in batches, as if passing through a filter. This helps the network remember the images in parts and compute operations on them.
  • Mainly used for signal and image processing

Recurrent Neural Network(RNN) – Long Short Term Memory

  • Works on the principle of saving the output of a layer and feeding it back to the input to help in predicting the outcome of the layer.
  • Here, you let the neural network work on forward propagation and remember what information it needs for later use
  • This way each neuron will remember some information it had in the previous time-step.

Autoencoders

  • These are unsupervised learning models with an input layer, an output layer and one or more hidden layers connecting them.
  • The output layer has the same number of units as the input layer. Its purpose is to reconstruct its own inputs.
  • They are typically used for dimensionality reduction and for learning generative models of data.

Q11. What are Bayesian Networks?

A Bayesian network is a statistical model that represents a set of variables and their conditional dependencies in the form of a directed acyclic graph.

On the occurrence of an event, Bayesian Networks can be used to predict the likelihood that any one of several possible known causes was the contributing factor.

Bayesian Network – Artificial Intelligence Interview Questions – Edureka

For example, a Bayesian network could be used to study the relationship between diseases and symptoms. Given various symptoms, the Bayesian network is ideal for computing the probabilities of the presence of various diseases.

Q12. Explain the assessment that is used to test the intelligence of a machine.

In artificial intelligence (AI), a Turing Test is a method of inquiry for determining whether or not a computer is capable of thinking like a human being.

AI Turing Test – Artificial Intelligence Interview Questions – Edureka

Artificial Intelligence Intermediate Level Interview Questions

Q1. How does Reinforcement Learning work? Explain with an example.

Generally, a Reinforcement Learning (RL) system comprises two main components:

  1. An agent
  2. An environment

Reinforcement Learning – Artificial Intelligence Interview Questions – Edureka

  • The environment is the setting that the agent is acting on and the agent represents the RL algorithm.
  • The RL process starts when the environment sends a state to the agent, which then, based on its observations, takes an action in response to that state.
  • In turn, the environment sends the next state and the respective reward back to the agent. The agent will update its knowledge with the reward returned by the environment to evaluate its last action.
  • The loop continues until the environment sends a terminal state, which means the agent has accomplished all of its tasks.

To understand this better, let’s suppose that our agent is learning to play Counter-Strike. The RL process can be broken down into the below steps:

Counter-Strike Example – Artificial Intelligence Interview Questions – Edureka

  1. The RL Agent (Player1) collects state S⁰ from the environment (the Counter-Strike game)
  2. Based on the state S⁰, the RL agent takes an action A⁰, (Action can be anything that causes a result i.e. if the agent moves left or right in the game). Initially, the action is random
  3. The environment is now in a new state S¹ (new stage in the game)
  4. The RL agent now gets a reward R¹ from the environment. This reward can be additional points or coins
  5. This RL loop goes on until the RL agent is dead or reaches the destination, and it continuously outputs a sequence of state, action, and reward.

To learn more about Reinforcement Learning you can go through this video recorded by our Machine Learning experts.

Reinforcement Learning Tutorial | Edureka

Q2. Explain Markov’s decision process with an example.

The mathematical approach for mapping a solution in Reinforcement Learning is called Markov’s Decision Process (MDP).

The following parameters are used to attain a solution using MDP:

  • Set of actions, A
  • Set of states, S
  • Reward, R
  • Policy, π
  • Value, V

Markov’s Decision Process – Artificial Intelligence Interview Questions – Edureka

To briefly sum it up, the agent must take an action (A) to transition from the start state to the end state (S). While doing so, the agent receives a reward (R) for each action it takes. The series of actions taken by the agent defines the policy (π), and the rewards collected define the value (V). The main goal here is to maximize rewards by choosing the optimum policy.

To better understand the MDP, let’s solve the Shortest Path Problem using the MDP approach:

Shortest Path Problem – Artificial Intelligence Interview Questions – Edureka

Given the above representation, our goal here is to find the shortest path between ‘A’ and ‘D’. Each edge has a number linked with it; this denotes the cost to traverse that edge. Now, the task at hand is to traverse from point ‘A’ to ‘D’ with the minimum possible cost.

In this problem,

  • The set of states are denoted by nodes i.e. {A, B, C, D}
  • The action is to traverse from one node to another {A -> B, C -> D}
  • The reward is the cost represented by each edge
  • The policy is the path taken to reach the destination

You start off at node A and take baby steps to your destination. Initially, only the next possible node is visible to you, thus you randomly start off and then learn as you traverse through the network. The main goal is to choose the path with the lowest cost.

Since this is a very simple problem, I will leave it for you to solve. Make sure you mention the answer in the comment section.

Q3. Explain reward maximization in Reinforcement Learning.

The RL agent works based on the theory of reward maximization. This is exactly why the RL agent must be trained in such a way that it takes the best action, so that the reward is maximal.

The collective reward at a particular time, with the respective action, is written as:

Reward Maximization Equation – Artificial Intelligence Interview Questions – Edureka

The above equation is an ideal representation of rewards. Generally, things don’t work out like this while summing up the cumulative rewards.

Reward Maximization – Artificial Intelligence Interview Questions – Edureka

Let me explain this with a small game. In the figure you can see a fox, some meat and a tiger.

  • Our RL agent is the fox and his end goal is to eat the maximum amount of meat before being eaten by the tiger.
  • Since this fox is a clever fellow, he eats the meat that is closer to him, rather than the meat which is close to the tiger, because the closer he is to the tiger, the higher are his chances of getting killed.
  • As a result, the rewards near the tiger, even if they are bigger meat chunks, will be discounted. This is done because of the uncertainty factor, that the tiger might kill the fox.

The next thing to understand is how the discounting of rewards works.
To do this, we define a discount rate called gamma. The value of gamma is between 0 and 1. The smaller the gamma, the larger the discount, and vice versa.

So, our cumulative discounted reward is:

Reward Maximization with Discount Equation – Artificial Intelligence Interview Questions – Edureka
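As a quick sketch, the cumulative discounted reward can be computed like this; the reward sequence and gamma value below are illustrative.

```python
# G = r1 + gamma*r2 + gamma^2*r3 + ...  (illustrative rewards and gamma)
def discounted_return(rewards, gamma):
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

print(discounted_return([1, 1, 10], gamma=0.9))  # 1 + 0.9 + 8.1 = 10.0
```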

Q4. What is exploitation and exploration trade-off?

An important concept in reinforcement learning is the exploration and exploitation trade-off.

Exploration, as the name suggests, is about exploring and capturing more information about an environment. On the other hand, exploitation is about using the already-known information to heighten the rewards.

Exploitation & Exploration – Artificial Intelligence Interview Questions – Edureka

  • Consider the fox and tiger example, where the fox eats only the small meat chunks close to him but doesn’t eat the bigger meat chunks at the top, even though the bigger meat chunks would get him more rewards.
  • If the fox only focuses on the closest reward, he will never reach the big chunks of meat; this is called exploitation.
  • But if the fox decides to explore a bit, it can find the bigger reward, i.e. the big chunk of meat. This is exploration.

Q5. What is the difference between parametric & non-parametric models?

Parametric vs Non Parametric model – Artificial Intelligence Interview Questions – Edureka

Q6. What is the difference between Hyperparameters and model parameters?

Model Parameters vs Hyperparameters – Artificial Intelligence Interview Questions – Edureka

Q7. What are hyperparameters in Deep Neural Networks?

  • Hyperparameters are variables that define the structure of the network and how it is trained; for example, the learning rate controls how the network is trained.
  • They are used to define the number of hidden layers that must be present in a network.
  • More hidden units can increase the accuracy of the network, whereas a lesser number of units may cause underfitting.

Q8. Explain the different algorithms used for hyperparameter optimization.

Grid Search
Grid search trains the network for every combination of a specified set of hyperparameter values (for example, the learning rate and the number of layers) and then evaluates each model using cross-validation techniques.

Random Search
It randomly samples the search space and evaluates parameter sets drawn from a particular probability distribution. For example, instead of checking all 10,000 combinations, 100 randomly selected combinations can be checked.

Bayesian Optimization
This fine-tunes the hyperparameters by enabling automated model tuning. The model used for approximating the objective function is called the surrogate model (typically a Gaussian Process). Bayesian Optimization uses the Gaussian Process (GP) to obtain posterior functions and make predictions based on prior observations.
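The sketch below shows grid search and random search with scikit-learn; the estimator, dataset and parameter grid are illustrative assumptions, not part of the original answer.

```python
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, random_state=0)
param_grid = {"n_estimators": [10, 50, 100], "max_depth": [3, 5, None]}

# Grid search: evaluates every combination with cross-validation
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_)

# Random search: samples a fixed number of combinations instead
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_grid,
                          n_iter=4, cv=3, random_state=0)
rand.fit(X, y)
print(rand.best_params_)
```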

Q9. How does data overfitting occur and how can it be fixed?

Overfitting occurs when a statistical model or machine learning algorithm captures the noise of the data. This causes an algorithm to show low bias but high variance in the outcome.

Overfitting can be prevented by using the following methodologies:

Cross-validation: The idea behind cross-validation is to split the training data in order to generate multiple mini train-test splits. These splits can then be used to tune your model (a short sketch follows this list).

More training data: Feeding more data to the machine learning model can help in better analysis and classification. However, this does not always work.

Remove features: Many times, the data set contains irrelevant features or predictor variables that are not needed for analysis. Such features only increase the complexity of the model, thus leading to possibilities of data overfitting. Therefore, such redundant variables must be removed.

Early stopping: A machine learning model is trained iteratively; this allows us to check how well each iteration of the model performs. But after a certain number of iterations, the model’s performance starts to saturate, and further training will result in overfitting. Thus, one must know where to stop the training, which can be achieved by a mechanism called early stopping.

Regularization: Regularization can be done in a number of ways; the method depends on the type of learner you’re implementing. For example, pruning is performed on decision trees, the dropout technique is used on neural networks, and parameter tuning can also be applied to solve overfitting issues.

Use Ensemble models: Ensemble learning is a technique used to create multiple Machine Learning models, which are then combined to produce more accurate results. This is one of the best ways to prevent overfitting. An example is Random Forest, which uses an ensemble of decision trees to make more accurate predictions and to avoid overfitting.
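Here is the minimal cross-validation sketch promised above, using scikit-learn; the dataset and model are illustrative placeholders.

```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())  # average accuracy across the 5 train-test splits
```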

Q10. Mention a technique that helps to avoid overfitting in a neural network.

Dropout is a type of regularization technique used to avoid overfitting in a neural network. It is a technique where randomly selected neurons are dropped during training.

Dropout – Artificial Intelligence Interview Questions – Edureka

The dropout value of a network must be chosen wisely. A value too low will have minimal effect, and a value too high will result in under-learning by the network.
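Here is a minimal sketch of how dropout is typically added in Keras, assuming TensorFlow is installed; the layer sizes and dropout value are illustrative.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(64, activation="relu", input_shape=(20,)),
    Dropout(0.5),                      # randomly drops 50% of units during training
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```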

Q11. What is the purpose of Deep Learning frameworks such as Keras, TensorFlow, and PyTorch?

  • Keras is an open source neural network library written in Python. It is designed to enable fast experimentation with deep neural networks.
  • TensorFlow is an open-source software library for dataflow programming. It is used for machine learning applications like neural networks.
  • PyTorch is an open source machine learning library for Python, based on Torch. It is used for applications such as natural language processing.

Q12. Differentiate between NLP and Text mining.

Text Mining vs NLP – Artificial Intelligence Interview Questions – Edureka

Q13. What are the different components of NLP?

Components Of NLP – Artificial Intelligence Interview Questions – Edureka

Natural Language Understanding includes:

  • Mapping input to useful representations
  • Analyzing different aspects of the language

Natural Language Generation includes:

  • Text Planning
  • Sentence Planning
  • Text Realization

Q14. What is Stemming & Lemmatization in NLP?

Stemming algorithms work by cutting off the end or the beginning of the word, taking into account a list of common prefixes and suffixes that can be found in an inflected word. This indiscriminate cutting can be successful on some occasions, but not always.

Stemming – Artificial Intelligence Interview Questions – Edureka

Lemmatization, on the other hand, takes into consideration the morphological analysis of the words. To do so, it is necessary to have detailed dictionaries which the algorithm can look through to link the form back to its lemma.

Lemmatization – Artificial Intelligence Interview Questions – Edureka
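A minimal sketch of both techniques using NLTK is shown below; it assumes the WordNet corpus has been downloaded beforehand.

```python
# Requires the WordNet corpus: run nltk.download("wordnet") once beforehand.
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("studies"))            # studi  (crude suffix stripping)
print(lemmatizer.lemmatize("studies"))    # study  (dictionary-based lemma)
```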

Q15. Explain Fuzzy Logic architecture.

Fuzzy Logic Architecture – Artificial Intelligence Interview Questions – Edureka

  • Fuzzification Module − The system inputs are fed into the Fuzzifier, which transforms the inputs into fuzzy sets.
  • Knowledge Base − It stores analytic measures such as IF-THEN rules provided by experts.
  • Inference Engine − It simulates the human reasoning process by making fuzzy inference on the inputs and IF-THEN rules.
  • Defuzzification Module − It transforms the fuzzy set obtained by the inference engine into a crisp value.

Q16. Explain the components of Expert Systems.

Expert Systems – Artificial Intelligence Interview Questions – Edureka

  • Knowledge Base
    It contains domain-specific and high-quality knowledge.
  • Inference Engine
    It acquires and manipulates the knowledge from the knowledge base to arrive at a particular solution.
  • User Interface
    The user interface provides interaction between the user and the Expert System itself.

Q17. How is Computer Vision and AI related?

Computer Vision is a field of Artificial Intelligence that is used to obtain information from images or multi-dimensional data. Machine Learning algorithms such as K-means are used for Image Segmentation, Support Vector Machines are used for Image Classification, and so on.

Computer Vision And AI – Artificial Intelligence Interview Questions – Edureka

Therefore Computer Vision makes use of AI technologies to solve complex problems such as Object Detection, Image Processing, etc.

Q18. Which is better for image classification? Supervised or unsupervised classification? Justify.

  • In supervised classification, the images are manually fed and interpreted by the Machine Learning expert to create feature classes.
  • In unsupervised classification, the Machine Learning software creates feature classes based on image pixel values.

Therefore, it is better to choose supervised classification for image classification in terms of accuracy.

Q19. Finite difference filters in image processing are very susceptible to noise. To cope with this, which method can you use so that there would be minimal distortion from noise?

Image Smoothing is one of the best methods for reducing noise by forcing pixels to be more like their neighbors; this reduces any distortions caused by contrasts.

Image Smoothing – Artificial Intelligence Interview Questions – Edureka

Q20. How is Game theory and AI related?

“In the context of artificial intelligence(AI) and deep learning systems, game theory is essential to enable some of the key capabilities required in multi-agent environments in which different AI programs need to interact or compete in order to accomplish a goal.”

Game Theory And AI – Artificial Intelligence Interview Questions – Edureka

Q21. What is the Minimax Algorithm? Explain the terminologies involved in a Minimax problem.

Minimax is a recursive algorithm used to select an optimal move for a player assuming that the other player is also playing optimally.

A game can be defined as a search problem with the following components:

  • Game Tree: A tree structure containing all the possible moves.
  • Initial state: The initial position of the board and showing whose move it is.
  • Successor function: It defines the possible legal moves a player can make.
  • Terminal state: It is the position of the board when the game ends.
  • Utility function: It is a function which assigns a numeric value for the outcome of a game.

Artificial Intelligence Scenario Based Interview Questions

Q1. Show the working of the Minimax algorithm using Tic-Tac-Toe Game.

There are two players involved in a game:

  • MAX: This player tries to get the highest possible score
  • MIN: MIN tries to get the lowest possible score

The following approach is taken for a Tic-Tac-Toe game using the Minimax algorithm:

Step 1: First, generate the entire game tree starting with the current position of the game all the way up to the terminal states.

Tic-Tac-Toe – Artificial Intelligence Interview Questions – Edureka

Step 2: Apply the utility function to get the utility values for all the terminal states.

Step 3: Determine the utilities of the higher nodes with the help of the utilities of the terminal nodes. For instance, in the diagram below, we have the utilities for the terminal states written in the squares.

Tic-Tac-Toe – Artificial Intelligence Interview Questions – Edureka

Let us calculate the utility for the left node (red) of the layer above the terminal:

MIN{3, 5, 10}, i.e. 3.
Therefore, the utility for the red node is 3.

Similarly, for the green node in the same layer:
MIN{2,2}, i.e. 2.

Tic-Tac-Toe – Artificial Intelligence Interview Questions – Edureka

Step 4: Calculate the utility values.

Step 5: Eventually, all the backed-up values reach the root of the tree. At that point, MAX has to choose the highest value:
i.e. MAX{3,2} which is 3.

Therefore, the best opening move for MAX is the left node (the red one).
To summarize,

Minimax Decision = MAX{MIN{3,5,10},MIN{2,2}}
= MAX{3,2}
= 3
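To make the recursion explicit, here is a minimal minimax sketch over the same tree of terminal utilities:

```python
# Minimax over a tree represented as nested lists of terminal utilities.
def minimax(node, is_max):
    if isinstance(node, int):          # terminal state: return its utility
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

tree = [[3, 5, 10], [2, 2]]            # MAX layer over two MIN nodes
print(minimax(tree, is_max=True))      # MAX{MIN{3,5,10}, MIN{2,2}} = 3
```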

Q2. Which method is used for optimizing a Minimax based game?

Alpha-beta Pruning
If we apply alpha-beta pruning to a standard minimax algorithm, it returns the same move as the standard one, but it removes all the nodes that are possibly not affecting the final decision.

Alpha-beta Pruning – Artificial Intelligence Interview Questions – Edureka

In this case,
Minimax Decision = MAX{MIN{3,5,10}, MIN{2,a,b}, MIN{2,7,3}}
= MAX{3,c,2}
= 3

Hint: (MIN{2,a,b} would certainly be less than or equal to 2, i.e., c<=2 and hence MAX{3,c,2} has to be 3.)
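Below is a minimal alpha-beta pruning sketch over a tree of the same shape; the values standing in for a and b are illustrative and, as the hint says, never get visited.

```python
def alphabeta(node, is_max, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, int):
        return node
    if is_max:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:          # prune: MIN will never allow this branch
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:              # prune: MAX will never choose this branch
            break
    return value

tree = [[3, 5, 10], [2, 9, 8], [2, 7, 3]]  # 9 and 8 stand in for a and b
print(alphabeta(tree, is_max=True))        # 3; a and b are pruned, never visited
```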

Q3. Which algorithm does Facebook use for face verification and how does it work?

Facebook uses DeepFace for face verification. DeepFace is a face verification system built with Artificial Intelligence (AI) techniques using deep neural network models.

Face Verification – Artificial Intelligence Interview Questions – Edureka

Here’s how face verification is done:

Input: Scan a wild collection of photos with large, complex data. This includes blurry images and images with high intensity and contrast.

Process: In modern face recognition, the process completes in four steps:

  • Detect facial features
  • Align and compare the features
  • Represent the key patterns by using 3D graphs
  • Classify the images based on similarity

Output: The final result is a face representation, which is derived from a 9-layer deep neural network.

Training Data: More than 4 million facial images of more than 4000 people

Result: Facebook can detect whether the two images represent the same person or not

Q4. Explain the logic behind targeted marketing. How can Machine Learning help with this?

Target Marketing involves breaking a market into segments and concentrating your marketing efforts on a few key segments consisting of the customers whose needs and desires most closely match your product.

It is the key to attracting new business, increasing your sales, and growing the company.

The beauty of target marketing is that aiming your marketing efforts at specific groups of consumers makes the promotion, pricing, and distribution of your products and/or services easier and more cost-effective.

Targeted Marketing – Artificial Intelligence Interview Questions – Edureka

Machine Learning in targeted marketing:

  • Text Analytics Systems: The applications of text analytics range from search applications, text classification and named entity recognition to pattern search and replace applications.
  • Clustering: With applications including customer segmentation, fast search, and visualization.
  • Classification: Classifiers like decision trees and neural networks can be used for text classification in marketing.
  • Recommender Systems: Along with association rules, these can be used to analyze your marketing data.
  • Market Basket Analysis: Market basket analysis explains the combinations of products that frequently co-occur in transactions.

Q5. How can AI be used in detecting fraud?

Artificial Intelligence is used in Fraud detection problems by implementing Machine Learning algorithms for detecting anomalies and studying hidden patterns in data.

Fraud Detection Using AI – Artificial Intelligence Interview Questions – Edureka

The following approach is followed for detecting fraudulent activities:

Data Extraction: At this stage, data is either collected through a survey or web scraping is performed. If you’re trying to detect credit card fraud, then information about the customer is collected. This includes transactional details, shopping details, personal details, etc.

Data Cleaning: At this stage, the redundant data must be removed. Any inconsistencies or missing values may lead to wrongful predictions, therefore such inconsistencies must be dealt with at this step.

Data Exploration & Analysis: This is the most important step in AI. Here you study the relationship between various predictor variables. For example, if a person has spent an unusual sum of money on a particular day, the chances of a fraudulent occurrence are very high. Such patterns must be detected and understood at this stage.

Building a Machine Learning model: There are many machine learning algorithms that can be used for detecting fraud. One such example is Logistic Regression, which is a classification algorithm. It can be used to classify events into 2 classes, namely, fraudulent and non-fraudulent.

Model Evaluation: Here, you basically test the efficiency of the machine learning model. If there is any room for improvement, then parameter tuning is performed. This improves the accuracy of the model.

Q6. A bank manager is given a data set containing records of 1000s of applicants who have applied for a loan. How can AI help the manager understand which loans he can approve? Explain.

This problem statement can be solved using the KNN algorithm, which will classify the applicant’s loan request into two classes:

  1. Approved
  2. Disapproved

K Nearest Neighbour is a Supervised Learning algorithm that classifies a new data point into the target class, depending on the features of its neighboring data points.

Bank Loan Approval Using AI – Artificial Intelligence Interview Questions – Edureka

The following steps can be carried out to predict whether a loan must be approved or not:

Data Extraction: At this stage, data is either collected through a survey or web scraping is performed. Data about the customers must be collected, including their account balance, credit amount, age, occupation, loan records, etc. By using this data, we can predict whether or not to approve the loan of an applicant.

Data Cleaning: At this stage, the redundant variables must be removed. Some of these variables are not essential in predicting the loan of an applicant, for example, variables such as Telephone, Concurrent credits, etc. Such variables must be removed because they will only increase the complexity of the Machine Learning model.

Data Exploration & Analysis: This is the most important step in AI. Here you study the relationship between various predictor variables. For example, if a person has a history of unpaid loans, then the chances are that he might not get approval on his loan application. Such patterns must be detected and understood at this stage.

Building a Machine Learning model: There are a number of machine learning algorithms that can be used for predicting whether an applicant’s loan request should be approved or not. One such example is the K-Nearest Neighbour algorithm, which is a classification and a regression algorithm. It will classify the applicant’s loan request into two classes, namely, Approved and Disapproved.

Model Evaluation: Here, you basically test the efficiency of the machine learning model. If there is any room for improvement, then parameter tuning is performed. This improves the accuracy of the model.
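A minimal KNN sketch with scikit-learn is shown below; the applicant features (balance, credit amount, age) and labels (1 = approved, 0 = disapproved) are illustrative assumptions.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy applicants: [account balance, credit amount, age]
X = [[5000, 200, 25], [100, 9000, 40], [7000, 150, 35], [50, 8000, 52]]
y = [1, 0, 1, 0]                       # 1 = approved, 0 = disapproved

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
print(knn.predict([[6000, 300, 30]]))  # classifies the new applicant
```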

Q7. ‘You’ve won a 2-million-dollar lottery’ – we all get such spam messages. How can AI be used to detect and filter out such spam messages?

To understand spam detection, let’s take the example of Gmail. Gmail makes use of machine learning to filter out such spam messages from our inbox. These spam filters are used to classify emails into two classes, namely spam and non-spam emails.

Let’s understand how spam detection is done using machine learning:

Spam Detection Using AI – Artificial Intelligence Interview Questions – Edureka

  • A machine learning process always begins with data collection. The data Google has is obviously not stored in paper files; Google maintains data centers that store customer data, such as email content, headers, senders, etc.
  • This is followed by data cleaning. It is essential to get rid of unnecessary stop words and punctuations so that only the relevant data is used for creating a precise machine learning model. Therefore, in this stage stop words such as ‘the’, ‘and’, ‘a’ are removed. The text is formatted in such a way that it can be analyzed.
  • After data cleaning comes data exploration and analysis. Many a time, certain words or phrases are frequently used in spam emails. Words like “lottery”, “earn” and “full-refund” indicate that the email is more likely to be spam. Such words and correlations must be understood at this stage (see the sketch after this list).
  • After retrieving useful insights from data, a machine learning model is built. For classifying emails as either spam or non-spam you can use machine learning algorithms like Logistic Regression, Naïve Bayes, etc. The machine learning model is built using the training dataset. This data is used to train the model and make it learn by using past user email data.
  • This stage is followed by model evaluation. In this phase, the model is tested using the testing data set, which is nothing but a new set of emails, after which the machine learning model is graded based on the accuracy with which it was able to classify the emails correctly.
  • Once the evaluation is over, any further improvement in the model can be achieved by tuning a few variables/parameters. This stage is also known as parameter tuning. Here, you basically try to improve the efficiency of the machine learning model by tweaking a few parameters that you used to build the model.
  • The last stage is deployment. Here the model is deployed to the end users, where it processes emails in real time and predicts whether the email is spam or non-spam.
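Putting the middle steps together, here is a minimal spam-classifier sketch with scikit-learn; the toy emails and labels are illustrative, not real training data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a lottery full refund", "meeting at noon",
          "earn money fast lottery", "project report attached"]
labels = [1, 0, 1, 0]                          # 1 = spam, 0 = non-spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)           # word-count features

model = MultinomialNB().fit(X, labels)
print(model.predict(vectorizer.transform(["you can earn a full refund"])))
```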

Q8. Let’s say that you started an online shopping business and to grow your business, you want to forecast the sales for the upcoming months. How would you do this? Explain.

This can be done by studying the past data and building a model that shows how the sales have varied over a period of time. Sales Forecasting is one of the most common applications of AI. Linear Regression is one of the best Machine Learning algorithms used for forecasting sales.

When both sales and time have a linear relationship, it is best to use a simple linear regression model.

Linear Regression is a method to predict the dependent variable (Y) based on the values of independent variables (X). It can be used for cases where we want to predict some continuous quantity.

  • Dependent variable (Y):
    The response variable whose value needs to be predicted.
  • Independent variable (X):
    The predictor variable used to predict the response variable.

In this example, the dependent variable ‘Y’ represents the sales and the independent variable ‘X’ represents the time period. Since the sales vary over a period of time, sales is the dependent variable.

Forecasting Sales Using AI – Artificial Intelligence Interview Questions – Edureka

The following equation is used to represent a linear regression model:

Y = b0 + b1*x + e

Linear Regression – Artificial Intelligence Interview Questions – Edureka

Here,

  • Y = Dependent variable
  • b0 = Y-intercept
  • b1 = Slope of the line
  • x = Independent variable
  • e = Error

Therefore, by using a Linear Regression model, wherein the Y-axis represents the sales and the X-axis denotes the time period, we can easily predict the sales for the upcoming months.
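Here is a minimal sales-forecasting sketch with scikit-learn's LinearRegression; the monthly sales figures are illustrative.

```python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4], [5]]          # time period (month number)
y = [100, 120, 135, 155, 170]          # illustrative sales for each month

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)   # b0 (intercept) and b1 (slope)
print(model.predict([[6]]))            # forecast for month 6
```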

Q9. ‘Customers who bought this also bought this…’ we often see this when we shop on Amazon. What is the logic behind recommendation engines?

E-commerce websites like Amazon make use of Machine Learning to recommend products to their customers. The basic idea of this kind of recommendation comes from collaborative filtering. Collaborative filtering is the process of comparing users with similar shopping behaviors in order to recommend products to a new user with similar shopping behavior.

Recommendation System Using AI – Artificial Intelligence Interview Questions – Edureka

To better understand this, let’s look at an example. Let’s say a user A, who is a sports enthusiast, bought pizza, pasta, and a coke. Now a couple of weeks later, another user B, who rides a bicycle, buys pizza and pasta. He does not buy the coke, but Amazon recommends a bottle of coke to user B since his shopping behavior and lifestyle are quite similar to user A’s. This is how collaborative filtering works.

Q10. What is market basket analysis and how can Artificial Intelligence be used to perform this?

Market basket analysis explains the combinations of products that frequently co-occur in transactions.

For example, if a person buys bread, there is a 40% chance that he might also buy butter. By understanding such correlations between items, companies can grow their businesses by giving relevant offers and discount codes on such items.

Market Basket Analysis is a well-known practice followed by almost every huge retailer in the market. The logic behind it is Machine Learning algorithms such as Association Rule Mining and the Apriori algorithm:

  • Association rule mining is a technique that shows how items are associated with each other.
  • Apriori algorithm uses frequent itemsets to generate association rules. It is based on the concept that a subset of a frequent itemset must also be a frequent itemset.

Association Rule Mining – Artificial Intelligence Interview Questions – Edureka

For example, the above rule suggests that if a person buys item A, then he will also buy item B. In this manner, the retailer can give a discount offer, such as 30% off on item C when items A and B are purchased together. Such rules are generated using Machine Learning and are then applied to items in order to increase sales and grow a business.
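The Apriori algorithm is built on exactly these support and confidence computations. Here is a minimal sketch of them over a toy set of transactions (the transactions themselves are illustrative):

```python
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "milk"},
]

# support(itemset) = fraction of transactions containing the whole itemset
def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Rule {bread} -> {butter}: confidence = support(bread, butter) / support(bread)
conf = support({"bread", "butter"}) / support({"bread"})
print(support({"bread", "butter"}))  # 0.5
print(conf)                          # 2/3 of bread buyers also bought butter
```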

Q11. Place an agent in any one of the rooms (0,1,2,3,4) and the goal is to reach outside the building (room 5). Can this be achieved through AI? If yes, explain how it can be done.

Reinforcement Learning – Artificial Intelligence Interview Questions – Edureka

In the above figure:

  • 5 rooms in a building connected by doors
  • Each room is numbered 0 through 4
  • The outside of the building can be thought of as one big room (5)
  • Doors 1 and 4 directly lead into the building from room 5 (outside)

This problem can be solved by using the Q-Learning algorithm, which is a reinforcement learning algorithm used to solve reward based problems.

Let’s represent the rooms on a graph, each room as a node, and each door as a link, like so:

Reinforcement Learning – Artificial Intelligence Interview Questions – Edureka

The next step is to associate a reward value with each door:

Reinforcement Learning – Artificial Intelligence Interview Questions – Edureka

  • Doors that lead directly to the goal have a reward of 100
  • Doors not directly connected to the target room have zero reward
  • Because doors are two-way, two arrows are assigned to each room
  • Each arrow contains an instant reward value

Now let’s try to understand how Q-Learning can be used to solve this problem. The terminology in Q-Learning includes the terms state and action:

  • The room (including room 5) represents a state
  • Agent’s movement from one room to another represents an action

In the figure, a state is depicted as a node, while an “action” is represented by the arrows. Suppose the agent traverses from room 2 to room 5; then the following path is taken:

  1. Initial state = state 2
  2. State 2 -> state 3
  3. State 3 -> state (2, 1, 4)
  4. State 4 -> state 5

Next, we can put the state diagram and the instant reward values into a reward table or a matrix R, like so:

Reinforcement Learning – Artificial Intelligence Interview Questions – Edureka

The next step is to add another matrix Q, representing the memory of what the agent has learned through experience.

  • The rows of matrix Q represent the current state of the agent
  • The columns represent the possible actions leading to the next state

The formula to calculate the Q matrix:

Q(state, action) = R(state, action) + Gamma * Max [Q(next state, all actions)]

Here, Q(state, action) and R(state, action) denote the entries for that state and action in the memory matrix Q and the reward matrix R, respectively.

Note: The Gamma parameter has a range of 0 to 1 (0 <= Gamma <= 1).

  • If Gamma is closer to zero, the agent will tend to consider only immediate rewards.
  • If Gamma is closer to one, the agent will consider future rewards with greater weight

Finally, by following the below steps, the agent will reach room 5 by taking the most optimal path:

Reinforcement Learning – Artificial Intelligence Interview Questions – Edureka

Q12. The crop yield in India is degrading because farmers are unable to detect diseases in crops during the early stages. Can AI be used for disease detection in crops? If yes, explain.

AI can be used to implement image processing and classification techniques for extraction and classification of leaf diseases.

Image Processing Using AI - Artificial Intelligence Interview Questions - Edureka

Image Processing Using AI – Artificial Intelligence Interview Questions – Edureka

This sounds complex, so let me break it down into steps:

Image Acquisition: The sample images are collected and stored as an input database.

Image Pre-processing: Image pre-processing includes the following:

  • Improve image data that suppresses unwanted distortion
  • Enhance image features
  • Image clipping, enhancement, color space conversion
  • Perform Histogram equalization to adjust the contrast of an image

Image Segmentation: It is the process of partitioning a digital image into multiple segments so that image analysis becomes easier. Segmentation is based on image features such as color and texture. A popular Machine Learning method used for segmentation is the K-means clustering algorithm.

Feature Extraction: This is done to extract information that can be used to find the significance of a given sample. The Haar Wavelet transform can be used for texture analysis and the computations can be done by using Gray-Level Co-Occurrence Matrix.

Classification: Finally, Linear Support Vector Machine is used for classification of leaf disease. SVM is a binary classifier which uses a hyperplane called the decision boundary between two classes. This results in the formation of two classes:

  1. Diseased leaves
  2. Healthy leaves

Therefore, AI can be used in Computer Vision to classify and detect disease by studying and processing images. This is one of the most profound applications of AI.

So these are the most frequently asked questions in an Artificial Intelligence Interview. However, if you wish to brush up more on your knowledge, you can go through the other blogs on the Edureka platform.

With this, we come to an end of this blog. I hope these Artificial Intelligence Interview Questions will help you ace your AI Interview.

If you’re looking to learn more about AI, Edureka provides a specially curated Machine Learning Engineer Master Program that will make you proficient in techniques like Supervised Learning, Unsupervised Learning, and Natural Language Processing. It includes training on the latest advancements and technical approaches in Artificial Intelligence & Machine Learning such as Deep Learning, Graphical Models and Reinforcement Learning.

The post Most Frequently Asked Artificial Intelligence Interview Questions appeared first on Edureka.

Differences Between SQL & NoSQL Databases – MySQL & MongoDB Comparison


With the amount of data present in the world, it is next to impossible to manage data without proper databases. In today’s market, there are different kinds of databases, and deciding on the best database that suits your business can be an overwhelming task. So, in this article on SQL vs NoSQL, I will compare these two types of databases to help you choose which type can help you and your organization.

The following topics will be covered in this article:

So, let us get started, folks!!

What is SQL?

SQL, aka Structured Query Language, is the core of relational databases and is used for accessing and managing them. This language is used to manipulate and retrieve data from a structured data format in the form of tables, and it holds relationships between those tables. The relations could be as follows (a short code sketch follows the list):

Relationships in SQL - SQL vs NoSQL - Edureka

  • A One-to-One Relationship is when a single row in Table A is related to a single row in Table B.
  • A One-to-Many Relationship is when a single row in Table A is related to many rows in table B.
  • A Many-to-Many Relationship is when many rows in table A can be related to many rows in table B.
  • A Self-Referencing Relationship is when a record in table A is related to the same table itself.
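As a small illustration of a one-to-many relationship, here is a sketch using Python's built-in sqlite3 module; the table and column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        dept_id INTEGER REFERENCES department(id)  -- many employees, one department
    );
    INSERT INTO department VALUES (1, 'Engineering');
    INSERT INTO employee VALUES (1, 'Asha', 1), (2, 'Ravi', 1);
""")

rows = conn.execute("""SELECT e.name, d.name FROM employee e
                       JOIN department d ON e.dept_id = d.id""").fetchall()
print(rows)  # [('Asha', 'Engineering'), ('Ravi', 'Engineering')]
```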

Now, next in this article let us understand what is NoSQL?

What is NoSQL?

NoSQL, most commonly known as a Not-only-SQL database, provides a mechanism for the storage and retrieval of unstructured data. This type of database can handle a humongous amount of data and has a dynamic schema. So, a NoSQL database has no specific query language and no (or very few) relationships, but has data stored in the format of collections and documents.

So, a database can have ‘n’ number of collections, and each collection can have ‘m’ number of documents. Consider the example below.

Representation of NoSQL Database - SQL vs NoSQL - Edureka

As you can see from the above image, there is an Employee Database which has 2 collections, i.e. the Employee and Projects collections. Now, each of these collections has Documents, which are basically the data values. So, you can think of the collections as your tables and the Documents as the records in those tables.
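Here is a minimal sketch of collections and documents using the pymongo driver; it assumes a MongoDB server is running locally, and the document fields are illustrative.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["EmployeeDatabase"]

# A collection plays the role of a table; a document, of a record.
db["Employee"].insert_one({"name": "Asha", "role": "Developer", "age": 29})
db["Projects"].insert_one({"title": "Inventory App", "members": ["Asha"]})

print(db["Employee"].find_one({"name": "Asha"}))
```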

Alright, so now that you know what SQL and NoSQL are, let us see how these databases stand against each other.

SQL vs NoSQL

So, in this face off, I will be comparing both these databases based on the following grounds:

    1. Type of Database
    2. Schema
    3. Database Categories
    4. Complex Queries
    5. Hierarchical Data Storage
    6. Scalability
    7. Language
    8. Online Processing
    9. Base Properties
    10. External Support

Type of database

SQL is called a relational database as it organizes structured data into defined rows and columns, with each table being related to the other tables in the database.

NoSQL, on the other hand, is known as a non-relational database. This is because data is stored in the form of collections with no or few relations between them. 

Schema

SQL needs a predefined schema for structured data. So, before you start using SQL to extract and manipulate data, you need to make sure that your data structure is pre-defined in the form of tables. 

However, NoSQL databases have a dynamic schema for unstructured data. So, if you are using a NoSQL database, then there is no pre-defined schema, and the complete schema of your data depends entirely on how you wish to store it, i.e. which fields you want to store in documents and collections.

Database Categories

The SQL databases are table-based databases. So, you can have ‘n’ number of tables related to each other, and each table can have rows and columns which store data in each cell of the table.

Now, if we talk about NoSQL Databases, then NoSQL databases have the following categories of databases:

  • Document Database – It pairs each key with a complex data structure known as a document. A document can contain many different key-value pairs, key-array pairs, or even nested documents.
  • Key-value stores – They are the simplest NoSQL databases. Every single item in the database is stored as an attribute name, or key, together with its value.
  • Graph stores – They are used to store information about networks, such as social connections. Graph stores include Neo4J and HyperGraphDB.
  • Wide column stores – Wide column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.

So, SQL databases store data in the form of tables, and NoSQL databases store data in the form of key-value pairs, documents, graph databases or wide-column stores.

Complex Queries

SQL is a better fit for complex query environments when compared to NoSQL, as the schema in SQL databases is structured and data is stored in a tabular format. So, even if you wish to apply nested queries with many subqueries inside the outer query, you can easily do so by using the proper table and column names.

Now, the reason why NoSQL databases aren’t a good fit for complex queries is that they aren’t queried in a standard language like SQL.

Hierarchical Data Storage

Well, when we compare the databases on this factor, NoSQL fits better for hierarchical storage when compared to SQL databases.

This is because as the number of tables increases, the complexity of maintaining relations between them also keeps increasing. So, in such a scenario, you cannot relate a humongous number of tables with many columns in them to each other. But when you consider a NoSQL database, this kind of database fits better for hierarchical data storage, as it follows the key-value pair way of storing data, which is similar to JSON data.

Scalability

The SQL databases are vertically scalable. You can load balance the data servers by optimizing hardware such as increasing CPU, RAM, SSD, etc.

On the other hand, NoSQL databases are horizontally scalable. You can perform load balancing by adding more servers to your cluster to handle a large amount of traffic.

Language

The SQL databases have a specific language, and it does not vary from database to database. These databases use SQL (Structured Query Language) for retrieving and manipulating data.

The NoSQL databases have no specific language for queries; it varies from database to database. In NoSQL databases, the queries are mainly focused on collections of documents, and the language is known as UnQL (Unstructured Query Language).

Online Processing

On comparing SQL and NoSQL based on this factor, SQL databases are used for heavy-duty transactional applications, because SQL provides atomicity, integrity, and stability of the data. You can also use NoSQL for transactional purposes, but it is still not stable enough under high load and for complex transactional applications. So, you can understand that SQL is mainly used for OLTP (Online Transactional Processing) and NoSQL is mainly used for OLAP (Online Analytical Processing).

Base Properties

SQL databases are based on the ACID properties (Atomicity, Consistency, Isolation, and Durability), whereas the NoSQL databases are based on Brewer’s CAP theorem (Consistency, Availability, and Partition tolerance).

Let me explain the ACID properties first:

  • Atomicity: Atomicity refers to transactions that are either completely done or completely failed, where a transaction is a single logical operation on data. It means that if one part of any transaction fails, the entire transaction fails and the database state is left unchanged (see the sketch after this list).
  • Consistency: Consistency ensures that the data must meet all the validation rules. In simple words, you can say that a transaction never leaves the database without completing its state.
  • Isolation: The main goal of isolation is concurrency control, i.e. concurrent transactions do not interfere with each other.
  • Durability: Durability means that once a transaction has been committed, its effects persist whatever may come in between, such as power loss, a crash or any sort of error.
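Here is a minimal sketch of atomicity using Python's built-in sqlite3 module; the account table and the simulated failure are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 0)")
conn.commit()

try:
    conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
    raise RuntimeError("simulated crash mid-transaction")
    conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
    conn.commit()
except RuntimeError:
    conn.rollback()                       # the partial debit is undone

print(conn.execute("SELECT * FROM account").fetchall())  # [('A', 100), ('B', 0)]
```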

Coming to CAP Theorem,

Brewer’s CAP Theorem states that a database can achieve at most two out of three guarantees: Consistency, Availability and Partition Tolerance. Here,

  • Consistency: All the nodes see the same data at the same time.
  • Availability: Guarantees that every request receives a response indicating whether it succeeded or failed.
  • Partition Tolerance: Guarantees that the system continues to operate despite message loss or failure of part of the system.

NoSQL cannot provide consistency and high availability together.

External Support

All the SQL vendors offer excellent support, since SQL has been in existence for more than 40 years. For some NoSQL databases, however, only a limited number of experts are available, and you still have to rely on community support for your large-scale NoSQL deployments. This is because NoSQL came into existence only in the late 2000s and hasn't been explored as much yet.

So, if I have to summarize the differences for SQL and NoSQL in this article on SQL vs NoSQL, you can refer to the below table.

Key Areas | SQL | NoSQL
Type of database | Relational database | Non-relational database
Schema | Pre-defined schema | Dynamic schema
Database categories | Table-based databases | Document-based databases, key-value stores, graph stores, wide-column stores
Complex queries | Good for complex queries | Not a good fit for complex queries
Hierarchical data storage | Not the best fit | Fits better when compared to SQL
Scalability | Vertically scalable | Horizontally scalable
Language | Structured Query Language | Unstructured query language
Online processing | Used for OLTP | Used for OLAP
Base properties | Based on ACID properties | Based on the CAP theorem
External support | Excellent support provided by all SQL vendors | Relies on community support

Table 1: Differences between SQL and NoSQL – SQL vs NoSQL

So, folks, with this we come to an end of this face-off between SQL and NoSQL. Now that we have discussed so much about SQL and NoSQL, let me show you some examples of the same.

Examples of SQL and NoSQL

Examples of SQL and NoSQL are as follows:

Examples of SQL and NoSQL - SQL vs NoSQL - Edureka

Now, the most popular databases from SQL and NoSQL are MySQL and MongoDB.

So, next in this article on SQL vs NoSQL, we will be comparing MySQL and MongoDB. But, before that, you can also go through this video on SQL vs NoSQL.

SQL vs NoSQL – Difference B/W SQL & NoSQL Databases | Edureka

MySQL Logo - SQL vs NoSQL - Edureka

What is MySQL?

MySQL is an open-source relational database management system that works on many platforms. It provides multi-user access, supports many storage engines, and is backed by Oracle. You can also buy a commercial license version from Oracle to get premium support services.

The following are the features of MySQL:

Features of SQL - SQL vs NoSQL - Edureka

  • Ease of Management – The software can be downloaded easily, and it also uses an event scheduler to schedule tasks automatically.
  • Robust Transactional Support – Holds the ACID (Atomicity, Consistency, Isolation, Durability) property, and also allows distributed multi-version support.
  • Comprehensive Application Development – MySQL has plugin libraries to embed the database into any application. It also supports stored procedures, triggers, functions, views and many more for application development. You can refer to the RDS Tutorial to understand Amazon's RDBMS.
  • High Performance – Provides fast load utilities with distinct memory caches and table index partitioning.
  • Low Total Cost Of Ownership – This reduces licensing costs and hardware expenditures.
  • Open Source & 24 * 7 Support –  This RDBMS can be used on any platform and offers 24*7 support for open source and enterprise edition.
  • Secure Data Protection – MySQL supports powerful mechanisms to ensure that only authorized users have access to the databases.
  • High Availability – MySQL can run high-speed master/slave replication configurations and it offers cluster servers.
  • Scalability & Flexibility – With MySQL you can run deeply embedded applications and create data warehouses holding a humongous amount of data.

MongoDB Logo - SQL vs NoSQL - Edureka

Next in this article, let us understand what MongoDB is.

What is MongoDB?

MongoDB is a non-relational database which stores the data in documents. This type of database stores the related information together for quick query processing.

The features of MongoDB are as follows:

  • Indexing: Indexes are created in order to improve search performance (see the shell sketch after this list).
  • Replication: MongoDB distributes the data across different machines.
  • Ad-hoc Queries: It supports ad-hoc queries by indexing the BSON documents & using a unique query language.
  • Schema-less: It is very flexible because it is schema-less; the database itself is written in C++.
  • Sharding: MongoDB uses sharding to enable deployments with very large data sets and high-throughput operations.
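For example, two of these features in mongo shell form (the employees collection is illustrative):

db.employees.createIndex({ EmpName: 1 })   // indexing to speed up lookups by name
db.employees.find({ Age: { $gt: 21 } })    // an ad-hoc query over BSON documents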

Alright, so now that you know what MySQL & MongoDB are, let us see how these databases stand against each other.

MySQL vs MongoDB

So, in this face off, I will be comparing both these databases based on the following grounds:

    1. Query Language
    2. Flexibility of Schema
    3. Relationships
    4. Security
    5. Performance
    6. Support
    7. Key Features
    8. Replication
    9. Usage
    10. Active Community

Query Language

MySQL uses the Structured Query Language (SQL). This language is simple and consists mainly of DDL, DML, DCL & TCL commands to retrieve and manipulate data. MongoDB, on the other hand, uses an unstructured query language: the query language is basically the MongoDB query language. Refer to the image below.

Insert Data - SQL vs NoSQL - Edureka

Flexibility of Schema

MySQL has good schema flexibility for structured data, as you just need to clearly define tables and columns. MongoDB, on the other hand, has no restrictions on schema design. You can directly place a couple of documents inside a collection without having any relations between them. The only catch with MongoDB is that you need to optimize your schema based on how you want to access the data.

Relationships

On comparing MySQL and MongoDB based on this factor, MySQL supports relationships with the help of JOIN statements, but MongoDB does not support JOIN statements. Instead, it supports placing one document inside another document (also known as embedding of documents) and multi-dimensional data types such as arrays.
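A rough mongo shell sketch of embedding in place of a JOIN (collection and field names are illustrative):

// Where SQL would use two tables and a JOIN, MongoDB can embed the related data:
db.employees.insertOne({
  EmpName: "Vardhan",
  Projects: [                                  // embedded array instead of a joined table
    { Title: "Payroll", Role: "Developer" },
    { Title: "CRM", Role: "Lead" }
  ]
})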

 Security

MySQL basically uses a privilege-based security model. This kind of security model authenticates a user and grants the user privileges on a particular database.

MongoDB, on the other hand, uses a role-based access control with a flexible set of privileges providing security features such as authorization, and authentication.

Performance

On comparing MySQL and MongoDB on this parameter, let me tell you that MySQL is quite slow in comparison to MongoDB when large databases are considered. This is mainly because MySQL cannot be used efficiently for large amounts of unstructured data.

However, MongoDB has the ability to handle large unstructured data. So, it is faster than MySQL where large databases are considered, as it allows users to query in such a way that the load on the servers is reduced.

NOTE: There is no hard and fast rule that MongoDB will be faster for your data all the time; it completely depends on your data and infrastructure.

Support

Well, both of them offer excellent 24*7 support for security fixes, maintenance releases, bug fixes, patches, and updates. So, there is no real difference between them based on this parameter.

Key Features

You can refer to the following image for the key features of MySQL and MongoDB:

Key Features - SQL vs NoSQL - Edureka

Replication

MySQL supports master-slave replication and master-master replication. MongoDB, on the other hand, supports built-in replication, sharding, and auto-elections. With the help of auto-elections in MongoDB, you can set up a secondary database that automatically takes over if the primary database fails.

Usage

You can refer to the following image for understanding where to use MySQL and MongoDB:

Key Features - SQL vs NoSQL - Edureka

Active Community

On comparing MySQL with MongoDB based on this factor, MySQL offers a better community than MongoDB, as it is owned and maintained by the Oracle Corporation.

So, if I have to summarize the differences between MySQL and MongoDB, you can refer to the below table.

Key Areas | MySQL | MongoDB
Query language | Uses Structured Query Language (SQL) | Uses the MongoDB query language
Flexibility of schema | Pre-defined schema design | No restrictions on schema design
Relationships | Supports JOIN statements | Does not support JOIN statements
Security | Uses a privilege-based security model | Uses role-based access control
Performance | Slower than MongoDB for large databases | Faster than MySQL for large databases
Support | Provides excellent 24*7 support | Provides excellent 24*7 support
Key features | Triggers & SSL support; text searching and indexing; query caching; integrated replication support; different storage engines | Auto-sharding; comprehensive secondary indexes; in-memory speed; native replication; embedded data model support
Replication | Supports master-slave replication | Supports built-in replication, sharding, and auto-elections
Usage | Best fit for data with tables and rows; works better for small datasets; frequent updates; strong dependency on multi-row transactions; modifying large volumes of records | Best fit for unstructured data; works better for large datasets; high write loads; high availability in an unstable environment; location-based data
Active community | Has a good active community, much larger than that of MongoDB | Smaller community than MySQL

Table 2: Differences between MySQL and MongoDB – SQL vs NoSQL

So, folks, with this we come to an end of this face-off between MySQL and MongoDB. Now, knowing so much more about MySQL and MongoDB might have raised a question in your mind, i.e. whether businesses should go for MySQL or MongoDB.

Well, there is no clear winner between the two. The choice of database completely depends upon the schema of your database and how you wish to access it. Nevertheless, you can use MySQL when you have a fixed schema, high transaction volumes, low maintenance needs, and data security requirements on a limited budget, and MongoDB when you have an evolving schema and need high availability, cloud deployment, and built-in sharding.

So, there won't be any final verdict as to which of them is the best, as each excels based on your requirement.

Now, that you know the differences between MySQL and MongoDB, next in this article on SQL vs NoSQL let me show you how to insert data into tables and collections in MySQL Workbench and MongoDB Compass respectively.

Demo: Insert Data Into Tables And Collections 

Let us start with inserting data into a table using MySQL Workbench.

Insert data into a table using MySQL Workbench

To insert data into tables using MySQL Workbench, you can follow the below steps:

Step 1: Open MySQL Workbench and create a connection. To know how to create a connection, you can refer to the MySQL Workbench Tutorial.

Step 2: Now, once your connection has been created, open your connection and then you will be redirected to the following dashboard.

MySQL Workbench - SQL vs NoSQL - Edureka

Step 3: Now, to create a database and a table, use the below queries:


//Create Database
CREATE DATABASE Employee_Info;
//Use Database
USE Employee_Info;
//Create Table
CREATE TABLE Employee
(EmpID int,
EmpFname varchar(255),
EmpLname varchar(255),
Age int,
EmailID varchar(255),
PhoneNo int8,
Address varchar(255));

Step 4: Now, once your table is created, to insert values into the table, use the INSERT INTO syntax as below:


//Insert Data into a Table
INSERT INTO Employee(EmpID, EmpFname, EmpLname,Age, EmailID, PhoneNo, Address)
VALUES ('1', 'Vardhan','Kumar', '22', 'vardy@abc.com', '9876543210', 'Delhi');
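To view the inserted row (Step 5 below), you can run a simple SELECT:

//View Data in the Table
SELECT * FROM Employee;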

Step 5: When you view your table, you will see the output as below.

Table - SQL vs NoSQL - Edureka

Now, next in this article on SQL vs NoSQL, let us see how to create databases and collections in MongoDB Compass.

Insert data into a collection using MongoDB Compass

To insert data into collections using MongoDB Compass, you can follow the below steps:

Step 1: Open MongoDB Compass and create a host. Once your host is created, click on Connect. Refer below.

Create Database - SQL vs NoSQL - Edureka

Step 2: Now, once your host is connected, to create a database, click on the Create Database option and mention the Database and the Collection Name.

Step 3: Now, open your database, and choose the collection. Here I have chosen samplecollection. To add documents into the collection, choose the Insert Document option and mention the parameters. Here I have mentioned the EmpID and EmpName.

Create Documents - SQL vs NoSQL - Edureka
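If you prefer the mongo shell over Compass, a rough equivalent of the insert above would be the following (the database name Employee_Info is reused from the MySQL demo purely for illustration):

use Employee_Info                                  // switch to (or create) the database
db.samplecollection.insertOne({ EmpID: 1, EmpName: "Vardhan" })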

Now with this, we come to an end of this comparison of SQL vs NoSQL. I hope you enjoyed this article and understood all the differences. If you have read this far, you should have a clear idea about which database will suit your needs.

Now that you have understood the comparison between SQL & NoSQL, check out the MySQL DBA Certification Training and MongoDB Certification Training by Edureka, a trusted online learning company with a network of more than 250,000 satisfied learners spread across the globe.

Got a question for us? Please mention it in the comments section of “SQL vs NoSQL” and we will get back to you.

The post Differences Between SQL & NoSQL Databases – MySQL & MongoDB Comparison appeared first on Edureka.

How to Write a Good Test Case in Software Testing?


The prime objective of any software project is to get a high-quality output while reducing the cost and the time required for completing the project. To achieve that, companies test their software before they release it to the market. Documentation plays a critical role in achieving effective software testing. In this article let’s explore more about a documentation type called test case in software testing.

Listed below are the topics covered in this article:

You can go through this lecture on test case in software testing, where our Software Testing Training expert discusses each and every nitty-gritty of the technology.

 

How To Write A Test Case? | Test Case In Software Testing | Edureka

 

Is Documentation Needed in Software Testing?

Yes! It is. Documentation plays a critical role in test automation. Here's an example to convince you.

A company, let's call it 'ABC', delivered a project (with an unknown issue) to one of its clients. The issue was found at the client side, which created a very bad situation for the company. As always, all the blame fell on the Quality Analysts (QAs).

The issue was something regarding the compatibility of one website. When the issue was presented to the higher authorities, they showed the client written proof that no requirement asking to check the compatibility of the website had ever been received. So, the issue was resolved peacefully. In a way, the documented requirements saved the company from getting sued. That's how documentation came in very handy.

There are different levels of documentation, like:

Documentation - Test Case in Software Testing - Edureka

      • Test Script: A line-by-line description of all the actions and data needed to perform a test.
      • Test Case: Describes a specific idea that is to be tested, without detailing the exact steps to be taken or data to be used.
      • Test Scenario: It is a simple description of an objective a user might face when testing.

Moving further with this article on ‘Test Case in Software Testing’ let’s learn more about test cases in particular.

What is a Test Case in Software Testing?

A test case is a document which has a set of conditions or actions that are performed on the software application in order to verify the expected functionality of the feature.

After test scripts, test cases are the second most detailed way of documenting testing work. They describe a specific idea that is to be tested, without detailing the exact steps to be taken or the data to be used. For example, in a test case, you document something like 'Test if coupons can be applied on the actual price'. This doesn't mention how to apply the coupons or whether there are multiple ways to apply them. It also doesn't mention whether the tester uses a link to apply a discount, enters a code, or has a customer service representative apply it. Test cases give the tester flexibility to decide how they want to execute the test.

Apart from this, what is the use of test cases?

Benefits of Writing Test Cases

The key purpose of a test case is to ensure that different features within an application are working as expected. It helps the tester validate that the software is free of defects and working as per the expectations of the end users. Other benefits of test cases include:

      • Test cases ensure good test coverage
      • Help improve the quality of the software
      • Decrease the maintenance and software support costs
      • Help verify that the software meets the end-user requirements
      • Allow the tester to think thoroughly and approach the tests from as many angles as possible
      • Test cases are reusable for the future – anyone can reference them and execute the test

     

So, these are a few reasons why test cases are extremely useful in software testing. Test cases are powerful artifacts that serve as a good source of truth for how a system and a particular feature of the software work. However, before we deep-dive into the lessons for writing top-notch test cases, let us get a basic idea of the terminologies associated with them.

Test Case Format

The primary ingredients of a test case are an ID, a description, a bunch of inputs, a few actionable steps, and the expected and actual results. Let's learn what each of them is:

    • Test Case Name: A test case should have a name or title that is self-explanatory.
    • Test Case Description: The description should tell the tester what they’re going to test in brief.
    • Pre-Conditions: Any assumptions that apply to the test and any preconditions that must be met prior to the test being executed should be listed here.
    • Test Case Steps: The test steps should include the necessary data and information on how to execute the test. The steps should be clear and brief, without leaving out essential facts.
    • Test Data: It’s important to select a data set that gives sufficient coverage. Select a data set that specifies not only the positive scenarios but negative ones as well.
    • Expected Result: The expected results tell the tester what they should experience as a result of the test steps.
    • Actual Result: It specifies how the application actually behaved while the test cases were being executed.
    • Comments: Any other useful information, such as screenshots, that the tester wants to include can be added here.

This is the typical format that testers follow when they write a test case, as illustrated in the worked example below. Along with these parameters, testers can include additional parameters like test case priority, type of test case, bug id, etc.
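Putting the format together, a minimal worked example (all values are illustrative) might look like this:

Test Case Name:  Verify login with valid credentials
Description:     A registered user can log in successfully
Pre-Conditions:  User account exists; login page is reachable
Test Steps:      1. Open the login page
                 2. Enter a valid email and password
                 3. Click the 'Login' button
Test Data:       email = user@example.com, password = <valid password>
Expected Result: User is redirected to the home page
Actual Result:   (filled in during execution)
Comments:        Attach a screenshot on failure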

Now that we are familiar with the format, let’s go one step ahead in the ‘Test Case in Software Testing’ article and learn about different techniques that you can use to write test cases.

Test Case Design Techniques

An efficient test case design technique is necessary to improve the quality of the software testing process. It helps to improve the overall quality and effectiveness of the released software. The test case design techniques are broadly classified into three major categories:

Specification-Based (Black-Box Techniques): These techniques can be used to design test cases in a systematic format. They use external descriptions of the software, such as technical specifications, design, and client requirements, to derive test cases. With this type of test case design technique, testers can develop test cases that save testing time and allow full test coverage.

Structure-Based (White-Box Techniques): These techniques design test cases based on the internal structure of the software program and code. Developers go into the minute details of the developed code and test it piece by piece.

Experience-Based Techniques: These techniques are highly dependent on the tester's experience to understand the most important areas of the software. They are based on the skills, knowledge, and expertise of the people involved.

This 'Test Case in Software Testing' article further lists the different techniques which come under the design categories specified above.

Test Case Design Techniques - Test Case in Software Testing - Edureka

Successful application of any of these test case design techniques will render test cases that ensure the success of software testing. In the remainder of this 'Test Case in Software Testing' article, let's check out how to write a good test case.

Demo: How To Write a Test Case in Software Testing?

Here are the simple steps to get you started.

Preparing to write a test case

  1. Check if a test case already exists. If yes, consider updating the existing test case rather than writing a new one.
  2. Make sure the test case has certain characteristics like accuracy, tracing, repetition, re-usability, and independence.
  3. Consider all the different scenarios possible before writing.
  4. Give yourself enough time to write test cases.

Writing a test case

  1. Select a tool for writing a test case.
  2. Write a test case in the format discussed earlier.
  3. Write basic test statements.
  4. Review written test cases thoroughly.

Here’s a sample test case for checking login functionality, though I have added just two possibilities.

Demo - Test Case - Test Case in Software Testing - Edureka

Knowing how to write good test cases is extremely important. It doesn't take too much effort and time to write effective test cases as long as you follow certain guidelines.

Test Cases Best Practices

Test cases are very important for any project, as they are the first step in any testing cycle. If anything goes wrong at this step, it might have undesirable impacts as you move forward in the software testing life cycle. A few guidelines that you need to follow while writing test cases are:

    • Prioritize which test cases to write based on the project timelines and the risk factors of your application.
    • Remember the 80/20 rule. To achieve the best coverage, 20% of your tests should cover 80% of your application.
    • Don't try to write all test cases in one attempt; instead, improve them as you progress.
    • List down your test cases and classify them based on business scenarios and functionality.
    • Make sure test cases are modular and test case steps are as granular as possible.
    • Write test cases in such a way that others can understand them easily & modify if required.
    • Always keep end users' requirements in the back of your mind, because ultimately the software is designed for the customer
    • Actively use a test management tool to manage a stable release cycle.
    • Monitor your test cases regularly. Write unique test cases and remove irrelevant & duplicate test cases.

Well, I could keep going, but there are way too many guidelines to list here. The ones above should be good enough for you to get started with writing test cases. Hope the things that you have learned here today will help you as you head out on your software testing journey.

If you found this “Test Case in Software Testing” article relevant, check out the live-online Selenium Certification Training by Edureka, a trusted online learning company with a network of more than 250,000 satisfied learners spread across the globe. 

Got a question for us? Please mention it in the comments section of this ‘Test Case in Software Testing’ article and we will get back to you.

The post How to Write a Good Test Case in Software Testing? appeared first on Edureka.

Automation Anywhere Control Room – One Stop Solution To The Brain of Automation Anywhere


Since the time automation came into existence, various tools have been emerging in the market. Automation Anywhere is one such popular RPA tool used in today's market to automate various kinds of tasks. This obviously makes us realize the need to enhance our RPA skills and understand the architecture of this tool. In this article on Automation Anywhere Control Room, I am going to discuss the following topics:

What is Automation Anywhere?

Automation Anywhere Logo - Automation Anywhere Control Room - Edureka

Automation Anywhere is an RPA tool which aims to provide its users with a digital workforce composed of software bots. These bots are responsible for completing an end-to-end process while providing scalability and security.

This tool has recently launched a Community Edition, which will help you first explore the tool and automate tasks. Once you are proficient enough with this tool, it also offers an Enterprise Edition with a 30-day free trial.

Now that you know what Automation Anywhere is, let us next look into its architecture.

Automation Anywhere Architecture

The architecture of Automation Anywhere is a distributed architecture. It has mainly 3 components which work together to help the user achieve his tasks: the Bot Creators, the Control Room, and the Bot Runners. The Bot Creators and the Bot Runners are connected to the Control Room, and the Control Room is the brain of Automation Anywhere.

Now, let us understand each of these components one by one.

Bot Creators

Bot Creators are simply used to create bots. These are desktop-based applications whose sole role is to upload or download bots and connect them to the Control Room. Multiple users can create and configure bots for the Control Room.

Bot Runners

Bot Runners are responsible for running or executing the scheduled bots. Multiple bots can be executed in parallel, but Bot Runners cannot update or create automation. This component of Automation Anywhere is also connected to the Control Room and reports the execution-log status back to it.

Control Room

The Control Room is the most important component of the architecture. It is a web server that controls the bots created by the Bot Creators and also handles the automation executed by the Bot Runners. The Control Room ensures centralized management by offering features such as centralized user management, automation deployment, source control, and so on. You can refer to the below image for a summary of the Control Room.

Automation Anywhere Control Room - Automation Anywhere Control Room - Edureka

Now, once you have downloaded your Community Edition or the Enterprise Edition, you have to log in to the Control Room using the credentials, which you might have got in an email. Refer below.

Automation Anywhere Control Room Login - Automation Anywhere Control Room - Edureka

Once you have logged in, you will be redirected to the dashboard of Control Room. Refer below.

Control Room Dashboard - Automation Anywhere Control Room - Edureka

As you can see from the above image, there are a few components of the dashboard which you need to understand in order to use the Control Room. So, next in this article, let us look into the components of the Control Room.

Control Room Components

The components of the Control Room are as follows:

Control Room Dashboard Components - Automation Anywhere Control Room - Edureka

Dashboards:

The Dashboard feature of Control Room basically provides stats of various parameters. It has the following 5 options under it:

  • Home – The Home dashboard shows the bot run status, total queues, total bot schedules, total users, bot velocity and the capacity of bots vs bot runners.
  • Bots – The Bots dashboard shows the bot heartbeat, MVP bots, bot status, failure reasons, and the upcoming schedule of bots.
  • Devices – The Devices dashboard focuses on hardware utilization such as CPU utilization, memory utilization, HDD utilization, upcoming device utilization, and failure analysis.
  • Workload – The Workload dashboard has two tabs: the Executive Dashboard and the Operation Manager's Dashboard. The Executive Dashboard provides visualizations for queue status, queues with average time, device pools by backlog and queues by time to complete. Similarly, the Operation Manager's Dashboard provides visualizations for device pools by FTE, pools by decreasing error rate, queues with average time, and device pools by backlog.
  • Insight – Bot Insight focuses on providing real-time business insights about the digital workforce.

Activity:

The activity feature of the Control Room dashboard is mainly used to check the history and the progress of the scheduled activities. It has the following 3 options under it:

  • In progress: The In Progress Activity tab is used to check the status of the activity which is currently executing. It gives all the details related to the activity such as the username, bot name, item name, progress, etc.
  • Scheduled: The Scheduled activity tab is used to check the details of the activity which is scheduled to execute sometime later. It gives all the details related to the activity such as the username, bot name, item name, progress, etc.
  • Historical: The historical activity tab is used to check the details of the activity which might have completed, finished, stopped or timed out.  It gives details related to the device name, automation name, bot name, user, started on, etc.

Bots:

The Bots feature of the Control Room dashboard lets you see the details of the tasks created, such as the task name, credentials, client name, etc. It has the following options under it:

  • My Bots: The My Bots tab lists the tasks and their details such as the name, client name, and last modified date.
  • Credentials: The Credentials tab displays the credentials, lockers, and credential requests.

Devices:

The Devices feature displays whether a user is logged into the client UI on the device or not. If the username is disabled, the device is shown as 'Offline', and if the username is deleted, the device name will no longer appear.

Workloads:

The Workload feature of the Control Room dashboard displays the load of the queues with details such as queue name, automation name, automation status, bot name, and device pool.

So folks! That's the end of this article on Automation Anywhere Control Room. I hope you understood what the Control Room is and what the different components of its dashboard are. Now, if you wish to give a jump start to your career as an RPA Developer, then start learning RPA and its various tools.

We at Edureka offer Robotic Process Automation Training using UiPath. Edureka is a trusted online learning company with a network of more than 250,000 satisfied learners spread across the globe. This training will help you gain deep knowledge in Robotic Process Automation and hands-on experience in UiPath.

Got a question for us? Please mention it in the comments section of this Automation Anywhere Control Room article and we will get back to you.

The post Automation Anywhere Control Room – One Stop Solution To The Brain of Automation Anywhere appeared first on Edureka.

Node JS Installation – Know How To Download & Install Node.js


Node.js is one of the most powerful JavaScript frameworks and has been acting as a backbone of front-end development for ages. This is why front-end developers worldwide actively opt for Node.js Certification. You can check out my Node.js Tutorial to get some basic insights into it. But before you get started, you need to install Node.js on your system. Through the medium of this Node JS Installation article, I will give you a step-by-step explanation of how it's done.

In order to get started with Node.js, you need to install the following on your Windows system:

  1. Node.js Installation
  2. NPM (Node Package Manager)
  3. IDE or Text Editor

Node.js Installation

To begin with the Node JS installation process, first make sure you have sufficient disk space and at least 4 GB of RAM.

STEP 1: You can download Node.js from its official site: https://nodejs.org/en/download/.

Installation Step 1 - Node JS Installation - Edureka

STEP 2: On the downloads page, you will see various versions of Node. All you need to do is click on the box suitable to your system configuration.

STEP 3: Once you have successfully downloaded the software, go to the downloaded file, and double-click on it.

STEP 4: As soon as you click on the file, an installation wizard will come up. Select ‘Next’ and move further with the installation.

Installation Step 4 - Node JS Installation - Edureka

STEP 5: Check the "I Agree" checkbox and click on the 'Next' button.

Installation Step 5 - Node JS Installation - Edureka

STEP 6: By clicking on 'Change', set the path where you want to install the Node.js files, and then click on 'Next'.

Installation Step 6 - Node JS Installation - Edureka

STEP 7: Again, click on 'Next'.

Installation Step 7 - Node JS Installation - Edureka

STEP 8: Now, click on the 'Install' button to finish the installation process.

Installation Step 8 - Node JS Installation - Edureka

STEP 9: Once done with the installation, click on the ‘Finish’ button to exit the installation wizard.

Installation Step 9 - Node JS Installation - Edureka
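A quick way to confirm the installation (assuming the installer added Node.js to your PATH, which is the default) is to check the installed versions from a command prompt:

node -v
npm -v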
With this, you are done with the most important part of the Node.js installation. Now, let me throw some light on another core module of Node.js without which Node.js won't work.

NPM (Node Package Manager) Installation

NPM is the default package manager for Node.js and is completely written in JavaScript. It helps in managing the packages and modules required for Node.js to function, and it provides a command-line client, npm. Packages in Node.js are the entities which bundle up all the necessary files required for a module. By modules, I mean the JavaScript libraries which can be incorporated into your project. Apart from this, using the package.json file, Node.js can easily install all the required dependencies of a project. You can also update or install various packages using npm.

From Node.js version 0.6.3 onwards, npm has been included as a default package of Node.js, so there is no need to install it explicitly. Let me directly show you how you can install, update and uninstall packages via npm. For installing, run:

npm install package_name
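Similarly, updating and removing a package follow the same pattern:

npm update package_name
npm uninstall package_name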

IDE/ Text Editor Installation

Now, since you are done with the Node.js installation part, you need to start writing programs. You can write your programs either in a text editor or in one of the various IDEs available for free. Below, I have listed the most popular IDEs and text editors preferred by Node.js developers.

The top 5 IDEs and text editors are listed below:

  1. Cloud9
  2. IntelliJ
  3. Webstorm
  4. Brackets
  5. Sublime

To verify whether your installation was successful or not, try executing a Node.js file.

Open your IDE/Text Editor and type in the below code.

console.log('Welcome To Edureka NodeJS Tutorial!');

Save the file with a .js extension such as filename.js.

Now open your Node.js command prompt and navigate to the folder containing the js file. Then type in the below command and hit Enter to display the text in your console.

node filename.js

With this, we come to an end of this article on Node.js installation. I tried to keep the steps as detailed as possible for your better understanding, and I hope it gave you a clear picture of the entire Node.js installation process. Your system is now ready to execute any Node.js program. To get detailed insights into Node.js fundamentals, you can refer to my Node.js Tutorial.

If you found this "Node.js Installation" article relevant, check out the Node.js Certification Training by Edureka, a trusted online learning company with a network of more than 250,000 satisfied learners spread across the globe.

Got a question for us? Please mention it in the comments section of this Node.js Tutorial and we will get back to you.

The post Node JS Installation – Know How To Download & Install Node.js appeared first on Edureka.


Regression Testing Complete Guide: Everything You Need To Know


Whenever new software is released, the need to test the new functionality is obvious. However, it's equally important to re-run old tests that the application previously passed. That way, we can be sure that the new software does not re-introduce old defects or create new ones. We call this type of testing regression testing. Throughout this article, we will explore regression testing in detail. If you are new to software testing, be sure to also read the Beginners' Guide for Software Testing.

Let’s take a look at topics covered in this article:

What is Regression Testing?

“Testing of a previously tested program following modification to ensure that defects have not been introduced or uncovered in unchanged areas of the software, as a result of the changes made is called Regression Testing.”

A regression test is a system-wide test whose main purpose is to ensure that a small change in one part of the system does not break existing functionality elsewhere in the system. If you consider regression as unintended change, then this type of testing is the process of hunting for those changes. In simple terms, it is all about making sure that old bugs don’t come back to haunt you. Let’s take a look at a fictitious example that illustrates the concept. 

Regression TestingEx - What is Regression Testing - Edureka

When adding a new payment type to a shopping website, re-run old tests to ensure that the new code hasn’t created new defects or re-introduced old ones. Regression testing is important because, without it, it’s quite possible to introduce intended fixes into a system that create more problems than they solve. 

Benefits of Regression Testing

Conducting regression tests benefits companies in a number of ways such as:

  • It increases the chance of detecting bugs caused by changes to software and application
  • It can help catch defects early and thus reduce the cost to resolve them
  • Helps in researching unwanted side effects that might have occurred due to a new operating environment
  • Ensures better performing software due to early identification of bugs and errors
  • Most importantly, it verifies that code changes do not re-introduce old defects

Regression testing ensures the correctness of the software so that the best version of the product is released to the market. However, in the real world, designing and maintaining a near-infinite set of regression tests is just not feasible. So you should know when to apply regression testing.

When to apply Regression Testing?

It is recommended to perform regression testing on the occurrence of the following events:

    • When new functionalities are added
    • In case of changed requirements
    • When there is a defect fix
    • When there are performance issues
    • In case of environment changes
    • When there is a patch fix

Next part of this article is about different types of regression testing.

What are the types of Regression Testing?

Regression testing is done through several phases of testing. It is for this reason, that there are several types of regression testing. Some of them are as follows:

Unit Testing: In unit testing, when coding changes are made to a single unit, a tester, usually the developer responsible for the code, re-runs all previously-passed unit tests. In continuous development environments, automated unit tests are built into the code, making unit testing very efficient in comparison to other types of testing.
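For instance, a minimal sketch of such an automated unit regression test in Java (JUnit 4 style; the DiscountCalculator class is hypothetical) that simply gets re-run after every code change:

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class DiscountCalculatorTest {
    // A previously passing test guarding old behaviour: re-run on every build
    @Test
    public void couponIsAppliedToActualPrice() {
        DiscountCalculator calc = new DiscountCalculator();
        // a 10% coupon on 200.0 should still yield 180.0 after any change
        assertEquals(180.0, calc.apply(200.0, 0.10), 0.001);
    }
}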

Progressive Testing: This type of testing works effectively when changes are made to the software/application specifications and new test cases are designed.

Selective Testing: In selective testing, testers use a subset of the current test cases to cut down the retesting cost and effort. A test unit must be re-run if and only if any of the program entities it covers have been changed.

Retest-All Testing: This strategy involves testing all aspects of a particular application as well as reusing all test cases, even where changes have not been made. It is time-consuming and not of much use when only a small modification or change has been made to the application.

Complete Testing: This testing is very useful when multiple changes have been made to the existing code. Performing this testing is highly valuable for identifying unexpected bugs. Once this testing is completed, the final system can be made available to the user.

It is very important to know which type of testing suits your requirement. Next up, we will discuss how regression testing is implemented.

How is Regression Testing Implemented?

The procedure to implement regression testing is like the one you apply for any other testing process. Every time the software undergoes a change and a new release comes up, the developer carries out these steps as part of the testing process:

  • First of all, he executes unit-level regression tests to validate code that they have modified, along with any new tests they have written to cover new or changed functionality
  • Then the changed code is merged and integrated to create a new build of the application under test(AUT)
  • Next, smoke tests are executed for assurance that the build is good before any additional testing is performed
  • Once the build is declared good, integration tests are performed to verify the interaction between units of the application with each other and with back-end services such as databases
  • Depending on the size and scope of the released code, either a partial or a full regression is scheduled
  • Defects are then reported back to the development team
  • Additional rounds of regression tests are performed if needed

That's how regression testing is incorporated into a typical software testing process. The image below clearly depicts how regression testing is performed.

Regression Testing Process - What is regression testing - Edureka

Whenever some changes are made to the source code, the program execution may fail. After the failure, the source code is debugged in order to identify the bugs in the program, and appropriate modifications are made. Then the appropriate test cases are selected from the already existing test suite, covering all the modified and affected parts of the source code. New test cases are added if required. In the end, testing is performed using the selected test cases. Now you might be wondering which test cases to select.

Effective Regression Tests can be done by selecting the following test cases:

  • Test cases which have frequent defects
  • Complex test cases
  • Integration test cases
  • Test cases which cover the core functionality of a product
  • Functionalities which are frequently used
  • Test cases which frequently fail
  • Boundary value test cases

With the regression testing process out of the way let’s check out various techniques.

Regression Testing Techniques

Regression testing simply confirms that modified software hasn’t unintentionally changed and it is typically performed using any combination of the following techniques:

Retest-All: This method simply re-tests the entire software suite, from top to bottom. In many cases, the majority of these tests are performed by automated tools; at times, automation is not even necessary. This technique is expensive, as it requires more time and resources than the other techniques.

Test Selection: Instead of choosing all the test cases, this method allows the team to choose a set of tests that approximates full testing of the test suite. The primary advantage of this practice is that it requires far less time and effort to perform. It is usually done by developers, who will typically have better insight into the nuances of test edge cases and unexpected behaviors.

Test Case Prioritization: The goal of this technique is to prioritize a limited set of test cases by considering the more impactful test cases ahead of less important ones. Test cases which could impact both current and future builds of the software are chosen.

These are the three major techniques. At times, based on testing requirements, these techniques are combined.

As useful as regression testing can be, it is not without its negative points. You need to understand the challenges that you might face when implementing it.

Challenges of Regression Testing

  1. Time-Consuming: Techniques like retest-all need a lot of time to test the entire suite of test cases
  2. Expensive: Costly because of the resources and manpower needed to test again and again something which has already been developed, tested and deployed at early stages
  3. Complex: As the product expands, testers are often overwhelmed by the huge number of test cases and fall victim to losing track of them, overlooking important test cases

Despite these negative points, regression testing is very useful in the software testing process. With regression testing, companies can prevent projects from going over budget, keep their team on track, and, most importantly, prevent unexpected bugs from damaging their products. With this, we have reached the end of the blog. Hope the things that you have learned here today will help you as you head out on your software testing journey.

If you found this article relevant, check out the live-online Selenium Certification Training by Edureka, a trusted online learning company with a network of more than 250,000 satisfied learners spread across the globe. 

Got a question for us? Please mention it in the comments section of this ‘What is Regression Testing?’ article and we will get back to you.

The post Regression Testing Complete Guide: Everything You Need To Know appeared first on Edureka.

All You Need To Know About Page Object Model In Selenium


Maintaining 1000 lines of code in a single class file is a cumbersome task, and it also increases complexity. In order to maintain the project structure and the efficient performance of Selenium scripts, it is necessary to use different pages for different tasks. To ease distributing the code into different modules, the Page Object Model comes to the rescue. In this article on Page Object Model in Selenium, you will learn some of the core concepts of Page Object Model and Page Factory with the help of an example.

Below are the topics covered in this article:

You may also go through this recording of Selenium Projects by Selenium Certified Experts where you can understand the topics in a detailed manner with examples.

What is the Page Object Model?

Page Object Model is a design pattern in test automation to create an Object Repository for web UI elements. Every web page in the application should have a corresponding page class. This Page class will find the WebElements and also may contain page methods which perform operations on those WebElements.

Page object Model Design Pattern - Page Object Model in Selenium - Edureka

The tests then use the methods of this page object class whenever they need to interact with the UI of that page. The benefit here is that if the page UI changes, the tests themselves need not be changed; only the code within the page object needs to change. After that, all the changes that support the new UI are located in one place. That's why locators and test scripts are stored separately.

Now let's see why you need the Page Object Model.

Why Page Object Model?

The below-mentioned points depict the need for the Page Object Model in Selenium.

  1. Duplication of Code: Increasing automation test coverage results in an unmaintainable project structure if locators are not managed properly. This usually happens because of duplication of code, mainly due to duplicated usage of locators.
  2. Less Time Consumption: The chief problem with script maintenance is that if ten different scripts use the same page element, then with any change in that element, you need to change all ten scripts. This is time-consuming and error-prone. One of the best approaches to script maintenance is to create a separate class file which finds web elements, fills them or verifies them.
  3. Code Maintenance: In the future, if there is a change in a web element, you need to make the change in just one class file and not in 10 different scripts. This is achieved with POM, and it makes code reusable, readable and maintainable. For example, on the home page of a web application, there is a menu bar which leads to different modules with different features. While performing automation testing, many test cases would click through these menu buttons to execute specific tests.
  4. Reformation of Automation Test Scripts: Now assume that the user interface is revamped and all the menu buttons are relocated to a different position on the home page. This will cause the automation tests to fail, as the scripts will not be able to find the particular element locators to perform an action. The QA engineer then needs to walk through the whole code to update locators where necessary. Reforming the element locators in duplicated code consumes a lot of time that could instead be spent increasing test coverage. Here, you can save this time by using the Page Object Model in your test automation framework.

I hope you understood why you need the Page Object Model. Now, let's move further and see some of the advantages of the Page Object Model in Selenium.

Advantages of the Page Object Model:

  1. According to the Page Object Model, you should keep the tests and element locators separately. This will keep the code clean and easy to understand and maintain.
  2. The Page Object approach makes the automation framework programmer-friendly, more durable and comprehensive.
  3. Another important advantage is that the Page Object Repository is independent of the automation tests. If you keep a separate repository for page objects, you can use it for different purposes and with different frameworks: you can integrate it with tools like JUnit/NUnit/PhpUnit as well as with TestNG/Cucumber, etc.
  4. Test cases become short and optimized, as you will be able to reuse page object methods in the POM.
  5. POM is best applicable for applications which contain multiple pages, each of which has fields that can be uniquely referenced with respect to the page.

So these are a few of the advantages that make POM unique and easy to work with for automation testers. Now, let's dive further and understand what Page Factory is.

What is Page Factory?

Page Factory is an inbuilt Page Object Model concept for Selenium WebDriver, and it is very optimized. Here, you follow the concept of separation of the Page Object Repository and the Test Methods.

What is Page Factory - Page Object Model in Selenium - Edureka

Additionally, with the help of the PageFactory class, I will use the @FindBy annotation to find WebElements.

I hope you understood what Page Factory is. Now let's dive deeper into this article and understand the working of the Page Object Model with the help of the below example.

Creating a Page Object Model with Page Factory in Selenium WebDriver

Scenario: Here, you need to enter valid credentials on the 'Facebook Login' page in order to be redirected to the 'Facebook Home' page, and then log out of the account.

Follow the below steps to implement the Page Object Model Design Pattern.

Step 1: Create a TestBase class. Here I have created an object of WebDriver, maximized the browser, implemented waits, launched the URL, etc.

In the below example program, I have taken Chrome browser and set the System Property to launch Chrome browser.

package edureka.tests;
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.annotations.AfterSuite;
import org.testng.annotations.BeforeSuite;
public class TestBase {
public static WebDriver driver = null;
@BeforeSuite
public void initialize() throws IOException{
System.setProperty("webdriver.chrome.driver", System.getProperty("user.dir")+"\\src\\test\\java\\drivers\\chromedriver.exe");
driver = new ChromeDriver();
//To maximize browser
driver.manage().window().maximize();
//Implicit wait
driver.manage().timeouts().implicitlyWait(20, TimeUnit.SECONDS);
//To open facebook
driver.get("https://www.facebook.com");
}
@AfterSuite
//Test cleanup
public void TeardownTest()
{
TestBase.driver.quit();
}
}

Step 2: Create classes for each page (e.g., Facebook Login Page, Facebook Home Page) to hold element locators and their methods. Usually, you can create page objects for all available pages in the AUT. For each page, you create a separate class with a constructor. Identify all the locators and keep them in one class. This allows you to reuse the locators in multiple methods and also helps in easy maintenance: if there is any change in the UI, you simply change it in one page class.

Here, I have created Java files (FbLoginPage.java and FbHomePage.java) for the corresponding pages (Facebook Login Page and Facebook Home Page) to hold element locators and their methods.

package edureka.pages;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.FindBy;
import org.openqa.selenium.support.How;

public class FbHomePage {
WebDriver driver;
public FbHomePage(WebDriver driver){
this.driver=driver;
}
//Using FindBy for locating elements
@FindBy(how=How.XPATH, using="//div") WebElement profileDropdown;
@FindBy(how=How.XPATH, using="//text()[.='Log Out']/ancestor::span[1]") WebElement logoutLink;
// Defining all the user actions (Methods) that can be performed in the Facebook home page
// This method to click on Profile Dropdown
public void clickOnProfileDropdown(){
profileDropdown.click();
}
// This method to click on Logout link
public void clickOnLogoutLink(){
logoutLink.click();
}
// This method verifies the logged-in user name text (it is called by the test below).
// NOTE: the body is a placeholder so the example compiles; the real locator and
// assertion depend on the page's actual markup and are left as an assumption here.
public void verifyLoggedInUserNameText(){
// e.g. locate the username element and assert its text matches the expected name
}
}
package edureka.pages;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.FindBy;
import org.openqa.selenium.support.How;
public class FbLoginPage {
WebDriver driver;
public FbLoginPage(WebDriver driver){
this.driver=driver;
}
//Using FindBy for locating elements
@FindBy(how=How.XPATH, using="//input[@type='email'][@name='email']") WebElement emailTextBox;
@FindBy(how=How.XPATH, using="//input[@type='password'][@name='pass']") WebElement passwordTextBox;
@FindBy(how=How.XPATH, using="//input[@type='submit'][@id='u_0_5']") WebElement signinButton;
// Defining all the user actions (Methods) that can be performed on the Facebook login page

// This method is to set Email in the email text box
public void setEmail(String strEmail){
emailTextBox.sendKeys(strEmail);
}
// This method is to set Password in the password text box
public void setPassword(String strPassword){
passwordTextBox.sendKeys(strPassword);
}
// This method is to click on Login Button
public void clickOnLoginButton(){
signinButton.click();
}
}

Step 3: Here, you need to create a test (e.g., FbLoginTest) based on the above pages. As per the test scenario mentioned above, the test script runs as follows:

  1. Launch browser and open facebook.com
  2. Enter user credentials and do signin
  3. Verify the loggedIn user name and do logout
package edureka.tests;
import org.openqa.selenium.support.PageFactory;
import org.testng.annotations.Test;
import edureka.pages.FbHomePage;
import edureka.pages.FbLoginPage;
public class FbLoginTest extends TestBase{
@Test
public void init() throws Exception{
//driver.get("https://www.facebook.com");
FbLoginPage loginpage = PageFactory.initElements(driver, FbLoginPage.class);
loginpage.setEmail("your-username");
loginpage.setPassword("your-password");
loginpage.clickOnLoginButton();
FbHomePage homepage = PageFactory.initElements(driver, FbHomePage.class);
homepage.clickOnProfileDropdown();
homepage.verifyLoggedInUserNameText();
homepage.clickOnLogoutLink();
}
}

Finally, you need to create the testng.xml file and link it to the above-created test case class files.

Step 4: Creating testng.xml file

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd">
<suite name="Everjobs Suite">
<test name="Page Object Model Project">
<classes>
<class name="edureka.tests.TestBase" />
<class name="edureka.tests.FbLoginTest" />
</classes>
</test>
</suite> <!-- Suite -->

On executing this testng.xml file, the test suite will navigate to facebook.com and enter the credentials. It will then verify the logged-in user name and log out of the account. This is how the Page Object Model can be implemented with Page Factory.

With this, we come to an end of this article on Page Object Model in Selenium. I hope you understood the concepts and it added value to your knowledge.

If you wish to learn Selenium and build a career in the testing domain, then check out our interactive, live-online Selenium Certification Training here, that comes with 24*7 support to guide you throughout your learning period.

Got a question for us? Please mention it in the comments section of Page Object Model in Selenium article and we will get back to you.

The post All You Need To Know About Page Object Model In Selenium appeared first on Edureka.

Performance Testing Life Cycle : All You Need To Know About Testing Phases


The modern era of IT has seen an overwhelming evolution of the software testing industry, giving way to greener pastures. It has thus become very important to ensure the effective performance of software applications. This "Performance Testing Life Cycle" article will provide in-depth knowledge about the process of testing in the following sequence:

What is Performance Testing?

Performance Testing is a type of software testing which ensures that the application is performing well under the workload. The goal of performance testing is not to find bugs but to eliminate performance bottlenecks. It measures the quality attributes of the system. 

The attributes of Performance Testing include:


  • Speed – It determines whether the application responds quickly.
  • Scalability – It determines maximum user load the software application can handle.
  • Stability – It determines if the application is stable under varying loads.

Now let’s move ahead with our Performance Testing Life Cycle article and have a look at the advantages of Performance Testing.

Advantages of Performance Testing


  1. Validate Features – Performance testing validates the fundamental features of the software. Measuring the performance of basic software functions allows business leaders to make key decisions about the setup of the software.
  2. Measure speed, accuracy, and stability – It helps you monitor the crucial components of your software under duress, giving you vital information on how well the software will handle scalability.
  3. Keep your users happy – Measuring application performance allows you to observe how your customers respond to your software. The advantage is that you can pinpoint critical issues before your customers encounter them.
  4. Identify discrepancies – Measuring performance provides a buffer for developers before release, as any issues are likely to be magnified once the software is released.
  5. Improve optimization and load capability – Measuring performance can help your organization deal with volume, so your software can cope when you hit high levels of users.

Now that you know the advantages of Performance Testing, let’s have a look at the different steps involved in the Performance Testing Life Cycle.

Performance Testing Life Cycle

Performance Testing runs in parallel with the Software Development Life Cycle (SDLC). The different phases of the Performance Testing Life Cycle (PTLC) are:

Non-Functional Requirements Elicitation and Analysis

Understanding the non-functional requirements is one of the most important and critical steps in PTLC. It helps evaluate the degree of compliance with the non-functional needs.

Entry Criteria:
  • Application Under Test (AUT) Architecture
  • Non-Functional Requirement Questionnaire

Tasks:
  • Understanding the AUT architecture
  • Identification and understanding of critical scenarios
  • Understanding interface details
  • Understanding the growth pattern

Exit Criteria:
  • Client signed-off NFR document

Performance Test Strategy

The second phase defines how to approach Performance Testing for the identified critical scenarios. You need to decide on the kind of performance testing required and the tools needed for it.

Entry Criteria:
  • Signed-off NFR document

Tasks:
  • Preparing and reviewing the Test Strategy
  • Data set up
  • Defining in-scope and out-of-scope items
  • SLA
  • Workload Model
  • Preparing and reviewing risks and mitigations

Exit Criteria:
  • Baselined Performance Test Strategy document

Performance Test Design

This phase involves script generation using the identified testing tool in a dedicated environment. The scripts are then enhanced as needed and unit tested.

Entry Criteria:
  • Baselined Test Strategy
  • Test Environment
  • Test Data

Tasks:
  • Test Scripting
  • Data Parameterization
  • Correlation
  • Designing the actions and transactions
  • Unit Testing

Exit Criteria:
  • Unit-tested performance scripts

Performance Test Execution

The next phase is dedicated to the test engineers who design scenarios based on identified workload and load the system with concurrent virtual users.

Entry Criteria:
  • Baselined test scripts

Tasks:
  • Designing the scenarios
  • Loading the test script
  • Test script execution
  • Monitoring the execution
  • Collecting the logs

Exit Criteria:
  • Test script execution log files

Performance Test Result Analysis

In this phase, the collected log files are analyzed and reviewed by experienced test engineers. Tuning recommendations are given if any issues are identified.

Entry Criteria:
  • Collected log files

Tasks:
  • Creating graphs and charts
  • Correlating the various graphs and charts
  • Preparing a detailed test report
  • Test report analysis and review
  • Tuning recommendations

Exit Criteria:
  • Performance Analysis Report

Benchmark & Recommendations

This is the last phase in PTLC which involves benchmarking and providing a recommendation to the client.

Entry Criteria:
  • Performance Analysis Report

Tasks:
  • Comparing the results with earlier execution results
  • Comparing against benchmark standards
  • Validating against the NFRs
  • Preparing the Test Report presentation

Exit Criteria:
  • Performance report reviewed and baselined

These were the different phases involved in the performance testing life cycle. Now let’s have a look at the different types of performance testing.

Types of Performance Testing

The different types of performance testing are:

  • Load testing – It checks the application’s ability to perform under anticipated user loads. The objective is to identify performance bottlenecks before the software application goes live.
  • Stress testing – This involves testing an application under extreme workloads to see how it handles high traffic or data processing. The objective is to identify the breaking point of an application.
  • Endurance testing – It is done to make sure the software can handle the expected load over a long period of time.

Types of performance testing - performance testing life cycle - edureka

  • Spike testing – This tests the software’s reaction to sudden large spikes in the load generated by users.
  • Volume testing – In volume testing, a large amount of data is populated in a database and the overall software system's behavior is monitored. The objective is to check the software application's performance under varying database volumes.
  • Scalability testing – The objective of scalability testing is to determine the software application’s effectiveness in scaling up to support an increase in user load. It helps plan capacity addition to your software system.

Now, if you want to perform any of these testings on your server, you would need different types of tools that are compatible with your test plan. Let’s have a look at some of the important performance testing tools.

Tools for Performance Testing

The market is full of tools for test management, performance testing, GUI testing, functional testing, etc. I would suggest you opt for a tool that is in demand, easy to learn given your skills, generic, and effective for the required type of testing. Let's have a look at the top 10 Performance Testing Tools:

  • LoadNinja
  • Apache JMeter
  • WebLOAD
  • LoadUI Pro
  • LoadView
  • NeoLoad
  • LoadRunner
  • Silk Performer
  • AppLoader
  • SmartMeter.io

With this, we have come to the end of the Performance Testing Life Cycle article. I hope you guys enjoyed this article and got an idea about the different phases involved in performance testing.

Now that you know about the different performance testing tools, check out the Performance Testing Using JMeter Course by Edureka, a trusted online learning company with a network of more than 250,000 satisfied learners spread across the globe. This course provides you insights into software behavior during workload. In this course, you will learn how to check the response time and latency of software and test if a software package is efficient for scaling. The course will help you check the strength and analyze the overall performance of an application under different load types. 

Got a question for us? Please mention it in the comments section of “Performance Testing Life Cycle” and we will get back to you.

The post Performance Testing Life Cycle : All You Need To Know About Testing Phases appeared first on Edureka.

A Step By Step Guide to Install TensorFlow


Deep Learning is one of the hottest topics of 2019-20, and for good reason. Advancements in the industry have made it possible for machines/computer programs to actually replace humans. Artificial Intelligence is expected to create 2.3 million jobs by 2020, and a lot of this is being made possible by TensorFlow. So, in this Install TensorFlow article, I'll be covering the following topics:

 

What is TensorFlow?

TensorFlow is Google’s Open Source Machine Learning Framework for dataflow programming across a range of tasks. Nodes in the graph represent mathematical operations, while the graph edges represent the multi-dimensional data arrays (tensors) communicated between them.


Tensors are just multidimensional arrays, an extension of 2-dimensional tables to data with a higher dimension. There are many features of TensorFlow which make it appropriate for Deep Learning.

 

TensorFlow Applications

TensorFlow has helped a lot of companies build world-class models to solve real problems. So, before we install TensorFlow, let's have a look at some of its applications.

 

Airbnb: It improves the guest experience by using TensorFlow to classify images and detect objects at scale.

Coca-Cola: The advancements in TensorFlow enabled Coca-Cola to finally achieve a long-sought frictionless proof-of-purchase capability.

GE Healthcare: GE trained a Neural Network using TensorFlow to identify specific anatomy during brain MRIs to help improve speed and reliability.

Twitter: Twitter used TensorFlow to build their “Ranked Timeline”, allowing users to not miss any tweets even if they follow a thousand other users.

 

Install TensorFlow: Steps

There are a few pre-requisites before we install TensorFlow:

Python 3.5.x or later must be installed on your system. Pip is already included in Python 3.5.x onwards, so it does not need to be installed separately.

To set up the virtual environment (a typical setup using the virtualenv package; the environment name below is illustrative):

pip install virtualenv
virtualenv tensorflow_env
source tensorflow_env/bin/activate

After this, use the command:

pip install tensorflow

And, you’re done. Go ahead, after you install tensorflow, just import it and start using it’s amazing deep learning capabilities and create something new.

So, with this, we come to an end of this install TensorFlow article. Edureka’s Deep Learning in TensorFlow with Python Certification Training is curated by industry professionals as per the industry requirements & demands. You will master the concepts such as SoftMax function, Autoencoder Neural Networks, Restricted Boltzmann Machine (RBM) and work with libraries like Keras & TFLearn. The course has been specially curated by industry experts with real-time case studies.

The post A Step By Step Guide to Install TensorFlow appeared first on Edureka.

A Comprehensive Guide To Naive Bayes In R


Machine Learning has become the most in-demand skill in the market. It is essential to know the various Machine Learning Algorithms and how they work. In this blog on Naive Bayes In R, I intend to help you learn about how Naive Bayes works and how it can be implemented using the R language.

To get in-depth knowledge on Data Science, you can enroll for live Data Science Certification Training by Edureka with 24/7 support and lifetime access.

The following topics are covered in this blog:

    1. What Is Naive Bayes?
    2. The Math Behind Naive Bayes
    3. Bayes Theorem for Naive Bayes Algorithm
    4. How Does Naive Bayes Work?
    5. Practical Implementation of Naive Bayes In R

What Is Naive Bayes?

Naive Bayes is a Supervised Machine Learning algorithm based on the Bayes Theorem that is used to solve classification problems by following a probabilistic approach. It is based on the idea that the predictor variables in a Machine Learning model are independent of each other, meaning that the outcome of a model depends on a set of independent variables that have nothing to do with each other.

But why is Naive Bayes called ‘Naive’?

In real-world problems, predictor variables aren't always independent of each other; there are usually some correlations between them. Since Naive Bayes considers each predictor variable to be independent of every other variable in the model, it is called 'Naive'.

Now let’s understand the logic behind the Naive Bayes algorithm.

The Math Behind Naive Bayes 

The principle behind Naive Bayes is the Bayes theorem also known as the Bayes Rule. The Bayes theorem is used to calculate the conditional probability, which is nothing but the probability of an event occurring based on information about the events in the past. Mathematically, the Bayes theorem is represented as:

P(A|B) = [P(B|A) * P(A)] / P(B)

In the above equation:

  • P(A|B): Conditional probability of event A occurring, given the event B
  • P(A): Probability of event A occurring
  • P(B): Probability of event B occurring
  • P(B|A): Conditional probability of event B occurring, given the event A

Formally, the terminologies of the Bayesian Theorem are as follows:

  • A is known as the proposition and B is the evidence
  • P(A) represents the prior probability of the proposition
  • P(B) represents the prior probability of evidence
  • P(A|B) is called the posterior
  • P(B|A) is the likelihood

Therefore, the Bayes theorem can be summed up as:

Posterior = (Likelihood * Prior probability of the proposition) / Prior probability of the evidence

It can also be considered in the following manner:

Given a hypothesis H and evidence E, the Bayes Theorem states that the relationship between the probability of the hypothesis before getting the evidence, P(H), and the probability of the hypothesis after getting the evidence, P(H|E), is:

P(H|E) = [P(E|H) * P(H)] / P(E)
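As a quick numeric check of the rule, suppose P(H) = 0.3, P(E|H) = 0.8 and P(E) = 0.5 (purely illustrative numbers); a couple of lines of R confirm the update:

#Bayes rule with illustrative numbers
p_H <- 0.3; p_E_given_H <- 0.8; p_E <- 0.5
p_H_given_E <- p_E_given_H * p_H / p_E
p_H_given_E #0.48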

Now that you know what the Bayes Theorem is, let’s see how it can be derived.

Derivation Of The Bayes Theorem

The main aim of the Bayes Theorem is to calculate the conditional probability. The Bayes Rule can be derived from the following two equations:

The below equation represents the conditional probability of A, given B:

P(A|B) = P(A ∩ B) / P(B)

The below equation represents the conditional probability of B, given A:

P(B|A) = P(B ∩ A) / P(A)

Therefore, on combining the above two equations we get the Bayes Theorem:

P(A|B) = [P(B|A) * P(A)] / P(B)

Bayes Theorem for Naive Bayes Algorithm

The above equation was for a single predictor variable; however, in real-world applications there is more than one predictor variable, and for a classification problem there is more than one output class. The classes can be represented as C1, C2, ..., Ck and the predictor variables as a vector x1, x2, ..., xn.

The objective of a Naive Bayes algorithm is to measure the conditional probability of an event with a feature vector x1, x2, ..., xn belonging to a particular class Ci:

P(Ci|x1, x2, ..., xn), for i = 1, 2, ..., k

On applying the Bayes theorem and expanding the joint probability with the chain rule, we get:

P(Ci|x1, x2, ..., xn) = [P(x1|x2, ..., xn, Ci) * P(x2|x3, ..., xn, Ci) * ... * P(xn|Ci) * P(Ci)] / P(x1, x2, ..., xn)

However, each conditional probability P(xj|xj+1, ..., xn, Ci) reduces to P(xj|Ci), since every predictor variable is assumed to be independent of the others in Naive Bayes.

The final equation comes down to:

P(Ci|x1, x2, ..., xn) = [P(Ci) * P(x1|Ci) * P(x2|Ci) * ... * P(xn|Ci)] / P(x1, x2, ..., xn)

Here, P(x1,x2,…,xn) is constant for all the classes, therefore we get:

P(Ci|x1, x2, ..., xn) ∝ P(Ci) * P(x1|Ci) * P(x2|Ci) * ... * P(xn|Ci)

How Does Naive Bayes Work?

To get a better understanding of how Naive Bayes works, let’s look at an example.

Consider a data set with 1500 observations and the following output classes:

  • Cat
  • Parrot
  • Turtle

The Predictor variables are categorical in nature i.e., they store two values, either True or False:

  • Swim
  • Wings
  • Green Color
  • Sharp Teeth

Type     Swim   Wings   Green   Sharp Teeth   Total
Cat      450    0       0       500           500
Parrot   50     500     400     0             500
Turtle   500    0       100     50            500

From the above table, we can summarise that:

The class of type Cat shows that:

  • Out of 500 cats, 450 (90%) can swim
  • No cats have wings
  • No cats are green in color
  • All 500 cats have sharp teeth

The class of type Parrot shows that:

  • Out of 500 parrots, 50 (10%) have a true value for swim
  • All 500 parrots have wings
  • Out of 500 parrots, 400 (80%) are green in color
  • No parrots have sharp teeth

The class of type Turtle shows that:

  • All 500 turtles can swim
  • No turtles have wings
  • Out of 500 turtles, 100 (20%) are green in color
  • Out of 500 turtles, 50 (10%) have sharp teeth

Now, with the available data, let’s classify the following observation into one of the output classes (Cats, Parrot or Turtle) by using the Naive Bayes Classifier.

Observation to classify: Swim = True, Green = True

The goal here is to predict whether the animal is a Cat, Parrot or a Turtle based on the defined predictor variables (swim, wings, green, sharp teeth).

To solve this, we will use the Naive Bayes approach:
P(H|Multiple Evidences) = P(C1|H) * P(C2|H) * ... * P(Cn|H) * P(H) / P(Multiple Evidences)

In the observation, the variables Swim and Green are true and the outcome can be any one of the animals (Cat, Parrot, Turtle).

To check if the animal is a cat:
P(Cat | Swim, Green) = P(Swim|Cat) * P(Green|Cat) * P(Cat) / P(Swim, Green)
= 0.9 * 0 * 0.333 / P(Swim, Green)
= 0

To check if the animal is a Parrot:
P(Parrot| Swim, Green) = P(Swim|Parrot) * P(Green|Parrot) * P(Parrot) / P(Swim, Green)
= 0.1 * 0.80 * 0.333 / P(Swim, Green)
= 0.0264/ P(Swim, Green)

To check if the animal is a Turtle:
P(Turtle| Swim, Green) = P(Swim|Turtle) * P(Green|Turtle) * P(Turtle) / P(Swim, Green)
= 1 * 0.2 * 0.333 / P(Swim, Green)
= 0.0666/ P(Swim, Green)

For all the above calculations, the denominator is the same, i.e., P(Swim, Green). The value of P(Turtle|Swim, Green) is greater than P(Parrot|Swim, Green); therefore, we can correctly predict the class of the animal as Turtle.
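This arithmetic is easy to reproduce in a few lines of R before we get to the full implementation below. The sketch simply re-encodes the class counts from the frequency table above; the object names are illustrative:

#Priors: each class accounts for 500 of the 1500 observations
prior <- c(cat = 500/1500, parrot = 500/1500, turtle = 500/1500)

#Likelihoods P(Swim|class) and P(Green|class), taken from the table above
p_swim <- c(cat = 450/500, parrot = 50/500, turtle = 500/500)
p_green <- c(cat = 0/500, parrot = 400/500, turtle = 100/500)

#Unnormalised posteriors; the common denominator P(Swim, Green) is dropped
#because it is identical for all three classes
posterior <- p_swim * p_green * prior
round(posterior, 4) #cat 0.0000, parrot 0.0267, turtle 0.0667
names(which.max(posterior)) #"turtle"

The tiny difference from the hand calculation (0.0267 here vs 0.0264 above) is only because the prior was rounded to 0.333 there.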

Now let’s see how you can implement Naive Bayes using the R language.

Practical Implementation of Naive Bayes In R

Problem Statement: To study a Diabetes data set and build a Machine Learning model that predicts whether or not a person has Diabetes.

Data Set Description: The given data set contains 100s of observations of patients along with their health details. Here’s a list of the predictor variables that will help us classify a patient as either Diabetic or Normal:

  • Pregnancies: Number of pregnancies so far
  • Glucose: Plasma glucose concentration
  • BloodPressure: Diastolic blood pressure (mm Hg)
  • SkinThickness: Triceps skin fold thickness (mm)
  • Insulin: 2-Hour serum insulin (mu U/ml)
  • BMI: Body mass index (weight in kg/(height in m)^2)
  • DiabetesPedigreeFunction: Diabetes pedigree function
  • Age: Age (years)

The response variable or the output variable is:

  • Outcome: Class variable (0 or 1)

Logic: To build a Naive Bayes model in order to classify patients as either Diabetic or normal by studying their medical records such as Glucose level, age, BMI, etc.

Now that you know the objective of this demo, let’s get our brains working and start coding. For this demo, I’ll be using the R language in order to build the model.

If you wish to learn more about R programming, you can go through this video recorded by our R Programming Experts.

R Tutorial For Beginners | Edureka

Now, let’s begin.

Step 1: Install and load the required packages


#Loading required packages
install.packages('tidyverse')
library(tidyverse)
install.packages('ggplot2')
library(ggplot2)
install.packages('caret')
library(caret)
install.packages('caretEnsemble')
library(caretEnsemble)
install.packages('psych')
library(psych)
install.packages('Amelia')
library(Amelia)
install.packages('mice')
library(mice)
install.packages('GGally')
library(GGally)
install.packages('rpart')
library(rpart)
install.packages('randomForest')
library(randomForest)

Step 2: Import the data set


#Reading data into R
data<- read.csv("/Users/Zulaikha_Geer/Desktop/NaiveBayesData/diabetes.csv")

Before we study the data set, let's convert the output variable ('Outcome') into a categorical variable. This is necessary because our output will be in the form of 2 classes, True or False, where True denotes that a patient has diabetes and False denotes that a person is diabetes-free.


#Setting outcome variables as categorical
data$Outcome <- factor(data$Outcome, levels = c(0,1), labels = c("False", "True"))

Step 3: Studying the Data Set


#Studying the structure of the data

str(data)

Understanding the data set – Naive Bayes In R – Edureka


head(data)

Understanding the data set – Naive Bayes In R – Edureka


describe(data)

Understanding the data set – Naive Bayes In R – Edureka

Step 4: Data Cleaning

While analyzing the structure of the data set, we can see that the minimum values for Glucose, BloodPressure, SkinThickness, Insulin, and BMI are all zero. This is not ideal, since no one can have a value of zero for glucose, blood pressure, etc. Therefore, such values are treated as missing observations.

In the below code snippet, we’re setting the zero values to NA’s:


#Convert '0' values into NA
data[, 2:7][data[, 2:7] == 0] <- NA

To check how many missing values we have now, let’s visualize the data:


#visualize the missing data
missmap(data)

Missing Data Plot – Naive Bayes In R – Edureka

The above illustration shows that our data set has plenty of missing values, and removing all of them will leave us with an even smaller data set. Therefore, we can perform imputations by using the mice package in R.


#Use mice package to predict missing values
mice_mod <- mice(data[, c("Glucose","BloodPressure","SkinThickness","Insulin","BMI")], method='rf')
mice_complete <- complete(mice_mod)

iter imp variable
1 1 Glucose BloodPressure SkinThickness Insulin BMI
1 2 Glucose BloodPressure SkinThickness Insulin BMI
1 3 Glucose BloodPressure SkinThickness Insulin BMI
1 4 Glucose BloodPressure SkinThickness Insulin BMI
1 5 Glucose BloodPressure SkinThickness Insulin BMI
2 1 Glucose BloodPressure SkinThickness Insulin BMI
2 2 Glucose BloodPressure SkinThickness Insulin BMI
2 3 Glucose BloodPressure SkinThickness Insulin BMI
2 4 Glucose BloodPressure SkinThickness Insulin BMI
2 5 Glucose BloodPressure SkinThickness Insulin BMI

#Transfer the predicted missing values into the main data set
data$Glucose <- mice_complete$Glucose
data$BloodPressure <- mice_complete$BloodPressure
data$SkinThickness <- mice_complete$SkinThickness
data$Insulin<- mice_complete$Insulin
data$BMI <- mice_complete$BMI

To check if there are still any missing values, let’s use the missmap plot:


missmap(data)

Using the mice Package In R – Naive Bayes In R – Edureka

The output looks good, there is no missing data.

Step 5: Exploratory Data Analysis

Now let’s perform a couple of visualizations to take a better look at each variable, this stage is essential to understand the significance of each predictor variable.


#Data Visualization
#Visual 1
ggplot(data, aes(Age, colour = Outcome)) +
geom_freqpoly(binwidth = 1) + labs(title="Age Distribution by Outcome")

Data Visualization – Naive Bayes In R – Edureka


#visual 2
c <- ggplot(data, aes(x=Pregnancies, fill=Outcome, color=Outcome)) +
geom_histogram(binwidth = 1) + labs(title="Pregnancy Distribution by Outcome")
c + theme_bw()

Data Visualization – Naive Bayes In R – Edureka


#visual 3
P <- ggplot(data, aes(x=BMI, fill=Outcome, color=Outcome)) +
geom_histogram(binwidth = 1) + labs(title="BMI Distribution by Outcome")
P + theme_bw()

Data Visualization – Naive Bayes In R – Edureka


#visual 4
ggplot(data, aes(Glucose, colour = Outcome)) +
geom_freqpoly(binwidth = 1) + labs(title="Glucose Distribution by Outcome")

Data Visualization – Naive Bayes In R – Edureka


#visual 5
ggpairs(data)

Data Visualization – Naive Bayes In R – Edureka

Step 6: Data Modelling

This stage begins with a process called Data Splicing, wherein the data set is split into two parts:

  • Training set: This part of the data set is used to build and train the Machine Learning model.
  • Testing set: This part of the data set is used to evaluate the efficiency of the model.

#Building a model
#split data into training and test data sets
indxTrain <- createDataPartition(y = data$Outcome,p = 0.75,list = FALSE)
training <- data[indxTrain,]
testing <- data[-indxTrain,]

#Check dimensions of the split

> prop.table(table(data$Outcome)) * 100

False True
65.10417 34.89583

> prop.table(table(training$Outcome)) * 100

False True
65.10417 34.89583

> prop.table(table(testing$Outcome)) * 100

False True
65.10417 34.89583

To compare the outcomes of the training and testing phases, let's create separate objects that hold the predictor variables and the response variable:


#create objects x which holds the predictor variables and y which holds the response variables
x = training[,-9]
y = training$Outcome

Now it's time to load the e1071 package, which provides the Naive Bayes function in R.


library(e1071)

After loading the package, the below code snippet will create a Naive Bayes model by using the training data set:


model = train(x,y,'nb',trControl=trainControl(method='cv',number=10))

> model
Naive Bayes

576 samples
8 predictor
2 classes: 'False', 'True'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 518, 518, 519, 518, 519, 518, ...
Resampling results across tuning parameters:

usekernel Accuracy Kappa
FALSE 0.7413793 0.4224519
TRUE 0.7622505 0.4749285

Tuning parameter 'fL' was held constant at a value of 0
Tuning parameter 'adjust' was held
constant at a value of 1
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were fL = 0, usekernel = TRUE and adjust = 1.

We thus created a predictive model by using the Naive Bayes Classifier.
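As a side note, caret's train() delegates to an underlying Naive Bayes implementation. If you prefer, you can also call e1071's naiveBayes() function directly; here is a minimal sketch using the x and y objects created earlier:

#Fitting Naive Bayes directly with e1071 (a minimal sketch)
nb_direct <- naiveBayes(x, y)

#Class predictions for the first few observations in the test set
head(predict(nb_direct, testing[,-9]))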

Step 7: Model Evaluation

To check the efficiency of the model, we are now going to run the testing data set on the model, after which we will evaluate the accuracy of the model by using a Confusion matrix.


#Model Evaluation
#Predict testing set
Predict <- predict(model,newdata = testing )
#Get the confusion matrix to see accuracy value and other parameter values

> confusionMatrix(Predict, testing$Outcome )
Confusion Matrix and Statistics

Reference
Prediction False True
False 91 18
True 34 49

Accuracy : 0.7292
95% CI : (0.6605, 0.7906)
No Information Rate : 0.651
P-Value [Acc > NIR] : 0.01287

Kappa : 0.4352

Mcnemar's Test P-Value : 0.03751

Sensitivity : 0.7280
Specificity : 0.7313
Pos Pred Value : 0.8349
Neg Pred Value : 0.5904
Prevalence : 0.6510
Detection Rate : 0.4740
Detection Prevalence : 0.5677
Balanced Accuracy : 0.7297

'Positive' Class : False

The final output shows that we built a Naive Bayes classifier that can predict whether a person is diabetic or not, with an accuracy of approximately 73%.

To summarize the demo, let's draw a plot that shows how each predictor variable is independently responsible for predicting the outcome.


#Plot Variable performance
X <- varImp(model)
plot(X)

Variable Performance Plot – Naive Bayes In R – Edureka

From the above illustration, it is clear that ‘Glucose’ is the most significant variable for predicting the outcome.

Now that you know how Naive Bayes works, I'm sure you're curious to learn more about the various Machine Learning algorithms. Do give our other blogs on Machine Learning algorithms a read.

So, with this, we come to the end of this blog. I hope you all found this blog informative. If you have any thoughts to share, please comment them below. Stay tuned for more blogs like these!

If you are looking for online structured training in Data Science, edureka! has a specially curated Data Science course which helps you gain expertise in Statistics, Data Wrangling, Exploratory Data Analysis, Machine Learning Algorithms like K-Means Clustering, Decision Trees, Random Forest, Naive Bayes. You’ll learn the concepts of Time Series, Text Mining and an introduction to Deep Learning as well. New batches for this course are starting soon!!

The post A Comprehensive Guide To Naive Bayes In R appeared first on Edureka.

Introduction To Machine Learning: All You Need To Know About Machine Learning


Introduction To Machine Learning:

Undoubtedly, Machine Learning is the most in-demand technology in today’s market. Its applications range from self-driving cars to predicting deadly diseases such as ALS. The high demand for Machine Learning skills is the motivation behind this blog. In this blog on Introduction To Machine Learning, you will understand all the basic concepts of Machine Learning and a Practical Implementation of Machine Learning by using the R language.

To get in-depth knowledge on Data Science, you can enroll for live Data Science Certification Training by Edureka with 24/7 support and lifetime access.

The following topics are covered in this Introduction To Machine Learning blog:

  1. Need For Machine Learning
  2. What Is Machine Learning?
  3. Machine Learning Definitions
  4. Machine Learning Process
  5. Types Of Machine Learning
  6. Type Of Problems Solved Using Machine Learning
  7. Practical Implementation Of Machine Learning

Need For Machine Learning

Ever since the technical revolution, we’ve been generating an immeasurable amount of data. As per research, we generate around 2.5 quintillion bytes of data every single day! It is estimated that by 2020, 1.7MB of data will be created every second for every person on earth.

With the availability of so much data, it is finally possible to build predictive models that can study and analyze complex data to find useful insights and deliver more accurate results.

Top Tier companies such as Netflix and Amazon build such Machine Learning models by using tons of data in order to identify profitable opportunities and avoid unwanted risks.

Here’s a list of reasons why Machine Learning is so important:

  • Increase in Data Generation: Due to excessive production of data, we need a method that can be used to structure, analyze and draw useful insights from data. This is where Machine Learning comes in. It uses data to solve problems and find solutions to the most complex tasks faced by organizations.
  • Improve Decision Making: By making use of various algorithms, Machine Learning can be used to make better business decisions. For example, Machine Learning is used to forecast sales, predict downfalls in the stock market, identify risks and anomalies, etc.

Importance Of Machine Learning – Introduction To Machine Learning – Edureka

  • Uncover patterns & trends in data: Finding hidden patterns and extracting key insights from data is the most essential part of Machine Learning. By building predictive models and using statistical techniques, Machine Learning allows you to dig beneath the surface and explore the data at a minute scale. Understanding data and extracting patterns manually will take days, whereas Machine Learning algorithms can perform such computations in less than a second.
  • Solve complex problems: From detecting the genes linked to the deadly ALS disease to building self-driving cars, Machine Learning can be used to solve the most complex problems.

To give you a better understanding of how important Machine Learning is, let’s list down a couple of Machine Learning Applications:

  • Netflix's Recommendation Engine: The core of Netflix is its famous recommendation engine. Over 75% of what you watch is recommended by Netflix, and these recommendations are made by implementing Machine Learning.
  • Facebook's Auto-tagging feature: The logic behind Facebook's DeepFace face verification system is Machine Learning and Neural Networks. DeepFace studies the facial features in an image to tag your friends and family.
  • Amazon's Alexa: The famous Alexa, which is based on Natural Language Processing and Machine Learning, is an advanced Virtual Assistant that does more than just play songs from your playlist. It can book you an Uber, connect with the other IoT devices at home, track your health, etc.
  • Google’s Spam Filter: Gmail makes use of Machine Learning to filter out spam messages. It uses Machine Learning algorithms and Natural Language Processing to analyze emails in real-time and classify them as either spam or non-spam.

These were a few examples of how Machine Learning is implemented in Top Tier companies. Here’s a blog on the Top 10 Applications of Machine Learning, do give it a read to learn more.

Now that you know why Machine Learning is so important, let’s look at what exactly Machine Learning is.

Introduction To Machine Learning

The term Machine Learning was first coined by Arthur Samuel in the year 1959. Looking back, that year was probably the most significant in terms of technological advancements.

If you browse through the net about ‘what is Machine Learning’, you’ll get at least 100 different definitions. However, the very first formal definition was given by Tom M. Mitchell:

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”

In simple terms, Machine Learning is a subset of Artificial Intelligence (AI) which provides machines the ability to learn automatically and improve from experience without being explicitly programmed to do so. In that sense, it is the practice of getting machines to solve problems by gaining the ability to think.

But wait, can a machine think or make decisions? Well, if you feed a machine a good amount of data, it will learn how to interpret, process and analyze this data by using Machine Learning Algorithms, in order to solve real-world problems.

Before moving any further, let’s discuss some of the most commonly used terminologies in Machine Learning.

Machine Learning Definitions

Algorithm: A Machine Learning algorithm is a set of rules and statistical techniques used to learn patterns from data and draw significant information from it. It is the logic behind a Machine Learning model. An example of a Machine Learning algorithm is the Linear Regression algorithm.

Model: A model is the main component of Machine Learning. A model is trained by using a Machine Learning Algorithm. An algorithm maps all the decisions that a model is supposed to take based on the given input, in order to get the correct output.

Predictor Variable: These are the features of the data that are used to predict the output.

Response Variable: It is the feature or the output variable that needs to be predicted by using the predictor variable(s).

Training Data: The Machine Learning model is built using the training data. The training data helps the model to identify key trends and patterns essential to predict the output.

Testing Data: After the model is trained, it must be tested to evaluate how accurately it can predict an outcome. This is done by the testing data set.

What Is Machine Learning? – Introduction To Machine Learning – Edureka

To sum it up, take a look at the above figure. A Machine Learning process begins by feeding the machine lots of data, by using this data the machine is trained to detect hidden insights and trends. These insights are then used to build a Machine Learning Model by using an algorithm in order to solve a problem.

The next topic in this Introduction to Machine Learning blog is the Machine Learning Process.

Machine Learning Process

The Machine Learning process involves building a Predictive model that can be used to find a solution for a Problem Statement. To understand the Machine Learning process let’s assume that you have been given a problem that needs to be solved by using Machine Learning.

Machine Learning Process – Introduction To Machine Learning – Edureka

The problem is to predict the occurrence of rain in your local area by using Machine Learning.

The below steps are followed in a Machine Learning process:

Step 1: Define the objective of the Problem Statement

At this step, we must understand what exactly needs to be predicted. In our case, the objective is to predict the possibility of rain by studying weather conditions. At this stage, it is also essential to take mental notes on what kind of data can be used to solve this problem or the type of approach you must follow to get to the solution.

Step 2: Data Gathering

At this stage, you must be asking questions such as,

  • What kind of data is needed to solve this problem?
  • Is the data available?
  • How can I get the data?

Once you know the type of data that is required, you must understand how you can derive this data. Data collection can be done manually or by web scraping. However, if you're a beginner just looking to learn Machine Learning, you don't have to worry about collecting the data; there are thousands of data sets on the web that you can simply download and get going.

Coming back to the problem at hand, the data needed for weather forecasting includes measures such as humidity level, temperature, pressure, locality, whether or not you live in a hill station, etc. Such data must be collected and stored for analysis.

Step 3: Data Preparation

The data you collected is almost never in the right format. You will encounter a lot of inconsistencies in the data set such as missing values, redundant variables, duplicate values, etc. Removing such inconsistencies is very essential because they might lead to wrongful computations and predictions. Therefore, at this stage, you scan the data set for any inconsistencies and you fix them then and there.

Step 4: Exploratory Data Analysis

Grab your detective glasses because this stage is all about diving deep into data and finding all the hidden data mysteries. EDA or Exploratory Data Analysis is the brainstorming stage of Machine Learning. Data Exploration involves understanding the patterns and trends in the data. At this stage, all the useful insights are drawn and correlations between the variables are understood.

For example, in the case of predicting rainfall, we know that there is a strong possibility of rain if the temperature has fallen low. Such correlations must be understood and mapped at this stage.

Step 5: Building a Machine Learning Model

All the insights and patterns derived during Data Exploration are used to build the Machine Learning Model. This stage always begins by splitting the data set into two parts, training data, and testing data. The training data will be used to build and analyze the model. The logic of the model is based on the Machine Learning Algorithm that is being implemented.

In the case of predicting rainfall, since the output will be in the form of True (if it will rain tomorrow) or False (no rain tomorrow), we can use a Classification Algorithm such as Logistic Regression.

Choosing the right algorithm depends on the type of problem you’re trying to solve, the data set and the level of complexity of the problem. In the upcoming sections, we will discuss the different types of problems that can be solved by using Machine Learning.

Step 6: Model Evaluation & Optimization

After building a model by using the training data set, it is finally time to put the model to a test. The testing data set is used to check the efficiency of the model and how accurately it can predict the outcome. Once the accuracy is calculated, any further improvements in the model can be implemented at this stage. Methods like parameter tuning and cross-validation can be used to improve the performance of the model.

Step 7: Predictions

Once the model is evaluated and improved, it is finally used to make predictions. The final output can be a Categorical variable (eg. True or False) or it can be a Continuous Quantity (eg. the predicted value of a stock).

In our case, for predicting the occurrence of rainfall, the output will be a categorical variable.

So that was the entire Machine Learning process. Now it’s time to learn about the different ways in which Machines can learn.

Machine Learning Types

A machine can learn to solve a problem by following any one of these three approaches:

  1. Supervised Learning
  2. Unsupervised Learning
  3. Reinforcement Learning

Supervised Learning

Supervised learning is a technique in which we teach or train the machine using data which is well labeled.

To understand Supervised Learning let’s consider an analogy. As kids we all needed guidance to solve math problems. Our teachers helped us understand what addition is and how it is done. Similarly, you can think of supervised learning as a type of Machine Learning that involves a guide. The labeled data set is the teacher that will train you to understand patterns in the data. The labeled data set is nothing but the training data set.

Supervised Learning – Introduction To Machine Learning – Edureka

Consider the above figure. Here we’re feeding the machine images of Tom and Jerry and the goal is for the machine to identify and classify the images into two groups (Tom images and Jerry images). The training data set that is fed to the model is labeled, as in, we’re telling the machine, ‘this is how Tom looks and this is Jerry’. By doing so you’re training the machine by using labeled data. In Supervised Learning, there is a well-defined training phase done with the help of labeled data.

Unsupervised Learning

Unsupervised learning involves training by using unlabeled data and allowing the model to act on that information without guidance.

Think of unsupervised learning as a smart kid that learns without any guidance. In this type of Machine Learning, the model is not fed with labeled data, as in the model has no clue that ‘this image is Tom and this is Jerry’, it figures out patterns and the differences between Tom and Jerry on its own by taking in tons of data.

Unsupervised Learning – Introduction To Machine Learning – Edureka

For example, it identifies prominent features of Tom such as pointy ears, bigger size, etc, to understand that this image is of type 1. Similarly, it finds such features in Jerry and knows that this image is of type 2. Therefore, it classifies the images into two different classes without knowing who Tom is or Jerry is.

Reinforcement Learning

Reinforcement Learning is a part of Machine Learning where an agent is put in an environment and learns to behave in that environment by performing certain actions and observing the rewards it gets from those actions.

This type of Machine Learning is comparatively different. Imagine that you were dropped off at an isolated island! What would you do?

Panic? Yes, of course, initially we all would. But as time passes by, you will learn how to live on the island. You will explore the environment, understand the climate condition, the type of food that grows there, the dangers of the island, etc. This is exactly how Reinforcement Learning works, it involves an Agent (you, stuck on the island) that is put in an unknown environment (island), where he must learn by observing and performing actions that result in rewards.

Reinforcement Learning is mainly used in advanced Machine Learning areas such as self-driving cars, AlphaGo, etc.

To better understand the difference between Supervised, Unsupervised and Reinforcement Learning, you can go through this short video.

Supervised vs Unsupervised vs Reinforcement Learning | Edureka

 

So that sums up the types of Machine Learning. Now, let’s look at the type of problems that are solved by using Machine Learning.

Type Of Problems In Machine Learning

Type of Problems Solved Using Machine Learning – Introduction To Machine Learning – Edureka

Consider the above figure, there are three main types of problems that can be solved in Machine Learning:

  1. Regression: In this type of problem the output is a continuous quantity. So, for example, if you want to predict the speed of a car given the distance, it is a Regression problem. Regression problems can be solved by using Supervised Learning algorithms like Linear Regression.
  2. Classification: In this type, the output is a categorical value. Classifying emails into two classes, spam and non-spam is a classification problem that can be solved by using Supervised Learning classification algorithms such as Support Vector Machines, Naive Bayes, Logistic Regression, K Nearest Neighbor, etc.
  3. Clustering: This type of problem involves assigning the input into two or more clusters based on feature similarity. For example, clustering viewers into similar groups based on their interests, age, geography, etc can be done by using Unsupervised Learning algorithms like K-Means Clustering.

Here’s a table that sums up the difference between Regression, Classification, and Clustering.

Regression vs Classification vs Clustering – Introduction To Machine Learning – Edureka
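To make the distinction concrete, here is a minimal R sketch using R's built-in data sets, with one commonly used algorithm for each problem type (the data sets and variable names are only illustrative):

#Regression: predict a continuous quantity (stopping distance from speed)
reg_model <- lm(dist ~ speed, data = cars)

#Classification: predict a categorical value (one of two iris species)
iris2 <- subset(iris, Species != "setosa")
iris2$Species <- droplevels(iris2$Species)
clf_model <- glm(Species ~ Sepal.Length + Sepal.Width, data = iris2, family = binomial)

#Clustering: group unlabeled observations by feature similarity
cluster_model <- kmeans(iris[, 1:4], centers = 3)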

Now to make things interesting, I will leave a couple of problem statements below and your homework is to guess what type of problem (Regression, Classification or Clustering) it is:

  1. Problem Statement 1: Study a bank credit dataset and make a decision about whether to approve the loan of an applicant based on his socio-economic profile.
  2. Problem Statement 2: To study the House Sales dataset and build a Machine Learning model that predicts the house pricing index.
  3. Problem Statement 3: To cluster a set of movies as either good or average based on their social media outreach.

Don’t forget to leave your answer in the comment section.

Now that you have a good idea about what Machine Learning is and the processes involved in it, let’s execute a demo that will help you understand how Machine Learning really works.

Machine Learning In R

A short disclaimer: I’ll be using the R language to show how Machine Learning works. R is a Statistical programming language mainly used for Data Science and Machine Learning. To learn more about R, you can go through the following blogs:

  1. R Tutorial – A Beginner’s Guide to Learn R Programming
  2. R Programming – Beginners Guide To R Programming Language
  3. Top 50 R Interview Questions You Must Prepare in 2019
  4. A Comprehensive Guide To R For Data Science

Now, let’s get started.

Problem Statement: To study the Seattle Weather Forecast Data set and build a Machine Learning model that can predict the possibility of rain.

Data Set Description: The data set was gathered by researching and observing the weather conditions at the Seattle-Tacoma International Airport. The dataset contains the following variables:

  • DATE = date of the observation
  • PRCP = amount of precipitation, in inches
  • TMAX = maximum temperature for that day, in degrees Fahrenheit
  • TMIN = minimum temperature for that day, in degrees Fahrenheit
  • RAIN = TRUE if rain was observed on that day, FALSE if it was not

The target or response variable in this case is 'RAIN'. If you notice, this variable is categorical in nature, i.e., its value falls into one of two categories, True or False. Therefore, this is a classification problem and we will be using a classification algorithm called Logistic Regression.

Even though the name suggests that it is a 'Regression' algorithm, it actually isn't. It belongs to the GLM (Generalised Linear Model) family, and thus the name Logistic Regression.
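To see why it still produces a classification, note that the model passes a linear combination of the predictors through the logistic (sigmoid) function, which squashes any real value into a probability between 0 and 1. A quick sketch in R:

#The sigmoid function that Logistic Regression applies to the linear predictor
sigmoid <- function(z) 1 / (1 + exp(-z))
sigmoid(c(-4, 0, 4)) #0.018, 0.500, 0.982

A predicted probability above a chosen threshold (0.5 by default) is then labeled as one class, and anything below it as the other.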

Follow this Comprehensive Guide To Logistic Regression In R blog to learn more about Logistic Regression.

Logic: To build a Logistic Regression model in order to predict whether or not it will rain on a particular day based on the weather conditions.

Now that you know the objective of this demo, let’s get our brains working and start coding.

Step 1: Install and load libraries

R provides 1000s of packages to run Machine Learning algorithms and mathematical models. So the first step is to install and load all the relevant libraries.


#Load required libraries
library(tidyverse)
library(boot)
install.packages('forecast')
library(forecast)
library(tseries)
install.packages('caret')
library(caret)
install.packages('ROCR')
library(ROCR)

Each of these libraries serves a specific purpose, you can read more about the libraries in the official R Documentation.

Step 2: Import the Data set

Lucky for me, I found the data set online, so I don't have to collect it manually. In the below code snippet, I've loaded the data set into a variable called 'data.df' by using the 'read.csv()' function provided by R. This function loads a Comma-Separated Values (CSV) file.


#Import data set
data.df <- read.csv("/Users/Zulaikha_Geer/Desktop/Data/seattleWeather_1948-2017.csv", header = TRUE)

Step 3: Studying the Data Set

Let’s take a look at a couple of observations in the data set. To do this we can use the head() function provided by R. This will list down the first 6 observations in the data set.


> head(data.df)
        DATE PRCP TMAX TMIN RAIN
1 1948-01-01 0.47   51   42 TRUE
2 1948-01-02 0.59   45   36 TRUE
3 1948-01-03 0.42   45   35 TRUE
4 1948-01-04 0.31   45   34 TRUE
5 1948-01-05 0.17   45   32 TRUE
6 1948-01-06 0.44   48   39 TRUE

Now, let's look at the structure of the data set by using the str() function.


#Studying the structure of the data set
> str(data.df)
'data.frame': 25551 obs. of 5 variables:
$ DATE: Factor w/ 25551 levels "1948-01-01","1948-01-02",..: 1 2 3 4 5 6 7 8 9 10 ...
$ PRCP: num 0.47 0.59 0.42 0.31 0.17 0.44 0.41 0.04 0.12 0.74 ...
$ TMAX: int 51 45 45 45 45 48 50 48 50 43 ...
$ TMIN: int 42 36 35 34 32 39 40 35 31 34 ...
$ RAIN: logi TRUE TRUE TRUE TRUE TRUE TRUE ...

In the above code, you can see that the data type for the ‘DATE’ and ‘RAIN’ variable is not correctly formatted. The ‘DATE’ variable must be of type Date and the ‘RAIN’ variable must be a factor.

Step 4: Data Cleaning

The below code snippet will format the 'DATE' and 'RAIN' variables:


#Formatting 'date' and 'rain' variable
data.df$DATE <- as.Date(data.df$DATE)
data.df$RAIN <- as.factor(data.df$RAIN)

As I mentioned earlier, it is essential to check for any missing or NA values in the data set. The below code snippet checks for NA values in each variable:


#Checking for NA values in the 'DATE' variable
> which(is.na(data.df$DATE))
integer(0)

#Checking for NA values in the 'TMAX' variable
> which(is.na(data.df$TMAX))
integer(0)

#Checking for NA values in the 'TMIN' variable
> which(is.na(data.df$TMIN))
integer(0)

#Checking for NA values in the 'PRCP' variable
> which(is.na(data.df$PRCP))
[1] 18416 18417 21068

#Checking for NA values in the 'rain' variable
> which(is.na(data.df$RAIN))
[1] 18416 18417 21068

If you notice the above code snippet, you can see that the variables TMAX, TMIN and DATE have no NA values, whereas the 'PRCP' and 'RAIN' variables have 3 missing values each; these rows must be removed.


# Remove the rows with missing RAIN value
> data.df <- data.df[-c(18416, 18417, 21068),]

The values are removed successfully!

Step 5: Data Splicing

Data Splicing is just another fancy term for splitting the data set into a training and a testing set. The training data set must be bigger, since training the model and helping it study the trends requires a lot more data. The below code snippet splits the data set into training and testing sets in the ratio 7:3, which implies that 70% of the data is used for training, whereas 30% is used for testing.


#Data Splicing
#Data Partitioning: create a train and test dataset (0.7: 0.3)
index <- createDataPartition(data.df$RAIN, p = 0.7, list = FALSE)
# Training set
train.df <- data.df[index,]
# Testing dataset
test.df <- data.df[-index,]

You can check out the summary of the testing and training data set by using the summary() function in R:


> summary(train.df)

> summary(test.df)

Step 6: Data Exploration

This stage involves detecting patterns in the data and finding out correlations between predictor variables and the response variable. In the below code snippet I’ve used the cor.test() function provided by R.

This correlation test shows the significance of the predictor variables in building the model. Also, the cor.test() function requires variables of type numeric, which is why in the below code I've formatted the 'RAIN' variable as numeric.


#Setting rain variable as numeric for computing the correlation
train.df$RAIN <- as.numeric(train.df$RAIN)

#Correlation between 'Rain' variable and 'TMAX'
> cor.test(train.df$TMAX, train.df$RAIN)

Pearson's product-moment correlation

data: train.df$TMAX and train.df$RAIN
t = -55.492, df = 17882, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3957173 -0.3707104
sample estimates:
cor
-0.3832841

> #Correlation between 'Rain' variable and 'TMIN'
cor.test(train.df$TMIN, train.df$RAIN)

Pearson's product-moment correlation

data: train.df$TMIN and train.df$RAIN
t = -18.163, df = 17882, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.1489493 -0.1201678
sample estimates:
cor
-0.1345869

The above output shows that both TMIN and TMAX are significant predictor variables. Notice the p-value for both the variables. The p-value or the probability value is the most essential parameter to understand the significance of a model.

If the p-value of a variable is less than 0.05 it is considered to be an important feature in predicting the outcome. In our case, the p-value for each of these variables is way below 0.05 which is a good thing.

Before moving further let’s convert the ‘RAIN’ variable back into the ‘factor’ type:


#Setting rain variable as a factor for building the model
train.df$RAIN <- as.factor(train.df$RAIN)

Step 7: Building a Machine Learning model

After understanding the correlations, it’s time to build the model. We’ll be using the Logistic Regression algorithm to build the model. R provides a function called glm() that contains the Logistic Regression algorithm. The syntax for the glm() function is:

glm(formula, data, family)

In the above syntax:

  • Formula: The formula represents the relationship between the dependent and independent variables.
  • Data: The data set on which the formula is applied.
  • Family: This field specifies the error distribution and link function of the model. In our case, it is set to binomial for a binary logistic regression model.

> #Building a Logistic regression model
> # glm logistic regression
> model <- glm(RAIN ~ TMAX + TMIN, data = train.df, family = binomial)
> summary(model)

Call:
glm(formula = RAIN ~ TMAX + TMIN, family = binomial, data = train.df)

Deviance Residuals:
Min 1Q Median 3Q Max
-2.4700 -0.8119 -0.2557 0.8490 3.2691

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.808373 0.098668 28.46 <2e-16 ***
TMAX -0.250859 0.004121 -60.87 <2e-16 ***
TMIN 0.259747 0.005036 51.57 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 24406 on 17883 degrees of freedom
Residual deviance: 17905 on 17881 degrees of freedom
AIC: 17911

Number of Fisher Scoring iterations: 5

We’ve successfully built the model using the ‘TMAX’ and ‘TMIN’ variables, since both have a significant correlation with the target variable (‘RAIN’).
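
Because logistic regression models the log-odds of the outcome, the coefficients are easier to interpret after exponentiating them into odds ratios. This is an optional check, not part of the original walkthrough:

#Convert the log-odds coefficients into odds ratios
exp(coef(model))

#For example, exp(-0.250859) is roughly 0.78, so each one-degree rise in
#TMAX multiplies the odds of rain by about 0.78 (hotter days, less rain)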

Step 8: Model Evaluation

At this step, we’re going to validate the efficiency of the Machine Learning model by using the testing data set.


#Model Evaluation
#Storing predicted values
> predicted_values <- predict(model, test.df, type = "response")
> head(predicted_values)
2 4 5 8 9 18
0.7048729 0.5868884 0.4580049 0.4646309 0.1567753 0.8585068

#Creating a table containing the actual 'RAIN' values in the test data set
> table(test.df$RAIN)

FALSE TRUE
4394     3270
> nrows_prediction<-nrow(test.df)

#Creating a data frame containing the predicted 'Rain' values
> prediction <- data.frame(c(1:nrows_prediction))
> colnames(prediction) <- c("RAIN")
> str(prediction)
'data.frame': 7664 obs. of 1 variable:
$ RAIN: int 1 2 3 4 5 6 7 8 9 10 ...

#Converting the 'RAIN' variable into a character vector that stores either TRUE/FALSE
prediction$RAIN <- as.character(prediction$RAIN)

#Setting the threshold: probabilities of 0.5 and above are classed as rain
prediction$RAIN <- "TRUE"
prediction$RAIN[predicted_values < 0.5] <- "FALSE"
prediction$RAIN <- as.factor(prediction$RAIN)

The comments in the code above explain each step.
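
The same thresholding can also be written in a single vectorized step with ifelse(). This sketch is equivalent to the code above, assuming predicted_values holds the fitted probabilities:

#Apply the 0.5 cutoff in one step and store the result as a factor
prediction <- data.frame(
  RAIN = as.factor(ifelse(predicted_values < 0.5, "FALSE", "TRUE"))
)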


#Comparing the predicted values and the actual values
> table(prediction$RAIN, test.df$RAIN)

FALSE TRUE
FALSE    3460    931
TRUE       934    2339

In the below code snippet, we use the confusionMatrix() function from the caret package to evaluate the accuracy of the model.


#Confusion Matrix
> confusionMatrix(prediction$RAIN, test.df$RAIN)
Confusion Matrix and Statistics

Reference
Prediction FALSE TRUE
FALSE 3460 931
TRUE 934 2339

Accuracy : 0.7567
95% CI : (0.7469, 0.7662)
No Information Rate : 0.5733
P-Value [Acc > NIR] : <2e-16

Kappa : 0.5027

Mcnemar's Test P-Value : 0.9631

Sensitivity : 0.7874
Specificity : 0.7153
Pos Pred Value : 0.7880
Neg Pred Value : 0.7146
Prevalence : 0.5733
Detection Rate : 0.4515
Detection Prevalence : 0.5729
Balanced Accuracy : 0.7514

'Positive' Class : FALSE
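
As a quick sanity check, the accuracy reported above can be recomputed by hand from the confusion matrix: correct predictions divided by all predictions.

#Accuracy = (true negatives + true positives) / total observations
(3460 + 2339) / (3460 + 931 + 934 + 2339)
#~0.7567, which matches the accuracy reported by confusionMatrix()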

As per the above output, the model can predict the possibility of rainfall with an accuracy of approximately 76% which is quite good. To sum it up, let’s plot a graph that shows the Logistic Regression curve, which is known as the Sigmoid curve between the predictor variable TMAX and the target variable RAIN.


#Output plot showing the variation of the predicted rain probability with TMAX
ggplot(test.df, aes(x = TMAX, y = predicted_values)) +
  geom_point() +                          # plot the predicted probabilities
  geom_smooth(method = "glm",             # overlay the fitted logistic (sigmoid) curve
              method.args = list(family = "binomial"))


ggplot – Introduction To Machine Learning – Edureka

Now that you know the Machine Learning basics, I’m sure you’re curious to learn more about the various Machine Learning algorithms; our other blogs cover the different types of Machine Learning algorithms in depth.

So, with this, we come to the end of this Introduction To Machine Learning blog. I hope you all found this blog informative. If you have any thoughts to share, please comment them below. Stay tuned for more blogs like these!

If you are looking for online structured training in Data Science, edureka! has a specially curated Data Science course which helps you gain expertise in Statistics, Data Wrangling, Exploratory Data Analysis, Machine Learning Algorithms like K-Means Clustering, Decision Trees, Random Forest, Naive Bayes. You’ll learn the concepts of Time Series, Text Mining and an introduction to Deep Learning as well. New batches for this course are starting soon!!

The post Introduction To Machine Learning: All You Need To Know About Machine Learning appeared first on Edureka.

Top 50 Automation Anywhere Interview Questions You Should Know In 2019


Automation Anywhere has been the buzzword in today’s RPA industry and is one of the most popular RPA tools. It attracts many experienced professionals who want to advance their careers by a notch. Multinational companies such as Google, LinkedIn, Cisco, Dell, Genpact, Honeywell, IBM, HP, and Infosys use Automation Anywhere, yet there is a shortage of RPA-certified professionals in the market. I believe you already know these facts, which is probably what brought you to this Automation Anywhere Interview Questions article.

In this article on Automation Anywhere Interview Questions, I will be discussing the top Automation Anywhere related questions asked in your interviews. So, for your better understanding I have divided this article into the following 3 sections:

So let’s get started guys!!

Automation Anywhere Interview Questions

Basic Automation Anywhere Interview Questions

This section of questions will consist of all those basic questions that you need to know related to robotic process automation and its tools.

Q1. What is the difference between Automation and RPA?

  • What does it reduce? Automation reduces execution time; RPA reduces the manual workforce.
  • Need for programming knowledge: Automation requires it to create test scripts; RPA is mostly wizard-driven, so it is rarely needed.
  • Usage: Automation is used in QA, Production, Performance, and UAT environments; RPA is usually used in production environments.
  • What does it automate? Automation automates repetitive test cases, i.e. a product; RPA automates repetitive business processes, i.e. the product as well as the business.
  • Working environments: Automation works in a limited set of environments; RPA works across a wide range of environments.

So, in layman’s terms, what differentiates RPA from plain automation is its ability to adapt to various situations. Once trained to capture and interpret the actions of processes in existing applications, it can perform actions such as manipulating data, triggering responses, and simultaneously communicating with various other systems.

Q2. What is Robotic Process Automation?

Robotic Process Automation is the process of automating tasks with the help of software bots, reducing the human involvement needed to perform them.

Over here there are mainly three terms that you need to understand i.e: Robotic, Process, and Automation. Let me explain you each of these terms one by one.

  • Robotic: Entities which mimic human actions are called Robots.
  • Process: Sequence of steps which lead to meaningful activity. For example, the process of making tea or your favorite dish, etc.
  • Automation: Any process which is done by a robot without human intervention.

If we summarize all the terms together then, mimicking human actions to perform a sequence of steps that lead to meaningful activity, without any human intervention is known as Robotic Process Automation.

Q3. What is the difference between UiPath, Blue Prism, and Automation Anywhere?

  • Editions: UiPath offers a free Community Edition; Automation Anywhere recently launched a Community Edition; Blue Prism has no trial version.
  • Popularity: UiPath is the most popular of the three; Blue Prism is more popular than Automation Anywhere.
  • Coding: UiPath doesn’t require coding; Automation Anywhere doesn’t require programming knowledge, as it offers activities for each and every functionality; Blue Prism lets users write code, but they can manage without it.
  • Training and certification: UiPath has free online training and certification programs; Automation Anywhere recently launched a $50 certification; Blue Prism provides an official certification program.
  • Automation scope: UiPath provides desktop, web, and Citrix automation; Automation Anywhere is reasonable across all mediums; Blue Prism is designed for Citrix automation for BPO.

Q4. Can Automation Anywhere be used for testing the Agile method?

The answer is yes. We can definitely use Automation Anywhere for testing in an Agile methodology, for instance as part of Continuous Integration. However, when the documentation is complex or the needs of Agile testing change too frequently, Automation Anywhere is not a good fit.

Q5. Is Robotic Automation like screen scraping or macros?

The answer is no. Robotic automation is a generational advance on older technologies like screen scraping or macros. This is because robots are universal application orchestrators – any application that can be used by a person can be used by a present-day robot, whether mainframe, desktop application, legacy, or even web-service enabled.

Robots assemble their procedural knowledge, which over time joins a shared library that can be re-used by any other robot or device. Applications are “read” by the robot either through dedicated APIs where they exist, through the OS, or through the screen in the case of a native application. Here the modern robot “reads” an application screen in context, in the same way a user does.

Q6. What is the difference between RPA and Selenium?

  • Automation: RPA automates business processes; Selenium automates browser applications.
  • Availability: for RPA, UiPath and Automation Anywhere have community versions while Blue Prism is licensed; Selenium is open source.
  • Where the task is performed: RPA works at the back end of the process; Selenium works on the current browser page.
  • Major component used: RPA bots; Selenium WebDrivers.
  • Level of automation: RPA automates low-value clerical processes; Selenium automates no clerical processes.
  • Life cycle: RPA’s is simple and easy; Selenium’s is relatively difficult.
  • Platform dependency: RPA is platform independent; Selenium is dependent on the browser platform.
  • Programming knowledge: not required for RPA; required for Selenium.
  • Skills required: RPA calls for SQL databases, analytical skills, problem-solving ability, managing data, and knowledge of the RPA tools; Selenium calls for Selenium IDE (creating a test suite).

Q7.  What do you know about Automation Anywhere?

Automation Anywhere is an RPA tool whose motive is to provide its users with scalable, secure, and resilient services. The tool recently launched a Community Edition that lets you first explore the tool and automate tasks, and then offers an enterprise service.

Automation Anywhere offers strong performance, as it can integrate with different platforms and scale simultaneously. The tool is meant to be used at the enterprise level and is mainly designed for solving complex issues.

What is Automation Anywhere - Automation Anywhere Interview Questions - Edureka

Q8. Can you explain the architecture of Automation Anywhere?

Automation Anywhere follows a distributed architecture, in which centralized management is accomplished via Automation Anywhere’s Control Room.

The architecture of this tool is mainly segregated into Bot Creators and Bot Runners. Both of these components are connected to the Control Room as you can see in the diagram below. 

Automation Anywhere Architecture - Automation Anywhere Interview Questions - Edureka

So, let’s look into each of these components one by one.

Bot Creators

As the name suggests, Bot Creators are used to create bots. They are desktop-based applications that authenticate against an active Control Room and only have access to upload or download bots. Once these bots are configured for the Control Room, multiple developers can create individual tasks/bots and execute all of them at once.

Control Room 

Control Room is the most important component of the architecture. It is a web server that controls the bots created by the Bot Creators. As Automation Anywhere focuses on centralized management, the Control Room offers features such as centralized user management, automation deployment, and source control, and it provides a dashboard.

Bot Runners

Bot Runners are used to execute the bots. They can run multiple bots in parallel but cannot create or update automation. They are run-time clients installed on Windows machines that report the execution-log status back to the Control Room.

So, to summarize all three components together: a developer creates a task/bot and uploads it to the Control Room, and the Control Room then schedules and executes these bots on the Bot Runners based on the requirements or priority.

Q9. What are the different types of bots and when are they used in Automation Anywhere?

There are mainly three different bots used in Automation Anywhere:

Automation Anywhere Bots - Automation Anywhere Interview Questions - Edureka

  • IQ Bots: The IQ Bots allow the developers to add cognitive capabilities to the process. It uses cognitive capabilities to extract information from semi and unstructured data and also detects patterns so that the next time the pattern is encountered the bot knows exactly what to do.
  • Task Bots: Task Bots are the core of automation. These bots execute repetitive, rule-based tasks that rely on structured data and are easy to build. They can execute multi-step processes around the clock with no errors.
  • Meta Bots: The Meta Bots have the capability of integrating dynamic link library(DLL) that can be used for back end automation. It includes GUI components which are to be used for front end automation and maximizes multi-level integration to automate processes along with Task Bots.

Q10. What are the types of recorders in Automation Anywhere?

Types of Recorders - Automation Anywhere Interview Questions - Edureka

  • Screen Recorder: The Screen Recorder, or Standard Recorder, provides the easiest way to create a simple automation process. It is usually used when the task involves many mouse clicks and keyboard operations.
  • Smart Recorder: The Smart Recorder, or Object Recorder, is the most feasible method for building tasks. It is ideal for desktop applications and captures objects such as drop-down menus, list boxes, radio buttons, check boxes, and mouse clicks.
  • Web Recorder: The Web Recorder is used to perform tasks that require repetitive actions such as:
    • Extracting data from multiple web pages
    • Extracting data from tables on web pages
    • Filling web forms
  • Task Editor: The Task Editor is used to build a task out of several commands. It allows you to open multiple tasks and edit them simultaneously, and it includes components such as the Commands panel, Task Actions List, Action Buttons, Error View, Variable Manager panel, and Filters.

Q11. Mention a Few Benefits of Automation Anywhere.

The benefits of Automation Anywhere are as follows. You can refer to the below diagram.

  • Navigates different digital landscapes: Automation Anywhere adjusts itself to the movement of icons, buttons, and user-generated events.
  • Built for complexity: The tool completes processes and automates tasks that would otherwise require thousands of lines of code.
  • Made for the enterprise: It is deployed throughout an enterprise so that multiple departments can focus on processes that require human intervention.
  • Easy programming: It supports front-end automation and does not involve complex programming, so even non-IT professionals can work with the tool.
  • Easy integration: The tool is platform independent and can be easily integrated with other systems, so you can integrate it with any platform you wish.
  • Quick deployment: Automation Anywhere provides features like drag and drop and a friendly interface, which help in quick deployment.

Benefits of Automation Anywhere - Automation Anywhere Interview Questions - Edureka

Q12. What do you mean by Sikuli?

Sikuli is a tool that can be utilized for automating web components, i.e. essentially the graphical UI. It exposes an API that can be incorporated into various systems; for example, Windows-based applications can be automated with the assistance of Sikuli.

Q13.  What are the different automation frameworks in software automation testing?

  • Linear Scripting Framework: It is a basic level test automation framework which is in the form of ‘Record and Playback’ but in a linear fashion. This type of framework is mostly used to test small sized applications.
  • Data-Driven Framework: It is used to create test automation scripts by passing different sets of test data. The test data, which includes inputs, expected outputs, and result fields, is stored in files like CSV files, Excel files, text files, XML files, etc.
  • Modular Testing Framework: Here the testers divide the application into multiple small modules and create test scripts individually. These individual test scripts are combined to make larger test scripts by using a master script to achieve the required scenarios.
  • Keyword Driven Framework: In this framework, testers use a table format to define keywords or action words for each method. Based on the keywords specified in the Excel sheet, test scripting is done and tests are executed.
  • Hybrid Testing Framework: As the name suggests, this framework is the combination of two or more frameworks mentioned above. It attempts to leverage the strengths and benefits of other frameworks based on tester’s requirements.

Q14. What are the features of Automation Anywhere Client?

The features of Automation Anywhere Client are as you can refer from the image below:

Features of Automation Anywhere Client - Automation Anywhere Interview Questions - Edureka

  • Logging: You can use the Log to File command to create a log file in which Task Bot/Meta Bot information can be stored
  • Scheduling Task: Automation Anywhere provides a Scheduler and a Schedule Manager that you can use to run your tasks anytime you want
  • Setting General Properties: You can view and edit the task’s general properties by using the General tab, once you are done creating a task.
  • Using Filters: You can use the Filters bar to manage long tasks
  • Sending Hotkeys: One of the distinguishing benefits of automating your tasks with Automation Anywhere is the ability to launch a task with the press of a single key
  • Debugging Tasks: Automation Anywhere provides a facility that enables you to debug your more complex and longer automation tasks
  • Adding Triggers: The trigger feature enables a task to run automatically in response to an event that occurs on your computer. For example, you can use a trigger when a new window opens or a specific file is created.

Q15. What do you think are the reasons for not considering Manual Testing in the Automation Anywhere approach?

The reasons for not considering Manual Testing in the Automation Anywhere approach are as follows:

  • A humongous amount of time is required to do the process manually.
  • It requires lots of additional resources.
  • It is error-prone and thus reduces accuracy.

So, when the tasks or projects are large and time-constrained, using manual testing instead of Automation Anywhere is not a wise option, as it would decrease the performance of your resources.

Now, next in this article we will go through the Tool Based Automation Anywhere Interview Questions.

Tool- Based Automation Anywhere Interview Questions

This section of the article will consist of questions related to the working of the tool.

Q1. What is the difference between Wait and Delay commands?

The major difference between the Wait and Delay commands lies in when you use them. The Wait command is used when you want to wait for components on the screen, or the screen itself, to change. The Delay command, on the other hand, is used inside loops to tune the pacing of actions in a task.

Q2. What are the commands used for Error Handling Command in Automation Anywhere?

There are mainly two commands which can be used to handle errors and also debug them i.e. Begin Error Handling and End Error Handling. Also, this tool provides the following actions to help you in error handling:

  • Take Snapshot: With this feature, you can take a snapshot of the screen of any error.
  • Run Task: This feature is used to run any tasks when the current task faces an error.
  • Log Data Into File: This feature logs the error into a file.
  • Send Email: This feature is used to send an email when an error happens.
  • Variable Assignment: This feature is used to specify a value to be assigned and set tasks depending on the action.

Q3. What do you understand by predefined variables in Automation Anywhere?

The Pre-defined variables are system variables that are provided by Automation Anywhere to automate tasks. The different pre-defined variables are as you can see in the below image:

System Variables of Automation Anywhere - Automation Anywhere Interview Questions - Edureka

To check the pre-defined variables, go to the Variable Manager on the right-hand side of the task pane and click on the Show System Variables. This will open the system variables. Refer to the example below to understand how you can use these variables.

Example:

Let us try using a pre-defined variable Clipboard. To do that, use the Web Recorder from the workbench and follow the below steps:

Step 1: Mention the URL and then click on Start. Refer below.

Mention URL - Automation Anywhere Interview Questions - Edureka

Step 2: Once you click on Start, the mentioned URL will open in Internet Explorer.

Step 3: Now, choose the option of Extract Data and choose whether you want to extract regular data or pattern-based data. Refer below.

Extract Data - Automation Anywhere Interview Questions - Edureka

Step 4: Now click on what data you wish to extract. Refer below.

Automation Anywhere Pre defined Variables - Automation Anywhere Interview Questions - Edureka

Step 5: Once you click on the data you wish to extract, a dialog box opens up. In this dialog box, choose the Clipboard variable in the ‘extract control value to a variable’ section, then click on Save.

Extract Data Variable - Automation Anywhere Interview Questions - Edureka

Step 6: Now, add a Message Box to your task pane and insert the Clipboard variable (press Ctrl + F2) to display the output. Refer below.

Variable - Automation Anywhere Interview Questions - Edureka

Step 7: Execute the task. Once you execute the task, you will see the extracted text displayed as output in the message box.

Q4. What are the different ways to schedule a task?

There are two ways to schedule a task: Scheduler & Schedule Manager.

Scheduling a task using Scheduler

  • Select the task that you want to schedule
  • Click the schedule tab on the main Automation Anywhere window
  • Set the scheduling information for time and date
  • Click on Save
  • To add more scheduled times, click the Add button

Scheduler - Automation Anywhere Interview Questions - Edureka

Scheduling a task using Schedule Manager

To launch the scheduler manager:

  • In the main Automation Anywhere window, click on the ‘Schedule Manager’ tab on the lower left side or click on ‘Tools -> Schedule Manager’ on the menu bar. Refer below.

Schedule Manager- Automation Anywhere Interview Questions - Edureka

To add a schedule, click on Add. Then you can select the task you want to schedule.

Q5. How to automate Windows Tasks using Actions in Automation Anywhere?

To automate Windows tasks using actions, you have to choose the Windows Actions activity from the activity pane. Now, if you wish to:

Resize a window :

  • Click on the Capture button and draw a rectangle around the desired dimensions of the window using the mouse. After that, save your task and execute it. You will see that your window has been resized.

Get Active Window Title:

  • Use this action when a window title needs to be assigned to a variable for use during the automation. Then save your task and execute it. You will see that the active window title has been extracted.

Q6. What is the process of using the OCR command for Image Recognition?

The process of using an OCR command for Image Recognition are as follows:

  • Specify an image, which can also be a window.
  • Select the OCR engine (the default is TESSERACT) and set a threshold amount to determine OCR accuracy.
  • Then assign the extracted text value to a variable.

Q7. How to copy an Excel cell and move to the next cell in your record?

To copy an Excel cell and move to the next cell in your record you can do the following:

To copy the data into the cell use Keystrokes as below:

[F2][HOME][SHIFT DOWN][END][SHIFT UP][CTRL DOWN]c[CTRL UP].

Now to move the cursor to the next cell after copying the data into the present cell, use the following keystroke:

ENTER + TAB.

Here, use ENTER to move a row down in the same column as the present cell and TAB to move to the next cell RIGHT in the same row.

Q8. How to create timestamps for your files using Automation Anywhere system variables?

As you can see from question 3 in this section, the system variables are pre-defined variables in Automation Anywhere; below are a few more system variables.

  • Year
  • Month
  • Day
  • Hour
  • Minute
  • Second
  • Date

Now, you can combine the above variables to create a timestamp of your own choice as below:

To append the year, month, and day at the end of the word Example, you can write the command as:

Example$Year$$Month$$Day$

To include text characters between the variables:

Example$Year$:$Month$:$Day$

To create a complete timestamp with date and time, you can write:

Example$Year$$Month$$Day$$Hour$$Minute$$Second$

NOTE: To configure the format of the Date variable, choose Tools -> Variable Manager -> System Variable -> Date.

Q9. How to paste data in an application and move to the next item?

To paste data in an application and move to the next item you have to follow the below process.

  • Identify the element which you wish to copy and use the keystroke [CTRL DOWN] + [CTRL UP]
  • Now, move to the next item by using the TAB key to go from the highlighted item to the next item.
  • Use the RIGHT ARROW to move to the next cell in the same row but a different column of a table.

NOTE: Some applications also support using the Space bar to move to the next control or button.

Q10. How to use the String Operation Commands in Automation Anywhere?

You can use the String operation commands in Automation Anywhere to perform various actions such as below:

  • Compare
  • Join
  • Length
  • Reverse
  • Trim
  • Before/After
  • Find
  • Lower Case
  • Replace
  • Split
  • Sub-string

Example:

To compare two strings:

Step 1: Create two variables in the variable manager and assign them a string. Here I have assigned the variables string1 and string2 with the following strings:

string1 -> Hi from edureka

string2 -> Welcome from edureka

Step 2: Now drag the Compare command from the String Operation activity to your workbench.

Step 3: In the dialog box which opens up, mention the variable names in the String 1 and String 2 boxes by pressing Ctrl + F2. Refer below.

String Operation - Automation Anywhere Interview Questions - Edureka

Step 4: Now add a message box and mention the output variable to display the output.

Step 5: Execute the task. You will see that the strings will be compared and the output will be given as FALSE.

Q11. What do you understand by System Variables $CurrentDirectory$, $FolderName$ and $FileName$?

The system variables are used with commands Loop for Folders in a folder or Loop for Files in a folder.

Consider a command in the task – “Loop for Folders in a <folder>”. Here, <folder> is the specified folder over which the loop will execute. Its full path is available at execution time through the system variable “$CurrentDirectory$”.

If <folder> has ‘n’ number of folders, the loop will be executed ‘n’ times. Each time the $FolderName$ variable will have the next folder’s name inside the specified folder.

Now, if the task is like “Loop for Files in a <folder>”, then $FileName$ will represent the next file name inside the specified folder. Outside the loop, $CurrentDirectory$, $FolderName$, and $FileName$ have no usage.

Q12. When are triggers used in Automation Anywhere?

Triggers are used in Automation Anywhere to:

  • Launch the Trigger Manager by clicking on Triggers in the main window.
  • Add, delete, or edit triggers.
  • Enable or disable triggers.

Q13. Mention the command to launch a website.

The command to launch a website is: Launch Website Activity.

You simply have to drag the Launch Website command and mention the URL. Then, tick the box depending on whether you wish to open the website in a new tab of the existing window or in a new window.

Q14. How can we read CSV Files through Automation Anywhere?

The command to open a CSV file or a text file, by providing the location of the file, is OPEN PROGRAM/FILE. Then, use the READ CSV/TEXT FILE command to read the CSV file.

Q15. Is it possible to read PDF through Automation Anywhere?

Yes, it is possible to read PDF through Automation Anywhere. The command is PDF Integration. This command is used to read PDF of single or multiple pages, extract values, merge two PDF documents and many more.

Q16. Can you brief us on PGP?

PGP is used to ENCRYPT/DECRYPT files or to create keys by assigning a PASSPHRASE.

Q17. Which commands are not recommended to use if the application offers full support of objects for automation and is local?

  • Insert Mouse click
  • Insert Mouse scroll
  • Insert Mouse Move

Q18. What is the best way to open an application as part of a task in Automation Anywhere?

There are two ways to open an application in a task:

  • Either double-click the application icon on the desktop, when you want to record the task.
  • Or click on the Start Menu -> Go To Programs -> Select the application.

Now, if the location of these icons changes, the task will fail when you try to execute it. To avoid such errors, you can follow the steps below:

  • First, open the task in the Task Editor.
  • Then select the actions which might involve several mouse clicks and moves
  • Delete those actions and replace them with single-line commands. Here you will use the Open Program/File command and specify the path to the application.
  • The task will now open an application, regardless of where the icon is located.

Q19. How to resolve the problem of not being able to view the Run button in Automation Anywhere?

To resolve the problem of not being able to view the Run button in Automation Anywhere, you can follow the below steps:

  • Request Create Task permissions from your Server Administrator to procure adequate permissions.
  • The Administrator grants the Create Task privilege using the Enterprise Control Room, in the Client Control Center, using the Client Information section.
  • For Upload, Download, Delete, and View privileges for a particular folder on the server, request Access Control List permissions from the Enterprise Control Room.
  • After the Access Control List is updated in the Client Control Center and the Create Task privilege is granted, the client must re-login.
  • The client should now be able to communicate with the server using the new privileges.

Q20.  How to set email setting and SMTP server in Automation Anywhere?

To set the email and SMTP server in Automation Anywhere, go to the Client Room and then go to Tools. Under Tools, you will find an option for email notifications. There, fill in all the details such as Host, Port No., User ID, Password, etc.

Now, next in this article we will go through the Scenario-Based Automation Anywhere Interview Questions.

Scenario-Based Automation Anywhere Interview Questions

This section of the article will consist of questions based on different scenarios where Automation Anywhere can be used.

Q1. Consider a scenario wherein you had to query the system for new orders manually and validate each purchase order to apply the relevant prices and discounts. Obviously, this procedure is quite tedious, time-consuming, and prone to errors, as the manual workforce has to constantly log in to the system.

What do you think will be the process workflow to resolve this issue with Automation Anywhere?

Solution:

The automation flow can be designed as follows:

  • The system can first extract the data from the customer database and check for new purchase orders.
  • Then once, a purchase order gets downloaded it is immediately pushed into the legacy system.
  • After that, an agent can keep an eye on the process and manually validates the order for maintaining accuracy.
  • Finally, Automation Anywhere can be used to upload the purchase into a database and then discounts can be automatically applied.
  • Later on, agents cross-check the fulfilled order and ensure quality control.

Q2. Consider a company that provides banking services to customers in many countries. The company’s focus is to develop a virtual workforce that combines cognitive, analytic, human, and robotic capabilities to enhance the customer experience while automating monotonous tasks and increasing efficiency.

How will the company achieve its goals?

Solution:

By using Automation Anywhere, the company can automate hundreds of processes with the help of bots. This would increase the efficiency of its automated back-office processes and reduce the time taken to serve customers.

Q3. Consider a scenario where a multi-national company is facing challenges such as maintaining regulatory compliance and being unable to enforce role-based access control, since each type of user needs a different level of access. Also, consider that this firm has just moved beyond basic Robotic Process Automation to machine learning and optical character recognition (OCR) technologies.

How do you think the firm can resolve this issue?

Solution:

Well, the Bot Insight product of Automation Anywhere is the right solution to this kind of scenario. Bot Insight is designed to create role-based access controls for each category of user and gives the company visibility into the overall bot lifecycle. This helps the company with strategic planning and allows it to meet the standards in the market.

Q4. Consider a scenario where an employee has to deal with unsatisfied customers. In a company such as Swiggy or Zomato, there may be situations where thousands of unsatisfied customers have to be handled at the same time.

How do you think, you can reduce the load on the manual workforce to handle so many unsatisfied customers together?

Solution:

Now, to reduce the load on the manual workforce, you can design bots in Automation Anywhere to suggest questions the agent can ask the customer while simultaneously gathering information. With the help of cognitive capabilities, the bot can then offer recommendations for the next best actions to take. In this way, a bot can make sure the employee provides better service while keeping the interaction with the customer aligned with the company’s guidelines. Refer to the diagram below.

Use Case - Automation Anywhere Interview Questions - Edureka

Q5. Consider a health insurance company that aims to automate its back-office processes with a tool that fits easily into the design of the organization and provides ease of deployment with a client-application architecture.

How do you think the company can automate its back-office process with the tool?

Solution:

A health insurance company must focus on automating the following three tasks:

  • Member enrollment process
  • Test to check commercial claims
  • Build healthcare products
Member enrollment process:

If previously 90% of the member enrollment data had to be entered manually into ‘n’ different product lines while toggling between any number of applications, this could be simplified with Automation Anywhere by implementing a two-step solution as follows:

  • Converting the XML data into an electronic application
  • Then, enabling data entry from the electronic enrollment applications into the system.
Test to check commercial claims

The test to check commercial claims can be very tedious if it is not automated. It can be automated in two phases:

  • Phase 1: Automate the screen-capture process.
  • Phase 2: Automate the testing process of claims.
Build healthcare products

The process of building a package under a pre-defined insurance plan can be highly time-consuming. Automation Anywhere can automate this process by collecting all the required information from various spreadsheets. To validate the collected information, bots can match the extracted data against the business rules and, once it is validated, enter the details into the application. After this, the application can be updated.

Q6. Consider a scenario where an employee’s sole job is to extract data from an application, generate reports, validate those reports, and finally update the data into the database.

Can you elaborate on the process of automating this task? 

Solution:

An employee can trigger a bot via hotkey to collect data from multiple systems. The employee can then check the generated results and produce a report. After validating the report, the bot updates the data in the database, which ends the task.

Use Case 3 - Automation Anywhere Interview Questions - Edureka

Q7. Consider a scenario where a leading manufacturer of mining and construction equipment is facing challenges with its supply chain management, including long processes, error-prone processes, and lengthy error detection.

How do you think the manufacturer can resolve this issue while ensuring that costs are reduced and that customer experience and operational efficiency improve?

Solution:

From the above question, it is clear that the client’s motive is to establish flexible technology support, provide enterprise-grade quality, increase the speed of product deployment, and provide training and licensing models. Automation Anywhere is an RPA tool that can cater to all these needs.

Now, the processes which can be automated are as follows:

  • Sales orders: The bots can automate the process of searching for a sales order in the system, check all the details related to the order, update the data in the application, and notify the user whether the task is completed.
  • Operations related to transportation: Activities such as identifying recent failures, extracting the reason for a failure, and notifying the related teams via email to resolve it can be automated using various activities in Automation Anywhere.
  • Shipment-related tickets: The bots can automatically create a ticket for a customer query, and once the ticket has been resolved, an email can be sent to the respective client.

Q8. Consider a scenario wherein an employee’s sole job is to check the claims that customers raise and approve those that satisfy the guidelines.

How do you think you can automate this task?

Solution:

Well, you can automate this task by using the Automation Anywhere Client. You can create a workflow as below.

An employee can trigger a bot via hotkey to collect data from multiple sources. The system will then auto-approve the claims which satisfy the guidelines. Later, the employee can look for any exceptions found and resolve them; if no exceptions are found, the task ends.

Use Case 2 - Automation Anywhere Interview Questions - Edureka

Q9. Consider a company which delivers hardware and software solutions to the world’s largest corporations. This company had a manual process in which hundreds of employees worked to validate customer orders. The process required the manual creation of templates for customer order forms, which made it quite error-prone.

How do you think the company can automate this process to avoid the manual creation of templates and also reduce errors?

Solution:

The company can automate the processing of unstructured data with the help of IQ Bots. The IQ Bots allow the developers to add cognitive capabilities to the process. It uses cognitive capabilities to extract information from semi and unstructured data and also detects patterns so that the next time the pattern is encountered the bot knows exactly what to do. With IQ Bots, the company can ensure satisfied customers, and improve troubleshooting.

Q10. Consider a scenario where an organization offers a wide range of health and social programs, and applicants must provide supporting documents to qualify for the various assistance programs.

The main challenge the organization faced was getting two specific programs to communicate with each other automatically. Without automation, an employee would take information from the relevant forms and look for the supporting documents in a different system. After that, the agent would verify the information and provide it to the applying individual.

This seems quite tedious, doesn’t it? So, how do you think the company might have resolved this problem?

Solution:

Well, with the help of Automation Anywhere, you can build a solution to bridge the gap. An employee can get all the required information from the Documents Library and then launch the verification process with the help of hotkeys in Automation Anywhere. You can also design a workflow that opens the forms on the employee’s screen, identifies the relevant documents, and gathers them from the document library, so that the employee can use these documents to provide the information to clients. Refer to the diagram below.

Automation Anywhere Use Case - Automation Anywhere Interview Questions - Edureka

Q11. Consider a scenario where you wish to move hundreds of files from one folder to another.

What will be the steps to do so?

Solution:

Step 1: Open the Automation Anywhere Client and go to the Files/Folders activity. Now, choose the Copy Files action and drag it to your workspace.

Step 2: Over here choose the option Folder since we wish to move a folder from a source path to the destination path.

Step 3: Then mention the source path in the Source File section and the Destination path in the Destination path section. Refer below.

Move Files from Folder - Automation Anywhere Interview Questions - Edureka

Step 4: Execute the task. Once you execute the task you will see that the files present in the folder at the source path will be moved to the destination path.

Q12. Consider a scenario where you wish to automate the task of writing text into a notepad file.

What will be the steps to perform this task?

Solution:

Step 1: Open the Automation Anywhere Client and drag and drop your insert keystrokes command.

Step 2: Type in the text that you want displayed in Notepad using the keystrokes. In the below snapshot, I have used the ENTER and CAPS LOCK keystrokes. Refer below.

Insert Keystrokes- Automation Anywhere Interview Questions - Edureka

Step 3: Execute the task. Once you execute the task, you will see the text is automatically typed.

Q13. Consider a scenario where you have to merge data from many PDF documents into a single PDF document.

How do you think you can do this?

Solution:

Step 1: Open the Automation Anywhere Client and go to the PDF Integration activity. From this activity choose the Merge Documents command. Drag this command to the workbench.

Step 2: In the dialog box that opens up, choose the Add option and choose all the documents you want to merge.

Step 3: Once all the files are chosen, mention the path of the output file which will contain all the merged documents, and tick the Overwrite box. Refer below.

Merge PDF Documents - Automation Anywhere Interview Questions - Edureka

Step 4: Execute the task. Once you execute the task you will see that all the chosen PDF documents will be merged into a single document.

Q14. Consider a scenario where an employee wants to perform various windows actions such as opening a window, closing a window, resizing a window.

What are the commands used to perform this action?

Solution:

Step 1: Open the Automation Anywhere Client and go to the Windows Actions. From this activity choose the Close Window command. Drag this command to the workbench.

Step 2: In the dialog box that opens up, select the window which you want to close. Click on Save. Refer below.

Windows Actions - Automation Anywhere Interview Questions - Edureka

Step 3: Execute the task. Once you execute the task you will see that the window has been closed.

Q15. Consider a scenario wherein you want to capture an area of the screen.

What are the commands used to perform this action?

Solution:

Step 1: Open the Automation Anywhere Client and go to the Screen Capture activity. From this activity choose the Capture Area command. Drag this command to the workbench.

Step 2: Select the window and then drag your mouse over the area you wish to capture.

Step 3: Mention the image location and choose the option to overwrite File. Refer below.

Capture Area - Automation Anywhere Interview Questions - Edureka

Step 4: Execute the task. Once you execute the task, you will see that the selected area of the screen has been captured and saved to the specified image file.

So, folks! With this, we come to an end of this article on Automation Anywhere Interview Questions. If you wish to jump-start your career as an RPA developer, then start learning RPA and its various tools.

We at edureka, offer Robotic Process Automation Training using UiPath. Edureka is a trusted online learning company with a network of more than 250,000 satisfied learners spread across the globe. This training will help you gain deep knowledge in Robotic Process Automation and hands-on experience in UiPath.

Got a question for us? Please mention it in the comments section of this Automation Anywhere Interview Questions and we will get back to you.

The post Top 50 Automation Anywhere Interview Questions You Should Know In 2019 appeared first on Edureka.


Azure DevOps Tutorial : Why Should You Use DevOps On Azure?


DevOps is the need of the hour, and many organizations want to incorporate this approach to make their businesses function better. Azure is one of the leading cloud service providers and supports a powerful set of DevOps services. This article on Azure DevOps helps you understand how DevOps can be implemented using Azure.

We would be discussing following pointers here:

  1. What is Azure?
  2. What is DevOps?
  3. Why go for Azure DevOps?
  4. Components of Azure DevOps

Let us get started then,

What is Azure?

A company that provides various services to cater to cloud computing needs is known as a cloud service provider. Microsoft Azure is one such vendor.

Microsoft Azure is a Cloud Computing platform created by developers and IT professionals at Microsoft. It lets you build, deploy, and manage applications through Microsoft’s global network of data centers.

These are some of the core service domains that Microsoft Azure covers:

  • Compute
  • Storage
  • Networking
  • Databases
  • Monitoring

Now let us continue with this Azure DevOps blog and understand what DevOps is.

What is DevOps?

Going by the definition,

DevOps is the process of integrating development and operations teams in order to improve collaboration and productivity. This is done by automating workflows and continuously measuring application performance.

DevOps - Azure DevOps - Edureka

This definition might seem ambiguous to beginners because it contains a lot of unexplained terms, so let us try to understand those first.

Faster software delivery has become the need of the hour. The software market these days is volatile, and you need to stay up to date and ensure you deliver the best and latest software, which can be achieved through continuous delivery and integration. This is where the roles of software developers and system administrators become very important.

A software developer is the one who develops the software. They have to ensure the software has the following parameters taken care of:

  • New features
  • Security Upgrades
  • Bug Fixes

A developer, however, has to take ‘time to market’ into consideration, and this time constraint forces them to re-adjust their activities around:

  • Pending code
  • Old code
  • New products
  • New features

This is what happens when the product is put into the production environment: it may exhibit unforeseen errors, because the code was written in a development environment that can differ from the production environment.

Now let us see the same scenario from the operator’s perspective:

This team looks after the maintenance of the software to ensure appropriate uptime in the production environment. With growing software development needs, administrators or operators are forced to look after many servers in parallel.

Now, the tools that were used to manage the earlier number of servers may not be sufficient for the growing fleet. The operations team makes minor changes to the code so that it fits the production environment as well as it did the development environment, and these deployments need to be scheduled properly to avoid delays.

Deployment Wall - Azure DevOps - Edureka

When the code is deployed, the responsibility of an operator increases even further: they are required to manage code changes and any errors that arise. At times it may seem as if developers have pushed their responsibilities onto the operations side. This is where the problem lies, and if these two teams could work together, a lot of problems would be solved. They:

  • could break down silos
  • share responsibilities
  • start thinking alike
  • work as a team

This is what DevOps does: it brings these two teams under one roof. Now I believe the definition we saw at the beginning of the section makes a lot more sense.

Now let us see what makes Azure a good fit for DevOps.

Why go for Azure DevOps?

We all know Azure is a leading cloud service provider and is definitely the need of the hour. The following Azure features ensure DevOps is implemented in the best possible way:

Catalyze cloud development

You can now worry less about creating pipelines and focus on software development. Whether it is creating new pipelines or managing existing ones, Azure provides an end-to-end solution and thus speeds up the software development process.

Better Reliability and Continuous Integration

Now you don’t have to worry about managing security and infrastructure, and you can focus more on developing innovative solutions. With Azure and the CI/CD it supports, you can harness infrastructure as code, with tools like Azure Resource Manager or Terraform, to create repeatable deployments that also meet compliance standards.

Freedom Of Customization

With Azure providing end-to-end solutions for your software development, you have the freedom to use the tools of your choice, as Azure readily integrates with most of the market tools out there, making customization and experimentation easier.

The above reasons make Azure a highly compatible and preferable option for DevOps. Let us now look at the components that make Azure DevOps such a good combination.

Components of Azure DevOps

Following are the Azure DevOps Components:

  • Pipelines
  • Boards
  • Artifacts
  • Repos
  • Test Plans

Let us understand them one by one:

Azure Pipelines

azure pipeline - Azure DevOps - Edureka

So what is an Azure Pipeline? In simple words, it is a pipeline supported by the Azure platform where you can build, test, and deploy applications continuously.

Azure Boards

Azure Boards - Azure DevOps - Edureka

If you have multiple teams working on a project, those teams need to communicate well. Azure Boards ensures better work tracking, lets you manage backlogs, and also enables the creation of great custom reports.

Azure Artifacts

Azure Artifacts - Azure DevOps - Edureka

Azure enables you to create, host, and share packages within your team. Artifacts in Azure ensure your pipelines have fully integrated package management, which can be achieved with a mere click. You can also create Maven, npm, and NuGet packages. Here, the team size does not matter.

Azure Repos

Azure Repos - Azure DevOps - Edureka

Think of it as a home or storage for repositories. It provides you with unlimited cloud-hosted private Git repositories. You can pull, push, and commit your changes to these repositories.

Azure Test Plans

Azure test Plans - Azure DevOps - Edureka

It provides you with a complete toolkit to perform end-to-end, manual, and exploratory testing, ensuring your software functions just fine.

This was about the Azure DevOps components. If you wish to know more about Azure DevOps and how to develop a CI/CD pipeline, you can refer to the video below:

Azure DevOps: Developing CI/ CD Pipelines On Azure | Edureka

So this is it guys, this brings us to the end of this article on Azure DevOps. Likewise if you are interested in AWS and you wish to take your expertise in AWS DevOps to the next level and get certified in the domain, then you might want to take a look at this certification training on ‘AWS Certified DevOps Engineer‘ by Edureka which is specially catered to help individuals gain expertise in the domain.

If you do have any queries, you can put those in the comment section below and we would revert at the earliest.

The post Azure DevOps Tutorial : Why Should You Use DevOps On Azure? appeared first on Edureka.

JMeter Plugins : All You Need To Know About Plugins Manager


It is important to ensure the effective performance of a software application, and software testing is the key to making sure the application runs without failures. Plugins enable everyone to contribute to a program; essentially, they are used to improve the capabilities of the software. In this “JMeter Plugins” article, we will see how they work in the following sequence:

Introduction to JMeter

Apache JMeter is a testing tool used for analyzing and measuring the performance of different software services and products. It is pure-Java, open-source software used for testing web applications or FTP applications.

Apache JMeter - JMeter plugins - edureka

It is used to execute performance testing, load testing, and functional testing of web applications. JMeter can also simulate a heavy load on a server by creating tons of concurrent virtual users for the web server.

Introduction to JMeter Plugins

JMeter Plugins are software components used to customize programs by extending abilities and inserting functions. Apache JMeter is a powerful tool for load testing. JMeter has many features, but one of the best things about JMeter is that it is open source software. Therefore, any interested party can develop additions that will extend its capabilities and insert functions. These additions are called Plugins.

Plugins - JMeter plugins - edureka

In JMeter, plugins have multiple uses, ranging from graph tools and listeners to developer tools. The Plugins Manager installs, upgrades, and uninstalls plugins for users, making the plugin installation process smoother and more convenient.

JMeter Plugins Tutorial | Edureka

Steps to Install Plugins Manager

The steps involved in the installation of Plugins Manager are:

  • You need to download the Plugins Manager Jar file from the following link:

https://jmeter-plugins.org/wiki/PluginsManager/

  • In the second step, you need to put the jar file in the lib/ext directory of JMeter

Install plugin manager - edureka

  • Now restart JMeter
  • Click “Options” and then “Plugins Manager” to see the list of plugins.


Install/Uninstall Plugins Manually

The plugins manager can perform the following tasks:

  • Install new plugins from the available plugins

install plugins - jmeter plugins - edureka

  • Uninstall old plugins from the list of installed plugins

Uninstall plugins - edureka

  • If there are any updates available, upgrade your existing plugins

upgrade plugins - edureka

Now that we are done with the installation process, let’s have a look at a few plugins that are majorly used in the industry.

 

Top 5 JMeter Plugins 

A strong JMeter community of developers has already created a large variety of useful plugins. The JMeter Plugins website lists many of the available plugins; on the website, you can search through all of them and find the one that fits your needs.

Let’s have a look at the 5 most commonly used plugins in JMeter:

 

1. PerfMon Servers Performance Monitoring

This plugin extends JMeter with the PerfMon Servers Performance Monitoring listener. This listener allows us to monitor the CPU, memory, swap, disk I/O and network I/O of the servers under load.

To find, click: Test plan -> Add -> Listener -> jp@gc – PerfMon Metrics Collector

2. Custom Thread Groups

The Custom Thread Groups plugin adds five thread group types:

  • Stepping Thread Group
  • Ultimate Thread Group
  • Concurrency Thread Group
  • Arrivals Thread Group 
  • Free-Form Arrivals Thread Group

These five thread groups open up huge possibilities for creating the required schedules for test runs.

To find, click: Test plan -> Add -> Threads (Users) -> jp@gc – Ultimate Thread Group

3. Dummy Sampler

The Dummy Sampler emulates the work of requests and responses without actually running the requests. Request and response data are defined in the sampler’s fields. This is a very convenient way to debug post-processors and extractors.

To find, click: Thread Group -> Add -> Sampler -> jp@gc – Dummy Sampler

4. Throughput Shaping Timer

This plugin adds the following functionalities to JMeter: Throughput Shaping Timer, Special Property Processing, and Schedule Feedback Function. These elements enable us to limit the test throughput, ensuring we don’t exceed our required throughput value.

This interesting timer is designed to control requests per second to the server during a test run.

To find, click: Thread Group -> Add -> Timer -> jp@gc – Throughput Shaping Timer

5. Flexible File Writer

This plugin extends JMeter with the Flexible File Writer listener. This listener is designed to write test results into the file in a flexible format, which can be specified via the JMeter GUI.

To find, click: Test Plan -> Add -> Listener -> jp@gc – Flexible File Writer

These were some of the most commonly used Plugins in JMeter. With this, we have come to the end of our article. I hope you guys enjoyed this and understood how JMeter Plugins are installed and used in a test plan.

Now that you have understood what are JMeter Plugins, check out the Performance Testing Using JMeter Course  by Edureka, a trusted online learning company with a network of more than 250,000 satisfied learners spread across the globe. This course provides you insights into software behavior during workload. In this course, you will learn how to check the response time and latency of software and test if a software package is efficient for scaling. The course will help you check the strength and analyze the overall performance of an application under different load types. 

Got a question for us? Please mention it in the comments section of “JMeter Plugins” and we will get back to you.

The post JMeter Plugins : All You Need To Know About Plugins Manager appeared first on Edureka.

Introduction To Python- All You Need To know About Python


The IT industry is booming with artificial intelligence, machine learning and data science applications. With these new-age applications, the demand for Python developers has also increased. Ease of access and readability have made Python one of the most popular programming languages today. Now is the time to switch over to Python and unleash the endless possibilities that Python programming comes with. This article on Introduction to Python will guide you through the fundamentals and basic concepts of Python programming.

In this article, I will give you an introduction to python. Following are the topics that will be covered in this blog:

Introduction To Python

Python is a general-purpose programming language. It is very easy to learn, and its easy syntax and readability are among the reasons developers are switching to Python from other programming languages.

We can use Python as an object-oriented as well as a procedure-oriented language. It is open source and has tons of libraries for various implementations.

features-introduction to python-edureka

Python is a high-level interpreted language, best suited for writing scripts for automation and code re-usability.

It was created in 1991 by Guido van Rossum. Its name was inspired by the comedy series ‘Monty Python’.

Working with Python gives us endless possibilities. We can use Python in data science, machine learning, artificial intelligence, web development, software development, etc.

applications of python-introduction to python-edureka

In order to work with any programming language, you must be familiar with an IDE. You can find the setup for a Python IDE on ‘python.org’ and install it on your system. The installation is straightforward and comes with IDLE for writing Python programs.

installation-introduction to python-edureka

After you have installed python on your system, you are all set to write programs in python programming language.

Let’s start this introduction to Python with keywords and identifiers.

Keywords & Identifiers 

Keywords are nothing but special reserved names that are already present in Python. We use these keywords for specific functionality while writing a Python program.

Following is the list of all the keywords that we have in python:

keywords-introduction to python-edureka


import keyword
print(keyword.kwlist)
#this will get you the list of all keywords in python.
print(keyword.iskeyword('try'))
#this will return True, since 'try' is a keyword.

Identifiers are user defined names that we use to represent variables, classes, functions, modules etc.

name = 'edureka'
my_identifier = name

Variables & Data Types

Variables are like memory locations where you can store a value, which you may or may not change later.

x = 10
y = 20
name = 'edureka'

To declare a variable in python, you only have to assign a value to it. There are no additional commands needed to declare a variable in python.

Data Types in Python

  1. Numbers
  2. String
  3. List
  4. Dictionary
  5. Set
  6. Tuple

Numbers 

Numbers or numerical data type is used for numerical values. We have 4 types of numerical data types.

#integers are used to declare whole numbers.
x = 10
y = 20
#float data types is used to declare decimal point values
x = 10.25
y = 20.342
#complex numbers denote the imaginary values
x = 10 + 15j
#boolean stores the truth values True and False
num = x < 5
#num will be either True or False here.

String 

The string data type is used to represent sequences of characters. You can declare a string using single quotes '' or double quotes "".

name = 'edureka'
course = "python"

To access the values in a string, we can use indexes.

name[2]
# the output will be the character at that index, i.e. 'u'.

List

A list in Python is like a collection where you can store different values. It need not be uniform and can hold values of different types.

Lists are indexed and can have duplicate values as well. To declare a list, you have to use square brackets.

my_list = [10, 20, 30, 40, 50, 60, 'edureka', 'python']
print(my_list)

To access values in a list we use indexes. Following are a few operations that you can perform on a list:

  • append
  • clear
  • copy
  • count
  • extend
  • insert
  • pop
  • reverse
  • remove
  • sort

Following is a code for a few operations using a list:

a = [10,20,30,40,50]
#append will add the value at the end of the list
a.append('edureka')
#insert will add the value at the specified index
a.insert(2,'edureka')
#reverse will reverse the list
a.reverse()
print(a)
#the output will be
['edureka', 50, 40, 30, 'edureka', 20, 10]

Dictionary

A dictionary is unordered and changeable; we use key-value pairs in a dictionary. Since the keys are unique, we can use them as indexes to access the values in a dictionary.

Following are the operations you can perform on a dictionary:

  • clear
  • copy
  • fromkeys
  • get
  • items
  • keys
  • pop
  • getitem
  • setdefault
  • update
  • values
my_dictionary = { 'key1' : 'edureka' , 2 : 'python'}
my_dictionary['key1']
#this will get the value 'edureka'. the same purpose can be fulfilled by get().
my_dictionary.get(2)
#this will get the value 'python'.

Tuple

Tuple is another collection which is ordered and unchangeable. We declare the tuples in python with round brackets. Following are the operations you can perform on a tuple:

  • count
  • index
mytuple = (10,20,30,40,50,50,50,60)
mytuple.count(40)
#this will get the count of duplicate values.
mytuple.index(20)
#this will get the index for the value 20.

Set

A set is a collection which is unordered and unindexed. A set does not have any duplicate values as well. Following are some operations you can perform on a set:

  • add
  • copy
  • clear
  • difference
  • difference_update
  • discard
  • intersection
  • intersection_update
  • union
  • update
myset = { 10 ,20,30,40,50,60,50,60,50,60}
print(myset)
#there will be no duplicate values in the output

In any programming language, the concept of operators plays a vital part. Let’s take a look at operators in Python.

Operators

Operators in Python are used to perform operations between two values or variables. Following are the different types of operators that we have in Python:

  • Arithmetic Operators
  • Logical Operators
  • Assignment Operators
  • Comparison Operators
  • Membership Operators
  • Identity Operators
  • Bitwise Operators

Arithmetic Operators 

Arithmetic operators are used to perform arithmetic operations between two values or variables.

arithmetic operaotrs-introduction to python-edureka

#arithmetic operator examples
x + y    #addition
x - y    #subtraction
x ** y   #exponentiation

Assignment Operators 

Assignment operators are used to assign values to a variable.
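
A quick sketch of how they work (the values are just illustrative):

x = 10    #simple assignment
x += 5    #same as x = x + 5; x is now 15
x *= 2    #same as x = x * 2; x is now 30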

Logical Operators 

Logical operators are used to compare conditional statements in python.
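
For instance, a minimal sketch (the conditions are illustrative):

x = 10
print(x > 5 and x < 20)   #True, both conditions hold
print(x < 5 or x > 8)     #True, the second condition holds
print(not x > 5)          #False, negates the condition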

Comparison Operators 

Comparison operators are used to compare two values.
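
A short illustrative example:

x = 10
y = 20
print(x == y)   #False, the values differ
print(x != y)   #True
print(x <= y)   #True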

Membership Operators 

Membership operators are used to check whether a sequence is present in an object.
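
For example, checking membership in a list (the list is illustrative):

my_list = [10, 20, 30]
print(20 in my_list)       #True, 20 is present in the list
print(40 not in my_list)   #True, 40 is absent from the list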

Identity Operators 

Identity operators are used to compare two objects.
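
A small sketch showing the difference between identity and equality (the variable names are illustrative):

a = [1, 2]
b = a          #b points to the same object as a
c = [1, 2]     #c is a different object with equal contents
print(a is b)  #True, same object
print(a is c)  #False, equal values but different objects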

Bitwise Operators 

Bitwise operators are used to compare binary values.
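
A minimal sketch using small illustrative values:

x = 10          #binary 1010
y = 4           #binary 0100
print(x & y)    #0, bitwise AND
print(x | y)    #14, bitwise OR (1110)
print(x ^ y)    #14, bitwise XOR (1110)
print(x << 1)   #20, left shift by one bit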

Now that we have understood operators in Python, let’s understand the concept of loops in Python and why we use loops.

Loops In Python

A loop allows us to execute a group of statements several times. To understand why we use loops, let’s take an example.

Suppose you want to print the sum of all even numbers up to 1000. If you write the logic for this task without using loops, it is going to be a long and tiresome task.

But if we use a loop, we can write the logic to find the even numbers, give a condition to iterate until the number reaches 1000, and print the sum of all the numbers. This will reduce the complexity of the code and also make it more readable.
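
For instance, a minimal sketch of that even-number sum (the variable names are illustrative):

total = 0
for num in range(2, 1001, 2):   #every even number up to 1000
    total += num
print(total)   #250500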

There are following types of loops in python:

  1. for loop
  2. while loop
  3. nested loops

For Loop 

A ‘for loop’ is used to execute a set of statements once per iteration, where we already know the number of iterations in advance.

A for loop has two parts: a header where we specify the iteration condition, and a body containing the statements that get executed on each iteration.

for x in range(10):
    print(x)

While Loop 

The while loop executes the statements as long as the condition is true. We specify the condition at the beginning of the loop, and as soon as the condition becomes false, the execution stops.

 

i = 1
while i < 6:
     print(i)
     i += 1
#the output will be numbers from 1-5.

Nested Loops

Nested loops are loops placed inside other loops, such as a while loop incorporated into a for loop or vice versa.

Following are a few examples of nested loops:

for i in range(1,6):
   for j in range(i):
       print(i , end = "")
   print()
# the output will be
1
22
333
4444
55555

Conditional and Control Statements

Conditional statements in Python let us execute a block of code based on whether a given condition holds.

Following are the conditional statements that we have in python:

  1. if
  2. elif
  3. else

if statement

x = 10
if x > 5:
   print('greater')

The if statement tests the condition, when the condition is true, it executes the statements in the if block.

elif statement

x = 10
if x > 5:
   print('greater')
elif x == 5:
     print('equal')
#else statement

x =10
if x > 5:
   print('greater')
elif x == 5:
     print('equal')
else:
     print('smaller')

When both the if and elif conditions are false, the execution moves to the else statement.

Control statements

Control statements are used to control the flow of execution in the program. 

Following are the control statements that we have in python:

  1. break 
  2. continue 
  3. pass

break 


name = 'edureka'
for val in name:
    if val == 'r':
       break
    print(val)
#the output will be
e
d
u

The execution will stop as soon as the loop encounters break. 

Continue


name = 'edureka'
for val in name:
    if val == 'r':
       continue
    print(val)
#the output will be
e
d
u
e
k
a

When the loop encounters continue, the current iteration is skipped and the rest of the iterations get executed.

Pass

name = 'edureka'
for val in name:
    if val == 'r':
       pass
    print(val)

#the output will be
e
d
u
r
e
k
a

The pass statement is a null operation. It is used when a statement is required syntactically but you do not want any command or code to execute.

Now that we are done with the different types of loops that we have in Python, let’s understand the concept of functions.

Functions

A function in Python is a block of code that executes whenever it is called. We can pass parameters to functions as well. To understand the concept of functions, let’s take an example.

Suppose you want to calculate the factorial of a number. You can do this by simply executing the logic to calculate the factorial. But what if you have to do it ten times a day? Writing the same logic again and again is going to be a long task.

Instead, what you can do is write the logic in a function and call that function every time you need to calculate the factorial. This will reduce the complexity of your code and save your time as well.
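
For instance, a minimal sketch of such a factorial function (the function name is illustrative):

def factorial(n):
    result = 1
    for i in range(2, n + 1):   #multiply 2 * 3 * ... * n
        result *= i
    return result

print(factorial(5))   #120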

How to Create a Function?

# we use the def keyword to declare a function

def function_name():
#expression
    print('abc')

How to Call a Function?

def my_func():
    print('function created')

#this is a function call
my_func()

Function Parameters 

We can pass values into a function using parameters. We can also give a default value to a parameter in a function.

def my_func(name = 'edureka'):
    print(name)

#default parameter

my_func()
#userdefined parameter
my_func('python')

Lambda Function

A lambda function can take any number of parameters, but there is a catch: it can only have one expression.


# lambda arguments: expression
x = lambda a,b : a**b
print(x(2,8))
#the result will be 2 raised to the power 8, i.e. 256.

Now that we have understood function calls, parameters and why we use them, let’s take a look at classes and objects in Python.

Classes & Objects

What are Classes?

Classes are like a blueprint for creating objects. We can store various methods/functions in a class.


class classname:
    def functionname(self):
        print('expression')

What are Objects?

We create objects to call the methods in a class, or to access the properties of a class.


class myclass:
    def func(self):
        print('my function')

#creating an object
ob1 = myclass()
ob1.func()

__init__ function

It is an inbuilt method which is called when a class is being instantiated. Every class has an __init__ method. We use the __init__ method to assign values to an object, or to perform any other operations that are required when an object is being created.


class myclass:
      def __init__(self, name):
          self.name = name
ob1 = myclass('edureka')
ob1.name
#the output will be- edureka

Now that we have understood the concept of classes and objects, let’s take a look at a few OOPs concepts that we have in Python.

OOPs Concepts

Python can be used as an object oriented programming language. Hence, we can use the following concepts in python:

  1. Abstraction
  2. Encapsulation
  3. Inheritance
  4. Polymorphism

Abstraction

Data abstraction refers to displaying only the necessary details and hiding the background implementation. Abstraction in Python is similar to any other programming language.

For example, when we print a statement, we don’t know what is happening in the background.

Encapsulation

Encapsulation is the process of wrapping up data and the methods that operate on it. In Python, a class is an example of encapsulation, where the member functions, variables, etc. are wrapped into a single unit.
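
A minimal sketch, assuming a hypothetical Account class (the class and attribute names are purely illustrative):

class Account:
    def __init__(self, balance):
        self.__balance = balance   #the double underscore marks it as private by convention

    def deposit(self, amount):
        self.__balance += amount

    def get_balance(self):
        return self.__balance

acc = Account(100)
acc.deposit(50)
print(acc.get_balance())   #150; the balance is reachable only through the class methods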

Inheritance

Inheritance is an object oriented concept where a child class inherits all the properties from a parent class. Following are the types of inheritance we have in python:

  1. Single Inheritance
  2. Multiple Inheritance
  3. Multilevel Inheritance

Single Inheritance

In single inheritance, there is only one child class, which inherits the properties of a single parent class.


class parent:
    def printname(self, name):
        print(name)

class child(parent):
    pass

ob1 = child()
ob1.printname('edureka')

Multiple Inheritance

In multiple inheritance, we have two (or more) parent classes and one child class that inherits the properties from both the parent classes, as sketched below.
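
A minimal sketch (the class and method names are illustrative):

class father:
    def skill1(self):
        print('gardening')

class mother:
    def skill2(self):
        print('painting')

class child(father, mother):   #inherits from both parent classes
    pass

ob1 = child()
ob1.skill1()   #inherited from father
ob1.skill2()   #inherited from mother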

Multilevel Inheritance

In multilevel inheritance, we have one child class that inherits properties from a parent class, and that same child class acts as a parent class for another child class, as sketched below.
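
A minimal sketch (the class names are illustrative):

class grandparent:
    def house(self):
        print('family house')

class parent(grandparent):   #inherits from grandparent
    pass

class child(parent):         #inherits from parent, and indirectly from grandparent
    pass

ob1 = child()
ob1.house()   #the method travels down two levels of inheritance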

Polymorphism

Polymorphism is the ability of an object to take many forms. The most common example is when a parent class reference is used to refer to a child class object, as sketched below.
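
A minimal sketch of method overriding (the class and method names are illustrative):

class parent:
    def greet(self):
        print('hello from parent')

class child(parent):
    def greet(self):   #overrides the parent method
        print('hello from child')

for obj in (parent(), child()):   #the same call behaves differently for each object
    obj.greet()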

Now that we have understood the OOPs concepts in Python, let’s understand the concepts of exceptions and exception handling.

Exceptional Handling

When we are writing a program, if an error occurs, the program will stop. But we can handle these errors/exceptions using the try, except and finally blocks in Python.

When an error occurs, the program will not stop; instead, it will execute the except block.


try:
    print(x)
except:
       print('exception')

Finally

When we specify a finally block, it will be executed whether or not an exception is raised in the try block.

try :
    print(x)

except:
      print('exception')
finally:
      print('this will be executed anyway')

Now that we have understood exception handling concepts. Lets take a look at file handling concepts in python.

File Handling

File handling is an important concept of python programming language. Python has various functions to create, read, write, delete or update a file.

Creating a File

#opening a file in 'x' mode creates it (an error is raised if it already exists)
f = open('file location', 'x')
f.close()

Reading a File

f = open('file location', 'r')
print(f.read())
f.close()

Append a File

#'a' mode appends content at the end of the file
f = open('filelocation','a')
f.write('the content')
f.close()
#'w' mode overwrites the existing content instead
f = open('filelocation','w')
f.write('this will overwrite the file')
f.close()

Delete a file

import os
os.remove('file location')

These are some of the operations we can perform with file handling in Python.

I hope this blog on introduction to python helped you learn all the fundamental concepts needed to get started with python programming language.

This will be very handy when you are working with the Python programming language, as this is the basis of learning in any programming language. Once you have mastered the basic concepts in Python, you can begin your quest to become a Python developer. To know more about the Python programming language in depth, you can enroll here for live online Python training with 24/7 support and lifetime access.

Have any queries? You can mention them in the comments and we will get back to you.

The post Introduction To Python- All You Need To know About Python appeared first on Edureka.

Dataframes in Spark: All you need to know about Structured Data Processing


DataFrame is the pinnacle of Spark’s technological advancements that helped to achieve multiple potentialities in the Big Data environment. It is an integrated data structure that helps programmers perform multiple operations on data with a single API. I’ll line up the key points for understanding DataFrames in Spark as below.

What are DataFrames in Spark?

In simple terms, a Spark DataFrame is a distributed collection of data organized under named columns that provides operations to filter, group, process, and aggregate the available data. DataFrames can also be used with Spark SQL. We can construct DataFrames from structured data files, RDDs, tables in Hive, or from an external database as shown below.

Here we have created a DataFrame about employees which has Name of the employee as string datatype, Employee-ID as string datatype, Employee phone number as an integer datatype, Employee address as a string datatype, Employee salary as a float datatype. The data of each employee is stored in each row as shown above.

 

Why do we need DataFrames?

DataFrames are designed to be multifunctional. We need DataFrames for:

Multiple Programming languages

  • The best property of DataFrames in Spark is its support for multiple languages, which makes it easier for programmers from different programming background to use it.
  • DataFrames in Spark support the R programming language, Python, Scala, and Java.

Multiple data sources

  • DataFrames in Spark can support a large variety of sources of data. We shall discuss one by one in the use case we deal with the upcoming part of this article.

Processing Structured and Semi-Structured Data

  • The core requirement for which DataFrames were introduced is to process Big Data with ease. DataFrames in Spark use a table format to store the data in a versatile way, along with the schema for the data they are dealing with.

Slicing and Dicing the data

  • DataFrame APIs support slicing and dicing the data. They can perform operations like select and filter upon rows and columns.
  • Statistical data is always prone to missing values, range violations, and irrelevant values. The user can manage missing data explicitly by using DataFrames.

Now that we have understood the need for DataFrames, let us move to the next stage, where we will understand the features that give DataFrames an edge over other alternatives.

 

Features of DataFrames in Spark

  • DataFrames in Spark are immutable in nature. Like Resilient Distributed Datasets (RDDs), the data present in a DataFrame cannot be altered.
  • Lazy evaluation is the key to the remarkable performance offered by Spark. DataFrames in Spark will not produce any output until an action operation is invoked.
  • The distributed memory technique used to handle data makes them fault-tolerant.
  • Like Resilient Distributed Datasets, DataFrames in Spark extend the property of the distributed memory model.
  • The only way to alter or modify the data in a DataFrame is by applying transformations.

So, these were the features of DataFrames. Let us now look into the sources of data for DataFrames in Spark.

 

Sources for Spark DataFrame

  • We can create DataFrames in Spark in multiple ways.
  • Data can be loaded in through a CSV, JSON, XML, SQL, RDBMS and many more.
  • DataFrames can also be created from an existing RDD or through any other database, like Hive, HBase, or Cassandra, and can take in data from HDFS or the local file system.

Now that we have finished the theory part of DataFrames in Spark, let us get our hands on DataFrames and execute the practical part. Creating a DataFrame happens to be our first step.

 

Creation of DataFrame in Spark

  • Let us use the following code to create a new DataFrame.
  • Here, we shall create a new DataFrame using the createDataFrame method.
  • First, we ingest the data of all available employees into an Employee RDD.
  • Next, we design the schema for the data we have entered into the Employee RDD.
  • Finally, we use the createDataFrame method to create our DataFrame and display it using the .show method.

 

val Employee = Seq(Row("Mike","Robert","Mike09@gmail.com",10000),Row("John","Milers","John09@gmail.com",20000),Row("Brett","Lee","Brett09@gmail.com",25000),
Row("Letty","Brown","Brown09@gmail.com",35000))
val EmployeeSchema = List(StructField("FirstName", StringType, true), StructField("LastName", StringType, true), StructField("MailAddress", StringType, true), StructField("Salary", IntegerType, true))
val EmpDF = spark.createDataFrame(spark.sparkContext.parallelize(Employee),StructType(EmployeeSchema))
EmpDF.show

Employee-data-frame

  • Similarly, Let us also create Department DataFrame.
val Department = Seq(Row(1001,"Admin"),Row(1002,"IT-Support"),Row(1003,"Developers"),Row(1004,"Testing"))
val DepartmentSchema = List(StructField("DepartmentID", IntegerType, true), StructField("DepartmentName", StringType, true))
val DepDF = spark.createDataFrame(spark.sparkContext.parallelize(Department),StructType(DepartmentSchema))
DepDF.show

employee-dataframe

Spark DataFrame Example: FIFA 2k19 Dataset.

FIFA-Banner-Edureka

  • Before we read the data from a CSV file, we need to import certain libraries required for processing DataFrames in Spark.
import org.apache.spark.sql.types._
import org.apache.spark.storage.StorageLevel 
import scala.io.Source 
import scala.collection.mutable.HashMap 
import java.io.File 
import org.apache.spark.sql.Row 
import org.apache.spark.sql.types._ 
import scala.collection.mutable.ListBuffer 
import org.apache.spark.util.IntParam
import org.apache.spark.rdd.RDD 
import org.apache.spark.SparkContext 
import org.apache.spark.SparkContext._ 
import org.apache.spark.SparkConf 
import org.apache.spark.sql.SQLContext 
import org.apache.spark.rdd._ 
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
import sqlContext._
  • Once we have imported the libraries, we design the schema for our CSV file:
val schema = StructType(Array(StructField("ID", IntegerType, true),StructField("Name", StringType, true),StructField("Age", IntegerType, true),StructField("
Nationality", StringType, true),StructField("Potential", IntegerType, true),StructField("Club", StringType, true),StructField("Value", StringType, true),StructFiel
d("Preferred Foot", StringType, true),StructField("International Reputation", IntegerType, true),StructField("Skill Moves", IntegerType, true),StructField("Positio
n", StringType, true),StructField("Jersey Number", IntegerType, true),StructField("Crossing", IntegerType, true),StructField("Finishing", IntegerType, true),Struct
Field("HeadingAccuracy", IntegerType, true),StructField("ShortPassing", IntegerType, true),StructField("Volleys", IntegerType, true),StructField("Dribbling", Integ
erType, true),StructField("Curve", IntegerType, true),StructField("FKAccuracy", IntegerType, true),StructField("LongPassing", IntegerType, true),StructField("BallC
ontrol", IntegerType, true),StructField("Acceleration", IntegerType, true),StructField("SprintSpeed", IntegerType, true),StructField("Agility", IntegerType, true),
StructField("Balance", IntegerType, true),StructField("ShotPower", IntegerType, true),StructField("Jumping", IntegerType, true),StructField("Stamina", IntegerType,
true)))

schema-after-submission

  • Let us load the FIFA data from a CSV file on HDFS as shown below. We first use the spark.read.format("csv") method to read our CSV file from HDFS.
val FIFAdf = spark.read.format("csv").option("header", "true").load("/user/edureka_566977/FIFA2k19file/FIFA2k19.csv")
  • Let us use .printSchema() method to see the schema of our CSV file.
FIFAdf.printSchema()

print-Schema-Edureka

  • Let us find out the total number of rows we have using the following code.
#count
FIFAdf.count()

count-function-Edureka

  • Let us now find the columns we have in our CSV file. We shall use the following code.
FIFAdf.columns.foreach(println)

columns-Edureka

  • If you wish to look at the summary of a particular column in a DataFrame, you can apply the describe command. This command gives the statistical summary of the selected column; if no column is specified, it provides the statistical information of the whole DataFrame.
  • Let us find out the description of the Value column to know the minimum and maximum values present in it.
#describe
FIFAdf.describe("Value").show

Describe-function-Edureka

  • We shall find out the Nationality of a particular player by using the select command.
#select
FIFAdf.select("Name","Nationality").show

select-Spark-Edureka

  • Let us find out the names of the players and their particular Clubs by using the select and distinct operations.
#select and distinct
FIFAdf.select("Name","Club").distinct.show()

select-DataFrames-in-Spark-Edureka

  • We shall find out the players under 30 years of age and extract all their details about Player-ID, Nationality, Overall, Potential, Value, Skill Moves, Body Type, Position and Player Jersey Number. 
#select and filter
FIFAdf.select("Index","ID","Name","Age","Nationality","Overall","Potential","Value","Skill Moves","Body Type","Position","Jersey Number").filter(" Age < 30 ").show

Filter-DataFrames-in-Spark-Edureka

So, this was the FIFA 2019 dataset example we dealt with. Now let me walk you through a use case that will help you learn more about DataFrames in Spark, with the most trending topic, which is none other than “The Game of Thrones”.

 

DataFrames in Spark: Game of Thrones Use Case

  • We need to import certain libraries which we need for processing the DataFrames in Spark as we did in our previous example and load our Game of Thrones CSV file.

Now we have successfully loaded all the libraries we needed for processing our DataFrames in Spark.

  • First, we shall design the schema for Character-Deaths.csv file as shown below.
val schema = StructType(Array(StructField("Name", StringType, true), StructField("Allegiances", StringType, true), StructField("Death Year", IntegerType, tr
ue), StructField("Book of Death", IntegerType, true), StructField("Death Chapter", IntegerType, true), StructField("Book Intro Chapter", IntegerType, true), Struct
Field("Gender", IntegerType, true), StructField("Nobility", IntegerType, true), StructField("GoT", IntegerType, true), StructField("CoK", IntegerType, true), Struc
tField("SoS", IntegerType, true), StructField("FfC", IntegerType, true), StructField("DwD", IntegerType, true)))

  • Next, we shall design the schema for the Battles.csv file as shown below:
val schema2 = StructType(Array(StructField("name", StringType, true), StructField("year", IntegerType, true), StructField("battle_number", IntegerType, true
), StructField("attacker_king", StringType, true), StructField("defender_king", StringType, true), StructField("attacker_1", StringType, true), StructField("attack
er_2", StringType, true), StructField("attacker_3", StringType, true), StructField("attacker_4", StringType, true), StructField("defender_1", StringType, true), St
ructField("defender_2", StringType, true), StructField("defender_3", StringType, true), StructField("defender_4", StringType, true), StructField("attacker_outcome"
, StringType, true), StructField("battle_type", StringType, true), StructField("major_death", StringType, true), StructField("major_capture", IntegerType, true), S
tructField("attacker_size", IntegerType, true), StructField("defender_size", IntegerType, true), StructField("attacker_commander", StringType, true), StructField("
defender_commander", StringType, true), StructField("summer", IntegerType, true), StructField("location", StringType, true), StructField("region", StringType, true
), StructField("note", StringType, true)))

  • Once we have successfully designed the schema for our CSV files, the next step is loading them onto the Spark-Shell. The following code will help us load the CSV files onto the Spark-Shell.
val GOTdf = spark.read.option("header", "true").schema(schema).csv("/user/edureka_566977/GOT/character-deaths.csv")
val GOTbattlesdf = spark.read.option("header", "true").schema(schema2).csv("/user/edureka_566977/GOT/battles.csv")

load-data-DataFrames-in-Spark-Edureka

  • Once we load the CSV files onto the Spark-Shell, we can print the schema of our CSV files to cross-verify our design against the data. The following code will help us print the schema.
GOTdf.printSchema()

schema-DataFrames-in-Spark-Edureka

GOTbattlesdf.printSchema()

schema2-DataFrames-in-Spark-Edureka

After verifying the schema, let us print the data present in our DataFrame using the following code.

#select
GOTdf.select("Name","Allegiances","Death Year","Book of Death","Death Chapter","Book Intro Chapter","Gender","Nobility","GoT","CoK","SoS","FfC","DwD").show

select-DataFrames-in-Spark-Edureka

  • We know that there are a number of different houses in Game of Thrones. Let us find out every individual house present in our DataFrame.
#select and groupBy
sqlContext.sql("select attacker_1, count(distinct(' ')) from battles group by attacker_1").show

distinct-DataFrames-in-Spark-Edureka

  • The battles in Game of Thrones were fought for ages. Let us classify the wars waged according to the year in which they were fought, using select and filter transformations to access the columns with the battle details and the year column. The code below will help us do so.
  • Let us find the battles fought in the year 298 using the code below:
#select and filter
GOTbattlesdf.select("name","year","battle_number","attacker_king","defender_king","attacker_outcome","attacker_commander","defender_commander","location").filter("year == 298").show

filter-DataFrames-in-Spark-Edureka

  • Let us find the battles fought in the year 299 using the code below:
#select
GOTbattlesdf.select("name","year","battle_number","attacker_king","defender_king","attacker_outcome","attacker_commander","defender_commander","location").filter("year == 299").show

filter-DataFrames-in-Spark-Edureka

  • Let us find the battles fought in the year 300 using the code below:
#select and filter
GOTbattlesdf.select("name","year","battle_number","attacker_king","defender_king","attacker_outcome","attacker_commander","defender_commander","location").filter("year == 300").show

filter-DataFrames-in-Spark-Edureka

  • Now let us find out the tactics used in the wars waged and also find out the total number of wars waged by using each one of those tactics.
#groupBy
sqlContext.sql("select battle_type, count(' ') from battles group by battle_type").show

groupby-DataFrames-in-Spark-Edureka

  • Ambush battles are the deadliest ones; here the enemy never has any clue of an attack. Let us find out, for each year, which kings waged an ambush-type battle against whom, along with the commanders of both kingdoms and the outcome for the attacker.
  • The following code must help us find these details.
#and
sqlContext.sql("select year, attacker_king, defender_king, attacker_outcome, battle_type, attacker_commander, defender_commander from battles where attacker_outcome == 'win' and battle_type =='ambush'").show

select-DataFrames-in-Spark-Edureka

  • Let us now focus on the houses and extract the deadliest house amongst the rest. The code below will help us find out the house details and the battles they waged.
#groupBy
sqlContext.sql("select attacker_1, count(' ') from battles group by attacker_1").show

count-DataFrames-in-Spark-Edureka

  • Now, we shall find out the details of the kings and the total number of battles they fought, to identify the king with the highest number of battles fought.
#select
sqlContext.sql("select attacker_king, count(' ') from battles group by attacker_king").show

count-DataFrames-in-Spark-Edureka

  • Let us find out the houses which successfully defended the battles waged against them, along with the total number of wars they had to defend their kingdom from. The following code will help us find those details.
#count
sqlContext.sql("select defender_1, count(' ') from battles group by defender_1").show

count-DataFrames-in-Spark-Edureka

  • Let us find out the details of the kings and the number of wars they successfully defended their kingdoms from their enemies. The following code can extract those details.
#groupBy
sqlContext.sql("select defender_king, count(' ') from battles group by defender_king").show

groupby-DataFrames-in-Spark-Edureka

  • Since Lannister house is my personal favorite, let me find out the details of the characters in Lannister house, describing their name and gender (1 -> male, 0 -> female) along with their respective population. The code below will fetch us the details of all the male characters we have in Lannister house.
#select
val df1 = sqlContext.sql("select Name, Allegiances, Gender from deaths where Allegiances == 'Lannister' and Gender == '1'")

select-DataFrames-in-Spark-Edureka

  • The code below will fetch us the details of all the female characters we have in Lannister house.
#Select 
val df2 = sqlContext.sql("select Name, Allegiances, Gender from deaths where Allegiances == 'Lannister' and Gender == '0'")

select-DataFrames-in-Spark-Edureka

  • At the end of the day, every episode of the Game of Thrones had a noble character. Let us now find all the noble characters amongst all the houses we have in our GameOfThrones.csv file.
#where
val df4 = sqlContext.sql("select Name, Allegiances, Gender from deaths where Nobility == '1'")

select-DataFrames-in-Spark-Edureka

  • Nonetheless, there are some commoners whose role in the Game of Thrones is exceptional. Let us find out the details of the commoners who are highly inspirational in each episode.
#select and where
val df5 = sqlContext.sql("select Name, Allegiances, Gender from deaths where Nobility == '0'")

select-DataFrames-in-Spark-Edureka

  • We consider a few roles as important and equally noble, ones the writer carries through until the last book. Let us filter out those characters to find the details of each one of them.
#and
val df6 = sqlContext.sql("select Name, Allegiances, Gender from deaths where GoT == '1' and CoK == '1' and SoS == '1' and FfC == '1' and DwD == '1' and Nobility == '1'")

select-DataFrames-in-Spark-Edureka

  • Amongst all the battles, I found the battles of the last books to generate the most adrenaline in readers.
  • Let us find out the details of those final battles by using the following code.
#OrderBy
val dat = GOTbattlesdf.select("name","year","battle_number","attacker_king","defender_king","attacker_outcome","attacker_commander","defender_commander","location").orderBy(desc("battle_number"))
dat.show

select-DataFrames-in-Spark-Edureka

  • Let us use the following code to drop all the duplicate pairs of attacker kings and defender kings in the battles fought.
#DropDuplicates
GOTbattlesdf.select("attacker_king","defender_king").dropDuplicates().show()

select-duplicate-DataFrames-in-Spark-Edureka

So, with this, we come to the end of this DataFrames in Spark article. I hope we sparked a little light upon your knowledge of DataFrames, their features, and the various types of operations that can be performed on them.

This article based on Apache Spark and Scala Certification Training is designed to prepare you for the Cloudera Hadoop and Spark Developer Certification Exam (CCA175). You will get in-depth knowledge on Apache Spark and the Spark Ecosystem, which includes Spark DataFrames, Spark SQL, Spark MLlib and Spark Streaming. You will get comprehensive knowledge on Scala Programming language, HDFS, Sqoop, Flume, Spark GraphX and Messaging System such as Kafka.

The post Dataframes in Spark: All you need to know about Structured Data Processing appeared first on Edureka.

Exceptions in Selenium – Know How To Handle Exceptions


With the world evolving towards software development, testing plays a vital role in making the process defect free. Automation testing using Selenium is one such approach that helps in finding bugs and resolving them. Exceptions in Selenium are an important concept which helps us handle errors and avoid software failures. Through this article on Exceptions in Selenium, I will give you a complete insight into the fundamentals and the various methods of handling exceptions.

In this article, I will be covering the following topics.

You may also go through this recording of Exceptions in Selenium by our Experts where you can understand the topics in a detailed manner with examples.

Introduction to Exceptions

An exception is an event or a problem that arises during the execution of a program. When an exception occurs, the normal flow of program halts and an exception object is created. The program then tries to find someone that can handle the raised exception. The exception object contains a lot of debugging information, such as method hierarchy, the line number where the exception occurred, the type of exception, etc.

When you start working with Selenium WebDriver, you will come across different exceptions based on the code you write. The same code sometimes works properly and sometimes it simply doesn’t. Whenever you develop a script, you try to write the best quality code that works fine. But unfortunately, exceptions sometimes come as side effects of the scripts we develop and cause them to fail. That’s why handling exceptions is very important.

The exception handling mechanism follows the flow depicted in the above figure. But if an exception is not handled, it may lead to a system failure, which is why handling exceptions is so important. Now let’s move further and understand the various categories of exceptions.

Checked vs Unchecked Exception

Basically, there are 2 types of exceptions in Selenium and they are as follows:

  • Checked Exception
  • Unchecked Exception

Let’s understand these two exceptions in depth.

  • Checked Exception
    It is an exception that occurs at compile time, also called compile time exceptions. If some code within a method throws a checked exception, then the method must either handle the exception or it must specify the exception using throws keyword.
  • Unchecked Exception
    It is an exception that occurs at the time of execution, also called a runtime exception. In C++, all exceptions are unchecked, but in Java exceptions can be either checked or unchecked. The compiler does not force you to either handle or specify a runtime exception; it is up to the programmer to specify or catch it.

Basic Example of Exception


class Exception{
public static void main(String args[]){
try{
//code that may raise an exception
}
catch(Exception e){
//code to handle the exception
}
}
}

The above code represents exception handling: inside the try block we write code that may raise an exception, and that exception is then handled in the catch block. Having understood this, let’s move further and see the different types of exceptions that disrupt the normal flow of program execution.

Types of Exceptions

  • WebDriverException

A WebDriverException comes when we try to perform an action on a non-existing driver, as in the snippet below, where we continue to act on the driver after the browser window has already been closed.

WebDriver driver = new InternetExplorerDriver();
driver.get("http://google.com");
driver.close();
driver.quit();
  • NoAlertPresentException

When we try to accept() or dismiss() an alert at a place where no alert is present, we get this exception.

try{
driver.switchTo().alert().accept();
}
catch (NoAlertPresentException E){
E.printStackTrace();
}
  • NoSuchWindowException

When we try to switch to a window which is not present, we get this exception:

WebDriver driver = new InternetExplorerDriver();
driver.get("http://google.com");
driver.switchTo().window("Yup_Fail");
driver.close();

In the above snippet, line 3 throws us an exception, as we are trying to switch to a window that is not present.

  • NoSuchFrameException

Similar to the window exception, the frame exception mainly comes while switching between frames.

WebDriver driver = new InternetExplorerDriver();
driver.get("http://google.com");
driver.switchTo().frame("F_fail");
driver.close();

In the above snippet, line 3 throws us an exception, as we are trying to switch to a frame that is not present.

  • NoSuchElementException

This exception is thrown when WebDriver doesn’t find the web element in the DOM.

WebDriver driver = new InternetExplorerDriver();
driver.get("http://google.com");
driver.findElement(By.name("fake")).click();

Now let’s move ahead in this Exceptions in Selenium article and see the various methods used to handle exceptions.

Methods of Handling Exceptions

  1. Try: The try block is used to enclose the code that might throw an exception.
  2. Catch: The catch block is used to handle the exception. It must be used after the try block only.
  3. Finally: The finally block is used to execute important code such as closing connections, streams, etc. It is always executed, whether an exception is handled or not.
  4. Throw: The “throw” keyword is used to throw an exception.
  5. Throws: The “throws” keyword is used to declare exceptions. It doesn’t throw an exception; it specifies that an exception may occur in the method. It is always used with the method signature.

Let’s see a small example to understand how a thrown exception can be handled using exception methods. Let’s have a look at the code.

package Edureka;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.openqa.selenium.By;
import org.openqa.selenium.NoAlertPresentException;
import org.openqa.selenium.NoSuchElementException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.Wait;
import org.openqa.selenium.support.ui.WebDriverWait;
public class Exceptionhandling {
public static void main(String[] args) throws InterruptedException, TimeoutException{
System.setProperty("webdriver.firefox.driver", "C:\\Selenium-java-edureka\\chromedriver_win32\\chromedriver.exe");
WebDriver driver = new FirefoxDriver();
WebDriverWait wait = new WebDriverWait(driver, 10);
driver.get("https://www.google.com");
try{
driver.findElement(By.xpath("//*[@id='register']")).click();
}catch (Exception e) {
System.out.println("Register element is not found.");
throw(e);
}
System.out.println("Hello");
}
}

Let’s understand the functionalities of each method in depth.

Try Method

In the above code, I have used a try block to enclose the statement that throws the exception. Here, the statement driver.findElement(By.xpath(“//*[@id=’register’]”)).click(); throws an exception because Selenium cannot locate that particular element on the Google search page. So, once the exception is thrown, the normal execution of the program is interrupted. Now, let’s see how it is handled in the catch block.

Catch Method

The task here is to handle this exception and continue the execution of the program. For that, I have written a catch block which will handle the thrown exception. It should always be written after the try block. So when you execute the program, it will print the rest of the statements.

Throw Method

As you all know, throw is a keyword used to raise an exception. The interesting thing is that, irrespective of whether the catch block has handled the exception or not, throw(e) will still raise an exception, which disrupts the normal flow of program execution. In the above code, after the catch block has handled the exception, I have written the statement throw(e); to re-throw it.

Throws Method

The throws declaration is used to declare exceptions. For example, it can declare InterruptedException, TimeoutException, etc. This is how the various methods can be used to handle exceptions raised in a program.

With this, we come to an end of this article on Exceptions in Selenium. I hope you understood the concepts and that it helped in adding value to your knowledge.

If you wish to learn Selenium and build a career in the testing domain, then check out our interactive, live-online Selenium Certification Training here, that comes with 24*7 support to guide you throughout your learning period.

Got a question for us? Please mention it in the comments section of Exceptions in Selenium article and we will get back to you.

The post Exceptions in Selenium – Know How To Handle Exceptions appeared first on Edureka.
