R


R – Overview

R is a programming language and software environment for statistical analysis, graphics representation and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team.

The core of R is an interpreted computer language which allows branching and looping as well as modular programming using functions. R allows integration with procedures written in the C, C++, .Net, Python or FORTRAN languages for efficiency.

R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems like Linux, Windows and Mac.

R is free software distributed under a GNU-style copyleft, and an official part of the GNU project called GNU S.

Evolution of R

R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland in Auckland, New Zealand. R made its first appearance in 1993.

  • A large group of individuals has contributed to R by sending code and bug reports.

  • Since mid-1997 there has been a core group (the "R Core Team") who can modify the R source code archive.

Features of R

As stated earlier, R is a programming language and software environment for statistical analysis, graphics representation and reporting. The following are the important features of R −

  • R is a well-developed, simple and effective programming language which includes conditionals, loops, user defined recursive functions and input and output facilities.

  • R has an effective data handling and storage facility.

  • R provides a suite of operators for calculations on arrays, lists, vectors and matrices.

  • R provides a large, coherent and integrated collection of tools for data analysis.

  • R provides graphical facilities for data analysis and display, either directly on screen or for printing on paper.

In conclusion, R is among the world's most widely used statistical programming languages. It is a popular choice among data scientists and is supported by a vibrant and talented community of contributors. R is taught in universities and deployed in mission critical business applications. This tutorial will teach you R programming along with suitable examples in simple and easy steps.

R – Environment Setup

Try it Option Online

You really do not need to set up your own environment to start learning the R programming language. The reason is very simple: we have already set up an R programming environment online, so that you can compile and execute all the available examples online at the same time as you are doing your theory work. This gives you confidence in what you are reading and lets you check the result with different options. Feel free to modify any example and execute it online.

Try the following example using the Try it option available on the website at the top right corner of the below sample code container −

# Print Hello World. 
print("Hello World") 
 
# Add two numbers.
print(23.9 + 11.6)

For most of the examples given in this tutorial, you will find the Try it option on the website, so just make use of it and enjoy your learning.

Local Environment Setup

If you are still willing to set up your environment for R, you can follow the steps given below.

Windows Installation

You can download the Windows installer version of R from R-3.2.2 for Windows (32/64 bit) and save it in a local directory.

As it is a Windows installer (.exe) with a name "R-version-win.exe", you can just double click and run the installer accepting the default settings. If your Windows is a 32-bit version, it installs the 32-bit version. But if your Windows is 64-bit, then it installs both the 32-bit and 64-bit versions.

After installation you can locate the icon to run the program in a directory structure "R\R3.2.2\bin\i386\Rgui.exe" under the Windows Program Files. Clicking this icon brings up the R-GUI, which is the R console to do R programming.

Linux Installation

R is available as a binary for many versions of Linux at the location R Binaries.

The instructions to install R on Linux vary from flavor to flavor. These steps are mentioned under each type of Linux version in the mentioned link. However, if you are in a hurry, then you can use the yum command to install R as follows −

$ yum install R

The above command will install the core functionality of R programming along with the standard packages. If you still need an additional package, you can launch the R prompt as follows −

$ R

R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>  

Now you can use the install.packages() command at the R prompt to install the required package. For example, the following command will install the plotrix package, which is required for 3D charts.

> install.packages("plotrix")
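As a hedged sketch of the step above, a script can check whether a package is already present before installing it. The helper name ensure_package is hypothetical (it is not part of R); it only wraps the base functions requireNamespace() and install.packages().

```r
# Hypothetical helper (not part of R): install a package only when it is missing.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads from the configured CRAN mirror
  }
  # Report whether the package is available after the attempt.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call does not trigger a download.
ensure_package("stats")
```

Calling ensure_package("plotrix") would follow the same path, installing the package only if it is not yet in the library.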

R – Basic Syntax

As a convention, we will start learning R programming by writing a "Hello, World!" program. Depending on the needs, you can program either at the R command prompt or you can use an R script file to write your program. Let's check both one by one.

R Command Prompt

Once you have the R environment set up, it is easy to start your R command prompt by just typing the following command at your command prompt −

$ R

This will launch R interpreter and you will get a prompt > where you can start typing your program as follows −

> myString <- "Hello, World!"
> print ( myString)
[1] "Hello, World!"

Here the first statement defines a string variable myString, where we assign a string "Hello, World!", and then the next statement print() is being used to print the value stored in the variable myString.

R Script File

Usually, you will do your programming by writing your programs in script files and then you execute those scripts at your command prompt with the help of the R interpreter called Rscript. So let's start with writing the following code in a text file called test.R as under −

# My first program in R Programming
myString <- "Hello, World!"

print ( myString)

Save the above code in a file test.R and execute it at the Linux command prompt as given below. Even if you are using Windows or another system, the syntax will remain the same.

$ Rscript test.R

When we run the above program, it produces the following result.

[1] "Hello, World!"

Comments

Comments are like helping text in your R program and they are ignored by the interpreter while executing your actual program. A single comment is written using # at the beginning of the statement as follows −

# My first program in R Programming

R does not support multi-line comments, but you can perform a trick which is something as follows −

if(FALSE) {
   "This is a demo for multi-line comments and it should be put inside either a single
      OR double quote"
}

myString <- "Hello, World!"
print ( myString)

Although the R interpreter still parses the block above, the condition is FALSE, so the string inside it is never evaluated and does not interfere with your actual program. You should put such comments inside either single or double quotes.

R – Data Types

Generally, while doing programming in any programming language, you need to use various variables to store various information. Variables are nothing but reserved memory locations to store values. This means that, when you create a variable you reserve some space in memory.

You may like to store information of various data types like character, wide character, integer, floating point, double floating point, Boolean etc. Based on the data type of a variable, the operating system allocates memory and decides what can be stored in the reserved memory.

In contrast to other programming languages like C and Java, in R the variables are not declared as some data type. The variables are assigned with R-Objects and the data type of the R-object becomes the data type of the variable. There are many types of R-objects. The frequently used ones are −

  • Vectors
  • Lists
  • Matrices
  • Arrays
  • Factors
  • Data Frames

The simplest of these objects is the vector object and there are six data types of these atomic vectors, also termed as six classes of vectors. The other R-Objects are built upon the atomic vectors.

Data Type  Example  Verify

Logical  TRUE, FALSE

v <- TRUE
print(class(v))

it produces the following result −

[1] "logical"

Numeric  12.3, 5, 999

v <- 23.5
print(class(v))

it produces the following result −

[1] "numeric"

Integer  2L, 34L, 0L

v <- 2L
print(class(v))

it produces the following result −

[1] "integer"

Complex  3 + 2i

v <- 2+5i
print(class(v))

it produces the following result −

[1] "complex"

Character  'a', "good", "TRUE", '23.4'

v <- "TRUE"
print(class(v))

it produces the following result −

[1] "character"

Raw  "Hello" is stored as 48 65 6c 6c 6f

v <- charToRaw("Hello")
print(class(v))

it produces the following result −

[1] "raw"

In R programming, the very basic data types are the R-objects called vectors which hold elements of different classes as shown above. Please note that in R the number of classes is not confined to only the above six types. For example, we can use many atomic vectors and create an array whose class will become array.
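The six classes above can be checked in one pass. The following is a minimal sketch; the variable names values, classes, types and cube are chosen here for illustration. It also shows typeof(), which reports the internal storage type and differs from class() for numeric values.

```r
# One value of each atomic class discussed above.
values <- list(TRUE, 23.5, 2L, 2+5i, "TRUE", charToRaw("Hello"))

# class() gives the high-level class, typeof() the internal storage type.
classes <- sapply(values, class)
types   <- sapply(values, typeof)

print(classes)
print(types)

# Building a 3-dimensional array from an atomic vector changes the class.
cube <- array(1:8, dim = c(2, 2, 2))
print(class(cube))
```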

Vectors

When you want to create a vector with more than one element, you should use the c() function, which means to combine the elements into a vector.

# Create a vector.
apple <- c('red','green',"yellow")
print(apple)

# Get the class of the vector.
print(class(apple))

When we execute the above code, it produces the following result −

[1] "red"    "green"  "yellow"
[1] "character"
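Because an atomic vector holds a single class, c() coerces mixed inputs to a common one. A small sketch of that behaviour, with illustrative variable names:

```r
# Mixing a number, a string and a logical yields a character vector.
mixed <- c(1, "a", TRUE)
print(mixed)
print(class(mixed))

# Mixing numbers and a logical yields a numeric vector (TRUE becomes 1).
nums <- c(1.5, TRUE, 2L)
print(nums)
print(class(nums))
```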

Lists

A list is an R-object which can contain many different types of elements inside it, like vectors, functions and even another list.

# Create a list.
list1 <- list(c(2,5,3),21.3,sin)

# Print the list.
print(list1)

When we execute the above code, it produces the following result −

[[1]]
[1] 2 5 3

[[2]]
[1] 21.3

[[3]]
function (x)  .Primitive("sin")
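Elements of a list can be pulled out again by position. This sketch reuses list1 from above; the names first_vec, sub_list and sine_of_zero are illustrative. Double brackets return the element itself, single brackets return a shorter list.

```r
list1 <- list(c(2, 5, 3), 21.3, sin)

# [[ ]] extracts the element itself.
first_vec <- list1[[1]]
print(first_vec)

# [ ] keeps the list wrapper and returns a list of length one.
sub_list <- list1[1]
print(class(sub_list))

# The third element is a function, so it can be called directly.
sine_of_zero <- list1[[3]](0)
print(sine_of_zero)
```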

Matrices

A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the matrix function.

# Create a matrix.
M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)

When we execute the above code, it produces the following result −

     [,1] [,2] [,3]
[1,] "a"  "a"  "b" 
[2,] "c"  "b"  "a"
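The byrow argument controls the fill order. As a brief sketch with illustrative names, leaving it at its default of FALSE fills the same input column by column instead:

```r
letters_vec <- c('a', 'a', 'b', 'c', 'b', 'a')

# Default byrow = FALSE: values fill column 1 first, then column 2, and so on.
by_col <- matrix(letters_vec, nrow = 2, ncol = 3)
print(by_col)

# byrow = TRUE: values fill row 1 first, as in the example above.
by_row <- matrix(letters_vec, nrow = 2, ncol = 3, byrow = TRUE)
print(dim(by_row))
```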

Arrays

While matrices are confined to two dimensions, arrays can be of any number of dimensions. The array function takes a dim attribute which creates the required number of dimensions. In the below example we create an array with two elements which are 3×3 matrices each.

# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)

When we execute the above code, it produces the following result −

, , 1

     [,1]     [,2]     [,3]    
[1,] "green"  "yellow" "green" 
[2,] "yellow" "green"  "yellow"
[3,] "green"  "yellow" "green" 

, , 2

     [,1]     [,2]     [,3]    
[1,] "yellow" "green"  "yellow"
[2,] "green"  "yellow" "green" 
[3,] "yellow" "green"  "yellow"  
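Individual cells and whole slices of the array can be addressed with one index per dimension. A short sketch using the array a from above (the names cell and first_slice are illustrative):

```r
a <- array(c('green', 'yellow'), dim = c(3, 3, 2))

# Row 1, column 2 of the second matrix.
cell <- a[1, 2, 2]
print(cell)

# Leaving an index empty selects everything along that dimension.
first_slice <- a[, , 1]
print(dim(first_slice))
```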

Factors

Factors are the R-objects which are created using a vector. A factor stores the vector along with the distinct values of the elements in the vector as labels. The labels are always character, irrespective of whether the input vector is numeric, character or Boolean. They are useful in statistical modeling.

Factors are created using the factor() function. The nlevels() function gives the count of levels.

# Create a vector.
apple_colours <- c('green','green','yellow','red','red','red','green')

# Create a factor object.
factor_apple <- factor(apple_colours)

# Print the factor.
print(factor_apple)

# Applying the nlevels function we can know the number of distinct values.
print(nlevels(factor_apple))

When we execute the above code, it produces the following result −

[1] green  green  yellow red    red    red    green 
Levels: green red yellow
[1] 3
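To see how the labels relate to the stored values, the levels and the underlying integer codes can be inspected. A minimal sketch, reusing the seven colours from the example:

```r
apple_colours <- c('green', 'green', 'yellow', 'red', 'red', 'red', 'green')
factor_apple <- factor(apple_colours)

# Levels are the distinct values, sorted alphabetically by default.
print(levels(factor_apple))

# Each element is stored as an integer index into the levels.
print(as.integer(factor_apple))

# table() counts how often each level occurs.
print(table(factor_apple))
```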

Data Frames

Data frames are tabular data objects. Unlike a matrix, in a data frame each column can contain different modes of data. The first column can be numeric while the second column can be character and the third column can be logical. It is a list of vectors of equal length.

Data frames are created using the data.frame() function.

# Create the data frame.
BMI <- data.frame(
   gender = c("Male", "Male","Female"), 
   height = c(152, 171.5, 165), 
   weight = c(81,93, 78),
   Age = c(42,38,26)
)
print(BMI)

When we execute the above code, it produces the following result −

  gender height weight Age
1   Male  152.0     81  42
2   Male  171.5     93  38
3 Female  165.0     78  26  
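Columns and rows of the data frame can then be selected individually. This sketch rebuilds BMI with stringsAsFactors = FALSE, which is an assumption added here so that the gender column stays character on every R version; the names over_30 and column_classes are illustrative.

```r
BMI <- data.frame(
   gender = c("Male", "Male", "Female"),
   height = c(152, 171.5, 165),
   weight = c(81, 93, 78),
   Age = c(42, 38, 26),
   stringsAsFactors = FALSE
)

# A single column is an ordinary vector.
print(BMI$height)

# A logical condition selects matching rows.
over_30 <- BMI[BMI$Age > 30, ]
print(over_30)

# Each column keeps its own class.
column_classes <- sapply(BMI, class)
print(column_classes)
```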

R – Variables

A variable provides us with named storage that our programs can manipulate. A variable in R can store an atomic vector, a group of atomic vectors or a combination of many R-objects. A valid variable name consists of letters, numbers and the dot or underscore characters. The variable name starts with a letter, or with a dot not followed by a number.

Variable Name        Validity  Reason
var_name2.           valid     Has letters, numbers, dot and underscore.
var_name%            invalid   Has the character '%'. Only dot (.) and underscore are allowed.
2var_name            invalid   Starts with a number.
.var_name, var.name  valid     Can start with a dot (.), but the dot (.) should not be followed by a number.
.2var_name           invalid   The starting dot is followed by a number, making it invalid.
_var_name            invalid   Starts with _, which is not valid.
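The rules in the table can be checked programmatically. The sketch below relies on base R's make.names(), which rewrites a string into a syntactically valid name; a candidate is treated as valid when make.names() leaves it unchanged. The helper is_valid_name is hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: a name is valid if make.names() does not need to change it.
is_valid_name <- function(x) x == make.names(x)

candidates <- c("var_name2.", "var_name%", "2var_name", ".var_name",
                "var.name", ".2var_name", "_var_name")

validity <- is_valid_name(candidates)
print(data.frame(name = candidates, valid = validity))
```

Note that this check also rejects reserved words such as if, since make.names() alters those as well.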

Variable Assignment

The variables can be assigned values using the leftward, rightward and equal to operators. The values of the variables can be printed using the print() or cat() function. The cat() function combines multiple items into a continuous print output.

# Assignment using equal operator.
var.1 = c(0,1,2,3)

# Assignment using leftward operator.
var.2 <- c("learn","R")

# Assignment using rightward operator.
c(TRUE,1) -> var.3

print(var.1)
cat ("var.1 is ", var.1 ,"\n")
cat ("var.2 is ", var.2 ,"\n")
cat ("var.3 is ", var.3 ,"\n")

When we execute the above code, it produces the following result −

[1] 0 1 2 3
var.1 is  0 1 2 3 
var.2 is  learn R 
var.3 is  1 1 

Note − The vector c(TRUE,1) has a mix of logical and numeric class. So the logical class is coerced to numeric class, making TRUE as 1.

Data Type of a Variable

In R, a variable itself is not declared of any data type; rather it gets the data type of the R-object assigned to it. So R is called a dynamically typed language, which means that we can change the data type of the same variable again and again when using it in a program.

var_x <- "Hello"
cat("The class of var_x is ",class(var_x),"\n")

var_x <- 34.5
cat("  Now the class of var_x is ",class(var_x),"\n")

var_x <- 27L
cat("   Next the class of var_x becomes ",class(var_x),"\n")

When we execute the above code, it produces the following result −

The class of var_x is  character 
  Now the class of var_x is  numeric 
   Next the class of var_x becomes  integer

Finding Variables

To know all the variables currently available in the workspace we use the ls() function. The ls() function can also use patterns to match the variable names.

print(ls())

When we execute the above code, it produces the following result −

[1] "my var"     "my_new_var" "my_var"     "var.1"      
[5] "var.2"      "var.3"      "var.name"   "var_name2."
[9] "var_x"      "varname" 

Note − It is a sample output depending on what variables are declared in your environment.

The ls() function can use patterns to match the variable names.

# List the variables whose names contain the pattern "var".
print(ls(pattern = "var"))

When we execute the above code, it produces the following result −

[1] "my var"     "my_new_var" "my_var"     "var.1"      
[5] "var.2"      "var.3"      "var.name"   "var_name2."
[9] "var_x"      "varname"    

The variables starting with a dot (.) are hidden; they can be listed using the "all.names = TRUE" argument to the ls() function.

print(ls(all.names = TRUE))

When we execute the above code, it produces the following result −

[1] ".cars"        ".Random.seed" ".var_name"    ".varname"     ".varname2"   
[6] "my var"       "my_new_var"   "my_var"       "var.1"        "var.2"        
[11] "var.3"        "var.name"     "var_name2."   "var_x"  
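The pattern argument is a regular expression, so it matches anywhere in a name unless it is anchored. The following sketch uses a separate environment (created with new.env(), so the workspace is left untouched) and illustrative variable names to show the difference:

```r
# A scratch environment keeps the example independent of the workspace.
e <- new.env()
assign("var.1", 1, envir = e)
assign("my_var", 2, envir = e)
assign(".hidden", 3, envir = e)

# "var" matches anywhere in the name.
contains_var <- ls(e, pattern = "var")
print(contains_var)

# "^var" matches only names that start with "var".
starts_var <- ls(e, pattern = "^var")
print(starts_var)

# Hidden names appear only with all.names = TRUE.
all_names <- ls(e, all.names = TRUE)
print(all_names)
```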

Deleting Variables

Variables can be deleted by using the rm() function. Below we delete the variable var.3. On printing the value of the variable, an error is thrown.

rm(var.3)
print(var.3)

When we execute the above code, it produces the following result −

Error in print(var.3) : object 'var.3' not found

All the variables can be deleted by using the rm() and ls() functions together.

rm(list = ls())
print(ls())

When we execute the above code, it produces the following result −

character(0)

R – Operators

An operator is a symbol that tells the interpreter to perform specific mathematical or logical manipulations. The R language is rich in built-in operators and provides the following types of operators.

Types of Operators

We have the following types of operators in R programming −

  • Arithmetic Operators
  • Relational Operators
  • Logical Operators
  • Assignment Operators
  • Miscellaneous Operators

Arithmetic Operators

The following table shows the arithmetic operators supported by the R language. The operators act on each element of the vector.

Operator  Description  Example

+  Adds two vectors

v <- c( 2,5.5,6)
t <- c(8, 3, 4)
print(v+t)

it produces the following result −

[1] 10.0  8.5 10.0

-  Subtracts the second vector from the first

v <- c( 2,5.5,6)
t <- c(8, 3, 4)
print(v-t)

it produces the following result −

[1] -6.0  2.5  2.0

*  Multiplies both vectors

v <- c( 2,5.5,6)
t <- c(8, 3, 4)
print(v*t)

it produces the following result −

[1] 16.0 16.5 24.0

/  Divides the first vector by the second

v <- c( 2,5.5,6)
t <- c(8, 3, 4)
print(v/t)

it produces the following result −

[1] 0.250000 1.833333 1.500000

%%  Gives the remainder of the first vector divided by the second

v <- c( 2,5.5,6)
t <- c(8, 3, 4)
print(v%%t)

it produces the following result −

[1] 2.0 2.5 2.0

%/%  Gives the result of integer division of the first vector by the second (quotient)

v <- c( 2,5.5,6)
t <- c(8, 3, 4)
print(v%/%t)

it produces the following result −

[1] 0 1 1

^  The first vector raised to the exponent of the second vector

v <- c( 2,5.5,6)
t <- c(8, 3, 4)
print(v^t)

it produces the following result −

[1]  256.000  166.375 1296.000
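The examples above use vectors of equal length. When the lengths differ, R recycles the shorter vector; a brief sketch with illustrative names:

```r
long_vec  <- c(1, 2, 3, 4)
short_vec <- c(10, 20)

# short_vec is reused as 10, 20, 10, 20 to match the longer vector.
recycled <- long_vec + short_vec
print(recycled)

# A single number is a vector of length one, so it is recycled as well.
scaled <- long_vec * 2
print(scaled)
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.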

Relational Operators

The following table shows the relational operators supported by the R language. Each element of the first vector is compared with the corresponding element of the second vector. The result of comparison is a Boolean value.

Operator  Description  Example

>  Checks if each element of the first vector is greater than the corresponding element of the second vector.

v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v>t)

it produces the following result −

[1] FALSE  TRUE FALSE FALSE

<  Checks if each element of the first vector is less than the corresponding element of the second vector.

v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v < t)

it produces the following result −

[1]  TRUE FALSE  TRUE FALSE

==  Checks if each element of the first vector is equal to the corresponding element of the second vector.

v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v == t)

it produces the following result −

[1] FALSE FALSE FALSE  TRUE

<=  Checks if each element of the first vector is less than or equal to the corresponding element of the second vector.

v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v<=t)

it produces the following result −

[1]  TRUE FALSE  TRUE  TRUE

>=  Checks if each element of the first vector is greater than or equal to the corresponding element of the second vector.

v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v>=t)

it produces the following result −

[1] FALSE  TRUE FALSE  TRUE

!=  Checks if each element of the first vector is unequal to the corresponding element of the second vector.

v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v!=t)

it produces the following result −

[1]  TRUE  TRUE  TRUE FALSE
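Element-wise comparisons are often reduced to a single answer with any(), all() or which(). The sketch below also shows why == can surprise with decimal fractions; all.equal() compares with a small tolerance instead. Variable names are illustrative.

```r
v <- c(2, 5.5, 6, 9)
t <- c(8, 2.5, 14, 9)

any_greater <- any(v > t)    # is at least one element of v greater?
all_le      <- all(v <= t)   # are all elements of v less than or equal?
equal_at    <- which(v == t) # positions where the elements are equal

print(any_greater)
print(all_le)
print(equal_at)

# Floating-point arithmetic is inexact, so exact equality can fail.
exact_equal  <- (0.1 + 0.2) == 0.3
nearly_equal <- isTRUE(all.equal(0.1 + 0.2, 0.3))
print(exact_equal)
print(nearly_equal)
```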

Logical Operators

The following table shows the logical operators supported by the R language. They are applicable only to vectors of type logical, numeric or complex. All non-zero numbers are considered as the logical value TRUE, and zero is considered FALSE.

Each element of the first vector is compared with the corresponding element of the second vector. The result of comparison is a Boolean value.

Operator  Description  Example

&  It is called the element-wise logical AND operator. It combines each element of the first vector with the corresponding element of the second vector and gives an output TRUE if both the elements are TRUE.

v <- c(3,1,TRUE,2+3i)
t <- c(4,1,FALSE,2+3i)
print(v&t)

it produces the following result −

[1]  TRUE  TRUE FALSE  TRUE

|  It is called the element-wise logical OR operator. It combines each element of the first vector with the corresponding element of the second vector and gives an output TRUE if at least one of the elements is TRUE.

v <- c(3,0,TRUE,2+2i)
t <- c(4,0,FALSE,2+3i)
print(v|t)

it produces the following result −

[1]  TRUE FALSE  TRUE  TRUE

!  It is called the logical NOT operator. It takes each element of the vector and gives the opposite logical value.

v <- c(3,0,TRUE,2+2i)
print(!v)

it produces the following result −

[1] FALSE  TRUE FALSE FALSE

The logical operators && and || work on single values and give a single logical value as output. In the R 3.x releases this tutorial was written for, they silently used only the first element of longer vectors; since R 4.3.0, passing a vector longer than one element is an error, so the examples below select the first element explicitly.

Operator  Description  Example

&&  Called the logical AND operator. It takes the first element of both the vectors and gives TRUE only if both are TRUE.

v <- c(3,0,TRUE,2+2i)
t <- c(1,3,TRUE,2+3i)
print(v[1] && t[1])

it produces the following result −

[1] TRUE

||  Called the logical OR operator. It takes the first element of both the vectors and gives TRUE if at least one of them is TRUE.

v <- c(0,0,TRUE,2+2i)
t <- c(0,3,TRUE,2+3i)
print(v[1] || t[1])

it produces the following result −

[1] FALSE
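A further difference between the two families is that && and || stop as soon as the result is known, while & and | always evaluate both sides. A small sketch, assuming a value x that may be NULL (the names safe and elementwise are illustrative):

```r
x <- NULL

# The left side is FALSE, so the right side (x > 0) is never evaluated.
safe <- !is.null(x) && x > 0
print(safe)

# The element-wise form evaluates both sides; comparing NULL gives an
# empty logical vector, so the result is empty rather than FALSE.
elementwise <- !is.null(x) & x > 0
print(length(elementwise))
```

This is why && and || are the usual choice inside if conditions.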

Assignment Operators

These operators are used to assign values to vectors.

Operator  Description  Example

<- or = or <<-  Called Left Assignment

v1 <- c(3,1,TRUE,2+3i)
v2 <<- c(3,1,TRUE,2+3i)
v3 = c(3,1,TRUE,2+3i)
print(v1)
print(v2)
print(v3)

it produces the following result −

[1] 3+0i 1+0i 1+0i 2+3i
[1] 3+0i 1+0i 1+0i 2+3i
[1] 3+0i 1+0i 1+0i 2+3i

-> or ->>  Called Right Assignment

c(3,1,TRUE,2+3i) -> v1
c(3,1,TRUE,2+3i) ->> v2
print(v1)
print(v2)

it produces the following result −

[1] 3+0i 1+0i 1+0i 2+3i
[1] 3+0i 1+0i 1+0i 2+3i
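At the top level <- and <<- behave alike, as the output above shows. The difference appears inside a function: <- creates a local variable, while <<- assigns to a variable in an enclosing environment. A minimal sketch with hypothetical function names:

```r
counter <- 0

# <- inside the function only changes a local copy.
increment_local <- function() {
   counter <- counter + 1
   invisible(counter)
}

# <<- modifies the variable defined outside the function.
increment_outer <- function() {
   counter <<- counter + 1
   invisible(counter)
}

increment_local()
after_local <- counter
print(after_local)

increment_outer()
after_outer <- counter
print(after_outer)
```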

Miscellularaneous Operators

These operators are used for specific purposes and not for general mathematical or logical computation.

Operator  Description  Example

:  Colon operator. It creates the series of numbers in sequence for a vector.

v <- 2:8
print(v)

it produces the following result −

[1] 2 3 4 5 6 7 8

%in%  This operator is used to identify if an element belongs to a vector.

v1 <- 8
v2 <- 12
t <- 1:10
print(v1 %in% t)
print(v2 %in% t)

it produces the following result −

[1] TRUE
[1] FALSE

%*%  This operator performs matrix multiplication. The example multiplies a matrix by its transpose.

M = matrix( c(2,6,5,1,10,4), nrow = 2,ncol = 3,byrow = TRUE)
t = M %*% t(M)
print(t)

it produces the following result −

      [,1] [,2]
[1,]   65   82
[2,]   82  117
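The %*% operator works for any two matrices whose inner dimensions agree, and it differs from *, which multiplies element by element. A short sketch with illustrative matrices A and B:

```r
A <- matrix(1:6, nrow = 2)   # 2 x 3, filled by column
B <- matrix(1:6, nrow = 3)   # 3 x 2, filled by column

# Matrix product: (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix.
product <- A %*% B
print(product)

# Element-wise product keeps the 2 x 3 shape.
elementwise <- A * A
print(dim(elementwise))
```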

R – Decision Making

Decision making structures require the programmer to specify one or more conditions to be evaluated or tested by the program, along with a statement or statements to be executed if the condition is determined to be true, and optionally, other statements to be executed if the condition is determined to be false.

Following is the general form of a typical decision making structure found in most of the programming languages −

[Figure: Decision Making]

R provides the following types of decision making statements.

Sr.No.  Statement & Description

1  if statement
   An if statement consists of a Boolean expression followed by one or more statements.

2  if…else statement
   An if statement can be followed by an optional else statement, which executes when the Boolean expression is false.

3  switch statement
   A switch statement allows a variable to be tested for equality against a list of values.
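The three statements can be sketched as follows. The function names classify and day_type, and the values they return, are hypothetical and chosen only for illustration.

```r
# if ... else if ... else: the first branch whose condition is TRUE is taken.
classify <- function(x) {
   if (x < 0) {
      "negative"
   } else if (x == 0) {
      "zero"
   } else {
      "positive"
   }
}

# switch: the value is matched against the named alternatives.
# An empty alternative ("Sat" = ) falls through to the next one,
# and the final unnamed value is the default.
day_type <- function(day) {
   switch(day,
      "Sat" = ,
      "Sun" = "weekend",
      "weekday")
}

print(classify(-2))
print(classify(0))
print(day_type("Sun"))
print(day_type("Mon"))
```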

R – Loops

There may be a situation when you need to execute a block of code several times. In general, statements are executed sequentially. The first statement in a function is executed first, followed by the second, and so on.

Programming languages provide various control structures that allow for more complicated execution paths.

A loop statement allows us to execute a statement or group of statements multiple times, and the following is the general form of a loop statement in most of the programming languages −

[Figure: Loop Architecture]

The R programming language provides the following kinds of loop to handle looping requirements.

Sr.No.  Loop Type & Description

1  repeat loop
   Executes a sequence of statements again and again until a break statement inside the body stops it.

2  while loop
   Repeats a statement or group of statements while a given condition is true. It tests the condition before executing the loop body.

3  for loop
   Executes the loop body once for each element of a vector or list, assigning the current element to the loop variable.
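A compact sketch of the three loop types, using illustrative variable names; each loop here simply counts or accumulates numbers.

```r
# repeat: runs until break is reached.
n <- 0
repeat {
   n <- n + 1
   if (n >= 3) {
      break
   }
}
print(n)

# while: the condition is tested before each pass.
total <- 0
i <- 1
while (i <= 5) {
   total <- total + i
   i <- i + 1
}
print(total)

# for: the body runs once per element of the sequence.
squares <- numeric(0)
for (k in 1:4) {
   squares <- c(squares, k^2)
}
print(squares)
```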

Loop Control Statements

Loop control statements change execution from its normal sequence.

R supports the following control statements.

Sr.No.  Control Statement & Description

1  break statement
   Terminates the loop statement and transfers execution to the statement immediately following the loop.

2  next statement
   Skips the remainder of the current iteration and continues with the next iteration of the loop.
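Both statements can be seen together in one loop. In this sketch (variable names are illustrative) next skips the even numbers and break ends the loop once the value exceeds 7.

```r
odds <- c()
for (i in 1:10) {
   if (i %% 2 == 0) {
      next    # skip even numbers and go to the next iteration
   }
   if (i > 7) {
      break   # leave the loop entirely
   }
   odds <- c(odds, i)
}
print(odds)
```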

R – Functions

A function is a set of statements organized together to perform a specific task. R has a large number of in-built functions and the user can create their own functions.

In R, a function is an object, so the R interpreter is able to pass control to the function, along with arguments that may be necessary for the function to accomplish the actions.

The function in turn performs its task and returns control to the interpreter as well as any result which may be stored in other objects.

Function Definition

An R function is created by using the keyword function. The basic syntax of an R function definition is as follows −

function_name <- function(arg_1, arg_2, ...) {
   Function body 
}

Function Components

The various parts of a function are −

  • Function Name − This is the actual name of the function. It is stored in the R environment as an object with this name.

  • Arguments − An argument is a placeholder. When a function is invoked, you pass a value to the argument. Arguments are optional; that is, a function may contain no arguments. Also arguments can have default values.

  • Function Body − The function body contains a collection of statements that defines what the function does.

  • Return Value − The return value of a function is the last expression in the function body to be evaluated.
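Since a function is an ordinary object, these parts can be inspected with the base functions formals() and body(). A minimal sketch; the function area and its arguments are hypothetical.

```r
# Name: area. Arguments: width, and height with a default of 1.
area <- function(width, height = 1) {
   width * height   # the last evaluated expression is the return value
}

arg_names <- names(formals(area))
print(arg_names)
print(body(area))

print(area(3, 4))
print(area(3))
```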

R has many in-built functions which can be directly called in the program without defining them first. We can also create and use our own functions, referred to as user defined functions.

Built-in Function

Simple examples of in-built functions are seq(), mean(), max(), sum(x) and paste(...) etc. They are directly called by user written programs.

# Create a sequence of numbers from 32 to 44.
print(seq(32,44))

# Find mean of numbers from 25 to 82.
print(mean(25:82))

# Find sum of numbers from 41 to 68.
print(sum(41:68))

When we execute the above code, it produces the following result −

[1] 32 33 34 35 36 37 38 39 40 41 42 43 44
[1] 53.5
[1] 1526

User-defined Function

We can create user-defined functions in R. They are specific to what a user wants and once created they can be used like the built-in functions. Below is an example of how a function is created and used.

# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
   for(i in 1:a) {
      b <- i^2
      print(b)
   }
}	

Calling a Function

# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
   for(i in 1:a) {
      b <- i^2
      print(b)
   }
}

# Call the function new.function supplying 6 as an argument.
new.function(6)

When we execute the above code, it produces the following result −

[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
[1] 36

Calling a Function without an Argument

# Create a function without an argument.
new.function <- function() {
   for(i in 1:5) {
      print(i^2)
   }
}

# Call the function without supplying an argument.
new.function()

When we execute the above code, it produces the following result −

[1] 1
[1] 4
[1] 9
[1] 16
[1] 25

Calling a Function with Argument Values (by position and by name)

The arguments to a function call can be supplied in the same sequence as defined in the function, or they can be supplied in a different sequence but assigned to the names of the arguments.

# Create a function with arguments.
new.function <- function(a,b,c) {
   result <- a * b + c
   print(result)
}

# Call the function by position of arguments.
new.function(5,3,11)

# Call the function by names of the arguments.
new.function(a = 11, b = 5, c = 3)

When we execute the above code, it produces the following result −

[1] 26
[1] 58
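
Positional and named arguments can also be mixed in one call. The sketch below assumes a hypothetical function `combine` (not part of the original example) with the same body as above; R matches the named arguments first and then fills the remaining parameters in order from the unnamed values.

```r
# Hypothetical function used only for illustration.
combine <- function(a, b, c) {
   a * b + c
}

# c is matched by name; 11 and 5 then fill a and b in order.
mixed <- combine(c = 3, 11, 5)
print(mixed)
```

Here the call is equivalent to combine(a = 11, b = 5, c = 3).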

Calling a Function with Default Argument

We can define default values for the arguments in the function definition and call the function without supplying any argument to get the default result. We can also call such functions by supplying new values for the arguments and get a non-default result.

# Create a function with arguments.
new.function <- function(a = 3, b = 6) {
   result <- a * b
   print(result)
}

# Call the function without giving any argument.
new.function()

# Call the function giving new values for the arguments.
new.function(9,5)

When we execute the above code, it produces the following result −

[1] 18
[1] 45

Lazy Evaluation of Function

Arguments to functions are evaluated lazily, which means that they are evaluated only when needed by the function body.

# Create a function with arguments.
new.function <- function(a, b) {
   print(a^2)
   print(a)
   print(b)
}

# Evaluate the function without supplying one of the arguments.
new.function(6)

When we execute the above code, it produces the following result −

[1] 36
[1] 6
Error in print(b) : argument "b" is missing, with no default
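
The other side of lazy evaluation can be sketched as follows: an argument that the body never uses is never evaluated, so even an expression that would raise an error does no harm. The names `lazy_demo`, `used` and `unused` are hypothetical and chosen only for this illustration.

```r
# Hypothetical function: `unused` is never referenced in the body.
lazy_demo <- function(used, unused) {
   used * 2
}

# stop() would raise an error if it were evaluated,
# but the argument is never needed, so the call succeeds.
result <- lazy_demo(21, stop("this is never evaluated"))
print(result)
```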

R – Strings

Any value written within a pair of single quotes or double quotes in R is treated as a string. Internally R stores every string within double quotes, even when you create it with single quotes.

Rules Applied in String Construction

  • The quotes at the beginning and end of a string must be both double quotes or both single quotes. They cannot be mixed.

  • Double quotes can be inserted into a string starting and ending with single quotes.

  • A single quote can be inserted into a string starting and ending with double quotes.

  • Double quotes cannot be inserted into a string starting and ending with double quotes, unless each one is escaped with a backslash (\").

  • A single quote cannot be inserted into a string starting and ending with single quotes, unless it is escaped with a backslash (\').

Examples of Valid Strings

The following examples clarify the rules about creating a string in R.

a <- 'Start and end with single quote'
print(a)

b <- "Start and end with double quotes"
print(b)

c <- "single quote ' in between double quotes"
print(c)

d <- 'Double quotes " in between single quote'
print(d)

When the above code is run we get the following output −

[1] "Start and end with single quote"
[1] "Start and end with double quotes"
[1] "single quote ' in between double quotes"
[1] "Double quotes \" in between single quote"

Examples of Invalid Strings

e <- 'Mixed quotes" 
print(e)

f <- 'Single quote ' inside single quote'
print(f)

g <- "Double quotes " inside double quotes"
print(g)

When we run the script it fails, giving the results below.

...: unexpected INCOMPLETE_STRING

.... unexpected symbol 
1: f <- 'Single quote ' inside

unexpected symbol
1: g <- "Double quotes " inside

String Manipulation

Concatenating Strings – paste() function

Many strings in R are combined using the paste() function. It can take any number of arguments to be combined together.

Syntax

The basic syntax for the paste function is −

paste(..., sep = " ", collapse = NULL)

Following is the description of the parameters used −

  • ... represents any number of arguments to be combined.

  • sep represents any separator between the arguments. It is optional.

  • collapse is an optional string. When it is supplied, the elements of the resulting vector are joined into a single string, separated by that string.

Example

a <- "Hello"
b <- 'How'
c <- "are you? "

print(paste(a,b,c))

print(paste(a,b,c, sep = "-"))

print(paste(a,b,c, sep = "", collapse = ""))

When we execute the above code, it produces the following result −

[1] "Hello How are you? "
[1] "Hello-How-are you? "
[1] "HelloHoware you? "
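
The difference between sep and collapse shows up only when an argument has more than one element. A minimal sketch, using an illustrative vector `days` that is not part of the original example:

```r
# A character vector with three elements (illustrative data).
days <- c("Mon", "Tue", "Wed")

# sep joins the arguments element by element; the result is still a vector of length 3.
labelled <- paste("Day", days, sep = ":")
print(labelled)

# collapse then joins the elements of that vector into one string.
joined <- paste("Day", days, sep = ":", collapse = ", ")
print(joined)
```

The first call returns three strings, the second a single string.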

Formatting numbers & strings – format() function

Numbers and strings can be formatted to a specific style using the format() function.

Syntax

The basic syntax for the format function is −

format(x, digits, nsmall, scientific, width, justify = c("left", "right", "centre", "none"))

Following is the description of the parameters used −

  • x is the vector input.

  • digits is the total number of digits displayed.

  • nsmall is the minimum number of digits to the right of the decimal point.

  • scientific is set to TRUE to display scientific notation.

  • width indicates the minimum width to be displayed by padding blanks in the beginning.

  • justify is the display of the string to left, right or centre.

Example

# Total number of digits displayed. Last digit rounded off.
result <- format(23.123456789, digits = 9)
print(result)

# Display numbers in scientific notation.
result <- format(c(6, 13.14521), scientific = TRUE)
print(result)

# The minimum number of digits to the right of the decimal point.
result <- format(23.47, nsmall = 5)
print(result)

# Format treats everything as a string.
result <- format(6)
print(result)

# Numbers are padded with blanks in the beginning for width.
result <- format(13.7, width = 6)
print(result)

# Left justify strings.
result <- format("Hello", width = 8, justify = "l")
print(result)

# Justify string with centre.
result <- format("Hello", width = 8, justify = "c")
print(result)

When we execute the above code, it produces the following result −

[1] "23.1234568"
[1] "6.000000e+00" "1.314521e+01"
[1] "23.47000"
[1] "6"
[1] "  13.7"
[1] "Hello   "
[1] " Hello  "

Counting number of characters in a string – nchar() function

This function counts the number of characters including spaces in a string.

Syntax

The basic syntax for the nchar() function is −

nchar(x)

Following is the description of the parameters used −

  • x is the vector input.

Example

result <- nchar("Count the number of characters")
print(result)

When we execute the above code, it produces the following result −

[1] 30

Changing the case – toupper() & tolower() functions

These functions change the case of the characters of a string.

Syntax

The basic syntax for the toupper() & tolower() functions is −

toupper(x)
tolower(x)

Following is the description of the parameters used −

  • x is the vector input.

Example

# Changing to Upper case.
result <- toupper("Changing To Upper")
print(result)

# Changing to lower case.
result <- tolower("Changing To Lower")
print(result)

When we execute the above code, it produces the following result −

[1] "CHANGING TO UPPER"
[1] "changing to lower"

Extracting parts of a string – substring() function

This function extracts parts of a string.

Syntax

The basic syntax for the substring() function is −

substring(x,first,last)

Following is the description of the parameters used −

  • x is the character vector input.

  • first is the position of the first character to be extracted.

  • last is the position of the last character to be extracted.

Example

# Extract characters from 5th to 7th position.
result <- substring("Extract", 5, 7)
print(result)

When we execute the above code, it produces the following result −

[1] "act"

R – Vectors

Vectors are the most basic R data objects and there are six types of atomic vectors. They are logical, integer, double, complex, character and raw.

Vector Creation

Single Element Vector

Even when you write just one value in R, it becomes a vector of length 1 and belongs to one of the above vector types.

# Atomic vector of type character.
print("abc")

# Atomic vector of type double.
print(12.5)

# Atomic vector of type integer.
print(63L)

# Atomic vector of type logical.
print(TRUE)

# Atomic vector of type complex.
print(2+3i)

# Atomic vector of type raw.
print(charToRaw('hello'))

When we execute the above code, it produces the following result −

[1] "abc"
[1] 12.5
[1] 63
[1] TRUE
[1] 2+3i
[1] 68 65 6c 6c 6f

Multiple Elements Vector

Using colon operator with numeric data

# Creating a sequence from 5 to 13.
v <- 5:13
print(v)

# Creating a sequence from 6.6 to 12.6.
v <- 6.6:12.6
print(v)

# If the last element specified does not belong to the sequence then it is discarded.
v <- 3.8:11.4
print(v)

When we execute the above code, it produces the following result −

[1]  5  6  7  8  9 10 11 12 13
[1]  6.6  7.6  8.6  9.6 10.6 11.6 12.6
[1]  3.8  4.8  5.8  6.8  7.8  8.8  9.8 10.8

Using the sequence (seq) function

# Create vector with elements from 5 to 9 incrementing by 0.4.
print(seq(5, 9, by = 0.4))

When we execute the above code, it produces the following result −

[1] 5.0 5.4 5.8 6.2 6.6 7.0 7.4 7.8 8.2 8.6 9.0

Using the c() function

The non-character values are coerced to character type if one of the elements is a character.

# The logical and numeric values are converted to characters.
s <- c('apple','red',5,TRUE)
print(s)

When we execute the above code, it produces the following result −

[1] "apple" "red"   "5"     "TRUE" 

Accessing Vector Elements

Elements of a vector are accessed using indexing. The [ ] brackets are used for indexing. Indexing starts with position 1. Giving a negative value in the index drops that element from the result. TRUE and FALSE can also be used for indexing. A numeric index of 0 selects nothing, so in an index made of 0s and 1s only the 1s count, and each of them selects the first element.

# Accessing vector elements using position.
t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")
u <- t[c(2,3,6)]
print(u)

# Accessing vector elements using logical indexing.
v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
print(v)

# Accessing vector elements using negative indexing.
x <- t[c(-2,-5)]
print(x)

# Accessing vector elements using 0/1 indexing.
y <- t[c(0,0,0,0,0,0,1)]
print(y)

When we execute the above code, it produces the following result −

[1] "Mon" "Tue" "Fri"
[1] "Sun" "Fri"
[1] "Sun" "Tue" "Wed" "Fri" "Sat"
[1] "Sun"
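
The last result is easy to misread: a vector of 0s and 1s is a numeric index, not a mask. A short sketch of the difference, using an illustrative vector `days` with the same values as above:

```r
days <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")

# Numeric index: the zeros select nothing and the 1 selects the first element.
by_number <- days[c(0, 0, 0, 0, 0, 0, 1)]
print(by_number)

# Logical index: converting to TRUE/FALSE makes the seventh slot select the seventh element.
by_mask <- days[as.logical(c(0, 0, 0, 0, 0, 0, 1))]
print(by_mask)
```

Converting with as.logical() is one way to get mask behaviour from 0/1 data.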

Vector Manipulation

Vector arithmetic

Two vectors of the same length can be added, subtracted, multiplied or divided, giving the result as a vector output.

# Create 2 vectors.
v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11,0,8,1,2)

# Vector addition.
add.result <- v1+v2
print(add.result)

# Vector subtraction.
sub.result <- v1-v2
print(sub.result)

# Vector multiplication.
multi.result <- v1*v2
print(multi.result)

# Vector division.
divi.result <- v1/v2
print(divi.result)

When we execute the above code, it produces the following result −

[1]  7 19  4 13  1 13
[1] -1 -3  4 -3 -1  9
[1] 12 88  0 40  0 22
[1] 0.7500000 0.7272727       Inf 0.6250000 0.0000000 5.5000000

Vector element recycling

If we apply arithmetic operations to two vectors of unequal length, then the elements of the shorter vector are recycled to complete the operations.

v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11)
# v2 becomes c(4,11,4,11,4,11)

add.result <- v1+v2
print(add.result)

sub.result <- v1-v2
print(sub.result)

When we execute the above code, it produces the following result −

[1]  7 19  8 16  4 22
[1] -1 -3  0 -6 -4  0
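
Recycling also happens when the longer length is not a multiple of the shorter one, but R then signals a warning. A minimal sketch, where `v3` and `warned` are names introduced only for this illustration:

```r
v1 <- c(3, 8, 4, 5, 0, 11)
v3 <- c(1, 2, 3, 4)   # length 4 does not divide length 6

# Record whether the addition signals a warning.
warned <- tryCatch({ v1 + v3; FALSE }, warning = function(w) TRUE)
print(warned)

# The result is still computed, with v3 recycled as c(1, 2, 3, 4, 1, 2).
res <- suppressWarnings(v1 + v3)
print(res)
```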

Vector Element Sorting

Elements in a vector can be sorted using the sort() function.

v <- c(3,8,4,5,0,11, -9, 304)

# Sort the elements of the vector.
sort.result <- sort(v)
print(sort.result)

# Sort the elements in the reverse order.
revsort.result <- sort(v, decreasing = TRUE)
print(revsort.result)

# Sorting character vectors.
v <- c("Red","Blue","yellow","violet")
sort.result <- sort(v)
print(sort.result)

# Sorting character vectors in reverse order.
revsort.result <- sort(v, decreasing = TRUE)
print(revsort.result)

When we execute the above code, it produces the following result −

[1]  -9   0   3   4   5   8  11 304
[1] 304  11   8   5   4   3   0  -9
[1] "Blue"   "Red"    "violet" "yellow"
[1] "yellow" "violet" "Red"    "Blue" 

R – Lists

Lists are the R objects which contain elements of different types like − numbers, strings, vectors and another list inside it. A list can also contain a matrix or a function as its elements. A list is created using the list() function.

Creating a List

Following is an example to create a list containing strings, numbers, vectors and a logical value.

# Create a list containing strings, numbers, vectors and a logical value.
list_data <- list("Red", "Green", c(21,32,11), TRUE, 51.23, 119.1)
print(list_data)

When we execute the above code, it produces the following result −

[[1]]
[1] "Red"

[[2]]
[1] "Green"

[[3]]
[1] 21 32 11

[[4]]
[1] TRUE

[[5]]
[1] 51.23

[[6]]
[1] 119.1

Naming List Elements

The list elements can be given names and they can be accessed using these names.

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
   list("green",12.3))

# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Show the list.
print(list_data)

When we execute the above code, it produces the following result −

$`1st Quarter`
[1] "Jan" "Feb" "Mar"

$A_Matrix
     [,1] [,2] [,3]
[1,]    3    5   -2
[2,]    9    1    8

$`A Inner list`
$`A Inner list`[[1]]
[1] "green"

$`A Inner list`[[2]]
[1] 12.3

Accessing List Elements

Elements of the list can be accessed by the index of the element in the list. In the case of named lists, they can also be accessed using the names.

We continue to use the list in the above example −

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
   list("green",12.3))

# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Access the first element of the list.
print(list_data[1])

# Access the third element. As it is also a list, all its elements will be printed.
print(list_data[3])

# Access the list element using the name of the element.
print(list_data$A_Matrix)

When we execute the above code, it produces the following result −

$`1st Quarter`
[1] "Jan" "Feb" "Mar"

$`A Inner list`
$`A Inner list`[[1]]
[1] "green"

$`A Inner list`[[2]]
[1] 12.3

     [,1] [,2] [,3]
[1,]    3    5   -2
[2,]    9    1    8
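
Single brackets return a sub-list, which is why the names are printed in the output above. To extract the element itself, double brackets can be used. A small sketch on the same list, where `sub_list` and `months` are names introduced only for illustration:

```r
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
   list("green",12.3))
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Single brackets keep the list wrapper.
sub_list <- list_data[1]
print(class(sub_list))

# Double brackets extract the element itself, here a character vector.
months <- list_data[["1st Quarter"]]
print(class(months))
print(months[2])
```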

Manipulating List Elements

We can add, delete and update list elements as shown below. The example adds an element at the end of the list, removes it again, and then updates an existing element.

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
   list("green",12.3))

# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Add element at the end of the list.
list_data[4] <- "New element"
print(list_data[4])

# Remove the last element.
list_data[4] <- NULL

# Print the 4th Element.
print(list_data[4])

# Update the 3rd Element.
list_data[3] <- "updated element"
print(list_data[3])

When we execute the above code, it produces the following result −

[[1]]
[1] "New element"

$<NA>
NULL

$`A Inner list`
[1] "updated element"

Merging Lists

You can merge many lists into one list by placing all the lists inside one c() function.

# Create 2 lists.
list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")

# Merge the 2 lists.
merged.list <- c(list1,list2)

# Print the merged list.
print(merged.list)

When we execute the above code, it produces the following result −

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] "Sun"

[[5]]
[1] "Mon"

[[6]]
[1] "Tue"

Converting List to Vector

A list can be converted to a vector so that the elements of the vector can be used for further manipulation. All the arithmetic operations on vectors can be applied after the list is converted into vectors. To do this conversion, we use the unlist() function. It takes the list as input and produces a vector.

# Create lists.
list1 <- list(1:5)
print(list1)

list2 <-list(10:14)
print(list2)

# Convert the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)

print(v1)
print(v2)

# Now add the vectors
result <- v1+v2
print(result)

When we execute the above code, it produces the following result −

[[1]]
[1] 1 2 3 4 5

[[1]]
[1] 10 11 12 13 14

[1] 1 2 3 4 5
[1] 10 11 12 13 14
[1] 11 13 15 17 19

R – Matrices

Matrices are the R objects in which the elements are arranged in a two-dimensional rectangular layout. They contain elements of the same atomic type. Though we can create a matrix containing only characters or only logical values, they are not of much use. We use matrices containing numeric elements in mathematical calculations.

A matrix is created using the matrix() function.

Syntax

The basic syntax for creating a matrix in R is −

matrix(data, nrow, ncol, byrow, dimnames)

Following is the description of the parameters used −

  • data is the input vector which becomes the data elements of the matrix.

  • nrow is the number of rows to be created.

  • ncol is the number of columns to be created.

  • byrow is a logical flag. If TRUE then the input vector elements are arranged by row.

  • dimnames is the names assigned to the rows and columns.

Example

Create a matrix taking a vector of numbers as input.

# Elements are arranged sequentially by row.
M <- matrix(c(3:14), nrow = 4, byrow = TRUE)
print(M)

# Elements are arranged sequentially by column.
N <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(N)

# Define the column and row names.
rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")

P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))
print(P)

When we execute the above code, it produces the following result −

     [,1] [,2] [,3]
[1,]    3    4    5
[2,]    6    7    8
[3,]    9   10   11
[4,]   12   13   14
     [,1] [,2] [,3]
[1,]    3    7   11
[2,]    4    8   12
[3,]    5    9   13
[4,]    6   10   14
     col1 col2 col3
row1    3    4    5
row2    6    7    8
row3    9   10   11
row4   12   13   14

Accessing Elements of a Matrix

Elements of a matrix can be accessed by using the column and row index of the element. We consider the matrix P above to find the specific elements below.

# Define the column and row names.
rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")

# Create the matrix.
P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))

# Access the element at 3rd column and 1st row.
print(P[1,3])

# Access the element at 2nd column and 4th row.
print(P[4,2])

# Access only the  2nd row.
print(P[2,])

# Access only the 3rd column.
print(P[,3])

When we execute the above code, it produces the following result −

[1] 5
[1] 13
col1 col2 col3 
   6    7    8 
row1 row2 row3 row4 
   5    8   11   14 

Matrix Computations

Various mathematical operations are performed on the matrices using the R operators. The result of the operation is also a matrix.

The dimensions (number of rows and columns) should be the same for the matrices involved in the operation.

Matrix Addition & Subtraction

# Create 2 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)

matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)

# Add the matrices.
result <- matrix1 + matrix2
cat("Result of addition","\n")
print(result)

# Subtract the matrices
result <- matrix1 - matrix2
cat("Result of subtraction","\n")
print(result)

When we execute the above code, it produces the following result −

     [,1] [,2] [,3]
[1,]    3   -1    2
[2,]    9    4    6
     [,1] [,2] [,3]
[1,]    5    0    3
[2,]    2    9    4
Result of addition 
     [,1] [,2] [,3]
[1,]    8   -1    5
[2,]   11   13   10
Result of subtraction 
     [,1] [,2] [,3]
[1,]   -2   -1   -1
[2,]    7   -5    2

Matrix Multiplication & Division

# Create 2 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)

matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)

# Multiply the matrices.
result <- matrix1 * matrix2
cat("Result of multiplication","\n")
print(result)

# Divide the matrices
result <- matrix1 / matrix2
cat("Result of division","\n")
print(result)

When we execute the above code, it produces the following result −

     [,1] [,2] [,3]
[1,]    3   -1    2
[2,]    9    4    6
     [,1] [,2] [,3]
[1,]    5    0    3
[2,]    2    9    4
Result of multiplication 
     [,1] [,2] [,3]
[1,]   15    0    6
[2,]   18   36   24
Result of division 
     [,1]      [,2]      [,3]
[1,]  0.6      -Inf 0.6666667
[2,]  4.5 0.4444444 1.5000000
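
The * operator above works element by element. For the algebraic matrix product R provides the %*% operator, which requires the number of columns of the left operand to equal the number of rows of the right operand. A minimal sketch on the same data, transposing matrix2 with t() so that the shapes agree; the names `elementwise` and `product` are introduced only for illustration:

```r
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)   # 2 x 3
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)    # 2 x 3

# Element-wise product: both operands have the same dimensions.
elementwise <- matrix1 * matrix2

# Matrix product: (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix.
product <- matrix1 %*% t(matrix2)
print(product)
```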

R – Arrays

Arrays are the R data objects which can store data in more than two dimensions. For example − if we create an array of dimension (2, 3, 4) then it creates 4 rectangular matrices each with 2 rows and 3 columns. An array can store values of only one data type.

An array is created using the array() function. It takes vectors as input and uses the values in the dim parameter to create an array.

Example

The following example creates an array of two 3×3 matrices, each with 3 rows and 3 columns.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)

# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2))
print(result)

When we execute the above code, it produces the following result −

, , 1

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

, , 2

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

Naming Columns and Rows

We can give names to the rows, columns and matrices in the array by using the dimnames parameter.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")

# Take these vectors as input to the array. dimnames lists the row names first.
result <- array(c(vector1,vector2),dim = c(3,3,2),dimnames = list(row.names,column.names,
   matrix.names))
print(result)

When we execute the above code, it produces the following result −

, , Matrix1

     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

, , Matrix2

     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

Accessing Array Elements

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")

# Take these vectors as input to the array. dimnames lists the row names first.
result <- array(c(vector1,vector2),dim = c(3,3,2),dimnames = list(row.names,
   column.names, matrix.names))

# Print the third row of the 2nd matrix of the array.
print(result[3,,2])

# Print the element in the 1st row and 3rd column of the 1st matrix.
print(result[1,3,1])

# Print the 2nd Matrix.
print(result[,,2])

When we execute the above code, it produces the following result −

COL1 COL2 COL3 
   3   12   15 
[1] 13
     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

Manipulating Array Elements

As an array is made up of matrices in multiple dimensions, the operations on elements of an array are carried out by accessing elements of the matrices.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)

# Take these vectors as input to the array.
array1 <- array(c(vector1,vector2),dim = c(3,3,2))

# Create two more vectors of different lengths.
vector3 <- c(9,1,0)
vector4 <- c(6,0,11,3,14,1,2,6,9)
# The 12 values are recycled to fill the 18 cells of the array.
array2 <- array(c(vector3,vector4),dim = c(3,3,2))

# Create matrices from these arrays.
matrix1 <- array1[,,2]
matrix2 <- array2[,,2]

# Add the matrices.
result <- matrix1+matrix2
print(result)

When we execute the above code, it produces the following result −

     [,1] [,2] [,3]
[1,]    7   19   19
[2,]   15   12   14
[3,]   12   12   26

Calculations Across Array Elements

We can do calculations across the elements in an array using the apply() function.

Syntax

apply(x, margin, fun)

Following is the description of the parameters used −

  • x is an array.

  • margin specifies the dimension over which the function is applied: 1 for rows, 2 for columns, c(1, 2) for both.

  • fun is the function to be applied across the elements of the array.

Example

We use the apply() function below to calculate the sum of the elements in the rows of an array across all the matrices.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)

# Take these vectors as input to the array.
new.array <- array(c(vector1,vector2),dim = c(3,3,2))
print(new.array)

# Use apply to calculate the sum of the rows across all the matrices.
result <- apply(new.array, c(1), sum)
print(result)

When we execute the above code, it produces the following result −

, , 1

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

, , 2

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

[1] 56 68 60
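
Changing the margin changes which dimension is kept. A short sketch on the same array, where `row_sums`, `col_sums` and `cell_sums` are names introduced only for illustration:

```r
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
new.array <- array(c(vector1,vector2),dim = c(3,3,2))

# Margin 1: one sum per row, across all columns and matrices.
row_sums <- apply(new.array, 1, sum)
print(row_sums)

# Margin 2: one sum per column, across all rows and matrices.
col_sums <- apply(new.array, 2, sum)
print(col_sums)

# Margin c(1, 2): one sum per cell position, across the matrices.
cell_sums <- apply(new.array, c(1, 2), sum)
print(cell_sums)
```

With c(1, 2) the result is a 3×3 matrix, because both the row and the column dimension are kept.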

R – Factors

Factors are the data objects which are used to categorize the data and store it as levels. They can store both strings and integers. They are useful in columns which have a limited number of unique values, like "Male", "Female" and True, False etc. They are useful in data analysis for statistical modeling.

Factors are created using the factor() function by taking a vector as input.

Example

# Create a vector as input.
data <- c("East","West","East","North","North","East","West","West","West","East","North")

print(data)
print(is.factor(data))

# Apply the factor function.
factor_data <- factor(data)

print(factor_data)
print(is.factor(factor_data))

When we execute the above code, it produces the following result −

 [1] "East"  "West"  "East"  "North" "North" "East"  "West"  "West"  "West"  "East" "North"
[1] FALSE
 [1] East  West  East  North North East  West  West  West  East  North
Levels: East North West
[1] TRUE

Factors in Data Frame

In R versions before 4.0.0, creating a data frame with a column of text data makes R treat the text column as categorical data and create factors on it. From R 4.0.0 onward the default is stringsAsFactors = FALSE, so the argument has to be set to TRUE explicitly to get the same behaviour.

# Create the vectors for data frame.
height <- c(132,151,162,139,166,147,122)
weight <- c(48,49,66,53,67,52,40)
gender <- c("male","male","female","female","male","female","male")

# Create the data frame. stringsAsFactors = TRUE converts the text column to a factor.
input_data <- data.frame(height,weight,gender, stringsAsFactors = TRUE)
print(input_data)

# Test if the gender column is a factor.
print(is.factor(input_data$gender))

# Print the gender column to see the levels.
print(input_data$gender)

When we execute the above code, it produces the following result −

  height weight gender
1    132     48   male
2    151     49   male
3    162     66 female
4    139     53 female
5    166     67   male
6    147     52 female
7    122     40   male
[1] TRUE
[1] male   male   female female male   female male  
Levels: female male

Changing the Order of Levels

The order of the levels in a factor can be changed by applying the factor function again with the new order of the levels.

data <- c("East","West","East","North","North","East","West","West","West","East","North")
# Create the factors
factor_data <- factor(data)
print(factor_data)

# Apply the factor function with the required order of the levels.
new_order_data <- factor(factor_data,levels = c("East","West","North"))
print(new_order_data)

When we execute the above code, it produces the following result −

 [1] East  West  East  North North East  West  West  West  East  North
Levels: East North West
 [1] East  West  East  North North East  West  West  West  East  North
Levels: East West North
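
The order of the levels matters because a factor stores each value as an integer code that points into the levels. A minimal sketch, using a shortened illustrative vector and the hypothetical names `f`, `codes` and `counts`:

```r
data <- c("East","West","East","North")

# Levels in the required order: East = 1, West = 2, North = 3.
f <- factor(data, levels = c("East","West","North"))

# The integer codes follow the order of the levels.
codes <- as.integer(f)
print(codes)

# table() counts the values and reports them in level order.
counts <- table(f)
print(counts)
```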

Generating Factor Levels

We can generate factor levels by using the gl() function. It takes two integers as input, which indicate how many levels there are and how many times each level is repeated.

Syntax

gl(n, k, labels)

Following is the description of the parameters used −

  • n is an integer giving the number of levels.

  • k is an integer giving the number of replications.

  • labels is a vector of labels for the resulting factor levels.

Example

v <- gl(3, 4, labels = c("Tampa", "Seattle","Boston"))
print(v)

When we execute the above code, it produces the following result −

 [1] Tampa   Tampa   Tampa   Tampa   Seattle Seattle Seattle Seattle Boston 
[10] Boston  Boston  Boston 
Levels: Tampa Seattle Boston

R – Data Frames

A data frame is a table or a 2-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.

Following are the characteristics of a data frame.

  • The column names should be non-empty.
  • The row names should be unique.
  • The data stored in a data frame can be of numeric, factor or character type.
  • Each column should contain the same number of data items.

Create Data Frame

# Create the data frame.
emp.data <- data.frame(
   emp_id = c (1:5), 
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25), 
   
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data) 

When we execute the above code, it produces the following result −

  emp_id emp_name salary start_date
1      1     Rick 623.30 2012-01-01
2      2      Dan 515.20 2013-09-23
3      3 Michelle 611.00 2014-11-15
4      4     Ryan 729.00 2014-05-11
5      5     Gary 843.25 2015-03-27

Get the Structure of the Data Frame

The structure of the data frame can be seen by using the str() function.

# Create the data frame.
emp.data <- data.frame(
   emp_id = c (1:5), 
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25), 
   
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Get the structure of the data frame.
str(emp.data)

When we execute the above code, it produces the following result −

'data.frame':   5 obs. of  4 variables:
 $ emp_id    : int  1 2 3 4 5
 $ emp_name  : chr  "Rick" "Dan" "Michelle" "Ryan" ...
 $ salary    : num  623 515 611 729 843
 $ start_date: Date, format: "2012-01-01" "2013-09-23" "2014-11-15" "2014-05-11" ...

Summary of Data in Data Frame

The statistical summary and nature of the data can be obtained by applying the summary() function.

# Create the data frame.
emp.data <- data.frame(
   emp_id = c (1:5), 
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25), 
   
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Print the summary.
print(summary(emp.data))  

When we execute the above code, it produces the following result −

     emp_id    emp_name             salary        start_date        
 Min.   :1   Length:5           Min.   :515.2   Min.   :2012-01-01  
 1st Qu.:2   Class :character   1st Qu.:611.0   1st Qu.:2013-09-23  
 Median :3   Mode  :character   Median :623.3   Median :2014-05-11  
 Mean   :3                      Mean   :664.4   Mean   :2014-01-14  
 3rd Qu.:4                      3rd Qu.:729.0   3rd Qu.:2014-11-15  
 Max.   :5                      Max.   :843.2   Max.   :2015-03-27 

Extract Data from Data Frame

Extract specific columns from a data frame using column names.

# Create the data frame.
emp.data <- data.frame(
   emp_id = c (1:5),
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   
   start_date = as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Extract specific columns.
result <- data.frame(emp.data$emp_name,emp.data$salary)
print(result)

When we execute the above code, it produces the following result −

  emp.data.emp_name emp.data.salary
1              Rick          623.30
2               Dan          515.20
3          Michelle          611.00
4              Ryan          729.00
5              Gary          843.25
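
Building the result with data.frame(emp.data$emp_name, ...) prefixes the column names, as the output above shows. A minimal sketch of an alternative, assuming a small stand-in data frame named emp (hypothetical, not part of the original example), selects the columns by name with bracket indexing so the original names are kept:

```r
# Hypothetical stand-in data frame, used only for illustration.
emp <- data.frame(
   emp_id = 1:2,
   emp_name = c("Rick", "Dan"),
   salary = c(623.3, 515.2),
   stringsAsFactors = FALSE
)

# Select columns by name; the original column names are preserved.
result <- emp[, c("emp_name", "salary")]
print(names(result))
print(result)
```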

Extract the first 2 rows with all columns

# Create the data frame.
emp.data <- data.frame(
   emp_id = c (1:5),
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Extract first 2 rows.
result <- emp.data[1:2,]
print(result)

When we execute the above code, it produces the following result −

  emp_id    emp_name   salary    start_date
1      1     Rick      623.3     2012-01-01
2      2     Dan       515.2     2013-09-23

Extract the 3rd and 5th rows with the 2nd and 4th columns

# Create the data frame.
emp.data <- data.frame(
   emp_id = c (1:5), 
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25), 
   
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Extract 3rd and 5th row with 2nd and 4th column.
result <- emp.data[c(3,5),c(2,4)]
print(result)

When we execute the above code, it produces the following result −

  emp_name start_date
3 Michelle 2014-11-15
5     Gary 2015-03-27

Expand Data Frame

A data frame can be expanded by adding columns and rows.

Add Column

Just add the column vector using a new column name.

# Create the data frame.
emp.data <- data.frame(
   emp_id = c (1:5), 
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25), 
   
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Add the "dept" column.
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
v <- emp.data
print(v)

When we execute the above code, it produces the following result −

  emp_id   emp_name    salary    start_date       dept
1     1    Rick        623.30    2012-01-01       IT
2     2    Dan         515.20    2013-09-23       Operations
3     3    Michelle    611.00    2014-11-15       IT
4     4    Ryan        729.00    2014-05-11       HR
5     5    Gary        843.25    2015-03-27       Finance

Add Row

To add more rows permanently to an existing data frame, we need to bring in the new rows in the same structure as the existing data frame and use the rbind() function.

In the example below we create a data frame with new rows and merge it with the existing data frame to create the final data frame.

# Create the first data frame.
emp.data <- data.frame(
   emp_id = c (1:5), 
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25), 
   
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   dept = c("IT","Operations","IT","HR","Finance"),
   stringsAsFactors = FALSE
)

# Create the 2nd data frame
emp.newdata <- 	data.frame(
   emp_id = c (6:8), 
   emp_name = c("Rasmi","Pranab","Tusar"),
   salary = c(578.0,722.5,632.8), 
   start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
   dept = c("IT","Operations","Finance"),
   stringsAsFactors = FALSE
)

# Bind the 2 data frames.
emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)

When we execute the above code, it produces the following result −

  emp_id     emp_name    salary     start_date       dept
1      1     Rick        623.30     2012-01-01       IT
2      2     Dan         515.20     2013-09-23       Operations
3      3     Michelle    611.00     2014-11-15       IT
4      4     Ryan        729.00     2014-05-11       HR
5      5     Gary        843.25     2015-03-27       Finance
6      6     Rasmi       578.00     2013-05-21       IT
7      7     Pranab      722.50     2013-07-30       Operations
8      8     Tusar       632.80     2014-06-17       Finance
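
The "same structure" requirement is mainly about column names: for data frames, rbind() matches columns by name rather than by position. A minimal sketch, using two tiny hypothetical data frames whose columns are listed in a different order:

```r
# Hypothetical data frames; the second lists its columns in a different order.
old.rows <- data.frame(emp_id = 1:2, dept = c("IT", "HR"),
   stringsAsFactors = FALSE)
new.rows <- data.frame(dept = "Finance", emp_id = 3L,
   stringsAsFactors = FALSE)

# rbind() lines the columns up by name and follows the first frame's order.
combined <- rbind(old.rows, new.rows)
print(combined)
```

If a column name in the second data frame does not exist in the first, rbind() stops with an error instead of guessing.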

R – Packages

R packages are a collection of R functions, compiled code and sample data. They are stored under a directory called "library" in the R environment. By default, R installs a set of packages during installation. More packages are added later, when they are needed for some specific purpose. When we start the R console, only the default packages are available. Other packages which are already installed have to be loaded explicitly to be used by the R program that is going to use them.

All the packages available in R language are listed at R Packages.

Below is a list of commands to be used to check, verify and use the R packages.

Check Available R Packages

Get library locations containing R packages

.libPaths()

When we execute the above code, it produces the following result. It may vary depending on the local settings of your PC.

[2] "C:/Program Files/R/R-3.2.2/library"

Get the list of all the packages installed

library()

When we execute the above code, it produces the following result. It may vary depending on the local settings of your PC.

Packages in library ‘C:/Program Files/R/R-3.2.2/library’:

base                    The R Base Package
boot                    Bootstrap Functions (Originally simply simply by Angelo Canty
                        for S)
class                   Functions for Classification
cluster                 "Finding Groups in Data": Cluster Analysis
                        Extended Rousseeuw et al.
codetools               Code Analysis Tools for R
compiler                The R Compiler Package

Get all packages currently loaded in the R environment

search()

When we execute the above code, it produces the following result. It may vary depending on the local settings of your PC.

[1] ".GlobalEnv"        "package:stats"     "package:graphics" 
[4] "package:grDevices" "package:utils"     "package:datasets" 
[7] "package:methods"   "Autoloads"         "package:base" 

Install a New Package

There are two ways to add new R packages. One is installing directly from the CRAN directory and another is downloading the package to your local system and installing it manually.

Install directly from CRAN

The following command gets the packages directly from the CRAN webpage and installs the package in the R environment. You may be prompted to select the nearest mirror. Choose the one appropriate to your location.

 install.packages("Package Name")
 
# Install the package named "XML".
 install.packages("XML")

Install package manually

Go to the link R Packages to download the package needed. Save the package as a .zip file in a suitable location in the local system.

Now you can run the following command to install this package in the R environment.

install.packages(file_name_with_path, repos = NULL, type = "source")

# Install the package named "XML"
install.packages("E:/XML_3.98-1.3.zip", repos = NULL, type = "source")

Load Package to Library

Before a package can be used in the code, it must be loaded into the current R environment. You also need to load a package that was installed previously but is not available in the current environment.

A package is loaded using the following command −

library("package Name", lib.loc = "path to library")

# Load the package named "XML"
library("XML")
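
Calling library() on a package that is not installed stops with an error. A minimal sketch of a more defensive pattern checks for the package first with requireNamespace(). The package name here is a placeholder ("stats" ships with R, so the example runs anywhere); substitute the package you actually need.

```r
# Placeholder package name, chosen because it is always available.
pkg <- "stats"

if (requireNamespace(pkg, quietly = TRUE)) {
   # character.only = TRUE lets library() take the name from a variable.
   library(pkg, character.only = TRUE)
   message("Loaded package: ", pkg)
} else {
   message("Package not installed: ", pkg)
}
```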

R – Data Reshaping

Data Reshaping in R is about changing the way data is organized into rows and columns. Most of the time data processing in R is done by taking the input data as a data frame. It is easy to extract data from the rows and columns of a data frame, but there are situations when we need the data frame in a format that is different from the format in which we received it. R has many functions to split, merge and change the rows to columns and vice-versa in a data frame.

Joining Columns and Rows in a Data Frame

We can join multiple vectors to create a data frame using the cbind() function. Also we can merge two data frames using the rbind() function.

# Create vector objects.
city <- c("Tampa","Seattle","Hartford","Denver")
state <- c("FL","WA","CT","CO")
zipcode <- c(33602,98104,06161,80294)

# Combine above 3 vectors into one data frame.
# Combine above 3 vectors column-wise (cbind() on plain vectors returns a character matrix).
addresses <- cbind(city,state,zipcode)

# Print a header.
cat("# # # # The First data frame\n") 

# Print the combined object.
print(addresses)

# Create another data frame with similar columns
new.address <- data.frame(
   city = c("Lowry","Charlotte"),
   state = c("CO","FL"),
   zipcode = c("80230","33949"),
   stringsAsFactors = FALSE
)

# Print a header.
cat("# # # The Second data frame\n") 

# Print the data frame.
print(new.address)

# Combine rows from both the data frames.
all.addresses <- rbind(addresses,new.address)

# Print a header.
cat("# # # The combined data frame\n") 

# Print the result.
print(all.addresses)

When we execute the above code, it produces the following result −

# # # # The First data frame
     city       state zipcode
[1,] "Tampa"    "FL"  "33602"
[2,] "Seattle"  "WA"  "98104"
[3,] "Hartford" "CT"   "6161" 
[4,] "Denver"   "CO"  "80294"

# # # The Second data frame
       city       state   zipcode
1      Lowry      CO      80230
2      Charlotte  FL      33949

# # # The combined data frame
       city      state zipcode
1      Tampa     FL    33602
2      Seattle   WA    98104
3      Hartford  CT     6161
4      Denver    CO    80294
5      Lowry     CO    80230
6     Charlotte  FL    33949
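
Two details in the output above are worth a closer look: cbind() on plain vectors yields a character matrix rather than a data frame, and the Hartford zip code lost its leading zero because it was entered as a number. A minimal sketch of one way to avoid both, assuming the zip codes are entered as text and the frame is built with data.frame():

```r
city <- c("Tampa", "Hartford")
state <- c("FL", "CT")
zipcode <- c("33602", "06161")  # kept as text so the leading zero survives

# cbind() on vectors gives a character matrix.
m <- cbind(city, state, zipcode)
print(is.matrix(m))

# data.frame() gives a real data frame with one column per vector.
addresses <- data.frame(city, state, zipcode, stringsAsFactors = FALSE)
print(addresses)
```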

Merging Data Frames

We can merge two data frames by using the merge() function. The data frames must have the same column names on which the merging happens.

In the example below, we consider the data sets about Diabetes in Pima Indian Women available in the library named "MASS". We merge the two data sets based on the values of blood pressure ("bp") and body mass index ("bmi"). On choosing these two columns for merging, the records where values of these two variables match in both data sets are combined together to form a single data frame.

library(MASS)
merged.Pima <- merge(x = Pima.te, y = Pima.tr,
   by.x = c("bp", "bmi"),
   by.y = c("bp", "bmi")
)
print(merged.Pima)
nrow(merged.Pima)

When we execute the above code, it produces the following result −

   bp  bmi npreg.x glu.x skin.x ped.x age.x type.x npreg.y glu.y skin.y ped.y
1  60 33.8       1   117     23 0.466    27     No       2   125     20 0.088
2  64 29.7       2    75     24 0.370    33     No       2   100     23 0.368
3  64 31.2       5   189     33 0.583    29    Yes       3   158     13 0.295
4  64 33.2       4   117     27 0.230    24     No       1    96     27 0.289
5  66 38.1       3   115     39 0.150    28     No       1   114     36 0.289
6  68 38.5       2   100     25 0.324    26     No       7   129     49 0.439
7  70 27.4       1   116     28 0.204    21     No       0   124     20 0.254
8  70 33.1       4    91     32 0.446    22     No       9   123     44 0.374
9  70 35.4       9   124     33 0.282    34     No       6   134     23 0.542
10 72 25.6       1   157     21 0.123    24     No       4    99     17 0.294
11 72 37.7       5    95     33 0.370    27     No       6   103     32 0.324
12 74 25.9       9   134     33 0.460    81     No       8   126     38 0.162
13 74 25.9       1    95     21 0.673    36     No       8   126     38 0.162
14 78 27.6       5    88     30 0.258    37     No       6   125     31 0.565
15 78 27.6      10   122     31 0.512    45     No       6   125     31 0.565
16 78 39.4       2   112     50 0.175    24     No       4   112     40 0.236
17 88 34.5       1   117     24 0.403    40    Yes       4   127     11 0.598
   age.y type.y
1     31     No
2     21     No
3     24     No
4     21     No
5     21     No
6     43    Yes
7     36    Yes
8     40     No
9     29    Yes
10    28     No
11    55     No
12    39     No
13    39     No
14    49    Yes
15    49    Yes
16    38     No
17    28     No
[1] 17
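
To see the matching rule on a scale that can be checked by eye, here is a minimal sketch with two tiny hypothetical data frames (left and right are made-up names and values). Only rows where both bp and bmi agree are kept, and the remaining columns that share a name get the .x and .y suffixes.

```r
# Hypothetical data, for illustration only.
left <- data.frame(bp = c(60, 64, 70), bmi = c(33.8, 29.7, 27.4),
   glu = c(117, 75, 116))
right <- data.frame(bp = c(60, 64, 72), bmi = c(33.8, 31.2, 25.6),
   glu = c(125, 100, 99))

# Keep only rows whose (bp, bmi) pair appears in both data frames.
merged <- merge(x = left, y = right, by = c("bp", "bmi"))
print(merged)
```

Here bp = 64 appears on both sides but with different bmi values, so that row is dropped.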

Melting and Casting

One of the most interesting aspects of R programming is about changing the shape of the data in multiple steps to get a desired shape. The functions used to do this are called melt() and cast().

We consider the dataset called ships present in the library called "MASS".

library(MASS)
print(ships)

When we execute the above code, it produces the following result −

     type year   period   service   incidents
1     A   60     60        127         0
2     A   60     75         63         0
3     A   65     60       1095         3
4     A   65     75       1095         4
5     A   70     60       1512         6
.............
.............
8     A   75     75       2244         11
9     B   60     60      44882         39
10    B   60     75      17176         29
11    B   65     60      28609         58
............
............
17    C   60     60      1179          1
18    C   60     75       552          1
19    C   65     60       781          0
............
............

Melt the Data

Now we melt the data to organize it, converting all columns other than type and year into multiple rows.

# melt() and cast() are provided by the reshape package (assumed to be installed).
library(reshape)
molten.ships <- melt(ships, id = c("type","year"))
print(molten.ships)

When we execute the above code, it produces the following result −

      type year  variable  value
1      A   60    period      60
2      A   60    period      75
3      A   65    period      60
4      A   65    period      75
............
............
9      B   60    period      60
10     B   60    period      75
11     B   65    period      60
12     B   65    period      75
13     B   70    period      60
...........
...........
41     A   60    service    127
42     A   60    service     63
43     A   65    service   1095
...........
...........
70     D   70    service   1208
71     D   75    service      0
72     D   75    service   2051
73     E   60    service     45
74     E   60    service      0
75     E   65    service    789
...........
...........
101    C   70    incidents    6
102    C   70    incidents    2
103    C   75    incidents    0
104    C   75    incidents    1
105    D   60    incidents    0
106    D   60    incidents    0
...........
...........

Cast the Molten Data

We can cast the molten data into a new form where the aggregate of each type of ship for each year is created. It is done using the cast() function.

recasted.ship <- cast(molten.ships, type+year~variable,sum)
print(recasted.ship)

When we execute the above code, it produces the following result −

     type year  period  service  incidents
1     A   60    135       190      0
2     A   65    135      2190      7
3     A   70    135      4865     24
4     A   75    135      2244     11
5     B   60    135     62058     68
6     B   65    135     48979    111
7     B   70    135     20163     56
8     B   75    135      7117     18
9     C   60    135      1731      2
10    C   65    135      1457      1
11    C   70    135      2731      8
12    C   75    135       274      1
13    D   60    135       356      0
14    D   65    135       480      0
15    D   70    135      1557     13
16    D   75    135      2051      4
17    E   60    135        45      0
18    E   65    135      1226     14
19    E   70    135      3318     17
20    E   75    135       542      1
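
The same kind of per-group total can also be sketched in base R with aggregate(), without melting first. This is an alternative technique, not what cast() does internally. The toy data frame below is hypothetical and only mimics the shape of ships with a handful of rows.

```r
# Hypothetical rows in the shape of the ships data set.
toy <- data.frame(
   type = c("A", "A", "A", "B"),
   year = c(60, 60, 65, 60),
   service = c(127, 63, 1095, 44882),
   incidents = c(0, 0, 3, 39)
)

# Sum service and incidents for every (type, year) combination.
totals <- aggregate(cbind(service, incidents) ~ type + year,
   data = toy, FUN = sum)
print(totals)
```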

R – CSV Files

In R, we can read data from files stored outside the R environment. We can also write data into files which will be stored and accessed by the operating system. R can read and write into various file formats like csv, excel, xml etc.

In this chapter we will learn to read data from a csv file and then write data into a csv file. The file should be present in the current working directory so that R can read it. Of course we can also set our own directory and read files from there.

Getting and Setting the Working Directory

You can check which directory the R workspace is pointing to using the getwd() function. You can also set a new working directory using the setwd() function.

# Get and print current working directory.
print(getwd())

# Set current working directory.
setwd("/web/com")

# Get and print current working directory.
print(getwd())

When we execute the above code, it produces the following result −

[1] "/web/com/1441086124_2016"
[1] "/web/com"

This result depends on your OS and the current directory where you are working.

Input as CSV File

The csv file is a text file in which the values in the columns are separated by a comma. Let's consider the following data present in the file named input.csv.

You can create this file using Windows Notepad by copying and pasting this data. Save the file as input.csv using the Save As All files(*.*) option in Notepad.

id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
 ,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance

Reading a CSV File

Following is a simple example of the read.csv() function to read a CSV file available in your current working directory −

data <- read.csv("input.csv")
print(data)

When we execute the above code, it produces the following result −

      id    name     salary    start_date      dept
1      1    Rick     623.30    2012-01-01      IT
2      2    Dan      515.20    2013-09-23      Operations
3      3    Michelle 611.00    2014-11-15      IT
4      4    Ryan     729.00    2014-05-11      HR
5     NA    Gary     843.25    2015-03-27      Finance
6      6    Nina     578.00    2013-05-21      IT
7      7    Simon    632.80    2013-07-30      Operations
8      8    Guru     722.50    2014-06-17      Finance

Analyzing the CSV File

By default the read.csv() function gives the output as a data frame. This can be easily checked as follows. Also we can check the number of columns and rows.

data <- read.csv("input.csv")

print(is.data.frame(data))
print(ncol(data))
print(nrow(data))

When we execute the above code, it produces the following result −

[1] TRUE
[1] 5
[1] 8
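
The checks above can be tried without creating input.csv by hand. The following is a minimal sketch that writes a small CSV to a temporary file (the path and the three columns are illustrative, not the tutorial's file) and shows that an empty id field is read as NA:

```r
# Write a tiny, hypothetical CSV to a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
   "id,name,salary",
   "1,Rick,623.3",
   ",Gary,843.25"      # the id field is left empty on purpose
), csv_path)

data <- read.csv(csv_path)

print(is.data.frame(data))
print(dim(data))          # rows, then columns
print(is.na(data$id))     # the empty id is read as NA
```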

Once we read data into a data frame, we can apply all the functions applicable to data frames, as shown in the following sections.

Get the maximum salary

# Create a data frame.
data <- read.csv("input.csv")

# Get the max salary from data frame.
sal <- max(data$salary)
print(sal)

When we execute the above code, it produces the following result −

[1] 843.25

Get the details of the person with max salary

We can fetch rows meeting specific filter criteria similar to a SQL where clause.

# Create a data frame.
data <- read.csv("input.csv")

# Get the max salary from data frame.
sal <- max(data$salary)

# Get the person detail having max salary.
retval <- subset(data, salary == max(salary))
print(retval)

When we execute the above code, it produces the following result −

      id    name  salary  start_date    dept
5     NA    Gary  843.25  2015-03-27    Finance

Get all the people working in IT department

# Create a data frame.
data <- read.csv("input.csv")

retval <- subset( data, dept == "IT")
print(retval)

When we execute the above code, it produces the following result −

       id   name      salary   start_date   dept
1      1    Rick      623.3    2012-01-01   IT
3      3    Michelle  611.0    2014-11-15   IT
6      6    Nina      578.0    2013-05-21   IT

Get the persons in IT department whose salary is greater than 600

# Create a data frame.
data <- read.csv("input.csv")

info <- subset(data, salary > 600 & dept == "IT")
print(info)

When we execute the above code, it produces the following result −

       id   name      salary   start_date   dept
1      1    Rick      623.3    2012-01-01   IT
3      3    Michelle  611.0    2014-11-15   IT

Get the people who joined on or after 2014

# Create a data frame.
data <- read.csv("input.csv")

retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))
print(retval)

When we execute the above code, it produces the following result −

       id   name     salary   start_date    dept
3      3    Michelle 611.00   2014-11-15    IT
4      4    Ryan     729.00   2014-05-11    HR
5     NA    Gary     843.25   2015-03-27    Finance
8      8    Guru     722.50   2014-06-17    Finance
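
The filter above uses >, which would leave out anyone who joined exactly on 2014-01-01. The sample data has no such row, so the result is the same, but a minimal sketch of an inclusive "on or after" filter would use >= instead. The staff data frame below is hypothetical.

```r
# Hypothetical data with one person joining exactly on the cutoff date.
staff <- data.frame(
   name = c("Ann", "Bob", "Cid"),
   start_date = c("2013-12-31", "2014-01-01", "2014-06-17"),
   stringsAsFactors = FALSE
)

cutoff <- as.Date("2014-01-01")

# >= keeps the row that falls exactly on the cutoff.
joined <- subset(staff, as.Date(start_date) >= cutoff)
print(joined)
```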

Writing into a CSV File

R can create a csv file from an existing data frame. The write.csv() function is used to create the csv file. This file gets created in the working directory.

# Create a data frame.
data <- read.csv("input.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))

# Write filtered data into a new file.
write.csv(retval,"output.csv")
newdata <- read.csv("output.csv")
print(newdata)

When we execute the above code, it produces the following result −

  X      id   name      salary   start_date    dept
1 3      3    Michelle  611.00   2014-11-15    IT
2 4      4    Ryan      729.00   2014-05-11    HR
3 5     NA    Gary      843.25   2015-03-27    Finance
4 8      8    Guru      722.50   2014-06-17    Finance

Here the column X holds the row names of the data frame, which write.csv() writes to the file by default. This can be dropped using additional parameters while writing the file.

# Create a data frame.
data <- read.csv("input.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))

# Write filtered data into a new file.
write.csv(retval,"output.csv", row.names = FALSE)
newdata <- read.csv("output.csv")
print(newdata)

When we execute the above code, it produces the following result −

      id    name      salary   start_date    dept
1      3    Michelle  611.00   2014-11-15    IT
2      4    Ryan      729.00   2014-05-11    HR
3     NA    Gary      843.25   2015-03-27    Finance
4      8    Guru      722.50   2014-06-17    Finance

R – Excel File

Microsoft Excel is the most widely used spreadsheet program which stores data in the .xls or .xlsx format. R can read directly from these files using some excel specific packages. A few such packages are XLConnect, xlsx, gdata etc. We will be using the xlsx package. R can also write into an excel file using this package.

Install xlsx Package

You can use the following command in the R console to install the "xlsx" package. It may ask to install some additional packages on which this package is dependent. Use the same command with the required package name to install the additional packages.

install.packages("xlsx")

Verify and Load the "xlsx" Package

Use the following command to verify and load the "xlsx" package.

# Verify the package is installed.
any(grepl("xlsx",installed.packages()))

# Load the library into R workspace.
library("xlsx")

When the script is run we get the following output.

[1] TRUE
Loading required package: rJava
Loading required package: methods
Loading required package: xlsxjars

Input as xlsx File

Open Microsoft Excel. Copy and paste the following data in the worksheet named sheet1.

id	name      salary    start_date	dept
1	Rick	  623.3	    1/1/2012	IT
2	Dan       515.2     9/23/2013   Operations
3	Michelle  611	    11/15/2014	IT
4	Ryan	  729	    5/11/2014	HR
5	Gary	  843.25    3/27/2015	Finance
6	Nina	  578       5/21/2013	IT
7	Simon	  632.8	    7/30/2013	Operations
8	Guru	  722.5	    6/17/2014	Finance

Also copy and paste the following data to another worksheet and rename this worksheet to "city".

name	 city
Rick	 Seattle
Dan      Tampa
Michelle Chicago
Ryan	 Seattle
Gary	 Houston
Nina	 Boston
Simon	 Mumbai
Guru	 Dallas

Save the Excel file as "input.xlsx". You should save it in the current working directory of the R workspace.

Reading the Excel File

The input.xlsx is read by using the read.xlsx() function as shown below. The result is stored as a data frame in the R environment.

# Read the first worksheet in the file input.xlsx.
data <- read.xlsx("input.xlsx", sheetIndex = 1)
print(data)

When we execute the above code, it produces the following result −

      id    name     salary    start_date      dept
1      1    Rick     623.30    2012-01-01      IT
2      2    Dan      515.20    2013-09-23      Operations
3      3    Michelle 611.00    2014-11-15      IT
4      4    Ryan     729.00    2014-05-11      HR
5      5    Gary     843.25    2015-03-27      Finance
6      6    Nina     578.00    2013-05-21      IT
7      7    Simon    632.80    2013-07-30      Operations
8      8    Guru     722.50    2014-06-17      Finance

R – Binary Files

A binary file is a file that contains information stored only in the form of bits and bytes (0's and 1's). They are not human readable, as the bytes in it translate to characters and symbols which contain many other non-printable characters. Attempting to read a binary file using any text editor will show characters like Ø and ð.

The binary file has to be read by specific programs to be usable. For example, the binary file of a Microsoft Word program can be read to a human readable form only by the Word program. This indicates that, besides the human readable text, there is a lot more information, like formatting of characters and page numbers, which is also stored along with alphanumeric characters. And finally a binary file is a continuous sequence of bytes. The line break we see in a text file is a character joining the first line to the next.

Sometimes, the data generated by other programs needs to be processed by R as a binary file. Also R is required to create binary files which can be shared with other programs.

R has two functions, writeBin() and readBin(), to create and read binary files.

Syntax

writeBin(object, con)
readBin(con, what, n )

Following is the description of the parameters used −

  • con is the connection object to read or write the binary file.

  • object is the R object (an atomic vector) to be written to the binary file.

  • what is the mode like character, integer etc. representing the bytes to be read.

  • n is the maximum number of records (elements of the mode given by what) to read from the binary file.

Example

We consider the R inbuilt data "mtcars". First we create a csv file from it and convert it to a binary file and store it as an OS file. Next we read this binary file back into R.

Writing the Binary File

We write the data frame "mtcars" to a csv file, read part of it back, and then write it as a binary file to the OS.

# Write the "mtcars" data frame to a csv file, storing only the columns
# "cyl", "am" and "gear".
write.table(mtcars[, c("cyl", "am", "gear")], file = "mtcars.csv",row.names = FALSE, na = "", 
   col.names = TRUE, sep = ",")

# Store 5 records from the csv file as a new data frame.
new.mtcars <- read.table("mtcars.csv",sep = ",",header = TRUE,nrows = 5)

# Create a connection object to write the binary file using mode "wb".
write.filename = file("/web/com/binmtcars.dat", "wb")

# Write the column names of the data frame to the connection object.
writeBin(colnames(new.mtcars), write.filename)

# Write the records in each of the columns to the file.
writeBin(c(new.mtcars$cyl,new.mtcars$am,new.mtcars$gear), write.filename)

# Close the file for writing so that it can be read by other programs.
close(write.filename)

Reading the Binary File

The binary file created above stores all the data as continuous bytes. So we will read it by choosing appropriate values of column names as well as the column values.

# Create a connection object to read the file in binary mode using "rb".
read.filename <- file("/web/com/binmtcars.dat", "rb")

# First read the column names. n = 3 as we have 3 columns.
column.names <- readBin(read.filename, character(),  n = 3)
close(read.filename)

# Next read the whole file as integers. n = 18 as the 3 column names occupy
# 12 bytes (3 integer-sized slots) and are followed by 15 values.
read.filename <- file("/web/com/binmtcars.dat", "rb")
bindata <- readBin(read.filename, integer(),  n = 18)
close(read.filename)

# Print the data.
print(bindata)

# Read the 4th to 8th values, which represent "cyl".
cyldata = bindata[4:8]
print(cyldata)

# Read the 9th to 13th values, which represent "am".
amdata = bindata[9:13]
print(amdata)

# Read the 14th to 18th values, which represent "gear".
geardata = bindata[14:18]
print(geardata)

# Combine all the read values column-wise.
finaldata = cbind(cyldata, amdata, geardata)
colnames(finaldata) = column.names
print(finaldata)

When we execute the above code, it produces the following result −

 [1]    7108963 1728081249    7496037          6          6          4
 [7]          6          8          1          1          1          0
[13]          0          4          4          4          3          3

[1] 6 6 4 6 8

[1] 1 1 1 0 0

[1] 4 4 4 3 3

     cyl am gear
[1,]   6  1    4
[2,]   6  1    4
[3,]   4  1    4
[4,]   6  0    3
[5,]   8  0    3

As we can see, we got the original data back by reading the binary file in R.
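The first three numbers in the printed vector are not data: they are the bytes of the column names read as integers (7108963, for instance, is the four bytes "c", "y", "l" and a terminating zero). A possible alternative, sketched here under the assumption that the names and values are read from one open connection in the order they were written, avoids the index arithmetic. The temporary path and variable names are illustrative only.

```r
# Hypothetical scratch file instead of the path used in the example.
path <- tempfile(fileext = ".dat")
cars <- mtcars[1:5, c("cyl", "am", "gear")]

# Write the three names, then the 15 values column by column.
con <- file(path, "wb")
writeBin(colnames(cars), con)
writeBin(as.integer(unlist(cars)), con)
close(con)

# Read in the same order from a single connection: readBin() continues
# where the previous call stopped, so no reopening or offsets are needed.
con <- file(path, "rb")
col_names <- readBin(con, character(), n = 3)
values <- readBin(con, integer(), n = 15)
close(con)

# The values were written column by column, which is how matrix() fills.
restored <- matrix(values, ncol = 3, dimnames = list(NULL, col_names))
print(restored)
```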

R – XML Files

XML is a file format which shares both the file format and the data on the World Wide Web, intranets, and elsewhere using standard ASCII text. It stands for Extensible Markup Language (XML). Similar to HTML, it contains markup tags. But unlike HTML, where the markup tags describe the structure of the page, in XML the markup tags describe the meaning of the data contained in the file.

You can read an XML file in R using the "XML" package. This package can be installed using the following command.

install.packages("XML")

Input Data

Create an XML file by copying the data below into a text editor like Notepad. Save the file with a .xml extension, choosing the file type as all files(*.*).

<RECORDS>
   <EMPLOYEE>
      <ID>1</ID>
      <NAME>Rick</NAME>
      <SALARY>623.3</SALARY>
      <STARTDATE>1/1/2012</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>
	
   <EMPLOYEE>
      <ID>2</ID>
      <NAME>Dan</NAME>
      <SALARY>515.2</SALARY>
      <STARTDATE>9/23/2013</STARTDATE>
      <DEPT>Operations</DEPT>
   </EMPLOYEE>
   
   <EMPLOYEE>
      <ID>3</ID>
      <NAME>Michelle</NAME>
      <SALARY>611</SALARY>
      <STARTDATE>11/15/2014</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>
   
   <EMPLOYEE>
      <ID>4</ID>
      <NAME>Ryan</NAME>
      <SALARY>729</SALARY>
      <STARTDATE>5/11/2014</STARTDATE>
      <DEPT>HR</DEPT>
   </EMPLOYEE>
   
   <EMPLOYEE>
      <ID>5</ID>
      <NAME>Gary</NAME>
      <SALARY>843.25</SALARY>
      <STARTDATE>3/27/2015</STARTDATE>
      <DEPT>Finance</DEPT>
   </EMPLOYEE>
   
   <EMPLOYEE>
      <ID>6</ID>
      <NAME>Nina</NAME>
      <SALARY>578</SALARY>
      <STARTDATE>5/21/2013</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>
   
   <EMPLOYEE>
      <ID>7</ID>
      <NAME>Simon</NAME>
      <SALARY>632.8</SALARY>
      <STARTDATE>7/30/2013</STARTDATE>
      <DEPT>Operations</DEPT>
   </EMPLOYEE>
   
   <EMPLOYEE>
      <ID>8</ID>
      <NAME>Guru</NAME>
      <SALARY>722.5</SALARY>
      <STARTDATE>6/17/2014</STARTDATE>
      <DEPT>Finance</DEPT>
   </EMPLOYEE>
	
</RECORDS>

Reading XML File

The XML file is read by R using the function xmlParse(). It is stored in R as a parsed XML document object.

# Load the package required to read XML files.
library("XML")

# Also load the other required package.
library("methods")

# Give the input file name to the function.
result <- xmlParse(file = "input.xml")

# Print the result.
print(result)

When we execute the above code, it produces the following result −

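<?xml version="1.0"?>
<RECORDS>
  <EMPLOYEE>
    <ID>1</ID>
    <NAME>Rick</NAME>
    <SALARY>623.3</SALARY>
    <STARTDATE>1/1/2012</STARTDATE>
    <DEPT>IT</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>2</ID>
    <NAME>Dan</NAME>
    <SALARY>515.2</SALARY>
    <STARTDATE>9/23/2013</STARTDATE>
    <DEPT>Operations</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>3</ID>
    <NAME>Michelle</NAME>
    <SALARY>611</SALARY>
    <STARTDATE>11/15/2014</STARTDATE>
    <DEPT>IT</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>4</ID>
    <NAME>Ryan</NAME>
    <SALARY>729</SALARY>
    <STARTDATE>5/11/2014</STARTDATE>
    <DEPT>HR</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>5</ID>
    <NAME>Gary</NAME>
    <SALARY>843.25</SALARY>
    <STARTDATE>3/27/2015</STARTDATE>
    <DEPT>Finance</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>6</ID>
    <NAME>Nina</NAME>
    <SALARY>578</SALARY>
    <STARTDATE>5/21/2013</STARTDATE>
    <DEPT>IT</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>7</ID>
    <NAME>Simon</NAME>
    <SALARY>632.8</SALARY>
    <STARTDATE>7/30/2013</STARTDATE>
    <DEPT>Operations</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>8</ID>
    <NAME>Guru</NAME>
    <SALARY>722.5</SALARY>
    <STARTDATE>6/17/2014</STARTDATE>
    <DEPT>Finance</DEPT>
  </EMPLOYEE>
</RECORDS>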

Get Number of Nodes Present in XML File

# Load the packages required to read XML files.
library("XML")
library("methods")

# Give the input file name to the function.
result <- xmlParse(file = "input.xml")

# Extract the root node from the xml file.
rootnode <- xmlRoot(result)

# Find the number of nodes in the root.
rootsize <- xmlSize(rootnode)

# Print the result.
print(rootsize)

When we execute the above code, it produces the following result −

[1] 8

Details of the First Node

Let's look at the first record of the parsed file. It will give us an idea of the various elements present in the top level node.

# Load the packages required to read XML files.
library("XML")
library("methods")

# Give the input file name to the function.
result <- xmlParse(file = "input.xml")

# Extract the root node from the xml file.
rootnode <- xmlRoot(result)

# Print the result.
print(rootnode[1])

When we execute the above code, it produces the following result −

$EMPLOYEE
<EMPLOYEE>
  <ID>1</ID>
  <NAME>Rick</NAME>
  <SALARY>623.3</SALARY>
  <STARTDATE>1/1/2012</STARTDATE>
  <DEPT>IT</DEPT>
</EMPLOYEE>

attr(,"class")
[1] "XMLInternalNodeList" "XMLNodeList" 

Get Different Elements of a Node

# Load the packages required to read XML files.
library("XML")
library("methods")

# Give the input file name to the function.
result <- xmlParse(file = "input.xml")

# Extract the root node from the xml file.
rootnode <- xmlRoot(result)

# Get the first element of the first node.
print(rootnode[[1]][[1]])

# Get the fifth element of the first node.
print(rootnode[[1]][[5]])

# Get the second element of the third node.
print(rootnode[[3]][[2]])

When we execute the above code, it produces the following result −

<ID>1</ID> 
<DEPT>IT</DEPT> 
<NAME>Michelle</NAME> 
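Indexing returns whole nodes, tags included. When only the text is wanted, one option is xmlValue(), and an XPath query with xpathSApply() can collect the same field from every record. The sketch below assumes the XML package is installed and uses a small inline document (an illustrative stand-in, not the input file) so that it runs on its own.

```r
library(XML)

# Small inline document with the same layout as the input file.
xml_text <- "<RECORDS>
  <EMPLOYEE><ID>1</ID><NAME>Rick</NAME><DEPT>IT</DEPT></EMPLOYEE>
  <EMPLOYEE><ID>2</ID><NAME>Dan</NAME><DEPT>Operations</DEPT></EMPLOYEE>
</RECORDS>"

doc <- xmlParse(xml_text, asText = TRUE)
rootnode <- xmlRoot(doc)

# xmlValue() returns the text inside a node rather than the node itself.
first_name <- xmlValue(rootnode[[1]][[2]])

# An XPath query applies xmlValue() to the NAME node of every record.
all_names <- xpathSApply(doc, "//EMPLOYEE/NAME", xmlValue)

print(first_name)
print(all_names)
```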

XML to Data Frame

To handle the data effectively in large files, we read the data in the XML file as a data frame. Then we process the data frame for data analysis.

# Load the packages required to read XML files.
library("XML")
library("methods")

# Convert the input xml file to a data frame.
xmldataframe <- xmlToDataFrame("input.xml")
print(xmldataframe)

When we execute the above code, it produces the following result −

      ID    NAME     SALARY    STARTDATE       DEPT
1      1    Rick     623.3     1/1/2012        IT
2      2    Dan      515.2     9/23/2013       Operations
3      3    Michelle 611       11/15/2014      IT
4      4    Ryan     729       5/11/2014       HR
5      5    Gary     843.25    3/27/2015       Finance
6      6    Nina     578       5/21/2013       IT
7      7    Simon    632.8     7/30/2013       Operations
8      8    Guru     722.5     6/17/2014       Finance

As the data is now available as a data frame, we can use data frame related functions to read and manipulate it.
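Tag contents are text, so numeric and date columns usually need converting before analysis. The sketch below assumes the columns arrive as text (character or factor) and uses a hand-built stand-in for the first three rows, so it runs without the XML file; the data frame name is illustrative.

```r
# Stand-in for the first rows of the converted XML data; every column is text.
employees <- data.frame(
  ID = c("1", "2", "3"),
  NAME = c("Rick", "Dan", "Michelle"),
  SALARY = c("623.3", "515.2", "611"),
  STARTDATE = c("1/1/2012", "9/23/2013", "11/15/2014"),
  stringsAsFactors = FALSE
)

# as.character() first, so the conversion also works if a column is a factor.
employees$ID <- as.integer(as.character(employees$ID))
employees$SALARY <- as.numeric(as.character(employees$SALARY))
employees$STARTDATE <- as.Date(as.character(employees$STARTDATE),
   format = "%m/%d/%Y")

str(employees)

# Numeric comparisons now behave as expected.
high_paid <- subset(employees, SALARY > 600)
print(high_paid)
```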

R – JSON Files

A JSON file stores data as text in a human-readable format. JSON stands for JavaScript Object Notation. R can read JSON files using the rjson package.

Install rjson Package

In the R console, you can issue the following command to install the rjson package.

install.packages("rjson")

Input Data

Create a JSON file by copying the data below into a text editor like Notepad. Save the file with a .json extension, choosing the file type as all files(*.*).

{ 
   "ID":["1","2","3","4","5","6","7","8" ],
   "Name":["Rick","Dan","Michelle","Ryan","Gary","Nina","Simon","Guru" ],
   "Salary":["623.3","515.2","611","729","843.25","578","632.8","722.5" ],
   
   "StartDate":[ "1/1/2012","9/23/2013","11/15/2014","5/11/2014","3/27/2015","5/21/2013",
      "7/30/2013","6/17/2014"],
   "Dept":[ "IT","Operations","IT","HR","Finance","IT","Operations","Finance"]
}

Read the JSON File

The JSON file is read by R using the function fromJSON(). It is stored as a list in R.

# Load the package required to read JSON files.
library("rjson")

# Give the input file name to the function.
result <- fromJSON(file = "input.json")

# Print the result.
print(result)

When we execute the above code, it produces the following result −

$ID
[1] "1"   "2"   "3"   "4"   "5"   "6"   "7"   "8"

$Name
[1] "Rick"     "Dan"      "Michelle" "Ryan"     "Gary"     "Nina"     "Simon"    "Guru"

$Salary
[1] "623.3"  "515.2"  "611"    "729"    "843.25" "578"    "632.8"  "722.5"

$StartDate
[1] "1/1/2012"   "9/23/2013"  "11/15/2014" "5/11/2014"  "3/27/2015"  "5/21/2013"
   "7/30/2013"  "6/17/2014"

$Dept
[1] "IT"         "Operations" "IT"         "HR"         "Finance"    "IT"
   "Operations" "Finance"

Convert JSON to a Data Frame

We can convert the data extracted above to an R data frame for further analysis using the as.data.frame() function.

# Load the package required to read JSON files.
library("rjson")

# Give the input file name to the function.
result <- fromJSON(file = "input.json")

# Convert JSON file to a data frame.
json_data_frame <- as.data.frame(result)

print(json_data_frame)

When we execute the above code, it produces the following result −

      ID    Name     Salary    StartDate       Dept
1      1    Rick     623.3     1/1/2012        IT
2      2    Dan      515.2     9/23/2013       Operations
3      3    Michelle 611       11/15/2014      IT
4      4    Ryan     729       5/11/2014       HR
5      5    Gary     843.25    3/27/2015       Finance
6      6    Nina     578       5/21/2013       IT
7      7    Simon    632.8     7/30/2013       Operations
8      8    Guru     722.5     6/17/2014       Finance
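Every value in the input file is a quoted string, so every column of the resulting data frame holds text. A minimal sketch of turning one column into numbers follows; it assumes the rjson package is installed and parses a short inline string (an illustrative stand-in for the input file) through the json_str argument of fromJSON().

```r
library(rjson)

# Inline stand-in for the input file, reduced to two fields.
json_text <- '{
   "Name": ["Rick", "Dan", "Michelle"],
   "Salary": ["623.3", "515.2", "611"]
}'

result <- fromJSON(json_str = json_text)
staff <- as.data.frame(result, stringsAsFactors = FALSE)

# The salaries are quoted in the JSON, so they arrive as strings.
staff$Salary <- as.numeric(staff$Salary)

print(staff)
print(mean(staff$Salary))
```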

R – Web Data

Many websites provide data for consumption by their users. For example, the World Health Organization (WHO) provides reports on health and medical information in the form of CSV, txt and XML files. Using R programs, we can programmatically extract specific data from such websites. Some packages in R which are used to scrape data from the web are − "RCurl", "XML", and "stringr". They are used to connect to the URLs, identify the required links for the files and download them to the local environment.

Install R Packages

The following packages are required for processing the URLs and links to the files. If they are not available in your R environment, you can install them using the following commands.

install.packages("RCurl")
install.packages("XML")
install.packages("stringr")
install.packages("plyr")

Input Data

We will visit the climate data URL and download the CSV files for the year 2015 using R.

Example

We will use the function getHTMLLinks() to gather the URLs of the files. Then we will use the function download.file() to save the files to the local system. As we will be applying the same code again and again for multiple files, we will create a function to be called multiple times. The filenames are passed as parameters in the form of an R list object to this function.

# Load the packages used below.
library(RCurl)
library(XML)
library(stringr)
library(plyr)

# Read the URL.
url <- "/index.php?s=climate%20data"

# Gather the html links present in the webpage.
links <- getHTMLLinks(url)

# Identify only the links which point to the JCMB 2015 files. 
filenames <- links[str_detect(links, "JCMB_2015")]

# Store the file names as a list.
filenames_list <- as.list(filenames)

# Create a function to download the files by passing the URL and filename list.
downloadcsv <- function (mainurl,filename) {
   filedetails <- str_c(mainurl,filename)
   download.file(filedetails,filename)
}

# Now apply the l_ply function and save the files into the current R working directory.
l_ply(filenames,downloadcsv,mainurl = "/index.php?s=climate%20data")

Verify the File Download

After running the above code, you can locate the following files in the current R working directory.

"JCMB_2015.csv" "JCMB_2015_Apr.csv" "JCMB_2015_Feb.csv" "JCMB_2015_Jan.csv"
   "JCMB_2015_Mar.csv"
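The filtering and URL-building steps can be tried without a network connection. The sketch below uses base R equivalents (grepl() and paste0()) on a made-up list of links; both the link names and the base address are hypothetical placeholders, not the site used in this chapter.

```r
# Hypothetical link list standing in for the links gathered from the page.
links <- c("JCMB_2014.csv", "JCMB_2015.csv", "JCMB_2015_Jan.csv", "readme.txt")

# Hypothetical base address; replace it with the page actually being read.
base_url <- "https://example.com/data/"

# Keep only the links that contain the text "JCMB_2015".
wanted <- links[grepl("JCMB_2015", links, fixed = TRUE)]

# Build one full URL per file by joining the base address and the file name.
file_urls <- paste0(base_url, wanted)
print(file_urls)

# Each pair could then be passed to download.file(url, destfile), e.g.
# for (i in seq_along(wanted)) download.file(file_urls[i], wanted[i])
```

Checking the filtered list before downloading is a cheap way to confirm that the pattern matches only the intended files.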

R – Databases

The data in relational database systems is stored in a normalized format. So, to carry out statistical computing we would need very advanced and complex SQL queries. But R can connect easily to many relational databases like MySQL, Oracle, SQL Server etc. and fetch records from them as a data frame. Once the data is available in the R environment, it becomes a normal R data set and can be manipulated or analyzed using all the powerful packages and functions.

In this tutorial we will be using MySQL as our reference database for connecting to R.

RMySQL Package

R has a package named "RMySQL", available from CRAN, which provides native connectivity with the MySQL database. You can install this package in the R environment using the following command.

install.packages("RMySQL")

Connecting R to MySql

Once the package is installed, we create a connection object in R to connect to the database. It takes the username, password, database name and host name as input.

# Load the package.
library(RMySQL)

# Create a connection object to the MySQL database.
# We will connect to the sample database named "sakila" that comes with the MySQL installation.
mysqlconnection = dbConnect(MySQL(), user = 'root', password = '', dbname = 'sakila',
   host = 'localhost')

# List the tables available in this database.
dbListTables(mysqlconnection)

When we execute the above code, it produces the following result −

 [1] "actor"                      "actor_info"                
 [3] "address"                    "category"                  
 [5] "city"                       "country"                   
 [7] "customer"                   "customer_list"             
 [9] "film"                       "film_actor"                
[11] "film_category"              "film_list"                 
[13] "film_text"                  "inventory"                 
[15] "language"                   "nicer_but_slower_film_list"
[17] "payment"                    "rental"                    
[19] "sales_by_film_category"     "sales_by_store"            
[21] "staff"                      "staff_list"                
[23] "store"                     

Querying the Tables

We can query the database tables in MySQL using the function dbSendQuery(). The query is executed in MySQL and the result set is retrieved using the R fetch() function. Finally it is stored as a data frame in R.

# Query the "actor" table to get all the rows.
result = dbSendQuery(mysqlconnection, "select * from actor")

# Store the result in an R data frame object. n = 5 is used to fetch the first 5 rows.
data.frame = fetch(result, n = 5)
print(data.frame)

When we execute the above code, it produces the following result −

        actor_id   first_name    last_name         last_update
1        1         PENELOPE      GUINESS           2006-02-15 04:34:33
2        2         NICK          WAHLBERG          2006-02-15 04:34:33
3        3         ED            CHASE             2006-02-15 04:34:33
4        4         JENNIFER      DAVIS             2006-02-15 04:34:33
5        5         JOHNNY        LOLLOBRIGIDA      2006-02-15 04:34:33

Query with Filter Clause

We can pass any valid select query to get the result.

result = dbSendQuery(mysqlconnection, "select * from actor where last_name = 'TORN'")

# Fetch all the records (with n = -1) and store them as a data frame.
data.frame = fetch(result, n = -1)
print(data.frame)

When we execute the above code, it produces the following result −

        actor_id    first_name     last_name         last_update
1        18         DAN            TORN              2006-02-15 04:34:33
2        94         KENNETH        TORN              2006-02-15 04:34:33
3       102         WALTER         TORN              2006-02-15 04:34:33

Updating Rows in the Tables

We can update the rows in a MySQL table by passing the update query to the dbSendQuery() function.

dbSendQuery(mysqlconnection, "update mtcars set disp = 168.5 where hp = 110")

After executing the above code we can see the table updated in the MySQL environment.

Inserting Data into the Tables

dbSendQuery(mysqlconnection,
   "insert into mtcars(row_names, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb)
   values('New Mazda RX4 Wag', 21, 6, 168.5, 110, 3.9, 2.875, 17.02, 0, 1, 4, 4)"
)

After executing the above code we can see the row inserted into the table in the MySQL environment.

Creating Tables in MySql

We can create tables in MySQL using the function dbWriteTable(). It takes a data frame as input and, with overwrite = TRUE, overwrites the table if it already exists.

# Create the connection object to the database where we want to create the table.
mysqlconnection = dbConnect(MySQL(), user = 'root', password = '', dbname = 'sakila', 
   host = 'localhost')

# Use the R data frame "mtcars" to create the table in MySQL.
# All the rows of mtcars are taken into MySQL.
dbWriteTable(mysqlconnection, "mtcars", mtcars[, ], overwrite = TRUE)

After executing the above code we can see the table created in the MySQL environment.

Dropping Tables in MySql

We can drop tables in the MySQL database by passing the drop table statement to dbSendQuery(), in the same way we used it for querying data from tables.

dbSendQuery(mysqlconnection, 'drop table if exists mtcars')

After executing the above code we can see the table is dropped in the MySQL environment.
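The write, update, query and drop steps can be rehearsed without a MySQL server. The sketch below swaps in an in-memory SQLite database through the DBI and RSQLite packages; this is an assumption made purely so the code can run anywhere, and neither package is used elsewhere in this tutorial. It also uses the DBI helpers dbExecute() and dbGetQuery() in place of dbSendQuery() followed by fetch().

```r
library(DBI)

# In-memory SQLite database (a stand-in for MySQL); nothing is written to disk.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Create the table from the built-in data frame.
dbWriteTable(con, "mtcars", mtcars, overwrite = TRUE)

# dbExecute() runs a statement that returns no rows and reports
# how many rows it changed.
changed <- dbExecute(con, "update mtcars set disp = 168.5 where hp = 110")

# dbGetQuery() sends the query, fetches every row and frees the
# result set in a single call.
updated <- dbGetQuery(con, "select mpg, disp, hp from mtcars where hp = 110")
print(changed)
print(updated)

# Clean up: drop the table and close the connection.
dbExecute(con, "drop table if exists mtcars")
dbDisconnect(con)
```

With a real MySQL connection the same DBI calls should apply, since RMySQL is a DBI driver; only the dbConnect() line would differ.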

R – Pie Charts

The R programming language has many libraries to create charts and graphs. A pie chart is a representation of values as slices of a circle with different colors. The slices are labeled, and the numbers corresponding to each slice are also represented in the chart.

In R the pie chart is created using the pie() function, which takes positive numbers as a vector input. The additional parameters are used to control labels, color, title etc.

Syntax

The basic syntax for creating a pie chart using R is −

pie(x, labels, radius, main, col, clockwise)

Following is the description of the parameters used −

  • x is a vector containing the numeric values used in the pie chart.

  • labels is used to give a description to the slices.

  • radius indicates the radius of the circle of the pie chart (value between −1 and +1).

  • main indicates the title of the chart.

  • col indicates the color palette.

  • clockwise is a logical value indicating if the slices are drawn clockwise or anti-clockwise.

Example

A very simple pie chart is created using just the input vector and labels. The script below will create and save the pie chart in the current R working directory.

# Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")

# Give the chart file a name.
png(file = "city.png")

# Plot the chart.
pie(x, labels)

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

Pie Chart using R

Pie Chart Title and Colors

We can expand the features of the chart by adding more parameters to the function. We will use the parameter main to add a title to the chart, and another parameter, col, which will make use of the rainbow color palette while drawing the chart. The length of the palette should be the same as the number of values we have for the chart. Hence we use length(x).

Example

The script below will create and save the pie chart in the current R working directory.

# Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")

# Give the chart file a name.
png(file = "city_title_colours.png")

# Plot the chart with title and rainbow color palette.
pie(x, labels, main = "City pie chart", col = rainbow(length(x)))

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

Pie chart with title and colors

Slice Percentages and Chart Legend

We can add slice percentages and a chart legend by creating additional chart variables.

# Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")

# Compute the share of each slice as a percentage, rounded to one decimal place.
piepercent <- round(100 * x / sum(x), 1)

# Give the chart file a name.
png(file = "city_percentage_legends.png")

# Plot the chart.
pie(x, labels = piepercent, main = "City pie chart", col = rainbow(length(x)))
legend("topright", c("London", "New York", "Singapore", "Mumbai"), cex = 0.8,
   fill = rainbow(length(x)))

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

Pie chart with percentages and labels
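A possible variation puts the city name and the percentage in the same slice label, which removes the need for a separate legend. The sketch below draws to a throwaway device with pdf(NULL) so that it writes no file; the variable names cities and slice_labels are illustrative.

```r
x <- c(21, 62, 10, 53)
cities <- c("London", "New York", "Singapore", "Mumbai")

# Percentage share of each slice, rounded to one decimal place.
piepercent <- round(100 * x / sum(x), 1)

# Combine name and share into one label per slice, e.g. "London 14.4%".
slice_labels <- paste0(cities, " ", piepercent, "%")
print(slice_labels)

# pdf(NULL) opens a device that creates no file; use png() to keep the image.
pdf(NULL)
pie(x, labels = slice_labels, main = "City pie chart", col = rainbow(length(x)))
dev.off()
```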

3D Pie Chart

A pie chart with 3 dimensions can be drawn using additional packages. The package plotrix has a function called pie3D() that is used for this.

# Get the library.
library(plotrix)

# Create data for the graph.
x <- c(21, 62, 10, 53)
lbl <- c("London", "New York", "Singapore", "Mumbai")

# Give the chart file a name.
png(file = "3d_pie_chart.png")

# Plot the chart.
pie3D(x, labels = lbl, explode = 0.1, main = "Pie Chart of Countries ")

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

3D pie chart

R – Bar Charts

A bar chart represents data in rectangular bars, with the length of each bar proportional to the value of the variable.

R uses the function barplot() to create bar charts.
R can draw both vertical and horizontal bars in the bar chart.
In a bar chart each of the bars can be given a different color.

Syntax

The basic syntax to create a bar chart in R is:

barplot(H, xlab, ylab, main, names.arg, col)

Following is the description of the parameters used:

  • H is a vector or matrix containing the numeric values used in the bar chart.
  • xlab is the label for the x axis.
  • ylab is the label for the y axis.
  • main is the title of the bar chart.
  • names.arg is a vector of names appearing under each bar.
  • col is used to give colors to the bars in the graph.

Example

A simple bar chart is created using just the input vector and the name of each bar.

The script below will create and save the bar chart in the current R working directory.

# Create the data for the chart
H <- c(7,12,28,3,41)

# Give the chart file a name
png(file = "barchart.png")

# Plot the bar chart 
barplot(H)

# Save the file
dev.off()

When we execute the above code, it produces the following result −

Bar Chart using R

Bar Chart Labels, Title and Colors

The features of the bar chart can be expanded by adding more parameters. The main parameter is used to add a title.
The col parameter is used to add colors to the bars.
The names.arg parameter is a vector with the same number of values as the input vector, describing the meaning of each bar.

Example

The script below will create and save the bar chart in the current R working directory.

# Create the data for the chart
H <- c(7,12,28,3,41)
M <- c("Mar","Apr","May","Jun","Jul")

# Give the chart file a name
png(file = "barchart_months_revenue.png")

# Plot the bar chart 
barplot(H,names.arg=M,xlab="Month",ylab="Revenue",col="blue",
main="Revenue chart",border="red")

# Save the file
dev.off()

When we execute the above code, it produces the following result −

Bar Chart with title using R

Group Bar Chart and Stacked Bar Chart

We can create a bar chart with groups of bars, or with stacks in each bar, by using a matrix as input values.

More than two variables are represented as a matrix, which is used to create the group bar chart and the stacked bar chart.

# Create the input vectors.
colors = c("green","orange","brown")
months <- c("Mar","Apr","May","Jun","Jul")
regions <- c("East","West","North")

# Create the matrix of the values.
Values <- matrix(c(2,9,3,11,9,4,8,7,3,12,5,2,8,10,11), nrow = 3, ncol = 5, byrow = TRUE)

# Give the chart file a name
png(file = "barchart_stacked.png")

# Create the bar chart
barplot(Values, main = "total revenue", names.arg = months, xlab = "month", ylab = "revenue", col = colors)

# Add the legend to the chart
legend("topleft", regions, cex = 1.3, fill = colors)

# Save the file
dev.off()

Stacked Bar Chart using R
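The script above stacks the three values of each month in one bar. For the grouped form, barplot() accepts beside = TRUE, which draws the values of each matrix column side by side. The sketch below repeats the same data and draws to a throwaway device with pdf(NULL) so that no file is written; the name bar_positions is illustrative.

```r
colors <- c("green", "orange", "brown")
months <- c("Mar", "Apr", "May", "Jun", "Jul")
regions <- c("East", "West", "North")
Values <- matrix(c(2, 9, 3, 11, 9, 4, 8, 7, 3, 12, 5, 2, 8, 10, 11),
   nrow = 3, ncol = 5, byrow = TRUE)

# pdf(NULL) opens a device that creates no file; use png() to keep the image.
pdf(NULL)

# beside = TRUE places each month's values next to each other instead of stacking them.
bar_positions <- barplot(Values, beside = TRUE, names.arg = months,
   xlab = "month", ylab = "revenue", col = colors, main = "total revenue")
legend("topleft", regions, cex = 1.3, fill = colors)
dev.off()

# barplot() returns the x position of every bar: one row per region,
# one column per month.
print(dim(bar_positions))
```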

R – Boxplots

Boxplots are a measure of how well distributed the data in a data set is. A boxplot divides the data set using three quartiles. The graph represents the minimum, maximum, median, first quartile and third quartile of the data set. It is also useful for comparing the distribution of data across data sets by drawing boxplots for each of them.

Boxplots are created in R by using the boxplot() function.

Syntax

The basic syntax to create a boxplot in R is −

boxplot(x, data, notch, varwidth, names, main)

Following is the description of the parameters used −

  • x is a vector or a formula.

  • data is the data frame.

  • notch is a logical value. Set it to TRUE to draw a notch.

  • varwidth is a logical value. Set it to TRUE to draw the width of each box proportionate to the sample size.

  • names are the group labels which will be printed under each boxplot.

  • main is used to give a title to the graph.

Example

We use the data set "mtcars" available in the R environment to create a basic boxplot. Let's look at the columns "mpg" and "cyl" in mtcars.

input <- mtcars[,c('mpg','cyl')]
print(head(input))

When we execute the above code, it produces the following result −

                   mpg  cyl
Mazda RX4         21.0   6
Mazda RX4 Wag     21.0   6
Datsun 710        22.8   4
Hornet 4 Drive    21.4   6
Hornet Sportabout 18.7   8
Valiant           18.1   6

Creating the Boxplot

The script below will create a boxplot graph for the relation between mpg (miles per gallon) and cyl (number of cylinders).

# Give the chart file a name.
png(file = "boxplot.png")

# Plot the chart.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of Cylinders",
   ylab = "Miles Per Gallon", main = "Mileage Data")

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

Box Plot using R
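The numbers behind each box can be inspected without drawing anything, because the same function returns them when called with plot = FALSE. A minimal sketch, with illustrative variable names:

```r
# Compute the statistics for mpg grouped by cyl without drawing a chart.
summary_stats <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# One column per group; the rows are lower whisker, lower hinge, median,
# upper hinge and upper whisker.
five_numbers <- summary_stats$stats
colnames(five_numbers) <- summary_stats$names
print(five_numbers)

# Number of cars in each cylinder group.
print(summary_stats$n)
```

Values lying beyond the whiskers are reported separately in the out component of the returned list.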

Boxplot with Notch

We can draw a boxplot with notches to find out how the medians of different data groups match with each other.

The script below will create a boxplot graph with a notch for each data group.

# Give the chart file a name.
png(file = "boxplot_with_notch.png")

# Plot the chart.
boxplot(mpg ~ cyl, data = mtcars, 
   xlab = "Number of Cylinders",
   ylab = "Miles Per Gallon", 
   main = "Mileage Data",
   notch = TRUE, 
   varwidth = TRUE, 
   col = c("green","yellow","purple"),
   names = c("High","Medium","Low")
)
# Save the file.
dev.off()

When we execute the above code, it produces the following result −

Box Plot with notch using R

R – Histograms

A histogram represents the frequencies of values of a variable bucketed into ranges. A histogram is similar to a bar chart, but the difference is that it groups the values into continuous ranges. The height of each bar in a histogram represents the number of values present in that range.

R creates histograms using the hist() function. This function takes a vector as input and uses some more parameters to plot histograms.

Syntax

The basic syntax for creating a histogram using R is −

hist(v,main,xlab,xlim,ylim,breaks,col,border)

Following is the description of the parameters used −

  • v is a vector containing the numeric values used in the histogram.

  • main indicates the title of the chart.

  • col is used to set the color of the bars.

  • border is used to set the border color of each bar.

  • xlab is used to give a description of the x-axis.

  • xlim is used to specify the range of values on the x-axis.

  • ylim is used to specify the range of values on the y-axis.

  • breaks is used to control the break points between bars, and so the width of each bar.

Example

A simple histogram is created using the input vector and the xlab, col and border parameters.

The script given below will create and save the histogram in the current R working directory.

# Create data for the graph.
v <-  c(9,13,21,8,36,22,12,41,31,33,19)

# Give the chart file a name.
png(file = "histogram.png")

# Create the histogram.
hist(v,xlab = "Weight",col = "yellow",border = "blue")

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

Histogram Of V

Range of X and Y values

To specify the range of values allowed on the X axis and Y axis, we can use the xlim and ylim parameters.

The width of each bar can be decided by using breaks. A single number passed as breaks is treated as a suggested number of bars.

# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)

# Give the chart file a name.
png(file = "histogram_lim_breaks.png")

# Create the histogram.
hist(v,xlab = "Weight",col = "green",border = "red", xlim = c(0,40), ylim = c(0,5),
   breaks = 5)

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

Histogram Line Breaks
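To see which ranges were actually used, the histogram can be computed without drawing it by passing plot = FALSE. The sketch below compares a suggested bar count with an explicit vector of break points; the variable names suggested and exact are illustrative.

```r
v <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)

# A single number is only a suggestion; R picks round break points near it.
suggested <- hist(v, breaks = 5, plot = FALSE)
print(suggested$breaks)
print(suggested$counts)

# A vector of break points is used exactly as given, here ranges of width 10.
exact <- hist(v, breaks = seq(0, 50, by = 10), plot = FALSE)
print(exact$breaks)
print(exact$counts)
```

Passing a vector is the more predictable choice when bars must line up with specific boundaries.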

R – Line Graphs

A line chart is a graph that connects a series of points by drawing line segments between them. These points are ordered by the value of one of their coordinates (usually the x-coordinate). Line charts are usually used for identifying trends in data.

The plot() function in R is used to create the line graph.

Syntax

The basic syntax to create a line chart in R is −

plot(v,type,col,xlab,ylab,main)

Following is the description of the parameters used −

  • v is a vector containing the numeric values.

  • type takes the value "p" to draw only the points, "l" to draw only the lines and "o" to draw both points and lines.

  • xlab is the label for the x axis.

  • ylab is the label for the y axis.

  • main is the title of the chart.

  • col is used to give colors to both the points and lines.

Example

A simple line chart is created using the input vector and the type parameter set to "o". The script below will create and save a line chart in the current R working directory.

# Create the data for the chart.
v <- c(7,12,28,3,41)

# Give the chart file a name.
png(file = "line_chart.png")

# Plot the line chart. 
plot(v,type = "o")

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

Line Chart using R

Line Chart Title, Color and Laend up beingls

The features of the series chart can end up being expanded simply simply by uperform additional parameters. We add colour to the stages and seriess, give a title to the chart and add laend up beingls to the axes.

Example

# Create the data for the chart.
v <- c(7,12,28,3,41)

# Give the chart file a name.
png(file = "line_chart_label_colored.png")

# Plot the line chart.
plot(v, type = "o", col = "red", xlab = "Month", ylab = "Rain fall",
   main = "Rain fall chart")

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

Line Chart Labeled with Title in R

Multiple Lines in a Line Chart

More than one line can be drawn on the same chart by using the lines() function.

After the first line is plotted, the lines() function can use an additional vector as input to draw the second line in the chart.

# Create the data for the chart.
v <- c(7,12,28,3,41)
t <- c(14,7,6,19,3)

# Give the chart file a name.
png(file = "line_chart_2_lines.png")

# Plot the line chart.
plot(v, type = "o", col = "red", xlab = "Month", ylab = "Rain fall",
   main = "Rain fall chart")

# Add the second line to the existing chart.
lines(t, type = "o", col = "blue")

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

Line Chart with multiple lines in R

R – Scatterplots

Scatterplots show many points plotted in the Cartesian plane. Each point represents the values of two variables. One variable is chosen for the horizontal axis and another for the vertical axis.

The simple scatterplot is created using the plot() function.

Syntax

The basic syntax for creating a scatterplot in R is −

plot(x, y, main, xlab, ylab, xlim, ylim, axes)

Following is the description of the parameters used −

  • x is the data set whose values are the horizontal coordinates.

  • y is the data set whose values are the vertical coordinates.

  • main is the title of the graph.

  • xlab is the label for the horizontal axis.

  • ylab is the label for the vertical axis.

  • xlim is the limits of the values of x used for plotting.

  • ylim is the limits of the values of y used for plotting.

  • axes indicates whether both axes should be drawn on the plot.

Example

We use the data set "mtcars" available in the R environment to create a basic scatterplot. Let's use the columns "wt" and "mpg" in mtcars.

input <- mtcars[,c('wt','mpg')]
print(head(input))

When we execute the above code, it produces the following result −

                    wt      mpg
Mazda RX4           2.620   21.0
Mazda RX4 Wag       2.875   21.0
Datsun 710          2.320   22.8
Hornet 4 Drive      3.215   21.4
Hornet Sportabout   3.440   18.7
Valiant             3.460   18.1

Creating the Scatterplot

The script below will create a scatterplot for the relation between wt (weight) and mpg (miles per gallon).

# Get the input values.
input <- mtcars[,c('wt','mpg')]

# Give the chart file a name.
png(file = "scatterplot.png")

# Plot the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
plot(x = input$wt, y = input$mpg,
   xlab = "Weight",
   ylab = "Mileage",
   xlim = c(2.5,5),
   ylim = c(15,30),
   main = "Weight vs Mileage"
)

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

Scatter Plot using R

Scatterplot Matrices

When we have more than two variables and we want to find the correlation between one variable and the remaining ones, we use a scatterplot matrix. We use the pairs() function to create matrices of scatterplots.

Syntax

The basic syntax for creating scatterplot matrices in R is −

pairs(formula, data)

Following is the description of the parameters used −

  • formula represents the series of variables used in pairs.

  • data represents the data set from which the variables will be taken.

Example

Each variable is paired up with each of the remaining variables. A scatterplot is plotted for each pair.

# Give the chart file a name.
png(file = "scatterplot_matrices.png")

# Plot the matrices between 4 variables giving 12 plots.
# One variable with 3 others and total 4 variables.
pairs(~wt+mpg+disp+cyl, data = mtcars,
   main = "Scatterplot Matrix")

# Save the file.
dev.off()

When the above code is executed we get the following output.

Scatter Plot Matrices using R
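
A scatterplot matrix shows the relationships visually. To put numbers on them, one option is the base R cor() function, which the example above does not use. The sketch below computes the pairwise Pearson correlations for the same four mtcars columns; the names vars and cor.matrix are illustrative.

```r
# Same four columns as in the pairs() call.
vars <- mtcars[, c("wt", "mpg", "disp", "cyl")]

# Pairwise Pearson correlation coefficients, rounded for readability.
cor.matrix <- round(cor(vars), 2)
print(cor.matrix)
```

Each off-diagonal entry corresponds to one panel of the scatterplot matrix, so a strongly negative or positive value should match a clearly sloped cloud of points in that panel.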

R – Mean, Median & Mode

Statistical analysis in R is performed by using many in-built functions. Most of these functions are part of the R base package. These functions take an R vector as input along with the arguments and give the result.

The functions we discuss in this chapter are mean, median and mode.

Mean

It is calculated by taking the sum of the values and dividing by the number of values in a data series.

The function mean() is used to calculate this in R.

Syntax

The basic syntax for calculating mean in R is −

mean(x, trim = 0, na.rm = FALSE, ...)

Following is the description of the parameters used −

  • x is the input vector.

  • trim is the fraction (0 to 0.5) of observations to drop from each end of the sorted vector.

  • na.rm is used to remove the missing values from the input vector.

Example

# Create a vector. 
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)

# Find Mean.
result.mean <- mean(x)
print(result.mean)

When we execute the above code, it produces the following result −

[1] 8.22

Applying Trim Option

When the trim parameter is supplied, the values in the vector get sorted and then the required number of observations are dropped from each end before calculating the mean.

When trim = 0.3, 3 values from each end will be dropped from the calculation of the mean.

In this case the sorted vector is (−21, −5, 2, 3, 4.2, 7, 8, 12, 18, 54) and the values removed from the vector for calculating the mean are (−21, −5, 2) from the left and (12, 18, 54) from the right.

# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)

# Find Mean.
result.mean <- mean(x, trim = 0.3)
print(result.mean)

When we execute the above code, it produces the following result −

[1] 5.55
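
To make the trimming step concrete, the calculation can be reproduced by hand. This is a small sketch, not how mean() is implemented internally; the names k, sorted.x and kept are illustrative.

```r
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
trim <- 0.3

# Number of observations to drop from each end of the sorted vector.
k <- floor(length(x) * trim)

# Sort the vector and keep only the middle part.
sorted.x <- sort(x)
kept <- sorted.x[(k + 1):(length(x) - k)]
print(kept)

# The mean of the remaining values matches mean(x, trim = 0.3).
print(mean(kept))
print(mean(x, trim = 0.3))
```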

Applying NA Option

If there are missing values, then the mean function returns NA.

To drop the missing values from the calculation use na.rm = TRUE, which means remove the NA values.

# Create a vector. 
x <- c(12,7,3,4.2,18,2,54,-21,8,-5,NA)

# Find mean.
result.mean <- mean(x)
print(result.mean)

# Find mean dropping NA values.
result.mean <- mean(x, na.rm = TRUE)
print(result.mean)

When we execute the above code, it produces the following result −

[1] NA
[1] 8.22

Median

The middle most value in a data series is called the median. The median() function is used in R to calculate this value.

Syntax

The basic syntax for calculating median in R is −

median(x, na.rm = FALSE)

Following is the description of the parameters used −

  • x is the input vector.

  • na.rm is used to remove the missing values from the input vector.

Example

# Create the vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)

# Find the median.
median.result <- median(x)
print(median.result)

When we execute the above code, it produces the following result −

[1] 5.6
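
The vector above has an even number of values, so there is no single middle element and the median is the average of the two middle values. A minimal sketch of that rule, with illustrative variable names:

```r
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
sorted.x <- sort(x)
n <- length(sorted.x)

if (n %% 2 == 0) {
   # Even count: average the two middle values.
   manual.median <- (sorted.x[n / 2] + sorted.x[n / 2 + 1]) / 2
} else {
   # Odd count: take the single middle value.
   manual.median <- sorted.x[(n + 1) / 2]
}

print(manual.median)
print(median(x))
```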

Mode

The mode is the value that has the highest number of occurrences in a set of data. Unlike mean and median, mode can be computed for both numeric and character data.

R does not have a standard in-built function to calculate mode. So we create a user function to calculate the mode of a data set in R. This function takes the vector as input and gives the mode value as output.

Example

# Create the function.
getmode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Create the vector with numbers.
v <- c(2,1,2,3,1,2,3,4,1,5,5,3,2,3)

# Calculate the mode using the user function.
result <- getmode(v)
print(result)

# Create the vector with characters.
charv <- c("o","it","the","it","it")

# Calculate the mode using the user function.
result <- getmode(charv)
print(result)

When we execute the above code, it produces the following result −

[1] 2
[1] "it"
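
In the numeric vector above, the values 2 and 3 both occur four times, and which.max() reports only the first of them. If all tied values are wanted, a variant along these lines could be used. The function name getmodes is hypothetical and is not part of R or of the original example.

```r
# Return every value that shares the highest count.
getmodes <- function(v) {
   uniqv <- unique(v)
   counts <- tabulate(match(v, uniqv))
   uniqv[counts == max(counts)]
}

v <- c(2,1,2,3,1,2,3,4,1,5,5,3,2,3)
print(getmodes(v))

charv <- c("o","it","the","it","it")
print(getmodes(charv))
```

The values are returned in order of first appearance in the input, which keeps the behaviour close to the single-value version.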

R – Linear Regression

Regression analysis is a very widely used statistical tool to establish a relationship model between two variables. One of these variables is called the predictor variable, whose value is gathered through experiments. The other variable is called the response variable, whose value is derived from the predictor variable.

In Linear Regression these two variables are related through an equation, where the exponent (power) of both these variables is 1. Mathematically a linear relationship represents a straight line when plotted as a graph. A non-linear relationship where the exponent of any variable is not equal to 1 creates a curve.

The general mathematical equation for a linear regression is −

y = ax + b

Following is the description of the parameters used −

  • y is the response variable.

  • x is the predictor variable.

  • a and b are constants which are called the coefficients.

Steps to Establish a Regression

A simple example of regression is predicting the weight of a person when his height is known. To do this we need to have the relationship between height and weight of a person.

The steps to create the relationship are −

  • Carry out the experiment of gathering a sample of observed values of height and corresponding weight.

  • Create a relationship model using the lm() function in R.

  • Find the coefficients from the model created and create the mathematical equation using these.

  • Get a summary of the relationship model to know the average error in prediction, also called residuals.

  • To predict the weight of new persons, use the predict() function in R.

Input Data

Below is the sample data representing the observations −

# Values of height
151, 174, 138, 186, 128, 136, 179, 163, 152, 131

# Values of weight.
63, 81, 56, 91, 47, 57, 76, 72, 62, 48

lm() Function

This function creates the relationship model between the predictor and the response variable.

Syntax

The basic syntax for the lm() function in linear regression is −

lm(formula,data)

Following is the description of the parameters used −

  • formula is a symbol presenting the relation between x and y.

  • data is the data frame on which the formula will be applied.

Create Relationship Model & get the Coefficients

x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function.
relation <- lm(y~x)

print(relation)

When we execute the above code, it produces the following result −

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
   -38.4551       0.6746  
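
For a single predictor, the least-squares coefficients can also be computed directly from the data, which shows where the numbers reported by lm() come from. This is a sketch using the standard formulas (slope = covariance of x and y divided by variance of x); in the equation y = ax + b, the slope corresponds to a and the intercept to b. The variable names are illustrative.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Least-squares slope and intercept computed directly.
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)
print(c(intercept = intercept, slope = slope))

# Compare with the coefficients reported by lm().
print(coef(lm(y ~ x)))
```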

Get the Summary of the Relationship

x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function.
relation <- lm(y~x)

print(summary(relation))

When we execute the above code, it produces the following result −

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.3002 -1.6629  0.0412  1.8944  3.9775 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -38.45509    8.04901  -4.778  0.00139 ** 
x             0.67461    0.05191  12.997 1.16e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.253 on 8 degrees of freedom
Multiple R-squared:  0.9548,    Adjusted R-squared:  0.9491 
F-statistic: 168.9 on 1 and 8 DF,  p-value: 1.164e-06
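
The printed summary is meant for reading, but the same figures can be pulled out of the summary object when they are needed in further calculations. The sketch below uses the r.squared, sigma and coefficients components of a summary.lm object; the name model.summary is illustrative.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)
model.summary <- summary(relation)

# Proportion of the variance in y explained by x.
print(model.summary$r.squared)

# Residual standard error.
print(model.summary$sigma)

# Coefficient table (estimate, standard error, t value, p-value) as a matrix.
print(model.summary$coefficients)
```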

predict() Function

Syntax

The basic syntax for predict() in linear regression is −

predict(object, newdata)

Following is the description of the parameters used −

  • object is the model which is already created using the lm() function.

  • newdata is the data frame containing the new values for the predictor variable.

Predict the weight of new persons

# The predictor vector.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)

# The response vector.
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function.
relation <- lm(y~x)

# Find weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation,a)
print(result)

When we execute the above code, it produces the following result −

       1 
76.22869 
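
The predicted value is simply the fitted equation evaluated at the new height. As a quick check, the same number can be obtained from the coefficients by hand; the names coefs and manual are illustrative.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# Intercept plus slope times the new height.
coefs <- coef(relation)
manual <- coefs[["(Intercept)"]] + coefs[["x"]] * 170
print(manual)
```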

Visualize the Regression Graphically

# Create the predictor and response variable.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y~x)

# Give the chart file a name.
png(file = "linearregression.png")

# Plot the chart with weight on the horizontal axis and height on the vertical axis.
plot(y, x, col = "blue", main = "Height & Weight Regression",
   cex = 1.3, pch = 16, xlab = "Weight in Kg", ylab = "Height in cm")

# Add the fitted line for these axes (height regressed on weight).
abline(lm(x~y))

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

Linear regression in R

R – Multiple Regression

Multiple regression is an extension of linear regression into the relationship between more than two variables. In a simple linear relation we have one predictor and one response variable, but in multiple regression we have more than one predictor variable and one response variable.

The general mathematical equation for multiple regression is −

y = a + b1x1 + b2x2 + ... + bnxn

Following is the description of the parameters used −

  • y is the response variable.

  • a, b1, b2…bn are the coefficients.

  • x1, x2, …xn are the predictor variables.

We create the regression model using the lm() function in R. The model determines the value of the coefficients using the input data. Next we can predict the value of the response variable for a given set of predictor variables using these coefficients.

lm() Function

This function creates the relationship model between the predictor and the response variable.

Syntax

The basic syntax for the lm() function in multiple regression is −

lm(y ~ x1+x2+x3...,data)

Following is the description of the parameters used −

  • formula is a symbol presenting the relation between the response variable and predictor variables.

  • data is the data frame on which the formula will be applied.

Example

Input Data

Consider the data set "mtcars" available in the R environment. It gives a comparison between different car models in terms of mileage per gallon ("mpg"), cylinder displacement ("disp"), horse power ("hp"), weight of the car ("wt") and some more parameters.

The goal of the model is to establish the relationship between "mpg" as a response variable with "disp", "hp" and "wt" as predictor variables. We create a subset of these variables from the mtcars data set for this purpose.

input <- mtcars[,c("mpg","disp","hp","wt")]
print(head(input))

When we execute the above code, it produces the following result −

                   mpg   disp   hp    wt
Mazda RX4          21.0  160    110   2.620
Mazda RX4 Wag      21.0  160    110   2.875
Datsun 710         22.8  108     93   2.320
Hornet 4 Drive     21.4  258    110   3.215
Hornet Sportabout  18.7  360    175   3.440
Valiant            18.1  225    105   3.460

Create Relationship Model & get the Coefficients

input <- mtcars[,c("mpg","disp","hp","wt")]

# Create the relationship model.
model <- lm(mpg~disp+hp+wt, data = input)

# Show the model.
print(model)

# Get the Intercept and coefficients as vector elements.
cat("# # # # The Coefficient Values # # # ","\n")

a <- coef(model)[1]
print(a)

Xdisp <- coef(model)[2]
Xhp <- coef(model)[3]
Xwt <- coef(model)[4]

print(Xdisp)
print(Xhp)
print(Xwt)

When we execute the above code, it produces the following result −

Call:
lm(formula = mpg ~ disp + hp + wt, data = input)

Coefficients:
(Intercept)         disp           hp           wt  
  37.105505    -0.000937    -0.031157    -3.800891  

# # # # The Coefficient Values # # # 
(Intercept) 
   37.10551 
         disp 
-0.0009370091 
         hp 
-0.03115655 
       wt 
-3.800891 

Create Equation for Regression Model

Based on the above intercept and coefficient values, we create the mathematical equation.

Y = a+Xdisp.x1+Xhp.x2+Xwt.x3
or
Y = 37.1055+(-0.000937)*x1+(-0.0311)*x2+(-3.8008)*x3

Apply Equation for predicting New Values

We can use the regression equation created above to predict the mileage when a new set of values for displacement, horse power and weight is provided.

For a car with disp = 221, hp = 102 and wt = 2.91 the predicted mileage is −

Y = 37.1055+(-0.000937)*221+(-0.0311)*102+(-3.8008)*2.91 = 22.6659
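
Instead of typing the equation by hand, the fitted model can do the calculation through predict(). The sketch below assumes the same model as above; the data frame name new.car is illustrative. Because predict() works with the unrounded coefficients, its result can differ slightly from a hand calculation that uses rounded values.

```r
input <- mtcars[,c("mpg","disp","hp","wt")]
model <- lm(mpg ~ disp + hp + wt, data = input)

# New values for the predictor variables; column names must match the model.
new.car <- data.frame(disp = 221, hp = 102, wt = 2.91)

# Predicted mileage for the new car.
predicted <- predict(model, newdata = new.car)
print(predicted)
```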

R – Logistic Regression

Logistic Regression is a regression model in which the response variable (dependent variable) has categorical values such as True/False or 0/1. It actually measures the probability of a binary response as the value of the response variable based on the mathematical equation relating it with the predictor variables.

The general mathematical equation for logistic regression is −

y = 1/(1+e^-(a+b1x1+b2x2+b3x3+...))

Following is the description of the parameters used −

  • y is the response variable.

  • x is the predictor variable.

  • a and b are the coefficients which are numeric constants.

The function used to create the regression model is the glm() function.

Syntax

The basic syntax for the glm() function in logistic regression is −

glm(formula,data,family)

Following is the description of the parameters used −

  • formula is the symbol presenting the relationship between the variables.

  • data is the data set giving the values of these variables.

  • family is an R object to specify the details of the model. Its value is binomial for logistic regression.

Example

The in-built data set "mtcars" describes different models of a car with their various engine specifications. In the "mtcars" data set, the transmission mode (automatic or manual) is described by the column am which is a binary value (0 or 1). We can create a logistic regression model between the column "am" and 3 other columns – hp, wt and cyl.

# Select some columns from mtcars.
input <- mtcars[,c("am","cyl","hp","wt")]

print(head(input))

When we execute the above code, it produces the following result −

                  am   cyl  hp    wt
Mazda RX4          1   6    110   2.620
Mazda RX4 Wag      1   6    110   2.875
Datsun 710         1   4     93   2.320
Hornet 4 Drive     0   6    110   3.215
Hornet Sportabout  0   8    175   3.440
Valiant            0   6    105   3.460

Create Regression Model

We use the glm() function to create the regression model and get its summary for analysis.

input <- mtcars[,c("am","cyl","hp","wt")]

am.data <- glm(formula = am ~ cyl + hp + wt, data = input, family = binomial)

print(summary(am.data))

When we execute the above code, it produces the following result −

Call:
glm(formula = am ~ cyl + hp + wt, family = binomial, data = input)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.17272  -0.14907  -0.01464   0.14116   1.27641  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept) 19.70288    8.11637   2.428   0.0152 *
cyl          0.48760    1.07162   0.455   0.6491  
hp           0.03259    0.01886   1.728   0.0840 .
wt          -9.14947    4.15332  -2.203   0.0276 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.2297  on 31  degrees of freedom
Residual deviance:  9.8415  on 28  degrees of freedom
AIC: 17.841

Number of Fisher Scoring iterations: 8

Conclusion

In the summary, as the p-value in the last column is more than 0.05 for the variables "cyl" and "hp", we consider them to be insignificant in contributing to the value of the variable "am". Only weight (wt) impacts the "am" value in this regression model.
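
To connect the fitted model back to the logistic equation, the predicted probabilities can be obtained in two ways: directly from predict() with type = "response", or by applying 1/(1+e^-z) to the linear predictor z. This is a sketch under the same model as above; the names prob, linear.part and manual.prob are illustrative. R may print a warning about fitted probabilities very close to 0 or 1 for this small data set.

```r
input <- mtcars[,c("am","cyl","hp","wt")]
am.data <- glm(am ~ cyl + hp + wt, data = input, family = binomial)

# Predicted probability that am = 1 for each car in the data set.
prob <- predict(am.data, type = "response")
print(round(head(prob), 3))

# The same values from the logistic equation applied to the linear predictor.
linear.part <- predict(am.data, type = "link")
manual.prob <- 1 / (1 + exp(-linear.part))
print(round(head(manual.prob), 3))
```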

R – Normal Distribution

In a random collection of data from independent sources, it is generally observed that the distribution of data is normal. This means that, on plotting a graph with the value of the variable on the horizontal axis and the count of the values on the vertical axis, we get a bell-shaped curve. The centre of the curve represents the mean of the data set. In the graph, fifty percent of the values lie to the left of the mean and the other fifty percent lie to the right of the mean. This is referred to as the normal distribution in statistics.

R has four in-built functions for the normal distribution. They are described below.

dnorm(x, mean, sd)
pnorm(x, mean, sd)
qnorm(p, mean, sd)
rnorm(n, mean, sd)

Following is the description of the parameters used in the above functions −

  • x is a vector of numbers.

  • p is a vector of probabilities.

  • n is the number of observations (sample size).

  • mean is the mean value of the sample data. Its default value is zero.

  • sd is the standard deviation. Its default value is 1.

dnorm()

This function gives the height of the probability distribution at each point for a given mean and standard deviation.

# Create a sequence of numbers between -10 and 10 incrementing by 0.1.
x <- seq(-10, 10, by = .1)

# Choose the mean as 2.5 and standard deviation as 0.5.
y <- dnorm(x, mean = 2.5, sd = 0.5)

# Give the chart file a name.
png(file = "dnorm.png")

plot(x,y)

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

dnorm() graph

pnorm()

This function gives the probability that a normally distributed random number is less than the value of a given number. It is also called the "Cumulative Distribution Function".

# Create a sequence of numbers between -10 and 10 incrementing by 0.2.
x <- seq(-10, 10, by = .2)
 
# Choose the mean as 2.5 and standard deviation as 2. 
y <- pnorm(x, mean = 2.5, sd = 2)

# Give the chart file a name.
png(file = "pnorm.png")

# Plot the graph.
plot(x,y)

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

pnorm() graph

qnorm()

This function takes the probability value and gives a number whose cumulative value matches the probability value.

# Create a sequence of probability values incrementing by 0.02.
x <- seq(0, 1, by = 0.02)

# Choose the mean as 2 and standard deviation as 1.
y <- qnorm(x, mean = 2, sd = 1)

# Give the chart file a name.
png(file = "qnorm.png")

# Plot the graph.
plot(x,y)

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

qnorm() graph
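
Since qnorm() returns the value whose cumulative probability equals p, it undoes pnorm() when both use the same mean and standard deviation. A short sketch of that round trip, with illustrative variable names:

```r
# Values within three standard deviations of the mean.
x <- seq(-1, 5, by = 0.5)

# Cumulative probabilities, then back to the original values.
p <- pnorm(x, mean = 2, sd = 1)
x.back <- qnorm(p, mean = 2, sd = 1)
print(all.equal(x, x.back))

# Half of the probability lies below the mean.
print(pnorm(2, mean = 2, sd = 1))
print(qnorm(0.5, mean = 2, sd = 1))

# Probabilities of exactly 0 and 1 map to -Inf and Inf.
print(qnorm(c(0, 1), mean = 2, sd = 1))
```

The last line explains why the end points of a probability sequence running from 0 to 1 do not appear as finite points on a qnorm() plot.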

rnorm()

This function is used to generate random numbers whose distribution is normal. It takes the sample size as input and generates that many random numbers. We draw a histogram to show the distribution of the generated numbers.

# Create a sample of 50 numbers which are normally distributed.
y <- rnorm(50)

# Give the chart file a name.
png(file = "rnorm.png")

# Plot the histogram for this sample.
hist(y, main = "Normal Distribution")

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

rnorm() graph
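
Because rnorm() draws random numbers, every run produces a different sample and a slightly different histogram. When a repeatable result is needed, the random number generator can be seeded first with set.seed(). The seed value 42 and the mean and sd chosen below are arbitrary assumptions for illustration.

```r
# Seed the generator, then draw a sample.
set.seed(42)
y1 <- rnorm(50, mean = 10, sd = 2)

# The same seed gives exactly the same sample again.
set.seed(42)
y2 <- rnorm(50, mean = 10, sd = 2)
print(identical(y1, y2))

# Sample mean and standard deviation are close to, but not exactly, 10 and 2.
print(mean(y1))
print(sd(y1))
```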

R – Binomial Distribution

The binomial distribution model deals with finding the probability of success of an event which has only two possible outcomes in a series of experiments. For example, tossing a coin always gives a head or a tail. The probability of finding exactly 3 heads in tossing a coin repeatedly for 10 times is estimated using the binomial distribution.

R has four in-built functions for the binomial distribution. They are described below.

dbinom(x, size, prob)
pbinom(x, size, prob)
qbinom(p, size, prob)
rbinom(n, size, prob)

Following is the description of the parameters used −

  • x is a vector of numbers.

  • p is a vector of probabilities.

  • n is the number of observations.

  • size is the number of trials.

  • prob is the probability of success of each trial.

dbinom()

This function gives the probability density distribution at each point.

# Create a sequence of numbers from 0 to 50, incremented by 1.
x <- seq(0,50,by = 1)

# Create the binomial distribution.
y <- dbinom(x,50,0.5)

# Give the chart file a name.
png(file = "dbinom.png")

# Plot the graph for this sample.
plot(x,y)

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

dbinom() graph

pbinom()

This function gives the cumulative probability of an event. It is a single value representing the probability.

# Probability of getting 26 or fewer heads from 51 tosses of a coin.
x <- pbinom(26,51,0.5)

print(x)

When we execute the above code, it produces the following result −

[1] 0.610116
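
The cumulative probability is the sum of the individual probabilities of 0, 1, 2, ... up to 26 heads, so the same value can be rebuilt from dbinom(). A minimal sketch, with the variable name manual chosen for illustration:

```r
# Add up the probabilities of getting 0 to 26 heads in 51 tosses.
manual <- sum(dbinom(0:26, size = 51, prob = 0.5))
print(manual)

# pbinom() gives the same cumulative probability in one call.
print(pbinom(26, size = 51, prob = 0.5))
```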

qbinom()

This function takes the probability value and gives a number whose cumulative value matches the probability value.

# Find the smallest number of heads whose cumulative probability reaches 0.25 when a coin is tossed 51 times.
x <- qbinom(0.25,51,1/2)

print(x)

When we execute the above code, it produces the following result −

[1] 23
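
Because the binomial distribution is discrete, the cumulative probability usually jumps past the requested value rather than hitting it exactly. qbinom() returns the smallest count at which the cumulative probability is at least p, which can be checked with pbinom(). The variable name q is illustrative.

```r
q <- qbinom(0.25, size = 51, prob = 0.5)
print(q)

# One count lower, the cumulative probability is still below 0.25.
print(pbinom(q - 1, size = 51, prob = 0.5))

# At q, the cumulative probability has reached or passed 0.25.
print(pbinom(q, size = 51, prob = 0.5))
```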

rbinom()

This function generates the required number of random values of a given probability from a given sample.

# Find 8 random values from a sample of 150 with probability of 0.4.
x <- rbinom(8,150,.4)

print(x)

When we execute the above code, it produces the following result −

[1] 58 61 59 66 55 60 61 67

R – Poisson Regression

Poisson Regression involves regression models in which the response variable is in the form of counts and not fractional numbers. For example, the count of the number of births or the number of wins in a football match series. Also the values of the response variables follow a Poisson distribution.

The general mathematical equation for Poisson regression is −

log(y) = a + b1x1 + b2x2 + ... + bnxn

Following is the description of the parameters used −

  • y is the response variable.

  • a and b are the numeric coefficients.

  • x is the predictor variable.

The function used to create the Poisson regression model is the glm() function.

Syntax

The basic syntax for the glm() function in Poisson regression is −

glm(formula,data,family)

Following is the description of the parameters used in the above function −

  • formula is the symbol presenting the relationship between the variables.

  • data is the data set giving the values of these variables.

  • family is an R object to specify the details of the model. Its value is poisson for Poisson regression.

Example

We have the in-built data set "warpbreaks" which describes the effect of wool type (A or B) and tension (low, medium or high) on the number of warp breaks per loom. Let's consider "breaks" as the response variable which is a count of the number of breaks. The wool "type" and "tension" are taken as predictor variables.

Input Data

input <- warpbreaks
print(head(input))

When we execute the above code, it produces the following result −

      breaks   wool  tension
1     26       A     L
2     30       A     L
3     54       A     L
4     25       A     L
5     70       A     L
6     52       A     L

Create Regression Model

output <- glm(formula = breaks ~ wool+tension,
   data = warpbreaks,
   family = poisson)
print(summary(output))

When we execute the above code, it produces the following result −

Call:
glm(formula = breaks ~ wool + tension, family = poisson, data = warpbreaks)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.6871  -1.6503  -0.4269   1.1902   4.2616  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  3.69196    0.04541  81.302  < 2e-16 ***
woolB       -0.20599    0.05157  -3.994 6.49e-05 ***
tensionM    -0.32132    0.06027  -5.332 9.73e-08 ***
tensionH    -0.51849    0.06396  -8.107 5.21e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 297.37  on 53  degrees of freedom
Residual deviance: 210.39  on 50  degrees of freedom
AIC: 493.06

Number of Fisher Scoring iterations: 4

In the summary we look for the p-value in the last column to be less than 0.05 to consider an impact of the predictor variable on the response variable. As seen, wool type B and tension types M and H have an impact on the count of breaks.
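
Since the model is fitted on the log scale, the coefficients are easier to read after exponentiating them, which turns each one into a multiplicative effect on the expected count. The sketch below refits the same model; the name rate.ratios is illustrative.

```r
output <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Exponentiated coefficients: multiplicative change in the expected number of breaks.
rate.ratios <- exp(coef(output))
print(round(rate.ratios, 3))
```

For example, the value for woolB is roughly 0.81, which suggests that wool B is associated with about 19 percent fewer expected breaks than wool A at the same tension.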

R – Analysis of Covariance

We use Regression analysis to produce models which descriend up being the effect of variation in pred-coloureddish coloured-colouredictor variables on the response variable. Sometimes, if we have a categorical variable with values like Yes/No or Male/Female etc. The easy regression analysis gives multiple results for each value of the categorical variable. In such scenario, we can study the effect of the categorical variable simply simply by uperform it along with the pred-coloureddish coloured-colouredictor variable and comparing the regression seriess for each level of the categorical variable. Such an analysis is termed as Analysis of Covariance furthermore calimmediateed as ANCOVA.

Example

Consider the R built-in data set mtcars. In it, we observe that the field "am" represents the type of transmission (automatic or manual). It is a categorical variable with values 0 and 1. Besides the horsepower ("hp"), the miles per gallon value ("mpg") of a car can also depend on it.

We study the effect of the value of "am" on the regression between "mpg" and "hp". This is done by using the aov() function followed by the anova() function to compare the multiple regressions.

Input Data

Create a data frame containing the fields "mpg", "hp" and "am" from the data set mtcars. Here we take "mpg" as the response variable, "hp" as the predictor variable and "am" as the categorical variable.

input <- mtcars[,c("am","mpg","hp")]
print(head(input))

When we execute the above code, it produces the following result −

                   am   mpg   hp
Mazda RX4          1    21.0  110
Mazda RX4 Wag      1    21.0  110
Datsun 710         1    22.8   93
Hornet 4 Drive     0    21.4  110
Hornet Sportabout  0    18.7  175
Valiant            0    18.1  105

ANCOVA Analysis

We create a regression model taking "hp" as the predictor variable and "mpg" as the response variable, taking into account the interaction between "am" and "hp".

Model with interaction between categorical variable and predictor variable

# Get the dataset.
input <- mtcars

# Create the regression model.
result <- aov(mpg~hp*am,data = input)
print(summary(result))

When we execute the above code, it produces the following result −

            Df Sum Sq Mean Sq F value   Pr(>F)    
hp           1  678.4   678.4  77.391 1.50e-09 ***
am           1  202.2   202.2  23.072 4.75e-05 ***
hp:am        1    0.0     0.0   0.001    0.981    
Residuals   28  245.4     8.8                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This result shows that both horsepower and transmission type have a significant effect on miles per gallon, as the p-value in both cases is less than 0.05. But the interaction between these two variables is not significant, as its p-value is more than 0.05.

Model without interaction between categorical variable and predictor variable

# Get the dataset.
input <- mtcars

# Create the regression model.
result <- aov(mpg~hp+am,data = input)
print(summary(result))

When we execute the above code, it produces the following result −

            Df Sum Sq Mean Sq F value   Pr(>F)    
hp           1  678.4   678.4   80.15 7.63e-10 ***
am           1  202.2   202.2   23.89 3.46e-05 ***
Residuals   29  245.4     8.5                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This result shows that both horsepower and transmission type have a significant effect on miles per gallon, as the p-value in both cases is less than 0.05.

Comparing Two Models

Now we can compare the two models to conclude whether the interaction of the variables is truly insignificant. For this we use the anova() function.

# Get the dataset.
input <- mtcars

# Create the regression models.
result1 <- aov(mpg~hp*am,data = input)
result2 <- aov(mpg~hp+am,data = input)

# Compare the two models.
print(anova(result1,result2))

When we execute the above code, it produces the following result −

Model 1: mpg ~ hp * am
Model 2: mpg ~ hp + am
  Res.Df    RSS Df  Sum of Sq     F Pr(>F)
1     28 245.43                           
2     29 245.44 -1 -0.0052515 6e-04 0.9806

As the p-value is greater than 0.05, we conclude that the interaction between horsepower and transmission type is not significant. So the mileage per gallon depends on the horsepower of the car in a similar manner in both automatic and manual transmission modes.
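To see what "a similar manner" means in terms of regression lines, the additive model can be refitted with lm() and its coefficients turned into one line per transmission type. This is a sketch for illustration only: lm() is swapped in for aov() because it exposes the coefficients directly, and the names fit, automatic.line and manual.line are not from the original text.

```r
# Fit the additive model (no interaction term) on the built-in mtcars data.
fit <- lm(mpg ~ hp + am, data = mtcars)
b <- coef(fit)

# am = 0 (automatic) uses the baseline intercept;
# am = 1 (manual) shifts the intercept by the "am" coefficient.
# Both lines share the single "hp" slope.
automatic.line <- c(intercept = unname(b["(Intercept)"]),
                    slope     = unname(b["hp"]))
manual.line    <- c(intercept = unname(b["(Intercept)"] + b["am"]),
                    slope     = unname(b["hp"]))

print(rbind(automatic = automatic.line, manual = manual.line))
```

Without the interaction term, the two lines are parallel: transmission type moves the line up or down but does not change how mpg responds to horsepower.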

R – Time Series Analysis

A time series is a series of data points in which each data point is associated with a timestamp. A simple example is the price of a stock in the stock market at different points of time on a given day. Another example is the amount of rainfall in a region in different months of the year. R provides many functions to create, manipulate and plot time series data. The data for the time series is stored in an R object called a time-series object. It is also an R data object, like a vector or data frame.

The time series object is created by using the ts() function.

Syntax

The basic syntax for the ts() function in time series analysis is −

timeseries.object.name <-  ts(data, start, end, frequency)

Following is the description of the parameters used −

  • data is a vector or matrix containing the values used in the time series.

  • start specifies the start time for the first observation in the time series.

  • end specifies the end time for the last observation in the time series.

  • frequency specifies the number of observations per unit time.

Except the parameter "data", all other parameters are optional.
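As a small illustration of how these parameters interact, the sketch below builds a quarterly series from made-up values and then reads the time attributes back. The numbers and the names sales, sales.ts and second.year are hypothetical and used only for this example.

```r
# Hypothetical quarterly values, for illustration only.
sales <- c(10, 20, 30, 40, 50, 60, 70, 80)

# Only "data" is required; here we also give start and frequency.
# start = c(2012, 1) means the first quarter of 2012.
sales.ts <- ts(sales, start = c(2012, 1), frequency = 4)

# ts() derives the end time from start, frequency and the data length.
print(start(sales.ts))
print(end(sales.ts))
print(frequency(sales.ts))

# window() extracts a sub-range of the series by time.
second.year <- window(sales.ts, start = c(2013, 1))
print(second.year)
```

Because end is derived automatically, it is usually only supplied when the series should be cut short or when start is omitted.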

Example

Consider the monthly rainfall details at a place starting from January 2012. We create an R time series object for a period of 12 months and plot it.

# Get the data points in form of a R vector.
rainfall <- c(799,1174.8,865.1,1334.6,635.4,918.5,685.5,998.6,784.2,985,882.8,1071)

# Convert it to a time series object.
rainfall.timeseries <- ts(rainfall,start = c(2012,1),frequency = 12)

# Print the timeseries data.
print(rainfall.timeseries)

# Give the chart file a name.
png(file = "rainfall.png")

# Plot a graph of the time series.
plot(rainfall.timeseries)

# Save the file.
dev.off()

When we execute the above code, it produces the following result and chart −

        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep
2012  799.0 1174.8  865.1 1334.6  635.4  918.5  685.5  998.6  784.2
        Oct    Nov    Dec
2012  985.0  882.8 1071.0

The time series chart −

[Figure: Time Series using R]

Different Time Intervals

The value of the frequency parameter in the ts() function decides the time intervals at which the data points are measured. A value of 12 indicates that the time series is for 12 months. Other values and their meanings are as below −

  • frequency = 12 pegs the data points for every month of a year.

  • frequency = 4 pegs the data points for every quarter of a year.

  • frequency = 6 pegs the data points for every 10 minutes of an hour.

  • frequency = 24*6 pegs the data points for every 10 minutes of a day.
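The relationship between two of these frequencies can be shown by converting a monthly series into a quarterly one. The sketch below reuses the rainfall values from the earlier example and assumes that summing the three months of each quarter is the desired aggregation; the names monthly and quarterly are chosen for this illustration.

```r
# Monthly rainfall values from the earlier example.
rainfall <- c(799,1174.8,865.1,1334.6,635.4,918.5,685.5,998.6,784.2,985,882.8,1071)

# frequency = 12: one observation per month.
monthly <- ts(rainfall, start = c(2012, 1), frequency = 12)

# aggregate() on a ts object groups observations to a lower frequency.
# nfrequency = 4 produces one value per quarter; FUN = sum adds the
# three monthly values that fall in each quarter.
quarterly <- aggregate(monthly, nfrequency = 4, FUN = sum)

print(frequency(quarterly))
print(quarterly)
```

The new frequency has to divide the old one evenly, which is why 12 monthly points can become 4 quarterly points.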

Multiple Time Series

We can plot multiple time series in one chart by combining the series into a matrix.

# Get the data points in form of a R vector.
rainfall1 <- c(799,1174.8,865.1,1334.6,635.4,918.5,685.5,998.6,784.2,985,882.8,1071)
rainfall2 <- c(655,1306.9,1323.4,1172.2,562.2,824,822.4,1265.5,799.6,1105.6,1106.7,1337.8)

# Convert them to a matrix.
combined.rainfall <-  matrix(c(rainfall1,rainfall2),nrow = 12)

# Convert it to a time series object.
rainfall.timeseries <- ts(combined.rainfall,start = c(2012,1),frequency = 12)

# Print the timeseries data.
print(rainfall.timeseries)

# Give the chart file a name.
png(file = "rainfall_combined.png")

# Plot a graph of the time series.
plot(rainfall.timeseries, main = "Multiple Time Series")

# Save the file.
dev.off()

When we execute the above code, it produces the following result and chart −

           Series 1  Series 2
Jan 2012    799.0    655.0
Feb 2012   1174.8   1306.9
Mar 2012    865.1   1323.4
Apr 2012   1334.6   1172.2
May 2012    635.4    562.2
Jun 2012    918.5    824.0
Jul 2012    685.5    822.4
Aug 2012    998.6   1265.5
Sep 2012    784.2    799.6
Oct 2012    985.0   1105.6
Nov 2012    882.8   1106.7
Dec 2012   1071.0   1337.8

The multiple time series chart −

[Figure: Combined Time Series using R]
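The columns above are labelled "Series 1" and "Series 2" because the matrix had no column names. One possible alternative, sketched below, is to create each series as its own ts object and join them with cbind(), which keeps the variable names as column labels. The names station1, station2 and combined are hypothetical.

```r
# The same two rainfall vectors as in the example above.
rainfall1 <- c(799,1174.8,865.1,1334.6,635.4,918.5,685.5,998.6,784.2,985,882.8,1071)
rainfall2 <- c(655,1306.9,1323.4,1172.2,562.2,824,822.4,1265.5,799.6,1105.6,1106.7,1337.8)

# Build one time series per vector with identical start and frequency.
station1 <- ts(rainfall1, start = c(2012, 1), frequency = 12)
station2 <- ts(rainfall2, start = c(2012, 1), frequency = 12)

# cbind() on ts objects returns a multiple time series whose columns
# are named after the objects that were combined.
combined <- cbind(station1, station2)

print(colnames(combined))
print(head(combined, 3))
```

Calling plot(combined, plot.type = "single") would then draw both series in one panel instead of one panel per series.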

R – Nonlinear Least Square

When modeling real-world data for regression analysis, we observe that it is rarely the case that the equation of the model is a linear equation giving a linear graph. Most of the time, the equation of the model involves mathematical functions of higher degree, like an exponent of 3 or a sin function. In such a scenario, the plot of the model gives a curve rather than a line. The goal of both linear and nonlinear regression is to adjust the values of the model's parameters to find the line or curve that comes closest to your data. On finding these values, we will be able to estimate the response variable with good accuracy.

In least square regression, we establish a regression model in which the sum of the squares of the vertical distances of the data points from the regression curve is minimized. We generally start with a defined model and assume some values for the coefficients. We then apply the nls() function of R to get more accurate values along with the confidence intervals.

Syntax

The basic syntax for creating a nonlinear least square test in R is −

nls(formula, data, start)

Following is the description of the parameters used −

  • formula is a nonlinear model formula including variables and parameters.

  • data is a data frame used to evaluate the variables in the formula.

  • start is a named list or named numeric vector of starting estimates.

Example

We will consider a nonlinear model with assumed initial values of its coefficients. Next, we will look at the confidence intervals of the fitted coefficients so that we can judge how well the assumed values fit the model.

So let's consider the below equation for this purpose −

y = b1*x^2+b2

Let's assume the initial coefficients to be 1 and 3 and fit these values into the nls() function.

xvalues <- c(1.6,2.1,2,2.23,3.71,3.25,3.4,3.86,1.19,2.21)
yvalues <- c(5.19,7.43,6.94,8.11,18.75,14.88,16.06,19.12,3.21,7.58)

# Give the chart file a name.
png(file = "nls.png")

# Plot these values.
plot(xvalues,yvalues)

# Take the assumed values and fit into the model.
model <- nls(yvalues ~ b1*xvalues^2+b2,start = list(b1 = 1,b2 = 3))

# Draw the fitted curve by predicting the response at 100 evenly spaced x values.
new.data <- data.frame(xvalues = seq(min(xvalues),max(xvalues),length.out = 100))
lines(new.data$xvalues,predict(model,newdata = new.data))

# Save the file.
dev.off()

# Get the sum of the squared residuals.
print(sum(resid(model)^2))

# Get the confidence intervals on the chosen values of the coefficients.
print(confint(model))

When we execute the above code, it produces the following result −

[1] 1.081935
Waiting for profiling to be done...
       2.5%    97.5%
b1 1.137708 1.253135
b2 1.497364 2.496484

[Figure: Nonlinear least square fit in R]

We can conclude that the value of b1 is closer to 1, while the value of b2 is closer to 2 and not 3.
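The confidence intervals give a range for each coefficient; the point estimates themselves can be read with coef(). The sketch below also adds a cross-check that is not part of the original example: because this particular model is linear in b1 and b2, an ordinary lm() fit on x squared should arrive at the same least-squares solution. The name check is chosen for this illustration.

```r
# Data from the example above.
xvalues <- c(1.6,2.1,2,2.23,3.71,3.25,3.4,3.86,1.19,2.21)
yvalues <- c(5.19,7.43,6.94,8.11,18.75,14.88,16.06,19.12,3.21,7.58)

# Fit the model starting from the assumed values b1 = 1 and b2 = 3.
model <- nls(yvalues ~ b1*xvalues^2+b2, start = list(b1 = 1, b2 = 3))

# Point estimates of the coefficients after fitting.
print(coef(model))

# Cross-check (an addition for illustration): regress y on x^2 with lm().
# Its slope corresponds to b1 and its intercept to b2.
check <- lm(yvalues ~ I(xvalues^2))
print(coef(check))
```

For models that are genuinely nonlinear in their parameters, no such shortcut exists, and the choice of starting values matters more.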

R – Decision Tree

A decision tree is a graph that represents choices and their results in the form of a tree. The nodes in the graph represent an event or choice, and the edges of the graph represent the decision rules or conditions. It is mostly used in Machine Learning and Data Mining applications using R.

Examples of the use of decision trees are predicting an email as spam or not spam, predicting whether a tumor is cancerous, or predicting a loan as a good or bad credit risk based on the factors in each of these. Generally, a model is created with observed data, also called training data. Then a set of validation data is used to verify and improve the model. R has packages which are used to create and visualize decision trees. For a new set of predictor variables, we use this model to arrive at a decision on the category (yes/no, spam/not spam) of the data.
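The split into training and validation data mentioned above can be sketched in base R. This is a minimal illustration under stated assumptions: the built-in mtcars data frame stands in for any data set, and the 70/30 ratio, the seed and the names full.data, train.rows, training and validation are arbitrary choices for this example.

```r
# Stand-in data set; any data frame could be used here.
full.data <- mtcars

# Fix the random seed so the split is reproducible (arbitrary value).
set.seed(42)

# Draw 70% of the row numbers at random for training.
n <- nrow(full.data)
train.rows <- sample(n, size = round(0.7 * n))

# Rows that were drawn form the training data, the rest the validation data.
training   <- full.data[train.rows, ]
validation <- full.data[-train.rows, ]

print(c(training = nrow(training), validation = nrow(validation)))
```

A model would then be fitted on training only and its predictions compared with the known categories in validation.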

The R package "party" is used to create decision trees.

Install R Package

Use the below command in the R console to install the package. You also have to install the dependent packages, if any.

install.packages("party")

The package "party" has the function ctree() which is used to create and analyze decision trees.

Syntax

The basic syntax for creating a decision tree in R is −

ctree(formula, data)

Following is the description of the parameters used −

  • formula is a formula describing the predictor and response variables.

  • data is the name of the data set used.

Input Data

We will use the data set named readingSkills, which ships with the party package, to create a decision tree. It describes the score of someone's reading skills if we know the variables "age", "shoeSize", "score" and whether the person is a native speaker or not.

Here is the sample data.

# Load the party package. It will automatically load other dependent packages.
library(party)

# Print some records from data set readingSkills.
print(head(readingSkills))

When we execute the above code, it produces the following result −

  nativeSpeaker   age   shoeSize      score
1           yes     5   24.83189   32.29385
2           yes     6   25.95238   36.63105
3            no    11   30.42170   49.60593
4           yes     7   28.66450   40.28456
5           yes    11   31.88207   55.46085
6           yes    10   30.07843   52.83124
Loading required package: methods
Loading required package: grid
...............................
...............................

Example

We will use the ctree() function to create the decision tree and see its graph.

# Load the party package. It will automatically load other dependent packages.
library(party)

# Create the input data frame.
input.dat <- readingSkills[c(1:105),]

# Give the chart file a name.
png(file = "decision_tree.png")

# Create the tree.
output.tree <- ctree(
  nativeSpeaker ~ age + shoeSize + score, 
  data = input.dat)

# Plot the tree.
plot(output.tree)

# Save the file.
dev.off()

When we execute the above code, it produces the following result −

null device 
          1 
Loading required package: methods
Loading required package: grid
Loading required package: mvtnorm
Loading required package: modeltools
Loading required package: stats4
Loading required package: strucchange
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Loading required package: sandwich

[Figure: Decision Tree using R]

Conclusion

From the decision tree shown above, we can conclude that anyone whose readingSkills score is less than 38.3 and whose age is more than 6 is not a native speaker.

R – Random Forest

In the random forest approach, a large number of decision trees are created. Every observation is fed into every decision tree. The most common outcome for each observation is used as the final output. A new observation is fed into all the trees, and a majority vote is taken to classify it.

An error estimate is made for the cases which were not used while building the tree. That is called an OOB (out-of-bag) error estimate, which is reported as a percentage.

The R package "randomForest" is used to create random forests.

Install R Package

Use the below command in the R console to install the package. You also have to install the dependent packages, if any.

install.packages("randomForest")

The package "randomForest" has the function randomForest() which is used to create and analyze random forests.

Syntax

The basic syntax for creating a random forest in R is −

randomForest(formula, data)

Following is the description of the parameters used −

  • formula is a formula describing the predictor and response variables.

  • data is the name of the data set used.

Input Data

We will use the data set named readingSkills, which ships with the party package, to create a random forest. It describes the score of someone's reading skills if we know the variables "age", "shoeSize", "score" and whether the person is a native speaker.

Here is the sample data.

# Load the party package. It will automatically load other required packages.
library(party)

# Print some records from data set readingSkills.
print(head(readingSkills))

When we execute the above code, it produces the following result −

  nativeSpeaker   age   shoeSize      score
1           yes     5   24.83189   32.29385
2           yes     6   25.95238   36.63105
3            no    11   30.42170   49.60593
4           yes     7   28.66450   40.28456
5           yes    11   31.88207   55.46085
6           yes    10   30.07843   52.83124
Loading required package: methods
Loading required package: grid
...............................
...............................

Example

We will use the randomForest() function to create the random forest and view its results.

# Load the party package. It provides the readingSkills data set.
library(party)
library(randomForest)

# Create the forest.
output.forest <- randomForest(nativeSpeaker ~ age + shoeSize + score, 
           data = readingSkills)

# View the forest results.
print(output.forest) 

# Importance of each predictor.
print(importance(output.forest,type = 2)) 

When we execute the above code, it produces the following result −

Call:
 randomForest(formula = nativeSpeaker ~ age + shoeSize + score,     
                 data = readingSkills)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 1

        OOB estimate of  error rate: 1%
Confusion matrix:
    no yes class.error
no  99   1        0.01
yes  1  99        0.01
         MeanDecreaseGini
age              13.95406
shoeSize         18.91006
score            56.73051

Conclusion

From the random forest shown above, we can conclude that shoeSize and score are the important factors deciding whether someone is a native speaker or not. Also, the model has an OOB error of only 1%, which means we can predict with about 99% accuracy.
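The error figure can be checked by hand from the confusion matrix printed above. The sketch below re-enters the four counts manually (it does not refit the forest) and assumes that rows hold the actual class and columns the predicted class; the names confusion, class.error and error.rate are chosen for this illustration.

```r
# Counts copied from the confusion matrix in the output above.
# Assumption: rows are the actual class, columns the predicted class.
confusion <- matrix(c(99, 1,
                       1, 99),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(actual = c("no", "yes"),
                                    predicted = c("no", "yes")))

# Per-class error: share of each row that lies off the diagonal.
class.error <- 1 - diag(confusion) / rowSums(confusion)
print(class.error)

# Overall error: share of all observations that lie off the diagonal.
error.rate <- 1 - sum(diag(confusion)) / sum(confusion)
print(error.rate)
```

Two of the 200 observations are misclassified, which gives the 1% error rate and hence roughly 99% accuracy.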

R – Survival Analysis

Survival analysis deals with predicting the time when a specific event is going to occur. It is also known as failure time analysis or analysis of time to death. Examples are predicting the number of days a person with cancer will survive, or predicting the time when a mechanical system is going to fail.

The R package named survival is used to carry out survival analysis. This package contains the function Surv(), which takes the input data and creates a survival object from the chosen variables. We then use the function survfit() on a formula containing that object to create a plot for the analysis.

Install Package

install.packages("survival")

Syntax

The basic syntax for creating survival analysis in R is −

Surv(time,event)
survfit(formula)

Following is the description of the parameters used −

  • time is the follow up time until the event occurs.

  • event indicates the status of occurrence of the expected event.

  • formula has the survival object on the left and the predictor variables on the right (use ~ 1 for a single curve).

Example

We will consider the data set named "pbc" present in the survival package installed above. It describes the survival data points of people affected with primary biliary cirrhosis (PBC) of the liver. Among the many columns present in the data set, we are primarily concerned with the fields "time" and "status". Time represents the number of days between registration of the patient and the earlier of two events: the patient receiving a liver transplant or the death of the patient. Status is coded 0 for censored, 1 for transplant and 2 for death.

# Load the library.
library("survival")

# Print first few rows.
print(head(pbc))

When we execute the above code, it produces the following result −

  id time status trt      age sex ascites hepato spiders edema bili chol
1  1  400      2   1 58.76523   f       1      1       1   1.0 14.5  261
2  2 4500      0   1 56.44627   f       0      1       1   0.0  1.1  302
3  3 1012      2   1 70.07255   m       0      0       0   0.5  1.4  176
4  4 1925      2   1 54.74059   f       0      1       1   0.5  1.8  244
5  5 1504      1   2 38.10541   f       0      1       1   0.0  3.4  279
6  6 2503      2   2 66.25873   f       0      1       0   0.0  0.8  248
  albumin copper alk.phos    ast trig platelet protime stage
1    2.60    156   1718.0 137.95  172      190    12.2     4
2    4.14     54   7394.8 113.52   88      221    10.6     3
3    3.48    210    516.0  96.10   55      151    12.0     4
4    2.54     64   6121.8  60.63   92      183    10.3     4
5    3.53    143    671.0 113.15   72      136    10.9     3
6    3.98     50    944.0  93.00   63       NA    11.0     3

From the above data we are considering time and status for our analysis.

Applying Surv() and survfit() Function

Now we proceed to apply the Surv() function to the above data set and create a plot that will show the trend.

# Load the library.
library("survival")

# Create the survival object. 
survfit(Surv(pbc$time,pbc$status == 2)~1)

# Give the chart file a name.
png(file = "survival.png")

# Plot the graph. 
plot(survfit(Surv(pbc$time,pbc$status == 2)~1))

# Save the file.
dev.off()

When we execute the above code, it produces the following result and chart −

Call: survfit(formula = Surv(pbc$time, pbc$status == 2) ~ 1)

      n  events  median 0.95LCL 0.95UCL 
    418     161    3395    3090    3853 

[Figure: Survival analysis using R]

The trend in the above graph helps us predict the probability of survival at the end of a certain number of days.
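Instead of reading a probability off the chart, the fitted curve can be queried at chosen time points with summary(). The sketch below asks for the estimated survival after one and five years; those two time points and the names fit and at.times are arbitrary choices for this illustration.

```r
# Load the library; it also provides the pbc data set.
library(survival)

# Fit a single survival curve, treating status == 2 (death) as the event.
fit <- survfit(Surv(time, status == 2) ~ 1, data = pbc)

# Ask for the estimated survival probability at 365 and 1825 days.
at.times <- summary(fit, times = c(365, 1825))

# Show the requested days next to the estimated probabilities.
print(data.frame(days = at.times$time, survival = at.times$surv))
```

The survival column holds the estimated proportion of patients still alive at each requested day, so it can only stay level or fall as the days increase.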

R – Chi Square Test

The chi-square test is a statistical method to determine if two categorical variables have a significant correlation between them. Both variables should be from the same population and they should be categorical, like Yes/No, Male/Female, Red/Green etc.

For example, we can build a data set with observations on people's ice-cream buying pattern and try to correlate the gender of a person with the flavour of the ice-cream they prefer. If a correlation is found, we can plan for an appropriate stock of flavours by knowing the number of people of each gender visiting.

Syntax

The function used for performing the chi-square test is chisq.test().

The basic syntax for creating a chi-square test in R is −

chisq.test(data)

Following is the description of the parameters used −

  • data is the data in form of a table containing the count value of the variables in the observation.

Example

We will take the Cars93 data in the "MASS" library, which represents the sales of different models of car in the year 1993.

library("MASS")
str(Cars93)

When we execute the above code, it produces the following result −

'data.frame':   93 obs. of  27 variables: 
 $ Manufacturer      : Factor w/ 32 levels "Acura","Audi",..: 1 1 2 2 3 4 4 4 4 5 ... 
 $ Model             : Factor w/ 93 levels "100","190E","240",..: 49 56 9 1 6 24 54 74 73 35 ... 
 $ Type              : Factor w/ 6 levels "Compact","Large",..: 4 3 1 3 3 3 2 2 3 2 ... 
 $ Min.Price         : num  12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ... 
 $ Price             : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ... 
 $ Max.Price         : num  18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ... 
 $ MPG.city          : int  25 18 20 19 22 22 19 16 19 16 ... 
 $ MPG.highway       : int  31 25 26 26 30 31 28 25 27 25 ... 
 $ AirBags           : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ... 
 $ DriveTrain        : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ... 
 $ Cylinders         : Factor w/ 6 levels "3","4","5","6",..: 2 4 4 4 2 2 4 4 4 5 ... 
 $ EngineSize        : num  1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ... 
 $ Horsepower        : int  140 200 172 172 208 110 170 180 170 200 ... 
 $ RPM               : int  6300 5500 5500 5500 5700 5200 4800 4000 4800 4100 ... 
 $ Rev.per.mile      : int  2890 2335 2280 2535 2545 2565 1570 1320 1690 1510 ... 
 $ Man.trans.avail   : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ... 
 $ Fuel.tank.capacity: num  13.2 18 16.9 21.1 21.1 16.4 18 23 18.8 18 ... 
 $ Passengers        : int  5 5 5 6 4 6 6 6 5 6 ... 
 $ Length            : int  177 195 180 193 186 189 200 216 198 206 ... 
 $ Wheelbase         : int  102 115 102 106 109 105 111 116 108 114 ... 
 $ Width             : int  68 71 67 70 69 69 74 78 73 73 ... 
 $ Turn.circle       : int  37 38 37 37 39 41 42 45 41 43 ... 
 $ Rear.seat.room    : num  26.5 30 28 31 27 28 30.5 30.5 26.5 35 ... 
 $ Luggage.room      : int  11 15 14 17 13 16 17 21 14 18 ... 
 $ Weight            : int  2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ... 
 $ Origin            : Factor w/ 2 levels "USA","non-USA": 2 2 2 2 2 1 1 1 1 1 ... 
 $ Make              : Factor w/ 93 levels "Acura Integra",..: 1 2 4 3 5 6 7 9 8 10 ... 

The above result shows the dataset has many Factor variables which can be considered as categorical variables. For our model we will consider the variables "AirBags" and "Type". Here we aim to find out any significant correlation between the types of car sold and the type of air bags it has. If a correlation is observed, we can estimate which types of cars can sell better with what types of air bags.

# Load the library.
library("MASS")

# Create a data frame from the main data set.
car.data <- data.frame(Cars93$AirBags, Cars93$Type)

# Create a table with the needed variables.
car.data <- table(Cars93$AirBags, Cars93$Type) 
print(car.data)

# Perform the Chi-Square test.
print(chisq.test(car.data))

When we execute the above code, it produces the following result −

                     Compact Large Midsize Small Sporty Van
  Driver & Passenger       2     4       7     0      3   0
  Driver only              9     7      11     5      8   3
  None                     5     0       4    16      3   6

        Pearson's Chi-squared test

data:  car.data
X-squared = 33.001, df = 10, p-value = 0.0002723

Warning message:
In chisq.test(car.data) : Chi-squared approximation may be inappropriate

Conclusion

The result shows a p-value of less than 0.05, which indicates a statistically significant association between the type of air bags and the type of car.
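The warning in the output is raised when some expected cell counts are small, which makes the chi-squared approximation less reliable. The sketch below looks at the expected counts and then repeats the test with a simulated p-value, one option chisq.test() offers for this situation. The seed, the number of replicates B and the names result and simulated are choices made for this illustration.

```r
# Load the library and rebuild the contingency table.
library(MASS)
car.data <- table(Cars93$AirBags, Cars93$Type)

# Run the test once, silencing the warning, to inspect expected counts.
result <- suppressWarnings(chisq.test(car.data))
print(round(result$expected, 2))

# Count the cells whose expected count is below 5.
print(sum(result$expected < 5))

# Repeat the test with a Monte Carlo p-value instead of the approximation.
set.seed(1)  # arbitrary seed for reproducibility
simulated <- chisq.test(car.data, simulate.p.value = TRUE, B = 2000)
print(simulated$p.value)
```

If the simulated p-value is also well below 0.05, the conclusion does not depend on the approximation that triggered the warning.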
