Top 20 R Programming Interview Questions and Answers
1. Explain data import in R.
One way to import data in R is R Commander (the Rcmdr package). To start the R Commander GUI, load the package by typing library(Rcmdr) into the console. R Commander offers three ways to import data:
- Users can select the data set in the dialog box or enter its name (if they know it).
- Data can also be entered directly in the R Commander editor via Data -> New Data Set. This works well only when the data set is not too large.
- Data can also be imported from a URL, from a plain text (ASCII) file, from another statistical package or from the clipboard.
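The same kinds of sources can also be read from the console without the GUI. The sketch below is a minimal example, assuming only base R's read.table() and read.delim(); the temporary file and its contents are made up for illustration, and the URL in the comment is hypothetical.

```r
# Whitespace-separated plain text, read from a string instead of a file
df1 <- read.table(text = "x y\n1 2\n3 4", header = TRUE)

# Tab-separated text written to a temporary file, then read back
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tscore", "1\t90", "2\t85"), tmp)
df2 <- read.delim(tmp)   # read.delim() assumes a header row and tab separators
str(df2)

# Data behind a URL can be read the same way, for example:
# read.csv("https://example.com/data.csv")   # hypothetical URL
```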
2. Two vectors X and Y are defined as follows: X <- c(3, 2, 4) and Y <- c(1, 2). What will be the output of the vector Z defined as Z <- X*Y?
When vectors have different lengths, R recycles the shorter vector: its elements are reused from the start until every element of the longer vector has been multiplied. Here Y is effectively extended to c(1, 2, 1).
The output is therefore Z = c(3, 4, 4). R also prints a warning, because the length of X (3) is not a multiple of the length of Y (2).
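A quick way to confirm this at the console is shown below. It is a small sketch that uses base R's withCallingHandlers() to print the recycling warning as a message and then continue, so the result is still assigned.

```r
X <- c(3, 2, 4)
Y <- c(1, 2)
# Y is recycled to c(1, 2, 1); R warns because 3 is not a multiple of 2
Z <- withCallingHandlers(
  X * Y,
  warning = function(w) {
    message("Warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")   # keep going after reporting the warning
  }
)
print(Z)
```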
3. How are missing values and impossible values represented in R?
NaN (Not a Number) is used to represent impossible values, such as the result of 0/0, whereas NA (Not Available) is used to represent missing values. A good answer should also mention that simply deleting missing values is not a good idea, because a missing value may be caused by a problem with data collection, the code or the query. It is better to find the root cause of the missing values first and then take the necessary steps to handle them.
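The difference between the two shows up when testing for them. This is a short sketch with a made-up vector: is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN, and na.rm = TRUE drops both before computing a summary.

```r
x <- c(1, NA, 0 / 0, 4)   # 0 / 0 produces NaN
is.na(x)                  # FALSE TRUE TRUE FALSE: NaN also counts as missing
is.nan(x)                 # FALSE FALSE TRUE FALSE: only the real NaN
mean(x)                   # missing values propagate, so this is not a number
mean(x, na.rm = TRUE)     # 2.5: NA and NaN are removed before averaging
sum(is.na(x))             # count missing entries before deciding how to handle them
```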
4. R has several packages for solving a particular problem. How do you decide which one is best to use?
The CRAN package ecosystem has more than 6,000 packages. The best way for beginners to answer this question is to say that they would look for a package that follows good software development principles. The next step would be to read user reviews and find out whether other data scientists or analysts have used it to solve similar problems.
5. What is the best way to communicate the results of data analysis in R?
A good way to do this is to combine the data, code and analysis results in a single document, using knitr for reproducible research. This lets others verify the findings, build on them and take part in discussions. Reproducible research also makes it easy to rerun the analysis with new data or apply it to a different problem.
6. How many data structures does R have?
R's data structures fall into two groups: homogeneous and heterogeneous. Homogeneous data structures hold objects of a single type: vectors, matrices and arrays. Heterogeneous data structures can hold objects of different types: data frames and lists.
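The split between the two groups is easy to see by mixing types. In this small sketch (all values are arbitrary), c() coerces everything in a vector to one common type, while a list and a data frame keep each element's or column's own type. stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
v  <- c(1, "a", TRUE)                    # homogeneous: all coerced to character
m  <- matrix(1:6, nrow = 2)              # homogeneous, two dimensions
a  <- array(1:24, dim = c(2, 3, 4))      # homogeneous, any number of dimensions
l  <- list(1, "a", TRUE)                 # heterogeneous: each element keeps its type
df <- data.frame(id = 1:2, name = c("x", "y"),
                 stringsAsFactors = FALSE)  # heterogeneous columns
class(v)          # "character"
sapply(l, class)  # "numeric" "character" "logical"
sapply(df, class) # id: "integer", name: "character"
```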
7. Explain the significance of transpose in R.
The transpose function t() is one of the easiest ways to reshape data before analysis: it turns the rows of a matrix or data frame into columns and the columns into rows.
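A minimal sketch with arbitrary values: the dimensions swap, element [i, j] moves to [j, i], and a data frame is converted to a matrix as part of the transpose.

```r
m  <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns
tm <- t(m)                    # 3 rows, 2 columns
dim(m)
dim(tm)
tm[1, 2] == m[2, 1]           # TRUE: element [i, j] becomes [j, i]

df <- data.frame(a = 1:2, b = 3:4)
t(df)                         # note: the result is a matrix, not a data frame
```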
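8. What are the with() and by() functions used for?
The with() function evaluates an expression using the columns of a given dataset directly, and the by() function applies a function to the subset of data for each level of one or more factors. Note that R is case-sensitive, so the functions must be written as with() and by().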
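Both functions are shown below on a hypothetical four-row data frame. with() lets the expression refer to value instead of df$value; by() passes each factor level's rows to the supplied function as a smaller data frame.

```r
# Hypothetical example data: one factor column and one numeric column
df <- data.frame(group = factor(c("a", "a", "b", "b")),
                 value = c(1, 3, 10, 20))

# with(): evaluate an expression using the columns directly, without df$
total <- with(df, sum(value))

# by(): apply a function to the rows belonging to each level of the factor
res <- by(df, df$group, function(d) mean(d$value))
res   # group a: 2, group b: 15
```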
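9. What are the different types of sorting algorithms available in R?
Common sorting algorithms that can be implemented in R include:
- Bucket Sort
- Selection Sort
- Quick Sort
- Bubble Sort
- Merge Sort
Base R's built-in sort() function itself uses radix sort, quicksort or shell sort, which can be chosen with its method argument.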
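As an illustration of implementing one of these by hand, here is a minimal recursive merge sort sketch for numeric vectors. merge_sort() is a hypothetical helper written for this example, not how base R sorts; the last two lines compare it with base sort() using two of its method options.

```r
# Hypothetical helper: recursive merge sort for a numeric vector
merge_sort <- function(x) {
  if (length(x) <= 1) return(x)
  mid   <- length(x) %/% 2
  left  <- merge_sort(x[seq_len(mid)])
  right <- merge_sort(x[(mid + 1):length(x)])
  # Merge the two sorted halves by repeatedly taking the smaller head element
  out <- numeric(length(x))
  i <- 1
  j <- 1
  for (k in seq_along(out)) {
    if (j > length(right) || (i <= length(left) && left[i] <= right[j])) {
      out[k] <- left[i]
      i <- i + 1
    } else {
      out[k] <- right[j]
      j <- j + 1
    }
  }
  out
}

x <- c(5, 2, 9, 1, 7)
merge_sort(x)
sort(x, method = "quick")   # base R quicksort
sort(x, method = "radix")   # base R radix sort
```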
10. What is the best way to use Hadoop and R together for analysis?
HDFS can be used for long-term storage of the data. MapReduce jobs submitted from Oozie, Pig or Hive can then encode, enrich and sample the data sets from HDFS into R. This lets R carry out complex analysis on the prepared subset of the data.
11. What will be the output of log(-5.8) when executed on the R console?
R returns NaN (Not a Number) and displays the warning "NaNs produced", because the logarithm of a negative number is not defined for real numbers.
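The sketch below reproduces this and captures the warning text with withCallingHandlers(). As a side note, converting the input with as.complex() gives a defined complex result instead, because the logarithm exists in the complex plane.

```r
res <- withCallingHandlers(
  log(-5.8),
  warning = function(w) {
    message("Warning caught: ", conditionMessage(w))   # "NaNs produced"
    invokeRestart("muffleWarning")
  }
)
is.nan(res)   # TRUE

# In the complex plane the result is log(5.8) + pi*i
log(as.complex(-5.8))
```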
12. What is the difference between a data frame and a matrix in R?
A data frame can contain heterogeneous data, while a matrix cannot. A matrix stores elements of only one data type, whereas different columns of a data frame can hold different data types, such as characters, integers or even other data frames.
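A small sketch with made-up values shows the practical consequence: combining numbers and strings into a matrix coerces everything to character, while a data frame keeps each column's type. stringsAsFactors = FALSE is set explicitly so the output does not depend on the R version.

```r
m <- cbind(id = 1:2, name = c("x", "y"))
typeof(m)                  # "character": the integers were coerced

df <- data.frame(id = 1:2, name = c("x", "y"), stringsAsFactors = FALSE)
sapply(df, typeof)         # id: "integer", name: "character"
```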
13. What are factor variables in R?
Factor variables are categorical variables that hold either string or numeric values as a fixed set of levels. Factor variables are used in various types of graphics and particularly in statistical modelling, where they are assigned the correct number of degrees of freedom.
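The sketch below (with made-up values) shows how a factor stores integer codes mapped to levels. It also shows a common pitfall: a factor of numbers should be converted through its labels with as.character() first, because as.numeric() alone returns the level codes.

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))
levels(sizes)       # "small" "medium" "large"
nlevels(sizes)      # 3
as.integer(sizes)   # 1 3 2 1: the underlying level codes
table(sizes)        # counts per level

# Numeric-looking factor: convert via the labels, not the codes
f <- factor(c("10", "20", "10"))
as.numeric(as.character(f))   # 10 20 10
```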
14. What is meant by k-nearest neighbour?
k-nearest neighbour (k-NN) is one of the simplest machine learning classification algorithms. It is a supervised learning method based on lazy learning: the function is only approximated locally, and all computation is deferred until a new point has to be classified, at which point it is assigned the most common class among its k nearest training points.
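To make the "lazy" part concrete, here is a minimal base R sketch, assuming Euclidean distance and a simple majority vote. knn_predict() and the toy training points are hypothetical; for real work, the recommended class package that ships with R provides class::knn().

```r
# Hypothetical helper: classify one query point by majority vote of its k neighbours
knn_predict <- function(train, labels, query, k = 3) {
  # No model is fitted in advance: distances are computed only at prediction time
  d <- sqrt(rowSums(sweep(train, 2, query)^2))   # Euclidean distance to each row
  nearest <- order(d)[seq_len(k)]
  votes <- table(labels[nearest])
  # Majority vote; ties go to the first level in table order
  names(votes)[which.max(votes)]
}

train <- matrix(c(1, 1,
                  1, 2,
                  2, 1,
                  8, 8,
                  8, 9,
                  9, 8), ncol = 2, byrow = TRUE)
labels <- factor(c("a", "a", "a", "b", "b", "b"))

knn_predict(train, labels, c(1.5, 1.5))   # "a"
knn_predict(train, labels, c(8.5, 8.5))   # "b"
```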
15. Suppose you want to find all the values in c(1, 3, 5, 7, 10) that are not in c(1, 5, 10, 12, 14). Which built-in function in R can be used to do this? Also, how can this be achieved without using the built-in function?
Using the built-in function: setdiff(c(1, 3, 5, 7, 10), c(1, 5, 10, 12, 14))
Without using the built-in function: c(1, 3, 5, 7, 10)[!c(1, 3, 5, 7, 10) %in% c(1, 5, 10, 12, 14)]
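Both approaches return 3 and 7. The sketch below stores the vectors in variables for readability and adds one difference worth knowing: setdiff() also removes duplicates, while the %in% filter keeps them.

```r
x <- c(1, 3, 5, 7, 10)
y <- c(1, 5, 10, 12, 14)
setdiff(x, y)      # 3 7
x[!x %in% y]       # 3 7

# With repeated values the two approaches differ
x2 <- c(3, 3, 7)
setdiff(x2, y)     # 3 7   (duplicates removed)
x2[!x2 %in% y]     # 3 3 7 (duplicates kept)
```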
16. Differentiate between lapply and sapply.
sapply() tries to simplify its output to a vector or matrix, whereas lapply() always returns a list. There is one more function, vapply(), which is often preferred over sapply() because it lets the programmer specify the output type. The disadvantage of vapply() is that it is more verbose, since the expected type and length of each result must be supplied through its FUN.VALUE argument.
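A short sketch with a made-up list shows the three return shapes. The last line demonstrates why vapply() is considered safer: when a result does not match FUN.VALUE it raises an error instead of silently returning a different shape.

```r
x <- list(a = 1:3, b = 4:6)
lapply(x, sum)                        # a list: $a 6, $b 15
sapply(x, sum)                        # simplified to a named vector: a 6, b 15
sapply(x, range)                      # simplified to a 2 x 2 matrix
vapply(x, sum, FUN.VALUE = integer(1))  # same vector, with the type checked

# range() returns two values, so a length-one template is rejected
err <- tryCatch(vapply(x, range, FUN.VALUE = integer(1)),
                error = function(e) conditionMessage(e))
err
```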
17. How will you read a .csv file in R?
The read.csv() function is used to read a .csv file in R.
Below is a simple example, assuming sample.csv is in the working directory:
filecontent <- read.csv("sample.csv")
print(filecontent)
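A self-contained variant is sketched below. It first writes a small, made-up sample.csv to a temporary directory so it runs anywhere, then reads it with read.csv(). The na.strings argument treats both "NA" and empty fields as missing values.

```r
# Create a small CSV first so the example runs anywhere (contents are made up)
path <- file.path(tempdir(), "sample.csv")
writeLines(c("name,age,city",
             "Asha,31,Pune",
             "Ravi,NA,",
             "Meera,27,Delhi"), path)

filecontent <- read.csv(path, na.strings = c("NA", ""))
print(filecontent)
str(filecontent)   # age is read as integer, name and city as character
```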
18. What do you understand by element recycling in R?
If an operation is performed on two vectors of different lengths, the elements of the shorter vector are reused to complete the operation. This is referred to as element recycling.
Example: if A <- c(1, 2, 0, 4) and B <- c(3, 6), then the result of A * B is c(3, 12, 0, 24). Here the elements 3 and 6 of B are repeated when computing the result.
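The sketch below checks the example and shows that recycling is equivalent to extending the shorter vector explicitly with base R's rep_len(). A single number is the most common case of recycling: it is reused for every element.

```r
A <- c(1, 2, 0, 4)
B <- c(3, 6)
A * B                       # 3 12 0 24
A * rep_len(B, length(A))   # same result: B is extended to c(3, 6, 3, 6)
A + 10                      # a length-one vector is reused: 11 12 10 14
```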
19. What is the use of the sample() and subset() functions in R?
The sample() function selects a random sample of size n from a vector, for example a random set of row numbers from a large dataset.
The subset() function selects observations (rows) that meet a condition and, optionally, variables (columns) from a given dataset.
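Both functions are used below on a hypothetical 10-row data frame. set.seed() only makes the random draw reproducible. sample() draws without replacement by default; pass replace = TRUE to draw with replacement.

```r
# Hypothetical example data
df <- data.frame(id = 1:10,
                 group = rep(c("a", "b"), 5),
                 score = c(55, 72, 90, 41, 68, 83, 77, 95, 60, 88))

set.seed(1)                          # make the random sample reproducible
rows <- sample(nrow(df), size = 3)   # 3 distinct random row numbers
df_sample <- df[rows, ]

# subset(): rows with a score of at least 80, keeping only two columns
high <- subset(df, score >= 80, select = c(id, score))
high
```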
20. How will you create scatterplot matrices in R?
A matrix of scatterplots can be produced with the pairs() function, which accepts arguments such as formula, data, subset and labels.
The two key arguments when using the formula interface are:
- formula: a one-sided formula such as ~ a + b + c. Each term gives a separate variable in the pairs plot, and the terms should be numeric vectors. Together they specify the set of variables used in pairs.
- data: the dataset from which the variables are taken to build the scatterplot matrix.
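A minimal sketch using R's built-in mtcars dataset is shown below. The formula picks three numeric columns and subset keeps only 4-cylinder cars. The plot is written to a temporary PDF file (an arbitrary choice) so the example also runs non-interactively; in an interactive session the pdf() and dev.off() lines can be dropped.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)   # draw to a file instead of the screen
pairs(~ mpg + disp + hp, data = mtcars, subset = cyl == 4,
      main = "4-cylinder cars")
invisible(dev.off())
```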