Pandas Interview Questions & Answers


Basics of Pandas

  1. What is Pandas?

    • Answer: Pandas is an open-source data manipulation library for Python. It provides data structures such as Series and DataFrame for efficient data analysis and manipulation.
  2. How do you install Pandas?

    • Answer: You can install Pandas using the following command:
      pip install pandas
      
  3. What are the key data structures in Pandas?

    • Answer: The two key data structures in Pandas are Series (1D labeled array) and DataFrame (2D labeled table).
  4. How do you create a Series in Pandas?

    • Answer: You can create a Series using the pd.Series() constructor:
      import pandas as pd
      s = pd.Series([1, 2, 3, 4])
      
  5. How do you create a DataFrame in Pandas?

    • Answer: You can create a DataFrame using the pd.DataFrame() constructor:
      import pandas as pd
      data = {'Name': ['John', 'Alice'], 'Age': [25, 30]}
      df = pd.DataFrame(data)
      
  6. Explain the difference between loc and iloc.

    • Answer: loc selects by label, and iloc selects by integer position. When slicing, loc includes the end label, while iloc excludes the end position (like standard Python slicing).
  7. What is the purpose of the head() and tail() methods in Pandas?

    • Answer: head(n) returns the first n rows of a DataFrame, and tail(n) returns the last n rows.
  8. How do you check for missing values in a DataFrame?

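    • Answer: You can use the isnull() method (or its alias isna()), which returns a boolean mask with the same shape as the DataFrame. Chain sum() to count the missing values in each column:
      df.isnull()
      df.isnull().sum()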
      
  9. Explain the role of the fillna() method in Pandas.

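    • Answer: The fillna() method fills missing values in a DataFrame or Series with a specified value (a scalar, or a dict that maps columns to values). For forward or backward filling, recent pandas versions favor the dedicated ffill() and bfill() methods over fillna(method=...).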
  10. How do you drop missing values in a DataFrame?

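    • Answer: You can use the dropna() method, which returns a new DataFrame without the rows that contain missing values (pass axis=1 to drop columns instead):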
      df.dropna()
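
The basics above can be tried together in one short script. This is a minimal sketch, not a definitive recipe: the sample names, ages, and index labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing age and string index labels.
df = pd.DataFrame(
    {"Name": ["John", "Alice", "Bob", "Carol"], "Age": [25, 30, np.nan, 41]},
    index=["a", "b", "c", "d"],
)

# loc slices by label and includes the end label: rows a, b, and c.
by_label = df.loc["a":"c"]

# iloc slices by position and excludes the end position: rows 0 and 1.
by_position = df.iloc[0:2]

# Count the missing values in each column.
missing_per_column = df.isnull().sum()

# Either fill the gap with a fixed value or drop the incomplete row.
filled = df.fillna({"Age": 0})
dropped = df.dropna()

print(by_label)
print(by_position)
print(missing_per_column)
```

Both fillna() and dropna() return new objects here, so df itself still contains the missing value afterwards.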
      

Indexing and Selection

  1. What is the purpose of the set_index() method in Pandas?

    • Answer: The set_index() method is used to set a specific column as the index of a DataFrame.
  2. How can you reset the index of a DataFrame?

    • Answer: You can use the reset_index() method. Passing drop=True discards the old index instead of adding it back as a column:
      df.reset_index(drop=True)
      
  3. What is the difference between DataFrame and Series?

    • Answer: A DataFrame is a 2D table with rows and columns, while a Series is a 1D labeled array. Each column in a DataFrame is a Series.
  4. How do you select a single column from a DataFrame?

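    • Answer: You can use either square brackets or dot notation. Square brackets work for any column name; dot notation only works when the name is a valid Python identifier and does not clash with a DataFrame attribute:
      df['ColumnName']
      df.ColumnName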
      
  5. How can you select multiple columns from a DataFrame?

    • Answer: You can pass a list of column names inside square brackets:
      df[['Column1', 'Column2']]
      
  6. What is the purpose of the loc method for selection in Pandas?

    • Answer: The loc method is used for label-based indexing. It allows you to select data by specifying rows and columns using labels.
  7. Explain the use of boolean indexing in Pandas.

    • Answer: Boolean indexing allows you to filter a DataFrame based on a condition, resulting in a DataFrame containing only the rows that satisfy the condition.
  8. How do you select rows and columns using the iloc method?

    • Answer: You can use integer-based indexing:
      df.iloc[1:3, 0:2]
      
  9. What is the purpose of the isin() method in Pandas?

    • Answer: The isin() method is used to filter data based on a list of values. It returns a boolean mask indicating whether each element is in the specified list.
  10. How can you apply a custom function to each element of a DataFrame?

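    • Answer: For a single column, you can use the Series apply() method. To apply a function to every element of a whole DataFrame, use DataFrame.map() (named applymap() before pandas 2.1):
      df['Column'].apply(custom_function)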
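
The selection techniques in this section can be combined as in the sketch below. The column names, the sample rows, and the age_group helper are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "Name": ["John", "Alice", "Bob"],
        "Age": [25, 30, 35],
        "City": ["Oslo", "Paris", "Rome"],
    }
)

# Boolean indexing: keep only the rows where the condition is True.
over_28 = df[df["Age"] > 28]

# isin() builds a boolean mask from a list of allowed values.
in_cities = df[df["City"].isin(["Oslo", "Rome"])]

# set_index() makes a column the row labels, so loc can look rows up by name.
indexed = df.set_index("Name")
alice_age = indexed.loc["Alice", "Age"]

# reset_index() moves the index back into an ordinary column.
restored = indexed.reset_index()


def age_group(age):
    """Hypothetical helper that labels an age."""
    return "30+" if age >= 30 else "under 30"


# apply() calls the function once per element of the column.
df["AgeGroup"] = df["Age"].apply(age_group)

print(over_28)
print(in_cities)
print(df)
```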
      

Data Cleaning and Manipulation

  1. What is the purpose of the groupby() method in Pandas?

    • Answer: The groupby() method is used for grouping data based on one or more columns. It is often followed by an aggregation function.
  2. How do you rename columns in a DataFrame?

    • Answer: You can use the rename() method:
      df.rename(columns={'OldName': 'NewName'}, inplace=True)
      
  3. Explain the use of the merge() function in Pandas.

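    • Answer: The merge() function combines two DataFrames with a SQL-style join on one or more common columns or on the index. The how parameter selects the join type, such as inner, left, right, or outer.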
  4. How can you concatenate two DataFrames vertically?

    • Answer: You can use the concat() function with axis=0:
      pd.concat([df1, df2], axis=0)
      
  5. What is the purpose of the pivot_table() function in Pandas?

    • Answer: The pivot_table() function creates a spreadsheet-style pivot table: it groups rows by the specified index and column keys and summarizes the values with an aggregation function (the mean by default).
  6. How do you handle duplicates in a DataFrame?

    • Answer: You can use the drop_duplicates() method:
      df.drop_duplicates()
      
  7. What is the use of the replace() method in Pandas?

    • Answer: The replace() method is used to replace specific values in a DataFrame with other values.
  8. How can you handle outliers in a numerical column?

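    • Answer: A common approach is the interquartile range (IQR) method: compute the first and third quartiles (Q1 and Q3), set IQR = Q3 - Q1, and treat values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR as outliers to filter out, cap, or inspect.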
  9. What is the purpose of the astype() method in Pandas?

    • Answer: The astype() method is used to cast a Pandas object (e.g., a column in a DataFrame) to a specified data type.
  10. Explain the use of the cut() function in Pandas.

    • Answer: The cut() function is used to segment and sort data values into bins. It is often used for binning numerical data into discrete intervals.
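
The grouping, merging, outlier, and binning answers above can be illustrated with one small script. It is a sketch under assumed data: the sales and managers tables, the 1.5 multiplier, and the bin edges are illustrative choices, not fixed rules.

```python
import pandas as pd

# Hypothetical sample data; the 100 is a deliberate outlier.
sales = pd.DataFrame(
    {
        "Region": ["North", "North", "South", "South", "South"],
        "Amount": [10, 12, 11, 13, 100],
    }
)
managers = pd.DataFrame({"Region": ["North", "South"], "Manager": ["Ann", "Raj"]})

# groupby() followed by an aggregation: total amount per region.
totals = sales.groupby("Region")["Amount"].sum()

# merge() with a left join keeps every sales row and adds the manager.
merged = sales.merge(managers, on="Region", how="left")

# IQR method: keep values within 1.5 * IQR of the quartiles.
q1 = sales["Amount"].quantile(0.25)
q3 = sales["Amount"].quantile(0.75)
iqr = q3 - q1
within_bounds = sales["Amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
no_outliers = sales[within_bounds]

# cut() assigns each value to a labeled interval; bins are right-inclusive.
sales["Band"] = pd.cut(
    sales["Amount"], bins=[0, 11, 50, 200], labels=["low", "mid", "high"]
)

print(totals)
print(merged)
print(no_outliers)
print(sales)
```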

Data Analysis and Aggregation

  1. How do you calculate the mean, median, and standard deviation of a numerical column?

    • Answer: You can use the mean(), median(), and std() methods:
      df['NumericColumn'].mean()
      df['NumericColumn'].median()
      df['NumericColumn'].std()
      
  2. Explain the use of the value_counts() method in Pandas.

    • Answer: The value_counts() method counts the occurrences of each unique value in a column and returns a Series of counts, sorted in descending order by default.
  3. What is the purpose of the describe() method in Pandas?

    • Answer: The describe() method provides descriptive statistics (count, mean, std, min, 25%, 50%, 75%, max) for numerical columns in a DataFrame.
  4. How can you perform element-wise operations on two DataFrames?

    • Answer: You can use arithmetic operators (+, -, *, /) or the add(), sub(), mul(), and div() methods. The operands are aligned on their row and column labels first; the methods also accept a fill_value for labels that are missing on one side.
  5. Explain the use of the pivot() method in Pandas.

    • Answer: The pivot() method reshapes a DataFrame from long to wide format using the specified index, columns, and values. Unlike pivot_table(), it does not aggregate, so it raises an error if an index/column pair appears more than once.
  6. What is the purpose of the agg() function in Pandas?

    • Answer: The agg() function applies one or more aggregation operations (for example, 'mean' and 'max') to specified columns, often after a groupby().
  7. How do you handle datetime data in Pandas?

    • Answer: You can use the pd.to_datetime() function to convert a column to a datetime type, and then use the .dt accessor (for example, .dt.year) and other datetime-related methods.
  8. What is the role of the resample() method in Pandas?

    • Answer: The resample() method is used for frequency conversion and resampling of time series data, such as aggregating daily values into weekly totals. It requires a datetime-like index (or a datetime column passed through the on parameter).
  9. How can you create a new column based on conditions in Pandas?

    • Answer: You can use the np.where() function, assign through loc with a boolean mask, or use the apply() method with a custom function.
  10. Explain the use of the cumsum() and cumprod() methods in Pandas.

    • Answer: The cumsum() method calculates the running (cumulative) sum of a numerical column, and cumprod() calculates the running product.
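
Several of the analysis methods above are shown together in the sketch below. The dates, categories, values, and the threshold of 25 are hypothetical sample choices.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; dates start out as plain strings.
df = pd.DataFrame(
    {
        "Date": ["2024-01-01", "2024-01-02", "2024-01-08", "2024-01-09"],
        "Category": ["A", "B", "A", "A"],
        "Value": [10, 20, 30, 40],
    }
)

# value_counts(): how often each category occurs.
counts = df["Category"].value_counts()

# agg(): several aggregations per group in one call.
summary = df.groupby("Category")["Value"].agg(["mean", "max"])

# to_datetime() converts the strings so that time-based methods work.
df["Date"] = pd.to_datetime(df["Date"])

# resample("W"): sum the values in each week (weeks end on Sunday).
weekly = df.set_index("Date")["Value"].resample("W").sum()

# np.where(): a new column derived from a condition.
df["Size"] = np.where(df["Value"] >= 25, "large", "small")

# cumsum(): running total of the values.
df["RunningTotal"] = df["Value"].cumsum()

print(counts)
print(summary)
print(weekly)
print(df)
```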

Data Visualization with Pandas

  1. How can you create a simple line plot using Pandas?

    • Answer: You can use the plot() method:
      df['NumericColumn'].plot(kind='line')
      
  2. What is the purpose of the hist() method in Pandas?

    • Answer: The hist() method is used to create a histogram of a numerical column.
  3. Explain the use of the boxplot() method in Pandas.

    • Answer: The boxplot() method creates a box and whisker plot for visualizing the distribution of a numerical column.
  4. How can you create a bar chart in Pandas?

    • Answer: You can use the plot() method with kind='bar':
      df['CategoryColumn'].value_counts().plot(kind='bar')
      
  5. What is the purpose of the scatter() method in Pandas?

    • Answer: Scatter plots are created with df.plot.scatter(x='Column1', y='Column2') (or plot(kind='scatter', ...)), which plots one numerical column against another.
  6. How do you set labels and titles in Pandas plots?

    • Answer: The plot() method returns a Matplotlib Axes object, so you can use its set_xlabel(), set_ylabel(), and set_title() methods:
      plot = df['NumericColumn'].plot()
      plot.set_xlabel('X-axis Label')
      plot.set_ylabel('Y-axis Label')
      plot.set_title('Plot Title')
      
  7. How can you save a Pandas plot to a file?

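    • Answer: You can get the Matplotlib figure from the Axes object returned by plot() and call its savefig() method:
      plot = df['NumericColumn'].plot()
      plot.get_figure().savefig('plot.png')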
      
  8. Explain the use of the style attribute in Pandas plots.

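    • Answer: In plotting, style is a parameter of the plot() method. It accepts Matplotlib format strings (for example, 'r--' for a red dashed line) that set the color, marker style, and line style, either for all columns or per column.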
  9. How do you create a heatmap in Pandas?

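    • Answer: Pandas has no built-in heatmap() method. You typically pass a DataFrame (for example, the correlation matrix from df.corr()) to the sns.heatmap() function from the Seaborn library, or use df.style.background_gradient() to color the cells of a displayed table.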
  10. What is the purpose of the subplots() method in Pandas?

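    • Answer: Pandas itself has no subplots() method. Passing subplots=True to plot() draws each column in its own subplot, and Matplotlib's plt.subplots() function creates a figure with a grid of axes that you can pass to plot() through the ax parameter.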
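
The plotting answers above can be combined as in the following sketch. It assumes Matplotlib is installed; the Agg backend, the sample data, and the temporary output path are illustrative choices so that the script runs without a display.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 8, 16]})

# plt.subplots() creates one figure with two side-by-side axes.
fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))

# Line plot on the left axes; style="r--" means a red dashed line.
df.plot(x="x", y="y", kind="line", style="r--", ax=left)
left.set_xlabel("x")
left.set_ylabel("y")
left.set_title("Line plot")

# Scatter plot of the same columns on the right axes.
df.plot.scatter(x="x", y="y", ax=right)
right.set_title("Scatter plot")

# Save the whole figure to a file in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "plot.png")
fig.savefig(out_path)
plt.close(fig)

print(out_path)
```

Passing ax= places each Pandas plot on an existing Matplotlib axes, which is what allows the two charts to share one figure and one saved file.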