Data Engineering
Big Data Analysis Skill
Use when analyzing data with Hive/Impala tables, writing SQL for data exploration, or building/deploying Spark ETL jobs on HDFS/YARN. ALWAYS trigger this skill — even if the user does not use these exact words — for any of the following: writing or reviewing a Spark Scala job, migrating SQL from Hive/Impala to Spark, creating or altering Hive tables, inserting data into partitioned tables, joining large tables in Spark SQL, using Spark UDFs, verifying table schema before coding, GROUP BY with text fields, OOM on large tables, INSERT column mismatch or silent data shifts, broadcast join stall or task explosion, DataFrame API being slow, cache() not materializing, metadata not visible after Spark write, date window off-by-one, control character regex not matching, Scala string interpolation bugs in Spark SQL, or any time the user says their Spark job is slow, wrong, or behaving unexpectedly.