Importing and exporting data with Sqoop is straightforward. In the following sections we will look at three examples: importing, exporting and creating a Hive table. Before we get started, note that all the commands below will need to be executed from a command prompt at c:\Hadoop\sqoop-1.4.2\bin (note this location may change).

The connection to SQL Server takes place using the JDBC driver and requires a connection string passed in the following format:

jdbc:sqlserver://localhost;database=AdventureWorksDW2012;username=Hadoop;password=********

Using this connection string we can easily import the DimProductCategory table from the AdventureWorks database to HDFS (/user/Administrator/AdventureWorks). The minimum requirements to execute the import command are a connection, a table and a target HDFS directory:

sqoop import --connect "jdbc:sqlserver://localhost;database=AdventureWorksDW2012;username=Hadoop;password=********" --table DimProductCategory --target-dir /user/Administrator/AdventureWorks/ProductCategory

Note that after the connection, we only need to specify the SQL Server table and the target directory. The default output written to HDFS is a CSV file. The -m parameter allows us to control parallelism by setting the number of map tasks to use. The default behavior for this setting is to split the workload based on the primary key of the table; if this is not the desired behavior, you can configure the split column by passing the column name with the --split-by parameter, as sketched below.

Other commands that are supported include eval, list-databases and list-tables, which use a specified connection to execute queries and explore the structure of the relational data store. When creating a Hive table, if the table has not already been loaded to HDFS the command will run an import as well.
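As a minimal sketch of those parallelism options: the example below assumes ProductCategoryKey is the split column (a hypothetical choice; substitute any evenly distributed numeric column from your schema) and writes to a fresh target directory, since the import fails if the directory already exists.

sqoop import --connect "jdbc:sqlserver://localhost;database=AdventureWorksDW2012;username=Hadoop;password=********" --table DimProductCategory --target-dir /user/Administrator/AdventureWorks/ProductCategoryParallel -m 4 --split-by ProductCategoryKey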
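The eval, list-databases and list-tables commands reuse the same connection string. A quick sketch of each (the TOP 5 query is an illustrative SQL Server query, not taken from the original examples):

sqoop list-databases --connect "jdbc:sqlserver://localhost;username=Hadoop;password=********"
sqoop list-tables --connect "jdbc:sqlserver://localhost;database=AdventureWorksDW2012;username=Hadoop;password=********"
sqoop eval --connect "jdbc:sqlserver://localhost;database=AdventureWorksDW2012;username=Hadoop;password=********" --query "SELECT TOP 5 * FROM DimProductCategory"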
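For the Hive case, one way to get that behavior is Sqoop's --hive-import option, which imports the table into HDFS and then loads it into Hive, creating the Hive table definition along the way. A hedged sketch, assuming the same connection string; the Hive example the article builds up to may differ:

sqoop import --connect "jdbc:sqlserver://localhost;database=AdventureWorksDW2012;username=Hadoop;password=********" --table DimProductCategory --hive-import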