GitHub - intentmedia/pig-annotations

pig-annotations

pig-annotations is a class library that makes it easy to load your custom-serialized Java objects into Pig as proper Pig tuples with a well-defined schema.

Should I use pig-annotations?

pig-annotations has a rather narrow scope. You should probably use it only if all of the following are true:

You use Java. pig-annotations is a Java library.
You use Pig.
You already have a custom means of serializing your Java objects in a line-based text format (such as JSON).

How do I use pig-annotations?

Using pig-annotations is straightforward.

You will need to provide an implementation of RecordInflater that converts a line of text into your Java object. This implementation must have a public no-arg constructor.
You will need to annotate your object's fields to specify how each one is converted into a value within a tuple.
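
The no-arg constructor requirement suggests that the inflater is created reflectively from its class. The following is a minimal, self-contained sketch of that idea and not the library's actual code: `LineInflater`, `CsvPairInflater`, and `InflaterFactory` are hypothetical stand-ins, and the sketch takes a plain `String` where RecordInflater takes Hadoop's `Text`.

```java
// Hypothetical stand-in for RecordInflater<T>; takes a String instead of Hadoop's Text.
interface LineInflater<T> {
    T convert(String line);
}

// Hypothetical inflater that splits "left,right" into a two-element array.
class CsvPairInflater implements LineInflater<String[]> {
    // A public no-arg constructor is what makes reflective creation possible.
    public CsvPairInflater() {
    }

    @Override
    public String[] convert(String line) {
        String[] parts = line.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected two columns but got: " + line);
        }
        return parts;
    }
}

class InflaterFactory {
    // Creates an inflater from its class alone; this fails without a no-arg constructor.
    static <T> LineInflater<T> create(Class<? extends LineInflater<T>> type) {
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }
}
```

Calling `InflaterFactory.create(CsvPairInflater.class)` yields an inflater without the caller ever invoking a constructor directly, which is why a class that only offers constructors with arguments could not be used this way.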

Example

It is probably easiest to demonstrate via an example.

Let's say you have a Person class that has two fields, age and gender.

package com.intentmedia.examples;

public class Person {
    private String gender;
    private Integer age;

    // getters, setters, etc.
}

You need to tell pig-annotations how to transform each field.

package com.intentmedia.examples;

import com.intentmedia.pig.PigField;

import static org.apache.pig.data.DataType.CHARARRAY;
import static org.apache.pig.data.DataType.INTEGER;

public class Person {

    @PigField(name = "gender", type = CHARARRAY)
    private String gender;


    @PigField(name = "age", type = INTEGER)
    private Integer age;

    // getters, setters, etc.
}

For each field, you supply a name and the Pig data type to map it to.
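
To make the name-and-type mapping concrete, here is a self-contained sketch of how field annotations can be read reflectively to describe a schema. It is an illustration under assumptions, not the library's implementation: `SketchField`, `SketchPerson`, and `SchemaSketch` are hypothetical, and the Pig type is written as a plain string rather than a `DataType` constant.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for @PigField; the type is a plain Pig type name here.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SketchField {
    String name();

    String type();
}

// Hypothetical record class mirroring the Person example.
class SketchPerson {
    @SketchField(name = "gender", type = "chararray")
    private String gender;

    @SketchField(name = "age", type = "int")
    private Integer age;
}

class SchemaSketch {
    // Collects a "name:type" entry for every annotated field of the given class.
    static List<String> describe(Class<?> type) {
        List<String> columns = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            SketchField annotation = field.getAnnotation(SketchField.class);
            if (annotation != null) {
                columns.add(annotation.name() + ":" + annotation.type());
            }
        }
        return columns;
    }
}
```

`SchemaSketch.describe(SketchPerson.class)` returns one entry per annotated field. Java does not guarantee the order in which `getDeclaredFields()` returns fields, so a real implementation that needs a stable column order would have to define one explicitly.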

Next, you need to tell pig-annotations how to load your object before it can turn it into a Pig tuple. If your objects are stored as CSV like this:

male,25
female,26

Then you need to implement RecordInflater<Person>.

package com.intentmedia.examples;

import com.intentmedia.examples.Person;
import com.intentmedia.convert.RecordInflater;
import org.apache.hadoop.io.Text;
import org.jetbrains.annotations.NotNull;

public class PersonFromCsvInflater implements RecordInflater<Person> {
    @NotNull
    @Override
    public Person convert(@NotNull Text value) throws IllegalArgumentException {

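        // Each line holds two comma-separated columns: gender, then age.
        String[] genderAndAge = value.toString().split(",");
        if (genderAndAge.length != 2) {
            throw new IllegalArgumentException("Expected 'gender,age' but got: " + value);
        }

        Person person = new Person();
        person.setGender(genderAndAge[0].trim());
        // parseInt throws NumberFormatException, a subclass of IllegalArgumentException.
        person.setAge(Integer.parseInt(genderAndAge[1].trim()));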

        return person;
    }
}

Finally, just add one more annotation to your Person class.

package com.intentmedia.examples;

import com.intentmedia.pig.PigField;
import com.intentmedia.pig.PigLoadable; // assumed to live in the same package as PigField

import static org.apache.pig.data.DataType.CHARARRAY;
import static org.apache.pig.data.DataType.INTEGER;

@PigLoadable(recordInflater = PersonFromCsvInflater.class)
public class Person {

    @PigField(name = "gender", type = CHARARRAY)
    private String gender;


    @PigField(name = "age", type = INTEGER)
    private Integer age;

    // getters, setters, etc.
}

Now, to load your objects in Pig, use a load statement like:

REGISTER 'location/to/pig-annotations.jar';
REGISTER 'your/jar/with/other/classes.jar';

people = LOAD 'your/input/files/*.csv' 
  USING com.intentmedia.pig.AnnotatedObjectLoader('com.intentmedia.examples.Person');

The people alias will then have the Pig schema tuple(gender:chararray,age:int).
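
Conceptually, each loaded row pairs the annotated column names with the values held by one inflated object. The sketch below shows that step in isolation with plain Java reflection. It is an assumption-laden illustration: `TupleColumn`, `TuplePerson`, and `TupleSketch` are hypothetical names, and a `Map` stands in for a Pig tuple.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for @PigField that carries only the column name.
@Retention(RetentionPolicy.RUNTIME)
@interface TupleColumn {
    String name();
}

// Hypothetical record class mirroring the Person example.
class TuplePerson {
    @TupleColumn(name = "gender")
    private final String gender;

    @TupleColumn(name = "age")
    private final Integer age;

    TuplePerson(String gender, Integer age) {
        this.gender = gender;
        this.age = age;
    }
}

class TupleSketch {
    // Reads every annotated field of the record into a column-name-to-value map.
    static Map<String, Object> toRow(Object record) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Field field : record.getClass().getDeclaredFields()) {
            TupleColumn column = field.getAnnotation(TupleColumn.class);
            if (column == null) {
                continue; // Unannotated fields are not part of the row.
            }
            field.setAccessible(true); // The fields are private.
            try {
                row.put(column.name(), field.get(record));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return row;
    }
}
```

For the CSV line `male,25`, `TupleSketch.toRow(new TuplePerson("male", 25))` produces a row whose `gender` entry is the string `male` and whose `age` entry is the integer 25.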

But wait, there's more

pig-annotations also supports the following features:

Custom converters for fields that can't be autoboxed into pig types.
Mapping Booleans to Integers (because Pig versions before 0.10 have no boolean type)
Unwrapping fields annotated with @Embedded
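
The README does not show how these features are configured, so the following is only a rough, self-contained sketch of what Boolean mapping and embedded-field unwrapping could look like. Everything in it is hypothetical: the `SketchEmbedded` annotation, the sample classes and their values, and the assumed mapping of `true` to 1 and `false` to 0.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker for a nested object whose fields should be unwrapped.
@Retention(RetentionPolicy.RUNTIME)
@interface SketchEmbedded {
}

// Hypothetical nested object with illustrative values.
class SketchAddress {
    String city = "Springfield";
    Boolean verified = true;
}

// Hypothetical outer object that embeds an address.
class SketchCustomer {
    String name = "Ada";

    @SketchEmbedded
    SketchAddress address = new SketchAddress();
}

class FlattenSketch {
    // Assumed mapping: true -> 1, false -> 0; all other values pass through unchanged.
    static Object toPigValue(Object value) {
        if (value instanceof Boolean) {
            return ((Boolean) value) ? 1 : 0;
        }
        return value;
    }

    // Collects field values, replacing each embedded object with its own field values.
    static List<Object> flatten(Object record) {
        List<Object> values = new ArrayList<>();
        for (Field field : record.getClass().getDeclaredFields()) {
            if (field.isSynthetic()) {
                continue; // Skip compiler- or tool-generated fields.
            }
            field.setAccessible(true);
            Object value;
            try {
                value = field.get(record);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
            if (value != null && field.isAnnotationPresent(SketchEmbedded.class)) {
                values.addAll(flatten(value));
            } else {
                values.add(toPigValue(value));
            }
        }
        return values;
    }
}
```

Flattening a `SketchCustomer` yields three values: the customer's name, the embedded address's city, and the integer 1 in place of the `verified` flag.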

